ruby-changes:50644

From: watson1978 <ko1@a...>
Date: Sun, 18 Mar 2018 19:29:05 +0900 (JST)
Subject: [ruby-changes:50644] watson1978:r62806 (trunk): Improve CSV performance

watson1978	2018-03-18 19:28:58 +0900 (Sun, 18 Mar 2018)

  New Revision: 62806

  https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=62806

  Log:
    Improve CSV performance
    
    When the special variables (such as $1, $&, $`...) are not used afterwards,
    Regexp#match? or String#match? performs better than Regexp#=~ or String#=~, because match? does not set them.
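    
    As a minimal sketch of that difference (the pattern and input are hypothetical and not taken from lib/csv.rb), the snippet below checks what each call leaves in $~:
    
    ```ruby
    # Hypothetical pattern and field, for illustration only.
    date_like = /\A\d{4}-\d{2}-\d{2}\z/
    field = "2018-03-18"
    
    # =~ returns the match position (or nil) and updates $~, $1, $& and friends.
    index = (field =~ date_like)
    after_equals_tilde = $~
    
    # A failed match resets $~ to nil, so the next check starts from a clean state.
    "" =~ /x/
    
    # match? only answers true/false and leaves $~ untouched.
    matched = field.match?(date_like)
    after_match_p = $~
    
    p index                     # 0
    p after_equals_tilde.class  # MatchData
    p matched                   # true
    p after_match_p             # nil
    ```
    
    Any call site that reads $~, $1 or similar after the test has to keep using =~ (or match).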
    
    This patch applies the same idea as https://github.com/ruby/ruby/pull/1836
    
    [Fix GH-1842]
    
    ## Environment
    * OS : Ubuntu 17.10
    * Compiler : gcc version 7.2.0
    * CPU : Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
    * Memory : 16 GB
    
    ## TL;DR
    Method      | Before (i/s) | After (i/s) | Speed up
    ----------- | ------------ | ----------- | --------
    CSV.foreach | 44.825       | 48.201      | 7.5%
    CSV#shift   | 45.200       | 49.584      | 9.7%
    CSV.read    | 42.968       | 46.853      | 9.0%
    CSV.table   | 10.933       | 11.277      | 3.1%
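    
    The speed-up column appears to be the relative gain in iterations per second, (after / before - 1) * 100. That formula is an assumption, since the log does not state it. A short sketch that recomputes the column from the figures in the table:
    
    ```ruby
    # Iterations per second copied from the table: [before, after].
    results = {
      "CSV.foreach" => [44.825, 48.201],
      "CSV#shift"   => [45.200, 49.584],
      "CSV.read"    => [42.968, 46.853],
      "CSV.table"   => [10.933, 11.277],
    }
    
    # Assumed formula: relative gain in percent, rounded to one decimal place.
    speed_up = results.transform_values do |before, after|
      ((after / before - 1.0) * 100).round(1)
    end
    
    speed_up.each { |name, pct| puts format("%-11s %4.1f%%", name, pct) }
    ```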
    
    ## Before
    ```
    Calculating -------------------------------------
             CSV.foreach     44.825  (± 0.0%) i/s -    228.000  in   5.086576s
               CSV#shift     45.200  (± 0.0%) i/s -    228.000  in   5.044297s
                CSV.read     42.968  (± 0.0%) i/s -    216.000  in   5.027504s
               CSV.table     10.933  (± 0.0%) i/s -     55.000  in   5.031098s
    ```
    
    ## After
    ```
    Calculating -------------------------------------
             CSV.foreach     48.201  (± 0.0%) i/s -    244.000  in   5.062256s
               CSV#shift     49.584  (± 0.0%) i/s -    248.000  in   5.001652s
                CSV.read     46.853  (± 0.0%) i/s -    236.000  in   5.037044s
               CSV.table     11.277  (± 0.0%) i/s -     57.000  in   5.054694s
    ```
    
    ## Benchmark code
    ```ruby
    require 'csv'
    require 'benchmark/ips'
    
    # Build the sample file: one header row plus 2,000 data rows.
    CSV.open("/tmp/file.csv", "w") do |csv|
      csv << ["player", "gameA", "gameB"]
      1000.times do
        csv << ['"Alice"', "84.0", "79.5"]
        csv << ['"Bob"', "20.0", "56.5"]
      end
    end
    
    Benchmark.ips do |x|
      x.report "CSV.foreach" do
        CSV.foreach("/tmp/file.csv") do |row|
        end
      end
    
      x.report "CSV#shift" do
        CSV.open("/tmp/file.csv") do |csv|
          while line = csv.shift
          end
        end
      end
    
      x.report "CSV.read" do
        CSV.read("/tmp/file.csv")
      end
    
      x.report "CSV.table" do
        CSV.table("/tmp/file.csv")
      end
    end
    ```
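    
    The benchmark above measures whole CSV calls. To look at the two matching styles in isolation, a rough micro-benchmark can use only the standard library. This is a sketch and not part of the patch. The pattern, input and iteration count are hypothetical, and timings will vary by machine and Ruby version:
    
    ```ruby
    require 'benchmark'
    
    # Hypothetical pattern and input, chosen only to exercise both calls.
    pattern = /"/
    part = "84.0"
    n = 200_000
    
    Benchmark.bm(8) do |x|
      # =~ updates the special variables on every call.
      x.report("=~")     { n.times { part =~ pattern } }
      # match? only returns true or false.
      x.report("match?") { n.times { part.match?(pattern) } }
    end
    ```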

  Modified files:
    trunk/lib/csv.rb
Index: lib/csv.rb
===================================================================
--- lib/csv.rb	(revision 62805)
+++ lib/csv.rb	(revision 62806)
@@ -970,7 +970,7 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L970
     date:      lambda { |f|
       begin
         e = f.encode(ConverterEncoding)
-        e =~ DateMatcher ? Date.parse(e) : f
+        e.match?(DateMatcher) ? Date.parse(e) : f
       rescue  # encoding conversion or date parse errors
         f
       end
@@ -978,7 +978,7 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L978
     date_time: lambda { |f|
       begin
         e = f.encode(ConverterEncoding)
-        e =~ DateTimeMatcher ? DateTime.parse(e) : f
+        e.match?(DateTimeMatcher) ? DateTime.parse(e) : f
       rescue  # encoding conversion or date parse errors
         f
       end
@@ -1271,7 +1271,7 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L1271
     begin
       f = File.open(filename, mode, file_opts)
     rescue ArgumentError => e
-      raise unless /needs binmode/ =~ e.message and mode == "r"
+      raise unless /needs binmode/.match?(e.message) and mode == "r"
       mode = "rb"
       file_opts = {encoding: Encoding.default_external}.merge(file_opts)
       retry
@@ -1870,7 +1870,7 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L1870
           if part.end_with?(@quote_char) && part.count(@quote_char) % 2 != 0
             # extended column ends
             csv.last << part[0..-2]
-            if csv.last =~ @parsers[:stray_quote]
+            if csv.last.match?(@parsers[:stray_quote])
               raise MalformedCSVError,
                     "Missing or stray quote in line #{lineno + 1}"
             end
@@ -1888,7 +1888,7 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L1888
           elsif part.end_with?(@quote_char)
             # regular quoted column
             csv << part[1..-2]
-            if csv.last =~ @parsers[:stray_quote]
+            if csv.last.match?(@parsers[:stray_quote])
               raise MalformedCSVError,
                     "Missing or stray quote in line #{lineno + 1}"
             end
@@ -1899,9 +1899,9 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L1899
             raise MalformedCSVError,
                   "Missing or stray quote in line #{lineno + 1}"
           end
-        elsif part =~ @parsers[:quote_or_nl]
+        elsif part.match?(@parsers[:quote_or_nl])
           # Unquoted field with bad characters.
-          if part =~ @parsers[:nl_or_lf]
+          if part.match?(@parsers[:nl_or_lf])
             raise MalformedCSVError, "Unquoted fields do not allow " +
                                      "\\r or \\n (line #{lineno + 1})."
           else

Property changes on: lib/csv.rb
___________________________________________________________________
Added: svn:executable
## -0,0 +1 ##
+*
\ No newline at end of property

--
ML: ruby-changes@q...
Info: http://www.atdot.net/~ko1/quickml/
