[前][次][番号順一覧][スレッド一覧]

ruby-changes:46662

From: hsbt <ko1@a...>
Date: Thu, 18 May 2017 11:42:23 +0900 (JST)
Subject: [ruby-changes:46662] hsbt:r58777 (trunk): Improve CSV parsing performance.

hsbt	2017-05-18 11:42:16 +0900 (Thu, 18 May 2017)

  New Revision: 58777

  https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=58777

  Log:
    Improve CSV parsing performance.
    
      Patch by @joshpencheon (Josh Pencheon)
      [fix GH-1607]
    
      #### benchmark-ips results
      ```
      trunk:
      Warming up --------------------------------------
                             4.000  i/100ms
      Calculating -------------------------------------
                             39.661  (?\194?\17710.1%) i/s -      2.352k in 60.034781s
      with-patch:
      Warming up --------------------------------------
                             5.000  i/100ms
      Calculating -------------------------------------
                             60.521  (?\194?\177 9.9%) i/s -      3.595k in 60.047157s
      ```
    
      #### memory_profiler resuts
    
      ```
      trunk:
      allocated memory by class
      -----------------------------------
        35588490  String
         7454320  Array
          294000  MatchData
           37340  Regexp
           11840  Hash
            2400  CSV
            1600  Proc
            1280  Method
             800  StringIO
      with-patch:
      allocated memory by class
      -----------------------------------
        18788490  String
         3454320  Array
          294000  MatchData
           37340  Regexp
           11840  Hash
            2400  CSV
            1600  Proc
            1280  Method
             800  StringIO
      ```

  Modified files:
    trunk/lib/csv.rb
Index: lib/csv.rb
===================================================================
--- lib/csv.rb	(revision 58776)
+++ lib/csv.rb	(revision 58777)
@@ -1876,7 +1876,7 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L1876
           # If we are continuing a previous column
           if part.end_with?(@quote_char) && part.count(@quote_char) % 2 != 0
             # extended column ends
-            csv[-1] = csv[-1].push(part[0..-2]).join("")
+            csv.last << part[0..-2]
             if csv.last =~ @parsers[:stray_quote]
               raise MalformedCSVError,
                     "Missing or stray quote in line #{lineno + 1}"
@@ -1884,13 +1884,13 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L1884
             csv.last.gsub!(@double_quote_char, @quote_char)
             in_extended_col = false
           else
-            csv.last.push(part, @col_sep)
+            csv.last << part << @col_sep
           end
         elsif part.start_with?(@quote_char)
           # If we are starting a new quoted column
           if part.count(@quote_char) % 2 != 0
             # start an extended column
-            csv << [part[1..-1], @col_sep]
+            csv << (part[1..-1] << @col_sep)
             in_extended_col =  true
           elsif part.end_with?(@quote_char)
             # regular quoted column
@@ -1933,7 +1933,7 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L1933
         if @io.eof?
           raise MalformedCSVError,
                 "Unclosed quoted field on line #{lineno + 1}."
-        elsif @field_size_limit and csv.last.sum(&:size) >= @field_size_limit
+        elsif @field_size_limit and csv.last.size >= @field_size_limit
           raise MalformedCSVError, "Field size exceeded on line #{lineno + 1}."
         end
         # otherwise, we need to loop and pull some more data to complete the row

--
ML: ruby-changes@q...
Info: http://www.atdot.net/~ko1/quickml/

[前][次][番号順一覧][スレッド一覧]