ruby-changes:50644
From: watson1978 <ko1@a...>
Date: Sun, 18 Mar 2018 19:29:05 +0900 (JST)
Subject: [ruby-changes:50644] watson1978:r62806 (trunk): Improve CSV performance
watson1978	2018-03-18 19:28:58 +0900 (Sun, 18 Mar 2018)

  New Revision: 62806

  https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=62806

  Log:
    Improve CSV performance

    When the special variables (such as $1, $&, $`...) are not used,
    performance can be improved by using Regexp#match? or String#match?
    instead of Regexp#=~ or String#=~.

    This patch applies the same idea as https://github.com/ruby/ruby/pull/1836

    [Fix GH-1842]

    ## Environment
    * OS : Ubuntu 17.10
    * Compiler : gcc version 7.2.0
    * CPU : Intel(R) Core(TM) i5-3210M CPU @ 2.50GHz
    * Memory : 16 GB

    ## TL;DR
    Methods     | Before (i/s) | After (i/s) | Speed up
    ----------- | ------------ | ----------- | --------
    CSV.foreach | 44.825       | 48.201      | 7.5%
    CSV#shift   | 45.200       | 49.584      | 9.7%
    CSV.read    | 42.968       | 46.853      | 9.0%
    CSV.table   | 10.933       | 11.277      | 3.1%

    ## Before
    ```
    Calculating -------------------------------------
             CSV.foreach     44.825  (± 0.0%) i/s -    228.000  in   5.086576s
               CSV#shift     45.200  (± 0.0%) i/s -    228.000  in   5.044297s
                CSV.read     42.968  (± 0.0%) i/s -    216.000  in   5.027504s
               CSV.table     10.933  (± 0.0%) i/s -     55.000  in   5.031098s
    ```

    ## After
    ```
    Calculating -------------------------------------
             CSV.foreach     48.201  (± 0.0%) i/s -    244.000  in   5.062256s
               CSV#shift     49.584  (± 0.0%) i/s -    248.000  in   5.001652s
                CSV.read     46.853  (± 0.0%) i/s -    236.000  in   5.037044s
               CSV.table     11.277  (± 0.0%) i/s -     57.000  in   5.054694s
    ```

    ## Benchmark code
    ```ruby
    require 'csv'
    require 'benchmark/ips'

    CSV.open("/tmp/file.csv", "w") do |csv|
      csv << ["player", "gameA", "gameB"]
      1000.times do
        csv << ['"Alice"', "84.0", "79.5"]
        csv << ['"Bob"', "20.0", "56.5"]
      end
    end

    Benchmark.ips do |x|
      x.report "CSV.foreach" do
        CSV.foreach("/tmp/file.csv") do |row|
        end
      end

      x.report "CSV#shift" do
        CSV.open("/tmp/file.csv") do |csv|
          while line = csv.shift
          end
        end
      end

      x.report "CSV.read" do
        CSV.read("/tmp/file.csv")
      end

      x.report "CSV.table" do
        CSV.table("/tmp/file.csv")
      end
    end
    ```

  Modified files:
    trunk/lib/csv.rb
Index: lib/csv.rb
===================================================================
--- lib/csv.rb	(revision 62805)
+++ lib/csv.rb	(revision 62806)
@@ -970,7 +970,7 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L970
     date:      lambda { |f|
       begin
         e = f.encode(ConverterEncoding)
-        e =~ DateMatcher ? Date.parse(e) : f
+        e.match?(DateMatcher) ? Date.parse(e) : f
       rescue  # encoding conversion or date parse errors
         f
       end
@@ -978,7 +978,7 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L978
     date_time: lambda { |f|
       begin
         e = f.encode(ConverterEncoding)
-        e =~ DateTimeMatcher ? DateTime.parse(e) : f
+        e.match?(DateTimeMatcher) ? DateTime.parse(e) : f
       rescue  # encoding conversion or date parse errors
         f
       end
@@ -1271,7 +1271,7 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L1271
     begin
       f = File.open(filename, mode, file_opts)
     rescue ArgumentError => e
-      raise unless /needs binmode/ =~ e.message and mode == "r"
+      raise unless /needs binmode/.match?(e.message) and mode == "r"
       mode = "rb"
       file_opts = {encoding: Encoding.default_external}.merge(file_opts)
       retry
@@ -1870,7 +1870,7 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L1870
           if part.end_with?(@quote_char) && part.count(@quote_char) % 2 != 0
             # extended column ends
             csv.last << part[0..-2]
-            if csv.last =~ @parsers[:stray_quote]
+            if csv.last.match?(@parsers[:stray_quote])
               raise MalformedCSVError,
                     "Missing or stray quote in line #{lineno + 1}"
             end
@@ -1888,7 +1888,7 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L1888
           elsif part.end_with?(@quote_char)
             # regular quoted column
             csv << part[1..-2]
-            if csv.last =~ @parsers[:stray_quote]
+            if csv.last.match?(@parsers[:stray_quote])
               raise MalformedCSVError,
                     "Missing or stray quote in line #{lineno + 1}"
             end
@@ -1899,9 +1899,9 @@ class CSV https://github.com/ruby/ruby/blob/trunk/lib/csv.rb#L1899
             raise MalformedCSVError,
                   "Missing or stray quote in line #{lineno + 1}"
           end
-        elsif part =~ @parsers[:quote_or_nl]
+        elsif part.match?(@parsers[:quote_or_nl])
           # Unquoted field with bad characters.
-          if part =~ @parsers[:nl_or_lf]
+          if part.match?(@parsers[:nl_or_lf])
             raise MalformedCSVError, "Unquoted fields do not allow " +
                                      "\\r or \\n (line #{lineno + 1})."
           else

Property changes on: lib/csv.rb
___________________________________________________________________
Added: svn:executable
## -0,0 +1 ##
+*
\ No newline at end of property

--
ML: ruby-changes@q...
Info: http://www.atdot.net/~ko1/quickml/
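
Note on the technique used in this commit: the replacement of `=~` with `match?` is only safe where the code does not read the special variables afterwards, because `match?` returns a plain boolean and leaves `$~` (and therefore `$1`, `$&`, and so on) untouched. The following is a minimal sketch of that difference, not code from csv.rb; the method name `compare_match_styles`, the pattern, and the sample field are hypothetical and chosen only for illustration.

```ruby
# Hypothetical helper (not part of csv.rb): reports what each matching
# style returns and what it leaves in the last-match variable $~.
def compare_match_styles(field, pattern)
  $~ = nil                              # start with no last-match state

  matched = field.match?(pattern)       # boolean result only
  last_match_after_match_p = $~         # still nil: match? does not set $~

  index = (field =~ pattern)            # offset of the match, or nil
  last_match_after_tilde = $~           # MatchData when the match succeeded

  {
    match_p: matched,
    last_match_after_match_p: last_match_after_match_p,
    index: index,
    last_match_after_tilde: last_match_after_tilde
  }
end

result = compare_match_styles("2018-03-18", /\A\d{4}-\d{2}-\d{2}\z/)
puts result[:match_p]                          # boolean from match?
puts result[:last_match_after_match_p].inspect # last-match state after match?
puts result[:last_match_after_tilde].class     # last-match state after =~
```

Every call site changed in the diff above uses the result only as a condition, which is why the substitution does not change behavior there.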