ruby-changes:40958
From: normal <ko1@a...>
Date: Fri, 11 Dec 2015 17:24:17 +0900 (JST)
Subject: [ruby-changes:40958] normal:r53037 (trunk): hash.c (rb_num_hash_start): avoid pathological behavior
normal 2015-12-11 17:23:46 +0900 (Fri, 11 Dec 2015) New Revision: 53037 http://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=53037 Log: hash.c (rb_num_hash_start): avoid pathological behavior The OR-ing itself is bad for a hash function, and shifting 3 bits left was not enough to undo the damage done by shifting (RUBY_SPECIAL_SHIFT+3) bits right. Experimentally, shifting 16-17 bits seemed to work well in preparing the number for murmur hash. Add a few more benchmarks to based on bm_hash_shift to ensure we don't hurt performance too much with tweaks. I'm pretty confident about this change and commiting it now; especially since we're still using Murmur behind it (but perhaps we can update to a newer hash from Murmur...) [ruby-core:72028] [Feature #11405] target 0: a (ruby 2.3.0dev (2015-12-11 trunk 53027) [x86_64-linux]) at "/home/ew/rrrr/b/ruby" target 1: b (ruby 2.3.0dev (2015-12-11 master 53027) [x86_64-linux]) at "/home/ew/ruby/b/ruby" benchmark results: minimum results in each 5 measurements. Execution time (sec) name a b hash_aref_dsym 0.279 0.276 hash_aref_dsym_long 4.951 4.936 hash_aref_fix 0.281 0.283 hash_aref_flo 0.060 0.060 hash_aref_miss 0.409 0.410 hash_aref_str 0.387 0.385 hash_aref_sym 0.275 0.270 hash_aref_sym_long 0.410 0.411 hash_flatten 0.252 0.237 hash_ident_flo 0.035 0.032 hash_ident_num 0.254 0.251 hash_ident_obj 0.252 0.256 hash_ident_str 0.250 0.252 hash_ident_sym 0.259 0.270 hash_keys 0.267 0.267 hash_shift 0.016 0.015 hash_shift_u16 0.074 0.072 hash_shift_u24 0.071 0.071 hash_shift_u32 0.073 0.072 hash_to_proc 0.008 0.008 hash_values 0.263 0.264 Speedup ratio: compare with the result of `a' (greater is better) name b hash_aref_dsym 1.009 hash_aref_dsym_long 1.003 hash_aref_fix 0.993 hash_aref_flo 1.001 hash_aref_miss 0.996 hash_aref_str 1.006 hash_aref_sym 1.017 hash_aref_sym_long 0.998 hash_flatten 1.061 hash_ident_flo 1.072 hash_ident_num 1.012 hash_ident_obj 0.987 hash_ident_str 0.993 hash_ident_sym 0.959 hash_keys 0.997 hash_shift 1.036 hash_shift_u16 1.039 hash_shift_u24 1.001 hash_shift_u32 1.017 hash_to_proc 1.001 hash_values 0.995 Added files: trunk/benchmark/bm_hash_shift_u16.rb trunk/benchmark/bm_hash_shift_u24.rb trunk/benchmark/bm_hash_shift_u32.rb Modified files: trunk/ChangeLog trunk/hash.c Index: ChangeLog =================================================================== --- ChangeLog (revision 53036) +++ ChangeLog (revision 53037) @@ -1,3 +1,8 @@ https://github.com/ruby/ruby/blob/trunk/ChangeLog#L1 +Fri Dec 11 16:48:57 2015 Eric Wong <e@8...> + + * hash.c (rb_num_hash_start): avoid pathological behavior + [ruby-core:72028] [Feature #11405] + Fri Dec 11 11:58:46 2015 SHIBATA Hiroshi <hsbt@r...> * NEWS: Mentioned rubygems-2.5.1 Index: hash.c =================================================================== --- hash.c (revision 53036) +++ hash.c (revision 53037) @@ -191,11 +191,11 @@ rb_num_hash_start(st_index_t n) https://github.com/ruby/ruby/blob/trunk/hash.c#L191 * - (n >> (RUBY_SPECIAL_SHIFT+3)) was added to make symbols hash well, * n.b.: +3 to remove most ID scope, +1 worked well initially, too * n.b.: +1 (instead of 3) worked well initially, too - * - (n << 3) was finally added to avoid losing bits for fixnums + * - (n << 16) was finally added to avoid losing bits for fixnums * - avoid expensive modulo instructions, it is currently only * shifts and bitmask operations. */ - return (n >> (RUBY_SPECIAL_SHIFT + 3) | (n << 3)) ^ (n >> 3); + return (n >> (RUBY_SPECIAL_SHIFT + 3) ^ (n << 16)) ^ (n >> 3); } long Index: benchmark/bm_hash_shift_u32.rb =================================================================== --- benchmark/bm_hash_shift_u32.rb (revision 0) +++ benchmark/bm_hash_shift_u32.rb (revision 53037) @@ -0,0 +1,10 @@ https://github.com/ruby/ruby/blob/trunk/benchmark/bm_hash_shift_u32.rb#L1 +h = {} + +(0xffff4000..0xffffffff).each do |i| + h[i] = nil +end + +300000.times do + k, v = h.shift + h[k] = v +end Index: benchmark/bm_hash_shift_u24.rb =================================================================== --- benchmark/bm_hash_shift_u24.rb (revision 0) +++ benchmark/bm_hash_shift_u24.rb (revision 53037) @@ -0,0 +1,10 @@ https://github.com/ruby/ruby/blob/trunk/benchmark/bm_hash_shift_u24.rb#L1 +h = {} + +(0xff4000..0xffffff).each do |i| + h[i] = nil +end + +300000.times do + k, v = h.shift + h[k] = v +end Index: benchmark/bm_hash_shift_u16.rb =================================================================== --- benchmark/bm_hash_shift_u16.rb (revision 0) +++ benchmark/bm_hash_shift_u16.rb (revision 53037) @@ -0,0 +1,10 @@ https://github.com/ruby/ruby/blob/trunk/benchmark/bm_hash_shift_u16.rb#L1 +h = {} + +(16384..65536).each do |i| + h[i] = nil +end + +300000.times do + k, v = h.shift + h[k] = v +end -- ML: ruby-changes@q... Info: http://www.atdot.net/~ko1/quickml/