ruby-changes:40958

normal	2015-12-11 17:23:46 +0900 (Fri, 11 Dec 2015)

  New Revision: 53037

  http://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=53037

  Log:
    hash.c (rb_num_hash_start): avoid pathological behavior
    
    The OR-ing itself is bad for a hash function, and shifting 3 bits
    left was not enough to undo the damage done by shifting
    (RUBY_SPECIAL_SHIFT+3) bits right.  Experimentally, shifting 16-17
    bits seemed to work well in preparing the number for murmur hash.
    
    Add a few more benchmarks to based on bm_hash_shift to ensure
    we don't hurt performance too much with tweaks.
    
    I'm pretty confident about this change and commiting it now;
    especially since we're still using Murmur behind it (but perhaps
    we can update to a newer hash from Murmur...)
    
    [ruby-core:72028] [Feature #11405]
    
    target 0: a (ruby 2.3.0dev (2015-12-11 trunk 53027) [x86_64-linux]) at "/home/ew/rrrr/b/ruby"
    target 1: b (ruby 2.3.0dev (2015-12-11 master 53027) [x86_64-linux]) at "/home/ew/ruby/b/ruby"
    
    benchmark results:
    minimum results in each 5 measurements.
    Execution time (sec)
    name	a	b
    hash_aref_dsym	0.279	0.276
    hash_aref_dsym_long	4.951	4.936
    hash_aref_fix	0.281	0.283
    hash_aref_flo	0.060	0.060
    hash_aref_miss	0.409	0.410
    hash_aref_str	0.387	0.385
    hash_aref_sym	0.275	0.270
    hash_aref_sym_long	0.410	0.411
    hash_flatten	0.252	0.237
    hash_ident_flo	0.035	0.032
    hash_ident_num	0.254	0.251
    hash_ident_obj	0.252	0.256
    hash_ident_str	0.250	0.252
    hash_ident_sym	0.259	0.270
    hash_keys	0.267	0.267
    hash_shift	0.016	0.015
    hash_shift_u16	0.074	0.072
    hash_shift_u24	0.071	0.071
    hash_shift_u32	0.073	0.072
    hash_to_proc	0.008	0.008
    hash_values	0.263	0.264
    
    Speedup ratio: compare with the result of `a' (greater is better)
    name	b
    hash_aref_dsym	1.009
    hash_aref_dsym_long	1.003
    hash_aref_fix	0.993
    hash_aref_flo	1.001
    hash_aref_miss	0.996
    hash_aref_str	1.006
    hash_aref_sym	1.017
    hash_aref_sym_long	0.998
    hash_flatten	1.061
    hash_ident_flo	1.072
    hash_ident_num	1.012
    hash_ident_obj	0.987
    hash_ident_str	0.993
    hash_ident_sym	0.959
    hash_keys	0.997
    hash_shift	1.036
    hash_shift_u16	1.039
    hash_shift_u24	1.001
    hash_shift_u32	1.017
    hash_to_proc	1.001
    hash_values	0.995

  Added files:
    trunk/benchmark/bm_hash_shift_u16.rb
    trunk/benchmark/bm_hash_shift_u24.rb
    trunk/benchmark/bm_hash_shift_u32.rb
  Modified files:
    trunk/ChangeLog
    trunk/hash.c
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 53036)
+++ ChangeLog	(revision 53037)
@@ -1,3 +1,8 @@ https://github.com/ruby/ruby/blob/trunk/ChangeLog#L1
+Fri Dec 11 16:48:57 2015  Eric Wong  <e@8...>
+
+	* hash.c (rb_num_hash_start): avoid pathological behavior
+	  [ruby-core:72028] [Feature #11405]
+
 Fri Dec 11 11:58:46 2015  SHIBATA Hiroshi  <hsbt@r...>
 
 	* NEWS: Mentioned rubygems-2.5.1
Index: hash.c
===================================================================
--- hash.c	(revision 53036)
+++ hash.c	(revision 53037)
@@ -191,11 +191,11 @@ rb_num_hash_start(st_index_t n) https://github.com/ruby/ruby/blob/trunk/hash.c#L191
      * - (n >> (RUBY_SPECIAL_SHIFT+3)) was added to make symbols hash well,
      *   n.b.: +3 to remove most ID scope, +1 worked well initially, too
      *   n.b.: +1 (instead of 3) worked well initially, too
-     * - (n << 3) was finally added to avoid losing bits for fixnums
+     * - (n << 16) was finally added to avoid losing bits for fixnums
      * - avoid expensive modulo instructions, it is currently only
      *   shifts and bitmask operations.
      */
-    return (n >> (RUBY_SPECIAL_SHIFT + 3) | (n << 3)) ^ (n >> 3);
+    return (n >> (RUBY_SPECIAL_SHIFT + 3) ^ (n << 16)) ^ (n >> 3);
 }
 
 long
Index: benchmark/bm_hash_shift_u32.rb
===================================================================
--- benchmark/bm_hash_shift_u32.rb	(revision 0)
+++ benchmark/bm_hash_shift_u32.rb	(revision 53037)
@@ -0,0 +1,10 @@ https://github.com/ruby/ruby/blob/trunk/benchmark/bm_hash_shift_u32.rb#L1
+h = {}
+
+(0xffff4000..0xffffffff).each do |i|
+  h[i] = nil
+end
+
+300000.times do
+  k, v = h.shift
+  h[k] = v
+end
Index: benchmark/bm_hash_shift_u24.rb
===================================================================
--- benchmark/bm_hash_shift_u24.rb	(revision 0)
+++ benchmark/bm_hash_shift_u24.rb	(revision 53037)
@@ -0,0 +1,10 @@ https://github.com/ruby/ruby/blob/trunk/benchmark/bm_hash_shift_u24.rb#L1
+h = {}
+
+(0xff4000..0xffffff).each do |i|
+  h[i] = nil
+end
+
+300000.times do
+  k, v = h.shift
+  h[k] = v
+end
Index: benchmark/bm_hash_shift_u16.rb
===================================================================
--- benchmark/bm_hash_shift_u16.rb	(revision 0)
+++ benchmark/bm_hash_shift_u16.rb	(revision 53037)
@@ -0,0 +1,10 @@ https://github.com/ruby/ruby/blob/trunk/benchmark/bm_hash_shift_u16.rb#L1
+h = {}
+
+(16384..65536).each do |i|
+  h[i] = nil
+end
+
+300000.times do
+  k, v = h.shift
+  h[k] = v
+end

--
ML: ruby-changes@q...
Info: http://www.atdot.net/~ko1/quickml/