
ruby-changes:45625

From: normal <ko1@a...>
Date: Fri, 24 Feb 2017 10:01:29 +0900 (JST)
Subject: [ruby-changes:45625] normal:r57698 (trunk): string.c (str_uminus): deduplicate strings

normal	2017-02-24 10:01:23 +0900 (Fri, 24 Feb 2017)

  New Revision: 57698

  https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=57698

  Log:
    string.c (str_uminus): deduplicate strings
    
    This exposes the rb_fstring internal function through
    String#-@ (unary minus), which now returns a deduplicated and
    frozen string when a non-frozen string is given.  This is
    useful for all sorts of record processing where arbitrary
    keys and values may be stored, but certain keys and values
    are duplicated at a high frequency, so the memory savings can
    be noticeable.
    
    Use cases are many:
    
    * email/NNTP header processing
    
      There are some standard header keys everybody uses
      (From/To/Cc/Date/Subject/Received/Message-ID/References/In-Reply-To),
      as well as common ones specific to certain lists
      (ruby-core has X-Redmine-* headers).
      It is also useful to dedupe values, as most inboxes have
      multiple messages from the same sender or MUA.
    
    * package management systems -
      systems like RubyGems store identical strings for licenses,
      dependency names, author names/emails, etc.
    
    * HTTP headers/trailers -
      standard headers (Host/Accept/Accept-Encoding/User-Agent/...)
      are common, but there are also uncommon ones.
      Values may be deduped as well, since a user agent is
      likely to make multiple/parallel requests to the same
      server.
    
    * version control systems -
      this can be useful for deduplicating names of frequent
      committers (like "nobu" :)
    
      In linux.git and git.git, there are also common
      trailers such as Signed-off-by/Acked-by/Reviewed-by/Fixes/...
      as well as less common ones.
    
    * audio metadata -
    
      There are commonly used tags (Artist/Album/Title/Tracknumber),
      but Vorbis comments allow arbitrary key/value pairs to be stored.
      Music collections contain songs by the same artist or multiple
      songs from the same album, so deduplicating values helps
      there, too.
    
    * JSON, YAML, XML, HTML processing
    
      Certain fields, tags and attributes are commonly used
      within a single document and across multiple documents.
    
    There is no security concern about this becoming a DoS vector
    by creating immortal strings.  The fstring table is not a GC
    root and is not walked during the mark phase.  GC-able dynamic
    symbols (since Ruby 2.2) are handled in the same manner, and
    that implementation also relies on fstrings not being immortal.
    
    [Feature #13077] [ruby-core:79663]

  Modified files:
    trunk/string.c
Index: string.c
===================================================================
--- string.c	(revision 57697)
+++ string.c	(revision 57698)
@@ -2530,7 +2530,7 @@ str_uminus(VALUE str)
 	return str;
     }
     else {
-	return rb_str_freeze(rb_str_dup(str));
+	return rb_fstring(str);
     }
 }
 

--
ML: ruby-changes@q...
Info: http://www.atdot.net/~ko1/quickml/
