
ruby-changes:43151

From: duerst <ko1@a...>
Date: Tue, 31 May 2016 10:10:14 +0900 (JST)
Subject: [ruby-changes:43151] duerst:r55225 (trunk): * string.c: Activate full Unicode case mapping for UTF-8 by removing

duerst	2016-05-31 10:10:06 +0900 (Tue, 31 May 2016)

  New Revision: 55225

  https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=55225

  Log:
    * string.c: Activate full Unicode case mapping for UTF-8 by removing
      the protective check for the presence of an option.
      Update documentation.
    * test/ruby/enc/test_case_comprehensive.rb: Adjust tests for above change.
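
As an illustration of what the log above describes, here is a minimal sketch of the resulting behavior on UTF-8 strings. It assumes Ruby 2.4 or later (the release line that this trunk change fed into); the variable names are made up for the example, and non-ASCII characters are written as \u escapes so the snippet does not depend on the source encoding.

```ruby
# Sketch only: assumes Ruby >= 2.4, where case mapping without options
# covers the full Unicode range for UTF-8 strings.
word = "r\u00E9sum\u00E9"                # "résumé"

full   = word.upcase                     # no option: full Unicode case mapping
ascii  = word.upcase(:ascii)             # only a-z / A-Z are affected
turkic = "I".downcase(:turkic)           # Turkic rule: I maps to dotless i (U+0131)
folded = "Stra\u00DFe".downcase(:fold)   # case folding: sharp s folds to "ss"

puts full, ascii, turkic, folded
```

Before this revision, the first call would only have touched the ASCII letters unless an option such as :lithuanian was passed.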

  Modified files:
    trunk/ChangeLog
    trunk/string.c
    trunk/test/ruby/enc/test_case_comprehensive.rb

Index: test/ruby/enc/test_case_comprehensive.rb
===================================================================
--- test/ruby/enc/test_case_comprehensive.rb	(revision 55224)
+++ test/ruby/enc/test_case_comprehensive.rb	(revision 55225)
@@ -78,10 +78,10 @@ class TestComprehensiveCaseFold < Test:: https://github.com/ruby/ruby/blob/trunk/test/ruby/enc/test_case_comprehensive.rb#L78
     end
 
     tests = [
-      CaseTest.new(:downcase,   [:lithuanian], downcase),
-      CaseTest.new(:upcase,     [:lithuanian], upcase),
-      CaseTest.new(:capitalize, [:lithuanian], titlecase, downcase),
-      # swapcase?????!!!!!
+      CaseTest.new(:downcase,   [], downcase),
+      CaseTest.new(:upcase,     [], upcase),
+      CaseTest.new(:capitalize, [], titlecase, downcase),
+      # @@@@ TODO: figure out how to test swapcase
       CaseTest.new(:downcase,   [:fold],       casefold),
       CaseTest.new(:upcase,     [:turkic],     turkic_upcase),
       CaseTest.new(:downcase,   [:turkic],     turkic_downcase),
Index: string.c
===================================================================
--- string.c	(revision 55224)
+++ string.c	(revision 55225)
@@ -5850,7 +5850,7 @@ rb_str_upcase_bang(int argc, VALUE *argv https://github.com/ruby/ruby/blob/trunk/string.c#L5850
     enc = STR_ENC_GET(str);
     rb_str_check_dummy_enc(enc);
     s = RSTRING_PTR(str); send = RSTRING_END(str);
-    if (enc==rb_utf8_encoding() && argc>0) { /* :lithuanian can temporarily be used for new functionality without options */
+    if (enc==rb_utf8_encoding()) {
 	str_shared_replace(str, rb_str_casemap(str, &flags, enc));
 	modify = ONIGENC_CASE_MODIFIED & flags;
     }
@@ -5940,7 +5940,7 @@ rb_str_downcase_bang(int argc, VALUE *ar https://github.com/ruby/ruby/blob/trunk/string.c#L5940
     enc = STR_ENC_GET(str);
     rb_str_check_dummy_enc(enc);
     s = RSTRING_PTR(str); send = RSTRING_END(str);
-    if (enc==rb_utf8_encoding() && argc>0) { /* :lithuanian can temporarily be used for new functionality without options */
+    if (enc==rb_utf8_encoding()) {
 	str_shared_replace(str, rb_str_casemap(str, &flags, enc));
 	modify = ONIGENC_CASE_MODIFIED & flags;
     }
@@ -5999,11 +5999,11 @@ rb_str_downcase_bang(int argc, VALUE *ar https://github.com/ruby/ruby/blob/trunk/string.c#L5999
  *  The meaning of the +options+ is as follows:
  *
  *  No option ::
- *    Currently, old behavior (only the ASCII region, i.e. characters
- *    ``A'' to ``Z'', and/or ``a'' to ``z'', are affected).
- *    This will change very soon to full Unicode case mapping.
+ *    Full Unicode case mapping, suitable for most languages
+ *    (see :turkic and :lithuanian options below for exceptions)
  *  :ascii ::
- *    Only the ASCII region, i.e. the characters ``A'' to ``Z'', are affected.
+ *    Only the ASCII region, i.e. the characters ``A'' to ``Z'' and
+ *    ``a'' to ``z'', are affected.
  *    This option cannot be combined with any other option.
  *  :turkic ::
  *    Full Unicode case mapping, adapted for Turkic languages
@@ -6012,21 +6012,23 @@ rb_str_downcase_bang(int argc, VALUE *ar https://github.com/ruby/ruby/blob/trunk/string.c#L6012
  *  :lithuanian ::
  *    Currently, just full Unicode case mapping. In the future, full Unicode
  *    case mapping adapted for Lithuanian (keeping the dot on the lower case
- *    i even if there's an accent on top).
+ *    i even if there is an accent on top).
  *  :fold ::
- *    Only available on +downcase+ and +downcase!+. Unicode case folding, which
- *    is more far-reaching than Unicode case mapping. This option currently
- *    cannot be combined with any other option (i.e. we do not currenty
- *    cannot be combined with any other option (i.e. we do not currently
- *    implement a variant for turkic languages).
+ *    Only available on +downcase+ and +downcase!+. Unicode case <b>folding</b>,
+ *    which is more far-reaching than Unicode case mapping.
+ *    This option currently cannot be combined with any other option
+ *    (i.e. we do not currently implement a variant for turkic languages).
  *
  *  Please note that several assumptions that are valid for ASCII-only case
  *  conversions do not hold for more general case conversions. For example,
  *  the length of the result may not be the same as the length of the input
- *  (neither in characters nor in bytes), and some roundtrip assumptions
- *  (e.g. str.downcase == str.downcase.upcase.downcase) may not apply.
+ *  (neither in characters nor in bytes), some roundtrip assumptions
+ *  (e.g. str.downcase == str.upcase.downcase) may not apply, and Unicode
+ *  normalization (i.e. String#unicode_normalize) is not necessarily maintained
+ *  by case mapping operations.
  *
- *  Non-ASCII case mapping/folding is currently only supported for UTF-8 Strings,
- *  but this support will be extended to other encodings in the future.
+ *  Non-ASCII case mapping/folding is currently only supported for UTF-8
+ *  Strings/Symbols, but this support will be extended to other encodings.
  *
  *     "hEllO".downcase   #=> "hello"
  */
@@ -6071,7 +6073,7 @@ rb_str_capitalize_bang(int argc, VALUE * https://github.com/ruby/ruby/blob/trunk/string.c#L6073
     enc = STR_ENC_GET(str);
     rb_str_check_dummy_enc(enc);
     if (RSTRING_LEN(str) == 0 || !RSTRING_PTR(str)) return Qnil;
-    if (enc==rb_utf8_encoding() && argc>0) { /* :lithuanian can temporarily be used for new functionality without options */
+    if (enc==rb_utf8_encoding()) {
 	str_shared_replace(str, rb_str_casemap(str, &flags, enc));
 	modify = ONIGENC_CASE_MODIFIED & flags;
     }
@@ -6147,7 +6149,7 @@ rb_str_swapcase_bang(int argc, VALUE *ar https://github.com/ruby/ruby/blob/trunk/string.c#L6149
     enc = STR_ENC_GET(str);
     rb_str_check_dummy_enc(enc);
     s = RSTRING_PTR(str); send = RSTRING_END(str);
-    if (enc==rb_utf8_encoding() && argc>0) { /* :lithuanian can temporarily be used for new functionality without options */
+    if (enc==rb_utf8_encoding()) {
 	str_shared_replace(str, rb_str_casemap(str, &flags, enc));
 	modify = ONIGENC_CASE_MODIFIED & flags;
     }
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 55224)
+++ ChangeLog	(revision 55225)
@@ -1,3 +1,11 @@ https://github.com/ruby/ruby/blob/trunk/ChangeLog#L1
+Tue May 31 10:10:03 2016  Martin Duerst  <duerst@i...>
+
+	* string.c: Activate full Unicode case mapping for UTF-8 by removing
+	  the protective check for the presence of an option.
+	  Update documentation.
+
+	* test/ruby/enc/test_case_comprehensive.rb: Adjust tests for above change.
+
 Tue May 31 00:30:11 2016  NAKAMURA Usaku  <usa@r...>
 
 	* ext/socket/raddrinfo.c (host_str, port_str): Use StringValueCStr
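
The updated documentation in string.c warns that assumptions valid for ASCII-only case conversion no longer hold. The following sketch, again assuming Ruby 2.4 or later and using hypothetical variable names, shows two of those cases: a result whose length differs from the input, and a roundtrip that does not return to the same string. It does not attempt to demonstrate the normalization caveat.

```ruby
# Sketch only: assumes Ruby >= 2.4 and UTF-8 strings.
sharp_s = "\u00DF"                       # "ß", one character
upper   = sharp_s.upcase                 # maps to the two characters "SS"
grew    = upper.length > sharp_s.length  # character count changed

dotless_i = "\u0131"                     # "ı", two bytes in UTF-8
shrank = dotless_i.upcase.bytesize < dotless_i.bytesize  # "I" is one byte

# str.downcase == str.upcase.downcase does not hold for the sharp s:
# the left side stays "ß", the right side becomes "ss".
roundtrip_holds = sharp_s.downcase == sharp_s.upcase.downcase

puts upper, grew, shrank, roundtrip_holds
```

Code that preallocates buffers or compares strings after case conversion may therefore need to be revisited once it processes non-ASCII UTF-8 input.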

--
ML: ruby-changes@q...
Info: http://www.atdot.net/~ko1/quickml/
