ruby-changes:4202
From: ko1@a...
Date: Wed, 5 Mar 2008 17:46:17 +0900 (JST)
Subject: [ruby-changes:4202] duerst - Ruby:r15692 (trunk): Web Mar 5 17:43:43 2008 Martin Duerst <duerst@i...>
duerst 2008-03-05 17:45:51 +0900 (Wed, 05 Mar 2008)

  New Revision: 15692

  Modified files:
    trunk/ChangeLog
    trunk/test/ruby/test_transcode.rb
    trunk/transcode.c

  Log:
    Web Mar 5 17:43:43 2008 Martin Duerst <duerst@i...>

    * transcode.c (transcode_loop): Adjusted detection of invalid
      (ill-formed) UTF-8 sequences. Fixing potential security issue, see
      http://www.unicode.org/versions/Unicode5.1.0/#Notable_Changes.

    * test/ruby/test_transcode.rb: Added two tests for above fix.

  http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/test/ruby/test_transcode.rb?r1=15692&r2=15691&diff_format=u
  http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/ChangeLog?r1=15692&r2=15691&diff_format=u
  http://svn.ruby-lang.org/cgi-bin/viewvc.cgi/trunk/transcode.c?r1=15692&r2=15691&diff_format=u

Index: ChangeLog
===================================================================
--- ChangeLog	(revision 15691)
+++ ChangeLog	(revision 15692)
@@ -1,3 +1,11 @@
+Web Mar 5 17:43:43 2008 Martin Duerst <duerst@i...>
+
+    * transcode.c (transcode_loop): Adjusted detection of invalid
+      (ill-formed) UTF-8 sequences. Fixing potential security issue, see
+      http://www.unicode.org/versions/Unicode5.1.0/#Notable_Changes.
+
+    * test/ruby/test_transcode.rb: Added two tests for above fix.
+
 Wed Mar 5 14:00:49 2008 Yukihiro Matsumoto <matz@r...>
 
     * numeric.c (fix_to_s): avoid rb_scan_args() when no argument
Index: test/ruby/test_transcode.rb
===================================================================
--- test/ruby/test_transcode.rb	(revision 15691)
+++ test/ruby/test_transcode.rb	(revision 15692)
@@ -242,6 +242,11 @@
   def test_invalid_ignore
     # arguments only
-    'abc'.encode('utf-8', invalid: :ignore)
+    assert_nothing_raised { 'abc'.encode('utf-8', invalid: :ignore) }
+    # check handling of UTF-8 ill-formed subsequences
+    assert_equal("\x00\x41\x00\x3E\x00\x42".force_encoding('UTF-16BE'),
+      "\x41\xC2\x3E\x42".encode('UTF-16BE', 'UTF-8', invalid: :ignore))
+    assert_equal("\x00\x41\x00\xF1\x00\x42".force_encoding('UTF-16BE'),
+      "\x41\xC2\xC3\xB1\x42".encode('UTF-16BE', 'UTF-8', invalid: :ignore))
   end
 end
Index: transcode.c
===================================================================
--- transcode.c	(revision 15691)
+++ transcode.c	(revision 15692)
@@ -177,8 +177,10 @@
             if (from_utf8) {
                 if ((next_byte&0xC0) == 0x80)
                     next_byte -= 0x80;
-                else
+                else {
+                    in_p--; /* may need to add more code later to revert other things */
                     goto invalid;
+                }
             }
             next_table = (const BYTE_LOOKUP *)next_info;
             goto follow_byte;
@@ -390,13 +392,15 @@
 /*
  *  call-seq:
- *     str.encode!(encoding)   => str
- *     str.encode!(to_encoding, from_encoding)   => str
+ *     str.encode!(encoding [, options] )   => str
+ *     str.encode!(to_encoding, from_encoding [, options] )   => str
  *
- *  With one argument, transcodes the contents of <i>str</i> from
+ *  The first form transcodes the contents of <i>str</i> from
  *  str.encoding to +encoding+.
- *  With two arguments, transcodes the contents of <i>str</i> from
+ *  The second form transcodes the contents of <i>str</i> from
  *  from_encoding to to_encoding.
+ *  The options Hash gives details for conversion. See String#encode
+ *  for details.
  *  Returns the string even if no changes were made.
  */
@@ -414,13 +418,15 @@
 /*
  *  call-seq:
- *     str.encode(encoding)   => str
- *     str.encode(to_encoding, from_encoding)   => str
+ *     str.encode(encoding [, options] )   => str
+ *     str.encode(to_encoding, from_encoding [, options] )   => str
  *
- *  With one argument, returns a copy of <i>str</i> transcoded
+ *  The first form returns a copy of <i>str</i> transcoded
  *  to encoding +encoding+.
- *  With two arguments, returns a copy of <i>str</i> transcoded
+ *  The second form returns a copy of <i>str</i> transcoded
  *  from from_encoding to to_encoding.
+ *  The options Hash gives details for conversion. Details
+ *  to be added.
  */
 static VALUE

-- 
ML: ruby-changes@q...
Info: http://www.atdot.net/~ko1/quickml/
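The rule enforced by the transcode_loop change can be illustrated with a minimal sketch in plain Ruby. This is not Ruby's transcoder: the helpers `utf8_lead_info` and `decode_utf8_ignoring_invalid` are hypothetical, and the byte ranges are assumed from the Unicode 5.1 table of well-formed UTF-8 byte sequences (transcode.c uses generated lookup tables instead). When a byte that should be a continuation byte is not one, only the bytes before it form the ill-formed subsequence; the offending byte is decoded again as the possible start of a new character. That is what backing up `in_p` achieves. Judging from the diff, before r15692 the `else` branch jumped to `invalid` without backing up, so in the first new test the `>` was swallowed together with the lead byte `\xC2`.

```ruby
# Hypothetical sketch: decode UTF-8 bytes into code points, dropping
# ill-formed subsequences (roughly what invalid: :ignore is meant to do).

CONTINUATION = (0x80..0xBF)

# Lead byte => [sequence length, allowed range for the second byte].
# Ranges assumed from the Unicode 5.1 well-formed UTF-8 table.
# Returns nil for 0x80..0xC1 and 0xF5..0xFF, which never start a sequence.
def utf8_lead_info(byte)
  case byte
  when 0xC2..0xDF             then [2, CONTINUATION]
  when 0xE0                   then [3, (0xA0..0xBF)]
  when 0xE1..0xEC, 0xEE..0xEF then [3, CONTINUATION]
  when 0xED                   then [3, (0x80..0x9F)] # excludes surrogates
  when 0xF0                   then [4, (0x90..0xBF)]
  when 0xF1..0xF3             then [4, CONTINUATION]
  when 0xF4                   then [4, (0x80..0x8F)] # caps at U+10FFFF
  end
end

def decode_utf8_ignoring_invalid(bytes)
  code_points = []
  i = 0
  while i < bytes.size
    lead = bytes[i]
    if lead <= 0x7F # ASCII: one byte, one code point
      code_points << lead
      i += 1
      next
    end
    info = utf8_lead_info(lead)
    if info.nil? # a byte that can never start a sequence: drop it alone
      i += 1
      next
    end
    length, second_range = info
    cp = lead & (0xFF >> (length + 1)) # payload bits of the lead byte
    k = 1
    while k < length
      byte = bytes[i + k]
      range = k == 1 ? second_range : CONTINUATION
      break unless byte && range.cover?(byte)
      cp = (cp << 6) | (byte & 0x3F)
      k += 1
    end
    code_points << cp if k == length
    # On failure only bytes i...(i + k) are dropped; the byte at i + k is
    # not consumed and is examined again on the next iteration.
    i += k
  end
  code_points
end

p decode_utf8_ignoring_invalid("\x41\xC2\x3E\x42".bytes)     # prints [65, 62, 66]
p decode_utf8_ignoring_invalid("\x41\xC2\xC3\xB1\x42".bytes) # prints [65, 241, 66]
```

Both inputs are the ones used in the new tests. Packing the results with `pack('n*')` gives the same bytes as the expected UTF-16BE strings, which works here because every code point is in the BMP.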