ruby-changes:51873
From: duerst <ko1@a...>
Date: Sat, 28 Jul 2018 18:44:40 +0900 (JST)
Subject: [ruby-changes:51873] duerst:r64087 (trunk): fix range check for Hangul jamo trailers in Unicode normalization
duerst 2018-07-28 18:44:33 +0900 (Sat, 28 Jul 2018)

  New Revision: 64087

  https://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=revision&revision=64087

  Log:
    fix range check for Hangul jamo trailers in Unicode normalization

    * lib/unicode_normalize/normalize.rb: Fix the range check for trailing
      Hangul jamo characters in Unicode normalization. Different from
      leading or vowel jamos, where LBASE and VBASE are actual characters,
      a value equal to TBASE expresses the absence of a trailing jamo.
      This fix is technically correct, but there was no bug because
      the regular expressions in lib/unicode_normalize/tables.rb
      eliminate jamos equal to TBASE from normalization processing.

    * test/test_unicode_normalize.rb: Add preventive test
      test_no_trailing_jamo based on
      https://github.com/python/cpython/commit/d134809cd3764c6a634eab7bb8995e3e2eff14d5
      just for the case we ever get a regression.

    This closes issue #14934, thanks to MaLin (Lin Ma) for reporting.

  Modified files:
    trunk/lib/unicode_normalize/normalize.rb
    trunk/test/test_unicode_normalize.rb

Index: lib/unicode_normalize/normalize.rb
===================================================================
--- lib/unicode_normalize/normalize.rb	(revision 64086)
+++ lib/unicode_normalize/normalize.rb	(revision 64087)
@@ -70,7 +70,7 @@ module UnicodeNormalize # :nodoc:
     if length>1 and 0 <= (lead =string[0].ord-LBASE) and lead < LCOUNT and
                     0 <= (vowel=string[1].ord-VBASE) and vowel < VCOUNT
       lead_vowel = SBASE + (lead * VCOUNT + vowel) * TCOUNT
-      if length>2 and 0 <= (trail=string[2].ord-TBASE) and trail < TCOUNT
+      if length>2 and 0 < (trail=string[2].ord-TBASE) and trail < TCOUNT
         (lead_vowel + trail).chr(Encoding::UTF_8) + string[3..-1]
       else
         lead_vowel.chr(Encoding::UTF_8) + string[2..-1]
Index: test/test_unicode_normalize.rb
===================================================================
--- test/test_unicode_normalize.rb	(revision 64086)
+++ test/test_unicode_normalize.rb	(revision 64087)
@@ -167,6 +167,13 @@ class TestUnicodeNormalize
     assert_equal "\u1100\u1161\u11A8", "\uAC00\u11A8".unicode_normalize(:nfd)
   end
 
+  # preventive tests for (non-)bug #14934
+  def test_no_trailing_jamo
+    assert_equal "\u1100\u1176\u11a8", "\u1100\u1176\u11a8".unicode_normalize(:nfc)
+    assert_equal "\uae30\u11a7", "\u1100\u1175\u11a7".unicode_normalize(:nfc)
+    assert_equal "\uae30\u11c3", "\u1100\u1175\u11c3".unicode_normalize(:nfc)
+  end
+
   def test_hangul_plus_accents
     assert_equal "\uAC00\u0323\u0300", "\uAC00\u0300\u0323".unicode_normalize(:nfc)
     assert_equal "\uAC00\u0323\u0300", "\u1100\u1161\u0300\u0323".unicode_normalize(:nfc)

--
ML: ruby-changes@q...
Info: http://www.atdot.net/~ko1/quickml/
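
For readers who want to see why a value equal to TBASE must not be treated as a trailing jamo, here is a minimal standalone sketch of the composition arithmetic. It is not the code in lib/unicode_normalize/normalize.rb: the module name HangulSketch and the method compose_prefix are hypothetical, and the constants are the usual values from the Unicode Hangul composition algorithm, restated here as an assumption rather than copied from the library.

```ruby
# Illustrative sketch only; HangulSketch and compose_prefix are hypothetical
# names, not part of Ruby's unicode_normalize library.
module HangulSketch
  SBASE  = 0xAC00  # first precomposed Hangul syllable
  LBASE  = 0x1100  # first leading consonant jamo (a real character)
  VBASE  = 0x1161  # first vowel jamo (a real character)
  TBASE  = 0x11A7  # one BELOW the first trailing jamo; index 0 means "no trailer"
  LCOUNT = 19
  VCOUNT = 21
  TCOUNT = 28

  module_function

  # Compose a leading jamo + vowel jamo (+ optional trailing jamo) found at
  # the start of +string+ into one syllable; return +string+ unchanged if it
  # does not start with such a sequence.
  def compose_prefix(string)
    return string if string.length < 2

    lead  = string[0].ord - LBASE
    vowel = string[1].ord - VBASE
    return string unless (0...LCOUNT).cover?(lead) && (0...VCOUNT).cover?(vowel)

    lead_vowel = SBASE + (lead * VCOUNT + vowel) * TCOUNT
    trail = string.length > 2 ? string[2].ord - TBASE : 0

    # Strictly greater than 0: U+11A7 itself (trail == 0) is not a trailer,
    # so it must be left in place instead of being swallowed.
    if trail > 0 && trail < TCOUNT
      (lead_vowel + trail).chr(Encoding::UTF_8) + string[3..-1]
    else
      lead_vowel.chr(Encoding::UTF_8) + string[2..-1]
    end
  end
end

p HangulSketch.compose_prefix("\u1100\u1161\u11A8") == "\uAC01"        # => true
p HangulSketch.compose_prefix("\u1100\u1175\u11A7") == "\uAE30\u11A7"  # => true
p HangulSketch.compose_prefix("\u1100\u1175\u11C3") == "\uAE30\u11C3"  # => true
```

With the old `0 <=` comparison, the second case would compute lead_vowel + 0 and drop the U+11A7 character from the output. As the log notes, this did not happen in practice because the regular expressions in lib/unicode_normalize/tables.rb keep such sequences away from this code path.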