ruby-changes:25311
From: kou <ko1@a...>
Date: Sun, 28 Oct 2012 21:43:52 +0900 (JST)
Subject: [ruby-changes:25311] kou:r37363 (trunk): * lib/rexml/parsers/baseparser.rb: Fix a bug that UTF-8 is used
kou	2012-10-28 21:42:37 +0900 (Sun, 28 Oct 2012)

  New Revision: 37363

  http://svn.ruby-lang.org/cgi-bin/viewvc.cgi?view=rev&revision=37363

  Log:
    * lib/rexml/parsers/baseparser.rb: Fix a bug that UTF-8 is used
      for UTF-16XX encoded XML that doesn't have encoding="UTF-16" in
      XML declaration.
    * test/rexml/test_document.rb: Add tests for the above change.

  Modified files:
    trunk/ChangeLog
    trunk/lib/rexml/parsers/baseparser.rb
    trunk/test/rexml/test_document.rb
Index: ChangeLog
===================================================================
--- ChangeLog	(revision 37362)
+++ ChangeLog	(revision 37363)
@@ -1,3 +1,10 @@
+Sun Oct 28 21:40:13 2012  Kouhei Sutou  <kou@c...>
+
+	* lib/rexml/parsers/baseparser.rb: Fix a bug that UTF-8 is used
+	  for UTF-16XX encoded XML that doesn't have encoding="UTF-16" in
+	  XML declration.
+	* test/rexml/test_document.rb: Add tests for the above change.
+
 Sun Oct 28 21:37:34 2012  Kouhei Sutou  <kou@c...>
 
 	* test/rexml/test_document.rb: Group tests that they parse
Index: lib/rexml/parsers/baseparser.rb
===================================================================
--- lib/rexml/parsers/baseparser.rb	(revision 37362)
+++ lib/rexml/parsers/baseparser.rb	(revision 37363)
@@ -215,6 +215,9 @@
             if need_source_encoding_update?(encoding)
               @source.encoding = encoding
             end
+            if encoding.nil? and /\AUTF-16(?:BE|LE)\z/i =~ @source.encoding
+              encoding = "UTF-16"
+            end
             standalone = STANDALONE.match(results)
             standalone = standalone[1] unless standalone.nil?
             return [ :xmldecl, version, encoding, standalone ]
Index: test/rexml/test_document.rb
===================================================================
--- test/rexml/test_document.rb	(revision 37362)
+++ test/rexml/test_document.rb	(revision 37363)
@@ -246,5 +246,27 @@
         assert_equal("UTF-16", document.encoding)
       end
     end
+
+    class NoEncodingTest < self
+      def test_utf_16le
+        xml = <<-EOX.encode("UTF-16LE").force_encoding("ASCII-8BIT")
+<?xml version="1.0"?>
+<message>Hello world!</message>
+EOX
+        bom = "\ufeff".encode("UTF-16LE").force_encoding("ASCII-8BIT")
+        document = REXML::Document.new(bom + xml)
+        assert_equal("UTF-16", document.encoding)
+      end
+
+      def test_utf_16be
+        xml = <<-EOX.encode("UTF-16BE").force_encoding("ASCII-8BIT")
+<?xml version="1.0"?>
+<message>Hello world!</message>
+EOX
+        bom = "\ufeff".encode("UTF-16BE").force_encoding("ASCII-8BIT")
+        document = REXML::Document.new(bom + xml)
+        assert_equal("UTF-16", document.encoding)
+      end
+    end
   end
 end

--
ML: ruby-changes@q...
Info: http://www.atdot.net/~ko1/quickml/
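
Note on the baseparser.rb change in r37363: the fallback it adds can be
sketched in isolation as below. This is only an illustration of the
condition in the patch, not REXML code. The helper name
effective_xml_encoding and the constant UTF16_WITH_ENDIAN are hypothetical,
and the source encoding is assumed to be available as a name such as
"UTF-16LE" (or as an Encoding object).

```ruby
# Hypothetical helper, not part of REXML: mirrors the fallback from r37363.
# Matches only the endian-specific names, case-insensitively.
UTF16_WITH_ENDIAN = /\AUTF-16(?:BE|LE)\z/i

# declared:        encoding from the XML declaration, or nil if absent
# source_encoding: encoding detected for the input (e.g. from the BOM),
#                  given as a String or an Encoding object
def effective_xml_encoding(declared, source_encoding)
  # Same condition as the patch: fall back only when the XML declaration
  # has no encoding and the input was detected as UTF-16BE or UTF-16LE.
  if declared.nil? && UTF16_WITH_ENDIAN =~ source_encoding.to_s
    "UTF-16"
  else
    declared
  end
end

p effective_xml_encoding(nil, "UTF-16LE")
p effective_xml_encoding(nil, Encoding::UTF_16BE)
p effective_xml_encoding("Shift_JIS", "UTF-16LE")
p effective_xml_encoding(nil, "UTF-8")
```

An encoding given explicitly in the XML declaration is left untouched. Only
the missing-encoding case is mapped to "UTF-16", which is the value the new
tests expect from document.encoding.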