Commit 8d3e090

Extend GB18030 encoding conversion to cover full Unicode range.
Our previous code for GB18030 <-> UTF8 conversion only covered Unicode codepoints up to U+FFFF, but the actual spec defines conversions for all codepoints up to U+10FFFF. That would be rather impractical as a lookup table, but fortunately there is a simple algorithmic conversion between the additional code points and the equivalent GB18030 byte patterns. Make use of the just-added callback facility in LocalToUtf/UtfToLocal to perform the additional conversions.

Having created the infrastructure to do that, we can use the same code to map certain linearly-related subranges of the Unicode space below U+FFFF, allowing removal of the corresponding lookup table entries. This more than halves the lookup table size, which is a substantial savings; utf8_and_gb18030.so drops from nearly a megabyte to about half that.

In support of doing that, replace ISO10646-GB18030.TXT with the data file gb-18030-2000.xml (retrieved from http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/ ) in which these subranges have been deleted from the simple lookup entries.

Per bug #12845 from Arjen Nienhuis. The conversion code added here is based on his proposed patch, though I whacked it around rather heavily.
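
For illustration only, the algorithmic conversion for codepoints above U+FFFF can be sketched as below. This is not the code from this commit: the function names, the GB_SUPP_BASE macro, and the convention of returning 0 for out-of-range input are all assumptions made for the example. It relies on the GB18030 layout in which four-byte codes (first and third bytes 0x81..0xFE, second and fourth bytes 0x30..0x39) form a linear sequence, with U+10000 at 0x90308130 and U+10FFFF at 0xE3329A35.

```c
#include <assert.h>
#include <stdint.h>

/* Linear index of 0x90308130, the four-byte code for U+10000 (hypothetical name). */
#define GB_SUPP_BASE 189000u

/* Convert a four-byte GB18030 code, packed big-endian, to its linear index. */
static uint32_t
gb4_to_linear(uint32_t gb)
{
    uint32_t b1 = (gb >> 24) & 0xFF;
    uint32_t b2 = (gb >> 16) & 0xFF;
    uint32_t b3 = (gb >> 8) & 0xFF;
    uint32_t b4 = gb & 0xFF;

    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10
        + (b4 - 0x30);
}

/* Inverse of gb4_to_linear: rebuild the four bytes from a linear index. */
static uint32_t
linear_to_gb4(uint32_t lin)
{
    uint32_t b1, b2, b3, b4;

    assert(lin < 126u * 10u * 126u * 10u);
    b4 = 0x30 + lin % 10;
    lin /= 10;
    b3 = 0x81 + lin % 126;
    lin /= 126;
    b2 = 0x30 + lin % 10;
    lin /= 10;
    b1 = 0x81 + lin;
    return (b1 << 24) | (b2 << 16) | (b3 << 8) | b4;
}

/* Map U+10000..U+10FFFF to a GB18030 four-byte code; 0 if out of range. */
uint32_t
supp_cp_to_gb18030(uint32_t cp)
{
    if (cp < 0x10000 || cp > 0x10FFFF)
        return 0;
    return linear_to_gb4(GB_SUPP_BASE + (cp - 0x10000));
}

/* Map a GB18030 four-byte code back to U+10000..U+10FFFF; 0 if not in that range. */
uint32_t
gb18030_to_supp_cp(uint32_t gb)
{
    uint32_t b2 = (gb >> 16) & 0xFF;
    uint32_t b3 = (gb >> 8) & 0xFF;
    uint32_t b4 = gb & 0xFF;

    /* Reject codes outside the supplementary block or with invalid bytes. */
    if (gb < 0x90308130u || gb > 0xE3329A35u)
        return 0;
    if (b2 < 0x30 || b2 > 0x39 || b4 < 0x30 || b4 > 0x39 ||
        b3 < 0x81 || b3 > 0xFE)
        return 0;
    return 0x10000 + (gb4_to_linear(gb) - GB_SUPP_BASE);
}
```

The same linear-index arithmetic, with a different base offset per range, would presumably serve for the linearly-related subranges below U+FFFF mentioned above.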
1 parent 92edba2 commit 8d3e090


7 files changed, +31111 -128805 lines changed
