Commit 4856618

Generate EUC_CN mappings from gb18030-2022.ucm

In the wake of cfa6cd2, EUC_CN was the only encoding that used
gb-18030-2000.xml to generate the .map files. Since EUC_CN is a subset
of GB18030, we can easily use the same UCM file. This allows deleting
the XML file from our repository.

Author: Chao Li <lic@highgo.com>
Discussion: https://postgr.es/m/CANWCAZaNRXZ-5NuXmsaMA2mKvMZnCGHZqQusLkpE%2B8YX%2Bi5OYg%40mail.gmail.com

1 parent 684a745, commit 4856618
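
The commit message relies on EUC_CN being a subset of GB18030, which means a generator reading the shared UCM file has to discard every code outside EUC_CN. The hunks shown on this page do not include that step, so the following is only a minimal sketch. It assumes nothing beyond the general EUC property that both bytes of a two-byte code lie in 0xA1..0xFE; the helper name `in_euc_two_byte_range` is hypothetical, and the real script's filter may well be narrower.

```perl
use strict;
use warnings FATAL => 'all';

# Hypothetical helper (not taken from the commit): true if $code is a
# two-byte value whose bytes both fall in the EUC range 0xA1..0xFE.
sub in_euc_two_byte_range
{
	my ($code) = @_;

	# Single-byte and four-byte GB18030 codes cannot be EUC two-byte codes
	return 0 if $code < 0x100 || $code > 0xFFFF;

	my ($hi, $lo) = ($code >> 8, $code & 0xFF);
	return ($hi >= 0xA1 && $hi <= 0xFE && $lo >= 0xA1 && $lo <= 0xFE) ? 1 : 0;
}

# One two-byte code inside the range, one outside it, and one four-byte code
for my $code (0xD2BB, 0x8140, 0x81308336)
{
	printf("0x%X: %s\n", $code, in_euc_two_byte_range($code) ? "keep" : "skip");
}
```

A range test like this is cheap to run on every mapping line, which is why sharing one large source file between GB18030 and EUC_CN costs little.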

3 files changed: +23 -30929 lines

src/backend/utils/mb/Unicode/Makefile

Lines changed: 2 additions & 2 deletions
@@ -50,7 +50,7 @@ $(eval $(call map_rule,gbk,UCS_to_most.pl,CP936.TXT,GBK))
 $(eval $(call map_rule,johab,UCS_to_JOHAB.pl,JOHAB.TXT))
 $(eval $(call map_rule,uhc,UCS_to_UHC.pl,windows-949-2000.xml))
 $(eval $(call map_rule,euc_jp,UCS_to_EUC_JP.pl,CP932.TXT JIS0212.TXT))
-$(eval $(call map_rule,euc_cn,UCS_to_EUC_CN.pl,gb-18030-2000.xml))
+$(eval $(call map_rule,euc_cn,UCS_to_EUC_CN.pl,gb18030-2022.ucm))
 $(eval $(call map_rule,euc_kr,UCS_to_EUC_KR.pl,KSX1001.TXT))
 $(eval $(call map_rule,euc_tw,UCS_to_EUC_TW.pl,CNS11643.TXT))
 $(eval $(call map_rule,sjis,UCS_to_SJIS.pl,CP932.TXT))
@@ -75,7 +75,7 @@ BIG5.TXT CNS11643.TXT:
 euc-jis-2004-std.txt sjis-0213-2004-std.txt:
 	$(DOWNLOAD) http://x0213.org/codetable/$(@F)
 
-gb-18030-2000.xml windows-949-2000.xml:
+windows-949-2000.xml:
 	$(DOWNLOAD) https://raw.githubusercontent.com/unicode-org/icu-data/master/charset/data/xml/$(@F)
 
 gb18030-2022.ucm:

src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl

Lines changed: 21 additions & 11 deletions
@@ -2,16 +2,17 @@
 #
 # Copyright (c) 2007-2025, PostgreSQL Global Development Group
 #
-# src/backend/utils/mb/Unicode/UCS_to_GB18030.pl
+# src/backend/utils/mb/Unicode/UCS_to_EUC_CN.pl
 #
-# Generate UTF-8 <--> GB18030 code conversion tables from
-# "gb-18030-2000.xml", obtained from
-# http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/
+# Generate UTF-8 <--> EUC_CN code conversion tables from
+# "gb18030-2022.ucm", obtained from
+# https://github.com/unicode-org/icu/blob/main/icu4c/source/data/mappings/
 #
 # The lines we care about in the source file look like
-#    <a u="009A" b="81 30 83 36"/>
-# where the "u" field is the Unicode code point in hex,
-# and the "b" field is the hex byte sequence for GB18030
+#    <UXXXX> \xYY[\xYY...] |n
+# where XXXX is the Unicode code point in hex,
+# and the \xYY... is the hex byte sequence for GB18030,
+# and n is a flag indicating the type of mapping.
 
 use strict;
 use warnings FATAL => 'all';
@@ -22,17 +23,26 @@
 
 # Read the input
 
-my $in_file = "gb-18030-2000.xml";
+my $in_file = "gb18030-2022.ucm";
 
 open(my $in, '<', $in_file) || die("cannot open $in_file");
 
 my @mapping;
 
 while (<$in>)
 {
-	next if (!m/<a u="([0-9A-F]+)" b="([0-9A-F ]+)"/);
-	my ($u, $c) = ($1, $2);
-	$c =~ s/ //g;
+	# Mappings may have been removed by commenting out
+	next if /^#/;
+
+	next if !/^<U([0-9A-Fa-f]+)>\s+
+		((?:\\x[0-9A-Fa-f]{2})+)\s+
+		\|(\d+)/x;
+	my ($u, $c, $flag) = ($1, $2, $3);
+	$c =~ s/\\x//g;
+
+	# We only want round-trip mappings
+	next if ($flag ne '0');
+
 	my $ucs = hex($u);
 	my $code = hex($c);
 
