Commit 8d32717

Avoid doing encoding conversions by double-conversion via MULE_INTERNAL.
Previously, we did many conversions for Cyrillic and Central European single-byte encodings by converting to a related MULE_INTERNAL coding scheme before converting to the destination. This seems unnecessarily inefficient. Moreover, if the conversion encounters an untranslatable character, the error message will confusingly complain about failure to convert to or from MULE_INTERNAL, rather than the user-visible encodings. Worse still, this approach results in some completely unnecessary conversion failures; there are cases where the chosen MULE subset lacks characters that exist in both of the user-visible encodings, causing a conversion failure that need not occur.

This patch fixes the first two of those deficiencies by introducing a new local2local() conversion support subroutine for direct conversion between any two single-byte character sets, and adding new conversion tables where needed. However, I generated the new conversion tables by testing PG 9.5's behavior, so that the actual conversion behavior is bug-compatible with previous releases; the only user-visible behavior change is that the error messages for conversion failures are saner. Changes in the conversion behavior will probably ensue after discussion.

Interestingly, although this approach requires more tables, the .so files actually end up smaller (at least on my x86_64 machine); the tables are smaller than the management code needed for double conversion.

Per a complaint from Albe Laurenz.
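One way to picture why a single table can reproduce the old two-step behavior: chaining a source-to-intermediate table with an intermediate-to-destination table yields one direct table with the same results. The sketch below is illustrative only and is not how the commit's tables were produced (the message says they came from testing PG 9.5's behavior). The name `compose_tables` is hypothetical, and the intermediate code is modeled as a single high-bit byte, which simplifies the real MULE_INTERNAL representation.

```c
#include <assert.h>
#include <stddef.h>

#define HIGHBIT 0x80

/*
 * Hypothetical helper (not part of the commit): build a direct
 * source -> destination table by chaining two 128-entry tables.
 * Each table is indexed by (byte - 0x80); an entry of 0 means
 * "no equivalent code".
 */
void
compose_tables(const unsigned char *src_to_mid,
               const unsigned char *mid_to_dest,
               unsigned char *direct)
{
    assert(src_to_mid != NULL && mid_to_dest != NULL && direct != NULL);

    for (size_t i = 0; i < 128; i++)
    {
        unsigned char mid = src_to_mid[i];

        /*
         * Assumption for this sketch: any intermediate value below 0x80
         * (including 0) is treated as untranslatable, because the tables
         * only cover high-bit bytes.
         */
        if (mid < HIGHBIT)
            direct[i] = 0;
        else
            direct[i] = mid_to_dest[mid - HIGHBIT];
    }
}
```

A byte that fails at either step ends up as 0 in the direct table, so the composed table keeps the failures of the two-step path. That matches the "bug-compatible" behavior the message describes.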
1 parent 5afdfc9 commit 8d32717

5 files changed: +376 -411 lines changed


src/backend/utils/mb/conv.c

Lines changed: 50 additions & 5 deletions
@@ -14,6 +14,51 @@
 #include "mb/pg_wchar.h"
 
 
+/*
+ * local2local: a generic single byte charset encoding
+ * conversion between two ASCII-superset encodings.
+ *
+ * l points to the source string of length len
+ * p is the output area (must be large enough!)
+ * src_encoding is the PG identifier for the source encoding
+ * dest_encoding is the PG identifier for the target encoding
+ * tab holds conversion entries for the source charset
+ * starting from 128 (0x80). each entry in the table holds the corresponding
+ * code point for the target charset, or 0 if there is no equivalent code.
+ */
+void
+local2local(const unsigned char *l,
+			unsigned char *p,
+			int len,
+			int src_encoding,
+			int dest_encoding,
+			const unsigned char *tab)
+{
+	unsigned char c1,
+				c2;
+
+	while (len > 0)
+	{
+		c1 = *l;
+		if (c1 == 0)
+			report_invalid_encoding(src_encoding, (const char *) l, len);
+		if (!IS_HIGHBIT_SET(c1))
+			*p++ = c1;
+		else
+		{
+			c2 = tab[c1 - HIGHBIT];
+			if (c2)
+				*p++ = c2;
+			else
+				report_untranslatable_char(src_encoding, dest_encoding,
+										   (const char *) l, len);
+		}
+		l++;
+		len--;
+	}
+	*p = '\0';
+}
+
 /*
  * LATINn ---> MIC when the charset's local codes map directly to MIC
  *
@@ -141,8 +186,8 @@ pg_mic2ascii(const unsigned char *mic, unsigned char *p, int len)
  * lc is the mule character set id for the local encoding
  * encoding is the PG identifier for the local encoding
  * tab holds conversion entries for the local charset
- * starting from 128 (0x80). each entry in the table
- * holds the corresponding code point for the mule internal code.
+ * starting from 128 (0x80). each entry in the table holds the corresponding
+ * code point for the mule encoding, or 0 if there is no equivalent code.
  */
 void
 latin2mic_with_table(const unsigned char *l,
@@ -188,9 +233,9 @@ latin2mic_with_table(const unsigned char *l,
  * p is the output area (must be large enough!)
  * lc is the mule character set id for the local encoding
  * encoding is the PG identifier for the local encoding
- * tab holds conversion entries for the mule internal code's
- * second byte, starting from 128 (0x80). each entry in the table
- * holds the corresponding code point for the local charset.
+ * tab holds conversion entries for the mule internal code's second byte,
+ * starting from 128 (0x80). each entry in the table holds the corresponding
+ * code point for the local charset, or 0 if there is no equivalent code.
  */
 void
 mic2latin_with_table(const unsigned char *mic,
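
To try the table-lookup loop of local2local() outside the backend, a standalone sketch along the following lines may help. It is not the committed function. The name `local2local_sketch` and its return convention are assumptions made for illustration: report_invalid_encoding() and report_untranslatable_char() need the PostgreSQL backend, so this variant returns the offset of the offending byte instead of raising an error.

```c
#include <assert.h>
#include <stddef.h>

#define HIGHBIT 0x80
#define IS_HIGHBIT_SET(ch) ((unsigned char) (ch) & HIGHBIT)

/*
 * Standalone variant of the local2local loop (hypothetical name).
 * Returns -1 if the whole input was converted, otherwise the offset
 * of the first byte that is a NUL or has no entry in tab.
 * The output area p must hold at least len + 1 bytes.
 */
int
local2local_sketch(const unsigned char *l, unsigned char *p, int len,
                   const unsigned char *tab)
{
    assert(l != NULL && p != NULL && tab != NULL);

    for (int i = 0; i < len; i++)
    {
        unsigned char c1 = l[i];

        if (c1 == 0)
            return i;           /* embedded NUL: invalid input */
        if (!IS_HIGHBIT_SET(c1))
            *p++ = c1;          /* ASCII passes through unchanged */
        else
        {
            /* table covers bytes 0x80..0xFF, indexed from 0 */
            unsigned char c2 = tab[c1 - HIGHBIT];

            if (c2 == 0)
                return i;       /* no equivalent in the destination */
            *p++ = c2;
        }
    }
    *p = '\0';
    return -1;
}
```

On an early return the output is not NUL-terminated, so a caller should treat the buffer contents as undefined in that case. The committed function never reaches that state, assuming the two report functions do not return.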
