Commit 5f538ad

Browse files
committed
Renovate display of non-ASCII messages on Windows.
GNU gettext selects a default encoding for the messages it emits in a platform-specific manner; it uses the Windows ANSI code page on Windows and follows LC_CTYPE on other platforms. This is inconvenient for PostgreSQL server processes, so realize consistent cross-platform behavior by calling bind_textdomain_codeset() on Windows each time we permanently change LC_CTYPE. This primarily affects SQL_ASCII databases and processes like the postmaster that do not attach to a database, making their behavior consistent with PostgreSQL on non-Windows platforms. Messages from SQL_ASCII databases use the encoding implied by the database LC_CTYPE, and messages from non-database processes use LC_CTYPE from the postmaster system environment. PlatformEncoding becomes unused, so remove it.

Make write_console() prefer WriteConsoleW() to write() regardless of the encodings in use. In this situation, write() will invariably mishandle non-ASCII characters.

elog.c has assumed that messages conform to the database encoding. While usually true, this does not hold for SQL_ASCII and MULE_INTERNAL. Introduce MessageEncoding to track the actual encoding of message text. The present consumers are Windows-specific code for converting messages to UTF16 for use in system interfaces. This fixes the appearance in Windows event logs and consoles of translated messages from SQL_ASCII processes like the postmaster. Note that SQL_ASCII inherently disclaims a strong notion of encoding, so non-ASCII byte sequences interpolated into messages by %s may yet yield a nonsensical message. MULE_INTERNAL has similar problems at present, albeit for a different reason: its lack of libiconv support or a conversion to UTF8.

Consequently, one need no longer restart Windows with a different Windows ANSI code page to broadly test backend logging under a given language. Changing the user's locale ("Format") is enough. Several accounts can simultaneously run postmasters under different locales, all correctly logging localized messages to Windows event logs and consoles.

Alexander Law and Noah Misch
1 parent 2c1031b, commit 5f538ad


9 files changed (+210 additions, -65 deletions)


src/backend/main/main.c

Lines changed: 4 additions & 0 deletions

```diff
@@ -265,6 +265,10 @@ startup_hacks(const char *progname)
 /*
  * Help display should match the options accepted by PostmasterMain()
  * and PostgresMain().
+ *
+ * XXX On Windows, non-ASCII localizations of these messages only display
+ * correctly if the console output code page covers the necessary characters.
+ * Messages emitted in write_console() do not exhibit this problem.
  */
 static void
 help(const char *progname)
```

src/backend/utils/adt/pg_locale.c

Lines changed: 26 additions & 8 deletions

```diff
@@ -131,14 +131,16 @@ static char *IsoLocaleName(const char *); /* MSVC specific */
 /*
  * pg_perm_setlocale
  *
- * This is identical to the libc function setlocale(), with the addition
- * that if the operation is successful, the corresponding LC_XXX environment
- * variable is set to match. By setting the environment variable, we ensure
- * that any subsequent use of setlocale(..., "") will preserve the settings
- * made through this routine. Of course, LC_ALL must also be unset to fully
- * ensure that, but that has to be done elsewhere after all the individual
- * LC_XXX variables have been set correctly. (Thank you Perl for making this
- * kluge necessary.)
+ * This wraps the libc function setlocale(), with two additions. First, when
+ * changing LC_CTYPE, update gettext's encoding for the current message
+ * domain. GNU gettext automatically tracks LC_CTYPE on most platforms, but
+ * not on Windows. Second, if the operation is successful, the corresponding
+ * LC_XXX environment variable is set to match. By setting the environment
+ * variable, we ensure that any subsequent use of setlocale(..., "") will
+ * preserve the settings made through this routine. Of course, LC_ALL must
+ * also be unset to fully ensure that, but that has to be done elsewhere after
+ * all the individual LC_XXX variables have been set correctly. (Thank you
+ * Perl for making this kluge necessary.)
  */
 char *
 pg_perm_setlocale(int category, const char *locale)
@@ -172,6 +174,22 @@ pg_perm_setlocale(int category, const char *locale)
 	if (result == NULL)
 		return result;			/* fall out immediately on failure */
 
+	/*
+	 * Use the right encoding in translated messages. Under ENABLE_NLS, let
+	 * pg_bind_textdomain_codeset() figure it out. Under !ENABLE_NLS, message
+	 * format strings are ASCII, but database-encoding strings may enter the
+	 * message via %s. This makes the overall message encoding equal to the
+	 * database encoding.
+	 */
+	if (category == LC_CTYPE)
+	{
+#ifdef ENABLE_NLS
+		SetMessageEncoding(pg_bind_textdomain_codeset(textdomain(NULL)));
+#else
+		SetMessageEncoding(GetDatabaseEncoding());
+#endif
+	}
+
 	switch (category)
 	{
 		case LC_COLLATE:
```
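The idea behind the new LC_CTYPE hook (re-derive the message encoding every time LC_CTYPE is permanently changed) can be illustrated with a small POSIX-only sketch. It is not PostgreSQL code: `perm_setlocale_sketch()` and the `message_codeset` buffer are hypothetical names, `nl_langinfo(CODESET)` stands in for the codeset lookup, and the sketch only records the codeset name instead of calling gettext or `SetMessageEncoding()`.

```c
#include <assert.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>

/* Hypothetical stand-in for MessageEncoding: the codeset name we believe
 * translated messages currently use. Empty until LC_CTYPE is first set. */
static char message_codeset[64] = "";

/*
 * Sketch: wrap setlocale() and refresh the recorded message codeset only
 * when LC_CTYPE changes, mirroring the shape of the new pg_perm_setlocale().
 */
static char *
perm_setlocale_sketch(int category, const char *locale)
{
    char *result = setlocale(category, locale);

    if (result == NULL)
        return NULL;            /* fall out immediately on failure */

    if (category == LC_CTYPE)
    {
        /* nl_langinfo(CODESET) reports the codeset implied by LC_CTYPE. */
        snprintf(message_codeset, sizeof(message_codeset), "%s",
                 nl_langinfo(CODESET));
    }
    return result;
}
```

Changing LC_NUMERIC leaves `message_codeset` untouched, while changing LC_CTYPE (even to "C") fills it in; the exact name reported for "C" varies by C library.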

src/backend/utils/error/elog.c

Lines changed: 37 additions & 10 deletions

```diff
@@ -1813,6 +1813,22 @@ write_syslog(int level, const char *line)
 #endif   /* HAVE_SYSLOG */
 
 #ifdef WIN32
+/*
+ * Get the PostgreSQL equivalent of the Windows ANSI code page. "ANSI" system
+ * interfaces (e.g. CreateFileA()) expect string arguments in this encoding.
+ * Every process in a given system will find the same value at all times.
+ */
+static int
+GetACPEncoding(void)
+{
+	static int	encoding = -2;
+
+	if (encoding == -2)
+		encoding = pg_codepage_to_encoding(GetACP());
+
+	return encoding;
+}
+
 /*
  * Write a message line to the windows event log
  */
```
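GetACPEncoding() computes its answer once and caches it in a function-local static, using -2 as a "not computed yet" sentinel. Presumably -2 is chosen because a failed code-page lookup can itself yield -1 (an assumption here). A portable sketch of the same caching pattern follows; `fake_get_acp()`, `codepage_to_encoding()` and the `ENC_*` values are hypothetical stand-ins for GetACP(), pg_codepage_to_encoding() and PostgreSQL's encoding IDs.

```c
#include <assert.h>

/* Hypothetical encoding IDs; the values are illustrative only. */
enum { ENC_UTF8 = 1, ENC_WIN1252 = 2 };

static int acp_lookups = 0;     /* how many times the "system" was asked */

/* Hypothetical stand-in for GetACP(): pretend the system uses code page 1252. */
static unsigned
fake_get_acp(void)
{
    acp_lookups++;
    return 1252;
}

/* Hypothetical stand-in for pg_codepage_to_encoding(); -1 means unknown. */
static int
codepage_to_encoding(unsigned cp)
{
    switch (cp)
    {
        case 1252:
            return ENC_WIN1252;
        case 65001:
            return ENC_UTF8;
        default:
            return -1;
    }
}

/* Same shape as GetACPEncoding(): safe to cache because the value never
 * changes during the life of the process. */
static int
get_acp_encoding(void)
{
    static int encoding = -2;

    if (encoding == -2)
        encoding = codepage_to_encoding(fake_get_acp());
    return encoding;
}
```

Repeated calls return the cached value without consulting the system again.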
```diff
@@ -1858,16 +1874,18 @@ write_eventlog(int level, const char *line, int len)
 	}
 
 	/*
-	 * Convert message to UTF16 text and write it with ReportEventW, but
-	 * fall-back into ReportEventA if conversion failed.
+	 * If message character encoding matches the encoding expected by
+	 * ReportEventA(), call it to avoid the hazards of conversion. Otherwise,
+	 * try to convert the message to UTF16 and write it with ReportEventW().
+	 * Fall back on ReportEventA() if conversion failed.
 	 *
 	 * Also verify that we are not on our way into error recursion trouble due
-	 * to error messages thrown deep inside pgwin32_toUTF16().
+	 * to error messages thrown deep inside pgwin32_message_to_UTF16().
 	 */
-	if (GetDatabaseEncoding() != GetPlatformEncoding() &&
-		!in_error_recursion_trouble())
+	if (!in_error_recursion_trouble() &&
+		GetMessageEncoding() != GetACPEncoding())
 	{
-		utf16 = pgwin32_toUTF16(line, len, NULL);
+		utf16 = pgwin32_message_to_UTF16(line, len, NULL);
 		if (utf16)
 		{
 			ReportEventW(evtHandle,
@@ -1879,6 +1897,7 @@ write_eventlog(int level, const char *line, int len)
 						 0,
 						 (LPCWSTR *) &utf16,
 						 NULL);
+			/* XXX Try ReportEventA() when ReportEventW() fails? */
 
 			pfree(utf16);
 			return;
```
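The new write_eventlog() path selection can be summarized as a pure decision function. This is a sketch, not PostgreSQL code: `choose_eventlog_path()` and its boolean inputs are hypothetical, the encoding arguments are arbitrary integers, and it does not model a failure of ReportEventW() itself (the XXX comment notes that case is not retried).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the two Windows event log APIs. */
typedef enum { USE_REPORT_EVENT_A, USE_REPORT_EVENT_W } EventLogPath;

/*
 * Mirrors the condition in the new write_eventlog(): only convert to UTF16
 * when the message encoding differs from the ANSI code page encoding and we
 * are not in error recursion trouble; otherwise, or if conversion fails,
 * use ReportEventA().
 */
static EventLogPath
choose_eventlog_path(bool in_error_recursion, int message_encoding,
                     int acp_encoding, bool utf16_conversion_ok)
{
    if (!in_error_recursion && message_encoding != acp_encoding)
    {
        if (utf16_conversion_ok)
            return USE_REPORT_EVENT_W;
        /* conversion failed: fall back on ReportEventA() */
    }
    return USE_REPORT_EVENT_A;
}
```

When the message encoding already matches what ReportEventA() expects, no conversion is attempted at all.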
```diff
@@ -1904,22 +1923,30 @@ write_console(const char *line, int len)
 #ifdef WIN32
 
 	/*
-	 * WriteConsoleW() will fail if stdout is redirected, so just fall through
+	 * Try to convert the message to UTF16 and write it with WriteConsoleW().
+	 * Fall back on write() if anything fails.
+	 *
+	 * In contrast to write_eventlog(), don't skip straight to write() based
+	 * on the applicable encodings. Unlike WriteConsoleW(), write() depends
+	 * on the suitability of the console output code page. Since we put
+	 * stderr into binary mode in SubPostmasterMain(), write() skips the
+	 * necessary translation anyway.
+	 *
+	 * WriteConsoleW() will fail if stderr is redirected, so just fall through
 	 * to writing unconverted to the logfile in this case.
 	 *
 	 * Since we palloc the structure required for conversion, also fall
 	 * through to writing unconverted if we have not yet set up
 	 * CurrentMemoryContext.
 	 */
-	if (GetDatabaseEncoding() != GetPlatformEncoding() &&
-		!in_error_recursion_trouble() &&
+	if (!in_error_recursion_trouble() &&
 		!redirection_done &&
 		CurrentMemoryContext != NULL)
 	{
 		WCHAR	   *utf16;
 		int			utf16len;
 
-		utf16 = pgwin32_toUTF16(line, len, &utf16len);
+		utf16 = pgwin32_message_to_UTF16(line, len, &utf16len);
 		if (utf16 != NULL)
 		{
 			HANDLE		stdHandle;
```
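The fallback order in the new write_console() can be sketched portably. Everything here is hypothetical: `try_console_route()` stands in for "convert to UTF16, then WriteConsoleW()", `fd_route()` stands in for write(), and the counters exist only to make the control flow observable. The point it illustrates is that no encoding comparison gates the WriteConsoleW() attempt any more.

```c
#include <assert.h>
#include <stdbool.h>

static int  console_calls = 0;      /* attempts at the WriteConsoleW() route */
static int  fd_calls = 0;           /* uses of the write() fallback */
static bool console_succeeds = true;

/* Hypothetical stand-in for UTF16 conversion plus WriteConsoleW(). */
static bool
try_console_route(const char *line, int len)
{
    (void) line;
    (void) len;
    console_calls++;
    return console_succeeds;
}

/* Hypothetical stand-in for write(fileno(stderr), line, len). */
static void
fd_route(const char *line, int len)
{
    (void) line;
    (void) len;
    fd_calls++;
}

/* Same control flow as the new write_console(): prefer the console API,
 * fall back on write() if it is unusable or anything fails. */
static void
write_console_sketch(const char *line, int len, bool recursion_trouble,
                     bool redirection_done, bool have_memory_context)
{
    if (!recursion_trouble && !redirection_done && have_memory_context)
    {
        if (try_console_route(line, len))
            return;
        /* conversion or WriteConsoleW() failed: fall through */
    }
    fd_route(line, len);
}
```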

src/backend/utils/init/postinit.c

Lines changed: 0 additions & 5 deletions

```diff
@@ -357,11 +357,6 @@ CheckMyDatabase(const char *name, bool am_superuser)
 	SetConfigOption("lc_collate", collate, PGC_INTERNAL, PGC_S_OVERRIDE);
 	SetConfigOption("lc_ctype", ctype, PGC_INTERNAL, PGC_S_OVERRIDE);
 
-	/* Use the right encoding in translated messages */
-#ifdef ENABLE_NLS
-	pg_bind_textdomain_codeset(textdomain(NULL));
-#endif
-
 	ReleaseSysCache(tup);
 }
 
```
src/backend/utils/mb/encnames.c

Lines changed: 10 additions & 0 deletions

```diff
@@ -352,10 +352,13 @@ pg_enc2name pg_enc2name_tbl[] =
 
 /* ----------
  * These are encoding names for gettext.
+ *
+ * This covers all encodings except MULE_INTERNAL, which is alien to gettext.
  * ----------
  */
 pg_enc2gettext pg_enc2gettext_tbl[] =
 {
+	{PG_SQL_ASCII, "US-ASCII"},
 	{PG_UTF8, "UTF-8"},
 	{PG_LATIN1, "LATIN1"},
 	{PG_LATIN2, "LATIN2"},
@@ -389,6 +392,13 @@ pg_enc2gettext pg_enc2gettext_tbl[] =
 	{PG_EUC_KR, "EUC-KR"},
 	{PG_EUC_TW, "EUC-TW"},
 	{PG_EUC_JIS_2004, "EUC-JP"},
+	{PG_SJIS, "SHIFT-JIS"},
+	{PG_BIG5, "BIG5"},
+	{PG_GBK, "GBK"},
+	{PG_UHC, "UHC"},
+	{PG_GB18030, "GB18030"},
+	{PG_JOHAB, "JOHAB"},
+	{PG_SHIFT_JIS_2004, "SHIFT_JISX0213"},
 	{0, NULL}
 };
 
```
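pg_enc2gettext_tbl is a sentinel-terminated lookup table, and an encoding with no entry (MULE_INTERNAL) simply yields no gettext codeset. The sketch below shows that lookup in isolation, using a hypothetical trimmed-down table and hypothetical `ENC_*` identifiers rather than PostgreSQL's real headers; `gettext_codeset_for()` is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of encoding IDs; the values are illustrative. */
typedef enum
{
    ENC_SQL_ASCII, ENC_UTF8, ENC_LATIN1, ENC_MULE_INTERNAL, ENC_SJIS
} Enc;

typedef struct
{
    Enc         encoding;
    const char *name;           /* codeset name understood by gettext */
} Enc2Gettext;

/* Same shape as pg_enc2gettext_tbl: MULE_INTERNAL deliberately absent,
 * and the loop stops at the {0, NULL} sentinel by checking name. */
static const Enc2Gettext enc2gettext_tbl[] = {
    {ENC_SQL_ASCII, "US-ASCII"},
    {ENC_UTF8, "UTF-8"},
    {ENC_LATIN1, "LATIN1"},
    {ENC_SJIS, "SHIFT-JIS"},
    {0, NULL}
};

/* Returns the gettext codeset for an encoding, or NULL if there is none. */
static const char *
gettext_codeset_for(Enc encoding)
{
    for (int i = 0; enc2gettext_tbl[i].name != NULL; i++)
        if (enc2gettext_tbl[i].encoding == encoding)
            return enc2gettext_tbl[i].name;
    return NULL;
}
```

A NULL result corresponds to the case in which raw_pg_bind_textdomain_codeset() returns false without calling bind_textdomain_codeset() at all.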

src/backend/utils/mb/mbutils.c

Lines changed: 93 additions & 39 deletions

```diff
@@ -53,11 +53,11 @@ static FmgrInfo *ToServerConvProc = NULL;
 static FmgrInfo *ToClientConvProc = NULL;
 
 /*
- * These variables track the currently selected FE and BE encodings.
+ * These variables track the currently-selected encodings.
  */
 static pg_enc2name *ClientEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
 static pg_enc2name *DatabaseEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
-static pg_enc2name *PlatformEncoding = NULL;
+static pg_enc2name *MessageEncoding = &pg_enc2name_tbl[PG_SQL_ASCII];
 
 /*
  * During backend startup we can't set client encoding because we (a)
```
```diff
@@ -881,46 +881,102 @@ SetDatabaseEncoding(int encoding)
 	Assert(DatabaseEncoding->encoding == encoding);
 }
 
-/*
- * Bind gettext to the codeset equivalent with the database encoding.
- */
 void
-pg_bind_textdomain_codeset(const char *domainname)
+SetMessageEncoding(int encoding)
 {
-#if defined(ENABLE_NLS)
-	int			encoding = GetDatabaseEncoding();
-	int			i;
+	/* Some calls happen before we can elog()! */
+	Assert(PG_VALID_ENCODING(encoding));
 
-	/*
-	 * gettext() uses the codeset specified by LC_CTYPE by default, so if that
-	 * matches the database encoding we don't need to do anything. In CREATE
-	 * DATABASE, we enforce or trust that the locale's codeset matches
-	 * database encoding, except for the C locale. In C locale, we bind
-	 * gettext() explicitly to the right codeset.
-	 *
-	 * On Windows, though, gettext() tends to get confused so we always bind
-	 * it.
-	 */
-#ifndef WIN32
-	const char *ctype = setlocale(LC_CTYPE, NULL);
+	MessageEncoding = &pg_enc2name_tbl[encoding];
+	Assert(MessageEncoding->encoding == encoding);
+}
 
-	if (pg_strcasecmp(ctype, "C") != 0 && pg_strcasecmp(ctype, "POSIX") != 0)
-		return;
-#endif
+#ifdef ENABLE_NLS
+/*
+ * Make one bind_textdomain_codeset() call, translating a pg_enc to a gettext
+ * codeset. Fails for MULE_INTERNAL, an encoding unknown to gettext; can also
+ * fail for gettext-internal causes like out-of-memory.
+ */
+static bool
+raw_pg_bind_textdomain_codeset(const char *domainname, int encoding)
+{
+	bool		elog_ok = (CurrentMemoryContext != NULL);
+	int			i;
 
 	for (i = 0; pg_enc2gettext_tbl[i].name != NULL; i++)
 	{
 		if (pg_enc2gettext_tbl[i].encoding == encoding)
 		{
 			if (bind_textdomain_codeset(domainname,
-										pg_enc2gettext_tbl[i].name) == NULL)
+										pg_enc2gettext_tbl[i].name) != NULL)
+				return true;
+
+			if (elog_ok)
 				elog(LOG, "bind_textdomain_codeset failed");
+			else
+				write_stderr("bind_textdomain_codeset failed");
+
 			break;
 		}
 	}
+
+	return false;
+}
+
+/*
+ * Bind a gettext message domain to the codeset corresponding to the database
+ * encoding. For SQL_ASCII, instead bind to the codeset implied by LC_CTYPE.
+ * Return the MessageEncoding implied by the new settings.
+ *
+ * On most platforms, gettext defaults to the codeset implied by LC_CTYPE.
+ * When that matches the database encoding, we don't need to do anything. In
+ * CREATE DATABASE, we enforce or trust that the locale's codeset matches the
+ * database encoding, except for the C locale. (On Windows, we also permit a
+ * discrepancy under the UTF8 encoding.) For the C locale, explicitly bind
+ * gettext to the right codeset.
+ *
+ * On Windows, gettext defaults to the Windows ANSI code page. This is a
+ * convenient departure for software that passes the strings to Windows ANSI
+ * APIs, but we don't do that. Compel gettext to use database encoding or,
+ * failing that, the LC_CTYPE encoding as it would on other platforms.
+ *
+ * This function is called before elog() and palloc() are usable.
+ */
+int
+pg_bind_textdomain_codeset(const char *domainname)
+{
+	bool		elog_ok = (CurrentMemoryContext != NULL);
+	int			encoding = GetDatabaseEncoding();
+	int			new_msgenc;
+
+#ifndef WIN32
+	const char *ctype = setlocale(LC_CTYPE, NULL);
+
+	if (pg_strcasecmp(ctype, "C") == 0 || pg_strcasecmp(ctype, "POSIX") == 0)
 #endif
+		if (encoding != PG_SQL_ASCII &&
+			raw_pg_bind_textdomain_codeset(domainname, encoding))
+			return encoding;
+
+	new_msgenc = pg_get_encoding_from_locale(NULL, elog_ok);
+	if (new_msgenc < 0)
+		new_msgenc = PG_SQL_ASCII;
+
+#ifdef WIN32
+	if (!raw_pg_bind_textdomain_codeset(domainname, new_msgenc))
+		/* On failure, the old message encoding remains valid. */
+		return GetMessageEncoding();
+#endif
+
+	return new_msgenc;
 }
+#endif
 
+/*
+ * The database encoding, also called the server encoding, represents the
+ * encoding of data stored in text-like data types. Affected types include
+ * cstring, text, varchar, name, xml, and json.
+ */
 int
 GetDatabaseEncoding(void)
 {
```
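The control flow of the new pg_bind_textdomain_codeset() can be traced with a platform-neutral model in which every environmental input is passed in explicitly. This is a hypothetical sketch: `BindInputs`, `raw_bind()`, `bind_codeset_sketch()` and the `ENC_*` values are illustrative names, and `raw_bind()` assumes that binding fails only for MULE_INTERNAL or when gettext itself fails.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical encoding IDs; only their distinctness matters here. */
enum { ENC_SQL_ASCII, ENC_UTF8, ENC_LATIN1, ENC_MULE_INTERNAL, ENC_WIN1252 };

/* Inputs the real function obtains from the build and the environment. */
typedef struct
{
    bool is_windows;            /* compiled with WIN32? */
    bool ctype_is_c_or_posix;   /* LC_CTYPE is "C" or "POSIX" */
    int  database_encoding;     /* GetDatabaseEncoding() */
    int  locale_encoding;       /* encoding of LC_CTYPE; -1 if unknown */
    int  old_message_encoding;  /* GetMessageEncoding() before the call */
    bool gettext_bind_ok;       /* would bind_textdomain_codeset() succeed? */
} BindInputs;

/* Stand-in for raw_pg_bind_textdomain_codeset(). */
static bool
raw_bind(const BindInputs *in, int encoding)
{
    return encoding != ENC_MULE_INTERNAL && in->gettext_bind_ok;
}

/* Returns the resulting message encoding, following the new control flow. */
static int
bind_codeset_sketch(const BindInputs *in)
{
    int new_msgenc;

    /* Off Windows, only the C/POSIX locale tries the database encoding. */
    if (in->is_windows || in->ctype_is_c_or_posix)
    {
        if (in->database_encoding != ENC_SQL_ASCII &&
            raw_bind(in, in->database_encoding))
            return in->database_encoding;
    }

    new_msgenc = in->locale_encoding;
    if (new_msgenc < 0)
        new_msgenc = ENC_SQL_ASCII;

    /* Windows gettext must be told explicitly; keep the old value on failure. */
    if (in->is_windows && !raw_bind(in, new_msgenc))
        return in->old_message_encoding;

    return new_msgenc;
}
```

For example, on Windows an SQL_ASCII database whose LC_CTYPE implies WIN1252 ends up with WIN1252 messages, and a MULE_INTERNAL database falls back to the LC_CTYPE encoding because gettext has no codeset for MULE_INTERNAL.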
```diff
@@ -949,19 +1005,17 @@ pg_client_encoding(PG_FUNCTION_ARGS)
 	return DirectFunctionCall1(namein, CStringGetDatum(ClientEncoding->name));
 }
 
+/*
+ * gettext() returns messages in this encoding. This often matches the
+ * database encoding, but it differs for SQL_ASCII databases, for processes
+ * not attached to a database, and under a database encoding lacking iconv
+ * support (MULE_INTERNAL).
+ */
 int
-GetPlatformEncoding(void)
+GetMessageEncoding(void)
 {
-	if (PlatformEncoding == NULL)
-	{
-		/* try to determine encoding of server's environment locale */
-		int			encoding = pg_get_encoding_from_locale("", true);
-
-		if (encoding < 0)
-			encoding = PG_SQL_ASCII;
-		PlatformEncoding = &pg_enc2name_tbl[encoding];
-	}
-	return PlatformEncoding->encoding;
+	Assert(MessageEncoding);
+	return MessageEncoding->encoding;
 }
 
 #ifdef WIN32
```
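Unlike the removed GetPlatformEncoding(), which computed its value lazily, MessageEncoding is a pointer into the encoding-name table that starts at SQL_ASCII and is only ever replaced by SetMessageEncoding(). A minimal sketch of that pattern, assuming a hypothetical three-entry table (`EncName`, `enc2name_tbl`, `VALID_ENCODING` and the lowercase function names are illustrative; 65001 and 28591 are the Windows code page identifiers for UTF-8 and ISO-8859-1):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down analogue of pg_enc2name_tbl. */
typedef struct
{
    const char *name;
    int         encoding;
    unsigned    codepage;       /* Windows code page, 0 if none */
} EncName;

enum { ENC_SQL_ASCII, ENC_UTF8, ENC_LATIN1, ENC_COUNT };

static EncName enc2name_tbl[ENC_COUNT] = {
    {"SQL_ASCII", ENC_SQL_ASCII, 0},
    {"UTF8", ENC_UTF8, 65001},
    {"LATIN1", ENC_LATIN1, 28591},
};

#define VALID_ENCODING(e) ((e) >= 0 && (e) < ENC_COUNT)

/* Never NULL: defaults to SQL_ASCII, like the new MessageEncoding. */
static EncName *message_encoding = &enc2name_tbl[ENC_SQL_ASCII];

static void
set_message_encoding(int encoding)
{
    assert(VALID_ENCODING(encoding));
    message_encoding = &enc2name_tbl[encoding];
    assert(message_encoding->encoding == encoding);
}

static int
get_message_encoding(void)
{
    assert(message_encoding != NULL);
    return message_encoding->encoding;
}
```

Keeping a table pointer rather than a bare integer gives callers such as the UTF16 conversion direct access to per-encoding metadata like the code page.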
```diff
@@ -971,13 +1025,13 @@ GetPlatformEncoding(void)
  * is also passed to utf16len if not null. Returns NULL iff failed.
  */
 WCHAR *
-pgwin32_toUTF16(const char *str, int len, int *utf16len)
+pgwin32_message_to_UTF16(const char *str, int len, int *utf16len)
 {
 	WCHAR	   *utf16;
 	int			dstlen;
 	UINT		codepage;
 
-	codepage = pg_enc2name_tbl[GetDatabaseEncoding()].codepage;
+	codepage = pg_enc2name_tbl[GetMessageEncoding()].codepage;
 
 	/*
 	 * Use MultiByteToWideChar directly if there is a corresponding codepage,
@@ -994,7 +1048,7 @@ pgwin32_toUTF16(const char *str, int len, int *utf16len)
 		char	   *utf8;
 
 		utf8 = (char *) pg_do_encoding_conversion((unsigned char *) str,
-												  len, GetDatabaseEncoding(), PG_UTF8);
+												  len, GetMessageEncoding(), PG_UTF8);
 		if (utf8 != str)
 			len = strlen(utf8);
 
```
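pgwin32_message_to_UTF16() either hands the message to MultiByteToWideChar() when the message encoding has a Windows code page, or first converts it to UTF-8 with pg_do_encoding_conversion(). The last step, UTF-8 to UTF-16 code units, is sketched portably below. This is an illustrative hand-written decoder, not the Windows API and not PostgreSQL code; `utf8_to_utf16()` is a hypothetical name, and for brevity it does not reject overlong forms or encoded surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode UTF-8 in src[0..len) into UTF-16 code units in dst (capacity
 * dstcap). Returns the number of code units written, or -1 on malformed
 * input or insufficient space.
 */
static int
utf8_to_utf16(const unsigned char *src, size_t len, uint16_t *dst, size_t dstcap)
{
    size_t i = 0, n = 0;

    while (i < len)
    {
        unsigned char c = src[i];
        uint32_t cp;
        int extra;

        if (c < 0x80) { cp = c; extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else return -1;                 /* invalid lead byte */

        if (i + (size_t) extra >= len && extra > 0)
            return -1;                  /* truncated sequence */
        for (int k = 1; k <= extra; k++)
        {
            if ((src[i + k] & 0xC0) != 0x80)
                return -1;              /* bad continuation byte */
            cp = (cp << 6) | (src[i + k] & 0x3F);
        }
        i += (size_t) extra + 1;

        if (cp >= 0x10000)
        {
            /* Outside the BMP: emit a surrogate pair. */
            if (cp > 0x10FFFF || n + 2 > dstcap)
                return -1;
            cp -= 0x10000;
            dst[n++] = (uint16_t) (0xD800 | (cp >> 10));
            dst[n++] = (uint16_t) (0xDC00 | (cp & 0x3FF));
        }
        else
        {
            if (n + 1 > dstcap)
                return -1;
            dst[n++] = (uint16_t) cp;
        }
    }
    return (int) n;
}
```

For example, the bytes E6 97 A5 decode to the single unit 0x65E5, and F0 9F 98 80 (U+1F600) become the surrogate pair 0xD83D 0xDE00.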
