Commit 3ff1588

Perform conversion from Python unicode to string/bytes object via UTF-8.

We used to convert the unicode object directly to a string in the server encoding by calling Python's PyUnicode_AsEncodedString function. In other words, we used Python's routines to do the encoding. However, that has a few problems. First of all, it required keeping a mapping table of Python encoding names and PostgreSQL encodings. But the real killer was that Python doesn't support the EUC_TW and MULE_INTERNAL encodings at all.

Instead, convert the Python unicode object to UTF-8, and use PostgreSQL's encoding conversion functions to convert from UTF-8 to the server encoding. We were already doing the same in the other direction in PLyUnicode_FromString, so this is more consistent, too.

Note: This makes SQL_ASCII behave more leniently. We used to map SQL_ASCII to Python's 'ascii', which in Python means strict 7-bit ASCII only, so you got an error if the Python string contained anything but pure ASCII. You no longer get an error; you get the UTF-8 representation of the string instead.

Backpatch to 9.0, where these conversions were introduced.

Jan Urbański
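
The two Python-side limitations described above can be checked from a plain Python interpreter. The snippet below is a minimal sketch, assuming a current CPython with its standard codec registry; `python_has_codec` is a hypothetical helper name introduced for illustration and is not part of PL/Python.

```python
import codecs


def python_has_codec(name):
    """Return True if this Python build can look up a codec by that name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


# Sample text containing one non-ASCII character (U+00E9).
text = "caf\u00e9"

# Old behaviour for SQL_ASCII: Python's strict 'ascii' codec rejects
# anything outside 7-bit ASCII.
try:
    text.encode("ascii", "strict")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# New behaviour for SQL_ASCII as described in the commit message:
# the caller receives the UTF-8 representation of the string.
utf8_bytes = text.encode("utf-8")

print("euc_tw codec available:", python_has_codec("euc_tw"))
print("mule_internal codec available:", python_has_codec("mule_internal"))
print("strict ascii accepted the text:", ascii_ok)
print("utf-8 bytes:", utf8_bytes)
```

The codec lookups are the reason a name mapping table alone could not work: there is no Python codec to map EUC_TW or MULE_INTERNAL to.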
1 parent 149ac7d, commit 3ff1588

2 files changed, 44 additions and 108 deletions

src/pl/plpython/expected/plpython_unicode_3.out

Lines changed: 0 additions & 54 deletions
This file was deleted.

src/pl/plpython/plpy_util.c

Lines changed: 44 additions & 54 deletions
@@ -61,66 +61,56 @@ PLy_free(void *ptr)
 PyObject *
 PLyUnicode_Bytes(PyObject *unicode)
 {
-    PyObject   *rv;
-    const char *serverenc;
+    PyObject   *bytes, *rv;
+    char       *utf8string, *encoded;
+
+    /* First encode the Python unicode object with UTF-8. */
+    bytes = PyUnicode_AsUTF8String(unicode);
+    if (bytes == NULL)
+        PLy_elog(ERROR, "could not convert Python Unicode object to bytes");
+
+    utf8string = PyBytes_AsString(bytes);
+    if (utf8string == NULL) {
+        Py_DECREF(bytes);
+        PLy_elog(ERROR, "could not extract bytes from encoded string");
+    }
 
     /*
-     * Map PostgreSQL encoding to a Python encoding name.
+     * Then convert to server encoding if necessary.
+     *
+     * PyUnicode_AsEncodedString could be used to encode the object directly
+     * in the server encoding, but Python doesn't support all the encodings
+     * that PostgreSQL does (EUC_TW and MULE_INTERNAL). UTF-8 is used as an
+     * intermediary in PLyUnicode_FromString as well.
      */
-    switch (GetDatabaseEncoding())
+    if (GetDatabaseEncoding() != PG_UTF8)
     {
-        case PG_SQL_ASCII:
-            /*
-             * Mapping SQL_ASCII to Python's 'ascii' is a bit bogus. Python's
-             * 'ascii' means true 7-bit only ASCII, while PostgreSQL's
-             * SQL_ASCII means that anything is allowed, and the system doesn't
-             * try to interpret the bytes in any way. But not sure what else
-             * to do, and we haven't heard any complaints...
-             */
-            serverenc = "ascii";
-            break;
-        case PG_WIN1250:
-            serverenc = "cp1250";
-            break;
-        case PG_WIN1251:
-            serverenc = "cp1251";
-            break;
-        case PG_WIN1252:
-            serverenc = "cp1252";
-            break;
-        case PG_WIN1253:
-            serverenc = "cp1253";
-            break;
-        case PG_WIN1254:
-            serverenc = "cp1254";
-            break;
-        case PG_WIN1255:
-            serverenc = "cp1255";
-            break;
-        case PG_WIN1256:
-            serverenc = "cp1256";
-            break;
-        case PG_WIN1257:
-            serverenc = "cp1257";
-            break;
-        case PG_WIN1258:
-            serverenc = "cp1258";
-            break;
-        case PG_WIN866:
-            serverenc = "cp866";
-            break;
-        case PG_WIN874:
-            serverenc = "cp874";
-            break;
-        default:
-            /* Other encodings have the same name in Python. */
-            serverenc = GetDatabaseEncodingName();
-            break;
+        PG_TRY();
+        {
+            encoded = (char *) pg_do_encoding_conversion(
+                (unsigned char *) utf8string,
+                strlen(utf8string),
+                PG_UTF8,
+                GetDatabaseEncoding());
+        }
+        PG_CATCH();
+        {
+            Py_DECREF(bytes);
+            PG_RE_THROW();
+        }
+        PG_END_TRY();
     }
+    else
+        encoded = utf8string;
+
+    /* finally, build a bytes object in the server encoding */
+    rv = PyBytes_FromStringAndSize(encoded, strlen(encoded));
+
+    /* if pg_do_encoding_conversion allocated memory, free it now */
+    if (utf8string != encoded)
+        pfree(encoded);
 
-    rv = PyUnicode_AsEncodedString(unicode, serverenc, "strict");
-    if (rv == NULL)
-        PLy_elog(ERROR, "could not convert Python Unicode object to PostgreSQL server encoding");
+    Py_DECREF(bytes);
     return rv;
 }
 