Commit 3ff1588

Perform conversion from Python unicode to string/bytes object via UTF-8.

We used to convert the unicode object directly to a string in the server encoding by calling Python's PyUnicode_AsEncodedString function. In other words, we used Python's routines to do the encoding. However, that has a few problems. First of all, it required keeping a mapping table of Python encoding names and PostgreSQL encodings. But the real killer was that Python doesn't support EUC_TW and MULE_INTERNAL encodings at all.

Instead, convert the Python unicode object to UTF-8, and use PostgreSQL's encoding conversion functions to convert from UTF-8 to server encoding. We were already doing the same in the other direction in PLyUnicode_FromString, so this is more consistent, too.

Note: This makes SQL_ASCII to behave more leniently. We used to map SQL_ASCII to Python's 'ascii', which on Python means strict 7-bit ASCII only, so you got an error if the python string contained anything but pure ASCII. You no longer get an error; you get the UTF-8 representation of the string instead.

Backpatch to 9.0, where these conversions were introduced.

Jan Urbański
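The two problems named in the commit message can be reproduced from plain Python, outside PostgreSQL. The following is a minimal sketch, not part of the commit: the helper `python_has_codec` and the sample string are hypothetical, and the code only exercises Python's own `codecs` module. It assumes a stock CPython 3 standard library.

```python
import codecs


def python_has_codec(name: str) -> bool:
    """Return True if this Python installation provides a codec under `name`.

    Hypothetical helper for illustration; not part of PL/Python.
    """
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


text = "caf\u00e9"  # sample string with one non-ASCII character

# Old mapping for SQL_ASCII: Python's strict 7-bit 'ascii' codec rejects it.
try:
    text.encode("ascii", "strict")
    old_result = "encoded"
except UnicodeEncodeError:
    old_result = "error"

# New behaviour described for SQL_ASCII: the UTF-8 representation is returned.
new_result = text.encode("utf-8")

print(python_has_codec("cp1250"))  # a Python-side name the old table mapped to
print(python_has_codec("euc_tw"))  # encoding the commit message says Python lacks
print(old_result, new_result)
```

The `cp1250` lookup illustrates why the old code needed a name mapping table (PostgreSQL calls that encoding WIN1250), and the `euc_tw` lookup illustrates why no mapping table could have covered every server encoding.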

1 parent 149ac7d, commit 3ff1588

2 files changed: 44 additions and 108 deletions.

`src/pl/plpython/expected/plpython_unicode_3.out`

Lines changed: 0 additions & 54 deletions

This file was deleted.

`src/pl/plpython/plpy_util.c`

Lines changed: 44 additions & 54 deletions

`@@ -61,66 +61,56 @@ PLy_free(void *ptr)`
` PyObject *`
` PLyUnicode_Bytes(PyObject *unicode)`
` {`
`-    PyObject   *rv;`
`-    const char *serverenc;`
`+    PyObject   *bytes, *rv;`
`+    char       *utf8string, *encoded;`
`+`
`+    /* First encode the Python unicode object with UTF-8. */`
`+    bytes = PyUnicode_AsUTF8String(unicode);`
`+    if (bytes == NULL)`
`+        PLy_elog(ERROR, "could not convert Python Unicode object to bytes");`
`+`
`+    utf8string = PyBytes_AsString(bytes);`
`+    if (utf8string == NULL) {`
`+        Py_DECREF(bytes);`
`+        PLy_elog(ERROR, "could not extract bytes from encoded string");`
`+    }`
` `
`     /*`
`-     * Map PostgreSQL encoding to a Python encoding name.`
`+     * Then convert to server encoding if necessary.`
`+     *`
`+     * PyUnicode_AsEncodedString could be used to encode the object directly`
`+     * in the server encoding, but Python doesn't support all the encodings`
`+     * that PostgreSQL does (EUC_TW and MULE_INTERNAL). UTF-8 is used as an`
`+     * intermediary in PLyUnicode_FromString as well.`
`      */`
`-    switch (GetDatabaseEncoding())`
`+    if (GetDatabaseEncoding() != PG_UTF8)`
`     {`
`-        case PG_SQL_ASCII:`
`-            /*`
`-             * Mapping SQL_ASCII to Python's 'ascii' is a bit bogus. Python's`
`-             * 'ascii' means true 7-bit only ASCII, while PostgreSQL's`
`-             * SQL_ASCII means that anything is allowed, and the system doesn't`
`-             * try to interpret the bytes in any way. But not sure what else`
`-             * to do, and we haven't heard any complaints...`
`-             */`
`-            serverenc = "ascii";`
`-            break;`
`-        case PG_WIN1250:`
`-            serverenc = "cp1250";`
`-            break;`
`-        case PG_WIN1251:`
`-            serverenc = "cp1251";`
`-            break;`
`-        case PG_WIN1252:`
`-            serverenc = "cp1252";`
`-            break;`
`-        case PG_WIN1253:`
`-            serverenc = "cp1253";`
`-            break;`
`-        case PG_WIN1254:`
`-            serverenc = "cp1254";`
`-            break;`
`-        case PG_WIN1255:`
`-            serverenc = "cp1255";`
`-            break;`
`-        case PG_WIN1256:`
`-            serverenc = "cp1256";`
`-            break;`
`-        case PG_WIN1257:`
`-            serverenc = "cp1257";`
`-            break;`
`-        case PG_WIN1258:`
`-            serverenc = "cp1258";`
`-            break;`
`-        case PG_WIN866:`
`-            serverenc = "cp866";`
`-            break;`
`-        case PG_WIN874:`
`-            serverenc = "cp874";`
`-            break;`
`-        default:`
`-            /* Other encodings have the same name in Python. */`
`-            serverenc = GetDatabaseEncodingName();`
`-            break;`
`+        PG_TRY();`
`+        {`
`+            encoded = (char *) pg_do_encoding_conversion(`
`+                (unsigned char *) utf8string,`
`+                strlen(utf8string),`
`+                PG_UTF8,`
`+                GetDatabaseEncoding());`
`+        }`
`+        PG_CATCH();`
`+        {`
`+            Py_DECREF(bytes);`
`+            PG_RE_THROW();`
`+        }`
`+        PG_END_TRY();`
`     }`
`+    else`
`+        encoded = utf8string;`
`+`
`+    /* finally, build a bytes object in the server encoding */`
`+    rv = PyBytes_FromStringAndSize(encoded, strlen(encoded));`
`+`
`+    /* if pg_do_encoding_conversion allocated memory, free it now */`
`+    if (utf8string != encoded)`
`+        pfree(encoded);`
` `
`-    rv = PyUnicode_AsEncodedString(unicode, serverenc, "strict");`
`-    if (rv == NULL)`
`-        PLy_elog(ERROR, "could not convert Python Unicode object to PostgreSQL server encoding");`
`+    Py_DECREF(bytes);`
`     return rv;`
` }`
` `
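The control flow of the rewritten `PLyUnicode_Bytes` can be summarised in a few lines of Python. This is an illustrative sketch only, not the commit's code: `unicode_to_server_bytes` and `demo_converter` are hypothetical names, the converter argument merely stands in for PostgreSQL's `pg_do_encoding_conversion`, and the demo converter uses Python codecs purely so that the sketch can run on its own.

```python
from typing import Callable


def unicode_to_server_bytes(
    text: str,
    server_encoding: str,
    convert_from_utf8: Callable[[bytes, str], bytes],
) -> bytes:
    """Encode `text` via UTF-8, then hand off to a separate converter.

    Mirrors the two steps in the patch: UTF-8 is always the intermediate
    form, and the second conversion is skipped for a UTF-8 server.
    """
    # Step 1: always encode the Python string as UTF-8 first.
    utf8 = text.encode("utf-8")

    # Step 2: no further conversion is needed when the server uses UTF-8.
    if server_encoding.lower() in ("utf8", "utf-8"):
        return utf8

    # Otherwise delegate to the server-side conversion routine.
    return convert_from_utf8(utf8, server_encoding)


def demo_converter(utf8: bytes, target: str) -> bytes:
    """Hypothetical stand-in for pg_do_encoding_conversion (demo only)."""
    return utf8.decode("utf-8").encode(target)


print(unicode_to_server_bytes("\u00e9", "utf8", demo_converter))
print(unicode_to_server_bytes("\u00e9", "latin-1", demo_converter))
```

Passing the converter in as a parameter reflects the design choice in the patch: the Python side only ever needs to know about UTF-8, and knowledge of server encodings stays with the conversion routine.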
