Commit 5b1f92e

committed

Update multi-byte support README

1 parent 853cf66, commit 5b1f92e

1 file changed: `doc/README.mb` (103 additions, 75 deletions)

Original file line number	Diff line number	Diff line change
`@@ -1,20 +1,20 @@`
`1`		`-postgresql 6.5.1 multi-byte (MB) support README    July 11 1999`
	`1`	`+PostgreSQL 7.0 multi-byte (MB) support README    Mar 22 2000`
`2`	`2`
`3`	`3`	`Tatsuo Ishii`
`4`		`-t-ishii@sra.co.jp`
	`4`	`+ishii@postgresql.org`
`5`	`5`	`http://www.sra.co.jp/people/t-ishii/PostgreSQL/`
`6`	`6`
`7`	`7`	`0. Introduction`
`8`	`8`
`9`	`9`	`The MB support is intended for allowing PostgreSQL to handle`
`10`	`10`	`multi-byte character sets such as EUC(Extended Unix Code), Unicode and`
`11`	`11`	`Mule internal code. With the MB enabled you can use multi-byte`
`12`		`-character sets in regexp ,LIKE and some functions. The default`
	`12`	`+character sets in regexp ,LIKE and some other functions. The default`
`13`	`13`	`encoding system chosen is determined while initializing your`
`14`	`14`	`PostgreSQL installation using initdb(1). Note that this can be`
`15`		`-overridden when you create a database using createdb(1) or create`
`16`		`-database SQL command. So you could have multiple databases with`
`17`		`-different encoding systems.`
	`15`	`+overridden when you create a database using createdb(1) or by using a`
	`16`	`+create database SQL command. So you could have multiple databases with`
	`17`	`+each different encoding system.`
`18`	`18`
`19`	`19`	`MB also fixes some problems concerning with 8-bit single byte`
`20`	`20`	`character sets including ISO8859. (I would not say all of problems`
`@@ -24,11 +24,11 @@ me know if you find any problem while using 8-bit characters)`
`24`	`24`
`25`	`25`	`1. How to use`
`26`	`26`
`27`		`-run configure with the mb option:`
	`27`	`+run configure with a multibyte option:`
`28`	`28`
`29`		`-% configure --with-mb=encoding_system`
	`29`	`+% ./configure --enable-multibyte[=encoding_system]`
`30`	`30`
`31`		`-where encoding_system is one of:`
	`31`	`+where the encoding_system is one of:`
`32`	`32`
`33`	`33`	`SQL_ASCII       ASCII`
`34`	`34`	`EUC_JP          Japanese EUC`
`@@ -48,21 +48,21 @@ where encoding_system is one of:`
`48`	`48`
`49`	`49`	`Example:`
`50`	`50`
`51`		`-% configure --with-mb=EUC_JP`
	`51`	`+% ./configure --enable-multibyte=EUC_JP`
`52`	`52`
`53`		`-If MB is disabled, nothing is changed except better supporting for`
`54`		`-8-bit single byte character sets.`
	`53`	`+If the encoding system is omitted (./configure --enable-multibyte),`
	`54`	`+SQL_ASCII is assumed.`
`55`	`55`
`56`		`-2. How to set encoding`
	`56`	`+2. How to set the encoding`
`57`	`57`
`58`	`58`	`initdb command defines the default encoding for a PostgreSQL`
`59`	`59`	`installation. For example:`
`60`	`60`
`61`		`-% initdb -e EUC_JP`
	`61`	`+% initdb -E EUC_JP`
`62`	`62`
`63`	`63`	`sets the default encoding to EUC_JP(Extended Unix Code for Japanese).`
`64`		`-Note that you can use "-pgencoding" instead of "-e" if you like longer`
`65`		`-option string:-) If no -e or -pgencoding option is given, the encoding`
	`64`	`+Note that you can use "--encoding" instead of "-E" if you like longer`
	`65`	`+option string:-) If no -E or --encoding option is given, the encoding`
`66`	`66`	`specified at the compile time is used.`
`67`	`67`
`68`	`68`	`You can create a database with a different encoding.`
`@@ -75,78 +75,85 @@ another way to accomplish this is to use a SQL command:`
`75`	`75`	`CREATE DATABASE korean WITH ENCODING = 'EUC_KR';`
`76`	`76`
`77`	`77`	`The encoding for a database is represented as "encoding" column in the`
`78`		`-pg_database system catalog.`
	`78`	`+pg_database system catalog. You can see that by using -l or \l of psql`
	`79`	`+command.`
`79`	`80`
`80`		`-datname      |datdba|encoding|datpath`
`81`		`--------------+------+--------+-------------`
`82`		`-template1    |  1739|       1|template1`
`83`		`-postgres     |  1739|       0|postgres`
`84`		`-euc_jp       |  1739|       1|euc_jp`
`85`		`-euc_kr       |  1739|       3|euc_kr`
`86`		`-euc_cn       |  1739|       2|euc_cn`
`87`		`-unicode      |  1739|       5|unicode`
`88`		`-mule_internal|  1739|       6|mule_internal`
	`81`	`+$ psql -l`
	`82`	`+            List of databases`
	`83`	`+   Database    |  Owner  |   Encoding`
	`84`	`+---------------+---------+---------------`
	`85`	`+ euc_cn        | t-ishii | EUC_CN`
	`86`	`+ euc_jp        | t-ishii | EUC_JP`
	`87`	`+ euc_kr        | t-ishii | EUC_KR`
	`88`	`+ euc_tw        | t-ishii | EUC_TW`
	`89`	`+ mule_internal | t-ishii | MULE_INTERNAL`
	`90`	`+ regression    | t-ishii | SQL_ASCII`
	`91`	`+ template1     | t-ishii | EUC_JP`
	`92`	`+ test          | t-ishii | EUC_JP`
	`93`	`+ unicode       | t-ishii | UNICODE`
	`94`	`+(9 rows)`
`89`	`95`
`90`		`-A number in the encoding column is "encoding id" and can be translated`
`91`		`-to the encoding name using pg_encoding command.`
	`96`	`+3. Automatic encoding translation between backend and frontend`
`92`	`97`
`93`		`-$ pg_encoding 1`
`94`		`-EUC_JP`
	`98`	`+PostgreSQL supports an automatic encoding translation between backend`
	`99`	`+and frontend for some encodings.`
`95`	`100`
`96`		`-If an argument to pg_encoding is not a number, then it is regarded as`
`97`		`-an encoding name and pg_encoding will return the encoding id.`
	`101`	`+ encoding of backend    available encoding of frontend`
	`102`	`+ --------------------------------------------------------------------`
	`103`	`+ EUC_JP                 EUC_JP, SJIS`
	`104`	`+`
	`105`	`+ EUC_TW                 EUC_TW, BIG5`
	`106`	`+`
	`107`	`+ LATIN2                 LATIN2, WIN1250`
	`108`	`+`
	`109`	`+ LATIN5                 LATIN5, WIN, ALT`
	`110`	`+`
	`111`	`+ MULE_INTERNAL          EUC_JP, SJIS, EUC_KR, EUC_CN,`
	`112`	`+                        EUC_TW, BIG5, LATIN1 to LATIN5,`
	`113`	`+                        WIN, ALT, WIN1250`
`98`	`114`
`99`		`-$ pg_encoding EUC_JP`
`100`		`-1`
	`115`	`+To enable the automatic encoding translation, you have to tell`
	`116`	`+PostgreSQL the encoding you would like to use in frontend. There are`
	`117`	`+several ways to accomplish this.`
`101`	`118`
`102`		`-3. PGCLIENTENCODING`
	`119`	`+o using \encoding command in psql`
`103`	`120`
`104`		`-If an environment variable PGCLIENTENCODING is defined on the`
`105`		`-frontend, automatic encoding translation is done by the backend. For`
`106`		`-example, if the backend has been compiled with MB=EUC_JP and`
`107`		`-PGCLIENTENCODING=SJIS(Shift JIS: yet another Japanese encoding`
`108`		`-system), then any SJIS strings coming from the frontend would be`
`109`		`-translated to EUC_JP before going into the parser. Outputs from the`
`110`		`-backend would be translated to SJIS of course.`
	`121`	`+\encoding allows you to change frontend encoding on the fly. For`
	`122`	`+example, to change the encoding to SJIS, type:`
`111`	`123`
`112`		`-Supported encodings for PGCLIENTENCODING are:`
	`124`	`+\encoding SJIS`
`113`	`125`
`114`		`-SQL_ASCII       ASCII`
`115`		`-EUC_JP          Japanese EUC`
`116`		`-SJIS            Yet another Japanese encoding`
`117`		`-EUC_CN          Chinese EUC`
`118`		`-EUC_KR          Korean EUC`
`119`		`-EUC_TW          Taiwan EUC`
`120`		`-BIG5            Traditional Chinese`
`121`		`-MULE_INTERNAL   Mule internal`
`122`		`-LATIN1          ISO 8859-1 English and some European languages`
`123`		`-LATIN2          ISO 8859-2 English and some European languages`
`124`		`-LATIN3          ISO 8859-3 English and some European languages`
`125`		`-LATIN4          ISO 8859-4 English and some European languages`
`126`		`-LATIN5          ISO 8859-5 English and some European languages`
`127`		`-KOI8            KOI8-R`
`128`		`-WIN             Windows CP1251`
`129`		`-ALT             Windows CP866`
`130`		`-WIN1250         Windows CP1250 (Czech)`
	`126`	`+o using libpq functions`
`131`	`127`
`132`		`-Note that UNICODE is not supported(yet). Also note that the`
`133`		`-translation is not always possible. Suppose you choose EUC_JP for the`
`134`		`-backend, LATIN1 for the frontend, then some Japanese characters cannot`
`135`		`-be translated into latin. In this case, a letter cannot be represented`
`136`		`-in the Latin character set, would be transformed as:`
	`128`	`+\encoding actually calls PQsetClientEncoding() for its purpose.`
`137`	`129`
`138`		`-(HEXA DECIMAL)`
	`130`	`+ int PQsetClientEncoding(PGconn *conn, const char *encoding)`
	`131`	`+`
	`132`	`+conn is a connection to the backend, and encoding is an encoding you`
	`133`	`+want to use. If it successfully sets the encoding, it returns 0,`
	`134`	`+otherwise -1. The current encoding for this connection can be shown by`
	`135`	`+using:`
	`136`	`+`
	`137`	`+ int PQclientEncoding(const PGconn *conn)`
	`138`	`+`
	`139`	`+Note that it returns the "encoding id," not the encoding symbol string`
	`140`	`+such as "EUC_JP." To convert an encoding id to an encoding symbol, you`
	`141`	`+can use:`
	`142`	`+`
	`143`	`+char *pg_encoding_to_char(int encoding_id)`
	`144`	`+`
	`145`	`+o using PGCLIENTENCODING`
	`146`	`+`
	`147`	`+If an environment variable PGCLIENTENCODING is defined in the`
	`148`	`+frontend, an automatic encoding translation is done by the backend.`
`139`	`149`
`140`		`-3. SET CLIENT_ENCODING TO command`
	`150`	`+o using SET CLIENT_ENCODING TO command`
`141`	`151`
`142`		`-Actually setting the frontend side encoding information is done by a`
`143`		`-new command:`
	`152`	`+Setting the frontend side encoding can be done a SQL command:`
`144`	`153`
`145`	`154`	`SET CLIENT_ENCODING TO 'encoding';`
`146`	`155`
`147`		`-where encoding is one of the encodings those can be set to`
`148`		`-PGCLIENTENCODING. Also you can use SQL92 syntax "SET NAMES" for this`
`149`		`-purpose:`
	`156`	`+Also you can use SQL92 syntax "SET NAMES" for this purpose:`
`150`	`157`
`151`	`158`	`SET NAMES 'encoding';`
`152`	`159`
`@@ -158,10 +165,21 @@ To return to the default encoding:`
`158`	`165`
`159`	`166`	`RESET CLIENT_ENCODING;`
`160`	`167`
`161`		`-This would reset the frontend encoding to same as the backend`
`162`		`-encoding, thus no encoding translation would be performed.`
	`168`	`+4. About Unicode`
`163`	`169`
`164`		`-4. References`
	`170`	`+An automatic encoding translation between Unicode and any other`
	`171`	`+encodings is not supported (yet).`
	`172`	`+`
	`173`	`+5. What happens if the translation is not possible?`
	`174`	`+`
	`175`	`+Suppose you choose EUC_JP for the backend, LATIN1 for the frontend,`
	`176`	`+then some Japanese characters could not be translated into LATIN1. In`
	`177`	`+this case, a letter cannot be represented in the LATIN1 character set,`
	`178`	`+would be transformed as:`
	`179`	`+`
	`180`	`+(HEXA DECIMAL)`
	`181`	`+`
	`182`	`+6. References`
`165`	`183`
`166`	`184`	`These are good sources to start learning various kind of encoding`
`167`	`185`	`systems.`
`@@ -178,6 +196,16 @@ Unicode: http://www.unicode.org/`
`178`	`196`
`179`	`197`	`5. History`
`180`	`198`
	`199`	`+Mar 22, 2000`
	`200`	`+* Add new libpq functions PQsetClientEncoding, PQclientEncoding`
	`201`	`+* ./configure --with-mb=EUC_JP`
	`202`	`+ now deprecated. use`
	`203`	`+ ./configure --enable-multibyte=EUC_JP`
	`204`	`+ instead`
	`205`	`+ * Add SQL_ASCII regression test case`
	`206`	`+* Add SJIS User Defined Character (UDC) support`
	`207`	`+* All of above will appear in 7.0`
	`208`	`+`
`181`	`209`	`July 11, 1999`
`182`	`210`	`* Add support for WIN1250 (Windows Czech) as a client encoding`
`183`	`211`	`(contributed by Pavel Behal)`
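
Note on the translation table added in section 3: the backend/frontend pairs it lists can be expressed as a small lookup, which may be handy when checking a planned client encoding before calling `\encoding` or `SET CLIENT_ENCODING`. The following is only a sketch transcribed from the table in this diff. The names `translation_row`, `translation_table` and `frontend_encoding_listed` are hypothetical and are not part of PostgreSQL or libpq, and "LATIN1 to LATIN5" is assumed to mean LATIN1, LATIN2, LATIN3, LATIN4 and LATIN5.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One row of the README's "backend -> available frontend" table.
 * Hypothetical type, for illustration only. */
struct translation_row {
    const char *backend;
    const char *frontends[16];  /* NULL-terminated list */
};

/* Transcribed from section 3 of the README as changed by this commit.
 * "LATIN1 to LATIN5" is expanded on the assumption that it is a range. */
static const struct translation_row translation_table[] = {
    {"EUC_JP", {"EUC_JP", "SJIS", NULL}},
    {"EUC_TW", {"EUC_TW", "BIG5", NULL}},
    {"LATIN2", {"LATIN2", "WIN1250", NULL}},
    {"LATIN5", {"LATIN5", "WIN", "ALT", NULL}},
    {"MULE_INTERNAL",
     {"EUC_JP", "SJIS", "EUC_KR", "EUC_CN", "EUC_TW", "BIG5",
      "LATIN1", "LATIN2", "LATIN3", "LATIN4", "LATIN5",
      "WIN", "ALT", "WIN1250", NULL}},
};

/* Returns 1 if the README table lists `frontend` for `backend`, else 0.
 * Names are compared case-sensitively, exactly as written in the table. */
int frontend_encoding_listed(const char *backend, const char *frontend)
{
    size_t i, j;
    size_t rows = sizeof(translation_table) / sizeof(translation_table[0]);

    assert(backend != NULL && frontend != NULL);

    for (i = 0; i < rows; i++) {
        if (strcmp(translation_table[i].backend, backend) != 0)
            continue;
        for (j = 0; translation_table[i].frontends[j] != NULL; j++) {
            if (strcmp(translation_table[i].frontends[j], frontend) == 0)
                return 1;
        }
        return 0;  /* backend is in the table, frontend is not listed */
    }
    return 0;      /* backend has no row in the table */
}
```

For example, `frontend_encoding_listed("EUC_JP", "SJIS")` returns 1 and `frontend_encoding_listed("EUC_JP", "BIG5")` returns 0. The lookup only mirrors the table: a backend such as EUC_KR has no row, so it returns 0 even for an identical frontend encoding, where no translation would be needed in the first place.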

0 commit comments
