Commit eea348b

Add a README file for multi-byte. This file is contributed by
Chih-Chang Hsieh <cch@cc.kmu.edu.tw>, written in traditional Chinese (Big5).
1 parent 7edff16, commit eea348b

1 file changed, +326 -0 lines changed

‎doc/README.mb.big5

Lines changed: 326 additions & 0 deletions
@@ -0,0 +1,326 @@
PostgreSQL 7.0.1 multi-byte (MB) support README          May 20 2000

                                                Tatsuo Ishii
                                                ishii@postgresql.org
                  http://www.sra.co.jp/people/t-ishii/PostgreSQL/

[Translator's notes]
  1. Many thanks to Mr. Tatsuo Ishii!
  2. Some parts of this document are not in the original. If you find
     errors in the translation, please contact cch@cc.kmu.edu.tw


0. Introduction

MB support allows PostgreSQL to handle multi-byte character sets such
as EUC (Extended Unix Code), Unicode and Mule internal code. With MB
support enabled, you can use multi-byte characters in regular
expressions (regexp), LIKE and some other functions. The default
encoding system is chosen when you initialize your PostgreSQL
installation with initdb(1); it can be overridden when creating a
database with createdb(1) or with the CREATE DATABASE SQL command. So
you can have multiple databases with different encoding systems.

MB support also fixes some problems concerning 8-bit single-byte
character sets, including ISO-8859-1. (I would not say that all such
problems have been fixed. I have only confirmed that the regression
tests run successfully and that some French characters can be used with
the MB patch. If you find any problem while using 8-bit characters,
please let me know.)

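To illustrate what "multi-byte aware" means for the string functions
mentioned above, here is a minimal sketch in C that counts characters
rather than bytes in an EUC_JP string. The function names are
hypothetical and this is not PostgreSQL's own code; it only assumes the
usual EUC_JP layout (0x8e introduces a 2-byte character, 0x8f a 3-byte
character, any other byte with the high bit set starts a 2-byte
character).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SS2 0x8e    /* single shift 2: a half-width katakana byte follows */
#define SS3 0x8f    /* single shift 3: a two-byte JIS X 0212 code follows */

/* Byte length of the EUC_JP character that starts at s (hypothetical helper). */
static size_t euc_jp_mblen(const unsigned char *s)
{
    assert(s != NULL);
    if (*s == SS2)
        return 2;
    if (*s == SS3)
        return 3;
    if (*s & 0x80)
        return 2;
    return 1;           /* plain ASCII */
}

/* Number of characters (not bytes) in a NUL-terminated EUC_JP string. */
static size_t euc_jp_char_length(const char *str)
{
    const unsigned char *p = (const unsigned char *) str;
    size_t remaining = strlen(str);
    size_t count = 0;

    while (remaining > 0)
    {
        size_t len = euc_jp_mblen(p);

        if (len > remaining)    /* truncated trailing character */
            len = remaining;
        p += len;
        remaining -= len;
        count++;
    }
    return count;
}
```

For the six-byte string "\xc6\xfc\xcb\xdc\xb8\xec" (three kanji in
EUC_JP), euc_jp_char_length() returns 3 while strlen() returns 6. This
is the same distinction as between character_length() and
octet_length() listed in the History section.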
1. How to use

Before compiling PostgreSQL, run configure with the multibyte option:

        % ./configure --enable-multibyte[=encoding_system]

where encoding_system is one of the following:

        SQL_ASCII       ASCII
        EUC_JP          Japanese EUC
        EUC_CN          Chinese EUC
        EUC_KR          Korean EUC
        EUC_TW          Taiwan EUC
        UNICODE         Unicode (UTF-8)
        MULE_INTERNAL   Mule internal
        LATIN1          ISO 8859-1 English and some European languages
        LATIN2          ISO 8859-2 English and some European languages
        LATIN3          ISO 8859-3 English and some European languages
        LATIN4          ISO 8859-4 English and some European languages
        LATIN5          ISO 8859-5 (Latin/Cyrillic)
        KOI8            KOI8-R
        WIN             Windows CP1251
        ALT             Windows CP866

Example:

        % ./configure --enable-multibyte=EUC_JP

If the encoding system is omitted, SQL_ASCII is assumed.

2. How to set the encoding

The initdb command defines the default encoding for a PostgreSQL
installation. For example:

        % initdb -E EUC_JP

sets the default encoding to EUC_JP (Extended Unix Code for Japanese).
If you prefer longer option strings, you can use "--encoding" instead of
"-E". If neither -E nor --encoding is given, the encoding specified at
compile time is used as the default.

You can create databases that use different encodings:

        % createdb -E EUC_KR korean

This command creates a database named "korean" that uses the EUC_KR
encoding. Another way to accomplish the same thing is to use an SQL
command:

        CREATE DATABASE korean WITH ENCODING = 'EUC_KR';

The pg_database system catalog has an "encoding" column that records the
encoding of each database. You can see which encoding a database uses
with psql -l, or with the \l command inside psql:

$ psql -l
            List of databases
   Database    |  Owner  |   Encoding
---------------+---------+---------------
 euc_cn        | t-ishii | EUC_CN
 euc_jp        | t-ishii | EUC_JP
 euc_kr        | t-ishii | EUC_KR
 euc_tw        | t-ishii | EUC_TW
 mule_internal | t-ishii | MULE_INTERNAL
 regression    | t-ishii | SQL_ASCII
 template1     | t-ishii | EUC_JP
 test          | t-ishii | EUC_JP
 unicode       | t-ishii | UNICODE
(9 rows)

3. Automatic encoding translation between frontend and backend

[Translator's note: "frontend" refers to client programs in general,
such as the psql command interpreter, C programs that use libpq, Perl
programs, or Windows applications connecting through ODBC. "Backend"
refers to the PostgreSQL database server program.]

PostgreSQL supports automatic translation between the frontend and the
backend for some encodings. [Translator's note: automatic translation
here means that when the encodings declared for the frontend and the
backend differ, PostgreSQL translates the data for you as it is
transferred, provided it supports translation between the two
encodings.]

        encoding of backend     available encoding of frontend
        --------------------------------------------------------------------
        EUC_JP                  EUC_JP, SJIS

        EUC_TW                  EUC_TW, BIG5

        LATIN2                  LATIN2, WIN1250

        LATIN5                  LATIN5, WIN, ALT

        MULE_INTERNAL           EUC_JP, SJIS, EUC_KR, EUC_CN,
                                EUC_TW, BIG5, LATIN1 to LATIN5,
                                WIN, ALT, WIN1250

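The table above can be read as a simple lookup. The following C sketch
encodes it as data and checks whether a given backend/frontend pair is
covered. The struct and function names are hypothetical, and treating
identical encodings as always compatible (no translation needed) is an
assumption; nothing here is taken from the PostgreSQL sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* One row of the table: a backend encoding and the frontend encodings
 * it can be translated to and from (NULL-terminated list). */
struct conversion_row
{
    const char *backend;
    const char *frontends[16];
};

static const struct conversion_row conversion_table[] = {
    {"EUC_JP", {"EUC_JP", "SJIS", NULL}},
    {"EUC_TW", {"EUC_TW", "BIG5", NULL}},
    {"LATIN2", {"LATIN2", "WIN1250", NULL}},
    {"LATIN5", {"LATIN5", "WIN", "ALT", NULL}},
    {"MULE_INTERNAL", {"EUC_JP", "SJIS", "EUC_KR", "EUC_CN", "EUC_TW", "BIG5",
                       "LATIN1", "LATIN2", "LATIN3", "LATIN4", "LATIN5",
                       "WIN", "ALT", "WIN1250", NULL}},
};

/* Returns true if the table lists frontend as usable with backend. */
static bool conversion_available(const char *backend, const char *frontend)
{
    size_t rows = sizeof(conversion_table) / sizeof(conversion_table[0]);

    assert(backend != NULL && frontend != NULL);

    /* Assumption: identical encodings need no translation at all. */
    if (strcmp(backend, frontend) == 0)
        return true;

    for (size_t i = 0; i < rows; i++)
    {
        if (strcmp(conversion_table[i].backend, backend) != 0)
            continue;
        for (const char *const *f = conversion_table[i].frontends; *f != NULL; f++)
        {
            if (strcmp(*f, frontend) == 0)
                return true;
        }
    }
    return false;
}
```

For example, conversion_available("EUC_TW", "BIG5") is true, while
conversion_available("EUC_JP", "BIG5") is false because the table has no
such pairing.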
Before automatic encoding translation can take place, you have to tell
PostgreSQL which encoding you want to use in the frontend. There are
several ways to do this:

o Using the \encoding command in the psql command interpreter

\encoding lets you change the frontend encoding on the fly. For example,
to switch the frontend encoding to SJIS, type:

        \encoding SJIS

o Using libpq functions [Translator's note: libpq is the C API library
  for PostgreSQL]

The \encoding command of psql simply calls PQsetClientEncoding() to do
its job.

        int PQsetClientEncoding(PGconn *conn, const char *encoding)

Here conn is a connection to the backend and encoding is the encoding
you want to use. The function returns 0 if the encoding was set
successfully and -1 otherwise. The current encoding of a connection can
be queried with:

        int PQclientEncoding(const PGconn *conn)

Note that this function returns the encoding id (an integer), not the
encoding name string (such as "EUC_JP"). To obtain the encoding name
from an encoding id, call:

        char *pg_encoding_to_char(int encoding_id)

o Using the PGCLIENTENCODING environment variable

If the PGCLIENTENCODING environment variable is set in the frontend's
environment, the backend performs the encoding translation
automatically.

[Translator's note] PostgreSQL 7.0.0 to 7.0.3 have a bug: they do not
recognize this environment variable.

o Using the SET CLIENT_ENCODING TO SQL command

The frontend encoding can be set with the following SQL command:

        SET CLIENT_ENCODING TO 'encoding';

You can also use the SQL92 syntax "SET NAMES" for the same purpose:

        SET NAMES 'encoding';

To query the current frontend encoding, use:

        SHOW CLIENT_ENCODING;

To return to the default encoding, use:

        RESET CLIENT_ENCODING;

[Translator's note] When using the psql command interpreter, this
method is not recommended; use \encoding instead.

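As a rough illustration of the PGCLIENTENCODING method, the C sketch
below shows how a client program might decide which frontend encoding to
request. The function names are hypothetical and are not part of libpq.
Falling back to the database encoding (so that no translation takes
place) when the variable is unset or empty is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Pick the frontend encoding: the environment value wins when it is set
 * and non-empty; otherwise fall back to the database encoding, in which
 * case no translation is needed (assumed behaviour).
 */
static const char *choose_client_encoding(const char *env_value,
                                          const char *db_encoding)
{
    assert(db_encoding != NULL);
    if (env_value != NULL && env_value[0] != '\0')
        return env_value;
    return db_encoding;
}

/* Convenience wrapper that reads PGCLIENTENCODING from the environment. */
static const char *client_encoding_from_env(const char *db_encoding)
{
    return choose_client_encoding(getenv("PGCLIENTENCODING"), db_encoding);
}
```

A program would pass the chosen name to PQsetClientEncoding() or issue
SET CLIENT_ENCODING TO with it, as described above.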
4. About Unicode

Automatic translation between Unicode and other encodings will probably
not be implemented until version 7.1.

5. What happens if the translation is not possible?

Suppose you choose EUC_JP for the backend and LATIN1 for the frontend;
some Japanese characters cannot be translated into LATIN1. In that case,
a character that cannot be represented in LATIN1 is transformed into the
following form:

        (hexadecimal value)

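A minimal C sketch of such a fallback is shown below: it writes the
bytes of one untranslatable character as hexadecimal digits inside
parentheses. The function name is hypothetical, and the exact format
(two lowercase hex digits per byte, no separators) is an assumption, not
something this README specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Write the len bytes of one untranslatable character into out as
 * "(hh...)". Returns the number of characters written, not counting the
 * terminating NUL, or 0 if out is too small.
 */
static size_t format_untranslatable(const unsigned char *ch, size_t len,
                                    char *out, size_t out_size)
{
    size_t needed = 2 * len + 2;    /* two hex digits per byte + parentheses */
    char *p = out;

    assert(ch != NULL && out != NULL);

    if (out_size < needed + 1)      /* +1 for the terminating NUL */
        return 0;

    *p++ = '(';
    for (size_t i = 0; i < len; i++)
        p += snprintf(p, 3, "%02x", (unsigned int) ch[i]);
    *p++ = ')';
    *p = '\0';
    return needed;
}
```

Under these assumptions, the two-byte EUC_JP character 0xc6 0xfc would
be rendered as "(c6fc)".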
6. References

These are good sources for learning about the various encoding
systems.

ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf
        Detailed explanations of EUC_JP, EUC_CN, EUC_KR and EUC_TW
        appear in section 3.2.

Unicode: http://www.unicode.org/
        The home page of Unicode.

RFC 2044
        UTF-8 is defined here.

7. History

May 20, 2000
        * SJIS UDC (NEC selection IBM kanji) support contributed
          by Eiji Tokuya
        * Changes above will appear in 7.0.1

Mar 22, 2000
        * Add new libpq functions PQsetClientEncoding, PQclientEncoding
        * ./configure --with-mb=EUC_JP
          is now deprecated. Use
          ./configure --enable-multibyte=EUC_JP
          instead
        * Add SQL_ASCII regression test case
        * Add SJIS User Defined Character (UDC) support
        * All of the above will appear in 7.0

July 11, 1999
        * Add support for WIN1250 (Windows Czech) as a client encoding
          (contributed by Pavel Behal)
        * Fix some compiler warnings (contributed by Tomoaki Nishiyama)

Mar 23, 1999
        * Add support for KOI8 (KOI8-R), WIN (CP1251), ALT (CP866)
          (thanks to Oleg Broytmann for testing)
        * Fix problem with MB and locale

Jan 26, 1999
        * Add support for Big5 as a frontend encoding
          (you need to create a database with EUC_TW to use Big5)
        * Add regression test case for EUC_TW
          (contributed by Jonah Kuo <jonahkuo@mail.ttn.com.tw>)

Dec 15, 1998
        * Bugs related to SQL_ASCII support fixed

Nov 5, 1998
        * 6.4 release. In this version, pg_database has an "encoding"
          column that represents the database encoding

Jul 22, 1998
        * Determine encoding at initdb/createdb rather than at compile time
        * Support for PGCLIENTENCODING when issuing the COPY command
        * Support for SQL92 syntax "SET NAMES"
        * Support for LATIN2-5
        * Add UNICODE regression test case
        * New test suite for MB
        * Clean up source files

Jun 5, 1998
        * Add support for encoding translation between the backend
          and the frontend
        * New command SET CLIENT_ENCODING etc. added
        * Add support for LATIN1 character set
        * Enhance 8-bit cleanness

April 21, 1998  some enhancements/fixes
        * character_length(), position(), substring() are now aware of
          multi-byte characters
        * Add octet_length()
        * Add --with-mb option to configure
        * New regression tests for EUC_KR
          (contributed by "Soonmyung. Hong" <hong@lunaris.hanmesoft.co.kr>)
        * Add some test cases to the EUC_JP regression test
        * Fix problem in regress/regress.sh in case of System V
        * Fix toupper(), tolower() to handle 8-bit chars

Mar 25, 1998    MB PL2 is incorporated into PostgreSQL 6.3.1

Mar 10, 1998    PL2 released
        * Add regression tests for EUC_JP, EUC_CN and MULE_INTERNAL
        * Add an English document (this file)
        * Fix problems concerning 8-bit single-byte characters

Mar 1, 1998     PL1 released

Appendix:

[Here is good documentation from Pavel Behal explaining how to use
WIN1250 on Windows/ODBC. Please note that Installation step 1) is not
necessary in 6.5.1 -- Tatsuo]

Version: 0.91 for PgSQL 6.5
Author: Pavel Behal
Revised by: Tatsuo Ishii
Email: behal@opf.slu.cz
Licence: The same as PostgreSQL

Sorry for my English and C code, I'm not native :-)

!!!!!!!!!!!!!!!!!!!!!!!!! NO WARRANTY !!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

Installation:
-------------
1) Change three affected files in source directories
   (I don't have time to create proper patch diffs, I don't know how)
2) Compile with locale enabled and multibyte set to LATIN2
3) Set up your installation properly; do not forget to create locale
   variables in your profile (environment). Ex. (may not be exactly true):
        LC_ALL=cs_CZ.ISO8859-2
        LC_COLLATE=cs_CZ.ISO8859-2
        LC_CTYPE=cs_CZ.ISO8859-2
        LC_MONETARY=cs_CZ.ISO8859-2
        LC_NUMERIC=cs_CZ.ISO8859-2
        LC_TIME=cs_CZ.ISO8859-2
4) You have to start the postmaster with locales set!
5) Try it with the Czech language; it has to sort correctly
6) Install the ODBC driver for PgSQL on your MS Windows machine
7) Set up your data source properly. Include this line in your ODBC
   configuration dialog, in the field "Connect Settings:":
        SET CLIENT_ENCODING = 'WIN1250';
8) Now try it again, but in Windows with ODBC.

Description:
------------
- Depends on proper system locales; tested with RH6.0 and Slackware 3.6,
  with the cs_CZ.iso8859-2 locale
- Never try to set the server multibyte database encoding to WIN1250;
  always use LATIN2 instead. There is no WIN1250 locale in Unix
- The WIN1250 encoding is usable only for MS Windows ODBC clients. The
  characters are re-coded on the fly, to be displayed and stored back
  properly

Important:
----------
- it reorders your sort order depending on your LC_... setting, so don't be
  confused by the regression tests; they don't use locale
- "ch" is correctly sorted only in some newer locales (Ex. RH6.0)
- you have to insert money as '162,50' (with comma, in apostrophes!)
- not tested properly
