1- <!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.8 2001/05/03 21:38:44 momjian Exp $ -->
1+ <!-- $Header: /cvsroot/pgsql/doc/src/sgml/charset.sgml,v 2.9 2001/09/09 23:52:12 petere Exp $ -->
22
33<chapter id="charset">
44 <title>Localization</>
5353 <firstterm>Locale</> support refers to an application respecting
5454 cultural preferences regarding alphabets, sorting, number
5555 formatting, etc. <productname>PostgreSQL</> uses the standard ISO
56- C and POSIX-like locale facilities provided by the server operating
56+ C and <acronym>POSIX</acronym>-like locale facilities provided by the server operating
5757 system. For additional information refer to the documentation of your
5858 system.
5959 </para>
@@ -103,27 +103,27 @@ export LANG=sv_SE
103103 <tgroup cols="2">
104104 <tbody>
105105 <row>
106- <entry>LC_COLLATE</>
106+ <entry><envar>LC_COLLATE</></>
107107 <entry>String sort order</>
108108 </row>
109109 <row>
110- <entry>LC_CTYPE</>
110+ <entry><envar>LC_CTYPE</></>
111111 <entry>Character classification (What is a letter? The upper-case equivalent?)</>
112112 </row>
113113 <row>
114- <entry>LC_MESSAGES</>
114+ <entry><envar>LC_MESSAGES</></>
115115 <entry>Language of messages</>
116116 </row>
117117 <row>
118- <entry>LC_MONETARY</>
118+ <entry><envar>LC_MONETARY</></>
119119 <entry>Formatting of currency amounts</>
120120 </row>
121121 <row>
122- <entry>LC_NUMERIC</>
122+ <entry><envar>LC_NUMERIC</></>
123123 <entry>Formatting of numbers</>
124124 </row>
125125 <row>
126- <entry>LC_TIME</>
126+ <entry><envar>LC_TIME</></>
127127 <entry>Formatting of dates and times</>
128128 </row>
129129 </tbody>
@@ -204,7 +204,7 @@ export LANG=sv_SE
204204
205205 <para>
206206 If locale support doesn't work in spite of the explanation above,
207- check that the locale support in your operating system is okay.
207+ check that the locale support in your operating system is correctly configured.
208208 To check whether a given locale is installed and functional you
209209 can use <application>Perl</>, for example. Perl has also support
210210 for locales and if a locale is broken <command>perl -v</> will
@@ -226,9 +226,9 @@ perl: warning: Falling back to the standard locale ("C").
226226
227227 <para>
228228 Check that your locale files are in the right location. Possible
229- locations include: <filename>/usr/lib/locale</filename> (Linux,
230- Solaris), <filename>/usr/share/locale</filename> (Linux),
231- <filename>/usr/lib/nls/loc</filename> (DUX 4.0). Check the locale
229+ locations include: <filename>/usr/lib/locale</filename> (<systemitem class="osname">Linux</>,
230+ <systemitem class="osname">Solaris</>), <filename>/usr/share/locale</filename> (<systemitem class="osname">Linux</>),
231+ <filename>/usr/lib/nls/loc</filename> (<systemitem class="osname">DUX 4.0</>). Check the locale
232232 man page of your system if you are not sure.
233233 </para>
234234
@@ -258,8 +258,8 @@ perl: warning: Falling back to the standard locale ("C").
258258 <para>
259259 Multibyte (<acronym>MB</acronym>) support is intended to allow
260260 <productname>Postgres</productname> to handle
261- multiple-byte character sets such as EUC (Extended Unix Code), Unicode and
262- Mule internal code. With <acronym>MB</acronym> enabled you can use multi-byte
261+ multiple-byte character sets such as <acronym>EUC</> (Extended Unix Code), Unicode and
262+ Mule internal code. With <acronym>MB</acronym> enabled you can use multibyte
263263 character sets in regular expressions (regexp), LIKE, and some
264264 other functions. The default
265265 encoding system is selected while initializing your
@@ -304,63 +304,63 @@ perl: warning: Falling back to the standard locale ("C").
304304 </thead>
305305 <tbody>
306306<row>
307- <entry>SQL_ASCII</entry>
308- <entry>ASCII</entry>
307+ <entry><literal>SQL_ASCII</literal></entry>
308+ <entry><acronym>ASCII</acronym></entry>
309309</row>
310310<row>
311- <entry>EUC_JP</entry>
312- <entry>Japanese EUC</entry>
311+ <entry><literal>EUC_JP</literal></entry>
312+ <entry>Japanese <acronym>EUC</></entry>
313313</row>
314314<row>
315- <entry>EUC_CN</entry>
316- <entry>Chinese EUC</entry>
315+ <entry><literal>EUC_CN</literal></entry>
316+ <entry>Chinese <acronym>EUC</></entry>
317317</row>
318318<row>
319- <entry>EUC_KR</entry>
320- <entry>Korean EUC</entry>
319+ <entry><literal>EUC_KR</literal></entry>
320+ <entry>Korean <acronym>EUC</></entry>
321321</row>
322322<row>
323- <entry>EUC_TW</entry>
324- <entry>Taiwan EUC</entry>
323+ <entry><literal>EUC_TW</literal></entry>
324+ <entry>Taiwan <acronym>EUC</acronym></entry>
325325</row>
326326<row>
327- <entry>UNICODE</entry>
328- <entry>Unicode(UTF-8)</entry>
327+ <entry><literal>UNICODE</literal></entry>
328+ <entry>Unicode (<acronym>UTF</acronym>-8)</entry>
329329</row>
330330<row>
331- <entry>MULE_INTERNAL</entry>
331+ <entry><literal>MULE_INTERNAL</literal></entry>
332332 <entry>Mule internal</entry>
333333</row>
334334<row>
335- <entry>LATIN1</entry>
335+ <entry><literal>LATIN1</literal></entry>
336336 <entry>ISO 8859-1 English and some European languages</entry>
337337</row>
338338<row>
339- <entry>LATIN2</entry>
339+ <entry><literal>LATIN2</literal></entry>
340340 <entry>ISO 8859-2 English and some European languages</entry>
341341</row>
342342<row>
343- <entry>LATIN3</entry>
343+ <entry><literal>LATIN3</literal></entry>
344344 <entry>ISO 8859-3 English and some European languages</entry>
345345</row>
346346<row>
347- <entry>LATIN4</entry>
347+ <entry><literal>LATIN4</literal></entry>
348348 <entry>ISO 8859-4 English and some European languages</entry>
349349</row>
350350<row>
351- <entry>LATIN5</entry>
351+ <entry><literal>LATIN5</literal></entry>
352352 <entry>ISO 8859-5 English and some European languages</entry>
353353</row>
354354<row>
355- <entry>KOI8</entry>
356- <entry>KOI8-R(U)</entry>
355+ <entry><literal>KOI8</literal></entry>
356+ <entry><acronym>KOI</acronym>8-R(U)</entry>
357357</row>
358358<row>
359- <entry>WIN</entry>
359+ <entry><literal>WIN</literal></entry>
360360 <entry>Windows CP1251</entry>
361361</row>
362362<row>
363- <entry>ALT</entry>
363+ <entry><literal>ALT</literal></entry>
364364 <entry>Windows CP866</entry>
365365</row>
366366 </tbody>
@@ -395,7 +395,7 @@ perl: warning: Falling back to the standard locale ("C").
395395% initdb -E EUC_JP
396396 </programlisting>
397397
398- sets the default encoding to EUC_JP (Extended Unix Code for Japanese).
398+ sets the default encoding to <literal>EUC_JP</literal> (Extended Unix Code for Japanese).
399399 Note that you can use "--encoding" instead of "-E" if you prefer
400400 to type longer option strings.
401401 If no -E or --encoding option is given, the encoding
@@ -409,7 +409,7 @@ perl: warning: Falling back to the standard locale ("C").
409409% createdb -E EUC_KR korean
410410 </programlisting>
411411
412- will create a database named "korean" with EUC_KR encoding.
412+ will create a database named <database>korean</database> with <literal>EUC_KR</literal> encoding.
413413 Another way to accomplish this is to use a SQL command:
414414
415415 <programlisting>
@@ -419,7 +419,7 @@ CREATE DATABASE korean WITH ENCODING = 'EUC_KR';
419419 The encoding for a database is represented as an
420420 <firstterm>encoding column</firstterm> in the
421421 <literal>pg_database</literal> system catalog.
422- You can see that by using -l or \l of psql
422+ You can see that by using <option>-l</option> or <command>\l</command> of <command>psql</command>
423423 command.
424424
425425 <programlisting>
@@ -462,26 +462,26 @@ $ psql -l
462462 </thead>
463463 <tbody>
464464<row>
465- <entry>EUC_JP</entry>
466- <entry>EUC_JP, SJIS</entry>
465+ <entry><literal>EUC_JP</literal></entry>
466+ <entry><literal>EUC_JP</literal>, <literal>SJIS</literal></entry>
467467</row>
468468<row>
469- <entry>EUC_TW</entry>
470- <entry>EUC_TW, BIG5</entry>
469+ <entry><literal>EUC_TW</literal></entry>
470+ <entry><literal>EUC_TW</literal>, <literal>BIG5</literal></entry>
471471</row>
472472<row>
473- <entry>LATIN2</entry>
474- <entry>LATIN2, WIN1250</entry>
473+ <entry><literal>LATIN2</literal></entry>
474+ <entry><literal>LATIN2</literal>, <literal>WIN1250</literal></entry>
475475</row>
476476<row>
477- <entry>LATIN5</entry>
478- <entry>LATIN5, WIN, ALT</entry>
477+ <entry><literal>LATIN5</literal></entry>
478+ <entry><literal>LATIN5</literal>, <literal>WIN</literal>, <literal>ALT</literal></entry>
479479</row>
480480<row>
481- <entry>MULE_INTERNAL</entry>
482- <entry>EUC_JP, SJIS, EUC_KR, EUC_CN,
483- EUC_TW, BIG5, LATIN1 to LATIN5,
484- WIN, ALT, WIN1250</entry>
481+ <entry><literal>MULE_INTERNAL</literal></entry>
482+ <entry><literal>EUC_JP</literal>, <literal>SJIS</literal>, <literal>EUC_KR</literal>, <literal>EUC_CN</literal>,
483+ <literal>EUC_TW</literal>, <literal>BIG5</literal>, <literal>LATIN1</literal> to <literal>LATIN5</literal>,
484+ <literal>WIN</literal>, <literal>ALT</literal>, <literal>WIN1250</literal></entry>
485485</row>
486486 </tbody>
487487 </tgroup>
@@ -501,7 +501,7 @@ $ psql -l
501501<application>psql</application>.
502502<command>\encoding</command> allows you to change frontend
503503encoding on the fly. For
504- example, to change the encoding to SJIS, type:
504+ example, to change the encoding to <literal>SJIS</literal>, type:
505505
506506<programlisting>
507507\encoding SJIS
@@ -511,9 +511,9 @@ $ psql -l
511511
512512 <listitem>
513513 <para>
514- Using libpq functions.
514+ Using <application>libpq</> functions.
515515<command>\encoding</command> actually calls
516- PQsetClientEncoding() for its purpose.
516+ <function>PQsetClientEncoding()</function> for its purpose.
517517
518518<programlisting>
519519int PQsetClientEncoding(PGconn *<replaceable>conn</replaceable>, const char *<replaceable>encoding</replaceable>)
@@ -530,7 +530,7 @@ int PQclientEncoding(const PGconn *<replaceable>conn</replaceable>)
530530</programlisting>
531531
532532Note that it returns the "encoding id," not the encoding symbol string
533- such as "EUC_JP." To convert an encoding id to an encoding symbol, you
533+ such as <literal>EUC_JP</literal>. To convert an encoding id to an encoding symbol, you
534534can use:
535535
536536<programlisting>
@@ -591,18 +591,18 @@ RESET CLIENT_ENCODING;
591591 encodings has been supported since PostgreSQL 7.1.
592592 Because this requires huge conversion tables, it's not enabled by default.
593593 To enable this feature, run configure with the
594- --enable-unicode-conversion option. Note that this requires
595- the --enable-multibyte option also.
594+ <option>--enable-unicode-conversion</option> option. Note that this requires
595+ the <option>--enable-multibyte</option> option also.
596596 </para>
597597 </sect2>
598598
599599 <sect2>
600600 <title>What happens if the translation is not possible?</title>
601601
602602 <para>
603- Suppose you choose EUC_JP for the backend, LATIN1 for the frontend,
604- then some Japanese characters could not be translated into LATIN1. In
605- this case, a letter that cannot be represented in the LATIN1 character set
603+ Suppose you choose <literal>EUC_JP</literal> for the backend, <literal>LATIN1</literal> for the frontend,
604+ then some Japanese characters could not be translated into <literal>LATIN1</literal>. In
605+ this case, a letter that cannot be represented in the <literal>LATIN1</literal> character set
606606 would be transformed as:
607607
608608 <programlisting>
@@ -623,22 +623,22 @@ RESET CLIENT_ENCODING;
623623 <para>
624624<ulink url="ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf">
625625 ftp://ftp.ora.com/pub/examples/nutshell/ujip/doc/cjk.inf</ulink>
626- Detailed explanations of EUC_JP, EUC_CN, EUC_KR, EUC_TW
626+ Detailed explanations of <literal>EUC_JP</literal>, <literal>EUC_CN</literal>, <literal>EUC_KR</literal>, <literal>EUC_TW</literal>
627627appear in section 3.2.
628628 </para>
629629 </listitem>
630630
631631 <listitem>
632632 <para>
633633Unicode: <ulink url="http://www.unicode.org/">http://www.unicode.org/</ulink>
634- The homepage of UNICODE.
634+ The homepage of Unicode.
635635 </para>
636636 </listitem>
637637
638638 <listitem>
639639 <para>
640640<literal>RFC 2044</literal>
641- UTF-8 is defined here.
641+ <literal>UTF</literal>-8 is defined here.
642642 </para>
643643 </listitem>
644644 </itemizedlist>
@@ -763,7 +763,8 @@ Sorry for my Eglish and C code, I'm not native :-)
763763 <listitem>
764764 <para>
765765Success depends on proper system locales. This has been tested
766- with RH6.0 and Slackware 3.6, with cs_CZ.iso8859-2 locale.
766+ with <systemitem class="osname">Red Hat 6.0</> and <systemitem
767+ class="osname">Slackware 3.6</>, with <literal>cs_CZ.iso8859-2</literal> locale.
767768 </para>
768769 </listitem>
769770
@@ -777,7 +778,7 @@ Sorry for my Eglish and C code, I'm not native :-)
777778
778779 <listitem>
779780 <para>
780- WIN1250 encoding is useable only for M$W ODBC clients. The
781+ WIN1250 encoding is usable only for Windows ODBC clients. The
781782characters are recoded on the fly, to be displayed and stored
782783back properly.
783784 </para>
@@ -864,7 +865,7 @@ LC_TIME=cs_CZ.ISO8859-2
864865
865866 <step>
866867 <para>
867- Install ODBC driver for PgSQL on your M$ Windows machine.
868+ Install ODBC driver for <productname>PostgreSQL</productname> on your Windows machine.
868869 </para>
869870 </step>
870871
@@ -953,7 +954,7 @@ HostCharset <replaceable>host_spec</> <replaceable>host_charset</>
953954 cannot use different encodings on the same host at the same
954955 time. It is also inconvenient when you boot your client hosts into
955956 multiple operating systems. Nevertheless, when these restrictions are
956- not limiting and you do not need multi-byte characters than it is a
957+ not limiting and you do not need multibyte characters than it is a
957958 simple and effective solution.
958959 </para>
959960 </sect1>