1515 Using the locale features of the operating system to provide
1616 locale-specific collation order, number formatting, translated
1717 messages, and other aspects.
18+ This is covered in <xref linkend="locale"> and
19+ <xref linkend="collation">.
1820 </para>
1921 </listitem>
2022
2325 Providing a number of different character sets to support storing text
2426 in all kinds of languages, and providing character set translation
2527 between client and server.
28+ This is covered in <xref linkend="multibyte">.
2629 </para>
2730 </listitem>
2831 </itemizedlist>
@@ -138,9 +141,12 @@ initdb --locale=sv_SE
138141 fixed when the database is created. You can use different settings
139142 for different databases, but once a database is created, you cannot
140143 change them for that database anymore. <literal>LC_COLLATE</literal>
141- and <literal>LC_CTYPE</literal> are these type of categories. They affect
144+ and <literal>LC_CTYPE</literal> are these categories. They affect
142145 the sort order of indexes, so they must be kept fixed, or indexes on
143- text columns would become corrupt. The default values for these
146+ text columns would become corrupt.
147+ (But you can alleviate this restriction using collations, as discussed
148+ in <xref linkend="collation">.)
149+ The default values for these
144150 categories are determined when <command>initdb</command> is run, and
145151 those values are used when new databases are created, unless
146152 specified otherwise in the <command>CREATE DATABASE</command> command.
@@ -153,7 +159,7 @@ initdb --locale=sv_SE
153159 linkend="runtime-config-client-format"> for details). The values
154160 that are chosen by <command>initdb</command> are actually only written
155161 into the configuration file <filename>postgresql.conf</filename> to
156- serve as defaults when the server is started. If you disable these
162+ serve as defaults when the server is started. If you remove these
157163 assignments from <filename>postgresql.conf</filename> then the
158164 server will inherit the settings from its execution environment.
159165 </para>
@@ -308,66 +314,69 @@ initdb --locale=sv_SE
308314 <title>Collation Support</title>
309315
310316 <para>
311- The collation support allows specifying the sort order and certain
312- other locale aspects of data per column or per operation at run
313- time. This alleviates the problem that the
317+ The collation feature allows specifying the sort order and certain
318+ other locale aspects of data per-column, or even per-operation.
319+ This alleviates the restriction that the
314320 <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol> settings
315321 of a database cannot be changed after its creation.
316322 </para>
317323
318324 <note>
319325 <para>
320- The collation support feature is currently only known to work on
321- Linux/glibc and Mac OS X platforms.
326+ Collation support is currently only known to work on
327+ Linux (glibc) and Mac OS X platforms.
322328 </para>
323329 </note>
324330
325331 <sect2>
326332 <title>Concepts</title>
327333
328334 <para>
329- Conceptually, every datum of a collatable data type has a
330- collation. (Collatable data types in the base system are
335+ Conceptually, every expression of a collatable data type has a
336+ collation. (The built-in collatable data types are
331337 <type>text</type>, <type>varchar</type>, and <type>char</type>.
332338 User-defined base types can also be marked collatable.) If the
333- datum is a column reference, the collation of the datum is the
334- defined collation of the column. If the datum is a constant, the
339+ expression is a column reference, the collation of the expression is the
340+ defined collation of the column. If the expression is a constant, the
335341 collation is the default collation of the data type of the
336- constant. The collation of more complex expressions is derived
337- from the input collations as described below.
342+ constant. The collation of a more complex expression is derived
343+ from the collations of its inputs, as described below.
338344 </para>
339345
340346 <para>
341- The collation of a datum can also be the <quote>default</quote>
342- collation, which reverts to the locale settings defined for the
343- database. In some cases, a datum can also have no known
347+ The collation of an expression can be the <quote>default</quote>
348+ collation, which means the locale settings defined for the
349+ database. In some cases, an expression can also have no known
344350 collation. In such cases, ordering operations and other
345351 operations that need to know the collation will fail.
346352 </para>
347353
348354 <para>
349355 When the database system has to perform an ordering or a
350- comparison, it considers the collation of the input data. This
351- happens in two situations: an <literal>ORDER BY</literal> clause
352- and a function or operator call such as <literal>&lt;</literal>.
353- The collation to apply for the performance of the <literal>ORDER
354- BY</literal> clause is simply the collation of the sort key. The
355- collation to apply for a function or operator call is derived from
356- the arguments, as described below. Additionally, collations are
357- taken into account by functions that convert between lower and
358- upper case letters, that is, <function>lower</function>,
359- <function>upper</function>, and <function>initcap</function>.
356+ comparison, it uses the collation of the input expression. This
357+ happens, for example, with <literal>ORDER BY</literal> clauses
358+ and function or operator calls such as <literal>&lt;</literal>.
359+ The collation to apply for an <literal>ORDER BY</literal> clause
360+ is simply the collation of the sort key. The collation to apply for a
361+ function or operator call is derived from the arguments, as described
362+ below. In addition to comparison operators, collations are taken into
363+ account by functions that convert between lower and upper case
364+ letters, such as <function>lower</>, <function>upper</>, and
365+ <function>initcap</>.
360366 </para>
361367
362368 <para>
363- For a function call, the collation that is derived from combining
364- the argument collations is both used for performing any
365- comparisons or ordering and for the collation of the function
366- result, if the result type is collatable.
369+ For a function or operator call, the collation that is derived by
370+ examining the argument collations is used at run time for performing
371+ the specified operation. If the result of the function or operator
372+ call is of a collatable data type, the collation is also used at parse
373+ time as the defined collation of the function or operator expression,
374+ in case there is a surrounding expression that requires knowledge of
375+ its collation.
367376 </para>
368377
369378 <para>
370- The <firstterm>collation derivation</firstterm> of a datum can be
379+ The <firstterm>collation derivation</firstterm> of an expression can be
371380 implicit or explicit. This distinction affects how collations are
372381 combined when multiple different collations appear in an
373382 expression. An explicit collation derivation arises when a
@@ -379,18 +388,18 @@ initdb --locale=sv_SE
379388 <orderedlist>
380389 <listitem>
381390 <para>
382- If any input item has an explicit collation derivation, then
383- all explicitly derived collations among the input items must be
384- the same, otherwise an error is raised. If an explicitly
391+ If any input expression has an explicit collation derivation, then
392+ all explicitly derived collations among the input expressions must be
393+ the same, otherwise an error is raised. If any explicitly
385394 derived collation is present, that is the result of the
386395 collation combination.
387396 </para>
388397 </listitem>
389398
390399 <listitem>
391400 <para>
392- Otherwise, all input items must have the same implicit
393- collation derivation or the default collation. If an
401+ Otherwise, all input expressions must have the same implicit
402+ collation derivation or the default collation. If any
394403 implicitly derived collation is present, that is the result of
395404 the collation combination. Otherwise, the result is the
396405 default collation.
@@ -428,19 +437,19 @@ SELECT a || ('foo' COLLATE "y") FROM test1;
428437 A collation is an SQL schema object that maps an SQL name to
429438 operating system locales. In particular, it maps to a combination
430439 of <symbol>LC_COLLATE</symbol> and <symbol>LC_CTYPE</symbol>. (As
431- the name would indicate, the main purpose of a collation is to set
440+ the name would suggest, the main purpose of a collation is to set
432441 <symbol>LC_COLLATE</symbol>, which controls the sort order. But
433442 it is rarely necessary in practice to have an
434443 <symbol>LC_CTYPE</symbol> setting that is different from
435444 <symbol>LC_COLLATE</symbol>, so it is more convenient to collect
436445 these under one concept than to create another infrastructure for
437- setting <symbol>LC_CTYPE</symbol> per datum.) Also, a collation
438- is tied to a character encoding. The same collation name may
439- exist for different encodings.
446+ setting <symbol>LC_CTYPE</symbol> per expression.) Also, a collation
447+ is tied to a character set encoding (see <xref linkend="multibyte">).
448+ The same collation name may exist for different encodings.
440449 </para>
441450
442451 <para>
443- When a database system is initialized, <command>initdb</command>
452+ When a database cluster is initialized, <command>initdb</command>
444453 populates the system catalog <literal>pg_collation</literal> with
445454 collations based on all the locales it finds on the operating
446455 system at the time. For example, the operating system might
@@ -463,8 +472,19 @@ SELECT a || ('foo' COLLATE "y") FROM test1;
463472 collation may be created using
464473 the <xref linkend="sql-createcollation"> command. That command
465474 can also be used to create a new collation from an existing
466- collation, which can be useful to be able to use operating-system
467- independent collation names in applications.
475+ collation, which can be useful to be able to use
476+ operating-system-independent collation names in applications.
477+ </para>
478+
479+ <para>
480+ Within any particular database, only collations that use that
481+ database's encoding are of interest. Other entries in
482+ <literal>pg_collation</literal> are ignored. Thus, a stripped collation
483+ name such as <literal>de_DE</literal> can be considered unique
484+ within a given database even though it would not be unique globally.
485+ Use of the stripped collation names is recommendable, since it will
486+ make one less thing you need to change if you decide to change to
487+ another database encoding.
468488 </para>
469489 </sect2>
470490 </sect1>