@@ -68,7 +68,7 @@ initdb --locale=sv_SE
6868 <para>
6969 This example for Unix systems sets the locale to Swedish
7070 (<literal>sv</>) as spoken
71- in Sweden (<literal>SE</>). Other possibilities mightbe
71+ in Sweden (<literal>SE</>). Other possibilities mightinclude
7272 <literal>en_US</> (U.S. English) and <literal>fr_CA</> (French
7373 Canadian). If more than one character set can be used for a
7474 locale then the specifications can take the form
@@ -133,7 +133,8 @@ initdb --locale=sv_SE
133133
134134 <para>
135135 If you want the system to behave as if it had no locale support,
136- use the special locale <literal>C</> or <literal>POSIX</>.
136+ use the special locale name <literal>C</>, or equivalently
137+ <literal>POSIX</>.
137138 </para>
138139
139140 <para>
@@ -257,7 +258,9 @@ initdb --locale=sv_SE
257258 operator classes exist. These allow the creation of an index that
258259 performs a strict character-by-character comparison, ignoring
259260 locale comparison rules. Refer to <xref linkend="indexes-opclass">
260- for more information.
261+ for more information. Another approach is to create indexes using
262+ the <literal>C</> collation, as discussed in
263+ <xref linkend="collation">.
261264 </para>
262265 </sect2>
263266
@@ -321,21 +324,15 @@ initdb --locale=sv_SE
321324 of a database cannot be changed after its creation.
322325 </para>
323326
324- <note>
325- <para>
326- Collation support is currently only known to work on
327- Linux (glibc) and Mac OS X platforms.
328- </para>
329- </note>
330-
331327 <sect2>
332328 <title>Concepts</title>
333329
334330 <para>
335331 Conceptually, every expression of a collatable data type has a
336332 collation. (The built-in collatable data types are
337333 <type>text</type>, <type>varchar</type>, and <type>char</type>.
338- User-defined base types can also be marked collatable.) If the
334+ User-defined base types can also be marked collatable, and of course
335+ a domain over a collatable data type is collatable.) If the
339336 expression is a column reference, the collation of the expression is the
340337 defined collation of the column. If the expression is a constant, the
341338 collation is the default collation of the data type of the
@@ -346,8 +343,8 @@ initdb --locale=sv_SE
346343 <para>
347344 The collation of an expression can be the <quote>default</quote>
348345 collation, which means the locale settings defined for the
349- database.In some cases, an expression can also have no known
350- collation . In such cases, ordering operations and other
346+ database.It is also possible for an expression's collation to be
347+ indeterminate . In such cases, ordering operations and other
351348 operations that need to know the collation will fail.
352349 </para>
353350
@@ -379,7 +376,7 @@ initdb --locale=sv_SE
379376 The <firstterm>collation derivation</firstterm> of an expression can be
380377 implicit or explicit. This distinction affects how collations are
381378 combined when multiple different collations appear in an
382- expression. An explicit collation derivationarises when a
379+ expression. An explicit collation derivationoccurs when a
383380 <literal>COLLATE</literal> clause is used; all other collation
384381 derivations are implicit. When multiple collations need to be
385382 combined, for example in a function call, the following rules are
@@ -399,34 +396,90 @@ initdb --locale=sv_SE
399396 <listitem>
400397 <para>
401398 Otherwise, all input expressions must have the same implicit
402- collation derivation or the default collation. If any
403- implicitly derived collation is present, that is the result of
404- the collation combination. Otherwise, the result is the
405- default collation.
399+ collation derivation or the default collation. If any non-default
400+ collation is present, that is the result of the collation combination.
401+ Otherwise, the result is the default collation.
402+ </para>
403+ </listitem>
404+
405+ <listitem>
406+ <para>
407+ If there are conflicting non-default implicit collations among the
408+ input expressions, then the combination is deemed to have indeterminate
409+ collation. This is not an error condition unless the particular
410+ function being invoked requires knowledge of the collation it should
411+ apply. If it does, an error will be raised at run-time.
406412 </para>
407413 </listitem>
408414 </orderedlist>
409415
410- For example,take this table definition:
416+ For example,consider this table definition:
411417<programlisting>
412418CREATE TABLE test1 (
413- a text COLLATE "x",
419+ a text COLLATE "de_DE",
420+ b text COLLATE "es_ES",
414421 ...
415422);
416423</programlisting>
417424
418425 Then in
419426<programlisting>
420- SELECT a|| 'foo' FROM test1;
427+ SELECT a< 'foo' FROM test1;
421428</programlisting>
422- theresult collation of the <literal>|| </literal>operator is
423- <literal>"x" </literal> becauseit combines an implicitly derived
424- collation with the default collation. But in
429+ the <literal>< </literal>comparison is performed according to
430+ <literal>de_DE </literal>rules, becausethe expression combines an
431+ implicitly derived collation with the default collation. But in
425432<programlisting>
426- SELECT a || ('foo' COLLATE "y") FROM test1;
433+ SELECT a < ('foo' COLLATE "fr_FR") FROM test1;
434+ </programlisting>
435+ the comparison is performed using <literal>fr_FR</literal> rules,
436+ because the explicit collation derivation overrides the implicit one.
437+ Furthermore, given
438+ <programlisting>
439+ SELECT a < b FROM test1;
440+ </programlisting>
441+ the parser cannot determine which collation to apply, since the
442+ <structfield>a</> and <structfield>b</> columns have conflicting
443+ implicit collations. Since the <literal><</literal> operator
444+ does need to know which collation to use, this will result in an
445+ error. The error can be resolved by attaching an explicit collation
446+ specifier to either input expression, thus:
447+ <programlisting>
448+ SELECT a < b COLLATE "de_DE" FROM test1;
449+ </programlisting>
450+ or equivalently
451+ <programlisting>
452+ SELECT a COLLATE "de_DE" < b FROM test1;
453+ </programlisting>
454+ On the other hand, the structurally similar case
455+ <programlisting>
456+ SELECT a || b FROM test1;
457+ </programlisting>
458+ does not result in an error, because the <literal>||</> operator
459+ does not care about collations: its result is the same regardless
460+ of the collation.
461+ </para>
462+
463+ <para>
464+ The collation assigned to a function or operator's combined input
465+ expressions is also considered to apply to the function or operator's
466+ result, if the function or operator delivers a result of a collatable
467+ data type. So, in
468+ <programlisting>
469+ SELECT * FROM test1 ORDER BY a || 'foo';
470+ </programlisting>
471+ the ordering will be done according to <literal>de_DE</literal> rules.
472+ But this query:
473+ <programlisting>
474+ SELECT * FROM test1 ORDER BY a || b;
475+ </programlisting>
476+ results in an error, because even though the <literal>||</> operator
477+ doesn't need to know a collation, the <literal>ORDER BY</> clause does.
478+ As before, the conflict can be resolved with an explicit collation
479+ specifier:
480+ <programlisting>
481+ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
427482</programlisting>
428- the result collation is <literal>"y"</literal> because the explicit
429- collation derivation overrides the implicit one.
430483 </para>
431484 </sect2>
432485
@@ -449,7 +502,22 @@ SELECT a || ('foo' COLLATE "y") FROM test1;
449502 </para>
450503
451504 <para>
452- When a database cluster is initialized, <command>initdb</command>
505+ On all platforms, the collations named <literal>default</>,
506+ <literal>C</>, and <literal>POSIX</> are available. Additional
507+ collations may be available depending on operating system support.
508+ The <literal>default</> collation selects the <symbol>LC_COLLATE</symbol>
509+ and <symbol>LC_CTYPE</symbol> values specified at database creation time.
510+ The <literal>C</> and <literal>POSIX</> collations both specify
511+ <quote>traditional C</> behavior, in which only the ASCII letters
512+ <quote><literal>A</></quote> through <quote><literal>Z</></quote>
513+ are treated as letters, and sorting is done strictly by character
514+ code byte values.
515+ </para>
516+
517+ <para>
518+ If the operating system provides support for using multiple locales
519+ within a single program (<function>newlocale</> and related functions),
520+ then when a database cluster is initialized, <command>initdb</command>
453521 populates the system catalog <literal>pg_collation</literal> with
454522 collations based on all the locales it finds on the operating
455523 system at the time. For example, the operating system might
@@ -484,7 +552,21 @@ SELECT a || ('foo' COLLATE "y") FROM test1;
484552 within a given database even though it would not be unique globally.
485553 Use of the stripped collation names is recommendable, since it will
486554 make one less thing you need to change if you decide to change to
487- another database encoding.
555+ another database encoding. Note however that the <literal>default</>,
556+ <literal>C</>, and <literal>POSIX</> collations can be used
557+ regardless of the database encoding.
558+ </para>
559+
560+ <para>
561+ <productname>PostgreSQL</productname> considers distinct collation
562+ objects to be incompatible even when they have identical properties.
563+ Thus for example,
564+ <programlisting>
565+ SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
566+ </programlisting>
567+ will draw an error even though the <literal>C</> and <literal>POSIX</>
568+ collations have identical behaviors. Mixing stripped and non-stripped
569+ collation names is therefore not recommended.
488570 </para>
489571 </sect2>
490572 </sect1>