23.1. Locale Support

Locale support refers to an application respecting cultural preferences regarding alphabets, sorting, number formatting, etc. PostgreSQL uses the standard ISO C and POSIX locale facilities provided by the server operating system. For additional information refer to the documentation of your system.

23.1.1. Overview

Locale support is automatically initialized when a database cluster is created using initdb. initdb will initialize the database cluster with the locale setting of its execution environment by default, so if your system is already set to use the locale that you want in your database cluster then there is nothing else you need to do. If you want to use a different locale (or you are not sure which locale your system is set to), you can instruct initdb exactly which locale to use by specifying the --locale option. For example:

initdb --locale=sv_SE

This example for Unix systems sets the locale to Swedish (sv) as spoken in Sweden (SE). Other possibilities might include en_US (U.S. English) and fr_CA (French Canadian). If more than one character set can be used for a locale then the specifications can take the form language_territory.codeset. For example, fr_BE.UTF-8 represents the French language (fr) as spoken in Belgium (BE), with a UTF-8 character set encoding.

What locales are available on your system under what names depends on what was provided by the operating system vendor and what was installed. On most Unix systems, the command locale -a will provide a list of available locales. Windows uses more verbose locale names, such as German_Germany or Swedish_Sweden.1252, but the principles are the same.

Occasionally it is useful to mix rules from several locales, e.g., use English collation rules but Spanish messages. To support that, a set of locale subcategories exist that control only certain aspects of the localization rules:

LC_COLLATE     String sort order
LC_CTYPE       Character classification (What is a letter? Its upper-case equivalent?)
LC_MESSAGES    Language of messages
LC_MONETARY    Formatting of currency amounts
LC_NUMERIC     Formatting of numbers
LC_TIME        Formatting of dates and times

The category names translate into names of initdb options to override the locale choice for a specific category. For instance, to set the locale to French Canadian, but use U.S. rules for formatting currency, use initdb --locale=fr_CA --lc-monetary=en_US.

If you want the system to behave as if it had no locale support, use the special locale name C, or equivalently POSIX.

Some locale categories must have their values fixed when the database is created. You can use different settings for different databases, but once a database is created, you cannot change them for that database anymore. LC_COLLATE and LC_CTYPE are these categories. They affect the sort order of indexes, so they must be kept fixed, or indexes on text columns would become corrupt. (But you can alleviate this restriction using collations, as discussed in Section 23.2.) The default values for these categories are determined when initdb is run, and those values are used when new databases are created, unless specified otherwise in the CREATE DATABASE command.
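
As a rough illustration of overriding the cluster defaults at database creation time, the following sketch creates a database with its own LC_COLLATE and LC_CTYPE and then lists the values recorded for each database. The database name sales_sv is hypothetical, and the locale name sv_SE.UTF-8 is an assumption that must match a locale installed on the server's operating system.

```sql
-- Hypothetical database name; the locale name is an assumption and must
-- exist on the server's operating system.
CREATE DATABASE sales_sv
    TEMPLATE template0          -- template0 permits locale settings that differ from the template's
    ENCODING 'UTF8'
    LC_COLLATE 'sv_SE.UTF-8'
    LC_CTYPE 'sv_SE.UTF-8';

-- Show the collation and character classification locales fixed for each database.
SELECT datname, datcollate, datctype
FROM pg_database
ORDER BY datname;
```

Once the database exists, these two values cannot be altered for it; choosing a different locale means creating a new database.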

The other locale categories can be changed whenever desired by setting the server configuration parameters that have the same name as the locale categories (see Section 19.11.2 for details). The values that are chosen by initdb are actually only written into the configuration file postgresql.conf to serve as defaults when the server is started. If you remove these assignments from postgresql.conf then the server will inherit the settings from its execution environment.
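
A minimal sketch of changing these categories at run time is shown below. The locale names are assumptions and must be installed on the server, otherwise the SET commands are rejected.

```sql
-- Session-level changes; the locale names are assumptions.
SET lc_monetary = 'en_US.UTF-8';
SET lc_time = 'fr_CA.UTF-8';

-- Confirm the value now in effect for this session.
SHOW lc_monetary;

-- Persist a new server-wide default (requires superuser privileges),
-- then ask the server to reload its configuration.
ALTER SYSTEM SET lc_monetary = 'en_US.UTF-8';
SELECT pg_reload_conf();
```

Session-level SET affects only the current connection, while ALTER SYSTEM changes the default that later sessions start with.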

Note that the locale behavior of the server is determined by the environment variables seen by the server, not by the environment of any client. Therefore, be careful to configure the correct locale settings before starting the server. A consequence of this is that if client and server are set up in different locales, messages might appear in different languages depending on where they originated.

Note

When we speak of inheriting the locale from the execution environment, this means the following on most operating systems: For a given locale category, say the collation, the following environment variables are consulted in this order until one is found to be set: LC_ALL, LC_COLLATE (or the variable corresponding to the respective category), LANG. If none of these environment variables are set then the locale defaults to C.

Some message localization libraries also look at the environment variable LANGUAGE which overrides all other locale settings for the purpose of setting the language of messages. If in doubt, please refer to the documentation of your operating system, in particular the documentation about gettext.

To enable messages to be translated to the user's preferred language, NLS must have been selected at build time (configure --enable-nls). All other locale support is built in automatically.

23.1.2. Behavior

The locale settings influence the following SQL features:

  • Sort order in queries using ORDER BY or the standard comparison operators on textual data

  • The upper, lower, and initcap functions

  • Pattern matching operators (LIKE, SIMILAR TO, and POSIX-style regular expressions); locales affect both case insensitive matching and the classification of characters by character-class regular expressions

  • The to_char family of functions

  • The ability to use indexes with LIKE clauses
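
The queries below are a small sketch of where these effects show up. The sample values are made up for illustration, and the results of every query except the one using COLLATE "C" depend on the locale settings of the database and session in which they run.

```sql
-- Sort order under the database's default collation.
SELECT word
FROM (VALUES ('apple'), ('Banana'), ('cherry')) AS t(word)
ORDER BY word;

-- The same data in plain byte order: upper-case 'Banana' sorts first.
SELECT word
FROM (VALUES ('apple'), ('Banana'), ('cherry')) AS t(word)
ORDER BY word COLLATE "C";

-- Case conversion follows the character classification rules (LC_CTYPE).
SELECT upper('straße'), initcap('élan vital');

-- to_char: the TM prefix produces localized day and month names (lc_time);
-- L is the currency symbol (lc_monetary), G and D are the group and
-- decimal separators (lc_numeric).
SELECT to_char(current_date, 'TMDay, DD TMMonth YYYY'),
       to_char(1234.5, 'L9G999D99');
```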

The drawback of using locales other than C or POSIX in PostgreSQL is the performance impact. Such locales slow character handling and prevent ordinary indexes from being used by LIKE. For this reason use locales only if you actually need them.

As a workaround to allow PostgreSQL to use indexes with LIKE clauses under a non-C locale, several custom operator classes exist. These allow the creation of an index that performs a strict character-by-character comparison, ignoring locale comparison rules. Refer to Section 11.10 for more information. Another approach is to create indexes using the C collation, as discussed in Section 23.2.
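
Both workarounds can be sketched as follows. The table customers and the index names are hypothetical, and whether the planner actually chooses either index depends on the data and statistics.

```sql
-- Hypothetical table used only for illustration.
CREATE TABLE customers (
    id   integer PRIMARY KEY,
    name text
);

-- Option 1: an operator class that compares strictly character by character.
CREATE INDEX customers_name_pattern_idx
    ON customers (name text_pattern_ops);

-- Option 2: an index built with the C collation.
CREATE INDEX customers_name_c_idx
    ON customers (name COLLATE "C");

-- A left-anchored pattern of the kind these indexes are meant to support.
EXPLAIN SELECT * FROM customers WHERE name LIKE 'Smi%';
```

In practice only one of the two indexes would normally be created; they are shown together here for comparison.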

23.1.3. Selecting Locales

Locales can be selected in different scopes depending on requirements. The above overview showed how locales are specified usinginitdb to set the defaults for the entire cluster. The following list shows where locales can be selected. Each item provides the defaults for the subsequent items, and each lower item allows overriding the defaults on a finer granularity.

  1. As explained above, the environment of the operating system provides the defaults for the locales of a newly initialized database cluster. In many cases, this is enough: if the operating system is configured for the desired language/territory, by default PostgreSQL will also behave according to that locale.

  2. As shown above, command-line options for initdb specify the locale settings for a newly initialized database cluster. Use this if the operating system does not have the locale configuration you want for your database system.

  3. A locale can be selected separately for each database. The SQL command CREATE DATABASE and its command-line equivalent createdb have options for that. Use this for example if a database cluster houses databases for multiple tenants with different requirements.

  4. Locale settings can be made for individual table columns. This uses an SQL object called collation and is explained in Section 23.2. Use this for example to sort data in different languages or customize the sort order of a particular table.

  5. Finally, locales can be selected for an individual query. Again, this uses SQL collation objects. This could be used to change the sort order based on run-time choices or for ad-hoc experimentation.
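
The two finest-grained scopes in the list above, per column and per query, might look like the following sketch. The table products is hypothetical, and the collations "sv-x-icu" and "de-x-icu" are assumed to be present, which is typically the case only when PostgreSQL was built with ICU support.

```sql
-- Column-level collations (item 4); the ICU collation names are assumptions.
CREATE TABLE products (
    name_sv text COLLATE "sv-x-icu",
    name_de text COLLATE "de-x-icu"
);

-- Uses the collation declared for the column.
SELECT name_sv FROM products ORDER BY name_sv;

-- Query-level override (item 5): sort the same column in plain byte order.
SELECT name_sv FROM products ORDER BY name_sv COLLATE "C";
```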

23.1.4. Locale Providers

A locale provider specifies which library defines the locale behavior for collations and character classifications.

The commands and tools that select the locale settings, as described above, each have an option to select the locale provider. Here is an example to initialize a database cluster using the ICU provider:

initdb --locale-provider=icu --icu-locale=en

See the description of the respective commands and programs for details. Note that you can mix locale providers at different granularities, for example use libc by default for the cluster but have one database that uses the icu provider, and then have collation objects using either provider within those databases.
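
A sketch of such a mixed setup is given below. All object names are hypothetical, the example assumes a server built with ICU support, and the libc locale name de_DE.utf8 is an assumption that must exist on the server's operating system.

```sql
-- One database that uses ICU, in a cluster whose default provider may be libc.
CREATE DATABASE app_icu
    TEMPLATE template0
    LOCALE_PROVIDER icu
    ICU_LOCALE 'en-US';

-- Collation objects using either provider can coexist in the same database.
CREATE COLLATION german_icu  (provider = icu,  locale = 'de-DE');
CREATE COLLATION german_libc (provider = libc, locale = 'de_DE.utf8');
```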

Regardless of the locale provider, the operating system is still used to provide some locale-aware behavior, such as messages (see lc_messages).

The available locale providers are listed below:

builtin

The builtin provider uses built-in operations. Only the C and C.UTF-8 locales are supported for this provider.

The C locale behavior is identical to the C locale in the libc provider. When using this locale, the behavior may depend on the database encoding.

The C.UTF-8 locale is available only when the database encoding is UTF-8, and the behavior is based on Unicode. The collation uses the code point values only. The regular expression character classes are based on the "POSIX Compatible" semantics, and the case mapping is the "simple" variant.

icu

The icu provider uses the external ICU library. PostgreSQL must have been configured with support.

ICU provides collation and character classification behavior that is independent of the operating system and database encoding, which is preferable if you expect to transition to other platforms without any change in results. LC_COLLATE and LC_CTYPE can be set independently of the ICU locale.

Note

For the ICU provider, results may depend on the version of the ICU library used, as it is updated to reflect changes in natural language over time.

libc

The libc provider uses the operating system's C library. The collation and character classification behavior is controlled by the settings LC_COLLATE and LC_CTYPE, so they cannot be set independently.

Note

The same locale name may have different behavior on different platforms when using the libc provider.
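
For the builtin provider described above, a database could be created along the following lines. This is a sketch that assumes a server version offering the builtin provider, with the BUILTIN_LOCALE option and the predefined collation pg_c_utf8; the database name is hypothetical.

```sql
-- Database whose default collation comes from the builtin provider.
-- C.UTF-8 requires the UTF8 database encoding.
CREATE DATABASE app_builtin
    TEMPLATE template0
    ENCODING 'UTF8'
    LOCALE_PROVIDER builtin
    BUILTIN_LOCALE 'C.UTF-8';

-- Code point comparison, run in a UTF8 database: 'B' (U+0042) has a
-- lower code point than 'a' (U+0061).
SELECT 'a' < 'B' COLLATE pg_c_utf8 AS a_before_b;
```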

23.1.5. ICU Locales

23.1.5.1. ICU Locale Names

The ICU format for the locale name is a Language Tag.

CREATE COLLATION mycollation1 (provider = icu, locale = 'ja-JP');
CREATE COLLATION mycollation2 (provider = icu, locale = 'fr');

23.1.5.2. Locale Canonicalization and Validation

When defining a new ICU collation object or database with ICU as the provider, the given locale name is transformed ("canonicalized") into a language tag if not already in that form. For instance,

CREATE COLLATION mycollation3 (provider = icu, locale = 'en-US-u-kn-true');
NOTICE:  using standard form "en-US-u-kn" for locale "en-US-u-kn-true"
CREATE COLLATION mycollation4 (provider = icu, locale = 'de_DE.utf8');
NOTICE:  using standard form "de-DE" for locale "de_DE.utf8"

If you see this notice, ensure that the provider and locale are the expected result. For consistent results when using the ICU provider, specify the canonical language tag instead of relying on the transformation.

A locale with no language name, or the special language name root, is transformed to have the language und ("undefined").

ICU can transform most libc locale names, as well as some other formats, into language tags for easier transition to ICU. If a libc locale name is used in ICU, it may not have precisely the same behavior as in libc.

If there is a problem interpreting the locale name, or if the locale name represents a language or region that ICU does not recognize, you will see the following warning:

CREATE COLLATION nonsense (provider = icu, locale = 'nonsense');
WARNING:  ICU locale "nonsense" has unknown language "nonsense"
HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
CREATE COLLATION

icu_validation_level controls how the message is reported. Unless set to ERROR, the collation will still be created, but the behavior may not be what the user intended.
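
To have an unrecognized locale name rejected outright, the parameter can be raised for the current session, as in this sketch. It assumes an ICU-enabled server and reuses a deliberately invalid locale name.

```sql
-- Treat ICU locale validation problems as errors in this session.
SET icu_validation_level = 'error';

-- With the setting above, this statement is expected to fail
-- rather than create a collation with unintended behavior.
CREATE COLLATION nonsense2 (provider = icu, locale = 'nonsense');

-- Return to the configured default afterwards.
RESET icu_validation_level;
```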

23.1.5.3. Language Tag

A language tag, defined in BCP 47, is a standardized identifier used to identify languages, regions, and other information about a locale.

Basic language tags are simply language-region; or even just language. The language is a language code (e.g. fr for French), and region is a region code (e.g. CA for Canada). Examples: ja-JP, de, or fr-CA.

Collation settings may be included in the language tag to customize collation behavior. ICU allows extensive customization, such as sensitivity (or insensitivity) to accents, case, and punctuation; treatment of digits within text; and many other options to satisfy a variety of uses.

To include this additional collation information in a language tag, append -u, which indicates there are additional collation settings, followed by one or more -key-value pairs. The key is the key for a collation setting and value is a valid value for that setting. For boolean settings, the -key may be specified without a corresponding -value, which implies a value of true.

For example, the language tag en-US-u-kn-ks-level2 means the locale with the English language in the US region, with collation settings kn set to true and ks set to level2. Those settings mean the collation will be case-insensitive and treat a sequence of digits as a single number:

CREATE COLLATION mycollation5 (provider = icu, deterministic = false, locale = 'en-US-u-kn-ks-level2');
SELECT 'aB' = 'Ab' COLLATE mycollation5 as result;
 result
--------
 t
(1 row)

SELECT 'N-45' < 'N-123' COLLATE mycollation5 as result;
 result
--------
 t
(1 row)

See Section 23.2.3 for details and additional examples of using language tags with custom collation information for the locale.

23.1.6. Problems

If locale support doesn't work according to the explanation above, check that the locale support in your operating system is correctly configured. To check what locales are installed on your system, you can use the command locale -a if your operating system provides it.

Check that PostgreSQL is actually using the locale that you think it is. The LC_COLLATE and LC_CTYPE settings are determined when a database is created, and cannot be changed except by creating a new database. Other locale settings including LC_MESSAGES and LC_MONETARY are initially determined by the environment the server is started in, but can be changed on-the-fly. You can check the active locale settings using the SHOW command.
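
One way to carry out this check is sketched below. The SHOW commands cover the categories that can be changed at run time, and the catalog query reads the values fixed at database creation; the datlocprovider column is assumed to exist, which holds only for server versions that support locale providers.

```sql
-- Categories that can be changed on the fly.
SHOW lc_messages;
SHOW lc_monetary;
SHOW lc_numeric;
SHOW lc_time;

-- Settings fixed when the current database was created.
SELECT datname, datlocprovider, datcollate, datctype
FROM pg_database
WHERE datname = current_database();
```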

The directory src/test/locale in the source distribution contains a test suite for PostgreSQL's locale support.

Client applications that handle server-side errors by parsing the text of the error message will obviously have problems when the server's messages are in a different language. Authors of such applications are advised to make use of the error code scheme instead.
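
The idea of relying on error codes rather than message text can be sketched with a PL/pgSQL block. It traps an error by its condition name and reports the SQLSTATE code, which does not change with the language of the message. Client libraries expose the same code through their own interfaces, which are not shown here.

```sql
DO $$
BEGIN
    -- Deliberately raise an error.
    PERFORM 1 / 0;
EXCEPTION
    WHEN division_by_zero THEN
        -- SQLSTATE is 22012 for division_by_zero, whatever lc_messages is set to.
        RAISE NOTICE 'caught SQLSTATE %', SQLSTATE;
END
$$;
```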

Maintaining catalogs of message translations requires the on-going efforts of many volunteers that want to see PostgreSQL speak their preferred language well. If messages in your language are currently not available or not fully translated, your assistance would be appreciated. If you want to help, refer to Chapter 55 or write to the developers' mailing list.

