Collation in GoogleSQL Stay organized with collections Save and categorize content based on your preferences.
GoogleSQL for Spanner supports collation. Collation defines rules to sort andcompare strings in anORDER BY operation.
By default, GoogleSQL sorts strings case-sensitively. This means thata andA are treated as different letters, andZ would come beforea.
Example default sorting: Apple, Zebra, apple
By contrast, collation lets you sort and compare strings case-insensitively oraccording to specific language rules.
Example case-insensitive collation: Apple, apple, Zebra
To customize collation inthe operation, include theCOLLATE clause with acollationspecification.
Collation is useful when you need fine-tuned control over how values are sorted,joined, or grouped in tables.
Where you can assign a collation specification
In theORDER BY clause, you can specify a collation specification for acollation-supported column. This overrides anycollation specifications set previously.
For example:
SELECTPlaceFROMLocationsORDERBYPlaceCOLLATE"und:ci"Query statements
You can assign a collation specification to the following query statements.
| Type | Support |
|---|---|
| Sorting | ORDER BY clause |
Collation specification details
A collation specification determines how strings are sorted and compared incollation-supported operations. You can define acollation specification forcollation-supported types.These types of collation specifications are available:
If a collation specification isn't defined, the default collation specificationis used. To learn more, see the next section.
Default collation specification
When a collation specification isn't assigned or is empty,the ordering behavior is identical to'unicode' collation,which you can learn about in theUnicode collation specification.
Unicode collation specification
collation_specification:'language_tag[:collation_attribute]'A unicode collation specification indicates that the operation should use theUnicode Collation Algorithm to sort and comparestrings. The collation specification can be aSTRING literal or aquery parameter.
The language tag
The language tag determines how strings are generally sorted and compared.Allowed values forlanguage_tag are:
- A standard locale string: This name is usually two or three lettersthat represent the language, optionally followed by an underscore or dash andtwo letters that represent the region — for example,
en_US. Thesenames are defined by theCommon Locale Data Repository (CLDR). und: A locale string representing theundetermined locale.undis aspecial language tag defined in theIANA language subtag registry and used toindicate an undetermined locale. This is also known as theroot locale andcan be considered thedefault Unicode collation. It defines a reasonable,locale agnostic collation. It differs significantly fromunicode.unicode: Returns data in Unicode code point order, which is identical tothe ordering behavior whenCOLLATEisn't used. The sort orderwill look largely arbitrary to human users.
The collation attribute
In addition to the language tag, the unicode collation specification can havean optionalcollation_attribute, which enables additional rules for sortingand comparing strings. Allowed values are:
ci: Collation is case-insensitive.cs: Collation is case-sensitive. By default,collation_attributeisimplicitlycs.
If you're using theunicode language tag with a collation attribute, thesecaveats apply:
unicode:csis identical tounicode.unicode:ciis identical tound:ci. It's recommended to migrateunicode:citobinary.
Collation specification example
This is what theci collation attribute looks like when used with theund language tag in theORDER BY clause:
SELECTPlaceFROMLocationsORDERBYPlaceCOLLATE'und:ci'Caveats
Differing strings can be considered equal.For instance,
ẞ(LATIN CAPITAL LETTER SHARP S) is considered equal to'SS'in some contexts. The following expressions both evaluate toTRUE:COLLATE('ẞ', 'und:ci') > COLLATE('SS', 'und:ci')COLLATE('ẞ1', 'und:ci') < COLLATE('SS2', 'und:ci')
This is similar to how case insensitivity works.
In search operations, strings with different lengths could be consideredequal. To ensure consistency, collation should be used withoutsearch tailoring.
There are a wide range of unicode code points (punctuation, symbols, etc),that are treated as if they aren't there. So strings withand without them are sorted identically. For example, the format controlcode point
U+2060is ignored when the following strings are sorted:SELECT*FROMUNNEST(['oran\u2060ge1','\u2060orange2','orange3'])ASfruitORDERBYfruitCOLLATE'und'/*---------+| fruit |+---------+| orange1 || orange2 || orange3 |+---------*/Orderingmay change. The Unicode specification of the
undcollation canchange occasionally, which can affect sortingorder. If you need a stable sort order that'sguaranteed to never change, useunicodecollation.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-17 UTC.