Collation in GoogleSQL

GoogleSQL for Spanner supports collation. Collation defines rules to sort andcompare strings in anORDER BY operation.

By default, GoogleSQL sorts strings case-sensitively. This means thata andA are treated as different letters, andZ would come beforea.

Example default sorting: Apple, Zebra, apple

By contrast, collation lets you sort and compare strings case-insensitively oraccording to specific language rules.

Example case-insensitive collation: Apple, apple, Zebra

To customize collation inthe operation, include theCOLLATE clause with acollationspecification.

Collation is useful when you need fine-tuned control over how values are sorted,joined, or grouped in tables.

Where you can assign a collation specification

In theORDER BY clause, you can specify a collation specification for acollation-supported column. This overrides anycollation specifications set previously.

For example:

SELECTPlaceFROMLocationsORDERBYPlaceCOLLATE"und:ci"

Query statements

You can assign a collation specification to the following query statements.

TypeSupport
SortingORDER BY clause

Collation specification details

A collation specification determines how strings are sorted and compared incollation-supported operations. You can define acollation specification forcollation-supported types.These types of collation specifications are available:

If a collation specification isn't defined, the default collation specificationis used. To learn more, see the next section.

Default collation specification

When a collation specification isn't assigned or is empty,the ordering behavior is identical to'unicode' collation,which you can learn about in theUnicode collation specification.

Unicode collation specification

collation_specification:'language_tag[:collation_attribute]'

A unicode collation specification indicates that the operation should use theUnicode Collation Algorithm to sort and comparestrings. The collation specification can be aSTRING literal or aquery parameter.

The language tag

The language tag determines how strings are generally sorted and compared.Allowed values forlanguage_tag are:

  • A standard locale string: This name is usually two or three lettersthat represent the language, optionally followed by an underscore or dash andtwo letters that represent the region — for example,en_US. Thesenames are defined by theCommon Locale Data Repository (CLDR).
  • und: A locale string representing theundetermined locale.und is aspecial language tag defined in theIANA language subtag registry and used toindicate an undetermined locale. This is also known as theroot locale andcan be considered thedefault Unicode collation. It defines a reasonable,locale agnostic collation. It differs significantly fromunicode.
  • unicode: Returns data in Unicode code point order, which is identical tothe ordering behavior whenCOLLATE isn't used. The sort orderwill look largely arbitrary to human users.

The collation attribute

In addition to the language tag, the unicode collation specification can havean optionalcollation_attribute, which enables additional rules for sortingand comparing strings. Allowed values are:

  • ci: Collation is case-insensitive.
  • cs: Collation is case-sensitive. By default,collation_attribute isimplicitlycs.

If you're using theunicode language tag with a collation attribute, thesecaveats apply:

  • unicode:cs is identical tounicode.
  • unicode:ci is identical tound:ci. It's recommended to migrateunicode:ci tobinary.

Collation specification example

This is what theci collation attribute looks like when used with theund language tag in theORDER BY clause:

SELECTPlaceFROMLocationsORDERBYPlaceCOLLATE'und:ci'

Caveats

  • Differing strings can be considered equal.For instance, (LATIN CAPITAL LETTER SHARP S) is considered equal to'SS'in some contexts. The following expressions both evaluate toTRUE:

    • COLLATE('ẞ', 'und:ci') > COLLATE('SS', 'und:ci')
    • COLLATE('ẞ1', 'und:ci') < COLLATE('SS2', 'und:ci')

    This is similar to how case insensitivity works.

  • In search operations, strings with different lengths could be consideredequal. To ensure consistency, collation should be used withoutsearch tailoring.

  • There are a wide range of unicode code points (punctuation, symbols, etc),that are treated as if they aren't there. So strings withand without them are sorted identically. For example, the format controlcode pointU+2060 is ignored when the following strings are sorted:

    SELECT*FROMUNNEST(['oran\u2060ge1','\u2060orange2','orange3'])ASfruitORDERBYfruitCOLLATE'und'/*---------+| fruit   |+---------+| orange1 || orange2 || orange3 |+---------*/
  • Orderingmay change. The Unicode specification of theund collation canchange occasionally, which can affect sortingorder. If you need a stable sort order that'sguaranteed to never change, useunicode collation.

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-17 UTC.