Collation

GoogleSQL for BigQuery supports collation. Collationdefines rules to sort and compare strings in certainoperations, such as conditional expressions, joins, andgroupings.

By default, GoogleSQL sorts strings case-sensitively. This means thata andA are treated as different letters, andZ would come beforea.

Example default sorting: Apple, Zebra, apple

By contrast, collation lets you sort and compare strings case-insensitively oraccording to specific language rules.

Example case-insensitive collation: Apple, apple, Zebra

To customize collation for acollation-supported operation, you typicallyassign a collationspecification to at least one string in the operation inputs.Some operations can't use collation, but canpropagate collation throughthem.

Collation is useful when you need fine-tuned control over how values are sorted,joined, or grouped in tables.

Operations affected by collation

The following example query operations are affected by collation whensorting and comparing strings:

Operations
Collation-supportedcomparison operations
Join operations
ORDER BY
GROUP BY
WINDOW for window functions
Collation-supportedscalar functions
Collation-supportedaggregate functions
Set operations
NULLIF conditional expression

Operations that propagate collation

Collation can pass through some query operations to other partsof a query. When collation passes through an operation in aquery, this is known aspropagation. During propagation:

  • If an input contains no collation specification or an emptycollation specification and another input contains an explicitly definedcollation, the explicitly defined collation is used for all of the inputs.
  • All inputs with a non-empty explicitly defined collation specification musthave the same type of collation specification, otherwise an error is thrown.

GoogleSQL has severalfunctions,operators, andexpressionsthat can propagate collation.

In the following example, the'und:ci' collation specification is propagatedfrom thecharacter column to theORDER BY operation.

-- With collationSELECT*FROMUNNEST([COLLATE('B','und:ci'),'b','a'])AScharacterORDERBYcharacter/*-----------+ | character | +-----------+ | a         | | B         | | b         | +-----------*/
-- Without collationSELECT*FROMUNNEST(['B','b','a'])AScharacterORDERBYcharacter/*-----------+ | character | +-----------+ | B         | | a         | | b         | +-----------*/

Functions

The following example functions propagate collation.

FunctionNotes
AEAD.DECRYPT_STRING
ANY_VALUE
ARRAY_AGGCollation on input arguments are propagated as collation on the array element.
ARRAY_FIRST
ARRAY_LAST
ARRAY_SLICE
ARRAY_TO_STRINGCollation on array elements are propagated to output.
COLLATE
CONCAT
FORMATCollation fromformat_string to the returned string is propagated.
FORMAT_DATECollation fromformat_string to the returned string is propagated.
FORMAT_DATETIMECollation fromformat_string to the returned string is propagated.
FORMAT_TIMECollation fromformat_string to the returned string is propagated.
FORMAT_TIMESTAMPCollation fromformat_string to the returned string is propagated.
GREATEST
LAG
LEAD
LEAST
LEFT
LOWER
LPAD
MAX
MIN
NET.HOST
NET.PUBLIC_SUFFIX
NET.REG_DOMAIN
NTH_VALUE
NORMALIZE
NORMALIZE_AND_CASEFOLD
REPEAT
REPLACE
REVERSE
RIGHT
RPAD
SOUNDEX
SPLITCollation on input arguments are propagated as collation on the array element.
STRING_AGG
SUBSTR
UPPER

Operators

The following example operators propagate collation.

OperatorNotes
|| concatenation operator
Array subscript operatorPropagated to output.
Set operatorsCollation of an output column is decided by the collations of input columns at the same position.
STRUCT field access operatorWhen getting aSTRUCT, collation on theSTRUCT field is propagated as the output collation.
UNNESTCollation on the input array element is propagated to output.

Expressions

The following example expressions propagate collation.

ExpressionNotes
ARRAYWhen you construct anARRAY, collation on input arguments is propagated on the elements in theARRAY.
CASE
CASE expr
COALESCE
IF
IFNULL
NULLIF
STRUCTWhen you construct aSTRUCT, collation on input arguments is propagated on the fields in theSTRUCT.

Additional features that support collation

These features in BigQuery generally support collation:

FeatureNotes
Views
Materialized viewsThis feature supports collation,butlimitations apply
Table functionsThis feature supports collation,butlimitations apply
BI engine

Where you can assign a collation specification

You can assign acollation specification to thesecollation-supported types:

  • ASTRING
  • ASTRING field in aSTRUCT
  • ASTRING element in anARRAY

In addition:

  • You can assign a default collation specification to a dataset when youcreate or alter it. This assigns a default collation specification to allfuture tables that are added to the dataset if the tables don't have theirown default collation specifications.
  • You can assign a default collation specification to a table when you createor alter it. This assigns a collation specification to all futurecollation-supported columns that are added to the table if the columns don'thave collation specifications. This overrides adefault collation specification on a dataset.
  • You can assign a collation specification to a collation-supported typein a column. A column that contains a collation-supported type in itscolumn dataset is a collation-supported column. This overrides adefault collation specification on a table.
  • You can assign a collation specification to a collation-supportedquery operation.
  • You can assign a collation specification to a collation-supported expressionwith theCOLLATE function. This overrides any collation specifications setpreviously.

In summary:

You can define a default collation specification for a dataset. For example:

CREATESCHEMA(...)DEFAULTCOLLATE'und:ci'

You can define a default collation specification for a table. For example:

CREATETABLE(...)DEFAULTCOLLATE'und:ci'

You can define a collation specification for a collation-supported column.For example:

CREATETABLE(case_insensitive_columnSTRINGCOLLATE'und:ci')

You can specify a collation specification for a collation-supported expressionwith theCOLLATE function. For example:

SELECTCOLLATE('a','und:ci')AScharacter

DDL statements

You can assign a collation specification to the following DDL statements.

LocationSupportNotes
DatasetCREATE SCHEMACreate a dataset and optionally add a defaultcollation specification to the dataset.
DatasetALTER SCHEMAUpdates the default collation specificationfor a dataset.
TableCREATE TABLECreate a table and optionally add a defaultcollation specification to a tableor a collation specification to acollation-supported type in a column.

You can't have collation on a column usedwithCLUSTERING.
TableALTER TABLEUpdate the default collation specificationfor collation-supported type in a table.
ColumnADD COLUMNAdd a collation specification to acollation-supported type in a new columnin an existing table.

Data types

You can assign a collation specification to the following data types.

TypeNotes
STRINGYou can apply a collation specification directly tothis data type.
STRUCTYou can apply a collation specification to aSTRING field in aSTRUCT. ASTRUCT canhaveSTRING fields with differentcollation specifications.ASTRUCT can only be used in comparisons with thefollowing operators and conditional expressions:=,!=,IN,NULLIF, andCASE.
ARRAYYou can apply a collation specification to aSTRING element in anARRAY. AnARRAY canhaveSTRING elements with differentcollation specifications.
Note: Use theCOLLATE function to apply a collation specificationto collation-supported expressions.

Functions, operators, and conditional expressions

You can assign a collation specification to the following functions, operators,and conditional expressions.

Functions

TypeSupportNotes
ScalarCOLLATE
ScalarENDS_WITH
ScalarGREATEST
ScalarINSTR
ScalarLEAST
ScalarREPLACE
ScalarSPLIT
ScalarSTARTS_WITH
ScalarSTRPOS
AggregateCOUNTThis operator is only affected bycollation when the input includestheDISTINCT argument.
AggregateMAX
AggregateMIN

Operators

SupportNotes
<
<=
>
>=
=
!=
[NOT] BETWEEN
[NOT] INLimitations apply.
[NOT] LIKELimitations apply.
Quantified[NOT] LIKELimitations apply.

Conditional expressions

Support
CASE
CASE expr
NULLIF

The preceding collation-supported operations(functions, operators, and conditional expressions)can include input with explicitly defined collation specifications forcollation-supported types. In a collation-supported operation:

  • All inputs with a non-empty, explicitly defined collation specification mustbe the same, otherwise an error is thrown.
  • If an input doesn't contain an explicitly defined collationand another input contains an explicitly defined collation, theexplicitly defined collation is used for both.

For example:

-- Assume there's a table with this column declaration:CREATETABLEtable_a(col_aSTRINGCOLLATE'und:ci',col_bSTRINGCOLLATE'',col_cSTRING,col_dSTRINGCOLLATE'und:ci');-- This runs. Column 'b' has a collation specification and the-- column 'c' doesn't.SELECTSTARTS_WITH(col_b_expression,col_c_expression)FROMtable_a;-- This runs. Column 'a' and 'd' have the same collation specification.SELECTSTARTS_WITH(col_a_expression,col_d_expression)FROMtable_a;-- This runs. Even though column 'a' and 'b' have different-- collation specifications, column 'b' is considered the default collation-- because it's assigned to an empty collation specification.SELECTSTARTS_WITH(col_a_expression,col_b_expression)FROMtable_a;-- This works. Even though column 'a' and 'b' have different-- collation specifications, column 'b' is updated to use the same-- collation specification as column 'a'.SELECTSTARTS_WITH(col_a_expression,COLLATE(col_b_expression,'und:ci'))FROMtable_a;-- This runs. Column 'c' doesn't have a collation specification, so it uses the-- collation specification of column 'd'.SELECTSTARTS_WITH(col_c_expression,col_d_expression)FROMtable_a;

Collation specification details

A collation specification determines how strings are sorted and compared incollation-supported operations. You can define theUnicode collation specification,und:ci, forcollation-supported types.

If a collation specification isn't defined, the default collation specificationis used. To learn more, see the next section.

Default collation specification

When a collation specification isn't assigned or is empty,'binary' collation is used. Binary collation indicates that theoperation should return data inUnicode code point order.You can't set binary collation explicitly.

In general, the following behavior occurs when an empty string is included incollation:

  • If a string hasund:ci collation, the string comparison iscase-insensitive.
  • If a string has empty collation, the string comparison is case-sensitive.
  • If string not assigned collation, the string comparison is case-sensitive.
  • A column with unassigned collation inherit the table's defaultcollation.
  • A column with empty collation doesn't inherit the table's default collation.

Unicode collation specification

collation_specification:'language_tag:collation_attribute'

A unicode collation specification indicates that the operation should use theUnicode Collation Algorithm to sort and comparestrings. The collation specification can be aSTRING literal or aquery parameter.

The language tag

The language tag determines how strings are generally sorted and compared.Allowed values forlanguage_tag are:

  • und: A locale string representing theundetermined locale.und is aspecial language tag defined in theIANA language subtag registry and used toindicate an undetermined locale. This is also known as theroot locale andcan be considered thedefault Unicode collation. It defines a reasonable,locale agnostic collation.

The collation attribute

In addition to the language tag, the unicode collation specification musthave acollation_attribute, which enables additional rules for sortingand comparing strings. Allowed values are:

  • ci: Collation is case-insensitive.

Collation specification example

This is what theci collation attribute looks like when used with theund language tag in theCOLLATE function:

COLLATE('orange1','und:ci')

Caveats

  • Differing strings can be considered equal.For instance, (LATIN CAPITAL LETTER SHARP S) is considered equal to'SS'in some contexts. The following expressions both evaluate toTRUE:

    • COLLATE('ẞ', 'und:ci') > COLLATE('SS', 'und:ci')
    • COLLATE('ẞ1', 'und:ci') < COLLATE('SS2', 'und:ci')

    This is similar to how case insensitivity works.

  • In search operations, strings with different lengths could be consideredequal. To ensure consistency, collation should be used withoutsearch tailoring.

  • There are a wide range of unicode code points (punctuation, symbols, etc),that are treated as if they aren't there. So strings withand without them are sorted identically. For example, the format controlcode pointU+2060 is ignored when the following strings are sorted:

    SELECT*FROMUNNEST([COLLATE('oran\u2060ge1','und:ci'),COLLATE('\u2060orange2','und:ci'),COLLATE('orange3','und:ci')])ASfruitORDERBYfruit/*---------+| fruit   |+---------+| orange1 || orange2 || orange3 |+---------*/
  • Orderingmay change. The Unicode specification of theund collation canchange occasionally, which can affect sortingorder.

Limitations

Limitations for supported features are captured in the previoussections, but here are a few general limitations to keep in mind:

  • und:ci and empty collation are supported, but not othercollation specifications.
  • Operations and functions that don't support collation produce an errorif they encounter collated values.
  • You can't set non-empty collation on a clustering field.

    CREATETABLEmy_dataset.my_table(wordSTRINGCOLLATE'und:ci',numberINT64)CLUSTERBYword;-- User error:-- "CLUSTER BY STRING column word with-- collation und:ci isn't supported"
  • You can't create a materialized view with collated sort keys in anaggregate function.

    CREATEMATERIALIZEDVIEWmy_dataset.my_viewASSELECT-- Assume collated_table.col_ci is a string column with 'und:ci' collation.ARRAY_AGG(col_int64ORDERBYcol_ci)AScol_int64_arrFROMmy_dataset.collated_table;-- User error:-- "Sort key with collation in aggregate function array_agg isn't-- supported in materialized view"
  • If a materialized view has joined on collated columns and not all of thecollated columns were produced by the materialized view, it's possible thata query with the materialized view will use data from base tables rather thanthe materialized view.

    CREATEMATERIALIZEDVIEWmy_dataset.my_mvASSELECTt1.col_ciASt1_col_ci,t2.col_int64ASt2_col_int64FROMmy_dataset.collated_table1ASt1JOINmy_dataset.collated_table2ASt2ONt1.col_ci=t2.col_ciSELECT*FROMmy_dataset.my_mvWHEREt1_col_ci='abc'-- Assuming collated_table1.col_ci and collated_table2.col_ci are columns-- with 'und:ci' collation, the query to my_mv may use data from-- collated_table1 and collated_table2, rather than data from my_mv.
  • Table functions can't take collated arguments.

    CREATETABLEFUNCTIONmy_dataset.my_tvf(xSTRING)AS(SELECTx);SELECT*FROMmy_dataset.my_tvf(COLLATE('abc','und:ci'));-- User error:-- "Collation 'und:ci' on argument of TVF call isn't allowed"
  • A table function with collated output columns isn't supported if an explicitresult schema is present.

    CREATETABLEFUNCTIONmy_dataset.my_tvf(xSTRING)RETURNSTABLE<output_strSTRING>AS(SELECTCOLLATE(x,'und:ci')ASoutput_str);-- User error:-- "Collation 'und:ci' on output column output_str isn't allowed when an-- explicit result schema is present"
  • User-defined functions (UDFs) can't take collated arguments.

    CREATEFUNCTIONtmp_dataset.my_udf(xSTRING)AS(x);SELECTtmp_dataset.my_udf(col_ci)FROMshared_dataset.table_collation_simple;-- User error:-- "Collation isn't allowed on argument x ('und:ci').-- Use COLLATE(arg, '') to remove collation at [1:8]"
  • Collation in the return type of a user-defined function body isn't allowed.

    CREATEFUNCTIONmy_dataset.my_udf(xSTRING)AS(COLLATE(x,'und:ci'));-- User error:-- "Collation ['und:ci'] in return type of user-defined function body is-- not allowed"
  • External tables don't support collation.

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.