Collation

GoogleSQL for BigQuery supports collation. Collationdefines rules to sort and compare strings in certainoperations, such as conditional expressions, joins, andgroupings.

By default, GoogleSQL sorts strings case-sensitively. This means thata andA are treated as different letters, andZ would come beforea.

Example default sorting: Apple, Zebra, apple

By contrast, collation lets you sort and compare strings case-insensitively oraccording to specific language rules.

Example case-insensitive collation: Apple, apple, Zebra

To customize collation for acollation-supported operation, you typicallyassign a collationspecification to at least one string in the operation inputs.Some operations can't use collation, but canpropagate collation throughthem.

Collation is useful when you need fine-tuned control over how values are sorted,joined, or grouped in tables.

Operations affected by collation

The following example query operations are affected by collation whensorting and comparing strings:

Operations
Collation-supportedcomparison operations
Join operations
`ORDER BY`
`GROUP BY`
`WINDOW` for window functions
Collation-supportedscalar functions
Collation-supportedaggregate functions
Set operations
`NULLIF` conditional expression

Operations that propagate collation

Collation can pass through some query operations to other partsof a query. When collation passes through an operation in aquery, this is known aspropagation. During propagation:

If an input contains no collation specification or an emptycollation specification and another input contains an explicitly definedcollation, the explicitly defined collation is used for all of the inputs.
All inputs with a non-empty explicitly defined collation specification musthave the same type of collation specification, otherwise an error is thrown.

GoogleSQL has severalfunctions,operators, andexpressionsthat can propagate collation.

In the following example, the'und:ci' collation specification is propagatedfrom thecharacter column to theORDER BY operation.

-- With collationSELECT*FROMUNNEST([COLLATE('B','und:ci'),'b','a'])AScharacterORDERBYcharacter/*-----------+ | character | +-----------+ | a         | | B         | | b         | +-----------*/

-- Without collationSELECT*FROMUNNEST(['B','b','a'])AScharacterORDERBYcharacter/*-----------+ | character | +-----------+ | B         | | a         | | b         | +-----------*/

Functions

The following example functions propagate collation.

Function	Notes
`AEAD.DECRYPT_STRING`
`ANY_VALUE`
`ARRAY_AGG`	Collation on input arguments are propagated as collation on the array element.
`ARRAY_FIRST`
`ARRAY_LAST`
`ARRAY_SLICE`
`ARRAY_TO_STRING`	Collation on array elements are propagated to output.
`COLLATE`
`CONCAT`
`FORMAT`	Collation from`format_string` to the returned string is propagated.
`FORMAT_DATE`	Collation from`format_string` to the returned string is propagated.
`FORMAT_DATETIME`	Collation from`format_string` to the returned string is propagated.
`FORMAT_TIME`	Collation from`format_string` to the returned string is propagated.
`FORMAT_TIMESTAMP`	Collation from`format_string` to the returned string is propagated.
`GREATEST`
`LAG`
`LEAD`
`LEAST`
`LEFT`
`LOWER`
`LPAD`
`MAX`
`MIN`
`NET.HOST`
`NET.PUBLIC_SUFFIX`
`NET.REG_DOMAIN`
`NTH_VALUE`
`NORMALIZE`
`NORMALIZE_AND_CASEFOLD`
`REPEAT`
`REPLACE`
`REVERSE`
`RIGHT`
`RPAD`
`SOUNDEX`
`SPLIT`	Collation on input arguments are propagated as collation on the array element.
`STRING_AGG`
`SUBSTR`
`UPPER`

Operators

The following example operators propagate collation.

Operator	Notes
`\|\|` concatenation operator
Array subscript operator	Propagated to output.
Set operators	Collation of an output column is decided by the collations of input columns at the same position.
`STRUCT` field access operator	When getting a`STRUCT`, collation on the`STRUCT` field is propagated as the output collation.
`UNNEST`	Collation on the input array element is propagated to output.

Expressions

The following example expressions propagate collation.

Expression	Notes
`ARRAY`	When you construct an`ARRAY`, collation on input arguments is propagated on the elements in the`ARRAY`.
`CASE`
`CASE` expr
`COALESCE`
`IF`
`IFNULL`
`NULLIF`
`STRUCT`	When you construct a`STRUCT`, collation on input arguments is propagated on the fields in the`STRUCT`.

Additional features that support collation

These features in BigQuery generally support collation:

Feature	Notes
Views
Materialized views	This feature supports collation,butlimitations apply
Table functions	This feature supports collation,butlimitations apply
BI engine

Where you can assign a collation specification

You can assign acollation specification to thesecollation-supported types:

ASTRING
ASTRING field in aSTRUCT
ASTRING element in anARRAY

In addition:

You can assign a default collation specification to a dataset when youcreate or alter it. This assigns a default collation specification to allfuture tables that are added to the dataset if the tables don't have theirown default collation specifications.
You can assign a default collation specification to a table when you createor alter it. This assigns a collation specification to all futurecollation-supported columns that are added to the table if the columns don'thave collation specifications. This overrides adefault collation specification on a dataset.
You can assign a collation specification to a collation-supported typein a column. A column that contains a collation-supported type in itscolumn dataset is a collation-supported column. This overrides adefault collation specification on a table.
You can assign a collation specification to a collation-supportedquery operation.
You can assign a collation specification to a collation-supported expressionwith theCOLLATE function. This overrides any collation specifications setpreviously.

In summary:

You can define a default collation specification for a dataset. For example:

CREATESCHEMA(...)DEFAULTCOLLATE'und:ci'

You can define a default collation specification for a table. For example:

CREATETABLE(...)DEFAULTCOLLATE'und:ci'

You can define a collation specification for a collation-supported column.For example:

CREATETABLE(case_insensitive_columnSTRINGCOLLATE'und:ci')

You can specify a collation specification for a collation-supported expressionwith theCOLLATE function. For example:

SELECTCOLLATE('a','und:ci')AScharacter

DDL statements

You can assign a collation specification to the following DDL statements.

Location	Support	Notes
Dataset	`CREATE SCHEMA`	Create a dataset and optionally add a defaultcollation specification to the dataset.
Dataset	`ALTER SCHEMA`	Updates the default collation specificationfor a dataset.
Table	`CREATE TABLE`	Create a table and optionally add a defaultcollation specification to a tableor a collation specification to acollation-supported type in a column. You can't have collation on a column usedwith`CLUSTERING`.
Table	`ALTER TABLE`	Update the default collation specificationfor collation-supported type in a table.
Column	`ADD COLUMN`	Add a collation specification to acollation-supported type in a new columnin an existing table.

Data types

You can assign a collation specification to the following data types.

Type	Notes
`STRING`	You can apply a collation specification directly tothis data type.
`STRUCT`	You can apply a collation specification to a`STRING` field in a`STRUCT`. A`STRUCT` canhave`STRING` fields with differentcollation specifications.A`STRUCT` can only be used in comparisons with thefollowing operators and conditional expressions:`=`,`!=`,`IN`,`NULLIF`, and`CASE`.
`ARRAY`	You can apply a collation specification to a`STRING` element in an`ARRAY`. An`ARRAY` canhave`STRING` elements with differentcollation specifications.

Note: Use the COLLATE function to apply a collation specificationto collation-supported expressions.

Functions, operators, and conditional expressions

You can assign a collation specification to the following functions, operators,and conditional expressions.

Functions

Type	Support	Notes
Scalar	`COLLATE`
Scalar	`ENDS_WITH`
Scalar	`GREATEST`
Scalar	`INSTR`
Scalar	`LEAST`
Scalar	`REPLACE`
Scalar	`SPLIT`
Scalar	`STARTS_WITH`
Scalar	`STRPOS`
Aggregate	`COUNT`	This operator is only affected bycollation when the input includesthe`DISTINCT` argument.
Aggregate	`MAX`
Aggregate	`MIN`

Operators

Support	Notes
`<`
`<=`
`>`
`>=`
`=`
`!=`
`[NOT] BETWEEN`
`[NOT] IN`	Limitations apply.
`[NOT] LIKE`	Limitations apply.
Quantified`[NOT] LIKE`	Limitations apply.

Conditional expressions

Support
`CASE`
`CASE` expr
`NULLIF`

The preceding collation-supported operations(functions, operators, and conditional expressions)can include input with explicitly defined collation specifications forcollation-supported types. In a collation-supported operation:

All inputs with a non-empty, explicitly defined collation specification mustbe the same, otherwise an error is thrown.
If an input doesn't contain an explicitly defined collationand another input contains an explicitly defined collation, theexplicitly defined collation is used for both.

For example:

-- Assume there's a table with this column declaration:CREATETABLEtable_a(col_aSTRINGCOLLATE'und:ci',col_bSTRINGCOLLATE'',col_cSTRING,col_dSTRINGCOLLATE'und:ci');-- This runs. Column 'b' has a collation specification and the-- column 'c' doesn't.SELECTSTARTS_WITH(col_b_expression,col_c_expression)FROMtable_a;-- This runs. Column 'a' and 'd' have the same collation specification.SELECTSTARTS_WITH(col_a_expression,col_d_expression)FROMtable_a;-- This runs. Even though column 'a' and 'b' have different-- collation specifications, column 'b' is considered the default collation-- because it's assigned to an empty collation specification.SELECTSTARTS_WITH(col_a_expression,col_b_expression)FROMtable_a;-- This works. Even though column 'a' and 'b' have different-- collation specifications, column 'b' is updated to use the same-- collation specification as column 'a'.SELECTSTARTS_WITH(col_a_expression,COLLATE(col_b_expression,'und:ci'))FROMtable_a;-- This runs. Column 'c' doesn't have a collation specification, so it uses the-- collation specification of column 'd'.SELECTSTARTS_WITH(col_c_expression,col_d_expression)FROMtable_a;

Collation specification details

A collation specification determines how strings are sorted and compared incollation-supported operations. You can define theUnicode collation specification,und:ci, forcollation-supported types.

If a collation specification isn't defined, the default collation specificationis used. To learn more, see the next section.

Default collation specification

When a collation specification isn't assigned or is empty,'binary' collation is used. Binary collation indicates that theoperation should return data inUnicode code point order.You can't set binary collation explicitly.

In general, the following behavior occurs when an empty string is included incollation:

If a string hasund:ci collation, the string comparison iscase-insensitive.
If a string has empty collation, the string comparison is case-sensitive.
If string not assigned collation, the string comparison is case-sensitive.
A column with unassigned collation inherit the table's defaultcollation.
A column with empty collation doesn't inherit the table's default collation.

Unicode collation specification

collation_specification:'language_tag:collation_attribute'

A unicode collation specification indicates that the operation should use theUnicode Collation Algorithm to sort and comparestrings. The collation specification can be aSTRING literal or aquery parameter.

The language tag

The language tag determines how strings are generally sorted and compared.Allowed values forlanguage_tag are:

und: A locale string representing theundetermined locale.und is aspecial language tag defined in theIANA language subtag registry and used toindicate an undetermined locale. This is also known as theroot locale andcan be considered thedefault Unicode collation. It defines a reasonable,locale agnostic collation.

The collation attribute

In addition to the language tag, the unicode collation specification musthave acollation_attribute, which enables additional rules for sortingand comparing strings. Allowed values are:

ci: Collation is case-insensitive.

Collation specification example

This is what theci collation attribute looks like when used with theund language tag in theCOLLATE function:

COLLATE('orange1','und:ci')

Caveats

Differing strings can be considered equal.For instance,ẞ (LATIN CAPITAL LETTER SHARP S) is considered equal to'SS'in some contexts. The following expressions both evaluate toTRUE:
- COLLATE('ẞ', 'und:ci') > COLLATE('SS', 'und:ci')
- COLLATE('ẞ1', 'und:ci') < COLLATE('SS2', 'und:ci')
This is similar to how case insensitivity works.
In search operations, strings with different lengths could be consideredequal. To ensure consistency, collation should be used withoutsearch tailoring.
There are a wide range of unicode code points (punctuation, symbols, etc),that are treated as if they aren't there. So strings withand without them are sorted identically. For example, the format controlcode pointU+2060 is ignored when the following strings are sorted:
```
SELECT*FROMUNNEST([COLLATE('oran\u2060ge1','und:ci'),COLLATE('\u2060orange2','und:ci'),COLLATE('orange3','und:ci')])ASfruitORDERBYfruit/*---------+| fruit   |+---------+| orange1 || orange2 || orange3 |+---------*/
```
Orderingmay change. The Unicode specification of theund collation canchange occasionally, which can affect sortingorder.

Limitations

Limitations for supported features are captured in the previoussections, but here are a few general limitations to keep in mind:

und:ci and empty collation are supported, but not othercollation specifications.
Operations and functions that don't support collation produce an errorif they encounter collated values.

You can't set non-empty collation on a clustering field.

CREATETABLEmy_dataset.my_table(wordSTRINGCOLLATE'und:ci',numberINT64)CLUSTERBYword;-- User error:-- "CLUSTER BY STRING column word with-- collation und:ci isn't supported"

You can't create a materialized view with collated sort keys in anaggregate function.

CREATEMATERIALIZEDVIEWmy_dataset.my_viewASSELECT-- Assume collated_table.col_ci is a string column with 'und:ci' collation.ARRAY_AGG(col_int64ORDERBYcol_ci)AScol_int64_arrFROMmy_dataset.collated_table;-- User error:-- "Sort key with collation in aggregate function array_agg isn't-- supported in materialized view"

If a materialized view has joined on collated columns and not all of thecollated columns were produced by the materialized view, it's possible thata query with the materialized view will use data from base tables rather thanthe materialized view.

CREATEMATERIALIZEDVIEWmy_dataset.my_mvASSELECTt1.col_ciASt1_col_ci,t2.col_int64ASt2_col_int64FROMmy_dataset.collated_table1ASt1JOINmy_dataset.collated_table2ASt2ONt1.col_ci=t2.col_ciSELECT*FROMmy_dataset.my_mvWHEREt1_col_ci='abc'-- Assuming collated_table1.col_ci and collated_table2.col_ci are columns-- with 'und:ci' collation, the query to my_mv may use data from-- collated_table1 and collated_table2, rather than data from my_mv.

Table functions can't take collated arguments.

CREATETABLEFUNCTIONmy_dataset.my_tvf(xSTRING)AS(SELECTx);SELECT*FROMmy_dataset.my_tvf(COLLATE('abc','und:ci'));-- User error:-- "Collation 'und:ci' on argument of TVF call isn't allowed"

A table function with collated output columns isn't supported if an explicitresult schema is present.

CREATETABLEFUNCTIONmy_dataset.my_tvf(xSTRING)RETURNSTABLE<output_strSTRING>AS(SELECTCOLLATE(x,'und:ci')ASoutput_str);-- User error:-- "Collation 'und:ci' on output column output_str isn't allowed when an-- explicit result schema is present"

User-defined functions (UDFs) can't take collated arguments.

CREATEFUNCTIONtmp_dataset.my_udf(xSTRING)AS(x);SELECTtmp_dataset.my_udf(col_ci)FROMshared_dataset.table_collation_simple;-- User error:-- "Collation isn't allowed on argument x ('und:ci').-- Use COLLATE(arg, '') to remove collation at [1:8]"

Collation in the return type of a user-defined function body isn't allowed.

CREATEFUNCTIONmy_dataset.my_udf(xSTRING)AS(COLLATE(x,'und:ci'));-- User error:-- "Collation ['und:ci'] in return type of user-defined function body is-- not allowed"

External tables don't support collation.

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.

Movatterモバイル変換

Collation Stay organized with collections Save and categorize content based on your preferences.

Operations affected by collation

Operations that propagate collation

Functions

Operators

Expressions

Additional features that support collation

Where you can assign a collation specification

DDL statements

Data types

Functions, operators, and conditional expressions

Functions

Operators

Conditional expressions

Collation specification details

Default collation specification

Unicode collation specification

The language tag

The collation attribute

Collation specification example

Caveats

Limitations

Collation