Collation Stay organized with collections Save and categorize content based on your preferences.
GoogleSQL for BigQuery supports collation. Collationdefines rules to sort and compare strings in certainoperations, such as conditional expressions, joins, andgroupings.
By default, GoogleSQL sorts strings case-sensitively. This means thata andA are treated as different letters, andZ would come beforea.
Example default sorting: Apple, Zebra, apple
By contrast, collation lets you sort and compare strings case-insensitively oraccording to specific language rules.
Example case-insensitive collation: Apple, apple, Zebra
To customize collation for acollation-supported operation, you typicallyassign a collationspecification to at least one string in the operation inputs.Some operations can't use collation, but canpropagate collation throughthem.
Collation is useful when you need fine-tuned control over how values are sorted,joined, or grouped in tables.
Operations affected by collation
The following example query operations are affected by collation whensorting and comparing strings:
| Operations |
|---|
| Collation-supportedcomparison operations |
| Join operations |
ORDER BY |
GROUP BY |
WINDOW for window functions |
| Collation-supportedscalar functions |
| Collation-supportedaggregate functions |
| Set operations |
NULLIF conditional expression |
Operations that propagate collation
Collation can pass through some query operations to other partsof a query. When collation passes through an operation in aquery, this is known aspropagation. During propagation:
- If an input contains no collation specification or an emptycollation specification and another input contains an explicitly definedcollation, the explicitly defined collation is used for all of the inputs.
- All inputs with a non-empty explicitly defined collation specification musthave the same type of collation specification, otherwise an error is thrown.
GoogleSQL has severalfunctions,operators, andexpressionsthat can propagate collation.
In the following example, the'und:ci' collation specification is propagatedfrom thecharacter column to theORDER BY operation.
-- With collationSELECT*FROMUNNEST([COLLATE('B','und:ci'),'b','a'])AScharacterORDERBYcharacter/*-----------+ | character | +-----------+ | a | | B | | b | +-----------*/-- Without collationSELECT*FROMUNNEST(['B','b','a'])AScharacterORDERBYcharacter/*-----------+ | character | +-----------+ | B | | a | | b | +-----------*/Functions
The following example functions propagate collation.
| Function | Notes |
|---|---|
AEAD.DECRYPT_STRING | |
ANY_VALUE | |
ARRAY_AGG | Collation on input arguments are propagated as collation on the array element. |
ARRAY_FIRST | |
ARRAY_LAST | |
ARRAY_SLICE | |
ARRAY_TO_STRING | Collation on array elements are propagated to output. |
COLLATE | |
CONCAT | |
FORMAT | Collation fromformat_string to the returned string is propagated. |
FORMAT_DATE | Collation fromformat_string to the returned string is propagated. |
FORMAT_DATETIME | Collation fromformat_string to the returned string is propagated. |
FORMAT_TIME | Collation fromformat_string to the returned string is propagated. |
FORMAT_TIMESTAMP | Collation fromformat_string to the returned string is propagated. |
GREATEST | |
LAG | |
LEAD | |
LEAST | |
LEFT | |
LOWER | |
LPAD | |
MAX | |
MIN | |
NET.HOST | |
NET.PUBLIC_SUFFIX | |
NET.REG_DOMAIN | |
NTH_VALUE | |
NORMALIZE | |
NORMALIZE_AND_CASEFOLD | |
REPEAT | |
REPLACE | |
REVERSE | |
RIGHT | |
RPAD | |
SOUNDEX | |
SPLIT | Collation on input arguments are propagated as collation on the array element. |
STRING_AGG | |
SUBSTR | |
UPPER |
Operators
The following example operators propagate collation.
| Operator | Notes |
|---|---|
|| concatenation operator | |
| Array subscript operator | Propagated to output. |
| Set operators | Collation of an output column is decided by the collations of input columns at the same position. |
STRUCT field access operator | When getting aSTRUCT, collation on theSTRUCT field is propagated as the output collation. |
UNNEST | Collation on the input array element is propagated to output. |
Expressions
The following example expressions propagate collation.
| Expression | Notes |
|---|---|
ARRAY | When you construct anARRAY, collation on input arguments is propagated on the elements in theARRAY. |
CASE | |
CASE expr | |
COALESCE | |
IF | |
IFNULL | |
NULLIF | |
STRUCT | When you construct aSTRUCT, collation on input arguments is propagated on the fields in theSTRUCT. |
Additional features that support collation
These features in BigQuery generally support collation:
| Feature | Notes |
|---|---|
| Views | |
| Materialized views | This feature supports collation,butlimitations apply |
| Table functions | This feature supports collation,butlimitations apply |
| BI engine |
Where you can assign a collation specification
You can assign acollation specification to thesecollation-supported types:
- A
STRING - A
STRINGfield in aSTRUCT - A
STRINGelement in anARRAY
In addition:
- You can assign a default collation specification to a dataset when youcreate or alter it. This assigns a default collation specification to allfuture tables that are added to the dataset if the tables don't have theirown default collation specifications.
- You can assign a default collation specification to a table when you createor alter it. This assigns a collation specification to all futurecollation-supported columns that are added to the table if the columns don'thave collation specifications. This overrides adefault collation specification on a dataset.
- You can assign a collation specification to a collation-supported typein a column. A column that contains a collation-supported type in itscolumn dataset is a collation-supported column. This overrides adefault collation specification on a table.
- You can assign a collation specification to a collation-supportedquery operation.
- You can assign a collation specification to a collation-supported expressionwith the
COLLATEfunction. This overrides any collation specifications setpreviously.
In summary:
You can define a default collation specification for a dataset. For example:
CREATESCHEMA(...)DEFAULTCOLLATE'und:ci'You can define a default collation specification for a table. For example:
CREATETABLE(...)DEFAULTCOLLATE'und:ci'You can define a collation specification for a collation-supported column.For example:
CREATETABLE(case_insensitive_columnSTRINGCOLLATE'und:ci')You can specify a collation specification for a collation-supported expressionwith theCOLLATE function. For example:
SELECTCOLLATE('a','und:ci')AScharacterDDL statements
You can assign a collation specification to the following DDL statements.
| Location | Support | Notes |
|---|---|---|
| Dataset | CREATE SCHEMA | Create a dataset and optionally add a defaultcollation specification to the dataset. |
| Dataset | ALTER SCHEMA | Updates the default collation specificationfor a dataset. |
| Table | CREATE TABLE | Create a table and optionally add a defaultcollation specification to a tableor a collation specification to acollation-supported type in a column. You can't have collation on a column usedwith CLUSTERING. |
| Table | ALTER TABLE | Update the default collation specificationfor collation-supported type in a table. |
| Column | ADD COLUMN | Add a collation specification to acollation-supported type in a new columnin an existing table. |
Data types
You can assign a collation specification to the following data types.
| Type | Notes |
|---|---|
STRING | You can apply a collation specification directly tothis data type. |
STRUCT | You can apply a collation specification to aSTRING field in aSTRUCT. ASTRUCT canhaveSTRING fields with differentcollation specifications.ASTRUCT can only be used in comparisons with thefollowing operators and conditional expressions:=,!=,IN,NULLIF, andCASE. |
ARRAY | You can apply a collation specification to aSTRING element in anARRAY. AnARRAY canhaveSTRING elements with differentcollation specifications. |
COLLATE function to apply a collation specificationto collation-supported expressions.Functions, operators, and conditional expressions
You can assign a collation specification to the following functions, operators,and conditional expressions.
Functions
| Type | Support | Notes |
|---|---|---|
| Scalar | COLLATE | |
| Scalar | ENDS_WITH | |
| Scalar | GREATEST | |
| Scalar | INSTR | |
| Scalar | LEAST | |
| Scalar | REPLACE | |
| Scalar | SPLIT | |
| Scalar | STARTS_WITH | |
| Scalar | STRPOS | |
| Aggregate | COUNT | This operator is only affected bycollation when the input includestheDISTINCT argument. |
| Aggregate | MAX | |
| Aggregate | MIN |
Operators
| Support | Notes |
|---|---|
< | |
<= | |
> | |
>= | |
= | |
!= | |
[NOT] BETWEEN | |
[NOT] IN | Limitations apply. |
[NOT] LIKE | Limitations apply. |
Quantified[NOT] LIKE | Limitations apply. |
Conditional expressions
| Support | |
|---|---|
CASE | |
CASE expr | |
NULLIF |
The preceding collation-supported operations(functions, operators, and conditional expressions)can include input with explicitly defined collation specifications forcollation-supported types. In a collation-supported operation:
- All inputs with a non-empty, explicitly defined collation specification mustbe the same, otherwise an error is thrown.
- If an input doesn't contain an explicitly defined collationand another input contains an explicitly defined collation, theexplicitly defined collation is used for both.
For example:
-- Assume there's a table with this column declaration:CREATETABLEtable_a(col_aSTRINGCOLLATE'und:ci',col_bSTRINGCOLLATE'',col_cSTRING,col_dSTRINGCOLLATE'und:ci');-- This runs. Column 'b' has a collation specification and the-- column 'c' doesn't.SELECTSTARTS_WITH(col_b_expression,col_c_expression)FROMtable_a;-- This runs. Column 'a' and 'd' have the same collation specification.SELECTSTARTS_WITH(col_a_expression,col_d_expression)FROMtable_a;-- This runs. Even though column 'a' and 'b' have different-- collation specifications, column 'b' is considered the default collation-- because it's assigned to an empty collation specification.SELECTSTARTS_WITH(col_a_expression,col_b_expression)FROMtable_a;-- This works. Even though column 'a' and 'b' have different-- collation specifications, column 'b' is updated to use the same-- collation specification as column 'a'.SELECTSTARTS_WITH(col_a_expression,COLLATE(col_b_expression,'und:ci'))FROMtable_a;-- This runs. Column 'c' doesn't have a collation specification, so it uses the-- collation specification of column 'd'.SELECTSTARTS_WITH(col_c_expression,col_d_expression)FROMtable_a;Collation specification details
A collation specification determines how strings are sorted and compared incollation-supported operations. You can define theUnicode collation specification,und:ci, forcollation-supported types.
If a collation specification isn't defined, the default collation specificationis used. To learn more, see the next section.
Default collation specification
When a collation specification isn't assigned or is empty,'binary' collation is used. Binary collation indicates that theoperation should return data inUnicode code point order.You can't set binary collation explicitly.
In general, the following behavior occurs when an empty string is included incollation:
- If a string has
und:cicollation, the string comparison iscase-insensitive. - If a string has empty collation, the string comparison is case-sensitive.
- If string not assigned collation, the string comparison is case-sensitive.
- A column with unassigned collation inherit the table's defaultcollation.
- A column with empty collation doesn't inherit the table's default collation.
Unicode collation specification
collation_specification:'language_tag:collation_attribute'A unicode collation specification indicates that the operation should use theUnicode Collation Algorithm to sort and comparestrings. The collation specification can be aSTRING literal or aquery parameter.
The language tag
The language tag determines how strings are generally sorted and compared.Allowed values forlanguage_tag are:
und: A locale string representing theundetermined locale.undis aspecial language tag defined in theIANA language subtag registry and used toindicate an undetermined locale. This is also known as theroot locale andcan be considered thedefault Unicode collation. It defines a reasonable,locale agnostic collation.
The collation attribute
In addition to the language tag, the unicode collation specification musthave acollation_attribute, which enables additional rules for sortingand comparing strings. Allowed values are:
ci: Collation is case-insensitive.
Collation specification example
This is what theci collation attribute looks like when used with theund language tag in theCOLLATE function:
COLLATE('orange1','und:ci')Caveats
Differing strings can be considered equal.For instance,
ẞ(LATIN CAPITAL LETTER SHARP S) is considered equal to'SS'in some contexts. The following expressions both evaluate toTRUE:COLLATE('ẞ', 'und:ci') > COLLATE('SS', 'und:ci')COLLATE('ẞ1', 'und:ci') < COLLATE('SS2', 'und:ci')
This is similar to how case insensitivity works.
In search operations, strings with different lengths could be consideredequal. To ensure consistency, collation should be used withoutsearch tailoring.
There are a wide range of unicode code points (punctuation, symbols, etc),that are treated as if they aren't there. So strings withand without them are sorted identically. For example, the format controlcode point
U+2060is ignored when the following strings are sorted:SELECT*FROMUNNEST([COLLATE('oran\u2060ge1','und:ci'),COLLATE('\u2060orange2','und:ci'),COLLATE('orange3','und:ci')])ASfruitORDERBYfruit/*---------+| fruit |+---------+| orange1 || orange2 || orange3 |+---------*/Orderingmay change. The Unicode specification of the
undcollation canchange occasionally, which can affect sortingorder.
Limitations
Limitations for supported features are captured in the previoussections, but here are a few general limitations to keep in mind:
und:ciand empty collation are supported, but not othercollation specifications.- Operations and functions that don't support collation produce an errorif they encounter collated values.
You can't set non-empty collation on a clustering field.
CREATETABLEmy_dataset.my_table(wordSTRINGCOLLATE'und:ci',numberINT64)CLUSTERBYword;-- User error:-- "CLUSTER BY STRING column word with-- collation und:ci isn't supported"You can't create a materialized view with collated sort keys in anaggregate function.
CREATEMATERIALIZEDVIEWmy_dataset.my_viewASSELECT-- Assume collated_table.col_ci is a string column with 'und:ci' collation.ARRAY_AGG(col_int64ORDERBYcol_ci)AScol_int64_arrFROMmy_dataset.collated_table;-- User error:-- "Sort key with collation in aggregate function array_agg isn't-- supported in materialized view"If a materialized view has joined on collated columns and not all of thecollated columns were produced by the materialized view, it's possible thata query with the materialized view will use data from base tables rather thanthe materialized view.
CREATEMATERIALIZEDVIEWmy_dataset.my_mvASSELECTt1.col_ciASt1_col_ci,t2.col_int64ASt2_col_int64FROMmy_dataset.collated_table1ASt1JOINmy_dataset.collated_table2ASt2ONt1.col_ci=t2.col_ciSELECT*FROMmy_dataset.my_mvWHEREt1_col_ci='abc'-- Assuming collated_table1.col_ci and collated_table2.col_ci are columns-- with 'und:ci' collation, the query to my_mv may use data from-- collated_table1 and collated_table2, rather than data from my_mv.Table functions can't take collated arguments.
CREATETABLEFUNCTIONmy_dataset.my_tvf(xSTRING)AS(SELECTx);SELECT*FROMmy_dataset.my_tvf(COLLATE('abc','und:ci'));-- User error:-- "Collation 'und:ci' on argument of TVF call isn't allowed"A table function with collated output columns isn't supported if an explicitresult schema is present.
CREATETABLEFUNCTIONmy_dataset.my_tvf(xSTRING)RETURNSTABLE<output_strSTRING>AS(SELECTCOLLATE(x,'und:ci')ASoutput_str);-- User error:-- "Collation 'und:ci' on output column output_str isn't allowed when an-- explicit result schema is present"User-defined functions (UDFs) can't take collated arguments.
CREATEFUNCTIONtmp_dataset.my_udf(xSTRING)AS(x);SELECTtmp_dataset.my_udf(col_ci)FROMshared_dataset.table_collation_simple;-- User error:-- "Collation isn't allowed on argument x ('und:ci').-- Use COLLATE(arg, '') to remove collation at [1:8]"Collation in the return type of a user-defined function body isn't allowed.
CREATEFUNCTIONmy_dataset.my_udf(xSTRING)AS(COLLATE(x,'und:ci'));-- User error:-- "Collation ['und:ci'] in return type of user-defined function body is-- not allowed"External tables don't support collation.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.