Set operators¶
Set operators combine the intermediate results of multiple query blocks into a single result set.
General syntax¶
[(]<query>[)]{INTERSECT|{MINUS|EXCEPT}|UNION[{DISTINCT|ALL}][BYNAME]}[(]<query>[)][ORDERBY...][LIMIT...]
General usage notes¶
Each query can itself contain query operators, so that you can combine multiple query expressions with set operators.
You can apply theORDER BY andLIMIT / FETCH clauses to the resultof the set operator.
When using these operators:
Make sure that each query selects the same number of columns, with the exception of queries that include UNION BY NAMEor UNION ALL BY NAME.
Make sure that the data type of each column is consistent across the rows from different sources.One of the examples in theUse the UNION operator and cast mismatched data types sectionillustrates the potential problem and solution when data types don’t match.
In general, make sure the “meanings,” as well as the data types, of the columns match. The following query with theUNION ALL operator won’t produce the desired results:
SELECTLastName,FirstNameFROMemployeesUNIONALLSELECTFirstName,LastNameFROMcontractors;
CopyThe risk of error increases when you use an asterisk to specify all columns of a table, for example:
SELECT*FROMtable1UNIONALLSELECT*FROMtable2;
CopyIf the tables have the same number of columns, but the columns aren’t in the same order, the query results willprobably be incorrect when you use these operators.
The UNION BY NAME and UNION ALL BY NAME operators are exceptions for this scenario. For example, the followingquery returns the correct results:
SELECTLastName,FirstNameFROMemployeesUNIONALLBYNAMESELECTFirstName,LastNameFROMcontractors;
CopyThe names of the output columns are based on the names of the columns of the first query. For example,consider the following query:
SELECTLastName,FirstNameFROMemployeesUNIONALLSELECTFirstName,LastNameFROMcontractors;
CopyThis query behaves as though the query were the following:
SELECTLastName,FirstNameFROMemployeesUNIONALLSELECTFirstNameASLastName,LastNameASFirstNameFROMcontractors;
Copy
The precedence of the set operators matches the ANSI and ISO SQL standards:
The UNION [ALL] and MINUS (EXCEPT) operators have equal precedence.
The INTERSECT operator has higher precedence than UNION [ALL] and MINUS (EXCEPT).
Snowflake processes operators of equal precedence from left to right.
You can use parentheses to force the expressions to be evaluated in a different order.
Not all database vendors follow the ANSI/ISO standard for precedence of set operators. Snowflake recommends using parentheses tospecify the order of evaluation, especially if you are porting code from another vendor to Snowflake, or writing code that youmight execute on other databases as well as on Snowflake.
Sample tables for examples¶
Some of the examples in this topic use the following sample tables. Both tables have a postal code column. One table records the postal code ofeach sales office, and the other records the postal code of each customer.
CREATEORREPLACETABLEsales_office_postal_example(office_nameVARCHAR,postal_codeVARCHAR);INSERTINTOsales_office_postal_exampleVALUES('sales1','94061');INSERTINTOsales_office_postal_exampleVALUES('sales2','94070');INSERTINTOsales_office_postal_exampleVALUES('sales3','98116');INSERTINTOsales_office_postal_exampleVALUES('sales4','98005');CREATEORREPLACETABLEcustomer_postal_example(customerVARCHAR,postal_codeVARCHAR);INSERTINTOcustomer_postal_exampleVALUES('customer1','94066');INSERTINTOcustomer_postal_exampleVALUES('customer2','94061');INSERTINTOcustomer_postal_exampleVALUES('customer3','98444');INSERTINTOcustomer_postal_exampleVALUES('customer4','98005');
INTERSECT¶
Returns rows from one query’s result set which also appear in another query’s result set, with duplicate elimination.
Syntax¶
[(]<query>[)]INTERSECT[(]<query>[)]
INTERSECT operator examples¶
To find the postal codes that are in both thesales_office_postal_example
table and thecustomer_postal_example
table, query thesample tables:
SELECTpostal_codeFROMsales_office_postal_exampleINTERSECTSELECTpostal_codeFROMcustomer_postal_exampleORDERBYpostal_code;
+-------------+| POSTAL_CODE ||-------------|| 94061 || 98005 |+-------------+
MINUS , EXCEPT¶
Returns the rows returned by the first query that aren’t also returned by the second query.
The MINUS and EXCEPT keywords have the same meaning and can be used interchangeably.
Syntax¶
[(]<query>[)]MINUS[(]<query>[)][(]<query>[)]EXCEPT[(]<query>[)]
MINUS operator examples¶
Query thesample tables to find the postal codes in thesales_office_postal_example
table that aren’t also in thecustomer_postal_example
table:
SELECTpostal_codeFROMsales_office_postal_exampleMINUSSELECTpostal_codeFROMcustomer_postal_exampleORDERBYpostal_code;
+-------------+| POSTAL_CODE ||-------------|| 94070 || 98116 |+-------------+
Query thesample tables to find the postal codes in thecustomer_postal_example
table that aren’t also in thesales_office_postal_example
table:
SELECTpostal_codeFROMcustomer_postal_exampleMINUSSELECTpostal_codeFROMsales_office_postal_exampleORDERBYpostal_code;
+-------------+| POSTAL_CODE ||-------------|| 94066 || 98444 |+-------------+
UNION [ { DISTINCT | ALL } ] [ BY NAME ]¶
Combines the result sets from two queries:
UNION [ DISTINCT ] combines rows by column position with duplicate elimination.
UNION ALL combines rows by column position without duplicate elimination.
UNION [ DISTINCT ] BY NAME combines rows by column name with duplicate elimination.
UNION ALL BY NAME combines rows by column name without duplicate elimination.
The default is UNION DISTINCT (that is, combine rows by column position with duplicate elimination).The DISTINCT keyword is optional. The DISTINCT keyword and the ALL keyword are mutuallyexclusive.
Use UNION or UNION ALL when the column positions match in the tables that you are combining. UseUNION BY NAME or UNION ALL BY NAME for the following use cases:
The tables that you are combining have varying column orders.
The tables that you are combining have evolving schemas, where columns are added or reordered.
You want to combine subsets of columns that have different positions in the tables.
Syntax¶
[(]<query>[)]UNION[{DISTINCT|ALL}][BYNAME][(]<query>[)]
Usage notes for the BY NAME clause¶
In addition to thegeneral usage notes, the following usage notes apply toUNION BY NAME and UNION ALL BY NAME:
Columns with the same identifiers are matched and combined. Matching of unquoted identifiers is case-insensitive,and matching of quoted identifiers is case-sensitive.
The inputs aren’t required to have the same number of columns. If a column exists in one input but not the other, itis filled with NULL values in the combined result set for each row where it’s missing.
The order of columns in the combined result set is determined by the order of unique columns fromleft to right, as they are first encountered.
UNION operator examples¶
The following examples use the UNION operator:
Combine the results from two queries by column position¶
To combine the result sets by column position from two queries on thesample tables, use the UNION operator:
SELECToffice_nameoffice_or_customer,postal_codeFROMsales_office_postal_exampleUNIONSELECTcustomer,postal_codeFROMcustomer_postal_exampleORDERBYpostal_code;
+--------------------+-------------+| OFFICE_OR_CUSTOMER | POSTAL_CODE ||--------------------+-------------|| sales1 | 94061 || customer2 | 94061 || customer1 | 94066 || sales2 | 94070 || sales4 | 98005 || customer4 | 98005 || sales3 | 98116 || customer3 | 98444 |+--------------------+-------------+
Combine the results from two queries by column name¶
Create two tables with differing column order and insert data:
CREATEORREPLACETABLEunion_demo_column_order1(aINTEGER,bVARCHAR);INSERTINTOunion_demo_column_order1VALUES(1,'one'),(2,'two'),(3,'three');CREATEORREPLACETABLEunion_demo_column_order2(BVARCHAR,AINTEGER);INSERTINTOunion_demo_column_order2VALUES('three',3),('four',4);
To combine the result sets by column name from two queries, use the UNION BY NAME operator:
SELECT*FROMunion_demo_column_order1UNIONBYNAMESELECT*FROMunion_demo_column_order2ORDERBYa;
+---+-------+| A | B ||---+-------|| 1 | one || 2 | two || 3 | three || 4 | four |+---+-------+
The output shows that the query eliminated the duplicate row (with3
in columnA
andthree
in columnB
).
To combine the tables without duplicate elimination, use the UNION ALL BY NAME operator:
SELECT*FROMunion_demo_column_order1UNIONALLBYNAMESELECT*FROMunion_demo_column_order2ORDERBYa;
+---+-------+| A | B ||---+-------|| 1 | one || 2 | two || 3 | three || 3 | three || 4 | four |+---+-------+
Notice that the cases of the column names don’t match in the two tables. The column names are lowercase intheunion_demo_column_order1
table and uppercase in theunion_demo_column_order2
table. If you runa query with quotation marks around the column names, an error is returned because the matching of quotedidentifiers is case-sensitive. For example, the following query places quotation marks around the column names:
SELECT'a','b'FROMunion_demo_column_order1UNIONALLBYNAMESELECT'B','A'FROMunion_demo_column_order2ORDERBYa;
000904 (42000): SQL compilation error: error line 4 at position 9invalid identifier 'A'
Use an alias to combine the results from two queries with different column names¶
When you use the UNION BY NAME operator to combine the result sets by column name from two queries on thesample tables, the rows in the result set have NULL values becausethe column names don’t match:
SELECToffice_name,postal_codeFROMsales_office_postal_exampleUNIONBYNAMESELECTcustomer,postal_codeFROMcustomer_postal_exampleORDERBYpostal_code;
+-------------+-------------+-----------+| OFFICE_NAME | POSTAL_CODE | CUSTOMER ||-------------+-------------+-----------|| sales1 | 94061 | NULL || NULL | 94061 | customer2 || NULL | 94066 | customer1 || sales2 | 94070 | NULL || sales4 | 98005 | NULL || NULL | 98005 | customer4 || sales3 | 98116 | NULL || NULL | 98444 | customer3 |+-------------+-------------+-----------+
The output shows that columns with different identifiers aren’t combined and that rows have NULLvalues for columns that are in one table but not the other. Thepostal_code
column is in both tables,so there are no NULL values in the output for thepostal_code
column.
The following query uses the aliasoffice_or_customer
so that columns with different nameshave the same name for the duration of the query:
SELECToffice_nameASoffice_or_customer,postal_codeFROMsales_office_postal_exampleUNIONBYNAMESELECTcustomerASoffice_or_customer,postal_codeFROMcustomer_postal_exampleORDERBYpostal_code;
+--------------------+-------------+| OFFICE_OR_CUSTOMER | POSTAL_CODE ||--------------------+-------------|| sales1 | 94061 || customer2 | 94061 || customer1 | 94066 || sales2 | 94070 || sales4 | 98005 || customer4 | 98005 || sales3 | 98116 || customer3 | 98444 |+--------------------+-------------+
Use the UNION operator and cast mismatched data types¶
This example demonstrates a potential issue with using the UNION operator when data types don’t match,then provides the solution.
Start by creating the tables and inserting some data:
CREATEORREPLACETABLEunion_test1(vVARCHAR);CREATEORREPLACETABLEunion_test2(iINTEGER);INSERTINTOunion_test1(v)VALUES('Smith, Jane');INSERTINTOunion_test2(i)VALUES(42);
Execute a UNION by column position operation with different data types (a VARCHAR value inunion_test1
and anINTEGER value inunion_test2
):
SELECTvFROMunion_test1UNIONSELECTiFROMunion_test2;
This query returns an error:
100038 (22018): Numeric value 'Smith, Jane' is not recognized
Now use explicit casting to convert the inputs to a compatible type:
SELECTv::VARCHARFROMunion_test1UNIONSELECTi::VARCHARFROMunion_test2;
+-------------+| V::VARCHAR ||-------------|| Smith, Jane || 42 |+-------------+