Query multiple tables using a wildcard table

Note: Wildcard tables have many limitations and are lessperformant than regular BigQuery tables that take advantage ofpartitioning andclustering.As a best practice, use partitioned or clustered tables when possible.

Wildcard tables enable you to query multiple tables using concise SQLstatements. Wildcard tables are available only in GoogleSQL. For equivalentfunctionality in legacy SQL, seeTable wildcard functions.

A wildcard table represents a union of all the tables that match the wildcardexpression. For example, the followingFROM clause uses the wildcardexpressiongsod* to match all tables in thenoaa_gsod dataset that beginwith the stringgsod.

FROM`bigquery-public-data.noaa_gsod.gsod*`

Each row in the wildcard table contains a special column,_TABLE_SUFFIX, which containsthe value matched by the wildcard character.

Limitations

Wildcard table queries are subject to the following limitations.

The wildcard table functionality doesn't support views. If the wildcard tablematches any view in the dataset, the query returns an error even if your querycontains aWHERE clause on the_TABLE_SUFFIX pseudocolumn to filter outthe view.
Cached results are not supported for queries against multipletables using a wildcard even if theUse Cached Results option is checked.If you run the same wildcard query multiple times, you are billed for eachquery.
Wildcard tables support built-in BigQuery storage only. Youcannot use wildcards to query anexternal tableor aview.
You cannot use wildcard queries over tables with incompatible partitioning ora mix of partitioned and non-partitioned tables. The queried tables alsoneed to have identical clustering specifications.
You can use wildcard tables with partitioned tables, and both partitionpruning and cluster pruning are supported. However, tables that are clusteredbut not partitioned don't get the full cluster pruning benefit from wildcardusage.
Queries that containdata manipulation language(DML) statements cannot use a wildcard table as the target of the query. Forexample, a wildcard table may be used in theFROM clause of anUPDATEquery, but a wildcard table cannot be used as the target of theUPDATEoperation.
Filters on the_TABLE_SUFFIX or_PARTITIONTIME pseudocolumns that includeJavaScript user-defined functions don't limit the number of tables scanned in awildcard table.
Wildcard queries are not supported for tables protected by customer-managedencryption keys (CMEK).
All tables referenced in a wildcard query must have exactly the same set oftag keys and values.
If a single scanned table has a schema mismatch (that is, a column with thesame name is of a different type), the query fails with the errorCannotread field of type X as Y Field: column_name. All tables are matched evenif you are using the equality operator=. For example, in the followingquery, the tablemy_dataset.my_table_03_backup is also scanned. Thus, thequery may fail due to schema mismatch. However, if there is no schemamismatch, the results come from the tablemy_dataset.my_table_03 only, asexpected.
```
SELECT *FROMmy_project.my_dataset.my_table_*WHERE _TABLE_SUFFIX = '03'
```
Wildcard tables aren't supported by BI Engine.
Wildcard tables aren't supported for tables with flexible column names orrenamed columns. To work around this limitation, exclude the affected columnsfrom the query or filter the table using the_TABLE_SUFFIX pseudocolumn.

Before you begin

Ensure that you are using GoogleSQL. For moreinformation, seeSwitching SQL dialects.
If you are using legacy SQL, seeTable wildcard functions.
Many of the examples on this page use a public dataset from theNational Oceanic and Atmospheric Administration (NOAA). For more informationabout the data, seeNOAA Global Surface Summary of the Day Weather Data.

When to use wildcard tables

Wildcard tables are useful when a dataset contains multiple, similarly namedtables that have compatible schemas. Typically, such datasets contain tablesthat each represent data from a single day, month, or year. For example, apublic dataset hosted by BigQuery, theNOAA Global Surface Summary of the Day Weather Data,contains a table for each year from 1929 through the present.

A query that scans all the table IDs from 1929 through 1940 would be very longif you have to name all 12 tables in theFROM clause (most of the tables areomitted in this sample):

#standardSQLSELECTmax,ROUND((max-32)*5/9,1)celsius,mo,da,yearFROM(SELECT*FROM`bigquery-public-data.noaa_gsod.gsod1929`UNIONALLSELECT*FROM`bigquery-public-data.noaa_gsod.gsod1930`UNIONALLSELECT*FROM`bigquery-public-data.noaa_gsod.gsod1931`UNIONALL# ... Tables omitted for brevitySELECT*FROM`bigquery-public-data.noaa_gsod.gsod1940`)WHEREmax!=9999.9# code for missing dataORDERBYmaxDESC

The same query using a wildcard table is much more concise:

#standardSQLSELECTmax,ROUND((max-32)*5/9,1)celsius,mo,da,yearFROM`bigquery-public-data.noaa_gsod.gsod19*`WHEREmax!=9999.9# code for missing dataAND_TABLE_SUFFIXBETWEEN'29'AND'40'ORDERBYmaxDESC

Wildcard tables support built-in BigQuery storage only. You cannot use wildcards when querying anexternal table or aview.

Wildcard table syntax

Wildcard table syntax:

SELECTFROM  `<project-id>.<dataset-id>.<table-prefix>*`WHEREbool_expression

<project-id>: Cloud Platform project ID. Optional if you use your default project ID.
<dataset-id>: BigQuery dataset ID.
<table-prefix>: A string that is common across all tables that are matched by the wildcardcharacter. The table prefix is optional. Omitting the table prefix matchesall tables in the dataset.
* (wildcard character): The wildcard character, "*", represents one or more characters of a tablename. The wildcard character can appear only as the final character of awildcard table name.

Queries with wildcard tables support the_TABLE_SUFFIX pseudocolumn in theWHERE clause. This column contains the values matched by the wildcardcharacter, so that queries can filter which tables are accessed. For example,the followingWHERE clauses usecomparison operatorsto filter the matched tables:

WHERE  _TABLE_SUFFIX BETWEEN '29' AND '40'WHERE  _TABLE_SUFFIX = '1929'WHERE  _TABLE_SUFFIX < '1941'

For more information about the_TABLE_SUFFIX pseudocolumn, seeFiltering selected tables using _TABLE_SUFFIX.

Enclose table names with wildcards in backticks

The wildcard table name contains the special character (*), which means thatyou must enclose the wildcard table name in backtick (`) characters. Forexample, the following query is valid because it uses backticks:

#standardSQL/* Valid SQL query */SELECTmaxFROM`bigquery-public-data.noaa_gsod.gsod*`WHEREmax!=9999.9# code for missing dataAND_TABLE_SUFFIX='1929'ORDERBYmaxDESC

The following query is NOT valid because it isn't properly quoted with backticks:

#standardSQL/* Syntax error: Expected end of statement but got "-" at [4:11] */SELECTmaxFROM# missing backticksbigquery-public-data.noaa_gsod.gsod*WHEREmax!=9999.9# code for missing dataAND_TABLE_SUFFIX='1929'ORDERBYmaxDESC

Quotation marks don't work:

#standardSQL/* Syntax error: Unexpected string literal: 'bigquery-public-data.noaa_gsod.gsod*' at [4:3] */SELECTmaxFROM# quotes are not backticks'bigquery-public-data.noaa_gsod.gsod*'WHEREmax!=9999.9# code for missing dataAND_TABLE_SUFFIX='1929'ORDERBYmaxDESC

Query tables using wildcard tables

Wildcard tables enable you to query several tables concisely.For example, a public dataset hosted by BigQuery,theNOAA Global Surface Summary of the Day Weather Data,contains a table for each year from 1929 through the present that all share thecommon prefixgsod followed by the four-digit year. The tables are namedgsod1929,gsod1930,gsod1931, etc.

To query a group of tables that share a common prefix, use the table wildcardsymbol (*) after the table prefix in yourFROM statement. For example,the following query finds the maximum temperature reported during the 1940s:

#standardSQLSELECT  max,  ROUND((max-32)*5/9,1) celsius,  mo,  da,  yearFROM`bigquery-public-data.noaa_gsod.gsod194*`WHERE  max != 9999.9 # code for missing dataORDER BY  max DESC

Filter selected tables using _TABLE_SUFFIX

To restrict a query so that it scans only a specified set of tables, use the_TABLE_SUFFIX pseudocolumn in aWHERE clause with a condition that is a constant expression.

The_TABLE_SUFFIX pseudocolumn contains the values matched by the tablewildcard. For example, the previous sample query, which scans all tables fromthe 1940s, uses a table wildcard to represent the last digit of the year:

FROM`bigquery-public-data.noaa_gsod.gsod194*`

The corresponding_TABLE_SUFFIX pseudocolumn contains values in the range0 through9, representing the tablesgsod1940 throughgsod1949. These_TABLE_SUFFIX values can be used in aWHERE clause to filter for specifictables.

For example, to filter for the maximum temperature in the years 1940 and 1944,use the values0 and4 for_TABLE_SUFFIX:

#standardSQLSELECT  max,  ROUND((max-32)*5/9,1) celsius,  mo,  da,  yearFROM`bigquery-public-data.noaa_gsod.gsod194*`WHERE  max != 9999.9 # code for missing data  AND ( _TABLE_SUFFIX = '0'    OR _TABLE_SUFFIX = '4' )ORDER BY  max DESC

Using_TABLE_SUFFIX can greatly reduce the number of bytes scanned, which helps reduce the cost of running your queries.

However, filters on_TABLE_SUFFIX that include conditions without constantexpressions don't limit the number of tables scanned in a wildcard table. Forexample, the following query does not limit the tables scanned for the wildcardtablebigquery-public-data.noaa_gsod.gsod19* because the filter uses thedynamic value of thetable_id column:

#standardSQL# Scans all tables that match the prefix `gsod19`SELECTROUND((max-32)*5/9,1)celsiusFROM`bigquery-public-data.noaa_gsod.gsod19*`WHERE_TABLE_SUFFIX=(SELECTSUBSTR(MAX(table_name),LENGTH('gsod19')+1)FROM`bigquery-public-data.noaa_gsod.INFORMATION_SCHEMA.TABLES`WHEREtable_nameLIKE'gsod194%')

As another example, the following query limits the scan based on the firstfilter condition,_TABLE_SUFFIX BETWEEN '40' and '60', because it is a constant expression.However, the following query does not limit the scan based on the second filtercondition,_TABLE_SUFFIX = (SELECT SUBSTR(MAX(table_name), LENGTH('gsod19')+ 1) FROMbigquery-public-data.noaa_gsod.INFORMATION_SCHEMA.TABLES WHERE table_name LIKE'gsod194%'), because it is a dynamic expression:

#standardSQL# Scans all tables with names that fall between `gsod1940` and `gsod1960`SELECTROUND((max-32)*5/9,1)celsiusFROM`bigquery-public-data.noaa_gsod.gsod19*`WHERE_TABLE_SUFFIXBETWEEN'40'AND'60'AND_TABLE_SUFFIX=(SELECTSUBSTR(MAX(table_name),LENGTH('gsod19')+1)FROM`bigquery-public-data.noaa_gsod.INFORMATION_SCHEMA.TABLES`WHEREtable_nameLIKE'gsod194%')

As a workaround, you can perform two separate queries instead; for example:

First query:

#standardSQL# Get the list of tables that match the required table name prefixesSELECT SUBSTR(MAX(table_name), LENGTH('gsod19') + 1)      FROM`bigquery-public-data.noaa_gsod.INFORMATION_SCHEMA.TABLES`      WHERE table_name LIKE 'gsod194%'

Second query:

#standardSQL# Construct the second query based on the values from the first querySELECT  ROUND((max-32)*5/9,1) celsiusFROM`bigquery-public-data.noaa_gsod.gsod19*`WHERE _TABLE_SUFFIX = '49'

These example queries use theINFORMATION_SCHEMA.TABLES view. For moreinformation about theINFORMATION_SCHEMA table, seeGetting tablemetadata using INFORMATION_SCHEMA.

Scanning a range of tables using _TABLE_SUFFIX

To scan a range of tables, use the_TABLE_SUFFIX pseudocolumn along withtheBETWEEN clause. For example, to find the maximum temperature reported inthe years between 1929 and 1935 inclusive, use the table wildcard to representthe last two digits of the year:

#standardSQLSELECT  max,  ROUND((max-32)*5/9,1) celsius,  mo,  da,  yearFROM`bigquery-public-data.noaa_gsod.gsod19*`WHERE  max != 9999.9 # code for missing data  AND _TABLE_SUFFIX BETWEEN '29' and '35'ORDER BY  max DESC

Scanning a range of ingestion-time partitioned tables using _PARTITIONTIME

To scan a range of ingestion-time partitioned tables, use the_PARTITIONTIMEpseudocolumn with the_TABLE_SUFFIX pseudocolumn. For example, the following query scans the January 1, 2017 partition in the tablemy_dataset.mytable_id1.

#standardSQLSELECT  field1,  field2,  field3FROM`my_dataset.mytable_*`WHERE  _TABLE_SUFFIX = 'id1'  AND _PARTITIONTIME = TIMESTAMP('2017-01-01')

Querying all tables in a dataset

To scan all tables in a dataset, you can use an empty prefix and the tablewildcard, which means that the_TABLE_SUFFIX pseudocolumn containsfull table names. For example, the followingFROM clause scans all tables inthe GSOD dataset:

FROM`bigquery-public-data.noaa_gsod.*`

With an empty prefix, the_TABLE_SUFFIX pseudocolumn contains full tablenames. For example, the following query is equivalent to the previous examplethat finds the maximum temperature between the years 1929 and 1935, but usesfull table names in theWHERE clause:

#standardSQLSELECT  max,  ROUND((max-32)*5/9,1) celsius,  mo,  da,  yearFROM`bigquery-public-data.noaa_gsod.*`WHERE  max != 9999.9 # code for missing data  AND _TABLE_SUFFIX BETWEEN 'gsod1929' and 'gsod1935'ORDER BY  max DESC

Note, however, that longer prefixes generally perform better. For moreinformation, seeBest practices.

Query execution details

Schema used for query evaluation

In order to execute a GoogleSQL query that uses a wildcard table,BigQuery automatically infers the schema for that table.BigQuery uses the schema for the most recently created tablethat matches the wildcard as the schema for the wildcard table. Even if yourestrict the number of tables that you want to use from the wildcard table usingthe_TABLE_SUFFIX pseudocolumn in aWHERE clause, BigQueryuses the schema for the most recently created table that matches the wildcard.

If a column from the inferred schema doesn't exist in a matched table, thenBigQuery returnsNULL values for that column in the rows forthe table that is missing the column.

If the schema is inconsistent across the tables matched by the wildcardquery, then BigQuery returns an error. This is the case whenthe columns of the matched tables have different data types, or when the columnswhich are not present in all of the matched tables cannot be assumed to have anull value.

Best practices

Longer prefixes generally perform better than shorter prefixes. For example,the following query uses a long prefix (gsod200):

#standardSQLSELECTmaxFROM`bigquery-public-data.noaa_gsod.gsod200*`WHEREmax!=9999.9# code for missing dataAND_TABLE_SUFFIXBETWEEN'0'AND'1'ORDERBYmaxDESC

The following query generally performs worse because it uses an emptyprefix:

#standardSQLSELECTmaxFROM`bigquery-public-data.noaa_gsod.*`WHEREmax!=9999.9# code for missing dataAND_TABLE_SUFFIXBETWEEN'gsod2000'AND'gsod2001'ORDERBYmaxDESC

Partitioning is recommended over sharding, because partitioned tables performbetter. Sharding reduces performance while creating more tables to manage. Formore information, seePartitioning versus sharding.

For best practices for controlling costs in BigQuery, seeControlling costs in BigQuery

What's next

For more information about GoogleSQL, see theGoogleSQL query reference.

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-18 UTC.

Movatterモバイル変換