Partition and cluster BigQuery tables Stay organized with collections Save and categorize content based on your preferences.
By configuring partitioning and clustering for your BigQuerydestination, you can optimize query performance and control costs. Partitioningand clustering lets you reduce the amount of data that BigQuery needsto scan during query execution.
Table partitioning
Table partitioning divides your table into segments called partitions, which makeit easier to manage and query your data. You partition tables by specifying apartition column, which is used to segment the table.
Datastream supports three types of partitioning for BigQuerytables:
Partitioning by ingestion time: you can partition a table based on thetime that Datastream ingests the data into BigQuery. Youcan select hourly, daily, monthly, or yearly granularity for ingestion timepartitioning. Daily is the default granularity.
Partitioning by time-unit: you can partition a table based on asource column of a data type that Datastream maps to one of thefollowing BigQuery data types:
Note: When you configure partitioning in Google Cloud, the column selectiondrop-down shows only the source columns or fields that are eligible forpartitioning. If you use the Datastream API, Google Cloud CLI,or Terraform, the request fails with an error if you try to use a sourcetype that doesn't map to an eligible BigQuery data type.DATE,DATETIME, orTIMESTAMP.For information about how Datastream maps source data types toBigQuery data types, seeData type mappings inBigQuery.Depending on the BigQuery data type to which the source columnmaps, you can select one of the following granularity types:
DAY,MONTH,YEAR: for theDATEdata type.
Note:HOUR,DAY,MONTH,YEAR: for theDATETIMEandTIMESTAMPdatatypes.DAYis the default granularity type for all eligible data types.
Partitioning by integer range: you can partition a table by asource column of a data type that Datastream then maps to theBigQuery
INTEGERdata type. You need to provide the followinginformation:- Start: the start value of the first partition range (inclusive).
- End: The end value of the last partition range (exclusive).
- Interval: The width of each partition range.
For more information, seeInteger range partitioning.
For information about how to configure partitioning for your stream, seeConfigure information about the source database for the stream.
Table clustering
Clustering sorts the data in a table based on the values in the clusteringcolumns and colocates data with similar values. Queries that filter by theclustered columns only scan the relevant data blocks instead of the entire tableor table partition. This can improve query performance and reduce query costs.
You can cluster your BigQuery tables by up to four columns. The orderin which you specify the columns determines the sort order of the data.
Note: If you don't specify clustering settings for a table, Datastream defaults to using up to four primary keys from the source table as the clustering keys in BigQuery.You can cluster a table based on a source column of a data type thatDatastream maps to one of the following BigQuery data types:
BIGNUMERICBOOLEANDATEDATETIMEGEOGRAPHYINT64NUMERICRANGESTRINGTIMESTAMP
For information about how Datastream maps source data types toBigQuery data types, seeData type mappings in BigQuery.
For information about how to configure clustering for your stream, seeConfigure information about the source database for the stream.
Limitations
For partitioning limitations in BigQuery, seeIntroduction topartitioned tables.
For clustering limitations in BigQuery, seeIntroduction toclustered tables.
For quota limits that apply to BigQuery partitioned tables, seeQuotas and limits.
Additionally, the following limitations apply when usingDatastream:
- Partitioning and clustering is only applied when Datastreamcreates a new BigQuery table. If you want to configure or modifypartitioning and clustering settings for a table that already exists,first delete it from BigQuery, and then configure or modifyits settings in Datastream.
- For MongoDB sources, you can partition your tables only by ingestion time.
- Clustering isn't supported for MongoDB databases.
What's next
- To learn more about partitioning in BigQuery, seeIntroduction to partitioned tables andManaging partitioned tables.
- For more information about clustering in BigQuery, seeIntroduction to clustered tables andManage clustered tables.
- To learn more about streams, seeStreamlifecycle.
- To learn how to create a stream, seeCreate astream.
- To learn more about BigQuery, seeBigQuerydestination.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.