Partition and cluster BigQuery tables

By configuring partitioning and clustering for your BigQuerydestination, you can optimize query performance and control costs. Partitioningand clustering lets you reduce the amount of data that BigQuery needsto scan during query execution.

Table partitioning

Table partitioning divides your table into segments called partitions, which makeit easier to manage and query your data. You partition tables by specifying apartition column, which is used to segment the table.

Datastream supports three types of partitioning for BigQuerytables:

For information about how to configure partitioning for your stream, seeConfigure information about the source database for the stream.

Table clustering

Clustering sorts the data in a table based on the values in the clusteringcolumns and colocates data with similar values. Queries that filter by theclustered columns only scan the relevant data blocks instead of the entire tableor table partition. This can improve query performance and reduce query costs.

You can cluster your BigQuery tables by up to four columns. The orderin which you specify the columns determines the sort order of the data.

Note: If you don't specify clustering settings for a table, Datastream defaults to using up to four primary keys from the source table as the clustering keys in BigQuery.

You can cluster a table based on a source column of a data type thatDatastream maps to one of the following BigQuery data types:

  • BIGNUMERIC
  • BOOLEAN
  • DATE
  • DATETIME
  • GEOGRAPHY
  • INT64
  • NUMERIC
  • RANGE
  • STRING
  • TIMESTAMP

For information about how Datastream maps source data types toBigQuery data types, seeData type mappings in BigQuery.

For information about how to configure clustering for your stream, seeConfigure information about the source database for the stream.

Limitations

  • For partitioning limitations in BigQuery, seeIntroduction topartitioned tables.

  • For clustering limitations in BigQuery, seeIntroduction toclustered tables.

  • For quota limits that apply to BigQuery partitioned tables, seeQuotas and limits.

  • Additionally, the following limitations apply when usingDatastream:

    • Partitioning and clustering is only applied when Datastreamcreates a new BigQuery table. If you want to configure or modifypartitioning and clustering settings for a table that already exists,first delete it from BigQuery, and then configure or modifyits settings in Datastream.
    • For MongoDB sources, you can partition your tables only by ingestion time.
    • Clustering isn't supported for MongoDB databases.

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.