Optimize storage for query performance
This page provides best practices for optimizing BigQuerystorage for query performance. You can alsooptimize storage for cost.While these best practices are primarily focused on tables using BigQuerystorage, they can be applied to external tables as well.
BigQuery stores data in columnar format. Column-orienteddatabases are optimized for analytic workloads that aggregate data over a verylarge number of records. As columns have typically more redundancy than rows,this characteristic allows for greater data compression by using techniques suchas run-length encoding. For more information about how BigQuerystores data, seeOverview of BigQuery storage.Optimizing BigQuery storage improvesquery performanceandcontrols cost.
BigQuery provides details about the storage consumption of yourresources.To view the table storage metadata, query the followingINFORMATION_SCHEMA views:
INFORMATION_SCHEMA.TABLE_STORAGEINFORMATION_SCHEMA.TABLE_STORAGE_BY_ORGANIZATIONINFORMATION_SCHEMA.TABLE_STORAGE_USAGE_TIMELINEINFORMATION_SCHEMA.TABLE_STORAGE_USAGE_TIMELINE_BY_ORGANIZATION
Cluster table data
Best practice: Create clustered tables.
To optimize storage for queries, start by clustering table data. By clusteringfrequently used columns, you can reduce the total volume of data scanned by thequery. For information about how to create clusters, seeCreate and use clustered tables.
Partition table data
Best practice: Divide large tables with partitions.
With partitions, you can group and sort your data by a set of definedcolumn characteristics, such as an integer column, a time-unit column, or theingestion time. Partitioning improves the query performance and controlcosts by reducing the number of bytes read by a query.
For more information about partitions, seeIntroduction to partitioned tables.
Use the table and partition expiration settings
Best practice: To optimize storage, configure the default expiration settings fordatasets,tables,andpartitioned tables.
You can control storage costs and optimize storage usage by setting the defaulttable expiration for newly created tables in a dataset. When a table expires,it gets deleted along with all of the data that the table contains. If you setthe property when the dataset is created, any table created in the dataset isdeleted after the expiration period. If you set the property after the datasetis created, only new tables are deleted after the expiration period.
For example, if you set the default table expiration to seven days, older data isautomatically deleted after one week.
This option is useful if you need access to only the most recent data. It isalso useful if you are experimenting with data and don't need to preserve it.
If your tables are partitioned by date, the dataset's default table expirationapplies to the individual partitions. You can also control partition expirationusing thetime_partitioning_expiration flag in the bq command-line tool ortheexpirationMs configuration setting in the API. When a partition expires,data in the partition is deleted but the partitioned table is not dropped evenif the table is empty.
For example, the following command expires partitions after three days:
bqmk\--time_partitioning_type=DAY\--time_partitioning_expiration=259200\project_id:dataset.table
Aggregate long-term data
Best practice: Identify if row-level data needs to be stored long term, andif not, only store aggregated data long term.
In many cases, details contained in transactional or row-level data are usefulin the short term, but are referenced less over the long term. In thesesituations, you can build aggregation queries to compute and store the metricsassociated with this data, and then use table or partition expiration tosystematically remove the row-level data. This reduces storage chargeswhile keeping metrics available for long-term consumption.
What's next
- Learn how tooptimize cost.
- Learn how tooptimize query.
- Learn how tooptimize functions.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.