Manage partition and cluster recommendations
This document describes how the partition and cluster recommender works, howto view your recommendations and insights, and how can apply partition andcluster recommendations.
How the recommender works
The BigQuery partitioning and clustering recommender generatespartition orclusterrecommendations to optimize your BigQuery tables. The recommenderanalyzes workflows on your BigQuery tables and offersrecommendations to better optimize your workflows and query costs using eithertable partitioning or table clustering.
For more information about the Recommender service, see theRecommender overview.
The partitioning and clustering recommenderuses your organization's workload execution data from up to 30 days in the pastto analyze eachBigQuery table for suboptimal partitioning and clusteringconfigurations. The recommender also uses machine learning to predict how muchthe workload execution could be optimized with different partitioning orclustering configurations. If the recommender finds that partitioning orclustering a table yields significant savings, the recommender generates arecommendation. The partitioning and clustering recommendergenerates the following types of recommendations:
| Existing table type | Recommendation subtype | Recommendation example |
|---|---|---|
| Non-partitioned, non-clustered | Partition | "Save about 64 slot hours per month by partitioning on column_C by DAY" |
| Non-partitioned, non-clustered | Cluster | "Save about 64 slot hours per month by clustering on column_C" |
| Partitioned, non-clustered | Cluster | "Save about 64 slot hours per month by clustering on column_C" |
Each recommendation consists of three parts:
- Guidance to either partition or cluster a specific table
- The specific column in a table to partition or cluster
- Estimated monthly savings for applying the recommendation
To calculate potential workload savings, the recommender assumes that the historical execution workload data from the past 30 days represents the future workload.
Note: In some cases, the estimated savings might be overestimated. For more information, seeOverestimation of savings.The recommender API also returns table workload information in the form ofinsights.Insights are findings that help youunderstand your project's workload, providing more context on how a partition orcluster recommendation might improve workload costs.
Limitations
The partitioning and clustering recommender does not supportBigQuery tables with legacy SQL. When generating arecommendation, the recommender excludes any legacy SQL queries in itsanalysis. Additionally, applying partition recommendations onBigQuery tables with legacy SQL breaks any legacy SQLworkflows in that table.
Before you apply partition recommendations,migrate your legacy SQL workflows into GoogleSQL.
BigQuery does not support changing the partitioning schemeof a table in place. You can only change the partitioning of a table on a copyof the table. For more information, seeApply partition recommendations.
The partitioning and clustering recommender runs daily. However, if therun takes longer than 24 hours to complete, the following day's run is skipped.
Locations
The partitioning and clustering recommender is available in the followingprocessing locations:
| Region description | Region name | Details | |
|---|---|---|---|
| Asia Pacific | |||
| Delhi | asia-south2 | ||
| Hong Kong | asia-east2 | ||
| Jakarta | asia-southeast2 | ||
| Mumbai | asia-south1 | ||
| Osaka | asia-northeast2 | ||
| Seoul | asia-northeast3 | ||
| Singapore | asia-southeast1 | ||
| Sydney | australia-southeast1 | ||
| Taiwan | asia-east1 | ||
| Tokyo | asia-northeast1 | ||
| Europe | |||
| Belgium | europe-west1 | ||
| Berlin | europe-west10 | ||
| EU multi-region | eu | ||
| Frankfurt | europe-west3 | ||
| London | europe-west2 | ||
| Netherlands | europe-west4 | ||
| Zürich | europe-west6 | ||
| Americas | |||
| Iowa | us-central1 | ||
| Las Vegas | us-west4 | ||
| Los Angeles | us-west2 | ||
| Montréal | northamerica-northeast1 | ||
| Northern Virginia | us-east4 | ||
| Oregon | us-west1 | ||
| Salt Lake City | us-west3 | ||
| São Paulo | southamerica-east1 | ||
| Toronto | northamerica-northeast2 | ||
| US multi-region | us | ||
Before you begin
Required permissions
To get the permissions that you need to access partition and cluster recommendations, ask your administrator to grant you theBigQuery Partitioning Clustering Recommender Viewer (roles/recommender.bigqueryPartitionClusterViewer) IAM role. For more information about granting roles, seeManage access to projects, folders, and organizations.
This predefined role contains the permissions required to access partition and cluster recommendations. To see the exact permissions that are required, expand theRequired permissions section:
Required permissions
The following permissions are required to access partition and cluster recommendations:
recommender.bigqueryPartitionClusterRecommendations.getrecommender.bigqueryPartitionClusterRecommendations.list
You might also be able to get these permissions withcustom roles or otherpredefined roles.
For more information about IAM roles and permissions inBigQuery, seeIntroduction to IAM.
View recommendations
This section describes how to view partition and cluster recommendations andinsights using the Google Cloud console, the Google Cloud CLI, or the Recommender API.
Note: You can also export recommendations to BigQuery using theBigQuery Data Transfer Service. For more information, seeExport recommendations to BigQuery.Select one of the following options:
Console
In the Google Cloud console, go to theBigQuery page.
In the navigation menu, clickRecommendations.
The recommendations tab lists all recommendations available to yourproject.
In theOptimize BigQuery workload cost panel, clickView all.
The cost recommendation table lists all recommendations generated forthe current project. For example, the following screenshot shows thatthe recommender analyzed the
example_tabletable, and then recommendedclustering theexample_columncolumn to save an approximate amount ofbytes and slots.
To see more information about the table insight and recommendation,click a recommendation.
gcloud
To view partition or cluster recommendations for a specific project, usethegcloud recommender recommendations list command:
gcloud recommender recommendations list \ --project=PROJECT_NAME \ --location=REGION_NAME \ --recommender=google.bigquery.table.PartitionClusterRecommender \ --format=FORMAT_TYPE \
Replace the following:
PROJECT_NAME: the name of the project thatcontains your BigQuery tableREGION_NAME: the region that your project isinFORMAT_TYPE: a supportedgcloud CLI output format—forexample, JSON
| Property | Relevant for subtype | Description |
|---|---|---|
recommenderSubtype | Partition or cluster | Indicates the type of recommendation. |
content.overview.partitionColumn | Partition | Recommended partitioning column name. |
content.overview.partitionTimeUnit | Partition | Recommended partitioning time unit. For example,DAY means the recommendation is to have daily partitions on the recommended column. |
content.overview.clusterColumns | Cluster | Recommended clustering column names. |
- For more information about other fields in the recommender response, seeREST Resource:
projects.locations.recommendersrecommendation. - For more information about using the Recommender API, seeUsing the API - Recommendations.
To view table insights using the gcloud CLI, use thegcloud recommender insights list command:
gcloud recommender insights list \ --project=PROJECT_NAME \ --location=REGION_NAME \ --insight-type=google.bigquery.table.StatsInsight \ --format=FORMAT_TYPE \
Replace the following:
PROJECT_NAME: the name of the project thatcontains your BigQuery tableREGION_NAME: the region that your project isinFORMAT_TYPE: a supportedgcloud CLI output format—forexample, JSON
| Property | Relevant for subtype | Description |
|---|---|---|
content.existingPartitionColumn | Cluster | Existing partitioning column, if any |
content.tableSizeTb | All | Size of the table in terabytes |
content.bytesReadMonthly | All | Monthly bytes read from the table |
content.slotMsConsumedMonthly | All | Monthly slot milliseconds consumed by the workload running on the table |
content.queryJobsCountMonthly | All | Monthly count of jobs running on the table |
- For more information about other fields in the insights response, seeREST Resource:
projects.locations.insightTypes.insights. - For more information about using insights, seeUsing the API - Insights.
REST API
To view partition or cluster recommendations for a specific project,use the REST API. With each command, you must provide an authenticationtoken, which you can get using the gcloud CLI. For moreinformation about getting an authentication token, seeMethods for getting an ID token.
You can use thecurl list request to view all recommendations for aspecific project:
curl -H "Authorization: Bearer$GCLOUD_AUTH_TOKEN" -H "x-goog-user-project:PROJECT_NAME" https://recommender.googleapis.com/v1/projects/my-project/locations/us/recommenders/google.bigquery.table.PartitionClusterRecommender/recommendations
Replace the following:
GCLOUD_AUTH_TOKEN: the name of a validgcloud CLI access tokenPROJECT_NAME: the name of the projectcontaining your BigQuery table
| Property | Relevant for subtype | Description |
|---|---|---|
recommenderSubtype | Partition or cluster | Indicates the type of recommendation. |
content.overview.partitionColumn | Partition | Recommended partitioning column name. |
content.overview.partitionTimeUnit | Partition | Recommended partitioning time unit. For example,DAY means the recommendation is to have daily partitions on the recommended column. |
content.overview.clusterColumns | Cluster | Recommended clustering column names. |
- For more information about other fields in the recommender response, seeREST Resource:
projects.locations.recommendersrecommendation. - For more information about using the Recommender API, seeUsing the API - Recommendations.
To view table insights using the REST API, run the followingcommand:
curl-H "Authorization: Bearer$GCLOUD_AUTH_TOKEN"-H "x-goog-user-project:PROJECT_NAME" https://recommender.googleapis.com/v1/projects/my-project/locations/us/insightTypes/google.bigquery.table.StatsInsight/insights
Replace the following:
GCLOUD_AUTH_TOKEN: the name of a validgcloud CLI access tokenPROJECT_NAME: the name of the projectcontaining your BigQuery table
| Property | Relevant for subtype | Description |
|---|---|---|
content.existingPartitionColumn | Cluster | Existing partitioning column, if any |
content.tableSizeTb | All | Size of the table in terabytes |
content.bytesReadMonthly | All | Monthly bytes read from the table |
content.slotMsConsumedMonthly | All | Monthly slot milliseconds consumed by the workload running on the table |
content.queryJobsCountMonthly | All | Monthly count of jobs running on the table |
- For more information about other fields in the insights response, seeREST Resource:
projects.locations.insightTypes.insights. - For more information about using insights, seeUsing the API - Insights.
View recommendations withINFORMATION_SCHEMA
You can also view your recommendations and insights usingINFORMATION_SCHEMAviews. For example, you can use theINFORMATION_SCHEMA.RECOMMENDATIONS view toview your top three recommendations based on slots savings, as seen in thefollowing example:
SELECTrecommender,target_resources,LAX_INT64(additional_details.overview.bytesSavedMonthly)/POW(1024,3)asest_gb_saved_monthly,LAX_INT64(additional_details.overview.slotMsSavedMonthly)/(1000*3600)asslot_hours_saved_monthly,last_updated_timeFROM`region-us`.INFORMATION_SCHEMA.RECOMMENDATIONSWHEREprimary_impact.category='COST'ANDstate='ACTIVE'ORDERbyslot_hours_saved_monthlyDESCLIMIT3;INFORMATION_SCHEMA view names are case sensitive.The result is similar to the following:
+---------------------------------------------------+--------------------------------------------------------------------------------------------------+| recommender | target_resources | est_gb_saved_monthly | slot_hours_saved_monthly | last_updated_time+---------------------------------------------------+--------------------------------------------------------------------------------------------------+| google.bigquery.materializedview.Recommender | ["project_resource"] | 140805.38289248943 | 9613.139166666666 | 2024-07-01 13:00:00| google.bigquery.table.PartitionClusterRecommender | ["table_resource_1"] | 4393.7416711859405 | 56.61476777777777 | 2024-07-01 13:00:00| google.bigquery.table.PartitionClusterRecommender | ["table_resource_2"] | 3934.07264107652 | 10.499466666666667 | 2024-07-01 13:00:00+---------------------------------------------------+--------------------------------------------------------------------------------------------------+
For more information, see the following resources:
INFORMATION_SCHEMA.RECOMMENDATIONSviewINFORMATION_SCHEMA.RECOMMENDATIONS_BY_ORGANIZATIONviewINFORMATION_SCHEMA.INSIGHTSview
Apply cluster recommendations
To apply cluster recommendations, do one of the following:
- Apply clusters directly to the original table
- Apply clusters to a copied table
- Apply clusters in a materialized view
Apply clusters directly to the original table
You can apply cluster recommendations directly to an existing BigQuerytable. This method is quicker thanapplying recommendations to a copied table,but it does not preserve a backup table.
Follow these steps to apply a new clustering specification to unpartitioned orpartitioned tables.
In the bq tool, update the clustering specification of yourtable to match the new clustering:
bq update --clustering_fields=CLUSTER_COLUMNDATASET.ORIGINAL_TABLE
Replace the following:
CLUSTER_COLUMN: the column you are clustering on—for example,mycolumnDATASET: the name of the dataset containing the table—for example,mydatasetORIGINAL_TABLE: the name of your original table—for example,mytable
You can also call the
tables.updateortables.patchAPI method tomodify the clustering specification.To cluster all rows according to the new clustering specification,run the following
UPDATEstatement: Note: If a new clustering specification is applied to a table that is inlong-term storage, then the table reverts to active storage pricing.For more information, seeStorage pricing.UPDATEDATASET.ORIGINAL_TABLESETCLUSTER_COLUMN=CLUSTER_COLUMNWHEREtrue
Apply clusters to a copied table
When you apply cluster recommendations to a BigQuery table, youcan first copy the original table and then apply the recommendation to thecopied table. This method ensures that your original data is preserved if youneed to roll back the change to the clustering configuration.
You can use this method to apply cluster recommendations to bothunpartitioned and partitioned tables.
In the Google Cloud console, go to the BigQuery page.
In the query editor, create an empty table with the same metadata (includingthe clustering specifications) of the original table by using the
LIKEoperator:CREATETABLEDATASET.COPIED_TABLELIKEDATASET.ORIGINAL_TABLE
Replace the following:
DATASET: the name of the dataset containing the table—for example,mydatasetCOPIED_TABLE: a name for your copied table—for example,copy_mytableORIGINAL_TABLE: the name of your original table—for example,mytable
In the Google Cloud console, open the Cloud Shell Editor.
In the Cloud Shell Editor, update the clustering specification ofthe copied table to match the recommended clustering by using the
bq updatecommand:bq update --clustering_fields=CLUSTER_COLUMNDATASET.COPIED_TABLE
Replace
CLUSTER_COLUMNwith the column you are clustering on—for example,mycolumn.You can also call the
tables.updateortables.patchAPI method tomodify the clustering specification.In the query editor, retrieve the table schema with the partitioning andclustering configuration of the original table, if any partitioning orclustering exists. You can retrieve the schema by viewing the
INFORMATION_SCHEMA.TABLESview of the original table:SELECTddlFROMDATASET.INFORMATION_SCHEMA.TABLESWHEREtable_name='DATASET.ORIGINAL_TABLE;'
The output is the full data definition language (DDL) statement ofORIGINAL_TABLE,including the
PARTITION BYclause. For more information about the argumentsin your DDL output, seeCREATE TABLEstatement.The DDL output indicates the type of partitioning in the original table:
Partitioning type Output example Not partitioned The PARTITION BYclause is absent.Partitioned by table column PARTITION BY c0PARTITION BY DATE(c0)PARTITION BY DATETIME_TRUNC(c0, MONTH)Partitioned by ingestion time PARTITION BY _PARTITIONDATEPARTITION BY DATETIME_TRUNC(_PARTITIONTIME, MONTH)Ingest data into the copied table. The process that you use is based onthe partition type.
- If the original table is non-partitioned or partitioned by a table column,ingest the data from the original table to the copied table:
INSERTINTODATASET.COPIED_TABLESELECT*FROMDATASET.ORIGINAL_TABLE
If the original table is partitioned by ingestion time, follow these steps:
Retrieve the list of columns to form the data ingestion expression byusing the
INFORMATION_SCHEMA.COLUMNSview:SELECTARRAY_TO_STRING((SELECTARRAY(SELECTcolumn_nameFROMDATASET.INFORMATION_SCHEMA.COLUMNSWHEREtable_name='ORIGINAL_TABLE')),", ")
The output is a comma-separated list of column names.
Ingest the data from the original table to the copied table:
INSERTDATASET.COPIED_TABLE(COLUMN_NAMES,_PARTITIONTIME)SELECT*,_PARTITIONTIMEFROMDATASET.ORIGINAL_TABLE
Replace
COLUMN_NAMESwith the list of columnsthat was the output in the preceding step, separated by commas—for example,col1, col2, col3.
You now have a clustered copied table with the same data as the original table.In the next steps, you replace your original table with a newly clustered table.
- If the original table is non-partitioned or partitioned by a table column,ingest the data from the original table to the copied table:
Rename the original table to a backup table:
ALTERTABLEDATASET.ORIGINAL_TABLERENAMETODATASET.BACKUP_TABLE
Replace
BACKUP_TABLEwith a name for your backup table—for example,backup_mytable.Rename the copied table to the original table:
ALTERTABLEDATASET.COPIED_TABLERENAMETODATASET.ORIGINAL_TABLE
Your original table is now clustered according to the cluster recommendation.
- Access and permissions, such asIAM permissions,row-level access, orcolumn-level access.
- Table artifacts such astable clones,table snapshots, orsearch indexes.
- The status of any ongoing table processes, such as anymaterialized views or any jobs that ran when you copied the table.
- The ability to access historical table data usingtime travel.
- Any metadata associated with the original table—for example,
table_option_listorcolumn_option_list. For more information, seeData definition language statements.
If any issues arise, you must manually migrate the affected artifacts to the new table.
After reviewing the clustered table, you can optionally delete the backup table with the following command:DROPTABLEDATASET.BACKUP_TABLE
Apply clusters in a materialized view
You can create a materialized view of the table to store data from the original table with the recommendation applied. Using materialized views to apply recommendations ensures that the clustered data is kept up to date usingautomatic refreshes.There arepricing considerations when you query, maintain, and store materialized views. To learn how to create a clustered materialized view, seeClustered materialized views.Apply partition recommendations
To apply partition recommendations, you must apply it to a copy of the originaltable. BigQuery does not support the changing of a partitioningscheme of a table in place, such as changing an unpartitioned table to apartitioned table, changing the partitioning scheme of a table, or creating amaterialized view with a different partitioning scheme from the base table. Youcan only change the partitioning of a table on a copy of the table.
Caution: Migrate your legacy SQL workflows to GoogleSQL before applyingpartition recommendations. For more information, seeLimitations.Apply partition recommendations to a copied table
When you apply partition recommendations to a BigQuery table, youmust first copy the original table and then apply the recommendation to thecopied table. This approach ensures that your original data is preserved if youneed to roll back a partition.
The following procedure uses an example recommendation to partition a table bythe partition time unitDAY.
Create a copied table using the partition recommendations:
CREATETABLEDATASET.COPIED_TABLEPARTITIONBYDATE_TRUNC(PARTITION_COLUMN,DAY)ASSELECT*FROMDATASET.ORIGINAL_TABLE
Replace the following:
DATASET: the name of the dataset containing the table—for example,mydatasetCOPIED_TABLE: a name for your copied table—for example,copy_mytablePARTITION_COLUMN: the column you are partitioning on—for example,mycolumn
For more information about creating partitioned tables, seeCreating partitioned tables.
Rename the original table to a backup table:
ALTERTABLEDATASET.ORIGINAL_TABLERENAMETODATASET.BACKUP_TABLE
Replace
BACKUP_TABLEwith a name for your backup table—for example,backup_mytable.Rename the copied table to the original table:
ALTERTABLEDATASET.COPIED_TABLERENAMETODATASET.ORIGINAL_TABLE
Your original table is now partitioned according to the partitionrecommendation.
- Access and permissions, such asIAM permissions,row-level access, orcolumn-level access.
- Table artifacts such astable clones,table snapshots, orsearch indexes.
- The status of any ongoing table processes, such as anymaterialized views or any jobs that ran when you copied the table.
- The ability to access historical table data usingtime travel.
- Any metadata associated with the original table—for example,
table_option_listorcolumn_option_list. For more information, seeData definition language statements. - Ability to use legacy SQL to write query results into partitioned tables. The use of legacy SQL isnot fully supported in partitioned tables. One solution is tomigrate your legacy SQL workflows into GoogleSQL before applying a partition recommendation.
If any issues arise, you must manually migrate the affected artifacts to the new table.
After reviewing the partitioned table, you can optionally delete the backup table with the following command:DROPTABLEDATASET.BACKUP_TABLE
Overestimation of savings
In some cases, the BigQuery clustering recommender might provideestimated savings that appear disproportionately large—for example,exceeding your total monthly billed bytes for that specific table. This istypically caused by a pattern known assubquery summation.
What triggers this overestimation
The overestimation is most common for workloads involving complexGoogleSQL queries that reference the same table multiple times.Examples include the following:
- Queries with many self-joins on a single large table.
- Queries with multiple common table expressions or subqueries that allscan the same underlying table.
Why overestimation happens
BigQuery execution plans often break complex queries intomultiple distinctstages. The clustering recommender calculates andsums the potential savings for every individual stage independently.
If a single job consists of many stages that each read from the same table, therecommender might count the potential savings for each stage in that singlejob, rather than de-duplicating the savings at the job level. This can lead tooverestimated recommendations for tables with complex query patterns.
Verify if your workload is affected
If you have a specific clustering recommendation that you want to verify, youcan run the following query in the Google Cloud console to identify jobsthat might be triggering this overestimation.
This query searches your job history for instances where a single job scans thesame table across more than 10 distinct execution stages.
SELECTjob_id,project_id,user_email,table_name,scan_count,total_billed_gb,creation_timeFROM(SELECTjob_id,project_id,user_email,creation_time,total_bytes_billed/(1024*1024*1024)astotal_billed_gb,-- Extract the table name from the 'READ' substepsREGEXP_EXTRACT(substep,r'FROM ([^ ]+)')astable_name,COUNT(DISTINCTstage.id)asscan_countFROM`region-REGION_NAME`.INFORMATION_SCHEMA.JOBS,UNNEST(job_stages)asstage,UNNEST(stage.steps)asstep,UNNEST(step.substeps)assubstepWHEREcreation_time>TIMESTAMP_SUB(CURRENT_TIMESTAMP(),INTERVAL14DAY)ANDstep.kind='READ'ANDsubstepLIKE'FROM %'-- Exclude internal intermediate stagesANDNOTREGEXP_CONTAINS(substep,r'FROM __stage')GROUPBY1,2,3,4,5,6)WHEREscan_count>10-- Adjust this threshold to find more complex query patternsORDERBYscan_countDESCLIMIT100;
ReplaceREGION_NAME with the region that your project is in.
If you find jobs with a highscan_count (for example, greater than 20) for thetable in your recommendation, it is likely that the estimated savings for thattable are inflated. While clustering might still provide a performance benefit,the actual savings won't reach the level suggested by the recommendation.
Pricing
When you apply a recommendation to a table, you can incur the followingcosts:
- Processing costs. When you apply a recommendation, you execute adata definition language (DDL) or data manipulation language(DML) query to your BigQuery project.
- Storage costs. If you use the method of copying a table, you useextra storage for the copied (or backup) table.
Standard processing and storage charges apply depending on the billing accountthat's associated with the project. For more information, seeBigQuery pricing.
Troubleshooting
Issue: No recommendations appear for a specific table.
Partition recommendations might not appear for tables that meet these criteria:
- The table is less than 100GB.
- The table is already partitioned or clustered.
Cluster recommendations might not appear for tables that meet these criteria:
- The table is less than 10GB.
- The table is already clustered.
Both partition and cluster recommendations might be suppressed when:
- The table has a high write cost fromdata manipulation language (DML)operations.
- The table was not read in the past 30 days.
- The estimated monthly savings is too insignificant (less than 1 slot hour of savings).
Issue: The estimated savings appear disproportionately large.
Estimated monthly savings might be overestimated if your workload involvescomplex queries that reference the same table multiple times across manydistinct execution stages. For more information, seeOverestimation of savings.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-18 UTC.