Contribution analysis overview
Use this document to understand the contribution analysis use case, and the options for performing contribution analysis in BigQuery ML.
What is contribution analysis?
Contribution analysis, also called key driver analysis, is a method used to generate insights about changes to key metrics in your multi-dimensional data. For example, you can use contribution analysis to see what data contributed to a change in revenue numbers across two quarters, or to compare two sets of training data to understand changes in an ML model's performance.
Contribution analysis is a form of augmented analytics, which is the use of artificial intelligence (AI) to enhance and automate the analysis and understanding of data. Contribution analysis accomplishes one of the key goals of augmented analytics, which is to help users find patterns in their data.
Contribution analysis with BigQuery ML
To use contribution analysis in BigQuery ML, create a contribution analysis model with the CREATE MODEL statement.
A contribution analysis model detects segments of data that show changes in a given metric by comparing a test set of data to a control set of data. For example, you might use a table snapshot of sales data taken at the end of 2023 as your test data and a table snapshot taken at the end of 2022 as your control data, and compare them to see how your sales changed over time. A contribution analysis model could show you which segment of data, such as online customers in a particular region, drove the biggest change in sales from one year to the next.
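The test-versus-control comparison described above can be sketched in plain Python. This is only an illustration of the idea, not how BigQuery ML computes its results: the column names (`region`, `channel`, `sales`, `is_test`), the sample values, and the helper `metric_change_by_segment` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical rows: (region, channel, sales, is_test).
# is_test=True marks the test set (for example, 2023 data);
# is_test=False marks the control set (for example, 2022 data).
rows = [
    ("west", "online", 120.0, False),
    ("west", "store", 200.0, False),
    ("east", "online", 150.0, False),
    ("west", "online", 310.0, True),
    ("west", "store", 190.0, True),
    ("east", "online", 160.0, True),
]


def metric_change_by_segment(rows):
    """Return {segment: (control_total, test_total, difference)}."""
    totals = defaultdict(lambda: [0.0, 0.0])  # [control, test]
    for region, channel, sales, is_test in rows:
        totals[(region, channel)][1 if is_test else 0] += sales
    return {seg: (c, t, t - c) for seg, (c, t) in totals.items()}


changes = metric_change_by_segment(rows)

# The segment with the largest absolute change between control and test.
top_segment = max(changes, key=lambda seg: abs(changes[seg][2]))
print(top_segment, changes[top_segment])
# prints: ('west', 'online') (120.0, 310.0, 190.0)
```

In this made-up data, online customers in the west region account for the largest change in sales, which is the kind of finding a contribution analysis model is meant to surface.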
A metric is the numerical value that contribution analysis models use to measure and compare the changes between the test and control data. You can specify the following types of metrics with a contribution analysis model:
- Summable: sums the values of a metric column that you specify, and then determines a total for each segment of the data.
- Summable ratio: sums the values of two numeric columns that you specify, and determines the ratio between them for each segment of the data.
- Summable by category: sums the value of a numeric column and divides it by the number of distinct values from a categorical column.
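The three metric types above can be written out as small Python functions operating on the rows of a single segment. This is a sketch of the arithmetic as the list describes it, under the assumption that each row carries a revenue value, a unit count, and a customer identifier; those column choices and the function names are hypothetical and are not BigQuery ML identifiers.

```python
# Hypothetical rows for one segment: (revenue, units, customer_id).
segment_rows = [
    (100.0, 4, "c1"),
    (50.0, 1, "c2"),
    (150.0, 5, "c1"),
]


def summable(rows):
    """Summable: total of a single metric column (here, revenue)."""
    return sum(revenue for revenue, _, _ in rows)


def summable_ratio(rows):
    """Summable ratio: sum of one column divided by the sum of another
    (here, total revenue per total units)."""
    total_revenue = sum(revenue for revenue, _, _ in rows)
    total_units = sum(units for _, units, _ in rows)
    return total_revenue / total_units


def summable_by_category(rows):
    """Summable by category: sum of a numeric column divided by the number
    of distinct values in a categorical column (here, customer_id)."""
    total_revenue = sum(revenue for revenue, _, _ in rows)
    distinct_customers = {customer for _, _, customer in rows}
    return total_revenue / len(distinct_customers)


print(summable(segment_rows))              # 300.0
print(summable_ratio(segment_rows))        # 30.0
print(summable_by_category(segment_rows))  # 150.0
```

The sketch does not guard against an empty segment or a zero denominator; real data would need that handling.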
A segment is a slice of the data identified by a given combination of dimension values. For example, for a contribution analysis model based on the store_number, customer_id, and day dimensions, every unique combination of those dimension values represents a segment. In the following table, each row represents a different segment:
| store_number | customer_id | day |
|---|---|---|
| store 1 | | |
| store 1 | customer 1 | |
| store 1 | customer 1 | Monday |
| store 1 | customer 1 | Tuesday |
| store 1 | customer 2 | |
| store 2 | | |
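One way to picture how segments arise from dimension values is to list, for each row of data, every combination of its dimension values with the remaining dimensions left open. The Python sketch below does this, using `None` to play the role of the blank cells in the table. It is an assumption that this matches the exact set of segments BigQuery ML evaluates; the sample rows and the helper `segments_for_row` are hypothetical.

```python
from itertools import combinations

DIMENSIONS = ("store_number", "customer_id", "day")

# Hypothetical rows with one value per dimension, in DIMENSIONS order.
rows = [
    ("store 1", "customer 1", "Monday"),
    ("store 1", "customer 1", "Tuesday"),
    ("store 1", "customer 2", "Monday"),
]


def segments_for_row(row):
    """Yield every segment the row belongs to.

    A segment fixes the values of one or more dimensions; None means
    the dimension is left open (any value).
    """
    indexes = range(len(row))
    for size in range(1, len(row) + 1):
        for kept in combinations(indexes, size):
            yield tuple(row[i] if i in kept else None for i in indexes)


# Unique segments across all rows.
all_segments = {seg for row in rows for seg in segments_for_row(row)}

for seg in sorted(all_segments, key=lambda s: tuple(v or "" for v in s)):
    print(dict(zip(DIMENSIONS, seg)))
```

Even three rows over three dimensions produce 15 distinct segments here, which shows how quickly the number of segments grows as dimensions and values are added.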
To reduce model creation time, specify an apriori support threshold. An apriori support threshold lets you prune small and less relevant segments so that the model uses only the largest and most relevant segments.
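The pruning idea can be sketched as a simple filter. In this Python example, a segment's support is taken to be its share of the total metric value; that definition is an assumption made for illustration, and the CREATE MODEL reference should be consulted for how BigQuery ML actually defines and applies the threshold. The segment totals and the helper `prune_by_support` are hypothetical.

```python
# Hypothetical per-segment metric totals, keyed by (store_number, customer_id).
# None means the dimension is left open.
segment_totals = {
    ("store 1", None): 900.0,
    ("store 1", "customer 1"): 850.0,
    ("store 1", "customer 2"): 50.0,
    ("store 2", None): 300.0,
}
GRAND_TOTAL = 1200.0  # total metric over all rows (store 1 + store 2)


def prune_by_support(totals, grand_total, min_support):
    """Keep only segments whose share of the grand total is at least
    min_support (assumed definition of support, for illustration)."""
    return {
        seg: value
        for seg, value in totals.items()
        if value / grand_total >= min_support
    }


kept = prune_by_support(segment_totals, GRAND_TOTAL, min_support=0.1)
for seg, value in kept.items():
    print(seg, round(value / GRAND_TOTAL, 3))
```

With a threshold of 0.1, the small ("store 1", "customer 2") segment, at roughly 4% of the total, is dropped, while the three larger segments remain.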
After you have created a contribution analysis model, you can use the ML.GET_INSIGHTS function to retrieve the metric information calculated by the model. The model output consists of rows of insights, where each insight corresponds to a segment and provides the segment's corresponding metrics.
Contribution analysis user journey
The following table describes the statements and functions you can use with contribution analysis models:
| Model creation | Feature preprocessing | Insights generation | Tutorials |
|---|---|---|---|
| CREATE MODEL | Manual preprocessing | ML.GET_INSIGHTS | |
What's next
Last updated 2026-02-19 UTC.