The ML.GET_INSIGHTS function

This document describes the ML.GET_INSIGHTS function, which you can use to retrieve information about changes to key metrics in your multi-dimensional data from a contribution analysis model. You can use a CREATE MODEL statement to create a contribution analysis model in BigQuery.

Syntax

ML.GET_INSIGHTS(
  MODEL `PROJECT_ID.DATASET.MODEL_NAME`)

Arguments

ML.GET_INSIGHTS takes the following arguments:

  • PROJECT_ID: Your project ID.
  • DATASET: The BigQuery dataset that contains the model.
  • MODEL_NAME: The name of the contribution analysis model.

Output

Some of the ML.GET_INSIGHTS output columns contain metrics that compare the values for a given segment in either the test or control dataset against the values for the population, which is all segments in the same dataset. The metric values calculated for the entire population except for the given segment are referred to as complement values.

Output for summable metric contribution analysis models

ML.GET_INSIGHTS returns the following output columns for contribution analysis models that use summable metrics, in addition to the dimension columns:

  • contributors: an ARRAY<STRING> value that contains the dimension values for a given segment. The other output metrics that are returned in the same row apply to the segment described by these dimensions.
  • metric_test: a numeric value that contains the sum of the value of the metric column in the test dataset for the given segment. The metric column is specified in the CONTRIBUTION_METRIC option of the contribution analysis model.

    SUM(metric_column_name) WHERE is_test_col = TRUE

  • metric_control: a numeric value that contains the sum of the value of the metric column in the control dataset for the given segment. The metric column is specified in the CONTRIBUTION_METRIC option of the contribution analysis model.

    SUM(metric_column_name) WHERE is_test_col = FALSE

  • difference: a numeric value that contains the difference between the metric_test and metric_control values:

    metric_test - metric_control

  • relative_difference: a numeric value that contains the relative change in the segment value between the test and control datasets:

    difference / metric_control

  • unexpected_difference: a numeric value that contains the unexpected difference between the segment's actual metric_test value and the segment's expected metric_test value, which is determined by comparing the ratio of change for this segment against the complement ratio of change. The unexpected_difference value is calculated as follows:

    1. Determine the metric_test value for all segments except the given segment, referred to here as complement_test_change:

      complement_test_change = SUM(metric_test for the population) - metric_test

    2. Determine the metric_control value for all segments except the given segment, referred to here as complement_control_change:

      complement_control_change = SUM(metric_control for the population) - metric_control

    3. Determine the ratio between the complement_test_change and complement_control_change values, referred to here as complement_change_ratio:

      complement_change_ratio = complement_test_change / complement_control_change

    4. Determine the expected metric_test value for the given segment, referred to here as expected_metric_test:

      expected_metric_test = metric_control * complement_change_ratio

    5. Determine the unexpected_difference value:

      unexpected_difference = metric_test - expected_metric_test

  • relative_unexpected_difference: a numeric value that contains the ratio between the unexpected_difference value and the expected_metric_test value:

    unexpected_difference / expected_metric_test

    You can use the relative_unexpected_difference value to determine if the change to this segment is smaller than expected compared to the change in all of the other segments.

  • apriori_support: a numeric value that contains the apriori support value for the segment. The apriori support value is either the ratio between the metric_test value for the segment and the metric_test value for the population, or the ratio between the metric_control value for the segment and the metric_control value for the population, whichever is greater. The calculation is expressed as the following:

    GREATEST(
      metric_test / SUM(metric_test for the population),
      metric_control / SUM(metric_control for the population))

    If the apriori_support value is less than the apriori support threshold value specified in the model, then the segment is considered too small to be of interest and is excluded by the model.

  • contribution: a numeric value that contains the absolute value of the difference value: ABS(difference).

Insights are automatically ordered by contribution in descending order to determine the contributors associated with the largest differences in your data between the test and control sets.
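The summable metric formulas above can be sketched in Python as follows. This is an illustrative sketch, not BigQuery ML's implementation: the function name summable_insights, its parameters, and the sample numbers are hypothetical, and the sketch assumes nonzero control values.

```python
def summable_insights(metric_test, metric_control, population_test, population_control):
    """Compute the summable-metric output values for a single segment.

    Hypothetical helper: assumes nonzero control values and does not
    handle zero denominators.
    """
    difference = metric_test - metric_control
    relative_difference = difference / metric_control

    # Steps 1-3: how the rest of the population (the complement) changed.
    complement_test_change = population_test - metric_test
    complement_control_change = population_control - metric_control
    complement_change_ratio = complement_test_change / complement_control_change

    # Steps 4-5: what the segment would be if it had changed like the complement.
    expected_metric_test = metric_control * complement_change_ratio
    unexpected_difference = metric_test - expected_metric_test

    return {
        "difference": difference,
        "relative_difference": relative_difference,
        "unexpected_difference": unexpected_difference,
        "relative_unexpected_difference": unexpected_difference / expected_metric_test,
        "apriori_support": max(metric_test / population_test,
                               metric_control / population_control),
        "contribution": abs(difference),
    }


# Hypothetical segment: 120 in test, 100 in control; population totals 1120 and 900.
insights = summable_insights(120, 100, 1120, 900)
print(insights["difference"])             # 20
print(insights["unexpected_difference"])  # -5.0
```

In this made-up case the segment grew by 20, but the complement grew by 25% (from 800 to 1000), so the segment was expected to reach 125. The negative unexpected_difference indicates that the segment changed less than the rest of the population.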

Output for summable ratio metric contribution analysis models

ML.GET_INSIGHTS returns the following output columns for contribution analysis models that use summable ratio metrics, in addition to the dimension columns:

  • contributors: an ARRAY<STRING> value that contains the dimension values for a given segment. The other output metrics that are returned in the same row apply to the segment described by these dimensions.
  • metric_test: a numeric value that contains the ratio between the two metrics that you are evaluating, in the test dataset for the given segment. These two metrics are specified in the CONTRIBUTION_METRIC option of the contribution analysis model. The metric_test value is calculated as the following:

    SUM(numerator_metric_column_name) / SUM(denominator_metric_column_name) WHERE is_test_col = TRUE

  • metric_control: a numeric value that contains the ratio between the two metrics that you are evaluating, in the control dataset for the given segment. These two metrics are specified in the CONTRIBUTION_METRIC option of the contribution analysis model. The metric_control value is calculated as the following:

    SUM(numerator_metric_column_name) / SUM(denominator_metric_column_name) WHERE is_test_col = FALSE

  • metric_test_over_metric_control: a numeric value that contains the ratio between the metric_test value and the metric_control value:

    metric_test / metric_control

  • metric_test_over_complement: a numeric value that contains the ratio between the metric_test value for this segment and the complement metric_test value:

    metric_test / SUM(metric_test for the complement)

    You can use the metric_test_over_complement value to compare the size of this segment to the size of the other segments.

    For example, consider the following table of test data:

    dim1  dim2  dim3  metric_a  metric_b
    1     10    20    50        100
    1     15    30    75        200
    5     20    40    1         10

    Assume that the CONTRIBUTION_METRIC value is SUM(metric_a) / SUM(metric_b). Using the data in the preceding table, the metric_a value for the population is 126, while the metric_b value for the population is 310. The metric_test_over_complement value for the segment in the first row of the table is calculated as the following:

    (50/100)/((75+1)/(200+10)) = .5/(76/210) = 1.38

    This metric_test_over_complement value indicates that the size of this segment is larger than the size of all of the other segments combined. Alternatively, the metric_test_over_complement value for the segment in the third row of the table is calculated as the following:

    (1/10)/((50+75)/(100+200)) = .1/(125/300) = 0.24

    This metric_test_over_complement value indicates that the size of this segment is smaller than the combined size of the rest of the segments.

  • metric_control_over_complement: a numeric value that contains the ratio between the metric_control value for this segment and the complement metric_control value:

    metric_control / SUM(metric_control for the complement)

    You can use the metric_control_over_complement value to compare the size of this segment to the size of the other segments.

  • aumann_shapley_attribution: a numeric value that contains the Aumann-Shapley value for this segment. The Aumann-Shapley value measures the contribution of the segment ratio relative to the population ratio. You can use the Aumann-Shapley value to determine how much a feature contributes to the prediction value. In the context of contribution analysis, BigQuery ML uses the Aumann-Shapley value to measure the attribution of the segment relative to the population. When calculating this measurement, the service considers the segment ratio changes and the complement population changes between the test and control datasets.

  • apriori_support: a numeric value that contains the apriori support value for the segment. The apriori support value is calculated using the numerator column specified in the model's CONTRIBUTION_METRIC option:

    numerator column value for the given segment / SUM(numerator column value for the population)

    If the apriori_support value is less than the apriori support threshold value specified in the model, then the segment is considered too small to be of interest and is excluded by the model.

  • contribution: a numeric value that contains the absolute value of the aumann_shapley_attribution:

    ABS(aumann_shapley_attribution)

Insights are automatically ordered by contribution in descending order to determine the contributors associated with the largest differences in your data between the test and control sets.
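The worked metric_test_over_complement example above can be reproduced with a short Python sketch. It assumes that each row of the example table is its own segment and that all three rows belong to the test dataset. The helper names are hypothetical, and the apriori support helper applies the numerator-column formula to the test rows only, which is an assumption.

```python
# (metric_a, metric_b) pairs from the three rows of the example test table.
TEST_ROWS = [(50, 100), (75, 200), (1, 10)]


def metric_test_over_complement(rows, segment_index):
    """Ratio of one row's metric to the metric of all other rows combined."""
    numerator, denominator = rows[segment_index]
    segment_ratio = numerator / denominator

    # The complement is every row except the segment itself.
    others = [row for i, row in enumerate(rows) if i != segment_index]
    complement_ratio = sum(a for a, _ in others) / sum(b for _, b in others)
    return segment_ratio / complement_ratio


def apriori_support(rows, segment_index):
    """Segment share of the numerator column across the population."""
    return rows[segment_index][0] / sum(a for a, _ in rows)


first_row = metric_test_over_complement(TEST_ROWS, 0)
third_row = metric_test_over_complement(TEST_ROWS, 2)
print(round(first_row, 2), round(third_row, 2))  # 1.38 0.24
```

With the same rows, apriori_support(TEST_ROWS, 0) evaluates 50 / 126, the first row's share of the population metric_a total.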

Output for summable by category metric contribution analysis models

ML.GET_INSIGHTS returns the following output columns for contribution analysis models that use summable by category metrics, in addition to the dimension columns:

  • contributors: an ARRAY<STRING> value that contains the dimension values for a given segment. The other output metrics that are returned in the same row apply to the segment described by these dimensions.
  • metric_test: a numeric value that contains the ratio between the sum of the metric column and the number of distinct values of the count distinct column in the test dataset for a given segment:

    SUM(sum_column_name) / COUNT(DISTINCT categorical_column_name) WHERE is_test_col = TRUE

    The metric and count distinct columns are specified in the CONTRIBUTION_METRIC option of the contribution analysis model.

  • metric_control: a numeric value that contains the ratio between the sum of the metric column and the number of distinct values of the count distinct column in the control dataset for a given segment:

    SUM(sum_column_name) / COUNT(DISTINCT categorical_column_name) WHERE is_test_col = FALSE

    The metric and count distinct columns are specified in the CONTRIBUTION_METRIC option of the contribution analysis model.

  • difference: a numeric value that contains the difference between the metric_test and metric_control values:

    metric_test - metric_control

  • relative_difference: a numeric value that contains the relative change in the segment value between the test and control datasets:

    difference / metric_control

  • metric_test_over_population: a numeric value that contains the ratio between the metric_test value for this segment and the metric_test value for the population:

    metric_test / (metric_test for the population)

    You can use the metric_test_over_population value to compare the size of the segment to the overall size of the test dataset.

  • metric_control_over_population: a numeric value that contains the ratio between the metric_control value for this segment and the metric_control value for the population:

    metric_control / (metric_control for the population)

    You can use the metric_control_over_population value to compare the size of the segment to the overall size of the control dataset.

  • apriori_support: a numeric value that contains the apriori support value for the segment. To calculate apriori support, the sum column is used to compute the segment size relative to the population for both the test and control datasets, and apriori_support is selected as the greater of the two values. The calculation is expressed as the following:

    GREATEST(
      SUM(sum_column_name test) / SUM(sum_column_name test for the population),
      SUM(sum_column_name control) / SUM(sum_column_name control for the population))

    If the apriori_support value is less than the apriori support threshold value specified in the model, then the segment is considered too small to be of interest and is excluded by the model.

  • contribution: a numeric value that contains the absolute value of the difference, calculated as ABS(difference).

Insights are automatically ordered by contribution in descending order to quickly determine the contributors associated with the largest differences in your data between the test and control sets.
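The summable by category formulas above can be sketched in Python over small in-memory row lists. The rows, category labels, and helper names here are hypothetical, and each row is assumed to be a (sum column value, categorical column value) pair.

```python
def category_metric(rows):
    """SUM(sum column) / COUNT(DISTINCT categorical column) over (value, category) rows."""
    total = sum(value for value, _ in rows)
    distinct_categories = len({category for _, category in rows})
    return total / distinct_categories


def column_sum(rows):
    """SUM of the sum column, used for apriori support."""
    return sum(value for value, _ in rows)


# Hypothetical data: the segment rows are a subset of the population rows.
population_test = [(10, "a"), (20, "a"), (30, "b"), (15, "c"), (25, "d")]
segment_test = [(10, "a"), (20, "a")]
population_control = [(5, "a"), (15, "b"), (20, "c"), (40, "d")]
segment_control = [(5, "a"), (15, "b")]

metric_test = category_metric(segment_test)        # 30 / 1 distinct category
metric_control = category_metric(segment_control)  # 20 / 2 distinct categories
difference = metric_test - metric_control
relative_difference = difference / metric_control
metric_test_over_population = metric_test / category_metric(population_test)
metric_control_over_population = metric_control / category_metric(population_control)

# Apriori support uses only the sum column, not the distinct count.
apriori_support = max(
    column_sum(segment_test) / column_sum(population_test),
    column_sum(segment_control) / column_sum(population_control),
)
contribution = abs(difference)
print(metric_test, metric_control, difference)  # 30.0 10.0 20.0
```

Because the denominator counts distinct categories rather than rows, a segment's metric can exceed the population's metric, as the test values in this made-up data do (30 for the segment against 25 for the population).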

What's next

Get data insights from a contribution analysis model.


Last updated 2025-12-15 UTC.