Introduction to Vertex Explainable AI

Machine learning models are often "black boxes"; even their designers cannot explain how or why a model produced a specific inference. Vertex Explainable AI offers Feature-based and Example-based explanations to provide a better understanding of model decision making.

Knowing how a model behaves, and how its training dataset influences the model, gives anyone who builds or uses ML new abilities to improve models, build confidence in their inferences, and understand when and why things go awry.

Example-based explanations

With example-based explanations, Vertex AI uses nearest neighbor search to return a list of examples (typically from the training set) that are most similar to the input. Because we generally expect similar inputs to yield similar inferences, we can use these explanations to explore and explain our model's behavior.

Example-based explanations can be useful in several scenarios:

  • Improve your data or model: One of the core use cases for example-based explanations is helping you understand why your model made certain mistakes in its inferences, and using those insights to improve your data or model. To do so, first select test data that is of interest to you. This selection could be driven by business needs or by heuristics, such as data where the model made the most egregious mistakes.

    For example, suppose we have a model that classifies images as either a bird or a plane and misclassifies the following bird as a plane with high confidence. Use example-based explanations to retrieve similar images from the training set to figure out what is happening.

    Example-based explanation showing a misclassified image of a bird in silhouette and similar images of planes in silhouette from the training data.

    Since all of its explanations are dark silhouettes from the plane class, it's a signal to get more bird silhouettes.

    However, if the explanations were mainly from the bird class, it's a signal that our model can't learn relationships even when the data is rich, and we should consider increasing model complexity (for example, adding more layers).

  • Interpret novel data: Assume, for the moment, that your model was trained to classify birds and planes, but in the real world, the model also encounters images of kites, drones, and helicopters. If your nearest neighbor dataset includes some labeled images of kites, drones, and helicopters, you can use example-based explanations to classify novel images by applying the most frequently occurring label among each image's nearest neighbors (see the sketch after this list). This is possible because we expect the latent representation of kites to be different from that of birds or planes and more similar to the labeled kites in the nearest neighbor dataset.

  • Detect anomalies: Intuitively, if an instance is far away from all of the data in the training set, then it is likely an outlier. Neural networks are known to be overconfident in their mistakes, thus masking their errors. Monitoring your models using example-based explanations helps identify the most serious outliers.

  • Active learning: Example-based explanations can help you identify the instances that might benefit from human labeling. This is particularly useful if labeling is slow or expensive, ensuring that you get the richest possible dataset from limited labeling resources.

    For example, suppose we have a model that classifies a medical patient as having either a cold or the flu. If a patient is classified as having the flu, and all of her example-based explanations are from the flu class, then the doctor can be more confident in the model's inference without having to take a closer look. However, if some of the explanations are from the flu class and others are from the cold class, it would be worthwhile to get a doctor's opinion. This will lead to a dataset where difficult instances have more labels, making it easier for downstream models to learn complex relationships.
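
The following sketch illustrates the majority-vote labeling and distance-based outlier detection described in the list above. It assumes you already have embeddings for a labeled nearest neighbor dataset and for new inputs; the function and variable names are illustrative and are not part of any Vertex AI API.

    # Majority-vote labeling and distance-based outlier flagging over embeddings.
    import numpy as np
    from collections import Counter
    from sklearn.neighbors import NearestNeighbors

    def explain_with_neighbors(neighbor_embeddings, neighbor_labels,
                               query_embeddings, k=5, outlier_threshold=1.0):
        """Return (majority_label, is_outlier) for each query embedding."""
        index = NearestNeighbors(n_neighbors=k).fit(neighbor_embeddings)
        distances, indices = index.kneighbors(query_embeddings)

        results = []
        for dist_row, idx_row in zip(distances, indices):
            labels = [neighbor_labels[i] for i in idx_row]
            # Label a novel input with the most frequent label of its neighbors.
            majority_label = Counter(labels).most_common(1)[0][0]
            # If even the closest neighbor is far away, treat the input as an outlier.
            is_outlier = dist_row.min() > outlier_threshold
            results.append((majority_label, is_outlier))
        return results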

To create a Model that supports example-based explanations, see Configuring example-based explanations.

Supported model types

Any TensorFlow model that can provide an embedding (latent representation) for inputs is supported. Tree-based models, such as decision trees, are not supported. Models from other frameworks, such as PyTorch or XGBoost, are not supported yet.

For deep neural networks, we generally assume that the higher layers (closer to the output layer) have learned something "meaningful", and thus the penultimate layer is often chosen for embeddings. Experiment with a few different layers, investigate the examples you are getting, and choose one based on some quantitative (class match) or qualitative (looks sensible) measures.
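
For instance, the following sketch shows one way to reuse a trained Keras classifier as an embedding model and run nearest neighbor search over the resulting embeddings. It is a minimal illustration under the assumption that the classifier was built with the Keras sequential or functional API; the layer choice and variable names are placeholders.

    # Extract penultimate-layer embeddings from a trained Keras classifier and
    # index them for nearest neighbor search.
    import tensorflow as tf
    from sklearn.neighbors import NearestNeighbors

    def build_embedding_model(classifier: tf.keras.Model) -> tf.keras.Model:
        # Reuse the trained classifier, but output the penultimate layer
        # instead of the final classification layer.
        return tf.keras.Model(inputs=classifier.input,
                              outputs=classifier.layers[-2].output)

    # Usage (assumes `classifier`, `train_images`, and `query_images` exist):
    # embedding_model = build_embedding_model(classifier)
    # train_embeddings = embedding_model.predict(train_images)
    # query_embeddings = embedding_model.predict(query_images)
    # index = NearestNeighbors(n_neighbors=10).fit(train_embeddings)
    # distances, neighbor_ids = index.kneighbors(query_embeddings)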

For a demonstration of how to extract embeddings from a TensorFlow model and perform nearest neighbor search, see the example-based explanation notebook.

Feature-based explanations

Vertex Explainable AI integrates feature attributions into Vertex AI. This section provides a brief conceptual overview of the feature attribution methods available with Vertex AI.

Feature attributions indicate how much each feature in your model contributed to the inferences for each given instance. When you request inferences, you get values as appropriate for your model. When you request explanations, you get the inferences along with feature attribution information.
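
As a rough illustration, the following sketch requests explanations from an already-deployed endpoint by using the Vertex AI SDK for Python. The project, region, endpoint ID, and instance payload are placeholders, and the exact response fields depend on how the model's explanations were configured, so treat this as an outline rather than a definitive recipe.

    # Request inferences plus feature attributions from a deployed endpoint.
    # All resource names and the instance payload below are placeholders.
    from google.cloud import aiplatform

    aiplatform.init(project="my-project", location="us-central1")
    endpoint = aiplatform.Endpoint("1234567890")  # placeholder endpoint ID

    # explain() returns inferences along with attribution information.
    response = endpoint.explain(instances=[{"feature_a": 1.0, "feature_b": 2.0}])

    for explanation in response.explanations:
        for attribution in explanation.attributions:
            # Per-feature attribution scores, relative to the configured baseline.
            print(attribution.feature_attributions)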

Feature attributions work on tabular data, and include built-in visualization capabilities for image data. Consider the following examples:

  • A deep neural network is trained to predict the duration of a bike ride, based on weather data and previous ride sharing data. If you request only inferences from this model, you get predicted durations of bike rides in number of minutes. If you request explanations, you get the predicted bike trip duration, along with an attribution score for each feature in your explanations request. The attribution scores show how much the feature affected the change in inference value, relative to the baseline value that you specify. Choose a meaningful baseline that makes sense for your model; in this case, the median bike ride duration. You can plot the feature attribution scores to see which features contributed most strongly to the resulting inference (see the sketch after these examples):

    A feature attribution chart for one predicted bike ride duration

  • An image classification model is trained to predict whether a given image contains a dog or a cat. If you request inferences from this model on a new set of images, then you receive an inference for each image ("dog" or "cat"). If you request explanations, you get the predicted class along with an overlay for the image, showing which pixels in the image contributed most strongly to the resulting inference:

    A photo of a cat with feature attribution overlay
    A photo of a dog with feature attribution overlay
  • An image classification model is trained to predict the species of a flower in the image. If you request inferences from this model on a new set of images, then you receive an inference for each image ("daisy" or "dandelion"). If you request explanations, you get the predicted class along with an overlay for the image, showing which areas in the image contributed most strongly to the resulting inference:

    A photo of a daisy with feature attribution overlay
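
The sketch below shows one way to plot feature attribution scores for a single inference, such as the bike ride duration example above. The attribution values, feature names, and baseline are made up for illustration; in practice they would come from an explanation response.

    # Plot hypothetical feature attributions for one predicted bike ride duration.
    import matplotlib.pyplot as plt

    # How much each feature moved the prediction away from the baseline
    # prediction (for example, the prediction at median feature values).
    feature_attributions = {
        "temperature": 4.5,
        "precipitation": -7.2,
        "day_of_week": 1.1,
        "distance_km": 9.8,
    }
    baseline_prediction = 18.0   # minutes, at the chosen baseline input
    instance_prediction = 26.2   # minutes, for this specific ride

    # Sanity check: the attributions should approximately account for the
    # difference between the instance prediction and the baseline prediction.
    print(sum(feature_attributions.values()), instance_prediction - baseline_prediction)

    features = list(feature_attributions)
    plt.barh(features, [feature_attributions[f] for f in features])
    plt.xlabel("Attribution (minutes relative to baseline)")
    plt.title("Feature attributions for one predicted bike ride duration")
    plt.tight_layout()
    plt.show()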

Supported model types

Feature attribution is supported for all types of models (both AutoML and custom-trained), frameworks (TensorFlow, scikit-learn, XGBoost), BigQuery ML models, and modalities (images, text, tabular).

To use feature attribution, configure your model for feature attribution when you upload or register the model to the Vertex AI Model Registry.
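
The following sketch shows one possible way to do this with the Vertex AI SDK for Python when registering a custom-trained tabular model with sampled Shapley attributions. The artifact URI, serving container image, input and output names, and path count are placeholders that depend on your model, so take this as an assumption-laden outline rather than a definitive recipe.

    # Register a model together with a feature attribution configuration.
    # All names, URIs, and metadata values below are placeholders.
    from google.cloud import aiplatform
    from google.cloud.aiplatform.explain import ExplanationMetadata, ExplanationParameters

    explanation_metadata = ExplanationMetadata(
        inputs={"features": ExplanationMetadata.InputMetadata()},
        outputs={"duration": ExplanationMetadata.OutputMetadata()},
    )
    explanation_parameters = ExplanationParameters(
        {"sampled_shapley_attribution": {"path_count": 10}}
    )

    model = aiplatform.Model.upload(
        display_name="bike-duration-model",
        artifact_uri="gs://my-bucket/model/",
        serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest",
        explanation_metadata=explanation_metadata,
        explanation_parameters=explanation_parameters,
    )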

Additionally, for the following types of AutoML models, feature attribution is integrated into the Google Cloud console:

  • AutoML image models (classification models only)
  • AutoML tabular models (classification and regression models only)

For AutoML model types that are integrated, you can enable feature attribution in the Google Cloud console during training and see model feature importance for the model overall, and local feature importance for both online and batch inferences.

For AutoML model types that are not integrated, you can still enable feature attribution by exporting the model artifacts and configuring feature attribution when you upload the model artifacts to the Vertex AI Model Registry.

Advantages

If you inspect specific instances, and also aggregate feature attributions across your training dataset, you can get deeper insight into how your model works. Consider the following advantages:

  • Debugging models: Feature attributions can help detect issues in the data that standard model evaluation techniques would usually miss.

  • Optimizing models: You can identify and remove features that are less important, which can result in more efficient models.

Feature attribution methods

Each feature attribution method is based on Shapley values, a cooperative game theory algorithm that assigns credit to each player in a game for a particular outcome. Applied to machine learning models, this means that each model feature is treated as a "player" in the game. Vertex Explainable AI assigns proportional credit to each feature for the outcome of a particular inference.

Sampled Shapley method

The sampled Shapley method provides a sampling approximation of exact Shapley values. AutoML tabular models use the sampled Shapley method for feature importance. Sampled Shapley works well for these models, which are meta-ensembles of trees and neural networks.
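
The sketch below illustrates the sampling idea behind sampled Shapley values, not Vertex AI's implementation: each feature's marginal contribution is averaged over random feature permutations, with "absent" features replaced by a baseline value. The prediction function, instance, and baseline are assumptions supplied by the caller.

    # Approximate Shapley values by sampling random feature permutations.
    import numpy as np

    def sampled_shapley(predict_fn, instance, baseline, num_permutations=100, rng=None):
        """Approximate Shapley values for a single instance.

        predict_fn: maps a 2D array of instances to a 1D array of predictions.
        instance, baseline: 1D arrays with one value per feature.
        """
        rng = rng or np.random.default_rng(0)
        num_features = len(instance)
        attributions = np.zeros(num_features)

        for _ in range(num_permutations):
            permutation = rng.permutation(num_features)
            current = baseline.copy()
            previous_value = predict_fn(current[None, :])[0]
            # Add features one at a time in the sampled order; the change in
            # the prediction is that feature's marginal contribution.
            for feature in permutation:
                current[feature] = instance[feature]
                new_value = predict_fn(current[None, :])[0]
                attributions[feature] += new_value - previous_value
                previous_value = new_value

        return attributions / num_permutations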

For in-depth information about how the sampled Shapley method works, read the paper Bounding the Estimation Error of Sampling-based Shapley Value Approximation.

Integrated gradients method

In the integrated gradients method, the gradient of the inference output is calculated with respect to the features of the input, along an integral path.

  1. The gradients are calculated at different intervals of a scaling parameter. The size of each interval is determined by using the Gaussian quadrature rule. (For image data, imagine this scaling parameter as a "slider" that is scaling all pixels of the image to black.)
  2. The gradients are integrated as follows (see the sketch after these steps):
    1. The integral is approximated by using a weighted average.
    2. The element-wise product of the averaged gradients and the original input is calculated.
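
The following sketch shows the overall shape of integrated gradients for an image model in TensorFlow. For simplicity it uses evenly spaced steps and a plain average of the gradients rather than the Gaussian quadrature rule described above; the model, target class index, and tensor shapes are assumptions.

    # A simplified integrated gradients computation for one image-shaped input.
    import tensorflow as tf

    def integrated_gradients(model, instance, baseline, target_index, num_steps=50):
        """Attributions for one (H, W, C) float32 input toward one output class."""
        # Scale the input from the baseline to the instance along a straight path.
        alphas = tf.linspace(0.0, 1.0, num_steps + 1)
        interpolated = baseline[None, ...] + (
            alphas[:, None, None, None] * (instance - baseline)[None, ...]
        )

        with tf.GradientTape() as tape:
            tape.watch(interpolated)
            outputs = model(interpolated)[:, target_index]
        gradients = tape.gradient(outputs, interpolated)

        # Approximate the path integral with an average of the gradients, then
        # take the element-wise product with (instance - baseline).
        avg_gradients = tf.reduce_mean(gradients, axis=0)
        return (instance - baseline) * avg_gradients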

For an intuitive explanation of this process as applied to images, refer to the blog post "Attributing a deep network's inference to its input features". The authors of the original paper about integrated gradients (Axiomatic Attribution for Deep Networks) show in that blog post what the images look like at each step of the process.

XRAI method

The XRAI method combines the integrated gradients method with additional steps to determine which regions of the image contribute the most to a given class inference.

  1. Pixel-level attribution: XRAI performs pixel-level attribution for the input image. In this step, XRAI uses the integrated gradients method with a black baseline and a white baseline.
  2. Oversegmentation: Independently of pixel-level attribution, XRAI oversegments the image to create a patchwork of small regions. XRAI uses Felzenszwalb's graph-based method to create the image segments.
  3. Region selection: XRAI aggregates the pixel-level attribution within each segment to determine its attribution density. Using these values, XRAI ranks each segment and then orders the segments from most to least positive. This determines which areas of the image are most salient, or contribute most strongly to a given class inference (see the sketch below).

Images that show the steps of the XRAI algorithm
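
The following sketch is a simplified illustration of the oversegmentation and region-ranking steps, using scikit-image's Felzenszwalb segmentation. It assumes you already have a 2D array of per-pixel attributions (for example, from integrated gradients) and is not Vertex AI's implementation.

    # Rank image regions by their attribution density.
    import numpy as np
    from skimage.segmentation import felzenszwalb

    def rank_regions(image, pixel_attributions, scale=50):
        """Return segment labels ordered from most to least positive attribution."""
        # Oversegment the image into a patchwork of small regions.
        segments = felzenszwalb(image, scale=scale)

        # Aggregate pixel-level attribution within each segment to get its
        # attribution density (mean attribution per pixel).
        densities = {}
        for segment_id in np.unique(segments):
            mask = segments == segment_id
            densities[segment_id] = pixel_attributions[mask].mean()

        ranked = sorted(densities, key=densities.get, reverse=True)
        return segments, ranked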

Compare feature attribution methods

Vertex Explainable AI offers three methods to use for feature attributions: sampled Shapley, integrated gradients, and XRAI.

  • Sampled Shapley
    Basic explanation: Assigns credit for the outcome to each feature, and considers different permutations of the features. This method provides a sampling approximation of exact Shapley values.
    Recommended model types: Non-differentiable models, such as ensembles of trees and neural networks.
    Example use cases: Classification and regression on tabular data.
    Compatible Vertex AI Model resources: Any custom-trained model (running in any inference container); AutoML tabular models.

  • Integrated gradients
    Basic explanation: A gradients-based method to efficiently compute feature attributions with the same axiomatic properties as the Shapley value.
    Recommended model types: Differentiable models, such as neural networks. Recommended especially for models with large feature spaces and for low-contrast images, such as X-rays.
    Example use cases: Classification and regression on tabular data; classification on image data.

  • XRAI (eXplanation with Ranked Area Integrals)
    Basic explanation: Based on the integrated gradients method, XRAI assesses overlapping regions of the image to create a saliency map, which highlights relevant regions of the image rather than pixels.
    Recommended model types: Models that accept image inputs. Recommended especially for natural images, which are any real-world scenes that contain multiple objects.
    Example use cases: Classification on image data.

Differentiable and non-differentiable models

Note: This section only applies to custom-trained TensorFlow models that use a TensorFlow prebuilt container to serve inferences.

In differentiable models, you can calculate the derivative of all the operations in your TensorFlow graph. This property helps to make backpropagation possible in such models. For example, neural networks are differentiable. To get feature attributions for differentiable models, use the integrated gradients method.

The integrated gradients method does not work for non-differentiable models. Learn more about encoding non-differentiable inputs to work with the integrated gradients method.

Non-differentiable models include non-differentiable operations in the TensorFlow graph, such as operations that perform decoding and rounding tasks. For example, a model built as an ensemble of trees and neural networks is non-differentiable. To get feature attributions for non-differentiable models, use the sampled Shapley method. Sampled Shapley also works on differentiable models, but in that case, it is more computationally expensive than necessary.

Conceptual limitations

Consider the following limitations of feature attributions:

  • Feature attributions, including local feature importance for AutoML, are specific to individual inferences. Inspecting the feature attributions for an individual inference may provide good insight, but the insight may not be generalizable to the entire class for that individual instance, or to the entire model.

    To get more generalizable insight for AutoML models, refer to the model feature importance. To get more generalizable insight for other models, aggregate attributions over subsets of your dataset, or over the entire dataset (see the sketch after this list).

  • Although feature attributions can help with model debugging, they don't always indicate clearly whether an issue arises from the model or from the data that the model is trained on. Use your best judgment, and diagnose common data issues to narrow the space of potential causes.

  • Feature attributions are subject to similar adversarial attacks as inferences in complex models.
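
The sketch below shows one simple way to aggregate per-instance feature attributions into a dataset-level view, as suggested in the list above. It assumes the attributions have already been collected into a list of dictionaries mapping feature names to attribution values; the aggregation is a plain mean of absolute values.

    # Aggregate per-instance feature attributions across a dataset subset.
    import pandas as pd

    def aggregate_attributions(attributions):
        """Rank features by mean absolute attribution across instances."""
        frame = pd.DataFrame(attributions)
        return frame.abs().mean().sort_values(ascending=False)

    # Usage:
    # summary = aggregate_attributions(per_instance_attributions)
    # print(summary)  # global ranking of feature influence for this subset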

For more information about limitations, refer to the high-level limitations list.

References

For feature attribution, the implementations of sampled Shapley, integrated gradients, and XRAI are based on the following references, respectively:

  • Bounding the Estimation Error of Sampling-based Shapley Value Approximation
  • Axiomatic Attribution for Deep Networks
  • XRAI: Better Attributions Through Regions

Notebooks

To get started using Vertex Explainable AI, use these notebooks:

  • Example-based explanations (TensorFlow, image): Train a classification model that predicts the class of the provided input image and get online explanations. (GitHub link)
  • Feature-based (AutoML, tabular): Train a binary classification model that predicts whether a bank customer purchased a term deposit and get batch explanations. (GitHub link)
  • Feature-based (AutoML, tabular): Train a classification model that predicts the type of Iris flower species and get online explanations. (GitHub link)
  • Feature-based, sampled Shapley (scikit-learn, tabular): Train a linear regression model that predicts taxi fares and get online explanations. (GitHub link)
  • Feature-based, integrated gradients (TensorFlow, image): Train a classification model that predicts the class of the provided input image and get batch explanations. (GitHub link)
  • Feature-based, integrated gradients (TensorFlow, image): Train a classification model that predicts the class of the provided input image and get online explanations. (GitHub link)
  • Feature-based, integrated gradients (TensorFlow, tabular): Train a regression model that predicts the median price of a house and get batch explanations. (GitHub link)
  • Feature-based, integrated gradients (TensorFlow, tabular): Train a regression model that predicts the median price of a house and get online explanations. (GitHub link)
  • Feature-based, sampled Shapley (TensorFlow, text): Train an LSTM model that classifies movie reviews as positive or negative using the text of the review and get online explanations. (GitHub link)

