Introduction to Vertex Explainable AI
Machine learning models are often "black boxes"; even their designers cannot explain how or why a model produced a specific inference. Vertex Explainable AI offers feature-based and example-based explanations to provide a better understanding of model decision making.
Knowing how a model behaves, and how its training dataset influences the model, gives anyone who builds or uses ML new abilities to improve models, build confidence in their inferences, and understand when and why things go awry.
Example-based explanations
With example-based explanations, Vertex AI uses nearest neighbor search to return a list of examples (typically from the training set) that are most similar to the input. Because we generally expect similar inputs to yield similar inferences, we can use these explanations to explore and explain our model's behavior.
Example-based explanations can be useful in several scenarios:
Improve your data or model: One of the core use cases for example-based explanations is helping you understand why your model made certain mistakes in its inferences, and using those insights to improve your data or model. To do so, first select test data that is of interest to you. This could be driven either by business needs or by heuristics, such as data where the model made the most egregious mistakes.
For example, suppose we have a model that classifies images as either a bird or a plane and misclassifies the following bird as a plane with high confidence. Use example-based explanations to retrieve similar images from the training set to figure out what is happening.

Since all of its explanations are dark silhouettes from the plane class, it's a signal to get more bird silhouettes.
However, if the explanations were mainly from the bird class, it's a signal that our model can't learn relationships even when the data is rich, and we should consider increasing model complexity (for example, adding more layers).
Interpret novel data: Assume, for the moment, that your model was trained to classify birds and planes, but in the real world, the model also encounters images of kites, drones, and helicopters. If your nearest neighbor dataset includes some labeled images of kites, drones, and helicopters, you can use example-based explanations to classify novel images by applying the most frequently occurring label of their nearest neighbors, as sketched after this list of scenarios. This is possible because we expect the latent representation of kites to be different from that of birds or planes and more similar to the labeled kites in the nearest neighbor dataset.
Detect anomalies: Intuitively, if an instance is far away from all of the data in the training set, then it is likely an outlier. Neural networks are known to be overconfident in their mistakes, thus masking their errors. Monitoring your models using example-based explanations helps identify the most serious outliers.
Active learning: Example-based explanations can help you identify the instances that might benefit from human labeling. This is particularly useful if labeling is slow or expensive, ensuring that you get the richest possible dataset from limited labeling resources.
For example, suppose we have a model that classifies a medical patient as either having a cold or a flu. If a patient is classified as having the flu, and all of her example-based explanations are from the flu class, then the doctor can be more confident in the model's inference without having to take a closer look. However, if some of the explanations are from the flu class and some others are from the cold class, it would be worthwhile to get a doctor's opinion. This will lead to a dataset where difficult instances have more labels, making it easier for downstream models to learn complex relationships.
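To make the "Interpret novel data" idea concrete, here is a minimal sketch of labeling by majority vote over nearest neighbors. It assumes you already have the labels of the neighbors returned for a novel instance; the helper function and label values are hypothetical.

```python
from collections import Counter

def label_from_neighbors(neighbor_labels):
    """Return the most frequent label among an instance's nearest neighbors.

    `neighbor_labels` is assumed to come from the labeled nearest-neighbor
    dataset described above (hypothetical input for illustration).
    """
    most_common_label, _ = Counter(neighbor_labels).most_common(1)[0]
    return most_common_label

# A novel image whose neighbors are mostly labeled "kite" gets the label "kite".
print(label_from_neighbors(["kite", "kite", "drone", "kite", "helicopter"]))
```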
To create a Model that supports example-based explanations, see Configuring example-based explanations.
Supported model types
Any TensorFlow model that can provide an embedding (latent representation) forinputs is supported. Tree-based models, such as decision trees, are notsupported. Models from other frameworks, such as PyTorch or XGBoost, are notsupported yet.
For deep neural networks, we generally assume that the higher layers (closer to the output layer) have learned something "meaningful", and thus, the penultimate layer is often chosen for embeddings. Experiment with a few different layers, investigate the examples you are getting, and choose one based on some quantitative (class match) or qualitative (looks sensible) measures.
For a demonstration of how to extract embeddings from a TensorFlow model and perform nearest neighbor search, see the example-based explanation notebook.
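As a rough sketch of the same idea (not the notebook's code), the snippet below builds an embedding model from the penultimate layer of a hypothetical Keras classifier and runs a brute-force cosine-similarity nearest neighbor search; the architecture and data are random stand-ins.

```python
import numpy as np
import tensorflow as tf

# Stand-in for a trained classifier; substitute your own model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),    # penultimate layer
    tf.keras.layers.Dense(2, activation="softmax"),  # output layer
])

# Reuse everything up to the penultimate layer as the embedding model.
embedding_model = tf.keras.Model(model.input, model.layers[-2].output)

# Hypothetical data standing in for the training set and a batch of queries.
train_images = np.random.rand(100, 28, 28, 1).astype("float32")
query_images = np.random.rand(3, 28, 28, 1).astype("float32")

train_emb = embedding_model.predict(train_images)  # shape (100, 64)
query_emb = embedding_model.predict(query_images)  # shape (3, 64)

# Brute-force nearest neighbor search with cosine similarity.
def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

similarity = l2_normalize(query_emb) @ l2_normalize(train_emb).T  # (3, 100)
nearest = np.argsort(-similarity, axis=1)[:, :5]  # 5 most similar training examples per query
```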
Feature-based explanations
Vertex Explainable AI integrates feature attributions into Vertex AI. This section provides a brief conceptual overview of the feature attribution methods available with Vertex AI.
Feature attributions indicate how much each feature in your model contributed to the inferences for each given instance. When you request inferences, you get values as appropriate for your model. When you request explanations, you get the inferences along with feature attribution information.
Feature attributions work on tabular data, and include built-in visualization capabilities for image data. Consider the following examples:
A deep neural network is trained to predict the duration of a bike ride, based on weather data and previous ride sharing data. If you request only inferences from this model, you get predicted durations of bike rides in number of minutes. If you request explanations, you get the predicted bike trip duration, along with an attribution score for each feature in your explanations request. The attribution scores show how much the feature affected the change in inference value, relative to the baseline value that you specify. Choose a meaningful baseline that makes sense for your model; in this case, the median bike ride duration. You can plot the feature attribution scores to see which features contributed most strongly to the resulting inference:
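For instance, here is a minimal sketch of such a plot. The feature names and attribution values are hypothetical; in practice you would read them from the explanation response returned for your model, and matplotlib is assumed to be available.

```python
import matplotlib.pyplot as plt

# Hypothetical attribution scores for one bike-duration inference, expressed
# in minutes relative to the chosen baseline (the median ride duration).
feature_attributions = {
    "distance_km": 9.8,
    "precipitation": -6.1,
    "temperature": 4.2,
    "wind_speed": -2.4,
    "day_of_week": 1.3,
}

# Sort by absolute contribution so the strongest drivers appear first.
features, scores = zip(
    *sorted(feature_attributions.items(), key=lambda kv: abs(kv[1]), reverse=True)
)

plt.barh(features, scores)
plt.axvline(0, color="black", linewidth=0.8)
plt.xlabel("Attribution (minutes relative to baseline)")
plt.title("Feature attributions for one bike-duration inference")
plt.gca().invert_yaxis()  # largest contribution on top
plt.tight_layout()
plt.show()
```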

An image classification model is trained to predict whether a given image contains a dog or a cat. If you request inferences from this model on a new set of images, then you receive an inference for each image ("dog" or "cat"). If you request explanations, you get the predicted class along with an overlay for the image, showing which pixels in the image contributed most strongly to the resulting inference:

A photo of a cat with feature attribution overlay 
A photo of a dog with feature attribution overlay

An image classification model is trained to predict the species of a flower in the image. If you request inferences from this model on a new set of images, then you receive an inference for each image ("daisy" or "dandelion"). If you request explanations, you get the predicted class along with an overlay for the image, showing which areas in the image contributed most strongly to the resulting inference:

A photo of a daisy with feature attribution overlay
Supported model types
Feature attribution is supported for all types of models (both AutoML and custom-trained), frameworks (TensorFlow, scikit-learn, XGBoost), BigQuery ML models, and modalities (images, text, tabular).
To use feature attribution, configure your model for feature attribution when you upload or register the model to the Vertex AI Model Registry.
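As a rough sketch of what that configuration can look like with the Vertex AI Python SDK (the project, bucket, container image, and input/output names below are hypothetical placeholders, and the exact explanation metadata depends on your model's signature):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # hypothetical values

# Explanation metadata maps the model's inputs and outputs to the names that
# appear in the returned attributions (the keys below are placeholders).
explanation_metadata = aiplatform.explain.ExplanationMetadata(
    inputs={"features": {}},
    outputs={"duration": {}},
)

# Choose an attribution method; sampled Shapley is shown here.
explanation_parameters = aiplatform.explain.ExplanationParameters(
    {"sampled_shapley_attribution": {"path_count": 10}}
)

model = aiplatform.Model.upload(
    display_name="bike-duration-model",    # hypothetical
    artifact_uri="gs://my-bucket/model/",  # hypothetical
    serving_container_image_uri="us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest",  # example container
    explanation_metadata=explanation_metadata,
    explanation_parameters=explanation_parameters,
)
```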
Additionally, for the following types of AutoML models, feature attribution is integrated into the Google Cloud console:
- AutoML image models (classification models only)
- AutoML tabular models (classification and regression models only)
For AutoML model types that are integrated, you can enable feature attribution in the Google Cloud console during training and see model feature importance for the model overall, and local feature importance for both online and batch inferences.
For AutoML model types that are not integrated, you can still enable feature attribution by exporting the model artifacts and configuring feature attribution when you upload the model artifacts to the Vertex AI Model Registry.
Advantages
If you inspect specific instances, and also aggregate feature attributions across your training dataset, you can get deeper insight into how your model works. Consider the following advantages:
Debugging models: Feature attributions can help detect issues in the data that standard model evaluation techniques would usually miss.
Optimizing models: You can identify and remove features that are less important, which can result in more efficient models.
Feature attribution methods
Each feature attribution method is based on Shapley values, a cooperative game theory algorithm that assigns credit to each player in a game for a particular outcome. Applied to machine learning models, this means that each model feature is treated as a "player" in the game. Vertex Explainable AI assigns proportional credit to each feature for the outcome of a particular inference.
Sampled Shapley method
The sampled Shapley method provides a sampling approximation of exact Shapley values. AutoML tabular models use the sampled Shapley method for feature importance. Sampled Shapley works well for these models, which are meta-ensembles of trees and neural networks.
For in-depth information about how the sampled Shapley method works, read the paper Bounding the Estimation Error of Sampling-based Shapley Value Approximation.
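As a framework-agnostic illustration of the underlying idea (not Vertex AI's implementation), the following sketch approximates Shapley values by averaging marginal contributions over randomly sampled feature permutations; `model_fn`, the instance, and the baseline are hypothetical.

```python
import numpy as np

def sampled_shapley(model_fn, instance, baseline, num_samples=50, rng=None):
    """Approximate Shapley values by averaging each feature's marginal
    contribution over randomly sampled feature permutations."""
    rng = rng or np.random.default_rng(0)
    n_features = len(instance)
    attributions = np.zeros(n_features)

    for _ in range(num_samples):
        permutation = rng.permutation(n_features)
        current = baseline.copy()
        prev_output = model_fn(current)
        for feature in permutation:
            current[feature] = instance[feature]   # add this feature to the coalition
            new_output = model_fn(current)
            attributions[feature] += new_output - prev_output  # marginal contribution
            prev_output = new_output

    return attributions / num_samples

# Toy example: a model that just sums weighted features.
weights = np.array([1.0, -2.0, 0.5])
model_fn = lambda x: float(weights @ x)
instance = np.array([3.0, 1.0, 4.0])
baseline = np.zeros(3)
print(sampled_shapley(model_fn, instance, baseline))  # approximately [3.0, -2.0, 2.0]
```

For a linear model like this toy example, the estimate matches the exact Shapley values, which are simply each weight times the feature's difference from the baseline.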
Integrated gradients method
In the integrated gradients method, the gradient of the inference output is calculated with respect to the features of the input, along an integral path.
- The gradients are calculated at different intervals of a scaling parameter. The size of each interval is determined by using the Gaussian quadrature rule. (For image data, imagine this scaling parameter as a "slider" that is scaling all pixels of the image to black.)
- The gradients are integrated as follows:
- The integral is approximated by using a weighted average.
- The element-wise product of the averaged gradients and the original input is calculated.
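Putting those steps together, here is a minimal TensorFlow sketch of the recipe for a single image. It uses a plain trapezoidal average over evenly spaced steps rather than the Gaussian quadrature rule that Vertex AI uses, and the model and image are random stand-ins.

```python
import tensorflow as tf

def integrated_gradients(model, image, baseline, steps=50):
    """Trapezoidal-average sketch of integrated gradients for one image."""
    # Pick the class whose score we attribute: the model's top class for the input.
    target_class = int(tf.argmax(model(image[tf.newaxis]), axis=-1)[0])

    # Scaling parameter: interpolate from the baseline (all black) to the input.
    alphas = tf.linspace(0.0, 1.0, steps + 1)[:, tf.newaxis, tf.newaxis, tf.newaxis]
    interpolated = baseline + alphas * (image - baseline)  # (steps + 1, H, W, C)

    # Gradients of the target-class score with respect to each interpolated image.
    with tf.GradientTape() as tape:
        tape.watch(interpolated)
        scores = model(interpolated)[:, target_class]
    grads = tape.gradient(scores, interpolated)

    # Approximate the integral with a weighted (trapezoidal) average of the gradients ...
    avg_grads = tf.reduce_mean((grads[:-1] + grads[1:]) / 2.0, axis=0)
    # ... then take the element-wise product with (input - baseline).
    return (image - baseline) * avg_grads

# Hypothetical usage with a toy model and a random "image".
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(8, 8, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(2, activation="softmax"),
])
image = tf.random.uniform((8, 8, 3))
baseline = tf.zeros_like(image)  # black baseline
attributions = integrated_gradients(model, image, baseline)  # shape (8, 8, 3)
```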
For an intuitive explanation of this process as applied to images, refer to the blog post "Attributing a deep network's inference to its input features". The authors of the original paper about integrated gradients (Axiomatic Attribution for Deep Networks) show in that blog post what the images look like at each step of the process.
XRAI method
The XRAI method combines the integrated gradients method with additional steps to determine which regions of the image contribute the most to a given class inference.
- Pixel-level attribution: XRAI performs pixel-level attribution for the input image. In this step, XRAI uses the integrated gradients method with a black baseline and a white baseline.
- Oversegmentation: Independently of pixel-level attribution, XRAI oversegments the image to create a patchwork of small regions. XRAI uses Felzenszwalb's graph-based method to create the image segments.
- Region selection: XRAI aggregates the pixel-level attribution within each segment to determine its attribution density. Using these values, XRAI ranks each segment and then orders the segments from most to least positive. This determines which areas of the image are most salient, or contribute most strongly to a given class inference.
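The sketch below illustrates the oversegmentation and region-ranking steps under simplified assumptions: it ranks Felzenszwalb segments (via scikit-image) by mean pixel attribution rather than performing XRAI's full saliency-region selection, and the image and attributions are random stand-ins (you could feed in the output of the integrated gradients sketch above).

```python
import numpy as np
from skimage.segmentation import felzenszwalb

def rank_regions(image, pixel_attributions, scale=100, sigma=0.5, min_size=20):
    """Rank Felzenszwalb segments by the density of their pixel attributions."""
    # Oversegment the image into small regions (independent of the attributions).
    segments = felzenszwalb(image, scale=scale, sigma=sigma, min_size=min_size)

    # Collapse channel-wise attributions to one value per pixel.
    per_pixel = pixel_attributions.sum(axis=-1)

    # Attribution density per segment: mean attribution over its pixels.
    ranked = []
    for segment_id in np.unique(segments):
        mask = segments == segment_id
        ranked.append((segment_id, per_pixel[mask].mean()))

    # Most positive (most salient) regions first.
    return sorted(ranked, key=lambda item: item[1], reverse=True)

# Hypothetical usage with random stand-ins for an image and its attributions.
image = np.random.rand(64, 64, 3)
pixel_attributions = np.random.randn(64, 64, 3)
print(rank_regions(image, pixel_attributions)[:5])
```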

Compare feature attribution methods
Vertex Explainable AI offers three methods to use for feature attributions: sampled Shapley, integrated gradients, and XRAI.
| Method | Basic explanation | Recommended model types | Example use cases | Compatible Vertex AI Model resources |
|---|---|---|---|---|
| Sampled Shapley | Assigns credit for the outcome to each feature, and considers different permutations of the features. This method provides a sampling approximation of exact Shapley values. | Non-differentiable models, such as ensembles of trees and neural networks | | |
| Integrated gradients | A gradients-based method to efficiently compute feature attributions with the same axiomatic properties as the Shapley value. | Differentiable models, such as neural networks. Recommended especially for models with large feature spaces. Recommended for low-contrast images, such as X-rays. | | |
| XRAI (eXplanation with Ranked Area Integrals) | Based on the integrated gradients method, XRAI assesses overlapping regions of the image to create a saliency map, which highlights relevant regions of the image rather than pixels. | Models that accept image inputs. Recommended especially for natural images, which are any real-world scenes that contain multiple objects. | | |
Differentiable and non-differentiable models
Note: This section only applies to custom-trained TensorFlow models that use a TensorFlow prebuilt container to serve inferences.

In differentiable models, you can calculate the derivative of all the operations in your TensorFlow graph. This property helps to make backpropagation possible in such models. For example, neural networks are differentiable. To get feature attributions for differentiable models, use the integrated gradients method.
The integrated gradients method does not work for non-differentiable models. Learn more about encoding non-differentiable inputs to work with the integrated gradients method.
Non-differentiable models include non-differentiable operations in the TensorFlow graph, such as operations that perform decoding and rounding tasks. For example, a model built as an ensemble of trees and neural networks is non-differentiable. To get feature attributions for non-differentiable models, use the sampled Shapley method. Sampled Shapley also works on differentiable models, but in that case, it is more computationally expensive than necessary.
Conceptual limitations
Consider the following limitations of feature attributions:
Feature attributions, including local feature importance for AutoML, are specific to individual inferences. Inspecting the feature attributions for an individual inference may provide good insight, but the insight may not be generalizable to the entire class for that individual instance, or to the entire model.
To get more generalizable insight for AutoML models, refer to the model feature importance. To get more generalizable insight for other models, aggregate attributions over subsets of your dataset, or over the entire dataset.
Although feature attributions can help with model debugging, they don't always indicate clearly whether an issue arises from the model or from the data that the model is trained on. Use your best judgment, and diagnose common data issues to narrow the space of potential causes.
Feature attributions are subject to similar adversarial attacks as inferences in complex models.
For more information about limitations, refer to the high-level limitations list.
References
For feature attribution, the implementations of sampled Shapley, integrated gradients, and XRAI are based on the following references, respectively:
- Bounding the Estimation Error of Sampling-based Shapley ValueApproximation
- Axiomatic Attribution for Deep Networks
- XRAI: Better Attributions Through Regions
Notebooks
To get started using Vertex Explainable AI, use these notebooks:
| Notebook | Explainability method | ML framework | Modality | Task |
|---|---|---|---|---|
| GitHub link | example-based explanations | TensorFlow | image | Train a classification model that predicts the class of the provided input image and get online explanations |
| GitHub link | feature-based | AutoML | tabular | Train a binary classification model that predicts whether a bank customer purchased a term deposit and get batch explanations |
| GitHub link | feature-based | AutoML | tabular | Train a classification model that predicts the type of Iris flower species and get online explanations |
| GitHub link | feature-based (sampled Shapley) | scikit-learn | tabular | Train a linear regression model that predicts taxi fares and get online explanations |
| GitHub link | feature-based (integrated gradients) | TensorFlow | image | Train a classification model that predicts the class of the provided input image and get batch explanations |
| GitHub link | feature-based (integrated gradients) | TensorFlow | image | Train a classification model that predicts the class of the provided input image and get online explanations |
| GitHub link | feature-based (integrated gradients) | TensorFlow | tabular | Train a regression model that predicts the median price of a house and get batch explanations |
| GitHub link | feature-based (integrated gradients) | TensorFlow | tabular | Train a regression model that predicts the median price of a house and get online explanations |
| GitHub link | feature-based (sampled Shapley) | TensorFlow | text | Train an LSTM model that classifies movie reviews as positive or negative using the text of the review and get online explanations |
Educational resources
The following resources provide further useful educational material:
- Explainable AI for Practitioners
- Interpretable Machine Learning: Shapley values
- Ankur Taly's Integrated Gradients GitHub repository
- Introduction to Shapley values
What's next
- Configure your model for feature-based explanations
- Configure your model for example-based explanations
- View feature importance for AutoML tabular models