AI and ML perspective: Security

Last reviewed 2025-11-26 UTC

This document in theWell-Architected Framework: AI and ML perspective provides an overview of principles and recommendations to ensure that your AIand ML deployments meet the security and compliance requirements of yourorganization. The recommendations in this document align with thesecurity pillar of the Google Cloud Well-Architected Framework.

Secure deployment of AI and ML workloads is a critical requirement, particularlyin enterprise environments. To meet this requirement, you need to adopt aholistic security approach that starts from the initial conceptualization ofyour AI and ML solutions and extends to development, deployment, and ongoingoperations. Google Cloud offers robust tools and services that are designed tohelp secure your AI and ML workloads.

The recommendations in this document are mapped to the following coreprinciples:

For more information about AI security, you can also review the followingresources:

  • Google Cloud's Secure AI Framework (SAIF) provides a comprehensive guide for building secure and responsible AIsystems. It outlines key principles and best practices for addressingsecurity and compliance considerations throughout the AI lifecycle.
  • To learn more about Google Cloud's approach to trust in AI, seeourcompliance resource center.

Define clear goals and requirements

Effective AI and ML security is a core component of your overarching businessstrategy. It's easier to integrate the required security and compliance controlsearly in your design and development process, instead of adding controls afterdevelopment.

From the start of your design and development process, make decisions that areappropriate for your specific risk environment and your specific businesspriorities. For example, overly restrictive security measures might protect databut also impede innovation and slow down development cycles. However, a lack ofsecurity can lead to data breaches, reputational damage, and financial losses,which are detrimental to business goals.

To define clear goals and requirements, consider the followingrecommendations.

Align AI and ML security with business goals

To align your AI and ML security efforts with your business goals, use astrategic approach that integrates security into every stage of the AIlifecycle. To follow this approach, do the following:

  1. Define clear business objectives and security requirements:

    • Identify key business goals: Define clear business objectives thatyour AI and ML initiatives are designed to achieve. For example, yourobjectives might be to improve customer experience, optimize operations, ordevelop new products.
    • Translate goals into security requirements: When you clarify yourbusiness goals, define specific security requirements to support thosegoals. For example, your goal might be to use AI to personalize customerrecommendations. To support that goal, your security requirements might beto protect customer data privacy and prevent unauthorized access torecommendation algorithms.
  2. Balance security with business needs:

    • Conduct risk assessments: Identify potential security threats andvulnerabilities in your AI systems.
    • Prioritize security measures: Base the priority of these securitymeasures upon their potential impact on your business goals.
    • Analyze the costs and benefits: Ensure that you invest in the mosteffective solutions. Consider the costs and benefits of different securitymeasures.
    • Shift left on security: Implement security best practices early inthe design phase, and adapt your safety measures as business needs changeand threats emerge.

Identify potential attack vectors and risks

Consider potential attack vectors that could affect your AI systems, such asdata poisoning, model inversion, or adversarial attacks. Continuously monitorand assess the evolving attack surface as your AI system develops, and keeptrack of new threats and vulnerabilities. Remember that changes in your AIsystems can also introduce changes to their attack surface.

To mitigate potential legal and reputational risks, you also need to addresscompliance requirements related to data privacy, algorithmic bias, and otherrelevant regulations.

To anticipate potential threats and vulnerabilities early and make designchoices that mitigate risks, adopt asecure by design approach.

Google Cloud provides a comprehensive suite of tools and services to helpyou implement a secure by design approach:

  • Cloud posture management:Use Security Command Center to identify potential vulnerabilities andmisconfigurations in your AI infrastructure.
  • Attack exposure scores and attack paths:Refine and use the attack exposure scores and attack paths thatSecurity Command Center generates.
  • Google Threat Intelligence:Stay informed about new threats and attack techniques that emerge to targetAI systems.
  • Logging and Monitoring:Track the performance and security of your AI systems, and detect anyanomalies or suspicious activities. Conduct regular security audits toidentify and address potential vulnerabilities in your AI infrastructureand models.
  • Vulnerability management:Implement a vulnerability management process to track and remediatesecurity vulnerabilities in your AI systems.

For more information, seeSecure by Design at Google andImplement security by design.

Keep data secure and prevent loss or mishandling

Data is a valuable and sensitive asset that must be kept secure. Data securityhelps you to maintain user trust, support your business objectives, and meetyour compliance requirements.

To help keep your data secure, consider the following recommendations.

Adhere to data minimization principles

To ensure data privacy, adhere to the principle of data minimization. Tominimize data, don't collect, keep, or use data that's not strictly necessaryfor your business goals. Where possible, use synthetic or fully anonymized data.

Data collection can help drive business insights and analytics, but it's crucialto exercise discretion in the data collection process. If you collect personallyidentifiable information (PII) about your customer, reveal sensitive information,or create bias or controversy, then you might build biased ML models.

You can use Google Cloud features to help you improve data minimizationand data privacy for various use cases:

  • To de-identify your data and also preserve its utility, applytransformation methods like pseudonymization, de-identification, andgeneralization such as bucketing. To implement these methods, you can useSensitive Data Protection.
  • To enrich data and mitigate potential bias, you can use aVertex AI data labeling job. The data labelingprocess adds informative and meaningful tags to raw data, which transformsit into structured training data for ML models. Data labeling addsspecificity to the data and reduces ambiguity.
  • To help protect resources from prolonged access or manipulation, useCloud Storage features tocontrol data lifecycles.

For best practices about how to implement data encryption, seedata encryption at rest and in transit in the Well-Architected Framework.

Monitor data collection, storage, and transformation

Your AI application's training data poses the largest risks for theintroduction of bias and data leakage. To stay compliant and manage data acrossdifferent teams, establish a data governance layer to monitor data flows,transformations, and access. Maintain logs for data access and manipulationactivities. The logs help you audit data access, detect unauthorized accessattempts, and prevent unwanted access.

You can use Google Cloud features to help you implement data governancestrategies:

  • To establish an organization-wide or department-wide data governanceplatform, useDataplex Universal Catalog.A data governance platform can help you to centrally discover, manage,monitor, and govern data and AI artifacts across your data platforms. Thedata governance platform also provides access to trusted users. You canperform the following tasks with Dataplex Universal Catalog:
    • Manage data lineage. BigQuery can also provide column-levellineage.
    • Manage data quality checks and data profiles.
    • Manage data discovery, exploration, and processing acrossdifferent data marts.
    • Manage feature metadata and model artifacts.
    • Create a business glossary to manage metadata and establish astandardized vocabulary.
    • Enrich the metadata with context through aspects and aspect types.
    • Unify data governance across BigLake and open-format tableslike Iceberg and Delta.
    • Build adata mesh to decentralize data ownership among data owners from different teamsor domains. This practice adheres to data security principles and it canhelp improve data accessibility and operational efficiency.
    • Inspect and send sensitive data results from BigQuery to Dataplex Universal Catalog.
  • To build a unified openlakehouse that is well-governed, integrate your data lakes and warehouses with managedmetastore services likeDataproc Metastore andBigLake metastore.An open lakehouse uses open table formats that are compatible withdifferent data processing engines.
  • To schedule the monitoring of features and feature groups, useVertex AI Feature Store.
  • To scan your Vertex AI datasets at the organization,folder, or project level, useSensitive data discovery for Vertex AI.You can alsoanalyze the data profiles that are stored in BigQuery.
  • To capture real-time logs and collect metrics related to data pipelines,useCloud Logging andCloud Monitoring.To collect audit trails of API calls, useCloud Audit Logs.Don't log PII or confidential data in experiments or in different logservers.

Implement role-based access controls with least privilege principles

Implement role-based access controls (RBAC) to assign different levels of accessbased on user roles. Users must have only the minimum permissions that arenecessary to let them perform their role activities. Assign permissions based ontheprinciple of least privilege so that users have only the access that they need, such as no-access, read-only,or write.

RBAC with least privilege is important for security when yourorganization uses sensitive data that resides in data lakes, in feature stores,or in hyperparameters for model training. This practice helps you to preventdata theft, preserve model integrity, and limit the surface area for accidentsor attacks.

To help you implement these access strategies, you can use the followingGoogle Cloud features:

  • To implement access granularity, consider the following options:

    • Map the IAM roles of different products to auser, group, or service account to allow granular access. Map theseroles based on your project needs, access patterns, ortags.
    • SetIAM policies with conditions to manage granular access to your data, model, and modelconfigurations, such as code, resource settings, and hyperparameters.
    • Explore application-level granular access that helps you securesensitive data that you audit and share outside of your team.

  • To limit access to certain resources, you can useprincipal access boundary (PAB) policies.You can also usePrivileged Access Manager to control just-in-time, temporary privilege elevation for selectprincipals. Later, you can view the audit logs for this Privileged Access Manageractivity.

  • To restrict access to resources based on the IP address and end userdevice attributes, you canextend Identity-Aware Proxy (IAP) access policies.

  • To create access patterns for different user groups, you can useVertex AI access control with IAM to combine the predefined or custom roles.

  • To protect Vertex AI Workbench instances by using context-aware accesscontrols, useAccess Context Manager andChrome Enterprise Premium.With this approach, access is evaluated each time a user authenticates tothe instance.

Implement security measures for data movement

Implement secure perimeters and other measures like encryption and restrictionson data movement. These measures help you to prevent data exfiltration and dataloss, which can cause financial losses, reputational damage, legal liabilities,and a disruption to business operations.

To help prevent data exfiltration and loss on Google Cloud, you can use acombination of security tools and services.

To implement encryption, consider the following:

  • To gain more control over encryption keys, use customer-managedencryption keys (CMEKs) in Cloud KMS. When you use CMEKs, thefollowing CMEK-integrated services encrypt data at rest for you:
  • To help protect your data in Cloud Storage, use server-sideencryption to store your CMEKs. If you manage CMEKs on your own servers,server-side encryption can help protect your CMEKs and associated data, evenif your CMEK storage system is compromised.
  • To encrypt data in transit, use HTTPS for all of your API calls to AI andML services. To enforce HTTPS for your applications and APIs, useHTTPS load balancers.

For more best practices about how to encrypt data, seeEncrypt data at rest and in transit in the security pillar of the Well-Architected Framework.

To implement perimeters, consider the following:

  • To create a security boundary around your AI and ML resources andprevent data exfiltration from your Virtual Private Cloud (VPC), useVPC Service Controls to define a service perimeter. Include your AI and ML resources andsensitive data in the perimeter. To control data flow, configure ingressand egress rules for your perimeter.
  • To restrict inbound and outbound traffic to your AI and ML resources,configure firewall rules.Implement policies that deny all traffic by default and explicitly allowonly the traffic that meets your criteria. For a policy example, seeExample: Deny all external connections except to specific ports.

To implement restrictions on data movement, consider the following:

  • To share data and to scale across privacy boundaries in a secureenvironment, useBigQuery sharing andBigQuery data clean rooms,which provide a robust security and privacy framework.
  • To share data directly into built-in destinations from businessintelligence dashboards, useLooker Action Hub,which provides a secure cloud environment.

Guard against data poisoning

Data poisoning is a type of cyberattack in which attackers inject maliciousdata into training datasets to manipulate model behavior or to degradeperformance. This cyberattack can be a serious threat to ML training systems.To protect the validity and quality of the data, maintain practices that guardyour data. This approach is crucial for consistent unbiasedness, reliability,and integrity of your model.

To track inconsistent behavior, transformation, or unexpected access to yourdata, set up comprehensivemonitoring and alerting for data pipelines and ML pipelines.

Google Cloud features can help you implement more protections against datapoisoning:

  • To validate data integrity, consider the following:

    • Implement robust data validation checks before you use the datafor training. Verify data formats, ranges, and distributions. You canuse the automatic data quality capabilities inDataplex Universal Catalog.
    • Use Sensitive Data Protection with Model Armor to takeadvantage of comprehensive data loss prevention capabilities. For moreinformation, seeModel Armor key concepts.Sensitive Data Protection with Model Armor lets youdiscover, classify, and protect sensitive data such as intellectualproperty. These capabilities can help you prevent the unauthorizedexposure of sensitive data in LLM interactions.
    • To detect anomalies in your training data that might indicatedata poisoning, useanomaly detection in BigQuery with statistical methods or ML models.
  • To prepare for robust training, do the following:

    • Employensemble methods to reduce the impact of poisoned datapoints. Train multiple models on different subsets of the data withhyperparameter tuning.
    • Use data augmentation techniques to balance the distribution ofdata across datasets. This approach can reduce the impact of datapoisoning and lets you add adversarial examples.
  • To incorporate human review for training data or model outputs, do thefollowing:

    • Analyze model evaluation metrics todetect potential biases, anomalies, or unexpected behavior that mightindicate data poisoning. For details, seeModel evaluation in Vertex AI.
    • Take advantage of domain expertise to evaluate the model orapplication and identify suspicious patterns or data points thatautomated methods might not detect. For details, seeGen AI evaluation service overview.

For best practices about how to create data platforms that focus oninfrastructure and data security, see theImplement security by design principle in the Well-Architected Framework.

Keep AI pipelines secure and robust against tampering

Your AI and ML code and the code-defined pipelines are critical assets. Codethat isn't secured can be tampered with, which can lead to data leaks,compliance failure, and disruption of critical business activities. Keeping yourAI and ML code secure helps to ensure the integrity and value of your models andmodel outputs.

To keep AI code and pipelines secure, consider the following recommendations.

Use secure coding practices

To prevent vulnerabilities, use secure coding practices when you develop yourmodels. We recommend that you implement AI-specific input and outputvalidation, manage all of your software dependencies, and consistently embedsecure coding principles into your development. Embed security into every stageof the AI lifecycle, from data preprocessing to your final application code.

To implement rigorous validation, consider the following:

  • To prevent model manipulation or system exploits, validate and sanitizeinputs and outputs in your code.

    • Use Model Armor or fine-tuned LLMs to automaticallyscreen prompts and responses for common risks.
    • Implement data validation within your data ingestion andpreprocessing scripts for data types, formats, and ranges. ForVertex AI Pipelines or BigQuery, you can use Python to implement this data validation.
    • Use coding assistant LLM agents, likeCodeMender,to improve code security. Keep a human in the loop to validate itsproposed changes.
  • To manage and secure your AI model API endpoints, useApigee,which includes configurable features like request validation, trafficcontrol, and authentication.

  • To help mitigate risk throughout the AI lifecycle, you can useAI Protection to do the following:

    • Discover AI inventory in your environment.
    • Assess the inventory for potential vulnerabilities.
    • Secure AI assets with controls, policies, and protections.
    • Manage AI systems with detection, investigation, and responsecapabilities.

To help secure the code and artifact dependencies in your CI/CD pipeline,consider the following:

  • To address the risks that open-source library dependencies canintroduce to your project, useArtifact Analysis withArtifact Registry to detect known vulnerabilities. Use and maintain the approved versions oflibraries. Store your custom ML packages and vetted dependencies in aprivate Artifact Registry repository.
  • To embed dependency scanning into your Cloud Build MLOps pipelines,useBinary Authorization.Enforce policies that allow deployments only if your code's containerimages pass the security checks.
  • To get security information about your software supply chain, usedashboards in the Google Cloud console that provide details about sources, builds,artifacts, deployments, and runtimes. This information includesvulnerabilities in build artifacts, build provenance, and Software Bill ofMaterials (SBOM) dependency lists.
  • To assess the maturity level of your software supply chain security, usetheSupply chain Levels for Software Artifacts (SLSA) framework.

To consistently embed secure coding principles into every stage of development,consider the following:

  • To prevent the exposure of sensitive data from model interactions, useLogging with Sensitive Data Protection. When you usethese products together, you can control what data your AI applications andpipeline components log, and hide sensitive data.
  • To implement the principle of least privilege, ensure that the serviceaccounts that you use for your Vertex AI custom jobs,pipelines, and deployed models have only the minimum requiredIAM permissions. For more information, seeImplement role-based access controls with least privilege principles.
  • To help secure and protect your pipelines and build artifacts,understand the security configurations (VPC andVPC Service Controls) in the environment your code runs in.

Protect pipelines and model artifacts from unauthorized access

Your model artifacts and pipelines are intellectual property, and theirtraining data also contains proprietary information. To protect model weights,files, and deployment configurations from tampering and vulnerabilities, storeand access these artifacts with improved security. Implement different accesslevels for each artifact based on user roles and needs.

To help secure your model artifacts, consider the following:

  • To protect model artifacts and other sensitive files, encrypt them withCloud KMS. This encryption helps to protect data at rest and intransit, even if the underlying storage becomes compromised.
  • To help secure access to your files, store them in Cloud Storageand configure access controls.
  • To track any incorrect or inadequate configurations and any drift fromyour defined standards, use Security Command Center to configuresecurity postures.
  • To enable fine-grained access control and encryption at rest, store yourmodel artifacts inVertex AI Model Registry. For additional security, createa digital signature for packages and containers that are produced duringthe approved build processes.
  • To benefit from Google Cloud's enterprise-grade security, use modelsthat are available in Model Garden. Model Gardenprovides Google's proprietary models and it offersthird-party models from featured partners.
  • To enforce central management for all user and group lifecycles and toenforce the principle of least privilege, use IAM.

    • Create and use dedicated, least-privilege service accounts foryour MLOps pipelines. For example, a training pipeline's serviceaccount has the permissions to read data from only a specificCloud Storage bucket and to write model artifacts toModel Registry.
    • Use IAM Conditionsto enforce conditional, attribute-based access control. For example, acondition allows a service account to trigger aVertex AI pipeline only if the request originates from atrusted Cloud Build trigger.

To help secure your deployment pipelines, consider the following:

  • To manage MLOps stages on Google Cloud services and resources,use Vertex AI Pipelines,which can integrate with other services and provide low-level accesscontrol. When you re-execute the pipelines, ensure that you performVertex Explainable AI andresponsible AI checks before you deploy the model artifacts. These checks can help youdetect or prevent the following security issues:

    • Unauthorized changes, which can indicate model tampering.
    • Cross-site scripting (XSS), which can indicate compromisedcontainer images or dependencies.
    • Insecure endpoints, which can indicate misconfigured servinginfrastructure.
  • To help secure model interactions during inference, use privateendpoints based onPrivate Service Connect withprebuilt containers orcustom containers.Create model signatures with a predefined input and output schema.

  • To automate code change tracking, use Git for source code management,and integrate version control with robust CI/CD pipelines.

For more information, seeSecuring the AI Pipeline.

Enforce lineage and tracking

To help meet the regulatory compliance requirements that you might have, enforcelineage and tracking of your AI and ML assets. Data lineage and trackingprovides extensive change records for data, models, and code. Model provenanceprovides transparency and accountability throughout the AI and ML lifecycle.

To effectively enforce lineage and tracking in Google Cloud, consider thefollowing tools and services:

  • To track the lineage of models, datasets, and artifacts that areautomatically encrypted at rest, useVertex ML Metadata.Log metadata about data sources, transformations, model parameters, andexperiment results.
  • To track thelineage of pipeline artifacts from Vertex AI Pipelines, and to search for model and datasetresources, you can use Dataplex Universal Catalog. Track individual pipeline artifactswhen you want to perform debugging, troubleshooting, or a root cause analysis.To track your entire MLOps pipeline, which includes the lineage of pipelineartifacts, useVertex ML Metadata.Vertex ML Metadata also lets you analyze the resources and runs.Model Registry applies and manages the versions ofeach model that you store.
  • To track API calls and administrative actions, enableaudit logs for Vertex AI.Analyze audit logs with Log Analytics to understand who accessed or modified data andmodels, and when. You can alsoroute logs to third-party destinations.

Deploy on secure systems with secure tools and artifacts

Ensure that your code and models run in a secure environment. This environmentmust have a robust access control system and provide security assurances for thetools and artifacts that you deploy.

To deploy your code on secure systems, consider the followingrecommendations.

Train and deploy models in a secure environment

To maintain system integrity, confidentiality, and availability for your AI andML systems, implement stringent access controls that prevent unauthorizedresource manipulation. This defense helps you to do the following:

  • Mitigate model tampering that could produce unexpected or conflictingresults.
  • Protect your training data from privacy violations.
  • Maintain service uptime.
  • Maintain regulatory compliance.
  • Build user trust.

To train your ML models in an environment with improved security, use managedservices in Google Cloud like Cloud Run,GKE, and Dataproc. You can also useVertex AI serverless training.

This section provides recommendations to help you further help secure yourtraining and deployment environment.

To help secure your environment and perimeters, consider the following:

To help secure your deployment, consider the following:

  • When you deploy models, use Model Registry.If you deploy models in containers, use GKE Sandbox andContainer-Optimized OS toenhance security and isolate workloads.Restrict access to models from Model Garden according to user roles andresponsibilities.
  • To help secure your model APIs, use Apigee orAPI Gateway.To prevent abuse, implement API keys, authentication,authorization, and rate limiting. To control access to model APIs, useAPI keys and authentication mechanisms.
  • To help secure access to models during prediction, useVertex AI Inference.To prevent data exfiltration,use VPC Service Controls perimeters to protectprivate endpoints and govern access to the underlying models. You use private endpointsto enable access to the models within a VPC network.IAM isn't directly applied to the private endpoint, but the target service usesIAM to manage access to the models. For online prediction, werecommend that you use Private Service Connect.
  • To track API calls that are related to model deployment, enableCloud Audit Logs for Vertex AI.Relevant API calls include activities such as endpoint creation, modeldeployment, and configuration updates.
  • To extend Google Cloud infrastructure to edge locations, considerGoogle Distributed Cloud solutions.For a fully disconnected solution, you can useDistributed Cloud air-gapped,which doesn't require connectivity to Google Cloud.
  • To help standardize deployments and to help ensure compliance withregulatory and security needs, useAssured Workloads.

Follow SLSA guidelines for AI artifacts

Follow the standardSupply-chain Levels for Software Artifacts (SLSA) guidelines for your AI-specific artifacts, like models and software packages.

SLSA is a security framework that's designed to help you improve the integrityof software artifacts and help prevent tampering. When you adhere to the SLSAguidelines, you can enhance the security of your AI and ML pipeline and theartifacts that the pipeline produces. SLSA adherence can provide the followingbenefits:

  • Increased trust in your AI and ML artifacts: SLSA helps to ensurethat tampering doesn't occur to your models and software packages. Userscan also trace models and software packages back to their source, whichincreases users' confidence in the integrity and reliability of theartifacts.
  • Reduced risk of supply chain attacks: SLSA helps to mitigate therisk of attacks that exploit vulnerabilities in the software supply chain,like attacks that inject malicious code or that compromise build processes.
  • Enhanced security posture: SLSA helps you to strengthen the overallsecurity posture of your AI and ML systems. This implementation can helpreduce the risk of attacks and protect your valuable assets.

To implement SLSA for your AI and ML artifacts on Google Cloud, dothe following:

  1. Understand SLSA levels: Familiarize yourself with the differentSLSA levels and their requirements. As the levels increase, the integrity that theyprovide also increases.
  2. Assess your current level: Evaluate your current practices againstthe SLSA framework to determine your current level and to identify areas forimprovement.
  3. Set your target level: Determine the appropriate SLSA level totarget based on your risk tolerance, security requirements, and thecriticality of your AI and ML systems.
  4. Implement SLSA requirements: To meet your target SLSA level,implement the necessary controls and practices,which could include the following:

    • Source control: Use a version control systemlike Git to track changes to your code and configurations.
    • Build process: Use a service that helps to secure yourbuilds, likeCloud Build,and ensure that your build process is scripted or automated.
    • Provenance generation: Generate provenance metadata thatcaptures details about how your artifacts were built, including thebuild process, source code, and dependencies. For details, seeTrack Vertex ML Metadata andTrack executions and artifacts.
    • Artifact signing: Sign your artifacts to verify theirauthenticity and integrity.
    • Vulnerability management: Scan your artifacts anddependencies for vulnerabilities on a regular basis. Use tools likeArtifact Analysis.
    • Deployment security: Implement deployment practices thathelp to secure your systems, such as the practices that are described inthis document.
  5. Continuous improvement: Monitor and improve your SLSA implementationto address new threats and vulnerabilities, and strive for higher SLSAlevels.

Use validated prebuilt container images

To prevent a single point of failure for your MLOps stages, isolate the tasksthat require different dependency management into different containers. Forexample, use separate containers for feature engineering, training orfine-tuning, and inference tasks. This approach also gives ML engineers theflexibility to control and customize their environment.

To promote MLOps consistency across your organization, use prebuilt containers.Maintain acentral repository of verified and trusted base platform images with the following bestpractices:

  • Maintain a centralized platform team in your organization that builds andmanages standardized base containers.
  • Extend the prebuilt container images that Vertex AIprovides specifically for AI and ML. Manage the container images in acentral repository within your organization.

Vertex AI provides a variety ofprebuilt containers for training and inference, and it also lets you usecustom containers.For smaller models, you can reduce latency for inference if youload models in containers.

To improve the security of your container management, consider the followingrecommendations:

  • UseArtifact Registry to create, store, and manage repositories of container images with differentformats. Artifact Registry handles access control with IAM,and it has integrated observability andvulnerability assessment features. Artifact Registry lets you enable container security features,scan container images, and investigate vulnerabilities.
  • Runcontinuous integration steps andbuild container images with Cloud Build. Dependency issues can be highlighted at thisstage. If you want to deploy only the images that are built byCloud Build, you can useBinary Authorization.To help prevent supply chain attacks,deploy the images built by Cloud Build in Artifact Registry.Integrate automated testing tools such as SonarQube, PyLint, or OWASP ZAP.
  • Use a container platform like GKE orCloud Run, which are optimized for GPU or TPU for AI and MLworkloads. Consider thevulnerability scanning options for containers in GKE clusters.

Consider Confidential Computing for GPUs

To protectdata in use,you can use Confidential Computing.Conventional security measures protect data at rest and in transit, butConfidential Computing encrypts data during processing. When you useConfidential Computing for GPUs, you help to protect sensitive trainingdata and model parameters from unauthorized access. You can also help to preventunauthorized access from privileged cloud users or potential attackers who mightgain access to the underlying infrastructure.

To determine whether you need Confidential Computing for GPUs, consider thesensitivity of the data, regulatory requirements, and potential risks.

If you set up Confidential Computing, consider the following options:

  • For general-purpose AI and ML workloads, useConfidential VM instances with NVIDIA T4 GPUs. These VM instances offer hardware-based encryption ofdata in use.
  • For containerized workloads, useConfidential GKE Nodes.These nodes provide a secure and isolated environment for your pods.
  • To ensure that your workload is running in a genuine and secure enclave,verify the attestation reports that Confidential VM provides.
  • To track performance, resource utilization, and security events, monitoryour Confidential Computing resources and yourConfidential GKE Nodes by using Monitoring andLogging.

Verify and protect inputs

Treat all of the inputs to your AI systems as untrusted, regardless of whetherthe inputs are from end users or other automated systems. To help keep your AIsystems secure and to ensure that they operate as intended, you must detect andsanitize potential attack vectors early.

To verify and protect your inputs, consider the following recommendations.

Implement practices that help secure generative AI systems

Treat prompts as a critical application component that has the same importanceto security as code does. Implement adefense-in-depthstrategy that combines proactive design, automated screening, and disciplinedlifecycle management.

To help secure your generative AI prompts, you must design them for security,screen them before use, and manage them throughout their lifecycle.

To improve the security of your prompt design and engineering, consider thefollowing practices:

  • Structure prompts for clarity: Design and test all of your promptsby usingVertex AI Studio prompt management capabilities.Prompts need to have a clear, unambiguous structure. Define a role,include few-shot examples,and give specific, bounded instructions. These methods reduce the risk thatthe model might misinterpret a user's input in a way that creates asecurity loophole.
  • Test the inputs for robustness and grounding: Test all of yoursystems proactively against unexpected, malformed, and malicious inputs inorder to prevent crashes or insecure outputs. Use red team testing tosimulate real-world attacks. As a standard step in yourVertex AI Pipelines, automate your robustness tests. You can usethe following testing techniques:

    • Fuzz testing.
    • Test directly against PII, sensitive inputs, and SQL injections.
    • Scan multimodal inputs that can contain malware or violateprompt policies.
  • Implement a layered defense: Use multiple defenses and never relyon a single defensive measure. For example, for an application based onretrieval-augmented generation (RAG), use a separate LLM to classifyincoming user intent and check for malicious patterns. Then, that LLM canpass the request to the more-powerful primary LLM that generates the finalresponse.

  • Sanitize and validate inputs: Before you incorporate external input oruser-provided input into a prompt, filter and validate all of the input inyour application code. This validation is important to help you preventindirect prompt injection.

For automated prompt and response screening, consider the followingpractices:

  • Use comprehensive security services: Implement a dedicated,model-agnostic security service like Model Armoras a mandatory protection layer for your LLMs. Model Armorinspects prompts and responses for threats like prompt injection,jailbreak attempts,and harmful content. To help ensure that your models don't leaksensitive training data or intellectual property in their responses, usethe Sensitive Data Protection integrationwith Model Armor. For details, seeModel Armor filters.
  • Monitor and log interactions: Maintain detailed logs for all of theprompts and responses for your model endpoints. UseLogging to audit these interactions, identify patterns of misuse, and detect attackvectors that might emerge against your deployed models.

To help secure prompt lifecycle management, consider the following practices:

  • Implement versioning for prompts: Treat all of your productionprompts like application code. Use a version control system like Git tocreate a complete history of changes, enforce collaboration standards, andenable rollbacks to previous versions. This core MLOps practice can helpyou to maintain stable and secure AI systems.
  • Centralize prompt management: Use a central repository to store,manage, and deploy all of your versioned prompts. This strategy enforcesconsistency across environments and it enables runtime updates without theneed for a full application redeployment.
  • Conduct regular audits and red team testing: Test your system'sdefenses continuously against known vulnerabilities, such as those listedin theOWASP Top 10 for LLM Applications.As an AI engineer, you must be proactive and red-team test your ownapplication to discover and remediate weaknesses before an attacker canexploit them.

Prevent malicious queries to your AI systems

Along with authentication and authorization, which this document discussedearlier, you can take further measures to help secure your AI systems againstmalicious inputs. You need to prepare your AI systems forpost-authenticationscenarios in which attackers bypass both the authentication and authorizationprotocols, and then attempt to attack the system internally.

To implement a comprehensive strategy that can help protect your system frompost-authentication attacks, apply the following requirements:

  • Secure network and application layers: Establish a multi-layereddefense for all of your AI assets.

    • To create a security perimeter that prevents data exfiltrationof models from Model Registry or of sensitive datafrom BigQuery, useVPC Service Controls.Always usedry run mode to validate the impact of a perimeter before you enforce it.
    • To help protect web-based tools such as notebooks, useIAP.
    • To help secure all of the inference endpoints, useApigee for enterprise-grade security and governance. You can also useAPI Gateway for straightforward authentication.
  • Watch for query pattern anomalies: For example, an attacker thatprobes a system for vulnerabilities might send thousands of slightlydifferent, sequential queries. Flag abnormal query patterns that don'treflect normal user behavior.

  • Monitor the volume of requests: A sudden spike in query volumestrongly indicates a denial-of-service (DoS) attack or a model theftattack, which is an attempt to reverse-engineer the model. Use ratelimiting and throttling to control the volume of requests from a single IPaddress or user.

  • Monitor and set alerts for geographic and temporal anomalies:Establish a baseline for normal access patterns. Generate alerts for suddenactivity from unusual geographic locations or at odd hours. For example, amassive spike in logins from a new country at 3 AM.

Monitor, evaluate, and prepare to respond to outputs

AI systems deliver value because they produce outputs that augment, optimize,or automate human decision-making. To maintain the integrity and trustworthinessof your AI systems and applications, ensure that the outputs are secure andwithin the expected parameters. You also need a plan to respond to incidents.

To maintain your outputs, consider the following recommendations.

Evaluate model performance with metrics and security measures

To ensure that your AI models meet performance benchmarks, meet securityrequirements, and adhere to fairness and compliance standards, thoroughlyevaluate the models. Conduct evaluations before deployment, and then continue toevaluate the models in production on a regular basis. To minimize risks andbuild trustworthy AI systems, implement a comprehensive evaluation strategy thatcombines performance metrics with specific AI security assessments.

To evaluate model robustness and security posture, consider the followingrecommendations:

  • Implement model signing and verification in your MLOps pipeline.

    • For containerized models, useBinary Authorization to verify signatures.
    • For models that are deployed directly to Vertex AIendpoints, use custom checks in your deployment scripts for verification.
    • For any model, use Cloud Build formodel signing.
  • Assess your model's resilience to unexpected or adversarial inputs.

    • For all of your models, test your model for common datacorruptions and any potentially malicious data modifications. Toorchestrate these tests, you can use Vertex AI trainingor Vertex AI Pipelines.
    • For security-critical models, conduct adversarial attacksimulations to understand the potential vulnerabilities.
    • For models that are deployed in containers, useArtifact Analysis in Artifact Registry to scan the baseimages for vulnerabilities.
  • Use Vertex AI Model Monitoring to detect drift and skew fordeployed models. Then, feed these insights back into the re-evaluation orretraining cycles.

  • Use model evaluations from Vertex AI as a pipelinecomponent with Vertex AI Pipelines. You can run the modelevaluation component by itself or with other pipeline components. Comparethe model versions against your defined metrics and datasets. Log theevaluation results to Vertex ML Metadata for lineage andtracking.

  • Use or build upon theGen AI evaluation service to evaluate your chosen models or to implement custom human-evaluationworkflows.

To assess fairness, bias, explainability, and factuality, consider thefollowing recommendations:

  • Define fairness measures that match your use cases, and then evaluate your models for potentialbiases across different data slices.
  • Understand which features drive model predictions in order to ensurethat the features, and the predictions that result, align with domainknowledge and ethical guidelines.
  • UseVertex Explainable AI to get feature attributions for your models.
  • Use the Gen AI evaluation service to compute metrics. During the sourceverification phase of testing, the service's grounding metric checks forfactuality against the source text that's provided.
  • Enablegrounding for your model's output in order to facilitate a second layer of sourceverification at the user level.
  • Review ourAI principles and adapt them for your AI applications.

Monitor AI and ML model outputs in production

Continuously monitor your AI and ML models and their supporting infrastructurein production. It's important to promptly identify and diagnose degradations inmodel output quality or performance, security vulnerabilities that emerge, anddeviations from compliance mandates. This monitoring helps you sustain systemsafety, reliability, and trustworthiness.

To monitor AI system outputs for anomalies, threats, and quality degradation,consider the following recommendations:

  • Use Model Monitoring for your model outputs to trackunexpected shifts in prediction distributions or spikes in low-confidencemodel predictions. Actively monitor your generative AI model outputs forgenerated content that's unsafe, biased, off-topic, or malicious. You canalso use Model Armor to screen all of your model outputs.
  • Identify specific error patterns, capture quality indicators, or detectharmful or non-compliant outputs at the application level. To find theseissues, use custom monitoring in Monitoring dashboards anduse log-based metrics from Logging.

To monitor outputs for security-specific signals and unauthorized changes,consider the following recommendations:

  • Identify unauthorized access attempts to AI models, datasets inCloud Storage or BigQuery, or MLOps pipelinecomponents. In particular, identify unexpected or unauthorized changes inIAM permissions for AI resources. To track these activitiesand review them for suspicious patterns, use the Admin Activity audit logsand Data Access audit logs in Cloud Audit Logs.Integrate the findings from Security Command Center, which can flag securitymisconfigurations and flag potential threats that are relevant to your AIassets.
  • Monitor outputs for high volumes of requests or requests from suspicioussources, which might indicate attempts to reverse engineer models orexfiltrate data. You can also use Sensitive Data Protection tomonitor for the exfiltration of potentially sensitive data.
  • Integrate logs into your security operations. Use Google Security Operationsto help you detect, orchestrate, and respond to any cyber threats from yourAI systems.

To track the operational health and performance of the infrastructure thatserves your AI models, consider the following recommendations:

  • Identify operational issues that can impact service delivery or modelperformance.
  • Monitor Vertex AI endpoints for latency, error rates, andtraffic patterns.
  • Monitor MLOps pipelines for execution status and errors.
  • Use Monitoring, which provides ready-made metrics. You canalso create custom dashboards to help you identify issues like endpointoutages or pipeline failures.

Implement alerting and incident response procedures

When you identify any potential performance, security, or compliance issues, aneffective response is critical. To ensure timely notifications to theappropriate teams, implement robust alerting mechanisms. Establish andoperationalize comprehensive, AI-aware incident response procedures to manage,contain, and remediate these issues efficiently.

To establish robust alerting mechanisms for AI issues that you identify,consider the following recommendations:

  • Configure actionable alerts to notify the relevant teams, based on themonitoring activities of your platform. For example, configure alerts totrigger when Model Monitoring detects significantdrift, skew, or prediction anomalies. Or, configure alerts to trigger whenModel Armor or custom Monitoring rules flagmalicious inputs or unsafe outputs.
  • Define clear notification channels, which can include Slack, email, orSMS through Pub/Sub integrations. Customize the notificationchannels for your alert severities and the responsible teams.

Develop and operationalize an AI-aware incident response plan. A structuredincident response plan is vital to minimize any potential impacts and ensurerecovery. Customize this plan to address AI-specific risks such as modeltampering, incorrect predictions due to drift, prompt injection, or unsafeoutputs from generative models. To create an effective plan, include thefollowing key phases:

  • Preparation: Identify assets and their vulnerabilities, developplaybooks, and ensure that your teams have appropriate privileges. Thisphase includes the following tasks:

    • Identify critical AI assets, such as models, datasets, andspecific Vertex AI resources like endpoints orVertex AI Feature Store instances.
    • Identify the assets' potential failure modes or attack vectors.
    • Develop AI-specific playbooks for incidents that match yourorganization's threat model. For example, playbooks might include thefollowing:

      • A model rollback that uses versioning inModel Registry.
      • An emergency retraining pipeline onVertex AI training.
      • The isolation of a compromised data source inBigQuery or Cloud Storage.
    • Use IAM to ensure that response teams have the necessaryleast-privilege access to tools that are required during an incident.

  • Identification and triage: Use configured alerts to detect andvalidate potential incidents. Establish clear criteria and thresholds forhow your organization investigates or declares an AI-related incident. Fordetailed investigation and evidence collection, use Loggingfor application logs and service logs, and use Cloud Audit Logs foradministrative activities and data access patterns. Security teams can useGoogle SecOps for deeper analyses of security telemetry.

  • Containment: Isolate affected AI systems or components to preventfurther impact or data exfiltration. This phase might include the followingtasks:

    • Disable a problematic Vertex AI endpoint.
    • Revoke specific IAM permissions.
    • Update firewall rules or Cloud Armor policies.
    • Pause a Vertex AI pipeline that's misbehaving.
  • Eradication: Identify and remove the root cause of the incident.This phase might include the following tasks:

    • Patch the vulnerable code in a custom model container.
    • Remove the identified malicious backdoors from a model.
    • Sanitize the poisoned data before you initiate a secureretraining job on Vertex AI training.
    • Update any insecure configurations.
    • Refine the input validation logic to block specific prompt-injectiontechniques.
  • Recovery and secure redeployment: Restore the affected AI systems toa known good and secure operational state. This phase might include thefollowing tasks:

    • Deploy a previously validated and trusted model version fromModel Registry.
    • Ensure that you find and apply all of the security patches forvulnerabilities that might be present in your code or system.
    • Reset the IAM permissions to the principle ofleast privilege.
  • Post-incident activity and lessons learned: After you resolve thesignificant AI incidents, conduct a thorough post-incident review. Thisreview involves all of the relevant teams, such as the AI and ML, MLOps,security, and data science teams. Understand the full lifecycle of theincident. Use these insights to refine the AI system design, updatesecurity controls, improve Monitoring configurations, andenhance the AI incident response plan and playbooks.

Integrate the AI incident response with the broader organizational frameworks,such as IT and security incident management, for a coordinated effort. To alignyour AI-specific incident response with your organizational frameworks, considerthe following:

  • Escalation: Define clear paths for how you escalate significant AIincidents to central SOC, IT, legal, or relevant business units.
  • Communication: Use established organizational channels for allinternal and external incident reports and updates.
  • Tooling and processes: Use existing enterprise incident managementand ticketing systems for AI incidents to ensure consistent tracking andvisibility.
  • Collaboration: Pre-define collaboration protocols between AI and ML,MLOps, data science, security, legal, and compliance teams for effective AIincident responses.

Contributors

Authors:

Other contributors:

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-11-26 UTC.