Model Armor overview Stay organized with collections Save and categorize content based on your preferences.
Model Armor is a Google Cloud service designed to enhance thesecurity and safety of your AI applications. It works by proactively screeningLLM prompts and responses, protecting against various risks and ensuringresponsible AI practices. Whether you are deploying AI in Google Cloud or othercloud providers, Model Armor can help you prevent maliciousinput, verify content safety, protect sensitive data, maintain compliance, andenforce your AI safety and security policies consistently across your AIapplications.
Architecture
This architecture diagram shows an application using Model Armorto protect an LLM and a user. The following steps explain the data flow:
- A user provides a prompt to the application.
- Model Armor inspects the incoming prompt for potentiallysensitive content.
- The prompt (or sanitized prompt) is sent to the LLM.
- The LLM generates a response.
- Model Armor inspects the generated response for potentiallysensitive content.
- The response (or sanitized response) is sent to the user.Model Armor sends a detailed description of triggered anduntriggered filters in the response.
Model Armor filters both input (prompts) and output (responses)to prevent the LLM from exposure to or generation of malicious or sensitivecontent.
Use cases
Model Armor has several use cases, which include the following:
Security
- Mitigate the risk of leaking sensitive intellectual property (IP) andpersonally identifiable information (PII) in LLM prompts or responses.
- Protect against prompt injection and jailbreak attacks, preventingmalicious actors from manipulating AI systems to perform unintendedactions.
- Scan text in PDFs for sensitive or malicious content.
Safety and responsible AI
- Prevent your chatbot from recommending competitor solutions, maintainingbrand integrity and customer loyalty.
- Filter social media posts generated by AI applications that containharmful messaging, such as dangerous or hateful content.
Model Armor templates
Model Armor templates let you configure howModel Armor screens prompts and responses. They function as setsof customized filters and thresholds for different safety and securityconfidence levels, allowing control over what content is flagged.
The thresholds represent confidence levels—how confidentModel Armor is that the prompt or response includes offendingcontent. For example, you can create a template that filters prompts for hatefulcontent with aHIGH threshold, meaning Model Armor reports highconfidence that the prompt contains hateful content. ALOW_AND_ABOVE thresholdindicates any level of confidence (LOW,MEDIUM, andHIGH) in making thatclaim.
For more information, seeModel Armortemplates.
Model Armor confidence levels
You can set confidence levels for responsible AI safety categories (sexuallyexplicit, dangerous, harassment, and hate speech), prompt injection andjailbreak detection, and sensitive data protection (including topicality).
Note: Confidence levels for Sensitive Data Protection operate differentlythan those for other filters. For more information about confidence levels forSensitive Data Protection, seeSensitive Data Protection matchlikelihood.For confidence levels that support granular thresholds,Model Armor interprets them as follows:
- High: Identify if the message has content with a high likelihood.
- Medium and above: Identify if the message has content with a medium orhigh likelihood.
- Low and above: Identify if the message has content with a low, medium,or high likelihood.
Model Armor filters
Model Armor offers a variety of filters to help you provide safeand secure AI models. The following filter categories are available.
Responsible AI safety filter
You can screen prompts and responses at the specified confidence levels for thefollowing categories:
| Category | Definition |
|---|---|
| Hate Speech | Negative or harmful comments targeting identity and/or protected attributes. |
| Harassment | Threatening, intimidating, bullying, or abusive comments targeting another individual. |
| Sexually Explicit | Contains references to sexual acts or other lewd content. |
| Dangerous Content | Promotes or enables access to harmful goods, services, and activities. |
| CSAM | Contains references to child sexual abuse material (CSAM). This filter is applied by default and cannot be turned off. |
Prompt injection and jailbreak detection
Prompt injection is a security vulnerability where attackers craft specialcommands within the text input (the prompt) to trick an AI model. This can makethe AI ignore its usual instructions, reveal sensitive information, or performactions it wasn't designed to do. Jailbreaking in the context of LLMs refers tothe act of bypassing the safety protocols and ethical guidelines that are builtinto the model. This lets the LLM generate responses that it was originallydesigned to avoid, such as harmful, unethical, and dangerous content.
When prompt injection and jailbreak detection is enabled,Model Armor scans prompts and responses for malicious content. Ifdetected, Model Armor blocks the prompt or response.
Sensitive Data Protection
Sensitive Data Protection is a Google Cloud service that helps you discover,classify, and de-identify sensitive data. Sensitive Data Protectioncan identify sensitive elements, context, and documents to help you reduce therisk of data leakage going into and out of AI workloads. You can useSensitive Data Protection directly within Model Armor totransform, tokenize, and redact sensitive elements while retaining non-sensitivecontext. Model Armor can accept existing inspection templates,which function as blueprints to streamline the process of scanning andidentifying sensitive data specific to your business and compliance needs. Thisensures consistency and interoperability between other workloads that useSensitive Data Protection.
Model Armor offers two modes for Sensitive Data Protectionconfiguration:
Basic configuration: In this mode, you configureSensitive Data Protection by specifying the types of sensitive data toscan for. This mode supports the following categories:
- Credit card number
- US social security number (SSN)
- Financial account number
- US individual taxpayer identification number (ITIN)
- Google Cloud credentials
- Google Cloud API key
Basic configuration only supports inspection operations and does not supportthe use of Sensitive Data Protection templates. For more information, seeBasic Sensitive Data Protection configuration.
Advanced configuration: This mode offers more flexibility andcustomization through Sensitive Data Protection templates.Sensitive Data Protection templates are predefined configurations thatlet you specify more granular detection rules and de-identificationtechniques. Advanced configuration supports both inspection andde-identification operations.
Confidence levels for Sensitive Data Protection operate differently thanconfidence levels for other filters. For more information about confidencelevels for Sensitive Data Protection, seeSensitive Data Protection matchlikelihood. For more informationabout Sensitive Data Protection in general, seeSensitive Data Protectionoverview.
Malicious URL detection
Malicious URLs are often disguised to look legitimate, making them a potent toolfor phishing attacks, malware distribution, and other online threats. Forexample, if a PDF contains an embedded malicious URL, it can be used tocompromise any downstream systems processing LLM outputs.
When malicious URL detection is enabled, Model Armor scans URLsto identify whether they're malicious. This lets you take action and preventmalicious URLs from being returned.
Note: Model Armor scans only the first 40 malicious URLs found inprompts and responses.Define the enforcement type
Enforcement defines what happens after a violation is detected. To configure howModel Armor handles detections, you set the enforcement type.Model Armor offers the following enforcement types:
- Inspect only: Model Armor inspects requests that violatethe configured settings, but it doesn't block them.
- Inspect and block: Model Armor blocks requests thatviolate the configured settings.
For more information, seeDefine the enforcement type for templates andDefine theenforcement type for floorsettings.
To effectively useInspect only and gain valuable insights, enableCloud Logging. Without Cloud Logging enabled,Inspect only won't yieldany useful information.
Access your logs through Cloud Logging. Filter by the service namemodelarmor.googleapis.com. Look for entries related to the operations that youenabled in your template. For more information, seeView logs by using the LogsExplorer.
Model Armor floor settings
Although Model Armor templates provide flexibility for individualapplications, organizations often need to establish a baseline level ofprotection across all their AI applications. You use Model Armorfloor settings to establish this baseline. They define minimum requirements forall templates created at the project level in the Google Cloud resourcehierarchy.
For more information, seeModel Armor floor settings.
Language support
Model Armor filters support sanitizing prompts and responsesacross multiple languages.
- TheSensitive Data Protection filtersupports English and other languages depending on theinfoTypes that youselected.
Theresponsible AIandprompt injection and jailbreakdetection filters are tested onthe following languages:
- Chinese (Mandarin)
- English
- French
- German
- Italian
- Japanese
- Korean
- Portuguese
- Spanish
These filters can work in many other languages, but the quality of results might vary. For language codes, seeSupported languages.
There are two ways to enable multi-language detection:
Enable on each request: For granular control, enable multi-languagedetection on a per-request basis whensanitizing a user prompt andsanitizing a modelresponse.
Enable one-time: If you prefer a simpler setup, you can enablemulti-language detection as a one-time configuration at theModel Armor template level using the REST API. For moreinformation, seeCreate a Model Armor template.
Document screening
Text in documents can include malicious and sensitive content.Model Armor can screen the following types of documents forsafety, prompt injection and jailbreak attempts, sensitive data, and maliciousURLs:
- PDFs
- CSV
- Text files: TXT
- Microsoft Word documents: DOCX, DOCM, DOTX, DOTM
- Microsoft PowerPoint slides: PPTX, PPTM, POTX, POTM, POT
- Microsoft Excel sheets: XLSX, XLSM, XLTX, XLTM
Pricing
Model Armor can be purchased as an integrated part ofSecurity Command Center or as a standalone service. For pricing information, seeSecurity Command Centerpricing.
Tokens
Generative AI models break down text and other data into units calledtokens.Model Armor uses the total number of tokens in AI prompts andresponses for pricing purposes. Model Armor limits the number oftokens processed in each prompt and response. For token limits, seetokenlimits.
What's next
- Learn aboutModel Armor templates.
- Learn aboutModel Armor floor settings.
- Learn aboutModel Armor endpoints.
- Sanitize prompts and responses.
- Learn aboutModel Armor audit logging.
- Troubleshoot Model Armor issues.
Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2026-02-19 UTC.