Model Armor overview
Model Armor is a Google Cloud service designed to enhance the security and safety of your AI applications. It works by proactively screening LLM prompts and responses, protecting against various risks and ensuring responsible AI practices. Whether you are deploying AI in Google Cloud or other cloud providers, Model Armor can help you prevent malicious input, verify content safety, protect sensitive data, maintain compliance, and enforce your AI safety and security policies consistently across your AI applications.
Architecture
This architecture diagram shows an application using Model Armor to protect an LLM and a user. The following steps explain the data flow:
- A user provides a prompt to the application.
- Model Armor inspects the incoming prompt for potentially sensitive content.
- The prompt (or sanitized prompt) is sent to the LLM.
- The LLM generates a response.
- Model Armor inspects the generated response for potentially sensitive content.
- The response (or sanitized response) is sent to the user. Model Armor sends a detailed description of triggered and untriggered filters in the response.
Model Armor filters both input (prompts) and output (responses) to prevent the LLM from being exposed to, or generating, malicious or sensitive content.
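The following Python sketch illustrates this flow for an application that calls the Model Armor REST API around an LLM call. The endpoint paths and request fields (user_prompt_data, model_response_data) are assumptions based on the flow described above, and call_llm is a placeholder for your own model invocation; confirm the exact request and response shapes in the Model Armor API reference.

```python
# Minimal sketch of the Model Armor request/response flow, assuming the
# REST endpoints and field names below (confirm them in the API reference).
import google.auth
from google.auth.transport.requests import AuthorizedSession

PROJECT, LOCATION, TEMPLATE = "my-project", "us-central1", "my-template"
BASE = (f"https://modelarmor.{LOCATION}.rep.googleapis.com/v1/"
        f"projects/{PROJECT}/locations/{LOCATION}/templates/{TEMPLATE}")

credentials, _ = google.auth.default()
session = AuthorizedSession(credentials)


def call_llm(prompt: str) -> str:
    """Placeholder for your own LLM call (Vertex AI or any other provider)."""
    raise NotImplementedError


def handle_user_prompt(prompt: str) -> str:
    # Steps 1-2: screen the incoming prompt before it reaches the model.
    prompt_result = session.post(
        f"{BASE}:sanitizeUserPrompt",
        json={"user_prompt_data": {"text": prompt}},
    ).json()

    # Steps 3-4: send the (sanitized) prompt to the LLM and get a response.
    response = call_llm(prompt)

    # Step 5: screen the generated response before returning it to the user.
    response_result = session.post(
        f"{BASE}:sanitizeModelResponse",
        json={"model_response_data": {"text": response}},
    ).json()

    # Step 6: the sanitization results describe which filters matched; the
    # application decides whether to return, redact, or block the response.
    print(prompt_result, response_result)
    return response
```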
Use cases
Model Armor has several use cases, which include the following:
Security
- Mitigate the risk of leaking sensitive intellectual property (IP) and personally identifiable information (PII) in LLM prompts or responses.
- Protect against prompt injection and jailbreak attacks, preventing malicious actors from manipulating AI systems to perform unintended actions.
- Scan text in PDFs for sensitive or malicious content.
Safety and responsible AI
- Prevent your chatbot from recommending competitor solutions, maintaining brand integrity and customer loyalty.
- Filter social media posts generated by your AI applications for harmful messaging, such as dangerous or hateful content.
Model Armor templates
Model Armor templates let you configure how Model Armor screens prompts and responses. They function as sets of customized filters and thresholds for different safety and security confidence levels, allowing control over what content is flagged.
The thresholds represent confidence levels, that is, how confident Model Armor is that the prompt or response includes offending content. For example, you can create a template that filters prompts for hateful content with a HIGH threshold, meaning Model Armor flags the prompt only when it has high confidence that the prompt contains hateful content. A LOW_AND_ABOVE threshold indicates any level of confidence (LOW, MEDIUM, or HIGH) in making that claim.
For more information, see Model Armor templates.
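As an illustration, the following sketch creates a template whose responsible AI filters use the thresholds discussed above: HIGH for hate speech and LOW_AND_ABOVE for harassment. The request structure (filterConfig, raiSettings, raiFilters) and the endpoint are assumptions; check the template API reference for the exact fields.

```python
# Hedged sketch of a template body; field names are assumptions based on
# the filter and confidence-level names described on this page.
import google.auth
from google.auth.transport.requests import AuthorizedSession

PROJECT, LOCATION = "my-project", "us-central1"
PARENT = f"projects/{PROJECT}/locations/{LOCATION}"

template_body = {
    "filterConfig": {
        "raiSettings": {
            "raiFilters": [
                # Flag hate speech only at high confidence.
                {"filterType": "HATE_SPEECH", "confidenceLevel": "HIGH"},
                # Flag harassment at low, medium, or high confidence.
                {"filterType": "HARASSMENT", "confidenceLevel": "LOW_AND_ABOVE"},
            ]
        }
    }
}

credentials, _ = google.auth.default()
session = AuthorizedSession(credentials)
session.post(
    f"https://modelarmor.{LOCATION}.rep.googleapis.com/v1/{PARENT}/templates",
    params={"templateId": "my-template"},
    json=template_body,
)
```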
Model Armor confidence levels
You can set confidence levels for responsible AI safety categories (sexually explicit, dangerous, harassment, and hate speech), prompt injection and jailbreak detection, and sensitive data protection (including topicality).
Note: Confidence levels for Sensitive Data Protection operate differently than those for other filters. For more information about confidence levels for Sensitive Data Protection, see Sensitive Data Protection match likelihood.
For confidence levels that allow granular thresholds, Model Armor interprets them as follows:
- High: Identify if the message contains content with a high likelihood of matching.
- Medium and above: Identify if the message contains content with a medium or high likelihood of matching.
- Low and above: Identify if the message contains content with a low, medium, or high likelihood of matching.
Model Armor filters
Model Armor offers a variety of filters to help you provide safe and secure AI models. The following filter categories are available.
Responsible AI safety filter
You can screen prompts and responses at the aforementioned confidence levels for the following categories:
| Category | Definition |
|---|---|
| Hate Speech | Negative or harmful comments targeting identity and/or protected attributes. |
| Harassment | Threatening, intimidating, bullying, or abusive comments targeting another individual. |
| Sexually Explicit | Contains references to sexual acts or other lewd content. |
| Dangerous Content | Promotes or enables access to harmful goods, services, and activities. |
The child sexual abuse material (CSAM) filter is applied by default and cannot be turned off.
Prompt injection and jailbreak detection
Prompt injection is a security vulnerability where attackers craft special commands within the text input (the prompt) to trick an AI model. This can make the AI ignore its usual instructions, reveal sensitive information, or perform actions it wasn't designed to do. Jailbreaking in the context of LLMs refers to the act of bypassing the safety protocols and ethical guidelines that are built into the model. This allows the LLM to generate responses that it was originally designed to avoid, such as harmful, unethical, and dangerous content.
When prompt injection and jailbreak detection is enabled, Model Armor scans prompts and responses for malicious content. If detected, Model Armor blocks the prompt or response.
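As a sketch, a template might enable this filter at a chosen confidence level, and the sanitize response would then report whether it matched. The field names below (piAndJailbreakFilterSettings, sanitizationResult, filterMatchState) are assumptions consistent with the concepts on this page; verify them in the API reference.

```python
# Hedged sketch: enable prompt injection and jailbreak detection in a
# template, then check the result of a prompt sanitization call.
# Field names are assumptions; confirm them in the API reference.

pi_jailbreak_settings = {
    "filterConfig": {
        "piAndJailbreakFilterSettings": {
            "filterEnforcement": "ENABLED",
            "confidenceLevel": "MEDIUM_AND_ABOVE",
        }
    }
}


def prompt_is_blocked(sanitize_result: dict) -> bool:
    """Return True if any enabled filter matched the screened prompt."""
    result = sanitize_result.get("sanitizationResult", {})
    return result.get("filterMatchState") == "MATCH_FOUND"
```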
Sensitive Data Protection
Sensitive Data Protection is a Google Cloud service that helps you discover, classify, and de-identify sensitive data. Sensitive Data Protection can identify sensitive elements, context, and documents to help you reduce the risk of data leakage going into and out of AI workloads. You can use Sensitive Data Protection directly within Model Armor to transform, tokenize, and redact sensitive elements while retaining non-sensitive context. Model Armor can accept existing inspection templates, which are configurations that act like blueprints to streamline the process of scanning and identifying sensitive data specific to your business and compliance needs. This gives you consistency and interoperability with other workloads that use Sensitive Data Protection.
Model Armor offers two modes for Sensitive Data Protection configuration:
Basic configuration: In this mode, you configure Sensitive Data Protection by specifying the types of sensitive data to scan for. This mode supports the following categories:
- Credit card number
- US social security number (SSN)
- Financial account number
- US individual taxpayer identification number (ITIN)
- Google Cloud credentials
- Google Cloud API key
Basic configuration only allows for inspection operations and does not support the use of Sensitive Data Protection templates. For more information, see Basic Sensitive Data Protection configuration.
Advanced configuration: This mode offers more flexibility and customization through Sensitive Data Protection templates. Sensitive Data Protection templates are predefined configurations that allow you to specify more granular detection rules and de-identification techniques. Advanced configuration supports both inspection and de-identification operations.
Confidence levels for Sensitive Data Protection operate in a slightly different way than confidence levels for other filters. For more information about confidence levels for Sensitive Data Protection, see Sensitive Data Protection match likelihood. For more information about Sensitive Data Protection in general, see Sensitive Data Protection overview.
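The following sketch contrasts the two modes as they might appear in a template's filter configuration. The field names (sdpSettings, basicConfig, advancedConfig, inspectTemplate, deidentifyTemplate) and the template resource paths are assumptions; confirm them in the API reference.

```python
# Hedged sketch of the two Sensitive Data Protection modes in a template's
# filter configuration. Field names are assumptions; confirm them in the
# Model Armor API reference.

# Basic configuration: inspection only, using the built-in categories
# listed above (credit card numbers, US SSNs, and so on).
basic_sdp = {
    "filterConfig": {
        "sdpSettings": {
            "basicConfig": {"filterEnforcement": "ENABLED"}
        }
    }
}

# Advanced configuration: reference existing Sensitive Data Protection
# templates for inspection and de-identification.
advanced_sdp = {
    "filterConfig": {
        "sdpSettings": {
            "advancedConfig": {
                "inspectTemplate": (
                    "projects/my-project/locations/us-central1/"
                    "inspectTemplates/my-inspect-template"
                ),
                "deidentifyTemplate": (
                    "projects/my-project/locations/us-central1/"
                    "deidentifyTemplates/my-deidentify-template"
                ),
            }
        }
    }
}
```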
Malicious URL detection
Malicious URLs are often disguised to look legitimate, making them a potent tool for phishing attacks, malware distribution, and other online threats. For example, if a PDF contains an embedded malicious URL, it can be used to compromise any downstream systems processing LLM outputs.
When malicious URL detection is enabled, Model Armor scans URLs to identify whether they're malicious. This lets you take action and prevent malicious URLs from being returned.
Note: Model Armor scans only the first 40 malicious URLs found in the prompts and responses.
Define the enforcement type
Enforcement defines what happens after a violation is detected. To configure how Model Armor handles detections, you set the enforcement type. Model Armor offers the following enforcement types:
- Inspect only: Model Armor inspects requests that violate the configured settings, but it doesn't block them.
- Inspect and block: Model Armor blocks requests that violate the configured settings.
For more information, see Define the enforcement type for templates and Define the enforcement type for floor settings.
To effectively use Inspect only and gain valuable insights, enable Cloud Logging. Without Cloud Logging enabled, Inspect only won't yield any useful information.
Access your logs through Cloud Logging. Filter by the service name modelarmor.googleapis.com. Look for entries related to the operations that you enabled in your template. For more information, see View logs by using the Logs Explorer.
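If you prefer to query the logs programmatically, a filter on the Model Armor service name retrieves the relevant entries. This sketch uses the google-cloud-logging client; the filter field shown (protoPayload.serviceName) is an assumption, so adjust it to match the entries you actually see in the Logs Explorer.

```python
# Hedged sketch: list Model Armor log entries with the Cloud Logging client.
# The filter field (protoPayload.serviceName) is an assumption; adjust it to
# match the entries you see in the Logs Explorer.
from google.cloud import logging

client = logging.Client(project="my-project")
log_filter = 'protoPayload.serviceName="modelarmor.googleapis.com"'

for entry in client.list_entries(filter_=log_filter, max_results=20):
    # Each entry describes a sanitize operation and the filters it triggered.
    print(entry.timestamp, entry.payload)
```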
Model Armor floor settings
Although Model Armor templates provide flexibility for individual applications, organizations often need to establish a baseline level of protection across all their AI applications. This is where Model Armor floor settings are used. They act as rules that define minimum requirements for all templates created at an organization, folder, or project level in the Google Cloud resource hierarchy.
Note: Floor settings cannot enforce Sensitive Data Protection.
For more information, see Model Armor floor settings.
Language support
Model Armor filters support sanitizing prompts and responses across multiple languages.
- The Sensitive Data Protection filter supports English and other languages depending on the infoTypes that you selected.
The responsible AI and prompt injection and jailbreak detection filters are tested on the following languages:
- Chinese (Mandarin)
- English
- French
- German
- Italian
- Japanese
- Korean
- Portuguese
- Spanish
These filters can work in many other languages, but the quality of results might vary. For language codes, see Supported languages.
There are two ways to enable multi-language detection:
Enable on each request: For granular control, enable multi-language detection on a per-request basis when sanitizing a user prompt and sanitizing a model response.
Enable one-time: If you prefer a simpler setup, you can enable multi-language detection as a one-time configuration at the Model Armor template level using the REST API. For more information, see Create a Model Armor template.
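The following sketch contrasts the two options. The field names (multiLanguageDetectionMetadata, enableMultiLanguageDetection, templateMetadata) are hypothetical placeholders modeled on the options described above, not confirmed API fields; see Create a Model Armor template for the actual syntax.

```python
# Hedged sketch of the two ways to enable multi-language detection. The
# field names below are hypothetical placeholders; confirm the exact
# request and template fields in the API reference.

# Per-request: include the flag when sanitizing a single prompt or response.
sanitize_request = {
    "user_prompt_data": {"text": "Bonjour, pouvez-vous m'aider ?"},
    "multiLanguageDetectionMetadata": {"enableMultiLanguageDetection": True},
}

# One-time: set the flag in the template so every request inherits it.
template_body = {
    "templateMetadata": {
        "multiLanguageDetection": {"enableMultiLanguageDetection": True}
    },
    "filterConfig": {
        # ... filter settings as shown in the earlier examples ...
    },
}
```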
Document screening
Text in documents can include malicious and sensitive content. Model Armor can screen the following types of documents for safety, prompt injection and jailbreak attempts, sensitive data, and malicious URLs:
- PDFs
- CSV
- Text files: TXT
- Microsoft Word documents: DOCX, DOCM, DOTX, DOTM
- Microsoft PowerPoint slides: PPTX, PPTM, POTX, POTM, POT
- Microsoft Excel sheets: XLSX, XLSM, XLTX, XLTM
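For example, a PDF can be screened by sending its base64-encoded bytes to the prompt sanitization method, as in the hedged sketch below. The byteItem field names, the endpoint path, and the contract.pdf file name are assumptions; confirm the exact request shape in the API reference.

```python
# Hedged sketch: screen a PDF by sending its base64-encoded bytes to the
# sanitizeUserPrompt method. The byteItem field names are assumptions;
# confirm them in the API reference. contract.pdf is a placeholder file.
import base64

import google.auth
from google.auth.transport.requests import AuthorizedSession

PROJECT, LOCATION, TEMPLATE = "my-project", "us-central1", "my-template"
URL = (f"https://modelarmor.{LOCATION}.rep.googleapis.com/v1/"
       f"projects/{PROJECT}/locations/{LOCATION}/templates/{TEMPLATE}"
       ":sanitizeUserPrompt")

with open("contract.pdf", "rb") as f:
    pdf_b64 = base64.b64encode(f.read()).decode("utf-8")

credentials, _ = google.auth.default()
session = AuthorizedSession(credentials)
result = session.post(
    URL,
    json={"user_prompt_data": {"byteItem": {"byteDataType": "PDF",
                                            "byteData": pdf_b64}}},
).json()
print(result)
```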
Pricing
Model Armor can be purchased as an integrated part of Security Command Center or as a standalone service. For pricing information, see Security Command Center pricing.
Tokens
Generative AI models break down text and other data into units called tokens. Model Armor uses the total number of tokens in AI prompts and responses for pricing purposes. Model Armor limits the number of tokens processed in each prompt and response. For token limits, see token limits.
What's next
- Learn about Model Armor templates.
- Learn about Model Armor floor settings.
- Learn about Model Armor endpoints.
- Sanitize prompts and responses.
- Learn about Model Armor audit logging.
- Troubleshoot Model Armor issues.