Model Armor overview

Model Armor is a Google Cloud service designed to enhance the security and safety of your AI applications. It works by proactively screening LLM prompts and responses, protecting against various risks and ensuring responsible AI practices. Whether you deploy AI on Google Cloud or with other cloud providers, Model Armor can help you prevent malicious input, verify content safety, protect sensitive data, maintain compliance, and enforce your AI safety and security policies consistently across your AI applications.

Architecture

The Model Armor architecture diagram shows an application using Model Armor to protect an LLM and a user. The following steps explain the data flow:

  1. A user provides a prompt to the application.
  2. Model Armor inspects the incoming prompt for potentially sensitive content.
  3. The prompt (or sanitized prompt) is sent to the LLM.
  4. The LLM generates a response.
  5. Model Armor inspects the generated response for potentially sensitivecontent.
  6. The response (or sanitized response) is sent to the user. Model Armor sends a detailed description of triggered and untriggered filters in the response.

Model Armor filters both input (prompts) and output (responses) to prevent the LLM from exposure to or generation of malicious or sensitive content.
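The following minimal Python sketch illustrates this flow with the google-cloud-modelarmor client library. The regional endpoint pattern, method names, and result fields follow the library's published samples but should be verified against the current version; the project, location, template ID, and the call_llm helper are placeholders.

```python
from google.api_core.client_options import ClientOptions
from google.cloud import modelarmor_v1

PROJECT_ID = "my-project"   # placeholder
LOCATION = "us-central1"    # placeholder
TEMPLATE = f"projects/{PROJECT_ID}/locations/{LOCATION}/templates/my-template"

# Regional endpoint pattern as shown in the Model Armor samples (assumption).
client = modelarmor_v1.ModelArmorClient(
    transport="rest",
    client_options=ClientOptions(
        api_endpoint=f"modelarmor.{LOCATION}.rep.googleapis.com"
    ),
)


def call_llm(prompt: str) -> str:
    """Placeholder for your model call (Vertex AI, another provider, and so on)."""
    raise NotImplementedError


def handle_user_turn(user_prompt: str) -> str:
    # Step 2: screen the incoming prompt.
    prompt_result = client.sanitize_user_prompt(
        request=modelarmor_v1.SanitizeUserPromptRequest(
            name=TEMPLATE,
            user_prompt_data=modelarmor_v1.DataItem(text=user_prompt),
        )
    )
    if (prompt_result.sanitization_result.filter_match_state
            == modelarmor_v1.FilterMatchState.MATCH_FOUND):
        return "Sorry, I can't help with that request."

    # Steps 3 and 4: send the prompt to the LLM and get a response.
    llm_response = call_llm(user_prompt)

    # Step 5: screen the generated response before returning it.
    response_result = client.sanitize_model_response(
        request=modelarmor_v1.SanitizeModelResponseRequest(
            name=TEMPLATE,
            model_response_data=modelarmor_v1.DataItem(text=llm_response),
        )
    )
    if (response_result.sanitization_result.filter_match_state
            == modelarmor_v1.FilterMatchState.MATCH_FOUND):
        return "The generated answer was withheld by policy."

    # Step 6: return the response to the user.
    return llm_response
```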

Use cases

Model Armor has several use cases, which include the following:

  • Security

    • Mitigate the risk of leaking sensitive intellectual property (IP) and personally identifiable information (PII) in LLM prompts or responses.
    • Protect against prompt injection and jailbreak attacks, preventing malicious actors from manipulating AI systems to perform unintended actions.
    • Scan text in PDFs for sensitive or malicious content.
  • Safety and responsible AI

    • Prevent your chatbot from recommending competitor solutions, maintaining brand integrity and customer loyalty.
    • Filter social media posts generated by your AI applications that contain harmful messaging, such as dangerous or hateful content.

Model Armor templates

Model Armor templates let you configure how Model Armor screens prompts and responses. They function as sets of customized filters and thresholds for different safety and security confidence levels, allowing control over what content is flagged.

The thresholds represent confidence levels, that is, how confident Model Armor is that the prompt or response includes offending content. For example, you can create a template that filters prompts for hateful content with a HIGH threshold, meaning Model Armor reports high confidence that the prompt contains hateful content. A LOW_AND_ABOVE threshold indicates any level of confidence (LOW, MEDIUM, and HIGH) in making that claim.
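As a minimal sketch, the following Python snippet creates a template that flags hate speech only at a HIGH confidence threshold and harassment at any confidence level. The client, message, and enum names follow the google-cloud-modelarmor samples and should be verified against the current library; the project, location, and template IDs are placeholders.

```python
from google.api_core.client_options import ClientOptions
from google.cloud import modelarmor_v1

PROJECT_ID, LOCATION = "my-project", "us-central1"  # placeholders

client = modelarmor_v1.ModelArmorClient(
    transport="rest",
    client_options=ClientOptions(
        api_endpoint=f"modelarmor.{LOCATION}.rep.googleapis.com"
    ),
)

# Flag hateful content only at HIGH confidence, and harassment at LOW_AND_ABOVE.
template = modelarmor_v1.Template(
    filter_config=modelarmor_v1.FilterConfig(
        rai_settings=modelarmor_v1.RaiFilterSettings(
            rai_filters=[
                modelarmor_v1.RaiFilterSettings.RaiFilter(
                    filter_type=modelarmor_v1.RaiFilterType.HATE_SPEECH,
                    confidence_level=modelarmor_v1.DetectionConfidenceLevel.HIGH,
                ),
                modelarmor_v1.RaiFilterSettings.RaiFilter(
                    filter_type=modelarmor_v1.RaiFilterType.HARASSMENT,
                    confidence_level=modelarmor_v1.DetectionConfidenceLevel.LOW_AND_ABOVE,
                ),
            ]
        ),
    ),
)

created = client.create_template(
    request=modelarmor_v1.CreateTemplateRequest(
        parent=f"projects/{PROJECT_ID}/locations/{LOCATION}",
        template_id="my-template",
        template=template,
    )
)
print(created.name)
```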

For more information, see Model Armor templates.

Model Armor confidence levels

You can set confidence levels for responsible AI safety categories (sexually explicit, dangerous, harassment, and hate speech), prompt injection and jailbreak detection, and sensitive data protection (including topicality).

Note: Confidence levels for Sensitive Data Protection operate differently than those for other filters. For more information about confidence levels for Sensitive Data Protection, see Sensitive Data Protection match likelihood.

For confidence levels that allow granular thresholds, Model Armor interprets them as follows:

  • High: Identify if the message has content with a high likelihood.
  • Medium and above: Identify if the message has content with a medium or high likelihood.
  • Low and above: Identify if the message has content with a low, medium, or high likelihood.
Note: You can set confidence levels only for prompt injection and jailbreak detection and responsible AI safety filters.

Model Armor filters

Model Armor offers a variety of filters to help you provide safe and secure AI models. The following filter categories are available.

Responsible AI safety filter

You can screen prompts and responses at the aforementioned confidence levels for the following categories:

  • Hate Speech: Negative or harmful comments targeting identity and/or protected attributes.
  • Harassment: Threatening, intimidating, bullying, or abusive comments targeting another individual.
  • Sexually Explicit: Contains references to sexual acts or other lewd content.
  • Dangerous Content: Promotes or enables access to harmful goods, services, and activities.

The child sexual abuse material (CSAM) filter is applied by default and cannot be turned off.

Prompt injection and jailbreak detection

Prompt injection is a security vulnerability where attackers craft special commands within the text input (the prompt) to trick an AI model. This can make the AI ignore its usual instructions, reveal sensitive information, or perform actions it wasn't designed to do. Jailbreaking, in the context of LLMs, refers to the act of bypassing the safety protocols and ethical guidelines that are built into the model. This allows the LLM to generate responses that it was originally designed to avoid, such as harmful, unethical, or dangerous content.

When prompt injection and jailbreak detection is enabled, Model Armor scans prompts and responses for malicious content. If such content is detected, Model Armor blocks the prompt or response.
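For illustration, the fragment below shows how a template's filter configuration might enable prompt injection and jailbreak detection at a chosen confidence level. The field and enum names follow the google-cloud-modelarmor samples and should be treated as assumptions to verify.

```python
from google.cloud import modelarmor_v1

# Enable prompt injection and jailbreak detection; flag matches at MEDIUM_AND_ABOVE.
pi_jailbreak_config = modelarmor_v1.FilterConfig(
    pi_and_jailbreak_filter_settings=modelarmor_v1.PiAndJailbreakFilterSettings(
        filter_enforcement=(
            modelarmor_v1.PiAndJailbreakFilterSettings.PiAndJailbreakFilterEnforcement.ENABLED
        ),
        confidence_level=modelarmor_v1.DetectionConfidenceLevel.MEDIUM_AND_ABOVE,
    )
)

# This FilterConfig is attached to a template, as in the earlier template example.
template = modelarmor_v1.Template(filter_config=pi_jailbreak_config)
```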

Sensitive Data Protection

Sensitive Data Protection is a Google Cloud service that helps you discover, classify, and de-identify sensitive data. Sensitive Data Protection can identify sensitive elements, context, and documents to help you reduce the risk of data leakage going into and out of AI workloads. You can use Sensitive Data Protection directly within Model Armor to transform, tokenize, and redact sensitive elements while retaining non-sensitive context. Model Armor can accept existing inspection templates, which are configurations that act like blueprints to streamline the process of scanning and identifying sensitive data specific to your business and compliance needs. This way, you can have consistency and interoperability with other workloads that use Sensitive Data Protection.

Model Armor offers two modes for Sensitive Data Protection configuration:

  • Basic configuration: In this mode, you configure Sensitive Data Protection by specifying the types of sensitive data to scan for. This mode supports the following categories:

    • Credit card number
    • US social security number (SSN)
    • Financial account number
    • US individual taxpayer identification number (ITIN)
    • Google Cloud credentials
    • Google Cloud API key

    Basic configuration only allows for inspection operations and does not support the use of Sensitive Data Protection templates. For more information, see Basic Sensitive Data Protection configuration.

  • Advanced configuration: This mode offers more flexibility and customization through Sensitive Data Protection templates. Sensitive Data Protection templates are predefined configurations that allow you to specify more granular detection rules and de-identification techniques. Advanced configuration supports both inspection and de-identification operations.

Confidence levels for Sensitive Data Protection operate in a slightly different way than confidence levels for other filters. For more information about confidence levels for Sensitive Data Protection, see Sensitive Data Protection match likelihood. For more information about Sensitive Data Protection in general, see Sensitive Data Protection overview.
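The sketch below contrasts the two modes: basic configuration enables the built-in categories listed above, while advanced configuration points at existing Sensitive Data Protection inspection and de-identification templates. The field names follow the published client samples and the template resource paths are placeholders; verify both before use.

```python
from google.cloud import modelarmor_v1

PROJECT_ID, LOCATION = "my-project", "us-central1"  # placeholders

# Basic mode: inspection only, using the predefined categories.
basic_sdp = modelarmor_v1.SdpFilterSettings(
    basic_config=modelarmor_v1.SdpBasicConfig(
        filter_enforcement=modelarmor_v1.SdpBasicConfig.SdpBasicConfigEnforcement.ENABLED
    )
)

# Advanced mode: reuse existing Sensitive Data Protection templates,
# which also enables de-identification of matched findings.
advanced_sdp = modelarmor_v1.SdpFilterSettings(
    advanced_config=modelarmor_v1.SdpAdvancedConfig(
        inspect_template=(
            f"projects/{PROJECT_ID}/locations/{LOCATION}/inspectTemplates/my-inspect-template"
        ),
        deidentify_template=(
            f"projects/{PROJECT_ID}/locations/{LOCATION}/deidentifyTemplates/my-deidentify-template"
        ),
    )
)

# Either settings object is attached to a template through its filter config.
filter_config = modelarmor_v1.FilterConfig(sdp_settings=advanced_sdp)
```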

Malicious URL detection

Malicious URLs are often disguised to look legitimate, making them a potent tool for phishing attacks, malware distribution, and other online threats. For example, if a PDF contains an embedded malicious URL, it can be used to compromise any downstream systems processing LLM outputs.

When malicious URL detection is enabled, Model Armor scans URLs to identify whether they're malicious. This lets you take action and prevent malicious URLs from being returned.

Note: Model Armor scans only the first 40 malicious URLs found in the prompts and responses.
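As a small hedged example, the fragment below enables malicious URL detection in a template's filter configuration, again using field names from the client library samples (assumptions to verify).

```python
from google.cloud import modelarmor_v1

# Enable scanning of URLs found in prompts and responses.
filter_config = modelarmor_v1.FilterConfig(
    malicious_uri_filter_settings=modelarmor_v1.MaliciousUriFilterSettings(
        filter_enforcement=(
            modelarmor_v1.MaliciousUriFilterSettings.MaliciousUriFilterEnforcement.ENABLED
        )
    )
)
```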

Define the enforcement type

Enforcement defines what happens after a violation is detected. To configure how Model Armor handles detections, you set the enforcement type. Model Armor offers the following enforcement types:

  • Inspect only: Model Armor inspects requests that violate the configured settings, but it doesn't block them.
  • Inspect and block: Model Armor blocks requests that violate the configured settings.

For more information, see Define the enforcement type for templates and Define the enforcement type for floor settings.
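Regardless of enforcement type, the sanitize response reports whether any filter matched, so an application can log or act on findings even in inspect-only mode. The field and enum names in this sketch follow the client library samples and are assumptions to verify.

```python
from google.cloud import modelarmor_v1


def summarize_result(response: modelarmor_v1.SanitizeUserPromptResponse) -> None:
    """Report whether any configured filter matched this prompt."""
    result = response.sanitization_result
    if result.filter_match_state == modelarmor_v1.FilterMatchState.MATCH_FOUND:
        # With "Inspect and block" the request is blocked; with "Inspect only"
        # it passes through, but the match is still reported here and in Cloud Logging.
        print("One or more Model Armor filters matched this prompt.")
    else:
        print("No filter matched.")
```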

To effectively use Inspect only and gain valuable insights, enable Cloud Logging. Without Cloud Logging enabled, Inspect only won't yield any useful information.

Access your logs through Cloud Logging. Filter by the service name modelarmor.googleapis.com. Look for entries related to the operations that you enabled in your template. For more information, see View logs by using the Logs Explorer.

Model Armor floor settings

Although Model Armor templates provide flexibility for individual applications, organizations often need to establish a baseline level of protection across all their AI applications. This is where Model Armor floor settings are used. They act as rules that define minimum requirements for all templates created at an organization, folder, or project level in the Google Cloud resource hierarchy.

Note: Floor settings cannot enforce Sensitive Data Protection.
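As a hedged illustration, the following sketch updates the project-level floor setting so that every template in the project must enable prompt injection and jailbreak detection. The FloorSetting resource name, the update_floor_setting call, and the enable_floor_setting_enforcement field follow the client library samples but are assumptions to verify; the project ID is a placeholder.

```python
from google.cloud import modelarmor_v1

PROJECT_ID = "my-project"  # placeholder

client = modelarmor_v1.ModelArmorClient(transport="rest")

floor_setting = modelarmor_v1.FloorSetting(
    # Organization- and folder-level floor settings use analogous resource names.
    name=f"projects/{PROJECT_ID}/locations/global/floorSetting",
    filter_config=modelarmor_v1.FilterConfig(
        pi_and_jailbreak_filter_settings=modelarmor_v1.PiAndJailbreakFilterSettings(
            filter_enforcement=(
                modelarmor_v1.PiAndJailbreakFilterSettings.PiAndJailbreakFilterEnforcement.ENABLED
            ),
            confidence_level=modelarmor_v1.DetectionConfidenceLevel.MEDIUM_AND_ABOVE,
        )
    ),
    enable_floor_setting_enforcement=True,
)

updated = client.update_floor_setting(
    request=modelarmor_v1.UpdateFloorSettingRequest(floor_setting=floor_setting)
)
print(updated.name)
```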

For more information, see Model Armor floor settings.

Language support

Model Armor filters support sanitizing prompts and responses across multiple languages.

There are two ways to enable multi-language detection.

Document screening

Text in documents can include malicious and sensitive content. Model Armor can screen the following types of documents for safety, prompt injection and jailbreak attempts, sensitive data, and malicious URLs:

  • PDFs
  • CSV
  • Text files: TXT
  • Microsoft Word documents: DOCX, DOCM, DOTX, DOTM
  • Microsoft PowerPoint slides: PPTX, PPTM, POTX, POTM, POT
  • Microsoft Excel sheets: XLSX, XLSM, XLTX, XLTM
Note: Input size is limited to 4 MB for both files and text. Model Armor skips files or text exceeding this limit. Model Armor rejects requests to scan rich text format files that are 50 bytes or less in size, because such files are highly likely to be invalid.
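The sketch below screens a PDF by sending its contents as byte data instead of text. The base64 encoding step and the ByteDataItem names mirror the published screening samples but should be treated as assumptions; the file path, project, location, and template are placeholders.

```python
import base64

from google.api_core.client_options import ClientOptions
from google.cloud import modelarmor_v1

PROJECT_ID, LOCATION = "my-project", "us-central1"  # placeholders
TEMPLATE = f"projects/{PROJECT_ID}/locations/{LOCATION}/templates/my-template"

client = modelarmor_v1.ModelArmorClient(
    transport="rest",
    client_options=ClientOptions(
        api_endpoint=f"modelarmor.{LOCATION}.rep.googleapis.com"
    ),
)

# Read the document and base64-encode it, as the screening samples do (assumption).
with open("contract.pdf", "rb") as f:
    pdf_base64 = base64.b64encode(f.read())

response = client.sanitize_user_prompt(
    request=modelarmor_v1.SanitizeUserPromptRequest(
        name=TEMPLATE,
        user_prompt_data=modelarmor_v1.DataItem(
            byte_item=modelarmor_v1.ByteDataItem(
                byte_data_type=modelarmor_v1.ByteDataItem.ByteItemType.PDF,
                byte_data=pdf_base64,
            )
        ),
    )
)
print(response.sanitization_result.filter_match_state)
```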

Pricing

Model Armor can be purchased as an integrated part of Security Command Center or as a standalone service. For pricing information, see Security Command Center pricing.

Tokens

Generative AI models break down text and other data into units called tokens. Model Armor uses the total number of tokens in AI prompts and responses for pricing purposes. Model Armor limits the number of tokens processed in each prompt and response. For token limits, see token limits.

