Model Armor overview

Model Armor is a Google Cloud service designed to enhance the security and safety of your AI applications. It works by proactively screening LLM prompts and responses, protecting against various risks and ensuring responsible AI practices. Whether you deploy AI on Google Cloud or with other cloud providers, Model Armor can help you prevent malicious input, verify content safety, protect sensitive data, maintain compliance, and enforce your AI safety and security policies consistently across your AI applications.

Architecture

The Model Armor architecture diagram shows an application using Model Armor to protect an LLM and a user. The following steps explain the data flow:

  1. A user provides a prompt to the application.
  2. Model Armor inspects the incoming prompt for potentially sensitive content.
  3. The prompt (or sanitized prompt) is sent to the LLM.
  4. The LLM generates a response.
  5. Model Armor inspects the generated response for potentially sensitivecontent.
  6. The response (or sanitized response) is sent to the user. Model Armor sends a detailed description of triggered and untriggered filters in the response.

Model Armor filters both input (prompts) and output (responses) to prevent the LLM from exposure to or generation of malicious or sensitive content.
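The following minimal Python sketch illustrates this flow with the google-cloud-modelarmor client library. The regional endpoint pattern, method names, and result fields follow the library's published samples but should be verified against the current version; the project, location, template ID, and the call_llm helper are placeholders.

```python
from google.api_core.client_options import ClientOptions
from google.cloud import modelarmor_v1

PROJECT_ID = "my-project"   # placeholder
LOCATION = "us-central1"    # placeholder
TEMPLATE = f"projects/{PROJECT_ID}/locations/{LOCATION}/templates/my-template"

# Regional endpoint pattern as shown in the Model Armor samples (assumption).
client = modelarmor_v1.ModelArmorClient(
    transport="rest",
    client_options=ClientOptions(
        api_endpoint=f"modelarmor.{LOCATION}.rep.googleapis.com"
    ),
)


def call_llm(prompt: str) -> str:
    """Placeholder for your model call (Vertex AI, another provider, and so on)."""
    raise NotImplementedError


def handle_user_turn(user_prompt: str) -> str:
    # Step 2: screen the incoming prompt.
    prompt_result = client.sanitize_user_prompt(
        request=modelarmor_v1.SanitizeUserPromptRequest(
            name=TEMPLATE,
            user_prompt_data=modelarmor_v1.DataItem(text=user_prompt),
        )
    )
    if (prompt_result.sanitization_result.filter_match_state
            == modelarmor_v1.FilterMatchState.MATCH_FOUND):
        return "Sorry, I can't help with that request."

    # Steps 3 and 4: send the prompt to the LLM and get a response.
    llm_response = call_llm(user_prompt)

    # Step 5: screen the generated response before returning it.
    response_result = client.sanitize_model_response(
        request=modelarmor_v1.SanitizeModelResponseRequest(
            name=TEMPLATE,
            model_response_data=modelarmor_v1.DataItem(text=llm_response),
        )
    )
    if (response_result.sanitization_result.filter_match_state
            == modelarmor_v1.FilterMatchState.MATCH_FOUND):
        return "The generated answer was withheld by policy."

    # Step 6: return the response to the user.
    return llm_response
```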

Use cases

Model Armor has several use cases, which include the following:

  • Security

    • Mitigate the risk of leaking sensitive intellectual property (IP) and personally identifiable information (PII) in LLM prompts or responses.
    • Protect against prompt injection and jailbreak attacks, preventing malicious actors from manipulating AI systems to perform unintended actions.
    • Scan text in PDFs for sensitive or malicious content.
  • Safety and responsible AI

    • Prevent your chatbot from recommending competitor solutions, maintaining brand integrity and customer loyalty.
    • Filter social media posts generated by your AI applications that contain harmful messaging, such as dangerous or hateful content.

Model Armor templates

Model Armor templates let you configure how Model Armor screens prompts and responses. They function as sets of customized filters and thresholds for different safety and security confidence levels, allowing control over what content is flagged.

The thresholds represent confidence levels, that is, how confident Model Armor is that the prompt or response includes offending content. For example, you can create a template that filters prompts for hateful content with a HIGH threshold, meaning Model Armor reports high confidence that the prompt contains hateful content. A LOW_AND_ABOVE threshold indicates any level of confidence (LOW, MEDIUM, and HIGH) in making that claim.
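As a minimal sketch, the following Python snippet creates a template that flags hate speech only at a HIGH confidence threshold and harassment at any confidence level. The client, message, and enum names follow the google-cloud-modelarmor samples and should be verified against the current library; the project, location, and template IDs are placeholders.

```python
from google.api_core.client_options import ClientOptions
from google.cloud import modelarmor_v1

PROJECT_ID, LOCATION = "my-project", "us-central1"  # placeholders

client = modelarmor_v1.ModelArmorClient(
    transport="rest",
    client_options=ClientOptions(
        api_endpoint=f"modelarmor.{LOCATION}.rep.googleapis.com"
    ),
)

# Flag hateful content only at HIGH confidence, and harassment at LOW_AND_ABOVE.
template = modelarmor_v1.Template(
    filter_config=modelarmor_v1.FilterConfig(
        rai_settings=modelarmor_v1.RaiFilterSettings(
            rai_filters=[
                modelarmor_v1.RaiFilterSettings.RaiFilter(
                    filter_type=modelarmor_v1.RaiFilterType.HATE_SPEECH,
                    confidence_level=modelarmor_v1.DetectionConfidenceLevel.HIGH,
                ),
                modelarmor_v1.RaiFilterSettings.RaiFilter(
                    filter_type=modelarmor_v1.RaiFilterType.HARASSMENT,
                    confidence_level=modelarmor_v1.DetectionConfidenceLevel.LOW_AND_ABOVE,
                ),
            ]
        ),
    ),
)

created = client.create_template(
    request=modelarmor_v1.CreateTemplateRequest(
        parent=f"projects/{PROJECT_ID}/locations/{LOCATION}",
        template_id="my-template",
        template=template,
    )
)
print(created.name)
```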

For more information, see Model Armor templates.

Model Armor confidence levels

You can set confidence levels for responsible AI safety categories (sexually explicit, dangerous, harassment, and hate speech), prompt injection and jailbreak detection, and sensitive data protection (including topicality).

Note: Confidence levels for Sensitive Data Protection operate differently than those for other filters. For more information about confidence levels for Sensitive Data Protection, see Sensitive Data Protection match likelihood.

For confidence levels that allow granular thresholds, Model Armor interprets them as follows:

  • High: Identify if the message has content with a high likelihood.
  • Medium and above: Identify if the message has content with a medium or high likelihood.
  • Low and above: Identify if the message has content with a low, medium, or high likelihood.
Note: You can set confidence levels only for prompt injection and jailbreak detection and responsible AI safety filters.

Model Armor filters

Model Armor offers a variety of filters to help you provide safe and secure AI models. The following filter categories are available.

Responsible AI safety filter

You can screen prompts and responses at the aforementioned confidence levels for the following categories:

  • Hate Speech: Negative or harmful comments targeting identity and/or protected attributes.
  • Harassment: Threatening, intimidating, bullying, or abusive comments targeting another individual.
  • Sexually Explicit: Contains references to sexual acts or other lewd content.
  • Dangerous Content: Promotes or enables access to harmful goods, services, and activities.

The child sexual abuse material (CSAM) filter is applied by default and cannot be turned off.

Prompt injection and jailbreak detection

Prompt injection is a security vulnerability where attackers craft special commands within the text input (the prompt) to trick an AI model. This can make the AI ignore its usual instructions, reveal sensitive information, or perform actions it wasn't designed to do. Jailbreaking, in the context of LLMs, refers to the act of bypassing the safety protocols and ethical guidelines that are built into the model. This allows the LLM to generate responses that it was originally designed to avoid, such as harmful, unethical, or dangerous content.

When prompt injection and jailbreak detection is enabled, Model Armor scans prompts and responses for malicious content. If such content is detected, Model Armor blocks the prompt or response.
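For illustration, the fragment below shows how a template's filter configuration might enable prompt injection and jailbreak detection at a chosen confidence level. The field and enum names follow the google-cloud-modelarmor samples and should be treated as assumptions to verify.

```python
from google.cloud import modelarmor_v1

# Enable prompt injection and jailbreak detection; flag matches at MEDIUM_AND_ABOVE.
pi_jailbreak_config = modelarmor_v1.FilterConfig(
    pi_and_jailbreak_filter_settings=modelarmor_v1.PiAndJailbreakFilterSettings(
        filter_enforcement=(
            modelarmor_v1.PiAndJailbreakFilterSettings.PiAndJailbreakFilterEnforcement.ENABLED
        ),
        confidence_level=modelarmor_v1.DetectionConfidenceLevel.MEDIUM_AND_ABOVE,
    )
)

# This FilterConfig is attached to a template, as in the earlier template example.
template = modelarmor_v1.Template(filter_config=pi_jailbreak_config)
```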

Sensitive Data Protection

Sensitive Data Protection is a Google Cloud service that helps you discover, classify, and de-identify sensitive data. Sensitive Data Protection can identify sensitive elements, context, and documents to help you reduce the risk of data leakage going into and out of AI workloads. You can use Sensitive Data Protection directly within Model Armor to transform, tokenize, and redact sensitive elements while retaining non-sensitive context. Model Armor can accept existing inspection templates, which are configurations that act like blueprints to streamline the process of scanning and identifying sensitive data specific to your business and compliance needs. This way, you can have consistency and interoperability with other workloads that use Sensitive Data Protection.

Model Armor offers two modes for Sensitive Data Protection configuration:

  • Basic configuration: In this mode, you configure Sensitive Data Protection by specifying the types of sensitive data to scan for. This mode supports the following categories:

    • Credit card number
    • US social security number (SSN)
    • Financial account number
    • US individual taxpayer identification number (ITIN)
    • Google Cloud credentials
    • Google Cloud API key

    Basic configuration only allows for inspection operations and does not support the use of Sensitive Data Protection templates. For more information, see Basic Sensitive Data Protection configuration.

  • Advanced configuration: This mode offers more flexibility and customization through Sensitive Data Protection templates. Sensitive Data Protection templates are predefined configurations that allow you to specify more granular detection rules and de-identification techniques. Advanced configuration supports both inspection and de-identification operations.

Confidence levels for Sensitive Data Protection operate in a slightly different way than confidence levels for other filters. For more information about confidence levels for Sensitive Data Protection, see Sensitive Data Protection match likelihood. For more information about Sensitive Data Protection in general, see Sensitive Data Protection overview.
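The sketch below contrasts the two modes: basic configuration enables the built-in categories listed above, while advanced configuration points at existing Sensitive Data Protection inspection and de-identification templates. The field names follow the published client samples and the template resource paths are placeholders; verify both before use.

```python
from google.cloud import modelarmor_v1

PROJECT_ID, LOCATION = "my-project", "us-central1"  # placeholders

# Basic mode: inspection only, using the predefined categories.
basic_sdp = modelarmor_v1.SdpFilterSettings(
    basic_config=modelarmor_v1.SdpBasicConfig(
        filter_enforcement=modelarmor_v1.SdpBasicConfig.SdpBasicConfigEnforcement.ENABLED
    )
)

# Advanced mode: reuse existing Sensitive Data Protection templates,
# which also enables de-identification of matched findings.
advanced_sdp = modelarmor_v1.SdpFilterSettings(
    advanced_config=modelarmor_v1.SdpAdvancedConfig(
        inspect_template=(
            f"projects/{PROJECT_ID}/locations/{LOCATION}/inspectTemplates/my-inspect-template"
        ),
        deidentify_template=(
            f"projects/{PROJECT_ID}/locations/{LOCATION}/deidentifyTemplates/my-deidentify-template"
        ),
    )
)

# Either settings object is attached to a template through its filter config.
filter_config = modelarmor_v1.FilterConfig(sdp_settings=advanced_sdp)
```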

Malicious URL detection

Malicious URLs are often disguised to look legitimate, making them a potent tool for phishing attacks, malware distribution, and other online threats. For example, if a PDF contains an embedded malicious URL, it can be used to compromise any downstream systems processing LLM outputs.

When malicious URL detection is enabled, Model Armor scans URLs to identify whether they're malicious. This lets you take action and prevent malicious URLs from being returned.

Note: Model Armor scans only the first 40 malicious URLs found in the prompts and responses.
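As a small hedged example, the fragment below enables malicious URL detection in a template's filter configuration, again using field names from the client library samples (assumptions to verify).

```python
from google.cloud import modelarmor_v1

# Enable scanning of URLs found in prompts and responses.
filter_config = modelarmor_v1.FilterConfig(
    malicious_uri_filter_settings=modelarmor_v1.MaliciousUriFilterSettings(
        filter_enforcement=(
            modelarmor_v1.MaliciousUriFilterSettings.MaliciousUriFilterEnforcement.ENABLED
        )
    )
)
```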

Define the enforcement type

Enforcement defines what happens after a violation is detected. To configure how Model Armor handles detections, you set the enforcement type. Model Armor offers the following enforcement types:

  • Inspect only: Model Armor inspects requests that violate the configured settings, but it doesn't block them.
  • Inspect and block: Model Armor blocks requests that violate the configured settings.

For more information, see Define the enforcement type for templates and Define the enforcement type for floor settings.
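Regardless of enforcement type, the sanitize response reports whether any filter matched, so an application can log or act on findings even in inspect-only mode. The field and enum names in this sketch follow the client library samples and are assumptions to verify.

```python
from google.cloud import modelarmor_v1


def summarize_result(response: modelarmor_v1.SanitizeUserPromptResponse) -> None:
    """Report whether any configured filter matched this prompt."""
    result = response.sanitization_result
    if result.filter_match_state == modelarmor_v1.FilterMatchState.MATCH_FOUND:
        # With "Inspect and block" the request is blocked; with "Inspect only"
        # it passes through, but the match is still reported here and in Cloud Logging.
        print("One or more Model Armor filters matched this prompt.")
    else:
        print("No filter matched.")
```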

To effectively use Inspect only and gain valuable insights, enable Cloud Logging. Without Cloud Logging enabled, Inspect only won't yield any useful information.

Access your logs through Cloud Logging. Filter by the service name modelarmor.googleapis.com. Look for entries related to the operations that you enabled in your template. For more information, see View logs by using the Logs Explorer.

Model Armor floor settings

Although Model Armor templates provide flexibility for individual applications, organizations often need to establish a baseline level of protection across all their AI applications. This is where Model Armor floor settings are used. They act as rules that define minimum requirements for all templates created at an organization, folder, or project level in the Google Cloud resource hierarchy.

Note: Floor settings cannot enforce Sensitive Data Protection.
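As a hedged illustration, the following sketch updates the project-level floor setting so that every template in the project must enable prompt injection and jailbreak detection. The FloorSetting resource name, the update_floor_setting call, and the enable_floor_setting_enforcement field follow the client library samples but are assumptions to verify; the project ID is a placeholder.

```python
from google.cloud import modelarmor_v1

PROJECT_ID = "my-project"  # placeholder

client = modelarmor_v1.ModelArmorClient(transport="rest")

floor_setting = modelarmor_v1.FloorSetting(
    # Organization- and folder-level floor settings use analogous resource names.
    name=f"projects/{PROJECT_ID}/locations/global/floorSetting",
    filter_config=modelarmor_v1.FilterConfig(
        pi_and_jailbreak_filter_settings=modelarmor_v1.PiAndJailbreakFilterSettings(
            filter_enforcement=(
                modelarmor_v1.PiAndJailbreakFilterSettings.PiAndJailbreakFilterEnforcement.ENABLED
            ),
            confidence_level=modelarmor_v1.DetectionConfidenceLevel.MEDIUM_AND_ABOVE,
        )
    ),
    enable_floor_setting_enforcement=True,
)

updated = client.update_floor_setting(
    request=modelarmor_v1.UpdateFloorSettingRequest(floor_setting=floor_setting)
)
print(updated.name)
```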

For more information, see Model Armor floor settings.

Language support

Model Armor filters support sanitizing prompts and responses across multiple languages.

There are two ways to enable multi-language detection.

Document screening

Text in documents can include malicious and sensitive content. Model Armor can screen the following types of documents for safety, prompt injection and jailbreak attempts, sensitive data, and malicious URLs:

  • PDFs
  • CSV
  • Text files: TXT
  • Microsoft Word documents: DOCX, DOCM, DOTX, DOTM
  • Microsoft PowerPoint slides: PPTX, PPTM, POTX, POTM, POT
  • Microsoft Excel sheets: XLSX, XLSM, XLTX, XLTM
Note: Input size is limited to 4 MB for both files and text. Model Armor skips files or text exceeding this limit. Model Armor rejects requests to scan rich text format files that are 50 bytes or less in size, because such files are highly likely to be invalid.
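The sketch below screens a PDF by sending its contents as byte data instead of text. The base64 encoding step and the ByteDataItem names mirror the published screening samples but should be treated as assumptions; the file path, project, location, and template are placeholders.

```python
import base64

from google.api_core.client_options import ClientOptions
from google.cloud import modelarmor_v1

PROJECT_ID, LOCATION = "my-project", "us-central1"  # placeholders
TEMPLATE = f"projects/{PROJECT_ID}/locations/{LOCATION}/templates/my-template"

client = modelarmor_v1.ModelArmorClient(
    transport="rest",
    client_options=ClientOptions(
        api_endpoint=f"modelarmor.{LOCATION}.rep.googleapis.com"
    ),
)

# Read the document and base64-encode it, as the screening samples do (assumption).
with open("contract.pdf", "rb") as f:
    pdf_base64 = base64.b64encode(f.read())

response = client.sanitize_user_prompt(
    request=modelarmor_v1.SanitizeUserPromptRequest(
        name=TEMPLATE,
        user_prompt_data=modelarmor_v1.DataItem(
            byte_item=modelarmor_v1.ByteDataItem(
                byte_data_type=modelarmor_v1.ByteDataItem.ByteItemType.PDF,
                byte_data=pdf_base64,
            )
        ),
    )
)
print(response.sanitization_result.filter_match_state)
```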

Pricing

Model Armor can be purchased as an integrated part of Security Command Center or as a standalone service. For pricing information, see Security Command Center pricing.

Tokens

Generative AI models break down text and other data into units called tokens. Model Armor uses the total number of tokens in AI prompts and responses for pricing purposes. Model Armor limits the number of tokens processed in each prompt and response. For token limits, see token limits.

