Prompt caching

The Anthropic Claude models offer prompt caching to reduce latency and costs when reusing the same content in multiple requests. When you send a query, you can cache all or specific parts of your input so that subsequent queries can use the cached results from the previous request. This avoids additional compute and network costs. Caches are unique to your Google Cloud project and cannot be used by other projects.

For details about how to structure your prompts, see the Anthropic Prompt caching documentation.

Data processing

Anthropic explicit prompt caching is a feature of Anthropic Claude models. The Vertex AI offering of these Anthropic models behaves as described in the Anthropic documentation.

Prompt caching is an optional feature. Claude computes the hashes (fingerprints) of requests for caching keys. These hashes are only computed for requests that have caching enabled.

Although prompt caching is a feature implemented by the Claude models, from a data handling perspective, Google considers these hashes to be a type of "User Metadata". They are treated as customer "Service Data" under the Google Cloud Privacy Notice and not as "Customer Data" under the Cloud Data Processing Addendum (Customers). In particular, additional protections for "Customer Data" don't apply to these hashes. Google does not use these hashes for any other purpose.

If you want to completely disable this prompt caching feature and make it unavailable in particular Google Cloud projects, you can request this by contacting customer support and providing the relevant project numbers. After explicit caching is disabled for a project, requests from the project with prompt caching enabled are rejected.

Use prompt caching

You can use the Anthropic Claude SDK or the Vertex AI REST API to send requests to the Vertex AI endpoint.
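
For example, a minimal request with the Anthropic SDK's Vertex client might look like the following sketch. The project ID, region, and model ID are illustrative placeholders, not values from this page; substitute ones valid for your project.

```python
# Minimal sketch: sending a request to a Claude model on Vertex AI with the
# Anthropic SDK. Project ID, region, and model ID are placeholder values.
from anthropic import AnthropicVertex

client = AnthropicVertex(project_id="your-project-id", region="us-east5")

message = client.messages.create(
    model="claude-sonnet-4@20250514",  # illustrative model ID
    max_tokens=1024,
    messages=[{"role": "user", "content": "Hello, Claude"}],
)
print(message.content)
```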

For more information, see How prompt caching works.

For additional examples, see the Prompt caching examples in the Anthropic documentation.

Caching occurs automatically when subsequent requests contain text, images, and a cache_control parameter identical to those in the first request. All requests must also include the cache_control parameter in the same blocks.
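
As a sketch of what this looks like in practice, the following request marks a large system prompt as cacheable by attaching cache_control to its content block. The model ID and LONG_REFERENCE_TEXT are placeholders, not values from this page.

```python
# Sketch: caching a large system prompt. The first request writes the cache;
# later requests with identical system text and the same cache_control
# placement read from it. LONG_REFERENCE_TEXT is a placeholder for your own
# reused content.
from anthropic import AnthropicVertex

client = AnthropicVertex(project_id="your-project-id", region="us-east5")
LONG_REFERENCE_TEXT = open("reference.txt").read()

response = client.messages.create(
    model="claude-sonnet-4@20250514",  # illustrative model ID
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": LONG_REFERENCE_TEXT,
            # Marks this block as a cache breakpoint.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "Summarize the reference text."}],
)
# usage reports cache_creation_input_tokens on the first call and
# cache_read_input_tokens on cache hits.
print(response.usage)
```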

By default, the cache has a five-minute lifetime, or time to live (TTL). You can extend the TTL to one hour by setting "ttl": "1h" within the cache_control object. The cache lifetime is refreshed each time the cached content is accessed. For more information, see 1-hour cache duration.
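
Extending the sketch above, requesting the one-hour lifetime is a one-field change to the cache_control object, assuming the model you use supports it:

```python
# Sketch: requesting a one-hour cache lifetime instead of the default five
# minutes by adding "ttl" to the cache_control object.
system = [
    {
        "type": "text",
        "text": LONG_REFERENCE_TEXT,  # placeholder for the reused content
        "cache_control": {"type": "ephemeral", "ttl": "1h"},
    }
]
```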

The one-hour TTL isn't supported for the following models: Claude 3.7 Sonnet, Claude 3.5 Sonnet v2, Claude 3.5 Sonnet, and Claude 3 Opus.

Pricing

Prompt caching can affect billing costs. Note the following (see the cost sketch after this list):

  • Cache write tokens with a five-minute lifetime are 25% more expensive than base input tokens.
  • Cache write tokens with a one-hour lifetime are 100% more expensive than base input tokens.
  • Cache read tokens are 90% cheaper than base input tokens.
  • Regular input and output tokens are priced at standard rates.
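
As a rough illustration of how these multipliers combine, the following sketch computes effective per-token rates from a made-up base price. The base price is a hypothetical placeholder, not a published rate; see the Pricing page for actual numbers.

```python
# Rough cost illustration. BASE_PRICE is a hypothetical per-million-token
# input rate, not an actual published price.
BASE_PRICE = 3.00  # USD per 1M input tokens (placeholder)

cache_write_5m = BASE_PRICE * 1.25  # 25% surcharge for 5-minute cache writes
cache_write_1h = BASE_PRICE * 2.00  # 100% surcharge for 1-hour cache writes
cache_read     = BASE_PRICE * 0.10  # 90% discount for cache reads

print(f"5-minute cache write: ${cache_write_5m:.2f} per 1M tokens")
print(f"1-hour cache write:   ${cache_write_1h:.2f} per 1M tokens")
print(f"Cache read:           ${cache_read:.2f} per 1M tokens")
```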

For more information, see the Pricing page.
