Provisioned Throughput overview

This page explains what Provisioned Throughput is and when to use it.

Introduction to Provisioned Throughput

Provisioned Throughput is a fixed-cost, fixed-term subscription available in several term lengths that reserves throughput for supported generative AI models on Vertex AI. To reserve your throughput, you must specify the model and the available locations in which the model runs.

When to use Provisioned Throughput

If any of the following considerations apply to your use case, consider using Provisioned Throughput:

  • You are building real-time generative AI production applications, such as chatbots and agents.
  • Your critical workloads consistently require high throughput. Throughput measurement depends on the model.
  • You want to provide a consistent and predictable experience for users of your applications.
  • You want deterministic generative AI costs by paying a fixed monthly or weekly price with control of overages.

Provisioned Throughput is one of two ways to consume your generative AI models. The second way is pay-as-you-go, which is also referred to as on-demand.
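To make the cost trade-off between the two consumption options concrete, the following is a minimal sketch of a break-even comparison. It is illustrative only: the function names, the generic "units" of throughput, and all prices are hypothetical, and it assumes that usage beyond the reserved amount is billed at the pay-as-you-go rate when overages are allowed. Check the pricing documentation for the actual billing rules and rates for your model.

```python
def pay_as_you_go_cost(units_used: float, rate_per_unit: float) -> float:
    """Cost for a billing period when every unit is billed on demand."""
    return units_used * rate_per_unit


def provisioned_cost(
    units_used: float,
    reserved_units: float,
    fixed_price: float,
    overage_rate_per_unit: float,
    allow_overages: bool = True,
) -> float:
    """Cost for a billing period under a fixed-price reservation.

    Assumption (not stated on this page): usage above the reservation
    is billed per unit when overages are allowed, and is not billed
    (because it is not served) when overages are disabled.
    """
    overage_units = max(0.0, units_used - reserved_units)
    if not allow_overages:
        overage_units = 0.0
    return fixed_price + overage_units * overage_rate_per_unit


def break_even_units(fixed_price: float, rate_per_unit: float) -> float:
    """Usage level at which pay-as-you-go cost equals the fixed price."""
    return fixed_price / rate_per_unit


# Hypothetical numbers, chosen only for illustration.
FIXED_PRICE = 1000.0   # fixed price per billing period
RESERVED = 2500.0      # units covered by the reservation
ON_DEMAND_RATE = 0.5   # price per unit on demand

threshold = break_even_units(FIXED_PRICE, ON_DEMAND_RATE)
print(f"Break-even usage: {threshold} units")
```

With these hypothetical numbers, a workload that consistently uses more than the break-even amount while staying within the reservation pays less under the fixed price, while a workload that stays well below it pays less on demand.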

Last updated 2026-02-19 UTC.