Purchase Provisioned Throughput

This page provides details to consider before subscribing to Provisioned Throughput, the permissions you must have to place or to view a Provisioned Throughput order, and the instructions for placing and viewing your orders for standard Provisioned Throughput.

If you want to purchase Single Zone Provisioned Throughput, contact your Google Cloud account representative for assistance. For more information about Single Zone Provisioned Throughput, see Single Zone Provisioned Throughput.

What to consider before purchasing

To help you decide whether you want to purchase Provisioned Throughput, consider the following:

  • You can't cancel your order in the middle of your term.

    Your Provisioned Throughput purchase is a commitment, which means that you can't cancel the order in the middle of your term. However, you can increase the number of purchased generative AI scale units (GSUs). If you accidentally purchase a commitment or there's a problem with your configuration, contact your Google Cloud account representative for assistance.

  • You can auto-renew your subscription.

    When you submit your order, you can choose to auto-renew your subscription at the end of its term, or let the subscription expire. You can also modify the auto-renewal behavior of your order under specific circumstances. To learn about scenarios where you can't modify an order, see When you can't change an order.

    You can configure monthly subscriptions to renew automatically each month. Weekly terms don't support automatic renewal.

    For more information, see Change Provisioned Throughput order. You can also contact your Google Cloud account representative for assistance.

  • You can change your auto-renewal behavior, model, model version, or region with notice.

    After you've chosen your project, region, model, model version, and auto-renewal behavior and your order is approved and activated, Provisioned Throughput is enabled, subject to available capacity. You can change your auto-renewal behavior, model, model version, or region by modifying your existing Provisioned Throughput order using the Google Cloud console.

    All changes are processed on a best-effort basis and are typically fulfilled within 10 business days of the initial request.

    Model changes are limited to a specific publisher. For example, you can switch the model assignment of Provisioned Throughput from Google Gemini 2.0 Pro to Google Gemini 2.0 Flash, but you can't switch from Google Gemini 2.0 Flash to Anthropic's Claude 3.5 Sonnet v2.

  • By default, the overage is billed as pay-as-you-go.

    If your throughput exceeds your Provisioned Throughput order amount, overages are processed and billed as standard pay-as-you-go. You can control overages on a per-request basis. For more information, see Use Provisioned Throughput.
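As a rough illustration of the default overage behavior, the following sketch splits measured throughput into the part covered by an order and the part that spills over to pay-as-you-go. The function name and the generic "throughput units" are hypothetical; actual metering and billing are performed by the service, not by client code.

```python
from typing import Tuple


def split_usage(used: float, provisioned: float) -> Tuple[float, float]:
    """Split throughput for one measurement window into (covered, overage).

    Both arguments use the same hypothetical throughput unit.
    """
    if used < 0 or provisioned < 0:
        raise ValueError("throughput values must be non-negative")
    covered = min(used, provisioned)        # served by the order
    overage = max(0.0, used - provisioned)  # billed as pay-as-you-go by default
    return covered, overage


if __name__ == "__main__":
    print(split_usage(120.0, 100.0))
    print(split_usage(80.0, 100.0))
```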

For information about pricing, see Provisioned Throughput.

Purchase Provisioned Throughput for preview models

Preview

This product or feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA products and features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

You can purchase Provisioned Throughput for Google models in preview, provided that a generally available version of the model hasn't been released.

If you have an active Provisioned Throughput order for a preview model and a generally available version of the model is released, then you can do either of the following:

  • Move the order to the generally available version of the model. Note that after you move your order to the generally available model, you can't switch your order back to the preview model. For more information about changing an order, see Change Provisioned Throughput order.

  • Alternatively, continue using Provisioned Throughput for the preview version of a model as long as the preview version is stable. For more information about stable and retired models, see Model versions and lifecycle.

Roles and permissions

The following role grants full access to manage Vertex AI Provisioned Throughput:

  • roles/aiplatform.provisionedThroughputAdmin: You can access Vertex AI Provisioned Throughput resources.

This role includes the following permissions:

Permissions | Description
aiplatform.provisionedThroughputs.create | Submit a new Provisioned Throughput order.
aiplatform.provisionedThroughputs.get | View a specific Provisioned Throughput order.
aiplatform.provisionedThroughputs.list | View all Provisioned Throughput orders.
aiplatform.provisionedThroughputs.update | Modify a Provisioned Throughput order.
aiplatform.provisionedThroughputs.cancel | Cancel a pending order or pending update.
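The permission table can be treated as a lookup from an intended action to the permission it needs. The sketch below is a local pre-check only: the short action names ("create", "get", and so on) and the function are hypothetical, and the set of granted permissions is assumed to come from wherever you track IAM grants.

```python
from typing import Iterable, List, Set

# Permission strings are taken from the table above; the keys are
# hypothetical short names used only in this sketch.
REQUIRED_PERMISSION = {
    "create": "aiplatform.provisionedThroughputs.create",
    "get": "aiplatform.provisionedThroughputs.get",
    "list": "aiplatform.provisionedThroughputs.list",
    "update": "aiplatform.provisionedThroughputs.update",
    "cancel": "aiplatform.provisionedThroughputs.cancel",
}


def missing_permissions(actions: Iterable[str], granted: Set[str]) -> List[str]:
    """Return the permissions needed for `actions` that are not in `granted`."""
    needed = {REQUIRED_PERMISSION[action] for action in actions}
    return sorted(needed - granted)


if __name__ == "__main__":
    viewer = {
        "aiplatform.provisionedThroughputs.get",
        "aiplatform.provisionedThroughputs.list",
    }
    print(missing_permissions(["list", "create"], viewer))
```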

Place a standard Provisioned Throughput order

If you expect your queries per minute (QPM) to exceed 30,000, then to maximize your Provisioned Throughput order, request a quota adjustment for your default Vertex AI system quota using the following information:

  • Service: The Vertex AI API.
  • Name: Online prediction requests per minute per region
  • Service type: A quota.
  • Dimensions: The region where you ordered Provisioned Throughput.
  • Value: This is your chosen online-prediction traffic limit.
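The following sketch gathers the quota adjustment details listed above into one record when the expected QPM is over 30,000. It only prepares the values for the request form; the function name and dictionary keys are hypothetical, and using the expected QPM as the requested value is an assumption.

```python
from typing import Optional

QPM_THRESHOLD = 30_000  # from the guidance above


def quota_adjustment_details(expected_qpm: int, region: str) -> Optional[dict]:
    """Return the details to enter in a quota adjustment request, or None
    if the expected QPM doesn't exceed the threshold."""
    if expected_qpm <= QPM_THRESHOLD:
        return None
    return {
        "service": "Vertex AI API",
        "name": "Online prediction requests per minute per region",
        "service_type": "Quota",
        "dimensions": {"region": region},  # region of the order
        "value": expected_qpm,             # assumed: requested traffic limit
    }


if __name__ == "__main__":
    print(quota_adjustment_details(45_000, "us-central1"))
```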

Provisioned Throughput orders are processed based on the size of the order and the available capacity. Depending on the number of GSUs requested and the available capacity, it might take from a few minutes to a few weeks to process your order. While placing a Provisioned Throughput order, you can use the Generative AI scale unit estimator tool to calculate the number of GSUs that you need to purchase. After reviewing the estimate, you can either proceed with it, or modify the number of GSUs to purchase.
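Each model has a GSU minimum and a purchase increment (see Supported models), so an estimate might need rounding before you order. The sketch below assumes that purchasable quantities are the minimum plus a whole number of increments; that rule, the function name, and the sample values are illustrative assumptions, not the estimator tool's actual logic.

```python
import math


def round_up_to_purchasable(estimated_gsus: float, minimum: int, increment: int) -> int:
    """Round an estimate up to the nearest quantity of the form
    minimum + k * increment (an assumed purchase rule)."""
    if minimum <= 0 or increment <= 0:
        raise ValueError("minimum and increment must be positive")
    if estimated_gsus <= minimum:
        return minimum
    steps = math.ceil((estimated_gsus - minimum) / increment)
    return minimum + steps * increment


if __name__ == "__main__":
    # Hypothetical minimum and increment values.
    print(round_up_to_purchasable(2.5, minimum=1, increment=1))
    print(round_up_to_purchasable(7, minimum=5, increment=5))
```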

Follow these steps to purchase standard Provisioned Throughput. For assistance with purchasing Single Zone Provisioned Throughput, contact your Google Cloud account representative.

Console

  1. In the Google Cloud console, go to the Provisioned Throughput page.

  2. To start a new order, click New order.
  3. Enter an Order name.
  4. Select the Model.
  5. Select the Region.
  6. Click Estimation tool.
  7. In the Generative AI scale unit estimation tool pane, perform the following steps to estimate the number of GSUs that you need.

    1. Select your Model.
    2. Based on the selected model, enter the details to estimate the number of GSUs needed. For information about the GSU minimum and purchase increments for each model, see Supported models. For information about a model's capabilities and input or output limits, see the documentation for the model.

      • For the Gemini 3 Pro, Gemini 2.5 Pro, Gemini 2.5 Flash, and Gemini 2.5 Flash-Lite models, enter the following:

        • Estimated queries per second requiring assurance
        • Input tokens per query
        • Input image tokens per query
        • Input video tokens per query
        • Input audio tokens per query
        • Average cache hit %
        • Output response text tokens per query
        • Output reasoning text tokens per query
      • For the Gemini 3 Pro Image model, enter the following:

        • Estimated queries per second requiring assurance
        • Input tokens per query
        • Input image tokens per query
        • Output response text tokens per query
        • Output reasoning text tokens per query
        • Output image tokens per query
      • For the Gemini 2.5 Flash Image model, enter the following:

        • Estimated queries per second requiring assurance
        • Input tokens per query
        • Input image tokens per query
        • Output response text tokens per query
        • Output image tokens per query
      • For the Gemini 2.5 Flash with Gemini Live API (native audio) model, enter the following:

        • Estimated queries per second requiring assurance
        • Input tokens per query
        • Input image tokens per query
        • Input video tokens per query
        • Input audio tokens per query
        • Session memory (cached) tokens per query
        • Output response text tokens per query
        • Output audio tokens per query
      • For the Gemini 2.5 Flash with Gemini Live API model, enter the following:

        • Estimated queries per second requiring assurance
        • Input tokens per query
        • Input image tokens per query
        • Input video tokens per query
        • Input audio tokens per query
        • Session memory (cached) tokens per query
        • Output response text tokens per query
        • Output audio tokens per query
      Note: For information about access to this release, see the access request page. For more information about using Provisioned Throughput for Gemini 2.5 Flash with Gemini Live API, see Provisioned Throughput for Gemini Live API.
      • For the Gemini 2.0 Flash and Gemini 2.0 Flash-Lite models, enter the following:

        • Estimated queries per second requiring assurance
        • Input tokens per query
        • Input image tokens per query
        • Input video tokens per query
        • Input audio tokens per query
        • Output text tokens per query
      • For the Veo 3 and Veo 3 Fast models, enter the following:

        • Frequency: Specify how often outputs are generated, in seconds. This isn't the latency.
        • Output video seconds per query: Enter the total requested video seconds. For example, 12 seconds represents the sum of 3x4 or 2x6 video seconds.
        • Output video+audio seconds per query: Enter the total requested video and audio seconds. For example, 12 seconds represents the sum of 3x4 or 2x6 video and audio seconds.
      • For Imagen models, enter the following:

        • Queries per second
        • Output images per query
      • For open models, enter the following:

        • Estimated queries per second requiring assurance
        • Input tokens per query
        • Output response text tokens per query
    3. In the Estimated GSUs and monthly prices section, review the estimated number of GSUs that you need and the prices.

  8. Click Use calculated.

  9. Optional: Modify the Number of generative AI scale units (GSUs) per month.

  10. Select your Term. Note that term fees are not cancelable for the duration of the term and will apply regardless of actual usage or if the model is discontinued. Google recommends changing your assigned model prior to its discontinuation date. Google won't proactively cancel auto-renewal for discontinued models.

    The following options are available:

    • 1 week (available only for Google models)
    • 1 month
    • 3 months
    • 1 year
  11. Optional: Select the Start date and time for your term (Preview).

    You can provide a start date and time up to two weeks after the date that you place the order. If you don't specify a start date and time, then the order is processed as soon as the capacity is available. Requested start dates and times are processed on a best-effort basis, and orders aren't guaranteed to be fulfilled by these dates until the order status is set to Approved.

    If your requested start date is too close to the current date, your order might be approved and activated after your requested start date. In this case, the end date is adjusted, based on the duration of the selected term, starting from the activation date. For information about canceling a pending order, see Change Provisioned Throughput order.

    Note: You can schedule a future start date and time only for Google models. Scheduling a start date and time isn't available for open models.
  12. In the Renewal list, specify whether you want to automatically renew the order at the end of the term. You can specify the renewal option only if you select 1 month, 3 months, or 1 year as the term.

  13. Click Continue.

  14. In the Confirm and submit section, review the price and throughput estimates for your order. Read the terms listed and linked in the form.

  15. To finalize and submit your order, enter CONFIRM in the Purchase confirmation field, then click Submit order.

    It can take from a few minutes to a few weeks to process an order, depending on the order size and the available capacity. After the order is processed, its status in the Google Cloud console changes to Active. You're billed for the order only after it becomes active.
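The term, start date, and renewal rules in the steps above can be checked before you open the order form. The following sketch encodes only the rules stated on this page; the OrderDraft class, its field names, and the message strings are hypothetical, and rejecting start times in the past is an added assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Optional

TERMS = ("1 week", "1 month", "3 months", "1 year")
RENEWABLE_TERMS = ("1 month", "3 months", "1 year")
MAX_START_OFFSET = timedelta(weeks=2)


@dataclass
class OrderDraft:
    term: str
    is_google_model: bool
    auto_renew: bool = False
    start_time: Optional[datetime] = None  # None: start when capacity allows


def validate(draft: OrderDraft, now: datetime) -> List[str]:
    """Return a list of rule violations; an empty list means none found."""
    problems = []
    if draft.term not in TERMS:
        problems.append(f"unknown term: {draft.term}")
    if draft.term == "1 week" and not draft.is_google_model:
        problems.append("1 week terms are available only for Google models")
    if draft.auto_renew and draft.term not in RENEWABLE_TERMS:
        problems.append("auto-renewal requires a 1 month, 3 months, or 1 year term")
    if draft.start_time is not None:
        if not draft.is_google_model:
            problems.append("a scheduled start is available only for Google models")
        if draft.start_time < now:
            problems.append("start time is in the past")  # assumption
        elif draft.start_time - now > MAX_START_OFFSET:
            problems.append("start time must be within two weeks of placing the order")
    return problems


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    draft = OrderDraft(term="1 week", is_google_model=True, auto_renew=True)
    print(validate(draft, now))
```

A check like this doesn't guarantee approval: orders are still subject to available capacity, and requested start times are handled on a best-effort basis.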

Change a standard Provisioned Throughput order

Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

This table describes how you can modify your Provisioned Throughput orders through the Google Cloud console based on the status of your order and any existing conditions. Modifying your orders is a Preview feature and is only available for online orders placed through the console. For changes to offline orders, contact your Google Cloud account representative for assistance.

Also, changes that you make to your model or model version in the Google Cloud console modify the existing order and keep the same subscription end date.

To change a Provisioned Throughput order for an open model, contact your Google Cloud account representative for assistance. You can't change Provisioned Throughput orders for Google models to open models.
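The model-change restrictions on this page (changes stay within one publisher, and open-model orders aren't changed through the console) can be summarized in a small check. This is a sketch only: the Model class and function are hypothetical, and a False result for open models means "contact your account representative", not necessarily that the change is impossible.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    publisher: str
    is_open_model: bool = False


def console_model_change_allowed(current: Model, target: Model) -> bool:
    """Return True if switching an order from `current` to `target` fits
    the console rules described on this page."""
    if current.is_open_model or target.is_open_model:
        # Open-model orders are changed through an account representative,
        # and Google-model orders can't be changed to open models.
        return False
    return current.publisher == target.publisher


if __name__ == "__main__":
    pro = Model("Gemini 2.0 Pro", "Google")
    flash = Model("Gemini 2.0 Flash", "Google")
    sonnet = Model("Claude 3.5 Sonnet v2", "Anthropic")
    print(console_model_change_allowed(pro, flash))
    print(console_model_change_allowed(flash, sonnet))
```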

Order status: Pending review
  Action: You can cancel your order.
  Note: If you have additional changes to your order, then cancel the pending order, and place a new order. If you have multiple models, each model can have only one pending order revision or pending order at a time.
  Steps in Google Cloud console: To cancel your pending order in the Google Cloud console, do the following:
    1. Go to the Provisioned Throughput page.
    2. Select the Region where your pending order is located.
    3. To go to the Order details page, click the Order ID for the order that you want to cancel.
    4. Click Cancel.
    5. In the Are you sure you want to cancel the order? dialog, click Cancel Order.

Order status: Approved
  Action: You can't modify your order.
  Note: The order is awaiting activation. You can't make changes to your order at this time.
  Steps in Google Cloud console: Not applicable

Order status: Active
  Action: You can make the following changes only if the order doesn't expire in the next five days or renews automatically:
    • Increase GSUs on existing orders. An increase in GSUs is applied immediately upon approval, regardless of the auto-renewal schedule.
    • Decrease GSUs on existing orders. A decrease in GSUs is applied during auto-renewal for the next term.
    • Enable or disable automatic renewals.
    • Change the model or model version.
    • Change the region.
  Note: You can't change an active order if it expires in less than five days and isn't set up to renew automatically.
  Steps in Google Cloud console: To change your active order in the Google Cloud console, use one of the following methods:
    • In the Provisioned Throughput page, click the symbol from the Actions column, and click Edit.
    • In the Order details page, click the Edit button.
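Increases and decreases in GSUs on an active order take effect at different times. The sketch below restates that rule as a function; the function name and return strings are hypothetical.

```python
def gsu_change_effect(current_gsus: int, requested_gsus: int) -> str:
    """Describe when a GSU change on an active order takes effect."""
    if current_gsus <= 0 or requested_gsus <= 0:
        raise ValueError("GSU counts must be positive")
    if requested_gsus > current_gsus:
        # Increases don't wait for the auto-renewal schedule.
        return "applied immediately upon approval"
    if requested_gsus < current_gsus:
        # Decreases wait for the next term.
        return "applied during auto-renewal for the next term"
    return "no change"


if __name__ == "__main__":
    print(gsu_change_effect(10, 15))
    print(gsu_change_effect(10, 5))
```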

When you can't change an order

You can't change or cancel a Provisioned Throughput order from the Google Cloud console if any of the following conditions apply:

  • Model limitations: The order is for a model that doesn't support modifications.

  • Order status: The order status isn't Active.

  • Expiring in less than five days: The order status is Active, but it expires in less than five days and isn't configured for auto-renewal.

  • Existing order change request: There's an existing order change request that's either pending or approved.
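The four conditions above can be combined into a single check that reports why a change would be blocked. The OrderState class and its field names are hypothetical, and the sketch covers changes to an order only, not the cancellation of a pending order.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OrderState:
    status: str                          # for example "Active" or "Approved"
    days_until_expiry: int
    auto_renew: bool
    model_supports_changes: bool = True
    has_open_change_request: bool = False  # pending or approved request


def blocking_reasons(order: OrderState) -> List[str]:
    """Return the conditions that block a console change; empty if none."""
    reasons = []
    if not order.model_supports_changes:
        reasons.append("model doesn't support modifications")
    if order.status != "Active":
        reasons.append("order status isn't Active")
    elif order.days_until_expiry < 5 and not order.auto_renew:
        reasons.append("expires in less than five days without auto-renewal")
    if order.has_open_change_request:
        reasons.append("existing change request is pending or approved")
    return reasons


if __name__ == "__main__":
    print(blocking_reasons(OrderState("Active", 3, auto_renew=False)))
    print(blocking_reasons(OrderState("Active", 3, auto_renew=True)))
```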

Check order status

After you submit your Provisioned Throughput order, the order status might appear as one of the following:

  • Pending review: You placed your order. Because approval depends on available capacity to provision your order, your order is waiting for review and approval. For more information about the status of your pending order, contact your Google Cloud account representative.
  • Approved: Google has approved your order and the order is awaiting activation. You can't make changes after the order is approved.
  • Active: Google has activated your order, and billing has started.
  • Expired: Your order has expired.
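For scripts that track orders, the four statuses can be modeled as an enumeration. The sketch below is illustrative: the class and function names are hypothetical, and the linear order of the lifecycle (ignoring cancellations and renewals) is an assumption drawn from the descriptions above.

```python
from enum import Enum
from typing import Optional


class OrderStatus(Enum):
    PENDING_REVIEW = "Pending review"
    APPROVED = "Approved"
    ACTIVE = "Active"
    EXPIRED = "Expired"


# Assumed linear lifecycle; cancellations and renewals are not modeled.
_LIFECYCLE = [
    OrderStatus.PENDING_REVIEW,
    OrderStatus.APPROVED,
    OrderStatus.ACTIVE,
    OrderStatus.EXPIRED,
]


def next_status(status: OrderStatus) -> Optional[OrderStatus]:
    """Return the status that follows `status`, or None after Expired."""
    index = _LIFECYCLE.index(status)
    return _LIFECYCLE[index + 1] if index + 1 < len(_LIFECYCLE) else None


def billing_applies(status: OrderStatus) -> bool:
    """Billing starts only when an order becomes active."""
    return status is OrderStatus.ACTIVE


if __name__ == "__main__":
    for status in OrderStatus:
        print(status.value, billing_applies(status), next_status(status))
```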

View standard Provisioned Throughput orders

Follow these steps to view your Provisioned Throughput orders:

Console

  1. In the Google Cloud console, go to the Provisioned Throughput page.

  2. Select the Region. Your list of orders appears.



Last updated 2026-02-19 UTC.