Vertex AI partner models for MaaS

Vertex AI supports a curated list of models developed by Google partners. Partner models can be used with Vertex AI as a model as a service (MaaS) and are offered as a managed API. When you use a partner model, you continue to send your requests to Vertex AI endpoints. Partner models are serverless, so there's no need to provision or manage infrastructure.

Partner models can be discovered and deployed using Model Garden. For more information, see Explore AI models in Model Garden. While information about each available partner model can be found on its model card in Model Garden, only third-party models that are offered as a MaaS with Vertex AI are documented in this guide.

Anthropic's Claude and Mistral models are examples of third-party managed models that are available to use on Vertex AI.

Partner models

The following partner models are offered as managed APIs on Vertex AI Model Garden (MaaS):

Model name | Modality | Description | Quickstart
Claude Sonnet 4.6 | Language, Vision | Claude Sonnet 4.6 delivers frontier intelligence at scale, built for coding, agents, and enterprise workflows. | Model card
Claude Opus 4.6 | Language, Vision | The next generation of Anthropic's most intelligent model, Claude Opus 4.6 is an industry leader across coding, agents, computer use, and enterprise workflows. | Model card
Claude Opus 4.5 | Language, Vision | The next generation of Anthropic's most intelligent model, Claude Opus 4.5 is an industry leader across coding, agents, computer use, and enterprise workflows. | Model card
Claude Sonnet 4.5 | Language, Vision | Anthropic's mid-sized model for powering real-world agents, with capabilities in coding, computer use, cybersecurity, and working with office files like spreadsheets. | Model card
Claude Opus 4.1 | Language, Vision | An industry leader for coding. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Ideal for powering frontier agent products and features. | Model card
Claude Haiku 4.5 | Language, Vision | Claude Haiku 4.5 delivers near-frontier performance for a wide range of use cases, and stands out as one of the best coding models in the world, with the right speed and cost to power free products and high-volume user experiences. | Model card
Claude Opus 4 | Language, Vision | Claude Opus 4 delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. | Model card
Claude Sonnet 4 | Language, Vision | Anthropic's mid-size model with superior intelligence for high-volume uses, such as coding, in-depth research, and agents. | Model card
Anthropic's Claude 3.5 Sonnet v2 | Language, Vision | The upgraded Claude 3.5 Sonnet is a state-of-the-art model for real-world software engineering tasks and agentic capabilities. Claude 3.5 Sonnet delivers these advancements at the same price and speed as its predecessor. | Model card
Anthropic's Claude 3 Haiku | Language | Anthropic's fastest vision and text model for near-instant responses to basic queries, meant for seamless AI experiences that mimic human interactions. | Model card
Anthropic's Claude 3.5 Sonnet | Language | Claude 3.5 Sonnet outperforms Anthropic's Claude 3 Opus on a wide range of Anthropic's evaluations, with the speed and cost of Anthropic's mid-tier model, Claude 3 Sonnet. | Model card
Jamba 1.5 Large (Preview) | Language | AI21 Labs's Jamba 1.5 Large is designed for superior quality responses, high throughput, and competitive pricing compared to other models in its size class. | Model card
Jamba 1.5 Mini (Preview) | Language | AI21 Labs's Jamba 1.5 Mini is well balanced across quality, throughput, and low cost. | Model card
Mistral Medium 3 | Language | Mistral Medium 3 is a versatile model designed for a wide range of tasks, including programming, mathematical reasoning, understanding long documents, summarization, and dialogue. | Model card
Mistral OCR (25.05) | Language, Vision | Mistral OCR (25.05) is an Optical Character Recognition API for document understanding. The model comprehends each element of a document, such as media, text, tables, and equations. | Model card
Mistral Small 3.1 (25.03) | Language | Mistral Small 3.1 (25.03) is the latest version of Mistral's Small model, featuring multimodal capabilities and extended context length. | Model card
Codestral 2 | Language, Code | Codestral 2 is Mistral's specialized code generation model, built for high-precision fill-in-the-middle (FIM) completion. It helps developers write and interact with code through a shared instruction and completion API endpoint. | Model card

Vertex AI partner model pricing with capacity assurance

Google offers provisioned throughput for some partner models, which reserves throughput capacity for your models for a fixed fee. You decide on the throughput capacity and the regions in which to reserve that capacity. Because provisioned throughput requests are prioritized over standard pay-as-you-go requests, provisioned throughput provides increased availability. When the system is overloaded, your requests can still be completed as long as the throughput remains under your reserved throughput capacity. For more information or to subscribe to the service, contact sales.

Regional and global endpoints

For regional endpoints, requests are served from your specified region. If you have data residency requirements, or if a model doesn't support the global endpoint, use the regional endpoints.

When you use the global endpoint, Google can process and serve your requests from any region that is supported by the model that you are using, which might result in higher latency in some cases. The global endpoint helps improve overall availability and helps reduce errors.

There is a price difference between regional endpoints and global endpoints. Global endpoint quotas and supported model capabilities can differ from those of the regional endpoints. For more information, view the related third-party model page.

Specify the global endpoint

To use the global endpoint, set the region to global.

For example, the request URL for a curl command uses the following format:

https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/PUBLISHER_NAME/models/MODEL_NAME

For the Vertex AI SDK, a regional endpoint is the default. Set the region to GLOBAL to use the global endpoint.
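The URL format above can be sketched in Python. This is a minimal illustration of how the regional and global hosts differ; the project ID and model name below are hypothetical placeholders, and you should confirm the exact model identifier on its model card.

```python
def vertex_model_url(project_id: str, location: str, publisher: str, model: str) -> str:
    """Build a Vertex AI publisher-model URL for a regional or global endpoint."""
    # Regional endpoints use a region-prefixed host (for example,
    # us-east5-aiplatform.googleapis.com); the global endpoint uses the
    # bare aiplatform.googleapis.com host with the location set to "global".
    host = ("aiplatform.googleapis.com" if location == "global"
            else f"{location}-aiplatform.googleapis.com")
    return (f"https://{host}/v1/projects/{project_id}"
            f"/locations/{location}/publishers/{publisher}/models/{model}")

# Hypothetical example values:
print(vertex_model_url("my-project", "global", "anthropic", "claude-sonnet-4-5"))
print(vertex_model_url("my-project", "us-east5", "anthropic", "claude-sonnet-4-5"))
```

An authenticated request would then append the appropriate method verb to this URL and send it with a bearer token, as shown by the curl format earlier in this section.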

Supported models

The global endpoint is available for the following models:

Note: Prompt Caching is supported when using the global endpoint. Provisioned Throughput isn't supported when using the global endpoint.

Restrict global API endpoint usage

To help enforce the use of regional endpoints, use the constraints/gcp.restrictEndpointUsage organization policy constraint to block requests to the global API endpoint. For more information, see Restricting endpoint usage.
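As a sketch, a policy file like the following could be passed to gcloud org-policies set-policy to deny the global endpoint host. The organization ID is a placeholder, and the denied host value is an assumption based on the global endpoint format shown earlier; see Restricting endpoint usage for the authoritative policy format.

```yaml
# Illustrative policy file, applied with: gcloud org-policies set-policy policy.yaml
name: organizations/123456789/policies/gcp.restrictEndpointUsage
spec:
  rules:
    - values:
        # Deny the global API endpoint host so that only regional
        # endpoints (such as us-east5-aiplatform.googleapis.com) are usable.
        deniedValues:
          - aiplatform.googleapis.com
```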

Grant user access to partner models

For you to enable partner models and make a prompt request, a Google Cloud administrator must set the required permissions and verify that the organization policy allows the use of required APIs.

Set required permissions to use partner models

The following roles and permissions are required to use partner models:

  • You must have the Consumer Procurement Entitlement Manager Identity and Access Management (IAM) role. Anyone who's been granted this role can enable partner models in Model Garden.

  • You must have the aiplatform.endpoints.predict permission. This permission is included in the Vertex AI User IAM role. For more information, see Vertex AI User and Access control.

Console

  1. To grant the Consumer Procurement Entitlement Manager IAM role to a user, go to the IAM page.

    Go to IAM

  2. In the Principal column, find the user principal for which you want to enable access to partner models, and then click Edit principal in that row.

  3. In the Edit access pane, click Add another role.

  4. In Select a role, select Consumer Procurement Entitlement Manager.

  5. In the Edit access pane, click Add another role.

  6. In Select a role, select Vertex AI User.

  7. Click Save.

gcloud

  1. In the Google Cloud console, activate Cloud Shell.

    Activate Cloud Shell

  2. Grant the Consumer Procurement Entitlement Manager role that's required to enable partner models in Model Garden:

    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member=PRINCIPAL --role=roles/consumerprocurement.entitlementManager
  3. Grant the Vertex AI User role, which includes the aiplatform.endpoints.predict permission that's required to make prompt requests:

    gcloud projects add-iam-policy-binding PROJECT_ID \
      --member=PRINCIPAL --role=roles/aiplatform.user

    Replace PRINCIPAL with the identifier for the principal. The identifier takes the form user|group|serviceAccount:email or domain:domain. For example: user:cloudysanfrancisco@gmail.com, group:admins@example.com, serviceAccount:test123@example.domain.com, or domain:example.domain.com.

    The output is a list of policy bindings that includes the following:

    - members:
      - user:PRINCIPAL
      role: roles/consumerprocurement.entitlementManager

    For more information, see Grant a single role and gcloud projects add-iam-policy-binding.

Set the organization policy for partner model access

To enable partner models, your organization policy must allow the following API: Cloud Commerce Consumer Procurement API (cloudcommerceconsumerprocurement.googleapis.com).

If your organization sets an organization policy to restrict service usage, then an organization administrator must verify that cloudcommerceconsumerprocurement.googleapis.com is allowed by setting the organization policy.
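For illustration, a service-restriction policy that allows the procurement API might look like the following sketch. The organization ID is a placeholder, and any services your organization already allows must be preserved in the same list; check the organization policy documentation for the authoritative constraint name and format.

```yaml
# Illustrative policy file, applied with: gcloud org-policies set-policy policy.yaml
name: organizations/123456789/policies/gcp.restrictServiceUsage
spec:
  rules:
    - values:
        allowedValues:
          # Required for enabling partner (MaaS) models in Model Garden.
          - cloudcommerceconsumerprocurement.googleapis.com
```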

Also, if you have an organization policy that restricts model usage in Model Garden, the policy must allow access to partner models. For more information, see Control model access.

Partner model regulatory compliance

The certifications for Generative AI on Vertex AI continue to apply when partner models are used as a managed API using Vertex AI. If you need details about the models themselves, additional information can be found in the respective model card, or you can contact the respective model publisher.

Your data is stored at rest within the selected region or multi-region for partner models on Vertex AI, but the regionalization of data processing may vary. For a detailed list of partner models' data processing commitments, see Data residency for partner models.

Customer prompts and model responses are not shared with third parties when using the Vertex AI API, including partner models. Google only processes Customer Data as instructed by the Customer, which is further described in our Cloud Data Processing Addendum.

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.