Vertex AI partner models for MaaS
Vertex AI supports a curated list of models developed by Google partners. Partner models can be used with Vertex AI as a model as a service (MaaS) and are offered as a managed API. When you use a partner model, you continue to send your requests to Vertex AI endpoints. Partner models are serverless, so there's no need to provision or manage infrastructure.

Partner models can be discovered and deployed using Model Garden. For more information, see Explore AI models in Model Garden. While information about each available partner model can be found on its model card in Model Garden, only third-party models that are offered as a MaaS with Vertex AI are documented in this guide.

Anthropic's Claude and Mistral models are examples of third-party managed models that are available to use on Vertex AI.
Partner models
The following partner models are offered as managed APIs on Vertex AI Model Garden (MaaS):
| Model name | Modality | Description | Quickstart |
|---|---|---|---|
| Claude Sonnet 4.6 | Language, Vision | Claude Sonnet 4.6 delivers frontier intelligence at scale, built for coding, agents, and enterprise workflows. | Model card |
| Claude Opus 4.6 | Language, Vision | The next generation of Anthropic's most intelligent model, Claude Opus 4.6 is an industry leader across coding, agents, computer use, and enterprise workflows. | Model card |
| Claude Opus 4.5 | Language, Vision | The next generation of Anthropic's most intelligent model, Claude Opus 4.5 is an industry leader across coding, agents, computer use, and enterprise workflows. | Model card |
| Claude Sonnet 4.5 | Language, Vision | Anthropic's mid-sized model for powering real-world agents, with capabilities in coding, computer use, cybersecurity, and working with office files like spreadsheets. | Model card |
| Claude Opus 4.1 | Language, Vision | An industry leader for coding. It delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. Ideal for powering frontier agent products and features. | Model card |
| Claude Haiku 4.5 | Language, Vision | Claude Haiku 4.5 delivers near-frontier performance for a wide range of use cases, and stands out as one of the best coding models in the world, with the right speed and cost to power free products and high-volume user experiences. | Model card |
| Claude Opus 4 | Language, Vision | Claude Opus 4 delivers sustained performance on long-running tasks that require focused effort and thousands of steps, significantly expanding what AI agents can solve. | Model card |
| Claude Sonnet 4 | Language, Vision | Anthropic's mid-size model with superior intelligence for high-volume uses, such as coding, in-depth research, and agents. | Model card |
| Anthropic's Claude 3.5 Sonnet v2 | Language, Vision | The upgraded Claude 3.5 Sonnet is a state-of-the-art model for real-world software engineering tasks and agentic capabilities. Claude 3.5 Sonnet delivers these advancements at the same price and speed as its predecessor. | Model card |
| Anthropic's Claude 3 Haiku | Language | Anthropic's fastest vision and text model for near-instant responses to basic queries, meant for seamless AI experiences mimicking human interactions. | Model card |
| Anthropic's Claude 3.5 Sonnet | Language | Claude 3.5 Sonnet outperforms Anthropic's Claude 3 Opus on a wide range of Anthropic's evaluations with the speed and cost of Anthropic's mid-tier model, Claude 3 Sonnet. | Model card |
| Jamba 1.5 Large (Preview) | Language | AI21 Labs's Jamba 1.5 Large is designed for superior quality responses, high throughput, and competitive pricing compared to other models in its size class. | Model card |
| Jamba 1.5 Mini (Preview) | Language | AI21 Labs's Jamba 1.5 Mini is well balanced across quality, throughput, and low cost. | Model card |
| Mistral Medium 3 | Language | Mistral Medium 3 is a versatile model designed for a wide range of tasks, including programming, mathematical reasoning, understanding long documents, summarization, and dialogue. | Model card |
| Mistral OCR (25.05) | Language, Vision | Mistral OCR (25.05) is an Optical Character Recognition API for document understanding. The model comprehends each element of documents such as media, text, tables, and equations. | Model card |
| Mistral Small 3.1 (25.03) | Language | Mistral Small 3.1 (25.03) is the latest version of Mistral's Small model, featuring multimodal capabilities and extended context length. | Model card |
| Codestral 2 | Language, Code | Codestral 2 is Mistral's code generation specialized model built specifically for high-precision fill-in-the-middle (FIM) completion that helps developers write and interact with code through a shared instruction and completion API endpoint. | Model card |
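As an example of how a MaaS partner model is called, the following sketch builds a rawPredict request URL and an Anthropic Messages-style request body for a Claude model. This is a sketch only: the project ID, region, and model ID are placeholders, and you should check the model card for the exact model ID, supported regions, and request format.

```python
import json

# Placeholder values -- substitute your own project, region, and model ID.
PROJECT_ID = "my-project"
REGION = "us-east5"
MODEL = "claude-sonnet-4-5"  # publisher model ID; see the model card for the exact ID

# Anthropic models on Vertex AI are called through the publisher rawPredict
# (or streamRawPredict) method on a Vertex AI endpoint.
url = (
    f"https://{REGION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{REGION}/publishers/anthropic/models/{MODEL}:rawPredict"
)

# The request body follows Anthropic's Messages API, plus a Vertex-specific
# anthropic_version field.
body = {
    "anthropic_version": "vertex-2023-10-16",
    "max_tokens": 256,
    "messages": [{"role": "user", "content": "Summarize MaaS in one sentence."}],
}

payload = json.dumps(body)
print(url)
```

Sending `payload` to `url` with an authorized POST request (for example, with a bearer token from `gcloud auth print-access-token`) returns the model response; the request never leaves Vertex AI endpoints.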
Vertex AI partner model pricing with capacity assurance
Google offers provisioned throughput for some partner models, which reserves throughput capacity for your models for a fixed fee. You decide on the throughput capacity and in which regions to reserve that capacity. Because provisioned throughput requests are prioritized over standard pay-as-you-go requests, provisioned throughput provides increased availability. When the system is overloaded, your requests can still be completed as long as the throughput remains under your reserved throughput capacity. For more information or to subscribe to the service, contact sales.
Regional and global endpoints
For regional endpoints, requests are served from your specified region. In cases where you have data residency requirements, or if a model doesn't support the global endpoint, use the regional endpoints.

When you use the global endpoint, Google can process and serve your requests from any region that is supported by the model that you are using, which might result in higher latency in some cases. The global endpoint helps improve overall availability and helps reduce errors.

There is a price difference between regional endpoints and global endpoints. The global endpoint quotas and supported model capabilities can differ from the regional endpoints. For more information, view the related third-party model page.
Specify the global endpoint
To use the global endpoint, set the region to global.

For example, the request URL for a curl command uses the following format:

https://aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/global/publishers/PUBLISHER_NAME/models/MODEL_NAME

For the Vertex AI SDK, a regional endpoint is the default. Set the region to GLOBAL to use the global endpoint.
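The difference between the two endpoint forms can be sketched as a small helper, assuming the URL patterns shown above; the project, publisher, and model names here are placeholders:

```python
def vertex_model_url(project: str, location: str, publisher: str, model: str) -> str:
    """Build a Vertex AI publisher-model URL for a regional or global endpoint."""
    # The global endpoint uses the bare aiplatform.googleapis.com host;
    # regional endpoints prefix the host with the region name.
    host = (
        "aiplatform.googleapis.com"
        if location == "global"
        else f"{location}-aiplatform.googleapis.com"
    )
    return (
        f"https://{host}/v1/projects/{project}/locations/{location}"
        f"/publishers/{publisher}/models/{model}"
    )

regional_url = vertex_model_url("my-project", "us-east5", "anthropic", "claude-opus-4-5")
global_url = vertex_model_url("my-project", "global", "anthropic", "claude-opus-4-5")
```

Note that both the host and the `locations/` path segment change together; using a regional host with `locations/global` (or the reverse) is not a valid combination.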
Supported models
The global endpoint is available for the following models:
- Claude Opus 4.6
- Claude Sonnet 4.6
- Claude Opus 4.5
- Claude Opus 4.1
- Claude Opus 4
- Claude Sonnet 4.5
- Claude Sonnet 4
- Claude 3.7 Sonnet
- Claude 3.5 Sonnet v2
- Claude Haiku 4.5
Restrict global API endpoint usage
To help enforce the use of regional endpoints, use the constraints/gcp.restrictEndpointUsage organization policy constraint to block requests to the global API endpoint. For more information, see Restricting endpoint usage.
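As a sketch, an organization administrator might apply the constraint with a policy file like the following; ORG_ID is a placeholder, and the exact denied endpoint value should be verified against the Restricting endpoint usage documentation before use:

```shell
# Sketch only: ORG_ID and the denied endpoint value are assumptions.
cat > restrict-global-endpoint.yaml <<EOF
name: organizations/ORG_ID/policies/gcp.restrictEndpointUsage
spec:
  rules:
    - values:
        deniedValues:
          - aiplatform.googleapis.com
EOF

gcloud org-policies set-policy restrict-global-endpoint.yaml
```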
Grant user access to partner models
To enable partner models and make a prompt request, a Google Cloud administrator must set the required permissions and verify that the organization policy allows the use of the required APIs.
Set required permissions to use partner models
The following roles and permissions are required to use partner models:
- You must have the Consumer Procurement Entitlement Manager Identity and Access Management (IAM) role. Anyone who's been granted this role can enable partner models in Model Garden.
- You must have the aiplatform.endpoints.predict permission. This permission is included in the Vertex AI User IAM role. For more information, see Vertex AI User and Access control.
Console
1. To grant the Consumer Procurement Entitlement Manager and Vertex AI User IAM roles to a user, go to the IAM page.
2. In the Principal column, find the user principal for which you want to enable access to partner models, and then click Edit principal in that row.
3. In the Edit access pane, click Add another role.
4. In Select a role, select Consumer Procurement Entitlement Manager.
5. In the Edit access pane, click Add another role.
6. In Select a role, select Vertex AI User.
7. Click Save.
gcloud
1. In the Google Cloud console, activate Cloud Shell.

2. Grant the Consumer Procurement Entitlement Manager role that's required to enable partner models in Model Garden:

   ```shell
   gcloud projects add-iam-policy-binding PROJECT_ID \
       --member=PRINCIPAL --role=roles/consumerprocurement.entitlementManager
   ```

3. Grant the Vertex AI User role, which includes the aiplatform.endpoints.predict permission that's required to make prompt requests:

   ```shell
   gcloud projects add-iam-policy-binding PROJECT_ID \
       --member=PRINCIPAL --role=roles/aiplatform.user
   ```

   Replace PRINCIPAL with the identifier for the principal. The identifier takes the form user|group|serviceAccount:email or domain:domain. For example: user:cloudysanfrancisco@gmail.com, group:admins@example.com, serviceAccount:test123@example.domain.com, or domain:example.domain.com.

   The output is a list of policy bindings that includes the following:

   ```
   - members:
     - user:PRINCIPAL
     role: roles/consumerprocurement.entitlementManager
   ```

For more information, see Grant a single role and gcloud projects add-iam-policy-binding.
Set the organization policy for partner model access
To enable partner models, your organization policy must allow the following API: Cloud Commerce Consumer Procurement API (cloudcommerceconsumerprocurement.googleapis.com).

If your organization sets an organization policy to restrict service usage, then an organization administrator must verify that cloudcommerceconsumerprocurement.googleapis.com is allowed by setting the organization policy.

Also, if you have an organization policy that restricts model usage in Model Garden, the policy must allow access to partner models. For more information, see Control model access.
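To check whether the Cloud Commerce Consumer Procurement API is enabled in a project, and to enable it where your organization policy allows, the following gcloud commands may help; PROJECT_ID is a placeholder:

```shell
# List the API if it is already enabled in the project.
gcloud services list --enabled --project=PROJECT_ID \
    --filter="config.name=cloudcommerceconsumerprocurement.googleapis.com"

# Enable the API. This succeeds only if the organization policy allows it.
gcloud services enable cloudcommerceconsumerprocurement.googleapis.com \
    --project=PROJECT_ID
```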
Partner model regulatory compliance
The certifications for Generative AI on Vertex AI continue to apply when partner models are used as a managed API using Vertex AI. If you need details about the models themselves, additional information can be found in the respective model card, or you can contact the respective model publisher.

Your data is stored at rest within the selected region or multi-region for partner models on Vertex AI, but the regionalization of data processing may vary. For a detailed list of partner models' data processing commitments, see Data residency for partner models.

Customer prompts and model responses are not shared with third parties when using the Vertex AI API, including partner models. Google only processes Customer Data as instructed by the Customer, which is further described in our Cloud Data Processing Addendum.
Last updated 2026-02-19 UTC.