Vertex AI release notes

This page documents production updates to Vertex AI.Check this page for announcements about new or updated features, bug fixes,known issues, and deprecated functionality.

You can see the latest product updates for all of Google Cloud on the Google Cloud page, browse and filter all release notes in theGoogle Cloud console, or programmatically access release notes inBigQuery.

To get the latest product updates delivered to you, add the URL of this page to yourfeed reader, or add thefeed URL directly.

December 17, 2025

Feature

Cloud API Registry is availablein the Google Cloud console inPreview. UseCloud API Registry in the Google Cloud console to view and manage the MCPservers and tools your agent has access to.

Feature

Gemini 3 Flash

Gemini 3 Flash is now available in public preview. This model is designed totackle the most challenging agentic problems with strong coding andstate-of-the-art reasoning capabilities, and is our best model for complexmultimodal understanding.

For more information, seeGemini 3Flash.

December 16, 2025

Change

Vertex AI Agent Engine

Vertex AI Agent Engine is now available in the following regions:

europe-west6 (Zurich)
europe-west8 (Milan)
asia-east2 (Hong Kong)
asia-northeast3 (Seoul)
asia-southeast2 (Jakarta)
northamerica-northeast2 (Toronto)
southamerica-east1 (São Paulo)

For more information, seeVertex AI Agent Builder locations.

Announcement

Vertex AI Agent Engine

Vertex AI Agent EngineSessionsandMemory Bank are nowGenerally Available.

Change

Updated pricing forVertex AI Agent Engine:

Pricing for Vertex AI Agent Engine Runtime was lowered.
OnJanuary 28, 2026, Sessions, Memory Bank, and Code Execution will begincharging for usage.

For more information, seePricing.

December 12, 2025

Feature

Gemini 2.5 Flash with Gemini Live API Native Audio

Gemini 2.5 Flash with Gemini Live API Native Audio (gemini-live-2.5-flash-native-audio) is Generally Available (GA).This model features cutting-edge native audio functionality forGemini Live API, including enhanced voice quality and adaptability, Proactive Audio, and Affective Dialog.

December 10, 2025

Feature

DeepSeek-V3.2 is available in Model Garden.DeepSeek-V3.2 is a state-of-the-art large language model fromDeepSeek.DeepSeek-V3.2 is available as a managed API in Model Garden. To learn more, seeDeepSeek-V3.2.

December 09, 2025

Feature

The following models are available through Model Garden:

December 08, 2025

Feature

Veo 3.1 video extension

Veo 3.1 supports video extension in Preview.

For more information, see the following:

December 02, 2025

Feature

The following models are available through Model Garden:

Feature

The Vertex AI Model Garden model co-hosting vLLM container is available to use withthis sample notebook. You can use this container to serve multiple replicas of a model and serve multiple models with dynamic loading and unloading. This allows you to maximize resource utilization and serving efficiency, and flexibly adjust the models to serve.

November 24, 2025

Feature

Anthropic's Claude Opus 4.5

Claude Opus 4.5is available in Model Garden.

November 17, 2025

Feature

Veo video generation

Veo 3.1 is Generally Available, and introduces the following models:

For more information, see the following:

Announcement

LearnLM in Gemini

The LearnLM model is no longer a separate offering or listing on AI Studio asLearnLM capabilities have been integrated into the latest Gemini models (starting with Gemini 2.5).

Built in collaboration with experts in education,LearnLM represents ourcapabilities fine-tuned for learning informed by rigorous research. Theseadvancements and improvements are available directly in Gemini, enhancingeducational experiences and applications.

Pre-existing learnlm-2.0-flash-experimental projects will not remain functionalpast December 3, 2025 unless an alternative model is manually selected—weencourage developers to switch to the latest Gemini models and optimize theirprompts by reviewing ourLearnLM Partner Prompt Guide.

November 13, 2025

Feature

Updated Prompt Caching for Anthropic Claude Models

Prompt caching for Anthropic Claude models now supports a one-hour Time To Live (TTL).

For more information, seePrompt caching.

Feature

Kimi K2 Thinking is available in Model Garden. This model isa thinking model that excels at complex problem-solving and deep reasoning.Kimi K2 Thinking is available as a managed API in Model Garden. To learn more, seeKimi K2 Thinking.

November 11, 2025

Deprecated

Anthropic's Claude 3.7 Sonnet

Anthropic's Claude 3.7 Sonnet is deprecated as of November 11, 2025 and will beshut down on May 11, 2026. For more information, seePartner model deprecations.

November 07, 2025

Feature

Vertex AI Agent Engine

The following features are now available inPreview:

Configure, manage, and viewobservabilityfeaturessuch as sessions, traces, logs, and events for your agent in the Google Cloud console.
Use theplaygroundto test and interact with your agent in the Google Cloud console.
Evaluate your agents using theGen AI evaluation service's GenAI Client in Vertex AI SDK.
Create and manage memoryrevisions for Memory Bank.
Use Identity Access Management (IAM) to create anagentidentity to manage access andauthentication when using agents on Vertex AI Agent Engine Runtime.

The following features are now available inGA:

Express mode support forVertex AI Agent Engine Runtime.
Use the new free tier with Vertex AI Agent Engine Runtime. For moreinformation, seePricing.

November 04, 2025

Feature

MiniMax M2 is available in Model Garden. This model isis built for end-to-end development workflows and has strong capabilitiesin planning and executing complex tool-calling tasks. The model isoptimized to provide a balance of performance, cost, and inference speed.MiniMax M2 is available as a managed API in Model Garden. To learn more, seeMiniMax M2.

October 23, 2025

Feature

The following models are available through Model Garden:

October 21, 2025

Security

On September 23, 2025, we discovered a technical issue inthe Vertex AI API that resulted in a limited amount of responsesbeing misrouted between recipients for certain third-party modelswhen using streaming requests. This issue is now resolved.Google models, e.g. Gemini, werenot impacted.

Some internal proxies did not properly handle HTTP requests thathave anExpect: 100-continue header, resulting ina desynchronization in a streaming response connection, wherea response intended for one request was instead delivered asthe response for a subsequent request.

For more information, seeSecurity bulletins.

October 16, 2025

Feature

vLLM TPU

vLLM TPU, ahighly-efficient serving framework for large language models (LLM) that'soptimized forCloud TPU hardware, is available throughModel Garden.

Feature

Mistral's Codestral 2

You can use Mistral'sCodestral 2in Model Garden.

October 15, 2025

Feature

Anthropic's Claude Haiku 4.5

You can use Anthropic'sClaude Haiku 4.5in Model Garden.

Feature

Veo video generation

Veo 2 supports adding and removing objects from videos in Preview.

For more information about Veo 2, seeVeo 2Preview

For more information about adding and removing objects, see the following:

Feature

Veo video generation

Veo 3.1 is available in Preview, and introduces the following models:

For more information, see the following:

October 14, 2025

Deprecated

Imagen 4 preview models

The following Imagen 4 preview models will be removed onNovember 30, 2025 :imagen-4.0-generate-preview-06-06,imagen-4.0-ultra-generate-preview-06-06, andimagen-4.0-fast-generate-preview-06-06. To avoid servicedisruption, migrate all workflows that use Imagen 4 preview models beforeNovember 30, 2025 , 2025, to the following Imagen 4 GenerallyAvailable models:imagen-4.0-generate-001,imagen-4.0-ultra-generate-001,imagen-4.0-fast-generate-001.

Deprecated

Imagen subject and style fine-tuning

Imagen subject model and style model tuning will be removed onDecember 31, 2025. We recommend that you useGemini 2.5 Flash Image, which supports most use cases that requirefine-tuning. For more information, see Edit images withGemini.

October 09, 2025

Change

Imagen

Imagen's virtual try-on model,virtual-try-on-preview-08-04was updated on September 30, 2025, to more accurately preserve the person'sbody shape and preserve the garment's identity.

October 07, 2025

Feature

The following Qwen models are available inModel Garden:

Qwen-Image
Qwen-Image-Edit
Qwen-Image-Edit-2509

Feature

Save and share prompts in Vertex AI Studio: You can now save and share prompts in Vertex AI Studio. Sharing prompts lets you collaborate with team members, ensure consistency, and build a library of effective prompts for various tasks. For more information, seeSave and share prompts.

Announcement

TheGemini 2.5 Computer Use model and tool (gemini-2.5-computer-use-preview-10-2025) is now available in Preview. The Computer Use model and tool lets you enable your applications to interact with and automate tasks in the browser. With the Computer Use model and tool, you can build agents that can:

Automate repetitive data entry or form filling on websites.
Navigate websites to gather information.
Assist users by performing sequences of actions in web applications.

October 06, 2025

Change

Updated pricing for Vertex AI Agent Engine: Starting onNovember 6, 2025, Vertex AI Agent Engine Runtime will start charging for runtime usage for the following regions:

asia-southeast1 (Singapore)
australia-southeast2 (Melbourne)
europe-west2 (London)
europe-west3 (Frankfurt)
europe-west4 (Netherlands)

For more details, seePricing for Vertex AI Agent Engine.

Feature

Access Transparency for Vertex AI Agent Engine: Access Transparency is now available for Vertex AI Agent Engine. For more information, see the overview forEnterprise security.

October 03, 2025

Feature

Prompt management

Vertex AI offers tooling to help manage prompts and prompt versions. In addition to the prompt management capabilities in Vertex AI Studio, prompts can be stored and versioned using the Vertex AI SDK.

For more information, see thePrompt management API reference.

October 02, 2025

Announcement

Gemini 2.5 Flash Image (gemini-2.5-flash-image) is now generally available. This GA release adds support for aspect ratio controls, image-only response modality, regional endpoints,support for batch predictions,image generation from multiple reference images, andimproved multi-turn image editing.

SeeGemini 2.5 Flash Image for more information.

Feature

Google Gen AI SDK in C# Preview

Preview: The Google Gen AI SDK is available in C#. Seegoogleapis/dotnet-genai.

This release includes support forGenerateContentAsync,GenerateContentStreamAsync,GenerateImagesAsync, and three Live APIs, which includesSendClientContentAsync,SendRealtimeInputAsync, andSendToolResponseAsync.

September 30, 2025

Feature

DeepSeek-V3.2-Exp is available through Model Garden.

September 25, 2025

Announcement

New preview models forGemini 2.5 Flash and2.5 Flash-Lite are now available. These models are available at the following versioned endpoints:

gemini-2.5-flash-preview-09-2025
gemini-2.5-flash-lite-preview-09-2025

September 24, 2025

Deprecated

Access to Gemini's 1.5 models has been discontinued. For more information, see ourModel versions page.

September 23, 2025

Announcement

Gemini 2.5 Flash with Live API Native Audio Preview

Gemini 2.5 Flash with Live API Native Audio (gemini-live-2.5-flash-preview-native-audio-09-2025) is available inPreview. A single, unified model processes audio input and generates audio output directly, eliminating separate text-to-speech/speech-to-text conversions. This results in-low latency, high-quality, and incredibly human-like conversations. New features and capabilities include:

Improved Barge-in: Interrupt Gemini more naturally and reliably, even in loud and noisy environments.
Robust Function Calling: We've improved the triggering rate, allowing Gemini to successfully execute the functions you define with greater precision.
Accurate Transcription: The accuracy of audio-to-text transcription has been significantly enhanced.
Seamless Multilingual Support: Speak to Gemini in multiple languages, and it will effortlessly switch between them without any pre-configuration. Language is no longer a barrier!
Enhanced Audio Quality: Experience a dramatically improved audio quality that truly feels like speaking with a person.
Proactive Audio: Define Gemini's expertise and set conditions for when it should respond. Gemini can act as a "silent listener," only chiming in when the conversation touches upon its designated area of expertise.
Affective Dialog: Gemini can adapt and adjust its generated voice to match the emotional tone of the speaker, creating more empathetic and natural interactions.

Watch our comprehensive demo to see these features in action, including seamless language switching, expert mode, emotionally aware responses, memory recall, and interactive screen sharing for engineering tasks – all demonstrated directly within Vertex AI Studio without writing a single line of code!

September 22, 2025

Feature

DeepSeek-V3.1-Terminus is available through Model Garden.

September 18, 2025

Change

Grounding with Google Maps

Grounding with Google Maps has implemented the following changes:

Removed the following fields from the API response:
- grounding_chunk.maps.text
- grounding_chunk.maps.place_answer_sources.review_snippets.author_attribution
- grounding_chunk.maps.place_answer_sources.flag_content_uri
- grounding_chunk.maps.place_answer_sources.review_snippets.flag_content_uri
The widget context token is only returned when the optionalwidget_token_enable input flag is set.

To learn more, seeGrounding with Google Maps.

September 15, 2025

Change

Imagen

We improved Imagen's virtual try-on model,virtual-try-on-preview-08-04, so that it is better at preserving the person's body shape and preserving the garment product's identity.

September 10, 2025

Feature

Vertex AI Agent Engine

Agent Engine now supports the following features:

Agent EngineCode Execution, now in Preview, lets your agent run code in an isolated sandbox environment. For more information, seeCode Execution.
You can now develop, deploy, and use agents that support theAgent-to-Agent (A2A) protocol on Agent Engine. For more information, seeDevelop an Agent2Agent agent.
Agent Engine now supportsbidirectional streaming. For more information, seeBidirectional streaming.
TheAgent Engine page in the Cloud Console UI now has a newMemory Bank tab for displaying and managing memories.

Breaking

Vertex AI Agent Engine

In versionv1.112.0 of the Vertex AI SDK for Python, theagent_engines module has been refactored to aclient-based design. For information about updating your existing code to the new design, see theMigration guide.

September 09, 2025

Feature

AI Singapore's SEA-LION V4 models are available through Model Garden. They are open models for Southeast Asian languages, built by leveraging Vertex Model Development Service for enhanced training efficiency and model accuracy.

Feature

EmbeddingGemma andDeepSeek-V3.1 models are available through Model Garden.

September 08, 2025

Feature

Veo video generation

Veo 3 support for short-duration videos isgenerally available. You can use Veo 3 to create 4, 6, or 8 second videos. For more information, see the following:

September 03, 2025

Change

Vertex AI RAG Engine: Managed Database (Spanner)

Customers will be charged for the use of a Google-managed Spanner instance that's provisioned in a Google tenant project, using standard Spanner SKUs.

For more information, seeVertex AI RAG Engine billing.

August 26, 2025

Announcement

Gemini 2.5 Flash Image Preview

Gemini 2.5 Flash Image (gemini-2.5-flash-image-preview) is available inPreview. Gemini 2.5 Flash Image Preview supports additional image generation and editing features such asimage generation from multiple reference images andimproved multi-turn image editing.

Feature

Vertex AI model tuning and Gen AI evaluation service

Vertex AI model tuning now supports integration with the Gen AI evaluation service in Preview. You can automatically run evaluations on your tuned models and intermediate checkpoints. For more information, seeCreate a tuning job.

August 21, 2025

Feature

Vertex AI Agent Engine

Agent Engine now supports the following enterprise security features:

You can now deploy your agents in a private VPC environment, configuring a Private Service Connect interface, to ensure data privacy and meet security and compliance requirements. For more information, seeConfigure Private Service Connect interface.
You can now use your owncustomer-managed encryption keys (CMEK) to protect data at rest.
You can now specifycustomized resource controls, such as the minimum and maximum number of application instances, resource limits for each container, and concurrency for each container.
As a part of Vertex AI Platform, Vertex AI Agent Engine now supportsHIPAA workloads.

For more information, seeAgent Engine overview.

August 14, 2025

Announcement

Imagen

Imagen 4 is Generally Available.

Imagen 4 introduces the following models:

For more information, seeGenerate images using text prompts andImage generation API.

Feature

Gemma 3 270M,Wan 2.2 andWan 2.1 models are available through Model Garden.

August 13, 2025

Feature

OpenAI'sgpt-oss-120b andgpt-oss-20b are available as Model as a Service (MaaS) models in Model Garden.

Feature

Qwen3 Coder andQwen3 235B are available as Model as a Service (MaaS) models in Model Garden.

August 08, 2025

Feature

Gemini 2.5 Flash-Lite andGemini 2.5 Pro now support supervised fine-tuning. For more information, seeAbout supervised fine-tuning for Gemini models.

August 07, 2025

Feature

Vertex AI prompt optimizer

The Vertex AI prompt optimizer is nowgenerally available. For more information, seeOptimize prompts.

We now offer azero-shot prompt optimizer.

Feature

Model tuning

You can now perform supervised fine-tuning on open models such as Llama 3.1. For more information, seeTune an open model.

Feature

Vertex AI Agent Engine

You can use your owncustom service account for agent identity to manage permissions and access according to your organization's security policies.

August 06, 2025

Feature

Imagen

Virtual try-on lets you generate virtual try-on images from an image of aperson and product photos that you provide, and is available in Preview. For more information, seeGenerate Virtual Try-On Images andVirtual Try-On API.

This release note is incorrect; see entry forOctober 9, 2025.

Feature

OpenAI's gpt-oss models are available through Model Garden.

July 29, 2025

Announcement

Veo video generation Veo 3 and Veo 3 Fast are now generally available. For more information, seeGenerate videos using text prompts.

July 23, 2025

Change

Grounding with Google Maps is available in all regions (except for the EEA) as a Preview (Pre-GA) feature.

July 22, 2025

Announcement

Gemini 2.5 Flash-Lite is now generally available and accessible using the API and Vertex AI Studio. This GA release includes support for explicit caching andbatch prediction, as well as expanded region support.

SeeGemini 2.5 Flash-Lite for more information.

July 17, 2025

Announcement

Veo 3 preview models now support upscaling for 1080p resolution using the newresolution parameter. For more information, seeVeo on Vertex AI.

July 16, 2025

Feature

AddedGemma 3 fine-tuning notebook using Axolotl docker with support for 1b, 4b, 12b, and 27b variants.

July 14, 2025

Feature

Multimodal MedGemma 27B IT,MedSigLIP, andT5Gemma models are available through Model Garden.

July 08, 2025

Feature

Vertex AI Agent Engine

Vertex AI Agent Engine Memory Bank is now available in Preview. Memory Bank lets you dynamically generate long-term memories based on users' conversations with your agent.

July 03, 2025

Feature

Vertex AI Agent Garden

Vertex AI Agent Garden now supports filtering by tags.

June 27, 2025

Feature

Gemma 3n models are now available through Model Garden.

Feature

Multimodal datasets are now available in preview. For more information, seeMultimodal datasets.

June 24, 2025

Deprecated

Starting on June 24, 2025, Imagen versions 1 and 2, image captioning, and visual question answering are deprecated.

On September 24, 2025, the following features and models will be removed:

image captioning
visual question answering
Imagen 1 modelimagegeneration@002
Imagen 2 modelsimagegeneration@005 andimagegeneration@006

For more information, seeMigrate to Imagen 3.

June 23, 2025

Announcement

Veo 2 support for advanced video controls is Generally Available. In addition to a providing a first frame of a video, you can specify the last frame of a video or a video to extend in length. For more information, seeVeo on Vertex AI API.

June 17, 2025

Deprecated

Preview endpoint availability and removal: All existing Gemini 2.5 Flash and Pro preview endpoints (listed below) will continue to be available with their current preview pricing until July 15, 2025. After this date, these preview endpoints will be shut down.

gemini-2.5-flash-preview-04-17
gemini-2.5-flash-preview-05-20
gemini-2.5-pro-preview-03-25
gemini-2.5-pro-preview-05-06
gemini-2.5-pro-preview-06-05

Announcement

Gemini 2.5 Flash andGemini 2.5 Pro are now generally available and accessible using the API and Vertex AI Studio.

SeeGemini 2.5 Flash andGemini 2.5 Pro for more information.

Announcement

Live API is now available as a private general availability offering in the API and Vertex AI Studio. Reach out to your Google account team representative to request access.

SeeLive API for more information.

Announcement

Gemini 2.5 Flash-Lite is now available as a preview offering in both the API and Vertex AI Studio.

SeeGemini 2.5 Flash-Lite for more information.

Change

Provisioned Throughput (PT): Once a model is GA, all new PT purchases will be for GA endpoints only. If you've purchased PT for a specific preview version, it will still work for that specific preview. However, you mustmigrate the existing PT to the GA endpoint or purchase new PT for the GA endpoint byJuly 15, 2025.

Change

Updated pricing for Gemini 2.5 Flash GA: The price for Gemini 2.5 Flash in GA will be adjusted to reflect its quality and unified output token pricing. This includes lower prices for thinking output, higher prices for non-thinking output. These pricing changes will take effect on the new GA endpoint as shared above. Preview pricing will only continue on existing preview endpoints for 30 days post-GA onJuly 15, 2025.

Change

Updated preview endpoints: EffectiveJune 19, 2025,gemini-2.5-flash-preview-04-17 endpoint will serve the Gemini 2.5 Flash model version released on 05-20, which has been promoted to GA. Similarly, thegemini-2.5-pro-preview-05-06 and03-25 endpoints will serve the Gemini 2.5 Pro model version released on 06-05, also promoted to GA. This update ensures continuity during your transition.

June 16, 2025

Announcement

The DeepSeek API service on Vertex AI is inPreview. For more information, see theDeepSeek model card in Model Garden.

June 11, 2025

Change

Imagen 4's public preview models are updated to the following:

imagen-4.0-generate-preview-06-06
imagen-4.0-fast-generate-preview-06-06
imagen-4.0-ultra-generate-preview-06-06

For more information about each model, seePreview Imagen models.

To avoid service interruption, migrate fromimagen-4.0-ultra-generate-exp-05-20 andimagen-4.0-generate-preview-05-20 before 2025-07-07.

June 09, 2025

Change

Gemini API

Thelogprobs andresponse_logprobs parameters for the Gemini API are nowgenerally available. For more information, seeGenerate content with Gemini API.

June 05, 2025

Feature

Gemini 2.5 Pro's public preview version has been updated togemini-2.5-pro-preview-06-05 and includes expanded support for thinking. This model version is available in the API and Vertex AI Studio.

SeeGemini 2.5 Pro for model details.

June 03, 2025

Announcement

Model Garden now includesDeepSeek-R1-0528 variants.

Announcement

In Model Garden, the following fine tuning features have been added:

Gemma 3 UI fine-tuning using PEFT docker.
Qwen 2.5 fine-tuning notebook using PEFT docker.
Qwen 3 fine-tuning notebook using Axolotl docker.
lm-evaluation-harness as an evaluation service in theLlama 3.3,Llama 3.1,Gemma 3 andGemma 2 fine-tuning notebooks.

May 23, 2025

Announcement

Mistral OCR is an Optical Character Recognition API for document understanding. It isGA on Vertex AI. For more information, see theMistral OCR model card in Model Garden.

May 22, 2025

Announcement

Anthropic's Claude Opus 4 and Claude Sonnet 4 areGA on Vertex AI and supportProvision Throughput. For more information, see theClaude Opus 4 orClaude Sonnet 4 model card in Model Garden.

May 20, 2025

Feature

Vertex AI Agent Engine

The following features are now available in Preview:

Change

Gemini 2.5 Flash's public preview version has been updated togemini-2.5-flash-preview-5-20.

SeeGemini 2.5 Flash for model details.

The model is available in the API and Vertex AI Studio.

Feature

Audio-to-audio support forGemini 2.5 Flash with Live API is now available as a private preview. Users must be allowlisted to use this new feature.

The model is available in the API and Vertex AI Studio.

SeeLive API for details.

Announcement

MedGemma models are available in Model Garden.

Feature

Veo 3

Veo 3 is available in Preview for allowlisted accounts.

For more information about Veo 3, seeVeo | AI Video Generator andVeo on Vertex AI API.

The model is available in the API and Vertex AI Studio.

Announcement

Lyria 2, our latest music generation model, is now generally available.

See ourmusic generation prompt guide and ouruser guide for more information.

The model is available in the API and Vertex AI Studio.

Feature

Imagen 4

Imagen 4 offers two Preview models:Imagen 4 Generate Preview 05-20, andImagen 4 Ultra Generate Experimental 05-20.

For more information, seeGenerate images using text prompts and theGenerate images API.

The model is available in the API and Vertex AI Studio.

Feature

Thought summaries are now available as an experimental feature for Gemini 2.5 Pro and 2.5 Flash.

For details, seeThinking.

The model is available in the API and Vertex AI Studio.

Announcement

New stable text embeddings models are now generally available:

gemini-embedding-001
text-embedding-005

For more information, seeGet text embeddings.

May 14, 2025

Deprecated

MedLM is deprecated. Access to MedLM will no longer be available on or after September 29, 2025.

May 07, 2025

Feature

Gemini 2.0 Flash with image generation (gemini-2.0-flash-preview-image-generation) is now available as a public preview offering.

For more information, seeGenerate images with Gemini.

Fixed

Seed parameter is now in GA and supports Gemini 2.5 model family.

May 05, 2025

Change

Grounding

The following grounding features aregenerally available:

May 02, 2025

Announcement

The global endpoint is generally available (GA). For details, seeGlobal endpoint.

April 30, 2025

Feature

Llama 4 Maverick and Scout models are available inModel Garden withModel-as-a-Service API Service andself-hosted deployments.
HiDream-I1,Llama Guard 4,Llama Prompt Guard 2, andQwen3 are available inModel Garden.

Change

Additional materials are available for deploying a model in Model Garden by using thePython SDK, gcloud CLI, or API, which are available inPreview:

April 29, 2025

Announcement

Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, seeModel versions and lifecycle.

April 17, 2025

Announcement

Gemini 2.5 Flash with thinking and other well-rounded capabilities is now available inPreview.

April 10, 2025

Announcement

Managed APIs for Llama 4 Maverick and Scout are inPreview on Vertex AI. For more information, see theLlama 4 model card.

April 09, 2025

Feature

Agent Development Kit (ADK) is now available inPreview. For more information, seeAgent Development Kit.

Feature

Gemini Live API is now available as a public preview offering and has been updated with the following features:

Support for responses in 8 voices and 31 languages using Chirp 3
Updated UI support in Vertex AI Studio
Expanded conversation session window
Ability to extend conversation sessions
Support to share your current screen with Gemini during conversations
Transcription support for audio in and audio out
Support to change or update the system instructions mid-session

For more information, seeGemini 2.0 Flash Live API.

Feature

Vertex AI Agent Engine

The following features are now available for Vertex AI Agent Engine in Preview:

The following features are now generally available for Vertex AI Agent Engine:

Agent monitoring

Feature

Gemini 2.5 Pro is now available as a public preview offering.

For more information, seeGemini 2.5 Pro.

Feature

Agent Garden is now available inPreview. For more information, seeVertex AI Agent Builder overview or go directly toAgent Garden in the Cloud Console.

Feature

Grounding: Grounding with Google Maps is now available as a Public Experimental feature. For more information, seeGrounding with Google Maps.

Feature

Grounding: Web Grounding for Enterprise is now Generally available. For more information, seeWeb Grounding for Enterprise.

Change

Vertex AI Agent Builder now refers to a suite of features for building and deploying AI agents in Vertex AI. For more information see,Vertex AI Agent Builder overview.

The original Vertex AI Agent Builder product has been renamedAI Applications. The product functionality and endpoints remain the same. For more information, seeWhat is AI Applications?.

March 25, 2025

Feature

DeepSeek-V3-0324,TxGemma andSesame CSM are now available inModel Garden.
DeepSeek-R1,V3 andV3-0324 can be deployed with H200 GPUs and improved vLLM support.
You can deploy a model in Model Garden by using thePython SDK, gcloud CLI, or API, which are available inPreview. You can get started with the "Equivalent code" in the deploy panel in the Model Garden console.

March 20, 2025

Announcement

Anthropic's Claude Sonnet 3.7 isGA on Vertex AI and supportsProvision Throughput. To learn more, view theClaude Sonnet 3.7 model card in Model Garden.

March 17, 2025

Announcement

Mistral Small 3.1 (25.03) feature multimodal capabilities and a context of up to 128,000 tokens. For more information, see theMistral Small 3.1 (25.03) model card in Model Garden.

March 14, 2025

Feature

Judge model evaluation and customization tools are now available in Preview for theGen AI evaluation service in Vertex AI.

March 13, 2025

Announcement

Context caching for Gemini on Vertex AI is generally available (GA).

March 12, 2025

Feature

Gemma 3 andShieldGemma 2 are now available in Model Garden.
CogVideoX-2b is now available in Model Garden.

Change

Model Garden fine tuning updates:

Added aworkbench-based notebook for Llama 3.1 finetuning.
UpdatedLlama 3.1 andGemma 2 UI fine-tuning with the updated PEFT docker.

March 11, 2025

Change

Gemini 2.0 Flash Tuning

Gemini 2.0 Flash fine-tuning is now generally available (GA).

Added support fortuning function calling.

March 04, 2025

Announcement

Vertex AI Agent Engine

Vertex AI Agent Engine is nowgenerally available (GA).

Billing for Vertex AI Agent Engine starts on March 4, 2025. We recommend that you delete unused resources to avoid incurring unwanted costs. For more information, seePricing.

Change

LangChain on Vertex AI has been renamed toVertex AI Agent Engine.

February 25, 2025

Feature

Gemini 2.0 Flash-Lite is now generally available

Gemini 2.0 Flash-Lite is now generally available. For more information, seeGemini 2.0.

February 24, 2025

Announcement

Anthropic's Claude Sonnet 3.7 is inPreview on Vertex AI. To learn more, view theClaude Sonnet 3.7 model card in Model Garden.

February 21, 2025

Change

PEFT Docker updates
- Added support for evaluation metrics like perplexity, bleu, google_bleu, rouge1, rouge2, rougeL, rougeLSum.
- Uses the best checkpoint and loads the model based on the best eval metrics.
- Run training and eval only for data which is less than or equal to themax_seq_length.
- Usegcloud storage rsync instead ofcsfuse to save a checkpoint.
Fine tuning updates
- You can select a service account when you clickFine-tune for a model, such asLlama 3.1.
- Added aPEFT based LLM finetuning tutorial notebook.
- Added aAxolotl based LLM finetuning notebook.
- UpdatedLlama 3.1 andGemma 2 fine-tuning notebooks with the updated PEFT Docker container.
Model updates
- Updated thePaliGemma model card by supporting PaliGemma 2 mix models, and segmentation functionality to Paligemma 1 models.
- Updated theLLaVa model card by supporting LLaVA Next models and adding vLLM to the notebook.

February 12, 2025

Feature

Deepseek-V3 and Deepseek-R1 have been added toModel Garden inPreview:

DeepSeek-V3 (671B) is a powerful Mixture-of-Experts (MoE) language model with 671B total parameters with 37B activated for each token.
DeepSeek-R1 (671B) is one of the first-generation reasoning models introduced by DeepSeek and offers performance comparable to OpenAI-o1 across math, code, and reasoning tasks.

You can use anotebook to deploy these models.

February 11, 2025

Announcement

TheLlama 3.3 70B model that is managed on Vertex AI is now inPreview.

February 07, 2025

Feature

deepseek-ai/deepseek-r1 andmicrosoft/Phi-4 models were added toModel Garden.

Change

The following advanced LLM inference optimization techniques are available in Model Garden inPreview:

Prefix caching reuses computations from previously generated text, eliminating redundant processing. It reduces time-to-first-token for requests with common prompt prefixes. Prefix caching is available for the following models:
- vLLM: Llama 3.1 (8b, 70b), Llama 3.3 (70b)
- Hex-LLM: Llama 2 (7b, 13b), Llama 3 (8b), Llama 3.1 (8b, 70b), Llama 3.2 (1b, 3b), Llama Guard (1b, 8b), CodeLlama (7b, 13b), Gemma (2b, 7b), CodeGemma (2b, 7b), Mistral-7B (v0.2, v0.3), Mixtral-8x7B (v0.1)
Speculative decoding is an effective optimization technique to reduce generation time-per-output-token latency. For more information, see theModel Garden advanced features notebook.

February 05, 2025

Feature

Gemini 2.0 Flash general availability for text-only output

Gemini 2.0 Flash is now generally available for text-only outputs. Multimodal outputs are still available only as a private preview. For more information, seeGemini 2.0.

Feature

New Gemini 2.0 Pro and Gemini 2.0 Flash-Lite models available to users

Two new models in the Gemini 2.0 family are now available to users:

Gemini 2.0 Pro: Our strongest model for coding and world knowledge, featuring a 2M long context window. Gemini 2.0 Pro is available as an experimental model in Vertex AI.
Gemini 2.0 Flash-Lite: Our fastest and most cost efficient Flash model. Gemini 2.0 Flash-Lite is available as a Preview model in Vertex AI.

For more information, seeGemini 2.0

January 31, 2025

Feature

You can now monitor usage, throughput, and latency and troubleshoot 429 errors on Vertex AI foundation models, like Google Gemini and Anthropic Claude, by using a predefined dashboard. After querying a model from theVertex AI Model Garden, you can find the name of the model you queried in the Vertex AIDashboard page under the "Model observability" heading.

To customize the dashboard and explore relevant metrics in Cloud Monitoring, clickShow All Metrics. For information about using dashboards in Cloud Monitoring, seeView and customize Google Cloud dashboards.

January 30, 2025

Deprecated

Mistral Large (24.07) and Codestral (24.05) that are offered as a Model as a Service (MaaS) models in Model Garden are deprecated. For details, seeGenerative AI on Vertex AI deprecations.

January 29, 2025

Feature

New Imagen 3 image generation model available to users

A newer improved Imagen 3 image generation model is now available to all users:

imagen-3.0-generate-002

This image generation model supports the following additional features:

Prompt enhancement - The LLM-based prompt rewriter tool adds additional details and descriptive language to the prompt you provide, generally resulting in higher quality generated images. This feature is configurable and is enabled by default.

For more information, seeImagen on Vertex AI model versions and lifecycle andGenerate images using text prompts.

January 22, 2025

Announcement

LangChain on Vertex AI

Billing for LangChain on Vertex AI will start on March 4, 2025.

The pricing structure is based on vCPU hours and GiB hours used. This means that you will be charged for both the compute (vCPU) and memory resources consumed by your LangChain on Vertex AI workloads.

You can review the pricing details in the table below.

Product	SKU ID	Price
ReasoningEngine vCPU	8A55-0B95-B7DC	$0.0994/vCPU-Hr
ReasoningEngine Memory	0B45-6103-6EC1	$0.0105/GiB-Hr

January 21, 2025

Deprecated

Anthropic's Claude 3 Sonnet that is offered as a Model as a Service (MaaS) model in Model Garden is deprecated. For details, seeGenerative AI on Vertex AI deprecations.

January 17, 2025

Feature

Agent evaluation using theGen AI evaluation service is available inPreview.

December 20, 2024

Change

RAG Engine isgenerally available (GA).

The supported models include the following:

Google Gemini
Google embedding and OSS E5 embedding models
Model Garden self-deployed OSS LLMs
Model as a service (MaaS) Llama models

The supported features include the following:

Data connectors: Google Cloud Storage, Google Drive, Slack, Jira, and SharePoint
Document types: Google Workspace documents, HTML, JSON, Markdown, PDF, and text files
Transformations: fixed-size chunking and chunk overlap
Vector databases: Vertex AI Vector Search and Pinecone

December 18, 2024

Feature

Hex-LLM: High-Efficiency Large Language Model Serving is available inGeneral Availability (GA).

This launch adds support for the following models:

Llama 3.1
Llama 3.2
Phi-3
Qwen2 and Qwen2.5

Additional supported features:

Multi-host serving.
Disaggregated serving (experimental).
Prefix caching.
AWQ quantization.

December 17, 2024

Feature

You can copy tuned Gemini 1.5 Pro 002 and Gemini 1.5 Flash 002 adapter models across projects. For details, seeCopy a model in Vertex AI Model Registry.

December 11, 2024

Feature

The Gemini 2.0 Flash (gemini-2.0-flash-exp) model is Generally available for grounded answer generation with RAG. This model is tuned to address context-based question and answering tasks. For more information, seeGround responses for Gemini models.

December 10, 2024

Feature

Imagen 3 image generation models Generally Available to all users

Imagen 3 image generation models are now available to all users without requiring prior approval. These include the following image generation models:

imagen-3.0-generate-001
imagen-3.0-fast-generate-001 (low latency model)

Prior image generation models (imagegeneration@006,imagegeneration@005,imagegeneration@002) still require approval to use.

For more information, seeImagen on Vertex AI model versions and lifecycle andGenerate images using text prompts.

Feature

Imagen 3 Customization model Generally Available to approved users

Imagen 3 Customization model is now available to approved users. This includes the following model:

imagen-3.0-capability

Imagen 3 Customization lets you guide image generation by providing reference images (few-shot learning). Imagen 3 Customization lets you customize generated images for the following feature categories:

Subject Customization (product, person, and animal companion)
Style Customization
Controlled Customization (canny edge and scribble)
Instruct Customization (Style transfer)

Feature

Imagen 3 editing model Generally Available to approved users

The Imagen 3 Editing model is now available to approved users. This includes the following model:

imagen-3.0-capability

This model offers the following additional features:

Inpainting - Add or remove content from a masked area of an image
Outpainting - Expand a masked area of an image
Product image editing - Identify and maintain a primary product while changing the background or product position

For more information, seeModel versions.

December 06, 2024

Security

A vulnerability was discovered in the Vertex AI API serving Gemini multimodal requests, allowing bypass of VPC Service Controls. For details, see theSecurity bulletins page.

November 21, 2024

Announcement

Mistral Large (24.11) is Generally Available on Vertex AI as a managed model. To learn more, view theMistral Large (24.11) model card in Model Garden.

Feature

The Gen AI evaluation service can now help you evaluate your translation models using MetricX, COMET, and BLEU metrics.To learn more about evaluating your translation models, seeEvaluate translation models.

November 08, 2024

Feature

Batch predictions for Llama models on Vertex AI (MaaS) is available inPreview.

Feature

Batch prediction support for Gemini

Batch prediction is available for Gemini inGeneral Availability (GA). Available Gemini models include Gemini 1.0 Pro, Gemini 1.5 Pro, and Gemini 1.5 Flash. To get started with batch prediction, seeGet batch predictions for Gemini.

November 05, 2024

Change

We are extending the availability of Gemini 1.0 Pro 001 and Gemini 1.0 Pro Vision 001 from February 15, 2025 to April 9, 2025. For details, see theDeprecations.

November 04, 2024

Change

The translation LLM now supports Polish, Turkish, Indonesian, Dutch, Vietnamese, Thai and Czech. For the full list of supported languages, see theTranslate text page.

Announcement

The Anthropic Claude Haiku 3.5 is Generally Available on Vertex AI. To learn more, view theClaude Haiku 3.5 model card in Model Garden.

October 28, 2024

Feature

TheWhisper large v3 and Whisper large v3 turbo models have been added to Model Garden.

Feature

You can now fine-tune the following models from the Cloud console:

Change

Updated the fine-tuning notebooks forGemma 2,Llama 3.1,Mistral, andMixtral with the following enhancements:

The notebooks use an updated high-performance container for single host multi-GPU LoRA fine-tuning.
- Better throughput and GPU utilization with well-tested max-sequence-lengths.
- Support for input token masking.
- No out of memory (OOM) error during fine-tuning.
Added a custom dataset example that uses a template and format validation.
Support for a default accelerator pool with quota checks.
Improved documentation.

October 22, 2024

Announcement

The Anthropic Claude Sonnet 3.5 v2 is Generally Available. To learn more, view theClaude Sonnet 3.5 v2 model card in Model Garden.

October 18, 2024

Announcement

TheLlama 3.1 405B model that is managed on Vertex AI is nowGenerally Available.

October 09, 2024

Feature

The Vertex AI Gemini API SDK supports tokenization capabilities for local token counting and computation. This is a streamlined way to compute tokens locally, ensuring compatibility across different Gemini models and their tokenizers. Supported models include gemini-1.5-flash and gemini-1.5-pro . To learn more, seeCount tokens.

October 04, 2024

Feature

The AI assistant in Vertex AI Studio can help you refine and generate prompts. This feature is inPreview. To learn more, seeUse AI-powered prompt writing tools.

Change

Added multiple deployment settings (with A100-80G and H100) and sample requests for some popular models, includingLlama 3.1,Gemma 2, andMixtral.

Feature

You can deploy Hugging Face models on Google Cloud that havetext embedding inference enabled orpytorch inference enabled. For more information, see theHugging Face model deployment in the console.

Change

Added dynamic LoRA serving forLlama 3.1 andStable Diffusion XL.

Feature

Prompt Guard andFlux were added toModel Garden.

October 01, 2024

Feature

Grounding: Dynamic retrieval for grounded results (GA)

Dynamic retrieval lets you choose when to turn off grounding with Google Search. This is useful when a prompt doesn't require an answer grounded in Google Search, and the supported models can provide an answer based on their knowledge without grounding. Dynamic retrieval helps you manage latency, quality, and cost more effectively.

This feature isGenerally Available. For more information, seeDynamic retrieval.

September 30, 2024

Feature

Prompt templates let you to test how different prompt formats perform with different sets of prompt data. This feature is inPreview. To learn more, seeUse prompt templates.

September 25, 2024

Announcement

The Llama 3.2 90B model is available inPreview on Vertex AI. Llama 3.2 90B enables developers to build and deploy the latest generative AI models and applications that use Llama's capabilities, such as image reasoning. Llama 3.2 is also designed to be more accessible for on-device applications. For more information, seeLlama models.

September 24, 2024

Announcement

New stable versions of Gemini 1.5 Pro (gemini-1.5-pro-002) and Gemini 1.5 Flash (gemini-1.5-flash-002) areGenerally Available. These models introduce broad quality improvements over the previous001 versions, with significant gains in the following categories:

Factuality and reduce model hallucinations
Openbook Q&A for RAG use cases
Instruction following
Multilingual understanding in 102 languages, especially in Korean, French, German, Spanish, Japanese, Russian, and Chinese.
SQL generation
Audio understanding
Document understanding
Long context
Math and reasoning

For more information about differences with the previous model versions, seeModel versions and lifecycle.

Feature

The new API parametersaudioTimestamp,responseLogprob, andlogprobs are inPublic Preview. For more information, seeAPI reference.

Announcement

The 2M context window with Gemini 1.5 Pro is now inGenerally Available, which opens up long-form multimodal use cases that only Gemini can support.

Feature

Use Gemini to directly analyze YouTube videos and publicly available media (such as images, audio, and video) by using a link. This feature is inPublic Preview.

Change

The latest versions ofGemini 1.5 Flash (gemini-1.5-flash-002) andGemini 1.5 Pro (gemini-1.5-pro-002) usedynamic shared quota, which distributes on-demand capacity among all queries being processed. Dynamic shared quota isGenerally Available.

Feature

Gemini 1.5 Pro and Gemini 1.5 Flash now support multimodal input withfunction calling. This feature is inPreview.

Change

Gemini 1.5 Pro and Gemini 1.5 Flash Tuning is now available inGA.Tune Gemini with text, image, audio, and document data types using the latest models:

gemini-1.5-pro-002
gemini-1.5-flash-002

Gemini 1.0 tuning remains in preview.

For more information on tuning Gemini, seeTune Gemini models by using supervised fine-tuning.

Feature

The Vertex AI prompt optimizer adapts your prompts using the optimal instructions and examples to elicit the best performance from your chosen model. This feature is available inPreview. To learn more, seeOptimize prompts.

Announcement

Controlled generation is nowGenerally Available.

September 20, 2024

Feature

Add label metadata togenerateContent andstreamGenerateContent API calls. For details, seeAdd labels to API calls.

September 18, 2024

Announcement

Model Garden supports an organization policy so that administrators can limit access to certain models and capabilities. For more information, seeControl access to Model Garden models

September 03, 2024

Change

Gemini 1.5 Flash (gemini-1.5-flash) supportscontrolled generation.

August 30, 2024

Feature

Gen AI Evaluation Service is Generally Available. To learn more, see theGen AI Evaluation Service overview.

August 26, 2024

Change

For controlled generation, you can have the model respond with an enum value in plain text, as defined in your response schema. Set theresponseMimeType totext/x.enum. For more information, seeControl generated output.

August 22, 2024

Change

AI21 Labs

Managed models from AI21 Labs are available on Vertex AI. To use a AI21 Labs model on Vertex AI, send a request directly to the Vertex AI API endpoint. For more information, seeAI21 models.

August 09, 2024

Announcement

Gemini on Vertex AI supports multiple response candidates. For details, seeGenerate content with the Gemini API.

August 05, 2024

Change

The translation LLM now supports Arabic, Hindi, and Russian. For the full list of supported languages, see theTranslate text page.

August 02, 2024

Feature

Vertex AI SDK for Python supports token listing and counting for prompts without the need to make API calls. This feature is available in (Preview). For details, seeList and count tokens.

July 31, 2024

Announcement

Gemma 2 2B is available in Model Garden. For details, seeUse Gemma open models.

Feature

New Imagen on Vertex AI image generation model and features

The Imagen 3 image generation models (imagen-3.0-generate-001 and the low-latency versionimagen-3.0-fast-generate-001) are Generally Available toapproved users. These models offer the following additional features:

Additional aspect ratios (1:1, 3:4, 4:3, 9:16, 16:9)
Digital watermark (SynthID) enabled by default
Watermark verification
User-configurable safety features (safety setting, person/face setting)

For more information, seeModel versions andGenerate images using text prompts.

Change

Resource and deployment settings were made to the following models:

Added GPU inferences forgemma2-27b andgemma2-27b-it with verified performances.
Added verified deployment settings for Mistral AI models that are deployed from Huggingface, includingmistralai/mistral-nemo-instruct-2407,mistralai/mistral-nemo-base-2407,mistralai/mistral-large-instruct-2407, andmistralai/codestral-22b-v0.1.
Added multiple deployment settings with A100 (40G), A100 (80G) and H100 (80G) for select models, such asllama3.1,llama3,gemma2,gemma, andmistral-7b.

Feature

The following models have been added toModel Garden:

Gemma 2 2B: A foundation LLM by Google DeepMind.
Qwen2: An LLM series by Alibaba Cloud.
Phi-3: An LLM series by Microsoft.

July 30, 2024

Announcement

See theGemini Online Inference on Vertex AI Service Level Agreement (SLA).

July 24, 2024

Announcement

Mistral AI

Managed models from Mistral AI are available on Vertex AI. To use a Mistral AI model on Vertex AI, send a request directly to the Vertex AI API endpoint. For more information, seeMistral AI models.

July 23, 2024

Announcement

Llama 3.1

The Llama 3.1 405B model is available inPreview on Vertex AI. Llama 3.1 405B provides capabilities from synthetic data generation to model distillation, steerability, math, tool use, multilingual translation, and more. For more information, seeLlama models.

July 02, 2024

Announcement

Google's open weight Gemma 2 model is available in Model Garden. For details, seeUse Gemma open models.

Change

MaMMUT is now available inModel Garden. MaMMUT is a vision-encoder and text-decoder model for multimodal tasks such as visual question answering, image-text retrieval, text-image retrieval, and generation of multimodal embeddings.

June 28, 2024

Feature

The following models have been added toModel Garden:

36 Hugging Face embedding models with verified deployment settings such asBAAI/bge-m3 andintfloat/multilingual-e5-large-instruct.
35 Hugging Face PyTorch models with verified deployment settings such asstabilityai/stable-diffusion-2-1.

For more information, see theHugging Face model deployment in the console.

Feature

LaunchedHex-LLM for high-efficiency large language model serving. This performant TPU serving solution is based on XLA and optimized kernels to achieve high throughput and low latency.

Hex-LLM uses several parallelism strategies for multiple TPU chips, quantizations, dynamic LoRA, and more. Hex-LLM supports the following dense and sparse LLMs:

Gemma 2B and 7B
Gemma 2 9B and 27B
Llama 2 7B, 13B and 70B
Llama 3 8B and 70B
Mistral 7B and Mixtral 8x7B

Change

Updated Docker images inLlama 3 notebooks that are more efficient at tuning.
A notebook-based interactive workshop UI was added inModel Garden for image generative models such asstable-diffusion-xl-base,image inpainting,controlnet. You can find these models from theOpen Notebook list.
Colab Notebooks for frequently used models in Model Garden have been revised with no-code or low-code implementations to improve accessibility and user experience.

June 27, 2024

Feature

Context caching is available for Gemini 1.5 Pro. Use context caching to reduce the cost of requests that contain repeat content with high input token counts. For more information, seeContext caching overview.

June 25, 2024

Feature

Controlled generation is available on Gemini 1.5 Pro and supports the JSON schema. For more information, seeControl generated output.

June 20, 2024

Announcement

The Anthropic Claude Sonnet 3.5 is Generally Available. To learn more, view theClaude Sonnet 3.5 model card in Model Garden.

June 17, 2024

Change

Increased the input token limit for Gemini 1.5 Pro from 1M to 2M. For more information, seeGoogle models.

June 11, 2024

Change

Upload media from Google Drive

You can upload media, such as PDF, MP4, WAV, and JPG files from Google Drive, when you sendimage,video,audio, anddocument prompt requests.

June 10, 2024

Feature

Experiment in the Vertex AI Studio login-free

The Vertex AI Studio multi-model prompt designer can be accessed login-free. With this feature, prospective customers can use the Vertex AI Studio to test queries before deciding to sign up and create an account. To learn more about this experience, seeVertex AI Studio console experiences or to access the console directly go toVertex AI Studio.

May 31, 2024

Change

Anthropic Claude 3.0 Opus model

TheAnthropic Claude 3.0 Opus model isGenerally Available. To learn more, see itsmodel card in Model Garden.

Change

Generative AI on Vertex AI Regional APIs

Generative AI on Vertex AI regional APIs are available in thefollowing three regions:

us-east5
me-central1
me-central2

May 28, 2024

Feature

Gemini models support thefrequencyPenalty andpresencePenalty parameters. UsefrequencyPenalty to control the probability of repeated text in a response. UsepresencePenalty to control the probability of generating more diverse content. For more information, seeGemini model parameters.

May 24, 2024

Announcement

The Gemini 1.5 Pro (gemini-1.5-pro-001) and Gemini 1.5 Flash (gemini-1.5-flash-001) models areGenerally Available. For more information, seeGoogle models,Overview of the Gemini API, andSend multimodal prompt requests.

May 20, 2024

Feature

The following models have been added toModel Garden:

E5: A text embedding model series that can be served with a GPU or CPU.
Instant ID: An identity preserving text-to-image generation model.
Stable Diffusion XL lightning: A text-to-image generation model that is based on SDXL but requires fewer inference iterations.

To see a list of all available models, seeExplore models in Model Garden.

May 14, 2024

Announcement

Gemini 1.5 Flash (Preview)

Gemini 1.5 Flash (gemini-1.5-flash-preview-0514) is available inPreview. Gemini 1.5 Flash is a multimodal model designed for fast, high volume, cost-effective text generation and chat applications. It can analyze text, code, audio, PDF, video, and video with audio.

Feature

Batch prediction support for Gemini

Batch prediction is available for Gemini inpreview. Available Gemini models include Gemini 1.0 Pro, Gemini 1.5 Pro, and Gemini 1.5 Flash. To get started with batch prediction, seeGet batch predictions for Gemini.

Feature

Grounding Gemini with Google Search is GA

The Gemini API Grounding with Google Search feature is available inGA. This is available for Gemini 1.0 Pro models. To learn more about model grounding, seeGrounding with Google Search.

Announcement

PaliGemma model

ThePaliGemma model is available. PaliGemma is a lightweight open model that's part of the Google Gemma model family. It's the Gemma model family's best model option for image captioning tasks and visual question and answering tasks. Gemma models are based on Gemini models and intended to be extended by customers.

Feature

New stable text embedding models

The following text embedding models are availableGA:

text-embedding-004
text-multilingual-embedding-002

For details on how to use these models, seeGet text embeddings.

April 18, 2024

Feature

Meta's open weightLlama 3 model is available in the Vertex AI Model Garden.

April 11, 2024

Announcement

Anthropic Claude 3.0 Opus model

TheAnthropic Claude 3.0 Opus model is available inPreview. The Claude 3.0 Opus model is an Anthropic partner model that you can use with Vertex AI. It's the most capable of the Anthropic models at performing complex tasks quickly. To learn more, see itsmodel card in Model Garden.

April 09, 2024

Feature

New Imagen on Vertex AI image generation model and features

The 006 version of the Imagen 2 image generation model (imagegeneration@006) is now available. This model offers the following additional features:

Additional aspect ratios (1:1, 3:4, 4:3, 9:16, 16:9)
Digital watermark (SynthID) enabled by default
Watermark verification*
New user-configurable safety features (safety setting, person/face setting)

For more information, seeModel versions andGenerate images using text prompts.

* Theseed field can't be used while digital watermark is enabled.

Feature

New Imagen on Vertex AI image editing model and features

The 006 version of the Imagen 2 image editing model (imagegeneration@006) is now available. This model offers the following additional features:

Inpainting - Add or remove content from a masked area of an image
Outpainting - Expand a masked area of an image
Product image editing - Identify and maintain a primary product while changing the background or product position

For more information, seeModel versions.

Change

Change in Imagen image generation version 006 (imagegeneration@006)seed field behavior

For the new Imagen image generation model version 006 (imagegeneration@006) theseed field behavior has changed. For the v.006 model a digital watermark is enabled by default for image generation. To be able to use aseed value to get deterministic output you must disable digital watermark generation by setting the followingparameter:"addWatermark": false.

For more information, see theImagen for image generation and editing API reference.

Announcement

Gemini 1.5 Pro (Preview)

Gemini 1.5 Pro (gemini-1.5-pro-preview-0409) is available inPreview. Gemini 1.5 Pro is a multimodal model that analyzes text, code, audio, PDF, video, and video with audio.

Announcement

CodeGemma model

TheCodeGemma model is available. CodeGemma is a lightweight open model that's part of the Google Gemma model family. CodeGemma is the Gemma model family's code generation and code completion offering. Gemma models are based on Gemini models and intended to be extended by customers.

Change

Vertex AI Studio features and updates

The Vertex AI Studio supports side-by-side comparison to allow users to compare up to 3 prompts in a side-by-side view.
The Vertex AI Studio supports rapid evaluation in console and the ability to upload a ground truth response (or a model response to try to emulate).

To learn more, seeTry your prompts in Vertex AI Studio

Feature

Generative AI on Vertex AI security control update

Security controls are available for the online prediction feature for Gemini 1.0 Pro and Gemini 1.0 Pro Vision.

Announcement

Gemini 1.0 Pro stable version 002

The 002 version of the Gemini 1.0 Pro multimodal model (gemini-1.0-pro-002) is available. For more information about stable versions of Gemini models, seeGemini model versions and lifecycle.

Feature

Text translation

Translate text in Vertex AI Studio is available inPreview.

Change

Regional APIs

Regional APIs areavailable in 11 new countries for Gemini, Imagen, and embeddings.
US and EU have machine-learning processing boundaries for thegemini-1.0-pro-001,gemini-1.0-pro-002,gemini-1.0-pro-vision-001, andimagegeneration@005 models.

Feature

System instructions

System instructions are supported inPreview by the Gemini 1.0 Pro (stable versiongemini-1.0-pro-002 only) and Gemini 1.5 Pro (Preview) multimodal models. Use system instructions to guide model behavior based on your specific needs and use cases. For more information, seeSystem instructions examples.

Feature

New text embedding models

The following text embedding models are now inPreview.

text-embedding-preview-0409
text-multilingual-embedding-preview-0409

When evaluated using theMTEB benchmarks, these models produce better embeddings compared to previous versions. The new models also offerdynamic embedding sizes, which you can use to output smaller embedding dimensions, with minor performance loss, to save on computing and storage costs.

For details on how to use these models, refer to thepublic documentation and try out ourColab.

Feature

Supervised Tuning for Gemini

Supervised tuning is available for thegemini-1.0-pro-002 model.

Change

Generative AI Knowledge Base

TheJump Start Solution: Generative AI Knowledge Base demonstrates how to build a simple chatbot with business- and domain-specific knowledge.

Feature

Online Evaluation Service

Generative AI evaluation supportsonline evaluation in addition topipeline evaluation. The list of supported evaluation metrics has also expanded. SeeAPI reference andSDK reference.

Feature

Grounding Gemini and Grounding with Google Search

The Gemini API now supports Grounding with Google Search inPreview. Currently available for Gemini 1.0 Pro models.

April 02, 2024

Feature

Model Garden supports allText Generation Inference supported models inHuggingFace:

Verified deployment settings for about 400 Hugging Face text generation models (includinggoogle/gemma-7b-it,meta-llama/Llama-2-7b-chat-hf, andmistralai/Mistral-7B-v0.1).
Other Hugging Face text generation models have unverified deployment settings that are auto generated.

March 29, 2024

Change

The MedLM-large model infrastructure has been upgraded to improvelatency and stability. Responses from the model might be slightly different.

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-17 UTC.

Movatterモバイル変換

Vertex AI release notes Stay organized with collections Save and categorize content based on your preferences.

December 17, 2025

December 16, 2025

December 12, 2025

December 10, 2025

December 09, 2025

December 08, 2025

December 02, 2025

November 24, 2025

November 17, 2025

November 13, 2025

November 11, 2025

November 07, 2025

November 04, 2025

October 23, 2025

October 21, 2025

October 16, 2025

October 15, 2025

October 14, 2025

October 09, 2025

October 07, 2025

October 06, 2025

October 03, 2025

October 02, 2025

September 30, 2025

September 25, 2025

September 24, 2025

September 23, 2025

September 22, 2025

September 18, 2025

September 15, 2025

September 10, 2025

September 09, 2025

September 08, 2025

September 03, 2025

August 26, 2025

August 21, 2025

August 14, 2025

August 13, 2025

August 08, 2025

August 07, 2025

August 06, 2025

July 29, 2025

July 23, 2025

July 22, 2025

July 17, 2025

July 16, 2025

July 14, 2025

July 08, 2025

July 03, 2025

June 27, 2025

June 24, 2025

June 23, 2025

June 17, 2025

June 16, 2025

June 11, 2025

June 09, 2025

June 05, 2025

June 03, 2025

May 23, 2025

May 22, 2025

May 20, 2025

May 14, 2025

May 07, 2025

May 05, 2025

May 02, 2025

April 30, 2025

April 29, 2025

April 17, 2025

April 10, 2025

April 09, 2025

March 25, 2025

March 20, 2025

March 17, 2025

March 14, 2025

March 13, 2025

March 12, 2025

March 11, 2025

March 04, 2025

Vertex AI release notes