Migrate to the latest Gemini models

This guide explains how to update your application to the latest Geminiversion. This guide assumes your application already uses an older Geminiversion. To learn how to start using Gemini in Vertex AI, seetheGemini API in Vertex AI quickstart.

This guide doesn't cover how to switch your application from theVertex AI SDK to the current Google Gen AI SDK. For that information,see ourVertex AI SDK migration guide.

What changes should I expect?

Updating most generative AI applications to the latest Gemini versionrequires few code or prompt changes. However, some applications may requireprompt adjustments. It's hard to predict these changes without first testingyour prompts with the new version. Thorough testing is recommended before fullymigrating. For tips on creating effective prompts, see ourprompt strategyguidance.Use ourprompt health checklistto help find and fix prompt issues.

You only need to make major code changes for certainbreakingchanges or to use new Gemini capabilities.

Which Gemini model should I migrate to?

The Gemini model you use depends on your application's needs.The following table compares the older Gemini 1.5 modelswith the latest Gemini models:

Feature	2.0 Flash	2.0 Flash-Lite	2.5 Pro	2.5 Flash	2.5 Flash-Lite	3 Flash	3 Pro	3.1 Pro
Launch stage	GA	GA	GA	GA	GA	Preview	Preview	Preview
Input modalities	Text,Code,Images,Audio,Video	Text,Code,Images,Audio,Video	Text,Code,Images,Audio,Video	Text,Code,Images,Audio,Video	Text,Code,Images,Audio,Video	Text,Code,Images,Audio,Video,PDF	Text,Code,Images,Audio,Video,PDF	Text,Code,Images,Audio,Video,PDF
Output modalities	Text	Text	Text	Text	Text	Text	Text	Text
Context window, total token limit	1,048,576	1,048,576	1,048,576	1,048,576	1,048,576	1,048,576	1,048,576	1,048,576
Output context length	8,192 (default)	8,192 (default)	65,535 (default)	65,535 (default)	65,535 (default)	65,536	65,536	65,536
Grounding with Google Search
Function calling
Code execution
Implicit context caching
Explicit context caching
Batch prediction
Gemini Live API
Fine-tuning
Latency
Recommended SDK	Gen AI SDK	Gen AI SDK	Gen AI SDK	Gen AI SDK	Gen AI SDK	Gen AI SDK	Gen AI SDK	Gen AI SDK
Pricing units	Token	Token	Token	Token	Token	Token	Token	Token
Retirement date	June 1, 2026	June 1, 2026	June 17, 2026	June 17, 2026	July 22, 2026

Before you begin migrating

Before you start the migration process, you should consider the following:

InfoSec, governance, and regulatory approvals

Obtain approvals from your information security (InfoSec), risk, and compliance teams early. Cover any specific risk and compliance rules, especially in regulated industries like healthcare and finance.

Caution: Security control support varies by model. For details on each Gemini model's support level, see the security controls guide.

Location availability

Google and Partner models and generative AI features on Vertex AI are available through specificregional endpoints and aglobal endpoint. Global endpoints cover the entire world and offer improved availability and reliability compared to single regions.

Regional endpoint availability varies by model. For details on each model, see ourlocations guide.

Modality and tokenization-based pricing differences

Pricing varies between each Gemini model. Ourpricing page lists costs for all modalities (text, code, images, speech, etc.) per model.

Note: Gemini 2 and later models are priced per token for inputs and outputs. Gemini 1.5 models are priced per character.

Purchase or change Provisioned Throughput orders

If needed,purchase more Provisioned Throughput orchange existing Provisioned Throughput orders.

Supervised fine-tuning

The latest Gemini models offer better output quality. This can mean your application no longer needs a fine-tuned model. If your application usessupervised fine-tuning with an older Gemini model, first test your application with the latest model without fine-tuning and evaluate the results.

If you choose to use supervised fine-tuning, you cannot move your existing tuned model from older Gemini versions. You need to run a new tuning job for the new Gemini version.

When tuning a new Gemini model, start with the default tuning settings. Don't reuse hyperparameter values from previous Gemini versions, because the tuning service is optimized for the latest versions. Reusing old settings is unlikely to give optimal results.

Regression testing

When upgrading to the latest Gemini version, you'll need three main types of regression tests:

Code regression tests: Regression testing from a software engineering and developer operations (DevOps) perspective. This type of regression testing isalways required.
Model performance regression tests: Regression testing from a data science or machine learning perspective. This means ensuring that the new Gemini model version provides outputs that at least maintain the same level of quality as the previous version.
Model performance regression tests are model evaluations done when a system or its underlying model changes. They include:
- Offline performance testing: Tests that assert the quality of model outputs in a dedicated experimentation environment based on various model output quality metrics.
- Online model performance testing: Tests that assert the quality of model outputs in a live, online deployment based on implicit or explicit user feedback.
Load testing: These tests check how well the application handles many requests at once. Load testing isrequired for applications that useProvisioned Throughput.

How to migrate to the latest version

The following sections outline the steps to migrate to the latest Geminiversion. For optimal results, complete these steps in order.

1. Document model evaluation and testing requirements

Prepare to repeat any relevant evaluations you performed when you firstbuilt your application, plus any evaluations performed since then.
If your current evaluations don't fully cover or measure all tasks yourapplication performs, design and prepare more evaluations. You can use ourevaluation playbookand ourevaluationrecipesto help you get started.
If your application involves RAG, tool use, complex agentic workflows, orprompt chains, make sure that your existing evaluation data allows forassessing each component independently. If not, gather input-output examplesfor each component.
If your application is critical or part of a larger user-facing real-timesystem, include online evaluation.

2. Make code upgrades and run tests

Upgrading your code requires three main changes:

The following sections goes over these changes in further detail.

Upgrade to the Google Gen AI SDK

If your Gemini 1.x application uses the Vertex AI SDK, switch tothe Gen AI SDK. See ourVertex AI SDK migrationguide fordetails, including code examples for making similar calls with theGen AI SDK. Vertex AI SDK releases after June 2026 won'tsupport Gemini, and new Gemini features are only availablein the Gen AI SDK.

If you're new to the Gen AI SDK, see theGetting started withGoogle Generative AI using the Gen AI SDKnotebook.

Change your Gemini calls

Update your prediction code to use one of the latest Gemini models. Ata minimum, this means changing the model endpoint name.

The exact code changes will vary based on how you built your application,especially whether you used the Gen AI SDK or theVertex AI SDK.

After making code changes, run code regression tests and other software teststo ensure your code functions as expected. This step checks whether the codefunctions, but not the quality of model responses.

Fix breaking code changes

Dynamic retrieval: Switch to usingGrounding with Google Search.This feature requires theGen AI SDKand isn't supported by the Vertex AI SDK.
Content filters: Note thedefault content filter settings.Change your code if it relies on a default that has changed.
Top-K token sampling parameter:Models aftergemini-1.0-pro-vision don't support changing theTop-K parameter.
Thinking: Gemini 3 Pro and later models use thethinking_level parameter instead ofthinking_budget. For moreinformation, seeControl model thinking.
Thought signatures: For Gemini 3 Pro and later models, if athought signature is expected in a turn but not provided, the model returnsan error instead of a warning. SeeThought signatures.
Media resolution and tokenization: Gemini 3 Pro and latermodels use a variable sequence length for media tokenization instead ofPan and Scan, and have new default resolutions and token costs forimages, PDFs, and video. SeeImage understandingandVideo understanding.
Usage metadata: For Gemini 3 Pro and later models, PDF tokencounts inusage_metadata are reported under theIMAGE modalityinstead ofDOCUMENT.
Image segmentation: Image segmentation is not supported byGemini 3 Pro and later models.
Multimodal function responses: For Gemini 3 Pro and latermodels, you can include image and PDF data in function responses. SeeMultimodal function responses.
PDF processing: For Gemini 3 Pro and later models, OCR isnot used by default when processing scanned PDFs.

For this step, focus only on code changes. You may need to make other changeslater, but wait until you start your evaluation. After your evaluations, considerthese adjustments based on the evaluation results:

If you're switching from dynamic retrieval, you may need to adjust yoursystem instructions to control when Google Search is used (forexample,"Only generate queries for the Google Search tool if theuser asks about sports. Don't generate queries for any other topic.").However, wait until you evaluate before changing prompts.
If you used theTop-K parameter, adjust other token sampling parameters,likeTop-P,to get similar results.

3. Run offline evaluations

Repeat the evaluations you performed when you first developed and launched yourapplication, any offline evaluations done since then, and any additionalevaluations you identified instep 1. If you still feel your evaluation doesn'tfully cover your application's scope, conduct further evaluations.

If you don't have an automated way to run offline evaluations, consider using theGen AI evaluation service.

If your application uses fine-tuning, perform offline evaluation beforere-tuning your model with the latest version of Gemini. The latestmodels offer improved output quality, which can mean your application no longerneeds a fine-tuned model.

4. Assess evaluation results and tune your prompts and hyperparameters

If your offline evaluation shows your application performing less effectively, improveyour application until its performance matches the older model. Do this by:

Iteratively refining your prompts to boost performance ("Hill Climbing").If you're new to hill climbing, see theVertex Gemini hillclimbing online training.TheVertex AI prompt optimizer(example notebook)can also help.
If your application is affected by Dynamic Retrieval and Top-Kbreakingchanges, experiment with adjusting your prompt andtoken sampling parameters.

5. Run load tests

If your application needs a certain minimum throughput, perform load testing toensure the latest version of your application meets your throughputrequirements.

Load testing must occur before online evaluation, because online evaluationinvolves exposing the model to live traffic. Use your existing load testingtools and instrumentation for this step.

If your application already meets throughput needs, consider usingProvisioned Throughput.You'll need extra short-term Provisioned Throughput to cover load testingwhile your current Provisioned Throughput order handles production traffic.

6. (Optional) Run online evaluations

Move to online evaluation only if your offline evaluation shows highGemini output qualityand your application requires onlineevaluation.

Online evaluation is a specific type of online testing. Try to use yourorganization's existing tools and methods for online evaluation. For example:

If your organization regularly performsA/Btests, perform one to compareyour application's current version with the latest Gemini version.
If your organization regularly usescanarydeployments,use them with the latest models and measure changes in userbehavior.

You can also do online evaluation by adding new feedback and measurementfeatures to your application. Different applications need different feedbackmethods. For example:

Adding thumbs-up and thumbs-down buttons next to model outputs andcomparing the rates between an older model and the latest Gemini models.
Showing users outputs from both the older model and the latest modelsside-by-side and asking them to pick their favorite.
Tracking how often users override or manually adjust outputs from the oldermodel versus the latest models.

These feedback methods often require running the latest Gemini versionalongside your existing version. This parallel deployment is sometimes called"shadow mode" or "blue-green deployment."

If online evaluation results differ greatly from offline evaluation results,your offline evaluation isn't capturing key aspects of the live environment oruser experience. Apply the online evaluation findings to create a new offlineevaluation that covers the gap, then return to step 3.

If you use Provisioned Throughput, you may need topurchase additionalshort-term Provisioned Throughputto continue to meet your throughput requirements for users in online evaluation.

7. Deploy to production

Once your evaluation shows that the latest Gemini model performs aswell as or better than an older model, replace the existingapplication version with the new version. Follow your organization's standardprocedures for production rollout.

If you're usingProvisioned Throughput, changeyour Provisioned Throughput order to your chosen Gemini model. Ifyou're rolling out your application incrementally, use short-termProvisioned Throughput to meet throughput needs for two differentGemini models.

Improving model performance

As you migrate, apply these tips to achieve optimal performance from yourchosen Gemini model:

For Gemini 3 Pro and later models, Google strongly recommendskeeping thetemperatureparameter at its default value of1.0. While previous models oftenbenefited from tuning temperature to control creativity versusdeterminism, Gemini 3 Pro and later models' reasoningcapabilities are optimized for the default setting. Changing thetemperature (setting it to less than1.0) may lead to unexpected behavior,such as looping or degraded performance, particularly in complexmathematical or reasoning tasks.
Check yoursysteminstructions,prompts,andfew-shot learningexamples forany inconsistencies, contradictions, or irrelevant instructions andexamples.
Test a more powerful model. For example, if you evaluatedGemini 2.0 Flash-Lite, tryGemini 2.0 Flash.
Review automated evaluation results to ensure they match human judgment,especially results using ajudgemodel. Ensureyour judge model instructions are clear, consistent, and unambiguous.
To improve judge model instructions, test the instructions with multiplehumans working in isolation. If humans interpret the instructionsdifferently and provide different judgments, your judge model instructionsare unclear.
Fine-tune the model.
Examine evaluation outputs for patterns that show specific types offailures. Grouping failures by model, type, or category provides moretargeted evaluation data, making it easier to adjust prompts to fix theseerrors.
Ensure you are evaluating different generative AI components independently.
Experiment with adjustingtoken samplingparameters.

Getting help

If you require assistance, Google Cloud offers support packages to meet your needs, suchas 24/7 coverage, phone support, and access to a technical support manager. Formore information, seeGoogle Cloud Support.

What's next

Guide

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.

Movatterモバイル変換

Migrate to the latest Gemini models

What changes should I expect?

Which Gemini model should I migrate to?

Before you begin migrating

InfoSec, governance, and regulatory approvals

Location availability

Modality and tokenization-based pricing differences

Purchase or change Provisioned Throughput orders

Supervised fine-tuning

Regression testing

How to migrate to the latest version

1. Document model evaluation and testing requirements

2. Make code upgrades and run tests

Upgrade to the Google Gen AI SDK

Change your Gemini calls

Fix breaking code changes

3. Run offline evaluations

4. Assess evaluation results and tune your prompts and hyperparameters

5. Run load tests

6. (Optional) Run online evaluations

7. Deploy to production

Improving model performance

Getting help

What's next

Get started with Vertex AI

Google models

Partner models

Open models

Frequently asked questions

Movatterモバイル変換

Migrate to the latest Gemini models Stay organized with collections Save and categorize content based on your preferences.

What changes should I expect?

Which Gemini model should I migrate to?

Before you begin migrating

InfoSec, governance, and regulatory approvals

Location availability

Modality and tokenization-based pricing differences

Purchase or change Provisioned Throughput orders

Supervised fine-tuning

Regression testing

How to migrate to the latest version

1. Document model evaluation and testing requirements

2. Make code upgrades and run tests

Upgrade to the Google Gen AI SDK

Change your Gemini calls

Fix breaking code changes

3. Run offline evaluations

4. Assess evaluation results and tune your prompts and hyperparameters

5. Run load tests

6. (Optional) Run online evaluations

7. Deploy to production

Improving model performance

Getting help

What's next

Get started with Vertex AI

Google models

Partner models

Open models

Frequently asked questions

Migrate to the latest Gemini models