About supervised fine-tuning for Gemini models

Supervised fine-tuning is a good option when you have a well-defined task with available labeled data. It's particularly effective for domain-specific applications where the language or content differs significantly from the data the large model was originally trained on. You can tune text, image, audio, video, and document data types. You can also create Gemini-based applications and agents that can interact with real-time information and services like databases, customer relationship management systems, and document repositories.

Supervised fine-tuning adapts model behavior with a labeled dataset. This process adjusts the model's weights to minimize the difference between its predictions and the actual labels. For example, it can improve model performance for the following types of tasks:

  • Classification
  • Summarization
  • Extractive question answering
  • Chat
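For instance, a single labeled example for a classification task can be built as one JSONL line in the chat-style contents format. This is a minimal sketch; the schema shown (contents, role, parts) is an assumption here and should be confirmed against the current dataset format documentation:

```python
import json

# One labeled training example for a classification task, expressed as a
# single JSONL line in the chat-style "contents" schema (assumed here;
# confirm against the dataset format documentation).
example = {
    "contents": [
        {"role": "user",
         "parts": [{"text": "Classify the sentiment: 'The battery died after an hour.'"}]},
        {"role": "model",
         "parts": [{"text": "negative"}]},
    ]
}

# Each example occupies exactly one line of the JSONL training file.
jsonl_line = json.dumps(example)
print(jsonl_line)
```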

For a discussion of the top tuning use cases, check out the blog post Hundreds of organizations are fine-tuning Gemini models. Here's their favorite use cases.

To learn more, see When to use supervised fine-tuning for Gemini.

Supported models

The following Gemini models support supervised fine-tuning:

For models that support thinking, we suggest setting the thinking budget to off or its lowest value. This can improve performance and reduce costs for tuned tasks. During supervised fine-tuning, the model learns from the training data and omits the thinking process. Therefore, the resulting tuned model can perform tuned tasks effectively without a thinking budget.
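For example, a generateContent request to a tuned thinking model can set the thinking budget to 0 through the request's generation config. This is a sketch; the field names (generationConfig.thinkingConfig.thinkingBudget) are assumptions to verify against the current API reference:

```json
{
  "contents": [
    {"role": "user", "parts": [{"text": "Classify the sentiment: 'Great product.'"}]}
  ],
  "generationConfig": {
    "thinkingConfig": {
      "thinkingBudget": 0
    }
  }
}
```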

Limitations

Supervised fine-tuning is not a Covered Service and is excluded from the SLO of any Service Level Agreement.

The following table shows the limitations on supervised fine-tuning datasets:

Gemini 2.5 Flash
Gemini 2.5 Flash-Lite

Maximum input and output tokens per training example: 131,072
Maximum input and output serving tokens: Same as base Gemini model
Maximum number of examples in a validation dataset: 5,000 examples, or 30% of the number of training examples if there are more than 1,000 validation examples
Maximum training dataset file size: 1 GB for JSONL
Maximum training dataset size: 10M text-only examples or 300K multimodal examples
Adapter size: Supported values are 1, 2, 4, 8, and 16

Gemini 2.5 Pro

Maximum input and output training tokens: 131,072
Maximum input and output serving tokens: Same as base Gemini model
Maximum validation dataset size: 5,000 examples, or 30% of the number of training examples if there are more than 1,000 validation examples
Maximum training dataset file size: 1 GB for JSONL
Maximum training dataset size: 10M text-only examples or 300K multimodal examples
Adapter size: Supported values are 1, 2, 4, and 8

Gemini 2.0 Flash
Gemini 2.0 Flash-Lite

Maximum input and output training tokens: 131,072
Maximum input and output serving tokens: Same as base Gemini model
Maximum validation dataset size: 5,000 examples, or 30% of the number of training examples if there are more than 1,000 validation examples
Maximum training dataset file size: 1 GB for JSONL
Maximum training dataset size: 10M text-only examples or 300K multimodal examples
Adapter size: Supported values are 1, 2, 4, and 8

Known issues

  • Applying controlled generation when submitting inference requests to tuned Gemini models can result in decreased model quality due to data misalignment between tuning and inference time. During tuning, controlled generation isn't applied, so the tuned model isn't able to handle controlled generation well at inference time. Supervised fine-tuning effectively customizes the model to generate structured output, so you don't need to apply controlled generation when making inference requests on tuned models.

Use cases for supervised fine-tuning

Foundation models work well when the expected output or task can be clearly and concisely defined in a prompt and the prompt consistently produces the expected output. If you want a model to learn something niche or specific that deviates from general patterns, then you might want to consider tuning that model. For example, you can use model tuning to teach the model the following:

  • Specific structures or formats for generating output.
  • Specific behaviors such as when to provide a terse or verbose output.
  • Specific customized outputs for specific types of inputs.

The following examples are use cases that are difficult to capture with only prompt instructions:

  • Classification: The expected response is a specific word or phrase.

    Tuning the model can help prevent it from generating verbose responses.

  • Summarization: The summary follows a specific format. For example, you might need to remove personally identifiable information (PII) in a chat summary.

    This formatting of replacing the names of the speakers with #Person1 and #Person2 is difficult to describe, and the foundation model might not naturally produce such a response.

  • Extractive question answering: The question is about a context and the answer is a substring of the context.

    The response "Last Glacial Maximum" is a specific phrase from the context.

  • Chat: You need to customize the model's responses to follow a persona, role, or character.
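As an illustration of the summarization format described above, one training example might look like the following JSONL record. The dialogue, names, and summary are invented for this sketch, and the field names should be verified against the current dataset format reference:

```json
{
  "contents": [
    {
      "role": "user",
      "parts": [{"text": "Summarize the chat:\nAlice: Can you ship my order today?\nBob: Yes, it goes out this afternoon."}]
    },
    {
      "role": "model",
      "parts": [{"text": "#Person1 asks about same-day shipping, and #Person2 confirms the order will ship this afternoon."}]
    }
  ]
}
```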

You can also tune a model in the following situations:

  • Prompts are not producing the expected results consistently enough.
  • The task is too complicated to define in a prompt. For example, you want the model to do behavior cloning for a behavior that's hard to articulate in a prompt.
  • You have complex intuitions about a task that are difficult to formalize in a prompt.
  • You want to reduce the context length by removing the few-shot examples.

Configure a tuning job region

User data, such as the transformed dataset and the tuned model, is stored in the tuning job region. During tuning, computation could be offloaded to other US or EU regions for available accelerators. The offloading is transparent to users.

  • If you use the Vertex AI SDK, you can specify the region at initialization. For example:

    import vertexai

    vertexai.init(project='myproject', location='us-central1')
  • If you create a supervised fine-tuning job by sending a POST request using the tuningJobs.create method, then you use the URL to specify the region where the tuning job runs. For example, in the following URL, you specify a region by replacing both instances of TUNING_JOB_REGION with the region where the job runs.

    https://TUNING_JOB_REGION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/TUNING_JOB_REGION/tuningJobs
  • If you use the Google Cloud console, you can select the region name in the Region drop-down field on the Model details page. This is the same page where you select the base model and a tuned model name.
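As a hedged sketch, a tuningJobs.create request body might look like the following. The Cloud Storage path and display name are placeholders, and the field names (baseModel, supervisedTuningSpec.trainingDatasetUri, tunedModelDisplayName) should be verified against the current API reference:

```json
{
  "baseModel": "gemini-2.5-flash",
  "supervisedTuningSpec": {
    "trainingDatasetUri": "gs://my-bucket/training_data.jsonl"
  },
  "tunedModelDisplayName": "my-tuned-model"
}
```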

Evaluating tuned models

You can evaluate tuned models in the following ways:

  • Tuning and validation metrics: Evaluate the tuned model using tuning and validation metrics after the tuning job completes.

  • Integrated evaluation with Gen AI evaluation service (Preview): Configure tuning jobs to automatically run evaluations using the Gen AI evaluation service during tuning. The following interfaces, models, and regions are supported for the tuning integration with the Gen AI evaluation service:

    • Supported interfaces: Google Gen AI SDK and REST API.

    • Supported models: gemini-2.5-pro, gemini-2.5-flash, and gemini-2.5-flash-lite.

    • Supported regions: For a list of supported regions, see Supported regions.

Quota

Quota is enforced on the number of concurrent tuning jobs. Every project comes with a default quota to run at least one tuning job. This is a global quota, shared across all available regions and supported models. If you want to run more jobs concurrently, you need to request additional quota for Global concurrent tuning jobs.

If you configure the Gen AI evaluation service to run evaluations automatically during tuning, see the Gen AI evaluation service quotas.

Pricing

For pricing for Gemini supervised fine-tuning, see Vertex AI pricing.

The number of training tokens is calculated by multiplying the number of tokens in your training dataset by the number of epochs. After tuning, inference (prediction request) costs for the tuned model still apply. Inference pricing is the same for each stable version of Gemini. For more information, see Available Gemini stable model versions.
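The arithmetic above can be sketched as follows; the token count and epoch values are invented for illustration, not real prices or quotas:

```python
# Illustrative training-token calculation: total tokens in the training
# dataset multiplied by the number of epochs. Both numbers are made up.
dataset_tokens = 2_500_000  # total input + output tokens across all examples
epochs = 4                  # number of passes over the dataset

training_tokens = dataset_tokens * epochs
print(training_tokens)  # 10000000
```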

If you configure the Gen AI evaluation service to run automatically during tuning, evaluations are charged as batch prediction jobs. For more information, see Pricing.

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-17 UTC.