About preference tuning for Gemini models
To see an example of preference tuning, run the "Get Started with Gemini Preference Optimization" notebook in one of the following environments:
Open in Colab | Open in Colab Enterprise | Open in Vertex AI Workbench | View on GitHub
Vertex AI preference tuning lets you tune your Gemini models with human feedback data.
Preference tuning enables the model to learn from subjective user preferences that are hard to capture with specific labels or through supervised fine-tuning alone.
The preference tuning input dataset contains examples that consist of a prompt and a pair of responses indicating which one is preferred and which one is dispreferred. The model learns to generate preferred responses with higher probability and dispreferred responses with lower probability.
To learn how to prepare the dataset, see Prepare preference tuning data for Gemini models.
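To get a feel for the shape of these examples, the following Python sketch writes one preference record to a JSON Lines file. The field names used here (`contents`, `completions`, `score`) are assumptions made for illustration; use the authoritative schema from the data-preparation guide linked above.

```python
import json

# One hypothetical preference example: a prompt plus a preferred and a
# dispreferred response. The field names are illustrative assumptions only;
# see the data-preparation guide for the actual schema.
example = {
    "contents": [
        {"role": "user", "parts": [{"text": "Summarize the meeting notes in two sentences."}]},
    ],
    "completions": [
        {
            # A higher score marks the preferred response.
            "score": 1,
            "completion": {"role": "model", "parts": [{"text": "A concise, accurate summary."}]},
        },
        {
            # A lower score marks the dispreferred response.
            "score": 0,
            "completion": {"role": "model", "parts": [{"text": "A rambling, off-topic reply."}]},
        },
    ],
}

# Preference tuning datasets are supplied as JSON Lines, one example per line.
with open("preference_dataset.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```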
Supported models
The following Gemini models support preference tuning:
Limitations
| Specification | Value |
|---|---|
| Modalities | Text |
| Maximum training dataset file size | 1 GB |
| Maximum input and output tokens per training example | 131,072 |
| Maximum input and output serving tokens | Same as base Gemini model |
| Maximum number of training examples in a training dataset | 10M text-only training examples |
| Maximum validation dataset size | 5000 examples, or 30% of the number of training examples if there are more than 1000 validation examples |
| Adapter size | Supported values are 1, 2, 4, 8, and 16 |
Best practices
Before you apply the preference optimization algorithm to your model, we strongly recommend that you do the following:
- Tune the model using supervised fine-tuning on the preferred response data. This teaches the model to generate preferred responses during inference.
- Continue tuning from the checkpoint produced in step 1 by using preference tuning. This teaches the model to increase the likelihood gap between preferred and dispreferred responses.
To create the supervised fine-tuning dataset, use the prompt and accepted (preferred) response pairs in your preference dataset as the prompt and target for your supervised fine-tuning dataset. One or two epochs of supervised fine-tuning are typically sufficient, although this can vary based on the dataset size and how well aligned your training dataset is with the Gemini model initially.
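As a minimal sketch of that conversion, the following Python snippet reuses the illustrative preference-record fields from the earlier example and emits one supervised fine-tuning record per preference example. Both the input and output field names are assumptions; follow the schemas in the respective data-preparation guides.

```python
import json

def preference_to_sft(preference_path: str, sft_path: str) -> None:
    """Turn each preference record into one SFT record: prompt plus accepted response."""
    with open(preference_path) as src, open(sft_path, "w") as dst:
        for line in src:
            record = json.loads(line)
            # Treat the highest-scoring completion as the accepted (preferred) response.
            preferred = max(record["completions"], key=lambda c: c["score"])
            sft_record = {
                # Prompt turns followed by the preferred model response as the target.
                "contents": record["contents"] + [preferred["completion"]],
            }
            dst.write(json.dumps(sft_record) + "\n")

preference_to_sft("preference_dataset.jsonl", "sft_dataset.jsonl")
```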
To use supervised fine-tuning to tune the model, follow the steps in Tune Gemini models by using supervised fine-tuning.
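The following sketch shows what step 1 might look like with the Vertex AI SDK's supervised tuning interface. The project, bucket paths, base model name, and epoch count are placeholders, and the exact SDK surface can vary by version; the preference tuning job in step 2 is then started from the resulting checkpoint, as described in the preference tuning how-to.

```python
import time
import vertexai
from vertexai.tuning import sft  # vertexai.preview.tuning in older SDK versions

# Placeholder project, region, and Cloud Storage paths -- replace with your own.
vertexai.init(project="your-project-id", location="us-central1")

# Step 1: supervised fine-tuning on the prompt / accepted-response pairs.
sft_job = sft.train(
    source_model="gemini-2.0-flash-001",  # placeholder base model name
    train_dataset="gs://your-bucket/sft_dataset.jsonl",
    epochs=2,  # one or two epochs is usually enough for this step
    tuned_model_display_name="sft-before-preference-tuning",
)

# Wait for the job to finish; its checkpoint becomes the starting point for
# the preference tuning job in step 2.
while not sft_job.has_ended:
    time.sleep(60)
    sft_job.refresh()

print(sft_job.tuned_model_name)
```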
Quota
Quota is enforced on the number of concurrent tuning jobs. Every project comes with a default quota to run at least one tuning job. This is a global quota, shared across all available regions and supported models. If you want to run more jobs concurrently, you need to request additional quota for Global concurrent tuning jobs.
Pricing
For Gemini preference tuning pricing, see Vertex AI pricing.
For pricing purposes, the number of tokens for each tuning example is calculated by multiplying the number of tokens in the prompt by 2, and then adding the number of completion tokens.
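For example, a tuning example with a 1,000-token prompt and a 500-token completion counts as 1,000 × 2 + 500 = 2,500 tokens. The same arithmetic as a small sketch:

```python
def billable_tokens(prompt_tokens: int, completion_tokens: int) -> int:
    # Prompt tokens are counted twice, then completion tokens are added.
    return prompt_tokens * 2 + completion_tokens

print(billable_tokens(1000, 500))  # 2500
```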
What's next