About preference tuning for Gemini models
To see an example of preference tuning, run the "Get Started with Gemini Preference Optimization" notebook in one of the following environments:
Open in Colab | Open in Colab Enterprise | Open in Vertex AI Workbench | View on GitHub
Vertex AI preference tuning lets you tune your Gemini models with human feedback data.
Preference tuning enables the model to learn from subjective user preferences that are hard to capture with specific labels or through supervised fine-tuning alone.
The preference tuning input dataset contains examples that consist of a prompt and a pair of responses indicating which one is preferred and which one is dispreferred. The model learns to generate preferred responses with higher probability and dispreferred responses with lower probability.
To learn how to prepare the dataset, see Prepare preference tuning data for Gemini models.
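To get a feel for the shape of these examples, the following Python sketch writes one preference record to a JSON Lines file. The field names used here (`contents`, `completions`, `score`) are assumptions made for illustration; use the authoritative schema from the data-preparation guide linked above.

```python
import json

# One hypothetical preference example: a prompt plus a preferred and a
# dispreferred response. The field names are illustrative assumptions only;
# see the data-preparation guide for the actual schema.
example = {
    "contents": [
        {"role": "user", "parts": [{"text": "Summarize the meeting notes in two sentences."}]},
    ],
    "completions": [
        {
            # A higher score marks the preferred response.
            "score": 1,
            "completion": {"role": "model", "parts": [{"text": "A concise, accurate summary."}]},
        },
        {
            # A lower score marks the dispreferred response.
            "score": 0,
            "completion": {"role": "model", "parts": [{"text": "A rambling, off-topic reply."}]},
        },
    ],
}

# Preference tuning datasets are supplied as JSON Lines, one example per line.
with open("preference_dataset.jsonl", "w") as f:
    f.write(json.dumps(example) + "\n")
```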
Supported models
The following Gemini models support preference tuning:
Limitations
| Specification | Value |
|---|---|
| Modalities | Text |
| Maximum training dataset file size | 1 GB |
| Maximum input and output tokens per training example | 131,072 |
| Maximum input and output serving tokens | Same as base Gemini model |
| Maximum number of training examples in a training dataset | 10M text-only training examples |
| Maximum validation dataset size | 5000 examples, or 30% of the number of training examples if there are more than 1000 validation examples |
| Adapter size | Supported values are 1, 2, 4, 8, and 16 |
Best practices
Before you apply the preference optimization algorithm to your model, we strongly recommend that you do the following:
- Tune the model using supervised fine-tuning on the preferred response data. This teaches the model to generate preferred responses during inference.
- Continue tuning from the checkpoint produced in step 1 by using preference tuning. This teaches the model to increase the likelihood gap between preferred and dispreferred responses.
To create the supervised fine-tuning dataset, use the prompt and accepted (preferred) response pairs in your preference dataset as the prompt and target for your supervised fine-tuning dataset. One or two epochs of supervised fine-tuning are typically sufficient, although this can vary based on the dataset size and how well aligned your training dataset is with the Gemini model initially.
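As a minimal sketch of that conversion, the following Python snippet reuses the illustrative preference-record fields from the earlier example and emits one supervised fine-tuning record per preference example. Both the input and output field names are assumptions; follow the schemas in the respective data-preparation guides.

```python
import json

def preference_to_sft(preference_path: str, sft_path: str) -> None:
    """Turn each preference record into one SFT record: prompt plus accepted response."""
    with open(preference_path) as src, open(sft_path, "w") as dst:
        for line in src:
            record = json.loads(line)
            # Treat the highest-scoring completion as the accepted (preferred) response.
            preferred = max(record["completions"], key=lambda c: c["score"])
            sft_record = {
                # Prompt turns followed by the preferred model response as the target.
                "contents": record["contents"] + [preferred["completion"]],
            }
            dst.write(json.dumps(sft_record) + "\n")

preference_to_sft("preference_dataset.jsonl", "sft_dataset.jsonl")
```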
To use supervised fine-tuning to tune the model, follow the steps in Tune Gemini models by using supervised fine-tuning.
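The following sketch shows what step 1 might look like with the Vertex AI SDK's supervised tuning interface. The project, bucket paths, base model name, and epoch count are placeholders, and the exact SDK surface can vary by version; the preference tuning job in step 2 is then started from the resulting checkpoint, as described in the preference tuning how-to.

```python
import time
import vertexai
from vertexai.tuning import sft  # vertexai.preview.tuning in older SDK versions

# Placeholder project, region, and Cloud Storage paths -- replace with your own.
vertexai.init(project="your-project-id", location="us-central1")

# Step 1: supervised fine-tuning on the prompt / accepted-response pairs.
sft_job = sft.train(
    source_model="gemini-2.0-flash-001",  # placeholder base model name
    train_dataset="gs://your-bucket/sft_dataset.jsonl",
    epochs=2,  # one or two epochs is usually enough for this step
    tuned_model_display_name="sft-before-preference-tuning",
)

# Wait for the job to finish; its checkpoint becomes the starting point for
# the preference tuning job in step 2.
while not sft_job.has_ended:
    time.sleep(60)
    sft_job.refresh()

print(sft_job.tuned_model_name)
```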
Quota
Quota is enforced on the number of concurrent tuning jobs. Every project comes with a default quota to run at least one tuning job. This is a global quota, shared across all available regions and supported models. If you want to run more jobs concurrently, you need to request additional quota for Global concurrent tuning jobs.
Pricing
For Gemini preference tuning pricing, see Vertex AI pricing.
For pricing purposes, the number of tokens for each tuning example is calculated by multiplying the number of tokens in the prompt by 2, and then adding the number of completion tokens.
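For example, a tuning example with a 1,000-token prompt and a 500-token completion counts as 1,000 × 2 + 500 = 2,500 tokens. The same arithmetic as a small sketch:

```python
def billable_tokens(prompt_tokens: int, completion_tokens: int) -> int:
    # Prompt tokens are counted twice, then completion tokens are added.
    return prompt_tokens * 2 + completion_tokens

print(billable_tokens(1000, 500))  # 2500
```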
What's next