About preference tuning for Gemini models

Vertex AI preference tuning lets you tune your Gemini models with human feedback data.

Preference tuning enables the model to learn from subjective user preferences that are hard to capture with specific labels or through supervised fine-tuning alone.

The preference tuning input dataset contains examples consisting of a prompt and a pair of responses indicating which one is preferred and which one is dispreferred. The model learns to generate preferred responses with higher probability and dispreferred responses with lower probability.
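As a concrete illustration, a single training example can be represented as one JSON Lines record pairing a prompt with a preferred and a dispreferred response. The field names below are illustrative assumptions; see the dataset-preparation guide linked below for the exact schema Vertex AI expects.

```python
import json

# One preference-tuning example: a prompt plus a preferred/dispreferred
# response pair. Field names are assumptions for illustration only.
example = {
    "prompt": "Summarize the report in one sentence.",
    "preferred_response": "The report finds revenue grew 12% year over year.",
    "dispreferred_response": "The report is about revenue.",
}

# Training datasets of this kind are commonly supplied as JSON Lines:
# one serialized example per line.
line = json.dumps(example)
print(line)
```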

To learn how to prepare the dataset, see Prepare preference tuning data for Gemini models.

Supported models

The following Gemini models support preference tuning:

Limitations

Specification and value:

- Modalities: Text
- File size of the training dataset: 1 GB
- Maximum input and output tokens per training example: 131,072
- Maximum input and output serving tokens: Same as the base Gemini model
- Maximum number of training examples in a training dataset: 10 million text-only training examples
- Maximum validation dataset size: 5,000 examples, or 30% of the number of training examples if there are more than 1,000 validation examples
- Adapter size: Supported values are 1, 2, 4, 8, and 16

Best practices

Before you apply the preference optimization algorithm to your model, we strongly recommend that you do the following:

  1. Tune the model using supervised fine-tuning on the preferred response data. This teaches the model to generate preferred responses during inference.
  2. Continue tuning from the checkpoint produced in step 1 using preference tuning. This teaches the model to increase the likelihood gap between preferred and dispreferred responses.

To create the supervised fine-tuning dataset, use the prompt and preferred-response pairs in your preference dataset as the prompt and target for your supervised fine-tuning dataset. Typically, one or two epochs of supervised fine-tuning are sufficient, although this can vary with the dataset size and how well aligned your training dataset is with the Gemini model initially.
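The conversion described above can be sketched as a small transformation: keep each prompt, promote the preferred response to the supervised target, and drop the dispreferred response. The field names are illustrative assumptions, not the exact Vertex AI schema.

```python
import json

def preference_to_sft(preference_examples):
    """Map preference examples to supervised fine-tuning examples.

    Each (prompt, preferred, dispreferred) triple becomes a (prompt, target)
    pair; the dispreferred response is discarded. Field names here are
    illustrative assumptions.
    """
    return [
        {"prompt": ex["prompt"], "target": ex["preferred_response"]}
        for ex in preference_examples
    ]

preference_dataset = [
    {
        "prompt": "Summarize the report in one sentence.",
        "preferred_response": "Revenue grew 12% year over year.",
        "dispreferred_response": "The report is about revenue.",
    }
]

sft_dataset = preference_to_sft(preference_dataset)
print(json.dumps(sft_dataset[0]))
```

The resulting (prompt, target) records can then be serialized as JSON Lines and used for the supervised fine-tuning stage in step 1.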

To use supervised fine-tuning to tune the model, follow the steps in Tune Gemini models by using supervised fine-tuning.

Quota

Quota is enforced on the number of concurrent tuning jobs. Every project comes with a default quota to run at least one tuning job. This is a global quota, shared across all available regions and supported models. If you want to run more jobs concurrently, you need to request additional quota for Global concurrent tuning jobs.

Pricing

For pricing for Gemini preference tuning, see Vertex AI pricing.

For pricing purposes, the number of tokens for each tuning example is calculatedby multiplying the number of tokens in the prompt by 2, and then adding thenumber of completion tokens.
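The token-counting rule above reduces to a one-line formula, sketched here for clarity (the function name is illustrative):

```python
def billable_tokens(prompt_tokens: int, completion_tokens: int) -> int:
    """Billable tokens per preference-tuning example.

    Per the pricing rule stated above: the prompt token count is
    multiplied by 2, then the completion token count is added.
    """
    return 2 * prompt_tokens + completion_tokens

# A 100-token prompt with 40 completion tokens:
print(billable_tokens(100, 40))  # 2 * 100 + 40 = 240
```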

What's next


Last updated 2026-02-19 UTC.