Image tuning

This page provides prerequisites and detailed instructions for fine-tuning Gemini on image data using supervised learning.

Use cases

Fine-tuning lets you adapt base Gemini models for specialized tasks. Here are some image use cases:

  • Product catalog enhancement: Extract key attributes from images (e.g., brand, color, size) to automatically build and enrich your product catalog.
  • Image moderation: Fine-tune a model to detect and flag inappropriate or harmful content in images, helping ensure a safer online experience.
  • Visual inspection: Train a model to identify specific objects or defects within images, automating quality control or inspection processes.
  • Image classification: Improve the accuracy of image classification for specific domains, such as medical imaging or satellite imagery analysis.
  • Image-based recommendations: Analyze images to provide personalized recommendations, such as suggesting similar products or complementary items.
  • Table content extraction: Extract data from tables within images and convert it into structured formats like spreadsheets or databases.

Limitations

  • Maximum images per example: 30
  • Maximum image file size: 20MB
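A minimal sketch for checking these limits locally before you start a tuning job. It assumes each example is one parsed JSONL object in the format shown under Dataset format, and that you measure image byte sizes yourself. The helper names (count_image_parts, check_example), the bucket gs://my-bucket, and the reading of "20MB" as 20 * 1024 * 1024 bytes are all assumptions for illustration.

```python
import json
from typing import List, Optional

# Limits listed on this page.
MAX_IMAGES_PER_EXAMPLE = 30
# Assumption: "20MB" is read as 20 * 1024 * 1024 bytes; confirm the exact
# cutoff against the Image understanding page.
MAX_IMAGE_BYTES = 20 * 1024 * 1024


def count_image_parts(example: dict) -> int:
    """Count fileData parts whose mimeType starts with "image/" across all turns."""
    count = 0
    for content in example.get("contents", []):
        for part in content.get("parts", []):
            file_data = part.get("fileData")
            if file_data and file_data.get("mimeType", "").startswith("image/"):
                count += 1
    return count


def check_example(example: dict, image_sizes: Optional[List[int]] = None) -> List[str]:
    """Return limit violations for one example; an empty list means none found.

    image_sizes is an optional list of byte sizes measured separately. This
    sketch does not download anything from Cloud Storage or HTTP(S).
    """
    problems = []
    n = count_image_parts(example)
    if n > MAX_IMAGES_PER_EXAMPLE:
        problems.append(f"{n} images exceeds the limit of {MAX_IMAGES_PER_EXAMPLE}")
    for i, size in enumerate(image_sizes or []):
        if size > MAX_IMAGE_BYTES:
            problems.append(f"image {i} is {size} bytes, over {MAX_IMAGE_BYTES} bytes")
    return problems


# One JSONL line in the dataset format (hypothetical bucket and object name).
line = '{"contents":[{"role":"user","parts":[{"fileData":{"mimeType":"image/jpeg","fileUri":"gs://my-bucket/a.jpeg"}},{"text":"Describe this image."}]}]}'
print(check_example(json.loads(line)))  # []
```

Running a check like this over every line of a JSONL file catches oversized examples early, rather than after a tuning job has been submitted.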

To learn more about image sample requirements, see the Image understanding page.

Dataset format

The fileUri for your dataset can be the URI of a file in a Cloud Storage bucket, or a publicly available HTTP or HTTPS URL.
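As a quick sanity check, you can verify that each fileUri uses one of the two accepted forms before building the dataset. This is a sketch with a hypothetical helper, is_supported_file_uri; it only checks the shape of the URI and cannot confirm that the object exists or that an HTTP(S) URL is actually public.

```python
from urllib.parse import urlparse

# Schemes accepted for fileUri according to this page:
# Cloud Storage (gs://) or a public HTTP/HTTPS URL.
ALLOWED_SCHEMES = {"gs", "http", "https"}


def is_supported_file_uri(uri: str) -> bool:
    """Return True if uri has an accepted scheme, a bucket/host, and an object path.

    Shape check only: existence and public reachability are not verified.
    """
    parsed = urlparse(uri)
    return (
        parsed.scheme in ALLOWED_SCHEMES
        and bool(parsed.netloc)  # bucket name or host
        and bool(parsed.path.strip("/"))  # object path
    )


print(is_supported_file_uri(
    "gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/image/longcap100/100.jpeg"
))  # True
print(is_supported_file_uri("file:///tmp/cat.jpg"))  # False
```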

The mediaResolution field of the GenerationConfig object controls the tradeoff between the quality of the media file sent for tuning and the number of tokens used to represent that media. A higher resolution lets the model perceive more detail, which can lead to more nuanced behavior in the tuned model, but it also uses more tokens. This setting doesn't change the image dimensions sent to the model. When mediaResolution isn't specified, the default resolution is MEDIA_RESOLUTION_HIGH.

The following mediaResolution values are supported for fine-tuning image data:

  • MEDIA_RESOLUTION_LOW: 64 tokens
  • MEDIA_RESOLUTION_MEDIUM: 256 tokens
  • MEDIA_RESOLUTION_HIGH: 256 tokens + (256 tokens * number of pan and scan images)

For a generic format example, see Dataset example for Gemini.

The following is an example of an image dataset.

{"contents":[{"role":"user","parts":[{"fileData":{"mimeType":"image/jpeg","fileUri":"gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/image/longcap100/100.jpeg"}},{"text":"Describe this image in detail that captures the essence of it."}]},{"role":"model","parts":[{"text":"A man stands on a road, wearing a blue denim jacket, tan pants, and white sneakers. He has his hands in his pockets and is wearing a white t-shirt under his jacket. The man's pants are cuffed, and his shoes are white. The road is dark grey, and the leaves are green. The man is standing in the shade, and the light is shining on the ground."}]}],"generationConfig":{"mediaResolution":"MEDIA_RESOLUTION_LOW"}}
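If you generate training data programmatically, a small helper can emit lines in the same shape as the example above. This is a sketch: make_image_example is a hypothetical helper that covers only a single image plus one prompt/response pair, and the mimeType default of image/jpeg is an assumption you should change to match your files.

```python
import json
from typing import Optional


def make_image_example(
    file_uri: str,
    prompt: str,
    response: str,
    mime_type: str = "image/jpeg",
    media_resolution: Optional[str] = None,
) -> str:
    """Build one JSONL line: a user turn (image + text) and a model turn (text)."""
    example = {
        "contents": [
            {
                "role": "user",
                "parts": [
                    {"fileData": {"mimeType": mime_type, "fileUri": file_uri}},
                    {"text": prompt},
                ],
            },
            {"role": "model", "parts": [{"text": response}]},
        ]
    }
    # Omit generationConfig to fall back to the default MEDIA_RESOLUTION_HIGH.
    if media_resolution is not None:
        example["generationConfig"] = {"mediaResolution": media_resolution}
    # json.dumps escapes newlines inside strings, so the result stays on one line.
    return json.dumps(example, separators=(",", ":"))


print(make_image_example(
    "gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/image/longcap100/100.jpeg",
    "Describe this image in detail that captures the essence of it.",
    "A man stands on a road, wearing a blue denim jacket.",
    media_resolution="MEDIA_RESOLUTION_LOW",
))
```

Writing one such line per example to a .jsonl file produces a dataset in the format shown above.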

Sample datasets

You can use the following sample datasets to learn how to tune a Gemini model. To use these datasets, specify their URIs in the applicable parameters when you create a supervised fine-tuning job.

To use the sample tuning dataset, specify its location as follows:

"training_dataset_uri":"gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/image/sft_train_data.jsonl",

To use the sample validation dataset, specify its location as follows:

"validation_dataset_uri":"gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/image/sft_validation_data.jsonl",
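For context, the two URIs above are typically placed inside the supervised tuning spec of a tuning job request. The sketch below only assembles a request body as a Python dict and prints it as JSON; it does not call any API. The surrounding field names (base_model, supervised_tuning_spec, tuned_model_display_name) are assumed to follow the Vertex AI tuningJobs.create REST resource, and the base model and display name are placeholders, so check the API reference before sending a request like this.

```python
import json

# Sample dataset URIs from this page.
TRAIN_URI = (
    "gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/image/sft_train_data.jsonl"
)
VALIDATION_URI = (
    "gs://cloud-samples-data/ai-platform/generative_ai/gemini-2_0/image/sft_validation_data.jsonl"
)


def build_tuning_request(base_model: str, display_name: str) -> dict:
    """Assemble a supervised tuning job request body (assumed field layout)."""
    return {
        "base_model": base_model,
        "supervised_tuning_spec": {
            "training_dataset_uri": TRAIN_URI,
            "validation_dataset_uri": VALIDATION_URI,
        },
        "tuned_model_display_name": display_name,
    }


# Placeholders: substitute a tunable base model available in your project
# and your own display name.
body = build_tuning_request("gemini-2.0-flash-001", "image-tuning-demo")
print(json.dumps(body, indent=2))
```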


Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2026-02-19 UTC.