Document tuning
This page provides prerequisites and detailed instructions for fine-tuning Gemini on document data using supervised learning.
Use cases
Fine-tuning lets you customize powerful language models for your specific needs. Here are some key use cases where fine-tuning with your own set of PDFs can significantly enhance a model's performance:
- Internal knowledge base: Convert your internal documents into an AI-powered knowledge base that provides instant answers and insights. For example, a sales representative could instantly access product specifications and pricing details from past training materials.
- Research assistant: Create a research assistant capable of analyzing a collection of research papers, articles, and books. A researcher studying climate change could quickly analyze scientific papers to identify trends in sea level rise or assess the effectiveness of different mitigation strategies.
- Legal or regulatory compliance: Fine-tuning on legal documents can help automate contract review, flagging potential inconsistencies or areas of risk. This allows legal professionals to focus on higher-level tasks while ensuring compliance.
- Automated report generation: Automate the analysis of complex financial reports, extracting key performance indicators and generating summaries for stakeholders. This can save time and reduce the risk of errors compared to manual analysis.
- Content summarization and analysis: Summarize lengthy PDF documents, extract key insights, and analyze trends. For example, a market research team could analyze a collection of customer surveys to identify key themes and sentiment.
- Document comparison and version control: Compare different versions of a document to identify changes and track revisions. This can be particularly useful in collaborative environments where multiple authors contribute to a document.
Limitations
Gemini 2.5 models
| Specification | Value |
|---|---|
| Maximum PDF pages per example | 300 |
| Maximum PDF files per example | 4 |
| Maximum PDF file size | 20 MB |
Gemini 2.0 Flash and Gemini 2.0 Flash-Lite
| Specification | Value |
|---|---|
| Maximum PDF pages per example | 300 |
| Maximum PDF files per example | 4 |
| Maximum PDF file size | 20 MB |
To learn more about document understanding requirements, see Document understanding.
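The limits listed above can be checked before you submit a tuning job. The following Python sketch is one possible pre-flight check, not part of any Google SDK. The helper names `pdf_uris` and `check_example` are hypothetical, and the caller is assumed to supply file sizes and page counts from its own tooling. The sketch also assumes that the 300-page limit applies to the total across all PDFs in an example and that the 20 MB limit means 20 × 1024 × 1024 bytes. Neither assumption is confirmed by this page.

```python
# Limits copied from the tables above.
MAX_PDF_FILES_PER_EXAMPLE = 4
MAX_PDF_PAGES_PER_EXAMPLE = 300
# Assumption: the 20 MB limit is interpreted as 20 * 1024 * 1024 bytes.
MAX_PDF_FILE_SIZE_BYTES = 20 * 1024 * 1024


def pdf_uris(example: dict) -> list[str]:
    """Return the fileUri of every PDF part in one dataset example."""
    uris = []
    for message in example.get("contents", []):
        for part in message.get("parts", []):
            file_data = part.get("fileData")
            if file_data and file_data.get("mimeType") == "application/pdf":
                uris.append(file_data.get("fileUri", ""))
    return uris


def check_example(
    example: dict,
    sizes_bytes: dict[str, int],
    page_counts: dict[str, int],
) -> list[str]:
    """Return a list of limit violations; an empty list means none were found.

    sizes_bytes and page_counts are hypothetical lookups keyed by fileUri.
    URIs missing from a lookup are skipped for that particular check.
    """
    problems = []
    uris = pdf_uris(example)

    if len(uris) > MAX_PDF_FILES_PER_EXAMPLE:
        problems.append(
            f"{len(uris)} PDF files exceeds the limit of {MAX_PDF_FILES_PER_EXAMPLE}"
        )

    total_pages = 0
    for uri in uris:
        size = sizes_bytes.get(uri)
        if size is not None and size > MAX_PDF_FILE_SIZE_BYTES:
            problems.append(f"{uri} is {size} bytes, over the file size limit")
        total_pages += page_counts.get(uri, 0)

    # Assumption: the page limit is the sum of pages across all PDFs.
    if total_pages > MAX_PDF_PAGES_PER_EXAMPLE:
        problems.append(
            f"{total_pages} pages exceeds the limit of {MAX_PDF_PAGES_PER_EXAMPLE}"
        )
    return problems
```

To use the sketch, parse each line of your dataset with `json.loads`, pass the result to `check_example`, and review any returned messages before you upload the file.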
Dataset format
The fileUri for your dataset can be the URI for a file in a Cloud Storage bucket, or it can be a publicly available HTTP or HTTPS URL.
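As an illustration, the following Python sketch assembles one dataset record from a PDF URI, a prompt, and a target response, and rejects URIs that use a scheme other than `gs`, `http`, or `https`. The function names `make_example` and `to_jsonl_line` are hypothetical helpers, not part of a Google library. The sketch assumes that the dataset file holds one JSON object per line, and it checks only the URI scheme, not whether the file exists or is publicly reachable.

```python
import json
from urllib.parse import urlparse

# Schemes that match the fileUri description above:
# Cloud Storage (gs) or a public HTTP or HTTPS URL.
ALLOWED_SCHEMES = {"gs", "http", "https"}


def make_example(file_uri: str, prompt: str, response: str) -> dict:
    """Build one user/model exchange that references a single PDF."""
    scheme = urlparse(file_uri).scheme
    if scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"Unsupported fileUri scheme: {scheme!r}")
    return {
        "contents": [
            {
                "role": "user",
                "parts": [
                    {
                        "fileData": {
                            "mimeType": "application/pdf",
                            "fileUri": file_uri,
                        }
                    },
                    {"text": prompt},
                ],
            },
            {"role": "model", "parts": [{"text": response}]},
        ]
    }


def to_jsonl_line(example: dict) -> str:
    """Serialize a record as a single line (assumed one JSON object per line)."""
    return json.dumps(example, separators=(",", ":"))


if __name__ == "__main__":
    record = make_example(
        "gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf",
        "Please summarize the given document.",
        "A placeholder summary written by a human annotator.",
    )
    print(to_jsonl_line(record))
```

Writing each record with `to_jsonl_line` followed by a newline produces a file with the same shape as the single-line example on this page.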
To see the generic format example, see Dataset example for Gemini.
The following is an example of a document dataset.
{"contents":[{"role":"user","parts":[{"fileData":{"mimeType":"application/pdf","fileUri":"gs://cloud-samples-data/generative-ai/pdf/2403.05530.pdf"}},{"text":"You are a very professional document summarization specialist. Please summarize the given document."}]},{"role":"model","parts":[{"text":"The report introduces Gemini 2.0 Flash, a multimodal AI model developed by Google DeepMind. The report positions Gemini 2.0 Flash as a significant advancement in multimodal AI, pushing the boundaries of long-context understanding and opening new avenues for future research and applications."}]}]}
What's next
- To learn more about the document understanding capability of Gemini models, see the Document understanding overview.
- To start tuning, see Tune Gemini models by using supervised fine-tuning.
- To learn how supervised fine-tuning can be used in a solution that builds a generative AI knowledge base, see Jump Start Solution: Generative AI knowledge base.
Last updated 2025-12-16 UTC.