Gemini 2.5 Flash-Lite
Best for high-volume, cost-efficient tasks
Introducing 2.5 Flash-Lite, a thinking model built for low cost and low latency.
2.5 Flash-Lite excels at high-volume, latency-sensitive tasks like translation and classification.
Hands-on with 2.5 Flash-Lite
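As a quick orientation, here is a minimal sketch of a latency-sensitive classification call. It assumes the google-genai Python SDK, the `gemini-2.5-flash-lite` model ID, and a `GEMINI_API_KEY` in the environment; the ticket text and label set are illustrative.

```python
from google import genai
from google.genai import types

# Assumes the google-genai SDK and a GEMINI_API_KEY set in the environment.
client = genai.Client()

labels = ["billing", "technical issue", "feedback", "other"]
ticket = "My invoice for May was charged twice."

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",  # assumed model ID for 2.5 Flash-Lite
    contents=f"Classify this support ticket as one of {labels}: {ticket}",
    config=types.GenerateContentConfig(
        # Thinking is optional on Flash-Lite; a budget of 0 keeps latency lowest.
        thinking_config=types.ThinkingConfig(thinking_budget=0),
    ),
)
print(response.text)  # e.g. "billing"
```

Raising the thinking budget trades some latency for stronger reasoning on harder inputs.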
Performance
2.5 Flash-Lite shows all-round, significantly higher performance than 2.0 Flash-Lite across coding, math, science, reasoning, and multimodal benchmarks.
| Benchmark | Notes | Gemini 2.5 Flash-Lite (Thinking) | Gemini 2.5 Flash-Lite (Non-thinking) | Gemini 2.0 Flash |
|---|---|---|---|---|
| Reasoning & knowledge: Humanity's Last Exam (no tools) | | 6.9% | 5.1% | 5.1%* |
| Mathematics: AIME 2025 | | 63.1% | 49.8% | 29.7% |
| Code generation: LiveCodeBench | UI: 1/1/2025 - 5/1/2025 | 34.3% | 33.7% | 29.1% |
| Code editing: Aider Polyglot | | 27.1% | 26.7% | 21.3% |
| Agentic coding: SWE-bench Verified | single attempt | 27.6% | 31.6% | 21.4% |
| Agentic coding: SWE-bench Verified | multiple attempts | 44.9% | 42.6% | 34.2% |
| Factuality: SimpleQA | | 13.0% | 10.7% | 29.9% |
| Factuality: FACTS grounding | | 86.8% | 84.1% | 84.6% |
| Visual reasoning: MMMU | | 72.9% | 72.9% | 69.3% |
| Image understanding: Vibe-Eval (Reka) | | 57.5% | 51.3% | 55.4% |
| Long context: MRCR v2 | 128k (average) | 30.6% | 16.6% | 19.0% |
| Long context: MRCR v2 | 1M (pointwise) | 5.4% | 4.1% | 5.3% |
| Multilingual performance: Global MMLU (Lite) | | 84.5% | 81.1% | 83.4% |
Methodology
Gemini results: All Gemini scores are pass @1. "Single attempt" settings allow no majority voting or parallel test-time compute; "multiple attempts" settings allow test-time selection of the candidate answer. They are all run with the AI Studio API with default sampling settings. To reduce variance, we average over multiple trials for smaller benchmarks. The Aider Polyglot score is the average pass rate of 3 trials. Vibe-Eval results are reported using Gemini as a judge. Google's scaffolding for "multiple attempts" on SWE-bench includes drawing multiple trajectories and re-scoring them using the model's own judgment. For Aider, results differ from the official leaderboard due to a difference in the settings used for evaluation (non-default).
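Google's "multiple attempts" scaffolding is not published in detail; the snippet below is only an illustrative sketch of test-time selection, assuming the google-genai Python SDK, the `gemini-2.5-flash-lite` model ID, and a hypothetical judging prompt.

```python
from google import genai

client = genai.Client()  # assumes GEMINI_API_KEY in the environment
MODEL = "gemini-2.5-flash-lite"  # assumed model ID


def solve_with_multiple_attempts(task: str, n_attempts: int = 4) -> str:
    # Draw several independent candidate answers (no majority voting).
    candidates = [
        client.models.generate_content(model=MODEL, contents=task).text
        for _ in range(n_attempts)
    ]
    # Ask the model to judge its own candidates and pick one (test-time selection).
    listing = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(candidates))
    verdict = client.models.generate_content(
        model=MODEL,
        contents=(
            f"Task:\n{task}\n\nCandidate answers:\n{listing}\n\n"
            "Reply with only the index of the best candidate."
        ),
    ).text
    try:
        return candidates[int(verdict.strip())]
    except (ValueError, IndexError):
        return candidates[0]  # fall back to the first attempt if judging fails
```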
Result sources: Where provider numbers are not available, we report numbers from leaderboards covering these benchmarks: Humanity's Last Exam results are sourced from https://agi.safe.ai/ and https://scale.com/leaderboard/humanitys_last_exam, LiveCodeBench results are from https://livecodebench.github.io/leaderboard.html (1/1/2025 - 5/1/2025 in the UI), and Aider Polyglot numbers come from https://aider.chat/docs/leaderboards/. FACTS results come from https://www.kaggle.com/benchmarks/google/facts-grounding. For MRCR v2, which is not publicly available yet, we include 128k results as a cumulative score so they are comparable with other models, and a pointwise value for the 1M context window to show the capability of the model at full length. The methodology in this table has changed versus previously published MRCR v2 results, as we have decided to focus on a harder, 8-needle version of the benchmark going forward.
* These results are from an earlier HLE dataset, obtained from https://scale.com/leaderboard/humanitys_last_exam_preview
Model information
- Name
- 2.5 Flash-Lite
- Status
- General availability
- Input
- Text
- Image
- Video
- Audio
- Output
- Text
- Input tokens
- 1M
- Output tokens
- 64k
- Knowledge cutoff
- January 2025
- Tool use
- Search as a tool
- Code execution (see the usage sketch after this list)
- Best for
- High-volume, low-cost, and low-latency tasks
- Availability
- Google AI Studio
- Gemini API
- Vertex AI
- Documentation
- View developer docs
- Model card
- View model card
- Technical report
- View technical report
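To illustrate the tool-use entries above, here is a hedged sketch of Search as a tool and code execution through the Gemini API. It assumes the google-genai Python SDK and the `gemini-2.5-flash-lite` model ID; the prompts are illustrative, and tool configuration may differ on Vertex AI.

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY in the environment

# Search as a tool: let the model ground its answer with web results.
grounded = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Summarize this week's weather forecast for Zurich.",
    config=types.GenerateContentConfig(
        tools=[types.Tool(google_search=types.GoogleSearch())],
    ),
)
print(grounded.text)

# Code execution: let the model write and run code to compute an answer.
computed = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="What is the sum of the first 200 prime numbers?",
    config=types.GenerateContentConfig(
        tools=[types.Tool(code_execution=types.ToolCodeExecution())],
    ),
)
print(computed.text)
```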