Gemini 3 Pro
Best for complex tasks and bringing creative concepts to life
Learn, plan, and build like never before with Gemini 3 Pro’s incredible reasoning powers
Our most intelligent model yet
Partner with a pro
With state-of-the-art reasoning and multimodal capabilities
Get started
Build with Gemini 3
Hands-on
Explore what you can do with Gemini 3 Pro
Performance
Gemini 3 is state-of-the-art across a wide range of benchmarks
Our most intelligent model yet sets a new bar for AI model performance.
| Benchmark | Notes | Gemini 3 Pro | Gemini 2.5 Pro | Claude Sonnet 4.5 | GPT-5.1 |
|---|---|---|---|---|---|
| Academic reasoning: Humanity's Last Exam | No tools | 37.5% | 21.6% | 13.7% | 26.5% |
| Humanity's Last Exam | With search and code execution | 45.8% | — | — | — |
| Visual reasoning puzzles: ARC-AGI-2 | ARC Prize Verified | 31.1% | 4.9% | 13.6% | 17.6% |
| Scientific knowledge: GPQA Diamond | No tools | 91.9% | 86.4% | 83.4% | 88.1% |
| Mathematics: AIME 2025 | No tools | 95.0% | 88.0% | 87.0% | 94.0% |
| AIME 2025 | With code execution | 100.0% | — | 100.0% | — |
| Challenging math contest problems: MathArena Apex | — | 23.4% | 0.5% | 1.6% | 1.0% |
| Multimodal understanding and reasoning: MMMU-Pro | — | 81.0% | 68.0% | 68.0% | 76.0% |
| Screen understanding: ScreenSpot-Pro | — | 72.7% | 11.4% | 36.2% | 3.5% |
| Information synthesis from complex charts: CharXiv Reasoning | — | 81.4% | 69.6% | 68.5% | 69.5% |
| OCR: OmniDocBench 1.5 | Overall edit distance, lower is better | 0.115 | 0.145 | 0.145 | 0.147 |
| Knowledge acquisition from videos: Video-MMMU | — | 87.6% | 83.6% | 77.8% | 80.4% |
| Competitive coding problems: LiveCodeBench Pro | Elo rating, higher is better | 2,439 | 1,775 | 1,418 | 2,243 |
| Agentic terminal coding: Terminal-Bench 2.0 | Terminus-2 agent | 54.2% | 32.6% | 42.8% | 47.6% |
| Agentic coding: SWE-Bench Verified | Single attempt | 76.2% | 59.6% | 77.2% | 76.3% |
| Agentic tool use: τ2-bench | — | 85.4% | 54.9% | 84.7% | 80.2% |
| Long-horizon agentic tasks: Vending-Bench 2 | Net worth (mean), higher is better | $5,478.16 | $573.64 | $3,838.74 | $1,473.43 |
| Held-out internal grounding, parametric, multimodal, and search retrieval benchmarks: FACTS Benchmark Suite | — | 70.5% | 63.4% | 50.4% | 50.8% |
| Parametric knowledge: SimpleQA Verified | — | 72.1% | 54.5% | 29.3% | 34.9% |
| Multilingual Q&A: MMMLU | — | 91.8% | 89.5% | 89.1% | 91.0% |
| Commonsense reasoning across 100 languages and cultures: Global PIQA | — | 93.4% | 91.5% | 90.1% | 90.9% |
| Long-context performance: MRCR v2 (8-needle) | 128k (average) | 77.0% | 58.0% | 47.1% | 61.6% |
| MRCR v2 (8-needle) | 1M (pointwise) | 26.3% | 16.4% | not supported | not supported |
For details on our evaluation methodology, please see deepmind.google/models/evals-methodology/gemini-3-pro
Model information
- Name: Gemini 3 Pro
- Status: Preview
- Input: Text, Image, Video, Audio
- Output: Text
- Input tokens: 1M
- Output tokens: 64k
- Knowledge cutoff: January 2025
- Tool use: Function calling, Structured output, Search as a tool, Code execution
- Best for: Agentic tasks, Advanced coding, Long context understanding, Multimodal understanding, Algorithmic development
- Availability: Gemini App, Google Cloud / Vertex AI, Google AI Studio, Gemini API, Google AI Mode, Google Antigravity
- Documentation: View developer docs
- Model card: View model card
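The spec sheet above lists availability through the Gemini API with text, image, video, and audio input. As a minimal sketch of what a request looks like, the snippet below builds the JSON body for the REST `generateContent` endpoint with a single text turn; the model id `gemini-3-pro-preview` is an assumption for the preview release, so check the developer docs for the exact string before use.

```python
import json

# Assumed model id for the preview release; verify against the developer docs.
MODEL = "gemini-3-pro-preview"
ENDPOINT = (
    "https://generativelanguage.googleapis.com/v1beta/models/"
    f"{MODEL}:generateContent"
)

# Minimal generateContent request body: one user turn containing one text part,
# with an output cap well under the model's 64k output-token limit.
body = {
    "contents": [
        {
            "role": "user",
            "parts": [{"text": "Explain long-context retrieval in two sentences."}],
        }
    ],
    "generationConfig": {"maxOutputTokens": 1024},
}

print(ENDPOINT)
print(json.dumps(body, indent=2))
```

In practice you would POST this body to the endpoint with an `x-goog-api-key` header, or use one of the official client SDKs, which wrap the same request shape.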