Google DeepMind

Gemini 3 Pro

Best for complex tasks and bringing creative concepts to life

Learn, plan, and build like never before with Gemini 3 Pro’s incredible reasoning powers

Our most intelligent model yet

Partner with a pro

Gemini 3 Pro brings state-of-the-art reasoning and multimodal capabilities to every task.


Get started

Build with Gemini 3

Google AI Studio

Leap from prompt to production

Gemini API

Get started building with cutting-edge AI models
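As a minimal sketch of what a first Gemini API call looks like, the `generateContent` REST endpoint takes a JSON body with a list of `contents`, each holding `parts`. The model ID `gemini-3-pro-preview` below is an assumption for illustration; check the developer docs for the current ID. This sketch only builds the request, without sending it:

```python
import json

# Minimal sketch of a Gemini API generateContent request.
# The model ID "gemini-3-pro-preview" is an assumption for illustration;
# consult the developer docs for the current preview model name.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"
MODEL = "gemini-3-pro-preview"

def build_request(prompt: str) -> tuple[str, str]:
    """Return (url, json_body) for a generateContent call with a text prompt."""
    url = f"{API_ROOT}/models/{MODEL}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return url, json.dumps(body)

url, body = build_request("Summarize this document in three bullet points.")
print(url)
print(body)
```

An actual call would POST this body to the URL with an `x-goog-api-key` header, or go through one of the official SDKs instead of raw REST.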

Hands-on

Explore what you can do with Gemini 3 Pro


Performance

Gemini 3 is state-of-the-art across a wide range of benchmarks

Our most intelligent model yet sets a new bar for AI model performance.

Bar chart, "Humanity’s Last Exam" (reasoning and knowledge): Gemini 3 Pro achieves the highest score at 37.5%, followed by GPT-5.1 at 26.5%, Gemini 2.5 at 21.6%, and Claude Sonnet 4.5 at 13.7%.
Bar chart, "Terminal-Bench 2.0" (agentic coding): Gemini 3 Pro scores highest at 54.2%, followed by GPT-5.1 at 47.6%, Claude Sonnet 4.5 at 42.8%, and Gemini 2.5 at 32.6%.
Bar chart, "SimpleQA Verified" (parametric knowledge): Gemini 3 Pro shows a significant lead with 72.1%, followed by Gemini 2.5 at 54.5%, GPT-5.1 at 34.9%, and Claude Sonnet 4.5 at 29.3%.
| Benchmark | Notes | Gemini 3 Pro | Gemini 2.5 Pro | Claude Sonnet 4.5 | GPT-5.1 |
| --- | --- | --- | --- | --- | --- |
| Academic reasoning (Humanity's Last Exam) | No tools | 37.5% | 21.6% | 13.7% | 26.5% |
| | With search and code execution | 45.8% | | | |
| Visual reasoning puzzles (ARC-AGI-2) | ARC Prize Verified | 31.1% | 4.9% | 13.6% | 17.6% |
| Scientific knowledge (GPQA Diamond) | No tools | 91.9% | 86.4% | 83.4% | 88.1% |
| Mathematics (AIME 2025) | No tools | 95.0% | 88.0% | 87.0% | 94.0% |
| | With code execution | 100.0% | | 100.0% | |
| Challenging math contest problems (MathArena Apex) | | 23.4% | 0.5% | 1.6% | 1.0% |
| Multimodal understanding and reasoning (MMMU-Pro) | | 81.0% | 68.0% | 68.0% | 76.0% |
| Screen understanding (ScreenSpot-Pro) | | 72.7% | 11.4% | 36.2% | 3.5% |
| Information synthesis from complex charts (CharXiv Reasoning) | | 81.4% | 69.6% | 68.5% | 69.5% |
| OCR (OmniDocBench 1.5) | Overall edit distance, lower is better | 0.115 | 0.145 | 0.145 | 0.147 |
| Knowledge acquisition from videos (Video-MMMU) | | 87.6% | 83.6% | 77.8% | 80.4% |
| Competitive coding problems (LiveCodeBench Pro) | Elo rating, higher is better | 2,439 | 1,775 | 1,418 | 2,243 |
| Agentic terminal coding (Terminal-Bench 2.0) | Terminus-2 agent | 54.2% | 32.6% | 42.8% | 47.6% |
| Agentic coding (SWE-Bench Verified) | Single attempt | 76.2% | 59.6% | 77.2% | 76.3% |
| Agentic tool use (τ2-bench) | | 85.4% | 54.9% | 84.7% | 80.2% |
| Long-horizon agentic tasks (Vending-Bench 2) | Net worth (mean), higher is better | $5,478.16 | $573.64 | $3,838.74 | $1,473.43 |
| Held-out internal grounding, parametric, multimodal, and search retrieval benchmarks (FACTS Benchmark Suite) | | 70.5% | 63.4% | 50.4% | 50.8% |
| Parametric knowledge (SimpleQA Verified) | | 72.1% | 54.5% | 29.3% | 34.9% |
| Multilingual Q&A (MMMLU) | | 91.8% | 89.5% | 89.1% | 91.0% |
| Commonsense reasoning across 100 languages and cultures (Global PIQA) | | 93.4% | 91.5% | 90.1% | 90.9% |
| Long-context performance (MRCR v2, 8-needle) | 128k (average) | 77.0% | 58.0% | 47.1% | 61.6% |
| | 1M (pointwise) | 26.3% | 16.4% | not supported | not supported |
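LiveCodeBench Pro reports Elo ratings rather than percentages. As a rough way to read an Elo gap, the standard Elo model maps a rating difference to an expected head-to-head win probability; applying that formula to these benchmark ratings is an assumption for intuition only, since the benchmark's own pairing methodology may differ:

```python
# Standard Elo win expectancy: P(A beats B) = 1 / (1 + 10 ** ((R_B - R_A) / 400)).
# Using it on LiveCodeBench Pro ratings is an illustrative assumption, not
# part of the benchmark's published methodology.
def elo_win_probability(r_a: float, r_b: float) -> float:
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Ratings from the table above: Gemini 3 Pro (2,439) vs GPT-5.1 (2,243).
p = elo_win_probability(2439, 2243)
print(f"{p:.3f}")
```

Under this reading, a roughly 200-point gap corresponds to winning about three head-to-head matchups out of four.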

Name: Gemini 3 Pro
Status: Preview
Input tokens: 1M
Output tokens: 64k
Knowledge cutoff: January 2025
Documentation: View developer docs
Model card: View model card
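The spec sheet lists a 1M-token input window and 64k output tokens. A rough pre-flight check against those limits can be sketched with the common ~4-characters-per-token heuristic; that ratio, and reading "64k" as 64 × 1024, are assumptions here, so use the API's token-counting endpoint for exact numbers:

```python
# Rough pre-flight check against Gemini 3 Pro's published limits
# (1M input tokens, 64k output tokens). The 4-chars-per-token ratio is a
# crude English-text heuristic, not the model's actual tokenizer, and
# "64k" is read as 64 * 1024 here as an assumption.
INPUT_TOKEN_LIMIT = 1_000_000
OUTPUT_TOKEN_LIMIT = 65_536

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Crude token estimate; real counts come from the API's token counter."""
    return max(1, round(len(text) / chars_per_token))

def within_limits(prompt: str, max_output_tokens: int) -> bool:
    """True if the prompt estimate and requested output fit both budgets."""
    return (estimate_tokens(prompt) <= INPUT_TOKEN_LIMIT
            and max_output_tokens <= OUTPUT_TOKEN_LIMIT)

print(within_limits("Summarize the benchmark table.", 4_096))
```

Input and output budgets are checked separately because they are independent limits rather than a shared window.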
