Open-weight models lag state-of-the-art by around 3 months on average

Frontier open-weight models lag behind the most capable models by an average of 3 months in the Epoch Capabilities Index (ECI), our holistic measure of model capability. That corresponds to an average ECI gap of around 7 points, similar to the gap between o3 and GPT-5.

However, the gap varies considerably over time, sometimes even closing completely. Until the release of o1-mini, Llama 3.1-405B was rated on par with the closed-source state-of-the-art model, Claude 3.5 Sonnet.

For a more detailed analysis of the gap, see our earlier article.

Published

October 30, 2025

Epoch’s work is free to use, distribute, and reproduce provided the source and authors are credited under the Creative Commons BY license.

Overview

We calculate the average gap between closed-weight and open-weight state-of-the-art performance according to our internal capability metric, the Epoch Capabilities Index (ECI). ECI is a composite measure that captures performance across many benchmarks.

Analysis

To calculate the average time gap, we measure the horizontal distance between the closed-weight and open-weight state-of-the-art lines across the range of ECI values where such a horizontal line can be drawn. Since ECI is estimated with some noise, we count an open-weight model as “catching up” to a previous SOTA if the difference between their scores is not statistically significant. For example, DeepSeek-V2 is counted as having caught up to GPT-4 after about 14 months, despite getting an ECI score of 125 vs. GPT-4’s 126.
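The horizontal comparison can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the analysis: the function names, the toy release data, the fixed score tolerance `tol` (a simple stand-in for the statistical significance test described above), the clamping of negative gaps to zero, and the 30.44 days-per-month conversion are all hypothetical choices made for the example.

```python
from datetime import date

DAYS_PER_MONTH = 30.44  # assumed average month length, for unit conversion only


def first_reach(releases, level, tol=0.0):
    """Earliest release date whose score is at least `level - tol`, else None."""
    dates = [released for released, score in releases if score >= level - tol]
    return min(dates) if dates else None


def average_time_gap_months(closed, open_weight, levels, tol=0.0):
    """Mean horizontal distance (in months) between the two frontiers.

    `tol` is a hypothetical fixed tolerance: an open-weight model within
    `tol` points of a level is treated as having caught up to it.
    """
    gaps = []
    for level in levels:
        t_closed = first_reach(closed, level)
        t_open = first_reach(open_weight, level, tol)
        if t_closed is None or t_open is None:
            continue  # no horizontal line can be drawn at this level
        # Assumption: an open-weight lead is counted as a gap of zero.
        gaps.append(max((t_open - t_closed).days, 0) / DAYS_PER_MONTH)
    return sum(gaps) / len(gaps) if gaps else None


# Toy data (made up for illustration, not real ECI scores): (release date, score)
toy_closed = [(date(2024, 1, 1), 100), (date(2024, 7, 1), 110)]
toy_open = [(date(2024, 4, 1), 99), (date(2024, 10, 1), 110)]

gap_with_tol = average_time_gap_months(toy_closed, toy_open, [100, 110], tol=1)
gap_strict = average_time_gap_months(toy_closed, toy_open, [100, 110], tol=0)
print(f"gap with tolerance: {gap_with_tol:.2f} months")
print(f"gap without tolerance: {gap_strict:.2f} months")
```

In this toy data the tolerance matters: the open-weight model scoring 99 counts as matching the level of 100 only when `tol=1`, which roughly halves the measured time gap.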

We use a similar procedure to estimate the average ECI gap, taking the vertical distance across all dates where such a vertical line can be drawn.
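The vertical comparison can be sketched in the same spirit. Again, this is an illustrative sketch rather than the analysis code: the function names, the daily sampling grid, the toy release data, and the clamping of negative gaps to zero are assumptions introduced for the example.

```python
from datetime import date, timedelta


def frontier_score(releases, when):
    """Best score among models released on or before `when`, else None."""
    scores = [score for released, score in releases if released <= when]
    return max(scores) if scores else None


def average_score_gap(closed, open_weight, start, end, step_days=1):
    """Mean vertical distance between the two frontiers over a date grid."""
    gaps = []
    day = start
    while day <= end:
        best_closed = frontier_score(closed, day)
        best_open = frontier_score(open_weight, day)
        # Only dates where both frontiers exist allow a vertical line.
        if best_closed is not None and best_open is not None:
            # Assumption: an open-weight lead is counted as a gap of zero.
            gaps.append(max(best_closed - best_open, 0))
        day += timedelta(days=step_days)
    return sum(gaps) / len(gaps) if gaps else None


# Toy data (made up for illustration, not real ECI scores): (release date, score)
toy_closed_scores = [(date(2024, 1, 1), 100), (date(2024, 1, 11), 110)]
toy_open_scores = [(date(2024, 1, 6), 100)]

score_gap = average_score_gap(
    toy_closed_scores, toy_open_scores, date(2024, 1, 1), date(2024, 1, 15)
)
print(f"average score gap: {score_gap:.1f} points")
```

Here the first five days are skipped because no open-weight model exists yet, the next five days contribute a gap of 0, and the last five contribute a gap of 10 points.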

We find an average “horizontal” time gap of 3.5 months, with a 90% confidence interval of 1.1 to 5.3 months. Along the vertical dimension, we find an average gap of 7 ECI points, with a 90% confidence interval of 0 to 14 points.

We also note that the current gap likely appears larger than it really is; we do not yet have enough evaluations of frontier open-weight models like gpt-oss-120b or MiniMax-M2 to assign ECI scores, but it is likely that they improve on DeepSeek R1 (May 2025).

Code for our analysis is available here.
