paper

Machine learning model sizes and the parameter gap

Since 2018, the model size of notable machine learning systems has grown ten times faster than before. Growth after 2020 has not been entirely continuous: there was a jump of one order of magnitude which persists to this day. This is relevant for forecasting model size and thus AI capabilities.

Published: Jul 5, 2022


Trends in model size

In current ML systems, model size (number of parameters) is related to performance via known scaling laws. We used our dataset to analyze trends in the model size of 237 milestone machine learning systems. The systems are categorized into Language, Vision, Games and Other according to the task they solve.
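As an illustration of how such scaling laws relate size to performance, language-model loss is often modeled as a power law in parameter count, $$L(N) = (N_c / N)^\alpha$$. A minimal sketch; the constants below follow the shape of published scaling laws but are illustrative, not values from this paper:

```python
def loss(n_params, n_c=8.8e13, alpha=0.076):
    """Illustrative power-law scaling of loss with model size N.

    L(N) = (N_c / N) ** alpha. The constants mimic the shape of published
    language-model scaling laws but are illustrative, not from this paper.
    """
    return (n_c / n_params) ** alpha

# Growing the model 10x multiplies loss by 10 ** -alpha (~16% lower here).
improvement = 1 - loss(1e10) / loss(1e9)
```

Under a power law of this form, every additional order of magnitude of parameters buys the same multiplicative loss reduction, which is one reason model size trends are informative about capabilities.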

Model size slowly increased by 7 orders of magnitude from the 1950s to around 2018. Since 2018, growth has accelerated for language models, with model size increasing by another 4 orders of magnitude in the four years from 2018 to 2022 (see Figure 1). Other domains like vision have grown at a more moderate pace, but still faster than before 2018.
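Growth rates like these can be estimated with a log-linear regression of parameter count against time. A minimal sketch of the procedure, using invented data points rather than the paper's dataset:

```python
import numpy as np

# Invented (year, parameter count) pairs -- not the paper's dataset.
years = np.array([2018.0, 2019.0, 2020.0, 2021.0, 2022.0])
params = np.array([3e8, 1.5e9, 1e10, 1e11, 5e11])

# Fit log10(params) = slope * year + intercept via least squares;
# the slope is the growth rate in orders of magnitude (OOMs) per year.
slope, intercept = np.polyfit(years, np.log10(params), 1)
```

On these invented points the fitted slope is a bit under one OOM/year, comparable in magnitude to the post-2018 language-model trend described above.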

Figure 1. Left: Transition period around 2018, assuming a single post-2018 trend. Right: the same period, assuming two separate post-2018 trends.

| Period | Data | Scale (start to end) | Slope | Doubling time | $$R^2$$ |
|---|---|---|---|---|---|
| 1952 to 2018 | $$n=109$$ | 1e+01 to 3e+7 params | 0.1 OOMs/year [0.1; 0.1; 0.1] | 39.7 months [36.4; 39.7; 40.7] | 0.62 |
| 2018 to 2022 (single trend) | $$n=129$$ | 3e+7 to 2e+12 params | 0.9 OOMs/year [0.9; 0.9; 1.0] | 4.0 months [3.5; 4.0; 4.3] | 0.31 |
| 2018 to 2022 (above gap) | $$n=27$$ | 7e+10 to 2e+12 params | -0.1 OOMs/year [-0.4; -0.1; 0.2] | -14.2 months [-52.5; -14.2; 52.0] | 0.00 |
| 2018 to 2022 (below gap) | $$n=102$$ | 3e+7 to 2e+10 params | 0.5 OOMs/year [0.4; 0.5; 0.5] | 8.0 months [7.0; 8.0; 9.8] | 0.25 |

Table 1. Summary of our main results. Around 2018 there was a general acceleration in growth. This can be decomposed into the pre-2018 trend increasing its growth rate, plus a separate cluster of very large models appearing on top of it.
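The slope and doubling-time columns in Table 1 are two views of the same fit: a growth rate of $$s$$ OOMs/year corresponds to a doubling time of $$12 \log_{10}(2) / s$$ months. A quick check against the table (small discrepancies arise because the displayed slopes are rounded):

```python
import math

def doubling_time_months(ooms_per_year):
    """Months for model size to double at a growth rate in OOMs/year."""
    return 12 * math.log10(2) / ooms_per_year

single_trend = doubling_time_months(0.9)  # ~4.0 months, matching Table 1
pre_2018 = doubling_time_months(0.1)      # ~36 months vs the table's 39.7;
                                          # the unrounded slope is closer to 0.09
```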

The parameter gap

Starting in 2020, we see many models below 20B parameters and above 70B parameters, but very few in the 20B-70B range. We refer to this scarcity as the parameter gap (see Figure 2).

Figure 2: Model size over time, separated by domain. Red lines highlight the parameter gap. Most systems above the gap are language or multimodal models.
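The gap can be located mechanically by bucketing models by size and looking for an unusually empty range. A sketch with invented post-2020 model sizes (not the paper's data), using the 20B-70B bounds described above:

```python
# Invented post-2020 model sizes in parameters -- illustrative only.
sizes = [1.3e9, 2.7e9, 6e9, 11e9, 13e9, 70e9, 137e9, 175e9, 280e9, 530e9]

GAP_LOW, GAP_HIGH = 20e9, 70e9  # the 20B-70B range described in the text

below = [s for s in sizes if s < GAP_LOW]
inside = [s for s in sizes if GAP_LOW <= s < GAP_HIGH]
above = [s for s in sizes if s >= GAP_HIGH]
# With a real dataset, an empty or near-empty `inside` bucket is the gap.
```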

We considered several hypotheses to explain the parameter gap. The two most consistent with the evidence are:

  1. Increasing model size beyond 20B parameters has a high marginal cost due to the need to adopt different parallelism techniques, so that mid-sized models are less cost-effective than bigger or smaller ones.
  2. GPT-3 initiated the gap by ‘jumping’ one order of magnitude in size over previous systems. This gap was maintained because researchers are incentivized to build the cheapest model that can outperform previous models. Those competing with GPT-3 are above the gap; the rest are below.
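Hypothesis 1 can be made concrete with a back-of-the-envelope memory estimate. Mixed-precision training with an Adam-style optimizer needs roughly 16-20 bytes per parameter (low-precision weights and gradients plus full-precision optimizer state); the 20 bytes/parameter figure below is a common rule of thumb, not a number from this paper:

```python
def training_memory_gb(n_params, bytes_per_param=20):
    """Rough training memory: weights, gradients, and optimizer state."""
    return n_params * bytes_per_param / 1e9

# A 20B-parameter model needs on the order of 400 GB just for model state,
# far beyond a single 80 GB accelerator, so tensor or pipeline parallelism
# becomes unavoidable around this scale.
mem_20b = training_memory_gb(20e9)
```

This is the sense in which crossing the ~20B threshold carries a high marginal cost: it is roughly where plain data parallelism stops being sufficient.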

The existence of the parameter gap suggests that model size has some underlying constraints that might cause discontinuities in the future.
 

Read the full paper on arXiv

About the authors

Former employee
Pablo Villalobos has a background in Mathematics and Computer Science. After spending some time as a software engineer, he decided to pivot towards AI. His interests include the economic consequences of advanced AI systems and the role of algorithmic improvements in AI progress.
Jaime Sevilla is the director of Epoch AI. His research is focused on technological forecasting and the trajectory of AI. He has a background in Mathematics and Computer Science.
Former employee
Tamay Besiroglu co-founded Epoch AI and continues to contribute to the organization as a research advisor. He left Epoch to co-lead Mechanize, a startup building virtual work environments, benchmarks, and training data for AI development. His research focuses on the economics of computing and broader trends in machine learning.
Former employee
Lennart Heim is a former researcher at Epoch AI. His research focuses on the role of compute for advanced AI systems and how compute can be leveraged as an instrument for AI governance, with an emphasis on policy development and security implications.
Anson Ho is a researcher at Epoch AI. He is interested in helping develop a more rigorous understanding of future developments in AI and its societal impacts.
Former employee
Marius Hobbhahn builds models for AI timelines and takeoff using historical trends and his best understanding of the future.

Related work

paper · 7 min read
Compute trends across three eras of machine learning
We’ve compiled a comprehensive dataset of the training compute of AI models, providing key insights into AI development.
Feb 16, 2022 · Updated May 2, 2022 · By Jaime Sevilla, Lennart Heim, Anson Ho, Tamay Besiroglu, Marius Hobbhahn and Pablo Villalobos
paper · 4 min read
How much does it cost to train frontier AI models?
The cost of training top AI models has grown 2-3x annually for the past eight years. By 2027, the largest models could cost over a billion dollars.
Jun 3, 2024 · Updated Jan 13, 2025 · By Ben Cottier, Robi Rahman, Loredana Fattorini, Nestor Maslej and David Owen
paper · 3 min read
Who is leading in AI? An analysis of industry AI research
Industry has emerged as a driving force in AI. We compare top companies on research impact, training runs, and contributions to algorithmic innovations.
Nov 27, 2023 · By Ben Cottier, Tamay Besiroglu and David Owen