paper

Machine learning model sizes and the parameter gap

Since 2018, the model size of notable machine learning systems has grown ten times faster than before. Growth after 2020 has not been entirely continuous: there was a jump of one order of magnitude which persists to this day. This is relevant for forecasting model size and thus AI capabilities.

Published: Jul 5, 2022


Trends in model size

In current ML systems, model size (number of parameters) is related to performance via known scaling laws. We used our dataset to analyze trends in the model size of 237 milestone machine learning systems. The systems are categorized into Language, Vision, Games and Other according to the task they solve.
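As an illustration of how such scaling laws relate size to performance, language-model loss is often modeled as a power law in parameter count, $$L(N) = (N_c / N)^\alpha$$. A minimal sketch; the constants below follow the shape of published scaling laws but are illustrative, not values from this paper:

```python
def loss(n_params, n_c=8.8e13, alpha=0.076):
    """Illustrative power-law scaling of loss with model size N.

    L(N) = (N_c / N) ** alpha. The constants mimic the shape of published
    language-model scaling laws but are illustrative, not from this paper.
    """
    return (n_c / n_params) ** alpha

# Growing the model 10x multiplies loss by 10 ** -alpha (~16% lower here).
improvement = 1 - loss(1e10) / loss(1e9)
```

Under a power law of this form, every additional order of magnitude of parameters buys the same multiplicative loss reduction, which is one reason model size trends are informative about capabilities.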

Model size slowly increased by 7 orders of magnitude from the 1950s to around 2018. Since 2018, growth has accelerated for language models, with model size increasing by another 4 orders of magnitude in the four years from 2018 to 2022 (see Figure 1). Other domains like vision have grown at a more moderate pace, but still faster than before 2018.
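Growth rates like these can be estimated with a log-linear regression of parameter count against time. A minimal sketch of the procedure, using invented data points rather than the paper's dataset:

```python
import numpy as np

# Invented (year, parameter count) pairs -- not the paper's dataset.
years = np.array([2018.0, 2019.0, 2020.0, 2021.0, 2022.0])
params = np.array([3e8, 1.5e9, 1e10, 1e11, 5e11])

# Fit log10(params) = slope * year + intercept via least squares;
# the slope is the growth rate in orders of magnitude (OOMs) per year.
slope, intercept = np.polyfit(years, np.log10(params), 1)
```

On these invented points the fitted slope is a bit under one OOM/year, comparable in magnitude to the post-2018 language-model trend described above.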

Figure 1. Left: Transition period around 2018, assuming a single post-2018 trend. Right: the same period, assuming two separate post-2018 trends.

| Period | Data | Scale (start to end) | Slope | Doubling time | $$R^2$$ |
|---|---|---|---|---|---|
| 1952 to 2018 | $$n=109$$ | 1e+01 to 3e+7 params | 0.1 OOMs/year [0.1; 0.1; 0.1] | 39.7 months [36.4; 39.7; 40.7] | 0.62 |
| 2018 to 2022 (single trend) | $$n=129$$ | 3e+7 to 2e+12 params | 0.9 OOMs/year [0.9; 0.9; 1.0] | 4.0 months [3.5; 4.0; 4.3] | 0.31 |
| 2018 to 2022 (above gap) | $$n=27$$ | 7e+10 to 2e+12 params | -0.1 OOMs/year [-0.4; -0.1; 0.2] | -14.2 months [-52.5; -14.2; 52.0] | 0.00 |
| 2018 to 2022 (below gap) | $$n=102$$ | 3e+7 to 2e+10 params | 0.5 OOMs/year [0.4; 0.5; 0.5] | 8.0 months [7.0; 8.0; 9.8] | 0.25 |

Table 1. Summary of our main results. Around 2018 there was a general acceleration in growth. This can be decomposed into the pre-2018 trend increasing its growth rate, plus a separate cluster of very large models appearing on top of it.
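The slope and doubling-time columns in Table 1 are two views of the same fit: a growth rate of $$s$$ OOMs/year corresponds to a doubling time of $$12 \log_{10}(2) / s$$ months. A quick check against the table (small discrepancies arise because the displayed slopes are rounded):

```python
import math

def doubling_time_months(ooms_per_year):
    """Months for model size to double at a growth rate in OOMs/year."""
    return 12 * math.log10(2) / ooms_per_year

single_trend = doubling_time_months(0.9)  # ~4.0 months, matching Table 1
pre_2018 = doubling_time_months(0.1)      # ~36 months vs the table's 39.7;
                                          # the unrounded slope is closer to 0.09
```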

The parameter gap

Starting in 2020, we see many models below 20B parameters and above 70B parameters, but very few in the 20B-70B range. We refer to this scarcity as the parameter gap (see Figure 2).

Figure 2: Model size over time, separated by domain. Red lines highlight the parameter gap. Most systems above the gap are language or multimodal models.
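The gap can be located mechanically by bucketing models by size and looking for an unusually empty range. A sketch with invented post-2020 model sizes (not the paper's data), using the 20B-70B bounds described above:

```python
# Invented post-2020 model sizes in parameters -- illustrative only.
sizes = [1.3e9, 2.7e9, 6e9, 11e9, 13e9, 70e9, 137e9, 175e9, 280e9, 530e9]

GAP_LOW, GAP_HIGH = 20e9, 70e9  # the 20B-70B range described in the text

below = [s for s in sizes if s < GAP_LOW]
inside = [s for s in sizes if GAP_LOW <= s < GAP_HIGH]
above = [s for s in sizes if s >= GAP_HIGH]
# With a real dataset, an empty or near-empty `inside` bucket is the gap.
```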

We considered several hypotheses to explain the parameter gap. The two most consistent with the evidence are:

  1. Increasing model size beyond 20B parameters has a high marginal cost due to the need to adopt different parallelism techniques, so that mid-sized models are less cost-effective than bigger or smaller ones.
  2. GPT-3 initiated the gap by ‘jumping’ one order of magnitude in size over previous systems. This gap was maintained because researchers are incentivized to build the cheapest model that can outperform previous models. Those competing with GPT-3 are above the gap; the rest are below.
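Hypothesis 1 can be made concrete with a back-of-the-envelope memory estimate. Mixed-precision training with an Adam-style optimizer needs roughly 16-20 bytes per parameter (low-precision weights and gradients plus full-precision optimizer state); the 20 bytes/parameter figure below is a common rule of thumb, not a number from this paper:

```python
def training_memory_gb(n_params, bytes_per_param=20):
    """Rough training memory: weights, gradients, and optimizer state."""
    return n_params * bytes_per_param / 1e9

# A 20B-parameter model needs on the order of 400 GB just for model state,
# far beyond a single 80 GB accelerator, so tensor or pipeline parallelism
# becomes unavoidable around this scale.
mem_20b = training_memory_gb(20e9)
```

This is the sense in which crossing the ~20B threshold carries a high marginal cost: it is roughly where plain data parallelism stops being sufficient.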

The existence of the parameter gap suggests that model size has some underlying constraints that might cause discontinuities in the future.
 

Read the full paper on arXiv

About the authors

Former employee
Pablo Villalobos has a background in Mathematics and Computer Science. After spending some time as a software engineer, he decided to pivot towards AI. His interests include the economic consequences of advanced AI systems and the role of algorithmic improvements in AI progress.
Jaime Sevilla is the director of Epoch AI. His research is focused on technological forecasting and the trajectory of AI. He has a background in Mathematics and Computer Science.
Former employee
Tamay Besiroglu co-founded Epoch AI and continues to contribute to the organization as a research advisor. He left Epoch to co-lead Mechanize, a startup building virtual work environments, benchmarks, and training data for AI development. His research focuses on the economics of computing and broader trends in machine learning.
Former employee
Lennart Heim is a former researcher at Epoch AI. His research focuses on the role of compute for advanced AI systems and how compute can be leveraged as an instrument for AI governance, with an emphasis on policy development and security implications.
Anson Ho is a researcher at Epoch AI. He is interested in helping develop a more rigorous understanding of future developments in AI and its societal impacts.
Former employee
Marius Hobbhahn builds models for AI timelines and takeoff using historical trends and his best understanding of the future.

Related work

paper · 7 min read
Compute trends across three eras of machine learning
We’ve compiled a comprehensive dataset of the training compute of AI models, providing key insights into AI development.
Feb 16, 2022 · Updated May 2, 2022 · By Jaime Sevilla, Lennart Heim, Anson Ho, Tamay Besiroglu, Marius Hobbhahn and Pablo Villalobos
paper · 4 min read
How much does it cost to train frontier AI models?
The cost of training top AI models has grown 2-3x annually for the past eight years. By 2027, the largest models could cost over a billion dollars.
Jun 3, 2024 · Updated Jan 13, 2025 · By Ben Cottier, Robi Rahman, Loredana Fattorini, Nestor Maslej and David Owen
paper · 3 min read
Who is leading in AI? An analysis of industry AI research
Industry has emerged as a driving force in AI. We compare top companies on research impact, training runs, and contributions to algorithmic innovations.
Nov 27, 2023 · By Ben Cottier, Tamay Besiroglu and David Owen