Double descent

From Wikipedia, the free encyclopedia
Concept in machine learning
For the concept of double descent in anthropology, see Kinship § Descent rules.
An example of the double descent phenomenon in a two-layer neural network: as the ratio of parameters to data points increases, the test error first falls, then rises, then falls again.[1] The vertical line marks the "interpolation threshold" boundary between the underparameterized region (more data points than parameters) and the overparameterized region (more parameters than data points).

Double descent in statistics and machine learning is the phenomenon where a model with a small number of parameters and a model with an extremely large number of parameters both have a small training error, but a model whose number of parameters is about the same as the number of data points used to train the model will have a much greater test error than one with a much larger number of parameters.[2] This phenomenon has been considered surprising, as it contradicts assumptions about overfitting in classical machine learning.[3]
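
The phenomenon can be reproduced numerically in a few lines. The following is a minimal sketch (an illustration, not taken from the cited sources; the random-cosine feature map, sample sizes, and noise level are arbitrary assumptions) that fits minimum-norm least squares with an increasing number of features; the test error typically peaks near the interpolation threshold p = n and descends again beyond it:

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, noise = 40, 200, 0.5

def target(x):
    return np.sin(2 * np.pi * x)

x_tr = rng.uniform(-1, 1, n_train)
y_tr = target(x_tr) + noise * rng.normal(size=n_train)
x_te = rng.uniform(-1, 1, n_test)
y_te = target(x_te)

freqs = rng.uniform(0, 10, 200)

def features(x, p):
    # Random cosine features: one column per frequency, p columns in total.
    return np.cos(np.outer(x, freqs[:p]))

for p in [5, 10, 20, 40, 80, 160]:  # interpolation threshold at p = n_train = 40
    Phi_tr, Phi_te = features(x_tr, p), features(x_te, p)
    # Minimum-norm least-squares fit via the pseudo-inverse: for p > n_train
    # this interpolates the training data with the smallest-norm coefficients.
    w = np.linalg.pinv(Phi_tr) @ y_tr
    err = np.mean((Phi_te @ w - y_te) ** 2)
    print(f"p = {p:4d}  test MSE = {err:.3f}")
```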

History


Early observations of what would later be called double descent in specific models date back to 1989.[4][5]

The term "double descent" was coined by Belkin et. al.[6] in 2019,[3] when the phenomenon gained popularity as a broader concept exhibited by many models.[7][8] The latter development was prompted by a perceived contradiction between the conventional wisdom that too many parameters in the model result in a significant overfitting error (an extrapolation of thebias–variance tradeoff),[9] and the empirical observations in the 2010s that some modern machine learning techniques tend to perform better with larger models.[6][10]

Theoretical models


Double descent occurs in linear regression with isotropic Gaussian covariates and isotropic Gaussian noise.[11]
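
In this setting, the limiting test risk of the minimum-norm least-squares estimator admits a closed form as the number of data points n and the number of parameters p grow with fixed ratio γ = p/n. The following expression is a sketch of the standard isotropic-features result (with σ² the noise variance and r² the squared norm of the true coefficient vector); it is included here for illustration and is not quoted from the cited paper:

$$
R(\gamma) =
\begin{cases}
\sigma^{2}\,\dfrac{\gamma}{1-\gamma}, & \gamma < 1,\\[6pt]
r^{2}\!\left(1-\dfrac{1}{\gamma}\right) + \dfrac{\sigma^{2}}{\gamma-1}, & \gamma > 1.
\end{cases}
$$

The risk diverges as γ → 1, the interpolation threshold, and decreases again for γ > 1, producing the second descent.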

A model of double descent at the thermodynamic limit (where the number of parameters and the number of data points are both taken to infinity at a fixed ratio) has been analyzed using the replica trick, and the result has been confirmed numerically.[12]

Empirical examples


The scaling behavior of double descent has been found to follow a broken neural scaling law[13] functional form.
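
In one common parameterization of a broken neural scaling law (reproduced here as a sketch; the exact notation should be checked against [13]), the test error y is expressed as a function of a scale variable x, such as the number of parameters:

$$
y = a + b\,x^{-c_0} \prod_{i=1}^{n} \left(1 + \left(\frac{x}{d_i}\right)^{1/f_i}\right)^{-c_i f_i},
$$

where a, b, and the c_i, d_i, f_i are fitted constants. The d_i locate the "breaks" at which the local power-law exponent changes, allowing the fitted curve to rise and then fall again between breaks, which is how the form can capture the non-monotonic double-descent peak.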

References

  1. Rocks, Jason W. (2022). "Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models". Physical Review Research. 4 (1). arXiv:2010.13933. doi:10.1103/PhysRevResearch.4.013201.
  2. "Deep Double Descent". OpenAI. 2019-12-05. Retrieved 2022-08-12.
  3. Schaeffer, Rylan; Khona, Mikail; Robertson, Zachary; Boopathy, Akhilan; Pistunova, Kateryna; Rocks, Jason W.; Fiete, Ila Rani; Koyejo, Oluwasanmi (2023-03-24). "Double Descent Demystified: Identifying, Interpreting & Ablating the Sources of a Deep Learning Puzzle". arXiv:2303.14151v1 [cs.LG].
  4. Vallet, F.; Cailton, J.-G.; Refregier, Ph (June 1989). "Linear and Nonlinear Extension of the Pseudo-Inverse Solution for Learning Boolean Functions". Europhysics Letters. 9 (4): 315. Bibcode:1989EL......9..315V. doi:10.1209/0295-5075/9/4/003. ISSN 0295-5075.
  5. Loog, Marco; Viering, Tom; Mey, Alexander; Krijthe, Jesse H.; Tax, David M. J. (2020-05-19). "A brief prehistory of double descent". Proceedings of the National Academy of Sciences. 117 (20): 10625–10626. arXiv:2004.04328. Bibcode:2020PNAS..11710625L. doi:10.1073/pnas.2001875117. ISSN 0027-8424. PMC 7245109. PMID 32371495.
  6. Belkin, Mikhail; Hsu, Daniel; Ma, Siyuan; Mandal, Soumik (2019-08-06). "Reconciling modern machine learning practice and the bias-variance trade-off". Proceedings of the National Academy of Sciences. 116 (32): 15849–15854. arXiv:1812.11118. doi:10.1073/pnas.1903070116. ISSN 0027-8424. PMC 6689936. PMID 31341078.
  7. Spigler, Stefano; Geiger, Mario; d'Ascoli, Stéphane; Sagun, Levent; Biroli, Giulio; Wyart, Matthieu (2019-11-22). "A jamming transition from under- to over-parametrization affects loss landscape and generalization". Journal of Physics A: Mathematical and Theoretical. 52 (47): 474001. arXiv:1810.09665. doi:10.1088/1751-8121/ab4c8b. ISSN 1751-8113.
  8. Viering, Tom; Loog, Marco (2023-06-01). "The Shape of Learning Curves: A Review". IEEE Transactions on Pattern Analysis and Machine Intelligence. 45 (6): 7799–7819. arXiv:2103.10948. doi:10.1109/TPAMI.2022.3220744. ISSN 0162-8828. PMID 36350870.
  9. Geman, Stuart; Bienenstock, Élie; Doursat, René (1992). "Neural networks and the bias/variance dilemma" (PDF). Neural Computation. 4: 1–58. doi:10.1162/neco.1992.4.1.1. S2CID 14215320.
  10. Nakkiran, Preetum; Kaplun, Gal; Bansal, Yamini; Yang, Tristan; Barak, Boaz; Sutskever, Ilya (29 December 2021). "Deep double descent: where bigger models and more data hurt". Journal of Statistical Mechanics: Theory and Experiment. 2021 (12). IOP Publishing Ltd and SISSA Medialab srl: 124003. arXiv:1912.02292. Bibcode:2021JSMTE2021l4003N. doi:10.1088/1742-5468/ac3a74. S2CID 207808916.
  11. Nakkiran, Preetum (2019-12-16). "More Data Can Hurt for Linear Regression: Sample-wise Double Descent". arXiv:1912.07242v1 [stat.ML].
  12. Advani, Madhu S.; Saxe, Andrew M.; Sompolinsky, Haim (2020-12-01). "High-dimensional dynamics of generalization error in neural networks". Neural Networks. 132: 428–446. doi:10.1016/j.neunet.2020.08.022. ISSN 0893-6080. PMC 7685244. PMID 33022471.
  13. Caballero, Ethan; Gupta, Kshitij; Rish, Irina; Krueger, David (2022). "Broken Neural Scaling Laws". International Conference on Learning Representations (ICLR), 2023.


This statistics-related article is a stub. You can help Wikipedia by expanding it.
