Bitter lesson

From Wikipedia, the free encyclopedia
Principle in artificial intelligence

The bitter lesson is the observation in artificial intelligence that, in the long run, approaches that scale with available computational power (such as brute-force search or statistical learning from large datasets) tend to outperform ones based on domain-specific understanding, because they are better at taking advantage of Moore's law. The principle was proposed and named in a 2019 essay by Richard Sutton[1] and is now widely accepted.[2][3][4][5][6][7][8]

The essay

Sutton gives several examples that illustrate the lesson, drawn from the history of computer chess, computer Go, speech recognition, and computer vision.

Sutton concludes that time is better invested in finding simple, scalable solutions that can take advantage of Moore's law, rather than introducing ever-more-complex human insights, and calls this the "bitter lesson". He also cites two general-purpose techniques that have been shown to scale effectively: search and learning. The lesson is considered "bitter" because it is less anthropocentric than many researchers expected, and so they have been slow to accept it.
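The scaling behaviour Sutton describes can be sketched with a minimal example; the toy game, function names, and depth budget below are illustrative choices, not taken from the essay. In a subtraction game (take 1–3 stones, taking the last stone wins), a generic brute-force search agrees with a hand-crafted domain rule on more and more positions as its compute budget grows:

```python
from functools import lru_cache

def rule_move(pile: int) -> int:
    """Hand-crafted domain knowledge: leave the opponent a multiple of 4."""
    return pile % 4 if pile % 4 else 1   # every move loses from a multiple of 4

def search_move(pile: int, depth: int) -> int:
    """Generic brute-force (negamax) search; 'depth' stands in for the compute budget."""
    @lru_cache(maxsize=None)
    def value(p: int, d: int) -> int:
        if p == 0:
            return -1                    # previous player took the last stone: loss
        if d == 0:
            return 0                     # budget exhausted: score as unknown
        return max(-value(p - t, d - 1) for t in (1, 2, 3) if t <= p)
    moves = [t for t in (1, 2, 3) if t <= pile]
    return max(moves, key=lambda t: -value(pile - t, depth - 1))

if __name__ == "__main__":
    piles = [p for p in range(1, 30) if p % 4]   # positions with a unique best move
    for depth in (1, 2, 4, 8, 16, 32):
        matches = sum(search_move(p, depth) == rule_move(p) for p in piles)
        print(f"search depth {depth:2d}: agrees with the domain rule "
              f"on {matches}/{len(piles)} positions")
```

With a small budget the search agrees with the rule on only some positions; with a large enough budget it recovers the rule everywhere in the range, without any game-specific knowledge being coded in.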

Impact

The essay was published on Sutton's website incompleteideas.net in 2019, and has received hundreds of formal citations according to Google Scholar. Some of these provide alternative statements of the principle; for example, the 2022 paper "A Generalist Agent" from Google DeepMind summarized the lesson as:[2]

Historically, generic models that are better at leveraging computation have also tended to overtake more specialized domain-specific approaches, eventually.

Another phrasing of the principle is seen in a Google paper on switch transformers coauthored by Noam Shazeer:[3]

Simple architectures—backed by a generous computational budget, data set size and parameter count—surpass more complicated algorithms.

The principle is further referenced in many other works on artificial intelligence. For example, From Deep Learning to Rational Machines draws a connection to long-standing debates in the field, such as Moravec's paradox and the contrast between neats and scruffies.[9] In "Engineering a Less Artificial Intelligence", the authors concur that "flexible methods so far have always outperformed handcrafted domain knowledge in the long run", although they note that "[w]ithout the right (implicit) assumptions, generalization is impossible".[5] More recently, "The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning" continues Sutton's argument, contending that (as of 2025) the lesson has not yet been fully learned in the fields of speech recognition and brain-data decoding.[6]

Other work has sought to apply the principle and validate it in new domains. For example, the 2022 paper "Beyond the Imitation Game" applies the principle to large language models, concluding that "it is vitally important that we understand their capabilities and limitations" in order to "avoid devoting research resources to problems that are likely to be solved by scale alone".[7] In 2024, "Learning the Bitter Lesson: Empirical Evidence from 20 Years of CVPR Proceedings" examined further evidence from the field of computer vision and pattern recognition, concluding that the previous twenty years of experience in the field show "a strong adherence to the core principles of the 'bitter lesson'".[4] In "Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning", the authors examine the generalization of actor-critic algorithms and find that "general methods that are motivated by stabilization of gradient-based learning significantly outperform RL-specific algorithmic improvements across a variety of environments", noting that this is consistent with the bitter lesson.[8]

References

  1. Sutton, Rich (March 13, 2019). "The Bitter Lesson". www.incompleteideas.net. Retrieved September 7, 2025.
  2. Reed, Scott; Zolna, Konrad; Parisotto, Emilio; et al. (2022). "A Generalist Agent". Transactions on Machine Learning Research (2834–8856). arXiv:2205.06175. Retrieved September 7, 2025.
  3. Fedus, William; Zoph, Barret; Shazeer, Noam (2022). "Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity". Journal of Machine Learning Research. 23 (120): 1–39. Retrieved September 14, 2025.
  4. Yousefi, Mojtaba; Collins, Jack. "Learning the Bitter Lesson: Empirical Evidence from 20 Years of CVPR Proceedings". Proceedings of the 1st Workshop on NLP for Science (NLP4Science). Association for Computational Linguistics. pp. 175–187. Retrieved September 7, 2025.
  5. Sinz, Fabian H.; Pitkow, Xaq; Reimer, Jacob; et al. (2019). "Engineering a Less Artificial Intelligence". Neuron. 103 (6). Elsevier: 967–979. doi:10.1016/j.neuron.2019.08.034. Retrieved September 13, 2025.
  6. Jayalath, Dulhan; Landau, Gilad; Shillingford, Brendan; Woolrich, Mark; Parker Jones, ʻŌiwi (2025). "The Brain's Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning". Forty-second International Conference on Machine Learning. Proceedings of Machine Learning Research. Retrieved September 13, 2025.
  7. Srivastava, Aarohi; Rastogi, Abhinav; Rao, Abhishek; Awal, Abu; Abid, Abubakar; et al. "Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models". The Fourteenth International Conference on Learning Representations.
  8. Nauman, Michal; Bortkiewicz, Michał; Miłoś, Piotr; Trzciński, Tomasz; Ostaszewski, Mateusz; et al. (2024). "Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning". Proceedings of the 41st International Conference on Machine Learning. Proceedings of Machine Learning Research. Retrieved September 13, 2025.
  9. Buckner, Cameron J. (December 11, 2023). From Deep Learning to Rational Machines: What the History of Philosophy Can Teach Us about the Future of Artificial Intelligence. Oxford University Press. doi:10.1093/oso/9780197653302.001.0001. ISBN 9780197653302.