
An algorithmic theory of learning: Robust concepts and random projection


Abstract

We study the phenomenon of cognitive learning from an algorithmic standpoint. How does the brain effectively learn concepts from a small number of examples despite the fact that each example contains a huge amount of information? We provide a novel algorithmic analysis via a model of robust concept learning (closely related to "margin classifiers"), and show that a relatively small number of examples is sufficient to learn rich concept classes. The new algorithms have several advantages: they are faster, conceptually simpler, and resistant to low levels of noise. For example, a robust half-space can be learned in linear time using only a constant number of training examples, regardless of the number of attributes. A general (algorithmic) consequence of the model, that "more robust concepts are easier to learn", is supported by a multitude of psychological studies.
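
For intuition, the half-space result in the abstract can be read as a two-step recipe: randomly project the examples down to a dimension that depends only on the robustness (margin) of the concept and the number of samples, then run a classical learner such as the perceptron in the projected space. The Python sketch below is a minimal illustration under stated assumptions, not the paper's exact algorithm; the synthetic data generator, the constant in the choice of the target dimension k, and the helper names random_project and perceptron are all hypothetical.

    # Minimal sketch (assumptions noted above): random projection followed
    # by the perceptron, for a robust (large-margin) half-space.
    import numpy as np

    rng = np.random.default_rng(0)

    def random_project(X, k):
        """Map rows of X from R^n to R^k with a Gaussian matrix scaled by
        1/sqrt(k); with high probability this nearly preserves inner
        products, and hence margins (Johnson-Lindenstrauss)."""
        n = X.shape[1]
        return X @ (rng.standard_normal((n, k)) / np.sqrt(k))

    def perceptron(X, y, max_epochs=200):
        """Classic perceptron: on data separable with margin gamma it makes
        at most O(1/gamma^2) mistakes, independent of the dimension."""
        w = np.zeros(X.shape[1])
        for _ in range(max_epochs):
            mistakes = 0
            for xi, yi in zip(X, y):
                if yi * (w @ xi) <= 0:
                    w += yi * xi
                    mistakes += 1
            if mistakes == 0:
                break
        return w

    # Illustrative robust half-space in R^n: unit normal u, every example
    # pushed to distance about gamma from the separating hyperplane.
    n, m, gamma = 5000, 200, 0.5
    u = rng.standard_normal(n)
    u /= np.linalg.norm(u)
    y = rng.choice([-1.0, 1.0], size=m)
    X = rng.standard_normal((m, n)) / np.sqrt(n)  # points of roughly unit norm
    X += gamma * y[:, None] * u                   # guarantee the margin

    # Target dimension k = O((1/gamma^2) log m): it depends on the margin
    # and the sample size, not on the ambient dimension n.
    k = int(np.ceil((4.0 / gamma**2) * np.log(m)))
    Xp = random_project(X, k)
    w = perceptron(Xp, y)
    print(f"k = {k}, training accuracy = {np.mean(np.sign(Xp @ w) == y):.3f}")

With these illustrative numbers, the 5000-dimensional examples are compressed to under a hundred dimensions before any learning happens, which is where the speed and the dimension-independent sample bounds claimed above come from.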


References

  • Achlioptas, D. (2001). Database-friendly random projections. In Proc. Principles of Database Systems (PODS) (pp. 274–281).

  • Agmon, S. (1954). The relaxation method for linear inequalities. Canadian Journal of Mathematics, 6(3), 382–392.

  • Arriaga, R. I., & Vempala, S. (1999). An algorithmic theory of learning: Robust concepts and random projection. In Proc. of the 40th IEEE Foundations of Computer Science.

  • Balcan, N., Blum, A., & Vempala, S. (2004). On kernels, margins and low-dimensional mappings. In Proc. of Algorithmic Learning Theory.

  • Bartlett, P., & Shawe-Taylor, J. (1998). Generalization performance of support vector machines and other pattern classifiers. In B. Schölkopf, C. Burges, & A. J. Smola (eds.), Advances in kernel methods: Support vector learning. MIT Press.

  • Baum, E. B. (1990). On learning a union of half spaces. Journal of Complexity, 6(1), 67–101.

  • Ben-David, S., Eiron, N., & Simon, H. (2004). Limitations of learning via embeddings in Euclidean half spaces. Journal of Machine Learning Research, 3, 441–461.

  • Blum, A., Frieze, A., Kannan, R., & Vempala, S. (1996). A polynomial-time algorithm for learning noisy linear threshold functions. In Proc. of the 37th IEEE Foundations of Computer Science.

  • Blum, A., & Kannan, R. (1993). Learning an intersection of k halfspaces over a uniform distribution. In Proc. of the 34th IEEE Symposium on the Foundations of Computer Science.

  • Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1989). Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 36(4), 929–965.

  • Bylander, T. (1994). Learning linear threshold functions in the presence of classification noise. In Proc. 7th Workshop on Computational Learning Theory.

  • Cohen, E. (1997). Learning noisy perceptrons by a perceptron in polynomial time. In Proc. of the 38th IEEE Foundations of Computer Science.

  • Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20, 273–297.

  • Dasgupta, S., & Gupta, A. (1999). An elementary proof of the Johnson-Lindenstrauss lemma. Tech. Rep., U.C. Berkeley.

  • Feller, W. (1957). An introduction to probability theory and its applications. John Wiley and Sons, Inc.

  • Frankl, P., & Maehara, H. (1988). The Johnson-Lindenstrauss lemma and the sphericity of some graphs. J. Comb. Theory, Ser. B, 44, 355–362.

  • Freund, Y., & Schapire, R. E. (1999). Large margin classification using the perceptron algorithm. Machine Learning, 37(3), 277–296.

  • Garg, A., Har-Peled, S., & Roth, D. (2002). On generalization bounds, projection profile, and margin distribution. In ICML (pp. 171–178).

  • Garg, A., & Roth, D. (2003). Margin distribution and learning. In ICML (pp. 210–217).

  • Glass, A. L., Holyoak, K. J., & Santa, J. L. (1979). The structure of categories. In Cognition. Addison-Wesley.

  • Grötschel, M., Lovász, L., & Schrijver, A. (1988). Geometric algorithms and combinatorial optimization. Springer.

  • Hoeffding, W. (1963). Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association, 58, 13–30.

  • Indyk, P., & Motwani, R. (1998). Approximate nearest neighbors: Towards removing the curse of dimensionality. In Proceedings of ACM STOC.

  • Johnson, W. B., & Lindenstrauss, J. (1984). Extensions of Lipschitz mappings into a Hilbert space. Contemporary Mathematics, 26, 189–206.

  • Kearns, M. J., & Schapire, R. E. (1994). Efficient distribution-free learning of probabilistic concepts. Journal of Computer and System Sciences, 48(3), 464–497.

  • Kearns, M. J., & Vazirani, U. (1994). An introduction to computational learning theory. MIT Press.

  • Kleinberg, J. (1997). Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the 29th ACM Symposium on Theory of Computing.

  • Klivans, A., & Servedio, R. (2004). Learning intersections of halfspaces with a margin. In Proceedings of the 17th Workshop on Computational Learning Theory.

  • Knowlton, B. (1999). What can neuropsychology tell us about category learning? Trends in Cognitive Science, 3, 123–124.

  • Komatsu, L. K. (1992). Recent views on conceptual structure. Psychological Bulletin, 112(3).

  • Linial, N., London, E., & Rabinovich, Y. (1994). The geometry of graphs and some of its algorithmic applications. In Proc. of the 35th IEEE Foundations of Computer Science.

  • Littlestone, N. (1987). Learning quickly when irrelevant attributes abound: A new linear-threshold algorithm. Machine Learning, 2, 285–318.

  • Littlestone, N. (1991). Redundant noisy attributes, attribute errors, and linear threshold learning using Winnow. In Proc. 4th Workshop on Computational Learning Theory.

  • Mandler, J. M. (2003). Conceptual categorization, Chapter 5. In D. H. Rakison & L. M. Oakes (eds.), Early category and concept development: Making sense of the blooming, buzzing confusion. Oxford University Press.

  • Minsky, M., & Papert, S. (1969). Perceptrons: An introduction to computational geometry. MIT Press.

  • Nosofsky, R., & Zaki, S. (1999). Math modeling, neuropsychology, and category learning: Response to B. Knowlton (1999). Trends in Cognitive Science, 3, 125–126.

  • Rakison, D. H., & Oakes, L. M. (eds.) (2003). Early category and concept development: Making sense of the blooming, buzzing confusion. Oxford University Press.

  • Reed, S. K. (1982). Categorization. In Cognition: Theory and applications. Brooks/Cole.

  • Reed, S. K., & Friedman, M. P. (1973). Perceptual vs. conceptual categorization. Memory and Cognition, 1.

  • Rosch, E. (1978). Principles of categorization. In E. Rosch & B. B. Lloyd (eds.), Cognition and categorization. Hillsdale, NJ: Lawrence Erlbaum.

  • Rosch, E. H., Mervis, C. B., Gray, W. D., Johnson, D. M., & Boyes-Braem, P. (1976). Basic objects in natural categories. Cognitive Psychology, 8.

  • Rosenblatt, F. (1962). Principles of neurodynamics. Spartan Books.

  • Schapire, R. E., Freund, Y., Bartlett, P. L., & Lee, W. S. (1998). Boosting the margin: A new explanation for the effectiveness of voting methods. Annals of Statistics, 26(5), 1651–1686.

  • Valiant, L. G. (1984). A theory of the learnable. Communications of the ACM, 27(11), 1134–1142.

  • Valiant, L. G. (1998). A neuroidal architecture for cognitive computation. In Proc. of ICALP.

  • Vapnik, V. N. (1995). The nature of statistical learning theory. Springer.

  • Vapnik, V. N., & Chervonenkis, A. Ya. (1971). On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16(2), 264–280.

  • Vempala, S. (1997). A random sampling based algorithm for learning the intersection of half-spaces. In Proc. of the 38th IEEE Foundations of Computer Science.

  • Vempala, S. (2004). The random projection method. DIMACS Series, Vol. 65. AMS.


Author information

Authors and Affiliations

  1. Department of Psychology, Southern New Hampshire University, 2500 N. River Road, Manchester, NH 03106

    Rosa I. Arriaga

  2. Department of Mathematics, M.I.T., 77 Massachusetts Avenue, Cambridge, MA 02139-4307

    Santosh Vempala


Corresponding author

Correspondence to Santosh Vempala.

Additional information

Editor: Shai Ben-David

A preliminary version of this paper appeared in the Proc. of the Symposium on the Foundations of Computer Science, 1999.


About this article

Cite this article

Arriaga, R. I., Vempala, S. An algorithmic theory of learning: Robust concepts and random projection. Mach Learn 63, 161–182 (2006). https://doi.org/10.1007/s10994-006-6265-7


