
Occam learning

From Wikipedia, the free encyclopedia
Model of algorithmic learning

In computational learning theory, Occam learning is a model of algorithmic learning in which the objective of the learner is to output a succinct representation of the received training data. This is closely related to probably approximately correct (PAC) learning, where the learner is evaluated on its predictive power on a test set.

Occam learnability implies PAC learning, and for a wide variety of concept classes, the converse is also true: PAC learnability implies Occam learnability.

Introduction


Occam learning is named after Occam's razor, the principle that, all other things being equal, a shorter explanation of observed data should be favored over a lengthier one. The theory of Occam learning is a formal and mathematical justification for this principle. Blumer et al.[1] first showed that Occam learning implies PAC learning, the standard model of learning in computational learning theory. In other words, parsimony (of the output hypothesis) implies predictive power.

Definition of Occam learning


The succinctness of a concept $c$ in a concept class $\mathcal{C}$ can be expressed by the length $\mathrm{size}(c)$ of the shortest bit string that can represent $c$ in $\mathcal{C}$. Occam learning connects the succinctness of a learning algorithm's output to its predictive power on unseen data.

Let $\mathcal{C}$ and $\mathcal{H}$ be concept classes containing target concepts and hypotheses respectively. Then, for constants $\alpha \geq 0$ and $0 \leq \beta < 1$, a learning algorithm $L$ is an $(\alpha, \beta)$-Occam algorithm for $\mathcal{C}$ using $\mathcal{H}$ iff, given a set $S = \{x_1, \dots, x_m\}$ of $m$ samples labeled according to a concept $c \in \mathcal{C}$, $L$ outputs a hypothesis $h \in \mathcal{H}$ such that $h$ is consistent with $c$ on $S$ (that is, $h(x) = c(x)$ for all $x \in S$) and $\mathrm{size}(h) \leq (n \cdot \mathrm{size}(c))^{\alpha} m^{\beta}$, where $n$ is the maximum length of any sample $x \in S$.

An Occam algorithm is called efficient if it runs in time polynomial in $n$, $m$, and $\mathrm{size}(c)$. We say a concept class $\mathcal{C}$ is Occam learnable with respect to a hypothesis class $\mathcal{H}$ if there exists an efficient Occam algorithm for $\mathcal{C}$ using $\mathcal{H}$.
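For illustration, the size condition in this definition can be checked mechanically. The following sketch is hypothetical (the function name and example numbers are not from the literature); it simply tests whether a hypothesis of a given bit length satisfies the $(\alpha, \beta)$-Occam bound.

```python
# Hypothetical helper for the definition above: checks the Occam size condition
#   size(h) <= (n * size(c))**alpha * m**beta
# for a hypothesis of `hypothesis_bits` bits, a target concept of `concept_bits`
# bits, samples of length at most n, and m samples.

def satisfies_occam_bound(hypothesis_bits, concept_bits, n, m, alpha, beta):
    return hypothesis_bits <= (n * concept_bits) ** alpha * m ** beta

# Example: a 120-bit hypothesis for a 40-bit concept, with n = 20, m = 1000,
# alpha = 1, beta = 0.5; the bound is 800 * sqrt(1000), roughly 25298 bits.
print(satisfies_occam_bound(120, 40, n=20, m=1000, alpha=1, beta=0.5))  # True
```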

The relation between Occam and PAC learning


Occam learnability implies PAC learnability, as the following theorem of Blumer et al.[2] shows:

Theorem (Occam learning implies PAC learning)


Let $L$ be an efficient $(\alpha, \beta)$-Occam algorithm for $\mathcal{C}$ using $\mathcal{H}$. Then there exists a constant $a > 0$ such that for any $0 < \epsilon, \delta < 1$ and any distribution $\mathcal{D}$, given $m \geq a\left(\frac{1}{\epsilon}\log\frac{1}{\delta} + \left(\frac{(n \cdot \mathrm{size}(c))^{\alpha}}{\epsilon}\right)^{\frac{1}{1-\beta}}\right)$ samples of length $n$ bits each, drawn from $\mathcal{D}$ and labelled according to a concept $c \in \mathcal{C}$, the algorithm $L$ will output a hypothesis $h \in \mathcal{H}$ such that $\mathrm{error}(h) \leq \epsilon$ with probability at least $1 - \delta$.

Here, $\mathrm{error}(h)$ is taken with respect to the concept $c$ and the distribution $\mathcal{D}$. This implies that the algorithm $L$ is also a PAC learner for the concept class $\mathcal{C}$ using hypothesis class $\mathcal{H}$. A slightly more general formulation is as follows:
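As a rough illustration, the sample bound in the theorem can be evaluated numerically. The sketch below is hypothetical and fixes the unspecified constant $a$ to 1 purely for demonstration.

```python
import math

# Evaluates m >= a * ( (1/eps)*log(1/delta) + ((n*size_c)**alpha / eps)**(1/(1-beta)) )
# from the theorem above. The constant `a` is not specified by the theorem;
# a = 1 here is an arbitrary placeholder.

def occam_sample_bound(eps, delta, n, size_c, alpha, beta, a=1.0):
    term1 = (1.0 / eps) * math.log(1.0 / delta)
    term2 = ((n * size_c) ** alpha / eps) ** (1.0 / (1.0 - beta))
    return math.ceil(a * (term1 + term2))

# Example: eps = 0.1, delta = 0.05, n = 20, size(c) = 40, alpha = 1, beta = 0.5.
print(occam_sample_bound(0.1, 0.05, n=20, size_c=40, alpha=1, beta=0.5))
```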

Theorem (Occam learning implies PAC learning, cardinality version)


Let $0 < \epsilon, \delta < 1$. Let $L$ be an algorithm that, given $m$ samples of length $n$ bits each, drawn from a fixed but unknown distribution $\mathcal{D}$ and labeled according to a concept $c \in \mathcal{C}$, outputs a hypothesis $h \in \mathcal{H}_{n,m}$ that is consistent with the labeled samples. Then there exists a constant $b$ such that if $\log |\mathcal{H}_{n,m}| \leq b\epsilon m - \log\frac{1}{\delta}$, then $L$ is guaranteed to output a hypothesis $h \in \mathcal{H}_{n,m}$ such that $\mathrm{error}(h) \leq \epsilon$ with probability at least $1 - \delta$.
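The cardinality condition is easy to check for a concrete hypothesis class. A minimal sketch, assuming natural logarithms and setting the unspecified constant $b$ to 1 for illustration:

```python
import math

# Checks log|H_{n,m}| <= b * eps * m - log(1/delta) from the cardinality version.
# `b` is the unspecified constant from the theorem; b = 1 is a placeholder.

def cardinality_condition_holds(log_H, eps, delta, m, b=1.0):
    return log_H <= b * eps * m - math.log(1.0 / delta)

# Example: a hypothesis class of 2**50 hypotheses with eps = 0.1, delta = 0.05
# satisfies the condition once m = 400 consistent samples are available.
log_H = 50 * math.log(2)
print(cardinality_condition_holds(log_H, eps=0.1, delta=0.05, m=400))  # True
```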

While the above theorems show that Occam learning is sufficient for PAC learning, they say nothing about necessity. Board and Pitt showed that, for a wide variety of concept classes, Occam learning is in fact necessary for PAC learning.[3] They proved that for any concept class that is polynomially closed under exception lists, PAC learnability implies the existence of an Occam algorithm for that concept class. Concept classes that are polynomially closed under exception lists include Boolean formulas, circuits, deterministic finite automata, decision lists, decision trees, and other geometrically defined concept classes.

A concept class $\mathcal{C}$ is polynomially closed under exception lists if there exists a polynomial-time algorithm $A$ that, given the representation of a concept $c \in \mathcal{C}$ and a finite list $E$ of exceptions, outputs a representation of a concept $c' \in \mathcal{C}$ such that the concepts $c$ and $c'$ agree except on the set $E$.
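The semantics of an exception list can be illustrated with a small sketch (hypothetical code, not from Board and Pitt). The formal definition additionally requires that the returned representation stay polynomial in the sizes of $c$ and $E$, which a bare wrapper like this does not address.

```python
# Hypothetical illustration of exception lists: given a concept c (any boolean
# predicate here) and a finite exception list E, build c' that agrees with c
# everywhere except on E, where its label is flipped.

def with_exceptions(c, exceptions):
    exception_set = frozenset(exceptions)
    return lambda x: (not c(x)) if x in exception_set else c(x)

# Example: flip the label of the single point (1, 0).
c = lambda x: x[0] == 1 and x[1] == 1
c_prime = with_exceptions(c, [(1, 0)])
print(c((1, 0)), c_prime((1, 0)))  # False True
```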

Proof that Occam learning implies PAC learning


We first prove the cardinality version. Call a hypothesis $h \in \mathcal{H}$ bad if $\mathrm{error}(h) \geq \epsilon$, where again $\mathrm{error}(h)$ is with respect to the true concept $c$ and the underlying distribution $\mathcal{D}$. The probability that a set of samples $S$ is consistent with a fixed bad hypothesis $h$ is at most $(1-\epsilon)^m$, by the independence of the samples. By the union bound, the probability that some bad hypothesis in $\mathcal{H}_{n,m}$ is consistent with $S$ is at most $|\mathcal{H}_{n,m}|(1-\epsilon)^m$, which is at most $\delta$ if $\log |\mathcal{H}_{n,m}| \leq O(\epsilon m) - \log\frac{1}{\delta}$. This concludes the proof of the second theorem above.
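To spell out the last step (taking the constant hidden in $O(\epsilon m)$ to be 1 for concreteness, and using $1 - \epsilon \leq e^{-\epsilon}$ with natural logarithms):

$$|\mathcal{H}_{n,m}|\,(1-\epsilon)^{m} \;\leq\; |\mathcal{H}_{n,m}|\, e^{-\epsilon m} \;=\; e^{\log|\mathcal{H}_{n,m}| - \epsilon m} \;\leq\; e^{-\log\frac{1}{\delta}} \;=\; \delta.$$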

Using the second theorem, we can prove the first theorem. Since we have an $(\alpha, \beta)$-Occam algorithm, any hypothesis output by $L$ can be represented by at most $(n \cdot \mathrm{size}(c))^{\alpha} m^{\beta}$ bits, and thus $\log |\mathcal{H}_{n,m}| \leq (n \cdot \mathrm{size}(c))^{\alpha} m^{\beta}$. This is at most $O(\epsilon m) - \log\frac{1}{\delta}$ if we set $m \geq a\left(\frac{1}{\epsilon}\log\frac{1}{\delta} + \left(\frac{(n \cdot \mathrm{size}(c))^{\alpha}}{\epsilon}\right)^{\frac{1}{1-\beta}}\right)$ for some constant $a > 0$. Thus, by the cardinality version of the theorem, the consistent hypothesis $h$ output by $L$ satisfies $\mathrm{error}(h) \leq \epsilon$ with probability at least $1 - \delta$. This concludes the proof of the first theorem above.
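A quick numerical sanity check of this step (hypothetical code; the constant $a = 2$ is chosen arbitrarily and the constant inside $O(\epsilon m)$ is taken to be 1):

```python
import math

# Checks that, with m at the sample bound from the first theorem, the Occam size
# bound (n*size(c))**alpha * m**beta indeed fits under eps*m - log(1/delta),
# which is what the cardinality version needs.  a = 2 is an arbitrary choice.

def occam_bound_fits(eps, delta, n, size_c, alpha, beta, a=2.0):
    m = math.ceil(a * ((1 / eps) * math.log(1 / delta)
                       + ((n * size_c) ** alpha / eps) ** (1 / (1 - beta))))
    log_H = (n * size_c) ** alpha * m ** beta
    return log_H <= eps * m - math.log(1 / delta)

print(occam_bound_fits(eps=0.1, delta=0.05, n=20, size_c=40, alpha=1, beta=0.5))  # True
```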

Improving sample complexity for common problems


Though Occam and PAC learnability are equivalent, the Occam framework can be used to produce tighter bounds on the sample complexity of classical problems including conjunctions,[2] conjunctions with few relevant variables,[4] and decision lists.[5]
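As an example of how the cardinality theorem yields such bounds, the textbook elimination learner for monotone conjunctions outputs a consistent hypothesis from a class of at most $2^n$ conjunctions, so $\log|\mathcal{H}| = O(n)$ and roughly $O\left((n + \log\frac{1}{\delta})/\epsilon\right)$ samples suffice. A minimal sketch (hypothetical code, not from the cited papers):

```python
# Textbook elimination learner for monotone conjunctions over {0,1}^n (sketch).
# It keeps every variable that is 1 in all positive examples, so, assuming the
# labels come from some monotone conjunction, its output is consistent and the
# hypothesis class has at most 2**n members.

def learn_monotone_conjunction(samples):
    """`samples` is a list of (x, label) pairs, with x a tuple of 0/1 values."""
    n = len(samples[0][0])
    relevant = set(range(n))
    for x, label in samples:
        if label:  # drop variables that are 0 in a positive example
            relevant -= {i for i in relevant if x[i] == 0}
    return sorted(relevant)  # indices of variables kept in the conjunction

# Example: target is x0 AND x2.
data = [((1, 1, 1), True), ((1, 0, 1), True), ((0, 1, 1), False)]
print(learn_monotone_conjunction(data))  # [0, 2]
```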

Extensions


Occam algorithms have also been shown to be successful for PAC learning in the presence of errors,[6][7] probabilistic concepts,[8] function learning,[9] and Markovian non-independent examples.[10]


References

  1. Blumer, A., Ehrenfeucht, A., Haussler, D., & Warmuth, M. K. (1987). Occam's razor. Information Processing Letters, 24(6), 377-380.
  2. Kearns, M. J., & Vazirani, U. V. (1994). An Introduction to Computational Learning Theory, chapter 2. MIT Press.
  3. Board, R., & Pitt, L. (1990). On the necessity of Occam algorithms. In Proceedings of the Twenty-Second Annual ACM Symposium on Theory of Computing (pp. 54-63). ACM.
  4. Haussler, D. (1988). Quantifying inductive bias: AI learning algorithms and Valiant's learning framework. Artificial Intelligence, 36(2), 177-221.
  5. Rivest, R. L. (1987). Learning decision lists. Machine Learning, 2(3), 229-246.
  6. Angluin, D., & Laird, P. (1988). Learning from noisy examples. Machine Learning, 2(4), 343-370.
  7. Kearns, M., & Li, M. (1993). Learning in the presence of malicious errors. SIAM Journal on Computing, 22(4), 807-837.
  8. Kearns, M. J., & Schapire, R. E. (1990). Efficient distribution-free learning of probabilistic concepts. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (pp. 382-391). IEEE.
  9. Natarajan, B. K. (1993). Occam's razor for functions. In Proceedings of the Sixth Annual Conference on Computational Learning Theory (pp. 370-376). ACM.
  10. Aldous, D., & Vazirani, U. (1990). A Markovian extension of Valiant's learning model. In Proceedings of the 31st Annual Symposium on Foundations of Computer Science (pp. 392-396). IEEE.
