In probability theory and statistics, the law of the unconscious statistician, or LOTUS, is a theorem which expresses the expected value of a function g(X) of a random variable X in terms of g and the probability distribution of X.
The form of the law depends on the type of random variable X in question. If the distribution of X is discrete and one knows its probability mass function p_X, then the expected value of g(X) is
\[ \operatorname{E}[g(X)] = \sum_x g(x)\, p_X(x), \]
where the sum is over all possible values x of X. If instead the distribution of X is continuous with probability density function f_X, then the expected value of g(X) is
\[ \operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x) \, \mathrm{d}x. \]
Both of these special cases can be expressed in terms of the cumulative probability distribution function F_X of X, with the expected value of g(X) now given by the Lebesgue–Stieltjes integral
\[ \operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x) \, \mathrm{d}F_X(x). \]
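As a concrete numerical illustration (not part of the theorem's statement), the following Python sketch checks both formulas; the fair die, the Exponential(1) distribution, and the choice g(x) = x² are arbitrary examples chosen here for concreteness.

```python
import numpy as np
from scipy.integrate import quad

# Discrete case: X is a fair six-sided die, g(x) = x**2.
# LOTUS: E[g(X)] = sum over x of g(x) * p_X(x).
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)
print(np.sum(values**2 * pmf))  # 91/6, approx 15.1667

# Continuous case: X ~ Exponential(1), f_X(x) = exp(-x) for x >= 0.
# LOTUS: E[g(X)] = integral of g(x) * f_X(x) dx.
lotus, _ = quad(lambda x: x**2 * np.exp(-x), 0, np.inf)
print(lotus)  # 2.0, the second moment of Exp(1)

# Monte Carlo sanity check of the continuous case.
rng = np.random.default_rng(0)
samples = rng.exponential(1.0, size=1_000_000)
print(np.mean(samples**2))  # approx 2.0
```

Note that neither computation ever constructs the distribution of g(X) itself; this is exactly the convenience the law provides.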
In even greater generality, X could be a random element in any measurable space, in which case the law is given in terms of measure theory and the Lebesgue integral. In this setting, there is no need to restrict the context to probability measures, and the law becomes a general theorem of mathematical analysis on Lebesgue integration relative to a pushforward measure.
This proposition is (sometimes) known as the law of the unconscious statistician because of a purported tendency to think of the aforementioned law as the very definition of the expected value of a function g(X) of a random variable X, rather than (more formally) as a consequence of the true definition of expected value.[1] The naming is sometimes attributed to Sheldon Ross' textbook Introduction to Probability Models, although he removed the reference in later editions.[2] Many statistics textbooks do present the result as the definition of expected value.[3]
A similar property holds for joint distributions, or equivalently, for random vectors. For discrete random variables X and Y, a function of two variables g, and joint probability mass function p_{X,Y}(x, y):[4]
\[ \operatorname{E}[g(X,Y)] = \sum_y \sum_x g(x,y)\, p_{X,Y}(x,y). \]
In the absolutely continuous case, with f_{X,Y}(x, y) being the joint probability density function,
\[ \operatorname{E}[g(X,Y)] = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} g(x,y)\, f_{X,Y}(x,y) \, \mathrm{d}x \, \mathrm{d}y. \]
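A hedged sketch of the discrete bivariate formula: the joint pmf table and the function g below are arbitrary illustrative choices, not taken from the statement above.

```python
import numpy as np

# Illustrative joint pmf p_{X,Y} on X in {0,1} and Y in {0,1,2}
# (rows index x, columns index y); the entries sum to 1.
p_xy = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])
x_vals = [0, 1]
y_vals = [0, 1, 2]

def g(x, y):
    return (x + 1) * y  # an arbitrary function of two variables

# E[g(X,Y)] = sum over y of sum over x of g(x,y) * p_{X,Y}(x,y)
expectation = sum(g(x, y) * p_xy[i, j]
                  for i, x in enumerate(x_vals)
                  for j, y in enumerate(y_vals))
print(expectation)  # 1.5 for this table
```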
A number of special cases are given here. In the simplest case, where the random variable X takes on countably many values (so that its distribution is discrete), the proof is particularly simple, and holds without modification if X is a discrete random vector or even a discrete random element.
The case of a continuous random variable is more subtle, since a fully general proof requires delicate forms of the change-of-variables formula for integration. However, in the framework of measure theory, the discrete case generalizes straightforwardly to general (not necessarily discrete) random elements, and the case of a continuous random variable then follows as a special case by making use of the Radon–Nikodym theorem.
Suppose that X is a random variable which takes on only finitely or countably many different values x1, x2, ..., with probabilities p1, p2, .... Then for any function g of these values, the random variable g(X) has values g(x1), g(x2), ..., although some of these may coincide with each other. For example, this is the case if X can take on both values 1 and −1 and g(x) = x².
Let y1, y2, ... enumerate the possible distinct values of g(X), and for each i let I_i denote the collection of all j with g(x_j) = y_i. Then, according to the definition of expected value,
\[ \operatorname{E}[g(X)] = \sum_i y_i \operatorname{P}(g(X) = y_i). \]
Since a single value y_i can be the image of multiple distinct values x_j, it holds that
\[ \operatorname{P}(g(X) = y_i) = \sum_{j \in I_i} \operatorname{P}(X = x_j) = \sum_{j \in I_i} p_j. \]
Then the expected value can be rewritten as
\[ \operatorname{E}[g(X)] = \sum_i y_i \sum_{j \in I_i} p_j = \sum_i \sum_{j \in I_i} g(x_j)\, p_j = \sum_j g(x_j)\, p_j. \]
This equality relates the average of the outputs of g(X), weighted by the probabilities of those outputs themselves, to the average of the outputs of g(X), weighted by the probabilities of the values of X.
If X takes on only finitely many possible values, the above is fully rigorous. However, if X takes on countably many values, the last equality given does not always hold, as seen by the Riemann series theorem. Because of this, it is necessary to assume the absolute convergence of the sums in question.[5]
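The regrouping step in this argument can be mirrored numerically. In the sketch below (an illustration with an arbitrarily chosen finite distribution, so absolute convergence is automatic), the output-side sum over the distinct values y_i agrees with the input-side sum over the x_j:

```python
from collections import defaultdict

# X takes values x_j with probabilities p_j; g(x) = x**2 collapses
# the pairs {-1, 1} and {-2, 2} onto single outputs, as in the text.
xs = [-2, -1, 1, 2]
ps = [0.1, 0.4, 0.3, 0.2]
g = lambda x: x**2

# Input-side sum: sum over j of g(x_j) * p_j.
input_side = sum(g(x) * p for x, p in zip(xs, ps))

# Output-side sum: group the x_j by their common image y_i = g(x_j),
# i.e. build P(g(X) = y_i) = sum over j in I_i of p_j.
prob_of_output = defaultdict(float)
for x, p in zip(xs, ps):
    prob_of_output[g(x)] += p
output_side = sum(y * q for y, q in prob_of_output.items())

print(input_side, output_side)  # both 1.9 for this distribution
```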
Suppose that X is a random variable whose distribution has a continuous density f. If g is a general function, then the probability that g(X) is valued in a set of real numbers K equals the probability that X is valued in g⁻¹(K), which is given by
\[ \int_{g^{-1}(K)} f(x) \, \mathrm{d}x. \]
Under various conditions on g, the change-of-variables formula for integration can be applied to relate this to an integral over K, and hence to identify the density of g(X) in terms of the density of X. In the simplest case, if g is differentiable with nowhere-vanishing derivative, then the above integral can be written as
\[ \int_K f(g^{-1}(y)) \, |(g^{-1})'(y)| \, \mathrm{d}y, \]
thereby identifying g(X) as possessing the density f(g⁻¹(y)) |(g⁻¹)′(y)|. The expected value of g(X) is then identified as
\[ \operatorname{E}[g(X)] = \int_{-\infty}^{\infty} y \, f(g^{-1}(y)) \, |(g^{-1})'(y)| \, \mathrm{d}y = \int_{-\infty}^{\infty} g(x)\, f(x) \, \mathrm{d}x, \]
where the equality follows by another use of the change-of-variables formula for integration. This shows that the expected value of g(X) is encoded entirely by the function g and the density f of X.[6]
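For a hedged numerical illustration of this change of variables, take g(x) = eˣ and X standard normal (both chosen here purely for concreteness, so that g(X) is lognormal); the two integrals below agree, with common value e^{1/2}:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

f = norm.pdf    # density of X ~ N(0, 1)
g = np.exp      # strictly increasing, nowhere-vanishing derivative
g_inv = np.log  # g^{-1}(y) = log(y), with (g^{-1})'(y) = 1/y > 0

# E[g(X)] via the density of Y = g(X): f(g^{-1}(y)) * |(g^{-1})'(y)|,
# which here is the lognormal density.
lhs, _ = quad(lambda y: y * f(g_inv(y)) * (1 / y), 0, np.inf)

# E[g(X)] via LOTUS, integrating directly against the density of X.
rhs, _ = quad(lambda x: g(x) * f(x), -np.inf, np.inf)

print(lhs, rhs, np.exp(0.5))  # all approx 1.6487
```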
The assumption that g is differentiable with nonvanishing derivative, which is necessary for applying the usual change-of-variables formula, excludes many typical cases, such as g(x) = x². The result still holds true in these broader settings, although the proof requires more sophisticated results from mathematical analysis such as Sard's theorem and the coarea formula. In even greater generality, using the Lebesgue theory as below, it can be found that the identity
\[ \operatorname{E}[g(X)] = \int_{-\infty}^{\infty} g(x)\, f(x) \, \mathrm{d}x \]
holds true whenever X has a density f (which does not have to be continuous) and whenever g is a measurable function for which g(X) has finite expected value. (Every continuous function is measurable.) Furthermore, without modification to the proof, this holds even if X is a random vector (with density) and g is a multivariable function; the integral is then taken over the multi-dimensional range of values of X.
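To illustrate, in a non-authoritative sketch, that the identity survives when g is not invertible, take g(x) = x² and X standard normal (so that g(X) is chi-squared with one degree of freedom, whose mean is known independently):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm, chi2

# g(x) = x**2 is not injective, so the simple change-of-variables
# density formula does not apply; LOTUS still gives E[g(X)] directly.
lotus, _ = quad(lambda x: x**2 * norm.pdf(x), -np.inf, np.inf)

# Independent check: X**2 ~ chi-squared(1), which has mean 1.
print(lotus, chi2(1).mean())  # both 1.0
```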
An abstract and general form of the result is available using the framework of measure theory and the Lebesgue integral. Here, the setting is that of a measure space (Ω, μ) and a measurable map X from Ω to a measurable space Ω′. The theorem then says that for any measurable function g on Ω′ which is valued in real numbers (or even the extended real number line),
\[ \int_{\Omega'} g \, \mathrm{d}(X_\sharp \mu) = \int_{\Omega} g \circ X \, \mathrm{d}\mu \]
(interpreted as saying, in particular, that either side of the equality exists if the other side exists). Here X♯μ denotes the pushforward measure on Ω′. The 'discrete case' given above is the special case arising when X takes on only countably many values and μ is a probability measure. In fact, the discrete case (although without the restriction to probability measures) is the first step in proving the general measure-theoretic formulation, as the general version follows therefrom by an application of the monotone convergence theorem.[7] Without any major changes, the result can also be formulated in the setting of outer measures.[8]
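A minimal finite sketch of the pushforward identity (all point names, weights, and functions below are illustrative choices): on a finite Ω, both Lebesgue integrals reduce to finite sums, and μ need not be a probability measure.

```python
from collections import defaultdict

# A finite measure space: points of Omega with (not necessarily
# probability) weights mu, and a measurable map X into Omega'.
omega = ["a", "b", "c", "d"]
mu = {"a": 0.5, "b": 1.5, "c": 2.0, "d": 1.0}  # a general finite measure
X = {"a": 0, "b": 0, "c": 1, "d": 2}           # X: Omega -> Omega'
g = lambda t: t**2 + 1                         # measurable g on Omega'

# Right-hand side: integral over Omega of (g o X) d(mu).
rhs = sum(g(X[w]) * mu[w] for w in omega)

# Left-hand side: build the pushforward measure X_# mu on Omega',
# (X_# mu)({t}) = mu(X^{-1}({t})), then integrate g against it.
pushforward = defaultdict(float)
for w in omega:
    pushforward[X[w]] += mu[w]
lhs = sum(g(t) * m for t, m in pushforward.items())

print(lhs, rhs)  # equal (11.0 here), as the theorem asserts
```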
If μ is a σ-finite measure, the theory of the Radon–Nikodym derivative is applicable. In the special case that the measure X♯μ is absolutely continuous relative to some background σ-finite measure ν on Ω′, there is a real-valued function f_X on Ω′ representing the Radon–Nikodym derivative of the two measures, and then
\[ \int_{\Omega'} g \, \mathrm{d}(X_\sharp \mu) = \int_{\Omega'} g \, f_X \, \mathrm{d}\nu. \]
In the further special case that Ω′ is the real number line, as in the contexts discussed above, it is natural to take ν to be the Lebesgue measure, and this then recovers the 'continuous case' given above whenever μ is a probability measure. (In this special case, the condition of σ-finiteness is vacuous, since Lebesgue measure and every probability measure are trivially σ-finite.)[9]
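As a hedged variation on this, one can instead take ν to be counting measure on the nonnegative integers; the Radon–Nikodym derivative f_X is then the probability mass function, and the identity becomes an ordinary sum. The Poisson distribution and the factorial-moment function below are arbitrary illustrative choices.

```python
import numpy as np
from scipy.stats import poisson

# X ~ Poisson(3): the pushforward of P under X is absolutely continuous
# w.r.t. counting measure on {0, 1, 2, ...}, with Radon-Nikodym
# derivative f_X(k) = P(X = k), i.e. the pmf.
lam = 3.0
ks = np.arange(0, 200)        # truncation; the tail mass is negligible
f_X = poisson.pmf(ks, lam)
g = lambda k: k * (k - 1)     # a factorial moment, for illustration

# integral of g * f_X d(nu)  =  sum over k of g(k) * f_X(k)
print(np.sum(g(ks) * f_X))    # approx lam**2 = 9 for the Poisson law
```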