Large deviations theory

From Wikipedia, the free encyclopedia
Branch of probability theory

In probability theory, the theory of large deviations concerns the asymptotic behaviour of remote tails of sequences of probability distributions. While some basic ideas of the theory can be traced to Laplace, the formalization started with insurance mathematics, namely ruin theory with Cramér and Lundberg. A unified formalization of large deviation theory was developed in 1966, in a paper by Varadhan.[1] Large deviations theory formalizes the heuristic ideas of concentration of measures and widely generalizes the notion of convergence of probability measures.

Roughly speaking, large deviations theory concerns itself with the exponential decline of the probability measures of certain kinds of extreme or tail events.

Introductory examples


Any large deviation is done in the least unlikely of all the unlikely ways!

— Frank den Hollander, Large Deviations, p. 10

An elementary example


Consider a sequence of independent tosses of a fair coin. The possible outcomes could be heads or tails. Let us denote the outcome of the i-th trial by $X_i$, where we encode head as 1 and tail as 0. Now let $M_N$ denote the mean value after $N$ trials, namely

$$M_N = \frac{1}{N}\sum_{i=1}^{N} X_i.$$

Then $M_N$ lies between 0 and 1. From the law of large numbers it follows that as $N$ grows, the distribution of $M_N$ converges to $0.5 = \operatorname{E}[X]$ (the expected value of a single coin toss).

Moreover, by the central limit theorem, $M_N$ is approximately normally distributed for large $N$. The central limit theorem can provide more detailed information about the behavior of $M_N$ than the law of large numbers. For example, we can approximately find a tail probability of $M_N$, the probability that $M_N$ is greater than some value $x$, for a fixed value of $N$. However, the approximation by the central limit theorem may not be accurate if $x$ is far from $\operatorname{E}[X_i]$ and $N$ is not sufficiently large. Also, it does not provide information about the convergence of the tail probabilities as $N \to \infty$. Large deviations theory, however, can provide answers for such problems.

Let us make this statement more precise. For a given value $0.5 < x < 1$, let us compute the tail probability $P(M_N > x)$. Define

$$I(x) = x\ln x + (1 - x)\ln(1 - x) + \ln 2.$$

Note that the function $I(x)$ is a convex, nonnegative function that is zero at $x = \tfrac{1}{2}$ and increases as $x$ approaches $1$. It is the negative of the Bernoulli entropy with $p = \tfrac{1}{2}$; that it is appropriate for coin tosses follows from the asymptotic equipartition property applied to a Bernoulli trial. Then by Chernoff's inequality, it can be shown that $P(M_N > x) < \exp(-N I(x))$.[2] This bound is rather sharp, in the sense that $I(x)$ cannot be replaced with a larger number that would yield a strict inequality for all positive $N$.[3] (However, the exponential bound can still be reduced by a subexponential factor on the order of $1/\sqrt{N}$; this follows from the Stirling approximation applied to the binomial coefficient appearing in the binomial distribution.) Hence, we obtain the following result:

$$P(M_N > x) \approx \exp(-N I(x)).$$

The probability $P(M_N > x)$ decays exponentially as $N \to \infty$ at a rate depending on $x$. This formula approximates any tail probability of the sample mean of i.i.d. variables and gives its convergence as the number of samples increases.
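As an illustrative sketch (the function names and parameters below are our own, not part of the theory), the Chernoff bound $\exp(-N I(x))$ can be compared against a Monte Carlo estimate of the tail probability for the fair coin:

```python
import math
import random

def rate_function(x):
    """Rate function I(x) = x ln x + (1 - x) ln(1 - x) + ln 2 for a fair coin."""
    return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)

def tail_probability(n, x, trials=50_000, seed=0):
    """Monte Carlo estimate of P(M_N > x) for n tosses of a fair coin."""
    rng = random.Random(seed)
    hits = sum(
        sum(rng.randint(0, 1) for _ in range(n)) / n > x
        for _ in range(trials)
    )
    return hits / trials

n, x = 100, 0.6
estimate = tail_probability(n, x)
bound = math.exp(-n * rate_function(x))
print(f"Monte Carlo P(M_N > x) ~= {estimate:.4f}; Chernoff bound {bound:.4f}")
```

The empirical frequency should fall below the bound, which for $N = 100$ and $x = 0.6$ is still loose by roughly the subexponential factor mentioned above.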

Large deviations for sums of independent random variables

Main article: Cramér's theorem (large deviations)

In the above example of coin-tossing we explicitly assumed that each toss is an independent trial, and the probability of getting head or tail is always the same.

Let $X, X_1, X_2, \ldots$ be independent and identically distributed (i.i.d.) random variables whose common distribution satisfies a certain growth condition. Then the following limit exists:

$$\lim_{N\to\infty} \frac{1}{N}\ln P(M_N > x) = -I(x).$$

Here

$$M_N = \frac{1}{N}\sum_{i=1}^{N} X_i,$$

as before.

The function $I(\cdot)$ is called the "rate function", the "Cramér function", or sometimes the "entropy function".

The above-mentioned limit means that for large $N$,

$$P(M_N > x) \approx \exp[-N I(x)],$$

which is the basic result of large deviations theory.[4][5]
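For the fair coin this limit can be watched numerically: computing the exact binomial tail shows $-\frac{1}{N}\ln P(M_N > x)$ approaching $I(x)$ as $N$ grows. The sketch below uses our own helper names and is only illustrative:

```python
import math

def rate(x):
    """Rate function of the fair coin: I(x) = x ln x + (1 - x) ln(1 - x) + ln 2."""
    return x * math.log(x) + (1 - x) * math.log(1 - x) + math.log(2)

def log_tail(n, x):
    """Exact ln P(M_N > x) for n fair-coin tosses, via the binomial tail."""
    k_min = math.floor(n * x) + 1   # smallest head count with M_N > x
    c = math.comb(n, k_min)         # exact big-integer binomial coefficient
    total = 0
    for k in range(k_min, n + 1):
        total += c
        c = c * (n - k) // (k + 1)  # comb(n, k+1) from comb(n, k), exactly
    return math.log(total) - n * math.log(2)

x = 0.6
for n in (100, 1000, 10000):
    print(n, -log_tail(n, x) / n, rate(x))
```

By Chernoff's inequality each finite-$N$ value stays strictly above $I(x)$, and the gap shrinks like the $\tfrac{1}{N}\ln N$ correction from the subexponential prefactor.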

If we know the probability distribution of $X$, an explicit expression for the rate function can be obtained. This is given by a Legendre–Fenchel transformation,[6]

$$I(x) = \sup_{\theta > 0}\left[\theta x - \lambda(\theta)\right],$$

where

$$\lambda(\theta) = \ln \operatorname{E}[\exp(\theta X)]$$

is called the cumulant generating function (CGF) and $\operatorname{E}$ denotes the mathematical expectation.

If $X$ follows a normal distribution, the rate function becomes a parabola with its apex at the mean of the normal distribution.
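The Legendre–Fenchel transform can be evaluated numerically. The sketch below (our own illustrative code, with a crude grid search standing in for the supremum) recovers the parabolic rate function of a standard normal from its CGF $\lambda(\theta) = \theta^2/2$:

```python
import math

def legendre_fenchel(cgf, x, thetas):
    """Crude numerical Legendre-Fenchel transform:
    I(x) = sup_theta [theta * x - lambda(theta)], taken over a finite grid."""
    return max(theta * x - cgf(theta) for theta in thetas)

def cgf_normal(theta):
    """Cumulant generating function of a standard normal variable."""
    return theta * theta / 2.0

# Grid of theta values; for x >= 0 the supremum is attained at theta = x >= 0.
thetas = [i / 1000.0 for i in range(0, 5001)]

for x in (0.0, 0.5, 1.0, 2.0):
    numeric = legendre_fenchel(cgf_normal, x, thetas)
    exact = x * x / 2.0  # parabola with apex at the mean (here 0)
    print(f"x = {x}: numeric {numeric:.6f}, exact {exact:.6f}")
```

A finer grid (or closed-form optimization, here $\theta^\ast = x$) sharpens the agreement; the grid approach works for any CGF that is finite on the grid.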

If $\{X_i\}$ is an irreducible and aperiodic Markov chain, the variant of the basic large deviations result stated above may hold.[citation needed]

Moderate deviations for sums of independent random variables


The previous example controlled the probability of the event $[M_N > x]$, that is, the concentration of the law of $M_N$ on the compact set $[-x, x]$. It is also possible to control the probability of the event $[M_N > x a_N]$ for some sequence $a_N \to 0$. The following is an example of a moderate deviations principle:[7][8]

Theorem. Let $X_1, X_2, \dots$ be a sequence of centered i.i.d. variables with finite variance $\sigma^2$ such that $\forall \lambda \in \mathbb{R},\ \ln \mathbb{E}[e^{\lambda X_1}] < \infty$. Define $M_N := \frac{1}{N}\sum_{n \le N} X_n$. Then for any sequence $1 \ll a_N \ll \sqrt{N}$:

$$\lim_{N\to+\infty} \frac{a_N^2}{N}\ln \mathbb{P}\left[a_N M_N \ge x\right] = -\frac{x^2}{2\sigma^2}.$$

In particular, the limit case $a_N = \sqrt{N}$ is the central limit theorem.
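As an illustrative check (our own sketch, under the assumption of centered $\pm\tfrac{1}{2}$ coin flips, so $\sigma^2 = \tfrac{1}{4}$, with the hypothetical choice $a_N = N^{1/4}$), the rescaled log-probability can be computed exactly from the binomial tail and compared with the limit $-x^2/(2\sigma^2)$:

```python
import math

def log_binom_tail(n, k_min):
    """ln P(H >= k_min) for H ~ Binomial(n, 1/2), computed in log space."""
    logs = [
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        - n * math.log(2)
        for k in range(k_min, n + 1)
    ]
    m = max(logs)  # log-sum-exp trick for numerical stability
    return m + math.log(sum(math.exp(v - m) for v in logs))

def moderate_deviation(n, x):
    """(a_N^2 / N) * ln P[a_N M_N >= x] for centered +-1/2 coin flips,
    with a_N = N**0.25 (so 1 << a_N << sqrt(N))."""
    a = n ** 0.25
    # a_N * M_N >= x  <=>  number of heads H >= N/2 + N * x / a_N
    k_min = math.ceil(n / 2 + n * x / a)
    return (a * a / n) * log_binom_tail(n, k_min)

sigma2 = 0.25  # variance of a centered +-1/2 coin flip
x = 1.0
for n in (1000, 10000):
    print(n, moderate_deviation(n, x), -x**2 / (2 * sigma2))
```

The finite-$N$ values drift toward $-2$ as $N$ grows, consistent with the Gaussian rate $-x^2/(2\sigma^2)$ appearing in the theorem.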

Formal definition


Given a Polish space $\mathcal{X}$, let $\{\mathbb{P}_N\}$ be a sequence of Borel probability measures on $\mathcal{X}$, let $\{a_N\}$ be a sequence of positive real numbers such that $\lim_N a_N = \infty$, and finally let $I : \mathcal{X} \to [0, \infty]$ be a lower semicontinuous functional on $\mathcal{X}$. The sequence $\{\mathbb{P}_N\}$ is said to satisfy a large deviation principle with speed $\{a_N\}$ and rate $I$ if, and only if, for each Borel measurable set $E \subset \mathcal{X}$,

$$-\inf_{x \in E^{\circ}} I(x) \le \liminf_{N} a_N^{-1}\log\big(\mathbb{P}_N(E)\big) \le \limsup_{N} a_N^{-1}\log\big(\mathbb{P}_N(E)\big) \le -\inf_{x \in \overline{E}} I(x),$$

where $\overline{E}$ and $E^{\circ}$ denote respectively the closure and interior of $E$.[citation needed]

Brief history


The first rigorous results concerning large deviations are due to the Swedish mathematician Harald Cramér, who applied them to model the insurance business.[9] From the point of view of an insurance company, the earnings arrive at a constant rate per month (the monthly premium), but the claims come randomly. For the company to be successful over a certain period of time (preferably many months), the total earnings should exceed the total claims. Thus to estimate the premium you have to ask the following question: "What should we choose as the premium $q$ such that over $N$ months the total claim $C = \Sigma X_i$ should be less than $Nq$?" This is clearly the same question asked by large deviations theory. Cramér gave a solution to this question for i.i.d. random variables, where the rate function is expressed as a power series.

A very incomplete list of mathematicians who have made important advances would include Petrov,[10] Sanov,[11] S.R.S. Varadhan (who has won the Abel Prize for his contribution to the theory), D. Ruelle, O.E. Lanford, Mark Freidlin, Alexander D. Wentzell, Amir Dembo, and Ofer Zeitouni.[12]

Applications


Principles of large deviations may be effectively applied to gather information out of a probabilistic model. Thus, the theory of large deviations finds its applications in information theory and risk management. In physics, the best-known applications of large deviations theory arise in thermodynamics and statistical mechanics (in connection with relating entropy with the rate function).[13][14]

Large deviations and entropy

Main article: Asymptotic equipartition property

The rate function is related to the entropy in statistical mechanics. This can be seen heuristically in the following way. In statistical mechanics the entropy of a particular macro-state is related to the number of micro-states which correspond to this macro-state. In our coin-tossing example the mean value $M_N$ could designate a particular macro-state, while a particular sequence of heads and tails which gives rise to a particular value of $M_N$ constitutes a particular micro-state. Loosely speaking, a macro-state with a larger number of micro-states giving rise to it has higher entropy, and a state with higher entropy has a higher chance of being realised in actual experiments. The macro-state with mean value 1/2 (as many heads as tails) has the largest number of micro-states giving rise to it, and it is indeed the state with the highest entropy; in most practical situations we shall indeed obtain this macro-state for large numbers of trials. The "rate function", on the other hand, measures the probability of appearance of a particular macro-state: the smaller the rate function, the higher the chance of the macro-state appearing. In our coin-tossing example the value of the rate function for mean value 1/2 is zero. In this way one can see the rate function as the negative of the entropy.
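This micro-state count can be made quantitative for the coin example: the number of length-$N$ sequences with mean $x$ is $\binom{N}{xN}$, and $\frac{1}{N}\ln\binom{N}{xN}$ tends to the binary entropy $H(x)$, with $I(x) = \ln 2 - H(x)$. A small illustrative sketch (helper names are our own):

```python
import math

def binary_entropy(x):
    """H(x) = -x ln x - (1 - x) ln(1 - x), in nats."""
    return -x * math.log(x) - (1 - x) * math.log(1 - x)

def rate(x):
    """Rate function of the fair coin, written as ln 2 minus the entropy."""
    return math.log(2) - binary_entropy(x)

def microstate_exponent(n, x):
    """(1/n) * ln(number of length-n coin sequences with round(x*n) heads)."""
    k = round(x * n)
    log_comb = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_comb / n

x = 0.6
for n in (100, 1000, 10000):
    print(n, microstate_exponent(n, x), binary_entropy(x))
```

The exponent of the micro-state count converges to $H(x)$, so the probability $2^{-N}\binom{N}{xN}$ has exponent $H(x) - \ln 2 = -I(x)$, recovering the coin-tossing rate function.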

There is a relation between the "rate function" in large deviations theory and the Kullback–Leibler divergence; the connection is established by Sanov's theorem (see Sanov[11] and Novak,[15] ch. 14.5).

In a special case, large deviations are closely related to the concept of Gromov–Hausdorff limits.[16]


References

  1. ^ S.R.S. Varadhan, Asymptotic probability and differential equations, Comm. Pure Appl. Math. 19 (1966), 261–286.
  2. ^ Shwartz, Adam; Weiss, Alan. Large Deviations for Performance Analysis: Queues, Communications, and Computing. Chapman and Hall.
  3. ^ Varadhan, S.R.S., "Large deviations", The Annals of Probability, 2008, Vol. 36, No. 2, 397–419.
  4. ^ "Large Deviations" (PDF). www.math.nyu.edu. 2 February 2012. Retrieved 11 June 2024.
  5. ^ S.R.S. Varadhan, Large Deviations and Applications (SIAM, Philadelphia, 1984).
  6. ^ Touchette, Hugo (1 July 2009). "The large deviation approach to statistical mechanics". Physics Reports. 478 (1–3): 1–69. arXiv:0804.0327. Bibcode:2009PhR...478....1T. doi:10.1016/j.physrep.2009.05.002. S2CID 118416390.
  7. ^ Dembo, Amir; Zeitouni, Ofer (3 November 2009). Large Deviations Techniques and Applications. Springer Science & Business Media. p. 109. ISBN 978-3-642-03311-7.
  8. ^ Sethuraman, Jayaram; O., Robert (2011), "Moderate Deviations", in Lovric, Miodrag (ed.), International Encyclopedia of Statistical Science, Berlin, Heidelberg: Springer, pp. 847–849, doi:10.1007/978-3-642-04898-2_374, ISBN 978-3-642-04897-5. Retrieved 2 July 2023.
  9. ^ Cramér, H. (1944). On a new limit theorem of the theory of probability. Uspekhi Matematicheskikh Nauk, (10), 166–178.
  10. ^ Petrov, V.V. (1954). Generalization of Cramér's limit theorem. Uspekhi Matem. Nauk, v. 9, No. 4 (62), 195–202. (Russian)
  11. ^ a b Sanov, I.N. (1957). On the probability of large deviations of random magnitudes. Matem. Sbornik, v. 42 (84), 11–44.
  12. ^ Dembo, A.; Zeitouni, O. (2009). Large Deviations Techniques and Applications (Vol. 38). Springer Science & Business Media.
  13. ^ Gingrich, Todd R.; Horowitz, Jordan M.; Perunov, Nikolay; England, Jeremy L. (21 March 2016). "Dissipation Bounds All Steady-State Current Fluctuations". Physical Review Letters. 116 (12) 120601. arXiv:1512.02212. doi:10.1103/PhysRevLett.116.120601.
  14. ^ Harvey, Sarah E.; Lahiri, Subhaneil; Ganguli, Surya (7 July 2023). "Universal energy-accuracy tradeoffs in nonequilibrium cellular sensing". Physical Review E. 108 (1) 014403. arXiv:2002.10567. doi:10.1103/PhysRevE.108.014403.
  15. ^ Novak, S.Y. (2011). Extreme Value Methods with Applications to Finance. Chapman & Hall/CRC Press. ISBN 978-1-4398-3574-6.
  16. ^ Kotani, M.; Sunada, T. Large deviation and the tangent cone at infinity of a crystal lattice, Math. Z. 254 (2006), 837–870.

Bibliography

  • Special invited paper: "Large deviations" by S. R. S. Varadhan, The Annals of Probability, 2008, Vol. 36, No. 2, 397–419. doi:10.1214/07-AOP348
  • "A basic introduction to large deviations: Theory, applications, simulations", Hugo Touchette, arXiv:1106.4146.
  • Entropy, Large Deviations and Statistical Mechanics by R.S. Ellis. Springer. ISBN 3-540-29059-1
  • Large Deviations for Performance Analysis by Alan Weiss and Adam Shwartz. Chapman and Hall. ISBN 0-412-06311-5
  • Large Deviations Techniques and Applications by Amir Dembo and Ofer Zeitouni. Springer. ISBN 0-387-98406-2
  • A Course on Large Deviations with an Introduction to Gibbs Measures by Firas Rassoul-Agha and Timo Seppäläinen. Grad. Stud. Math., 162. American Mathematical Society. ISBN 978-0-8218-7578-0
  • Random Perturbations of Dynamical Systems by M.I. Freidlin and A.D. Wentzell. Springer. ISBN 0-387-98362-7
  • "Large Deviations for Two Dimensional Navier–Stokes Equation with Multiplicative Noise", S. S. Sritharan and P. Sundar, Stochastic Processes and Their Applications, Vol. 116 (2006), 1636–1659.
  • "Large Deviations for the Stochastic Shell Model of Turbulence", U. Manna, S. S. Sritharan and P. Sundar, NoDEA Nonlinear Differential Equations Appl. 16 (2009), no. 4, 493–521.
