In probability theory, the continuous mapping theorem states that continuous functions preserve limits even if their arguments are sequences of random variables. A continuous function, in Heine's definition, is such a function that maps convergent sequences into convergent sequences: if xn → x then g(xn) → g(x). The continuous mapping theorem states that this will also be true if we replace the deterministic sequence {xn} with a sequence of random variables {Xn}, and replace the standard notion of convergence of real numbers "→" with one of the types of convergence of random variables.
This theorem was first proved by Henry Mann and Abraham Wald in 1943,[1] and it is therefore sometimes called the Mann–Wald theorem.[2] Meanwhile, Denis Sargan refers to it as the general transformation theorem.[3]
Let {Xn}, X be random elements defined on a metric space S. Suppose a function g: S → S′ (where S′ is another metric space) has the set of discontinuity points Dg such that Pr[X ∈ Dg] = 0. Then[4][5]

    Xn →d X   ⇒   g(Xn) →d g(X);
    Xn →p X   ⇒   g(Xn) →p g(X);
    Xn →a.s. X   ⇒   g(Xn) →a.s. g(X);

where the superscripts "d", "p", and "a.s." denote convergence in distribution, convergence in probability, and almost sure convergence respectively.
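To make the statement concrete, here is a small Monte Carlo sketch. The distributions, the map g(x) = x², and the sequence Xn = X + Z/n are illustrative assumptions, not part of the theorem: since Xn → X in probability, the theorem predicts that Pr[|g(Xn) − g(X)| > ε] should shrink to zero as n grows.

```python
import numpy as np

# Illustrative setup (assumed, not from the theorem itself):
# X ~ N(0,1), Xn = X + Z/n with fresh Z ~ N(0,1), so Xn -> X in probability.
# For the continuous map g(x) = x**2 the theorem then gives
# g(Xn) -> g(X) in probability as well.
rng = np.random.default_rng(0)
m = 100_000                     # Monte Carlo sample size
X = rng.standard_normal(m)
eps = 0.1

for n in (1, 10, 100, 1000):
    Xn = X + rng.standard_normal(m) / n
    p = np.mean(np.abs(Xn**2 - X**2) > eps)   # estimate of Pr[|g(Xn) - g(X)| > eps]
    print(n, p)
```

The estimated probabilities decrease as n grows, matching the "p" case of the theorem.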
Spaces S and S′ are equipped with certain metrics. For simplicity we will denote both of these metrics using the |x − y| notation, even though the metrics may be arbitrary and not necessarily Euclidean.
We will need a particular statement from the portmanteau theorem: that convergence in distribution Xn →d X is equivalent to

    E[f(Xn)] → E[f(X)] for every bounded continuous functional f.
So it suffices to prove that E[f(g(Xn))] → E[f(g(X))] for every bounded continuous functional f. For simplicity we assume g continuous. Note that f ∘ g is itself a bounded continuous functional, and so the claim follows from the statement above. The general case is slightly more technical.
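The composition step can be checked numerically. In this sketch the law of X, the sequence Xn, the map g, and the bounded continuous functional f are all assumed for illustration:

```python
import numpy as np

# Assumed setup: X ~ N(0,1), Xn = X + Z/n (so Xn -> X in distribution),
# g(x) = x**2 continuous, f = arctan bounded and continuous.
# Then f o g is bounded and continuous, so E[f(g(Xn))] -> E[f(g(X))].
rng = np.random.default_rng(1)
m = 200_000
X = rng.standard_normal(m)
target = np.mean(np.arctan(X**2))    # Monte Carlo estimate of E[f(g(X))]

for n in (1, 10, 100):
    Xn = X + rng.standard_normal(m) / n
    gap = abs(np.mean(np.arctan(Xn**2)) - target)   # |E[f(g(Xn))] - E[f(g(X))]|
    print(n, gap)
```

The gap shrinks as n grows, as the portmanteau criterion requires.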
Fix an arbitrary ε > 0. Then for any δ > 0 consider the set Bδ defined as

    Bδ = { x ∈ S \ Dg : there exists y ∈ S with |x − y| < δ and |g(x) − g(y)| > ε }.

This is the set of continuity points x of the function g(·) for which it is possible to find, within the δ-neighborhood of x, a point which maps outside the ε-neighborhood of g(x). By definition of continuity, this set shrinks as δ goes to zero, so that limδ→0 Bδ = ∅.
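For a concrete (assumed) example, take S = ℝ, g(x) = x² and X ~ N(0,1). The largest oscillation of g on the δ-neighborhood of x is δ(2|x| + δ), so x ∈ Bδ exactly when δ(2|x| + δ) > ε, and Pr[X ∈ Bδ] can be estimated directly:

```python
import numpy as np

# Assumed example: g(x) = x**2 on the real line, X ~ N(0,1).
# The sup over |y - x| < delta of |g(y) - g(x)| equals delta * (2|x| + delta),
# so x lies in B_delta exactly when delta * (2|x| + delta) > eps.
rng = np.random.default_rng(2)
X = rng.standard_normal(200_000)
eps = 0.1

for delta in (0.1, 0.01, 0.001):
    in_B = delta * (2 * np.abs(X) + delta) > eps
    print(delta, in_B.mean())     # estimate of Pr[X in B_delta]
```

As δ → 0 the set Bδ here escapes to infinity, so the estimated probability drops to zero; this is exactly the behavior the second term of the probability bound relies on.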
Now suppose that |g(X) − g(Xn)| > ε. This implies that at least one of the following is true: either |X − Xn| ≥ δ, or X ∈ Dg, or X ∈ Bδ. In terms of probabilities this can be written as

    Pr[ |g(Xn) − g(X)| > ε ] ≤ Pr[ |Xn − X| ≥ δ ] + Pr[ X ∈ Bδ ] + Pr[ X ∈ Dg ].
On the right-hand side, the first term converges to zero as n → ∞ for any fixed δ, by the definition of convergence in probability of the sequence {Xn}. The second term converges to zero as δ → 0, since the set Bδ shrinks to an empty set. And the last term is identically equal to zero by assumption of the theorem. Therefore, the conclusion is that

    limn→∞ Pr[ |g(Xn) − g(X)| > ε ] = 0,

which means that g(Xn) converges to g(X) in probability.
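The whole decomposition can be sanity-checked empirically. Everything here (the law of X, the sequence Xn, and g(x) = x², for which Dg is empty) is an assumed illustration:

```python
import numpy as np

# Assumed setup: X ~ N(0,1), Xn = X + Z/n, g(x) = x**2 (so Dg is empty and the
# third term of the bound vanishes). Membership in B_delta uses the criterion
# delta * (2|x| + delta) > eps, which is exact for this particular g.
rng = np.random.default_rng(3)
m = 200_000
X = rng.standard_normal(m)
eps, delta, n = 0.1, 0.01, 50
Xn = X + rng.standard_normal(m) / n

lhs = np.mean(np.abs(Xn**2 - X**2) > eps)               # Pr[|g(Xn) - g(X)| > eps]
term1 = np.mean(np.abs(Xn - X) >= delta)                # Pr[|Xn - X| >= delta]
term2 = np.mean(delta * (2 * np.abs(X) + delta) > eps)  # Pr[X in B_delta]
print(lhs, "<=", term1 + term2)
```

Because the inclusion of events holds sample by sample, the estimated left-hand side never exceeds the sum of the two estimated terms.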
By definition of the continuity of the function g(·),

    Xn(ω) → X(ω)   ⇒   g(Xn(ω)) → g(X(ω))

at each point X(ω) where g(·) is continuous. Therefore,

    Pr[ g(Xn) → g(X) ] ≥ Pr[ g(Xn) → g(X), X ∉ Dg ] ≥ Pr[ Xn → X, X ∉ Dg ] = 1,
because the intersection of two almost sure events is almost sure.
By definition, we conclude that g(Xn) converges to g(X) almost surely.
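For the almost sure case, a pathwise sketch helps: with assumed per-path dynamics Xn(ω) = X(ω) + (−1)ⁿ/n (so Xn(ω) → X(ω) on every path) and the continuous map g = exp, every sample path of g(Xn) converges to g(X):

```python
import numpy as np

# Assumed pathwise dynamics: Xn(omega) = X(omega) + (-1)**n / n, which
# converges to X(omega) on every sample path, so for the continuous map
# g(x) = exp(x) every path of g(Xn) converges to g(X) as well.
rng = np.random.default_rng(4)
X = rng.standard_normal(1000)            # one realization X(omega) per path
n = np.arange(1, 5001)
Xn = X[:, None] + (-1.0) ** n / n        # rows: paths, columns: index n
gap = np.abs(np.exp(Xn) - np.exp(X)[:, None])
print(gap[:, -1].max())                  # worst pathwise gap at n = 5000
```

The worst gap over all 1000 paths is already tiny at n = 5000, illustrating convergence on every path rather than merely in probability.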