Mixing (mathematics)

From Wikipedia, the free encyclopedia

Mathematical description of mixing substances

Repeated application of the baker's map to points colored red and blue, initially separated. The baker's map is mixing, shown by the red and blue points being completely mixed after several iterations.

In mathematics, mixing is an abstract concept originating from physics: the attempt to describe the irreversible thermodynamic process of mixing in the everyday world, e.g. mixing paint, mixing drinks, industrial mixing.

The concept appears in ergodic theory, the study of stochastic processes and measure-preserving dynamical systems. Several different definitions for mixing exist, including strong mixing, weak mixing and topological mixing, with the last not requiring a measure to be defined. Some of the different definitions of mixing can be arranged in a hierarchical order; thus, strong mixing implies weak mixing. Furthermore, weak mixing (and thus also strong mixing) implies ergodicity: that is, every system that is weakly mixing is also ergodic (and so one says that mixing is a "stronger" condition than ergodicity).

Informal explanation

The mathematical definition of mixing aims to capture the ordinary every-day process of mixing, such as mixing paints, drinks, cooking ingredients, industrial process mixing, smoke in a smoke-filled room, and so on. To provide the mathematical rigor, such descriptions begin with the definition of a measure-preserving dynamical system, written as $(X,{\mathcal {A}},\mu ,T)$.

The set $X$ is understood to be the total space to be filled: the mixing bowl, the smoke-filled room, etc. The measure $\mu$ is understood to define the natural volume of the space $X$ and of its subspaces. The collection of subspaces is denoted by ${\mathcal {A}}$, and the size of any given subset $A\subset X$ is $\mu (A)$; the size is its volume. Naively, one could imagine ${\mathcal {A}}$ to be the power set of $X$; this doesn't quite work, as not all subsets of a space have a volume (famously, the Banach–Tarski paradox). Thus, conventionally, ${\mathcal {A}}$ consists of the measurable subsets: the subsets that do have a volume. It is typically taken to be the Borel σ-algebra, the collection of subsets that can be constructed from the open sets by taking countable intersections, countable unions and set complements; these can always be taken to be measurable.

The time evolution of the system is described by a map $T:X\to X$. Given some subset $A\subset X$, its image $T(A)$ will in general be a deformed version of $A$: it is squashed or stretched, folded or cut into pieces. Mathematical examples include the baker's map and the horseshoe map, both inspired by bread-making. The set $T(A)$ must have the same volume as $A$; the squashing/stretching does not alter the volume of the space, only its distribution. Such a system is "measure-preserving" (area-preserving, volume-preserving).
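The squash-and-cut picture can be tried numerically. The sketch below is illustrative only: it assumes one common "unfolded" form of the baker's map on the unit square, and the helper names `baker` and `fraction_in_box` are hypothetical. It estimates the volume of the set of points that land in a test box after one step, which should match the volume of the box itself if the map is measure-preserving.

```python
import random

def baker(x, y):
    """One step of an (unfolded) baker's map on [0,1) x [0,1).

    Assumed form: stretch horizontally by 2, squash vertically by 2,
    then cut the right half off and stack it on top.
    """
    if x < 0.5:
        return 2.0 * x, y / 2.0
    return 2.0 * x - 1.0, (y + 1.0) / 2.0

def fraction_in_box(points, x0, x1, y0, y1):
    """Fraction of points lying in the box [x0, x1) x [y0, y1)."""
    inside = sum(1 for (x, y) in points if x0 <= x < x1 and y0 <= y < y1)
    return inside / len(points)

random.seed(0)  # fixed seed so the Monte Carlo estimate is reproducible
points = [(random.random(), random.random()) for _ in range(20000)]

box = (0.0, 0.5, 0.0, 0.25)  # a test set of area 1/8
before = fraction_in_box(points, *box)
# Points whose image lies in the box form the preimage of the box.
after = fraction_in_box([baker(x, y) for (x, y) in points], *box)
```

Both `before` and `after` are Monte Carlo estimates of the same area, so they should agree up to sampling noise.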

A formal difficulty arises when one tries to reconcile the volume of sets with the need to preserve their size under a map. The problem arises because, in general, several different points in the domain of a function can map to the same point in its range; that is, there may be $x\neq y$ with $T(x)=T(y)$. Worse, a single point $x\in X$ has no size. These difficulties can be avoided by working with the inverse map $T^{-1}:{\mathcal {A}}\to {\mathcal {A}}$; it will map any given subset $A\subset X$ to the parts that were assembled to make it: these parts are $T^{-1}(A)\in {\mathcal {A}}$. It has the important property of not "losing track" of where things came from. More strongly, it has the important property that any (measure-preserving) map ${\mathcal {A}}\to {\mathcal {A}}$ is the inverse of some map $X\to X$. The proper definition of a volume-preserving map is one for which $\mu (A)=\mu (T^{-1}(A))$, because $T^{-1}(A)$ describes all the pieces that $A$ came from.
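As a concrete illustration (an assumed example, not part of the definition above), take $X=[0,1)$ with Lebesgue measure and the doubling map $T(x)=2x{\bmod {1}}$. For an interval $A=[a,b)$ the preimage consists of two half-length copies:

T^{-1}([a,b))=\left[{\tfrac {a}{2}},{\tfrac {b}{2}}\right)\cup \left[{\tfrac {a+1}{2}},{\tfrac {b+1}{2}}\right),\qquad \mu (T^{-1}(A))={\tfrac {b-a}{2}}+{\tfrac {b-a}{2}}=\mu (A).

The forward image, by contrast, can change volume: $T([0,{\tfrac {1}{2}}))=[0,1)$ has measure 1 rather than ${\tfrac {1}{2}}$. This is the sense in which the preimage, not the image, is the right object for stating volume preservation.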

One is now interested in studying the time evolution of the system. If a set $A\in {\mathcal {A}}$ eventually visits all of $X$ over a long period of time (that is, if $\cup _{k=1}^{n}T^{k}(A)$ approaches all of $X$ for large $n$), the system is said to be ergodic. If every set $A$ behaves in this way, the system is a conservative system, placed in contrast to a dissipative system, where some subsets $A$ wander away, never to be returned to. An example would be water running downhill: once it has run down, it will never come back up again. The lake that forms at the bottom of this river can, however, become well-mixed. The Hopf decomposition states that a dynamical system with a non-singular transformation can be split into two parts: the conservative part, and the dissipative part.

Mixing is a stronger statement than ergodicity. Mixing asks for this ergodic property to hold between any two sets $A,B$, and not just between some set $A$ and $X$. That is, a system is said to be (topologically) mixing if, given any two sets $A,B\in {\mathcal {A}}$, there is an integer $N$ such that, for all $n>N$, one has $T^{n}(A)\cap B\neq \varnothing$. Here, $\cap$ denotes set intersection and $\varnothing$ is the empty set.

The above definition of topological mixing should be enough to provide an informal idea of mixing (it is equivalent to the formal definition, given below). However, it made no mention of the volume of $A$ and $B$, and, indeed, there is another definition that explicitly works with the volume. Several, actually; one has both strong mixing and weak mixing; they are inequivalent, although a strong mixing system is always weakly mixing. The measure-based definitions are not compatible with the definition of topological mixing: there are systems which are one, but not the other. The general situation remains cloudy: for example, given three sets $A,B,C\in {\mathcal {A}}$, one can define 3-mixing. As of 2020, it is not known if 2-mixing implies 3-mixing. (If one thinks of ergodicity as "1-mixing", then it is clear that 1-mixing does not imply 2-mixing; there are systems that are ergodic but not mixing.)

The concept of strong mixing is made in reference to the volume of a pair of sets. Consider, for example, a set $A$ of colored dye that is being mixed into a cup of some sort of sticky liquid, say, corn syrup, or shampoo, or the like. Practical experience shows that mixing sticky fluids can be quite hard: there is usually some corner of the container where it is hard to get the dye mixed into. Pick as set $B$ that hard-to-reach corner. The question of mixing is then, can $A$, after a long enough period of time, not only penetrate into $B$ but also fill $B$ with the same proportion as it does elsewhere?

One phrases the definition of strong mixing as the requirement that

\lim _{n\to \infty }\mu \left(T^{-n}A\cap B\right)=\mu (A)\mu (B).

The time parameter $n$ serves to separate $A$ and $B$ in time, so that one is mixing $A$ while holding the test volume $B$ fixed. The product $\mu (A)\mu (B)$ is a bit more subtle. Imagine that the volume $B$ is 10% of the total volume, and that the volume of dye $A$ will also be 10% of the grand total. If $A$ is uniformly distributed, then it is occupying 10% of $B$, which itself is 10% of the total, and so, in the end, after mixing, the part of $A$ that is in $B$ is 1% of the total volume. That is, $\mu \left({\mbox{after-mixing}}(A)\cap B\right)=\mu (A)\mu (B).$ This product of volumes has more than a passing resemblance to the product rule for independent events in probability; this is not an accident, but rather a consequence of the fact that measure theory and probability theory are the same theory: they share the same axioms (the Kolmogorov axioms), even as they use different notation.

The reason for using $T^{-n}A$ instead of $T^{n}A$ in the definition is a bit subtle, but it follows from the same reasons why $T^{-1}A$ was used to define the concept of a measure-preserving map. When looking at how much dye got mixed into the corner $B$, one wants to look at where that dye "came from" (presumably, it was poured in at the top, at some time in the past). One must be sure that every place it might have "come from" eventually gets mixed into $B$.

Mixing in dynamical systems

Let $(X,{\mathcal {A}},\mu ,T)$ be a measure-preserving dynamical system, with $T$ being the time-evolution or shift operator. The system is said to be strong mixing if, for any $A,B\in {\mathcal {A}}$, one has

\lim _{n\to \infty }\mu \left(A\cap T^{-n}B\right)=\mu (A)\mu (B).

For shifts parametrized by a continuous variable instead of a discrete integer $n$, the same definition applies, with $T^{-n}$ replaced by $T_{g}$, with $g$ being the continuous-time parameter.

A dynamical system is said to be weak mixing if, for any $A,B\in {\mathcal {A}}$, one has

\lim _{n\to \infty }{\frac {1}{n}}\sum _{k=0}^{n-1}\left|\mu (A\cap T^{-k}B)-\mu (A)\mu (B)\right|=0.

In other words, $T$ is strong mixing if $\mu (A\cap T^{-n}B)-\mu (A)\mu (B)\to 0$ in the usual sense, weak mixing if

\left|\mu (A\cap T^{-n}B)-\mu (A)\mu (B)\right|\to 0,

in the Cesàro sense, and ergodic if $\mu \left(A\cap T^{-n}B\right)\to \mu (A)\mu (B)$ in the Cesàro sense. Hence, strong mixing implies weak mixing, which implies ergodicity. However, the converses are not true: there exist ergodic dynamical systems which are not weakly mixing, and weakly mixing dynamical systems which are not strongly mixing. The Chacon system was historically the first example given of a system that is weak mixing but not strong mixing.^[1]

Theorem. Weak mixing implies ergodicity.

Proof. If the action of the map decomposes into two components $A,B$ (disjoint invariant sets whose union is $X$), then $T^{-n}(A)=A$ and we have $\mu (T^{-n}(A)\cap B)=\mu (A\cap B)=\mu (\emptyset )=0$ for every $n$, so weak mixing implies $\vert \mu (A\cap B)-\mu (A)\mu (B)\vert =0$, that is, $\mu (A)\mu (B)=0$. Hence one of $A,B$ has zero measure, and the other one has full measure.

Covering families

Given a topological space, such as the unit interval (whether it has its end points or not), we can construct a measure on it by taking the open sets, then taking their unions, complements, unions, complements, and so on to infinity, to obtain all the Borel sets. Next, we define a measure $\mu$ on the Borel sets, then add in all the subsets of measure-zero ("negligible sets"). This is how we obtain the Lebesgue measure and the Lebesgue measurable sets.

In most applications of ergodic theory, the underlying space is almost-everywhere isomorphic to an open subset of some $\mathbb {R} ^{n}$ , and so it is a Lebesgue measure space. Verifying strong-mixing can be simplified if we only need to check a smaller set of measurable sets.

A covering family ${\mathcal {C}}$ is a set of measurable sets, such that any open set is a disjoint union of sets in it. Compare this with a base in topology, which is less restrictive as it allows non-disjoint unions.

Theorem. For Lebesgue measure spaces, if $T$ is measure-preserving, and $\lim _{n}\mu (T^{-n}(A)\cap B)=\mu (A)\mu (B)$ for all $A,B$ in a covering family, then $T$ is strong mixing.

Proof. Extend the mixing equation from all $A,B$ in the covering family, to all open sets by disjoint union, to all closed sets by taking the complement, to all measurable sets by using the regularity of Lebesgue measure to approximate any set with open and closed sets. Thus, $\lim _{n}\mu (T^{-n}(A)\cap B)=\mu (A)\mu (B)$ for all measurable $A,B$.

L² formulation

The properties of ergodicity, weak mixing and strong mixing of a measure-preserving dynamical system can also be characterized by the average of observables. By von Neumann's ergodic theorem, ergodicity of a dynamical system $(X,{\mathcal {A}},\mu ,T)$ is equivalent to the property that, for any function $f\in L^{2}(X,\mu )$ , the sequence $(f\circ T^{n})_{n\geq 0}$ converges strongly and in the sense of Cesàro to⁠ $\int _{X}f\,d\mu$ ⁠, i.e.,

\lim _{N\to \infty }\left\|{1 \over N}\sum _{n=0}^{N-1}f\circ T^{n}-\int _{X}f\,d\mu \right\|_{L^{2}(X,\mu )}=0.

A dynamical system $(X,{\mathcal {A}},\mu ,T)$ is weakly mixing if, for any functions $f$ and $g\in L^{2}(X,\mu ),$

\lim _{N\to \infty }{1 \over N}\sum _{n=0}^{N-1}\left|\int _{X}f\circ T^{n}\cdot g\,d\mu -\int _{X}f\,d\mu \cdot \int _{X}g\,d\mu \right|=0.

A dynamical system $(X,{\mathcal {A}},\mu ,T)$ is strongly mixing if, for any function⁠ $f\in L^{2}(X,\mu )$ ⁠, the sequence $(f\circ T^{n})_{n\geq 0}$ converges weakly to⁠ $\int _{X}f\,d\mu$ ⁠, i.e., for any function $g\in L^{2}(X,\mu ),$

\lim _{n\to \infty }\int _{X}f\circ T^{n}\cdot g\,d\mu =\int _{X}f\,d\mu \cdot \int _{X}g\,d\mu .

Since the system is assumed to be measure preserving, this last line is equivalent to saying that the covariance $\lim _{n\to \infty }\operatorname {Cov} (f\circ T^{n},g)=0$, so that the random variables $f\circ T^{n}$ and $g$ become orthogonal as $n$ grows. Actually, since this works for any function $g$, one can informally see mixing as the property that the random variables $f\circ T^{n}$ and $g$ become independent as $n$ grows.
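A worked instance may make the decay of the covariance concrete. The choices $f(x)=g(x)=x$ and the doubling map $T(x)=2x{\bmod {1}}$ on $[0,1)$ with Lebesgue measure are assumptions made here purely for illustration. Write $N=2^{n}$ and split $[0,1)$ into the intervals $[k/N,(k+1)/N)$, on which $f\circ T^{n}(x)=Nx-k$. The substitution $x=(k+u)/N$ then gives

\int _{X}f\circ T^{n}\cdot g\,d\mu =\sum _{k=0}^{N-1}{\frac {1}{N^{2}}}\int _{0}^{1}u\,(k+u)\,du=\sum _{k=0}^{N-1}{\frac {1}{N^{2}}}\left({\frac {k}{2}}+{\frac {1}{3}}\right)={\frac {N-1}{4N}}+{\frac {1}{3N}}={\frac {1}{4}}+{\frac {1}{12N}}.

Since $\int _{X}f\,d\mu \cdot \int _{X}g\,d\mu ={\tfrac {1}{4}}$, this yields

\operatorname {Cov} (f\circ T^{n},g)={\frac {1}{12\cdot 2^{n}}}\to 0,

which halves at each step, consistent with the strong mixing condition for this particular pair of observables.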

Products of dynamical systems

Given two measured dynamical systems $(X,\mu ,T)$ and $(Y,\nu ,S),$ one can construct a dynamical system $(X\times Y,\mu \otimes \nu ,T\times S)$ on the Cartesian product by defining $(T\times S)(x,y)=(T(x),S(y)).$ We then have the following characterizations of weak mixing:^[2]

Proposition. A dynamical system $(X,\mu ,T)$ is weakly mixing if and only if, for any ergodic dynamical system $(Y,\nu ,S)$, the system $(X\times Y,\mu \otimes \nu ,T\times S)$ is also ergodic.

Proposition. A dynamical system $(X,\mu ,T)$ is weakly mixing if and only if $(X^{2},\mu \otimes \mu ,T\times T)$ is also ergodic. If this is the case, then $(X^{2},\mu \otimes \mu ,T\times T)$ is also weakly mixing.

Generalizations

The definition given above is sometimes called strong 2-mixing, to distinguish it from higher orders of mixing. A strong 3-mixing system may be defined as a system for which

\lim _{m,n\to \infty }\mu (A\cap T^{-m}B\cap T^{-m-n}C)=\mu (A)\mu (B)\mu (C)

holds for all measurable sets $A$, $B$, $C$. We can define strong $k$-mixing similarly. A system which is strong $k$-mixing for all $k=2,3,4,\dots$ is called mixing of all orders.

It is unknown whether strong 2-mixing implies strong 3-mixing. It is known that strong $m$-mixing implies ergodicity.

Examples

Irrational rotations of the circle, and more generally irreducible translations on a torus, are ergodic but neither strongly nor weakly mixing with respect to the Lebesgue measure.
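The difference between ergodicity and weak mixing for a rotation can be checked numerically. The following is a rough sketch under stated assumptions: the rotation angle (the golden-ratio conjugate), the choice $A=B=[0,{\tfrac {1}{2}})$, and the helper names are all illustrative. For two half-circle arcs offset by $s$, the overlap has length ${\tfrac {1}{2}}-\min(s,1-s)$, so the terms of the Cesàro sums can be written in closed form.

```python
import math

def overlap_half_arcs(shift):
    """Lebesgue measure of [0, 1/2) intersected with [0, 1/2) + shift on R/Z."""
    s = shift % 1.0
    return 0.5 - min(s, 1.0 - s)

def cesaro_averages(alpha, n_terms):
    """Cesaro averages of mu(A ∩ T^{-k}B) - mu(A)mu(B), signed and in absolute value.

    Here T is rotation by alpha and A = B = [0, 1/2), so mu(A)mu(B) = 1/4.
    The overlap is symmetric in the shift, so k*alpha can stand in for -k*alpha.
    """
    target = 0.25
    signed = 0.0
    absolute = 0.0
    for k in range(n_terms):
        m = overlap_half_arcs(k * alpha)
        signed += m - target
        absolute += abs(m - target)
    return signed / n_terms, absolute / n_terms

alpha = (math.sqrt(5) - 1) / 2  # an irrational angle (assumed for illustration)
signed_avg, abs_avg = cesaro_averages(alpha, 100000)
```

The signed average tends to 0, as ergodicity requires, while the average of absolute values settles near ${\tfrac {1}{8}}$ rather than 0, so the weak mixing condition fails for this pair of sets.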

Many maps considered as chaotic are strongly mixing for some well-chosen invariant measure, including: the dyadic map, Arnold's cat map, horseshoe maps, Kolmogorov automorphisms, and the Anosov flow (the geodesic flow on the unit tangent bundle of compact manifolds of negative curvature).

The dyadic map is "shift to left in binary". In general, for any $n\in \{2,3,\dots \}$, the "shift to left in base $n$" map $T(x)=nx{\bmod {1}}$ is strongly mixing on the covering family $\left\{\left({\tfrac {k}{n^{s}}},{\tfrac {k+1}{n^{s}}}\right)\smallsetminus \mathbb {Q} :s\geq 0,\ 0\leq k<n^{s}\right\}$, therefore it is strongly mixing on $(0,1)\smallsetminus \mathbb {Q}$, and therefore it is strongly mixing on $[0,1]$.
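For interval sets the mixing limit can be computed exactly rather than simulated. The sketch below is an illustration under assumptions: the function name `preimage_overlap` and the particular intervals are hypothetical. It uses the fact that, for $T(x)=bx{\bmod {1}}$, the preimage $T^{-n}[a_{0},a_{1})$ is the union of the $b^{n}$ intervals $[(k+a_{0})/b^{n},(k+a_{1})/b^{n})$, and it works with exact rationals to avoid floating-point artifacts.

```python
from fractions import Fraction

def preimage_overlap(a0, a1, c0, c1, n, base=2):
    """Exact Lebesgue measure of T^{-n}[a0, a1) ∩ [c0, c1) for T(x) = base*x mod 1.

    The preimage is a union of base**n scaled copies of [a0, a1), one in each
    interval [k/base**n, (k+1)/base**n); each copy is intersected with [c0, c1).
    """
    scale = base ** n
    total = Fraction(0)
    for k in range(scale):
        lo = (k + a0) / scale
        hi = (k + a1) / scale
        total += max(Fraction(0), min(hi, c1) - max(lo, c0))
    return total

# Illustrative sets: A = [0, 1/3) and B = [1/5, 7/10), so mu(A)mu(B) = 1/6.
A = (Fraction(0), Fraction(1, 3))
B = (Fraction(1, 5), Fraction(7, 10))
product = (A[1] - A[0]) * (B[1] - B[0])
overlaps = [preimage_overlap(*A, *B, n) for n in range(11)]
```

The entries of `overlaps` start at the plain intersection $\mu (A\cap B)$ for $n=0$ and approach `product`, with an error of at most $2/2^{n}$ coming from the two partial copies at the ends of $B$.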

Similarly, for any finite or countable alphabet $\Sigma$, we can impose a discrete probability distribution on it, then consider the probability distribution on the "coin flip" space, where each "coin flip" can take results from $\Sigma$. We can either construct the singly-infinite space $\Sigma ^{\mathbb {N} }$ or the doubly-infinite space $\Sigma ^{\mathbb {Z} }$. In both cases, the shift map (one letter to the left) is strongly mixing, since it is strongly mixing on the covering family of cylinder sets. The baker's map is isomorphic to a shift map, so it is strongly mixing.
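On cylinder sets the mixing identity for the shift holds exactly once the two sets no longer constrain the same coordinates. A minimal sketch, assuming a two-letter alphabet with made-up probabilities and representing a cylinder set as a dictionary from coordinate to required symbol (both the representation and the function names are hypothetical):

```python
from fractions import Fraction

def cylinder_measure(constraints, probs):
    """Product measure of the cylinder {x : x[i] == s for each (i, s) in constraints}."""
    m = Fraction(1)
    for symbol in constraints.values():
        m *= probs[symbol]
    return m

def shifted_intersection_measure(a, b, n, probs):
    """Measure of A ∩ T^{-n}B for cylinder sets A, B and the left shift T.

    T^{-n}B constrains coordinate i + n wherever B constrains coordinate i.
    Conflicting constraints give the empty set, hence measure zero.
    """
    combined = dict(a)
    for pos, symbol in b.items():
        shifted = pos + n
        if combined.get(shifted, symbol) != symbol:
            return Fraction(0)
        combined[shifted] = symbol
    return cylinder_measure(combined, probs)

probs = {"H": Fraction(1, 3), "T": Fraction(2, 3)}  # assumed biased coin
A = {0: "H", 1: "T"}  # sequences starting with H, T
B = {0: "H"}          # sequences starting with H
values = [shifted_intersection_measure(A, B, n, probs) for n in range(5)]
```

For small $n$ the value depends on how the constraints overlap; from $n=2$ onward the constrained coordinates are disjoint and the measure equals $\mu (A)\mu (B)$ exactly.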

Topological mixing

A form of mixing may be defined without appeal to a measure, using only the topology of the system. A continuous map $f:X\to X$ is said to be topologically transitive if, for every pair of non-empty open sets $A,B\subset X$, there exists an integer $n$ such that

f^{n}(A)\cap B\neq \varnothing

where $f^{n}$ is the $n$th iterate of $f$. In operator theory, a topologically transitive bounded linear operator (a continuous linear map on a topological vector space) is usually called a hypercyclic operator. A related idea is expressed by the wandering set.

Lemma: If $X$ is a complete metric space with no isolated point, then $f$ is topologically transitive if and only if there exists a hypercyclic point $x\in X$, that is, a point $x$ such that its orbit $\{f^{n}(x):n\in \mathbb {N} \}$ is dense in $X$.

A system is said to be topologically mixing if, given non-empty open sets $A$ and $B$, there exists an integer $N$, such that, for all $n>N$, one has

f^{n}(A)\cap B\neq \varnothing .
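As a simple illustration (the doubling map on the circle is an assumed example here), let $f(x)=2x{\bmod {1}}$ on $\mathbb {R} /\mathbb {Z}$ and let $A$ be an open arc of length $\ell$. Each application of $f$ doubles the length of an arc until it wraps around, so

f^{n}(A)=\mathbb {R} /\mathbb {Z} \quad {\text{whenever}}\quad 2^{n}\ell >1.

Since every non-empty open set contains such an arc, taking $N=\lceil \log _{2}(1/\ell )\rceil$ gives $f^{n}(A)\cap B\neq \varnothing$ for all $n>N$ and every non-empty open $B$, which is the condition above.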

For a continuous-time system, $f^{n}$ is replaced by the flow $\varphi _{g}$, with $g$ being the continuous parameter, with the requirement that a non-empty intersection hold for all $\Vert g\Vert >N$.

A weak topological mixing is one that has no non-constant continuous (with respect to the topology) eigenfunctions of the shift operator.

Topological mixing neither implies, nor is implied by either weak or strong mixing: there are examples of systems that are weak mixing but not topologically mixing, and examples that are topologically mixing but not strong mixing.

Mixing in stochastic processes

Let $(X_{t})_{-\infty <t<\infty }$ be a stochastic process on a probability space $(\Omega ,{\mathcal {F}},\mathbb {P} )$. The sequence space into which the process maps can be endowed with a topology, the product topology. The open sets of this topology are called cylinder sets. These cylinder sets generate a σ-algebra, the Borel σ-algebra; this is the smallest σ-algebra that contains the topology.

Define a function $\alpha$, called the strong mixing coefficient, as

\alpha (s)=\sup \left\{|\mathbb {P} (A\cap B)-\mathbb {P} (A)\mathbb {P} (B)|:-\infty <t<\infty ,A\in X_{-\infty }^{t},B\in X_{t+s}^{\infty }\right\}

for all $-\infty <s<\infty$. The symbol $X_{a}^{b}$, with $-\infty \leq a\leq b\leq \infty$, denotes a sub-σ-algebra of the σ-algebra; it is the set of cylinder sets that are specified between times $a$ and $b$, i.e. the σ-algebra generated by $\{X_{a},X_{a+1},\ldots ,X_{b}\}$.

The process $(X_{t})_{-\infty <t<\infty }$ is said to be strongly mixing if $\alpha (s)\to 0$ as $s\to \infty$. That is to say, a strongly mixing process is such that, in a way that is uniform over all times $t$ and all events, the events before time $t$ and the events after time $t+s$ tend towards being independent as $s\to \infty$; more colloquially, the process, in a strong sense, forgets its history.

Mixing in Markov processes

Suppose $(X_{t})$ is a stationary Markov process with stationary distribution $\mathbb {Q}$, and let $L^{2}(\mathbb {Q} )$ denote the space of Borel-measurable functions that are square-integrable with respect to the measure $\mathbb {Q}$. Also let

{\mathcal {E}}_{t}\varphi (x)=\mathbb {E} [\varphi (X_{t})\mid X_{0}=x]

denote the conditional expectation operator on $L^{2}(\mathbb {Q} ).$ Finally, let

Z=\left\{\varphi \in L^{2}(\mathbb {Q} ):\int \varphi \,d\mathbb {Q} =0\right\}

denote the space of square-integrable functions with mean zero.

The ρ-mixing coefficients of the process $(X_{t})$ are

\rho _{t}=\sup _{\varphi \in Z:\,\|\varphi \|_{2}=1}\|{\mathcal {E}}_{t}\varphi \|_{2}.

The process is called ρ-mixing if these coefficients converge to zero as $t\to \infty$, and "ρ-mixing with exponential decay rate" if $\rho _{t}<e^{-\delta t}$ for some $\delta >0$. For a stationary Markov process, the coefficients $\rho _{t}$ may either decay at an exponential rate, or be always equal to one.^[3]
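For a very small chain the ρ-mixing coefficients can be computed directly from the definition. The sketch below assumes a discrete-time, two-state chain with transition matrix $P=[[1-a,a],[b,1-b]]$; this setting and the function name `rho_two_state` are illustrative assumptions, not part of the definition above. With two states the mean-zero space $Z$ is one-dimensional, so the supremum is attained at the single normalized mean-zero function.

```python
import math

def rho_two_state(a, b, t):
    """rho_t for the two-state chain P = [[1-a, a], [b, 1-b]] in discrete time.

    The stationary distribution is pi = (b, a) / (a + b). The function
    phi = (a, -b) has mean zero under pi and spans Z, so after normalizing
    it in L2(pi) the supremum reduces to the norm of E_t phi.
    """
    pi = (b / (a + b), a / (a + b))
    phi = [a, -b]
    norm = math.sqrt(pi[0] * phi[0] ** 2 + pi[1] * phi[1] ** 2)
    phi = [v / norm for v in phi]
    for _ in range(t):
        # One step of the conditional expectation operator: (E phi)(x) = sum_y P(x, y) phi(y)
        phi = [(1 - a) * phi[0] + a * phi[1], b * phi[0] + (1 - b) * phi[1]]
    return math.sqrt(pi[0] * phi[0] ** 2 + pi[1] * phi[1] ** 2)

coefficients = [rho_two_state(0.3, 0.2, t) for t in range(6)]
```

In this example the mean-zero function is an eigenvector of $P$ with eigenvalue $1-a-b$, so the coefficients decay geometrically as $|1-a-b|^{t}$, an instance of the exponential decay case.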

The α-mixing coefficients of the process $(X_{t})$ are

\alpha _{t}=\sup _{\varphi \in Z:\|\varphi \|_{\infty }=1}\|{\mathcal {E}}_{t}\varphi \|_{1}.

The process is called α-mixing if these coefficients converge to zero as $t\to \infty$; it is "α-mixing with exponential decay rate" if $\alpha _{t}<\gamma e^{-\delta t}$ for some $\delta >0$, and it is α-mixing with a sub-exponential decay rate if $\alpha _{t}<\xi (t)$ for some non-increasing function $\xi$ satisfying

{\frac {\ln \xi (t)}{t}}\to 0

as⁠ $t\to \infty$ ⁠.^[3]

The α-mixing coefficients are always smaller than the ρ-mixing ones: $\alpha _{t}\leq \rho _{t}$; therefore if the process is ρ-mixing, it will necessarily be α-mixing too. However, when $\rho _{t}=1$, the process may still be α-mixing, with sub-exponential decay rate.

The β-mixing coefficients are given by

\beta _{t}=\int \sup _{0\leq \varphi \leq 1}\left|{\mathcal {E}}_{t}\varphi (x)-\int \varphi \,d\mathbb {Q} \right|\,d\mathbb {Q} .

The process is called β-mixing if these coefficients converge to zero as $t\to \infty$; it is β-mixing with an exponential decay rate if $\beta _{t}<\gamma e^{-\delta t}$ for some $\delta >0$, and it is β-mixing with a sub-exponential decay rate if $\beta _{t}\xi (t)\to 0$ as $t\to \infty$ for some non-increasing function $\xi$ satisfying

{\frac {\ln \xi (t)}{t}}\to 0

as $t\to \infty$ .^[3]

A strictly stationary Markov process is β-mixing if and only if it is an aperiodic recurrent Harris chain. The β-mixing coefficients are always bigger than the α-mixing ones, so if a process is β-mixing it will also be α-mixing. There is no direct relationship between β-mixing and ρ-mixing: neither of them implies the other.

References

V. I. Arnold and A. Avez, Ergodic Problems of Classical Mechanics, (1968) W. A. Benjamin, Inc.
Manfred Einsiedler and Thomas Ward, Ergodic theory with a view towards number theory, (2011) Springer. ISBN 978-0-85729-020-5
Achim Klenke, Probability Theory, (2006) Springer. ISBN 978-1-84800-047-6
Chen, Xiaohong; Hansen, Lars Peter; Carrasco, Marine (2010). "Nonlinearity and temporal dependence". Journal of Econometrics. 155 (2): 155–169. CiteSeerX 10.1.1.597.8777. doi:10.1016/j.jeconom.2009.10.001. S2CID 10567129.