Stochastic chains with memory of variable length

From Wikipedia, the free encyclopedia

Stochastic chains with memory of variable length are a family of stochastic chains of finite order in a finite alphabet, such that, at each time step, only one finite suffix of the past, called the context, is necessary to predict the next symbol. These models were introduced in the information theory literature by Jorma Rissanen in 1983,[1] as a universal tool for data compression, but have more recently been used to model data in areas such as biology,[2] linguistics[3] and music.[4]

Definition


A stochastic chain with memory of variable length is a stochastic chain $(X_n)_{n\in\mathbb{Z}}$, taking values in a finite alphabet $A$, and characterized by a probabilistic context tree $(\tau, p)$, so that, at each time $n$, the conditional probability of the next symbol given the whole past depends only on the context: the shortest suffix of the past that belongs to the tree $\tau$. The family $p$ assigns a transition probability $p(\cdot \mid c)$ to each context $c \in \tau$.

History


The class of stochastic chains with memory of variable length was introduced by Jorma Rissanen in the article A universal data compression system.[1] This class was popularized in the statistics and probability community by P. Bühlmann and A. J. Wyner in 1999, in the article Variable Length Markov Chains. Named "variable length Markov chains" (VLMC) by Bühlmann and Wyner, these chains are also known as "variable-order Markov models" (VOM), "probabilistic suffix trees"[2] and "context tree models".[5] The name "stochastic chains with memory of variable length" seems to have been introduced by Galves and Löcherbach, in 2008, in the article of the same name.[6]

Examples


Interrupted light source


Consider a system consisting of a lamp, an observer, and a door between them. The lamp has two possible states: on, represented by 1, or off, represented by 0. When the lamp is on, the observer may see the light through the door, depending on the state of the door at that time: open, 1, or closed, 0. The door's states are independent of the state of the lamp.

Let $(X_n)_{n\geq 0}$ be a Markov chain representing the state of the lamp, with values in $A = \{0, 1\}$ and with transition probability matrix $p$. Also, let $(\xi_n)_{n\geq 0}$ be a sequence of independent random variables representing the door's states, also taking values in $A$, independent of the chain $(X_n)_{n\geq 0}$ and such that

$\mathbb{P}(\xi_n = 1) = 1 - \varepsilon$

where $0 < \varepsilon < 1$. Define a new sequence $(Z_n)_{n\geq 0}$ such that

$Z_n = X_n \xi_n$ for every $n \geq 0$.

To predict the next symbol, the observer only needs to determine the last instant at which the lamp could be seen on, that is, the largest $k < n$ such that $Z_k = 1$.

Using a context tree, it is possible to represent the past states of the sequence, showing which of them are relevant to identifying the next state.

The stochastic chain $(Z_n)_{n\in\mathbb{Z}}$ is then a chain with memory of variable length, taking values in $A$ and compatible with the probabilistic context tree $(\tau, p)$, where

$\tau = \{1, 10, 100, \cdots\} \cup \{0^{\infty}\}.$

Inference in chains with variable length

[edit]

Given a sample $X_1, \ldots, X_n$, one can find the appropriate context tree using the following algorithms.

The context algorithm


In the article A Universal Data Compression System,[1] Rissanen introduced a consistent algorithm to estimate the probabilistic context tree that generates the data. The algorithm can be summarized in two steps:

  1. Given the sample produced by a chain with memory of variable length, start with the maximal tree whose branches are all the candidate contexts for the sample;
  2. The branches of this tree are then pruned until the smallest tree that is well adapted to the data is obtained. The decision whether or not to shorten a context is made using a given gain function, such as the log-likelihood ratio.

Let $X_0, \ldots, X_{n-1}$ be a sample from a finite probabilistic context tree $(\tau, p)$. For any sequence $x_{-j}^{-1}$ with $j \leq n$, denote by $N_n(x_{-j}^{-1})$ the number of occurrences of the sequence in the sample, i.e.,

$N_n(x_{-j}^{-1}) = \sum_{t=0}^{n-j} \mathbf{1}\left\{X_t^{t+j-1} = x_{-j}^{-1}\right\}$
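This counting formula translates directly into code. A minimal Python sketch (representing sequences as Python lists is an assumption made for illustration):

```python
def count_occurrences(sample, w):
    """N_n(w): the number of positions t in the sample X_0, ..., X_{n-1}
    at which the word w occurs, i.e. X_t^{t+|w|-1} = w."""
    n, j = len(sample), len(w)
    return sum(1 for t in range(n - j + 1) if sample[t:t + j] == w)
```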

Rissanen first built a maximal candidate context, given by $X_{n-K(n)}^{n-1}$, where $K(n) = C \log n$ and $C$ is an arbitrary positive constant. The intuitive reason for the choice of $C \log n$ is the impossibility of estimating the probabilities of sequences of length greater than $\log n$ based on a sample of size $n$.

From there, Rissanen shortens the maximal candidate by successively pruning branches, according to a sequence of tests based on the statistical likelihood ratio. In a more formal definition, if $\sum_{b\in A} N_n(x_{-k}^{-1}b) > 0$, define the estimator of the transition probability $p$ by

$\hat{p}_n(a \mid x_{-k}^{-1}) = \frac{N_n(x_{-k}^{-1}a)}{\sum_{b\in A} N_n(x_{-k}^{-1}b)}$

where $x_{-j}^{-1}a = (x_{-j}, \ldots, x_{-1}, a)$. If $\sum_{b\in A} N_n(x_{-k}^{-1}b) = 0$, define $\hat{p}_n(a \mid x_{-k}^{-1}) = 1/|A|$.
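A sketch of this estimator in Python, including the uniform fallback $1/|A|$ for unseen contexts (the function and parameter names are hypothetical):

```python
def estimate_p(sample, context, a, alphabet):
    """Empirical transition probability p_hat_n(a | context): the fraction
    of occurrences of `context` in the sample that are followed by `a`."""
    def count(w):
        n, j = len(sample), len(w)
        return sum(1 for t in range(n - j + 1) if sample[t:t + j] == w)

    total = sum(count(context + [b]) for b in alphabet)
    if total == 0:
        return 1.0 / len(alphabet)  # convention for unseen contexts
    return count(context + [a]) / total
```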

For $i \geq 1$, define

$\Lambda_n(x_{-i}^{-1}) = 2 \sum_{y\in A} \sum_{a\in A} N_n(y x_{-i}^{-1} a) \log\left[\frac{\hat{p}_n(a \mid x_{-i}^{-1}y)}{\hat{p}_n(a \mid x_{-i}^{-1})}\right]$

where $y x_{-i}^{-1} = (y, x_{-i}, \ldots, x_{-1})$ and

$\hat{p}_n(a \mid x_{-i}^{-1}y) = \frac{N_n(y x_{-i}^{-1} a)}{\sum_{b\in A} N_n(y x_{-i}^{-1} b)}.$

Note that $\Lambda_n(x_{-i}^{-1})$ is the log-likelihood ratio for testing the consistency of the sample with the probabilistic context tree $(\tau, p)$ against the alternative that it is consistent with $(\tau', p')$, where $\tau$ and $\tau'$ differ only by a set of sibling nodes.

The length of the current estimated context is defined by

$\hat{\ell}_n(X_0^{n-1}) = \max\left\{i = 1, \ldots, K(n) : \Lambda_n(X_{n-i}^{n-1}) > C \log n\right\}$

where $C$ is any positive constant. Finally, Rissanen[1] proved the following result: given a sample $X_0, \ldots, X_{n-1}$ from a finite probabilistic context tree $(\tau, p)$, then

$P\left(\hat{\ell}_n(X_0^{n-1}) \neq \ell(X_0^{n-1})\right) \longrightarrow 0,$

as $n \rightarrow \infty$.
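Putting the pieces together, the estimate $\hat{\ell}_n$ can be sketched as below. This is a naive illustration, not an efficient implementation, and reusing the same constant $C$ in both $K(n)$ and the threshold $C \log n$ is an assumption of the sketch.

```python
import math

def estimate_context_length(sample, C=1.0, alphabet=(0, 1)):
    """Sketch of the context-length estimate above; the sample is assumed
    to be a list of symbols from `alphabet`."""
    n = len(sample)

    def count(w):
        # N_n(w): occurrences of the word w in the sample
        j = len(w)
        return sum(1 for t in range(n - j + 1) if sample[t:t + j] == w)

    def p_hat(a, ctx):
        # empirical transition probability p_hat_n(a | ctx)
        total = sum(count(ctx + [b]) for b in alphabet)
        return count(ctx + [a]) / total if total > 0 else 1.0 / len(alphabet)

    def lam(ctx):
        # Lambda_n(ctx): log-likelihood ratio for extending ctx by one symbol
        s = 0.0
        for y in alphabet:
            for a in alphabet:
                N = count([y] + ctx + [a])
                if N > 0:
                    s += 2 * N * math.log(p_hat(a, [y] + ctx) / p_hat(a, ctx))
        return s

    K = max(1, int(C * math.log(n)))
    best = 0
    for i in range(1, K + 1):
        ctx = sample[n - i:]  # candidate context X_{n-i}^{n-1}
        if lam(ctx) > C * math.log(n):
            best = i  # the loop keeps the largest passing i
    return best
```

On a perfectly alternating sample, for example, every ratio inside the logarithm equals one, so no extension is significant and the estimate is 0.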

Bayesian information criterion (BIC)


The BIC estimator of the context tree with penalty constant $c > 0$ is defined as

$\hat{\tau}_{\mathrm{BIC}} = \underset{\tau \in \mathcal{T}_n}{\arg\max}\,\{\log L_{\tau}(X_1^n) - c\,\mathrm{df}(\tau) \log n\}$
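As an illustration of the criterion, the BIC score of one candidate tree could be computed as follows. Representing a tree as a list of contexts, and taking $\mathrm{df}(\tau) = |\tau|\,(|A| - 1)$ free parameters, are assumptions of this sketch.

```python
import math

def bic_score(sample, tree, alphabet=(0, 1), c=0.5):
    """BIC score of a candidate context tree: maximized log-likelihood
    minus c * df(tau) * log n.  `tree` is a list of contexts, each a
    tuple of symbols (an assumed, simplified representation)."""
    n = len(sample)

    def find_context(t):
        # context in effect at time t: a suffix of the past that is in tree
        for ctx in tree:
            k = len(ctx)
            if k <= t and tuple(sample[t - k:t]) == ctx:
                return ctx
        return None  # no context applies; the position is skipped

    # transition counts N(ctx, a)
    counts = {}
    for t in range(1, n):
        ctx = find_context(t)
        if ctx is not None:
            counts.setdefault(ctx, {}).setdefault(sample[t], 0)
            counts[ctx][sample[t]] += 1

    # maximized log-likelihood: sum of N log(N / total) over transitions
    loglik = 0.0
    for trans in counts.values():
        total = sum(trans.values())
        for N in trans.values():
            loglik += N * math.log(N / total)

    df = len(tree) * (len(alphabet) - 1)
    return loglik - c * df * math.log(n)
```

The candidate maximizing this score over the family $\mathcal{T}_n$ is the BIC estimate.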

Smallest maximizer criterion (SMC)


The smallest maximizer criterion[3] selects the smallest tree $\tau$ of a set of champion trees $\mathcal{C}$ such that

$\lim_{n\to\infty} \frac{\log L_{\tau}(X_1^n) - \log L_{\hat{\tau}}(X_1^n)}{n} = 0$


References

  1. ^ a b c d Rissanen, J (Sep 1983). "A Universal Data Compression System". IEEE Transactions on Information Theory. 29 (5): 656–664. doi:10.1109/TIT.1983.1056741.
  2. ^ a b Bejenaro, G (2001). "Variations on probabilistic suffix trees: statistical modeling and prediction of protein families". Bioinformatics. 17 (5): 23–43. doi:10.1093/bioinformatics/17.1.23. PMID 11222260.
  3. ^ a b Galves A, Galves C, Garcia J, Garcia NL, Leonardi F (2012). "Context tree selection and linguistic rhythm retrieval from written texts". The Annals of Applied Statistics. 6 (5): 186–209. arXiv:0902.3619. doi:10.1214/11-AOAS511.
  4. ^ Dubnov S, Assayag G, Lartillot O, Bejenaro G (2003). "Using machine-learning methods for musical style modeling". Computer. 36 (10): 73–80. CiteSeerX 10.1.1.628.4614. doi:10.1109/MC.2003.1236474.
  5. ^ Galves A, Garivier A, Gassiat E (2012). "Joint estimation of intersecting context tree models". Scandinavian Journal of Statistics. 40 (2): 344–362. arXiv:1102.0673. doi:10.1111/j.1467-9469.2012.00814.x.
  6. ^ Galves A, Löcherbach E (2008). "Stochastic chains with memory of variable length". TICSP Series. 38: 117–133. arXiv:0804.2050.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Stochastic_chains_with_memory_of_variable_length&oldid=1216777104"
