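In mathematics, a stochastic matrix is a square matrix used to describe the transitions of a Markov chain. Each of its entries is a nonnegative real number representing a probability.[1][2]: 10 It is also called a probability matrix, transition matrix, substitution matrix, or Markov matrix. The stochastic matrix was first developed by Andrey Markov at the beginning of the 20th century, and has found use throughout a wide variety of scientific fields, including probability theory, statistics, mathematical finance and linear algebra, as well as computer science and population genetics. There are several different definitions and types of stochastic matrices: a right stochastic matrix is a square matrix of nonnegative real numbers with each row summing to 1; a left stochastic matrix is a square matrix of nonnegative real numbers with each column summing to 1; and a doubly stochastic matrix is a square matrix of nonnegative real numbers with each row and each column summing to 1.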
In the same vein, one may define a probability vector as a vector whose elements are nonnegative real numbers which sum to 1. Thus, each row of a right stochastic matrix (or column of a left stochastic matrix) is a probability vector. Right stochastic matrices act upon row vectors of probabilities by multiplication from the right (hence their name) and the matrix entry in the $i$-th row and $j$-th column is the probability of transition from state $i$ to state $j$. Left stochastic matrices act upon column vectors of probabilities by multiplication from the left (hence their name) and the matrix entry in the $i$-th row and $j$-th column is the probability of transition from state $j$ to state $i$.
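As a minimal sketch of these definitions, the following Python functions test whether a list is a probability vector and whether a square matrix is right or left stochastic. The function names and the example matrix are illustrative assumptions, not part of any library; exact rational arithmetic is used so that the sums can be compared with 1 without rounding issues.

```python
from fractions import Fraction

def is_probability_vector(v):
    """True if all entries are nonnegative and sum to exactly 1."""
    return all(x >= 0 for x in v) and sum(v) == 1

def is_right_stochastic(m):
    """True if m is square and every row is a probability vector."""
    n = len(m)
    return all(len(row) == n and is_probability_vector(row) for row in m)

def is_left_stochastic(m):
    """True if m is square and every column is a probability vector."""
    n = len(m)
    if any(len(row) != n for row in m):
        return False
    return all(is_probability_vector([row[j] for row in m]) for j in range(n))

# Hypothetical 2-state example: rows sum to 1, columns do not.
P = [[Fraction(1, 2), Fraction(1, 2)],
     [Fraction(1, 4), Fraction(3, 4)]]

print(is_right_stochastic(P), is_left_stochastic(P))
```

With floating-point entries, the exact comparison `sum(v) == 1` would normally be replaced by a tolerance check.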
This article uses the right/row stochastic matrix convention.

The stochastic matrix was developed alongside the Markov chain by Andrey Markov, a Russian mathematician and professor at St. Petersburg University who first published on the topic in 1906.[3] His initial intended uses were for linguistic analysis and other mathematical subjects like card shuffling, but both Markov chains and matrices rapidly found use in other fields.[3][4]
Stochastic matrices were further developed by scholars such as Andrey Kolmogorov, who expanded their possibilities by allowing for continuous-time Markov processes.[5] By the 1950s, articles using stochastic matrices had appeared in the fields of econometrics[6] and circuit theory.[7] In the 1960s, stochastic matrices appeared in an even wider variety of scientific works, from behavioral science[8] to geology[9][10] to residential planning.[11] In addition, much mathematical work was also done through these decades to improve the range of uses and functionality of the stochastic matrix and Markovian processes more generally.
From the 1970s to present, stochastic matrices have found use in almost every field that requires formal analysis, from structural science[12] to medical diagnosis[13] to personnel management.[14] In addition, stochastic matrices have found wide use in land change modeling, usually under the term Markov matrix.[15]
A stochastic matrix describes a Markov chain $X_t$ over a finite state space $S$ with cardinality $\alpha$.
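If the probability of moving from $i$ to $j$ in one time step is $\Pr(j \mid i) = P_{i,j}$, the stochastic matrix $P$ is given by using $P_{i,j}$ as the $i$-th row and $j$-th column element, e.g.,
$$P = \begin{bmatrix} P_{1,1} & P_{1,2} & \dots & P_{1,j} & \dots & P_{1,\alpha} \\ P_{2,1} & P_{2,2} & \dots & P_{2,j} & \dots & P_{2,\alpha} \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\ P_{i,1} & P_{i,2} & \dots & P_{i,j} & \dots & P_{i,\alpha} \\ \vdots & \vdots & \ddots & \vdots & \ddots & \vdots \\ P_{\alpha,1} & P_{\alpha,2} & \dots & P_{\alpha,j} & \dots & P_{\alpha,\alpha} \end{bmatrix}.$$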
Since the transition probabilities from a state $i$ to all states must sum to 1,
$$\sum_{j=1}^{\alpha} P_{i,j} = 1,$$
this matrix is a right stochastic matrix.
The above elementwise sum across each row $i$ of $P$ may be more concisely written as $P\mathbf{1} = \mathbf{1}$, where $\mathbf{1}$ is the $\alpha$-dimensional column vector of all ones. Using this, it can be seen that the product of two right stochastic matrices $P'$ and $P''$ is also right stochastic: $P'P''\mathbf{1} = P'(P''\mathbf{1}) = P'\mathbf{1} = \mathbf{1}$. In general, the $k$-th power $P^k$ of a right stochastic matrix $P$ is also right stochastic. The probability of transitioning from $i$ to $j$ in two steps is then given by the $(i,j)$-th element of the square of $P$:
$$\left(P^2\right)_{i,j} = \sum_{m=1}^{\alpha} P_{i,m} P_{m,j}.$$
In general, the probability of going from any state to another state in $k$ steps, in a finite Markov chain given by the matrix $P$, is given by the corresponding entry of $P^k$.
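The closure of right stochastic matrices under multiplication, and the reading of the square of a matrix as two-step transition probabilities, can be checked on a small example. The sketch below is illustrative only: the two 2-state matrices `P1` and `P2` and the helper `matmul` are assumptions made for this example.

```python
from fractions import Fraction as F

def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

# Two hypothetical right stochastic matrices (each row sums to 1).
P1 = [[F(1, 2), F(1, 2)],
      [F(1, 4), F(3, 4)]]
P2 = [[F(1, 3), F(2, 3)],
      [F(1), F(0)]]

# The product is again right stochastic: every row sums to 1.
product = matmul(P1, P2)
row_sums = [sum(row) for row in product]

# Two-step transition probabilities of the chain defined by P1.
two_step = matmul(P1, P1)

print(row_sums)         # each entry equals 1
print(two_step[0][1])   # probability of going from state 1 to state 2 in two steps
```

Here `two_step[0][1]` sums over both intermediate states: 1/2 · 1/2 + 1/2 · 3/4 = 5/8.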
An initial probability distribution of states, specifying where the system might be initially and with what probabilities, is given as a row vector.
A stationary probability vector $\pi$ is defined as a distribution, written as a row vector, that does not change under application of the transition matrix; that is, it is defined as a probability distribution on the set $\{1, \dots, n\}$ which is also a left eigenvector of the probability matrix, associated with eigenvalue 1:
$$\pi P = \pi.$$

It can be shown that the spectral radius of any stochastic matrix is one. By the Gershgorin circle theorem, all of the eigenvalues of a stochastic matrix have absolute values less than or equal to one. More precisely, the eigenvalues of $n$-by-$n$ stochastic matrices are restricted to lie within a subset of the complex unit disk, known as Karpelevič regions.[16] This result was originally obtained by Fridrikh Karpelevich,[17] following a question originally posed by Kolmogorov[18] and partially addressed by Nikolay Dmitriyev and Eugene Dynkin.[19]
Additionally, every right stochastic matrix has an "obvious" column eigenvector associated to the eigenvalue 1: the vector $\mathbf{1}$ used above, whose coordinates are all equal to 1. As left and right eigenvalues of a square matrix are the same, every stochastic matrix has, at least, a left eigenvector associated to the eigenvalue 1 and the largest absolute value of all its eigenvalues is also 1. Finally, the Brouwer Fixed Point Theorem (applied to the compact convex set of all probability distributions of the finite set $\{1, \dots, n\}$) implies that there is some left eigenvector which is also a stationary probability vector.
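One way to find a stationary probability vector in practice is to solve the linear system given by the left-eigenvector condition together with the normalization that the entries sum to 1. The following is a minimal sketch under the assumption that the stationary vector is unique (for example, an irreducible chain); the helpers `solve` and `stationary` and the 2-state matrix are illustrative, not a library API.

```python
from fractions import Fraction as F

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (a assumed nonsingular)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]

def stationary(p):
    """Row vector pi with pi P = pi and sum(pi) = 1 (uniqueness assumed)."""
    n = len(p)
    # Row j encodes sum_i pi_i * P[i][j] - pi_j = 0.
    a = [[p[i][j] - (1 if i == j else 0) for i in range(n)] for j in range(n)]
    b = [F(0)] * n
    # The n equations are linearly dependent, so the last one is
    # replaced by the normalization sum(pi) = 1.
    a[-1] = [F(1)] * n
    b[-1] = F(1)
    return solve(a, b)

P = [[F(1, 2), F(1, 2)],
     [F(1, 4), F(3, 4)]]
pi = stationary(P)
print(pi)  # the stationary vector (1/3, 2/3)
```

For reducible chains the system may have several solutions, and this sketch would need a different normalization or a null-space computation.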
On the other hand, the Perron–Frobenius theorem also ensures that every irreducible stochastic matrix has such a stationary vector, and that the largest absolute value of an eigenvalue is always 1. However, this theorem cannot be applied directly to arbitrary stochastic matrices because they need not be irreducible. In general, there may be several such vectors. However, for a matrix with strictly positive entries (or, more generally, for an irreducible aperiodic stochastic matrix), this vector is unique and can be computed by observing that for any $i$ we have the following limit,
$$\lim_{k \to \infty} \left(P^k\right)_{i,j} = \pi_j,$$
where $\pi_j$ is the $j$-th element of the row vector $\pi$. Among other things, this says that the long-term probability of being in a state $j$ is independent of the initial state $i$. That both of these computations give the same stationary vector is a form of an ergodic theorem, which is generally true in a wide variety of dissipative dynamical systems: the system evolves, over time, to a stationary state.
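The limit can be illustrated numerically by raising a matrix with strictly positive entries to a high power and observing that all rows approach the same vector. The sketch below uses a hypothetical 2-state matrix and plain floating-point arithmetic; the choice of 50 steps is arbitrary.

```python
def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][m] * b[m][j] for m in range(n)) for j in range(n)]
            for i in range(n)]

def matrix_power(p, k):
    """k-th power of a square matrix by repeated multiplication."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, p)
    return result

# Hypothetical strictly positive right stochastic matrix.
P = [[0.5, 0.5],
     [0.25, 0.75]]

P50 = matrix_power(P, 50)
for row in P50:
    print([round(x, 6) for x in row])
```

Both rows of `P50` are approximately (1/3, 2/3), so the long-run distribution does not depend on the starting state. For this matrix the second eigenvalue is 1/4, which is why convergence is fast.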
Intuitively, a stochastic matrix represents a Markov chain; the application of the stochastic matrix to a probability distribution redistributes the probability mass of the original distribution while preserving its total mass. If this process is applied repeatedly, the distribution converges to a stationary distribution for the Markov chain.[2]: 14–17 [20]: 116
Stochastic matrices and their product form a category, which is both a subcategory of the category of matrices and of the one of Markov kernels.
Suppose there is a timer and a row of five adjacent boxes. At time zero, a cat is in the first box, and a mouse is in the fifth box. The cat and the mouse both jump to a random adjacent box when the timer advances. For example, if the cat is in the second box and the mouse is in the fourth, the probability that the cat will be in the first box and the mouse in the fifth after the timer advances is one fourth. If the cat is in the first box and the mouse is in the fifth, the probability that the cat will be in box two and the mouse will be in box four after the timer advances is one. The cat eats the mouse if both end up in the same box, at which time the game ends. Let the random variable $K$ be the time the mouse stays in the game.
The Markov chain that represents this game contains the following five states specified by the combination of positions (cat, mouse). Note that while a naive enumeration of states would list 25 states, many are impossible either because the mouse can never have a lower index than the cat (as that would mean the mouse occupied the cat's box and survived to move past it), or because the sum of the two indices will always have even parity. In addition, the 3 possible states that lead to the mouse's death are combined into one:
State 1: (1,3)
State 2: (1,5)
State 3: (2,4)
State 4: (3,5)
State 5: game over: (2,2), (3,3) and (4,4).
We use a stochastic matrix, $P$ (below), to represent the transition probabilities of this system (rows and columns in this matrix are indexed by the possible states listed above, with the pre-transition state as the row and post-transition state as the column). For instance, starting from state 1 (the first row), it is impossible for the system to stay in this state, so $P_{11} = 0$; the system also cannot transition to state 2, because the cat would have stayed in the same box, so $P_{12} = 0$, and by a similar argument for the mouse, $P_{14} = 0$. Transitions to states 3 or 5 are allowed, and thus $P_{13}, P_{15} \neq 0$.
$$P = \begin{bmatrix} 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 1 & 0 & 0 \\ 1/4 & 1/4 & 0 & 1/4 & 1/4 \\ 0 & 0 & 1/2 & 0 & 1/2 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}.$$
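The transition probabilities can also be derived mechanically from the rules of the game. The following sketch assumes the state ordering (1,3), (1,5), (2,4), (3,5), game over, which is consistent with the transitions described above; the names `STATES`, `moves` and `transition_row` are illustrative. Each animal picks one of its adjacent boxes uniformly at random, and every outcome in which both land in the same box is merged into the absorbing "caught" state.

```python
from fractions import Fraction as F
from itertools import product

# Assumed state ordering: (cat, mouse) positions, then the absorbing state.
STATES = [(1, 3), (1, 5), (2, 4), (3, 5), "caught"]

def moves(box):
    """Boxes adjacent to `box` in a row of five."""
    return [b for b in (box - 1, box + 1) if 1 <= b <= 5]

def transition_row(state):
    """Row of transition probabilities out of `state`, in STATES order."""
    row = {s: F(0) for s in STATES}
    if state == "caught":
        row["caught"] = F(1)  # absorbing: the game has ended
        return [row[s] for s in STATES]
    cat, mouse = state
    # All equally likely joint moves of the cat and the mouse.
    options = list(product(moves(cat), moves(mouse)))
    for new_cat, new_mouse in options:
        target = "caught" if new_cat == new_mouse else (new_cat, new_mouse)
        row[target] += F(1, len(options))
    return [row[s] for s in STATES]

P = [transition_row(s) for s in STATES]
for r in P:
    print([str(x) for x in r])
```

Every row of the resulting matrix sums to 1, and the last row has all of its mass on the absorbing state.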
No matter what the initial state, the cat will eventually catch the mouse (with probability 1) and a stationary state $\pi = (0,0,0,0,1)$ is approached as a limit. To compute the long-term average or expected value of a stochastic variable $Y$, for each state $S_j$ and time $t_k$ there is a contribution of $Y_{j,k} \cdot P(S = S_j, t = t_k)$. Survival can be treated as a binary variable with $Y = 1$ for a surviving state and $Y = 0$ for the terminated state. The states with $Y = 0$ do not contribute to the long-term average.

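As state 5 is an absorbing state, the distribution of time to absorption is discrete phase-type distributed. Suppose the system starts in state 2, represented by the vector $[0,1,0,0,0]$. The states where the mouse has perished do not contribute to the survival average, so state 5 can be ignored. The initial state and transition matrix can be reduced to
$$\boldsymbol{\tau} = [0,1,0,0], \qquad T = \begin{bmatrix} 0 & 0 & 1/2 & 0 \\ 0 & 0 & 1 & 0 \\ 1/4 & 1/4 & 0 & 1/4 \\ 0 & 0 & 1/2 & 0 \end{bmatrix},$$
and
$$(I - T)^{-1}\mathbf{1} = \begin{bmatrix} 2.75 \\ 4.5 \\ 3.5 \\ 2.75 \end{bmatrix},$$
where $I$ is the identity matrix, and $\mathbf{1}$ represents a column matrix of all ones that acts as a sum over states.
Since each state is occupied for one step of time, the expected time of the mouse's survival is just the sum of the probability of occupation over all surviving states and steps in time,
$$E[K] = \boldsymbol{\tau}\left(I + T + T^2 + \cdots\right)\mathbf{1} = \boldsymbol{\tau}(I - T)^{-1}\mathbf{1} = 4.5.$$
Higher order moments are given by
$$E[K(K-1)\cdots(K-n+1)] = n!\,\boldsymbol{\tau}(I - T)^{-n} T^{n-1}\mathbf{1}.$$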
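The expected survival time and the second factorial moment can be checked with exact arithmetic. The sketch below assumes the reduced 4-state matrix $T$ over the surviving states (1,3), (1,5), (2,4), (3,5) and the start vector $\boldsymbol{\tau} = [0,1,0,0]$; the helpers `solve` and `dot` are illustrative. Rather than inverting $I - T$ explicitly, it solves linear systems with $I - T$, which gives the same products.

```python
from fractions import Fraction as F

def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination (a assumed nonsingular)."""
    n = len(a)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]

def dot(u, v):
    """Inner product of two vectors."""
    return sum(x * y for x, y in zip(u, v))

# Transitions among the surviving states only (assumed ordering as above).
T = [[F(0), F(0), F(1, 2), F(0)],
     [F(0), F(0), F(1), F(0)],
     [F(1, 4), F(1, 4), F(0), F(1, 4)],
     [F(0), F(0), F(1, 2), F(0)]]
tau = [F(0), F(1), F(0), F(0)]  # start in state 2
n = len(T)
I_minus_T = [[(1 if i == j else 0) - T[i][j] for j in range(n)]
             for i in range(n)]
ones = [F(1)] * n

# (I - T)^{-1} 1: expected survival time from each surviving state.
times = solve(I_minus_T, ones)
expected_time = dot(tau, times)

# Second factorial moment E[K(K-1)] = 2! * tau (I - T)^{-2} T 1.
T_ones = [sum(row) for row in T]
second_factorial = 2 * dot(tau, solve(I_minus_T, solve(I_minus_T, T_ones)))

# Variance from the first two factorial moments.
variance = second_factorial + expected_time - expected_time ** 2

print(times, expected_time, second_factorial, variance)
```

Starting from state 2 this gives an expected survival time of 9/2, a second factorial moment of 24, and hence a variance of 33/4.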