In this section, we study some of the deepest and most interesting parts of the theory of Markov chains, involving two different but complementary ideas: stationary distributions and limiting distributions. The theory of renewal processes plays a critical role.
As usual, our starting point is a (time homogeneous) Markov chain \( \bs{X} = (X_0, X_1, X_2, \ldots) \) with countable state space \( S \) and transition probability matrix \( P \).
Let \( y \in S \) and let \( n \in \N_+ \). We will denote the number of visits to \( y \) during the first \( n \) time units by
\[ N_{y,n} = \sum_{i=1}^n \bs{1}(X_i = y) \]Note that \( N_{y,n} \to N_y \) as \( n \to \infty \), where
\[ N_y = \sum_{i=1}^\infty \bs{1}(X_i = y) \]is the total number of visits to \( y \), one of the important random variables that we studied in the section on Transience and Recurrence. Next we denote the time of the \( n \)th visit to \( y \) by
\[ T_{y,n} = \min\{k \in \N_+: N_{y,k} = n\} \]where as usual, we define \( \min(\emptyset) = \infty \). Note that \( T_{y,1} \) is the time of the first visit to \( y \), which we denoted simply by \( T_y \) in the section on Transience and Recurrence. Recall also the definition of the hitting probability:
\[ H(x, y) = \P(T_y \lt \infty \mid X_0 = x), \quad (x, y) \in S^2 \]Suppose that \( x, \; y \in S \), \( y \) is recurrent and \( X_0 = x \). Then
Thus, \( (T_{y,n}: n \in \N_+) \) is the sequence of arrival times and \( (N_{y,n}: n \in \N_+) \) is the associated sequence of counting variables. The corresponding renewal function is
\[ G_n(x, y) = \E(N_{y,n} \mid X_0 = x) = \sum_{k=1}^n P^k(x, y), \quad n \in \N \]Note that \( G_n(x, y) \to G(x, y) \) as \( n \to \infty \) where \( G \) is the potential matrix that we studied previously. This matrix gives the expected total number visits to \( y \) starting at \( x \):
\[ G(x, y) = \E(N_y \mid X_0 = x) = \sum_{k=1}^\infty P^k(x, y), \quad (x, y) \in S^2 \]The limit theorems of renewal theory can now be used to explore the limiting behavior of the Markov chain. Let \( \mu(y) = \E(T_y \mid X_0 = y) \) denote the mean return time to state \( y \), starting in \( y \). In the following three exercises, we assume again that \( y \) is recurrent, and we interpret \( 1 / \infty = 0 \).
For \( x \in S \)
\[ \P\left( \frac{N_{n,y}}{n} \to \frac{1}{\mu(y)} \text{ as } n \to \infty \biggm| X_0 = x \right) = H(x, y) \]This result follows from the strong law of large numbers for renewal processes.
For \( x \in S \)
\[ \frac{1}{n} \sum_{k=1}^n P^k(x, y) \to \frac{H(x, y)}{\mu(y)} \text{ as } n \to \infty \]This result follows from the elementary renewal theorem for renewal processes.
Suppose in addition that \( y \) is aperiodic. For \( x \in S \)
\[ P^n(x, y) \to \frac{H(x, y)}{\mu(y)} \text{ as } n \to \infty \]This result follows from the renewal theorem for renewal processes.
Note that \( H(y, y) = 1 \) by the very definition of a recurrent state. Thus, when \( x = y \), Exercise 2 gives convergence with probability 1, and the limit in Exercise 2, Exercise 3, and Exercise 4 is \( 1 / \mu(y) \). By contrast, we already know the corresponding limiting behavior when \( y \) is transient, since the number of visit to \( y \) is finite with probability 1, and the expected number of visits to \( y \) is finite.
Suppose that \( y \in S \) is transient. For \( x \in S \),
On the other hand, the return time to \( y \) is infinite with positive probability, by the very definition of a transient state. Thus \( \mu(y) = \infty \), and thus, the results in parts (b) and (c) agree with Exercise 3 and Exercise 4, respectively. In particular, Exercise 3, which is our fundamental limiting result in many ways, holds for any states \( x, \; y \in S \).
Clearly there is a fundamental dichotomy in terms of the limiting behavior of the chain, depending on whether the mean return time to a given state is finite or infinite. Thus, state \( x \in S \) is said to be positive recurrent if \( \mu(y) \lt \infty \).
If \( x \in S \) is positive recurrent, then \( x \) is recurrent.
Recall that if \( \E(T_x \mid X_0 = x) \lt \infty \) then \( \P(T_x \lt \infty \mid X_0 = x) = 1 \).
On the other hand, it is possible to have \( \P(T_x \lt \infty \mid X_0 = x) = 1 \), so that \( x \) is recurrent, and also \( \E(T_x \mid X_0 = x) = \infty \), so that \( x \) is not positive recurrent. (A random variable can be finite with probability 1, but can have infinite mean.) In such a case, \( x \) is said to be null recurrent. Note that if \( y \) is positive recurrent, then the limits in Exercise 2, Exercise 3, and Exercise 4 are positive if \( x \to y \) and 0 otherwise. If \( y \) is null recurrent or transient, the limits in Exercise 2, Exercise 3, and Exercise 4 are 0.
Like recurrence/transience, and period, the null/positive recurrence property is a class property.
If \( x \) is positive recurrent and \( x \to y \) then \( y \) is positive recurrent.
Suppose that \( x \) is positive recurrent and \( x \to y \). Recall that \( y \) is recurrent and \( y \to x \). Hence there exist \( i, \; j \in \N_+ \) such that \( P^i(x, y) \gt 0 \) and \( P^j(y, x) \gt 0 \). Hence for every \( k \in \N_+ \), \( P^{i+j+k}(y, y) \ge P^j(y, x) P^k(x, x) P^i(x, y) \). Averaging over \( k \) from 1 to \( n \) gives
\[ \frac{G_n(y, y)}{n } - \frac{G_{i+j}(y, y)}{n} \ge P^j(y, x) \frac{G_n(x, x)}{n} P^i(x, y)\]Letting \( n \to \infty \) and using Exercise 3 gives
\[ \frac{1}{\mu(y)} \ge P^j(y, x) \frac{1}{\mu(x)} P^i(x, y) \]Therefore \( \mu(y) \lt \infty \).
Thus, the terms positive recurrent and null recurrent can be applied to equivalence classes (under the to and from equivalence relation), as well as individual states. When the chain is irreducible, the terms can be applied to the chain as a whole.
If \( A \) is a finite, closed set of states then \( A \) contains a positive recurrent state.
Fix a state \( x \in A \) and note that \( P^k(x, A) = \sum_{y \in A} P^k(x, y) = 1 \) for every \( k \in \N_+ \) since \( A \) is closed. Averaging over \( k \) from 1 to \( n \) gives
\[ \sum_{y \in A} \frac{G_n(x, y)}{n} = 1 \]for every \( n \in \N_+ \). Note that the change in the order of summation is justified since both sums are finite. Assume now that all states in \( A \) are transient or null recurrent. Letting \( n \to \infty \) in the displayed equation gives \( 0 = 1 \). Again, the interchange of sum and limit is justified by the fact that \( A \) is finite.
If \( A \) is a finite, closed set of states then \( A \) contains no null recurrent states.
Let \( x \in A \). Note that \( [x] \subseteq A \) since \( A \) is closed. Suppose that \( x \) is recurrent. Note that \( [x] \) is closed and finite and hence must have a positive recurrent state. Hence \( [x] \) is positive recurrent and hence so is \( x \).
If \( A \) is a finite, closed, and irreducible set of states then \( A \) is a positive recurrent equivalence class.
We already know that \( A \) is a recurrent equivalence class, from our study of transience and recurrence. From the previous theorem, \( A \) is positive recurrent.
In particular, a Markov chain with a finite state space cannot have null recurrent states; every state must be transient or positive recurrent.
Returning to the limiting behavior, suppose that the chain \( \bs{X} \) is irreducible, so that either all states are transient, all states are null recurrent, or all states are positive recurrent. From Exercise 3 and Exercise 4, if the chain is transient or if the chain is recurrent and aperiodic, then
\[ P^n(x, y) \to \frac{1}{\mu(y)} \text{ as } n \to \infty \text{ for every } x \in S \]Of course in the transient case and in the null recurrent, aperiodic case, the limit is 0. Only in the positive recurrent, aperiodic case is the limit positive; in this case, the chain is said to ergodic. In all of these cases, the limit is independent of the initial state \( x \). In the ergodic case, as we will see, \( X_n \) has a limiting distribution as \( n \to \infty \) that is independent of the initial distribution.
The behavior when the chain is periodic with period \( d \) is a bit more complicated, but we can understand this behavior by considering the \( d \)-step chain \( (X_0, X_d, X_{2\,d}, \ldots) \) that has transition matrix \( P^d \). Essentially, this allows us to trade periodicity (one form of complexity) for reducibility (another form of complexity). Specifically, recall that the \( d \)-step chain is aperiodic but has \( d \) equivalence classes \( (A_0, A_1, \ldots, A_{d-1}) \); and these are the cyclic classes of original chain \( \bs{X} \).
Note that every single step for the \( d \)-step chain corresponds to \( d \) steps for the original chain. Thus, show that the mean return time to state \( x \) for the \( d \)-step chain is \( \mu_d(x) = \mu(x) / d \).
For \( i, \; j, \; k \in \{0, 1, \ldots, d - 1\} \),
These results follow from the previous theorem and the cyclic behavior of the chain.
If \( y \in S \) is null recurrent (or of course transient) then regardless of the period of \( y \), \( P^n(x, y) \to 0 \) as \( n \to \infty \) for every \( x \in S \).
Our next goal is to see how the limiting behavior is related to invariant distributions. Suppose that \( f \) is a probability density function on the state space \( S \). Recall that \( f \) is invariant for \( P \) (and for the chain \( \bs{X} \)) if \( f \, P = f \). It follows immediately that \( f \, P^n = f \) for every \( n \in \N \). Thus, if \( X_0 \) has probability density function \( f \) then so does \( X_n \) for each \( n \in \N \), and hence \( \bs{X} \) is a sequence of identically distributed random variables.
Suppose that \( g: S \to [0, \infty) \) is left-invariant for \( P \), and let \( C = \sum_{x \in S} g(x) \). If \( 0 \lt C \lt \infty \) then \( f \) defined by \( f(x) = g(x) / C \) for \( x \in S \) is an invariant probability density function.
Suppose that \( g: S \to [0, \infty) \) is left-invariant for \( P \) and satisfies \( \sum_{x \in S} g(x) \lt \infty \). Then
\[ g(y) = \frac{1}{\mu(y)} \sum_{x \in S} g(x) H(x, y), \quad y \in S \]Recall again that \( g \, P^k = g \) for every \( k \in \N \) since \( g \) is left-invariant for \( P \). Averaging over \( k \) from 1 to \( n \) gives \( g \, G_ n / n = g \) for each \( n \in \N_+ \). Letting \( n \to \infty \) and using Exercise 3 gives the result. The dominated convergence theorem justifies interchange the limit with the sum, since \( \sum_{x \in S} g(x) \lt \infty \).
Note that if \( y \) is transient or null recurrent, then \( g(y) = 0 \). Thus, an invariant probability distribution must be concentrated on the positive recurrent states.
Suppose now that the chain is irreducible. If the chain is transient or null recurrent, then the only nonnegative functions that are left-invariant for \( P \) are functions that satisfy \( \sum_{x \in S} g(x) = \infty \) and the function that is identically 0: \( g = \bs{0} \). In particular, the chain does not have an invariant distribution. On the other hand, if the chain is positive recurrent, then \( H(x, y) = 1 \) for all \( x, \; y \in S \). Thus, the only possible invariant probability density function is the function \( f \) given by \( f(x) = 1 / \mu(x) \) for \( x \in S \). Any other nonnegative function \( g \) that is left-invariant for \( P \) and has finite sum, is a multiple of \( f \) (and indeed the multiple is sum of the values). Our next goal is to show that \( f \) really is an invariant probability density function.
If \( \bs{X} \) is an irreducible, positive recurrent chain then the function \( f \) given by \( f(x) = 1 / \mu(x) \) for \( x \in S \) is an invariant probability density function for \( \bs{X} \).
Let \( A \) be a finite subset of \( S \). Then \( \sum_{y \in A} \frac{1}{n} G_n(x, y) \le 1 \) for every \( x \in S \). Letting \( n \to \infty \) using Exercise 3 gives \( \sum_{y \in A} f(y) \le 1 \). The interchange of limit and sum is justified since \( A \) is finite. Thus \( C \le 1 \) where \( C = \sum_{y \in A} f(y) \) is the normalizing constant. Note also that \( C \gt 0 \) since the chain is positive recurrent. Next note that
\[ \sum_{y \in A} \frac{1}{n} G_n(x, y) P(y, z) \le \frac{1}{n} G_{n+1}(x, z) \]for every \( x, \; z \in S \). Letting \( n \to \infty \) gives \( \sum_{y \in A} f(y) P(y, z) \le f(z) \) for every \( z \in S \). It then follows that \( \sum_{y \in S} f(y) P(y, z) \le f(z) \) for every \( z \in S \). Suppose that strict inequality holds for some for some \( z \in S \). Then
\[ \sum_{z \in S} \sum_{y \in S} f(y) P(y, z) \lt \sum_{z \in S} f(z) \]Interchange the order of summation on the left in the displayed inequality yields the contradiction \( \sum_{y \in S} f(y) \lt \sum_{z \in S} f(z) \). Thus \( f \) is left-invariant for \( P \). Hence \( f / C \) is an invariant probability density function. By the uniqueness result noted earlier, it follows that \( f / C = f \) so that in fact \( C = 1 \).
In summary, an irreducible, positive recurrent Markov chain \( \bs{X} \) has a unique invariant probability density function \( f \) given by \( f(x) = 1 / \mu(x) \) for \( x \in S \). We also now have a test for positive recurrence. An irreducible Markov chain \( \bs{X} \) is positive recurrent if and only if there exists a nonnegative function \( g \) on \( S \) that is left-invariant for \( P \), not identically 0, and satisfies \( \sum_{x \in S} g(x) \lt \infty \) (and then, of course, normalizing \( g \) would give \( f \)).
Consider now a general Markov chain \( \bs{X} \) on \( S \). If \( \bs{X} \) has no positive recurrent states, then as noted earlier, there are no invariant distributions. Thus, suppose that \( \bs{X} \) has a collection of positive recurrent equivalence classes \( (A_i: i \in I) \) where \( I \) is a nonempty, countable index set. The chain restricted to \( A \) is irreducible and positive recurrent for each \( i \in I \), and hence has a unique invariant probability density function \( f_i \) on \( A_i \) given by
\[ f_i(x) = \frac{1}{\mu(x)}, \quad x \in A_i \]\( f \) is an invariant probability density function for \( \bs{X} \) if and only if \( f \) has the form \( f(x) = p_i f_i(x) \) for \( x \in A_i \) and \( i \in I \) where \( (p_i: i \in I) \) is a probability density function on the index set \( I \).
Consider again the general two-state chain on \( S = \{0, 1\} \) with transition probability matrix given below, where \( p \in (0, 1) \) and \( q \in (0, 1) \) are parameters.
\[ P = \left[ \begin{matrix} 1 - p & p \\ q & 1 - q \end{matrix} \right] \]Consider a Markov chain with state space \( S = \{a, b, c, d\} \) and transition matrix \( P \) given below:
\[ P = \left[ \begin{matrix} \frac{1}{3} & \frac{2}{3} & 0 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1 \\ \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} \end{matrix} \right] \]Consider a Markov chain with state space \( S = \{1, 2, 3, 4, 5, 6\} \) and transition matrix \( P \) given below:
\[ P = \left[ \begin{matrix} 0 & 0 & \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 \\ \frac{1}{4} & 0 & \frac{1}{2} & 0 & \frac{1}{4} & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & \frac{1}{3} & 0 & \frac{2}{3} & 0 \\ 0 & \frac{1}{4} & \frac{1}{4} & \frac{1}{4} & 0 & \frac{1}{4} \end{matrix} \right] \]Consider a Markov chain with state space \( S = \{1, 2, 3, 4, 5, 6\} \) and transition matrix \( P \) given below:
\[ P = \left[ \begin{matrix} \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 & 0 \\ \frac{1}{4} & \frac{3}{4} & 0 & 0 & 0 & 0 \\ \frac{1}{4} & 0 & \frac{1}{2} & \frac{1}{4} & 0 & 0 \\ \frac{1}{4} & 0 & \frac{1}{4} & \frac{1}{4} & 0 & \frac{1}{4} \\ 0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} \end{matrix} \right] \]Consider the Markov chain with state space \( S = \{1, 2, 3, 4, 5, 6, 7\} \) and transition matrix \( P \) given below:
\[ P = \left[ \begin{matrix} 0 & 0 & \frac{1}{2} & \frac{1}{4} & \frac{1}{4} & 0 & 0 \\ 0 & 0 & \frac{1}{3} & 0 & \frac{2}{3} & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 & \frac{1}{3} & \frac{2}{3} \\ 0 & 0 & 0 & 0 & 0 & \frac{1}{2} & \frac{1}{2} \\ 0 & 0 & 0 & 0 & 0 & \frac{3}{4} & \frac{1}{4} \\ \frac{1}{2} & \frac{1}{2} & 0 & 0 & 0 & 0 & 0 \\ \frac{1}{4} & \frac{3}{4} & 0 & 0 & 0 & 0 & 0 \end{matrix} \right] \]Read the discussion of invariant distributions and limiting distributions in the Ehrenfest chains.
Read the discussion of invariant distributions and limiting distributions in the Bernoulli-Laplace chain.
Read the discussion of positive recurrence and invariant distributions for the reliability chains.
Read the discussion of positive recurrence and limiting distributions for the birth-death chain.
Read the discussion of positive recurrence and for the queuing chains.
Read the discussion of positive recurrence and limiting distributions for the random walks on graphs.