In this section we discuss several topics that are a bit advanced, but very important. In particular the results obtained in this section will be essential for establishing
Some of the concepts from the section on Partial Orders in the chapter on Foundations are essential for this section. As usual, our starting point is a random experiment with probability space \( (S, \mathscr{S}, \P) \). Thus, \( S \) is the sample space, \( \mathscr{S} \) is the collection of events, and \( \P \) is the probability measure.
A sequence of events \((A_1, A_2, \ldots)\) is said to be increasing if \(A_n \subseteq A_{n+1}\) for each \(n\). Thus, the events are increasing with respect to the subset partial order. The terminology is also justified by considering the corresponding indicator variables.
Let \(I_n\) denote the indicator variable of the event \(A_n\) for \(n \in \N_+\). The sequence of events is increasing if and only if the sequence of indicator variables is increasing in the ordinary sense. That is, \(I_n \le I_{n+1}\) for each \(n\).
The sequence of events is increasing if and only if \( s \in A_n \) implies \( s \in A_{n+1} \) for each \( n \in \N_+ \). But this is equivalent to \( I_n(s) = 1 \) implies \( I_{n+1}(s) = 1 \) for \( n \in \N_+ \). Since the variables just take the values 0 and 1, this is equivalent to \( I_n(s) \le I_{n+1}(s) \) for \( n \in \N_+ \) and \( s \in S \).
If \((A_1, A_2, \ldots)\) is an increasing sequence of events, we refer to the union of the events as the limit of the events:
\[\lim_{n \to \infty} A_n = \bigcup_{n=1}^\infty A_n\]Once again, the terminology is clarified by the corresponding indicator variables.
Suppose that \((A_1, A_2, \ldots)\) is an increasing sequence of events. Let \(I_n\) denote the indicator variable of \(A_n\) for \(n \in \N_+\), and let \(I\) denote the indicator variable of the union of the events. Then
\[\lim_{n \to \infty} I_n = I\]If \( s \in \bigcup_{n=1}^\infty A_n\) then \( s \in A_k \) for some \( k \in \N_+ \). Since the events are increasing, \( s \in A_n \) for every \( n \ge k \). In this case, \( I_n(s) = 1 \) for every \( n \ge k \) and \( I(s) = 1 \). On the other hand, if \( s \notin \bigcup_{n=1}^\infty A_n \) then \( s \notin A_n \) for every \( n \in \N_+ \). In this case, \( I_n(s) = 0\) for every \( n \in \N_+ \) and \( I(s) = 0 \). In both cases, \( I_n(s) \to I(s) \) as \( n \to \infty \).
Generally speaking, a function is continuous if it preserves limits. Thus, the result in the following exercise is referred to as the continuity theorem for increasing events:
Suppose that \(A_1, A_2, \ldots\) is an increasing sequence of events. Then
\[\P\left( \lim_{n \to \infty} A_n \right) = \lim_{n \to \infty} \P(A_n)\]Let \(B_1 = A_1\) and let \(B_i = A_i \setminus A_{i-1}\) for \(i \in \{2, 3, \ldots\}\). Note that the collection of events \(\{B_1, B_2, \ldots \}\) is pairwise disjoint and has the same union as \(\{A_1, A_2, \ldots \}\). From the additivity axiom of probability and the definition of infinite series,
\[ \P\left(\bigcup_{i=1}^\infty A_i\right) = \P\left(\bigcup_{i=1}^\infty B_i\right) = \sum_{i = 1}^\infty \P(B_i) = \lim_{n \to \infty} \sum_{i = 1}^n \P(B_i) \]But \( \P(B_1) = \P(A_1) \) and \( \P(B_i) = \P(A_i) - \P(A_{i-1}) \) for \( i \in \{2, 3, \ldots\} \). Therefore \( \sum_{i=1}^n \P(B_i) = \P(A_n) \) and hence we have \( \P\left(\bigcup_{i=1}^\infty A_i\right) = \lim_{n \to \infty} \P(A_n) \).
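The telescoping step in this proof can be checked numerically. The following is a minimal sketch under an assumed example that is not part of the text: a coin with probability of heads \(p\) is tossed repeatedly, and \(A_n\) is the event that the first head occurs on or before toss \(n\), so that \(B_i\) is the event that the first head occurs on toss \(i\). The function names `prob_A` and `prob_B` are hypothetical.

```python
from fractions import Fraction

def prob_A(n, p):
    """P(A_n), where A_n = {first head occurs on or before toss n}."""
    return 1 - (1 - p) ** n

def prob_B(i, p):
    """P(B_i), where B_1 = A_1 and B_i = A_i minus A_{i-1} = {first head on toss i}."""
    return p * (1 - p) ** (i - 1)

p = Fraction(1, 3)  # assumed probability of heads, chosen only for illustration
for n in (1, 2, 5, 10, 50):
    partial_sum = sum(prob_B(i, p) for i in range(1, n + 1))
    # Telescoping identity from the proof, exact because Fractions are used
    assert partial_sum == prob_A(n, p)
    print(n, float(prob_A(n, p)))
```

In this example the union of the events \(A_n\) is the event that a head eventually occurs, and the printed values of \(\P(A_n)\) increase toward 1, as the continuity theorem suggests they should.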
An arbitrary union of events can always be written as a union of increasing events, as the next exercise shows.
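Suppose that \((A_1, A_2, \ldots)\) is a sequence of events. Then (a) \(\bigcup_{i=1}^n A_i\) is increasing in \(n \in \N_+\); (b) \(\lim_{n \to \infty} \bigcup_{i=1}^n A_i = \bigcup_{i=1}^\infty A_i\); (c) \(\P\left(\bigcup_{i=1}^\infty A_i\right) = \lim_{n \to \infty} \P\left(\bigcup_{i=1}^n A_i\right)\).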
Part (a) is trivial. Note that part (b) simply means that \( \bigcup_{n=1}^\infty \bigcup_{i = 1}^n A_i = \bigcup_{i=1}^\infty A_i\). Part (c) follows from parts (a) and (b) by the continuity theorem for increasing events.
The next result shows that the countable additivity axiom for a probability measure is equivalent to finite additivity and the continuity property for increasing events.
Temporarily, suppose that \( \P \) is only finitely additive, but satisfies the continuity property for increasing events in Theorem 3. Then \( \P \) is countably additive.
Suppose that \( (A_1, A_2, \ldots) \) is a sequence of pairwise disjoint events. Since we are assuming that \( \P \) is finitely additive we have
\[ \P\left(\bigcup_{i=1}^n A_i\right) = \sum_{i=1}^n \P(A_i) \]If we let \( n \to \infty \), the left side converges to \( \P\left(\bigcup_{i=1}^\infty A_i\right) \) by the continuity assumption and Theorem 4 (c), while the right side converges to \( \sum_{i=1}^\infty \P(A_i) \) by the definition of an infinite series.
There are a few mathematicians who reject the countable additivity axiom of probability measure in favor of the weaker finite additivity axiom. Whatever the philosophical arguments may be, life is certainly much harder without the continuity property for increasing events.
A sequence of events \((A_1, A_2, \ldots)\) is said to be decreasing if \(A_{n+1} \subseteq A_n\) for each \(n\). Thus, the events are decreasing with respect to the subset partial order. The terminology is also justified by considering the corresponding indicator variables.
Let \(I_n\) denote the indicator variable of event \(A_n\) for \(n \in \N_+\). The sequence of events is decreasing if and only if the sequence of indicator variables is decreasing in the ordinary sense. That is, \(I_{n+1} \le I_n\) for each \(n\).
The sequence of events is decreasing if and only if \( s \in A_{n+1} \) implies \( s \in A_n \) for each \( n \in \N_+ \). But this is equivalent to \( I_{n+1}(s) = 1 \) implies \( I_n(s) = 1 \) for \( n \in \N_+ \). Since the variables just take the values 0 and 1, this is equivalent to \( I_{n+1}(s) \le I_n(s) \) for \( n \in \N_+ \) and \( s \in S \).
If \((A_1, A_2, \ldots)\) is a decreasing sequence of events, we refer to the intersection of the events as the limit of the events:
\[\lim_{n \to \infty} A_n = \bigcap_{n=1}^\infty A_n\]Once again, the terminology is clarified by the corresponding indicator variables.
Suppose that \((A_1, A_2, \ldots)\) is a decreasing sequence of events. Let \(I_n\) denote the indicator variable of \(A_n\) for \(n \in \N_+\), and let \(I\) denote the indicator variable of the intersection of the events. Then
\[\lim_{n \to \infty} I_n = I\]If \( s \in \bigcap_{n=1}^\infty A_n \) then \( s \in A_n \) for each \( n \in \N_+ \). In this case, \( I_n(s) = 1 \) for each \( n \in \N_+ \) and \( I(s) = 1 \). If \( s \notin \bigcap_{n=1}^\infty A_n\) then \( s \notin A_k \) for some \( k \in \N_+ \). Since the events are decreasing, \( s \notin A_n \) for all \( n \ge k \). In this case, \( I_n(s) = 0 \) for \( n \ge k \) and \( I(s) = 0 \). In both cases, \( I_n(s) \to I(s) \) as \( n \to \infty \).
The result in the following exercise is referred to as the continuity theorem for decreasing events:
Suppose that \((A_1, A_2, \ldots)\) is a decreasing sequence of events. Then
\[\P\left(\lim_{n \to \infty} A_n \right) = \lim_{n \to \infty} \P(A_n)\]The sequence of complements \((A_1^c, A_2^c, \ldots)\) is increasing. Hence using the continuity theorem for increasing events, DeMorgan's law, and the complement rule we have
\[ \P\left(\bigcap_{i=1}^\infty A_i \right) = 1 - \P\left(\bigcup_{i=1}^\infty A_i^c\right) = 1 - \lim_{n \to \infty} \P(A_n^c) = \lim_{n \to \infty} [1 - \P(A_n^c)] = \lim_{n \to \infty} \P(A_n) \]An arbitrary intersection of events can always be written as an intersection of decreasing events, as the next exercise shows.
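Suppose that \((A_1, A_2, \ldots)\) is a sequence of events. Then (a) \(\bigcap_{i=1}^n A_i\) is decreasing in \(n \in \N_+\); (b) \(\lim_{n \to \infty} \bigcap_{i=1}^n A_i = \bigcap_{i=1}^\infty A_i\); (c) \(\P\left(\bigcap_{i=1}^\infty A_i\right) = \lim_{n \to \infty} \P\left(\bigcap_{i=1}^n A_i\right)\).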
Part (a) just means that \( \bigcap_{i=1}^{n+1} A_i \subseteq \bigcap_{i=1}^n A_i \). Part (b) means that \( \bigcap_{n=1}^\infty \bigcap_{i=1}^n A_i = \bigcap_{i=1}^\infty A_i \). Part (c) follows from parts (a) and (b) and the continuity theorem for decreasing events.
Suppose that \((A_1, A_2, \ldots)\) is an arbitrary sequence of events.
The sequence \(\bigcup_{i=n}^\infty A_i\) is decreasing in \(n\).
\( \bigcup_{i=n+1}^\infty A_i \subseteq \bigcup_{i=n}^\infty A_i \) for \( n \in \N_+ \).
The limit (that is, the intersection) of the decreasing sequence in the previous exercise is called the limit superior of the original sequence.
\[\limsup_{n \to \infty} A_n = \bigcap_{n=1}^\infty \bigcup_{i=n}^\infty A_i\]The event \(\limsup_{n \to \infty} A_n\) occurs if and only if \(A_n\) occurs for infinitely many values of \(n\).
From the definition, the event \( \limsup_{n \to \infty} A_n \) occurs if and only if for each \( n \in \N_+ \) there exists \( i \ge n \) such that \( A_i \) occurs.
Once again, the terminology is justified by the corresponding indicator variables:
Let \(I_n\) denote the indicator variable of \(A_n\) for \(n \in \N_+\), and let \(I\) denote the indicator variable of \(\limsup_{n \to \infty} A_n\). Then
\[I = \limsup_{n \to \infty} I_n\]By Theorem 7, \( I = \lim_{n \to \infty} \bs{1}\left(\bigcup_{i=n}^\infty A_i\right) \). But \(\bs{1}\left(\bigcup_{i=n}^\infty A_i\right) = \max\{I_i: i \ge n\}\).
This follows directly from the continuity theorem for decreasing events.
The result in the next exercise is the first Borel-Cantelli Lemma, named after Émile Borel and Francesco Cantelli. It gives a condition that is sufficient to conclude that infinitely many events occur with probability 0.
If \(\sum_{n=1}^\infty \P(A_n) \lt \infty\) then \(\P\left(\limsup_{n \to \infty} A_n\right) = 0\).
From the previous result we have \( \P\left(\limsup_{n \to \infty} A_n\right) = \lim_{n \to \infty} \P\left(\bigcup_{i = n}^\infty A_i \right) \). But from Boole's inequality, \( \P\left(\bigcup_{i = n}^\infty A_i \right) \le \sum_{i = n}^\infty \P(A_i) \). Since \( \sum_{i = 1}^\infty \P(A_i) < \infty \), we have \( \sum_{i = n}^\infty \P(A_i) \to 0 \) as \( n \to \infty \). Hence \( \P\left(\limsup_{n \to \infty} A_n\right) = 0 \).
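To see the first Borel-Cantelli lemma at work, here is a rough sketch under an assumed example: independent coins where coin \(i\) lands heads with probability \(1 / i^2\), so that \(\sum_i \P(A_i) \lt \infty\). The helper names `tail_bound` and `last_head`, the seed, and the run sizes are all hypothetical choices. The first function approximates the bound from Boole's inequality by a truncated tail sum; the second simulates the coins and records the last head seen.

```python
import random

def tail_bound(n, terms=100_000):
    """Truncated tail sum of 1/i^2 over n <= i < n + terms.

    This slightly underestimates the full tail sum used in Boole's inequality.
    """
    return sum(1.0 / (i * i) for i in range(n, n + terms))

def last_head(num_coins, rng):
    """Toss coins 1..num_coins, coin i showing heads with probability 1/i^2.

    Returns the index of the last head seen.
    """
    last = 0
    for i in range(1, num_coins + 1):
        if rng.random() < 1.0 / (i * i):
            last = i
    return last

rng = random.Random(0)  # fixed seed so that repeated runs agree
runs = [last_head(5_000, rng) for _ in range(200)]
print("tail bounds:", [round(tail_bound(n), 4) for n in (10, 100, 1000)])
print("fraction of runs with a head after coin 100:",
      sum(r > 100 for r in runs) / len(runs))
```

The tail bounds shrink toward 0 as \(n\) grows, and in the simulation heads after coin 100 should be rare, consistent with only finitely many heads occurring with probability 1.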
In this section we suppose that \((A_1, A_2, \ldots)\) is an arbitrary sequence of events.
The sequence \(\bigcap_{i=n}^\infty A_i\) is increasing in \(n\).
\( \bigcap_{i=n}^\infty A_i \subseteq \bigcap_{i=n+1}^\infty A_i \) for \( n \in \N_+ \).
The limit (that is, the union) of the increasing sequence in the previous exercise is called the limit inferior of the original sequence.
\[\liminf_{n \to \infty} A_n = \bigcup_{n=1}^\infty \bigcap_{i=n}^\infty A_i\]The event \(\liminf_{n \to \infty} A_n\) occurs if and only if \(A_n\) occurs for all but finitely many values of \(n\).
From the definition, \( \liminf_{n \to \infty} A_n \) occurs if and only if there exists \( n \in \N_+ \) such that \( A_i \) occurs for every \( i \ge n \).
Once again, the terminology is justified by the corresponding indicator variables:
Let \(I_n\) denote the indicator variable of \(A_n\) for \(n \in \N_+\), and let \(I\) denote the indicator variable of \(\liminf_{n \to \infty} A_n\). Then
\[I = \liminf_{n \to \infty} I_n\]From Theorem 2, \( I = \lim_{n \to \infty} \bs{1}\left(\bigcap_{i=n}^\infty A_i\right) \). But \( \bs{1}\left(\bigcap_{i=n}^\infty A_i\right) = \min\{I_i: i \ge n\} \).
This follows directly from the continuity theorem for increasing events.
\(\liminf_{n \to \infty} A_n \subseteq \limsup_{n \to \infty} A_n\).
If \( A_n \) occurs for all but finitely many \( n \in \N_+ \) then certainly \( A_n \) occurs for infinitely many \( n \in \N_+ \).
\(\left( \limsup_{n \to \infty} A_n \right)^c = \liminf_{n \to \infty} A_n^c\) and \(\left( \liminf_{n \to \infty} A_n \right)^c = \limsup_{n \to \infty} A_n^c\).
These results follow from DeMorgan's laws.
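The definitions of the limit superior and limit inferior, and the complement identities above, can be illustrated on a small finite sample space. The sketch below assumes a sequence of sets that is eventually periodic (a finite prefix followed by a repeating cycle), because then every tail union and tail intersection can be computed exactly from finitely many terms. The function name `limsup_liminf` and the particular sets are hypothetical.

```python
def limsup_liminf(prefix, cycle):
    """limsup and liminf of the sequence prefix, cycle, cycle, ... of finite sets.

    The cycle must be nonempty. Two copies of the cycle suffice: every tail
    starting at index n <= len(prefix) + len(cycle) then contains a full cycle,
    and all later tails repeat these values.
    """
    seq = list(prefix) + list(cycle) * 2
    starts = range(len(prefix) + len(cycle) + 1)
    tail_unions = [set().union(*seq[n:]) for n in starts]
    tail_inters = [set.intersection(*seq[n:]) for n in starts]
    limsup = set.intersection(*tail_unions)  # intersection of the tail unions
    liminf = set().union(*tail_inters)       # union of the tail intersections
    return limsup, liminf

S = {1, 2, 3, 4}                 # assumed sample space
prefix = [{4}]                   # A_1
cycle = [{1, 2}, {2, 3}]         # A_2, A_3, then repeated forever

sup_A, inf_A = limsup_liminf(prefix, cycle)
sup_Ac, inf_Ac = limsup_liminf([S - A for A in prefix], [S - A for A in cycle])

print(sup_A, inf_A)              # {1, 2, 3} {2}
assert inf_A <= sup_A            # liminf is a subset of limsup
assert S - sup_A == inf_Ac       # complement of limsup is liminf of complements
assert S - inf_A == sup_Ac       # complement of liminf is limsup of complements
```

Here outcome 2 is in every \(A_n\) from some point on, outcomes 1 and 3 are in infinitely many but not all but finitely many of the sets, and outcome 4 is in only one set, which matches the verbal descriptions of the two limits.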
The result in the next exercise is the second Borel-Cantelli Lemma. It gives a condition that is sufficient to conclude that infinitely many independent events occur with probability 1.
Suppose that \((A_1, A_2, \ldots)\) is a sequence of independent events. If \(\sum_{n=1}^\infty \P(A_n) = \infty\) then \(\P\left( \limsup_{n \to \infty} A_n \right) = 1\).
Note first that \(1 - x \le e^{-x}\) for every \(x \in \R\), and hence \( 1 - \P(A_i) \le \exp[-\P(A_i)] \) for each \( i \in \N_+ \). From Theorems 18 and 20,
\[ \P[(\limsup_{n \to \infty} A_n)^c] = \P(\liminf_{n \to \infty} A_n^c) = \lim_{n \to \infty} \P \left(\bigcap_{i = n}^\infty A_i^c\right) \]But by independence and the inequality above,
\[ \P\left(\bigcap_{i = n}^\infty A_i^c\right) = \prod_{i = n}^\infty \P(A_i^c) = \prod_{i = n}^\infty [1 - \P(A_i)] \le \prod_{i = n}^\infty \exp[-\P(A_i)] = \exp\left(-\sum_{i = n}^\infty \P(A_i) \right) = 0 \]Suppose that \(A\) is an event in a basic experiment with \(\P(A) \gt 0\). In the compound experiment that consists of independent replications of the basic experiment, the event "\(A\) occurs infinitely often" has probability 1.
Let \( p \) denote the probability of \( A \) in the basic experiment. In the compound experiment, we have a sequence of independent events \( (A_1, A_2, \ldots) \) with \( \P(A_n) = p \) for each \( n \in \N_+ \) (these are independent copies of \( A \)). But \( \sum_{n=1}^\infty \P(A_n) = \infty \) since \( p \gt 0 \), so the result follows from the second Borel-Cantelli lemma.
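The product that appears in the proof of the second Borel-Cantelli lemma can be computed exactly for simple choices of the probabilities. The following sketch uses two assumed families that are not prescribed by the text, \(\P(A_i) = 1 / i\) (divergent sum) and \(\P(A_i) = 1 / i^2\) (convergent sum), and compares the probability that none of the independent events \(A_n, \ldots, A_m\) occurs. The name `prob_none_occur` is hypothetical.

```python
from fractions import Fraction

def prob_none_occur(p, n, m):
    """P(no A_i occurs for n <= i <= m), for independent events with P(A_i) = p(i)."""
    result = Fraction(1)
    for i in range(n, m + 1):
        result *= 1 - p(i)  # multiply by P(A_i^c), using independence
    return result

def divergent(i):
    return Fraction(1, i)        # sum of 1/i is infinite

def convergent(i):
    return Fraction(1, i * i)    # sum of 1/i^2 is finite

for m in (10, 100, 1000):
    print(m,
          float(prob_none_occur(divergent, 2, m)),
          float(prob_none_occur(convergent, 2, m)))
```

With the divergent family the product from \(n = 2\) to \(m\) equals \(1 / m\), which tends to 0, as in the proof. With the convergent family it equals \((m + 1) / (2 m)\), which tends to \(1 / 2\), so the independence argument alone gives no conclusion there; that case is covered by the first lemma instead.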
Suppose that we have an infinite sequence of coins labeled \(1, 2, \ldots\) Moreover, coin \(n\) has probability of heads \(1 / n^a\) for each \(n \in \N_+\), where \(a \gt 0\) is a parameter. We toss each coin in sequence one time. In terms of \(a\), find the probability of the following events:
Suppose that \((X_1, X_2, \ldots)\) and \(X\) are real-valued random variables for an experiment. We will discuss two ways that the sequence \(X_n\) can converge to \(X\) as \(n \to \infty\). These are fundamentally important concepts, since some of the deepest results in probability theory are limit theorems.
First, we say that \(X_n \to X\) as \(n \to \infty\) with probability 1 if
\[\P(X_n \to X \text{ as } n \to \infty) = 1\]The statement that an event has probability 1 is the strongest statement that we can make in probability theory. Thus, convergence with probability 1 is the strongest form of convergence. The phrases almost surely and almost everywhere are sometimes used instead of the phrase with probability 1.
Next we say that \(X_n \to X\) as \(n \to \infty\) in probability if
\[\P(|X_n - X| \gt \epsilon) \to 0 \text{ as } n \to \infty \text{ for each } \epsilon \gt 0\]The phrase in probability sounds superficially like the phrase with probability 1. However, as we will see, convergence in probability is much weaker than convergence with probability 1. Indeed, convergence with probability 1 is often called strong convergence, while convergence in probability is often called weak convergence. The next sequence of exercises explores convergence with probability 1. We will let \(\Q_+\) denote the set of positive rational numbers; a critical point to remember is that this set is countable.
The following events are equivalent:
The equivalence of (a) and (b) is simply the definition. The equivalence of (b) and (c) follows because there are arbitrarily small positive rational numbers. Note that if the event \( \{|X_n - X| \gt \epsilon \text{ for infinitely many } n \in \N_+\} \) occurs for a given \( \epsilon \gt 0 \), then it occurs for all smaller \( \epsilon \gt 0 \).
The following are equivalent:
From part (c) of Theorem 24, \( \P(X_n \to X \text{ as } n \to \infty) = 1 \) if and only if
\[ \P\left(\bigcup_{\epsilon \in \Q_+} \{|X_n - X| \gt \epsilon \text{ for infinitely many } n \in \N_+\} \right) = 0 \]But by Boole's inequality, a countable union of events has probability 0 if and only if every event in the union has probability 0. Thus, (a) is equivalent to (b). Statement (b) is clearly equivalent to (c) since there are arbitrarily small positive rational numbers. Finally, (c) is equivalent to (d) by Theorem 13.
Our next result gives a nice criterion for convergence with probability 1:
If \(\sum_{n=1}^\infty \P(|X_n - X| \gt \epsilon) \lt \infty\) for every \(\epsilon \gt 0\) then \(X_n \to X\) as \(n \to \infty\) with probability 1.
By the first Borel-Cantelli Lemma, if \(\sum_{n=1}^\infty \P(|X_n - X| \gt \epsilon) \lt \infty\) then \(\P(|X_n - X| \gt \epsilon \text{ for infinitely many } n \in \N_+) = 0\). Hence the result follows from the previous theorem.
We can now obtain one of our main results: convergence with probability 1 implies convergence in probability.
If \(X_n \to X\) as \(n \to \infty\) with probability 1 then \(X_n \to X\) as \(n \to \infty\) in probability.
Let \( \epsilon \gt 0 \). Then \( \P(|X_n - X| \gt \epsilon) \le \P(|X_k - X| \gt \epsilon \text{ for some } k \ge n)\). But if \( X_n \to X \) as \( n \to \infty \) with probability 1, then the expression on the right converges to 0 as \( n \to \infty \) by Theorem 25 (d). Hence \( X_n \to X \) as \( n \to \infty \) in probability.
The converse fails with a passion, as the next exercise shows.
As in Exercise 23, suppose that we have a sequence of coins labeled \(1, 2, \ldots\), and that coin \(n\) lands heads up with probability \(\frac{1}{n}\) for each \(n\). We toss the coins in order to produce a sequence \((X_1, X_2, \ldots)\) of independent indicator random variables with
\[\P(X_n = 1) = \frac{1}{n}, \; \P(X_n = 0) = 1 - \frac{1}{n}; \quad n \in \N_+\]Parts (a) and (b) follow from the second Borel-Cantelli lemma, since \( \sum_{n = 1}^\infty \P(X_n = 0) = \infty \) and \( \sum_{n = 1}^\infty \P(X_n = 1) = \infty \). Part (c) follows from parts (a) and (b). For part (d), suppose \( 0 \lt \epsilon \lt 1 \). Then \( \P(|X_n - 0| \gt \epsilon) = \P(X_n = 1) = \frac{1}{n} \to 0 \) as \( n \to \infty \).
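The contrast in this example can be explored with a short simulation. This is only a sketch: the seed, the number of coins, and the function names `head_positions` and `prob_head_between` are assumptions made for illustration. The first function simulates the coins and lists the positions of the heads; the second computes exactly the probability that at least one head occurs among coins \(n\) through \(m\), using independence.

```python
import random
from fractions import Fraction

def prob_head_between(n, m):
    """Exact P(X_i = 1 for some n <= i <= m) for independent coins with P(X_i = 1) = 1/i."""
    none = Fraction(1)
    for i in range(n, m + 1):
        none *= 1 - Fraction(1, i)  # multiply by P(X_i = 0)
    return 1 - none

def head_positions(num_coins, rng):
    """Simulate coins 1..num_coins and return the indices n with X_n = 1."""
    return [n for n in range(1, num_coins + 1) if rng.random() < 1.0 / n]

rng = random.Random(2024)  # fixed seed so that repeated runs agree
heads = head_positions(100_000, rng)
print("number of heads among the first 100000 coins:", len(heads))
print("last few head positions:", heads[-3:])
print("P(X_n = 1) at n = 1000:", 1 / 1000)
print("P(some head among coins 1000..100000):",
      float(prob_head_between(1000, 100_000)))
```

The probability of a head on any single late coin is small, which is convergence in probability to 0, while the probability of at least one head somewhere in a long stretch of late coins is close to 1, which is why the sequence does not converge with probability 1.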
However, there is a partial converse to Exercise 27 that is very useful.
If \(X_n \to X\) as \(n \to \infty\) in probability, then there exists a subsequence \((n_1, n_2, n_3 \ldots)\) of \(\N_+\) such that \(X_{n_k} \to X\) as \(k \to \infty\) with probability 1.
Suppose that \( X_n \to X \) as \( n \to \infty \) in probability. Then for each \(k \in \N_+\) there exists \(n_k \in \N_+\) such that \(\P\left( \left| X_{n_k} - X \right| \gt 1 / k \right) \lt 1 / k^2\). We can make the choices so that \(n_k \lt n_{k+1}\) for each \(k\). It follows that \(\sum_{k=1}^\infty \P\left(\left|X_{n_k} - X\right| \gt \epsilon \right) \lt \infty\) for every \(\epsilon \gt 0\). By Exercise 26, \(X_{n_k} \to X\) as \(k \to \infty\) with probability 1.
Note that the proof works because \(1 / k \to 0\) as \(k \to \infty\) and \(\sum_{k=1}^\infty 1 / k^2 \lt \infty\). Any two sequences with these properties would work just as well.
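The choice of the subsequence in the proof can be carried out mechanically when the probabilities \(\P(|X_n - X| \gt \epsilon)\) are known. Below is a minimal sketch that assumes the coin sequence with \(\P(X_n = 1) = 1 / n\) and limit \(X = 0\); the names `coin_tail_prob` and `choose_subsequence` are hypothetical. The search loop terminates only because the sequence is assumed to converge in probability.

```python
from fractions import Fraction

def coin_tail_prob(n, eps):
    """P(|X_n - 0| > eps) for the indicator variable X_n with P(X_n = 1) = 1/n."""
    return Fraction(1, n) if eps < 1 else Fraction(0)

def choose_subsequence(tail_prob, num_terms):
    """Pick n_1 < n_2 < ... with P(|X_{n_k} - X| > 1/k) < 1/k^2, as in the proof.

    For each k this takes the smallest admissible index beyond n_{k-1}.
    """
    indices = []
    n = 0
    for k in range(1, num_terms + 1):
        n += 1  # forces n_k > n_{k-1}
        while tail_prob(n, Fraction(1, k)) >= Fraction(1, k * k):
            n += 1
        indices.append(n)
    return indices

print(choose_subsequence(coin_tail_prob, 5))  # [1, 5, 10, 17, 26]
```

For \(k \ge 2\) the selected index is \(n_k = k^2 + 1\), so \(\sum_k \P(X_{n_k} = 1) \lt \infty\) and the subsequence converges to 0 with probability 1, even though the full sequence does not.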
There are two other modes of convergence that we will discuss later: