\( \newcommand{\P}{\mathbb{P}} \) \( \newcommand{\R}{\mathbb{R}} \) \( \newcommand{\N}{\mathbb{N}} \) \( \newcommand{\Z}{\mathbb{Z}} \) \( \newcommand{\bs}{\boldsymbol} \)

7. Measure Theory

In this section we discuss probability spaces, and general measure spaces, from a more advanced point of view. The sections on Measure Theory and Special Set Structures in the chapter on Foundations are essential prerequisites.

Positive Measure

Suppose that \( S \) is a set, playing the role of a universal set for a mathematical theory. As we noted before, \( S \) usually comes with a \( \sigma \)-algebra \( \mathscr{S} \) of admissible subsets of \( S \), so that \( (S, \mathscr{S}) \) is a measurable space. In particular, this is the case for the model of a random experiment, where \( S \) is the sample space: the collection of events \( \mathscr{S} \) is required to be a \( \sigma \)-algebra. A probability measure is a special case of a more general object known as a positive measure. Formally a positive measure on \((S, \mathscr{S})\) is a function \(\mu: \mathscr{S} \to [0, \infty] \) that satisfies the following axioms:

  1. \( \mu(\emptyset) = 0 \)
  2. If \(\{A_i: i \in I\}\) is a countable, pairwise disjoint collection of sets in \(\mathscr{S}\) then \[\mu\left(\bigcup_{i \in I} A_i\right) = \sum_{i \in I} \mu(A_i)\]

Axiom 2 is called countable additivity, and is the essential property. The measure of a set that consists of a countable union of disjoint pieces is the sum of the measures of the pieces. The three objects together, \( (S, \mathscr{S}, \mu) \), form a measure space.

In particular, a probability measure \(\P\) on \((S, \mathscr{S})\) is a positive measure on \((S, \mathscr{S})\) with the additional requirement that \(\P(S) = 1\). However, positive measures are important beyond the application to probability. The standard measures on the Euclidean spaces are all positive measures: the extension of length for measurable subsets of \( \R \), the extension of area for measurable subsets of \( \R^2 \), the extension of volume for measurable subsets of \( \R^3 \), and the higher dimensional analogues. We will actually construct these measures in the next section on Existence and Uniqueness. In addition, counting measure \( \# \) is a positive measure on the subsets of a set \( S \).
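As a concrete illustration, here is a minimal Python sketch of a positive measure on the subsets of a finite set, obtained by assigning a nonnegative point mass to each element. The set, the weights, and the helper name `make_measure` are hypothetical choices made only for this example. With all weights equal to 1 the construction gives counting measure.

```python
from fractions import Fraction

def make_measure(weights):
    """Return mu with mu(A) = sum of weights[x] for x in A (weights are nonnegative)."""
    def mu(subset):
        return sum((weights[x] for x in subset), Fraction(0))
    return mu

# Hypothetical example: S = {a, b, c, d} with arbitrary nonnegative point masses.
weights = {"a": Fraction(1, 2), "b": Fraction(3), "c": Fraction(0), "d": Fraction(5, 2)}
mu = make_measure(weights)
counting = make_measure({x: Fraction(1) for x in weights})  # counting measure

A, B = {"a", "b"}, {"d"}
empty_ok = mu(set()) == 0                    # axiom 1
additive_ok = mu(A | B) == mu(A) + mu(B)     # axiom 2 for the disjoint sets A and B
total = mu(set(weights))                     # mu(S) = 6: finite, but not a probability measure
```

Since \( \mu(S) = 6 \) here, this measure is finite but is not a probability measure.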

Constructions

There are several simple ways to construct new positive measures from existing ones. As usual, we start with a measurable space \( (S, \mathscr{S}) \).

Suppose that \( (R, \mathscr{R}) \) is a measurable subspace of \( (S, \mathscr{S}) \), in the sense that \( \mathscr{R} \) is a \( \sigma \)-algebra of subsets of \( R \) and \( \mathscr{R} \subseteq \mathscr{S} \) (and hence in particular \( R \in \mathscr{S} \)). If \( \mu \) is a positive measure on \( (S, \mathscr{S}) \) then \( \mu \) restricted to \( \mathscr{R} \) is a positive measure on \( (R, \mathscr{R}) \).

In particular, the previous theorem would apply when \( R = S \) so that \( \mathscr{R} \) is a sub \( \sigma \)-algebra of \( \mathscr{S} \).

If \( \mu \) is a positive measure on \( (S, \mathscr{S}) \) and \( c \gt 0 \), then \( c \, \mu \) is also a positive measure on \( (S, \mathscr{S}) \).

If \( \mu_i \) is a positive measure on \( (S, \mathscr{S}) \) for each \( i \) in a countable index set \( I \) then \( \sum_{i \in I} \mu_i \) is also a positive measure on \( (S, \mathscr{S}) \).

Combining the last two theorems, note that a positive linear combination of positive measures is a positive measure. The most general way to construct new measures from old ones is via the theory of Lebesgue integration. The construction of positive measures more or less from scratch is considered in the next section on Existence and Uniqueness.

Properties

The following exercises give some simple properties of a positive measure \( \mu \) on \( (S, \mathscr{S}) \). The proofs are essentially identical to the proofs of the corresponding properties of probability, except that the measure of a set may be infinite so we must be careful not to use the meaningless expression \( \infty - \infty \).

If \( A, \; B \in \mathscr{S} \), then \( \mu(B) = \mu(A \cap B) + \mu(B \setminus A) \).

Proof:

Note that \( B = (A \cap B) \cup (B \setminus A) \), and the sets in the union are disjoint.

If \( A, \; B \in \mathscr{S} \) and \( A \subseteq B \) then

  1. \( \mu(B) = \mu(A) + \mu(B \setminus A) \)
  2. \( \mu(A) \le \mu(B) \)
Proof:

Part (a) follows from the previous theorem, since \( A \cap B = A \). Part (b) follows from part (a).

Thus \( \mu \) is increasing, relative to the subset partial order \( \subseteq \) on \( \mathscr{S} \) and the ordinary order \( \le \) on \( [0, \infty] \). Note also that if \( A, \; B \in \mathscr{S} \) and \( \mu(B) \lt \infty \) then \( \mu(B \setminus A) = \mu(B) - \mu(A \cap B) \). In the special case that \( A \subseteq B \), this becomes \( \mu(B \setminus A) = \mu(B) - \mu(A) \). These properties are just like the difference rules for probability. If \( \mu(S) \lt \infty \) then \( \mu(A^c) = \mu(S) - \mu(A) \). This is the analogue of the complement rule in probability, but with \( \mu(S) \) replacing 1.

The following result is the analogue of Boole's inequality for probability. For a general positive measure \( \mu \), the result is referred to as the subadditive property.

Suppose that \( A_i \in \mathscr{S} \) for \( i \) in a countable index set \( I \). Then

\[ \mu\left(\bigcup_{i \in I} A_i \right) \le \sum_{i \in I} \mu(A_i) \]
Proof:

The proof is exactly like the one for Boole's inequality. Assume that \( I = \N_+ \). Let \( B_1 = A_1 \) and \( B_i = A_i \setminus (A_1 \cup \ldots \cup A_{i-1}) \) for \( i \in \{2, 3, \ldots\} \). Then \( \{B_i: i \in I\} \) is a disjoint collection of sets in \( \mathscr{S} \) with the same union as \( \{A_i: i \in I\} \). Also \( B_i \subseteq A_i \) for each \( i \) so \( \mu(B_i) \le \mu(A_i) \). Hence

\[ \mu\left(\bigcup_{i \in I} A_i \right) = \mu\left(\bigcup_{i \in I} B_i \right) = \sum_{i \in I} \mu(B_i) \le \sum_{i \in I} \mu(A_i) \]

For a union of sets with finite measure, the inclusion-exclusion formula holds, and the proof is just like the one for probability.

Suppose that \(A_i \in \mathscr{S}\) for each \(i \in I\) where \(\#(I) = n\), and that \( \mu(A_i) \lt \infty \) for \( i \in I \). Then

\[\mu \left( \bigcup_{i \in I} A_i \right) = \sum_{k = 1}^n (-1)^{k - 1} \sum_{J \subseteq I, \; \#(J) = k} \mu \left( \bigcap_{j \in J} A_j \right)\]
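The formula is easy to check numerically for counting measure on finite sets. The following sketch, with three arbitrarily chosen sets and a hypothetical helper `inclusion_exclusion`, evaluates the right side term by term and compares it with the measure of the union.

```python
from itertools import combinations

def inclusion_exclusion(sets, mu=len):
    """Right side of the inclusion-exclusion formula for a finite list of finite sets."""
    n = len(sets)
    total = 0
    for k in range(1, n + 1):
        sign = (-1) ** (k - 1)
        # Sum the measures of all k-fold intersections.
        for J in combinations(sets, k):
            total += sign * mu(set.intersection(*J))
    return total

A1, A2, A3 = {1, 2, 3, 4}, {3, 4, 5}, {4, 5, 6, 7}
lhs = len(A1 | A2 | A3)                  # counting measure of the union
rhs = inclusion_exclusion([A1, A2, A3])  # 11 - 5 + 1
```

Here the single sets contribute \( 4 + 3 + 4 = 11 \), the pairwise intersections contribute \( 2 + 1 + 2 = 5 \), and the triple intersection contributes 1, so both sides equal 7.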

The continuity theorem for increasing sets holds for a positive measure. The continuity theorem for decreasing events holds also, if the sets have finite measure. Again, the proofs are similar to the ones for a probability measure, except for considerations of infinite measure.

Suppose that \( (A_1, A_2, \ldots) \) is a sequence of sets in \( \mathscr{S} \).

  1. If \( A_1 \subseteq A_2 \subseteq \cdots\) then \( \mu\left(\bigcup_{i=1}^\infty A_i \right) = \lim_{n \to \infty} \mu(A_n) \).
  2. If \( A_1 \supseteq A_2 \supseteq \cdots\) and \( \mu(A_1) \lt \infty \) then \( \mu\left(\bigcap_{i=1}^\infty A_i \right) = \lim_{n \to \infty} \mu(A_n) \).
Proof:

For part (a), note that if \( \mu(A_k) = \infty \) for some \( k \) then \( \mu(A_n) = \infty \) for \( n \ge k \) and \( \mu\left(\bigcup_{i=1}^\infty A_i \right) = \infty \). Thus, suppose that \( \mu(A_i) \lt \infty \) for each \( i \). Let \( B_1 = A_1 \) and \( B_i = A_i \setminus A_{i-1} \) for \( i \in \{2, 3, \ldots\} \). Then \( (B_1, B_2, \ldots) \) is a disjoint sequence with the same union as \( (A_1, A_2, \ldots) \). Also, \( \mu(B_1) = \mu(A_1) \) and \( \mu(B_i) = \mu(A_i) - \mu(A_{i-1}) \) for \( i \in \{2, 3, \ldots\} \). Hence

\[ \mu\left(\bigcup_{i=1}^\infty A_i \right) = \mu \left(\bigcup_{i=1}^\infty B_i \right) = \sum_{i=1}^\infty \mu(B_i) = \lim_{n \to \infty} \sum_{i=1}^n \mu(B_i) \]

But \( \sum_{i=1}^n \mu(B_i) = \mu(A_1) + \sum_{i=2}^n [\mu(A_i) - \mu(A_{i-1})] = \mu(A_n) \). For part (b), note that \( A_1 \setminus A_n \) is increasing in \( n \). Hence using the continuity result for increasing sets,

\[ \begin{align} \mu \left(\bigcap_{i=1}^\infty A_i \right) & = \mu\left[A_1 \setminus \bigcup_{i=1}^\infty (A_1 \setminus A_i) \right] = \mu(A_1) - \mu\left[\bigcup_{i=1}^\infty (A_1 \setminus A_i)\right]\\ & = \mu(A_1) - \lim_{n \to \infty} \mu(A_1 \setminus A_n) = \mu(A_1) - \lim_{n \to \infty} [\mu(A_1) - \mu(A_n)] = \lim_{n \to \infty} \mu(A_n) \end{align} \]

The continuity theorem for decreasing events fails without the additional assumption of finite measure. For example, consider \( \Z \) with counting measure \( \# \). Let \( A_n = \{ z \in \Z: z \le -n\} \) for \( n \in \N_+ \). Then \( \#(A_n) = \infty \) for each \( n \) but \( \# \left(\bigcap_{i=1}^\infty A_i\right) = \#(\emptyset) = 0 \).

A nontrivial finite positive measure \( \mu \), with \( 0 \lt \mu(S) \lt \infty \), is practically just like a probability measure, and in fact can be re-scaled into a probability measure \( \P \), as was done in the section on Probability Measures:

\[ \P(A) = \frac{\mu(A)}{\mu(S)}, \quad A \in \mathscr{S} \]

If a positive measure \( \mu \) is not finite, then the next best thing is for it to be \( \sigma \)-finite. This means that there exists a sequence \( (A_1, A_2, \ldots) \) of sets in \( \mathscr{S} \) with \( \bigcup_{i=1}^\infty A_i = S \) and \( \mu(A_i) \lt \infty \) for each \( i \in \N_+ \). Restricted to \( A_i \), \( \mu \) is a finite measure, and hence certain nice properties of finite measures can be extended to \( \sigma \)-finite measures.

If \( \mu \) is a \( \sigma \)-finite measure then there exists an increasing sequence satisfying the definition and there exists a disjoint sequence satisfying the definition.

Proof:

We use the same tricks that we have used before. Let \( A_n \in \mathscr{S}, \; n \in \N_+ \) be a sequence that satisfies the \( \sigma \)-finite definition. Let \( B_n = \bigcup_{i = 1}^n A_i \). Then \( B_n \in \mathscr{S} \) for \( n \in \N_+ \) and this sequence is increasing. Moreover, \( \mu(B_n) \le \sum_{i=1}^n \mu(A_i) \lt \infty \) for \( n \in \N_+ \) and \( \bigcup_{n=1}^\infty B_n = \bigcup_{n=1}^\infty A_n = S \). Next, let \( C_1 = A_1 \) and let \( C_n = A_n \setminus \bigcup_{i=1}^{n-1} A_i \) for \( n \in \{2, 3, \ldots\} \). Then \( C_n \in \mathscr{S} \) for each \( n \in \N_+ \) and this sequence is disjoint. Moreover, \( C_n \subseteq A_n \) so \( \mu(C_n) \le \mu(A_n) \lt \infty \) and \( \bigcup_{n=1}^\infty C_n = \bigcup_{n=1}^\infty A_n = S \).
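The two constructions in the proof can be sketched in Python for a finite initial segment of the covering sequence. The function name `increasing_and_disjoint` and the sample sets are hypothetical, and of course a finite list only illustrates the first few terms of an infinite sequence.

```python
def increasing_and_disjoint(sets):
    """From A_1, ..., A_n build the running unions B_k and the disjoint pieces C_k."""
    increasing, disjoint = [], []
    seen = set()
    for A in sets:
        disjoint.append(set(A) - seen)   # C_k: the part of A_k not covered by earlier sets
        seen = seen | set(A)             # new set object, so earlier entries are not mutated
        increasing.append(seen)          # B_k: union of A_1, ..., A_k
    return increasing, disjoint

cover = [{1, 2}, {2, 3}, {1, 4}, {5}]
B, C = increasing_and_disjoint(cover)
```

For this input, `B` is `[{1, 2}, {1, 2, 3}, {1, 2, 3, 4}, {1, 2, 3, 4, 5}]` and `C` is `[{1, 2}, {3}, {4}, {5}]`. Both sequences have the same union as the original one.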

Topics in Probability Revisited

Formally then, a probability space \((S, \mathscr{S}, \P)\), the basic mathematical model of a random experiment, consists of three essential parts:

  1. the sample space \(S\)
  2. the \(\sigma\)-algebra of events \(\mathscr{S}\)
  3. the probability measure \(\P\) on \( \mathscr{S} \)

Moreover, in probability, \(\sigma\)-algebras are not just important for theoretical and foundational purposes, but are important for practical purposes as well. A \(\sigma\)-algebra can be used to specify partial information about an experiment--a concept of fundamental importance. Specifically, suppose that \(\mathscr{A}\) is a collection of events in the experiment, and that we know whether or not \(A\) occurred for each \(A \in \mathscr{A}\). Then in fact, we can determine whether or not \(A\) occurred for each \(A \in \sigma(\mathscr{A})\), the \(\sigma\)-algebra generated by \(\mathscr{A}\).
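For a finite sample space, the generated \(\sigma\)-algebra can be computed directly, which makes the idea of partial information concrete. The sketch below is only an illustration under that finiteness assumption. The helper `generated_sigma_algebra` and the die example are hypothetical. Two outcomes belong to the same atom when no set in \(\mathscr{A}\) separates them, and \(\sigma(\mathscr{A})\) then consists of all unions of atoms.

```python
from itertools import combinations

def generated_sigma_algebra(S, collection):
    """Return sigma(collection) on a finite set S as a set of frozensets."""
    # Group outcomes by which sets of the collection contain them.
    atoms = {}
    for x in S:
        signature = tuple(x in A for A in collection)
        atoms.setdefault(signature, set()).add(x)
    atom_list = [frozenset(a) for a in atoms.values()]
    # On a finite set, the generated sigma-algebra is the set of all unions of atoms.
    events = set()
    for k in range(len(atom_list) + 1):
        for chosen in combinations(atom_list, k):
            events.add(frozenset().union(*chosen))
    return events

# Hypothetical experiment: one roll of a die, where we learn whether the score
# is even and whether it is at most 3.
S = range(1, 7)
even, low = {2, 4, 6}, {1, 2, 3}
events = generated_sigma_algebra(S, [even, low])
```

The atoms here are \( \{1, 3\} \), \( \{2\} \), \( \{4, 6\} \), and \( \{5\} \), so \( \sigma(\mathscr{A}) \) has \( 2^4 = 16 \) events. Knowing whether the score is even and whether it is at most 3 tells us whether the score is 2, but not whether the score is 1.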

Suppose that \(X\) is a random variable for the experiment, taking values in a set \(T\). Almost always, \(T\) will have a natural \(\sigma\)-algebra of admissible subsets \(\mathscr{T}\). Technically, the random variable \(X\) is required to be measurable as a function from \(S\) into \(T\). This ensures that \(\{X \in B\}\) is a valid event (that is, a member of the \(\sigma\)-algebra \(\mathscr{S}\)) for each \(B \in \mathscr{T}\). Therefore, the probability distribution of \(X\), that is the mapping \(B \mapsto \P(X \in B)\), really is a probability measure on the \(\sigma\)-algebra \(\mathscr{T}\).

Also, \(\{\{X \in B\}: B \in \mathscr{T}\}\) is a sub \(\sigma\)-algebra of \(\mathscr{S}\), and in fact is the \(\sigma\)-algebra generated by \(X\), denoted \(\sigma(X)\). If we observe the value of \(X\), then we know whether or not each event in \(\sigma(X)\) has occurred. More generally, suppose \(X_i\) is a random variable taking values in a set \( T_i \) (with \( \sigma \)-algebra \( \mathscr{T}_i \)) for each \(i\) in an index set \(I\). Recall that the \( \sigma \)-algebra generated by \( \{X_i: i \in I\} \) is

\[ \sigma\{X_i: i \in I\} = \sigma\{\{X_i \in B_i\}: B_i \in \mathscr{T}_i, \; i \in I\} \]

If we observe the value of \(X_i\) for each \(i \in I\) then we know whether or not each event in \(\sigma\{X_i: i \in I\}\) has occurred. This idea is very important in the study of random processes; see the chapter on Markov Chains for example.
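On a finite sample space, \( \sigma(X) \) can be listed explicitly as the collection of inverse images \( \{X \in B\} \), where \( B \) ranges over the subsets of the range of \( X \). The following sketch assumes a hypothetical experiment with two three-sided dice and takes \( X \) to be the sum of the scores. The helper name `sigma_of` is also an assumption of the example.

```python
from itertools import chain, combinations

def sigma_of(X, S):
    """sigma(X) = { {X in B} : B a subset of the range of X } for X on a finite set S."""
    values = sorted({X(s) for s in S})
    subsets = chain.from_iterable(combinations(values, k) for k in range(len(values) + 1))
    # Each event is the inverse image of a set B of possible values.
    return {frozenset(s for s in S if X(s) in B) for B in subsets}

S = [(i, j) for i in range(1, 4) for j in range(1, 4)]  # two three-sided dice

def total(s):
    return s[0] + s[1]

events = sigma_of(total, S)
```

The sum takes 5 distinct values, so \( \sigma(X) \) has \( 2^5 = 32 \) events. Observing the sum tells us whether the outcome is \( (1, 1) \), but cannot distinguish \( (1, 2) \) from \( (2, 1) \).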

Null and Almost Sure Events

Suppose that \( (S, \mathscr{S}, \P) \) is a probability space. Recall that \( A \in \mathscr{S} \) is a null event if \( \P(A) = 0 \), and \( A \) is an almost sure event if \( \P(A) = 1 \). Let

\[ \mathscr{D} = \{A \in \mathscr{S}: \P(A) = 0 \text{ or } \P(A) = 1 \} \]

denote the collection of null and almost certain events, that is, the collection of essentially deterministic events. In the section on independence, we showed that \( \mathscr{D} \) is independent. It satisfies another important property as well:

\( \mathscr{D} \) is a sub \(\sigma\)-algebra of \( \mathscr{S} \).

Proof:

Trivially \( S \in \mathscr{D} \), and if \( A \in \mathscr{D} \) then \( A^c \in \mathscr{D} \). Suppose that \( A_i \in \mathscr{D} \) for \( i \in I \) where \( I \) is a countable index set. If \( \P(A_i) = 0 \) for every \( i \in I \) then \( \P\left(\bigcup_{i \in I} A_i \right) = 0 \) by Boole's inequality. On the other hand, if \( \P(A_j) = 1 \) for some \( j \in I \) then \( \P\left(\bigcup_{i \in I} A_i \right) = 1 \). In either case, \( \bigcup_{i \in I} A_i \in \mathscr{D} \).

Suppose that \( A \in \mathscr{S} \) and \( \P(A) = 0 \). If \( B \subseteq A \) and \( B \in \mathscr{S} \), then we know that \( \P(B) = 0 \) also. However, in general there might be subsets of \( A \) that are not in \( \mathscr{S} \). The \( \sigma \)-algebra \( \mathscr{S} \) is complete with respect to \( \P \) if \( A \in \mathscr{S} \), \( \P(A) = 0 \), and \( B \subseteq A \) imply \( B \in \mathscr{S} \) (and hence \( \P(B) = 0 \)). That is, the collection of events \( \mathscr{S} \) is complete with respect to \( \P \) if every subset of an event with probability 0 is also an event (and hence also has probability 0).

Fortunately, if \( \mathscr{S} \) is not complete, it can always be completed. Recall that for \( A, \; B \subseteq S \), the symmetric difference is \( A \bigtriangleup B = (A \setminus B) \cup (B \setminus A) \). Let \( \mathscr{N} = \{A \in \mathscr{S}: \P(A) = 0\} \) denote the collection of null events. Define a relation \( \equiv \) on \( \mathscr{P}(S) \) (the power set of \( S \)) by \( A \equiv B \) if and only if there exists \( N \in \mathscr{N} \) such that \( A \bigtriangleup B \subseteq N \).

The relation \( \equiv \) is an equivalence relation on \( \mathscr{P}(S) \). That is for \( A, \; B, \; C \subseteq S \),

  1. \( A \equiv A \) (the reflexive property).
  2. If \( A \equiv B \) then \( B \equiv A \) (the symmetric property).
  3. If \( A \equiv B \) and \( B \equiv C \) then \( A \equiv C \) (the transitive property).
Proof:

For part (a), note that \( A \bigtriangleup A = \emptyset \) and \( \emptyset \in \mathscr{N} \). For part (b), suppose that \( A \bigtriangleup B \subseteq N \) where \( N \in \mathscr{N} \). Then \( B \bigtriangleup A = A \bigtriangleup B \subseteq N\). For part (c), suppose that \( A \bigtriangleup B \subseteq N_1 \) and \( B \bigtriangleup C \subseteq N_2\) where \( N_1, \; N_2 \in \mathscr{N} \). Then \( A \bigtriangleup C \subseteq (A \bigtriangleup B) \cup (B \bigtriangleup C) \subseteq N_1 \cup N_2 \), and \( N_1 \cup N_2 \in \mathscr{N} \).

Note that restricted to \( \mathscr{S} \), the relation \( \equiv \) is the ordinary equivalence relation for events that we studied earlier. Note also that \( A \equiv \emptyset \) if and only if \( A \subseteq N \) for some \( N \in \mathscr{N} \). Now let \( \mathscr{S}_0 = \{A \subseteq S: A \equiv B \text{ for some } B \in \mathscr{S} \} \). Thus, \( A \in \mathscr{S}_0 \) if and only if there exist \( B \in \mathscr{S} \) and \( N \in \mathscr{N} \) such that \( A \bigtriangleup B \subseteq N \).

\( \mathscr{S}_0 \) is a \( \sigma \)-algebra of subsets of \( S \), and in fact is the \( \sigma \)-algebra generated by \( \mathscr{S} \cup \{A \subseteq S: A \equiv \emptyset\} \).

Proof:

Note that if \( A \in \mathscr{S} \) then \( A \equiv A \) so \( A \in \mathscr{S}_0 \). In particular, \( S \in \mathscr{S}_0 \). Also, \( \emptyset \in \mathscr{S} \) so if \( A \equiv \emptyset \) then \( A \in \mathscr{S}_0 \). Suppose that \( A \in \mathscr{S}_0 \) so that \( A \equiv B \) for some \( B \in \mathscr{S} \). Then \( B^c \in \mathscr{S} \) and \( A^c \equiv B^c \) so \( A^c \in \mathscr{S}_0 \). Next suppose that \( A_i \in \mathscr{S}_0 \) for \( i \) in a countable index set \( I \). Then for each \( i \in I \) there exists \( B_i \in \mathscr{S} \) such that \( A_i \equiv B_i \). But then \( \bigcup_{i \in I} B_i \in \mathscr{S} \) and \( \bigcup_{i \in I} A_i \equiv \bigcup_{i \in I} B_i \), so \( \bigcup_{i \in I} A_i \in \mathscr{S}_0 \). Therefore \( \mathscr{S}_0 \) is a \( \sigma \)-algebra of subsets of \( S \). Finally, suppose that \( \mathscr{T} \) is a \( \sigma \)-algebra of subsets of \( S \) and that \( \mathscr{S} \cup \{A \subseteq S: A \equiv \emptyset\} \subseteq \mathscr{T} \). We need to show that \( \mathscr{S}_0 \subseteq \mathscr{T} \). Thus, suppose that \( A \in \mathscr{S}_0 \). Then there exists \( B \in \mathscr{S} \) such that \( A \equiv B \). But \( B \in \mathscr{T} \), and \( A \bigtriangleup B \equiv \emptyset \) so \( A \bigtriangleup B \in \mathscr{T} \). Hence \( A \cap B = B \setminus (A \bigtriangleup B) \in \mathscr{T}\). Similarly \( A \setminus B \equiv \emptyset \) so \( A \setminus B \in \mathscr{T} \), and therefore \( A = (A \cap B) \cup (A \setminus B) \in \mathscr{T} \).

\( \P \) can be extended to a probability measure on \( \mathscr{S}_0 \). Specifically, suppose that \( A \in \mathscr{S}_0 \) so that \( A \equiv B \) for some \( B \in \mathscr{S} \). Define \( \P_0(A) = \P(B) \). Then

  1. \( \P_0 \) is well-defined.
  2. \( \P_0(A) = \P(A) \) for \( A \in \mathscr{S} \).
  3. \( \P_0 \) is a probability measure on \( \mathscr{S}_0 \).
Proof:

Suppose that \( A \in \mathscr{S}_0 \) and that \( A \equiv B_1 \) and \( A \equiv B_2 \) where \( B_1, \; B_2 \in \mathscr{S} \). Then \(B_1 \equiv B_2 \) so \( \P(B_1) = \P(B_2) \). Thus, \( \P_0 \) is well-defined. Next, if \( A \in \mathscr{S} \) then of course \( A \equiv A \) so \( \P_0(A) = \P(A) \). In particular, \( \P_0(S) = 1 \). Trivially \( \P_0(A) \ge 0 \) for \( A \in \mathscr{S}_0 \). Thus we just need to show the countable additivity property. Towards that end, suppose that \( \{A_i: i \in I\} \) is a countable collection of pairwise disjoint sets in \( \mathscr{S}_0 \). For each \( i \in I \) there exists \( B_i \in \mathscr{S} \) and \( N_i \in \mathscr{N} \) such that \( A_i \bigtriangleup B_i \subseteq N_i \). That is, \( A_i \equiv B_i \), so in particular \( \P_0(A_i) = \P(B_i) \). Now \( \bigcup_{i \in I} A_i \equiv \bigcup_{i \in I} B_i \), so \( \P_0\left(\bigcup_{i \in I} A_i\right) = \P\left(\bigcup_{i \in I} B_i\right) \). By Boole's inequality, \( \P\left(\bigcup_{i \in I} B_i\right) \le \sum_{i \in I} \P(B_i) = \sum_{i \in I} \P_0(A_i) \). But also, \( B_i \cap B_j \subseteq N_i \cup N_j \) and therefore \( \P(B_i \cap B_j) = 0 \) for \( i \ne j \). By Bonferroni's inequality, \( \P\left(\bigcup_{i \in I} B_i\right) \ge \sum_{i \in I} \P(B_i) = \sum_{i \in I} \P_0(A_i) \). Thus we conclude that \( \P_0\left(\bigcup_{i \in I} A_i\right) = \sum_{i \in I} \P_0(A_i) \).
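The completion can be computed by brute force on a small finite example. The sketch below assumes a hypothetical probability space on \( S = \{1, 2, 3, 4, 5\} \) whose events are the unions of the atoms \( \{1, 2\} \), \( \{3\} \), \( \{4, 5\} \), with the atom \( \{4, 5\} \) null. It builds \( \mathscr{S}_0 \) and \( \P_0 \) directly from the definitions.

```python
from fractions import Fraction
from itertools import combinations

def powerset(S):
    items = sorted(S)
    return [frozenset(c) for k in range(len(items) + 1) for c in combinations(items, k)]

# Hypothetical probability space: atoms {1,2}, {3}, {4,5}, with the atom {4,5} null.
atoms = {
    frozenset({1, 2}): Fraction(1, 2),
    frozenset({3}): Fraction(1, 2),
    frozenset({4, 5}): Fraction(0),
}
S = frozenset().union(*atoms)

# The events are the unions of atoms; P adds up the atom probabilities.
P = {}
for k in range(len(atoms) + 1):
    for chosen in combinations(atoms, k):
        P[frozenset().union(*chosen)] = sum((atoms[a] for a in chosen), Fraction(0))

null_events = [N for N, p in P.items() if p == 0]

# A is in the completion when its symmetric difference with some event B
# is contained in a null event; then P0(A) = P(B).
P0 = {}
for A in powerset(S):
    for B, p in P.items():
        if any((A ^ B) <= N for N in null_events):
            P0[A] = p   # well defined: every admissible B has the same probability
            break
```

The original \( \sigma \)-algebra has 8 events and is not complete, since \( \{4\} \subseteq \{4, 5\} \) is not an event. The completion has 16 events: it contains \( \{4\} \) and \( \{3, 4\} \), with \( \P_0(\{4\}) = 0 \) and \( \P_0(\{3, 4\}) = \frac{1}{2} \), but it still does not contain \( \{1\} \).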

More generally, if \( \mu \) is a \( \sigma \)-finite measure on a measurable space \( (S, \mathscr{S}) \), then \( \mathscr{S} \) can be completed with respect to \( \mu \).

Independence

As usual, suppose that \( (S, \mathscr{S}, \P) \) is a probability space. We have already studied the independence of collections of events and the independence of collections of random variables. A more complete and general treatment results if we define the independence of collections of collections of events, and most importantly, the independence of collections of \( \sigma \)-algebras. This extension actually occurred already, when we went from independence of a collection of events to independence of a collection of random variables, but we did not note it at the time. In spite of the layers of set theory, the basic idea is the same.

Thus, suppose that \( \mathscr{A}_i \) is a collection of events for each \( i \) in an index set \( I \). Then \( \mathscr{A} = \{\mathscr{A}_i: i \in I\} \) is independent if and only if for every choice of \( A_i \in \mathscr{A}_i \) for \( i \in I \), the collection of events \(\{ A_i: i \in I\} \) is independent in the sense we defined earlier. That is, for every finite \(J \subseteq I \),

\[ \P\left(\bigcap_{j \in J} A_j\right) = \prod_{j \in J} \P(A_j) \]

Suppose that \( X_i \) is a random variable taking values in a set \( T_i \) (with \( \sigma \)-algebra \( \mathscr{T}_i \)) for each \( i \) in an index set \( I \). The independence of \( \{X_i: i \in I\} \) that we defined earlier is equivalent to the independence of \( \{\sigma(X_i): i \in I\} \).

Suppose that \( A_i \) is an event for each \( i \in I \). The independence of \( \{A_i: i \in I\} \) is equivalent to the independence of \( \{\mathscr{A}_i: i \in I\} \) where \( \mathscr{A}_i = \sigma\{A_i\} = \{S, \emptyset, A_i, A_i^c\} \) for each \( i \in I \). Thus, our new definition really does subsume our old one.

For every collection of objects that we have considered (collections of events, collections of random variables, collections of collections of events), the notion of independence has the basic inheritance property.

Suppose that \( \mathscr{A} \) is a collection of collections of events.

  1. If \( \mathscr{A} \) is independent then \( \mathscr{B} \) is independent for every \( \mathscr{B} \subseteq \mathscr{A} \).
  2. If \( \mathscr{B} \) is independent for every finite \( \mathscr{B} \subseteq \mathscr{A} \) then \( \mathscr{A} \) is independent.

For our next result, you will need to review the definitions and theorems concerning \( \pi \)-systems and \( \lambda \)-systems. The proof uses Dynkin's \( \pi \)-\( \lambda \) theorem, named for Eugene Dynkin.

Suppose that \( \mathscr{A}_i \) is a collection of events for each \( i \) in an index set \( I \), and that \( \mathscr{A}_i \) is a \( \pi \)-system for each \( i \in I \). If \( \{\mathscr{A}_i: i \in I\} \) is independent, then \( \{\sigma(\mathscr{A}_i): i \in I\} \) is independent.

Proof:

In light of the inheritance property above, it suffices to consider a finite set of collections. Thus, suppose that \( \{\mathscr{A}_1, \mathscr{A}_2, \ldots, \mathscr{A}_n\} \) is independent. Now, fix \( A_i \in \mathscr{A}_i \) for \( i \in \{2, 3, \ldots, n\} \) and let \( E = \bigcap_{i=2}^n A_i \). Let \( \mathscr{L} = \{B \in \mathscr{S}: \P(B \cap E) = \P(B) \P(E)\} \). Trivially \( S \in \mathscr{L} \) since \( \P(S \cap E) = \P(E) = \P(S) \P(E) \). Next suppose that \( A \in \mathscr{L} \). Then

\[ \P(A^c \cap E) = \P(E) - \P(A \cap E) = \P(E) - \P(A) \P(E) = [1 - \P(A)] \P(E) = \P(A^c) \P(E) \]

Thus \( A^c \in \mathscr{L} \). Finally, suppose that \( \{A_j: j \in J\} \) is a countable collection of disjoint sets in \( \mathscr{L} \). Then

\[ \P\left[\left(\bigcup_{j \in J} A_j \right) \cap E \right] = \P\left[ \bigcup_{j \in J} (A_j \cap E) \right] = \sum_{j \in J} \P(A_j \cap E) = \sum_{j \in J} \P(A_j) \P(E) = \P(E) \sum_{j \in J} \P(A_j) = \P(E) \P\left(\bigcup_{j \in J} A_j \right) \]

Therefore \( \bigcup_{j \in J} A_j \in \mathscr{L} \) and so \( \mathscr{L} \) is a \( \lambda \)-system. Trivially \( \mathscr{A}_1 \subseteq \mathscr{L} \) by the original independence assumption, so by the \( \pi \)-\( \lambda \) theorem, \( \sigma(\mathscr{A}_1) \subseteq \mathscr{L} \). Thus, we have that for every \( A_1 \in \sigma(\mathscr{A}_1) \) and \( A_i \in \mathscr{A}_i \) for \( i \in \{2, 3, \ldots, n\} \),

\[ \P\left(\bigcap_{i=1}^n A_i \right) = \prod_{i=1}^n \P(A_i) \]

Thus we have shown that \( \{\sigma(\mathscr{A}_1), \mathscr{A}_2, \ldots, \mathscr{A}_n\} \) is independent. Repeating the argument \( n - 1 \) additional times, we get that \( \{\sigma(\mathscr{A}_1), \sigma(\mathscr{A}_2), \ldots, \sigma(\mathscr{A}_n)\} \) is independent.

Suppose that \( \mathscr{A} \) is an independent collection of events, and that \( \{\mathscr{B}_j: j \in J\} \) is a partition of \( \mathscr{A} \). That is, \( \mathscr{B}_j \cap \mathscr{B}_k = \emptyset \) for \( j \ne k \) and \( \bigcup_{j \in J} \mathscr{B}_j = \mathscr{A} \). Then \( \{\sigma(\mathscr{B}_j): j \in J\} \) is independent.

Proof:

Let \( \mathscr{B}_j^* \) denote the set of all finite intersections of sets in \( \mathscr{B}_j \), for each \( j \in J \). Then clearly \( \mathscr{B}_j^* \) is a \( \pi \)-system for each \( j \), and \( \{\mathscr{B}_j^*: j \in J\} \) is independent. By the previous theorem, \( \{\sigma(\mathscr{B}_j^*): j \in J\} \) is independent. But clearly \( \sigma(\mathscr{B}_j^*) = \sigma(\mathscr{B}_j) \) for \( j \in J \).

The previous result is a rigorous statement of the strong independence that is implied by the independence of a collection of events. Let's bring the result down to earth. Suppose that \( A, B, C, D \) are independent events. In our elementary discussion, you were asked to show, for example, that \( A \cup B^c \) and \( C \cap D^c \) are independent. This is a consequence of the much stronger statement that the \( \sigma \)-algebras \( \sigma\{A, B\} \) and \( \sigma\{C, D\} \) are independent.
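The down-to-earth example can be checked with exact arithmetic. The sketch below assumes a hypothetical model of four independent trials, with arbitrarily chosen success probabilities, in which \( A, B, C, D \) are the events that trials 1 through 4 succeed. It verifies the product rule for \( A \cup B^c \) and \( C \cap D^c \).

```python
from fractions import Fraction
from itertools import product

# Hypothetical model: four independent trials with success probabilities p[0], ..., p[3].
p = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4), Fraction(2, 5)]
outcomes = list(product([0, 1], repeat=4))

def prob(event):
    """Probability of a set of outcomes under the product measure."""
    total = Fraction(0)
    for w in event:
        weight = Fraction(1)
        for pi, wi in zip(p, w):
            weight *= pi if wi == 1 else 1 - pi
        total += weight
    return total

# A, B, C, D: success on trial 1, 2, 3, 4 respectively.
A, B, C, D = ({w for w in outcomes if w[i] == 1} for i in range(4))
S = set(outcomes)
E = A | (S - B)      # A union B^c, an event in sigma{A, B}
F = C & (S - D)      # C intersect D^c, an event in sigma{C, D}
independent = prob(E & F) == prob(E) * prob(F)
```

With these probabilities, \( \P(A \cup B^c) = 1 - \frac{1}{2} \cdot \frac{1}{3} = \frac{5}{6} \) and \( \P(C \cap D^c) = \frac{1}{4} \cdot \frac{3}{5} = \frac{3}{20} \), and the probability of the intersection is the product \( \frac{1}{8} \).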

Tail Events

Let \((X_1, X_2, \ldots)\) be a sequence of random variables for a random experiment. The tail sigma algebra of the sequence is

\[ \mathscr{T} = \bigcap_{n=1}^\infty \sigma\{X_n, X_{n+1}, \ldots\} \]

and an event \(B \in \mathscr{T}\) is a tail event for the sequence. Informally, a tail event is an event that can be defined in terms of \(\{X_n, X_{n+1}, \ldots\}\) for each \(n \in \N_+\). The tail sigma algebra for a sequence of events \( (A_1, A_2, \ldots) \) is defined analogously (let \(X_k = \bs{1}(A_k)\), the indicator variable of \(A_k\), for each \(k\)). The following exercises give some examples. You may need to review some of the definitions in the section on Convergence.

Suppose that \((A_1, A_2, \ldots)\) is a sequence of events.

  1. If the events are increasing then \(\bigcup_{i=1}^\infty A_i\) is a tail event of the sequence.
  2. If the events are decreasing then \(\bigcap_{n=1}^\infty A_n\) is a tail event of the sequence.
Proof:

If the events are increasing then \( \bigcup_{i=1}^\infty A_i = \bigcup_{i=n}^\infty A_i \) for every \( n \in \N_+ \). Similarly, if the events are decreasing then \( \bigcap_{i=1}^\infty A_i = \bigcap_{i=n}^\infty A_i \) for every \( n \in \N_+ \).

Suppose again that \( (A_1, A_2, \ldots) \) is a sequence of events. Each of the following is a tail event of the sequence:

  1. \(\limsup_{n \to \infty} A_n\)
  2. \(\liminf_{n \to \infty} A_n\)
Proof:

Recall that \( \limsup_{n \to \infty} A_n = \bigcap_{n=1}^\infty \bigcup_{i=n}^\infty A_i \). The events \( \bigcup_{i=n}^\infty A_i \) are decreasing in \( n \) and hence \( \limsup_{n \to \infty} A_n = \bigcap_{n=k}^\infty \bigcup_{i=n}^\infty A_i \in \sigma\{A_k, A_{k+1}, \ldots\} \) for each \( k \in \N_+ \).

Similarly \( \liminf_{n \to \infty} A_n = \bigcup_{n=1}^\infty \bigcap_{i=n}^\infty A_i \). The events \( \bigcap_{i=n}^\infty A_i \) are increasing in \( n \) and hence \( \liminf_{n \to \infty} A_n = \bigcup_{n=k}^\infty \bigcap_{i=n}^\infty A_i \in \sigma\{A_k, A_{k+1}, \ldots\} \) for each \( k \in \N_+ \).

The event \(\{X_n \text{ converges as } n \to \infty\}\) is a tail event for a sequence of real-valued random variables \((X_1, X_2, \ldots)\).

Proof:

The Cauchy criterion for convergence (named for Augustin Cauchy of course) states that \( X_n \) converges as \( n \to \infty \) if and only if for every \( \epsilon \gt 0 \) there exists \( N \in \N_+ \) such that if \(m, \; n \ge N \) then \( |X_n - X_m| \lt \epsilon \). In this criterion, we can without loss of generality take \( \epsilon \) to be rational, and for a given \( k \in \N_+ \) we can insist that \( N \ge k \) (and hence \( m, \; n \ge k \)). With these restrictions, the convergence event is built from the events \( \{|X_n - X_m| \lt \epsilon\} \) with \( m, \; n \ge k \) using countable unions and intersections, and each such event is in \( \sigma\{X_k, X_{k+1}, \ldots\} \).

The following exercise gives the Kolmogorov zero-one law, named for Andrey Kolmogorov. It states that the tail \(\sigma\)-algebra of an independent sequence is a sub \(\sigma\)-algebra of the essentially deterministic events.

If \(B\) is a tail event for a sequence of independent random variables \((X_1, X_2, \ldots)\) then \(\P(B) = 0\) or \(\P(B) = 1\).

Proof:

For each \(n\), \( B \in \sigma\{X_{n+1}, X_{n+2}, \ldots\} \) and hence \(\{X_1, X_2, \ldots, X_n, \bs{1}_B\}\) is an independent set of random variables. Since independence only needs to be checked on finite subcollections, \(\{X_1, X_2, \ldots, \bs{1}_B\}\) is an independent set of random variables. But \( B \in \sigma\{X_1, X_2, \ldots\} \), so it follows that the event \(B\) is independent of itself. Therefore \(\P(B) = 0\) or \(\P(B) = 1\).

From the zero-one law and the fact that \(\limsup_{n \to \infty} A_n\) is a tail event, note that if \((A_1, A_2, \ldots)\) is a sequence of independent events, then \(\limsup_{n \to \infty} A_n\) must have probability 0 or 1. The Borel-Cantelli lemmas give conditions for which of these is correct: