\(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\N}{\mathbb{N}}\)

9. Measure Theory

In this section we discuss some topics from measure theory that are a bit more advanced than the topics in the previous sections of this chapter. However, measure-theoretic ideas are essential for a deep understanding of probability, since probability is itself a measure. The most important of the definitions is that of the \(\sigma\)-algebra. Such collections play a fundamental role, even for applied probability, in encoding the state of information about a random experiment.

On the other hand, we won't be overly pedantic about measure-theoretic details in this project. Unless we say otherwise, we assume that all sets that appear are measurable (that is, members of the appropriate \(\sigma\)-algebras), and that all functions are measurable (relative to the appropriate \(\sigma\)-algebras).

Algebras and \( \sigma \)-Algebras

Suppose that \(S\) is a set, playing the role of a universal set for a particular mathematical model. It is sometimes impossible to include all subsets of \(S\) in our model, particularly when \(S\) is uncountable. In a sense, the more sets that we include, the harder it is to have consistent theories. However, we almost always want the collection of admissible subsets to be closed under the basic set operations. This leads to some important definitions.

Algebras of Sets

Suppose that \(\mathscr{S}\) is a nonempty collection of subsets of \(S\). Then \(\mathscr{S}\) is said to be an algebra (or field) if it is closed under complement and union:

  1. If \(A \in \mathscr{S}\) then \(A^c \in \mathscr{S}\).
  2. If \(A \in \mathscr{S}\) and \(B \in \mathscr{S}\) then \(A \cup B \in \mathscr{S}\).

If \(\mathscr{S}\) is an algebra of subsets of \(S\) then

  1. \( S \in \mathscr{S} \)
  2. \( \emptyset \in \mathscr{S} \)
Proof:

Since \( \mathscr{S} \) is nonempty, there exists \( A \in \mathscr{S} \). Hence \( A^c \in \mathscr{S} \) so \( S = A \cup A^c \in \mathscr{S} \). Finally, \( \emptyset = S^c \in \mathscr{S} \).

Suppose that \(\mathscr{S}\) is an algebra of subsets of \(S\) and that \(A_i \in \mathscr{S}\) for each \(i\) in a finite index set \(I\).

  1. \(\bigcup_{i \in I} A_i \in \mathscr{S}\)
  2. \(\bigcap_{i \in I} A_i \in \mathscr{S}\)
Proof:

Part (a) follows by induction on the number of elements in \(I\). For part (b), we use part (a) and De Morgan's law. If \( A_i \in \mathscr{S} \) for \( i \in I \) then \( A_i^c \in \mathscr{S} \) for \( i \in I \). Therefore \( \bigcup_{i \in I} A_i^c \in \mathscr{S} \) and hence \( \bigcap_{i \in I} A_i = \left(\bigcup_{i \in I} A_i^c\right)^c \in \mathscr{S} \).

Thus it follows that an algebra of sets is closed under a finite number of set operations. That is, if we start with a finite number of sets in the algebra \( \mathscr{S} \), and build a new set with a finite number of set operations (union, intersection, complement), then the new set is also in \( \mathscr{S} \). However, in many mathematical theories, probability in particular, this is not sufficient; we often need the collection of admissible subsets to be closed under a countable number of set operations.
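
For a finite universal set, the two closure conditions can be checked mechanically. The following is a minimal sketch, not part of the original text; the function name `is_algebra` and the sample collections are illustrative assumptions.

```python
from itertools import combinations

def is_algebra(universe, collection):
    """Return True if `collection` is a nonempty family of subsets of
    `universe` that is closed under complement and pairwise union."""
    universe = frozenset(universe)
    family = {frozenset(a) for a in collection}
    if not family or any(not a <= universe for a in family):
        return False
    closed_under_complement = all(universe - a in family for a in family)
    closed_under_union = all(a | b in family for a, b in combinations(family, 2))
    return closed_under_complement and closed_under_union

S = {1, 2, 3, 4}
good = [set(), {1, 2}, {3, 4}, S]
bad = [set(), {1}, {2}, S]   # missing the complement of {1} and the union {1, 2}

print(is_algebra(S, good))  # True
print(is_algebra(S, bad))   # False
```

Closure under pairwise union is enough here because closure under arbitrary finite unions then follows by induction, as in the proof above.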

\(\sigma\)-Algebras of Sets

Suppose that \(\mathscr{S}\) is a nonempty collection of subsets of \(S\). Then \(\mathscr{S}\) is said to be a \(\sigma\)-algebra (or \(\sigma\)-field) if

  1. If \(A \in \mathscr{S}\) then \(A^c \in \mathscr{S}\).
  2. If \(A_i \in \mathscr{S}\) for each \(i\) in a countable index set \(I\), then \(\bigcup_{i \in I} A_i \in \mathscr{S}\).

Clearly a \(\sigma\)-algebra of subsets is also an algebra of subsets, so the basic results for algebras in Theorems 1 and 2 still hold. In particular, \( S \in \mathscr{S} \) and \( \emptyset \in \mathscr{S} \).

If \(A_i \in \mathscr{S}\) for each \(i\) in a countable index set \(I\), then \(\bigcap_{i \in I} A_i \in \mathscr{S}\).

Proof:

The proof is just like the one in Theorem 2. If \( A_i \in \mathscr{S} \) for \( i \in I \) then \( A_i^c \in \mathscr{S} \) for \( i \in I \). Therefore \( \bigcup_{i \in I} A_i^c \in \mathscr{S} \) and hence \( \bigcap_{i \in I} A_i = \left(\bigcup_{i \in I} A_i^c\right)^c \in \mathscr{S} \).

Thus a \(\sigma\)-algebra of subsets of \(S\) is closed under countable unions and intersections. This is the reason for the symbol \(\sigma\) in the name. As mentioned in the introductory paragraph, \( \sigma \)-algebras are of fundamental importance in mathematics generally and probability theory specifically. If \( S \) is a set and \( \mathscr{S} \) a \( \sigma \)-algebra of subsets of \( S \), then the pair \( (S, \mathscr{S}) \) is sometimes called a measurable space.

Suppose that \(S\) is a set and that \(\mathscr{S}\) is a finite algebra of subsets of \(S\). Then \(\mathscr{S}\) is also a \(\sigma\)-algebra.

Proof:

Any countable union of sets in \(\mathscr{S}\) reduces to a finite union.

However, there are algebras that are not \(\sigma\)-algebras. Here is the classic example:

The collection of finite and co-finite subsets of \( \N \) defined below is an algebra of subsets of \(\N\), but not a \(\sigma\)-algebra:

\[ \mathscr{F} = \{A \subseteq \N: A \text{ is finite or } A^c \text{ is finite}\} \]
Proof:

\( \N \in \mathscr{F} \) since \( \N^c = \emptyset \) is finite. If \( A \in \mathscr{F} \) then \( A^c \in \mathscr{F} \) by the symmetry of the definition. Suppose that \( A, \; B \in \mathscr{F} \). If \( A \) and \( B \) are both finite then \( A \cup B \) is finite. If \( A^c \) or \( B^c \) is finite, then \( (A \cup B)^c = A^c \cap B^c \) is finite. In either case, \( A \cup B \in \mathscr{F} \). Thus \( \mathscr{F} \) is an algebra of subsets of \( \N \).

Let \( A_n = \{2 \, n\} \) for \( n \in \N \). Then \( A_n \) is finite, so \( A_n \in \mathscr{F} \) for each \( n \in \N \). Let \( E = \bigcup_{n=0}^\infty A_n \), the set of even numbers. Note that \( E \) and \( E^c \) are infinite, so \( E \notin \mathscr{F} \). Thus \( \mathscr{F} \) is not a \( \sigma \)-algebra.
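
The algebra \( \mathscr{F} \) can be modeled on a computer because each of its members is determined by a finite set of exceptions. The sketch below is only an illustration under that assumption; the class name `FinCofin` and the helper `singleton` are hypothetical. Every finite union of the sets \( A_n \) is representable, while the set \( E \) of all even numbers is not, which mirrors the proof.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class FinCofin:
    """A finite or co-finite subset of N, stored by a finite set of points.

    cofinite=False: the set is exactly `points`.
    cofinite=True:  the set is N minus `points`.
    """
    points: frozenset
    cofinite: bool = False

    def complement(self):
        return FinCofin(self.points, not self.cofinite)

    def union(self, other):
        if not self.cofinite and not other.cofinite:
            return FinCofin(self.points | other.points, False)
        if self.cofinite and other.cofinite:
            # (N - P) | (N - Q) = N - (P & Q)
            return FinCofin(self.points & other.points, True)
        cof, fin = (self, other) if self.cofinite else (other, self)
        # (N - P) | F = N - (P - F)
        return FinCofin(cof.points - fin.points, True)

    def __contains__(self, n):
        return (n in self.points) != self.cofinite


def singleton(n):
    return FinCofin(frozenset({n}))

# Any finite union of the sets A_n = {2n} stays in the family.
partial = FinCofin(frozenset())
for n in range(5):
    partial = partial.union(singleton(2 * n))
print(sorted(partial.points), partial.cofinite)  # [0, 2, 4, 6, 8] False
```

No value of `FinCofin` describes the infinite union \( E \), since both \( E \) and \( E^c \) are infinite.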

General Constructions

Recall that \(\mathscr{P}(S)\) denotes the collection of all subsets of \(S\), called the power set of \(S\). Trivially, \(\mathscr{P}(S)\) is the largest \(\sigma\)-algebra of subsets of \(S\), and as noted above, is sometimes too large to be useful. At the other extreme, the smallest \(\sigma\)-algebra of subsets of \(S\) is given in the following exercise.

The collection \(\{\emptyset, S\}\) is a \(\sigma\)-algebra.

In many cases, we want to construct a \(\sigma\)-algebra that contains certain basic sets. The following exercises show how to do this.

Suppose that \(\mathscr{S}_i\) is a \(\sigma\)-algebra of subsets of \(S\) for each \(i\) in a nonempty index set \(I\). Then \( \mathscr{S} = \bigcap_{i \in I} \mathscr{S}_i\) is also a \(\sigma\)-algebra of subsets of \(S\).

Proof:

The proof is completely straightforward. First, \( S \in \mathscr{S}_i \) for each \( i \in I \) so \( S \in \mathscr{S} \). If \( A \in \mathscr{S} \) then \( A \in \mathscr{S}_i \) for each \( i \in I \) and hence \( A^c \in \mathscr{S}_i \) for each \( i \in I \). Therefore \( A^c \in \mathscr{S} \). Finally suppose that \( A_j \in \mathscr{S} \) for each \( j \) in a countable index set \( J \). Then \( A_j \in \mathscr{S}_i \) for each \( i \in I \) and \( j \in J \) and therefore \( \bigcup_{j \in J} A_j \in \mathscr{S}_i \) for each \( i \in I \). It follows that \( \bigcup_{j \in J} A_j \in \mathscr{S} \).

Suppose now that \(\mathscr{B}\) is a collection of subsets of \(S\). Think of the sets in \(\mathscr{B}\) as basic sets; but in general \(\mathscr{B}\) will not be a \(\sigma\)-algebra. The \(\sigma\)-algebra generated by \(\mathscr{B}\) is the intersection of all \(\sigma\)-algebras that contain \(\mathscr{B}\), which by the previous exercise really is a \(\sigma\)-algebra:

\[\sigma(\mathscr{B}) = \bigcap \{\mathscr{S}: \mathscr{S} \text{ is a } \sigma\text{-algebra of subsets of } S \text{ and } \mathscr{B} \subseteq \mathscr{S}\}\]

Note that the collection of \( \sigma \)-algebras in the intersection is not empty, since \( \mathscr{P}(S) \) is in the collection.

The \(\sigma\)-algebra \(\sigma(\mathscr{B})\) is the smallest \(\sigma\)-algebra containing \(\mathscr{B}\).

  1. \(\mathscr{B} \subseteq \sigma(\mathscr{B})\)
  2. If \(\mathscr{S}\) is a \(\sigma\)-algebra of subsets of \(S\) and \(\mathscr{B} \subseteq \mathscr{S}\) then \(\sigma(\mathscr{B}) \subseteq \mathscr{S}\).

Note that the conditions in the previous theorem completely characterize \( \sigma(\mathscr{B}) \). If \( \mathscr{S}_1 \) and \( \mathscr{S}_2 \) satisfy the conditions, then by (a), \( \mathscr{B} \subseteq \mathscr{S}_1 \) and \( \mathscr{B} \subseteq \mathscr{S}_2 \). But then by (b), \( \mathscr{S}_1 \subseteq \mathscr{S}_2 \) and \( \mathscr{S}_2 \subseteq \mathscr{S}_1\).

If \(A\) is a subset of \(S\) then \(\sigma\{A\} = \{\emptyset, A, A^c, S\}\)
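
When \( S \) is finite, \( \sigma(\mathscr{B}) \) can be computed by repeatedly adding complements and unions until nothing new appears. The following is a minimal sketch under that finiteness assumption (for a finite set, every algebra is a \( \sigma \)-algebra); the function name `generated_sigma_algebra` is illustrative.

```python
def generated_sigma_algebra(universe, generators):
    """Smallest family containing `generators` that is closed under complement
    and union, for a FINITE universe (where algebras are sigma-algebras)."""
    universe = frozenset(universe)
    family = {frozenset(), universe} | {frozenset(g) for g in generators}
    while True:
        new = {universe - a for a in family}
        new |= {a | b for a in family for b in family}
        if new <= family:
            return family
        family |= new

S = {1, 2, 3, 4, 5}
A = {1, 2}
sigma_A = generated_sigma_algebra(S, [A])
print(sorted(sorted(x) for x in sigma_A))
# [[], [1, 2], [1, 2, 3, 4, 5], [3, 4, 5]]
```

The output lists \( \emptyset \), \( A \), \( S \), and \( A^c \), in agreement with the formula for \( \sigma\{A\} \).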

We can generalize the previous result. Recall that a collection of subsets \( \mathscr{A} = \{A_i: i \in I\} \) is a partition of \( S \) if \( A_i \cap A_j = \emptyset \) for \( i, \; j \in I \) with \( i \ne j \), and \( \bigcup_{i \in I} A_i = S \).

Suppose that \( \mathscr{A} = \{A_i: i \in I\} \) is a countable partition of \( S \). Then \( \sigma(\mathscr{A}) \) is the collection of all unions of sets in \( \mathscr{A} \). That is,

\[ \sigma(\mathscr{A}) = \left\{ \bigcup_{j \in J} A_j: J \subseteq I \right\} \]
Proof:

Let \( \mathscr{S} = \left\{ \bigcup_{j \in J} A_j: J \subseteq I \right\} \). Note that \( S \in \mathscr{S} \) since \( S = \bigcup_{i \in I} A_i \). Next, suppose that \( B \in \mathscr{S} \). Then \( B = \bigcup_{j \in J} A_j \) for some \( J \subseteq I \). But then \( B^c = \bigcup_{j \in J^c} A_j \), so \( B^c \in \mathscr{S} \). Next, suppose that \( B_k \in \mathscr{S} \) for \( k \in K \) where \( K \) is a countable index set. Then for each \( k \in K \) there exists \( J_k \subseteq I \) such that \( B_k = \bigcup_{j \in J_k} A_j \). But then \( \bigcup_{k \in K} B_k = \bigcup_{k \in K} \bigcup_{j \in J_k} A_j = \bigcup_{j \in J} A_j \) where \( J = \bigcup_{k \in K} J_k \). Hence \( \bigcup_{k \in K} B_k \in \mathscr{S} \). Therefore \( \mathscr{S} \) is a \( \sigma \)-algebra of subsets of \( S \). Trivially, \( \mathscr{A} \subseteq \mathscr{S} \). If \( \mathscr{T} \) is a \( \sigma \)-algebra of subsets of \( S \) and \( \mathscr{A} \subseteq \mathscr{T} \), then clearly \( \bigcup_{j \in J} A_j \in \mathscr{T} \) for every \( J \subseteq I \), since \( I \) is countable. Hence \( \mathscr{S} \subseteq \mathscr{T}\).

Note that if \( A_i \ne \emptyset \) for \( i \in I \) then the unions in \( \sigma(\mathscr{A}) \) are distinct. That is, if \( J, \; K \subseteq I \) and \( J \ne K \) then \( \bigcup_{j \in J} A_j \ne \bigcup_{k \in K} A_k \). In particular, if there are \( n \) nonempty sets in \( \mathscr{A} \), so that \( \#(I) = n \), then there are \( 2^n \) subsets of \( I \) and hence \( 2^n \) sets in \( \sigma(\mathscr{A}) \).
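
The counting claim can be checked directly for a small finite partition. This is a sketch only; the function name `sigma_from_partition` and the sample partition are illustrative assumptions.

```python
from itertools import chain, combinations

def sigma_from_partition(blocks):
    """All unions of sub-collections of a finite partition."""
    blocks = [frozenset(b) for b in blocks]
    index_subsets = chain.from_iterable(
        combinations(range(len(blocks)), r) for r in range(len(blocks) + 1)
    )
    return {frozenset().union(*(blocks[j] for j in J)) for J in index_subsets}

partition = [{1, 2}, {3}, {4, 5, 6}]
sigma = sigma_from_partition(partition)
print(len(sigma))  # 8, that is 2**3
```

The empty index set gives \( \emptyset \) and the full index set gives \( S \), so both are among the 8 sets.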

Suppose that \(A\) and \(B\) are subsets of \(S\). List the 16 (in general distinct) sets in \(\sigma\{A, B\}\). Open the Venn diagram applet and check your answer.

Now let's generalize the last exercise from two events to \( n \) events. Thus, suppose that \( \mathscr{A} = \{A_1, A_2, \ldots, A_n\} \) is a collection of \(n\) subsets of \(S\). To describe the \( \sigma \)-algebra generated by \( \mathscr{A} \) we need a bit more notation. For \( x \in \{0, 1\}^n \) (a bit string of length \( n \)), let \( B_x = \bigcap_{i=1}^n A_i^{x_i} \) where \( A_i^1 = A_i \) and \( A_i^0 = A_i^c \).

\( \mathscr{B} = \{B_x: x \in \{0, 1\}^n\} \) partitions \( S \), and moreover,

\[ A_i = \bigcup\left\{B_x: x \in \{0, 1\}^n, \; x_i = 1\right\}, \quad i \in \{1, 2, \ldots, n\} \]
Proof:

Suppose that \( x, \; y \in \{0, 1\}^n \) and that \( x \ne y \). Without loss of generality we can suppose that for some \( j \in \{1, 2, \ldots, n\} \), \(x_j = 0 \) while \( y_j = 1 \). Then \( B_x \subseteq A_j^c \) and \( B_y \subseteq A_j \) so \( B_x \) and \( B_y \) are disjoint. Suppose that \( s \in S \). Construct \( x \in \{0, 1\}^n \) by \( x_i = 1 \) if \( s \in A_i \) and \( x_i = 0 \) if \( s \notin A_i \), for each \( i \in \{1, 2, \ldots, n\} \). Then by definition, \( s \in B_x \). Hence \( \mathscr{B} \) partitions \( S \).

Again if \( x \in \{0, 1\}^n \) and \( x_i = 1 \) then \( B_x \subseteq A_i \). Conversely, if \( s \in A_i \), define \( x \in \{0, 1\}^n \) by \( x_j = 1 \) if \( s \in A_j \) and \( x_j = 0 \) if \( s \notin A_j \). Then \( x_i = 1 \) and \( s \in B_x \).

In the setting above, \( \sigma(\mathscr{A}) = \sigma(\mathscr{B}) = \left\{\bigcup_{x \in J} B_x: J \subseteq \{0, 1\}^n\right\} \).

Proof:

Clearly, every \( \sigma \)-algebra of subsets of \( S \) that contains \( \mathscr{A} \) must also contain \( \mathscr{B} \), and every \( \sigma \)-algebra of subsets of \( S \) that contains \( \mathscr{B} \) must also contain \( \mathscr{A} \). It follows that \( \sigma(\mathscr{A}) = \sigma(\mathscr{B}) \). The characterization in terms of unions now follows from Theorem 10.

Recall that there are \( 2^n \) bit strings of length \( n \). The sets in \( \mathscr{A} \) are said to be in general position if the sets in \( \mathscr{B} \) are distinct (and hence there are \( 2^n \) of them) and are nonempty. In this case, there are \( 2^{2^n} \) sets in \( \sigma(\mathscr{A}) \).
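
The sets \( B_x \) are easy to compute for a small finite example. The sketch below is illustrative only; the function name `atoms` and the sets `A1`, `A2` are assumptions chosen so that the two sets are in general position.

```python
from itertools import product

def atoms(universe, sets):
    """Map each bit string x to B_x, the intersection over i of
    A_i (when x_i = 1) or the complement of A_i (when x_i = 0)."""
    universe = frozenset(universe)
    sets = [frozenset(a) for a in sets]
    result = {}
    for x in product((0, 1), repeat=len(sets)):
        b = universe
        for bit, a in zip(x, sets):
            b &= a if bit else universe - a
        result[x] = b
    return result

S = set(range(8))
A1, A2 = {0, 1, 2, 3}, {2, 3, 4, 5}
B = atoms(S, [A1, A2])
for x, b in sorted(B.items()):
    print(x, sorted(b))

# Number of distinct unions of the nonempty atoms.
n_sets = 2 ** sum(1 for b in B.values() if b)
print(n_sets)  # 16
```

Here all four atoms are nonempty, so the count is \( 2^{2^2} = 16 \).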

Sketch a Venn diagram with sets \( A_1, \; A_2, \; A_3 \) in general position. Identify the set \( B_x \) for each \( x \in \{0, 1\}^3 \).

If a \( \sigma \)-algebra is generated by a collection of basic sets, then each set in the \( \sigma \)-algebra is generated by a countable number of the basic sets.

Suppose that \( S \) is a set and \( \mathscr{B} \) a nonempty collection of subsets of \( S \). Then

\[ \sigma(\mathscr{B}) = \{A \subseteq S: A \in \sigma(\mathscr{C}) \text{ for some countable } \mathscr{C} \subseteq \mathscr{B}\} \]
Proof:

Let \( \mathscr{S} \) denote the collection on the right. We first show that \( \mathscr{S} \) is a \( \sigma \)-algebra. First, pick \( B \in \mathscr{B} \), which we can do since \( \mathscr{B} \) is nonempty. Then \( S \in \sigma\{B\} \) so \( S \in \mathscr{S} \). Let \( A \in \mathscr{S} \) so that \( A \in \sigma(\mathscr{C}) \) for some countable \( \mathscr{C} \subseteq \mathscr{B} \). Then \( A^c \in \sigma(\mathscr{C}) \) so \( A^c \in \mathscr{S} \). Finally, suppose that \( A_i \in \mathscr{S} \) for \( i \) in a countable index set \( I \). Then for each \( i \in I \), there exists a countable \( \mathscr{C}_i \subseteq \mathscr{B} \) such that \( A_i \in \sigma(\mathscr{C}_i) \). But then \( \bigcup_{i \in I} \mathscr{C}_i \) is also countable and \( \bigcup_{i \in I} A_i \in \sigma\left(\bigcup_{i \in I} \mathscr{C}_i \right) \). Hence \( \bigcup_{i \in I} A_i \in \mathscr{S} \).

Next if \( B \in \mathscr{B} \) then \( B \in \sigma\{B\} \) so \( B \in \mathscr{S} \). Hence \( \sigma(\mathscr{B}) \subseteq \mathscr{S} \). Conversely, if \( A \in \sigma(\mathscr{C}) \) for some countable \( \mathscr{C} \subseteq \mathscr{B} \) then trivially \( A \in \sigma(\mathscr{B}) \).

Suppose that \(S\) is a set with \(\sigma\)-algebra \(\mathscr{S}\), and that \(R \subseteq S\). Then \(\mathscr{R} = \{A \cap R: A \in \mathscr{S}\}\) is a \(\sigma\)-algebra of subsets of \(R\). When \(R \in \mathscr{S}\) (which is usually the case), note that \(\mathscr{R} = \{B \in \mathscr{S}: B \subseteq R\}\). In any event, \(\mathscr{R}\) is the \(\sigma\)-algebra on \(R\) induced by \(\mathscr{S}\).

Proof:

First, \( S \in \mathscr{S} \) and \( S \cap R = R \) so \( R \in \mathscr{R} \). Next suppose that \( B \in \mathscr{R} \). Then there exists \( A \in \mathscr{S} \) such that \( B = A \cap R \). But then \( A^c \in \mathscr{S} \) and \( R \setminus B = R \cap B^c = R \cap A^c \), so \( R \setminus B \in \mathscr{R} \). Finally, suppose that \( B_i \in \mathscr{R} \) for \( i \) in a countable index set \( I \). For each \( i \in I \) there exists \( A_i \in \mathscr{S} \) such that \( B_i = A_i \cap R \). But then \( \bigcup_{i \in I} A_i \in \mathscr{S} \) and \( \bigcup_{i \in I} B_i = \left(\bigcup_{i \in I} A_i \right) \cap R \), so \( \bigcup_{i \in I} B_i \in \mathscr{R} \).
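
The induced \( \sigma \)-algebra is simple to compute in a finite setting. The following sketch is illustrative; the function name `trace` and the sample sets are assumptions. It contrasts a subset \( R \in \mathscr{S} \) with one that is not in \( \mathscr{S} \).

```python
def trace(sigma_algebra, R):
    """Induced sigma-algebra {A & R : A in sigma_algebra} on the subset R."""
    R = frozenset(R)
    return {frozenset(a) & R for a in sigma_algebra}

S = frozenset({1, 2, 3, 4})
sigma = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), S}

print(sorted(sorted(b) for b in trace(sigma, {1, 2})))   # R in sigma
# [[], [1, 2]]
print(sorted(sorted(b) for b in trace(sigma, {2, 3})))   # R not in sigma
# [[], [2], [2, 3], [3]]
```

In the first case the induced collection consists of the sets in \( \mathscr{S} \) that are contained in \( R \). In the second case it contains \( \{2\} \), which is not a member of \( \mathscr{S} \).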

Compare the following example with Example 5:

Let \( S \) be a nonempty set. The collection of countable and co-countable subsets of \( S \) is a \( \sigma \)-algebra:

\[ \mathscr{C} = \{A \subseteq S: A \text{ is countable or } A^c \text{ is countable}\} \]

Moreover \( \mathscr{C} = \sigma\{\{x\}: x \in S\} \), the \( \sigma \)-algebra generated by the singleton sets.

Proof:

First, \( S \in \mathscr{C} \) since \( S^c = \emptyset \) is countable. If \( A \in \mathscr{C} \) then \( A^c \in \mathscr{C} \) by the symmetry of the definition. Suppose that \( A_i \in \mathscr{C} \) for each \( i \) in a countable index set \( I \). If \( A_i \) is countable for each \( i \in I \) then \( \bigcup_{i \in I} A_i \) is countable. If \( A_j^c \) is countable for some \( j \in I \) then \( \left(\bigcup_{i \in I} A_i \right)^c = \bigcap_{i \in I} A_i^c \subseteq A_j^c \) is countable. In either case, \( \bigcup_{i \in I} A_i \in \mathscr{C} \).

Let \( \mathscr{D} = \sigma\{\{x\}: x \in S\} \). Clearly \( \{x\} \in \mathscr{C} \) for \( x \in S \). Hence \( \mathscr{D} \subseteq \mathscr{C} \). Conversely, suppose that \( A \in \mathscr{C} \). If \( A \) is countable, then \( A = \bigcup_{x \in A} \{x\} \in \mathscr{D} \). If \( A^c \) is countable, then by an identical argument, \( A^c \in \mathscr{D} \) and hence \( A \in \mathscr{D} \).

Of course, if \( S \) itself is countable then \( \mathscr{C} = \mathscr{P}(S) \). On the other hand, if \( S \) is uncountable, then there exists \( A \subseteq S \) such that \( A \) and \( A^c \) are uncountable. Thus, \( A \notin \mathscr{C} \), but \( A = \bigcup_{x \in A} \{x\} \), and of course \( \{x\} \in \mathscr{C} \). Thus, we have an example of a \( \sigma \)-algebra that is not closed under general unions.

Measurable Functions

Recall that a set usually comes with a \(\sigma\)-algebra of admissible subsets. Thus, suppose that \(S\) and \(T\) are sets with \(\sigma\)-algebras \(\mathscr{S}\) and \(\mathscr{T}\), respectively. If \(f: S \to T\), then a natural requirement is that the inverse image of any admissible subset of \(T\) be an admissible subset of \(S\). Formally \(f\) is said to be measurable if \(f^{-1}(A) \in \mathscr{S}\) for all \(A \in \mathscr{T}\). Measurability is preserved under composition, the most important method for combining functions.

Suppose that \(R\), \(S\), and \(T\) are sets with \(\sigma\)-algebras \(\mathscr{R}\), \(\mathscr{S}\), and \(\mathscr{T}\), respectively. If \(f: R \to S\) is measurable and \(g: S \to T\) is measurable, then \(g \circ f: R \to T\) is measurable.

Proof:

If \( A \in \mathscr{T} \) then \( g^{-1}(A) \in \mathscr{S} \) since \( g \) is measurable, and hence \( (g \circ f)^{-1}(A) = f^{-1}[g^{-1}(A)] \in \mathscr{R} \) since \( f \) is measurable.
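
For finite sets, measurability can be tested by computing every inverse image. The following is a minimal sketch, not part of the original text; the names `preimage` and `is_measurable`, and the particular sets and functions, are illustrative assumptions.

```python
def preimage(f, domain, A):
    """Inverse image of A under f, restricted to the finite set `domain`."""
    return frozenset(x for x in domain if f(x) in A)

def is_measurable(f, domain, sigma_dom, sigma_cod):
    """True if the inverse image of every set in sigma_cod lies in sigma_dom."""
    return all(preimage(f, domain, A) in sigma_dom for A in sigma_cod)

R = frozenset({0, 1, 2, 3})
S = frozenset({"even", "odd"})
T = frozenset({0, 1})

sig_R = {frozenset(), frozenset({0, 2}), frozenset({1, 3}), R}
sig_S = {frozenset(), frozenset({"even"}), frozenset({"odd"}), S}
sig_T = {frozenset(), frozenset({0}), frozenset({1}), T}

f = lambda n: "even" if n % 2 == 0 else "odd"   # R -> S
g = lambda s: 0 if s == "even" else 1           # S -> T
h = lambda n: 1 if n == 3 else 0                # R -> T, not measurable

print(is_measurable(f, R, sig_R, sig_S))                  # True
print(is_measurable(g, S, sig_S, sig_T))                  # True
print(is_measurable(lambda n: g(f(n)), R, sig_R, sig_T))  # True
print(is_measurable(h, R, sig_R, sig_T))                  # False
```

The function `h` fails because the inverse image of \( \{1\} \) is \( \{3\} \), which is not in the chosen \( \sigma \)-algebra on \( R \).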

If \( T \) is given the smallest possible \( \sigma \)-algebra or if \( S \) is given the largest one, then any function from \( S \) into \( T \) is measurable.

If \( \mathscr{T} = \{\emptyset, T\} \) or if \( \mathscr{S} = \mathscr{P}(S) \) then every \( f: S \to T \) is measurable.

Proof:

Let \( f: S \to T \). Suppose first that \( \mathscr{T} = \{\emptyset, T\} \) and that \( \mathscr{S} \) is an arbitrary \( \sigma \)-algebra on \( S \). Then \( f^{-1}(T) = S \in \mathscr{S} \) and \( f^{-1}(\emptyset) = \emptyset \in \mathscr{S} \) so \( f \) is measurable. Next suppose that \( \mathscr{S} = \mathscr{P}(S) \) and that \( \mathscr{T} \) is an arbitrary \( \sigma \)-algebra on \( T \). Then trivially \( f^{-1}(A) \in \mathscr{S} \) for every \( A \in \mathscr{T} \) so again \( f \) is measurable.

Suppose that \(f: S \to T\), and that \(\mathscr{T}\) is a \(\sigma\)-algebra of subsets of \(T\). The collection \(\sigma(f) = \{f^{-1}(A): A \in \mathscr{T}\}\) is a \(\sigma\)-algebra of subsets of \(S\), called the \(\sigma\)-algebra generated by \(f\).

Proof:

The key to the proof is that the inverse image preserves all set operations. First, \( S \in \sigma(f) \) since \( T \in \mathscr{T} \) and \( f^{-1}(T) = S \). If \( B \in \sigma(f) \) then \( B = f^{-1}(A) \) for some \( A \in \mathscr{T} \). But then \( A^c \in \mathscr{T} \) and hence \( B^c = f^{-1}(A^c) \in \sigma(f) \). Finally, suppose that \( B_i \in \sigma(f) \) for \( i \) in a countable index set \( I \). Then for each \( i \in I \) there exists \( A_i \in \mathscr{T} \) such that \( B_i = f^{-1}(A_i) \). But then \( \bigcup_{i \in I} A_i \in \mathscr{T} \) and \( \bigcup_{i \in I} B_i = f^{-1}\left(\bigcup_{i \in I} A_i \right) \). Hence \( \bigcup_{i \in I} B_i \in \sigma(f) \).
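
As an illustration, \( \sigma(f) \) can be listed explicitly when \( S \) and \( T \) are finite. This is a sketch under that assumption; the function name `sigma_generated_by` and the choice \( f(n) = n \bmod 3 \) are hypothetical.

```python
def sigma_generated_by(f, domain, sigma_cod):
    """sigma(f) = { f^{-1}(A) : A in sigma_cod } for a finite domain."""
    return {frozenset(x for x in domain if f(x) in A) for A in sigma_cod}

S = range(6)
T = frozenset({0, 1, 2})
# Power set of T, written out by hand.
power_T = {frozenset(c) for c in
           [(), (0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]}

f = lambda n: n % 3
sigma_f = sigma_generated_by(f, S, power_T)
print(len(sigma_f))  # 8
print(sorted(sorted(b) for b in sigma_f))
```

The sets in `sigma_f` are exactly the unions of the level sets \( \{0, 3\} \), \( \{1, 4\} \), and \( \{2, 5\} \) of \( f \). A set such as \( \{0, 1\} \), which splits a level set, is not included.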

The \(\sigma\)-algebra generated by \(f\) is the smallest \(\sigma\)-algebra on \(S\) that makes \(f\) measurable (relative to the given \(\sigma\)-algebra on \(T\)). More generally, suppose that \(T_i\) is a set with \(\sigma\)-algebra \(\mathscr{T}_i\) for each \(i\) in a nonempty index set \(I\), and that \(f_i: S \to T_i\) for each \(i \in I\). The \(\sigma\)-algebra generated by this collection of functions is

\[ \sigma\{f_i: i \in I\} = \sigma\{f_i^{-1}(A): i \in I, \, A \in \mathscr{T}_i\} \]

Again, this is the smallest \(\sigma\)-algebra on \(S\) that makes \(f_i\) measurable for each \(i \in I\).

When there are several \( \sigma \)-algebras for the same set, then we use the phrase with respect to so that we can be precise. If a function is measurable with respect to a given \( \sigma \)-algebra on its domain, then it's measurable with respect to any larger \( \sigma \)-algebra on this space. If the function is measurable with respect to a \( \sigma \)-algebra on the range space then it's measurable with respect to any smaller \( \sigma \)-algebra on this space.

Suppose that \( S \) has \( \sigma \)-algebras \( \mathscr{R} \) and \( \mathscr{S} \) with \( \mathscr{R} \subseteq \mathscr{S} \), and that \( T \) has \( \sigma \)-algebras \( \mathscr{U} \) and \( \mathscr{T} \) with \( \mathscr{U} \subseteq \mathscr{T} \). If \( f: S \to T \) is measurable with respect to \( \mathscr{R} \) and \( \mathscr{T} \), then \( f \) is measurable with respect to \( \mathscr{S} \) and \( \mathscr{U} \).

Proof:

If \( A \in \mathscr{U} \) then \( A \in \mathscr{T} \). Hence \( f^{-1}(A) \in \mathscr{R} \) so \( f^{-1}(A) \in \mathscr{S} \).

Product Sets

Product sets arise naturally in the form of the higher-dimensional Euclidean spaces \( \R^n \) for \( n \in \{2, 3, \ldots\} \). In addition, product spaces are particularly important in probability, where they are used to describe the spaces associated with sequences of random variables. We start with the product of two sets; the generalization to \( n \) sets and even to a countably infinite sequence of sets is straightforward. Thus, suppose that \( S \) and \( T \) are sets with \( \sigma \)-algebras \( \mathscr{S} \) and \( \mathscr{T} \), respectively. For the product set \( S \times T \) we usually use the \(\sigma\)-algebra generated by the collection of all products of measurable sets:

\[\mathscr{S} \otimes \mathscr{T} = \sigma\{A \times B: A \in \mathscr{S}, \; B \in \mathscr{T}\} \]

This is known as the product \( \sigma \)-algebra. There is a sort of converse to the construction that states that if a set is in the product \( \sigma \)-algebra, then its cross sections are in the original \( \sigma \)-algebras.

If \( C \in \mathscr{S} \otimes \mathscr{T} \) then

  1. \( C_x = \{y \in T: (x, y) \in C \} \in \mathscr{T} \) for each \( x \in S \)
  2. \( C^y = \{x \in S: (x, y) \in C \} \in \mathscr{S} \) for each \( y \in T \)
Proof:

The two results are symmetric, so we will prove (a). Fix \( x \in S \) and let \( \mathscr{U} = \{C \subseteq S \times T: C_x \in \mathscr{T}\} \). First note that if \( A \in \mathscr{S} \) and \( B \in \mathscr{T} \) then \( (A \times B)_x \) is \( B \) if \( x \in A \) and is \( \emptyset \) if \( x \notin A \). Thus \( A \times B \in \mathscr{U} \), and in particular \( S \times T \in \mathscr{U} \). Cross sections are preserved by all of the set operations, so \( \mathscr{U} \) is a \( \sigma \)-algebra of subsets of \( S \times T \). It follows that \( \mathscr{S} \otimes \mathscr{T} \subseteq \mathscr{U}\).
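
The two facts used in the proof, the cross section of a product set and the compatibility of cross sections with set operations, can be checked on a small finite example. The sketch below is illustrative; the names `section_x` and `section_y` and the sample sets are assumptions.

```python
def section_x(C, x):
    """C_x = {y : (x, y) in C}."""
    return frozenset(y for (u, y) in C if u == x)

def section_y(C, y):
    """C^y = {x : (x, y) in C}."""
    return frozenset(x for (x, v) in C if v == y)

A, B = {1, 2}, {"a", "b"}
rectangle = {(x, y) for x in A for y in B}

print(sorted(section_x(rectangle, 1)))  # ['a', 'b']  (x in A gives B)
print(sorted(section_x(rectangle, 3)))  # []          (x not in A gives the empty set)

# Cross sections commute with union and complement.
S, T = {1, 2, 3}, {"a", "b", "c"}
C1 = {(1, "a"), (2, "b")}
C2 = {(1, "c"), (3, "a")}
full = {(x, y) for x in S for y in T}
print(section_x(C1 | C2, 1) == section_x(C1, 1) | section_x(C2, 1))  # True
print(section_x(full - C1, 1) == frozenset(T) - section_x(C1, 1))    # True
```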

As a simple corollary of the previous theorem, note that if \( A \subseteq S \), \( B \subseteq T \), both nonempty, and \( A \times B \in \mathscr{S} \otimes \mathscr{T} \) then \( A \in \mathscr{S} \) and \( B \in \mathscr{T} \). That is, the only measurable product sets are products of measurable sets. On the other hand, the following exercise gives an example of a nonmeasurable subset of a product space that has measurable cross sections.

Suppose that \( S \) is an uncountable set with the \( \sigma \)-algebra \( \mathscr{C} \) of countable and co-countable subsets, as in Example 17. Consider \( S \times S \) with the product \( \sigma \)-algebra \( \mathscr{C} \otimes \mathscr{C} \). Then the diagonal \( D = \{(x, x): x \in S\} \notin \mathscr{C} \otimes \mathscr{C} \), but \( D_x = \{x\} \in \mathscr{C} \) and \( D^y = \{y\} \in \mathscr{C} \) for each \( x, \; y \in S \).

The following result is a functional version of Theorem 23. In addition to our product space \( (S \times T, \mathscr{S} \otimes \mathscr{T}) \) as above, suppose that we have another measurable space \( (U, \mathscr{U}) \).

Suppose that \( f: S \times T \to U \) is measurable. Then the cross sectional functions \( f_x \) and \( f^y \) are measurable for each \( x \in S \) and \( y \in T \):

  1. \( f_x: T \to U \) defined by \( f_x(y) = f(x, y) \) for \( y \in T \)
  2. \( f^y: S \to U \) defined by \( f^y(x) = f(x, y) \) for \( x \in S \)
Proof:

Again, the results are symmetric, so we will prove part (a). Let \( x \in S \) and \( C \in \mathscr{U} \). Then

\[ f_x^{-1}(C) = \{y \in T: f_x(y) \in C\} = \{y \in T: f(x, y) \in C\} = [f^{-1}(C)]_x \]

But \( f^{-1}(C) \in \mathscr{S} \otimes \mathscr{T} \) since \( f \) is measurable, and hence \( [f^{-1}(C)]_x \in \mathscr{T} \) by Theorem 23.

We now look at the complement to the previous result, when the product space is the co-domain of the function, rather than the domain. Thus, in addition to our product space \( (S \times T, \mathscr{S} \otimes \mathscr{T}) \), suppose that we have another measurable space \( (R, \mathscr{R}) \).

Suppose that \( f : R \to S \times T \), so that \( f = (g, h) \) where \( g: R \to S \) and \( h: R \to T \) are the coordinate functions. Then \( f \) is measurable if and only if \( g \) and \( h \) are measurable.

Proof:

Suppose that \( f \) is measurable. If \( A \in \mathscr{S} \) then

\[ g^{-1}(A) = \{x \in R: g(x) \in A \} = \{x \in R: g(x) \in A, \; h(x) \in T\} = f^{-1}(A \times T) \in \mathscr{R} \]

and hence \( g \) is measurable. By a symmetric argument, \( h \) is measurable. Suppose now that \( g \) and \( h \) are measurable. Let \( \mathscr{U} = \{C \subseteq S \times T: f^{-1}(C) \in \mathscr{R}\} \). For \( A \in \mathscr{S} \) and \( B \in \mathscr{T} \),

\[ f^{-1}(A \times B) = \{x \in R: f(x) \in A \times B\} = \{x \in R: g(x) \in A, \; h(x) \in B\} = g^{-1}(A) \cap h^{-1}(B) \in \mathscr{R} \]

since \( g \) and \( h \) are measurable. Thus \( A \times B \in \mathscr{U} \), and in particular, \( S \times T \in \mathscr{U} \). If \( C \in \mathscr{U} \) then \( f^{-1}(C^c) = [f^{-1}(C)]^c \in \mathscr{R} \), so \( C^c \in \mathscr{U} \). If \( C_i \in \mathscr{U} \) for \( i \) in a countable index set \( I \) then \( f^{-1} \left(\bigcup_{i \in I} C_i \right) = \bigcup_{i \in I} f^{-1}(C_i) \in \mathscr{R} \), so \( \bigcup_{i \in I} C_i \in \mathscr{U} \). Hence \(\mathscr{U} \) is a \( \sigma \)-algebra of subsets of \( S \times T \) that contains the products of measurable sets, and so \( \mathscr{S} \otimes \mathscr{T} \subseteq \mathscr{U} \). That is, \( f \) is measurable.
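
The identity \( f^{-1}(A \times B) = g^{-1}(A) \cap h^{-1}(B) \) at the heart of the proof can be verified numerically on a small example. This is a sketch only; the coordinate functions `g` and `h` and the sets `A` and `B` are hypothetical choices.

```python
R = range(12)
g = lambda r: r % 2        # first coordinate, values in S = {0, 1}
h = lambda r: r % 3        # second coordinate, values in T = {0, 1, 2}
f = lambda r: (g(r), h(r))

def preimage(func, domain, target):
    """Inverse image of `target` under `func`, restricted to `domain`."""
    return frozenset(r for r in domain if func(r) in target)

A, B = {0}, {1, 2}
rectangle = {(a, b) for a in A for b in B}

lhs = preimage(f, R, rectangle)
rhs = preimage(g, R, A) & preimage(h, R, B)
print(sorted(lhs), lhs == rhs)  # [2, 4, 8, 10] True
```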

The results for products of two spaces generalize in a completely straightforward way to a product of \( n \) spaces. Thus, suppose that \( S_i \) is a set with \( \sigma \)-algebra \( \mathscr{S}_i \) for each \( i \in \{1, 2, \ldots, n\} \). For the product set \( S_1 \times S_2 \times \cdots \times S_n \), we usually use the product \( \sigma \)-algebra generated by products of measurable sets:

\[ \mathscr{S}_1 \otimes \mathscr{S}_2 \otimes \cdots \otimes \mathscr{S}_n = \sigma\{ A_1 \times A_2 \times \cdots \times A_n: A_i \in \mathscr{S}_i \text{ for all } i \in \{1, 2, \ldots, n\}\} \]

Results analogous to Theorems 22, 24, and 25 hold. We can also extend these ideas to an infinite product. Thus, suppose that \( S_i \) is a set with \( \sigma \)-algebra \(\mathscr{S}_i\) for each \(i \in \N_+\). For the product set \( S = S_1 \times S_2 \times \cdots \) we usually use the product \(\sigma\)-algebra generated by the collection of all cylinder sets:

\[ \mathscr{S} = \sigma\{A_1 \times A_2 \times \cdots \times A_n \times S_{n+1} \times S_{n+2} \times \cdots: n \in \N_+ \text{ and } A_i \in \mathscr{S}_i \text{ for all } i \in \{1, 2, \ldots, n\}\} \]

Once again, results analogous to Theorems 22, 24, and 25 hold.

Special Cases

Most of the sets encountered in applied probability are either countable, or subsets of \(\R^n\) for some \(n\), or more generally, subsets of a product of a countable number of sets of these types. In this subsection, we will explore some of these special cases.

If \(S\) is countable, we usually use the power set \(\mathscr{P}(S)\) as the basic \(\sigma\)-algebra. Thus, all sets are admissible and every function on \( S \) is measurable. The next important case is \( \R \), the set of real numbers.

Each of the following collections generates the same \( \sigma \)-algebra of subsets of \( \R \).

  1. \( \mathscr{B}_1 = \{I \subseteq \R: I \text{ is an interval } \} \)
  2. \( \mathscr{B}_2 = \{(a, b]: a \in \R, \; b \in \R, \; a \lt b \}\)
  3. \( \mathscr{B}_3 = \{(-\infty, b]: b \in \R \} \)
Proof:

The proof involves showing that each set in any one of the collections is in the \( \sigma \)-algebra of any other collection. For example, if \( a, \; b \in \R \) with \( a \lt b \) then \( (a, b] = (-\infty, b] \setminus (-\infty, a] \). On the other hand, \( [a, b] = \bigcap_{n=1}^\infty (a - \frac{1}{n}, b] \), and \( (a, b) = \bigcup_{n=1}^\infty (a, b - \frac{1}{n}] \). Fill in the remaining details.
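
As a sketch of how some of the remaining details might go (one possible route, not necessarily the intended one), the following identities express sets from \( \mathscr{B}_3 \), singletons, and half-open intervals closed on the left in terms of sets in \( \mathscr{B}_2 \), for \( a, \; b \in \R \) with \( a \lt b \):

\[ (-\infty, b] = \bigcup_{n=1}^\infty (b - n, b], \qquad \{a\} = \bigcap_{n=1}^\infty \left(a - \tfrac{1}{n}, a\right], \qquad [a, b) = \{a\} \cup (a, b) \]

Each right-hand side uses only countably many set operations, so each set on the left is in \( \sigma(\mathscr{B}_2) \). Unbounded intervals of the form \( (a, \infty) \) are handled by taking complements of sets in \( \mathscr{B}_3 \).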

The common \( \sigma \)-algebra generated by these collections is called the Borel \(\sigma\)-algebra, named after Émile Borel. All of the real-valued elementary functions are measurable. The elementary functions include algebraic functions (which in turn include the polynomial and rational functions), the usual transcendental functions (exponential, logarithm, trigonometric), and the usual functions constructed from these.

For \(\R^n\), we usually use the \( n \)-fold product \( \sigma \)-algebra corresponding to the Borel \( \sigma \)-algebra on \( \R \). This is the Borel \(\sigma\)-algebra for \(\R^n\). Equivalently, the Borel \( \sigma \)-algebra on \( \R^n \) is generated by \( n \)-fold products of sets in any of the collections in Theorem 26.