\(\newcommand{\var}{\text{var}}\) \(\newcommand{\sd}{\text{sd}}\) \(\renewcommand{\P}{\mathbb{P}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\N}{\mathbb{N}}\)

4. Generating Functions

As usual, our starting point is a random experiment with probability measure \(\P\) on an underlying sample space. A generating function of a random variable is an expected value of a certain transformation of the variable. Most generating functions share four important properties:

  1. Under mild conditions, the generating function completely determines the distribution of the random variable.
  2. The generating function of a sum of independent variables is the product of the individual generating functions.
  3. The moments of the random variable can be obtained from the derivatives of the generating function.
  4. Ordinary (pointwise) convergence of a sequence of generating functions corresponds to convergence in distribution of the corresponding random variables.

Property 1 is the most important. Often a random variable is shown to have a certain distribution by showing that its generating function has a certain form. The process of recovering the distribution from the generating function is known as inversion. Property 2 is frequently used to determine the distribution of a sum of independent variables. By contrast, recall that the probability density function of a sum of independent variables is the convolution of the individual density functions, a much more complicated operation. Property 3 is useful because computing moments from the generating function is often easier than computing them directly from the definition. The last property is known as the continuity theorem. Often it is easier to show the convergence of the generating functions than to prove convergence of the distributions directly.

Basic Theory

The Probability Generating Function

Suppose that \(N\) is a random variable taking values in \(\N\). The probability generating function \(G\) of \(N\) is defined as follows, for all values \(t \in \R\) for which the expected value exists:

\[ G(t) = \E(t^N) \]

Let \(f\) denote the probability density function of \(N\), so that \(f(n) = \P(N = n)\) for \(n \in \N\).

The probability generating function can be obtained from the probability density function as follows:

\[ G(t) = \sum_{n=0}^\infty f(n) t^n \]
Proof:

This follows from the discrete change of variables theorem for expected value.

Thus, \(G(t)\) is a power series in \(t\), with the values of the probability density function as the coefficients. In the language of combinatorics, \(G\) is the ordinary generating function of \(f\). Recall from calculus that there exists \(r \in [0, \infty]\) such that the series converges absolutely for \(|t| \lt r\) and diverges for \(|t| \gt r\). The number \(r\) is the radius of convergence of the series.
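As a quick numerical illustration of the series formula, here is a minimal sketch that evaluates a truncated version of \(\sum_n f(n) t^n\). The helper names `binomial_pmf` and `pgf_from_pmf`, the truncation point, and the binomial example with \(n = 10\), \(p = 0.3\) are our own illustrative choices; the closed form \((1 - p + p t)^n\) used for comparison is the binomial PGF derived in the examples of this section.

```python
import math

def binomial_pmf(n, p):
    """Return the function k -> P(N = k) for the binomial distribution (0 outside {0, ..., n})."""
    def f(k):
        if 0 <= k <= n:
            return math.comb(n, k) * p**k * (1 - p) ** (n - k)
        return 0.0
    return f

def pgf_from_pmf(pmf, t, n_max=200):
    """Evaluate G(t) = sum_{k=0}^{n_max} f(k) t^k, a truncation of the PGF series.

    n_max is an arbitrary (assumed) cutoff: the result is only trustworthy for
    |t| inside the radius of convergence when the neglected tail is negligible.
    Here the support is finite, so the sum is exact up to rounding.
    """
    return sum(pmf(k) * t**k for k in range(n_max + 1))

f = binomial_pmf(10, 0.3)
print(pgf_from_pmf(f, 1.0))                               # G(1) = 1, up to rounding
print(pgf_from_pmf(f, 0.5), (1 - 0.3 + 0.3 * 0.5) ** 10)  # agrees with (1 - p + p t)^n
```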

\(G(1) = 1\) and hence \(r \ge 1\).

Proof:

\( G(1) = \E(1^N) = \sum_{n=0}^\infty f(n) = 1 \)

Recall from calculus that a power series can be differentiated term by term, just like a polynomial. Each derivative series has the same radius of convergence as the original series. We denote the derivative of order \(n\) by \(G^{(n)}\). Recall also that if \(n \in \N\) and \(k \in \N\) with \(k \le n\), then the number of permutations of size \(k\) chosen from a population of \(n\) objects is

\[ n^{(k)} = n (n - 1) \cdots (n - k + 1) \]

The following theorem is the inversion result for probability generating functions.

The probability generating function \(G\) completely determines the distribution of \(N\).

\[ f(k) = \frac{G^{(k)}(0)}{k!}, \quad k \in \N \]
Proof:

This is a standard result from the theory of power series: \( G^{(k)}(t) = \sum_{n=k}^\infty n^{(k)} f(n) t^{n-k} \) for \( t \in (-r, r) \). Hence \( G^{(k)}(0) = k^{(k)} f(k) = k! f(k) \)
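Differentiating \(G\) repeatedly at 0 is awkward to carry out numerically. A minimal sketch of an equivalent route, which we substitute here for the derivative formula, is the Cauchy integral formula for power-series coefficients discretized at the \(m\)th roots of unity: \(f(k) \approx \frac{1}{m} \sum_{j=0}^{m-1} G(\omega^j) \omega^{-j k}\) with \(\omega = e^{2 \pi i / m}\). This is possible because \(r \ge 1\), so \(G\) is defined on the unit circle. The name `pmf_from_pgf` and the default \(m = 64\) are assumptions for illustration; the sum is exact when \(N\) takes values in \(\{0, 1, \ldots, m - 1\}\) and otherwise returns \(f(k) + f(k + m) + f(k + 2 m) + \cdots\).

```python
import cmath
import math

def pmf_from_pgf(G, k, m=64):
    """Recover f(k) from values of the PGF G on the unit circle.

    Computes (1/m) * sum_j G(w^j) w^(-j k) with w = exp(2 pi i / m). The result
    equals f(k) + f(k + m) + f(k + 2m) + ..., which is f(k) exactly when the
    support lies in {0, ..., m - 1} and is close to f(k) when the tail is light.
    """
    w = cmath.exp(2j * math.pi / m)
    total = sum(G(w**j) * w ** (-j * k) for j in range(m))
    return (total / m).real

# Poisson PGF G(t) = exp(a (t - 1)) with the illustrative choice a = 2
G = lambda t: cmath.exp(2 * (t - 1))
print(pmf_from_pgf(G, 3), math.exp(-2) * 2**3 / math.factorial(3))
```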

Our next result is not particularly important, but has a certain curiosity.

\(\P(N \text{ is even}) = \frac{1}{2}[1 + G(-1)]\).

Proof:

Note that

\[ G(1) + G(-1) = \sum_{n=0}^\infty f(n) + \sum_{n=0}^\infty (-1)^n f(n) = 2 \sum_{k=0}^\infty f(2 k) = 2 \P(N \text{ is even }) \]
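The identity is easy to check numerically. This sketch assumes a Poisson distribution with the illustrative parameter \(a = 2.5\), whose PGF is \(G(t) = e^{a(t - 1)}\); it sums the even-indexed probabilities directly (truncating at \(n \lt 200\), an arbitrary cutoff) and compares the result with \(\frac{1}{2}[1 + G(-1)]\).

```python
import math

def poisson_pmf(a, n):
    """P(N = n) for the Poisson distribution, computed on the log scale to avoid overflow."""
    return math.exp(-a + n * math.log(a) - math.lgamma(n + 1))

a = 2.5                                    # illustrative parameter

def G(t):
    """Poisson PGF."""
    return math.exp(a * (t - 1))

# Direct sum over even n, truncated at n < 200 (an assumed cutoff)
p_even_direct = sum(poisson_pmf(a, n) for n in range(0, 200, 2))
p_even_pgf = (1 + G(-1)) / 2
print(p_even_direct, p_even_pgf)
```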

Recall that the factorial moment of \( N \) of order \( k \in \N \) is \( \E[N^{(k)}] \). The factorial moments can be computed from the derivatives of the probability generating function. The factorial moments, in turn, determine the ordinary moments about 0.

Suppose that the radius of convergence satisfies \(r \gt 1\). Then \(G^{(k)}(1) = \E[N^{(k)}]\) for \(k \in \N\). In particular, \(N\) has finite moments of all orders.

Proof:

As before, \( G^{(k)}(t) = \sum_{n=k}^\infty n^{(k)} f(n) t^{n-k} \) for \( t \in (-r, r) \). Hence if \( r \gt 1 \) then \( G^{(k)}(1) = \sum_{n=k}^\infty n^{(k)} f(n) = \E[N^{(k)}] \)

The mean and variance can be computed from the probability generating function as follows:

  1. \(\E(N) = G^\prime(1)\)
  2. \(\var(N) = G^{\prime \prime}(1) + G^\prime(1)[1 - G^\prime(1)]\)
Proof:

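Part (a) is immediate since \( N^{(1)} = N \). For part (b), note that \( \E(N^2) = \E[N (N - 1)] + \E(N) = \E[N^{(2)}] + \E(N) = G^{\prime \prime}(1) + G^\prime(1) \), so \( \var(N) = \E(N^2) - [\E(N)]^2 = G^{\prime \prime}(1) + G^\prime(1) - [G^\prime(1)]^2 \).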
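For a distribution with finite support the PGF is a polynomial, so \(r = \infty\) and \(G^{(k)}(1) = \E[N^{(k)}]\) can be computed by direct summation. The sketch below does this for a binomial distribution with the illustrative values \(n = 12\) and \(p = 0.25\), then applies the mean and variance formulas; the helper names `falling` and `factorial_moment` are ours.

```python
import math

def falling(n, k):
    """Falling power n^{(k)} = n (n - 1) ... (n - k + 1); it is 0 when 0 <= n < k."""
    out = 1
    for j in range(k):
        out *= n - j
    return out

def factorial_moment(pmf, support, k):
    """E[N^{(k)}] = sum over the (finite) support of n^{(k)} f(n)."""
    return sum(falling(n, k) * pmf(n) for n in support)

n, p = 12, 0.25                           # illustrative binomial parameters

def pmf(j):
    return math.comb(n, j) * p**j * (1 - p) ** (n - j)

support = range(n + 1)

g1 = factorial_moment(pmf, support, 1)    # G'(1)
g2 = factorial_moment(pmf, support, 2)    # G''(1)
mean = g1
var = g2 + g1 * (1 - g1)
print(mean, var)                          # close to n p = 3 and n p (1 - p) = 2.25
```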

Suppose that \(N_1\) and \(N_2\) are independent random variables taking values in \(\N\), with probability generating functions \(G_1\) and \(G_2\) respectively. Then the probability generating function of \(N_1 + N_2\) is \(G(t) = G_1(t) G_2(t)\).

Proof:

Recall that the expected product of independent variables is the product of the expected values. Hence

\[ G(t) = \E(t^{N_1 + N_2}) = \E(t^{N_1} t^{N_2}) = \E(t^{N_1}) \E(t^{N_2}) = G_1(t) G_2(t) \]
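The product rule mirrors the fact that the probability density function of \(N_1 + N_2\) is the convolution of the individual ones: multiplying two power series convolves their coefficients. A minimal sketch checking this for two independent binomial variables with different success parameters (the parameter values, the point \(t = 0.7\), and the helper names are illustrative assumptions):

```python
import math

def convolve(f1, f2):
    """PMF of N1 + N2 for independent N1, N2 whose PMFs are given as finite lists."""
    out = [0.0] * (len(f1) + len(f2) - 1)
    for i, a in enumerate(f1):
        for j, b in enumerate(f2):
            out[i + j] += a * b
    return out

def pgf(f, t):
    """G(t) = sum_n f[n] t^n for a PMF given as a list."""
    return sum(c * t**n for n, c in enumerate(f))

def binom_pmf(n, p):
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

f1, f2 = binom_pmf(3, 0.2), binom_pmf(4, 0.6)   # illustrative choices
f_sum = convolve(f1, f2)
t = 0.7
print(pgf(f_sum, t), pgf(f1, t) * pgf(f2, t))   # equal up to rounding
```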

The Moment Generating Function

Suppose that \(X\) is a real-valued random variable. The moment generating function of \(X\) is the function \(M\) defined by

\[ M(t) = \E(e^{tX}), \quad t \in \R \]

Note that since \(e^{t X} \gt 0\), the expected value \(M(t)\) exists, as a positive real number or \(\infty\), for every \(t \in \R\).

Suppose that \(X\) has a continuous distribution on \(\R\) with probability density function \(f\). Then

\[ M(t) = \int_{-\infty}^\infty e^{t x} f(x) dx \]
Proof:

This follows from the change of variables theorem for expected value.

Thus, the moment generating function of \(X\) is closely related to the Laplace transform of the probability density function \(f\). The Laplace transform is named for Pierre Simon Laplace, and is widely used in many areas of applied mathematics. The basic inversion theorem for moment generating functions (similar to the inversion theorem for Laplace transforms) states that if \(M(t) \lt \infty\) for \(t\) in some open interval about 0, then \(M\) completely determines the distribution of \(X\). Thus, if two distributions on \(\R\) have moment generating functions that are equal (and finite) in an open interval about 0, then the distributions are the same.
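When no closed form is at hand, the integral formula for \(M(t)\) can be evaluated numerically. This sketch applies composite Simpson's rule to \(e^{t x} f(x)\) for the exponential density with rate \(r = 2\), whose moment generating function \(r / (r - t)\) is given in the examples later in this section. The cutoff \([0, 40]\), the grid size, and the helper names are assumptions chosen for illustration.

```python
import math

def simpson(g, lo, hi, n=4000):
    """Composite Simpson's rule for the integral of g over [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    total += 4 * sum(g(lo + i * h) for i in range(1, n, 2))
    total += 2 * sum(g(lo + i * h) for i in range(2, n, 2))
    return total * h / 3

def mgf_numeric(density, t, lo, hi):
    """Approximate M(t) = integral of e^{t x} f(x) dx, truncated to [lo, hi]."""
    return simpson(lambda x: math.exp(t * x) * density(x), lo, hi)

r = 2.0

def exp_density(x):
    """Exponential density with rate r on [0, infinity)."""
    return r * math.exp(-r * x)

# [0, 40] is an assumed cutoff; the neglected tail is of order e^{-(r - t) 40}.
print(mgf_numeric(exp_density, 0.5, 0.0, 40.0), r / (r - 0.5))
```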

Suppose that \(X\) has moment generating function \(M\) and that \(M\) is finite in some open interval \( I \) about 0. Then \(X\) has moments of all orders and

\[ M(t) = \sum_{n=0}^\infty \frac{\E(X^n)}{n!} t^n, \quad t \in I \]
Proof:

Under the hypotheses, the expected value operator can be interchanged with the infinite series for the exponential function:

\[ M(t) = \E(e^{t X}) = \E\left(\sum_{n=0}^\infty \frac{X^n}{n!} t^n\right) = \sum_{n=0}^\infty \frac{\E(X^n)}{n!} t^n, \quad t \in I \]

Under the same hypotheses, \(M^{(n)}(0) = \E(X^n)\) for \(n \in \N\).

Proof:

This follows by the same argument as Theorem 3: \( M^{(n)}(0) / n! \) is the coefficient of order \( n \) in the power series, namely \( \E(X^n) / n! \).

Thus, the derivatives of the moment generating function at 0 determine the moments of the variable (hence the name). In the language of combinatorics, the moment generating function is the exponential generating function of the sequence of moments. Thus, a random variable that does not have finite moments of all orders cannot have a moment generating function that is finite in an open interval about 0. Even when a random variable does have finite moments of all orders, the moment generating function may fail to be finite in any open interval about 0. A counterexample is given below.
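To see the exponential-generating-function relationship concretely, this sketch uses the exponential distribution with rate \(r\), whose moments \(\E(T^n) = n! / r^n\) are listed later in this section. Partial sums of \(\sum_n \E(T^n) s^n / n!\) approach \(M(s) = r / (r - s)\) when \(|s| \lt r\), and a central second difference of \(M\) at 0 approximates \(M^{\prime \prime}(0) = \E(T^2) = 2 / r^2\). The values of \(r\) and \(s\), the number of terms, and the step size are illustrative assumptions.

```python
import math

r, s = 2.0, 1.5                          # illustrative rate and argument with |s| < r

def moment(n):
    """E(T^n) = n! / r^n for the exponential distribution with rate r."""
    return math.factorial(n) / r**n

# Partial sum of the exponential generating function of the moments; 150 terms
# keeps factorial(n) small enough to convert to a float (n < 171).
series = sum(moment(n) / math.factorial(n) * s**n for n in range(150))

def M(t):
    """MGF of the exponential distribution, valid for t < r."""
    return r / (r - t)

h = 1e-4
second_derivative = (M(h) - 2 * M(0.0) + M(-h)) / h**2   # approximates M''(0)

print(series, M(s))                   # both close to 4
print(second_derivative, moment(2))   # both close to 0.5
```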

Next we consider what happens to the moment generating function under some simple transformations of the random variables.

Suppose that \(X\) is a real-valued random variable with moment generating function \(M\) and that \(a\) and \(b\) are constants. The moment generating function of \(Y = a + b X\) is \(N(t) = e^{a t} M(b t)\).

Proof:

\( \E[e^{t (a + b X)}] = \E(e^{t a} e^{t b X}) = e^{t a} \E[e^{(t b) X}] = e^{a t} M(b t) \).

Suppose that \(X_1\) and \(X_2\) are independent, real-valued random variables with moment generating functions \(M_1\) and \(M_2\) respectively. The moment generating function of \(Y = X_1 + X_2\) is \(M(t) = M_1(t) M_2(t)\).

Proof:

As with the PGF, the proof for the MGF relies on the law of exponents and the fact that the expected value of a product of independent variables is the product of the expected values:

\[ \E[e^{t (X_1 + X_2)}] = \E(e^{t X_1} e^{t X_2}) = \E(e^{t X_1}) \E(e^{t X_2}) = M_1(t) M_2(t) \]

Suppose that \(X\) is a random variable taking values in \(\N\) with probability generating function \(G\). The moment generating function of \(X\) is \(M(t) = G(e^t)\).

Proof:

\( M(t) = \E(e^{t X}) = \E[(e^t)^X] = G(e^t) \)

The following theorem gives the Chernoff bounds. These are upper bounds on the probabilities of the tail events of a random variable.

If \(X\) is a real-valued random variable with moment generating function \(M\), then for \(x \in \R\):

  1. \(\P(X \ge x) \le e^{-t x} M(t)\), \(t \gt 0\)
  2. \(\P(X \le x) \le e^{-t x} M(t)\), \(t \lt 0\)
Proof:

From Markov's inequality, \(\P(X \ge x) = \P(e^{t X} \ge e^{t x}) \le \E(e^{t X}) / e^{t x} = e^{-t x} M(t) \) if \(t \gt 0\). Similarly, \(\P(X \le x) = \P(e^{t X} \ge e^{t x}) \le e^{-t x} M(t) \) if \(t \lt 0\).

Naturally, the best Chernoff bound (in either (a) or (b)) is obtained by finding \(t\) that minimizes \(e^{-t x} M(t)\).
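When the minimization has no convenient closed form, it can be done numerically. Because \(\ln M\) is convex, the exponent \(-t x + \ln M(t)\) is convex in \(t\), so a simple ternary search suffices. This sketch assumes a search interval \([0, 10]\) and uses the Poisson distribution with parameter \(a = 3\), whose moment generating function is \(M(t) = G(e^t) = \exp[a(e^t - 1)]\); the function name `chernoff_upper` is ours. The result can be compared with the closed-form Poisson bound stated later in this section and with the exact tail probability.

```python
import math

def chernoff_upper(log_mgf, x, t_max=10.0, iters=200):
    """Minimize -t x + log M(t) over t in [0, t_max] by ternary search.

    log M is convex, so the objective is convex in t and ternary search finds
    its minimum; t_max is an assumed upper limit for the search.
    Returns (t_star, bound), where bound = exp(objective at t_star) >= P(X >= x).
    """
    def objective(t):
        return -t * x + log_mgf(t)

    lo, hi = 0.0, t_max
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    t_star = (lo + hi) / 2
    return t_star, math.exp(objective(t_star))

a, x = 3.0, 8                                      # illustrative Poisson parameter and threshold
poisson_log_mgf = lambda t: a * (math.exp(t) - 1)  # log of M(t) = G(e^t) = exp[a(e^t - 1)]

t_star, bound = chernoff_upper(poisson_log_mgf, x)
exact_tail = 1 - sum(math.exp(-a) * a**k / math.factorial(k) for k in range(x))
print(t_star, math.log(x / a))
print(bound, math.exp(x - a) * (a / x) ** x, exact_tail)
```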

The Characteristic Function

From a mathematical point of view, the nicest of the generating functions is the characteristic function, which is defined for a real-valued random variable \(X\) by

\[ \chi(t) = \E(e^{i t X}) = \E[\cos(t X)] + i \E[\sin(t X)], \quad t \in \R \]

Note that \(\chi\) is a complex-valued function, so this subsection requires some undergraduate-level complex analysis. The function \(\chi\) is defined for all \(t \in \R\) because the random variable in the expected value is bounded in magnitude: indeed, \(|e^{i t X}| = 1\) for all \(t \in \R\). Many of the properties of the characteristic function are more elegant than the corresponding properties of the probability or moment generating functions, because the characteristic function always exists.

If \(X\) has a continuous distribution on \(\R\) with probability density function \(f\) then

\[ \chi(t) = \int_{-\infty}^{\infty} e^{i t x} f(x) dx, \quad t \in \R \]
Proof:

This follows from the change of variables theorem for expected value, albeit a complex version.

Thus, the characteristic function of \(X\) is closely related to the Fourier transform of the probability density function \(f\). The Fourier transform is named for Joseph Fourier, and is widely used in many areas of applied mathematics.

The characteristic function completely determines the distribution. That is, random variables \(X\) and \(Y\) have the same distribution if and only if they have the same characteristic function. Indeed, the general inversion formula is a formula for computing certain combinations of probabilities from the characteristic function: if \(a \lt b\) then

\[ \int_{-n}^n \frac{e^{-i a t} - e^{- i b t}}{2 \pi i t} \chi(t) dt \to \P(a \lt X \lt b) + \frac{1}{2}[\P(X = a) + \P(X = b)] \text{ as } n \to \infty \]

The probability combinations on the right side completely determine the distribution of \(X\). Suppose that \(X\) has a continuous distribution with probability density function \(f\). A special inversion formula states that at every point \(x\) where \(f\) is differentiable,

\[ f(x) = \frac{1}{2 \pi} \int_{-\infty}^\infty e^{-i t x} \chi(t) dt \]
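When \(\chi\) decays quickly, the special inversion formula can be approximated by quadrature. This sketch assumes the integral is truncated to \([-T, T]\) with \(T = 40\) and evaluated by composite Simpson's rule on 8000 subintervals; it applies the formula to the Cauchy characteristic function \(\chi(t) = e^{-|t|}\) (given at the end of the examples) and compares the result with the Cauchy density \(1 / [\pi (1 + x^2)]\). The grid places the kink of \(e^{-|t|}\) at \(t = 0\) on a panel boundary; the name `density_from_cf` is ours.

```python
import cmath
import math

def density_from_cf(cf, x, T=40.0, n=8000):
    """Approximate f(x) = (1 / (2 pi)) * integral of e^{-i t x} cf(t) dt.

    The integral is truncated to [-T, T] and evaluated with composite Simpson's
    rule on n subintervals (n even); T and n are assumed choices that work when
    cf(t) decays quickly.
    """
    h = 2 * T / n
    total = 0j
    for k in range(n + 1):
        t = -T + k * h
        weight = 1 if k in (0, n) else (4 if k % 2 == 1 else 2)
        total += weight * cmath.exp(-1j * t * x) * cf(t)
    return (total * h / 3).real / (2 * math.pi)

def cauchy_cf(t):
    return math.exp(-abs(t))

for x in (0.0, 1.0, 2.5):
    print(x, density_from_cf(cauchy_cf, x), 1 / (math.pi * (1 + x * x)))
```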

As with the other generating functions, the characteristic function can be used to find the moments of \(X\). Moreover, this can be done even when only some of the moments exist. If \(\E(|X^n|) \lt \infty\) then

\[ \chi(t) = \sum_{k=0}^n \frac{\E(X^k)}{k!} (i t)^k + o(t^n) \]

and therefore \(\chi^{(n)}(0) = i^n \E(X^n)\). Next we consider some simple transformations.

Suppose that \(X\) is a real-valued random variable with characteristic function \(\chi\) and that \(a\) and \(b\) are constants. The characteristic function of \(Y = a + b X\) is \(\psi(t) = e^{i a t} \chi(b t)\).

Proof:

The proof is just like the one for the MGF: \( \psi(t) = \E[e^{i t (a + b X)}] = \E(e^{i t a} e^{i t b X}) = e^{i t a} \E[e^{i (t b) X}] = e^{i a t} \chi(b t) \).

Suppose that \(X_1\) and \(X_2\) are independent, real-valued random variables with characteristic functions \(\chi_1\) and \(\chi_2\) respectively. The characteristic function of \(Y = X_1 + X_2\) is \(\chi(t) = \chi_1(t) \chi_2(t)\).

Proof:

Again, the proof is just like the one for the MGF:

\[ \chi(t) = \E[e^{i t (X_1 + X_2)}] = \E(e^{i t X_1} e^{i t X_2}) = \E(e^{i t X_1}) \E(e^{i t X_2}) = \chi_1(t) \chi_2(t) \]

The characteristic function of a random variable can be obtained from the moment generating function, under the basic existence condition that we saw earlier. Specifically, suppose that \(X\) is a real-valued random variable with moment generating function \(M\) that satisfies \(M(t) \lt \infty\) for \(t\) in some open interval \(I\) about 0. Then the characteristic function \(\chi\) of \(X\) satisfies \(\chi(t) = M(i t)\) for \(t \in I\).

The final important property of characteristic functions that we will discuss relates to convergence in distribution. Suppose that \((X_1, X_2, \ldots)\) is a sequence of real-valued random variables with characteristic functions \((\chi_1, \chi_2, \ldots)\) respectively. The random variables need not be defined on the same probability space. The continuity theorem states that if the distribution of \(X_n\) converges to the distribution of a random variable \(X\) as \(n \to \infty\) and \(X\) has characteristic function \(\chi\), then \(\chi_n(t) \to \chi(t)\) as \(n \to \infty\) for all \(t \in \R\). Conversely, if \(\chi_n(t)\) converges to a function \(\chi(t)\) as \(n \to \infty\) for all \(t \in \R\), and if \(\chi\) is continuous at 0, then \(\chi\) is the characteristic function of a random variable \(X\), and the distribution of \(X_n\) converges to the distribution of \(X\) as \(n \to \infty\).

The continuity theorem can be used to prove the central limit theorem, one of the fundamental theorems of probability. Also, the continuity theorem has a straightforward generalization to distributions on \(\R^n\).

The Joint Characteristic Function

Suppose now that \((X, Y)\) is a random vector for an experiment, taking values in \(\R^2\). The (joint) characteristic function of \((X, Y)\) is defined by

\[ \chi(s, t) = \E[\exp(i s X + i t Y)], \quad (s, t) \in \R^2 \]

Once again, the most important fact is that \(\chi\) completely determines the distribution: two random vectors taking values in \(\R^2\) have the same characteristic function if and only if they have the same distribution.

The joint moments can be obtained from the derivatives of the characteristic function. Suppose that \(m \in \N\) and \(n \in \N\). If \(\E(|X^m Y^n|) \lt \infty\) then

\[ \chi^{(m, n)}(0, 0) = i^{m + n} \E(X^m \, Y^n) \]

Now let \(\chi_1\), \(\chi_2\), and \(\chi_+\) denote the characteristic functions of \(X\), \(Y\), and \(X + Y\), respectively.

For \(t \in \R\)

  1. \(\chi(t, 0) = \chi_1(t)\)
  2. \(\chi(0, t) = \chi_2(t)\)
  3. \(\chi(t, t) = \chi_+(t)\)
Proof:

All three results follow immediately from the definitions.

\(X\) and \(Y\) are independent if and only if \(\chi(s, t) = \chi_1(s) \chi_2(t)\) for all \((s, t) \in \R^2\).

Naturally, the results for bivariate characteristic functions have analogues in the general multivariate case. Only the notation is more complicated.

Examples and Applications

Bernoulli Trials

Suppose \(X\) is an indicator random variable with \(p = \P(X = 1)\), where \(p \in [0, 1]\) is a parameter. Then \(X\) has probability generating function \(G(t) = 1 - p + p t\) for \(t \in \R\).

Recall that a Bernoulli trials process is a sequence \((X_1, X_2, \ldots)\) of independent, identically distributed indicator random variables. In the usual language of reliability, \(X_i\) denotes the outcome of trial \(i\), where 1 denotes success and 0 denotes failure. The probability of success \(p = \P(X_i = 1)\) is the basic parameter of the process. The process is named for Jacob Bernoulli. A separate chapter on the Bernoulli Trials explores this process in more detail.

The number of successes in the first \(n\) trials is \(Y_n = \sum_{i=1}^n X_i\). Recall that this random variable has the binomial distribution with parameters \(n\) and \(p\), which has probability density function

\[ \P(Y_n = y) = \binom{n}{y} p^y (1 - p)^{n - y}, \quad y \in \{0, 1, \ldots, n\} \]

\(Y_n\) has probability generating function \(G_n(t) = (1 - p + p t)^n\)

Proof:

This result can be proved in (at least) two ways. One way is to use the definition and the probability density function. A more elegant proof uses the previous exercise and the representation as a sum of independent indicator variables. The result then follows immediately from Theorem 7.

If \(Y_n\) has the binomial distribution with parameters \(n\) and \(p\) then

  1. \(\E[Y_n^{(k)}] = n^{(k)} p^k\)
  2. \(\E(Y_n) = n p\)
  3. \(\var(Y_n) = n p (1 - p)\)
  4. \(\P(Y_n \text{ is even }) = \frac{1}{2}[1 + (1 - 2 p)^n]\)
Proof:

Repeated differentiation gives \( G^{(k)}(t) = n^{(k)} p^k (1 - p + p t)^{n-k} \). Hence \( G^{(k)}(1) = n^{(k)} p^k \), which gives (a). Parts (b) and (c) follow from Theorem 6, and part (d) follows from Theorem 4, since \( G(-1) = (1 - 2 p)^n \).
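These binomial computations can be checked by brute force. The sketch below, with the illustrative values \(n = 9\) and \(p = 0.35\), compares factorial moments computed by direct summation with \(n^{(k)} p^k\), and compares the directly summed probability that \(Y_n\) is even with \(\frac{1}{2}[1 + G_n(-1)]\). The helper names are ours.

```python
import math

def binom_pmf(n, p, y):
    return math.comb(n, y) * p**y * (1 - p) ** (n - y)

def falling(n, k):
    """Falling power n^{(k)} = n (n - 1) ... (n - k + 1)."""
    out = 1
    for j in range(k):
        out *= n - j
    return out

n, p = 9, 0.35                               # illustrative parameters

def G(t):
    """Binomial PGF (1 - p + p t)^n."""
    return (1 - p + p * t) ** n

# Factorial moments: direct summation versus n^{(k)} p^k
for k in range(5):
    direct = sum(falling(y, k) * binom_pmf(n, p, y) for y in range(n + 1))
    print(k, direct, falling(n, k) * p**k)

# Probability of an even count: direct summation versus (1 + G(-1)) / 2
p_even_direct = sum(binom_pmf(n, p, y) for y in range(0, n + 1, 2))
print(p_even_direct, (1 + G(-1)) / 2)
```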

Suppose that \(U\) has the binomial distribution with parameters \(m\) and \(p\), \(V\) has the binomial distribution with parameters \(n\) and \(q\), and that \(U\) and \(V\) are independent.

  1. The probability generating function of \(U + V\) is \(G(t) = (1 - p + p \, t)^m (1 - q + q \, t)^n\) for \(t \in \R\).
  2. If \(p = q\) then \(U + V\) has the binomial distribution with parameters \(m + n\) and \(p\).
  3. If \(p \ne q\) then \(U + V\) does not have a binomial distribution.
Proof:

Part (a) follows from Theorem 7. For part (b), note that if \( p = q \) then \( U + V \) has PGF \( G(t) = (1 - p + p t)^{m + n} \), which is the PGF of the binomial distribution with parameters \( m + n \) and \( p \). On the other hand, if \( p \ne q \), the PGF \( G \) does not have the functional form of a binomial PGF.

Suppose that \(N\) has probability density function \(h(n) = p (1 - p)^{n-1}\) for \(n \in \N_+\) where \(p \in (0, 1]\) is a parameter. Thus, \(N\) has the geometric distribution on \(\N_+\) with parameter \(p\), and governs the trial number of the first success in a sequence of Bernoulli trials. Let \(H\) denote the probability generating function of \(N\).

  1. \(H(t) = \frac{p t}{1 - (1 - p)t}\) for \(|t| \lt \frac{1}{1 - p}\)
  2. \(\E[N^{(k)}] = k! \frac{(1 - p)^{k-1}}{p^k}\)
  3. \(\E(N) = \frac{1}{p}\)
  4. \(\var(N) = \frac{1 - p}{p^2}\)
  5. \(\P(N \text{ is even }) = \frac{1 - p}{2 - p}\)
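A quick numerical check of these formulas, assuming the illustrative value \(p = 0.3\) and truncating the sums over \(\N_+\) at \(n = 2000\) (an arbitrary cutoff; the neglected tail is of order \((1 - p)^{2000}\)):

```python
p = 0.3                   # illustrative parameter
q = 1 - p
terms = range(1, 2001)    # assumed truncation of N_+

def pmf(n):
    """Geometric density h(n) = p (1 - p)^(n - 1) on N_+."""
    return p * q ** (n - 1)

mean = sum(n * pmf(n) for n in terms)
second_fact = sum(n * (n - 1) * pmf(n) for n in terms)   # E[N^{(2)}]
var = second_fact + mean - mean**2
p_even = sum(pmf(n) for n in terms if n % 2 == 0)

def H(t):
    """Geometric PGF, valid for |t| < 1 / q."""
    return p * t / (1 - q * t)

print(mean, 1 / p)
print(second_fact, 2 * q / p**2)
print(var, q / p**2)
print(p_even, (1 + H(-1)) / 2, q / (2 - p))
```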

The Poisson Distribution

Recall that the Poisson distribution has probability density function

\[ f(n) = e^{-a} \frac{a^n}{n!}, \quad n \in \N \]

where \(a \gt 0\) is a parameter. The Poisson distribution is named after Simeon Poisson and is widely used to model the number of random points in a region of time or space; the parameter is proportional to the size of the region of time or space. The Poisson distribution is studied in more detail in the chapter on the Poisson Process.

Suppose that \(N\) has Poisson distribution with parameter \(a\). Let \(G\) denote the probability generating function of \(N\).

  1. \(G(t) = e^{a (t - 1)}\) for \(t \in \R\)
  2. \(\E[N^{(k)}] = a^k\)
  3. \(\E(N) = a\)
  4. \(\var(N) = a\)
  5. \(\P(N \text{ is even }) = \frac{1}{2}(1 + e^{-2 a})\)

Suppose that \(X\) has the Poisson distribution with parameter \(a \gt 0\), \(Y\) has the Poisson distribution with parameter \(b \gt 0\), and that \(X\) and \(Y\) are independent. Then \(X + Y\) has the Poisson distribution with parameter \(a + b\).

Suppose that \(N\) has the Poisson distribution with parameter \(a \gt 0\).

\[ \P(N \ge n) \le e^{n - a} \left(\frac{a}{n}\right)^n, \quad n \gt a \]
Proof:

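Use Chernoff bound (a) with \( M(t) = G(e^t) = \exp[a(e^t - 1)] \): for \( t \gt 0 \), \( \P(N \ge n) \le \exp[-t n + a(e^t - 1)] \). The exponent is minimized at \( t = \ln(n / a) \), which is positive since \( n \gt a \); substituting this value gives \( e^{n - a} (a / n)^n \).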

Let \(G_n\) denote the probability generating function of the binomial distribution with parameters \(n \in \N_+\) and \(p_n \in (0, 1)\). Suppose that \(n p_n \to a \gt 0\) as \(n \to \infty\). Then \(G_n(t) \to G(t)\) as \(n \to \infty\) where \(G\) is the probability generating function of the Poisson distribution with parameter \(a\). Thus the binomial distribution with parameters \(n\) and \(p_n\) converges to the Poisson distribution with parameter \(a\) as \(n \to \infty\).
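The convergence is easy to watch numerically. This sketch takes \(p_n = a / n\) (one convenient choice with \(n p_n = a\)) and the illustrative values \(a = 2\), \(t = 0.4\); the error \(|G_n(t) - G(t)|\) shrinks roughly like \(1 / n\).

```python
import math

def binomial_pgf(n, p, t):
    """PGF (1 - p + p t)^n of the binomial distribution."""
    return (1 - p + p * t) ** n

a, t = 2.0, 0.4                      # illustrative choices
poisson_pgf = math.exp(a * (t - 1))  # limit G(t) = e^{a (t - 1)}

errors = []
for n in (10, 100, 1000, 100000):
    value = binomial_pgf(n, a / n, t)
    errors.append(abs(value - poisson_pgf))
    print(n, value, errors[-1])
```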

The Exponential Distribution

Recall that the exponential distribution is a continuous distribution with probability density function

\[ f(t) = r e^{-r t}, \quad 0 \le t \lt \infty \]

where \(r \gt 0\) is the rate parameter. This distribution is widely used to model failure times and other arrival times. The exponential distribution is studied in more detail in the chapter on the Poisson Process.

Suppose that \(T\) has the exponential distribution with rate parameter \(r \gt 0\). Let \(M\) denote the moment generating function of \(T\).

  1. \(M(s) = \frac{r}{r - s}\) for \(-\infty \lt s \lt r\).
  2. \(\E(T^n) = n! / r^n\) for \(n \in \N\)

Suppose that \((T_1, T_2, \ldots)\) is a sequence of independent random variables, each having the exponential distribution with rate parameter \(r \gt 0\). For \(n \in \N_+\), the moment generating function of \(T = \sum_{i=1}^n T_i\) is

\[ M(s) = \left(\frac{r}{r - s}\right)^n, \quad s \in (-\infty, r) \]

Random variable \(T\) has the gamma distribution with shape parameter \(n\) and rate parameter \(r\).

Uniform Distributions

Suppose that \(X\) is uniformly distributed on the interval \([a, b]\). Let \(M\) denote the moment generating function of \(X\). Then

  1. \(M(t) = \begin{cases} \frac{e^{b t} - e^{a t}}{(b - a)t}, & t \ne 0 \\ 1, & t = 0 \end{cases}\)
  2. \(\E(X^n) = \frac{b^{n+1} - a^{n + 1}}{(n + 1)(b - a)}\) for \(n \in \N\)
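Evaluating the uniform moment generating function near \(t = 0\) is a classic source of cancellation error. A minimal sketch, assuming the illustrative interval \([2, 5]\): it rewrites \(M(t)\) as \(e^{a t} \, \text{expm1}((b - a) t) / ((b - a) t)\), which is algebraically identical but numerically stable, and checks \(M^\prime(0) = \E(X)\) with a central difference. The function names are ours.

```python
import math

def uniform_mgf(t, a, b):
    """MGF of the uniform distribution on [a, b].

    Written as e^{a t} * expm1((b - a) t) / ((b - a) t) rather than
    (e^{b t} - e^{a t}) / ((b - a) t): the same function, but it avoids
    catastrophic cancellation when t is close to 0.
    """
    if t == 0:
        return 1.0
    u = (b - a) * t
    return math.exp(a * t) * math.expm1(u) / u

def uniform_moment(n, a, b):
    """E(X^n) = (b^{n+1} - a^{n+1}) / ((n + 1)(b - a))."""
    return (b ** (n + 1) - a ** (n + 1)) / ((n + 1) * (b - a))

a, b, h = 2.0, 5.0, 1e-4
m1_fd = (uniform_mgf(h, a, b) - uniform_mgf(-h, a, b)) / (2 * h)   # approximates M'(0)
print(uniform_mgf(1e-12, a, b), m1_fd, uniform_moment(1, a, b))    # about 1, about 3.5, and 3.5
```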

Suppose that \((X, Y)\) is uniformly distributed on the triangle \(T = \{(x, y) \in \R^2: 0 \le x \le y \le 1\}\).

  1. Find the joint moment generating function of \((X, Y)\).
  2. Find the moment generating function of \(X\).
  3. Find the moment generating function of \(Y\).
  4. Find the moment generating function of \(X + Y\).
Answer:
  1. \(M(s, t) = 2 \frac{e^{s+t} - 1}{s (s + t)} - 2 \frac{e^t - 1}{s t}; \quad s \ne 0, \; t \ne 0, \; s + t \ne 0\)
  2. \(M_X(s) = 2 \left(\frac{e^s}{s^2} - \frac{1}{s^2} - \frac{1}{s}\right), \quad s \ne 0\)
  3. \(M_Y(t) = 2 \frac{t e^t - e^t + 1}{t^2}, \quad t \ne 0\)
  4. \(M_{X+Y}(t) = \frac{e^{2 t} - 1}{t^2} - 2 \frac{e^t - 1}{t^2}, \quad t \ne 0\)

A Bivariate Distribution

Suppose that \( (X, Y) \) has probability density function \(f(x, y) = x + y\) for \(0 \le x \le 1\), \(0 \le y \le 1\).

  1. Find the joint moment generating function of \( (X, Y) \).
  2. Find the moment generating function of \(X\).
  3. Find the moment generating function of \(Y\).
  4. Find the moment generating function of \(X + Y\).
Answer:
  1. \(M(s, t) = \frac{e^{s+t}(2 s t - s - t) + e^s(s + t - s t) + e^t(s + t - s t) - s - t}{s^2 t^2}; \quad s \ne 0, \; t \ne 0\)
  2. \(M_X(s) = \frac{3 s e^s - 2 e^s - s + 2}{2 s^2}, \quad s \ne 0\)
  3. \(M_Y(t) = \frac{3 t e^t - 2 e^t - t + 2}{2 t^2}, \quad t \ne 0\)
  4. \(M_{X+Y}(t) = \frac{2[e^{2 t}(t - 1) + e^t (2 - t) - 1]}{t^3}, \quad t \ne 0\)
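Answers like these are easy to cross-check numerically. This sketch compares a brute-force midpoint-rule double integral of \(e^{s x + t y}(x + y)\) over the unit square with the factorization obtained by splitting \(x + y = x \cdot 1 + 1 \cdot y\), so that \(M(s, t) = B(s) A(t) + A(s) B(t)\) with \(A(s) = \int_0^1 e^{s x} dx\) and \(B(s) = \int_0^1 x e^{s x} dx\). The grid size, the test points, and the names `A`, `B`, `joint_mgf_numeric`, `joint_mgf_factored` are illustrative assumptions.

```python
import math

def joint_mgf_numeric(s, t, n=400):
    """Midpoint-rule approximation of M(s, t) = E[exp(s X + t Y)] for the density x + y on [0, 1]^2."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        for j in range(n):
            y = (j + 0.5) * h
            total += math.exp(s * x + t * y) * (x + y)
    return total * h * h

def A(s):
    """Integral of e^{s x} over [0, 1], for s != 0."""
    return math.expm1(s) / s

def B(s):
    """Integral of x e^{s x} over [0, 1], for s != 0."""
    return (s * math.exp(s) - math.exp(s) + 1) / s**2

def joint_mgf_factored(s, t):
    """Since x + y = x * 1 + 1 * y, the double integral splits into one-dimensional pieces."""
    return B(s) * A(t) + A(s) * B(t)

for s, t in [(1.0, 1.0), (1.0, 2.0), (-0.5, 0.7)]:
    print(s, t, joint_mgf_numeric(s, t), joint_mgf_factored(s, t))
```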

The Normal Distribution

Recall that the standard normal distribution is a continuous distribution with probability density function

\[ \phi(z) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} z^2}, \quad z \in \R \]

Normal distributions are widely used to model physical measurements subject to small, random errors and are studied in more detail in the chapter on Special Distributions.

Suppose that \(Z\) has the standard normal distribution. Let \(M\) denote the moment generating function of \(Z\). Then

  1. \(M(t) = e^{\frac{1}{2} t^2}\) for \(t \in \R\)
  2. \(\E(Z^{2 n}) = \frac{(2 n)!}{2^n n!}\) for \(n \in \N\)
  3. \(\E(Z^{2 n + 1}) = 0\) for \(n \in \N\)

Suppose again that \(Z\) has the standard normal distribution. Recall that for \(\mu \in (-\infty, \infty)\) and \(\sigma \in (0, \infty)\), \(X = \mu + \sigma Z\) has the normal distribution with mean \(\mu\) and standard deviation \(\sigma\). The moment generating function of \(X\) is \(M(t) = \exp(\mu t + \frac{1}{2} \sigma^2 t^2)\) for \(t \in \R\).

If \(X\) and \(Y\) are independent, normally distributed random variables then \(X + Y\) has a normal distribution.

The Pareto Distribution

Suppose that \(X\) has the Pareto distribution, which is a continuous distribution with probability density function

\[ f(x) = \frac{a}{x^{a + 1}}, \quad 1 \le x \lt \infty \]

where \(a \gt 0\) is a parameter. The Pareto distribution is named for Vilfredo Pareto. It is a heavy-tailed distribution that is widely used to model financial variables such as income. The Pareto distribution is studied in more detail in the chapter on Special Distributions.

Let \(M\) denote the moment generating function of \(X\). Then

  1. \(\E(X^n) = \begin{cases} \frac{a}{a - n}, & n \lt a \\ \infty, & n \ge a \end{cases}\)
  2. \(M(t) = \infty\) for \(t \gt 0\)

The Cauchy Distribution

Suppose that \(X\) has the Cauchy distribution, a continuous distribution with probability density function

\[ f(x) = \frac{1}{\pi (1 + x^2)}, \quad x \in \R \]

This distribution is named for Augustin Cauchy and is a member of the family of Student \(t\) distributions. The \(t\) distributions are studied in more detail in the chapter on Special Distributions. The graph of \(f\) is known as the Witch of Agnesi, named for Maria Agnesi.

Let \(M\) denote the moment generating function of \(X\). Then

  1. \(\E(X)\) does not exist.
  2. \(M(t) = \infty\) for \(t \ne 0\).

Let \(\chi\) denote the characteristic function of \(X\). Then \(\chi(t) = e^{-|t|}\) for \(t \in \R\).

Counterexample

For the Pareto distribution, only some of the moments are finite; naturally, the moment generating function is infinite for every \(t \gt 0\). We will now give an example of a distribution for which all of the moments are finite, yet the moment generating function is still infinite for every \(t \gt 0\). Furthermore, we will see two different distributions that have the same moments of all orders.

Suppose that \(Z\) has the standard normal distribution and let \(X = e^Z\). The distribution of \(X\) is known as a lognormal distribution. This distribution has finite moments of all orders, but its moment generating function is infinite for every \(t \gt 0\).

\(X\) has probability density function

\[ f(x) = \frac{1}{\sqrt{2 \pi} x} \exp\left(-\frac{1}{2} \ln^2(x)\right), \quad x \gt 0 \]
  1. \(\E(X^n) = e^{\frac{1}{2}n^2}\) for \(n \in \N\).
  2. \(\E(e^{t X}) = \infty\) for \(t \gt 0\).
Proof:

We use the change of variables theorem. The transformation is \( x = e^z \) so the inverse transformation is \( z = \ln(x) \) for \( x \in (0, \infty) \) and \( z \in \R \). Letting \( \phi \) denote the PDF of \( Z \), it follows that the PDF of \( X \) is \( f(x) = \phi(z) \, dz / dx = \phi[\ln(x)] / x \) for \( x \gt 0 \).

For part (a), we use the moment generating function of the standard normal distribution in Exercise 34: \( \E(X^n) = \E(e^{n Z}) = e^{n^2 / 2}\). Part (b) follows from Theorem 9, since

\[ \sum_{n=0}^\infty \frac{\E(X^n)}{n!} t^n = \sum_{n=0}^\infty \frac{e^{n^2 / 2}}{n!} t^n = \infty, \quad t \gt 0 \]

Next we construct a different distribution with the same moments as \( X \).

Now let \(h(x) = \sin[2 \pi \ln(x)]\) for \(x \gt 0\) and let \(g(x) = f(x)[1 + h(x)]\) for \(x \gt 0\). Then

  1. \(g\) is a probability density function.
  2. If \( Y \) has probability density function \( g \) then \(\E(Y^n) = e^{\frac{1}{2} n^2}\) for \(n \in \N\)
Proof:

Note first that \( g(x) \ge 0 \) for \( x \gt 0 \), since \( |h(x)| \le 1 \). Next, let \( U \) have the normal distribution with mean \( n \) and variance 1. Using the change of variables \(u = \ln(x)\) and completing the square shows that for \(n \in \N\),

\[ \int_0^\infty x^n f(x) h(x) dx = e^{\frac{1}{2}n^2} \E[\sin(2 \pi U)] \]

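Since \( n \) is an integer, \( \sin(2 \pi U) = \sin[2 \pi (U - n)] \), and \( U - n \) has the standard normal distribution, which is symmetric about 0. Since the sine function is odd, it follows that \( \E[\sin(2 \pi U)] = 0 \), and hence \( \int_0^\infty x^n f(x) h(x) dx = 0 \) for \( n \in \N \). Therefore \[ \int_0^\infty x^n g(x) \, dx = \int_0^\infty x^n f(x) \, dx + \int_0^\infty x^n f(x) h(x) \, dx = \int_0^\infty x^n f(x) \, dx \]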

Letting \( n = 0 \) shows that \( g \) is a PDF, and then more generally, the moments of \( Y \) are the same as the moments of \( X \).

The graphs of \(f\) and \(g\) are shown below.

Densities of two distributions with the same moments
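The equality of moments can also be confirmed numerically. After the substitution \(u = \ln(x)\), we have \(x^n f(x) \, dx = e^{n u} \phi(u) \, du\) and \(h(x) = \sin(2 \pi u)\), so both moments become one-dimensional integrals in \(u\). This sketch evaluates them with composite Simpson's rule on \([n - 12, n + 12]\) (an assumed window; the integrand is a Gaussian bump centered at \(n\)) and compares them with \(e^{n^2 / 2}\); the helper names are ours.

```python
import math

def simpson(g, lo, hi, n=20000):
    """Composite Simpson's rule for the integral of g over [lo, hi]; n must be even."""
    h = (hi - lo) / n
    total = g(lo) + g(hi)
    total += 4 * sum(g(lo + i * h) for i in range(1, n, 2))
    total += 2 * sum(g(lo + i * h) for i in range(2, n, 2))
    return total * h / 3

def phi(u):
    """Standard normal density."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def moment(n, perturbed, half_width=12.0):
    """n-th moment of f (perturbed=False) or of g = f (1 + h) (perturbed=True), in the variable u = ln x."""
    def integrand(u):
        value = math.exp(n * u) * phi(u)
        if perturbed:
            value *= 1 + math.sin(2 * math.pi * u)
        return value
    return simpson(integrand, n - half_width, n + half_width)

for n in range(4):
    print(n, moment(n, False), moment(n, True), math.exp(n * n / 2))
```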