\(\newcommand{\R}{\mathbb{R}}\)
\(\newcommand{\N}{\mathbb{N}}\)
\(\newcommand{\P}{\mathbb{P}}\)
\(\newcommand{\E}{\mathbb{E}}\)
\(\newcommand{\var}{\text{var}}\)
\(\newcommand{\sd}{\text{sd}}\)
\(\newcommand{\skew}{\text{skew}}\)
\(\newcommand{\kurt}{\text{kurt}}\)


The normal distribution holds an honored role in probability and statistics, mostly because of the central limit theorem, one of the fundamental theorems that forms a bridge between the two subjects. In addition, as we will see, the normal distribution has many nice mathematical properties. The normal distribution is also called the Gaussian distribution, in honor of Carl Friedrich Gauss, who was among the first to use the distribution.

A random variable \(Z\) has the standard normal distribution if it has the probability density function \(\phi\) given by \[ \phi(z) = \frac{1}{\sqrt{2 \, \pi}} e^{-\frac{1}{2} z^2}, \quad z \in \R \]

To see that \(\phi\) is a valid probability density function, let \(c = \int_{-\infty}^{\infty} e^{-\frac{1}{2} z^2} dz\). We need to show that \( c = \sqrt{2 \pi} \); that is, \(\sqrt{2 \, \pi}\) is the normalizing constant for the function \(z \mapsto e^{-\frac{1}{2} z^2}\). The proof uses a nice trick: \[ c^2 = \int_{-\infty}^\infty e^{-\frac{1}{2}x^2} \, dx \int_{-\infty}^\infty e^{-\frac{1}{2}y^2} \, dy = \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-\frac{1}{2}(x^2 + y^2)} \, dx \, dy \] We now convert the double integral to polar coordinates: \( x = r \cos(\theta) \), \( y = r \sin(\theta) \) where \( r \in [0, \infty) \) and \( \theta \in [0, 2 \pi) \). So, \( x^2 + y^2 = r^2 \) and \( dx \, dy = r \, dr \, d\theta \). Thus \[ c^2 = \int_0^{2 \pi} \int_0^\infty r e^{-\frac{1}{2} r^2} \, dr \, d\theta \] Substituting \( u = r^2 / 2 \) in the inner integral gives \( \int_0^\infty e^{-u} \, du = 1 \) and then the outer integral is \( \int_0^{2 \pi} 1 \, d\theta = 2 \pi \). Thus, \( c^2 = 2 \pi \) and so \( c = \sqrt{2 \pi} \).
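As a quick numerical sanity check of this result, the Python sketch below approximates \( c \) with a midpoint rule. Truncating the integral to \( [-10, 10] \) is an assumption made here for illustration; each neglected tail is smaller than \( e^{-50} \). The function name `gaussian_integral` is hypothetical.

```python
import math


def gaussian_integral(a: float = -10.0, b: float = 10.0, n: int = 100_000) -> float:
    """Midpoint-rule approximation of the integral of exp(-z^2 / 2) over [a, b]."""
    h = (b - a) / n
    return h * sum(math.exp(-0.5 * (a + (k + 0.5) * h) ** 2) for k in range(n))


c = gaussian_integral()
print(c, math.sqrt(2 * math.pi))  # numerical c versus sqrt(2 pi)
```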

The standard normal probability density function has the famous bell shape that is known to just about everyone.

The standard normal density function \(\phi\) satisfies the following properties:

- \(\phi\) is symmetric about \(z = 0\).
- \(\phi\) increases and then decreases, with mode \( z = 0 \).
- \(\phi\) is concave upward and then downward and then upward again, with inflection points at \(z = \pm 1\).
- \(\phi(z) \to 0\) as \(z \to \infty\) and as \(z \to -\infty\).

These results follow from standard calculus. Note that \(\phi^\prime(z) = - z \phi(z)\). This differential equation helps simplify the computations.
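The differential equation can also be checked numerically. Differentiating it once more gives \( \phi^{\prime\prime}(z) = -\phi(z) - z \phi^\prime(z) = (z^2 - 1) \phi(z) \), which changes sign exactly at \( z = \pm 1 \), confirming the inflection points. The sketch below compares central finite differences with these formulas; the helper names `phi`, `first_diff`, and `second_diff` and the step sizes are illustrative choices, not from the text.

```python
import math


def phi(z: float) -> float:
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def first_diff(f, z: float, h: float = 1e-6) -> float:
    """Central-difference approximation of f'(z)."""
    return (f(z + h) - f(z - h)) / (2 * h)


def second_diff(f, z: float, h: float = 1e-4) -> float:
    """Central-difference approximation of f''(z)."""
    return (f(z + h) - 2 * f(z) + f(z - h)) / (h * h)


for z in (-2.0, -1.0, 0.0, 0.5, 1.0, 3.0):
    # columns: z, numerical phi', -z phi(z), numerical phi'', (z^2 - 1) phi(z)
    print(z, first_diff(phi, z), -z * phi(z), second_diff(phi, z), (z * z - 1) * phi(z))
```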

In the Special Distribution Simulator, select the normal distribution and keep the default settings. Note the shape and location of the standard normal density function. Run the simulation 1000 times, and compare the empirical density function to the probability density function.

The standard normal distribution function \(\Phi\), given by
\[ \Phi(z) = \int_{-\infty}^z \phi(t) \, dt = \int_{-\infty}^z \frac{1}{\sqrt{2 \, \pi}} e^{-\frac{1}{2} t^2} \, dt \]
and its inverse, the quantile function \(\Phi^{-1}\), cannot be expressed in closed form in terms of elementary functions. However, approximate values of these functions can be obtained from the special distribution calculator, and from most mathematics and statistics software. Indeed these functions are so important that they are considered *special functions* of mathematics.
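In Python, for instance, \( \Phi \) can be written in terms of the error function `math.erf` through the standard identity \( \Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(z / \sqrt{2}\right)\right] \), and the standard library class `statistics.NormalDist` (Python 3.8 or later) provides both `cdf` and the quantile function `inv_cdf`. The helper name `std_normal_cdf` below is illustrative.

```python
import math
from statistics import NormalDist


def std_normal_cdf(z: float) -> float:
    """Standard normal distribution function Phi, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


Z = NormalDist()  # mu = 0, sigma = 1 by default
for z in (-1.96, -1.0, 0.0, 1.0, 2.0):
    print(z, std_normal_cdf(z), Z.cdf(z))

# quantile function Phi^{-1} evaluated at p = 0.975
print(Z.inv_cdf(0.975))
```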

The standard normal distribution function \(\Phi\) satisfies the following properties:

- \(\Phi(-z) = 1 - \Phi(z)\) for \(z \in \R\)
- \(\Phi^{-1}(p) = -\Phi^{-1}(1 - p)\) for \(p \in (0, 1)\)
- \(\Phi(0) = \frac{1}{2}\), so the median is 0.

Part (a) follows from the symmetry of \( \phi \). Part (b) follows from part (a). Part (c) follows from part (a) with \( z = 0 \).

In the special distribution calculator, select the normal distribution and keep the default settings.

- Note the shape of the density function and the distribution function.
- Find the first and third quartiles.
- Compute the interquartile range.

In the special distribution calculator, select the normal distribution and keep the default settings. Find the quantiles of the following orders for the standard normal distribution:

- \(p = 0.001\), \(p = 0.999\)
- \(p = 0.05\), \(p = 0.95\)
- \(p = 0.1\), \(p = 0.9\)
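The quartiles, the interquartile range, and the quantiles listed above can also be computed outside the calculator. The sketch below uses `statistics.NormalDist.inv_cdf` from the Python standard library; the symmetry \( \Phi^{-1}(p) = -\Phi^{-1}(1 - p) \) shows up in the paired values.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal
q1, q3 = Z.inv_cdf(0.25), Z.inv_cdf(0.75)
print("first quartile:", q1, "third quartile:", q3, "IQR:", q3 - q1)

for p in (0.001, 0.999, 0.05, 0.95, 0.1, 0.9):
    print(f"p = {p}: quantile {Z.inv_cdf(p):.4f}")
```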

The mean and variance of the standard normal distribution are

- \( \E(Z) = 0 \)
- \( \var(Z) = 1 \)

- Of course, by symmetry, if \( Z \) *has* a mean, the mean must be 0, but we have to argue that the mean exists. Actually it's not hard to compute the mean directly. Note that \[ \E(Z) = \int_{-\infty}^\infty z \frac{1}{\sqrt{2 \pi}} e^{-z^2 / 2} \, dz = \int_{-\infty}^0 z \frac{1}{\sqrt{2 \pi}} e^{-z^2 / 2} \, dz + \int_0^\infty z \frac{1}{\sqrt{2 \pi}} e^{-z^2 / 2} \, dz \] The integrals on the right can be evaluated explicitly using the simple substitution \( u = z^2 / 2 \). The result is \( \E(Z) = -1/\sqrt{2 \pi} + 1/\sqrt{2 \pi} = 0 \).
- Since \( \E(Z) = 0 \), \[ \var(Z) = \E(Z^2) = \int_{-\infty}^\infty z^2 \phi(z) \, dz \] Integrate by parts, using the parts \( u = z \) and \( dv = z \phi(z) \, dz \). Thus \( du = dz \) and \( v = -\phi(z) \). Note that \( z \phi(z) \to 0 \) as \( z \to \infty \) and as \( z \to -\infty \). Thus, the integration by parts formula gives \( \var(Z) = -z \phi(z) \bigg|_{-\infty}^\infty + \int_{-\infty}^\infty \phi(z) \, dz = 0 + 1 = 1 \).

In the Special Distribution Simulator, select the normal distribution and keep the default settings. Note the shape and size of the mean \( \pm \) standard deviation bar. Run the simulation 1000 times, and compare the empirical mean and standard deviation to the true mean and standard deviation.
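Outside the applet, a similar experiment can be sketched with Python's `random.gauss`. The sample size of 1000 mirrors the exercise; the seed is an arbitrary choice made only so that a run can be repeated.

```python
import random
import statistics

random.seed(2024)  # arbitrary fixed seed for reproducibility
sample = [random.gauss(0.0, 1.0) for _ in range(1000)]

emp_mean = statistics.fmean(sample)
emp_sd = statistics.stdev(sample)  # sample standard deviation (n - 1 divisor)
print(f"empirical mean {emp_mean:.4f} (true 0), empirical sd {emp_sd:.4f} (true 1)")
```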

More generally, we can compute all of the moments. The key is the following recursion formula.

For \( n \in \N_+ \), \( \E\left(Z^{n+1}\right) = n \E\left(Z^{n-1}\right) \)

First we use the differential equation that we noted above for properties of the PDF, namely \( \phi^\prime(z) = - z \phi(z) \). \[ \E\left(Z^{n+1}\right) = \int_{-\infty}^\infty z^{n+1} \phi(z) \, dz = \int_{-\infty}^\infty z^n z \phi(z) \, dz = - \int_{-\infty}^\infty z^n \phi^\prime(z) \, dz \] Now we integrate by parts, with \( u = z^n \) and \( dv = \phi^\prime(z) \, dz \) to get \[ \E\left(Z^{n+1}\right) = -z^n \phi(z) \bigg|_{-\infty}^\infty + \int_{-\infty}^\infty n z^{n-1} \phi(z) \, dz = 0 + n \E\left(Z^{n-1}\right) \]

The moments of the standard normal distribution are now easy to compute.

For \(n \in \N\),

- \(\E \left( Z^{2 n} \right) = 1 \cdot 3 \cdots (2n - 1) = (2 n)! \big/ (n! 2^n) \)
- \(\E \left( Z^{2 n + 1} \right) = 0\)

The result follows from the recursion formula and the mean and variance above.

- Since \( \E(Z) = 0 \) it follows that \( \E\left(Z^n\right) = 0 \) for every odd \( n \in \N \).
- Since \( \E\left(Z^2\right) = 1 \), it follows that \( \E\left(Z^4\right) = 1 \cdot 3 \) and then \( \E\left(Z^6\right) = 1 \cdot 3 \cdot 5 \), and so forth. You can use induction, if you like, for a more formal proof.
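The recursion translates directly into a short loop. The sketch below (function names are illustrative) builds \( \E\left(Z^k\right) \) from \( \E\left(Z^0\right) = 1 \) and \( \E(Z) = 0 \) and checks the even moments against the closed form \( (2n)! \big/ (n! 2^n) \) using exact integer arithmetic.

```python
from math import factorial


def std_normal_moment(k: int) -> int:
    """E(Z^k) from the recursion E(Z^{n+1}) = n E(Z^{n-1})."""
    if k % 2 == 1:
        return 0          # odd moments vanish, starting from E(Z) = 0
    m = 1                 # E(Z^0) = 1
    for n in range(1, k, 2):
        m *= n            # step n: E(Z^{n+1}) = n * E(Z^{n-1})
    return m


def closed_form(n: int) -> int:
    """(2n)! / (n! 2^n), the stated formula for E(Z^{2n})."""
    return factorial(2 * n) // (factorial(n) * 2 ** n)


for n in range(6):
    print(f"E(Z^{2 * n}) = {std_normal_moment(2 * n)} = {closed_form(n)}")
```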

Of course, the fact that the odd-order moments are 0 also follows from the symmetry of the distribution. The following theorem gives the skewness and kurtosis of the standard normal distribution.

If \(Z\) has the standard normal distribution then

- \(\skew(Z) = 0\)
- \(\kurt(Z) = 3\)

- This follows immediately from the symmetry of the distribution. Directly, since \( Z \) has mean 0 and variance 1, \( \skew(Z) = \E\left(Z^3\right) = 0 \).
- Since \( \E(Z) = \E\left(Z^3\right) = 0 \), \( \kurt(Z) = \E\left(Z^4\right) = 3\).

Because of the last result (and the use of the standard normal distribution literally as a *standard*), the excess kurtosis of a random variable is defined to be the ordinary kurtosis minus 3. Thus, the excess kurtosis of the normal distribution is 0.

Many other important properties of the normal distribution are most easily obtained using the moment generating function or the characteristic function.

If \(Z\) has the standard normal distribution then

- \(Z\) has moment generating function \(m(t) = e^{t^2 / 2}\) for \( t \in \R\).
- \( Z \) has characteristic function \(\chi(t) = e^{-t^2 / 2}\) for \( t \in \R \).

- Note that \[ m(t) = \E(e^{t Z}) = \int_{-\infty}^\infty e^{t z} \frac{1}{\sqrt{2 \pi}} e^{-z^2 / 2} \, dz = \int_{-\infty}^\infty \frac{1}{\sqrt{2 \pi}} \exp\left(-\frac{1}{2} z^2 + t z\right) \, dz \] We complete the square in \( z \) to get \( -\frac{1}{2} z^2 + t z = -\frac{1}{2}(z - t)^2 + \frac{1}{2} t^2 \). Thus we have \[ \E(e^{t Z}) = e^{\frac{1}{2} t^2} \int_{-\infty}^\infty \frac{1}{\sqrt{2 \pi}} \exp\left[-\frac{1}{2}(z - t)^2\right] \, dz \] In the integral, if we use the simple substitution \(u = z - t\) then the integral becomes \(\int_{-\infty}^\infty \phi(u) \, du = 1\). Hence \( \E\left(e^{t Z}\right) = e^{\frac{1}{2} t^2} \).
- This follows from (a) since \(\chi(t) = m(i t)\).

Thus, the standard normal distribution has the curious property that the characteristic function is a multiple of the probability density function: \[ \chi = \sqrt{2 \pi} \phi \] The moment generating function can be used to give another derivation of the moments of \( Z \), since we know that \( \E\left(Z^n\right) = m^{(n)}(0) \).
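The MGF formula can be spot-checked by numerically integrating \( e^{t z} \phi(z) \). The sketch below uses a midpoint rule on \( [-15, 15] \); the truncation interval and step count are assumptions that are adequate only for moderate \( |t| \), such as the values tried here. Expanding \( e^{t^2/2} = \sum_{n=0}^\infty t^{2n} \big/ (2^n n!) \) also recovers \( m^{(2n)}(0) = (2n)! \big/ (n! 2^n) \), in agreement with the moment formula above.

```python
import math


def phi(z: float) -> float:
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def mgf_numeric(t: float, a: float = -15.0, b: float = 15.0, n: int = 60_000) -> float:
    """Midpoint-rule approximation of E(e^{tZ}), the integral of e^{tz} phi(z) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for k in range(n):
        z = a + (k + 0.5) * h
        total += math.exp(t * z) * phi(z)
    return h * total


for t in (-1.5, 0.0, 0.5, 2.0):
    print(t, mgf_numeric(t), math.exp(t * t / 2))
```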

The general normal distribution is the location-scale family associated with the standard normal distribution.

Suppose that \(\mu \in \R\) and \( \sigma \in (0, \infty) \) and that \(Z\) has the standard normal distribution. Then \(X = \mu + \sigma Z\) has the normal distribution with location parameter \(\mu\) and scale parameter \(\sigma\).

The basic properties of the density function and distribution function follow easily from general results for location-scale families.

The normal distribution with location parameter \(\mu\) and scale parameter \(\sigma\) has probability density function \(f\) given by \[ f(x) = \frac{1}{\sigma} \phi\left(\frac{x - \mu}{\sigma}\right) = \frac{1}{\sqrt{2 \, \pi} \, \sigma} \exp \left[ -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right], \quad x \in \R \]

This follows from the change of variables formula corresponding to the transformation \( x = \mu + \sigma z \).
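The change-of-variables formula can be checked against an independent implementation. The sketch below codes \( f(x) = \phi\left((x - \mu)/\sigma\right) \big/ \sigma \) directly (helper names are illustrative) and compares it with `statistics.NormalDist.pdf` from the Python standard library for one arbitrary choice of parameters.

```python
import math
from statistics import NormalDist


def phi(z: float) -> float:
    """Standard normal probability density function."""
    return math.exp(-0.5 * z * z) / math.sqrt(2 * math.pi)


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Location-scale density: f(x) = phi((x - mu) / sigma) / sigma."""
    return phi((x - mu) / sigma) / sigma


mu, sigma = 3.0, 2.0  # arbitrary illustrative parameters
X = NormalDist(mu, sigma)
for x in (-1.0, 1.0, 3.0, 5.0, 8.0):
    print(x, normal_pdf(x, mu, sigma), X.pdf(x))
```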

The normal density function \(f\) satisfies the following properties:

- \(f\) is symmetric about \(x = \mu\).
- \(f\) increases and then decreases with mode \( x = \mu \).
- \(f\) is concave upward then downward then upward again, with inflection points at \( x = \mu \pm \sigma \).
- \(f(x) \to 0\) as \(x \to \infty\) and as \(x \to -\infty\).

These properties follow from the corresponding properties of \( \phi \).

In the special distribution simulator, select the normal distribution. Vary the parameters and note the shape and location of the probability density function. With your choice of parameter settings, run the simulation 1000 times and compare the empirical density function to the true probability density function.

Let \(F\) denote the distribution function for the normal distribution with location parameter \(\mu\) and scale parameter \(\sigma\), and as above, let \(\Phi\) denote the standard normal distribution function.

The normal distribution function \(F\) and quantile function \( F^{-1} \) satisfy the following properties:

- \(F(x) = \Phi \left( \frac{x - \mu}{\sigma} \right)\) for \(x \in \R\).
- \(F^{-1}(p) = \mu + \sigma \, \Phi^{-1}(p)\) for \(p \in (0, 1)\).
- \(F(\mu) = \frac{1}{2}\) so the median occurs at \(x = \mu\).

Part (a) follows since \( X = \mu + \sigma Z \). Parts (b) and (c) follow from (a).
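These properties say that a general normal distribution function and quantile function reduce to \( \Phi \) and \( \Phi^{-1} \). A small check with `statistics.NormalDist` follows; the parameters, the point \( x \), and the order \( p \) are chosen arbitrarily.

```python
from statistics import NormalDist

Z = NormalDist()                 # standard normal
mu, sigma = 10.0, 3.0            # arbitrary illustrative parameters
X = NormalDist(mu, sigma)

x, p = 14.5, 0.9
print(X.cdf(x), Z.cdf((x - mu) / sigma))            # F(x) = Phi((x - mu) / sigma)
print(X.inv_cdf(p), mu + sigma * Z.inv_cdf(p))      # F^{-1}(p) = mu + sigma Phi^{-1}(p)
print(X.cdf(mu))                                     # F(mu) = 1/2, so the median is mu
```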

In the special distribution calculator, select the normal distribution. Vary the parameters and note the shape of the density function and the distribution function.

As the notation suggests, the location and scale parameters are also the mean and standard deviation, respectively.

If \(X\) has the normal distribution with location parameter \(\mu\) and scale parameter \(\sigma\) then

- \(\E(X) = \mu\)
- \(\var(X) = \sigma^2\)

This follows from the representation \( X = \mu + \sigma Z \) and basic properties of expected value and variance.

The central moments of \(X\) can be computed easily from the moments of the standard normal distribution. The ordinary (raw) moments of \(X\) can be computed from the central moments, but the formulas are a bit messy.

If \(X\) has the normal distribution with mean \(\mu\) and standard deviation \(\sigma\), then for \(n \in \N\),

- \(\E \left[ (X - \mu)^{2 n} \right] = 1 \cdot 3 \cdots (2n - 1) \sigma^{2n} = (2 n)! \sigma^{2n} \big/ (n! 2^n)\)
- \(\E \left[ (X - \mu)^{2 \, n + 1} \right] = 0\)

All of the odd central moments of \(X\) are 0, a fact that also follows from the symmetry of the probability density function.
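The raw moments mentioned above can be generated from the central moments by expanding \( X^k = \left[\mu + (X - \mu)\right]^k \) with the binomial theorem, which gives \( \E\left(X^k\right) = \sum_{j=0}^k \binom{k}{j} \mu^{k-j} \E\left[(X - \mu)^j\right] \). The sketch below does this with exact rational arithmetic; the function names and parameter values are illustrative.

```python
from fractions import Fraction
from math import comb, factorial


def central_moment(j: int, sigma: Fraction) -> Fraction:
    """E[(X - mu)^j]: 0 for odd j, and (2n)! sigma^(2n) / (n! 2^n) for j = 2n."""
    if j % 2 == 1:
        return Fraction(0)
    n = j // 2
    return (factorial(2 * n) // (factorial(n) * 2 ** n)) * sigma ** j


def raw_moment(k: int, mu: Fraction, sigma: Fraction) -> Fraction:
    """E(X^k) via the binomial expansion of (mu + (X - mu))^k."""
    return sum(comb(k, j) * mu ** (k - j) * central_moment(j, sigma) for j in range(k + 1))


mu, sigma = Fraction(2), Fraction(1, 2)  # illustrative parameters
for k in range(1, 5):
    print(f"E(X^{k}) = {raw_moment(k, mu, sigma)}")
```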

In the special distribution simulator select the normal distribution. Vary the mean and standard deviation and note the size and location of the mean/standard deviation bar. With your choice of parameter settings, run the simulation 1000 times and compare the empirical mean and standard deviation to the true mean and standard deviation.

The following result gives the skewness and kurtosis of the normal distribution.

If \(X\) has the normal distribution with mean \(\mu\) and standard deviation \(\sigma\) then

- \(\skew(X) = 0\)
- \(\kurt(X) = 3\)

The skewness and kurtosis of a variable are defined in terms of the standard score, so these results follow from the corresponding results for \( Z \).

If \(X\) has the normal distribution with location parameter \(\mu\) and scale parameter \(\sigma\) then

- \(X\) has moment generating function \(M(t) = \exp \left( \mu t + \frac{1}{2} \sigma^2 t^2 \right)\) for \(t \in \R \).
- \( X \) has characteristic function \( \chi(t) =\exp \left( i \mu t - \frac{1}{2} \sigma^2 t^2 \right)\) for \(t \in \R \).

- This follows from the representation \( X = \mu + \sigma Z \), basic properties of expected value, and the MGF of \( Z \) above: \[ \E\left(e^{t X}\right) = \E\left(e^{t \mu + t \sigma Z}\right) = e^{t \mu} \E\left(e^{t \sigma Z}\right) = e^{t \mu} e^{\frac{1}{2} t^2 \sigma^2} = e^{t \mu + \frac{1}{2} \sigma^2 t^2} \]
- This follows from (a) since \( \chi(t) = M(i t) \).

The normal family of distributions satisfies two very important properties: invariance under linear transformations and invariance with respect to sums of independent variables. The first property is essentially a restatement of the fact that the normal distribution is a location-scale family.

Suppose that \(X\) is normally distributed with mean \(\mu\) and variance \(\sigma^2\). If \(a \in \R\) and \(b \in \R \setminus \{0\}\), then \(a + b \, X\) is normally distributed with mean \(a + b \mu\) and variance \(b^2 \sigma^2\).

The MGF of \(a + b X\) is \[ \E\left[e^{t (a + b X)}\right] = e^{ta} \E\left[e^{(t b) X}\right] = e^{ta} e^{\mu (t b) + \sigma^2 (t b)^2 / 2} = e^{(a + b \mu)t + b^2 \sigma^2 t^2 / 2} \] which we recognize as the MGF of the normal distribution with mean \(a + b \mu\) and variance \(b^2 \sigma^2\).
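Python's `statistics.NormalDist` happens to implement this rule in its arithmetic operators: adding a constant shifts the mean, and multiplying by a constant \( b \) scales the mean by \( b \) and the standard deviation by \( |b| \). The parameter values below are arbitrary.

```python
from statistics import NormalDist

X = NormalDist(mu=1.0, sigma=0.5)   # arbitrary illustrative parameters
a, b = 3.0, -2.0
Y = a + b * X                       # distribution of a + b X

print(Y.mean, Y.variance)                    # computed by NormalDist
print(a + b * X.mean, b ** 2 * X.variance)   # a + b mu and b^2 sigma^2
```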

Recall that in general, if \(X\) is a random variable with mean \(\mu\) and standard deviation \(\sigma \gt 0\), then \(Z = (X - \mu) \big/ \sigma\) is the standard score of \(X\). A corollary of the last result is that if \(X\) has a normal distribution then the standard score \(Z\) has a standard normal distribution. Conversely, any normally distributed variable can be constructed from a standard normal variable.

Standard score.

- If \(X\) has the normal distribution with mean \(\mu\) and standard deviation \(\sigma\) then \(Z = \frac{X - \mu}{\sigma}\) has the standard normal distribution.
- If \(Z\) has the standard normal distribution and if \(\mu \in \R\) and \(\sigma \in (0, \infty)\), then \(X = \mu + \sigma \, Z\) has the normal distribution with mean \(\mu\) and standard deviation \(\sigma\).

Suppose that \(X_1\) and \(X_2\) are independent random variables, and that \(X_i\) is normally distributed with mean \(\mu_i\) and variance \(\sigma_i^2\) for \(i \in \{1, 2\}\). Then \(X_1 + X_2\) is normally distributed with

- \(\E(X_1 + X_2) = \mu_1 + \mu_2\)
- \(\var(X_1 + X_2) = \sigma_1^2 + \sigma_2^2\)

The MGF of \(X_1 + X_2\) is the product of the MGFs, so \[ \E\left(\exp\left[t (X_1 + X_2)\right]\right) = \exp\left(\mu_1 t + \sigma_1^2 t^2 / 2\right) \exp\left(\mu_2 t + \sigma_2^2 t^2 / 2\right) = \exp\left[\left(\mu_1 + \mu_2\right)t + \left(\sigma_1^2 + \sigma_2^2\right) t^2 / 2\right] \] which we recognize as the MGF of the normal distribution with mean \(\mu_1 + \mu_2\) and variance \(\sigma_1^2 + \sigma_2^2\).
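As a numerical illustration, the sketch below compares the sum rule, as implemented by the `+` operator of `statistics.NormalDist` (which treats its two operands as independent), with a seeded Monte Carlo simulation. The simulation checks the mean, the variance, and the fraction of values within one standard deviation of the mean; the parameters, seed, and sample size are arbitrary choices.

```python
import random
import statistics
from statistics import NormalDist

X1 = NormalDist(1.0, 2.0)    # mean 1, standard deviation 2
X2 = NormalDist(-3.0, 1.5)   # mean -3, standard deviation 1.5
S = X1 + X2                  # independent normals: means add, variances add
print(S.mean, S.variance)

# Monte Carlo cross-check of X1 + X2 with a fixed seed
random.seed(7)
sums = [random.gauss(1.0, 2.0) + random.gauss(-3.0, 1.5) for _ in range(20_000)]
mc_mean, mc_var = statistics.fmean(sums), statistics.variance(sums)
within = sum(abs(s - S.mean) <= S.stdev for s in sums) / len(sums)
print(mc_mean, mc_var, within, S.cdf(S.mean + S.stdev) - S.cdf(S.mean - S.stdev))
```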

This theorem generalizes to a sum of \(n\) independent, normal variables. The important part is that the sum is still normal; the expressions for the mean and variance are standard results that hold for the sum of independent variables generally. As a consequence of this result and the one above for linear transformations, it follows that the normal distribution is stable.

The normal distribution is stable. Specifically, suppose that \( X \) has the normal distribution with mean \( \mu \in \R \) and variance \( \sigma^2 \in (0, \infty)\). If \( (X_1, X_2, \ldots, X_n) \) are independent copies of \( X \), then \( X_1 + X_2 + \cdots + X_n \) has the same distribution as \( \left(n - \sqrt{n}\right) \mu + \sqrt{n} X \), namely normal with mean \( n \mu \) and variance \( n \sigma^2 \).

As a consequence of the previous theorem, \( X_1 + X_2 + \cdots + X_n \) has the normal distribution with mean \( n \mu \) and variance \( n \sigma^2 \). As a consequence of the theorem above on linear transformations, \( \left(n - \sqrt{n}\right) \mu + \sqrt{n} X \) has the normal distribution with mean \( \left(n - \sqrt{n}\right) \mu + \sqrt{n} \mu = n \mu \) and variance \( \left(\sqrt{n}\right)^2 \sigma^2 = n \sigma^2 \).

All stable distributions are infinitely divisible, so the normal distribution belongs to this family as well. For completeness, here is the explicit statement:

The normal distribution is infinitely divisible. Specifically, if \( X \) has the normal distribution with mean \( \mu \in \R \) and variance \( \sigma^2 \in (0, \infty) \), then for \( n \in \N_+ \), \( X \) has the same distribution as \( X_1 + X_2 + \cdots + X_n\) where \( (X_1, X_2, \ldots, X_n) \) are independent, and each has the normal distribution with mean \( \mu / n \) and variance \( \sigma^2 / n \).
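Both statements can be checked parameter by parameter with `statistics.NormalDist`. One caution about this sketch: the `+` operator of `NormalDist` always treats its operands as independent, even when the same object appears on both sides, so repeatedly adding `X` below models independent copies of \( X \), not \( 2X, 3X, \ldots \). The values of \( \mu \), \( \sigma \), and \( n \) are arbitrary.

```python
import math
from statistics import NormalDist

mu, sigma, n = 1.5, 2.0, 5      # arbitrary illustrative parameters
X = NormalDist(mu, sigma)

# Stability: X_1 + ... + X_n (independent copies) versus (n - sqrt(n)) mu + sqrt(n) X
total = X
for _ in range(n - 1):
    total = total + X
rescaled = (n - math.sqrt(n)) * mu + math.sqrt(n) * X
print(total.mean, total.variance, rescaled.mean, rescaled.variance)

# Infinite divisibility: n independent N(mu / n, sigma^2 / n) pieces add up to X
piece = NormalDist(mu / n, sigma / math.sqrt(n))
rebuilt = piece
for _ in range(n - 1):
    rebuilt = rebuilt + piece
print(rebuilt.mean, rebuilt.stdev, X.mean, X.stdev)
```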

Finally, the normal distribution belongs to the family of general exponential distributions.

Suppose that \(X\) has the normal distribution with mean \(\mu\) and variance \(\sigma^2\). The distribution is a two-parameter exponential family with natural parameters \(\left( \frac{\mu}{\sigma^2}, -\frac{1}{2 \, \sigma^2} \right)\), and natural statistics \(\left(X, X^2\right)\).

Expanding the square, the normal PDF can be written in the form \[ f(x) = \frac{1}{\sqrt{2 \pi} \sigma} \exp\left(-\frac{\mu^2}{2 \sigma^2}\right) \exp\left(\frac{\mu}{\sigma^2} x - \frac{1}{2 \sigma^2} x^2 \right), \quad x \in \R\] so the result follows from the definition of the general exponential family.
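A numerical check of the factorization: the sketch below evaluates the exponential-family form, with natural parameters \( \mu / \sigma^2 \) and \( -1 / (2 \sigma^2) \) multiplying \( x \) and \( x^2 \), and compares it with `statistics.NormalDist.pdf`. The parameter values and the function name `exp_family_pdf` are illustrative.

```python
import math
from statistics import NormalDist


def exp_family_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density written as a normalizing factor times exp(eta1 * x + eta2 * x^2)."""
    eta1 = mu / sigma ** 2            # natural parameter paired with the statistic X
    eta2 = -1.0 / (2 * sigma ** 2)    # natural parameter paired with the statistic X^2
    norm = math.exp(-mu ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)
    return norm * math.exp(eta1 * x + eta2 * x ** 2)


mu, sigma = -1.0, 0.8  # illustrative parameters
X = NormalDist(mu, sigma)
for x in (-3.0, -1.0, 0.0, 1.5):
    print(x, exp_family_pdf(x, mu, sigma), X.pdf(x))
```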

A number of other special distributions studied in this chapter are constructed from normally distributed variables. These include

- The lognormal distribution
- The folded normal distribution, which includes the half normal distribution as a special case
- The Rayleigh distribution
- The Maxwell distribution
- The Lévy distribution

Also, as mentioned at the beginning of this section, the importance of the normal distribution stems in large part from the central limit theorem, one of the fundamental theorems of probability. By virtue of this theorem, the normal distribution is connected to many other distributions, by means of limits and approximations, including the special distributions in the following list. Details are given in the individual sections.

- The binomial distribution
- The negative binomial distribution
- The Poisson distribution
- The gamma distribution
- The chi-square distribution
- The Student \( t \) distribution
- The Irwin-Hall distribution

Suppose that the volume of beer in a bottle of a certain brand is normally distributed with mean 0.5 liter and standard deviation 0.01 liter.

- Find the probability that a bottle will contain at least 0.48 liter.
- Find the volume that corresponds to the 95th percentile.

Let \(X\) denote the volume of beer in liters.

- \(\P(X \gt 0.48) = 0.9772\)
- \(x_{0.95} = 0.51645\)
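These answers can be reproduced with `statistics.NormalDist`. Since the distribution is continuous, \( \P(X \ge 0.48) = \P(X \gt 0.48) \).

```python
from statistics import NormalDist

X = NormalDist(mu=0.5, sigma=0.01)   # volume in liters
p_at_least = 1 - X.cdf(0.48)         # P(X >= 0.48); a single point has probability 0
x95 = X.inv_cdf(0.95)                # 95th percentile
print(round(p_at_least, 4), round(x95, 5))
```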

A metal rod is designed to fit into a circular hole on a certain assembly. The radius of the rod is normally distributed with mean 1 cm and standard deviation 0.002 cm. The radius of the hole is normally distributed with mean 1.01 cm and standard deviation 0.003 cm. The machining processes that produce the rod and the hole are independent. Find the probability that the rod is too big for the hole.

Let \(X\) denote the radius of the rod and \(Y\) the radius of the hole. \(\P(Y - X \lt 0) = 0.0028\)
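By the results above on linear transformations and sums, \( Y - X \) is normal with mean \( 0.01 \) and variance \( 0.002^2 + 0.003^2 \). A sketch with `statistics.NormalDist`, whose `-` operator treats the operands as independent:

```python
from statistics import NormalDist

rod = NormalDist(1.00, 0.002)    # radius X of the rod, cm
hole = NormalDist(1.01, 0.003)   # radius Y of the hole, cm
gap = hole - rod                 # Y - X, using the independence of the two processes
print(gap.mean, gap.variance, gap.cdf(0.0))   # last value is P(Y - X < 0)
```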

The weight of a peach from a certain orchard is normally distributed with mean 8 ounces and standard deviation 1 ounce. Find the probability that the combined weight of 5 peaches exceeds 45 ounces.

Let \(X\) denote the combined weight of the 5 peaches, in ounces. \(\P(X \gt 45) = 0.0127\)
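Here the combined weight is normal with mean \( 5 \cdot 8 = 40 \) and variance \( 5 \cdot 1^2 = 5 \), by the result on sums above; independence of the individual peach weights is an assumption implicit in the problem. A sketch:

```python
import math
from statistics import NormalDist

n, mu, sigma = 5, 8.0, 1.0                            # weights in ounces
total = NormalDist(n * mu, math.sqrt(n) * sigma)      # sum of n independent weights
print(1 - total.cdf(45.0))                            # P(total > 45)
```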

In some settings, it's convenient to consider a constant as having a normal distribution (with mean being the constant and variance 0, of course). This convention simplifies the statements of theorems and definitions in these settings. Of course, the formulas above for the probability density function and distribution function do not hold for a constant, but the other results above involving the mean and variance, the moment generating function, the general moments, linear transformations, and sums are still valid. Moreover, the linear transformation result would hold for all \(a\) and \(b\).