
## 3. Variance

Recall the expected value of a real-valued random variable is the mean of the variable, and is a measure of the center of the distribution. Recall also that by taking the expected value of various transformations of the variable, we can measure other interesting characteristics of the distribution. In this section, we will study expected values that measure the spread of the distribution about the mean.

### Basic Theory

#### Definitions

As usual, we start with a random experiment with probability measure $$\P$$ on an underlying sample space. Suppose that $$X$$ is a random variable for the experiment, taking values in $$S \subseteq \R$$. Recall that $$\E(X)$$, the expected value (or mean) of $$X$$, gives the center of the distribution of $$X$$. The variance of $$X$$ is a measure of the spread of the distribution about the mean and is defined by

$\var(X) = \E\left([X - \E(X)]^2\right)$

Recall that the second moment of $$X$$ about $$a$$ is $$\E[(X - a)^2]$$. Thus, the variance is the second moment of $$X$$ about $$\mu = \E(X)$$, or equivalently, the second central moment of $$X$$. Second moments have a nice interpretation in physics, if we think of the distribution of $$X$$ as a mass distribution in $$\R$$. Then the second moment of $$X$$ about $$a$$ is the moment of inertia of the mass distribution about $$a$$. This is a measure of the resistance of the mass distribution to any change in its rotational motion about $$a$$. In particular, the variance of $$X$$ is the moment of inertia of the mass distribution about the center of mass $$\mu$$.

Suppose that $$X$$ has a discrete distribution with probability density function $$f$$ and mean $$\mu$$. Then

$\var(X) = \sum_{x \in S} (x - \mu)^2 f(x)$
Proof:

This follows from the discrete version of the change of variables theorem.

Suppose that $$X$$ has a continuous distribution with probability density function $$f$$ and mean $$\mu$$. Then

$\var(X) = \int_S (x - \mu)^2 f(x) dx$
Proof:

This follows from the continuous version of the change of variables formula.

The standard deviation of $$X$$ is the square root of the variance. It also measures dispersion about the mean but has the same physical units as the variable $$X$$.

$\sd(X) = \sqrt{\var(X)}$

When the random variable $$X$$ is understood, the standard deviation is often denoted by $$\sigma$$, so that the variance is $$\sigma^2$$.

#### Properties

The following exercises give some basic properties of variance, which in turn rely on basic properties of expected value. As usual, we assume that the stated expected values exist. Our first result is a variance formula that is usually better than the definition for computational purposes.

$$\var(X) = \E(X^2) - [\E(X)]^2$$.

Proof:

Let $$\mu = \E(X)$$. Using the linearity of expected value we have

$\var(X) = \E\left[(X - \mu)^2\right] = \E(X^2 - 2 \mu X + \mu^2) = \E(X^2) - 2 \mu \E(X) + \mu^2 = \E(X^2) - 2 \mu^2 + \mu^2 = \E(X^2) - \mu^2$
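Both the definition and this shortcut formula are easy to check numerically. The following sketch (plain Python; representing a discrete distribution as a `{value: probability}` dictionary is our own convention, not part of the text) computes the variance of a fair die both ways:

```python
def mean(pdf):
    """Mean of a discrete distribution given as {value: probability}."""
    return sum(x * p for x, p in pdf.items())

def var_definition(pdf):
    """Variance as the second central moment, E([X - E(X)]^2)."""
    mu = mean(pdf)
    return sum((x - mu) ** 2 * p for x, p in pdf.items())

def var_shortcut(pdf):
    """Variance via the computational formula E(X^2) - [E(X)]^2."""
    return sum(x * x * p for x, p in pdf.items()) - mean(pdf) ** 2

die = {x: 1 / 6 for x in range(1, 7)}  # fair six-sided die
# both routes give 35/12, about 2.9167
```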

Variance is always nonnegative, since it's the expected value of a nonnegative random variable. Moreover, any random variable that really is random (not a constant) will have strictly positive variance.

The nonnegative property.

1. $$\var(X) \ge 0$$
2. $$\var(X) = 0$$ if and only if $$\P(X = c) = 1$$ for some constant $$c$$ (and then of course, $$\E(X) = c$$).
Proof:

These results follow from the basic inequality properties of expected value. Let $$\mu = \E(X)$$. First $$(X - \mu)^2 \ge 0$$ with probability 1 so $$\E[(X - \mu)^2] \ge 0$$. In addition, $$\E[(X - \mu)^2] = 0$$ if and only if $$\P(X = \mu) = 1$$.

Our next result shows how the variance and standard deviation are changed by a linear transformation of the random variable. In particular, note that variance, unlike general expected value, is not a linear operation. This is not really surprising since the variance is the expected value of a nonlinear function of the variable.

If $$a$$ and $$b$$ are constants then

1. $$\var(a + b X) = b^2 \, \var(X)$$
2. $$\sd(a + b X) = |b| \, \sd(X)$$
Proof:

Let $$\mu = \E(X)$$. By linearity, $$\E(a + b X) = a + b \mu$$. Hence $$\var(a + b X) = \E\left([(a + b X) - (a + b \mu)]^2\right) = \E\left(b^2 (X - \mu)^2\right) = b^2 \var(X)$$. Part (b) follows from (a) by taking square roots.

Recall that when $$b \gt 0$$, the linear transformation $$x \mapsto a + b \, x$$ is called a location-scale transformation and often corresponds to a change of location and change of scale in the physical units. For example, the change from inches to centimeters in a measurement of length is a scale transformation, and the change from Fahrenheit to Celsius in a measurement of temperature is both a location and scale transformation. The previous result shows that when a location-scale transformation is applied to a random variable, the standard deviation does not depend on the location parameter, but is multiplied by the scale factor.
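A quick simulation illustrates the scaling law (a stdlib sketch; the Gaussian data and the constants 32 and 1.8, echoing the Celsius-to-Fahrenheit conversion, are illustrative choices):

```python
import random
from statistics import fmean, pvariance

random.seed(1)
xs = [random.gauss(20, 5) for _ in range(10_000)]  # temperatures in Celsius, say
a, b = 32, 1.8
ys = [a + b * x for x in xs]                       # the same data in Fahrenheit

# the location parameter a shifts the mean but leaves the spread alone,
# while the scale parameter b multiplies the variance by b^2
assert abs(fmean(ys) - (a + b * fmean(xs))) < 1e-6
assert abs(pvariance(ys) - b ** 2 * pvariance(xs)) < 1e-6
```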

The random variable $$Z$$ given below has mean 0 and variance 1:

$Z = \frac{X - \E(X)}{\sd(X)}$
Proof:

This result follows from the previous theorem. Let $$\mu = \E(X)$$ and $$\sigma = \sd(X)$$ so that $$Z = \frac{1}{\sigma} (X - \mu)$$. Then $$\E(Z) = \frac{1}{\sigma} [\E(X) - \mu] = 0$$ and $$\var(Z) = \frac{1}{\sigma^2} \var(X) = 1$$.

The random variable $$Z$$ above is sometimes called the standard score associated with $$X$$. Since $$X$$ and its mean and standard deviation all have the same physical units, the standard score $$Z$$ is dimensionless. It measures the directed distance from $$\E(X)$$ to $$X$$ in terms of standard deviations.
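Standardization is just as easy to carry out on data (a stdlib sketch; the exponential sample and the seed are arbitrary choices):

```python
import random
from statistics import fmean, pstdev

random.seed(7)
xs = [random.expovariate(0.5) for _ in range(10_000)]  # any data set will do

mu, sigma = fmean(xs), pstdev(xs)
zs = [(x - mu) / sigma for x in xs]  # the standard scores

# standardizing forces mean 0 and standard deviation 1, up to rounding
assert abs(fmean(zs)) < 1e-9
assert abs(pstdev(zs) - 1) < 1e-9
```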

Let $$Z$$ denote the standard score of $$X$$, and suppose that $$Y = a + b X$$ where $$a, \; b \in \R$$ and $$b \ne 0$$. If $$b \gt 0$$, the standard score of $$Y$$ is $$Z$$ and if $$b \lt 0$$, the standard score of $$Y$$ is $$-Z$$.

Proof:

$$\E(Y) = a + b \E(X)$$ and $$\sd(Y) = |b| \sd(X)$$. Hence

$\frac{Y - \E(Y)}{\sd(Y)} = \frac{b}{|b|} \frac{X - \E(X)}{\sd(X)}$

As just noted, when $$b \gt 0$$, the variable $$Y = a + b X$$ is a location-scale transformation and often corresponds to a change of physical units. Since the standard score is dimensionless, it's reasonable that the standard scores of $$X$$ and $$Y$$ are the same. On the other hand, when $$X \ge 0$$, the ratio of standard deviation to mean is called the coefficient of variation. This quantity is also dimensionless, and is sometimes used to compare variability for random variables with different means.

$\text{cv}(X) = \frac{\sd(X)}{\E(X)}$

#### Chebyshev's Inequality

Chebyshev's inequality (named after Pafnuty Chebyshev) gives an upper bound on the probability that a random variable will be more than a specified distance from its mean. This is often useful in applied problems where the distribution is unknown, but the mean and variance are at least approximately known. In the following two exercises, suppose that $$X$$ is a real-valued random variable with mean $$\mu = \E(X)$$ and standard deviation $$\sigma = \sd(X)$$.

Chebyshev's inequality:

$\P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2}, \quad t \gt 0$
Proof:

From Markov's inequality, $$\P(|X - \mu| \ge t) = \P[(X - \mu)^2 \ge t^2] \le \E[(X - \mu)^2] / t^2 = \sigma^2 / t^2$$.

An alternate version of Chebyshev's inequality is

$\P(|X - \mu| \ge k \sigma) \le \frac{1}{k^2}, \quad k \gt 0$
Proof:

Let $$t = k \sigma$$ in the first version of Chebyshev's inequality.

The usefulness of the Chebyshev inequality comes from the fact that it holds for any distribution (assuming only that the mean and variance exist). The tradeoff is that for many specific distributions, the Chebyshev bound is rather crude. Note in particular that in the last exercise, the bound is useless when $$k \le 1$$, since 1 is an upper bound for the probability of any event.
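To see how conservative the bound can be, compare it with the exact tail probability for a fair die (a stdlib sketch; the value $$k = 1.4$$ is an arbitrary choice):

```python
from math import sqrt

die = {x: 1 / 6 for x in range(1, 7)}
mu = sum(x * p for x, p in die.items())                       # 7/2
sigma = sqrt(sum((x - mu) ** 2 * p for x, p in die.items()))  # sqrt(35/12)

k = 1.4
exact = sum(p for x, p in die.items() if abs(x - mu) >= k * sigma)
bound = 1 / k ** 2

# only the faces 1 and 6 are that far from the mean, so exact = 1/3,
# while the Chebyshev bound is 1/1.96, about 0.51
assert exact <= bound
```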

### Examples and Applications

#### Indicator Variables

Suppose that $$X$$ is an indicator variable with $$p = \P(X = 1)$$, where $$p \in [0, 1]$$. Then

1. $$\E(X) = p$$
2. $$\var(X) = p (1 - p)$$
Proof:

We proved part (a) in the section on expected value, although the result is so simple that the derivation is trivial. For part (b), note that $$X^2 = X$$ since $$X$$ only takes values 0 and 1. Hence $$\E(X^2) = p$$ and therefore $$\var(X) = p - p^2 = p (1 - p)$$.

The graph of $$\var(X)$$ as a function of $$p$$ is a parabola, opening downward, with roots at 0 and 1. Thus the minimum value of $$\var(X)$$ is 0, and occurs when $$p = 0$$ or $$p = 1$$ (when $$X$$ is deterministic). The maximum value is $$\frac{1}{4}$$ and occurs when $$p = \frac{1}{2}$$.
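The parabola is easy to confirm numerically (a small sketch in plain Python; the grid resolution is arbitrary):

```python
def indicator_variance(p):
    """Variance p(1 - p) of an indicator variable with P(X = 1) = p."""
    return p * (1 - p)

# zero variance at the deterministic extremes ...
assert indicator_variance(0) == 0 and indicator_variance(1) == 0

# ... and the maximum value 1/4 at p = 1/2
grid = [i / 1000 for i in range(1001)]
assert max(grid, key=indicator_variance) == 0.5
assert indicator_variance(0.5) == 0.25
```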

#### Uniform Distributions

Suppose that $$X$$ has the discrete uniform distribution on $$\{m, m+1, \ldots, n\}$$ where $$m \le n$$. Then

1. $$\E(X) = \frac{1}{2}(m + n)$$.
2. $$\var(X) = \frac{1}{12}(n - m)(n - m + 2)$$.

Suppose that $$X$$ has the continuous uniform distribution on the interval $$[a, b]$$ where $$a \lt b$$. Then

1. $$\E(X) = \frac{1}{2}(a + b)$$
2. $$\var(X) = \frac{1}{12}(b - a)^2$$

Note that in both the discrete and continuous cases, the variance depends only on the length of the interval.
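Both closed forms can be checked directly (a stdlib sketch; the particular intervals and the seed are arbitrary choices):

```python
import random
from statistics import fmean, pvariance

def discrete_uniform_check(m, n):
    """Check the closed forms against direct computation on {m, ..., n}."""
    xs = list(range(m, n + 1))
    assert fmean(xs) == (m + n) / 2
    assert abs(pvariance(xs) - (n - m) * (n - m + 2) / 12) < 1e-9

discrete_uniform_check(1, 6)   # a fair die
discrete_uniform_check(3, 10)

# continuous case: simulate the uniform distribution on [2, 7]
random.seed(3)
us = [random.uniform(2.0, 7.0) for _ in range(10_000)]
assert abs(fmean(us) - 4.5) < 0.1            # (a + b)/2
assert abs(pvariance(us) - 25 / 12) < 0.15   # (b - a)^2 / 12
```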

#### Dice

Recall that a fair die is one in which the faces are equally likely. In addition to fair dice, there are various types of crooked dice. Here are three:

• An ace-six flat die is a six-sided die in which faces 1 and 6 have probability $$\frac{1}{4}$$ each while faces 2, 3, 4, and 5 have probability $$\frac{1}{8}$$ each.
• A two-five flat die is a six-sided die in which faces 2 and 5 have probability $$\frac{1}{4}$$ each while faces 1, 3, 4, and 6 have probability $$\frac{1}{8}$$ each.
• A three-four flat die is a six-sided die in which faces 3 and 4 have probability $$\frac{1}{4}$$ each while faces 1, 2, 5, and 6 have probability $$\frac{1}{8}$$ each.

A flat die, as the name suggests, is a die that is not a cube, but rather is shorter in one of the three directions. The particular probabilities that we use ($$\frac{1}{4}$$ and $$\frac{1}{8}$$) are fictitious, but the essential property of a flat die is that the opposite faces on the shorter axis have slightly larger probabilities than the other four faces. Flat dice are sometimes used by gamblers to cheat. In the following problems, you will compute the mean and variance for each of the various types of dice. Be sure to compare the results.

A standard, fair die is thrown and the score $$X$$ is recorded. Sketch the graph of the probability density function and compute each of the following:

1. $$\E(X)$$
2. $$\var(X)$$
1. $$\frac{7}{2}$$
2. $$\frac{35}{12}$$

An ace-six flat die is thrown and the score $$X$$ is recorded. Sketch the graph of the probability density function and compute each of the following:

1. $$\E(X)$$
2. $$\var(X)$$
1. $$\frac{7}{2}$$
2. $$\frac{15}{4}$$

A two-five flat die is thrown and the score $$X$$ is recorded. Sketch the graph of the probability density function and compute each of the following:

1. $$\E(X)$$
2. $$\var(X)$$
1. $$\frac{7}{2}$$
2. $$\frac{11}{4}$$

A three-four flat die is thrown and the score $$X$$ is recorded. Sketch the graph of the probability density function and compute each of the following:

1. $$\E(X)$$
2. $$\var(X)$$
1. $$\frac{7}{2}$$
2. $$\frac{9}{4}$$
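The four sets of answers above can be verified with one small helper (plain Python; the densities are exactly those listed for the flat dice):

```python
def die_moments(pdf):
    """Mean and variance of a die given as {face: probability}."""
    mu = sum(x * p for x, p in pdf.items())
    return mu, sum(x * x * p for x, p in pdf.items()) - mu ** 2

fair       = {x: 1 / 6 for x in range(1, 7)}
ace_six    = {1: 1/4, 2: 1/8, 3: 1/8, 4: 1/8, 5: 1/8, 6: 1/4}
two_five   = {1: 1/8, 2: 1/4, 3: 1/8, 4: 1/8, 5: 1/4, 6: 1/8}
three_four = {1: 1/8, 2: 1/8, 3: 1/4, 4: 1/4, 5: 1/8, 6: 1/8}

# every density is symmetric about 7/2, so all four means agree; the variance
# shrinks as probability moves from the extreme faces toward the middle
for pdf, var in [(fair, 35/12), (ace_six, 15/4),
                 (two_five, 11/4), (three_four, 9/4)]:
    mu, v = die_moments(pdf)
    assert abs(mu - 7/2) < 1e-12 and abs(v - var) < 1e-12
```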

In the dice experiment, select one die. Run the experiment 1000 times and note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation in each of the following cases:

1. Fair die
2. Ace-six flat die
3. Two-five flat die
4. Three-four flat die

#### The Poisson Distribution

Recall that the Poisson distribution has probability density function

$f(n) = e^{-a} \, \frac{a^n}{n!}, \quad n \in \N$

where $$a \gt 0$$ is a parameter. The Poisson distribution is named after Simeon Poisson and is widely used to model the number of random points in a region of time or space; the parameter $$a$$ is proportional to the size of the region. The Poisson distribution is studied in detail in the chapter on the Poisson Process.

Suppose that $$N$$ has the Poisson distribution with parameter $$a$$. Then

1. $$\E(N) = a$$
2. $$\var(N) = a$$
Proof:

Part (a) was shown in the section on expected value. For part (b), we compute the second factorial moment:

$\E[N (N - 1)] = \sum_{n=1}^\infty n (n - 1) e^{-a} \frac{a^n}{n!} = \sum_{n=2}^\infty e^{-a} \frac{a^n}{(n - 2)!} = e^{-a} a^2 \sum_{n=2}^\infty \frac{a^{n-2}}{(n - 2)!} = a^2 e^{-a} e^a = a^2$

Hence, $$\E(N^2) = \E[N(N - 1)] + \E(N) = a^2 + a$$, so finally $$\var(N) = (a^2 + a) - a^2 = a$$.

Thus, the parameter is both the mean and the variance of the distribution.
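Both moments can be checked by truncating the series (a stdlib sketch; the parameter value 2.5 and the cutoff 100 are arbitrary choices, the tail terms beyond the cutoff being negligible):

```python
from math import exp, factorial

def poisson_pdf(n, a):
    return exp(-a) * a ** n / factorial(n)

a = 2.5
mu = sum(n * poisson_pdf(n, a) for n in range(100))
ex2 = sum(n * n * poisson_pdf(n, a) for n in range(100))
var = ex2 - mu ** 2

# the mean and the variance are both equal to the parameter
assert abs(mu - a) < 1e-9
assert abs(var - a) < 1e-9
```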

In the Poisson experiment, the parameter is $$a = r \, t$$. Vary the parameter and note the size and location of the mean-standard deviation bar. For selected values of the parameter, run the experiment 1000 times and note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.

#### The Geometric Distribution

Recall that the geometric distribution on $$\N_+$$ is a discrete distribution with probability density function

$f(n) = p \, (1 - p)^{n - 1}, \quad n \in \N_+$

where $$p \in (0, 1]$$ is a parameter. The geometric distribution governs the trial number of the first success in a sequence of Bernoulli trials with success parameter $$p$$.

Suppose that $$N$$ has the geometric distribution on $$\N_+$$ with success parameter $$p$$. Then

1. $$\E(N) = \frac{1}{p}$$
2. $$\var(N) = \frac{1 - p}{p^2}$$
Proof:

We proved part (a) in the section on expected value. For part (b) we will compute the second factorial moment. Thus

$\E[N(N - 1)] = \sum_{n = 2}^\infty n (n - 1) (1 - p)^{n-1} p = p(1 - p) \frac{d^2}{dp^2} \sum_{n=0}^\infty (1 - p)^n = p (1 - p) \frac{d^2}{dp^2} \frac{1}{p} = p (1 - p) \frac{2}{p^3} = \frac{2 (1 - p)}{p^2}$

Hence $$\E(N^2) = \E[N(N - 1)] + \E(N) = 2 / p^2 - 1 / p$$, and therefore $$\var(N) = 2 / p^2 - 1 / p - 1 / p^2 = 1 / p^2 - 1 / p = (1 - p) / p^2$$.

Note that the variance is 0 when $$p = 1$$, not surprising since $$N$$ is deterministic in this case.

In the negative binomial experiment, set $$k = 1$$ to get the geometric distribution . Vary $$p$$ with the scroll bar and note the size and location of the mean-standard deviation bar. For selected values of $$p$$, run the experiment 1000 times and note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.

Suppose that $$N$$ has the geometric distribution with parameter $$p = \frac{3}{4}$$. Compute the true value and the Chebyshev bound for the probability that $$N$$ is at least 2 standard deviations away from the mean.

1. $$\frac{1}{16}$$
2. $$\frac{1}{4}$$
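Both the moments and the Chebyshev comparison above can be verified by truncating the geometric series (a stdlib sketch; 200 terms is plenty since $$(1-p)^{n-1}$$ decays geometrically):

```python
from math import sqrt

p = 3 / 4
pdf = {n: p * (1 - p) ** (n - 1) for n in range(1, 200)}

mu = sum(n * q for n, q in pdf.items())                  # 1/p = 4/3
var = sum(n * n * q for n, q in pdf.items()) - mu ** 2   # (1-p)/p^2 = 4/9
sigma = sqrt(var)

# probability of being at least 2 standard deviations from the mean:
# the event reduces to N >= 3, which has probability (1-p)^2 = 1/16
tail = sum(q for n, q in pdf.items() if abs(n - mu) >= 2 * sigma)

assert abs(tail - 1 / 16) < 1e-9   # the true value
assert tail <= 1 / 4               # the Chebyshev bound
```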

#### The Exponential Distribution

Recall that the exponential distribution is a continuous distribution with probability density function

$f(t) = r \, e^{-r \, t}, \quad 0 \le t \lt \infty$

where $$r \gt 0$$ is the rate parameter. This distribution is widely used to model failure times and other arrival times. The exponential distribution is studied in detail in the chapter on the Poisson Process.

Suppose that $$T$$ has the exponential distribution with rate parameter $$r$$. Then

1. $$\E(T) = \frac{1}{r}$$.
2. $$\var(T) = \frac{1}{r^2}$$.

Thus, for the exponential distribution, the mean and standard deviation are the same.

In the gamma experiment, set $$k = 1$$ to get the exponential distribution. Vary $$r$$ with the scroll bar and note the size and location of the mean-standard deviation bar. For selected values of $$r$$, run the experiment 1000 times and note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.

Suppose that $$X$$ has the exponential distribution with rate parameter $$r \gt 0$$. Compute the true value and the Chebyshev bound for the probability that $$X$$ is at least $$k$$ standard deviations away from the mean.

1. $$e^{-(k+1)}$$ for $$k \ge 1$$
2. $$\frac{1}{k^2}$$
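For the exponential distribution the exact tail probability has a closed form, which makes the comparison immediate (a sketch for $$k \ge 1$$, where the lower tail is empty because $$X \ge 0$$; the rate 2 is an arbitrary choice):

```python
from math import exp

r = 2.0           # any rate; the comparison does not depend on r
mu = sd = 1 / r   # for the exponential distribution, mean = sd = 1/r

for k in [1, 2, 3]:
    # for k >= 1 the lower tail is empty (X >= 0), so
    # P(|X - mu| >= k*sd) = P(X >= (1 + k)/r) = e^{-(k+1)}
    exact = exp(-r * (mu + k * sd))
    assert abs(exact - exp(-(k + 1))) < 1e-15
    assert exact <= 1 / k ** 2   # the Chebyshev bound
```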

#### The Pareto Distribution

Recall that the Pareto distribution is a continuous distribution with probability density function

$f(x) = \frac{a}{x^{a + 1}}, \quad 1 \le x \lt \infty$

where $$a \gt 0$$ is a parameter. The Pareto distribution is named for Vilfredo Pareto. It is a heavy-tailed distribution that is widely used to model financial variables such as income. The Pareto distribution is studied in detail in the chapter on Special Distributions.

Suppose that $$X$$ has the Pareto distribution with shape parameter $$a$$. Then

1. $$\E(X) = \infty$$ if $$0 \lt a \le 1$$
2. $$\E(X) = \frac{a}{a - 1}$$ if $$1\lt a \lt \infty$$
3. $$\var(X)$$ is undefined if $$0 \lt a \le 1$$
4. $$\var(X) = \infty$$ if $$1 \lt a \le 2$$
5. $$\var(X) = \frac{a}{(a - 1)^2 (a - 2)}$$ if $$2 \lt a \lt \infty$$
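Sampling from the Pareto distribution is easy by inverting the distribution function $$F(x) = 1 - x^{-a}$$, and a simulation illustrates the heavy tail (a stdlib sketch; the shape $$a = 3$$, the seed, and the sample size are arbitrary choices):

```python
import random

random.seed(0)
a = 3.0
n = 200_000
# inverse transform: F(x) = 1 - x^(-a) on [1, inf), so F^{-1}(u) = (1 - u)^(-1/a);
# 1 - random.random() lies in (0, 1], avoiding a zero raised to a negative power
xs = [(1.0 - random.random()) ** (-1.0 / a) for _ in range(n)]

mean_th = a / (a - 1)                    # 3/2
var_th = a / ((a - 1) ** 2 * (a - 2))    # 3/4

sample_mean = sum(xs) / n
assert abs(sample_mean - mean_th) < 0.05
# the sample variance is far less reliable here: for a = 3 the fourth moment
# is infinite, so the variance estimator itself has infinite variance,
# a hallmark of heavy tails
```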

In the special distribution simulator, select the Pareto distribution. Vary $$a$$ with the scroll bar and note the size and location of the mean-standard deviation bar. For each of the following values of $$a$$, run the experiment 1000 times and note the behavior of the empirical mean and standard deviation.

1. $$a = 1$$
2. $$a = 2$$
3. $$a = 3$$

#### The Normal Distribution

Recall that the standard normal distribution is a continuous distribution with density function

$\phi(z) = \frac{1}{\sqrt{2 \pi}} e^{-\frac{1}{2} z^2}, \quad z \in \R$

Normal distributions are widely used to model physical measurements subject to small, random errors and are studied in detail in the chapter on Special Distributions.

Suppose that $$Z$$ has the standard normal distribution. Then

1. $$\E(Z) = 0$$
2. $$\var(Z) = 1$$
Proof:

We showed that $$\E(Z) = 0$$ in the section on properties of expected value. Hence $$\var(Z) = \E(Z^2) = \int_{-\infty}^\infty z^2 \phi(z) \, dz$$. Integrate by parts with $$u = z$$ and $$dv = z \phi(z) \, dz$$. Thus, $$du = dz$$ and $$v = -\phi(z)$$. Hence

$\var(Z) = -z \phi(z) \bigg|_{-\infty}^\infty + \int_{-\infty}^\infty \phi(z) \, dz = 0 + 1$

Suppose again that $$Z$$ has the standard normal distribution and that $$\mu \in (-\infty, \infty)$$ and $$\sigma \in (0, \infty)$$. Recall that $$X = \mu + \sigma Z$$ has the normal distribution with location parameter $$\mu$$ and scale parameter $$\sigma$$. Then

1. $$\E(X) = \mu$$
2. $$\var(X) = \sigma^2$$
Proof:

These results follow directly from the result above on linear transformations: $$\E(X) = \mu + \sigma \E(Z) = \mu + 0 = \mu$$ and $$\var(X) = \sigma^2 \var(Z) = \sigma^2 \cdot 1 = \sigma^2$$.

Thus, as the notation suggests, the location parameter $$\mu$$ is also the mean and the scale parameter $$\sigma$$ is also the standard deviation.
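A short simulation confirms that the location-scale construction $$X = \mu + \sigma Z$$ has the stated moments (a stdlib sketch; the parameters and seed are arbitrary choices):

```python
import random
from statistics import fmean, pstdev

random.seed(4)
mu, sigma = 100.0, 15.0
zs = [random.gauss(0, 1) for _ in range(10_000)]
xs = [mu + sigma * z for z in zs]

# the empirical mean and standard deviation should land near mu and sigma
assert abs(fmean(xs) - mu) < 1.0
assert abs(pstdev(xs) - sigma) < 1.0
```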

In the special distribution simulator, select the normal distribution. Vary the parameters and note the shape and location of the mean-standard deviation bar. For selected parameter values, run the experiment 1000 times and note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.

#### Beta Distributions

The distributions in this subsection belong to the family of beta distributions, which are widely used to model random proportions and probabilities. The beta distribution is studied in detail in the chapter on Special Distributions.

Graph the density functions below and compute the mean and variance of each.

1. $$f(x) = 6 \, x \, (1 - x)$$ for $$0 \le x \le 1$$
2. $$f(x) = 12 \, x^2 \, (1 - x)$$ for $$0 \le x \le 1$$
3. $$f(x) = 12 \, x \, (1 - x)^2$$ for $$0 \le x \le 1$$
4. $$f(x) = \frac{1}{\pi \sqrt{x (1 - x)}}$$ for $$0 \lt x \lt 1$$
1. $$\E(X) = \frac{1}{2}$$, $$\var(X) = \frac{1}{20}$$
2. $$\E(X) = \frac{3}{5}$$, $$\var(X) = \frac{1}{25}$$
3. $$\E(X) = \frac{2}{5}$$, $$\var(X) = \frac{1}{25}$$
4. $$\E(X) = \frac{1}{2}$$, $$\var(X) = \frac{1}{8}$$

The particular beta distribution in part (d) is also known as the arcsine distribution.
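The means and variances in parts (a)-(c) can be checked by numerical integration (a stdlib sketch using the midpoint rule, which is accurate for these polynomial densities; part (d), the arcsine density, has endpoint singularities and would need a more careful quadrature):

```python
def moment(f, k, n=20_000):
    """k-th moment of a density f on [0, 1] via the midpoint rule."""
    h = 1 / n
    return sum(((i + 0.5) * h) ** k * f((i + 0.5) * h) * h for i in range(n))

# densities (a)-(c) with their exact means and variances
for f, mean_th, var_th in [
    (lambda x: 6 * x * (1 - x),       1 / 2, 1 / 20),
    (lambda x: 12 * x ** 2 * (1 - x), 3 / 5, 1 / 25),
    (lambda x: 12 * x * (1 - x) ** 2, 2 / 5, 1 / 25),
]:
    mu = moment(f, 1)
    var = moment(f, 2) - mu ** 2
    assert abs(mu - mean_th) < 1e-6
    assert abs(var - var_th) < 1e-6
```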

#### Exercises on Basic Properties

Suppose that $$X$$ is a real-valued random variable with $$\E(X) = 5$$ and $$\var(X) = 4$$. Find each of the following:

1. $$\var(3 X - 2)$$
2. $$\E(X^2)$$
1. $$36$$
2. $$29$$

Suppose that $$X$$ is a real-valued random variable with $$\E(X) = 2$$ and $$\E[X(X - 1)] = 8$$. Find each of the following:

1. $$\E(X^2)$$
2. $$\var(X)$$
1. $$10$$
2. $$6$$

The expected value $$\E[X(X - 1)]$$ is an example of a factorial moment.

Suppose that $$X_1$$ and $$X_2$$ are independent, real-valued random variables with $$\E(X_i) = \mu_i$$ and $$\var(X_i) = \sigma_i^2$$ for $$i \in \{1, 2\}$$. Then

$\var(X_1 X_2) = (\sigma_1^2 + \mu_1^2) (\sigma_2^2 + \mu_2^2) - \mu_1^2 \mu_2^2$
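A simulation sketch of this product formula (stdlib Python only; the normal sampling distributions, the parameter values, and the seed are arbitrary choices, since the formula only involves the means and variances):

```python
import random

def pvar(data):
    """Population variance, computed directly."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / len(data)

random.seed(5)
n = 100_000
mu1, s1 = 2.0, 1.0
mu2, s2 = -1.0, 0.5
x1 = [random.gauss(mu1, s1) for _ in range(n)]
x2 = [random.gauss(mu2, s2) for _ in range(n)]
prod = [u * v for u, v in zip(x1, x2)]

# (sigma1^2 + mu1^2)(sigma2^2 + mu2^2) - mu1^2 mu2^2 = 5 * 1.25 - 4 = 2.25
theory = (s1 ** 2 + mu1 ** 2) * (s2 ** 2 + mu2 ** 2) - mu1 ** 2 * mu2 ** 2
assert abs(pvar(prod) - theory) < 0.1
```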

Marilyn vos Savant has an IQ of 228. Assuming that the distribution of IQ scores has mean 100 and standard deviation 15, find Marilyn's standard score.

$$z = \frac{228 - 100}{15} \approx 8.53$$