Recall that by taking the expected value of various transformations of a random variable, we can measure many interesting characteristics of the distribution of the variable. In this section, we will study expected values that measure spread, skewness and other properties.
As usual, we start with a random experiment with probability measure $\mathbb{P}$ on an underlying sample space $\Omega$. Suppose that $X$ is a random variable for the experiment, taking values in $S \subseteq \mathbb{R}$. Recall that the expected value or mean of $X$ gives the center of the distribution of $X$. The variance of $X$ is a measure of the spread of the distribution about the mean and is defined by
$$\operatorname{var}(X) = \mathbb{E}\left([X - \mathbb{E}(X)]^2\right)$$
Recall that the second moment of $X$ about $a \in \mathbb{R}$ is $\mathbb{E}\left[(X - a)^2\right]$. Thus, the variance is the second moment of $X$ about $\mu = \mathbb{E}(X)$, or equivalently, the second central moment of $X$. Second moments have a nice interpretation in physics if we think of the distribution of $X$ as a mass distribution in $\mathbb{R}$. Then the second moment of $X$ about $a$ is the moment of inertia of the mass distribution about $a$. This is a measure of the resistance of the mass distribution to any change in its rotational motion about $a$. In particular, the variance of $X$ is the moment of inertia of the mass distribution about the center of mass $\mu$.
Suppose that $X$ has a discrete distribution with probability density function $f$. Use the change of variables theorem to show that
$$\operatorname{var}(X) = \sum_{x \in S} [x - \mathbb{E}(X)]^2 f(x)$$
Suppose that $X$ has a continuous distribution with probability density function $f$. Use the change of variables theorem to show that
$$\operatorname{var}(X) = \int_S [x - \mathbb{E}(X)]^2 f(x) \, dx$$
The standard deviation of $X$ is the square root of the variance, $\operatorname{sd}(X) = \sqrt{\operatorname{var}(X)}$. It also measures dispersion about the mean but has the same physical units as the variable $X$.
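As a quick numerical check of the discrete formula, the following Python sketch computes $\mathbb{E}(X)$, $\operatorname{var}(X)$ and $\operatorname{sd}(X)$ directly from the definitions. It assumes the density is stored as a dictionary `{x: f(x)}` over a finite set $S$; the function names are illustrative, and exact `Fraction` arithmetic is used so the results are not blurred by rounding.

```python
import math
from fractions import Fraction

def mean(f):
    """E(X) = sum over x in S of x f(x)."""
    return sum(x * p for x, p in f.items())

def variance(f):
    """var(X) = sum over x in S of [x - E(X)]^2 f(x)."""
    mu = mean(f)
    return sum((x - mu) ** 2 * p for x, p in f.items())

def sd(f):
    """sd(X) = sqrt(var(X)), returned as a float."""
    return math.sqrt(variance(f))

# Fair die: faces 1, ..., 6 each with probability 1/6
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
print(mean(fair_die), variance(fair_die))  # 7/2 35/12
```

The same dictionary representation works for any finitely supported distribution; a continuous density would instead need numerical integration.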
The following exercises give some basic properties of variance, which in turn rely on basic properties of expected value:
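Show that $\operatorname{var}(X) \ge 0$.
Show that $\operatorname{var}(X) = \mathbb{E}\left(X^2\right) - [\mathbb{E}(X)]^2$.
Show that $\operatorname{var}(X) = 0$ if and only if $\mathbb{P}(X = c) = 1$ for some constant $c$.
Show that if $a$ and $b$ are constants then $\operatorname{var}(a + b X) = b^2 \operatorname{var}(X)$.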
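The computational formula and the affine property can be spot-checked exactly on a small example. This is a sketch only: the density `f` on $\{0, 1, 4\}$ and the constants $a = 3$, $b = -2$ are arbitrary choices, and checking one example is of course not a proof.

```python
from fractions import Fraction

def expect(f, g=lambda x: x):
    """E[g(X)] for a finitely supported density f = {x: P(X = x)}."""
    return sum(g(x) * p for x, p in f.items())

def variance(f):
    """var(X) = E([X - E(X)]^2)."""
    mu = expect(f)
    return expect(f, lambda x: (x - mu) ** 2)

def affine(f, a, b):
    """Density of a + bX; assumes b != 0 so distinct x map to distinct a + bx."""
    return {a + b * x: p for x, p in f.items()}

# Arbitrary example density on {0, 1, 4}
f = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}

# Computational formula: var(X) = E(X^2) - [E(X)]^2
print(variance(f), expect(f, lambda x: x ** 2) - expect(f) ** 2)  # 2 2

# Affine transformation: var(a + bX) = b^2 var(X)
a, b = 3, -2
print(variance(affine(f, a, b)), b ** 2 * variance(f))  # 8 8
```

Note that the additive constant $a$ drops out entirely: shifting a distribution does not change its spread.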
Show that the random variable $Z$ given below has mean 0 and variance 1 (assuming $\operatorname{sd}(X) > 0$):
$$Z = \frac{X - \mathbb{E}(X)}{\operatorname{sd}(X)}$$
The random variable $Z$ in Exercise 7 is sometimes called the standard score associated with $X$. Since $X$ and its mean and standard deviation all have the same physical units, the standard score $Z$ is dimensionless. It measures the directed distance from $\mathbb{E}(X)$ to $X$ in terms of standard deviations.
On the other hand, when $\mathbb{E}(X) \ne 0$, the ratio of standard deviation to mean, $\operatorname{sd}(X) / \mathbb{E}(X)$, is called the coefficient of variation. This quantity is also dimensionless, and is sometimes used to compare variability for random variables with different means.
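A minimal sketch of standardization, using a hypothetical density on $\{0, 1, 4\}$ with mean 1 and variance 2: the standard score has mean 0 and variance 1 up to floating-point error, and the coefficient of variation here is $\sqrt{2}/1$. The helper names are illustrative.

```python
import math

def mean_var(f):
    """Mean and variance of a finitely supported density f = {x: P(X = x)}."""
    mu = sum(x * p for x, p in f.items())
    return mu, sum((x - mu) ** 2 * p for x, p in f.items())

def standard_score_density(f):
    """Density of Z = (X - E(X)) / sd(X); assumes sd(X) > 0."""
    mu, var = mean_var(f)
    sigma = math.sqrt(var)
    return {(x - mu) / sigma: p for x, p in f.items()}

def coefficient_of_variation(f):
    """sd(X) / E(X); assumes E(X) != 0."""
    mu, var = mean_var(f)
    return math.sqrt(var) / mu

# Hypothetical density on {0, 1, 4} with mean 1 and variance 2
f = {0: 0.5, 1: 1 / 3, 4: 1 / 6}
mz, vz = mean_var(standard_score_density(f))
print(mz, vz, coefficient_of_variation(f))
```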
Chebyshev's inequality (named after Pafnuty Chebyshev) gives an upper bound on the probability that a random variable will be more than a specified distance from its mean. This is often useful in applied problems where the distribution is unknown, but the mean and variance are at least approximately known. In the following two exercises, suppose that $X$ is a real-valued random variable with mean $\mu = \mathbb{E}(X)$ and standard deviation $\sigma = \operatorname{sd}(X)$.
Use Markov's inequality to prove Chebyshev's inequality:
$$\mathbb{P}\left(|X - \mu| \ge t\right) \le \frac{\sigma^2}{t^2}, \quad t > 0$$
Establish the following equivalent version of Chebyshev's inequality:
$$\mathbb{P}\left(|X - \mu| \ge k \sigma\right) \le \frac{1}{k^2}, \quad k > 0$$
The usefulness of the Chebyshev inequality comes from the fact that it holds for any distribution (assuming only that the mean and variance exist). The tradeoff is that for many specific distributions, the Chebyshev bound is rather crude. Note in particular that in the last exercise, the bound is useless when $k \le 1$, since 1 is an upper bound for the probability of any event.
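Suppose that $X$ is an indicator variable with $\mathbb{P}(X = 1) = p$. Show that $\mathbb{E}(X) = p$ and $\operatorname{var}(X) = p(1 - p)$.
Note that the minimum value of $\operatorname{var}(X) = p(1 - p)$ is 0, and occurs when $p = 0$ and $p = 1$. The maximum value is $\frac{1}{4}$ and occurs when $p = \frac{1}{2}$.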
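The variance of an indicator can also be computed straight from the definition and scanned over a grid of $p$ values. This illustrative sketch confirms that it equals $p(1 - p)$ and peaks at $p = \frac{1}{2}$ on the grid (a grid search suggests, but does not prove, the location of the maximum).

```python
from fractions import Fraction

def indicator_variance(p):
    """var(X) for an indicator X with P(X = 1) = p, computed from the definition."""
    mu = p  # E(X) = 1 * p + 0 * (1 - p)
    return (1 - mu) ** 2 * p + (0 - mu) ** 2 * (1 - p)

grid = [Fraction(i, 100) for i in range(101)]
assert all(indicator_variance(p) == p * (1 - p) for p in grid)
best = max(grid, key=indicator_variance)
print(best, indicator_variance(best))  # 1/2 1/4
```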
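Suppose that $X$ has the discrete uniform distribution on the integer interval $\{m, m + 1, \ldots, n\}$, where $m \le n$. Compute $\mathbb{E}(X)$ and $\operatorname{var}(X)$.
Suppose that $X$ has the continuous uniform distribution on the interval $[a, b]$, where $a < b$. Compute $\mathbb{E}(X)$ and $\operatorname{var}(X)$.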
Note that in both the discrete and continuous cases, the variance depends only on the length of the interval.
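The shift invariance is easy to see numerically. The sketch below computes the discrete uniform variance from the definition and compares it with the standard closed form $\frac{(n - m + 1)^2 - 1}{12}$, and approximates the continuous uniform variance $\frac{(b - a)^2}{12}$ with a midpoint Riemann sum (the step count is an arbitrary choice).

```python
from fractions import Fraction

def discrete_uniform_var(m, n):
    """Variance of the discrete uniform distribution on {m, ..., n}, from the definition."""
    values = range(m, n + 1)
    p = Fraction(1, n - m + 1)
    mu = sum(x * p for x in values)
    return sum((x - mu) ** 2 * p for x in values)

def continuous_uniform_var(a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of (x - mu)^2 / (b - a) over [a, b]."""
    h = (b - a) / steps
    mu = (a + b) / 2
    return sum((a + (i + 0.5) * h - mu) ** 2 for i in range(steps)) * h / (b - a)

# Shifting the interval leaves the variance unchanged; only its length matters.
print(discrete_uniform_var(1, 6), discrete_uniform_var(11, 16))  # 35/12 35/12
print(continuous_uniform_var(0, 2), continuous_uniform_var(5, 7))  # both close to 1/3
```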
Recall that a standard die is a six-sided die. A fair die is one in which the faces are equally likely. An ace-six flat die is a standard die in which faces 1 and 6 have probability each, and faces 2, 3, 4, and 5 have probability each.
In the dice experiment, select one fair die. Run the experiment 1000 times, updating every 10 runs, and note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.
In the dice experiment, select one ace-six flat die. Run the experiment 1000 times, updating every 10 runs, and note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.
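For reference when running the two dice exercises, here is a sketch that computes the distribution mean and standard deviation exactly and then mimics 1000 runs with Python's `random` module. The ace-six flat weights used here (1/4 for faces 1 and 6, 1/8 for faces 2 through 5) are an assumption of this sketch, and the seed is arbitrary.

```python
import math
import random
import statistics
from fractions import Fraction

def mean_var(f):
    """Exact mean and variance of a finitely supported density f = {x: P(X = x)}."""
    mu = sum(x * p for x, p in f.items())
    return mu, sum((x - mu) ** 2 * p for x, p in f.items())

fair = {x: Fraction(1, 6) for x in range(1, 7)}
# Assumed ace-six flat weights: faces 1 and 6 get 1/4, faces 2 to 5 get 1/8
flat = {1: Fraction(1, 4), 6: Fraction(1, 4)}
flat.update({x: Fraction(1, 8) for x in range(2, 6)})

rng = random.Random(2024)  # arbitrary seed for reproducibility
for name, f in [("fair", fair), ("ace-six flat", flat)]:
    mu, var = mean_var(f)
    faces = list(f)
    rolls = rng.choices(faces, weights=[float(f[x]) for x in faces], k=1000)
    print(f"{name}: distribution mean {float(mu):.3f}, sd {math.sqrt(var):.3f}; "
          f"empirical mean {statistics.mean(rolls):.3f}, sd {statistics.stdev(rolls):.3f}")
```

Under these weights the flat die has the same mean 3.5 as the fair die but a larger standard deviation ($\sqrt{15/4} \approx 1.936$ versus $\sqrt{35/12} \approx 1.708$), since more mass sits on the extreme faces.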
Recall that the Poisson distribution has density function
$$f(n) = e^{-a} \frac{a^n}{n!}, \quad n \in \mathbb{N} = \{0, 1, 2, \ldots\}$$
where $a > 0$ is a parameter. The Poisson distribution is named after Simeon Poisson and is widely used to model the number of random points in a region of time or space; the parameter $a$ is proportional to the size of the region. The Poisson distribution is studied in detail in the chapter on the Poisson Process.
Suppose that $N$ has the Poisson distribution with parameter $a$. Show that $\mathbb{E}(N) = a$ and $\operatorname{var}(N) = a$.
Thus, the parameter $a$ is both the mean and the variance of the distribution.
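A numerical sketch of this fact: sum the Poisson density, truncated at an illustrative cutoff `n_max`, and compute the mean and variance from the definitions. The recursion $f(n + 1) = f(n) \, a / (n + 1)$ is used instead of evaluating $n!$ directly, which would overflow floating point for large $n$.

```python
import math

def poisson_mean_var(a, n_max=200):
    """Mean and variance of the Poisson distribution, from a sum truncated at n_max.

    Starts from f(0) = e^{-a} and applies f(n + 1) = f(n) * a / (n + 1).
    The omitted tail is negligible when a is small compared with n_max.
    """
    probs = []
    p = math.exp(-a)
    for n in range(n_max + 1):
        probs.append(p)
        p *= a / (n + 1)
    mu = sum(n * q for n, q in enumerate(probs))
    var = sum((n - mu) ** 2 * q for n, q in enumerate(probs))
    return mu, var

for a in (0.5, 2.5, 10.0):
    print(a, poisson_mean_var(a))
```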
In the Poisson experiment, vary the parameter and note the size and location of the mean-standard deviation bar. For selected values of the parameter, run the experiment 1000 times updating every 10 runs. Note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.
Recall that the geometric distribution on $\mathbb{N}_+ = \{1, 2, \ldots\}$ is a discrete distribution with density function
$$f(n) = p (1 - p)^{n - 1}, \quad n \in \mathbb{N}_+$$
where $p \in (0, 1]$ is a parameter. The geometric distribution governs the trial number of the first success in a sequence of Bernoulli trials with success parameter $p$.
Suppose that $N$ has the geometric distribution with success parameter $p$. Compute $\mathbb{E}(N)$ and $\operatorname{var}(N)$.
In the negative binomial experiment, set the number of successes to 1 to get the geometric distribution on $\mathbb{N}_+$. Vary $p$ with the scroll bar and note the size and location of the mean-standard deviation bar. For selected values of $p$, run the experiment 1000 times updating every 10 runs. Note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.
Suppose that $N$ has the geometric distribution with success parameter $p$. Compute the true value and the Chebyshev bound for the probability that $N$ is at least 2 standard deviations away from the mean.
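A sketch of the comparison for the geometric case, using the standard facts $\mathbb{E}(N) = 1/p$ and $\operatorname{sd}(N) = \sqrt{1 - p}/p$. The true probability is obtained by summing the density over the points at least $k$ standard deviations from the mean; the values of $p$ tried below and the cutoff `n_max` are illustrative choices.

```python
import math

def geometric_chebyshev(p, k=2.0, n_max=10_000):
    """Exact P(|N - mu| >= k sigma) and the Chebyshev bound 1/k^2, N geometric on {1, 2, ...}.

    Uses mu = 1/p and sigma = sqrt(1 - p)/p; assumes 0 < p < 1 and that n_max is far
    above mu + k sigma, so that every n > n_max lies in the upper tail.
    """
    mu = 1 / p
    sigma = math.sqrt(1 - p) / p
    total = 0.0
    for n in range(1, n_max + 1):
        if abs(n - mu) >= k * sigma:
            total += p * (1 - p) ** (n - 1)
    total += (1 - p) ** n_max  # P(N > n_max)
    return total, 1 / k ** 2

for p in (0.25, 0.5, 0.75):  # illustrative values of p
    exact, bound = geometric_chebyshev(p)
    print(f"p = {p}: exact {exact:.4f}, Chebyshev bound {bound:.4f}")
```

In each case the exact probability is well below the Chebyshev bound of 0.25, which illustrates how crude the bound can be for a specific distribution.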
Recall that the exponential distribution is a continuous distribution with probability density function
$$f(t) = r e^{-r t}, \quad t \ge 0$$
where $r > 0$ is the rate parameter. This distribution is widely used to model failure times and other arrival times. The exponential distribution is studied in detail in the chapter on the Poisson Process.
Suppose that $T$ has the exponential distribution with rate parameter $r$. Compute $\mathbb{E}(T)$ and $\operatorname{sd}(T)$.
Thus, for the exponential distribution, the mean and standard deviation are the same.
In the gamma experiment, set the shape parameter to 1 to get the exponential distribution. Vary $r$ with the scroll bar and note the size and location of the mean-standard deviation bar. For selected values of $r$, run the experiment 1000 times updating every 10 runs. Note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.
Suppose that $T$ has the exponential distribution with rate parameter $r$. Compute the true value and the Chebyshev bound for the probability that $T$ is at least $k$ standard deviations away from the mean.
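For the exponential case, with $\mathbb{E}(T) = \operatorname{sd}(T) = 1/r$, the event $|T - 1/r| \ge k/r$ splits into a lower and an upper tail, and both tails can be read off the distribution function $1 - e^{-r t}$. The sketch below (function name illustrative) compares the exact value with the Chebyshev bound $1/k^2$; note that the exact value does not depend on $r$.

```python
import math

def exponential_chebyshev(k, r=1.0):
    """Exact P(|T - mu| >= k sigma) and the Chebyshev bound 1/k^2, T exponential with rate r."""
    mu = sigma = 1 / r
    lower, upper = mu - k * sigma, mu + k * sigma
    p_lower = 1 - math.exp(-r * lower) if lower > 0 else 0.0  # P(T <= lower)
    p_upper = math.exp(-r * upper)                              # P(T >= upper)
    return p_lower + p_upper, 1 / k ** 2

for k in (0.5, 1, 1.5, 2, 3):
    exact, bound = exponential_chebyshev(k)
    print(f"k = {k}: exact {exact:.4f}, Chebyshev bound {bound:.4f}")
```

For $k \ge 1$ the lower tail is empty and the exact value reduces to $e^{-(1 + k)}$.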
Recall that the Pareto distribution is a continuous distribution with density function
$$f(x) = \frac{a}{x^{a + 1}}, \quad x \ge 1$$
where $a > 0$ is a parameter. The Pareto distribution is named for Vilfredo Pareto. It is a heavy-tailed distribution that is widely used to model financial variables such as income. The Pareto distribution is studied in detail in the chapter on Special Distributions.
Suppose that $X$ has the Pareto distribution with shape parameter $a$. Compute $\mathbb{E}(X)$ and $\operatorname{var}(X)$ as functions of $a$.
In the random variable experiment, select the Pareto distribution. Vary $a$ with the scroll bar and note the size and location of the mean-standard deviation bar. For selected values of $a$, run the experiment 1000 times updating every 10 runs and note the behavior of the empirical mean and standard deviation.
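A sketch of the Pareto moments, assuming the standard results $\mathbb{E}(X) = \frac{a}{a - 1}$ for $a > 1$ and $\operatorname{var}(X) = \frac{a}{(a - 1)^2 (a - 2)}$ for $a > 2$ (both follow from integrating against $a / x^{a+1}$ on $[1, \infty)$). It also flags the cases where a moment is infinite, which is relevant to the behavior of the empirical mean and standard deviation for small $a$.

```python
import math

def pareto_mean_var(a):
    """Mean and variance of the Pareto distribution with density a / x^(a+1), x >= 1.

    math.inf marks an infinite moment; None marks a variance that is not defined
    because the mean itself is infinite.
    """
    if a <= 1:
        return math.inf, None
    mean = a / (a - 1)
    if a <= 2:
        return mean, math.inf
    return mean, a / ((a - 1) ** 2 * (a - 2))

for a in (0.5, 1.5, 3.0, 10.0):
    print(a, pareto_mean_var(a))
```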
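Recall that the standard normal distribution is a continuous distribution with density function
$$\phi(z) = \frac{1}{\sqrt{2 \pi}} e^{-z^2 / 2}, \quad z \in \mathbb{R}$$
Normal distributions are widely used to model physical measurements subject to small, random errors and are studied in detail in the chapter on Special Distributions.
Suppose that $Z$ has the standard normal distribution. Show that $\mathbb{E}(Z) = 0$ and $\operatorname{var}(Z) = 1$.
Suppose again that $Z$ has the standard normal distribution and that $\mu \in \mathbb{R}$, $\sigma > 0$. Recall that $X = \mu + \sigma Z$ has the normal distribution with location parameter $\mu$ and scale parameter $\sigma$. Show that $\mathbb{E}(X) = \mu$ and $\operatorname{sd}(X) = \sigma$.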
Thus, as the notation suggests, the location parameter is also the mean and the scale parameter is also the standard deviation.
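A numerical sketch of these two exercises: approximate the mean and variance of the normal density with location $\mu$ and scale $\sigma$ by a midpoint Riemann sum over $[\mu - 12\sigma, \mu + 12\sigma]$, where the omitted tails are negligible. The truncation range, step count, and parameter values are illustrative choices.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Normal density with location mu and scale sigma: phi((x - mu) / sigma) / sigma."""
    z = (x - mu) / sigma
    return math.exp(-z * z / 2) / (sigma * math.sqrt(2 * math.pi))

def mean_var_numeric(pdf, lo, hi, steps=200_000):
    """Midpoint-rule approximations of the mean and variance of a density on [lo, hi]."""
    h = (hi - lo) / steps
    xs = [lo + (i + 0.5) * h for i in range(steps)]
    weights = [pdf(x) * h for x in xs]
    mean = sum(x * w for x, w in zip(xs, weights))
    var = sum((x - mean) ** 2 * w for x, w in zip(xs, weights))
    return mean, var

m0, v0 = mean_var_numeric(normal_pdf, -12.0, 12.0)  # standard normal Z
mu, sigma = 3.0, 2.0  # illustrative location and scale
m, v = mean_var_numeric(lambda x: normal_pdf(x, mu, sigma), mu - 12 * sigma, mu + 12 * sigma)
print(m0, v0, m, v)
```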
In the random variable experiment, select the normal distribution. Vary the parameters and note the shape and location of the mean-standard deviation bar. For selected parameter values, run the experiment 1000 times updating every 10 runs and note the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation.
The distributions in this subsection belong to the family of beta distributions, which are widely used to model random proportions and probabilities. The beta distribution is studied in detail in the chapter on Special Distributions.
The particular beta distribution in part (d) is also known as the arcsine distribution.
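Suppose that $X_1$ and $X_2$ are independent, real-valued random variables with $\mathbb{E}(X_i) = \mu_i$ and $\operatorname{var}(X_i) = \sigma_i^2$ for $i \in \{1, 2\}$. Show that
$$\operatorname{var}(X_1 X_2) = \left(\sigma_1^2 + \mu_1^2\right)\left(\sigma_2^2 + \mu_2^2\right) - \mu_1^2 \mu_2^2$$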
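For independent $X_1$, $X_2$ one has $\mathbb{E}(X_1 X_2) = \mu_1 \mu_2$ and $\mathbb{E}(X_1^2 X_2^2) = \mathbb{E}(X_1^2)\,\mathbb{E}(X_2^2)$, which gives $\operatorname{var}(X_1 X_2) = (\sigma_1^2 + \mu_1^2)(\sigma_2^2 + \mu_2^2) - \mu_1^2 \mu_2^2$. The sketch below checks this identity exactly for two small, arbitrarily chosen discrete densities by building the density of the product.

```python
from fractions import Fraction
from itertools import product

def mean_var(f):
    """Exact mean and variance of a finitely supported density f = {x: P(X = x)}."""
    mu = sum(x * p for x, p in f.items())
    return mu, sum((x - mu) ** 2 * p for x, p in f.items())

def product_density(f1, f2):
    """Density of X1 * X2 when X1 and X2 are independent with densities f1 and f2."""
    g = {}
    for (x1, p1), (x2, p2) in product(f1.items(), f2.items()):
        g[x1 * x2] = g.get(x1 * x2, 0) + p1 * p2  # independence: joint = product of marginals
    return g

f1 = {-1: Fraction(1, 4), 2: Fraction(3, 4)}                    # example density for X1
f2 = {0: Fraction(1, 2), 3: Fraction(1, 3), 5: Fraction(1, 6)}  # example density for X2
m1, v1 = mean_var(f1)
m2, v2 = mean_var(f2)
_, v12 = mean_var(product_density(f1, f2))
print(v12, (v1 + m1 ** 2) * (v2 + m2 ** 2) - m1 ** 2 * m2 ** 2)
```

Expanding the right side also shows $\operatorname{var}(X_1 X_2) = \sigma_1^2 \sigma_2^2 + \sigma_1^2 \mu_2^2 + \mu_1^2 \sigma_2^2$, so the product variance is not simply $\sigma_1^2 \sigma_2^2$ unless both means are 0.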
Marilyn vos Savant has an IQ of 228. Assuming that the distribution of IQ scores has mean 100 and standard deviation 15, find Marilyn's standard score.
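The arithmetic, as a short Python sketch: the standard score is $(228 - 100)/15 = 128/15 \approx 8.53$, that is, about 8.5 standard deviations above the mean.

```python
def standard_score(x, mean, sd):
    """Directed distance from the mean to x, in standard deviations."""
    return (x - mean) / sd

z = standard_score(228, 100, 15)
print(round(z, 3))  # 8.533
```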
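Suppose that $X$ is a real-valued random variable. Recall again that the variance of $X$ is the second moment of $X$ about the mean, and measures the spread of the distribution of $X$ about the mean. The third and fourth moments of $X$ about the mean also measure interesting features of the distribution. The third moment measures skewness, the lack of symmetry, while the fourth moment measures kurtosis, traditionally described as the degree to which the distribution is peaked, although it is governed mainly by the weight of the tails. The actual numerical measures of these characteristics are standardized to eliminate the physical units, by dividing by an appropriate power of the standard deviation. As usual, we assume that all expected values given below exist, and we let $\mu = \mathbb{E}(X)$ and $\sigma = \operatorname{sd}(X)$.
The skewness of $X$ is the third moment of the standard score $Z = (X - \mu)/\sigma$:
$$\operatorname{skew}(X) = \mathbb{E}\left[\left(\frac{X - \mu}{\sigma}\right)^3\right]$$
Suppose that $X$ has a continuous distribution with probability density $f$ that is symmetric about $a$: $f(a + t) = f(a - t)$ for $t \in \mathbb{R}$.
Show that $\mathbb{E}(X) = a$ and $\operatorname{skew}(X) = 0$.
The kurtosis of $X$ is the fourth moment of the standard score $Z = (X - \mu)/\sigma$:
$$\operatorname{kurt}(X) = \mathbb{E}\left[\left(\frac{X - \mu}{\sigma}\right)^4\right]$$
Show that
$$\operatorname{kurt}(X) = \frac{\mathbb{E}\left(X^4\right) - 4 \mu \, \mathbb{E}\left(X^3\right) + 6 \mu^2 \, \mathbb{E}\left(X^2\right) - 3 \mu^4}{\sigma^4}$$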
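Expanding the binomials gives the raw-moment forms $\mathbb{E}[(X - \mu)^3] = \mathbb{E}(X^3) - 3\mu\,\mathbb{E}(X^2) + 2\mu^3$ and $\mathbb{E}[(X - \mu)^4] = \mathbb{E}(X^4) - 4\mu\,\mathbb{E}(X^3) + 6\mu^2\,\mathbb{E}(X^2) - 3\mu^4$. The Python sketch below computes skewness and kurtosis both ways for a finitely supported density, with exact fractions so that the two routes can be compared with `==`; the example densities are illustrative.

```python
import math
from fractions import Fraction

def raw(f, k):
    """k-th raw moment E(X^k) of a finitely supported density f."""
    return sum(x ** k * p for x, p in f.items())

def central(f, k):
    """k-th central moment E[(X - mu)^k], computed directly."""
    mu = raw(f, 1)
    return sum((x - mu) ** k * p for x, p in f.items())

def skew_kurt(f):
    """Skewness and kurtosis, with the central moments taken from the raw-moment forms."""
    mu, var = raw(f, 1), central(f, 2)
    m3 = raw(f, 3) - 3 * mu * raw(f, 2) + 2 * mu ** 3
    m4 = raw(f, 4) - 4 * mu * raw(f, 3) + 6 * mu ** 2 * raw(f, 2) - 3 * mu ** 4
    # Exact arithmetic, so the two routes must agree exactly
    assert m3 == central(f, 3) and m4 == central(f, 4)
    return float(m3) / float(var) ** 1.5, float(m4 / var ** 2)

f = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}  # example density
fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
print(skew_kurt(f))         # skewness sqrt(2), kurtosis 3.5
print(skew_kurt(fair_die))  # skewness 0 (symmetric about 3.5)
```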
Graph the following density functions and compute the mean, variance, skewness and kurtosis of each. (The corresponding distributions are all members of the family of beta distributions).
Variance and higher moments are related to the concept of norm and distance in the theory of vector spaces. This connection can help unify and illuminate some of the ideas.
Our vector space $\mathcal{V}$ consists of all real-valued random variables defined on a fixed probability space (that is, relative to a given random experiment). Recall that two random variables are equivalent if they are equal with probability 1. We consider two such random variables as the same vector, so that technically, our vector space consists of equivalence classes under this equivalence relation. The addition operator corresponds to the usual addition of two real-valued random variables, and the operation of scalar multiplication corresponds to the usual multiplication of a real-valued random variable by a real (non-random) number.
Let $X$ be a real-valued random variable. For $k \ge 1$, we define the $k$-norm by
$$\|X\|_k = \left[\mathbb{E}\left(|X|^k\right)\right]^{1/k}$$
Thus, $\|X\|_k$ is a measure of the size of $X$ in a certain sense. The following exercises establish the fundamental properties.
Show that $\|X\|_k \ge 0$ for any $X$.
Show that $\|X\|_k = 0$ if and only if $\mathbb{P}(X = 0) = 1$ (so that $X$ is equivalent to 0).
Show that $\|c X\|_k = |c| \, \|X\|_k$ for any constant $c$.
The next exercise gives Minkowski's inequality, named for Hermann Minkowski. It is also known as the triangle inequality.
Show that $\|X + Y\|_k \le \|X\|_k + \|Y\|_k$ for any $X$ and $Y$.
It follows from Exercises 42-45 that the set of random variables with finite $k$th moment forms a subspace of our parent vector space $\mathcal{V}$, and that the $k$-norm really is a norm on this vector space:
$$\mathcal{L}_k = \left\{X \in \mathcal{V} : \|X\|_k < \infty\right\}$$
Our next exercise gives Lyapunov's inequality, named for Aleksandr Lyapunov. This inequality shows that the $k$-norm of a random variable is increasing in $k$.
Show that if $1 \le j \le k$ then $\|X\|_j \le \|X\|_k$.
Lyapunov's inequality shows that if $1 \le j \le k$ and $X$ has a finite $k$th moment, then $X$ has a finite $j$th moment as well. Thus, $\mathcal{L}_k$ is a subspace of $\mathcal{L}_j$.
Suppose that $X$ has probability density function $f(x) = \dfrac{a}{x^{a + 1}}$ for $x \ge 1$, where $a > 0$ is a parameter. Thus, $X$ has the Pareto distribution with shape parameter $a$.
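For this Pareto variable, $\mathbb{E}(X^k) = \int_1^\infty x^k \frac{a}{x^{a+1}} \, dx = \frac{a}{a - k}$ when $k < a$, and the integral diverges when $k \ge a$, so $X$ has a finite $k$th moment exactly when $k < a$. The sketch below evaluates the $k$-norm from this formula and shows it increasing in $k$, as Lyapunov's inequality predicts; the value $a = 5$ is an arbitrary choice.

```python
import math

def pareto_norm(a, k):
    """k-norm [E(X^k)]^(1/k) for the Pareto density a / x^(a+1) on [1, infinity), k >= 1."""
    if k >= a:
        return math.inf  # E(X^k) diverges
    return (a / (a - k)) ** (1 / k)

a = 5.0  # illustrative shape parameter
norms = [pareto_norm(a, k) for k in (1, 2, 3, 4, 5)]
print(norms)  # increasing in k, ending with inf at k = a
```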
The $k$-norm, like any norm on a vector space, can be used to measure distance; we simply compute the norm of the difference between two vectors. Thus, we define the $k$-distance (or $k$-metric) between real-valued random variables $X$ and $Y$ to be
$$d_k(X, Y) = \|Y - X\|_k = \left[\mathbb{E}\left(|Y - X|^k\right)\right]^{1/k}$$
The properties in the following exercises are analogues of the properties in Exercises 42-45 (and thus very little additional work should be required). These properties show that the $k$-metric really is a metric.
Show that $d_k(X, Y) \ge 0$ for any $X$, $Y$.
Show that $d_k(X, Y) = 0$ if and only if $\mathbb{P}(X = Y) = 1$ (so that $X$ and $Y$ are equivalent).
Show that $d_k(X, Z) \le d_k(X, Y) + d_k(Y, Z)$ for any $X$, $Y$, $Z$ (this is known as the triangle inequality).
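Thus, the standard deviation is simply the 2-distance from $X$ to its mean:
$$\operatorname{sd}(X) = \sqrt{\mathbb{E}\left([X - \mathbb{E}(X)]^2\right)}$$
and the variance is the square of this. More generally, the $k$th absolute moment of $X$ about $a$, namely $\mathbb{E}\left(|X - a|^k\right)$, is simply the $k$th power of the $k$-distance from $X$ to $a$; for even $k$ this is the ordinary $k$th moment of $X$ about $a$. The 2-distance is especially important for reasons that will become clear below and in the next section. This distance is also called the root mean square distance.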
Measures of center and measures of spread are best thought of together, in the context of a measure of distance. For a random variable $X$, we first try to find the constants $t \in \mathbb{R}$ that are closest to $X$, as measured by the given distance; any such $t$ is a measure of center relative to the distance. The minimum distance itself is the corresponding measure of spread.
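Let us apply this procedure to the 2-distance. Thus, we define the root mean square error function by
$$d_2(X, t) = \sqrt{\mathbb{E}\left[(X - t)^2\right]}, \quad t \in \mathbb{R}$$
Show that $d_2(X, t)$ is minimized when $t = \mathbb{E}(X)$ and that the minimum value is $\operatorname{sd}(X)$.
The physical interpretation of this result is that the moment of inertia of the mass distribution of $X$ about $t$ is minimized when $t = \mu = \mathbb{E}(X)$, the center of mass.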
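A grid search makes the minimization visible. This sketch evaluates the root mean square error $\sqrt{\mathbb{E}[(X - t)^2]}$ for a hypothetical density on $\{0, 1, 4\}$ with mean 1 and standard deviation $\sqrt{2}$; the grid spacing is arbitrary, and the search only illustrates the exercise, it does not prove it.

```python
import math

def rmse(f, t):
    """Root mean square error sqrt(E[(X - t)^2]) for a finitely supported density f."""
    return math.sqrt(sum((x - t) ** 2 * p for x, p in f.items()))

f = {0: 0.5, 1: 1 / 3, 4: 1 / 6}  # hypothetical density: mean 1, sd sqrt(2)
grid = [i / 1000 for i in range(-2000, 6001)]  # t from -2 to 6 in steps of 0.001
t_best = min(grid, key=lambda t: rmse(f, t))
print(t_best, rmse(f, t_best))
```

Because $\mathbb{E}[(X - t)^2] = \operatorname{var}(X) + [\mathbb{E}(X) - t]^2$, the curve is a smooth parabola-like shape with a unique minimum at the mean.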
In the histogram applet, construct a discrete distribution of each of the types indicated below. Note the position and size of the mean ± standard deviation bar and the shape of the mean square error graph.
Next, let us apply our procedure to the 1-distance. Thus, we define the mean absolute error function by
$$d_1(X, t) = \mathbb{E}\left(|X - t|\right), \quad t \in \mathbb{R}$$
We will show that $d_1(X, t)$ is minimized when $t$ is any median of $X$. We start with a discrete case, because it's easier and has special interest.
Suppose that $X$ has a discrete distribution with values in a finite set $S$.
The last exercise shows that mean absolute error has a couple of basic deficiencies as a measure of error:
Indeed, when $X$ does not have a unique median, there is no compelling reason to choose one value in the median interval, as the measure of center, over any other value in the interval.
In the histogram applet, construct a distribution of each of the types indicated below. In each case, note the position and size of the boxplot and the shape of the mean absolute error graph.
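Let $I$ be an indicator random variable with $\mathbb{P}(I = 1) = p$. Graph $\mathbb{E}(|I - t|) = p \, |1 - t| + (1 - p) \, |t|$ as a function of $t \in \mathbb{R}$ in each of the cases below. In each case, find the minimum value of the function and the values of $t$ where the minimum occurs.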
Suppose now that $X$ has a general distribution on $\mathbb{R}$. Show that $\mathbb{E}(|X - t|)$ is minimized when $t$ is any median of $X$.
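The same kind of grid search for the mean absolute error $\mathbb{E}(|X - t|)$ shows the flat minimum that occurs when the median is not unique. For a fair die, every $t$ in the median interval $[3, 4]$ gives $\mathbb{E}(|X - t|) = \frac{3}{2}$; the grid and the choice of distribution are illustrative.

```python
from fractions import Fraction

def mae(f, t):
    """Mean absolute error E|X - t| for a finitely supported density f."""
    return sum(abs(x - t) * p for x, p in f.items())

fair_die = {x: Fraction(1, 6) for x in range(1, 7)}
grid = [Fraction(i, 4) for i in range(29)]  # t = 0, 1/4, ..., 7
values = {t: mae(fair_die, t) for t in grid}
best = min(values.values())
minimizers = [t for t, v in values.items() if v == best]
print(best, minimizers[0], minimizers[-1])  # 3/2 3 4
```

The function is piecewise linear with corners at the support points, so it is not differentiable there, and here it is constant on the whole median interval.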
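Whenever we have a measure of distance, we automatically have a criterion for convergence. Let $X_n$ for $n \in \mathbb{N}_+$, and $X$ be real-valued random variables defined on the same sample space (that is, defined for the same random experiment). We say that $X_n \to X$ as $n \to \infty$ in $k$th mean if the $k$-distance between $X_n$ and $X$ tends to 0:
$$\left[\mathbb{E}\left(|X_n - X|^k\right)\right]^{1/k} \to 0 \text{ as } n \to \infty$$
or equivalently
$$\mathbb{E}\left(|X_n - X|^k\right) \to 0 \text{ as } n \to \infty$$
When $k = 1$, we simply say that $X_n \to X$ as $n \to \infty$ in mean; when $k = 2$, we say that $X_n \to X$ as $n \to \infty$ in mean square. These are the most important special cases.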
Use Lyapunov's inequality to show that if $1 \le j \le k$, then $X_n \to X$ as $n \to \infty$ in $k$th mean implies $X_n \to X$ as $n \to \infty$ in $j$th mean.
Our next sequence of exercises shows that convergence in mean is stronger than convergence in probability.
Use Markov's inequality to show that if $X_n \to X$ as $n \to \infty$ in $k$th mean, then $X_n \to X$ as $n \to \infty$ in probability.
The converse is not true. Moreover, convergence with probability 1 does not imply convergence in mean and convergence in mean does not imply convergence with probability 1. The next two exercises give some counterexamples.
Suppose that $(X_1, X_2, \ldots)$ is a sequence of independent random variables with
Suppose that $(X_1, X_2, \ldots)$ is a sequence of independent indicator random variables with
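The implications among these modes of convergence are as follows: convergence with probability 1 implies convergence in probability; convergence in $k$th mean implies convergence in $j$th mean for $1 \le j \le k$; and convergence in $k$th mean implies convergence in probability. No other implications hold in general.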
For a related statistical topic, see the section on the Sample Variance in the chapter on Random Samples. The variance of a sum of random variables is best understood in terms of a related concept known as covariance, that will be studied in detail in the next section.