The central limit theorem and the law of large numbers are the two fundamental theorems of probability. Roughly, the central limit theorem states that the distribution of the sum of a large number of independent, identically distributed variables will be approximately normal, regardless of the underlying distribution. The importance of the central limit theorem is hard to overstate; indeed it is the reason that many statistical procedures work.
Suppose that $X = (X_1, X_2, \ldots)$ is a sequence of independent, identically distributed, real-valued random variables with common probability density function $f$, mean $\mu$, and variance $\sigma^2$. Let
\[ Y_n = \sum_{i=1}^n X_i, \quad n \in \mathbb{N} \]
Note that by convention, $Y_0 = 0$, since the sum is over an empty index set. The random process $Y = (Y_0, Y_1, Y_2, \ldots)$ is called the partial sum process associated with $X$. Special types of partial sum processes have been studied in many places in this project.
Recall that in statistical terms, the sequence $X$ corresponds to sampling from the underlying distribution. In particular, $(X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the distribution, and the corresponding sample mean is
\[ M_n = \frac{Y_n}{n} = \frac{1}{n} \sum_{i=1}^n X_i \]
By the law of large numbers, $M_n \to \mu$ as $n \to \infty$ with probability 1.
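As a minimal sketch of these definitions, the following code builds the partial sums (starting from the empty sum 0) and the running sample means for a simulated sample. The exponential distribution with mean 2, the seed, and the function names are assumptions chosen only for illustration.

```python
import random
from itertools import accumulate

def partial_sums(xs):
    """Return [Y_0, Y_1, ..., Y_n] for the terms xs = [X_1, ..., X_n]."""
    # initial=0 supplies Y_0 = 0, the sum over an empty index set.
    return list(accumulate(xs, initial=0))

def sample_means(xs):
    """Return [M_1, ..., M_n], where M_k = Y_k / k."""
    ys = partial_sums(xs)
    return [ys[k] / k for k in range(1, len(ys))]

rng = random.Random(2024)  # fixed seed so that the run is repeatable
mu = 2.0                   # assumed mean of the illustrative distribution
xs = [rng.expovariate(1 / mu) for _ in range(100_000)]
means = sample_means(xs)

# The sample means should settle near mu as n grows.
for n in (10, 1_000, 100_000):
    print(n, round(means[n - 1], 4))
```

The printed sample means should drift toward 2 as the sample size grows, which is the behavior the law of large numbers describes.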
Show that if $m \le n$ then $Y_n - Y_m$ has the same distribution as $Y_{n-m}$. Thus the process $Y$ has stationary increments.
Show that if $n_1 \le n_2 \le n_3 \le \cdots$ then $(Y_{n_1}, Y_{n_2} - Y_{n_1}, Y_{n_3} - Y_{n_2}, \ldots)$ is a sequence of independent random variables. Thus the process $Y$ has independent increments.
Conversely, suppose that $V = (V_0, V_1, V_2, \ldots)$ is a random process with stationary, independent increments, in the sense of Exercise 1 and Exercise 2. Define $U_i = V_i - V_{i-1}$ for $i \in \mathbb{N}_+$. Show that $U = (U_1, U_2, \ldots)$ is a sequence of independent, identically distributed variables and that $V$ is the partial sum process associated with $U$.
Thus, partial sum processes are the only discrete-time random processes that have stationary, independent increments. An interesting and much harder problem is to characterize the continuous-time processes that have stationary, independent increments. The Poisson counting process has stationary, independent increments, as does the Brownian motion process.
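Suppose that $n \in \mathbb{N}$. Use basic properties of expected value and variance to show that
\[ E(Y_n) = n \mu, \quad \operatorname{var}(Y_n) = n \sigma^2 \]
Suppose that $m \in \mathbb{N}$ and $n \in \mathbb{N}$ with $m \le n$. Use basic properties of covariance and the stationary and independence properties to verify the following results. Hint: Note that $Y_n = Y_m + (Y_n - Y_m)$.
\[ \operatorname{cov}(Y_m, Y_n) = m \sigma^2, \quad \operatorname{cor}(Y_m, Y_n) = \sqrt{\frac{m}{n}}, \quad E(Y_m Y_n) = m \sigma^2 + m n \mu^2 \]
Suppose that $X$ has moment generating function $G$. Show that $Y_n$ has moment generating function $G^n$.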
Suppose that $X$ has either a discrete distribution or a continuous distribution with probability density function $f$. Recall that the probability density function of $Y_n$ is $f^{*n} = f * f * \cdots * f$, the convolution power of $f$ of order $n$.
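For a discrete distribution, the convolution power can be computed directly. The sketch below represents a probability density function as a dictionary from values to probabilities and convolves it with itself repeatedly. The fair die and the function names are assumptions made for the example.

```python
from fractions import Fraction

def convolve(f, g):
    """Convolution of two discrete densities given as {value: probability}."""
    h = {}
    for x, p in f.items():
        for y, q in g.items():
            h[x + y] = h.get(x + y, 0) + p * q
    return h

def convolution_power(f, n):
    """Density of Y_n, the n-fold convolution of f (point mass at 0 if n = 0)."""
    result = {0: Fraction(1)}  # density of Y_0 = 0, the identity for convolution
    for _ in range(n):
        result = convolve(result, f)
    return result

die = {k: Fraction(1, 6) for k in range(1, 7)}  # fair die, an assumed example
two_dice = convolution_power(die, 2)
print(two_dice[7])  # probability that two fair dice sum to 7
```

Exact rational arithmetic is used so that the probabilities sum to exactly 1, which makes the result easy to check by hand for small cases.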
More generally, we can use the stationary and independence properties to find the joint distributions of the partial sum process:
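Suppose that $n_1 < n_2 < \cdots < n_k$. Show that $(Y_{n_1}, Y_{n_2}, \ldots, Y_{n_k})$ has joint probability density function
\[ f_{n_1, n_2, \ldots, n_k}(y_1, y_2, \ldots, y_k) = f^{*n_1}(y_1) \, f^{*(n_2 - n_1)}(y_2 - y_1) \cdots f^{*(n_k - n_{k-1})}(y_k - y_{k-1}) \]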
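We will now make the central limit theorem precise. From Exercise 4, we cannot expect $Y_n$ itself to have a limiting distribution. Note that $\operatorname{var}(Y_n) = n \sigma^2 \to \infty$ as $n \to \infty$ when $\sigma > 0$, and $E(Y_n) = n \mu \to \infty$ as $n \to \infty$ if $\mu > 0$ while $E(Y_n) \to -\infty$ if $\mu < 0$.
Thus, to obtain a limiting distribution that is not degenerate, we need to consider, not $Y_n$ itself, but the standard score of $Y_n$. Thus, let
\[ Z_n = \frac{Y_n - n \mu}{\sqrt{n} \, \sigma} \]
Show that $E(Z_n) = 0$ and $\operatorname{var}(Z_n) = 1$.
Show that $Z_n$ is also the standard score of the sample mean $M_n$:
\[ Z_n = \frac{M_n - \mu}{\sigma / \sqrt{n}} \]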
The central limit theorem states that the distribution of the standard score $Z_n$ converges to the standard normal distribution as $n \to \infty$. A special case of the central limit theorem (for Bernoulli trials) dates to Abraham De Moivre. The term central limit theorem was coined by George Pólya in 1920.
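We need to show that $F_n(z) \to \Phi(z)$ as $n \to \infty$ for each $z \in \mathbb{R}$, where $F_n$ is the distribution function of $Z_n$ and $\Phi$ the distribution function of the standard normal distribution. Equivalently we will show that
\[ \chi_n(t) \to e^{-t^2 / 2} \text{ as } n \to \infty \text{ for each } t \in \mathbb{R} \]
where $\chi_n$ is the characteristic function of $Z_n$, and the expression on the right is the characteristic function of the standard normal distribution.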
The following exercises sketch the proof of the central limit theorem. Ultimately, the proof hinges on a generalization of a famous limit from calculus.
Suppose that $a_n \to a$ as $n \to \infty$. Show that
\[ \left(1 + \frac{a_n}{n}\right)^n \to e^a \text{ as } n \to \infty \]
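The limit in question, $(1 + a_n / n)^n \to e^a$ whenever $a_n \to a$, can be checked numerically before proving it. The sketch below uses the hypothetical sequence $a_n = a + 1/n$ with $a = -1/2$; any other convergent sequence could be substituted.

```python
import math

def power_sequence(a_n, n):
    """Evaluate (1 + a_n / n) ** n for a sequence given as a function of n."""
    return (1 + a_n(n) / n) ** n

a = -0.5

def a_n(n):
    """Hypothetical sequence converging to a."""
    return a + 1 / n

# The gap between the n-th term and exp(a) should shrink as n grows.
for n in (10, 1_000, 100_000):
    print(n, power_sequence(a_n, n), math.exp(a))
```

The value $a = -1/2$ is not an accident: with $t = 1$, it is the exponent that appears in the characteristic function of the standard normal distribution.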
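Now let $\chi$ denote the characteristic function of the standard score of a sample variable $X$, and let $\chi_n$ denote the characteristic function of the standard score $Z_n$:
\[ \chi(t) = E\left[\exp\left(i t \frac{X - \mu}{\sigma}\right)\right], \quad \chi_n(t) = E\left[\exp(i t Z_n)\right], \quad t \in \mathbb{R} \]
Show that $\chi(0) = 1$, $\chi'(0) = 0$, and $\chi''(0) = -1$.
Show that
\[ Z_n = \frac{1}{\sqrt{n}} \sum_{i=1}^n \frac{X_i - \mu}{\sigma} \]
Use properties of characteristic functions to show that
\[ \chi_n(t) = \chi^n\left(\frac{t}{\sqrt{n}}\right), \quad t \in \mathbb{R} \]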
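Use Taylor's theorem (named after Brook Taylor) to show that
\[ \chi\left(\frac{t}{\sqrt{n}}\right) = 1 + \frac{1}{2} \chi''(s_n) \frac{t^2}{n} \text{ where } |s_n| \le \frac{|t|}{\sqrt{n}} \]
In the context of the previous exercise, show that $s_n \to 0$ and hence $\chi''(s_n) \to -1$ as $n \to \infty$.
Finally, show that
\[ \chi_n(t) = \left[1 + \frac{1}{2} \chi''(s_n) \frac{t^2}{n}\right]^n \to e^{-t^2 / 2} \text{ as } n \to \infty \]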
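The convergence of characteristic functions can be watched numerically in a case where everything is known in closed form. As an assumed example, take the exponential distribution with mean 1 and variance 1, whose characteristic function is $1 / (1 - i t)$. The standard score of a sample variable is then $X - 1$, with characteristic function $e^{-i t} / (1 - i t)$, and the standard score of the partial sum has characteristic function $\chi^n(t / \sqrt{n})$.

```python
import cmath
import math

def chi(t):
    """Characteristic function of X - 1, the standard score of X ~ exponential(1)."""
    return cmath.exp(-1j * t) / (1 - 1j * t)

def chi_n(t, n):
    """Characteristic function of the standard score of Y_n: chi(t / sqrt(n)) ** n."""
    return chi(t / math.sqrt(n)) ** n

t = 1.5
target = math.exp(-t * t / 2)  # standard normal characteristic function at t

# The distance to the normal characteristic function should shrink with n.
for n in (1, 10, 100, 10_000):
    print(n, abs(chi_n(t, n) - target))
```

For this distribution the distance decays at a rate of roughly $1 / \sqrt{n}$, reflecting the skewness of the exponential distribution.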
The central limit theorem implies that if the sample size $n$ is large then the distribution of the partial sum $Y_n$ is approximately normal with mean $n \mu$ and variance $n \sigma^2$. Equivalently the sample mean $M_n$ is approximately normal with mean $\mu$ and variance $\sigma^2 / n$.
The central limit theorem is of fundamental importance, because it means that we can approximate the distribution of certain statistics, even if we know very little about the underlying sampling distribution.
Of course, the term "large" is relative. Roughly, the more "abnormal" the basic distribution, the larger $n$ must be for normal approximations to work well. The rule of thumb is that a sample size $n$ of at least 30 will usually suffice, although for many distributions smaller $n$ will do.
Let denote the sum of the variables in a random sample of size 30 from the uniform distribution on . Find normal approximations to each of the following:
Let denote the sample mean of a random sample of size 50 from the distribution with probability density function . This is a Pareto distribution, named for Vilfredo Pareto. Find normal approximations to each of the following:
A slight technical problem arises when the sampling distribution is discrete. In this case, the partial sum also has a discrete distribution, and hence we are approximating a discrete distribution with a continuous one.
Suppose that $X$ takes integer values and hence so does the partial sum $Y_n$. Show that for any $k \in \mathbb{Z}$ and $h \in [0, 1)$, the event $\{k - h \le Y_n \le k + h\}$ is equivalent to the event $\{Y_n = k\}$.
In the context of the previous exercise, different values of $h$ lead to different normal approximations, even though the events are equivalent. The smallest approximation would be 0 when $h = 0$, and the approximations increase as $h$ increases. It is customary to split the difference by using $h = \frac{1}{2}$ for the normal approximation. This is sometimes called the continuity correction. The continuity correction is extended to other events in the natural way, using the additivity of probability.
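The effect of the half-width on the normal approximation can be seen in a small computation. The sketch below assumes a sum of 20 fair dice (the number of dice and the target value 70 are hypothetical choices), computes the exact probability of a single value by repeated convolution, and compares it with the normal probability of the interval of half-width $h$ around that value.

```python
from statistics import NormalDist

def dice_sum_pmf(n):
    """Exact density of the sum of n fair dice, as {value: probability}."""
    pmf = {0: 1.0}
    for _ in range(n):
        new = {}
        for total, p in pmf.items():
            for face in range(1, 7):
                new[total + face] = new.get(total + face, 0.0) + p / 6
        pmf = new
    return pmf

n = 20                  # hypothetical number of dice
mu, var = 3.5, 35 / 12  # mean and variance of a single fair die
normal = NormalDist(n * mu, (n * var) ** 0.5)
pmf = dice_sum_pmf(n)

def normal_approx(k, h):
    """Normal approximation to P(k - h <= Y_n <= k + h)."""
    return normal.cdf(k + h) - normal.cdf(k - h)

k = 70
for h in (0.0, 0.25, 0.5, 0.75):
    print(h, normal_approx(k, h))
print("exact", pmf[k])
```

Among the half-widths tried here, $h = 1/2$ should land closest to the exact probability, which is the motivation for the customary choice.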
In the dice experiment, set the die distribution to fair, select the sum random variable , and set . Run the simulation 1000 times, updating every 10 runs. Compute the following and compare with the result in the previous exercise:
If $Y$ has the gamma distribution with shape parameter $k \in \mathbb{N}_+$ and scale parameter $b > 0$ then
\[ Y = \sum_{i=1}^k X_i \]
where $X = (X_1, X_2, \ldots)$ is a sequence of independent variables, each having the exponential distribution with scale parameter $b$. Since $E(X_i) = b$ and $\operatorname{var}(X_i) = b^2$, it follows that if $k$ is large, the gamma distribution can be approximated by the normal distribution with mean $k b$ and variance $k b^2$. The same statement actually holds when $k$ is not an integer; more precisely, the distribution of the standardized variable below converges to the standard normal distribution as $k \to \infty$.
\[ Z_k = \frac{Y - k b}{\sqrt{k} \, b} \]
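For integer shape parameter the gamma distribution function has a closed form, so the quality of the normal approximation can be measured exactly. The sketch below evaluates both at the mean, where the normal approximation always gives 1/2. The scale parameter 2 and the shape values are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def erlang_cdf(y, k, b):
    """P(Y <= y) for the gamma distribution with integer shape k and scale b."""
    if y <= 0:
        return 0.0
    x = y / b
    # P(Y > y) is the probability of fewer than k arrivals in a Poisson(x) count.
    term, tail = math.exp(-x), 0.0
    for j in range(k):
        tail += term
        term *= x / (j + 1)
    return 1.0 - tail

def gamma_normal_approx(y, k, b):
    """Normal approximation with mean k * b and standard deviation sqrt(k) * b."""
    return NormalDist(k * b, math.sqrt(k) * b).cdf(y)

b = 2.0  # assumed scale parameter
for k in (5, 50, 500):
    y = k * b  # evaluate at the mean
    print(k, erlang_cdf(y, k, b), gamma_normal_approx(y, k, b))
```

The exact value at the mean exceeds 1/2 because the gamma distribution is skewed to the right, and the gap should narrow as the shape parameter grows.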
In the gamma experiment, vary and and note the shape of the probability density function. With and , run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the empirical density function to the true density function.
Suppose that has the gamma distribution with shape parameter and scale parameter . Find normal approximations to each of the following:
The chi-square distribution with $n$ degrees of freedom is the gamma distribution with shape parameter $n / 2$ and scale parameter 2. From the previous subsection, it follows that if $n$ is large the chi-square distribution can be approximated by the normal distribution with mean $n$ and variance $2 n$. More precisely, if $Y_n$ has the chi-square distribution with $n$ degrees of freedom, then the distribution of the standardized variable below converges to the standard normal distribution as $n \to \infty$:
\[ Z_n = \frac{Y_n - n}{\sqrt{2 n}} \]
In the chi-square experiment, vary and note the shape of the density function. With , run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the empirical density function to the probability density function.
Suppose that has the chi-square distribution with . Find normal approximations to each of the following:
If $Y_n$ has the binomial distribution with trial parameter $n$ and success parameter $p$, then
\[ Y_n = \sum_{i=1}^n X_i \]
where $X = (X_1, X_2, \ldots)$ is a Bernoulli trials sequence with success parameter $p$, that is, a sequence of independent indicator variables with $P(X_i = 1) = p$ for each $i$. It follows that if $n$ is large, the binomial distribution with parameters $n$ and $p$ can be approximated by the normal distribution with mean $n p$ and variance $n p (1 - p)$. The rule of thumb is that $n$ should be large enough for $n p \ge 5$ and $n (1 - p) \ge 5$. More precisely, the distribution of the standardized variable given below converges to the standard normal distribution as $n \to \infty$:
\[ Z_n = \frac{Y_n - n p}{\sqrt{n p (1 - p)}} \]
In the binomial timeline experiment, vary and and note the shape of the probability density function. With and , run the simulation 1000 times, updating every 10 runs. Compute the following:
Suppose that has the binomial distribution with parameters and . Compute the normal approximation to (don't forget the continuity correction) and compare with the results of the previous exercise.
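A computation of this kind can be sketched as follows. The parameters $n = 50$ and $p = 0.3$ and the event $\{Y \le 12\}$ are hypothetical, chosen only to illustrate the method; they are not the values of the exercise. The exact binomial probability is compared with the normal approximation, with and without the continuity correction.

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(Y <= k) for the binomial distribution with parameters n and p."""
    return sum(math.comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k + 1))

def normal_approx_cdf(k, n, p, correction=True):
    """Normal approximation to P(Y <= k), optionally with continuity correction."""
    dist = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
    return dist.cdf(k + 0.5 if correction else k)

n, p, k = 50, 0.3, 12  # hypothetical parameters and event
print("exact             ", binomial_cdf(k, n, p))
print("with correction   ", normal_approx_cdf(k, n, p))
print("without correction", normal_approx_cdf(k, n, p, correction=False))
```

Here $n p = 15$ and $n (1 - p) = 35$, so the usual rule of thumb for the binomial approximation is satisfied, and the corrected approximation should be noticeably closer to the exact value.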
If $Y_n$ has the Poisson distribution with parameter $n \in \mathbb{N}_+$, then
\[ Y_n = \sum_{i=1}^n X_i \]
where $X = (X_1, X_2, \ldots)$ is a sequence of independent variables, each with the Poisson distribution with parameter 1. Since $E(X_i) = \operatorname{var}(X_i) = 1$, it follows from the central limit theorem that if $n$ is large, the Poisson distribution with parameter $n$ can be approximated by the normal distribution with mean $n$ and variance $n$. The same statement holds when the parameter $n$ is not an integer; more precisely, the distribution of the standardized variable below converges to the standard normal distribution as $n \to \infty$.
\[ Z_n = \frac{Y_n - n}{\sqrt{n}} \]
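To see the approximation improve as the parameter grows, including for non-integer parameters, one can compute the largest gap between the exact Poisson distribution function and the normal distribution function with continuity correction. The parameter values below are assumptions chosen for illustration.

```python
import math
from statistics import NormalDist

def max_cdf_error(lam):
    """Largest |P(Y <= k) - normal approximation| over k, for Y ~ Poisson(lam)."""
    approx = NormalDist(lam, math.sqrt(lam))
    term, cdf, worst = math.exp(-lam), 0.0, 0.0
    upper = int(lam + 10 * math.sqrt(lam)) + 1  # far enough into the right tail
    for k in range(upper + 1):
        cdf += term            # cdf is now P(Y <= k)
        worst = max(worst, abs(cdf - approx.cdf(k + 0.5)))
        term *= lam / (k + 1)  # term becomes P(Y = k + 1)
    return worst

for lam in (2.5, 20.5, 200.5):  # non-integer parameters, chosen for illustration
    print(lam, max_cdf_error(lam))
```

The maximum error should shrink roughly in proportion to the reciprocal of the square root of the parameter.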
Suppose that $Y$ has the Poisson distribution with mean 20.
In the Poisson experiment, vary the time and rate parameters and (the parameter of the Poisson distribution in the experiment is the product ). Note the shape of the density function. With and , run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the empirical density function to the probability density function.
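If $Y_k$ has the negative binomial distribution with trial parameter $k \in \mathbb{N}_+$ and success parameter $p \in (0, 1)$ then
\[ Y_k = \sum_{i=1}^k X_i \]
where $X = (X_1, X_2, \ldots)$ is a sequence of independent variables, each having the geometric distribution on $\mathbb{N}_+$ with success parameter $p$. Since $E(X_i) = 1 / p$ and $\operatorname{var}(X_i) = (1 - p) / p^2$, it follows that if $k$ is large, the negative binomial distribution can be approximated by the normal distribution with mean $k / p$ and variance $k (1 - p) / p^2$. More precisely, the distribution of the standardized variable below converges to the standard normal distribution as $k \to \infty$.
\[ Z_k = \frac{p Y_k - k}{\sqrt{k (1 - p)}} \]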
In the negative binomial experiment, vary and and note the shape of the probability density function. With and , run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the empirical density function to the true density function.
Suppose that has the negative binomial distribution with trial parameter and success parameter . Find normal approximations to each of the following:
Suppose now that $N$ is a random variable taking values in $\mathbb{N}$, with finite mean and variance. Then
\[ Y_N = \sum_{i=1}^N X_i \]
is a random sum of the independent, identically distributed variables. That is, the terms are random of course, but so also is the number of terms $N$. We are primarily interested in the moments of $Y_N$.
Suppose first that $N$, the number of terms, is independent of $X$, the sequence of terms. Computing the moments of $Y_N$ is a good exercise in conditional expectation.
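Show that
\[ E(Y_N \mid N) = N \mu, \quad E(Y_N) = E(N) \mu \]
Show that
\[ \operatorname{var}(Y_N \mid N) = N \sigma^2, \quad \operatorname{var}(Y_N) = E(N) \sigma^2 + \operatorname{var}(N) \mu^2 \]
Let $H$ denote the probability generating function of $N$, and let $G$ denote the moment generating function of $X$. Show that the moment generating function of $Y_N$ is $H \circ G$.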
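When the number of terms is independent of the terms, the standard moment identities for a random sum are $E(Y_N) = E(N) \mu$ and $\operatorname{var}(Y_N) = E(N) \sigma^2 + \operatorname{var}(N) \mu^2$. These can be verified exactly on a small example by building the distribution of the random sum as a mixture of convolution powers. The fair die for the terms and the particular distribution of the number of terms are assumptions made for illustration.

```python
from fractions import Fraction

def convolve(f, g):
    """Convolution of two discrete densities given as {value: probability}."""
    h = {}
    for x, p in f.items():
        for y, q in g.items():
            h[x + y] = h.get(x + y, 0) + p * q
    return h

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    m = mean(pmf)
    return sum((x - m) ** 2 * p for x, p in pmf.items())

def random_sum_pmf(n_pmf, x_pmf):
    """Density of Y_N when N (density n_pmf) is independent of the terms."""
    result = {}
    power = {0: Fraction(1)}  # density of Y_0
    for n in range(max(n_pmf) + 1):
        weight = n_pmf.get(n, 0)  # P(N = n)
        for y, p in power.items():
            result[y] = result.get(y, 0) + weight * p
        power = convolve(power, x_pmf)  # density of Y_{n + 1}
    return result

x_pmf = {k: Fraction(1, 6) for k in range(1, 7)}       # fair die (assumed)
n_pmf = {n: Fraction(n + 1, 10) for n in range(0, 4)}  # hypothetical law of N
y_pmf = random_sum_pmf(n_pmf, x_pmf)

print(mean(y_pmf), mean(n_pmf) * mean(x_pmf))
print(variance(y_pmf),
      mean(n_pmf) * variance(x_pmf) + variance(n_pmf) * mean(x_pmf) ** 2)
```

Because the arithmetic is exact, the two numbers on each printed line agree exactly rather than approximately.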
Some of these results generalize to the case where the random number of terms $N$ is a stopping time for the sequence $X$. This means that the event $\{N = n\}$ depends only on (technically, is measurable with respect to) $(X_1, X_2, \ldots, X_n)$ for each $n \in \mathbb{N}$.
Prove Wald's equation, named after Abraham Wald: $E(Y_N) = E(N) \mu$.
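Wald's equation, $E(Y_N) = E(N) \mu$, can be illustrated by simulation with a stopping time that is clearly not independent of the terms. In the sketch below, a fair die is rolled until the first 6 appears, so the number of rolls has mean 6 and the sum of the rolls should have mean $6 \times 3.5 = 21$. The die, the stopping rule, the seed, and the number of runs are all assumptions chosen for the example.

```python
import random

def roll_until_six(rng):
    """Roll a fair die until the first 6; return (N, Y_N)."""
    n = total = 0
    while True:
        x = rng.randint(1, 6)
        n += 1
        total += x
        if x == 6:  # whether N = n is decided by the first n rolls alone
            return n, total

rng = random.Random(7)  # fixed seed so that the run is repeatable
runs = [roll_until_six(rng) for _ in range(100_000)]
mean_n = sum(n for n, _ in runs) / len(runs)
mean_y = sum(y for _, y in runs) / len(runs)

# Empirical E(N), empirical E(Y_N), and the Wald prediction E(N) * mu.
print(mean_n, mean_y, mean_n * 3.5)
```

Note that given the number of rolls, the last roll is always 6 and the earlier rolls are never 6, so the terms are not independent of the number of terms; the equation holds anyway because the number of rolls is a stopping time.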