Suppose that \(X_n\), for \(n = 1, 2, \ldots\), and \(X\) are real-valued random variables with distribution functions \(F_n\) and \(F\), respectively. We say that the distribution of \(X_n\) converges to the distribution of \(X\) as \(n \to \infty\) if
\[ F_n(x) \to F(x) \text{ as } n \to \infty \]
for all \(x\) at which \(F\) is continuous. The first fact to notice is that convergence in distribution, as the name suggests, only involves the distributions of the random variables. Thus, the random variables need not even be defined on the same probability space (that is, they need not be defined for the same random experiment). This is in sharp contrast to the other modes of convergence we have studied: convergence with probability 1, convergence in probability, and convergence in mean.
We will show, in fact, that convergence in distribution is the weakest of all of these modes of convergence. It is nonetheless very important. The central limit theorem, one of the two fundamental theorems of probability, is a theorem about convergence in distribution.
The examples below show why the definition is given in terms of distribution functions, rather than density functions, and why convergence is only required at the points of continuity of the limiting distribution function.
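Let \(X_n = 1/n\) for \(n = 1, 2, \ldots\) and let \(X = 0\). Let \(f_n\) and \(f\) be the corresponding density functions and let \(F_n\) and \(F\) be the corresponding distribution functions. Show that \(f_n(x) \to 0\) as \(n \to \infty\) for every \(x\); that \(F_n(x) \to 0\) if \(x \le 0\) and \(F_n(x) \to 1\) if \(x \gt 0\); and hence that \(F_n(x) \to F(x)\) as \(n \to \infty\) for all \(x \ne 0\), the only point at which \(F\) is discontinuous.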
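Suppose that \(X_n\) has the discrete uniform distribution on \(\{1/n, 2/n, \ldots, (n-1)/n, 1\}\) for each \(n\). Let \(X\) have the continuous uniform distribution on the interval \([0, 1]\). Show that the distribution of \(X_n\) converges to the distribution of \(X\) as \(n \to \infty\).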
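The convergence in this exercise can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the discrete uniform distribution to be on \(\{1/n, 2/n, \ldots, 1\}\) and the limit to be uniform on \([0, 1]\), and the function names are hypothetical. It measures the largest gap between the two distribution functions over a grid of points.

```python
import math

def discrete_uniform_cdf(x, n):
    """Distribution function of the uniform distribution on {1/n, 2/n, ..., 1} (assumed support)."""
    if x < 0:
        return 0.0
    return min(math.floor(n * x), n) / n

def uniform_cdf(x):
    """Distribution function of the continuous uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)

def max_cdf_gap(n, grid=1000):
    """Largest gap between the two distribution functions over a grid on [0, 1]."""
    xs = [i / grid for i in range(grid + 1)]
    return max(abs(discrete_uniform_cdf(x, n) - uniform_cdf(x)) for x in xs)

for n in (10, 100, 1000):
    print(n, max_cdf_gap(n))
```

Since the discrete distribution function is \(\lfloor n x \rfloor / n\) on \([0, 1]\), the gap never exceeds \(1/n\), which is why it shrinks as \(n\) grows.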
As Exercise 2 shows, it is quite possible to have a sequence of discrete distributions converge to a continuous distribution (or the other way around). Recall that probability density functions have very different meanings in the discrete and continuous cases: density with respect to counting measure in the first case, and density with respect to Lebesgue measure in the second case. This is another indication that distribution functions, rather than density functions, are the correct objects of study. However, if probability density functions of a fixed type converge then the distributions converge. The following results are a consequence of Scheffé's theorem, which is given in advanced topics below.
Suppose that \(f_n\) and \(f\) are probability density functions for discrete distributions on a countable set \(S\), and that \(f_n(x) \to f(x)\) as \(n \to \infty\) for each \(x \in S\). Then the distribution defined by \(f_n\) converges to the distribution defined by \(f\) as \(n \to \infty\). Similarly, suppose that \(f_n\) and \(f\) are probability density functions for continuous distributions on \(\mathbb{R}\), and that \(f_n(x) \to f(x)\) as \(n \to \infty\) for all \(x \in \mathbb{R}\) (except perhaps on a set with Lebesgue measure 0). Then the distribution defined by \(f_n\) converges to the distribution defined by \(f\) as \(n \to \infty\).
Suppose that \(X_n\) and \(X\) are random variables (defined on the same probability space) with distribution functions \(F_n\) and \(F\), respectively. Show that if \(X_n \to X\) as \(n \to \infty\) in probability, then the distribution of \(X_n\) converges to the distribution of \(X\) as \(n \to \infty\).
Our next example shows that even when the variables are defined on the same probability space, a sequence can converge in distribution, but not in any other way.
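Let \(X\) be an indicator variable with \(\mathbb{P}(X = 0) = \mathbb{P}(X = 1) = \frac{1}{2}\), and let \(X_n = X\) for \(n = 1, 2, \ldots\). Show that \(1 - X\) has the same distribution as \(X\), so that the distribution of \(X_n\) converges to the distribution of \(1 - X\) as \(n \to \infty\); but \(|X_n - (1 - X)| = 1\) for every \(n\), so \(X_n\) does not converge to \(1 - X\) in probability, in mean, or with probability 1.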
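To summarize, we have the following implications for the various modes of convergence, and no other implications hold in general: convergence with probability 1 implies convergence in probability; convergence in mean implies convergence in probability; convergence in probability implies convergence in distribution.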
However, the following exercise gives an important converse to the last implication in the summary above, when the limiting variable is a constant. Of course, a constant can be viewed as a random variable defined on any probability space.
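Suppose that \(X_1, X_2, \ldots\) is a sequence of random variables (defined on the same probability space) and that the distribution of \(X_n\) converges to the distribution of the constant \(c\) as \(n \to \infty\). Show that \(X_n \to c\) as \(n \to \infty\) in probability; that is, \(\mathbb{P}(|X_n - c| \gt \epsilon) \to 0\) as \(n \to \infty\) for every \(\epsilon \gt 0\).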
There are several important cases where a special distribution converges to another special distribution as a parameter approaches a limiting value. Indeed, such convergence results are part of the reason why such distributions are special in the first place.
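Recall that the hypergeometric distribution with parameters \(m\), \(r\), and \(n\) is the distribution that governs the number of type 1 objects in a sample of size \(n\), drawn without replacement from a population of \(m\) objects with \(r\) of type 1. It has discrete probability density function
\[ f(k) = \frac{\binom{r}{k} \binom{m - r}{n - k}}{\binom{m}{n}}, \quad k \in \{0, 1, \ldots, n\} \]
Suppose that \(r = r_m\) depends on \(m\), and that \(r_m / m \to p\) as \(m \to \infty\). Show that for fixed \(n\), the hypergeometric distribution with parameters \(m\), \(r_m\), and \(n\) converges to the binomial distribution with parameters \(n\) and \(p\) as \(m \to \infty\).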
From a practical point of view, the result in the last exercise means that if the population size \(m\) is large compared to the sample size \(n\), then the hypergeometric distribution with parameters \(m\), \(r\), and \(n\) (which corresponds to sampling without replacement) is well approximated by the binomial distribution with parameters \(n\) and \(p = r / m\) (which corresponds to sampling with replacement). This is often a useful result, because the binomial distribution has fewer parameters than the hypergeometric distribution (and often in real problems, the parameters may only be known approximately). Specifically, in the limiting binomial distribution, we do not need to know the population size \(m\) and the number of type 1 objects \(r\) individually, but only in the ratio \(r / m\).
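The quality of the binomial approximation can be explored numerically. The following is a minimal sketch, not a definitive implementation: the symbols `m` (population size), `r` (number of type 1 objects), and `n` (sample size) and the function names are assumptions chosen for illustration. It compares the two density functions for a fixed sample size and a fixed ratio `r / m` as the population grows.

```python
from math import comb

def hypergeom_pmf(k, m, r, n):
    """P(k type 1 objects) when n objects are drawn without replacement from m, r of type 1."""
    return comb(r, k) * comb(m - r, n - k) / comb(m, n)

def binom_pmf(k, n, p):
    """P(k successes) in n Bernoulli trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeom_binom_gap(m, r, n):
    """Largest pointwise gap between the hypergeometric and binomial densities."""
    p = r / m
    return max(abs(hypergeom_pmf(k, m, r, n) - binom_pmf(k, n, p)) for k in range(n + 1))

# Hold the sample size and the ratio r / m = 0.3 fixed while the population grows.
for m in (50, 500, 5000):
    print(m, hypergeom_binom_gap(m, 3 * m // 10, 10))
```

Only the ratio `r / m` enters the binomial side of the comparison, which mirrors the remark that the limiting distribution needs one fewer parameter.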
In the ball and urn experiment, fix values of \(m\) (the population size) and \(r\) (the number of type 1 objects). For each of several values of \(n\) (the sample size), switch between sampling without replacement (the hypergeometric distribution) and sampling with replacement (the binomial distribution). Note the difference in the density function. Run the simulation 1000 times with an update frequency of 10 for each sampling mode.
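Recall that the binomial distribution with parameters \(n\) and \(p\) is the distribution of the number of successes in \(n\) Bernoulli trials, when \(p\) is the probability of success on a trial. This distribution has probability density function
\[ f(k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k \in \{0, 1, \ldots, n\} \]
Recall also that the Poisson distribution with rate parameter \(\lambda\) has probability density function
\[ g(k) = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k \in \{0, 1, 2, \ldots\} \]
Suppose now that the success parameter \(p_n\) in the binomial distribution depends on the trial parameter \(n\) and that \(n p_n \to \lambda\) as \(n \to \infty\) where \(\lambda \gt 0\). Show that this binomial distribution converges to the Poisson distribution with parameter \(\lambda\) as \(n \to \infty\).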
From a practical point of view, the result in the last exercise means that if the number of trials \(n\) is large and the probability of success \(p\) is small, so that the product \(n p\) is of moderate size, then the binomial distribution with parameters \(n\) and \(p\) is well approximated by the Poisson distribution with parameter \(n p\). This is often a useful result, because the Poisson distribution has fewer parameters than the binomial distribution (and often in real problems, the parameters may only be known approximately). Specifically, in the limiting Poisson distribution, we do not need to know the number of trials \(n\) and the probability of success \(p\) individually, but only in the product \(n p\).
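A short numerical check makes the Poisson approximation concrete. This is a sketch under stated assumptions: the rate is written `lam`, the success probability is taken to be exactly `lam / n`, and the function names are hypothetical. It reports the largest pointwise gap between the binomial and Poisson density functions over the first few values of `k`.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(k successes) in n Bernoulli trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Poisson density with rate parameter lam."""
    return exp(-lam) * lam**k / factorial(k)

def binom_poisson_gap(n, lam, kmax=30):
    """Largest pointwise gap for k = 0, ..., min(n, kmax), with p = lam / n (an assumption)."""
    p = lam / n
    return max(
        abs(binomial_pmf(k, n, p) - poisson_pmf(k, lam))
        for k in range(min(n, kmax) + 1)
    )

for n in (10, 100, 1000):
    print(n, binom_poisson_gap(n, 2.0))
```

Here the product `n * p` is held at 2 while `n` grows, which is the regime described above: many trials, each with a small probability of success.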
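Suppose that \(U\) is a random variable with probability density function \(f(k) = p (1 - p)^{k - 1}\) for \(k \in \{1, 2, \ldots\}\), where \(p \in (0, 1]\) is a parameter. Thus, \(U\) has the geometric distribution on \(\mathbb{N}_+ = \{1, 2, \ldots\}\) with parameter \(p\). Random variable \(U\) can be interpreted as the trial number of the first success in a sequence of Bernoulli trials.
Suppose that \(U_n\) has the geometric distribution on \(\mathbb{N}_+\) with success parameter \(p_n\). Moreover, suppose that \(n p_n \to \lambda\) as \(n \to \infty\) where \(\lambda \gt 0\). Show that the distribution of \(U_n / n\) converges to the exponential distribution with parameter \(\lambda\) as \(n \to \infty\).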
Note that the limiting condition on and in the last exercise is precisely the same as the condition for the convergence of the binomial distribution to the Poisson discussed above. For a deeper interpretation of both of these results, see the section on the Bernoulli trials and the Poisson process.
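The geometric-to-exponential limit can also be checked on the level of distribution functions. The sketch below assumes the scaled variable is the geometric trial number divided by `n`, with success probability exactly `lam / n`; the names `lam`, `scaled_geometric_cdf`, and `geom_exp_gap` are hypothetical. It uses the fact that a geometric variable on \(\{1, 2, \ldots\}\) exceeds \(k\) with probability \((1 - p)^k\).

```python
from math import exp, floor

def scaled_geometric_cdf(x, n, lam):
    """P(U / n <= x) where U is geometric on {1, 2, ...} with success probability lam / n."""
    if x < 0:
        return 0.0
    p = lam / n
    return 1.0 - (1.0 - p) ** floor(n * x)

def exponential_cdf(x, lam):
    """Distribution function of the exponential distribution with rate lam."""
    return 1.0 - exp(-lam * x) if x >= 0 else 0.0

def geom_exp_gap(n, lam, grid=400, xmax=4.0):
    """Largest gap between the two distribution functions over a grid on [0, xmax]."""
    xs = [xmax * i / grid for i in range(grid + 1)]
    return max(abs(scaled_geometric_cdf(x, n, lam) - exponential_cdf(x, lam)) for x in xs)

for n in (10, 100, 1000):
    print(n, geom_exp_gap(n, 1.5))
```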
Consider a random permutation \((X_1, X_2, \ldots, X_n)\) of the elements in the set \(\{1, 2, \ldots, n\}\). We say that a match occurs at position \(i\) if \(X_i = i\).
Show that \(\mathbb{P}(X_i = i) = \frac{1}{n}\) for each \(i\). Thus, the matching events all have the same probability, which varies inversely with the number of trials.
Show, however, that \(\mathbb{P}(X_i = i, X_j = j) = \frac{1}{n (n - 1)}\) for each \(i\) and \(j\) with \(i \ne j\). Thus, the matching events are dependent, and in fact are positively correlated. In particular, the matching events do not form a sequence of Bernoulli trials.
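The matching problem is studied in detail in the chapter on Finite Sampling Models. In particular, the number of matches \(N_n\) has the following density function:
\[ \mathbb{P}(N_n = k) = \frac{1}{k!} \sum_{j=0}^{n-k} \frac{(-1)^j}{j!}, \quad k \in \{0, 1, \ldots, n\} \]
Show that the distribution of \(N_n\) converges to the Poisson distribution with parameter 1 as \(n \to \infty\).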
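The convergence of the matching distribution is very fast, as a small computation suggests. The sketch below assumes the standard inclusion-exclusion formula for the number of matches, \(\frac{1}{k!} \sum_{j=0}^{n-k} (-1)^j / j!\); the function names are hypothetical.

```python
from math import exp, factorial

def match_pmf(k, n):
    """P(exactly k matches) in a random permutation of n items (inclusion-exclusion formula)."""
    return sum((-1) ** j / factorial(j) for j in range(n - k + 1)) / factorial(k)

def poisson1_pmf(k):
    """Poisson density with parameter 1."""
    return exp(-1) / factorial(k)

def match_poisson_gap(n):
    """Largest pointwise gap between the matching density and the Poisson(1) density."""
    return max(abs(match_pmf(k, n) - poisson1_pmf(k)) for k in range(n + 1))

for n in (3, 6, 12):
    print(n, match_poisson_gap(n))
```

The truncated alternating series for \(e^{-1}\) is what drives the limit: the inner sum in `match_pmf` is a partial sum of that series.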
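Suppose that \(X_1, X_2, \ldots\) is a sequence of independent random variables, each with the standard exponential distribution. Thus, the common distribution function is
\[ G(x) = 1 - e^{-x}, \quad x \ge 0 \]
Find the distribution function of \(Y_n = \max\{X_1, X_2, \ldots, X_n\} - \ln n\).
Show that the distribution of \(Y_n\) converges to the distribution with the following distribution function as \(n \to \infty\):
\[ F(x) = e^{-e^{-x}}, \quad x \in \mathbb{R} \]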
The limiting distribution in the last exercise is the type 1 extreme value distribution, also known as the Gumbel distribution in honor of Emil Gumbel. Extreme value distributions are studied in detail in the chapter on Special Distributions.
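The Gumbel limit can be illustrated directly from distribution functions. The sketch below assumes the variable of interest is the maximum of `n` independent standard exponential variables, centered by `ln n`, so that its distribution function is \((1 - e^{-x}/n)^n\) for \(x \gt -\ln n\); the function names are hypothetical.

```python
from math import exp, log

def centered_max_cdf(x, n):
    """P(max(X_1, ..., X_n) - ln n <= x) for independent standard exponential X_i."""
    t = x + log(n)
    return (1.0 - exp(-t)) ** n if t > 0 else 0.0

def gumbel_cdf(x):
    """Distribution function of the type 1 extreme value (Gumbel) distribution."""
    return exp(-exp(-x))

def max_gumbel_gap(n):
    """Largest gap between the two distribution functions over a grid on [-5, 10]."""
    xs = [-5.0 + 0.05 * i for i in range(301)]
    return max(abs(centered_max_cdf(x, n) - gumbel_cdf(x)) for x in xs)

for n in (10, 100, 1000):
    print(n, max_gumbel_gap(n))
```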
Suppose that \(X_n\) takes values in \([1, \infty)\) for \(n \in \mathbb{N}_+\) with distribution function \(F_n(x) = 1 - 1 / x^n\) for \(x \ge 1\). Thus, \(X_n\) has the Pareto distribution with shape parameter \(n\), named for Vilfredo Pareto. The Pareto distribution is studied in more detail in the chapter on Special Distributions.
The two fundamental theorems of basic probability theory, the law of large numbers and the central limit theorem, are studied in detail in the chapter on Random Samples. For this reason we will simply state the results in this section.
Suppose that \(X_1, X_2, \ldots\) is a sequence of independent, identically distributed, real-valued random variables (defined on the same probability space) with mean \(\mu\) and standard deviation \(\sigma\). Let
\[ Y_n = \sum_{i=1}^n X_i \]
denote the sum of the first \(n\) variables. A weak version of the law of large numbers states that the distribution of the average \(Y_n / n\) converges to the point mass distribution at \(\mu\) as \(n \to \infty\). From Exercise 5, the convergence is also in probability. In fact the convergence is with probability 1 (much stronger), assuming that \(\mu\) is finite. The central limit theorem states that the distribution of the standard score
\[ Z_n = \frac{Y_n - n \mu}{\sqrt{n} \, \sigma} \]
converges to the standard normal distribution as \(n \to \infty\).
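The central limit theorem can be checked exactly in one simple case. The sketch below is an illustration under a specific assumption that is not part of the text above: the summands are Bernoulli variables with success probability `p`, so that the sum is binomial and the distribution function of the standard score can be computed without simulation. The function names are hypothetical.

```python
from math import erf, exp, floor, lgamma, log, sqrt

def std_normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def binomial_pmf_log(k, n, p):
    """Binomial density computed on the log scale, so large coefficients do not overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1.0 - p))
    return exp(log_pmf)

def standard_score_cdf(z, n, p):
    """P(Z_n <= z) where Z_n standardizes a sum of n Bernoulli(p) variables."""
    mu, sigma = p, sqrt(p * (1.0 - p))
    cutoff = min(floor(n * mu + z * sqrt(n) * sigma), n)
    # The range is empty, and the sum is 0, when the cutoff is negative.
    return sum(binomial_pmf_log(k, n, p) for k in range(cutoff + 1))

def clt_gap(n, p, zs=(-2.0, -1.0, 0.0, 1.0, 2.0)):
    """Largest gap between the standard score and standard normal distribution functions."""
    return max(abs(standard_score_cdf(z, n, p) - std_normal_cdf(z)) for z in zs)

for n in (16, 160, 1600):
    print(n, clt_gap(n, 0.3))
```

Because the binomial distribution is discrete, the gap does not decrease smoothly in `n`, but it is small once `n` is large.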
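Suppose that \(F_n\) and \(F\) are distribution functions, and that \(F_n \to F\) as \(n \to \infty\) in the sense of convergence in distribution. In this subsection we will prove the Skorohod representation theorem: there exist random variables \(X_n\) and \(X\) (defined on the same probability space) such that \(X_n\) has distribution function \(F_n\) for each \(n\), \(X\) has distribution function \(F\), and \(X_n \to X\) as \(n \to \infty\) with probability 1.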
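Prove the Skorohod representation theorem. One standard route, offered here as a hint, uses quantile functions: let \(U\) be uniformly distributed on \((0, 1)\), set \(X_n = F_n^{-1}(U)\) and \(X = F^{-1}(U)\), and show that \(F_n^{-1}(u) \to F^{-1}(u)\) as \(n \to \infty\) at every \(u \in (0, 1)\) where \(F^{-1}\) is continuous.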
The following important result illustrates the value of the Skorohod representation.
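Suppose that \(X_n\) and \(X\) are real-valued random variables such that the distribution of \(X_n\) converges to the distribution of \(X\) as \(n \to \infty\). If \(g\) is a continuous function from \(\mathbb{R}\) into \(\mathbb{R}\), then the distribution of \(g(X_n)\) converges to the distribution of \(g(X)\) as \(n \to \infty\).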
The following exercise gives Scheffé's theorem, named after Henry Scheffé.
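Suppose that \(f_n\) is a probability density function for a continuous distribution on \(\mathbb{R}\) for each \(n\), and that \(f\) is a probability density function for a continuous distribution on \(\mathbb{R}\). Suppose that \(f_n(x) \to f(x)\) as \(n \to \infty\) for all \(x \in \mathbb{R}\), except perhaps on a set of Lebesgue measure 0. Then \(\int_A f_n(x) \, dx \to \int_A f(x) \, dx\) as \(n \to \infty\) uniformly in (measurable) \(A \subseteq \mathbb{R}\).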
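One common way to organize the argument is sketched below. It is offered as an outline rather than as the intended solution, and the notation is an assumption: \(f_n\) and \(f\) denote the densities, \(g_n = f - f_n\), and \(g_n^+\) and \(g_n^-\) denote the positive and negative parts of \(g_n\).

```latex
\begin{align*}
g_n &= f - f_n, \qquad \int_{\mathbb{R}} g_n(x) \, dx = 0
  \;\Longrightarrow\; \int_{\mathbb{R}} g_n^+(x) \, dx = \int_{\mathbb{R}} g_n^-(x) \, dx, \\
0 &\le g_n^+ \le f, \quad g_n^+ \to 0 \text{ almost everywhere}
  \;\Longrightarrow\; \int_{\mathbb{R}} g_n^+(x) \, dx \to 0
  \quad \text{(dominated convergence)}, \\
\left| \int_A f_n(x) \, dx - \int_A f(x) \, dx \right|
  &\le \max\left\{ \int_A g_n^+(x) \, dx, \; \int_A g_n^-(x) \, dx \right\}
  \le \int_{\mathbb{R}} g_n^+(x) \, dx .
\end{align*}
```

The final bound does not depend on \(A\), which is what makes the convergence uniform over measurable sets.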
Scheffé's theorem is true if the functions are probability density functions with respect to an arbitrary positive measure on , not just Lebesgue measure. The proof is essentially the same. In particular, if we use counting measure, we get the version of Scheffé's theorem for discrete distributions.
Generating functions are studied in the chapter on Expected Value. In part, the importance of generating functions stems from the fact that ordinary (pointwise) convergence of a sequence of generating functions corresponds to the convergence of the distributions in the sense of this section. Often it is easier to show convergence in distribution using generating functions than directly from the definition.