The matching experiment is a random experiment that can be formulated in a number of colorful ways:
These experiments are clearly equivalent from a mathematical point of view, and correspond to selecting a random permutation $X = (X_1, X_2, \ldots, X_n)$ of the population $D_n = \{1, 2, \ldots, n\}$.
Here is the interpretation for the examples above:
Our modeling assumption, of course, is that $X$ is uniformly distributed on the sample space of permutations of $D_n$. The number of objects $n$ is the basic parameter of the experiment. We will also consider the case of sampling with replacement from the population $D_n$, because the analysis is much easier but still provides insight. In this case, $X$ is a sequence of independent random variables, each uniformly distributed over $D_n$.
We will say that a match occurs at position $j$ if $X_j = j$. Thus, the number of matches is the random variable $N_n$ defined mathematically by
$$N_n = \sum_{j=1}^n I_j$$
where $I_j = \mathbf{1}(X_j = j)$ is the indicator variable for the event of a match at position $j$. Our problem is to compute the probability distribution of the number of matches. This is an old and famous problem in probability that was first considered by Pierre-Remond Montmort; it is sometimes referred to as Montmort's matching problem in his honor.
First let's solve the matching problem in the easy case, when the sampling is with replacement.
Show that the indicator variables $(I_1, I_2, \ldots, I_n)$ form a sequence of $n$ Bernoulli trials, with success probability $1/n$.
Conclude that the number of matches has a binomial distribution with trial parameter $n$ and success parameter $1/n$.
Use basic results for the binomial distribution to show that $\mathbb{E}(N_n) = 1$ and $\operatorname{var}(N_n) = (n - 1)/n$.
Thus, the expected number of matches is 1, regardless of $n$, a somewhat surprising result at first. The variance is $(n - 1)/n$, which is slightly less than 1 and approaches 1 as $n$ increases.
Use a basic limit from calculus to show that as $n \to \infty$, for fixed $k \in \{0, 1, 2, \ldots\}$,
$$\mathbb{P}(N_n = k) \to \frac{e^{-1}}{k!}$$
As a function of $k$, the right-hand side of this expression is the probability density function of the Poisson distribution with parameter 1. Thus, the distribution of the number of matches converges to the Poisson distribution with parameter 1 as $n$ increases. This is a special case of the convergence of the binomial distribution to the Poisson.
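The convergence in the with-replacement case can be checked numerically. The following is a minimal sketch, assuming the number of matches is binomial with trial parameter n and success parameter 1/n as argued above; the function names are illustrative and not part of the original text.

```python
from math import comb, exp, factorial

def binomial_pmf(n, k):
    """P(k matches) when sampling with replacement: binomial(n, 1/n)."""
    p = 1 / n
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson1_pmf(k):
    """Density of the Poisson distribution with parameter 1."""
    return exp(-1) / factorial(k)

def max_gap(n, kmax=10):
    """Largest absolute difference between the two densities for k = 0..min(n, kmax)."""
    return max(abs(binomial_pmf(n, k) - poisson1_pmf(k))
               for k in range(min(n, kmax) + 1))

# The gap shrinks as the number of objects grows.
for n in (5, 50, 500):
    print(n, max_gap(n))
```

The printed gaps should decrease with n, which is the numerical counterpart of the limit in the exercise.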
Now let's consider the case of real interest, when the sampling is without replacement, so that $X$ is a random permutation of the elements of $D_n$.
To find the probability density function of $N_n$, we need to count the number of permutations of $D_n$ with a specified number of matches. This will turn out to be easy once we have counted the number of permutations with no matches; these are called derangements of $D_n$. We will denote the number of permutations of $D_n$ with exactly $k$ matches by
$$b_n(k) = \#\{N_n = k\}, \quad k \in \{0, 1, \ldots, n\}$$
Thus, $b_n(0)$ is the number of derangements of $D_n$.
Show that the number of derangements is
$$b_n(0) = n! \sum_{j=0}^n \frac{(-1)^j}{j!}$$
Use the multiplication principle of combinatorics to show that
$$b_n(k) = \binom{n}{k} b_{n-k}(0), \quad k \in \{0, 1, \ldots, n\}$$
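Use the result of the previous exercise to show that $N_n$ has the probability density function given below. Compare with the result in Exercise 2, when the sampling is with replacement.
$$\mathbb{P}(N_n = k) = \frac{1}{k!} \sum_{j=0}^{n-k} \frac{(-1)^j}{j!}, \quad k \in \{0, 1, \ldots, n\}$$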
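The counting argument can be sanity-checked on a small population. The sketch below is an illustration only: it assumes the inclusion-exclusion count of derangements and the multiplication-principle count of permutations with exactly k matches, and compares them with a brute-force enumeration. All function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import comb, factorial

def derangements(n):
    """Permutations of n objects with no matches, by inclusion-exclusion."""
    return sum((-1)**j * factorial(n) // factorial(j) for j in range(n + 1))

def match_counts(n, k):
    """Permutations of n objects with exactly k matches:
    choose the k matched positions, then derange the rest."""
    return comb(n, k) * derangements(n - k)

def match_pmf(n):
    """Exact distribution of the number of matches, as Fractions."""
    return [Fraction(match_counts(n, k), factorial(n)) for k in range(n + 1)]

def brute_force_counts(n):
    """Count matches directly over all n! permutations (small n only)."""
    counts = [0] * (n + 1)
    for perm in permutations(range(n)):
        counts[sum(1 for i, x in enumerate(perm) if i == x)] += 1
    return counts

n = 5
print([match_counts(n, k) for k in range(n + 1)])  # [44, 45, 20, 10, 0, 1]
print(brute_force_counts(n))                       # [44, 45, 20, 10, 0, 1]
```

Exact fractions are used so that the density sums to 1 without rounding error; the enumeration is only practical for small n.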
In the matching experiment, vary the parameter $n$ and note the shape and location of the probability density function. For selected values of $n$, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the empirical density function to the true probability density function.
Show that $\mathbb{P}(N_n = n - 1) = 0$. Give a probabilistic proof by arguing that the event $\{N_n = n - 1\}$ is impossible and an algebraic proof using the probability density function in Exercise 7.
Use a standard infinite series from calculus to show that $\mathbb{P}(N_n = k) \to e^{-1} / k!$ as $n \to \infty$ for $k \in \{0, 1, 2, \ldots\}$.
Just as in the case of sampling with replacement, the distribution of the number of matches converges to the Poisson distribution with parameter 1 as $n$ increases. The convergence is remarkably rapid. Indeed, the distribution of the number of matches for a modest number of objects is essentially the same as the distribution for a very large number of objects!
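One way to see how fast the convergence is: compute the total variation distance between the distribution of the number of matches and the Poisson distribution with parameter 1. This is a rough sketch, assuming the density takes the form of 1/k! times the alternating partial sum of 1/j! for j from 0 to n - k; the helper names are made up for the illustration.

```python
from math import exp, factorial

def matching_pmf(n, k):
    """P(exactly k matches) for a random permutation of n objects."""
    return sum((-1)**j / factorial(j) for j in range(n - k + 1)) / factorial(k)

def total_variation(n):
    """Total variation distance between the matching distribution
    on {0, ..., n} and the Poisson distribution with parameter 1."""
    inside = sum(abs(matching_pmf(n, k) - exp(-1) / factorial(k))
                 for k in range(n + 1))
    # Poisson mass beyond n, where the matching distribution has none;
    # clamped at 0 to guard against floating-point round-off.
    tail = max(0.0, 1 - sum(exp(-1) / factorial(k) for k in range(n + 1)))
    return 0.5 * (inside + tail)

for n in (3, 5, 10):
    print(n, total_variation(n))
```

Already for ten objects the distance should be below 0.0001, so a plot of the two densities would be visually indistinguishable.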
In the matching experiment, increase $n$ and note how the probability density function stabilizes rapidly. For selected values of $n$, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the relative frequency function to the probability density function.
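For readers without access to the simulation, the experiment can be imitated in a few lines. This is a minimal sketch, not the applet referred to in the text: it assumes a uniformly shuffled list stands in for the random permutation, and the names `count_matches` and `simulate` are hypothetical.

```python
import random
from collections import Counter

def count_matches(n, rng):
    """Shuffle 0..n-1 uniformly and count positions i holding value i."""
    perm = list(range(n))
    rng.shuffle(perm)
    return sum(1 for i, x in enumerate(perm) if i == x)

def simulate(n, runs=1000, seed=0):
    """Relative frequency of each observed number of matches over `runs` runs."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    counts = Counter(count_matches(n, rng) for _ in range(runs))
    return {k: counts[k] / runs for k in sorted(counts)}

freq = simulate(10)
print(freq)
print("sample mean:", sum(k * f for k, f in freq.items()))
```

The relative frequencies can be compared with the probability density function; the sample mean should land near 1.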
The mean and variance of the number of matches could be computed directly from the distribution. However, it is much better to use the representation in terms of indicator variables. The exchangeable property is an important tool in this section.
Show that $\mathbb{E}(I_j) = 1/n$ for any $j \in D_n$.
Show that $\mathbb{E}(N_n) = 1$. Hint: Use the result of Exercise 12 and basic properties of expected value.
Thus, the expected number of matches is 1, regardless of $n$, just as when the sampling is with replacement.
Show that $\operatorname{var}(I_j) = (n - 1)/n^2$ for any $j \in D_n$.
A match in one position would seem to make it more likely that there would be a match in another position. Thus, we might guess that the indicator variables are positively correlated.
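Show that for distinct $j$ and $k$ in $D_n$,
$$\operatorname{cov}(I_j, I_k) = \frac{1}{n^2 (n - 1)}, \quad \operatorname{cor}(I_j, I_k) = \frac{1}{(n - 1)^2}$$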
From Exercise 15, when $n = 2$, the event that there is a match in position 1 is perfectly correlated with the event that there is a match in position 2. Does this result seem reasonable?
Show that $\operatorname{var}(N_n) = 1$. Hint: Use the previous two exercises and basic properties of covariance.
In the matching experiment, vary the parameter $n$ and note the shape and location of the mean/standard deviation bar. For selected values of the parameter, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation.
Show that for distinct $j$ and $k$ in $D_n$, $\operatorname{cov}(I_j, I_k) \to 0$ as $n \to \infty$.
Thus, the event that a match occurs in position $j$ is nearly independent of the event that a match occurs in position $k$ if $n$ is large. For large $n$, the indicator variables behave nearly like $n$ Bernoulli trials with success probability $1/n$, which, of course, is what happens when the sampling is with replacement.
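The moment results can be verified exactly for small populations by enumerating every permutation. The sketch below is illustrative only; it assumes at least two objects, uses the first two positions as the pair of distinct positions (by exchangeability any pair would do), and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

def matching_moments(n):
    """Exact E(I_1), cov(I_1, I_2), E(N) and var(N) for n >= 2 objects,
    computed by enumerating all n! permutations."""
    total = factorial(n)
    s1 = s12 = s_n = s_n2 = 0
    for perm in permutations(range(n)):
        first = perm[0] == 0          # match in the first position
        second = perm[1] == 1         # match in the second position
        matches = sum(1 for i, x in enumerate(perm) if i == x)
        s1 += first
        s12 += first and second
        s_n += matches
        s_n2 += matches**2
    mean_i = Fraction(s1, total)
    cov = Fraction(s12, total) - mean_i**2
    mean_n = Fraction(s_n, total)
    var_n = Fraction(s_n2, total) - mean_n**2
    return mean_i, cov, mean_n, var_n

for n in (2, 4, 6):
    print(n, matching_moments(n))
```

For each n the covariance is small and positive, while the mean and variance of the number of matches both come out to exactly 1.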
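In this subsection, we will give an alternate derivation of the distribution of the number of matches, in a sense by embedding the experiment with parameter $n$ into the experiment with parameter $n + 1$. First, consider the random permutation $(X_1, X_2, \ldots, X_n, X_{n+1})$ of $D_{n+1}$.
Argue that $(X_1, X_2, \ldots, X_n)$ is a random permutation of $D_n$ if and only if $X_{n+1} = n + 1$ if and only if $I_{n+1} = 1$.
Use the result of the previous exercise to argue that
$$\mathbb{P}(N_{n+1} = k + 1 \mid I_{n+1} = 1) = \mathbb{P}(N_n = k), \quad k \in \{0, 1, \ldots, n\}$$
Use the result of the previous exercise and a conditional probability argument to show that
$$\mathbb{P}(N_n = k) = (n + 1) \, \mathbb{P}(N_{n+1} = k + 1, I_{n+1} = 1)$$
Argue that
$$\mathbb{P}(I_{n+1} = 1 \mid N_{n+1} = k + 1) = \frac{k + 1}{n + 1}$$
Finally conclude that for $k \in \{0, 1, \ldots, n\}$,
$$\mathbb{P}(N_n = k) = (k + 1) \, \mathbb{P}(N_{n+1} = k + 1)$$
Note also that $\mathbb{P}(N_{n+1} = 0) = 1 - \sum_{k=1}^{n+1} \mathbb{P}(N_{n+1} = k)$.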
The results of the previous two exercises can be used to obtain the probability density function of $N_n$ recursively for any $n$.
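A recursive computation along these lines might look as follows. This is a sketch under a stated assumption: the recursion is taken to be that the probability of k + 1 matches among n + 1 objects equals the probability of k matches among n objects divided by k + 1, with the probability of no matches recovered from the requirement that the density sums to 1. That form is consistent with the closed-form density, but the function name and layout are illustrative.

```python
from fractions import Fraction

def match_pmf_recursive(n):
    """Density of the number of matches for n >= 1 objects,
    built up one object at a time starting from a single object."""
    pmf = [Fraction(0), Fraction(1)]  # one object: exactly one match, always
    for m in range(1, n):
        # Assumed recursion: P(k + 1 matches, m + 1 objects)
        #                  = P(k matches, m objects) / (k + 1)
        nxt = [Fraction(0)] + [pmf[k] / (k + 1) for k in range(m + 1)]
        # The probability of no matches is whatever mass is left over.
        nxt[0] = 1 - sum(nxt[1:])
        pmf = nxt
    return pmf

print(match_pmf_recursive(3))  # [Fraction(1, 3), Fraction(1, 2), Fraction(0, 1), Fraction(1, 6)]
```

Using `Fraction` keeps every step exact, so the result can be compared directly with counts obtained by enumeration.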
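Next recall that the probability generating function of $N_n$ is given by
$$G_n(t) = \mathbb{E}\left(t^{N_n}\right) = \sum_{k=0}^n \mathbb{P}(N_n = k) \, t^k, \quad t \in \mathbb{R}$$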
Use the results of the last subsection and basic calculus to show that the family of probability generating functions satisfies the following differential equations and ancillary conditions:
$$G_{n+1}^\prime(t) = G_n(t), \quad G_{n+1}(1) = 1, \quad t \in \mathbb{R}, \; n \in \{1, 2, \ldots\}$$
Note that $G_1(t) = t$ for $t \in \mathbb{R}$. This fact, together with the system of differential equations in the previous exercise, can be used to compute $G_n$ for any $n$.
Show that for ,
Use Exercise 25 to show that
Use the result of the previous exercise and basic properties of generating functions to conclude that
A secretary randomly stuffs 5 letters into 5 envelopes.
Ten married couples are randomly paired for a dance.
In the matching experiment, set . Run the experiment 1000 times with an update frequency of 10. Compare the following for the number of matches: