]> The Matching Problem
  1. Virtual Laboratories
  2. 11. Finite Sampling Models
  3. 1
  4. 2
  5. 3
  6. 4
  7. 5
  8. 6
  9. 7
  10. 8
  11. 9

5. The Matching Problem

Definitions and Notation

The Matching Experiment

The matching experiment is a random experiment that can the formulated in a number of colorful ways:

These experiments are clearly equivalent from a mathematical point of view, and correspond to selecting a random permutation X X 1 X 2 X n of the population D n 1 2 n :

Here is the interpretation for the examples above:

Our modeling assumption, of course, is that X is uniformly distributed on the sample space of permutations of D n . The number of objects n is the basic parameter of the experiment. We will also consider the case of sampling with replacement from the population D n , because the analysis is much easier but still provides insight. In this case, X is a sequence of independent random variables, each uniformly distributed over D n .

Matches

We will say that a match occurs at position j if X j j . Thus, number of matches is the random variable N n defined mathematically by

N n i 1 n I j

where I j X j j is the indicator variable for the event of match at position j . Our problem is to compute the probability distribution of the number of matches. This is an old and famous problem in probability that was first considered by Pierre-Remond Montmort; it sometimes referred to as Montmort's matching problem in his honor.

Sampling With Replacement

First let's solve the matching problem in the easy case, when the sampling is with replacement.

Show that the sample I 1 I 2 I n is a sequence of n Bernoulli trials, with success probability 1 n .

Conclude that the number of matches N n has a binomial distribution with trial parameter n and success parameter 1 n .

N n k n k 1 n k 1 1 .n n k ,  k 0 1 n

Use a basic results for the binomial distribution to show that

  1. N n 1
  2. N n 1

Thus, the expected value and the variance of the number of matches are both 1, regardless of n , a somewhat surprising result at first.

Use a basic limit from calculus to show that N n k 1 k as n for k ,

As a function of k , the right-hand side of this expression is the probability density function of the Poisson distribution with parameter 1. Thus, the distribution of the number of matches converges to the Poisson distribution with parameter 1 as the n increases. This is a special case of the convergence of the binomial distribution to the Poisson.

Sampling Without Replacement

Now let's consider the case of real interest, when the sampling is without replacement, so that X is a random permutation of the elements of D n .

Counting Permutations with Matches

To find the probability density function of N n , we need to count the number of permutations of D n with a specified number of matches. This will turn out to be easy once we have counted the number of permutations with no matches; these are called derangements of D n . We will denote the number of permutations of D n with exactly k matches by

b n k N n k ,  k 0 1 n

Thus, b n 0 is the number of derrangements of D n .

Show that the number of derrangements is

b n 0 n j 0 n 1 j j
  1. Use basic properties of counting measure to show that b n 0 n i 1 n X i i
  2. Use the inclusion-exclusion formula of combinatorics to show that b n 0 n j 1 n 1 j 1 K J j i J X i i
  3. Show that if J D n with J j then i J X i i n j
  4. Recall that the number of subsets J of D n with J j is n j

Use the multiplication principle of combinatorics to show that

b n k n k j 0 n k 1 j j ,  k 0 1 n
  1. A two-step procedure generates all permutations with exactly k matches.
  2. Select the k integers that will match. The number of ways of performing this step is n k
  3. Select a permutation of the remaining n k integers with no matches. The number of ways of performing this step is b n k 0

The Probability Density Function

Use the result of the previous exercise to show that N n has the probability density function given below. Compare with the result in Exercise 2, when the sampling is with replacement.

N n k 1 k j 0 n k 1 j j ,  k 0 1 n

In the matching experiment, vary the parameter n and note the shape and location of the probability density function. For selected values of n , run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of empirical density function to the true probability density function.

Show that N n n 1 0 . Give a probabilistic proof by arguing that the event is impossible and an algebraic proof using the probability density function in Exercise 7.

Use a standard infinite series from calculus to show that N n k 1 k as n for k .

Just as in the case of sampling with replacement, the distribution of the number of matches converges to the Poisson distribution with parameter 1 as the n increases. The convergence is remakably rapid. Indeed, the distribution of the number of matches with n 10 is essentially the same as the distribution of the number of matches with n 1000000 !

In the matching experiment, increase n and note how the probability density function stabilizes rapidly. For selected values of n , run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the relative frequency function to the probability density function.

Moments

The mean and variance of the number of matches could be computed directly from the distribution. However, it is much better to use the representation in terms of indicator variables. The exchangeable property is an important tool in this section.

Show that I j 1 n for any j 1 2 n .

Show that N n 1 . Hint: Use the result of Exercise 12 and basic properties of expected value.

Thus, the expected number of matches is 1, regardless of n , just as when the sampling is with replacement.

Show that I j n 1 n 2 for any j 1 2 n .

A match in one position would seem to make it more likely that there would be a match in another position. Thus, we might guess that the indicator variables are positively correlated.

Show that for distinct j and k in 1 2 n ,

  1. I j I k 1 n 2 n 1
  2. I j I k 1 n 1 2

From Exercise 15, when n 2 , the event that there is a match in position 1 is perfectly correlated with the event that there is a match in position 2. Does this result seem reasonable?

Show that N n 1 . Hint: Use the previous two exercises and basic properties of covariance.

In the matching experiment, vary the parameter n and note the shape and location of the mean/standard deviation bar. For selected values of the parameter, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of sample mean and standard deviation to the distribution mean and standard deviation.

Show that for distinct j and k in 1 2 n , I j I k 0 as n .

Thus, the event that a match occurs in position j is nearly independent of the event that a match occurs in position k if n is large. For large n , the indicator variables behave nearly like n Bernoulli trials with success probability 1 n , which of course, is what happens when the sampling is with replacement.

A Recursion Relation

In this subsection, we will give an alternate derivation of the distribution of the number of matches, in a sense by embedding the experiment with parameter n into the experiment with parameter n 1 . First, consider the random permutation X 1 X 2 X n X n 1 of D n 1

Argue that X 1 X 2 X n is a random permutation of D n if and only if X n 1 n 1 if and only if I n 1 1

Use the result of the previous exercise to argue that

N n k N n 1 k 1 I n 1 1 ,  k 0 1 n

Use the result of the previous exercise and a conditional probability argument to show that

N n k N n 1 k 1 I n 1 1 N n 1 k 1 I n 1 1 ,  k 0 1 n

Argue that

  1. I n 1 1 1 n 1
  2. I n 1 1 N n 1 k 1 k 1 n 1

Finally conclude that N n k k 1 N n 1 k 1 for k 0 1 n

Note also that N 1 1 1 .

The results of the previous two exercises can be used to obtain the probability density function of N n recursively for any n .

The Probability Generating Function

Next recall that the probability generating function of N n is given by

G n t t N n j 0 n N n j t j ,  t

Use the results of the last subsection and basic calculus to show that the family of probability generating functions satisfies the following differential equations and ancillary conditions:

t G n 1 t G n t ,  G n 1 1 1

Note that G 1 t t for t . This fact, together with the system of differential equations in the previous exercise, can be used to compute G n for any n

Show that for t ,

  1. G 2 t 12 12 t 2
  2. G 3 t 13 12 t 16 t 3
  3. G 4 t 38 13 t 14 t 2 124 t 4

Use Exercise 25 to show that

t k G n t G n k t

Use the result of the previous exercise and basic properties of generating functions to conclude that

N n k 1 k N n k 0 ,  k 0 1 n

Examples and Applications

A secretary randomly stuffs 5 letters into 5 envelopes.

  1. Compute the number of outcomes with exactly k letters in the correct envelope, for each k 0 1 2 3 4 5
  2. Find the probability density function of the number of letters in the correct envelopes.
  3. Compute the covariance and correlation of the events that a letter is in the correct envelope and another letter is in the correct envelope.

Ten married couples are randomly paired for a dance.

  1. Find the probability density function of the number of married couples who are paired together.
  2. Find the mean and variance of the number of married couples who are paired together.
  3. Find the probability that at least 3 couples are paired together.

In the matching experiment, set n 10 . Run the experiment 1000 times with an update frequency of 10. Compare the following for the number of matches:

  1. the true probabilities
  2. the relative frequencies from the simulation
  3. the limiting Poisson probabilities