]> The Coupon Collector Problem
  1. Virtual Laboratories
  2. 11. Finite Sampling Models
  3. 1
  4. 2
  5. 3
  6. 4
  7. 5
  8. 6
  9. 7
  10. 8
  11. 9

7. The Coupon Collector Problem

Basic Theory

Random Variables

In this section, our random experiment is to sample repeatedly, with replacement, from the population D 1 2 m . This generates a sequence of independent random variables, each uniformly distributed on D :

X X 1 X 2

We will often interpret the sampling in terms of a coupon collector: each time the collector buys a certain product (bubble gum or Cracker Jack, for example) she receives a coupon (a baseball card or a toy, for example) which is equally likely to be any one of m types. Thus, in this setting, X i D is the coupon type received on the i purchase.

Let V m n denote the number of distinct values in the first n selections, for n . This is the random variable studied in the last section on the Birthday Problem. Our interest is in this section is the sample size needed to get k distinct sample values, for k 1 2 m . Thus, let

W m k n V m n k

In terms of the coupon collector, this random variable gives the number of products required to get k distinct coupon types. Note that the set of possible values of W m k is k k 1 . We will be particularly interested in W m m , the sample size needed to get the entire population. In terms of the coupon collector, this is the number of products required to get the entire set of coupons.

In the coupon collector experiment, run the experiment in single-step mode a few times for selected values of the parameters.

The Probability Density Function

Now let's find the distribution of W m k . The results of the previous section will be very helpful

Argue that W m k n if and only if V m n 1 k 1 and V m n k .

Use Exercise 2 and a conditional probability argument to show that

W m k n m k 1 m V m n 1 k 1

Use the result of the last exercise and the distribution of V m n 1 from the last section to show that

W m k n m 1 k 1 j 0 k 1 1 j k 1 j k j 1 m n 1 ,  n k k 1

In the coupon collector experiment, vary the parameters and note the shape of and position of the probability density function. For selected values of the parameters, run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the relative frequency function to the probability density function.

Recursion Formula

An alternate approach to the distribution of the sample size needed to get k distinct values is via a recursion formula.

Let p m k n W m k n Use a conditional probability argument to show that

p m k n 1 k 1 m p m k n m k 1 m p m k 1 n

Decomposition as a Sum

We will now show that W m k can be decomposed as a sum of k independent, geometrically distributed random variables. This will provide some additional insight into the nature of the distribution and will make the computation of the mean and variance easy.

For i 1 2 m , let Z m i denote the number of additional samples needed to go from i 1 distinct values to i distinct values.

Argue that

  1. Z m 1 Z m 2 Z m m is a sequence of independent random variables.
  2. Z m i has the geometric distribution on with parameter p m i m i 1 m
  3. W m k i 1 k Z m i

Exercise 7 shows clearly that each time a new coupon is obtained, it becomes harder to get the next new coupon.

In the coupon collector experiment, run the experiment in single-step mode a few times for selected values of the parameters. In particular, try this with m large and k near m .

Moments

Use the results of Exercise 7 to show that

  1. W m k i 1 k m m i 1
  2. W m k i 1 k i 1 m m i 1 2

In the coupon collector experiment, vary the parameters and note the shape and location of the mean/standard deviation bar. For selected values of the parameters, run the experiment 1000 times with an update frequency of 10 and note the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation.

Use the Result of Exercise 7 to show that the probability generating function of W m k is

t W m k i 1 k m i 1 m i 1 t ,  t m k 1

Examples and Applications

Suppose that people are sampled at random until 40 distinct birthdays are obtained.

  1. Find the probability density function of the sample size.
  2. Find the mean of the sample size.
  3. Find the variance of the sample size.
  4. Find the probability generating function of the sample size.

Suppose that a standard, fair die is thrown until all 6 scores have occurred.

  1. Find the probability density function of the number of throws.
  2. Find the mean of the number of throws.
  3. Find the variance of the number of throws.
  4. Find the probability that at least 10 throws are required.

A box of a certain brand of cereal comes with a special toy. There are 10 different toys in all. A collector buys boxes of cereal until she has all 10 toys.

  1. Find the probability density function of the number boxes purchased.
  2. Find the mean of the number of boxes purchased.
  3. Find the variance of the number of boxes purchased.
  4. Find the probability that no more than 15 boxes were purchased.