]> The Negative Binomial Distribution
  1. Virtual Laboratories
  2. 10. Bernoulli Trials
  3. 1
  4. 2
  5. 3
  6. 4
  7. 5
  8. 6

4. The Negative Binomial Distribution

Basic Theory

Suppose again that our random experiment is to perform a sequence of Bernoulli trials X X 1 X 2 with parameter p 0 1 . Recall that the number of successes in the first n trials

Y n i 1 n X i

has the binomial distribution with parameters n and p . In this section we will study the random variable that gives the trial number of the k success:

V k n 1 2 Y n k

Note that V 1 is the number of trials needed to get the first success, which we now know has the geometric distribution on with parameter p .

The Probability Density Function

Show that V k n if and only if X n 1 and Y n 1 k 1 .

Use Exercise 1, independence, and the binomial distribution to show that

V k n n 1 k 1 p k 1 p n k ,  n k k 1 k 2

The distribution defined by the density function in Exercise 2 is known as the negative binomial distribution; it has two parameters, the number of successes k and the success probability p .

In the negative binomial experiment, vary k and p with the scroll bars and note the shape of the density function. For selected values of k and p , run the experiment 1000 times with an update frequency of 10. Watch the apparent convergence of the relative frequency function to the density function.

Show that the binomial and negative binomial sequences are inverse to each other in the sense that

Y n k  if and only if  V k n

This property is implicit in the definition of V k given at the beginning of this section. In particular, argue that any event that can be expressed in terms of the negative binomial variables can also be expressed in terms of the binomial variables.

Show that V k n V k n 1 if and only if n 1 k 1 p . Thus, the density function at first increases and then decreases, reaching its maximum value at 1 k 1 p . This integer is the mode of the distribution and hence the negative binomial distribution is unimodal.

Times Between Successes

Next we will define the random variables that give the number of trials between successive successes. Let

U 1 V 1  and  U k V k V k 1  for  k 2 3

Show that U U 1 U 2 is a sequence of independent random variables, each having the geometric distribution on with parameter p . Moreover,

V k i 1 k U i

In statistical terms, U corresponds to sampling from the geometric distribution with parameter p , so that for each k , U 1 U 2 U k is a random sample of size k from this distribution. The sequence of negative binomial variables V is the partial sum process corresponding to the sequence U . In statistical terms, V k k is the sample mean corresponding to the random sample U 1 U 2 U k ; this random variable gives the average number of trials between the first k successes. Partial sum processes are studied in more generality in the chapter on Random Samples.

Use the partial sum representation to prove the following properties.

  1. If j k then V k V j has the same distribution as V k j , namely negative binomial with parameters k j and p . Thus, the process V has stationary increments.
  2. If k 1 k 2 k 3  ···   then V k 1 V k 2 V k 1 V k 3 V k 2 is a sequence of independent random variables. Thus, the process V has independent increments.

Actually, any partial sum process corresponding to an independent, identically distributed sequence will have stationary, independent increments.

Moments

The mean, variance and probability generating function of V k now follow easily from the representation as a sum of independent, identically distributed geometrically distributed variables..

Show that V k k 1 p .

Show that V k k 1 p p 2 .

Show that t V k p t 1 1 p t k for t 1 1 p

In the negative binomial experiment, vary k and p with the scroll bars and note the location and size of the mean/standard deviation bar. For selected values of the parameters, run the experiment 1000 times with an update frequency of 10. Watch the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation.

Verify the results of Exercise 8, Exercise 9, and Exercise 10 directly using the probability density function. Note that this method requires significantly more work.

Suppose that V and W are independent random variables for an experiment, and that V has the negative binomial distribution with parameters j and p , and W has the negative binomial distribution with parameters k and p . Show that V W has the negative binomial distribution with parameters k j and p .

  1. Give a probabilistic proof, based on the partial sum representation.
  2. Give an analytic proof, based on probability density functions
  3. Give an analytic proof, based on probability generating functions.

Normal Approximation

In the negative binomial experiment, start with various values of p and k 1 . Successively increase k by 1, noting the shape of the probability density function each time.

Even though you are limited to k 5 , you can still see the characteristic bell shape. This is a consequence of the central limit theorem because the negative binomial variable can be written as a sum of k independent, identically distributed (geometric) random variables.

Show that the distribution of the standardized variable given below converges to the standard normal distribution as k .

Z k p V k k k 1 p

From a practical point of view, this result means that if k is large, the distribution of V k is approximately normal with mean and variance given in Exercise 8 and Exercise 9, respectively. Just how large k needs to be for the approximation to work well depends on p . Also, when using the normal approximation, we should remember to use the continuity correction, since the negative binomial is a discrete distribution.

Relation to Order Statistics

Suppose that k n . Show that

V 1 j 1 V 2 j 2 V k j k Y n k 1 n k  for   j 1 j 2 j k L

where L j 1 j 2 j k 1 2 n k j 1 j 2 j k . Thus, given k successes in the first n trials, the vector of success trial numbers is uniformly distributed on L . Equivalently, the vector of success trial numbers is distributed as the vector of order statistics corresponding to a sample of size k chosen at random and without replacement from 1 2 n .

Show that

V m j Y n k j 1 m 1 n j k m n k ,  j m m 1 n k m

Again, this is the distribution of the m order statistic when a sample of size k is selected at random and without replacement from the population 1 2 n .

Examples and Applications

A standard, fair die is thrown until 3 aces occur. Let V denote the number of throws.

  1. Find the probability density function of V .
  2. Find the mean of V
  3. Find the variance of V
  4. Find the probability that at least 20 throws will needed.

A coin is tossed repeatedly. The 10 head occurs on the 25 toss.

  1. Find the probability density function of the trial number of the 5 head.
  2. Find the mean of the distribution in (a)
  3. Find the variance of the distribution in (a)

A certain type of missile has failure probability 0.02. Let N denote the launch number of the fourth failure.

  1. Find the probability density function of N .
  2. Find the mean of N
  3. Find the variance of N
  4. Find the probability that there will be at least 4 failures in the first 200 launches.

In the negative binomial experiment, set p 0.5 and k 5 . Run the experiment 1000 times with an update frequency of 100. Compute and compare each of the following:

  1. 8 V 5 15
  2. The relative frequency of the event 8 V 5 15 in the simulation.
  3. The normal approximation to 8 V 5 15

A coin is tossed until the 50 head occurs.

  1. Assuming that the coin is fair, find the normal approximation of the probability that the coin is tossed at least 125 times.
  2. Suppose that you perform this experiment, and 125 tosses are required. Do you believe that the coin is fair?

The Banach Match Problem

Suppose that an absent-minded professor (is there any other kind?) has m matches in his right pocket and m matches in his left pocket. When he needs a match to light his pipe, he is equally likely to choose a match from either pocket. We want to compute the probability density function of the random variable W that gives the number of matches remaining when the professor first discovers that one of the pockets is empty. This is known as the Banach match problem, named for the mathematician Stefan Banach, who evidently behaved in the manner described.

We can recast the problem in terms of the negative binomial distribution. Clearly the match choices form a sequence of Bernoulli trials with parameter p 12 . Specifically, we can consider a match from the right pocket as a win for player R , and a match from the left pocket as a win for player L . In a hypothetical infinite sequence of trials, let U denote the number of trials necessary for R to win m 1 trials, and V the number of trials necessary for L to win m 1 trials. Note that U and V each have the negative binomial distribution with parameters m 1 and p .

For k 0 1 m , show that

  1. L has m k wins at the moment when R wins m 1 games if and only if U 2 m k 1
  2. U 2 m k 1 is equivalent to the event that the professor first discovers that the right pocket is empty and that the left pocket has k matches
  3. U 2 m k 1 2 m k m 12 2 m k 1 .

For k 0 1 m , show that

  1. R has m k wins at the moment when L wins m 1 games if and only if V 2 m k 1
  2. V 2 m k 1 is equivalent to the event that the professor first discovers that the right pocket is empty and that the left pocket has k matches
  3. V 2 m k 1 2 m k m 12 2 m k 1 .

Combine the results of the previous two exercises to conclude that

W k 2 m k m 12 2 m k ,  k 0 1 m

We can also solve the non-symmetric Banach match problem, using the same methods as above. Thus, suppose that the professor reaches for a match in his right pocket with probability p and in his left pocket with probability 1 p , where 0 p 1 . The essential change in the analysis is that U has the negative binomial distribution with parameters m 1 and p , while V has the negative binomial distribution with parameters m 1 and 1 p .

Show that

W k 2 m k m p m 1 1 p m k 1 p m 1 p m k ,  k 0 1 m

The Problem of Points

Suppose that two teams (or individuals) A and B play a sequence of Bernoulli trials, where p is the probability that player A wins a trial. For nonnegative integers n and m , let A n m p denote the probability that A wins n points before B wins m points. Computing A n m is an historically famous problem, known as the problem of points, that was solved by Pierre de Fermat and by Blaise Pascal.

Comment on the validity of the Bernoulli trial assumptions (independence of trials and constant probability of success) for games of sport that have a skill component as well as a random component.

There is an easy solution to the problem of points using the binomial distribution; this was essentially Pascal's solution. Let us pretend that n m 1 trials are played, regardless of the outcomes, and let Y n m 1 denote the number of trials that A wins. By definition, Y n m 1 has the binomial distribution with parameters n m 1 and p .

Show that A wins n trials before B wins m trials if and only if

Y n m 1 n

Use the result of the previous exercise to show that

A n m p k n n m 1 n m 1 k p k 1 p n m 1 k

There is also an easy solution to the problem of points using the negative binomial distribution In a sense, this has to be the case, given the equivalence between the binomial and negative binomial processes. First, let us pretend that the games go on forever, regardless of the outcomes, and let V n denote the number of games needed for A to win n games. By definition, V n has the negative binomial distribution with parameters n and p .

Show that A wins n trials before B wins m trials if and only if

V n n m 1

Use the result of the previous exercise to show that

A n m p j n n m 1 j 1 n 1 p n 1 p j n

Show that for fixed n and m , A n m p increases from 0 to 1 as p increases from 0 to 1.

Show that

  1. A n m p decreases as n increases for fixed m and p .
  2. A n m p increases as m increases for fixed n and p .

Show that 1 A n m p A m n 1 p for any n , m , and p .

  1. Give an analytic proof.
  2. Give a probabilistic argument.

In the problem of points experiments, vary the parameters n , m , and p , and note how the probability changes. For selected values of the parameters, run the simulation 1000 times with an update frequency of 10. Note the apparent convergence of the relative frequency to the probability.

Condition on the outcome of the first trial to derive the following recurrence relation and boundary conditions (this was essentially Fermat's solution):

  1. A n m p p A n 1 m p 1 p A n m 1 p for n and m
  2. A n 0 p 0 , A 0 m p 1

Now let's study the number of trials needed for the problem of points experiment. Let N n m denote the number of trials needed until either A wins n points or B wins m points, which occurs first. The negative binomial variables provide an easy derivation of the distribution N n m . Again, imagine that we continue the trials indefinitely. Let V n denote the number of trials needed for A to win n points, and let W m denote the number of trials needed for B to win m points.

Show that for k m n n m 1

N n m k V n k W m k k 1 n 1 p n 1 p k n k 1 m 1 1 p m p k m

Series of Games

The special case of the problem of points experiment with m n is important, because it corresponds to A and B playing a best of 2 n 1 game series. That is, the first player to win n games wins the series. Such series, especially when n 2 3 4 are frequently used in championship tournaments. We will let A n p A n n p denote the probability that player A wins the series. From our general results with the problem of points, we have

A n p k n 2 n 1 2 n 1 k p k 1 p 2 n 1 k j n 2 n 1 j 1 n 1 p n 1 p j n

Suppose that p 0.6 . Explicitly find the probability that team A wins in each of the following cases:

  1. A best of 5 game series.
  2. A best of 7 game series.

In the problem of points experiments, vary the parameters n , m , and p (keeping n m ), and note how the probability changes. Now simulate a best of 5 series by selecting n m 3 , p 0.6 . Run the experiment 1000 times, updating every 10 runs. Note the apparent convergence of the relative frequency to the true probability.

Show that A n 1 p 1 A n p for any n and p .

  1. Show that this condition means that the graph of A n is symmetric with respect to p 12 .
  2. Show that this condition implies that A n 12 12 .

In the problem of points experiments, vary the parameters n , m , and p (keeping n m ), and note how the probability changes. Now simulate a best 7 series by selecting n m 4 , p 0.45 . Run the experiment 1000 times, updating every 10 runs. Note the apparent convergence of the relative frequency o the true probability.

Suppose that n m Show that A n p A m p if and only if p 12 Interpret the result.

Let N n denote the number of trials in the series. Use Exercise 37 to show that

N n k k 1 n 1 p n 1 p k n 1 p n p k n ,  k n n 1 2 n 1

Explicitly compute the probability density function, expected value, and standard deviation for the number of games in a best of 7 series with the following values of p :

  1. 0.5
  2. 0.7
  3. 0.9

Division of Stakes

The problem of points originated from a question posed by Chevalier de Mere, who was interested in the fair division of stakes when a game is interrupted. Specifically, suppose that players A and B each put up C monetary units, and then play Bernoulli trials until one of them wins a specified number of trials. The winner then takes the entire 2 C fortune.

If the game is interrupted when A needs to win n more trials and B needs to win m more trials, argue that the fortune should be divided between A and B , respectively, as follows:

  1. 2 C A n m p for A ,
  2. 2 C 1 A n m p for B .

Suppose that players A and B bet $50 each. The players toss a fair coin until one of them has 10 wins; the winner takes the entire fortune. Suppose that the game is interrupted by the gambling police when A has 5 wins and B has 3 wins. How should the stakes be divided?