2. The Binomial Distribution

Basic Theory

Suppose that our random experiment is to perform a sequence of Bernoulli trials:

$$\boldsymbol{X} = (X_1, X_2, \ldots)$$

Recall that $\boldsymbol{X}$ is a sequence of independent indicator random variables with common probability of success $p$, the basic parameter of the process. In statistical terms, the first $n$ trials $(X_1, X_2, \ldots, X_n)$ form a random sample of size $n$ from the Bernoulli distribution. In this section we will study the random variable that gives the number of successes in the first $n$ trials and the random variable that gives the proportion of successes in the first $n$ trials. The underlying distribution, the binomial distribution, is one of the most important in probability theory.

Show that the number of successes in the first $n$ trials is the random variable

$$Y_n = \sum_{i=1}^n X_i$$

The Density Function

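Denote the index set for the first $n$ trials by $N = \{1, 2, \ldots, n\}$. Use the assumptions of Bernoulli trials to show that for $K \subseteq N$ with $\#(K) = k$,

$$\mathbb{P}(X_i = 1 \text{ for } i \in K \text{ and } X_i = 0 \text{ for } i \in N \setminus K) = p^k (1 - p)^{n - k}$$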

Recall that the number of subsets of size $k$ from a set of size $n$ is the binomial coefficient

$$\binom{n}{k} = \frac{n!}{k! \, (n - k)!}$$

Use Exercise 2 and basic properties of probability to show that the probability density function of $Y_n$ is given by

$$\mathbb{P}(Y_n = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k \in \{0, 1, \ldots, n\}$$

The distribution with this probability density function is known as the binomial distribution with parameters $n$ and $p$.
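As a small numerical companion to the density function, the following Python sketch evaluates it directly. The function name `binomial_pmf` and the parameter values are illustrative choices, not part of the text; the final sum should equal 1 up to rounding, which is what the binomial theorem guarantees.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(Y_n = k) for the binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3  # illustrative values, not taken from the text
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)  # equals 1 up to floating-point rounding

for k, value in enumerate(pmf):
    print(f"P(Y = {k:2d}) = {value:.6f}")
print(f"sum = {total:.12f}")
```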

In the binomial coin experiment, vary n and p with the scrollbars, and note the shape and location of the probability density function. For selected values of the parameters, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the relative frequency function to the density function.

Use the binomial theorem to show that the binomial probability density function really is a probability density function.

Show that

  1. $\mathbb{P}(Y_n = k) > \mathbb{P}(Y_n = k - 1)$ if and only if $k < (n + 1) p$.
  2. $\mathbb{P}(Y_n = k) = \mathbb{P}(Y_n = k - 1)$ if and only if $k = (n + 1) p$, an integer between 1 and $n$.

Thus, the density function at first increases and then decreases, reaching its maximum value at $\lfloor (n + 1) p \rfloor$. This integer is a mode of the distribution. In the case that $m = (n + 1) p$ is an integer between 1 and $n$, there are two consecutive modes, at $m - 1$ and $m$. In any event, the shape of the binomial distribution is unimodal.
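The location of the mode can be checked numerically. This is a minimal sketch; the helper `modes` and the two parameter pairs are assumptions chosen for illustration, one with $(n + 1) p$ not an integer and one with $(n + 1) p$ an integer.

```python
from math import comb, floor


def binomial_pmf(k, n, p):
    """P(Y_n = k) for the binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def modes(n, p, rel_tol=1e-12):
    """Return every k at which the density attains its maximum (up to rounding)."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    peak = max(pmf)
    return [k for k, value in enumerate(pmf) if peak - value <= rel_tol * peak]


single = modes(10, 0.3)  # (n + 1) p = 3.3 is not an integer: one mode
double = modes(9, 0.5)   # (n + 1) p = 5 is an integer: two consecutive modes

print(single, floor(11 * 0.3))
print(double)
```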

Suppose that $U$ is a random variable having the binomial distribution with parameters $n$ and $p$. Show that $n - U$ has the binomial distribution with parameters $n$ and $1 - p$.

  1. Give a probabilistic proof, based on Bernoulli trials.
  2. Give an analytic proof, based on probability density functions.

Moments

We will compute the mean and variance of the binomial distribution several different ways. The method using indicator variables is the best.

Show that $\mathbb{E}(Y_n) = n p$ in two ways:

  1. using the probability density function.
  2. using the representation as a sum of indicator variables.

This result makes intuitive sense, since p should be approximately the proportion of successes in a large number of trials. We will discuss the point further in the subsection on the proportion of successes.

Show that $\operatorname{var}(Y_n) = n p (1 - p)$ in two ways:

  1. using the probability density function
  2. using the representation as a sum of independent indicator variables.

Sketch the graph of the variance as a function of $p$. Note in particular that the variance takes its maximum value $n / 4$ when $p = \frac{1}{2}$ and its minimum value 0 when $p = 0$ or $p = 1$.
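The mean and variance can also be confirmed by summing against the density function. The sketch below assumes the illustrative values $n = 20$ and $p = 0.2$; the function names are hypothetical.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(Y_n = k) for the binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mean_and_variance(n, p):
    """Compute E(Y_n) and var(Y_n) directly from the density function."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * q for k, q in enumerate(pmf))
    second_moment = sum(k * k * q for k, q in enumerate(pmf))
    return mean, second_moment - mean**2


n, p = 20, 0.2  # illustrative values
mean, variance = mean_and_variance(n, p)
print(mean, n * p)                 # both should be close to 4
print(variance, n * p * (1 - p))   # both should be close to 3.2
```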

In the binomial coin experiment, vary n and p with the scrollbars and note the location and size of the mean/standard deviation bar. For selected values of the parameters, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation.

Show that the probability generating function is $\mathbb{E}\left(t^{Y_n}\right) = (1 - p + p t)^n$ for $t \in \mathbb{R}$ in two ways:

  1. using the probability density function
  2. using the representation as a sum of independent indicator variables.

Use the probability generating function to compute the mean and variance.

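Use the identity $j \binom{n}{j} = n \binom{n - 1}{j - 1}$ for $n \in \mathbb{N}_+$ and $j \in \mathbb{N}_+$ to show that

$$\mathbb{E}\left(Y_n^k\right) = n p \, \mathbb{E}\left[(Y_{n-1} + 1)^{k-1}\right], \quad n \in \mathbb{N}_+, \; k \in \mathbb{N}_+$$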

Use the recursion result in the previous exercise to give yet one more derivation of the mean and variance.
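The moment recursion $\mathbb{E}(Y_n^k) = n p \, \mathbb{E}[(Y_{n-1} + 1)^{k-1}]$ can be sanity-checked numerically before attempting the derivation. This is only a sketch; the helper names and the values $n = 8$, $p = 0.35$ are assumptions made for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(Y_n = k) for the binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def expectation(g, n, p):
    """E[g(Y_n)] computed directly from the density of Y_n."""
    return sum(g(j) * binomial_pmf(j, n, p) for j in range(n + 1))


def moment_direct(k, n, p):
    """Left side of the recursion: E(Y_n^k)."""
    return expectation(lambda j: j**k, n, p)


def moment_recursive(k, n, p):
    """Right side of the recursion: n p E[(Y_{n-1} + 1)^(k-1)]."""
    return n * p * expectation(lambda j: (j + 1) ** (k - 1), n - 1, p)


n, p = 8, 0.35  # illustrative values
gaps = [abs(moment_direct(k, n, p) - moment_recursive(k, n, p)) for k in (1, 2, 3)]
print(gaps)  # each gap should be at rounding level
```

With $k = 1$ the right side reduces to $n p$, and with $k = 2$ it gives $n p [1 + (n - 1) p]$, from which the variance follows.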

The Proportion of Successes

The proportion of successes in the first n trials is the random variable

$$M_n = \frac{Y_n}{n} = \frac{1}{n} \sum_{i=1}^n X_i$$

In statistical terms, $M_n$ is the sample mean of the random sample $(X_1, X_2, \ldots, X_n)$. The proportion of successes $M_n$ is typically used to estimate the probability of success $p$ when this probability is unknown. It is basic to the very notion of probability that if the number of trials is large, then $M_n$ should be close to $p$. The mathematical formulation of this idea is a special case of the law of large numbers.

It is easy to express the probability density function of the proportion of successes $M_n$ in terms of the probability density function of the number of successes $Y_n$. First, note that $M_n$ takes the values $k / n$ where $k \in \{0, 1, \ldots, n\}$.

Show that

$$\mathbb{P}\left(M_n = \frac{k}{n}\right) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k \in \{0, 1, \ldots, n\}$$

In the binomial coin experiment, select the proportion of heads. Vary n and p with the scroll bars and note the shape of the probability density function. For selected values of the parameters, run the experiment 1000 times, updating every 10 runs. Watch the apparent convergence of the relative frequency function to the probability density function.

Show that $\mathbb{E}(M_n) = p$. In statistical terms, this means that $M_n$ is an unbiased estimator of $p$.

Show that $\operatorname{var}(M_n) = p (1 - p) / n$. In particular, $\operatorname{var}(M_n) \le 1 / (4 n)$ for any $p \in [0, 1]$.

Note that $\operatorname{var}(M_n) \to 0$ as $n \to \infty$ (and actually, the convergence is uniform in $p \in [0, 1]$). Thus, the estimate improves as $n$ increases; in statistical terms, this is known as consistency.

Use the result in the last exercise and Chebyshev's inequality to show that $\mathbb{P}(|M_n - p| \ge \delta) \to 0$ as $n \to \infty$ for every $\delta > 0$. In fact, the convergence is uniform in $p \in [0, 1]$.

The result in the last exercise is a special case of the weak law of large numbers and means that $M_n \to p$ as $n \to \infty$ in probability. The strong law of large numbers states that the convergence actually holds with probability 1. See Estimation in the Bernoulli Model in the chapter on Set Estimation for a different approach to the problem of estimating $p$.
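To see the weak law at work, the tail probability $\mathbb{P}(|M_n - p| \ge \delta)$ can be computed exactly from the binomial density and compared with the Chebyshev bound $p (1 - p) / (n \delta^2)$. The sketch below is an illustration under assumed values $p = 0.4$ and $\delta = 0.05$; it evaluates the density on the log scale (via `lgamma`) only so that larger $n$ does not overflow, and it requires $0 < p < 1$.

```python
from math import exp, lgamma, log


def binomial_pmf(k, n, p):
    """P(Y_n = k), computed on the log scale so that large n does not overflow."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def tail_probability(n, p, delta):
    """P(|M_n - p| >= delta), summing the density of Y_n over the relevant k."""
    # The tiny slack guards against rounding when |k/n - p| equals delta exactly.
    return sum(
        binomial_pmf(k, n, p)
        for k in range(n + 1)
        if abs(k / n - p) >= delta - 1e-12
    )


def chebyshev_bound(n, p, delta):
    """Chebyshev's bound var(M_n) / delta^2 = p (1 - p) / (n delta^2)."""
    return p * (1 - p) / (n * delta**2)


p, delta = 0.4, 0.05  # illustrative values
rows = [
    (n, tail_probability(n, p, delta), chebyshev_bound(n, p, delta))
    for n in (25, 100, 400, 1600)
]
for n, tail, bound in rows:
    print(f"n={n:5d}  tail={tail:.6f}  Chebyshev bound={bound:.6f}")
```

The exact tail is typically far smaller than the bound; Chebyshev's inequality is crude, but it is enough to force convergence to 0.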

In the binomial coin experiment, select the proportion of heads. Vary n and p and note the size and location of the mean/standard deviation bar. For selected values of the parameters, run the experiment 10000 times, updating every 10 runs, and note the apparent convergence of the empirical moments to the distribution moments.

Sums of Independent Binomial Variables

Several important properties of the random process $\boldsymbol{Y} = (Y_1, Y_2, \ldots)$ stem from the fact that it is a partial sum process corresponding to the sequence $\boldsymbol{X}$ of independent, identically distributed indicator variables.

Use the representation in terms of indicator variables to prove the following properties.

  1. If $m$ and $n$ are positive integers with $m \le n$ then $Y_n - Y_m$ has the same distribution as $Y_{n-m}$, namely binomial with parameters $n - m$ and $p$. Thus, the process $\boldsymbol{Y}$ has stationary increments.
  2. If $n_1 \le n_2 \le n_3 \le \cdots$ then $(Y_{n_1}, Y_{n_2} - Y_{n_1}, Y_{n_3} - Y_{n_2}, \ldots)$ is a sequence of independent random variables. Thus, the process $\boldsymbol{Y}$ has independent increments.

Actually, any partial sum process corresponding to an independent, identically distributed sequence will have stationary, independent increments.

Show that if $U$ and $V$ are independent random variables for an experiment, where $U$ has the binomial distribution with parameters $m$ and $p$, and $V$ has the binomial distribution with parameters $n$ and $p$, then $U + V$ has the binomial distribution with parameters $m + n$ and $p$.

  1. Give a probabilistic proof using the previous exercise.
  2. Give an analytic proof using probability density functions.
  3. Give an analytic proof using probability generating functions.
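The density-function proof amounts to a discrete convolution, which is easy to check numerically. In this sketch the helper `convolve` and the values $m = 4$, $n = 6$, $p = 0.3$ are illustrative assumptions.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(Y_n = k) for the binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def convolve(f, g):
    """Density of the sum of independent variables with densities f and g on 0, 1, 2, ..."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h


m, n, p = 4, 6, 0.3  # illustrative values
pmf_u = [binomial_pmf(k, m, p) for k in range(m + 1)]
pmf_v = [binomial_pmf(k, n, p) for k in range(n + 1)]
pmf_sum = convolve(pmf_u, pmf_v)
pmf_direct = [binomial_pmf(k, m + n, p) for k in range(m + n + 1)]

max_gap = max(abs(a - b) for a, b in zip(pmf_sum, pmf_direct))
print(max_gap)  # should be at rounding level
```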

Use the stationary, independent increments properties to find the joint probability density functions of the $\boldsymbol{Y}$ process: if $n_1 < n_2 < \cdots < n_k$ and $j_1 \le j_2 \le \cdots \le j_k$ then

$$\mathbb{P}(Y_{n_1} = j_1, Y_{n_2} = j_2, \ldots, Y_{n_k} = j_k) = \binom{n_1}{j_1} \binom{n_2 - n_1}{j_2 - j_1} \cdots \binom{n_k - n_{k-1}}{j_k - j_{k-1}} p^{j_k} (1 - p)^{n_k - j_k}$$

where as always, we use the standard conventions for binomial coefficients: $\binom{a}{b} = 0$ if $b < 0$ or $b > a$.

Connection to the Hypergeometric Distribution

Suppose that $m \le n$. Show that

$$\mathbb{P}(Y_m = j \mid Y_n = k) = \frac{\binom{m}{j} \binom{n - m}{k - j}}{\binom{n}{k}}, \quad j \in \{0, 1, \ldots, n\}$$

Interestingly, the distribution in the last exercise is independent of $p$. It is known as the hypergeometric distribution with parameters $n$, $m$, and $k$, and governs the number of type 1 objects in a sample of size $k$ chosen at random and without replacement from a population of $n$ objects of which $m$ are type 1 (and the remainder type 0). Try to interpret this result probabilistically.
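The independence from $p$ can be observed numerically by computing the conditional probability from the binomial densities (using independent, stationary increments) for two different values of $p$ and comparing with the hypergeometric formula. The function names and the values $m = 5$, $n = 12$, $k = 4$ are assumptions for illustration; the conditional computation needs $0 < p < 1$.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(Y_n = k) for the binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def conditional_pmf(j, m, n, k, p):
    """P(Y_m = j | Y_n = k), using the independence of Y_m and Y_n - Y_m."""
    if j < 0 or j > k:
        return 0.0
    # Y_n - Y_m has the binomial distribution with parameters n - m and p.
    joint = binomial_pmf(j, m, p) * binomial_pmf(k - j, n - m, p)
    return joint / binomial_pmf(k, n, p)


def hypergeometric_pmf(j, m, n, k):
    """C(m, j) C(n - m, k - j) / C(n, k); comb returns 0 when the lower index is too large."""
    if j < 0 or j > k:
        return 0.0
    return comb(m, j) * comb(n - m, k - j) / comb(n, k)


m, n, k = 5, 12, 4  # illustrative values
gaps = [
    abs(conditional_pmf(j, m, n, k, p) - hypergeometric_pmf(j, m, n, k))
    for p in (0.2, 0.7)
    for j in range(k + 1)
]
print(max(gaps))  # should be at rounding level for both values of p
```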

The Normal Approximation

Open the binomial timeline experiment. For selected values of $p \in (0, 1)$, start with $n = 1$ and successively increase $n$ by 1. For each value of $n$, note the shape of the probability density function of the number of successes and the proportion of successes. With $n = 100$, run the experiment 1000 times, updating every 10 runs. For the number of successes and the proportion of successes, note the apparent convergence of the empirical density function to the probability density function.

The characteristic bell shape that you should observe in the previous exercise is an example of the central limit theorem, because the binomial variable can be written as a sum of n independent, identically distributed random variables (the indicator variables).

Show that the standard score of $Y_n$ is the same as the standard score of $M_n$:

$$\frac{Y_n - n p}{\sqrt{n p (1 - p)}} = \frac{M_n - p}{\sqrt{p (1 - p) / n}}$$

Apply the central limit theorem to show that the distribution of the standard score given in the previous exercise converges to the standard normal distribution as $n \to \infty$.

This version of the central limit theorem is known as the De Moivre-Laplace theorem, and is named after Abraham De Moivre and Pierre-Simon Laplace. From a practical point of view, this result means that, for large $n$, the distribution of $Y_n$ is approximately normal, with mean $n p$ and standard deviation $\sqrt{n p (1 - p)}$, and the distribution of $M_n$ is approximately normal, with mean $p$ and standard deviation $\sqrt{p (1 - p) / n}$. Just how large $n$ needs to be for the normal approximation to work well depends on the value of $p$. The rule of thumb is that we need $n p \ge 5$ and $n (1 - p) \ge 5$. Finally, when using the normal approximation, we should remember to use the continuity correction, since the binomial is a discrete distribution.
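The following sketch compares an exact binomial probability with its normal approximation, applying the continuity correction by widening the interval $[a, b]$ to $[a - 0.5, b + 0.5]$. The parameter values ($n = 100$, $p = 0.3$, and the event $25 \le Y_n \le 35$) and the function names are illustrative assumptions; `statistics.NormalDist` is available in Python 3.8 and later.

```python
from math import comb, sqrt
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(Y_n = k) for the binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def exact_probability(a, b, n, p):
    """Exact P(a <= Y_n <= b) for integers a <= b."""
    return sum(binomial_pmf(k, n, p) for k in range(a, b + 1))


def normal_approximation(a, b, n, p):
    """Normal approximation to P(a <= Y_n <= b) with the continuity correction."""
    normal = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
    return normal.cdf(b + 0.5) - normal.cdf(a - 0.5)


n, p = 100, 0.3  # n p = 30 and n (1 - p) = 70, so the rule of thumb holds
exact = exact_probability(25, 35, n, p)
approx = normal_approximation(25, 35, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```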

Examples and Applications

A student takes a multiple-choice test with 20 questions, each with 5 choices (only one of which is correct). Suppose that the student blindly guesses. Let $X$ denote the number of questions that the student answers correctly.

  1. Find the probability density function of $X$.
  2. Find the mean of $X$.
  3. Find the variance of $X$.
  4. Find the probability that the student answers at least 12 questions correctly (the score that she needs to pass).

A certain type of missile has failure probability 0.02. Let $Y$ denote the number of failures in 50 tests.

  1. Find the probability density function of $Y$.
  2. Find the mean of $Y$.
  3. Find the variance of $Y$.
  4. Find the probability of at least 47 successful tests.

Suppose that in a certain district, 40% of the registered voters prefer candidate $A$. A random sample of 50 registered voters is selected. Let $Z$ denote the number in the sample who prefer $A$.

  1. Find the probability density function of $Z$.
  2. Find the mean of $Z$.
  3. Find the variance of $Z$.
  4. Find the probability that $Z$ is less than 19.
  5. Compute the normal approximation to the probability in part 4.

Coins and Dice

Recall that a standard die is a six-sided die. A fair die is one in which the faces are equally likely. An ace-six flat die is a standard die in which faces 1 and 6 have probability $\frac{1}{4}$ each, and faces 2, 3, 4, and 5 have probability $\frac{1}{8}$ each.

A standard, fair die is tossed 10 times. Let $N$ denote the number of aces.

  1. Find the probability density function of $N$.
  2. Find the mean of $N$.
  3. Find the variance of $N$.

A coin is tossed 100 times and results in 30 heads. Find the probability density function of the number of heads in the first 20 tosses.

An ace-six flat die is rolled 1000 times. Let $Z$ denote the number of times that a score of 1 or 2 occurred.

  1. Find the probability density function of $Z$.
  2. Find the mean of $Z$.
  3. Find the variance of $Z$.
  4. Find the probability that $Z$ is at least 400.
  5. Find the normal approximation of the probability in part 4.

In the binomial coin experiment, select the proportion of heads. Set $n = 10$ and $p = 0.4$. Run the experiment 100 times, updating after each run. Over all 100 runs, compute the square root of the average of the squares of the errors when $M_n$ is used to estimate $p$. This number is a measure of the quality of the estimate.

In the binomial coin experiment, select the number of heads $Y$, and set $p = 0.5$ and $n = 15$. Run the experiment 1000 times with an update frequency of 100. Compute and compare the following:

  1. $\mathbb{P}(5 \le Y \le 10)$
  2. The relative frequency of the event $\{5 \le Y \le 10\}$
  3. The normal approximation to $\mathbb{P}(5 \le Y \le 10)$

In the binomial coin experiment, select the proportion of heads $M$ and set $n = 30$, $p = 0.6$. Run the experiment 1000 times with an update frequency of 100. Compute and compare each of the following:

  1. $\mathbb{P}(0.5 \le M \le 0.7)$
  2. The relative frequency of the event $\{0.5 \le M \le 0.7\}$
  3. The normal approximation to $\mathbb{P}(0.5 \le M \le 0.7)$

Famous Problems

In 1693, Samuel Pepys asked Isaac Newton whether it is more likely to get at least one ace in 6 rolls of a die or at least two aces in 12 rolls of a die. This problem is known as Pepys' problem; naturally, Pepys had fair dice in mind.

Guess the answer to Pepys' problem based on empirical data. With fair dice and $n = 6$, run the simulation of the dice experiment 500 times and compute the relative frequency of at least one ace. Now with $n = 12$, run the simulation 500 times and compute the relative frequency of at least two aces. Compare the results.

Solve Pepys' problem using the binomial distribution.
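One way to check a hand computation for Pepys' problem is to evaluate the two binomial tail probabilities numerically. This is a minimal sketch; the helper `at_least` is a hypothetical name, and the number of aces in $n$ rolls of a fair die is taken to be binomial with parameters $n$ and $\frac{1}{6}$.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(Y_n = k) for the binomial distribution with parameters n and p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def at_least(j, n, p):
    """P(Y_n >= j), computed as 1 minus the probability of fewer than j successes."""
    return 1 - sum(binomial_pmf(k, n, p) for k in range(j))


one_in_six = at_least(1, 6, 1 / 6)       # at least one ace in 6 rolls
two_in_twelve = at_least(2, 12, 1 / 6)   # at least two aces in 12 rolls

print(f"at least 1 ace in 6 rolls:   {one_in_six:.4f}")
print(f"at least 2 aces in 12 rolls: {two_in_twelve:.4f}")
```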

Which is more likely: at least one ace with 4 throws of a fair die or at least one double ace in 24 throws of two fair dice? This is known as De Méré's problem, named after the Chevalier de Méré.

Data Analysis Exercises

In the cicada data, compute the proportion of males in the entire sample, and the proportion of males of each species in the sample.

In the M&M data, pool the bags to create a large sample of M&Ms. Now compute the sample proportion of red M&Ms.

The Galton Board

The Galton board is a triangular array of pegs. The rows are numbered by the natural numbers $\mathbb{N} = \{0, 1, \ldots\}$ from top downward. Row $n$ has $n + 1$ pegs numbered from left to right by the integers $\{0, 1, \ldots, n\}$. Thus a peg can be uniquely identified by the ordered pair $(n, k)$ where $n$ is the row number and $k$ is the peg number in that row. The Galton board is named after Francis Galton.

Now suppose that a ball is dropped from above the top peg $(0, 0)$. Each time the ball hits a peg, it bounces to the right with probability $p$ and to the left with probability $1 - p$, independently from bounce to bounce.

Show that the number of the peg that the ball hits in row $n$ has the binomial distribution with parameters $n$ and $p$.
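A simple simulation makes the Galton board concrete: the peg number reached in row $n$ is the number of bounces to the right among the first $n$ bounces. The sketch below is an illustration only; the function names, the seed, and the values $n = 10$, $p = 0.5$, and 5000 runs are assumptions.

```python
import random


def drop_ball(n, p, rng):
    """Return the peg number reached in row n: the count of bounces to the right."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate(n, p, runs, seed=0):
    """Drop `runs` balls and count how often each peg in row n is hit."""
    rng = random.Random(seed)  # seeded so that the run is reproducible
    counts = [0] * (n + 1)
    for _ in range(runs):
        counts[drop_ball(n, p, rng)] += 1
    return counts


n, p, runs = 10, 0.5, 5000  # illustrative values
counts = simulate(n, p, runs)
sample_mean = sum(k * c for k, c in enumerate(counts)) / runs

for k, c in enumerate(counts):
    print(f"peg {k:2d}: relative frequency {c / runs:.4f}")
print(f"sample mean = {sample_mean:.3f}, n p = {n * p}")
```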

In the Galton board experiment, select random variable Y (the number of moves right). Vary the parameters n and p and note the shape and location of the probability density function and the mean/standard deviation bar. For selected values of the parameters, click single step several times and watch the ball fall through the pegs. Then run the experiment 1000 times, updating every 10 runs and watch the path of the ball. Note the apparent convergence of the relative frequency function and empirical moments to the density function and distribution moments, respectively.

Reliability

Recall the discussion of structural reliability given in the last section on Bernoulli trials. In particular, we have a system of $n$ similar components that function independently, each with reliability $p$. Suppose now that the system as a whole functions properly if and only if at least $k$ of the $n$ components are good. Such a system is called, appropriately enough, a $k$ out of $n$ system. Note that the series and parallel systems considered in the previous section are $n$ out of $n$ and 1 out of $n$ systems, respectively.

Show that for a $k$ out of $n$ system

  1. The state of the system is $\mathbf{1}(Y_n \ge k)$ where $Y_n$ is the number of working components.
  2. The reliability function is $r_{n,k}(p) = \sum_{i=k}^n \binom{n}{i} p^i (1 - p)^{n - i}$.

In the binomial coin experiment, set $n = 10$ and $p = 0.9$ and run the simulation 1000 times, updating every 100 runs. Compute the empirical reliability and compare with the true reliability in each of the following cases:
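The reliability function of a $k$ out of $n$ system is a binomial tail sum, so it is straightforward to evaluate. In this sketch the function name `reliability` and the printed cases are illustrative assumptions.

```python
from math import comb


def reliability(n, k, p):
    """r_{n,k}(p): probability that at least k of n independent components work."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


p = 0.9  # illustrative component reliability
print(f"series   (10 out of 10): {reliability(10, 10, p):.6f}")
print(f"parallel ( 1 out of 10): {reliability(10, 1, p):.10f}")
print(f"majority ( 2 out of  3): {reliability(3, 2, p):.6f}")
```

The series case reduces to $p^n$ and the parallel case to $1 - (1 - p)^n$, which gives a quick consistency check.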

  1. 10 out of 10 (series) system.
  2. 1 out of 10 (parallel) system.
  3. 4 out of 10 system.

Consider a system with $n = 4$ components. Sketch the graphs of $r_{4,1}$, $r_{4,2}$, $r_{4,3}$, and $r_{4,4}$ on the same set of axes.

An $n$ out of $2 n - 1$ system is a majority rules system.

  1. Compute the reliability of a 2 out of 3 system.
  2. Compute the reliability of a 3 out of 5 system.
  3. For what values of $p$ is a 3 out of 5 system more reliable than a 2 out of 3 system?
  4. Sketch the graphs of $r_{3,2}$ and $r_{5,3}$ on the same set of axes.

In the binomial coin experiment, compute the empirical reliability, based on 100 runs, in each of the following cases. Compare your results to the true probabilities.

  1. A 2 out of 3 system with $p = 0.3$
  2. A 3 out of 5 system with $p = 0.3$
  3. A 2 out of 3 system with $p = 0.8$
  4. A 3 out of 5 system with $p = 0.8$

Show that $r_{2n-1,n}\left(\frac{1}{2}\right) = \frac{1}{2}$.

Bernstein Polynomials

The Weierstrass Approximation Theorem, named after Karl Weierstrass, states that any real-valued function that is continuous on a closed, bounded interval can be uniformly approximated on that interval, to any degree of accuracy, with a polynomial. The theorem is important, since polynomials are simple and basic functions, and a bit surprising, since continuous functions can be quite strange.

In 1911, Sergei Bernstein gave an explicit construction of polynomials that uniformly approximate a given continuous function, using Bernoulli trials. Bernstein's result is a beautiful example of the probabilistic method, the use of probability theory to obtain results in other areas of mathematics that are seemingly unrelated to probability.

Suppose that $f$ is a real-valued function that is continuous on the interval $[0, 1]$. The Bernstein polynomial of degree $n$ for $f$ is defined by

$$b_n(p) = \mathbb{E}_p[f(M_n)], \quad p \in [0, 1]$$

where $M_n$ is the proportion of successes in the first $n$ Bernoulli trials with success parameter $p$, as defined earlier. Note that we are emphasizing the dependence on $p$ in the expected value operator. The next exercise gives a more explicit representation, and shows that the Bernstein polynomial is, in fact, a polynomial.

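Use basic properties of expected value to show that

$$b_n(p) = \sum_{k=0}^n f\left(\frac{k}{n}\right) \binom{n}{k} p^k (1 - p)^{n - k}, \quad p \in [0, 1]$$

In particular, show that

  1. $b_n(0) = f(0)$ and $b_n(1) = f(1)$. Thus, the graph of $b_n$ passes through the endpoints $(0, f(0))$ and $(1, f(1))$.
  2. $b_1(p) = f(0) + [f(1) - f(0)] p$, $p \in [0, 1]$. Thus, the graph of $b_1$ is the line connecting the endpoints.
  3. $b_2(p) = f(0) + 2 \left[f\left(\frac{1}{2}\right) - f(0)\right] p + \left[f(1) - 2 f\left(\frac{1}{2}\right) + f(0)\right] p^2$, $p \in [0, 1]$. Thus, the graph of $b_2$ is the parabola passing through the endpoints and the point $\left(\frac{1}{2}, \frac{1}{4} f(0) + \frac{1}{2} f\left(\frac{1}{2}\right) + \frac{1}{4} f(1)\right)$.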
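The explicit sum for $b_n(p)$ translates directly into code. The sketch below uses the hypothetical test function $f(x) = x^2$, chosen only because its Bernstein polynomial has the closed form $b_n(p) = p^2 + p (1 - p) / n$, so the uniform error on $[0, 1]$ is exactly $1 / (4 n)$ and the output is easy to verify. The function names and grid size are likewise assumptions.

```python
from math import comb


def bernstein(f, n, p):
    """b_n(p) = sum over k of f(k / n) * C(n, k) * p^k * (1 - p)^(n - k), for n >= 1."""
    return sum(
        f(k / n) * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)
    )


def square(x):
    """Hypothetical test function f(x) = x^2 (not the function used in the exercises)."""
    return x * x


def max_error(f, n, grid_size=101):
    """Largest |b_n(p) - f(p)| over an evenly spaced grid of p values in [0, 1]."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(abs(bernstein(f, n, p) - f(p)) for p in grid)


errors = {n: max_error(square, n) for n in (1, 2, 10, 50)}
for n, err in errors.items():
    print(f"n={n:3d}  max error={err:.6f}  1/(4n)={1 / (4 * n):.6f}")
```

The slow $1 / n$ decay of the error is typical: Bernstein polynomials converge uniformly, but not quickly.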

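Prove Bernstein's theorem: $b_n \to f$ as $n \to \infty$ uniformly on $[0, 1]$. Fill in the details of the major steps below:

  1. Since $f$ is continuous on the closed, bounded interval $[0, 1]$, it is bounded on this interval. Thus, there exists a constant $C$ such that $|f(p)| \le C$ for all $p \in [0, 1]$.
  2. Since $f$ is continuous on the closed, bounded interval $[0, 1]$, it is uniformly continuous on this interval. Thus, for any $\epsilon > 0$ there exists $\delta > 0$ such that if $p \in [0, 1]$, $q \in [0, 1]$, and $|p - q| < \delta$ then $|f(p) - f(q)| < \epsilon$.
  3. From basic properties of expected value, $|b_n(p) - f(p)| \le \mathbb{E}_p\left[|f(M_n) - f(p)| \, \mathbf{1}(|M_n - p| < \delta)\right] + \mathbb{E}_p\left[|f(M_n) - f(p)| \, \mathbf{1}(|M_n - p| \ge \delta)\right]$.
  4. Using parts 1 and 2, $|b_n(p) - f(p)| \le \epsilon + 2 C \, \mathbb{P}_p(|M_n - p| \ge \delta)$ for any $p \in [0, 1]$.
  5. By Exercise 20, $\mathbb{P}_p(|M_n - p| \ge \delta) \to 0$ as $n \to \infty$ uniformly in $p \in [0, 1]$.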

Compute the Bernstein polynomials of orders 1, 2, and 3 for the function f defined by f x x ,  x 0 1 . Graph f and the three polynomials on the same set of axes.

Use a computer algebra system to compute the Bernstein polynomials of orders 10, 20, and 30 for the function f defined below. Use the CAS to graph the function and the three polynomials on the same axes.

f x 0 x 0 x x x 0 1