Suppose that our random experiment is to perform a sequence of Bernoulli trials $\boldsymbol{X} = (X_1, X_2, \ldots)$.
Recall that $\boldsymbol{X}$ is a sequence of independent indicator random variables with common probability of success $p$, the basic parameter of the process. In statistical terms, the first $n$ trials form a random sample of size $n$ from the Bernoulli distribution. In this section we will study the random variable $Y_n$ that gives the number of successes in the first $n$ trials and the random variable $M_n$ that gives the proportion of successes in the first $n$ trials. The underlying distribution, the binomial distribution, is one of the most important in probability theory.
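Show that the number of successes in the first $n$ trials is the random variable
$$Y_n = \sum_{i=1}^n X_i.$$
Denote the index set for the first $n$ trials by $N_n = \{1, 2, \ldots, n\}$. Use the assumptions of Bernoulli trials to show that for $K \subseteq N_n$ with $\#(K) = k$,
$$\mathbb{P}(X_i = 1 \text{ for } i \in K \text{ and } X_i = 0 \text{ for } i \in N_n \setminus K) = p^k (1 - p)^{n - k}$$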
Recall that the number of subsets of size $k$ from a set of size $n$ is the binomial coefficient
$$\binom{n}{k} = \frac{n!}{k!\,(n - k)!}$$
Use Exercise 2 and basic properties of probability to show that the probability density function of $Y_n$ is given by
$$\mathbb{P}(Y_n = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k \in \{0, 1, \ldots, n\}$$
The distribution with this probability density function is known as the binomial distribution with parameters $n$ and $p$.
In the binomial coin experiment, vary $n$ and $p$ with the scrollbars, and note the shape and location of the probability density function. For selected values of the parameters, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the relative frequency function to the density function.
Use the binomial theorem to show that the binomial probability density function really is a probability density function.
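As a quick numerical companion to this exercise, the following minimal Python sketch evaluates the binomial density and checks that it sums to 1. The function name `binomial_pdf` is our own choice; only the standard-library `math.comb` (Python 3.8 or later) is assumed.

```python
from math import comb


def binomial_pdf(k: int, n: int, p: float) -> float:
    """P(Y_n = k) = C(n, k) p^k (1 - p)^(n - k), and 0 outside {0, 1, ..., n}."""
    if k < 0 or k > n:
        return 0.0
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 10, 0.3
pdf = [binomial_pdf(k, n, p) for k in range(n + 1)]
# By the binomial theorem the values sum to (p + (1 - p))^n = 1, up to rounding.
print(round(sum(pdf), 12))
```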
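Show that
$$\mathbb{P}(Y_n = k) > \mathbb{P}(Y_n = k - 1) \text{ if and only if } k < (n + 1)p, \qquad \mathbb{P}(Y_n = k) = \mathbb{P}(Y_n = k - 1) \text{ if and only if } k = (n + 1)p$$
Thus, the density function at first increases and then decreases, reaching its maximum value at $\lfloor (n + 1)p \rfloor$. This integer is a mode of the distribution. In the case that $m = (n + 1)p$ is an integer between 1 and $n$, there are two consecutive modes, at $m - 1$ and $m$. In any event, the shape of the binomial distribution is unimodal.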
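The mode result can be checked by brute force. This sketch (helper names hypothetical) finds every maximizer of the density numerically and prints it next to $\lfloor (n+1)p \rfloor$; the cases $n = 9, p = 0.3$ and $n = 19, p = 0.5$ are chosen so that $(n+1)p$ is an integer and two modes appear.

```python
from math import comb, floor


def binomial_pdf(k, n, p):
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def modes_by_search(n, p, tol=1e-12):
    """Return every k in {0, ..., n} at which the density attains its maximum (up to rounding)."""
    pdf = [binomial_pdf(k, n, p) for k in range(n + 1)]
    top = max(pdf)
    return [k for k, v in enumerate(pdf) if abs(v - top) <= tol * top]


for n, p in [(10, 0.3), (9, 0.3), (20, 0.5), (19, 0.5)]:
    print(n, p, floor((n + 1) * p), modes_by_search(n, p))
```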
Suppose that $U$ is a random variable having the binomial distribution with parameters $n$ and $p$. Show that $n - U$ has the binomial distribution with parameters $n$ and $1 - p$.
We will compute the mean and variance of the binomial distribution in several different ways. The method using indicator variables is the best.
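Show that $\mathbb{E}(Y_n) = np$ in two ways: (a) directly from the probability density function, and (b) from the representation of $Y_n$ as a sum of indicator variables.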
This result makes intuitive sense, since $p$ should be approximately the proportion of successes in a large number of trials. We will discuss the point further in the subsection on the proportion of successes.
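Show that $\operatorname{var}(Y_n) = np(1 - p)$ in two ways: (a) directly from the probability density function, and (b) from the representation of $Y_n$ as a sum of independent indicator variables.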
Sketch the graph of the variance as a function of $p$. Note in particular that the variance takes its maximum value $n/4$ when $p = 1/2$ and its minimum value 0 when $p = 0$ or $p = 1$.
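A numerical sanity check of both formulas, computing the first two moments directly from the density for $n = 12$ and a few values of $p$. This is a sketch only; it confirms the formulas for particular parameter values but is not a proof.

```python
from math import comb


def binomial_moments(n, p):
    """Mean and variance computed directly from the binomial density."""
    pdf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * f for k, f in enumerate(pdf))
    second = sum(k * k * f for k, f in enumerate(pdf))
    return mean, second - mean**2


n = 12
for p in [0.0, 0.25, 0.5, 0.75, 1.0]:
    mean, var = binomial_moments(n, p)
    # Compare with np and np(1 - p); the variance peaks at n/4 = 3 when p = 1/2.
    print(f"p={p:.2f}  mean={mean:.4f} (np={n * p:.4f})  var={var:.4f} (np(1-p)={n * p * (1 - p):.4f})")
```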
In the binomial coin experiment, vary $n$ and $p$ with the scrollbars and note the location and size of the mean/standard deviation bar. For selected values of the parameters, run the simulation 1000 times, updating every 10 runs. Note the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation.
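Show that the probability generating function is $G_n(t) = \mathbb{E}(t^{Y_n}) = (1 - p + pt)^n$ for $t \in \mathbb{R}$ in two ways: (a) from the probability density function, using the binomial theorem, and (b) from the representation of $Y_n$ as a sum of independent indicator variables.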
Use the probability generating function to compute the mean and variance.
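For reference, here is one way the computation can go, assuming the generating function $G_n(t) = (1 - p + pt)^n$ and the standard facts $G_n'(1) = \mathbb{E}(Y_n)$ and $G_n''(1) = \mathbb{E}[Y_n(Y_n - 1)]$:

```latex
\begin{align*}
G_n'(t) &= np\,(1 - p + pt)^{n-1}, & G_n''(t) &= n(n-1)p^2\,(1 - p + pt)^{n-2},\\
\mathbb{E}(Y_n) &= G_n'(1) = np, & \mathbb{E}[Y_n(Y_n - 1)] &= G_n''(1) = n(n-1)p^2,
\end{align*}
\[
\operatorname{var}(Y_n) = G_n''(1) + G_n'(1) - [G_n'(1)]^2 = n(n-1)p^2 + np - n^2p^2 = np(1 - p).
\]
```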
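Use the identity $j\binom{n}{j} = n\binom{n-1}{j-1}$ for $n, j \in \mathbb{N}_+$ to show that
$$\mathbb{E}(Y_n^k) = np\,\mathbb{E}\left[(Y_{n-1} + 1)^{k-1}\right]$$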
Use the recursion result in the previous exercise to give yet one more derivation of the mean and variance.
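As a sketch of that derivation: take $k = 1$ and $k = 2$ in the recursion $\mathbb{E}(Y_n^k) = np\,\mathbb{E}[(Y_{n-1} + 1)^{k-1}]$, and use the $k = 1$ case applied to $n - 1$ trials, $\mathbb{E}(Y_{n-1}) = (n-1)p$:

```latex
\begin{align*}
\mathbb{E}(Y_n) &= np\,\mathbb{E}(1) = np,\\
\mathbb{E}(Y_n^2) &= np\,\mathbb{E}(Y_{n-1} + 1) = np\,[(n-1)p + 1],\\
\operatorname{var}(Y_n) &= np\,[(n-1)p + 1] - (np)^2 = np - np^2 = np(1 - p).
\end{align*}
```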
The proportion of successes in the first $n$ trials is the random variable
$$M_n = \frac{Y_n}{n} = \frac{1}{n} \sum_{i=1}^n X_i$$
In statistical terms, $M_n$ is the sample mean of the random sample $(X_1, X_2, \ldots, X_n)$. The proportion of successes is typically used to estimate the probability of success $p$ when this probability is unknown. It is basic to the very notion of probability that if the number of trials is large, then $M_n$ should be close to $p$. The mathematical formulation of this idea is a special case of the law of large numbers.
It is easy to express the probability density function of the proportion of successes $M_n$ in terms of the probability density function of the number of successes $Y_n$. First, note that $M_n$ takes the values $k/n$ where $k \in \{0, 1, \ldots, n\}$.
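Show that
$$\mathbb{P}\left(M_n = \frac{k}{n}\right) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k \in \{0, 1, \ldots, n\}$$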
In the binomial coin experiment, select the proportion of heads. Vary $n$ and $p$ with the scrollbars and note the shape of the probability density function. For selected values of the parameters, run the experiment 1000 times, updating every 10 runs. Watch the apparent convergence of the relative frequency function to the probability density function.
Show that $\mathbb{E}(M_n) = p$. In statistical terms, this means that $M_n$ is an unbiased estimator of $p$.
Show that $\operatorname{var}(M_n) = p(1 - p)/n$. In particular, $\operatorname{var}(M_n) \le 1/(4n)$ for any $p \in [0, 1]$.
Note that $\operatorname{var}(M_n) \to 0$ as $n \to \infty$ (and actually, the convergence is uniform in $p \in [0, 1]$). Thus, the estimate improves as $n$ increases; in statistical terms, this is known as consistency.
Use the result in the last exercise and Chebyshev's inequality to show that $\mathbb{P}(|M_n - p| \ge \delta) \to 0$ as $n \to \infty$ for every $\delta > 0$. In fact, the convergence is uniform in $p \in [0, 1]$.
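The Chebyshev argument can be illustrated numerically. This sketch (function names are ours) compares the exact probability $\mathbb{P}(|M_n - p| \ge \delta)$, computed from the binomial density, with the Chebyshev bound $p(1-p)/(n\delta^2)$ and with the $p$-free bound $1/(4n\delta^2)$. The bound is typically far from sharp, but it is enough to force the probability to 0.

```python
from math import comb


def exact_deviation_prob(n, p, delta):
    """P(|M_n - p| >= delta), summing the binomial density over the k with |k/n - p| >= delta."""
    eps = 1e-12  # guards against floating-point rounding exactly at the boundary
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(k / n - p) >= delta - eps
    )


def chebyshev_bound(n, p, delta):
    """Chebyshev: P(|M_n - p| >= delta) <= var(M_n) / delta^2 = p(1 - p) / (n delta^2)."""
    return p * (1 - p) / (n * delta**2)


delta, p = 0.1, 0.5
for n in [10, 50, 100, 500, 1000]:
    # The last column, 1 / (4 n delta^2), does not depend on p: this is the uniformity in p.
    print(n, exact_deviation_prob(n, p, delta), chebyshev_bound(n, p, delta), 1 / (4 * n * delta**2))
```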
The result in the last exercise is a special case of the weak law of large numbers and means that $M_n \to p$ as $n \to \infty$ in probability. The strong law of large numbers states that the convergence actually holds with probability 1. See Estimation in the Bernoulli Model in the chapter on Set Estimation for a different approach to the problem of estimating $p$.
In the binomial coin experiment, select the proportion of heads. Vary $n$ and $p$ and note the size and location of the mean/standard deviation bar. For selected values of the parameters, run the experiment 10000 times, updating every 10 runs, and note the apparent convergence of the empirical moments to the distribution moments.
Several important properties of the random process $\boldsymbol{Y} = (Y_0, Y_1, Y_2, \ldots)$, with $Y_0 = 0$, stem from the fact that it is a partial sum process corresponding to the sequence $\boldsymbol{X}$ of independent, identically distributed indicator variables.
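Use the representation in terms of indicator variables to prove the following properties: (a) stationary increments: if $m \le n$, then $Y_n - Y_m$ has the same distribution as $Y_{n-m}$, namely binomial with parameters $n - m$ and $p$; (b) independent increments: if $n_1 \le n_2 \le n_3 \le \cdots$, then $(Y_{n_1}, Y_{n_2} - Y_{n_1}, Y_{n_3} - Y_{n_2}, \ldots)$ is a sequence of independent random variables.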
Actually, any partial sum process corresponding to an independent, identically distributed sequence will have stationary, independent increments.
Show that if $U$ and $V$ are independent random variables, $U$ has the binomial distribution with parameters $m$ and $p$, and $V$ has the binomial distribution with parameters $n$ and $p$, then $U + V$ has the binomial distribution with parameters $m + n$ and $p$.
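One way to see the result numerically is to convolve the two densities, which gives the density of a sum of independent variables on $\{0, 1, 2, \ldots\}$. A minimal sketch with arbitrarily chosen parameters $m = 4$, $n = 7$, $p = 0.35$:

```python
from math import comb


def binomial_pdf(n, p):
    """The full binomial density as a list indexed by k = 0, ..., n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(f, g):
    """Density of U + V for independent U ~ f and V ~ g, both supported on {0, 1, ...}."""
    h = [0.0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            h[i + j] += fi * gj
    return h


m, n, p = 4, 7, 0.35
lhs = convolve(binomial_pdf(m, p), binomial_pdf(n, p))
rhs = binomial_pdf(m + n, p)
print(max(abs(a - b) for a, b in zip(lhs, rhs)))  # rounding-level difference
```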
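Use the stationary, independent increments properties to find the joint probability density functions of the process $\boldsymbol{Y}$: if $n_1 < n_2 < \cdots < n_k$ and $j_1 \le j_2 \le \cdots \le j_k$ then
$$\mathbb{P}(Y_{n_1} = j_1, Y_{n_2} = j_2, \ldots, Y_{n_k} = j_k) = \binom{n_1}{j_1} \binom{n_2 - n_1}{j_2 - j_1} \cdots \binom{n_k - n_{k-1}}{j_k - j_{k-1}} p^{j_k} (1 - p)^{n_k - j_k}$$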
where, as always, we use the standard conventions for binomial coefficients: $\binom{a}{b} = 0$ if $b < 0$ or $b > a$.
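Suppose that $m \le n$. Show that
$$\mathbb{P}(Y_m = j \mid Y_n = k) = \frac{\binom{m}{j} \binom{n - m}{k - j}}{\binom{n}{k}}, \quad j \in \{0, 1, \ldots, k\}$$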
Interestingly, the distribution in the last exercise is independent of $p$. It is known as the hypergeometric distribution with parameters $n$, $k$, and $m$, and governs the number of type 1 objects in a sample of size $m$ chosen at random and without replacement from a population of $n$ objects of which $k$ are type 1 (and the remainder type 0). Try to interpret this result probabilistically.
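The conditional density can be computed as the joint density divided by the marginal density, and the factor $p^k (1-p)^{n-k}$ cancels. The sketch below (helper names hypothetical) does this numerically for several values of $p$, using the numbers from the coin exercise later in this section (30 heads in 100 tosses, heads among the first 20 tosses), and compares the result with the hypergeometric formula.

```python
from math import comb


def conditional_pdf(m, n, k, p):
    """P(Y_m = j | Y_n = k) for j = 0..k, computed as joint / marginal (assumes m <= n)."""
    # Joint density of (Y_m, Y_n) from the increments formula; math.comb returns 0 when j > m.
    joint = [comb(m, j) * comb(n - m, k - j) * p**k * (1 - p) ** (n - k) for j in range(k + 1)]
    marginal = comb(n, k) * p**k * (1 - p) ** (n - k)
    return [x / marginal for x in joint]


def hypergeometric_pdf(m, n, k):
    """The same probabilities with p cancelled: C(m, j) C(n - m, k - j) / C(n, k)."""
    return [comb(m, j) * comb(n - m, k - j) / comb(n, k) for j in range(k + 1)]


m, n, k = 20, 100, 30
h = hypergeometric_pdf(m, n, k)
for p in [0.1, 0.5, 0.9]:
    c = conditional_pdf(m, n, k, p)
    print(p, max(abs(a - b) for a, b in zip(c, h)))  # essentially 0 for every p
```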
Open the binomial timeline experiment. For selected values of $p \in (0, 1)$, start with $n = 1$ and successively increase $n$ by 1. For each value of $n$, note the shape of the probability density function of the number of successes and of the proportion of successes. For a selected value of $n$, run the experiment 1000 times, updating every 10 runs. For the number of successes and the proportion of successes, note the apparent convergence of the empirical density function to the probability density function.
The characteristic bell shape that you should observe in the previous exercise is an example of the central limit theorem, because the binomial variable $Y_n$ can be written as a sum of $n$ independent, identically distributed random variables (the indicator variables).
Show that the standard score of $Y_n$ is the same as the standard score of $M_n$:
$$\frac{Y_n - np}{\sqrt{np(1 - p)}} = \frac{M_n - p}{\sqrt{p(1 - p)/n}}$$
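Apply the central limit theorem to show that the distribution of the standard score given in the previous exercise converges to the standard normal distribution as $n \to \infty$:
$$\mathbb{P}\left(\frac{Y_n - np}{\sqrt{np(1 - p)}} \le z\right) \to \Phi(z) \text{ as } n \to \infty, \quad z \in \mathbb{R},$$
where $\Phi$ is the standard normal distribution function.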
This version of the central limit theorem is known as the DeMoivre-Laplace theorem, and is named after Abraham de Moivre and Pierre-Simon Laplace. From a practical point of view, this result means that, for large $n$, the distribution of $Y_n$ is approximately normal, with mean $np$ and standard deviation $\sqrt{np(1 - p)}$, and the distribution of $M_n$ is approximately normal, with mean $p$ and standard deviation $\sqrt{p(1 - p)/n}$. Just how large $n$ needs to be for the normal approximation to work well depends on the value of $p$. The rule of thumb is that we need $np \ge 5$ and $n(1 - p) \ge 5$. Finally, when using the normal approximation, we should remember to use the continuity correction, since the binomial is a discrete distribution.
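To see the continuity correction at work, the following sketch compares exact binomial probabilities $\mathbb{P}(Y_n \le y)$ with the normal approximation $\Phi\left((y + \tfrac12 - np)/\sqrt{np(1-p)}\right)$. It assumes only Python's `math.erf`, from which $\Phi(z) = \tfrac12[1 + \operatorname{erf}(z/\sqrt{2})]$; the choice $n = 100$, $p = 0.5$ is just an illustration.

```python
from math import comb, erf, sqrt


def normal_cdf(z):
    """Standard normal distribution function, via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def binomial_cdf(y, n, p):
    """Exact P(Y_n <= y)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(0, y + 1))


def normal_approx_cdf(y, n, p):
    """P(Y_n <= y) approximated by Phi((y + 0.5 - np) / sqrt(np(1 - p))), with continuity correction."""
    return normal_cdf((y + 0.5 - n * p) / sqrt(n * p * (1 - p)))


n, p = 100, 0.5
for y in [40, 45, 50, 55, 60]:
    print(y, round(binomial_cdf(y, n, p), 4), round(normal_approx_cdf(y, n, p), 4))
```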
A student takes a multiple choice test with 20 questions, each with 5 choices (only one of which is correct). Suppose that the student blindly guesses. Let $X$ denote the number of questions that the student answers correctly.
A certain type of missile has failure probability 0.02. Let $X$ denote the number of failures in 50 tests.
Suppose that in a certain district, 40% of the registered voters prefer candidate $A$. A random sample of 50 registered voters is selected. Let $X$ denote the number in the sample who prefer $A$.
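Each of the three variables above is binomial: $n = 20$, $p = 1/5$ for the guessing student; $n = 50$, $p = 0.02$ for the missile failures; $n = 50$, $p = 0.4$ for the voters (treating the 50 sampled voters as Bernoulli trials). A minimal sketch that tabulates the density and reports the mean $np$, the variance $np(1-p)$, and the mode; the labels and helper name are our own.

```python
from math import comb, sqrt


def summarize(n, p):
    """Return the binomial density as a list, together with the mean np and variance np(1 - p)."""
    pdf = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    return pdf, n * p, n * p * (1 - p)


examples = {
    "guessing student": (20, 1 / 5),
    "missile failures": (50, 0.02),
    "voter sample": (50, 0.4),
}
for name, (n, p) in examples.items():
    pdf, mean, var = summarize(n, p)
    mode = max(range(n + 1), key=lambda k: pdf[k])
    print(f"{name}: n={n}, p={p}, mean={mean:.2f}, var={var:.3f}, sd={sqrt(var):.3f}, mode={mode}")
```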
Recall that a standard die is a six-sided die. A fair die is one in which the faces are equally likely. An ace-six flat die is a standard die in which faces 1 and 6 have probability $1/4$ each, and faces 2, 3, 4, and 5 have probability $1/8$ each.
A standard, fair die is tossed 10 times. Let $X$ denote the number of aces.
A coin is tossed 100 times and results in 30 heads. Find the probability density function of the number of heads in the first 20 tosses.
An ace-six flat die is rolled 1000 times. Let $X$ denote the number of times that a score of 1 or 2 occurred.
In the binomial coin experiment, select the proportion of heads. Set values of $n$ and $p$. Run the experiment 100 times, updating after each run. Over all 100 runs, compute the square root of the average of the squares of the errors $M_n - p$, when $M_n$ is used to estimate $p$. This number is a measure of the quality of the estimate.
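Since $M_n$ is unbiased, the root-mean-square error in this exercise estimates the standard deviation $\sqrt{p(1-p)/n}$. The sketch below replaces the applet with Python's `random` module; the values $n = 20$ and $p = 0.3$ are hypothetical stand-ins for the exercise's settings, and `rms_error` is our own name.

```python
import random
from math import sqrt


def rms_error(n, p, runs, seed=None):
    """Simulate `runs` independent values of M_n and return sqrt(mean of (M_n - p)^2)."""
    rng = random.Random(seed)
    squares = []
    for _ in range(runs):
        successes = sum(1 for _ in range(n) if rng.random() < p)  # n Bernoulli trials
        m = successes / n
        squares.append((m - p) ** 2)
    return sqrt(sum(squares) / runs)


n, p = 20, 0.3  # hypothetical parameter values
print(rms_error(n, p, runs=100, seed=1), sqrt(p * (1 - p) / n))
```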
In the binomial coin experiment, select the number of heads $Y_n$, and set values of $n$ and $p$. Run the experiment 1000 times with an update frequency of 100. Compute and compare the following:
In the binomial coin experiment, select the proportion of heads $M_n$ and set values of $n$ and $p$. Run the experiment 1000 times with an update frequency of 100. Compute and compare each of the following:
In 1693, Samuel Pepys asked Isaac Newton whether it is more likely to get at least one ace in 6 rolls of a die or at least two aces in 12 rolls of a die. This problem is known as Pepys' problem; naturally, Pepys had fair dice in mind.
Guess the answer to Pepys' problem based on empirical data. With fair dice and $n = 6$, run the simulation of the dice experiment 500 times and compute the relative frequency of at least one ace. Now with $n = 12$, run the simulation 500 times and compute the relative frequency of at least two aces. Compare the results.
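For comparison with the simulation, the exact answer follows from the binomial distribution: the number of aces in $n$ rolls of a fair die is binomial with parameters $n$ and $1/6$. A short sketch (the helper name `prob_at_least` is ours):

```python
from math import comb


def prob_at_least(j, n, p):
    """P(Y_n >= j) for Y_n binomial with parameters n and p."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(j, n + 1))


one_in_six = prob_at_least(1, 6, 1 / 6)
two_in_twelve = prob_at_least(2, 12, 1 / 6)
print(round(one_in_six, 4), round(two_in_twelve, 4))  # prints 0.6651 0.6187
```

So at least one ace in 6 rolls is the more likely event.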
Which is more likely: at least one ace with 4 throws of a fair die or at least one double ace in 24 throws of two fair dice? This is known as de Méré's problem, named after the Chevalier de Méré.
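Both events are complements of "no successes" in Bernoulli trials: the number of aces in 4 throws is binomial with parameters 4 and $1/6$, and the number of double aces in 24 throws of two dice is binomial with parameters 24 and $1/36$. A minimal sketch of the exact computation (helper name ours):

```python
def prob_at_least_one(n, p):
    """P(Y_n >= 1) = 1 - P(Y_n = 0) = 1 - (1 - p)^n."""
    return 1 - (1 - p) ** n


one_ace_in_4 = prob_at_least_one(4, 1 / 6)
double_ace_in_24 = prob_at_least_one(24, 1 / 36)
print(round(one_ace_in_4, 4), round(double_ace_in_24, 4))  # prints 0.5177 0.4914
```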
In the cicada data, compute the proportion of males in the entire sample, and the proportion of males of each species in the sample.
In the M&M data, pool the bags to create a large sample of M&Ms. Now compute the sample proportion of red M&Ms.
The Galton board is a triangular array of pegs. The rows are numbered by the natural numbers $\mathbb{N} = \{0, 1, 2, \ldots\}$ from top downward. Row $n$ has $n + 1$ pegs numbered from left to right by the integers $0, 1, \ldots, n$. Thus a peg can be uniquely identified by the ordered pair $(n, k)$ where $n$ is the row number and $k$ is the peg number in that row. The Galton board is named after Francis Galton.
Now suppose that a ball is dropped from above the top peg $(0, 0)$. Each time the ball hits a peg, it bounces to the right with probability $p$ and to the left with probability $1 - p$, independently from bounce to bounce.
Show that the number of the peg that the ball hits in row $n$ has the binomial distribution with parameters $n$ and $p$.
In the Galton board experiment, select random variable $Y$ (the number of moves right). Vary the parameters $n$ and $p$ and note the shape and location of the probability density function and the mean/standard deviation bar. For selected values of the parameters, click single step several times and watch the ball fall through the pegs. Then run the experiment 1000 times, updating every 10 runs and watch the path of the ball. Note the apparent convergence of the relative frequency function and empirical moments to the density function and distribution moments, respectively.
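The Galton board is easy to simulate directly: a bounce to the right moves the ball from peg $(i, j)$ to peg $(i + 1, j + 1)$, and a bounce to the left moves it to $(i + 1, j)$, so the peg number in row $n$ counts the right moves. This is a sketch with hypothetical function names, comparing relative frequencies with the binomial density:

```python
import random
from collections import Counter
from math import comb


def drop_ball(n, p, rng):
    """Follow one ball from peg (0, 0) down to row n; return the peg number reached in row n."""
    peg = 0
    for _ in range(n):
        if rng.random() < p:  # bounce right: the peg number increases by 1
            peg += 1
        # bounce left: the peg number stays the same
    return peg


def simulate(n, p, runs, seed=None):
    """Relative frequency of each peg number 0..n in row n over `runs` balls."""
    rng = random.Random(seed)
    counts = Counter(drop_ball(n, p, rng) for _ in range(runs))
    return {k: counts[k] / runs for k in range(n + 1)}


n, p, runs = 8, 0.5, 10000
freq = simulate(n, p, runs, seed=3)
for k in range(n + 1):
    print(k, round(freq[k], 4), round(comb(n, k) * p**k * (1 - p) ** (n - k), 4))
```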
Recall the discussion of structural reliability given in the last section on Bernoulli trials. In particular, we have a system of $n$ similar components that function independently, each with reliability $p$. Suppose now that the system as a whole functions properly if and only if at least $k$ of the $n$ components are good. Such a system is called, appropriately enough, a $k$ out of $n$ system. Note that the series and parallel systems considered in the previous section are $n$ out of $n$ and 1 out of $n$ systems, respectively.
Show that the reliability of a $k$ out of $n$ system is
$$r_{n,k}(p) = \sum_{i=k}^n \binom{n}{i} p^i (1 - p)^{n - i}$$
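A direct transcription of the reliability formula, with a check against the series ($k = n$) and parallel ($k = 1$) cases; the function name `k_out_of_n_reliability` is our own.

```python
from math import comb


def k_out_of_n_reliability(n, k, p):
    """Probability that at least k of n independent components, each good with probability p, are good."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


n, p = 5, 0.9
print(k_out_of_n_reliability(n, n, p), p**n)            # series system: all n must work
print(k_out_of_n_reliability(n, 1, p), 1 - (1 - p)**n)  # parallel system: one is enough
```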
In the binomial coin experiment, set values of $n$ and $p$ and run the simulation 1000 times, updating every 100 runs. Compute the empirical reliability and compare with the true reliability in each of the following cases:
Consider a system with $n = 4$ components. Sketch the graphs of the reliability functions of the 1, 2, 3, and 4 out of 4 systems, as functions of $p$, on the same set of axes.
An $n$ out of $2n - 1$ system is a majority rules system.
In the binomial coin experiment, compute the empirical reliability, based on 100 runs, in each of the following cases. Compare your results to the true probabilities.
Show that .
The Weierstrass Approximation Theorem, named after Karl Weierstrass, states that any real-valued function that is continuous on a closed, bounded interval can be uniformly approximated on that interval, to any degree of accuracy, with a polynomial. The theorem is important, since polynomials are simple and basic functions, and a bit surprising, since continuous functions can be quite strange.
In 1912, Sergei Bernstein gave an explicit construction of polynomials that uniformly approximate a given continuous function, using Bernoulli trials. Bernstein's result is a beautiful example of the probabilistic method, the use of probability theory to obtain results in other areas of mathematics that are seemingly unrelated to probability.
Suppose that $f$ is a real-valued function that is continuous on the interval $[0, 1]$. The Bernstein polynomial of degree $n$ for $f$ is defined by
$$b_n(p) = \mathbb{E}_p[f(M_n)], \quad p \in [0, 1]$$
where $M_n$ is the proportion of successes in the first $n$ Bernoulli trials with success parameter $p$, as defined earlier. Note that we are emphasizing the dependence on $p$ in the expected value operator. The next exercise gives a more explicit representation, and shows that the Bernstein polynomial is, in fact, a polynomial.
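Use basic properties of expected value to show that
$$b_n(p) = \sum_{k=0}^n f\left(\frac{k}{n}\right) \binom{n}{k} p^k (1 - p)^{n - k}, \quad p \in [0, 1]$$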
In particular, show that $b_n(0) = f(0)$ and $b_n(1) = f(1)$.
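The explicit formula makes Bernstein polynomials easy to evaluate. This sketch (function names hypothetical) computes $b_n(p) = \sum_{k=0}^n f(k/n)\binom{n}{k}p^k(1-p)^{n-k}$ and estimates the uniform error on a grid for the test function $f(x) = |x - 1/2|$, which is our own choice of a continuous function with a corner; the error shrinks slowly as $n$ grows.

```python
from math import comb


def bernstein(f, n, p):
    """b_n(p) = sum_k f(k/n) C(n, k) p^k (1 - p)^(n - k) = E_p[f(M_n)]."""
    return sum(f(k / n) * comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1))


def sup_error(f, n, grid=201):
    """Largest |b_n - f| over an evenly spaced grid in [0, 1], an approximation to the sup norm."""
    points = [i / (grid - 1) for i in range(grid)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in points)


def f(x):
    return abs(x - 0.5)  # hypothetical test function: continuous, not differentiable at 1/2


for n in [10, 20, 40, 80]:
    print(n, round(sup_error(f, n), 4))
```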
Prove Bernstein's theorem: $b_n \to f$ as $n \to \infty$ uniformly on $[0, 1]$. Fill in the details of the major steps below:
Compute the Bernstein polynomials of orders 1, 2, and 3 for the function defined by . Graph and the three polynomials on the same set of axes.
Use a computer algebra system to compute the Bernstein polynomials of orders 10, 20, and 30 for the function defined below. Use the CAS to graph the function and the three polynomials on the same axes.