Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the Bernoulli distribution with unknown success parameter $p$. Thus, these are independent random variables taking the values 1 and 0 with probabilities $p$ and $1 - p$ respectively. Recall that the mean and variance of the Bernoulli distribution are $E(X) = p$ and $\text{var}(X) = p (1 - p)$.
Usually, this model arises in one of the following contexts:
In this section, we will construct confidence intervals for $p$. A parallel section on Tests in the Bernoulli Model is in the chapter on Hypothesis Testing. Note that the sample mean of our data vector
$$M = \frac{1}{n} \sum_{i=1}^n X_i$$
is the sample proportion of objects of the type of interest. By the central limit theorem, the standard score
$$Z = \frac{M - p}{\sqrt{p (1 - p) / n}}$$
has approximately a standard normal distribution and hence is (approximately) a pivot variable for $p$. For a given sample size $n$, the distribution of $Z$ is closest to normal when $p$ is near $\frac{1}{2}$ and farthest from normal when $p$ is near 0 or 1 (extreme). Because the pivot variable is (approximately) normally distributed, the construction of confidence intervals for $p$ in this model is similar to the construction of confidence intervals for $\mu$ in the normal model.
As usual, for $r \in (0, 1)$, let $z_r$ denote the quantile of order $r$ for the standard normal distribution. For selected values of $r$, $z_r$ can be obtained from the last row of the table of the $t$ distribution, from the table of the standard normal distribution, from the quantile applet, or from most statistical software packages.
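As an illustration of the last option, the quantile $z_r$ can be computed with Python's standard library. This is only a sketch: the helper name `z` is chosen here for convenience and does not come from the text.

```python
from statistics import NormalDist

def z(r: float) -> float:
    """Quantile of order r (0 < r < 1) of the standard normal distribution."""
    return NormalDist().inv_cdf(r)

# Quantiles that commonly appear in confidence intervals.
for r in (0.90, 0.95, 0.975, 0.99, 0.995):
    print(f"z_{r} = {z(r):.4f}")
```

By symmetry of the standard normal distribution, `z(1 - r)` equals `-z(r)`, which is why only upper quantiles are usually tabulated.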
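Use the pivot variable to show that for any $\alpha \in (0, 1)$ and any $r \in [0, 1]$, an approximate $1 - \alpha$ confidence set for $p$ is
$$\left\{ p \in [0, 1] : M - z_{1 - r \alpha} \sqrt{\frac{p (1 - p)}{n}} \le p \le M - z_{\alpha - r \alpha} \sqrt{\frac{p (1 - p)}{n}} \right\}$$
As usual, $r$ is the proportion of the significance level $\alpha$ in the right tail of the distribution of the pivot variable, and $1 - r$ is the proportion of the significance level in the left tail of the distribution of the pivot variable.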
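Use the quadratic formula to show that the confidence set in Exercise 1 is actually an interval of the form $[L, U]$ where
$$L = \frac{M + \dfrac{z_{1 - r \alpha}^2}{2 n} - z_{1 - r \alpha} \sqrt{\dfrac{M (1 - M)}{n} + \dfrac{z_{1 - r \alpha}^2}{4 n^2}}}{1 + \dfrac{z_{1 - r \alpha}^2}{n}}, \qquad U = \frac{M + \dfrac{z_{\alpha - r \alpha}^2}{2 n} - z_{\alpha - r \alpha} \sqrt{\dfrac{M (1 - M)}{n} + \dfrac{z_{\alpha - r \alpha}^2}{4 n^2}}}{1 + \dfrac{z_{\alpha - r \alpha}^2}{n}}$$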
As usual, the most important special cases are the equal-tailed confidence interval, obtained by setting $r = \frac{1}{2}$, the confidence upper bound, obtained by setting $r = 0$, and the confidence lower bound, obtained by setting $r = 1$.
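The endpoints that come out of the quadratic formula can be sketched numerically as follows. The function name `score_interval` and its arguments are hypothetical, introduced only for this illustration. Each endpoint is the root of $M - p = a \sqrt{p (1 - p) / n}$, where $a$ is the relevant normal quantile, and the one-sided cases $r = 0$ and $r = 1$ are handled as limits (an assumption of this sketch).

```python
from math import sqrt
from statistics import NormalDist

def score_interval(m, n, alpha=0.05, r=0.5):
    """Endpoints (L, U) of the confidence set obtained from the pivot variable.

    m: sample proportion, n: sample size,
    r: proportion of alpha placed in the right tail of the pivot variable.
    """
    def endpoint(order):
        # Limiting cases: the quantile is +infinity or -infinity,
        # so the corresponding endpoint is the trivial bound 0 or 1.
        if order >= 1.0:
            return 0.0
        if order <= 0.0:
            return 1.0
        a = NormalDist().inv_cdf(order)
        # Root of (m - p) = a * sqrt(p (1 - p) / n), via the quadratic formula.
        num = m + a * a / (2 * n) - a * sqrt(m * (1 - m) / n + a * a / (4 * n * n))
        return num / (1 + a * a / n)

    return endpoint(1 - r * alpha), endpoint(alpha - r * alpha)

# Hypothetical data: 40 successes in 100 trials.
print(score_interval(0.4, 100, alpha=0.05, r=0.5))  # equal-tailed interval
print(score_interval(0.4, 100, alpha=0.05, r=1.0))  # lower bound only
print(score_interval(0.4, 100, alpha=0.05, r=0.0))  # upper bound only
```

Note that the equal-tailed interval produced this way is generally not symmetric about the sample proportion, because the standard error in the pivot variable depends on $p$.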
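A simplified approximate $1 - \alpha$ confidence interval for $p$ can be obtained by replacing the distribution parameter $p$ by the point estimate $M$ in the extreme parts of the inequality in Exercise 1:
$$\left[ M - z_{1 - r \alpha} \sqrt{\frac{M (1 - M)}{n}}, \; M - z_{\alpha - r \alpha} \sqrt{\frac{M (1 - M)}{n}} \right]$$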
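Show that an approximate level $1 - \alpha$ confidence lower bound for $p$ is
$$M - z_{1 - \alpha} \sqrt{\frac{M (1 - M)}{n}}$$
Show that an approximate level $1 - \alpha$ confidence upper bound for $p$ is
$$M + z_{1 - \alpha} \sqrt{\frac{M (1 - M)}{n}}$$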
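Of the simplified two-sided confidence intervals based on Exercise 1, show that the one with smallest length is the equal-tailed interval obtained by letting $r = \frac{1}{2}$:
$$\left[ M - z_{1 - \alpha / 2} \sqrt{\frac{M (1 - M)}{n}}, \; M + z_{1 - \alpha / 2} \sqrt{\frac{M (1 - M)}{n}} \right]$$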
Note that this interval is symmetric about the sample proportion $M$, but that both the center and the length of the interval are random. This is the two-sided interval that is normally used.
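A minimal sketch of the simplified interval and the corresponding one-sided bounds is given below. The function names are hypothetical, and the data in the usage lines are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def simplified_interval(m, n, alpha=0.05):
    """Equal-tailed simplified interval: m -/+ z * sqrt(m (1 - m) / n)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z * sqrt(m * (1 - m) / n)
    return m - half, m + half

def simplified_lower_bound(m, n, alpha=0.05):
    """Approximate level 1 - alpha confidence lower bound."""
    z = NormalDist().inv_cdf(1 - alpha)
    return m - z * sqrt(m * (1 - m) / n)

def simplified_upper_bound(m, n, alpha=0.05):
    """Approximate level 1 - alpha confidence upper bound."""
    z = NormalDist().inv_cdf(1 - alpha)
    return m + z * sqrt(m * (1 - m) / n)

# Hypothetical data: 40 successes in 100 trials.
m, n = 40 / 100, 100
print(simplified_interval(m, n))
print(simplified_lower_bound(m, n), simplified_upper_bound(m, n))
```

Because the one-sided bounds use $z_{1 - \alpha}$ rather than $z_{1 - \alpha / 2}$, each lies closer to the sample proportion than the matching endpoint of the two-sided interval at the same confidence level.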
Use the simulation of the proportion estimation experiment to explore the procedure. Use various values of $p$ and various confidence levels, sample sizes, and interval types. For each configuration, run the experiment 1000 times with an update frequency of 10 and note how well the proportion of successful intervals approximates the theoretical confidence level.
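For readers without access to the applet, a rough stand-in can be sketched in Python. This is not the applet's implementation; the function `coverage`, its arguments, and the fixed seed are assumptions made for this illustration. It repeatedly draws a Bernoulli sample, builds the simplified equal-tailed interval, and records how often the interval contains the true $p$.

```python
import random
from math import sqrt
from statistics import NormalDist

def coverage(p, n, alpha=0.05, runs=1000, seed=0):
    """Proportion of simulated simplified intervals that contain the true p."""
    rng = random.Random(seed)  # fixed seed so that runs are reproducible
    z = NormalDist().inv_cdf(1 - alpha / 2)
    hits = 0
    for _ in range(runs):
        # Sample proportion of n simulated Bernoulli(p) trials.
        m = sum(rng.random() < p for _ in range(n)) / n
        half = z * sqrt(m * (1 - m) / n)
        if m - half <= p <= m + half:
            hits += 1
    return hits / runs

# Compare a central value of p with a more extreme one.
for p in (0.5, 0.3, 0.05):
    print(p, coverage(p, n=100))
```

With settings like these, one would expect the empirical proportion to sit near the nominal level for central values of $p$ and to drift away from it for extreme values, in line with the remark that the pivot variable is farthest from normal when $p$ is near 0 or 1.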
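Show that the variance of the Bernoulli distribution is maximized when $p = \frac{1}{2}$ and thus the maximum variance is $\frac{1}{4}$.
Use the pivot variable to show that for any $\alpha \in (0, 1)$ and any $r \in [0, 1]$, a conservative $1 - \alpha$ confidence interval for $p$ is
$$\left[ M - z_{1 - r \alpha} \frac{1}{2 \sqrt{n}}, \; M - z_{\alpha - r \alpha} \frac{1}{2 \sqrt{n}} \right]$$
Show that a conservative level $1 - \alpha$ confidence lower bound for $p$ is
$$M - z_{1 - \alpha} \frac{1}{2 \sqrt{n}}$$
Show that a conservative level $1 - \alpha$ confidence upper bound for $p$ is
$$M + z_{1 - \alpha} \frac{1}{2 \sqrt{n}}$$
Of the two-sided confidence intervals in Exercise 8, show that the one with smallest length is the equal-tailed interval obtained by letting $r = \frac{1}{2}$:
$$\left[ M - z_{1 - \alpha / 2} \frac{1}{2 \sqrt{n}}, \; M + z_{1 - \alpha / 2} \frac{1}{2 \sqrt{n}} \right]$$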
Note that this interval is symmetric about the sample proportion and that the length of the interval is deterministic. This is the conservative two-sided interval that is normally used. Of course, the conservative confidence intervals will be larger than the approximate confidence intervals. The conservative estimate can be used to design the experiment.
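The conservative equal-tailed interval can be sketched in the same way, replacing the estimated standard deviation by its maximum value $\frac{1}{2}$. The function name and the example data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def conservative_interval(m, n, alpha=0.05):
    """Equal-tailed conservative interval: m -/+ z / (2 sqrt(n))."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    half = z / (2 * sqrt(n))  # depends only on n and alpha, not on the data
    return m - half, m + half

# Hypothetical data: 10 successes in 100 trials.
lo, hi = conservative_interval(0.10, 100)
print(lo, hi, hi - lo)
```

The half-length does not depend on the sample proportion, so it is known before any data are collected. This is what makes the conservative interval useful for planning the sample size.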
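Show that a conservative estimate of the sample size needed to estimate $p$ with confidence $1 - \alpha$ and margin of error $\epsilon$ is as follows, where $z = z_{1 - \alpha / 2}$ for the two-sided interval and $z = z_{1 - \alpha}$ for the confidence upper or lower bound:
$$n = \left\lceil \frac{z^2}{4 \epsilon^2} \right\rceil$$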
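The sample size rule follows from requiring the conservative half-length $z / (2 \sqrt{n})$ to be at most the margin of error. A short sketch, with a hypothetical function name and made-up inputs:

```python
from math import ceil
from statistics import NormalDist

def conservative_sample_size(eps, alpha=0.05, two_sided=True):
    """Smallest n with z / (2 sqrt(n)) <= eps, that is, n >= z^2 / (4 eps^2)."""
    order = 1 - alpha / 2 if two_sided else 1 - alpha
    z = NormalDist().inv_cdf(order)
    return ceil(z * z / (4 * eps * eps))

# Hypothetical requirement: margin of error 0.05 with 95% confidence.
print(conservative_sample_size(0.05, alpha=0.05, two_sided=True))
print(conservative_sample_size(0.05, alpha=0.05, two_sided=False))
```

Halving the margin of error roughly quadruples the required sample size, since $n$ is proportional to $1 / \epsilon^2$.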
In a poll of 1000 registered voters in a certain district, 427 prefer candidate X. Construct the 95% two-sided confidence interval for the proportion of all registered voters in the district that prefer X.
A coin is tossed 500 times and results in 302 heads. Construct the 95% confidence lower bound for the probability of heads. Do you believe that the coin is fair?
A sample of 400 memory chips from a production line is tested, and 30 are defective. Construct the conservative 90% two-sided confidence interval for the proportion of defective chips.
A drug company wants to estimate the proportion of persons who will experience an adverse reaction to a certain new drug. The company wants a two-sided interval with margin of error 0.03 with 95% confidence. How large should the sample be?
An advertising agency wants to construct a 99% confidence lower bound for the proportion of dentists who recommend a certain brand of toothpaste. The margin of error is to be 0.02. How large should the sample be?
The Buffon trial data set gives the results of 104 repetitions of Buffon's needle experiment. Theoretically, the data should correspond to Bernoulli trials with $p = 2 / \pi$, but because real students dropped the needle, the true value of $p$ is unknown. Construct a 95% confidence interval for $p$. Do you believe that $p$ is the theoretical value?