In this section, we will study estimation problems in the two-sample normal model and in the bivariate normal model. This section parallels the section on Tests in the Two-Sample Normal Model in the Chapter on Hypothesis Testing.
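Suppose that $\boldsymbol{X} = (X_1, X_2, \ldots, X_{n_1})$ is a random sample of size $n_1$ from the normal distribution with mean $\mu_1$ and standard deviation $\sigma_1$, and that $\boldsymbol{Y} = (Y_1, Y_2, \ldots, Y_{n_2})$ is a random sample of size $n_2$ from the normal distribution with mean $\mu_2$ and standard deviation $\sigma_2$. Moreover, suppose that the samples $\boldsymbol{X}$ and $\boldsymbol{Y}$ are independent. Usually, the parameters are unknown, so the parameter space for our vector of parameters $(\mu_1, \mu_2, \sigma_1, \sigma_2)$ is $\mathbb{R}^2 \times (0, \infty)^2$.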
This type of situation arises frequently when the random variables represent a measurement of interest for the objects of the population, and the samples correspond to two different treatments. For example, we might be interested in the blood pressure of a certain population of patients. The first sample vector $\boldsymbol{X}$ records the blood pressures of a control sample, while the second sample vector $\boldsymbol{Y}$ records the blood pressures of the sample receiving a new drug. Similarly, we might be interested in the yield of an acre of corn. The vector $\boldsymbol{X}$ records the yields of a sample receiving one type of fertilizer, while the vector $\boldsymbol{Y}$ records the yields of a sample receiving a different type of fertilizer.
Usually our interest is in a comparison of the parameters (either the mean or standard deviation) for the two sampling distributions. In this section we will construct confidence sets for the parameter vector, which will in turn lead to confidence intervals for the ratio of the standard deviations and for the difference of the means. As with previous estimation problems, the construction depends on finding appropriate pivot variables.
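For a generic sample $\boldsymbol{U} = (U_1, U_2, \ldots, U_k)$ from a distribution with mean $a$, we will use our standard notation for the sample mean and for the sample variances:
$$M(\boldsymbol{U}) = \frac{1}{k} \sum_{i=1}^k U_i, \quad W^2(\boldsymbol{U}) = \frac{1}{k} \sum_{i=1}^k (U_i - a)^2, \quad S^2(\boldsymbol{U}) = \frac{1}{k - 1} \sum_{i=1}^k \left[U_i - M(\boldsymbol{U})\right]^2$$
Here $W^2$ is the special version of the sample variance, which requires the distribution mean $a$ to be known, and $S^2$ is the usual version. Recall also the special properties of these statistics when the sampling distribution is normal.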
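The three statistics named above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function names and the data are hypothetical, chosen for this example, and the "special" variance assumes that the distribution mean is known.

```python
from statistics import fmean


def sample_mean(u):
    """Sample mean of the data."""
    return fmean(u)


def special_sample_variance(u, a):
    """Special sample variance: average squared deviation from the
    KNOWN distribution mean a (divisor is the sample size)."""
    return fmean((x - a) ** 2 for x in u)


def usual_sample_variance(u):
    """Usual sample variance: squared deviations from the sample mean,
    divided by (sample size - 1)."""
    m = fmean(u)
    return sum((x - m) ** 2 for x in u) / (len(u) - 1)


# Hypothetical measurements, for illustration only.
u = [9.8, 10.4, 10.1, 9.6, 10.6]
print(sample_mean(u))
print(special_sample_variance(u, 10.0))  # assumes the true mean is 10.0
print(usual_sample_variance(u))
```

The special version is only computable when the mean is known, which is why the first subsection treats the means as known when it turns the confidence set into an interval.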
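Show that the following random variable has the $F$ distribution with $n_1$ degrees of freedom in the numerator and $n_2$ degrees of freedom in the denominator, where $W^2$ denotes the special sample variance (computed with the known distribution mean):
$$F = \frac{W^2(\boldsymbol{X}) / \sigma_1^2}{W^2(\boldsymbol{Y}) / \sigma_2^2}$$
Show that the variable is a pivot variable for $(\mu_1, \mu_2, \sigma_1, \sigma_2)$ and for $(\mu_1, \mu_2, \sigma_2 / \sigma_1)$.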
Now for $q \in (0, 1)$ and for $j \gt 0$ and $k \gt 0$, let $f_{j,k}(q)$ denote the quantile of order $q$ for the $F$ distribution with $j$ degrees of freedom in the numerator and $k$ degrees of freedom in the denominator. For selected values of $j$, $k$, and $q$, $f_{j,k}(q)$ can be computed using the quantile applet or from most statistical and mathematical software packages.
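Use the pivot variable in Exercise 1 to show that for any $r \in (0, 1)$ and $p \in (0, 1)$, a $1 - r$ confidence set for $(\mu_1, \mu_2, \sigma_1, \sigma_2)$ is
$$\left\{ (\mu_1, \mu_2, \sigma_1, \sigma_2) : f_{n_1, n_2}(r - p r) \frac{W^2(\boldsymbol{Y})}{W^2(\boldsymbol{X})} \lt \frac{\sigma_2^2}{\sigma_1^2} \lt f_{n_1, n_2}(1 - p r) \frac{W^2(\boldsymbol{Y})}{W^2(\boldsymbol{X})} \right\}$$
where $W^2$ denotes the special sample variance and $f_{n_1, n_2}(q)$ the quantile of order $q$ for the $F$ distribution with $n_1$ and $n_2$ degrees of freedom.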
It's hard to get much insight about the structure of this confidence set as a subset of our four-dimensional parameter space. The set is unbounded, of course, since we are estimating four parameters with a single pivot variable. If the means $\mu_1$ and $\mu_2$ are known (admittedly an artificial assumption), then the set becomes a confidence set for $(\sigma_1, \sigma_2)$. This set is also unbounded (it is a wedge between two rays from the origin), and has the general shape given in the picture below:
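Note, however, that with the means $\mu_1$ and $\mu_2$ known, our confidence set gives a bounded confidence interval for $\sigma_2^2 / \sigma_1^2$, and by taking square roots, a bounded confidence interval for $\sigma_2 / \sigma_1$. As always, the number $p$ controls the proportion of $r$ in the right tail of the distribution of our pivot variable (so that $1 - p$ is the proportion in the left tail). As usual, the most important special cases are $p = 1/2$, which gives the equal-tail, two-sided interval, and the limiting cases $p \to 0$ and $p \to 1$, which give one-sided confidence bounds.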
This subsection will parallel the previous one, except that we will use the usual version of the sample variances instead of the special version.
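Show that the following random variable has the $F$ distribution with $n_1 - 1$ degrees of freedom in the numerator and $n_2 - 1$ degrees of freedom in the denominator, where $S^2$ denotes the usual sample variance:
$$F = \frac{S^2(\boldsymbol{X}) / \sigma_1^2}{S^2(\boldsymbol{Y}) / \sigma_2^2}$$
Show that the variable is a pivot variable for $(\mu_1, \mu_2, \sigma_1, \sigma_2)$, for $(\sigma_1, \sigma_2)$, and for $\sigma_2 / \sigma_1$.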
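Use the pivot variable in Exercise 3 to show that for any $r \in (0, 1)$ and $p \in (0, 1)$, a $1 - r$ confidence set for $(\mu_1, \mu_2, \sigma_1, \sigma_2)$ is
$$\left\{ (\mu_1, \mu_2, \sigma_1, \sigma_2) : f_{n_1 - 1, n_2 - 1}(r - p r) \frac{S^2(\boldsymbol{Y})}{S^2(\boldsymbol{X})} \lt \frac{\sigma_2^2}{\sigma_1^2} \lt f_{n_1 - 1, n_2 - 1}(1 - p r) \frac{S^2(\boldsymbol{Y})}{S^2(\boldsymbol{X})} \right\}$$
where $S^2$ denotes the usual sample variance and $f_{n_1 - 1, n_2 - 1}(q)$ the quantile of order $q$ for the $F$ distribution with $n_1 - 1$ and $n_2 - 1$ degrees of freedom.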
By design, this confidence set gives no information at all about the means $\mu_1$ and $\mu_2$. As a confidence set for $(\sigma_1, \sigma_2)$, the set has the same general shape as before:
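Moreover, our confidence set gives a bounded confidence interval for $\sigma_2^2 / \sigma_1^2$, and by taking square roots, a bounded confidence interval for $\sigma_2 / \sigma_1$. As always, the number $p$ controls the proportion of $r$ in the right tail of the distribution of our pivot variable (so that $1 - p$ is the proportion of $r$ in the left tail). As usual, the most important special cases are $p = 1/2$, which gives the equal-tail, two-sided interval, and the limiting cases $p \to 0$ and $p \to 1$, which give one-sided confidence bounds.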
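As a rough sketch of how the interval for the ratio of standard deviations might be computed from data, consider the Python function below. It assumes the pivot is the ratio of the usual sample variances, each scaled by its own distribution variance, so that the interval is for the second standard deviation divided by the first. The function name and the data are hypothetical, and the two $F$ quantiles (of orders $r - p r$ and $1 - p r$) are supplied by the caller from a table or software, since the Python standard library has no $F$ quantile function.

```python
from math import sqrt
from statistics import variance  # usual sample variance (divisor n - 1)


def sd_ratio_interval(x, y, f_lower, f_upper):
    """Confidence interval for sigma_2 / sigma_1 (sd of y over sd of x).

    f_lower, f_upper: quantiles of orders r - p*r and 1 - p*r of the
    F distribution with (len(x) - 1, len(y) - 1) degrees of freedom,
    looked up by the caller (an assumption of this sketch).
    """
    ratio = variance(y) / variance(x)
    # The interval for the variance ratio is (f_lower * ratio, f_upper * ratio);
    # taking square roots gives the interval for the ratio of standard deviations.
    return sqrt(f_lower * ratio), sqrt(f_upper * ratio)


# Hypothetical samples of size 10 each, for illustration only.
x = [10.1, 9.7, 10.4, 9.9, 10.2, 10.0, 9.8, 10.3, 10.5, 9.6]
y = [10.9, 9.1, 11.2, 9.4, 10.6, 10.0, 8.8, 11.5, 10.3, 9.7]

# Approximate table value: the 0.975 quantile of F with (9, 9) degrees of
# freedom is about 4.03; with equal degrees of freedom the 0.025 quantile
# is its reciprocal.
print(sd_ratio_interval(x, y, f_lower=1 / 4.03, f_upper=4.03))
```

Passing the quantiles in explicitly keeps the sketch free of third-party dependencies; a statistical package would normally supply them directly.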
Next we will study a pivot variable whose confidence sets are better in terms of the difference of the means.
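Show that the difference of the sample means $M(\boldsymbol{Y}) - M(\boldsymbol{X})$ has the normal distribution with mean $\mu_2 - \mu_1$ and variance $\sigma_1^2 / n_1 + \sigma_2^2 / n_2$.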
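Show that the following random variable has the standard normal distribution:
$$Z = \frac{\left[M(\boldsymbol{Y}) - M(\boldsymbol{X})\right] - (\mu_2 - \mu_1)}{\sqrt{\sigma_1^2 / n_1 + \sigma_2^2 / n_2}}$$
Show that the variable is a pivot variable for $(\mu_1, \mu_2, \sigma_1, \sigma_2)$ and for $(\mu_2 - \mu_1, \sigma_1, \sigma_2)$.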
As usual, for $q \in (0, 1)$, we will let $z(q)$ denote the quantile of order $q$ for the standard normal distribution. Recall also that, by symmetry, $z(1 - q) = -z(q)$.
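Use the pivot variable in Exercise 6 to show that for any $r \in (0, 1)$ and $p \in (0, 1)$, a $1 - r$ confidence set for $(\mu_1, \mu_2, \sigma_1, \sigma_2)$ is
$$\left\{ (\mu_1, \mu_2, \sigma_1, \sigma_2) : \left[M(\boldsymbol{Y}) - M(\boldsymbol{X})\right] - z(1 - p r) \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \lt \mu_2 - \mu_1 \lt \left[M(\boldsymbol{Y}) - M(\boldsymbol{X})\right] - z(r - p r) \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right\}$$
where $M$ denotes the sample mean and $z(q)$ the standard normal quantile of order $q$.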
Once again, it is hard to get much insight about this confidence set, as a subset of the four-dimensional parameter space. It is unbounded, of course, since we are estimating four parameters using a single pivot variable. If the standard deviations $\sigma_1$ and $\sigma_2$ are known, then the set is a confidence set for $(\mu_1, \mu_2)$. This set is also unbounded, but is easy to understand; the boundary curves are lines with slope 1, so the set has the general shape given in the picture below. Note that the width of the confidence strip is deterministic.
Moreover, if the standard deviations $\sigma_1$ and $\sigma_2$ are known, the confidence set gives bounded confidence intervals for the difference of the means $\mu_2 - \mu_1$, and often this difference is our parameter of interest. The assumption that $\sigma_1$ and $\sigma_2$ are known is usually artificial, but less so than the assumption in our first subsection that the means $\mu_1$ and $\mu_2$ are known.
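As always, the number $p$ controls the proportion of $r$ in the right tail of the distribution of our pivot variable (so that $1 - p$ is the proportion of $r$ in the left tail). As usual, the most important special cases are $p = 1/2$, which gives the equal-tail, two-sided interval, and the limiting cases $p \to 0$ and $p \to 1$, which give one-sided confidence bounds.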
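The interval for the difference of the means with known standard deviations can be sketched in Python using only the standard library, which does provide standard normal quantiles through `statistics.NormalDist`. The function name, argument names, and the numbers in the usage line are hypothetical. The difference is taken as the second mean minus the first, and `r` and `p` play the roles described above (confidence level `1 - r`, with proportion `p` of `r` in the right tail of the pivot).

```python
from math import sqrt
from statistics import NormalDist


def mean_difference_interval_known_sd(m1, n1, sigma1, m2, n2, sigma2,
                                      r=0.05, p=0.5):
    """Confidence interval (level 1 - r) for mu_2 - mu_1 when the
    standard deviations sigma1 and sigma2 are KNOWN.

    m1, m2: sample means; n1, n2: sample sizes.
    Requires 0 < r < 1 and 0 < p < 1 so both quantile orders lie in (0, 1).
    """
    z = NormalDist().inv_cdf                 # standard normal quantile function
    diff = m2 - m1                           # point estimate of mu_2 - mu_1
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)  # deterministic given sigmas
    lower = diff - z(1 - p * r) * se
    upper = diff - z(r - p * r) * se
    return lower, upper


# Hypothetical summary statistics, for illustration only.
print(mean_difference_interval_known_sd(m1=50.0, n1=40, sigma1=4.0,
                                        m2=52.5, n2=50, sigma2=5.0))
```

Because the standard error depends only on the known standard deviations and the sample sizes, the width of the interval is the same for every data set, which matches the remark that the confidence strip has deterministic width.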
Our last construction will produce a pivot variable that is useful for estimating the difference of the means without needing to know the standard deviations $\sigma_1$ and $\sigma_2$. However, there is a cost; we will assume that the standard deviations are the same, $\sigma_1 = \sigma_2 = \sigma$, but the common value $\sigma$ is unknown. This assumption is reasonable if there is an inherent variability in the measurement variables that does not change even when different treatments are applied to the objects in the population.
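Show that, when the standard deviations have the common value $\sigma$, the normal pivot variable in Exercise 6 becomes
$$Z = \frac{\left[M(\boldsymbol{Y}) - M(\boldsymbol{X})\right] - (\mu_2 - \mu_1)}{\sigma \sqrt{1 / n_1 + 1 / n_2}}$$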
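To construct our desired pivot variable, we first need a point estimate of the common variance $\sigma^2$. A natural approach is to consider a weighted average of the sample variances $S^2(\boldsymbol{X})$ and $S^2(\boldsymbol{Y})$, with the degrees of freedom as the weight factors (this is called the pooled estimate of $\sigma^2$). Thus, let
$$S^2(\boldsymbol{X}, \boldsymbol{Y}) = \frac{(n_1 - 1) S^2(\boldsymbol{X}) + (n_2 - 1) S^2(\boldsymbol{Y})}{n_1 + n_2 - 2}$$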
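The pooled estimate described above is a weighted average of the two usual sample variances, weighted by their degrees of freedom. A minimal Python sketch follows; the function name and the numbers are hypothetical.

```python
def pooled_variance(s1_sq, n1, s2_sq, n2):
    """Pooled estimate of the common variance: the two usual sample
    variances weighted by their degrees of freedom n1 - 1 and n2 - 1."""
    return ((n1 - 1) * s1_sq + (n2 - 1) * s2_sq) / (n1 + n2 - 2)


# Hypothetical sample variances and sample sizes, for illustration only.
print(pooled_variance(s1_sq=4.0, n1=11, s2_sq=9.0, n2=21))
```

When the sample sizes are equal, the weights are equal and the pooled estimate reduces to the ordinary average of the two sample variances.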
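Show that the following variable, built from the pooled variance estimate $S^2(\boldsymbol{X}, \boldsymbol{Y})$, has the chi-square distribution with $n_1 + n_2 - 2$ degrees of freedom:
$$V = \frac{(n_1 + n_2 - 2) \, S^2(\boldsymbol{X}, \boldsymbol{Y})}{\sigma^2}$$
Hint: Express the variable as the sum of independent chi-square variables.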
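Show that the normal variable $Z$ of Exercise 8 and the chi-square variable $V$ of Exercise 9 are independent. Hint: Show or recall independence for each of the following pairs of variables: $M(\boldsymbol{X})$ and $S^2(\boldsymbol{X})$; $M(\boldsymbol{Y})$ and $S^2(\boldsymbol{Y})$; and $\left(M(\boldsymbol{X}), S^2(\boldsymbol{X})\right)$ and $\left(M(\boldsymbol{Y}), S^2(\boldsymbol{Y})\right)$.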
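Show that the following random variable has the $t$ distribution with $n_1 + n_2 - 2$ degrees of freedom, where $S(\boldsymbol{X}, \boldsymbol{Y})$ is the square root of the pooled variance estimate:
$$T = \frac{\left[M(\boldsymbol{Y}) - M(\boldsymbol{X})\right] - (\mu_2 - \mu_1)}{S(\boldsymbol{X}, \boldsymbol{Y}) \sqrt{1 / n_1 + 1 / n_2}}$$
Show that the variable is a pivot variable for $(\mu_1, \mu_2, \sigma)$, for $(\mu_1, \mu_2)$, and for $\mu_2 - \mu_1$.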
Hint: Show that the random variable can be written as
$$T = \frac{Z}{\sqrt{V / (n_1 + n_2 - 2)}}$$
where $Z$ is the random variable in Exercise 8 and $V$ is the random variable in Exercise 9. Moreover, $Z$ and $V$ are independent by Exercise 10.
For $k \gt 0$ and $q \in (0, 1)$, let $t_k(q)$ denote the quantile of order $q$ for the $t$ distribution with $k$ degrees of freedom. For selected values of $k$ and $q$, values of $t_k(q)$ are given in the table of the Student distribution, from the quantile applet, or from most statistical software packages. Recall also that, by symmetry, $t_k(1 - q) = -t_k(q)$.
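Use the pivot variable in Exercise 11 to show that for any $r \in (0, 1)$ and $p \in (0, 1)$, a $1 - r$ confidence set for $(\mu_1, \mu_2, \sigma)$ is
$$\left\{ (\mu_1, \mu_2, \sigma) : \left[M(\boldsymbol{Y}) - M(\boldsymbol{X})\right] - t_{n_1 + n_2 - 2}(1 - p r) \, S(\boldsymbol{X}, \boldsymbol{Y}) \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \lt \mu_2 - \mu_1 \lt \left[M(\boldsymbol{Y}) - M(\boldsymbol{X})\right] - t_{n_1 + n_2 - 2}(r - p r) \, S(\boldsymbol{X}, \boldsymbol{Y}) \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} \right\}$$
where $S(\boldsymbol{X}, \boldsymbol{Y})$ is the pooled estimate of the common standard deviation and $t_k(q)$ is the quantile of order $q$ for the $t$ distribution with $k$ degrees of freedom.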
By design, the confidence set gives no information about $\sigma_1$ and $\sigma_2$ (that is, about their common value $\sigma$). As a confidence set for $(\mu_1, \mu_2)$, the set has the same general shape as in the last subsection, except that the width of the strip is random.
Finally, of course, the confidence set gives bounded confidence intervals for the difference of the means $\mu_2 - \mu_1$, and again, this difference is often our parameter of interest.
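As always, the number $p$ controls the proportion of $r$ in the right tail of the distribution of our pivot variable (so that $1 - p$ is the proportion of $r$ in the left tail). The most important special cases are $p = 1/2$, which gives the equal-tail, two-sided interval, and the limiting cases $p \to 0$ and $p \to 1$, which give one-sided confidence bounds.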
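A sketch of the pooled interval for the difference of the means, working from summary statistics, might look as follows in Python. The function name, argument names, and numbers are hypothetical. The two Student $t$ quantiles (of orders $r - p r$ and $1 - p r$, with degrees of freedom equal to the total sample size minus 2) are passed in by the caller, since the Python standard library has no $t$ quantile function; they would come from a table or from statistical software. The equal-variance assumption of this subsection is taken for granted.

```python
from math import sqrt


def pooled_t_interval(m1, s1, n1, m2, s2, n2, t_lower, t_upper):
    """Confidence interval for mu_2 - mu_1 under the equal-variance assumption.

    m, s, n: sample mean, sample standard deviation, and sample size.
    t_lower, t_upper: quantiles of orders r - p*r and 1 - p*r of the
    t distribution with n1 + n2 - 2 degrees of freedom (caller-supplied).
    """
    # Pooled standard deviation: weights are the degrees of freedom.
    s_pooled = sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    se = s_pooled * sqrt(1 / n1 + 1 / n2)  # random, unlike the known-sd case
    diff = m2 - m1
    return diff - t_upper * se, diff - t_lower * se


# Hypothetical summary statistics with n1 = n2 = 13 (24 degrees of freedom).
# Approximate table value: the 0.975 quantile of t with 24 degrees of
# freedom is about 2.064; by symmetry the 0.025 quantile is its negative.
print(pooled_t_interval(m1=10.0, s1=2.1, n1=13,
                        m2=12.0, s2=1.8, n2=13,
                        t_lower=-2.064, t_upper=2.064))
```

Here the standard error is computed from the data, so the width of the interval varies from sample to sample.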
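In this subsection, we consider a model that is superficially similar to the two-sample normal model, but is actually much simpler. Suppose that
$$\left((X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)\right)$$
is a random sample of size $n$ from the bivariate normal distribution of $(X, Y)$ with $E(X) = \mu_1$, $E(Y) = \mu_2$, $\operatorname{var}(X) = \sigma_1^2$, $\operatorname{var}(Y) = \sigma_2^2$, and $\operatorname{cov}(X, Y) = \sigma_{1,2}$.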
Thus, instead of a pair of samples, we have a sample of pairs. This type of model frequently arises in before and after experiments, in which a measurement of interest is recorded for a sample of objects from the population, both before and after a treatment. For example, we could record the blood pressure of a sample of patients, before and after the administration of a certain drug. As with the two-sample normal model, the interest is usually in estimating the difference of the means.
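We will use our usual notation for the sample means and variances of $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$ and $\boldsymbol{Y} = (Y_1, Y_2, \ldots, Y_n)$. Recall also that the sample covariance of $(\boldsymbol{X}, \boldsymbol{Y})$ is
$$S(\boldsymbol{X}, \boldsymbol{Y}) = \frac{1}{n - 1} \sum_{i=1}^n \left[X_i - M(\boldsymbol{X})\right] \left[Y_i - M(\boldsymbol{Y})\right]$$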
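Show that the vector of differences $\boldsymbol{D} = (Y_1 - X_1, Y_2 - X_2, \ldots, Y_n - X_n)$ is a random sample of size $n$ from the distribution of $Y - X$, which is normal with mean $E(Y - X) = E(Y) - E(X)$ and variance $\operatorname{var}(Y - X) = \operatorname{var}(X) + \operatorname{var}(Y) - 2 \operatorname{cov}(X, Y)$.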
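Show that, for the vector of differences $\boldsymbol{D} = (Y_1 - X_1, \ldots, Y_n - X_n)$, with $S(\boldsymbol{X}, \boldsymbol{Y})$ denoting the sample covariance,
$$M(\boldsymbol{D}) = M(\boldsymbol{Y}) - M(\boldsymbol{X}), \quad S^2(\boldsymbol{D}) = S^2(\boldsymbol{X}) + S^2(\boldsymbol{Y}) - 2 S(\boldsymbol{X}, \boldsymbol{Y})$$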
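A numerical check is no substitute for the proof, but it can make the relationship between the statistics of the differences and the statistics of the original pairs concrete. The Python sketch below uses hypothetical paired data and a hand-written sample covariance (with divisor equal to the sample size minus 1) to confirm that the mean of the differences equals the difference of the means, and that the variance of the differences equals the sum of the two sample variances minus twice the sample covariance.

```python
from statistics import fmean, variance


def sample_covariance(x, y):
    """Sample covariance of paired data, with divisor n - 1."""
    mx, my = fmean(x), fmean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)


# Hypothetical before/after measurements on the same six subjects.
x = [120.0, 135.0, 128.0, 142.0, 131.0, 125.0]
y = [118.0, 130.0, 129.0, 136.0, 127.0, 121.0]
d = [b - a for a, b in zip(x, y)]  # differences: after minus before

mean_gap = fmean(d) - (fmean(y) - fmean(x))
var_gap = variance(d) - (variance(x) + variance(y) - 2 * sample_covariance(x, y))

# Both gaps should be zero up to floating-point rounding.
print(abs(mean_gap) < 1e-9, abs(var_gap) < 1e-9)
```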
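The sample of differences $\boldsymbol{D}$ fits the normal model for a single variable. The section on Estimation in the Normal Model could be used to obtain confidence sets and intervals for the parameters of the difference distribution: the mean $\mu_2 - \mu_1$ and the variance of the differences.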
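Since the differences form a single normal sample, the usual one-sample Student $t$ interval applies to them. A minimal Python sketch of the equal-tail interval for the mean difference follows. The function name and data are hypothetical, and the $t$ quantile of order $1 - r/2$ with degrees of freedom equal to the number of pairs minus 1 is supplied by the caller from a table or software.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_mean_difference_interval(x, y, t_upper):
    """Equal-tail confidence interval for the mean of y - x from paired data.

    t_upper: quantile of order 1 - r/2 of the t distribution with
    len(x) - 1 degrees of freedom (caller-supplied).
    """
    d = [b - a for a, b in zip(x, y)]   # one difference per pair
    se = stdev(d) / sqrt(len(d))        # standard error of the mean difference
    center = fmean(d)
    return center - t_upper * se, center + t_upper * se


# Hypothetical before/after measurements on the same six subjects.
x = [120.0, 135.0, 128.0, 142.0, 131.0, 125.0]
y = [118.0, 130.0, 129.0, 136.0, 127.0, 121.0]

# Approximate table value: the 0.975 quantile of t with 5 degrees of
# freedom is about 2.571.
print(paired_mean_difference_interval(x, y, t_upper=2.571))
```

Working with the differences means the covariance between the two measurements never has to be estimated separately; it is absorbed into the sample standard deviation of the differences.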
Suppose that $\boldsymbol{X}$ and $\boldsymbol{Y}$ are independent samples of the same size from normal distributions. These data fit both models: the two-sample normal model and the bivariate normal model. Which procedure would work better for estimating the difference of means $\mu_2 - \mu_1$?
A new drug is being developed to reduce a certain blood chemical. A sample of 36 patients is given a placebo while a sample of 49 patients is given the drug. Let $X$ denote the measurement for a patient given the placebo and $Y$ the measurement for a patient given the drug (in mg). The statistics are , , , .
A company claims that an herbal supplement improves intelligence. A sample of 25 persons is given a standard IQ test before and after taking the supplement. Let $X$ denote the IQ of a subject before taking the supplement and $Y$ the IQ of the subject after the supplement. The before and after statistics are , , , , . Do you believe the company's claim?
In Fisher's iris data, let $X$ denote the petal length of a Versicolor iris and $Y$ the petal length of a Virginica iris.
A plant has two machines that produce a circular rod whose diameter (in cm) is critical. Let $X$ denote the diameter of a rod from the first machine and $Y$ the diameter of a rod from the second machine. A sample of 100 rods from the first machine has mean 10.3 and standard deviation 1.2. A sample of 100 rods from the second machine has mean 9.8 and standard deviation 1.6.