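Suppose that $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the normal distribution with mean $\mu$ and standard deviation $\sigma$. In general, both parameters are unknown, so the parameter space is $\{(\mu, \sigma): \mu \in \mathbb{R}, \sigma \in (0, \infty)\}$.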
In this section we will construct hypothesis tests for $\mu$ and $\sigma$; these are among the most important special cases of hypothesis testing. We will develop several tests, based on different test statistics. Some of these tests will work better than others, depending on whether one of the parameters is known. This section parallels the section on Estimation in the Normal Model in the chapter on Set Estimation, and in particular, the duality between set estimation and hypothesis testing will play an important role.
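Ultimately, all of our test statistics can be constructed from basic test statistics that are standardized versions of our data variables. For $\mu_0 \in \mathbb{R}$ and $\sigma_0 \in (0, \infty)$, let
$$Y_i(\mu_0, \sigma_0) = \frac{X_i - \mu_0}{\sigma_0}, \quad i \in \{1, 2, \ldots, n\}$$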
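Show that $\boldsymbol{Y}(\mu_0, \sigma_0) = \left(Y_1(\mu_0, \sigma_0), Y_2(\mu_0, \sigma_0), \ldots, Y_n(\mu_0, \sigma_0)\right)$ is a random sample of size $n$ from the normal distribution with mean $(\mu - \mu_0) / \sigma_0$ and standard deviation $\sigma / \sigma_0$.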
In particular, $\boldsymbol{Y}(\mu, \sigma)$ is a random sample from the standard normal distribution. Thus, it is reasonable that test statistics constructed from $\boldsymbol{Y}(\mu_0, \sigma_0)$ may be useful for testing hypotheses with $\mu_0$ and $\sigma_0$ as conjectured values of $\mu$ and $\sigma$, respectively.
Recall that the sample mean of our data vector $\boldsymbol{X}$ is
$$M(\boldsymbol{X}) = \frac{1}{n} \sum_{i=1}^n X_i$$
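Our first test statistic is a linear transformation of the sample mean. For $\mu_0 \in \mathbb{R}$ and $\sigma_0 \in (0, \infty)$, define
$$Z(\mu_0, \sigma_0) = \frac{M(\boldsymbol{X}) - \mu_0}{\sigma_0 / \sqrt{n}}$$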
Show that $Z(\mu_0, \sigma_0)$ has the normal distribution with mean $\dfrac{\mu - \mu_0}{\sigma_0 / \sqrt{n}}$ and standard deviation $\sigma / \sigma_0$.
In particular, the random variable $Z(\mu, \sigma)$ is the ordinary standard score of $M(\boldsymbol{X})$ and has the standard normal distribution; this variable was one of the pivot variables used to construct confidence sets in the normal model.
Show that the test statistic in Exercise 2 can be written in terms of the basic test statistics in Exercise 1 as follows:
$$Z(\mu_0, \sigma_0) = \frac{1}{\sqrt{n}} \sum_{i=1}^n Y_i(\mu_0, \sigma_0)$$
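A quick numerical check of the identity in Exercise 3 can be sketched in Python with only the standard library. The simulated sample, the seed, and the hypothesized values $\mu_0 = 4.5$, $\sigma_0 = 2$ are arbitrary illustrative choices, and the helper names `basic_statistics` and `z_statistic` are hypothetical.

```python
import math
import random
from statistics import fmean

def basic_statistics(xs, mu0, sigma0):
    """Basic test statistics Y_i(mu0, sigma0) = (X_i - mu0) / sigma0."""
    return [(x - mu0) / sigma0 for x in xs]

def z_statistic(xs, mu0, sigma0):
    """Z(mu0, sigma0) = (M(X) - mu0) / (sigma0 / sqrt(n))."""
    return (fmean(xs) - mu0) / (sigma0 / math.sqrt(len(xs)))

# Hypothetical sample: 20 draws from the normal distribution with mu = 5, sigma = 2
random.seed(1)
xs = [random.gauss(5.0, 2.0) for _ in range(20)]
mu0, sigma0 = 4.5, 2.0

direct = z_statistic(xs, mu0, sigma0)
via_basic = sum(basic_statistics(xs, mu0, sigma0)) / math.sqrt(len(xs))
print(direct, via_basic)  # equal up to floating-point rounding
```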
As usual, we will let $\phi$ denote the standard normal probability density function and $\Phi$ the standard normal distribution function. For $p \in (0, 1)$, let $z(p)$ denote the quantile of order $p$ for the standard normal distribution. That is, $\Phi[z(p)] = p$. For selected values of $p$, $z(p)$ can be obtained from the last row of the table of the $t$ distribution, from the table of the standard normal distribution, from the quantile applet, or from most statistical software packages.
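Show or recall the following properties. Part (d) follows from the inverse function theorem of calculus.
(a) $\phi(-x) = \phi(x)$ for $x \in \mathbb{R}$
(b) $\Phi(-x) = 1 - \Phi(x)$ for $x \in \mathbb{R}$
(c) $z(1 - p) = -z(p)$ for $p \in (0, 1)$
(d) $z^\prime(p) = 1 / \phi[z(p)]$ for $p \in (0, 1)$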
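Python's standard library exposes $\phi$, $\Phi$, and the quantile function $z$ as the `pdf`, `cdf`, and `inv_cdf` methods of `statistics.NormalDist`. The following sketch prints a few commonly used quantiles and spot-checks the properties in Exercise 4 at arbitrary points. The derivative of $z$ is approximated by a central difference, so this illustrates the properties rather than proving them.

```python
from statistics import NormalDist

std = NormalDist()                      # standard normal distribution (mean 0, sd 1)
phi, Phi, z = std.pdf, std.cdf, std.inv_cdf

# A few quantiles z(p) that are used constantly in testing
for p in (0.9, 0.95, 0.975, 0.995):
    print(f"z({p}) = {z(p):.4f}")

# Spot-check the properties at arbitrary points x and p
x, p, h = 1.3, 0.9, 1e-6
assert abs(phi(-x) - phi(x)) < 1e-12               # symmetry of the density
assert abs(Phi(-x) - (1 - Phi(x))) < 1e-12         # symmetry of the distribution function
assert abs(z(1 - p) + z(p)) < 1e-9                 # symmetry of the quantiles
slope = (z(p + h) - z(p - h)) / (2 * h)            # numerical derivative of z at p
assert abs(slope - 1 / phi(z(p))) < 1e-4           # inverse function theorem
```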
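Show that for any $\alpha \in (0, 1)$ and $r \in [0, 1]$, the following test has significance level $\alpha$. Reject $H_0: (\mu, \sigma) = (\mu_0, \sigma_0)$ versus $H_1: (\mu, \sigma) \ne (\mu_0, \sigma_0)$ if and only if $Z(\mu_0, \sigma_0) < z(\alpha - r \alpha)$ or $Z(\mu_0, \sigma_0) > z(1 - r \alpha)$. Equivalently, we fail to reject $H_0$ if and only if
$$M(\boldsymbol{X}) - z(1 - r \alpha) \frac{\sigma_0}{\sqrt{n}} \le \mu_0 \le M(\boldsymbol{X}) - z(\alpha - r \alpha) \frac{\sigma_0}{\sqrt{n}}$$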
Note that the hypothesis test in Exercise 5 is the dual of the confidence set constructed with the pivot variable $Z(\mu, \sigma)$ in the section on Estimation in the Normal Model. That is, the set of $(\mu_0, \sigma_0)$ for which we fail to reject $H_0$ at level $\alpha$ is precisely the $1 - \alpha$ confidence set for $(\mu, \sigma)$. This set is shown in the picture above. Note that $r \alpha$ is the fraction of the significance level in the right tail of the distribution of the pivot variable; $(1 - r) \alpha$ is the fraction in the left tail. The equal-tailed case $r = \frac{1}{2}$ is the one that is most commonly used; this test is said to be unbiased.
Clearly, a test of two real parameters based on a single real-valued test statistic cannot be very good. In the remainder of this subsection, we will assume that $\sigma$ is known. This is often, but not always, an artificial assumption; see Exercise 43 for an example where the assumption may be reasonable. With $\sigma$ known, we will write our basic test statistic as $Z(\mu_0) = \dfrac{M(\boldsymbol{X}) - \mu_0}{\sigma / \sqrt{n}}$.
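Show that for any $\alpha \in (0, 1)$ and $r \in [0, 1]$, the following tests have significance level $\alpha$:
(a) Reject $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ if and only if $Z(\mu_0) < z(\alpha - r \alpha)$ or $Z(\mu_0) > z(1 - r \alpha)$.
(b) Reject $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$ if and only if $Z(\mu_0) > z(1 - \alpha)$.
(c) Reject $H_0: \mu \ge \mu_0$ versus $H_1: \mu < \mu_0$ if and only if $Z(\mu_0) < -z(1 - \alpha)$.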
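In part (a), the unbiased test with $r = \frac{1}{2}$ is most commonly used: Reject $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ if and only if $Z(\mu_0) < -z(1 - \alpha / 2)$ or $Z(\mu_0) > z(1 - \alpha / 2)$.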
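The three known-$\sigma$ tests of Exercise 6 can be sketched as a single Python function. The name `z_test`, the `alternative` labels, and the five measurements below are hypothetical, chosen only to show a sample on which the tests disagree.

```python
import math
from statistics import NormalDist, fmean

def z_test(xs, mu0, sigma, alpha, alternative="two-sided", r=0.5):
    """Return True if H0 is rejected by a z-test with known standard deviation sigma.

    alternative: "two-sided" (H1: mu != mu0), "greater" (H1: mu > mu0),
    or "less" (H1: mu < mu0); r is the right-tail fraction of alpha in the two-sided test.
    """
    z = NormalDist().inv_cdf                                  # standard normal quantile function
    stat = (fmean(xs) - mu0) / (sigma / math.sqrt(len(xs)))   # Z(mu0)
    if alternative == "two-sided":
        # inv_cdf is undefined at 0 and 1, so r = 1 and r = 0 use -inf and +inf cutoffs
        lower = z(alpha - r * alpha) if r < 1 else -math.inf
        upper = z(1 - r * alpha) if r > 0 else math.inf
        return stat < lower or stat > upper
    if alternative == "greater":
        return stat > z(1 - alpha)
    if alternative == "less":
        return stat < -z(1 - alpha)
    raise ValueError(f"unknown alternative: {alternative!r}")

# Hypothetical measurements with known sigma = 0.3; here Z(10) is about 1.34.
xs = [10.2, 9.9, 10.4, 10.1, 10.3]
for alt in ("two-sided", "greater", "less"):
    print(alt, z_test(xs, mu0=10.0, sigma=0.3, alpha=0.1, alternative=alt))
```

With $\alpha = 0.1$, the statistic lies between $-z(0.95) \approx -1.645$ and $z(0.95) \approx 1.645$, so the unbiased two-sided test does not reject, but it exceeds $z(0.9) \approx 1.282$, so the one-sided test against $\mu > 10$ does.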
For each of the tests in Exercise 6, show that we fail to reject $H_0$ at significance level $\alpha$ if and only if $\mu_0$ is in the corresponding $1 - \alpha$ confidence interval.
Recall that the power function of a test of a parameter is the probability of rejecting the null hypothesis, as a function of the true value of the parameter. Our next series of exercises will explore the power functions of the tests in Exercise 6.
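Show that the power function of the test in Exercise 6 (a) is given by the formula below, and satisfies the given properties:
$$\beta(\mu) = \Phi\left(z(\alpha - r \alpha) - \frac{\sqrt{n}}{\sigma}(\mu - \mu_0)\right) + 1 - \Phi\left(z(1 - r \alpha) - \frac{\sqrt{n}}{\sigma}(\mu - \mu_0)\right)$$
(a) $\beta(\mu_0) = \alpha$.
(b) If $0 < r < 1$, then $\beta(\mu) \to 1$ as $\mu \to -\infty$ and as $\mu \to \infty$.
(c) If $r = \frac{1}{2}$, then $\beta$ is decreasing on $(-\infty, \mu_0]$ and increasing on $[\mu_0, \infty)$.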
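Show that the power function of the test in Exercise 6 (b) is given by the formula below, and satisfies the given properties:
$$\beta(\mu) = 1 - \Phi\left(z(1 - \alpha) - \frac{\sqrt{n}}{\sigma}(\mu - \mu_0)\right)$$
(a) $\beta$ is increasing on $\mathbb{R}$.
(b) $\beta(\mu_0) = \alpha$.
(c) $\beta(\mu) \to 0$ as $\mu \to -\infty$ and $\beta(\mu) \to 1$ as $\mu \to \infty$.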
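Show that the power function of the test in Exercise 6 (c) is given by the formula below, and satisfies the given properties:
$$\beta(\mu) = \Phi\left(-z(1 - \alpha) - \frac{\sqrt{n}}{\sigma}(\mu - \mu_0)\right)$$
(a) $\beta$ is decreasing on $\mathbb{R}$.
(b) $\beta(\mu_0) = \alpha$.
(c) $\beta(\mu) \to 1$ as $\mu \to -\infty$ and $\beta(\mu) \to 0$ as $\mu \to \infty$.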
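The power functions in Exercises 8, 9, and 10 are easy to evaluate with `statistics.NormalDist`. The sketch relies on the fact that, when the true mean is $\mu$, $Z(\mu_0)$ is normal with mean $\sqrt{n}(\mu - \mu_0)/\sigma$ and standard deviation 1. The helper name `z_test_power` and the settings $\sigma = 2$, $n = 20$, $\alpha = 0.1$, $\mu_0 = 0$ are illustrative assumptions.

```python
import math
from statistics import NormalDist

N = NormalDist()

def z_test_power(mu, mu0, sigma, n, alpha, alternative="two-sided", r=0.5):
    """Power beta(mu) of the known-sigma z-tests, evaluated at the true mean mu."""
    shift = math.sqrt(n) * (mu - mu0) / sigma   # mean of Z(mu0) when the true mean is mu
    z = N.inv_cdf
    if alternative == "two-sided":
        left = N.cdf(z(alpha - r * alpha) - shift) if r < 1 else 0.0
        right = 1 - N.cdf(z(1 - r * alpha) - shift) if r > 0 else 0.0
        return left + right
    if alternative == "greater":                # H1: mu > mu0
        return 1 - N.cdf(z(1 - alpha) - shift)
    if alternative == "less":                   # H1: mu < mu0
        return N.cdf(-z(1 - alpha) - shift)
    raise ValueError(f"unknown alternative: {alternative!r}")

# Power of the unbiased two-sided test at a few true means (illustrative settings)
for mu in (-1.0, -0.5, 0.0, 0.5, 1.0):
    print(mu, round(z_test_power(mu, mu0=0.0, sigma=2.0, n=20, alpha=0.1), 4))
```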
Show that for any of the three tests in Exercise 6, increasing the sample size or decreasing the standard deviation results in a uniformly more powerful test.
In the mean test experiment, select the normal test statistic and the normal sampling distribution, and set the standard deviation $\sigma$, the significance level $\alpha$, the sample size $n$, and the test value $\mu_0$. Run the experiment 1000 times, updating every 10 runs, for several values of the true distribution mean $\mu$. For each value of $\mu$, note the relative frequency of the event that the null hypothesis is rejected. Sketch the empirical power function.
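Outside the applet, the same experiment can be approximated by a short Monte Carlo simulation. The settings below ($\sigma = 2$, $n = 20$, $\alpha = 0.1$, $\mu_0 = 0$, and the grid of true means) are illustrative assumptions, not the values of the original exercise. The sketch compares the empirical rejection frequency of the unbiased two-sided test with its theoretical power function.

```python
import math
import random
from statistics import NormalDist, fmean

N = NormalDist()

def empirical_power(mu, mu0, sigma, n, alpha, runs=1000, seed=0):
    """Relative frequency of rejecting H0: mu = mu0 with the unbiased two-sided z-test."""
    rng = random.Random(seed)
    crit = N.inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(runs):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        stat = (fmean(xs) - mu0) / (sigma / math.sqrt(n))
        rejections += abs(stat) > crit   # a bool counts as 0 or 1
    return rejections / runs

def theoretical_power(mu, mu0, sigma, n, alpha):
    """beta(mu) for the unbiased two-sided test (r = 1/2)."""
    shift = math.sqrt(n) * (mu - mu0) / sigma
    c = N.inv_cdf(1 - alpha / 2)
    return N.cdf(-c - shift) + 1 - N.cdf(c - shift)

# Illustrative settings: sigma = 2, n = 20, alpha = 0.1, mu0 = 0
for mu in (-1.5, -0.75, 0.0, 0.75, 1.5):
    emp = empirical_power(mu, 0.0, 2.0, 20, 0.1)
    theo = theoretical_power(mu, 0.0, 2.0, 20, 0.1)
    print(f"mu = {mu:5.2f}  empirical = {emp:.3f}  theoretical = {theo:.3f}")
```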
In the mean estimate experiment, select the normal pivot variable and the normal distribution, and set the mean $\mu$, the standard deviation $\sigma$, the confidence level $1 - \alpha$, and the sample size $n$. For each of the three types of confidence intervals, run the experiment 20 times, updating after each run. State the corresponding hypotheses and significance level, and for each run, give the set of $\mu_0$ for which the null hypothesis would be rejected.
In many cases, the first step is to design the experiment so that the significance level is $\alpha$ and so that the test has a given power $\beta$ for a given alternative $\mu_1$.
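For either of the one-sided tests in Exercise 6, show that the sample size $n$ needed for a test with significance level $\alpha$ and power $\beta$ for the alternative $\mu_1$ is as follows:
$$n = \left\lceil \frac{\sigma^2 \left[z(\beta) + z(1 - \alpha)\right]^2}{(\mu_1 - \mu_0)^2} \right\rceil$$
Hint: Set the power function equal to $\beta$ and solve for $n$.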
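For the unbiased, two-sided test, show that the sample size $n$ needed for a test with significance level $\alpha$ and power $\beta$ for the alternative $\mu_1$ is given approximately by the following formula:
$$n \approx \left\lceil \frac{\sigma^2 \left[z(\beta) + z(1 - \alpha / 2)\right]^2}{(\mu_1 - \mu_0)^2} \right\rceil$$
Hint: In the power function for the two-sided test given in Exercise 8, neglect the first term if $\mu_1 > \mu_0$ and neglect the second term if $\mu_1 < \mu_0$.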
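The two sample-size formulas can be sketched as follows. The numbers ($\sigma = 0.3$, $\mu_0 = 10$, $\mu_1 = 10.1$, $\alpha = 0.1$, $\beta = 0.9$) and the helper names are hypothetical. The last check confirms that the one-sided answer achieves power at least $\beta$ at $\mu_1$, while one fewer observation does not.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf

def n_one_sided(mu0, mu1, sigma, alpha, beta):
    """Smallest n giving power >= beta at mu1 for a one-sided z-test at level alpha."""
    return math.ceil(sigma**2 * (z(beta) + z(1 - alpha))**2 / (mu1 - mu0)**2)

def n_two_sided(mu0, mu1, sigma, alpha, beta):
    """Approximate n for the unbiased two-sided z-test (one tail of the power neglected)."""
    return math.ceil(sigma**2 * (z(beta) + z(1 - alpha / 2))**2 / (mu1 - mu0)**2)

def power_greater(mu1, mu0, sigma, n, alpha):
    """Power at mu1 of the test that rejects H0: mu <= mu0 when Z(mu0) > z(1 - alpha)."""
    return 1 - NormalDist().cdf(z(1 - alpha) - math.sqrt(n) * (mu1 - mu0) / sigma)

n1 = n_one_sided(10.0, 10.1, 0.3, alpha=0.1, beta=0.9)
n2 = n_two_sided(10.0, 10.1, 0.3, alpha=0.1, beta=0.9)
print(n1, n2, round(power_greater(10.1, 10.0, 0.3, n1, 0.1), 4))
```

Rounding up with `math.ceil` is the design choice that makes the answer conservative: the exact solution of the power equation is usually not an integer, and the power of the one-sided test increases with $n$.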
In this subsection we will use test statistics related to the sample variance. First, recall that the usual version of the sample variance is
$$S^2(\boldsymbol{X}) = \frac{1}{n - 1} \sum_{i=1}^n \left[X_i - M(\boldsymbol{X})\right]^2$$
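Moreover, recall that one of the most important special properties of normal samples is that the sample mean and the sample variance are independent. Now, for $\mu_0 \in \mathbb{R}$ and $\sigma_0 \in (0, \infty)$, define
$$U(\mu_0, \sigma_0) = \frac{1}{\sigma_0^2} \sum_{i=1}^n (X_i - \mu_0)^2, \qquad V(\sigma_0) = \frac{n - 1}{\sigma_0^2} S^2(\boldsymbol{X})$$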
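Recall that we have used $W^2(\boldsymbol{X}) = \frac{1}{n} \sum_{i=1}^n (X_i - \mu)^2$ as a special version of the sample variance, in the unlikely event that the mean $\mu$ is known. Note that $U(\mu, \sigma_0) = n W^2(\boldsymbol{X}) / \sigma_0^2$.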
Show that $U(\mu_0, \sigma_0)$ is the sum of the squares of the basic test statistics in Exercise 1:
$$U(\mu_0, \sigma_0) = \sum_{i=1}^n Y_i^2(\mu_0, \sigma_0)$$
Show that $V(\sigma_0) = U(\mu_0, \sigma_0) - Z^2(\mu_0, \sigma_0)$, where $Z(\mu_0, \sigma_0)$ is the test statistic in Exercise 2. Thus, in particular, it follows that $V(\sigma_0)$ can also be written in terms of the basic test statistics in Exercise 1.
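The decomposition in Exercise 17 comes from the identity $\sum_{i=1}^n (X_i - \mu_0)^2 = \sum_{i=1}^n (X_i - M)^2 + n (M - \mu_0)^2$, divided by $\sigma_0^2$. A numerical spot-check, with an arbitrary simulated sample and arbitrary values $\mu_0 = 0.3$, $\sigma_0 = 1.2$:

```python
import math
import random
from statistics import fmean, variance

# Hypothetical sample and hypothesized values; any mu0 and sigma0 > 0 work
random.seed(2)
xs = [random.gauss(0.0, 1.5) for _ in range(12)]
n, mu0, sigma0 = len(xs), 0.3, 1.2

ys = [(x - mu0) / sigma0 for x in xs]            # basic statistics Y_i(mu0, sigma0)
U = sum(y * y for y in ys)                       # U(mu0, sigma0): sum of the Y_i^2
V = (n - 1) * variance(xs) / sigma0**2           # V(sigma0) = (n - 1) S^2 / sigma0^2
Z = (fmean(xs) - mu0) / (sigma0 / math.sqrt(n))  # Z(mu0, sigma0)
print(U, V, U - Z**2)                            # V and U - Z^2 agree up to rounding
```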
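Show that
(a) $U(\mu, \sigma)$ has the chi-square distribution with $n$ degrees of freedom.
(b) $V(\sigma)$ has the chi-square distribution with $n - 1$ degrees of freedom.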
The variables in Exercise 18 are pivot variables that were used to construct confidence sets in the normal model.
Let $g_k$ denote the probability density function and $G_k$ the distribution function for the chi-square distribution with $k \in \{1, 2, \ldots\}$ degrees of freedom. In addition, for $p \in (0, 1)$, let $\chi^2_k(p)$ denote the quantile of order $p$ for the distribution, so that by definition, $G_k[\chi^2_k(p)] = p$. For selected values of $k$ and $p$, $\chi^2_k(p)$ can be obtained from the table of the chi-square distribution, from the quantile applet, or from most statistical software packages.
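Python's standard library has no chi-square distribution, so the following sketch computes $G_k$ from the power series of the regularized lower incomplete gamma function, $G_k(x) = P(k/2, x/2)$, and finds $\chi^2_k(p)$ by bisection. It is a simple illustration under those assumptions, not a tuned numerical routine; in practice a statistical library (for example SciPy's `scipy.stats.chi2`) would normally be used.

```python
import math

def chi2_cdf(x, k):
    """G_k(x) = P(k/2, x/2), from the series e^(-h) h^a sum_n h^n / Gamma(a + n + 1)."""
    if x <= 0:
        return 0.0
    a, h = k / 2, x / 2
    term = total = 1.0          # term_n = h^n / ((a + 1)(a + 2)...(a + n))
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= h / (a + n)
        total += term
    return math.exp(a * math.log(h) - h - math.lgamma(a + 1)) * total

def chi2_quantile(p, k, tol=1e-10):
    """chi^2_k(p) for 0 < p < 1, by bisection on G_k."""
    lo, hi = 0.0, max(1.0, 2.0 * k)
    while chi2_cdf(hi, k) < p:  # expand the bracket until it contains the quantile
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for k in (1, 9, 29):
    print(k, round(chi2_quantile(0.05, k), 3), round(chi2_quantile(0.95, k), 3))
```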
Show or recall the following properties. Part (c) follows from the inverse function theorem of calculus:
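Show that for any $\alpha \in (0, 1)$ and $r \in [0, 1]$, the following test has significance level $\alpha$. Reject $H_0: (\mu, \sigma) = (\mu_0, \sigma_0)$ versus $H_1: (\mu, \sigma) \ne (\mu_0, \sigma_0)$ if and only if $U(\mu_0, \sigma_0) < \chi^2_n(\alpha - r \alpha)$ or $U(\mu_0, \sigma_0) > \chi^2_n(1 - r \alpha)$. Equivalently, show that we fail to reject $H_0$ if and only if
$$\frac{1}{\chi^2_n(1 - r \alpha)} \sum_{i=1}^n (X_i - \mu_0)^2 \le \sigma_0^2 \le \frac{1}{\chi^2_n(\alpha - r \alpha)} \sum_{i=1}^n (X_i - \mu_0)^2$$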
Note that the hypothesis test in Exercise 20 is the dual of the confidence sets constructed with the pivot variable $U(\mu, \sigma)$ in the section on Estimation in the Normal Model. That is, the set of $(\mu_0, \sigma_0)$ for which we fail to reject $H_0$ at level $\alpha$ is precisely the $1 - \alpha$ confidence set for $(\mu, \sigma)$. This set is shown in the pictures above.
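Show that for any $\alpha \in (0, 1)$ and $r \in [0, 1]$, the following test has significance level $\alpha$. Reject $H_0: (\mu, \sigma) = (\mu_0, \sigma_0)$ versus $H_1: (\mu, \sigma) \ne (\mu_0, \sigma_0)$ if and only if $V(\sigma_0) < \chi^2_{n-1}(\alpha - r \alpha)$ or $V(\sigma_0) > \chi^2_{n-1}(1 - r \alpha)$. Equivalently, show that we fail to reject $H_0$ if and only if
$$\frac{n - 1}{\chi^2_{n-1}(1 - r \alpha)} S^2(\boldsymbol{X}) \le \sigma_0^2 \le \frac{n - 1}{\chi^2_{n-1}(\alpha - r \alpha)} S^2(\boldsymbol{X})$$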
Note that the hypothesis test in Exercise 21 is the dual of the confidence set constructed with the pivot variable $V(\sigma)$ in the section on Estimation in the Normal Model. That is, the set of $(\mu_0, \sigma_0)$ for which we fail to reject $H_0$ at level $\alpha$ is precisely the $1 - \alpha$ confidence set for $(\mu, \sigma)$. This set is shown in the picture above.
In both tests, note that $r \alpha$ is the fraction of the significance level in the right tail of the distribution of the pivot variable; $(1 - r) \alpha$ is the fraction in the left tail. The equal-tailed case $r = \frac{1}{2}$ is the one that is most commonly used, although, since the chi-square distribution is not symmetric, the equal-tailed test is not exactly unbiased.
Clearly, a test of two real parameters based on a single real-valued test statistic cannot be very good. However, since the test statistic $V(\sigma_0)$ gives no information about $\mu$, it is natural to use this statistic in tests involving $\sigma$ only. If $\mu$ were known, we could also use the test statistic $U(\mu, \sigma_0)$. All of the results given in the remainder of this subsection would apply, but with $U(\mu, \sigma_0)$ replacing $V(\sigma_0)$ and with $n$ degrees of freedom in place of $n - 1$. However, the assumption that $\mu$ is known is very artificial.
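Show that for any $\alpha \in (0, 1)$ and $r \in [0, 1]$, the following tests have significance level $\alpha$:
(a) Reject $H_0: \sigma = \sigma_0$ versus $H_1: \sigma \ne \sigma_0$ if and only if $V(\sigma_0) < \chi^2_{n-1}(\alpha - r \alpha)$ or $V(\sigma_0) > \chi^2_{n-1}(1 - r \alpha)$.
(b) Reject $H_0: \sigma \le \sigma_0$ versus $H_1: \sigma > \sigma_0$ if and only if $V(\sigma_0) > \chi^2_{n-1}(1 - \alpha)$.
(c) Reject $H_0: \sigma \ge \sigma_0$ versus $H_1: \sigma < \sigma_0$ if and only if $V(\sigma_0) < \chi^2_{n-1}(\alpha)$.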
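In part (a), the equal-tailed test with $r = \frac{1}{2}$ is most commonly used: Reject $H_0: \sigma = \sigma_0$ versus $H_1: \sigma \ne \sigma_0$ if and only if $V(\sigma_0) < \chi^2_{n-1}(\alpha / 2)$ or $V(\sigma_0) > \chi^2_{n-1}(1 - \alpha / 2)$.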
For each of the tests in Exercise 22, show that we fail to reject $H_0$ at significance level $\alpha$ if and only if $\sigma_0$ is in the corresponding $1 - \alpha$ confidence interval.
Recall that the power function of a test of a parameter is the probability of rejecting the null hypothesis, as a function of the true value of the parameter. Our next series of exercises will explore the power functions of the tests in Exercise 22.
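Show that the power function of the test in Exercise 22 (a) is given by the following formula, and then verify the given properties:
$$\beta(\sigma) = G_{n-1}\left(\frac{\sigma_0^2}{\sigma^2} \chi^2_{n-1}(\alpha - r \alpha)\right) + 1 - G_{n-1}\left(\frac{\sigma_0^2}{\sigma^2} \chi^2_{n-1}(1 - r \alpha)\right)$$
(a) $\beta(\sigma_0) = \alpha$.
(b) If $0 < r < 1$, then $\beta(\sigma) \to 1$ as $\sigma \downarrow 0$ and as $\sigma \to \infty$.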
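Show that the power function of the test in Exercise 22 (b) is given by the following formula, and then verify the given properties:
$$\beta(\sigma) = 1 - G_{n-1}\left(\frac{\sigma_0^2}{\sigma^2} \chi^2_{n-1}(1 - \alpha)\right)$$
(a) $\beta$ is increasing on $(0, \infty)$.
(b) $\beta(\sigma_0) = \alpha$.
(c) $\beta(\sigma) \to 0$ as $\sigma \downarrow 0$ and $\beta(\sigma) \to 1$ as $\sigma \to \infty$.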
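Show that the power function for the test in Exercise 22 (c) is given by the following formula, and then verify the given properties:
$$\beta(\sigma) = G_{n-1}\left(\frac{\sigma_0^2}{\sigma^2} \chi^2_{n-1}(\alpha)\right)$$
(a) $\beta$ is decreasing on $(0, \infty)$.
(b) $\beta(\sigma_0) = \alpha$.
(c) $\beta(\sigma) \to 1$ as $\sigma \downarrow 0$ and $\beta(\sigma) \to 0$ as $\sigma \to \infty$.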
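The tests in Exercise 22 and their power functions can be sketched together in Python. The chi-square helpers use the incomplete gamma power series and bisection (an illustrative choice, not a library routine), the helper names are hypothetical, and the data $1, 2, \ldots, 10$ (with $S^2 = 55/6$) are made up. The power computation uses $V(\sigma_0) = (\sigma^2 / \sigma_0^2) V(\sigma)$, where $V(\sigma)$ has the chi-square distribution with $n - 1$ degrees of freedom.

```python
import math
from statistics import variance

def chi2_cdf(x, k):
    """G_k(x) = P(k/2, x/2), summed from the power series of the incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, h = k / 2, x / 2
    term = total = 1.0
    n = 0
    while term > 1e-16 * total:
        n += 1
        term *= h / (a + n)
        total += term
    return math.exp(a * math.log(h) - h - math.lgamma(a + 1)) * total

def chi2_quantile(p, k, tol=1e-10):
    """chi^2_k(p) for 0 < p < 1, by bisection on G_k."""
    lo, hi = 0.0, max(1.0, 2.0 * k)
    while chi2_cdf(hi, k) < p:
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf(mid, k) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def cutoffs(n, alpha, alternative="two-sided", r=0.5):
    """(lower, upper): reject H0 when V(sigma0) < lower or V(sigma0) > upper."""
    def q(p):
        return chi2_quantile(p, n - 1)
    if alternative == "two-sided":             # H1: sigma != sigma0
        return (q(alpha - r * alpha) if r < 1 else 0.0,
                q(1 - r * alpha) if r > 0 else math.inf)
    if alternative == "greater":               # H1: sigma > sigma0
        return 0.0, q(1 - alpha)
    if alternative == "less":                  # H1: sigma < sigma0
        return q(alpha), math.inf
    raise ValueError(f"unknown alternative: {alternative!r}")

def variance_test(xs, sigma0, alpha, alternative="two-sided", r=0.5):
    """True if H0 is rejected; the statistic is V(sigma0) = (n - 1) S^2 / sigma0^2."""
    n = len(xs)
    v = (n - 1) * variance(xs) / sigma0**2
    lower, upper = cutoffs(n, alpha, alternative, r)
    return v < lower or v > upper

def variance_test_power(sigma, sigma0, n, alpha, alternative="two-sided", r=0.5):
    """beta(sigma) = P(V(sigma0) < lower) + P(V(sigma0) > upper) when sigma is the true sd."""
    c = sigma0**2 / sigma**2
    lower, upper = cutoffs(n, alpha, alternative, r)
    left = chi2_cdf(c * lower, n - 1)
    right = 0.0 if math.isinf(upper) else 1 - chi2_cdf(c * upper, n - 1)
    return left + right

xs = list(range(1, 11))   # hypothetical data with S^2 = 55/6
print(variance_test(xs, sigma0=1.0, alpha=0.1), variance_test(xs, sigma0=3.0, alpha=0.1))
for sigma in (0.5, 1.0, 1.5, 2.0):
    print(sigma, round(variance_test_power(sigma, 1.0, n=10, alpha=0.1), 4))
```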
Show that for any of the three tests in Exercise 22, increasing the sample size results in a uniformly more powerful test.
In the variance test experiment, select the normal distribution with mean 0, and select significance level 0.1, sample size 10, and test standard deviation 1.0. For various values of the true standard deviation, run the simulation 1000 times, updating every 10 runs. Record the relative frequency of rejecting the null hypothesis and plot the empirical power curve.
In the variance estimate experiment, select the normal distribution with mean 0 and standard deviation 2, and select confidence level 0.90 and sample size 10. Run the experiment 20 times, updating after each run. State the corresponding hypotheses and significance level, and for each run, give the set of test standard deviations for which the null hypothesis would be rejected.
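Our next test statistic will lead to good tests of $\mu$ without requiring the assumption that $\sigma$ is known. For $\mu_0 \in \mathbb{R}$, define
$$T(\mu_0) = \frac{M(\boldsymbol{X}) - \mu_0}{S(\boldsymbol{X}) / \sqrt{n}}$$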
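Show that for any $\mu_0 \in \mathbb{R}$ and any $\sigma_0 \in (0, \infty)$,
$$T(\mu_0) = \frac{Z(\mu_0, \sigma_0)}{\sqrt{V(\sigma_0) / (n - 1)}}$$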
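Because $\sigma_0$ cancels between numerator and denominator, the identity in Exercise 30 holds for every choice of $\sigma_0$. A quick check with an arbitrary simulated sample and an arbitrary $\mu_0 = 0.8$:

```python
import math
import random
from statistics import fmean, stdev, variance

# Arbitrary simulated sample and hypothesized mean (illustrative only)
random.seed(3)
xs = [random.gauss(1.0, 0.5) for _ in range(15)]
n, mu0 = len(xs), 0.8

T = (fmean(xs) - mu0) / (stdev(xs) / math.sqrt(n))   # T(mu0)

for sigma0 in (0.1, 1.0, 7.5):                        # sigma0 is arbitrary: it cancels
    Z = (fmean(xs) - mu0) / (sigma0 / math.sqrt(n))   # Z(mu0, sigma0)
    V = (n - 1) * variance(xs) / sigma0**2            # V(sigma0)
    print(sigma0, T, Z / math.sqrt(V / (n - 1)))
```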
In particular, it follows that $T(\mu_0)$ can be written in terms of the basic test statistics in Exercise 1.
Show or recall that $T(\mu)$ has the student $t$ distribution with $n - 1$ degrees of freedom.
This variable was one of the pivot variables used to construct estimates in the normal model. When $\mu_0 \ne \mu$, the distribution of $T(\mu_0)$ is known as a non-central $t$ distribution.
As usual, for $k \in \{1, 2, \ldots\}$, we will let $f_k$ denote the probability density function and $F_k$ the distribution function for the $t$ distribution with $k$ degrees of freedom. Additionally, for $p \in (0, 1)$, let $t_k(p)$ denote the quantile of order $p$ for this distribution, so that $F_k[t_k(p)] = p$. For selected values of $k$ and $p$, values of $t_k(p)$ can be obtained from the table of the $t$ distribution or from the quantile applet.
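As with the chi-square distribution, Python's standard library has no $t$ distribution. The sketch below integrates the density $f_k(x) = \frac{\Gamma((k+1)/2)}{\sqrt{k \pi}\, \Gamma(k/2)} \left(1 + x^2 / k\right)^{-(k+1)/2}$ with composite Simpson's rule to get $F_k$, and inverts $F_k$ by bisection to get $t_k(p)$. This is an illustrative numerical approximation under those assumptions; SciPy's `scipy.stats.t` is the usual library route.

```python
import math

def t_cdf(x, k, steps=2000):
    """F_k(x) = 1/2 +/- (integral of f_k over [0, |x|]), via composite Simpson's rule."""
    if x == 0:
        return 0.5
    # log of the normalizing constant Gamma((k + 1)/2) / (sqrt(k pi) Gamma(k/2))
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)

    def pdf(u):
        return math.exp(log_c - (k + 1) / 2 * math.log1p(u * u / k))

    b = abs(x)
    h = b / steps                      # steps must be even for Simpson's rule
    s = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * pdf(i * h)
    area = s * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p, k, tol=1e-9):
    """t_k(p) for 0 < p < 1, by bisection, using the symmetry t_k(1 - p) = -t_k(p)."""
    if p == 0.5:
        return 0.0
    target = max(p, 1 - p)
    lo, hi = 0.0, 1.0
    while t_cdf(hi, k) < target:       # expand the bracket until it contains the root
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, k) < target:
            lo = mid
        else:
            hi = mid
    root = (lo + hi) / 2
    return root if p > 0.5 else -root

for k in (1, 9, 74):
    print(k, round(t_quantile(0.95, k), 4), round(t_quantile(0.975, k), 4))
```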
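Show or recall the following properties. Part (d) follows from the inverse function theorem of calculus.
(a) $f_k(-x) = f_k(x)$ for $x \in \mathbb{R}$
(b) $F_k(-x) = 1 - F_k(x)$ for $x \in \mathbb{R}$
(c) $t_k(1 - p) = -t_k(p)$ for $p \in (0, 1)$
(d) $t_k^\prime(p) = 1 / f_k[t_k(p)]$ for $p \in (0, 1)$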
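Show that for any $\alpha \in (0, 1)$ and $r \in [0, 1]$, the following test has significance level $\alpha$. Reject $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ if and only if $T(\mu_0) < t_{n-1}(\alpha - r \alpha)$ or $T(\mu_0) > t_{n-1}(1 - r \alpha)$. Equivalently, we fail to reject $H_0$ if and only if
$$M(\boldsymbol{X}) - t_{n-1}(1 - r \alpha) \frac{S(\boldsymbol{X})}{\sqrt{n}} \le \mu_0 \le M(\boldsymbol{X}) - t_{n-1}(\alpha - r \alpha) \frac{S(\boldsymbol{X})}{\sqrt{n}}$$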
Note that the hypothesis test in Exercise 33 is the dual of the confidence set constructed with the pivot variable $T(\mu)$ in the section on Estimation in the Normal Model. That is, the set of $\mu_0$ for which we fail to reject $H_0$ at level $\alpha$ is precisely the $1 - \alpha$ confidence set for $\mu$. This set is shown in the picture above. Note that $r \alpha$ is the fraction of the significance level in the right tail of the distribution of the pivot variable; $(1 - r) \alpha$ is the fraction in the left tail. The equal-tailed case $r = \frac{1}{2}$ is the one that is most commonly used; this test is said to be unbiased.
Clearly, since our test statistic gives no information about $\sigma$, there is no point in putting this parameter in the hypotheses.
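Show that for any $\alpha \in (0, 1)$ and $r \in [0, 1]$, the following tests have significance level $\alpha$:
(a) Reject $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ if and only if $T(\mu_0) < t_{n-1}(\alpha - r \alpha)$ or $T(\mu_0) > t_{n-1}(1 - r \alpha)$.
(b) Reject $H_0: \mu \le \mu_0$ versus $H_1: \mu > \mu_0$ if and only if $T(\mu_0) > t_{n-1}(1 - \alpha)$.
(c) Reject $H_0: \mu \ge \mu_0$ versus $H_1: \mu < \mu_0$ if and only if $T(\mu_0) < -t_{n-1}(1 - \alpha)$.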
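In part (a), the unbiased test with $r = \frac{1}{2}$ is most commonly used: Reject $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ if and only if $T(\mu_0) < -t_{n-1}(1 - \alpha / 2)$ or $T(\mu_0) > t_{n-1}(1 - \alpha / 2)$.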
For each of the tests in Exercise 34, show that we fail to reject $H_0$ at significance level $\alpha$ if and only if $\mu_0$ is in the corresponding $1 - \alpha$ confidence interval.
The $P$-values of these tests can be computed in terms of the distribution function $F_{n-1}$ of the $t$ distribution with $n - 1$ degrees of freedom.
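Show that the $P$-values of the tests in Exercise 34 are, respectively,
(a) $2\left[1 - F_{n-1}\left(\left|T(\mu_0)\right|\right)\right]$ (for the unbiased test, $r = \frac{1}{2}$)
(b) $1 - F_{n-1}\left[T(\mu_0)\right]$
(c) $F_{n-1}\left[T(\mu_0)\right]$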
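The $P$-values can be computed from summary statistics alone. The sketch below plugs in the numbers from the potato chip exercise later in this section ($n = 75$, sample mean 248, sample standard deviation 5, $\mu_0 = 250$) and reports all three $P$-values. The $t$ distribution function is approximated by Simpson's rule, and the helper names are hypothetical.

```python
import math

def t_cdf(x, k, steps=2000):
    """F_k(x) by composite Simpson's rule on [0, |x|], using the symmetry of f_k."""
    if x == 0:
        return 0.5
    log_c = math.lgamma((k + 1) / 2) - math.lgamma(k / 2) - 0.5 * math.log(k * math.pi)

    def pdf(u):
        return math.exp(log_c - (k + 1) / 2 * math.log1p(u * u / k))

    b = abs(x)
    h = b / steps
    s = pdf(0.0) + pdf(b) + sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area = s * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_test_p_values(mean, sd, n, mu0):
    """Return T(mu0) and the P-values of the three t-tests of H0 about mu."""
    t = (mean - mu0) / (sd / math.sqrt(n))
    k = n - 1
    return t, {
        "two-sided": 2 * (1 - t_cdf(abs(t), k)),  # unbiased test, r = 1/2
        "greater": 1 - t_cdf(t, k),               # H1: mu > mu0
        "less": t_cdf(t, k),                      # H1: mu < mu0
    }

t, p = t_test_p_values(mean=248, sd=5, n=75, mu0=250)
print(f"T = {t:.4f}")
for alt, value in p.items():
    print(f"{alt:>9}: P = {value:.5f}")
```

Each test rejects $H_0$ at level $\alpha$ exactly when its $P$-value is less than $\alpha$, so a single computation answers the question for every significance level at once.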
In the mean test experiment, select the student test statistic and the normal sampling distribution, and set the standard deviation $\sigma$, the significance level $\alpha$, the sample size $n$, and the test value $\mu_0$. Run the experiment 1000 times, updating every 10 runs, for several values of the true distribution mean $\mu$. For each value of $\mu$, note the relative frequency of the event that the null hypothesis is rejected. Sketch the empirical power function.
In the mean estimate experiment, select the student pivot variable and select the normal sampling distribution with mean 0 and standard deviation 2. Select confidence level 0.90 and sample size 10. For each of the three types of intervals, run the experiment 20 times, updating after each run. State the corresponding hypotheses and significance level, and for each run, give the set of $\mu_0$ for which the null hypothesis would be rejected.
The power function for the tests in Exercise 34 can be computed explicitly in terms of the non-central $t$ distribution function. Qualitatively, the graphs of the power functions are similar to the case when $\sigma$ is known, given in Exercise 8, Exercise 9, and Exercise 10.
If an upper bound on the standard deviation $\sigma$ is known, then conservative estimates of the sample size needed for a given significance level and a given power at a given alternative can be obtained using the methods of Exercise 14 and Exercise 15, with the upper bound in place of $\sigma$.
The primary assumption that we made is that the underlying sampling distribution is normal. Of course, in real statistical problems, we are unlikely to know much about the sampling distribution, let alone whether or not it is normal. Suppose in fact that the underlying distribution is not normal. When the sample size is relatively large, the distribution of the sample mean will still be approximately normal by the central limit theorem, and thus our tests of the mean should still be approximately valid. On the other hand, tests of the variance are less robust to deviations from the assumption of normality. The following exercises explore these ideas.
In the mean test experiment, select the gamma distribution with shape parameter 1 and scale parameter 1. For the three different tests and for various significance levels, sample sizes, and values of $\mu_0$, run the experiment 1000 times with an update frequency of 10. For each configuration, note the relative frequency of rejecting $H_0$. When $H_0$ is true, compare the relative frequency with the significance level.
In the mean test experiment, select the uniform distribution. For the three different tests and for various significance levels, sample sizes, and values of $\mu_0$, run the experiment 1000 times with an update frequency of 10. For each configuration, note the relative frequency of rejecting $H_0$. When $H_0$ is true, compare the relative frequency with the significance level.
How large $n$ needs to be for the testing procedure to work well depends, of course, on the underlying distribution; the more this distribution deviates from normality, the larger $n$ must be. Fortunately, convergence to normality in the central limit theorem is rapid and hence, as you observed in the exercises, we can get away with relatively small sample sizes (30 or more) in most cases.
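A rough stand-alone version of these robustness experiments: draw samples from the gamma distribution with shape parameter 1 and scale parameter 1 (mean 1, so $H_0: \mu = 1$ is true) and record how often the two-sided test rejects. To stay within the standard library, the sketch uses the normal critical value $z(1 - \alpha/2)$ in place of $t_{n-1}(1 - \alpha/2)$, a large-sample simplification; the sample sizes, seed, and number of runs are arbitrary.

```python
import math
import random
from statistics import NormalDist

def mean_test_rejection_rate(n, alpha=0.1, runs=4000, seed=0):
    """Relative frequency of rejecting the TRUE hypothesis H0: mu = 1 with gamma(1, 1) data."""
    rng = random.Random(seed)
    # Large-sample simplification: z(1 - alpha/2) in place of t_{n-1}(1 - alpha/2)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(runs):
        xs = [rng.gammavariate(1.0, 1.0) for _ in range(n)]  # shape 1, scale 1: mean 1
        m = sum(xs) / n
        s = math.sqrt(sum((x - m) ** 2 for x in xs) / (n - 1))
        rejections += abs(m - 1.0) / (s / math.sqrt(n)) > crit
    return rejections / runs

for n in (5, 30, 100):
    print(n, mean_test_rejection_rate(n))
```

One would expect the rejection frequency to drift toward the nominal $\alpha = 0.1$ as $n$ grows, and to be noticeably off for very small samples.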
In the variance test experiment, select the gamma distribution with shape parameter 1 and scale parameter 1. For the three different tests and for various significance levels, sample sizes, and values of $\sigma_0$, run the experiment 1000 times with an update frequency of 10. For each configuration, note the relative frequency of rejecting $H_0$. When $H_0$ is true, compare the relative frequency with the significance level.
In the variance test experiment, select the uniform distribution. For the three different tests and for various significance levels, sample sizes, and values of $\sigma_0$, run the experiment 1000 times with an update frequency of 10. For each configuration, note the relative frequency of rejecting $H_0$. When $H_0$ is true, compare the relative frequency with the significance level.
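The lack of robustness of the variance tests shows up in a similar sketch. Both the standard normal distribution and the gamma distribution with shape 1 and scale 1 have standard deviation 1, so $H_0: \sigma = 1$ is true in both cases. With normal data the rejection frequency of the equal-tailed test should be near $\alpha$; with the gamma data, whose fourth central moment (9) is much larger than the normal value (3), $S^2$ is far more variable and $H_0$ is rejected much more often. The chi-square quantiles come from the Wilson-Hilferty approximation, an assumption made here to keep the sketch short; $n = 30$, $\alpha = 0.1$, and the number of runs are arbitrary.

```python
import math
import random
from statistics import NormalDist

def chi2_quantile_wh(p, k):
    """Wilson-Hilferty approximation to the chi-square quantile chi^2_k(p)."""
    z = NormalDist().inv_cdf(p)
    c = 2 / (9 * k)
    return k * (1 - c + z * math.sqrt(c)) ** 3

def variance_test_rate(sampler, n=30, alpha=0.1, runs=4000, seed=0):
    """Relative frequency of rejecting the TRUE hypothesis H0: sigma = 1 (equal-tailed test)."""
    rng = random.Random(seed)
    lo = chi2_quantile_wh(alpha / 2, n - 1)
    hi = chi2_quantile_wh(1 - alpha / 2, n - 1)
    rejections = 0
    for _ in range(runs):
        xs = [sampler(rng) for _ in range(n)]
        m = sum(xs) / n
        v = sum((x - m) ** 2 for x in xs)   # V(sigma0) = (n - 1) S^2 / sigma0^2 with sigma0 = 1
        rejections += v < lo or v > hi
    return rejections / runs

normal_rate = variance_test_rate(lambda rng: rng.gauss(0.0, 1.0))
gamma_rate = variance_test_rate(lambda rng: rng.gammavariate(1.0, 1.0))  # sd 1, not normal
print(normal_rate, gamma_rate)
```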
The length of a certain machined part is supposed to be 10 centimeters. In fact, due to imperfections in the manufacturing process, the actual length is a random variable. The standard deviation is due to inherent factors in the process, which remain fairly stable over time. From historical data, the standard deviation is known with a high degree of accuracy to be 0.3. The mean, on the other hand, may be set by adjusting various parameters in the process and hence may change to an unknown value fairly frequently. We are interested in testing $H_0: \mu = 10$ versus $H_1: \mu \ne 10$.
A bag of potato chips of a certain brand has an advertised weight of 250 grams. Actually, the weight (in grams) is a random variable. Suppose that a sample of 75 bags has mean 248 and standard deviation 5. At the 0.05 significance level, perform the following tests:
At a telemarketing firm, the length of a telephone solicitation (in seconds) is a random variable. A sample of 50 calls has mean 310 and standard deviation 25. At the 0.1 level of significance, can we conclude that
At a certain farm the weight of a peach (in ounces) at harvest time is a random variable. A sample of 100 peaches has mean 8.2 and standard deviation 1.0. At the 0.01 level of significance, can we conclude that
The hourly wage for a certain type of construction work is a random variable with standard deviation 1.25. For a sample of 25 workers, the mean wage was $6.75. At the 0.01 level of significance, can we conclude that ?
Using Michelson's data, test to see if the velocity of light is greater than 730 (+299000) km/sec, at the 0.005 significance level.
Using Cavendish's data, test to see if the density of the earth is less than 5.5 times the density of water, at the 0.05 significance level.
Using Short's data, test to see if the parallax of the sun differs from 9 seconds of a degree, at the 0.1 significance level.
Using Fisher's iris data, perform the following tests, at the 0.1 level: