2. Tests in the Normal Model

Preliminaries

The Normal Model

Suppose that $\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the normal distribution with mean $\mu \in \mathbb{R}$ and standard deviation $\sigma \in (0, \infty)$. In general, both parameters are unknown, so the parameter space is $\mathbb{R} \times (0, \infty)$.

In this section we will construct hypothesis tests for $\mu$ and $\sigma$, which are among the most important special cases of hypothesis testing. We will develop several tests, based on different test statistics. Some of these tests will work better than others, depending on whether one of the parameters is known. This section parallels the section on Estimation in the Normal Model in the chapter on Set Estimation, and in particular, the duality between set estimation and hypothesis testing will play an important role.

The Basic Test Statistics

Ultimately, all of our test statistics can be constructed from basic test statistics that are standardized versions of our data variables. For $a \in \mathbb{R}$ and $b \in (0, \infty)$, let

$$Z_i(a, b) = \frac{X_i - a}{b}, \quad i \in \{1, 2, \ldots, n\}$$

Show that $\boldsymbol{Z}(a, b) = (Z_1(a, b), Z_2(a, b), \ldots, Z_n(a, b))$ is a random sample of size $n$ from the normal distribution with mean $(\mu - a) / b$ and standard deviation $\sigma / b$.

In particular, $\boldsymbol{Z}(\mu, \sigma)$ is a random sample from the standard normal distribution. Thus, it is reasonable that test statistics constructed from $\boldsymbol{Z}(a, b)$ may be useful for testing hypotheses with $a$ and $b$ as conjectured values of $\mu$ and $\sigma$, respectively.

Tests Based on a Normal Statistic

The Test Statistic

Recall that the sample mean of our data vector is

$$M = \frac{1}{n} \sum_{i=1}^n X_i$$

Our first test statistic is a linear transformation of the sample mean. For $a \in \mathbb{R}$ and $b \in (0, \infty)$, define

$$Z(a, b) = \frac{M - a}{b / \sqrt{n}}$$

Show that $Z(a, b)$ has the normal distribution with mean $\frac{\mu - a}{b / \sqrt{n}}$ and standard deviation $\sigma / b$.

In particular, the random variable $Z(\mu, \sigma)$ is the ordinary standard score of $M$ and has the standard normal distribution; this variable was one of the pivot variables used to construct confidence sets in the normal model.

Show that the test statistic in Exercise 2 can be written in terms of the basic test statistics in Exercise 1 as follows:

$$Z(a, b) = \frac{1}{\sqrt{n}} \sum_{i=1}^n Z_i(a, b)$$
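
The relationship between the normal test statistic and the basic test statistics can be checked numerically. The following Python sketch is only an illustration: the data values and the conjectured values `a` and `b` are made up, and the helper names `z_basic` and `z_stat` are hypothetical.

```python
import math

def z_basic(x, a, b):
    """Basic test statistics Z_i(a, b) = (X_i - a) / b."""
    return [(xi - a) / b for xi in x]

def z_stat(x, a, b):
    """Normal test statistic Z(a, b) = (M - a) / (b / sqrt(n))."""
    n = len(x)
    m = sum(x) / n
    return (m - a) / (b / math.sqrt(n))

# Made-up data and conjectured parameter values, for illustration only.
x = [9.8, 10.4, 10.1, 9.7, 10.3, 10.0]
a, b = 10.0, 0.3

direct = z_stat(x, a, b)
via_basic = sum(z_basic(x, a, b)) / math.sqrt(len(x))
print(direct, via_basic)  # the two values agree up to rounding error
```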

As usual, we will let $\varphi$ denote the standard normal probability density function and $\Phi$ the standard normal distribution function. For $p \in (0, 1)$, let $z(p)$ denote the quantile of order $p$ for the standard normal distribution. That is, $z(p) = \Phi^{-1}(p)$. For selected values of $p$, $z(p)$ can be obtained from the last row of the table of the t distribution, from the table of the standard normal distribution, from the quantile applet, or from most statistical software packages.

Show or recall the following properties. Part (d) follows from the inverse function theorem of calculus.

  1. $z(p) = -z(1 - p)$
  2. $z(p) \to -\infty$ as $p \downarrow 0$
  3. $z(p) \to \infty$ as $p \uparrow 1$
  4. $z^\prime(p) = 1 / \varphi[z(p)]$
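
As a quick numerical illustration of the symmetry and derivative properties of the standard normal quantile function, here is a minimal Python sketch. It uses `statistics.NormalDist` from the standard library; the wrapper name `z`, the order `p = 0.9`, and the step size `h` are arbitrary choices made for this example.

```python
from statistics import NormalDist

std = NormalDist()  # standard normal distribution

def z(p):
    """Quantile of order p for the standard normal distribution."""
    return std.inv_cdf(p)

p = 0.9

# Symmetry: the quantile of order p is the negative of the quantile of order 1 - p.
print(z(p), -z(1 - p))

# Derivative: compare a central difference quotient with 1 / phi(z(p)).
h = 1e-6
numeric = (z(p + h) - z(p - h)) / (2 * h)
exact = 1 / std.pdf(z(p))
print(numeric, exact)
```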

Hypothesis Tests

Show that for any $\alpha \in (0, 1)$ and $p \in (0, 1)$, the following test has significance level $\alpha$. Reject $H_0: (\mu, \sigma) = (\mu_0, \sigma_0)$ versus $H_1: (\mu, \sigma) \ne (\mu_0, \sigma_0)$ if and only if $Z(\mu_0, \sigma_0) < z(\alpha - p \alpha)$ or $Z(\mu_0, \sigma_0) > z(1 - p \alpha)$. Equivalently, we fail to reject $H_0$ if and only if

$$M - z(1 - p \alpha) \frac{\sigma_0}{\sqrt{n}} \le \mu_0 \le M - z(\alpha - p \alpha) \frac{\sigma_0}{\sqrt{n}}$$

[Figure: confidence set]

Note that the hypothesis test in Exercise 5 is the dual of the confidence set constructed with the pivot variable $Z(\mu, \sigma)$ in the section on Estimation in the Normal Model. That is, the set of $(\mu_0, \sigma_0)$ for which we fail to reject $H_0$ at level $\alpha$ is precisely the $1 - \alpha$ confidence set for $(\mu, \sigma)$. This set is shown in the picture above. Note that $p$ is the fraction of the significance level $\alpha$ in the right tail of the distribution of the pivot variable; $1 - p$ is the fraction of $\alpha$ in the left tail. The equal-tailed case when $p = \frac{1}{2}$ is the one that is most commonly used; this test is said to be unbiased.

Clearly, a test of two real parameters based on a single real-valued test statistic cannot be very good. In the remainder of this subsection, we will assume that $\sigma$ is known. This is often, but not always, an artificial assumption; see Exercise 43 for an example where the assumption may be reasonable. With $\sigma$ known, we will write our basic test statistic as $Z(\mu_0) = Z(\mu_0, \sigma)$.

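Show that for any $\alpha \in (0, 1)$ and $p \in (0, 1)$, each of the following tests has significance level $\alpha$:

  1. Reject $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ if and only if $Z(\mu_0) < z(\alpha - p \alpha)$ or $Z(\mu_0) > z(1 - p \alpha)$
  2. Reject $H_0: \mu \le \mu_0$ versus $H_1: \mu \gt \mu_0$ if and only if $Z(\mu_0) > -z(\alpha) = z(1 - \alpha)$
  3. Reject $H_0: \mu \ge \mu_0$ versus $H_1: \mu \lt \mu_0$ if and only if $Z(\mu_0) < -z(1 - \alpha)$

In part (a), the unbiased test with $p = \frac{1}{2}$ is most commonly used: Reject $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ if and only if $Z(\mu_0) < -z(1 - \alpha / 2)$ or $Z(\mu_0) > z(1 - \alpha / 2)$.

For each of the tests in Exercise 6, show that we fail to reject $H_0$ at significance level $\alpha$ if and only if $\mu_0$ is in the corresponding $1 - \alpha$ confidence interval.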
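
The duality between the two-sided test and the confidence interval, with the standard deviation known, can be illustrated with a short Python sketch. The function names and the summary data (sample mean 1.3, sample size 25, standard deviation 2) are hypothetical and serve only as an example; `p` is the fraction of the significance level placed in the right tail.

```python
import math
from statistics import NormalDist

def z(p):
    """Quantile of order p for the standard normal distribution."""
    return NormalDist().inv_cdf(p)

def two_sided_test(m, n, sigma, mu0, alpha, p=0.5):
    """Return True if H0: mu = mu0 is rejected at level alpha (sigma known)."""
    z0 = (m - mu0) / (sigma / math.sqrt(n))
    return z0 < z(alpha - p * alpha) or z0 > z(1 - p * alpha)

def confidence_interval(m, n, sigma, alpha, p=0.5):
    """The 1 - alpha confidence interval for mu that is dual to the test."""
    se = sigma / math.sqrt(n)
    return (m - z(1 - p * alpha) * se, m - z(alpha - p * alpha) * se)

# Hypothetical summary data, for illustration only.
m, n, sigma, alpha = 1.3, 25, 2.0, 0.1
lo, hi = confidence_interval(m, n, sigma, alpha)
for mu0 in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5):
    rejected = two_sided_test(m, n, sigma, mu0, alpha)
    inside = lo <= mu0 <= hi
    print(mu0, rejected, inside)  # rejected is True exactly when inside is False
```

In this sketch the conjectured mean is rejected precisely when it falls outside the interval, which is the duality that the exercise asks you to prove in general.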

Power Functions

Recall that the power function of a test of a parameter is the probability of rejecting the null hypothesis, as a function of the true value of the parameter. Our next series of exercises will explore the power functions of the tests in Exercise 6.

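Show that the power function of the test in Exercise 6 (a) is given by the formula below, and satisfies the given properties:

$$Q(\mu) = \Phi\left[z(\alpha - p \alpha) - \frac{\sqrt{n}}{\sigma}(\mu - \mu_0)\right] + \Phi\left[\frac{\sqrt{n}}{\sigma}(\mu - \mu_0) - z(1 - p \alpha)\right]$$
  1. $Q$ is decreasing on $(-\infty, m_0)$ and increasing on $(m_0, \infty)$, where $m_0 = \mu_0 + [z(\alpha - p \alpha) + z(1 - p \alpha)] \frac{\sigma}{2 \sqrt{n}}$.
  2. $Q(\mu_0) = \alpha$.
  3. $Q(\mu) \to 1$ as $\mu \uparrow \infty$ and $Q(\mu) \to 1$ as $\mu \downarrow -\infty$.
  4. If $p = \frac{1}{2}$ then $Q$ is symmetric about $\mu_0$.
  5. As $p$ increases, $Q(\mu)$ increases if $\mu \gt \mu_0$ and decreases if $\mu \lt \mu_0$.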

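Show that the power function of the test in Exercise 6 (b) is given by the formula below, and satisfies the given properties:

$$Q(\mu) = \Phi\left[z(\alpha) + \frac{\sqrt{n}}{\sigma}(\mu - \mu_0)\right]$$
  1. $Q$ is increasing on $\mathbb{R}$.
  2. $Q(\mu_0) = \alpha$.
  3. $Q(\mu) \to 1$ as $\mu \uparrow \infty$ and $Q(\mu) \to 0$ as $\mu \downarrow -\infty$.

Show that the power function of the test in Exercise 6 (c) is given by the formula below, and satisfies the given properties:

$$Q(\mu) = \Phi\left[z(\alpha) - \frac{\sqrt{n}}{\sigma}(\mu - \mu_0)\right]$$
  1. $Q$ is decreasing on $\mathbb{R}$.
  2. $Q(\mu_0) = \alpha$.
  3. $Q(\mu) \to 0$ as $\mu \uparrow \infty$ and $Q(\mu) \to 1$ as $\mu \downarrow -\infty$.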

Show that for any of the three tests in Exercise 6, increasing the sample size n or decreasing the standard deviation σ results in a uniformly more powerful test.
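
The power functions of the three tests with known standard deviation are easy to evaluate numerically. The Python sketch below is a minimal illustration, not a definitive implementation: the function names are hypothetical, the tests are labeled by tail rather than by exercise part, and the parameter values are the ones suggested for the mean test experiment (standard deviation 2, significance level 0.1, sample size 20, null value 0).

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

def shift(mu, mu0, sigma, n):
    """Standardized distance sqrt(n) (mu - mu0) / sigma between true and null means."""
    return math.sqrt(n) * (mu - mu0) / sigma

def power_two_sided(mu, mu0, sigma, n, alpha, p=0.5):
    """Power of the two-sided test of H0: mu = mu0 (fraction p of alpha in the right tail)."""
    d = shift(mu, mu0, sigma, n)
    return STD.cdf(STD.inv_cdf(alpha - p * alpha) - d) + STD.cdf(d - STD.inv_cdf(1 - p * alpha))

def power_right_tailed(mu, mu0, sigma, n, alpha):
    """Power of the test of H0: mu <= mu0 versus H1: mu > mu0."""
    return STD.cdf(STD.inv_cdf(alpha) + shift(mu, mu0, sigma, n))

def power_left_tailed(mu, mu0, sigma, n, alpha):
    """Power of the test of H0: mu >= mu0 versus H1: mu < mu0."""
    return STD.cdf(STD.inv_cdf(alpha) - shift(mu, mu0, sigma, n))

mu0, sigma, alpha = 0.0, 2.0, 0.1
for mu in (-1.0, 0.0, 1.0):
    print(mu,
          round(power_two_sided(mu, mu0, sigma, 20, alpha), 4),
          round(power_right_tailed(mu, mu0, sigma, 20, alpha), 4),
          round(power_left_tailed(mu, mu0, sigma, 20, alpha), 4))

# A larger sample size gives more power at a fixed alternative.
print(power_right_tailed(1.0, mu0, sigma, 20, alpha) < power_right_tailed(1.0, mu0, sigma, 40, alpha))
```

At the null value each function returns the significance level, and the values from this sketch can be compared with the empirical power curve obtained from the simulation.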

In the mean test experiment, select the normal test statistic and select the normal sampling distribution with standard deviation $\sigma = 2$, significance level $\alpha = 0.1$, sample size $n = 20$, and $\mu_0 = 0$. Run the experiment 1000 times, updating every 10 runs, for several values of the true distribution mean $\mu$. For each value of $\mu$, note the relative frequency of the event that the null hypothesis is rejected. Sketch the empirical power function.

In the mean estimate experiment, select the normal pivot variable and select the normal distribution with $\mu = 0$ and standard deviation $\sigma = 2$, confidence level $1 - \alpha = 0.90$, and sample size $n = 10$. For each of the three types of confidence intervals, run the experiment 20 times, updating after each run. State the corresponding hypotheses and significance level, and for each run, give the set of $\mu_0$ for which the null hypothesis would be rejected.

Design of the Experiment

In many cases, the first step is to design the experiment so that the significance level is $\alpha$ and so that the test has a given power $\beta$ for a given alternative $\mu_1$.

For either of the one-sided tests in Exercise 6, show that the sample size $n$ needed for a test with significance level $\alpha$ and power $\beta$ for the alternative $\mu_1$ is as follows. Hint: Set the power function equal to $\beta$ and solve for $n$.

$$n = \left[\frac{\sigma [z(\beta) - z(\alpha)]}{\mu_1 - \mu_0}\right]^2$$

For the unbiased, two-sided test, show that the sample size $n$ needed for a test with significance level $\alpha$ and power $\beta$ for the alternative $\mu_1$ is given approximately by the following formula. Hint: In the power function for the two-sided test given in Exercise 8, neglect the first term if $\mu_1 \gt \mu_0$ and neglect the second term if $\mu_1 \lt \mu_0$.

$$n = \left[\frac{\sigma [z(\beta) - z(\alpha / 2)]}{\mu_1 - \mu_0}\right]^2$$
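
The two sample size formulas translate directly into code. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, the result is rounded up to the next integer (a convention chosen here, since the formulas generally do not give a whole number), and the numerical inputs are made up.

```python
import math
from statistics import NormalDist

def z(p):
    """Quantile of order p for the standard normal distribution."""
    return NormalDist().inv_cdf(p)

def sample_size_one_sided(sigma, mu0, mu1, alpha, beta):
    """Sample size for a one-sided test with level alpha and power beta at mu1."""
    n = (sigma * (z(beta) - z(alpha)) / (mu1 - mu0)) ** 2
    return math.ceil(n)

def sample_size_two_sided(sigma, mu0, mu1, alpha, beta):
    """Approximate sample size for the equal-tailed two-sided test."""
    n = (sigma * (z(beta) - z(alpha / 2)) / (mu1 - mu0)) ** 2
    return math.ceil(n)

# Hypothetical design: sigma = 2, null value 0, alternative 1, level 0.1, power 0.9.
n_one = sample_size_one_sided(2.0, 0.0, 1.0, 0.1, 0.9)
n_two = sample_size_two_sided(2.0, 0.0, 1.0, 0.1, 0.9)
print(n_one, n_two)
```

The two-sided design needs the larger sample, because only half of the significance level is available in the tail that faces the alternative.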

Tests Based on Chi-Square Statistics

Test Statistics

In this section we will use test statistics related to the sample variance. First recall that the usual version of the sample variance is

$$S^2 = \frac{1}{n - 1} \sum_{i=1}^n (X_i - M)^2$$

Moreover, recall that one of the most important special properties of normal samples is that the sample mean $M$ and the sample variance $S^2$ are independent. Now, for $a \in \mathbb{R}$ and $b \in (0, \infty)$, define

$$W^2(a) = \frac{1}{n} \sum_{i=1}^n (X_i - a)^2, \quad U(a, b) = \frac{n}{b^2} W^2(a), \quad V(b) = \frac{n - 1}{b^2} S^2$$

Recall that we have used $W^2(\mu)$ as a special version of the sample variance, in the unlikely event that the mean $\mu$ is known.

Show that $U(a, b)$ is the sum of the squares of the basic test statistics in Exercise 1:

$$U(a, b) = \sum_{i=1}^n Z_i^2(a, b)$$

Show that $V(b) = U(a, b) - Z^2(a, b)$, where $Z(a, b)$ is the test statistic in Exercise 2. Thus, in particular, it follows that $V(b)$ can also be written in terms of the basic test statistics in Exercise 1.
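
The decomposition of the statistic based on the sample variance, as the sum of squared basic statistics minus the square of the normal test statistic, can be confirmed numerically. The Python sketch below uses made-up data and made-up conjectured values `a` and `b`; the function name `chi_square_stats` is hypothetical.

```python
import math

def chi_square_stats(x, a, b):
    """Return (U, V, Z): U = n W^2(a) / b^2, V = (n - 1) S^2 / b^2, Z = (M - a) / (b / sqrt(n))."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((xi - m) ** 2 for xi in x) / (n - 1)  # sample variance S^2
    w2 = sum((xi - a) ** 2 for xi in x) / n        # special sample variance W^2(a)
    u = n * w2 / b ** 2
    v = (n - 1) * s2 / b ** 2
    z = (m - a) / (b / math.sqrt(n))
    return u, v, z

# Made-up data and conjectured values, for illustration only.
x = [2.3, 1.9, 3.1, 2.8, 2.0, 2.6, 3.4, 1.7]
a, b = 2.0, 0.8

u, v, z = chi_square_stats(x, a, b)
sum_of_squares = sum(((xi - a) / b) ** 2 for xi in x)
print(u, sum_of_squares)  # U equals the sum of the squared basic statistics
print(v, u - z ** 2)      # V equals U minus Z squared, up to rounding error
```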

Show that

  1. $U(\mu, \sigma)$ has the chi-square distribution with $n$ degrees of freedom.
  2. $V(\sigma)$ has the chi-square distribution with $n - 1$ degrees of freedom.

The variables in Exercise 18 are pivot variables that were used to construct confidence sets in the normal model.
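
A small simulation gives an informal check on the distribution of the pivot variable based on the sample variance. The Python sketch below is only a sanity check, not a proof: it relies on the standard facts that the chi-square distribution with $k$ degrees of freedom has mean $k$ and variance $2 k$, and the parameter values, seed, and number of runs are arbitrary choices.

```python
import random

def v_stat(x, b):
    """V(b) = (n - 1) S^2 / b^2, computed from the sample x."""
    n = len(x)
    m = sum(x) / n
    s2 = sum((xi - m) ** 2 for xi in x) / (n - 1)
    return (n - 1) * s2 / b ** 2

random.seed(2024)  # fixed seed so that the run is reproducible
mu, sigma, n, runs = 5.0, 2.0, 10, 5000

# Simulate V at the true standard deviation for many normal samples of size n.
values = [v_stat([random.gauss(mu, sigma) for _ in range(n)], sigma) for _ in range(runs)]

mean_v = sum(values) / runs
var_v = sum((v - mean_v) ** 2 for v in values) / (runs - 1)
print(mean_v)  # should be near n - 1 = 9
print(var_v)   # should be near 2 (n - 1) = 18
```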

Let $g_k$ denote the probability density function and $G_k$ the distribution function for the chi-square distribution with $k$ degrees of freedom. In addition, for $p \in (0, 1)$, let $\chi^2_k(p)$ denote the quantile of order $p$ for the distribution, so that by definition, $\chi^2_k(p) = G_k^{-1}(p)$. For selected values of $k$ and $p$, $\chi^2_k(p)$ can be obtained from the table of the chi-square distribution, from the quantile applet, or from most statistical software packages.

Show or recall the following properties. Part (c) follows from the inverse function theorem of calculus:

  1. $\chi^2_k(p) \to 0$ as $p \downarrow 0$
  2. $\chi^2_k(p) \to \infty$ as $p \uparrow 1$
  3. $\frac{d}{dp} \chi^2_k(p) = 1 / g_k[\chi^2_k(p)]$

Hypothesis Tests

Show that for any $\alpha \in (0, 1)$ and $p \in (0, 1)$, the following test has significance level $\alpha$. Reject $H_0: (\mu, \sigma) = (\mu_0, \sigma_0)$ versus $H_1: (\mu, \sigma) \ne (\mu_0, \sigma_0)$ if and only if $U(\mu_0, \sigma_0) < \chi^2_n(\alpha - p \alpha)$ or $U(\mu_0, \sigma_0) > \chi^2_n(1 - p \alpha)$. Equivalently, show that we fail to reject $H_0$ if and only if

$$\frac{n W^2(\mu_0)}{\chi^2_n(1 - p \alpha)} \le \sigma_0^2 \le \frac{n W^2(\mu_0)}{\chi^2_n(\alpha - p \alpha)}$$

[Figure: confidence set]

Note that the hypothesis test in Exercise 20 is the dual of the confidence set constructed with the pivot variable $U(\mu, \sigma)$ in the section on Estimation in the Normal Model. That is, the set of $(\mu_0, \sigma_0)$ for which we fail to reject $H_0$ at level $\alpha$ is precisely the $1 - \alpha$ confidence set for $(\mu, \sigma)$. This set is shown in the picture above.

Show that for any $\alpha \in (0, 1)$ and $p \in (0, 1)$, the following test has significance level $\alpha$. Reject $H_0: (\mu, \sigma) = (\mu_0, \sigma_0)$ versus $H_1: (\mu, \sigma) \ne (\mu_0, \sigma_0)$ if and only if $V(\sigma_0) < \chi^2_{n-1}(\alpha - p \alpha)$ or $V(\sigma_0) > \chi^2_{n-1}(1 - p \alpha)$. Equivalently, show that we fail to reject $H_0$ if and only if

$$\frac{(n - 1) S^2}{\chi^2_{n-1}(1 - p \alpha)} \le \sigma_0^2 \le \frac{(n - 1) S^2}{\chi^2_{n-1}(\alpha - p \alpha)}$$

[Figure: confidence set]

Note that the hypothesis test in Exercise 21 is the dual of the confidence set constructed with the pivot variable $V(\sigma)$ in the section on Estimation in the Normal Model. That is, the set of $(\mu_0, \sigma_0)$ for which we fail to reject $H_0$ at level $\alpha$ is precisely the $1 - \alpha$ confidence set for $(\mu, \sigma)$. This set is shown in the picture above.

In both tests, note that $p$ is the fraction of the significance level $\alpha$ in the right tail of the distribution of the pivot variable; $1 - p$ is the fraction of $\alpha$ in the left tail. The equal-tailed case when $p = \frac{1}{2}$ is the one that is most commonly used; this test is said to be unbiased.

Clearly, a test of two real parameters based on a single real-valued test statistic cannot be very good. However, since the test statistic $V(\sigma_0)$ gives no information about $\mu$, it is natural to use this statistic in tests involving $\sigma$ only. If $\mu$ were known, we could also use the test statistic $U(\sigma_0) = U(\mu, \sigma_0)$. All of the results given in the remainder of this subsection would apply, but with $n$ replacing $n - 1$. However, the assumption that $\mu$ is known is very artificial.

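Show that for any $\alpha \in (0, 1)$ and $p \in (0, 1)$, each of the following tests has significance level $\alpha$:

  1. Reject $H_0: \sigma = \sigma_0$ versus $H_1: \sigma \ne \sigma_0$ if and only if $V(\sigma_0) < \chi^2_{n-1}(\alpha - p \alpha)$ or $V(\sigma_0) > \chi^2_{n-1}(1 - p \alpha)$
  2. Reject $H_0: \sigma \le \sigma_0$ versus $H_1: \sigma \gt \sigma_0$ if and only if $V(\sigma_0) > \chi^2_{n-1}(1 - \alpha)$
  3. Reject $H_0: \sigma \ge \sigma_0$ versus $H_1: \sigma \lt \sigma_0$ if and only if $V(\sigma_0) < \chi^2_{n-1}(\alpha)$

In part (a), the unbiased test with $p = \frac{1}{2}$ is most commonly used: Reject $H_0: \sigma = \sigma_0$ versus $H_1: \sigma \ne \sigma_0$ if and only if $V(\sigma_0) < \chi^2_{n-1}(\alpha / 2)$ or $V(\sigma_0) > \chi^2_{n-1}(1 - \alpha / 2)$.

For each of the tests in Exercise 22, show that we fail to reject $H_0$ at significance level $\alpha$ if and only if $\sigma_0$ is in the corresponding $1 - \alpha$ confidence interval.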

Power Functions

Recall that the power function of a test of a parameter is the probability of rejecting the null hypothesis, as a function of the true value of the parameter. Our next series of exercises will explore the power functions of the tests in Exercise 22.

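Show that the power function of the test in Exercise 22 (a) is given by the following formula, and then verify the given properties:

$$Q(\sigma) = 1 - G_{n-1}\left[\frac{\sigma_0^2}{\sigma^2} \chi^2_{n-1}(1 - p \alpha)\right] + G_{n-1}\left[\frac{\sigma_0^2}{\sigma^2} \chi^2_{n-1}(\alpha - p \alpha)\right]$$
  1. $Q$ is decreasing on $(0, \sigma_0)$ and increasing on $(\sigma_0, \infty)$.
  2. $Q(\sigma_0) = \alpha$.
  3. $Q(\sigma) \to 1$ as $\sigma \uparrow \infty$ and $Q(\sigma) \to 1$ as $\sigma \downarrow 0$.

Show that the power function of the test in Exercise 22 (b) is given by the following formula, and then verify the given properties:

$$Q(\sigma) = 1 - G_{n-1}\left[\frac{\sigma_0^2}{\sigma^2} \chi^2_{n-1}(1 - \alpha)\right]$$
  1. $Q$ is increasing on $(0, \infty)$.
  2. $Q(\sigma_0) = \alpha$.
  3. $Q(\sigma) \to 1$ as $\sigma \uparrow \infty$ and $Q(\sigma) \to 0$ as $\sigma \downarrow 0$.

Show that the power function for the test in Exercise 22 (c) is given by the following formula, and then verify the given properties:

$$Q(\sigma) = G_{n-1}\left[\frac{\sigma_0^2}{\sigma^2} \chi^2_{n-1}(\alpha)\right]$$
  1. $Q$ is decreasing on $(0, \infty)$.
  2. $Q(\sigma_0) = \alpha$.
  3. $Q(\sigma) \to 0$ as $\sigma \uparrow \infty$ and $Q(\sigma) \to 1$ as $\sigma \downarrow 0$.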

Show that for any of the three tests in Exercise 22, increasing the sample size n results in a uniformly more powerful test.

In the variance test experiment, select the normal distribution with mean 0, and select significance level 0.1, sample size 10, and test standard deviation 1.0. For various values of the true standard deviation, run the simulation 1000 times, updating every 10 runs. Record the relative frequency of rejecting the null hypothesis and plot the empirical power curve.

  1. Two-sided test
  2. Left-tailed test
  3. Right-tailed test

In the variance estimate experiment, select the normal distribution with mean 0 and standard deviation 2, and select confidence level 0.90 and sample size 10. Run the experiment 20 times, updating after each run. State the corresponding hypotheses and significance level, and for each run, give the set of test standard deviations for which the null hypothesis would be rejected.

  1. Two-sided confidence interval
  2. Confidence lower bound
  3. Confidence upper bound

Tests Based on a Student t Statistic

The Test Statistic

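Our next test statistic will lead to good tests of $\mu$ without requiring the assumption that $\sigma$ is known. For $a \in \mathbb{R}$, define

$$T(a) = \frac{M - a}{S / \sqrt{n}}$$

Show that for any $a \in \mathbb{R}$ and any $b \in (0, \infty)$,

$$T(a) = \frac{Z(a, b)}{\sqrt{V(b) / (n - 1)}}$$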

In particular, it follows that T a can be written in terms of the basic test statistics in Exercise 1.
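
The representation of the Student statistic as a ratio, with the scale value cancelling, can be checked numerically. In the Python sketch below the data and the conjectured mean are made up, and the function names are hypothetical; the point is that the ratio gives the same value for every choice of the scale `b`.

```python
import math
import statistics

def t_stat(x, a):
    """Student test statistic T(a) = (M - a) / (S / sqrt(n))."""
    n = len(x)
    return (statistics.mean(x) - a) / (statistics.stdev(x) / math.sqrt(n))

def t_from_z_and_v(x, a, b):
    """The same statistic, computed as Z(a, b) / sqrt(V(b) / (n - 1))."""
    n = len(x)
    z = (statistics.mean(x) - a) / (b / math.sqrt(n))
    v = (n - 1) * statistics.variance(x) / b ** 2
    return z / math.sqrt(v / (n - 1))

# Made-up data and conjectured mean, for illustration only.
x = [4.1, 5.3, 4.8, 6.0, 5.5, 4.4, 5.1]
a = 4.5
for b in (0.5, 1.0, 3.0):
    print(b, t_stat(x, a), t_from_z_and_v(x, a, b))  # the two values agree for every b
```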

Show or recall that $T(\mu)$ has the student t distribution with $n - 1$ degrees of freedom.

This variable was one of the pivot variables used to construct estimates in the normal model. When $\mu \ne a$, the distribution of $T(a)$ is known as a non-central t distribution.

As usual, for $k \in (0, \infty)$, we will let $\varphi_k$ denote the probability density function and $\Phi_k$ the distribution function for the t distribution with $k$ degrees of freedom. Additionally, for $p \in (0, 1)$, let $t_k(p)$ denote the quantile of order $p$ for this distribution, so that $t_k(p) = \Phi_k^{-1}(p)$. For selected values of $k$ and $p$, values of $t_k(p)$ can be obtained from the table of the t distribution or from the quantile applet.

Show or recall the following properties. Part (d) follows from the inverse function theorem of calculus.

  1. $t_k(1 - p) = -t_k(p)$
  2. $t_k(p) \to -\infty$ as $p \downarrow 0$
  3. $t_k(p) \to \infty$ as $p \uparrow 1$
  4. $t_k^\prime(p) = 1 / \varphi_k[t_k(p)]$

Hypothesis Tests

Show that for any $\alpha \in (0, 1)$ and $p \in (0, 1)$, the following test has significance level $\alpha$. Reject $H_0: (\mu, \sigma) = (\mu_0, \sigma_0)$ versus $H_1: (\mu, \sigma) \ne (\mu_0, \sigma_0)$ if and only if $T(\mu_0) < t_{n-1}(\alpha - p \alpha)$ or $T(\mu_0) > t_{n-1}(1 - p \alpha)$. Equivalently, we fail to reject $H_0$ if and only if

$$M - t_{n-1}(1 - p \alpha) \frac{S}{\sqrt{n}} \le \mu_0 \le M - t_{n-1}(\alpha - p \alpha) \frac{S}{\sqrt{n}}$$

[Figure: confidence set]

Note that the hypothesis test in Exercise 33 is the dual of the confidence set constructed with the pivot variable $T(\mu)$ in the section on Estimation in the Normal Model. That is, the set of $(\mu_0, \sigma_0)$ for which we fail to reject $H_0$ at level $\alpha$ is precisely the $1 - \alpha$ confidence set for $(\mu, \sigma)$. This set is shown in the picture above. Note that $p$ is the fraction of the significance level $\alpha$ in the right tail of the distribution of the pivot variable; $1 - p$ is the fraction of $\alpha$ in the left tail. The equal-tailed case when $p = \frac{1}{2}$ is the one that is most commonly used; this test is said to be unbiased.

Clearly, since our test statistic gives no information about $\sigma$, there is no point in putting this parameter in the hypotheses.

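Show that for any $\alpha \in (0, 1)$ and $p \in (0, 1)$, each of the following tests has significance level $\alpha$:

  1. Reject $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ if and only if $T(\mu_0) < t_{n-1}(\alpha - p \alpha)$ or $T(\mu_0) > t_{n-1}(1 - p \alpha)$
  2. Reject $H_0: \mu \le \mu_0$ versus $H_1: \mu \gt \mu_0$ if and only if $T(\mu_0) > -t_{n-1}(\alpha) = t_{n-1}(1 - \alpha)$
  3. Reject $H_0: \mu \ge \mu_0$ versus $H_1: \mu \lt \mu_0$ if and only if $T(\mu_0) < -t_{n-1}(1 - \alpha)$

In part (a), the unbiased test with $p = \frac{1}{2}$ is most commonly used:

Reject $H_0: \mu = \mu_0$ versus $H_1: \mu \ne \mu_0$ if and only if $T(\mu_0) < -t_{n-1}(1 - \alpha / 2)$ or $T(\mu_0) > t_{n-1}(1 - \alpha / 2)$.

For each of the tests in Exercise 34, show that we fail to reject $H_0$ at significance level $\alpha$ if and only if $\mu_0$ is in the corresponding $1 - \alpha$ confidence interval.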

The $P$-values of these tests can be computed in terms of the distribution function $\Phi_{n-1}$ of the t distribution with $n - 1$ degrees of freedom.

Show that the $P$-values of the tests in Exercise 34 are, respectively,

  1. $2 \left[1 - \Phi_{n-1}\left(\left|T(\mu_0)\right|\right)\right]$
  2. $1 - \Phi_{n-1}[T(\mu_0)]$
  3. $\Phi_{n-1}[T(\mu_0)]$

In the mean test experiment, select the student test statistic and select the normal sampling distribution with standard deviation $\sigma = 2$, significance level $\alpha = 0.1$, sample size $n = 20$, and $\mu_0 = 0$. Run the experiment 1000 times, updating every 10 runs, for several values of the true distribution mean $\mu$. For each value of $\mu$, note the relative frequency of the event that the null hypothesis is rejected. Sketch the empirical power function.

In the mean estimate experiment, select the student pivot variable and select the normal sampling distribution with mean 0 and standard deviation 2. Select confidence level 0.90 and sample size 10. For each of the three types of intervals, run the experiment 20 times, updating after each run. State the corresponding hypotheses and significance level, and for each run, give the set of $\mu_0$ for which the null hypothesis would be rejected.

The power function for the tests in Exercise 34 can be computed explicitly in terms of the non-central t distribution function. Qualitatively, the graphs of the power functions are similar to the case when σ is known, given in Exercise 8, Exercise 9, and Exercise 10.

If an upper bound $\sigma_0$ on the standard deviation $\sigma$ is known, then conservative estimates of the sample size needed for a given significance level and a given power can be obtained using the methods of Exercise 14 and Exercise 15.

Exercises

Robustness

The primary assumption that we made is that the underlying sampling distribution is normal. Of course, in real statistical problems, we are unlikely to know much about the sampling distribution, let alone whether or not it is normal. Suppose in fact that the underlying distribution is not normal. When the sample size $n$ is relatively large, the distribution of the sample mean will still be approximately normal by the central limit theorem, and thus our tests of the mean $\mu$ should still be approximately valid. On the other hand, tests of the variance $\sigma^2$ are less robust to deviations from the assumption of normality. The following exercises explore these ideas.

In the mean test experiment, select the gamma distribution with shape parameter 1 and scale parameter 1. For the three different tests and for various significance levels, sample sizes, and values of $\mu_0$, run the experiment 1000 times with an update frequency of 10. For each configuration, note the relative frequency of rejecting $H_0$. When $H_0$ is true, compare the relative frequency with the significance level.

In the mean test experiment, select the uniform distribution on $[0, 4]$. For the three different tests and for various significance levels, sample sizes, and values of $\mu_0$, run the experiment 1000 times with an update frequency of 10. For each configuration, note the relative frequency of rejecting $H_0$. When $H_0$ is true, compare the relative frequency with the significance level.

How large n needs to be for the testing procedure to work well depends, of course, on the underlying distribution; the more this distribution deviates from normality, the larger n must be. Fortunately, convergence to normality in the central limit theorem is rapid and hence, as you observed in the exercises, we can get away with relatively small sample sizes (30 or more) in most cases.

In the variance test experiment, select the gamma distribution with shape parameter 1 and scale parameter 1. For the three different tests and for various significance levels, sample sizes, and values of $\sigma_0$, run the experiment 1000 times with an update frequency of 10. For each configuration, note the relative frequency of rejecting $H_0$. When $H_0$ is true, compare the relative frequency with the significance level.

In the variance test experiment, select the uniform distribution on $[0, 4]$. For the three different tests and for various significance levels, sample sizes, and values of $\sigma_0$, run the experiment 1000 times with an update frequency of 10. For each configuration, note the relative frequency of rejecting $H_0$. When $H_0$ is true, compare the relative frequency with the significance level.

Computational Exercises

The length of a certain machined part is supposed to be 10 centimeters. In fact, due to imperfections in the manufacturing process, the actual length is a random variable. The standard deviation is due to inherent factors in the process, which remain fairly stable over time. From historical data, the standard deviation is known with a high degree of accuracy to be 0.3. The mean, on the other hand, may be set by adjusting various parameters in the process and hence may change to an unknown value fairly frequently. We are interested in testing $H_0: \mu = 10$ versus $H_1: \mu \ne 10$.

  1. Suppose that a sample of 100 parts has mean 10.1. Perform the test at the 0.1 level of significance.
  2. Compute the P -value for the data in (a).
  3. Compute the power of the test in (a) at $\mu = 10.05$.
  4. Compute the approximate sample size needed for significance level 0.1 and power 0.8 when $\mu = 10.05$.
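
One way to carry out the computations for the machined part is sketched below in Python. This is a sketch under stated assumptions rather than an official solution: it assumes the equal-tailed two-sided test of $H_0: \mu = 10$ versus $H_1: \mu \ne 10$ with known standard deviation 0.3, it uses the usual two-sided $P$-value for a normal test statistic, and it rounds the sample size up to the next integer. The variable names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution

mu0, sigma, n, m, alpha = 10.0, 0.3, 100, 10.1, 0.1

# (a) Two-sided test with known sigma: compare |Z| with z(1 - alpha / 2).
z0 = (m - mu0) / (sigma / math.sqrt(n))
critical = STD.inv_cdf(1 - alpha / 2)
reject = abs(z0) > critical

# (b) Two-sided P-value.
p_value = 2 * (1 - STD.cdf(abs(z0)))

# (c) Power of the equal-tailed test at the alternative mu = 10.05.
mu1 = 10.05
d = math.sqrt(n) * (mu1 - mu0) / sigma
power = STD.cdf(STD.inv_cdf(alpha / 2) - d) + STD.cdf(d - STD.inv_cdf(1 - alpha / 2))

# (d) Approximate sample size for power 0.8 at mu = 10.05 (two-sided formula).
n_needed = math.ceil((sigma * (STD.inv_cdf(0.8) - STD.inv_cdf(alpha / 2)) / (mu1 - mu0)) ** 2)

print(z0, critical, reject)
print(p_value)
print(power)
print(n_needed)
```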

A bag of potato chips of a certain brand has an advertised weight of 250 grams. Actually, the weight (in grams) is a random variable. Suppose that a sample of 75 bags has mean 248 and standard deviation 5. At the 0.05 significance level, perform the following tests:

  1. $H_0: \mu \ge 250$ versus $H_1: \mu \lt 250$
  2. $H_0: \sigma \ge 7$ versus $H_1: \sigma \lt 7$

At a telemarketing firm, the length of a telephone solicitation (in seconds) is a random variable. A sample of 50 calls has mean 310 and standard deviation 25. At the 0.1 level of significance, can we conclude that

  1. $\mu \gt 300$?
  2. $\sigma \gt 20$?

At a certain farm the weight of a peach (in ounces) at harvest time is a random variable. A sample of 100 peaches has mean 8.2 and standard deviation 1.0. At the 0.01 level of significance, can we conclude that

  1. $\mu \gt 8$?
  2. $\sigma \lt 1.5$?

The hourly wage for a certain type of construction work is a random variable with standard deviation 1.25. For a sample of 25 workers, the mean wage was $6.75. At the 0.01 level of significance, can we conclude that $\mu \lt 7.00$?

Using Michelson's data, test to see if the velocity of light is greater than 730 (+299000) km/sec, at the 0.005 significance level.

Using Cavendish's data, test to see if the density of the earth is less than 5.5 times the density of water, at the 0.05 significance level.

Using Short's data, test to see if the parallax of the sun differs from 9 seconds of a degree, at the 0.1 significance level.

Using Fisher's iris data, perform the following tests, at the 0.1 level:

  1. The mean petal length of Setosa irises differs from 15 mm.
  2. The mean petal length of Virginica irises is greater than 52 mm.
  3. The mean petal length of Versicolor irises is less than 44 mm.