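Suppose that the objects in our population are numbered from 1 to $m$, so that $D = \{1, 2, \ldots, m\}$. For example, the population might consist of manufactured items, and the labels might correspond to serial numbers. As in the basic sampling model, we select $n$ objects at random, without replacement, from $D$:
$$\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$$
where $X_i \in D$ is the $i$th object chosen. Recall that $\boldsymbol{X}$ is uniformly distributed over the set of permutations of size $n$ chosen from $D$. Recall also that
$$\boldsymbol{W} = \{X_1, X_2, \ldots, X_n\}$$
is the unordered sample, which is uniformly distributed on the set of combinations of size $n$ chosen from $D$.
For $i \in \{1, 2, \ldots, n\}$, let
$$X_{(i)} = \text{the } i\text{th smallest element of } \{X_1, X_2, \ldots, X_n\}$$
The random variable $X_{(i)}$ is known as the order statistic of order $i$ for the sample $\boldsymbol{X}$. Note that, in particular, the extreme order statistics are
$$X_{(1)} = \min\{X_1, X_2, \ldots, X_n\}, \quad X_{(n)} = \max\{X_1, X_2, \ldots, X_n\}$$
Exercise 1. Show that $X_{(i)}$ takes values in $\{i, i + 1, \ldots, m - n + i\}$.
We will denote the vector of order statistics by
$$\boldsymbol{Y} = (X_{(1)}, X_{(2)}, \ldots, X_{(n)})$$
Note that $\boldsymbol{Y}$ takes values in $L = \{(y_1, y_2, \ldots, y_n) \in D^n : y_1 < y_2 < \cdots < y_n\}$.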
Exercise 2. Run the order statistic experiment. Note that you can vary the population size $m$ and the sample size $n$. The order statistics are recorded on each update.
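Exercise 3. Show that the set $L$ of increasing vectors has $\binom{m}{n}$ elements and that the vector of order statistics $\boldsymbol{Y}$ is uniformly distributed on $L$. Hint: $\boldsymbol{Y} = (y_1, y_2, \ldots, y_n)$ if and only if $\boldsymbol{W} = \{y_1, y_2, \ldots, y_n\}$ if and only if $\boldsymbol{X}$ is one of the $n!$ permutations of $(y_1, y_2, \ldots, y_n)$.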
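The counting argument in the hint can be checked by brute force for a small population. The following is a minimal sketch, not part of the original text; the function name `order_statistic_counts` and the values $m = 6$, $n = 3$ are illustrative assumptions.

```python
from collections import Counter
from itertools import permutations
from math import comb, factorial


def order_statistic_counts(m, n):
    """Count how many ordered samples (permutations of size n from 1..m)
    are mapped to each increasing vector of order statistics."""
    population = range(1, m + 1)
    return Counter(tuple(sorted(x)) for x in permutations(population, n))


# Illustrative (assumed) parameter values, small enough to enumerate.
counts = order_statistic_counts(m=6, n=3)

# The number of distinct increasing vectors should equal C(m, n).
print(len(counts) == comb(6, 3))
# Every increasing vector should arise from exactly n! ordered samples,
# which is why the vector of order statistics is uniformly distributed.
print(set(counts.values()) == {factorial(3)})
```

Because every increasing vector is hit by the same number of equally likely ordered samples, the uniform distribution follows directly.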
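Exercise 4. Use a combinatorial argument to show that the probability density function of the order statistic $X_{(i)}$ is
$$\mathbb{P}\left(X_{(i)} = k\right) = \frac{\binom{k-1}{i-1} \binom{m-k}{n-i}}{\binom{m}{n}}, \quad k \in \{i, i + 1, \ldots, m - n + i\}$$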
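As a sanity check on the density of the order statistic of order $i$, the closed form (choose $i - 1$ values below $k$ and $n - i$ values above $k$, out of all $\binom{m}{n}$ unordered samples) can be compared with direct enumeration. This is a sketch under assumed names (`order_stat_pmf`, `order_stat_pmf_brute`) and assumed small parameters $m = 8$, $n = 3$.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def order_stat_pmf(k, i, n, m):
    """Closed-form probability that the i-th order statistic equals k."""
    if not (i <= k <= m - n + i):
        return Fraction(0)
    return Fraction(comb(k - 1, i - 1) * comb(m - k, n - i), comb(m, n))


def order_stat_pmf_brute(k, i, n, m):
    """Same probability, by enumerating all unordered samples of size n."""
    # combinations() of a sorted range yields each sample already sorted.
    samples = list(combinations(range(1, m + 1), n))
    hits = sum(1 for s in samples if s[i - 1] == k)
    return Fraction(hits, len(samples))


m, n = 8, 3  # illustrative (assumed) values
agree = all(
    order_stat_pmf(k, i, n, m) == order_stat_pmf_brute(k, i, n, m)
    for i in range(1, n + 1)
    for k in range(1, m + 1)
)
print(agree)
```

Exact rational arithmetic via `Fraction` avoids any floating-point tolerance in the comparison.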
Exercise 5. In the order statistic experiment, vary the parameters and note the shape and location of the probability density function. For selected values of the parameters, run the experiment 1000 times, updating every 10 runs. Note the apparent convergence of the relative frequency function to the probability density function.
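The probability density function in Exercise 4 can be used to obtain an interesting identity involving the binomial coefficients. This identity, in turn, can be used to find the mean and variance of the order statistic $X_{(i)}$.
Exercise 6. Show that for positive integers $i \le n \le m$,
$$\sum_{k=i}^{m-n+i} \binom{k-1}{i-1} \binom{m-k}{n-i} = \binom{m}{n}$$
Exercise 7. Show that
$$\mathbb{E}\left(X_{(i)}\right) = i \, \frac{m+1}{n+1}$$
Exercise 8. Use the identity in Exercise 6 to show that
$$\operatorname{var}\left(X_{(i)}\right) = \frac{i (n - i + 1)(m + 1)(m - n)}{(n+1)^2 (n+2)}$$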
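The mean and variance of an order statistic can be verified numerically by summing against the density directly. The sketch below assumes the density $\binom{k-1}{i-1}\binom{m-k}{n-i} / \binom{m}{n}$ on $\{i, \ldots, m - n + i\}$ and the closed forms $i(m+1)/(n+1)$ for the mean and $i(n-i+1)(m+1)(m-n) / [(n+1)^2(n+2)]$ for the variance; the function names and the values $m = 25$, $n = 5$ are illustrative choices.

```python
from fractions import Fraction
from math import comb


def pmf(k, i, n, m):
    """Density of the i-th order statistic at k (k assumed in the support)."""
    return Fraction(comb(k - 1, i - 1) * comb(m - k, n - i), comb(m, n))


def exact_moments(i, n, m):
    """Mean and variance computed by summing over the support."""
    support = range(i, m - n + i + 1)
    mean = sum(k * pmf(k, i, n, m) for k in support)
    second = sum(k * k * pmf(k, i, n, m) for k in support)
    return mean, second - mean ** 2


def closed_form_moments(i, n, m):
    """Mean and variance from the closed-form expressions."""
    mean = Fraction(i * (m + 1), n + 1)
    var = Fraction(i * (n - i + 1) * (m + 1) * (m - n), (n + 1) ** 2 * (n + 2))
    return mean, var


m, n = 25, 5  # illustrative (assumed) values
for i in range(1, n + 1):
    print(i, exact_moments(i, n, m) == closed_form_moments(i, n, m))
```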
Exercise 9. In the order statistic experiment, vary the parameters and note the size and location of the mean/standard deviation bar. For selected values of the parameters, run the experiment 1000 times, updating every 10 runs. Note the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation.
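Exercise 10. Use the result of Exercise 7 to show that for $i \in \{1, 2, \ldots, n\}$, the following statistic is an unbiased estimator of the population size $m$:
$$U_i = \frac{n+1}{i} X_{(i)} - 1$$
Since $U_i$ is unbiased, its variance is the mean square error, a measure of the quality of the estimator.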
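Exercise 11. Use the result of Exercise 8 to show that
$$\operatorname{var}(U_i) = \frac{(m+1)(m-n)(n-i+1)}{i(n+2)}$$
Exercise 12. Show that for fixed $m$ and $n$, $\operatorname{var}(U_i)$ decreases as $i$ increases. Thus, the estimators improve as $i$ increases; in particular, $U_n$ is the best and $U_1$ the worst.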
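Exercise 13. Verify the following ratio, known as the relative efficiency of $U_j$ with respect to $U_i$:
$$\frac{\operatorname{var}(U_i)}{\operatorname{var}(U_j)} = \frac{j(n-i+1)}{i(n-j+1)}$$
Note that the relative efficiency depends only on the orders $i$ and $j$ and the sample size $n$, but not on the population size $m$. In particular, the relative efficiency of $U_n$ with respect to $U_1$ is $n^2$.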
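The behavior of the variances and of the relative efficiency can be tabulated directly. The sketch below assumes the variance formula $(m+1)(m-n)(n-i+1) / [i(n+2)]$ for the estimator based on the order statistic of order $i$; the function names and the parameter values are illustrative, and $n < m$ is assumed so that no variance is zero.

```python
from fractions import Fraction


def var_U(i, n, m):
    """Variance of the unbiased estimator based on the i-th order statistic."""
    return Fraction((m + 1) * (m - n) * (n - i + 1), i * (n + 2))


def relative_efficiency(j, i, n, m):
    """Relative efficiency of U_j with respect to U_i: var(U_i) / var(U_j).
    Assumes n < m, so that var(U_j) is not zero."""
    return var_U(i, n, m) / var_U(j, n, m)


m, n = 100, 10  # illustrative (assumed) values
variances = [var_U(i, n, m) for i in range(1, n + 1)]

# The variance should shrink as the order i grows.
print(all(a > b for a, b in zip(variances, variances[1:])))
# Efficiency of the maximum-based estimator relative to the minimum-based one.
print(relative_efficiency(n, 1, n, m))
# The ratio should not change when only the population size changes.
print(relative_efficiency(3, 2, n, 100) == relative_efficiency(3, 2, n, 50))
```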
Usually, we hope that an estimator improves (in the sense of mean square error) as the sample size increases (the more information we have, the better our estimate should be). This general idea is known as consistency.
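Exercise 14. Verify the following result. Thus, $\operatorname{var}(U_n)$ decreases to 0 as the sample size $n$ increases from 1 to $m$, and so $U_n$ is consistent:
$$\operatorname{var}(U_n) = \frac{(m+1)(m-n)}{n(n+2)}$$
Exercise 15. Show that for fixed $i$, $\operatorname{var}(U_i)$ at first increases and then decreases to 0 as $n$ increases from $i$ to $m$. Thus, $U_i$ is inconsistent.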
[Figure: graph of $\operatorname{var}(U_i)$ as a function of the sample size $n$, for a fixed order $i$ and population size $m$.]
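The rise and fall of the variance for a fixed order can be seen numerically. The following sketch assumes the variance formula $(m+1)(m-n)(n-i+1) / [i(n+2)]$ and uses the illustrative choices $i = 1$ and $m = 100$, which are assumptions made here for the example.

```python
from fractions import Fraction


def var_U(i, n, m):
    """Variance of the unbiased estimator based on the i-th order statistic."""
    return Fraction((m + 1) * (m - n) * (n - i + 1), i * (n + 2))


m, i = 100, 1  # illustrative (assumed) values
# Variance as a function of the sample size n, for n = i, ..., m.
curve = {n: var_U(i, n, m) for n in range(i, m + 1)}
peak_n = max(curve, key=curve.get)

print("variance at n = i:", float(curve[i]))
print("peak at n =", peak_n, "with variance", float(curve[peak_n]))
print("variance at n = m:", float(curve[m]))
```

The variance only reaches 0 at $n = m$, when the whole population has been sampled.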
In this subsection, we will derive another estimator of the parameter $m$ based on the average of the sample variables, $M = \frac{1}{n} \sum_{i=1}^n X_i$ (the sample mean), and compare this estimator with the estimator based on the maximum of the variables (the largest order statistic).
Exercise 16. Show that $\mathbb{E}(M) = \frac{m+1}{2}$, where $M$ is the sample mean.
It follows that $V = 2M - 1$ is an unbiased estimator of $m$. Moreover, it seems that, superficially at least, $V$ uses more information from the sample (since it involves all of the sample variables) than $U_n$. Could it be better? To find out, we need to compute the variance of the estimator (which, since it is unbiased, is the mean square error). This computation is a bit complicated since the sample variables are dependent. We will compute the variance of the sum as the sum of all of the pairwise covariances.
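The decomposition mentioned above can be written out explicitly. As a sketch, write $X_1, X_2, \ldots, X_n$ for the sample variables. Since the sampling is without replacement, the variables are exchangeable, so all of the variances are equal and all of the covariances between distinct variables are equal:

$$\operatorname{var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \operatorname{var}(X_i) + \sum_{i \ne j} \operatorname{cov}(X_i, X_j) = n \operatorname{var}(X_1) + n(n-1) \operatorname{cov}(X_1, X_2)$$

Dividing by $n^2$ gives the variance of the sample mean, so only one variance and one covariance need to be computed.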
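Exercise 17. Show that $\operatorname{cov}(X_i, X_j) = -\frac{m+1}{12}$ for distinct $i, j \in \{1, 2, \ldots, n\}$.
Exercise 18. Recall or show that $\operatorname{var}(X_i) = \frac{m^2 - 1}{12}$.
Exercise 19. Show that $\operatorname{var}(M) = \frac{(m+1)(m-n)}{12 n}$.
Exercise 20. Finally, show that $\operatorname{var}(V) = \frac{(m+1)(m-n)}{3 n}$.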
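The variance in Exercise 20 is decreasing in the sample size $n$, so the estimator $V$ is also consistent. Let's compute the relative efficiency of the estimator based on the maximum to the estimator based on the mean.
Exercise 21. Show that $\frac{\operatorname{var}(V)}{\operatorname{var}(U_n)} = \frac{n+2}{3}$.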
Thus, once again, the estimator based on the maximum is better.
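The comparison between the two estimators can be checked exactly for a small population by enumerating every unordered sample, each of which is equally likely. The sketch below assumes the mean-based estimator is twice the sample mean minus 1 and the maximum-based estimator is $(n+1)/n$ times the sample maximum minus 1; the function names and the values $m = 10$, $n = 3$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations


def exact_mean_var(estimator, n, m):
    """Exact mean and variance of an estimator over all unordered samples."""
    values = [estimator(s) for s in combinations(range(1, m + 1), n)]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def V(sample):
    """Estimator based on the sample mean."""
    return 2 * Fraction(sum(sample), len(sample)) - 1


def U_max(sample):
    """Estimator based on the sample maximum."""
    n = len(sample)
    return Fraction((n + 1) * max(sample), n) - 1


m, n = 10, 3  # illustrative (assumed) values
mean_V, var_V = exact_mean_var(V, n, m)
mean_U, var_U = exact_mean_var(U_max, n, m)

print(mean_V, mean_U)            # both should equal m if unbiased
print(var_V / var_U)             # compare with (n + 2) / 3
```

Both estimators depend only on the unordered sample, which is why enumerating combinations rather than permutations is enough.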
If the sampling is with replacement, then the sample is a sequence of independent and identically distributed random variables. The order statistics from such samples are studied in the chapter on Random Samples.
Exercise 22. Suppose that in a lottery, tickets numbered from 1 to 25 are placed in a bowl. Five tickets are chosen at random and without replacement.
The estimator $U_n$, based on the largest order statistic, was used by the Allies during World War II to estimate the number of German tanks that had been produced. German tanks had serial numbers, and captured German tanks and records formed the sample data. The statistical estimates turned out to be much more accurate than intelligence estimates. Some of the data are given in the table below.
| Date | Statistical Estimate | Intelligence Estimate | German Records |
|---|---|---|---|
| June 1940 | 169 | 1000 | 122 |
| June 1941 | 244 | 1550 | 271 |
| August 1942 | 327 | 1550 | 342 |
One of the morals, evidently, is not to put serial numbers on your weapons!
Exercise 23. Suppose that in a certain war, 5 enemy tanks have been captured. The serial numbers are 51, 3, 27, 82, 65. Compute the estimate of $m$, the total number of tanks, using all of the estimators discussed above.
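The computation for the captured serial numbers can be organized as follows. This is a sketch, assuming the order statistic estimators have the form $(n+1)/i$ times the $i$th smallest serial number minus 1, and the mean-based estimator is twice the sample mean minus 1; the function names are illustrative.

```python
from fractions import Fraction


def order_statistic_estimates(sample):
    """Estimates of the population size from each order statistic,
    listed in order i = 1, ..., n."""
    n = len(sample)
    ordered = sorted(sample)
    return [Fraction((n + 1) * x, i) - 1 for i, x in enumerate(ordered, start=1)]


def sample_mean_estimate(sample):
    """Estimate of the population size from the sample mean."""
    return 2 * Fraction(sum(sample), len(sample)) - 1


serials = [51, 3, 27, 82, 65]
for i, estimate in enumerate(order_statistic_estimates(serials), start=1):
    print(f"order {i}: {float(estimate)}")
print(f"sample mean: {float(sample_mean_estimate(serials))}")
```

The estimates vary widely across the orders, which is consistent with the large variance of the estimators based on the small order statistics.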
Exercise 24. In the order statistic experiment, fix values of the population size $m$ and the sample size $n$. Run the experiment 50 times, updating after each run. For each run, compute the estimate of $m$ based on each order statistic. For each estimator, compute the square root of the average of the squares of the errors over the 50 runs. Based on these empirical error estimates, rank the estimators of $m$ in terms of quality.
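For readers without access to the applet, the experiment can be imitated with a short simulation. This is only a sketch: the parameter values $m = 100$ and $n = 10$, the seed, and the function name `empirical_rmse` are assumptions made for illustration, and the estimators are assumed to have the form $(n+1)/i$ times the $i$th order statistic minus 1.

```python
import random
from math import sqrt


def empirical_rmse(m, n, runs, seed=0):
    """Root mean square error of each order statistic estimator,
    estimated from simulated samples drawn without replacement."""
    rng = random.Random(seed)
    squared_errors = [0.0] * n
    for _ in range(runs):
        ordered = sorted(rng.sample(range(1, m + 1), n))
        for i, x in enumerate(ordered, start=1):
            estimate = (n + 1) * x / i - 1
            squared_errors[i - 1] += (estimate - m) ** 2
    return [sqrt(total / runs) for total in squared_errors]


# Assumed parameter values; 50 runs as in the exercise.
rmse = empirical_rmse(m=100, n=10, runs=50)
ranking = sorted(range(1, len(rmse) + 1), key=lambda i: rmse[i - 1])
for i in ranking:
    print(f"order {i}: empirical RMSE {rmse[i - 1]:.2f}")
```

With only 50 runs the ranking of neighboring orders can fluctuate from seed to seed, although the estimator based on the maximum should typically come out ahead of the one based on the minimum.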