As in the basic sampling model, we start with a finite population $D$ consisting of $m$ objects. In this section, we suppose in addition that each object is one of $k$ types; that is, we have a multi-type population. For example, we could have an urn with balls of several different colors, or a population of voters who are either democrat, republican, or independent. Let $D_i$ denote the subset of all type $i$ objects and let $m_i = \#(D_i)$ for $i \in \{1, 2, \ldots, k\}$. Thus
$$D = \bigcup_{i=1}^k D_i, \quad m = \sum_{i=1}^k m_i$$
The dichotomous model considered earlier is clearly a special case, with $k = 2$. As in the basic sampling model, we sample $n$ objects at random from $D$:
$$X = (X_1, X_2, \ldots, X_n)$$
where $X_j \in D$ is the $j$th object chosen. Now let $Y_i$ denote the number of type $i$ objects in the sample, for $i \in \{1, 2, \ldots, k\}$. Note that
$$\sum_{i=1}^k Y_i = n$$
so if we know the values of $k - 1$ of the counting variables, we can find the value of the remaining counting variable. As with any counting variable, we can express $Y_i$ as a sum of indicator variables:
Show that
$$Y_i = \sum_{j=1}^n I_{ij}, \quad \text{where } I_{ij} = 1 \text{ if } X_j \in D_i \text{ and } I_{ij} = 0 \text{ otherwise}$$
We assume initially that the sampling is without replacement, since this is the realistic case in most applications.
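Basic combinatorial arguments can be used to derive the probability density function of the random vector of counting variables $(Y_1, Y_2, \ldots, Y_k)$. Recall that since the sampling is without replacement, the unordered sample is uniformly distributed over the combinations of size $n$ chosen from $D$.
Show that for nonnegative integers $j_1, j_2, \ldots, j_k$ with $j_1 + j_2 + \cdots + j_k = n$,
$$P(Y_1 = j_1, Y_2 = j_2, \ldots, Y_k = j_k) = \frac{\binom{m_1}{j_1} \binom{m_2}{j_2} \cdots \binom{m_k}{j_k}}{\binom{m}{n}}$$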
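The product-of-binomials formula for sampling without replacement translates directly into code. The following is a minimal sketch, not part of the original text; the function name `mv_hypergeom_pmf` and the urn with 5 red, 4 green, and 3 blue balls are assumptions made for illustration.

```python
from itertools import product
from math import comb

def mv_hypergeom_pmf(counts, type_sizes):
    """Probability of observing counts[i] objects of type i when
    sum(counts) objects are drawn without replacement from a population
    that contains type_sizes[i] objects of type i."""
    if len(counts) != len(type_sizes):
        raise ValueError("counts and type_sizes must have the same length")
    m = sum(type_sizes)   # population size
    n = sum(counts)       # sample size
    if n > m:
        raise ValueError("sample size cannot exceed population size")
    if any(j < 0 for j in counts):
        return 0.0
    ways = 1
    for m_i, j_i in zip(type_sizes, counts):
        ways *= comb(m_i, j_i)   # comb returns 0 when j_i > m_i
    return ways / comb(m, n)

# Hypothetical urn: 5 red, 4 green, 3 blue balls; draw 4 without replacement.
sizes = (5, 4, 3)
n = 4
p_211 = mv_hypergeom_pmf((2, 1, 1), sizes)   # 10 * 4 * 3 / 495
total = sum(
    mv_hypergeom_pmf(c, sizes)
    for c in product(range(n + 1), repeat=len(sizes))
    if sum(c) == n
)
print(p_211, total)
```

Summing over every admissible count vector should give 1 up to rounding, which is a quick sanity check on the formula.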
The distribution of $(Y_1, Y_2, \ldots, Y_k)$ is called the multivariate hypergeometric distribution with parameters $m$, $(m_1, m_2, \ldots, m_k)$, and $n$. We also say that $(Y_1, Y_2, \ldots, Y_{k-1})$ has this distribution (recall again that the values of any $k - 1$ of the variables determine the value of the remaining variable). Usually it is clear from context which meaning is intended. The ordinary hypergeometric distribution corresponds to $k = 2$.
Show the following alternate form of the multivariate hypergeometric probability density function in two ways: combinatorially, by considering the ordered sample uniformly distributed over the permutations of size $n$ chosen from $D$, and algebraically, starting with the result in Exercise 2. Here $a^{(j)} = a (a - 1) \cdots (a - j + 1)$ denotes the falling power.
$$P(Y_1 = j_1, Y_2 = j_2, \ldots, Y_k = j_k) = \binom{n}{j_1, j_2, \ldots, j_k} \frac{m_1^{(j_1)} m_2^{(j_2)} \cdots m_k^{(j_k)}}{m^{(n)}}$$
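The equivalence of the combination-based and permutation-based forms can be checked numerically before attempting the proof. This is a rough sketch using exact rational arithmetic; the function names and the population sizes (5, 4, 3) are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import product
from math import comb, factorial, perm

def pmf_combinations(counts, type_sizes):
    """Form based on unordered samples: product of binomials over C(m, n)."""
    ways = 1
    for m_i, j_i in zip(type_sizes, counts):
        ways *= comb(m_i, j_i)
    return Fraction(ways, comb(sum(type_sizes), sum(counts)))

def pmf_permutations(counts, type_sizes):
    """Form based on ordered samples: multinomial coefficient times a
    product of falling powers, divided by the falling power m^(n)."""
    n = sum(counts)
    multinomial = factorial(n)
    for j_i in counts:
        multinomial //= factorial(j_i)   # each division is exact
    ways = multinomial
    for m_i, j_i in zip(type_sizes, counts):
        ways *= perm(m_i, j_i)           # falling power m_i^(j_i)
    return Fraction(ways, perm(sum(type_sizes), n))

sizes = (5, 4, 3)   # hypothetical population
n = 4
forms_agree = all(
    pmf_combinations(c, sizes) == pmf_permutations(c, sizes)
    for c in product(range(n + 1), repeat=len(sizes))
    if sum(c) == n
)
print(forms_agree)
```

Using `Fraction` avoids floating-point noise, so the comparison tests exact equality of the two expressions.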
Show that $Y_i$ has the hypergeometric distribution with parameters $m$, $m_i$, and $n$. Give both a probabilistic argument, based on the sampling model, and an analytic derivation, based on the joint probability density function in Exercise 2.
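The multivariate hypergeometric distribution is preserved when the counting variables are combined. Specifically, suppose that $(A_1, A_2, \ldots, A_l)$ is a partition of the index set $\{1, 2, \ldots, k\}$ into nonempty, disjoint subsets. Let
$$W_j = \sum_{i \in A_j} Y_i, \quad r_j = \sum_{i \in A_j} m_i, \quad j \in \{1, 2, \ldots, l\}$$
Show that $(W_1, W_2, \ldots, W_l)$ has the multivariate hypergeometric distribution with parameters $m$, $(r_1, r_2, \ldots, r_l)$, and $n$.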
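The grouping property can be illustrated by brute force: sum the joint probabilities over all count vectors that produce the same grouped counts, then compare with the multivariate hypergeometric formula applied to the grouped type sizes. The sketch below assumes a hypothetical population with four types of sizes 5, 4, 3, 6 and a hypothetical partition of the (0-based) index set into three blocks.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product
from math import comb

def pmf(counts, type_sizes):
    """Multivariate hypergeometric probability as an exact fraction."""
    ways = 1
    for m_i, j_i in zip(type_sizes, counts):
        ways *= comb(m_i, j_i)
    return Fraction(ways, comb(sum(type_sizes), sum(counts)))

def grouped_distribution(type_sizes, partition, n):
    """Distribution of the grouped counts, obtained by summing the joint
    probabilities over every count vector with total n."""
    dist = defaultdict(Fraction)
    for counts in product(range(n + 1), repeat=len(type_sizes)):
        if sum(counts) != n:
            continue
        grouped = tuple(sum(counts[i] for i in block) for block in partition)
        dist[grouped] += pmf(counts, type_sizes)
    return dict(dist)

sizes = (5, 4, 3, 6)               # hypothetical type sizes
partition = [(0, 3), (1,), (2,)]   # hypothetical blocks of type indices
n = 5
grouped_sizes = tuple(sum(sizes[i] for i in block) for block in partition)
dist = grouped_distribution(sizes, partition, n)
agrees = all(p == pmf(w, grouped_sizes) for w, p in dist.items())
print(grouped_sizes, agrees)
```

A check like this does not replace the proof, but it makes the claim concrete: merging types simply adds their population sizes.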
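The multivariate hypergeometric distribution is also preserved when some of the counting variables are observed. Specifically, suppose that $(A, B)$ is a partition of the index set $\{1, 2, \ldots, k\}$ into nonempty, disjoint subsets. Suppose that we observe $Y_j = y_j$ for $j \in B$. Let
$$z = \sum_{j \in B} y_j, \quad r = \sum_{i \in A} m_i$$
Show that the conditional distribution of $(Y_i: i \in A)$ given $(Y_j = y_j: j \in B)$ is multivariate hypergeometric with parameters $r$, $(m_i: i \in A)$, and $n - z$.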
Combinations of the basic results in Exercise 5 and Exercise 6 can be used to compute any marginal or conditional distributions of the counting variables.
We will compute the mean, variance, covariance, and correlation of the counting variables. Results from the hypergeometric distribution and the representation in terms of indicator variables in Exercise 1 are the main tools.
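Show that for $i \in \{1, 2, \ldots, k\}$,
$$E(Y_i) = n \frac{m_i}{m}, \quad \operatorname{var}(Y_i) = n \frac{m_i}{m} \left(1 - \frac{m_i}{m}\right) \frac{m - n}{m - 1}$$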
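Suppose that $i$ and $j$ are distinct elements of $\{1, 2, \ldots, k\}$ and that $r$ and $s$ are distinct elements of $\{1, 2, \ldots, n\}$. With $I_{ir}$ denoting the indicator of the event that the $r$th object chosen is of type $i$, show that
$$\operatorname{cov}(I_{ir}, I_{jr}) = -\frac{m_i}{m} \frac{m_j}{m}, \quad \operatorname{cov}(I_{ir}, I_{js}) = \frac{1}{m - 1} \frac{m_i}{m} \frac{m_j}{m}$$
Suppose that $i$ and $j$ are distinct elements of $\{1, 2, \ldots, k\}$ and that $r$ and $s$ are distinct elements of $\{1, 2, \ldots, n\}$. Show that
$$\operatorname{cor}(I_{ir}, I_{jr}) = -\sqrt{\frac{m_i}{m - m_i} \frac{m_j}{m - m_j}}, \quad \operatorname{cor}(I_{ir}, I_{js}) = \frac{1}{m - 1} \sqrt{\frac{m_i}{m - m_i} \frac{m_j}{m - m_j}}$$
In particular, $I_{ir}$ and $I_{jr}$ are negatively correlated for distinct $i$ and $j$, while $I_{ir}$ and $I_{js}$ are positively correlated for distinct $r$ and $s$. Does this result seem reasonable?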
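Use the result of Exercise 7 and Exercise 8 to show that for distinct $i$ and $j$ in $\{1, 2, \ldots, k\}$,
$$\operatorname{cov}(Y_i, Y_j) = -n \frac{m_i}{m} \frac{m_j}{m} \frac{m - n}{m - 1}, \quad \operatorname{cor}(Y_i, Y_j) = -\sqrt{\frac{m_i}{m - m_i} \frac{m_j}{m - m_j}}$$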
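The moment formulas for sampling without replacement can be verified exactly on a small case by enumerating the joint distribution of the type counts. The sketch below is illustrative only; the helper names and the population with type sizes 5, 4, 3 and sample size 4 are assumptions. It checks the standard expressions: mean $n m_i / m$, variance $n (m_i/m)(1 - m_i/m)(m - n)/(m - 1)$, and covariance $-n (m_i/m)(m_j/m)(m - n)/(m - 1)$.

```python
from fractions import Fraction
from itertools import product
from math import comb

def joint_pmf(type_sizes, n):
    """Exact joint distribution of the type counts: {counts: probability}."""
    m = sum(type_sizes)
    dist = {}
    for counts in product(range(n + 1), repeat=len(type_sizes)):
        if sum(counts) != n:
            continue
        ways = 1
        for m_i, j_i in zip(type_sizes, counts):
            ways *= comb(m_i, j_i)
        dist[counts] = Fraction(ways, comb(m, n))
    return dist

def mean(dist, i):
    return sum(p * c[i] for c, p in dist.items())

def cov(dist, i, j):
    second_moment = sum(p * c[i] * c[j] for c, p in dist.items())
    return second_moment - mean(dist, i) * mean(dist, j)

sizes, n = (5, 4, 3), 4          # hypothetical population and sample size
m, k = sum(sizes), len(sizes)
dist = joint_pmf(sizes, n)
fpc = Fraction(m - n, m - 1)     # finite population correction factor
frac = [Fraction(m_i, m) for m_i in sizes]

mean_ok = all(mean(dist, i) == n * frac[i] for i in range(k))
var_ok = all(cov(dist, i, i) == n * frac[i] * (1 - frac[i]) * fpc
             for i in range(k))
cov_ok = all(cov(dist, i, j) == -n * frac[i] * frac[j] * fpc
             for i in range(k) for j in range(k) if i != j)
print(mean_ok, var_ok, cov_ok)
```

Every covariance between distinct counts is negative, which matches the intuition that a large count of one type leaves less room in the sample for the others.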
Suppose now that the sampling is with replacement, even though this is usually not realistic in applications.
Show that the types of the objects in the sample form a sequence of $n$ multinomial trials with parameters $\left(\frac{m_1}{m}, \frac{m_2}{m}, \ldots, \frac{m_k}{m}\right)$.
The following results now follow immediately from the general theory of multinomial trials, although modifications of the arguments above could also be used.
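Show that $(Y_1, Y_2, \ldots, Y_k)$ has the multinomial distribution with parameters $n$ and $\left(\frac{m_1}{m}, \frac{m_2}{m}, \ldots, \frac{m_k}{m}\right)$:
$$P(Y_1 = j_1, Y_2 = j_2, \ldots, Y_k = j_k) = \binom{n}{j_1, j_2, \ldots, j_k} \frac{m_1^{j_1} m_2^{j_2} \cdots m_k^{j_k}}{m^n}$$
for nonnegative integers $j_1, j_2, \ldots, j_k$ with $j_1 + j_2 + \cdots + j_k = n$.
Show that for distinct $i$ and $j$ in $\{1, 2, \ldots, k\}$,
$$E(Y_i) = n \frac{m_i}{m}, \quad \operatorname{var}(Y_i) = n \frac{m_i}{m} \frac{m - m_i}{m}, \quad \operatorname{cov}(Y_i, Y_j) = -n \frac{m_i}{m} \frac{m_j}{m}, \quad \operatorname{cor}(Y_i, Y_j) = -\sqrt{\frac{m_i}{m - m_i} \frac{m_j}{m - m_j}}$$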
Suppose that the population size $m$ is very large compared to the sample size $n$. In this case, it seems reasonable that sampling without replacement is not much different from sampling with replacement, and hence the multivariate hypergeometric distribution should be well approximated by the multinomial. The following exercise makes this observation precise. Practically, it is a valuable result, since in many cases we do not know the population size exactly.
Suppose that $m_i$ depends on $m$ and that $m_i / m \to p_i$ as $m \to \infty$ for $i \in \{1, 2, \ldots, k\}$. Show that for fixed $n$, the multivariate hypergeometric probability density function with parameters $m$, $(m_1, m_2, \ldots, m_k)$, and $n$ converges to the multinomial probability density function with parameters $n$ and $(p_1, p_2, \ldots, p_k)$. Hint: Use the representation in Exercise 3.
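The convergence can be observed numerically by holding the type proportions fixed and scaling up the population. The following sketch is an illustration under assumed values: base type sizes (8, 7, 5), so the proportions are 0.40, 0.35, 0.25, sample size 5, and scale factors 1, 10, 100, 1000. It reports the largest absolute difference between the two probability density functions at each scale.

```python
from itertools import product
from math import comb, factorial, prod

def hypergeom_pmf(counts, type_sizes):
    """Multivariate hypergeometric probability (without replacement)."""
    ways = prod(comb(m_i, j_i) for m_i, j_i in zip(type_sizes, counts))
    return ways / comb(sum(type_sizes), sum(counts))

def multinomial_pmf(counts, probs):
    """Multinomial probability (with replacement)."""
    coef = factorial(sum(counts))
    for j_i in counts:
        coef //= factorial(j_i)   # each division is exact
    return coef * prod(p ** j for p, j in zip(probs, counts))

def max_abs_difference(type_sizes, n):
    """Largest gap between the two pmfs over all count vectors with total n."""
    m = sum(type_sizes)
    probs = [m_i / m for m_i in type_sizes]
    return max(
        abs(hypergeom_pmf(c, type_sizes) - multinomial_pmf(c, probs))
        for c in product(range(n + 1), repeat=len(type_sizes))
        if sum(c) == n
    )

base = (8, 7, 5)   # hypothetical type sizes; proportions 0.40, 0.35, 0.25
n = 5
errors = [max_abs_difference(tuple(s * b for b in base), n)
          for s in (1, 10, 100, 1000)]
for scale, err in zip((1, 10, 100, 1000), errors):
    print(f"m = {20 * scale:6d}  max difference = {err:.2e}")
```

The gap should shrink as the population grows while the sample size stays fixed, in line with the limit stated in the exercise.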
A population of 100 voters consists of 40 republicans, 35 democrats and 25 independents. A random sample of 10 voters is chosen.
Recall that the general card experiment is to select $n$ cards at random and without replacement from a standard deck of 52 cards. The special case $n = 5$ is the poker experiment and the special case $n = 13$ is the bridge experiment.
In a bridge hand, find the probability density function of
In a bridge hand,
In a bridge hand,
In the card experiment, a hand that does not contain any cards of a particular suit is said to be void in that suit.
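Use the inclusion-exclusion rule to show that the probability that a poker hand is void in at least one suit is
$$\frac{4 \binom{39}{5} - 6 \binom{26}{5} + 4 \binom{13}{5}}{\binom{52}{5}} = \frac{1913496}{2598960} \approx 0.736$$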
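The inclusion-exclusion count is easy to evaluate exactly. In the sketch below, the function name `void_probability` is an assumption; the term for $s$ specified void suits counts the hands drawn entirely from the remaining $52 - 13 s$ cards, with $\binom{4}{s}$ ways to choose the suits.

```python
from fractions import Fraction
from math import comb

def void_probability(n):
    """Probability that an n-card hand from a standard 52-card deck is
    void in at least one suit, by inclusion-exclusion over the number s
    of suits required to be absent."""
    if not 1 <= n <= 52:
        raise ValueError("hand size must be between 1 and 52")
    hands = sum(
        (-1) ** (s + 1) * comb(4, s) * comb(52 - 13 * s, n)
        for s in range(1, 4)   # s = 4 would leave no cards to draw from
    )
    return Fraction(hands, comb(52, n))

poker = void_probability(5)
print(poker, float(poker))
```

The same function applies to other hand sizes, for example `void_probability(13)` for a bridge hand.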
In the card experiment, set $n = 5$. Run the simulation 1000 times, updating after each run. Compute the relative frequency of the event that the hand is void in at least one suit. Compare the relative frequency with the true probability given in the previous exercise.
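For readers without access to the card experiment applet, a stand-in simulation can be written in a few lines. This is only a sketch of the same idea, not the applet itself; the deck encoding, the function names, and the seed are assumptions chosen for illustration.

```python
import random

# Encode each card as (rank, suit) with ranks 0..12 and suits 0..3.
DECK = [(rank, suit) for suit in range(4) for rank in range(13)]

def is_void_in_some_suit(hand):
    """True if at least one of the four suits is missing from the hand."""
    suits_present = {suit for _, suit in hand}
    return len(suits_present) < 4

def simulate_void_frequency(n=5, runs=1000, seed=None):
    """Relative frequency of hands that are void in at least one suit."""
    rng = random.Random(seed)
    hits = sum(
        is_void_in_some_suit(rng.sample(DECK, n))   # draw without replacement
        for _ in range(runs)
    )
    return hits / runs

freq = simulate_void_frequency(n=5, runs=1000, seed=12345)
print(freq)
```

With 1000 runs the standard error of the relative frequency is roughly 0.014, so values within a few hundredths of the exact probability are to be expected.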
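Use the inclusion-exclusion rule to show that the probability that a bridge hand is void in at least one suit is
$$\frac{4 \binom{39}{13} - 6 \binom{26}{13} + 4 \binom{13}{13}}{\binom{52}{13}} = \frac{32427298180}{635013559600} \approx 0.051$$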