In this section we will study two probability models that are interesting and important individually. The deep connections between the two processes make them even more important. Pólya's urn scheme is a dichotomous sampling model that generalizes the hypergeometric model (sampling without replacement) and the Bernoulli model (sampling with replacement). The beta-Bernoulli process is obtained by randomizing the success parameter in the Bernoulli trials process with a beta distribution. For certain values of the parameters, the two processes are equivalent, an interesting and surprising result.
Suppose that we have an urn (what else!) that initially contains $a$ red and $b$ green balls, where $a$ and $b$ are positive integers. At each discrete time (trial), we select a ball from the urn and then return the ball to the urn along with $c$ new balls of the same color. Ordinarily, the parameter $c$ is a nonnegative integer. However, the model actually makes sense if $c$ is a negative integer, if we interpret this to mean that we remove $|c|$ balls rather than add them, and assuming that there are enough balls of the proper color in the urn. In any case, the random process is known as Pólya's urn process, named for George Pólya.
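The dynamics just described can be sketched in a few lines of Python. This is only an illustrative simulation, not part of the original text: the function name `simulate_polya_urn` is hypothetical, and the arguments `a`, `b`, `c` are assumed to stand for the initial number of red balls, the initial number of green balls, and the number of balls added after each draw.

```python
import random

def simulate_polya_urn(a, b, c, n, rng=None):
    """Run n trials of the urn; return the outcomes (1 = red, 0 = green)."""
    rng = rng or random.Random()
    red, green = a, b
    outcomes = []
    for _ in range(n):
        if red + green <= 0:
            raise ValueError("the urn is empty")
        # Draw a ball uniformly at random from the current contents.
        x = 1 if rng.random() < red / (red + green) else 0
        outcomes.append(x)
        # Return the ball together with c balls of the same color
        # (a negative c removes balls instead).
        if x == 1:
            red += c
        else:
            green += c
        if red < 0 or green < 0:
            raise ValueError("not enough balls of the selected color to remove")
    return outcomes

# With c = -1 and n = a + b, the urn is emptied, so every ball is drawn once.
sample = simulate_polya_urn(3, 2, -1, 5, random.Random(0))
print(sample, sum(sample))
```

With `c = 0` the contents never change, so the draws are ordinary Bernoulli trials; with `c = -1` the run is a sample without replacement.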
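In terms of the colors of the selected balls, Pólya's urn scheme generalizes the standard models of sampling with and without replacement. Note that $c = 0$ corresponds to sampling with replacement, and $c = -1$ corresponds to sampling without replacement.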
For the most part, we will assume that $c$ is nonnegative so that the process can be continued indefinitely. Occasionally we consider the case $c = -1$ so that we can interpret the results in terms of sampling without replacement.
Let $X_i$ denote the color of the ball selected at time $i$, where 0 denotes green and 1 denotes red. Mathematically, our basic random process is the sequence of indicator variables: \[ \boldsymbol{X} = (X_1, X_2, \ldots) \]
As with any random process, our first goal is to compute the finite dimensional distributions of $\boldsymbol{X}$. That is, we want to compute the joint distribution of $(X_1, X_2, \ldots, X_n)$ for each $n \in \mathbb{N}_+$.
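Some additional notation will really help. Recall the generalized permutation formula in our study of combinatorial structures: for $r \in \mathbb{R}$, $s \in \mathbb{R}$, and $j \in \mathbb{N}$, we defined \[ r^{(s,j)} = r (r + s) (r + 2 s) \cdots [r + (j - 1) s] \]
As usual, we adopt the convention that a product over an empty index set is 1. Hence $r^{(s,0)} = 1$ for every $r$ and $s$.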
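The generalized permutation formula is easy to compute directly. The sketch below assumes the usual definition, namely the product of the $j$ factors $r, r + s, \ldots, r + (j - 1) s$; the function name `gen_perm` is hypothetical.

```python
def gen_perm(r, s, j):
    """Product r (r + s) (r + 2s) ... (r + (j - 1) s); equals 1 when j = 0."""
    result = 1
    for i in range(j):
        result *= r + i * s
    return result

print(gen_perm(5, 0, 3))   # ordinary power 5 * 5 * 5
print(gen_perm(5, -1, 3))  # falling power 5 * 4 * 3
print(gen_perm(5, 1, 3))   # rising power 5 * 6 * 7
```

The empty product convention falls out of the loop: when `j` is 0 the body never runs and the function returns 1.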
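Recall that $r^{(0,j)} = r^j$ (an ordinary power), $r^{(-1,j)} = r (r - 1) \cdots (r - j + 1)$ (a falling power), $r^{(1,j)} = r (r + 1) \cdots (r + j - 1)$ (a rising power), and $1^{(1,j)} = j!$.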
The finite dimensional distributions are easy to compute using the multiplication rule of conditional probability. If we know the contents of the urn at any given time, then the probability of an outcome at the next time is all but trivial.
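Let $(x_1, x_2, \ldots, x_n) \in \{0, 1\}^n$ and let $k = x_1 + x_2 + \cdots + x_n$. Show that \[ \mathbb{P}(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = \frac{a^{(c,k)} \, b^{(c,n-k)}}{(a + b)^{(c,n)}} \]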
The joint probability in the previous exercise depends only on the number of red balls $k$ in the sample. Thus, the joint distribution is invariant under a permutation of the coordinates, and hence $\boldsymbol{X}$ is an exchangeable sequence. Of course the joint distribution reduces to the formulas we have obtained earlier in the special cases of sampling with replacement ($c = 0$) or sampling without replacement ($c = -1$), although in the latter case we must have $n \le a + b$.
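The multiplication rule argument can be checked numerically. The following sketch is an illustration under assumed names (`joint_probability` is hypothetical; `a`, `b`, `c` denote the initial red count, initial green count, and number of balls added per draw). It multiplies the conditional probabilities one trial at a time using exact rational arithmetic, and then confirms that reordering the outcomes does not change the answer.

```python
from fractions import Fraction
from itertools import permutations, product

def joint_probability(xs, a, b, c):
    """P(X_1 = xs[0], ..., X_n = xs[n-1]) by the multiplication rule.

    Assumes the urn is never exhausted along the way (relevant only if c < 0).
    """
    red, green = a, b
    prob = Fraction(1)
    for x in xs:
        total = red + green
        if x == 1:
            prob *= Fraction(red, total)
            red += c
        else:
            prob *= Fraction(green, total)
            green += c
    return prob

# Every ordering of two reds and one green has the same probability.
probs = {joint_probability(xs, 3, 2, 2) for xs in set(permutations((1, 1, 0)))}
print(probs)

# The probabilities of all 0/1 sequences of length 3 sum to 1.
total = sum(joint_probability(xs, 3, 2, 2) for xs in product((0, 1), repeat=3))
print(total)
```

Using `Fraction` avoids floating-point noise, so the equality of the permuted probabilities is exact rather than approximate.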
Show that $\mathbb{P}(X_i = 1) = \frac{a}{a + b}$ for every $i \in \mathbb{N}_+$.
Thus $\boldsymbol{X}$ is a sequence of identically distributed variables, quite surprising at first but of course inevitable for any exchangeable sequence. Compare the joint and marginal distributions. Note that $\boldsymbol{X}$ is an independent sequence if and only if $c = 0$, when we have simple sampling with replacement. Pólya's urn is one of the most famous examples of a random process in which the outcome variables are exchangeable, but dependent (in general).
Next, let's compute the covariance and correlation of a pair of outcome variables.
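Suppose that $i$ and $j$ are distinct indices. Show that \[ \operatorname{cov}(X_i, X_j) = \frac{a b c}{(a + b)^2 (a + b + c)}, \quad \operatorname{cor}(X_i, X_j) = \frac{c}{a + b + c} \]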
Thus, the variables are positively correlated if $c \gt 0$, negatively correlated if $c \lt 0$, and uncorrelated (in fact, independent) if $c = 0$. These results certainly make sense when we recall the dynamics of Pólya's urn.
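As a quick check of the sign pattern, the covariance and correlation of a pair of outcomes can be computed from first principles. The sketch below is illustrative only: `pair_moments` is a hypothetical name, and `a`, `b`, `c` are assumed to be the initial red count, initial green count, and number of balls added per draw, with $a + b + c \gt 0$. By exchangeability, any pair of distinct outcomes has the same joint distribution as the first two draws.

```python
from fractions import Fraction

def pair_moments(a, b, c):
    """Return (covariance, correlation) of two distinct outcome variables."""
    p1 = Fraction(a, a + b)                             # P(X_i = 1)
    p11 = Fraction(a * (a + c), (a + b) * (a + b + c))  # P(X_i = 1, X_j = 1)
    cov = p11 - p1 * p1
    var = p1 * (1 - p1)                                 # variance of an indicator
    return cov, cov / var

for c in (2, 0, -1):
    print(c, pair_moments(3, 2, c))
```

For these inputs the correlation works out to $c / (a + b + c)$, which is positive, zero, or negative along with $c$.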
Pólya's urn is described by a sequence of indicator variables. We can study the same derived random processes that we studied with Bernoulli trials: the number of red balls in the first $n$ trials, the trial number of the $k$th red ball, and so forth.
The number of red balls selected in the first $n$ trials is \[ Y_n = \sum_{i=1}^n X_i \]
Note that
Of course, $\boldsymbol{Y} = (Y_1, Y_2, \ldots)$ is the partial sum process associated with $\boldsymbol{X}$. The basic analysis of $\boldsymbol{Y}$ follows easily from our work with $\boldsymbol{X}$.
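Show that \[ \mathbb{P}(Y_n = k) = \binom{n}{k} \frac{a^{(c,k)} \, b^{(c,n-k)}}{(a + b)^{(c,n)}}, \quad k \in \{0, 1, \ldots, n\} \]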
The distribution defined by this probability density function is known, appropriately enough, as the Pólya distribution. Of course, the distribution reduces to the binomial distribution in the case of sampling with replacement ($c = 0$) and to the hypergeometric distribution in the case of sampling without replacement ($c = -1$), although again in this case we need $n \le a + b$. The case where all three parameters are equal is particularly interesting.
Suppose that $a = b = c$. Show that $Y_n$ is uniformly distributed on $\{0, 1, \ldots, n\}$.
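The Pólya probability density function can be tabulated exactly for small cases. The sketch below assumes the form $\binom{n}{k} a^{(c,k)} b^{(c,n-k)} / (a + b)^{(c,n)}$, with $a$, $b$, $c$ the initial red count, initial green count, and number of balls added per draw; the helper names `gen_perm` and `polya_pmf` are hypothetical.

```python
from fractions import Fraction
from math import comb

def gen_perm(r, s, j):
    """Generalized permutation: r (r + s) ... (r + (j - 1) s)."""
    result = 1
    for i in range(j):
        result *= r + i * s
    return result

def polya_pmf(k, n, a, b, c):
    """P(Y_n = k): probability of k red balls in the first n draws."""
    numerator = comb(n, k) * gen_perm(a, c, k) * gen_perm(b, c, n - k)
    return Fraction(numerator, gen_perm(a + b, c, n))

# When a = b = c, each value 0, 1, ..., n is equally likely.
uniform_case = [polya_pmf(k, 4, 2, 2, 2) for k in range(5)]
print(uniform_case)
```

Setting `c = 0` or `c = -1` in the same function gives the binomial and hypergeometric probabilities, respectively.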
In general, the Pólya family of distributions has a diverse collection of shapes.
Start the simulation of the Pólya Urn Experiment. Vary the parameters and note the shape of the probability density function. In particular, note when the function is skewed, when the function is symmetric, when the function is unimodal, when the function is monotone, and when the function is U-shaped. For various values of the parameters, run the simulation 1000 times and note the apparent convergence of the empirical density function to the probability density function.
Solve the inequality for . In particular, show that
Next, let's find the mean and variance. As usual, our main tools are the facts that the expected value of a sum is the sum of the expected values and that the variance of a sum is the sum of all pairwise covariances. Curiously, the mean does not depend on the parameter $c$.
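Show that \[ \mathbb{E}(Y_n) = n \frac{a}{a + b}, \quad \operatorname{var}(Y_n) = n \frac{a b}{(a + b)^2} \left[1 + (n - 1) \frac{c}{a + b + c}\right] \]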
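The moment results can be sanity-checked by comparing a brute-force computation from the density with closed forms. The closed forms used below, $n a / (a + b)$ for the mean and $n \frac{a b}{(a + b)^2} \left[1 + (n - 1) \frac{c}{a + b + c}\right]$ for the variance, follow from the pairwise covariances and are stated here as the assumed targets of the check; all function names are hypothetical.

```python
from fractions import Fraction
from math import comb

def gen_perm(r, s, j):
    """Generalized permutation: r (r + s) ... (r + (j - 1) s)."""
    result = 1
    for i in range(j):
        result *= r + i * s
    return result

def polya_pmf(k, n, a, b, c):
    """P(Y_n = k) for the number of red balls in n draws."""
    numerator = comb(n, k) * gen_perm(a, c, k) * gen_perm(b, c, n - k)
    return Fraction(numerator, gen_perm(a + b, c, n))

def moments_from_pmf(n, a, b, c):
    """Mean and variance computed by summing over the density."""
    mean = sum(k * polya_pmf(k, n, a, b, c) for k in range(n + 1))
    second = sum(k * k * polya_pmf(k, n, a, b, c) for k in range(n + 1))
    return mean, second - mean ** 2

def moments_closed_form(n, a, b, c):
    """Mean and variance from the closed-form expressions."""
    mean = Fraction(n * a, a + b)
    var = Fraction(n * a * b, (a + b) ** 2) * (1 + Fraction((n - 1) * c, a + b + c))
    return mean, var

print(moments_from_pmf(4, 3, 2, 2))
print(moments_closed_form(4, 3, 2, 2))
```

Changing `c` while holding the other arguments fixed leaves the mean alone and changes only the variance.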
Start the simulation of the Pólya Urn Experiment. Vary the parameters and note the shape and location of the mean/standard deviation bar. For various values of the parameters, run the simulation 1000 times and note the apparent convergence of the empirical mean and standard deviation to their distributional counterparts.
Explicitly compute the probability density function, mean, and variance of when , , and for the following values of . Sketch the graph of the density function in each case.
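Fix $a$, $b$, and $n$, and let $c \to \infty$. Show that \[ \mathbb{P}(Y_n = 0) \to \frac{b}{a + b}, \quad \mathbb{P}(Y_n = n) \to \frac{a}{a + b} \]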
Thus, the limiting distribution of $Y_n$ is concentrated on 0 and $n$. The limiting probabilities are just the initial proportion of green and red balls, respectively. Interpret this result in terms of the dynamics of Pólya's urn scheme.
Suppose that $c$ is nonnegative, so that the process continues indefinitely. The proportion of red balls selected in the first $n$ trials is \[ M_n = \frac{Y_n}{n} = \frac{1}{n} \sum_{i=1}^n X_i \]
This is an interesting variable, since a little reflection suggests that it may have a limit as $n$ increases. Indeed, if $c = 0$, then $M_n$ is just the sample mean corresponding to $n$ Bernoulli trials. Thus, by the law of large numbers, $M_n$ converges to the success parameter $a / (a + b)$ as $n \to \infty$ with probability 1.
On the other hand, the proportion of red balls in the urn after $n$ trials is \[ Z_n = \frac{a + c Y_n}{a + b + c n} \]
When $c = 0$, of course, $Z_n = a / (a + b)$, so that $M_n$ and $Z_n$ have the same limiting behavior.
Suppose that $c \gt 0$. Show that $Z_n$ has a limit if and only if $M_n$ has a limit, and in this case, the limits are the same.
Suppose that $a = b = c$. Show that the distribution of $M_n$ converges to the uniform distribution on the interval $(0, 1)$ as $n \to \infty$.
More generally, it turns out that when $c \gt 0$, $M_n$ and $Z_n$ converge with probability 1 to a random variable that has the beta distribution with left parameter $a / c$ and right parameter $b / c$. We need the theory of martingales to derive and understand this result.
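A simulation cannot prove convergence, but it can illustrate that the proportion of red balls drawn and the proportion of red balls in the urn track each other closely along a single run. The sketch below is only an illustration: `run_proportions` is a hypothetical name, `a`, `b`, `c` are assumed to be the initial red count, initial green count, and number of balls added per draw, and $c \ge 0$ is assumed.

```python
import random

def run_proportions(a, b, c, n, seed=None):
    """Simulate n draws; return (proportion of reds drawn, proportion of reds in urn)."""
    rng = random.Random(seed)
    red, total, reds_drawn = a, a + b, 0
    for _ in range(n):
        if rng.random() < red / total:
            red += c          # a red ball was drawn, so c red balls are added
            reds_drawn += 1
        total += c            # c balls are added regardless of the color drawn
    return reds_drawn / n, red / total

# Different seeds give different limits, but the two proportions stay close.
for seed in range(3):
    print(run_proportions(2, 3, 1, 10000, seed=seed))
```

Running with several seeds shows the random nature of the limit when $c \gt 0$: each run settles near a different value, unlike the case $c = 0$ where every run settles near the initial proportion of red balls.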
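Suppose again that $c$ is nonnegative, so that the process continues indefinitely. For $k \in \mathbb{N}_+$ let \[ V_k = \min\{n \in \mathbb{N}_+ : Y_n = k\} \] denote the trial number of the $k$th red ball selected.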
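The random processes $\boldsymbol{V} = (V_1, V_2, \ldots)$ and $\boldsymbol{Y} = (Y_1, Y_2, \ldots)$ are inverses of each other in a sense. Show that $V_k = n$ if and only if $Y_{n-1} = k - 1$ and $X_n = 1$, for $k \in \mathbb{N}_+$ and $n \in \{k, k + 1, \ldots\}$.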
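Suppose that $k \in \mathbb{N}_+$ and $n \in \{k, k + 1, \ldots\}$. Show that \[ \mathbb{P}(X_n = 1 \mid Y_{n-1} = k - 1) = \frac{a + (k - 1) c}{a + b + (n - 1) c} \]
In particular, if $a = b = c$ then \[ \mathbb{P}(X_n = 1 \mid Y_{n-1} = k - 1) = \frac{k}{n + 1} \]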
These last probabilities satisfy Laplace's rule of succession, another interesting connection. The rule is named for Pierre Simon Laplace, and is studied from a different point of view in the section on Independence.
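Use Exercise 7, Exercise 17, Exercise 18, and the multiplication rule for conditional probability to show that \[ \mathbb{P}(V_k = n) = \binom{n - 1}{k - 1} \frac{a^{(c,k)} \, b^{(c,n-k)}}{(a + b)^{(c,n)}}, \quad n \in \{k, k + 1, \ldots\} \]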
Of course this probability density function reduces to the negative binomial density function with trial parameter $k$ and success parameter $a / (a + b)$ when $c = 0$ (sampling with replacement).
Suppose that $a = b = c$. Show that \[ \mathbb{P}(V_k = n) = \frac{k}{n (n + 1)}, \quad n \in \{k, k + 1, \ldots\} \]
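The density of the trial number of the $k$th red ball can also be tabulated exactly. The sketch below assumes the form $\binom{n-1}{k-1} a^{(c,k)} b^{(c,n-k)} / (a + b)^{(c,n)}$, obtained by multiplying the probability of $k - 1$ reds in the first $n - 1$ draws by the conditional probability of a red on draw $n$; the names `gen_perm` and `vk_pmf` are hypothetical, and $c \ge 0$ is assumed.

```python
from fractions import Fraction
from math import comb

def gen_perm(r, s, j):
    """Generalized permutation: r (r + s) ... (r + (j - 1) s)."""
    result = 1
    for i in range(j):
        result *= r + i * s
    return result

def vk_pmf(n, k, a, b, c):
    """P(V_k = n): the k-th red ball appears on trial n (requires n >= k >= 1)."""
    numerator = comb(n - 1, k - 1) * gen_perm(a, c, k) * gen_perm(b, c, n - k)
    return Fraction(numerator, gen_perm(a + b, c, n))

# With a = b = c the probabilities take the simple form k / (n (n + 1)).
k = 2
for n in range(k, k + 4):
    print(n, vk_pmf(n, k, 3, 3, 3), Fraction(k, n * (n + 1)))
```

With `c = 0` the same function returns negative binomial probabilities with success probability $a / (a + b)$.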
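Fix $a$, $b$, and $k$, and let $c \to \infty$. Show that \[ \mathbb{P}(V_k = k) \to \frac{a}{a + b}, \quad \mathbb{P}(V_k = n) \to 0 \text{ for each } n \in \{k + 1, k + 2, \ldots\} \]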
Thus, the limiting distribution of $V_k$ is concentrated on $k$ and $\infty$. The limiting probabilities at these two points are just the initial proportion of red and green balls, respectively. Interpret this result in terms of the dynamics of Pólya's urn scheme.
An interesting thing to do in almost any parametric probability model is to randomize one or more of the parameters. Done in a clever way, this often leads to interesting new models and unexpected connections between models. In this subsection we will randomize the success parameter in the Bernoulli trials model.
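Suppose that $P$ has the beta distribution on the interval $(0, 1)$ with left parameter $a \gt 0$ and right parameter $b \gt 0$. Thus, $P$ has probability density function $g$ given by \[ g(p) = \frac{1}{B(a, b)} p^{a-1} (1 - p)^{b-1}, \quad 0 \lt p \lt 1 \] where $B$ is the beta function.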
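Next suppose that $\boldsymbol{X} = (X_1, X_2, \ldots)$ is a sequence of indicator random variables with the property that $\boldsymbol{X}$ is a conditionally independent sequence given $P$, with \[ \mathbb{P}(X_i = 1 \mid P) = P, \quad i \in \mathbb{N}_+ \]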
In short, given $P = p$, $\boldsymbol{X}$ is a Bernoulli trials sequence with success parameter $p$. We will refer to $\boldsymbol{X}$ as the beta-Bernoulli process with parameters $a$ and $b$.
For a statistical application, suppose that we have a Bernoulli trials process (coin tosses for example) with an unknown probability of success. We model the probability of success with the beta distribution; the parameters $a$ and $b$ are selected to incorporate our knowledge (if any) of this probability.
What's our first step? Well, of course we need to compute the finite dimensional distributions of $\boldsymbol{X}$.
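Let $(x_1, x_2, \ldots, x_n) \in \{0, 1\}^n$ and let $k = x_1 + x_2 + \cdots + x_n$. Condition on $P$ to show that \[ \mathbb{P}(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = \frac{a^{(1,k)} \, b^{(1,n-k)}}{(a + b)^{(1,n)}} \]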
Thus, if $a$ and $b$ are integers, then the beta-Bernoulli process is equivalent to Pólya's urn process with parameters $a$, $b$, and $c = 1$, quite a beautiful result. In general, the processes are not equivalent. The beta-Bernoulli process is less restrictive in the sense that the parameters $a$ and $b$ need not be integers; it is more restrictive in the sense that $c$ must be 1.
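The equivalence can be checked numerically. Conditioning on the success parameter gives the joint probability as the beta-function ratio $B(a + k, b + n - k) / B(a, b)$, where $k$ is the number of successes in $n$ trials; the sketch below compares that ratio with the rising-power form of the urn probability when one ball is added per draw. The function names are hypothetical, and the beta function is evaluated through `math.lgamma`, so the comparison is in floating point.

```python
from math import exp, isclose, lgamma

def log_beta(x, y):
    """Logarithm of the beta function B(x, y) for positive x and y."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def beta_bernoulli_joint(k, n, a, b):
    """Probability of a specific 0/1 sequence of length n with k ones."""
    return exp(log_beta(a + k, b + n - k) - log_beta(a, b))

def rising(r, j):
    """Rising power r (r + 1) ... (r + j - 1)."""
    result = 1.0
    for i in range(j):
        result *= r + i
    return result

def polya_joint_c1(k, n, a, b):
    """Urn probability of the same sequence when one ball is added per draw."""
    return rising(a, k) * rising(b, n - k) / rising(a + b, n)

# The two expressions agree, including for non-integer parameters.
print(beta_bernoulli_joint(3, 4, 2.5, 1.5), polya_joint_c1(3, 4, 2.5, 1.5))
```

The rising-power expression still makes sense for non-integer `a` and `b`, even though the urn interpretation does not.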
Verify that the basic mathematical results for the Pólya process also hold for the beta-Bernoulli process, except of course, that $a$ and $b$ can be any positive numbers (not just integers) and that $c$ must be 1.
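Use Bayes' theorem to show that the conditional distribution of $P$ given $(X_1, X_2, \ldots, X_n)$ is beta with left parameter $a + Y_n$ and right parameter $b + (n - Y_n)$, where $Y_n = \sum_{i=1}^n X_i$ is the number of successes in the first $n$ trials.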
Thus, the left parameter increases by the number of successes while the right parameter increases by the number of failures. In the language of Bayesian statistics, the original distribution of $P$ is the prior distribution, and the conditional distribution of $P$ given $(X_1, X_2, \ldots, X_n)$ is the posterior distribution. The fact that the posterior distribution is beta whenever the prior distribution is beta means that the family of beta distributions is a conjugate family. These concepts are studied in more generality in the section on Bayes Estimators in the chapter on Point Estimation.
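The conjugate update is simple enough to express in a couple of lines. This is a minimal sketch with hypothetical function names; `a` and `b` are assumed to be the left and right parameters of the beta prior, and `outcomes` is a list of 0/1 trial results.

```python
def posterior_parameters(a, b, outcomes):
    """Beta posterior parameters after observing a list of 0/1 outcomes."""
    successes = sum(outcomes)
    failures = len(outcomes) - successes
    return a + successes, b + failures

def beta_mean(a, b):
    """Mean of the beta distribution with left parameter a and right parameter b."""
    return a / (a + b)

prior = (2, 3)
post = posterior_parameters(*prior, [1, 0, 1, 1])
print(post, beta_mean(*prior), beta_mean(*post))
```

The posterior mean is the probability of success on the next trial given the data, which is the same as the proportion of red balls in a Pólya urn that started with `a` red and `b` green balls and gained one ball per draw.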
Run the simulation of the beta coin experiment for various values of the parameter. Note how the posterior density changes from the prior density, given the number of heads.