Suppose again that we have an observable random variable $X$ for an experiment, taking values in a set $S$. Suppose also that the distribution of $X$ depends on a parameter $\theta$ taking values in a parameter space $\Theta$. We will denote the probability density function of $X$ by $f(x \mid \theta)$ for $x \in S$ and $\theta \in \Theta$. Of course, our data variable $X$ is almost always vector-valued. The parameter $\theta$ may also be vector-valued.
In Bayesian analysis, we treat the parameter $\theta$ as a random variable, with a given probability density function $h(\theta)$, $\theta \in \Theta$. The corresponding distribution is called the prior distribution of $\theta$ and is intended to reflect our knowledge (if any) of the parameter, before we gather data. After observing $X = x \in S$, we then use Bayes' theorem, named for Thomas Bayes, to compute the conditional probability density function of $\theta$ given $X = x$:
$$h(\theta \mid x) = \frac{f(x \mid \theta) \, h(\theta)}{g(x)}, \quad \theta \in \Theta, \; x \in S$$
where $g$ is the (marginal) probability density function of $X$. Recall that for fixed $x \in S$,
$$g(x) = \sum_{\theta \in \Theta} f(x \mid \theta) \, h(\theta)$$
if the parameter has a discrete distribution, or
$$g(x) = \int_\Theta f(x \mid \theta) \, h(\theta) \, d\theta$$
if the parameter has a continuous distribution. Equivalently, $g(x)$ is simply the normalizing constant for $f(x \mid \theta) \, h(\theta)$ as a function of $\theta$. The conditional distribution of $\theta$ given $X = x$ is called the posterior distribution, and is an updated distribution, given the information in the data.
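For a parameter with a discrete prior, the computation in Bayes' theorem amounts to multiplying the likelihood by the prior and normalizing. The following is a minimal sketch of that step, not a definitive implementation. The function names, the three candidate parameter values, and the data tuple are hypothetical choices made for illustration, and the Bernoulli likelihood is simply a convenient example.

```python
def posterior(prior, likelihood, x):
    """Return the posterior pmf of theta given X = x.

    prior: dict mapping each value theta to its prior probability h(theta)
    likelihood: function (x, theta) -> f(x | theta)
    """
    # Unnormalized posterior: f(x | theta) * h(theta) for each theta.
    weights = {theta: likelihood(x, theta) * h for theta, h in prior.items()}
    # g(x), the marginal density of X at x, is the normalizing constant.
    g = sum(weights.values())
    if g == 0:
        raise ValueError("x has probability zero under every theta in the prior")
    return {theta: w / g for theta, w in weights.items()}


def bernoulli_likelihood(x, p):
    """f(x | p) for a tuple x of 0/1 outcomes from independent trials."""
    y = sum(x)
    return p ** y * (1 - p) ** (len(x) - y)


# Hypothetical example: three candidate values of p, equally likely a priori.
prior = {0.25: 1 / 3, 0.5: 1 / 3, 0.75: 1 / 3}
post = posterior(prior, bernoulli_likelihood, (1, 1, 0, 1))
```

With three successes in four trials, the posterior in this example shifts probability toward the largest candidate value of the parameter.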
If $\theta$ is a real parameter, the conditional expected value $E(\theta \mid X)$ is the Bayes' estimator of $\theta$. Recall that $E(\theta \mid X)$ is a function of $X$ and, among all functions of $X$, is closest to $\theta$ in the mean square sense.
In many important special cases, we can find a parametric family of distributions with the following property: If the prior distribution of $\theta$ belongs to the family, then so does the posterior distribution of $\theta$ given $X = x$. The family is said to be conjugate for the distribution of $X$. Conjugate families are nice from a computational point of view, since we can often compute the posterior distribution through a simple formula involving the parameters of the family, without having to use Bayes' theorem directly.
Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the Bernoulli distribution with unknown success parameter $p \in (0, 1)$. In the usual language of reliability, $X_i = 1$ means success on trial $i$ and $X_i = 0$ means failure on trial $i$. For a specific example, suppose that we have a coin with an unknown probability of heads $p$. We toss the coin $n$ times, defining heads as success and tails as failure. In any event, the number of successes in the $n$ trials is
$$Y = \sum_{i=1}^n X_i$$
Suppose now that we give $p$ a prior beta distribution with left parameter $a > 0$ and right parameter $b > 0$, where $a$ and $b$ are chosen to reflect our initial information about $p$. For example, if we know nothing, we might let $a = b = 1$, so that the prior distribution of $p$ is uniform on the parameter space $(0, 1)$. On the other hand, if we believe that $p$ is about $2/3$, we might let $a = 4$ and $b = 2$ (so that the mean of the prior distribution is $2/3$).
Show that the posterior distribution of $p$ given $X$ is beta with left parameter $a + Y$ and right parameter $b + (n - Y)$.
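One way to organize this computation is to multiply the likelihood by the prior density and read off the kernel, ignoring factors that do not involve the parameter. The sketch below assumes the notation $p$ for the success parameter, $a$ and $b$ for the left and right prior parameters, and $y$ for the observed number of successes in $n$ trials.

```latex
\begin{align*}
f(x \mid p) &= p^{y} (1 - p)^{n - y}, \qquad y = \sum_{i=1}^n x_i \\
h(p) &\propto p^{a - 1} (1 - p)^{b - 1}, \qquad 0 < p < 1 \\
h(p \mid x) &\propto f(x \mid p) \, h(p) = p^{a + y - 1} (1 - p)^{b + (n - y) - 1}
\end{align*}
```

The last expression is the kernel of a beta density with left parameter $a + y$ and right parameter $b + (n - y)$, so the normalizing constant never needs to be computed explicitly.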
Thus, the beta distribution is conjugate for the Bernoulli distribution. Note also that the posterior distribution depends on the data vector $X$ only through the number of successes $Y$. This is true because $Y$ is a sufficient statistic for $p$. In particular, note that the left beta parameter is increased by the number of successes and the right beta parameter is increased by the number of failures.
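The conjugate update can be sketched in a few lines of code: the left parameter gains the number of successes and the right parameter gains the number of failures. The function name, the seed, and the values `p_true = 0.3` and `n = 20` below are hypothetical choices for illustration only.

```python
import random


def beta_posterior(a, b, x):
    """Posterior beta parameters after observing the 0/1 outcomes in x."""
    y = sum(x)               # number of successes
    n = len(x)               # number of trials
    return a + y, b + (n - y)


# Simulate coin tosses with a hypothetical true probability of heads.
rng = random.Random(0)
p_true = 0.3
n = 20
x = [1 if rng.random() < p_true else 0 for _ in range(n)]

# Start from the uniform prior (left and right parameters both 1).
a_post, b_post = beta_posterior(1, 1, x)
posterior_mean = a_post / (a_post + b_post)
```

Because the update uses only the count of successes, reordering the outcomes in `x` leaves the posterior unchanged, which is the sufficiency property in action.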
In the beta coin experiment, choose values of $n$ and $p$, and set $a = b = 1$ (the uniform prior). Run the simulation 100 times, updating after each run. Note the shape and location of the posterior probability density function of $p$ on each run.
Show that the Bayes' estimator of $p$ given $X$ is
$$U = \frac{Y + a}{n + a + b}$$
In the beta coin experiment, choose values of $n$ and $p$, and of $a$ and $b$. Run the simulation 100 times, updating after each run. Note the estimate of $p$ and the shape and location of the posterior probability density function of $p$ on each run.
Verify the bias of $U$ given in the following formula. Show that $U$ is asymptotically unbiased.
$$\operatorname{bias}(U \mid p) = \frac{a(1 - p) - b p}{a + b + n}$$
Note also that we cannot choose $a$ and $b$ to make $U$ unbiased, since such a choice would involve the true value of $p$, which we do not know.
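The bias can be checked numerically by averaging the estimator over the binomial distribution of the number of successes. The sketch below assumes the estimator $(Y + a)/(n + a + b)$ for a beta prior with left parameter $a$ and right parameter $b$. The function names and the numerical values in the usage lines are hypothetical.

```python
from math import comb


def bayes_estimate(y, n, a, b):
    """Posterior mean of p after y successes in n trials."""
    return (y + a) / (n + a + b)


def exact_bias(n, p, a, b):
    """E(U | p) - p, summing over the binomial distribution of Y."""
    mean_u = sum(
        comb(n, y) * p ** y * (1 - p) ** (n - y) * bayes_estimate(y, n, a, b)
        for y in range(n + 1)
    )
    return mean_u - p


def bias_formula(n, p, a, b):
    """Closed form: [a(1 - p) - b p] / (a + b + n)."""
    return (a * (1 - p) - b * p) / (a + b + n)


# Hypothetical settings: the bias shrinks toward 0 as n grows.
biases = [bias_formula(n, 0.3, 4, 2) for n in (10, 100, 1000)]
```

The numerator of the closed form vanishes only when $a / (a + b) = p$, that is, when the prior mean already equals the unknown parameter.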
In the beta coin experiment, vary the parameters and note the change in the bias. Now choose values of $n$ and $p$, and of $a$ and $b$. Run the simulation 1000 times, updating every 10 runs. Note the estimate of $p$ and the shape and location of the posterior probability density function of $p$ on each update. Note the apparent convergence of the empirical bias to the true bias.
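Verify the mean square error of $U$ given in the following formula. Show that $U$ is consistent.
$$\operatorname{MSE}(U \mid p) = \frac{p \, [n - 2a(a + b)] + p^2 \, [(a + b)^2 - n] + a^2}{(a + b + n)^2}$$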
In the beta coin experiment, vary the parameters and note the change in the mean square error. Now choose values of $n$ and $p$, and of $a$ and $b$. Run the simulation 1000 times, updating every 10 runs. Note the estimate of $p$ and the shape and location of the posterior probability density function of $p$ on each update. Note the apparent convergence of the empirical mean square error to the true mean square error.
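Interestingly, we can choose $a$ and $b$ so that $U$ has mean square error that is independent of $p$:
Show that if $a = b = \sqrt{n} / 2$ then for any $p$,
$$\operatorname{MSE}(U \mid p) = \frac{n}{4 (n + \sqrt{n})^2}$$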
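A direct numerical check of the mean square error is straightforward, since the number of successes is binomial. The sketch below assumes the estimator $(Y + a)/(n + a + b)$ and writes the mean square error as variance plus squared bias. The function names and the choice `n = 36` are hypothetical.

```python
from math import comb, sqrt


def exact_mse(n, p, a, b):
    """E[(U - p)^2 | p], summing over the binomial distribution of Y."""
    return sum(
        comb(n, y) * p ** y * (1 - p) ** (n - y)
        * ((y + a) / (n + a + b) - p) ** 2
        for y in range(n + 1)
    )


def mse_formula(n, p, a, b):
    """Variance plus squared bias over the common denominator."""
    return (n * p * (1 - p) + (a - (a + b) * p) ** 2) / (a + b + n) ** 2


# With a = b = sqrt(n)/2 the mean square error should not depend on p.
n = 36
a = b = sqrt(n) / 2
values = [exact_mse(n, p, a, b) for p in (0.1, 0.3, 0.5, 0.9)]
constant = n / (4 * (n + sqrt(n)) ** 2)
```

With this choice of prior parameters, the coefficients of $p$ and $p^2$ in the numerator both vanish, leaving only $a^2 = n / 4$.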
In the beta coin experiment, choose $n$ and set $a = b = \sqrt{n} / 2$. Vary $p$ and note that the mean square error does not change. Now fix a value of $p$ and run the simulation 1000 times, updating every 10 runs. Note the estimate of $p$ and the shape and location of the posterior probability density function on each update. Note the apparent convergence of the empirical bias and mean square error to the true values.
Recall that the method of moments estimator and the maximum likelihood estimator of $p$ are both the sample mean (the proportion of heads):
$$M = \frac{Y}{n} = \frac{1}{n} \sum_{i=1}^n X_i$$
This estimator has mean square error $\operatorname{MSE}(M \mid p) = p (1 - p) / n$.
Sketch the graphs of $\operatorname{MSE}(U \mid p)$ in Exercise 7 and $\operatorname{MSE}(M \mid p)$ as functions of $p$, on the same set of axes.
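Before sketching, it can help to locate where the two curves cross. The sketch below assumes the Bayes' estimator with $a = b = \sqrt{n} / 2$, whose mean square error is constant in $p$, and the sample mean, whose mean square error is $p(1 - p)/n$. The function names and the choice `n = 36` are hypothetical.

```python
from math import sqrt


def mse_bayes_constant(n):
    """MSE of the Bayes' estimator when a = b = sqrt(n)/2 (free of p)."""
    return n / (4 * (n + sqrt(n)) ** 2)


def mse_sample_mean(n, p):
    """MSE of the sample mean M = Y/n."""
    return p * (1 - p) / n


def crossover_points(n):
    """The two values of p at which the mean square errors are equal."""
    c = n * mse_bayes_constant(n)        # solve p(1 - p) = c
    half_width = sqrt(1 - 4 * c) / 2
    return 0.5 - half_width, 0.5 + half_width


n = 36
lo, hi = crossover_points(n)
```

Between the two crossover points the Bayes' estimator has the smaller mean square error, while outside them (for $p$ near 0 or 1) the sample mean does better.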
Consider the coin interpretation of Bernoulli trials in the preceding section, but suppose now that the coin is either fair or two-headed. We give $p$ the prior distribution with probability density function $h$ given by $h(1) = a$, $h(1/2) = 1 - a$, where $a \in (0, 1)$ is chosen to reflect our prior knowledge of the probability that the coin is two-headed.
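Show that the posterior distribution of $p$ given $X$ is as follows. Interpret the result.
$$h(1 \mid X) = \begin{cases} \dfrac{2^n a}{2^n a + (1 - a)}, & Y = n \\ 0, & Y < n \end{cases} \qquad h(1/2 \mid X) = \begin{cases} \dfrac{1 - a}{2^n a + (1 - a)}, & Y = n \\ 1, & Y < n \end{cases}$$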
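Show that the Bayes' estimator of $p$ is
$$U = \begin{cases} p_n, & Y = n \\ 1/2, & Y < n \end{cases} \qquad \text{where } p_n = \frac{2^{n+1} a + (1 - a)}{2^{n+1} a + 2 (1 - a)}$$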
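The two-point prior makes the posterior easy to compute exactly. The sketch below uses exact rational arithmetic and assumes that $a$ denotes the prior probability that the coin is two-headed. The function names and the values in the usage lines are hypothetical.

```python
from fractions import Fraction


def two_headed_posterior(a, n, y):
    """Return (P(p = 1 | data), P(p = 1/2 | data)) after y heads in n tosses."""
    a = Fraction(a)
    # A two-headed coin can only produce all heads.
    like_two_headed = 1 if y == n else 0
    # Any particular sequence of n tosses of a fair coin has probability 2^-n.
    like_fair = Fraction(1, 2) ** n
    w_two_headed = a * like_two_headed
    w_fair = (1 - a) * like_fair
    total = w_two_headed + w_fair
    return w_two_headed / total, w_fair / total


def bayes_estimate(a, n, y):
    """Posterior mean of p: 1 * P(p = 1 | data) + (1/2) * P(p = 1/2 | data)."""
    h_one, h_half = two_headed_posterior(a, n, y)
    return h_one + h_half / 2


# Hypothetical usage: prior probability 1/2 of a two-headed coin, 3 tosses.
all_heads = bayes_estimate(Fraction(1, 2), 3, 3)
one_tail = bayes_estimate(Fraction(1, 2), 3, 2)
```

A single tail rules out the two-headed coin, so the estimate drops to exactly $1/2$; a run of all heads pushes the estimate toward 1 at a rate governed by $2^n$.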
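Verify the bias of $U$ given in the following formula. Show that $U$ is asymptotically unbiased.
$$\operatorname{bias}(U \mid p) = \begin{cases} \dfrac{a}{2^{n+1} a + 2 (1 - a)}, & p = 1/2 \\[2ex] -\dfrac{1 - a}{2^{n+1} a + 2 (1 - a)}, & p = 1 \end{cases}$$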
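Verify the mean square error of $U$ given in the following formula. Show that $U$ is consistent.
$$\operatorname{MSE}(U \mid p) = \begin{cases} \dfrac{2^n a^2}{[2^{n+1} a + 2 (1 - a)]^2}, & p = 1/2 \\[2ex] \dfrac{(1 - a)^2}{[2^{n+1} a + 2 (1 - a)]^2}, & p = 1 \end{cases}$$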
Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the geometric distribution with unknown success parameter $p \in (0, 1)$. As usual, we will denote the sum of the sample values by
$$Y = \sum_{i=1}^n X_i$$
Recall that the sample variables can be interpreted as the number of trials between successive successes in a sequence of Bernoulli trials. Thus, $Y$ is the trial number of the $n$th success. Given $p$, $Y$ has the negative binomial distribution with parameters $n$ and $p$. Suppose now that we give $p$ a prior beta distribution with left parameter $a$ and right parameter $b$, as before.
Show that the posterior distribution of $p$ given $X$ is beta with left parameter $a + n$ and right parameter $b + (Y - n)$.
Thus, the beta distribution is conjugate to the geometric distribution.
Show that the Bayes' estimator of $p$ is
$$U = \frac{a + n}{a + b + Y}$$
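For geometric data the update has the same flavor as in the Bernoulli case: each observation contributes one success and a number of failures equal to the observation minus one. The sketch below assumes the geometric distribution on the positive integers (the trial number of the first success). The function names, seed, and parameter values are hypothetical.

```python
import random


def geometric_sample(p, rng):
    """Number of Bernoulli trials up to and including the first success."""
    trials = 1
    while rng.random() >= p:
        trials += 1
    return trials


def beta_posterior_geometric(a, b, x):
    """Posterior beta parameters given geometric observations x."""
    n = len(x)     # total number of successes (one per observation)
    y = sum(x)     # total number of trials
    return a + n, b + (y - n)


def bayes_estimate(a, b, x):
    """Posterior mean of p: (a + n) / (a + b + y)."""
    left, right = beta_posterior_geometric(a, b, x)
    return left / (left + right)


# Hypothetical usage with a uniform prior and a simulated sample.
rng = random.Random(1)
sample = [geometric_sample(0.4, rng) for _ in range(50)]
estimate = bayes_estimate(1, 1, sample)
```

As in the Bernoulli case, the posterior depends on the sample only through its sum.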
Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the Poisson distribution with parameter $\lambda > 0$. Moreover, suppose that $\lambda$ has a prior gamma distribution with shape parameter $k > 0$ and scale parameter $b > 0$. As usual, we will denote the sum of the sample values by
$$Y = \sum_{i=1}^n X_i$$
Show that the posterior distribution of $\lambda$ given $X$ is gamma with shape parameter $k + Y$ and scale parameter $b / (n b + 1)$.
It follows that the gamma distribution is conjugate to the Poisson distribution.
Show that the Bayes' estimator of $\lambda$ is
$$U = \frac{(k + Y) \, b}{n b + 1}$$
Show that the bias of $U$ is given by the following formula, and hence that $U$ is asymptotically unbiased:
$$\operatorname{bias}(U \mid \lambda) = \frac{k b - \lambda}{n b + 1}$$
Note that, as before, we cannot choose $k$ and $b$ to make $U$ unbiased without knowledge of $\lambda$.
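Show that the mean square error of $U$ is given by the following formula, and hence that $U$ is consistent:
$$\operatorname{MSE}(U \mid \lambda) = \frac{n b^2 \lambda + (k b - \lambda)^2}{(n b + 1)^2}$$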
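The Poisson case can be checked numerically, because the sum of the sample is itself Poisson with mean equal to $n$ times the parameter. The sketch below assumes the notation $\lambda$ for the Poisson parameter and $k$, $b$ for the shape and scale of the gamma prior. The function names, the truncation point `cutoff`, and the numerical values are hypothetical.

```python
from math import exp


def gamma_posterior(k, b, x):
    """Posterior (shape, scale) given Poisson observations x."""
    n, y = len(x), sum(x)
    return k + y, b / (n * b + 1)


def bayes_estimate(k, b, n, y):
    """Posterior mean: shape times scale."""
    return (k + y) * b / (n * b + 1)


def exact_bias_and_mse(k, b, n, lam, cutoff=400):
    """Sum over the Poisson(n * lam) distribution of Y, truncated at cutoff."""
    m = n * lam
    pmf = exp(-m)                    # P(Y = 0)
    bias = mse = 0.0
    for y in range(cutoff):
        err = bayes_estimate(k, b, n, y) - lam
        bias += pmf * err
        mse += pmf * err ** 2
        pmf *= m / (y + 1)           # recursion: P(Y = y + 1) from P(Y = y)
    return bias, mse


def bias_formula(k, b, n, lam):
    return (k * b - lam) / (n * b + 1)


def mse_formula(k, b, n, lam):
    return (n * b ** 2 * lam + (k * b - lam) ** 2) / (n * b + 1) ** 2


# Hypothetical settings for a quick comparison.
bias, mse = exact_bias_and_mse(k=2, b=1.5, n=10, lam=2)
```

The truncated sum is adequate here because the Poisson tail beyond the cutoff is negligible for the chosen mean; a larger mean would call for a larger cutoff.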
Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the normal distribution with unknown mean $\mu$ and known variance $\sigma^2$. Moreover, suppose that $\mu$ is given a prior normal distribution with mean $a$ and variance $b^2$, both known of course. Denote the sum of the sample values by
$$Y = \sum_{i=1}^n X_i$$
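Show that the posterior distribution of $\mu$ given $X$ is normal with mean and variance given by
$$E(\mu \mid X) = \frac{Y b^2 + a \sigma^2}{\sigma^2 + n b^2}, \qquad \operatorname{var}(\mu \mid X) = \frac{\sigma^2 b^2}{\sigma^2 + n b^2}$$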
Therefore, the normal distribution is conjugate for the normal distribution with unknown mean and known variance. Moreover, it follows that the Bayes' estimator of $\mu$ is
$$U = \frac{Y b^2 + a \sigma^2}{\sigma^2 + n b^2}$$
Show that the bias of $U$ is given by the following formula, and hence that $U$ is asymptotically unbiased:
$$\operatorname{bias}(U \mid \mu) = \frac{\sigma^2 (a - \mu)}{\sigma^2 + n b^2}$$
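Show that the mean square error of $U$ is given by the following formula, and hence that $U$ is consistent:
$$\operatorname{MSE}(U \mid \mu) = \frac{n \sigma^2 b^4 + \sigma^4 (a - \mu)^2}{(\sigma^2 + n b^2)^2}$$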
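One practical consequence of conjugacy in the normal model is that updating on the whole sample at once gives the same posterior as updating one observation at a time, with each posterior serving as the next prior. The sketch below assumes a normal prior with mean `a` and variance `b2`, and a known sampling variance `sigma2`. The function name and the data values are hypothetical.

```python
def normal_posterior(a, b2, sigma2, x):
    """Posterior (mean, variance) of the unknown mean given observations x.

    a, b2: mean and variance of the normal prior
    sigma2: known variance of each observation
    """
    n, y = len(x), sum(x)
    denom = sigma2 + n * b2
    mean = (y * b2 + a * sigma2) / denom
    var = sigma2 * b2 / denom
    return mean, var


data = [1.2, 0.7, 2.1]

# Batch update: use all observations at once.
batch_mean, batch_var = normal_posterior(0.0, 4.0, 1.0, data)

# Sequential update: feed each posterior back in as the next prior.
seq_mean, seq_var = 0.0, 4.0
for value in data:
    seq_mean, seq_var = normal_posterior(seq_mean, seq_var, 1.0, [value])
```

The posterior variance does not depend on the data at all, only on the sample size, and it decreases to 0 as the sample size grows.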