Suppose again that we have an observable random variable \(\boldsymbol{X}\) for an experiment, that takes values in a set \(S\). Suppose also that the distribution of \(\boldsymbol{X}\) depends on an unknown parameter \(\theta\), taking values in a parameter space \(\Theta\). Specifically, we will denote the probability density function of \(\boldsymbol{X}\) on \(S\) by \(f_\theta\) for \(\theta \in \Theta\). Of course, our data variable \(\boldsymbol{X}\) will almost always be vector-valued. The parameter \(\theta\) may also be vector-valued.
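The likelihood function \(L\) is the function obtained by reversing the roles of \(\boldsymbol{x}\) and \(\theta\) in the probability density function; that is, we view \(\theta\) as the variable and \(\boldsymbol{x}\) as the given information (which is precisely the point of view in estimation):
\[ L_{\boldsymbol{x}}(\theta) = f_\theta(\boldsymbol{x}), \quad \theta \in \Theta, \; \boldsymbol{x} \in S. \]
In the method of maximum likelihood, we try to find a value \(u(\boldsymbol{x})\) of the parameter \(\theta\) that maximizes \(L_{\boldsymbol{x}}(\theta)\) for each \(\boldsymbol{x} \in S\). If we can do this, then the statistic \(u(\boldsymbol{X})\) is called a maximum likelihood estimator of \(\theta\). The method is intuitively appealing: we try to find the values of the parameters that would have most likely produced the data we in fact observed.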
Since the natural logarithm function is strictly increasing, the maximum value of \(L_{\boldsymbol{x}}(\theta)\), if it exists, will occur at the same points as the maximum value of \(\ln L_{\boldsymbol{x}}(\theta)\). This latter function is called the log likelihood function and in many cases is easier to work with than the likelihood function (typically because the probability density function has a product structure).
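An important special case is when \(\theta = (\theta_1, \theta_2, \ldots, \theta_k)\) is a vector of \(k\) real parameters, so that \(\Theta \subseteq \mathbb{R}^k\). In this case, the maximum likelihood problem is to maximize a function of several variables. If \(\Theta\) is a continuous set, the methods of calculus can be used. If the maximum value of \(L_{\boldsymbol{x}}\) occurs at a point \(\theta\) in the interior of \(\Theta\), then \(L_{\boldsymbol{x}}\) has a local maximum at \(\theta\). Therefore, assuming that the likelihood function is differentiable, we can find this point by solving
\[ \frac{\partial}{\partial \theta_i} L_{\boldsymbol{x}}(\theta) = 0, \quad i \in \{1, 2, \ldots, k\} \]
or equivalently
\[ \frac{\partial}{\partial \theta_i} \ln L_{\boldsymbol{x}}(\theta) = 0, \quad i \in \{1, 2, \ldots, k\}. \]
On the other hand, the maximum value may occur at a boundary point of \(\Theta\), or may not exist at all.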
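Consider next the case where our outcome variable \(\boldsymbol{X} = (X_1, X_2, \ldots, X_n)\) is a random sample of size \(n\) from the distribution of a random variable \(X\) taking values in a set \(R\), with probability density function \(g_\theta\). Then \(\boldsymbol{X}\) takes values in \(S = R^n\), and the joint probability density function of \(\boldsymbol{X}\) is the product of the marginal probability density functions. Thus, the likelihood function in this special case becomes
\[ L_{\boldsymbol{x}}(\theta) = \prod_{i=1}^n g_\theta(x_i), \quad \boldsymbol{x} = (x_1, x_2, \ldots, x_n) \in S \]
and hence the log likelihood function becomes
\[ \ln L_{\boldsymbol{x}}(\theta) = \sum_{i=1}^n \ln g_\theta(x_i), \quad \boldsymbol{x} = (x_1, x_2, \ldots, x_n) \in S. \]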
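To illustrate the product structure, the following Python sketch evaluates the log likelihood of a random sample as a sum of log densities and maximizes it by brute force over a grid of parameter values. The helper names `log_likelihood` and `grid_mle`, the Bernoulli density used as the example, and the data are all assumptions made for illustration; they are not part of the text.

```python
import math

def log_likelihood(log_pdf, data, theta):
    """Log likelihood of a random sample: the sum of the log densities."""
    return sum(log_pdf(x, theta) for x in data)

def grid_mle(log_pdf, data, grid):
    """Return the grid point at which the log likelihood is largest."""
    return max(grid, key=lambda theta: log_likelihood(log_pdf, data, theta))

def bernoulli_log_pdf(x, p):
    """Log of p^x (1 - p)^(1 - x) for x in {0, 1} and 0 < p < 1."""
    return x * math.log(p) + (1 - x) * math.log(1 - p)

data = [1, 0, 1, 1, 0, 1, 0, 1]            # made-up outcomes: 5 ones in 8 trials
grid = [k / 1000 for k in range(1, 1000)]  # interior points of (0, 1)
p_hat = grid_mle(bernoulli_log_pdf, data, grid)
print(p_hat, sum(data) / len(data))        # grid maximizer next to the sample mean
```

A grid search is crude, but it makes no differentiability assumption and also copes with maxima on the boundary of the grid.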
In the following subsections, we will study maximum likelihood estimation in a number of classical cases.
Suppose that we have a coin with unknown probability \(p\) of heads. We toss the coin \(n\) times and record the sequence of heads and tails. Thus, the data \((X_1, X_2, \ldots, X_n)\) is a random sample of size \(n\) from the Bernoulli distribution with success parameter \(p\). Let
\[ Y = \sum_{i=1}^n X_i \]
denote the number of heads, so that the proportion of heads (the sample mean) is
\[ M = \frac{Y}{n}. \]
Suppose that \(p\) varies in the interval \((0, 1)\). Show that \(M\) is the maximum likelihood estimator of \(p\). Recall that \(M\) is also the method of moments estimator of \(p\).
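Suppose that the coin is either fair or two-headed, so \(p\) takes values in \(\{1/2, 1\}\). Show that the maximum likelihood estimator of \(p\) is the statistic \(U\) given below, and interpret the result:
\[ U = \begin{cases} 1, & Y = n \\ \frac{1}{2}, & Y < n \end{cases} \]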
Exercises 1 and 2 show that the maximum likelihood estimator of a parameter, like the solution to any maximization problem, depends critically on the domain.
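Show that
\[ \mathbb{E}(U) = \begin{cases} 1, & p = 1 \\ \frac{1}{2} + \left(\frac{1}{2}\right)^{n+1}, & p = \frac{1}{2} \end{cases} \]
Show that
\[ \operatorname{MSE}(U) = \begin{cases} 0, & p = 1 \\ \left(\frac{1}{2}\right)^{n+2}, & p = \frac{1}{2} \end{cases} \]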
Show that \(U\) is uniformly better than \(M\) on the parameter space \(\{1/2, 1\}\).
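The comparison on the two-point parameter space can be checked exactly, without simulation, because the number of heads has a binomial distribution. The sketch below is one hedged way to do this; the function names are hypothetical, and `two_point_mle` encodes the estimator that equals 1 when every toss is heads and 1/2 otherwise.

```python
from math import comb

def mean_and_mse(estimator, n, p):
    """Exact mean and mean square error of estimator(y, n) when Y ~ binomial(n, p)."""
    pmf = [comb(n, y) * p**y * (1 - p)**(n - y) for y in range(n + 1)]
    mean = sum(estimator(y, n) * w for y, w in enumerate(pmf))
    mse = sum((estimator(y, n) - p)**2 * w for y, w in enumerate(pmf))
    return mean, mse

def sample_mean(y, n):
    """Proportion of heads: the MLE when p ranges over an interval."""
    return y / n

def two_point_mle(y, n):
    """MLE when p is restricted to {1/2, 1}: 1 if every toss is heads, else 1/2."""
    return 1.0 if y == n else 0.5

n = 5  # illustrative sample size
for p in (0.5, 1.0):
    print(p, mean_and_mse(sample_mean, n, p), mean_and_mse(two_point_mle, n, p))
```

Restricting the parameter space is what lets the second estimator beat the sample mean; on a larger parameter space it would not even be a sensible estimator.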
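In the following exercises, recall that if \((X_1, X_2, \ldots, X_n)\) is a random sample from a distribution with mean \(\mu\) and variance \(\sigma^2\), then the method of moments estimators of \(\mu\) and \(\sigma^2\) are, respectively,
\[ M = \frac{1}{n} \sum_{i=1}^n X_i, \qquad T^2 = \frac{1}{n} \sum_{i=1}^n (X_i - M)^2. \]
Of course, \(M\) is the sample mean, and \(T^2 = \frac{n-1}{n} S^2\) where \(S^2\) is the sample variance. In the exercises that follow, we will compute the maximum likelihood estimators for these parameters for several families of distributions.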
Suppose that \((X_1, X_2, \ldots, X_n)\) is a random sample from the Poisson distribution with unknown parameter \(a\). Show that the maximum likelihood estimator of \(a\) is the sample mean \(M\). Recall that for the Poisson distribution, the parameter \(a\) is both the mean and the variance.
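For the Poisson case, the calculus approach can be mimicked numerically: the derivative of the log likelihood with respect to the parameter is (sum of the observations) / a minus the sample size, and its root is the maximizer. The sketch below finds that root by bisection. The function names, the bracketing interval, and the data are assumptions chosen for illustration.

```python
def poisson_score(data, a):
    """Derivative of the Poisson log likelihood with respect to the parameter a."""
    return sum(data) / a - len(data)

def bisect_root(f, lo, hi, tol=1e-10):
    """Find the root of a decreasing function f on [lo, hi] by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) > 0:
            lo = mid      # root lies to the right of mid
        else:
            hi = mid      # root lies at or to the left of mid
    return (lo + hi) / 2

data = [2, 0, 3, 1, 4, 2, 1, 3]   # made-up counts
a_hat = bisect_root(lambda a: poisson_score(data, a), 0.01, 20.0)
print(a_hat, sum(data) / len(data))   # numerical root next to the sample mean
```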
Suppose that \((X_1, X_2, \ldots, X_n)\) is a random sample from the normal distribution with unknown mean \(\mu\) and variance \(\sigma^2\). Show that the maximum likelihood estimators of \(\mu\) and \(\sigma^2\) are \(M\) and \(T^2\), respectively.
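A quick numerical sanity check of the normal case: compute the sample mean and the biased sample variance (the average squared deviation from the sample mean), then confirm that nudging either value does not increase the log likelihood. This is only a sketch; the function names, the simulated data, and the perturbation size are assumptions.

```python
import math
import random

def normal_log_likelihood(data, mu, var):
    """Log likelihood of a normal random sample with mean mu and variance var."""
    n = len(data)
    return -0.5 * n * math.log(2 * math.pi * var) - sum((x - mu)**2 for x in data) / (2 * var)

def normal_mle(data):
    """Closed-form candidates: the sample mean and the biased sample variance."""
    n = len(data)
    m = sum(data) / n
    t2 = sum((x - m)**2 for x in data) / n
    return m, t2

rng = random.Random(0)
data = [rng.gauss(3.0, 2.0) for _ in range(50)]   # simulated sample, mean 3, sd 2
m, t2 = normal_mle(data)
best = normal_log_likelihood(data, m, t2)
# Perturbing either coordinate should not increase the log likelihood.
nearby = [(m + dm, t2 + dv) for dm in (-0.1, 0.0, 0.1) for dv in (-0.1, 0.0, 0.1)]
print(all(normal_log_likelihood(data, mu, v) <= best for mu, v in nearby))
```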
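Suppose that \((X_1, X_2, \ldots, X_n)\) is a random sample from the gamma distribution with known shape parameter \(k\) and unknown scale parameter \(b\). Show that the maximum likelihood estimator of \(b\) is \(M / k\).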
Run the gamma estimation experiment 1000 times, updating every 10 runs, for several values of the sample size \(n\), shape parameter \(k\), and scale parameter \(b\). In each case, compare the method of moments estimator of \(b\) when \(k\) is unknown with the method of moments and maximum likelihood estimator of \(b\) when \(k\) is known. Which estimator seems to work better in terms of mean square error?
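A comparable experiment can be sketched in Python with the standard library. The code below is a stand-in under stated assumptions, not the experiment referred to above: it uses the ratio of the biased sample variance to the sample mean as the method of moments estimator of the scale when the shape is unknown, and the sample mean divided by the shape when the shape is known. The function name, seed, and parameter values are hypothetical.

```python
import random

def gamma_experiment(n, k, b, runs=1000, seed=1):
    """Empirical MSE of two estimators of the scale b from gamma(k, b) samples."""
    rng = random.Random(seed)
    se_unknown = se_known = 0.0
    for _ in range(runs):
        x = [rng.gammavariate(k, b) for _ in range(n)]   # shape k, scale b
        m = sum(x) / n
        t2 = sum((xi - m)**2 for xi in x) / n
        se_unknown += (t2 / m - b)**2    # method of moments, shape unknown
        se_known += (m / k - b)**2       # method of moments and MLE, shape known
    return se_unknown / runs, se_known / runs

mse_unknown, mse_known = gamma_experiment(n=10, k=2.0, b=1.0)
print(mse_unknown, mse_known)
```

Varying `n`, `k`, and `b` in the call reproduces the kind of comparison the exercise asks for.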
Suppose that \((X_1, X_2, \ldots, X_n)\) is a random sample from the beta distribution with left parameter \(a\) and right parameter \(b = 1\). Show that the maximum likelihood estimator of \(a\) is
\[ -\frac{n}{\sum_{i=1}^n \ln X_i}. \]
Run the beta estimation experiment 1000 times, updating every 10 runs, for several values of the sample size \(n\) and the parameter \(a\). In each case, compare the method of moments estimator of \(a\) with the maximum likelihood estimator of \(a\). Which estimator seems to work better in terms of mean square error?
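The beta comparison can be sketched in the same way. The code assumes a right parameter of 1, so that the mean is a / (a + 1) and the method of moments estimator solves that equation for a; the maximum likelihood estimator is minus the sample size divided by the sum of the log observations. Function names, seed, and parameter values are hypothetical.

```python
import math
import random

def beta_mom(x):
    """Method of moments for beta(a, 1): solve a / (a + 1) = M for a."""
    m = sum(x) / len(x)
    return m / (1 - m)

def beta_mle(x):
    """Maximum likelihood for beta(a, 1): -n divided by the sum of log observations."""
    return -len(x) / sum(math.log(xi) for xi in x)

def beta_experiment(n, a, runs=1000, seed=2):
    """Empirical MSE of the two estimators over repeated samples of size n."""
    rng = random.Random(seed)
    se_mom = se_mle = 0.0
    for _ in range(runs):
        x = [rng.betavariate(a, 1.0) for _ in range(n)]
        se_mom += (beta_mom(x) - a)**2
        se_mle += (beta_mle(x) - a)**2
    return se_mom / runs, se_mle / runs

mse_mom, mse_mle = beta_experiment(n=20, a=2.0)
print(mse_mom, mse_mle)
```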
Suppose that \((X_1, X_2, \ldots, X_n)\) is a random sample from the Pareto distribution with shape parameter \(a\). Show that the maximum likelihood estimator of \(a\) is
\[ \frac{n}{\sum_{i=1}^n \ln X_i}. \]
Run the Pareto estimation experiment 1000 times, updating every 10 runs, for several values of the sample size \(n\) and the parameter \(a\). In each case, compare the method of moments estimator of \(a\) with the maximum likelihood estimator of \(a\). Which estimator seems to work better in terms of mean square error?
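A Pareto version of the experiment, again as a sketch under assumptions: the distribution is taken to live on the interval from 1 to infinity (which is what `random.paretovariate` samples), the shape is assumed to exceed 1 so that the mean a / (a - 1) exists, and the method of moments estimator solves that equation for a. Function names, seed, and parameter values are hypothetical.

```python
import math
import random

def pareto_mom(x):
    """Method of moments for Pareto(a) on [1, inf): solve a / (a - 1) = M for a."""
    m = sum(x) / len(x)
    return m / (m - 1)

def pareto_mle(x):
    """Maximum likelihood for Pareto(a) on [1, inf): n over the sum of log observations."""
    return len(x) / sum(math.log(xi) for xi in x)

def pareto_experiment(n, a, runs=1000, seed=5):
    """Empirical MSE of the two estimators over repeated samples of size n."""
    rng = random.Random(seed)
    se_mom = se_mle = 0.0
    for _ in range(runs):
        x = [rng.paretovariate(a) for _ in range(n)]
        se_mom += (pareto_mom(x) - a)**2
        se_mle += (pareto_mle(x) - a)**2
    return se_mom / runs, se_mle / runs

pareto_mse_mom, pareto_mse_mle = pareto_experiment(n=20, a=3.0)
print(pareto_mse_mom, pareto_mse_mle)
```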
In this section we will study two estimation problems that are a good source of insight and counterexamples. In a sense, our first estimation problem is the continuous analogue of an estimation problem studied in the section on Order Statistics in the chapter Finite Sampling Models. Suppose that \((X_1, X_2, \ldots, X_n)\) is a random sample from the uniform distribution on the interval \([0, a]\), where \(a > 0\) is an unknown parameter.
Show that the method of moments estimator of \(a\) is \(U = 2 M\).
Show that \(U\) is unbiased and that \(\operatorname{var}(U) = \frac{a^2}{3 n}\), so \(U\) is consistent.
Show that the maximum likelihood estimator of \(a\) is \(X_{(n)}\), the \(n\)th order statistic.
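Show that \(\mathbb{E}\left[X_{(n)}\right] = \frac{n}{n+1} a\), so \(X_{(n)}\) is negatively biased.
Show that \(\operatorname{var}\left[X_{(n)}\right] = \frac{n}{(n+1)^2 (n+2)} a^2\).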
Now let \(V = \frac{n+1}{n} X_{(n)}\).
Show that \(V\) is unbiased and that \(\operatorname{var}(V) = \frac{a^2}{n (n+2)}\).
Show that the asymptotic relative efficiency of \(V\) to \(U\) is infinite.
The last exercise shows that \(V\) is a much better estimator than \(U\); in fact, an estimator such as \(V\), whose mean square error decreases on the order of \(1 / n^2\), is called super efficient. Now, having found a really good estimator, let's see if we can find a really bad one. A natural candidate is an estimator based on \(X_{(1)}\), the first order statistic.
Show that \(X_{(1)}\) has the same distribution as \(a - X_{(n)}\).
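Show that \(\mathbb{E}\left[X_{(1)}\right] = \frac{a}{n+1}\), and hence \(W = (n+1) X_{(1)}\) is unbiased.
Show that \(\operatorname{var}(W) = \frac{n}{n+2} a^2\), so \(W\) is not even consistent.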
Run the uniform estimation experiment 1000 times, updating every 10 runs, for several values of the sample size \(n\) and the parameter \(a\). In each case, compare the empirical bias and mean square error of the estimators with their theoretical values. Rank the estimators in terms of empirical mean square error.
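The ranking can be explored with a short simulation. In the sketch below, the labels U, V, and W stand for twice the sample mean, (n + 1)/n times the sample maximum, and (n + 1) times the sample minimum, respectively. The theoretical mean square errors in the `theory` dictionary are the variances of these unbiased estimators for the uniform distribution on [0, a]. The function name, seed, and parameter values are assumptions.

```python
import random

def uniform_experiment(n, a, runs=1000, seed=3):
    """Empirical bias and MSE of three estimators of a from uniform[0, a] samples."""
    rng = random.Random(seed)
    estimators = {
        "U": lambda x: 2 * sum(x) / n,        # method of moments
        "V": lambda x: (n + 1) / n * max(x),  # bias-corrected maximum
        "W": lambda x: (n + 1) * min(x),      # based on the minimum
    }
    errors = {name: [] for name in estimators}
    for _ in range(runs):
        x = [rng.uniform(0, a) for _ in range(n)]
        for name, est in estimators.items():
            errors[name].append(est(x) - a)
    bias = {k: sum(e) / runs for k, e in errors.items()}
    mse = {k: sum(d * d for d in e) / runs for k, e in errors.items()}
    return bias, mse

n, a = 10, 1.0
bias, mse = uniform_experiment(n, a)
theory = {"U": a**2 / (3 * n), "V": a**2 / (n * (n + 2)), "W": n * a**2 / (n + 2)}
for name in sorted(mse, key=mse.get):   # smallest empirical MSE first
    print(name, bias[name], mse[name], theory[name])
```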
Our next series of exercises will show that the maximum likelihood estimator is not necessarily unique. Suppose that \((X_1, X_2, \ldots, X_n)\) is a random sample from the uniform distribution on the interval \([a, a + 1]\), where \(a\) is an unknown parameter.
Show that the method of moments estimator of \(a\) is \(U = M - \frac{1}{2}\).
Show that \(U\) is unbiased and that \(\operatorname{var}(U) = \frac{1}{12 n}\), so \(U\) is consistent.
Show that any statistic \(V\) with \(X_{(n)} - 1 \le V \le X_{(1)}\) is a maximum likelihood estimator of \(a\).
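The non-uniqueness is easy to see numerically: for the uniform distribution on an interval of length 1 starting at a, the likelihood equals 1 when every observation lies in that interval and 0 otherwise, so it is flat on a whole interval of parameter values. The sketch below uses a made-up sample and a hypothetical function name.

```python
def uniform_shift_likelihood(x, a):
    """Likelihood of a for a sample x from uniform[a, a + 1]: 1 if all points fit, else 0."""
    return 1.0 if a <= min(x) and max(x) <= a + 1 else 0.0

x = [2.31, 2.75, 2.48, 2.90, 2.52]        # made-up sample
lo, hi = max(x) - 1, min(x)               # every a between lo and hi maximizes the likelihood
grid = [k / 100 for k in range(100, 301)]
maximizers = [a for a in grid if uniform_shift_likelihood(x, a) == 1.0]
print(lo, hi, min(maximizers), max(maximizers), len(maximizers))
```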
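Returning to the general setting, suppose now that \(h\) is a one-to-one function from the parameter space \(\Theta\) onto a set \(\Lambda\). We can view \(\lambda = h(\theta)\) as a new parameter taking values in the space \(\Lambda\), and it is easy to re-parameterize the probability density function with the new parameter. Thus, let
\[ \hat{f}_\lambda(\boldsymbol{x}) = f_{h^{-1}(\lambda)}(\boldsymbol{x}), \quad \boldsymbol{x} \in S, \; \lambda \in \Lambda. \]
The corresponding likelihood function is
\[ \hat{L}_{\boldsymbol{x}}(\lambda) = L_{\boldsymbol{x}}\left[h^{-1}(\lambda)\right], \quad \lambda \in \Lambda, \; \boldsymbol{x} \in S. \]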
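Suppose that \(u(\boldsymbol{x}) \in \Theta\) maximizes \(L_{\boldsymbol{x}}\) for \(\boldsymbol{x} \in S\). Show that \(h[u(\boldsymbol{x})] \in \Lambda\) maximizes \(\hat{L}_{\boldsymbol{x}}\) for \(\boldsymbol{x} \in S\).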
It follows from Exercise 28 that if \(U\) is a maximum likelihood estimator for \(\theta\), then \(V = h(U)\) is a maximum likelihood estimator for \(\lambda = h(\theta)\). This result is known as the invariance property.
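Suppose that \((X_1, X_2, \ldots, X_n)\) is a random sample from the Poisson distribution with parameter \(a\), and let \(p = \mathbb{P}(X_i = 0) = e^{-a}\). Find the maximum likelihood estimator of \(p\) in two ways: directly, by finding the likelihood function that corresponds to the parameter \(p\), and by using the invariance property.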
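If the function \(h\) is not one-to-one, the maximum likelihood problem for the new parameter vector \(\lambda = h(\theta)\) is not well-defined, because we cannot parameterize the probability density function in terms of \(\lambda\). However, there is a natural generalization of the maximum likelihood problem in this case. Define
\[ \hat{L}_{\boldsymbol{x}}(\lambda) = \max\left\{L_{\boldsymbol{x}}(\theta) : \theta \in h^{-1}\{\lambda\}\right\}, \quad \lambda \in \Lambda, \; \boldsymbol{x} \in S. \]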
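Suppose again that \(u(\boldsymbol{x}) \in \Theta\) maximizes \(L_{\boldsymbol{x}}\) for \(\boldsymbol{x} \in S\). Show that \(h[u(\boldsymbol{x})] \in \Lambda\) maximizes \(\hat{L}_{\boldsymbol{x}}\) for \(\boldsymbol{x} \in S\).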
The result in the last exercise extends the invariance property to many-to-one transformations of the parameter: if \(U\) is a maximum likelihood estimator for \(\theta\), then \(V = h(U)\) is a maximum likelihood estimator for \(\lambda = h(\theta)\).
Suppose that \((X_1, X_2, \ldots, X_n)\) is a random sample of size \(n\) from the Bernoulli distribution with unknown success parameter \(p\). Find the maximum likelihood estimator of \(p (1 - p)\), which is the variance of the sampling distribution.
Suppose that \((X_1, X_2, \ldots, X_n)\) is a random sample from the normal distribution with unknown mean \(\mu\) and variance \(\sigma^2\). Find the maximum likelihood estimator of \(\mu^2 + \sigma^2\), which is the second moment about 0 for the sampling distribution.
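The invariance property turns both of the last two problems into plug-in computations: evaluate the transformation at the maximum likelihood estimator of the original parameter. The sketch below does this for the Bernoulli variance and the normal second moment, using the sample mean and the biased sample variance as the underlying estimators. Function names and the simulated data are assumptions.

```python
import random

def bernoulli_variance_mle(x):
    """Plug-in estimate of p(1 - p): apply h(p) = p(1 - p) to the sample mean."""
    m = sum(x) / len(x)
    return m * (1 - m)

def normal_second_moment_mle(x):
    """Plug-in estimate of mu^2 + sigma^2: sample mean squared plus biased sample variance."""
    n = len(x)
    m = sum(x) / n
    t2 = sum((xi - m)**2 for xi in x) / n
    return m**2 + t2

rng = random.Random(4)
coin = [1 if rng.random() < 0.3 else 0 for _ in range(200)]   # simulated tosses
normal = [rng.gauss(1.0, 2.0) for _ in range(200)]            # simulated normal sample
print(bernoulli_variance_mle(coin))
# The plug-in estimate coincides with the average of the squared observations.
print(normal_second_moment_mle(normal), sum(v * v for v in normal) / len(normal))
```

The second printout reflects an algebraic identity: the squared sample mean plus the biased sample variance equals the sample second moment.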