
3. Maximum Likelihood

Basic Theory

Suppose again that we have an observable random variable $X$ for an experiment, taking values in a set $S$. Suppose also that the distribution of $X$ depends on an unknown parameter $\theta$, taking values in a parameter space $\Theta$. Specifically, we will denote the probability density function of $X$ on $S$ by $f_\theta$ for $\theta \in \Theta$. Of course, our data variable $X$ will almost always be vector-valued. The parameter $\theta$ may also be vector-valued.

The likelihood function L is the function obtained by reversing the roles of x and θ in the probability density function; that is, we view θ as the variable and x as the given information (which is precisely the point of view in estimation):

$$L_x(\theta) = f_\theta(x), \quad \theta \in \Theta, \; x \in S.$$

In the method of maximum likelihood, we try to find a value $u(x)$ of the parameter $\theta$ that maximizes $L_x(\theta)$ for each $x \in S$. If we can do this, then the statistic $u(X)$ is called a maximum likelihood estimator of $\theta$. The method is intuitively appealing: we try to find the values of the parameters that would have most likely produced the data we in fact observed.

Since the natural logarithm function is strictly increasing, the maximum value of $L_x(\theta)$, if it exists, will occur at the same points as the maximum value of $\ln L_x(\theta)$. This latter function is called the log likelihood function and in many cases is easier to work with than the likelihood function (typically because the probability density function $f_\theta(x)$ has a product structure).

Vector of Parameters

An important special case is when $\theta = (\theta_1, \theta_2, \ldots, \theta_k)$ is a vector of $k$ real parameters, so that $\Theta \subseteq \mathbb{R}^k$. In this case, the maximum likelihood problem is to maximize a function of several variables. If $\Theta$ is a continuous set, the methods of calculus can be used. If the maximum value of $L_x$ occurs at a point $\theta$ in the interior of $\Theta$, then $L_x$ has a local maximum at $\theta$. Therefore, assuming that the likelihood function is differentiable, we can find this point by solving

$$\frac{\partial}{\partial \theta_i} L_x(\theta) = 0, \quad i \in \{1, 2, \ldots, k\}$$

or equivalently

$$\frac{\partial}{\partial \theta_i} \ln L_x(\theta) = 0, \quad i \in \{1, 2, \ldots, k\}$$

On the other hand, the maximum value may occur at a boundary point of $\Theta$, or may not exist at all.

Random Sample

Consider next the case where our outcome variable $X = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from a distribution on a set $R$, with probability density function $g_\theta$, $\theta \in \Theta$. Then $X$ takes values in $S = R^n$, and the joint probability density function of $X$ is the product of the marginal probability density functions. Thus, the likelihood function in this special case becomes

$$L_x(\theta) = \prod_{i=1}^n g_\theta(x_i), \quad x = (x_1, x_2, \ldots, x_n) \in S, \; \theta \in \Theta$$

and hence the log likelihood function becomes

$$\ln L_x(\theta) = \sum_{i=1}^n \ln g_\theta(x_i), \quad x = (x_1, x_2, \ldots, x_n) \in S, \; \theta \in \Theta$$
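As a rough numerical illustration of the product and sum forms above, the following sketch evaluates both on a grid of parameter values. The Poisson density is used as a stand-in for $g_\theta$, and the data, the grid, and the function names are all made up for this example.

```python
import math

def poisson_pmf(a, x):
    """Poisson density g_a(x) = e^{-a} a^x / x! (illustrative choice of g_theta)."""
    return math.exp(-a) * a**x / math.factorial(x)

def likelihood(a, xs):
    """L_x(a): product of the marginal densities."""
    return math.prod(poisson_pmf(a, x) for x in xs)

def log_likelihood(a, xs):
    """ln L_x(a): sum of the log marginal densities."""
    return sum(-a + x * math.log(a) - math.lgamma(x + 1) for x in xs)

xs = [2, 0, 3, 1, 1, 4, 0, 2]               # made-up data, sample mean 1.625
grid = [0.005 * k for k in range(1, 1001)]  # candidate parameter values in (0, 5]

# Because ln is strictly increasing, both searches should pick the same point.
best_by_product = max(grid, key=lambda a: likelihood(a, xs))
best_by_sum = max(grid, key=lambda a: log_likelihood(a, xs))
print(best_by_product, best_by_sum)
```

For larger samples the product form tends to underflow in floating point, which is one practical reason to prefer the sum form.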

Examples and Special Cases

In the following subsections, we will study maximum likelihood estimation in a number of classical cases.

The Bernoulli Distribution

Suppose that we have a coin with unknown probability $p$ of heads. We toss the coin $n$ times and record the sequence of heads and tails. Thus, the data $X = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the Bernoulli distribution with success parameter $p$. Let

$$Y = \sum_{i=1}^n X_i$$

denote the number of heads, so that the proportion of heads (the sample mean) is

$$M = \frac{Y}{n}$$

Suppose that $p$ varies in the interval $(0, 1)$. Show that $M$ is the maximum likelihood estimator of $p$. Recall that $M$ is also the method of moments estimator of $p$.
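One way to organize the calculus argument, sketched here for an observed sequence $x = (x_1, \ldots, x_n)$ with $y = \sum_{i=1}^n x_i$ heads and $0 < y < n$ (the lowercase $y$ is notation introduced only for this sketch):

```latex
\begin{align*}
L_x(p) &= \prod_{i=1}^n p^{x_i} (1 - p)^{1 - x_i} = p^{y} (1 - p)^{n - y} \\
\ln L_x(p) &= y \ln p + (n - y) \ln(1 - p) \\
\frac{d}{dp} \ln L_x(p) &= \frac{y}{p} - \frac{n - y}{1 - p} = 0
  \iff p = \frac{y}{n} \\
\frac{d^2}{dp^2} \ln L_x(p) &= -\frac{y}{p^2} - \frac{n - y}{(1 - p)^2} < 0
\end{align*}
```

The negative second derivative indicates that the critical point $y / n$ is a maximum. The cases $y = 0$ and $y = n$ need separate treatment, since the likelihood is then monotone in $p$.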

Suppose that the coin is either fair or two-headed, so $p$ takes values in $\left\{\frac{1}{2}, 1\right\}$. Show that the maximum likelihood estimator of $p$ is the statistic given below, and interpret the result:

$$U = \begin{cases} 1, & Y = n \\ \frac{1}{2}, & Y < n \end{cases}$$

Exercises 1 and 2 show that the maximum likelihood estimator of a parameter, like the solution to any maximization problem, depends critically on the domain.

Show that

  1. $E(U) = \begin{cases} 1, & p = 1 \\ \frac{1}{2} + \left(\frac{1}{2}\right)^{n+1}, & p = \frac{1}{2} \end{cases}$
  2. $U$ is biased, but is asymptotically unbiased.

Show that

  1. $\operatorname{MSE}(U) = \begin{cases} 0, & p = 1 \\ \left(\frac{1}{2}\right)^{n+2}, & p = \frac{1}{2} \end{cases}$
  2. $U$ is consistent.

Show that $U$ is uniformly better than $M$ on the parameter space $\left\{\frac{1}{2}, 1\right\}$.
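The claims about $U$ and $M$ on the two-point parameter space can be checked for a particular $n$ by summing over the binomial distribution of $Y$. The sketch below uses exact rational arithmetic; the function name `exact_moments` and the choice $n = 5$ are illustrative only.

```python
from fractions import Fraction
from math import comb

def exact_moments(n, p):
    """Return (E[U], MSE[U], MSE[M]) for n tosses with success probability p."""
    p = Fraction(p)
    mean_u = mse_u = mse_m = Fraction(0)
    for y in range(n + 1):
        prob = comb(n, y) * p**y * (1 - p) ** (n - y)  # P(Y = y)
        u = Fraction(1) if y == n else Fraction(1, 2)  # MLE on {1/2, 1}
        m = Fraction(y, n)                             # sample mean
        mean_u += prob * u
        mse_u += prob * (u - p) ** 2
        mse_m += prob * (m - p) ** 2
    return mean_u, mse_u, mse_m

for p in (Fraction(1, 2), Fraction(1)):
    print(p, exact_moments(5, p))
```

A numerical check of this kind covers only the values of $n$ that are tried; it is not a substitute for the general argument requested in the exercises.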

Other Basic Distributions

In the following exercises, recall that if $X = (X_1, X_2, \ldots, X_n)$ is a random sample from a distribution with mean $\mu$ and variance $\sigma^2$, then the method of moments estimators of $\mu$ and $\sigma^2$ are, respectively,

$$M = \frac{1}{n} \sum_{i=1}^n X_i, \quad T^2 = \frac{1}{n} \sum_{i=1}^n (X_i - M)^2$$

Of course, $M$ is the sample mean, and $T^2 = \frac{n-1}{n} S^2$ where $S^2$ is the sample variance. In the exercises that follow, we will compute the maximum likelihood estimators for these parameters for several families of distributions.

Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the Poisson distribution with unknown parameter $a > 0$. Show that the maximum likelihood estimator of $a$ is the sample mean $M$. Recall that for the Poisson distribution, the parameter $a$ is both the mean and the variance.

Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the normal distribution with unknown mean $\mu$ and variance $\sigma^2 > 0$. Show that the maximum likelihood estimators of $\mu$ and $\sigma^2$ are $M$ and $T^2$, respectively.
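A sketch of the two-parameter calculation for the normal case, treating $\sigma^2$ (rather than $\sigma$) as the second parameter and writing $m$ and $t^2$ for the observed values of $M$ and $T^2$ (lowercase notation introduced only for this sketch):

```latex
\begin{align*}
\ln L_x(\mu, \sigma^2) &= -\frac{n}{2} \ln(2 \pi) - \frac{n}{2} \ln \sigma^2
  - \frac{1}{2 \sigma^2} \sum_{i=1}^n (x_i - \mu)^2 \\
\frac{\partial}{\partial \mu} \ln L_x(\mu, \sigma^2)
  &= \frac{1}{\sigma^2} \sum_{i=1}^n (x_i - \mu) = 0
  \iff \mu = \frac{1}{n} \sum_{i=1}^n x_i = m \\
\frac{\partial}{\partial \sigma^2} \ln L_x(\mu, \sigma^2)
  &= -\frac{n}{2 \sigma^2} + \frac{1}{2 \sigma^4} \sum_{i=1}^n (x_i - \mu)^2 = 0
  \iff \sigma^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \mu)^2
\end{align*}
```

Substituting $\mu = m$ into the second equation gives $\sigma^2 = t^2$. A complete solution would also verify that this critical point is a maximum.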

Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the gamma distribution with known shape parameter $k$ and unknown scale parameter $b > 0$.

  1. Show that the method of moments estimator of $b$ is $W = M / k$.
  2. Show that $W$ is also the maximum likelihood estimator of $b$.

Run the gamma estimation experiment 1000 times, updating every 10 runs, for several values of the sample size $n$, shape parameter $k$, and scale parameter $b$. In each case, compare the method of moments estimator $V$ of $b$ when $k$ is unknown with the method of moments and maximum likelihood estimator $W$ of $b$ when $k$ is known. Which estimator seems to work better in terms of mean square error?
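For readers without access to the interactive experiment, a rough stand-in can be simulated. The sketch below assumes that the method of moments estimator with $k$ unknown has the form $V = T^2 / M$ (obtained by matching mean $k b$ and variance $k b^2$); this form is not stated in this section, and the function name and default arguments are hypothetical.

```python
import random
from statistics import fmean, pvariance

def gamma_experiment(n, k, b, runs=1000, seed=0):
    """Return empirical (MSE of V, MSE of W) for estimating the scale b."""
    rng = random.Random(seed)
    se_v = se_w = 0.0
    for _ in range(runs):
        xs = [rng.gammavariate(k, b) for _ in range(n)]  # shape k, scale b
        m = fmean(xs)
        t2 = pvariance(xs, mu=m)  # T^2, with divisor n
        v = t2 / m                # method of moments, k unknown (assumed form)
        w = m / k                 # method of moments and MLE, k known
        se_v += (v - b) ** 2
        se_w += (w - b) ** 2
    return se_v / runs, se_w / runs

print(gamma_experiment(n=10, k=2.0, b=1.0))
```

For comparison, $W$ is unbiased with variance $b^2 / (n k)$, so its empirical mean square error should be close to that value.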

Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the beta distribution with left parameter $a > 0$ and right parameter $b = 1$. Show that the maximum likelihood estimator of $a$ is

$$V = -\frac{n}{\sum_{i=1}^n \ln X_i}$$
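A sketch of where the minus sign comes from, using the beta density with right parameter 1, namely $g_a(x) = a x^{a-1}$ for $0 < x < 1$:

```latex
\begin{align*}
L_x(a) &= \prod_{i=1}^n a x_i^{a - 1} = a^n \Bigl( \prod_{i=1}^n x_i \Bigr)^{a - 1} \\
\ln L_x(a) &= n \ln a + (a - 1) \sum_{i=1}^n \ln x_i \\
\frac{d}{da} \ln L_x(a) &= \frac{n}{a} + \sum_{i=1}^n \ln x_i = 0
  \iff a = -\frac{n}{\sum_{i=1}^n \ln x_i}
\end{align*}
```

Since each $x_i$ lies in $(0, 1)$, every $\ln x_i$ is negative, so the resulting estimate is positive.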

Run the beta estimation experiment 1000 times, updating every 10 runs, for several values of the sample size $n$ and the parameter $a$. In each case, compare the method of moments estimator $U$ with the maximum likelihood estimator $V$. Which estimator seems to work better in terms of mean square error?

Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the Pareto distribution with shape parameter $a > 0$. Show that the maximum likelihood estimator of $a$ is

$$V = \frac{n}{\sum_{i=1}^n \ln X_i}$$

Run the Pareto estimation experiment 1000 times, updating every 10 runs, for several values of the sample size $n$ and the parameter $a$. In each case, compare the method of moments estimator $U$ with the maximum likelihood estimator $V$. Which estimator seems to work better in terms of mean square error?
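A simulated stand-in for the Pareto experiment might look like the sketch below. It assumes the basic Pareto density $a / x^{a+1}$ on $[1, \infty)$, which is what Python's `random.paretovariate` samples from, and it assumes the method of moments estimator has the form $U = M / (M - 1)$ (from the mean $a / (a - 1)$, valid only for $a > 1$). Neither assumption is stated in this section, and the function name is hypothetical.

```python
import math
import random
from statistics import fmean

def pareto_experiment(n, a, runs=1000, seed=0):
    """Return (mean of U, MSE of U, mean of V, MSE of V) over simulated samples."""
    rng = random.Random(seed)
    us, vs = [], []
    for _ in range(runs):
        xs = [rng.paretovariate(a) for _ in range(n)]  # density a / x^(a+1), x >= 1
        m = fmean(xs)
        us.append(m / (m - 1))                         # method of moments (assumed form)
        vs.append(n / sum(math.log(x) for x in xs))    # maximum likelihood
    mse_u = fmean([(u - a) ** 2 for u in us])
    mse_v = fmean([(v - a) ** 2 for v in vs])
    return fmean(us), mse_u, fmean(vs), mse_v

print(pareto_experiment(n=50, a=3.0))
```

Because the Pareto distribution is heavy-tailed, the empirical mean square error of $U$ can vary noticeably from one seed to another, so several seeds and sample sizes are worth trying before drawing conclusions.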

Uniform Distributions

In this section we will study two estimation problems that are a good source of insight and counterexamples. In a sense, our first estimation problem is the continuous analogue of an estimation problem studied in the section on Order Statistics in the chapter Finite Sampling Models. Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the uniform distribution on the interval $[0, a]$, where $a > 0$ is an unknown parameter.

Show that the method of moments estimator of $a$ is $U = 2 M$.

Show that

  1. $U$ is unbiased.
  2. $\operatorname{var}(U) = \frac{a^2}{3 n}$ so $U$ is consistent.

Show that the maximum likelihood estimator of $a$ is $X_{(n)}$, the $n$th order statistic (the sample maximum).

Show that

  1. $E\left(X_{(n)}\right) = \frac{n}{n+1} a$
  2. $\operatorname{bias}\left(X_{(n)}\right) = -\frac{a}{n+1}$ so that $X_{(n)}$ is negatively biased but asymptotically unbiased.

Show that

  1. $\operatorname{var}\left(X_{(n)}\right) = \frac{n}{(n+2)(n+1)^2} a^2$
  2. $\operatorname{MSE}\left(X_{(n)}\right) = \frac{2}{(n+1)(n+2)} a^2$ so that $X_{(n)}$ is consistent.

Now let $V = \frac{n+1}{n} X_{(n)}$.

Show that

  1. $V$ is unbiased.
  2. $\operatorname{var}(V) = \frac{a^2}{n (n+2)}$ so that $V$ is consistent.

Show that the asymptotic relative efficiency of V to U is infinite.

The last exercise shows that $V$ is a much better estimator than $U$; in fact, an estimator such as $V$, whose mean square error decreases on the order of $1 / n^2$, is called super efficient. Now, having found a really good estimator, let's see if we can find a really bad one. A natural candidate is an estimator based on $X_{(1)}$, the first order statistic (the sample minimum).

Show that

  1. If $X$ is uniformly distributed on $[0, a]$ then so is $a - X$.
  2. $(a - X_1, a - X_2, \ldots, a - X_n)$ is also a random sample from the uniform distribution on $[0, a]$.
  3. $X_{(1)}$ has the same distribution as $a - X_{(n)}$.

Show that $E\left(X_{(1)}\right) = \frac{a}{n+1}$, and hence $W = (n+1) X_{(1)}$ is unbiased.

Show that $\operatorname{var}(W) = \frac{n}{n+2} a^2$, so $W$ is not even consistent.

Run the uniform estimation experiment 1000 times, updating every 10 runs, for several values of the sample size $n$ and the parameter $a$. In each case, compare the empirical bias and mean square error of the estimators with their theoretical values. Rank the estimators in terms of empirical mean square error.
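A simulated stand-in for the uniform experiment is sketched below. It compares the four estimators from the exercises above (the method of moments estimator, the sample maximum, the rescaled maximum, and the rescaled minimum) against the mean square errors stated there. The function names and dictionary keys are illustrative.

```python
import random
from statistics import fmean

def uniform_experiment(n, a, runs=1000, seed=0):
    """Empirical mean square error of each estimator of a, from uniform(0, a) samples."""
    rng = random.Random(seed)
    sq_err = {"U": 0.0, "X(n)": 0.0, "V": 0.0, "W": 0.0}
    for _ in range(runs):
        xs = [rng.uniform(0, a) for _ in range(n)]
        estimates = {
            "U": 2 * fmean(xs),           # method of moments
            "X(n)": max(xs),              # maximum likelihood
            "V": (n + 1) / n * max(xs),   # unbiased rescaling of the maximum
            "W": (n + 1) * min(xs),       # unbiased rescaling of the minimum
        }
        for name, value in estimates.items():
            sq_err[name] += (value - a) ** 2
    return {name: total / runs for name, total in sq_err.items()}

def theoretical_mse(n, a):
    """Mean square errors taken from the exercises in this section."""
    return {
        "U": a**2 / (3 * n),
        "X(n)": 2 * a**2 / ((n + 1) * (n + 2)),
        "V": a**2 / (n * (n + 2)),
        "W": n * a**2 / (n + 2),
    }

print(uniform_experiment(n=10, a=1.0))
print(theoretical_mse(n=10, a=1.0))
```

All four estimators are computed from the same simulated samples, so their empirical errors are correlated; this does not affect the comparison with the theoretical values.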

Our next series of exercises will show that the maximum likelihood estimator is not necessarily unique. Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the uniform distribution on the interval $[a, a + 1]$, where $a \in \mathbb{R}$ is an unknown parameter.

Show that the method of moments estimator of $a$ is $U = M - \frac{1}{2}$.

Show that

  1. $U$ is unbiased.
  2. $\operatorname{var}(U) = \frac{1}{12 n}$ so $U$ is consistent.

Show that any statistic $V \in \left[X_{(n)} - 1, X_{(1)}\right]$ is a maximum likelihood estimator of $a$, where $X_{(1)}$ and $X_{(n)}$ denote the sample minimum and maximum.
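The non-uniqueness can be seen directly from the shape of the likelihood function, which for this model takes only the values 0 and 1. The sketch below uses a small made-up sample; the function names are illustrative.

```python
def likelihood(a, xs):
    """L_x(a) for a sample from the uniform distribution on [a, a + 1]."""
    # The joint density is 1 exactly when every observation lies in [a, a + 1].
    return 1.0 if a <= min(xs) and max(xs) <= a + 1 else 0.0

def mle_interval(xs):
    """Endpoints of the set of maximizers: sample maximum minus 1, and sample minimum."""
    return max(xs) - 1, min(xs)

xs = [0.30, 0.90, 0.50]   # made-up data
lo, hi = mle_interval(xs)
for a in (-0.2, 0.0, 0.2, 0.35):
    print(a, likelihood(a, xs))
print(lo, hi)
```

Every value of $a$ between the two endpoints attains the maximum likelihood of 1, so no single maximizer is singled out by the method.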

The Invariance Property

Returning to the general setting, suppose now that $h$ is a one-to-one function from the parameter space $\Theta$ onto a set $\Lambda$. We can view $\lambda = h(\theta)$ as a new parameter taking values in the space $\Lambda$, and it is easy to re-parameterize the probability density function with the new parameter. Thus, let

$$\hat{f}_\lambda(x) = f_{h^{-1}(\lambda)}(x), \quad x \in S, \; \lambda \in \Lambda$$

The corresponding likelihood function is

$$\hat{L}_x(\lambda) = L_x\left(h^{-1}(\lambda)\right), \quad \lambda \in \Lambda, \; x \in S$$

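Suppose that $u(x) \in \Theta$ maximizes $L_x$ for $x \in S$. Show that $h(u(x)) \in \Lambda$ maximizes $\hat{L}_x$ for $x \in S$, where $\hat{L}_x$ denotes the likelihood function in terms of $\lambda$.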

It follows from Exercise 28 that if $U$ is a maximum likelihood estimator for $\theta$, then $V = h(U)$ is a maximum likelihood estimator for $\lambda = h(\theta)$. This result is known as the invariance property.

Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the Poisson distribution with parameter $a > 0$, and let $p = P(X_i = 0) = e^{-a}$. Find the maximum likelihood estimator of $p$ in two ways:

  1. Directly, by finding the likelihood function corresponding to the parameter $p$.
  2. By using the result of Exercise 6 and the invariance property.
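The agreement between the two approaches can be checked numerically on a small made-up sample. The sketch below maximizes the log likelihood written in terms of $p$ with a simple golden-section search (a generic technique chosen here for illustration, relying on the log likelihood being unimodal in $p$) and compares the result with $e^{-M}$. All function names are hypothetical.

```python
import math

def log_likelihood_p(p, xs):
    """Poisson log likelihood in terms of p = exp(-a), omitting terms free of p."""
    a = -math.log(p)
    return -len(xs) * a + sum(xs) * math.log(a)

def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi] by golden-section search."""
    ratio = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - ratio * (hi - lo)   # left interior point
        d = lo + ratio * (hi - lo)   # right interior point
        if f(c) > f(d):
            hi = d                   # maximizer lies in [lo, d]
        else:
            lo = c                   # maximizer lies in [c, hi]
    return (lo + hi) / 2

xs = [2, 0, 3, 1, 1, 4, 0, 2]        # made-up data with a positive sum
direct = golden_section_max(lambda p: log_likelihood_p(p, xs), 1e-9, 1 - 1e-9)
via_invariance = math.exp(-sum(xs) / len(xs))
print(direct, via_invariance)
```

The search recomputes both interior points on every pass, which is wasteful but keeps the sketch short.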

If the function $h$ is not one-to-one, the maximum likelihood problem for the new parameter vector $\lambda = h(\theta)$ is not well-defined, because we cannot parameterize the probability density function in terms of $\lambda$. However, there is a natural generalization of the maximum likelihood problem in this case. Define

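$$\hat{L}_x(\lambda) = \max\left\{L_x(\theta) : \theta \in \Theta, \; h(\theta) = \lambda\right\}, \quad \lambda \in \Lambda, \; x \in S$$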

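Suppose again that $u(x) \in \Theta$ maximizes $L_x$ for $x \in S$. Show that $h(u(x)) \in \Lambda$ maximizes $\hat{L}_x$ for $x \in S$, where $\hat{L}_x$ is the generalized likelihood function in terms of $\lambda$.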

The result in the last exercise extends the invariance property to many-to-one transformations of the parameter: if $U$ is a maximum likelihood estimator for $\theta$, then $V = h(U)$ is a maximum likelihood estimator for $\lambda = h(\theta)$.

Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample of size $n$ from the Bernoulli distribution with unknown success parameter $p \in (0, 1)$. Find the maximum likelihood estimator of $p (1 - p)$, which is the variance of the sampling distribution.

Suppose that $X = (X_1, X_2, \ldots, X_n)$ is a random sample from the normal distribution with unknown mean $\mu$ and variance $\sigma^2 > 0$. Find the maximum likelihood estimator of $\mu^2 + \sigma^2$, which is the second moment about 0 for the sampling distribution.
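Under the extended invariance property, both of the last two estimators are obtained by plugging the maximum likelihood estimators found earlier ($M$ for $p$, and $M$ and $T^2$ for $\mu$ and $\sigma^2$) into the function of interest. A minimal sketch, with hypothetical function names and made-up data:

```python
from statistics import fmean, pvariance

def mle_bernoulli_variance(xs):
    """Plug-in estimate M (1 - M) of p (1 - p)."""
    m = fmean(xs)
    return m * (1 - m)

def mle_normal_second_moment(xs):
    """Plug-in estimate M^2 + T^2 of mu^2 + sigma^2, where T^2 uses divisor n."""
    m = fmean(xs)
    t2 = pvariance(xs, mu=m)
    return m**2 + t2

print(mle_bernoulli_variance([1, 0, 1, 1]))
print(mle_normal_second_moment([1.0, 2.0, 3.0]))
```

In the normal case, $M^2 + T^2$ simplifies algebraically to $\frac{1}{n} \sum_{i=1}^n X_i^2$, the sample second moment.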