\(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\N}{\mathbb{N}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\P}{\mathbb{P}}\) \(\newcommand{\var}{\text{var}}\) \(\newcommand{\sd}{\text{sd}}\) \(\newcommand{\cov}{\text{cov}}\) \(\newcommand{\cor}{\text{cor}}\) \(\newcommand{\vc}{\text{vc}}\) \(\newcommand{\bs}{\boldsymbol}\)

The Multivariate Normal Distribution

The Bivariate Normal Distribution

Definition

Suppose that \(U\) and \(V\) are independent random variables, each with the standard normal distribution. We will need the following five parameters: \(\mu, \; \nu \in \R\); \(\sigma, \; \tau \in (0, \infty)\); and \(\rho \in (-1, 1)\). Now let \(X\) and \(Y\) be new random variables defined by

\[ \begin{align} X & = \mu + \sigma \, U \\ Y & = \nu + \tau \, \rho \, U + \tau \sqrt{1 - \rho^2} \, V \end{align} \]

The joint distribution of \((X, Y)\) is called the bivariate normal distribution with parameters \((\mu, \nu, \sigma, \tau, \rho)\).
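
The construction is easy to simulate. Here is a minimal sketch in Python (assuming NumPy is available; the parameter values are arbitrary choices for illustration) that builds \(X\) and \(Y\) from independent standard normal samples exactly as in the definition. The sample moments can be compared with the basic properties given below.

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative parameter values (arbitrary choices)
mu, nu = 1.0, -2.0
sigma, tau = 2.0, 0.5
rho = 0.6

n = 100_000
U = rng.standard_normal(n)  # independent standard normal samples
V = rng.standard_normal(n)

# the defining transformation
X = mu + sigma * U
Y = nu + tau * rho * U + tau * np.sqrt(1 - rho**2) * V

print(X.mean(), X.std())        # approximately mu, sigma
print(Y.mean(), Y.std())        # approximately nu, tau
print(np.corrcoef(X, Y)[0, 1])  # approximately rho
```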

Basic Properties

For the following exercises, use basic properties of mean, variance, and covariance, together with properties of the normal distribution.

\(X\) is normally distributed with mean \(\mu\) and standard deviation \(\sigma\).

\(Y\) is normally distributed with mean \(\nu\) and standard deviation \(\tau\).

\(\cor(X, Y) = \rho\).

\(X\) and \(Y\) are independent if and only if \( \rho = 0 \).

Thus, two random variables with a joint (bivariate) normal distribution are independent if and only if they are uncorrelated.
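
For the correlation result, the covariance can be read off directly from the defining equations, using bilinearity of covariance and the independence of \(U\) and \(V\):

\[ \cov(X, Y) = \cov\left(\sigma \, U, \; \tau \, \rho \, U + \tau \sqrt{1 - \rho^2} \, V\right) = \sigma \, \tau \, \rho \, \var(U) = \sigma \, \tau \, \rho \]

Since \(\sd(X) = \sigma\) and \(\sd(Y) = \tau\), it follows that \(\cor(X, Y) = \rho\). Moreover, if \(\rho = 0\) then \(X\) is a function of \(U\) alone and \(Y\) is a function of \(V\) alone, so \(X\) and \(Y\) are independent.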

In the bivariate normal experiment, change the standard deviations of \(X\) and \(Y\) with the scroll bars. Watch the change in the shape of the marginal probability density functions. Now change the correlation with the scroll bar and note that the marginal probability density functions do not change. For various values of the parameters, run the experiment 1000 times. Observe the cloud of points in the scatterplot, and note the apparent convergence of the empirical density function to the probability density function.

The Probability Density Function

We can use the change of variables formula to find the joint probability density function of \((X, Y)\).

The joint probability density function of \((X, Y)\) is

\[ f(x, y) = \frac{1}{2 \, \pi \, \sigma \, \tau \, \sqrt{1 - \rho^2}} \exp \left\{ -\frac{1}{2 \, (1 - \rho^2)} \left[ \frac{(x - \mu)^2}{\sigma^2} - 2 \, \rho \frac{(x - \mu)(y - \nu)}{\sigma \, \tau} + \frac{(y - \nu)^2}{\tau^2} \right] \right\}, \quad (x, y) \in \R^2 \]
Proof:

Consider the transformation that defines \((X, Y)\) from \((U, V)\). The inverse transformation is given by

\[ \begin{align} u & = \frac{x - \mu}{\sigma} \\ v & = \frac{y - \nu}{\tau \, \sqrt{1 - \rho^2}} - \rho \, \frac{x - \mu}{\sigma \, \sqrt{1 - \rho^2}} \end{align} \]

The Jacobian of the inverse transformation is

\[ \frac{\partial(u, v)}{\partial(x, y)} = \frac{1}{\sigma \, \tau \, \sqrt{1 - \rho^2}} \]

Note that the Jacobian is constant, because the transformation is affine. The result now follows from the independence of \(U\) and \(V\) and the change of variables formula.
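
As a sanity check on the formula, it can be compared with a library implementation. A minimal sketch in Python, assuming NumPy and SciPy are available and using arbitrary illustrative parameter values:

```python
import numpy as np
from scipy.stats import multivariate_normal

# illustrative parameters (arbitrary choices)
mu, nu, sigma, tau, rho = 1.0, -2.0, 2.0, 0.5, 0.6

def f(x, y):
    """Bivariate normal pdf, implemented directly from the formula above."""
    zx = (x - mu) / sigma
    zy = (y - nu) / tau
    q = (zx**2 - 2 * rho * zx * zy + zy**2) / (1 - rho**2)
    return np.exp(-q / 2) / (2 * np.pi * sigma * tau * np.sqrt(1 - rho**2))

# the same density, parameterized by the mean vector and variance-covariance matrix
V = [[sigma**2, rho * sigma * tau], [rho * sigma * tau, tau**2]]
dist = multivariate_normal(mean=[mu, nu], cov=V)

print(f(1.3, -1.8), dist.pdf([1.3, -1.8]))  # the two values should agree
```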

For \(c \gt 0\), the set of points \(\{(x, y): f(x, y) = c\}\) is called a level curve of \(f\) (these are curves of constant probability density).

The level curves of \(f\) satisfy the following properties:

  1. The curves are ellipses centered at \((\mu, \nu)\).
  2. The axes of these ellipses are parallel to the coordinate axes if and only if \(\rho = 0\).
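
These properties follow from rewriting the equation \(f(x, y) = c\) in terms of the quadratic form in the exponent:

\[ \frac{(x - \mu)^2}{\sigma^2} - 2 \, \rho \frac{(x - \mu)(y - \nu)}{\sigma \, \tau} + \frac{(y - \nu)^2}{\tau^2} = 2 (1 - \rho^2) \ln \left( \frac{1}{2 \, \pi \, \sigma \, \tau \, \sqrt{1 - \rho^2} \, c} \right) \]

Since \(|\rho| \lt 1\), the quadratic form on the left is positive definite, so for \(c\) smaller than the maximum value of \(f\) the solution set is an ellipse centered at \((\mu, \nu)\). The cross term, which is what tilts the axes of the ellipse away from the coordinate directions, vanishes if and only if \(\rho = 0\).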

In the bivariate normal experiment, run the experiment 1000 times for selected values of the parameters. Observe the cloud of points in the scatterplot and note the apparent convergence of the empirical density function to the probability density function.

Transformations

The following exercise shows that the bivariate normal distribution is preserved under linear (more precisely, affine) transformations.

Define \(W = a_1 + b_1 X + c_1 Y\) and \(Z = a_2 + b_2 X + c_2 Y\), where the coefficients are in \(\R\) and \(b_1 c_2 - c_1 b_2 \ne 0\). Then \((W, Z)\) has a bivariate normal distribution. Moreover,

  1. \(\E(W) = a_1 + b_1 \mu + c_1 \nu\)
  2. \(\E(Z) = a_2 + b_2 \mu + c_2 \nu\)
  3. \(\var(W) = b_1^2 \sigma^2 + c_1^2 \tau^2 + 2 b_1 c_1 \rho \sigma \tau\)
  4. \(\var(Z) = b_2^2 \sigma^2 + c_2^2 \tau^2 + 2 b_2 c_2 \rho \sigma \tau\)
  5. \(\cov(Z, W) = b_1 b_2 \sigma^2 + c_1 c_2 \tau^2 + (b_1 c_2 + b_2 c_1) \rho \sigma \tau \)
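
The moment formulas are easy to verify numerically. A minimal Python sketch, with arbitrary illustrative parameter values and coefficients, simulating \((X, Y)\) as in the definition:

```python
import numpy as np

rng = np.random.default_rng(1)

# illustrative parameters and coefficients (arbitrary choices, with b1*c2 - c1*b2 != 0)
mu, nu, sigma, tau, rho = 1.0, -2.0, 2.0, 0.5, 0.6
a1, b1, c1 = 0.5, 1.0, -2.0
a2, b2, c2 = -1.0, 3.0, 0.5

# simulate (X, Y) from the definition
n = 200_000
U, V = rng.standard_normal(n), rng.standard_normal(n)
X = mu + sigma * U
Y = nu + tau * rho * U + tau * np.sqrt(1 - rho**2) * V

W = a1 + b1 * X + c1 * Y
Z = a2 + b2 * X + c2 * Y

# compare simulated moments with the formulas above
print(W.var(), b1**2 * sigma**2 + c1**2 * tau**2 + 2 * b1 * c1 * rho * sigma * tau)
print(Z.var(), b2**2 * sigma**2 + c2**2 * tau**2 + 2 * b2 * c2 * rho * sigma * tau)
print(np.cov(W, Z)[0, 1],
      b1 * b2 * sigma**2 + c1 * c2 * tau**2 + (b1 * c2 + b2 * c1) * rho * sigma * tau)
```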

Bivariate normal distributions are also preserved under conditioning.

The conditional distribution of \(Y\) given \(X = x\) is normal with mean and variance given by

  1. \(\E(Y \mid X = x) = \nu + \rho \, \tau \frac{x - \mu}{\sigma}\)
  2. \(\var(Y \mid X = x) = \tau^2 (1 - \rho^2)\)

Note that the conditional variance does not depend on \(x\).

From the definition of \(X\) and \(Y\) in terms of the independent standard normal variables \(U\) and \(V\) note that

\[ Y = \nu + \rho \, \tau \frac{X - \mu}{\sigma} + \tau \, \sqrt{1 - \rho^2} \, V \]

This result can be used to give another proof of the result in Exercise 10 (note that \(X\) and \(V\) are independent).

In the bivariate normal experiment, set the standard deviation of \(X\) to 1.5, the standard deviation of \(Y\) to 0.5, and the correlation to 0.7.

  1. Run the experiment 100 times.
  2. For each run, compute \(\E(Y \mid X = x)\), the predicted value of \(Y\) given the observed value of \(X\).
  3. Over all 100 runs, compute the square root of the average of the squared errors between the predicted value of \(Y\) and the true value of \(Y\).
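
The following Python sketch mimics this experiment with a NumPy simulation in place of the applet (the means are taken to be 0 for simplicity). The root-mean-square prediction error should come out close to the conditional standard deviation \(\tau \sqrt{1 - \rho^2} = 0.5 \sqrt{1 - 0.7^2} \approx 0.357\).

```python
import numpy as np

rng = np.random.default_rng(2)

# parameter values from the experiment above; means set to 0 for simplicity
mu, nu = 0.0, 0.0
sigma, tau, rho = 1.5, 0.5, 0.7

runs = 100
U, V = rng.standard_normal(runs), rng.standard_normal(runs)
X = mu + sigma * U
Y = nu + tau * rho * U + tau * np.sqrt(1 - rho**2) * V

# predicted value of Y for each observed X, using E(Y | X = x)
Y_pred = nu + rho * tau * (X - mu) / sigma

rmse = np.sqrt(np.mean((Y - Y_pred) ** 2))
print(rmse)  # should be near tau * sqrt(1 - rho^2), about 0.357
```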

The following problem is a good exercise in using the change of variables formula and will be useful when we discuss the simulation of normal variables.

Recall that \(U\) and \(V\) are independent random variables each with the standard normal distribution. Define the polar coordinates \((R, \Theta)\) of \((U, V)\) by the equations \(U = R \, \cos(\Theta)\), \(V = R \, \sin(\Theta)\) where \(R \ge 0\) and \(0 \le \Theta \lt 2 \, \pi\). Then

  1. \(R\) has probability density function \(g(r) = r \, e^{-\frac{1}{2} r^2}\) for \(r \ge 0\).
  2. \(\Theta\) is uniformly distributed on \([0, 2 \, \pi)\).
  3. \(R\) and \(\Theta\) are independent.

The distribution of \(R\) is known as the Rayleigh distribution, named for John William Strutt, Lord Rayleigh. It is a member of the family of Weibull distributions, named in turn for Waloddi Weibull.
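
This result is the basis of the classical Box-Muller method for simulating normal variables: generate \(\Theta\) uniformly on \([0, 2\pi)\), generate \(R\) by inverting the Rayleigh distribution function, and recover \(U\) and \(V\) from the polar coordinates. A minimal Python sketch (the use of NumPy and the explicit inversion step are the only ingredients not given above):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# Theta is uniform on [0, 2*pi)
theta = 2 * np.pi * rng.random(n)

# R has distribution function G(r) = 1 - exp(-r^2 / 2); invert it on a uniform sample
w = 1.0 - rng.random(n)      # uniform on (0, 1], so the logarithm is finite
r = np.sqrt(-2 * np.log(w))

# recover a pair of independent standard normal variables
u = r * np.cos(theta)
v = r * np.sin(theta)

print(u.mean(), u.std(), v.mean(), v.std(), np.corrcoef(u, v)[0, 1])
```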

The General Multivariate Normal Distribution

The general multivariate normal distribution is a natural generalization of the bivariate normal distribution studied above. The exposition is very compact and elegant using expected value and covariance matrices, and would be horribly complex without these tools. Thus, this section requires some prerequisite knowledge of linear algebra. In particular, recall that \(\boldsymbol{A}^T\) denotes the transpose of a matrix \(\boldsymbol{A}\) and that we identify a vector in \(\R^n\) with the corresponding \(n \times 1\) column vector.

The Standard Normal Distribution

Suppose that \(\bs{Z} = (Z_1, Z_2, \ldots, Z_n)\) is a vector of independent random variables, each with the standard normal distribution. Then \(\bs{Z}\) is said to have the \(n\)-dimensional standard normal distribution.

\(\E(\bs{Z}) = \bs{0}\) (the zero vector in \(\R^n\)).

\(\vc(\bs{Z}) = I\) (the \(n \times n\) identity matrix).

\(\bs{Z}\) has probability density function

\[ \phi(\bs{z}) = \frac{1}{(2 \, \pi)^{n/2}} \exp \left( -\frac{1}{2} \bs{z} \cdot \bs{z} \right), \quad \bs{z} \in \R^n \]

\(\bs{Z}\) has moment generating function given by

\[ \E[\exp(\bs{t} \cdot \bs{Z})] = \exp \left( \frac{1}{2} \bs{t} \cdot \bs{t} \right), \quad \bs{t} \in \R^n \]
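
A short simulation check of the mean vector and variance-covariance matrix, using NumPy with an arbitrary choice of dimension:

```python
import numpy as np

rng = np.random.default_rng(4)
n, samples = 3, 200_000

# each row of Z is one observation of the n-dimensional standard normal vector
Z = rng.standard_normal((samples, n))

print(Z.mean(axis=0))           # approximately the zero vector
print(np.cov(Z, rowvar=False))  # approximately the identity matrix
```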

The General Normal Distribution

Now suppose that \(\bs{Z}\) has the \(n\)-dimensional standard normal distribution. Suppose also that \(\bs{\mu} \in \R^n\) and that \(\bs{A} \in \R^{n \times n}\) is invertible. The random vector \(\bs{X} = \bs{\mu} + \bs{A} \, \bs{Z}\) is said to have an \(n\)-dimensional normal distribution.

\(\E(\bs{X}) = \bs{\mu}\).

\(\vc(\bs{X}) = \bs{A} \, \bs{A}^T\). This matrix is invertible and positive definite.

Let \(\bs{V} = \vc(\bs{X}) = \bs{A} \, \bs{A}^T\). Then \(\bs{X}\) has probability density function

\[ f(\bs{x}) = \frac{1}{(2 \, \pi)^{n/2} \sqrt{\det(\bs{V})}} \exp \left[ -\frac{1}{2} (\bs{x} - \bs{\mu}) \cdot \bs{V}^{-1} (\bs{x} - \bs{\mu}) \right], \quad \bs{x} \in \R^n \]
Proof:

Use the multivariate change of variables theorem.

\(\bs{X}\) has moment generating function given by

\[ \E[\exp(\bs{t} \cdot \bs{X})] = \exp \left( \bs{\mu} \cdot \bs{t} + \frac{1}{2} \bs{t} \cdot \bs{V} \, \bs{t} \right), \quad \bs{t} \in \R^n \]

Note that the matrix \(\bs{A}\) that occurs in the transformation is not unique, but of course the variance-covariance matrix \(\bs{V}\) is unique. In general, for a given positive definite matrix \(\bs{V}\), there are many invertible matrices \(\bs{A}\) such that \( \bs{V} = \bs{A} \, \bs{A}^T\). A theorem in matrix theory (the Cholesky decomposition) states that there is a unique lower triangular matrix \(\bs{L}\) with positive diagonal entries that has this property.

Identify the lower triangular matrix \(\bs{L}\) for the bivariate normal distribution.
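
One way to approach this exercise is numerically. NumPy's Cholesky routine returns the lower triangular factor of a positive definite matrix, so applying it to the bivariate variance-covariance matrix suggests the general form of \(\bs{L}\); the parameter values below are arbitrary illustrations.

```python
import numpy as np

# illustrative bivariate parameters (arbitrary choices)
sigma, tau, rho = 2.0, 0.5, 0.6

V = np.array([[sigma**2,          rho * sigma * tau],
              [rho * sigma * tau, tau**2           ]])

L = np.linalg.cholesky(V)  # lower triangular L with V = L @ L.T
print(L)

# compare with the matrix that defines (X, Y) from (U, V) at the top of this page
print(np.array([[sigma, 0.0],
                [rho * tau, tau * np.sqrt(1 - rho**2)]]))
```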

Transformations

The multivariate normal distribution is invariant under two basic types of transformations: affine transformation with an invertible matrix, and the formation of subsequences.

Suppose that \(\bs{X}\) has an \(n\)-dimensional normal distribution with mean vector \(\bs{\mu}\) and variance-covariance matrix \(\bs{V}\). Suppose also that \(\bs{a} \in \R^n\) and that \(\bs{B} \in \R^{n \times n}\) is invertible. Then \(\bs{Y} = \bs{a} + \bs{B} \, \bs{X}\) has a multivariate normal distribution. Moreover,

  1. \(\E(\bs{Y}) = \bs{a} + \bs{B} \, \bs{\mu}\)
  2. \(\vc(\bs{Y}) = \bs{B} \, \bs{V} \, \bs{B}^T\)

Suppose that \(\bs{X}\) has an \(n\)-dimensional normal distribution. Then any permutation of the coordinates of \(\bs{X}\) also has an \(n\)-dimensional normal distribution.

Proof:

Permuting the coordinates of \(\bs{X}\) corresponds to multiplication of \(\bs{X}\) by a permutation matrix--a matrix of 0's and 1's in which each row and column has a single 1.

Suppose that \(\bs{X} = (X_1, X_2, \ldots, X_n)\) has an \(n\)-dimensional normal distribution. If \(k \lt n\), then \(\bs{W} = (X_1, X_2, \ldots, X_k)\) has a \(k\)-dimensional normal distribution.

If \(\bs{X} = (X_1, X_2, \ldots, X_n)\) has an \(n\)-dimensional normal distribution and if \((i_1, i_2, \ldots, i_k)\) is a sequence of distinct indices, then \(\bs{W} = (X_{i_1}, X_{i_2}, \ldots, X_{i_k})\) has a \(k\)-dimensional normal distribution.

Proof:

Use the results of Exercise 24 and Exercise 25.

Suppose that \(\bs{X}\) has an \(n\)-dimensional normal distribution with mean vector \(\bs{\mu}\) and variance-covariance matrix \(\bs{V}\). Suppose also that \(\bs{a} \in \R^m\) and that \(\bs{B} \in \R^{m \times n}\) has linearly independent rows (thus, \(m \le n\)). Then \(\bs{Y} = \bs{a} + \bs{B} \, \bs{X}\) has an \(m\)-dimensional normal distribution.

  1. \(\E(\bs{Y}) = \bs{a} + \bs{B} \, \bs{\mu}\)
  2. \(\vc(\bs{Y}) = \bs{B} \, \bs{V} \, \bs{B}^T\)
Proof:

There exists an invertible \(n \times n\) matrix \(\bs{C}\) for which the first \(m\) rows are the rows of \(\bs{B}\). Now use Exercise 23 and Exercise 25.

Note that the results in Exercises 23, 24, 25, and 26 are special cases of the result in Exercise 27.

Suppose that \(\bs{X}\) has an \(n\)-dimensional normal distribution, \(\bs{Y}\) has an \(m\)-dimensional normal distribution, and that \(\bs{X}\) and \(\bs{Y}\) are independent. Then \((\bs{X}, \bs{Y})\) has an \((m + n)\)-dimensional normal distribution.

Suppose that \(\bs{X}\) is a random vector in \(\R^m\), \(\bs{Y}\) is a random vector in \(\R^n\), and that \((\bs{X}, \bs{Y})\) has an \((m + n)\)-dimensional normal distribution. Then \(\bs{X}\) and \(\bs{Y}\) are independent if and only if \(\cov(\bs{X}, \bs{Y}) = \bs{0}\) (the \(m \times n\) zero matrix).

Suppose that \(\bs{X} = (X_1, X_2, \ldots, X_n)\) has an \(n\)-dimensional normal distribution with mean vector \(\bs{\mu} = (\mu_1,\mu_2, \ldots, \mu_n)\) and variance-covariance matrix \(\bs{V} = (\sigma_{ij}: i, j \in \{1, 2, \ldots, n\})\), and that \(a_1, a_2, \ldots, a_n \in \R\) (not all 0). Then \(Y = \sum_{i=1}^n a_i X_i\) has a (univariate) normal distribution with

  1. \(\E(Y) = \sum_{i=1}^n a_i \mu_i\)
  2. \(\var(Y) = \sum_{i=1}^n \sum_{j=1}^n a_i a_j \sigma_{ij}\)
Proof:

This follows from Theorems 26 and 27. Let \(\bs{a} = [a_{i_1}, a_{i_2}, \ldots, a_{i_k}]\) denote the row vector of nonzero coefficients, and let \(\bs{W} = (X_{i_1}, X_{i_2}, \ldots, X_{i_k})\). Then \(\bs{W}\) has a \(k\)-dimensional normal distribution and \(Y = \bs{a} \, \bs{W}\).
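
In matrix-vector form, the two moments are \(\E(Y) = \bs{a} \cdot \bs{\mu}\) and \(\var(Y) = \bs{a} \cdot \bs{V} \bs{a}\). A small NumPy sketch with arbitrary illustrative values, checking the formulas against a simulation:

```python
import numpy as np

rng = np.random.default_rng(5)

# illustrative mean vector, variance-covariance matrix, and coefficients (arbitrary choices)
mu = np.array([1.0, -2.0, 0.5])
V = np.array([[2.0,  0.3,  0.1],
              [0.3,  1.0, -0.2],
              [0.1, -0.2,  0.5]])
a = np.array([1.0, 2.0, -1.0])

# moments from the formulas
print(a @ mu, a @ V @ a)

# simulation check: sample X from the multivariate normal and form Y = sum_i a_i X_i
X = rng.multivariate_normal(mu, V, size=200_000)
Y = X @ a
print(Y.mean(), Y.var())
```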

A Further Generalization

The converse of the previous theorem is not true. For example, if \(X\) has a (univariate) normal distribution, then every non-trivial linear combination of the coordinates of \((X, X)\) has a normal distribution (if we allow constant random variables as part of the normal family), but \((X, X)\) does not have a 2-dimensional normal distribution in the sense defined above. Indeed, \((X, X)\), although it has a continuous distribution, does not have a probability density function with respect to the standard (Lebesgue) measure on \(\R^2\); that is, it has a degenerate continuous distribution on \(\R^2\).

But the converse to Theorem 30 yields a slightly more general definition of the multivariate normal distribution that is very useful in some settings, because it greatly simplifies the formulation of definitions and theorems. With this definition, we say that \(\bs{X}\) has a multivariate normal distribution if for every \(a_1, a_2, \ldots, a_n \in \R\), the linear combination \(Y = \sum_{i=1}^n a_i X_i\) has a (univariate) normal distribution (where constant random variables are again counted as members of the normal family). If \(\bs{X}\) has a multivariate normal distribution in this sense, and the distribution of \(\bs{X}\) is not degenerate, then \(\bs{X}\) has a multivariate normal distribution in the sense of our previous definition.