The main purpose of this section is a discussion of expected value and covariance for random matrices and vectors. These topics are particularly important in multivariate statistical models and the multivariate normal distribution. This section requires some prerequisite knowledge of linear algebra.
Basic Theory
We will let $\mathbb{R}^{m \times n}$ denote the space of all $m \times n$ matrices of real numbers. In particular, we will identify $\mathbb{R}^n$ with $\mathbb{R}^{n \times 1}$, so that an ordered $n$-tuple can also be thought of as an $n \times 1$ column vector. The transpose of a matrix $A$ is denoted $A^T$.
As usual, our starting point is a random experiment with a probability measure $\mathbb{P}$ on an underlying sample space.
Expected Value of a Random Matrix
Suppose that $X$ is an $m \times n$ matrix of real-valued random variables, whose $(i, j)$ entry is denoted $X_{ij}$. Equivalently, $X$ can be thought of as a random $m \times n$ matrix. It is natural to define the expected value $\mathbb{E}(X)$ to be the $m \times n$ matrix whose $(i, j)$ entry is $\mathbb{E}(X_{ij})$, the expected value of $X_{ij}$.
Many of the basic properties of expected value of random variables have analogs for expected value of random matrices, with matrix operations replacing the ordinary ones.
Show that $\mathbb{E}(X + Y) = \mathbb{E}(X) + \mathbb{E}(Y)$ if $X$ and $Y$ are random $m \times n$ matrices.
Show that $\mathbb{E}(A X) = A \, \mathbb{E}(X)$ if $A$ is a non-random $m \times n$ matrix and $X$ is a random $n \times k$ matrix.
Show that $\mathbb{E}(X Y) = \mathbb{E}(X) \, \mathbb{E}(Y)$ if $X$ is a random $m \times n$ matrix, $Y$ is a random $n \times k$ matrix, and $X$ and $Y$ are independent.
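The three properties can be checked exactly on a small finite example. The following is a minimal sketch, not a proof: the distributions `X_dist` and `Y_dist`, the matrix `A`, and the helper names are all hypothetical choices made for illustration, and exact rational arithmetic is used so that the comparisons are not affected by rounding.

```python
from fractions import Fraction as F
from itertools import product

def mat_add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def expect(dist):
    """Entrywise expected value; dist is a list of (probability, matrix) pairs."""
    rows, cols = len(dist[0][1]), len(dist[0][1][0])
    return [[sum(p * M[i][j] for p, M in dist) for j in range(cols)]
            for i in range(rows)]

# Hypothetical distributions of two independent random 2 x 2 matrices.
X_dist = [(F(1, 2), [[1, 0], [2, 1]]), (F(1, 2), [[0, 3], [1, 1]])]
Y_dist = [(F(1, 3), [[1, 1], [0, 2]]), (F(2, 3), [[2, 0], [1, 1]])]
A = [[1, 2], [0, 1]]  # hypothetical non-random matrix

# Independence: each joint probability is the product of the marginals.
joint = [(p * q, X, Y) for (p, X), (q, Y) in product(X_dist, Y_dist)]

EX, EY = expect(X_dist), expect(Y_dist)
E_sum = expect([(p, mat_add(X, Y)) for p, X, Y in joint])  # E(X + Y)
E_AX = expect([(p, mat_mul(A, X)) for p, X in X_dist])     # E(AX)
E_XY = expect([(p, mat_mul(X, Y)) for p, X, Y in joint])   # E(XY)

print(E_sum == mat_add(EX, EY), E_AX == mat_mul(A, EX), E_XY == mat_mul(EX, EY))
```

Replacing `joint` with a dependent joint distribution would generally break only the product rule, which is the one property that needs independence.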
Covariance Matrices
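Suppose now that $X$ is a random vector in $\mathbb{R}^m$ and $Y$ is a random vector in $\mathbb{R}^n$. The covariance matrix of $X$ and $Y$ is the $m \times n$ matrix $\operatorname{cov}(X, Y)$ whose $(i, j)$ entry is $\operatorname{cov}(X_i, Y_j)$, the covariance of $X_i$ and $Y_j$.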
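Show that $\operatorname{cov}(X, Y) = \mathbb{E}\left([X - \mathbb{E}(X)] [Y - \mathbb{E}(Y)]^T\right)$.
Show that $\operatorname{cov}(X, Y) = \mathbb{E}(X Y^T) - \mathbb{E}(X) [\mathbb{E}(Y)]^T$.
Show that $\operatorname{cov}(Y, X) = [\operatorname{cov}(X, Y)]^T$.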
Show that $\operatorname{cov}(X, Y) = 0$ if and only if each coordinate of $X$ is uncorrelated with each coordinate of $Y$ (in particular, this holds if $X$ and $Y$ are independent).
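To make the definition concrete, the sketch below computes a covariance matrix for a small finite joint distribution in two ways: from the centered products $\mathbb{E}[(X_i - \mathbb{E} X_i)(Y_j - \mathbb{E} Y_j)]$ and from the shortcut $\mathbb{E}(X_i Y_j) - \mathbb{E}(X_i) \mathbb{E}(Y_j)$. It also checks that swapping the roles of $X$ and $Y$ transposes the matrix. The distribution `joint` and all function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction as F

def mean(pairs):
    """Expected value of a finite random vector; pairs = [(prob, vector), ...]."""
    n = len(pairs[0][1])
    return [sum(p * v[i] for p, v in pairs) for i in range(n)]

def cov_definition(joint):
    """Entry (i, j) is E[(X_i - E X_i)(Y_j - E Y_j)]; joint = [(prob, x, y), ...]."""
    mx = mean([(p, x) for p, x, _ in joint])
    my = mean([(p, y) for p, _, y in joint])
    return [[sum(p * (x[i] - mx[i]) * (y[j] - my[j]) for p, x, y in joint)
             for j in range(len(my))] for i in range(len(mx))]

def cov_shortcut(joint):
    """Entry (i, j) is E(X_i Y_j) - E(X_i) E(Y_j)."""
    mx = mean([(p, x) for p, x, _ in joint])
    my = mean([(p, y) for p, _, y in joint])
    return [[sum(p * x[i] * y[j] for p, x, y in joint) - mx[i] * my[j]
             for j in range(len(my))] for i in range(len(mx))]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Hypothetical joint distribution: X takes values in R^2, Y in R^3.
joint = [
    (F(1, 4), (0, 1), (1, 0, 2)),
    (F(1, 4), (1, 1), (0, 1, 1)),
    (F(1, 2), (2, 0), (1, 1, 0)),
]
swapped = [(p, y, x) for p, x, y in joint]

C = cov_definition(joint)  # the 2 x 3 matrix cov(X, Y)
print(C == cov_shortcut(joint), cov_definition(swapped) == transpose(C))
```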
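Show that $\operatorname{cov}(X + Y, Z) = \operatorname{cov}(X, Z) + \operatorname{cov}(Y, Z)$ if $X$ and $Y$ are random vectors in $\mathbb{R}^m$ and $Z$ is a random vector in $\mathbb{R}^n$.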
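Show that $\operatorname{cov}(X, Y + Z) = \operatorname{cov}(X, Y) + \operatorname{cov}(X, Z)$ if $X$ is a random vector in $\mathbb{R}^m$ and $Y$ and $Z$ are random vectors in $\mathbb{R}^n$.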
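Show that $\operatorname{cov}(A X, Y) = A \operatorname{cov}(X, Y)$ if $X$ is a random vector in $\mathbb{R}^m$, $Y$ is a random vector in $\mathbb{R}^n$, and $A$ is a non-random matrix in $\mathbb{R}^{k \times m}$.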
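Show that $\operatorname{cov}(X, A Y) = \operatorname{cov}(X, Y) A^T$ if $X$ is a random vector in $\mathbb{R}^m$, $Y$ is a random vector in $\mathbb{R}^n$, and $A$ is a non-random matrix in $\mathbb{R}^{k \times n}$.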
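The two scaling rules can be traced back to linearity of expected value. The following is a sketch of one possible derivation, not necessarily the intended solution. It assumes the centered-product form $\operatorname{cov}(X, Y) = \mathbb{E}\left([X - \mathbb{E}(X)][Y - \mathbb{E}(Y)]^T\right)$, the rule $\mathbb{E}(A X) = A \, \mathbb{E}(X)$ for a non-random matrix $A$, and the transpose rule $\operatorname{cov}(Y, X) = [\operatorname{cov}(X, Y)]^T$. In the first chain $A$ is $k \times m$; in the second it is $k \times n$.

$$
\begin{aligned}
\operatorname{cov}(A X, Y) &= \mathbb{E}\left([A X - A \, \mathbb{E}(X)][Y - \mathbb{E}(Y)]^T\right) \\
&= \mathbb{E}\left(A \, [X - \mathbb{E}(X)][Y - \mathbb{E}(Y)]^T\right) = A \operatorname{cov}(X, Y), \\
\operatorname{cov}(X, A Y) &= [\operatorname{cov}(A Y, X)]^T = [A \operatorname{cov}(Y, X)]^T = [\operatorname{cov}(Y, X)]^T A^T = \operatorname{cov}(X, Y) A^T.
\end{aligned}
$$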
Variance-Covariance Matrices
Suppose now that $X = (X_1, X_2, \ldots, X_n)$ is a random vector in $\mathbb{R}^n$. The covariance matrix of $X$ with itself is called the variance-covariance matrix of $X$:
$$\operatorname{VC}(X) = \operatorname{cov}(X, X)$$
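Show that $\operatorname{VC}(X)$ is a symmetric $n \times n$ matrix with $\operatorname{var}(X_1), \operatorname{var}(X_2), \ldots, \operatorname{var}(X_n)$ on the diagonal.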
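Show that $\operatorname{VC}(X + Y) = \operatorname{VC}(X) + \operatorname{cov}(X, Y) + \operatorname{cov}(Y, X) + \operatorname{VC}(Y)$ if $X$ and $Y$ are random vectors in $\mathbb{R}^n$.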
Show that $\operatorname{VC}(A X) = A \operatorname{VC}(X) A^T$ if $X$ is a random vector in $\mathbb{R}^n$ and $A$ is a non-random matrix in $\mathbb{R}^{m \times n}$.
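The transformation rule $\operatorname{VC}(A X) = A \operatorname{VC}(X) A^T$ can be checked exactly on a finite distribution. This is a small sketch under assumed inputs: the distribution `X_dist`, the matrix `A`, and the helper names are hypothetical and serve only as an illustration.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(M):
    return [list(col) for col in zip(*M)]

def vc(pairs):
    """Variance-covariance matrix of a finite random vector.
    pairs = [(prob, vector), ...]."""
    n = len(pairs[0][1])
    mu = [sum(p * v[i] for p, v in pairs) for i in range(n)]
    return [[sum(p * (v[i] - mu[i]) * (v[j] - mu[j]) for p, v in pairs)
             for j in range(n)] for i in range(n)]

def apply(A, v):
    """Matrix-vector product A v."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

# Hypothetical distribution of X in R^3 and a non-random 2 x 3 matrix A.
X_dist = [(F(1, 5), (1, 0, 2)), (F(2, 5), (0, 1, 1)), (F(2, 5), (3, 1, 0))]
A = [[1, -1, 0], [2, 0, 1]]

AX_dist = [(p, apply(A, x)) for p, x in X_dist]
lhs = vc(AX_dist)                                    # VC(AX), computed directly
rhs = mat_mul(mat_mul(A, vc(X_dist)), transpose(A))  # A VC(X) A^T

print(lhs == rhs, lhs == transpose(lhs))
```

The second comparison confirms that the resulting $2 \times 2$ matrix is symmetric, as any variance-covariance matrix must be.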
If $a \in \mathbb{R}^n$, note that $a^T X$ is simply the inner product or dot product of $a$ with $X$, and is a linear combination of the coordinates of $X$:
$$a^T X = \sum_{i=1}^n a_i X_i$$
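Show that $\operatorname{var}(a^T X) = a^T \operatorname{VC}(X) \, a$ if $X$ is a random vector in $\mathbb{R}^n$ and $a \in \mathbb{R}^n$. Thus conclude that $\operatorname{VC}(X)$ is either positive semi-definite or positive definite. In particular, the eigenvalues and the determinant of $\operatorname{VC}(X)$ are nonnegative.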
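One way to see the quadratic-form identity, sketched here under the assumption that the scaling rules $\operatorname{cov}(A X, Y) = A \operatorname{cov}(X, Y)$ and $\operatorname{cov}(X, A Y) = \operatorname{cov}(X, Y) A^T$ are available, is to treat $a^T$ as a non-random $1 \times n$ matrix:

$$
\operatorname{var}(a^T X) = \operatorname{cov}(a^T X, a^T X) = a^T \operatorname{cov}(X, X) \, (a^T)^T = a^T \operatorname{VC}(X) \, a \ge 0
$$

For $n = 2$ this reads $\operatorname{var}(a_1 X_1 + a_2 X_2) = a_1^2 \operatorname{var}(X_1) + 2 a_1 a_2 \operatorname{cov}(X_1, X_2) + a_2^2 \operatorname{var}(X_2)$. Since the left side is a variance, the quadratic form is nonnegative for every $a$, which is what rules out negative eigenvalues.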
Show that $\operatorname{VC}(X)$ is positive semi-definite (but not positive definite) if and only if there exist a nonzero $a \in \mathbb{R}^n$ and $c \in \mathbb{R}$ such that, with probability 1,
$$a^T X = \sum_{i=1}^n a_i X_i = c$$
Thus, if $\operatorname{VC}(X)$ is positive semi-definite but not positive definite, then one of the coordinates of $X$ can be written as an affine transformation of the other coordinates (and hence can usually be eliminated in the underlying model). By contrast, if $\operatorname{VC}(X)$ is positive definite, then this cannot happen; $\operatorname{VC}(X)$ has positive eigenvalues and determinant and is invertible.
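A degenerate case is easy to build by hand. In the sketch below, which uses a hypothetical model chosen purely for illustration, $X_1$ and $X_2$ are the scores of two independent fair dice and $X_3 = X_1 + X_2$, so that $X_1 + X_2 - X_3 = 0$ with probability 1. The variance-covariance matrix should then be singular, and the vector $a = (1, 1, -1)$ should give a zero quadratic form.

```python
from fractions import Fraction as F
from itertools import product

def vc(pairs):
    """Variance-covariance matrix of a finite random vector.
    pairs = [(prob, vector), ...]."""
    n = len(pairs[0][1])
    mu = [sum(p * v[i] for p, v in pairs) for i in range(n)]
    return [[sum(p * (v[i] - mu[i]) * (v[j] - mu[j]) for p, v in pairs)
             for j in range(n)] for i in range(n)]

def det3(M):
    """Determinant of a 3 x 3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = M
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

# Hypothetical model: two independent fair dice and their sum.
dist = [(F(1, 36), (u, v, u + v)) for u, v in product(range(1, 7), repeat=2)]

V = vc(dist)
coef = (1, 1, -1)  # X1 + X2 - X3 is constant (equal to 0)
quad = sum(coef[i] * V[i][j] * coef[j] for i in range(3) for j in range(3))

print(V)
print(det3(V), quad)
```

Dropping the third coordinate leaves the $2 \times 2$ matrix for the two dice alone, which is diagonal with positive entries and hence positive definite.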
Best Linear Predictors
Suppose again that $X = (X_1, X_2, \ldots, X_m)$ is a random vector in $\mathbb{R}^m$ and that $Y = (Y_1, Y_2, \ldots, Y_n)$ is a random vector in $\mathbb{R}^n$. We are interested in finding the linear (technically affine) function of $X$,
$$A X + b, \quad A \in \mathbb{R}^{n \times m}, \; b \in \mathbb{R}^n$$
that is closest to $Y$ in the mean square sense. This problem is of fundamental importance in statistics when the predictor vector $X$ is observable but the response vector $Y$ is not. Our discussion here generalizes the one-dimensional case, when $X$ and $Y$ are random variables. That problem was solved in the section on Covariance and Correlation. We will assume that $\operatorname{VC}(X)$ is positive definite, so that none of the coordinates of $X$ can be written as an affine function of the other coordinates.
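Show that $\mathbb{E}\left(\|Y - (A X + b)\|^2\right)$ is minimized when
$$A = \operatorname{cov}(Y, X) \, [\operatorname{VC}(X)]^{-1}$$
and
$$b = \mathbb{E}(Y) - \operatorname{cov}(Y, X) \, [\operatorname{VC}(X)]^{-1} \, \mathbb{E}(X)$$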
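Thus, the linear function of $X$ that is closest to $Y$ in the mean square sense is the random vector
$$L(Y \mid X) = \mathbb{E}(Y) + \operatorname{cov}(Y, X) \, [\operatorname{VC}(X)]^{-1} \, [X - \mathbb{E}(X)]$$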
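The function of $x$ given by
$$L(Y \mid X = x) = \mathbb{E}(Y) + \operatorname{cov}(Y, X) \, [\operatorname{VC}(X)]^{-1} \, [x - \mathbb{E}(X)]$$
is known as the (distribution) linear regression function. If we observe $X = x$ then $L(Y \mid X = x)$ is our prediction of $Y$.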
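The recipe $A = \operatorname{cov}(Y, X) [\operatorname{VC}(X)]^{-1}$, $b = \mathbb{E}(Y) - A \, \mathbb{E}(X)$ can be carried out exactly for a small finite distribution. The sketch below is illustrative only: the joint distribution and helper names are hypothetical, the predictor is restricted to $\mathbb{R}^2$ so that a closed-form $2 \times 2$ inverse suffices, and the check at the end relies on the standard fact that the prediction error of the best affine predictor has mean zero and is uncorrelated with every coordinate of $X$.

```python
from fractions import Fraction as F

def mean(pairs):
    """Expected value of a finite random vector; pairs = [(prob, vector), ...]."""
    n = len(pairs[0][1])
    return [sum(p * v[i] for p, v in pairs) for i in range(n)]

def cov(joint):
    """cov(U, V) from the definition; joint = [(prob, u, v), ...]."""
    mu = mean([(p, u) for p, u, _ in joint])
    mv = mean([(p, v) for p, _, v in joint])
    return [[sum(p * (u[i] - mu[i]) * (v[j] - mv[j]) for p, u, v in joint)
             for j in range(len(mv))] for i in range(len(mu))]

def inv2(M):
    """Inverse of an invertible 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mat_mul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# Hypothetical joint distribution: predictor X in R^2, response Y in R^1.
joint = [
    (F(1, 4), (0, 0), (1,)),
    (F(1, 4), (1, 0), (2,)),
    (F(1, 4), (0, 1), (4,)),
    (F(1, 4), (1, 2), (5,)),
]
EX = mean([(p, x) for p, x, _ in joint])
EY = mean([(p, y) for p, _, y in joint])
VCX = cov([(p, x, x) for p, x, _ in joint])
cov_YX = cov([(p, y, x) for p, x, y in joint])

A = mat_mul(cov_YX, inv2(VCX))  # cov(Y, X) VC(X)^{-1}
b = [ey - sum(a * ex for a, ex in zip(row, EX)) for ey, row in zip(EY, A)]

def predict(x):
    """L(Y | X = x) = A x + b."""
    return [bi + sum(a * xi for a, xi in zip(row, x)) for bi, row in zip(b, A)]

# Prediction error Y - L(Y | X), paired with X for the orthogonality check.
resid = [(p, tuple(yi - li for yi, li in zip(y, predict(x))), x) for p, x, y in joint]
print(A, b, mean([(p, r) for p, r, _ in resid]), cov(resid))
```

For a predictor with more than two coordinates, `inv2` would need to be replaced by a general linear solver.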
Non-linear regression with a single, real-valued predictor variable can be thought of as a special case of multiple linear regression. Thus, suppose that $X$ is the predictor variable, $Y$ is the response variable, and that $g_1, g_2, \ldots, g_n$ is a sequence of real-valued functions. We can apply the results of Exercise 17 to find the linear function of $(g_1(X), g_2(X), \ldots, g_n(X))$ that is closest to $Y$ in the mean square sense. We just replace $X_i$ with $g_i(X)$ for each $i$.
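The substitution of $g_i(X)$ for $X_i$ can be sketched in code. The example below is a hypothetical illustration: $X$ is uniform on $\{0, 1, 2, 3\}$ and $Y = X^2 + \varepsilon$ with $\varepsilon = \pm 1$ equally likely and independent of $X$. It fits the best predictor based on $X$ alone and the best predictor based on $(X, X^2)$, solving $\operatorname{VC}(G) \, a = \operatorname{cov}(G, Y)$ by exact Gauss-Jordan elimination, where $G = (g_1(X), \ldots, g_n(X))$. The function names are assumptions of this sketch.

```python
from fractions import Fraction as F

def solve(M, rhs):
    """Solve M a = rhs by Gauss-Jordan elimination (M assumed invertible)."""
    n = len(M)
    aug = [list(row) + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = next(r for r in range(c, n) if aug[r][c] != 0)
        aug[c], aug[piv] = aug[piv], aug[c]
        aug[c] = [v / aug[c][c] for v in aug[c]]
        for r in range(n):
            if r != c:
                factor = aug[r][c]
                aug[r] = [vr - factor * vp for vr, vp in zip(aug[r], aug[c])]
    return [row[n] for row in aug]

def best_linear(points, features):
    """Coefficients (a, b) of the mean-square-best predictor b + sum_i a_i g_i(X).
    points = [(prob, x, y), ...] with real x, y; features = [g_1, ..., g_n]."""
    rows = [(p, [g(x) for g in features], y) for p, x, y in points]
    n = len(features)
    mg = [sum(p * g[i] for p, g, _ in rows) for i in range(n)]
    my = sum(p * y for p, _, y in rows)
    vc_g = [[sum(p * (g[i] - mg[i]) * (g[j] - mg[j]) for p, g, _ in rows)
             for j in range(n)] for i in range(n)]
    c = [sum(p * (g[i] - mg[i]) * (y - my) for p, g, y in rows) for i in range(n)]
    a = solve(vc_g, c)  # VC(G) a = cov(G, Y), using symmetry of VC(G)
    b = my - sum(ai * mi for ai, mi in zip(a, mg))
    return a, b

def mse(points, features, a, b):
    """Mean square error E[(Y - b - sum_i a_i g_i(X))^2]."""
    return sum(p * (y - b - sum(ai * g(x) for ai, g in zip(a, features))) ** 2
               for p, x, y in points)

# Hypothetical data: X uniform on {0, 1, 2, 3}, Y = X^2 + noise, noise = -1 or +1.
points = [(F(1, 8), x, x * x + e) for x in range(4) for e in (-1, 1)]

linear = [lambda x: x]
quadratic = [lambda x: x, lambda x: x * x]
a1, b1 = best_linear(points, linear)
a2, b2 = best_linear(points, quadratic)
print(a1, b1, mse(points, linear, a1, b1))
print(a2, b2, mse(points, quadratic, a2, b2))
```

In this example the quadratic fit recovers $X^2$ itself, so its remaining error is just the variance of the noise, while the straight-line fit carries additional error from the curvature it cannot capture.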
Examples and Applications
Suppose that $(X, Y)$ has probability density function $f(x, y) = x + y$, $0 \le x \le 1$, $0 \le y \le 1$. Find each of the following:
$\mathbb{E}(X, Y)$
$\operatorname{VC}(X, Y)$
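As a hedged worked check for this first example, assume the density is $f(x, y) = x + y$ on the unit square. Then every mixed moment has a closed form, $\mathbb{E}(X^a Y^b) = \frac{1}{(a + 2)(b + 1)} + \frac{1}{(a + 1)(b + 2)}$, obtained by integrating $x^{a+1} y^b + x^a y^{b+1}$ term by term. The function name `moment` is a hypothetical helper introduced for this sketch.

```python
from fractions import Fraction as F

def moment(a, b):
    """E(X^a Y^b) for the assumed density f(x, y) = x + y on the unit square."""
    return F(1, (a + 2) * (b + 1)) + F(1, (a + 1) * (b + 2))

mean = (moment(1, 0), moment(0, 1))
var_x = moment(2, 0) - mean[0] ** 2
var_y = moment(0, 2) - mean[1] ** 2
cov_xy = moment(1, 1) - mean[0] * mean[1]
vc = [[var_x, cov_xy], [cov_xy, var_y]]

print(moment(0, 0))  # total probability, should be 1
print(mean)
print(vc)
```

The same pattern applies to the other polynomial densities in this section, with the moment formula adjusted to the density and region in question.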
Suppose that $(X, Y)$ has probability density function $f(x, y) = 2 (x + y)$, $0 \le x \le y \le 1$. Find each of the following:
$\mathbb{E}(X, Y)$
$\operatorname{VC}(X, Y)$
Suppose that $(X, Y)$ has probability density function $f(x, y) = 6 x^2 y$, $0 \le x \le 1$, $0 \le y \le 1$. Find each of the following:
$\mathbb{E}(X, Y)$
$\operatorname{VC}(X, Y)$
Suppose that $(X, Y)$ has probability density function $f(x, y) = 15 x^2 y$, $0 \le x \le y \le 1$. Find each of the following:
$\mathbb{E}(X, Y)$
$\operatorname{VC}(X, Y)$
$L(Y \mid X)$
$L(Y \mid X, X^2)$
Sketch the regression curves on the same set of axes.
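Suppose that $(X, Y, Z)$ is uniformly distributed on the region $\{(x, y, z) \in \mathbb{R}^3 : 0 \le x \le y \le z \le 1\}$. Find each of the following:
$\mathbb{E}(X, Y, Z)$
$\operatorname{VC}(X, Y, Z)$
$L(Z \mid X, Y)$
$L(Y \mid X, Z)$
$L(X \mid Y, Z)$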
Suppose that $X$ is uniformly distributed on $(0, 1)$, and that given $X$, $Y$ is uniformly distributed on $(0, X)$. Find each of the following: