## The Chi-Square Distribution

In this section we will study a distribution that has special importance in statistics. In particular, this distribution will arise in the study of the sample variance when the underlying distribution is normal and in goodness of fit tests.

#### The Density Function

For $$n \gt 0$$, the gamma distribution with shape parameter $$k = \frac{n}{2}$$ and scale parameter 2 is called the chi-square distribution with $$n$$ degrees of freedom. For reasons that will be clear later, $$n$$ is usually a positive integer, although technically this is not a mathematical requirement.

The chi-square distribution with $$n$$ degrees of freedom has probability density function

$f(x) = \frac{1}{2^{n/2} \Gamma(n/2)} x^{n/2 - 1} e^{-x/2}, \quad 0 \lt x \lt \infty$

In the special distribution simulator, select the chi-square distribution. Vary $$n$$ with the scroll bar and note the shape of the probability density function. For selected values of $$n$$, run the simulation 1000 times and note the apparent convergence of the empirical density function to the true density function.

The chi-square distribution with 2 degrees of freedom is the exponential distribution with scale parameter 2.
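The density formula and the special case $$n = 2$$ can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `chi2_pdf`, `exponential_pdf`, and `midpoint_integral` are introduced here for illustration and are not part of the special distribution simulator.

```python
import math

def chi2_pdf(x, n):
    """Chi-square density with n > 0 degrees of freedom, evaluated at x."""
    if x <= 0:
        return 0.0
    # Work on the log scale so that large n does not overflow the gamma function.
    log_f = (n / 2 - 1) * math.log(x) - x / 2 - (n / 2) * math.log(2) - math.lgamma(n / 2)
    return math.exp(log_f)

def exponential_pdf(x, scale):
    """Exponential density with the given scale parameter."""
    return math.exp(-x / scale) / scale if x >= 0 else 0.0

def midpoint_integral(f, a, b, steps=200_000):
    """Midpoint-rule approximation of the integral of f over (a, b)."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

# With n = 2 the two densities should agree up to rounding error.
gap = max(abs(chi2_pdf(x, 2) - exponential_pdf(x, 2)) for x in (0.1, 1.0, 3.0, 8.0))

# The density should integrate to about 1; for n = 5 the tail beyond 100 is negligible.
total_mass = midpoint_integral(lambda x: chi2_pdf(x, 5), 0.0, 100.0)

print(gap, total_mass)
```

Evaluating the logarithm of the density first is a design choice that keeps the computation stable when $$n$$ is large.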

The chi-square probability density function satisfies the following properties:

1. If $$0 \lt n \lt 2$$ then $$f$$ is decreasing and concave upward with $$f(x) \to \infty$$ as $$x \downarrow 0$$.
2. If $$n = 2$$ then $$f$$ is decreasing and concave upward with $$f(0) = \frac{1}{2}$$.
3. If $$2 \lt n \le 4$$ then $$f$$ increases on $$(0, n - 2)$$ and decreases on $$(n - 2, \infty)$$. Also, $$f$$ is concave downward on $$(0, x_1)$$ and concave upward on $$(x_1, \infty)$$, where $$x_1 = n - 2 + \sqrt{2 n - 4}$$.
4. If $$n \gt 4$$ then $$f$$ increases on $$(0, n - 2)$$ and decreases on $$(n - 2, \infty)$$. Also, $$f$$ is concave upward on $$(0, x_1)$$ and on $$(x_2, \infty)$$, and is concave downward on $$(x_1, x_2)$$, where $$x_1 = n - 2 - \sqrt{2 n - 4}$$ and $$x_2 = n - 2 + \sqrt{2 n - 4}$$.

In particular, when $$n \ge 2$$, the distribution is unimodal with mode $$n - 2$$.
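The mode and the inflection points quoted above can be recovered by differentiating the density. The following is a sketch of the calculus, where $$g$$ is an auxiliary function introduced here for convenience. For $$x \gt 0$$,

$\frac{f^\prime(x)}{f(x)} = g(x) = \frac{n - 2}{2 x} - \frac{1}{2}, \qquad \frac{f^{\prime\prime}(x)}{f(x)} = g^2(x) + g^\prime(x) = \frac{(x - n + 2)^2 - (2 n - 4)}{4 x^2}$

When $$n \gt 2$$, $$f^\prime$$ changes sign from positive to negative at $$x = n - 2$$. The second derivative vanishes where $$(x - n + 2)^2 = 2 n - 4$$, that is, at $$x = n - 2 \pm \sqrt{2 n - 4}$$, and the smaller root is positive only when $$n \gt 4$$.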

The distribution function and the quantile function do not have simple, closed-form representations. Approximate values of these functions can be obtained from the special distribution calculator, and from most mathematical and statistical software packages.
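For readers without access to the calculator, the following sketch shows one way such approximations might be computed with only the Python standard library. It assumes the standard power series for the regularized incomplete gamma function and a simple bisection search; `chi2_cdf` and `chi2_quantile` are illustrative names, and the series is only intended for moderate values of $$x$$.

```python
import math

def chi2_cdf(x, n, tol=1e-14, max_terms=10_000):
    """Approximate P(X <= x) for X chi-square with n degrees of freedom."""
    if x <= 0:
        return 0.0
    a, y = n / 2.0, x / 2.0  # gamma shape parameter; argument rescaled by the scale 2
    term = 1.0 / a
    total = term
    for j in range(1, max_terms):
        term *= y / (a + j)  # next term of the incomplete gamma series
        total += term
        if term < tol * total:
            break
    log_prefactor = a * math.log(y) - y - math.lgamma(a)
    return min(1.0, total * math.exp(log_prefactor))

def chi2_quantile(p, n, iterations=200):
    """Approximate the quantile of order p (0 < p < 1) by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, n) < p:  # expand until the bracket contains the quantile
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

# For n = 2 the distribution is exponential with scale 2, so the quantile
# of order p has the closed form -2 ln(1 - p), which gives a check.
for p in (0.25, 0.5, 0.75):
    print(p, chi2_quantile(p, 2), -2.0 * math.log(1.0 - p))
```

Bisection is slow but robust. It needs only that the distribution function is continuous and increasing, which holds here.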

In the special distribution calculator, select the chi-square distribution. Vary the parameter and note the shape of the density function and the distribution function. In each of the following cases, find the median, the first and third quartiles, and the interquartile range.

1. $$n = 1$$
2. $$n = 2$$
3. $$n = 5$$
4. $$n = 10$$

#### Moments

The mean, variance, moments, and moment generating function of the chi-square distribution can be obtained easily from general results for the gamma distribution.

If $$X$$ has the chi-square distribution with $$n$$ degrees of freedom then

1. $$\E(X) = n$$
2. $$\var(X) = 2 \, n$$

If $$X$$ has the chi-square distribution with $$n$$ degrees of freedom, then the moment of order $$k \gt 0$$ is

$\E(X^k) = 2^k \frac{\Gamma(n/2 + k)}{\Gamma(n/2)}$
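As a quick consistency check, the moment formula should reproduce the mean $$n$$ and the variance $$2 \, n$$. A minimal sketch, with the illustrative helper `chi2_moment` and the arbitrary choice $$n = 7$$:

```python
import math

def chi2_moment(n, k):
    """Moment of order k > 0 of the chi-square distribution with n degrees of freedom."""
    # 2^k * Gamma(n/2 + k) / Gamma(n/2), computed on the log scale for stability.
    return math.exp(k * math.log(2) + math.lgamma(n / 2 + k) - math.lgamma(n / 2))

n = 7  # arbitrary choice for illustration
mean = chi2_moment(n, 1)
variance = chi2_moment(n, 2) - mean ** 2

print(mean, variance)  # should be close to n and 2 n
```

The order $$k$$ does not need to be an integer, so the same function also gives, for example, $$\E\left(\sqrt{X}\right)$$ with `k = 0.5`.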

If $$X$$ has the chi-square distribution with $$n$$ degrees of freedom, then $$X$$ has moment generating function

$\E(e^{t \, X}) = \frac{1}{(1 - 2 \, t)^{n / 2}}, \quad t \lt \frac{1}{2}$
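The mean and variance can also be read off from the moment generating function. Writing $$M(t) = \E(e^{t \, X})$$ (a symbol introduced here for convenience) and differentiating,

$M^\prime(t) = n (1 - 2 \, t)^{-n/2 - 1}, \quad M^{\prime\prime}(t) = n (n + 2) (1 - 2 \, t)^{-n/2 - 2}$

Hence $$\E(X) = M^\prime(0) = n$$ and $$\E(X^2) = M^{\prime\prime}(0) = n (n + 2)$$, so that $$\var(X) = n (n + 2) - n^2 = 2 \, n$$.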

In the special distribution simulator, select the chi-square distribution. Vary $$n$$ with the scroll bar and note the size and location of the mean/standard deviation bar. For selected values of $$n$$, run the simulation 1000 times and note the apparent convergence of the empirical moments to the distribution moments.

#### Transformations

If $$Z$$ has the standard normal distribution then $$U = Z^2$$ has the chi-square distribution with 1 degree of freedom.

Proof:

Use the change of variable theorem.
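One way to carry out the computation is sketched here, with $$\Phi$$ and $$\phi$$ denoting the standard normal distribution function and density, and $$g$$ the density of $$U$$ (notation introduced for this sketch). For $$u \gt 0$$,

$\P(U \le u) = \P\left(-\sqrt{u} \le Z \le \sqrt{u}\right) = 2 \, \Phi\left(\sqrt{u}\right) - 1$

Differentiating with respect to $$u$$ gives

$g(u) = \frac{1}{\sqrt{u}} \phi\left(\sqrt{u}\right) = \frac{1}{\sqrt{2 \, \pi}} u^{-1/2} e^{-u/2}, \quad 0 \lt u \lt \infty$

This is the chi-square density with $$n = 1$$, since $$2^{1/2} \, \Gamma(1/2) = \sqrt{2 \, \pi}$$.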

If $$X$$ has the chi-square distribution with $$m$$ degrees of freedom, $$Y$$ has the chi-square distribution with $$n$$ degrees of freedom, and $$X$$ and $$Y$$ are independent, then $$X + Y$$ has the chi-square distribution with $$m + n$$ degrees of freedom.

Proof:

Use moment generating functions or properties of the gamma distribution.
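In terms of moment generating functions, the argument can be sketched in one line. By independence, for $$t \lt \frac{1}{2}$$,

$\E\left(e^{t (X + Y)}\right) = \E\left(e^{t \, X}\right) \E\left(e^{t \, Y}\right) = \frac{1}{(1 - 2 \, t)^{m/2}} \frac{1}{(1 - 2 \, t)^{n/2}} = \frac{1}{(1 - 2 \, t)^{(m + n)/2}}$

The right side is the moment generating function of the chi-square distribution with $$m + n$$ degrees of freedom.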

If $$(Z_1, Z_2, \ldots, Z_n)$$ is a sequence of independent standard normal variables (that is, a random sample of size $$n$$ from the standard normal distribution) then the sum of the squares has the chi-square distribution with $$n$$ degrees of freedom:

$V = \sum_{i=1}^n Z_i^2$

Proof:

Use the results of the previous two exercises.
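The result can also be explored by simulation. The sketch below, using only the Python standard library, draws sums of squares of independent standard normal variables and compares the sample mean and variance with the chi-square values $$n$$ and $$2 \, n$$. The function name, the seed, the choice $$n = 5$$, and the number of runs are arbitrary choices made for illustration.

```python
import random
import statistics

def simulate_sum_of_squares(n, runs, seed=0):
    """Simulate V = Z_1^2 + ... + Z_n^2 for independent standard normal Z_i."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    return [sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(runs)]

n = 5
samples = simulate_sum_of_squares(n, runs=20_000)
sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)

print(f"sample mean {sample_mean:.3f} (chi-square mean {n})")
print(f"sample variance {sample_var:.3f} (chi-square variance {2 * n})")
```

Agreement of the first two moments is only a sanity check, not a proof that the distributions coincide.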

The result of the last exercise is the reason that the chi-square distribution deserves a name of its own, and the reason that the degrees of freedom parameter is usually a positive integer. Sums of squares of independent normal variables occur frequently in statistics. On the other hand, the following exercise shows that any gamma distributed variable can be re-scaled into a variable with a chi-square distribution.

If $$X$$ has the gamma distribution with shape parameter $$k$$ and scale parameter $$b$$ then $$Y = \frac{2}{b} X$$ has the chi-square distribution with $$2 \, k$$ degrees of freedom.
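The scaling result can be checked directly on the densities: if $$X$$ has density $$h$$, then $$Y = \frac{2}{b} X$$ has density $$y \mapsto \frac{b}{2} h\left(\frac{b}{2} y\right)$$. The sketch below compares this with the chi-square density at a few points. The helper names and the parameter values $$k = 2.5$$, $$b = 3$$ are arbitrary choices for illustration.

```python
import math

def gamma_pdf(x, k, b):
    """Gamma density with shape parameter k and scale parameter b."""
    if x <= 0:
        return 0.0
    return math.exp((k - 1) * math.log(x) - x / b - math.lgamma(k) - k * math.log(b))

def chi2_pdf(x, n):
    """Chi-square density with n degrees of freedom."""
    if x <= 0:
        return 0.0
    return math.exp((n / 2 - 1) * math.log(x) - x / 2
                    - (n / 2) * math.log(2) - math.lgamma(n / 2))

def scaled_gamma_pdf(y, k, b):
    """Density of Y = (2 / b) X when X is gamma with shape k and scale b."""
    return (b / 2) * gamma_pdf(b * y / 2, k, b)

k, b = 2.5, 3.0
points = (0.5, 1.0, 2.0, 5.0, 10.0)
max_gap = max(abs(scaled_gamma_pdf(y, k, b) - chi2_pdf(y, 2 * k)) for y in points)

print(max_gap)  # should be at the level of rounding error
```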

Suppose that a missile is fired at a target at the origin of a plane coordinate system, with units in meters. The missile lands at $$(X, Y)$$ where $$X$$ and $$Y$$ are independent and each has the normal distribution with mean 0 and variance 100. The missile will destroy the target if it lands within 20 meters of the target. Find the probability of this event.

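Let $$Z = \sqrt{X^2 + Y^2}$$ denote the distance from the point of impact to the target. Since $$X / 10$$ and $$Y / 10$$ are independent standard normal variables, $$Z^2 / 100$$ has the chi-square distribution with 2 degrees of freedom, which is the exponential distribution with scale parameter 2. Hence $$\P(Z \lt 20) = \P(Z^2 / 100 \lt 4) = 1 - e^{-2} \approx 0.8647$$.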
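The closed-form answer can be compared with a Monte Carlo estimate. The following sketch uses only the Python standard library; the function name, the seed, and the number of runs are arbitrary choices for illustration.

```python
import math
import random

SIGMA = 10.0   # standard deviation of each coordinate (variance 100)
RADIUS = 20.0  # the target is destroyed within this distance, in meters

# Closed form: (X^2 + Y^2) / SIGMA^2 is chi-square with 2 degrees of freedom,
# which is exponential with scale parameter 2.
exact = 1.0 - math.exp(-((RADIUS / SIGMA) ** 2) / 2.0)

def estimate_hit_probability(runs, seed=0):
    """Monte Carlo estimate of the probability of landing within RADIUS of the origin."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(runs):
        x = rng.gauss(0.0, SIGMA)
        y = rng.gauss(0.0, SIGMA)
        if math.hypot(x, y) < RADIUS:
            hits += 1
    return hits / runs

estimate = estimate_hit_probability(100_000)
print(exact, estimate)
```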

#### Normal Approximation

From the central limit theorem, and previous results for the gamma distribution, it follows that if $$n$$ is large, the chi-square distribution with $$n$$ degrees of freedom can be approximated by the normal distribution with mean $$n$$ and variance $$2 \, n$$. More precisely, if $$X_n$$ has the chi-square distribution with $$n$$ degrees of freedom, then the distribution of the standardized variable below converges to the standard normal distribution as $$n \to \infty$$.

$Z_n = \frac{X_n - n}{\sqrt{2 \, n}}$

In the special distribution simulator, select the chi-square distribution. Start with $$n = 1$$ and increase $$n$$. Note how the shape of the probability density function becomes closer to the normal bell curve. For selected values of $$n$$, run the simulation 1000 times and note the apparent convergence of the empirical density function to the true density function.

Suppose that $$X$$ has the chi-square distribution with $$n = 18$$ degrees of freedom. For each of the following, compute the true value using the special distribution calculator and then compute the normal approximation. Compare the results.

1. $$\P(15 \lt X \lt 20)$$
2. The 75th percentile of $$X$$.

Answer:

1. $$\P(15 \lt X \lt 20) = 0.3292$$, $$\P(15 \lt X \lt 20) \approx 0.3220$$
2. $$x_{0.75} = 21.605$$, $$x_{0.75} \approx 22.047$$
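A comparison of this kind can also be scripted. The sketch below assumes a power-series approximation of the chi-square distribution function (the illustrative helper `chi2_cdf`, not the special distribution calculator) and uses `statistics.NormalDist` from the Python standard library for the normal approximation with mean $$n$$ and variance $$2 \, n$$.

```python
import math
from statistics import NormalDist

def chi2_cdf(x, n, tol=1e-14, max_terms=10_000):
    """Approximate P(X <= x) for X chi-square with n degrees of freedom."""
    if x <= 0:
        return 0.0
    a, y = n / 2.0, x / 2.0  # gamma shape parameter; argument rescaled by the scale 2
    term = 1.0 / a
    total = term
    for j in range(1, max_terms):
        term *= y / (a + j)  # next term of the incomplete gamma series
        total += term
        if term < tol * total:
            break
    return min(1.0, total * math.exp(a * math.log(y) - y - math.lgamma(a)))

n = 18
normal = NormalDist(mu=n, sigma=math.sqrt(2 * n))  # approximating normal distribution

exact_prob = chi2_cdf(20, n) - chi2_cdf(15, n)
approx_prob = normal.cdf(20) - normal.cdf(15)
approx_q75 = normal.inv_cdf(0.75)

print(f"P(15 < X < 20): series {exact_prob:.4f}, normal approximation {approx_prob:.4f}")
print(f"75th percentile, normal approximation: {approx_q75:.3f}")
```

With only 18 degrees of freedom the chi-square distribution is still noticeably skewed to the right, so some discrepancy between the two columns is expected.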