
## The Chi-Square Distribution

In this section we will study a distribution that has special importance in statistics. In particular, this distribution will arise in the study of the sample variance when the underlying distribution is normal and in goodness of fit tests.

### Basic Theory

#### Distribution Functions

For $$n \gt 0$$, the gamma distribution with shape parameter $$\frac{n}{2}$$ and scale parameter 2 is called the chi-square distribution with $$n$$ degrees of freedom. The probability density function $$f$$ is given by $f(x) = \frac{1}{2^{n/2} \Gamma(n/2)} x^{n/2 - 1} e^{-x/2}, \quad x \in (0, \infty)$

For reasons that will be clear later, $$n$$ is usually a positive integer, although technically this is not a mathematical requirement. When $$n$$ is a positive integer, the gamma function in the normalizing constant can be given explicitly.

If $$n \in \N_+$$ then

1. $$\Gamma(n/2) = (n/2 - 1)!$$ if $$n$$ is even.
2. $$\Gamma(n/2) = \frac{(n - 1)!}{2^{n-1} (n/2 - 1/2)!} \sqrt{\pi}$$ if $$n$$ is odd.
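
The two closed forms can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper name `gamma_half` is illustrative and does not come from the text.

```python
import math

def gamma_half(n: int) -> float:
    """Gamma(n/2) for a positive integer n, from the two closed forms."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    if n % 2 == 0:
        # Even n: Gamma(n/2) = (n/2 - 1)!
        return float(math.factorial(n // 2 - 1))
    # Odd n: Gamma(n/2) = (n - 1)! / (2^(n-1) * ((n - 1)/2)!) * sqrt(pi)
    half = (n - 1) // 2
    ratio = math.factorial(n - 1) / (2 ** (n - 1) * math.factorial(half))
    return ratio * math.sqrt(math.pi)

# Compare the closed forms with the library gamma function.
for n in range(1, 9):
    print(n, gamma_half(n), math.gamma(n / 2))
```

The two printed columns should agree up to floating-point rounding.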

The chi-square distribution has a rich collection of shapes.

The chi-square probability density function satisfies the following properties:

1. If $$0 \lt n \lt 2$$ then $$f$$ is decreasing with $$f(x) \to \infty$$ as $$x \downarrow 0$$.
2. If $$n = 2$$ then $$f$$ is decreasing with $$f(0) = \frac{1}{2}$$.
3. If $$n \gt 2$$ then $$f$$ increases and then decreases with mode at $$n - 2$$.
4. If $$0 \lt n \le 2$$ then $$f$$ is concave upward.
5. If $$2 \lt n \le 4$$ then $$f$$ is concave downward and then upward, with inflection point at $$n - 2 + \sqrt{2 n - 4}$$.
6. If $$n \gt 4$$ then $$f$$ is concave upward then downward and then upward again, with inflection points at $$n - 2 \pm \sqrt{2 n - 4}$$
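
The mode and inflection points for $$n \gt 4$$ can be checked numerically. This is a rough sketch for the single case $$n = 6$$: it evaluates the density on the log scale and estimates the second derivative by a central finite difference. The function names and the step size `h` are illustrative choices.

```python
import math

def chi2_pdf(x: float, n: float) -> float:
    """Chi-square density with n degrees of freedom, for x > 0."""
    log_f = ((n / 2 - 1) * math.log(x) - x / 2
             - (n / 2) * math.log(2) - math.lgamma(n / 2))
    return math.exp(log_f)

def second_difference(x: float, n: float, h: float = 1e-3) -> float:
    """Central finite-difference estimate of f''(x)."""
    return (chi2_pdf(x + h, n) - 2 * chi2_pdf(x, n) + chi2_pdf(x - h, n)) / h**2

n = 6
mode = n - 2
lo = n - 2 - math.sqrt(2 * n - 4)   # first inflection point
hi = n - 2 + math.sqrt(2 * n - 4)   # second inflection point

# The density should be larger at the mode than just to either side.
print(chi2_pdf(mode - 0.1, n), chi2_pdf(mode, n), chi2_pdf(mode + 0.1, n))

# The sign of f'' should change at each inflection point.
for point in (lo, hi):
    print(point, second_difference(point - 0.05, n), second_difference(point + 0.05, n))
```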

In the special distribution simulator, select the chi-square distribution. Vary $$n$$ with the scroll bar and note the shape of the probability density function. For selected values of $$n$$, run the simulation 1000 times and compare the empirical density function to the true probability density function.

The distribution function and the quantile function do not have simple, closed-form representations for most values of the parameter. However, the distribution function can be given in terms of the complete gamma function $$\Gamma(k)$$ and the (lower) incomplete gamma function $$\Gamma(k, x) = \int_0^x t^{k-1} e^{-t} \, dt$$.

Suppose that $$X$$ has the chi-square distribution with $$n$$ degrees of freedom. The distribution function $$F$$ of $$X$$ is given by $F(x) = \frac{\Gamma(n/2, x/2)}{\Gamma(n/2)}, \quad x \in (0, \infty)$
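
For readers without access to a statistics package, the ratio above can be approximated with a standard power series for the lower incomplete gamma function. The sketch below is one possible approach, not the method used by the special distribution calculator. It assumes moderate values of $$x$$, since the series converges slowly when $$x$$ is large, and the names `regularized_lower_gamma` and `chi2_cdf` are illustrative.

```python
import math

def regularized_lower_gamma(a: float, x: float, tol: float = 1e-15) -> float:
    """Lower incomplete gamma function divided by Gamma(a), via the series
    e^{-x} x^a * sum_{k >= 0} x^k / (a (a + 1) ... (a + k))."""
    if x <= 0:
        return 0.0
    term = 1.0 / a
    total = term
    k = 0
    while term > tol * total:
        k += 1
        term *= x / (a + k)
        total += term
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))

def chi2_cdf(x: float, n: float) -> float:
    """Chi-square distribution function with n degrees of freedom."""
    return regularized_lower_gamma(n / 2, x / 2)

# Sanity checks against two cases with known closed forms:
# n = 2 gives 1 - e^{-x/2}, and n = 1 gives erf(sqrt(x/2)).
print(chi2_cdf(3.0, 2), 1 - math.exp(-1.5))
print(chi2_cdf(1.0, 1), math.erf(math.sqrt(0.5)))
```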

Approximate values of the distribution and quantile functions can be obtained from the special distribution calculator, and from most mathematical and statistical software packages.

In the special distribution calculator, select the chi-square distribution. Vary the parameter and note the shape of the probability density, distribution, and quantile functions. In each of the following cases, find the median, the first and third quartiles, and the interquartile range.

1. $$n = 1$$
2. $$n = 2$$
3. $$n = 5$$
4. $$n = 10$$
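
As an alternative to the calculator, the quartiles can be approximated by inverting the distribution function numerically. The following sketch combines a series approximation of the distribution function with bisection. The function names, the tolerance, and the choice of bisection are assumptions made for illustration.

```python
import math

def chi2_cdf(x: float, n: float, tol: float = 1e-15) -> float:
    """Chi-square CDF via a power series for the lower incomplete gamma function."""
    if x <= 0:
        return 0.0
    a, y = n / 2, x / 2
    term = 1.0 / a
    total = term
    k = 0
    while term > tol * total:
        k += 1
        term *= y / (a + k)
        total += term
    return total * math.exp(-y + a * math.log(y) - math.lgamma(a))

def chi2_quantile(p: float, n: float, tol: float = 1e-10) -> float:
    """Quantile of order p in (0, 1), found by bisection on the CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, n) < p:      # expand until the quantile is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

for n in (1, 2, 5, 10):
    q1, med, q3 = (chi2_quantile(p, n) for p in (0.25, 0.5, 0.75))
    print(f"n={n}: Q1={q1:.4f}, median={med:.4f}, Q3={q3:.4f}, IQR={q3 - q1:.4f}")
```

For $$n = 2$$ the results can be checked by hand, since the quantile function is then $$-2 \ln(1 - p)$$.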

#### Moments

The mean, variance, moments, and moment generating function of the chi-square distribution can be obtained easily from general results for the gamma distribution.

If $$X$$ has the chi-square distribution with $$n$$ degrees of freedom then

1. $$\E(X) = n$$
2. $$\var(X) = 2 n$$
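
A quick simulation illustrates these two formulas. The sketch below draws from the gamma distribution with shape $$n/2$$ and scale 2 using the standard library function `random.gammavariate`, whose second argument is the scale parameter. The sample size, the seed, and the helper name `simulate_chi2` are arbitrary choices.

```python
import random

def simulate_chi2(n: float, size: int, seed: int = 0) -> list[float]:
    """Draw `size` chi-square variates as gamma variates with shape n/2, scale 2."""
    rng = random.Random(seed)
    return [rng.gammavariate(n / 2, 2.0) for _ in range(size)]

n = 5
sample = simulate_chi2(n, size=100_000)

sample_mean = sum(sample) / len(sample)
sample_var = sum((x - sample_mean) ** 2 for x in sample) / (len(sample) - 1)

print("sample mean:", sample_mean, "theoretical:", n)
print("sample variance:", sample_var, "theoretical:", 2 * n)
```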

In the special distribution simulator, select the chi-square distribution. Vary $$n$$ with the scroll bar and note the size and location of the mean $$\pm$$ standard deviation bar. For selected values of $$n$$, run the simulation 1000 times and compare the empirical moments to the distribution moments.

The skewness and kurtosis of the chi-square distribution are given next.

If $$X$$ has the chi-square distribution with $$n$$ degrees of freedom, then

1. $$\skew(X) = 2 \sqrt{2 / n}$$
2. $$\kurt(X) = 3 + 12/n$$

Note that $$\skew(X) \to 0$$ and $$\kurt(X) \to 3$$ as $$n \to \infty$$.

In the special distribution simulator, select the chi-square distribution. Increase $$n$$ with the scroll bar and note the shape of the probability density function in light of the previous results on skewness and kurtosis. For selected values of $$n$$, run the simulation 1000 times and compare the empirical density function to the true probability density function.

The next result gives the general moments of the chi-square distribution.

If $$X$$ has the chi-square distribution with $$n$$ degrees of freedom, then for $$k \gt -n/2$$, $\E\left(X^k\right) = 2^k \frac{\Gamma(n/2 + k)}{\Gamma(n/2)}$

In particular, if $$k \in \N_+$$ then $\E\left(X^k\right) = 2^k \left(\frac{n}{2}\right)\left(\frac{n}{2} + 1\right) \cdots \left(\frac{n}{2} + k - 1\right)$ Note also $$\E\left(X^k\right) = \infty$$ if $$k \le -n/2$$.
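
The product formula for integer moments is enough to recover the skewness and kurtosis. The sketch below computes the first four raw moments, converts them to central moments, and compares the results with $$2 \sqrt{2 / n}$$ and $$3 + 12 / n$$. The function names are illustrative.

```python
import math

def raw_moment(k: int, n: float) -> float:
    """E(X^k) = 2^k (n/2)(n/2 + 1)...(n/2 + k - 1) for a positive integer k."""
    product = 1.0
    for j in range(k):
        product *= n / 2 + j
    return 2**k * product

def skew_and_kurt(n: float) -> tuple[float, float]:
    """Skewness and kurtosis computed from the first four raw moments."""
    m1, m2, m3, m4 = (raw_moment(k, n) for k in (1, 2, 3, 4))
    var = m2 - m1**2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1**3                      # third central moment
    mu4 = m4 - 4 * m1 * m3 + 6 * m1**2 * m2 - 3 * m1**4     # fourth central moment
    return mu3 / var**1.5, mu4 / var**2

for n in (1, 3, 7.5, 20):
    skew, kurt = skew_and_kurt(n)
    print(n, skew, 2 * math.sqrt(2 / n), kurt, 3 + 12 / n)
```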

If $$X$$ has the chi-square distribution with $$n$$ degrees of freedom, then $$X$$ has moment generating function $\E\left(e^{t X}\right) = \frac{1}{(1 - 2 t)^{n / 2}}, \quad t \lt \frac{1}{2}$
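
The moment generating function can be checked by integrating $$e^{t x} f(x)$$ numerically. The sketch below uses a plain trapezoid rule on a truncated interval. It assumes $$n \gt 2$$, so that the integrand vanishes at 0, and values of $$t$$ well below $$\frac{1}{2}$$, so that the truncated tail is negligible. The function names, the upper limit, and the number of steps are illustrative choices.

```python
import math

def mgf_integrand(x: float, t: float, n: float) -> float:
    """e^{t x} f(x), evaluated on the log scale for numerical stability."""
    log_f = ((n / 2 - 1) * math.log(x) - x / 2
             - (n / 2) * math.log(2) - math.lgamma(n / 2))
    return math.exp(t * x + log_f)

def mgf_numeric(t: float, n: float, upper: float = 200.0, steps: int = 200_000) -> float:
    """Trapezoid-rule estimate of E(e^{tX}) on (0, upper); assumes n > 2."""
    h = upper / steps
    interior = sum(mgf_integrand(i * h, t, n) for i in range(1, steps))
    # The left endpoint contributes 0 when n > 2; the right endpoint gets half weight.
    return h * (interior + 0.5 * mgf_integrand(upper, t, n))

def mgf_exact(t: float, n: float) -> float:
    """Closed form (1 - 2t)^(-n/2), valid for t < 1/2."""
    return (1 - 2 * t) ** (-n / 2)

for t, n in ((0.1, 4), (-0.5, 4), (0.2, 6)):
    print(t, n, mgf_numeric(t, n), mgf_exact(t, n))
```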

#### Relations

The chi-square distribution is connected to a number of other special distributions. Of course, the most important relationship is the definition—the chi-square distribution with $$n$$ degrees of freedom is a special case of the gamma distribution, corresponding to shape parameter $$n/2$$ and scale parameter 2. On the other hand, any gamma distributed variable can be re-scaled into a variable with a chi-square distribution.

If $$X$$ has the gamma distribution with shape parameter $$k$$ and scale parameter $$b$$ then $$Y = \frac{2}{b} X$$ has the chi-square distribution with $$2 k$$ degrees of freedom.

Proof:

Since the gamma distribution is a scale family, $$Y$$ has a gamma distribution with shape parameter $$k$$ and scale parameter $$b \frac{2}{b} = 2$$. Hence $$Y$$ has the chi-square distribution with $$2 k$$ degrees of freedom.

The chi-square distribution with 2 degrees of freedom is the exponential distribution with scale parameter 2.

Proof:

The chi-square distribution with 2 degrees of freedom is the gamma distribution with shape parameter 1 and scale parameter 2, which we already know is the exponential distribution with scale parameter 2.

If $$Z$$ has the standard normal distribution then $$X = Z^2$$ has the chi-square distribution with 1 degree of freedom.

Proof:

As usual, let $$\phi$$ and $$\Phi$$ denote the PDF and CDF of the standard normal distribution, respectively. Then for $$x \gt 0$$, $\P(X \le x) = \P(-\sqrt{x} \le Z \le \sqrt{x}) = 2 \Phi\left(\sqrt{x}\right) - 1$ Differentiating with respect to $$x$$ gives the density function $$f$$ of $$X$$: $f(x) = \phi\left(\sqrt{x}\right) x^{-1/2} = \frac{1}{\sqrt{2 \pi}} x^{-1/2} e^{-x / 2}, \quad x \in (0, \infty)$ which we recognize as the chi-square PDF with 1 degree of freedom.
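
The distribution function in the proof can be compared with a simulation. The sketch below squares simulated standard normal variables and compares the empirical distribution function with $$2 \Phi\left(\sqrt{x}\right) - 1$$, written in terms of the error function through the identity $$\Phi(z) = \frac{1}{2}\left[1 + \operatorname{erf}\left(z / \sqrt{2}\right)\right]$$. The sample size, seed, and function names are arbitrary.

```python
import math
import random

def empirical_cdf_of_square(x: float, size: int = 100_000, seed: int = 0) -> float:
    """Fraction of simulated values of Z^2 that are at most x."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(size) if rng.gauss(0.0, 1.0) ** 2 <= x)
    return hits / size

def chi2_1_cdf(x: float) -> float:
    """2 * Phi(sqrt(x)) - 1, which equals erf(sqrt(x / 2))."""
    return math.erf(math.sqrt(x / 2))

for x in (0.5, 1.0, 3.0):
    print(x, empirical_cdf_of_square(x), chi2_1_cdf(x))
```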

Recall that if we add independent gamma variables with a common scale parameter, the resulting random variable also has a gamma distribution, with the common scale parameter and with shape parameter that is the sum of the shape parameters of the terms. Specializing to the chi-square distribution, we have the following important result:

If $$X$$ has the chi-square distribution with $$m$$ degrees of freedom, $$Y$$ has the chi-square distribution with $$n$$ degrees of freedom, and $$X$$ and $$Y$$ are independent, then $$X + Y$$ has the chi-square distribution with $$m + n$$ degrees of freedom.

The last two results lead to the following theorem, which is fundamentally important in statistics.

If $$(Z_1, Z_2, \ldots, Z_n)$$ is a sequence of independent standard normal variables then the sum of the squares $V = \sum_{i=1}^n Z_i^2$ has the chi-square distribution with $$n$$ degrees of freedom.

This theorem is the reason that the chi-square distribution deserves a name of its own, and the reason that the degrees of freedom parameter is usually a positive integer. Sums of squares of independent normal variables occur frequently in statistics.
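
The theorem is easy to illustrate by simulation. The sketch below forms sums of squares of independent standard normal variables and compares their empirical distribution function with the exact one. For the comparison it uses the closed form $$F(x) = 1 - e^{-x/2} \sum_{j=0}^{n/2 - 1} \frac{(x/2)^j}{j!}$$, a standard result for gamma distributions with integer shape parameter that is not derived in this section, so the check is limited to even $$n$$. All names, the grid, and the sample size are illustrative.

```python
import bisect
import math
import random

def chi2_cdf_even(x: float, n: int) -> float:
    """Closed-form chi-square CDF, valid only for even n."""
    if x <= 0:
        return 0.0
    term = math.exp(-x / 2)
    total = term
    for j in range(1, n // 2):
        term *= (x / 2) / j
        total += term
    return 1.0 - total

def sum_of_squares_sample(n: int, size: int, seed: int = 0) -> list[float]:
    """Sorted sample of V = Z_1^2 + ... + Z_n^2 with independent standard normal Z_i."""
    rng = random.Random(seed)
    return sorted(sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(n)) for _ in range(size))

n = 4
sample = sum_of_squares_sample(n, size=50_000)
grid = [0.5 * i for i in range(1, 41)]   # 0.5, 1.0, ..., 20.0

# Largest gap between the empirical and exact distribution functions on the grid.
max_gap = max(
    abs(bisect.bisect_right(sample, x) / len(sample) - chi2_cdf_even(x, n))
    for x in grid
)
print("largest gap on the grid:", max_gap)
```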

From the central limit theorem, and previous results for the gamma distribution, it follows that if $$n$$ is large, the chi-square distribution with $$n$$ degrees of freedom can be approximated by the normal distribution with mean $$n$$ and variance $$2 n$$. Here is the precise statement:

If $$X_n$$ has the chi-square distribution with $$n$$ degrees of freedom, then the distribution of the standard score $Z_n = \frac{X_n - n}{\sqrt{2 n}}$ converges to the standard normal distribution as $$n \to \infty$$.
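
The convergence can be observed numerically. The sketch below measures the largest gap between the distribution function of $$Z_n$$ and the standard normal distribution function over a grid of points, for several even values of $$n$$. It relies on the closed-form distribution function for even degrees of freedom (a standard gamma distribution result, not derived here) and on `statistics.NormalDist` from the Python standard library. The grid and function names are arbitrary.

```python
import math
from statistics import NormalDist

def chi2_cdf_even(x: float, n: int) -> float:
    """Closed-form chi-square CDF, valid only for even n."""
    if x <= 0:
        return 0.0
    term = math.exp(-x / 2)
    total = term
    for j in range(1, n // 2):
        term *= (x / 2) / j
        total += term
    return 1.0 - total

def max_gap(n: int) -> float:
    """Largest |P(Z_n <= z) - Phi(z)| over z = -3.0, -2.9, ..., 3.0."""
    std_normal = NormalDist()
    gap = 0.0
    for i in range(-30, 31):
        z = i / 10
        x = n + z * math.sqrt(2 * n)      # undo the standard score
        gap = max(gap, abs(chi2_cdf_even(x, n) - std_normal.cdf(z)))
    return gap

gaps = {n: max_gap(n) for n in (4, 16, 64, 256)}
for n, gap in gaps.items():
    print(n, gap)
```

The gaps should shrink as $$n$$ grows, though rather slowly, which reflects the skewness $$2 \sqrt{2 / n}$$ of the chi-square distribution.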

In the special distribution simulator, select the chi-square distribution. Start with $$n = 1$$ and increase $$n$$. Note the shape of the probability density function in light of the previous theorem. For selected values of $$n$$, run the experiment 1000 times and compare the empirical density function to the true density function.

Like the gamma distribution, the chi-square distribution is infinitely divisible:

Suppose that $$X$$ has the chi-square distribution with $$n \in (0, \infty)$$ degrees of freedom. For $$k \in \N_+$$, $$X$$ has the same distribution as $$\sum_{i=1}^k X_i$$, where $$(X_1, X_2, \ldots, X_k)$$ is a sequence of independent random variables, each with the chi-square distribution with $$n / k$$ degrees of freedom.

Also like the gamma distribution, the chi-square distribution is a member of the general exponential family of distributions:

The chi-square distribution with $$n$$ degrees of freedom is a one-parameter exponential family with natural parameter $$n/2 - 1$$, and natural statistic $$\ln(X)$$.

Proof:

This follows from the definition of the general exponential family. The PDF can be written as $f(x) = \frac{e^{-x/2}}{2^{n/2} \Gamma(n/2)} \exp\left[(n/2 - 1) \ln(x)\right], \quad x \in (0, \infty)$

#### Computational Exercises

Suppose that a missile is fired at a target at the origin of a plane coordinate system, with units in meters. The missile lands at $$(X, Y)$$ where $$X$$ and $$Y$$ are independent and each has the normal distribution with mean 0 and variance 100. The missile will destroy the target if it lands within 20 meters of the target. Find the probability of this event.

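Answer:

Let $$Z = \sqrt{X^2 + Y^2}$$ denote the distance from the missile to the target. Then $$Z^2 / 100$$ has the chi-square distribution with 2 degrees of freedom, so $$\P(Z \lt 20) = \P(Z^2 / 100 \lt 4) = 1 - e^{-2} \approx 0.8647$$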
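
The answer can be checked by simulating the landing point directly. In the sketch below, the closed-form value comes from the fact that $$(X / 10)^2 + (Y / 10)^2$$ has the chi-square distribution with 2 degrees of freedom, which is the exponential distribution with scale parameter 2. The function name, sample size, and seed are arbitrary.

```python
import math
import random

def hit_probability(radius: float = 20.0, sigma: float = 10.0,
                    size: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of the probability of landing within `radius` of the origin."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(size):
        x = rng.gauss(0.0, sigma)
        y = rng.gauss(0.0, sigma)
        if math.hypot(x, y) < radius:
            hits += 1
    return hits / size

# Closed form: P(Z < 20) = P(Z^2 / 100 < 4) = 1 - exp(-4 / 2)
exact = 1 - math.exp(-((20.0 / 10.0) ** 2) / 2)
estimate = hit_probability()
print("simulated:", estimate, "closed form:", exact)
```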

Suppose that $$X$$ has the chi-square distribution with $$n = 18$$ degrees of freedom. For each of the following, compute the true value using the special distribution calculator and then compute the normal approximation. Compare the results.

1. $$\P(15 \lt X \lt 20)$$
2. The 75th percentile of $$X$$.

Answer:

1. True value: $$\P(15 \lt X \lt 20) = 0.3291$$; normal approximation: $$\P(15 \lt X \lt 20) \approx 0.3221$$
2. True value: $$x_{0.75} = 21.605$$; normal approximation: $$x_{0.75} \approx 22.044$$
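
Both parts of this exercise can also be worked without the calculator. The sketch below is one possible approach. Since $$n = 18$$ is even, it uses the closed-form distribution function for even degrees of freedom (a standard gamma distribution result, not derived in this section), inverts it by bisection, and takes the normal approximation from `statistics.NormalDist` with mean $$n$$ and standard deviation $$\sqrt{2 n}$$. The function names and tolerance are illustrative.

```python
import math
from statistics import NormalDist

def chi2_cdf_even(x: float, n: int) -> float:
    """Closed-form chi-square CDF, valid only for even n."""
    if x <= 0:
        return 0.0
    term = math.exp(-x / 2)
    total = term
    for j in range(1, n // 2):
        term *= (x / 2) / j
        total += term
    return 1.0 - total

def chi2_quantile_even(p: float, n: int, tol: float = 1e-10) -> float:
    """Quantile of order p in (0, 1) by bisection on the closed-form CDF."""
    lo, hi = 0.0, 1.0
    while chi2_cdf_even(hi, n) < p:   # expand until the quantile is bracketed
        hi *= 2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if chi2_cdf_even(mid, n) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

n = 18
normal = NormalDist(mu=n, sigma=math.sqrt(2 * n))

exact_prob = chi2_cdf_even(20, n) - chi2_cdf_even(15, n)
approx_prob = normal.cdf(20) - normal.cdf(15)
exact_q = chi2_quantile_even(0.75, n)
approx_q = normal.inv_cdf(0.75)

print("P(15 < X < 20):", exact_prob, "normal approximation:", approx_prob)
print("75th percentile:", exact_q, "normal approximation:", approx_q)
```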