In this chapter, we will study a number of parametric families of distributions that have special importance in probability and statistics. In some cases, a distribution is important because it occurs as the limit of other distributions. In other cases, a parametric family is important because it can be used to model a wide variety of random phenomena, usually because the family has a rich collection of probability density functions with a small number of parameters (usually 1 or 2). As a general philosophical principle, we try to model a random process with as few parameters as possible; this is sometimes referred to as the principle of parsimony of parameters. In turn, this is a special case of Ockham's razor, named in honor of William of Ockham: the principle that one should use the simplest model that adequately describes a given phenomenon.
There are several other parametric families of distributions that are studied elsewhere in this project, because these distributions arise most naturally in various random processes. These include
Before we begin our study of special parametric families of distributions, we will study two general parametric families. Many of the special parametric families studied in this chapter belong to one or both of these general families.
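Suppose that \(Z\) is a real-valued random variable and that \(X = a + b \, Z\), where \(a \in \R\) is the location parameter and \(b \in (0, \infty)\) is the scale parameter. If \(Z\) has distribution function \(G\) then \(X\) has distribution function \(F\) given by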
\[ F(x) = G \left( \frac{x - a}{b} \right), \quad x \in \R\]
If \(Z\) has a continuous distribution with probability density function \(g\), then \(X\) also has a continuous distribution, with probability density function \(f\) given by
\[ f(x) = \frac{1}{b} \, g \left( \frac{x - a}{b} \right), \quad x \in \R\]
This follows by taking derivatives in Theorem 1, since \( f = F^\prime \) and \( g = G^\prime \).
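As a numerical sanity check on the two formulas above, the following Python sketch builds \(F\) and \(f\) from a standard distribution and compares a finite-difference derivative of \(F\) with \(f\). The choice of the standard normal distribution for \(Z\), the parameter values, and the helper names `location_scale_cdf` and `location_scale_pdf` are illustrative assumptions, not part of the text.

```python
import math

def std_normal_cdf(z):
    # Distribution function G of the standard normal (assumed choice of Z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def std_normal_pdf(z):
    # Density g = G' of the standard normal
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def location_scale_cdf(G, a, b):
    # F(x) = G((x - a) / b), with scale parameter b > 0
    if b <= 0:
        raise ValueError("scale parameter b must be positive")
    return lambda x: G((x - a) / b)

def location_scale_pdf(g, a, b):
    # f(x) = (1 / b) g((x - a) / b), with scale parameter b > 0
    if b <= 0:
        raise ValueError("scale parameter b must be positive")
    return lambda x: g((x - a) / b) / b

a, b = 2.0, 3.0  # illustrative location and scale
F = location_scale_cdf(std_normal_cdf, a, b)
f = location_scale_pdf(std_normal_pdf, a, b)

# Compare a central-difference derivative of F with f at a few points
h = 1e-5
max_error = 0.0
for x in (-1.0, 2.0, 4.5):
    numeric_derivative = (F(x + h) - F(x - h)) / (2 * h)
    max_error = max(max_error, abs(numeric_derivative - f(x)))
    print(x, numeric_derivative, f(x))
```

The two printed columns should agree to several decimal places, which is what the relation \(f = F^\prime\) predicts.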
If \(Z\) has a mode at \(z\), then \(X\) has a mode at \(x = a + b z\).
This follows from Theorem 2: if \( g \) has a maximum at \( z \), then \( f \) has a maximum at \( x = a + b z \).
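The mode relation can be illustrated with a simple grid search. This is only a sketch: the standard density \(g(z) = z e^{-z}\) (a gamma density with mode at \(z = 1\)), the parameter values, and the grid are assumptions chosen for illustration.

```python
import math

def g(z):
    # Assumed standard density: gamma with shape parameter 2, mode at z = 1
    return z * math.exp(-z) if z >= 0 else 0.0

a, b = -2.0, 4.0  # illustrative location and scale

def f(x):
    # Density of X = a + b Z
    return g((x - a) / b) / b

zs = [i * 0.001 for i in range(10001)]  # grid on [0, 10]
xs = [a + b * z for z in zs]            # corresponding grid for X

z_mode = max(zs, key=g)  # grid point where g is largest
x_mode = max(xs, key=f)  # grid point where f is largest
print(z_mode, x_mode, a + b * z_mode)
```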
The following exercise relates the quantile functions of \(Z\) and \(X\).
If \(G\) and \(F\) are the distribution functions of \(Z\) and \(X\), respectively, then
These results follow from Theorem 2.
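When \(G\) is continuous and strictly increasing, solving \(G\left(\frac{x - a}{b}\right) = p\) for \(x\) gives \(x = a + b \, G^{-1}(p)\), so quantiles transform in the same way as the variable itself. The sketch below checks this numerically; the standard exponential distribution for \(Z\) and the names `G_inv` and `F_inv` are assumptions made for the example.

```python
import math

def G(z):
    # Distribution function of the standard exponential (assumed choice of Z)
    return 1.0 - math.exp(-z) if z >= 0 else 0.0

def G_inv(p):
    # Quantile function of the standard exponential, 0 < p < 1
    return -math.log(1.0 - p)

a, b = 1.5, 2.0  # illustrative location and scale

def F(x):
    # Distribution function of X = a + b Z
    return G((x - a) / b)

def F_inv(p):
    # Candidate quantile function of X
    return a + b * G_inv(p)

# F(F_inv(p)) should return p for each order p
max_error = 0.0
for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    max_error = max(max_error, abs(F(F_inv(p)) - p))
    print(p, F_inv(p), F(F_inv(p)))
```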
Let \(a \in \R\) and \(b \gt 0\). The uniform distribution on the interval \([a, a + b]\) is a location-scale family.
Suppose that \( Z \) has the uniform distribution on \( [0, 1] \) (so that \( Z \) is a random number). Then \( X = a + b Z \) has the uniform distribution on \( [a, a + b] \).
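A short simulation illustrates this construction. The seed, sample size, and parameter values are arbitrary choices for the sketch, and `random.random()` plays the role of the random number \(Z\).

```python
import random

random.seed(12345)  # fixed seed so the run is reproducible
a, b = 2.0, 3.0     # illustrative location and scale
n = 100_000

# Each X = a + b Z, where Z is a random number in [0, 1)
xs = [a + b * random.random() for _ in range(n)]

sample_mean = sum(xs) / n
frac_lower_half = sum(x <= a + b / 2 for x in xs) / n

# Values stay in [a, a + b]; the mean is near the midpoint a + b / 2
print(min(xs), max(xs), sample_mean, frac_lower_half)
```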
Let \(g(z) = e^{-z}\) for \(0 \le z \lt \infty\). This is the probability density function of the exponential distribution with parameter 1.
The associated location-scale family has probability density function \(f\) given by
\[f(x) = \frac{1}{b} \exp\left(-\frac{x - a}{b}\right), \quad a \le x \lt \infty\]
The distributions in the previous exercise are the two-parameter exponential distributions.
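As a rough check on the two-parameter exponential density, the sketch below integrates \(f\) numerically and compares the result with the distribution function \(F(x) = 1 - \exp\left(-\frac{x - a}{b}\right)\) for \(x \ge a\), obtained from \(G(z) = 1 - e^{-z}\). The function names, parameter values, and the midpoint rule are assumptions made for illustration.

```python
import math

def two_param_exp_pdf(x, a, b):
    # f(x) = (1 / b) exp(-(x - a) / b) for x >= a, and 0 otherwise
    if b <= 0:
        raise ValueError("scale parameter b must be positive")
    return math.exp(-(x - a) / b) / b if x >= a else 0.0

def two_param_exp_cdf(x, a, b):
    # F(x) = G((x - a) / b) with G(z) = 1 - exp(-z) for z >= 0
    return 1.0 - math.exp(-(x - a) / b) if x >= a else 0.0

a, b = 1.0, 2.0   # illustrative location and scale
x0 = a + 3.0 * b  # upper limit of integration

# Midpoint-rule approximation of the integral of f over [a, x0]
n = 100_000
h = (x0 - a) / n
integral = h * sum(two_param_exp_pdf(a + (i + 0.5) * h, a, b) for i in range(n))

print(integral, two_param_exp_cdf(x0, a, b))
```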
Let \(g(z) = \frac{1}{\pi \, (1 + z^2)}\) for \(z \in \R\). This is the probability density function of the Cauchy distribution, named after Augustin Cauchy.
The associated location-scale family has probability density function \(f\) given by
\[f(x) = \frac{1}{\pi b \left[1 + \left(\frac{x - a}{b}\right)^2\right]}, \quad x \in \R\]
The distributions in the previous exercise form the general family of Cauchy distributions.
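Applying \(f(x) = \frac{1}{b} \, g\left(\frac{x - a}{b}\right)\) to the standard Cauchy density gives a density that peaks at \(x = a\) with height \(\frac{1}{\pi b}\) and places half of its mass on \([a - b, a + b]\). The sketch below checks both facts numerically; the function names and parameter values are illustrative assumptions.

```python
import math

def cauchy_pdf(x, a, b):
    # f(x) = 1 / (pi b [1 + ((x - a) / b)^2])
    z = (x - a) / b
    return 1.0 / (math.pi * b * (1.0 + z * z))

def cauchy_cdf(x, a, b):
    # F(x) = G((x - a) / b), with G(z) = 1/2 + arctan(z) / pi
    return 0.5 + math.atan((x - a) / b) / math.pi

a, b = 3.0, 2.0  # illustrative location and scale

peak = cauchy_pdf(a, a, b)

# Midpoint-rule integral of f over [a - b, a + b]
n = 20_000
h = 2.0 * b / n
middle_mass = h * sum(cauchy_pdf(a - b + (i + 0.5) * h, a, b) for i in range(n))

print(peak, 1.0 / (math.pi * b))
print(middle_mass, cauchy_cdf(a + b, a, b) - cauchy_cdf(a - b, a, b))
```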
The following exercise relates the mean, variance, and standard deviation of \(Z\) and \(X\).
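As before, suppose that \(X = a + b \, Z\). Then, whenever the moments exist, \(\E(X) = a + b \, \E(Z)\) and \(\sd(X) = b \, \sd(Z)\), so the variance of \(X\) is \(b^2\) times the variance of \(Z\).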
These results follow immediately from basic properties of expected value and variance.
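By linearity of expected value and the scaling property of variance, the mean of \(X = a + b \, Z\) is \(a + b\) times the mean of \(Z\) shifted appropriately, that is \(a + b \, \E(Z)\), and the standard deviation of \(X\) is \(b\) times that of \(Z\). The same identities hold for the empirical mean and standard deviation of a data set, which the following sketch checks. The exponential sample, the seed, and the parameter values are arbitrary assumptions.

```python
import random
import statistics

random.seed(2024)  # fixed seed so the run is reproducible
a, b = 5.0, 0.5    # illustrative location and scale

zs = [random.expovariate(1.0) for _ in range(10_000)]  # sample of Z
xs = [a + b * z for z in zs]                           # transformed sample

mean_z, sd_z = statistics.fmean(zs), statistics.pstdev(zs)
mean_x, sd_x = statistics.fmean(xs), statistics.pstdev(xs)

print(mean_x, a + b * mean_z)  # means:               a + b * mean(Z)
print(sd_x, b * sd_z)          # standard deviations: b * sd(Z)
print(statistics.pvariance(xs), b ** 2 * statistics.pvariance(zs))
```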
Recall that the standard score of a random variable is obtained by subtracting the mean and dividing by the standard deviation. The standard score is dimensionless (that is, has no physical units) and measures the distance from the mean to the random variable in standard deviations. Since location-scale families essentially correspond to a change of units, it's not surprising that the standard score is unchanged by a location-scale transformation.
The standard scores of \(X\) and \(Z\) are the same:
\[ \frac{X - \E(X)}{\sd(X)} = \frac{Z - \E(Z)}{\sd(Z)} \]
From the previous theorem,
\[ \frac{X - \E(X)}{\sd(X)} = \frac{a + b Z - [a + b \E(Z)]}{b \sd(Z)} = \frac{Z - \E(Z)}{\sd(Z)} \]
Recall that the skewness and kurtosis of a random variable are the third and fourth moments, respectively, of the standard score. Thus it follows from the previous exercise that skewness and kurtosis are unchanged by location-scale transformations: \(\skew(X) = \skew(Z)\), \(\kurt(X) = \kurt(Z)\).
The following exercise relates the moment generating functions of \(Z\) and \(X\).
If \(Z\) has moment generating function \(M\) then \(X\) has moment generating function \(N\) given by
\[ N(t) = e^{a \, t} \, M(b \, t) \]
Two probability distributions on \(\R\) are said to be of the same type if they are related by a location-scale transformation. Specifically, if the distributions have distribution functions \(F\) and \(G\), then the distributions are of the same type if there exist constants \(a \in \R\) and \(b \in (0, \infty)\) such that
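\[ F(x) = G \left( \frac{x - a}{b} \right), \quad x \in \R \]
Being of the same type is an equivalence relation on the collection of probability distributions on \(\R\). That is, if \(F\), \(G\), and \(H\) are arbitrary distribution functions then \(F\) is the same type as \(F\) (the reflexive property); if \(F\) is the same type as \(G\) then \(G\) is the same type as \(F\) (the symmetric property); and if \(F\) is the same type as \(G\), and \(G\) is the same type as \(H\), then \(F\) is the same type as \(H\) (the transitive property).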
Suppose that \(X\) is a random variable taking values in \(S\), and that the distribution of \(X\) depends on an unspecified parameter \(\theta\) taking values in a parameter space \(\Theta\). In general, both \(X\) and \(\theta\) may be vector-valued. Let \(f_\theta\) denote the probability density function of \(X\) on \(S\), corresponding to \(\theta \in \Theta\).
The distribution of \(X\) is a \(k\)-parameter exponential family if \(S\) does not depend on \(\theta\) and if the probability density function can be written as
\[ f_\theta(x) = \alpha(\theta) \, g(x) \, \exp \left( \sum_{i=1}^k \beta_i(\theta) \, h_i(x) \right); \quad x \in S, \; \theta \in \Theta \]
where \(\alpha\) and \((\beta_1, \beta_2, \ldots, \beta_k)\) are real-valued functions on \(\Theta\), and where \(g\) and \((h_1, h_2, \ldots, h_k)\) are real-valued functions on \(S\). Moreover, \(k\) is assumed to be the smallest such integer. The parameters \((\beta_1(\theta), \beta_2(\theta), \ldots, \beta_k(\theta))\) are sometimes called natural parameters of the distribution, and the random variables \((h_1(X), h_2(X), \ldots, h_k(X))\) are sometimes called natural statistics of the distribution. Although the definition may look intimidating, exponential families are useful because they have many nice mathematical properties, and because many special parametric families turn out to be exponential families.
Suppose that \(X\) has the binomial distribution with parameters \(n\) and \(p\), where \(n\) is fixed and \(p \in (0, 1)\). This distribution is a one-parameter exponential family with natural parameter \(\ln \left( \frac{p}{1 - p} \right)\) and natural statistic \(X\). Note that the natural parameter is the logarithm of the odds corresponding to \(p\). The function \(p \mapsto \ln \left( \frac{p}{1 - p} \right)\) is sometimes called the logit function.
Suppose that \(X\) has the Poisson distribution with parameter \(a \in (0, \infty)\). This distribution is a one-parameter exponential family with natural parameter \(\ln(a)\) and natural statistic \(X\).
Suppose that \(X\) has the negative binomial distribution with parameters \(k\) and \(p\), where \(k\) is fixed and \(p \in (0, 1)\). This distribution is a one-parameter exponential family with natural parameter \(\ln(1 - p)\) and natural statistic \(X\).
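The three examples above can be checked numerically by writing each density in the form \(\alpha(\theta) \, g(x) \, \exp[\beta(\theta) \, h(x)]\) and comparing with the usual formula. The sketch below does this under stated assumptions: the helper `exp_family_pdf` is hypothetical, the particular factors \(\alpha\) and \(g\) are one valid choice among several, and the negative binomial variable is assumed to count the number of trials needed for \(k\) successes, so that \(x \in \{k, k + 1, \ldots\}\).

```python
import math

def exp_family_pdf(alpha, g, beta, h):
    # One-parameter form: f_theta(x) = alpha(theta) g(x) exp(beta(theta) h(x))
    return lambda x, theta: alpha(theta) * g(x) * math.exp(beta(theta) * h(x))

n, k = 10, 3  # fixed (non-varying) parameters, chosen for illustration

binomial = exp_family_pdf(
    alpha=lambda p: (1 - p) ** n,
    g=lambda x: math.comb(n, x),
    beta=lambda p: math.log(p / (1 - p)),  # natural parameter: logit of p
    h=lambda x: x,                         # natural statistic: x
)
poisson = exp_family_pdf(
    alpha=lambda a: math.exp(-a),
    g=lambda x: 1 / math.factorial(x),
    beta=lambda a: math.log(a),            # natural parameter: ln(a)
    h=lambda x: x,
)
# Assumption: X is the trial number of the k-th success, x = k, k + 1, ...
neg_binomial = exp_family_pdf(
    alpha=lambda p: (p / (1 - p)) ** k,
    g=lambda x: math.comb(x - 1, k - 1),
    beta=lambda p: math.log(1 - p),        # natural parameter: ln(1 - p)
    h=lambda x: x,
)

p, a = 0.3, 2.5  # illustrative parameter values
err_binomial = max(
    abs(binomial(x, p) - math.comb(n, x) * p ** x * (1 - p) ** (n - x))
    for x in range(n + 1)
)
err_poisson = max(
    abs(poisson(x, a) - math.exp(-a) * a ** x / math.factorial(x))
    for x in range(30)
)
err_neg_binomial = max(
    abs(neg_binomial(x, p) - math.comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k))
    for x in range(k, 40)
)
print(err_binomial, err_poisson, err_neg_binomial)
```

All three errors should be at the level of floating-point rounding.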
In many cases, the distribution of a random variable \(X\) will fail to be an exponential family if the support set \(\{x \in S: f_\theta(x) \gt 0\}\) depends on the parameter \(\theta\).
If \(X\) has the uniform distribution on \((0, a)\) where \(a \in (0, \infty)\) then the distribution of \(X\) is not an exponential family.
The next exercise shows that if we sample from the distribution of an exponential family, then the distribution of the random sample is itself an exponential family with the same natural statistics.
Suppose that the distribution of a random variable \(X\) is a \(k\)-parameter exponential family with natural parameters \((\beta_1(\theta), \beta_2(\theta), \ldots, \beta_k(\theta))\) and natural statistics \((h_1(X), h_2(X), \ldots, h_k(X))\). Let \(\boldsymbol{X} = (X_1, X_2, \ldots, X_n)\) be a sequence of \(n\) independent random variables, each with the same distribution as \(X\). Then the distribution of \(\boldsymbol{X}\) is a \(k\)-parameter exponential family with natural parameters \((\beta_1(\theta), \beta_2(\theta), \ldots, \beta_k(\theta))\) and natural statistics
\[ u_j(\boldsymbol{X}) = \sum_{i=1}^n h_j(X_i), \quad j \in \{1, 2, \ldots, k\} \]
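To see the result in a concrete case, the sketch below takes a Poisson sample, for which \(k = 1\) and \(h_1(x) = x\), and checks that the product of the individual densities equals \(\alpha(a)^n \left[\prod_{i=1}^n g(x_i)\right] \exp[\ln(a) \, u(\boldsymbol{x})]\) with \(u(\boldsymbol{x}) = \sum_{i=1}^n x_i\). The sample values and function names are assumptions made for illustration. As a consequence of the factorization, two samples of the same size with the same value of \(u\) give the same ratio of joint densities at any two parameter values.

```python
import math

def poisson_pmf(x, a):
    # Density of the Poisson distribution with parameter a
    return math.exp(-a) * a ** x / math.factorial(x)

def joint_pmf(sample, a):
    # Joint density of independent observations: product of the marginals
    return math.prod(poisson_pmf(x, a) for x in sample)

def joint_pmf_exp_family(sample, a):
    # Same density in exponential family form, with natural statistic u
    n = len(sample)
    u = sum(sample)  # u(x) = sum of h(x_i), where h(x) = x
    g = math.prod(1 / math.factorial(x) for x in sample)
    return math.exp(-a) ** n * g * math.exp(math.log(a) * u)

sample_1 = [2, 0, 3, 1, 4]  # hypothetical data with u = 10
sample_2 = [1, 1, 2, 2, 4]  # different data, same n and same u = 10

direct = joint_pmf(sample_1, 2.0)
factored = joint_pmf_exp_family(sample_1, 2.0)

# Ratios of joint densities at two parameter values depend only on (n, u)
ratio_1 = joint_pmf(sample_1, 2.0) / joint_pmf(sample_1, 3.5)
ratio_2 = joint_pmf(sample_2, 2.0) / joint_pmf(sample_2, 3.5)

print(direct, factored)
print(ratio_1, ratio_2)
```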