$$\newcommand{\P}{\mathbb{P}}$$ $$\newcommand{\R}{\mathbb{R}}$$ $$\newcommand{\E}{\mathbb{E}}$$ $$\newcommand{\var}{\text{var}}$$ $$\newcommand{\sd}{\text{sd}}$$ $$\newcommand{\skew}{\text{skew}}$$ $$\newcommand{\kurt}{\text{kurt}}$$

## 1. Introduction

In this chapter, we will study a number of parametric families of distributions that have special importance in probability and statistics. In some cases, a distribution is important because it occurs as the limit of other distributions. In other cases, a parametric family is important because it can be used to model a wide variety of random phenomena, usually because the family has a rich collection of probability density functions governed by a small number of parameters (usually 1 or 2). As a general philosophical principle, we try to model a random process with as few parameters as possible; this is sometimes referred to as the principle of parsimony of parameters. This principle is a special case of Ockham's razor, named in honor of William of Ockham: one should use the simplest model that adequately describes a given phenomenon.

There are several other parametric families of distributions that are studied elsewhere in this project, because the natural home for each of these distributions is a particular random process studied there.

Before we begin our study of special parametric families of distributions, we will study two general parametric families. Many of the special parametric families studied in this chapter belong to one or both of these general families.

### Location-Scale Families

Suppose that $$Z$$ is a fixed random variable taking values in $$\R$$. For $$a \in \R$$ and $$b \gt 0$$, let $$X = a + b \, Z$$. The two-parameter family of distributions associated with $$X$$ is called the location-scale family associated with the given distribution of $$Z$$; $$a$$ is called the location parameter and $$b$$ the scale parameter. Thus a linear transformation, with positive slope, of the underlying random variable $$Z$$ creates a location-scale family for the underlying distribution. In the special case that $$b = 1$$, the one-parameter family is called the location family associated with the given distribution, and in the special case that $$a = 0$$, the one-parameter family is called the scale family associated with the given distribution. Scale transformations, as the name suggests, occur naturally when physical units are changed. For example, if a random variable represents the length of an object, then a change of units from meters to inches corresponds to a scale transformation. Location-scale transformations can also occur with a change of physical units. For example, if a random variable represents the temperature of an object, then a change of units from Fahrenheit to Celsius corresponds to a location-scale transformation.

If $$Z$$ has distribution function $$G$$ then $$X$$ has distribution function $$F$$ given by

$F(x) = G \left( \frac{x - a}{b} \right), \quad x \in \R$
Proof: $F(x) = \P(X \le x) = \P(a + b Z \le x) = \P\left(Z \le \frac{x - a}{b}\right) = G\left(\frac{x - a}{b}\right)$
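The relation between the distribution functions can be checked numerically. The following Python sketch uses the standard exponential distribution for $$Z$$ (an assumed example choice, as are the values of $$a$$ and $$b$$):

```python
import math

# Distribution function of the standard exponential distribution for Z
# (an assumed example choice)
def G(z):
    return 1.0 - math.exp(-z) if z >= 0 else 0.0

a, b = 2.0, 3.0  # location and scale parameters

# Distribution function of X = a + b Z via the theorem
def F(x):
    return G((x - a) / b)

# Direct formula for the resulting distribution function, for comparison
def F_direct(x):
    return 1.0 - math.exp(-(x - a) / b) if x >= a else 0.0

for x in [1.0, 2.0, 3.5, 10.0]:
    assert abs(F(x) - F_direct(x)) < 1e-12
```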

If $$Z$$ has a continuous distribution with probability density function $$g$$, then $$X$$ also has a continuous distribution, with probability density function $$f$$ given by

$f(x) = \frac{1}{b} \, g \left( \frac{x - a}{b} \right), \quad x \in \R$
1. For the location family associated with $$g$$, the graph of $$f$$ is obtained by shifting the graph of $$g$$, $$a$$ units to the right if $$a \gt 0$$ and $$-a$$ units to the left if $$a \lt 0$$.
2. For the scale family associated with $$g$$, if $$b \gt 1$$, the graph of $$f$$ is obtained from the graph of $$g$$ by stretching horizontally and compressing vertically, by a factor of $$b$$. If $$0 \lt b \lt 1$$, the graph of $$f$$ is obtained from the graph of $$g$$ by compressing horizontally and stretching vertically, by a factor of $$1/b$$.
Proof:

This follows by taking derivatives in Theorem 1 and using the chain rule, since $$f = F^\prime$$ and $$g = G^\prime$$.
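The density relation can also be verified by simulation. The sketch below uses a standard normal distribution for $$Z$$ (an assumed example choice): the fraction of simulated values of $$X = a + b Z$$ falling in a small interval around a point $$x_0$$, divided by the interval length, should approximate $$f(x_0)$$.

```python
import math
import random

random.seed(1)

# Standard normal density for Z (an assumed example choice)
def g(z):
    return math.exp(-z * z / 2) / math.sqrt(2 * math.pi)

a, b = 1.0, 0.5

# Density of X = a + b Z from the theorem
def f(x):
    return g((x - a) / b) / b

# Monte Carlo check: the fraction of simulated X values near x0,
# scaled by the interval length h, should be close to f(x0)
x0, h = 1.2, 0.05
n = 200_000
count = sum(1 for _ in range(n)
            if abs(a + b * random.gauss(0, 1) - x0) < h / 2)
print(count / (n * h), f(x0))  # the two numbers should nearly agree
```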

If $$Z$$ has a mode at $$z$$, then $$X$$ has a mode at $$x = a + b z$$.

Proof:

This follows from Theorem 2: if $$g$$ has a maximum at $$z$$ then $$f$$ has a maximum at $$x = a + b z$$.

The following exercise relates the quantile functions of $$Z$$ and $$X$$.

If $$G$$ and $$F$$ are the distribution functions of $$Z$$ and $$X$$, respectively, then

1. $$F^{-1}(p) = a + b \, G^{-1}(p)$$ for $$p \in (0, 1)$$
2. If $$z$$ is a quantile of order $$p$$ for $$Z$$ then $$x = a + b \, z$$ is a quantile of order $$p$$ for $$X$$.
Proof:

These results follow from Theorem 1.
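The quantile relation can be illustrated numerically. The sketch below again uses the standard exponential distribution for $$Z$$ (an assumed example choice), for which $$G^{-1}(p) = -\ln(1 - p)$$:

```python
import math

# Standard exponential quantile function (assumed example choice)
def G_inv(p):
    return -math.log(1.0 - p)

a, b = 2.0, 3.0

# Quantile function of X = a + b Z from the theorem
def F_inv(p):
    return a + b * G_inv(p)

# Verify against the distribution function F(x) = 1 - exp(-(x - a)/b)
def F(x):
    return 1.0 - math.exp(-(x - a) / b)

for p in [0.1, 0.5, 0.9]:
    assert abs(F(F_inv(p)) - p) < 1e-12
```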

Let $$a \in \R$$ and $$b \gt 0$$. The uniform distribution on the interval $$[a, a + b]$$ is a location-scale family.

Proof:

Suppose that $$Z$$ has the uniform distribution on $$[0, 1]$$ (so that $$Z$$ is a random number). Then $$X = a + b Z$$ has the uniform distribution on $$[a, a + b]$$.

Let $$g(z) = e^{-z}$$ for $$0 \le z \lt \infty$$. This is the probability density function of the exponential distribution with parameter 1.

1. Find the location-scale family of probability density functions associated with $$g$$.
2. Sketch the graphs.

$f(x) = \frac{1}{b} \exp\left(-\frac{x - a}{b}\right), \quad a \le x \lt \infty$

The distributions in the previous exercise are the two-parameter exponential distributions.

Let $$g(z) = \frac{1}{\pi \, (1 + z^2)}$$ for $$z \in \R$$. This is the probability density function of the Cauchy distribution, named after Augustin Cauchy.

1. Find the location-scale family of probability density functions.
2. Sketch the graphs.

$f(x) = \frac{1}{\pi b \left[1 + \left(\frac{x - a}{b}\right)^2\right]}, \quad x \in \R$

The distributions in the previous exercise form the general family of Cauchy distributions.
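A general Cauchy variable can be simulated with the inverse transform method: if $$U$$ is uniform on $$(0, 1)$$ then $$Z = \tan[\pi (U - 1/2)]$$ has the standard Cauchy distribution, so $$X = a + b Z$$ has the general Cauchy distribution. The Cauchy mean does not exist, but the median is the location parameter $$a$$, which the sketch below checks (the parameter values are assumed for illustration):

```python
import math
import random
import statistics

random.seed(2)
a, b = 1.0, 2.0

# Standard Cauchy via inverse transform: Z = tan(pi (U - 1/2))
def cauchy(a, b):
    u = random.random()
    return a + b * math.tan(math.pi * (u - 0.5))

sample = [cauchy(a, b) for _ in range(100_000)]

# The Cauchy distribution has no mean, but its median is a
print(statistics.median(sample))  # should be close to a = 1.0
```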

The following exercise relates the mean, variance, and standard deviation of $$Z$$ and $$X$$.

As before, suppose that $$X = a + b \, Z$$. Then

1. $$\E(X) = a + b \, \E(Z)$$
2. $$\var(X) = b^2 \, \var(Z)$$
3. $$\sd(X) = b \, \sd(Z)$$
Proof:

These results follow immediately from basic properties of expected value and variance.
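Since the relations hold exactly for any distribution, they can be checked on a small discrete example. Here $$Z$$ takes the values $$0, 1, 2, 3$$ with equal probability (an assumed example choice):

```python
import statistics

# Z takes values 0, 1, 2, 3 with equal probability (assumed example)
z_vals = [0, 1, 2, 3]
a, b = 5.0, 2.0
x_vals = [a + b * z for z in z_vals]

# Population mean and variance over the equally likely values
EZ, varZ = statistics.fmean(z_vals), statistics.pvariance(z_vals)
EX, varX = statistics.fmean(x_vals), statistics.pvariance(x_vals)

assert abs(EX - (a + b * EZ)) < 1e-12      # E(X) = a + b E(Z)
assert abs(varX - b * b * varZ) < 1e-12    # var(X) = b^2 var(Z)
assert abs(statistics.pstdev(x_vals) - b * statistics.pstdev(z_vals)) < 1e-12
```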

Recall that the standard score of a random variable is obtained by subtracting the mean and dividing by the standard deviation. The standard score is dimensionless (that is, has no physical units) and measures the distance from the mean to the random variable in standard deviations. Since location-scale families essentially correspond to a change of units, it's not surprising that the standard score is unchanged by a location-scale transformation.

The standard scores of $$X$$ and $$Z$$ are the same:

$\frac{X - \E(X)}{\sd(X)} = \frac{Z - \E(Z)}{\sd(Z)}$
Proof:

From the previous theorem,

$\frac{X - \E(X)}{\sd(X)} = \frac{a + b Z - [a + b \E(Z)]}{b \sd(Z)} = \frac{Z - \E(Z)}{\sd(Z)}$

Recall that the skewness and kurtosis of a random variable are the third and fourth moments, respectively, of the standard score. Thus it follows from the previous exercise that skewness and kurtosis are unchanged by location-scale transformations: $$\skew(X) = \skew(Z)$$, $$\kurt(X) = \kurt(Z)$$. The following exercise relates the moment generating functions of $$Z$$ and $$X$$.

If $$Z$$ has moment generating function $$M$$ then $$X$$ has moment generating function $$N$$ given by

$N(t) = e^{a \, t} \, M(b \, t)$
Proof: $N(t) = \E(e^{tX}) = \E[e^{t(a + bZ)}] = e^{ta} \E(e^{t b Z}) = e^{a t} M(b t)$
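The moment generating function relation can be checked by Monte Carlo simulation. The sketch below uses the standard exponential distribution for $$Z$$ (an assumed example choice), whose moment generating function is $$M(t) = 1/(1 - t)$$ for $$t \lt 1$$:

```python
import math
import random

random.seed(3)

# MGF of the standard exponential distribution: M(t) = 1/(1 - t), t < 1
# (an assumed example choice for Z)
def M(t):
    return 1.0 / (1.0 - t)

a, b, t = 2.0, 2.0, 0.2

# Theorem: N(t) = e^{a t} M(b t)
N_formula = math.exp(a * t) * M(b * t)

# Monte Carlo estimate of N(t) = E[e^{t X}] with X = a + b Z
n = 200_000
N_mc = sum(math.exp(t * (a + b * random.expovariate(1.0)))
           for _ in range(n)) / n
print(N_formula, N_mc)  # the two values should nearly agree
```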

Two probability distributions on $$\R$$ are said to be of the same type if they are related by a location-scale transformation. Specifically, if the distributions have distribution functions $$F$$ and $$G$$, then the distributions are of the same type if there exist constants $$a \in \R$$ and $$b \in (0, \infty)$$ such that

$F(x) = G \left( \frac{x - a}{b} \right), \quad x \in \R$

Being of the same type is an equivalence relation on the collection of probability distributions on $$\R$$. That is, if $$F$$, $$G$$, and $$H$$ are arbitrary distribution functions then

1. $$F$$ is the same type as $$F$$ (the reflexive property).
2. If $$F$$ is the same type as $$G$$ then $$G$$ is the same type as $$F$$ (the symmetric property).
3. If $$F$$ is the same type as $$G$$, and $$G$$ is the same type as $$H$$, then $$F$$ is the same type as $$H$$ (the transitive property).

### Exponential Families

Suppose that $$X$$ is a random variable taking values in $$S$$, and that the distribution of $$X$$ depends on an unspecified parameter $$\theta$$ taking values in a parameter space $$\Theta$$. In general, both $$X$$ and $$\theta$$ may be vector-valued. Let $$f_\theta$$ denote the probability density function of $$X$$ on $$S$$, corresponding to $$\theta \in \Theta$$.

The distribution of $$X$$ is a $$k$$-parameter exponential family if $$S$$ does not depend on $$\theta$$ and if the probability density function can be written as

$f_\theta(x) = \alpha(\theta) \, g(x) \, \exp \left( \sum_{i=1}^k \beta_i(\theta) \, h_i(x) \right); \quad x \in S, \; \theta \in \Theta$

where $$\alpha$$ and $$(\beta_1, \beta_2, \ldots, \beta_k)$$ are real-valued functions on $$\Theta$$, and where $$g$$ and $$(h_1, h_2, \ldots, h_k)$$ are real-valued functions on $$S$$. Moreover, $$k$$ is assumed to be the smallest such integer. The parameters $$(\beta_1(\theta), \beta_2(\theta), \ldots, \beta_k(\theta))$$ are sometimes called natural parameters of the distribution, and the random variables $$(h_1(X), h_2(X), \ldots, h_k(X))$$ are sometimes called natural statistics of the distribution. Although the definition may look intimidating, exponential families are useful because they have many nice mathematical properties, and because many special parametric families turn out to be exponential families.

Suppose that $$X$$ has the binomial distribution with parameters $$n$$ and $$p$$, where $$n$$ is fixed and $$p \in (0, 1)$$. This distribution is a one-parameter exponential family with natural parameter $$\ln \left( \frac{p}{1 - p} \right)$$ and natural statistic $$X$$. Note that the natural parameter is the logarithm of the odds ratio corresponding to $$p$$. This function is sometimes called the logit function.
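Concretely, the binomial probability density function $$\binom{n}{x} p^x (1 - p)^{n - x}$$ factors as $$\alpha(p) \, g(x) \, e^{\beta(p) \, x}$$ with $$\alpha(p) = (1 - p)^n$$, $$g(x) = \binom{n}{x}$$, and $$\beta(p) = \ln[p / (1 - p)]$$. A quick numerical check (the values of $$n$$ and $$p$$ are assumed for illustration):

```python
import math

n, p = 10, 0.3

# Exponential-family factorization of the binomial pmf:
# alpha(p) = (1-p)^n, g(x) = C(n, x), beta(p) = ln(p/(1-p)), h(x) = x
alpha = (1 - p) ** n
beta = math.log(p / (1 - p))  # the logit, i.e. the natural parameter

for x in range(n + 1):
    pmf = math.comb(n, x) * p ** x * (1 - p) ** (n - x)
    factored = alpha * math.comb(n, x) * math.exp(beta * x)
    assert abs(pmf - factored) < 1e-12
```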

Suppose that $$X$$ has the Poisson distribution with parameter $$a \in (0, \infty)$$. This distribution is a one-parameter exponential family with natural parameter $$\ln(a)$$ and natural statistic $$X$$.
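Here the Poisson probability density function $$e^{-a} a^x / x!$$ factors as $$\alpha(a) \, g(x) \, e^{\beta(a) \, x}$$ with $$\alpha(a) = e^{-a}$$, $$g(x) = 1 / x!$$, and $$\beta(a) = \ln(a)$$, as the following sketch checks (the value of $$a$$ is assumed for illustration):

```python
import math

a = 2.5  # Poisson parameter (assumed example value)

# Exponential-family factorization of the Poisson pmf:
# alpha(a) = e^{-a}, g(x) = 1/x!, beta(a) = ln(a), h(x) = x
alpha = math.exp(-a)
beta = math.log(a)

for x in range(20):
    pmf = math.exp(-a) * a ** x / math.factorial(x)
    factored = alpha * (1 / math.factorial(x)) * math.exp(beta * x)
    assert abs(pmf - factored) < 1e-12
```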

Suppose that $$X$$ has the negative binomial distribution with parameters $$k$$ and $$p$$, where $$k$$ is fixed and $$p \in (0, 1)$$. This distribution is a one-parameter exponential family with natural parameter $$\ln(1 - p)$$ and natural statistic $$X$$.
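For the negative binomial distribution on the trial number of the $$k$$th success, the probability density function $$\binom{x - 1}{k - 1} p^k (1 - p)^{x - k}$$ for $$x \in \{k, k + 1, \ldots\}$$ factors with $$\alpha(p) = [p / (1 - p)]^k$$, $$g(x) = \binom{x - 1}{k - 1}$$, and $$\beta(p) = \ln(1 - p)$$. A numerical check (the values of $$k$$ and $$p$$ are assumed for illustration):

```python
import math

k, p = 3, 0.4  # number of successes and success probability (assumed)

# Negative binomial pmf on the trial number of the k-th success:
# f(x) = C(x-1, k-1) p^k (1-p)^{x-k},  x = k, k+1, ...
# Factorization: alpha = (p/(1-p))^k, g(x) = C(x-1, k-1),
#                beta = ln(1-p), h(x) = x
alpha = (p / (1 - p)) ** k
beta = math.log(1 - p)

for x in range(k, k + 20):
    pmf = math.comb(x - 1, k - 1) * p ** k * (1 - p) ** (x - k)
    factored = alpha * math.comb(x - 1, k - 1) * math.exp(beta * x)
    assert abs(pmf - factored) < 1e-12
```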

In many cases, the distribution of a random variable $$X$$ will fail to be an exponential family if the support set $$\{x \in S: f(x) \gt 0\}$$ depends on the parameter $$\theta$$.

If $$X$$ has the uniform distribution on $$(0, a)$$ where $$a \in (0, \infty)$$ then the distribution of $$X$$ is not an exponential family.

The next exercise shows that if we take a random sample from a distribution in an exponential family, then the distribution of the random sample is itself an exponential family with the same natural parameters, and with natural statistics given by sums.

Suppose that the distribution of a random variable $$X$$ is a $$k$$-parameter exponential family with natural parameters $$(\beta_1(\theta), \beta_2(\theta), \ldots, \beta_k(\theta))$$, and natural statistics $$(h_1(X), h_2(X), \ldots, h_k(X))$$. Let $$\boldsymbol{X} = (X_1, X_2, \ldots, X_n)$$ be a sequence of $$n$$ independent random variables, each with the same distribution as $$X$$. Then $$\boldsymbol{X}$$ is a $$k$$-parameter exponential family with natural parameters $$(\beta_1(\theta), \beta_2(\theta), \ldots, \beta_k(\theta))$$, and natural statistics

$u_j(\boldsymbol{X}) = \sum_{i=1}^n h_j(X_i), \quad j \in \{1, 2, \ldots, k\}$
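The factorization of the joint density can be checked numerically. The sketch below uses a sample from the Poisson distribution (the parameter value and the sample points are assumed for illustration): the joint probability density, a product of the marginal densities, equals $$\alpha(\theta)^n \prod_i g(x_i) \exp[\beta(\theta) \sum_i x_i]$$.

```python
import math

a = 2.0  # Poisson parameter (assumed example value)

def poisson_pmf(x):
    return math.exp(-a) * a ** x / math.factorial(x)

# Joint pmf of an independent sample is the product of the marginal pmfs
sample = [1, 3, 0, 2, 4]
n = len(sample)
joint = math.prod(poisson_pmf(x) for x in sample)

# Exponential-family form of the sample:
# alpha^n * prod g(x_i) * exp(beta * sum x_i), with the natural
# statistic now the sum of the sample values
alpha_n = math.exp(-a) ** n
g_prod = math.prod(1 / math.factorial(x) for x in sample)
factored = alpha_n * g_prod * math.exp(math.log(a) * sum(sample))

assert abs(joint - factored) < 1e-12
```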