\(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\N}{\mathbb{N}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\P}{\mathbb{P}}\) \(\newcommand{\var}{\text{var}}\) \(\newcommand{\sd}{\text{sd}}\) \(\newcommand{\cov}{\text{cov}}\) \(\newcommand{\cor}{\text{cor}}\) \(\newcommand{\skew}{\text{skew}}\) \(\newcommand{\kurt}{\text{kurt}}\)

The Beta Distribution

In this section, we will study a two-parameter family of distributions that has special importance in probability and statistics.

The Beta Function

The beta function, first introduced by Leonhard Euler, is defined as follows:

\[ B(a, b) = \int_0^1 u^{a-1} (1 - u)^{b - 1} du; \quad a \gt 0, \; b \gt 0 \]

The beta function is well-defined, that is, \(B(a, b) \lt \infty\) for any \(a \gt 0\) and \(b \gt 0\).

Proof:

Break the integral into two parts, from 0 to \(\frac{1}{2}\) and from \(\frac{1}{2}\) to 1. If \(0 \lt a \lt 1\), the integral is improper at \(u = 0\), but \((1 - u)^{b-1}\) is bounded on \((0, \frac{1}{2}]\) and \(\int_0^{1/2} u^{a-1} \, du = \frac{1}{a} \left(\frac{1}{2}\right)^a \lt \infty\). Similarly, if \(0 \lt b \lt 1\), the integral is improper at \(u = 1\), but \(u^{a-1}\) is bounded on \([\frac{1}{2}, 1)\) and \(\int_{1/2}^1 (1 - u)^{b-1} \, du = \frac{1}{b} \left(\frac{1}{2}\right)^b \lt \infty\).
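As a quick numerical sanity check, the integral can be evaluated directly. The following sketch assumes SciPy is available; the parameter values and the helper name `beta_integral` are arbitrary choices for illustration.

```python
# Numerical check that the beta integral is finite, using scipy.integrate.quad.
# Note that for a < 1 or b < 1 the integrand is unbounded at an endpoint,
# yet the integral still converges.
from scipy import integrate

def beta_integral(a, b):
    """Evaluate B(a, b) = int_0^1 u^(a-1) (1-u)^(b-1) du numerically."""
    value, _ = integrate.quad(lambda u: u**(a - 1) * (1 - u)**(b - 1), 0, 1)
    return value

print(beta_integral(2.0, 3.0))   # approximately 0.0833
print(beta_integral(0.5, 0.5))   # finite even though the integrand blows up at both endpoints
```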

The beta function satisfies the following properties:

  1. \(B(a, b) = B(b, a)\)
  2. \(B(a, 1) = \frac{1}{a}\)

The beta function can be written in terms of the gamma function as follows:

\[ B(a, b) = \frac{\Gamma(a) \, \Gamma(b)}{\Gamma(a + b)}; \quad a \gt 0, \; b \gt 0 \]
Proof:

Express \(\Gamma(a + b) \, B(a, b)\) as a double integral with respect to \(x\) and \(y\) where \(x \gt 0\) and \(0 \lt y \lt 1\). Next use the transformation \(w = x \, y\), \(z = x - x \, y\) and the change of variables theorem for multiple integrals. The transformation maps the \((x, y)\) region one-to-one and onto the region \(z \gt 0\), \(w \gt 0\). The Jacobian of the inverse transformation has magnitude \(\frac{1}{z + w}\). The transformed integral is \(\Gamma(a) \, \Gamma(b)\).

Recall that the gamma function is a generalization of the factorial function. The following result follows easily from a basic property of the gamma function.

If \(j \in \N_+\) and \(k \in \N_+\), then

\[ B(j, k) = \frac{(j - 1)! (k - 1)!}{(j + k - 1)!} \]
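Both the gamma-function representation and its factorial special case are easy to check numerically. This sketch assumes SciPy is available; the sample parameters are arbitrary.

```python
# Check B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b) and the factorial form
# for positive integer arguments.
from math import factorial
from scipy.special import beta, gamma

a, b = 2.5, 4.0
print(beta(a, b), gamma(a) * gamma(b) / gamma(a + b))   # should agree

j, k = 3, 5
print(beta(j, k), factorial(j - 1) * factorial(k - 1) / factorial(j + k - 1))
```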

Let's generalize this result. First, recall the generalized permutation formula from our study of combinatorial structures: for \(a \in \R\), \(s \in \R\), and \(j \in \N\), we defined

\[ a^{(s, j)} = a (a + s)(a + 2 \, s) \cdots [a + (j - 1)s] \]

In particular, note that \(a^{(1, j)} = a (a + 1) \cdots [a + (j - 1)]\) is the ascending power of base \(a\) and order \(j\).

If \(a \gt 0\), \(b \gt 0\), \(j \in \N\), and \(k \in \N\) then

\[ \frac{B(a + j, b + k)}{B(a, b)} = \frac{a^{(1, j)} b^{(1, k)}}{(a + b)^{(1, j + k)}} \]

\(B(\frac{1}{2}, \frac{1}{2}) = \pi\).

A graph of \(B(a, b)\) on the square \(0 \lt a \lt 1\), \(0 \lt b \lt 1\) is shown below.

Graph of the beta function

The integral that defines the beta function can be generalized by changing the interval of integration from \((0, 1)\) to \((0, x)\) where \(x \in [0, 1]\). The resulting function of \(x\) is known as the incomplete beta function:

\[ B(x; a, b) = \int_0^x u^{a-1} (1 - u)^{b-1} du, \quad 0 \le x \le 1 \]

Of course, the ordinary beta function is \(B(a, b) = B(1; a, b)\).
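SciPy exposes the *regularized* incomplete beta function, so the incomplete beta function defined above can be recovered by multiplying by \(B(a, b)\). A brief check, with arbitrary parameter values:

```python
# B(x; a, b) computed by direct integration versus betainc(a, b, x) * beta(a, b).
from scipy import integrate
from scipy.special import beta, betainc

a, b, x = 2.0, 3.5, 0.4
direct, _ = integrate.quad(lambda u: u**(a - 1) * (1 - u)**(b - 1), 0, x)
print(direct, betainc(a, b, x) * beta(a, b))   # should agree
```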

The Probability Density Function

The function \(f\) below is a probability density function for every \(a \gt 0\), \(b \gt 0\).

\[ f(x) = \frac{1}{B(a, b)} x^{a-1} (1 - x)^{b-1}, \quad 0 \lt x \lt 1 \]

Of course, the beta function is simply the normalizing constant. The distribution with this probability density function is called the beta distribution with left parameter \(a\) and right parameter \(b\). The beta distribution is useful for modeling random probabilities and proportions, particularly in the context of Bayesian analysis. The distribution has just two parameters and yet a rich variety of shapes (because both parameters are shape parameters).
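The density is easy to evaluate directly and to compare with SciPy's built-in beta distribution. The helper name `beta_pdf` and the parameter values below are arbitrary choices for illustration.

```python
# The beta density written directly, checked against scipy.stats.beta.pdf.
import numpy as np
from scipy import stats
from scipy.special import beta as beta_fn

def beta_pdf(x, a, b):
    return x**(a - 1) * (1 - x)**(b - 1) / beta_fn(a, b)

a, b = 2.0, 5.0
x = np.linspace(0.01, 0.99, 5)
print(beta_pdf(x, a, b))
print(stats.beta.pdf(x, a, b))   # should match the line above
```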

Sketch the graph of the beta probability density function. Note the qualitative differences in the shape of the density for the following parameter ranges:

  1. \(0 \lt a \lt 1\), \(0 \lt b \lt 1\)
  2. \(a = 1\), \(b = 1\)
  3. \(a = 1\), \(0 \lt b \lt 1\)
  4. \(0 \lt a \lt 1\), \(b = 1\)
  5. \(0 \lt a \lt 1\), \(b \gt 1\)
  6. \(a \gt 1\), \(0 \lt b \lt 1\)
  7. \(a = 1\), \(b \gt 1\)
  8. \(a \gt 1\), \(b = 1\)
  9. \(a \gt 1\), \(b \gt 1\)

From part (b) of the last exercise, note that the special case \(a = 1\) and \(b = 1\) gives the uniform distribution on the interval \((0, 1)\) (the standard uniform distribution). You should also have discovered that when \(a \lt 1\) or \(b \lt 1\), the probability density function is unbounded, and hence the distribution has no mode. On the other hand, if \(a \ge 1\), \(b \ge 1\), and at least one of the inequalities is strict, the distribution has a unique mode at

\[ \frac{a - 1}{a + b - 2} \]
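A rough numerical check of the mode formula, maximizing the density on a grid; the parameter values (both greater than 1) are arbitrary and SciPy is assumed.

```python
# Compare the grid maximizer of the density with (a - 1) / (a + b - 2).
import numpy as np
from scipy import stats

a, b = 3.0, 6.0
x = np.linspace(0.001, 0.999, 100001)
numerical_mode = x[np.argmax(stats.beta.pdf(x, a, b))]
print(numerical_mode, (a - 1) / (a + b - 2))   # both approximately 0.2857
```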

In the special distribution simulator, select the beta distribution. Set the parameters to values in each of the ranges of the previous exercise. In each case, note the shape of the beta density function. Run the simulation 1000 times and note the apparent convergence of the empirical density function to the true density function.

The special case \(a = \frac{1}{2}\), \(b = \frac{1}{2}\) is the arcsine distribution (the name will be explained below). Thus, the probability density function of the arcsine distribution is

\[ f(x) = \frac{1}{\pi \, \sqrt{x (1 - x)}}, \quad 0 \lt x \lt 1 \]

The Distribution Function

The beta distribution function \(F\) can be easily expressed in terms of the incomplete beta function. As usual \(a\) denotes the left parameter and \(b\) the right parameter.

The beta distribution function with parameters \(a \gt 0\) and \(b \gt 0\) is

\[ F(x) = \frac{B(x; a, b)}{B(a, b)}, \quad 0 \lt x \lt 1 \]

The distribution function in the last exercise is sometimes known as the regularized incomplete beta function. In some special cases, the distribution function \(F\) and its inverse, the quantile function \(F^{-1}\), can be computed in closed form, without resorting to special functions.
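In SciPy, the regularized incomplete beta function and the beta distribution function are the same computation. A brief check with arbitrary parameter values:

```python
# F(x) = regularized incomplete beta function.
from scipy import stats
from scipy.special import betainc

a, b, x = 2.5, 1.5, 0.3
print(betainc(a, b, x), stats.beta.cdf(x, a, b))   # should agree
```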

If \(a \gt 0\) and \(b = 1\) then

  1. \(F(x) = x^a\) for \(0 \lt x \lt 1\)
  2. \(F^{-1}(p) = p^{1/a}\) for \(0 \lt p \lt 1\)

If \(a = 1\) and \(b \gt 0\) then

  1. \(F(x) = 1 - (1 - x)^b\) for \(0 \lt x \lt 1\)
  2. \(F^{-1}(p) = 1 - (1 - p)^{1/b}\) for \(0 \lt p \lt 1\)

If \(a = b = \frac{1}{2}\) (the arcsine distribution) then

  1. \(F(x) = \frac{2}{\pi} \arcsin(\sqrt{x})\) for \(0 \lt x \lt 1\)
  2. \(F^{-1}(p) = \sin^2(\frac{\pi}{2} p)\) for \(0 \lt p \lt 1\)
  3. The first quartile is \(\frac{2 - \sqrt{2}}{4} \approx 0.1465\)
  4. The median is \(\frac{1}{2}\)
  5. The third quartile is \(\frac{2 + \sqrt{2}}{4} \approx 0.8536\)
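The three closed-form cases above can be checked against SciPy's distribution function and quantile function. The specific values of \(a\), \(b\), \(x\), and \(p\) below are arbitrary.

```python
# Closed-form distribution and quantile functions versus scipy.stats.beta.
import numpy as np
from scipy import stats

x, p = 0.3, 0.75

a = 2.0                                     # case b = 1
print(stats.beta.cdf(x, a, 1), x**a)
print(stats.beta.ppf(p, a, 1), p**(1 / a))

b = 3.0                                     # case a = 1
print(stats.beta.cdf(x, 1, b), 1 - (1 - x)**b)
print(stats.beta.ppf(p, 1, b), 1 - (1 - p)**(1 / b))

# arcsine case a = b = 1/2
print(stats.beta.cdf(x, 0.5, 0.5), 2 / np.pi * np.arcsin(np.sqrt(x)))
print(stats.beta.ppf(p, 0.5, 0.5), np.sin(np.pi * p / 2)**2)
```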

There is an interesting relationship between the distribution functions of the beta distribution and the binomial distribution, when the beta parameters are positive integers. To state the relationship we need to embellish our notation to indicate the dependence on the parameters. Thus, let \(F_{a, b}\) denote the beta distribution function with left parameter \(a \in (0, \infty)\) and right parameter \(b \in (0, \infty)\), and let \(G_{n,p}\) denote the binomial distribution function with trial parameter \(n \in \N_+\) and success parameter \(p \in (0, 1)\).

If \(j \in \N_+\), \(k \in \N_+\), and \(x \in (0, 1)\) then

\[ F_{j,k}(x) = G_{j + k - 1, 1 - x}(k - 1) \]
Proof:

Express \(F_{j,k}(x)\) as an integral of the beta probability density function. Integrate by parts to show that \(F_{j,k}(x) = \binom{j + k - 1}{j} x^j (1 - x)^{k-1} + F_{j + 1, k - 1}(x)\), and recall that \(F_{j + k - 1, 1}(x) = x^{j + k - 1}\). Iterating the recursion gives \(F_{j,k}(x) = \sum_{i=j}^{j + k - 1} \binom{j + k - 1}{i} x^i (1 - x)^{j + k - 1 - i}\), the probability of at least \(j\) successes in \(j + k - 1\) Bernoulli trials with success parameter \(x\), which is the same as the probability of at most \(k - 1\) successes when the success parameter is \(1 - x\).
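A numerical sketch of the beta-binomial identity, with arbitrary integer parameters \(j\), \(k\) and an arbitrary \(x\); SciPy is assumed.

```python
# F_{j,k}(x) = G_{j+k-1, 1-x}(k - 1): beta CDF versus binomial CDF.
from scipy import stats

j, k, x = 3, 5, 0.4
print(stats.beta.cdf(x, j, k))
print(stats.binom.cdf(k - 1, j + k - 1, 1 - x))   # should agree with the line above
```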

In the special distribution calculator, select the beta distribution. Vary the parameters and note the shape of the density function and the distribution function. In each of the following cases, find the median, the first and third quartiles, and the interquartile range. Sketch the boxplot.

  1. \(a = 1\), \(b = 1\)
  2. \(a = 1\), \(b = 3\)
  3. \(a = 3\), \(b = 1\)
  4. \(a = 2\), \(b = 4\)
  5. \(a = 4\), \(b = 2\)
  6. \(a = 4\), \(b = 4\)

Moments

The moments of the beta distribution are easy to express in terms of the beta function. As before, suppose that \(X\) has the beta distribution with left parameter \(a \gt 0\) and right parameter \(b \gt 0\).

If \(k \ge 0\) then

\[ \E(X^k) = \frac{B(a + k, b)}{B(a, b)} \]

In particular, if \(k \in \N\),

\[ \E(X^k) = \frac{a^{(1, k)}}{(a + b)^{(1, k)}} \]
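Both moment formulas are easy to verify numerically. The helper `ascending_power` is defined here for illustration, as in the earlier sketch, and the parameter values are arbitrary.

```python
# E(X^k) via scipy.stats.beta.moment, the beta-function ratio, and ascending powers.
from scipy import stats
from scipy.special import beta

def ascending_power(a, j):
    """a^{(1, j)} = a (a + 1) ... (a + j - 1)."""
    result = 1.0
    for i in range(j):
        result *= a + i
    return result

a, b, k = 2.5, 4.0, 3
X = stats.beta(a, b)
print(X.moment(k))
print(beta(a + k, b) / beta(a, b))
print(ascending_power(a, k) / ascending_power(a + b, k))   # all three should agree
```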

From the general formula for the moments, it's straightforward to compute the mean, variance, skewness, and kurtosis.

If \(X\) has the beta distribution with left parameter \(a \gt 0\) and right parameter \(b \gt 0\) then

  1. \(\E(X) = \frac{a}{a + b}\)
  2. \(\var(X) = \frac{a \, b}{(a + b)^2 (a + b + 1)}\)
  3. \(\skew(X) = \frac{2 \, (b - a) \sqrt{a + b + 1}}{(a + b + 2) \sqrt{a \, b}}\)
  4. \(\kurt(X) = \frac{3 \, (a + b + 1) \left[2 (a + b)^2 + a \, b \, (a + b - 6)\right]}{a \, b \, (a + b + 2) \, (a + b + 3)}\)

In particular, if \(a = b = \frac{1}{2}\), so that \(X\) has the arcsine distribution, then \(\E(X) = \frac{1}{2}\), \(\var(X) = \frac{1}{8}\), \(\skew(X) = 0\), and \(\kurt(X) = \frac{3}{2}\).
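These formulas can be checked against SciPy, keeping in mind that SciPy reports *excess* kurtosis, \(\kurt(X) - 3\). The helper `beta_mvsk` and the parameter values are arbitrary choices for illustration.

```python
# Mean, variance, skewness, and kurtosis from the formulas above versus scipy.
import numpy as np
from scipy import stats

def beta_mvsk(a, b):
    mean = a / (a + b)
    var = a * b / ((a + b)**2 * (a + b + 1))
    skew = 2 * (b - a) * np.sqrt(a + b + 1) / ((a + b + 2) * np.sqrt(a * b))
    kurt = (3 * (a + b + 1) * (2 * (a + b)**2 + a * b * (a + b - 6))
            / (a * b * (a + b + 2) * (a + b + 3)))
    return mean, var, skew, kurt

a, b = 2.0, 5.0
m, v, s, k = stats.beta.stats(a, b, moments='mvsk')
print(beta_mvsk(a, b))
print(m, v, s, k + 3)            # scipy's kurtosis is excess kurtosis, so add 3

print(beta_mvsk(0.5, 0.5))       # arcsine case: (0.5, 0.125, 0.0, 1.5)
```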

In the special distribution simulator, select the beta distribution. Set the parameters to values in each of the ranges of the density shape exercise above. In each case, note the size and location of the mean/standard deviation bar. In each case, run the simulation 1000 times and note the apparent convergence of the sample moments to the distribution moments.

Transformations

If \(X\) has the beta distribution with left parameter \(a \gt 0\) and right parameter \(b \gt 0\) then \(Y = 1 - X\) has the beta distribution with left parameter \(b\) and right parameter \(a\).

If \(X\) has the beta distribution with left parameter \(a \gt 0\) and right parameter \(b = 1\) then \(Y = 1 / X\) has the Pareto distribution with shape parameter \(a\).

Suppose that \(X\) has the gamma distribution with shape parameter \(a \gt 0\) and scale parameter \(r \gt 0\), that \(Y\) has the gamma distribution with shape parameter \(b\) and scale parameter \(r\), and that \(X\) and \(Y\) are independent. Then \(U = X / (X + Y)\) has the beta distribution with left parameter \(a\) and right parameter \(b\).
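A Monte Carlo sketch of the gamma-ratio result; the shape parameters, scale, sample size, and seed are arbitrary, and NumPy and SciPy are assumed.

```python
# U = X / (X + Y) for independent gammas with a common scale should be Beta(a, b).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
a, b, r, n = 2.0, 3.0, 1.5, 100_000
X = rng.gamma(a, scale=r, size=n)
Y = rng.gamma(b, scale=r, size=n)
U = X / (X + Y)

# Compare a few empirical quantiles with the Beta(a, b) quantiles.
for q in (0.25, 0.5, 0.75):
    print(np.quantile(U, q), stats.beta.ppf(q, a, b))
```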

If \(X\) has the \(F\) distribution with \(m \gt 0\) degrees of freedom in the numerator and \(n \gt 0\) degrees of freedom in the denominator then

\[ U = \frac{(m / n) X}{1 + (m / n)X} \]

has the beta distribution with left parameter \(a = m / 2\) and right parameter \(b = n / 2\).
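A similar Monte Carlo sketch of the \(F\)-distribution transformation; the degrees of freedom, sample size, and seed are arbitrary.

```python
# Transform F(m, n) samples and compare with Beta(m/2, n/2) quantiles.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
m, n = 5, 8
X = stats.f.rvs(m, n, size=100_000, random_state=rng)
U = (m / n) * X / (1 + (m / n) * X)

for q in (0.25, 0.5, 0.75):
    print(np.quantile(U, q), stats.beta.ppf(q, m / 2, n / 2))
```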

Suppose that \(X\) has the beta distribution with left parameter \(a \gt 0\) and right parameter \(b \gt 0\). Then the distribution is a two-parameter exponential family with natural parameters \(a - 1\) and \(b - 1\), and natural statistics \(\ln(X)\) and \(\ln(1 - X)\).

The beta distribution also arises as the distribution of the order statistics of a random sample from the standard uniform distribution: the \(k\)th order statistic of a random sample of size \(n\) has the beta distribution with left parameter \(k\) and right parameter \(n - k + 1\).
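A Monte Carlo sketch of the order-statistic result; the values of \(n\), \(k\), the number of replications, and the seed are arbitrary.

```python
# The kth order statistic of n standard uniforms should look like Beta(k, n - k + 1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k, reps = 10, 3, 50_000
samples = rng.uniform(size=(reps, n))
kth = np.sort(samples, axis=1)[:, k - 1]        # kth order statistic of each sample

print(kth.mean(), k / (n + 1))                  # Beta(k, n - k + 1) has mean k / (n + 1)
print(np.quantile(kth, 0.5), stats.beta.ppf(0.5, k, n - k + 1))
```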

The Generalized Beta Distribution

The beta distribution can be easily generalized from the support interval \((0, 1)\) to an arbitrary bounded interval using a linear transformation. Thus, this generalization is simply the location-scale family associated with the standard beta distribution. Specifically, suppose that \(Z\) has the standard beta distribution with left parameter \(a \gt 0\) and right parameter \(b \gt 0\). For \(c \in \R\) and \(d \in (0, \infty)\), let \(X = c + d \, Z\).

\(X\) has probability density function

\[ f(x) = \frac{1}{B(a, b) \, d^{a + b - 1}} (x - c)^{a - 1} (c + d - x)^{b - 1}, \quad c \lt x \lt c + d \]
Proof:

Use the change of variables theorem.
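In SciPy, the generalized beta distribution corresponds to the beta distribution with `loc = c` and `scale = d`. A brief check of the density formula, with arbitrary parameter values:

```python
# Generalized beta density written directly versus scipy.stats.beta with loc/scale.
import numpy as np
from scipy import stats
from scipy.special import beta as beta_fn

a, b, c, d = 2.0, 3.0, 1.0, 4.0
x = np.linspace(c + 0.1, c + d - 0.1, 5)

direct = (x - c)**(a - 1) * (c + d - x)**(b - 1) / (beta_fn(a, b) * d**(a + b - 1))
print(direct)
print(stats.beta.pdf(x, a, b, loc=c, scale=d))   # should match the line above
```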

Most of the results in the previous sections have simple extensions to this generalized beta distribution. In particular, the mean and variance are given in the following exercise.

With \(X\) as defined above,

  1. \(\E(X) = c + d \, \frac{a}{a + b}\)
  2. \(\var(X) = d^2 \, \frac{a \, b}{(a + b)^2 \, (a + b + 1)}\)
Proof:

Use moment results for the standard beta distribution and basic properties of expected value and variance.
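A final numerical check of the mean and variance formulas, again using SciPy's location-scale machinery with arbitrary parameter values.

```python
# Mean and variance of the generalized beta distribution.
from scipy import stats

a, b, c, d = 2.0, 3.0, 1.0, 4.0
X = stats.beta(a, b, loc=c, scale=d)
print(X.mean(), c + d * a / (a + b))
print(X.var(), d**2 * a * b / ((a + b)**2 * (a + b + 1)))
```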