\(\newcommand{\P}{\mathbb{P}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\var}{\text{var}}\) \(\newcommand{\N}{\mathbb{N}}\)

10. The Zeta Distribution

The zeta distribution is used to model the sizes or ranks of certain types of objects randomly chosen from certain types of populations. Typical examples include the frequency of occurrence of a word randomly chosen from a text, or the population rank of a city randomly chosen from a country. The zeta distribution is also known as the Zipf distribution, in honor of the American linguist George Zipf.

The Zeta Function

The Riemann zeta function \(\zeta\), named after Bernhard Riemann, is defined as follows:

\[ \zeta(a) = \sum_{n=1}^\infty \frac{1}{n^a}, \quad a \gt 1 \]

You might recall from calculus that the series in the zeta function converges for \(a \gt 1\) and diverges for \(a \le 1\). A graph of the zeta function on the interval \((1, 10]\) is given below:

Graph of the zeta function

The zeta function satisfies the following properties:

  1. \(\zeta\) is decreasing.
  2. \(\zeta\) is concave upward.
  3. \(\zeta(a) \downarrow 1\) as \(a \uparrow \infty\)
  4. \(\zeta(a) \uparrow \infty\) as \(a \downarrow 1\)

The zeta function is transcendental, and most of its values must be approximated. However, \(\zeta(a)\) can be given explicitly for even integer values of \(a\); in particular, \(\zeta(2) = \frac{\pi^2}{6}\) and \(\zeta(4) = \frac{\pi^4}{90}\).
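
For values that must be approximated, one option is to sum the first terms of the series directly and estimate the remaining tail. The following is a minimal sketch, not a definitive implementation: the function name `zeta`, the parameter `cutoff`, and the choice of an Euler-Maclaurin tail correction are assumptions made for illustration.

```python
import math


def zeta(a, cutoff=1000):
    """Approximate the Riemann zeta function for real a > 1 (illustrative sketch)."""
    if a <= 1:
        raise ValueError("the series diverges for a <= 1")
    # Sum the terms n = 1, ..., cutoff - 1, smallest first to limit rounding error.
    head = sum(n ** -a for n in range(cutoff - 1, 0, -1))
    # Euler-Maclaurin estimate of the tail sum over n >= cutoff:
    # integral term + half of the first tail term + first derivative correction.
    N = float(cutoff)
    tail = N ** (1 - a) / (a - 1) + 0.5 * N ** -a + a * N ** (-a - 1) / 12
    return head + tail


# Compare with the closed forms for a = 2 and a = 4.
print(zeta(2), math.pi ** 2 / 6)
print(zeta(4), math.pi ** 4 / 90)
```

Both printed pairs should agree to many decimal places, which gives a quick check of the approximation against the explicit values above.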

The Probability Density Function

The function \(f\) given below is a probability density function for any \(a \gt 1\).

\[ f(n) = \frac{1}{\zeta(a) \, n^a}, \quad n \in \N_+ \]

The discrete distribution defined by the probability density function \(f\) above is called the zeta distribution with parameter \(a\). In an algebraic sense, the zeta distribution is a discrete version of the Pareto distribution.

Let \(X\) denote the frequency of occurrence of a word chosen at random from a certain text, and suppose that \(X\) has the zeta distribution with parameter \(a = 2\). Find \(\P(X \gt 4)\).
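
Since \(\P(X \gt 4) = 1 - \sum_{n=1}^4 f(n)\) and \(\zeta(2) = \frac{\pi^2}{6}\) is known explicitly, the probability can be checked numerically. The sketch below is one way to do so; the helper name `tail_probability` and its arguments are hypothetical, introduced only for this illustration.

```python
import math


def tail_probability(m, a, zeta_a):
    """P(X > m) for the zeta distribution with parameter a.

    zeta_a is the value of zeta(a), supplied by the caller.
    """
    # Add up the density f(n) = 1 / (zeta(a) * n^a) for n = 1, ..., m.
    cdf = sum(n ** -a for n in range(1, m + 1)) / zeta_a
    return 1 - cdf


# a = 2, using the explicit value zeta(2) = pi^2 / 6.
p = tail_probability(4, 2, math.pi ** 2 / 6)
print(round(p, 4))  # approximately 0.1345
```

Equivalently, \(\P(X \gt 4) = 1 - \frac{205}{24 \pi^2}\), since \(1 + \frac{1}{4} + \frac{1}{9} + \frac{1}{16} = \frac{205}{144}\).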

Suppose that \(X\) has the zeta distribution with parameter \(a\). Then the distribution is a one-parameter exponential family with natural parameter \(a\) and natural statistic \(-\ln(X)\).
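
To see why, rewrite the probability density function in exponential form:

\[ f(n) = \frac{1}{\zeta(a) \, n^a} = \frac{1}{\zeta(a)} \exp\left[a \, (-\ln n)\right], \quad n \in \N_+ \]

The factor \(\frac{1}{\zeta(a)}\) depends only on the parameter, and the parameter \(a\) multiplies the function \(-\ln n\) of the observation inside the exponential.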

Moments

The moments of the zeta distribution can be expressed easily in terms of the zeta function.

Suppose that \(X\) has the zeta distribution with parameter \(a\) and that \(k \ge 0\). Then

  1. \(\E(X^k) = \infty\) if \(a \le k + 1\)
  2. \(\E(X^k) = \frac{\zeta(a - k)}{\zeta(a)}\) if \(a \gt k + 1\)
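
Both parts follow directly from the definition of expected value and the form of the probability density function:

\[ \E(X^k) = \sum_{n=1}^\infty n^k \frac{1}{\zeta(a) \, n^a} = \frac{1}{\zeta(a)} \sum_{n=1}^\infty \frac{1}{n^{a - k}} \]

The last series diverges to \(\infty\) if \(a - k \le 1\) and converges to \(\zeta(a - k)\) if \(a - k \gt 1\).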

In particular, the mean and variance of \(X\) are

  1. \(\E(X) = \frac{\zeta(a - 1)}{\zeta(a)}\) if \(a \gt 2\)
  2. \(\var(X) = \frac{\zeta(a - 2)}{\zeta(a)} - \left(\frac{\zeta(a - 1)}{\zeta(a)}\right)^2\) if \(a \gt 3\).

Let \(X\) denote the frequency of occurrence of a word chosen at random from a certain text, and suppose that \(X\) has the zeta distribution with parameter \(a = 4\). Approximate each of the following:

  1. \(\E(X)\)
  2. \(\var(X)\)
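
The approximations can be sketched numerically from the moment formula \(\E(X^k) = \frac{\zeta(a - k)}{\zeta(a)}\). The code below is a minimal illustration under stated assumptions: the helper names `zeta` and `zeta_moment` and the Euler-Maclaurin tail correction are choices made here, not part of the text.

```python
import math


def zeta(a, cutoff=1000):
    """Approximate the Riemann zeta function for real a > 1 (illustrative sketch)."""
    if a <= 1:
        raise ValueError("the series diverges for a <= 1")
    # Direct sum of the terms n = 1, ..., cutoff - 1, smallest first.
    head = sum(n ** -a for n in range(cutoff - 1, 0, -1))
    # Euler-Maclaurin estimate of the tail sum over n >= cutoff.
    N = float(cutoff)
    tail = N ** (1 - a) / (a - 1) + 0.5 * N ** -a + a * N ** (-a - 1) / 12
    return head + tail


def zeta_moment(k, a):
    """E(X^k) when X has the zeta distribution with parameter a."""
    if a <= k + 1:
        return math.inf  # the moment is infinite in this case
    return zeta(a - k) / zeta(a)


a = 4
mean = zeta_moment(1, a)
variance = zeta_moment(2, a) - mean ** 2
print(round(mean, 4), round(variance, 4))  # approximately 1.1106 and 0.2863
```

With \(a = 4\) both moments are finite, since \(a \gt 3\); calling `zeta_moment(2, 3)` returns infinity instead, matching the first part of the moment result.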