\(\newcommand{\P}{\mathbb{P}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\N}{\mathbb{N}}\) \(\newcommand{\bs}{\boldsymbol}\) \(\newcommand{\var}{\text{var}}\) \(\newcommand{\skew}{\text{skew}}\) \(\newcommand{\kurt}{\text{kurt}}\)

3. The Geometric Distribution

Basic Theory

Suppose again that our random experiment is to perform a sequence of Bernoulli trials \(\bs{X} = (X_1, X_2, \ldots)\) with success parameter \(p \in (0, 1]\). In this section we will study the random variable \(N\) that gives the trial number of the first success:

\[ N = \min\{n \in \N_+: X_n = 1\} \]

The Probability Density Function

\(\P(N = n) = p \, (1 - p)^{n-1}\) for \(n \in \N_+\), and this defines a valid probability density function on \(\N_+\).

Proof:

Note first that \(\{N = n\} = \{X_1 = 0, \ldots, X_{n-1} = 0, X_n = 1\}\). By independence, the probability of this event is \((1 - p)^{n-1} \, p\). Standard results from geometric series show that \(\sum_{n=1}^\infty \P(N = n) = 1\).

A priori, we might have thought it possible to have \(N = \infty\) with positive probability; that is, we might have thought that we could run Bernoulli trials forever without ever seeing a success. However, Exercise 1 shows that this cannot happen when the success parameter \(p\) is positive. The distribution defined by the probability density function in Exercise 1 is known as the geometric distribution on \(\N_+\), with success parameter \(p\). The random variable \(M = N - 1\) is the number of failures before the first success, and takes values in \(\N\).

The probability density function of \(M\) is given by \(\P(M = n) = p \, (1 - p)^n, \quad n \in \N\).

The distribution of this random variable is known as the geometric distribution on \(\N\) with parameter \(p\). Clearly \(N\) and \(M\) give essentially the same information.

In the negative binomial experiment, set \(k = 1\) to get the geometric distribution on \(\N_+\). Vary \(p\) with the scroll bar and note the shape and location of the probability density function. For selected values of \(p\), run the simulation 1000 times. Watch the apparent convergence of the relative frequency function to the density function.
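
If you prefer to experiment in code rather than in the applet, the following minimal Python sketch (the value of \(p\) and the number of runs are arbitrary choices) simulates \(N\) directly from Bernoulli trials and compares the relative frequencies with the probability density function above.

```python
import random

def first_success(p):
    """Trial number of the first success in a sequence of Bernoulli trials."""
    n = 1
    while random.random() >= p:   # failure with probability 1 - p
        n += 1
    return n

p, runs = 0.3, 100_000            # arbitrary illustrative values
counts = {}
for _ in range(runs):
    n = first_success(p)
    counts[n] = counts.get(n, 0) + 1

for n in range(1, 9):
    empirical = counts.get(n, 0) / runs
    exact = p * (1 - p) ** (n - 1)
    print(f"n={n}: relative frequency {empirical:.4f}, pdf {exact:.4f}")
```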

Distribution Functions and the Memoryless Property

\(\P(N \gt n) = (1 - p)^n\) for \(n \in \N\)

Proof:

The simplest proof is to note that \(\{N \gt n\} = \{X_1 = 0, \ldots, X_n = 0\}\). By independence, the probability of this event is \((1 - p)^n\). Another derivation is to sum the probability density function over \(\{n + 1, n + 2, \ldots\}\) and use geometric series.

From Exercise 4, it follows that the distribution function of \(N\) is given by

\[ \P(N \le n) = 1 - (1 - p)^n, \quad n \in \N \]

Of course the distribution function \(n \mapsto \P(N \le n)\) and the complementary function \(n \mapsto \P(N \gt n)\) in Exercise 4 completely determine the distribution of \(N\). We will now explore another characterization known as the memoryless property.

\(\P(N \gt n + m \mid N \gt m) = \P(N \gt n)\) for \(m, \; n \in \N\).

Proof:

Use Exercise 4 and the definition of conditional probability.

The memoryless property is equivalent to the statement that the conditional distribution of \(N - m\) given \(N \gt m\) is the same as the distribution of \(N\). That is, if the first success has not occurred by trial number \(m\), then the remaining number of trials needed to achieve the first success has the same distribution as the trial number of the first success in a fresh sequence of Bernoulli trials. In short, Bernoulli trials have no memory. This fact has implications for a gambler betting on Bernoulli trials (such as in the casino games roulette or craps). No betting strategy based on observations of past outcomes of the trials can possibly help the gambler.

Conversely, if \(T\) is a random variable taking values in \(\N_+\) that satisfies the memoryless property, then \(T\) has a geometric distribution.

Proof:

Let \(G(n) = \P(T \gt n)\) for \(n \in \N\). The memoryless property and the definition of conditional probability imply that \(G(m + n) = G(m) G(n)\) for \(m, \; n \in \N\). Note that this is the law of exponents for \(G\). It follows that \(G(n) = [G(1)]^n\) for \(n \in \N\). Hence \(T\) has the geometric distribution with parameter \(p = 1 - G(1)\).

Moments

Suppose again that \(N\) is the trial number of the first success in a sequence of Bernoulli trials, so that \(N\) has the geometric distribution on \(\N_+\) with parameter \(p \in (0, 1]\). The mean of \(N\) can be computed in several different ways.

\(\E(N) = 1 / p\)

Proof:

The most direct approach is to use the definition \(\E(N) = \sum_{n=1}^\infty n \, \P(N = n)\) and the formula for the derivative of a geometric series. An easier computation is to use the alternate formula \(\E(N) = \sum_{n=0}^\infty \P(N \gt n)\) and the formula for the sum of a geometric series. A clever derivation is to condition on the outcome of the first trial to get \(\E(N) = 1 + (1 - p) \E(N)\).
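
To spell out the second approach, using the formula \(\P(N \gt n) = (1 - p)^n\) above and the geometric series:

\[ \E(N) = \sum_{n=0}^\infty \P(N \gt n) = \sum_{n=0}^\infty (1 - p)^n = \frac{1}{1 - (1 - p)} = \frac{1}{p} \]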

The probability generating function of \(N\) is given by

\[ P(t) := \E(t^N) = \frac{p \, t}{1 - (1 - p) \, t}, \quad |t| \lt \frac{1}{1 - p} \]
Proof:

This result follows from yet another application of geometric series.

The factorial moments of \(N\) are given by

\[ \E[N^{(k)}] = k! \frac{(1 - p)^{k-1}}{p^k}, \quad k \in \N_+ \]
Proof:

Recall that \(\E[N^{(k)}] = P^{(k)}(1)\) where \(P\) is the probability generating function of \(N\).

The variance, skewness, and kurtosis of \(N\) are as follows:

  1. \(\var(N) = \frac{1 - p}{p^2}\)
  2. \(\skew(N) = \frac{2 - p}{\sqrt{1 - p}}\)
  3. \(\kurt(N) = 9 + \frac{p^2}{1 - p}\)
Proof:

The factorial moments can be used to find the moments of \(N\) about 0. Then the standard formulas for variance, skewness, and kurtosis can be used.
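
As a numerical sanity check (not part of the original text), the following Python sketch computes the moments directly from the probability density function, summing the series until the tail terms are negligible. Here kurtosis means the fourth standardized moment, so the excess kurtosis is this value minus 3; the parameter value is an arbitrary choice.

```python
import math

def geometric_moments(p, tol=1e-15):
    """Mean, variance, skewness and kurtosis (4th standardized moment) of the
    geometric distribution on the positive integers, by direct summation."""
    raw = [0.0] * 5          # raw[j] will hold E(N^j) for j = 1..4
    n, pdf = 1, p
    while pdf > tol:         # truncate once the tail terms are negligible
        for j in range(1, 5):
            raw[j] += n ** j * pdf
        n += 1
        pdf = p * (1 - p) ** (n - 1)
    mean = raw[1]
    var = raw[2] - mean ** 2
    sd = math.sqrt(var)
    mu3 = raw[3] - 3 * mean * raw[2] + 2 * mean ** 3                      # E[(N - mean)^3]
    mu4 = raw[4] - 4 * mean * raw[3] + 6 * mean ** 2 * raw[2] - 3 * mean ** 4
    return mean, var, mu3 / sd ** 3, mu4 / sd ** 4

p = 0.3   # arbitrary illustrative value
print("computed:", geometric_moments(p))
print("formulas:", (1 / p, (1 - p) / p ** 2,
                    (2 - p) / math.sqrt(1 - p), 9 + p ** 2 / (1 - p)))
```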

In the negative binomial experiment, set \(k = 1\) to get the geometric distribution. Vary \(p\) with the scroll bar and note the location and size of the mean/standard deviation bar. For selected values of \(p\), run the simulation 1000 times and watch the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation.

Suppose now that \(M = N - 1\), so that \(M\) (the number of failures before the first success) has the geometric distribution on \(\N\). Then

  1. \(\E(M) = \frac{1 - p}{p}\)
  2. \(\var(M) = \frac{1 - p}{p^2}\)
  3. \(\skew(M) = \frac{2 - p}{\sqrt{1 - p}}\)
  4. \(\kurt(M) = 9 + \frac{p^2}{1 - p}\)
  5. \(\E(t^M) = \frac{p}{1 - (1 - p) \, t}\) for \(|t| \lt \frac{1}{1 - p}\)

Of course, the fact that the variance, skewness, and kurtosis are unchanged follows easily, since \(N\) and \(M\) differ by a constant.

The Quantile Function

Let \(F\) denote the distribution function of \(N\), so that \(F(n) = 1 - (1 - p)^n\) for \(n \in \N\). Recall that \(F^{-1}(r) = \min \{n \in \N_+: F(n) \ge r\}\) for \(r \in (0, 1)\) is the quantile function of \(N\).

The quantile function of \(N\) is

\[ F^{-1}(r) = \left\lceil \frac{\ln(1 - r)}{\ln(1 - p)}\right\rceil, \quad r \in (0, 1) \]

Of course, the quantile function, like the probability density function and the distribution function, completely determines the distribution of \(N\). Moreover, we can compute the median and quartiles to get measures of center and spread.

The first quartile, the median (or second quartile), and the third quartile are

  1. \(F^{-1}(\frac{1}{4}) = \lceil \ln(3/4) / \ln(1 - p)\rceil \approx \lceil-0.2877 / \ln(1 - p)\rceil\)
  2. \(F^{-1}(\frac{1}{2}) = \lceil \ln(1/2) / \ln(1 - p)\rceil \approx \lceil-0.6931 / \ln(1 - p)\rceil\)
  3. \(F^{-1}(\frac{3}{4}) = \lceil \ln(1/4) / \ln(1 - p)\rceil \approx \lceil-1.3863 / \ln(1 - p)\rceil\)
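
The ceiling formula, and hence the quartiles above, are easy to confirm against the definition of the quantile function; here is a minimal Python sketch with an arbitrary choice of \(p\).

```python
import math

def quantile_formula(r, p):
    """Closed form: ceil(ln(1 - r) / ln(1 - p))."""
    return math.ceil(math.log(1 - r) / math.log(1 - p))

def quantile_definition(r, p):
    """Definition: the smallest n in N_+ with F(n) >= r."""
    n = 1
    while 1 - (1 - p) ** n < r:
        n += 1
    return n

p = 0.2   # arbitrary illustrative value
for r in (0.25, 0.5, 0.75, 0.9, 0.99):
    print(r, quantile_formula(r, p), quantile_definition(r, p))
```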

The Constant Rate Property

Suppose that \(T\) is a random variable taking values in \(\N_+\) which we interpret as the first time that some event of interest occurs. The function

\[ h(n) = \P(T = n \mid T \ge n) = \frac{\P(T = n)}{\P(T \ge n)}, \quad n \in \N_+ \]

will be called the rate function of \(T\). If \(T\) is interpreted as the (discrete) lifetime of a device, then \(h\) is a discrete version of the failure rate function studied in reliability theory. However, in our usual formulation of Bernoulli trials, the event of interest is success rather than failure (or death), so we will simply use the term rate function to avoid confusion. The constant rate property characterizes the geometric distribution.

As usual, let \(N\) denote the trial number of the first success in a sequence of Bernoulli trials with success parameter \(p \in (0, 1)\), so that \(N\) has the geometric distribution on \(\N_+\) with parameter \(p\). Then \(N\) has constant rate \(p\).

Conversely, if \(T\) has constant rate \(p \in (0, 1)\) then \(T\) has the geometric distribution on \(\N_+\) with success parameter \(p\).

Proof:

Let \(H(n) = \P(T \ge n)\) for \(n \in \N_+\). From the constant rate property, \(\P(T = n) = p \, H(n)\) for \(n \in \N_+\). Next note that \(\P(T = n) = H(n) - H(n + 1)\) for \(n \in \N_+\). Thus, \(H\) satisfies the recurrence relation \(H(n + 1) = (1 - p) \, H(n)\) for \(n \in \N_+\). Also \(H\) satisfies the initial condition \(H(1) = 1\). Solving the recurrence relation gives \(H(n) = (1 - p)^{n-1}\) for \(n \in \N_+\).

Relation to the Uniform Distribution

Recall that \(Y_n\), the number of successes in the first \(n\) trials, has the binomial distribution with parameters \(n\) and \(p\).

The conditional distribution of \(N\) given \(Y_n = 1\) is uniform on \(\{1, 2, \ldots, n\}\).

Note that the conditional distribution does not depend on the success parameter \(p\). If we know that there is exactly one success in the first \(n\) trials, then the trial number of that success is equally likely to be any of the \(n\) possibilities.
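
A simulation makes the result concrete. The following Python sketch (the values of \(n\), \(p\), and the number of runs are arbitrary) keeps only the runs with exactly one success in the first \(n\) trials and tabulates the trial on which that success occurred.

```python
import random

n, p, runs = 5, 0.3, 200_000  # arbitrary illustrative values
counts = [0] * (n + 1)        # counts[i] = times the single success was on trial i
kept = 0
for _ in range(runs):
    trials = [1 if random.random() < p else 0 for _ in range(n)]
    if sum(trials) == 1:                  # condition on exactly one success
        counts[trials.index(1) + 1] += 1
        kept += 1

for i in range(1, n + 1):
    print(f"trial {i}: conditional frequency {counts[i] / kept:.4f}  (uniform: {1 / n:.4f})")
```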

Examples and Applications

A standard, fair die is thrown until an ace occurs. Let \(N\) denote the number of throws. Find each of the following:

  1. The probability density function of \(N\).
  2. The mean of \(N\).
  3. The variance of \(N\).
  4. The probability that the die will have to be thrown at least 5 times.
  5. The quantile function of \(N\).
  6. The median and the first and third quartiles.
Answer:
  1. \(\P(N = n) = \left(\frac{5}{6}\right)^{n-1} \frac{1}{6}, \quad n \in \N_+\)
  2. \(\E(N) = 6\)
  3. \(\var(N) = 30\)
  4. \(\P(N \ge 5) = 625 / 1296\)
  5. \(F^{-1}(r) = \lceil \ln(1 - r) / \ln(5 / 6)\rceil, \quad r \in (0, 1)\)
  6. Quartiles \(q_1 = 2\), \(q_2 = 4\), \(q_3 = 8\)
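
These answers are easy to verify numerically; a short Python check (purely illustrative):

```python
import math

p = 1 / 6
print("mean:", 1 / p)                          # 6.0
print("variance:", (1 - p) / p ** 2)           # 30.0
print("P(N >= 5):", (1 - p) ** 4, 625 / 1296)  # P(N >= 5) = P(N > 4) = (5/6)^4
for r in (0.25, 0.5, 0.75):
    print("quantile at", r, "=", math.ceil(math.log(1 - r) / math.log(1 - p)))
```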

A type of missile has failure probability 0.02. Let \(N\) denote the launch number of the first failure. Find each of the following:

  1. The probability density function of \(N\).
  2. The mean of \(N\).
  3. The variance of \(N\).
  4. The probability of 20 consecutive successful launches.
  5. The quantile function of \(N\).
  6. The median and the first and third quartiles.
Answer:
  1. \(\P(N = n) = \left(\frac{49}{50}\right)^{n-1} \frac{1}{50}, \quad n \in \N_+\)
  2. \(\E(N) = 50\)
  3. \(\var(N) = 2450\)
  4. \(\P(N \gt 20) = 0.6676\)
  5. \(F^{-1}(r) = \lceil \ln(1 - r) / \ln(0.98)\rceil, \quad r \in (0, 1)\)
  6. Quartiles \(q_1 = 15\), \(q_2 = 35\), \(q_3 = 69\)

A student takes a multiple choice test with 10 questions, each with 5 choices (only one correct). The student blindly guesses and gets one question correct. Find the probability that the correct question was one of the first 4.

Answer:

0.4

Recall that an American roulette wheel has 38 slots: 18 are red, 18 are black, and 2 are green. Suppose that you observe red or green on 10 consecutive spins. Give the conditional distribution of the number of additional spins needed for black to occur.

Answer:

Geometric with \(p = \frac{18}{38}\)

The game of roulette is studied in more detail in the chapter on Games of Chance.

In the negative binomial experiment, set \(k = 1\) to get the geometric distribution and set \(p = 0.3\). Run the experiment 1000 times. Compute the appropriate relative frequencies and empirically investigate the memoryless property

\[ \P(V \gt 5 \mid V \gt 2) = \P(V \gt 3) \]
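
Outside the applet, the same empirical check can be done in a few lines of Python; here \(V\) is simply the simulated trial number of the first success, with the arbitrary choice \(p = 0.3\).

```python
import random

def first_success(p):
    """Trial number of the first success in a sequence of Bernoulli trials."""
    n = 1
    while random.random() >= p:
        n += 1
    return n

p, runs = 0.3, 100_000
sample = [first_success(p) for _ in range(runs)]

gt2 = [v for v in sample if v > 2]
lhs = sum(1 for v in gt2 if v > 5) / len(gt2)   # estimates P(V > 5 | V > 2)
rhs = sum(1 for v in sample if v > 3) / runs    # estimates P(V > 3)
print(lhs, rhs, "exact:", (1 - p) ** 3)
```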

The Petersburg Problem

We will now explore a gambling situation, known as the Petersburg problem, which leads to some famous and surprising results. Suppose that we are betting on a sequence of Bernoulli trials with success parameter \(p \in (0, 1)\). We can bet any amount of money on a trial at even stakes: if the trial results in success, we receive that amount, and if the trial results in failure, we must pay that amount. We will use the following strategy, known as a martingale strategy:

  1. We bet \(c\) units on the first trial.
  2. Whenever we lose a trial, we double the bet for the next trial.
  3. We stop as soon as we win a trial.

Let \(W\) denote our net winnings when we stop. Then \(W = c\) (with probability 1).

Thus, \(W\) is not random and \(W\) is independent of \(p\)! Since \(c\) is an arbitrary constant, it would appear that we have an ideal strategy. However, let us study the amount of money \(Z\) needed to play the strategy.

\(Z = c \, (2^N - 1)\) where as usual, \(N\) is the trial number of the first success.

The expected amount of money needed for the martingale strategy is

\[ \E(Z) = \begin{cases} \frac{c}{2 \, p - 1}, & p \gt \frac{1}{2} \\ \infty, & p \le \frac{1}{2} \end{cases} \]

Thus, the strategy is fatally flawed when the trials are unfavorable and even when they are fair, since we need infinite expected capital to make the strategy work in these cases.
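
The flaw shows up quickly in simulation. The following Python sketch (with \(c = 1\) and a fair game, both arbitrary choices) plays the martingale strategy repeatedly: the net winnings always equal \(c\), but the capital \(Z\) required is occasionally enormous, and its sample mean does not settle down since \(\E(Z) = \infty\) when \(p \le \frac{1}{2}\).

```python
import random

def martingale(p, c=1):
    """Play the martingale strategy once.
    Returns (net winnings W, total capital Z put at risk)."""
    bet, losses = c, 0
    while random.random() >= p:   # lose this trial
        losses += bet
        bet *= 2                  # double the bet after each loss
    # the winning trial pays the current bet
    return bet - losses, losses + bet

p, runs = 0.5, 10_000             # a fair game, arbitrary run count
results = [martingale(p) for _ in range(runs)]
print("all wins equal c:", all(w == 1 for w, _ in results))
print("average capital needed:", sum(z for _, z in results) / runs)
print("largest capital needed:", max(z for _, z in results))
```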

Compute \(\E(Z)\) explicitly if \(c = 100\) and \(p = 0.55\).

Answer:

$1000

In the negative binomial experiment, set \(k = 1\). For each of the following values of \(p\), run the experiment 100 times. For each run compute \(Z\) (with \(c = 1\)). Find the average value of \(Z\) over the 100 runs:

  1. \(p = 0.2\)
  2. \(p = 0.5\)
  3. \(p = 0.8\)

For more information about gambling strategies, see the chapter on Red and Black.

The Alternating Coin-Tossing Game

A coin has probability of heads \(p \in (0, 1]\). There are \(n\) players who take turns tossing the coin in round-robin style: player 1 first, then player 2, ..., then player \(n\), then player 1 again, and so forth. The first player to toss heads wins the game.

Let \(N\) denote the number of the first toss that results in heads. Of course, \(N\) has the geometric distribution on \(\N_+\) with parameter \(p\). Additionally, let \(W\) denote the winner of the game; \(W\) takes values in the set \(\{1, 2, \ldots, n\}\). We are interested in the probability distribution of \(W\).

For \(i \in \{1, 2, \ldots, n\}\), \(W = i\) if and only if \(N = i + k \, n\) for some \(k \in \N\). That is, using modular arithmetic,

\[ W = [(N - 1) \mod n] + 1 \].

The winning player \(W\) has probability density function

\[ \P(W = i) = \frac{p \, (1 - p)^{i-1}}{1 - (1 - p)^n}, \quad i \in \{1, 2, \ldots, n\} \]
Proof:

This follows from the previous exercise and the geometric distribution of \(N\).

\(\P(W = i) = (1 - p)^{i-1} \P(W = 1)\) for \(i \in \{1, 2, \ldots, n\}\).

Proof:

This result can be argued directly, using the memoryless property of the geometric distribution. In order for player \(i\) to win, the previous \(i - 1\) players must first all toss tails. Then, player \(i\) effectively becomes the first player in a new sequence of tosses. This result can be used to give another derivation of the probability density function in the previous exercise.

Note that \(\P(W = i)\) is a decreasing function of \(i \in \{1, 2, \ldots, n\}\). Not surprisingly, the lower the toss order the better for the player.
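
A simulation of the game (with arbitrary \(n\) and \(p\)) agrees with the probability density function of \(W\) given above.

```python
import random

def play_game(n, p):
    """Toss in round-robin order until heads; return the winning player (1..n)."""
    toss = 0
    while True:
        toss += 1
        if random.random() < p:          # heads
            return (toss - 1) % n + 1

n, p, runs = 4, 0.3, 200_000             # arbitrary illustrative values
counts = [0] * (n + 1)
for _ in range(runs):
    counts[play_game(n, p)] += 1

for i in range(1, n + 1):
    exact = p * (1 - p) ** (i - 1) / (1 - (1 - p) ** n)
    print(f"player {i}: simulated {counts[i] / runs:.4f}, exact {exact:.4f}")
```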

Explicitly compute the probability density function of \(W\) when the coin is fair (\(p = 1 / 2\)).

Answer:

\(\P(W = i) = 2^{n-i} / (2^n - 1), \quad i \in \{1, 2, \ldots, n\}\)

Note from Exercise 29 that \(W\) itself has a truncated geometric distribution.

The distribution of \(W\) is the same as the conditional distribution of \(N\) given \(N \le n\):

\[ \P(W = i) = \P(N = i \mid N \le n), \quad i \in \{1, 2, \ldots, n\} \]

The following problems explore some limiting distributions related to the alternating coin-tossing game.

For fixed \(p \in (0, 1]\), the distribution of \(W\) converges to the geometric distribution with parameter \(p\) as \(n \uparrow \infty\).

For fixed \(n\), the distribution of \(W\) converges to the uniform distribution on \(\{1, 2, \ldots, n\}\) as \(p \downarrow 0\).

Players at the end of the tossing order should hope for a coin biased towards tails.

Odd Man Out

In the game of odd man out, we start with a specified number of players, each with a coin that has the same probability of heads. The players toss their coins at the same time. If there is an odd man, that is a player with an outcome different than all of the other players, then the odd player is eliminated; otherwise no player is eliminated. In any event, the remaining players continue the game in the same manner.

Suppose there are \(k \in \{2, 3, \ldots\}\) players and \(p \in [0, 1]\). In a single round, the probability of an odd man is

\[r_k(p) = \begin{cases} 2 p (1 - p), & k = 2 \\ k p (1 - p)^{k-1} + k p^{k-1} (1 - p), & k \in \{3, 4, \ldots\} \end{cases}\]
Proof:

Let \(Y\) denote the number of heads. If \(k = 2\), the event that there is an odd man is \(\{Y = 1\}\). If \(k \ge 3\), the event that there is an odd man is \(\{Y \in \{1, k - 1\}\}\). The result now follows since \(Y\) has a binomial distribution.
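
The formula can be checked against a direct simulation of a single round; the values of \(k\) and \(p\) below are arbitrary.

```python
import random

def odd_man_round(k, p):
    """Simulate one round with k players; return True if there is an odd man."""
    heads = sum(1 for _ in range(k) if random.random() < p)
    if k == 2:
        return heads == 1
    return heads == 1 or heads == k - 1

def r(k, p):
    """Probability of an odd man in a single round, from the formula above."""
    if k == 2:
        return 2 * p * (1 - p)
    return k * p * (1 - p) ** (k - 1) + k * p ** (k - 1) * (1 - p)

k, p, runs = 5, 0.4, 200_000   # arbitrary illustrative values
freq = sum(odd_man_round(k, p) for _ in range(runs)) / runs
print("simulated:", freq, "formula:", r(k, p))
```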

The graph of \(r_k\) is more interesting than you might think.

\(r_k\) has the following properties:

  1. \(r_k(0) = r_k(1) = 0\)
  2. \(r_k\) is symmetric about \(p = \frac{1}{2}\)
  3. For fixed \(p \in [0, 1]\), \(r_k(p) \to 0\) as \(k \to \infty\).
  4. For \(k \in \{2, 3, 4\}\), the maximum of \(r_k\) occurs at \(p = \frac{1}{2}\) and there are no other local extrema.
  5. For \(k \ge 5\), the maximum occurs at two points of the form \(p_k\) and \(1 - p_k\) where \(p_k \to 0\) as \(k \to \infty\). The maximum value \(r_k(p_k) \to 1/e \approx 0.3679\) as \(k \to \infty\). The graph has a local minimum at \(p = \frac{1}{2}\).
Proof sketch:

Parts (a), (b), and (c) are clear. Part (d) follows by computing the derivatives: \(r_2^\prime(p) = 2 (1 - 2 p)\), \(r_3^\prime(p) = 3 (1 - 2 p)\), \(r_4^\prime(p) = 4 (1 - 2 p)^3\). For part (e), note that \(r_k(p) = s_k(p) + s_k(1 - p)\) where \(s_k(t) = k t^{k-1}(1 - t)\) for \(t \in [0, 1]\). Also, \(p \mapsto s_k(p)\) is the dominant term when \(p \gt \frac{1}{2}\) while \(p \mapsto s_k(1 - p)\) is the dominant term when \(p \lt \frac{1}{2}\). A simple analysis of the derivative shows that \(s_k\) increases and then decreases, reaching its maximum at \((k - 1) / k\). Moreover, the maximum value is \(s_k[(k - 1) / k] = (1 - 1 / k)^{k-1} \to e^{-1}\) as \(k \to \infty\).
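
The limiting behavior in part (e) can also be seen numerically. The following Python sketch locates the maximum of \(r_k\) on a fine grid over \((0, \frac{1}{2}]\) (by symmetry the other maximum is at \(1 - p_k\)) and compares the maximum value with \(1/e\).

```python
import math

def r(k, p):
    """Probability of an odd man in a single round with k players."""
    if k == 2:
        return 2 * p * (1 - p)
    return k * p * (1 - p) ** (k - 1) + k * p ** (k - 1) * (1 - p)

for k in (5, 10, 50, 200):
    grid = [i / 100_000 for i in range(1, 50_001)]   # fine grid on (0, 1/2]
    p_k = max(grid, key=lambda q: r(k, q))
    print(f"k={k}: p_k = {p_k:.4f}, r_k(p_k) = {r(k, p_k):.4f}, 1/e = {1 / math.e:.4f}")
```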

Suppose \(p \in (0, 1)\), and let \(N_k\) denote the number of rounds until an odd man is eliminated, starting with \(k\) players. Then \(N_k\) has the geometric distribution on \(\N_+\) with parameter \(r_k(p)\). The mean and variance are

  1. \(\mu_k(p) = 1 / r_k(p)\)
  2. \(\sigma_k^2(p) = [1 - r_k(p)] / r_k^2(p)\)

As we might expect, \(\mu_k(p) \to \infty\) and \(\sigma_k^2(p) \to \infty\) as \(k \to \infty\) for fixed \(p \in (0, 1)\). On the other hand, from Exercise 36, \(\mu_k(p_k) \to e\) and \(\sigma_k^2(p_k) \to e^2 - e\) as \(k \to \infty\).

Suppose we start with \(k \in \{2, 3, \ldots\}\) players and \(p \in (0, 1)\). The number of rounds until a single player remains is \(M_k = \sum_{j = 2}^k N_j\) where \((N_2, N_3, \ldots, N_k)\) are independent and \(N_j\) has the geometric distribution on \(\N_+\) with parameter \(r_j(p)\). The mean and variance are

  1. \(\E(M_k) = \sum_{j=2}^k 1 / r_j(p)\)
  2. \(\var(M_k) = \sum_{j=2}^k [1 - r_j(p)] / r_j^2(p)\)
Proof:

The form of \(M_k\) follows from Exercise 37: \(N_k\) is the number of rounds until the first player is eliminated. Then the game continues independently with \(k - 1\) players, so \(N_{k-1}\) is the number of additional rounds until the second player is eliminated, and so forth. Parts (a) and (b) follow from Exercise 37 and standard properties of expected value and variance.

Starting with \(k\) players and probability of heads \(p \in (0, 1)\), the total number of coin tosses is \(T_k = \sum_{j=2}^k j N_j\). The mean and variance are

  1. \(\E(T_k) = \sum_{j=2}^k j / r_j(p)\)
  2. \(\var(T_k) = \sum_{j=2}^k j^2 [1 - r_j(p)] / r_j^2(p)\)
Proof:

As before, the form of \(T_k\) follows from Exercise 37: \(N_k\) is the number of rounds until the first player is eliminated, and each of these rounds has \(k\) tosses. Then the game continues independently with \(k - 1\) players, so \(N_{k-1}\) is the number of additional rounds until the second player is eliminated, with each of these rounds having \(k - 1\) tosses, and so forth. Parts (a) and (b) follow from Exercise 37 and standard properties of expected value and variance.
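
Putting the pieces together, the following Python sketch (with arbitrary \(k\) and \(p\)) simulates the complete game and compares the sample means of the number of rounds \(M_k\) and the number of tosses \(T_k\) with the formulas above.

```python
import random

def r(j, p):
    """Single-round probability of an odd man with j players."""
    if j == 2:
        return 2 * p * (1 - p)
    return j * p * (1 - p) ** (j - 1) + j * p ** (j - 1) * (1 - p)

def play(k, p):
    """Play odd man out until one player remains; return (rounds, tosses)."""
    rounds = tosses = 0
    players = k
    while players > 1:
        rounds += 1
        tosses += players
        heads = sum(1 for _ in range(players) if random.random() < p)
        if heads == 1 or heads == players - 1:   # an odd man is eliminated
            players -= 1
    return rounds, tosses

k, p, runs = 5, 0.5, 20_000   # arbitrary illustrative values
results = [play(k, p) for _ in range(runs)]
mean_rounds = sum(m for m, _ in results) / runs
mean_tosses = sum(t for _, t in results) / runs
print("rounds: simulated", mean_rounds, "exact", sum(1 / r(j, p) for j in range(2, k + 1)))
print("tosses: simulated", mean_tosses, "exact", sum(j / r(j, p) for j in range(2, k + 1)))
```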