Suppose again that our random experiment is to perform a sequence of Bernoulli trials \(\bs{X} = (X_1, X_2, \ldots)\) with success parameter \(p \in (0, 1]\). In this section we will study the random variable \(N\) that gives the trial number of the first success:
\[ N = \min\{n \in \N_+: X_n = 1\} \]
\(\P(N = n) = p \, (1 - p)^{n-1}\) for \(n \in \N_+\), and this defines a valid probability density function on \(\N_+\).
Note first that \(\{N = n\} = \{X_1 = 0, \ldots, X_{n-1} = 0, X_n = 1\}\). By independence, the probability of this event is \((1 - p)^{n-1} \, p\). Standard results from geometric series show that \(\sum_{n=1}^\infty \P(N = n) = 1\).
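As a quick numerical illustration of the density function above, the following minimal sketch evaluates it and checks that a long partial sum is close to 1. The function name `geometric_pmf` and the value \(p = 0.25\) are illustrative choices, not part of the text.

```python
def geometric_pmf(n: int, p: float) -> float:
    """Return P(N = n) = p (1 - p)^(n - 1) for n = 1, 2, ..."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return p * (1.0 - p) ** (n - 1)


p = 0.25  # illustrative success parameter
# Partial sum over n = 1..200; the omitted tail is (1 - p)^200, which is negligible.
partial_sum = sum(geometric_pmf(n, p) for n in range(1, 201))
print(partial_sum)
```

The partial sum equals \(1 - (1 - p)^{200}\), so it differs from 1 only by the probability that no success occurs in the first 200 trials.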
A priori, we might have thought it possible to have \(N = \infty\) with positive probability; that is, we might have thought that we could run Bernoulli trials forever without ever seeing a success. However, Exercise 1 shows that this cannot happen when the success parameter \(p\) is positive. The distribution defined by the probability density function in Exercise 1 is known as the geometric distribution on \(\N_+\), with success parameter \(p\). The random variable \(M = N - 1\) is the number of failures before the first success, and takes values in \(\N\).
The probability density function of \(M\) is given by \(\P(M = n) = p \, (1 - p)^n, \quad n \in \N\).
The distribution of this random variable is known as the geometric distribution on \(\N\) with parameter \(p\). Clearly \(N\) and \(M\) give essentially the same information.
In the negative binomial experiment, set \(k = 1\) to get the geometric distribution on \(\N_+\). Vary \(p\) with the scroll bar and note the shape and location of the probability density function. For selected values of \(p\), run the simulation 1000 times. Watch the apparent convergence of the relative frequency function to the density function.
\(\P(N \gt n) = (1 - p)^n\) for \(n \in \N\)
The simplest proof is to note that \(\{N \gt n\} = \{X_1 = 0, \ldots, X_n = 0\}\). By independence, the probability of this event is \((1 - p)^n\). Another derivation is to sum the probability density function over \(\{n + 1, n + 2, \ldots\}\) and use geometric series.
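Both derivations of the tail probability can be compared numerically. This is only a sketch: the function names are hypothetical, and the infinite sum is truncated after a fixed number of terms.

```python
def tail_by_formula(n: int, p: float) -> float:
    """P(N > n) = (1 - p)^n, the probability that the first n trials all fail."""
    return (1.0 - p) ** n


def tail_by_summation(n: int, p: float, terms: int = 2000) -> float:
    """Approximate P(N > n) by summing the density over n + 1, n + 2, ... (truncated)."""
    return sum(p * (1.0 - p) ** (k - 1) for k in range(n + 1, n + 1 + terms))


p = 0.3  # illustrative success parameter
for n in range(0, 6):
    print(n, tail_by_formula(n, p), tail_by_summation(n, p))
```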
From Exercise 4, it follows that the distribution function of \(N\) is given by
\[ \P(N \le n) = 1 - (1 - p)^n, \quad n \in \N \]
Of course the distribution function \(n \mapsto \P(N \le n)\) and the complementary function \(n \mapsto \P(N \gt n)\) in Exercise 4 completely determine the distribution of \(N\). We will now explore another characterization known as the memoryless property.
\(\P(N \gt n + m \mid N \gt m) = \P(N \gt n)\) for \(m, \; n \in \N\).
Use Exercise 4 and the definition of conditional probability.
The memoryless property is equivalent to the statement that the conditional distribution of \(N - m\) given \(N \gt m\) is the same as the distribution of \(N\). That is, if the first success has not occurred by trial number \(m\), then the remaining number of trials needed to achieve the first success has the same distribution as the trial number of the first success in a fresh sequence of Bernoulli trials. In short, Bernoulli trials have no memory. This fact has implications for a gambler betting on Bernoulli trials (such as in the casino games roulette or craps). No betting strategy based on observations of past outcomes of the trials can possibly help the gambler.
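The memoryless property can also be examined empirically. The sketch below simulates the trial number of the first success many times and compares the relative frequency estimate of \(\P(N \gt n + m \mid N \gt m)\) with that of \(\P(N \gt n)\). The seed, the sample size, and the values \(p = 0.3\), \(n = 3\), \(m = 2\) are arbitrary illustrative choices.

```python
import random


def first_success_trial(p: float, rng: random.Random) -> int:
    """Run Bernoulli trials until the first success and return its trial number."""
    n = 1
    while rng.random() >= p:  # a trial fails with probability 1 - p
        n += 1
    return n


def conditional_tail(samples, n: int, m: int) -> float:
    """Relative frequency estimate of P(N > n + m | N > m)."""
    survivors = [x for x in samples if x > m]
    return sum(1 for x in survivors if x > n + m) / len(survivors)


rng = random.Random(2024)  # fixed seed so the run is reproducible
p = 0.3
samples = [first_success_trial(p, rng) for _ in range(100_000)]

lhs = conditional_tail(samples, n=3, m=2)                   # estimates P(N > 5 | N > 2)
rhs = sum(1 for x in samples if x > 3) / len(samples)      # estimates P(N > 3)
exact = (1 - p) ** 3                                        # theoretical value of both
print(lhs, rhs, exact)
```

Both estimates should be close to \((1 - p)^3\), up to sampling error.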
Conversely, if \(T\) is a random variable taking values in \(\N_+\) that satisfies the memoryless property, then \(T\) has a geometric distribution.
Let \(G(n) = \P(T \gt n)\) for \(n \in \N\). The memoryless property and the definition of conditional probability imply that \(G(m + n) = G(m) G(n)\) for \(m, \; n \in \N\). Note that this is the law of exponents for \(G\). It follows that \(G(n) = G^n(1)\) for \(n \in \N\). Hence \(T\) has the geometric distribution with parameter \(p = 1 - G(1)\).
Suppose again that \(N\) is the trial number of the first success in a sequence of Bernoulli trials, so that \(N\) has the geometric distribution on \(\N_+\) with parameter \(p \in (0, 1]\). The mean of \(N\) can be computed in several different ways.
\(\E(N) = 1 / p\)
The most direct approach is to use the definition \(\E(N) = \sum_{n=1}^\infty n \, \P(N = n)\) and the formula for the derivative of a geometric series. An easier computation is to use the alternate formula \(\E(N) = \sum_{n=0}^\infty \P(N \gt n)\) and the formula for the sum of a geometric series. A clever derivation is to condition on the outcome of the first trial to get \(\E(N) = 1 + (1 - p) \E(N)\).
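The three approaches can be compared numerically. For the conditioning argument, note that if the first trial is a failure (probability \(1 - p\)), the process starts over with one trial already used, which gives \(\E(N) = 1 + (1 - p) \E(N)\) and hence \(\E(N) = 1 / p\). The sketch below uses truncated series and an illustrative value \(p = 0.2\); the variable names are not from the text.

```python
p = 0.2        # illustrative success parameter
terms = 2000   # truncation point for the infinite series

# Definition: E(N) = sum over n of n * P(N = n)
direct = sum(n * p * (1 - p) ** (n - 1) for n in range(1, terms + 1))

# Tail formula: E(N) = sum over n >= 0 of P(N > n) = sum of (1 - p)^n
tails = sum((1 - p) ** n for n in range(0, terms))

# Conditioning: E = 1 + (1 - p) E, solved for E
conditioning = 1 / (1 - (1 - p))

print(direct, tails, conditioning, 1 / p)
```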
The probability generating function of \(N\) is given by
\[ P(t) := \E(t^N) = \frac{p \, t}{1 - (1 - p) \, t}, \quad |t| \lt \frac{1}{1 - p} \]
This result follows from yet another application of geometric series.
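As a sanity check, the closed form of the generating function can be compared with a truncated version of the series \(\sum_{n=1}^\infty t^n \, p \, (1 - p)^{n-1}\) at a few points inside the interval of convergence. This is a sketch with hypothetical function names; each term is grouped as \(p \, t \, [(1 - p) t]^{n-1}\) to avoid floating-point overflow in \(t^n\).

```python
def pgf_closed_form(t: float, p: float) -> float:
    """P(t) = p t / (1 - (1 - p) t), valid for |t| < 1 / (1 - p)."""
    return p * t / (1.0 - (1.0 - p) * t)


def pgf_series(t: float, p: float, terms: int = 5000) -> float:
    """Truncated series for E(t^N), with terms written as p t ((1 - p) t)^(n - 1)."""
    ratio = (1.0 - p) * t
    return sum(p * t * ratio ** (n - 1) for n in range(1, terms + 1))


p = 0.4  # illustrative; the radius of convergence is 1 / 0.6
for t in (-1.5, 0.5, 1.0, 1.5):
    print(t, pgf_closed_form(t, p), pgf_series(t, p))
```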
The factorial moments of \(N\) are given by
\[ \E[N^{(k)}] = k! \frac{(1 - p)^{k-1}}{p^k}, \quad k \in \N_+ \]
Recall that \(\E[N^{(k)}] = P^{(k)}(1)\) where \(P\) is the probability generating function of \(N\).
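The variance, skewness, and kurtosis of \(N\) are as follows (the skewness and kurtosis require \(p \lt 1\)):
\[ \text{var}(N) = \frac{1 - p}{p^2}, \quad \text{skew}(N) = \frac{2 - p}{\sqrt{1 - p}}, \quad \text{kurt}(N) = 9 + \frac{p^2}{1 - p} \]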
The factorial moments can be used to find the moments of \(N\) about 0. Then the standard formulas for variance, skewness, and kurtosis can be used.
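The computation outlined above can be carried out in exact rational arithmetic. The sketch below converts the first four factorial moments to moments about 0 (using \(n^2 = n^{(2)} + n\), \(n^3 = n^{(3)} + 3 n^{(2)} + n\), and \(n^4 = n^{(4)} + 6 n^{(3)} + 7 n^{(2)} + n\)) and then forms the central moments. The function names are hypothetical, and the squared skewness is returned instead of the skewness so that every quantity stays rational.

```python
from fractions import Fraction
from math import factorial


def factorial_moment(k: int, p: Fraction) -> Fraction:
    """E[N^(k)] = k! (1 - p)^(k - 1) / p^k."""
    return factorial(k) * (1 - p) ** (k - 1) / p ** k


def shape_summary(p: Fraction):
    """Return (variance, skewness squared, kurtosis) of N, for 0 < p < 1."""
    f1, f2, f3, f4 = (factorial_moment(k, p) for k in range(1, 5))
    # Moments about 0 from factorial moments
    m1 = f1
    m2 = f2 + f1
    m3 = f3 + 3 * f2 + f1
    m4 = f4 + 6 * f3 + 7 * f2 + f1
    # Central moments from moments about 0
    var = m2 - m1 ** 2
    mu3 = m3 - 3 * m1 * m2 + 2 * m1 ** 3
    mu4 = m4 - 4 * m1 * m3 + 6 * m1 ** 2 * m2 - 3 * m1 ** 4
    return var, mu3 ** 2 / var ** 3, mu4 / var ** 2


p = Fraction(1, 3)  # illustrative success parameter
var, skew_squared, kurt = shape_summary(p)
print(var, skew_squared, kurt)
```

For any rational \(p \in (0, 1)\), the three outputs can be compared with \((1 - p) / p^2\), \((2 - p)^2 / (1 - p)\), and \(9 + p^2 / (1 - p)\).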
In the negative binomial experiment, set \(k = 1\) to get the geometric distribution. Vary \(p\) with the scroll bar and note the location and size of the mean/standard deviation bar. For selected values of \(p\), run the simulation 1000 times and watch the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation.
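Suppose now that \(M = N - 1\), so that \(M\) (the number of failures before the first success) has the geometric distribution on \(\N\). Then \(\E(M) = \E(N) - 1 = (1 - p) / p\), while the variance, skewness, and kurtosis of \(M\) are the same as those of \(N\).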
Of course, the fact that the variance, skewness, and kurtosis are unchanged follows easily, since \(N\) and \(M\) differ by a constant.
Let \(F\) denote the distribution function of \(N\), so that \(F(n) = 1 - (1 - p)^n\) for \(n \in \N\). Recall that \(F^{-1}(r) = \min \{n \in \N_+: F(n) \ge r\}\) for \(r \in (0, 1)\) is the quantile function of \(N\).
The quantile function of \(N\) is
\[ F^{-1}(r) = \left\lceil \frac{\ln(1 - r)}{\ln(1 - p)}\right\rceil, \quad r \in (0, 1) \]
Of course, the quantile function, like the probability density function and the distribution function, completely determines the distribution of \(N\). Moreover, we can compute the median and quartiles to get measures of center and spread.
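The first quartile, the median (or second quartile), and the third quartile are obtained by setting \(r = \frac{1}{4}\), \(r = \frac{1}{2}\), and \(r = \frac{3}{4}\) in the quantile function (for \(p \lt 1\)):
\[ F^{-1}\left(\tfrac{1}{4}\right) = \left\lceil \frac{\ln(3/4)}{\ln(1 - p)} \right\rceil, \quad F^{-1}\left(\tfrac{1}{2}\right) = \left\lceil \frac{\ln(1/2)}{\ln(1 - p)} \right\rceil, \quad F^{-1}\left(\tfrac{3}{4}\right) = \left\lceil \frac{\ln(1/4)}{\ln(1 - p)} \right\rceil \]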
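The quantile formula can be checked against the definition \(F^{-1}(r) = \min\{n \in \N_+: F(n) \ge r\}\) by direct search. The sketch below assumes \(p \in (0, 1)\); the function names are hypothetical, and \(p = 1/6\) is just an illustrative value.

```python
import math


def geometric_cdf(n: int, p: float) -> float:
    """F(n) = 1 - (1 - p)^n."""
    return 1.0 - (1.0 - p) ** n


def quantile_closed_form(r: float, p: float) -> int:
    """Ceiling formula for the quantile of order r, assuming 0 < r < 1 and 0 < p < 1."""
    return math.ceil(math.log(1.0 - r) / math.log(1.0 - p))


def quantile_by_search(r: float, p: float) -> int:
    """Smallest positive integer n with F(n) >= r, found by stepping through n."""
    n = 1
    while geometric_cdf(n, p) < r:
        n += 1
    return n


p = 1 / 6  # illustrative success parameter
quartiles = tuple(quantile_closed_form(r, p) for r in (0.25, 0.5, 0.75))
print(quartiles)
```

When \((1 - p)^n\) equals \(1 - r\) exactly for some \(n\), floating-point rounding in the logarithms could push the ceiling to the wrong side, so the search version is the safer reference in such boundary cases.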
Suppose that \(T\) is a random variable taking values in \(\N_+\) which we interpret as the first time that some event of interest occurs. The function
\[ h(n) = \P(T = n \mid T \ge n) = \frac{\P(T = n)}{\P(T \ge n)}, \quad n \in \N_+ \]
will be called the rate function of \(T\). If \(T\) is interpreted as the (discrete) lifetime of a device, then \(h\) is a discrete version of the failure rate function studied in reliability theory. However, in our usual formulation of Bernoulli trials, the event of interest is success rather than failure (or death), so we will simply use the term rate function to avoid confusion. The constant rate property characterizes the geometric distribution.
As usual, let \(N\) denote the trial number of the first success in a sequence of Bernoulli trials with success parameter \(p \in (0, 1)\), so that \(N\) has the geometric distribution on \(\N_+\) with parameter \(p\). Then \(N\) has constant rate \(p\).
Conversely, if \(T\) has constant rate \(p \in (0, 1)\) then \(T\) has the geometric distribution on \(\N_+\) with success parameter \(p\).
Let \(H(n) = \P(T \ge n)\) for \(n \in \N_+\). From the constant rate property, \(\P(T = n) = p \, H(n)\) for \(n \in \N_+\). Next note that \(\P(T = n) = H(n) - H(n + 1)\) for \(n \in \N_+\). Thus, \(H\) satisfies the recurrence relation \(H(n + 1) = (1 - p) \, H(n)\) for \(n \in \N_+\). Also \(H\) satisfies the initial condition \(H(1) = 1\). Solving the recurrence relation gives \(H(n) = (1 - p)^{n-1}\) for \(n \in \N_+\). Hence \(\P(T = n) = p \, H(n) = p \, (1 - p)^{n-1}\) for \(n \in \N_+\), which is the geometric probability density function.
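The recurrence in the proof works for any rate function, not just a constant one: \(\P(T = n) = h(n) \, H(n)\) and \(H(n + 1) = [1 - h(n)] \, H(n)\) with \(H(1) = 1\). The sketch below rebuilds a density from a rate function in this way; with a constant rate it should reproduce the geometric density. The function name `pmf_from_rate` is hypothetical.

```python
def pmf_from_rate(h, n_max: int):
    """Rebuild P(T = n) for n = 1..n_max from a rate function h.

    Uses P(T = n) = h(n) H(n) and H(n + 1) = (1 - h(n)) H(n), with H(1) = 1,
    where H(n) = P(T >= n).
    """
    pmf = []
    survival = 1.0  # H(1) = 1
    for n in range(1, n_max + 1):
        pmf.append(h(n) * survival)
        survival *= 1.0 - h(n)
    return pmf


p = 0.3  # illustrative constant rate
rebuilt = pmf_from_rate(lambda n: p, 10)
geometric = [p * (1 - p) ** (n - 1) for n in range(1, 11)]
print(rebuilt[:3], geometric[:3])
```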
Recall that \(Y_n\), the number of successes in the first \(n\) trials, has the binomial distribution with parameters \(n\) and \(p\).
The conditional distribution of \(N\) given \(Y_n = 1\) is uniform on \(\{1, 2, \ldots, n\}\).
Note that the conditional distribution does not depend on the success parameter \(p\). If we know that there is exactly one success in the first \(n\) trials, then the trial number of that success is equally likely to be any of the \(n\) possibilities.
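The uniformity claim can be confirmed by brute force for small \(n\): enumerate all outcome sequences of length \(n\), keep those with exactly one success, and normalize. This is a sketch with a hypothetical function name; exact fractions are used so that the comparison with \(1 / n\) is exact.

```python
from fractions import Fraction
from itertools import product


def conditional_pmf_given_one_success(n: int, p: Fraction):
    """Return [P(N = j | Y_n = 1) for j = 1..n] by enumerating all 2^n outcomes."""
    weights = [Fraction(0)] * n
    for outcome in product((0, 1), repeat=n):
        if sum(outcome) != 1:
            continue  # keep only sequences with exactly one success
        prob = Fraction(1)
        for x in outcome:
            prob *= p if x == 1 else 1 - p
        weights[outcome.index(1)] += prob  # position of the single success
    total = sum(weights)  # this is P(Y_n = 1)
    return [w / total for w in weights]


print(conditional_pmf_given_one_success(5, Fraction(1, 3)))
print(conditional_pmf_given_one_success(5, Fraction(9, 10)))
```

Both calls should give the same list, illustrating that the conditional distribution does not depend on \(p\).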
A standard, fair die is thrown until an ace occurs. Let \(N\) denote the number of throws. Find each of the following:
A type of missile has failure probability 0.02. Let \(N\) denote the number of launches before the first failure. Find each of the following:
A student takes a multiple choice test with 10 questions, each with 5 choices (only one correct). The student blindly guesses and gets one question correct. Find the probability that the correct question was one of the first 4.
Recall that an American roulette wheel has 38 slots: 18 are red, 18 are black, and 2 are green. Suppose that you observe red or green on 10 consecutive spins. Give the conditional distribution of the number of additional spins needed for black to occur.
The game of roulette is studied in more detail in the chapter on Games of Chance.
In the negative binomial experiment, set \(k = 1\) to get the geometric distribution and set \(p = 0.3\). Run the experiment 1000 times. Compute the appropriate relative frequencies and empirically investigate the memoryless property
\[ \P(V \gt 5 \mid V \gt 2) = \P(V \gt 3) \]
We will now explore a gambling situation, known as the Petersburg problem, which leads to some famous and surprising results. Suppose that we are betting on a sequence of Bernoulli trials with success parameter \(p \in (0, 1)\). We can bet any amount of money on a trial at even stakes: if the trial results in success, we receive that amount, and if the trial results in failure, we must pay that amount. We will use the following strategy, known as a martingale strategy, where \(c \gt 0\) is a fixed constant: we bet \(c\) units on the first trial; whenever we lose a trial, we double the bet for the next trial; and we stop as soon as we win a trial.
Let \(W\) denote our net winnings when we stop. Then \(W = c\) (with probability 1).
Thus, \(W\) is not random and \(W\) is independent of \(p\)! Since \(c\) is an arbitrary constant, it would appear that we have an ideal strategy. However, let us study the amount of money \(Z\) needed to play the strategy.
\(Z = c \, (2^N - 1)\) where as usual, \(N\) is the trial number of the first success.
The expected amount of money needed for the martingale strategy is
\[ \E(Z) = \begin{cases} \frac{c}{2 \, p - 1}, & p \gt \frac{1}{2} \\ \infty, & p \le \frac{1}{2} \end{cases} \]
Thus, the strategy is fatally flawed when the trials are unfavorable and even when they are fair, since we need infinite expected capital to make the strategy work in these cases.
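The formulas for \(W\) and \(Z\) can be checked by simulation. The sketch below assumes the usual doubling scheme for the martingale strategy (bet \(c\) first, double the bet after each loss, stop at the first win), which is consistent with \(Z = c \, (2^N - 1)\). The function names, the seed, and the parameter values are illustrative assumptions.

```python
import math
import random


def play_martingale(c: int, p: float, rng: random.Random):
    """Play one round of the doubling strategy (assumed rule: double after each loss).

    Returns (net winnings W, capital needed Z, trial number N of the first success).
    """
    bet = c
    net = 0
    capital_needed = 0  # total amount staked up to and including the final bet
    trials = 0
    while True:
        trials += 1
        capital_needed += bet
        if rng.random() < p:  # success: win the current bet and stop
            net += bet
            return net, capital_needed, trials
        net -= bet            # failure: lose the current bet, then double it
        bet *= 2


def expected_capital(c: float, p: float) -> float:
    """E(Z) = c / (2p - 1) if p > 1/2, and infinity otherwise."""
    return c / (2 * p - 1) if p > 0.5 else math.inf


rng = random.Random(7)  # fixed seed so the run is reproducible
results = [play_martingale(1, 0.4, rng) for _ in range(1000)]
print(max(z for _, z, _ in results), expected_capital(1, 0.75), expected_capital(1, 0.4))
```

In every simulated round the net winnings equal \(c\), but the capital needed in the worst round can be very large, which is the practical face of the infinite expectation when \(p \le \frac{1}{2}\).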
Compute \(\E(Z)\) explicitly if \(c = 100\) and \(p = 0.55\).
In the negative binomial experiment, set \(k = 1\). For each of the following values of \(p\), run the experiment 100 times. For each run compute \(Z\) (with \(c = 1\)). Find the average value of \(Z\) over the 100 runs:
For more information about gambling strategies, see the chapter on Red and Black.
A coin has probability of heads \(p \in (0, 1]\). There are \(n\) players who take turns tossing the coin in round-robin style: player 1 first, then player 2, ..., then player \(n\), then player 1 again, and so forth. The first player to toss heads wins the game.
Let \(N\) denote the number of the first toss that results in heads. Of course, \(N\) has the geometric distribution on \(\N_+\) with parameter \(p\). Additionally, let \(W\) denote the winner of the game; \(W\) takes values in the set \(\{1, 2, \ldots, n\}\). We are interested in the probability distribution of \(W\).
For \(i \in \{1, 2, \ldots, n\}\), \(W = i\) if and only if \(N = i + k \, n\) for some \(k \in \N\). That is, using modular arithmetic,
\[ W = [(N - 1) \mod n] + 1 \]
The winning player \(W\) has probability density function
\[ \P(W = i) = \frac{p \, (1 - p)^{i-1}}{1 - (1 - p)^n}, \quad i \in \{1, 2, \ldots, n\} \]
This follows from the previous exercise and the geometric distribution of \(N\).
\(\P(W = i) = (1 - p)^{i-1} \P(W = 1)\) for \(i \in \{1, 2, \ldots, n\}\).
This result can be argued directly, using the memoryless property of the geometric distribution. In order for player \(i\) to win, the previous \(i - 1\) players must first all toss tails. Then, player \(i\) effectively becomes the first player in a new sequence of tosses. This result can be used to give another derivation of the probability density function in the previous exercise.
Note that \(\P(W = i)\) is a decreasing function of \(i \in \{1, 2, \ldots, n\}\). Not surprisingly, the lower the toss order the better for the player.
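The density of \(W\) can be computed in two ways: from the closed form, and by summing \(\P(N = i + k \, n)\) over \(k \in \N\). The sketch below does both and also encodes the modular formula for the winner. The function names and the values \(n = 3\), \(p = 1/3\) are illustrative choices.

```python
from fractions import Fraction


def winner_from_toss(toss_number: int, n: int) -> int:
    """W = [(N - 1) mod n] + 1: the player who makes toss number N."""
    return (toss_number - 1) % n + 1


def winner_pmf(n: int, p):
    """Closed form P(W = i) = p (1 - p)^(i - 1) / (1 - (1 - p)^n) for i = 1..n."""
    denominator = 1 - (1 - p) ** n
    return [p * (1 - p) ** (i - 1) / denominator for i in range(1, n + 1)]


def winner_pmf_by_series(n: int, p: float, rounds: int = 200):
    """Approximate P(W = i) by summing P(N = i + k n) over k = 0..rounds - 1."""
    return [
        sum(p * (1 - p) ** (i + k * n - 1) for k in range(rounds))
        for i in range(1, n + 1)
    ]


exact = winner_pmf(3, Fraction(1, 3))
approx = winner_pmf_by_series(3, 1 / 3)
print(exact, approx)
```

The exact probabilities decrease by the factor \(1 - p\) from each player to the next, in line with the relation \(\P(W = i) = (1 - p)^{i-1} \P(W = 1)\).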
Explicitly compute the probability density function of \(W\) when the coin is fair (\(p = 1 / 2\)).
Note from Exercise 29 that \(W\) itself has a truncated geometric distribution.
The distribution of \(W\) is the same as the conditional distribution of \(N\) given \(N \le n\):
\[ \P(W = i) = \P(N = i \mid N \le n), \quad i \in \{1, 2, \ldots, n\} \]
The following problems explore some limiting distributions related to the alternating coin-tossing game.
For fixed \(p \in (0, 1]\), the distribution of \(W\) converges to the geometric distribution with parameter \(p\) as \(n \uparrow \infty\).
For fixed \(n\), the distribution of \(W\) converges to the uniform distribution on \(\{1, 2, \ldots, n\}\) as \(p \downarrow 0\).
Players at the end of the tossing order should hope for a coin biased towards tails.
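Both limits can be observed numerically. The sketch below measures the largest gap between the density of \(W\) and the geometric density as \(n\) grows (for fixed \(p\)), and the largest gap between the density of \(W\) and the uniform density as \(p\) shrinks (for fixed \(n\)). The function name and the particular values of \(n\) and \(p\) are illustrative.

```python
def winner_pmf(n: int, p: float):
    """P(W = i) = p (1 - p)^(i - 1) / (1 - (1 - p)^n) for i = 1..n."""
    denominator = 1.0 - (1.0 - p) ** n
    return [p * (1.0 - p) ** (i - 1) / denominator for i in range(1, n + 1)]


# Fixed p, increasing n: compare the first five values with the geometric density.
p = 0.3
gaps_geometric = [
    max(abs(winner_pmf(n, p)[i] - p * (1 - p) ** i) for i in range(5))
    for n in (5, 20, 80)
]

# Fixed n, decreasing p: compare with the uniform density 1 / n.
n = 4
gaps_uniform = [
    max(abs(w - 1 / n) for w in winner_pmf(n, q))
    for q in (0.5, 0.05, 0.005)
]

print(gaps_geometric)
print(gaps_uniform)
```

Both lists of gaps should shrink toward 0, the first as \(n\) increases and the second as \(p\) decreases.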