\(\newcommand{\P}{\mathbb{P}}\) \(\newcommand{\E}{\mathbb{E}}\) \(\newcommand{\R}{\mathbb{R}}\) \(\newcommand{\N}{\mathbb{N}}\) \(\newcommand{\bs}{\boldsymbol}\) \(\newcommand{\var}{\text{var}}\) \(\newcommand{\cov}{\text{cov}}\) \(\newcommand{\cor}{\text{cor}}\)

9. The Secretary Problem

In this section we will study a nice problem known variously as the secretary problem or the marriage problem. It is simple to state and not difficult to solve, but the solution is interesting and a bit surprising. Also, the problem serves as a nice introduction to the general area of statistical decision making.

Statement of the Problem

As always, we must start with a clear statement of the problem.

We have \(n\) candidates (perhaps applicants for a job or possible marriage partners). The assumptions are

  1. The candidates are totally ordered from best to worst with no ties.
  2. The candidates arrive sequentially in random order.
  3. We can only determine the relative ranks of the candidates as they arrive. We cannot observe the absolute ranks.
  4. Our goal is to choose the very best candidate; no one less will do.
  5. Once a candidate is rejected, she is gone forever and cannot be recalled.
  6. The number of candidates \(n\) is known.

The assumptions, of course, are not entirely reasonable in real applications. The last assumption, for example, that \(n\) is known, is more appropriate for the secretary interpretation than for the marriage interpretation.

What is an optimal strategy? What is the probability of success with this strategy? What happens to the strategy and the probability of success as \(n\) increases? In particular, when \(n\) is large, is there any reasonable hope of finding the best candidate?


Play the secretary game several times with \(n = 10\) candidates. See if you can find a good strategy just by trial and error.

After playing the secretary game a few times, it should be clear that the only reasonable type of strategy is to let a certain number \(k - 1\) of the candidates go by, and then select the first candidate we see who is better than all of the previous candidates (if she exists). If she does not exist (that is, if no candidate better than all previous candidates appears), we will agree to accept the last candidate, even though this means failure. The parameter \(k\) must be between 1 and \(n\); if \(k = 1\), we select the first candidate; if \(k = n\), we select the last candidate; for any other value of \(k\), the arrival number of the selected candidate is random, with values in \(\{k, k + 1, \ldots, n\}\). We will refer to this "let \(k - 1\) go by" strategy as strategy \(k\).

Thus, we need to compute the probability of success \(p_n(k)\) using strategy \(k\) with \(n\) candidates. Then we can maximize the probability over \(k\) to find the optimal strategy, and then take the limit over \(n\) to study the asymptotic behavior.
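Before doing any exact computation, strategy \(k\) can be explored by simulation. The following is a minimal sketch, not the code behind the secretary experiment; the function name `simulate_strategy`, the default number of trials, and the seed are illustrative assumptions. Candidates are represented by their absolute ranks, with rank 1 the best, and the strategy uses only comparisons between ranks.

```python
import random

def simulate_strategy(n, k, trials=100_000, seed=0):
    """Estimate the success probability of strategy k with n candidates."""
    rng = random.Random(seed)
    ranks = list(range(1, n + 1))  # rank 1 is the best candidate
    successes = 0
    for _ in range(trials):
        rng.shuffle(ranks)  # ranks[i] is the rank of arrival number i + 1
        # Best rank among the first k - 1 arrivals (n + 1 if none are skipped).
        threshold = min(ranks[:k - 1], default=n + 1)
        # Accept the last candidate by default if nobody beats the threshold.
        chosen = ranks[-1]
        for r in ranks[k - 1:]:
            if r < threshold:
                chosen = r
                break
        successes += (chosen == 1)
    return successes / trials

if __name__ == "__main__":
    for k in range(1, 6):
        print(k, simulate_strategy(5, k))
```

The estimates vary with the seed and the number of trials, so they should be read as rough approximations of \(p_n(k)\).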


First, let's do some basic computations.

For the case \(n = 3\), list the 6 permutations of \(\{1, 2, 3\}\) and verify the probabilities in the table below. Note that \(k = 2\) is optimal.

| \(k\) | 1 | 2 | 3 |
|---|---|---|---|
| \(p_3(k)\) | \(\frac{2}{6}\) | \(\frac{3}{6}\) | \(\frac{2}{6}\) |

In the secretary experiment, set the number of candidates to \(n = 3\). Run the experiment 1000 times with each strategy \( k \in \{1, 2, 3\} \).

For the case \(n = 4\), list the 24 permutations of \(\{1, 2, 3, 4\}\) and verify the probabilities in the table below. Note that \(k = 2\) is optimal.

| \(k\) | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| \(p_4(k)\) | \(\frac{6}{24}\) | \(\frac{11}{24}\) | \(\frac{10}{24}\) | \(\frac{6}{24}\) |

In the secretary experiment, set the number of candidates to \(n = 4\). Run the experiment 1000 times with each strategy \( k \in \{1, 2, 3, 4\} \).

For the case \(n = 5\), list the 120 permutations of \(\{1, 2, 3, 4, 5\}\) and verify the probabilities in the table below. Note that \(k = 3\) is optimal.

| \(k\) | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| \(p_5(k)\) | \(\frac{24}{120}\) | \(\frac{50}{120}\) | \(\frac{52}{120}\) | \(\frac{42}{120}\) | \(\frac{24}{120}\) |

In the secretary experiment, set the number of candidates to \(n = 5\). Run the experiment 1000 times with each strategy \( k \in \{1, 2, 3, 4, 5\} \).
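Listing the permutations by hand is tedious, so the small cases can also be checked by brute force. The sketch below enumerates all \(n!\) arrival orders and counts the ones in which strategy \(k\) picks the best candidate. The function names `selected_rank` and `exact_success_probability` are hypothetical, chosen for this illustration, and rank 1 denotes the best candidate.

```python
from fractions import Fraction
from itertools import permutations

def selected_rank(order, k):
    """Rank chosen by strategy k when the ranks arrive in the given order."""
    # Best rank among the k - 1 skipped candidates (worse than all if k = 1).
    threshold = min(order[:k - 1], default=len(order) + 1)
    for r in order[k - 1:]:
        if r < threshold:
            return r
    return order[-1]  # nobody beat the threshold: accept the last candidate

def exact_success_probability(n, k):
    """Fraction of the n! arrival orders in which strategy k picks rank 1."""
    orders = list(permutations(range(1, n + 1)))
    wins = sum(selected_rank(order, k) == 1 for order in orders)
    return Fraction(wins, len(orders))

if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, [str(exact_success_probability(n, k)) for k in range(1, n + 1)])
```

`Fraction` reduces to lowest terms, so for example \(\frac{52}{120}\) is reported as \(\frac{13}{30}\). Enumeration is practical only for small \(n\), since the number of permutations grows as \(n!\).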

Well, clearly we don't want to keep doing this. Let's see if we can find a general analysis. With \(n\) candidates, let \(X_n\) denote the number (arrival order) of the best candidate, and let \(S_{n,k}\) denote the event of success for strategy \(k\) (we select the best candidate).

\(X_n\) is uniformly distributed on \(\{1, 2, \ldots, n\}\).


This follows since the candidates arrive in random order.

Next we will compute the conditional probability of success given the arrival order of the best candidate.

For \( n \in \N_+ \) and \( k \in \{2, 3, \ldots, n\} \), \[ \P(S_{n,k} \mid X_n = j) = \begin{cases} 0, & j \in \{1, 2, \ldots, k-1\} \\ \frac{k-1}{j-1}, & j \in \{k, k + 1, \ldots, n\} \end{cases} \]


For the first case, note that if the arrival number of the best candidate is \(j \lt k\), then strategy \(k\) will certainly fail. For the second case, note that if the arrival order of the best candidate is \(j \ge k\), then strategy \(k\) will succeed if and only if one of the first \(k - 1\) candidates (the ones that are automatically rejected) is the best among the first \(j - 1\) candidates. Since the best of the first \(j - 1\) candidates is equally likely to be in any of these \(j - 1\) positions, this event has probability \(\frac{k-1}{j-1}\).

The two cases are illustrated below. The large dot indicates the best candidate. Red dots indicate candidates that are rejected out of hand, while blue dots indicate candidates that are considered.

The case when \( X_n = j \lt k \)
Image: Success1.png
The case when \( X_n = j \ge k \)
Image: Success2.png
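The conditional probabilities can be checked for small \(n\) by enumeration. The following sketch restricts attention to the arrival orders in which the best candidate (rank 1) arrives in position \(j\), and counts how many of them lead to success. The function name `conditional_success` is an assumption made for this example.

```python
from fractions import Fraction
from itertools import permutations

def conditional_success(n, k, j):
    """P(strategy k succeeds | best candidate arrives j-th), by enumeration."""
    wins = total = 0
    for order in permutations(range(1, n + 1)):  # rank 1 is the best
        if order[j - 1] != 1:
            continue  # keep only orders with the best candidate in position j
        total += 1
        threshold = min(order[:k - 1], default=n + 1)
        # First candidate from position k on who beats the threshold,
        # or the last candidate if there is none.
        chosen = next((r for r in order[k - 1:] if r < threshold), order[-1])
        wins += (chosen == 1)
    return Fraction(wins, total)

if __name__ == "__main__":
    n, k = 5, 3
    for j in range(1, n + 1):
        print(j, conditional_success(n, k, j))
```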

Now we can compute the probability of success with strategy \(k\).

For \( n \in \N_+ \) \[ p_n(k) = \P(S_{n,k}) = \begin{cases} \frac{1}{n}, & k = 1 \\ \frac{k - 1}{n} \sum_{j=k}^n \frac{1}{j - 1}, & k \in \{2, 3, \ldots, n\} \end{cases} \]


When \( k = 1 \) we simply select the first candidate. This candidate will be the best one with probability \( 1 / n \). The result for \( k \in \{2, 3, \ldots, n\} \) follows from the previous two results, by conditioning on \(X_n\): \[ \P(S_{n,k}) = \sum_{j=1}^n \P(X_n = j) \P(S_{n,k} \mid X_n = j) = \sum_{j=k}^n \frac{1}{n} \frac{k - 1}{j - 1} \]
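The formula translates directly into code. Here is a short sketch using exact rational arithmetic; the function name `success_probability` is an illustrative choice, not part of the text.

```python
from fractions import Fraction

def success_probability(n, k):
    """Exact p_n(k) from the formula above, for 1 <= k <= n."""
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    if k == 1:
        return Fraction(1, n)
    # (k - 1) / n times the sum of 1 / (j - 1) for j = k, ..., n
    return Fraction(k - 1, n) * sum(Fraction(1, j - 1) for j in range(k, n + 1))

if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, [str(success_probability(n, k)) for k in range(1, n + 1)])
```

The output agrees with the tables for \(n \in \{3, 4, 5\}\) once the fractions are reduced to lowest terms.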

Values of the function \(p_n\) can be computed by hand for small \(n\) and by a computer algebra system for moderate \(n\). The graph of \(p_{100}\) is shown below. Note the concave downward shape of the graph and the optimal value of \(k\), which turns out to be 38. The optimal probability is about 0.37104.

The graph of \( p_{100} \)
Image: Strategyk.png
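In place of a computer algebra system, a few lines of ordinary code are enough to locate the maximum for moderate \(n\). The sketch below uses floating point arithmetic and scans every strategy; the names `success_probability` and `optimal_strategy` are assumptions made for this example.

```python
def success_probability(n, k):
    """p_n(k) in floating point, for 1 <= k <= n."""
    if k == 1:
        return 1 / n
    return (k - 1) / n * sum(1 / (j - 1) for j in range(k, n + 1))

def optimal_strategy(n):
    """Return (k_n, p_n(k_n)) by scanning all strategies k = 1, ..., n."""
    best_k = max(range(1, n + 1), key=lambda k: success_probability(n, k))
    return best_k, success_probability(n, best_k)

if __name__ == "__main__":
    k, p = optimal_strategy(100)
    print(k, round(p, 5))
```

The scan costs on the order of \(n^2\) operations, which is fine for \(n = 100\) but wasteful for very large \(n\).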

The optimal strategy \(k_n\) that maximizes \(k \mapsto p_n(k)\), the ratio \(k_n / n\), and the optimal probability \(p_n(k_n)\) of finding the best candidate, as functions of \(n \in \{3, 4, \dots, 20\}\) are given in the following table:

| Candidates \(n\) | Optimal strategy \(k_n\) | Ratio \(k_n / n\) | Optimal probability \(p_n(k_n)\) |
|---|---|---|---|
| 3 | 2 | 0.6667 | 0.5000 |
| 4 | 2 | 0.5000 | 0.4583 |
| 5 | 3 | 0.6000 | 0.4333 |
| 6 | 3 | 0.5000 | 0.4278 |
| 7 | 3 | 0.4286 | 0.4143 |
| 8 | 4 | 0.5000 | 0.4098 |
| 9 | 4 | 0.4444 | 0.4060 |
| 10 | 4 | 0.4000 | 0.3987 |
| 11 | 5 | 0.4545 | 0.3984 |
| 12 | 5 | 0.4167 | 0.3955 |
| 13 | 6 | 0.4615 | 0.3923 |
| 14 | 6 | 0.4286 | 0.3917 |
| 15 | 6 | 0.4000 | 0.3894 |
| 16 | 7 | 0.4375 | 0.3881 |
| 17 | 7 | 0.4118 | 0.3873 |
| 18 | 7 | 0.3889 | 0.3854 |
| 19 | 8 | 0.4211 | 0.3850 |
| 20 | 8 | 0.4000 | 0.3842 |

Apparently, as we might expect, the optimal strategy \(k_n\) increases and the optimal probability \(p_n(k_n)\) decreases as \(n \to \infty\). On the other hand, it's encouraging, and a bit surprising, that the optimal probability does not appear to be decreasing to 0. It's perhaps least clear what's going on with the ratio. Graphical displays of some of the information in the table may help:

The optimal probability \( p_n(k_n) \)
The optimal ratio \( k_n / n \)

Could it be that the ratio \(k_n / n\) and the probability \(p_n(k_n)\) are both converging, and moreover, are converging to the same number? First let's try to establish rigorously some of the trends observed in the table.

The success probability \(p_n\) satisfies \[ p_n(k - 1) \lt p_n(k) \text{ if and only if } \sum_{j=k}^n \frac{1}{j-1} \gt 1 \]
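One way to verify this criterion is to compute the difference directly. Write \(S_k = \sum_{j=k}^n \frac{1}{j-1}\), a shorthand introduced here only for this computation. For \(k \in \{3, 4, \ldots, n\}\) we have \(S_{k-1} = S_k + \frac{1}{k-2}\), and hence \[ p_n(k) - p_n(k-1) = \frac{k-1}{n} S_k - \frac{k-2}{n} \left(S_k + \frac{1}{k-2}\right) = \frac{1}{n} \left(S_k - 1\right) \] For \(k = 2\), the formula for \(p_n(1)\) gives \(p_n(2) - p_n(1) = \frac{1}{n} S_2 - \frac{1}{n}\), which has the same form. In both cases, the difference is positive if and only if \(S_k \gt 1\).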

It follows that for each \(n \in \N_+\), the function \(p_n\) at first increases and then decreases. The maximum value of \(p_n\) occurs at the largest \(k\) with \(\sum_{j=k}^n \frac{1}{j - 1} \gt 1\). This is the optimal strategy with \(n\) candidates, which we have denoted by \(k_n\).
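The criterion gives a faster way to find \(k_n\) than scanning all values of \(p_n\): accumulate the tail sum starting from \(k = n\) and stop as soon as it exceeds 1. The following sketch uses exact fractions to avoid rounding issues near the threshold. The function name `optimal_strategy_by_criterion` and the fallback value for \(n \le 2\) (where no \(k \ge 2\) satisfies the criterion) are assumptions made for this example.

```python
from fractions import Fraction

def optimal_strategy_by_criterion(n):
    """Largest k in {2, ..., n} with 1/(k-1) + ... + 1/(n-1) > 1."""
    tail = Fraction(0)
    for k in range(n, 1, -1):
        tail += Fraction(1, k - 1)  # now tail = sum of 1/(j-1) for j = k..n
        if tail > 1:
            return k  # first hit while descending is the largest such k
    return 1  # assumed fallback: no k >= 2 qualifies (happens for n <= 2)

if __name__ == "__main__":
    for n in range(3, 21):
        k = optimal_strategy_by_criterion(n)
        print(n, k, round(k / n, 4))
```

The values of \(k_n\) and \(k_n / n\) printed for \(n \in \{3, 4, \ldots, 20\}\) can be compared with the table above.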

As \(n\) increases, \(k_n\) increases and the optimal probability \(p_n(k_n)\) decreases.

Asymptotic Analysis

We are naturally interested in the asymptotic behavior of the function \(p_n\), and the optimal strategy as \(n \to \infty\). The key is recognizing \(p_n\) as a Riemann sum for a simple integral. (Riemann sums, of course, are named for Georg Riemann.)

If \(k(n)\) depends on \(n\) and \(k(n) / n \to x \in (0, 1)\) as \(n \to \infty\) then \(p_n[k(n)] \to -x \ln(x)\) as \(n \to \infty\).


We give an argument that is not completely rigorous, but captures the general ideas. First note that \[ p_n(k) = \frac{k-1}{n} \sum_{j=k}^n \frac{1}{n} \frac{n}{j-1} \] We recognize the sum above as the left Riemann sum for the function \(f(t) = \frac{1}{t}\) corresponding to the partition of the interval \(\left[\frac{k-1}{n}, 1\right]\) into \((n - k) + 1\) subintervals of length \(\frac{1}{n}\) each: \(\left(\frac{k-1}{n}, \frac{k}{n}, \ldots, \frac{n-1}{n}, 1\right)\). The sum therefore approximates \(\int_{(k-1)/n}^1 \frac{1}{t} \, dt = -\ln\left(\frac{k-1}{n}\right)\), and it follows that \[ p_n(k) \approx -\frac{k-1}{n} \ln\left(\frac{k-1}{n}\right) \] If \(k / n \to x \in (0, 1)\) as \(n \to \infty\) then the expression on the right converges to \(-x \, \ln(x)\) as \(n \to \infty\).

The graph below shows the true probabilities \(p_n(k)\) and the limiting values \(-\frac{k}{n} \, \ln\left(\frac{k}{n}\right)\) as a function of \(k\) with \(n = 100\).

True and approximate probabilities of success as a function of \( k \) with \( n = 100 \)
Image: Strategyk2.png
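The quality of the approximation can also be checked numerically. The sketch below compares \(p_n(k)\) with the limiting value \(-x \ln(x)\) when \(k\) is taken to be \(n x\) rounded to the nearest integer. The function names `limiting_value` and `gap`, and the rounding rule for \(k\), are assumptions made for this illustration.

```python
import math

def success_probability(n, k):
    """p_n(k) in floating point, for 1 <= k <= n."""
    if k == 1:
        return 1 / n
    return (k - 1) / n * sum(1 / (j - 1) for j in range(k, n + 1))

def limiting_value(x):
    """The limit -x ln(x) for a ratio x in (0, 1)."""
    return -x * math.log(x)

def gap(n, x):
    """|p_n(k) - (-x ln x)| with k = round(n x), clamped to {1, ..., n}."""
    k = min(max(round(n * x), 1), n)
    return abs(success_probability(n, k) - limiting_value(x))

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, gap(n, 0.37), gap(n, 0.5))
```

For a fixed ratio \(x\), the gap should shrink as \(n\) grows, which is consistent with the limit result.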

For the optimal strategy \(k_n\), there exists \(x_0 \in (0, 1)\) such that \(k_n / n \to x_0\) as \(n \to \infty\). Thus, \(x_0 \in (0, 1)\) is the limiting proportion of the candidates that we reject out of hand. Moreover, \(x_0\) maximizes \(x \mapsto -x \ln(x)\) on \((0, 1)\).

The maximum value of \(-x \ln(x)\) occurs at \(x_0 = 1 / e\) and the maximum value is also \(1 / e\).

The graph of \( -x \ln(x) \) on the interval \( (0, 1) \)
Image: StrategyLimit.png
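The maximum can be confirmed with elementary calculus. Writing \(g(x) = -x \ln(x)\) for \(x \in (0, 1)\) (the name \(g\) is used here only for this computation), \[ g^\prime(x) = -\ln(x) - 1, \quad g^{\prime\prime}(x) = -\frac{1}{x} \lt 0 \] So \(g\) is concave downward, and \(g^\prime(x) = 0\) if and only if \(\ln(x) = -1\), that is, \(x = 1 / e\). The maximum value is \[ g\left(\frac{1}{e}\right) = -\frac{1}{e} \ln\left(\frac{1}{e}\right) = \frac{1}{e} \]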

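Thus, the magic number \(1 / e \approx 0.3679\) occurs twice in the problem. For large \(n\):

  1. The approximately optimal strategy is to reject the first \(n / e\) candidates (about 37% of them) out of hand, and then select the first candidate, if she appears, who is better than all of the previous candidates.
  2. The probability of finding the best candidate with this strategy is approximately \(1 / e\), or about 0.37.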

The article Who Solved the Secretary Problem? by Tom Ferguson (1989) has an interesting historical discussion of the problem, including speculation that Johannes Kepler may have used the optimal strategy to choose his second wife. The article also discusses several generalizations of the problem.