]> The Secretary Problem
  1. Virtual Laboratories
  2. 12. Finite Sampling Models
  3. 1
  4. 2
  5. 3
  6. 4
  7. 5
  8. 6
  9. 7
  10. 8
  11. 9

9. The Secretary Problem

In this section we will study a wonderful problem known variously as the secretary problem or the marriage problem. It is simple to state and not difficult to solve, but the solution is interesting and a bit surprising. Also, the problem serves as a nice introduction to the general area of statistical decision making.

Statement of the Problem

As always, we must start with a clear statement of the problem. We have n candidates (perhaps applicants for a job or possible marriage partners). Here are our assumptions:

The assumptions, of course, are not entirely reasonable in real applications. The last assumption, for example, that n is known, is more appropriate for the secretary interpretation than for the marriage interpretation.

What is an optimal strategy? What is the probability of success with this strategy? What happens to the strategy and the probability of success as n ? In particular, when n is large, is there any reasonable hope of finding the best candidate?

Strategies

Play the secretary game several times with n 10 candidates. See if you can find a good strategy just by trial and error.

After playing the secretary game a few times, it should be clear that the only reasonable type of strategy is to let a certain number k 1 of the candidates go by, and then select the first candidate we see who is better than all of the previous candidates (if she exists). If she does not exist (that is, if no candidate better than all previous candidates appears), we will agree to accept the last candidate, even though this means failure. The parameter k must be between 1 and n ; if k 1 , we select the first candidate; if k n , we select the last candidate; for any other value of k , the selected candidate is random, distributed on k k 1 n . We will refer to this let k 1 go by strategy as strategy k

Thus, we need to compute the probability of success p n k using strategy k with n candidates. Then we can maximize the probability over k to find the optimal strategy, and then take the limit over n to study the asymptotics.

Analysis

First, let's do some basic computations.

For the case n 3 , list the 6 permutations of 1 2 3 . and verify the probabilities in the table below. Note that k 2 is optimal.

k 1 2 3
p 3 k 13 12 13

Play the secretary game several times with n 3 candidates using the following strategies. See if you can detect a difference in the performance.

  1. Strategy 1
  2. Strategy 2
  3. Strategy 3

For the case n 4 , list the 24 permutations of 1 2 3 4 . and verify the probabilities in the table below. Note that k 2 is optimal.

k 1 2 3 4
p 4 k 624 1124 1024 624

Play the secretary game several times with n 4 candidates using the following strategies. See if you can detect a difference in the performance.

  1. Strategy 1
  2. Strategy 2
  3. Strategy 3

For the case n 5 , list the 120 permutations of 1 2 3 4 5 . and verify the probabilities in the table below. Note that k 3 is optimal.

k 1 2 3 4 5
p 5 k 24120 50120 52120 42120 24120

Play the secretary game several times with n 5 candidates using the following strategies. See if you can detect a difference in the performance.

  1. Strategy 2
  2. Strategy 3
  3. Strategy 4

Well, clearly we don't want to keep doing this. Let's see if we can find a general analysis. With n candidates, let X n denote the number (arrival order) of the best candidate, and let S n k denote the event of success for strategy k (we select the best candidate).

Argue that since the candidates arrive in random order, X n is uniformly distributed on 1 2 n .

Argue that

  1. S n k X n j 0 if j 1 2 k 1 . If the arrival number of the best candidate is j k , then strategy k will certainly fail.
  2. S n k X n j k 1 j 1 if j k k 1 n . If the arrival order of the best candidate is j k , then strategy k will succeed if and only if one of the first k 1 candidates (the ones that are automatically rejected) is the best among the first j 1 candidates.

The two cases are illustrated below. The large dot indicates the best candidate. Red dots indicate candidates that are rejected out of hand, while blue dots indicte candidates that are considered.

Image: Success1.png Image: Success2.png

Use the results of the previous exercise and a conitioning argument to show that

p n k S n k j k n 1 n k 1 j 1 k 1 n j k n 1 j 1

Values of the function p n can be computed by hand for small n and by a computer algebra system for moderate n . The graph of p 100 is shown below. Note the concave downward shape of the graph and the optimal value of k , which turns out to be 38. The optimal probability is about 0.37104.

Image: Strategyk.png

The optimal strategy k n , the ratio k n n , and the optimal probability p n k n of finding the best candidate, as functions of n 3 4 20 are given in the following table:

Candidates n Optimal strategy k n Ratio k n n Optimal probability p n k n
3 2 0.6667 0.5000
4 2 0.5000 0.4853
5 3 0.6000 0.4333
6 3 0.5000 0.4278
7 3 0.4286 0.4143
8 4 0.5000 0.4098
9 4 0.4444 0.4060
10 4 0.4000 0.3987
11 5 0.4545 0.3984
12 5 0.4167 0.3955
13 6 0.4615 0.3923
14 6 0.4286 0.3917
15 6 0.4000 0.3894
16 7 0.4375 0.3881
17 7 0.4118 0.3873
18 7 0.3889 0.3854
19 8 0.4211 0.3850
20 8 0.4000 0.3842

Apparently, as we might expect, the optimal strategy k n increases and the optimal probability p n k n decreases as n increases On the other hand, it's encouraging, and a bit surprising, that the optmal probabiliity does not appear to be decreasing to 0. It's perhaps least clear what's going on with the ratio. Graphical displays of some of the information in the table may help:

Could it be that the ratio k n n and the probability p n k n are both converging, and moreover, are converging to the same number? First let's try to establish rigorously some of the trends observed in the table.

Show that

  1. p n k 1 p n k if and only if j k n 1 j 1 1
  2. For each n , the function p n at first increases and then decreases.
  3. The maximum occurs at the largest k with j k n 1 j 1 1 . This is the optimal strategy k n .
  4. As n increases, k n increases

Give a probabilistic argument that the optimal probability p n k n decreases as n increases.

Asymptotic Analysis

We are natrually interested in the asymptotic behavior of the function p n , and the optimal strategy as n . The key is recognizing p n as a Riemann sum for a simple integral. (Riemann sums, of course, are named for Georg Riemann.)

Show that

p n k k 1 n j k n 1 n n j 1

Recognize the sum in the previous exercise as the left Riemann sum for the the function f t 1 t corresponding to the partition of the interval k 1 n 1 into n k 1 subintervals of length 1 n each: k 1 n k n k 1 n n 1 n 1

From the previous two exercises, it follows that

p n k k 1 n k 1 n k n k n

More rigorously, suppose that k n depends on n and that k n n x as n . Thus, x 0 1 is the limiting proportion of the candidates that we reject out of hand.

Show that p n k n x x as n .

The graph below shows the true probabilities p n k and the limiting values k n k n as a function of k with n 100 .

Image: Strategyk2.png

It's easy to maximize the limiting function.

Show that the maximum value of x x occurs at x 1 and the maximum value is also 1 .

Image: StrategyLimit.png

Thus, the magic number 1 0.36788 occurs twice in the problem. For large n :

The article "Who Solved the Secretary Problem?" by Tom Ferguson has an interesting historical discussion of the problem, including speculation that Johannes Kepler may have used the optimal strategy to choose his second wife. The article also discusses several generalizations of the problem.