\(\newcommand{\P}{\mathbb{P}}\)
\(\newcommand{\E}{\mathbb{E}}\)
\(\newcommand{\R}{\mathbb{R}}\)
\(\newcommand{\N}{\mathbb{N}}\)
\(\newcommand{\bs}{\boldsymbol}\)
\(\newcommand{\var}{\text{var}}\)
\(\newcommand{\sd}{\text{sd}}\)
\(\newcommand{\skew}{\text{skew}}\)
\(\newcommand{\kurt}{\text{kurt}}\)

Recall that in the basic model of the Poisson process, we have points

that occur randomly in time. The sequence of interarrival times is \(\bs{X} = (X_1, X_2, \ldots)\). The strong renewal assumption states that at each arrival time and at each fixed time, the process must probabilistically restart, independent of the past. The first part of that assumption implies that \(\bs{X}\) is a sequence of independent, identically distributed variables. The second part of the assumption implies that if the first arrival has not occurred by time \(s\), then the time remaining until the arrival occurs must have the same distribution as the first arrival time itself. This is known as the memoryless property and can be stated in terms of a generic interarrival time \(X\) as follows: the conditional distribution of \(X - s\) given \(X \gt s\) is the same as the distribution of \(X\). Equivalently,

The memoryless property on \([0, \infty)\) determines the distribution of \(X\) up to a positive parameter, as we will see now.

Suppose that \(X\) satisfies the memoryless property. Then \(X\) has a continuous distribution and there exists \(r \in (0, \infty)\) such that the distribution function \(F\) of \(X\) is

\[ F(t) = 1 - e^{-r\,t}, \quad t \in [0, \infty) \]Let \(G = 1 - F\) denote the denote the right-tail distribution function of \(X\), so that \(G(t) = \P(X \gt t)\) for \(t \ge 0\). From the definition of conditional probability, the memoryless property is equivalent to the law of exponents:

\[ G(t + s) = G(s) G(t), \quad s, \; t \in [0, \infty) \]Let \(c = G(1)\). Implicit in the memoryless property is \(\P(X \gt t) \gt 0\) for \(t \in [0, \infty)\), so \(c \gt 0\). If \(n \in \N_+\) then

\[ G(n) = G\left(\sum_{i=1}^n 1\right) = \prod_{i=1}^n G(1) = \left[G(1)\right]^n = c^n \]Next, if \(n \in \N_+\) then

\[ c = G(1) = G\left(\frac{n}{n}\right) = G\left(\sum_{i=1}^n \frac{1}{n}\right) = \prod_{i=1}^n G\left(\frac{1}{n}\right) = \left[G\left(\frac{1}{n}\right)\right]^n \]so \(G\left(\frac{1}{n}\right) = c^{1/n}\). Now suppose that \(m \in \N\) and \(n \in \N_+\). Then

\[ G\left(\frac{m}{n}\right) = G\left(\sum_{i=1}^m \frac{1}{n}\right) = \prod_{i=1}^m G\left(\frac{1}{n}\right) = \left[G\left(\frac{1}{n}\right)\right]^m = c^{m/n} \]Thus we have \(G(q) = c^q\) for rational \(q \in [0, \infty)\). For \(t \in [0, \infty)\), there exists a sequence of rational numbers \((q_1, q_2, \ldots)\) with \(q_n \downarrow t\) as \(n \uparrow \infty\). We have \(G(q_n) = c^{q_n}\) for each \(n \in \N_+\). But \(G\) is continuous from the right, so taking limits gives \(c^t = G(t) \). Now let \(r = -\ln(c)\). Then \(G(t) = e^{-r\,t}\) for \(t \in [0, \infty)\).

The probability density function of \(X\) is

\[ f(t) = r \, e^{-r\,t}, \quad 0 \le t \lt \infty \]A random variable with the distribution function in Exercise 1 or equivalently the probability density function in Exercise 2 is said to have the exponential distribution with rate parameter \(r\). The reciprocal \(\frac{1}{r}\) is known as the scale parameter.

Show directly that the exponential probability density function is a valid probability density function.

In the gamma experiment, set \(n = 1\) so that the simulated random variable has an exponential distribution. Vary \(r\) with the scroll bar and watch how the shape of the probability density function changes. For selected values of \(r\), run the experiment 1000 times and watch the apparent convergence of the empirical density function to the probability density function.

The quantile function of \(X\) is

\[ F^{-1}(p) = \frac{-\ln(1 - p)}{r}, \quad 0 \le p \lt 1 \]In particular,

- The median of \(X\) is \(\frac{1}{r} \ln(2) \approx 0.6931 \frac{1}{r}\)
- The first quartile of \(X\) is \(\frac{1}{r}[\ln(4) - \ln(3)] \approx 0.2877 \frac{1}{r}\)
- The third quartile \(X\) is \(\frac{1}{r} \ln(4) \approx 1.3863 \frac{1}{r}\)
- The interquartile range is \(\frac{1}{r} \ln(3) \approx 1.0986 \frac{1}{r}\)

Suppose now that \(X\) has a continuous distribution on \([0, \infty)\) and is interpreted as the lifetime of a device. If \(F\) denotes the distribution function of \(X\), then \(1 - F\) is the reliability function of \(X\). If \(f\) denotes the probability density function of \(X\) then the failure rate function is

\[ h(t) = \frac{f(t)}{1 - F(t)}, \quad t \in [0, \infty) \]If \(X\) has the exponential distribution with rate \(r \gt 0\), then from Theorem 1, the reliability function is \(1 - F(t) = e^{-r\,t}\) and the probability density function is \(f(t) = r \, e^{-r\,t}\), so trivially \(X\) has constant rate \(r\). The converse is also true.

If \(X\) has constant failure rate \(r \gt 0\) then \(X\) has the exponential distribution with parameter \(r\).

Recall that in general, the distribution of a lifetime variable \(X\) is determined by the failure rate function \(h\). Specifically, if \(G = 1 - F\) denotes the reliability function, then \(G^\prime = -f\), so \(-h = G^\prime / G\). Integrating and then taking exponentials gives

\[ G(t) = \exp\left(-\int_0^t h(s) \, ds\right), \quad t \in [0, \infty) \]In particular, if \(h(t) = r\) for \(t \in [0, \infty)\), then \(G(t) = e^{-r\,t}\) for \(t \in [0, \infty)\).

The memoryless and constant failure rate properties are the most famous characterizations of the exponential distribution, but are by no means the only ones. Indeed, entire books have been written on characterizations of this distribution.

Suppose again that \(X\) has the exponential distribution with rate parameter \(r \gt 0\). Naturaly, we want to know the the mean, variance, and various other moments of \(X\).

If \(n \in \N\) then \(\E(X^n) = n! / r^n\).

By the change of variables theorem,

\[ \E(X^n) = \int_0^\infty t^n r \, e^{-r\,t} \, dt\]Integrating by parts gives \(\E(X^n) = \frac{n}{r} \E(X^{n-1})\) for \(n \in \N+\). Of course \(\E(X^0) = 1\) so the result now follows by induction.

More generally, \(\E(X^a) = \Gamma(a + 1) / r^a\) for every \(a \in [0, \infty)\), where \(\Gamma\) is the gamma function.

In particular.

- \(\E(X) = \frac{1}{r}\)
- \(\var(X) = \frac{1}{r^2}\)
- \(\skew(X) = 2\)
- \(\kurt(X) = 9\)

In the context of the Poisson process, the parameter \(r\) is known as the rate of the process. On average, there are \(1 / r\) time units between arrivals, so the arrivals come at an average rate of \(r\) per unit time. Note also that the mean and standard deviation are equal for an exponential distribution, and that the median is always smaller than the mean. Recall also that skewness and kurtosis are standardized measure, and so do not depend on the parameter \(r\).

The moment generating function of \(X\) is

\[ M(s) = \E(e^{s\,X}) = \frac{r}{r - s}, \quad s \in (-\infty, r) \]In the gamma experiment, set \(n = 1\) so that the simulated random variable has an exponential distribution. Vary \(r\) with the scroll bar and watch how the mean/standard deviation bar changes. For various values of \(r\), run the experiment 1000 times and watch the apparent convergence of the empirical mean and standard deviation to the distribution mean and standard deviation, respectively.

The exponential distribution has a number of interesting and important mathematical properties.

As suggested earlier, the exponential distribution is a scale family, and \(1/r\) is the scale parameter.

Suppose that \(X\) has the exponential distribution with rate parameter \(r \gt 0\) and that \(c \gt 0\). Then \(c\,X\) has the exponential distribution with rate parameter \(r / c\).

For \(t \ge 0\), \(\P(c\,X \gt t) = \P(X \gt t / c) = e^{-r (t / c)} = e^{-(r / c) t}\).

Recall that multiplying a random variable by a positive constant frequently corresponds to a change of units (minutes into hours for a lifetime variable, for example). Thus, the exponential distribution is preserved under such changes of units.

Suppose that \(X\) has the exponential distribution with rate parameter \(r \gt 0\). Then \(\lfloor X \rfloor\) and \(\lceil X \rceil\) have geometric distributions on \(\N\) and on \(\N_+\), respectively, each with success parameter \(1 - e^{-r}\).

For \(n \in \N\) note that \(\P(\lfloor X \rfloor = n) = \P(n \le X \lt n + 1) = F(n + 1) - F(n)\). Substituting into the distribution function and simplifying gives \(\P(\lfloor X \rfloor = n) = (e^{-r})^n (1 - e^{-r})\). As a function of \(n\), this is the geometric distribution on \(\N\) with success parameter \(1 - e^{-r}\). The proof for the ceiling function is similar.

In many respects, the geometric distribution is a discrete version of the exponential distribution. In particular, recall that the geometric distribution is the only distribution on \(\N_+\) with the memoryless and constant rate properties. We will explore the analogy in more detail in the section on Bernoulli trials and the Poisson process. The following connection between the two distributions is interesting by itself, but will also be very important in the section on splitting Poisson processes.

Suppose that \(\bs{X} = (X_1, X_2, \ldots)\) is a sequence of independent variables, each with the exponential distribution with rate \(r\). Suppose that \(U\) has the geometric distribution on \(\N_+\) with success parameter \(p\) and is independent of \(\bs{X}\). Then \(Y = \sum_{i=1}^U X_i\) has the exponential distribution with rate \(r\,p\).

Recall that the moment generating function of \(Y\) is \(P \circ M\) where \(M\) is the common moment generating function of the terms in the sum, and \(P\) is the probability generating function of the number of terms \(U\). But \(M(s) = r / (r - s)\) for \(s \lt r\) and \(P(s) = p\,s / [1 - (1 - p)s]\) for \(s \lt 1 / (1 - p)\). Thus,

\[ (P \circ M)(s) = \frac{p r / (r - s)}{1 - (1 - p) r / (r - s)} = \frac{pr}{pr - s}, \quad s \lt pr \]It follows that \(Y\) has the exponential distribution with parameter \(p \, r\)

Suppose that \(X\) and \(Y\) have exponential distributions with parameters \(a\) and \(b\), respectively, and are independent. Then

\[ \P(X \lt Y) = \frac{a}{a + b} \]This result can be proved in a straightforward way by integrating the joint PDF of \((X, Y)\) over \(\{(x, y): 0 \lt x \lt y \lt \infty\}\). A more elegant proof uses conditioning and the moment generating function in Exercise 10:

\[ \P(Y \gt X) = \E[\P(Y \gt X \mid X)] = \E(e^{-b\,X}) = \frac{a}{a + b}\]The following theorem gives an important random version of the memoryless property.

Suppose that \(X\) and \(Y\) are independent variables taking values in \([0, \infty)\) and that \(Y\) has the exponential distribution with rate parameter \(r \gt 0\). Then \(X\) and \(Y - X\) are conditionally independent given \(X \lt Y\), and the conditional distribution of \(Y - X\) is also exponential with parameter \(r\).

Suppose that \(A \subseteq [0, \infty)\) (measurable of course) and \(t \ge 0\). Then

\[ \P(X \in A, Y - X \ge t \mid X \lt Y) = \frac{\P(X \in A, Y - X \ge t)}{\P(X \lt Y)} \]But conditioning on \(X\) we can write the numerator as

\[ \P(X \in A, Y - X \gt t) = \E[\P(X \in A, Y - X \gt t \mid X)] = \E[\P(Y \gt X + t \mid X), X \in A] = \E[e^{-r(t + X)}, X \in A] = e^{-rt} \E(e^{-r\,X}, X \in A) \]Similarly, conditioning on \(X\) gives (just as in the proof or Theorem 14), \(\P(X \lt Y) = \E(e^{-r\,X})\). Thus

\[ \P(X \in A, Y - X \gt t \mid X \lt Y) = e^{-r\,t} \frac{\E(e^{-r\,X}, X \in A)}{\E(e^{-rX})} \]Letting \(A = [0, \infty)\) we have \(\P(Y \gt t) = e^{-r\,t}\) so given \(X \lt Y\), the variable \(Y - X\) has the exponential distribution with parameter \(r\). Letting \(t = 0\), we see that given \(X \lt Y\), variable \(X\) has the distribution

\[ A \mapsto \frac{\E(e^{-r\,X}, X \in A)}{\E(e^{-r\,X})}\ \]Finally, because of the factoring, \(X\) and \(Y - X\) are conditionally independent given \(X \lt Y\).

For the remainder of this subsection, suppose that \(\bs{X} = (X_1, X_2, \ldots, X_n)\) is a sequence of independent random variables, and that \(X_i\) has the exponential distribution with rate parameter \(r_i \gt 0\) for each \(i \in \{1, 2, \ldots, n\}\).

Let \(U = \min\{X_1, X_2, \ldots, X_n\}\). Then \(U\) has the exponential distribution with parameter \(\sum_{i=1}^n r_i\).

Recall that in general, \(\{U \gt t\} = \{X_1 \gt t, X_2 \gt t, \ldots, X_n \gt t\}\) and therefore by independence, \(G(t) = G_1(t) G_2(t) \cdots G_n(t)\) for \(t \ge 0\), where \(G\) is the reliability function of \(U\) and \(G_i\) is the reliability function of \(X_i\) for each \(i\). When \(X_i\) has the exponential distribution with rate \(r_i\) for each \(i\), we have \(G(t) = \exp\left[-\left(\sum_{i=1}^n r_i\right) t\right]\) for \(t \ge 0\).

In the context of reliability, if a series system has independent components, each with an exponentially distributed lifetime, then the lifetime of the system is also exponentially distributed, and the failure rate of the system is the sum of the component failure rates. In the context of random processes, if we have \(n\) independent Poisson process, then the new process obtained by combining the random times is also Poisson, and the rate of the new process is the sum of the rates of the individual processes (we will return to this point latter).

Let \(V = \max\{X_1, X_2, \ldots, X_n\}\). Then \(V\) has distribution function.

\[ F(t) = \prod_{i=1}^n (1 - e^{-r_i \, t}) \]Recall that in general, \(\{V \le t\} = \{X_1 \le t, X_2 \le t, \ldots, X_n \le t\}\) and therefore by independence, \(F(t) = F_1(t) F_2(t) \cdots F_n(t)\) for \(t \ge 0\), where \(F\) is the distribution function of \(V\) and \(F_i\) is the distribution function of \(X_i\) for each \(i\).

Consider the special case where \(\bs{X}\) is a sequence of independent variables, each with the exponential distribution with rate \(r \gt 0\). In statistical terms, \(\bs{X}\) is a random sample from the exponential distribution. From the last couple of theorems, \(U\) has the exponential distribution with rate \(n\,r\) while \(V\) has distribution function \(F(t) = (1 - e^{-r\,t})^n\) for \(t \in [0, \infty)\). Recall that \(U\) and \(V\) are the first and last order statistics, respectively.

In the order statistic experiment, select the exponential distribution.

- Set \(k = 1\) (this gives the minimum \(U\)). Vary \(n\) with the scroll bar and note the shape of the probability density function. For selected values of \(n\), run the simulation 1000 times and note the apparent convergence of the empirical density function to the true probability density function.
- Vary \(n\) with the scroll bar, set \(k = n\) each time (this gives the maximum \(V\)), and note the shape of the probability density function. For selected values of \(n\), run the simulation 1000 times and note the apparent convergence of the empirical density function to the true probability density function.

We can now generalize Exercise 15:

For \(i \in \{1, 2, \ldots, n\}\),

\[ \P(X_i \lt X_j \text{ for all } j \ne i) = \frac{r_i}{\sum_{j=1}^n r_j} \]First, note that \(X_i \lt X_j\) for all \(i \ne j\) if and only if \(X_i \lt \min\{X_j: j \ne i\}\). But the minimum on the right is independent of \(X_i\) and, by Exercise 17, has an exponential distribution with parameter \(\sum_{j \ne i} r_j\). The result now follows from Exercise 15.

The results in Exercise 17 and Exercise 20 are very important in the theory of continuous-time Markov chains. Suppose that for each \(i\), \(X_i\) is the time until an event of interest occurs (the arrival of a customer, the failure of a device, etc.) and that these times are independent and exponentially distributed. Then the *first* time \(U\) that one of the events occurs is also exponentially distributed (Exercise 17), and the probability that the first event to occur is event \(i\) is proportional to the rate \(r_i\) (Exercise 20).

The probability of a total ordering is

\[ \P(X_1 \lt X_2 \lt \cdots \lt X_n) = \prod_{i=1}^n \frac{r_i}{\sum_{j=i}^n r_j} \]Of course, the probabilities of other orderings can be computed by permuting the parameters appropriately in the formula on the right.

Suppose that the length of a telephone call (in minutes) is exponentially distributed with rate parameter \(r = 0.2\). Find each of the following:

- The probability that the call lasts between 2 and 7 minutes.
- The median, the first and third quartiles, and the interquartile range of the call length.

Let \(X\) denote the call length.

- \(\P(2 \lt X \lt 7) = 0.4237\)
- \(q_1 = 1.4384\), \(q_2 = 3.4657\), \(q_3 = 6.9315\), \(q_3 - q_1 = 5.4931\)

Suppose that the lifetime of a certain electronic component (in hours) is exponentially distributed with rate parameter \(r = 0.001\). Find each of the following:

- The probability that the component lasts at least 2000 hours.
- The median, the first and third quartiles, and the interquartile range of the lifetime.

Let \(T\) denote the lifetime

- \(\P(T \ge 2000) = 0.1353\)
- \(q_1 = 287.682\), \(q_2 = 693.147\), \(q_3 = 1386.294\), \(q_3 - q_1 = 1098.612\)

Suppose that the time between requests to a web server (in seconds) is exponentially distributed with rate parameter \(r = 2\). Find each of the following:

- The mean and standard deviation of the time between requests.
- The probability that the time between requests is less that 0.5 seconds.
- The median, the first and third quartiles, and the interquartile range of the time between requests.

Let \(T\) denote the time between requests.

- \(\E(T) = 0.5\), \(\sd(T) = 0.5\)
- \(\P(T \lt 0.5) = 0.6321\)
- \(q_1 = 0.1438\), \(q_2 = 0.3466\), \(q_3 = 0.6931\), \(q_3 - q_1 = 0.5493\)

Suppose that the lifetime \(X\) of a fuse (in 100 hour units) is exponentially distributed with \(\P(X \gt 10) = 0.8\). Find each of the following:

- The rate parameter.
- The mean and standard deviation.
- The median, the first and third quartiles, and the interquartile range of the lifetime.

Let \(X\) denote the lifetime.

- \(r = 0.02231\)
- \(\E(X) = 44.814\), \(\sd(X) = 44.814\)
- \(q_1 = 12.8922\), \(q_2 = 31.0628\), \(q_3 = 62.1257\), \(q_3 - q_1 = 49.2334\)

The position \(X\) of the first defect on a digital tape (in cm) has the exponential distribution with mean 100. Find each of the following:

- The rate parameter.
- The probability that \(X \lt 200\) given \(X \gt 150\).
- The standard deviation.
- The median, the first and third quartiles, and the interquartile range of the position.

Let \(X\) denote the position of the first defect.

- \(r = 0.01\)
- \(\P(X \lt 200 \mid X \gt 150) = 0.3935\)
- \(\sd(X) = 100\)
- \(q_1 = 28.7682\), \(q_2 = 69.3147\), \(q_3 = 138.6294\), \(q_3 - q_1 = 109.6812\)