$$\newcommand{\P}{\mathbb{P}}$$ $$\newcommand{\E}{\mathbb{E}}$$ $$\newcommand{\R}{\mathbb{R}}$$ $$\newcommand{\N}{\mathbb{N}}$$ $$\newcommand{\bs}{\boldsymbol}$$ $$\newcommand{\var}{\text{var}}$$ $$\newcommand{\sd}{\text{sd}}$$ $$\newcommand{\skew}{\text{skew}}$$ $$\newcommand{\kurt}{\text{kurt}}$$

## 4. The Poisson Distribution

#### The Counting Variables

Recall that in the Poisson model, $$\bs{X} = (X_1, X_2, \ldots)$$ denotes the sequence of interarrival times, and $$\bs{T} = (T_0, T_1, T_2, \ldots)$$ denotes the sequence of arrival times. Thus $$\bs{T}$$ is the partial sum process associated with $$\bs{X}$$:

$T_n = \sum_{i=1}^n X_i, \quad n \in \N$

Based on the strong renewal assumption, that the process restarts at each fixed time and each arrival time, independently of the past, we now know that $$\bs{X}$$ is a sequence of independent random variables, each with the exponential distribution with parameter $$r \gt 0$$. We also know that $$T_n$$ has the gamma distribution with rate parameter $$r$$ and shape parameter $$n$$. The probability density function of $$T_n$$ is

$f_n(t) = r^n \frac{t^{n-1}}{(n-1)!} e^{-r\,t}, \quad 0 \le t \lt \infty$

Recall that for $$t \ge 0$$, $$N_t$$ denotes the number of arrivals in the interval $$(0, t]$$, so that $$N_t = \max\{n \in \N: T_n \le t\}$$. We refer to $$\bs{N} = (N_t: t \ge 0)$$ as the counting process. We can find the distribution of $$N_t$$ because of the inverse relation between $$\bs{N}$$ and $$\bs{T}$$. In particular, recall that

$N_t \ge n \iff T_n \le t, \quad t \in (0, \infty), \; n \in \N_+$

since both events mean that there are at least $$n$$ arrivals in $$(0, t]$$.

For $$t \in [0, \infty)$$, the probability density function of $$N_t$$ is given by

$\P(N_t = n) = e^{-r\,t} \frac{(r\,t)^n}{n!}, \quad n \in \N$
Proof:

Using the inverse relationship noted above, and integration by parts, we have

$\P(N_t \ge n) = \P(T_n \le t) = \int_0^t f_n(s) \, ds = 1 - \sum_{k=0}^{n-1} e^{-r\,t} \frac{(r\,t)^k}{k!}, \quad n \in \N$

For $$n \in \N$$ we have $$\P(N_t = n) = \P(N_t \ge n) - \P(N_t \ge n + 1)$$. Simplifying gives the result.
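As a sanity check, the identity in the proof is easy to verify numerically. The sketch below (in Python; the helper names `poisson_pmf` and `prob_at_least` are ours, not from the text) compares the difference $$\P(N_t \ge n) - \P(N_t \ge n + 1)$$ with the Poisson probability density function for a sample choice of $$r$$ and $$t$$.

```python
import math

def poisson_pmf(n, c):
    # e^{-c} c^n / n!, the Poisson PDF with parameter c
    return math.exp(-c) * c**n / math.factorial(n)

def prob_at_least(n, c):
    # P(N_t >= n) = P(T_n <= t) = 1 - sum_{k=0}^{n-1} e^{-c} c^k / k!, with c = r t
    return 1.0 - sum(poisson_pmf(k, c) for k in range(n))

r, t = 2.0, 3.0
c = r * t
for n in range(10):
    # P(N_t = n) = P(N_t >= n) - P(N_t >= n + 1)
    diff = prob_at_least(n, c) - prob_at_least(n + 1, c)
    assert abs(diff - poisson_pmf(n, c)) < 1e-12
```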

Note that the distribution of $$N_t$$ depends on the parameters $$r$$ and $$t$$ only through the product $$r\,t$$. The distribution is called the Poisson distribution with parameter $$r\,t$$ and is named for Simeon Poisson.

In the Poisson experiment, vary $$r$$ and $$t$$ with the scroll bars and note the shape of the probability density function. For various values of $$r$$ and $$t$$, run the experiment 1000 times and watch the apparent convergence of the relative frequency function to the probability density function.

The Poisson distribution is one of the most important in probability. In general, a random variable $$N$$ taking values in $$\N$$ is said to have the Poisson distribution with parameter $$c \gt 0$$ if it has the probability density function

$g(n) = e^{-c} \frac{c^n}{n!}, \quad n \in \N$

To check our work, note that $$g$$ is a valid probability density function.

Proof:

Recall that $$\sum_{n=0}^\infty c^n / n! = e^c$$.

The Poisson distribution is unimodal.

1. $$g(n - 1) \lt g(n)$$ if and only if $$n \lt c$$.
2. If $$c \notin \N_+$$, there is a single mode at $$\lfloor c \rfloor$$.
3. If $$c \in \N_+$$, there are consecutive modes at $$c - 1$$ and $$c$$.
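These mode facts are easy to confirm numerically. A quick sketch in Python (the function name `g` matches the density above; the parameter values are arbitrary illustrations):

```python
import math

def g(n, c):
    # Poisson PDF with parameter c
    return math.exp(-c) * c**n / math.factorial(n)

# Non-integer parameter: a single mode at floor(c)
c = 3.7
vals = [g(n, c) for n in range(20)]
assert max(range(20), key=vals.__getitem__) == math.floor(c)

# Integer parameter: consecutive modes at c - 1 and c
c = 4
vals = [g(n, c) for n in range(20)]
assert abs(vals[c - 1] - vals[c]) < 1e-12
assert vals[c] >= max(vals) - 1e-12
```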

Suppose that requests to a web server follow the Poisson model with rate $$r = 5$$ per minute. Find the probability that there will be at least 8 requests in a 2 minute period.

0.7798

Defects in a certain type of wire follow the Poisson model with rate 1.5 per meter. Find the probability that there will be no more than 4 defects in a 2 meter piece of the wire.

0.8153
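Both answers can be reproduced with a few lines of Python (a sketch; `poisson_pmf` is our helper name). In each case the Poisson parameter is the product of the rate and the length of the interval.

```python
import math

def poisson_pmf(n, c):
    return math.exp(-c) * c**n / math.factorial(n)

# Web server: r = 5 per minute over t = 2 minutes, so the parameter is c = 10.
c = 5 * 2
p_web = 1 - sum(poisson_pmf(k, c) for k in range(8))   # P(at least 8 requests)
print(round(p_web, 4))   # 0.7798

# Wire: 1.5 defects per meter over 2 meters, so the parameter is c = 3.
c = 1.5 * 2
p_wire = sum(poisson_pmf(k, c) for k in range(5))      # P(at most 4 defects)
print(round(p_wire, 4))  # 0.8153
```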

#### Moments

Suppose that $$N$$ has the Poisson distribution with parameter $$c \gt 0$$. Naturally we want to know the mean, variance, and probability generating function of $$N$$. The easiest moments to compute are the factorial moments. For this result, recall the falling power notation for the number of permutations of size $$k$$ chosen from a population of size $$n$$:

$n^{(k)} = n \, (n - 1) \cdots [n - (k - 1)]$

The factorial moment of $$N$$ of order $$k \in \N$$ is $$\E[N^{(k)}] = c^k$$.

Proof:

Using the standard change of variables formula for expected value,

$\E[N^{(k)}] = \sum_{n=0}^\infty n^{(k)} e^{-c} \frac{c^n}{n!} = e^{-c} c^k \sum_{n=k}^\infty \frac{c^{n-k}}{(n-k)!} = e^{-c} c^k e^c = c^k$

The mean and variance of $$N$$ are both equal to the parameter $$c$$.

1. $$\E(N) = c$$
2. $$\var(N) = c$$
Proof:

Part (a) follows directly from the first factorial moment: $$\E(N) = \E[N^{(1)}] = c$$. For part (b), note that $$\E(N^2) = \E[N^{(2)}] + \E(N) = c^2 + c$$, so $$\var(N) = \E(N^2) - [\E(N)]^2 = c^2 + c - c^2 = c$$.
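The factorial moments, the mean, and the variance can all be checked numerically by truncating the series far into the tail. A sketch in Python (helper names are ours; $$c = 3.7$$ is an arbitrary choice):

```python
import math

def poisson_pmf(n, c):
    return math.exp(-c) * c**n / math.factorial(n)

def falling_power(n, k):
    # n^(k) = n (n - 1) ... (n - k + 1)
    result = 1
    for i in range(k):
        result *= n - i
    return result

c = 3.7
# E[N^(k)] = c^k for the first few orders k
for k in range(1, 5):
    moment = sum(falling_power(n, k) * poisson_pmf(n, c) for n in range(100))
    assert abs(moment - c**k) < 1e-9

# E(N) = c and var(N) = E(N^2) - [E(N)]^2 = c
mean = sum(n * poisson_pmf(n, c) for n in range(100))
second = sum(n**2 * poisson_pmf(n, c) for n in range(100))
assert abs(mean - c) < 1e-9
assert abs(second - mean**2 - c) < 1e-9
```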

The probability generating function of $$N$$ is

$P(s) = \E(s^N) = e^{c(s - 1)}, \quad s \in \R$
Proof:

Using the change of variables formula again,

$\E(s^N) = \sum_{n=0}^\infty s^n e^{-c} \frac{c^n}{n!} = e^{-c} \sum_{n=0}^\infty \frac{(c\,s)^n}{n!} = e^{-c} e^{c\,s}, \quad s \in \R$
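The closed form for the probability generating function can likewise be checked against the truncated series (a Python sketch; the values of $$c$$ and $$s$$ are arbitrary):

```python
import math

c, s = 2.5, 0.7
# E(s^N) computed from the series, against the closed form e^{c(s - 1)}
pgf_series = sum(s**n * math.exp(-c) * c**n / math.factorial(n) for n in range(100))
assert abs(pgf_series - math.exp(c * (s - 1))) < 1e-12
```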

Returning to the Poisson process $$\bs{N} = (N_t: t \ge 0)$$ with rate parameter $$r$$, it follows that $$\E(N_t) = r\,t$$ and $$\var(N_t) = r\,t$$ for $$t \ge 0$$. Once again, we see that $$r$$ can be interpreted as the average arrival rate. In an interval of length $$t$$, we expect about $$r\,t$$ arrivals.

In the Poisson experiment, vary $$r$$ and $$t$$ with the scroll bars and note the location and size of the mean/standard deviation bar. For various values of $$r$$ and $$t$$, run the experiment 1000 times and watch the apparent convergence of the sample mean and standard deviation to the distribution mean and standard deviation, respectively.

Suppose that customers arrive at a service station according to the Poisson model, at a rate of $$r = 4$$. Find the mean and standard deviation of the number of customers in an 8 hour period.

32, 5.657

Let's explore the basic renewal assumption of the Poisson model in terms of the counting process $$\bs{N} = (N_t: t \ge 0)$$. Recall that $$N_t$$ is the number of arrivals in the interval $$(0, t]$$.

If $$s, \; t \in [0, \infty)$$ with $$s \lt t$$, then $$N_t - N_s$$ is the number of arrivals in the interval $$(s, t]$$.

Of course, the arrival times have continuous distributions, so the probability that an arrival occurs at a specific point $$t$$ is 0. Thus, it does not really matter if we write the interval above as $$(s, t]$$, $$(s, t)$$, $$[s, t)$$ or $$[s, t]$$.

The process $$\bs{N}$$ has stationary, independent increments.

1. If $$s, \; t \in [0, \infty)$$ with $$s \lt t$$ then $$N_t - N_s$$ has the same distribution as $$N_{t-s}$$, namely Poisson with parameter $$r\,(t - s)$$.
2. If $$t_1, \; t_2, \; t_3 \ldots \in [0, \infty)$$ with $$t_1 \lt t_2 \lt t_3 \lt \cdots$$ then $$(N_{t_1}, N_{t_2} - N_{t_1}, N_{t_3} - N_{t_2}, \ldots)$$ is an independent sequence.

The results in the last theorem can be stated more elegantly in terms of our more general counting process. Recall that for $$A \subseteq [0, \infty)$$ (measurable of course), $$N(A)$$ denotes the number of random points in $$A$$:

$N(A) = \#\{n \in \N_+: T_n \in A\}$

Recall also that $$\lambda$$ denotes the standard length (Lebesgue) measure on $$[0, \infty)$$. The stationary property states that the distribution of $$N(A)$$ depends only on $$\lambda(A)$$; in fact, the distribution is Poisson with parameter $$r \, \lambda(A)$$. The independence property states that if $$(A_1, A_2, \ldots)$$ is a sequence of disjoint subsets of $$[0, \infty)$$ then $$(N(A_1), N(A_2), \ldots)$$ is a sequence of independent variables.

Suppose that $$N$$ and $$M$$ are independent random variables, and that $$N$$ has the Poisson distribution with parameter $$c$$ and $$M$$ has the Poisson distribution with parameter $$d$$. Then $$N + M$$ has the Poisson distribution with parameter $$c + d$$.

Proof:

There are several ways to prove this result, but the one that gives the most insight is a probabilistic proof based on the Poisson process. Thus, suppose that we start with a sequence $$\bs{X} = (X_1, X_2, \ldots)$$ of independent random variables, each with the exponential distribution with parameter 1. This sequence leads to a Poisson process with rate 1, and in particular the counting process $$\bs{N} = (N_t: t \ge 0)$$. We can associate $$N$$ with $$N_c$$ and $$M$$ with $$N_{c + d} - N_c$$, so that $$N + M$$ is $$N_{c + d}$$.

Simple analytic proofs can be constructed using probability generating functions or probability density functions. Recall that the PGF of $$N + M$$ is the product of the PGFs of $$N$$ and $$M$$. The PDF of $$N + M$$ is the convolution of the PDFs of $$N$$ and $$M$$.
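The convolution argument is easy to verify numerically. A Python sketch (helper name and parameter values are ours) compares the convolution of the two PDFs with the Poisson PDF with parameter $$c + d$$:

```python
import math

def poisson_pmf(n, c):
    return math.exp(-c) * c**n / math.factorial(n)

c, d = 2.0, 3.5
# P(N + M = n) by convolution, against the Poisson(c + d) PDF
for n in range(20):
    conv = sum(poisson_pmf(k, c) * poisson_pmf(n - k, d) for k in range(n + 1))
    assert abs(conv - poisson_pmf(n, c + d)) < 1e-12
```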

From the last theorem, it follows that for each $$n \in \N_+$$, the Poisson distribution is the distribution of the sum of $$n$$ independent, identically distributed variables. A distribution with this property is said to be infinitely divisible. Again, the Poisson process gives an explicit decomposition. Suppose that $$(N_t: t \ge 0)$$ is a Poisson process with rate 1. For fixed $$t \gt 0$$, $$N_t$$ has the Poisson distribution with parameter $$t$$, and

$N_t = \sum_{i=1}^n [N_{i\,t / n} - N_{(i-1)t / n}]$

The variables in the sum are independent, and each has the Poisson distribution with parameter $$t / n$$. Other infinitely divisible distributions that we have studied are the normal distribution and the Cauchy distribution.

#### Normal Approximation

Because of the representation as a sum of independent, identically distributed variables, it's not surprising that the Poisson distribution can be approximated by the normal.

Suppose that $$N_t$$ has the Poisson distribution with parameter $$t \gt 0$$. Then the distribution of the variable below converges to the standard normal distribution as $$t \to \infty$$.

$Z_t = \frac{N_t - t}{\sqrt{t}}$
Proof:

As usual, we can assume that $$(N_t: t \ge 0)$$ is the Poisson counting process with rate 1. Note that $$Z_t$$ is simply the standard score associated with $$N_t$$. For $$n \in \N_+$$, $$N_n$$ is the sum of $$n$$ independent variables, each with the Poisson distribution with parameter 1. Thus, from the central limit theorem, the distribution of $$Z_n$$ converges to the standard normal distribution as $$n \to \infty$$. For general $$t \in [0, \infty)$$, it's possible to write $$Z_t = Z_n + W_t$$ where $$n = \lfloor t \rfloor$$ and $$W_t \to 0$$ as $$t \to \infty$$ (in probability and hence in distribution).

Thus, if $$N$$ has the Poisson distribution with parameter $$c$$, and $$c$$ is large, then the distribution of $$N$$ is approximately normal with mean $$c$$ and standard deviation $$\sqrt{c}$$. When using the normal approximation, we should remember to use the continuity correction, since the Poisson is a discrete distribution.
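Here is a sketch of the approximation with the continuity correction, using the standard normal CDF computed from `math.erf` (the parameter value $$c = 100$$ and the cutoff 110 are just illustrations):

```python
import math

def normal_cdf(z):
    # standard normal CDF via the error function
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def poisson_cdf(n, c):
    return sum(math.exp(-c) * c**k / math.factorial(k) for k in range(n + 1))

c = 100
# P(N <= 110): exact versus normal with continuity correction
exact = poisson_cdf(110, c)
approx = normal_cdf((110 + 0.5 - c) / math.sqrt(c))
print(round(exact, 4), round(approx, 4))
```

For a parameter this large, the two values agree to about three decimal places.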

In the Poisson experiment, set $$r = t = 1$$. Increase $$t$$ and note how the graph of the probability density function becomes more bell-shaped.

In the Poisson experiment, set $$r = 5$$ and $$t = 4$$. Run the experiment 1000 times and compute the following:

1. $$\P(15 \le N_4 \le 22)$$
2. The relative frequency of the event $$\{15 \le N_4 \le 22\}$$.
3. The normal approximation to $$\P(15 \le N_4 \le 22)$$.
1. 0.6157
3. 0.6025

Suppose that requests to a web server follow the Poisson model with rate $$r = 5$$ per minute. Compute the normal approximation to the probability that there will be at least 280 requests in a 1 hour period.

0.8818
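A Python sketch of this computation: in one hour the expected count is $$c = r\,t = 300$$, and the normal approximation with the continuity correction reproduces the answer above to within rounding.

```python
import math

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

c = 5 * 60  # r t = 300 expected requests in one hour
# P(N >= 280) ~ P(Z >= (279.5 - c) / sqrt(c)), with the continuity correction
p = 1 - normal_cdf((280 - 0.5 - c) / math.sqrt(c))
print(round(p, 4))
```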

#### Conditional Distributions

Consider again the basic Poisson model with rate $$r \gt 0$$. As usual, $$\bs{T} = (T_0, T_1, \ldots)$$ denotes the arrival time sequence and $$\bs{N} = (N_t: t \ge 0)$$ the counting process.

For $$t \gt 0$$, the conditional distribution of $$T_1$$ given $$N_t = 1$$ is uniform on the interval $$(0, t]$$.

Proof:

Given $$N_t = 1$$ (one arrival in $$(0, t]$$), the arrival time $$T_1$$ takes values in $$(0, t]$$. From the stationary and independent increments properties,

$\P(T_1 \le s \mid N_t = 1) = \P(N_s = 1, N_t - N_s = 0 \mid N_t = 1) = \frac{\P(N_s = 1, N_t - N_s = 0)}{\P(N_t = 1)} = \frac{\P(N_s = 1) \P(N_t - N_s = 0)}{\P(N_t = 1)}$

Hence using the Poisson distribution,

$\P(T_1 \le s \mid N_t = 1) = \frac{e^{-r\,s} (r\,s) \, e^{-r(t - s)}}{e^{-r\,t} (r\,t)} = \frac{s}{t}, \quad 0 \lt s \le t$

More generally, for $$t \gt 0$$ and $$n \in \N_+$$, the conditional distribution of $$(T_1, T_2, \ldots, T_n)$$ given $$N_t = n$$ is the same as the distribution of the order statistics of a random sample of size $$n$$ from the uniform distribution on the interval $$(0, t]$$.

Note that the conditional distribution in the last exercise is independent of the rate $$r$$. This result means that, in a sense, the Poisson model gives the most random distribution of points in time.
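The uniform result can also be seen by simulation. The sketch below (Python; the parameter values are arbitrary) generates the process from exponential interarrival times, conditions on $$N_t = 1$$, and checks that $$T_1$$ behaves like a uniform variable on $$(0, t]$$: mean near $$t/2$$ and about half the mass below $$t/2$$.

```python
import random

random.seed(3)
r, t = 2.0, 1.0
first_arrivals = []
for _ in range(100_000):
    # arrivals in (0, t] from exponential interarrival times with rate r
    s, times = random.expovariate(r), []
    while s <= t:
        times.append(s)
        s += random.expovariate(r)
    if len(times) == 1:                 # condition on exactly one arrival
        first_arrivals.append(times[0])

# Given N_t = 1, T_1 should be uniform on (0, t]: mean t/2, P(T_1 <= t/2) = 1/2
mean = sum(first_arrivals) / len(first_arrivals)
frac = sum(1 for x in first_arrivals if x <= t / 2) / len(first_arrivals)
print(round(mean, 3), round(frac, 3))
```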

Suppose that requests to a web server follow the Poisson model, and that 1 request comes in a five minute period. Find the probability that the request came during the first 3 minutes of the period.

0.6

Suppose that $$0 \lt s \lt t$$ and $$n \in \N_+$$. Then the conditional distribution of $$N_s$$ given $$N_t = n$$ is binomial with trial parameter $$n$$ and success parameter $$p = s / t$$.

Proof:

Note that given $$N_t = n$$, the number of arrivals $$N_s$$ in $$(0, s]$$ takes values in $$\{0, 1, \ldots, n\}$$. Again, the stationary and independent increments properties are critical for the proof.

$\P(N_s = k \mid N_t = n) = \frac{\P(N_s = k, N_t = n)}{\P(N_t = n)} = \frac{\P(N_s = k, N_t - N_s = n - k)}{\P(N_t = n)} = \frac{\P(N_s = k) \P(N_t - N_s = n - k)}{\P(N_t = n)}$

Substituting the Poisson PDFs gives

$\P(N_s = k \mid N_t = n) = \frac{\left(e^{-r\,s}(r\,s)^k / k!\right)\left(e^{-r(t-s)}[r(t - s)]^{n-k}/(n - k)!\right)}{e^{-r\,t}(r\,t)^n / n!} = \frac{n!}{k! (n - k)!} \left(\frac{s}{t}\right)^k \left(1 - \frac{s}{t}\right)^{n-k}$

Note again that the conditional distribution in the last theorem does not depend on the rate $$r$$.

Suppose that requests to a web server follow the Poisson model, and that 10 requests come during a 5 minute period. Find the probability that at least 4 requests came during the first 3 minutes of the period.

0.9452
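This answer can be reproduced directly from the binomial conditional distribution (a Python sketch):

```python
import math

# Given N_5 = 10, the number of requests in the first 3 minutes is
# binomial with n = 10 and p = 3/5; we want P(at least 4 successes).
n, p = 10, 3 / 5
answer = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(4, n + 1))
print(round(answer, 4))  # 0.9452
```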

#### Estimating the Rate

In many practical situations, the rate $$r$$ of the process is unknown and must be estimated from observed data. For fixed $$t \gt 0$$, a natural estimator of the rate $$r$$ is $$N_t / t$$.

The mean and variance of $$N_t / t$$ are

1. $$\E(N_t / t) = r$$
2. $$\var(N_t / t) = r / t$$
Proof:

These results follow easily from $$\E(N_t) = \var(N_t) = r\,t$$ and basic properties of expected value and variance.

Part (a) means that the estimator is unbiased. Since this is the case, the variance in part (b) gives the mean square error. Since $$\var(N_t / t)$$ decreases to 0 as $$t \to \infty$$, the estimator is consistent.
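A simulation along the lines of the experiment below can be sketched in Python (the parameter values $$r = 3$$, $$t = 5$$ match the experiment; the trial count is ours). The sample mean of the estimates should be near $$r$$ and the empirical mean square error near $$r / t$$.

```python
import random

random.seed(7)
r, t = 3.0, 5.0
# simulate the process many times and estimate r by N_t / t each time
estimates = []
for _ in range(10_000):
    n, s = 0, random.expovariate(r)
    while s <= t:
        n += 1
        s += random.expovariate(r)
    estimates.append(n / t)

mean_est = sum(estimates) / len(estimates)                  # should be near r = 3
mse = sum((e - r)**2 for e in estimates) / len(estimates)   # near r / t = 0.6
print(round(mean_est, 3), round(mse, 3))
```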

In the Poisson experiment, set $$r = 3$$ and $$t = 5$$. Run the experiment 100 times.

1. For each run, compute the estimate of $$r$$ based on $$N_t$$.
2. Over the 100 runs, compute the average of the squares of the errors.
3. Compare the result in (b) with the variance in the previous exercise.

Suppose that requests to a web server follow the Poisson model with unknown rate $$r$$ per minute. In a one hour period, the server receives 342 requests. Estimate $$r$$.

$$r = 5.7$$ per minute