\( \renewcommand{\P}{\mathbb{P}} \) \( \newcommand{\R}{\mathbb{R}} \) \( \newcommand{\N}{\mathbb{N}} \) \( \newcommand{\bs}{\boldsymbol} \)
Virtual Laboratories > Distributions

2. Continuous Distributions

Basic Theory

As usual, suppose that we have a random experiment with probability measure \(\P\) on an underlying sample space \(\Omega\). A random variable \(X\) taking values in \(S\) is said to have a continuous distribution if

\[\P(X = x) = 0 \text{ for all } x \in S\]

The fact that \(X\) takes any particular value with probability 0 might seem paradoxical at first, but conceptually it is the same as the fact that an interval of \(\R\) can have positive length even though it is composed of points each of which has 0 length. Similarly, a region of \(\R^2\) can have positive area even though it is composed of points (or curves) each of which has area 0.

If \(X\) has a continuous distribution, then \(\P(X \in C) = 0\) for any countable \(C \subseteq S\).

Proof:

Since \(C\) is countable, it follows from the countable additivity axiom of probability that

\[ \P(X \in C) = \sum_{x \in C} \P(X = x) = 0 \]

Thus, continuous distributions are in complete contrast with discrete distributions, for which all of the probability mass is concentrated on a discrete set. For a continuous distribution, the probability mass is continuously spread over \(S\). Note also that \(S\) itself cannot be countable. In the picture below, the light blue shading is intended to suggest a continuous distribution of probability.

A continuous distribution

Probability Density Functions

Suppose again that \(X\) has a continuous distribution on \(S \subseteq \R^n\). A real-valued function \(f\) defined on \(S\) is said to be a probability density function for \(X\) if \(f\) satisfies the following properties:

  1. \(f(x) \ge 0\) for all \(x \in S\)
  2. \(\int_S f(x) dx = 1\)
  3. \(\P(X \in A) = \int_A f(x) dx\) for \(A \subseteq S\)
A continuous distribution

Property (c) in the definition is particularly important since it implies that the probability distribution of \(X\) is completely determined by the probability density function. Conversely, any function that satisfies properties (a) and (b) is a probability density function, and then property (c) can be used to define a continuous probability distribution on \(S\).

If \(n \gt 1\), the integrals in properties (b) and (c) are multiple integrals over subsets of \(\R^n\) with \(\bs{x} = (x_1, x_2, \ldots, x_n)\) and \(d \bs{x} = dx_1 dx_2 \cdots dx_n\). In fact, technically, \(f\) is a probability density function relative to the standard \(n\)-dimensional measure, which we recall is given by

\[\lambda_n(A) = \int_A \bs{1} d \bs{x}, \quad A \subseteq \R^n\]

Note that \(\lambda_n(S)\) must be positive (perhaps infinite). In particular,

  1. If \(n = 1\), \(S\) must be a subset of \(\R\) with positive length (\(S\) is usually an interval).
  2. If \(n = 2\), \(S\) must be a subset of \(\R^2\) with positive area.
  3. If \(n = 3\), \(S\) must be a subset of \(\R^3\) with positive volume.

However, we recall that except for exposition, the low dimensional cases (\(n \in \{1, 2, 3\}\)) play no special role in probability. Interesting random experiments often involve several random variables (that is, a random vector). Finally, note that we can always extend \(f\) to a probability density function on all of \(\R^n\) by defining \(f(x) = 0\) for \(x \notin S\). This extension sometimes simplifies notation.

Just as in the discrete case, an element \(x \in S\) that maximizes the probability density function \(f\) is called a mode of the distribution. If there is only one mode, it is sometimes used as a measure of the center of the distribution.

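Probability density functions of continuous distributions differ from their discrete counterparts in several important ways:

  1. \(f(x)\) is not a probability; it is the probability density at \(x\). In the one-dimensional case, if \(f\) is continuous at \(x\) and \(dx\) is small, then \(\P(x \lt X \lt x + dx) \approx f(x) \, dx\).
  2. It is possible to have \(f(x) \gt 1\) for some (or even all) \(x \in S\); indeed, \(f\) can be unbounded on \(S\).
  3. A probability density function is not unique. Changing the values of \(f\) on a countable set of points to other nonnegative values does not change the integrals in properties (b) and (c); only integrals of \(f\) matter.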

Constructing Probability Density Functions

Suppose that \(g\) is a nonnegative function on \(S \subseteq \R^n\). Let

\[c = \int_S g(x) dx\]

If \(0 \lt c \lt \infty\) then \(f(x) = \frac{1}{c} g(x)\) for \(x \in S\) defines a probability density function on \(S\).

Note that \(f\) is just a scaled version of \(g\). Thus, the result in the last exercise can be used to construct probability density functions with desired properties (domain, shape, symmetry, and so on). The constant \(c\) is sometimes called the normalizing constant.
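
The normalizing recipe is easy to carry out numerically on a bounded interval of \(\R\). The sketch below is a minimal illustration in Python, assuming composite Simpson's rule as the quadrature (our choice, not something the text prescribes); the helper names `simpson` and `normalize` are hypothetical. For \(g(x) = x (1 - x)\) on \([0, 1]\) it recovers \(c = \frac{1}{6}\), the constant that reappears in the beta exercises below.

```python
import math

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

def normalize(g, a, b, n=1000):
    """Return (c, f): c approximates the integral of g on [a, b], and f = g / c."""
    c = simpson(g, a, b, n)
    if not (0 < c < math.inf):
        raise ValueError("g cannot be normalized on this interval")
    return c, (lambda x: g(x) / c)

c, f = normalize(lambda x: x * (1 - x), 0.0, 1.0)
print(c)       # approximately 1/6
print(f(0.5))  # approximately 1.5, the value of 6 x (1 - x) at x = 1/2
```

Simpson's rule is exact for polynomials of degree at most 3, so for this particular \(g\) the only error is floating-point rounding.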

Conditional Densities

Suppose that \(X\) is a random variable taking values in \(S \subseteq \R^n\) with a continuous distribution that has probability density function \(f\). The probability density function of \(X\), of course, is based on the underlying probability measure \(\P\) on the sample space \(\Omega\). This measure could be a conditional probability measure, conditioned on a given event \(E \subseteq \Omega\) (with \(\P(E) \gt 0\) of course). The usual notation is

\[f(x \mid E), \quad x \in S\]

Note, however, that except for notation, no new concepts are involved. The function above is a probability density function for a continuous distribution. That is, it satisfies properties (a) and (b) while property (c) becomes

\[\int_A f(x \mid E) dx = \P(X \in A \mid E)\]

All results that hold for probability density functions in general have analogues for conditional probability density functions.

Suppose that \(B \subseteq S\) with \(\P(X \in B) \gt 0\). The conditional probability density function of \(X\) given \(X \in B\) is

\[f(x \mid X \in B) = \frac{f(x)}{\P(X \in B)}, \quad x \in B \]

Proof:

For \(A \subseteq B\),

\[ \int_A \frac{f(x)}{\P(X \in B)} \, dx = \frac{1}{\P(X \in B)} \int_A f(x) \, dx = \frac{\P(X \in A)}{\P(X \in B)} = \P(X \in A \mid X \in B) \]

Examples and Applications

The Exponential Distribution

Let \(f(t) = r e^{-r t}\) for \(t \ge 0\), where \(r \gt 0\) is a parameter. Then

  1. \(f\) is a probability density function.
  2. The graph of \( f \) is decreasing and concave upward on \( [0, \infty) \)
  3. The mode occurs at \( t = 0 \).

The distribution defined by the probability density function in the previous exercise is called the exponential distribution with rate parameter \(r\). This distribution is frequently used to model random times, under certain assumptions. Specifically, in the Poisson model of random points in time, the times between successive arrivals have independent exponential distributions, and the parameter \(r\) is the average rate of arrivals. The exponential distribution is studied in detail in the chapter on Poisson Processes.

The lifetime \(T\) of a certain device (in 1000 hour units) has the exponential distribution with parameter \(r = \frac{1}{2}\). Find

  1. \(\P(T \gt 2)\)
  2. \(\P(T \gt 3 \mid T \gt 1)\)
Answer:
  1. \(e^{-1} \approx 0.3679\)
  2. \(e^{-1} \approx 0.3679\)
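
A short sketch of both computations, assuming Python's standard `math` and `random` modules. Integrating \(f\) over \((t, \infty)\) gives \(\P(T \gt t) = e^{-r t}\), and the conditional probability is a ratio of two such tail probabilities, \(\P(T \gt 3) / \P(T \gt 1)\). The seeded Monte Carlo part is only a rough check; the seed and sample size are arbitrary choices.

```python
import math
import random

def exp_survival(t, r):
    """P(T > t) for T exponential with rate r: the integral of r e^{-r s} over (t, inf)."""
    return math.exp(-r * t)

r = 0.5
p_a = exp_survival(2, r)                        # P(T > 2)
p_b = exp_survival(3, r) / exp_survival(1, r)   # P(T > 3 | T > 1) = P(T > 3) / P(T > 1)
print(round(p_a, 4), round(p_b, 4))             # 0.3679 0.3679

# Rough Monte Carlo check; expovariate takes the rate parameter r.
rng = random.Random(2024)
samples = [rng.expovariate(r) for _ in range(100_000)]
est_a = sum(t > 2 for t in samples) / len(samples)
survivors = [t for t in samples if t > 1]
est_b = sum(t > 3 for t in survivors) / len(survivors)
print(est_a, est_b)   # both should be near e^{-1}
```

The equality of the two answers is no accident: conditioning on survival past time 1 just shifts the exponential tail.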

In the gamma experiment, set \( n =1 \) to get the exponential distribution. Vary the rate parameter \( r \) and note the shape of the probability density function. For various values of \(r\), run the simulation 1000 times and note the apparent convergence of the empirical density function to the true probability density function.

A Random Angle

In Bertrand's problem, a certain random angle \(\Theta\) has probability density function \(f(\theta) = \sin(\theta)\) for \(0 \le \theta \le \frac{\pi}{2}\).

  1. Show that \(f\) is a probability density function.
  2. Graph \(f\) and identify the mode.
  3. Find \(\P(\Theta \lt \frac{\pi}{4})\).
Answer:
  2. mode \(\theta = \frac{\pi}{2}\)
  3. \(1 - \frac{1}{\sqrt{2}} \approx 0.2929\)

Bertrand's problem is named for Joseph Louis Bertrand and is studied in more detail in the chapter on Geometric Models.

In Bertrand's experiment, select the model with uniform distance. Run the simulation 1000 times and compute the empirical probability of the event \(\{\Theta \lt \frac{\pi}{4}\}\). Compare with the true probability in the previous exercise.

Gamma Distributions

Let \(g_n(t) = e^{-t} \frac{t^n}{n!}\) for \(t \ge 0\) where \(n \in \N\) is a parameter. Then

  1. \(g_n\) is a probability density function for each \(n\).
  2. \(g_n(t)\) is strictly increasing for \(t \lt n\) and strictly decreasing for \(t \gt n\).
  3. The distribution is unimodal with mode at \(t = n\).
Proof:

For (a), use integration by parts and induction on \(n\).

Remarkably, we showed in the last section on discrete distributions that \(f_t(n) = g_n(t)\) is a probability density function on \(\N\) for each \(t \ge 0\) (it's the Poisson distribution with parameter \(t\)). The distribution defined by the probability density function \(g_n\) is an example of a gamma distribution; \(n + 1\) is called the shape parameter. The family of gamma distributions is studied in more generality in the chapter on Poisson Processes. Note that the special case \(n = 0\) gives the exponential distribution with rate parameter 1.

Suppose that the lifetime of a device \(T\) (in 1000 hour units) has the gamma distribution in Exercise 9 with \(n = 2\). Find \(\P(T \gt 3)\).

Answer:

\(\frac{17}{2} e^{-3} \approx 0.4232\)
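
The answer can be cross-checked through the Poisson connection noted above: repeated integration by parts gives \(\P(T \gt t) = \sum_{k=0}^n e^{-t} \frac{t^k}{k!}\), the probability that a Poisson variable with parameter \(t\) is at most \(n\). The Python sketch below (helper names hypothetical) compares this closed form with a composite Simpson's rule integral over \([3, 60]\), assuming the tail beyond 60 is negligible.

```python
import math

def gamma_survival(t, n):
    """P(T > t) for the density g_n(s) = e^{-s} s^n / n! on [0, inf).

    Repeated integration by parts gives sum_{k=0}^{n} e^{-t} t^k / k!,
    which is P(N <= n) for N Poisson with parameter t."""
    return sum(math.exp(-t) * t**k / math.factorial(k) for k in range(n + 1))

def g(s, n=2):
    """The density g_n(s), here with n = 2."""
    return math.exp(-s) * s**n / math.factorial(n)

# Independent check: composite Simpson's rule on [3, 60].
m = 2000
h = (60 - 3) / m
approx = (g(3) + g(60) + sum((4 if i % 2 else 2) * g(3 + i * h) for i in range(1, m))) * h / 3

exact = gamma_survival(3, 2)
print(round(exact, 4))        # 0.4232
print(abs(exact - approx))    # a very small number
```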

In the gamma experiment, set the rate parameter \(r = 1\) and the shape parameter \(k = 3\) to get the distribution in the previous exercise. Run the experiment 1000 times and compute the empirical probability of the event \(\{T \gt 3\}\). Compare with the theoretical probability in the previous exercise.

Beta Distributions

Let \(g(x) = x (1 - x)\) for \(0 \le x \le 1\).

  1. Sketch the graph of \(g\).
  2. Find the probability density function \(f\) proportional to \(g\).
  3. Find the mode.
  4. Find \(\P\left( \frac{1}{2} \lt X \lt \frac{3}{4} \right)\) where \(X\) is a random variable with the probability density function in (b).
Answer:
  2. \(f(x) = 6 \, x (1 - x)\) for \(0 \le x \le 1\)
  3. mode \(x = \frac{1}{2}\)
  4. \(\frac{11}{32}\)

Let \(g(x) = x^2 (1 - x)\) for \(0 \le x \le 1\).

  1. Sketch the graph of \(g\).
  2. Find the probability density function \(f\) proportional to \(g\).
  3. Find the mode.
  4. Find \(\P\left( \frac{1}{2} \lt X \lt 1 \right)\) where \(X\) is a random variable with the probability density function in (b).
Answer:
  2. \(f(x) = 12 \, x^2 (1 - x)\) for \(0 \le x \le 1\)
  3. mode \(x = \frac{2}{3}\)
  4. \(\frac{11}{16}\)
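
Both of these answers can be checked exactly, since \(g\) is a polynomial in each case. This is a sketch assuming Python's `fractions` module; `integrate_poly` is a hypothetical helper that integrates a polynomial given by its list of coefficients.

```python
from fractions import Fraction as F

def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of sum_k coeffs[k] * x^k."""
    return sum(c * (b**(k + 1) - a**(k + 1)) / (k + 1) for k, c in enumerate(coeffs))

g1 = [F(0), F(1), F(-1)]          # x (1 - x) = x - x^2
g2 = [F(0), F(0), F(1), F(-1)]    # x^2 (1 - x) = x^2 - x^3

c1 = integrate_poly(g1, F(0), F(1))   # normalizing constant 1/6, so f = 6 x (1 - x)
c2 = integrate_poly(g2, F(0), F(1))   # normalizing constant 1/12, so f = 12 x^2 (1 - x)

p1 = integrate_poly(g1, F(1, 2), F(3, 4)) / c1   # P(1/2 < X < 3/4)
p2 = integrate_poly(g2, F(1, 2), F(1)) / c2      # P(1/2 < X < 1)
print(c1, c2, p1, p2)   # 1/6 1/12 11/32 11/16
```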

Let \(g(x) = \frac{1}{\sqrt{x (1 - x)}}\) for \(0 \lt x \lt 1\).

  1. Sketch the graph of \(g\).
  2. Find the probability density function \(f\) proportional to \(g\).
  3. Find \(\P\left( 0 \lt X \lt \frac{1}{4} \right)\) where \(X\) is a random variable with the probability density function in (b).
Answer:
  2. In the integral, first use the simple substitution \(u = \sqrt{x}\) and then recognize the new integral as an arcsine integral. The result is \(f(x) = \frac{1}{\pi \sqrt{x (1 - x)}}\) for \(0 \lt x \lt 1\)
  3. \(\frac{1}{3}\)
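
Following the hint, the same substitution gives the distribution function \(F(x) = \P(X \le x) = \frac{2}{\pi} \arcsin\left(\sqrt{x}\right)\) for \(0 \lt x \lt 1\), and the requested probability is \(F\left(\frac{1}{4}\right) = \frac{2}{\pi} \cdot \frac{\pi}{6}\). A minimal Python check (the function name is hypothetical):

```python
import math

def arcsine_cdf(x):
    """P(X <= x) for the density 1 / (pi sqrt(x (1 - x))) on (0, 1).

    With u = sqrt(x), the integral becomes (2/pi) arcsin(sqrt(x))."""
    return 2 / math.pi * math.asin(math.sqrt(x))

print(arcsine_cdf(0.25))  # approximately 1/3
print(arcsine_cdf(1.0))   # approximately 1, so the density integrates to 1
```

Note that this density is unbounded near 0 and 1, yet its integral is finite.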

The distributions defined in the last three exercises are examples of beta distributions. The particular beta distribution in Exercise 14 is also known as the arcsine distribution. Beta distributions are studied in detail in the chapter on Special Distributions.

In the special distribution simulator, select the beta distribution. For the following parameter values, note the shape and location of the probability density function. Run the simulation 1000 times and note the apparent convergence of the empirical density function to the true probability density function.

  1. \(a = 2\), \(b = 2\). This gives the beta distribution in Exercise 12.
  2. \(a = 3\), \(b = 2\). This gives the beta distribution in Exercise 13.
  3. \(a = \frac{1}{2}\), \(b = \frac{1}{2}\). This gives the arcsine distribution in Exercise 14.

The Pareto Distribution

Let \(g(x) = \frac{1}{x^b}\) for \(1 \le x \lt \infty\), where \(b \gt 0\) is a parameter.

  1. Sketch the graph of \(g\).
  2. Show that for \(0 \lt b \le 1\), there is no probability density function proportional to \(g\).
  3. Show that for \(b \gt 1\), the normalizing constant is \(\frac{1}{b - 1}\).

The distribution defined in the last exercise is known as the Pareto distribution, named for Vilfredo Pareto. The parameter \(a = b - 1\), so that \(a \gt 0\), is known as the shape parameter. Thus, the Pareto distribution with shape parameter \(a\) has probability density function \(f(x) = \frac{a}{x^{a+1}}\) for \(1 \le x \lt \infty\). The Pareto distribution is studied in detail in the chapter on Special Distributions.

Suppose that the income \(X\) (in appropriate units) of a person randomly selected from a population has the Pareto distribution with shape parameter \(a = 3\). Find \(\P(X \gt 2)\).

Answer:

\(\frac{1}{8}\)
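
The Pareto tail has the closed form \(\P(X \gt x) = \int_x^\infty \frac{a}{t^{a+1}} \, dt = x^{-a}\) for \(x \ge 1\), so \(\P(X \gt 2) = 2^{-3}\). The sketch below also runs a seeded simulation using Python's `random.paretovariate(alpha)`, which samples from the density \(\alpha / x^{\alpha + 1}\) on \([1, \infty)\), matching the density above with \(\alpha = a\); the seed and sample size are arbitrary.

```python
import random

def pareto_survival(x, a):
    """P(X > x) = x^{-a} for the Pareto density a / x^{a+1} on [1, inf), x >= 1."""
    return x ** (-a)

a = 3
exact = pareto_survival(2, a)
print(exact)   # 0.125

rng = random.Random(7)
n = 100_000
est = sum(rng.paretovariate(a) > 2 for _ in range(n)) / n
print(est)     # should be close to 0.125
```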

In the special distribution simulator, select the Pareto distribution with \(a = 3\). Run the simulation 1000 times and compute the empirical probability of the event \(\{X \gt 2\}\). Compare with the theoretical probability in the last exercise.

The Cauchy Distribution

Let \(g(x) = \frac{1}{x^2 + 1}\) for \(x \in \R\).

  1. Sketch the graph of \(g\).
  2. Show that the normalizing constant is \(\pi\).
  3. Find \(\P(-1 \lt X \lt 1)\) where \(X\) has the probability density function proportional to \(g\).
Answer:
  2. In the integral, note that the antiderivative is the arctangent function.
  3. \(\frac{1}{2}\)

The distribution constructed in the previous exercise is known as the Cauchy distribution, named after Augustin Cauchy (it might also be called the arctangent distribution). The Cauchy distribution is studied in detail in the chapter on Special Distributions. The graph of \(g\) is known as the witch of Agnesi, in honor of Maria Agnesi.

In the special distribution simulator, select the Student \(t\) distribution. Set \(n = 1\) to get the Cauchy distribution. Run the simulation 1000 times and note how well the empirical density function fits the true probability density function.

The Standard Normal Distribution

Let \(g(z) = e^{-z^2/2}\) for \(z \in \R\).

  1. Sketch the graph of \(g\).
  2. Show that the normalizing constant is \(\sqrt{2 \pi}\).
Proof:

In (b), let \(c = \int_{-\infty}^\infty g(x) \, dx\) denote the normalizing constant. Then

\[ c^2 = \int_{-\infty}^\infty e^{-x^2/2} \, dx \int_{-\infty}^\infty e^{-y^2/2} \, dy = \int_{-\infty}^\infty \int_{-\infty}^\infty e^{-(x^2 + y^2) / 2} \, dx \, dy \]

We now change to polar coordinates: \(x = r \cos(\theta)\), \(y = r \sin(\theta)\) where \(r \in [0, \infty)\) and \(\theta \in [0, 2 \pi)\). Then \(x^2 + y^2 = r^2\) and \(dx \, dy = r \, dr \, d\theta\). Hence

\[ c^2 = \int_0^{2 \pi} \int_0^\infty r e^{-r^2 / 2} \, dr \, d\theta \]

If we use the simple substitution \(u = r^2 / 2\), so that \(du = r \, dr\), the inner integral is \(\int_0^\infty e^{-u} du = 1\). Then the outer integral is \(\int_0^{2\pi} 1 \, d\theta = 2 \pi\). Hence \(c^2 = 2 \pi\), so \(c = \sqrt{2 \pi}\).
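
The value \(\sqrt{2 \pi}\) can also be confirmed numerically. This Python sketch uses composite Simpson's rule (a hypothetical helper, our choice of method) and truncates the real line to \([-12, 12]\), an assumption justified by \(e^{-z^2/2} \lt 10^{-31}\) outside that interval.

```python
import math

def simpson(g, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return s * h / 3

c = simpson(lambda z: math.exp(-z * z / 2), -12.0, 12.0)
print(c, math.sqrt(2 * math.pi))   # both approximately 2.5066282746
```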

The distribution defined in the last exercise is the standard normal distribution, perhaps the most important distribution in probability. Normal distributions are widely used to model physical measurements that are subject to small, random errors. The family of normal distributions is studied in detail in the chapter on Special Distributions.

In the special distribution simulator, select the normal distribution (the default parameters give the standard normal distribution). Run the simulation 1000 times and note how well the empirical density function fits the true probability density function.

The Extreme Value Distribution

Let \(f(x) = e^{-x} e^{-e^{-x}}\) for \(x \in \R\).

  1. Show that \(f\) is a probability density function.
  2. Sketch the graph of \(f\) and identify the mode. Note in particular the asymmetry of the graph.
  3. Find \(\P(X \gt 0)\), where \(X\) has probability density function \(f\).
Answer:
  2. Mode \(x = 0\)
  3. \(1 - e^{-1} \approx 0.6321\)

The distribution in the last exercise is the type 1 extreme value distribution, also known as the Gumbel distribution in honor of Emil Gumbel. Extreme value distributions are studied in detail in the chapter on Special Distributions.

In the special distribution simulator, select the extreme value distribution. Note the shape and location of the probability density function. Run the simulation 1000 times and note how well the empirical density function fits the true probability density function.

The Logistic Distribution

Let \(f(x) = \frac{e^x}{(1 + e^x)^2}\) for \(x \in \R\).

  1. Show that \(f\) is a probability density function.
  2. Sketch the graph of \(f\) and identify the mode. Note in particular the symmetry of the graph.
  3. Find \(\P(X \gt 1)\), where \(X\) has probability density function \(f\).
Answer:
  2. Mode \(x = 0\)
  3. \(\frac{1}{1 + e} \approx 0.2689\)

The distribution in the last exercise is the logistic distribution. Logistic distributions are studied in detail in the chapter on Special Distributions.

In the special distribution simulator, select the logistic distribution. Note the shape and location of the probability density function. Run the simulation 1000 times and note how well the empirical density function fits the true probability density function.

Weibull Distributions

Let \(f(t) = 2 t e^{-t^2}\) for \(t \ge 0\).

  1. Show that \(f\) is a probability density function.
  2. Sketch the graph of \(f\) and identify the mode.
  3. Find \(\P(T \gt \frac{1}{2})\), where \(T\) has probability density function \(f\).
Answer:
  2. Mode \(t = \frac{1}{\sqrt{2}}\)
  3. \(e^{-1/4} \approx 0.7788\)

Let \(f(t) = 3 t^2 e^{-t^3}\) for \(t \ge 0\).

  1. Show that \(f\) is a probability density function.
  2. Sketch the graph of \(f\) and identify the mode.
  3. Find \(\P(T \gt \frac{1}{2})\), where \(T\) has probability density function \(f\).
Answer:
  2. Mode \(t = \sqrt[3]{\frac{2}{3}}\)
  3. \(e^{-1/8} \approx 0.8825\)
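
Both exercises are instances of the density \(f(t) = k t^{k-1} e^{-t^k}\) for \(t \ge 0\), with \(k = 2\) and \(k = 3\). Integrating gives \(\P(T \gt t) = e^{-t^k}\), and setting \(f'(t) = 0\) gives \(t^k = \frac{k - 1}{k}\), so the mode is \(\left(\frac{k-1}{k}\right)^{1/k}\) for \(k \gt 1\). A minimal Python sketch of these two formulas (function names hypothetical):

```python
import math

def weibull_survival(t, k):
    """P(T > t) = e^{-t^k} for the density k t^{k-1} e^{-t^k} on [0, inf)."""
    return math.exp(-(t ** k))

def weibull_mode(k):
    """Zero of the derivative of k t^{k-1} e^{-t^k}: t^k = (k - 1) / k, valid for k > 1."""
    return ((k - 1) / k) ** (1 / k)

for k in (2, 3):
    print(k, round(weibull_mode(k), 4), round(weibull_survival(0.5, k), 4))
# 2 0.7071 0.7788
# 3 0.8736 0.8825
```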

The distributions in the last two exercises are examples of Weibull distributions, named for Waloddi Weibull. Weibull distributions are studied in more generality in the chapter on Special Distributions. They are often used to model random failure times of devices (in appropriately scaled units).

In the special distribution simulator, select the Weibull distribution. For each of the following values of the shape parameter \(k\), note the shape and location of the probability density function. Run the simulation 1000 times and note how well the empirical density function fits the true probability density function.

  1. \(k = 2\). This gives the Weibull distribution in Exercise 27.
  2. \(k = 3\). This gives the Weibull distribution in Exercise 28.

Additional Examples

Let \(f(x) = -\ln(x)\) for \(0 \lt x \le 1\).

  1. Sketch the graph of \(f\).
  2. Show that \(f\) is a probability density function.
  3. Find \(\P(\frac{1}{3} \le X \le \frac{1}{2})\) where \(X\) has the probability density function in (b).
Answer:
  3. \(\frac{1}{2} \ln(2) - \frac{1}{3} \ln(3) + \frac{1}{6} \approx 0.147\)

Let \(g(x) = e^{-x} (1 - e^{-x})\) for \(x \ge 0\).

  1. Sketch the graph of \(g\) and find the probability density function \(f\) proportional to \(g\).
  2. Identify the mode.
  3. Find \(\P(X \ge 1)\) where \(X\) has the probability density function in (a).
Answer:
  1. \(f(x) = 2 e^{-x} (1 - e^{-x})\) for \(0 \le x \lt \infty\)
  2. Mode \(x = \ln(2)\)
  3. \(2 e^{-1} - e^{-2} \approx 0.6004 \)

Let \(f(x, y) = x + y\) for \(0 \le x \le 1\), \(0 \le y \le 1\).

  1. Show that \(f\) is a probability density function.
  2. Find \(\P(Y \ge X)\) where \((X, Y)\) has the probability density function in (a).
  3. Find the conditional density of \((X, Y)\) given \(\{X \lt \frac{1}{2}, Y \lt \frac{1}{2}\}\).
Answer:
  2. \(\frac{1}{2}\)
  3. \(f(x, y \mid X \lt \frac{1}{2}, Y \lt \frac{1}{2}) = 8 (x + y)\) for \(0 \lt x \lt \frac{1}{2}\), \(0 \lt y \lt \frac{1}{2}\)

Let \(g(x, y) = x + y\) for \(0 \le x \le y \le 1\).

  1. Find the probability density function \(f\) that is proportional to \(g\).
  2. Find \(\P(Y \ge 2 X)\) where \((X, Y)\) has the probability density function in (a).
Answer:
  1. \(f(x,y) = 2(x + y)\), \(0 \le x \le y \le 1\)
  2. \(\frac{5}{12}\)

Let \(g(x, y) = x^2 y\) for \(0 \le x \le 1\), \(0 \le y \le 1\).

  1. Find the probability density function \(f\) that is proportional to \(g\).
  2. Find \(\P(Y \ge X)\) where \((X, Y)\) has the probability density function in (a).
Answer:
  1. \(f(x,y) = 6 x^2 y\) for \(0 \le x \le 1\), \(0 \le y \le 1\)
  2. \(\frac{2}{5}\)

Let \(g(x, y) = x^2 y\) for \(0 \le x \le y \le 1\).

  1. Find the probability density function \(f\) that is proportional to \(g\).
  2. Find \(\P(Y \ge 2 X)\) where \((X, Y)\) has the probability density function in (a).
Answer:
  1. \(f(x,y) = 15 x^2 y\) for \(0 \le x \le y \le 1\)
  2. \(\frac{1}{8}\)

Let \(g(x, y, z) = x + 2 y + 3 z\) for \(0 \le x \le 1\), \(0 \le y \le 1\), \(0 \le z \le 1\).

  1. Find the probability density function \(f\) that is proportional to \(g\).
  2. Find \(\P(X \le Y \le Z)\) where \((X, Y, Z)\) has the probability density function in (a).
Answer:
  1. \(f(x, y, z) = \frac{1}{3}(x + 2 y + 3 z)\) for \(0 \le x \le 1\), \(0 \le y \le 1\), \(0 \le z \le 1\)
  2. \(\frac{7}{36}\)
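
Answers to the multivariate exercises on the unit square and unit cube can be sanity-checked by Monte Carlo integration. Because \([0, 1]^n\) has volume 1, \(\int_A f(\bs{x}) \, d\bs{x}\) equals the expected value of \(f(\bs{U})\) times the indicator of \(\{\bs{U} \in A\}\), where \(\bs{U}\) is uniform on the cube. The Python sketch below is illustrative only: `mc_probability`, the seed, and the sample size are our choices. It checks \(\frac{7}{36}\), \(\frac{1}{2}\), and \(\frac{2}{5}\) from the exercises above.

```python
import random

def mc_probability(f, event, dim, n=200_000, seed=1):
    """Estimate the integral of f over A = {u : event(u)} inside [0, 1]^dim.

    The cube has volume 1, so the integral equals E[f(U) * 1{U in A}] with U
    uniform on the cube; this averages that quantity over n seeded points."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        u = [rng.random() for _ in range(dim)]
        if event(*u):
            total += f(*u)
    return total / n

est_xyz = mc_probability(lambda x, y, z: (x + 2 * y + 3 * z) / 3,
                         lambda x, y, z: x <= y <= z, dim=3)                     # exact: 7/36
est_sum = mc_probability(lambda x, y: x + y, lambda x, y: y >= x, dim=2)         # exact: 1/2
est_x2y = mc_probability(lambda x, y: 6 * x**2 * y, lambda x, y: y >= x, dim=2)  # exact: 2/5
print(est_xyz, est_sum, est_x2y)
```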

Let \(g(x, y) = e^{-x} e^{-y}\) for \(0 \le x \le y \lt \infty\).

  1. Find the probability density function \(f\) that is proportional to \(g\).
  2. Find \(\P(X + Y \lt 1)\) where \((X, Y)\) has the probability density function in (a).
Answer:
  1. \(f(x,y) = 2 e^{-x} e^{-y}\), \(0 \lt x \lt y \lt \infty\)
  2. \(1 - 2 e^{-1} \approx 0.2642\)

Continuous Uniform Distributions

In this subsection, we will study an important class of continuous distributions. First, recall again that the standard measure of size on \(\R^n\) is

\[\lambda_n(A) = \int_A \bs{1} d \bs{x}, \quad A \subseteq \R^n\]

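In particular, \(\lambda_1(A)\) is the length of \(A \subseteq \R\), \(\lambda_2(A)\) is the area of \(A \subseteq \R^2\), and \(\lambda_3(A)\) is the volume of \(A \subseteq \R^3\).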

Suppose that \(S \subseteq \R^n\) with \(0 \lt \lambda_n(S) \lt \infty\). Then

  1. \(f(x) = 1 / \lambda_n(S)\) for \(x \in S\) defines a probability density function on \(S\).
  2. If \(X\) has the probability density function in (a) then \(\P(X \in A) = \lambda_n(A) / \lambda_n(S) \) for \(A \subseteq S\).

A random variable \(X\) with the probability density function in Exercise 38 is said to have the continuous uniform distribution on \(S\). From part (b), note that the probability assigned to a subset \(A\) of \(S\) is proportional to the standard measure of \(A\). Note also that in both the discrete and continuous cases, a random variable \(X\) is uniformly distributed on a set \(S\) if and only if the probability density function is constant on \(S\). Uniform distributions play a fundamental role in various Geometric Models.

The most important special case is the uniform distribution on an interval \([a, b]\) where \(a, b \in \R\) and \(a \lt b\). In this case, the probability density function is \[f(x) = \frac{1}{b - a}, \quad a \le x \le b\] This distribution models a point chosen at random from the interval. In particular, the uniform distribution on \([0, 1]\) is known as the standard uniform distribution, and is very important because of its simplicity and the fact that it can be transformed into a variety of other probability distributions on \(\R\). Almost all computer languages have procedures for simulating independent, standard uniform variables.
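
The simplest such transformation: if \(U\) has the standard uniform distribution, then \(a + (b - a) U\) is uniformly distributed on \([a, b]\). A sketch assuming Python's `random` module, whose `random()` method returns values in \([0, 1)\); the interval \([2, 5]\), the seed, and the helper name are illustrative.

```python
import random

def uniform_ab(rng, a, b):
    """Transform a standard uniform value U in [0, 1) into a + (b - a) U, a point of [a, b]."""
    return a + (b - a) * rng.random()

a, b = 2.0, 5.0
density = 1 / (b - a)                 # the constant density 1/(b - a) = 1/3
rng = random.Random(3)
xs = [uniform_ab(rng, a, b) for _ in range(100_000)]
frac = sum(3 < x < 4 for x in xs) / len(xs)
print(density, frac)   # frac should be near (4 - 3) * density = 1/3
```

Python's built-in `random.uniform(a, b)` is documented as computing this same expression, `a + (b-a) * random()`.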

Give the probability density function of \((X, Y)\) and find \(\P(X \gt 0, Y \gt 0)\) in each of the following cases:

  1. \((X, Y)\) is uniformly distributed on the square \(S = [-6, 6]^2\).
  2. \((X, Y)\) is uniformly distributed on the triangle \(S = \{(x, y): -6 \le y \le x \le 6\}\).
  3. \((X, Y)\) is uniformly distributed on the circle \(S = \{(x, y): x^2 + y^2 \le 36\}\).
Answer:
  1. \(f(x,y) = \frac{1}{144}\) for \(-6 \le x \le 6\), \(-6 \le y \le 6\). \(\P(X \gt 0, Y \gt 0) = \frac{1}{4}\)
  2. \(f(x,y) = \frac{1}{72}\) for \(-6 \le y \le x \le 6\). \(\P(X \gt 0, Y \gt 0) = \frac{1}{4}\)
  3. \(f(x,y) = \frac{1}{36 \pi}\) for \(x^2 + y^2 \le 36\). \(\P(X \gt 0, Y \gt 0) = \frac{1}{4}\).

In the bivariate uniform experiment, select each of the following domains and then run the simulation 1000 times. Watch the points in the scatter plot. Compute the empirical probability of the event \(\{X \gt 0, Y \gt 0\}\) and compare with the true probability in the previous exercise.

  1. Square
  2. Triangle
  3. Circle

Suppose that \((X, Y, Z)\) is uniformly distributed on the cube \(S = [0, 1]^3\). Find \(\P(X \lt Y \lt Z)\).

  1. Compute the probability using the probability density function.
  2. Compute the probability using a combinatorial argument.
Answer:

\(\P(X \lt Y \lt Z) = \frac{1}{6}\). In (b), argue that each of the 6 orderings of \((X, Y, Z)\) should be equally likely.

The time \(T\) (in minutes) required to perform a certain job is uniformly distributed over the interval \([15, 60]\).

  1. Find the probability that the job requires more than 30 minutes
  2. Given that the job is not finished after 30 minutes, find the probability that the job will require more than 15 additional minutes.
Answer:
  1. \(\frac{2}{3}\)
  2. \(\frac{1}{2}\)
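
Because the density is constant, each probability here is a ratio of lengths, and the conditional probability given \(T \gt 30\) is the ratio \(\P(T \gt 45) / \P(T \gt 30)\). A sketch with exact fractions in Python; `uniform_prob` is a hypothetical helper.

```python
from fractions import Fraction

def uniform_prob(lo, hi, a, b):
    """P(lo < T < hi) for T uniform on [a, b]: overlap length divided by b - a."""
    overlap = max(0, min(hi, b) - max(lo, a))
    return Fraction(overlap, b - a)

a, b = 15, 60
p_more_30 = uniform_prob(30, b, a, b)                        # P(T > 30)
p_more_45_given_30 = uniform_prob(45, b, a, b) / p_more_30   # P(T > 45 | T > 30)
print(p_more_30, p_more_45_given_30)   # 2/3 1/2
```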

Suppose that \(S \subseteq \R^n\) and that \(0 \lt \lambda_n(S) \lt \infty\) and that \(R \subseteq S\) with \(\lambda_n(R) \gt 0\). If \(X\) is uniformly distributed on \(S\), then the conditional distribution of \(X\) given \(X \in R\) is uniformly distributed on \(R\).

Proof:

For \( A \subseteq R \),

\[ \P(X \in A \mid X \in R) = \frac{\P(X \in A, X \in R)}{\P(X \in R)} = \frac{\P(X \in A)}{\P(X \in R)} = \frac{\lambda_n(A) / \lambda_n(S)}{\lambda_n(R) / \lambda_n(S)} = \frac{\lambda_n(A)}{\lambda_n(R)} \]

Simulation

The last exercise has important implications for simulations. Suppose that \(R \subseteq \R^n\) satisfies \(\lambda_n(R) \gt 0\) as before, but suppose also that \(R\) is bounded, so that \(R \subseteq S\) where \(S \subseteq \R^n\) is a Cartesian product of \(n\) bounded intervals. It turns out to be quite easy to simulate a random variable \(X\) that is uniformly distributed on the product set \(S\): simulate each coordinate independently and uniformly on its interval. More generally, it is easy to simulate a sequence of independent random variables \((X_1, X_2, \ldots)\) each of which is uniformly distributed on \(S\). Now let \(N = \min\{k \in \N_+: X_k \in R\}\), the first time that one of the random variables lands in \(R\). Note that \(N\) has the geometric distribution on \(\N_+\) with success parameter \(p = \lambda_n(R) / \lambda_n(S)\). Let \(Y = X_N\) so that \(Y\) is the first term of the sequence that falls in \(R\). We know from our work on independence and conditional probability that the distribution of \(Y\) is the same as the conditional distribution of \(X\) given \(X \in R\), which by the previous exercise is uniformly distributed on \(R\). Thus, we have derived an algorithm for simulating a random variable that is uniformly distributed on an irregularly shaped region \(R\) (assuming that we have an algorithm for recognizing when a point \(x \in \R^n\) falls in \(R\)). This method of simulation is known as the rejection method, and as we will see in subsequent sections, it is more important than might first appear.
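
The rejection method takes only a few lines. The sketch below assumes Python's `random` module; `sample_uniform_region`, the seed, and the choice of the unit disk \(R\) inside the square \(S = [-1, 1]^2\) are illustrative. It also checks two consequences of the argument above: \(N\) is geometric with \(p = \lambda_2(R) / \lambda_2(S) = \pi / 4\), so its mean is about \(4 / \pi\), and the accepted points behave like a uniform sample from the disk, so about \(\frac{1}{4}\) of them fall in the first quadrant.

```python
import math
import random

def sample_uniform_region(in_region, bounds, rng):
    """Rejection method: draw uniform points from the box until one lands in R.

    bounds is a list of (lo, hi) pairs, one per coordinate; in_region tests
    membership in R. Returns the accepted point and the number of draws N."""
    n_draws = 0
    while True:
        n_draws += 1
        x = tuple(rng.uniform(lo, hi) for lo, hi in bounds)
        if in_region(x):
            return x, n_draws

rng = random.Random(42)
disk = lambda p: p[0] ** 2 + p[1] ** 2 <= 1   # R: the unit disk
box = [(-1.0, 1.0), (-1.0, 1.0)]               # S: the square [-1, 1]^2
results = [sample_uniform_region(disk, box, rng) for _ in range(50_000)]

mean_draws = sum(n for _, n in results) / len(results)
quadrant = sum(x > 0 and y > 0 for (x, y), _ in results) / len(results)
print(mean_draws, 4 / math.pi, quadrant)   # mean_draws near 1.273, quadrant near 0.25
```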

Data Analysis Exercises

If \(D\) is a data set from a variable \(X\) with a continuous distribution, then an empirical density function can be computed by partitioning the data range into subsets of small size and then computing, for each subset, the proportion of data points that fall in the subset divided by the size of the subset. Empirical probability density functions are studied in more detail in the chapter on Random Samples.
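
A minimal sketch of this computation in Python, assuming intervals of the form \((a, b]\) as in the cicada answers below. Each bar height is the relative frequency of the interval divided by its length. The helper name and the data values are made up for illustration; they are not the cicada data.

```python
def empirical_density(data, edges):
    """Empirical density on the intervals (edges[i], edges[i+1]].

    Height of each bar = (fraction of data points in the interval) / (interval length),
    so the bar areas sum to the fraction of the data covered by the intervals."""
    n = len(data)
    densities = []
    for lo, hi in zip(edges, edges[1:]):
        count = sum(lo < x <= hi for x in data)
        densities.append(count / n / (hi - lo))
    return densities

# Hypothetical data, not the cicada data set.
data = [0.12, 0.15, 0.18, 0.22, 0.09, 0.14, 0.31, 0.17, 0.25, 0.16]
edges = [0.0, 0.1, 0.2, 0.3, 0.4]
densities = empirical_density(data, edges)
print(densities)   # approximately [1.0, 6.0, 2.0, 1.0]
```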

For the cicada data, \(BW\) denotes body weight (in grams), \(BL\) body length (in millimeters), and \(G\) gender (0 for female and 1 for male). Construct an empirical density function for each of the following and display each as a bar graph:

  1. \(BW\)
  2. \(BL\)
  3. \(BW\) given \(G = 0\)
Answer:
  1. \(BW\) (density in units of 1/gram):
       Interval   \((0, 0.1]\)   \((0.1, 0.2]\)   \((0.2, 0.3]\)   \((0.3, 0.4]\)
       Density    0.8654         5.8654           3.0769           0.1923
  2. \(BL\) (density in units of 1/millimeter):
       Interval   \((15, 20]\)   \((20, 25]\)   \((25, 30]\)   \((30, 35]\)
       Density    0.0058         0.1577         0.0346         0.0019
  3. \(BW\) given \(G = 0\) (density in units of 1/gram):
       Interval   \((0, 0.1]\)   \((0.1, 0.2]\)   \((0.2, 0.3]\)   \((0.3, 0.4]\)
       Density    0.3390         4.4068         5.0847         0.1695
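
Since each bar height is a relative frequency divided by the interval length, the bar areas in each answer should add up to 1 (up to rounding). A quick Python check of the values above, assuming that every interval within an answer has the same width (0.1 gram for \(BW\), 5 millimeters for \(BL\)):

```python
# Bar heights from the three answers; each answer uses intervals of a common width.
tables = {
    "BW": ([0.8654, 5.8654, 3.0769, 0.1923], 0.1),
    "BL": ([0.0058, 0.1577, 0.0346, 0.0019], 5.0),
    "BW given G = 0": ([0.3390, 4.4068, 5.0847, 0.1695], 0.1),
}
totals = {name: sum(h * width for h in heights) for name, (heights, width) in tables.items()}
for name, total in totals.items():
    print(f"{name}: {total:.4f}")   # each total is 1.0000 to four decimals
```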

For the cicada data, \(WL\) denotes wing length and \(WW\) wing width (both in millimeters). Construct an empirical density function for \((WL, WW)\).

Degenerate Continuous Distributions

Unlike the discrete case, the existence of a probability density function for a continuous distribution is an assumption that we are making. A random variable can have a continuous distribution on a subset \(S \subseteq \R^n\) but with no probability density function; the distribution is sometimes said to be degenerate. In this subsection, we explore the common ways in which such distributions can occur.

Reducing the Dimension

First, suppose that \(\bs{X}\) is a random variable taking values in \(S \subseteq \R^n\) with \(\lambda_n(S) = 0\). It is possible for \(\bs{X}\) to have a continuous distribution, but not for \(\bs{X}\) to have a probability density function relative to \(\lambda_n\). In particular, property (c) in the definition could not hold, since \(\int_A f(\bs{x}) d \bs{x} = 0\) for any function \(f\) and any \(A \subseteq S\). However, in many cases, \(\bs{X}\) may be defined in terms of continuous random variables on lower dimensional spaces that do have probability density functions.

For example, suppose that \(\bs{U}\) is a random variable with a continuous distribution on \(T \subseteq \R^k\) where \(k \lt n\), and that \(\bs{X} = h(\bs{U})\) for some continuous function \(h : T \to S\). Any event defined in terms of \(\bs{X}\) can be changed into an event defined in terms of \(\bs{U}\). The following exercise illustrates this situation.

Suppose that \(\Theta\) is uniformly distributed on the interval \([0, 2 \pi)\). Let \(X = \cos(\Theta)\), \(Y = \sin(\Theta)\).

  1. Show that \((X, Y)\) has a continuous distribution on the circle \(C = \{(x, y): x^2 + y^2 = 1\}\).
  2. Show that \((X, Y)\) does not have a probability density function on \(C\) (with respect to the area measure \(\lambda_2\) on \(\R^2\)).
  3. Find \(\P(Y \gt X)\).
Answer:
  3. \(\frac{1}{2}\)
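
As described above, the event for \((X, Y)\) can be rewritten in terms of \(\Theta\): \(\{Y \gt X\} = \{\frac{\pi}{4} \lt \Theta \lt \frac{5 \pi}{4}\}\), an interval of length \(\pi\) out of \(2 \pi\). A seeded simulation sketch in Python (seed and sample size arbitrary):

```python
import math
import random

rng = random.Random(11)
n = 100_000
hits = 0
for _ in range(n):
    theta = rng.uniform(0, 2 * math.pi)       # Theta uniform on [0, 2 pi)
    x, y = math.cos(theta), math.sin(theta)   # (X, Y) lies on the unit circle
    hits += y > x                              # True counts as 1
print(hits / n)   # close to 1/2, since sin > cos exactly when pi/4 < theta < 5 pi/4
```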

Mixed Components

Another situation occurs when a random vector \(\bs{X}\) in \(\R^n\) (with \(n \gt 1\)) has some components with discrete distributions and others with continuous distributions. Such distributions with mixed components are studied in more detail in the section on mixed distributions; however, the following exercise gives an illustration.

Suppose that \(X\) is uniformly distributed on the set \(\{0, 1, 2\}\), \(Y\) is uniformly distributed on the interval \([0, 2]\), and that \(X\) and \(Y\) are independent.

  1. Show that \((X, Y)\) has a continuous distribution on the product set \(S = \{0, 1, 2\} \times [0, 2]\).
  2. Show that \((X, Y)\) does not have a probability density function on \(S\) (with respect to the area measure \(\lambda_2\) on \(\R^2\)).
  3. Find \(\P(Y \gt X)\).
Answer:
  3. \(\frac{1}{2}\)
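
Conditioning on the discrete component and using independence, \(\P(Y \gt X) = \sum_{x=0}^2 \P(X = x) \P(Y \gt x) = \frac{1}{3}\left(1 + \frac{1}{2} + 0\right)\). A sketch of that computation with exact fractions in Python (the helper name is hypothetical):

```python
from fractions import Fraction

def p_y_greater(x, lo=0, hi=2):
    """P(Y > x) for Y uniform on [lo, hi]."""
    return Fraction(max(0, hi - max(x, lo)), hi - lo)

support = [0, 1, 2]   # X uniform on {0, 1, 2}, independent of Y
p = sum(Fraction(1, len(support)) * p_y_greater(x) for x in support)
print(p)   # 1/2
```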

Singular Continuous Distributions

Finally, it is also possible to have a continuous distribution on \(S \subseteq \R^n\) with \(\lambda_n(S) \gt 0\), yet still with no probability density function. Such distributions are said to be singular, and are rare in applied probability. However, it is not difficult to construct such a distribution. Let \((X_1, X_2, \ldots)\) be a sequence of Bernoulli trials with success parameter \(p \in (0, 1)\). We will indicate the dependence of the probability measure \(\P\) on the parameter \(p\) with a subscript. Thus, we have a sequence of independent indicator variables with

\[\P_p(X_i = 1) = p, \quad \P_p(X_i = 0) = 1 - p\]

We interpret \(X_i\) as the \(i\)th binary digit (bit) of a random variable \(X\) taking values in \([0, 1]\). That is,

\[X = \sum_{i=1}^\infty \frac{X_i}{2^i}\]

Conversely, recall that every number \(x \in [0, 1]\) can be written in binary form:

\[x = \sum_{i=1}^\infty \frac{x_i}{2^i} \text{ where } x_i \in \{0, 1\} \text{ for each } i \in \N_+\]

This representation is unique except when \(x\) is a binary rational of the form \(x = \frac{k}{2^n}\) for some \(n \in \N_+\) and \(k \in \{1, 3, \ldots, 2^n - 1\}\). In this case, there are two representations, one in which the bits are eventually 0 and one in which the bits are eventually 1. By convention, we will use the first representation. Note, however, that the set of binary rationals is countable.

\(X\) has a continuous distribution. That is, \(\P_p(X = x) = 0\) for \(x \in [0, 1]\).

Proof:

If \(x \in [0, 1]\) is not a binary rational, then

\[ \P_p(X = x) = \P_p(X_i = x_i \text{ for all } i \in \N_+) = \lim_{n \to \infty} \P_p(X_i = x_i \text{ for } i = 1, 2, \ldots, n) = \lim_{n \to \infty} p^y (1 - p)^{n - y} \text{ where } y = \sum_{i=1}^n x_i \]

Let \(q = \max\{p, 1 - p\}\), and note that \(q \lt 1\) since \(p \in (0, 1)\). Then \(p^y (1 - p)^{n - y} \le q^n \to 0\) as \(n \to \infty\). Hence, \(\P_p(X = x) = 0\). If \(x \in [0, 1]\) is a binary rational, then there are two bit strings that represent \(x\), say \((x_1, x_2, \ldots)\) (with bits eventually 0) and \((y_1, y_2, \ldots)\) (with bits eventually 1). Hence \(\P_p(X = x) = \P_p(X_i = x_i \text{ for all } i \in \N_+) + \P_p(X_i = y_i \text{ for all } i \in \N_+)\). But both of these probabilities are 0 by the same argument as before.

Next, we define the set of numbers for which the limiting relative frequency of 1's is \(p\). Let

\[C_p = \left\{ x \in [0, 1]: \frac{1}{n} \sum_{i = 1}^n x_i \to p \text{ as } n \to \infty \right\} \]

The sets \(\{C_p: p \in [0, 1]\}\) satisfy the following properties:

  1. \(C_p \cap C_q = \emptyset\) for \(p \ne q\).
  2. \(\P_p(X \in C_p) = 1\)
Proof:

Part (a) is trivial, since limits are unique. Part (b) follows from the strong law of large numbers, which we will study later in the chapter on random samples. The basic idea is simple: in a sequence of Bernoulli trials with success probability \( p \), the long-term relative frequency of successes is \( p \).

It follows that the distributions of \(X\), as \(p\) varies from 0 to 1, are mutually singular; that is, as \(p\) varies, \(X\) takes values, with probability 1, in mutually disjoint sets.

If \(p = \frac{1}{2}\) then \(X\) is uniformly distributed on \([0, 1]\).

Proof:

Let \(F(x) = \P_p(X \le x) = \P_p(X \lt x)\) for \(x \in [0, 1]\). The function \(F\) is the distribution function of \(X\). Distribution functions are studied in more detail later in this chapter. If \(x \in [0, 1]\) is not a binary rational, then \(X \lt x\) if and only if there exists \(n \in \N_+\) such that \(X_i = x_i\) for \(i \in \{1, 2, \ldots, n - 1\}\) and \(X_n = 0\) while \(x_n = 1\). Hence

\[ \P_{1/2}(X \lt x) = \sum_{n=1}^\infty \frac{x_n}{2^n} = x \]

It turns out, as we will see, that the distribution function of a continuous distribution is continuous, so \(F(x) = x\) for all \(x \in [0, 1]\). This means that \(X\) has the uniform distribution on \([0, 1]\).
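
The construction of \(X\) from Bernoulli bits is easy to simulate. This Python sketch truncates the sum after 53 bits (an approximation we chose to match double precision; the seed and sample size are also arbitrary) and estimates \(\P(X \lt x)\) at a few points. For \(p = \frac{1}{2}\) the estimates should be close to \(x\). For \(p = 0.3\) they are not; for instance \(\P(X \lt \frac{1}{2}) = \P(X_1 = 0) = 0.7\), \(\P(X \lt \frac{1}{4}) = 0.7^2 = 0.49\), and \(\P(X \lt \frac{3}{4}) = 0.7 + 0.3 \cdot 0.7 = 0.91\).

```python
import random

def random_binary_number(p, rng, bits=53):
    """Approximate X = sum_{i >= 1} X_i / 2^i by truncating after `bits` terms.

    Each bit X_i is 1 with probability p, independently of the others."""
    x = 0.0
    for i in range(1, bits + 1):
        if rng.random() < p:
            x += 2.0 ** -i
    return x

rng = random.Random(5)
n = 20_000
points = (0.25, 0.5, 0.75)
estimates = {}
for p in (0.5, 0.3):
    xs = [random_binary_number(p, rng) for _ in range(n)]
    estimates[p] = [sum(v < x for v in xs) / n for x in points]
    print(p, [round(e, 3) for e in estimates[p]])
```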

Note that the uniform distribution on \([0, 1]\) is the same as the standard measure \(\lambda_1\) on \([0, 1]\).

If \(p \ne \frac{1}{2}\), the distribution of \(X\) does not have a probability density function relative to the standard measure on \([0, 1]\).

Proof:

Suppose that \(X\) does have a probability density function \(f\). Then for \(A \subseteq [0, 1]\), \(\P_p(X \in A) = \int_A f(x) \, dx\), so in particular

\[ 1 = \P_p(X \in C_p) = \int_{C_p} f(x) \, dx \]

But since \(p \ne 1/2\), the sets \(C_p\) and \(C_{1/2}\) are disjoint, so \(\lambda_1(C_p) = \P_{1/2}(X \in C_p) \le 1 - \P_{1/2}(X \in C_{1/2}) = 0\). This implies that \(\int_{C_p} f(x) \, dx = 0\), so we have a contradiction.

For an applied example of the ideas in this subsection, see Bold Play in the game of Red and Black.