
5. Conditional Distributions

Basic Theory

As usual, we start with a random experiment with probability measure $\mathbb{P}$ on an underlying sample space $\Omega$. Suppose that $X$ is a random variable for the experiment, taking values in a set $S$. The purpose of this section is to study the conditional probability measure given $X = x$ for $x \in S$. Thus, if $E \subseteq \Omega$ is an event for the experiment, we would like to define and study

$$\mathbb{P}(E \mid X = x), \quad x \in S$$

We will see that if X has a discrete distribution, no new concepts are involved, and the simple definition of conditional probability suffices. When X has a continuous distribution, however, a fundamentally new approach is needed.

The Discrete Case

Suppose first that $X$ has a discrete distribution with probability density function $g$. Thus, $S$ is countable and we can assume that $g(x) > 0$ for $x \in S$.

Show that if $E$ is an event in the experiment then

$$\mathbb{P}(E \mid X = x) = \frac{\mathbb{P}(X = x, E)}{g(x)}, \quad x \in S$$

Prove that if $E$ is an event in the experiment and $A$ is a subset of $S$ then

$$\mathbb{P}(X \in A, E) = \sum_{x \in A} \mathbb{P}(E \mid X = x) \, g(x)$$

Conversely, this result completely characterizes the conditional distribution given $X = x$.

Suppose that the function $Q(x, E)$, defined for $x \in S$ and for events $E$, satisfies

$$\mathbb{P}(E, X \in A) = \sum_{x \in A} Q(x, E) \, g(x), \quad A \subseteq S$$

Show that $Q(x, E) = \mathbb{P}(E \mid X = x)$ for all $x \in S$ and all events $E$.

The Continuous Case

Suppose now that $X$ has a continuous distribution on $S \subseteq \mathbb{R}^n$, with probability density function $g$. We assume that $g(x) > 0$ for $x \in S$. Unlike the discrete case, we cannot use simple conditional probability to define the conditional probability of an event $E$ given $X = x$, because the conditioning event has probability 0 for every $x$. Nonetheless, the concept should make sense. If we actually run the experiment, $X$ will take on some value $x$ (even though a priori, this event occurs with probability 0), and surely the information that $X = x$ should in general alter the probabilities that we assign to other events. A natural approach is to use the results obtained in the discrete case as definitions in the continuous case. Thus, based on the characterization above, we define the conditional probability

$$\mathbb{P}(E \mid X = x), \quad x \in S$$

by the requirement that for any (measurable) subset $A$ of $S$,

$$\mathbb{P}(E, X \in A) = \int_A \mathbb{P}(E \mid X = x) \, g(x) \, dx$$

For now, we will accept the fact that $\mathbb{P}(E \mid X = x)$ can be defined by this condition. However, we will return to this point in the section on Conditional Expectation in the chapter on Expected Value.

Conditioning and Bayes' Theorem

Again, suppose that $X$ is a random variable and that $E$ is an event. From our discussion in the last two sections, we have the basic formulas for computing the probability of $E$ by conditioning on $X$, in the discrete and continuous cases, respectively:

$$\mathbb{P}(E) = \sum_{x \in S} g(x) \, \mathbb{P}(E \mid X = x)$$

$$\mathbb{P}(E) = \int_S g(x) \, \mathbb{P}(E \mid X = x) \, dx$$

Bayes' Theorem, named after Thomas Bayes, gives a formula for the conditional probability density function of $X$ given $E$, in terms of the probability density function of $X$ and the conditional probability of $E$ given $X = x$.

Suppose that $X$ has probability density function $g$ and that $E$ is an event with $\mathbb{P}(E) > 0$. Show that the conditional probability density function of $X$ given $E$ is as follows, in the discrete and continuous cases, respectively.

$$g(x \mid E) = \frac{g(x) \, \mathbb{P}(E \mid X = x)}{\sum_{s \in S} g(s) \, \mathbb{P}(E \mid X = s)}, \quad x \in S$$

$$g(x \mid E) = \frac{g(x) \, \mathbb{P}(E \mid X = x)}{\int_S g(s) \, \mathbb{P}(E \mid X = s) \, ds}, \quad x \in S$$

In the context of Bayes' theorem, $g$ is called the prior probability density function of $X$ and $x \mapsto g(x \mid E)$ is the posterior probability density function of $X$ given $E$. Note also that the conditional probability density function of $X$ given $E$ is proportional to $g(x) \, \mathbb{P}(E \mid X = x)$; the sum or integral is simply the normalizing constant.
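The discrete form of Bayes' theorem can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `posterior` and the example prior and likelihood (uniform on three values, with conditional probability $x / 4$) are hypothetical and do not come from the text.

```python
from fractions import Fraction


def posterior(prior, likelihood):
    """Return the posterior density x -> g(x | E) for a discrete X.

    prior:      dict mapping x to g(x)
    likelihood: dict mapping x to P(E | X = x)
    """
    # Unnormalized posterior: g(x) * P(E | X = x)
    weights = {x: prior[x] * likelihood[x] for x in prior}
    # Normalizing constant: P(E), by conditioning on X
    total = sum(weights.values())
    if total == 0:
        raise ValueError("P(E) = 0, so the posterior is undefined")
    return {x: w / total for x, w in weights.items()}


# Hypothetical example: X uniform on {1, 2, 3}, P(E | X = x) = x / 4
prior = {x: Fraction(1, 3) for x in (1, 2, 3)}
likelihood = {x: Fraction(x, 4) for x in (1, 2, 3)}
post = posterior(prior, likelihood)
```

Exact fractions are used so that the normalization can be checked without rounding error.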

Conditional Probability Density Functions

The definitions and results above apply, of course, if $E$ is an event defined in terms of another random variable for our experiment. Thus, suppose that $Y$ is a random variable taking values in a set $T$. Then $(X, Y)$ is a random variable taking values in the product set $S \times T$. We will assume that $(X, Y)$ has (joint) probability density function $f$. (In particular, we are assuming one of the standard distribution types: jointly discrete, jointly continuous with a probability density function, or mixed components with a probability density function.) As before, $g$ denotes the probability density function of $X$ and we assume $g(x) > 0$ for $x \in S$.

Show that the function defined below is a probability density function in $y \in T$ for each $x \in S$:

$$h(y \mid x) = \frac{f(x, y)}{g(x)}, \quad x \in S, \; y \in T$$

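The next exercise shows that $y \mapsto h(y \mid x)$ is the conditional probability density function of $Y$ given $X = x$.

Prove the following results, in the case that $Y$ has a discrete or continuous distribution, respectively: if $B \subseteq T$ then

$$\mathbb{P}(Y \in B \mid X = x) = \sum_{y \in B} h(y \mid x), \quad x \in S$$

$$\mathbb{P}(Y \in B \mid X = x) = \int_B h(y \mid x) \, dy, \quad x \in S$$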

The following theorem gives Bayes' theorem for probability density functions. We use the notation established above, and additionally, let $g(x \mid y)$ denote the conditional probability density function of $X$ at $x \in S$ given $Y = y$, for $y \in T$.

Prove the following results, in the case that $X$ has a discrete or continuous distribution, respectively:

$$g(x \mid y) = \frac{g(x) \, h(y \mid x)}{\sum_{s \in S} g(s) \, h(y \mid s)}, \quad x \in S, \; y \in T$$

$$g(x \mid y) = \frac{g(x) \, h(y \mid x)}{\int_S g(s) \, h(y \mid s) \, ds}, \quad x \in S, \; y \in T$$

In the context of Bayes' theorem, $g$ is the prior probability density function of $X$ and $x \mapsto g(x \mid y)$ is the posterior probability density function of $X$ given $Y = y$. Note that $x \mapsto g(x \mid y)$ is proportional to $x \mapsto g(x) \, h(y \mid x)$. The sum or integral is the normalizing constant.

Independence

Intuitively, X and Y should be independent if and only if the conditional distributions are the same as the corresponding unconditional distributions.

Show that the following conditions are equivalent:

  1. X and Y are independent.
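  2. $h(y \mid x) = h(y)$ for $x \in S$, $y \in T$, where $h$ denotes the probability density function of $Y$
  3. $g(x \mid y) = g(x)$ for $x \in S$, $y \in T$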

In many cases, a conditional distribution arises when a parameter of a given distribution is randomized. Note this situation in some of the exercises that follow.

Examples and Applications

Coins and Dice

Suppose that two standard, fair dice are rolled and the sequence of scores $(X_1, X_2)$ is recorded. Let $U = \min\{X_1, X_2\}$ and $V = \max\{X_1, X_2\}$ denote the minimum and maximum scores, respectively.

  1. Find the conditional density of $U$ given $V = v$ for each $v \in \{1, 2, 3, 4, 5, 6\}$.
  2. Find the conditional density of $V$ given $U = u$ for each $u \in \{1, 2, 3, 4, 5, 6\}$.
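One way to check a hand computation for the two-dice exercise is to enumerate the 36 equally likely outcomes and divide the joint density by the marginal. The sketch below is a numerical check under that approach, not a replacement for the derivation; the names `joint`, `v_density`, and `cond_u_given_v` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Joint density of (U, V) = (min, max) for two standard, fair dice
joint = defaultdict(Fraction)
for x1, x2 in product(range(1, 7), repeat=2):
    joint[(min(x1, x2), max(x1, x2))] += Fraction(1, 36)

# Marginal density of V, obtained by summing the joint density over u
v_density = defaultdict(Fraction)
for (u, v), p in joint.items():
    v_density[v] += p


def cond_u_given_v(v):
    """Conditional density of U given V = v, as joint / marginal."""
    return {u: p / v_density[v] for (u, vv), p in joint.items() if vv == v}
```

The conditional density of the maximum given the minimum follows the same pattern with the roles of the two coordinates swapped.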

In the die-coin experiment, a standard, fair die is rolled and then a fair coin is tossed the number of times showing on the die. Let N denote the die score and Y the number of heads.

  1. Find the joint density of $(N, Y)$.
  2. Find the density of $Y$.
  3. Find the conditional density of $N$ given $Y = k$ for each $k \in \{0, 1, 2, 3, 4, 5, 6\}$.
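The die-coin exercise is an instance of a randomized parameter: given the die score $n$, the number of heads is binomial with parameters $n$ and $\frac{1}{2}$. The following sketch tabulates the joint, marginal, and conditional densities with exact fractions, as a way to check answers; the variable and function names are illustrative.

```python
from fractions import Fraction
from math import comb

# Joint density: P(N = n) = 1/6, and given N = n, Y is binomial(n, 1/2)
joint = {
    (n, k): Fraction(1, 6) * comb(n, k) * Fraction(1, 2) ** n
    for n in range(1, 7)
    for k in range(0, n + 1)
}

# Marginal density of Y: sum the joint density over the die score
y_density = {
    k: sum(p for (n, kk), p in joint.items() if kk == k) for k in range(7)
}


def cond_n_given_y(k):
    """Conditional density of N given Y = k, as joint / marginal."""
    return {n: p / y_density[k] for (n, kk), p in joint.items() if kk == k}
```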

In the die-coin experiment, select the fair die and coin.

  1. Run the simulation 1000 times, updating every 10 runs. Compare the empirical density function of $Y$ with the true density function in the previous exercise.
  2. Run the simulation 200 times, updating after each run. Compute the empirical conditional density function of $N$ given $Y = k$ for each $k$, and compare with the density function in the previous exercise.

In the coin-die experiment, a fair coin is tossed. If the coin is tails, a standard, fair die is rolled. If the coin is heads, a standard, ace-six flat die is rolled (faces 1 and 6 have probability $\frac{1}{4}$ each and faces 2, 3, 4, 5 have probability $\frac{1}{8}$ each). Let $X$ denote the coin score (0 for tails and 1 for heads) and $Y$ the die score.

  1. Find the joint probability density function of $(X, Y)$.
  2. Find the density of $Y$.
  3. Find the conditional density of $X$ given $Y = y$ for each $y \in \{1, 2, 3, 4, 5, 6\}$.

In the coin-die experiment, select the settings of the previous exercise.

  1. Run the simulation 1000 times, updating every 10 runs. Compare the empirical density function of Y with the true density in the previous exercise.
  2. Run the simulation 200 times, updating after each run. Compute the empirical conditional probability density function of $X$ given $Y = 2$, and compare with the density function in the previous exercise.

Suppose that a box contains 12 coins: 5 are fair, 4 are biased so that heads comes up with probability $\frac{1}{3}$, and 3 are two-headed. A coin is chosen at random and tossed 2 times. Let $V$ denote the probability of heads of the selected coin, and $Y$ the number of heads.

  1. Find the joint probability density function of $(V, Y)$.
  2. Find the probability density function of $Y$.
  3. Find the conditional probability density function of $V$ given $Y = k$ for $k \in \{0, 1, 2\}$.

Suppose that $V$ has probability density function $g(p) = 6 p (1 - p)$, $0 \le p \le 1$. This is a member of the beta family of probability density functions; beta distributions are studied in more detail in the chapter on Special Distributions. Given $V = p$, a coin with probability of heads $p$ is tossed 3 times. Let $Y$ denote the number of heads.

  1. Find the joint density of $(V, Y)$.
  2. Find the probability density function of $Y$.
  3. Find the conditional probability density function of $V$ given $Y = k$ for $k \in \{0, 1, 2, 3\}$. Graph these on the same axes. Each conditional distribution is also a member of the beta family.

Compare the last two exercises. In the exercise with the beta density, we effectively choose a coin from a box with a continuous infinity of coin types.


Suppose that there are 5 light bulbs in a box, labeled 1 to 5. The lifetime of bulb n (in months) has the exponential distribution with rate parameter n . A bulb is selected at random from the box and tested.

  1. Find the probability that the selected bulb will last more than one month.
  2. Given that the bulb lasts more than one month, find the conditional probability density function of the bulb number.
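The light bulb exercise combines conditioning on the bulb number with Bayes' theorem. The sketch below evaluates both parts numerically, using the fact that an exponential lifetime with rate $n$ exceeds one month with probability $e^{-n}$; the variable names are illustrative.

```python
from math import exp

rates = [1, 2, 3, 4, 5]      # bulb n has rate parameter n
prior = 1 / len(rates)       # each bulb is equally likely to be selected

# P(T > 1 | N = n) = exp(-n) for an exponential lifetime T with rate n
survive = {n: exp(-n) for n in rates}

# Part 1: condition on the bulb number (law of total probability)
p_survive = sum(prior * survive[n] for n in rates)

# Part 2: Bayes' theorem gives the conditional density of the bulb number
posterior = {n: prior * survive[n] / p_survive for n in rates}
```

The posterior puts most of its mass on bulb 1, since low-rate bulbs are the ones most likely to survive the first month.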

Suppose that $N$ has the Poisson distribution with parameter 1, and given $N = n$, $Y$ has the binomial distribution with parameters $n$ and $p$.

  1. Find the joint probability density function of $(N, Y)$.
  2. Find the probability density function of $Y$.
  3. Find the conditional probability density function of $N$ given $Y = k$.

Suppose that $X$ is uniformly distributed on $\{1, 2, 3\}$, and given $X = i$, $Y$ is uniformly distributed on the interval $(0, i)$.

  1. Find the joint probability density function of $(X, Y)$.
  2. Find the probability density function of $Y$.
  3. Find the conditional probability density function of $X$ given $Y = y$ for $y \in (0, 3)$.

Suppose that $(X, Y)$ has probability density function $f(x, y) = x + y$ for $0 \le x \le 1$, $0 \le y \le 1$.

  1. Find the conditional density of $X$ given $Y = y$.
  2. Find the conditional density of $Y$ given $X = x$.
  3. Are X and Y independent?

Suppose that $(X, Y)$ has probability density function $f(x, y) = 2 (x + y)$ for $0 \le x \le y \le 1$.

  1. Find the conditional density of $X$ given $Y = y$.
  2. Find the conditional density of $Y$ given $X = x$.
  3. Are X and Y independent?

Suppose that $(X, Y)$ has probability density function $f(x, y) = 15 x^2 y$ for $0 \le x \le y \le 1$.

  1. Find the conditional density of $X$ given $Y = y$.
  2. Find the conditional density of $Y$ given $X = x$.
  3. Are X and Y independent?

Suppose that $(X, Y)$ has probability density function $f(x, y) = 6 x^2 y$ for $0 \le x \le 1$, $0 \le y \le 1$.

  1. Find the conditional density of $X$ given $Y = y$.
  2. Find the conditional density of $Y$ given $X = x$.
  3. Are X and Y independent?

Suppose that $(X, Y)$ has probability density function $f(x, y) = 2 e^{-x} e^{-y}$ for $0 \le x \le y < \infty$.

  1. Find the conditional probability density function of $X$ given $Y = y$.
  2. Find the conditional probability density function of $Y$ given $X = x$.
  3. Are X and Y independent?

Suppose that $X$ is uniformly distributed on the interval $(0, 1)$, and that given $X = x$, $Y$ is uniformly distributed on the interval $(0, x)$.

  1. Find the joint density of $(X, Y)$.
  2. Find the probability density function of $Y$.
  3. Find the conditional probability density function of $X$ given $Y = y$ for $y \in (0, 1)$.

Multivariate Uniform Distributions

Multivariate uniform distributions give a geometric interpretation of some of the concepts in this section. Recall first that the standard measure $\lambda_n$ on $\mathbb{R}^n$ is

$$\lambda_n(A) = \int_A 1 \, dx, \quad A \subseteq \mathbb{R}^n$$

In particular, $\lambda_1$ is the length measure on $\mathbb{R}$, $\lambda_2$ is the area measure on $\mathbb{R}^2$, and $\lambda_3$ is the volume measure on $\mathbb{R}^3$.

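Suppose now that $X$ takes values in $\mathbb{R}^j$, $Y$ takes values in $\mathbb{R}^k$, and that $(X, Y)$ is uniformly distributed on a set $R \subseteq \mathbb{R}^{j+k}$. Thus, by definition, the joint density function of $(X, Y)$ is

$$f(x, y) = \frac{1}{\lambda_{j+k}(R)}, \quad (x, y) \in R$$

Let $S$ and $T$ be the projections of $R$ onto $\mathbb{R}^j$ and $\mathbb{R}^k$ respectively, defined as follows:

$$S = \{x \in \mathbb{R}^j : (x, y) \in R \text{ for some } y \in \mathbb{R}^k\}$$

$$T = \{y \in \mathbb{R}^k : (x, y) \in R \text{ for some } x \in \mathbb{R}^j\}$$

Note that $R \subseteq S \times T$. Next we define the cross-sections at $x$ and at $y$, respectively, by

$$T_x = \{y \in T : (x, y) \in R\}, \quad x \in S$$

$$S_y = \{x \in S : (x, y) \in R\}, \quad y \in T$$

[Figure: cross-sections at $x$ and $y$]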

In the last section on Joint Distributions, we saw that even though X Y is uniformly distributed, the marginal distributions of X and Y are not uniform in general. However, as the next exercise shows, the conditional distributions are always uniform.

Show that

  1. The conditional distribution of $Y$ given $X = x$ is uniform on $T_x$.
  2. The conditional distribution of $X$ given $Y = y$ is uniform on $S_y$.

Find the conditional density of each variable given a value of the other, and determine if the variables are independent, in each of the following cases:

  1. $(X, Y)$ is uniformly distributed on the square $R = [-6, 6]^2$.
  2. $(X, Y)$ is uniformly distributed on the triangle $R = \{(x, y) \in \mathbb{R}^2 : -6 \le y \le x \le 6\}$.
  3. $(X, Y)$ is uniformly distributed on the circle $R = \{(x, y) \in \mathbb{R}^2 : x^2 + y^2 \le 36\}$.

In the bivariate uniform experiment, run the simulation 5000 times, updating every 10 runs, in each of the following cases. Watch the points in the scatter plot and the graphs of the marginal distributions. Interpret what you see in the context of the discussion above.

  1. square
  2. triangle
  3. circle

Suppose that $(X, Y, Z)$ is uniformly distributed on $R = \{(x, y, z) \in \mathbb{R}^3 : 0 \le x \le y \le z \le 1\}$.

  1. Find the conditional density of each pair of variables given a value of the third variable.
  2. Find the conditional density of each variable given values of the other two.

The Multivariate Hypergeometric Distribution

Recall the discussion of the (multivariate) hypergeometric distribution given in the last section on joint distributions. As in that discussion, suppose that a population consists of $m$ objects, and that each object is one of four types. There are $a$ objects of type 1, $b$ objects of type 2, $c$ objects of type 3, and $m - a - b - c$ objects of type 0. The parameters $a$, $b$, and $c$ are nonnegative integers with $a + b + c \le m$. We sample $n$ objects from the population at random, and without replacement. Denote the number of type 1, 2, and 3 objects in the sample by $U$, $V$, and $W$, respectively. Hence, the number of type 0 objects in the sample is $n - U - V - W$. In the problems below, the variables $i$, $j$, and $k$ are nonnegative integers.

Use both a combinatorial argument and an analytic argument to show that the conditional distribution of $(U, V)$ given $W = k$ is hypergeometric and has the density function given below. The essence of the combinatorial argument is that we are selecting a random sample of size $n - k$ without replacement from a population of size $m - c$, with $a$ objects of type 1, $b$ objects of type 2, and $m - a - b - c$ objects of type 0.

$$\mathbb{P}(U = i, V = j \mid W = k) = \frac{\binom{a}{i} \binom{b}{j} \binom{m - a - b - c}{n - i - j - k}}{\binom{m - c}{n - k}}, \quad i + j + k \le n$$

Use both a combinatorial argument and an analytic argument to show that the conditional distribution of $U$ given $V = j$ and $W = k$ is hypergeometric, and has the density function given below. The essence of the combinatorial argument is that we are selecting a random sample of size $n - j - k$ from a population of size $m - b - c$, with $a$ objects of type 1 and $m - a - b - c$ objects of type 0.

$$\mathbb{P}(U = i \mid V = j, W = k) = \frac{\binom{a}{i} \binom{m - a - b - c}{n - i - j - k}}{\binom{m - b - c}{n - j - k}}, \quad i + j + k \le n$$

These results generalize in a completely straightforward way to a population with any number of types. In brief, if a random vector has a hypergeometric distribution, then the conditional distribution of some of the variables, given values of the other variables, is also hypergeometric. The hypergeometric distribution and the multivariate hypergeometric distribution are studied in detail in the chapter on Finite Sampling Models.
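The first conditional result can be checked numerically by dividing the joint multivariate hypergeometric density by the marginal density of the type 3 count, and comparing with the closed form. This is a sanity check for particular parameter values, not a proof; the function names and the test parameters are hypothetical.

```python
from fractions import Fraction
from math import comb


def joint(i, j, k, a, b, c, m, n):
    """P(U = i, V = j, W = k) for the multivariate hypergeometric model."""
    rest = n - i - j - k  # number of type 0 objects in the sample
    if min(i, j, k, rest) < 0:
        return Fraction(0)
    ways = comb(a, i) * comb(b, j) * comb(c, k) * comb(m - a - b - c, rest)
    return Fraction(ways, comb(m, n))


def conditional_direct(i, j, k, a, b, c, m, n):
    """P(U = i, V = j | W = k) computed as joint / marginal of W."""
    marginal = sum(
        joint(u, v, k, a, b, c, m, n)
        for u in range(n + 1)
        for v in range(n + 1)
    )
    return joint(i, j, k, a, b, c, m, n) / marginal


def conditional_formula(i, j, k, a, b, c, m, n):
    """Closed form: sample n - k objects from the m - c objects not of type 3."""
    rest = n - i - j - k
    if min(i, j, rest) < 0:
        return Fraction(0)
    ways = comb(a, i) * comb(b, j) * comb(m - a - b - c, rest)
    return Fraction(ways, comb(m - c, n - k))
```

For instance, with the hypothetical values $m = 20$, $a = 5$, $b = 4$, $c = 3$, $n = 6$, the two functions agree for every admissible $(i, j, k)$.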

Recall that a bridge hand consists of 13 cards selected at random and without replacement from a standard deck of 52 cards. Let U , V , and W denote the number of spades, hearts, and diamonds, respectively, in the hand. Find the density function of each of the following:

  1. $(U, V)$ given $W = 3$
  2. $U$ given $V = 3$ and $W = 2$

Multinomial Trials

Recall the discussion of multinomial trials in the last section on joint distributions. As in that discussion, suppose that we have a sequence of independent trials, each with 4 possible outcomes. On each trial, outcome 1 occurs with probability $p$, outcome 2 with probability $q$, outcome 3 with probability $r$, and outcome 0 with probability $1 - p - q - r$. The parameters $p$, $q$, and $r$ are nonnegative numbers satisfying $p + q + r \le 1$. Denote the number of times that outcome 1, outcome 2, and outcome 3 occurred in the $n$ trials by $U$, $V$, and $W$ respectively. Of course, the number of times that outcome 0 occurs is $n - U - V - W$. In the problems below, the variables $i$, $j$, and $k$ are nonnegative integers.

Use a probability argument and an analytic argument to show that the conditional distribution of $(U, V)$ given $W = k$ is also multinomial, and has the density function given below. The essence of the probability argument is that effectively, we have $n - k$ independent trials, and on each trial, outcome 1 occurs with probability $\frac{p}{1 - r}$ and outcome 2 with probability $\frac{q}{1 - r}$.

$$\mathbb{P}(U = i, V = j \mid W = k) = \binom{n - k}{i, \, j} \left(\frac{p}{1 - r}\right)^i \left(\frac{q}{1 - r}\right)^j \left(1 - \frac{p}{1 - r} - \frac{q}{1 - r}\right)^{n - i - j - k}, \quad i + j + k \le n$$

Use a probability argument and an analytic argument to show that the conditional distribution of $U$ given $V = j$ and $W = k$ is binomial, with the density function given below. The essence of the probability argument is that effectively, we have $n - j - k$ independent trials, and on each trial, outcome 1 occurs with probability $\frac{p}{1 - q - r}$.

$$\mathbb{P}(U = i \mid V = j, W = k) = \binom{n - j - k}{i} \left(\frac{p}{1 - q - r}\right)^i \left(1 - \frac{p}{1 - q - r}\right)^{n - i - j - k}, \quad i + j + k \le n$$

These results generalize in a completely straightforward way to multinomial trials with any number of trial outcomes. In brief, if a random vector has a multinomial distribution, then the conditional distribution of some of the variables, given values of the other variables, is also multinomial. The binomial distribution and the multinomial distribution are studied in detail in the chapter on Bernoulli Trials.
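As with the hypergeometric case, the first multinomial result can be checked numerically: divide the joint density by the marginal density of the outcome 3 count and compare with the closed form, in which the trial probabilities are rescaled by the probability of not getting outcome 3. The function names and test values below are hypothetical, and the check is no substitute for the two arguments requested in the exercise.

```python
from fractions import Fraction
from math import factorial


def joint(i, j, k, p, q, r, n):
    """P(U = i, V = j, W = k) for n multinomial trials."""
    rest = n - i - j - k  # number of times outcome 0 occurs
    if min(i, j, k, rest) < 0:
        return Fraction(0)
    coef = factorial(n) // (
        factorial(i) * factorial(j) * factorial(k) * factorial(rest)
    )
    return coef * p**i * q**j * r**k * (1 - p - q - r) ** rest


def conditional_direct(i, j, k, p, q, r, n):
    """P(U = i, V = j | W = k) computed as joint / marginal of W."""
    marginal = sum(
        joint(u, v, k, p, q, r, n) for u in range(n + 1) for v in range(n + 1)
    )
    return joint(i, j, k, p, q, r, n) / marginal


def conditional_formula(i, j, k, p, q, r, n):
    """Closed form: n - k trials with rescaled probabilities."""
    rest = n - i - j - k
    if min(i, j, rest) < 0:
        return Fraction(0)
    coef = factorial(n - k) // (factorial(i) * factorial(j) * factorial(rest))
    p1, q1 = p / (1 - r), q / (1 - r)
    return coef * p1**i * q1**j * (1 - p1 - q1) ** rest
```

Passing the probabilities as `Fraction` values keeps the comparison exact; with floats, an approximate comparison would be needed instead.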

Recall that an ace-six flat die is a standard 6-sided die in which faces 1 and 6 have probability $\frac{1}{4}$ each, while faces 2, 3, 4, and 5 have probability $\frac{1}{8}$ each. Suppose that an ace-six flat die is thrown 50 times; let $X_i$ denote the number of times that score $i$ occurred for $i \in \{1, 2, 3, 4, 5, 6\}$. Find the density function of each of the following:

  1. $(X_1, X_2, X_4, X_5)$ given $X_3 = 8$
  2. $(X_1, X_2, X_5)$ given $X_3 = 5$ and $X_4 = 7$
  3. $(X_1, X_5)$ given $X_2 = 4$, $X_3 = 7$, and $X_4 = 5$
  4. $X_3$ given $X_1 = 8$, $X_2 = 4$, $X_4 = 6$, and $X_5 = 7$

Bivariate Normal Distributions

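Suppose that $(X, Y)$ has probability density function

$$f(x, y) = \frac{1}{12 \pi} \exp\left[-\left(\frac{x^2}{8} + \frac{y^2}{18}\right)\right], \quad (x, y) \in \mathbb{R}^2$$

  1. Find the conditional density function of $X$ given $Y = y$.
  2. Find the conditional density function of $Y$ given $X = x$.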
  3. Are X and Y independent?

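Suppose that $(X, Y)$ has probability density function

$$f(x, y) = \frac{1}{\sqrt{3} \, \pi} \exp\left[-\frac{2}{3} \left(x^2 - x y + y^2\right)\right], \quad (x, y) \in \mathbb{R}^2$$

  1. Find the conditional density function of $X$ given $Y = y$.
  2. Find the conditional density function of $Y$ given $X = x$.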
  3. Are X and Y independent?

The joint distributions in the last two exercises are examples of bivariate normal distributions. The conditional distributions are also normal. Normal distributions are widely used to model physical measurements subject to small, random errors. The bivariate normal distribution is studied in more detail in the chapter on Special Distributions.

Mixtures of Distributions

With our usual sets $S$ and $T$, as above, suppose that $P_x$ is a probability measure on $T$ for each $x \in S$. Suppose also that $g$ is a probability density function on $S$. We can obtain a new probability measure on $T$ by averaging (or mixing) the given distributions according to $g$.

First suppose that $S$ is countable, and that $g$ is the probability density function of a discrete distribution on $S$. Show that $\mathbb{P}$ defined below is a probability measure on $T$:

$$\mathbb{P}(B) = \sum_{x \in S} g(x) \, P_x(B), \quad B \subseteq T$$

In the setting of the previous exercise, suppose that $P_x$ is a discrete (respectively continuous) distribution with density function $h_x$ for each $x \in S$. Show that $\mathbb{P}$ is also discrete (respectively continuous) with density function $h$ given by

$$h(y) = \sum_{x \in S} g(x) \, h_x(y), \quad y \in T$$

Suppose now that $S \subseteq \mathbb{R}^n$ and that $g$ is a probability density function of a continuous distribution on $S$. Show that $\mathbb{P}$ defined below is a probability measure on $T$:

$$\mathbb{P}(B) = \int_S g(x) \, P_x(B) \, dx, \quad B \subseteq T$$

In the setting of the previous exercise, suppose that $P_x$ is a discrete (respectively continuous) distribution with probability density function $h_x$ for each $x \in S$. Show that $\mathbb{P}$ is also discrete (respectively continuous) with density function $h$ given by

$$h(y) = \int_S g(x) \, h_x(y) \, dx, \quad y \in T$$

In both cases, the distribution $\mathbb{P}$ is said to be a mixture of the set of distributions $\{P_x : x \in S\}$, with mixing density $g$.

One can have a mixture of distributions without having random variables defined on a common probability space. However, mixtures are intimately related to conditional distributions. Returning to our usual setup, suppose that $X$ and $Y$ are random variables for an experiment, taking values in $S$ and $T$ respectively. Suppose that $X$ has either a discrete or a continuous distribution, with probability density function $g$. The following exercise is simply a restatement of the law of total probability.

Show that the distribution of $Y$ is a mixture of the conditional distributions of $Y$ given $X = x$, over $x \in S$, with mixing density $g$.

Suppose that $X$ is a random variable taking values in $S \subseteq \mathbb{R}^n$, with a mixed discrete and continuous distribution. Show that the distribution of $X$ is a mixture of a discrete distribution and a continuous distribution, in the sense defined here.