As usual, we start with a random experiment with probability measure $\mathbb{P}$ on an underlying sample space $\Omega$. A random variable $X$ for the experiment that takes values in a countable set $S$ is said to have a discrete distribution. Typically, $S \subseteq \mathbb{R}^n$ for some $n \in \mathbb{N}_+$, so in particular, if $n > 1$, $X$ is vector-valued. In the picture below, the blue dots are intended to represent points of positive probability.
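The (discrete) probability density function of $X$ is the function $f$ from $S$ to $[0, 1]$ defined by
$$f(x) = \mathbb{P}(X = x), \quad x \in S$$
Show that $f$ satisfies the following properties. Hint: recall the axioms of a probability measure.
(a) $f(x) \ge 0$ for $x \in S$.
(b) $\sum_{x \in S} f(x) = 1$.
(c) $\sum_{x \in A} f(x) = \mathbb{P}(X \in A)$ for $A \subseteq S$.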
Property (c) is particularly important since it shows that the probability distribution of a discrete random variable is completely determined by its probability density function. Conversely, any function that satisfies properties (a) and (b) is a (discrete) probability density function, and then property (c) can be used to construct a discrete probability distribution on $S$. Technically, $f$ is the density of $X$ relative to counting measure on $S$.
As noted before, $S$ is typically a countable subset of some larger set, such as $\mathbb{R}^n$ for some $n$. We can always extend $f$, if we want, to the larger set by defining $f(x) = 0$ for $x \notin S$. Sometimes this extension simplifies formulas and notation.
An element $x \in S$ that maximizes the probability density function $f$ is called a mode of the distribution. When there is only one mode, it is sometimes used as a measure of the center of the distribution.
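As a small illustration of the definition, the hypothetical helper `modes` below returns every point at which a finite density attains its maximum. It is not from the text. The density is stored as a Python dict mapping $x$ to $f(x)$. Exact `Fraction` values are used so that genuine ties are not missed because of floating-point rounding.

```python
from fractions import Fraction

def modes(f):
    """Return the list of points x at which the density f (a dict x -> f(x)) is largest."""
    top = max(f.values())
    return [x for x, p in f.items() if p == top]

# Hypothetical example: the sum of two fair dice, built by adding 1/36 for each outcome.
two_dice = {}
for i in range(1, 7):
    for j in range(1, 7):
        two_dice[i + j] = two_dice.get(i + j, 0) + Fraction(1, 36)

print(modes(two_dice))  # [7]
```

A density with two tied maxima, such as $f(0) = f(1) = 1/2$, returns both points. This is why the text speaks of "a" mode rather than "the" mode.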
A discrete probability distribution is equivalent to a discrete mass distribution, with total mass 1. In this analogy, $S$ is the (countable) set of point masses, and $f(x)$ is the mass of the point at $x \in S$. Property (c) in Exercise 1 simply means that the mass of a set $A \subseteq S$ can be found by adding the masses of the points in $A$.
For a probabilistic interpretation, suppose that we create a new, compound experiment by repeating the original experiment indefinitely. In the compound experiment, we have a sequence of independent random variables $(X_1, X_2, \ldots)$, each with the same distribution as $X$; in statistical terms, we are sampling from the distribution of $X$. Define
$$f_n(x) = \frac{1}{n} \#\{i \in \{1, 2, \ldots, n\} : X_i = x\}, \quad x \in S$$
This is the relative frequency of $x$ in the first $n$ runs. Note that for each $x$, $f_n(x)$ is a random variable for the compound experiment. By the law of large numbers, $f_n(x)$ should converge to $f(x)$, in some sense, as $n \to \infty$. The function $f_n$ is called the empirical density function; such functions are displayed in most of the simulation applets that deal with discrete variables.
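The convergence of $f_n$ to $f$ can be seen in a short simulation. This is a minimal sketch, not one of the applets mentioned above. It assumes, hypothetically, that $X$ is the score of a fair die, so that $f(x) = 1/6$ for $x \in \{1, \ldots, 6\}$. The hypothetical helper `empirical_density` computes $f_n$ from a sample, and the loop reports the largest deviation from $f$ for increasing $n$.

```python
import random
from collections import Counter

def empirical_density(sample):
    """Return f_n as a dict: x -> (number of i <= n with X_i = x) / n."""
    n = len(sample)
    counts = Counter(sample)
    return {x: c / n for x, c in counts.items()}

# Hypothetical example: X is the score of a fair die, so f(x) = 1/6.
rng = random.Random(2024)  # fixed seed so the run is reproducible
for n in (10, 1000, 100000):
    sample = [rng.randint(1, 6) for _ in range(n)]
    f_n = empirical_density(sample)
    # Points that never appeared have f_n(x) = 0, hence .get(x, 0.0).
    worst = max(abs(f_n.get(x, 0.0) - 1 / 6) for x in range(1, 7))
    print(f"n = {n:6d}: max |f_n(x) - f(x)| = {worst:.4f}")
```

The deviations typically shrink as $n$ grows. For any particular seed, however, they are random quantities, which is the sense of "in some sense" in the text.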
Suppose that $g$ is a nonnegative function defined on a countable set $S$. Let
$$c = \sum_{x \in S} g(x)$$
Show that if $0 < c < \infty$, then $f(x) = g(x) / c$ for $x \in S$ defines a discrete probability density function on $S$.
Note that $c = 0$ if and only if $g(x) = 0$ for every $x \in S$. At the other extreme, $c = \infty$ can occur only if $S$ is infinite. When $0 < c < \infty$, so that we can construct the density function $f$, the number $c$ is sometimes called the normalizing constant. This result is useful for constructing density functions with desired functional properties (domain, shape, symmetry, and so on).
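On a finite set, the construction $f = g / c$ is a one-liner. The sketch below assumes a finite support, so the case $c = \infty$ cannot arise. It uses a hypothetical helper `normalize` and exact fractions. The example $g(x) = x$ on $\{1, \ldots, 6\}$ is chosen only for illustration.

```python
from fractions import Fraction

def normalize(g, support):
    """Given a nonnegative g on a finite set, return (f, c) with c = sum g and f = g / c."""
    c = sum(g(x) for x in support)
    if c == 0:
        # c = 0 exactly when g vanishes on the whole support.
        raise ValueError("c = 0, so g cannot be normalized to a density")
    return {x: g(x) / c for x in support}, c

# Hypothetical example: g(x) = x on {1, ..., 6}, so c = 21 and f(x) = x / 21.
f, c = normalize(lambda x: Fraction(x), range(1, 7))
print(c)     # 21
print(f[6])  # 2/7
```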
The probability density function of a random variable $X$ is based, of course, on the underlying probability measure $\mathbb{P}$ on the sample space $\Omega$. This measure could be a conditional probability measure, conditioned on a given event $E \subseteq \Omega$ (with $\mathbb{P}(E) > 0$). The usual notation is
$$f(x \mid E) = \mathbb{P}(X = x \mid E), \quad x \in S$$
The following exercise shows that, except for notation, no new concepts are involved. Therefore, all results that hold for discrete probability density functions in general have analogues for conditional discrete probability density functions.
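Show that for fixed $E$, the function $x \mapsto f(x \mid E)$ is a discrete probability density function. That is, show that it satisfies properties (a) and (b) of Exercise 1, and show that property (c) becomes
$$\mathbb{P}(X \in A \mid E) = \sum_{x \in A} f(x \mid E), \quad A \subseteq S$$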
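Suppose that $B \subseteq S$ and $\mathbb{P}(X \in B) > 0$. Show that the conditional density of $X$ given $X \in B$ is
$$f(x \mid X \in B) = \begin{cases} f(x) / \mathbb{P}(X \in B), & x \in B \\ 0, & x \in S \setminus B \end{cases}$$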
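The formula for $f(x \mid X \in B)$ amounts to rescaling $f$ on $B$ by $1 / \mathbb{P}(X \in B)$ and zeroing it elsewhere. A minimal sketch follows, with a hypothetical helper `condition_on` and a fair die as an assumed example. Conditioning the die on an even score gives the uniform density on $\{2, 4, 6\}$.

```python
from fractions import Fraction

def condition_on(f, B):
    """Conditional density of X given X in B, where f is a dict x -> f(x)."""
    p_B = sum(p for x, p in f.items() if x in B)  # P(X in B)
    if p_B == 0:
        raise ValueError("P(X in B) must be positive")
    # Rescale the density on B by 1 / P(X in B); set it to 0 off B.
    return {x: (p / p_B if x in B else 0) for x, p in f.items()}

die = {x: Fraction(1, 6) for x in range(1, 7)}  # hypothetical: a fair die
even = condition_on(die, {2, 4, 6})
print(even[4], even[5])  # 1/3 0
```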
Suppose that $X$ is a random variable with a discrete distribution on a countable set $S$, and that $E \subseteq \Omega$ is an event in the experiment. Let $f$ denote the probability density function of $X$.
Note that $\{\{X = x\} : x \in S\}$ is a countable partition of the sample space $\Omega$.
Because of this result, the versions of the conditioning rule and Bayes' theorem given in the following exercises follow immediately from the corresponding results in the section on Conditional Probability. Only the notation is different.
Show that:
$$\mathbb{P}(E) = \sum_{x \in S} f(x) \, \mathbb{P}(E \mid X = x)$$
This result is useful, naturally, when the distribution of $X$ and the conditional probability of $E$ given the values of $X$ are known. We say that we are conditioning on $X$.
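Prove Bayes' theorem, named after Thomas Bayes: if $\mathbb{P}(E) > 0$, then
$$f(x \mid E) = \frac{f(x) \, \mathbb{P}(E \mid X = x)}{\sum_{y \in S} f(y) \, \mathbb{P}(E \mid X = y)}, \quad x \in S$$
Bayes' theorem is a formula for the conditional probability density function of $X$ given $E$. Again, it is useful when the quantities on the right are known. In the context of Bayes' theorem, the (unconditional) distribution of $X$ is referred to as the prior distribution and the conditional distribution as the posterior distribution. Note that the denominator in Bayes' formula is $\mathbb{P}(E)$ and is simply the normalizing constant for the function $x \mapsto f(x) \, \mathbb{P}(E \mid X = x)$.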
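The conditioning rule and Bayes' theorem translate directly into code when $S$ is finite. In the minimal sketch below, the prior $f(x)$ and the likelihoods $\mathbb{P}(E \mid X = x)$ are dicts. The function names are hypothetical, and the numbers in the example are made up for illustration.

```python
from fractions import Fraction

def total_probability(prior, likelihood):
    """Conditioning rule: P(E) = sum over x of f(x) P(E | X = x)."""
    return sum(prior[x] * likelihood[x] for x in prior)

def posterior(prior, likelihood):
    """Bayes' theorem: f(x | E) = f(x) P(E | X = x) / P(E)."""
    p_E = total_probability(prior, likelihood)  # the normalizing constant
    return {x: prior[x] * likelihood[x] / p_E for x in prior}

# Hypothetical example: X is uniform on {1, 2, 3} and P(E | X = x) = x / 4.
prior = {x: Fraction(1, 3) for x in (1, 2, 3)}
likelihood = {x: Fraction(x, 4) for x in (1, 2, 3)}
print(total_probability(prior, likelihood))  # 1/2
print(posterior(prior, likelihood)[3])       # 1/2
```

The posterior is computed here exactly as the text describes: by normalizing the products $f(x) \, \mathbb{P}(E \mid X = x)$ by their sum.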
Our first three models below (the discrete uniform distribution, hypergeometric distributions, and Bernoulli trials) are very important. When working the computational problems that follow, try to see whether the problem fits one of these models.
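An element $X$ is chosen at random from a finite set $S$. The phrase at random means that all outcomes are equally likely. Show that $X$ has probability density function $f(x) = 1 / \#(S)$ for $x \in S$, and hence $\mathbb{P}(X \in A) = \#(A) / \#(S)$ for $A \subseteq S$.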
The distribution in the last exercise is called the discrete uniform distribution on $S$. Many random variables that arise in sampling or combinatorial experiments are transformations of uniformly distributed variables.
Suppose that $k$ elements are chosen at random, with replacement, from a set $D$ with $n$ elements. Let $\boldsymbol{X}$ denote the ordered sequence of elements chosen. Argue that $\boldsymbol{X}$ is uniformly distributed on the set $D^k$, and hence has probability density function
$$f(\boldsymbol{x}) = \frac{1}{n^k}, \quad \boldsymbol{x} \in D^k$$
Suppose that $k$ elements are chosen at random, without replacement, from a set $D$ with $n$ elements. Let $\boldsymbol{X}$ denote the ordered sequence of elements chosen. Argue that $\boldsymbol{X}$ is uniformly distributed on the set of permutations of size $k$ chosen from $D$, and hence has probability density function
$$f(\boldsymbol{x}) = \frac{1}{n^{(k)}}, \quad \text{where } n^{(k)} = n (n - 1) \cdots (n - k + 1)$$
Suppose that $k$ elements are chosen at random, without replacement, from a set $D$ with $n$ elements. Let $\boldsymbol{W}$ denote the unordered set of elements chosen. Show that $\boldsymbol{W}$ is uniformly distributed on the set of combinations of size $k$ chosen from $D$, and hence has probability density function
$$f(\boldsymbol{w}) = \frac{1}{\binom{n}{k}}$$
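The three densities are reciprocals of the sizes of the three sample spaces. The sketch below computes them with `math.perm` and `math.comb`, which are available in Python 3.8 and later. It also cross-checks the counts by brute-force enumeration with `itertools` on a small set. The set `"abcde"` and $k = 3$ are arbitrary choices for illustration, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations, permutations, product
from math import comb, perm

def pdf_with_replacement(n, k):
    return Fraction(1, n**k)          # D^k has n^k ordered sequences

def pdf_ordered_without_replacement(n, k):
    return Fraction(1, perm(n, k))    # n (n - 1) ... (n - k + 1) permutations

def pdf_unordered_without_replacement(n, k):
    return Fraction(1, comb(n, k))    # C(n, k) combinations

# Cross-check the counts by listing every outcome for a small set D.
D, k = "abcde", 3
n = len(D)
assert len(list(product(D, repeat=k))) == n**k
assert len(list(permutations(D, k))) == perm(n, k)
assert len(list(combinations(D, k))) == comb(n, k)
print(pdf_with_replacement(n, k), pdf_ordered_without_replacement(n, k),
      pdf_unordered_without_replacement(n, k))  # 1/125 1/60 1/10
```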
Suppose that $X$ is uniformly distributed on a finite set $S$ and that $B$ is a nonempty subset of $S$. Show that the conditional distribution of $X$ given $X \in B$ is uniform on $B$.
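Suppose that a population consists of $m$ objects; $r$ of the objects are type 1 and $m - r$ are type 0. A sample of $n$ objects is chosen at random (without replacement). Let $Y$ denote the number of type 1 objects in the sample. Show that $Y$ has probability density function
$$f(y) = \frac{\binom{r}{y} \binom{m - r}{n - y}}{\binom{m}{n}}, \quad y \in \{0, 1, \ldots, n\}$$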
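Suppose that a population consists of $m$ objects; $a$ of the objects are type 1, $b$ are type 2, and $m - a - b$ are type 0. A sample of $n$ objects is chosen at random (without replacement). Let $Y$ denote the number of type 1 objects in the sample and $Z$ the number of type 2 objects in the sample. Show that $(Y, Z)$ has probability density function
$$g(y, z) = \frac{\binom{a}{y} \binom{b}{z} \binom{m - a - b}{n - y - z}}{\binom{m}{n}}, \quad y, z \in \{0, 1, \ldots, n\}, \; y + z \le n$$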
The distribution defined by the density function in Exercise 13 is the hypergeometric distribution with parameters $m$, $r$, and $n$. The distribution defined by the density function in Exercise 14 is the bivariate hypergeometric distribution with parameters $m$, $a$, $b$, and $n$. Clearly, the same general pattern applies to populations with even more types. The hypergeometric distribution and the multivariate hypergeometric distribution are studied in detail in the chapter on Finite Sampling Models. That chapter contains a rich variety of distributions that are based on discrete uniform distributions.
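Both hypergeometric densities are ratios of binomial coefficients, so they can be evaluated exactly with `math.comb` and `Fraction`. The sketch below uses hypothetical function names. As an assumed example, it takes the number of red balls in a sample of 5 drawn without replacement from an urn with 30 red and 20 green balls, so $m = 50$, $r = 30$, $n = 5$. The bivariate parameters in the last check are arbitrary.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pdf(y, m, r, n):
    """P(Y = y): y type-1 objects in a sample of n drawn without replacement
    from m objects, r of which are type 1 (requires 0 <= y <= n)."""
    return Fraction(comb(r, y) * comb(m - r, n - y), comb(m, n))

def bivariate_hypergeometric_pdf(y, z, m, a, b, n):
    """P(Y = y, Z = z) with a type-1, b type-2 and m - a - b type-0 objects
    (requires y, z >= 0 and y + z <= n)."""
    return Fraction(comb(a, y) * comb(b, z) * comb(m - a - b, n - y - z), comb(m, n))

# Assumed example: Y = number of red balls, m = 50, r = 30, n = 5.
for y in range(6):
    print(y, round(float(hypergeometric_pdf(y, 50, 30, 5)), 4))

# Both densities sum to 1 (Vandermonde's identity); checked exactly here.
assert sum(hypergeometric_pdf(y, 50, 30, 5) for y in range(6)) == 1
assert sum(bivariate_hypergeometric_pdf(y, z, 10, 3, 4, 4)
           for y in range(5) for z in range(5 - y)) == 1
```

`math.comb(r, y)` returns 0 when $y > r$, so impossible counts automatically get probability 0.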
A Bernoulli trials sequence is a sequence $\boldsymbol{X} = (X_1, X_2, \ldots)$ of independent, identically distributed indicator variables. Random variable $X_i$ is the outcome of trial $i$, and in the usual terminology of reliability, 1 denotes success while 0 denotes failure. The process is named for Jacob Bernoulli. Let $p = \mathbb{P}(X_i = 1)$ denote the success parameter of the process.
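Show that $(X_1, X_2, \ldots, X_n)$ has probability density function given by
$$\mathbb{P}(X_1 = x_1, \ldots, X_n = x_n) = p^{x_1 + \cdots + x_n} (1 - p)^{n - (x_1 + \cdots + x_n)}, \quad (x_1, \ldots, x_n) \in \{0, 1\}^n$$
Let $Y_n = \sum_{i=1}^n X_i$ denote the number of successes in the first $n$ trials. Show that $Y_n$ has probability density function given by
$$\mathbb{P}(Y_n = y) = \binom{n}{y} p^y (1 - p)^{n - y}, \quad y \in \{0, 1, \ldots, n\}$$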
The distribution defined by the probability density function in the last exercise is called the binomial distribution with parameters $n$ and $p$. The binomial distribution is studied in detail in the chapter on Bernoulli Trials.
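The binomial density follows from the joint density of the trials. Every 0-1 sequence with exactly $y$ ones has probability $p^y (1 - p)^{n - y}$, and there are $\binom{n}{y}$ such sequences. The sketch below checks this by brute-force enumeration with exact fractions. The function names and the choice $n = 5$, $p = 1/3$ are illustrative assumptions.

```python
from fractions import Fraction
from itertools import product
from math import comb

def bernoulli_joint_pdf(xs, p):
    """P(X_1 = x_1, ..., X_n = x_n) = p^k (1 - p)^(n - k), where k = x_1 + ... + x_n."""
    k = sum(xs)
    return p**k * (1 - p) ** (len(xs) - k)

def binomial_pdf(y, n, p):
    """P(Y_n = y) = C(n, y) p^y (1 - p)^(n - y)."""
    return comb(n, y) * p**y * (1 - p) ** (n - y)

# Summing the joint density over all 0-1 sequences with exactly y ones
# reproduces the binomial density.
n, p = 5, Fraction(1, 3)
for y in range(n + 1):
    by_enumeration = sum(bernoulli_joint_pdf(xs, p)
                         for xs in product((0, 1), repeat=n) if sum(xs) == y)
    assert by_enumeration == binomial_pdf(y, n, p)
print(sum(binomial_pdf(y, n, p) for y in range(n + 1)))  # 1
```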
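Consider again a sequence of Bernoulli trials with success parameter $p \in (0, 1]$. Let $N$ denote the trial number of the first success and $M = N - 1$ the number of failures before the first success. Show that
(a) $\mathbb{P}(N = n) = p (1 - p)^{n - 1}$ for $n \in \mathbb{N}_+ = \{1, 2, \ldots\}$.
(b) $\mathbb{P}(M = k) = p (1 - p)^k$ for $k \in \mathbb{N} = \{0, 1, \ldots\}$.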
The distribution defined by the density function in part (a) is the geometric distribution on $\mathbb{N}_+$. The distribution defined by the density function in part (b) is the geometric distribution on $\mathbb{N}$.
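The two geometric densities differ only by the shift $M = N - 1$. The sketch below encodes both, with hypothetical function names. It checks that the partial sums telescope to $1 - (1 - p)^{10}$, and it simulates $N$ by running Bernoulli trials until the first success. The value $p = 1/4$ is an arbitrary choice for illustration.

```python
import random
from fractions import Fraction

def geometric_trials_pdf(n, p):
    """P(N = n) = p (1 - p)^(n - 1) for n = 1, 2, ..."""
    return p * (1 - p) ** (n - 1)

def geometric_failures_pdf(k, p):
    """P(M = k) = p (1 - p)^k for k = 0, 1, ..., where M = N - 1."""
    return p * (1 - p) ** k

def first_success(p, rng):
    """Run Bernoulli trials until the first success; return the trial number N."""
    n = 1
    while rng.random() >= p:  # rng.random() < p counts as a success
        n += 1
    return n

p = Fraction(1, 4)
# Partial sums telescope: sum_{n=1}^{10} p (1 - p)^(n - 1) = 1 - (1 - p)^10.
print(sum(geometric_trials_pdf(n, p) for n in range(1, 11)) == 1 - (1 - p) ** 10)  # True

rng = random.Random(7)
sample = [first_success(0.25, rng) for _ in range(10000)]
print(sample.count(1) / len(sample), float(geometric_trials_pdf(1, p)))
```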
An urn contains 30 red and 20 green balls. A sample of 5 balls is selected at random, without replacement. Let
In the ball and urn experiment, select sampling without replacement and set
An urn contains 30 red and 20 green balls. A sample of 5 balls is selected at random, with replacement. Let
In the ball and urn experiment, select sampling with replacement and set
Suppose that a coin with probability of heads
In the coin experiment, set
Suppose that two fair, standard dice are tossed and the sequence of scores
In the dice experiment, select
In the die-coin experiment, a fair, standard die is rolled and then a fair coin is tossed the number of times showing on the die. Let
Run the die-coin experiment 1000 times, updating after each run. For the number of heads, note the apparent convergence of the empirical density function to the probability density function.
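The density of the number of heads can be computed by conditioning on the die score, as in the conditioning rule above. If the die shows $n$, the number of heads is binomial with parameters $n$ and $1/2$. This gives $\mathbb{P}(Y = k) = \sum_{n=1}^{6} \frac{1}{6} \binom{n}{k} \left(\frac{1}{2}\right)^n$. The sketch below assumes exactly the setup described for the die-coin experiment, with a hypothetical helper `heads_pdf`. It runs 1000 simulated repetitions and prints the theoretical and empirical densities side by side.

```python
import random
from fractions import Fraction
from math import comb

def heads_pdf(k):
    """P(Y = k), conditioning on the die score N: sum over n of (1/6) C(n, k) (1/2)^n."""
    return sum(Fraction(1, 6) * comb(n, k) * Fraction(1, 2**n) for n in range(1, 7))

rng = random.Random(1)
runs = 1000
counts = [0] * 7  # the number of heads can be 0, 1, ..., 6
for _ in range(runs):
    n = rng.randint(1, 6)                              # fair die
    heads = sum(rng.random() < 0.5 for _ in range(n))  # n tosses of a fair coin
    counts[heads] += 1

for k in range(7):
    print(k, round(float(heads_pdf(k)), 4), counts[k] / runs)
```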
Suppose that a bag contains 12 coins: 5 are fair, 4 are biased with probability of heads
Compare Exercise 26 and Exercise 28. In the first exercise, we toss a coin with a fixed probability of heads a random number of times. In the second exercise, we effectively toss a coin with a random probability of heads a fixed number of times.
In the coin-die experiment, a fair coin is tossed. If the coin lands tails, a fair die is rolled. If the coin lands heads, an ace-six flat die is tossed (faces 1 and 6 have probability
Run the coin-die experiment 1000 times, updating every 10 runs. Compare the empirical density of the die score with the theoretical density in the last exercise.
A coin with probability of heads
In the negative binomial experiment, set
Recall that a poker hand consists of 5 cards chosen at random and without replacement from a standard deck of 52 cards. Let
Recall that a bridge hand consists of 13 cards chosen at random and without replacement from a standard deck of 52 cards. An honor card is a card of denomination ace, king, queen, or jack. In the most common point counting system, an ace is worth 4 points, a king is worth 3 points, a queen is worth 2 points, and a jack is worth 1 point. Let
Suppose that in a batch of 500 components, 20 are defective and the rest are good. A sample of 10 components is selected at random and tested. Let
A plant has 3 assembly lines that produce a certain type of component. Line 1 produces 50% of the components and has a defective rate of 4%; line 2 produces 30% of the components and has a defective rate of 5%; line 3 produces 20% of the components and has a defective rate of 1%. A component is chosen at random from the plant and tested.
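With $X$ the line that produced the chosen component and $E$ the event that it is defective, the percentages above are the prior $f(x)$ and the likelihoods $\mathbb{P}(E \mid X = x)$. The sketch below applies the conditioning rule and Bayes' theorem to these numbers with exact fractions. The variable names are illustrative only.

```python
from fractions import Fraction

# Prior f(x): the share of production from each line.
production = {1: Fraction(50, 100), 2: Fraction(30, 100), 3: Fraction(20, 100)}
# Likelihood P(E | X = x): the defective rate of each line.
defect_rate = {1: Fraction(4, 100), 2: Fraction(5, 100), 3: Fraction(1, 100)}

# Conditioning rule: P(E) = sum of f(x) P(E | X = x).
p_defective = sum(production[i] * defect_rate[i] for i in production)
# Bayes' theorem: f(x | E) = f(x) P(E | X = x) / P(E).
posterior = {i: production[i] * defect_rate[i] / p_defective for i in production}

print(p_defective, float(p_defective))  # 37/1000 0.037
for i, q in posterior.items():
    print(i, q)                         # 20/37, 15/37, 2/37
```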
Recall that in the standard model of structural reliability, a system consists of
The reliability of a device is the probability that it is working. Let
Suppose that the component reliabilities all have the same value
Suppose again that the component reliabilities all have the same value
Let
The distribution defined by the density in the previous exercise is the Poisson distribution with parameter
random points
in a region of time or space. The parameter
Suppose that the number of misprints
In the Poisson process, set
Let
The distribution defined in the previous exercise is a member of the zeta family of distributions. Zeta distributions are used to model sizes or ranks of certain types of objects, and are studied in more detail in the chapter on Special Distributions.
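The exercise that defines this distribution is not reproduced here. As an assumption for illustration only, take a density proportional to $g(n) = 1/n^2$ on $\mathbb{N}_+$. By Euler's solution of the Basel problem, its normalizing constant is $\sum_{n=1}^\infty 1/n^2 = \pi^2/6$, so $f(n) = 6 / (\pi^2 n^2)$. The sketch below checks this numerically with a partial sum.

```python
import math

def zeta2_pdf(n):
    """Density proportional to 1/n^2 on {1, 2, ...}, normalized by zeta(2) = pi^2 / 6."""
    return 6 / (math.pi**2 * n**2)

terms = 100_000
partial = sum(1 / n**2 for n in range(1, terms + 1))
tail = math.pi**2 / 6 - partial  # omitted terms: between 1/(terms + 1) and 1/terms
print(partial, tail)
print(sum(zeta2_pdf(n) for n in range(1, terms + 1)))  # slightly below 1: the tail is missing
```

This is the normalizing-constant construction from earlier in the section, applied on an infinite set where $c$ happens to be finite.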
Let
The distribution defined in the previous exercise is known as Benford's law, and is named for the American physicist and engineer Frank Benford. This distribution governs the leading digit in many real sets of data. Benford's law is studied in more detail in the chapter on Special Distributions.
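Benford's law assigns the leading digit $d \in \{1, \ldots, 9\}$ probability $\log_{10}(1 + 1/d)$. The sum is 1 because $\sum_{d=1}^{9} \log_{10}\frac{d + 1}{d}$ telescopes to $\log_{10} 10$. The sketch below, with a hypothetical helper `benford_pdf`, compares these probabilities with the leading digits of $2^1, \ldots, 2^{1000}$. This sequence is a classical example of data that follows Benford's law, and it is used here only as an illustration.

```python
import math
from collections import Counter

def benford_pdf(d):
    """Benford's law: P(leading digit = d) = log10(1 + 1/d) for d = 1, ..., 9."""
    return math.log10(1 + 1 / d)

# The probabilities sum to 1 (up to floating-point rounding).
print(sum(benford_pdf(d) for d in range(1, 10)))

# Leading digits of 2^1, ..., 2^1000, read off the decimal string.
leading = Counter(int(str(2**k)[0]) for k in range(1, 1001))
for d in range(1, 10):
    print(d, round(benford_pdf(d), 3), leading[d] / 1000)
```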
Let
Let
Let
Let
Let
In the M&M data, let
In the Cicada data, let