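Suppose that we have a random experiment with sample space $S$. Intuitively, the probability of an event is a measure of how likely the event is to occur when we run the experiment.
Mathematically, a probability measure (or distribution) $\mathbb{P}$ for a random experiment is a real-valued function, defined on the collection of events, and satisfying the following axioms:
1. $\mathbb{P}(A) \ge 0$ for every event $A$.
2. $\mathbb{P}(S) = 1$.
3. If $\{A_i : i \in I\}$ is a countable, pairwise disjoint collection of events then $\mathbb{P}\left(\bigcup_{i \in I} A_i\right) = \sum_{i \in I} \mathbb{P}(A_i)$.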
Axiom 3 is known as countable additivity, and states that the probability of a union of a finite or countably infinite collection of disjoint events is the sum of the corresponding probabilities. The axioms are known as the Kolmogorov axioms, in honor of Andrei Kolmogorov.
Axioms 1 and 2 are really just a matter of convention; we choose to measure the probability of an event with a number between 0 and 1 (as opposed, say, to a number between −5 and 7). Axiom 3, however, is fundamental and inescapable. It is required for probability for precisely the same reason that it is required for other measures of the "size" of a set, such as cardinality for finite sets, length for subsets of $\mathbb{R}$, area for subsets of $\mathbb{R}^2$, and volume for subsets of $\mathbb{R}^3$.
In all these cases, the size of a set that is composed of countably many disjoint pieces is the sum of the sizes of the pieces. For more on general measures, see the section on Measure Theory.
On the other hand, uncountable additivity (the extension of axiom 3 to an uncountable index set $I$) is unreasonable for probability, just as it is for other measures. For example, an interval of positive length in $\mathbb{R}$ is a union of uncountably many points, each of which has length 0.
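We have now defined the three essential ingredients for the model of a random experiment:
1. A sample space $S$.
2. A collection of events $\mathscr{S}$.
3. A probability measure $\mathbb{P}$.
Together these define a probability space $(S, \mathscr{S}, \mathbb{P})$.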
Intuitively, the probability of an event is supposed to measure the long-term relative frequency of the event; in fact, this concept was taken as the definition of probability by Richard von Mises. Specifically, suppose that we repeat the experiment indefinitely. (Note that this actually creates a new, compound experiment.) For an event $A$ in the basic experiment, let $N_n(A)$ denote the number of times $A$ occurred (the frequency of $A$) in the first $n$ runs. (Note that this is a random variable in the compound experiment.) Thus,
$$P_n(A) = \frac{N_n(A)}{n}$$
is the relative frequency of $A$ in the first $n$ runs (it is also a random variable in the compound experiment). If we have chosen the correct probability measure for the experiment, then in some sense we expect that the relative frequency of each event should converge to the probability of the event:
$$P_n(A) \to \mathbb{P}(A) \text{ as } n \to \infty$$
The precise statement of this is the law of large numbers or law of averages, one of the fundamental theorems in probability. To emphasize the point, note that in general there will be lots of possible probability measures for an experiment, in the sense of the axioms. However, only the true probability measure will satisfy the law of large numbers.
It follows that if we have the data from $n$ runs of the experiment, the observed relative frequency $P_n(A)$ can be used as an approximation for $\mathbb{P}(A)$; this approximation is called the empirical probability of $A$.
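The convergence of relative frequencies can be illustrated with a small simulation. The sketch below is not part of the original experiments; it assumes a single fair die as the basic experiment and the event that the score is at most 2 (true probability 1/3), and the helper names `empirical_probability`, `roll_die`, and `at_most_two` are hypothetical.

```python
import random

def empirical_probability(event, experiment, n, seed=0):
    """Relative frequency of `event` in n independent runs of `experiment`."""
    rng = random.Random(seed)  # fixed seed so the runs are reproducible
    hits = sum(1 for _ in range(n) if event(experiment(rng)))
    return hits / n

def roll_die(rng):
    """Assumed basic experiment: one throw of a fair six-sided die."""
    return rng.randint(1, 6)

def at_most_two(score):
    """Assumed event: the score is 1 or 2, which has probability 1/3."""
    return score <= 2

# The relative frequency should settle near 1/3 as the number of runs grows.
for n in (10, 100, 10_000):
    print(n, empirical_probability(at_most_two, roll_die, n))
```

For small numbers of runs the estimate can be far from 1/3; only the long-run behavior is governed by the law of large numbers.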
Show that $P_n$ satisfies the axioms of a probability measure (given the data from $n$ runs of the experiment).
Suppose that $X$ is a random variable for the experiment, taking values in a set $T$.
Show that $B \mapsto \mathbb{P}(X \in B)$ defines a probability measure on $T$.
Hint: Recall that the inverse image preserves all set operations.
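The probability measure in the previous exercise is called the probability distribution of $X$. Thus, any random variable $X$ for an experiment defines a new probability space, with $T$ as the set of outcomes, suitable subsets of $T$ as the events, and $B \mapsto \mathbb{P}(X \in B)$ as the probability measure.
Moreover, recall that the outcome of the experiment itself can be thought of as a random variable. Specifically, if we take $X$ to be the identity function on $S$, then $X$ is a random variable and $\mathbb{P}(X \in A) = \mathbb{P}(A)$ for every event $A$.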
Thus, any probability measure can be thought of as the distribution of a random variable.
How can we construct probability measures? As noted briefly above, there are other measures of the "size" of sets; in many cases, these can be converted into probability measures. First, a (nonnegative) measure $\mu$ on $S$ is a real-valued function defined on the collection of events $\mathscr{S}$ that satisfies axioms 1 and 3 above. In general, $\mu(A)$ is allowed to be infinite for a subset $A$. However, if $\mu(S)$ is positive and finite, then $\mu$ can easily be re-scaled into a probability measure.
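Show that if $\mu$ is a measure on $S$ with $0 < \mu(S) < \infty$ then $\mathbb{P}$ defined below is a probability measure on $S$.
$$\mathbb{P}(A) = \frac{\mu(A)}{\mu(S)}, \quad A \subseteq S$$
In the context of Exercise 3, $\mu(S)$ is called the normalizing constant. In the next two subsections, we consider some very important special cases.
Suppose that $S$ is a finite, nonempty set. Clearly, counting measure $\#$ is a finite measure on $S$:
$$\#(A) = \text{the number of elements in } A, \quad A \subseteq S$$
The corresponding probability measure is called the discrete uniform distribution on $S$, and is particularly important in combinatorial and sampling experiments:
$$\mathbb{P}(A) = \frac{\#(A)}{\#(S)}, \quad A \subseteq S$$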
We can give a more general construction for countable sample spaces that can be used to define many probability measures.
Suppose that $S$ is nonempty and countable, and that $g$ is a nonnegative real-valued function defined on $S$. Show that $\mu$ defined below is a measure on $S$:
$$\mu(A) = \sum_{x \in A} g(x), \quad A \subseteq S$$
Thus, if $0 < \mu(S) < \infty$ then $\mu$ defines a probability measure $\mathbb{P} = \mu / \mu(S)$ by Exercise 3. Distributions of this type are said to be discrete. Discrete distributions are studied in detail in the chapter on Distributions.
In the setting of the previous exercise, show that if $S$ is finite and $g$ is a positive constant function, then the corresponding probability measure is the discrete uniform distribution on $S$.
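The discrete construction can be sketched in a few lines of code. The example below is an illustration under stated assumptions, not part of the original text: the sample space is taken to be the six faces of a die, the weight functions are arbitrary choices, and the names `discrete_distribution` and `prob` are hypothetical. Exact rational arithmetic is used so that the normalization can be checked without rounding.

```python
from fractions import Fraction

def discrete_distribution(g, S):
    """Normalize a nonnegative weight function g on a finite set S."""
    weights = {x: Fraction(g(x)) for x in S}
    total = sum(weights.values())  # the normalizing constant mu(S)
    if total <= 0:
        raise ValueError("mu(S) must be positive")
    return {x: w / total for x, w in weights.items()}

def prob(dist, A):
    """P(A): the sum of the point probabilities over the outcomes in A."""
    return sum(dist[x] for x in A)

S = range(1, 7)                                  # assumed sample space
uniform = discrete_distribution(lambda x: 5, S)  # constant weight
weighted = discrete_distribution(lambda x: x, S) # weight proportional to x

print(prob(uniform, {2, 4, 6}))   # 1/2
print(prob(weighted, {2, 4, 6}))  # 4/7
```

A constant weight function gives each outcome the same probability regardless of the constant chosen, which is the discrete uniform distribution.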
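We define the standard $n$-dimensional measure $\lambda_n$ on $\mathbb{R}^n$ (called Lebesgue measure, in honor of Henri Lebesgue) by
$$\lambda_n(A) = \int_A 1 \, dx, \quad A \subseteq \mathbb{R}^n$$
Technically, the integral is more general than the one defined in calculus. However, the standard calculus integral will suffice for most of this project. In particular, we assume that the set $A$ is nice enough that the integral exists; see the section on Measurability for more details. Note that if $n > 1$, the integral above is a multiple integral, with $x = (x_1, x_2, \ldots, x_n)$ and $dx = dx_1 \, dx_2 \cdots dx_n$. The countable additivity axiom holds because of a basic property of integrals. In particular, note from calculus that $\lambda_1(A)$ is the length of $A \subseteq \mathbb{R}$, $\lambda_2(A)$ is the area of $A \subseteq \mathbb{R}^2$, and $\lambda_3(A)$ is the volume of $A \subseteq \mathbb{R}^3$.
Now, if $S \subseteq \mathbb{R}^n$ with $0 < \lambda_n(S) < \infty$, then
$$\mathbb{P}(A) = \frac{\lambda_n(A)}{\lambda_n(S)}, \quad A \subseteq S$$
is a probability measure on $S$ by Exercise 3, called the continuous uniform distribution on $S$.
We can generalize this construction to produce many other distributions. Suppose that $g$ is a nonnegative real-valued function defined on $S$. Define
$$\mu(A) = \int_A g(x) \, dx, \quad A \subseteq S$$
Then $\mu$ is a measure on $S$. Thus if $0 < \mu(S) < \infty$, then $\mu$ defines a probability measure $\mathbb{P} = \mu / \mu(S)$ as in Exercise 3. Distributions of this type are said to be continuous. Continuous distributions are studied in detail in the chapter on Distributions.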
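As a one-dimensional illustration of the continuous construction, the sketch below normalizes a measure defined by a weight function on an interval. Everything specific here is an assumption made for the example: the interval [0, 2], the weight functions, the midpoint-rule integrator `integrate`, and the helper `interval_prob`.

```python
def integrate(g, a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of g over [a, b]."""
    h = (b - a) / steps
    return h * sum(g(a + (i + 0.5) * h) for i in range(steps))

def interval_prob(g, a, b, lo, hi):
    """P([a, b]) for the measure with weight g on S = [lo, hi]."""
    return integrate(g, a, b) / integrate(g, lo, hi)

# Constant weight: the continuous uniform distribution on [0, 2].
p_uniform = interval_prob(lambda x: 1.0, 0.0, 1.0, 0.0, 2.0)

# Weight g(x) = x on [0, 2]: mu([0, 1]) = 1/2 and mu(S) = 2.
p_linear = interval_prob(lambda x: x, 0.0, 1.0, 0.0, 2.0)

print(p_uniform, p_linear)  # approximately 0.5 and 0.25
```

The midpoint rule is used only because it is short; any numerical or symbolic integration method would serve the same purpose.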
It is important to note again that, unlike many other areas of mathematics, the low-dimensional spaces ($n \in \{1, 2, 3\}$) do not play a special role, except for exposition. For example, in the Cicada data, some of the variables recorded are body weight, body length, wing width, and wing length. A probability model for these variables would specify a distribution on a subset of $\mathbb{R}^4$.
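Suppose that we have a random experiment with sample space $S$ and probability measure $\mathbb{P}$. In the following exercises, $A$ and $B$ are events. Prove the results using the axioms of probability.
Show that $\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$.
Hint: $A$ and $A^c$ are disjoint and their union is $S$.
Show that $\mathbb{P}(\emptyset) = 0$.
Hint: Apply the complement rule to $A = S$.
Show that $\mathbb{P}(B \setminus A) = \mathbb{P}(B) - \mathbb{P}(A \cap B)$.
Hint: $A \cap B$ and $B \setminus A$ are disjoint and their union is $B$.
Show that if $A \subseteq B$ then $\mathbb{P}(B \setminus A) = \mathbb{P}(B) - \mathbb{P}(A)$.
Hint: Apply the difference rule and note that $A \cap B = A$.
Show that if $A \subseteq B$ then $\mathbb{P}(A) \le \mathbb{P}(B)$.
Thus, $\mathbb{P}$ is an increasing function, relative to the subset partial order on the collection of events, and the ordinary order on $\mathbb{R}$. In particular, it follows that $\mathbb{P}(A) \le 1$ for any event $A$.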
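Suppose that $A \subseteq B$. Show that
1. If $\mathbb{P}(B) = 0$ then $\mathbb{P}(A) = 0$.
2. If $\mathbb{P}(A) = 1$ then $\mathbb{P}(B) = 1$.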
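Suppose that $\{A_i : i \in I\}$ is a countable collection of events. Prove Boole's inequality (named after George Boole):
$$\mathbb{P}\left(\bigcup_{i \in I} A_i\right) \le \sum_{i \in I} \mathbb{P}(A_i)$$
Intuitively, Boole's inequality holds because parts of the union have been measured more than once in the expression on the right.
Suppose that $\{A_i : i \in I\}$ is a countable collection of events with $\mathbb{P}(A_i) = 0$ for $i \in I$. Use Boole's inequality to show that
$$\mathbb{P}\left(\bigcup_{i \in I} A_i\right) = 0$$
An event $A$ with $\mathbb{P}(A) = 0$ is said to be null. Thus, a countable union of null events is still a null event.
Suppose that $\{A_i : i \in I\}$ is a countable collection of events. Prove Bonferroni's inequality (named after Carlo Bonferroni):
$$\mathbb{P}\left(\bigcap_{i \in I} A_i\right) \ge 1 - \sum_{i \in I} \left[1 - \mathbb{P}(A_i)\right]$$
Hint: Apply Boole's inequality to $\{A_i^c : i \in I\}$.
Suppose that $\{A_i : i \in I\}$ is a countable collection of events with $\mathbb{P}(A_i) = 1$ for each $i \in I$. Use Bonferroni's inequality to show that
$$\mathbb{P}\left(\bigcap_{i \in I} A_i\right) = 1$$
An event $A$ with $\mathbb{P}(A) = 1$ is sometimes called almost sure or almost certain. Thus, a countable intersection of almost sure events is still almost sure.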
Suppose that $A$ and $B$ are events in an experiment. Prove the following:
Suppose that $\{A_i : i \in I\}$ is a countable collection of events that partition the sample space $S$. Show that for any event $B$,
$$\mathbb{P}(B) = \sum_{i \in I} \mathbb{P}(A_i \cap B)$$
Naturally, this result is useful when the probabilities of the intersections are known. Partitions usually arise in connection with a random variable. Suppose that $X$ is a random variable taking values in a countable set $T$, and that $B$ is an event. Then
$$\mathbb{P}(B) = \sum_{x \in T} \mathbb{P}(X = x, B)$$
In this formula, note that the comma acts like the intersection symbol in the previous formula.
The inclusion-exclusion formulas provide a method for computing the probability of a union of events in terms of the probabilities of the intersections of the events.
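Show that $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B)$ for any events $A$ and $B$.
Show that $\mathbb{P}(A \cup B \cup C) = \mathbb{P}(A) + \mathbb{P}(B) + \mathbb{P}(C) - \mathbb{P}(A \cap B) - \mathbb{P}(A \cap C) - \mathbb{P}(B \cap C) + \mathbb{P}(A \cap B \cap C)$ for any events $A$, $B$, and $C$.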
Hint: Use the inclusion-exclusion rule for two events. You will use this rule three times.
The last two exercises can be generalized to a union of $n$ events; the generalization is known as the inclusion-exclusion formula.
Suppose that $A_i$ is an event for each $i \in I$ where $\#(I) = n$. Show that
$$\mathbb{P}\left(\bigcup_{i \in I} A_i\right) = \sum_{k=1}^n (-1)^{k-1} \sum_{J \subseteq I, \; \#(J) = k} \mathbb{P}\left(\bigcap_{j \in J} A_j\right)$$
Hint: Use induction on $n$.
The general Bonferroni inequalities state that if the sum on the right in Exercise 20 is truncated after $k$ terms ($k < n$) then the truncated sum is an upper bound for the probability of the union if $k$ is odd (so that the last term has a positive sign) and is a lower bound for the probability of the union if $k$ is even (so that the last term has a negative sign).
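The inclusion-exclusion formula and the behavior of its truncated sums can be checked numerically on a small example. The sketch below is an illustration only; it assumes the discrete uniform distribution on the integers 1 through 30 and takes the three events to be the multiples of 2, 3, and 5. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

def uniform_prob(A, S):
    """Discrete uniform probability of A as a subset of the finite set S."""
    return Fraction(len(A), len(S))

def inclusion_exclusion_terms(events, S):
    """Signed terms of the formula: term k sums over all k-fold intersections."""
    terms = []
    for k in range(1, len(events) + 1):
        total = sum(uniform_prob(set.intersection(*J), S)
                    for J in combinations(events, k))
        terms.append((-1) ** (k - 1) * total)
    return terms

S = set(range(1, 31))                                       # assumed sample space
events = [{x for x in S if x % d == 0} for d in (2, 3, 5)]  # assumed events

terms = inclusion_exclusion_terms(events, S)
union = uniform_prob(set.union(*events), S)       # direct computation
partial = [sum(terms[:k]) for k in range(1, len(terms) + 1)]

print(union)    # 11/15
print(partial)  # truncated sums: 31/30, 7/10, 11/15
```

The first truncated sum (31/30) is an upper bound, the second (7/10) is a lower bound, and the full sum agrees with the direct computation, as the Bonferroni inequalities predict.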
If you go back and look at your proofs of the basic properties in Exercises 6-20, you will see that they hold for any finite measure $\mu$, not just probability. The only change is that the number 1 is replaced by $\mu(S)$. In particular, the inclusion-exclusion rule is as important in combinatorics (the study of counting measure) as it is in probability.
Intuitively, equivalent events or random variables are those that are indistinguishable from a probabilistic point of view. The purpose of this subsection is to make this idea precise.
Events $A$ and $B$ in a random experiment are said to be equivalent if the probability of the symmetric difference is 0:
$$\mathbb{P}\left[(A \setminus B) \cup (B \setminus A)\right] = 0$$
Show that equivalence really is an equivalence relation on the collection of events of a random experiment. Thus, the collection of events is partitioned into disjoint classes of mutually equivalent events.
Show that equivalent events have the same probability: if $A$ and $B$ are equivalent then $\mathbb{P}(A) = \mathbb{P}(B)$.
The converse fails with a passion. Consider the simple experiment of tossing a fair coin. Show that the event that the coin lands heads and the event that the coin lands tails have the same probability, but are not equivalent.
However, the null and almost sure events do form equivalence classes.
Now suppose that $X$ and $Y$ are random variables for an experiment, each taking values in a set $T$. Then $X$ and $Y$ are said to be equivalent if $\mathbb{P}(X = Y) = 1$.
Show that equivalence really is an equivalence relation on the collection of random variables that take values in $T$. Thus, the collection of such random variables is partitioned into disjoint classes of mutually equivalent variables.
Suppose that $X$ and $Y$ are equivalent random variables, taking values in a set $T$. Show that for any $B \subseteq T$, the events $\{X \in B\}$ and $\{Y \in B\}$ are equivalent. Conclude that $X$ and $Y$ have the same distribution.
Suppose that $A$ and $B$ are events for a random experiment. Show that $A$ and $B$ are equivalent if and only if the indicator random variables $\mathbf{1}_A$ and $\mathbf{1}_B$ are equivalent.
Suppose that $X$ and $Y$ are equivalent random variables, taking values in a set $T$, and that $g$ is a function from $T$ into a set $U$. Show that $g(X)$ and $g(Y)$ are equivalent.
Suppose that and are events in an experiment with , , . Express each of the following events in the language of the experiment and find its probability:
Suppose that , , and are events in an experiment with , , , , , , . Express each of the following events in set notation and find its probability:
Suppose that and are events in an experiment with , , and . Find the probability of each of the following events:
Suppose that and are events in an experiment with , , and . Find the probability of each of the following events:
Recall that the coin experiment consists of tossing coins and recording the sequence of scores (where 1 denotes heads and 0 denotes tails). Let denote the number of heads.
The experiment in the previous exercise is a special case of Bernoulli trials, named for Jacob Bernoulli. The number of heads has a binomial distribution.
Consider the coin experiment with 3 fair coins. Let $A$ be the event that the first coin is heads and $B$ the event that there are exactly 2 heads. Find each of the following probabilities:
In the Coin experiment, select 3 coins. Run the experiment 1000 times, updating after every run, and compute the empirical probability of each event in the previous exercise.
Recall that the dice experiment consists of throwing $n$ distinct, $k$-sided dice (with sides numbered from 1 to $k$) and recording the sequence of scores $(X_1, X_2, \ldots, X_n)$. Recall that this experiment serves as a generic example of multinomial trials and as a generic example of sampling with replacement from a finite population. The special case $k = 6$ corresponds to standard dice.
Suppose that 2 fair, standard dice are rolled and the sequence of scores recorded. Let $A$ denote the event that the first die score is less than 3 and $B$ the event that the sum of the dice scores is 6.
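Because the two-dice sample space has only 36 equally likely outcomes, probabilities of events like these can be found by direct enumeration. The sketch below writes A for the event that the first score is less than 3 and B for the event that the sum is 6; the particular combinations computed (intersection and union) are chosen for illustration and are an assumption about what is of interest. The helper `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space: all ordered pairs of scores, each with probability 1/36.
S = list(product(range(1, 7), repeat=2))

def prob(event):
    """Discrete uniform probability of the outcomes satisfying `event`."""
    return Fraction(sum(1 for s in S if event(s)), len(S))

def A(s):
    return s[0] < 3            # first die score is less than 3

def B(s):
    return s[0] + s[1] == 6    # sum of the scores is 6

p_A = prob(A)
p_B = prob(B)
p_both = prob(lambda s: A(s) and B(s))
p_either = prob(lambda s: A(s) or B(s))

print(p_A, p_B, p_both, p_either)  # 1/3 5/36 1/18 5/12
```

The last value agrees with the inclusion-exclusion rule for two events: 1/3 + 5/36 − 1/18 = 5/12.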
In the dice experiment, set $n = 2$. Run the experiment 100 times and compute the empirical probability of each event in the previous exercise.
Suppose that 2 fair, standard dice are rolled and the sequence of scores recorded. Let denote the sum of the scores, the minimum score, and the maximum score.
A pair of fair, standard dice are thrown repeatedly until the sum of the scores is either 5 or 7. Let denote the event that the sum of the scores on the last throw is 5 rather than 7. Events of this type are important in the game of craps.
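One way to reason about this event: on any single throw, the sum is 5 with probability 4/36 and 7 with probability 6/36, and throws with any other sum simply repeat the situation. So the probability that the game ends on a 5 should be (4/36) / (4/36 + 6/36) = 2/5. The sketch below compares this value with a simulation; the function name `five_before_seven`, the seed, and the number of runs are arbitrary choices for the example.

```python
import random
from fractions import Fraction

def five_before_seven(rng):
    """Throw two fair dice until the sum is 5 or 7; True if it ends on 5."""
    while True:
        total = rng.randint(1, 6) + rng.randint(1, 6)
        if total in (5, 7):
            return total == 5

# Exact value from conditioning on the throw that ends the game.
p5, p7 = Fraction(4, 36), Fraction(6, 36)
exact = p5 / (p5 + p7)

# Empirical probability from repeated independent games.
rng = random.Random(1)  # fixed seed for reproducibility
runs = 20_000
estimate = sum(five_before_seven(rng) for _ in range(runs)) / runs

print(exact, estimate)
```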
Recall that a standard card deck can be modeled by the product set
where the first coordinate encodes the denomination or kind (ace, 2-10, jack, queen, king) and where the second coordinate encodes the suit (clubs, diamonds, hearts, spades). Sometimes we represent a card as a string rather than an ordered pair (for example ).
Recall that the card experiment consists of dealing cards from a well-shuffled deck and recording the sequence of cards , where is the card. Let denote the unordered set of cards in the hand.
Recall also that the special case $n = 5$ is the poker experiment and the special case $n = 13$ is the bridge experiment. The poker experiment is studied in more detail in the chapter on Games of Chance.
Consider the card experiment with cards. For , let denote the event that card is a heart.
In the card experiment, set . Run the experiment 100 times and compute the empirical probability of each event in the previous exercise
In the poker experiment, find the probability of each of the following events:
Run the poker experiment 10000 times, updating every 10 runs. Compute the empirical probability of each event in the previous problem.
Find the probability that a bridge hand will contain no honor cards, that is, no cards of denomination 10, jack, queen, king, or ace. Such a hand is called a Yarborough, in honor of the second Earl of Yarborough.
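Since all 13-card hands are equally likely, this probability is a ratio of counts: the deck has 5 × 4 = 20 honor cards and 32 non-honor cards, so a Yarborough is a hand of 13 cards chosen from the 32 non-honor cards. A minimal sketch of the computation, using the standard library function `math.comb`:

```python
from fractions import Fraction
from math import comb

honor_cards = 5 * 4            # 10, jack, queen, king, ace in each of 4 suits
other_cards = 52 - honor_cards

# Hands with no honor cards, divided by all 13-card hands.
p_yarborough = Fraction(comb(other_cards, 13), comb(52, 13))

print(float(p_yarborough))     # about 0.000547
print(float(1 / p_yarborough)) # about one hand in 1828
```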
Find the probability that a bridge hand will contain
A card hand that contains no cards in a particular suit is said to be void in that suit. Use the inclusion-exclusion rule to find the probability of each of the following events:
Recall that in Buffon's coin experiment, a coin with radius is tossed "randomly" on a floor with square tiles of side length 1, and the coordinates of the center of the coin are recorded, relative to the center of the square in which the coin lands. Let denote the event that the coin does not touch the sides of the square, and let denote the random variable that gives the distance from the coin center to the origin.
In Buffon's coin experiment, set . Run the experiment 100 times and compute the empirical probability of each event in the previous exercise.
Recall the basic urn model in which an urn contains balls with distinct labels. A random sample of balls is chosen from the urn without replacement, and the sequence of ball labels is recorded. Let denote the unordered sample. Recall that this experiment serves as a generic example of sampling without replacement from a finite population.
Suppose that an urn contains $m$ balls; $r$ are red and the remaining $m - r$ are green. A random sample of $n$ balls is chosen without replacement. Let $Y$ denote the number of red balls in the sample. This urn model serves as a generic example of sampling without replacement from a finite, dichotomous population. Random variable $Y$ has a hypergeometric distribution. Determine the set of possible values of $Y$ and show that
$$\mathbb{P}(Y = k) = \frac{\binom{r}{k} \binom{m - r}{n - k}}{\binom{m}{n}}$$
Consider an urn with 30 balls; 10 are red and 20 are green. A sample of 5 balls is chosen at random. Explicitly compute the probabilities in the last exercise.
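The hypergeometric probabilities for this urn can be computed by counting: the number of samples with exactly k red balls is the number of ways to choose k of the 10 red balls times the number of ways to choose the remaining 5 − k from the 20 green balls, divided by the number of samples of size 5 from 30 balls. The sketch below uses hypothetical names (`hypergeometric_pmf`, `pmf`) and exact fractions.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(k, total, red, sample):
    """P(exactly k red balls) when sampling without replacement."""
    return Fraction(comb(red, k) * comb(total - red, sample - k),
                    comb(total, sample))

total, red, sample = 30, 10, 5
pmf = {k: hypergeometric_pmf(k, total, red, sample) for k in range(sample + 1)}

for k, p in pmf.items():
    print(k, p, round(float(p), 4))
```

The probabilities sum to 1, which is a useful check on the set of possible values and on the counting argument.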
In the simulation of the ball and urn experiment, select 30 balls with 10 red and 20 green, and select sample size 5. Run the experiment 1000 times and compare the empirical probabilities with the ones that you computed in the previous exercise.
An urn contains 12 balls: 5 are red, 4 are green, and 3 are blue. Three balls are chosen at random, without replacement.
Repeat the last exercise under the assumption that the balls are chosen with replacement.
First, let's consider an overly simplified model of an inherited trait that has two possible states, for example a pea plant whose pods are either green or yellow. A plant has two genes for the trait (one from each parent), so the possible genotypes are
The genotypes and are called homozygotes, while the genotype is called heterozygote. Typically, one of the states of the inherited trait is dominant and the other recessive. Thus, for example, if green is the dominant state for pod color, then a plant with genotype or has green pods, while a plant with genotype has yellow pods. Finally, the gene passed from a parent to a child is randomly selected from the parent's two genes. The inheritance of pea pod color was studied by Gregor Mendel, the father of modern genetics.
Let be the event that a child plant has genotype , the event that the plant has genotype , and the event that the plant has genotype . Find , , and in each of the following cases:
A sex-linked hereditary disorder is a disorder due to a defect on the X chromosome (one of the two chromosomes that determine gender). Suppose that denotes the normal gene and the defective gene linked to the disorder. Women have two X chromosomes, and is recessive. Thus, a woman with genotype is completely normal with respect to the condition; a woman with genotype does not have the disorder, but is a carrier, since she can pass the defective gene to her children; and a woman with genotype has the disorder. A man has only one X chromosome (his other sex chromosome, the Y chromosome, typically plays no role in the disorder). A man with genotype is normal and a man with genotype has the disorder. Examples of sex-linked hereditary disorders are dichromatism, the most common form of color-blindness, and the most common form of hemophilia, a bleeding disorder. The following exercises explore the transmission of a sex-linked hereditary disorder.
Let be the event that a son has the disorder, the event that a daughter is a carrier, and the event that a daughter has the disease. Find , and in each of the following cases:
From this exercise, note that transmission of the disorder to a daughter can only occur if the mother is at least a carrier and the father has the disorder. In ordinary large populations, this is an unusual intersection of events, and thus sex-linked hereditary disorders are typically much less common in women than in men. In brief, women are protected by the extra X chromosome.
Suppose that denotes the time between emissions (in milliseconds) for a certain type of radioactive material, and that has the following exponential distribution:
Suppose that denotes the number of emissions in a one millisecond interval for a certain type of radioactive material, and that has the following Poisson distribution:
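Suppose that a secretary prepares $n$ letters and corresponding envelopes to send to $n$ different people, but then stuffs the letters in the envelopes randomly. We are interested in the event $M$ that at least one letter is inserted into the proper envelope.
Use the steps below and the inclusion-exclusion rule to show that
$$\mathbb{P}(M) = \sum_{k=1}^n \frac{(-1)^{k-1}}{k!}$$
1. Let $M_i$ denote the event that letter $i$ is inserted into envelope $i$, so that $M = \bigcup_{i=1}^n M_i$.
2. If $J \subseteq \{1, 2, \ldots, n\}$ with $\#(J) = k$ then $\mathbb{P}\left(\bigcap_{j \in J} M_j\right) = (n - k)! / n!$.
3. There are $\binom{n}{k}$ subsets $J$ with $\#(J) = k$.
Use the previous result to show that $\mathbb{P}(M) \to 1 - e^{-1}$ as $n \to \infty$.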
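For small numbers of letters, the matching probability can be checked by brute force: list every way of stuffing the envelopes and count those with at least one letter in its proper envelope. The sketch below compares that count with the alternating sum of reciprocal factorials that inclusion-exclusion gives for this problem, and with the limiting value 1 − 1/e. The function names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial

def match_prob_formula(n):
    """Alternating sum from inclusion-exclusion: sum of (-1)^(k-1) / k!."""
    return sum(Fraction((-1) ** (k - 1), factorial(k)) for k in range(1, n + 1))

def match_prob_brute_force(n):
    """Fraction of the n! equally likely stuffings with at least one match."""
    perms = list(permutations(range(n)))
    hits = sum(1 for p in perms if any(p[i] == i for i in range(n)))
    return Fraction(hits, len(perms))

for n in range(1, 7):
    print(n, match_prob_formula(n), match_prob_brute_force(n))

# The probability settles quickly near 1 - 1/e, about 0.632.
print(float(match_prob_formula(10)), 1 - exp(-1))
```

Brute force is only practical for small n since the number of permutations grows as n!, but it gives an independent check on the formula.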
A complete analysis of the matching experiment is given in the chapter on Finite Sampling Models.
For the M&M data set, let denote the event that a bag has at least 10 red candies, the event that a bag has at least 57 candies total, and the event that a bag weighs at least 50 grams. Find the empirical probability of each of the following events:
For the cicada data, let denote the event that a cicada weighs at least 0.20 grams, the event that a cicada is female, and the event that a cicada is type tredecula. Find the empirical probability of each of the following: