As usual, suppose that we have a random experiment with sample space $S$ and probability measure $\mathbb{P}$. In this section, we will discuss independence, one of the fundamental concepts in probability theory. Independence is frequently invoked as a modeling assumption, and moreover, probability itself is based on the idea of independent replications of the experiment.
We will define independence on increasingly complex structures, from two events, to arbitrary collections of events, and then to collections of random variables. In each case, the basic idea is the same.
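Two events $A$ and $B$ are independent if
$$\mathbb{P}(A \cap B) = \mathbb{P}(A) \mathbb{P}(B)$$
If both of the events have positive probability, then independence is equivalent to the statement that the conditional probability of one event given the other is the same as the unconditional probability of the event:
$$\mathbb{P}(A \mid B) = \mathbb{P}(A), \quad \mathbb{P}(B \mid A) = \mathbb{P}(B)$$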
This is how you should think of independence: knowledge that one event has occurred does not change the probability assigned to the other event.
The terms independent and disjoint sound vaguely similar but they are actually very different. First, note that disjointness is purely a set-theory concept while independence is a probability (measure-theoretic) concept. Indeed, two events can be independent relative to one probability measure and dependent relative to another. But most importantly, two disjoint events can never be independent, except in the trivial case that one of the events is null.
Suppose that $A$ and $B$ are disjoint events for an experiment, each with positive probability. Show that $A$ and $B$ are negatively correlated and hence dependent.
If $A$ and $B$ are independent events in an experiment, it seems clear that any event that can be constructed from $A$ should be independent of any event that can be constructed from $B$. This is the case, as the next exercise shows.
Suppose that $A$ and $B$ are independent events in an experiment. Show that each of the following pairs of events is independent: $A^c$ and $B$; $A$ and $B^c$; $A^c$ and $B^c$.
Suppose that $A$ and $B$ are events in a random experiment. Show that
To extend the definition of independence to more than two events, we might think that we could just require pairwise independence, the independence of each pair of events. The exercise below shows that three events can be pairwise independent, but a combination of two of the events can be related to a third event in the strongest possible sense.
Consider the dice experiment that consists of rolling 2 standard, fair dice and recording the sequence of scores. Let $A$ denote the event that the first score is 3, $B$ the event that the second score is 4, and $C$ the event that the sum of the scores is 7.
In the dice experiment, set $n = 2$. Run the experiment 500 times. For each pair of events in the previous exercise, compute the product of the empirical probabilities and the empirical probability of the intersection. Compare the results.
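The pairwise independence in the two-dice exercise can also be checked exactly by enumerating the 36 equally likely outcomes. The following is a minimal sketch, not part of the original exercise: the labels `A`, `B`, `C` and the helpers `prob` and `both` are names chosen here for illustration.

```python
from fractions import Fraction
from itertools import product

# Sample space: the 36 equally likely ordered outcomes of two fair dice.
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Exact probability of an event given as a predicate on an outcome."""
    favorable = sum(1 for w in outcomes if event(w))
    return Fraction(favorable, len(outcomes))

def both(e, f):
    """Intersection of two events given as predicates."""
    return lambda w: e(w) and f(w)

# Illustrative labels for the three events in the exercise.
A = lambda w: w[0] == 3          # first score is 3
B = lambda w: w[1] == 4          # second score is 4
C = lambda w: w[0] + w[1] == 7   # sum of the scores is 7

# Each pair satisfies the product rule ...
for name, (e, f) in {"A,B": (A, B), "A,C": (A, C), "B,C": (B, C)}.items():
    print(name, prob(both(e, f)), prob(e) * prob(f))

# ... but the triple does not: A and B together force C to occur.
p_abc = prob(lambda w: A(w) and B(w) and C(w))
print("A,B,C", p_abc, prob(A) * prob(B) * prob(C))
```

Exact fractions are used so that the comparison is an equality test rather than a floating-point approximation.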
Another possible generalization would be to simply require the probability of the intersection of the events to be the product of the probabilities of the events. This is also not the right way to go, as the following exercise illustrates.
Suppose that we throw a standard, fair die one time. Let , .
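However, the definition of independence for two events does generalize in a natural way to an arbitrary collection of events. Specifically, a collection of events $\mathscr{A}$ is said to be independent if for every finite subcollection $\{A_1, A_2, \ldots, A_k\} \subseteq \mathscr{A}$,
$$\mathbb{P}\left(\bigcap_{i=1}^k A_i\right) = \prod_{i=1}^k \mathbb{P}(A_i)$$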
General independence of a collection of events is much stronger than mere pairwise independence of the events in the collection. The basic inheritance property in the following exercise is essentially equivalent to the definition.
Suppose that $\mathscr{A}$ is a collection of events
Show that there are $2^n - n - 1$ non-trivial conditions in the definition of the independence of $n$ events.
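The definition of independence for a finite collection requires the product rule for every subcollection with at least two events, which gives $2^n - n - 1$ conditions for $n$ events. The sketch below counts these conditions and checks them by brute force on a finite sample space with equally likely outcomes. The function names and the coin-toss example are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

def count_conditions(n):
    """Number of subcollections of size at least 2 among n events."""
    return sum(1 for k in range(2, n + 1) for _ in combinations(range(n), k))

def is_independent(events, outcomes):
    """Check the product rule for every subcollection of size >= 2.

    events: list of sets of outcomes; outcomes are assumed equally likely.
    """
    total = len(outcomes)
    p = lambda s: Fraction(len(s), total)
    for k in range(2, len(events) + 1):
        for sub in combinations(events, k):
            intersection = set(outcomes).intersection(*sub)
            if p(intersection) != prod((p(e) for e in sub), start=Fraction(1)):
                return False
    return True

# Example: three tosses of a fair coin, with H[i] = "toss i is heads".
coin_outcomes = list(product((0, 1), repeat=3))
H = [{w for w in coin_outcomes if w[i] == 1} for i in range(3)]
print(count_conditions(3), is_independent(H, coin_outcomes))

# Counterexample: two dice, first score 3, second score 4, sum 7.
dice_outcomes = list(product(range(1, 7), repeat=2))
dice_events = [
    {w for w in dice_outcomes if w[0] == 3},
    {w for w in dice_outcomes if w[1] == 4},
    {w for w in dice_outcomes if sum(w) == 7},
]
print(is_independent(dice_events, dice_outcomes))
```

The brute-force check is exponential in the number of events, so it is only practical for small collections.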
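If the finite collection of events $\{A_1, A_2, \ldots, A_n\}$ is independent, then it follows immediately from the definition that
$$\mathbb{P}\left(\bigcap_{i=1}^n A_i\right) = \prod_{i=1}^n \mathbb{P}(A_i)$$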
This is known as the multiplication rule for independent events. Compare this with the general multiplication rule for conditional probability.
Show that the collection of essentially deterministic events is an independent collection of events.
Suppose that , , , and are independent events in an experiment. Show directly that and are independent.
Compare the previous exercise with Exercise 2. The complete generalization of these results is a bit complicated, but goes roughly as follows: Suppose that $\mathscr{A}$ is an independent collection of events and that $\{\mathscr{B}_j : j \in J\}$ is a pairwise disjoint set of subcollections of $\mathscr{A}$ (where $J$ is an index set). That is, $\mathscr{B}_j \subseteq \mathscr{A}$ for each $j \in J$ and $\mathscr{B}_j \cap \mathscr{B}_k = \emptyset$ for distinct $j$ and $k$. Suppose now that for each $j \in J$, an event $B_j$ is constructed from the events in $\mathscr{B}_j$ using a countable number of set operations. Then the collection of events $\{B_j : j \in J\}$ is independent.
The following exercise gives a formula for the probability of the union of a collection of independent events that is much nicer than the inclusion-exclusion formula.
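Suppose that $\{A_1, A_2, \ldots, A_n\}$ is a finite collection of independent events. Show that
$$\mathbb{P}\left(\bigcup_{i=1}^n A_i\right) = 1 - \prod_{i=1}^n \left[1 - \mathbb{P}(A_i)\right]$$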
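For independent events, the probability of the union is one minus the product of the complementary probabilities. The sketch below compares that formula with the inclusion-exclusion sum, in which each intersection probability is replaced by a product (valid only under independence). The function names and the sample probabilities are chosen here for illustration.

```python
from itertools import combinations
from math import isclose, prod

def union_independent(probs):
    """P(union) for independent events via the complement rule."""
    return 1 - prod(1 - p for p in probs)

def union_inclusion_exclusion(probs):
    """Inclusion-exclusion, using P(intersection) = product of probabilities."""
    total = 0.0
    for k in range(1, len(probs) + 1):
        sign = (-1) ** (k + 1)
        total += sign * sum(prod(c) for c in combinations(probs, k))
    return total

probs = [0.2, 0.5, 0.7]  # hypothetical probabilities of three independent events
print(union_independent(probs), union_inclusion_exclusion(probs))
```

The complement form needs $n$ multiplications, while inclusion-exclusion sums over $2^n - 1$ subcollections.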
Suppose that $\mathscr{A}$ is an independent collection of events, each with the same probability. Show that $\mathscr{A}$ is also an exchangeable collection of events. The converse is not true, as Pólya's urn model so dramatically illustrates.
Suppose now that $X_i$ is a random variable taking values in $T_i$ for each $i$ in a nonempty index set $I$. Intuitively, the random variables are independent if knowledge of the values of some of the variables tells us nothing about the values of the other variables. Mathematically, independence of random variables can be reduced to the independence of events. Formally, the collection of random variables $\{X_i : i \in I\}$ is independent if every collection of events of the following form is independent:
$$\left\{\{X_i \in B_i\} : i \in I\right\}, \quad \text{where } B_i \subseteq T_i \text{ for each } i \in I$$
Equivalently then, the collection of random variables is independent if for every finite subset $J$ of $I$, and for every choice of $B_j \subseteq T_j$ for $j \in J$, we have
$$\mathbb{P}\left(\bigcap_{j \in J} \{X_j \in B_j\}\right) = \prod_{j \in J} \mathbb{P}(X_j \in B_j)$$
Suppose that $\{X_i : i \in I\}$ is a collection of random variables. Show that
Suppose that $\{X_i : i \in I\}$ is a collection of independent random variables, as above, and suppose that for each $i \in I$, $g_i$ is a function from $T_i$ into a set $U_i$. Show that the collection of random variables $\{g_i(X_i) : i \in I\}$ is also independent.
Independence of random variables subsumes independence of events. Show that a collection of events is independent if and only if the corresponding collection of indicator variables is independent.
Many of the concepts that we have been using informally can now be made precise. A compound experiment that consists of $n$ independent stages is essentially just an experiment whose outcome is a sequence of independent random variables $(X_1, X_2, \ldots, X_n)$ where $X_i$ is the outcome of the $i$th stage.
In particular, suppose that we have a basic experiment with outcome variable $X$. By definition, the outcome of the experiment that consists of independent replications of the basic experiment is a sequence of independent random variables $(X_1, X_2, \ldots)$, each with the distribution of $X$. This is fundamental to the very concept of probability, as expressed in the law of large numbers. From a statistical point of view, suppose that we have a population of objects and a vector of measurements $X$ of interest for the objects in the sample. The sequence $(X_1, X_2, \ldots)$ above corresponds to sampling from the distribution of $X$; that is, $X_i$ is the vector of measurements for the $i$th object drawn from the sample. When we sample from a finite population, sampling with replacement generates independent random variables while sampling without replacement generates dependent random variables.
As noted at the beginning of our discussion, independence of events or random variables depends on the underlying probability measure. Thus, suppose that $B$ is an event in a random experiment with positive probability. A collection of events or a collection of random variables is conditionally independent given $B$ if the collection is independent relative to the conditional probability measure $A \mapsto \mathbb{P}(A \mid B)$. Note that the definitions and theorems of this section would still be true, but with all probabilities conditioned on $B$.
The following exercises give an important interpretation of conditional probability. Suppose that we start with a basic experiment that has sample space $S$ and outcome variable $X$. In particular, recall that $X$ is simply the identity function on $S$ so that if $A$ is an event, then trivially, $\mathbb{P}(X \in A) = \mathbb{P}(A)$.
Suppose now that we replicate the experiment independently. This results in a new, compound experiment with a sequence of independent random variables $(X_1, X_2, \ldots)$, each with the same distribution as $X$. Suppose now that $A$ and $B$ are events in the basic experiment (that is, subsets of $S$) with $\mathbb{P}(B) > 0$.
Show that, in the compound experiment, the event that when $B$ occurs for the first time, $A$ also occurs is
$$\bigcup_{n=1}^\infty \left\{X_1 \notin B, X_2 \notin B, \ldots, X_{n-1} \notin B, X_n \in A \cap B\right\}$$
Show that the probability of the event in the last exercise is
$$\frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} = \mathbb{P}(A \mid B)$$
Argue the result in the last exercise directly. Specifically, suppose that we create a new experiment by repeating the basic experiment until $B$ occurs for the first time, and then record the outcome of just the last repetition of the basic experiment. Argue that the appropriate probability measure on the new experiment is $A \mapsto \mathbb{P}(A \mid B)$.
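Suppose that $A$ and $B$ are disjoint events in a basic experiment with $\mathbb{P}(A) > 0$ and $\mathbb{P}(B) > 0$. In the compound experiment obtained by replicating the basic experiment, show that the event that $A$ occurs before $B$ has probability
$$\frac{\mathbb{P}(A)}{\mathbb{P}(A) + \mathbb{P}(B)}$$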
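One way to see the "occurs before" probability is to sum over the replication on which the first event occurs: the earlier replications must produce neither event. The sketch below compares a partial sum of that geometric series with the closed form $\mathbb{P}(A) / [\mathbb{P}(A) + \mathbb{P}(B)]$. The function names are hypothetical, and the numerical example (a sum of 4 versus a sum of 7 with two fair dice) is chosen here for illustration.

```python
from fractions import Fraction

def before_closed_form(p_a, p_b):
    """Closed form for the probability that A occurs before B (A, B disjoint)."""
    return p_a / (p_a + p_b)

def before_partial_sum(p_a, p_b, terms=200):
    """Partial sum of sum_{n>=1} (1 - p_a - p_b)^(n-1) * p_a.

    Term n is the probability that neither event occurs on the first
    n - 1 replications and A occurs on replication n.
    """
    neither = 1 - p_a - p_b
    return sum(float(neither) ** (n - 1) * float(p_a) for n in range(1, terms + 1))

# Two fair dice: 3 of 36 outcomes sum to 4, and 6 of 36 outcomes sum to 7.
p_four = Fraction(3, 36)
p_seven = Fraction(6, 36)
print(before_closed_form(p_four, p_seven), before_partial_sum(p_four, p_seven))
```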
Suppose that , , and are independent events in an experiment with , , and . Express each of the following events in set notation and find its probability:
Suppose that , and are independent events for an experiment with , , . Find the probability of each of the following events:
A small company has 100 employees; 40 are men and 60 are women. There are 6 male executives. How many female executives should there be if gender and rank are independent? (The underlying experiment is to choose an employee at random.)
Suppose that 3 students, who ride together, miss a mathematics exam. They decide to lie to the instructor by saying that the car had a flat tire. The instructor separates the students and asks each of them which tire was flat. The students, who did not anticipate this, select their answers independently and at random. Find the probability that the students get away with their deception. For a more extensive treatment of the lying students problem, see The Number of Distinct Sample Values in the Chapter on Finite Sampling Models.
A Bernoulli trials sequence is a sequence $(X_1, X_2, \ldots)$ of independent, identically distributed indicator variables. Random variable $X_i$ is the outcome of trial $i$, where in the usual terminology of reliability theory, 1 denotes success and 0 denotes failure. The canonical example is the sequence of scores when a coin (not necessarily fair) is tossed repeatedly. The process is named for Jacques Bernoulli, and has a single basic parameter $p = \mathbb{P}(X_i = 1)$. This random process is studied in detail in the chapter on Bernoulli trials.
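Show that for $(x_1, x_2, \ldots, x_n) \in \{0, 1\}^n$,
$$\mathbb{P}(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) = p^{x_1 + x_2 + \cdots + x_n} (1 - p)^{n - (x_1 + x_2 + \cdots + x_n)}$$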
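Let $Y_n$ denote the number of successes in the first $n$ trials. Show that
$$\mathbb{P}(Y_n = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \quad k \in \{0, 1, \ldots, n\}$$
The distribution of $Y_n$ is called the binomial distribution with parameters $n$ and $p$. The binomial distribution is studied in detail in the chapter on Bernoulli Trials.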
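The binomial probabilities follow from independence: every outcome sequence with $k$ successes in $n$ trials has probability $p^k (1 - p)^{n - k}$, and there are $\binom{n}{k}$ such sequences. The following sketch compares the closed form with a brute-force sum over all outcome sequences. The function names and the parameter values $n = 5$, $p = 0.3$ are assumptions made for illustration.

```python
from itertools import product
from math import comb, isclose

def binomial_pmf(n, k, p):
    """P(exactly k successes in n Bernoulli trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def pmf_by_enumeration(n, k, p):
    """Same probability, summing over every 0/1 outcome sequence of length n."""
    total = 0.0
    for x in product((0, 1), repeat=n):
        if sum(x) == k:
            # By independence, the sequence probability is a product of factors.
            total += p ** sum(x) * (1 - p) ** (n - sum(x))
    return total

n, p = 5, 0.3  # hypothetical parameters
for k in range(n + 1):
    print(k, binomial_pmf(n, k, p), pmf_by_enumeration(n, k, p))
```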
More generally, a multinomial trials sequence is a sequence of independent, identically distributed random variables, each taking values in a finite set $S$. The canonical example is the sequence of scores when a $k$-sided die (not necessarily fair) is thrown repeatedly. Multinomial trials are studied in detail in the chapter on Bernoulli trials.
Consider the experiment that consists of dealing 2 cards from a standard deck and recording the sequence of cards dealt. For $i \in \{1, 2\}$, let $Q_i$ be the event that card $i$ is a queen and $H_i$ the event that card $i$ is a heart. Compute the appropriate probabilities to verify the following results. Reflect on these results.
In the card experiment, set $n = 2$. Run the simulation 500 times. For each pair of events in the previous exercise, compute the product of the empirical probabilities and the empirical probability of the intersection. Compare the results.
Suppose that a standard, fair die is thrown 5 times. Find the probability of getting at least one six.
Suppose that a pair of standard, fair dice are thrown 10 times. Find the probability of getting at least one double six.
Consider the dice experiment that consists of rolling $n$ dice, each with $k$ sides, and recording the sequence of scores $(X_1, X_2, \ldots, X_n)$. Show that the following conditions are equivalent (and correspond to the assumption that the dice are fair):
A pair of standard, fair dice are rolled. Find the probability that a sum of 4 occurs before a sum of 7. Problems of this type are important in the game of craps.
A biased coin with probability of heads is tossed 5 times. Let denote the number of heads. Explicitly compute for .
A box contains a fair coin and a two-headed coin. A coin is chosen at random from the box and tossed repeatedly. Let $F$ denote the event that the fair coin is chosen, and let $X_i$ denote the outcome of toss $i$ (where 1 encodes heads and 0 encodes tails).
Consider again the box in the previous exercise, but we change the experiment as follows: a coin is chosen at random from the box and tossed and the result recorded. The coin is returned to the box and the process is repeated. As before, let $X_i$ denote the outcome of toss $i$. Show that $(X_1, X_2, \ldots)$ is a Bernoulli trials sequence with parameter $p = \frac{3}{4}$. Specifically,
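The two versions of the coin-box experiment differ in a way that a short exact computation makes visible. When a single coin is chosen once and tossed repeatedly, the tosses are conditionally independent given the coin but not independent. When the coin is redrawn before every toss, the tosses are independent. The sketch below is only an illustration; the variable names are chosen here.

```python
from fractions import Fraction

half = Fraction(1, 2)
# Probability of heads for each coin in the box; each coin is chosen with probability 1/2.
heads_prob = {"fair": half, "two-headed": Fraction(1)}

# Probability of heads on any single toss (same in both experiments).
p_head = sum(half * h for h in heads_prob.values())

# Experiment 1: choose one coin, then toss it twice.
# Given the coin, the tosses are independent, so condition on the coin.
p_two_heads_same_coin = sum(half * h**2 for h in heads_prob.values())

# Experiment 2: redraw the coin before each toss, so the tosses are independent.
p_two_heads_redraw = p_head**2

print(p_head)                  # 3/4
print(p_two_heads_same_coin)   # 5/8, not equal to (3/4)^2
print(p_two_heads_redraw)      # 9/16
```

In the first experiment a head on the first toss makes the two-headed coin more likely, which raises the probability of a head on the second toss.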
Recall that Buffon's coin experiment consists of tossing a coin with radius $r$ randomly on a floor covered with square tiles of side length 1. The coordinates $(X, Y)$ of the center of the coin are recorded relative to axes through the center of the square in which the coin lands. Show that the following conditions are equivalent:
In Buffon's coin experiment, set . Run the simulation 500 times. For the events and , compute the product of the empirical probabilities and the empirical probability of the intersection. Compare the results.
The arrival time of the train is uniformly distributed on the interval , while the arrival time of the train is uniformly distributed on the interval . (The arrival times are in minutes, after 8:00 AM). Moreover, the arrival times are independent.
Recall the simple model of structural reliability in which a system is composed of $n$ components. Suppose in addition that the components operate independently of each other. As before, let $X_i$ denote the state of component $i$, where 1 means working and 0 means failure. Thus, our basic assumption is that $(X_1, X_2, \ldots, X_n)$ is a vector of independent indicator random variables that specifies the states of all of the components. We assume that the state of the system (either working or failed) depends only on the states of the components, according to a structure function. Thus, the state of the system is an indicator random variable
$$Y = Y(X_1, X_2, \ldots, X_n)$$
Generally, the probability that a device is working is the reliability of the device. Thus, we will denote the reliability of component $i$ by $p_i = \mathbb{P}(X_i = 1)$ so that the vector of component reliabilities is $(p_1, p_2, \ldots, p_n)$. By independence, the system reliability $r$ is a function of the component reliabilities:
$$r(p_1, p_2, \ldots, p_n) = \mathbb{P}(Y = 1)$$
Appropriately enough, this function is known as the reliability function. Our challenge is usually to find the reliability function, given the structure function. When the components all have the same probability $p$ then of course the system reliability $r$ is just a function of $p$. In this case, the component states $(X_1, X_2, \ldots, X_n)$ form a sequence of Bernoulli trials.
Comment on the independence assumption for real systems, such as your car or your computer.
Recall that a series system is working if and only if each component is working. Show that
$$r(p_1, p_2, \ldots, p_n) = p_1 p_2 \cdots p_n$$
Recall that a parallel system is working if and only if at least one component is working. Show that
$$r(p_1, p_2, \ldots, p_n) = 1 - (1 - p_1)(1 - p_2) \cdots (1 - p_n)$$
Recall that a $k$ out of $n$ system is working if and only if at least $k$ of the $n$ components are working. Thus, a parallel system is a 1 out of $n$ system and a series system is an $n$ out of $n$ system. A $k$ out of $2k - 1$ system is a majority rules system. The reliability function of a general $k$ out of $n$ system is a mess. However, if the component reliabilities are the same, the function has a reasonably simple form.
Show that for a $k$ out of $n$ system with common component reliability $p$, the system reliability is
$$r(p) = \sum_{i=k}^n \binom{n}{i} p^i (1 - p)^{n - i}$$
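The reliability functions for series, parallel, and $k$ out of $n$ systems with independent components translate directly into code. The following is a minimal sketch; the function names and the sample reliabilities are hypothetical and are not taken from the exercises.

```python
from math import comb, isclose, prod

def series_reliability(ps):
    """All components must work: product of the component reliabilities."""
    return prod(ps)

def parallel_reliability(ps):
    """At least one component must work: complement of 'all fail'."""
    return 1 - prod(1 - p for p in ps)

def k_out_of_n_reliability(k, n, p):
    """At least k of n components work, common reliability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

ps = [0.9, 0.8, 0.95]  # hypothetical component reliabilities
print(series_reliability(ps), parallel_reliability(ps))

# With a common reliability, the series and parallel systems are the
# n out of n and 1 out of n special cases.
p, n = 0.9, 3
print(k_out_of_n_reliability(n, n, p), k_out_of_n_reliability(1, n, p))
print(k_out_of_n_reliability(2, 3, p))  # majority rules with 3 components
```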
Consider a system of 3 components with reliabilities , , . Find the reliability of each of the following:
Consider an airplane with an odd number of engines, each with reliability $p$. Suppose that the airplane is a majority rules system, so that the airplane needs a majority of working engines in order to fly.
The graph below is known as the Wheatstone bridge network and is named for Charles Wheatstone. The edges represent components, and the system works if and only if there is a working path from vertex to vertex .
A system consists of 3 components, connected in parallel. Because of environmental factors, the components do not operate independently, so our usual assumption does not hold. However, we will assume that under low stress conditions, the components are independent, each with reliability 0.9; under medium stress conditions, the components are independent, each with reliability 0.8; and under high stress conditions, the components are independent, each with reliability 0.7. The probability of low stress is 0.5, of medium stress is 0.3, and of high stress is 0.2.
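The stress model is a case of conditional independence: the components are independent given the stress level, so the system reliability can be found by conditioning on the stress level and applying the parallel-system formula within each level. The sketch below is one way to set up that computation, with names chosen here for illustration. It also shows the different value obtained if the components are wrongly treated as unconditionally independent.

```python
# Stress level -> (probability of the level, component reliability at that level).
stress = {"low": (0.5, 0.9), "medium": (0.3, 0.8), "high": (0.2, 0.7)}

def parallel3(p):
    """Reliability of 3 independent parallel components with common reliability p."""
    return 1 - (1 - p) ** 3

# Law of total probability, conditioning on the stress level.
system_reliability = sum(w * parallel3(p) for w, p in stress.values())

# Marginal reliability of a single component.
component_reliability = sum(w * p for w, p in stress.values())

# Incorrect shortcut: pretend the components are unconditionally independent.
naive_reliability = parallel3(component_reliability)

print(system_reliability, component_reliability, naive_reliability)
```

The gap between the two system values reflects the positive dependence that the shared stress level induces among the components.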
Please recall the discussion of diagnostic testing in the section on Conditional Probability. Thus, we have an event $A$ for a random experiment whose occurrence or non-occurrence we cannot observe directly. Suppose now that we have $n$ tests for the occurrence of $A$, labeled from 1 to $n$. We will let $T_i$ denote the event that test $i$ is positive for $A$. The tests are independent in the following sense:
If $A$ occurs, then $T_1, T_2, \ldots, T_n$ are (conditionally) independent and test $i$ has sensitivity $a_i = \mathbb{P}(T_i \mid A)$.
If $A$ does not occur, then $T_1, T_2, \ldots, T_n$ are (conditionally) independent and test $i$ has specificity $b_i = \mathbb{P}(T_i^c \mid A^c)$.
We can form a new, compound test by giving a decision rule in terms of the individual test results. In other words, the event $T$ that the compound test is positive for $A$ is a function of $T_1, T_2, \ldots, T_n$. The typical decision rules are very similar to the reliability structures discussed above. A special case of interest is when the $n$ tests are independent applications of a given basic test. In this case, $a_i = a$ and $b_i = b$ for each $i$.
Consider the compound test that is positive for $A$ if and only if each of the $n$ tests is positive for $A$. Show that the sensitivity is $a_1 a_2 \cdots a_n$ and the specificity is $1 - (1 - b_1)(1 - b_2) \cdots (1 - b_n)$.
Consider the compound test that is positive for $A$ if and only if at least one of the $n$ tests is positive for $A$. Show that the sensitivity is $1 - (1 - a_1)(1 - a_2) \cdots (1 - a_n)$ and the specificity is $b_1 b_2 \cdots b_n$.
More generally, we could define the compound $k$ out of $n$ test that is positive for $A$ if and only if at least $k$ of the individual tests are positive for $A$. The test in Exercise 46 is the $n$ out of $n$ test, while the test in Exercise 47 is the 1 out of $n$ test. The $k$ out of $2k - 1$ test is the majority rules test.
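The compound tests mirror the series and parallel reliability structures. The sketch below computes the sensitivity and specificity of the "all positive" and "at least one positive" rules from the individual sensitivities and specificities, using conditional independence of the tests given the true state. The function names and the numerical values are hypothetical.

```python
from math import isclose, prod

def all_positive_rule(sens, spec):
    """Compound test positive iff every individual test is positive.

    Returns (sensitivity, specificity) of the compound test.
    """
    sensitivity = prod(sens)                     # all tests positive, given the event
    specificity = 1 - prod(1 - b for b in spec)  # not all positive, given no event
    return sensitivity, specificity

def any_positive_rule(sens, spec):
    """Compound test positive iff at least one individual test is positive."""
    sensitivity = 1 - prod(1 - a for a in sens)  # not all negative, given the event
    specificity = prod(spec)                     # all tests negative, given no event
    return sensitivity, specificity

# Three applications of a hypothetical basic test.
sens = [0.95] * 3
spec = [0.90] * 3
print(all_positive_rule(sens, spec))
print(any_positive_rule(sens, spec))
```

Requiring every test to be positive trades sensitivity for specificity, and accepting any positive result does the reverse.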
Suppose that a woman initially believes that there is an even chance that she is pregnant or not pregnant. She buys three identical pregnancy tests with sensitivity 0.95 and specificity 0.90. Tests 1 and 3 are positive and test 2 is negative. Find the probability that the woman is pregnant.
Suppose that 3 independent, identical tests for an event $A$ are applied, each with sensitivity $a$ and specificity $b$. Find the sensitivity and specificity of the following tests:
In a criminal trial, the defendant is convicted if and only if all 6 jurors vote guilty. Assume that if the defendant really is guilty, the jurors vote guilty, independently, with probability 0.95, while if the defendant is really innocent, the jurors vote not guilty, independently with probability 0.8. Suppose that 70% of defendants brought to trial are guilty.
Recall our discussion of genetics in the section on Probability Measure and our discussion of genetics in the section on Conditional Probability. For a given genetic trait (such as eye color or the presence of a disorder), it's usually reasonable to assume that the genotypes of the children are conditionally independent, given the genotypes of the parents. Unconditionally, however, the state of a child (for the given trait) gives information about the states of the parents, which in turn give information about the states of other children.
In the following exercise, suppose that a certain type of pea plant has either green pods or yellow pods, and that the green-pod gene is dominant. Thus, a plant with genotype $gg$ or $gy$ has green pods, while a plant with genotype $yy$ has yellow pods.
Suppose that 2 green-pod plants are bred together. Suppose further that each plant, independently, has the recessive yellow-pod gene with probability
In the following exercise, consider a sex-linked hereditary disorder associated with a gene on the X chromosome. As before, let $h$ denote the dominant normal gene and $d$ the recessive defective gene. Thus, a woman of genotype $hh$ is normal; a woman of genotype $hd$ is free of the disease, but is a carrier; and a woman of genotype $dd$ has the disease. A man of genotype $h$ is normal and a man of genotype $d$ has the disease.
Suppose that a healthy woman initially has a $\frac{1}{2}$ chance of being a carrier. (From our discussion above, this would be the case, for example, if her mother and father are healthy but she has a brother with the disorder, so that her mother must be a carrier).
Suppose that we have $m + 1$ coins, labeled $0, 1, \ldots, m$. Coin $i$ lands heads with probability $\frac{i}{m}$ for each $i$. In particular, note that coin 0 is two-tailed and coin $m$ is two-headed. Our experiment is to choose a coin at random (so that each coin is equally likely to be chosen) and then toss the chosen coin repeatedly.
Show that the probability that the first $n$ tosses are all heads is
$$p_{m,n} = \frac{1}{m + 1} \sum_{i=0}^m \left(\frac{i}{m}\right)^n$$
Show that the conditional probability that toss $n + 1$ is heads given that the previous $n$ tosses were all heads is
$$\frac{p_{m,n+1}}{p_{m,n}}$$
Interpret $p_{m,n}$ as an approximating sum for the integral $\int_0^1 x^n \, dx$ to show that
$$p_{m,n} \to \frac{1}{n + 1} \text{ as } m \to \infty$$
Conclude that
$$\frac{p_{m,n+1}}{p_{m,n}} \to \frac{n + 1}{n + 2} \text{ as } m \to \infty$$
The limiting conditional probability in the last exercise is called Laplace's Rule of Succession, named after Pierre-Simon Laplace. This rule was used by Laplace and others as a general principle for estimating the conditional probability that an event will occur on time $n + 1$, given that the event has occurred $n$ times in succession.
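The convergence behind the rule of succession can be checked numerically. The sketch below computes the probability of $n$ initial heads when one of $m + 1$ coins is chosen at random (coin $i$ having heads probability $i / m$), and then the conditional probability of one more head. The function names and the choices $m = 10000$, $n = 10$ are assumptions for illustration.

```python
def all_heads_prob(m, n):
    """P(first n tosses are all heads) with m + 1 equally likely coins,
    where coin i has heads probability i / m."""
    return sum((i / m) ** n for i in range(m + 1)) / (m + 1)

def next_head_given_all_heads(m, n):
    """P(toss n + 1 is heads | first n tosses are all heads)."""
    return all_heads_prob(m, n + 1) / all_heads_prob(m, n)

m, n = 10_000, 10  # hypothetical values; larger m moves closer to the limit
print(all_heads_prob(m, n), 1 / (n + 1))
print(next_head_given_all_heads(m, n), (n + 1) / (n + 2))
```

For a fixed $n$, the printed values approach the limits $1 / (n + 1)$ and $(n + 1) / (n + 2)$ as $m$ grows.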
Suppose that a missile has had 10 successful tests in a row. Compute Laplace's estimate that the 11th test will be successful. Does this make sense?
Comment on the validity of Laplace's rule as a general principle.