In probability theory and statistics, the conditional probability distribution is a probability distribution that describes the probability of an outcome given the occurrence of a particular event. Given two jointly distributed random variables $X$ and $Y$, the conditional probability distribution of $Y$ given $X$ is the probability distribution of $Y$ when $X$ is known to be a particular value; in some cases the conditional probabilities may be expressed as functions containing the unspecified value $x$ of $X$ as a parameter. When both $X$ and $Y$ are categorical variables, a conditional probability table is typically used to represent the conditional probability. The conditional distribution contrasts with the marginal distribution of a random variable, which is its distribution without reference to the value of the other variable.
If the conditional distribution of $Y$ given $X$ is a continuous distribution, then its probability density function is known as the conditional density function.[1] The properties of a conditional distribution, such as the moments, are often referred to by corresponding names such as the conditional mean and conditional variance.
More generally, one can refer to the conditional distribution of a subset of a set of more than two variables; this conditional distribution is contingent on the values of all the remaining variables, and if more than one variable is included in the subset then this conditional distribution is the conditional joint distribution of the included variables.
For discrete random variables, the conditional probability mass function of $Y$ given $X = x$ can be written according to its definition as:

$$p_{Y|X}(y \mid x) \equiv P(Y = y \mid X = x) = \frac{P(\{X = x\} \cap \{Y = y\})}{P(X = x)}.$$
Due to the occurrence of $P(X = x)$ in the denominator, this is defined only for non-zero (hence strictly positive) $P(X = x)$.
The relation with the probability distribution of $X$ given $Y$ is:

$$P(Y = y \mid X = x)\, P(X = x) = P(\{X = x\} \cap \{Y = y\}) = P(X = x \mid Y = y)\, P(Y = y).$$
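Rearranging this relation gives Bayes' rule for the conditional mass function, stated here for completeness:

$$P(X = x \mid Y = y) = \frac{P(Y = y \mid X = x)\, P(X = x)}{P(Y = y)}, \qquad P(Y = y) > 0.$$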
Consider the roll of a fair die and let $X = 1$ if the number is even (i.e., 2, 4, or 6) and $X = 0$ otherwise. Furthermore, let $Y = 1$ if the number is prime (i.e., 2, 3, or 5) and $Y = 0$ otherwise.
D | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
X | 0 | 1 | 0 | 1 | 0 | 1 |
Y | 0 | 1 | 1 | 0 | 1 | 0 |
Then the unconditional probability that $X = 1$ is 3/6 = 1/2 (since there are six possible rolls of the die, of which three are even), whereas the probability that $X = 1$ conditional on $Y = 1$ is 1/3 (since there are three possible prime number rolls, namely 2, 3, and 5, of which one is even).
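These two values can be checked by brute-force enumeration of the sample space. The following minimal Python sketch (the function and variable names are illustrative, not from the article) tabulates the outcomes and computes both probabilities exactly:

```python
from fractions import Fraction

# Outcomes of a fair six-sided die, each with probability 1/6.
outcomes = [1, 2, 3, 4, 5, 6]

def X(d):
    return 1 if d % 2 == 0 else 0      # 1 if the roll is even

def Y(d):
    return 1 if d in (2, 3, 5) else 0  # 1 if the roll is prime

# Unconditional probability P(X = 1): count even rolls over all rolls.
p_x1 = Fraction(sum(1 for d in outcomes if X(d) == 1), len(outcomes))

# Conditional probability P(X = 1 | Y = 1): restrict to prime rolls first.
primes = [d for d in outcomes if Y(d) == 1]
p_x1_given_y1 = Fraction(sum(1 for d in primes if X(d) == 1), len(primes))

print(p_x1)            # 1/2
print(p_x1_given_y1)   # 1/3
```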
Similarly for continuous random variables, the conditional probability density function of $Y$ given the occurrence of the value $x$ of $X$ can be written as[2]

$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)},$$
where $f_{X,Y}(x, y)$ gives the joint density of $X$ and $Y$, while $f_X(x)$ gives the marginal density for $X$. Also in this case it is necessary that $f_X(x) > 0$.
The relation with the probability distribution of $X$ given $Y$ is given by:

$$f_{Y|X}(y \mid x)\, f_X(x) = f_{X,Y}(x, y) = f_{X|Y}(x \mid y)\, f_Y(y).$$
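As a worked illustration (not part of the original text), suppose $(X, Y)$ is uniformly distributed on the triangle $0 < y < x < 1$, so that $f_{X,Y}(x, y) = 2$ there. The marginal density of $X$ is $f_X(x) = \int_0^x 2 \, dy = 2x$, and the definition above gives

$$f_{Y|X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)} = \frac{2}{2x} = \frac{1}{x}, \qquad 0 < y < x,$$

that is, given $X = x$, the variable $Y$ is uniform on $(0, x)$.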
The concept of the conditional distribution of a continuous random variable is not as intuitive as it might seem: Borel's paradox shows that conditional probability density functions need not be invariant under coordinate transformations.
The graph shows a bivariate normal joint density for random variables $X$ and $Y$. To see the distribution of $Y$ conditional on $X = x$, one can first visualize the line $X = x$ in the $X,Y$ plane, and then visualize the plane containing that line and perpendicular to the $X,Y$ plane. The intersection of that plane with the joint normal density, once rescaled to give unit area under the intersection, is the relevant conditional density of $Y$.
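In the bivariate normal case this conditional density has a well-known closed form. Writing $\mu_X, \mu_Y, \sigma_X, \sigma_Y$ for the means and standard deviations and $\rho$ for the correlation (standard parameter names, not taken from the figure), the conditional distribution is again normal:

$$Y \mid X = x \;\sim\; \mathcal{N}\!\left(\mu_Y + \rho\,\frac{\sigma_Y}{\sigma_X}(x - \mu_X),\; (1 - \rho^2)\,\sigma_Y^2\right).$$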
Random variables $X$, $Y$ are independent if and only if the conditional distribution of $Y$ given $X$ is, for all possible realizations of $X$, equal to the unconditional distribution of $Y$. For discrete random variables this means $P(Y = y \mid X = x) = P(Y = y)$ for all possible $x$ and $y$ with $P(X = x) > 0$. For continuous random variables $X$ and $Y$, having a joint density function, it means $f_{Y|X}(y \mid x) = f_Y(y)$ for all possible $x$ and $y$ with $f_X(x) > 0$.
Seen as a function of $y$ for given $x$, $P(Y = y \mid X = x)$ is a probability mass function and so the sum over all $y$ (or integral if it is a conditional probability density) is 1. Seen as a function of $x$ for given $y$, it is a likelihood function, so that the sum (or integral) over all $x$ need not be 1.
Additionally, a marginal of a joint distribution can be expressed as the expectation of the corresponding conditional distribution. For instance, $p_X(x) = E_Y[\, p_{X \mid Y}(x \mid Y)\,]$.
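This identity is a restatement of the law of total probability. In the discrete case it unpacks as

$$p_X(x) = \sum_y p_{X,Y}(x, y) = \sum_y p_{X \mid Y}(x \mid y)\, p_Y(y) = E_Y\!\left[\, p_{X \mid Y}(x \mid Y)\,\right],$$

with the sum replaced by an integral against $f_Y$ in the continuous case.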
Let $(\Omega, \mathcal{F}, P)$ be a probability space, $\mathcal{G} \subseteq \mathcal{F}$ a $\sigma$-field in $\mathcal{F}$. Given $A \in \mathcal{F}$, the Radon–Nikodym theorem implies that there is[3] a $\mathcal{G}$-measurable random variable $P(A \mid \mathcal{G}) : \Omega \to \mathbb{R}$, called the conditional probability, such that

$$\int_G P(A \mid \mathcal{G})(\omega) \, dP(\omega) = P(A \cap G)$$

for every $G \in \mathcal{G}$, and such a random variable is uniquely defined up to sets of probability zero. A conditional probability is called regular if $P(\cdot \mid \mathcal{G})(\omega)$ is a probability measure on $(\Omega, \mathcal{F})$ for all $\omega \in \Omega$ a.e.
Special cases:
- For the trivial $\sigma$-field $\mathcal{G} = \{\varnothing, \Omega\}$, the conditional probability is the constant function $P(A \mid \{\varnothing, \Omega\}) = P(A)$.
- If $A \in \mathcal{G}$, then $P(A \mid \mathcal{G}) = \mathbf{1}_A$, the indicator function (defined below).
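For intuition, consider the case (an illustration, not from the original text) where $\mathcal{G}$ is generated by a finite partition $\{G_1, \dots, G_n\}$ of $\Omega$ with $P(G_i) > 0$ for each $i$. A $\mathcal{G}$-measurable random variable is constant on each $G_i$, and the defining integral identity then forces the elementary formula

$$P(A \mid \mathcal{G})(\omega) = \frac{P(A \cap G_i)}{P(G_i)} \qquad \text{for } \omega \in G_i,$$

recovering the usual conditional probability given an event.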
Let $X$ be an $(E, \mathcal{E})$-valued random variable. For each $B \in \mathcal{E}$, define

$$\mu_{X \mid \mathcal{G}}(B \mid \mathcal{G}) = P(X^{-1}(B) \mid \mathcal{G}).$$

For any $\omega \in \Omega$, the function $\mu_{X \mid \mathcal{G}}(\cdot \mid \mathcal{G})(\omega) : \mathcal{E} \to \mathbb{R}$ is called the conditional probability distribution of $X$ given $\mathcal{G}$. If it is a probability measure on $(E, \mathcal{E})$, then it is called regular.
For a real-valued random variable (with respect to the Borel $\sigma$-field $\mathcal{B}(\mathbb{R})$ on $\mathbb{R}$), every conditional probability distribution is regular.[4] In this case, $E[X \mid \mathcal{G}] = \int_{-\infty}^{\infty} x \, \mu_{X \mid \mathcal{G}}(dx, \cdot)$ almost surely.
For any event $A \in \mathcal{F}$, define the indicator function:

$$\mathbf{1}_A(\omega) = \begin{cases} 1 & \text{if } \omega \in A, \\ 0 & \text{if } \omega \notin A, \end{cases}$$

which is a random variable. Note that the expectation of this random variable is equal to the probability of $A$ itself:

$$E[\mathbf{1}_A] = P(A).$$
Given a $\sigma$-field $\mathcal{G} \subseteq \mathcal{F}$, the conditional probability $P(A \mid \mathcal{G})$ is a version of the conditional expectation of the indicator function for $A$:

$$P(A \mid \mathcal{G}) = E[\mathbf{1}_A \mid \mathcal{G}].$$
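Taking $G = \Omega$ in the defining integral identity connects these two displays:

$$E[P(A \mid \mathcal{G})] = \int_\Omega P(A \mid \mathcal{G})(\omega) \, dP(\omega) = P(A \cap \Omega) = P(A) = E[\mathbf{1}_A],$$

the law of total probability in this setting.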
An expectation of a random variable with respect to a regular conditional probability is equal to its conditional expectation.
Consider the probability space $(\Omega, \mathcal{F}, P)$ and a sub-sigma field $\mathcal{A} \subseteq \mathcal{F}$. The sub-sigma field $\mathcal{A}$ can be loosely interpreted as containing a subset of the information in $\mathcal{F}$. For example, we might think of $P(B \mid \mathcal{A})$ as the probability of the event $B$ given the information in $\mathcal{A}$.
Also recall that an event $B$ is independent of a sub-sigma field $\mathcal{A}$ if $P(B \mid A) = P(B)$ for all $A \in \mathcal{A}$ with $P(A) > 0$. It is incorrect to conclude in general that the information in $\mathcal{A}$ does not tell us anything about the probability of event $B$ occurring. This can be shown with a counter-example:
Consider a probability space on the unit interval, $\Omega = [0, 1]$. Let $\mathcal{G}$ be the sigma-field of all countable sets and sets whose complement is countable. So each set in $\mathcal{G}$ has measure $0$ or $1$ and so is independent of each event in $\mathcal{F}$. However, notice that $\mathcal{G}$ also contains all the singleton events in $\Omega$ (those sets which contain only a single $\omega \in \Omega$). So knowing which of the events in $\mathcal{G}$ occurred is equivalent to knowing exactly which $\omega \in \Omega$ occurred! So in one sense, $\mathcal{G}$ contains no information about $\omega$ (it is independent of it), and in another sense it contains all the information in $\mathcal{F}$.[5][page needed]