
Conditional probability distribution

by Marco Taboga, PhD

A conditional distribution is the probability distribution of a random variable, calculated according to the rules of conditional probability after observing the realization of another random variable.

Table of contents

  1. Overview

  2. Conditioning on events

  3. Discrete random vectors

  4. Continuous random vectors

  5. The general case

  6. More details

    1. Conditional distribution of a random vector

    2. The joint distribution as a product of marginal and conditional

  7. Solved exercises

    1. Exercise 1

    2. Exercise 2

    3. Exercise 3

Overview

We will discuss how to update the probability distribution of a random variable X after receiving the information that another random variable Y has taken a specific value $y$.

The updated probability distribution of X will be called the conditional distribution of X given Y=y.

The two random variables X and Y, considered together, form a random vector $(X,Y)$.

Depending on the characteristics of the random vector $(X,Y)$, different procedures need to be followed in order to compute the conditional probability distribution of X given Y=y.

In the remainder of this lecture, these procedures are presented in the following order:

  1. first, we tackle the case in which the random vector $(X,Y)$ is a discrete random vector;

  2. then, we tackle the case in which $(X,Y)$ is a continuous random vector;

  3. finally, we briefly discuss the case in which $(X,Y)$ is neither discrete nor continuous.

Conditioning on events

Note that if we are able to update the probability distribution of X when we receive the information that Y=y, then we can also revise the distribution of X when we learn that a generic event E has happened.

It suffices to set $Y=1_{E}$, where $1_{E}$ is the indicator function of the event E, and compute the distribution of X conditional on the realization $Y=1_{E}=1$.
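As a minimal sketch of this trick, consider a hypothetical finite sample space (a fair die roll, with X equal to the face shown); the event and all values below are invented for illustration:

```python
from fractions import Fraction

# Hypothetical finite sample space: a fair die roll; X is the face value.
# Event E = "the roll is even"; Y = 1_E is its indicator function.
omega = [1, 2, 3, 4, 5, 6]
p = {w: Fraction(1, 6) for w in omega}          # uniform probability measure
Y = {w: 1 if w % 2 == 0 else 0 for w in omega}  # indicator of E

# Distribution of X conditional on Y = 1_E = 1 (i.e., on the event E).
p_E = sum(p[w] for w in omega if Y[w] == 1)
cond = {w: p[w] / p_E for w in omega if Y[w] == 1}
print(cond)  # {2: Fraction(1, 3), 4: Fraction(1, 3), 6: Fraction(1, 3)}
```

Conditioning on the realization $Y=1$ reproduces exactly the elementary conditional probability P(X=x|E) = P({x} ∩ E)/P(E).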

Discrete random vectors

In the case in which $(X,Y)$ is a discrete random vector, the probability mass function (pmf) of X conditional on the information that Y=y is called the conditional probability mass function.

Definition Let $(X,Y)$ be a discrete random vector. We say that a function $p_{X|Y}(x|y)$ is the conditional probability mass function of X given Y=y if, for any $x\in\mathbb{R}$, $p_{X|Y}(x|y)=P(X=x|Y=y),$ where $P(X=x|Y=y)$ is the conditional probability that $X=x$ given that Y=y.

How do we derive the conditional pmf from the joint pmf $p_{XY}(x,y)$?

The following proposition provides an answer to this question.

Proposition Let $(X,Y)$ be a discrete random vector. Let $p_{XY}(x,y)$ be its joint pmf, and $p_{Y}(y)$ the marginal pmf of Y. The conditional pmf of X given Y=y is $p_{X|Y}(x|y)=\frac{p_{XY}(x,y)}{p_{Y}(y)},$ provided $p_{Y}(y)>0$.

Proof

This is just the usual formula for computing conditional probabilities (conditional probability equals joint probability divided by marginal probability): $p_{X|Y}(x|y)=P(X=x|Y=y)=\frac{P(X=x,Y=y)}{P(Y=y)}=\frac{p_{XY}(x,y)}{p_{Y}(y)}.$

In the proposition above, we assume that the marginal pmf $p_{Y}(y)$ is known. If it is not, it can be derived from the joint pmf $p_{XY}(x,y)$ by marginalization.

Example Let the support of $(X,Y)$ be [eq21] and its joint pmf be [eq22]. Let us compute the conditional pmf of X given $Y=0$. The support of Y is [eq23]. The marginal pmf of Y evaluated at $y=0$ is [eq24]. The support of X is [eq25]. Thus, the conditional pmf of X given $Y=0$ is [eq26].
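The numbers in the example above depend on equations not reproduced here, but the proposition (conditional pmf equals joint pmf divided by marginal pmf) can be sketched with a made-up joint pmf; all values below are hypothetical:

```python
from fractions import Fraction

# Hypothetical joint pmf of a discrete vector (X, Y), stored as {(x, y): probability}.
p_xy = {
    (1, 0): Fraction(1, 4), (2, 0): Fraction(1, 4),
    (1, 1): Fraction(1, 8), (2, 1): Fraction(3, 8),
}

def marginal_y(p_xy, y):
    # Marginalization: sum the joint pmf over all values of x.
    return sum(pr for (x, yy), pr in p_xy.items() if yy == y)

def conditional_pmf(p_xy, y):
    # p_{X|Y}(x|y) = p_{XY}(x, y) / p_Y(y), defined only when p_Y(y) > 0.
    p_y = marginal_y(p_xy, y)
    return {x: pr / p_y for (x, yy), pr in p_xy.items() if yy == y}

print(conditional_pmf(p_xy, 0))  # {1: Fraction(1, 2), 2: Fraction(1, 2)}
```

Note that each conditional pmf sums to 1 over the support of X, as any pmf must.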

When $p_{Y}(y)=0$, it is in general not possible to unambiguously derive the conditional pmf of X, as we show below with an example.

This impossibility (known as the Borel-Kolmogorov paradox) is not particularly worrying, as it is seldom relevant in applications.

Example The example is a bit involved. You might safely skip it on a first reading. Suppose that the sample space $\Omega$ is the set of all real numbers between 0 and 1: $\Omega=[0,1]$. It is possible to build a probability measure $\mathrm{P}$ on $\Omega$ such that $\mathrm{P}$ assigns to each sub-interval of $[0,1]$ a probability equal to its length, that is, $\mathrm{P}([a,b])=b-a$ for $0\leq a\leq b\leq 1$. This is the same sample space discussed in the lecture on zero-probability events. Define a random variable X as follows: [eq30] and another random variable Y as follows: [eq31]. Both X and Y are discrete random variables and, considered together, they constitute a discrete random vector $(X,Y)$. Suppose that we want to compute the conditional pmf of X given $Y=1$. It is easy to see that $P(Y=1)=0$. As a consequence, we cannot use the formula $p_{X|Y}(x|1)=\frac{p_{XY}(x,1)}{p_{Y}(1)},$ because division by zero is not possible. The technique of deriving a conditional probability implicitly, as a realization of a conditional probability with respect to a sigma-algebra, also does not allow us to unambiguously derive [eq35]. In this case, the partition of interest is [eq36], where [eq37], and [eq38] can be viewed as the realization of the conditional probability [eq39] when $\omega\in G_{1}$. The fundamental property of conditional probability [eq40] is satisfied in this case if and only if, for a given x, the following system of equations is satisfied: [eq41], which implies [eq42]. The second equation does not help to determine [eq43]. So, from the first equation, it is evident that [eq44] is undetermined (any number, when multiplied by zero, gives zero). One can show that the requirement that [eq45] be a regular conditional probability also does not help to pin down [eq38]. What does it mean that [eq38] is undetermined? It means that any choice of [eq35] is legitimate, provided the requirement [eq49] is satisfied. Is this really a paradox? No, because conditional probability with respect to a partition is defined up to almost sure equality, and $G_{1}$ is a zero-probability event.

As a consequence, the value that [eq50] takes on $G_{1}$ does not matter. Roughly speaking, we do not really need to care about zero-probability events, provided there is only a countable number of them.

Continuous random vectors

In the case in which $(X,Y)$ is a continuous random vector, the probability density function (pdf) of X conditional on the information that Y=y is called the conditional probability density function.

Definition Let $(X,Y)$ be a continuous random vector. We say that a function $f_{X|Y}(x|y)$ is the conditional probability density function of X given Y=y if, for any interval $[a,b]$, $P(a\leq X\leq b|Y=y)=\int_{a}^{b}f_{X|Y}(x|y)\,dx,$ and $f_{X|Y}(x|y)$ is such that the above integral is well defined.

How do we derive the conditional pdf from the joint pdf $f_{XY}(x,y)$?

The following proposition provides an answer to this question.

Proposition Let $(X,Y)$ be a continuous random vector. Let $f_{XY}(x,y)$ be its joint pdf, and $f_{Y}(y)$ be the marginal pdf of Y. The conditional pdf of X given Y=y is $f_{X|Y}(x|y)=\frac{f_{XY}(x,y)}{f_{Y}(y)},$ provided $f_{Y}(y)>0$.

Proof

Deriving the conditional distribution of X given Y=y is far from obvious. As explained in the lecture on random variables, whatever value of $y$ we choose, we are conditioning on a zero-probability event: $P(Y=y)=0$. Therefore, the standard formula (conditional probability equals joint probability divided by marginal probability) cannot be used. However, it turns out that the definition of conditional probability with respect to a partition can be fruitfully applied in this case to derive the conditional pdf of X given Y=y. In order to prove that $f_{X|Y}(x|y)=\frac{f_{XY}(x,y)}{f_{Y}(y)}$ is a legitimate choice, we need to prove that conditional probabilities calculated by using this conditional pdf satisfy the fundamental property of conditional probability: [eq65] for any H and E. Thanks to some basic results in measure theory, we can confine our attention to the events H and E that can be written as follows: [eq66]. For these events, it is immediate to verify that the fundamental property of conditional probability holds. First, by the very definition of a conditional pdf, we have [eq67]. Furthermore, the indicator function [eq68] is also a function of Y. Therefore, the product [eq69] is a function of Y, and we can use the transformation theorem to compute its expected value: [eq70]. The last equality proves the proposition.

Example Let the support of $(X,Y)$ be [eq72] and its joint pdf be [eq73]. Let us compute the conditional pdf of X given $Y=1$. The support of Y is [eq74]. When [eq75], the marginal pdf of Y is 0; when [eq76], the marginal pdf is [eq77]. Thus, the marginal pdf of Y is [eq78]. When evaluated at $y=1$, it is [eq79]. The support of X is [eq80]. Thus, the conditional pdf of X given $Y=1$ is [eq81].
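A numerical sketch of the same recipe, using a hypothetical joint pdf $f_{XY}(x,y)=x+y$ on the unit square (not the density from the example), with the marginal obtained by a simple midpoint Riemann sum:

```python
# Hypothetical joint pdf f(x, y) = x + y on the unit square (it integrates to 1).
def f_xy(x, y):
    return x + y if 0 <= x <= 1 and 0 <= y <= 1 else 0.0

def marginal_y(y, n=10_000):
    # Marginal pdf of Y: integrate the joint pdf over x (midpoint Riemann sum).
    h = 1.0 / n
    return sum(f_xy((i + 0.5) * h, y) for i in range(n)) * h

def conditional_pdf(x, y):
    # f_{X|Y}(x|y) = f_{XY}(x, y) / f_Y(y), defined where f_Y(y) > 0.
    return f_xy(x, y) / marginal_y(y)

# Here f_Y(y) = y + 1/2, so f_{X|Y}(x|1) = (x + 1) / (3/2).
print(round(conditional_pdf(0.5, 1.0), 4))  # prints 1.0
```

As with the discrete case, the conditional pdf integrates to 1 over the support of X for every fixed y with positive marginal density.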

The general case

In general, when $(X,Y)$ is neither discrete nor continuous, we can characterize the distribution function of X conditional on the information that Y=y.

Definition We say that a function $F_{X|Y}(x|y)$ is the conditional distribution function of X given Y=y if and only if $F_{X|Y}(x|y)=P(X\leq x|Y=y),$ where $P(X\leq x|Y=y)$ is the conditional probability that $X\leq x$ given that Y=y.

There is no immediate way of deriving the conditional distribution of X given Y=y. However, we can characterize it by using the concept of conditional probability with respect to a partition, as follows.

Define the events $G_{y}$ as follows: $G_{y}=\{\omega\in\Omega:Y(\omega)=y\},$ and a partition G of events as $G=\{G_{y}:y\in R_{Y}\},$ where, as usual, $R_{Y}$ is the support of Y.

Then, for any $\omega\in G_{y}$ we have $F_{X|Y}(x|y)=P(X\leq x|G)(\omega),$ where $P(X\leq x|G)$ is the probability that $X\leq x$ conditional on the partition G.

As we know, $P(X\leq x|G)$ is guaranteed to exist and is unique up to almost sure equality. Of course, this does not mean that we are able to compute it.

Nonetheless, this characterization is extremely useful because it allows us to speak of the conditional distribution of X given Y=y in general, without the need to specify whether X and Y are discrete or continuous.

More details

The following sections contain more details about conditional distributions.

Conditional distribution of a random vector

We have discussed how to update the probability distribution of a random variable X after observing the realization of another random variable Y, that is, after receiving the information that Y=y.

What happens whenX andY are random vectors rather than random variables?

Basically, everything we said above still applies with straightforward modifications.

Thus, if X and Y are discrete random vectors, then the conditional probability mass function of X given Y=y is $p_{X|Y}(x|y)=\frac{p_{XY}(x,y)}{p_{Y}(y)},$ provided $p_{Y}(y)>0$.

If X and Y are continuous random vectors, then the conditional probability density function of X given Y=y is $f_{X|Y}(x|y)=\frac{f_{XY}(x,y)}{f_{Y}(y)},$ provided $f_{Y}(y)>0$.

In general, the conditional distribution function of X given Y=y is $F_{X|Y}(x|y)=P(X\leq x|Y=y).$

The joint distribution as a product of marginal and conditional

As we have explained above, the joint distribution of X and Y can be used to derive the marginal distribution of Y and the conditional distribution of X given Y=y.

This process can also go in the reverse direction: if we know the marginal distribution of Y and the conditional distribution of X given Y=y, then we can derive the joint distribution of X and Y.

For discrete random variables, we have that $p_{XY}(x,y)=p_{X|Y}(x|y)\,p_{Y}(y).$

For continuous random variables, we have that $f_{XY}(x,y)=f_{X|Y}(x|y)\,f_{Y}(y).$
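A small sketch of this factorization in the discrete case, starting from a hypothetical marginal pmf of Y and a hypothetical conditional pmf of X given each value of Y:

```python
from fractions import Fraction

# Hypothetical marginal pmf of Y and conditional pmf of X given Y = y.
p_y = {0: Fraction(1, 2), 1: Fraction(1, 2)}
p_x_given_y = {
    0: {1: Fraction(1, 2), 2: Fraction(1, 2)},  # p_{X|Y}(x|0)
    1: {1: Fraction(1, 4), 2: Fraction(3, 4)},  # p_{X|Y}(x|1)
}

# Joint pmf as product of conditional and marginal:
# p_XY(x, y) = p_{X|Y}(x|y) * p_Y(y).
p_xy = {(x, y): p_x_given_y[y][x] * p_y[y]
        for y in p_y for x in p_x_given_y[y]}

print(p_xy[(2, 1)])  # Fraction(3, 8)
```

Since each conditional pmf sums to 1 and the marginal sums to 1, the reconstructed joint pmf automatically sums to 1 as well.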

Solved exercises

Below you can find some exercises with explained solutions.

Exercise 1

Let $(X,Y)$ be a discrete random vector with support [eq99] and joint probability mass function [eq100].

Compute the conditional probability mass function ofX given$Y=0$.

Solution

The marginal probability mass function of Y evaluated at $y=0$ is [eq101]. The support of X is [eq102]. Thus, the conditional probability mass function of X given $Y=0$ is [eq103].

Exercise 2

Let $(X,Y)$ be a continuous random vector with support [eq105] and joint probability density function [eq106].

Compute the conditional probability density function ofX given$Y=2$.

Solution

The support of Y is [eq107]. When [eq108], the marginal probability density function of Y is [eq109]; when [eq110], it is [eq111]. Thus, the marginal probability density function of Y is [eq112]. When evaluated at the point $y=2$, it becomes [eq113]. The support of X is [eq114]. Thus, the conditional probability density function of X given $Y=2$ is [eq115].

Exercise 3

Let X be a continuous random variable with support [eq116] and probability density function [eq117].

Let Y be another continuous random variable with support [eq118] and conditional probability density function [eq119].

Find the marginal probability density function of Y.

Solution

The support of the vector $(X,Y)$ is [eq121] and the joint probability density function of X and Y is [eq122]. The marginal probability density function of Y is obtained by marginalization, integrating x out of the joint probability density function:

[eq123] Thus, for [eq124] we trivially have [eq125] (because [eq126]), while for [eq127] we have [eq128]. Thus, the marginal probability density function of Y is [eq129].
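A numerical sketch of this kind of derivation, under the hypothetical assumption (not the densities from the exercise) that X is uniform on (0, 1) and that, given X = x, Y is uniform on (0, x):

```python
import math

# Hypothetical setup: X uniform on (0, 1); given X = x, Y uniform on (0, x).
def f_x(x):
    return 1.0 if 0 < x < 1 else 0.0

def f_y_given_x(y, x):
    return 1.0 / x if 0 < y < x else 0.0

def marginal_y(y, n=100_000):
    # f_Y(y) = integral over x of f_{Y|X}(y|x) * f_X(x),
    # approximated by a midpoint Riemann sum on (0, 1).
    h = 1.0 / n
    return sum(f_y_given_x(y, (i + 0.5) * h) * f_x((i + 0.5) * h)
               for i in range(n)) * h

# Closed form for this setup: f_Y(y) = -ln(y) for 0 < y < 1.
print(round(marginal_y(0.5), 3), round(-math.log(0.5), 3))  # both ≈ 0.693
```

This is exactly the product-then-marginalize route of the exercise: build the joint density as conditional times marginal, then integrate x out to obtain the marginal of Y.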

How to cite

Please cite as:

Taboga, Marco (2021). "Conditional probability distribution", Lectures on probability theory and mathematical statistics. Kindle Direct Publishing. Online appendix. https://www.statlect.com/fundamentals-of-probability/conditional-probability-distributions.
