Independent and identically distributed random variables

From Wikipedia, the free encyclopedia
Concept in probability and statistics
[Figure: a chart showing a uniform distribution; plot points are scattered randomly, with no pattern or clusters.]

In probability theory and statistics, a collection of random variables is independent and identically distributed (i.i.d., iid, or IID) if each random variable has the same probability distribution as the others and all are mutually independent.[1] IID was first defined in statistics and finds application in many fields, such as data mining and signal processing.

Introduction


Statistics commonly deals with random samples. A random sample can be thought of as a set of objects that are chosen randomly. More formally, it is "a sequence of independent, identically distributed (IID) random data points."

In other words, the terms random sample and IID are synonymous. In statistics, "random sample" is the typical terminology, but in probability, it is more common to say "IID."

  • Identically distributed means that there are no overall trends: the distribution does not fluctuate and all items in the sample are taken from the same probability distribution.
  • Independent means that the sample items are all independent events. In other words, they are not connected to each other in any way;[2] knowledge of the value of one variable gives no information about the value of the other and vice versa.
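The two properties can be checked empirically. A minimal sketch in Python (standard library only; the 0.5 threshold and the sample size are arbitrary choices for illustration) draws pairs of uniform variates and verifies that both coordinates share the same marginal distribution and that the joint probability factors into the product of the marginals:

```python
import random

random.seed(0)
n = 100_000

# Draw n i.i.d. pairs from the same uniform distribution on [0, 1).
pairs = [(random.random(), random.random()) for _ in range(n)]

# Identically distributed: both coordinates share the same marginal CDF.
p_x = sum(x < 0.5 for x, _ in pairs) / n  # ≈ P(X < 0.5) = 0.5
p_y = sum(y < 0.5 for _, y in pairs) / n  # ≈ P(Y < 0.5) = 0.5

# Independent: the joint probability factors into the product of marginals.
p_joint = sum(x < 0.5 and y < 0.5 for x, y in pairs) / n  # ≈ 0.25

print(p_x, p_y, p_joint)
```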

Application


Independent and identically distributed random variables are often used as an assumption, which tends to simplify the underlying mathematics. In practical applications of statistical modeling, however, this assumption may or may not be realistic.[3]

The i.i.d. assumption is also used in the central limit theorem, which states that the probability distribution of the standardized sum (or average) of i.i.d. variables with finite variance approaches a normal distribution.[4]

The i.i.d. assumption frequently arises in the context of sequences of random variables. Then, "independent and identically distributed" implies that an element in the sequence is independent of the random variables that came before it. In this way, an i.i.d. sequence is different from a Markov sequence, where the probability distribution for the nth random variable is a function of the previous random variable in the sequence (for a first-order Markov sequence). An i.i.d. sequence does not imply that the probabilities for all elements of the sample space or event space must be the same.[5] For example, repeated throws of loaded dice will produce a sequence that is i.i.d., despite the outcomes being biased.
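The loaded-dice remark can be illustrated in code. In this hypothetical sketch, the die is weighted so that a six is five times as likely as any other face; the sequence is nonetheless i.i.d., so the frequency of a six immediately after a six matches the overall frequency:

```python
import random
from collections import Counter

random.seed(1)

faces = [1, 2, 3, 4, 5, 6]
weights = [1, 1, 1, 1, 1, 5]  # hypothetical loading: a six is five times as likely

rolls = random.choices(faces, weights=weights, k=200_000)

# Identically distributed but biased: every throw shows a six with probability 5/10.
p_six = Counter(rolls)[6] / len(rolls)

# Independent: the frequency of a six right after a six matches the overall one.
after_six = [b for a, b in zip(rolls, rolls[1:]) if a == 6]
p_six_after_six = sum(r == 6 for r in after_six) / len(after_six)

print(p_six, p_six_after_six)
```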

In signal processing and image processing, the notion of transformation to i.i.d. implies two specifications, the "i.d." part and the "i." part:

i.d. – The signal level must be balanced on the time axis.

i. – The signal spectrum must be flattened, i.e. transformed by filtering (such as deconvolution) to a white noise signal (i.e. a signal where all frequencies are equally present).

Definition


Definition for two random variables


Suppose that the random variables $X$ and $Y$ are defined to assume values in $I \subseteq \mathbb{R}$. Let $F_X(x) = \operatorname{P}(X \leq x)$ and $F_Y(y) = \operatorname{P}(Y \leq y)$ be the cumulative distribution functions of $X$ and $Y$, respectively, and denote their joint cumulative distribution function by $F_{X,Y}(x, y) = \operatorname{P}(X \leq x \land Y \leq y)$.

Two random variables $X$ and $Y$ are independent if and only if $F_{X,Y}(x, y) = F_X(x) \cdot F_Y(y)$ for all $x, y \in I$. (For the simpler case of events, two events $A$ and $B$ are independent if and only if $P(A \land B) = P(A) \cdot P(B)$; see also Independence (probability theory) § Two random variables.)

Two random variables $X$ and $Y$ are identically distributed if and only if $F_X(x) = F_Y(x)$ for all $x \in I$.[6]

Two random variables $X$ and $Y$ are i.i.d. if they are independent and identically distributed, i.e. if and only if

$$\begin{aligned}&F_X(x) = F_Y(x) &&\forall x \in I\\&F_{X,Y}(x, y) = F_X(x) \cdot F_Y(y) &&\forall x, y \in I\end{aligned}$$
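Both conditions can be checked against empirical CDFs. In this small sketch, the fair four-sided dice and the evaluation points $x = 2$, $y = 3$ are arbitrary choices for illustration:

```python
import random

random.seed(3)
n = 200_000

# Two i.i.d. rolls of a fair four-sided die.
X = [random.randint(1, 4) for _ in range(n)]
Y = [random.randint(1, 4) for _ in range(n)]

def F(sample, t):
    """Empirical CDF: fraction of observations <= t."""
    return sum(v <= t for v in sample) / len(sample)

# Identically distributed: F_X(x) and F_Y(x) agree at every point.
same_marginal = abs(F(X, 2) - F(Y, 2))  # ≈ 0; both CDFs ≈ 0.5 at x = 2

# Independent: the joint CDF factors, F_{X,Y}(x, y) = F_X(x) * F_Y(y).
F_joint = sum(a <= 2 and b <= 3 for a, b in zip(X, Y)) / n
factored = F(X, 2) * F(Y, 3)  # both ≈ 0.5 * 0.75 = 0.375

print(same_marginal, F_joint, factored)
```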

Definition for more than two random variables


The definition extends naturally to more than two random variables. We say that $n$ random variables $X_1, \ldots, X_n$ are i.i.d. if they are independent (see further Independence (probability theory) § More than two random variables) and identically distributed, i.e. if and only if

$$\begin{aligned}&F_{X_1}(x) = F_{X_k}(x) &&\forall k \in \{1, \ldots, n\} \text{ and } \forall x \in I\\&F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = F_{X_1}(x_1) \cdot \ldots \cdot F_{X_n}(x_n) &&\forall x_1, \ldots, x_n \in I\end{aligned}$$

where $F_{X_1,\ldots,X_n}(x_1, \ldots, x_n) = \operatorname{P}(X_1 \leq x_1 \land \ldots \land X_n \leq x_n)$ denotes the joint cumulative distribution function of $X_1, \ldots, X_n$.

Examples


Example 1


A sequence of outcomes of spins of a fair or unfair roulette wheel is i.i.d. One implication of this is that if the roulette ball lands on "red", for example, 20 times in a row, the next spin is no more or less likely to be "black" than on any other spin (see the gambler's fallacy).
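A quick simulation illustrates this. The sketch assumes a European-style wheel where red has probability 18/37, and uses runs of 5 reds rather than 20 so that enough runs occur in a feasible number of spins; the frequency of red right after a streak matches the overall frequency:

```python
import random

random.seed(4)
n = 500_000
p_red = 18 / 37  # European wheel: 18 red pockets out of 37

spins = [random.random() < p_red for _ in range(n)]
overall = sum(spins) / n

# Frequency of red immediately after a streak of 5 reds.
streak, after_streak = 0, []
for i in range(1, n):
    streak = streak + 1 if spins[i - 1] else 0
    if streak >= 5:
        after_streak.append(spins[i])
p_after = sum(after_streak) / len(after_streak)

# The two frequencies agree: past spins carry no information about the next.
print(overall, p_after)
```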

Example 2


Toss a coin 10 times and write down the results into variables $A_1, \ldots, A_{10}$.

  1. Independent: Each outcome $A_i$ will not affect the other outcomes $A_j$ (for $i \neq j$ from 1 to 10), which means the variables $A_1, \ldots, A_{10}$ are independent of each other.
  2. Identically distributed: Regardless of whether the coin is fair (with a probability of 1/2 for heads) or biased, as long as the same coin is used for each flip, the probability of getting heads remains consistent across all flips.

Such a sequence of i.i.d. variables is also called a Bernoulli process.
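A Bernoulli process is straightforward to simulate; in this sketch the bias p_heads = 0.3 is a hypothetical choice, and any fixed value would do:

```python
import random

random.seed(5)
p_heads = 0.3  # hypothetical bias; any fixed value works

# A Bernoulli process: i.i.d. flips of the same (possibly unfair) coin.
flips = [random.random() < p_heads for _ in range(100_000)]

# Identically distributed: the empirical frequency matches the bias.
p_hat = sum(flips) / len(flips)

# Independent: the frequency of heads right after heads is that same bias.
after_heads = [b for a, b in zip(flips, flips[1:]) if a]
p_after = sum(after_heads) / len(after_heads)

print(p_hat, p_after)
```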

Example 3


Roll a die 10 times and save the results into variables $A_1, \ldots, A_{10}$.

  1. Independent: Each outcome of the die roll will not affect the next one, which means the 10 variables are independent from each other.
  2. Identically distributed: Regardless of whether the die is fair or weighted, each roll will have the same probability of seeing each result as every other roll. In contrast, rolling 10 different dice, some of which are weighted and some of which are not, would not produce i.i.d. variables.

Example 4


Choose a card from a standard deck of cards containing 52 cards, then place the card back in the deck. Repeat this 52 times. Observe when a king appears.

  1. Independent: Each observation will not affect the next one, which means the 52 results are independent from each other. In contrast, if each card that is drawn is kept out of the deck, subsequent draws would be affected by it (drawing one king would make drawing a second king less likely), and the observations would not be independent.
  2. Identically distributed: Each draw (with the card returned to the deck afterwards) yields a king with probability 4/52, which means the probability is identical each time.
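The contrast between drawing with and without replacement can be simulated directly: with replacement, every draw yields a king with probability 4/52 ≈ 0.077, while without replacement the second draw, given a first king, yields one with probability 3/51 ≈ 0.059:

```python
import random

random.seed(6)
deck = ["K"] * 4 + ["other"] * 48  # 4 kings in a 52-card deck
trials = 100_000

# With replacement: every draw is i.i.d. with P(king) = 4/52.
with_repl = sum(random.choice(deck) == "K" for _ in range(trials)) / trials

# Without replacement: the second draw depends on the first.
first_kings = second_kings = 0
for _ in range(trials):
    a, b = random.sample(deck, 2)  # two distinct cards, in draw order
    if a == "K":
        first_kings += 1
        second_kings += (b == "K")
cond = second_kings / first_kings  # ≈ 3/51, below the marginal 4/52

print(with_repl, cond)
```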

Generalizations


Many results that were first proven under the assumption that the random variables are i.i.d. have been shown to be true even under a weaker distributional assumption.

Exchangeable random variables

Main article: Exchangeable random variables

The most general notion which shares the main properties of i.i.d. variables is that of exchangeable random variables, introduced by Bruno de Finetti.[citation needed] Exchangeability means that while variables may not be independent, future ones behave like past ones: formally, any value of a finite sequence is as likely as any permutation of those values, i.e. the joint probability distribution is invariant under the symmetric group.

This provides a useful generalization: for example, sampling without replacement is not independent, but it is exchangeable.
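An urn model makes the distinction concrete. Drawing two balls without replacement from a hypothetical urn with three 1-balls and two 0-balls, the two draws are exchangeable (identical marginals) but not independent (the conditional probability differs from the marginal):

```python
import random

random.seed(7)
urn = [1, 1, 1, 0, 0]  # hypothetical urn: three 1-balls, two 0-balls
trials = 200_000

draws = [random.sample(urn, 2) for _ in range(trials)]

# Exchangeable: both draws have the same marginal distribution, P(1) = 3/5.
p_first = sum(a for a, _ in draws) / trials
p_second = sum(b for _, b in draws) / trials

# Not independent: P(second = 1 | first = 1) = 2/4, below the marginal 3/5.
seconds_after_one = [b for a, b in draws if a == 1]
p_cond = sum(seconds_after_one) / len(seconds_after_one)

print(p_first, p_second, p_cond)
```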

Lévy process

Main article: Lévy process

In stochastic calculus, i.i.d. variables are thought of as a discrete-time Lévy process: each variable gives how much one changes from one time to another. For example, a sequence of Bernoulli trials is interpreted as the Bernoulli process.

This can be generalized to continuous-time Lévy processes, and many Lévy processes can be seen as limits of i.i.d. variables; for instance, the Wiener process is the scaling limit of the Bernoulli process (a random walk with i.i.d. increments).
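The limit statement can be illustrated numerically: a walk of $n$ equally likely $\pm 1$ steps, rescaled by $1/\sqrt{n}$, has an endpoint whose distribution approaches $N(0, 1)$, the time-1 marginal of the Wiener process. The step and path counts below are arbitrary choices:

```python
import random
import statistics

random.seed(8)
n_steps, n_paths = 400, 4_000  # arbitrary sizes for illustration

# Endpoint of a random walk with i.i.d. +-1 steps, rescaled by 1/sqrt(n):
# by the central limit theorem its distribution approaches N(0, 1).
endpoints = [
    sum(random.choice((-1, 1)) for _ in range(n_steps)) / n_steps ** 0.5
    for _ in range(n_paths)
]

mean_end = statistics.mean(endpoints)      # ≈ 0
var_end = statistics.pvariance(endpoints)  # ≈ 1

print(mean_end, var_end)
```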

In machine learning


Machine learning (ML) involves learning statistical relationships within data. To train ML models effectively, it is crucial to use data that is broadly generalizable. If the training data is insufficiently representative of the task, the model's performance on new, unseen data may be poor.

The i.i.d. hypothesis allows for a significant reduction in the number of individual cases required in the training sample, simplifying optimization calculations. In optimization problems, the assumption of independent and identical distribution simplifies the calculation of the likelihood function. Due to this assumption, the likelihood function can be expressed as:

$$l(\theta) = P(x_1, x_2, x_3, \ldots, x_n \mid \theta) = P(x_1 \mid \theta)\, P(x_2 \mid \theta)\, P(x_3 \mid \theta) \cdots P(x_n \mid \theta)$$

To maximize the probability of the observed data, the logarithm is applied, and the parameter $\theta$ is chosen to maximize the result. Specifically, one computes:

$$\mathop{\mathrm{argmax}}\limits_{\theta} \log(l(\theta))$$

where

$$\log(l(\theta)) = \log(P(x_1 \mid \theta)) + \log(P(x_2 \mid \theta)) + \log(P(x_3 \mid \theta)) + \cdots + \log(P(x_n \mid \theta))$$

Computers perform long sums more efficiently and more stably than long products: multiplying many small probabilities quickly underflows floating-point arithmetic, whereas summing their logarithms does not. The log transformation also converts exponential factors into linear ones, simplifying the maximization.
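The factorized likelihood above is what makes maximum-likelihood estimation tractable. A minimal sketch for i.i.d. Bernoulli data follows; the true parameter 0.7 and the grid search are illustrative choices (for this model the closed-form MLE is simply the sample mean):

```python
import math
import random

random.seed(9)

# i.i.d. Bernoulli observations with unknown parameter theta (true value 0.7).
data = [random.random() < 0.7 for _ in range(10_000)]

def log_likelihood(theta, xs):
    """Under i.i.d., the log-likelihood is a sum of per-sample terms."""
    return sum(math.log(theta if x else 1 - theta) for x in xs)

# Maximize over a grid of candidate parameters (a crude stand-in for a
# proper optimizer; the closed-form MLE is the sample mean).
grid = [i / 100 for i in range(1, 100)]
theta_hat = max(grid, key=lambda t: log_likelihood(t, data))

print(theta_hat)  # ≈ 0.7
```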

There are two main reasons why this hypothesis is practically useful with the central limit theorem (CLT):

  1. Even if the sample originates from a complex non-Gaussian distribution, the CLT allows its sum or average to be well-approximated by a Gaussian distribution, which is far simpler to work with.
  2. The model's accuracy depends on the simplicity and representational power of the model unit, as well as the data quality. The simplicity of the unit makes it easy to interpret and scale, while the representational power and scalability improve model accuracy. In a deep neural network, for instance, each neuron is simple yet powerful in representation, capturing more complex features layer by layer to enhance model accuracy.


References

  1. ^ Clauset, Aaron (2011). "A brief primer on probability distributions" (PDF). Santa Fe Institute. Archived from the original (PDF) on 2012-01-20. Retrieved 2011-11-29.
  2. ^ Stephanie (2016-05-11). "IID Statistics: Independent and Identically Distributed Definition and Examples". Statistics How To. Retrieved 2021-12-09.
  3. ^ Hampel, Frank (1998). "Is statistics too difficult?". Canadian Journal of Statistics. 26 (3): 497–513. doi:10.2307/3315772. hdl:20.500.11850/145503. JSTOR 3315772. S2CID 53117661 (§8).
  4. ^ Blum, J. R.; Chernoff, H.; Rosenblatt, M.; Teicher, H. (1958). "Central Limit Theorems for Interchangeable Processes". Canadian Journal of Mathematics. 10: 222–229. doi:10.4153/CJM-1958-026-0. S2CID 124843240.
  5. ^ Cover, T. M.; Thomas, J. A. (2006). Elements of Information Theory. Wiley-Interscience. pp. 57–58. ISBN 978-0-471-24195-9.
  6. ^ Casella & Berger 2002, Theorem 1.5.10.
