Negative hypergeometric distribution

From Wikipedia, the free encyclopedia
Discrete probability distribution
Negative hypergeometric
Probability mass function: [figure] Several examples of the PMF of the negative hypergeometric probability distribution.
Cumulative distribution function: [figure] Several examples of the CDF of the negative hypergeometric probability distribution.
Parameters:
$N \in \{0, 1, 2, \dots\}$ - total number of elements
$K \in \{0, 1, 2, \dots, N\}$ - total number of 'success' elements
$r \in \{0, 1, 2, \dots, N-K\}$ - number of failures when experiment is stopped
Support: $k \in \{0, \dots, K\}$ - number of successes when experiment is stopped
PMF: $\dfrac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}}$
Mean: $r\dfrac{K}{N-K+1}$
Variance: $r\dfrac{(N+1)K}{(N-K+1)(N-K+2)}\left[1-\dfrac{r}{N-K+1}\right]$

In probability theory and statistics, the negative hypergeometric distribution describes probabilities when sampling without replacement from a finite population in which each element can be classified into one of two mutually exclusive categories, such as Pass/Fail or Employed/Unemployed. As random selections are made from the population, each draw shrinks the population, so the probability of success changes from draw to draw. Unlike the standard hypergeometric distribution, which describes the number of successes in a sample of fixed size, the negative hypergeometric distribution arises when samples are drawn until $r$ failures have been found; it describes the probability of finding $k$ successes in such a sample. In other words, the negative hypergeometric distribution describes the likelihood of $k$ successes in a sample with exactly $r$ failures.

Definition

There are $N$ elements, of which $K$ are defined as "successes" and the rest are "failures".

Elements are drawn one after the other, without replacement, until $r$ failures are encountered. Then the drawing stops and the number $k$ of successes is counted. The negative hypergeometric distribution, $NHG_{N,K,r}(k)$, is the discrete distribution of this $k$.[1]
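For small populations the drawing process can be checked by brute force. The sketch below uses a hypothetical helper, `nhg_by_enumeration` (not from any library), which relies on the fact that every placement of the $K$ successes among the $N$ draw positions is equally likely when drawing without replacement. It replays the draws for each placement until the $r$-th failure and tallies $k$ as an exact fraction. Because there are $\binom{N}{K}$ placements, this is only practical for small $N$.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations


def nhg_by_enumeration(N: int, K: int, r: int) -> dict[int, Fraction]:
    """Exact distribution of k, the number of successes drawn before the r-th failure.

    Hypothetical helper: enumerates every equally likely placement of the K
    successes among N draw positions. Assumes 0 <= K <= N and 0 <= r <= N - K.
    """
    counts = Counter()
    placements = 0
    for success_positions in combinations(range(N), K):
        successes = set(success_positions)
        k = failures = 0
        for pos in range(N):
            if failures == r:
                break  # the r-th failure has already been drawn: stop
            if pos in successes:
                k += 1
            else:
                failures += 1
        counts[k] += 1
        placements += 1
    return {k: Fraction(c, placements) for k, c in sorted(counts.items())}


print(nhg_by_enumeration(3, 1, 1))  # {0: Fraction(2, 3), 1: Fraction(1, 3)}
```

With one success among three elements and $r = 1$, the drawing stops at the first failure, so $k = 1$ only when the success is drawn first, which happens with probability $1/3$.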

The negative hypergeometric distribution is a special case of the beta-binomial distribution[2] with parameters $\alpha = r$ and $\beta = N-K-r+1$, both integers (and $n = K$).
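The beta-binomial correspondence can be spot-checked numerically. The sketch below compares the closed-form pmf with the beta-binomial pmf $\binom{n}{k}B(k+\alpha, n-k+\beta)/B(\alpha,\beta)$ using exact fractions. It assumes $1 \le r \le N-K$, so that $\alpha$ and $\beta$ are positive integers and the beta function reduces to factorials; the helper names `beta_int`, `beta_binomial_pmf` and `nhg_pmf` are illustrative only.

```python
from fractions import Fraction
from math import comb, factorial


def beta_int(a: int, b: int) -> Fraction:
    """Beta function B(a, b) = (a-1)!(b-1)!/(a+b-1)! for positive integers a, b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def beta_binomial_pmf(k: int, n: int, alpha: int, beta: int) -> Fraction:
    """Beta-binomial pmf C(n, k) B(k + alpha, n - k + beta) / B(alpha, beta)."""
    return comb(n, k) * beta_int(k + alpha, n - k + beta) / beta_int(alpha, beta)


def nhg_pmf(k: int, N: int, K: int, r: int) -> Fraction:
    """Closed-form negative hypergeometric pmf (assumes 1 <= r <= N - K)."""
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))


# Map NHG(N, K, r) onto beta-binomial(n=K, alpha=r, beta=N-K-r+1) and compare.
N, K, r = 10, 4, 3
for k in range(K + 1):
    assert nhg_pmf(k, N, K, r) == beta_binomial_pmf(k, n=K, alpha=r, beta=N - K - r + 1)
```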

The outcome requires that we observe $k$ successes in the first $k+r-1$ draws and that the $(k+r)$-th draw be a failure. The probability of the former is given directly by the hypergeometric distribution, $HG_{N,K,k+r-1}(k)$, and the probability of the latter is simply the number of failures remaining, $N-K-(r-1)$, divided by the size of the remaining population, $N-(k+r-1)$. The probability of having exactly $k$ successes up to the $r$-th failure (i.e. the drawing stops as soon as the sample includes the predefined number of $r$ failures) is then the product of these two probabilities:

$$\frac{\binom{K}{k}\binom{N-K}{k+r-1-k}}{\binom{N}{k+r-1}}\cdot\frac{N-K-(r-1)}{N-(k+r-1)} = \frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}}.$$
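The two-factor argument above translates directly into code. This sketch (hypothetical helper names; assumes $1 \le r \le N-K$, which keeps every binomial argument non-negative and the remaining population positive) multiplies the hypergeometric probability of $k$ successes in the first $k+r-1$ draws by the chance that draw $k+r$ is a failure. The result can then be compared with the simplified right-hand side.

```python
from fractions import Fraction
from math import comb


def hypergeom_pmf(k: int, N: int, K: int, n: int) -> Fraction:
    """HG_{N,K,n}(k): probability of k successes in n draws without replacement."""
    return Fraction(comb(K, k) * comb(N - K, n - k), comb(N, n))


def nhg_pmf_by_product(k: int, N: int, K: int, r: int) -> Fraction:
    """k successes among the first k+r-1 draws, then a failure on draw k+r."""
    first_part = hypergeom_pmf(k, N, K, k + r - 1)
    failures_left = N - K - (r - 1)      # failures not yet drawn
    population_left = N - (k + r - 1)   # elements not yet drawn
    return first_part * Fraction(failures_left, population_left)


print(nhg_pmf_by_product(0, 10, 4, 3))  # 1/6
```

For $N=10$, $K=4$, $r=3$, $k=0$: the first two draws are both failures with probability $\binom{6}{2}/\binom{10}{2} = 1/3$, and the third is a failure with probability $4/8$, giving $1/6$, the same value as $\binom{2}{0}\binom{7}{4}/\binom{10}{4} = 35/210$.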

Therefore, a random variable $X$ follows the negative hypergeometric distribution if its probability mass function (pmf) is given by

$$f(k; N, K, r) \equiv \Pr(X = k) = \frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}} \quad \text{for } k = 0, 1, 2, \dots, K$$

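where $N$ is the total number of elements, $K$ is the number of 'success' elements, $r$ is the number of failures at which the drawing stops, and $k$ is the number of successes observed.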
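For computation, the pmf can be evaluated exactly with integer binomial coefficients. The following is a minimal sketch in Python. It assumes integer parameters with $1 \le r \le N-K$. The case $r=0$, where $X=0$ with certainty, is excluded because $\binom{k+r-1}{k}$ then has a negative upper index, which `math.comb` rejects. The function name `nhg_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb


def nhg_pmf(k: int, N: int, K: int, r: int) -> Fraction:
    """Pr(X = k): probability of k successes before the r-th failure.

    Assumes integers with 0 <= K <= N and 1 <= r <= N - K.
    Returns 0 outside the support {0, ..., K}.
    """
    if not (0 <= K <= N and 1 <= r <= N - K):
        raise ValueError("expected 0 <= K <= N and 1 <= r <= N - K")
    if not 0 <= k <= K:
        return Fraction(0)
    return Fraction(comb(k + r - 1, k) * comb(N - r - k, K - k), comb(N, K))


print([str(nhg_pmf(k, 10, 4, 3)) for k in range(5)])
# ['1/6', '2/7', '2/7', '4/21', '1/14']
```

Exact fractions avoid rounding error for small parameters. For large $N$ one would more likely work with log-binomials in floating point, at the cost of exactness.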

By design, the probabilities sum to 1. To show this explicitly, we have:

$$\sum_{k=0}^{K}\Pr(X=k) = \sum_{k=0}^{K}\frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}} = \frac{1}{\binom{N}{K}}\sum_{k=0}^{K}\binom{k+r-1}{k}\binom{N-r-k}{K-k} = \frac{1}{\binom{N}{K}}\binom{N}{K} = 1,$$

where we have used the identity

$$\begin{aligned}
\sum_{j=0}^{k}\binom{j+m}{j}\binom{n-m-j}{k-j} &= \sum_{j=0}^{k}(-1)^{j}\binom{-m-1}{j}(-1)^{k-j}\binom{m+1+k-n-2}{k-j}\\
&= (-1)^{k}\sum_{j=0}^{k}\binom{-m-1}{j}\binom{k-n-2-(-m-1)}{k-j}\\
&= (-1)^{k}\binom{k-n-2}{k}\\
&= (-1)^{k}\binom{k-(n+1)-1}{k}\\
&= \binom{n+1}{k},
\end{aligned}$$

which can be derived using the binomial identity

$$\binom{n}{k} = (-1)^{k}\binom{k-n-1}{k},$$

and the Chu–Vandermonde identity

$$\sum_{j=0}^{k}\binom{m}{j}\binom{n-m}{k-j} = \binom{n}{k},$$

which holds for any complex values $m$ and $n$ and any non-negative integer $k$.
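The identities used in this argument can be spot-checked on small integers. The sketch below defines an illustrative helper, `gbinom(a, k)`, a generalized binomial coefficient valid for any integer upper index `a` and non-negative `k` (unlike `math.comb`, which rejects negative arguments). It then checks the binomial identity, the Chu–Vandermonde identity, the summation identity derived above, and finally that the negative hypergeometric probabilities sum to 1. A finite check like this illustrates the identities but does not prove them.

```python
from math import factorial, prod


def gbinom(a: int, k: int) -> int:
    """Generalized binomial coefficient a(a-1)...(a-k+1)/k! for integer a and k >= 0."""
    # A product of k consecutive integers is always divisible by k!, so // is exact.
    return prod(a - i for i in range(k)) // factorial(k)


# Binomial (upper-negation) identity: C(n, k) = (-1)^k C(k - n - 1, k).
for n in range(-6, 7):
    for k in range(7):
        assert gbinom(n, k) == (-1) ** k * gbinom(k - n - 1, k)

# Chu-Vandermonde identity, with negative as well as positive m and n.
for m in range(-5, 6):
    for n in range(-5, 6):
        for k in range(7):
            assert sum(gbinom(m, j) * gbinom(n - m, k - j) for j in range(k + 1)) == gbinom(n, k)

# Summation identity: sum_j C(j+m, j) C(n-m-j, k-j) = C(n+1, k).
for m in range(6):
    for n in range(-3, 12):
        for k in range(7):
            lhs = sum(gbinom(j + m, j) * gbinom(n - m - j, k - j) for j in range(k + 1))
            assert lhs == gbinom(n + 1, k)

# Normalization: sum_k C(k+r-1, k) C(N-r-k, K-k) = C(N, K) for all valid N, K, r.
for N in range(12):
    for K in range(N + 1):
        for r in range(N - K + 1):
            total = sum(gbinom(k + r - 1, k) * gbinom(N - r - k, K - k) for k in range(K + 1))
            assert total == gbinom(N, K)
```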

Expectation

When counting the number $k$ of successes before $r$ failures, the expected number of successes is $\frac{rK}{N-K+1}$ and can be derived as follows.

$$\begin{aligned}
E[X] &= \sum_{k=0}^{K} k\Pr(X=k) = \sum_{k=0}^{K} k\,\frac{\binom{k+r-1}{k}\binom{N-r-k}{K-k}}{\binom{N}{K}} = \frac{r}{\binom{N}{K}}\left[\sum_{k=0}^{K}\frac{k+r}{r}\binom{k+r-1}{r-1}\binom{N-r-k}{K-k}\right] - r\\
&= \frac{r}{\binom{N}{K}}\left[\sum_{k=0}^{K}\binom{k+r}{r}\binom{N-r-k}{K-k}\right] - r = \frac{r}{\binom{N}{K}}\left[\sum_{k=0}^{K}\binom{k+r}{k}\binom{N-r-k}{K-k}\right] - r\\
&= \frac{r}{\binom{N}{K}}\left[\binom{N+1}{K}\right] - r = \frac{rK}{N-K+1},
\end{aligned}$$

where we have used the relationship $\sum_{j=0}^{k}\binom{j+m}{j}\binom{n-m-j}{k-j} = \binom{n+1}{k}$, which we derived above to show that the negative hypergeometric distribution is properly normalized.
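The closed-form mean can be confirmed by summing $k\Pr(X=k)$ directly with exact arithmetic. The sketch below uses an illustrative helper, `nhg_mean`, assumes $1 \le r \le N-K$, and compares against $rK/(N-K+1)$ for every valid small parameter triple.

```python
from fractions import Fraction
from math import comb


def nhg_mean(N: int, K: int, r: int) -> Fraction:
    """E[X] computed directly as sum_k k * Pr(X = k) from the closed-form pmf."""
    total = sum(k * comb(k + r - 1, k) * comb(N - r - k, K - k) for k in range(K + 1))
    return Fraction(total, comb(N, K))


for N in range(1, 12):
    for K in range(N + 1):
        for r in range(1, N - K + 1):
            assert nhg_mean(N, K, r) == Fraction(r * K, N - K + 1)

print(nhg_mean(10, 4, 3))  # 12/7
```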

Variance

The variance can be derived by the following calculation.

$$\begin{aligned}
E[X^{2}] &= \sum_{k=0}^{K} k^{2}\Pr(X=k) = \left[\sum_{k=0}^{K}(k+r)(k+r+1)\Pr(X=k)\right] - (2r+1)E[X] - r^{2} - r\\
&= \frac{r(r+1)}{\binom{N}{K}}\left[\sum_{k=0}^{K}\binom{k+r+1}{r+1}\binom{N+1-(r+1)-k}{K-k}\right] - (2r+1)E[X] - r^{2} - r\\
&= \frac{r(r+1)}{\binom{N}{K}}\left[\binom{N+2}{K}\right] - (2r+1)E[X] - r^{2} - r = \frac{rK(N-r+Kr+1)}{(N-K+1)(N-K+2)}
\end{aligned}$$

Then the variance is $\mathrm{Var}[X] = E[X^{2}] - \left(E[X]\right)^{2} = \frac{rK(N+1)(N-K-r+1)}{(N-K+1)^{2}(N-K+2)}$.
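Both the intermediate second moment and the final variance can be checked the same way. This sketch (the helper name `nhg_moments` is illustrative; $1 \le r \le N-K$ assumed) sums $k\Pr(X=k)$ and $k^2\Pr(X=k)$ exactly and compares them with the closed forms for $E[X^2]$ and $\mathrm{Var}[X]$.

```python
from fractions import Fraction
from math import comb


def nhg_moments(N: int, K: int, r: int) -> tuple[Fraction, Fraction]:
    """Return (E[X], E[X^2]) summed directly over the closed-form pmf."""
    denom = comb(N, K)
    # Unnormalized pmf weights C(k+r-1, k) C(N-r-k, K-k) for k = 0..K.
    weights = [comb(k + r - 1, k) * comb(N - r - k, K - k) for k in range(K + 1)]
    m1 = Fraction(sum(k * w for k, w in enumerate(weights)), denom)
    m2 = Fraction(sum(k * k * w for k, w in enumerate(weights)), denom)
    return m1, m2


for N in range(1, 12):
    for K in range(N + 1):
        for r in range(1, N - K + 1):
            m1, m2 = nhg_moments(N, K, r)
            assert m2 == Fraction(r * K * (N - r + K * r + 1), (N - K + 1) * (N - K + 2))
            variance = m2 - m1 ** 2
            assert variance == Fraction(r * K * (N + 1) * (N - K - r + 1),
                                        (N - K + 1) ** 2 * (N - K + 2))
```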

Related distributions

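If the drawing stops after a constant number $n$ of draws (regardless of the number of failures), then the number of successes has the hypergeometric distribution, $HG_{N,K,n}(k)$. The two are related in the following way,[1] where $NHG$ and $HG$ here denote cumulative distribution functions:

$$NHG_{N,K,r}(k) = 1 - HG_{N,N-K,k+r}(r-1)$$

That is, at most $k$ successes precede the $r$-th failure exactly when the first $k+r$ draws contain at least $r$ failures, and the number of failures among $k+r$ draws follows $HG_{N,N-K,k+r}$.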
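The relation can be checked by computing both cumulative distribution functions exactly. In this sketch the helper names `nhg_cdf` and `hg_cdf` are illustrative, and $1 \le r \le N-K$ is assumed. Note that the hypergeometric side counts *failures*, so it is called with $N-K$ "marked" elements and $k+r$ draws.

```python
from fractions import Fraction
from math import comb


def nhg_cdf(k: int, N: int, K: int, r: int) -> Fraction:
    """Pr(X <= k) for X ~ NHG_{N,K,r}, summed from the closed-form pmf."""
    return sum((Fraction(comb(j + r - 1, j) * comb(N - r - j, K - j), comb(N, K))
                for j in range(k + 1)), Fraction(0))


def hg_cdf(x: int, N: int, K: int, n: int) -> Fraction:
    """Pr(H <= x) for H ~ HG_{N,K,n}: marked elements in n draws without replacement."""
    return sum((Fraction(comb(K, i) * comb(N - K, n - i), comb(N, n))
                for i in range(x + 1)), Fraction(0))


# NHG_{N,K,r} CDF at k equals 1 minus the HG_{N,N-K,k+r} CDF at r-1.
N, K, r = 10, 4, 3
for k in range(K + 1):
    assert nhg_cdf(k, N, K, r) == 1 - hg_cdf(r - 1, N, N - K, k + r)
print(nhg_cdf(1, N, K, r))  # 19/42
```

The identity does not hold if the left-hand side is read as a point probability. For example, with $N=3$, $K=1$, $r=1$, the pmf at $k=1$ is $1/3$, while the right-hand side equals 1.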

The negative hypergeometric distribution (like the hypergeometric distribution) deals with draws without replacement, so the probability of success is different in each draw. In contrast, the negative binomial distribution (like the binomial distribution) deals with draws with replacement, so the probability of success is the same in every draw and the trials are independent. The following table summarizes the four distributions related to drawing items:

| | With replacement | Without replacement |
|---|---|---|
| # of successes in a constant # of draws | binomial distribution | hypergeometric distribution |
| # of successes in a constant # of failures | negative binomial distribution | negative hypergeometric distribution |

Some authors[3][4] define the negative hypergeometric distribution to be the number of draws required to get the $r$-th failure. If we let $Y$ denote this number, then clearly $Y = X + r$, where $X$ is as defined above. Hence the PMF is

$$\Pr(Y = y) = \binom{y-1}{r-1}\frac{\binom{N-y}{N-K-r}}{\binom{N}{N-K}}.$$

Denoting the number of failures $N-K$ by $M$, we have

$$\Pr(Y = y) = \binom{y-1}{r-1}\frac{\binom{N-y}{M-r}}{\binom{N}{M}}.$$

The support of $Y$ is the set $\{r, r+1, \dots, N-M+r\}$. It is clear that:

$$E[Y] = E[X] + r = \frac{r(N+1)}{M+1}$$

and $\mathrm{Var}[X] = \mathrm{Var}[Y]$.
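The shifted parametrization can be checked against the formulas for $X$. This sketch uses illustrative helper names, `y_pmf` and `y_mean_and_variance`, and assumes $1 \le r \le M \le N$. It evaluates $\Pr(Y=y)$ on the stated support and computes $E[Y]$ and $\mathrm{Var}[Y]$ by direct summation.

```python
from fractions import Fraction
from math import comb


def y_pmf(y: int, N: int, M: int, r: int) -> Fraction:
    """Pr(Y = y), Y = number of draws needed to see the r-th of M failures."""
    if not r <= y <= N - M + r:
        return Fraction(0)  # outside the support {r, ..., N - M + r}
    return comb(y - 1, r - 1) * Fraction(comb(N - y, M - r), comb(N, M))


def y_mean_and_variance(N: int, M: int, r: int) -> tuple[Fraction, Fraction]:
    """E[Y] and Var[Y] by direct summation over the support."""
    support = range(r, N - M + r + 1)
    mean = sum((y * y_pmf(y, N, M, r) for y in support), Fraction(0))
    second = sum((y * y * y_pmf(y, N, M, r) for y in support), Fraction(0))
    return mean, second - mean ** 2


N, M, r = 10, 6, 3  # M = N - K failures, so K = 4 successes
mean_y, var_y = y_mean_and_variance(N, M, r)
print(mean_y, var_y)  # 33/7 66/49
```

Here $E[Y] = 3 \cdot 11 / 7 = 33/7$, which is $E[X] + r = 12/7 + 3$, and the variance $66/49$ matches the value of $\mathrm{Var}[X]$ for $N=10$, $K=4$, $r=3$.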

References

  1. ^ a b Negative hypergeometric distribution in Encyclopedia of Math.
  2. ^ Johnson, Norman L.; Kemp, Adrienne W.; Kotz, Samuel (2005). Univariate Discrete Distributions. Wiley. ISBN 0-471-27246-9. §6.2.2 (pp. 253–254).
  3. ^ Rohatgi, Vijay K.; Saleh, A. K. Md. Ehsanes (2015). An Introduction to Probability and Statistics. John Wiley & Sons.
  4. ^ Khan, R. A. (1994). "A note on the generating function of a negative hypergeometric distribution". Sankhya: The Indian Journal of Statistics, Series B, 56(3), 309–313.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Negative_hypergeometric_distribution&oldid=1291634405"