Movatterモバイル変換

The multivariate negative binomial distribution.

Recall that a random variable\(X\) has (univariate) negative binomial law with parameters\(\kappa>0, 0<p<1\), i.e.,\(X\sim \text{NB}(\kappa, p)\) if its probability mass function is given by\[P(X=x) = {\kappa+x-1 \choose x} p^x(1-p)^{\kappa}, \quad x \in \{0,1,\ldots\}.\]

A random vector\({\bf X}=(X_1,\dots,X_n)'\) is said to follow the multivariate negative binomial distribution with parameters\(\kappa, p_1, \dots, p_n\) if its probability mass function is given by\[P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n+\kappa)}{x_1! \cdots x_n! \Gamma(\kappa)}p_1^{x_1}\cdots p_n^{x_n}(1-p_1-\cdots-p_n)^{\kappa},\] where, for\(i=1,\dots,n\),\(x_i\in\{0,1,\dots\}\),\(0<p_i<1\) such that\(\sum_{i=1}^np_i<1\) and\(\kappa>0\).

We note that the functionstats::rnbinom can be used to simulate from the univariate negative binomial distribution. Thetrawl package introduces the functionBivariate_NBsim which generates samples from the bivariate negative binomial distribution. The simulation algorithm proceeds in two steps: First, we simulate\(X_1\) from the univariate negative binomial distribution NB(\(\kappa\),\(p_1/(1-p_2)\)). Second, we simulate\(X_2|X_1=x_1\) from the univariate negative binomial distribution NB(\(\kappa+x_1,p_2\)), see for instance(Dunn 1967).

###Example

set.seed(1)kappa<-3p1<-0.1p2<-0.85p<- p1+p2p0<-1-p1-p2N<-10000#Simulate from the bivariate negative binomial distributiony<- trawl::Bivariate_NBsim(N,kappa,p1,p2)#Compare the empirical and theoretical mean of the first componentbase::mean(y[,1])#> [1] 5.9653kappa*p1/(1-p)#> [1] 6#Compare the empirical and theoretical variance of the first componentstats::var(y[,1])#> [1] 18.0047kappa*p1*(1-p2)/(1-p)^2#> [1] 18#Compare the empirical and theoretical mean of the second componentbase::mean(y[,2])#> [1] 50.9742kappa*p2/(1-p)#> [1] 51#Compare the empirical and theoretical variance of the second componentstats::var(y[,2])#> [1] 926.3358kappa*p2*(1-p1)/(1-p)^2#> [1] 918#Compare the empirical and theoretical correlation between the two componentsstats::cor(y[,1],y[,2])#> [1] 0.7891007(p1*p2/(p0+p1)*(p0+p2))^(1/2)#> [1] 0.7141428

The multivariate logarithmic series distribution

We say that a vector\({\bf X}=(X_1,\dots,X_n)'\) follows the multivariate logarithmic series distribution (LSD), see, e.g.,(Patil and Bildikar 1967).\({\bf X} \sim \text{LSD}(p_1,\ldots,p_n)\), where\(0<p_i<1, p:=\sum_{i=1}^np_i <1\) if for\({\bf x}\in \mathbb{N}_0^n \setminus \{{\bf 0} \}\), if its probability mass function is given by\[P({\bf X}={\bf x})=\frac{\Gamma(x_1+\cdots+x_n)}{x_1! \cdots x_n!}\frac{p_1^{x_1}\cdots p_n^{x_n}}{\{-\ln(1-p)\}}.\] Note that each component\(X_i\) follows the modified univariate logarithmic distribution with parameters\(\tilde p_i = p_i/(1-p+p_i)\) and\(\delta_i= \ln(1-p+p_i)/\ln(1-p)\), i.e.,\(X_i\sim\text{ModLSD}(\tilde p_i, \delta_i)\) with\[P(X_i =x_i) = \left\{ \begin{array}{ll} \delta_i, & \text{for } x_i=0\\(1-\delta_i) \frac{1}{x_i}\frac{\tilde p_i^{x_i}}{\{-\ln(1-\tilde p_i)\}}, & \text{for } x_i \in \mathbb{N}. \end{array} \right.\] Simulations from the univariate LSD can be carried out using the functionRunuran::urlogarithmic. Thetrawl package implements the functionsBivariate_LSDsim andTrivariate_LSDsim to simulate from both the bivariate and the trivariate logarithmic series distribution.

Simulating from the bivariate logarithmic series distribution

The functionBivariate_NBsim can be used to simulate from the bivariate logarithmic series distribution. To this end, note that the probability mass function of a random vector\({\bf X}=(X_1,X_2)'\) following the bivariate logarithmic series distribution with parameters\(0<p_1, p_2<1\) with\(p:=p_1+p_2<1\) is given by\[P(X_1=x_1,X_2=x_2)=\frac{\Gamma(x_1+x_2)}{x_1!x_2!}\frac{p_1^{x_1}p_2^{x_2}}{\{-\ln(1-p)\}},\] for\(x_1,x_2=0,1,2,\dots\) such that\(x_1+x_2>0\).

The simulation proceeds in two steps: First,\(X_1\) is simulated from the modified logarithmic distribution with parameters\(\tilde p_1=p_1/(1-p_2)\) and\(\delta_1=\ln(1-p_2)/\ln(1-p)\). Then we simulate\(X_2\) conditional on\(X_1\). We note that\(X_2|X_1=x_1\) follows the logarithmic series distribution with parameter\(p_2\) when\(x_1=0\), and the negative binomial distribution with parameters\((x_1,p_2)\) when\(x_1>0\).

Example

Next we provide an example of a simulation from the bivariate LSD and we showcase the functionsModLSD_Mean,ModLSD_Var,BivLSD_Cor andBivLSD_Cov which compute the mean and the variance of the univariate modified LSD and the correlation and covariance of the bivariate LSD, respectively.

set.seed(1)p1<-0.15p2<-0.3N<-10000#Simulate N realisations from the bivariate LSDy<-trawl::Bivariate_LSDsim(N, p1, p2)#Compute the empirical and theoretical mean of the first componentbase::mean(y[,1])#> [1] 0.4575trawl::ModLSD_Mean(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))#> [1] 0.45619#Compute the empirical and theoretical mean of the second componentbase::mean(y[,2])#> [1] 0.8995trawl::ModLSD_Mean(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))#> [1] 0.91238#Compute the empirical and theoretical variance of the first componentstats::var(y[,1])#> [1] 0.3686306trawl::ModLSD_Var(base::log(1-p2)/base::log(1-p1-p2),p1/(1-p2))#> [1] 0.3724961#Compute the empirical and theoretical variance of the second componentstats::var(y[,2])#> [1] 0.5690567trawl::ModLSD_Var(base::log(1-p1)/base::log(1-p1-p2),p2/(1-p1))#> [1] 0.5776045##Compute the empirical and theoretical correlation between the two componentsstats::cor(y[,1],y[,2])#> [1] -0.3961486trawl::BivLSD_Cor(p1,p2)#> [1] -0.3608673##Compute the empirical and theoretical covariance between the two componentsstats::cov(y[,1],y[,2])#> [1] -0.1814394trawl::BivLSD_Cov(p1,p2)#> [1] -0.1673877

Simulating from the trivariate logarithmic series distribution

The functionTrivariate_NBsim can be used to simulate from the trivariate logarithmic series distribution. The simulation proceeds in two steps: First,\(X_1\) is simulated from the modified logarithmic distribution with parameters\(\tilde p_1=p_1/(1-p_2-p_3)\) and\(\delta_1=\ln(1-p_2-p_3)/\ln(1-p)\). Then we simulate\((X_2,X_3)'\) conditional on\(X_1\). We note that\((X_2,X_3)'|X_1=x_1\) follows the bivariate logarithmic series distribution with paramaters\((p_2,p_3)\) when\(x_1=0\), and the bivariate negative binomial distribution with parameters\((x_1,p_2,p_3)\) when\(x_1>0\).

Example

set.seed(1)p1<-0.15p2<-0.25p3<-0.55N<-10000#Simulate N realisations from the bivariate LSDy<-trawl::Trivariate_LSDsim(N, p1, p2, p3)#Compute the empirical and theoretical mean of the first componentbase::mean(y[,1])#> [1] 1.0152trawl::ModLSD_Mean(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))#> [1] 1.001425#Compute the empirical and theoretical mean of the second componentbase::mean(y[,2])#> [1] 1.6857trawl::ModLSD_Mean(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))#> [1] 1.669041#Compute the empirical and theoretical mean of the third componentbase::mean(y[,3])#> [1] 3.7275trawl::ModLSD_Mean(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))#> [1] 3.67189#Compute the empirical and theoretical variance of the first componentstats::var(y[,1])#> [1] 2.999069trawl::ModLSD_Var(base::log(1-p2-p3)/base::log(1-p1-p2-p3),p1/(1-p2-p3))#> [1] 3.002847#Compute the empirical and theoretical variance of the second componentstats::var(y[,2])#> [1] 7.255041trawl::ModLSD_Var(base::log(1-p1-p3)/base::log(1-p1-p2-p3),p2/(1-p1-p3))#> [1] 7.228548#Compute the empirical and theoretical variance of the third componentstats::var(y[,3])#> [1] 30.87613trawl::ModLSD_Var(base::log(1-p1-p2)/base::log(1-p1-p2-p3),p3/(1-p1-p2))#> [1] 30.5799#Computing the bivariate covariances and correlations#Cor(X1,X2):delta<- base::log(1-p3)/base::log(1-p1-p2-p3)hatp1<-p1/(1-p3)hatp2<-p2/(1-p3)stats::cov(y[,1],y[,2])#> [1] 3.316309trawl::BivModLSD_Cov(delta,hatp1,hatp2)#> [1] 3.335704stats::cor(y[,1],y[,2])#> [1] 0.7109545trawl::BivModLSD_Cor(delta,hatp1,hatp2)#> [1] 0.6041258#Cor(X1,X3):delta<-log(1-p2)/log(1-p1-p2-p3)hatp1<-p1/(1-p2)hatp2<-p3/(1-p2)stats::cov(y[,1],y[,3])#> [1] 7.287671trawl::BivModLSD_Cov(delta,hatp1,hatp2)#> [1] 7.338549stats::cor(y[,1],y[,3])#> [1] 0.7573281trawl::BivModLSD_Cor(delta,hatp1,hatp2)#> [1] 0.7220044#Cor(X2,X3):delta<-log(1-p1)/log(1-p1-p2-p3)hatp1<-p2/(1-p1)hatp2<-p3/(1-p1)stats::cov(y[,2],y[,3])#> [1] 12.31899trawl::BivModLSD_Cov(delta,hatp1,hatp2)#> [1] 12.23092stats::cor(y[,2],y[,3])#> [1] 0.8230829trawl::BivModLSD_Cor(delta,hatp1,hatp2)#> [1] 0.7969081

##References

Dunn, James E. 1967.“Characterization of the Bivariate Negative Binomial Distribution.”Journal of the Arkansas Academy of Science 21.

Patil, Ganapati P., and Sheela Bildikar. 1967.“Multivariate Logarithmic Series Distribution as a Probability Model in Population and Community Ecology and Some of Its Statistical Properties.”Journal of the American Statistical Association 62 (318): 655–74.

Movatterモバイル変換

Simulation from the bivariate negative binomial and bi- and trivariate logarithmic series distribution

Almut E. D. Veraart

2021-02-22

The multivariate negative binomial distribution.

The multivariate logarithmic series distribution

Simulating from the bivariate logarithmic series distribution

Example

Simulating from the trivariate logarithmic series distribution

Example