Amalan-ConStat/fitODBOD

fitODBOD : R package to fit Overdispersed Binomial Outcome Data

License

Unknown, MIT licenses found

fitODBOD

How to engage with “fitODBOD” for the first time?

## Installing the package from GitHub
devtools::install_github("Amalan-ConStat/fitODBOD")

## Installing the package from CRAN
install.packages("fitODBOD")

The previous version of “fitODBOD”, version 1.4.1, is available in the GitHub repository as R-fitODBOD.

Key Phrases

BOD (Binomial Outcome Data)
Over Dispersion
Under Dispersion
FBMD (Family of Binomial Mixture Distributions)
ABD (Alternate Binomial Distributions)
PMF (Probability Mass Function)
CPMF (Cumulative Probability Mass Function)

What does “fitODBOD” do?

You can understand BMD & ABD through their PMF & CPMF. Further, BOD can be modeled using these distributions.

Distributions

Alternate Binomial Distributions	Binomial Mixture Distributions
1. Additive Binomial Distribution	1. Uniform Binomial Distribution
2. Beta-Correlated Binomial Distribution	2. Triangular Binomial Distribution
3. COM Poisson Binomial Distribution	3. Beta-Binomial Distribution
4. Correlated Binomial Distribution	4. Kumaraswamy Binomial Distribution
5. Multiplicative Binomial Distribution	5. Gaussian Hypergeometric Generalized Beta-Binomial Distribution
6. Lovinson Multiplicative Binomial Distribution	6. McDonald Generalized Beta-Binomial Distribution
	7. Gamma Binomial Distribution
	8. Grassia II Binomial Distribution

Modelling

To demonstrate the process, the Alcohol Consumption Data, which is the data-set most commonly used by researchers to explain Over-dispersion, will be used {lemmens1988}. In this data-set, the number of alcohol consumption days in two reference weeks is separately self-reported by a randomly selected sample of $399$ respondents from the Netherlands in $1983$. Here, the number of days a given individual consumes alcohol out of the seven days of a week can be treated as a Binomial variable. The collection of all such variables from all respondents is defined as “Binomial Outcome Data”.

fitODBODRshiny

An Rshiny application and package named fitODBODRshiny is available to fit a few selected Binomial Outcome data-sets through Binomial Mixture and Alternate Binomial distributions.

Step 1

The Alcohol Consumption data is already in the format needed to apply steps $2$ to $5$, and hence step $1$ can be skipped. Steps $2$ to $5$ can be applied only if the data-set is in the form of a frequency table, as follows.

library("fitODBOD"); library("flextable",quietly=TRUE) ## Loading packages
#> Hello, This is Amalan. For more details refer --> https://amalan-constat.github.io/fitODBOD/index.html

print(Alcohol_data) ## print the alcohol consumption data set
#>   Days week1 week2
#> 1    0    47    42
#> 2    1    54    47
#> 3    2    43    54
#> 4    3    40    40
#> 5    4    40    49
#> 6    5    41    40
#> 7    6    39    43
#> 8    7    95    84

sum(Alcohol_data$week1) ## No of respondents or N
#> [1] 399

Alcohol_data$Days ## Binomial random variables or x
#> [1] 0 1 2 3 4 5 6 7

Suppose your data-set is not a frequency table, as in the following data-set called datapoints. Then the function BODextract can be used to prepare the appropriate format as follows.

datapoints <- sample(0:7,340,replace=TRUE) ## creating a set of raw BOD
head(datapoints) ## first few observations of datapoints dataset
#> [1] 3 5 2 5 4 7

## extracting and printing BOD in a usable way for the package
new_data <- BODextract(datapoints)
matrix(c(new_data$RV,new_data$Freq),ncol=2,byrow=FALSE)
#>      [,1] [,2]
#> [1,]    0   40
#> [2,]    1   46
#> [3,]    2   47
#> [4,]    3   36
#> [5,]    4   42
#> [6,]    5   38
#> [7,]    6   48
#> [8,]    7   43
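For readers who want to see what such a conversion involves, here is a minimal base-R sketch of tabulating raw Binomial outcome data into a frequency table. It is not the implementation of BODextract; the seed, the object names `raw` and `freq_table`, and the column names are chosen here for illustration only.

```r
# Hypothetical raw Binomial outcome data: each value is the number of
# "successes" out of n = 7 trials for one respondent.
set.seed(1)
n   <- 7
raw <- sample(0:n, 340, replace = TRUE)

# factor() with explicit levels keeps outcomes that never occur (frequency 0),
# so the table always has n + 1 rows.
freq_table <- as.data.frame(table(factor(raw, levels = 0:n)),
                            stringsAsFactors = FALSE)
names(freq_table) <- c("RV", "Freq")
freq_table$RV <- as.integer(freq_table$RV)

print(freq_table)
```

The two columns play the same role as `new_data$RV` and `new_data$Freq` above: one row per possible outcome, and frequencies that sum to the number of respondents.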

Step 2

In the second step we test whether the Alcohol Consumption data follows the Binomial distribution, based on the hypotheses given below:

Null Hypothesis : The data follows Binomial Distribution.

Alternate Hypothesis : The data does not follow Binomial Distribution.

The Alcohol Consumption data consists of frequency information for two weeks, but only the first week is considered for computation. By doing so the researcher can verify whether the results acquired from the functions are similar to the results reported in previous researchers' work.

BinFreq <- fitBin(x=Alcohol_data$Days,obs.freq=Alcohol_data$week1)
#> Chi-squared approximation may be doubtful because expected frequency is less than 5

print(BinFreq)
#> Call:
#> fitBin(x = Alcohol_data$Days, obs.freq = Alcohol_data$week1)
#>
#> Chi-squared test for Binomial Distribution
#>
#>       Observed Frequency :  47 54 43 40 40 41 39 95
#>
#>       expected Frequency :  1.59 13.41 48.3 96.68 116.11 83.66 33.49 5.75
#>
#>       estimated probability value : 0.5456498
#>
#>       X-squared : 2911.434   ,df : 6   ,p-value : 0

Looking at the p-value, it is clear that the null hypothesis is rejected at the $5\%$ significance level. This indicates that the data does not fit the Binomial distribution. The warning message appears because one of the expected frequencies in the results is less than five. Now we compare the actual and the fitted Binomial variances.
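The numbers in this output can be approximately reproduced with base R alone. The sketch below assumes the standard approach (estimate the probability as total successes over total trials, then run a Pearson chi-squared test with one estimated parameter); it is not the source code of fitBin, and the statistic it computes may differ from the printed one in the last digits depending on rounding.

```r
# Week 1 of the Alcohol Consumption data, typed in from the table above
x   <- 0:7
obs <- c(47, 54, 43, 40, 40, 41, 39, 95)
n   <- max(x)   # number of trials per respondent (7 days)
N   <- sum(obs) # number of respondents (399)

# Estimated success probability: total successes / total trials
p_hat <- sum(x * obs) / (n * N)

# Expected frequencies under Binomial(n, p_hat)
expected <- N * dbinom(x, size = n, prob = p_hat)

# Pearson chi-squared statistic; one degree of freedom is lost for the
# total and one for the estimated probability
chi_sq  <- sum((obs - expected)^2 / expected)
df      <- length(x) - 1 - 1
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(p_hat, 7)
round(expected, 2)
c(chi_sq = chi_sq, df = df, p_value = p_value)
```

The first expected frequency is well below five, which is the situation the warning message refers to.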

## Actual variance of observed frequencies
var(rep(Alcohol_data$Days,times=Alcohol_data$week1))
#> [1] 6.253788

## Calculated variance for frequencies of fitted Binomial distribution
var(rep(BinFreq$bin.ran.var,times=fitted(BinFreq)))
#> [1] 1.696035

The variance of the observed frequencies and the variance of the fitted frequencies are 6.2537877 and 1.6960355 respectively. The observed variance is far larger than the Binomial fit allows, which indicates Over-dispersion.
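As a cross-check, the same comparison can be made against the theoretical Binomial variance, using $n = 7$ and the estimated probability $\hat{p} = 0.5456498$ printed in Step 2. The worked arithmetic below is an illustration added here, not part of the package output.

```latex
\operatorname{Var}(X) = n\,\hat{p}\,(1-\hat{p})
  = 7 \times 0.5456498 \times 0.4543502
  \approx 1.7354
  \;<\; 6.2538 = s^2_{\text{observed}}
```

The theoretical value differs slightly from the 1.6960355 computed above, most likely because `rep()` truncates non-integer `times` towards zero, so the fitted frequencies are rounded down before the sample variance is taken. Either way, the observed variance is more than three times the Binomial variance.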

Step 3 and 4

Since Over-dispersion exists in the data, it is now necessary to fit the Binomial Mixture distributions Triangular Binomial, Beta-Binomial, Kumaraswamy Binomial (omitted because it is time consuming), Gamma Binomial, Grassia II Binomial, GHGBB and McGBB using the package, and to select the best-fitting distribution using the Negative Log Likelihood value, the p-value and a comparison of observed and expected frequencies. The modelling of these distributions is given in the next sub-sections.

a) Triangular Binomial distribution.

Maximizing the log likelihood value, or in our case minimizing the negative log likelihood, is what the EstMLExxx functions do. The mode parameter can be estimated using the EstMLETriBin function, and the estimated value is then passed to the fitTriBin function to check whether the data fit the Triangular Binomial distribution.

## estimating the mode
modeTB <- EstMLETriBin(x=Alcohol_data$Days,freq=Alcohol_data$week1)
coef(modeTB) ## printing the estimated mode
#>  mode
#>  0.944444

## printing the Negative log likelihood value which is minimized
NegLLTriBin(x=Alcohol_data$Days,freq=Alcohol_data$week1,mode=modeTB$mode)
#> [1] 880.6167

To fit the Triangular Binomial distribution for the estimated mode parameter, the following hypotheses are used:

Null Hypothesis : The data follows Triangular Binomial Distribution.

Alternate Hypothesis : The data does not follow Triangular Binomial Distribution.

## fitting the Triangular Binomial Distribution for the estimated mode value
fTB <- fitTriBin(x=Alcohol_data$Days,obs.freq=Alcohol_data$week1,mode=modeTB$mode)
print(fTB)
#> Call:
#> fitTriBin(x = Alcohol_data$Days, obs.freq = Alcohol_data$week1,
#>     mode = modeTB$mode)
#>
#> Chi-squared test for Triangular Binomial Distribution
#>
#>       Observed Frequency :  47 54 43 40 40 41 39 95
#>
#>       expected Frequency :  11.74 23.47 35.21 46.94 58.66 70.2 79.57 73.21
#>
#>       estimated Mode value: 0.944444
#>
#>       X-squared : 193.6159   ,df : 6   ,p-value : 0
#>
#>       over dispersion : 0.2308269

AIC(fTB)
#> [1] 1763.233

var(rep(fTB$bin.ran.var,times=fitted(fTB)))
#> [1] 3.786005

Since the p-value is 0, which is less than $0.05$, the null hypothesis is rejected; the estimated Over-dispersion is 0.2308269. Therefore, it is necessary to fit a more flexible distribution than the Triangular Binomial distribution.
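The AIC value printed above is consistent with the usual definition, AIC = 2k + 2 × (negative log likelihood), where k is the number of estimated parameters. The helper below is a hypothetical one written for this illustration (it is not a fitODBOD function), and it assumes the Triangular Binomial fit counts one parameter, the mode.

```r
# AIC from a minimized negative log likelihood and the number of
# estimated parameters k (hypothetical helper, not part of fitODBOD)
aic_from_negll <- function(negll, k) {
  2 * k + 2 * negll
}

# Triangular Binomial: NegLL = 880.6167 with one parameter (mode)
aic_tri <- aic_from_negll(negll = 880.6167, k = 1)
aic_tri
```

The result agrees with `AIC(fTB)` up to rounding, which makes it easy to see why distributions with more parameters are only preferred when they lower the negative log likelihood by more than one unit per extra parameter.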

b) Beta-Binomial distribution.

To estimate the two shape parameters of the Beta-Binomial distribution, the Method of Moments or Maximum Likelihood estimation can be used. The function EstMLEBetaBin (a wrapper around mle2 from the package bbmle) minimizes the Negative Log Likelihood value. In order to estimate the shape parameters a and b, initial shape parameter values have to be given by the user to this function. These initial values have to be in the domain of the shape parameters. Given below is a pair of initial values, a=0.1 and b=0.1.

## estimating the shape parameters a, b
estimate <- EstMLEBetaBin(x=Alcohol_data$Days,freq=Alcohol_data$week1,a=0.1,b=0.1)
estimate@min ## extracting the minimized Negative log likelihood value
#> [1] 813.4571

## extracting the estimated shape parameter a, b
a1 <- bbmle::coef(estimate)[1]
b1 <- bbmle::coef(estimate)[2]
print(c(a1,b1)) ## printing the estimated shape parameters
#>         a         b
#> 0.7229420 0.5808483
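To make the idea of "minimizing the Negative Log Likelihood" concrete, here is a minimal base-R sketch that does the same job with `optim()` instead of `bbmle::mle2`. It assumes the standard Beta-Binomial PMF, choose(n, x) · B(a + x, b + n − x) / B(a, b); whether the package uses exactly this parameterisation is an assumption, and the function name `negll_betabin` is made up for the example.

```r
# Week 1 of the Alcohol Consumption data, typed in from Step 1
x   <- 0:7
obs <- c(47, 54, 43, 40, 40, 41, 39, 95)
n   <- 7

# Negative log likelihood of the (assumed) standard Beta-Binomial PMF.
# Parameters are passed on the log scale so that a and b stay positive.
negll_betabin <- function(log_par) {
  a <- exp(log_par[1])
  b <- exp(log_par[2])
  log_pmf <- lchoose(n, x) + lbeta(a + x, b + n - x) - lbeta(a, b)
  -sum(obs * log_pmf)
}

# Start from the same initial values as above: a = 0.1, b = 0.1
start <- log(c(0.1, 0.1))
fit   <- optim(par = start, fn = negll_betabin)

exp(fit$par) # estimated a and b
fit$value    # minimized negative log likelihood
```

If the parameterisation matches, the estimates and the minimized value should land close to those reported by EstMLEBetaBin; small differences are expected because `optim()` defaults to the Nelder-Mead method.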

To fit the Beta-Binomial distribution for the shape parameters estimated by Maximum Likelihood, the following hypotheses are used:

Null Hypothesis : The data follows Beta-Binomial Distribution by the Maximum Likelihood Estimates.

Alternate Hypothesis : The data does not follow Beta-Binomial Distribution by the Maximum Likelihood Estimates.

## fitting Beta Binomial Distribution for estimated shape parameters
fBB1 <- fitBetaBin(x=Alcohol_data$Days,obs.freq=Alcohol_data$week1,a=a1,b=b1)
print(fBB1)
#> Call:
#> fitBetaBin(x = Alcohol_data$Days, obs.freq = Alcohol_data$week1,
#>     a = a1, b = b1)
#>
#> Chi-squared test for Beta-Binomial Distribution
#>
#>           Observed Frequency :  47 54 43 40 40 41 39 95
#>
#>           expected Frequency :  54.62 42 38.9 38.54 40.07 44 53.09 87.78
#>
#>           estimated a parameter : 0.722942   ,estimated b parameter : 0.5808483
#>
#>           X-squared : 9.5171   ,df : 5   ,p-value : 0.0901
#>
#>           over dispersion : 0.4340673

AIC(fBB1)
#> [1] 1630.914

var(rep(fBB1$bin.ran.var,times=fitted(fBB1)))
#> [1] 6.24275

The p-value of 0.0901 $> 0.05$ indicates that the null hypothesis is not rejected. The Beta-Binomial distribution with the current estimated shape parameters fits the data. Note that the estimated Over-dispersion parameter is 0.4340673.
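The reported Over-dispersion is numerically consistent with the usual Beta-Binomial intra-class correlation, ρ = 1 / (a + b + 1). That fitBetaBin uses this formula is an inference from the numbers matching, not something stated in this document. The sketch below also shows how ρ inflates the Binomial variance by the factor 1 + (n − 1)ρ.

```r
# Maximum Likelihood estimates printed above
a <- 0.7229420
b <- 0.5808483
n <- 7

# Over-dispersion (intra-class correlation) of the Beta-Binomial
rho <- 1 / (a + b + 1)

# Mean success probability, and the variance with and without inflation
p       <- a / (a + b)
var_bin <- n * p * (1 - p)                   # plain Binomial variance
var_bb  <- var_bin * (1 + (n - 1) * rho)     # Beta-Binomial variance

round(c(rho = rho, var_bin = var_bin, var_bb = var_bb), 4)
```

With ρ close to 0.43 the variance is inflated by a factor of about 3.6, which is how the Beta-Binomial model reaches a variance near the observed 6.25 where the plain Binomial cannot.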

The function EstMGFBetaBin is used as below to estimate the shape parameters a and b using the Method of Moments.

## estimating the shape parameter a, b
estimate <- EstMGFBetaBin(Alcohol_data$Days,Alcohol_data$week1)
print(c(estimate$a,estimate$b)) ## printing the estimated parameters a, b
#> [1] 0.7161628 0.5963324

## finding the minimized negative log likelihood value
NegLLBetaBin(x=Alcohol_data$Days,freq=Alcohol_data$week1,a=estimate$a,b=estimate$b)
#> [1] 813.5872

To fit the Beta-Binomial distribution for the shape parameters estimated by the Method of Moments, the following hypotheses are used:

Null Hypothesis : The data follows Beta-Binomial Distribution by the Method of Moments.

Alternate Hypothesis : The data does not follow Beta-Binomial Distribution by the Method of Moments.

## fitting Beta-Binomial Distribution to estimated shape parameters
fBB2 <- fitBetaBin(x=Alcohol_data$Days,obs.freq=Alcohol_data$week1,a=estimate$a,b=estimate$b)
print(fBB2)
#> Call:
#> fitBetaBin(x = Alcohol_data$Days, obs.freq = Alcohol_data$week1,
#>     a = estimate$a, b = estimate$b)
#>
#> Chi-squared test for Beta-Binomial Distribution
#>
#>           Observed Frequency :  47 54 43 40 40 41 39 95
#>
#>           expected Frequency :  56.6 43.01 39.57 38.97 40.27 43.89 52.39 84.29
#>
#>           estimated a parameter : 0.7161628   ,estimated b parameter : 0.5963324
#>
#>           X-squared : 9.7362   ,df : 5   ,p-value : 0.0831
#>
#>           over dispersion : 0.4324333

AIC(fBB2)
#> [1] 1631.174

var(rep(fBB2$bin.ran.var,times=fitted(fBB2)))
#> [1] 6.273084

Estimating the parameters by the Method of Moments leads to a p-value of 0.0831, which is greater than $0.05$ and indicates that the null hypothesis is not rejected. The Beta-Binomial distribution with parameters estimated through the Method of Moments fits the data, with an estimated Over-dispersion of 0.4324333.

c) Gamma Binomial distribution.

The shape parameters c and l are estimated and fitted below. Suppose the selected input parameters are c=10.1 and l=5.1.

## estimating the shape parameters
estimate <- EstMLEGammaBin(x=Alcohol_data$Days,freq=Alcohol_data$week1,c=10.1,l=5.1)
estimate@min ## extracting the minimized negative log likelihood value
#> [1] 814.0045

## extracting the shape parameter c and l
c1 <- bbmle::coef(estimate)[1]
l1 <- bbmle::coef(estimate)[2]
print(c(c1,l1)) ## print shape parameters
#>         c         l
#> 0.6036061 0.6030777

To fit the Gamma Binomial distribution for the estimated shape parameters, the following hypotheses are used:

Null Hypothesis : The data follows Gamma Binomial Distribution.

Alternate Hypothesis : The data does not follow Gamma Binomial Distribution.

## fitting Gamma Binomial Distribution to estimated shape parameters
fGB <- fitGammaBin(x=Alcohol_data$Days,obs.freq=Alcohol_data$week1,c=c1,l=l1)
print(fGB)
#> Call:
#> fitGammaBin(x = Alcohol_data$Days, obs.freq = Alcohol_data$week1,
#>     c = c1, l = l1)
#>
#> Chi-squared test for Gamma Binomial Distribution
#>
#>       Observed Frequency :  47 54 43 40 40 41 39 95
#>
#>       expected Frequency :  54.59 41.39 38.7 38.71 40.54 44.69 53.79 86.58
#>
#>       estimated c parameter : 0.6036061   ,estimated l parameter : 0.6030777
#>
#>       X-squared : 10.6152   ,df : 5   ,p-value : 0.0596
#>
#>       over dispersion : 0.4308113

AIC(fGB)
#> [1] 1632.009

var(rep(fGB$bin.ran.var,times=fitted(fGB)))
#> [1] 6.228652

The null hypothesis is not rejected at the $5\%$ significance level (p-value = 0.0596) for the estimated parameters $c=$ 0.6036061 and $l=$ 0.6030777, with an estimated Over-dispersion of 0.4308113.

d) Grassia II Binomial distribution.

The shape parameters a and b are estimated and fitted below using the EstMLEGrassiaIIBin function. Suppose the selected input parameters are a=1.1 and b=5.1.

## estimating the shape parameters
estimate <- EstMLEGrassiaIIBin(x=Alcohol_data$Days,freq=Alcohol_data$week1,a=1.1,b=5.1)
estimate@min ## extracting the minimized negative log likelihood value
#> [1] 813.0395

## extracting the shape parameter a and b
a1 <- bbmle::coef(estimate)[1]
b1 <- bbmle::coef(estimate)[2]
print(c(a1,b1)) ## print shape parameters
#>         a         b
#> 0.7285039 2.0251513

To fit the Grassia II Binomial distribution for the estimated shape parameters, the following hypotheses are used:

Null Hypothesis : The data follows Grassia II Binomial Distribution.

Alternate Hypothesis : The data does not follow Grassia II Binomial Distribution.

## fitting Grassia II Binomial Distribution to estimated shape parameters
fGB2 <- fitGrassiaIIBin(x=Alcohol_data$Days,obs.freq=Alcohol_data$week1,a=a1,b=b1)
print(fGB2)
#> Call:
#> fitGrassiaIIBin(x = Alcohol_data$Days, obs.freq = Alcohol_data$week1,
#>     a = a1, b = b1)
#>
#> Chi-squared test for Grassia II Binomial Distribution
#>
#>       Observed Frequency :  47 54 43 40 40 41 39 95
#>
#>       expected Frequency :  55.02 42.36 39.08 38.51 39.78 43.39 52.13 88.74
#>
#>       estimated a parameter : 0.7285039   ,estimated b parameter : 2.025151
#>
#>       X-squared : 8.6999   ,df : 5   ,p-value : 0.1216
#>
#>       over dispersion : 0.259004

AIC(fGB2)
#> [1] 1630.079

var(rep(fGB2$bin.ran.var,times=fitted(fGB2)))
#> [1] 6.299827

The null hypothesis is not rejected at the $5\%$ significance level (p-value = 0.1216) for the estimated parameters $a=$ 0.7285039 and $b=$ 2.0251513, with an estimated Over-dispersion of 0.259004.

e) GHGBB distribution.

Now we estimate the shape parameters and fit the GHGBB distribution for a first set of randomly selected initial input shape parameters, a=10.1, b=1.1 and c=5.

## estimating the shape parameters
estimate <- EstMLEGHGBB(x=Alcohol_data$Days,freq=Alcohol_data$week1,a=10.1,b=1.1,c=5)
estimate@min ## extracting the minimized negative log likelihood value
#> [1] 809.2767

## extracting the shape parameter a, b and c
a1 <- bbmle::coef(estimate)[1]
b1 <- bbmle::coef(estimate)[2]
c1 <- bbmle::coef(estimate)[3]
print(c(a1,b1,c1)) ## printing the shape parameters
#>         a         b         c
#> 1.3506835 0.3245420 0.7005209

To fit the GHGBB distribution for the estimated shape parameters, the following hypotheses are used:

Null Hypothesis : The data follows Gaussian Hypergeometric Generalized Beta-Binomial Distribution.

Alternate Hypothesis : The data does not follow Gaussian Hypergeometric Generalized Beta-Binomial Distribution.

## fitting GHGBB distribution for estimated shape parameters
fGG <- fitGHGBB(Alcohol_data$Days,Alcohol_data$week1,a1,b1,c1)
print(fGG)
#> Call:
#> fitGHGBB(x = Alcohol_data$Days, obs.freq = Alcohol_data$week1,
#>     a = a1, b = b1, c = c1)
#>
#> Chi-squared test for Gaussian Hypergeometric Generalized Beta-Binomial Distribution
#>
#>       Observed Frequency :  47 54 43 40 40 41 39 95
#>
#>       expected Frequency :  47.88 50.14 46.52 42.08 38.58 37.32 41.78 94.71
#>
#>       estimated a parameter : 1.350683   ,estimated b parameter : 0.324542 ,
#>
#>       estimated c parameter : 0.7005209
#>
#>       X-squared : 1.2835   ,df : 4   ,p-value : 0.8642
#>
#>       over dispersion : 0.4324875

AIC(fGG)
#> [1] 1624.553

var(rep(fGG$bin.ran.var,times=fitted(fGG)))
#> [1] 6.249335

The null hypothesis is not rejected at the $5\%$ significance level (p-value = 0.8642). The estimated shape parameters are $a=$ 1.3506835, $b=$ 0.324542 and $c=$ 0.7005209, with an estimated Over-dispersion of 0.4324875.

f) McGBB distribution.

Given below are the results generated for the randomly selected initial input parameters a=1.1, b=5 and c=10.

## estimating the shape parameters
estimate <- EstMLEMcGBB(x=Alcohol_data$Days,freq=Alcohol_data$week1,a=1.1,b=5,c=10)
estimate@min ## extracting the negative log likelihood value which is minimized
#> [1] 809.7134

## extracting the shape parameter a, b and c
a1 <- bbmle::coef(estimate)[1]
b1 <- bbmle::coef(estimate)[2]
c1 <- bbmle::coef(estimate)[3]
print(c(a1,b1,c1)) ## printing the shape parameters
#>           a           b           c
#>  0.04099005  0.21082788 21.67349031

To fit the McGBB distribution for the estimated shape parameters, the following hypotheses are used:

Null Hypothesis : The data follows McDonald Generalized Beta-Binomial Distribution.

Alternate Hypothesis : The data does not follow McDonald Generalized Beta-Binomial Distribution.

## fitting the McGBB distribution for estimated shape parameters
fMB <- fitMcGBB(x=Alcohol_data$Days,obs.freq=Alcohol_data$week1,a=a1,b=b1,c=c1)
print(fMB)
#> Call:
#> fitMcGBB(x = Alcohol_data$Days, obs.freq = Alcohol_data$week1,
#>     a = a1, b = b1, c = c1)
#>
#> Chi-squared test for Mc-Donald Generalized Beta-Binomial Distribution
#>
#>       Observed Frequency :  47 54 43 40 40 41 39 95
#>
#>       expected Frequency :  51.37 45.63 43.09 41.5 40.42 39.97 42.11 94.91
#>
#>       estimated a parameter : 0.04099005   ,estimated b parameter : 0.2108279 ,
#>
#>       estimated c parameter : 21.67349
#>
#>       X-squared : 2.2222   ,df : 4   ,p-value : 0.695
#>
#>       over dispersion : 0.4359023

AIC(fMB)
#> [1] 1625.427

var(rep(fMB$bin.ran.var,times=fitted(fMB)))
#> [1] 6.288273

The null hypothesis is not rejected at the $5\%$ significance level (p-value = 0.695 $> 0.05$). The estimated shape parameters are $a=$ 0.04099, $b=$ 0.2108279 and $c=$ 21.6734903, with an estimated Over-dispersion of 0.4359023.

Step 5

The table below presents the expected frequencies, p-values, Negative Log Likelihood values, AIC values, Variance and Over-dispersion of the Binomial Mixture distributions obtained above for the Alcohol Consumption data.

## collecting observed and fitted frequencies into one table
BMD_Data <- tibble::tibble(w=BinFreq$bin.ran.var,x=BinFreq$obs.freq,y=fitted(BinFreq),z=fitted(fTB),
                           a=fitted(fBB1),a1=fitted(fBB2),c=fitted(fGG),d=fitted(fMB),
                           e=fitted(fGB),f=fitted(fGB2))
names(BMD_Data) <- c("Bin_RV","Actual_Freq","EstFreq_BinD","EstFreq_TriBinD",
                     "EstFreq_BetaBinD(MLE)","EstFreq_BetaBinD(MGF)","EstFreq_GHGBBD",
                     "EstFreq_McGBBD","EstFreq_GammaBinD","EstFreq_GrassiaIIBinD")

## summary rows: totals, variances, p-values, NegLL and AIC
BMD_Total <- colSums(BMD_Data[,-1])
BMD_Variance <- c(var(rep(BinFreq$bin.ran.var,times=BinFreq$obs.freq)),
                  var(rep(BinFreq$bin.ran.var,times=BinFreq$exp.freq)),
                  var(rep(BinFreq$bin.ran.var,times=fTB$exp.freq)),
                  var(rep(BinFreq$bin.ran.var,times=fBB1$exp.freq)),
                  var(rep(BinFreq$bin.ran.var,times=fBB2$exp.freq)),
                  var(rep(BinFreq$bin.ran.var,times=fGG$exp.freq)),
                  var(rep(BinFreq$bin.ran.var,times=fMB$exp.freq)),
                  var(rep(BinFreq$bin.ran.var,times=fGB$exp.freq)),
                  var(rep(BinFreq$bin.ran.var,times=fGB2$exp.freq)))
BMD_Variance <- round(BMD_Variance,4)
BMD_p_value <- c(BinFreq$p.value,fTB$p.value,fBB1$p.value,fBB2$p.value,
                 fGG$p.value,fMB$p.value,fGB$p.value,fGB2$p.value)
BMD_NegLL <- c(fTB$NegLL,fBB1$NegLL,fBB2$NegLL,fGG$NegLL,fMB$NegLL,fGB$NegLL,fGB2$NegLL)
BMD_NegLL <- round(BMD_NegLL,4)
BMD_AIC <- c(AIC(fTB),AIC(fBB1),AIC(fBB2),AIC(fGG),
             AIC(fMB),AIC(fGB),AIC(fGB2))
BMD_AIC <- round(BMD_AIC,4)

## count of expected frequencies within +/-5 of the observed frequencies
C_of_diff_Values <- c(length(BMD_Data$Actual_Freq),
                      sum(abs(BMD_Data$Actual_Freq-BMD_Data$EstFreq_BinD)<=5),
                      sum(abs(BMD_Data$Actual_Freq-BMD_Data$EstFreq_TriBinD)<=5),
                      sum(abs(BMD_Data$Actual_Freq-BMD_Data$'EstFreq_BetaBinD(MLE)')<=5),
                      sum(abs(BMD_Data$Actual_Freq-BMD_Data$'EstFreq_BetaBinD(MGF)')<=5),
                      sum(abs(BMD_Data$Actual_Freq-BMD_Data$EstFreq_GHGBBD)<=5),
                      sum(abs(BMD_Data$Actual_Freq-BMD_Data$EstFreq_McGBBD)<=5),
                      sum(abs(BMD_Data$Actual_Freq-BMD_Data$EstFreq_GammaBinD)<=5),
                      sum(abs(BMD_Data$Actual_Freq-BMD_Data$EstFreq_GrassiaIIBinD)<=5))
Overdispersion_BMD <- c(Overdispersion(fTB),Overdispersion(fBB1),Overdispersion(fBB2),
                        Overdispersion(fGG),Overdispersion(fMB),
                        Overdispersion(fGB),Overdispersion(fGB2))
Variance_difference <- c(abs(BMD_Variance[1]-BMD_Variance))
Variance_difference <- round(Variance_difference,4)

## appending the summary rows below the frequency table
rbind(BMD_Data,
      c("Total",BMD_Total),
      c("Variance",BMD_Variance),
      c("p-value","-",BMD_p_value),
      c("Negative Log Likelihood Value","-","-",BMD_NegLL),
      c("AIC","-","-",BMD_AIC),
      c("Count of difference Values",C_of_diff_Values),
      c("Variance difference",Variance_difference))->BMD_flexed_Data

## formatting the table with flextable and plotting it
flextable(data=BMD_flexed_Data,
          col_keys=c("Bin_RV","Actual_Freq","EstFreq_BinD","EstFreq_TriBinD",
                     "EstFreq_BetaBinD(MLE)","EstFreq_BetaBinD(MGF)","EstFreq_GHGBBD",
                     "EstFreq_McGBBD","EstFreq_GammaBinD","EstFreq_GrassiaIIBinD"))|>
  theme_box()|>
  autofit()|>
  fontsize(i=c(1:15),j=c(1:10),size=15,part="body")|>
  fontsize(i=1,j=c(1:10),size=16,part="header")|>
  bold(i=1,part="header")|>
  bold(i=c(9:15),j=1,part="body")|>
  align(i=c(1:15),j=c(1:10),align="center")|>
  set_header_labels(values=c(Bin_RV="Binomial Random Variable",
                             Actual_Freq="Observed Frequencies",
                             EstFreq_BinD="Binomial Distribution",
                             EstFreq_TriBinD="Triangular Binomial Distribution",
                             'EstFreq_BetaBinD(MLE)'="Beta Binomial Distribution(MLE)",
                             'EstFreq_BetaBinD(MGF)'="Beta Binomial Distribution(MGF)",
                             EstFreq_GHGBBD="Gaussian Hypergeometric Generalized Beta Binomial Distribution",
                             EstFreq_McGBBD="McDonald Generalized Beta Binomial Distribution",
                             EstFreq_GammaBinD="Gamma Binomial Distribution",
                             EstFreq_GrassiaIIBinD="Grassia II Binomial Distribution"))|>
  align(i=1,part="header",align="center")|>
  gen_grob(scaling="fixed",fit="width",just="center")->Final_plot

plot(Final_plot)

Conclusion

The best-fitting distribution is chosen by comparing five main measurements shown in the above table: the p-value, the Negative Log Likelihood value, the count of differences between expected and observed frequencies within the range of $\pm 5$, the absolute variance difference and the AIC value.

The following five criteria are therefore considered in the selection procedure:

The p-value $> 0.05$ from the hypothesis test.
The Negative Log Likelihood value.
The AIC value.
The number of difference values within the range of $\pm 5$.
The absolute variance difference between expected and observed frequency.

The Triangular Binomial and Binomial distributions do not fit the data, since their p-values are $< 0.05$. The Negative Log Likelihood values of the GHGBB and McGBB distributions are the lowest and are quite similar. Likewise, the AIC values are lowest for GHGBB and McGBB, while the highest AIC value belongs to the Triangular Binomial distribution. The count of difference values is four out of eight for the Beta-Binomial distribution, and similar for the Gamma Binomial and Grassia II Binomial distributions, whereas for the McGBB distribution it is seven out of eight.

Further, the Over-dispersion parameters of the Beta-Binomial, Gamma Binomial, GHGBB and McGBB distributions agree to the second decimal place (Over-dispersion = $0.43$), whereas those of the Triangular Binomial and Grassia II Binomial distributions agree only to the first decimal place (Over-dispersion = $0.2$). The variance difference is clearly highest for the Binomial distribution and lowest for the GHGBB distribution, while for the others it appears only from the second decimal place.

The best-fitting distribution, GHGBB, has the highest p-value of $0.8642$, the lowest Negative Log Likelihood value of $809.2767$ and the lowest AIC value of $1624.5534$; its count of difference values is eight out of eight, and it indicates an estimated Over-dispersion of $0.4324875$. The variance difference between the observed and expected frequencies of GHGBB is the smallest, at $0.0045$.
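Two of the criteria above can be checked by hand from the printed output, without the package. The sketch below types in the observed frequencies and the expected frequencies and Negative Log Likelihood values reported earlier for three of the fits; the object names are chosen for this example only.

```r
# Observed week-1 frequencies and expected frequencies copied from the
# printed fits above
obs <- c(47, 54, 43, 40, 40, 41, 39, 95)
expected <- list(
  GHGBB       = c(47.88, 50.14, 46.52, 42.08, 38.58, 37.32, 41.78, 94.71),
  McGBB       = c(51.37, 45.63, 43.09, 41.5, 40.42, 39.97, 42.11, 94.91),
  BetaBin_MLE = c(54.62, 42, 38.9, 38.54, 40.07, 44, 53.09, 87.78)
)

# Criterion: number of expected frequencies within +/-5 of the observed ones
within_5 <- sapply(expected, function(e) sum(abs(obs - e) <= 5))
within_5

# Criterion: smallest minimized Negative Log Likelihood
negll <- c(GHGBB = 809.2767, McGBB = 809.7134, BetaBin_MLE = 813.4571)
best  <- names(which.min(negll))
best
```

Both checks point to GHGBB, in line with the conclusion drawn from the full table.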

Thank You

About

fitODBOD : R package to fit Overdispersed Binomial Outcome Data

amalan-constat.github.io/fitODBOD/

Topics

r-package overdispersion binomial-outcome-data
