R package for Maximal Information-Based Nonparametric Exploration computation
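Install the latest release from CRAN:

    install.packages("minerva")

Or install the development version from GitHub:

    devtools::install_github('filosi/minerva')

Basic usage with the helper function `mine`:

    library(minerva)
    x <- 0:200 / 200
    y <- sin(10 * pi * x) + x
    mine(x, y, n.cores=1)

Compute a single measure with `mine_stat`:

    x <- 0:200 / 200
    y <- sin(10 * pi * x) + x
    mine_stat(x, y, measure="mic")

To compute the `mic-r2` measure, use the `cor` R function:

    x <- 0:200 / 200
    y <- sin(10 * pi * x) + x
    r2 <- cor(x, y)
    mm <- mine_stat(x, y, measure="mic")
    mm - r2^2
    ## mine(x, y, n.cores=1)[[5]]

Statistics can also be computed on matrices: between all pairs of features in a single matrix (`mine_compute_pstat`), or between every feature of one matrix and every feature of another (`mine_compute_cstat`).

    ## Rows are samples, columns are features; both matrices need the same number of rows
    x <- matrix(rnorm(1000), ncol=10, nrow=100)
    y <- matrix(rnorm(2000), ncol=20, nrow=100)

    ## Compare features of the same matrix
    pstats(x)

    ## Compare features of matrix x with features of matrix y
    cstats(x, y)

The `mictools` pipeline below is inspired by the original implementation by Albanese et al., available in Python here: https://github.com/minepy/mictools.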
Read the data from the mictools repository:

    datasaurus <- read.table(
      "https://raw.githubusercontent.com/minepy/mictools/master/examples/datasaurus.txt",
      header=TRUE, row.names=1, as.is=TRUE, stringsAsFactors=FALSE)
    datasaurus.m <- t(datasaurus)

Compute the null distribution for `tic_e`. The `mictools` function automatically computes:

- The `tic_e` null distribution based on permutations.
- The observed `tic_e` for each pair of variables in `datasaurus`.
- The distribution of the observed `tic_e`.

    ticnull <- mictools(datasaurus.m, nperm=10000, seed=1234)

    ## Get the names of the named list
    names(ticnull)
    ##[1] "tic" "nulldist" "obstic" "obsdist" "pval"

The null distribution is stored in `nulldist`:

    ticnull$nulldist

| BinStart | BinEnd | NullCount | NullCumSum |
|---|---|---|---|
| 0e+00 | 1e-04 | 0 | 1e+05 |
| 1e-04 | 2e-04 | 0 | 1e+05 |
| 2e-04 | 3e-04 | 0 | 1e+05 |
| 3e-04 | 4e-04 | 0 | 1e+05 |
| 4e-04 | 5e-04 | 0 | 1e+05 |
| 5e-04 | 6e-04 | 0 | 1e+05 |
| … | … | …. | …. |
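
The `NullCumSum` column is at its maximum in the first bin and can only decrease as bins advance, which suggests a cumulative count taken from the right: the number of null values falling in that bin or any later one. The following base R sketch reproduces that bookkeeping under this reading of the table. The simulated values, the bin width, and the names `null_vals`, `null_hist`, and `empirical_p` are illustrative assumptions, not part of `minerva`, and the package may apply corrections that this sketch omits.

```r
set.seed(1234)

# Illustrative stand-in for permutation-based null statistics (not real tic_e values)
null_vals <- rbeta(1000, 2, 20)

# Bin the values into fixed-width bins on [0, 1]
breaks <- seq(0, 1, by = 0.01)
counts <- hist(null_vals, breaks = breaks, plot = FALSE)$counts

null_hist <- data.frame(
  BinStart   = head(breaks, -1),
  BinEnd     = tail(breaks, -1),
  NullCount  = counts,
  # Right-to-left cumulative sum: null values in this bin or any later bin
  NullCumSum = rev(cumsum(rev(counts)))
)

# Empirical p-value: share of null values at least as large as the observed one
empirical_p <- function(obs, null) mean(null >= obs)

head(null_hist)
empirical_p(0.2, null_vals)
```

With a table of this shape, an observed statistic can be looked up by bin instead of being compared against every permuted value.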

The observed distribution is stored in `obsdist`:

    ticnull$obsdist

| BinStart | BinEnd | Count | CountCum |
|---|---|---|---|
| 0e+00 | 1e-04 | 0 | 325 |
| 1e-04 | 2e-04 | 0 | 325 |
| 2e-04 | 3e-04 | 0 | 325 |
| 3e-04 | 4e-04 | 0 | 325 |
| 4e-04 | 5e-04 | 0 | 325 |
| 5e-04 | 6e-04 | 0 | 325 |
| … | … | …. | …. |
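
The `CountCum` value of 325 in the first bin matches the number of distinct variable pairs when every variable is compared with every other: 26 variables give 26 × 25 / 2 = 325 pairs. The base R sketch below checks that arithmetic. The figure of 26 variables is an assumption inferred from the variable indexes shown in this document, not read from the data file.

```r
# Assumed number of variables (columns of datasaurus.m)
n_vars <- 26

# Number of unordered pairs of distinct variables
n_pairs <- choose(n_vars, 2)
n_pairs

# The same pairs enumerated explicitly as (I1, I2) index pairs
pairs <- t(combn(n_vars, 2))
colnames(pairs) <- c("I1", "I2")
head(pairs)
nrow(pairs)
```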

Plot the `tic_e` and p-value distributions:

    hist(ticnull$tic)
    hist(ticnull$pval$pval, breaks=50, freq=FALSE)

Use `p.adjust.method` to choose a different p-value correction method, or use the `qvalue` package to apply Storey's q-value.

    library(qvalue)

    ## Correct pvalues using qvalue
    qobj <- qvalue(ticnull$pval$pval)

    ## Add column in the pval data.frame
    ticnull$pval$qvalue <- qobj$qvalues
    ticnull$pval

The p-value table, with the `qvalue` column added at the end:

| pval | I1 | I2 | Var1 | Var2 | adj.P.Val | qvalue |
|---|---|---|---|---|---|---|
| 0.5202 | 1 | 2 | away_x | bullseye_x | 0.95 | 1 |
| 0.9533 | 1 | 3 | away_x | circle_x | 0.99 | 1 |
| 0.0442 | 1 | 4 | away_x | dino_x | 0.52 | 0 |
| 0.6219 | 1 | 5 | away_x | dots_x | 0.95 | 1 |
| 0.8922 | 1 | 6 | away_x | h_lines_x | 0.98 | 1 |
| 0.3972 | 1 | 7 | away_x | high_lines_x | 0.91 | 1 |
| … | … | … | … | … | … | …. |
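
For reference, base R's `p.adjust` performs multiple-testing corrections of this kind. The sketch below applies two of its methods to the six p-values shown in the rows above. Because it adjusts only six values instead of the full set of pairs, its output is not expected to match the `adj.P.Val` column; it illustrates the mechanics only, and which method produced `adj.P.Val` is not assumed here.

```r
# First six p-values from the table above
pval <- c(0.5202, 0.9533, 0.0442, 0.6219, 0.8922, 0.3972)

# Benjamini-Hochberg (false discovery rate) adjustment
adj_bh <- p.adjust(pval, method = "BH")

# Bonferroni adjustment, a more conservative alternative
adj_bonf <- p.adjust(pval, method = "bonferroni")

data.frame(pval, adj_bh, adj_bonf)
```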

Association strength computed based on the FDR-adjusted p-value, using the index columns:

    ## Use columns of indexes and FDR adjusted pvalue
    micres <- mic_strength(datasaurus.m, ticnull$pval, pval.col=c(6, 2, 3))

| TicePval | MIC | I1 | I2 |
|---|---|---|---|
| 0.0457 | 0.42 | 2 | 15 |
| 0.0000 | 0.63 | 3 | 16 |
| 0.0196 | 0.50 | 5 | 18 |
| 0.0162 | 0.36 | 9 | 22 |
| 0.0000 | 0.63 | 10 | 23 |
| 0.0000 | 0.57 | 13 | 26 |
| … | … | … | … |

Association strength computed based on the `qvalue`-adjusted p-value:

    ## Use qvalue adjusted pvalue
    micresq <- mic_strength(datasaurus.m, ticnull$pval, pval.col=c("qvalue", "Var1", "Var2"))

| TicePval | MIC | I1 | I2 |
|---|---|---|---|
| 0.0401 | 0.42 | bullseye_x | bullseye_y |
| 0.0000 | 0.63 | circle_x | circle_y |
| 0.0172 | 0.50 | dots_x | dots_y |
| 0.0143 | 0.36 | slant_up_x | slant_up_y |
| 0.0000 | 0.63 | star_x | star_y |
| 0.0000 | 0.57 | x_shape_x | x_shape_y |
| … | … | … | … |
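
The two `mic_strength` calls above differ only in how `pval.col` points at columns of the p-value table: by position (`c(6, 2, 3)`) or by name (`c("qvalue", "Var1", "Var2")`). The sketch below shows the equivalent base R selections on a small data frame copied from the first three rows of the p-value table shown earlier. How `mic_strength` uses these columns internally is an assumption here (the p-value column first, then the two variable identifiers), and `pv` is an illustrative name.

```r
# First three rows of the p-value table, typed in by hand
pv <- data.frame(
  pval      = c(0.5202, 0.9533, 0.0442),
  I1        = c(1, 1, 1),
  I2        = c(2, 3, 4),
  Var1      = c("away_x", "away_x", "away_x"),
  Var2      = c("bullseye_x", "circle_x", "dino_x"),
  adj.P.Val = c(0.95, 0.99, 0.52),
  qvalue    = c(1, 1, 0),
  stringsAsFactors = FALSE
)

# Selection by position: adjusted p-value plus the two index columns
by_index <- pv[, c(6, 2, 3)]
names(by_index)

# Selection by name: q-value plus the two variable-name columns
by_name <- pv[, c("qvalue", "Var1", "Var2")]
names(by_name)
```

Selecting by name is less fragile than selecting by position, since adding a column such as `qvalue` shifts nothing that the call depends on.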

References:

| Key | Reference |
|---|---|
| minepy2013 | Davide Albanese, Michele Filosi, Roberto Visintainer, Samantha Riccadonna, Giuseppe Jurman and Cesare Furlanello. minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics (2013) 29(3): 407-408, first published online December 14, 2012. |
| mictools2018 | Davide Albanese, Samantha Riccadonna, Claudio Donati, Pietro Franceschi. A practical tool for maximal information coefficient analysis. GigaScience (2018). |