
minerva

R package for Maximal Information-Based Nonparametric Exploration computation

Minepy Homepage
Minepy Github
Mictools Github

Install

Latest CRAN release

install.packages("minerva")

Development version

devtools::install_github('filosi/minerva')

Usage

Basic usage with the helper function mine.

library(minerva)
x <- 0:200 / 200
y <- sin(10 * pi * x) + x
mine(x, y, n.cores = 1)

Compute a single measure from the MINE suite using mine_stat.
- Available measures are: mic, mas, mev, mcn, tic, gmic

x <- 0:200 / 200
y <- sin(10 * pi * x) + x
mine_stat(x, y, measure = "mic")

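To compute the mic-r2 measure, use the R function cor:

x <- 0:200 / 200
y <- sin(10 * pi * x) + x
## Pearson correlation between x and y
r <- cor(x, y)
mm <- mine_stat(x, y, measure = "mic")
## MIC minus the squared correlation
mm - r^2
## Compare with: mine(x, y, n.cores=1)[[5]]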

Compute statistics on matrices

All features in a single matrix (mine_compute_pstat).
All possible combinations of features between two matrices (mine_compute_cstat).
- When comparing two matrices, the function checks that both have the same number of rows. If the numbers of rows differ, an error is thrown.

## Both matrices need the same number of rows (here 100)
x <- matrix(rnorm(1000), ncol = 10, nrow = 100)
y <- matrix(rnorm(1000), ncol = 10, nrow = 100)
## Compare features of the same matrix
pstats(x)
## Compare features of matrix x with features of matrix y
cstats(x, y)
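The row check described above can be sketched in base R. This is only an illustration: check_same_rows is a hypothetical helper, not a minerva function, and the error message is made up for the example.

```r
## Hypothetical helper (not part of minerva): verify that two matrices
## have the same number of rows before comparing their features.
check_same_rows <- function(x, y) {
  if (nrow(x) != nrow(y)) {
    stop(sprintf("x has %d rows but y has %d rows", nrow(x), nrow(y)))
  }
  invisible(TRUE)
}

set.seed(1)
x <- matrix(rnorm(1000), nrow = 100, ncol = 10)
y <- matrix(rnorm(2000), nrow = 100, ncol = 20)
z <- matrix(rnorm(500), nrow = 50, ncol = 10)

## Same number of rows: the check passes and returns TRUE
ok <- check_same_rows(x, y)
## Different number of rows: the error is caught and its message stored
bad <- tryCatch(check_same_rows(x, z),
                error = function(e) conditionMessage(e))
```

Running such a check before a long computation reports a shape mismatch up front, with the offending dimensions in the message.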

Mictools pipeline

This is inspired by the original implementation by Albanese et al., available in Python here: https://github.com/minepy/mictools.

Reading the data from the mictools repository

datasaurus <- read.table("https://raw.githubusercontent.com/minepy/mictools/master/examples/datasaurus.txt",
                         header = TRUE, row.names = 1, as.is = TRUE, stringsAsFactors = FALSE)
datasaurus.m <- t(datasaurus)

Compute null distribution for `tic_e`

Automatically compute:

tic_e null distribution based on permutations.
Histogram of the distribution with cumulative distribution.
Observed values of tic_e for each pair of variables in datasaurus.
Observed distribution of tic_e.
P-value for each variable pair association.

ticnull <- mictools(datasaurus.m, nperm = 10000, seed = 1234)
## Get the names of the named list
names(ticnull)
## [1]  "tic"      "nulldist" "obstic"   "obsdist"  "pval"
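To illustrate the idea of a permutation-based null distribution and an empirical p-value, here is a minimal base-R sketch. It is not necessarily the procedure mictools follows internally: the statistic is the squared Pearson correlation, used as a stand-in for tic_e so the example runs without minerva, and the data, the number of permutations and the +1 correction are all assumptions made for the example.

```r
## Stand-in statistic (assumption): squared Pearson correlation,
## used here only so the sketch runs without minerva.
stat <- function(x, y) cor(x, y)^2

set.seed(1234)
n <- 100
x <- rnorm(n)
y <- x + rnorm(n)   # y depends on x, so the observed statistic is large

nperm <- 1000
obs <- stat(x, y)

## Null distribution: shuffle y to break any association with x
null <- replicate(nperm, stat(x, sample(y)))

## Empirical p-value with a +1 correction so it is never exactly zero
pval <- (sum(null >= obs) + 1) / (nperm + 1)
pval
```

The p-value is the fraction of permuted statistics at least as large as the observed one, so its resolution is limited by the number of permutations.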

Null Distribution

ticnull$nulldist

BinStart	BinEnd	NullCount	NullCumSum
0e+00	1e-04	0	1e+05
1e-04	2e-04	0	1e+05
2e-04	3e-04	0	1e+05
3e-04	4e-04	0	1e+05
4e-04	5e-04	0	1e+05
5e-04	6e-04	0	1e+05
…	…	….	….

Observed distribution

ticnull$obsdist

BinStart	BinEnd	Count	CountCum
0e+00	1e-04	0	325
1e-04	2e-04	0	325
2e-04	3e-04	0	325
3e-04	4e-04	0	325
4e-04	5e-04	0	325
5e-04	6e-04	0	325
…	…	….	….

Plot tic_e and p-value distribution.

hist(ticnull$tic)
## The p-values are in the pval column of the pval data.frame
hist(ticnull$pval$pval, breaks = 50, freq = FALSE)

Use p.adjust.method to select a different p-value correction method, or use the qvalue package to compute Storey's q-value.

## Correct p-values using qvalue
library(qvalue)
qobj <- qvalue(ticnull$pval$pval)
## Add column in the pval data.frame
ticnull$pval$qvalue <- qobj$qvalues
ticnull$pval

Same table as above with the qvalue column added at the end.

pval	I1	I2	Var1	Var2	adj.P.Val	qvalue
0.5202	1	2	away_x	bullseye_x	0.95	1
0.9533	1	3	away_x	circle_x	0.99	1
0.0442	1	4	away_x	dino_x	0.52	0
0.6219	1	5	away_x	dots_x	0.95	1
0.8922	1	6	away_x	h_lines_x	0.98	1
0.3972	1	7	away_x	high_lines_x	0.91	1
…	…	…	…	…	…	….
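For comparison, base R's p.adjust offers several correction methods. The sketch below applies two of them to the six p-values shown in the rows above. The adjusted values will not match the adj.P.Val column, because here the correction is applied to six p-values only, not to the full set of variable pairs.

```r
## p-values copied from the first rows of the table above
pval <- c(0.5202, 0.9533, 0.0442, 0.6219, 0.8922, 0.3972)

## Benjamini-Hochberg false discovery rate correction
adj_bh <- p.adjust(pval, method = "BH")
## Bonferroni correction (family-wise error rate)
adj_bonf <- p.adjust(pval, method = "bonferroni")

data.frame(pval, adj_bh, adj_bonf)
```

Bonferroni is the more conservative of the two, so its adjusted values are never smaller than the Benjamini-Hochberg ones.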

Strength of the association (MIC)

## Use columns of indexes and FDR-adjusted p-value
micres <- mic_strength(datasaurus.m, ticnull$pval, pval.col = c(6, 2, 3))

TicePval	MIC	I1	I2
0.0457	0.42	2	15
0.0000	0.63	3	16
0.0196	0.50	5	18
0.0162	0.36	9	22
0.0000	0.63	10	23
0.0000	0.57	13	26
…	…	…	…

Association strength computed based on the qvalue-adjusted p-value

## Use qvalue-adjusted p-value
micresq <- mic_strength(datasaurus.m, ticnull$pval, pval.col = c("qvalue", "Var1", "Var2"))

TicePval	MIC	I1	I2
0.0401	0.42	bullseye_x	bullseye_y
0.0000	0.63	circle_x	circle_y
0.0172	0.50	dots_x	dots_y
0.0143	0.36	slant_up_x	slant_up_y
0.0000	0.63	star_x	star_y
0.0000	0.57	x_shape_x	x_shape_y
…	…	…	…
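Judging from the two calls and their outputs, pval.col appears to give the p-value column first, followed by the two columns that identify the variables, either by position or by name. That reading is an inference from the examples here, not a statement of the documented behaviour. The base-R sketch below only shows which columns each form of selection picks from a data.frame laid out like the p-value table, using its first three rows.

```r
## First rows of the p-value table, rebuilt by hand for illustration
pv <- data.frame(
  pval = c(0.5202, 0.9533, 0.0442),
  I1 = c(1, 1, 1),
  I2 = c(2, 3, 4),
  Var1 = c("away_x", "away_x", "away_x"),
  Var2 = c("bullseye_x", "circle_x", "dino_x"),
  adj.P.Val = c(0.95, 0.99, 0.52),
  qvalue = c(1, 1, 0),
  stringsAsFactors = FALSE
)

## Selection by position: columns 6, 2 and 3
by_index <- pv[, c(6, 2, 3)]
## Selection by name
by_name <- pv[, c("qvalue", "Var1", "Var2")]

names(by_index)
names(by_name)
```

Selecting by name is less fragile than selecting by position, since adding a column such as qvalue does not shift the names.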

Citing minepy/minerva and mictools

minepy2013	Davide Albanese, Michele Filosi, Roberto Visintainer, Samantha Riccadonna, Giuseppe Jurman and Cesare Furlanello. minerva and minepy: a C engine for the MINE suite and its R, Python and MATLAB wrappers. Bioinformatics (2013) 29(3): 407-408, first published online December 14, 2012
mictools2018	Davide Albanese, Samantha Riccadonna, Claudio Donati, Pietro Franceschi. A practical tool for maximal information coefficient analysis. GigaScience (2018)
