library(mgc)library(reshape2)library(ggplot2)plot_sim_func <-function(X, Y, Xf, Yf, name,geom='line') {if (!is.null(dim(Y))) { Y <-Y[,1] Yf <-Yf[,1] }if (geom== 'points') { funcgeom <-geom_point }else { funcgeom <-geom_line } data <-data.frame(x1=X[,1],y=Y) data_func <-data.frame(x1=Xf[,1],y=Yf)ggplot(data,aes(x=x1,y=y))+funcgeom(data=data_func,aes(x=x1,y=y),color='red',size=3)+geom_point()+xlab("x")+ylab("y")+ggtitle(name)+theme_bw()}plot_mtx <-function(Dx,main.title="Local Correlation Map",xlab.title="# X Neighbors",ylab.title="# Y Neighbors") { data <-melt(Dx)ggplot(data,aes(x=Var1,y=Var2,fill=value))+geom_tile()+scale_fill_gradientn(name="l-corr",colours=c("#f2f0f7","#cbc9e2","#9e9ac8","#6a51a3"),limits=c(min(Dx),max(Dx)))+xlab(xlab.title)+ylab(ylab.title)+theme_bw()+ggtitle(main.title)}In this notebook, we show how to use theMGC statistic in a real and simulated context.
First, we use a simulated example, whereY = XB + N; that is,Y is linearly dependent onX with added gaussian noise.
n=200# 100 samplesd=1# simple 1-d caseset.seed(12345)data <-mgc.sims.linear(n, d)# data with noisefunc <-mgc.sims.linear(n, d,eps=0)# source functionWe first visualize the data:
plot_sim_func(data$X, data$Y, func$X, func$Y,name="Linear Simulation")which clearly shows a slightly linear relationship.
visualizing theMGC image with 100 replicates:
set.seed(12345)res <-mgc.test(data$X, data$Y,nperm=20)# 20 permutations test; typically should be run with >100 permutations## Warning in mgc.test(data$X, data$Y, nperm = 20): nperm is < 100. nperm should## typically be set > 100.plot_mtx(res$localCorr,main.title="Local Correlation Map")print(res$optimalScale)## $x## [1] 200## ## $y## [1] 200print(res$statMGC)## NULLAs we can see, the local correlation plot suggests a strongly linear relationship. This is because intuitively, having more and more neighbors will help in our identification of the linear relationship betweenx andy, and as we can see in the local correlation map,k=n=200 andl=n=200 shows the best smoothedp.
In the below demo, we show the result ofMGC to determine the relationship between the first (sepal length) and third (petal length) dimensions of theiris dataset:
set.seed(12345)res <-mgc.test(iris[,1,drop=FALSE], iris[,3,drop=FALSE],nperm=20)## Warning in mgc.test(iris[, 1, drop = FALSE], iris[, 3, drop = FALSE], nperm =## 20): nperm is < 100. nperm should typically be set > 100.plot_mtx(res$localCorr,main.title="Local Correlation Map",xlab.title="Sepal Length Neighbors",ylab.title="Petal Length Neighbors")print(res$optimalScale)## $x## [1] 35## ## $y## [1] 43print(res$statMGC)## NULLviewing the corr map above we see that the relationship betweel Sepal and Petal Length is strongly linear.