o2plsda provides functions to perform O2PLS-DA analysis for multiple omics integration. The algorithm comes from “O2-PLS, a two-block (X–Y) latent variable regression (LVR) method with an integral OSC filter”, published by Johan Trygg and Svante Wold in 2003. O2PLS is a bidirectional multivariate regression method that aims to separate the covariance between two data sets (it was recently extended to multiple data sets) (Löfstedt and Trygg, 2011; Löfstedt et al., 2012) from the systematic sources of variance specific to each data set.
To avoid overfitting, the optimal number of latent variables for each model structure is estimated using group-balanced Monte Carlo cross-validation (MCCV). The package can use the group information when selecting the best parameters with cross-validation. In cross-validation (CV) one minimizes a certain measure of error over some parameters that should be determined a priori. Here, we have three parameters: (nc, nx, ny). A popular measure is the prediction error $\|Y - \hat{Y}\|$, where $\hat{Y}$ is a prediction of Y. In our case the O2PLS method is symmetric in X and Y, so we minimize the sum of the prediction errors: $\|X - \hat{X}\| + \|Y - \hat{Y}\|$.
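The group-balanced splitting mentioned above can be sketched in base R. This is only an illustration of the idea in a k-fold flavour: `balanced_folds` is a hypothetical helper written for this README, not a function exported by o2plsda, and the package's internal splitting may differ.

```r
# Hypothetical helper (not part of o2plsda): assign each sample to one of
# k folds so that every group is spread as evenly as possible over the folds.
balanced_folds <- function(group, k) {
  group <- as.factor(group)
  folds <- integer(length(group))
  for (g in levels(group)) {
    idx <- which(group == g)
    # shuffle the samples of this group, then deal them out across the folds
    idx <- idx[sample.int(length(idx))]
    folds[idx] <- rep_len(seq_len(k), length(idx))
  }
  folds
}

set.seed(1)
group <- rep(c("Ctrl", "Treat"), each = 25)
folds <- balanced_folds(group, k = 5)

# Each fold should contain the same number of Ctrl and Treat samples
print(table(folds, group))
```

With 25 samples per group and 5 folds, every fold holds 5 samples from each group, so no fold is dominated by a single group.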
Here nc should be a positive integer, and nx and ny should be non-negative integers. The best values are then the minimizers of the prediction error.
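The selection rule can be sketched in base R as follows. The names `combined_error` and `toy_cv_error` are hypothetical and exist only for this illustration; in particular, `toy_cv_error` is a made-up error surface, not an O2PLS result, and the Frobenius norm is an assumption about which matrix norm is meant.

```r
# Combined prediction error ||X - Xhat|| + ||Y - Yhat||
# (Frobenius norms are assumed here).
combined_error <- function(X, Xhat, Y, Yhat) {
  norm(X - Xhat, type = "F") + norm(Y - Yhat, type = "F")
}

# Small check by hand: a 2 x 2 residual of all ones has norm 2,
# and a single residual of 3 has norm 3.
X <- matrix(1:4, 2, 2)
Xhat <- X - 1
Y <- matrix(0, 2, 2)
Yhat <- Y
Yhat[1, 1] <- 3
err <- combined_error(X, Xhat, Y, Yhat)

# Candidate grid: nc is a positive integer, nx and ny may be zero
grid <- expand.grid(nc = 1:5, nx = 0:3, ny = 0:3)

# PLACEHOLDER error surface, made up for illustration only.
# In practice this would be the cross-validated combined error.
toy_cv_error <- function(nc, nx, ny) (nc - 1)^2 + (nx - 2)^2 + (ny - 1)^2
grid$error <- mapply(toy_cv_error, grid$nc, grid$nx, grid$ny)

# The best parameters are the row with the smallest error
best <- grid[which.min(grid$error), c("nc", "nx", "ny")]
print(best)
```

In a real analysis the placeholder would be replaced by the error averaged over the cross-validation splits for each (nc, nx, ny) combination.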
The O2PLS-DA analysis was performed as described by Bylesjö et al. (2007); briefly, the O2PLS predictive variation [
    library(devtools)
    install_github("guokai8/o2plsda")

    library(o2plsda)
    set.seed(123)
    # sample * values
    X = matrix(rnorm(5000),50,100)
    # sample * values
    Y = matrix(rnorm(5000),50,100)
    rownames(X) <- paste("S",1:50,sep="")
    rownames(Y) <- paste("S",1:50,sep="")
    colnames(X) <- paste("Gene",1:100,sep="")
    colnames(Y) <- paste("Lipid",1:100,sep="")
    X = scale(X, scale=TRUE)
    Y = scale(Y, scale=TRUE)
    ## group factor could be omitted if you don't have any group
    group <- rep(c("Ctrl","Treat"),each = 25)

Do cross validation with group information
    set.seed(123)
    ## nr_folds : cross validation k-fold (suggest 10)
    ## ncores : parallel parameters for large datasets
    cv <- o2cv(X,Y,1:5,1:3,1:3,group=group,nr_folds = 10)
    #####################################
    # The best parameters are nc = 1, nx = 2, ny = 1
    #####################################
    # The the RMSE is: 1.97990443734287
    #####################################

Then we can do the O2PLS analysis with nc = 1, nx = 2, ny = 1. You can also select the best parameters by looking at the cross-validation results.
    fit <- o2pls(X,Y,1,2,1)
    summary(fit)
    ## 
    ## ######### Summary of the O2PLS results #########
    ## ### Call o2pls(X, Y, nc= 1 , nx= 2 , ny= 1 ) ###
    ## ### Total variation 
    ## ### X: 4900 ; Y: 4900 ###
    ## ### Total modeled variation ### X: 0.108 ; Y: 0.098 ###
    ## ### Joint, Orthogonal, Noise (proportions) ###
    ##                X     Y
    ## Joint      0.039 0.052
    ## Orthogonal 0.070 0.046
    ## Noise      0.892 0.902
    ## ### Variation in X joint part predicted by Y Joint part: 0.882 
    ## ### Variation in Y joint part predicted by X Joint part: 0.882 
    ## ### Variation in each Latent Variable (LV) in Joint part: 
    ##     LV1
    ## X 0.039
    ## Y 0.052
    ## ### Variation in each Latent Variable (LV) in X Orthogonal part: 
    ##     LV1   LV2
    ## X 0.036 0.034
    ## ### Variation in each Latent Variable (LV) in Y Orthogonal part: 
    ##     LV1
    ## Y 0.046
    ## 
    ## ############################################

Extract the loadings and scores from the fit results
    Xl <- loadings(fit,loading="Xjoint")
    Xs <- scores(fit,score="Xjoint")
    plot(fit,type="score",var="Xjoint", group=group)
    plot(fit,type="loading",var="Xjoint", group=group,repel=FALSE,rotation=TRUE)

Do the OPLSDA based on the O2PLS results
    res <- oplsda(fit,group, nc=1)
    plot(res, type="score", group=group)
    vip <- vip(res)
    plot(res,type="vip", group = group, repel = FALSE,order=TRUE)

The package is still under development.
If you like this package, please contact me for the citation.
For any questions, please contact guokai8@gmail.com or open an issue at https://github.com/guokai8/o2plsda/issues