In this short tutorial we showcase a simple pipeline to create a bulkAnalyseR app using a publicly available dataset from the Gene Expression Omnibus (GEO). No prerequisites are needed, as both the installation of bulkAnalyseR and the download of the data are included.
The example app described in this vignette can be found here.
First, install the latest version of bulkAnalyseR, starting with the CRAN and Bioconductor dependencies:
    packages.cran <- c(
      "ggplot2", "shiny", "shinythemes", "gprofiler2", "stats", "ggrepel",
      "utils", "RColorBrewer", "circlize", "shinyWidgets", "shinyjqui", "dplyr",
      "magrittr", "ggforce", "rlang", "glue", "matrixStats", "noisyr", "tibble",
      "ggnewscale", "ggrastr", "visNetwork", "shinyLP", "grid", "DT", "scales",
      "shinyjs", "tidyr", "UpSetR", "ggVennDiagram"
    )
    new.packages.cran <- packages.cran[!(packages.cran %in% installed.packages()[, "Package"])]
    if (length(new.packages.cran))
      install.packages(new.packages.cran)

    packages.bioc <- c("edgeR", "DESeq2", "preprocessCore", "GENIE3", "ComplexHeatmap")
    new.packages.bioc <- packages.bioc[!(packages.bioc %in% installed.packages()[, "Package"])]
    if (length(new.packages.bioc)) {
      if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
      BiocManager::install(new.packages.bioc)
    }

    install.packages("bulkAnalyseR")

We start by downloading and reading in the expression matrix. Rows represent genes/features and columns represent samples (note that you need an internet connection to run the code below). The matrix is from a 2022 study on the Stem Cell transcriptional response to Microglia-Conditioned Media. We only use a few samples from the study for illustrative purposes.
    # file.path() inserts the path separator between the directory and file name
    download_path <- file.path(tempdir(), "expression_matrix.csv.gz")
    download.file(
      "https://www.ncbi.nlm.nih.gov/geo/download/?acc=GSE178620&format=file&file=GSE178620%5Fraw%5Fabundances%2Ecsv%2Egz",
      download_path,
      mode = "wb"  # binary mode keeps the gzip archive intact on Windows
    )
    # Keep four of the samples (columns 1, 2, 19 and 20)
    exp <- as.matrix(read.csv(download_path, row.names = 1))[, c(1, 2, 19, 20)]
    head(exp)
    ##>                 control_G322_G322_1 control_G322_G322_2 microglia_067MG_G322_1
    ##> ENSG00000223972                   0                   0                      0
    ##> ENSG00000227232                  51                  45                     25
    ##> ENSG00000278267                   6                   0                      0
    ##> ENSG00000243485                   0                   0                      0
    ##> ENSG00000284332                   0                   0                      0
    ##> ENSG00000237613                   0                   0                      0
    ##>                 microglia_067MG_G322_2
    ##> ENSG00000223972                      0
    ##> ENSG00000227232                     40
    ##> ENSG00000278267                      0
    ##> ENSG00000243485                      0
    ##> ENSG00000284332                      0
    ##> ENSG00000237613                      0

We use a very simple metadata table with just the main condition in the experiment. Detailed metadata is available for all GEO datasets and can be downloaded and used instead.
    meta <- data.frame(
      name = colnames(exp),
      condition = sapply(colnames(exp), USE.NAMES = FALSE, function(nm) {
        strsplit(nm, "_")[[1]][1]
      })
    )
    meta
    ##>                     name condition
    ##> 1    control_G322_G322_1   control
    ##> 2    control_G322_G322_2   control
    ##> 3 microglia_067MG_G322_1 microglia
    ##> 4 microglia_067MG_G322_2 microglia

We can now denoise and normalise the data using bulkAnalyseR.
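Before running that step, it can be worth confirming that the two inputs line up. The sketch below is a minimal check written for this tutorial, not part of bulkAnalyseR: `check_inputs()` is a hypothetical helper, and it assumes that the first metadata column should list the columns of the expression matrix in the same order. The toy matrix reuses the first three rows printed by `head(exp)` above, so the block runs without a download.

```r
# Toy stand-ins for exp and meta, built from the first three rows shown above
toy.exp <- matrix(
  c(0, 51, 6,  0, 45, 0,  0, 25, 0,  0, 40, 0),
  nrow = 3,
  dimnames = list(
    c("ENSG00000223972", "ENSG00000227232", "ENSG00000278267"),
    c("control_G322_G322_1", "control_G322_G322_2",
      "microglia_067MG_G322_1", "microglia_067MG_G322_2")
  )
)
toy.meta <- data.frame(
  name = colnames(toy.exp),
  # The condition is the text before the first underscore in the sample name
  condition = vapply(strsplit(colnames(toy.exp), "_"), `[`, character(1), 1)
)

# Hypothetical helper: stops with a message if the inputs look inconsistent
check_inputs <- function(expression.matrix, metadata) {
  if (!is.matrix(expression.matrix) || !is.numeric(expression.matrix))
    stop("expression matrix must be a numeric matrix")
  if (anyDuplicated(rownames(expression.matrix)))
    stop("feature names must be unique")
  # Assumption: the first metadata column names the samples, in column order
  if (!identical(as.character(metadata[[1]]), colnames(expression.matrix)))
    stop("first metadata column must match the matrix column names")
  invisible(TRUE)
}

check_inputs(toy.exp, toy.meta)
```

On the real objects the equivalent call would be `check_inputs(exp, meta)`. The preprocessing call itself follows.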
    exp.proc <- bulkAnalyseR::preprocessExpressionMatrix(exp, output.plot = TRUE)
    ##> >>> noisyR counts approach pipeline <<<
    ##> The input matrix has 60671 rows and 4 cols
    ##> number of genes: 60671
    ##> number of samples: 4
    ##> Calculating the number of elements per window
    ##> the number of elements per window is 6067
    ##> the step size is 303
    ##> the selected similarity metric is correlation_pearson
    ##> Working with sample 1
    ##> Working with sample 2
    ##> Working with sample 3
    ##> Working with sample 4
    ##> Calculating noise thresholds for 4 samples...
    ##> similarity.threshold = 0.25
    ##> method.chosen = Boxplot-IQR
    ##> Denoising expression matrix...
    ##> removing noisy genes
    ##> adjusting matrix
    ##> >>> Done! <<<
    ##> Performing quantile normalisation...
    ##> Done!

Finally, we can create a shiny app. This example app can be found here.
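As a brief aside before generating the app: the log above ends with a quantile normalisation step. The following base R sketch illustrates the general technique on a small made-up matrix. It is not the code that bulkAnalyseR runs; `quantile_normalise()` is a hypothetical name, and ties are broken by order of appearance, which a production implementation may handle differently.

```r
# Hypothetical helper illustrating quantile normalisation in base R
quantile_normalise <- function(m) {
  # Rank of each value within its own column (ties broken by position)
  ranks <- apply(m, 2, rank, ties.method = "first")
  # Mean of the k-th smallest value across columns, for each k
  target <- rowMeans(apply(m, 2, sort))
  # Replace every value by the target value at its rank
  out <- apply(ranks, 2, function(r) target[r])
  dimnames(out) <- dimnames(m)
  out
}

# Made-up 4 x 3 matrix: rows are features, columns are samples
toy <- matrix(c(5, 2, 3, 4,
                4, 1, 4, 2,
                3, 4, 6, 8), nrow = 4)
toy.norm <- quantile_normalise(toy)

# After normalisation the sorted columns are identical
apply(toy.norm, 2, sort)
```

Because every column ends up with the same set of values, differences between samples that stem only from their overall distributions are removed. The call that generates the app is shown next.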
    bulkAnalyseR::generateShinyApp(
      shiny.dir = "shiny_GEO",
      app.title = "Shiny app for visualisation of GEO data",
      modality = "RNA",
      expression.matrix = exp.proc,
      metadata = meta,
      organism = "hsapiens",
      org.db = "org.Hs.eg.db"
    )

    sessionInfo()
    ##> R version 4.2.2 (2022-10-31 ucrt)
    ##> Platform: x86_64-w64-mingw32/x64 (64-bit)
    ##> Running under: Windows 10 x64 (build 22621)
    ##>
    ##> Matrix products: default
    ##>
    ##> locale:
    ##> [1] LC_COLLATE=C
    ##> [2] LC_CTYPE=English_United Kingdom.utf8
    ##> [3] LC_MONETARY=English_United Kingdom.utf8
    ##> [4] LC_NUMERIC=C
    ##> [5] LC_TIME=English_United Kingdom.utf8
    ##>
    ##> attached base packages:
    ##> [1] stats     graphics  grDevices utils     datasets  methods   base
    ##>
    ##> loaded via a namespace (and not attached):
    ##>  [1] tidyselect_1.2.0      xfun_0.35             bslib_0.4.1
    ##>  [4] lattice_0.20-45       splines_4.2.2         colorspace_2.0-3
    ##>  [7] vctrs_0.5.1           generics_0.1.3        htmltools_0.5.4
    ##> [10] yaml_2.3.6            mgcv_1.8-41           utf8_1.2.2
    ##> [13] noisyr_1.0.0          rlang_1.0.6           jquerylib_0.1.4
    ##> [16] pillar_1.8.1          later_1.3.0           glue_1.6.2
    ##> [19] withr_2.5.0           DBI_1.1.3             foreach_1.5.2
    ##> [22] lifecycle_1.0.3       stringr_1.5.0         munsell_0.5.0
    ##> [25] gtable_0.3.1          codetools_0.2-18      evaluate_0.19
    ##> [28] labeling_0.4.2        knitr_1.41            fastmap_1.1.0
    ##> [31] httpuv_1.6.7          fansi_1.0.3           highr_0.9
    ##> [34] preprocessCore_1.60.0 Rcpp_1.0.9            xtable_1.8-4
    ##> [37] scales_1.2.1          promises_1.2.0.1      cachem_1.0.6
    ##> [40] jsonlite_1.8.4        bulkAnalyseR_1.1.0    farver_2.1.1
    ##> [43] mime_0.12             ggplot2_3.4.0         digest_0.6.31
    ##> [46] stringi_1.7.8         dplyr_1.0.10          shiny_1.7.3
    ##> [49] grid_4.2.2            cli_3.4.1             tools_4.2.2
    ##> [52] magrittr_2.0.3        philentropy_0.7.0     sass_0.4.4
    ##> [55] tibble_3.1.8          pkgconfig_2.0.3       Matrix_1.5-1
    ##> [58] ellipsis_0.3.2        assertthat_0.2.1      rmarkdown_2.18
    ##> [61] rstudioapi_0.14       iterators_1.0.14      R6_2.5.1
    ##> [64] nlme_3.1-160          compiler_4.2.2
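Returning to the app generated above, here is a minimal sketch of how it might be launched locally. It assumes that `generateShinyApp()` writes an `app.R` file into the `shiny.dir` folder (here `shiny_GEO`). `launch_app()` is a hypothetical helper introduced for illustration; `shiny::runApp()` is the standard shiny function for running an app from a directory.

```r
# Hypothetical helper: launch the generated app if it exists
launch_app <- function(app.dir = "shiny_GEO") {
  # Assumption: the generated app lives in an app.R file inside app.dir
  app.file <- file.path(app.dir, "app.R")
  if (!file.exists(app.file)) {
    message("No app.R found in '", app.dir, "'; run generateShinyApp() first.")
    return(invisible(FALSE))
  }
  # Only start the server in an interactive session; otherwise just report success
  if (interactive()) shiny::runApp(app.dir)
  invisible(TRUE)
}

launch_app("shiny_GEO")
```

Guarding on `interactive()` keeps the snippet safe to source from a script or a vignette build, where starting a web server would block the session.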