This vignette introduces the CoFAST workflow for the analysis of the PBMC3k single-cell RNA sequencing dataset. In this vignette, the workflow of CoFAST consists of three steps.
We demonstrate the use of CoFAST on the PBMC3k data, which are in the SeuratData package and can be downloaded to the current working path with the following command:
    set.seed(2024) # set a random seed for reproducibility
    library(Seurat)
    pbmc3k <- SeuratData::LoadData("pbmc3k")
    ## keep only the cells whose seurat_annotations is not NA
    idx <- which(!is.na(pbmc3k$seurat_annotations))
    pbmc3k <- pbmc3k[, idx]
    pbmc3k

The package can be loaded with the command:
First, we normalize the data.
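The normalization code is not reproduced in this text. As a rough illustration of what log-normalization does, the base R sketch below applies the transformation that Seurat documents for its default `LogNormalize` method (each cell's counts divided by the cell total, multiplied by a scale factor of 10,000, then `log1p`) to a made-up count matrix. The toy matrix and the helper name `log_normalize` are hypothetical; in the actual workflow this step would typically be a call to Seurat's `NormalizeData()` on `pbmc3k`.

```r
# Toy genes-by-cells count matrix (made-up values, for illustration only).
counts <- matrix(
  c(1, 0, 3,
    2, 2, 0),
  nrow = 3,
  dimnames = list(c("g1", "g2", "g3"), c("c1", "c2"))
)

# Hypothetical helper: divide each column (cell) by its total count,
# multiply by the scale factor, then apply log(1 + x).
log_normalize <- function(counts, scale.factor = 1e4) {
  log1p(sweep(counts, 2, colSums(counts), "/") * scale.factor)
}

norm <- log_normalize(counts)
print(norm)
```

Zero counts stay at zero after the transformation, so the sparsity pattern of the data is preserved.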
Then, we select the variable genes.
We now show how to use the non-centered factor model (NCFM) to perform coembedding for this scRNA-seq data. First, we determine the dimension of the coembeddings. Here, we use the parallel analysis method to select the dimension.
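To make the idea of parallel analysis concrete, here is a minimal base R sketch on simulated data. It is not the package's implementation: the simulated matrix, the column-permutation null, the 95% quantile threshold, and all variable names are assumptions chosen for illustration. The principle is to keep the leading eigenvalues of the correlation matrix that exceed what would be expected from data with no correlation structure.

```r
set.seed(1)

# Simulate n observations of p variables driven by 3 latent factors.
n <- 200; p <- 20; q_true <- 3
block <- rep(seq_len(q_true), length.out = p)      # factor assigned to each variable
loadings <- outer(block, seq_len(q_true), "==") * 1 # p x q indicator loadings
factors <- matrix(rnorm(n * q_true), n, q_true)
X <- factors %*% t(loadings) + matrix(rnorm(n * p, sd = 0.5), n, p)

# Observed eigenvalues of the correlation matrix (decreasing order).
obs_eig <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values

# Null eigenvalues: permute each column independently to destroy correlations.
n_perm <- 50
null_eig <- replicate(n_perm, {
  X_perm <- apply(X, 2, sample)
  eigen(cor(X_perm), symmetric = TRUE, only.values = TRUE)$values
})

# Keep the leading eigenvalues that exceed the 95% quantile of the null.
threshold <- unname(apply(null_eig, 1, quantile, probs = 0.95))
exceeds <- obs_eig > threshold
q_est <- if (all(exceeds)) p else which(!exceeds)[1] - 1
print(q_est)
```

Because the simulated signal is strong, the estimate recovers the number of simulated factors; on real data the gap between observed and null eigenvalues is usually less clear-cut.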
Subsequently, we calculate the coembeddings using NCFM, and observe that the reductions field acquires an additional component named ncfm.
In the following, we show how to find the signature genes based on the coembeddings. First, we calculate the distance matrix.
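As a sketch of what a gene-by-cell distance matrix in a shared embedding space looks like, the base R example below computes pairwise Euclidean distances between rows of two small embedding matrices. The Euclidean metric, the toy coordinates, and the helper name `cross_dist` are assumptions for illustration; the metric actually used by the package may differ.

```r
# Toy 2-dimensional coembeddings (made-up coordinates).
gene_emb <- matrix(c(0, 0,
                     3, 4),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(c("geneA", "geneB"), NULL))
cell_emb <- matrix(c(0, 0,
                     3, 4,
                     6, 8),
                   ncol = 2, byrow = TRUE,
                   dimnames = list(c("cell1", "cell2", "cell3"), NULL))

# Hypothetical helper: Euclidean distance between every row of A and every row of B,
# using ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a'b.
cross_dist <- function(A, B) {
  sq <- outer(rowSums(A^2), rowSums(B^2), "+") - 2 * tcrossprod(A, B)
  sqrt(pmax(sq, 0))  # guard against tiny negative values from rounding
}

D <- cross_dist(gene_emb, cell_emb)  # genes in rows, cells in columns
print(D)
```

Each row of `D` describes how far one gene lies from every cell, which is the kind of quantity a signature gene search can rank.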
Next, we find the signature genes for each cell type.
    print(table(pbmc3k$seurat_annotations))
    Idents(pbmc3k) <- pbmc3k$seurat_annotations
    df_sig_list <- find.signature.genes(pbmc3k)
    str(df_sig_list)

Then, we obtain the top five signature genes and organize them into a data.frame. The column distance is the distance between a gene (e.g., VPREB3) and the cells of a specific cell type (e.g., B cell), calculated from the coembeddings of genes and cells in the coembedding space. The smaller the distance, the stronger the association between the gene and the cell type. The column expr.prop represents the expression proportion of the gene (e.g., VPREB3) within the cell type (e.g., B cell). The column label gives the cell type and the column gene gives the gene name. From this data.frame, we see that VPREB3 is one of the top signature genes of B cells.
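The selection logic described above (smaller distance means stronger association, with expression proportion as a filter) can be sketched in base R as follows. The data values, the gene names `g1` to `g6`, the cutoff of 0.1, and the helper `top_n_signature` are all hypothetical and are not results from the PBMC3k data or functions of the package.

```r
# Made-up signature table with the four columns described in the text.
sig <- data.frame(
  gene      = c("g1", "g2", "g3", "g4", "g5", "g6"),
  label     = c("B", "B", "B", "T", "T", "T"),
  distance  = c(0.9, 0.2, 0.5, 0.3, 0.8, 0.1),
  expr.prop = c(0.6, 0.7, 0.05, 0.4, 0.9, 0.5)
)

# Hypothetical helper: drop rarely expressed genes, then keep the ntop genes
# with the smallest distance within each cell type.
top_n_signature <- function(df, ntop = 2, expr.prop.cutoff = 0.1) {
  df <- df[df$expr.prop > expr.prop.cutoff, , drop = FALSE]
  per_label <- lapply(split(df, df$label), function(d) {
    head(d[order(d$distance), , drop = FALSE], ntop)
  })
  out <- do.call(rbind, per_label)
  rownames(out) <- NULL
  out
}

top <- top_n_signature(sig, ntop = 2)
print(top)
```

Here `g3` is excluded from the B cell candidates despite its moderate distance, because it is expressed in too few cells of that type.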
Next, we calculate the UMAP projections of the coembeddings of cells and the selected signature genes.
    pbmc3k <- coembedding_umap(
      pbmc3k, reduction = "ncfm", reduction.name = "UMAP",
      gene.set = unique(dat$gene))

Furthermore, we visualize the cells and the top five signature genes of B cells in the UMAP space of the coembedding. We observe that the UMAP projections of the five signature genes lie close to the B cells, which indicates that these genes are enriched in B cells.
    ## choose colors
    cols_cluster <- c("black", PRECAST::chooseColors(palettes_name = "Light 13", n_colors = 9, plot_colors = TRUE))
    p1 <- coembed_plot(
      pbmc3k, reduction = "UMAP",
      gene_txtdata = subset(dat, label == 'B'),
      cols = cols_cluster, pt_text_size = 3)
    p1

Then, we visualize the cells and the top five signature genes of all involved cell types in the UMAP space of the coembedding. We observe that the UMAP projections of the five signature genes lie close to the corresponding cell type, which indicates that these genes are enriched in the corresponding cells.
In addition, we can take full advantage of the visualization functions in the Seurat package. The following example visualizes the cell types in the UMAP space.
    cols_type <- cols_cluster[-1]
    names(cols_type) <- sort(levels(Idents(pbmc3k)))
    DimPlot(pbmc3k, reduction = 'UMAP', cols = cols_type)

As another example, we plot two signature genes of B cells in the UMAP space and observe their high expression in B cells in contrast to other cell types.
    sessionInfo()
    #> R version 4.4.1 (2024-06-14 ucrt)
    #> Platform: x86_64-w64-mingw32/x64
    #> Running under: Windows 11 x64 (build 26100)
    #>
    #> Matrix products: default
    #>
    #>
    #> locale:
    #> [1] LC_COLLATE=C
    #> [2] LC_CTYPE=Chinese (Simplified)_China.utf8
    #> [3] LC_MONETARY=Chinese (Simplified)_China.utf8
    #> [4] LC_NUMERIC=C
    #> [5] LC_TIME=Chinese (Simplified)_China.utf8
    #>
    #> time zone: Asia/Shanghai
    #> tzcode source: internal
    #>
    #> attached base packages:
    #> [1] stats     graphics  grDevices utils     datasets  methods   base
    #>
    #> loaded via a namespace (and not attached):
    #>  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.47
    #>  [5] cachem_1.1.0      knitr_1.48        htmltools_0.5.8.1 rmarkdown_2.28
    #>  [9] lifecycle_1.0.4   cli_3.6.3         sass_0.4.9        jquerylib_0.1.4
    #> [13] compiler_4.4.1    rstudioapi_0.16.0 tools_4.4.1       evaluate_1.0.0
    #> [17] bslib_0.8.0       yaml_2.3.10       rlang_1.1.4       jsonlite_1.8.9