This vignette introduces the FAST workflow for the analysis ofmultiple simulated spatial transcriptomics dataset. FAST workflow isbased on the PRECASTObj object created in thePRECAST Rpackage and the workflow of FAST is similar to that of PRECAST; see (https://feiyoung.github.io/PRECAST/articles/PRECAST.BreastCancer.html)for the workflow of PRECAST. The workflow of FAST consists of threesteps
We demonstrate the use of FAST to three simulated ST data that arehere,which can be downloaded to the current working path by the followingcommand:
githubURL<-"https://github.com/feiyoung/ProFAST/blob/main/vignettes_data/simu3.rds?raw=true"download.file(githubURL,"simu3.rds",mode='wb')Then load to R
The package can be loaded with the command:
First, we view the the three simulated spatial transcriptomics datawith ST platform. There are 200 genes for each data batch and ~2000spots in total
Check the content insimu3.
We show how to create a PRECASTObject object step by step. First, wecreate a Seurat list object using the count matrix and meta data of eachdata batch. Althoughsimu3 is a prepared Seurat listobject, we re-create a same objcet seuList to show the details.
row andcol, which benefits theidentification of spaital coordinates by FAST## Get the gene-by-spot read count matricescountList<-lapply(simu3,function(x) x[["RNA"]]@counts)## Check the spatial coordinates: Yes, they are named as "row" and "col"!head(simu3[[1]]@meta.data)## Get the meta data of each spot for each data batchmetadataList<-lapply(simu3,function(x) x@meta.data)## ensure the row.names of metadata in metaList are the same as that of colnames count matrix in countListM<-length(countList)for(rin1:M){row.names(metadataList[[r]])<-colnames(countList[[r]])}## Create the Seurat list objectseuList<-list()for(rin1:M){ seuList[[r]]<-CreateSeuratObject(counts = countList[[r]],meta.data=metadataList[[r]],project ="FASTsimu")}Next, we useCreatePRECASTObject() to create aPRECASTObject object based on the Seurat list objectseuList. Users are able to seehttps://feiyoung.github.io/PRECAST/articles/PRECAST.BreastCancer.htmlfor what is done in this function. Since this is a simulated dataset, weused all the 200 genes by using a custom gene listcustomGenelist=custom_genelist). We observe that there areonly 197 genes passing the filtering step.
## Create PRECASTObjectcustom_genelist<-row.names(seuList[[1]])set.seed(2023)PRECASTObj<-CreatePRECASTObject(seuList,customGenelist=custom_genelist)## User can retain the raw seuList by the following commond.## PRECASTObj <- CreatePRECASTObject(seuList, customGenelist=row.names(seuList[[1]]), rawData.preserve = TRUE)## check the number of genes/features after filtering stepPRECASTObj@seulistAdd adjacency matrix list and parameter setting of FAST. More modelsetting parameters can be found inmodel_set_FAST().
## seuList is null since the default value `rawData.preserve` is FALSE.PRECASTObj@seuList## Add adjacency matrix list for a PRECASTObj object to prepare for FAST model fitting.PRECASTObj<-AddAdjList(PRECASTObj,platform ="ST")## Add a model setting in advance for a PRECASTObj object: verbose =TRUE helps outputing the information in the algorithm;PRECASTObj<-AddParSettingFAST(PRECASTObj,verbose=TRUE)## Check the parametersPRECASTObj@parameterListFor functionFAST, users can specify the number offactorsq and the fitted modelfit.model. Theq sets the number of spatial factors to be extracted, and alareger one means more information to be extracted but highercomputaional cost. Thefit.model specifies the version ofFAST to be fitted. The Gaussian version (gaussian) modelsthe log-count matrices while the Poisson verion (poisson)models the count matrices; default aspoisson.
Next, we investigate the performance of dimension reduction bycalculating the adjusted McFadden’s pseudo R-square for each data batch.The simulated true labels is in themeta.data of eachcomponent ofPRECASTObj@seulist.
Based on the embeddings from FAST, we useHarmony toalign the embeddings then followed by Louvain clustering to obtain thecluster labels. In this downstream analysis, other methods for embeddingalignment and clustering can be also used. In the vignette of twosections of DLPFC Visium data, we will show another method(iSC-MEB) to jointly perform embedding alignment andspatial clustering.
In the following, we remove the unwanted variations in thelog-normalized expression matrices to obtain a combined log-normalizedexpression matrix in a Seurat object. In the context of the simulateddata used in this study, housekeeping genes are not present, thus, weturn to another method to remove the unwanted variations. Specifically,we leverage the batch effect embeddings estimated through Harmony tocapture and mitigate unwanted variations. Additionally, we utilize thecluster labels obtained via Louvain clustering to retain the desiredbiological effects.
The estimated embeddings of batch effects (batchEmbed) are in theslotPRECASTObj@resList$Harmony and cluster labels(cluster) are in the slotPRECASTObj@resList$Louvain.
Then, we integrate the three sections by removing the unwantedvariations and settingseulist_HK=NULL andMethod = "HarmonyLouvain" in the functionIntegrateSRTData(). After obtainingseuInt, wewill see there are three embeddings:FAST,harmony andposition, in the slotseuInt@reductions.FAST are the embeddingsobtained by FAST model fitting and are uncorrected embeddings that mayincludes the unwanted effects (i.e., batch effects);harmonyare the embeddings obtained by Harmony fitting andare aligned embeddings; andposition are the spatialcoordinates.
First, user can choose a beautiful color schema usingchooseColors() in the R packagePRECAST.
Then, we plot the spatial scatter plot for clusters using thefunctionSpaPlot() in the R packagePRECAST.
Users can re-plot the above figures for specific need by returning aggplot list object. For example, we plot the spatial heatmap using acommon legend by using the functiondrawFigs() in the RpackagePRECAST.
pList<-SpaPlot(seuInt,item ="cluster",title_name='Section',batch =NULL,point_size =1,cols = cols_cluster,combine =FALSE)drawFigs(pList,layout.dim =c(1,3),common.legend =TRUE,legend.position ="right",align ="hv")We use the functionAddUMAP() in the R packagePRECAST to obtain the three-dimensional components of UMAPusing the aligned embeddings in the reductionharmony.
We plot the spatial tNSE RGB plot to illustrate the performance inextracting features.
p13<-SpaPlot(seuInt,batch =NULL,item ="RGB_UMAP",point_size =1,combine =FALSE,text_size =15)drawFigs(p13,layout.dim =c(1,3),common.legend =TRUE,legend.position ="right",align ="hv")We use the functionAddUTSNE() in the R packagePRECAST to obtain the two-dimensional components of UMAPusing the aligned embeddings in the reductionharmony, andplot the tSNE plot based on the extracted features to check theperformance of integration.
seuInt<-AddTSNE(seuInt,n_comp =2,reduction ='harmony',assay ='RNA')p1<-dimPlot(seuInt,item ="cluster",point_size =0.5,font_family ="serif",cols = cols_cluster,border_col ="gray10",legend_pos ="right")# Times New Romanp2<-dimPlot(seuInt,item ="batch",point_size =0.5,font_family ="serif",legend_pos ="right")drawFigs(list(p1, p2),layout.dim =c(1,2),legend.position ="right")Finally, we condut the combined differential expression analysisusing the integrated log-normalized expression matrix saved in theseuInt object. The functionFindAllMarkers()in theSeurat R package is ued to achieve thisanalysis.
dat_deg<-FindAllMarkers(seuInt)library(dplyr)n<-5dat_deg%>%group_by(cluster)%>%top_n(n = n,wt = avg_log2FC)-> top10seuInt<-ScaleData(seuInt)Plot dot plot of normalized expressions for each spatial domainidentified by using the FAST embeddings.
col_here<-c("#F2E6AB","#9C0141")library(ggplot2)p1<-DotPlot(seuInt,features=unname(top10$gene),cols=col_here,# idents = ident_here,col.min =-1,col.max =1)+coord_flip()+theme(axis.text.y =element_text(face ="italic"))+ylab("Domain")+xlab(NULL)+theme(axis.text.x =element_text(size=12,angle =25,hjust =1,family='serif'),axis.text.y =element_text(size=12,face="italic",family='serif'))p1sessionInfo()#> R version 4.4.1 (2024-06-14 ucrt)#> Platform: x86_64-w64-mingw32/x64#> Running under: Windows 11 x64 (build 26100)#>#> Matrix products: default#>#>#> locale:#> [1] LC_COLLATE=C#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8#> [4] LC_NUMERIC=C#> [5] LC_TIME=Chinese (Simplified)_China.utf8#>#> time zone: Asia/Shanghai#> tzcode source: internal#>#> attached base packages:#> [1] stats graphics grDevices utils datasets methods base#>#> loaded via a namespace (and not attached):#> [1] digest_0.6.37 R6_2.5.1 fastmap_1.2.0 xfun_0.47#> [5] cachem_1.1.0 knitr_1.48 htmltools_0.5.8.1 rmarkdown_2.28#> [9] lifecycle_1.0.4 cli_3.6.3 sass_0.4.9 jquerylib_0.1.4#> [13] compiler_4.4.1 rstudioapi_0.16.0 tools_4.4.1 evaluate_1.0.0#> [17] bslib_0.8.0 yaml_2.3.10 rlang_1.1.4 jsonlite_1.8.9