Movatterモバイル変換

This vignette introduces the FAST workflow for the analysis ofmultiple simulated spatial transcriptomics dataset. FAST workflow isbased on the PRECASTObj object created in thePRECAST Rpackage and the workflow of FAST is similar to that of PRECAST; see (https://feiyoung.github.io/PRECAST/articles/PRECAST.BreastCancer.html)for the workflow of PRECAST. The workflow of FAST consists of threesteps

We demonstrate the use of FAST to three simulated ST data that arehere,which can be downloaded to the current working path by the followingcommand:

View the simulated data

First, we view the the three simulated spatial transcriptomics datawith ST platform. There are 200 genes for each data batch and ~2000spots in total

simu3## a list including three Seurat object with default assay: RNA

Check the content insimu3.

head(simu3[[1]])

row.names(simu3[[1]])[1:10]

Create a PRECASTObject object

We show how to create a PRECASTObject object step by step. First, wecreate a Seurat list object using the count matrix and meta data of eachdata batch. Althoughsimu3 is a prepared Seurat listobject, we re-create a same objcet seuList to show the details.

Note: the spatial coordinates must be contained in the meta data andnamed asrow andcol, which benefits theidentification of spaital coordinates by FAST

## Get the gene-by-spot read count matricescountList<-lapply(simu3,function(x) x[["RNA"]]@counts)## Check the spatial coordinates: Yes, they are named as "row" and "col"!head(simu3[[1]]@meta.data)## Get the meta data of each spot for each data batchmetadataList<-lapply(simu3,function(x) x@meta.data)## ensure the row.names of metadata in metaList are the same as that of colnames count matrix in countListM<-length(countList)for(rin1:M){row.names(metadataList[[r]])<-colnames(countList[[r]])}## Create the Seurat list  objectseuList<-list()for(rin1:M){  seuList[[r]]<-CreateSeuratObject(counts = countList[[r]],meta.data=metadataList[[r]],project ="FASTsimu")}

Prepare the PRECASTObject with preprocessing step.

Next, we useCreatePRECASTObject() to create aPRECASTObject object based on the Seurat list objectseuList. Users are able to seehttps://feiyoung.github.io/PRECAST/articles/PRECAST.BreastCancer.htmlfor what is done in this function. Since this is a simulated dataset, weused all the 200 genes by using a custom gene listcustomGenelist=custom_genelist). We observe that there areonly 197 genes passing the filtering step.

## Create PRECASTObjectcustom_genelist<-row.names(seuList[[1]])set.seed(2023)PRECASTObj<-CreatePRECASTObject(seuList,customGenelist=custom_genelist)## User can retain the raw seuList by the following commond.##  PRECASTObj <-  CreatePRECASTObject(seuList, customGenelist=row.names(seuList[[1]]), rawData.preserve = TRUE)## check the number of genes/features after filtering stepPRECASTObj@seulist

Fit FAST using simulated data

Add the model setting

Add adjacency matrix list and parameter setting of FAST. More modelsetting parameters can be found inmodel_set_FAST().

## seuList is null since the default value `rawData.preserve` is FALSE.PRECASTObj@seuList## Add adjacency matrix list for a PRECASTObj object to prepare for FAST model fitting.PRECASTObj<-AddAdjList(PRECASTObj,platform ="ST")## Add a model setting in advance for a PRECASTObj object: verbose =TRUE helps outputing the information in the algorithm;PRECASTObj<-AddParSettingFAST(PRECASTObj,verbose=TRUE)## Check the parametersPRECASTObj@parameterList

Fit FAST

For functionFAST, users can specify the number offactorsq and the fitted modelfit.model. Theq sets the number of spatial factors to be extracted, and alareger one means more information to be extracted but highercomputaional cost. Thefit.model specifies the version ofFAST to be fitted. The Gaussian version (gaussian) modelsthe log-count matrices while the Poisson verion (poisson)models the count matrices; default aspoisson.

### set q= 20 hereset.seed(2023)PRECASTObj<-FAST(PRECASTObj,q=20)### Check the resultsstr(PRECASTObj@resList)

Run gaussian version

Users can also use the gaussian version by the following command:

set.seed(2023)PRECASTObj<-FAST(PRECASTObj,q=20,fit.model='gaussian')### Check the resultsstr(PRECASTObj@resList)

Evaluate the adjusted McFadden’s pseudo R-square

Next, we investigate the performance of dimension reduction bycalculating the adjusted McFadden’s pseudo R-square for each data batch.The simulated true labels is in themeta.data of eachcomponent ofPRECASTObj@seulist.

## Obtain the true labelsyList<-lapply(PRECASTObj@seulist,function(x) x$Group)### Evaluate the MacR2MacVec<-sapply(1:length(PRECASTObj@seulist),function(r)get_r2_mcfadden(PRECASTObj@resList$FAST$hV[[r]], yList[[r]]))### output themprint(MacVec)

Embedding alignment and clustering using Harmony and Louvain

Based on the embeddings from FAST, we useHarmony toalign the embeddings then followed by Louvain clustering to obtain thecluster labels. In this downstream analysis, other methods for embeddingalignment and clustering can be also used. In the vignette of twosections of DLPFC Visium data, we will show another method(iSC-MEB) to jointly perform embedding alignment andspatial clustering.

PRECASTObj<-RunHarmonyLouvain(PRECASTObj,resolution =0.4)ARI_vec<-sapply(1:M,function(r) mclust::adjustedRandIndex(PRECASTObj@resList$Louvain$cluster[[r]], yList[[r]]))print(ARI_vec)

Remove unwanted variations in the log-normalized expressionmatrices

In the following, we remove the unwanted variations in thelog-normalized expression matrices to obtain a combined log-normalizedexpression matrix in a Seurat object. In the context of the simulateddata used in this study, housekeeping genes are not present, thus, weturn to another method to remove the unwanted variations. Specifically,we leverage the batch effect embeddings estimated through Harmony tocapture and mitigate unwanted variations. Additionally, we utilize thecluster labels obtained via Louvain clustering to retain the desiredbiological effects.

The estimated embeddings of batch effects (batchEmbed) are in theslotPRECASTObj@resList$Harmony and cluster labels(cluster) are in the slotPRECASTObj@resList$Louvain.

str(PRECASTObj@resList$Harmony)str(PRECASTObj@resList$Louvain)

Then, we integrate the three sections by removing the unwantedvariations and settingseulist_HK=NULL andMethod = "HarmonyLouvain" in the functionIntegrateSRTData(). After obtainingseuInt, wewill see there are three embeddings:FAST,harmony andposition, in the slotseuInt@reductions.FAST are the embeddingsobtained by FAST model fitting and are uncorrected embeddings that mayincludes the unwanted effects (i.e., batch effects);harmonyare the embeddings obtained by Harmony fitting andare aligned embeddings; andposition are the spatialcoordinates.

seulist_HK<-NULLseuInt<-IntegrateSRTData(PRECASTObj,seulist_HK=seulist_HK,Method ="HarmonyLouvain",seuList_raw=NULL,covariates_use=NULL,verbose=TRUE)seuInt

Visualization

First, user can choose a beautiful color schema usingchooseColors() in the R packagePRECAST.

cols_cluster<-chooseColors(palettes_name ="Nature 10",n_colors =7,plot_colors =TRUE)

Then, we plot the spatial scatter plot for clusters using thefunctionSpaPlot() in the R packagePRECAST.

p12<-SpaPlot(seuInt,item ="cluster",batch =NULL,point_size =1,cols = cols_cluster,combine =TRUE)p12

Users can re-plot the above figures for specific need by returning aggplot list object. For example, we plot the spatial heatmap using acommon legend by using the functiondrawFigs() in the RpackagePRECAST.

pList<-SpaPlot(seuInt,item ="cluster",title_name='Section',batch =NULL,point_size =1,cols = cols_cluster,combine =FALSE)drawFigs(pList,layout.dim =c(1,3),common.legend =TRUE,legend.position ="right",align ="hv")

We use the functionAddUMAP() in the R packagePRECAST to obtain the three-dimensional components of UMAPusing the aligned embeddings in the reductionharmony.

seuInt<-AddUMAP(seuInt,n_comp=3,reduction ='harmony',assay ='RNA')seuInt

We plot the spatial tNSE RGB plot to illustrate the performance inextracting features.

p13<-SpaPlot(seuInt,batch =NULL,item ="RGB_UMAP",point_size =1,combine =FALSE,text_size =15)drawFigs(p13,layout.dim =c(1,3),common.legend =TRUE,legend.position ="right",align ="hv")

We use the functionAddUTSNE() in the R packagePRECAST to obtain the two-dimensional components of UMAPusing the aligned embeddings in the reductionharmony, andplot the tSNE plot based on the extracted features to check theperformance of integration.

seuInt<-AddTSNE(seuInt,n_comp =2,reduction ='harmony',assay ='RNA')p1<-dimPlot(seuInt,item ="cluster",point_size =0.5,font_family ="serif",cols = cols_cluster,border_col ="gray10",legend_pos ="right")# Times New Romanp2<-dimPlot(seuInt,item ="batch",point_size =0.5,font_family ="serif",legend_pos ="right")drawFigs(list(p1, p2),layout.dim =c(1,2),legend.position ="right")

Combined differential epxression analysis

Finally, we condut the combined differential expression analysisusing the integrated log-normalized expression matrix saved in theseuInt object. The functionFindAllMarkers()in theSeurat R package is ued to achieve thisanalysis.

dat_deg<-FindAllMarkers(seuInt)library(dplyr)n<-5dat_deg%>%group_by(cluster)%>%top_n(n = n,wt = avg_log2FC)-> top10seuInt<-ScaleData(seuInt)

Plot dot plot of normalized expressions for each spatial domainidentified by using the FAST embeddings.

col_here<-c("#F2E6AB","#9C0141")library(ggplot2)p1<-DotPlot(seuInt,features=unname(top10$gene),cols=col_here,#  idents = ident_here,col.min =-1,col.max =1)+coord_flip()+theme(axis.text.y =element_text(face ="italic"))+ylab("Domain")+xlab(NULL)+theme(axis.text.x =element_text(size=12,angle =25,hjust =1,family='serif'),axis.text.y =element_text(size=12,face="italic",family='serif'))p1

Session Info

sessionInfo()#> R version 4.4.1 (2024-06-14 ucrt)#> Platform: x86_64-w64-mingw32/x64#> Running under: Windows 11 x64 (build 26100)#>#> Matrix products: default#>#>#> locale:#> [1] LC_COLLATE=C#> [2] LC_CTYPE=Chinese (Simplified)_China.utf8#> [3] LC_MONETARY=Chinese (Simplified)_China.utf8#> [4] LC_NUMERIC=C#> [5] LC_TIME=Chinese (Simplified)_China.utf8#>#> time zone: Asia/Shanghai#> tzcode source: internal#>#> attached base packages:#> [1] stats     graphics  grDevices utils     datasets  methods   base#>#> loaded via a namespace (and not attached):#>  [1] digest_0.6.37     R6_2.5.1          fastmap_1.2.0     xfun_0.47#>  [5] cachem_1.1.0      knitr_1.48        htmltools_0.5.8.1 rmarkdown_2.28#>  [9] lifecycle_1.0.4   cli_3.6.3         sass_0.4.9        jquerylib_0.1.4#> [13] compiler_4.4.1    rstudioapi_0.16.0 tools_4.4.1       evaluate_1.0.0#> [17] bslib_0.8.0       yaml_2.3.10       rlang_1.1.4       jsonlite_1.8.9