Adversarial domain translation networks for integrating large-scale atlas-level single-cell datasets
YangLabHKUST/Portal
An efficient, accurate and flexible method for single-cell data integration.
Check out our manuscript in Nature Computational Science:
We provide source code for reproducing the experiments in the paper "Adversarial domain translation networks for fast and accurate integration of large-scale atlas-level single-cell datasets".
- Integration of mouse spleen datasets (this notebook also reproduces the performance metrics as an example). Benchmarking.
- Integration of mouse marrow datasets.
- Integration of mouse bladder datasets.
- Integration of mouse brain cerebellum datasets.
- Integration of mouse brain hippocampus datasets.
- Integration of mouse brain thalamus datasets.
- Integration of human PBMC datasets (sensitivity analysis).
- Integration of entire mouse cell atlases from the Tabula Muris project.
- Integration of mouse brain scRNA-seq and snRNA-seq datasets.
- Integration of human PBMC scRNA-seq and human brain snRNA-seq datasets.
- Integration of scRNA-seq and scATAC-seq datasets.
- Integration of developmental trajectories.
- Integration of spermatogenesis differentiation process across multiple species. Gene lists from Ensembl Biomart (we only use genes assigned the type "ortholog_one2one" in the lists): orthologues (human vs mouse), orthologues (human vs macaque).
- Portal can be installed from PyPI:
pip install portal-sc
- Alternatively, Portal can be downloaded from GitHub:
git clone https://github.com/YangLabHKUST/Portal.git
cd Portal
conda env update -f environment.yml
conda activate portal
Normally the installation time is less than 5 minutes.
Starting with raw count matrices formatted as AnnData objects, Portal uses a standard pipeline adopted by Seurat and Scanpy to preprocess data, followed by PCA for dimensionality reduction. After preprocessing, Portal can be trained via model.train().
import portal
import scanpy as sc

# read AnnData
adata_1 = sc.read_h5ad("adata_1.h5ad")
adata_2 = sc.read_h5ad("adata_2.h5ad")

model = portal.model.Model()
model.preprocess(adata_1, adata_2)  # perform preprocess and PCA
model.train()  # train the model
model.eval()  # get integrated latent representation of cells
The evaluation step model.eval() saves the integrated latent representation of cells in model.latent, which can be used for downstream integrative analysis.
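As one illustration of downstream use, cell-type labels can be transferred between datasets by nearest-neighbor lookup in a shared embedding. The sketch below is a minimal, library-independent example on toy coordinates. `transfer_labels` is a hypothetical helper and not part of Portal, and it assumes you already know which rows of the integrated representation belong to the reference and which to the query dataset.

```python
import math

def transfer_labels(ref_latent, ref_labels, query_latent):
    """Assign each query cell the label of its nearest reference cell (Euclidean)."""
    predicted = []
    for q in query_latent:
        # Index of the closest reference cell in the shared latent space.
        nearest = min(range(len(ref_latent)), key=lambda i: math.dist(q, ref_latent[i]))
        predicted.append(ref_labels[nearest])
    return predicted

# Toy 2-D embedding standing in for rows of an integrated latent matrix.
ref_latent = [[0.0, 0.0], [0.1, 0.2], [5.0, 5.0]]
ref_labels = ["B cell", "B cell", "T cell"]
query_latent = [[0.2, 0.1], [4.8, 5.1]]

print(transfer_labels(ref_latent, ref_labels, query_latent))
```

For real data, a k-nearest-neighbor vote over several reference cells is usually more robust than a single nearest neighbor.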
- lambdacos: Coefficient of the regularizer for preserving cosine similarity across domains. Default: 20.0.
- training_steps: Number of steps for training. Default: 2000. Use training_steps=1000 for datasets with sample size < 20,000.
- npcs: Dimensionality of the embeddings in each domain (number of PCs). Default: 30.
- n_latent: Dimensionality of the shared latent space. Default: 20.
- batch_size: Batch size for training. Default: 500.
- seed: Random seed. Default: 1234.
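The rule of thumb for training_steps can be written as a small helper. This is only a sketch: `suggest_training_steps` is a hypothetical function, not part of Portal, and it assumes that "sample size" means the number of cells you pass in as `n_cells`.

```python
def suggest_training_steps(n_cells, small_threshold=20_000):
    """Return 1000 steps below the threshold, else the documented default of 2000."""
    return 1000 if n_cells < small_threshold else 2000

print(suggest_training_steps(8_000))    # small dataset
print(suggest_training_steps(150_000))  # large dataset
```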
The default setting of the parameter lambdacos works in general. This parameter can also be tuned to achieve better performance; see Tuning lambdacos (optional). For integration tasks where cosine similarity is not a reliable cross-domain correspondence (such as cross-species integration), we recommend using a lower value such as lambdacos=10.0.
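For readers unfamiliar with the quantity that lambdacos weights, the following sketch computes cosine similarity between two vectors. It illustrates the measure only and is not Portal's loss function; the vectors are made-up toy values.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

cell = [1.0, 2.0, 3.0]
same_direction = [2.0, 4.0, 6.0]  # rescaled copy of `cell`
orthogonal = [3.0, 0.0, -1.0]     # dot product with `cell` is zero

print(cosine_similarity(cell, same_direction))
print(cosine_similarity(cell, orthogonal))
```

Cosine similarity ignores vector length and depends only on direction, so two profiles that differ by a scale factor still count as highly similar.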
To deal with large single-cell datasets, we also developed a memory-efficient version by reading mini-batches from the disk:
model = portal.model.Model()
model.preprocess_memory_efficient(adata_A_path="adata_1.h5ad", adata_B_path="adata_2.h5ad")
model.train_memory_efficient()
model.eval_memory_efficient()
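The general idea behind processing data in mini-batches is to touch only one slice of rows at a time instead of holding the full matrix in memory. The sketch below shows that chunking pattern with a hypothetical `iter_minibatches` helper; Portal's own reader is not shown in this README and may select batches differently (for example, by random sampling).

```python
def iter_minibatches(n_cells, batch_size=500):
    """Yield (start, stop) row ranges that cover n_cells in chunks of batch_size."""
    for start in range(0, n_cells, batch_size):
        yield start, min(start + batch_size, n_cells)

batches = list(iter_minibatches(1200, batch_size=500))
print(batches)
```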
Portal integrates multiple datasets incrementally. Given a list of AnnData objects adata_list = [adata_1, ..., adata_n], the datasets can be integrated by running the following commands:
lowdim_list = portal.utils.preprocess_datasets(adata_list)
integrated_data = portal.utils.integrate_datasets(lowdim_list)
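Incremental integration can be pictured as a left-to-right fold: the first two datasets are combined, and each further dataset is merged into the running result. The sketch below shows that pattern only. Both `integrate_incrementally` and `concat_pair` are hypothetical names, the pairwise step is a stand-in that merely concatenates lists, and the order Portal actually uses is an assumption here.

```python
from functools import reduce

def integrate_incrementally(datasets, integrate_pair):
    """Fold datasets left to right: ((d1 + d2) + d3) + ... using integrate_pair."""
    if not datasets:
        raise ValueError("need at least one dataset")
    return reduce(integrate_pair, datasets)

# Stand-in for a real pairwise integration step: here it only concatenates rows.
def concat_pair(integrated, new):
    return integrated + new

print(integrate_incrementally([["a1", "a2"], ["b1"], ["c1", "c2"]], concat_pair))
```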
Optionally, the parameter lambdacos can be tuned in the range [15.0, 50.0]. Users can run the following commands to search for the value that yields the best integration result in terms of the mixing metric:
lowdim_list = portal.utils.preprocess_datasets(adata_list)
integrated_data = portal.utils.integrate_datasets(lowdim_list, search_cos=True)
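Conceptually, such a search evaluates a score for each candidate value and keeps the best one. The sketch below shows that pattern with a hypothetical `search_best` helper and a toy score; it does not reproduce what search_cos=True does internally. The candidate grid is an assumption, and because the README does not state whether a higher or lower mixing metric is better, the direction is left as an argument.

```python
def search_best(candidates, score, higher_is_better=True):
    """Evaluate score(value) for each candidate and return (best_value, best_score)."""
    scored = [(value, score(value)) for value in candidates]
    pick = max if higher_is_better else min
    return pick(scored, key=lambda pair: pair[1])

# Candidate values spanning the suggested range [15.0, 50.0].
candidates = [15.0, 20.0, 30.0, 40.0, 50.0]

# Toy score peaking at 30.0; a real score would rerun integration and measure mixing.
def toy_score(lam):
    return -(lam - 30.0) ** 2

print(search_best(candidates, toy_score))
```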
Portal can provide harmonized expression matrices (at the scaled or log-normalized level):
lowdim_list, hvg, mean, std, pca = portal.utils.preprocess_recover_expression(adata_list)
expression_scaled, expression_log_normalized = portal.utils.integrate_recover_expression(lowdim_list, mean, std, pca)
We provide demos for users to get a quick start: Demo 1, Demo 2.
This package is developed by Jia Zhao (jzhaoaz@connect.ust.hk) and Gefei Wang (gwangas@connect.ust.hk).
Jia Zhao, Gefei Wang, Jingsi Ming, Zhixiang Lin, Yang Wang, The Tabula Microcebus Consortium, Angela Ruohao Wu, Can Yang. Adversarial domain translation networks for integrating large-scale atlas-level single-cell datasets. Nature Computational Science 2, 317–330 (2022).