License

MIT license

Portal

Adversarial domain translation networks for integrating large-scale atlas-level single-cell datasets

An efficient, accurate and flexible method for single-cell data integration.

Check out our manuscript in Nature Computational Science (see Citation).

Reproducibility

We provide source code for reproducing the experiments of the paper "Adversarial domain translation networks for fast and accurate integration of large-scale atlas-level single-cell datasets".

Integration of mouse spleen datasets (we reproduce the performance metrics in this notebook as an example). Benchmarking.
Integration of mouse marrow datasets.
Integration of mouse bladder datasets.
Integration of mouse brain cerebellum datasets.
Integration of mouse brain hippocampus datasets.
Integration of mouse brain thalamus datasets.
Integration of human PBMC datasets (sensitivity analysis).
Integration of entire mouse cell atlases from the Tabula Muris project.
Integration of mouse brain scRNA-seq and snRNA-seq datasets.
Integration of human PBMC scRNA-seq and human brain snRNA-seq datasets.
Integration of scRNA-seq and scATAC-seq datasets.
Integration of developmental trajectories.
Integration of spermatogenesis differentiation process across multiple species. Gene lists from Ensembl BioMart (we only use genes that are assigned the type "ortholog_one2one" in the lists): orthologues (human vs mouse), orthologues (human vs macaque).

Installation

Portal can be installed from PyPI:

pip install portal-sc

Alternatively, Portal can also be downloaded from GitHub:

    git clone https://github.com/YangLabHKUST/Portal.git
    cd Portal
    conda env update -f environment.yml
    conda activate portal

Normally the installation time is less than 5 minutes.

Quick Start

Basic Usage

Starting with raw count matrices formatted as AnnData objects, Portal uses a standard pipeline adopted by Seurat and Scanpy to preprocess data, followed by PCA for dimensionality reduction. After preprocessing, Portal can be trained via `model.train()`.

    import portal
    import scanpy as sc

    # read AnnData
    adata_1 = sc.read_h5ad("adata_1.h5ad")
    adata_2 = sc.read_h5ad("adata_2.h5ad")

    model = portal.model.Model()
    model.preprocess(adata_1, adata_2)  # perform preprocess and PCA
    model.train()  # train the model
    model.eval()  # get integrated latent representation of cells

The evaluation procedure `model.eval()` saves the integrated latent representation of cells in `model.latent`, which can be used for downstream integrative analysis.
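As one illustration of downstream integrative analysis, the sketch below transfers cell-type labels from a reference dataset to a query dataset by nearest neighbor in a shared latent space. It uses small hand-written vectors in place of `model.latent`. How the rows of `model.latent` map to the two input datasets is not stated here, so the split into reference and query rows is an assumption, and `transfer_labels` is a hypothetical helper, not part of Portal.

```python
import math

def transfer_labels(ref_latent, ref_labels, query_latent):
    """Assign each query cell the label of its nearest reference cell
    (Euclidean distance in the shared latent space)."""
    predicted = []
    for q in query_latent:
        # Index of the reference cell closest to this query cell.
        nearest = min(range(len(ref_latent)),
                      key=lambda i: math.dist(q, ref_latent[i]))
        predicted.append(ref_labels[nearest])
    return predicted

# Toy stand-ins for rows of an integrated latent matrix (2-D for readability).
ref_latent = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1]]
ref_labels = ["B cell", "B cell", "T cell"]
query_latent = [[0.1, 0.1], [4.8, 5.0]]

print(transfer_labels(ref_latent, ref_labels, query_latent))
```

In practice a k-nearest-neighbor vote over a real latent matrix would be more robust than a single neighbor; the single-neighbor version is kept here only for brevity.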

Parameters in `portal.model.Model()`:

`lambdacos`: Coefficient of the regularizer for preserving cosine similarity across domains. Default: 20.0.
`training_steps`: Number of steps for training. Default: 2000. Use `training_steps=1000` for datasets with sample size < 20,000.
`npcs`: Dimensionality of the embeddings in each domain (number of PCs). Default: 30.
`n_latent`: Dimensionality of the shared latent space. Default: 20.
`batch_size`: Batch size for training. Default: 500.
`seed`: Random seed. Default: 1234.
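To make the quantity behind `lambdacos` concrete, the sketch below computes the cosine similarity between two vectors and shows how a coefficient would weight a penalty on a loss of that similarity. The penalty form `lambdacos * (1 - cos)` and the toy vectors are assumptions chosen for illustration; Portal's actual regularizer is defined in the paper and the source code, not here.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def cosine_penalty(original, translated, lambdacos=20.0):
    """Hypothetical penalty: close to zero when the two vectors point the
    same way, growing as they diverge, scaled by the coefficient lambdacos."""
    return lambdacos * (1.0 - cosine_similarity(original, translated))

cell = [1.0, 2.0, 0.0]            # toy embedding of a cell in one domain
same_direction = [2.0, 4.0, 0.0]  # rescaled copy: cosine similarity of about 1
orthogonal = [0.0, 0.0, 3.0]      # unrelated direction: cosine similarity of 0

print(cosine_penalty(cell, same_direction))
print(cosine_penalty(cell, orthogonal))
print(cosine_penalty(cell, orthogonal, lambdacos=10.0))
```

Under this assumed form, halving the coefficient halves the cost of the same mismatch, so a smaller coefficient constrains the translation less strongly.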

The default setting of the parameter `lambdacos` works in general. We also enable tuning of this parameter to achieve better performance; see Tuning `lambdacos` (optional). For integration tasks where cosine similarity is not a reliable cross-domain correspondence (such as cross-species integration), we recommend using a lower value such as `lambdacos=10.0`.
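The defaults and the two recommendations above can be collected in a small helper. This is a minimal sketch: `suggested_model_kwargs` is a hypothetical function, not part of Portal, and it assumes that "sample size" means the total number of cells being integrated.

```python
def suggested_model_kwargs(n_cells, cross_species=False):
    """Return keyword arguments following the stated defaults, with the
    two documented adjustments applied."""
    kwargs = {
        "lambdacos": 20.0,
        "training_steps": 2000,
        "npcs": 30,
        "n_latent": 20,
        "batch_size": 500,
        "seed": 1234,
    }
    if n_cells < 20_000:
        # Fewer training steps are recommended for smaller datasets.
        kwargs["training_steps"] = 1000
    if cross_species:
        # A lower coefficient is recommended when cosine similarity is
        # not a reliable cross-domain correspondence.
        kwargs["lambdacos"] = 10.0
    return kwargs

kwargs = suggested_model_kwargs(n_cells=12_000, cross_species=True)
print(kwargs)

# With Portal installed, these could then be passed on as, for example:
#   model = portal.model.Model(**kwargs)
```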

Memory-efficient Version

To deal with large single-cell datasets, we also developed a memory-efficient version by reading mini-batches from the disk:

    model = portal.model.Model()
    model.preprocess_memory_efficient(adata_A_path="adata_1.h5ad", adata_B_path="adata_2.h5ad")
    model.train_memory_efficient()
    model.eval_memory_efficient()
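The general idea of reading mini-batches from disk, rather than loading a whole matrix into memory, can be sketched with the standard library alone. This is not Portal's implementation: it reads a CSV file sequentially instead of an `.h5ad` file, and `iter_minibatches` is a hypothetical helper used only to show that one mini-batch at a time is held in memory.

```python
import csv
import itertools
import os
import tempfile

def iter_minibatches(path, batch_size):
    """Yield lists of numeric rows from a CSV file, batch_size rows at a
    time, so that only one mini-batch is held in memory."""
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        while True:
            batch = list(itertools.islice(reader, batch_size))
            if not batch:
                break
            yield [[float(x) for x in row] for row in batch]

# Write a small toy matrix (5 cells x 2 features) to disk for the demo.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False, newline="") as tmp:
    csv.writer(tmp).writerows([[i, i * 10] for i in range(5)])
    toy_path = tmp.name

batch_sizes = [len(batch) for batch in iter_minibatches(toy_path, batch_size=2)]
print(batch_sizes)
os.remove(toy_path)
```

The last batch is smaller when the number of rows is not a multiple of the batch size; a training loop would need to handle that case.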

Integrating Multiple Datasets

Portal integrates multiple datasets incrementally. Given a list of AnnData objects `adata_list = [adata_1, ..., adata_n]`, the datasets can be integrated by running the following commands:

    lowdim_list = portal.utils.preprocess_datasets(adata_list)
    integrated_data = portal.utils.integrate_datasets(lowdim_list)
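One reading of "incrementally" is that each new dataset is integrated into the result accumulated so far. The sketch below shows only that order of operations; `integrate_incrementally` and `record_pair` are hypothetical stand-ins, and what `portal.utils.integrate_datasets` does internally is not described here.

```python
def integrate_incrementally(datasets, integrate_pair):
    """Fold a list of datasets into one by repeatedly integrating the
    next dataset into the result accumulated so far."""
    if not datasets:
        raise ValueError("need at least one dataset")
    integrated = datasets[0]
    for new_dataset in datasets[1:]:
        integrated = integrate_pair(integrated, new_dataset)
    return integrated

# Stand-in pairwise step: it only records the order of operations.
def record_pair(accumulated, new_dataset):
    return f"({accumulated} + {new_dataset})"

print(integrate_incrementally(["adata_1", "adata_2", "adata_3"], record_pair))
```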

Tuning `lambdacos` (optional)

An optional choice is to tune the parameter `lambdacos` in the range [15.0, 50.0]. Users can run the following commands to search for an optimal parameter that yields the best integration result in terms of the mixing metric:

    lowdim_list = portal.utils.preprocess_datasets(adata_list)
    integrated_data = portal.utils.integrate_datasets(lowdim_list, search_cos=True)
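Conceptually, such a search scores each candidate value and keeps the best one. The sketch below is a generic grid search under stated assumptions: the step of 5.0 between candidates, the toy score, and the direction of the metric (`lower_is_better`) are all placeholders, and `search_lambdacos` is a hypothetical helper rather than what `search_cos=True` runs internally.

```python
def search_lambdacos(candidates, mixing_score, lower_is_better=True):
    """Score every candidate and return the best one with all scores."""
    scores = {value: mixing_score(value) for value in candidates}
    pick = min if lower_is_better else max
    best = pick(scores, key=scores.get)
    return best, scores

# Candidate grid within the stated range [15.0, 50.0]; the step is an assumption.
candidates = [15.0 + 5.0 * i for i in range(8)]

# Toy score standing in for "integrate, then compute the mixing metric".
def toy_score(lambdacos):
    return (lambdacos - 30.0) ** 2

best, scores = search_lambdacos(candidates, toy_score)
print(best)
```

Each real evaluation would require a full integration run, so the cost of the search grows linearly with the number of candidates.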

Recovering expression matrices

Portal can provide harmonized expression matrices (at the scaled level or the log-normalized level):

    lowdim_list, hvg, mean, std, pca = portal.utils.preprocess_recover_expression(adata_list)
    expression_scaled, expression_log_normalized = portal.utils.integrate_recover_expression(lowdim_list, mean, std, pca)
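The names `mean` and `std` suggest that the two levels are related by per-gene standardization. Assuming that relationship, `z = (x - mean) / std`, the sketch below maps scaled values back to the log-normalized level. `unscale` is a hypothetical helper and the numbers are toy values; how Portal performs this step internally is not confirmed here.

```python
def unscale(scaled_rows, mean, std):
    """Map scaled values back to the log-normalized level, gene by gene,
    by inverting z = (x - mean) / std."""
    return [[z * s + m for z, m, s in zip(row, mean, std)]
            for row in scaled_rows]

mean = [1.0, 0.5]   # per-gene means at the log-normalized level (toy values)
std = [2.0, 0.25]   # per-gene standard deviations (toy values)
scaled = [[0.0, 2.0], [-0.5, 0.0]]  # two cells x two genes, scaled level

print(unscale(scaled, mean, std))
```

If scaled values were clipped during preprocessing, this inversion would only be approximate for the clipped entries.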

Demos

We provide demos for users to get a quick start: Demo 1, Demo 2.

Development

This package is developed by Jia Zhao (jzhaoaz@connect.ust.hk) and Gefei Wang (gwangas@connect.ust.hk).

Citation

Jia Zhao, Gefei Wang, Jingsi Ming, Zhixiang Lin, Yang Wang, The Tabula Microcebus Consortium, Angela Ruohao Wu, Can Yang. Adversarial domain translation networks for integrating large-scale atlas-level single-cell datasets. Nature Computational Science 2, 317–330 (2022).


Releases

v1.0.2 (latest), Apr 18, 2022. 3 releases in total.
