Get the most out of your single cell data.
SignacX is software developed by the Savova lab at Sanofi with a focus on single cell genomics for clinical applications. SignacX classifies the cellular phenotype for each individual cell in single cell RNA-sequencing data using neural networks trained with sorted bulk gene expression data from the Human Primary Cell Atlas. In this R implementation, we provide functions and vignettes that demonstrate how to: integrate single cell data (mapping cells from one data set to another), classify non-human data, identify novel cell types, and classify single cell data across many tissues, diseases and technologies. To learn more, check out the pre-print here.

Here, we provide interactive access to data from the pre-print with SPRING Viewer. Just click the “Explore” links below, and search for your favorite gene:
| Links | Tissue | Disease | Number of cells | Number of samples | Source | Signac version |
|---|---|---|---|---|---|---|
| Explore | Kidney | Cancer | 48,037 | 47 | Stewart et al. 2019 | v2.0.7 |
| Explore | Kidney and urine | Lupus nephritis and healthy | 5,886 | 39 | Arazi et al. 2019 | v2.0.7 |
| Explore | Lung | Cancer | 42,844 | 18 | Zilionis et al. 2020 | v2.0.7 |
| Explore | Lung | Fibrosis | 96,461 | 31 | Habermann et al. 2020 | v2.0.7 |
| Explore | Lung | Fibrosis | 109,421 | 16 | Reyfman et al. 2019 | v2.0.7 |
| Explore | Monkey PBMCs | Healthy | 5,491 | 1 | Chamberlain et al. 2021 | v2.0.7 |
| Explore | Monkey PBMCs | Healthy | 5,220 | 1 | Chamberlain et al. 2021 | v2.0.7 |
| Explore | Monkey T cells | Healthy | 5,496 | 1 | Chamberlain et al. 2021 | v2.0.7 |
| Explore | PBMCs | Cancer | 14,048 | 8 | Zilionis et al. 2020 | v2.0.7 |
| Explore | PBMCs | Healthy | 7,902 | 1 | 10X Genomics | v2.0.7 |
| Explore | PBMCs | Healthy | 4,784 | 1 | 10X Genomics | v2.0.7 |
| Explore | Skin | Atopic dermatitis | 36,690 | 17 | He et al. 2020 | v2.0.7 |
| Explore | Synovium | Rheumatoid arthritis and osteoarthritis | 8,920 | 26 | Zhang et al. 2019 | v2.0.7 |

Note:

* Cell type annotations are provided at four levels (immune, celltypes, cellstates and novel celltypes).
* When available, we also provide information about sample covariates (e.g., disease, age, gender, FACS).
* Cell type annotations for all 13 data sets were generated with the Signac function using its default settings.

Special thanks to Allon Klein’s lab (particularly Caleb Weinreb and Sam Wolock) for hosting the data.
To install SignacX in R, simply do:
```r
install.packages("SignacX")
```

The main functions in Signac are:

```r
# load the library
library(SignacX)

# generate initial labels
labels <- Signac(E = your_data_here)

# get cell type labels
celltypes <- GenerateLabels(labels, E = your_data_here)
```

Sometimes we don’t have time to run Signac and need a quick solution. Although Signac scales fine with large data sets (>300,000 cells), we developed SignacFast to quickly classify single cell data:

```r
# load the library
library(SignacX)

# generate labels with pre-trained model
labels_fast <- SignacFast(E = your_data_here, num.cores = 4)
celltypes_fast <- GenerateLabels(labels_fast, E = your_data_here)
```

To make life easier, SignacX was integrated with Seurat (versions 3 and 4), and with SPRING. We provide a few vignettes:
In the pre-print, we often used Signac integrated with SPRING. To reproduce our findings and to generate new results with SPRING, please visit the SPRING repository, which has example notebooks and installation instructions, particularly for processing CITE-seq and scRNA-seq data from 10X Genomics. Briefly, Signac is integrated seamlessly with the output files of SPRING in R, requiring only a few functions:

```r
# load the Signac library
library(SignacX)

# dir points to the "FullDataset_v1" directory generated by the SPRING Jupyter notebook
dir <- "./FullDataset_v1"

# load the expression data
E <- CID.LoadData(dir)

# generate cellular phenotype labels
labels <- Signac(E, spring.dir = dir)
celltypes <- GenerateLabels(labels, E = E, spring.dir = dir)

# write cell types and Louvain clusters to SPRING
dat <- CID.writeJSON(celltypes, spring.dir = dir)
```

After running the above functions, cellular phenotypes and Louvain clusters are ready to be visualized with SPRING Viewer, which can be set up locally as described here.
Another way to use Signac is with Seurat. In this vignette, we performed multi-modal analysis of CITE-seq PBMCs from 10X Genomics using Signac integrated with Seurat.

Note:

* This same data set was also processed with SPRING in this notebook and subsequently classified with Signac. The result was used to generate the SPRING layouts for these data in the pre-print (Figures 2-4), which are available for interactive exploration here.
Sometimes, we have single cell genomics data with disease information, and we want to know which cellular phenotypes are enriched for disease. In this vignette, we applied Signac to classify cellular phenotypes in healthy and lupus nephritis kidney cells, and then we used MASC to identify which cellular phenotypes were disease-enriched.

Note:

* MASC typically requires equal numbers of cells and samples between case and control: an unequal number might skew the clustering of cells towards one sample (i.e., a “batch effect”), which could cause spurious disease enrichment in the mixed effect model. Since Signac classifies each cell independently (without using clusters), Signac annotations can be used with MASC without a priori balancing samples or cells, unlike cluster-based annotation methods.
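
Before testing for disease enrichment, it can be useful to see how unbalanced the cells and samples actually are. The sketch below is a minimal base R illustration, not part of SignacX or MASC: the `cells` data frame and all of its values are made up, and it only tabulates per-cell labels by sample and condition.

```r
# Toy per-cell table: one row per cell, with a cell type label, the sample
# the cell came from, and that sample's condition (all values are made up).
cells <- data.frame(
  celltype  = c("B", "TNK", "TNK", "MPh", "B", "MPh", "MPh", "TNK", "MPh", "MPh"),
  sample    = c("s1", "s1", "s1", "s1", "s2", "s2", "s3", "s3", "s3", "s3"),
  condition = c(rep("healthy", 6), rep("disease", 4)),
  stringsAsFactors = FALSE
)

# Number of cells per condition, to see how unbalanced the groups are.
cells_per_condition <- table(cells$condition)
print(cells_per_condition)

# Cells per sample and cell type, then the fraction of each cell type
# within each sample (rows sum to 1).
counts <- table(sample = cells$sample, celltype = cells$celltype)
fractions <- prop.table(counts, margin = 1)
print(round(fractions, 2))
```

Because the labels are assigned per cell, this table can be built directly from the classifier output without any clustering step.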
In Supplemental Figure 8 of the pre-print, we classified single cell data for a model organism (cynomolgus monkey), for which flow-sorted datasets were generally lacking, without any additional species-specific training. Instead, we mapped homologous genes from the Macaca fascicularis genome to the human genome in the single cell data, and then performed cell type classification with Signac. We demonstrate how we mapped the gene symbols here.

Note:

* This code can be used to identify homologous genes between any two species.
* Monkey data used in Supplemental Figure 8 are available for interactive exploration in the table listed above.
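
As a rough illustration of the mapping step, the sketch below renames the rows of a toy expression matrix from monkey to human gene symbols using a two-column lookup table. It is written in base R under stated assumptions and is not the code used in the pre-print: the `homologs` table, its column names, the gene identifiers and the helper `map_to_human()` are all hypothetical. Summing counts when several genes map to the same human symbol is one possible choice, not necessarily the authors'.

```r
# Toy genes-by-cells count matrix with hypothetical monkey gene identifiers.
E_monkey <- matrix(
  c(5, 0, 2,
    1, 3, 0,
    0, 4, 4,
    7, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("mfa_CD3E", "mfa_CD19", "mfa_NOVEL1", "mfa_CD3E_b"),
                  c("cell1", "cell2", "cell3"))
)

# Hypothetical homolog table: one row per monkey gene with a human symbol.
homologs <- data.frame(
  monkey = c("mfa_CD3E", "mfa_CD19", "mfa_CD3E_b"),
  human  = c("CD3E", "CD19", "CD3E"),
  stringsAsFactors = FALSE
)

# Rename rows to human symbols, drop genes without a homolog, and sum the
# counts when several monkey genes map to the same human symbol.
map_to_human <- function(E, homologs) {
  idx <- match(rownames(E), homologs$monkey)
  keep <- !is.na(idx)
  human <- homologs$human[idx[keep]]
  rowsum(E[keep, , drop = FALSE], group = human)
}

E_human <- map_to_human(E_monkey, homologs)
print(E_human)
```

The resulting matrix has human gene symbols as row names and the original cells as columns, so it can be passed on to classification in place of the original matrix.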
In Figure 6 of the pre-print, we compiled data from three sources (CellPhoneDB, GWAS catalog and Fang et al. 2020) to find genes of immunological / pharmacological interest. These genes and their annotations can be accessed from within Signac:
```r
# load the library
library(SignacX)

# See ?Genes_Of_Interest
data("Genes_Of_Interest")
```

In Figure 4 of the pre-print, we demonstrated that Signac mapped cell type labels from one single cell data set to another, learning CD56bright NK cells from CITE-seq data. Here, we provide a vignette for reproducing this analysis, which can be used to map cell populations (or clusters of cells) from one data set to another. We also provide interactive access to the single cell data that were annotated with the CD56bright NK cell-model (Note: the CD56bright NK cells appear in the “CellStates” annotation layer as red cells).

| Links | Tissue | Disease | Number of cells | Number of samples | Source | Signac version |
|---|---|---|---|---|---|---|
| Explore | Kidney | Cancer | 48,037 | 47 | Stewart et al. 2019 | v2.0.7 + CD56bright NK |
| Explore | Kidney and urine | Lupus nephritis and healthy | 5,886 | 39 | Arazi et al. 2019 | v2.0.7 + CD56bright NK |
| Explore | Lung | Cancer | 42,844 | 18 | Zilionis et al. 2020 | v2.0.7 + CD56bright NK |
| Explore | Lung | Fibrosis | 96,461 | 31 | Habermann et al. 2020 | v2.0.7 + CD56bright NK |
| Explore | Lung | Fibrosis | 109,421 | 16 | Reyfman et al. 2019 | v2.0.7 + CD56bright NK |
| Explore | Monkey PBMCs | Healthy | 5,491 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK |
| Explore | Monkey PBMCs | Healthy | 5,220 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK |
| Explore | Monkey T cells | Healthy | 5,496 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK |
| Explore | PBMCs | Cancer | 14,048 | 8 | Zilionis et al. 2020 | v2.0.7 + CD56bright NK |
| Explore | PBMCs | Healthy | 4,784 | 1 | 10X Genomics | v2.0.7 + CD56bright NK |
| Explore | Skin | Atopic dermatitis | 36,690 | 17 | He et al. 2020 | v2.0.7 + CD56bright NK |
| Explore | Synovium | Rheumatoid arthritis and osteoarthritis | 8,920 | 26 | Zhang et al. 2019 | v2.0.7 + CD56bright NK |

Sometimes we don’t have time to run Signac and need a faster solution. Although Signac scales fine with large data sets (>300,000 cells) and typically takes less than an hour even for large data, we developed SignacFast to quickly classify single cell data:

```r
# load the library
library(SignacX)

# generate labels with pre-trained model
labels_fast <- SignacFast(E = your_data_here, num.cores = 4)
celltypes_fast <- GenerateLabels(labels_fast, E = your_data_here)
```

Unlike Signac, SignacFast uses a pre-trained ensemble of neural network models generated from the HPCA reference data, speeding up classification roughly 5-10 fold. These models were generated from the HPCA training data like so:
```r
# load the library
library(SignacX)

# load the HPCA training data
training_HPCA <- GetTrainingData_HPCA()

# generate models
Models_HPCA <- ModelGenerator(R = training_HPCA, N = 100, num.cores = 4)
```

The “Models_HPCA” are accessed from within the R package:

```r
# load the library
library(SignacX)

# load pre-trained neural network ensemble model
Models <- GetModels_HPCA()
```

We demonstrate how to use SignacFast in this vignette, which shows that the results are broadly consistent with running Signac.

Note:

* If the concern is only major cell types (i.e., TNK and MPh cells), then SignacFast is a fine alternative to Signac.
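
One simple way to check how closely two sets of labels agree, for example those from Signac and SignacFast on the same cells, is to cross-tabulate them. The sketch below uses base R only. The label vectors are made-up toy values rather than package output, and `label_agreement()` is a hypothetical helper, not a SignacX function.

```r
# Toy per-cell labels standing in for the output of two classification runs
# (hypothetical values for illustration, not real Signac results).
labels_full <- c("TNK", "TNK", "MPh", "B", "MPh", "TNK", "B", "MPh")
labels_fast <- c("TNK", "TNK", "MPh", "B", "TNK", "TNK", "B", "MPh")

# Fraction of cells with identical labels, plus a confusion table showing
# where the two runs disagree.
label_agreement <- function(a, b) {
  stopifnot(length(a) == length(b))
  list(
    fraction_identical = mean(a == b),
    confusion = table(full = a, fast = b)
  )
}

res <- label_agreement(labels_full, labels_fast)
print(res$fraction_identical)
print(res$confusion)
```

Off-diagonal entries of the confusion table point to the cell types where the two methods differ.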

In Figures 2-3 of the pre-print, we validated Signac with CITE-seq PBMCs. Here, we reproduced that analysis with SPRING (in this vignette, as was performed in the pre-print) and additionally with Seurat (in this vignette), and provide interactive access to the data here.

In Figure 3 of the pre-print, we validated Signac with flow cytometry and compared Signac to SingleR. We reproduced that analysis using Seurat in this vignette, and provide interactive access to the data here.

In Table 1 of the pre-print, we benchmarked Signac across seven different technologies: CEL-seq, Drop-Seq, inDrop, 10X (v2), 10X (v3), Seq-Well and Smart-Seq2; this analysis was reproduced here.

See the open issues for a list of proposed features (and known issues).

Any contributions you make are greatly appreciated.

```sh
git checkout -b feature/AmazingFeature
git commit -m 'Add some AmazingFeature'
git push origin feature/AmazingFeature
```

You can also open a pull request to commit to the master branch.

Distributed under the GPL v3.0 License. See LICENSE for more information.
Mathew Chamberlain - chamberlainphd@gmail.com

Project Link: https://github.com/mathewchamberlain/SignacX