🔎 R package for detecting damaged cells in single-cell RNA sequencing data
AlicenJoyHenning/DamageDetective
Description | Installation | Quick start | Contribute | Authors | License | References
Jump to the DamageDetective website
Damaged cells are an artifact of single-cell RNA sequencing (scRNA-seq) formed when cells succumb to stress before being sequenced. As a result, the gene expression data captured does not reflect biologically viable cells and introduces technical variability that is indistinguishable from functionally relevant variability. Filtering these cells is a standard task in scRNA-seq quality control (QC), though it lacks standardisation in practice.
The majority of approaches filter damaged cells according to deviations in cell-level QC metrics. This outlier-based detection implicitly assumes viable cells follow unimodal distributions across QC metrics, where deviation is synonymous with damage. This assumption falters in the context of heterogeneous data and risks introducing filtering bias related to cell type abundance. Recent methods address this by defining damage within distinct distributions, representing cell populations, independently. This, however, assumes all distinct distributions are associated with viable cell populations and risks leaving abundant damage undetected and ultimately misclassified.
DamageDetective takes a different approach. Rather than detecting damage by measuring the extent to which cells deviate from one another, it measures the extent to which cells deviate from artificially damaged profiles of themselves, created by simulating cytoplasmic RNA escape, a characteristic of damage resulting from the loss of plasma membrane integrity. This is inspired by the approach of DoubletFinder, a high-performing tool for detecting doublets, another prominent scRNA-seq artifact.
Like DoubletFinder, DamageDetective uses principal component analysis to compute the proximity of true cells to artificial cells. This is calculated as the proportion (pANN) of a cell's nearest neighbours that are of artificial origin, reflecting the likelihood that the cell has experienced the same cytoplasmic RNA loss as its artificial neighbours, i.e., is damaged. This score, ranging from 0 to 1, provides an intuitive scale for filtering that is standardised across cell types, sample origin, and experimental design.
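The proportion described above can be illustrated with a small base R sketch. This is not the package's implementation: the toy data, the number of principal components (5), and the neighbourhood size `k` are all assumptions chosen for illustration.

```r
# Toy sketch of a pANN-style score (illustrative only, not DamageDetective's code).
set.seed(1)

# Hypothetical data: 40 "true" cells and 10 "artificial" cells, 20 features each.
true_cells       <- matrix(rnorm(40 * 20), nrow = 40)
artificial_cells <- matrix(rnorm(10 * 20, mean = 3), nrow = 10)
combined         <- rbind(true_cells, artificial_cells)
is_artificial    <- c(rep(FALSE, 40), rep(TRUE, 10))

# Reduce dimensionality with PCA and keep the first 5 components (assumed value).
pcs <- prcomp(combined, center = TRUE, scale. = FALSE)$x[, 1:5]

# Pairwise Euclidean distances in PC space; exclude each cell's distance to itself.
distances <- as.matrix(dist(pcs))
diag(distances) <- Inf

# For each cell, the proportion of its k nearest neighbours that are artificial.
k <- 5  # assumed neighbourhood size
pann <- apply(distances, 1, function(d) {
  neighbours <- order(d)[seq_len(k)]
  mean(is_artificial[neighbours])
})

# Scores for the true cells only, each between 0 and 1.
true_scores <- pann[!is_artificial]
summary(true_scores)
```

A true cell surrounded mostly by artificial cells scores close to 1, while a cell with no artificial neighbours scores 0.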
Install DamageDetective from CRAN (R >= 4.4.0),

```r
install.packages('DamageDetective')
```

Or the latest development version on GitHub (R >= 3.5.0),

```r
library(devtools)
devtools::install_github("AlicenJoyHenning/DamageDetective", build_vignettes = TRUE)
```

To verify installation, run the following to see if you can view the package vignette and the function help pages,

```r
library(DamageDetective)
help(package = "DamageDetective")
```
This demonstration can be followed immediately after loading the package using the internal dummy dataset. For examples with true data and more detailed explanations, please refer to the articles on the package website.
Damage detection is carried out by the `detect_damage` function that accepts count matrices, `Seurat` or `SingleCellExperiment` objects, or alignment files (package tutorials) as input. We will demonstrate using a dummy count matrix, `test_counts`, a subset of the (kotliarov-pbmc-2020) PBMC dataset provided in the `scRNAseq` package.
```r
library(DamageDetective)
library(Matrix)

data("test_counts", package = "DamageDetective")
dim(test_counts)
```

Expected outcome,

```
[1] 32738   500
```
While `detect_damage` requires only a count matrix as input, additional parameters control aspects of the function's computations. Of these, we recommend `ribosome_penalty` be adjusted for each dataset using the `select_penalty` function as shown below,
```r
penalty <- select_penalty(count_matrix = test_counts)
penalty
```

Expected outcome,

```
Testing penalty of 0.1...
Testing penalty of 0.15...
Testing penalty of 0.2...
Testing penalty of 0.25...
Stopping early: dTNN is no longer improving.
0.1
```
DamageDetective filters cells by comparing their proximity scores to a threshold. By default, DamageDetective uses a threshold of `0.5`, where values greater than `0.5` reflect more permissive filtering and values closer to `0` reflect more stringent filtering. We recommend the default, but suggest that any adjustments be informed by the plots that `detect_damage` outputs when `generate_plot = TRUE`.
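The effect of the threshold can be seen on a toy example. The scores below are invented for illustration and are not package output. The sketch assumes that cells scoring below the threshold are retained, which matches the manual filtering shown later in this guide.

```r
# Hypothetical damage scores for six cells (not real package output).
scores <- data.frame(
  Cells = paste0("cell_", 1:6),
  DamageDetective = c(0.05, 0.20, 0.45, 0.55, 0.80, 0.95)
)

# Count the cells retained (score below the threshold) at several thresholds.
thresholds <- c(0.3, 0.5, 0.7)
retained <- sapply(thresholds, function(t) sum(scores$DamageDetective < t))

data.frame(threshold = thresholds, cells_retained = retained)
```

Here a threshold of 0.3 retains two cells, 0.5 retains three, and 0.7 retains four, showing how lower thresholds filter more stringently.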
Damage detection is run as shown below, using the count matrix and ribosomal penalty as inputs. Below, we have additionally set the `filter_counts` parameter to `TRUE`. This will use the default `filter_threshold` to detect damaged cells for removal and return the filtered count matrix that can be used immediately afterwards for the remainder of pre-processing. Though implemented in R, DamageDetective provides output that is platform-agnostic and can be integrated into any existing single-cell analysis workflow.
```r
# Perform damage detection
detection_results <- detect_damage(
  count_matrix = test_counts,
  ribosome_penalty = penalty,
  filter_counts = TRUE
)

# View the resulting count matrix
dim(detection_results$output)
```

Expected outcome,

```
Clustering cells...
Simulating damage...
Computing pANN...
32738 461
```
Alternatively, if `filter_counts` is set to `FALSE`, a data frame containing the damage score for each cell is returned. This allows users to interact with the DamageDetective results directly. From here, a user can filter their data manually, as is done automatically by `filter_counts = TRUE`.
```r
# Perform damage detection
detection_results <- detect_damage(
  count_matrix = test_counts,
  ribosome_penalty = penalty,
  filter_counts = FALSE,
  seed = 7
)

# View output
print(head(detection_results$output), row.names = FALSE)

# Filter matrix
undamaged_cells <- subset(detection_results$output, DamageDetective < 0.7)
filtered_matrix <- test_counts[, undamaged_cells$Cells]
dim(filtered_matrix)
```

Expected outcome,

```
Clustering cells...
Simulating damage...
Computing pANN...
                   Cells DamageDetective
 TCTGGAAAGCCCAACC_H1B2ln6               0
 CCGTTCATCGTGGGAA_H1B2ln2               0
 CTTCTCTTCAGCCTAA_H1B2ln1               0
 GGATTACAGGGATGGG_H1B2ln1               0
 TCTATTGTCTGGTATG_H1B2ln2               0
 ACGGGTCAGACAAGCC_H1B2ln6               0
32738 461
```
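Because the scores are returned as a plain data frame, the retained barcodes can be passed to tools outside R. The sketch below uses a hypothetical score table with the same two columns (`Cells`, `DamageDetective`) and an illustrative threshold of 0.5. It is one possible way to export the result, not a feature of the package.

```r
# Hypothetical score table shaped like the output when filter_counts = FALSE.
scores <- data.frame(
  Cells = c("cell_1", "cell_2", "cell_3"),
  DamageDetective = c(0.10, 0.90, 0.30)
)

# Keep the barcodes scoring below an illustrative threshold of 0.5.
kept <- scores$Cells[scores$DamageDetective < 0.5]

# Write one barcode per line to a temporary file, then read it back to check.
barcode_file <- tempfile(fileext = ".txt")
writeLines(kept, barcode_file)
readLines(barcode_file)
```

A plain text file of barcodes like this can be read by any downstream workflow to subset its own copy of the data.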
We are committed to the improvement of DamageDetective and encourage users to report any bugs or difficulties they encounter. Contributions that refine or challenge the assumptions and heuristics used to detect damaged cells are also welcome. Please reach out via the maintainer's email listed in the `DESCRIPTION` file or start a public discussion.
DamageDetective is made available for public use under the GNU AGPL-3.0 license.
Alicen Henning
Stellenbosch University, Cape Town, South Africa
Bioinformatics and Computational Biology
This work was done under the supervision of Prof Marlo Möller, Prof Gian van der Spuy, and Prof André Loxton.
McGinnis, C. S., Murrow, L. M., & Gartner, Z. J. (2019). DoubletFinder: Doublet Detection in Single-Cell RNA Sequencing Data Using Artificial Nearest Neighbors. Cell Systems, 8(4), 329-337.e4. https://doi.org/10.1016/j.cels.2019.03.003
Risso, D., & Cole, M. (2024). scRNAseq: Collection of Public Single-Cell RNA-Seq Datasets. R package version 2.20.0. https://doi.org/10.18129/B9.bioc.scRNAseq, https://bioconductor.org/packages/scRNAseq