Accelerating Single Cell Genomic Analysis using RAPIDS

Sep 02, 2020

AI-Generated Summary

  • Single-cell genomics is a new field that allows scientists to explore the genetic material of individual cells, identifying new cell types and understanding how they respond to disease or drugs.
  • RAPIDS, a suite of open-source libraries, accelerates data science workflows using GPU acceleration, enabling interactive data analysis on large datasets with Python APIs similar to NumPy, Pandas, and scikit-learn.
  • Using RAPIDS, researchers can analyze large single-cell datasets, such as 1 million mouse brain cells, in just 11 minutes on a single NVIDIA GPU, making it possible to perform interactive exploratory data analysis and leading to faster scientific discoveries.

AI-generated content may summarize information incompletely. Verify important information.

The human body is made up of nearly 40 trillion cells, of many different types. Recent advances in experimental biology have made it possible to explore the genetic material of single cells. With the birth of this new field of single-cell genomics, scientists can now probe the DNA and RNA of individual cells in the human body.

Single-cell genomic analysis has identified new types of cells in the human body, discovered what makes these cells different from each other, and revealed how different types of cells respond to disease or drugs. Single-cell genomics has also proven key in the current COVID-19 pandemic, identifying cells susceptible to infection and revealing changes in the immune systems of infected patients.

Schematic showing a matrix of gene activity across single cells, which is analyzed to produce a 2-D visualization showing clusters of similar cells.
Figure 1. Workflow for a single-cell RNA sequencing experiment. Individual cells are isolated and gene activity is measured in each cell. Cells with similar gene activity are clustered together to identify the various types of cells in the population.

Single-cell data is becoming steadily more available and dataset sizes are growing, with recent experiments sequencing millions of cells. Analysis of this data is often exploratory and benefits from being interactive: researchers want to identify different types of cells at finer scales, compare the cell types, and visualize the relationships between them. Current workflows are too slow for the interactive analysis that this research needs.

RAPIDS: Accelerating data science with GPUs

RAPIDS is a suite of open-source libraries that can speed up end-to-end data science workflows through the power of GPU acceleration. RAPIDS makes it possible to perform interactive data analysis on large datasets using Python APIs that closely resemble NumPy, Pandas, and scikit-learn.

Consider a typical workflow for single-cell analysis. It begins with a matrix that holds the count of each gene encountered in each cell. Preprocessing steps filter out noise, then the data is normalized to obtain the activity of every gene in every individual cell of the dataset. Machine learning is also commonly used in this step to correct artifacts from data collection. Next, you perform dimensionality reduction before clustering and visualization to identify clusters of cells with similar genetic activity. Finally, you compare the genetic activity of these cell clusters to understand why different types of cells behave and respond differently.
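As a rough illustration of the filtering and normalization steps described above, the following sketch uses plain Python on a tiny made-up count matrix. The threshold `min_counts`, the scaling target `target_sum`, and both function names are assumptions chosen for illustration. The notebook in the repo runs these steps on the GPU and may use different parameters or methods.

```python
import math

def filter_cells(counts, min_counts):
    """Keep only cells (rows) whose total count reaches min_counts."""
    return [row for row in counts if sum(row) >= min_counts]

def normalize_log(counts, target_sum=10_000):
    """Scale each cell to target_sum total counts, then apply log(1 + x)."""
    normalized = []
    for row in counts:
        total = sum(row)
        if total == 0:
            # An empty cell has no activity to scale; keep it as zeros.
            normalized.append([0.0] * len(row))
            continue
        normalized.append([math.log1p(x * target_sum / total) for x in row])
    return normalized

# Toy matrix: 3 cells x 4 genes (values invented for illustration).
raw = [
    [0, 3, 1, 6],
    [0, 0, 1, 0],   # very few counts, treated here as noise
    [5, 0, 5, 10],
]
kept = filter_cells(raw, min_counts=5)
activity = normalize_log(kept)
print(len(kept), "cells kept")
```

Scaling every cell to the same total before the log transform is one common way to make cells with different sequencing depth comparable. It is shown here only to make the idea of "normalized gene activity" concrete.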

Pipeline showing the process of RNA-seq data analysis and RAPIDS libraries that were used to accelerate each step.
Figure 2: Pipeline showing the steps in analysis of single-cell RNA sequencing data. Starting from a matrix of gene activity in every cell, RAPIDS libraries can be used to perform data processing, dimensionality reduction, clustering, and visualization, and to discover differential genes with different activity across clusters.

We released a GPU-accelerated version of this exact workflow in the clara-parabricks/rapids-single-cell-examples GitHub repo. The repo contains an example notebook that uses RAPIDS and Scanpy to analyze a dataset of 70,000 human lung cells, to identify cells that are susceptible to COVID-19. Scanpy is a toolkit for analyzing single-cell gene expression data, with options to accelerate specific commands using RAPIDS. We also have a CPU version of this notebook in the repo for comparison.

For example, running UMAP to visualize almost 70,000 cells with RAPIDS requires the following command:

sc.tl.umap(adata, min_dist=umap_min_dist, spread=umap_spread, method='rapids')
UMAP visualization showing approximately 70,000 cells grouped into 35 clusters.
Figure 3. UMAP visualization of approximately 70,000 cells from human lung samples, created by RAPIDS. Cells are labeled by Louvain clustering.

Generating this UMAP visualization takes one second using RAPIDS, compared to 80 seconds on a CPU. In fact, RAPIDS can accelerate the entire single-cell analysis workflow, making it possible to do interactive exploratory data analysis even on large datasets.
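The `method='rapids'` argument in the command above only works when the RAPIDS libraries are installed. The sketch below shows one possible way to fall back to the CPU implementation when they are not. The helper name `choose_umap_method` is hypothetical, and the check assumes that the GPU backend depends on the `cuml` package being importable. The values accepted by `method` may also differ between Scanpy versions, so treat this as an assumption to verify against your installed release.

```python
import importlib.util

def choose_umap_method():
    """Return 'rapids' if cuML appears importable, else 'umap' (the CPU path)."""
    has_cuml = importlib.util.find_spec("cuml") is not None
    return "rapids" if has_cuml else "umap"

method = choose_umap_method()
print(f"UMAP backend: {method}")

# Hypothetical usage, given an AnnData object `adata` with neighbors computed
# and the `umap_min_dist` / `umap_spread` variables from the command above:
# import scanpy as sc
# sc.tl.umap(adata, min_dist=umap_min_dist, spread=umap_spread, method=method)
```

This keeps a single notebook usable on both CPU-only and GPU machines, at the cost of hiding which backend actually ran unless you print it.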

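| Instance                     | m5a.12xlarge                       | p3.2xlarge  | Acceleration Factor |
|------------------------------|------------------------------------|-------------|---------------------|
| CPU/GPU type                 | Intel Xeon Platinum 8000, 48 vCPUs | V100-16GB   |                     |
| Preprocessing                | 311                                | 84          | 4                   |
| PCA                          | 18                                 | 3.4         | 5                   |
| t-SNE                        | 208                                | 2.2         | 95                  |
| k-Means clustering           | 31                                 | 0.4         | 78                  |
| KNN                          | 25                                 | 6.1         | 4                   |
| UMAP                         | 80                                 | 1           | 80                  |
| Louvain clustering           | 17                                 | 0.3         | 57                  |
| Differential Gene Expression | 54                                 | 10.8        | 5                   |
| End-to-end                   | 787 (13 Min)                       | 134 (2 Min) | 6                   |
| Instance Price/hr ($)        | 2.064                              | 3.06        |                     |
| Total Run Cost ($)           | 0.451                              | 0.114       | 4                   |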
Table 1: CPU runtime, GPU runtime, and GPU acceleration for each step in the analysis of approximately 70,000 human lung cells. All times are in seconds.
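The acceleration factor in Table 1 is the CPU runtime divided by the GPU runtime for the same step. The short sketch below recomputes it from the published runtimes. The dictionary layout and the `speedup` function name are illustrative choices, and small differences from the table can arise because the published runtimes are rounded.

```python
# (CPU seconds, GPU seconds) per step, copied from Table 1.
table1 = {
    "Preprocessing": (311, 84),
    "PCA": (18, 3.4),
    "t-SNE": (208, 2.2),
    "k-Means clustering": (31, 0.4),
    "KNN": (25, 6.1),
    "UMAP": (80, 1),
    "Louvain clustering": (17, 0.3),
    "Differential Gene Expression": (54, 10.8),
    "End-to-end": (787, 134),
}

def speedup(cpu_seconds, gpu_seconds):
    """Acceleration factor: how many times faster the GPU run is."""
    return cpu_seconds / gpu_seconds

for step, (cpu, gpu) in table1.items():
    print(f"{step:30s} {speedup(cpu, gpu):6.1f}x")
```

The per-step factors vary widely. The end-to-end factor is much lower than the best individual steps because preprocessing accounts for most of the remaining GPU time.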

Analyzing one million cells in 11 minutes

Next, we applied our RAPIDS analysis workflow to one of the largest single-cell datasets available: one million mouse brain cells sequenced by 10X Genomics. For more information, see the 1M_brain_gpu_analysis_uvm.ipynb Jupyter notebook.

With this scale of data, analysis on CPUs becomes impractically slow; our end-to-end workflow took over three hours to run on an AWS M5a CPU instance. This makes interactive analysis virtually impossible. On the other hand, we observed even higher GPU acceleration on this larger dataset and were able to analyze the entire dataset in just over 11 minutes on a single GPU. Running the RAPIDS analysis on AWS was also 3x cheaper than the CPU version!

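| AWS Instance          | m5a.12xlarge                       | p3.8xlarge | Acceleration Factor |
|-----------------------|------------------------------------|------------|---------------------|
| CPU/GPU type          | Intel Xeon Platinum 8000, 48 vCPUs | V100-16GB  |                     |
| Preprocessing         | 4033                               | 323        | 12.5                |
| PCA                   | 34                                 | 20.6       | 1.7                 |
| t-SNE                 | 5417                               | 41         | 132.1               |
| k-Means clustering    | 106                                | 2.1        | 50.5                |
| KNN                   | 585                                | 53.4       | 11.0                |
| UMAP                  | 1751                               | 20.3       | 86.3                |
| Louvain clustering    | 597                                | 2.5        | 238.8               |
| End-to-end            | 13002                              | 672.7      | 19.3                |
| Instance Price/hr ($) | 2.064                              | 12.24      |                     |
| Total Run Cost ($)    | 7.455                              | 2.287      | 3.3                 |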
Table 2. CPU runtime, GPU runtime, and GPU acceleration for each step in the analysis of 1 million mouse brain cells. All times are in seconds.
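The cost rows in Table 2 follow from the end-to-end runtime and the hourly instance price. The sketch below reproduces that arithmetic. It assumes cost is billed in proportion to runtime with no minimum charge, which is a simplification, and the function name `run_cost` is an illustrative choice.

```python
def run_cost(runtime_seconds, price_per_hour):
    """Cost of one run, assuming billing proportional to runtime."""
    return runtime_seconds / 3600 * price_per_hour

# End-to-end runtimes and hourly prices copied from Table 2.
cpu_cost = run_cost(13002, 2.064)   # m5a.12xlarge
gpu_cost = run_cost(672.7, 12.24)   # p3.8xlarge

print(f"CPU run: ${cpu_cost:.3f}")
print(f"GPU run: ${gpu_cost:.3f}")
print(f"CPU/GPU cost ratio: {cpu_cost / gpu_cost:.1f}")
```

The GPU instance costs roughly six times more per hour, but the run finishes about 19 times faster, so the total comes out about three times cheaper.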

A GPU-powered cell browser for interactive single-cell analysis

As I mentioned earlier, the speed of data analysis with RAPIDS enables researchers to analyze data interactively in real time. We made this process even easier by developing a GPU-powered interactive cell browser that runs within a Jupyter notebook. Within this cell browser, you can visualize all the cells in a dataset and perform clustering analysis of your data through point-and-click methods. Using RAPIDS, these steps run in real time.

The following animation shows how you can select a group of cells and perform UMAP and Louvain clustering on it to identify subpopulations within that cell type.

Animated GIF showing a UMAP visualization of cells. A group of cells is selected using the mouse pointer and re-clustered using RAPIDS.
Figure 4. Point-and-click re-clustering of a selected group of cells in real time, by using RAPIDS in an interactive cell browser.
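The selection step in this kind of re-clustering can be pictured as picking the cells whose 2-D coordinates fall inside a region drawn on the plot. The sketch below uses a rectangular region and plain Python. It is a hypothetical illustration and not the cell browser's actual implementation, which this post does not describe. The function name `select_cells` and the toy coordinates are invented.

```python
def select_cells(embedding, x_range, y_range):
    """Return indices of cells whose 2-D coordinates fall inside the box."""
    (x_min, x_max), (y_min, y_max) = x_range, y_range
    return [
        i for i, (x, y) in enumerate(embedding)
        if x_min <= x <= x_max and y_min <= y <= y_max
    ]

# Toy UMAP coordinates for five cells (values invented for illustration).
embedding = [(0.1, 0.2), (2.5, 2.4), (2.7, 2.9), (-1.0, 3.0), (2.2, 2.6)]
selected = select_cells(embedding, x_range=(2.0, 3.0), y_range=(2.0, 3.0))
print(selected)  # indices of the cells inside the selection box
```

The selected indices would then be used to subset the dataset before rerunning UMAP and Louvain clustering on only those cells.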

Conclusion

In this post, you saw how RAPIDS accelerates single-cell genomic analysis on GPUs. With RAPIDS, you can explore the data interactively in real time, cluster cells at different scales, and re-analyze large datasets with different parameters. All of this enables faster scientific discoveries.

In addition to the APIs covered, RAPIDS has a large library of other algorithms that you might find useful in your work. For more information, see the clara-parabricks/rapids-single-cell-examples GitHub repo for this work as well as RAPIDS.

About the Authors

About Avantika Lal
Avantika Lal is a senior scientist on the NVIDIA genomics team. She develops tools that use GPUs and deep learning to accelerate and improve the analysis of human genomes. Prior to NVIDIA, she was a postdoctoral fellow in the departments of Genetics and Pathology at Stanford University.
