dpeerlab/seqc (Public)

forked from ambrosejcarr/seqc

Single-Cell Sequencing Quality Control and Processing Software

License

GPL-2.0 license

SEquence Quality Control (SEQC -- /sek-si:/)

Overview:

SEQC is a Python package that processes single-cell sequencing data in the cloud and analyzes it interactively on your local machine.

To facilitate easy installation and use, we have made available Amazon Machine Images (AMIs) that come with all of SEQC's dependencies pre-installed. In addition, we have uploaded common genome indices (-i/--index parameter) and barcode data (--barcode-files) to public Amazon S3 repositories. SEQC accepts these S3 links directly and fetches the files automatically before starting an analysis run. Finally, SEQC can fetch input data directly from BaseSpace or Amazon S3.

For users with access to in-house compute clusters, SEQC can be installed on your own systems and run with the --local parameter.

Dependencies:

Python 3

Python 3 must be installed on your local machine to run SEQC. We recommend installing Python 3 through Miniconda (https://docs.conda.io/en/latest/miniconda.html).

Python 3 Libraries

We recommend creating a virtual environment before installing anything:

conda create -n seqc python=3.7.7 pip
conda activate seqc

pip install Cython
pip install numpy
pip install bhtsne
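To confirm that the libraries above are visible inside the activated environment, a short check like the following can help. This is a minimal sketch, not part of SEQC; it assumes the importable module names match the pip package names (Cython, numpy, bhtsne), and the helper name missing_modules is hypothetical.

```python
import importlib.util

# Assumed module names for the pip packages installed above.
REQUIRED_MODULES = ["Cython", "numpy", "bhtsne"]


def missing_modules(names):
    """Return the module names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    else:
        print("All required Python modules were found.")
```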

STAR, Samtools, and HDF5

To process data locally using SEQC, you must install the STAR aligner, Samtools, and HDF5. If you only intend to use SEQC to trigger remote processing on AWS, these dependencies are optional. We recommend installing Samtools and HDF5 through your package manager, if possible.
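Before a local run, it can be worth checking that the aligner and Samtools are on your PATH. The sketch below assumes the executables are named STAR and samtools (their usual binary names); HDF5 is a library rather than a command, so it is not checked here. The helper name missing_executables is hypothetical.

```python
import shutil

# Assumed binary names for a typical STAR and Samtools installation.
# HDF5 is a shared library, so it is not looked up on PATH here.
REQUIRED_EXECUTABLES = ["STAR", "samtools"]


def missing_executables(names):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_executables(REQUIRED_EXECUTABLES)
    print("Missing executables:", ", ".join(missing) if missing else "none")
```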

SEQC Installation

Once all dependencies have been installed, SEQC can be installed by running:

export SEQC_VERSION="0.2.11"
wget https://github.com/hisplan/seqc/archive/v${SEQC_VERSION}.tar.gz
tar xvzf v${SEQC_VERSION}.tar.gz
cd seqc-${SEQC_VERSION}
pip install .
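After pip finishes, you can confirm which version landed in the environment. This sketch assumes the distribution is registered under the name "seqc" (an assumption about setup.py) and falls back to the importlib_metadata backport on Python 3.7, where importlib.metadata is not part of the standard library. The helper name installed_version is hypothetical.

```python
try:
    from importlib.metadata import PackageNotFoundError, version
except ImportError:  # Python 3.7: requires `pip install importlib_metadata`
    from importlib_metadata import PackageNotFoundError, version

EXPECTED_VERSION = "0.2.11"  # same value as SEQC_VERSION in the commands above


def installed_version(dist_name="seqc"):
    """Return the installed version of `dist_name`, or None if it is not installed.

    The distribution name "seqc" is an assumption, not confirmed by this README.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("SEQC does not appear to be installed in this environment.")
    elif found != EXPECTED_VERSION:
        print(f"Found SEQC {found}, expected {EXPECTED_VERSION}.")
    else:
        print(f"SEQC {found} is installed.")
```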

Hardware Requirements:

For processing a single lane (~200M reads) against human- or mouse-scale genomes, SEQC requires 30 GB of RAM and approximately 200 GB of free disk space, and its performance scales linearly with additional compute cores. If running on AWS (see below), jobs are automatically scaled up or down according to the size of the input. There are no hardware requirements for the computer used to launch remote instances.
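A quick pre-flight check against these single-lane figures might look like the sketch below. It is not part of SEQC: the thresholds come from the paragraph above, sizes are in decimal gigabytes, and os.sysconf is assumed to be available (it is on Linux and macOS, not on Windows). Function names are hypothetical.

```python
import os
import shutil

# Single-lane (~200M reads) requirements stated above, in decimal GB.
MIN_RAM_GB = 30
MIN_DISK_GB = 200


def shortfalls(ram_gb, free_disk_gb, min_ram_gb=MIN_RAM_GB, min_disk_gb=MIN_DISK_GB):
    """Return one message per requirement that is not met (empty list if all are met)."""
    problems = []
    if ram_gb < min_ram_gb:
        problems.append(f"RAM: {ram_gb:.1f} GB available, {min_ram_gb} GB required")
    if free_disk_gb < min_disk_gb:
        problems.append(f"Disk: {free_disk_gb:.1f} GB free, {min_disk_gb} GB required")
    return problems


def local_resources(path="."):
    """Return (total physical RAM in GB, free disk space in GB at `path`)."""
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    free_bytes = shutil.disk_usage(path).free
    return ram_bytes / 1e9, free_bytes / 1e9


if __name__ == "__main__":
    ram, disk = local_resources()
    for message in shortfalls(ram, disk) or ["This machine meets the single-lane requirements."]:
        print(message)
```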

Running SEQC on Local Machine:

Download an example dataset (1k PBMCs from a healthy donor; freely available from 10x Genomics at https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.0/pbmc_1k_v3):

wget https://cf.10xgenomics.com/samples/cell-exp/3.0.0/pbmc_1k_v3/pbmc_1k_v3_fastqs.tar
tar xvf pbmc_1k_v3_fastqs.tar

Move R1 FASTQ files to the barcode folder and R2 FASTQ files to the genomic folder:

mkdir barcode
mkdir genomic
mv ./pbmc_1k_v3_fastqs/*R1*.fastq.gz barcode/
mv ./pbmc_1k_v3_fastqs/*R2*.fastq.gz genomic/
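If you prefer to script this step, the sketch below does the same thing as the mkdir/mv commands with pathlib: R1 files go to the barcode folder, R2 files to the genomic folder, and anything else (such as the I1 index reads) stays where it is. The function name split_fastqs is hypothetical and not part of SEQC.

```python
import shutil
from pathlib import Path


def split_fastqs(source_dir, barcode_dir="barcode", genomic_dir="genomic"):
    """Move *R1*.fastq.gz into barcode_dir and *R2*.fastq.gz into genomic_dir.

    Returns (moved_r1_paths, moved_r2_paths). Other files are left in place.
    """
    source = Path(source_dir)
    moved = []
    for pattern, dest in (("*R1*.fastq.gz", barcode_dir), ("*R2*.fastq.gz", genomic_dir)):
        dest_path = Path(dest)
        dest_path.mkdir(parents=True, exist_ok=True)  # like `mkdir`, but idempotent
        moved.append([
            Path(shutil.move(str(fastq), str(dest_path / fastq.name)))
            for fastq in sorted(source.glob(pattern))
        ])
    return moved[0], moved[1]
```

For the example dataset, split_fastqs("pbmc_1k_v3_fastqs") reproduces the layout created by the shell commands.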

Download the 10x barcode whitelist file:

mkdir whitelist
wget https://seqc-public.s3.amazonaws.com/barcodes/ten_x_v3/flat/3M-february-2018.txt
mv 3M-february-2018.txt ./whitelist/
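A truncated or corrupted download is easy to miss, so a light sanity check of the whitelist can be useful. This sketch assumes the file holds one barcode per line and that 10x v3 cell barcodes are 16 nucleotides long; it is not part of SEQC, and the function name check_whitelist is hypothetical.

```python
from pathlib import Path

VALID_BASES = set("ACGT")


def check_whitelist(path, barcode_length=16):
    """Count barcodes in a one-per-line whitelist and collect malformed entries.

    barcode_length=16 assumes 10x v3 cell barcodes; adjust for other chemistries.
    Returns (number_of_barcodes, list_of_malformed_barcodes).
    """
    count = 0
    bad = []
    with Path(path).open() as handle:
        for line in handle:
            barcode = line.strip()
            if not barcode:
                continue  # ignore blank lines
            count += 1
            if len(barcode) != barcode_length or not set(barcode) <= VALID_BASES:
                bad.append(barcode)
    return count, bad
```

For example, check_whitelist("whitelist/3M-february-2018.txt") should report an empty list of malformed entries if the download is intact.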

The resulting directory structure should look something like this:

.
├── barcode
│   ├── pbmc_1k_v3_S1_L001_R1_001.fastq.gz
│   └── pbmc_1k_v3_S1_L002_R1_001.fastq.gz
├── genomic
│   ├── pbmc_1k_v3_S1_L001_R2_001.fastq.gz
│   └── pbmc_1k_v3_S1_L002_R2_001.fastq.gz
├── pbmc_1k_v3_fastqs
│   ├── pbmc_1k_v3_S1_L001_I1_001.fastq.gz
│   └── pbmc_1k_v3_S1_L002_I1_001.fastq.gz
├── pbmc_1k_v3_fastqs.tar
└── whitelist
    └── 3M-february-2018.txt
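Since barcode (R1) and genomic (R2) reads live in separate folders, it is easy to end up with a lane that has only one mate. The sketch below flags such files. It assumes Illumina-style names in which mates differ only by "_R1_" versus "_R2_", as in the tree above; the function name unpaired_reads is hypothetical.

```python
from pathlib import Path


def unpaired_reads(barcode_dir="barcode", genomic_dir="genomic"):
    """Return (R1 files without an R2 mate, R2 files without an R1 mate).

    Assumes mates differ only by "_R1_" vs "_R2_" in the file name.
    """
    r1_names = sorted(p.name for p in Path(barcode_dir).glob("*_R1_*.fastq.gz"))
    r2_names = sorted(p.name for p in Path(genomic_dir).glob("*_R2_*.fastq.gz"))
    r1_set, r2_set = set(r1_names), set(r2_names)
    r1_without_mate = [n for n in r1_names if n.replace("_R1_", "_R2_") not in r2_set]
    r2_without_mate = [n for n in r2_names if n.replace("_R2_", "_R1_") not in r1_set]
    return r1_without_mate, r2_without_mate
```

Two empty lists mean every lane in the example has both its R1 and R2 file.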

Create a reference package (STAR index + gene annotation):

SEQC index \
  --organism homo_sapiens \
  --ensemble-release 93 \
  --valid-biotypes protein_coding lincRNA antisense IG_V_gene IG_D_gene IG_J_gene IG_C_gene TR_V_gene TR_D_gene TR_J_gene TR_C_gene \
  --read-length 101 \
  --folder index \
  --local
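The --valid-biotypes list presumably restricts the annotation to genes of the listed Ensembl biotypes. To illustrate the idea only, the sketch below filters Ensembl GTF records on their gene_biotype attribute. It is not SEQC's implementation, and how SEQC applies the filter internally is not described in this README; the function names are hypothetical.

```python
# Illustration only: not SEQC's own code.
VALID_BIOTYPES = {
    "protein_coding", "lincRNA", "antisense",
    "IG_V_gene", "IG_D_gene", "IG_J_gene", "IG_C_gene",
    "TR_V_gene", "TR_D_gene", "TR_J_gene", "TR_C_gene",
}


def parse_attributes(field):
    """Parse a GTF attribute column such as 'gene_id "X"; gene_biotype "Y";'."""
    attributes = {}
    for item in field.strip().split(";"):
        item = item.strip()
        if not item:
            continue
        key, _, value = item.partition(" ")
        attributes[key] = value.strip('"')
    return attributes


def filter_gtf_lines(lines, valid_biotypes=VALID_BIOTYPES):
    """Yield GTF records whose gene_biotype is in valid_biotypes; comment lines are dropped."""
    for line in lines:
        if line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # skip malformed records
        if parse_attributes(columns[8]).get("gene_biotype") in valid_biotypes:
            yield line
```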

Run SEQC:

export AWS_DEFAULT_REGION=us-east-1
export SEQC_MAX_WORKERS=7
SEQC run ten_x_v3 \
  --index ./index/ \
  --barcode-files ./whitelist/ \
  --barcode-fastq ./barcode/ \
  --genomic-fastq ./genomic/ \
  --output-prefix PBMC \
  --no-filter-low-coverage \
  --min-poly-t 0 \
  --star-args runRNGseed=0 \
  --local
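If you drive SEQC from a Python script or notebook, the same local run can be expressed as an argument list plus environment variables, which avoids shell-quoting issues. This is a sketch: the flags and values are copied from the command above, while the worker heuristic (one fewer than the number of cores) and all function names are hypothetical; the example above simply sets SEQC_MAX_WORKERS=7.

```python
import os


def default_workers(cpu_count=None):
    """Hypothetical heuristic: use all cores but one, and at least one worker."""
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    return max(1, cpus - 1)


def local_run_env(workers=None):
    """Copy of the current environment with the variables exported above."""
    return dict(
        os.environ,
        AWS_DEFAULT_REGION="us-east-1",
        SEQC_MAX_WORKERS=str(workers if workers is not None else default_workers()),
    )


def local_run_command(index="./index/", whitelist="./whitelist/",
                      barcode_fastq="./barcode/", genomic_fastq="./genomic/",
                      output_prefix="PBMC"):
    """Argument list for the local ten_x_v3 run shown above."""
    return [
        "SEQC", "run", "ten_x_v3",
        "--index", index,
        "--barcode-files", whitelist,
        "--barcode-fastq", barcode_fastq,
        "--genomic-fastq", genomic_fastq,
        "--output-prefix", output_prefix,
        "--no-filter-low-coverage",
        "--min-poly-t", "0",
        "--star-args", "runRNGseed=0",
        "--local",
    ]
```

Passing these to the standard library, as in subprocess.run(local_run_command(), env=local_run_env(), check=True), would launch the run and raise an error if SEQC exits with a non-zero status.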

Running SEQC on Amazon Web Services:

SEQC can be run on any Unix-based operating system; it can also automatically spawn Amazon Web Services (AWS) instances to process your data.

Run SEQC:

SEQC run ten_x_v2 \
  --ami-id ami-08652ee2477761403 \
  --user-tags Job:Test,Project:PBMC-Test,Sample:pbmc_1k_v3 \
  --index s3://seqc-public/genomes/hg38_long_polya/ \
  --barcode-files s3://seqc-public/barcodes/ten_x_v2/flat/ \
  --genomic-fastq s3://.../genomic/ \
  --barcode-fastq s3://.../barcode/ \
  --upload-prefix s3://.../seqc-results/ \
  --output-prefix PBMC \
  --no-filter-low-coverage \
  --min-poly-t 0 \
  --star-args runRNGseed=0
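The --user-tags value above is a comma-separated list of Key:Value pairs. When tags come from a script, a small helper can build that string and reject components that would break the format. This is a sketch; it assumes SEQC splits the value on "," and ":", which the README does not state explicitly, and the helper name format_user_tags is hypothetical.

```python
def format_user_tags(tags):
    """Join a mapping into the Key:Value,Key:Value form used by --user-tags above.

    Raises ValueError if a key or value contains ':' or ',', since those
    characters are assumed to act as separators.
    """
    for key, value in tags.items():
        for part in (str(key), str(value)):
            if ":" in part or "," in part:
                raise ValueError(f"tag component {part!r} contains ':' or ','")
    return ",".join(f"{key}:{value}" for key, value in tags.items())
```

For example, format_user_tags({"Job": "Test", "Project": "PBMC-Test", "Sample": "pbmc_1k_v3"}) produces the exact tag string used in the command above.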

About


Releases: 10 (latest: v0.2.11, released Mar 26, 2022)

Languages: Python 98.0%, HTML 1.1%, Other 0.9%