Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 876 Commits
.github		.github
benchmark		benchmark
doc		doc
seqkit		seqkit
tests		tests
.gitignore		.gitignore
.travis.yml		.travis.yml
CHANGELOG.md		CHANGELOG.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README-v0.3.1.1.md		README-v0.3.1.1.md
README.md		README.md
go.mod		go.mod
go.sum		go.sum
seqkit2.jpg		seqkit2.jpg

Repository files navigation

SeqKit - a cross-platform and ultrafast toolkit for FASTA/Q file manipulation

Try SeqKit in your browser (Tutorials and Exercises provided bysandbox.bio)
Documents:http://bioinf.shenwei.me/seqkit(Usage,FAQs,Tutorial,andBenchmark)
Source code:https://github.com/shenwei356/seqkit
Latest version:
Please cite:
Others:

Features

Easy to install (download)
- Providing statically linked executable binaries for multiple platforms (Linux/Windows/macOS, amd64/arm64)
- Light weight and out-of-the-box, no dependencies, no compilation, no configuration
- conda install -c bioconda seqkit
Easy to use
- Ultrafast (seetechnical-details andbenchmark)
- Seamlessly parsing both FASTA and FASTQ formats
- Supporting (gzip/xz/zstd/bzip2/lz4 compressed) STDIN/STDOUT and input/output file, easily integrated in pipe
- Reproducible results (configurable rand seed insample andshuffle)
- Supporting custom sequence ID via regular expression
- SupportingBash/Zsh autocompletion
Versatile commands (usages and examples)
- Practical functions supported by38 subcommands

Installation

Method 1: Download binaries

Go toDownload Page, where you can find download links to various platforms.

Method 2: Install via Pixi

pixi global install -c bioconda seqkit

Method 3: Install via conda

conda install -c bioconda seqkit

Method 4: Install via homebrew

brew install seqkit

Subcommands

Category	Command	Function	Input	Strand-sensitivity	Multi-threads
Basic operation	seq	Transform sequences: extract ID/seq, filter by length/quality, remove gaps…	FASTA/Q
	stats	Simple statistics: #seqs, min/max_len, N50, Q20%, Q30%…	FASTA/Q		✓
	subseq	Get subsequences by region/gtf/bed, including flanking sequences	FASTA/Q	+ or/and -
	sliding	Extract subsequences in sliding windows	FASTA/Q	+ only
	faidx	Create the FASTA index file and extract subsequences (with more features than samtools faidx)	FASTA	+ or/and -
	translate	translate DNA/RNA to protein sequence	FASTA/Q	+ or/and -
	watch	Monitoring and online histograms of sequence features	FASTA/Q
	scat	Real time concatenation and streaming of fastx files	FASTA/Q		✓
Format conversion	fq2fa	Convert FASTQ to FASTA format	FASTQ
	fx2tab	Convert FASTA/Q to tabular format	FASTA/Q
	fa2fq	Retrieve corresponding FASTQ records by a FASTA file	FASTA/Q	+ only
	tab2fx	Convert tabular format to FASTA/Q format	TSV
	convert	Convert FASTQ quality encoding between Sanger, Solexa and Illumina	FASTA/Q
Searching	grep	Search sequences by ID/name/sequence/sequence motifs, mismatch allowed	FASTA/Q	+ and -	partly, -m
	locate	Locate subsequences/motifs, mismatch allowed	FASTA/Q	+ and -	partly, -m
	amplicon	Extract amplicon (or specific region around it), mismatch allowed	FASTA/Q	+ and -	partly, -m
	fish	Look for short sequences in larger sequences	FASTA/Q	+ and -
Set operation	sample	Sample sequences by number or proportion	FASTA/Q
	sample2	Sample sequences by number or proportion (version 2)	FASTA/Q
	rmdup	Remove duplicated sequences by ID/name/sequence	FASTA/Q	+ and -
	common	Find common sequences of multiple files by id/name/sequence	FASTA/Q	+ and -
	duplicate	Duplicate sequences N times	FASTA/Q
	split	Split sequences into files by id/seq region/size/parts (mainly for FASTA)	FASTA preffered
	split2	Split sequences into files by size/parts (FASTA, PE/SE FASTQ)	FASTA/Q
	head	print the first N FASTA/Q records, or leading records whose total length >= L	FASTA/Q
	head-genome	Print sequences of the first genome with common prefixes in name	FASTA/Q
	range	Print FASTA/Q records in a range (start:end)	FASTA/Q
	pair	Patch up paired-end reads from two fastq files	FASTA/Q
Edit	replace	Replace name/sequence by regular expression	FASTA/Q	+ only
	rename	Rename duplicated IDs	FASTA/Q
	concat	Concatenate sequences with same ID from multiple files	FASTA/Q	+ only
	restart	Reset start position (rotate) for circular genomes	FASTA/Q	+ only
	mutate	Edit sequence (point mutation, insertion, deletion)	FASTA/Q	+ only
	sana	Sanitize broken single line FASTQ files	FASTQ
Ordering	sort	Sort sequences by id/name/sequence/length	FASTA preffered
	shuffle	Shuffle sequences	FASTA preffered
BAM processing	bam	Monitoring and online histograms of BAM record features	BAM
Miscellaneous	sum	Compute message digest for all sequences in FASTA/Q files	FASTA/Q		✓
	merge-slides	Merge sliding windows generated from seqkit sliding	TSV

Notes:

Strand-sensitivity:
- + only: only processing on the positive/forward strand.
- + and -: searching on both strands.
- + or/and -: depends on users' flags/options/arguments.
Multiple-threads: Using the default 4 threads is fast enough for most commands, some commands can benefit from extra threads.