A cross-platform and ultrafast toolkit for FASTA/Q file manipulation

shenwei356/seqkit

Subcommands of SeqKit2

Features

  • Easy to install (download)
    • Providing statically linked executable binaries for multiple platforms (Linux/Windows/macOS, amd64/arm64)
    • Lightweight and ready to use out of the box: no dependencies, no compilation, no configuration
    • conda install -c bioconda seqkit
  • Easy to use
    • Ultrafast (see technical-details and benchmark)
    • Seamlessly parsing both FASTA and FASTQ formats
    • Supporting (gzip/xz/zstd/bzip2/lz4 compressed) STDIN/STDOUT and input/output files, easily integrated into pipelines
    • Reproducible results (configurable random seed in sample and shuffle)
    • Supporting custom sequence IDs via regular expressions
    • Supporting Bash/Zsh autocompletion
  • Versatile commands (usages and examples)
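
To illustrate what "reproducible results" with a configurable random seed means, here is a minimal Python sketch. It is not SeqKit's implementation: the function name `sample_records`, the record names, and the seed value are hypothetical and chosen only for illustration. The idea is that a sampler driven by a seeded random number generator returns the same subset on every run.

```python
import random


def sample_records(records, proportion, seed=11):
    """Keep each record with probability `proportion`.

    A private, seeded RNG makes the selection repeatable: the same
    input, proportion, and seed always yield the same subset.
    (Illustrative only; not how SeqKit is implemented.)
    """
    rng = random.Random(seed)
    return [rec for rec in records if rng.random() < proportion]


# Hypothetical record IDs standing in for FASTA/Q records.
records = [f"seq{i}" for i in range(1000)]

first = sample_records(records, 0.1, seed=11)
second = sample_records(records, 0.1, seed=11)
print(first == second)
```

Fixing the seed is what lets a sampling or shuffling step be rerun later, or on another machine, with identical output.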

Installation

Method 1: Download binaries

Go to the Download Page, where you can find download links for various platforms.

Method 2: Install via Pixi

pixi global install -c bioconda seqkit

Method 3: Install via conda

conda install -c bioconda seqkit

Method 4: Install via homebrew

brew install seqkit

Subcommands

| Category | Command | Function | Input | Strand-sensitivity | Multi-threads |
|---|---|---|---|---|---|
| Basic operation | seq | Transform sequences: extract ID/seq, filter by length/quality, remove gaps… | FASTA/Q | | |
| | stats | Simple statistics: #seqs, min/max_len, N50, Q20%, Q30%… | FASTA/Q | | |
| | subseq | Get subsequences by region/gtf/bed, including flanking sequences | FASTA/Q | + or/and - | |
| | sliding | Extract subsequences in sliding windows | FASTA/Q | + only | |
| | faidx | Create the FASTA index file and extract subsequences (with more features than samtools faidx) | FASTA | + or/and - | |
| | translate | Translate DNA/RNA to protein sequence | FASTA/Q | + or/and - | |
| | watch | Monitoring and online histograms of sequence features | FASTA/Q | | |
| | scat | Real time concatenation and streaming of fastx files | FASTA/Q | | |
| Format conversion | fq2fa | Convert FASTQ to FASTA format | FASTQ | | |
| | fx2tab | Convert FASTA/Q to tabular format | FASTA/Q | | |
| | fa2fq | Retrieve corresponding FASTQ records by a FASTA file | FASTA/Q | + only | |
| | tab2fx | Convert tabular format to FASTA/Q format | TSV | | |
| | convert | Convert FASTQ quality encoding between Sanger, Solexa and Illumina | FASTA/Q | | |
| Searching | grep | Search sequences by ID/name/sequence/sequence motifs, mismatch allowed | FASTA/Q | + and - | partly, -m |
| | locate | Locate subsequences/motifs, mismatch allowed | FASTA/Q | + and - | partly, -m |
| | amplicon | Extract amplicon (or specific region around it), mismatch allowed | FASTA/Q | + and - | partly, -m |
| | fish | Look for short sequences in larger sequences | FASTA/Q | + and - | |
| Set operation | sample | Sample sequences by number or proportion | FASTA/Q | | |
| | sample2 | Sample sequences by number or proportion (version 2) | FASTA/Q | | |
| | rmdup | Remove duplicated sequences by ID/name/sequence | FASTA/Q | + and - | |
| | common | Find common sequences of multiple files by id/name/sequence | FASTA/Q | + and - | |
| | duplicate | Duplicate sequences N times | FASTA/Q | | |
| | split | Split sequences into files by id/seq region/size/parts (mainly for FASTA) | FASTA preferred | | |
| | split2 | Split sequences into files by size/parts (FASTA, PE/SE FASTQ) | FASTA/Q | | |
| | head | Print the first N FASTA/Q records, or leading records whose total length >= L | FASTA/Q | | |
| | head-genome | Print sequences of the first genome with common prefixes in name | FASTA/Q | | |
| | range | Print FASTA/Q records in a range (start:end) | FASTA/Q | | |
| | pair | Patch up paired-end reads from two fastq files | FASTA/Q | | |
| Edit | replace | Replace name/sequence by regular expression | FASTA/Q | + only | |
| | rename | Rename duplicated IDs | FASTA/Q | | |
| | concat | Concatenate sequences with same ID from multiple files | FASTA/Q | + only | |
| | restart | Reset start position (rotate) for circular genomes | FASTA/Q | + only | |
| | mutate | Edit sequence (point mutation, insertion, deletion) | FASTA/Q | + only | |
| | sana | Sanitize broken single line FASTQ files | FASTQ | | |
| Ordering | sort | Sort sequences by id/name/sequence/length | FASTA preferred | | |
| | shuffle | Shuffle sequences | FASTA preferred | | |
| BAM processing | bam | Monitoring and online histograms of BAM record features | BAM | | |
| Miscellaneous | sum | Compute message digest for all sequences in FASTA/Q files | FASTA/Q | | |
| | merge-slides | Merge sliding windows generated from seqkit sliding | TSV | | |
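
The stats command lists N50 among its summary statistics. As a hedged illustration of what that number means, the sketch below computes N50 under its common definition: the largest length L such that sequences of length at least L together cover at least half of all bases. The function name `n50` is hypothetical, and SeqKit's own handling of edge cases may differ.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Walk the lengths from longest to shortest and return the length
    at which the running total first reaches half of all bases.
    Returns 0 for an empty input (an assumption of this sketch).
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    return 0


# 54 bases in total; 10 + 9 + 8 = 27 reaches the halfway point.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # prints 8
```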

Notes:

  • Strand-sensitivity:
    • + only: only processing on the positive/forward strand.
    • + and -: searching on both strands.
    • + or/and -: depends on users' flags/options/arguments.
  • Multi-threads: the default of 4 threads is fast enough for most commands; some commands can benefit from extra threads.
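
The difference between "+ only" and "+ and -" can be sketched in a few lines of Python. This is an illustration of the concept, not SeqKit's code: the helper names `reverse_complement` and `find_both_strands` are hypothetical, and the sketch handles only unambiguous A/C/G/T bases and exact matches. Searching both strands amounts to searching for the pattern and for its reverse complement on the forward sequence.

```python
# Translation table for complementing unambiguous DNA bases.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_both_strands(seq, pattern):
    """Return (strand, start) hits; start is 0-based on the forward strand.

    A "+" hit matches `pattern` itself; a "-" hit matches its reverse
    complement. A palindromic pattern is reported on both strands.
    """
    hits = []
    for strand, query in (("+", pattern), ("-", reverse_complement(pattern))):
        start = seq.find(query)
        while start != -1:
            hits.append((strand, start))
            start = seq.find(query, start + 1)
    return hits


print(find_both_strands("AAACCCGGGTTT", "AAAC"))  # prints [('+', 0), ('-', 8)]
```

A "+ only" command would correspond to the first pass alone, which is why such commands never report matches to the reverse complement.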

Citation

Wei Shen*, Botond Sipos, and Liuyang Zhao. 2024. SeqKit2: A Swiss Army Knife for Sequence and Alignment Processing. iMeta e191. doi:10.1002/imt2.191.

Acknowledgements

We thank all users for their valuable feedback and suggestions. We thank all contributors for improving the code and documentation.

We appreciate Klaus Post for his fantastic packages (compress and pgzip), which accelerate gzip file reading and writing.

Contact

Create an issue to report bugs, propose new functions, or ask for help.

License

MIT License
