Bioinformatics 101 tool for counting unique k-length substrings in DNA
suchapalaver/krust
`krust` is a k-mer counter - a bioinformatics 101 tool for counting the frequency of substrings of length `k` within strings of DNA data. `krust` is written in Rust and run from the command line. It takes a FASTA file of DNA sequences and outputs all canonical k-mers (the double helix means each k-mer has a reverse complement) and their frequency across all records in the given data. `krust` is tested for accuracy against jellyfish.
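To make "canonical k-mer" concrete, here is a minimal sketch using only the Rust standard library. It is not `krust`'s actual implementation: the function names are hypothetical, and it assumes two conventions that the text above does not state, namely that the canonical form is the lexicographically smaller of a k-mer and its reverse complement, and that windows containing bytes other than `A`, `C`, `G`, `T` are skipped.

```rust
use std::collections::HashMap;

/// Reverse complement of a DNA k-mer.
/// Returns None if any byte is not A, C, G, or T (assumption: such k-mers are skipped).
fn reverse_complement(kmer: &[u8]) -> Option<Vec<u8>> {
    kmer.iter()
        .rev()
        .map(|&b| match b {
            b'A' => Some(b'T'),
            b'C' => Some(b'G'),
            b'G' => Some(b'C'),
            b'T' => Some(b'A'),
            _ => None,
        })
        .collect()
}

/// Canonical form, assumed here to be the lexicographically smaller of
/// the k-mer and its reverse complement.
fn canonical(kmer: &[u8]) -> Option<Vec<u8>> {
    let rc = reverse_complement(kmer)?;
    Some(if rc.as_slice() < kmer { rc } else { kmer.to_vec() })
}

/// Count canonical k-mers in a single sequence.
fn count_canonical_kmers(seq: &[u8], k: usize) -> HashMap<Vec<u8>, u64> {
    let mut counts = HashMap::new();
    if k == 0 {
        return counts; // `windows(0)` would panic
    }
    for window in seq.windows(k) {
        if let Some(c) = canonical(window) {
            *counts.entry(c).or_insert(0) += 1;
        }
    }
    counts
}

fn main() {
    let counts = count_canonical_kmers(b"ATGCAT", 3);
    for (kmer, n) in &counts {
        println!("{}\t{}", String::from_utf8_lossy(kmer), n);
    }
}
```

In this sketch, `ATG` and its reverse complement `CAT` are counted under the same key, which is why a canonical count can be smaller than the number of distinct substrings.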
```
krust: counts k-mers, written in rust

Usage: krust <k> <path>

Arguments:
  <k>     provides k length, e.g. 5
  <path>  path to a FASTA file, e.g. /home/lisa/bio/cerevisiae.pan.fa

Options:
  -h, --help     Print help information
  -V, --version  Print version information
```
`krust` supports either `rust-bio` or `needletail` to read FASTA records. Use the `--features` flag to select.
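For readers new to Cargo features, the sketch below shows in general terms how a feature name passed via `--features` can choose between code paths at compile time. It is an illustration only, not `krust`'s source: the function `fasta_reader_name` is hypothetical, and the real crate may organize its feature gating differently.

```rust
// Hypothetical illustration of compile-time feature selection with `cfg`.
// Exactly one of these three definitions is compiled into the binary.

#[cfg(feature = "needletail")]
fn fasta_reader_name() -> &'static str {
    "needletail"
}

#[cfg(all(feature = "rust-bio", not(feature = "needletail")))]
fn fasta_reader_name() -> &'static str {
    "rust-bio"
}

#[cfg(not(any(feature = "rust-bio", feature = "needletail")))]
fn fasta_reader_name() -> &'static str {
    "none selected"
}

fn main() {
    println!("FASTA reader: {}", fasta_reader_name());
}
```

Because the choice is made when the crate is compiled, switching readers means rebuilding with a different `--features` value rather than passing a runtime option.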
Run `krust` with `rust-bio`'s FASTA reader to count 5-mers like this:

```
cargo run --release --features rust-bio -- 5 your/local/path/to/fasta_data.fa
```

or, searching for 21-mers with `needletail` as the FASTA reader, like this:

```
cargo run --release --features needletail -- 21 your/local/path/to/fasta_data.fa
```
`krust` prints to `stdout`, writing each k-mer's count (prefixed with `>`) and the k-mer itself on alternate lines:

```
>114928
ATGCC
>289495
AATCA
...
```
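Since the output alternates a `>`-prefixed count line with a k-mer line, it can be read back into a map for downstream use. The following is a minimal sketch under that assumption; `parse_krust_output` is a hypothetical helper and not part of `krust`.

```rust
use std::collections::HashMap;

/// Parse text in the alternating ">count" / "kmer" layout into a k-mer -> count map.
/// Blank lines are ignored; any other deviation from the layout is reported as an error.
fn parse_krust_output(text: &str) -> Result<HashMap<String, u64>, String> {
    let mut counts = HashMap::new();
    let mut lines = text.lines().filter(|l| !l.trim().is_empty());
    while let Some(header) = lines.next() {
        // The header line carries the count after a leading '>'.
        let count_str = header
            .trim()
            .strip_prefix('>')
            .ok_or_else(|| format!("expected '>' header, got: {header}"))?;
        let count: u64 = count_str
            .parse()
            .map_err(|e| format!("bad count {count_str:?}: {e}"))?;
        // The following line carries the k-mer itself.
        let kmer = lines
            .next()
            .ok_or_else(|| format!("missing k-mer after header: {header}"))?;
        counts.insert(kmer.trim().to_string(), count);
    }
    Ok(counts)
}

fn main() {
    let sample = ">114928\nATGCC\n>289495\nAATCA\n";
    match parse_krust_output(sample) {
        Ok(counts) => println!("parsed {} k-mers", counts.len()),
        Err(e) => eprintln!("parse error: {e}"),
    }
}
```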