jermp/UST

A fork of the original https://github.com/medvedevgroup/UST tool.
UST is a bioinformatics tool for constructing a spectrum-preserving string set (SPSS) representation from sets of k-mers.
To install, compile from source:

git clone https://github.com/jermp/UST
cd UST
make
After compiling, run:

./ust -i [unitigs.fa] -k [kmer_size]

For example:

./ust -i examples/k11.unitigs.fa -k 11
The important parameters are:

-k [int]: The k-mer size that was used to generate the input, i.e., the length of the nodes of the node-centric de Bruijn graph.
-i [input-file]: Unitigs file produced by BCALM2, in FASTA format.
-a [0 or 1]: Default is 0. A value of 1 tells UST to preserve abundance. Use this option when the input file was generated with the -all-abundance-counts option of BCALM2.
The output is a FASTA file with extension "ust.fa" in the working folder, which is the SPSS representation of the input.
If the program is run with the option -a 1, then the header line of each sequence will also contain the abundance counts, as in the provided BCALM2 input file.
In order to build an SPSS representation for your k-mer set, you must first run BCALM2 on your set of k-mers. BCALM2 constructs a set of unitigs, which are then fed as input to ust, which outputs a FASTA file with the SPSS representation. Note that the k parameter given to ust must match the -kmer-size used when running BCALM2.
If you would like to store the data on disk in compressed form (like UST-Compress in our paper), you can then install and run MFCompress on the output of UST as follows:

MFCompressC mykmers.ust.fa
If you would like to build a membership data structure based on UST, then see the SSHash repository.
If using UST in your research, please cite:
- Amatur Rahman and Paul Medvedev, Representation of k-mer sets using spectrum-preserving string sets, RECOMB 2020.
- Here is the bibtex entry:
@inproceedings{RahmanMedvedevRECOMB20,
  author    = {Amatur Rahman and Paul Medvedev},
  title     = {Representation of $k$-mer sets using spectrum-preserving string sets},
  booktitle = {Research in Computational Molecular Biology - 24th Annual International Conference, {RECOMB} 2020, Padua, Italy, May 10-13, 2020, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {12074},
  pages     = {152--168},
  publisher = {Springer},
  year      = {2020}
}
Note that the general notion of an SPSS was independently introduced under the name of simplitigs. Therefore, if citing this general notion, please also cite:
- Brinda K, Baym M, and Kucherov G, Simplitigs as an efficient and scalable representation of de Bruijn graphs, bioRxiv 2020.