jermp/UST

A fork of the original https://github.com/medvedevgroup/UST tool.

 
 


UST is a bioinformatics tool for constructing a spectrum-preserving string set (SPSS) representation from sets of k-mers.
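To make the SPSS notion concrete, here is a minimal C++ sketch (not code from UST) of the property the output must satisfy: the set of k-mers appearing in the output strings equals the input k-mer set. It assumes DNA over {A,C,G,T} and treats a k-mer and its reverse complement as the same k-mer through a canonical form. The function names `reverse_complement`, `canonical`, `spectrum` and `is_spss` are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Reverse complement of a DNA string over {A,C,G,T}.
std::string reverse_complement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    for (char& c : rc) {
        switch (c) {
            case 'A': c = 'T'; break;
            case 'C': c = 'G'; break;
            case 'G': c = 'C'; break;
            case 'T': c = 'A'; break;
            default: break;  // other characters are left unchanged
        }
    }
    return rc;
}

// Canonical form: the lexicographically smaller of a k-mer and its reverse complement.
std::string canonical(const std::string& kmer) {
    return std::min(kmer, reverse_complement(kmer));
}

// k-mer spectrum of a set of strings: every length-k substring, in canonical form.
std::set<std::string> spectrum(const std::vector<std::string>& strings, std::size_t k) {
    assert(k > 0);
    std::set<std::string> kmers;
    for (const std::string& s : strings) {
        for (std::size_t i = 0; i + k <= s.size(); ++i) {
            kmers.insert(canonical(s.substr(i, k)));
        }
    }
    return kmers;
}

// A candidate string set is an SPSS of the k-mer set if its spectrum equals it exactly.
bool is_spss(const std::vector<std::string>& candidate,
             const std::set<std::string>& kmers, std::size_t k) {
    return spectrum(candidate, k) == kmers;
}
```

For instance, with k = 3 the sets {ACGT} and {CGT} have the same canonical spectrum, because CGT is the reverse complement of ACG.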

Quick start

To install, compile from source:

git clone https://github.com/jermp/UST
cd UST
make

After compiling, use

./ust -i [unitigs.fa] -k [kmer_size]

e.g.

./ust -i examples/k11.unitigs.fa -k 11

The important parameters are:

  • -k [int] : The k-mer size that was used to generate the input, i.e. the length of the nodes of the node-centric de Bruijn graph.
  • -i [input-file] : Unitigs file produced by BCALM2 in FASTA format.
  • -a [0 or 1] : Default is 0. A value of 1 tells UST to preserve abundance. Use this option when the input file was generated with the -all-abundance-counts option of BCALM2.
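The -k parameter fixes the overlap rule of the node-centric de Bruijn graph: k-mer x has an edge to k-mer y when the last k-1 characters of x equal the first k-1 characters of y. The sketch below uses hypothetical helpers `overlaps` and `glue`, and ignores reverse-complement orientation for brevity. It shows why two strings that overlap by k-1 characters can be merged without adding or losing k-mers; it illustrates the principle only and is not the procedure UST uses to decide which strings to merge.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>

// True when the last k-1 characters of a equal the first k-1 characters of b,
// i.e. the last k-mer of a has an edge to the first k-mer of b.
bool overlaps(const std::string& a, const std::string& b, std::size_t k) {
    assert(k >= 2 && a.size() >= k && b.size() >= k);
    return a.compare(a.size() - (k - 1), k - 1, b, 0, k - 1) == 0;
}

// Glue b onto a, writing the shared (k-1)-character overlap only once.
// The result contains exactly the k-mers of a followed by the k-mers of b.
std::optional<std::string> glue(const std::string& a, const std::string& b, std::size_t k) {
    if (!overlaps(a, b, k)) return std::nullopt;
    return a + b.substr(k - 1);
}
```

With k = 3, gluing ACGT and GTTA yields ACGTTA, whose four 3-mers (ACG, CGT, GTT, TTA) are exactly the two from each input, while the shared GT is written only once.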

The output is a FASTA file with extension "ust.fa" in the working folder, which is the SPSS representation of the input.

If the program is run with the option -a 1, then the header line of each sequence will also contain the abundance counts as in the provided BCALM2 input file.
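A quick way to inspect the output is to count its sequences and their total length; the total number of characters is a common size measure for an SPSS. The sketch below is a hypothetical helper, not part of UST. It reads any FASTA stream (multi-line records included, header contents ignored) and also reports the number of k-mer occurrences, computed as the sum of (length - k + 1) over sequences of length at least k.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

struct FastaStats {
    std::size_t num_sequences = 0;
    std::size_t total_length = 0;      // total number of characters across all sequences
    std::size_t kmer_occurrences = 0;  // sum of (length - k + 1) over sequences with length >= k
};

// Reads FASTA records: lines starting with '>' are headers, other lines are sequence.
FastaStats fasta_stats(std::istream& in, std::size_t k) {
    assert(k > 0);
    FastaStats stats;
    std::size_t current = 0;  // length of the record being read
    bool in_record = false;
    auto finish = [&]() {
        if (!in_record) return;
        stats.num_sequences++;
        stats.total_length += current;
        if (current >= k) stats.kmer_occurrences += current - k + 1;
    };
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF
        if (line.empty()) continue;
        if (line[0] == '>') {
            finish();  // close the previous record, if any
            in_record = true;
            current = 0;
        } else {
            current += line.size();
        }
    }
    finish();  // close the last record
    return stats;
}

// Convenience wrapper for FASTA text held in memory.
FastaStats fasta_stats_from_string(const std::string& text, std::size_t k) {
    std::istringstream in(text);
    return fasta_stats(in, k);
}
```

To run it on a file, pass a std::ifstream opened on the output (for example, the mykmers.ust.fa file mentioned below); comparing total_length between the input unitigs and the output gives a rough sense of the size reduction.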

Detailed Usage

In order to build an SPSS representation for your k-mer set, you must first run BCALM2 on your set of k-mers. BCALM2 will construct a set of unitigs. Those unitigs are then fed as input to ust, which outputs a FASTA file with the SPSS representation. Note that the k parameter to ust must match the -kmer-size used when running BCALM2.
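Because every BCALM2 unitig contains at least one k-mer, each unitig is at least -kmer-size characters long. The hypothetical check below uses that fact to catch one kind of mismatch: if any unitig is shorter than the k you plan to pass to ust, that k is larger than the one BCALM2 used. An empty result does not prove that k matches, since a smaller k would also pass.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns the indices of unitigs shorter than k. A non-empty result means
// k is larger than the -kmer-size that produced these unitigs.
std::vector<std::size_t> too_short_for_k(const std::vector<std::string>& unitigs, std::size_t k) {
    assert(k > 0);
    std::vector<std::size_t> bad;
    for (std::size_t i = 0; i < unitigs.size(); ++i) {
        if (unitigs[i].size() < k) bad.push_back(i);
    }
    return bad;
}
```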

If you would like to store the data on disk in compressed form (like UST-Compress in our paper), you can then install and run MFCompress on the output of UST as follows:

MFCompressC mykmers.ust.fa

If you would like to build a membership data structure based on UST, then see the SSHash repository.
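For a feel of what a membership query over an SPSS answers, the sketch below builds a naive index: every canonical k-mer of the SPSS strings goes into a std::unordered_set. This is a stand-in for illustration only. It is not SSHash, which uses a far more compact design, and the class name `NaiveKmerIndex` is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Naive, uncompressed k-mer membership index over the strings of an SPSS.
class NaiveKmerIndex {
public:
    NaiveKmerIndex(const std::vector<std::string>& spss, std::size_t k) : k_(k) {
        assert(k > 0);
        for (const std::string& s : spss)
            for (std::size_t i = 0; i + k <= s.size(); ++i)
                kmers_.insert(canonical(s.substr(i, k)));
    }

    // A query matches if it has length k and it or its reverse complement is stored.
    bool contains(const std::string& kmer) const {
        return kmer.size() == k_ && kmers_.count(canonical(kmer)) > 0;
    }

    std::size_t size() const { return kmers_.size(); }

private:
    // Smaller of a k-mer and its reverse complement (DNA alphabet assumed).
    static std::string canonical(const std::string& kmer) {
        std::string rc(kmer.rbegin(), kmer.rend());
        for (char& c : rc) {
            switch (c) {
                case 'A': c = 'T'; break;
                case 'C': c = 'G'; break;
                case 'G': c = 'C'; break;
                case 'T': c = 'A'; break;
                default: break;
            }
        }
        return std::min(kmer, rc);
    }

    std::size_t k_;
    std::unordered_set<std::string> kmers_;
};
```

Storing each k-mer as a separate string costs far more memory than the SPSS itself, which is exactly the overhead that dedicated structures such as SSHash are built to avoid.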

Citation

If using UST in your research, please cite

@inproceedings{RahmanMedvedevRECOMB20,
  author    = {Amatur Rahman and Paul Medvedev},
  title     = {Representation of $k$-mer sets using spectrum-preserving string sets},
  booktitle = {Research in Computational Molecular Biology - 24th Annual International Conference, {RECOMB} 2020, Padua, Italy, May 10-13, 2020, Proceedings},
  series    = {Lecture Notes in Computer Science},
  volume    = {12074},
  pages     = {152--168},
  publisher = {Springer},
  year      = {2020}
}

Note that the general notion of an SPSS was independently introduced under the name of simplitigs. Therefore, if citing this general notion, please also cite the paper that introduced simplitigs.


Languages

  • C++ 99.4%
  • Makefile 0.6%
