ITBE-Lab/seed-evaluation

Evaluation tools for "A performant bridge between fixed-size and variable-size seeding"

This repository contains the scripts for the experiments performed in "A performant bridge between fixed-size and variable-size seeding".

❗ The pseudocode of Algorithm 2b contains an error: the Build-Max-Heap operation in line 7 should be "descending by l; for equal l descending by q" (details below).

Requirements

Name | Linux install command | Recommended version | Remarks
--- | --- | --- | ---
This repo | git clone https://github.com/ITBE-Lab/seed-evaluation | latest | Core evaluation scripts.
DWGSIM | sudo apt-get install zlib1g-dev; sudo apt-get install libncurses5-dev; git clone --recursive https://github.com/nh13/DWGSIM; cd DWGSIM; make -j$(nproc) | 0.1.11 | Tool for generating Illumina reads.
SURVIVOR | git clone https://github.com/ITBE-Lab/SURVIVOR; cd SURVIVOR/Debug; make -j$(nproc); cd ..; unzip *.zip | 1.0.5 | Tool for generating PacBio reads (modified by us to generate a specific number of reads).
Python 3 | sudo apt-get install python3 | 3.5.3 | Python 3 environment.
Bokeh | sudo apt-get install python3-pip; pip3 install bokeh | 1.4.0 | Plotting library.
MA - The Modular Aligner | see below | 1.1.1-ef9ab22 | C++ library implementing all runtime-critical code.
cmake | sudo apt-get install cmake | 3.13.2 | Required for compiling MA.

Our testing environment: Debian GNU/Linux with a 4.9.0 kernel.
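Before building, it can help to confirm that the tools from the table are visible from your shell. The sketch below is not part of the repository; it only checks the Python version, whether git and cmake are on PATH, and whether the bokeh package can be found. It avoids f-strings so that it also runs on Python 3.5.

```python
import importlib.util
import shutil
import sys


def check_requirements():
    """Return a dict mapping each requirement to True (found) or False (missing)."""
    return {
        "python>=3.5": sys.version_info >= (3, 5),
        "git": shutil.which("git") is not None,
        "cmake": shutil.which("cmake") is not None,
        # find_spec only checks that the package is importable, not its version
        "bokeh": importlib.util.find_spec("bokeh") is not None,
    }


if __name__ == "__main__":
    for name, found in sorted(check_requirements().items()):
        print("{:<12} {}".format(name, "ok" if found else "MISSING"))
```

DWGSIM and SURVIVOR are left out on purpose, since config.py refers to them by path rather than through PATH.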

Installing MA - The Modular Aligner

The MA GitHub page can be found at https://github.com/ITBE-Lab/MA.
The following sequence of commands creates the MA library:

    git clone https://github.com/ITBE-Lab/MA
    cd MA
    git checkout b7cf5e7            # commit used for experiments
    cd ..
    mkdir build
    cd build
    cmake -DWITH_PYTHON=ON ../MA/   # Python bindings are required for the evaluation scripts
    make -j$(nproc)
    cd ..
    export PYTHONPATH=$PYTHONPATH:`pwd`/build:`pwd`/MA/python   # set up the environment

Run ./build/maCMD to check whether MA was built successfully.
If you get an error during the cmake setup or compilation, here are some things that might have gone wrong:

  • MA is written in C++17, so you will need an appropriate compiler. We recommend GCC 6.3.0 or above.
  • Multiple Python 3 installations on your system can confuse cmake.
  • The export command (last line of the script above) does not persist across terminals and logins, so it has to be repeated in every new shell session.
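If repeating the export in every shell is inconvenient, a script can extend its own module search path instead. The sketch below is a hypothetical alternative, not part of the repository: it assumes a root folder that contains the build and MA directories created by the commands above, and it assumes (unverified) that the MA Python bindings are importable as a module named MA.

```python
import importlib.util
import os
import sys


def add_ma_to_path(root):
    """Append <root>/build and <root>/MA/python to sys.path, mirroring the export line.

    Returns the list of paths that were newly added."""
    added = []
    for parts in (("build",), ("MA", "python")):
        path = os.path.join(root, *parts)
        if path not in sys.path:
            sys.path.append(path)
            added.append(path)
    return added


def ma_importable(module_name="MA"):
    """True if a module with the given (assumed) name can be found on the current path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    add_ma_to_path(os.getcwd())  # run from the folder that holds build/ and MA/
    print("MA importable:", ma_importable())
```

Like export, this only affects the current process; the difference is that the setting travels with the script rather than with the shell.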

Configuring the Python scripts

config.py contains the configuration:

  • Set dwgsim_str, survivor_str and survivor_error_profile to the appropriate paths for your system.
  • Set prefix to the folder that shall contain the temporary data and output data.
  • Create an FMD-Index via ./build/maCMD --Create_Index <fasta_file_name>,<output_folder>,<index_name>, where <fasta_file_name> is the FASTA file containing the reference genome.
    Set reference_genome_path to <output_folder>/<index_name>.
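For orientation, the settings above might look roughly as follows. This is a hedged sketch, not the repository's config.py: every path is a placeholder, it assumes each *_str variable holds the path of the corresponding executable, and index_output_folder, index_name and missing_paths are hypothetical helpers introduced only for illustration.

```python
# Hypothetical config.py excerpt: all paths are placeholders for your system.
import os

dwgsim_str = "/home/user/DWGSIM/dwgsim"                     # assumed: path of the DWGSIM executable
survivor_str = "/home/user/SURVIVOR/Debug/SURVIVOR"         # assumed: path of the SURVIVOR executable
survivor_error_profile = "/home/user/SURVIVOR/<error_profile_file>"  # placeholder: file from the unzipped archive
prefix = "/home/user/seed-evaluation-data/"                 # temporary and output data

# Same values as passed to:
#   ./build/maCMD --Create_Index <fasta_file_name>,<output_folder>,<index_name>
index_output_folder = "/home/user/indices"  # hypothetical helper, i.e. <output_folder>
index_name = "genome"                       # hypothetical helper, i.e. <index_name>
reference_genome_path = os.path.join(index_output_folder, index_name)


def missing_paths():
    """Return the configured tool/data paths that do not exist (hypothetical helper).

    reference_genome_path is treated as a prefix, not a single file, so it is not checked."""
    candidates = [dwgsim_str, survivor_str, survivor_error_profile, prefix]
    return [p for p in candidates if not os.path.exists(p)]
```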

Running the experiments

Once everything is configured, run python3 compute_times.py.
This calls four functions (at the very bottom of the script):

  • read_generation: Generates the reads for all experiments and saves them in the prefix folder. It therefore needs to run only once; the call can be removed after the reads have been generated.
  • runtime_analysis: Performs the time evaluation (Figure 4 of the manuscript) and generates all indices.
  • seed_entropy_analysis: Performs the entropy analysis (Figure 5 of the manuscript).
  • seed_set_diff_analysis: Performs the seed set difference analysis (Figure 3 of the manuscript).
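Instead of deleting the read_generation call by hand, the bottom of compute_times.py could guard it. The sketch below is hypothetical: the marker file name and the run_all helper are not part of the repository, and it assumes the four functions can be called without arguments (wrap them in a lambda otherwise).

```python
import os
from typing import Callable, Iterable

# Hypothetical marker file; the repository itself does not create it.
MARKER = "reads_generated.flag"


def run_all(prefix: str,
            read_generation: Callable[[], None],
            analyses: Iterable[Callable[[], None]]) -> bool:
    """Generate reads once per prefix folder, then run every analysis in order.

    Returns True if read_generation was executed during this call."""
    marker = os.path.join(prefix, MARKER)
    generated = False
    if not os.path.exists(marker):
        read_generation()
        with open(marker, "w"):
            pass  # empty file records that the reads exist
        generated = True
    for analysis in analyses:
        analysis()
    return generated
```

In compute_times.py the call might read run_all(prefix, read_generation, [runtime_analysis, seed_entropy_analysis, seed_set_diff_analysis]). Keeping runtime_analysis first seems prudent, since it is the step that generates all indices.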

Tip: Double clicking on a plot will toggle the legend visibility.

Algorithm 2b

The published pseudocode for extracting maximal spanning seeds from MEMs contains an error:
The Build-Max-Heap operation in line 7 should be "descending by l; for equal l descending by q", so that the first element in the max-heap is the largest seed (there can be multiple seeds of the same size) that reaches the furthest right.
This error affects only the pseudocode; the actual implementation and measurements are correct.
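The corrected ordering can be illustrated with Python's heapq module. This is a sketch, not the authors' implementation: it assumes a seed is a triple (q, r, l) of query start, reference start and length, as the letters in the correction suggest. Since heapq is a min-heap, both keys are negated.

```python
import heapq
from collections import namedtuple

# Assumed seed layout: q = query start, r = reference start, l = length.
Seed = namedtuple("Seed", ["q", "r", "l"])


def build_max_heap(seeds):
    """Heapify seeds descending by l and, for equal l, descending by q.

    heapq is a min-heap, so both keys are negated; the seed itself breaks
    any remaining ties so that heap entries are always comparable."""
    heap = [(-s.l, -s.q, s) for s in seeds]
    heapq.heapify(heap)
    return heap


def pop_max(heap):
    """Remove and return the longest seed; among equal lengths, the one with the largest q."""
    return heapq.heappop(heap)[2]


if __name__ == "__main__":
    heap = build_max_heap([Seed(0, 100, 5), Seed(3, 200, 5), Seed(1, 50, 4)])
    while heap:
        print(pop_max(heap))
```

For seeds of equal length, a larger q also means a larger end position q + l, which is why the tie-break on q puts the seed reaching furthest right on top.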

Further, a max-heap is not actually required. Instead, a single iteration over T is enough to extract all relevant seeds:

[Figure: revised pseudocode for the single-pass extraction]

We would like to thank Roman Cheplyaka for pointing out our mistake as well as this optimization.
