ITBE-Lab/seed-evaluationPublic

NotificationsYou must be signed in to change notification settings
Fork1
Star6

Evaluation tools for "A performant bridge between fixed-size and variable-size seeding"

License

MIT license

6 stars 1 fork Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 23 Commits
.vscode		.vscode
LICENSE		LICENSE
README.md		README.md
bokeh_style_helper.py		bokeh_style_helper.py
compute_times.py		compute_times.py
config.py		config.py
dwgsim.py		dwgsim.py
mems_to_max_sp.png		mems_to_max_sp.png
runBackwardMEMOhlebusch.py		runBackwardMEMOhlebusch.py

Repository files navigation

A performant bridge between fixed-size and variable-size seeding

This repository contains the scripts for the experiments performed inA performant bridge between fixed-size and variable-size seeding.

❗ The pseudocode of Algorithm 2b contains an error: The Build-Max-Heap operation in line 7 should be "descending byl; for equall descending byq" (more info below)

Requirements

Name	Linux install command	Recommended version	Remarks
This repo	`git clone https://github.com/ITBE-Lab/seed-evaluation`	latest	Core evaluation scripts.
DWGSIM	`sudo apt-get install zlib1g-dev; sudo apt-get install libncurses5-dev; git clone --recursive https://github.com/nh13/DWGSIM; make -j$(nproc)`	0.1.11	Tool for generating Illumina reads.
SURVIVOR	`git clone https://github.com/ITBE-Lab/SURVIVOR; cd SURVIVOR/Debug; make -j$(nproc); cd ..; unzip *.zip`	1.0.5	Tool for generating PacBio reads. (Modified by us, to generate specific amounts of reads.)
Python 3	`sudo apt-get install python3`	3.5.3	Python 3 environment.
Bokeh	`sudo apt-get install python3-pip; pip3 install bokeh`	1.4.0	Plotting library.
MA - The Modular Aligner	see below	1.1.1-ef9ab22	C++ library implementing all runtime critical code.
cmake	`sudo apt-get install cmake`	3.13.2	Compiling MA.

Our testing environment: Debian GNU/Linux with a 4.9.0 kernel.

Installing MA - The Modular Aligner

The MA github page can be foundhere.
The following sequence of commands creates the MA library:

git clone https://github.com/ITBE-Lab/MAgit checkout b7cf5e7            # commit used for experimentsmkdir buildcd buildcmake -DWITH_PYTHON=ON ../MA/   # with python required for evaluation scriptsmake -j$(nproc)cd ..export PYTHONPATH=$PYTHONPATH:`pwd`/build:`pwd`/MA/python   # setup system environment

Type./build/maCMD for checking if MA was built successfully.
If you get an error during the cmake setup or compilation, here are some things that might have gone wrong:

MA is written in C++17, so you will need an appropriate compiler. We recommend GCC 6.3.0 or above.
Multiple Python 3 instances on your system can confuse cmake.
The export command (last line in the above script) is not persistent between different terminals and logins.

Configuring the Python scripts

config.py contains the configuration:

Setdwgsim_str,survivor_str andsurvivor_error_profile to the appropriate paths for your system.
Setprefix to the folder that shall contain the temporary data and output data.
Create an FMD-Index via./build/maCMD --Create_Index <fasta_file_name>,<output_folder>,<index_name>, where<fasta_file_name> is the FASTA file containing the reference genome.
Setreference_genome_path to<output_folder>/<index_name>.

Running the experiments

Once everything is configured, you can runpython3 compute_times.py.
This will trigger 4 functions (very bottom of the script):

read_generation: Generates the reads for all experiments and saves them in theprefix folder. Hence, this needs to be run only once and can be removed once the reads have been generated.
runtime_analysis: Performs the time evaluation (Figure 4 of the manuscript) and generates all indices.
seed_entropy_analysis: Performs the entropy analysis (Figure 5 of the manuscript).
seed_set_diff_analysis: Performs the seed set difference analysis (Figure 3 of the manuscript).

Tip: Double clicking on a plot will toggle the legend visibility.

Algorithm 2b

The published pseudocode for extracting maximal spanning seeds from MEMs contains an error:
The Build-Max-Heap operation in line 7 should be "descending byl; for equall descending byq" so that thefirst element in the max-heap is the largest seed (there can be multiple seeds with the same size) that reaches the furthest right.
This error was only in the pseudocode; the actual implementation and measurements are correct.

Further, a max-heap is not actually required. Instead a single iteration overT is enough to extract all relevant seeds:

We would like to thank Roman Cheplyaka for pointing out our mistake as well as this optimization.

About

Evaluation tools for "A performant bridge between fixed-size and variable-size seeding"

Releases

No releases published

Packages

No packages published

Languages

Python100.0%

Movatterモバイル変換

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

License

Uh oh!

Folders and files

Latest commit

History

Repository files navigation

A performant bridge between fixed-size and variable-size seeding

Requirements

Installing MA - The Modular Aligner

Configuring the Python scripts

Running the experiments

Algorithm 2b

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages

Languages

Movatterモバイル変換

License

ITBE-Lab/seed-evaluation

Folders and files

Latest commit

History

Repository files navigation

A performant bridge between fixed-size and variable-size seeding

Requirements

Installing MA - The Modular Aligner

Configuring the Python scripts

Running the experiments

Algorithm 2b

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages0

Languages

Packages