Reader / Writer for UCSC 2-bit Genome Format
weng-lab/TwoBit
Version 2.0.10-dev
This is a work-in-progress reader implementation based on the UCSC 2-bit specification, http://genome.ucsc.edu/FAQ/FAQformat#format7. The implementation is intended to be more or less thread safe; to achieve that, each TwoBitSequence has its own read-only file handle to the 2-bit data.
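For orientation, the UCSC specification packs four bases per byte using T=00, C=01, A=10, G=11, with the first base in the most significant two bits. The sketch below decodes such packed bytes. It is not taken from this library: `decodePackedDna` is a hypothetical helper, and it ignores the N-block and mask-block records that a full reader must also apply.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper, not part of the TwoBit API.
// Decodes the first `nBases` bases from UCSC 2-bit packed bytes.
// Encoding per the UCSC spec: T=00, C=01, A=10, G=11, first base in the
// most significant two bits of each byte.
std::string decodePackedDna(const std::vector<std::uint8_t>& packed,
                            std::size_t nBases)
{
    static const char kBases[4] = {'T', 'C', 'A', 'G'};
    assert(packed.size() * 4 >= nBases); // caller must supply enough bytes

    std::string out;
    out.reserve(nBases);
    for (std::size_t i = 0; i < nBases; ++i)
    {
        const std::uint8_t byte = packed[i / 4];
        // Shift is 6, 4, 2, 0 for the four bases within one byte.
        const unsigned shift = static_cast<unsigned>(6 - 2 * (i % 4));
        out.push_back(kBases[(byte >> shift) & 0x3]);
    }
    return out;
}
```

With this sketch, the single byte `0x1B` (binary `00011011`) decodes to `TCAG`, which matches the example given in the UCSC format description.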
    git clone https://github.com/weng-lab/TwoBit.git
    cd TwoBit
    git checkout -b develop origin/develop #switch to develop branch
    #configure, compiler can be set or environmental CC and CXX will be used
    CC=gcc-7 CXX=g++-7 ./configure.py
    #download cppitertools, cppprogutils
    ./setup.py --compfile compfile.mk --outMakefile makefile-common.mk #compfile.mk created by configure
    #compile
    make -j 4
    #binary now located in TwoBit/bin/
    bin/TwoBit
    #TwoBit
    #1) faToTwoBit
    #2) twoBitToFa
- probably take out buffering and rely on ifstream, see https://stackoverflow.com/questions/12757904/how-to-optimize-reading-and-writing-by-playing-with-buffer-size
- make TwoBitSequence an input stream? See https://stackoverflow.com/questions/14086417/how-to-write-custom-input-stream-in-c
The following converts a 2-bit file into FASTA format, on stdout.
    TwoBit::TwoBitFile f("/home/vanderva/.ucscgenome/hg19.2bit");
    std::string buffer;
    for (const std::string& s : f.sequenceNames())
    {
        f[s].getSequence(buffer);
        std::cout << ">" << s << std::endl;
        for (uint32_t i = 0; i < buffer.size(); i += 80)
        {
            std::cout << buffer.substr(i, 80) << '\n';
        }
        std::cout.flush();
    }