cool BED-to-GFF3 converter that runs in parallel

alejandrogzi/bed2gff

bed2gff

A Rust BED-to-GFF3 parallel translator.

translates

    chr7 56766360 56805692 ENST00000581852.25 1000 + 56766360 56805692 0,0,200 3 3,135,81, 0,496,39251,

into

    chr7 bed2gff gene 56399404 56805892 . + . ID=ENSG00000166960;gene_id=ENSG00000166960
    chr7 bed2gff transcript 56766361 56805692 . + . ID=ENST00000581852.25;Parent=ENSG00000166960;gene_id=ENSG00000166960;transcript_id=ENST00000581852.25
    chr7 bed2gff exon 56766361 56766363 . + . ID=exon:ENST00000581852.25.1;Parent=ENST00000581852.25;gene_id=ENSG00000166960;transcript_id=ENST00000581852.25,exon_number=1
    chr7 bed2gff CDS 56766361 56766363 . + 0 ID=CDS:ENST00000581852.25.1;Parent=ENST00000581852.25;gene_id=ENSG00000166960;transcript_id=ENST00000581852.25,exon_number=1
    ...
    chr7 bed2gff start_codon 56766361 56766363 . + 0 ID=start_codon:ENST00000581852.25.1;Parent=ENST00000581852.25;gene_id=ENSG00000166960;transcript_id=ENST00000581852.25,exon_number=1
    chr7 bed2gff stop_codon 56805690 56805692 . + 0 ID=stop_codon:ENST00000581852.25.3;Parent=ENST00000581852.25;gene_id=ENSG00000166960;transcript_id=ENST00000581852.25,exon_number=3
    ...

in a few seconds.
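The exon coordinates in the example follow from the BED block fields: BED starts are 0-based and half-open, while GFF3 coordinates are 1-based and inclusive. The sketch below shows that arithmetic for a BED12 line. It is an illustration only, not bed2gff's actual code; `Exon`, `parse_list` and `bed12_exons` are hypothetical names.

```rust
/// One exon in GFF3 coordinates (1-based, inclusive on both ends).
#[derive(Debug, PartialEq)]
pub struct Exon {
    pub start: u64,
    pub end: u64,
}

/// Parses a comma-separated BED list such as "3,135,81," into numbers.
fn parse_list(field: &str) -> Option<Vec<u64>> {
    field
        .split(',')
        .filter(|s| !s.is_empty()) // BED lists usually end with a trailing comma
        .map(|s| s.parse().ok())
        .collect()
}

/// Converts the blocks of one BED12 line into GFF3 exon coordinates.
/// Returns None if the line has fewer than 12 fields or a field fails to parse.
pub fn bed12_exons(line: &str) -> Option<Vec<Exon>> {
    let fields: Vec<&str> = line.split_whitespace().collect();
    if fields.len() < 12 {
        return None;
    }
    let chrom_start: u64 = fields[1].parse().ok()?;
    let sizes = parse_list(fields[10])?; // blockSizes
    let starts = parse_list(fields[11])?; // blockStarts, relative to chromStart
    if sizes.len() != starts.len() {
        return None;
    }
    Some(
        starts
            .iter()
            .zip(&sizes)
            .map(|(offset, size)| Exon {
                // 0-based start becomes 1-based; the half-open end is already
                // the inclusive 1-based end.
                start: chrom_start + offset + 1,
                end: chrom_start + offset + size,
            })
            .collect(),
    )
}
```

For the BED line above, the first block (offset 0, size 3) gives an exon at 56766361..56766363, which matches the first exon line in the GFF3 output.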

Converts

  • Homo sapiens GRCh38 GENCODE 44 (252,835 transcripts) in 4.16 seconds.
  • Mus musculus GRCm39 GENCODE 44 (149,547 transcripts) in 2.15 seconds.
  • Canis lupus familiaris ROS_Cfam_1.0 Ensembl 110 (55,335 transcripts) in 1.30 seconds.
  • Gallus gallus bGalGal1 Ensembl 110 (72,689 transcripts) in 1.51 seconds.

What's new in v0.1.5

  • Adds a --no-gene flag to perform the conversion without an isoforms file!
  • Makes -i required unless --no-gene is set.
  • Refactors BedRecord.

Usage

    Usage:
        a) bed2gff[EXE] --bed <BED> --isoforms <ISOFORMS> --output <OUTPUT>
        b) bed2gff[EXE] --bed <BED> --output <OUTPUT> --no-gene

    Arguments:
        -b, --bed <BED>: a .bed file
        -i, --isoforms <ISOFORMS>: a tab-delimited file
        -o, --output <OUTPUT>: path to output file
        -n, --no-gene <FLAG>: Flag to disable gene_id feature [default: false]

    Options:
        --help: print help
        --version: print version
        --threads/-t: number of threads (default: max cpus)
        --gz: compress output .gtf

Warning

All the transcripts in .bed file should appear in the isoforms file.

Detailed formats

bed2gff just needs two files:

  1. a .bed file

    a tab-delimited file with 3 required and 9 optional fields:

    chrom   chromStart  chromEnd    name               ...
      |         |           |         |
    chr20   50222035    50222038    ENST00000595977    ...

    see BED format for more information

  2. a tab-delimited .txt/.tsv/.csv/... file with genes/isoforms (all the transcripts in .bed file should appear in the isoforms file):

    > cat isoforms.txt
    ENSG00000198888 ENST00000361390
    ENSG00000198763 ENST00000361453
    ENSG00000198804 ENST00000361624
    ENSG00000188868 ENST00000595977

    you can build a custom file for your preferred species using Ensembl BioMart.
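Since every transcript in the .bed file must appear in the isoforms file, it can help to check both files before running a conversion. The sketch below is a minimal illustration of such a check, assuming the gene/transcript column order shown above. It is not part of bed2gff; `parse_isoforms` and `missing_transcripts` are hypothetical helper names.

```rust
use std::collections::HashMap;

/// Builds a transcript -> gene lookup from isoforms text
/// (one "gene<TAB>transcript" pair per line, as in the example above).
pub fn parse_isoforms(text: &str) -> HashMap<String, String> {
    text.lines()
        .filter_map(|line| {
            let mut cols = line.split_whitespace();
            let gene = cols.next()?;
            let transcript = cols.next()?;
            Some((transcript.to_string(), gene.to_string()))
        })
        .collect()
}

/// Returns the BED transcript names (column 4) that have no gene in the lookup.
pub fn missing_transcripts<'a>(
    bed: &'a str,
    isoforms: &HashMap<String, String>,
) -> Vec<&'a str> {
    bed.lines()
        .filter_map(|line| line.split_whitespace().nth(3))
        .filter(|name| !isoforms.contains_key(*name))
        .collect()
}
```

An empty result from `missing_transcripts` means the two files are consistent; any returned name is a transcript that would have no gene to attach to.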

Installation

to install bed2gff on your system follow these steps:

  1. get rust: curl https://sh.rustup.rs -sSf | sh on unix, or go here for other options
  2. run cargo install bed2gff (make sure ~/.cargo/bin is in your $PATH before running it)
  3. use bed2gff with the required arguments
  4. enjoy!

Build

to build bed2gff from this repo, do:

  1. get rust (as described above)
  2. run git clone https://github.com/alejandrogzi/bed2gff.git && cd bed2gff
  3. run cargo run --release -- -b <BED> -i <ISOFORMS> -o <OUTPUT>

Container image

to build the development container image:

  1. run git clone https://github.com/alejandrogzi/bed2gff.git && cd bed2gff
  2. initialize docker with start docker or systemctl start docker
  3. build the image: docker image build --tag bed2gff .
  4. run docker run --rm -v "[dir_where_your_gtf_is]:/dir" bed2gff -b /dir/<BED> -i /dir/<ISOFORMS> -o /dir/<OUTPUT>

Conda

to use bed2gff through Conda:

  1. conda install bed2gff -c bioconda or conda create -n bed2gff -c bioconda bed2gff

Output

bed2gff writes the output to the path given with --output, which can be the same directory that holds the .bed file:

    bed2gff -b annotation.bed -i isoforms.txt -o output.gff3

    .
    ├── ...
    ├── isoforms.txt
    ├── annotation.bed
    └── output.gff3

where output.gff3 is the result.

FAQ

Why?

Converting between formats is a daily task in bioinformatics, and it is especially common when working with gene annotations, because tools differ in their input/output layouts. GTF, GFF and BED are the most widely used formats for storing gene annotations, and the available software does not cover conversion between them well.

A considerable number of genomic tools accept only GTF/GFF3 files, which forces BED users to translate their files into another format. While some of these needs are already covered for GTF files (e.g. bed2gtf), the GFF3 layout still lacks stable conversion tools (1, 2).

bed2gff is presented as a straightforward option to convert BED files into ready-to-use GFF3 files, closing that gap.

How?

bed2gff builds on the code base of bed2gtf, which is essentially a reimplementation of UCSC's C binaries (bedToGenePred + genePredToGtf) merged into a single step. The tool evaluates the position of exons and other features (CDS, start/stop codons, UTRs), preserving reading frames and adjusting the coordinate indexing. The core approach is now a parallel algorithm that significantly reduces computation time.
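To illustrate the idea of a parallel conversion, the sketch below splits the input lines into chunks and converts each chunk on its own thread while keeping the input order. It uses only the standard library (`std::thread::scope`) and is an assumption about how such a pipeline could look, not bed2gff's implementation; `convert_line` is a hypothetical stand-in that emits only a transcript line.

```rust
use std::thread;

/// Stand-in for the real per-record conversion: emits a single GFF3
/// transcript line from the first six BED fields.
fn convert_line(line: &str) -> Option<String> {
    let f: Vec<&str> = line.split_whitespace().collect();
    if f.len() < 6 {
        return None;
    }
    let start: u64 = f[1].parse().ok()?;
    // BED starts are 0-based; GFF3 starts are 1-based.
    Some(format!(
        "{}\tbed2gff\ttranscript\t{}\t{}\t.\t{}\t.\tID={}",
        f[0],
        start + 1,
        f[2],
        f[5],
        f[3]
    ))
}

/// Converts lines on up to `threads` worker threads, keeping the input order.
pub fn convert_parallel(lines: &[&str], threads: usize) -> Vec<String> {
    let workers = threads.max(1);
    // Ceiling division; at least 1 so `chunks` never receives a zero size.
    let chunk_size = ((lines.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn every worker first, then join them in order.
        let handles: Vec<_> = lines
            .chunks(chunk_size)
            .map(|chunk| {
                scope.spawn(move || {
                    chunk
                        .iter()
                        .filter_map(|l| convert_line(l))
                        .collect::<Vec<String>>()
                })
            })
            .collect();
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Each record is independent of the others, which is what makes this kind of chunked parallelism straightforward for line-oriented formats like BED.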

Following the rationale of bed2gtf, bed2gff produces a ready-to-use GFF3 file by using an isoforms file, which plays the role of the refTable in the C binaries and maps each transcript to its gene.
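One use of the transcript-to-gene mapping is deriving a gene feature from its transcripts. The sketch below assumes a gene spans from the smallest transcript start to the largest transcript end, which is consistent with the example output (the gene line is wider than the single transcript shown) but is not confirmed against bed2gff's source. `gene_bounds` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Computes gene spans as (min start, max end) over each gene's transcripts.
/// `transcripts` holds (transcript_id, start, end);
/// `isoforms` maps transcript_id -> gene_id.
pub fn gene_bounds(
    transcripts: &[(&str, u64, u64)],
    isoforms: &HashMap<&str, &str>,
) -> HashMap<String, (u64, u64)> {
    let mut bounds: HashMap<String, (u64, u64)> = HashMap::new();
    for &(id, start, end) in transcripts {
        // Transcripts without a gene are skipped in this sketch.
        let Some(gene) = isoforms.get(id) else { continue };
        let entry = bounds.entry(gene.to_string()).or_insert((start, end));
        entry.0 = entry.0.min(start);
        entry.1 = entry.1.max(end);
    }
    bounds
}
```

Skipping unmapped transcripts is a choice made for this sketch only; the Warning above asks that every transcript in the .bed file appear in the isoforms file.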

References

  1. https://bioinformatics.stackexchange.com/questions/2242/how-to-convert-bed-to-gff3
  2. https://www.biostars.org/p/2/
