Extract 3'UTR, 5'UTR, CDS, Promoter, Genes, Introns, Exons from GTF files
saketkc/gencode_regions
Extract 3'UTR, 5'UTR, CDS, Promoter, Genes from GTF files.
If you only need the final output, the files are hosted on riboraptor, organized by genome build and GTF version.
We recommend setting up a conda environment with Python>=3 and Python<=3.7, gffutils v0.9, and pybedtools:
conda create --name gencode_env python=3.7
conda activate gencode_env
conda install -c bioconda gffutils=0.9 pybedtools
The corresponding gzipped BED output files are in the data directory.
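To inspect one of the gzipped BED files without extra dependencies, something like the following may be enough. It is only a sketch: the helper names `read_bed_gz` and `total_length` are hypothetical, and the demo writes its own tiny file rather than assuming any particular file name in the data directory.

```python
import gzip
import os
import tempfile


def read_bed_gz(path):
    """Yield (chrom, start, end) tuples from a gzipped BED file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            # Skip blank lines and optional header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            chrom, start, end = line.rstrip("\n").split("\t")[:3]
            yield chrom, int(start), int(end)


def total_length(path):
    """Sum interval lengths; overlapping intervals are counted more than once."""
    return sum(end - start for _, start, end in read_bed_gz(path))


# Demo on a throwaway file (hypothetical content, not real annotation data).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.bed.gz")
    with gzip.open(demo_path, "wt") as out:
        out.write("chr1\t100\t200\nchr1\t300\t450\n")
    demo_total = total_length(demo_path)

print(demo_total)
```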
- R>=3.2.1
- GenomicFeatures
./create_regions_from_gencode.R <path_to_GFF/GTF> <path_to_output_dir>
This creates exons.bed, 3UTR.bed, 5UTR.bed, genes.bed, and cds.bed in <output_dir>.
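For readers unsure how an annotation record maps to a BED line, here is a minimal Python sketch of the coordinate conversion. It is not how `create_regions_from_gencode.R` works internally; it assumes GTF-style attributes (`gene_id "..."`, not the GFF3 `key=value` syntax), and the function name `gtf_line_to_bed` is hypothetical.

```python
import re


def gtf_line_to_bed(line, feature_type):
    """Return a BED6 tuple for a GTF line of the given feature type, else None."""
    if line.startswith("#"):
        return None
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] != feature_type:
        return None
    chrom, strand, attributes = fields[0], fields[6], fields[8]
    start, end = int(fields[3]), int(fields[4])
    match = re.search(r'gene_id "([^"]+)"', attributes)
    name = match.group(1) if match else "."
    # GTF is 1-based and end-inclusive; BED is 0-based and half-open,
    # so only the start coordinate shifts by one.
    return (chrom, start - 1, end, name, 0, strand)


example = (
    "chr1\tHAVANA\texon\t11869\t12227\t.\t+\t.\t"
    'gene_id "ENSG00000223972.5"; transcript_id "ENST00000456328.2";'
)
print("\t".join(map(str, gtf_line_to_bed(example, "exon"))))
```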
- Download GFF/GTF (GRCh37, v25, comprehensive, CHR) from gencodegenes.org:
wget ftp://ftp.sanger.ac.uk/pub/gencode/Gencode_human/release_25/gencode.v25.annotation.gff3.gz \
  && gunzip gencode.v25.annotation.gff3.gz
- Create regions:
./create_regions_from_gencode.R gencode.v25.annotation.gff3 /path/to/GRCh37/annotation
We use the GenePred format to simplify the process.
Download gtfToGenePred
Convert gtf to GenePred:
gtfToGenePred gencode.v25.annotation.gtf gencode.v25.annotation.genepred
Extract first exons:
python genepred_to_bed.py --first_exon gencode.v25.annotation.genepred
Extract last exons:
python genepred_to_bed.py --last_exon gencode.v25.annotation.genepred
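The idea behind picking first and last exons from a GenePred record can be sketched as below. This is an illustration, not the code in `genepred_to_bed.py`. It assumes the standard 10-column GenePred layout (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds), with exons listed in ascending genomic order. The helper names and the minus-strand record are hypothetical.

```python
def parse_genepred_exons(line):
    """Return (name, chrom, strand, [(start, end), ...]) for one GenePred line."""
    fields = line.rstrip("\n").split("\t")
    name, chrom, strand = fields[0], fields[1], fields[2]
    # exonStarts/exonEnds are comma-separated and usually end with a comma.
    starts = [int(x) for x in fields[8].split(",") if x]
    ends = [int(x) for x in fields[9].split(",") if x]
    return name, chrom, strand, list(zip(starts, ends))


def terminal_exon(line, which="first"):
    """Return a BED6 tuple for the first or last exon in transcript order."""
    name, chrom, strand, exons = parse_genepred_exons(line)
    # On the minus strand the transcript's first exon is the rightmost one.
    take_leftmost = (which == "first") == (strand == "+")
    start, end = exons[0] if take_leftmost else exons[-1]
    # GenePred coordinates are already 0-based half-open, like BED.
    return (chrom, start, end, name, 0, strand)


plus_record = (
    "ENST00000456328.2\tchr1\t+\t11868\t14409\t14409\t14409\t3\t"
    "11868,12612,13220,\t12227,12721,14409,"
)
minus_record = "tx_minus\tchr1\t-\t100\t500\t100\t500\t2\t100,400,\t200,500,"

print(terminal_exon(plus_record, "first"))
print(terminal_exon(minus_record, "first"))
```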
This should be helpful:
or probably this: