- Notifications
You must be signed in to change notification settings - Fork21
Visualize and annotate genomic coverage with ggplot2
License
Unknown, MIT licenses found
Licenses found
showteeth/ggcoverage
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
The goal ofggcoverage
is to visualize coverage tracks from genomics,transcriptomics or proteomics data. It contains functions to load datafrom BAM, BigWig, BedGraph, txt, or xlsx files, create genome/proteincoverage plots, and add various annotations including base and aminoacid composition, GC content, copy number variation (CNV), genes,transcripts, ideograms, peak highlights, HiC contact maps, contact linksand protein features. It is based on and integrates well withggplot2
.
It contains three main parts:
- Load the data:
ggcoverage
can load BAM, BigWig (.bw), BedGraph,txt/xlsx files from various omics data, including WGS, RNA-seq,ChIP-seq, ATAC-seq, proteomics, et al. - Create omics coverage plot
- Add annotations:
ggcoverage
supports six different annotations:- base and amino acid annotation: Visualize genome coverage atsingle-nucleotide level with bases and amino acids.
- GC annotation: Visualize genome coverage with GC content
- CNV annotation: Visualize genome coverage with copy numbervariation (CNV)
- gene annotation: Visualize genome coverage across genes
- transcription annotation: Visualize genome coverage acrossdifferent transcripts
- ideogram annotation: Visualize the region showing on wholechromosome
- peak annotation: Visualize genome coverage and peak identified
- contact map annotation: Visualize genome coverage with Hi-Ccontact map
- link annotation: Visualize genome coverage with contacts
- peotein feature annotation: Visualize protein coverage withfeatures
ggcoverage
is an R package distributed as part of theCRANrepository. To install the package, startR and enter one of the following commands:
# install via CRAN (not yet available)install.packages("ggcoverage")# OR install via Githubinstall.package("remotes")remotes::install_github("showteeth/ggcoverage")
In general, it isrecommended to install from theGithubrepository (updated moreregularly).
Onceggcoverage
is installed, it can be loaded like every otherpackage:
library("ggcoverage")
ggcoverage
provides twovignettes:
- detailed manual: step-by-step usage
- customize the plot: customize the plot and add additional layers
The RNA-seq data used here is fromTranscription profiling by highthroughput sequencing of HNRNPC knockdown and control HeLacells.We select four samples to use as example:ERR127307_chr14
,ERR127306_chr14
,ERR127303_chr14
,ERR127302_chr14
, and all bamfiles were converted to bigwig files withdeeptools.
Load metadata:
# load metadatameta_file<- system.file("extdata","RNA-seq","meta_info.csv",package="ggcoverage")sample_meta<- read.csv(meta_file)sample_meta#> SampleName Type Group#> 1 ERR127302_chr14 KO_rep1 KO#> 2 ERR127303_chr14 KO_rep2 KO#> 3 ERR127306_chr14 WT_rep1 WT#> 4 ERR127307_chr14 WT_rep2 WT
Load track files:
# track foldertrack_folder<- system.file("extdata","RNA-seq",package="ggcoverage")# load bigwig filetrack_df<- LoadTrackFile(track.folder=track_folder,format="bw",region="chr14:21,677,306-21,737,601",extend=2000,meta.info=sample_meta)# check datahead(track_df)#> seqnames start end width strand score Type Group#> 1 chr14 21675306 21675950 645 * 0 KO_rep1 KO#> 2 chr14 21675951 21676000 50 * 1 KO_rep1 KO#> 3 chr14 21676001 21676100 100 * 2 KO_rep1 KO#> 4 chr14 21676101 21676150 50 * 1 KO_rep1 KO#> 5 chr14 21676151 21677100 950 * 0 KO_rep1 KO#> 6 chr14 21677101 21677200 100 * 2 KO_rep1 KO
Prepare mark region:
# create mark regionmark_region<-data.frame(start= c(21678900,21732001,21737590),end= c(21679900,21732400,21737650),label= c("M1","M2","M3"))# check datamark_region#> start end label#> 1 21678900 21679900 M1#> 2 21732001 21732400 M2#> 3 21737590 21737650 M3
To addgene annotation, the gtf file should containgene_typeandgene_name attributes incolumn 9; to addtranscriptannotation, the gtf file should contain atranscript_nameattribute incolumn 9.
gtf_file<- system.file("extdata","used_hg19.gtf",package="ggcoverage")gtf_gr<-rtracklayer::import.gff(con=gtf_file,format="gtf")
The basic coverage plot hastwo types:
- facet: Create subplot for every track (specified by
facet.key
).This is default. - joint: Visualize all tracks in a single plot.
Create line plot forevery sample (facet.key = "Type"
) and colorbyevery sample (group.key = "Type"
):
basic_coverage<- ggcoverage(data=track_df,plot.type="joint",facet.key="Type",group.key="Type",mark.region=mark_region,range.position="out")basic_coverage
Creategroup average line plot (sample is indicated byfacet.key = "Type"
, group is indicated bygroup.key = "Group"
):
basic_coverage<- ggcoverage(data=track_df,plot.type="joint",facet.key="Type",group.key="Group",joint.avg=TRUE,mark.region=mark_region,range.position="out")basic_coverage
basic_coverage<- ggcoverage(data=track_df,plot.type="facet",mark.region=mark_region,range.position="out")basic_coverage
Change the Y-axis scale label in/out of plot region withrange.position
:
basic_coverage<- ggcoverage(data=track_df,plot.type="facet",mark.region=mark_region,range.position="in")basic_coverage
Shared/Free Y-axis scale withfacet.y.scale
:
basic_coverage<- ggcoverage(data=track_df,plot.type="facet",mark.region=mark_region,range.position="in",facet.y.scale="fixed")basic_coverage
- default behavior is to draw genes (transcripts), exons and UTRs withdifferent line width
- can bec adjusted using
gene.size
,exon.size
andutr.size
parameters - frequency of intermittent arrows (light color) can be adjusted usingthe
arrow.num
andarrow.gap
parameters - genomic features are colored by
strand
by default, which can bechanged using thecolor.by
parameter
basic_coverage+ geom_gene(gtf.gr=gtf_gr)
In “loose” style (default style; each transcript occupies one line):
basic_coverage+ geom_transcript(gtf.gr=gtf_gr,label.vjust=1.5)
In “tight” style (attempted to place non-overlapping transcripts inone line):
basic_coverage+ geom_transcript(gtf.gr=gtf_gr,overlap.style="tight",label.vjust=1.5 )
The ideogram is an overview plot about the respective position on achromosome. The plotting of the ideogram is implemented by theggbio
package. This package needs to be installed separately (it is only‘Suggested’ byggcoverage
).
library(ggbio)#> Loading required package: BiocGenerics#>#> Attaching package: 'BiocGenerics'#> The following objects are masked from 'package:stats':#>#> IQR, mad, sd, var, xtabs#> The following objects are masked from 'package:base':#>#> anyDuplicated, aperm, append, as.data.frame, basename, cbind,#> colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,#> get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,#> match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,#> Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,#> tapply, union, unique, unsplit, which.max, which.min#> Loading required package: ggplot2#> Registered S3 method overwritten by 'GGally':#> method from#> +.gg ggplot2#> Need specific help about ggbio? try mailing#> the maintainer or visit https://lawremi.github.io/ggbio/#>#> Attaching package: 'ggbio'#> The following objects are masked from 'package:ggplot2':#>#> geom_bar, geom_rect, geom_segment, ggsave, stat_bin, stat_identity,#> xlimbasic_coverage+ geom_gene(gtf.gr=gtf_gr)+ geom_ideogram(genome="hg19",plot.space=0)#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.
basic_coverage+ geom_transcript(gtf.gr=gtf_gr,label.vjust=1.5)+ geom_ideogram(genome="hg19",plot.space=0)#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.
The DNA-seq data used here are fromCopy number workflow,we select tumor sample, and get bin counts withcn.mops::getReadCountsFromBAM
withWL
1000.
# prepare metafilecnv_meta_info<-data.frame(SampleName= c("CNV_example"),Type= c("tumor"),Group= c("tumor"))# track filetrack_file<- system.file("extdata","DNA-seq","CNV_example.txt",package="ggcoverage")# load txt filetrack_df<- LoadTrackFile(track.file=track_file,format="txt",region="chr4:61750000-62,700,000",meta.info=cnv_meta_info)# check datahead(track_df)#> seqnames start end score Type Group#> 1 chr4 61748000 61748000 25 tumor tumor#> 2 chr4 61748001 61749000 24 tumor tumor#> 3 chr4 61749001 61750000 17 tumor tumor#> 4 chr4 61750001 61751000 23 tumor tumor#> 5 chr4 61751001 61752000 14 tumor tumor#> 6 chr4 61752001 61753000 22 tumor tumor
basic_coverage<- ggcoverage(data=track_df,color="grey",mark.region=NULL,range.position="out")basic_coverage
AddGC,ideogram andgene annotations. The plotting of theGC content requires the genome annotation packageBSgenome.Hsapiens.UCSC.hg19
. This package needs to be installedseparately (it is only ‘Suggested’ byggcoverage
).
# load genome datalibrary("BSgenome.Hsapiens.UCSC.hg19")#> Loading required package: BSgenome#> Loading required package: S4Vectors#> Loading required package: stats4#>#> Attaching package: 'S4Vectors'#> The following object is masked from 'package:utils':#>#> findMatches#> The following objects are masked from 'package:base':#>#> expand.grid, I, unname#> Loading required package: IRanges#> Loading required package: GenomeInfoDb#> Loading required package: GenomicRanges#> Loading required package: Biostrings#> Loading required package: XVector#>#> Attaching package: 'Biostrings'#> The following object is masked from 'package:base':#>#> strsplit#> Loading required package: BiocIO#> Loading required package: rtracklayer#>#> Attaching package: 'rtracklayer'#> The following object is masked from 'package:BiocIO':#>#> FileForFormat# create plotbasic_coverage+ geom_gc(bs.fa.seq=BSgenome.Hsapiens.UCSC.hg19)+ geom_gene(gtf.gr=gtf_gr)+ geom_ideogram(genome="hg19")#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.
The DNA-seq data used here are fromGenome-wide copy number analysis ofsingle cells, and theaccession number isSRR054616.
# track filetrack_file<- system.file("extdata","DNA-seq","SRR054616.bw",package="ggcoverage")# load tracktrack_df<- LoadTrackFile(track.file=track_file,format="bw",region="4:1-160000000")#> No metadata provided, returning coverage as is.# add chr prefixtrack_df$seqnames<- paste0("chr",track_df$seqnames)# check datahead(track_df)#> seqnames start end width strand score Type Group#> 1 chr4 1 50000 50000 * 197 SRR054616.bw SRR054616.bw#> 2 chr4 50001 100000 50000 * 598 SRR054616.bw SRR054616.bw#> 3 chr4 100001 150000 50000 * 287 SRR054616.bw SRR054616.bw#> 4 chr4 150001 200000 50000 * 179 SRR054616.bw SRR054616.bw#> 5 chr4 200001 250000 50000 * 282 SRR054616.bw SRR054616.bw#> 6 chr4 250001 300000 50000 * 212 SRR054616.bw SRR054616.bw
basic_coverage<- ggcoverage(data=track_df,color="grey",mark.region=NULL,range.position="out")basic_coverage
# prepare filescnv_file<- system.file("extdata","DNA-seq","SRR054616_copynumber.txt",package="ggcoverage" )# read CNVcnv_df<- read.table(file=cnv_file,sep="\t",header=TRUE)# check datahead(cnv_df)#> chrom chrompos cn.ratio copy.number#> 1 chr4 1 11.518554 5#> 2 chr4 90501 5.648878 5#> 3 chr4 145220 4.031609 5#> 4 chr4 209519 5.005852 5#> 5 chr4 268944 4.874096 5#> 6 chr4 330272 4.605368 5
AddGC,ideogram andCNV annotations.
# create plotbasic_coverage+ geom_gc(bs.fa.seq=BSgenome.Hsapiens.UCSC.hg19)+ geom_cnv(cnv.df=cnv_df,bin.col=3,cn.col=4 )+ geom_ideogram(genome="hg19",plot.space=0,highlight.centromere=TRUE )#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.
# prepare sample metadatasample_meta<-data.frame(SampleName= c("tumorA.chr4.selected"),Type= c("tumorA"),Group= c("tumorA"))# load bam filebam_file<- system.file("extdata","DNA-seq","tumorA.chr4.selected.bam",package="ggcoverage")track_df<- LoadTrackFile(track.file=bam_file,meta.info=sample_meta,single.nuc=TRUE,single.nuc.region="chr4:62474235-62474295")#> No 'region' specified; extracting coverage for an example range#> (<=100,000 bases, first annotated sequence)#> Coverage extracted from sequence/chromosome: chr10head(track_df)#> seqnames start end width strand score Type Group#> 1 chr4 62474235 62474236 1 * 5 tumorA tumorA#> 2 chr4 62474236 62474237 1 * 5 tumorA tumorA#> 3 chr4 62474237 62474238 1 * 5 tumorA tumorA#> 4 chr4 62474238 62474239 1 * 6 tumorA tumorA#> 5 chr4 62474239 62474240 1 * 6 tumorA tumorA#> 6 chr4 62474240 62474241 1 * 6 tumorA tumorA
For base and amino acid annotation, the package comes with the followingdefault color schemes. Color schemes can be changed withnuc.color
andaa.color
parameters.
THe default color scheme for base annotation isClustal-style
, morepopular color schemes are availablehere.
# color schemenuc_color<- c("A"="#ff2b08","C"="#009aff","G"="#ffb507","T"="#00bc0d")# create plotgraphics::image( seq_along(nuc_color),1, as.matrix(seq_along(nuc_color)),col=nuc_color,xlab="",ylab="",xaxt="n",yaxt="n",bty="n")graphics::text(seq_along(nuc_color),1, names(nuc_color))graphics::mtext(text="Base",adj=1,las=1,side=2)
Default color scheme for amino acid annotation is fromResidualcolours: a proposal foraminochromography:
aa_color<- c("D"="#FF0000","S"="#FF2400","T"="#E34234","G"="#FF8000","P"="#F28500","C"="#FFFF00","A"="#FDFF00","V"="#E3FF00","I"="#C0FF00","L"="#89318C","M"="#00FF00","F"="#50C878","Y"="#30D5C8","W"="#00FFFF","H"="#0F2CB3","R"="#0000FF","K"="#4b0082","N"="#800080","Q"="#FF00FF","E"="#8F00FF","*"="#FFC0CB",""="#FFFFFF",""="#FFFFFF",""="#FFFFFF",""="#FFFFFF")graphics::image(1:5,1:5,matrix(seq_along(aa_color),nrow=5),col= rev(aa_color),xlab="",ylab="",xaxt="n",yaxt="n",bty="n")graphics::text(expand.grid(1:5,1:5), names(rev(aa_color)))graphics::mtext(text="Amino acids",adj=1,las=1,side=2)
Use twill to mark position with SNV:
# create plot with twill markggcoverage(data=track_df,color="grey",range.position="out",single.nuc=TRUE,rect.color="white")+ geom_base(bam.file=bam_file,bs.fa.seq=BSgenome.Hsapiens.UCSC.hg19,mark.type="twill" )+ geom_ideogram(genome="hg19",plot.space=0)#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.
Use star to mark position with SNV:
# create plot with star markggcoverage(data=track_df,color="grey",range.position="out",single.nuc=TRUE,rect.color="white")+ geom_base(bam.file=bam_file,bs.fa.seq=BSgenome.Hsapiens.UCSC.hg19,mark.type="star" )+ geom_ideogram(genome="hg19",plot.space=0)#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.
Highlight position with SNV:
# highlight one baseggcoverage(data=track_df,color="grey",range.position="out",single.nuc=TRUE,rect.color="white")+ geom_base(bam.file=bam_file,bs.fa.seq=BSgenome.Hsapiens.UCSC.hg19,mark.type="highlight" )+ geom_ideogram(genome="hg19",plot.space=0)#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.
The ChIP-seq data used here is fromDiffBind.Four samples are selected as examples:Chr18_MCF7_input
,Chr18_MCF7_ER_1
,Chr18_MCF7_ER_3
,Chr18_MCF7_ER_2
, and all bamfiles were converted to bigwig files withdeeptools.
Create metadata:
# load metadatasample_meta<-data.frame(SampleName= c("Chr18_MCF7_ER_1","Chr18_MCF7_ER_2","Chr18_MCF7_ER_3","Chr18_MCF7_input" ),Type= c("MCF7_ER_1","MCF7_ER_2","MCF7_ER_3","MCF7_input"),Group= c("IP","IP","IP","Input"))sample_meta#> SampleName Type Group#> 1 Chr18_MCF7_ER_1 MCF7_ER_1 IP#> 2 Chr18_MCF7_ER_2 MCF7_ER_2 IP#> 3 Chr18_MCF7_ER_3 MCF7_ER_3 IP#> 4 Chr18_MCF7_input MCF7_input Input
Load track files:
# track foldertrack_folder<- system.file("extdata","ChIP-seq",package="ggcoverage")# load bigwig filetrack_df<- LoadTrackFile(track.folder=track_folder,format="bw",region="chr18:76822285-76900000",meta.info=sample_meta)# check datahead(track_df)#> seqnames start end width strand score Type Group#> 1 chr18 76820285 76820400 116 * 219.658005 MCF7_ER_1 IP#> 2 chr18 76820401 76820700 300 * 0.000000 MCF7_ER_1 IP#> 3 chr18 76820701 76821000 300 * 439.316010 MCF7_ER_1 IP#> 4 chr18 76821001 76821300 300 * 219.658005 MCF7_ER_1 IP#> 5 chr18 76821301 76821600 300 * 0.000000 MCF7_ER_1 IP#> 6 chr18 76821601 76821900 300 * 219.658005 MCF7_ER_1 IP
Prepare mark region:
# create mark regionmark_region<-data.frame(start= c(76822533),end= c(76823743),label= c("Promoter"))# check datamark_region#> start end label#> 1 76822533 76823743 Promoter
basic_coverage<- ggcoverage(data=track_df,mark.region=mark_region,show.mark.label=FALSE)basic_coverage
Addgene,ideogram andpeak annotations. To create peakannotation, we firstget consensus peaks withMSPC.
# get consensus peak filepeak_file<- system.file("extdata","ChIP-seq","consensus.peak",package="ggcoverage")basic_coverage+ geom_gene(gtf.gr=gtf_gr)+ geom_peak(bed.file=peak_file)+ geom_ideogram(genome="hg19",plot.space=0)#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.
The Hi-C method maps chromosome contacts in eukaryotic cells. For thispurpose, DNA and protein complexes are cross-linked and DNA fragmentsthen purified. As a result, even distant chromatin fragments can befound to interact due to the spatial organization of the DNA andhistones in the cell. Hi-C data shows these interactions for example asa contact map.
The Hi-C data is taken frompyGenomeTracks: reproducible plots formultivariate genomicdatasets.
The Hi-C matrix visualization is implemented byHiCBricks
. This packageneeds to be installed separately (it is only ‘Suggested’ byggcoverage
).
# prepare track dataframetrack_file<- system.file("extdata","HiC","H3K36me3.bw",package="ggcoverage")track_df<- LoadTrackFile(track.file=track_file,format="bw",region="chr2L:8050000-8300000",extend=0)#> No metadata provided, returning coverage as is.track_df$score<- ifelse(track_df$score<0,0,track_df$score)# check the datahead(track_df)#> seqnames start end width strand score Type Group#> 1 chr2L 8050000 8050009 10 * 1.66490245 H3K36me3.bw H3K36me3.bw#> 2 chr2L 8050015 8050049 35 * 1.59976900 H3K36me3.bw H3K36me3.bw#> 3 chr2L 8050057 8050091 35 * 1.60730922 H3K36me3.bw H3K36me3.bw#> 4 chr2L 8050097 8050131 35 * 1.65555012 H3K36me3.bw H3K36me3.bw#> 5 chr2L 8050137 8050171 35 * 1.71025538 H3K36me3.bw H3K36me3.bw#> 6 chr2L 8050176 8050210 35 * 1.75198197 H3K36me3.bw H3K36me3.bw
Matrix:
## matrixhic_mat_file<- system.file("extdata","HiC","HiC_mat.txt",package="ggcoverage")hic_mat<- read.table(file=hic_mat_file,sep="\t")hic_mat<- as.matrix(hic_mat)
Bin table:
## binhic_bin_file<- system.file("extdata","HiC","HiC_bin.txt",package="ggcoverage")hic_bin<- read.table(file=hic_bin_file,sep="\t")colnames(hic_bin)<- c("chr","start","end")hic_bin_gr<-GenomicRanges::makeGRangesFromDataFrame(df=hic_bin)## transfrom funcfailsafe_log10<-function(x) {x[is.na(x)| is.nan(x)| is.infinite(x)]<-0return(log10(x+1))}
Data transfromation method:
# prepare arcslink_file<- system.file("extdata","HiC","HiC_link.bedpe",package="ggcoverage")
basic_coverage<- ggcoverage(data=track_df,color="grey",mark.region=NULL,range.position="out" )basic_coverage
Addlink,contact mapannotations:
library(HiCBricks)#> Loading required package: curl#> Using libcurl 7.81.0 with OpenSSL/3.0.2#> Loading required package: rhdf5#> Loading required package: R6#> Loading required package: grid#>#> Attaching package: 'grid'#> The following object is masked from 'package:Biostrings':#>#> patternbasic_coverage+ geom_tad(matrix=hic_mat,granges=hic_bin_gr,value.cut=0.99,color.palette="viridis",transform.fun=failsafe_log10,top=FALSE,show.rect=TRUE )+ geom_link(link.file=link_file,file.type="bedpe",show.rect=TRUE )#> Read 534 lines after Skipping 0 lines#> Inserting Data at location: 1#> Data length: 534#> Loaded 2315864 bytes of data...#> Read 534 records...#> Scale for y is already present.#> Adding another scale for y, which will replace the existing scale.#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.
Massspectrometry(MS) is an important method for the accurate mass determination andcharacterization of proteins, and a variety of methods and instrumentshave been developed for its many uses. Withggcoverage
, we can easilyinspect the peptide coverage of a protein in order to learn about thequality of the data.
The exported coverage fromProteomeDiscoverer:
library(openxlsx)# prepare coverage dataframecoverage_file<- system.file("extdata","Proteomics","MS_BSA_coverage.xlsx",package="ggcoverage" )coverage_df<-openxlsx::read.xlsx(coverage_file,sheet="Sheet1")# check the datahead(coverage_df)#> Confidence Annotated.Sequence#> 1 High [K].ATEEQLKTVMENFVAFVDKCCAADDKEACFAVEGPK.[L]#> 2 High [K].ATEEQLKTVMENFVAFVDKCCAADDKEACFAVEGPK.[L]#> 3 High [K].TVMENFVAFVDKCCAADDKEACFAVEGPK.[L]#> 4 High [K].HLVDEPQNLIKQNCDQFEKLGEYGFQNALIVR.[Y]#> 5 High [R].RHPYFYAPELLYYANKYNGVFQECCQAEDKGACLLPK.[I]#> 6 High [K].AFDEKLFTFHADICTLPDTEKQIKK.[Q]#> Modifications Contaminant#> 1 3xCarbamidomethyl [C20; C21; C29] TRUE#> 2 3xCarbamidomethyl [C20; C21; C29]; 1xOxidation [M10] TRUE#> 3 3xCarbamidomethyl [C13; C14; C22]; 1xOxidation [M3] TRUE#> 4 1xCarbamidomethyl [C14] TRUE#> 5 3xCarbamidomethyl [C24; C25; C33] TRUE#> 6 1xCarbamidomethyl [C14] TRUE#> #.Protein.Groups #.Proteins #.PSMs Master.Protein.Accessions#> 1 1 2 15 ALBU_BOVIN#> 2 1 2 26 ALBU_BOVIN#> 3 1 2 14 ALBU_BOVIN#> 4 1 2 41 ALBU_BOVIN#> 5 1 2 37 ALBU_BOVIN#> 6 1 2 40 ALBU_BOVIN#> Positions.in.Master.Proteins Modifications.in.Master.Proteins#> 1 ALBU_BOVIN [562-597] NA#> 2 ALBU_BOVIN [562-597] NA#> 3 ALBU_BOVIN [569-597] NA#> 4 ALBU_BOVIN [402-433] NA#> 5 ALBU_BOVIN [168-204] NA#> 6 ALBU_BOVIN [524-548] NA#> #.Missed.Cleavages Theo..MH+.[Da] Abundance:.F3:.Sample Quan.Info#> 1 3 4107.88065 18692597.5 <NA>#> 2 3 4123.87556 87767162.0 <NA>#> 3 2 3324.46798 19803927.2 <NA>#> 4 2 3815.91737 204933705.0 <NA>#> 5 3 4513.12024 57012156.5 <NA>#> 6 3 2995.52337 183934556.7 <NA>#> Found.in.Sample:.[S3].F3:.Sample Confidence.(by.Search.Engine):.Sequest.HT#> 1 High High#> 2 High High#> 3 High High#> 4 High High#> 5 High High#> 6 High High#> XCorr.(by.Search.Engine):.Sequest.HT Top.Apex.RT.[min]#> 1 11.96 97.50#> 2 10.91 90.09#> 3 9.89 84.90#> 4 9.75 91.84#> 5 8.94 93.30#> 6 8.90 75.40
The input protein fasta:
fasta_file<- system.file("extdata","Proteomics","MS_BSA_coverage.fasta",package="ggcoverage" )# prepare track dataframeprotein_set<-Biostrings::readAAStringSet(fasta_file)# check the dataprotein_set#> AAStringSet object of length 2:#> width seq names#> [1] 607 MKWVTFISLLLLFSSAYSRGVFR...DDKEACFAVEGPKLVVSTQTALA sp|P02769|ALBU_BOVIN#> [2] 583 DTHKSEIAHRFKDLGEEHFKGLV...DDKEACFAVEGPKLVVSTQTALA decoy
protein_coverage<- ggprotein(coverage.df=coverage_df,fasta.file=fasta_file,protein.id="sp|P02769|ALBU_BOVIN",range.position="out")protein_coverage
We can obtain features of the protein fromUniProt. For example, the above proteincoverage plot shows that there is empty region in 1-24, and this emptyregion inUniProt isannotated as Signal peptide and Propeptide peptide. When the protein ismature and released extracellular, these peptides will be cleaved. Thisis the reason why there is empty region in 1-24.
# protein feature obtained from UniProtprotein_feature_df<-data.frame(ProteinID="sp|P02769|ALBU_BOVIN",start= c(1,19,25),end= c(18,24,607),Type= c("Signal","Propeptide","Chain"))# add annotationprotein_coverage+ geom_feature(feature.df=protein_feature_df,feature.color= c("#4d81be","#173b5e","#6a521d") )
Please note that theggcoverage
project is released with aContributor Code ofConduct.By contributing to this project, you agree to abide by its terms.
About
Visualize and annotate genomic coverage with ggplot2