showteeth/ggcoveragePublic

NotificationsYou must be signed in to change notification settings
Fork21
Star261

Visualize and annotate genomic coverage with ggplot2

License

Unknown, MIT licenses found

Licenses found

261 stars 21 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 111 Commits
.github/workflows		.github/workflows
R		R
docs		docs
inst/extdata		inst/extdata
man		man
vignettes		vignettes
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
LICENSE.md		LICENSE.md
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.Rmd		README.Rmd
README.md		README.md
_pkgdown.yml		_pkgdown.yml
cran-comments.md		cran-comments.md
ggcoverage.Rproj		ggcoverage.Rproj

Repository files navigation

ggcoverage - Visualize and annotate omics coverage with ggplot2

Introduction

The goal ofggcoverage is to visualize coverage tracks from genomics,transcriptomics or proteomics data. It contains functions to load datafrom BAM, BigWig, BedGraph, txt, or xlsx files, create genome/proteincoverage plots, and add various annotations including base and aminoacid composition, GC content, copy number variation (CNV), genes,transcripts, ideograms, peak highlights, HiC contact maps, contact linksand protein features. It is based on and integrates well withggplot2.

It contains three main parts:

Load the data:ggcoverage can load BAM, BigWig (.bw), BedGraph,txt/xlsx files from various omics data, including WGS, RNA-seq,ChIP-seq, ATAC-seq, proteomics, et al.
Create omics coverage plot
Add annotations:ggcoverage supports six different annotations:
- base and amino acid annotation: Visualize genome coverage atsingle-nucleotide level with bases and amino acids.
- GC annotation: Visualize genome coverage with GC content
- CNV annotation: Visualize genome coverage with copy numbervariation (CNV)
- gene annotation: Visualize genome coverage across genes
- transcription annotation: Visualize genome coverage acrossdifferent transcripts
- ideogram annotation: Visualize the region showing on wholechromosome
- peak annotation: Visualize genome coverage and peak identified
- contact map annotation: Visualize genome coverage with Hi-Ccontact map
- link annotation: Visualize genome coverage with contacts
- peotein feature annotation: Visualize protein coverage withfeatures

Installation

ggcoverage is an R package distributed as part of theCRANrepository. To install the package, startR and enter one of the following commands:

# install via CRAN (not yet available)install.packages("ggcoverage")# OR install via Githubinstall.package("remotes")remotes::install_github("showteeth/ggcoverage")

In general, it isrecommended to install from theGithubrepository (updated moreregularly).

Onceggcoverage is installed, it can be loaded like every otherpackage:

library("ggcoverage")

Manual

ggcoverage provides twovignettes:

detailed manual: step-by-step usage
customize the plot: customize the plot and add additional layers

RNA-seq data

Load the data

The RNA-seq data used here is fromTranscription profiling by highthroughput sequencing of HNRNPC knockdown and control HeLacells.We select four samples to use as example:ERR127307_chr14,ERR127306_chr14,ERR127303_chr14,ERR127302_chr14, and all bamfiles were converted to bigwig files withdeeptools.

Load metadata:

# load metadatameta_file<-  system.file("extdata","RNA-seq","meta_info.csv",package="ggcoverage")sample_meta<- read.csv(meta_file)sample_meta#>        SampleName    Type Group#> 1 ERR127302_chr14 KO_rep1    KO#> 2 ERR127303_chr14 KO_rep2    KO#> 3 ERR127306_chr14 WT_rep1    WT#> 4 ERR127307_chr14 WT_rep2    WT

Load track files:

# track foldertrack_folder<- system.file("extdata","RNA-seq",package="ggcoverage")# load bigwig filetrack_df<- LoadTrackFile(track.folder=track_folder,format="bw",region="chr14:21,677,306-21,737,601",extend=2000,meta.info=sample_meta)# check datahead(track_df)#>   seqnames    start      end width strand score    Type Group#> 1    chr14 21675306 21675950   645      *     0 KO_rep1    KO#> 2    chr14 21675951 21676000    50      *     1 KO_rep1    KO#> 3    chr14 21676001 21676100   100      *     2 KO_rep1    KO#> 4    chr14 21676101 21676150    50      *     1 KO_rep1    KO#> 5    chr14 21676151 21677100   950      *     0 KO_rep1    KO#> 6    chr14 21677101 21677200   100      *     2 KO_rep1    KO

Prepare mark region:

# create mark regionmark_region<-data.frame(start= c(21678900,21732001,21737590),end= c(21679900,21732400,21737650),label= c("M1","M2","M3"))# check datamark_region#>      start      end label#> 1 21678900 21679900    M1#> 2 21732001 21732400    M2#> 3 21737590 21737650    M3

Load GTF

To addgene annotation, the gtf file should containgene_typeandgene_name attributes incolumn 9; to addtranscriptannotation, the gtf file should contain atranscript_nameattribute incolumn 9.

gtf_file<-  system.file("extdata","used_hg19.gtf",package="ggcoverage")gtf_gr<-rtracklayer::import.gff(con=gtf_file,format="gtf")

Basic coverage

The basic coverage plot hastwo types:

facet: Create subplot for every track (specified byfacet.key).This is default.
joint: Visualize all tracks in a single plot.

joint view

Create line plot forevery sample (facet.key = "Type") and colorbyevery sample (group.key = "Type"):

basic_coverage<- ggcoverage(data=track_df,plot.type="joint",facet.key="Type",group.key="Type",mark.region=mark_region,range.position="out")basic_coverage

Creategroup average line plot (sample is indicated byfacet.key = "Type", group is indicated bygroup.key = "Group"):

basic_coverage<- ggcoverage(data=track_df,plot.type="joint",facet.key="Type",group.key="Group",joint.avg=TRUE,mark.region=mark_region,range.position="out")basic_coverage

Facet view

basic_coverage<- ggcoverage(data=track_df,plot.type="facet",mark.region=mark_region,range.position="out")basic_coverage

Custom Y-axis style

Change the Y-axis scale label in/out of plot region withrange.position:

basic_coverage<- ggcoverage(data=track_df,plot.type="facet",mark.region=mark_region,range.position="in")basic_coverage

Shared/Free Y-axis scale withfacet.y.scale:

basic_coverage<- ggcoverage(data=track_df,plot.type="facet",mark.region=mark_region,range.position="in",facet.y.scale="fixed")basic_coverage

Add gene annotation

default behavior is to draw genes (transcripts), exons and UTRs withdifferent line width
can bec adjusted usinggene.size,exon.size andutr.sizeparameters
frequency of intermittent arrows (light color) can be adjusted usingthearrow.num andarrow.gap parameters
genomic features are colored bystrand by default, which can bechanged using thecolor.by parameter

basic_coverage+  geom_gene(gtf.gr=gtf_gr)

Add transcript annotation

In “loose” style (default style; each transcript occupies one line):

basic_coverage+  geom_transcript(gtf.gr=gtf_gr,label.vjust=1.5)

In “tight” style (attempted to place non-overlapping transcripts inone line):

basic_coverage+  geom_transcript(gtf.gr=gtf_gr,overlap.style="tight",label.vjust=1.5  )

Add ideogram

The ideogram is an overview plot about the respective position on achromosome. The plotting of the ideogram is implemented by theggbiopackage. This package needs to be installed separately (it is only‘Suggested’ byggcoverage).

library(ggbio)#> Loading required package: BiocGenerics#>#> Attaching package: 'BiocGenerics'#> The following objects are masked from 'package:stats':#>#>     IQR, mad, sd, var, xtabs#> The following objects are masked from 'package:base':#>#>     anyDuplicated, aperm, append, as.data.frame, basename, cbind,#>     colnames, dirname, do.call, duplicated, eval, evalq, Filter, Find,#>     get, grep, grepl, intersect, is.unsorted, lapply, Map, mapply,#>     match, mget, order, paste, pmax, pmax.int, pmin, pmin.int,#>     Position, rank, rbind, Reduce, rownames, sapply, setdiff, table,#>     tapply, union, unique, unsplit, which.max, which.min#> Loading required package: ggplot2#> Registered S3 method overwritten by 'GGally':#>   method from#>   +.gg   ggplot2#> Need specific help about ggbio? try mailing#>  the maintainer or visit https://lawremi.github.io/ggbio/#>#> Attaching package: 'ggbio'#> The following objects are masked from 'package:ggplot2':#>#>     geom_bar, geom_rect, geom_segment, ggsave, stat_bin, stat_identity,#>     xlimbasic_coverage+  geom_gene(gtf.gr=gtf_gr)+  geom_ideogram(genome="hg19",plot.space=0)#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.

basic_coverage+  geom_transcript(gtf.gr=gtf_gr,label.vjust=1.5)+  geom_ideogram(genome="hg19",plot.space=0)#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.

DNA-seq data

CNV

Example 1

Load the data

The DNA-seq data used here are fromCopy number workflow,we select tumor sample, and get bin counts withcn.mops::getReadCountsFromBAM withWL 1000.

# prepare metafilecnv_meta_info<-data.frame(SampleName= c("CNV_example"),Type= c("tumor"),Group= c("tumor"))# track filetrack_file<- system.file("extdata","DNA-seq","CNV_example.txt",package="ggcoverage")# load txt filetrack_df<- LoadTrackFile(track.file=track_file,format="txt",region="chr4:61750000-62,700,000",meta.info=cnv_meta_info)# check datahead(track_df)#>   seqnames    start      end score  Type Group#> 1     chr4 61748000 61748000    25 tumor tumor#> 2     chr4 61748001 61749000    24 tumor tumor#> 3     chr4 61749001 61750000    17 tumor tumor#> 4     chr4 61750001 61751000    23 tumor tumor#> 5     chr4 61751001 61752000    14 tumor tumor#> 6     chr4 61752001 61753000    22 tumor tumor

Basic coverage

basic_coverage<- ggcoverage(data=track_df,color="grey",mark.region=NULL,range.position="out")basic_coverage

Add GC annotations

AddGC,ideogram andgene annotations. The plotting of theGC content requires the genome annotation packageBSgenome.Hsapiens.UCSC.hg19. This package needs to be installedseparately (it is only ‘Suggested’ byggcoverage).

# load genome datalibrary("BSgenome.Hsapiens.UCSC.hg19")#> Loading required package: BSgenome#> Loading required package: S4Vectors#> Loading required package: stats4#>#> Attaching package: 'S4Vectors'#> The following object is masked from 'package:utils':#>#>     findMatches#> The following objects are masked from 'package:base':#>#>     expand.grid, I, unname#> Loading required package: IRanges#> Loading required package: GenomeInfoDb#> Loading required package: GenomicRanges#> Loading required package: Biostrings#> Loading required package: XVector#>#> Attaching package: 'Biostrings'#> The following object is masked from 'package:base':#>#>     strsplit#> Loading required package: BiocIO#> Loading required package: rtracklayer#>#> Attaching package: 'rtracklayer'#> The following object is masked from 'package:BiocIO':#>#>     FileForFormat# create plotbasic_coverage+  geom_gc(bs.fa.seq=BSgenome.Hsapiens.UCSC.hg19)+  geom_gene(gtf.gr=gtf_gr)+  geom_ideogram(genome="hg19")#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.

Example 2

Load the data

The DNA-seq data used here are fromGenome-wide copy number analysis ofsingle cells, and theaccession number isSRR054616.

# track filetrack_file<-  system.file("extdata","DNA-seq","SRR054616.bw",package="ggcoverage")# load tracktrack_df<- LoadTrackFile(track.file=track_file,format="bw",region="4:1-160000000")#> No metadata provided, returning coverage as is.# add chr prefixtrack_df$seqnames<- paste0("chr",track_df$seqnames)# check datahead(track_df)#>   seqnames  start    end width strand score         Type        Group#> 1     chr4      1  50000 50000      *   197 SRR054616.bw SRR054616.bw#> 2     chr4  50001 100000 50000      *   598 SRR054616.bw SRR054616.bw#> 3     chr4 100001 150000 50000      *   287 SRR054616.bw SRR054616.bw#> 4     chr4 150001 200000 50000      *   179 SRR054616.bw SRR054616.bw#> 5     chr4 200001 250000 50000      *   282 SRR054616.bw SRR054616.bw#> 6     chr4 250001 300000 50000      *   212 SRR054616.bw SRR054616.bw

Basic coverage

basic_coverage<- ggcoverage(data=track_df,color="grey",mark.region=NULL,range.position="out")basic_coverage

Load CNV file

# prepare filescnv_file<-  system.file("extdata","DNA-seq","SRR054616_copynumber.txt",package="ggcoverage"  )# read CNVcnv_df<- read.table(file=cnv_file,sep="\t",header=TRUE)# check datahead(cnv_df)#>   chrom chrompos  cn.ratio copy.number#> 1  chr4        1 11.518554           5#> 2  chr4    90501  5.648878           5#> 3  chr4   145220  4.031609           5#> 4  chr4   209519  5.005852           5#> 5  chr4   268944  4.874096           5#> 6  chr4   330272  4.605368           5

Add annotations

AddGC,ideogram andCNV annotations.

# create plotbasic_coverage+  geom_gc(bs.fa.seq=BSgenome.Hsapiens.UCSC.hg19)+  geom_cnv(cnv.df=cnv_df,bin.col=3,cn.col=4  )+  geom_ideogram(genome="hg19",plot.space=0,highlight.centromere=TRUE  )#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.

Single-nucleotide level

Load the data

# prepare sample metadatasample_meta<-data.frame(SampleName= c("tumorA.chr4.selected"),Type= c("tumorA"),Group= c("tumorA"))# load bam filebam_file<- system.file("extdata","DNA-seq","tumorA.chr4.selected.bam",package="ggcoverage")track_df<- LoadTrackFile(track.file=bam_file,meta.info=sample_meta,single.nuc=TRUE,single.nuc.region="chr4:62474235-62474295")#> No 'region' specified; extracting coverage for an example range#> (<=100,000 bases, first annotated sequence)#> Coverage extracted from sequence/chromosome: chr10head(track_df)#>   seqnames    start      end width strand score   Type  Group#> 1     chr4 62474235 62474236     1      *     5 tumorA tumorA#> 2     chr4 62474236 62474237     1      *     5 tumorA tumorA#> 3     chr4 62474237 62474238     1      *     5 tumorA tumorA#> 4     chr4 62474238 62474239     1      *     6 tumorA tumorA#> 5     chr4 62474239 62474240     1      *     6 tumorA tumorA#> 6     chr4 62474240 62474241     1      *     6 tumorA tumorA

Default color scheme

For base and amino acid annotation, the package comes with the followingdefault color schemes. Color schemes can be changed withnuc.color andaa.color parameters.

THe default color scheme for base annotation isClustal-style, morepopular color schemes are availablehere.

# color schemenuc_color<- c("A"="#ff2b08","C"="#009aff","G"="#ffb507","T"="#00bc0d")# create plotgraphics::image(  seq_along(nuc_color),1,  as.matrix(seq_along(nuc_color)),col=nuc_color,xlab="",ylab="",xaxt="n",yaxt="n",bty="n")graphics::text(seq_along(nuc_color),1, names(nuc_color))graphics::mtext(text="Base",adj=1,las=1,side=2)

Default color scheme for amino acid annotation is fromResidualcolours: a proposal foraminochromography:

aa_color<- c("D"="#FF0000","S"="#FF2400","T"="#E34234","G"="#FF8000","P"="#F28500","C"="#FFFF00","A"="#FDFF00","V"="#E3FF00","I"="#C0FF00","L"="#89318C","M"="#00FF00","F"="#50C878","Y"="#30D5C8","W"="#00FFFF","H"="#0F2CB3","R"="#0000FF","K"="#4b0082","N"="#800080","Q"="#FF00FF","E"="#8F00FF","*"="#FFC0CB",""="#FFFFFF",""="#FFFFFF",""="#FFFFFF",""="#FFFFFF")graphics::image(1:5,1:5,matrix(seq_along(aa_color),nrow=5),col= rev(aa_color),xlab="",ylab="",xaxt="n",yaxt="n",bty="n")graphics::text(expand.grid(1:5,1:5), names(rev(aa_color)))graphics::mtext(text="Amino acids",adj=1,las=1,side=2)

Add base and amino acid annotation

Use twill to mark position with SNV:

# create plot with twill markggcoverage(data=track_df,color="grey",range.position="out",single.nuc=TRUE,rect.color="white")+  geom_base(bam.file=bam_file,bs.fa.seq=BSgenome.Hsapiens.UCSC.hg19,mark.type="twill"  )+  geom_ideogram(genome="hg19",plot.space=0)#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.

Use star to mark position with SNV:

# create plot with star markggcoverage(data=track_df,color="grey",range.position="out",single.nuc=TRUE,rect.color="white")+  geom_base(bam.file=bam_file,bs.fa.seq=BSgenome.Hsapiens.UCSC.hg19,mark.type="star"  )+  geom_ideogram(genome="hg19",plot.space=0)#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.

Highlight position with SNV:

# highlight one baseggcoverage(data=track_df,color="grey",range.position="out",single.nuc=TRUE,rect.color="white")+  geom_base(bam.file=bam_file,bs.fa.seq=BSgenome.Hsapiens.UCSC.hg19,mark.type="highlight"  )+  geom_ideogram(genome="hg19",plot.space=0)#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.

ChIP-seq data

The ChIP-seq data used here is fromDiffBind.Four samples are selected as examples:Chr18_MCF7_input,Chr18_MCF7_ER_1,Chr18_MCF7_ER_3,Chr18_MCF7_ER_2, and all bamfiles were converted to bigwig files withdeeptools.

Create metadata:

# load metadatasample_meta<-data.frame(SampleName= c("Chr18_MCF7_ER_1","Chr18_MCF7_ER_2","Chr18_MCF7_ER_3","Chr18_MCF7_input"  ),Type= c("MCF7_ER_1","MCF7_ER_2","MCF7_ER_3","MCF7_input"),Group= c("IP","IP","IP","Input"))sample_meta#>         SampleName       Type Group#> 1  Chr18_MCF7_ER_1  MCF7_ER_1    IP#> 2  Chr18_MCF7_ER_2  MCF7_ER_2    IP#> 3  Chr18_MCF7_ER_3  MCF7_ER_3    IP#> 4 Chr18_MCF7_input MCF7_input Input

Load track files:

# track foldertrack_folder<- system.file("extdata","ChIP-seq",package="ggcoverage")# load bigwig filetrack_df<- LoadTrackFile(track.folder=track_folder,format="bw",region="chr18:76822285-76900000",meta.info=sample_meta)# check datahead(track_df)#>   seqnames    start      end width strand      score      Type Group#> 1    chr18 76820285 76820400   116      * 219.658005 MCF7_ER_1    IP#> 2    chr18 76820401 76820700   300      *   0.000000 MCF7_ER_1    IP#> 3    chr18 76820701 76821000   300      * 439.316010 MCF7_ER_1    IP#> 4    chr18 76821001 76821300   300      * 219.658005 MCF7_ER_1    IP#> 5    chr18 76821301 76821600   300      *   0.000000 MCF7_ER_1    IP#> 6    chr18 76821601 76821900   300      * 219.658005 MCF7_ER_1    IP

Prepare mark region:

# create mark regionmark_region<-data.frame(start= c(76822533),end= c(76823743),label= c("Promoter"))# check datamark_region#>      start      end    label#> 1 76822533 76823743 Promoter

Basic coverage

basic_coverage<- ggcoverage(data=track_df,mark.region=mark_region,show.mark.label=FALSE)basic_coverage

Add annotations

Addgene,ideogram andpeak annotations. To create peakannotation, we firstget consensus peaks withMSPC.

# get consensus peak filepeak_file<- system.file("extdata","ChIP-seq","consensus.peak",package="ggcoverage")basic_coverage+  geom_gene(gtf.gr=gtf_gr)+  geom_peak(bed.file=peak_file)+  geom_ideogram(genome="hg19",plot.space=0)#> Loading ideogram...#> Loading ranges...#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.

Hi-C data

The Hi-C method maps chromosome contacts in eukaryotic cells. For thispurpose, DNA and protein complexes are cross-linked and DNA fragmentsthen purified. As a result, even distant chromatin fragments can befound to interact due to the spatial organization of the DNA andhistones in the cell. Hi-C data shows these interactions for example asa contact map.

The Hi-C data is taken frompyGenomeTracks: reproducible plots formultivariate genomicdatasets.

The Hi-C matrix visualization is implemented byHiCBricks. This packageneeds to be installed separately (it is only ‘Suggested’ byggcoverage).

Load track data

# prepare track dataframetrack_file<-  system.file("extdata","HiC","H3K36me3.bw",package="ggcoverage")track_df<- LoadTrackFile(track.file=track_file,format="bw",region="chr2L:8050000-8300000",extend=0)#> No metadata provided, returning coverage as is.track_df$score<- ifelse(track_df$score<0,0,track_df$score)# check the datahead(track_df)#>   seqnames   start     end width strand      score        Type       Group#> 1    chr2L 8050000 8050009    10      * 1.66490245 H3K36me3.bw H3K36me3.bw#> 2    chr2L 8050015 8050049    35      * 1.59976900 H3K36me3.bw H3K36me3.bw#> 3    chr2L 8050057 8050091    35      * 1.60730922 H3K36me3.bw H3K36me3.bw#> 4    chr2L 8050097 8050131    35      * 1.65555012 H3K36me3.bw H3K36me3.bw#> 5    chr2L 8050137 8050171    35      * 1.71025538 H3K36me3.bw H3K36me3.bw#> 6    chr2L 8050176 8050210    35      * 1.75198197 H3K36me3.bw H3K36me3.bw

Load Hi-C data

Matrix:

## matrixhic_mat_file<- system.file("extdata","HiC","HiC_mat.txt",package="ggcoverage")hic_mat<- read.table(file=hic_mat_file,sep="\t")hic_mat<- as.matrix(hic_mat)

Bin table:

## binhic_bin_file<-  system.file("extdata","HiC","HiC_bin.txt",package="ggcoverage")hic_bin<- read.table(file=hic_bin_file,sep="\t")colnames(hic_bin)<- c("chr","start","end")hic_bin_gr<-GenomicRanges::makeGRangesFromDataFrame(df=hic_bin)## transfrom funcfailsafe_log10<-function(x) {x[is.na(x)| is.nan(x)| is.infinite(x)]<-0return(log10(x+1))}

Data transfromation method:

Load link

# prepare arcslink_file<-  system.file("extdata","HiC","HiC_link.bedpe",package="ggcoverage")

Basic coverage

basic_coverage<-  ggcoverage(data=track_df,color="grey",mark.region=NULL,range.position="out"  )basic_coverage

Add annotations

Addlink,contact mapannotations:

library(HiCBricks)#> Loading required package: curl#> Using libcurl 7.81.0 with OpenSSL/3.0.2#> Loading required package: rhdf5#> Loading required package: R6#> Loading required package: grid#>#> Attaching package: 'grid'#> The following object is masked from 'package:Biostrings':#>#>     patternbasic_coverage+  geom_tad(matrix=hic_mat,granges=hic_bin_gr,value.cut=0.99,color.palette="viridis",transform.fun=failsafe_log10,top=FALSE,show.rect=TRUE  )+  geom_link(link.file=link_file,file.type="bedpe",show.rect=TRUE  )#> Read 534 lines after Skipping 0 lines#> Inserting Data at location: 1#> Data length: 534#> Loaded 2315864 bytes of data...#> Read 534 records...#> Scale for y is already present.#> Adding another scale for y, which will replace the existing scale.#> Scale for x is already present.#> Adding another scale for x, which will replace the existing scale.

Mass spectrometry protein coverage

Massspectrometry(MS) is an important method for the accurate mass determination andcharacterization of proteins, and a variety of methods and instrumentshave been developed for its many uses. Withggcoverage, we can easilyinspect the peptide coverage of a protein in order to learn about thequality of the data.

Load coverage

The exported coverage fromProteomeDiscoverer:

library(openxlsx)# prepare coverage dataframecoverage_file<-  system.file("extdata","Proteomics","MS_BSA_coverage.xlsx",package="ggcoverage"  )coverage_df<-openxlsx::read.xlsx(coverage_file,sheet="Sheet1")# check the datahead(coverage_df)#>   Confidence                            Annotated.Sequence#> 1       High  [K].ATEEQLKTVMENFVAFVDKCCAADDKEACFAVEGPK.[L]#> 2       High  [K].ATEEQLKTVMENFVAFVDKCCAADDKEACFAVEGPK.[L]#> 3       High         [K].TVMENFVAFVDKCCAADDKEACFAVEGPK.[L]#> 4       High      [K].HLVDEPQNLIKQNCDQFEKLGEYGFQNALIVR.[Y]#> 5       High [R].RHPYFYAPELLYYANKYNGVFQECCQAEDKGACLLPK.[I]#> 6       High             [K].AFDEKLFTFHADICTLPDTEKQIKK.[Q]#>                                          Modifications Contaminant#> 1                    3xCarbamidomethyl [C20; C21; C29]        TRUE#> 2 3xCarbamidomethyl [C20; C21; C29]; 1xOxidation [M10]        TRUE#> 3  3xCarbamidomethyl [C13; C14; C22]; 1xOxidation [M3]        TRUE#> 4                              1xCarbamidomethyl [C14]        TRUE#> 5                    3xCarbamidomethyl [C24; C25; C33]        TRUE#> 6                              1xCarbamidomethyl [C14]        TRUE#>   #.Protein.Groups #.Proteins #.PSMs Master.Protein.Accessions#> 1                1          2     15                ALBU_BOVIN#> 2                1          2     26                ALBU_BOVIN#> 3                1          2     14                ALBU_BOVIN#> 4                1          2     41                ALBU_BOVIN#> 5                1          2     37                ALBU_BOVIN#> 6                1          2     40                ALBU_BOVIN#>   Positions.in.Master.Proteins Modifications.in.Master.Proteins#> 1         ALBU_BOVIN [562-597]                               NA#> 2         ALBU_BOVIN [562-597]                               NA#> 3         ALBU_BOVIN [569-597]                               NA#> 4         ALBU_BOVIN [402-433]                               NA#> 5         ALBU_BOVIN [168-204]                               NA#> 6         ALBU_BOVIN [524-548]                               NA#>   #.Missed.Cleavages Theo..MH+.[Da] Abundance:.F3:.Sample Quan.Info#> 1                  3     4107.88065            18692597.5      <NA>#> 2                  3     4123.87556            87767162.0      <NA>#> 3                  2     3324.46798            19803927.2      <NA>#> 4                  2     3815.91737           204933705.0      <NA>#> 5                  3     4513.12024            57012156.5      <NA>#> 6                  3     2995.52337           183934556.7      <NA>#>   Found.in.Sample:.[S3].F3:.Sample Confidence.(by.Search.Engine):.Sequest.HT#> 1                             High                                      High#> 2                             High                                      High#> 3                             High                                      High#> 4                             High                                      High#> 5                             High                                      High#> 6                             High                                      High#>   XCorr.(by.Search.Engine):.Sequest.HT Top.Apex.RT.[min]#> 1                                11.96             97.50#> 2                                10.91             90.09#> 3                                 9.89             84.90#> 4                                 9.75             91.84#> 5                                 8.94             93.30#> 6                                 8.90             75.40

The input protein fasta:

fasta_file<-  system.file("extdata","Proteomics","MS_BSA_coverage.fasta",package="ggcoverage"  )# prepare track dataframeprotein_set<-Biostrings::readAAStringSet(fasta_file)# check the dataprotein_set#> AAStringSet object of length 2:#>     width seq                                               names#> [1]   607 MKWVTFISLLLLFSSAYSRGVFR...DDKEACFAVEGPKLVVSTQTALA sp|P02769|ALBU_BOVIN#> [2]   583 DTHKSEIAHRFKDLGEEHFKGLV...DDKEACFAVEGPKLVVSTQTALA decoy

Protein coverage

protein_coverage<- ggprotein(coverage.df=coverage_df,fasta.file=fasta_file,protein.id="sp|P02769|ALBU_BOVIN",range.position="out")protein_coverage

Add annotation

We can obtain features of the protein fromUniProt. For example, the above proteincoverage plot shows that there is empty region in 1-24, and this emptyregion inUniProt isannotated as Signal peptide and Propeptide peptide. When the protein ismature and released extracellular, these peptides will be cleaved. Thisis the reason why there is empty region in 1-24.

# protein feature obtained from UniProtprotein_feature_df<-data.frame(ProteinID="sp|P02769|ALBU_BOVIN",start= c(1,19,25),end= c(18,24,607),Type= c("Signal","Propeptide","Chain"))# add annotationprotein_coverage+  geom_feature(feature.df=protein_feature_df,feature.color= c("#4d81be","#173b5e","#6a521d")  )

Code of Conduct

Please note that theggcoverage project is released with aContributor Code ofConduct.By contributing to this project, you agree to abide by its terms.

About

Visualize and annotate genomic coverage with ggplot2

showteeth.github.io/ggcoverage

Topics

visualization coverage ggplot2 gene track transcripts peak ideogram gc-content base-frequency genome-coverage

Resources

Readme

License

Unknown, MIT licenses found

Releases3

v1.3.0 Latest

Apr 19, 2024

+ 2 releases

Packages

No packages published

Contributors3

Languages

R100.0%

Movatterモバイル変換

License

Licenses found

showteeth/ggcoverage

Folders and files

Latest commit

History

Repository files navigation

ggcoverage - Visualize and annotate omics coverage with ggplot2

Introduction

Installation

Manual

RNA-seq data

Load the data

Load GTF

Basic coverage

joint view

Facet view

Custom Y-axis style

Add gene annotation

Add transcript annotation

Add ideogram

DNA-seq data

CNV

Example 1

Load the data

Basic coverage

Add GC annotations

Example 2

Load the data

Basic coverage

Load CNV file

Add annotations

Single-nucleotide level

Load the data

Default color scheme

Add base and amino acid annotation

ChIP-seq data

Basic coverage

Add annotations

Hi-C data

Load track data

Load Hi-C data

Load link

Basic coverage

Add annotations

Mass spectrometry protein coverage

Load coverage

Protein coverage

Add annotation

Code of Conduct

About

Topics

Resources

License

Licenses found

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases3

Packages0

Uh oh!

Contributors3

Uh oh!

Languages

Packages