Example usage

How we use deepTools for ChIP-seq analyses

To get a feeling for what deepTools can do, we’d like to give you a brief glimpse into how we typically use deepTools for ChIP-seq analyses. For more detailed examples and descriptions of the tools, simply follow the respective links.

Note

While some tools, such as plotFingerprint, specifically address ChIP-seq issues, most tools are widely applicable to deep-sequencing data, including RNA-seq.

../_images/start_workflow.png

As shown in the flow chart above, our work usually begins with one or more FASTQ file(s) of deeply-sequenced samples. After preliminary quality control using FASTQC, we align the reads to the reference genome, e.g., using bowtie2. The output of bowtie2 (and other mapping tools), once sorted and indexed, takes the form of BAM files that provide the common input and starting point for all subsequent deepTools analyses. We then use deepTools to assess the quality of the aligned reads:

  1. Correlation between BAM files (multiBamSummary and plotCorrelation). Together, these two modules perform a very basic test to see whether the sequenced and aligned reads meet your expectations. We use this check to assess reproducibility, either between replicates and/or between different experiments that might have used the same antibody or the same cell type, etc. For instance, replicates should correlate better than differently treated samples.

    Tip

    You can also assess the correlation of bigWig files using multiBigwigSummary.

../_images/heatmap_SpearmanCorr_readCounts.png
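
The correlation check described above could be scripted roughly as follows. This is a minimal sketch that only assembles the two command lines; it assumes deepTools 3.x option names, and the BAM and output file names are hypothetical placeholders.

```python
import shlex


def correlation_commands(bam_files, counts_npz="readCounts.npz",
                         heatmap_png="heatmap_SpearmanCorr_readCounts.png"):
    """Build (but do not run) the two commands for the BAM correlation check."""
    # Step 1: count reads per genomic bin across all BAM files.
    summary = ["multiBamSummary", "bins",
               "--bamfiles", *bam_files,
               "--outFileName", counts_npz]
    # Step 2: plot the pairwise Spearman correlation of those counts as a heatmap.
    plot = ["plotCorrelation",
            "--corData", counts_npz,
            "--corMethod", "spearman",
            "--whatToPlot", "heatmap",
            "--plotFile", heatmap_png]
    return summary, plot


# Hypothetical input files: two replicates and one input control.
summary_cmd, plot_cmd = correlation_commands(["rep1.bam", "rep2.bam", "input.bam"])
print(shlex.join(summary_cmd))
print(shlex.join(plot_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` on a machine where deepTools is installed; the printed strings can also be pasted into a shell.
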
  2. Coverage check (plotCoverage). To see how many bp in the genome are actually covered by (a good number of) sequencing reads, we use plotCoverage, which generates two diagnostic plots that help us decide whether we need to sequence deeper or not. The option --ignoreDuplicates is particularly useful here!

../_images/ExamplePlotCoverage.png

For paired-end samples, we often additionally check whether the fragment sizes are more or less what we would expect based on the library preparation. The module bamPEFragmentSize can be used for that.

../_images/fragmentSize.png
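
The coverage and fragment-size checks above can be sketched in the same way. The helper names and file names below are hypothetical, and the option names assume deepTools 3.x; nothing is executed here.

```python
import shlex


def coverage_command(bam_files, plot_file="coverage.png", ignore_duplicates=True):
    """Build the plotCoverage call; duplicates are ignored by default."""
    cmd = ["plotCoverage", "--bamfiles", *bam_files, "--plotFile", plot_file]
    if ignore_duplicates:
        # Reads flagged as duplicates are not counted towards coverage.
        cmd.append("--ignoreDuplicates")
    return cmd


def fragment_size_command(bam_files, histogram="fragmentSize.png"):
    """Build the bamPEFragmentSize call (only meaningful for paired-end data)."""
    return ["bamPEFragmentSize", "--bamfiles", *bam_files, "--histogram", histogram]


# Hypothetical input files.
cov_cmd = coverage_command(["chip.bam", "input.bam"])
frag_cmd = fragment_size_command(["chip.bam"])
print(shlex.join(cov_cmd))
print(shlex.join(frag_cmd))
```
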
  3. GC-bias check (computeGCBias). Many sequencing protocols require several rounds of PCR-based DNA amplification, which often introduces notable bias because many DNA polymerases preferentially amplify GC-rich templates. Depending on the sample (preparation), the GC-bias can vary significantly, and we routinely check its extent. When we need to compare files with different GC biases, we use the correctGCBias module. See the paper by Benjamini and Speed for many insights into this problem.

../_images/ExampleCorrectGCBias.png
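
As a rough illustration of the GC-bias step above, the sketch below pairs computeGCBias with correctGCBias. It assumes deepTools 3.x option names and a genome in 2bit format; all file names are hypothetical, and the effective genome size in the usage lines is an illustrative number only, to be replaced with the value for your own genome.

```python
import shlex


def gc_bias_commands(bam_file, genome_2bit, effective_genome_size,
                     freq_file="gc_freq.txt", bias_plot="gc_bias.png",
                     corrected_bam="gc_corrected.bam"):
    """Build the computeGCBias call and the matching correctGCBias call."""
    # Both tools need the same BAM file, genome, genome size and frequency file.
    shared = ["--bamfile", bam_file,
              "--effectiveGenomeSize", str(effective_genome_size),
              "--genome", genome_2bit,
              "--GCbiasFrequenciesFile", freq_file]
    # computeGCBias writes the frequency file and a diagnostic plot.
    compute = ["computeGCBias", *shared, "--biasPlot", bias_plot]
    # correctGCBias reads the frequency file and writes a corrected BAM file.
    correct = ["correctGCBias", *shared, "--correctedFile", corrected_bam]
    return compute, correct


# Illustrative values only: look up the effective genome size for your genome.
compute_cmd, correct_cmd = gc_bias_commands("chip.bam", "genome.2bit", 2_000_000_000)
print(shlex.join(compute_cmd))
print(shlex.join(correct_cmd))
```

Running the correction only makes sense after the diagnostic plot shows a bias worth correcting, so the two commands are kept separate rather than chained.
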
  4. Assessing the ChIP strength. We do this quality control step to get a feeling for the signal-to-noise ratio in samples from ChIP-seq experiments. It is based on the insights published by Diaz et al.

../_images/fingerprints.png
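
The text above does not name the module for this step; the sketch below assumes it is plotFingerprint, the ChIP-seq-specific tool mentioned in the note at the top of this page. Option names assume deepTools 3.x, and the file names and labels are hypothetical.

```python
import shlex


def fingerprint_command(bam_files, labels=None, plot_file="fingerprints.png"):
    """Build a plotFingerprint call for a ChIP sample and its control(s)."""
    cmd = ["plotFingerprint", "--bamfiles", *bam_files, "--plotFile", plot_file]
    if labels:
        if len(labels) != len(bam_files):
            raise ValueError("need exactly one label per BAM file")
        # Labels replace the file names in the plot legend.
        cmd += ["--labels", *labels]
    return cmd


# Hypothetical ChIP sample and input control.
fp_cmd = fingerprint_command(["chip.bam", "input.bam"], labels=["ChIP", "input"])
print(shlex.join(fp_cmd))
```
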

Once we’re satisfied with the basic quality checks, we normally convert the large BAM files into a leaner data format, typically bigWig. bigWig files have several advantages over BAM files, mainly stemming from their significantly decreased size:

  • useful for data sharing and storage

  • intuitive visualization in Genome Browsers (e.g.IGV)

  • more efficient downstream analyses are possible

The deepTools modules bamCompare and bamCoverage not only allow for simple conversion of BAM to bigWig (or bedGraph for that matter), but also for normalization, such that different samples can be compared despite differences in their sequencing depth.
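
A minimal sketch of the conversion step follows, assuming deepTools 3.x option names (earlier releases spelled the normalization flags differently). The RPKM normalization and log2 operation are example choices rather than recommendations, and all file names are hypothetical.

```python
import shlex


def coverage_track_command(bam_file, out_file, normalize_using="RPKM",
                           out_format="bigwig"):
    """Build a bamCoverage call: one BAM file in, one normalized track out."""
    return ["bamCoverage",
            "--bam", bam_file,
            "--outFileName", out_file,
            "--outFileFormat", out_format,       # "bigwig" or "bedgraph"
            "--normalizeUsing", normalize_using]


def comparison_track_command(treatment_bam, control_bam, out_file,
                             operation="log2"):
    """Build a bamCompare call: treatment versus control in a single track."""
    return ["bamCompare",
            "--bamfile1", treatment_bam,
            "--bamfile2", control_bam,
            "--operation", operation,
            "--outFileName", out_file]


# Hypothetical ChIP sample and input control.
track_cmd = coverage_track_command("chip.bam", "chip.bw")
ratio_cmd = comparison_track_command("chip.bam", "input.bam", "chip_vs_input.bw")
print(shlex.join(track_cmd))
print(shlex.join(ratio_cmd))
```
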

Finally, once all the converted files have passed our visual inspections (e.g., using the Integrative Genomics Viewer), the fun of downstream analysis with computeMatrix, plotHeatmap and plotProfile can begin!
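
The downstream step could look roughly like the sketch below: computeMatrix summarizes the bigWig signal around a set of regions, and plotHeatmap and plotProfile both read the resulting matrix. The reference-point mode, the 3000 bp flanks, and all file names are assumptions chosen for illustration; option names assume deepTools 3.x.

```python
import shlex


def downstream_commands(bigwig_files, regions_bed, flank_bp=3000,
                        matrix_file="matrix.gz"):
    """Build computeMatrix plus the two plotting calls that consume its output."""
    matrix = ["computeMatrix", "reference-point",
              "--referencePoint", "TSS",
              "--scoreFileName", *bigwig_files,
              "--regionsFileName", regions_bed,
              "--beforeRegionStartLength", str(flank_bp),
              "--afterRegionStartLength", str(flank_bp),
              "--outFileName", matrix_file]
    # Both plots are drawn from the same precomputed matrix.
    heatmap = ["plotHeatmap", "--matrixFile", matrix_file,
               "--outFileName", "heatmap.png"]
    profile = ["plotProfile", "--matrixFile", matrix_file,
               "--outFileName", "profile.png"]
    return matrix, heatmap, profile


# Hypothetical bigWig track and gene annotation.
matrix_cmd, heatmap_cmd, profile_cmd = downstream_commands(["chip.bw"], "genes.bed")
for cmd in (matrix_cmd, heatmap_cmd, profile_cmd):
    print(shlex.join(cmd))
```

Computing the matrix once and reusing it keeps the expensive step separate from the plotting, so plot settings can be changed without recomputing.
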
