Introduction: What is ChIP-seq?

Disclaimer: the following introduction is adapted from the Bioconductor BayesPeak package.

Chromatin ImmunoPrecipitation (ChIP) is an experiment designed to study protein-DNA interactions, particularly to identify the genomic sites where proteins, such as transcription factors, bind to the DNA, or sites where histone modifications occur. The experiment produces samples that are enriched for the sites of interest compared to the rest of the genome. The use of this method combined with high-throughput sequencing of the samples is referred to as ChIP-seq.

Given our protein of interest, the ChIP-seq protocol usually consists of the following steps. (The exact protocol may vary between different experiments, but BayesPeak will still be able to perform the peak-calling step.)

Usually this process is repeated omitting the immunoprecipitation step to produce a sample with no preferential enrichment. This control sample has the same characteristics as ChIP-seq data and its inclusion is important to identify experimental biases.

There are many sources of error - for example, misalignment, impurities, or DNA that simply has a high affinity for being sequenced - which can result in noise across the genome, or even false peaks. The BayesPeak model is designed to separate the peaks from the noise, and to avoid calling false peaks.

ChIP-seq: an overview

Note: Figure adapted from "A computational pipeline for comparative ChIP-seq analyses", Nature Protocols, 2012, Vol. 7, pages 45-61.

References

  • Bailey T, Krajewski P, Ladunga I, Lefebvre C, Li Q, Liu T, Madrigal P, Taslim C, Zhang J. Practical guidelines for the comprehensive analysis of ChIP-seq data. PLoS Comput Biol. 2013;9(11):e1003326. (Main paper)
  • Bardet AF, He Q, Zeitlinger J, Stark A. A computational pipeline for comparative ChIP-seq analyses. Nat Protoc. 2011 Dec 15;7(1):45-61.
  • Shin H, Liu T, Duan X, Zhang Y, Liu XS. Computational methodology for ChIP-seq analysis. Quant Biol. 2013 Mar 1;1(1):54-70.
  • Park PJ. ChIP-seq: advantages and challenges of a maturing technology. Nat Rev Genet. 2009 Oct;10(10):669-80.

Software Installation

BioConductor Software Installation

The packages can be installed using biocLite().

source("http://bioconductor.org/biocLite.R")
biocLite('GenomicAlignments'); biocLite('DBChIP'); biocLite('ShortRead')
biocLite('rtracklayer'); biocLite('GenomicRanges'); biocLite('GenomicFeatures')
biocLite('Rsamtools'); biocLite('chipseq'); biocLite('annaffy')
biocLite('annotate'); biocLite('ChIPseeker'); biocLite('TxDb.Hsapiens.UCSC.hg19.knownGene')
biocLite('BSgenome.Hsapiens.UCSC.hg19'); biocLite('org.Hs.eg.db'); biocLite('BayesPeak')
# ChIPQC and ChIPpeakAnno are loaded later in this tutorial, so install them too
biocLite('ChIPQC'); biocLite('ChIPpeakAnno')
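
On Bioconductor 3.8 and later, biocLite() has been replaced by BiocManager::install(). The following is a minimal sketch of an equivalent installation, assuming the same packages as listed above plus ChIPQC and ChIPpeakAnno, which are loaded later in this tutorial. The names pkgs, find_missing, and install_missing are illustrative and not part of any package, and some of the listed packages may no longer be available in the current Bioconductor release.

```r
# Packages used in this tutorial (illustrative vector name).
pkgs <- c("GenomicAlignments", "DBChIP", "ShortRead", "rtracklayer",
          "GenomicRanges", "GenomicFeatures", "Rsamtools", "chipseq",
          "annaffy", "annotate", "ChIPseeker",
          "TxDb.Hsapiens.UCSC.hg19.knownGene", "BSgenome.Hsapiens.UCSC.hg19",
          "org.Hs.eg.db", "BayesPeak", "ChIPQC", "ChIPpeakAnno")

# Hypothetical helper: return the packages that are not yet installed.
find_missing <- function(packages) {
  installed <- rownames(installed.packages())
  setdiff(packages, installed)
}

# Hypothetical helper: install whatever is missing via BiocManager.
install_missing <- function(packages) {
  to_install <- find_missing(packages)
  if (length(to_install) == 0) {
    message("All packages are already installed.")
    return(invisible(character(0)))
  }
  if (!requireNamespace("BiocManager", quietly = TRUE)) {
    install.packages("BiocManager")
  }
  BiocManager::install(to_install)
  invisible(to_install)
}
```

Calling install_missing(pkgs) then installs only the packages that are not yet present, which avoids reinstalling large annotation packages such as BSgenome.Hsapiens.UCSC.hg19.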

Integrative Genomics Viewer (IGV)

We will use a stand-alone genome browser, the Integrative Genomics Viewer (IGV) from the Broad Institute, to visualize our results. Go to the bottom of the page, register, then download and install the software.

A working copy of Microsoft Excel

We will use Microsoft Excel to open the final result file, so make sure the copy on your computer is in working order.

Load Required Libraries

library(GenomicAlignments)
library(DBChIP)
library(ShortRead)
library(rtracklayer)
library(GenomicRanges)
library(GenomicFeatures)
library(Rsamtools)
library(annaffy)
library(annotate)
library(ChIPseeker)
library(TxDb.Hsapiens.UCSC.hg19.knownGene)
txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
library(BSgenome.Hsapiens.UCSC.hg19)
genome <- BSgenome.Hsapiens.UCSC.hg19
library(org.Hs.eg.db)
library(BayesPeak)
library(ChIPQC)
library(ChIPpeakAnno)
data("TSS.human.GRCh37")

Data Preparation

Create a data folder, download the Gerber dataset into it, and decompress the file inside the data folder:

data
  |____ Gerber_chr22.zip

Make sure that your unzipped files are in the proper directory structure to run this code:

data
  |____ Gerber_chr22
    |____ all my files
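
The unzip step can also be done from within R. This is a minimal sketch, assuming Gerber_chr22.zip has already been downloaded into the data folder and that the archive unpacks into a Gerber_chr22 folder as shown above; data_dir and zip_file are illustrative names.

```r
data_dir <- "data"
zip_file <- file.path(data_dir, "Gerber_chr22.zip")

# Create the data folder if it does not exist yet.
dir.create(data_dir, showWarnings = FALSE)

# Unpack the archive into data/ only if it has been downloaded.
if (file.exists(zip_file)) {
  unzip(zip_file, exdir = data_dir)
} else {
  message("Archive not found: ", zip_file)
}

# List what ended up under data/Gerber_chr22 (empty if nothing was unpacked).
list.files(file.path(data_dir, "Gerber_chr22"))
```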

You should find four mapped, chromosome 22-only BAM files:

Dex_GR_7A_chr22.bam,
Dex_GR_7C_chr22.bam,
Veh_GR_2B_chr22.bam,
Veh_GR_2D_chr22.bam

One mapped background, chromosome 22-only BED file:

Background_2A_Input_chr22.bed

Four BED files of called peaks:

Dex_GR_7A_ori_peaks.bed,
Dex_GR_7C_ori_peaks.bed,
Veh_GR_2B_ori_peaks.bed,
Veh_GR_2D_ori_peaks.bed

Precomputed ChIPQC binary file:

Gerber_QC.Rdata
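
Before moving on, it can help to confirm that all ten files listed above are in place. The following is a minimal sketch, assuming the files sit directly inside data/Gerber_chr22 relative to the R working directory; gerber_dir, expected_files, and missing_files are illustrative names.

```r
gerber_dir <- file.path("data", "Gerber_chr22")

# File names taken from the lists above.
expected_files <- c(
  "Dex_GR_7A_chr22.bam", "Dex_GR_7C_chr22.bam",
  "Veh_GR_2B_chr22.bam", "Veh_GR_2D_chr22.bam",
  "Background_2A_Input_chr22.bed",
  "Dex_GR_7A_ori_peaks.bed", "Dex_GR_7C_ori_peaks.bed",
  "Veh_GR_2B_ori_peaks.bed", "Veh_GR_2D_ori_peaks.bed",
  "Gerber_QC.Rdata"
)

# Hypothetical helper: names of expected files that are absent from `dir`.
missing_files <- function(dir, files) {
  files[!file.exists(file.path(dir, files))]
}

absent <- missing_files(gerber_dir, expected_files)
if (length(absent) == 0) {
  message("All ", length(expected_files), " expected files found.")
} else {
  message("Missing files: ", paste(absent, collapse = ", "))
}
```

If any file is reported missing, check that the archive was unpacked into the data folder and that the working directory is the folder containing data.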

Workflow Overview

Here is an overview of the ChIP-seq data analysis workflow: