Learning Objectives

Upon completion of this class, the students will be able to:

The following analysis report was adapted from my former student, Tiffany Callahan.

Introduction

Microarrays have revolutionized high-throughput transcriptomic analysis. They measure the expression of thousands of genes simultaneously using a very small amount of biological specimen. We will use the EMA package to learn how to analyze microarray data. A comprehensive function reference for the EMA package is available in the package documentation.

Microarray Learning Resources

Software Installation

The EMA package can be installed using biocLite(). We will also install the oligo package.

## Note: from Bioconductor 3.8 onward, BiocManager::install() replaces biocLite()
source("http://bioconductor.org/biocLite.R")
biocLite("EMA")
biocLite("oligo")

Load Required Libraries

We will load the required EMA and oligo libraries.

library(EMA)
library(oligo)

Data Preparation

Create a data folder, download the practice dataset, and uncompress the file inside the data folder:

data
  |____ BIOS6660_Microarray_Practice_Dataset.zip

You should find 17 CEL files and one text file in your data folder:

PHBI_001.CEL PHBI_027.CEL PHBI_059.CEL PHBI_068.CEL PHBI_078.CEL
PHBI_002.CEL PHBI_042.CEL PHBI_062.CEL PHBI_069.CEL
PHBI_003.CEL PHBI_044.CEL PHBI_066.CEL PHBI_073.CEL
PHBI_019.CEL PHBI_053.CEL PHBI_067.CEL PHBI_075.CEL

PHBI cel annotation BIO6660 v2.txt

Please place these files directly under the data folder:

data
  |____ PHBI_001.CEL PHBI_027.CEL PHBI_059.CEL PHBI_068.CEL
  |____ …
  |____ PHBI cel annotation BIO6660 v2.txt
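
Before importing anything, it can help to confirm that the layout above is in place. The following is a minimal sketch in base R, assuming the working directory is the parent of the data folder; the names data.dir, sample.ids, expected.cel, and missing.cel are introduced here for illustration only.

```r
## Sketch: check that the 17 CEL files and the annotation file
## sit directly under ./data (assumed location)
data.dir = "./data"

## Sample numbers taken from the file listing above
sample.ids = c(1, 2, 3, 19, 27, 42, 44, 53, 59, 62, 66, 67, 68, 69, 73, 75, 78)
expected.cel = sprintf("PHBI_%03d.CEL", sample.ids)

## Any expected CEL file that is not found in the data folder
missing.cel = setdiff(expected.cel, list.files(data.dir))
missing.cel

## TRUE if the annotation file is in place
file.exists(file.path(data.dir, "PHBI cel annotation BIO6660 v2.txt"))
```

If missing.cel is empty and the last line returns TRUE, the files are where the import steps expect them.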

Data Import and Processing

Import and Examine Meta Data

First, we will import the file PHBI cel annotation BIO6660 v2.txt, which contains experimental information about the dataset.

## Read in descriptive data file
PHBI.file.info = read.table(file = "./data/PHBI cel annotation BIO6660 v2.txt", header = TRUE,
    sep = "\t")
knitr::kable(PHBI.file.info)
|PHBI.cel.file |PHBI.number |Clinical.Category | Age|Race  |Sex    |Scan.Date | Batch|
|:-------------|:-----------|:-----------------|---:|:-----|:------|:---------|-----:|
|PHBI_001.cel  |PHBI_001    |IPAH              |  62|White |Female |10/1/09   |     1|
|PHBI_002.cel  |PHBI_002    |IPAH              |  49|White |Female |10/1/09   |     1|
|PHBI_003.cel  |PHBI_003    |IPAH              |  64|White |Male   |10/1/09   |     1|
|PHBI_019.cel  |PHBI_019    |Failed Donor      |  60|White |Male   |10/1/09   |     1|
|PHBI_027.cel  |PHBI_027    |Failed Donor      |  49|White |Male   |10/1/09   |     1|
|PHBI_042.cel  |PHBI_042    |IPAH              |  44|White |Female |10/1/10   |     2|
|PHBI_044.cel  |PHBI_044    |IPAH              |  40|White |Female |10/1/10   |     2|
|PHBI_053.cel  |PHBI_053    |Failed Donor      |  25|White |Male   |10/1/10   |     2|
|PHBI_059.cel  |PHBI_059    |Failed Donor      |  55|White |Male   |10/1/10   |     2|
|PHBI_062.cel  |PHBI_062    |IPAH              |  45|White |Male   |10/1/10   |     2|
|PHBI_066.cel  |PHBI_066    |Failed Donor      |  19|White |Male   |10/1/10   |     2|
|PHBI_067.cel  |PHBI_067    |Failed Donor      |  62|White |Male   |10/1/11   |     3|
|PHBI_068.cel  |PHBI_068    |Failed Donor      |  21|White |Female |10/1/11   |     3|
|PHBI_069.cel  |PHBI_069    |Failed Donor      |  28|White |Female |10/1/11   |     3|
|PHBI_073.cel  |PHBI_073    |IPAH              |  71|White |Female |10/1/11   |     3|
|PHBI_075.cel  |PHBI_075    |IPAH              |  56|White |Male   |10/1/11   |     3|
|PHBI_078.cel  |PHBI_078    |IPAH              |  61|White |Male   |10/1/11   |     3|

Extract Clinical Category

## Extract groups: IPAH and Failed Donor
PHBI.type.cl = as.character(PHBI.file.info$Clinical.Category)
PHBI.type.cl
##  [1] "IPAH"         "IPAH"         "IPAH"         "Failed Donor"
##  [5] "Failed Donor" "IPAH"         "IPAH"         "Failed Donor"
##  [9] "Failed Donor" "IPAH"         "Failed Donor" "Failed Donor"
## [13] "Failed Donor" "Failed Donor" "IPAH"         "IPAH"        
## [17] "IPAH"

Extract Batch Information

PHBI.batch.cl = PHBI.file.info$Batch
PHBI.batch.cl
##  [1] 1 1 1 1 1 2 2 2 2 2 2 3 3 3 3 3 3
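
Since the arrays were scanned in three batches, one optional check is whether the clinical groups are spread across the batches rather than confounded with them. The sketch below cross-tabulates the two vectors printed above. To stay self-contained it retypes their values under the illustrative names type.cl and batch.cl; with the data loaded, PHBI.type.cl and PHBI.batch.cl could be passed to table() directly.

```r
## Sketch: cross-tabulate clinical category against batch
## (values copied from the output above; type.cl and batch.cl are illustrative names)
type.cl = c("IPAH", "IPAH", "IPAH", "Failed Donor", "Failed Donor",
            "IPAH", "IPAH", "Failed Donor", "Failed Donor", "IPAH", "Failed Donor",
            "Failed Donor", "Failed Donor", "Failed Donor", "IPAH", "IPAH", "IPAH")
batch.cl = rep(1:3, times = c(5, 6, 6))

## Rows are clinical categories, columns are batches
design = table(type.cl, batch.cl)
design
```

Each batch contains both groups (three IPAH samples per batch, and two or three Failed Donor samples), so group and batch are not completely confounded in this dataset.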

Run Summary Statistics

## Run basic summary statistics
tapply(PHBI.file.info$Age, PHBI.file.info$Clinical.Category, mean)
## Failed Donor         IPAH 
##     39.87500     54.66667
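## factor() keeps the per-level counts below even when Sex is read in as character (R >= 4.0)
tapply(factor(PHBI.file.info$Sex), PHBI.file.info$Clinical.Category, summary)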
## $`Failed Donor`
## Female   Male 
##      2      6 
## 
## $IPAH
## Female   Male 
##      5      4

Summary Visualization

## Visualize age distributions by group
library(ggplot2)

## Refer to columns by name inside aes(); ggplot2 looks them up in the data frame
plot = ggplot(PHBI.file.info, aes(x = factor(Clinical.Category), y = Age)) +
    geom_boxplot(aes(fill = factor(Clinical.Category))) +
    labs(title = "Age Distributions by PAH Group") +
    xlab("Group") +
    ylab("Age")

plot + scale_fill_manual("PAH Group", values = c("#56B4E9", "#009E73"))

Import Raw Data and Normalization

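## Load the platform design package for the Human Gene 1.0 ST array
library('pd.hugene.1.0.st.v1')
## Build the CEL file paths from the annotation table
PHBI.CEL = PHBI.file.info$PHBI.cel.file
## The files on disk use an upper-case extension; escape the dot and anchor the match
PHBI.CEL = sub('\\.cel$', '.CEL', PHBI.CEL)
PHBI.CEL = paste('./data/', PHBI.CEL, sep = '')
#PHBI.CEL = list.celfiles()

## Read the CEL files from the data directory
PHBI.data = read.celfiles(PHBI.CEL)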
## Reading in : ./data/PHBI_001.CEL
## Reading in : ./data/PHBI_002.CEL
## Reading in : ./data/PHBI_003.CEL
## Reading in : ./data/PHBI_019.CEL
## Reading in : ./data/PHBI_027.CEL
## Reading in : ./data/PHBI_042.CEL
## Reading in : ./data/PHBI_044.CEL
## Reading in : ./data/PHBI_053.CEL
## Reading in : ./data/PHBI_059.CEL
## Reading in : ./data/PHBI_062.CEL
## Reading in : ./data/PHBI_066.CEL
## Reading in : ./data/PHBI_067.CEL
## Reading in : ./data/PHBI_068.CEL
## Reading in : ./data/PHBI_069.CEL
## Reading in : ./data/PHBI_073.CEL
## Reading in : ./data/PHBI_075.CEL
## Reading in : ./data/PHBI_078.CEL
## Normalize the data
PHBI.norm = rma(PHBI.data)
## Background correcting
## Normalizing
## Calculating Expression
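
The "Normalizing" message above corresponds to the quantile normalization step of RMA, which forces every array to share the same intensity distribution. The toy sketch below illustrates that idea in base R on a made-up 4 x 3 matrix. It is not oligo's implementation, and the function quantile.normalize and the matrix toy are hypothetical names and values used only for illustration.

```r
## Sketch: quantile normalization on a toy matrix (rows = probes, columns = arrays)
quantile.normalize = function(x) {
    ## Rank of each value within its own array
    ranks = apply(x, 2, rank, ties.method = "first")
    ## Reference distribution: mean of the k-th smallest values across arrays
    ref = rowMeans(apply(x, 2, sort))
    ## Replace each value by the reference value at its rank
    normalized = apply(ranks, 2, function(r) ref[r])
    dimnames(normalized) = dimnames(x)
    normalized
}

## Made-up intensities, filled column by column
toy = matrix(c(5, 2, 3, 4,
               4, 1, 4, 2,
               3, 4, 6, 8),
             nrow = 4,
             dimnames = list(paste0("probe", 1:4), paste0("array", 1:3)))

toy.norm = quantile.normalize(toy)

## After normalization every array has the same sorted values
apply(toy.norm, 2, sort)
```

This is why the per-array boxplots are expected to line up after rma() is applied, whereas the raw intensities can differ in location and spread from array to array.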

Quality Control

Before and After Normalization

## Load color libraries
library(RColorBrewer)

## Set color palette
color.palette = brewer.pal(8, "Set1")

## Pre-normalized intensity values boxplot
boxplot(PHBI.data, col = color.palette, main = "Pre-normalized Intensity Values")

## Normalized intensity values boxplot
boxplot(PHBI.norm, col = color.palette, main = "Normalized Intensity Values")

## Pre-normalized density plot of log-intensity distribution
hist(PHBI.data, col = color.palette, main = "Pre-Normalized Density Plot of log-Intensity Distribution")