BIOS6660 2015

Genomic Data Analysis Using R and BioConductor


BIOS6660: Biomedical Big Data Analysis with R

This course provides students with hands-on experience in solving real-life biological problems using the statistical software R and its companion packages from the Bioconductor consortium. Students will have an opportunity to work with participating researchers and clinicians to find practical solutions for case studies from both the statistical and the biological perspective. Students will also learn to communicate with scientists and to interpret results in their biological context.

Office Hours

Tzu L Phang:

  • RC2 Room 9003
  • Monday: 10 am to 12 noon

JunXiao (Jim) Hu:

  • Ed 1 Room 2304
  • Tuesday: 3 pm to 4 pm
  • Wednesday: 2 pm to 3 pm

Week 1 Day 1 (Sept 1, 2015): Course Overview and R Markdown + knitr

Description

An overview of the course. We will review our plan for using R, Bioconductor, and RStudio to build analysis workflows that follow Reproducible Research practices during the first 3 weeks of the course. I will also give you a brief overview of each of the use cases we will work through. We will introduce the idea of reproducible research by recording our work and generating proper documentation in multiple formats: PDF, HTML, and Word.

Outcomes and Learning Objectives

Upon completion of this class, the students will be able to
  • Describe what the course is about
  • Identify which skills need to be refreshed in order to complete the course successfully
  • Generate a Reproducible Research report using RStudio

Week 1 Day 2 (Sept 3, 2015): R Basics

Description

This is a brief review of some R basics. We will refresh our memory of vectors, lists, and data frames, and of how to navigate and manipulate these data structures.

Outcomes and Learning Objectives

Upon completion of this class, the students will be able to

  • Create and manipulate the basic data structures: vector, matrix, list, and data frame
  • Subset and combine tables using the “subset( )”, “merge( )” and “melt( )” functions
  • Create citations in a report
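
As a rough illustration of the data structures and table operations listed above, here is a minimal sketch in base R. The gene names, expression values, and chromosome labels are made up for illustration and are not course data. “melt( )” is left out because it comes from an add-on package rather than base R.

```r
# Toy data: all names and values below are hypothetical.

# A named numeric vector, a matrix, and a list
expr <- c(geneA = 5.2, geneB = 7.1, geneC = 3.3)
m <- matrix(1:6, nrow = 2)                      # 2 rows x 3 columns
sample_info <- list(id = "S1", tissue = "liver")

# Two data frames that share a "gene" column
genes <- data.frame(gene = c("geneA", "geneB", "geneC"),
                    expression = c(5.2, 7.1, 3.3),
                    stringsAsFactors = FALSE)
annot <- data.frame(gene = c("geneA", "geneB", "geneD"),
                    chromosome = c("chr1", "chr2", "chr3"),
                    stringsAsFactors = FALSE)

# subset(): keep only the rows that satisfy a condition
high <- subset(genes, expression > 5)

# merge(): inner join on the shared column; only genes present
# in both tables are kept
merged <- merge(genes, annot, by = "gene")
```

Here `high` holds the rows for geneA and geneB, and `merged` has three columns (gene, expression, chromosome) for the two genes that appear in both tables.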

Week 2 Day 1 (Sept 8, 2015): R Functions and Control Structures

Description

In this class, you will learn how to write an R function. First, however, you need to learn the various control structures in R, such as “if-else” and the “for” loop. In addition, you will learn the “apply” family of functions to speed up your programming tasks.

Outcomes and Learning Objectives

Upon completion of this class, the students will be able to
  • Use control structures to make decisions in the code
  • Write a user-defined function
  • Use the “apply” family of functions to perform operations on various data types
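
The three objectives above can be sketched together in a few lines of base R. The function name `classify_expression`, the cutoff of 5, and the values are hypothetical choices made only for this example.

```r
# A user-defined function built around an if-else control structure.
# The name and the default cutoff are illustrative assumptions.
classify_expression <- function(x, cutoff = 5) {
  if (x > cutoff) "high" else "low"
}

values <- c(geneA = 5.2, geneB = 7.1, geneC = 3.3)

# Version 1: an explicit for loop that fills a pre-allocated vector
labels_loop <- character(length(values))
for (i in seq_along(values)) {
  labels_loop[i] <- classify_expression(values[i])
}

# Version 2: sapply() applies the function to each element
# and keeps the element names
labels_apply <- sapply(values, classify_expression)
```

Both versions produce the same labels. The `sapply()` form is shorter and carries the gene names along with the result.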

Week 2 Day 2 (Sept 10, 2015): R Graphics

Description

R is a very powerful environment for plotting graphics. Today we are going to learn about two R graphics systems:
  • Base R graphics
  • ggplot2, an R implementation of the Grammar of Graphics

Outcomes and Learning Objectives

Upon completion of this class, the students will be able to
  • Use base R graphics to draw simple plots and graphs
  • Control various aspects of a graphic
  • Apply the Grammar of Graphics using ggplot2

Week 3 Day 1 (Sept 15, 2015): Genomic Object Structure

Description

Genomics is an information-rich science. Storing genomic information (DNA, RNA, protein, etc.) requires specialized object structures. This class introduces you to the Bioconductor and R solutions for storing, accessing, and manipulating these data objects.

Outcomes and Learning Objectives

Upon completion of this class, the students will be able to
  • Store genomic data at multiple levels:
    • Individual elements: DNA, RNA, protein, etc.
    • Genome level: how to store the whole genome
    • Genomic Ranges: how to represent genomic regions
    • Genomic Features: how to annotate different entities in a genomic region (promoter, coding region, enhancer, etc.)

Week 3 Day 2 (Sept 17, 2015): Dynamic Visualization using shiny

Description

shiny is developed by the RStudio group for building interactive graphics without writing JavaScript or other web languages. Today, we will learn the basics of shiny.

Outcomes and Learning Objectives

Upon completion of this class, the students will be able to
  • Build a basic application to plot a histogram with interactive controls
  • Use RStudio's in-house shiny tutorial as a reference

Week 4 & 5 (Sept 22 to Oct 1, 2015): Case Study 1: Microarray Data Analysis

Description

This is the Microarray Data Analysis module, where we learn how to build a workflow to analyze Affymetrix microarray data.

Outcomes and Learning Objectives

  • Understand the concept of microarray technology and its purpose
  • Learn how to build a microarray data analysis workflow
  • Learn how to interpret the analysis results
  • Learn how to generate a Reproducible Research report using the analysis pipeline

Week 6 & 7 (Oct 6 to Oct 15, 2015): Case Study 2: RNA-seq Data Analysis

Description

This is the RNA-seq module, in which we analyze RNA-seq datasets. RNA-seq is a Next Generation Sequencing technology for measuring gene expression across the transcriptome.

Outcomes and Learning Objectives


  • Understand the Next Generation Sequencing (NGS) RNA-seq technology
  • Understand the file system used in NGS
  • Understand the ultrafast alignment algorithms
  • Learn how to quantify gene expression
  • Perform Differential Expression (DE) analysis
  • Annotate the list of significant genes
  • Perform pathway analysis

Week 8 & 9 (Oct 20 to Oct 29, 2015): Case Study 3: ChIP-seq Data Analysis

Description

ChIP-seq is a Next Generation Sequencing method for detecting where DNA-binding proteins bind along the genome.

Outcomes and Learning Objectives

  • Learn how to run a peak-finding algorithm
  • Perform quality assessment on the detected peaks
  • Perform differential binding analysis between experimental groups
  • Annotate the significant peaks
  • Optional: perform “motif” analysis

Day 4: Oct 29

Report Due: Nov 3, 2015

Week 10 & 11 (Nov 3 to Nov 12, 2015): Case Study 4: Clinical Informatics Data Analysis

Description

One source of big data in biomedical informatics is the Electronic Medical Record (EMR). Here we explore the business and financial side of EMR by analyzing a Medicare dataset using nothing but R.

Outcomes and Learning Objectives


  • How to deal with big data, and why conventional R methods are insufficient
  • Introduction to the data.table library for big-data analytics
  • How to plot statistical values on a geographic map
  • A real case study: Medicare Healthcare Payment

Week 12 & 13 (Nov 17 to Dec 3, 2015): Case Study 5: Exome-Seq Data Analysis

Description

Thus far, we have been analyzing data measured from RNA molecules. Now we will change that and measure DNA using Next Generation Sequencing. Specifically, we are looking for mutations in the DNA itself. To that end, we sequence the exome of each sample and try to detect mutations at the genomic DNA level.

We will look at three real-life patient cases and try to identify potential mutations in genes that might have caused the disease phenotype. In this exercise, students will have an opportunity to learn about each case in depth (the family, the disease condition, and so on) and to try to explain the condition using the results of the exome-seq data analysis.

At the end of the students' analysis presentations, our PI will reveal what the scientists actually found, and whether it matches what the students identified.

This will be fun :-)

Outcomes and Learning Objectives

  • Understand the trio family design for detecting potential disease-causing mutations
  • Understand how genetic variants are stored in a VCF file
  • Write algorithms to detect these disease models:
    • De novo
    • Autosomal recessive homozygous
    • X-linked
    • Autosomal recessive heterozygous
  • Annotate the candidate mutations
  • Determine whether a mutation is non-synonymous
  • Visualize the mutation in IGV
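
To give a flavor of what detecting a disease model in a trio can look like, here is a minimal sketch in base R. It is not the course's actual algorithm. It assumes a small hypothetical genotype table in which each call uses the unphased VCF-style codes "0/0" (homozygous reference), "0/1" (heterozygous), and "1/1" (homozygous alternate). Real VCF data also contains phased, missing, and multi-allelic calls, which this sketch ignores.

```r
# Hypothetical trio genotypes, one row per variant (toy data only).
trio <- data.frame(
  variant = c("v1", "v2", "v3", "v4"),
  father  = c("0/0", "0/1", "0/0", "0/1"),
  mother  = c("0/0", "0/1", "0/1", "0/0"),
  child   = c("0/1", "1/1", "0/1", "0/0"),
  stringsAsFactors = FALSE
)

# De novo model: both parents are homozygous reference,
# but the child carries at least one alternate allele.
is_de_novo <- function(father, mother, child) {
  father == "0/0" & mother == "0/0" & child %in% c("0/1", "1/1")
}

# Autosomal recessive homozygous model: both parents are
# heterozygous carriers and the child is homozygous alternate.
is_recessive_hom <- function(father, mother, child) {
  father == "0/1" & mother == "0/1" & child == "1/1"
}

de_novo   <- trio$variant[is_de_novo(trio$father, trio$mother, trio$child)]
recessive <- trio$variant[is_recessive_hom(trio$father, trio$mother, trio$child)]
```

In this toy table, `de_novo` picks out v1 and `recessive` picks out v2. The other two variants fit neither model.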

Week 14 & 15 (Dec 8 to Dec 17, 2015): Case Study 6: The Dream Challenge

Description

This is one of the modules.

Outcomes and Learning Objectives

List learning objective

Day 4: Dec 17

Report Due: Dec 21, 2015