Lesson Plan

Data Files

There are only 3 VCF files for this analysis module:


These are mini-VCF files, each containing a compilation of 100 genes for one case.

Sample Description


Genetic Variation

In this analysis, we are only interested in small-scale sequence variation (<1 kbp). There are two major types of small-scale variation: substitutions and indels.


A substitution is a point mutation, or single-base modification, that changes a single nucleotide. This kind of variant is commonly referred to as a single-nucleotide polymorphism (SNP).


Indel is short for insertion or deletion of bases in the DNA. A microindel is an indel that results in a net change of 1 to 50 nucleotides. An indel in a coding region often causes a frameshift (whenever its length is not a multiple of three) and can have severe biological consequences.
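The definitions above can be turned into a small sketch that classifies a variant from its reference and alternate alleles. This is only an illustration, not part of the lesson's workflow: the function names (`classify_variant`, `net_change`, `is_microindel`, `causes_frameshift`) are hypothetical, the alleles are assumed to be plain nucleotide strings such as those in the REF and ALT columns of a VCF file, and the frameshift check assumes the variant lies inside a coding region.

```python
def net_change(ref: str, alt: str) -> int:
    """Return the net number of nucleotides gained or lost."""
    return abs(len(alt) - len(ref))


def classify_variant(ref: str, alt: str) -> str:
    """Classify one REF/ALT allele pair (hypothetical helper)."""
    if ref == alt:
        raise ValueError("REF and ALT are identical; this is not a variant")
    if len(ref) == len(alt):
        # Same length: bases were swapped, none gained or lost.
        return "substitution" if len(ref) == 1 else "multi-base substitution"
    # Different length: bases were inserted or deleted.
    return "insertion" if len(alt) > len(ref) else "deletion"


def is_microindel(ref: str, alt: str, max_len: int = 50) -> bool:
    """True if the net change is between 1 and max_len nucleotides."""
    return 1 <= net_change(ref, alt) <= max_len


def causes_frameshift(ref: str, alt: str) -> bool:
    """True if the net change is not a multiple of three.

    Assumes the variant falls within a coding region.
    """
    return net_change(ref, alt) % 3 != 0


if __name__ == "__main__":
    for ref, alt in [("A", "G"), ("A", "ATG"), ("ACGT", "A")]:
        print(ref, alt, classify_variant(ref, alt),
              is_microindel(ref, alt), causes_frameshift(ref, alt))
```

Note that a substitution has a net change of zero, so it is never counted as a microindel, and a deletion of exactly three bases (as in the last pair) removes a codon without shifting the reading frame.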

Variation Detection using GATK

One of the most popular variant detection tools is GATK (Genome Analysis Toolkit), developed by the Broad Institute.


James’ Analysis Workflow


Capturing Genetic Variation in VCF file

I found these 2 resources for an introduction to the VCF file:

VCF stands for Variant Call Format. It is a standardized text file format for representing SNPs, indels, and structural variants found when sequencing your samples.

Understanding the VCF file

At First Glance

VCF is a text file by nature. Opening it in any text editor (try to avoid Microsoft Word, since it may alter the content unintentionally) reveals that a VCF file consists of 2 general regions, meta-data and variant-data:

  • meta-data: This segment contains information used to explain the variant-data contents
  • variant-data: These ar