Lecture Synopsis

Today we are going to learn about R’s basic programming structures, aka control structures, and how to use them to build a function(). Recall that R is all about functions: we need to learn the nomenclature of a function, how to write one, and how to use both the built-in functions of base R and the functions from installed packages. We also need to learn to deal with conflicts and to fix bugs. Finally, we will learn the apply family of functions.

Now that you have had a chance to watch Roger Peng’s Week 2 video, let’s apply what you have learned.

R is all about objects and function(). A function call always ends with ().

My first function

Quick-R is a very useful resource for all things R, including a guide on how to write a function.

Write a function to display the words “Hello World”

This first function has no arguments, meaning that there is nothing inside the parentheses. It simply displays the words “Hello World”.

my.hello.word <- function(){
  cat('Hello World \n')
}

my.hello.word()
## Hello World

Now, say your own word

This function has one argument, which lives inside the parentheses. It takes a character string and displays it on the console when the function is executed.

my.owned.word <- function(myWord){
  cat('This is ', myWord, '\n')
}

my.owned.word('tzu world')
## This is  tzu world
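As a small extension of the one-argument idea, an argument can also carry a default value, and a function can return a string instead of printing it. The sketch below is illustrative only; the name my.greeting is hypothetical and not part of the lecture.

```r
# Hypothetical helper (not from the lecture): one argument with a default value
my.greeting <- function(myWord = 'World'){
  # paste() builds the string; the last evaluated expression is the return value
  paste('Hello', myWord)
}

my.greeting()             # no argument given, so the default 'World' is used
my.greeting('tzu world')  # the supplied string replaces the default
```

Returning the string rather than calling cat() lets the caller store the result in a variable and reuse it later.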

control-structure: if-else

Before we can do more cool things in a function, we need to learn a little more programming. First, let’s look at the control structure called if-else.

Use a variable to make a decision

Extra: Shortcut: Run Current Code Chunk

  • Mac: Option + Command + C
  • Windows:
## Shortcut: Run Current Code Chunk
x = 6
decider = 6

if(x > decider){
  cat('x is big enough \n')
}else if(x < decider){
  cat('x is too small \n')
}else {
  cat('x might be equal to decider\n')
}
## x might be equal to decider
## Change the "decider" to make a different decision
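Since the goal of the lecture is to put control structures inside functions, the same three-way decision can be wrapped in a function so that different values can be tried without editing the code. This is only a sketch; the name compare.to.decider is an assumption made for illustration.

```r
# Hypothetical function (not from the lecture) wrapping the same if-else chain
compare.to.decider <- function(x, decider){
  if(x > decider){
    'x is big enough'
  }else if(x < decider){
    'x is too small'
  }else{
    'x is equal to decider'
  }
}

compare.to.decider(6, 6)  # falls through to the final else branch
compare.to.decider(9, 6)  # first branch
compare.to.decider(2, 6)  # second branch
```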

control-structure: For-loop

The power of programming lies in iteration with almost no human error. The most common way to repeat a task is the for-loop.

Given a vector of baby names, display them one at a time

baby.names.2014 = c('Jackson', 'Aiden', 'Liam', 'Lucas', 'Noah', 'Mason', 'Ethan', 'Caden', 'Jacob', 'Logan')

for(i in 1: length(baby.names.2014)){
  
  cat('Hello ', baby.names.2014[i], '\n')
  
}
## Hello  Jackson 
## Hello  Aiden 
## Hello  Liam 
## Hello  Lucas 
## Hello  Noah 
## Hello  Mason 
## Hello  Ethan 
## Hello  Caden 
## Hello  Jacob 
## Hello  Logan
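Two common variations on this loop are worth knowing. The sketch below uses a shortened, hypothetical vector some.names: seq_along() generates the index safely, and a for-loop can also walk over the elements directly without an index.

```r
# Shortened example vector (hypothetical, for illustration only)
some.names = c('Jackson', 'Aiden', 'Liam')

# seq_along() gives 1, 2, ..., length(some.names)
for(i in seq_along(some.names)){
  cat('Hello ', some.names[i], '\n')
}

# Loop over the elements themselves when the index is not needed
for(name in some.names){
  cat('Hello ', name, '\n')
}

# Why seq_along(): on an empty vector, 1:length(v) counts down from 1 to 0
empty = character(0)
length(1:length(empty))    # 2, so a loop over it would run twice
length(seq_along(empty))   # 0, so a loop over it would not run at all
```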

OK, you have learned enough to do some real work.

Given a DNA sequence, translate it into its single-letter protein representation

x = c('GAT', 'GAG', 'GTA', 'ATC', 'TTG', 'TGT', 'TTG', 'GCA', 'TCA', 'TCT')

## Here is the DNA sequence
genetic.code = c('GATGAGGTAATCTTGTGTTTGGCATCATCT')

## To translate the DNA code to protein letters, first we need to break it into codons of three bases


## Count how many characters/bases are in the sequence
(num.char = nchar(genetic.code))
## [1] 30
## Generate the start position of each codon: a sequence in steps of 3

(interval.of.3 = seq(from = 1, to = num.char, by = 3))
##  [1]  1  4  7 10 13 16 19 22 25 28
genetic.code.codon = c()
for(i in 1:length(interval.of.3)){
  
  tmp = substr(genetic.code, start = interval.of.3[i], stop = interval.of.3[i]+2)
  genetic.code.codon = c(genetic.code.codon, tmp)
  
}

(genetic.code.codon)
##  [1] "GAT" "GAG" "GTA" "ATC" "TTG" "TGT" "TTG" "GCA" "TCA" "TCT"
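The loop above grows the vector one codon at a time. As an alternative sketch, base R's substring() is vectorized over its first and last arguments, so all codons can be extracted in one call. The variable names starts and codons are introduced here for illustration.

```r
genetic.code = 'GATGAGGTAATCTTGTGTTTGGCATCATCT'

# Start position of every codon: 1, 4, 7, ...
starts = seq(from = 1, to = nchar(genetic.code), by = 3)

# substring() accepts vectors of start and end positions and returns one piece per pair
codons = substring(genetic.code, first = starts, last = starts + 2)
codons
```

This gives the same ten codons as the loop, without repeatedly calling c() to extend the result.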


## Now translate them

protein.letters = c()

for(i in 1:length(genetic.code.codon)){
  
  ## Inside if(), use || (scalar OR) rather than | (element-wise OR)
  if(genetic.code.codon[i] == 'GAT'){
    tmp = 'D'
  }else if(genetic.code.codon[i] == 'GTA'){
    tmp = 'V'
  }else if(genetic.code.codon[i] == 'TCA' || genetic.code.codon[i] == 'TCT'){
    tmp = 'S'
  }else if(genetic.code.codon[i] == 'TTA' || genetic.code.codon[i] == 'TTG'){
    tmp = 'L'
  }else if(genetic.code.codon[i] == 'TGT'){
    tmp = 'C'
  }else if(genetic.code.codon[i] == 'GAG'){
    tmp = 'E'
  }else if(genetic.code.codon[i] == 'ATC'){
    tmp = 'I'
  }else if(genetic.code.codon[i] == 'GCA'){
    tmp = 'A'
  }else{
    ## Codon not covered above: record NA instead of silently reusing the previous tmp
    tmp = NA
  }
  protein.letters = c(protein.letters, tmp)
}

## Now, let's reveal the genetic code secret ...

(protein.letters)
##  [1] "D" "E" "V" "I" "L" "C" "L" "A" "S" "S"

Introducing the Biostrings Bioconductor package

## A much easier way to do this: use the Biostrings library

library(Biostrings)

print(GENETIC_CODE)
## TTT TTC TTA TTG TCT TCC TCA TCG TAT TAC TAA TAG TGT TGC TGA TGG CTT CTC 
## "F" "F" "L" "L" "S" "S" "S" "S" "Y" "Y" "*" "*" "C" "C" "*" "W" "L" "L" 
## CTA CTG CCT CCC CCA CCG CAT CAC CAA CAG CGT CGC CGA CGG ATT ATC ATA ATG 
## "L" "L" "P" "P" "P" "P" "H" "H" "Q" "Q" "R" "R" "R" "R" "I" "I" "I" "M" 
## ACT ACC ACA ACG AAT AAC AAA AAG AGT AGC AGA AGG GTT GTC GTA GTG GCT GCC 
## "T" "T" "T" "T" "N" "N" "K" "K" "S" "S" "R" "R" "V" "V" "V" "V" "A" "A" 
## GCA GCG GAT GAC GAA GAG GGT GGC GGA GGG 
## "A" "A" "D" "D" "E" "E" "G" "G" "G" "G"
(protein.letters.2 = GENETIC_CODE[genetic.code.codon])
## GAT GAG GTA ATC TTG TGT TTG GCA TCA TCT 
## "D" "E" "V" "I" "L" "C" "L" "A" "S" "S"
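GENETIC_CODE works here because it is a named character vector, and indexing a named vector by a character vector looks up every name at once. The sketch below reproduces that technique in base R with a small hand-typed table, so it runs without Biostrings. The name codon.table is hypothetical, and the table covers only the codons in this example.

```r
# Hand-typed subset of the standard genetic code (only the codons used in this example)
codon.table = c(GAT = 'D', GAG = 'E', GTA = 'V', ATC = 'I',
                TTG = 'L', TGT = 'C', GCA = 'A', TCA = 'S', TCT = 'S')

codons = c('GAT', 'GAG', 'GTA', 'ATC', 'TTG', 'TGT', 'TTG', 'GCA', 'TCA', 'TCT')

# Indexing by name translates all codons in one step: no loop, no if-else chain
protein = codon.table[codons]

# unname() drops the codon labels; paste(collapse = '') joins the letters into one string
paste(unname(protein), collapse = '')
```

A codon that is missing from the table would come back as NA, which makes gaps in a hand-typed table easy to spot.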

control-structure: Other Resources

Quick R

The Quick-R reference covers many more control structures in R.

Simply ?

By now you have had a chance to use a number of R’s functions: seq(), substr(), nchar(), etc. How does one learn how to use these functions?

## The easiest way to learn about a function is:

?substr
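A few related base R tools can complement ?. The sketch below uses only functions that ship with R. The calls that open help pages or run examples are wrapped in interactive() so that the block also runs cleanly in a non-interactive script.

```r
# args() prints a function's argument list without opening the help page
args(substr)

# These calls open help pages or run examples, so they are guarded for interactive use
if(interactive()){
  help('substr')             # same as ?substr
  example(substr)            # run the examples from the help page
  help.search('substring')   # same as ??substring: search help pages by keyword
}
```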

Useful websites

At times we may not know the function name. The best places to ask questions are Stack Overflow and seekR. The R-bloggers site has some tips for posting good questions to R-help.

The apply function family

The apply function

The for-loop is great, but it can be bulky. We now introduce the apply family of utilities.

Stack Overflow has a very interesting discussion on apply.

Use apply when you want to apply a function to the rows or columns of a matrix (or to the higher-dimensional analogues of an array).


# Two dimensional matrix
(M <- matrix(seq(1,16), 4, 4))
##      [,1] [,2] [,3] [,4]
## [1,]    1    5    9   13
## [2,]    2    6   10   14
## [3,]    3    7   11   15
## [4,]    4    8   12   16
# apply min to rows
apply(M, 1, min)
## [1] 1 2 3 4
# apply max to columns
apply(M, 2, max)
## [1]  4  8 12 16
# 3 dimensional array
M <- array( seq(32), dim = c(4,4,2))

# Apply sum to each M[*, , ] - i.e. sum over the 2nd and 3rd dimensions
apply(M, 1, sum)
## [1] 120 128 136 144
# Result is one-dimensional

# Apply sum to each M[*, *, ] - i.e. sum over the 3rd dimension
apply(M, c(1,2), sum)
##      [,1] [,2] [,3] [,4]
## [1,]   18   26   34   42
## [2,]   20   28   36   44
## [3,]   22   30   38   46
## [4,]   24   32   40   48
# Result is two-dimensional
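The function passed to apply does not have to be built in. As a small sketch, the block below passes an anonymous function that computes the spread of each row, and then compares apply with sum against rowSums(), the dedicated base R function for that case. The matrix is the same 4 x 4 example as above.

```r
M <- matrix(seq(1, 16), 4, 4)

# Anonymous function: spread (max minus min) of each row
apply(M, 1, function(r) max(r) - min(r))

# For plain sums, rowSums() gives the same numbers as apply() with sum
rowSums(M)
apply(M, 1, sum)
```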

The lapply function

Use lapply when you want to apply a function to each element of a list in turn and get a list back.


x <- list(a = 1, b = 1:3, c = 10:100) 
(lapply(x, FUN = length))
## $a
## [1] 1
## 
## $b
## [1] 3
## 
## $c
## [1] 91
(lapply(x, FUN = sum))
## $a
## [1] 1
## 
## $b
## [1] 6
## 
## $c
## [1] 5005
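To see what lapply saves, it can help to write the equivalent for-loop next to it. The sketch below builds the same list of lengths both ways. The names out.loop and out.lapply are introduced here for illustration.

```r
x <- list(a = 1, b = 1:3, c = 10:100)

# for-loop version: pre-allocate a list, keep the names, fill one element at a time
out.loop <- vector('list', length(x))
names(out.loop) <- names(x)
for(i in seq_along(x)){
  out.loop[[i]] <- length(x[[i]])
}

# lapply version: one line, and the names are carried over automatically
out.lapply <- lapply(x, FUN = length)

# Check that both approaches produce the same list
identical(out.loop, out.lapply)
```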

The sapply function

Use sapply when you want to apply a function to each element of a list, but you want a vector back (rather than a list).
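A minimal sketch, reusing the same list x as in the lapply example: sapply runs the function on each element and then tries to simplify the result to a vector. The last line shows vapply, a stricter base R relative that is not covered in this lecture. It requires the expected type and length of each result to be declared up front.

```r
x <- list(a = 1, b = 1:3, c = 10:100)

# Same computations as with lapply, but the results come back as named vectors
sapply(x, FUN = length)
sapply(x, FUN = sum)

# vapply: like sapply, but each result must be a single integer or it raises an error
vapply(x, FUN = length, FUN.VALUE = integer(1))
```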