R, Data Science and other buzzwords

Jason Hartford
Feb 2015

What we'll look at…

Originally developed as an open source clone of S.

… so it's very popular in statistics departments (where I first learnt it)

Now incredibly popular in data science because of the abundance of good libraries

(Though Python may be taking over)

It's full of inconsistencies

Can be a pain to learn - especially if you think of it as a programming language (!)

But… (personal opinion) it's still the fastest way to go from raw data to useful information…

R is the language

RStudio is the IDE

Just about everyone uses RStudio… unless you're a Vim ninja or have some other good reason not to, I'd advise you to do the same…

Four main datatypes

x <- 10.5
a <- c(1,3,5,7)
b <- c(1,2,4,8)
d <- c(a,b)
d

[1] 1 3 5 7 1 2 4 8

y <- 3L     # integer (note the L suffix)
z <- 1 + 2i # complex
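To see which type R has actually stored, class() is a quick check. A minimal sketch using the same values as above; the vector name v is only for illustration:

```r
x <- 10.5     # double-precision numeric (the default for numbers)
y <- 3L       # the L suffix makes an integer
z <- 1 + 2i   # complex

class(x)  # "numeric"
class(y)  # "integer"
class(z)  # "complex"

# Mixing types in one vector coerces everything to the more general type
v <- c(y, x)
class(v)  # "numeric"
```

A vector holds only one type, which is why the integer is promoted when combined with a double.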

iq <- c(101, 115, 83, 120)
name <- c("Tom", "Bob", "Jason", "Jennifer")
google_job <- c(TRUE, FALSE, FALSE, TRUE) # avoid naming a variable c: it masks the c() function

dat <- data.frame(iq = iq, name = name, google_job = google_job)
dat

   iq     name google_job
1 101      Tom       TRUE
2 115      Bob      FALSE
3  83    Jason      FALSE
4 120 Jennifer       TRUE
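Once the data is in a data frame, columns and rows are easy to pull out. A small sketch on the same table, rebuilt here so it runs on its own; the names mean_iq, hired and hired_names are illustrative, and stringsAsFactors = FALSE is set so the names stay as plain strings in older R versions too:

```r
dat <- data.frame(
  iq = c(101, 115, 83, 120),
  name = c("Tom", "Bob", "Jason", "Jennifer"),
  google_job = c(TRUE, FALSE, FALSE, TRUE),
  stringsAsFactors = FALSE  # keep names as character strings
)

mean_iq <- mean(dat$iq)          # $ extracts one column as a vector
hired <- dat[dat$google_job, ]   # logical indexing keeps the rows where TRUE
hired_names <- hired$name

mean_iq       # 104.75
hired_names   # "Tom" "Jennifer"
```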

x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))
str(x) # One of the most useful commands you'll learn

List of 4
 $ : int [1:3] 1 2 3
 $ : chr "a"
 $ : logi [1:3] TRUE FALSE TRUE
 $ : num [1:2] 2.3 5.9
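Lists have two kinds of indexing, which trips up most newcomers. A short sketch with the same list; first_element and first_sublist are illustrative names:

```r
x <- list(1:3, "a", c(TRUE, FALSE, TRUE), c(2.3, 5.9))

first_element <- x[[1]]  # [[ ]] extracts the element itself: an integer vector
first_sublist <- x[1]    # [ ] returns a list of length 1 that contains it

class(first_element)  # "integer"
class(first_sublist)  # "list"
sum(x[[1]])           # 6
```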

a <- matrix(1:6, ncol = 3, nrow = 2)
b <- matrix(c(2,2,4), nrow = 3, ncol = 1)
a %*% b # note * does elementwise multiplication

     [,1]
[1,]   28
[2,]   36

t(b) # transpose; all the other usual matrix operations are available too

     [,1] [,2] [,3]
[1,]    2    2    4
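To make the difference between * and %*% concrete, here is a sketch with the same two matrices; the names elementwise and product are only for illustration:

```r
a <- matrix(1:6, nrow = 2, ncol = 3)        # filled column by column
b <- matrix(c(2, 2, 4), nrow = 3, ncol = 1)

elementwise <- a * a   # needs matching dimensions; squares each entry
product <- a %*% b     # (2 x 3) %*% (3 x 1) gives a 2 x 1 matrix

dim(elementwise)  # 2 3
dim(product)      # 2 1
dim(t(a))         # 3 2
```

Checking dim() first is a cheap way to catch a non-conformable matrix product before it errors.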

R can read from just about anywhere…

CSV and flat files:

dat <- read.csv('file.csv', sep = ",")

or, more generally, read.table()
Databases: "RMySQL", "RODBC", "ROracle", "RPostgreSQL", etc.
Excel, JSON, URLs, Twitter, APIs, etc.
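A self-contained sketch of the CSV case, which writes a tiny temporary file so there is something to read; the file contents and the names path, via_csv and via_table are made up for illustration:

```r
# Write a small CSV to a temporary file so the example runs anywhere
path <- tempfile(fileext = ".csv")
writeLines(c("iq,name", "101,Tom", "115,Bob"), path)

via_csv <- read.csv(path)
# read.csv() is read.table() with CSV-friendly defaults such as these
via_table <- read.table(path, header = TRUE, sep = ",")

str(via_csv)                          # inspect what was read
identical(via_csv$iq, via_table$iq)   # TRUE
unlink(path)                          # clean up the temporary file
```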

Presentations

Markdown (Shiny)

LaTeX

Advanced R by Hadley Wickham

An Introduction to Statistical Learning by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani

Statistical Learning by Trevor Hastie and Robert Tibshirani