1/31/2018

A Little Scenario

  • Your script takes 30 minutes to run
  • You can only go get coffee so many times in a day
  • You remember "someone told me for loops are slow"
  • You take half a day to get rid of them all
  • Now your script takes 29 minutes and 55 seconds to run
  • You fling your coffee at the wall of your office

A Similar Scenario

  • Your bike feels sluggish on your way to work
  • You remember "someone told me carbon components make your bike lighter"
  • You buy expensive carbon components
  • Your bike still feels slow
  • Your brake pads have been rubbing on your wheels

Wisdom from Donald

Donald Knuth, creator of TeX and general computer science titan, once gave some sound advice:

"The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming."

Ground We'll Cover Today

  • So many topics to cover, so I'll go for breadth, not depth (this isn't an apply tutorial)
  • I'll try to arm you with tools to optimize strategically
  • I'll also hit some low-hanging fruit that applies across contexts
  • Remember, R is really broad and flexible, so it's my way or one of a hundred highways
  • Also there are lots of puns

Getting the Lay of the Land

  • First thing we want to do is figure out our optimization targets
  • R has a built-in feature, Rprof()
  • However, Rprof() is evil
  • Not really, it's just hard to use
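For the curious, using Rprof() directly looks something like this (a minimal sketch; the file name is arbitrary):

```r
# Base-R profiling: sampled call stacks go to a file,
# and you summarize them yourself afterward
Rprof("prof.out")                   # start sampling
x <- lapply(1:20, function(i) sort(rnorm(1e5)))
Rprof(NULL)                         # stop sampling
summaryRprof("prof.out")$by.self    # time spent in each function
```

profvis() wraps this same machinery in an interactive viewer.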

A Savior!

Meet your new best friend, profvis()

first_profile <- profvis({
  times <- 4e5
  cols <- 150
  data <- as.data.frame(x = matrix(rnorm(times * cols, mean = 5), ncol = cols))
  data <- cbind(id = paste0("g", seq_len(times)), data)
  data1 <- data
  means <- apply(data1[, names(data1) != "id"], 2, mean)
  for (i in seq_along(means)) {
    data1[, names(data1) != "id"][, i] <- data1[, names(data1) != "id"][, i] - means[i]
  }
}, height = "400px")

profvis() results

RStudio Built-In

  • In RStudio, you can use a built-in feature
  • Select some lines of code, then go up to the Profile drop-down menu up top (between Debug and Tools)
  • Use "Profile Selected Lines"

Finer-Scale Timing

  • microbenchmark is a nice way to check speeds of several similar functions
  • It will run many times, since results are stochastic
microbenchmark(
  import("AllYearsTeamRatings.csv"),
  read_csv("AllYearsTeamRatings.csv"),
  read.csv("AllYearsTeamRatings.csv")
)
## Unit: milliseconds
##                                 expr       min        lq      mean
##    import("AllYearsTeamRatings.csv")  1.990818  2.215731  2.634484
##  read_csv("AllYearsTeamRatings.csv")  7.851478  8.585897 10.288128
##  read.csv("AllYearsTeamRatings.csv") 10.131602 11.255166 12.597679
##     median       uq       max neval
##   2.399930  2.86970  7.484421   100
##   9.525741 10.48398 35.140998   100
##  11.945069 12.82106 28.540198   100

PassI/Onate About Speed

  • As our previous results show, import speeds can vary a lot
  • Input/Output (I/O) can be a bottleneck sometimes
  • The rio package is smart: it picks a fast function based on the file type
  • Its import() and export() automatically detect the file extension
  • Consider using a faster file format for data that will only be used within a computing environment
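A quick sketch of the rio workflow, using the built-in mtcars data as a stand-in for a real dataset:

```r
library(rio)
# export() and import() dispatch on the file extension,
# so swapping formats is mostly a matter of renaming
export(mtcars, "mtcars.csv")
dat <- import("mtcars.csv")
```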

CSV vs. feather

data <- data.frame(rnorm(100000))
microbenchmark(write_csv(data, "test_file.csv"),
               write_feather(data, "test_file.feather"))
## Unit: microseconds
##                                      expr       min         lq      mean
##          write_csv(data, "test_file.csv") 26455.111 28355.5820 31801.475
##  write_feather(data, "test_file.feather")   739.033   876.0075  1260.006
##     median        uq       max neval
##  29344.907 34407.686 45073.669   100
##   1257.766  1550.091  4826.323   100

CSV vs. feather

microbenchmark(read_csv("test_file.csv"),
               read_feather("test_file.feather"))
## Unit: microseconds
##                               expr       min        lq      mean    median
##          read_csv("test_file.csv") 30614.785 36119.094 47165.268 44306.647
##  read_feather("test_file.feather")   610.757  1054.277  1523.343  1226.899
##         uq       max neval
##  52416.341 121495.88   100
##   1460.957  20330.38   100

How to Make Your Code Fast

Poor, Misunderstood for Loops

  • Often the first thing to get picked on
  • They aren't always as bad as people say
  • They're often poorly used, but it's not their fault!

What Actually Makes a Loop Slow?

  • Here's a stupid example:
microbenchmark({
  A <- 0
  for (i in 1:10000){
    10
    A = A + 1
  }
},
{
  A <- 0
  for (i in 1:10000){
    ((((((10))))))
    A = A + 1
  }
})

What Actually Makes a Loop Slow?

## Unit: milliseconds
##                                                                                    expr
##              {     A <- 0     for (i in 1:10000) {         10         A = A + 1     } }
##  {     A <- 0     for (i in 1:10000) {         ((((((10))))))         A = A + 1     } }
##       min       lq     mean   median       uq       max neval
##  2.166571 2.260783 2.566741 2.343574 2.680313  9.398676   100
##  4.705065 4.890848 6.066976 5.056176 5.768841 13.720298   100
  • Remember, in R, everything is a function call (or an object)
  • 2 + 3 is actually +(2,3), the function + with arguments 2 and 3
  • ( itself is a function, so using a bunch of them results in lots of function calls and lookups
  • Lots of calls in a loop = way more calls in total
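You can see this for yourself by calling the operators as ordinary functions:

```r
# Every operator is a function you can call by name
`+`(2, 3)   # identical to 2 + 3
## [1] 5
`(`(10)     # identical to (10): each paren pair is one more call
## [1] 10
```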

What Actually Makes a Loop Slow?

  • for loops are slow when written inefficiently
  • Memory pre-allocation helps a lot
  • Do as much as possible outside the loop (also true for *apply)
  • Don't avoid loops just to avoid loops

*apply yourself

  • Fiery debate over whether *apply functions are anything more than "syntactic sugar"
  • They contain loops, but avoid some overhead and sometimes use faster C code
  • Truly vectorized code, like colMeans(), loops through values in the underlying C/FORTRAN code, which is what makes it so fast
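A quick sketch of that gap, comparing an R-level column loop to colMeans():

```r
library(microbenchmark)
m <- matrix(rnorm(1e6), ncol = 100)
# Same answer, very different amounts of R-level work
microbenchmark(
  apply(m, 2, mean),   # R calls mean() once per column
  colMeans(m)          # the loop lives in compiled code
)
```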

*apply yourself

  • Comparing an inefficient, vector-growing for loop to vapply
loop_square <- function(x){
  results <- NA
  for (i in seq_along(x)) results[i] <- x[i]^2
  return(results)
}
## Unit: milliseconds
##                                                expr       min         lq
##                                      loop_square(x) 106.75635 117.890377
##  vapply(x, function(z) z^2, FUN.VALUE = numeric(1))   5.17472   5.634594
##       mean     median         uq      max neval
##  190.82372 154.439612 253.061675 371.4646   100
##   11.07022   6.146159   6.757886 126.7339   100

*apply yourself

  • Comparing a more efficient for loop to vapply
loop_square <- function(x){
  results <- vector(mode = "double", length = length(x))
  for (i in seq_along(x)) results[i] <- x[i]^2
  return(results)
}
## Unit: milliseconds
##                                                expr      min       lq
##                                      loop_square(x) 6.738691 7.764370
##  vapply(x, function(z) z^2, FUN.VALUE = numeric(1)) 5.073235 5.732297
##       mean   median      uq      max neval
##  10.092268 8.274952 9.09996 93.48478   100
##   6.821297 6.045478 6.49372 14.24019   100

*apply yourself

  • Another big benefit: many packages for parallel computing in R include parallelized apply functions
  • mclapply() from the parallel package is one example
  • Always remember: *apply will never beat truly vectorized base functions, like ^
## Unit: microseconds
##                                                expr     min       lq
##                                                 x^2   1.383   1.6910
##  vapply(x, function(z) z^2, FUN.VALUE = numeric(1)) 496.614 573.0450
##                                      loop_square(x) 653.372 760.8065
##       mean   median        uq      max neval
##    2.32919   1.9530    2.5885   10.679   100
##  778.87296 602.0425  685.9055 7786.401   100
##  984.00888 845.3905 1000.8405 8598.782   100
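A minimal mclapply() sketch (it works by forking, which isn't available on Windows; the toy task here is made up):

```r
library(parallel)
# mclapply() is a drop-in replacement for lapply();
# fall back to one core on Windows, where forking isn't supported
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(1:8, function(i) sum(rnorm(1e5)), mc.cores = cores)
length(res)   # one result per input, just like lapply()
```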

Cache Me If You Can

  • memoise package can cache answers to a function
  • If you call the function with the same inputs, it'll remember the old answer
fib <- function(n) {
  if (n < 2) {
    return(n)
  } else {
    return(fib(n-1) + fib(n-2))
  }
}

Cache Me If You Can

system.time(x <- fib(25))
##    user  system elapsed 
##   0.199   0.001   0.202
system.time(z <- fib(30))
##    user  system elapsed 
##   2.250   0.018   2.285

Cache Me If You Can

  • Now let's memoise the function and look at how speed changes
fib_mem <- memoise(fib)
system.time(a <- fib_mem(30))
##    user  system elapsed 
##   2.471   0.022   2.607
system.time(b <- fib_mem(30))
##    user  system elapsed 
##   0.001   0.000   0.000
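Note that the first fib_mem(30) call above was no faster than plain fib(30): the recursive calls inside fib still go to the unmemoised function, so only the top-level result gets cached. If the function recurses through its memoised self, even the first call is fast (a sketch, with fib_mem2 as a made-up name):

```r
library(memoise)
fib_mem2 <- memoise(function(n) {
  if (n < 2) return(n)
  # these calls hit the memoised wrapper, so each n is computed once
  fib_mem2(n - 1) + fib_mem2(n - 2)
})
fib_mem2(30)
## [1] 832040
```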

Cache Me If You Can

  • A few potential uses:
  • You're running a Shiny app and want to store results in case the user selects the same inputs again
  • You're applying a function over a dataset where you expect many similar inputs
  • You're working on a script and want to save results of some slow calls (similar to caching chunks in RMarkdown)
  • Tradeoff here is speed vs. memory

Cache Me If You Can

  • Very dangerous if you're doing some sort of randomization
rnorm_plus1 <- function(x) {
  rnorm(x)+1
}
x <- rnorm_plus1(10)
y <- rnorm_plus1(10)
all.equal(x,y)
## [1] "Mean relative difference: 1.247497"
rp1_mem <- memoise(rnorm_plus1)
x <- rp1_mem(10)
y <- rp1_mem(10)
all.equal(x,y)
## [1] TRUE

Take a Big Ole Byte (Compiler)

  • R code is interpreted at run time, not compiled ahead of time
  • The Just-In-Time (JIT) compiler compiles your code into bytecode, "just in time" to be run
  • If you're running R 3.3 or earlier, the JIT compiler is not turned on by default
  • In R 3.4, JIT is turned on by default
  • The value passed to enableJIT() controls how much gets compiled (3 is the most)
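For reference, toggling the JIT looks like this (enableJIT() lives in the compiler package and returns the previous level):

```r
library(compiler)
# 0 = off; 1-3 compile progressively more
# (3 also compiles loops right before they run)
old <- enableJIT(0)
enableJIT(old)   # put it back the way we found it
```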

Take a Big Ole Byte (Compiler)

  • Here we're comparing enableJIT(3) to enableJIT(0)
  • The (boring) function that we create will get used 100 times
## Unit: milliseconds
##                                                                                                          expr
##  {     enableJIT(3)     f <- function(n, x) for (i in 1:n) x = (1 + x)^(-1)     for (i in 1:100) f(1000, 1) }
##  {     enableJIT(0)     f <- function(n, x) for (i in 1:n) x = (1 + x)^(-1)     for (i in 1:100) f(1000, 1) }
##       min       lq     mean   median       uq      max neval
##  13.76068 14.35882 16.70984 15.08390 16.10643 28.74623    50
##  45.80460 55.12551 57.14903 56.53855 58.43033 71.93230    50

Take a Big Ole Byte (Compiler)

  • Same comparison, but the function we create is only getting used once
## Unit: microseconds
##                                                                                         expr
##  {     enableJIT(3)     f <- function(n, x) for (i in 1:n) x = (1 + x)^(-1)     f(1000, 1) }
##  {     enableJIT(0)     f <- function(n, x) for (i in 1:n) x = (1 + x)^(-1)     f(1000, 1) }
##       min       lq      mean   median       uq      max neval
##  3028.035 3377.980 4146.3370 3803.077 4300.393 14756.11    50
##   392.075  431.119  709.4553  493.852  574.315 10404.19    50
  • If a function only gets called once, compiling it can bring overhead costs

Take a Big Ole Byte (Compiler)

  • Be careful though, JIT compiler is turned to 3 by default in R 3.4, and sometimes JIT can be slower
  • These are results from an Integral Projection Model script

Take a Big Ole Byte (Compiler)

(plot: run time in Seconds at each JitValue)

Take a Big Ole Byte (Compiler)

  • You can also compile individual functions with cmpfun() from the compiler package
enableJIT(0)
## [1] 3
x <- rnorm(10000)
ls_pre_compile <- cmpfun(loop_square)
## Unit: milliseconds
##               expr      min       lq     mean   median       uq       max
##     loop_square(x) 6.791415 7.847052 9.841105 8.409107 9.623776 19.823237
##  ls_pre_compile(x) 1.273389 1.331444 1.542107 1.425895 1.592417  3.072625
##  neval
##    100
##    100

Take a Big Ole Byte (Compiler)

  • What if we take the compilation time into consideration?
## Unit: milliseconds
##                                                         expr      min
##  {     ls_compile <- cmpfun(loop_square)     ls_compile(x) } 4.995944
##                                               loop_square(x) 6.811277
##                                            ls_pre_compile(x) 1.275968
##        lq      mean   median        uq       max neval
##  5.555326  6.318468 5.818634  6.249761  16.66565   100
##  7.685619 11.407816 8.325102 10.198276 112.37745   100
##  1.328937  1.490730 1.398649  1.530100   2.57359   100
  • It's faster even if you include the compilation step!
  • Some functions benefit more from compiling than others
  • Individual compilation may not make much of a difference in R 3.4, where enableJIT(3) is the default

It's My Computer's Fault

  • Sometimes an excuse, sometimes true
  • Benchmarking scripts to see how fast your computer is
  • benchmarkme package, benchmark_std() function
  • Virtual machines through Google or Amazon
  • Don't sell the FARM
  • Buy more RAM (if you can)

Put Your Matrices on BLASt

  • BLAS = Basic Linear Algebra Subprograms
  • The reference BLAS that ships with R is universal and solid
  • Others are optimized for certain computers
  • You can figure out what BLAS library you're using with this function:
library(benchmarkme)
get_linear_algebra()

R-ecipe For Success

  • First do the one-time low-hanging fruit (BLAS)
  • Profile your code, identify targets AND speed goals
  • Decide which targets are worth the work/learning to fix
  • Test some alternatives, use benchmarking/profiling
  • Keep your old versions for future reference
  • For goodness' sake, use base functions

Remember: being strategic and disciplined is good, but nobody is perfect, and experimentation/playing around can be an important part of learning

Things to Keep in Mind

  • How much you'll use your code
  • How steep the learning curve is
  • How much time you have as a researcher
  • R is not a fast language

If you're crunched for time, learning a new method will take a while, and you'll never use this script again, why optimize it?

Credit Where It's Due

I got a lot of the information from these sources, among others:

Efficient R Programming by Colin Gillespie: This book covers a HUGE range of topics and is really well written.

Noam's D-RUG talk from 2013 and his blog post on vectorization

Hadley Wickham's Advanced R: I particularly used the Performance and Profiling chapters

An Article on For Loops in an R Newsletter From 2008

Lots of Stack Overflow

R Inferno by Patrick Burns: If you want to slowly descend into infernal torment of R knowledge, really dig into this book