1/31/2018

A Little Scenario

  • Your script takes 30 minutes to run
  • You can only go get coffee so many times in a day
  • You remember "someone told me for loops are slow"
  • You take half a day to get rid of them all
  • Now your script takes 29 minutes and 55 seconds to run
  • You fling your coffee at the wall of your office

A Similar Scenario

  • Your bike feels sluggish on your way to work
  • You remember "someone told me carbon components make your bike lighter"
  • You buy expensive carbon components
  • Your bike still feels slow
  • Your brake pads have been rubbing on your wheels

Wisdom from Donald

Donald Knuth, creator of TeX and general computer science titan, once gave some sound advice:

"The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming."

Ground We'll Cover Today

  • So many topics to cover, so I'll go for breadth, not depth (this isn't an apply tutorial)
  • I'll try to arm you with tools to optimize strategically
  • I'll also hit some low-hanging fruit that applies across contexts
  • Remember, R is really broad and flexible, so it's my way or one of a hundred highways
  • Also there are lots of puns

Getting the Lay of the Land

  • First thing we want to do is figure out our optimization targets
  • R has a built-in feature, Rprof()
  • However, Rprof() is evil
  • Not really, it's just hard to use
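For the curious, using Rprof() directly looks something like this (a minimal sketch; the file name is arbitrary):

```r
# Base-R profiling: sampled call stacks go to a file,
# and you summarize them yourself afterward
Rprof("prof.out")                   # start sampling
x <- lapply(1:20, function(i) sort(rnorm(1e5)))
Rprof(NULL)                         # stop sampling
summaryRprof("prof.out")$by.self    # time spent in each function
```

profvis() wraps this same machinery in an interactive viewer.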

A Savior!

Meet your new best friend, profvis()

first_profile <- profvis({
  times <- 4e5
  cols <- 150
  data <- as.data.frame(x = matrix(rnorm(times * cols, mean = 5), ncol = cols))
  data <- cbind(id = paste0("g", seq_len(times)), data)
  data1 <- data
  means <- apply(data1[, names(data1) != "id"], 2, mean)
  for (i in seq_along(means)) {
    data1[, names(data1) != "id"][, i] <- data1[, names(data1) != "id"][, i] - means[i]
  }
}, height = "400px")

profvis() results

RStudio Built-In

  • In RStudio, you can use a built-in feature
  • Select some lines of code, then go up to the Profile drop-down menu up top (between Debug and Tools)
  • Use "Profile Selected Lines"

Finer-Scale Timing

  • microbenchmark is a nice way to check speeds of several similar functions
  • It will run many times, since results are stochastic
microbenchmark(
  import("AllYearsTeamRatings.csv"),
  read_csv("AllYearsTeamRatings.csv"),
  read.csv("AllYearsTeamRatings.csv")
)
## Unit: milliseconds
##                                 expr       min        lq      mean
##    import("AllYearsTeamRatings.csv")  1.990818  2.215731  2.634484
##  read_csv("AllYearsTeamRatings.csv")  7.851478  8.585897 10.288128
##  read.csv("AllYearsTeamRatings.csv") 10.131602 11.255166 12.597679
##     median       uq       max neval
##   2.399930  2.86970  7.484421   100
##   9.525741 10.48398 35.140998   100
##  11.945069 12.82106 28.540198   100

PassI/Onate About Speed

  • As our previous results show, import speeds can vary a lot
  • Input/Output (I/O) can be a bottleneck sometimes
  • The rio package is smart: it picks a fast function based on the file type
  • Its import() and export() automatically detect the file extension
  • Consider using a faster file format for data that will only be used within a computing environment
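A quick sketch of the rio workflow, using the built-in mtcars data as a stand-in for a real dataset:

```r
library(rio)
# export() and import() dispatch on the file extension,
# so swapping formats is mostly a matter of renaming
export(mtcars, "mtcars.csv")
dat <- import("mtcars.csv")
```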

CSV vs. feather

data <- data.frame(rnorm(100000))
microbenchmark(write_csv(data, "test_file.csv"),
               write_feather(data, "test_file.feather"))
## Unit: microseconds
##                                      expr       min         lq      mean
##          write_csv(data, "test_file.csv") 26455.111 28355.5820 31801.475
##  write_feather(data, "test_file.feather")   739.033   876.0075  1260.006
##     median        uq       max neval
##  29344.907 34407.686 45073.669   100
##   1257.766  1550.091  4826.323   100

CSV vs. feather

microbenchmark(read_csv("test_file.csv"),
               read_feather("test_file.feather"))
## Unit: microseconds
##                               expr       min        lq      mean    median
##          read_csv("test_file.csv") 30614.785 36119.094 47165.268 44306.647
##  read_feather("test_file.feather")   610.757  1054.277  1523.343  1226.899
##         uq       max neval
##  52416.341 121495.88   100
##   1460.957  20330.38   100

How to Make Your Code Fast

Poor, Misunderstood for Loops

  • Often the first thing to get picked on
  • They aren't always as bad as people say
  • They're often poorly used, but it's not their fault!

What Actually Makes a Loop Slow?

  • Here's a stupid example:
microbenchmark({
  A <- 0
  for (i in 1:10000){
    10
    A = A + 1
  }
},
{
  A <- 0
  for (i in 1:10000){
    ((((((10))))))
    A = A + 1
  }
})

What Actually Makes a Loop Slow?

## Unit: milliseconds
##                                                                                    expr
##              {     A <- 0     for (i in 1:10000) {         10         A = A + 1     } }
##  {     A <- 0     for (i in 1:10000) {         ((((((10))))))         A = A + 1     } }
##       min       lq     mean   median       uq       max neval
##  2.166571 2.260783 2.566741 2.343574 2.680313  9.398676   100
##  4.705065 4.890848 6.066976 5.056176 5.768841 13.720298   100
  • Remember, in R, everything is a function call (or an object)
  • 2 + 3 is actually +(2,3), the function + with arguments 2 and 3
  • ( itself is a function, so using a bunch of them results in lots of function calls and lookups
  • Lots of calls in a loop = way more calls in total
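You can see this for yourself by calling the operators as ordinary functions:

```r
# Every operator is a function you can call by name
`+`(2, 3)   # identical to 2 + 3
## [1] 5
`(`(10)     # identical to (10): each paren pair is one more call
## [1] 10
```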

What Actually Makes a Loop Slow?

  • for loops are slow when written inefficiently
  • Memory pre-allocation helps a lot
  • Do as much as possible outside the loop (also true for *apply)
  • Don't avoid loops just to avoid loops

*apply yourself

  • Fiery debate over whether *apply functions are anything more than "syntactic sugar"
  • They contain loops, but avoid some overhead and sometimes use faster C code
  • Truly vectorized code, like colMeans(), loops through values in the underlying C/FORTRAN code, which is what makes it so fast
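A quick sketch of that gap, comparing an R-level column loop to colMeans():

```r
library(microbenchmark)
m <- matrix(rnorm(1e6), ncol = 100)
# Same answer, very different amounts of R-level work
microbenchmark(
  apply(m, 2, mean),   # R calls mean() once per column
  colMeans(m)          # the loop lives in compiled code
)
```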

*apply yourself

  • Comparing an inefficient, vector-growing for loop to vapply
loop_square <- function(x){
  results <- NA
  for (i in seq_along(x)) results[i] <- x[i]^2
  return(results)
}
## Unit: milliseconds
##                                                expr       min         lq
##                                      loop_square(x) 106.75635 117.890377
##  vapply(x, function(z) z^2, FUN.VALUE = numeric(1))   5.17472   5.634594
##       mean     median         uq      max neval
##  190.82372 154.439612 253.061675 371.4646   100
##   11.07022   6.146159   6.757886 126.7339   100

*apply yourself

  • Comparing a more efficient for loop to vapply
loop_square <- function(x){
  results <- vector(mode = "double", length = length(x))
  for (i in seq_along(x)) results[i] <- x[i]^2
  return(results)
}
## Unit: milliseconds
##                                                expr      min       lq
##                                      loop_square(x) 6.738691 7.764370
##  vapply(x, function(z) z^2, FUN.VALUE = numeric(1)) 5.073235 5.732297
##       mean   median      uq      max neval
##  10.092268 8.274952 9.09996 93.48478   100
##   6.821297 6.045478 6.49372 14.24019   100

*apply yourself

  • Another big benefit: many packages for parallel computing in R include parallelized apply functions
  • mclapply() from the parallel package is one example
  • Always remember: *apply will never beat truly vectorized base functions, like ^
## Unit: microseconds
##                                                expr     min       lq
##                                                 x^2   1.383   1.6910
##  vapply(x, function(z) z^2, FUN.VALUE = numeric(1)) 496.614 573.0450
##                                      loop_square(x) 653.372 760.8065
##       mean   median        uq      max neval
##    2.32919   1.9530    2.5885   10.679   100
##  778.87296 602.0425  685.9055 7786.401   100
##  984.00888 845.3905 1000.8405 8598.782   100
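A minimal mclapply() sketch (it works by forking, which isn't available on Windows; the toy task here is made up):

```r
library(parallel)
# mclapply() is a drop-in replacement for lapply();
# fall back to one core on Windows, where forking isn't supported
cores <- if (.Platform$OS.type == "windows") 1L else 2L
res <- mclapply(1:8, function(i) sum(rnorm(1e5)), mc.cores = cores)
length(res)   # one result per input, just like lapply()
```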

Cache Me If You Can

  • memoise package can cache answers to a function
  • If you call the function with the same inputs, it'll remember the old answer
fib <- function(n) {
  if (n < 2) {
    return(n)
  } else {
    return(fib(n-1) + fib(n-2))
  }
}

Cache Me If You Can

system.time(x <- fib(25))
##    user  system elapsed 
##   0.199   0.001   0.202
system.time(z <- fib(30))
##    user  system elapsed 
##   2.250   0.018   2.285

Cache Me If You Can

  • Now let's memoise the function and look at how speed changes
fib_mem <- memoise(fib)
system.time(a <- fib_mem(30))
##    user  system elapsed 
##   2.471   0.022   2.607
system.time(b <- fib_mem(30))
##    user  system elapsed 
##   0.001   0.000   0.000
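Note that the first fib_mem(30) call above was no faster than plain fib(30): the recursive calls inside fib still go to the unmemoised function, so only the top-level result gets cached. If the function recurses through its memoised self, even the first call is fast (a sketch, with fib_mem2 as a made-up name):

```r
library(memoise)
fib_mem2 <- memoise(function(n) {
  if (n < 2) return(n)
  # these calls hit the memoised wrapper, so each n is computed once
  fib_mem2(n - 1) + fib_mem2(n - 2)
})
fib_mem2(30)
## [1] 832040
```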

Cache Me If You Can

  • A few potential uses:
  • You're running a Shiny app and want to store results in case the user selects the same inputs again
  • You're applying a function over a dataset where you expect many similar inputs
  • You're working on a script and want to save results of some slow calls (similar to caching chunks in RMarkdown)
  • Tradeoff here is speed vs. memory

Cache Me If You Can

  • Very dangerous if you're doing some sort of randomization
rnorm_plus1 <- function(x) {
  rnorm(x)+1
}
x <- rnorm_plus1(10)
y <- rnorm_plus1(10)
all.equal(x,y)
## [1] "Mean relative difference: 1.247497"
rp1_mem <- memoise(rnorm_plus1)
x <- rp1_mem(10)
y <- rp1_mem(10)
all.equal(x,y)
## [1] TRUE

Take a Big Ole Byte (Compiler)

  • R code is interpreted at run time, not compiled ahead of time
  • The Just-In-Time (JIT) compiler compiles your code into bytecode, "just in time" to be run
  • If you're running R 3.3 or earlier, the JIT compiler is not turned on by default
  • In R 3.4, JIT is turned on by default
  • The value passed to enableJIT() controls how much gets compiled (3 is the most)
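For reference, toggling the JIT looks like this (enableJIT() lives in the compiler package and returns the previous level):

```r
library(compiler)
# 0 = off; 1-3 compile progressively more
# (3 also compiles loops right before they run)
old <- enableJIT(0)
enableJIT(old)   # put it back the way we found it
```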

Take a Big Ole Byte (Compiler)

  • Here we're comparing enableJIT(3) to enableJIT(0)
  • The (boring) function that we create will get used 100 times
## Unit: milliseconds
##                                                                                                          expr
##  {     enableJIT(3)     f <- function(n, x) for (i in 1:n) x = (1 + x)^(-1)     for (i in 1:100) f(1000, 1) }
##  {     enableJIT(0)     f <- function(n, x) for (i in 1:n) x = (1 + x)^(-1)     for (i in 1:100) f(1000, 1) }
##       min       lq     mean   median       uq      max neval
##  13.76068 14.35882 16.70984 15.08390 16.10643 28.74623    50
##  45.80460 55.12551 57.14903 56.53855 58.43033 71.93230    50

Take a Big Ole Byte (Compiler)

  • Same comparison, but the function we create is only getting used once
## Unit: microseconds
##                                                                                         expr
##  {     enableJIT(3)     f <- function(n, x) for (i in 1:n) x = (1 + x)^(-1)     f(1000, 1) }
##  {     enableJIT(0)     f <- function(n, x) for (i in 1:n) x = (1 + x)^(-1)     f(1000, 1) }
##       min       lq      mean   median       uq      max neval
##  3028.035 3377.980 4146.3370 3803.077 4300.393 14756.11    50
##   392.075  431.119  709.4553  493.852  574.315 10404.19    50
  • If a function only gets called once, compiling it can bring overhead costs

Take a Big Ole Byte (Compiler)

  • Be careful though, JIT compiler is turned to 3 by default in R 3.4, and sometimes JIT can be slower
  • These are results from an Integral Projection Model script

Take a Big Ole Byte (Compiler)

(plot: run time in Seconds at each JitValue)

Take a Big Ole Byte (Compiler)

  • You can also compile individual functions with cmpfun() from the compiler package
enableJIT(0)
## [1] 3
x <- rnorm(10000)
ls_pre_compile <- cmpfun(loop_square)
## Unit: milliseconds
##               expr      min       lq     mean   median       uq       max
##     loop_square(x) 6.791415 7.847052 9.841105 8.409107 9.623776 19.823237
##  ls_pre_compile(x) 1.273389 1.331444 1.542107 1.425895 1.592417  3.072625
##  neval
##    100
##    100

Take a Big Ole Byte (Compiler)

  • What if we take the compilation time into consideration?
## Unit: milliseconds
##                                                         expr      min
##  {     ls_compile <- cmpfun(loop_square)     ls_compile(x) } 4.995944
##                                               loop_square(x) 6.811277
##                                            ls_pre_compile(x) 1.275968
##        lq      mean   median        uq       max neval
##  5.555326  6.318468 5.818634  6.249761  16.66565   100
##  7.685619 11.407816 8.325102 10.198276 112.37745   100
##  1.328937  1.490730 1.398649  1.530100   2.57359   100
  • It's faster even if you include the compilation step!
  • Some functions benefit more from compiling than others
  • Individual compilation may not make much of a difference in R 3.4, where enableJIT(3) is the default

It's My Computer's Fault

  • Sometimes an excuse, sometimes true
  • Benchmarking scripts to see how fast your computer is
  • benchmarkme package, benchmark_std() function
  • Virtual machines through Google or Amazon
  • Don't sell the FARM
  • Buy more RAM (if you can)

Put Your Matrices on BLASt

  • BLAS = Basic Linear Algebra Subprograms
  • The reference BLAS that ships with R is universal and solid
  • Others are optimized for certain computers
  • You can figure out what BLAS library you're using with this function:
library(benchmarkme)
get_linear_algebra()

R-ecipe For Success

  • First do the one-time low-hanging fruit (BLAS)
  • Profile your code, identify targets AND speed goals
  • Decide which targets are worth the work/learning to fix
  • Test some alternatives, use benchmarking/profiling
  • Keep your old versions for future reference
  • For goodness' sake, use base functions

Remember: being strategic and disciplined is good, but nobody is perfect, and experimentation/playing around can be an important part of learning

Things to Keep in Mind

  • How much you'll use your code
  • How steep the learning curve is
  • How much time you have as a researcher
  • R is not a fast language

If you're crunched for time, learning a new method will take a while, and you'll never use this script again, why optimize it?

Credit Where It's Due

I got a lot of the information from these sources, among others:

Efficient R Programming by Colin Gillespie: This book covers a HUGE range of topics and is really well written.

Noam's D-RUG talk from 2013 and his blog post on vectorization

Hadley Wickham's Advanced R: I particularly used the Performance and Profiling chapters

An Article on For Loops in an R Newsletter From 2008

Lots of Stack Overflow

R Inferno by Patrick Burns: If you want to slowly descend into infernal torment of R knowledge, really dig into this book