1/31/2018

A Little Scenario

  • Your script takes 30 minutes to run
  • You can only go get coffee so many times in a day
  • You remember "someone told me for loops are slow"
  • You take half a day to get rid of them all
  • Now your script takes 29 minutes and 55 seconds to run
  • You fling your coffee at the wall of your office

A Similar Scenario

  • Your bike feels sluggish on your way to work
  • You remember "someone told me carbon components make your bike lighter"
  • You buy expensive carbon components
  • Your bike still feels slow
  • Your brake pads have been rubbing on your wheels

Wisdom from Donald

Donald Knuth, creator of TeX and general computer science titan, once gave some sound advice:

"The real problem is that programmers have spent far too much time worrying about efficiency in the wrong places and at the wrong times; premature optimization is the root of all evil (or at least most of it) in programming."

Ground We'll Cover Today

  • There are many topics to cover, so I'll go for breadth, not depth (this isn't an apply tutorial)
  • I'll try to arm you with tools to optimize strategically
  • I'll also hit some low-hanging fruit that applies across contexts
  • Remember, R is really broad and flexible, so it's my way or one of a hundred highways
  • Also there are lots of puns

Getting the Lay of the Land

  • The first thing we want to do is figure out our optimization targets
  • R has a built-in profiler, Rprof()
  • However, Rprof() is evil
  • Not really, it's just hard to use
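
For reference, here is a minimal sketch of what using Rprof() directly can look like. The workload function slow_step() and its arguments are made up for illustration; only Rprof() and summaryRprof() come from R itself (the utils package). The sampler only records something if the code runs long enough, so the workload is deliberately wasteful.

```r
# Hypothetical workload: sort random numbers over and over so the
# sampling profiler has enough run time to catch something.
slow_step <- function(reps = 300, n = 1e5) {
  for (i in seq_len(reps)) {
    sort(runif(n))
  }
  invisible(NULL)
}

prof_file <- tempfile(fileext = ".out")

Rprof(prof_file)   # start sampling the call stack into prof_file
slow_step()
Rprof(NULL)        # stop profiling

# Summarise the raw samples: time by function, self and total
prof_summary <- summaryRprof(prof_file)
head(prof_summary$by.self)
```

The summary is a plain list of tables, which is why it feels hard to use: you have to work out for yourself which calls belong to which line of your script.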

A Savior!

Meet your new best friend, profvis()

library(profvis)  # assumes the profvis package is installed

first_profile <- profvis({
  # Simulate 400,000 rows by 150 numeric columns, plus an id column
  times <- 4e5
  cols <- 150
  data <- as.data.frame(x = matrix(rnorm(times * cols, mean = 5), ncol = cols))
  data <- cbind(id = paste0("g", seq_len(times)), data)

  # Center each numeric column by subtracting its mean
  data1 <- data
  means <- apply(data1[, names(data1) != "id"], 2, mean)
  for (i in seq_along(means)) {
    data1[, names(data1) != "id"][, i] <- data1[, names(data1) != "id"][, i] - means[i]
  }
}, height = "400px")

profvis() results
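
Once a profile has pointed at a hot spot, it helps to check that any rewrite still gives the same answer before timing it. The sketch below does that for the column-centering step on a much smaller data set. The function names center_loop() and center_vec() are hypothetical, and the sweep()/colMeans() version is just one possible alternative, not necessarily the fix these slides go on to use.

```r
set.seed(1)
rows <- 1e4
cols <- 20
data <- as.data.frame(matrix(rnorm(rows * cols, mean = 5), ncol = cols))
data <- cbind(id = paste0("g", seq_len(rows)), data)

# Loop version, following the profiled code
center_loop <- function(df) {
  is_value <- names(df) != "id"
  means <- apply(df[, is_value], 2, mean)
  for (i in seq_along(means)) {
    df[, is_value][, i] <- df[, is_value][, i] - means[i]
  }
  df
}

# Possible alternative: do the arithmetic on a matrix in one step
center_vec <- function(df) {
  is_value <- names(df) != "id"
  values <- as.matrix(df[, is_value])
  df[, is_value] <- sweep(values, 2, colMeans(values))
  df
}

# Same answer (up to floating-point tolerance)?
same_result <- isTRUE(all.equal(center_loop(data), center_vec(data)))

# Coarse whole-call timings; exact numbers depend on the machine
system.time(center_loop(data))
system.time(center_vec(data))
```

system.time() only reports a total, so it complements the profile rather than replacing it: the profile says where to look, the timing says whether the change helped.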