Shortcuts to making your R code run faster! Part 1


The goal of running code is usually to make tasks easier, faster, and more efficient. Sometimes, however, it can feel like you are always waiting for your code to run. We will go through a few fun tips that can help speed up that process.

1. Preset your vector or matrix destinations.

Believe it or not, presetting your vector destinations can save your code a lot of running time. When you assign a value past the end of a vector that has not been preset, R has to set aside a larger vector and copy the existing values into it, and inside a loop this can happen over and over again. If the vector is already preset to its final length, each value is simply written into a slot that already exists. Take this simple example for instance:
# Create an empty output vector with no preset length
X = c()
# Start timing the loop
Strt<-Sys.time()
for(i in 1:100000){
  X[i] = i
}
print(Sys.time()-Strt) # Print how long it takes to run the loop
## Time difference of 8.179752 secs

OK, let's try it with a preset destination.

# Create an output vector preset to its final length (filled with 0's)
X = numeric(100000)
# Start timing the loop
Strt<-Sys.time()
for(i in 1:100000){
  X[i] = i
}
print(Sys.time()-Strt) # Print how long it takes to run the loop
## Time difference of 0.215426 secs
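Timings from Sys.time() can jump around from run to run, and the size of the gap depends on your computer and your version of R (since R 3.4.0, growing a vector by assigning past its end is handled more efficiently than before). A minimal sketch for comparing both approaches side by side with base R's system.time(), and for confirming that they produce the same values, might look like this. The helper names grow_vector() and preset_vector() are hypothetical, made up for this illustration.

```r
# Hypothetical helper: grow the output vector one element at a time
grow_vector <- function(n) {
  X <- c()
  for (i in seq_len(n)) {
    X[i] <- i
  }
  X
}

# Hypothetical helper: preset the output vector to its final length first
preset_vector <- function(n) {
  X <- numeric(n)
  for (i in seq_len(n)) {
    X[i] <- i
  }
  X
}

n <- 100000
# system.time() reports user, system, and elapsed (wall-clock) seconds
print(system.time(grown <- grow_vector(n)))
print(system.time(preset <- preset_vector(n)))

# Both approaches should hold the same values
print(all.equal(grown, preset))
```

all.equal() is used here instead of identical() because the grown vector ends up as integer (the loop counter i is an integer), while numeric(n) starts out, and stays, as double.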

See the major difference in run time between the two? Presetting the vector makes a huge difference in how long that loop takes to execute! Now I know you are thinking to yourself, well, this was only 8 seconds. But imagine if you had to run a longer, more complicated loop: this could save you minutes rather than seconds.

There is more than one way to create a preset output vector. The numeric() function is good for when you need a vector filled with 0's. If you need to fill a vector with other values, you can use the rep() function, which repeats a value for a specified length. The matrix() function will create a matrix filled with any value for a specified number of rows and columns.

# Create a vector filled with 0's
X = numeric(100)

# Create a vector filled with 1's
X = rep(1, 100)

# Create a matrix filled with 10's
X = matrix(10, nrow = 10, ncol = 10)
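The same idea works for other kinds of output. A minimal sketch, assuming you know the final size up front: base R's character(), logical(), and vector("list", n) preset vectors of text, TRUE/FALSE values, and list elements, and a preset matrix can be filled one whole row per loop iteration. The variable names here are just for illustration.

```r
n <- 5

# Preset vectors of other types
labels  <- character(n)         # filled with ""
flags   <- logical(n)           # filled with FALSE
results <- vector("list", n)    # list of n NULL elements

# Preset a matrix with NA so any unfilled cells are easy to spot
M <- matrix(NA_real_, nrow = n, ncol = 2)

for (i in seq_len(n)) {
  labels[i]    <- paste0("row", i)
  flags[i]     <- i %% 2 == 0   # TRUE for even rows
  results[[i]] <- seq_len(i)    # list elements can have different lengths
  M[i, ]       <- c(i, i^2)     # fill one whole row per iteration
}

print(labels)
print(flags)
print(M)
```

Filling the matrix with NA instead of 0 is a design choice: if the loop skips a row by mistake, anyNA(M) will catch it, whereas a leftover 0 could pass for a real value.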

2. Lose the loop.

Another way to speed up your code is to lose the loop. R is a powerhouse and has built-in features like logical indexing that help streamline your code and speed it up. Let's take a look at a typical FOR loop with an IF statement that we can avoid.

# Example FOR loop with an IF statement: replace every 1 with a 4
X = rep(c(1,2,3),100000)
Strt<-Sys.time()
for(i in seq_along(X)){
  if(X[i] == 1){ # check the value stored in X, not the position i
    X[i] = 4
  }
}
print(Sys.time()-Strt) # Print how long it takes to run the loop
## Time difference of 0.142683 secs

Now let's see what it looks like without the loop.

# Example with Indexing
X = rep(c(1,2,3),100000)
Strt<-Sys.time()
X[X==1] = 4
print(Sys.time()-Strt) # Print how long the indexing takes
## Time difference of 0.03125691 secs
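Because the loop and the indexed version are meant to do the same job, it is worth checking that they actually agree. A minimal sketch using base R's identical(), which also shows replace() and ifelse(), two other vectorized ways to write the same replacement (swapped in here as alternatives, not part of the original example):

```r
X <- rep(c(1, 2, 3), 100000)

# Loop version: test the value X[i], not the position i
loop_X <- X
for (i in seq_along(loop_X)) {
  if (loop_X[i] == 1) {
    loop_X[i] <- 4
  }
}

# Logical indexing version
index_X <- X
index_X[index_X == 1] <- 4

# Two other vectorized equivalents from base R
replace_X <- replace(X, X == 1, 4)
ifelse_X  <- ifelse(X == 1, 4, X)

print(identical(loop_X, index_X))
print(identical(index_X, replace_X))
print(identical(index_X, ifelse_X))
```

replace() returns a modified copy and leaves X untouched, which is handy when you still need the original values later on.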

Now I know you are probably thinking, well, that only shaved off a fraction of a second. Yes, it did. But the other beauty of using indexing versus a loop is the ability to streamline your code. Look at the first example: the loop takes up five lines of code, whereas the indexed version takes only one. AND yes, I know that you could fit the FOR loop onto one line if you wanted to, but it would be hard to read and, in my humble opinion, ugly. Indexing is far more elegant and it's faster, so why not go with the best of both worlds? This is especially important when you are writing hundreds of lines of code or more, where every line and every fraction of a second counts toward the overall efficiency of the code.
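Indexing also stays short when the condition gets more involved. A minimal sketch with made-up data: conditions can be combined with & (and) and | (or), and the replacement can be calculated from the selected values themselves.

```r
X <- c(5, -2, 8, 0, -7, 3)

# Replace every negative value with 0
X[X < 0] <- 0

# Combine conditions: double the values between 1 and 5 (inclusive)
mid <- X >= 1 & X <= 5
X[mid] <- X[mid] * 2

print(X)
```

Storing the logical index in mid means the condition is evaluated once, so the same positions are used on both sides of the assignment.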

Have fun cleaning up and speeding up your code! Next time, in Shortcuts to making your R code run faster! Part 2, we will talk about how you can lose the loop when calculating between matrices.
