Shortcuts to making your R code run faster! Part 2




Hello everyone! I am sorry it took me so long to write this next post. As you may have seen in my last post, I have had a lot on my plate. However, I am excited to share some more tips for making your R code run faster and more efficiently. In this post we will be working with vectors and matrices.

1. Lose the loop over matrix columns.

Here is how to run your code without a loop by using the apply() function. I am only going to show you its basic form. The apply() function can save computation time and makes your code much cleaner. For example, let's say we have a matrix and we want to find the mean of each column. You can't just use the mean() function, because that gives you the mean of the whole matrix. You could compute the means column by column, but that takes time, so the natural way to automate it is with a loop.


# Example: a loop that computes the mean of each column.
m <- matrix(1:100000, nrow = 20)   # 20 rows, 5000 columns
MeanCol <- numeric(ncol(m))        # pre-allocate the result vector

Strt <- Sys.time()
for (i in 1:ncol(m)) {
  MeanCol[i] <- mean(m[, i])
}
print(Sys.time() - Strt)  # Print how long the loop takes
## Time difference of 0.1315508 secs

Now let's see what it looks like without the loop.

# Example with apply()
Strt <- Sys.time()
MeanCol2 <- apply(m, 2, mean)   # MARGIN = 2: one mean per column
print(Sys.time() - Strt)  # Print how long apply() takes
## Time difference of 0.06251311 secs
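A quick sanity check, sketched below, confirms that the loop and apply() give the same answer. It also compares them with base R's colMeans(), a built-in function written specifically for column means, which is usually faster still. Timing a single run with Sys.time() is noisy, so this sketch times repeated runs with system.time(); the repeat count (n_reps = 20) is an arbitrary assumption, and your timings will differ from machine to machine.

```r
# Same matrix as above: 20 rows, 5000 columns
m <- matrix(1:100000, nrow = 20)

# Column means three ways
mean_loop <- numeric(ncol(m))
for (i in seq_len(ncol(m))) {
  mean_loop[i] <- mean(m[, i])
}
mean_apply <- apply(m, 2, mean)
mean_col <- colMeans(m)

# All three should agree (up to floating-point tolerance)
stopifnot(isTRUE(all.equal(mean_loop, mean_apply)),
          isTRUE(all.equal(mean_apply, mean_col)))

# Time repeated runs (n_reps is an arbitrary choice for this sketch)
n_reps <- 20
system.time(for (r in seq_len(n_reps)) apply(m, 2, mean))
system.time(for (r in seq_len(n_reps)) colMeans(m))
```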

The power of this function is amazing! First, look at how much it shortens your code: we went from four lines of code down to just one. It's clean and beautiful. On top of that, in this run it took about half as long as the loop (0.06 vs. 0.13 seconds), although single timings like these vary from run to run. The 2 in apply(m, 2, mean) tells apply() to use the function on each column. You can run the function over the rows instead by using 1:

Strt <- Sys.time()
MeanRow2 <- apply(m, 1, mean)   # MARGIN = 1: one mean per row
print(Sys.time() - Strt)  # Print how long apply() takes
## Time difference of 0.007017851 secs
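To see which margin does what, it helps to try apply() on a tiny matrix where the answers can be checked by hand. The 2 x 3 matrix below is just a made-up illustration, with mean() on the whole matrix shown for contrast.

```r
# A tiny 2 x 3 matrix, filled column by column:
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6
small <- matrix(1:6, nrow = 2)

mean(small)            # one number for the whole matrix: 3.5
apply(small, 2, mean)  # MARGIN = 2, one mean per column: 1.5 3.5 5.5
apply(small, 1, mean)  # MARGIN = 1, one mean per row: 3 4
```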

You can use apply() with other functions too, including ones you write yourself. Check out ?apply for more info.
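As a sketch of what that can look like: apply() accepts any function, including an anonymous one written on the spot, and extra named arguments after FUN are passed along to that function. The column range and the median (quantile() with probs = 0.5) below are only illustrative choices.

```r
small <- matrix(1:6, nrow = 2)   # columns (1,2), (3,4), (5,6)

# An anonymous function: the range (max - min) of each column
apply(small, 2, function(x) max(x) - min(x))   # 1 1 1

# Extra arguments after FUN are handed to that function,
# here the 'probs' argument of quantile()
apply(small, 2, quantile, probs = 0.5)         # column medians: 1.5 3.5 5.5

# A function that returns several values gives back a matrix:
# row 1 holds each column's minimum, row 2 its maximum
apply(small, 2, range)
```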

2. Lose the loop and do matrix math.

So we saw the power of the apply() function. Now let's look at its sister function, sweep(). This function applies an operation between a matrix and a vector of values, one value per row or one per column. Again, I am only going to show you the simple version. Let's start with a vector of values (here, the column means) that need to be subtracted from the corresponding columns of the matrix.

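m <- matrix(1:100000, ncol = 20)   # this time 5000 rows, 20 columns
MeanCol2 <- apply(m, 2, mean)      # column means to subtract
m2 <- m                            # copy that the loop will overwrite

# Example with a loop: subtract each column's mean from that column.
Strt <- Sys.time()
for (i in 1:ncol(m)) {
  m2[, i] <- m[, i] - MeanCol2[i]
}
print(Sys.time() - Strt)  # Print how long the loop takes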
## Time difference of 0.007526159 secs

Now let's see it without the loop.

# Example with sweep()
Strt <- Sys.time()
m2 <- sweep(m, 2, MeanCol2, FUN = "-")   # MARGIN = 2: one value per column
print(Sys.time() - Strt)  # Print how long sweep() takes
## Time difference of 0.00601697 secs
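Here is the same idea on a tiny matrix so the result can be checked by hand. The second call is an extra illustration: with MARGIN = 1 and FUN = "/", sweep() divides each row by its own value from the vector, here the row totals, so every row of the result adds up to 1.

```r
small <- matrix(1:6, nrow = 2)   # columns (1,2), (3,4), (5,6)

# Subtract each column's mean from that column (MARGIN = 2)
centered <- sweep(small, 2, colMeans(small), FUN = "-")
centered
#      [,1] [,2] [,3]
# [1,] -0.5 -0.5 -0.5
# [2,]  0.5  0.5  0.5

# Divide each row by its row total (MARGIN = 1), so each row sums to 1
props <- sweep(small, 1, rowSums(small), FUN = "/")
rowSums(props)   # 1 1
```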

The time difference between the two may look small here, but look at how much the sweep() function gives you. Again, you cut the code down from four lines to one. sweep() is also useful for other matrix math, such as dividing each column by a scale factor. Check out ?sweep for more info.
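One example of that other matrix math, offered as a sketch: two sweep() calls standardize every column (subtract the column mean, then divide by the column standard deviation), which is what base R's scale() does by default. The random test matrix and seed are arbitrary choices; the last line checks that both approaches agree.

```r
set.seed(1)                          # reproducible example data
x <- matrix(rnorm(50), ncol = 5)     # 10 rows, 5 columns

col_means <- colMeans(x)
col_sds   <- apply(x, 2, sd)

# Centre each column, then divide each column by its standard deviation
z <- sweep(sweep(x, 2, col_means, FUN = "-"), 2, col_sds, FUN = "/")

# scale() also centres and scales by column; c() drops its extra attributes
all.equal(c(z), c(scale(x)))         # TRUE
```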

I hope these tips help you refine and clean up your code. The goal is always to write simple, efficient code, and these few functions can make it not only faster but also cleaner. Next time I will try to show you how to speed up your code using parallel processing.

Happy Coding!
