R offers several ways to run tasks in parallel. In this post I will show how to apply a function over a vector with the 'lapply' function, and how to run the same work in parallel with the 'doParallel' package. The tutorial covers:
- Preparing sample function
- Lapply function
- DoParallel method
- Source code listing
Preparing sample function
We use the function and the vector below to demonstrate the process.
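# Sum the integers from 0 to a and write the total to a file
some_work <- function(a)
{
  sum <- 0
  for(i in 0:a)
  {
    sum <- sum+i
  }
  # The output file is named after the total
  write(sum,file=paste0(sum,"_output.txt"))
}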
items <- c(10001119,20001119,3000111,4000111,5000111,6000111,
7000111,8000111,10091119,20011119,3020111,4030111,
5040111,6050111,7060111,8070111)
Lapply function
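'lapply' applies a function to each element of a vector or list and returns the results as a list. It processes the elements one after another, so it serves here as the sequential baseline. We can use the function as shown below.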
lapply(items, some_work)
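To make the return value of 'lapply' easier to see, here is a minimal sketch. It assumes a hypothetical helper, sum_to(), that returns the total instead of writing it to a file, and a small illustrative vector, small_items. Neither is part of the original example.

```r
# Hypothetical variant of some_work() that returns the total
# instead of writing it to a file
sum_to <- function(a) {
  total <- 0
  for (i in 0:a) {
    total <- total + i
  }
  total
}

# Small illustrative inputs (not the data used in this post)
small_items <- c(10, 100, 1000)

# lapply() calls sum_to() on each element in turn and returns a list
results <- lapply(small_items, sum_to)
str(results)

# unlist() flattens the list into a plain numeric vector
totals <- unlist(results)
```

Returning a value rather than writing a file is only a choice made for this sketch. It keeps the result in memory, so it can be inspected right after the call.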
DoParallel method
We use the 'doParallel' library, which needs to be installed first. We create a cluster and register it, then run the work in a 'foreach' loop. When the work is finished, the cluster should be stopped. We run the parallel process as shown below.
install.packages("doParallel")
library(doParallel)
cl = makeCluster(detectCores())
registerDoParallel(cl)
foreach(i=1:length(items)) %dopar%
{
  some_work(items[i])
}
stopCluster(cl)
You can also pass a fixed number of workers to the makeCluster() function instead of detectCores().
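As a rough sketch of that option, the example below starts a cluster with a fixed number of workers and collects the results of each iteration with the '.combine' argument of foreach(). The worker count of 2, the helper sum_to() and the vector small_items are assumptions made for illustration only.

```r
library(doParallel)   # also attaches the foreach and parallel packages

# Hypothetical worker that returns its total instead of writing a file
sum_to <- function(a) {
  total <- 0
  for (i in 0:a) {
    total <- total + i
  }
  total
}

# Small illustrative inputs (not the data used in this post)
small_items <- c(10, 100, 1000)

# Fixed worker count; 2 is an arbitrary choice for this sketch
cl <- makeCluster(2)
registerDoParallel(cl)

# .combine = c joins the per-iteration results into one vector,
# in the same order as the inputs
totals <- foreach(x = small_items, .combine = c) %dopar% {
  sum_to(x)
}

stopCluster(cl)
```

Without '.combine', foreach() returns a list, much like lapply() does. A fixed worker count can be useful when you want to leave some cores free for other programs.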
In this tutorial, we've briefly learned how to use lapply() and the doParallel package to process a vector of inputs in R. The full source code is listed below.
Source code listing
library(doParallel)
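# Sum the integers from 0 to a and write the total to a file
some_work<-function(a)
{
  sum <- 0
  for(i in 0:a)
  {
    sum <- sum+i
  }
  # The output file is named after the total
  write(sum,file=paste0(sum,"_output.txt"))
  cat("done! ")
}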
items<-c(10001119,20001119,3000111,4000111,5000111,6000111,7000111,8000111,
10091119,20011119,3020111,4030111,5040111,6050111,7060111,8070111)
lapply(items, some_work)
cl = makeCluster(detectCores())
registerDoParallel(cl)
foreach(i=1:length(items)) %dopar%
{
  some_work(items[i])
}
stopCluster(cl)