Home

# R Notes

Recently, I worked with R, Rcpp, RcppArmadillo, RcppGSL, foreach, doParallel, doMC and doSNOW. I was shocked by the shortage of documentation. What follows is what I learned by hacking around.

• sourceCpp('source.cpp') from Rcpp is very useful for building and testing C++ code that works with R. What's more awesome is how convenient Rcpp makes it to build a package around the *.cpp file: just run Rcpp.package.skeleton and compileAttributes, and you get a package that passes "R CMD check package" for free. Generating a package is normally necessary to make source.cpp work with the parallel package.
• Code generated by compileAttributes is platform dependent. Therefore, if the package was generated on a machine with gcc 4.6, it may not pass "R CMD check" on another machine with gcc 4.8. In that case, just run compileAttributes again on the new machine. Also, don't forget to rerun compileAttributes whenever the cpp file changes, especially when the signature of an exported function changes.
• The pipeline for working with Rcpp:
• build the cpp source using sourceCpp('*')
• debug the cpp code using "R -d gdb"
• initialize a package using Rcpp.package.skeleton
• move the cpp file into package/src/
• run compileAttributes(package) to generate the files needed to build the R package
• run "R CMD check package" to check whether it's working
• run "R CMD build package" to generate the *.tar.gz file
• R> install.packages("*.tar.gz") to install it into the R library folder
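The pipeline above can be sketched in an R session. This is a minimal sketch: the function sumVec and the package name "mypkg" are made-up illustrations, and sourceCpp's code= argument is used here so the example is self-contained (normally you would pass a file path).

```r
library(Rcpp)

# Step 1: build and test the C++ source; sourceCpp() also accepts a
# file path, e.g. sourceCpp('source.cpp').
sourceCpp(code = '
#include <Rcpp.h>
// [[Rcpp::export]]
double sumVec(Rcpp::NumericVector x) {
    double total = 0;
    for (int i = 0; i < x.size(); i++) total += x[i];
    return total;
}
')
sumVec(c(1, 2, 3))  # 6

# Steps 3-8: package the code (run from R, then the shell):
# Rcpp.package.skeleton("mypkg")       # initialize the package
# ...move the cpp file into mypkg/src/ ...
# compileAttributes("mypkg")           # regenerate RcppExports
# $ R CMD check mypkg
# $ R CMD build mypkg
# R> install.packages("mypkg_1.0.tar.gz", repos = NULL, type = "source")
```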
• RcppArmadillo is a binding to Armadillo, a convenient C++ matrix library. However, RcppArmadillo is just a wrapper that reformats arma:: data structures into Rcpp structures, and Rcpp is just a wrapper that reformats SEXP structures into Rcpp structures. Therefore, one should look up Armadillo references on the Armadillo website, not in RcppArmadillo.
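As a minimal sketch of that layering (the function name crossprodArma is made up for illustration): the exported function takes and returns arma:: types, and RcppArmadillo's glue converts them to and from SEXP automatically, while the body is pure Armadillo syntax documented on the Armadillo website.

```r
library(Rcpp)

sourceCpp(code = '
// [[Rcpp::depends(RcppArmadillo)]]
#include <RcppArmadillo.h>
// [[Rcpp::export]]
arma::mat crossprodArma(const arma::mat& X) {
    return X.t() * X;   // Armadillo syntax; see the Armadillo docs
}
')

X <- matrix(rnorm(6), 2, 3)
crossprodArma(X)  # same result as crossprod(X)
```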
• doParallel, doMC and doSNOW are excellent parallel backends for R. doParallel and doMC are for a single multicore machine; doSNOW is for multi-machine clusters.
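For the single-machine case, registering a backend is only a few lines. A minimal sketch with doParallel (the cluster size here is an arbitrary choice):

```r
library(doParallel)   # attaches foreach as well

cl <- makeCluster(2)        # two worker processes on this machine
registerDoParallel(cl)
foreach(i = 1:4, .combine = c) %dopar% i^2   # runs on the workers
stopCluster(cl)
```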
• foreach is a frontend for parallel computing. Here is baseline code that illustrates how to work with qsub and doSNOW.
```r
library(doSNOW)

getnodes <- function(hosts) {
  # On a Torque/PBS cluster, read the host list from $PBS_NODEFILE instead:
  # f <- Sys.getenv('PBS_NODEFILE')
  # x <- if (nzchar(f)) readLines(f) else rep('localhost', 3)
  x <- hosts
  as.data.frame(table(x), stringsAsFactors=FALSE)
}

hosts <- c('cycle1', 'cycle1', 'cycle3', 'cycle2', 'cycle3')
nodes <- getnodes(hosts)
cl <- makeSOCKcluster(nodes$x)
registerDoSNOW(cl)

setcores <- function(cl, nodes) {
  f <- function(cores) assign('allocated.cores', cores, pos=.GlobalEnv)
  clusterApply(cl, nodes$Freq, f)
}
setcores(cl, nodes)

r <- foreach(i=seq_along(nodes$x), .packages='doMC') %dopar% {
  registerDoMC(allocated.cores)
  ppid <- Sys.getpid()
  foreach(j=1:allocated.cores, .combine='c', .packages='R.utils') %dopar% {
    Sys.sleep(1)
    c(ppid, Sys.getpid(), System$getHostname())
  }
}
stopCluster(cl)
```

Refer to the original link http://stackoverflow.com/questions/17288379/how-to-set-up-dosnow-and-sock-cluster-with-torque-moab-scheduler to see how to work with qsub.

• I encountered some strange errors with socket connections. I am trying to avoid transferring data over the socket, which may be the cause.
• Take special care with RNG when using clusters. Without seeding each worker's stream separately, you may end up with exactly the same results when replicating a sampling run.
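A minimal sketch of the fix using base R's parallel package (the seed value 123 is arbitrary): clusterSetRNGStream gives each worker an independent L'Ecuyer-CMRG stream, so the workers' draws no longer repeat each other.

```r
library(parallel)

cl <- makeCluster(2)
# Give each worker an independent L'Ecuyer-CMRG RNG stream
clusterSetRNGStream(cl, iseed = 123)
parSapply(cl, 1:2, function(i) runif(1))  # workers draw different values
stopCluster(cl)
```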