In R, I use dist function. This is very fast but using only one core. And I have no idea how to make it use multiple cores. In R, there is a library irlba which calculate the first few eigenvectors pretty fast.
I want to use C++ to do this job. Now for both R and C++, the backend Linear Algebra is done by BLAS (the basic linear algebra such as addition multiplication), and LAPACK (the higher level linear algebra such as SVD decomposition, QR, Cholesky etc.). Performance of the the basic linear algebra shouldn't be different between R and C++ since R's functions are written in C++.
In C++ there are two competing Linear Algebra packages namely Armadillo and Eigen. Eigen does not use BLAS and LAPACK, and consensus is Eigen slightly faster than Armadillo. Armadillo uses BLAS and LAPACK, basically a wrapper for these basic libraries, as the basic libraries have very ugly syntax.
1. How to use Armadillo in R. First install RcppArmadillo in R. Then in C++ write this:
Note that both of the comment are needed for the function to work. The second load the function into R environment. The first declares dependency of RcppArmadillo on Rcpp.#include <RcppArmadillo.h> // [[Rcpp::depends(RcppArmadillo)]] // [[Rcpp::export]] arma::vec getEigenValues(arma::mat M) { return arma::eig_sym(M); }
On R write this:
2. How to use Eigen in R. Again install RcppEigen in R. Then in C++ write this:library('Rcpp') library('microbenchmark') working.directory <- PATH # Fill in your path here sourceCpp(file.path(working.directory, 'EigenARMA.cpp')) n = 2e3 M = matrix(rnorm(n^2),nrow = n, ncol = n) M = t(M)%*%M print(system.time(eigen(M))) print(system.time(getEigenValuesARMA(M)))
In R write this:#include <RcppEigen.h> // [[Rcpp::depends(RcppEigen)]] using Eigen::Map; // 'maps' rather than copies using Eigen::MatrixXd; // variable size matrix, double precision using Eigen::VectorXd; // variable size vector, double precision using Eigen::SelfAdjointEigenSolver; // one of the eigenvalue solvers // [[Rcpp::export]] VectorXd getEigenValuesEG(Map<MatrixXd> M) { SelfAdjointEigenSolver<MatrixXd> es(M); return es.eigenvalues(); }
Reference:library('Rcpp') library('microbenchmark') working.directory <- PATH # Fill in your path here sourceCpp(file.path(working.directory, 'EigenEG.cpp')) n = 2e3 M = matrix(rnorm(n^2),nrow = n, ncol = n) M = t(M)%*%M print(system.time(eigen(M))) print(system.time(getEigenValuesEG(M)))
RcppEigen Example
RcppArmadillo Example
3. Now comparing the performance: calculating eigenvalues of matrix 2000*2000.
Arma: 3.459s
Eigen: 46.625s
R's built in eigen: 12.547sBut the difference here is because Eigen.cpp only uses one core, while both ARMA and R's built in function use multicores. To set the number of core Blas uses in R, insert this at the beginning:
Time now is:library(RhpcBLASctl) blas_set_num_threads(1) # For 1 means using one core.
Arma: 4.319s
Eigen: 47.635s
R's built in eigen: 8.434sEverything uses one core now, but there might be still a difference as the Arma seems to work for symmetric matrix. Also it returns only the eigenvalues, not eigenvectors.
4. Other discussion:
It seems rARPACK provides an interface to ARPACK in R directly too. Will check this out.
No comments:
Post a Comment