Tuesday, October 21, 2014

Linear Algebra on R part II

It turns out rARPACK is the package of choice when dealing with calculating a few eigenvalues and eigenvectors. To summarize this mess of matrix linear algebra library, we have:

1. BLAS: the basic linear algebra e.g. matrix multiplication
2. LAPACK: the second level linear algebra e.g. SVD, full eigendecomposition
3. ARPACK: sparse linear algebra, which means calculating a few eigenvalues / eigenvectors

openBLAS, intel MKL are the implementation of these library. In C++, Armadillo is an interface for BLAS, LAPACK, and ARPACK. In R, there is RCPP and RcppArmadillo is an interface to Armadillo. Unfortunately the eigs_sym function does not work, as R is not linked with ARPACK as explained here.

At the end, I found rARPACK, which is an interface for ARPACK. It uses Lanczos algorithm. The irlba package written in R also uses this algorithm to do the same thing, but at a slower speed. If we were to just use the full eigendecomposition function it would be very slow on large matrix.

A few benchmark on Macbook Air 2014 13inch Rstudio:
library("corpcor")
library("rARPACK")
#library(rbenchmark)
library(microbenchmark)
library(irlba)

# generate a "fat" data matrix
n = 5000
X = matrix(rnorm(n*n), n, n)
X = t(X)%*%X

# compute SVD
system.time( (s1 = eigen(X,symmetric = TRUE)) ) 
#   user  system elapsed 
# 58.487   1.042  35.895 
system.time( (s2 = svd(X, nu = 5, nv = 0)))
# user  system elapsed 
# 304.342   5.497 138.519
system.time( (s3 = fast.svd(X)) )
# user  system elapsed 
# 396.007   8.395 187.781 
system.time( (s4 = eigs_sym(X, 5)))
# user  system elapsed 
# 1.909   0.029   2.013 
system.time( (s5 = svds(X,k=5)) )
# user  system elapsed 
# 7.686   0.082   4.113 
system.time( (s6 = irlba(X, nu = 5, nv = 0)))
# user  system elapsed 
# 27.559   0.684  12.167 
Note that in a Mac, to use multithreaded linear algebra, you might need to replace R's default BLAS library with Mac BLAS implementation name Accelerate following thisguide.

No comments:

Post a Comment