rsparse

[R build status] [codecov] [License] [Project Status]

rsparse is an R package for statistical learning, primarily on sparse matrices: matrix factorizations, factorization machines, out-of-core regression. Many of the implemented algorithms are particularly useful for recommender systems and NLP.

We’ve paid attention to implementation details: we try to avoid data copies, utilize multiple threads via OpenMP, and use SIMD where appropriate. The package can work on datasets with millions of rows and millions of columns.

Features

Classification/Regression

  1. Follow-the-Proximally-Regularized-Leader (FTRL), which allows solving very large linear/logistic regression problems with elastic-net penalty. The solver uses stochastic gradient descent with adaptive learning rates, so it can be used for online learning - it is not necessary to load all data into RAM. See Ad Click Prediction: a View from the Trenches for more examples.
  2. Factorization Machines, a supervised learning algorithm which learns second-order polynomial interactions in a factorized way. We provide a highly optimized, SIMD-accelerated implementation.
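As an illustration, a minimal FTRL sketch might look like the following. The class name FTRL matches the package, but the constructor parameters and the expected sparse-matrix format shown here are assumptions - check ?FTRL before relying on them:

```r
library(Matrix)   # sparse matrix classes
library(rsparse)

set.seed(42)
# toy binary classification problem on a sparse design matrix
x = rsparseMatrix(1000, 100, density = 0.02)
x = as(x, "RsparseMatrix")          # row-major storage for row-wise SGD
y = sample(c(0, 1), 1000, replace = TRUE)

ftrl = FTRL$new(learning_rate = 0.1, lambda = 1e-3, l1_ratio = 1)
ftrl$partial_fit(x, y)              # can be called repeatedly on chunks of data
preds = ftrl$predict(x)             # predicted probabilities
```

Because the model is updated via partial_fit, data can be streamed chunk by chunk instead of being held in RAM all at once.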

Matrix Factorizations

  1. Vanilla Maximum Margin Matrix Factorization - a classic approach for “rating” prediction. See the WRMF class and constructor option feedback = "explicit". The original paper which introduced MMMF can be found here.
  2. Weighted Regularized Matrix Factorization (WRMF) from Collaborative Filtering for Implicit Feedback Datasets. See the WRMF class and constructor option feedback = "implicit". We provide 2 solvers:
    1. Exact, based on Cholesky factorization
    2. Approximate, based on a fixed number of Conjugate Gradient steps. See details in Applications of the Conjugate Gradient Method for Implicit Feedback Collaborative Filtering and Faster Implicit Matrix Factorization.
  3. Linear-Flow from Practical Linear Models for Large-Scale One-Class Collaborative Filtering. The algorithm looks for a factorized low-rank item-item similarity matrix (in some sense it is similar to SLIM).
  4. Fast Truncated SVD and Truncated Soft-SVD via Alternating Least Squares, as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares. Works for both sparse and dense matrices. Works on float matrices as well! For certain problems it may even be faster than the irlba package.
  5. Soft-Impute via fast Alternating Least Squares, as described in Matrix Completion and Low-Rank SVD via Fast Alternating Least Squares.
  6. GloVe, as described in GloVe: Global Vectors for Word Representation.
  7. Matrix scaling, as described in EigenRec: Generalizing PureSVD for Effective and Efficient Top-N Recommendations.
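To make the WRMF entry above concrete, here is a minimal sketch of fitting the implicit-feedback variant. Hyperparameter values are illustrative, and while rank, feedback, n_iter and the components field follow the package conventions, verify them against ?WRMF:

```r
library(Matrix)
library(rsparse)

set.seed(1)
# toy user-item "confidence" matrix: 100 users x 50 items, ~5% observed
x = rsparseMatrix(100, 50, density = 0.05,
                  rand.x = function(n) sample(1:5, n, replace = TRUE))

model = WRMF$new(rank = 8, feedback = "implicit")
user_emb = model$fit_transform(x, n_iter = 10)  # 100 x 8 user factors
item_emb = model$components                     # 8 x 50 item factors
# predicted preference scores are dot products: user_emb %*% item_emb
```

The fit_transform / components pattern follows the mlapi conventions used throughout the package.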

Note: the optimized matrix operations which rsparse used to offer have been moved to a separate package.

Installation

Most of the algorithms benefit from OpenMP, and many of them can utilize high-performance implementations of BLAS. If you want to get the most out of this package, please read the section below carefully.

It is recommended to:

  1. Use a high-performance BLAS (such as OpenBLAS, MKL, Apple Accelerate).
  2. Add proper compiler optimizations in your ~/.R/Makevars. For example, on recent processors (with AVX support) and a compiler with OpenMP support, the following lines could be a good option:
CXX11FLAGS += -O3 -march=native -fopenmp
CXXFLAGS   += -O3 -march=native -fopenmp

Mac OS

If you are on Mac, follow the instructions at https://mac.r-project.org/openmp/. After clang configuration, additionally put a PKG_CXXFLAGS += -DARMA_USE_OPENMP line in your ~/.R/Makevars. After that, install rsparse in the usual way.

We also recommend using vecLib - Apple’s implementation of BLAS:

ln -sf /System/Library/Frameworks/Accelerate.framework/Frameworks/vecLib.framework/Versions/Current/libBLAS.dylib /Library/Frameworks/R.framework/Resources/lib/libRblas.dylib

Linux

On Linux, it’s enough to just create this file (~/.R/Makevars) if it doesn’t exist.

If using OpenBLAS, it is highly recommended to use the openmp variant rather than the pthreads variant. On Linux, it is usually available as a separate package in typical distribution package managers (e.g. for Debian, it can be obtained by installing libopenblas-openmp-dev, which is not the default version), and if there are multiple BLASes installed, it can be set as the default through the Debian alternatives system - which can also be used for MKL.
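On Debian/Ubuntu, that amounts to something like the following. The exact package and alternative names vary by release and architecture, so verify them with update-alternatives --list:

```shell
# install the OpenMP variant of OpenBLAS (not the default pthreads one)
sudo apt-get install libopenblas-openmp-dev
# select it as the system-wide default BLAS/LAPACK
sudo update-alternatives --config libblas.so.3-x86_64-linux-gnu
sudo update-alternatives --config liblapack.so.3-x86_64-linux-gnu
```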

Windows

By default, R for Windows comes with unoptimized BLAS and LAPACK libraries, and rsparse will prefer using Armadillo’s replacements instead. In order to use BLAS, install rsparse from source (not from CRAN), removing the option -DARMA_DONT_USE_BLAS from src/Makevars.win and ideally adding -march=native (under PKG_CXXFLAGS). See this tutorial for instructions on getting R for Windows to use OpenBLAS. Alternatively, Microsoft’s MRAN distribution for Windows comes with MKL.

Materials

Note that the syntax in these posts/slides is not up to date, since the package was under active development.

  1. Slides from DataFest Tbilisi (2017-11-16)

Here is an example of rsparse::WRMF on the lastfm360k dataset in comparison with other good implementations:

[benchmark figure]

API

We follow mlapi conventions.

Release and configure

Making a release

Don’t forget to add -DARMA_NO_DEBUG to PKG_CXXFLAGS to skip bounds checks (this has a significant impact on the NNLS solver):

PKG_CXXFLAGS = ... -DARMA_NO_DEBUG

Configure

Generate configure:

autoconf configure.ac > configure && chmod +x configure
