craze

High-level, R-like interface for fmlr. The package name is a play on the German word 'fimmel' (fml, fimmel, ...).

The goal of the package is to give a more traditional, R-like interface around fmlr functions and methods. It's basically just a shallow S4 wrapper. The canonical use case would be something like:

  • build your matrix in R
  • convert to an fmlr object:
    • the easy way: use fmlmat()
    • the harder, more robust way (see the sketch after this list):
      • convert the R matrix to an fmlr object via as_cpumat(), as_gpumat(), or as_mpimat() (may require a copy)
      • convert to a craze fmlmat object via as_fmlmat() (no copy)
  • call the desired linear algebra function(s)
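
For concreteness, here's a minimal sketch of both conversion paths. It assumes the functions named in the list above (fmlmat(), as_cpumat(), as_fmlmat()) are all available after loading craze; treat it as a sketch, not canonical usage.

library(craze)
x = matrix(as.double(1:9), 3)

## the easy way: one call
x_easy = fmlmat(x)

## the harder, more robust way: explicit backend conversion (may copy),
## then wrap in a craze fmlmat object (no copy)
x_fml = as_fmlmat(as_cpumat(x))

x_fml %*% x_fml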

Installation

You will need an installation of fmlr. See the fmlr installation guide for more details.

You can install the stable version from the HPCRAN using the usual install.packages():

install.packages("craze", repos=c("https://hpcran.org", "https://cran.rstudio.com"))

The development version of craze is maintained on GitHub:

remotes::install_github("fml-fam/craze")

Example

Multiplying CPU data:

library(craze)
x = matrix(as.double(1:9), 3)
x_cpu = fmlmat(x)
x_cpu
## # cpumat 3x3 type=d
## 1.0000 4.0000 7.0000 
## 2.0000 5.0000 8.0000 
## 3.0000 6.0000 9.0000
x_cpu %*% x_cpu
## # cpumat 3x3 type=d
## 30.0000 66.0000 102.0000 
## 36.0000 81.0000 126.0000 
## 42.0000 96.0000 150.0000

and GPU data:

x_gpu = fmlmat(x, backend="gpu")
x_gpu
## # gpumat 3x3 type=d 
## 1.0000 4.0000 7.0000 
## 2.0000 5.0000 8.0000 
## 3.0000 6.0000 9.0000
x_gpu %*% x_gpu
## # gpumat 3x3 type=d 
## 30.0000 66.0000 102.0000 
## 36.0000 81.0000 126.0000 
## 42.0000 96.0000 150.0000

Benchmark

Throughout, I'm using:

  • R
    • R version 3.6.2
    • float version 0.2-4
    • fml version 0.2-1
    • craze version 0.1-0
  • CPU
    • AMD Ryzen 7 1700X Eight-Core Processor
    • OpenBLAS with 8 threads
  • GPU
    • NVIDIA GeForce GTX 1070 Ti
    • cuBLAS

Let's take a look at a quick matrix multiplication benchmark. First, we need to set up the test matrices:

library(craze)
n = 5000
x = matrix(runif(n*n), n, n)
x_flt = fl(x)
x_cpu = fmlmat(x)
x_gpu = fmlmat(x, type="float", backend="gpu")

We're using float for the GPU data because my graphics card doesn't have the full complement of double precision cores. That choice gives the GPU roughly a 2x run-time advantage over a double precision test like the R one, so its numbers are more fairly compared against the float test.
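
The precision change is also visible in memory. As a quick check with base R's object.size() on the matrices built above (sizes are approximate):

## doubles use 8 bytes per element, floats 4, so the single precision
## copy of a 5000x5000 matrix should be about half the size
object.size(x)      # roughly 200 MB (double precision)
object.size(x_flt)  # roughly 100 MB (single precision, via the float package)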

First we'll time the R matrix product (double precision, CPU):

system.time(x %*% x)
##   user  system elapsed 
## 10.241   0.330   1.345

As I recall, R's matrix multiplication does some pre-scanning of the inputs for bad numerical values (NaN/Inf). None of the implementations that follow do this, so the R version carries some extra overhead that may or may not be of value to you.
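
If you're curious about that overhead, base R exposes it as an option: the matprod option controls whether inputs are pre-scanned before dispatching to the BLAS. A quick sketch:

## with matprod = "default", R pre-scans for NaN/Inf and falls back to a
## slower internal routine if it finds any; "blas" skips the check and
## hands the product straight to the BLAS (dgemm here)
getOption("matprod")         # "default"
options(matprod = "blas")
system.time(x %*% x)         # the same product, minus the pre-scan
options(matprod = "default") # restore the default behavior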

Here's the float test (single precision, CPU):

system.time(x_flt %*% x_flt)
##  user  system elapsed 
## 4.898   0.212   0.640

This is a little more than twice as fast (1.345 / 0.640 ≈ 2.1), which makes sense.

Here's the same test using fmlr as the backend (double precision, CPU):

system.time(x_cpu %*% x_cpu)
##   user  system elapsed 
## 10.285   0.317   1.327

Even with the overhead of the R version, the run times are essentially the same. This is expected, since most of the work is actually in computing the product, and both are calling out to the same dgemm() function in OpenBLAS. The float version above calls sgemm() from OpenBLAS instead.
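
Since both ultimately call the same dgemm(), the results should agree to floating point tolerance. A hypothetical sanity check, assuming the craze object can be pulled back into a base R matrix with an as.matrix() method (check the package docs for the actual accessor):

## compare R's product against fmlr's; expect TRUE
r_result   = x %*% x
fml_result = as.matrix(x_cpu %*% x_cpu)  # as.matrix() here is an assumption
all.equal(r_result, fml_result)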

The GPU numbers are pretty different though:

system.time(x_gpu %*% x_gpu)
##  user  system elapsed 
## 0.002   0.001   0.002

This is more than 300x faster than the CPU float version. There's a reason I chose matrix multiplication for this benchmark 😉

Benchmark         Object   Precision   Wall-clock time (s)   Relative performance
R matrix          x        double      1.345                 672.5
float matrix      x_flt    single      0.640                 320.0
fmlr CPU matrix   x_cpu    double      1.327                 663.5
fmlr GPU matrix   x_gpu    single      0.002                 1.0
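
For the record, the relative performance column is just each wall-clock time divided by the fastest (GPU) time:

## relative performance = elapsed / min(elapsed)
elapsed = c(R = 1.345, float = 0.640, fml_cpu = 1.327, fml_gpu = 0.002)
round(elapsed / min(elapsed), 1)
##       R   float fml_cpu fml_gpu 
##   672.5   320.0   663.5     1.0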
