francois141/Efficient-SDDMM-CUDA-implementation
The Sampled Dense-Dense Matrix Multiplication (SDDMM) is a foundational operation in many important machine learning factor analysis algorithms, among them Alternating Least Squares (ALS), Latent Dirichlet Allocation (LDA), Sparse Factor Analysis (SFA), and Gamma Poisson. In this repository, we present both our code and the comprehensive findings detailed in our final report. Our focus lies on the development of GPU-Dynamic, an efficient GPU-based implementation of the SDDMM kernel. Our implementation outperforms the current SDDMM implementation found in Torch with speedups of up to 100x, and delivers competitive results compared to DGL.
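To make the operation concrete, SDDMM takes a sparse sampling matrix S and two dense factor matrices A and B, and computes C = S ⊙ (A·Bᵀ), evaluating the dense product only at the nonzero positions of S. The kernel below is a minimal, unoptimized reference sketch assuming CSR storage and row-major dense inputs; it is not the GPU-Dynamic kernel from this repository, and all names are illustrative.

```cuda
// Illustrative reference SDDMM: one thread per row of the sparse matrix S (CSR).
// For every nonzero (row, col) it computes
//     out[idx] = vals[idx] * dot(A[row, :], B[col, :]).
// This naive sketch is not the optimized GPU-Dynamic kernel.
__global__ void sddmm_reference(const int   *rowPtr,  // CSR row pointers, size M + 1
                                const int   *colIdx,  // CSR column indices, size nnz
                                const float *vals,    // nonzero values of S, size nnz
                                const float *A,       // dense M x K matrix, row-major
                                const float *B,       // dense N x K matrix, row-major
                                float       *out,     // output values, size nnz
                                int M, int K) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row >= M) return;

    for (int idx = rowPtr[row]; idx < rowPtr[row + 1]; ++idx) {
        int col = colIdx[idx];
        float dot = 0.0f;
        for (int k = 0; k < K; ++k) {
            dot += A[row * K + k] * B[col * K + k];
        }
        out[idx] = vals[idx] * dot;
    }
}
```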
At the bottom of this README is a representation of all matrices that we used for evaluation. The matrices span a range of dimensions and densities. All matrices originate from the SuiteSparse Matrix Collection and can be downloaded by executing the install_matrices.sh script.
To run the code, you need to install LibTorch for C++, which you can download from here. We recommend using PyTorch >= 2.1.0 and CUDA >= 12.1. In addition, you should have gcc >= 10.2.0 and cmake >= 3.21 installed.
Make sure to update the run_cmake.sh file with the path to your libtorch library. You can then compile and run the code by executing:

./run_cmake.sh
./build/src/dphpc --K 32 --data_folder data/
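As a sanity check, one way to verify an SDDMM result is against a dense LibTorch reference, C_ref = mask ⊙ (A·Bᵀ). The snippet below is only a sketch under assumed shapes and a random sparsity pattern, and assumes a CUDA device is available; the tensor names and the ~1% density are illustrative and not part of this repository's benchmark harness.

```cpp
#include <torch/torch.h>
#include <iostream>

int main() {
    // Illustrative sizes: M x K and N x K dense factors, M x N sampling mask.
    const int64_t M = 512, N = 512, K = 32;

    auto A = torch::rand({M, K}, torch::kCUDA);
    auto B = torch::rand({N, K}, torch::kCUDA);
    // Random ~1% dense sampling mask standing in for a real sparse matrix.
    auto mask = (torch::rand({M, N}, torch::kCUDA) < 0.01).to(torch::kFloat);

    // Dense reference: C_ref = mask ⊙ (A * B^T). An SDDMM kernel under test
    // should reproduce these values at the nonzero positions of the mask.
    auto reference = mask * torch::matmul(A, B.transpose(0, 1));

    std::cout << "reference nonzeros: "
              << reference.count_nonzero().item<int64_t>() << std::endl;
    return 0;
}
```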