VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores


The V:N:M (VENOM) format enables the execution of arbitrary N:M sparsity ratios on Sparse Tensor Cores (SPTCs), which natively support only the 2:4 pattern (50% sparsity). To exploit VENOM efficiently, we propose Spatha 🗡️, a high-performance sparse library for DL routines. All experiments were run on an NVIDIA RTX 3090 GPU. The software requirements to reproduce the artifact are: CUDA Toolkit 11.5 or 11.7, cuSparseLt v0.3.0, Python 3.10, PyTorch 1.13.1, and CMake 3.16.3.
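
As a rough illustration of the layout (this sketch is not code from the repository, and the magnitude-based selection is an assumption for the example), the following NumPy snippet builds a V:N:M mask: in every V x M block it keeps the four strongest columns, shared across the V rows (the column-loc metadata), and then keeps the N largest of those four entries in each row, which is the part that maps onto the native 2:4 SPTC pattern when N = 2.

import numpy as np

def venom_mask(W, V=32, N=2, M=8):
    # Hypothetical V:N:M mask by magnitude; illustration only, not the
    # library's actual sparsifier.
    R, C = W.shape
    assert R % V == 0 and C % M == 0 and N <= 4 <= M
    mask = np.zeros(W.shape, dtype=bool)
    for r in range(0, R, V):
        for c in range(0, C, M):
            blk = np.abs(W[r:r+V, c:c+M])
            cols = np.argsort(-blk.sum(axis=0))[:4]         # 4 shared columns per block
            top = np.argsort(-blk[:, cols], axis=1)[:, :N]  # N of those 4 per row (2:4 when N=2)
            for i in range(V):
                mask[r + i, c + cols[top[i]]] = True
    return mask

W = np.random.randn(128, 64)
print(venom_mask(W).mean())  # 0.25: a 2:8 pattern keeps 25% of the weights (75% sparsity)

After the selected columns are compacted, the hardware sees regular 2:4 panels, which is how arbitrary N:M ratios run on SPTCs.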

Reproduction with container

Step 1: Download and run the container

Option 1: download an already-built Docker image

wget https://zenodo.org/record/8084447/files/venom_container.tar.gz
docker load -i venom_container.tar.gz
docker run -it --gpus all venom_container

Option 2: build the container from scratch

git clone --recurse-submodules git@github.com:UDC-GAC/venom.git && cd venom
docker build -t venom_container .
docker run -it --gpus all --name <your_container_name> venom_container

Step 2: Compile and run the experiments

Compilation is already inlined in the scripts provided, so you can jump directly to step (1) if you plan to follow the artifact scripts. Otherwise, the instructions to build and install the code are the following:

Build and install the centralized benchmarking tool:

cd /projects/venom/
mkdir build && cd build
# about 1 minute
cmake .. -DCMAKE_BUILD_TYPE=Debug -DCUDA_ARCHS="86" -DBASELINE=OFF -DIDEAL_KERNEL=OFF -DOUT_32B=OFF && make -j 16

Three compile-time options select which kernel versions are built:

  • -DBASELINE: baseline Spatha implementation for 2:4 sparsity
  • -DIDEAL_KERNEL: Spatha N:M implementation without column-loc structure overhead (ideal situation)
  • -DOUT_32B: Spatha N:M implementation with 32-bit storage instructions; by default, 128-bit instructions are used

Note: if you encounter an error like the following:

Policy "CMP0104" is not known to this version of CMake

please comment out the line cmake_policy(SET CMP0104 OLD) in include/sputnik/CMakeLists.txt.

Build and install VENOM as a Python module:

cd end2end
# about 1 minute
./install.sh

(1) To reproduce the results in Fig. 9:

cd /projects/venom/
# about 1 hour
./benchmark/run_ablation1.sh
python plot/run_ablation1.py

(2) To reproduce the results in Fig. 10:

cd /projects/venom/
# about 5 minutes
./benchmark/run_ablation2.sh
python plot/run_ablation2.py

(3) To reproduce the results in Fig. 12:

cd /projects/venom/
# about 20 minutes
./benchmark/run_baseline_a.sh
./benchmark/run_baseline_b.sh
python plot/run_baseline_a.py
python plot/run_baseline_b.py

(4) To reproduce the results in Fig. 13:

cd /projects/venom/
# about 2 hours
./benchmark/run_spmm_spatha.sh
python plot/run_spmm_spatha.py

(5) To reproduce the results in Fig. 15:

conda activate end2end
# about 10 minutes
./end2end/run_inference.sh
python3 plot/run_inference.py

(6) To reproduce the results in Fig. 11:

conda activate end2end
# about 6 minutes
python3 benchmark/energy.py

(7) Reproducing the results in Table 2 can take a significant amount of time, so we provide three different scripts to alleviate this process:

conda activate sparseml_artf
cd sparseml
# Script that contains a subset of the experiments with the most aggressive configurations, using the pair-wise version of the sparsifier
# about 4 days
./sparseml_SS1.sh
# Script that contains all the sparsity-format configurations, but relaxed, with the pair-wise version of the sparsifier
# about 10 days
./sparseml_SS2.sh
# Script that contains all the sparsity-format configurations and performs the exhaustive search process
# about 25 days
./sparseml_SS3.sh

Note: each script in integrations/huggingface-transformers/scripts has two execution options. Uncomment the first line to use a single GPU, or the second line (setting --nproc_per_node to the total number of GPUs available) for multi-GPU execution.

# single-GPU
CUDA_VISIBLE_DEVICES=0 python3.10 src/sparseml/transformers/question_answering.py \
# multi-GPU (3 in this example)
python3.10 -m torch.distributed.launch --nproc_per_node=3 src/sparseml/transformers/question_answering.py \

Step 3: Check the plots

cd /projects/venom/results
scp *.pdf username@hostmachine:/host/path/target

Reproduction with source code

Step 1: Prepare the code and set up the Python environments

git clone --recurse-submodules git@github.com:UDC-GAC/venom.git && cd venom

Set up the environments:

conda create -y --name end2end
conda activate end2end
conda install pytorch cudatoolkit torchvision torchaudio pytorch-cuda==11.7 -c pytorch -c nvidia
pip install pybind11 matplotlib pandas seaborn shapely holoviews
cd end2end/sten
pip install .
conda deactivate

cd sparseml
conda env create -f sparseml.yml
conda activate sparseml_artf
python3.10 -m pip install -e .
python3.10 -m pip uninstall transformers
python3.10 -m pip install https://github.com/neuralmagic/transformers/releases/download/v1.5/transformers-4.23.1-py3-none-any.whl datasets scikit-learn seqeval pulp
conda deactivate

Steps 2 & 3: Suppose the source code is in the path /projects/venom. Then follow the same Step 2 and Step 3 instructions described above for the Docker container.

Usage examples:

Spatha 🗡️

./src/benchmark_spmm --sparsity-type n-to-m --spmm spatha --gemm cuBlas --precision half --meta-block-size 32 --block-size 4 --nn_row 2 --mm_row 8 --m 1024 --k 4096 --n 4096 --d 0.5 --bm 128 --bn 64 --bk 32 --wm 32 --wn 64 --wk 32 --mm 16 --mn 8 --mk 32 --nstage 2 --random --check
./src/benchmark_spmm --sparsity-type n-to-m --spmm spatha --gemm cuBlas --precision half --meta-block-size 32 --block-size 4 --nn_row 2 --mm_row 16 --m 1024 --k 4096 --n 4096 --d 0.5 --bm 128 --bn 64 --bk 32 --wm 32 --wn 64 --wk 32 --mm 16 --mn 8 --mk 32 --nstage 2 --random --check
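
Assuming --nn_row and --mm_row encode the N:M ratio of the V:N:M format (they match the 2:8 and 2:16 configurations above), the resulting weight sparsity is 1 - N/M:

# Sparsity implied by an N:M ratio; matches the two Spatha runs above.
for n, m in [(2, 8), (2, 16)]:
    print(f"{n}:{m} -> {1 - n / m:.1%} sparsity")
# 2:8  -> 75.0% sparsity
# 2:16 -> 87.5% sparsity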

cuSparseLt

./src/benchmark_spmm --sparsity-type csr --spmm cuSparseLt --gemm cuBlas --precision half --m 1024 --k 4096 --n 768 --d 0.5 --check

CLASP

./src/benchmark_spmm --sparsity-type cvs --spmm CLASP --gemm cuBlas --precision half --block-size 16 --m 1024 --k 256 --n 256 --d 0.2 --check

Publication

VENOM was published at SC'23. To cite our work:

@inproceedings{10.1145/3581784.3607087,
  author    = {Castro, Roberto L. and Ivanov, Andrei and Andrade, Diego and Ben-Nun, Tal and Fraguela, Basilio B. and Hoefler, Torsten},
  title     = {VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores},
  year      = {2023},
  isbn      = {9798400701092},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  doi       = {10.1145/3581784.3607087},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  articleno = {72},
  numpages  = {14},
  location  = {Denver, CO, USA},
  series    = {SC '23}
}

License

Apache-2.0 License

-- Roberto López Castro
