VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores


The V:N:M (VENOM) format enables the execution of arbitrary N:M sparsity ratios on Sparse Tensor Cores (SPTCs), which natively support only the 2:4 pattern (50% sparsity). To exploit VENOM efficiently, we propose Spatha 🗡️, a high-performance sparse library for DL routines. All experiments were run on an NVIDIA RTX 3090 GPU. The software requirements to reproduce the artifact are: CUDA Toolkit 11.5 or 11.7, cuSparseLt v0.3.0, Python 3.10, PyTorch 1.13.1, and CMake 3.16.3.
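
As a rough illustration of the layout (this sketch is not code from the repository, and the magnitude-based selection is an assumption for the example), the following NumPy snippet builds a V:N:M mask: in every V x M block it keeps the four strongest columns, shared across the V rows (the column-loc metadata), and then keeps the N largest of those four entries in each row, which is the part that maps onto the native 2:4 SPTC pattern when N = 2.

import numpy as np

def venom_mask(W, V=32, N=2, M=8):
    # Hypothetical V:N:M mask by magnitude; illustration only, not the
    # library's actual sparsifier.
    R, C = W.shape
    assert R % V == 0 and C % M == 0 and N <= 4 <= M
    mask = np.zeros(W.shape, dtype=bool)
    for r in range(0, R, V):
        for c in range(0, C, M):
            blk = np.abs(W[r:r+V, c:c+M])
            cols = np.argsort(-blk.sum(axis=0))[:4]         # 4 shared columns per block
            top = np.argsort(-blk[:, cols], axis=1)[:, :N]  # N of those 4 per row (2:4 when N=2)
            for i in range(V):
                mask[r + i, c + cols[top[i]]] = True
    return mask

W = np.random.randn(128, 64)
print(venom_mask(W).mean())  # 0.25: a 2:8 pattern keeps 25% of the weights (75% sparsity)

After the selected columns are compacted, the hardware sees regular 2:4 panels, which is how arbitrary N:M ratios run on SPTCs.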

Reproduction with container

Step 1: Download and run the container

Option 1: download an already-built Docker image

wget https://zenodo.org/record/8084447/files/venom_container.tar.gz
docker load -i venom_container.tar.gz
docker run -it --gpus all venom_container

Option 2: build the container from scratch

git clone --recurse-submodules git@github.com:UDC-GAC/venom.git && cd venom
docker build -t venom_container .
docker run -it --gpus all --name <your_container_name> venom_container

Step 2: Compile and run the experiments

Compilation is already inlined in the scripts provided, so you can jump directly to step (1) if you plan to follow the artifact scripts. Otherwise, the instructions to build and install the code are the following:

Build and install the centralized benchmarking tool:

cd /projects/venom/
mkdir build && cd build
# about 1 minute
cmake .. -DCMAKE_BUILD_TYPE=Debug -DCUDA_ARCHS="86" -DBASELINE=OFF -DIDEAL_KERNEL=OFF -DOUT_32B=OFF && make -j 16

Three compile-time options select which kernel versions are built:

  • -DBASELINE: baseline Spatha implementation for 2:4 sparsity
  • -DIDEAL_KERNEL: Spatha N:M implementation without column-loc structure overhead (ideal situation)
  • -DOUT_32B: Spatha N:M implementation with 32-bit storage instructions; by default, 128-bit instructions are used

Note: if you encounter an error like the following:

Policy "CMP0104" is not known to this version of CMake

please comment out the line cmake_policy(SET CMP0104 OLD) in include/sputnik/CMakeLists.txt.

Build and install VENOM as a Python module:

cd end2end
# about 1 minute
./install.sh

(1) To reproduce the results in Fig. 9:

cd /projects/venom/
# about 1 hour
./benchmark/run_ablation1.sh
python plot/run_ablation1.py

(2) To reproduce the results in Fig. 10:

cd /projects/venom/
# about 5 minutes
./benchmark/run_ablation2.sh
python plot/run_ablation2.py

(3) To reproduce the results in Fig. 12:

cd /projects/venom/
# about 20 minutes
./benchmark/run_baseline_a.sh
./benchmark/run_baseline_b.sh
python plot/run_baseline_a.py
python plot/run_baseline_b.py

(4) To reproduce the results in Fig. 13:

cd /projects/venom/
# about 2 hours
./benchmark/run_spmm_spatha.sh
python plot/run_spmm_spatha.py

(5) To reproduce the results in Fig. 15:

conda activate end2end
# about 10 minutes
./end2end/run_inference.sh
python3 plot/run_inference.py

(6) To reproduce the results in Fig. 11:

conda activate end2end
# about 6 minutes
python3 benchmark/energy.py

(7) Reproducing the results in Table 2 can take a significant amount of time, so we provide three different scripts to alleviate this process:

conda activate sparseml_artf
cd sparseml
# Script that contains a subset of the experiments with the most aggressive configurations, using the pair-wise version of the sparsifier
# about 4 days
./sparseml_SS1.sh
# Script that contains all the sparsity-format configurations, but relaxed, with the pair-wise version of the sparsifier
# about 10 days
./sparseml_SS2.sh
# Script that contains all the sparsity-format configurations and performs the exhaustive search process
# about 25 days
./sparseml_SS3.sh

Note: each script in integrations/huggingface-transformers/scripts has two execution options. Uncomment the first line to use a single GPU, or the second line (setting --nproc_per_node to the total number of GPUs available) for multi-GPU execution.

# single-GPU
CUDA_VISIBLE_DEVICES=0 python3.10 src/sparseml/transformers/question_answering.py \
# multi-GPU (3 in this example)
python3.10 -m torch.distributed.launch --nproc_per_node=3 src/sparseml/transformers/question_answering.py \

Step 3: Check the plots

cd /projects/venom/results
scp *.pdf username@hostmachine:/host/path/target

Reproduction with source code

Step 1: Prepare the code and set up the Python environments

git clone --recurse-submodules git@github.com:UDC-GAC/venom.git && cd venom

Set up the environments:

conda create -y --name end2end
conda activate end2end
conda install pytorch cudatoolkit torchvision torchaudio pytorch-cuda==11.7 -c pytorch -c nvidia
pip install pybind11 matplotlib pandas seaborn shapely holoviews
cd end2end/sten
pip install .
conda deactivate

cd sparseml
conda env create -f sparseml.yml
conda activate sparseml_artf
python3.10 -m pip install -e .
python3.10 -m pip uninstall transformers
python3.10 -m pip install https://github.com/neuralmagic/transformers/releases/download/v1.5/transformers-4.23.1-py3-none-any.whl datasets scikit-learn seqeval pulp
conda deactivate

Steps 2 & 3: Suppose the source code is in the path /projects/venom. Then follow the same Step 2 and Step 3 instructions described above for the Docker container.

Usage examples:

Spatha 🗡️

./src/benchmark_spmm --sparsity-type n-to-m --spmm spatha --gemm cuBlas --precision half --meta-block-size 32 --block-size 4 --nn_row 2 --mm_row 8 --m 1024 --k 4096 --n 4096 --d 0.5 --bm 128 --bn 64 --bk 32 --wm 32 --wn 64 --wk 32 --mm 16 --mn 8 --mk 32 --nstage 2 --random --check
./src/benchmark_spmm --sparsity-type n-to-m --spmm spatha --gemm cuBlas --precision half --meta-block-size 32 --block-size 4 --nn_row 2 --mm_row 16 --m 1024 --k 4096 --n 4096 --d 0.5 --bm 128 --bn 64 --bk 32 --wm 32 --wn 64 --wk 32 --mm 16 --mn 8 --mk 32 --nstage 2 --random --check
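
Assuming --nn_row and --mm_row encode the N:M ratio of the V:N:M format (they match the 2:8 and 2:16 configurations above), the resulting weight sparsity is 1 - N/M:

# Sparsity implied by an N:M ratio; matches the two Spatha runs above.
for n, m in [(2, 8), (2, 16)]:
    print(f"{n}:{m} -> {1 - n / m:.1%} sparsity")
# 2:8  -> 75.0% sparsity
# 2:16 -> 87.5% sparsity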

cuSparseLt

./src/benchmark_spmm --sparsity-type csr --spmm cuSparseLt --gemm cuBlas --precision half --m 1024 --k 4096 --n 768 --d 0.5 --check

CLASP

./src/benchmark_spmm --sparsity-type cvs --spmm CLASP --gemm cuBlas --precision half --block-size 16 --m 1024 --k 256 --n 256 --d 0.2 --check

Publication

VENOM was published at SC'23. To cite our work:

@inproceedings{10.1145/3581784.3607087,
  author    = {Castro, Roberto L. and Ivanov, Andrei and Andrade, Diego and Ben-Nun, Tal and Fraguela, Basilio B. and Hoefler, Torsten},
  title     = {VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores},
  year      = {2023},
  isbn      = {9798400701092},
  publisher = {Association for Computing Machinery},
  address   = {New York, NY, USA},
  doi       = {10.1145/3581784.3607087},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  articleno = {72},
  numpages  = {14},
  location  = {Denver, CO, USA},
  series    = {SC '23}
}

License

Apache-2.0 License

-- Roberto López Castro
