cbalint13/rvv-kernels

RISCV Vector Kernel C/LLVM-IR generator

License

Apache-2.0 license

Folders and files

docs
script
trials
.gitignore
Dockerfile.ML.fedora
LICENSE
README.md
benchmark-v0.7.1-fp16.log
benchmark-v0.7.1-fp16.log.png
benchmark-v0.7.1-fp32.log
benchmark-v0.7.1-fp32.log.png
benchmark-v0.7.1-int8.log
benchmark-v0.7.1-int8.log.png
benchmark-v1.0-fp16.log
benchmark-v1.0-fp16.log.png
benchmark-v1.0-fp32.log
benchmark-v1.0-fp32.log.png
benchmark-v1.0-int8.log
benchmark-v1.0-int8.log.png
dot_fp16_kernel.c
dot_fp16_kernel.ir
dot_fp32_kernel.c
dot_fp32_kernel.ir
dot_int8_kernel.c
dot_int8_kernel.ir
make.sh
rvv-bench.c
rvv-bench.h
rvv-dot-kernel-gen.py

High performance RVV kernel generator to C & LLVM-IR dialects

This is a C/LLVM-IR kernel generator that addresses RVV ISA versions unsupported by LLVM or other toolchains.

Benchmark

[Benchmark graphs: XuanTie TH1520, SpacemiT K1 X60]

Usage

Prepare a docker image with rv64 cross compiler

$ git clone https://github.com/cbalint13/rvv-kernels
$ cd rvv-kernels
$ docker build --file Dockerfile.ML.fedora --tag th1520-rvv .

Generate a kernel

$ docker run -it --rm -v "$PWD":/opt/src th1520-rvv bash
[root@b8032fd28a75 src]# ./make.sh 32 4 int8 v0.7.1 cbalint@192.168.1.45
(x) Naive kernel:
  HEX = b0 28 00 00 b0 66 00 00 b0 a4 00 00 b0 e2 00 00
  O[] = 00010416 00026288 00042160 00058032
(x) MACC operations: elems[32] x lanes[4] = 256 Ops
(x) RVV kernel:
  HEX = b0 28 00 00 b0 66 00 00 b0 a4 00 00 b0 e2 00 00
  O[] = 00010416 00026288 00042160 00058032
RVV bench: 25.600 GOPS in 2.215818 secs
RVV speed: 11.553 GOPS/sec
[root@b8032fd28a75 src]# ls -l dot_int8_kernel.*
-rw-r--r-- 1 1000 1000 3867 Mar 13 18:03 dot_int8_kernel.c
-rw-r--r-- 1 1000 1000 5034 Mar 13 18:03 dot_int8_kernel.ir
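The figures in the log above can be cross-checked with a few lines of Python. This is only a sketch: it assumes the HEX line is a dump of four little-endian 32-bit integers and that each multiply-accumulate counts as two operations, neither of which is stated explicitly in the log.

```python
import struct

# HEX line copied from the log; assumed to be four little-endian int32 values.
hex_line = "b0 28 00 00 b0 66 00 00 b0 a4 00 00 b0 e2 00 00"
outputs = struct.unpack("<4i", bytes.fromhex(hex_line))
print(outputs)  # compare against the O[] line of the log

# Operation count per kernel call, assuming 1 MACC = 2 ops (multiply + add).
elems, lanes = 32, 4
ops_per_call = 2 * elems * lanes
print(ops_per_call)

# Throughput reported by the benchmark: total work divided by elapsed time.
total_gops, seconds = 25.600, 2.215818
print(round(total_gops / seconds, 3))
```

Under these assumptions the decoded values match the O[] line, the operation count matches the reported 256 Ops, and the quotient matches the reported GOPS/sec.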

Optional benchmark logs & graph

[root@b8032fd28a75 src]# ./script/0-explore.sh
[root@b8032fd28a75 src]# ls -l benchmark-int8.log
-rw-r--r-- 1 1000 1000 5731 Mar 13 17:38 benchmark-int8.log
[root@b8032fd28a75 src]# ./script/1-plotgraph.py --logs benchmark-int8.log --title 'RVV v0.7.1 int8 kernels benchmark (TH1520)'
[root@b8032fd28a75 src]# ls -l benchmark-int8.log.png
-rw-r--r-- 1 1000 1000 58380 Mar 13 18:47 benchmark-int8.log.png

Notes

This generator emits C / LLVM-IR kernels with encoded instructions, which makes it RVV version agnostic
T-Head 1520 (C906, also others) implements the older v0.7.1 RVV ISA, now unsupported by LLVM upstream
The TH1520 setvli ASIC implementation is slow, see the comments on a dynamic kernel: trials/riscv-asm.c
Because setvli is slow, the SVE-style (scalable vector) approach has to be constrained to avoid frequent setvli calls

The trials/riscv-asm.c sample kernel would cope with the SVE concept of runtime dynamism, but for reasons tested and mentioned here, on T-Head's particular C906 RVV ASIC implementation the context-switching setvli severely drags down overall performance, so setvli calls should be minimized for this particular target.

For RVV 0.7.1 there is a limit on how and which vector registers can be used in the context of MUL (the register grouping multiplier), so the maximum vector fill width of 64 x int8 reduced into x2 lanes is not possible: it would require the e8/m4 MUL mode, which leaves room for only 4 x vregs (v0, v8, v16, v24), an insufficient number of registers. The maximum usable int8 element width is 32 for RVV 0.7.1.
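One possible reading of this register limit can be sketched in Python. The sketch assumes a 128-bit vector register length (VLEN) and assumes that the widened products occupy a register group twice as large as the int8 source group; both are assumptions for illustration, not statements from this README, and the helper names are hypothetical.

```python
NUM_VREGS = 32  # architectural vector registers v0..v31

def lmul_for(elems, elem_bits, vlen=128):
    """Registers per group needed to hold `elems` elements (VLEN is assumed)."""
    return max(1, (elems * elem_bits + vlen - 1) // vlen)

def register_groups(lmul, num_vregs=NUM_VREGS):
    """Names of the aligned register groups available at a given MUL."""
    return [f"v{i}" for i in range(0, num_vregs, lmul)]

for elems in (32, 64):
    src = lmul_for(elems, 8)   # MUL of the int8 source operands
    wide = 2 * src             # assumed MUL of the widened results
    groups = register_groups(wide)
    print(f"{elems} x int8: source m{src}, widened m{wide}, "
          f"{len(groups)} groups: {groups}")
```

Under these assumptions, 64 x int8 leaves only the four groups v0, v8, v16 and v24, while 32 x int8 leaves eight groups to work with.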

The generated kernel calls setvli once and unrolls the computation across the vector registers.
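What such a kernel computes can be sketched with a scalar reference, useful for comparing against the "Naive kernel" output in the Usage log. The function name `naive_dot`, the data layout (one input vector multiplied against one row per lane) and the input pattern are all assumptions made for illustration; the actual test data in rvv-bench.c may differ.

```python
def naive_dot(a, b):
    """Scalar reference: one multiply-accumulate reduction per lane (row of b)."""
    return [sum(x * y for x, y in zip(a, row)) for row in b]

ELEMS, LANES = 32, 4

# Assumed input pattern: a[i] = i and b[lane][i] = lane * ELEMS + i.
a = list(range(ELEMS))
b = [[lane * ELEMS + i for i in range(ELEMS)] for lane in range(LANES)]

# Every input value must fit in a signed 8-bit integer.
assert all(-128 <= v <= 127 for row in [a] + b for v in row)

out = naive_dot(a, b)
print(out)  # one accumulated value per lane
```

With this assumed pattern the four results coincide with the O[] values printed for `./make.sh 32 4 int8 v0.7.1`.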

Changelog

16 Dec 2024 benchmark full int8/fp16/fp32 RVV v1.0 & v0.7.1
06 Jun 2024 release fp16 & fp32 for RVV 0.7.1 version
13 Mar 2024 initial release, for now int8 with RVV 0.7.1 version
