RISCV Vector Kernel C/LLVM-IR generator
cbalint13/rvv-kernels
This is a C/LLVM-IR kernel generator that addresses RVV ISA versions unsupported by LLVM or other toolchains.
XuanTie TH1520 | SpacemiT K1 X60 |
---|---|
(image) | (image) |
(image) | (image) |
(image) | (image) |
- Prepare a Docker image with an rv64 cross compiler
```
$ git clone https://github.com/cbalint13/rvv-kernels
$ cd rvv-kernels
$ docker build --file Dockerfile.ML.fedora --tag th1520-rvv .
```
- Generate a kernel
```
$ docker run -it --rm -v "$PWD":/opt/src th1520-rvv bash
[root@b8032fd28a75 src]# ./make.sh 32 4 int8 v0.7.1 cbalint@192.168.1.45
(x) Naive kernel:
  HEX = b0 28 00 00 b0 66 00 00 b0 a4 00 00 b0 e2 00 00
  O[] = 00010416 00026288 00042160 00058032
(x) MACC operations: elems[32] x lanes[4] = 256 Ops
(x) RVV kernel:
  HEX = b0 28 00 00 b0 66 00 00 b0 a4 00 00 b0 e2 00 00
  O[] = 00010416 00026288 00042160 00058032
RVV bench: 25.600 GOPS in 2.215818 secs
RVV speed: 11.553 GOPS/sec
[root@b8032fd28a75 src]# ls -l dot_int8_kernel.*
-rw-r--r-- 1 1000 1000 3867 Mar 13 18:03 dot_int8_kernel.c
-rw-r--r-- 1 1000 1000 5034 Mar 13 18:03 dot_int8_kernel.ir
```
- Optional benchmark logs & graph
```
[root@b8032fd28a75 src]# ./script/0-explore.sh
[root@b8032fd28a75 src]# ls -l benchmark-int8.log
-rw-r--r-- 1 1000 1000 5731 Mar 13 17:38 benchmark-int8.log
[root@b8032fd28a75 src]# ./script/1-plotgraph.py --logs benchmark-int8.log --title 'RVV v0.7.1 int8 kernels benchmark (TH1520)'
[root@b8032fd28a75 src]# ls -l benchmark-int8.log.png
-rw-r--r-- 1 1000 1000 58380 Mar 13 18:47 benchmark-int8.log.png
```
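The numbers in the sample run above can be checked by hand. The sketch below is a plain-C reference for an `int8` multiply-accumulate kernel plus the throughput arithmetic. The function names and the operand layout (one shared input vector, one weight row per lane) are assumptions made for illustration and are not taken from the generator. The count of 256 operations is consistent with counting each multiply-accumulate as two operations (2 x 32 x 4), which is also an interpretation rather than something the log states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reference kernel: out[l] = sum_i a[i] * b[l * elems + i].
 * The layout (shared vector `a`, one row of `b` per lane) is an assumption
 * for illustration, not the generator's actual data layout. */
static void dot_int8_naive(const int8_t *a, const int8_t *b,
                           int32_t *out, size_t elems, size_t lanes)
{
    assert(a && b && out);
    for (size_t l = 0; l < lanes; ++l) {
        int32_t acc = 0;
        for (size_t i = 0; i < elems; ++i)
            acc += (int32_t)a[i] * (int32_t)b[l * elems + i];
        out[l] = acc;
    }
}

/* Operations per kernel call, counting each multiply-accumulate as two
 * operations (one multiply, one add). */
static uint64_t macc_ops(uint64_t elems, uint64_t lanes)
{
    return 2u * elems * lanes;
}

/* Throughput in GOPS/sec from a total operation count (in giga-ops)
 * and a wall-clock time in seconds. */
static double gops_per_sec(double total_gops, double secs)
{
    assert(secs > 0.0);
    return total_gops / secs;
}
```

With the logged values, `macc_ops(32, 4)` gives 256 and `gops_per_sec(25.600, 2.215818)` gives roughly 11.55, matching the `RVV speed` line.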
- This generator emits C / LLVM-IR kernels with pre-encoded instructions, which makes it RVV version agnostic
- T-Head 1520 (C906, also others) implements the older RVV v0.7.1 ISA, which is now unsupported by upstream LLVM
- The TH1520 `vsetvli` ASIC implementation is slow, see the comments in a dynamic kernel: trials/riscv-asm.c
- The `vsetvli` slowness forces the SVE-style (scalable vector) approach to avoid frequent `vsetvli` calls
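To illustrate what emitting encoded instructions can look like, here is a minimal sketch that builds the 32-bit word for a vector-configuration instruction (`vsetvli`) under both ISA versions. The helper names are hypothetical, and the bit layouts follow my reading of the RVV 0.7.1 and 1.0 specifications; how the generator itself encodes and emits instructions is not shown in this README, so treat this as an assumption-laden example rather than its actual method.

```c
#include <assert.h>
#include <stdint.h>

/* Instruction layout shared by both ISA versions:
 *   bit 31 = 0 | zimm[10:0] in 30:20 | rs1 in 19:15 | funct3 = 0b111 |
 *   rd in 11:7 | opcode = 0x57 (OP-V). */
static uint32_t vsetvli_word(unsigned rd, unsigned rs1, uint32_t vtype)
{
    assert(rd < 32 && rs1 < 32 && vtype < (1u << 11));
    return (vtype << 20) | ((uint32_t)rs1 << 15) | (7u << 12)
         | ((uint32_t)rd << 7) | 0x57u;
}

/* RVV 0.7.1 vtype: vlmul[1:0] in bits 1:0, vsew[2:0] in bits 4:2.
 * sew_log2 = log2(SEW / 8):  e8 -> 0, e16 -> 1, e32 -> 2.
 * lmul_log2 = log2(LMUL):    m1 -> 0, m2 -> 1, m4 -> 2, m8 -> 3. */
static uint32_t vtype_v071(unsigned sew_log2, unsigned lmul_log2)
{
    assert(sew_log2 < 8 && lmul_log2 < 4);
    return ((uint32_t)sew_log2 << 2) | lmul_log2;
}

/* RVV 1.0 vtype: vlmul[2:0] in bits 2:0, vsew[2:0] in bits 5:3,
 * vta in bit 6, vma in bit 7. Integer LMUL only in this sketch. */
static uint32_t vtype_v10(unsigned sew_log2, unsigned lmul_log2,
                          unsigned ta, unsigned ma)
{
    assert(sew_log2 < 4 && lmul_log2 < 4 && ta < 2 && ma < 2);
    return ((uint32_t)ma << 7) | ((uint32_t)ta << 6)
         | ((uint32_t)sew_log2 << 3) | lmul_log2;
}
```

For example, `vsetvli t0, a0, e16, m1` (rd = x5, rs1 = x10) encodes differently in the two versions because the SEW field sits at a different bit offset. A pre-encoded word can then be placed in the instruction stream with an assembler directive such as `.word`, so the toolchain never has to parse the vector mnemonic.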
The trials/riscv-asm.c sample kernel would cope with the SVE concept of runtime dynamism, but for reasons tested and mentioned here, on T-Head's particular C906 RVV ASIC implementation the context-switching `vsetvli` severely drags down overall performance, so `vsetvli` calls should be minimized for this particular target.
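The trade-off described above can be modelled in portable C. The sketch below is not RVV code: `model_vsetvli` is a hypothetical software stand-in that only grants a vector length and counts how often it is called. It contrasts a strip-mined (runtime-dynamic) dot product, which renegotiates the vector length for every chunk, with a fixed-shape variant that configures once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Software stand-in for the vector-configuration instruction: grants
 * vl = min(requested, vlmax) and counts the calls. Not real RVV code. */
static size_t model_vsetvli(size_t requested, size_t vlmax, size_t *calls)
{
    ++*calls;
    return requested < vlmax ? requested : vlmax;
}

/* Dynamic (strip-mined) dot product: vl is renegotiated for every chunk,
 * so the number of configuration calls grows with n. */
static int32_t dot_dynamic(const int8_t *a, const int8_t *b, size_t n,
                           size_t vlmax, size_t *calls)
{
    int32_t acc = 0;
    assert(vlmax > 0);
    for (size_t i = 0; i < n; ) {
        size_t vl = model_vsetvli(n - i, vlmax, calls);
        for (size_t j = 0; j < vl; ++j)
            acc += (int32_t)a[i + j] * (int32_t)b[i + j];
        i += vl;
    }
    return acc;
}

/* Fixed-shape dot product: one configuration call up front, then whole
 * chunks only. n must be a multiple of vlmax because the shape is fixed. */
static int32_t dot_fixed(const int8_t *a, const int8_t *b, size_t n,
                         size_t vlmax, size_t *calls)
{
    int32_t acc = 0;
    assert(vlmax > 0 && n % vlmax == 0);
    (void)model_vsetvli(vlmax, vlmax, calls);  /* the only configuration call */
    for (size_t i = 0; i < n; i += vlmax)
        for (size_t j = 0; j < vlmax; ++j)
            acc += (int32_t)a[i + j] * (int32_t)b[i + j];
    return acc;
}
```

For 64 elements and a maximum vector length of 16, the dynamic variant makes four configuration calls and the fixed variant makes one, while both return the same result. The price of the fixed variant is that the shape must be known when the kernel is generated.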
For RVV 0.7.1 there is a limit on how many and which vector registers can be used for a given LMUL (register-group multiplier) setting. Filling the maximum vector width with 64 x `int8` and reducing it into 2 lanes is therefore not possible: it would require the e8/m4 mode, which leaves room for only 4 vector register groups (v0, v8, v16, v24), an insufficient number of registers. The maximum usable number of `int8` elements is 32 for RVV 0.7.1.
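The register budget can be worked out with a little arithmetic. The sketch below assumes 128-bit vector registers (which is what 64 x `int8` at m4 implies) and assumes that the widened `int8` products occupy twice the register-group size of their sources. That second assumption is one reading consistent with the group names v0, v8, v16, v24 above, not something the source states. The function names are hypothetical.

```c
#include <assert.h>

/* Number of aligned register groups in the 32-entry vector register file
 * for a given register-group multiplier (1, 2, 4 or 8). */
static unsigned vreg_groups(unsigned lmul)
{
    assert(lmul == 1 || lmul == 2 || lmul == 4 || lmul == 8);
    return 32u / lmul;
}

/* Smallest power-of-two multiplier that holds `elems` elements of
 * `sew_bits` each when one vector register is `vlen_bits` wide. */
static unsigned lmul_for(unsigned elems, unsigned sew_bits, unsigned vlen_bits)
{
    unsigned bits = elems * sew_bits;
    unsigned lmul = 1;
    assert(vlen_bits > 0);
    while (lmul * vlen_bits < bits)
        lmul *= 2;
    assert(lmul <= 8);  /* larger groups do not exist */
    return lmul;
}
```

Under these assumptions, 64 x `int8` needs m4 for the sources and m8 for the widened values, so `vreg_groups(8)` leaves 4 groups. With 32 x `int8` the sources fit in m2 and the widened values in m4, which leaves 8 groups to unroll across.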
The generated kernel issues `vsetvli` once and unrolls the computation across the vector registers.
- 16 Dec 2024 benchmark full int8/fp16/fp32 RVV v1.0 & v0.7.1
- 06 Jun 2024 release `fp16` & `fp32` for RVV 0.7.1 version
- 13 Mar 2024 initial release, for now `int8` with RVV 0.7.1 version