VeriGPU: OpenSource GPU, in Verilog, loosely based on RISC-V ISA
Build an opensource GPU, targeting ASIC tape-out, for machine learning ("ML"). Hopefully, we can get it to work with the PyTorch deep learning framework.
Create an opensource GPU for machine learning.
I don't actually intend to tape this out myself, but I intend to do what I can to verify that tape-out would work: timings ok, area ok, and so on.
Intend to implement a HIP API that is compatible with the PyTorch machine learning framework. Open to providing other APIs, such as SYCL or NVIDIA® CUDA™.
Internal GPU Core ISA loosely compliant with the RISC-V ISA. Where RISC-V conflicts with designing for a GPU setting, we break with RISC-V.
Intend to keep the cores very focused on ML. For example, use brain floating point ("BF16") throughout, to keep core die area low. This should keep the per-core cost low. Similarly, intend to implement only the few float operations critical to ML, such as `exp`, `log`, `tanh`, `sqrt`.
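As a point of reference (not the repo's RTL), BF16 is simply the top 16 bits of an IEEE-754 float32: 1 sign bit, 8 exponent bits, 7 mantissa bits. It keeps float32's dynamic range while halving storage, and the narrow mantissa keeps multipliers small. A minimal host-side C++ sketch of the conversion, with NaN handling omitted:

```cpp
#include <cstdint>
#include <cstdio>
#include <cstring>

// float32 -> bf16: keep the top 16 bits, rounding to nearest even.
// (NaN handling omitted to keep the sketch short.)
uint16_t float_to_bf16(float f) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof(bits));            // reinterpret the float's bits
    uint32_t round = 0x7FFFu + ((bits >> 16) & 1u);  // round-to-nearest-even bias
    return static_cast<uint16_t>((bits + round) >> 16);
}

// bf16 -> float32: place the 16 bits in the high half, zero-fill the rest.
float bf16_to_float(uint16_t h) {
    uint32_t bits = static_cast<uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &bits, sizeof(f));
    return f;
}

int main() {
    float x = 3.14159f;
    uint16_t h = float_to_bf16(x);
    printf("%.5f -> 0x%04x -> %.5f\n", x, h, bf16_to_float(h));  // small precision loss
    return 0;
}
```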
Big Picture:
GPU Die Architecture:
Single Core:
Single-source compilation and runtime
Single-source C++:
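(The repo's own example file is not reproduced here; below is a hedged sketch of the single-source style this project targets, written against the standard HIP API mentioned above. The `__global__` kernel and the host code that launches it share one .cpp file. The calls used, `hipMalloc`, `hipMemcpy`, `hipLaunchKernelGGL`, `hipFree`, are real HIP entry points; whether VeriGPU's runtime exposes exactly these names or its own equivalents is an assumption.)

```cpp
#include <hip/hip_runtime.h>
#include <cstdio>

// Kernel: runs on the GPU, adds two vectors element-wise.
__global__ void vec_add(const float *a, const float *b, float *out, unsigned int n) {
    unsigned int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        out[i] = a[i] + b[i];
    }
}

int main() {
    const unsigned int n = 256;
    float host_a[n], host_b[n], host_out[n];
    for (unsigned int i = 0; i < n; i++) { host_a[i] = i; host_b[i] = 2.0f * i; }

    // Allocate device buffers and copy the inputs across.
    float *d_a, *d_b, *d_out;
    hipMalloc((void **)&d_a, n * sizeof(float));
    hipMalloc((void **)&d_b, n * sizeof(float));
    hipMalloc((void **)&d_out, n * sizeof(float));
    hipMemcpy(d_a, host_a, n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(d_b, host_b, n * sizeof(float), hipMemcpyHostToDevice);

    // Launch: one block of 256 threads, no dynamic shared memory, default stream.
    hipLaunchKernelGGL(vec_add, dim3(1), dim3(256), 0, 0, d_a, d_b, d_out, n);

    // Copy the result back and check one element.
    hipMemcpy(host_out, d_out, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("out[3] = %f\n", host_out[3]);  // expect 9.0

    hipFree(d_a); hipFree(d_b); hipFree(d_out);
    return 0;
}
```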
Compile the GPU and runtime:
- CMakeLists.txt: src/gpu_runtime/CMakeLists.txt
- GPU runtime: src/gpu_runtime/gpu_runtime.cpp
- GPU controller: src/gpu_controller.sv
- Single GPU RISC-V core: src/core.sv
Compile the single-source C++, and run:
What direction are we thinking of going in? What works already? See:
Our assembly language implementation and progress. Design of GPU memory, registers, and so on. See:
If we want to tape out, we need solid verification. Read more at:
We want the GPU to run quickly, and to use minimal die area. Read how we measure timings and area at: