A collection of extensions for Vitis and Intel FPGA OpenCL to improve developer quality of life.
Do you use hlslib? Please consider citing us, and let us know so we can feature your project in the list of examples.
hlslib is a collection of C++ headers, CMake files, and examples, aimed at improving the quality of life of HLS developers. The current repertoire primarily supports Vitis and Vitis HLS (formerly SDAccel and Vivado HLS), with some support for Intel FPGA OpenCL. An extended abstract describing the project is available here.
This project is developed at the Scalable Parallel Computing Lab (SPCL) at ETH Zurich (see our github).
There are a few ways to use hlslib in your project:
- Grab the headers and/or CMake files you need and stick them in your project.
- Install hlslib using the standard CMake installation procedure to a location of your choice.
- Clone this repository into your project as a git submodule and integrate it, with or without CMake.
Just `#include` the header(s) you are interested in, and/or put the CMake files somewhere in your project directory and add their path to your `${CMAKE_MODULE_PATH}`. You can see an example here!
Officially supported versions of Vitis currently include 2021.2, 2021.1, 2020.2, 2020.1, and 2019.2. Older versions (including SDx and SDAccel) might work, but should be used at your own discretion.
We have Doxygen! Set the CMake flag `HLSLIB_BUILD_DOCUMENTATION=ON` to generate the docs.
A brief overview of hlslib features is given below.
For integrating the Xilinx or Intel HLS tools in your project, the `FindVitis.cmake` and `FindIntelFPGAOpenCL.cmake` modules are provided in the `cmake` subdirectory. The scripts set all variables required to build both host and device code. `FindVitis.cmake` also provides the `add_vitis_kernel` and `add_vitis_program` functions, which produce targets for building hardware emulation, hardware, and high-level synthesis.
Example `CMakeLists.txt`:

    set(CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH} hlslib/cmake)
    find_package(Vitis REQUIRED)

    add_executable(MyHostExecutable src/MyHostExecutable.cpp)
    include_directories(${Vitis_INCLUDE_DIRS})
    target_link_libraries(MyHostExecutable ${Vitis_LIBRARIES})

    # Will populate the "hw", "hw_emu", and "synthesis" targets
    add_vitis_kernel(MyKernel FILES src/MyKernel.cpp)
    add_vitis_program(MyKernel xilinx_u250_gen3x16_xdma_3_1_202020_1)
Kernels can then be built with:
make hw
The `add_vitis_kernel` and `add_vitis_program` functions take a number of optional arguments that can be used to configure the hardware targets:

    add_vitis_kernel(MyKernel
                     FILES src/MyKernel.cpp src/MyKernelHelper.cpp
                     # All flags below this are optional keywords, and any
                     # combination of them can be specified/not specified.
                     KERNEL MyKernelName  # If different from target name
                     HLS_FLAGS "-DMY_IMPORTANT_DEFINITION -O2"
                     DEPENDS include/MyHeader.h include/OtherDependency.h
                     INCLUDE_DIRS ${CMAKE_CURRENT_SOURCE_DIR}/include hlslib/include
                     PORT_MAPPING "ddr0:DDR[0]" "ddr1:DDR[1]")
    add_vitis_kernel(MyOtherKernel
                     FILES src/MyOtherKernel.cpp)
    add_vitis_program(MyProgram
                      xilinx_u250_gen3x16_xdma_3_1_202020_1  # Name of Vitis platform
                      # All flags below this are optional keywords, and any
                      # combination of them can be specified/not specified.
                      KERNELS MyKernel       # If KERNELS is not specified, the function checks for a kernel
                              MyOtherKernel  # with the same name as specified for the program
                      # Connect multiple linked kernels using streaming interfaces
                      CONNECTIVITY "MyKernel_1.stream_out:MyOtherKernel_1.stream_in"
                      CLOCK 400  # Target a different clock frequency than the default
                      CONFIG scripts/my_config.cfg  # Given as --config to Vitis
                      SAVE_TEMPS ON  # Forwards --save-temps to Vitis
                      BUILD_FLAGS "-Os --export_script"
                      DEBUGGING ON  # Enables Chipscope debugging on all interfaces
                      PROFILING ON)  # Enables profiling for stalls, data transfers, and execution
The `hlslib::DataPack` class located in `hlslib/xilinx/DataPack.h` facilitates SIMD-style vectorization, and makes it easy to build wide data paths in your code.
Example usage:

    hlslib::DataPack<float, 4> Foo(hlslib::DataPack<float, 4> &a,
                                   hlslib::DataPack<float, 4> &b) {
      #pragma HLS INLINE
      auto add = a + b;  // Vector addition
      add[0] = add[1];   // Indexing for both reads and writes
      return 0.5 * add;  // Element-wise multiplication by a scalar
    }
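As a rough illustration of what such a vector type does, the following is a minimal, self-contained sketch of a fixed-width pack with element-wise operators. `Pack` and `AddAndHalve` are hypothetical names invented for this example. This is not the hlslib implementation, and it makes no attempt to map onto wide hardware buses.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a fixed-width vector type; NOT hlslib::DataPack.
template <typename T, std::size_t W>
class Pack {
 public:
  Pack() : data_{} {}
  explicit Pack(T value) {
    for (std::size_t i = 0; i < W; ++i) data_[i] = value;
  }

  // Indexing for both reads and writes.
  T &operator[](std::size_t i) {
    assert(i < W);
    return data_[i];
  }
  const T &operator[](std::size_t i) const {
    assert(i < W);
    return data_[i];
  }

  // Element-wise addition of two packs.
  Pack operator+(const Pack &rhs) const {
    Pack result;
    for (std::size_t i = 0; i < W; ++i) {
      result.data_[i] = data_[i] + rhs.data_[i];
    }
    return result;
  }

 private:
  T data_[W];
};

// Element-wise multiplication by a scalar on the left-hand side.
template <typename T, std::size_t W>
Pack<T, W> operator*(T scalar, const Pack<T, W> &pack) {
  Pack<T, W> result;
  for (std::size_t i = 0; i < W; ++i) result[i] = scalar * pack[i];
  return result;
}

// Mirrors the shape of the Foo example above using the stand-in type.
Pack<float, 4> AddAndHalve(const Pack<float, 4> &a, const Pack<float, 4> &b) {
  auto sum = a + b;   // Vector addition
  sum[0] = sum[1];    // Indexed write and read
  return 0.5f * sum;  // Scalar multiplication
}
```

In an HLS setting, the fixed-trip-count loops over `W` are the kind of loops a tool would be asked to unroll fully so that all lanes are processed in the same cycle; the sketch only demonstrates the software semantics.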
For kernels with multiple processing elements (PEs) executing in parallel, the `hlslib/xilinx/Simulation.h` header adds some convenient macros to simulate this behavior, by wrapping each PE in a thread executed in parallel, all of which are joined when the program terminates.
Example usage:

    HLSLIB_DATAFLOW_INIT();
    hlslib::Stream<Data_t> pipes[kStages + 1];
    HLSLIB_DATAFLOW_FUNCTION(MemoryToStream, memory_in, pipes[0]);
    for (int i = 0; i < kStages; ++i) {
      #pragma HLS UNROLL
      HLSLIB_DATAFLOW_FUNCTION(PE, pipes[i], pipes[i + 1]);  // Launches new C++ thread
    }
    HLSLIB_DATAFLOW_FUNCTION(StreamToMemory, pipes[kStages], memory_out);
    HLSLIB_DATAFLOW_FINALIZE();  // In simulation mode, joins threads created as dataflow functions.
When building programs using the simulation features, you must link against a thread library (e.g., pthreads).
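The thread-per-PE idea can be sketched in plain C++ as follows. This is only an illustration under stated assumptions: `DataflowSim`, `ScaleSlice`, and `RunScaleExample` are hypothetical names, and the hlslib macros are not necessarily implemented this way. Each "processing element" is launched as a `std::thread`, and all threads are joined before the results are read.

```cpp
#include <cassert>
#include <functional>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper that launches each processing element as a thread.
// Not the hlslib implementation.
class DataflowSim {
 public:
  // Launches one processing element as its own thread.
  template <typename Function, typename... Args>
  void Launch(Function &&function, Args &&...args) {
    threads_.emplace_back(std::forward<Function>(function),
                          std::forward<Args>(args)...);
  }

  // Blocks until every launched processing element has returned.
  void Join() {
    for (auto &t : threads_) {
      if (t.joinable()) t.join();
    }
    threads_.clear();
  }

  ~DataflowSim() { Join(); }

 private:
  std::vector<std::thread> threads_;
};

// Example PE: scales one slice of a buffer. Slices are disjoint, so the
// threads never touch the same element and no locking is needed.
void ScaleSlice(std::vector<int> &buffer, int begin, int end, int factor) {
  for (int i = begin; i < end; ++i) buffer[i] *= factor;
}

std::vector<int> RunScaleExample() {
  std::vector<int> buffer = {1, 2, 3, 4, 5, 6, 7, 8};
  constexpr int kStages = 4;
  constexpr int kChunk = 2;
  DataflowSim sim;
  for (int i = 0; i < kStages; ++i) {
    sim.Launch(ScaleSlice, std::ref(buffer), i * kChunk, (i + 1) * kChunk, 10);
  }
  sim.Join();  // Results are only read after all PEs have finished
  return buffer;
}
```

As with the hlslib simulation features, building this sketch may require linking against a thread library (for example, passing `-pthread` to GCC or Clang).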
While Vivado HLS provides the `hls::stream` class, it is somewhat lacking in features, in particular when simulating multiple processing elements. The `hlslib::Stream` class in `hlslib/xilinx/Stream.h` compiles to Vivado HLS streams, but provides a richer interface. hlslib streams are:
- thread-safe during simulation, allowing producer and consumer to be executed in parallel;
- bounded, simulating the finite capacity of hardware FIFOs, allowing easier detection of deadlocks in software; and
- self-contained, allowing the stream depth and implementation (e.g., using LUTRAM or BRAM) to be specified directly in the object, without excess pragmas.
Example usage:
    void Bar(hlslib::Stream<int> &a, hlslib::Stream<int> &b, int N) {
      for (int i = 0; i < N; ++i) {
        #pragma HLS PIPELINE II=1
        auto read = a.Pop();  // Queue-like interface
        b.Push(read + 1);
      }
    }

    void Foo(hlslib::Stream<int> &in_stream,  // Specifying stream depth is optional
             hlslib::Stream<int> &out_stream, int N) {
      #pragma HLS DATAFLOW
      hlslib::Stream<int, 4> foo_pipe;  // Implements a FIFO of depth 4
      // Dataflow functions running in parallel
      HLSLIB_DATAFLOW_INIT();
      HLSLIB_DATAFLOW_FUNCTION(Bar, in_stream, foo_pipe, N);
      HLSLIB_DATAFLOW_FUNCTION(Bar, foo_pipe, out_stream, N);
      HLSLIB_DATAFLOW_FINALIZE();
    }
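To make the "thread-safe" and "bounded" properties listed above concrete, here is a minimal sketch of a blocking FIFO with a fixed capacity, built from a mutex and two condition variables. `BoundedFifo` and `SumThroughFifo` are hypothetical names for this example; this is not how `hlslib::Stream` is implemented, only one common way to obtain the same behavior in software.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <queue>
#include <thread>

// Hypothetical bounded, thread-safe FIFO; NOT hlslib::Stream.
template <typename T>
class BoundedFifo {
 public:
  explicit BoundedFifo(std::size_t depth) : depth_(depth) {
    assert(depth > 0);
  }

  // Blocks while the FIFO is full, like a stalled hardware producer.
  void Push(const T &value) {
    std::unique_lock<std::mutex> lock(mutex_);
    not_full_.wait(lock, [this] { return queue_.size() < depth_; });
    queue_.push(value);
    not_empty_.notify_one();
  }

  // Blocks while the FIFO is empty, like a stalled hardware consumer.
  T Pop() {
    std::unique_lock<std::mutex> lock(mutex_);
    not_empty_.wait(lock, [this] { return !queue_.empty(); });
    T value = queue_.front();
    queue_.pop();
    not_full_.notify_one();
    return value;
  }

 private:
  const std::size_t depth_;
  std::queue<T> queue_;
  std::mutex mutex_;
  std::condition_variable not_full_;
  std::condition_variable not_empty_;
};

// Producer and consumer run concurrently through a FIFO of depth 4.
// The consumer adds one to each value, similar to Bar in the example above.
int SumThroughFifo(int n) {
  BoundedFifo<int> fifo(4);
  int sum = 0;
  std::thread producer([&fifo, n] {
    for (int i = 0; i < n; ++i) fifo.Push(i);
  });
  std::thread consumer([&fifo, &sum, n] {
    for (int i = 0; i < n; ++i) sum += fifo.Pop() + 1;
  });
  producer.join();
  consumer.join();
  return sum;
}
```

Because the capacity is finite, a design whose producer and consumer disagree on the number of elements will hang in software much as it would stall in hardware. A variant using timed waits could report a suspected deadlock instead of blocking forever.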
To greatly reduce the amount of boilerplate code required to create and launch OpenCL kernels, and to handle FPGA-specific configuration required by the vendors, hlslib provides a C++14 convenience interface in `hlslib/xilinx/OpenCL.h` and `hlslib/intel/OpenCL.h` for Xilinx and Intel FPGA OpenCL, respectively.
Example usage:

    using namespace hlslib::ocl;
    Context context;
    std::vector<float> input_host(N, 5);
    std::vector<float> output_host(N, 5);
    auto input_device = context.MakeBuffer<float, Access::read>(
        MemoryBank::bank0, input_host.cbegin(), input_host.cend());
    auto output_device = context.MakeBuffer<float, Access::write>(MemoryBank::bank1, N);
    auto program = context.MakeProgram("MyKernel.xclbin");
    auto kernel = program.MakeKernel("MyKernel", input_device, output_device, N);
    kernel.ExecuteTask();
    output_device.CopyToHost(output_host.begin());
Various other features are provided, including:
- Classes to flatten loop nests and keep track of indices (`include/hlslib/xilinx/Flatten.h`), both for bounds known at runtime (`hlslib::Flatten`) and bounds known at compile-time (`hlslib::ConstFlatten`). Example usage can be found in `xilinx_test/kernels/Flatten.cpp`.
- Various compile-time functions commonly used when designing hardware, such as log2, in `include/hlslib/xilinx/Utility.h`.
- A template tcl-file that can be used with CMake or another templating engine to produce a high-level synthesis script.
- `xilinx_test/CMakeLists.txt` that builds a number of tests to verify hlslib functionality, doubling as a reference for how to integrate HLS projects with CMake using the provided module files.
- An example of how to use the Simulation and Stream headers, at `xilinx_test/kernels/MultiStageAdd.cpp`, both as a host-only simulation (`xilinx_test/test/TestMultiStageAdd.cpp`), and as a hardware kernel (`xilinx_test/host/RunMultiStageAdd.cpp`).
- `include/hlslib/xilinx/Accumulate.h`, which includes a streaming implementation of accumulation, including for type/operator combinations with a non-zero latency (such as floating point addition). Example kernels of usage for both integer and floating point types are included as `xilinx_test/kernel/AccumulateInt.cpp` and `xilinx_test/kernel/AccumulateFloat.cpp`, respectively.
- `include/hlslib/xilinx/Operators.h`, which includes some commonly used operators as functors to be plugged into templated functions such as `TreeReduce` and `Accumulate`.
- `include/hlslib/xilinx/Axi.h`, which implements the AXI Stream interface and the bus interfaces required by the DataMover IP, enabling the use of a command stream-based memory interface for HLS kernels if packaged as an RTL kernel where the DataMover IP is connected to the AXI interfaces.
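To illustrate the loop-flattening idea from the first item in the list above, the sketch below replaces a nested loop with a single loop plus an object that tracks the per-dimension indices. `FlatIndex` and `VisitFlattened` are hypothetical names; the interface of `hlslib::Flatten` may differ, so refer to `Flatten.h` for the real API. Flattening of this kind is typically done so the whole nest can be pipelined as one loop.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical index tracker for a flattened loop nest; NOT hlslib::Flatten.
template <std::size_t Dims>
class FlatIndex {
 public:
  explicit FlatIndex(const std::array<int, Dims> &bounds)
      : bounds_(bounds), index_{} {
    for (std::size_t d = 0; d < Dims; ++d) assert(bounds_[d] > 0);
  }

  // Current index of the given loop dimension (0 is the outermost loop).
  int operator[](std::size_t dim) const { return index_[dim]; }

  // Total number of iterations of the single flattened loop.
  std::size_t Size() const {
    std::size_t size = 1;
    for (int b : bounds_) size *= static_cast<std::size_t>(b);
    return size;
  }

  // Advances the innermost index, carrying into outer indices on wrap-around.
  void Next() {
    for (std::size_t d = Dims; d-- > 0;) {
      if (++index_[d] < bounds_[d]) return;
      index_[d] = 0;
    }
  }

 private:
  std::array<int, Dims> bounds_;
  std::array<int, Dims> index_;
};

// Visits a rows x cols iteration space with one loop and records the
// row-major offset i * cols + j computed from the tracked indices.
std::vector<int> VisitFlattened(int rows, int cols) {
  const std::array<int, 2> bounds = {{rows, cols}};
  FlatIndex<2> nest(bounds);
  std::vector<int> visited;
  const std::size_t total = nest.Size();
  for (std::size_t n = 0; n < total; ++n, nest.Next()) {
    visited.push_back(nest[0] * cols + nest[1]);
  }
  return visited;
}
```

The indices are visited in the same order as the equivalent nested `for` loops, so the loop body can be moved over unchanged apart from reading `nest[0]` and `nest[1]` instead of the original loop variables.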
Some of these headers depend on others. Please refer to the source code.
On Ubuntu, the following package might need to be installed to run hardware emulation:
sudo apt install libc6-dev-i386
- Matrix multiplication code [1]: Uses a wide range of hlslib features, including simulation, streams, vectors, CMake integration, and OpenCL wrapper code.
- SMI [2]: Streaming message passing library for inter-FPGA communication in OpenCL. Uses hlslib for OpenCL host code.
- HelmGemm [3]: Uses the simulation features of hlslib, and incorporates the matrix multiplication code above.
- DaCe [4]: A data-centric parallel programming framework targeting a wide range of architectures, including both Intel and Xilinx FPGAs. Uses hlslib for CMake integration, OpenCL host code, vectors, streams, and simulation.
- REVD [5]: Uses hlslib streaming and simulation features to accelerate the deconvolution workloads of GANs with reverse looping optimizations on Xilinx FPGAs.
If you use hlslib in your project, please let us know, so we can add you to the list.
Please use the issue tracker.
If your project uses hlslib, please consider citing us:
BibTeX:
    @article{hlslib,
      title={{hlslib}: Software Engineering for Hardware Design},
      author={de~Fine~Licht, Johannes and Hoefler, Torsten},
      journal={arXiv:1910.04436},
      year={2019}
    }
Plain text:
J. de Fine Licht and T. Hoefler, "hlslib: Software Engineering for Hardware Design", arXiv:1910.04436 (2019).
- [1] Johannes de Fine Licht, Grzegorz Kwasniewski, and Torsten Hoefler. "Flexible Communication Avoiding Matrix Multiplication on FPGA with High-Level Synthesis", in Proceedings of 28th ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA'20).
- [2] Tiziano De Matteis, Johannes de Fine Licht, Jakub Beránek, and Torsten Hoefler. "Streaming Message Interface: High-Performance Distributed Memory Programming on Reconfigurable Hardware", in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'19).
- [3] Dionysios Diamantopoulos and Christoph Hagleitner. "HelmGemm: Managing GPUs and FPGAs for transprecision GEMM workloads in containerized environments", in Proceedings of the 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP'19).
- [4] Tal Ben-Nun, Johannes de Fine Licht, Alexandros Nikolaos Ziogas, Timo Schneider, and Torsten Hoefler. "Stateful Dataflow Multigraphs: A Data-Centric Model for High-Performance Parallel Programs", in Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis (SC'19).
- [5] Ian Colbert, Jake Daly, Ken Kreutz-Delgado, and Srinjoy Das. "A Competitive Edge: Can FPGAs Beat GPUs at DCNN Inference Acceleration in Resource-Limited Edge Computing Applications?", arXiv:2102.00294.