Implementation of a cool communication layer
uiuc-hpc/lci
- @danghvu (danghvu@gmail.com)
- @omor1
- @JiakunYan (jiakunyan1998@gmail.com)
- @snirmarc
The Lightweight Communication Interface (LCI) is designed to be an efficient communication library for multithreaded, irregular communications. It is a research tool to explore design choices for such libraries. It has the following major features:
Multithreaded performance as the first priority: No big blocking locks like those in MPI! We carefully design the internal data structures to minimize interference between threads. We use atomic operations and fine-grained try locks extensively instead of coarse-grained blocking locks. Posting sends/receives, polling completions, and making progress on the progress engine (`LCI_progress`) use different locks or no locks at all! Threads do not interfere with each other unless necessary.

Versatile communication interface: LCI provides users with various options, including:
- Communication primitives: two-sided send/recv, one-sided put/get.
- Completion mechanisms: synchronizers (similar to MPI requests/futures), completion queues, function handlers.
- Protocols: small/medium/long messages mapping to eager/rendezvous protocol.
- Communication buffers: for both source/target buffers, we can use user-provided/runtime-allocated buffers.
- Registration: for long messages, users can explicitly register the buffer or leave it to the runtime (just use `LCI_SEGMENT_ALL`).
The options are orthogonal and almost all combinations are valid! For example, the example code `putla_queue` uses one-sided put + user-provided source buffer + runtime-allocated target buffer + rendezvous protocol + completion queue on source/target side + explicit registration.
Explicit control of communication behaviors and resources: the versatile communication interface already gives users a lot of control. In addition, users can control various low-level features through API calls, environment variables, or CMake variables, such as:
- Replication of communication devices.
- The semantics of send/receive tag matching.
- All communication primitives are non-blocking and users can decide when to retry in case of temporarily unavailable resources.
- LCI also gives users an explicit function (`LCI_progress`) to make progress on the communication engine.
- Different implementations and sizes of completion queues/matching tables.
Users can tailor the LCI configuration to reduce software overheads, or just use the default settings if LCI is not a performance bottleneck.
Currently, LCI is implemented as a mix of C and C++ libraries. Lightweight Communication Tools (LCT) is a C++ library providing basic tools that can be used across libraries. Lightweight Communication Interface (LCI) is a C library implementing communication-related features.
Currently, the functionalities in the LCT library include:
- timing.
- string searching and manipulation.
- query thread ID and number.
- logging.
- performance counters.
- different implementations of queues.
- PMI (Process Management Interface) wrappers.
The actual API and (some) documentation are located in `lct.h` and `lci.h`.
    cmake .
    make
    make install
- `CMAKE_INSTALL_PREFIX=/path/to/install`: Where to install LCI.
  - This is the same across all CMake projects.
- `LCI_DEBUG=ON/OFF`: Enable/disable the debug mode (more assertions and logs). The default value is `OFF`.
- `LCI_SERVER=ibv/ofi/ucx`: Hint to which network backend to use. If the backend indicated by this variable is found, LCI will just use it. Otherwise, LCI will use whatever is found, with the priority `ibv` > `ofi` > `ucx`. The default value is `ibv`. Typically, you don't need to modify this variable, because if `libibverbs` is present, it is likely to be the recommended one to use.
  - `ibv`: libibverbs, typically for InfiniBand.
  - `ofi`: libfabric, for all other networks (Slingshot-11, Ethernet, shared memory).
  - `ucx`: UCX. Currently, this backend is in an experimental state.
- `LCI_FORCE_SERVER=ON/OFF`: Default value is `OFF`. If it is set to `ON`, `LCI_SERVER` will not be treated as a hint but as a requirement.
- `LCI_WITH_LCT_ONLY=ON/OFF`: Whether to only build LCT (the Lightweight Communication Tools). Default is `OFF` (build both LCT and LCI).
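As a sketch of how these variables combine, the following configures LCI to require the libfabric backend and to install into a custom prefix. The prefix `$HOME/opt/lci` is only a placeholder chosen for illustration; the variable names are the ones listed above.

```shell
# Configure in the source directory, as in the basic build steps.
# $HOME/opt/lci is a placeholder install prefix (an assumption, not a default).
cmake -DCMAKE_INSTALL_PREFIX="$HOME/opt/lci" \
      -DLCI_SERVER=ofi \
      -DLCI_FORCE_SERVER=ON \
      .

# Build and install as usual.
make
make install
```

Dropping `-DLCI_FORCE_SERVER=ON` turns `LCI_SERVER` back into a hint, so the configuration would fall back to another backend if libfabric is not found.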
We use the same mechanisms as MPI to launch LCI processes, so you can run LCI applications the same way you run MPI applications. Typically, that means `mpirun` or `srun`. For example,

    mpirun -n 2 ./hello_world

or

    srun -n 2 ./hello_world
See `examples` and `tests` for some example code.
See `lci/api/lci.h` for public APIs, and `doxygen` for the full documentation.
See LICENSE file.
If CMake fails to find a network backend, this typically means that libibverbs/libfabric/UCX is not installed, or is not installed in a standard location. Make sure the backend is installed and specify `IBV_ROOT`, `OFI_ROOT`, or `UCX_ROOT` (either through environment variables or CMake variables) to point CMake to the correct location.
A common trick for finding whether/where they are installed is
    # where libibverbs is installed
    which ibv_devinfo
    # where libfabric is installed
    which fi_info
    # where ucx is installed
    which ucx_info
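Once a tool is found, its location can be turned into a value for the corresponding `*_ROOT` variable. The sketch below assumes the common `<prefix>/bin/<tool>` layout, which may not hold on every system; `prefix_of` is a hypothetical helper written for this example and is not part of LCI.

```shell
# Hypothetical helper: given the full path of a tool such as fi_info,
# print the install prefix, assuming a <prefix>/bin/<tool> layout.
prefix_of() {
    dirname "$(dirname "$1")"
}

# Example with a literal path; in practice pass "$(which fi_info)".
prefix_of /opt/libfabric/bin/fi_info   # prints /opt/libfabric
```

The result can then be handed to CMake, for example `cmake -DOFI_ROOT="$(prefix_of "$(which fi_info)")" .`, or exported as the `OFI_ROOT` environment variable before configuring.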