oneapi-src/unified-memory-framework
The Unified Memory Framework (UMF) is a library for constructing allocators and memory pools. It also contains broadly useful abstractions and utilities for memory management. UMF allows users to manage multiple memory pools characterized by different attributes, allowing certain allocation types to be isolated from others and allocated using different hardware resources as required.
For a quick introduction to UMF usage, please see the examples documentation, which includes the code of the basic example. There are also more advanced examples that allocate USM memory from the Level Zero device using the Level Zero API and UMF Level Zero memory provider, and from the CUDA device using the CUDA API and UMF CUDA memory provider.
Required packages:
- libhwloc-dev >= 2.3.0 (Linux) / hwloc >= 2.3.0 (Windows)
- C compiler
- CMake >= 3.14.0
For development and contributions:
- clang-format-15.0 (can be installed with `python -m pip install clang-format==15.0.7`)
- cmake-format-0.6 (can be installed with `python -m pip install cmake-format==0.6.13`)
- black (can be installed with `python -m pip install black==24.3.0`)
Note: All developer dependencies are defined in `third_party/requirements.txt` and can be installed with, for example, `pip install -r third_party/requirements.txt`.
For building tests and multithreaded benchmarks:
- C++ compiler with C++17 support
For Level Zero memory provider tests:
- Level Zero headers and libraries
- compatible GPU with installed driver
Executables and binaries will be in `build/bin`. The `{build_config}` can be either `Debug` or `Release`.
cmake -B build -DCMAKE_BUILD_TYPE={build_config}
cmake --build build -j $(nproc)
Generating a Visual Studio project: executables and binaries will be in `build/bin/{build_config}`. The `{build_config}` can be either `Debug` or `Release`.
cmake -B build -G "Visual Studio 15 2017 Win64"
cmake --build build --config {build_config} -j $Env:NUMBER_OF_PROCESSORS
UMF comes with a single-threaded micro benchmark based on ubench. To build the benchmark, the `UMF_BUILD_BENCHMARKS` CMake configuration flag has to be turned `ON`.
UMF also provides multithreaded benchmarks that can be enabled by setting both the `UMF_BUILD_BENCHMARKS` and `UMF_BUILD_BENCHMARKS_MT` CMake configuration flags to `ON`. Multithreaded benchmarks require a C++ compiler with C++17 support.
The Scalable Pool requirements can be found in the relevant 'Memory Pool managers' section below.
List of sanitizers available on Linux:
- AddressSanitizer
- UndefinedBehaviorSanitizer
- ThreadSanitizer
  - Is mutually exclusive with other sanitizers.
- MemorySanitizer
  - Requires linking against MSan-instrumented libraries to prevent false positive reports. More information here.
List of sanitizers available on Windows:
- AddressSanitizer
Listed sanitizers can be enabled with the appropriate CMake options.
To enable fuzz testing, the `UMF_BUILD_FUZZTESTS` CMake configuration flag must be set to `ON`. Note that this feature is supported only on Linux and requires Clang. Additionally, ensure that `CMAKE_PREFIX_PATH` includes the directory containing the libraries necessary for fuzzing (e.g., Clang's `libclang_rt.fuzzer_no_main-x86_64.a`).
Example:
cmake -B build -DCMAKE_C_COMPILER=clang -DCMAKE_CXX_COMPILER=clang++ -DCMAKE_BUILD_TYPE=Debug -DUMF_BUILD_FUZZTESTS=ON -DCMAKE_PREFIX_PATH=/path/to/fuzzer/libs
List of options provided by CMake:
Name | Description | Values | Default |
---|---|---|---|
UMF_BUILD_SHARED_LIBRARY | Build UMF as shared library | ON/OFF | OFF |
UMF_BUILD_LEVEL_ZERO_PROVIDER | Build Level Zero memory provider | ON/OFF | ON |
UMF_BUILD_CUDA_PROVIDER | Build CUDA memory provider | ON/OFF | ON |
UMF_BUILD_LIBUMF_POOL_JEMALLOC | Build the libumf_pool_jemalloc static library | ON/OFF | OFF |
UMF_BUILD_TESTS | Build UMF tests | ON/OFF | ON |
UMF_BUILD_GPU_TESTS | Build UMF GPU tests | ON/OFF | OFF |
UMF_BUILD_BENCHMARKS | Build UMF benchmarks | ON/OFF | OFF |
UMF_BUILD_EXAMPLES | Build UMF examples | ON/OFF | ON |
UMF_BUILD_FUZZTESTS | Build UMF fuzz tests (supported only on Linux with Clang) | ON/OFF | OFF |
UMF_BUILD_GPU_EXAMPLES | Build UMF GPU examples | ON/OFF | OFF |
UMF_DEVELOPER_MODE | Enable additional developer checks | ON/OFF | OFF |
UMF_FORMAT_CODE_STYLE | Add clang, cmake, and black -format-check and -format-apply targets to make | ON/OFF | OFF |
UMF_TESTS_FAIL_ON_SKIP | Treat skips in tests as fail | ON/OFF | OFF |
UMF_USE_ASAN | Enable AddressSanitizer checks | ON/OFF | OFF |
UMF_USE_UBSAN | Enable UndefinedBehaviorSanitizer checks | ON/OFF | OFF |
UMF_USE_TSAN | Enable ThreadSanitizer checks | ON/OFF | OFF |
UMF_USE_MSAN | Enable MemorySanitizer checks | ON/OFF | OFF |
UMF_USE_VALGRIND | Enable Valgrind instrumentation | ON/OFF | OFF |
UMF_USE_COVERAGE | Build with coverage enabled (Linux only) | ON/OFF | OFF |
UMF_LINK_HWLOC_STATICALLY | Link UMF with HWLOC library statically (proxy library will be disabled on Windows+Debug build) | ON/OFF | OFF |
A UMF memory pool is a combination of a pool allocator and a memory provider. A memory provider is responsible for coarse-grained memory allocations and management of memory pages, while the pool allocator controls memory pooling and handles fine-grained memory allocations.
A pool allocator can leverage existing allocators (e.g. jemalloc or tbbmalloc) or be written from scratch.
UMF comes with predefined pool allocators (see `include/umf/pools`) and providers (see `include/umf/providers`). UMF can also work with user-defined pools and providers that implement a specific interface (see `include/umf/memory_pool_ops.h` and `include/umf/memory_provider_ops.h`).
More detailed documentation is available here: https://oneapi-src.github.io/unified-memory-framework/
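To make the provider/pool split described above more concrete, here is a toy sketch in plain C. It deliberately does not use the UMF API: `toy_provider_alloc`, `toy_provider_free`, `toy_pool_t` and the `toy_pool_*` functions are hypothetical names invented for this illustration. The "provider" obtains one coarse-grained region from the OS with `mmap`, and the "pool allocator" carves small, 16-byte-aligned blocks out of it.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Toy "provider" (hypothetical): coarse-grained, page-backed regions. */
static void *toy_provider_alloc(size_t size) {
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

static void toy_provider_free(void *ptr, size_t size) {
    munmap(ptr, size);
}

/* Toy "pool allocator" (hypothetical): fine-grained bump allocation
 * inside a single region obtained from the provider. */
typedef struct {
    uint8_t *base;
    size_t capacity;
    size_t used;
} toy_pool_t;

static int toy_pool_create(toy_pool_t *pool, size_t capacity) {
    assert(pool != NULL);
    pool->base = toy_provider_alloc(capacity);
    pool->capacity = capacity;
    pool->used = 0;
    return pool->base ? 0 : -1;
}

/* Returns a 16-byte-aligned block, or NULL when the region is exhausted. */
static void *toy_pool_malloc(toy_pool_t *pool, size_t size) {
    size_t aligned = (size + 15u) & ~(size_t)15u;
    if (aligned < size || aligned > pool->capacity - pool->used) {
        return NULL;
    }
    void *p = pool->base + pool->used;
    pool->used += aligned;
    return p;
}

/* Releases the whole region back to the provider at once. */
static void toy_pool_destroy(toy_pool_t *pool) {
    toy_provider_free(pool->base, pool->capacity);
    pool->base = NULL;
}
```

A real pool allocator also supports freeing individual blocks and growing by requesting more regions; the sketch omits both to keep the division of responsibilities visible.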
A memory provider that can provide memory from a given pre-allocated buffer.
A memory provider that provides memory from an operating system.
The OS memory provider supports two types of memory mappings (set by the `visibility` parameter):
- private memory mapping (`UMF_MEM_MAP_PRIVATE`)
- shared memory mapping (`UMF_MEM_MAP_SHARED`, currently supported on Linux only)
The IPC API requires the `UMF_MEM_MAP_SHARED` memory `visibility` mode (`UMF_RESULT_ERROR_INVALID_ARGUMENT` is returned otherwise).
The IPC API uses file descriptor duplication, which requires the `pidfd_getfd(2)` system call to obtain a duplicate of another process's file descriptor. This system call is supported since Linux 5.6. The required permission ("restricted ptrace") is governed by the `PTRACE_MODE_ATTACH_REALCREDS` check (see `ptrace(2)`). To allow file descriptor duplication in a binary that opens an IPC handle, you can call `prctl(PR_SET_PTRACER, ...)` in the producer binary that gets the IPC handle. Alternatively, you can change `ptrace_scope` globally in the system, e.g.:
sudo bash -c "echo 0 > /proc/sys/kernel/yama/ptrace_scope"
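As a rough illustration of the `prctl(PR_SET_PTRACER, ...)` approach mentioned above, the producer could run something like the following before handing out an IPC handle. This is a minimal sketch rather than UMF code: the helper name `allow_fd_duplication_by` is hypothetical, and it assumes the producer already knows the consumer's PID.

```c
#include <assert.h>
#include <sys/prctl.h>
#include <sys/types.h>

/* Hypothetical helper, called in the producer process: declares that
 * `consumer_pid` may pass the ptrace access check against this process.
 * Returns 0 on success and -1 with errno set on failure (for example
 * when the running kernel does not support PR_SET_PTRACER). */
static int allow_fd_duplication_by(pid_t consumer_pid) {
    assert(consumer_pid > 0);
    return prctl(PR_SET_PTRACER, (unsigned long)consumer_pid, 0, 0, 0);
}
```

Compared with lowering `ptrace_scope` system-wide, this keeps the relaxation limited to one producer process and one designated consumer.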
Two mechanisms are available for the shared memory mapping:
- a named shared memory object (used if the `shm_name` parameter is not NULL) or
- an anonymous file descriptor (used if the `shm_name` parameter is NULL)
The `shm_name` parameter should be a null-terminated string of up to NAME_MAX (i.e., 255) characters, none of which are slashes.
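The two constraints on `shm_name` stated above can be checked up front. The following is a small sketch of such a check; `shm_name_is_valid` is a hypothetical helper, not part of UMF, and it tests only the length and no-slash rules from the text (it does not, for instance, reject an empty string).

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#ifndef NAME_MAX
#define NAME_MAX 255 /* fallback; the text above gives 255 as the value */
#endif

/* Hypothetical helper: true when `name` has at most NAME_MAX characters
 * and none of them is a slash. */
static bool shm_name_is_valid(const char *name) {
    assert(name != NULL);
    if (strlen(name) > NAME_MAX) {
        return false;
    }
    return strchr(name, '/') == NULL;
}
```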
An anonymous file descriptor for the shared memory mapping will be created using:
- the `memfd_secret()` syscall, if it is implemented and the `UMF_MEM_FD_FUNC` environment variable does not contain the "memfd_create" string, or
- the `memfd_create()` syscall otherwise (if it is implemented).
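The selection rule above can be written down as a small decision function. This sketch models only the rule as described here, not UMF's actual implementation: `fd_func_t`, `choose_fd_func` and the two availability flags are hypothetical, and `FD_FUNC_NONE` for the case where neither syscall is implemented is an assumption, since the text does not say what happens then.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

typedef enum {
    FD_FUNC_MEMFD_SECRET,
    FD_FUNC_MEMFD_CREATE,
    FD_FUNC_NONE /* assumption: neither syscall is available */
} fd_func_t;

/* `env_value` stands for the value of UMF_MEM_FD_FUNC (NULL if unset). */
static fd_func_t choose_fd_func(const char *env_value,
                                bool has_memfd_secret,
                                bool has_memfd_create) {
    bool asks_for_memfd_create =
        env_value != NULL && strstr(env_value, "memfd_create") != NULL;

    if (has_memfd_secret && !asks_for_memfd_create) {
        return FD_FUNC_MEMFD_SECRET;
    }
    if (has_memfd_create) {
        return FD_FUNC_MEMFD_CREATE;
    }
    return FD_FUNC_NONE;
}
```

A caller would typically pass `getenv("UMF_MEM_FD_FUNC")` as `env_value` and derive the two flags from whether the corresponding syscalls exist on the running kernel.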
The IPC API on Linux requires the `PTRACE_MODE_ATTACH_REALCREDS` permission (see `ptrace(2)`) to duplicate another process's file descriptor (see above).
Packages required for tests (currently Linux only):
- libnuma-dev
A memory provider that provides memory from a Level Zero (L0) device.
The IPC API uses file descriptor duplication, which requires the `pidfd_getfd(2)` system call to obtain a duplicate of another process's file descriptor. This system call is supported since Linux 5.6. The required permission ("restricted ptrace") is governed by the `PTRACE_MODE_ATTACH_REALCREDS` check (see `ptrace(2)`). To allow file descriptor duplication in a binary that opens an IPC handle, you can call `prctl(PR_SET_PTRACER, ...)` in the producer binary that gets the IPC handle. Alternatively, you can change `ptrace_scope` globally in the system, e.g.:
sudo bash -c "echo 0 > /proc/sys/kernel/yama/ptrace_scope"
- Linux or Windows OS
- The `UMF_BUILD_LEVEL_ZERO_PROVIDER` option turned `ON` (by default)
- The IPC API on Linux requires the `PTRACE_MODE_ATTACH_REALCREDS` permission (see `ptrace(2)`) to duplicate another process's file descriptor (see above).
Additionally, required for tests:
- The `UMF_BUILD_GPU_TESTS` option turned `ON`
- System with Level Zero compatible GPU
- Required packages:
  - liblevel-zero-dev (Linux) or level-zero-sdk (Windows)
A memory provider that provides memory from a device DAX (a character device file like `/dev/daxX.Y`). It can be used when large memory mappings are needed.
- Linux OS
- A character device file /dev/daxX.Y created in the OS.
A memory provider that provides memory by mapping a regular, extendable file.
The IPC API requires the `UMF_MEM_MAP_SHARED` memory `visibility` mode (`UMF_RESULT_ERROR_INVALID_ARGUMENT` is returned otherwise).
The memory visibility mode parameter must be set to `UMF_MEM_MAP_SHARED` in case of FSDAX.
- Linux OS
- The path of the file to be mapped can be at most `PATH_MAX` (4096) characters long.
A memory provider that provides memory from a CUDA device.
- Linux or Windows OS
- The `UMF_BUILD_CUDA_PROVIDER` option turned `ON` (by default)
Additionally, required for tests:
- The `UMF_BUILD_GPU_TESTS` option turned `ON`
- System with CUDA compatible GPU
- Required packages:
  - nvidia-cuda-dev (Linux) or cuda-sdk (Windows)
This memory pool is distributed as part of libumf. It forwards all requests to the underlying memory provider. Currently the umfPoolRealloc, umfPoolCalloc and umfPoolMallocUsableSize functions are not supported by the proxy pool.
The Disjoint pool is designed to keep internal metadata separate from user data. This separation is particularly useful when user data needs to be placed in memory with relatively high latency, such as GPU memory or disk storage.
Jemalloc pool is a jemalloc-based memory pool manager built as a separate static library: libjemalloc_pool.a on Linux and jemalloc_pool.lib on Windows. The `UMF_BUILD_LIBUMF_POOL_JEMALLOC` option has to be turned `ON` to build this library.
jemalloc is required to build the jemalloc pool.
On Linux, jemalloc is built from the (fetched) sources with the following non-default options enabled:
- `--with-jemalloc-prefix=je_` - adds the `je_` prefix to all public APIs,
- `--disable-cxx` - disables C++ integration, which causes the `new` and `delete` operator implementations to be omitted,
- `--disable-initial-exec-tls` - disables the initial-exec TLS model for jemalloc's internal thread-local storage (on those platforms that support explicit settings), which can allow jemalloc to be dynamically loaded after program startup (e.g. using `dlopen()`).
The default jemalloc package is required on Windows.
- The `UMF_BUILD_LIBUMF_POOL_JEMALLOC` option turned `ON`
- jemalloc is required:
  - on Linux and MacOS: jemalloc is fetched and built from sources (a custom build),
  - on Windows: the default jemalloc package is required
Scalable Pool is a oneTBB-based memory pool manager. It is distributed as part of libumf. To use this pool, TBB must be installed in the system.
Packages required for using this pool and executing tests/benchmarks (not required for build):
- libtbb-dev (libtbbmalloc.so.2) on Linux or tbb (tbbmalloc.dll) on Windows
Note: The memspace, memtarget and mempolicy APIs are experimental and may change in future releases.
TODO: Add general information about memspaces.
Memspace backed by all available NUMA nodes discovered on the platform. Can be retrieved using umfMemspaceHostAllGet.
Memspace backed by all available NUMA nodes discovered on the platform sorted by capacity. Can be retrieved using umfMemspaceHighestCapacityGet.
Memspace backed by an aggregated list of NUMA nodes identified as highest bandwidth after selecting each available NUMA node as the initiator. Querying the bandwidth value requires HMAT support on the platform. Calling `umfMemspaceHighestBandwidthGet()` will return NULL if it's not supported.
Memspace backed by an aggregated list of NUMA nodes identified as lowest latency after selecting each available NUMA node as the initiator. Querying the latency value requires HMAT support on the platform. Calling `umfMemspaceLowestLatencyGet()` will return NULL if it's not supported.
UMF provides the UMF proxy library (`umf_proxy`) that makes it possible to override the default allocator in other programs on both Linux and Windows.
To enable this feature, the `UMF_BUILD_SHARED_LIBRARY` option needs to be turned `ON`.
On Linux, this can be done without any code changes using the `LD_PRELOAD` environment variable:
LD_PRELOAD=/usr/lib/libumf_proxy.so myprogram
The memory used by the proxy memory allocator is mmap'ed:
- with the `MAP_PRIVATE` flag by default or
- with the `MAP_SHARED` flag if the `UMF_PROXY` environment variable contains one of the two following strings: `page.disposition=shared-shm` or `page.disposition=shared-fd`. These two options differ in the mechanism used during IPC:
  - `page.disposition=shared-shm` - IPC uses named shared memory. An SHM name is generated using the `umf_proxy_lib_shm_pid_$PID` pattern, where `$PID` is the PID of the process. It creates the `/dev/shm/umf_proxy_lib_shm_pid_$PID` file.
  - `page.disposition=shared-fd` - The IPC API uses file descriptor duplication, which requires the `pidfd_getfd(2)` system call to obtain a duplicate of another process's file descriptor. This system call is supported since Linux 5.6. The required permission ("restricted ptrace") is governed by the `PTRACE_MODE_ATTACH_REALCREDS` check (see `ptrace(2)`). To allow file descriptor duplication in a binary that opens an IPC handle, you can call `prctl(PR_SET_PTRACER, ...)` in the producer binary that gets the IPC handle. Alternatively, you can change `ptrace_scope` globally in the system, e.g.: `sudo bash -c "echo 0 > /proc/sys/kernel/yama/ptrace_scope"`.
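For the `shared-shm` case, the naming pattern given above is easy to reproduce, which can help when looking for or cleaning up the corresponding file under `/dev/shm`. A minimal sketch, where `proxy_shm_name` is a hypothetical helper rather than a UMF function:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes "umf_proxy_lib_shm_pid_<pid>" into `buf`.
 * Returns 0 on success, -1 if the buffer is too small. */
static int proxy_shm_name(char *buf, size_t buf_len, long pid) {
    assert(buf != NULL);
    int n = snprintf(buf, buf_len, "umf_proxy_lib_shm_pid_%ld", pid);
    return (n >= 0 && (size_t)n < buf_len) ? 0 : -1;
}
```

Passing the value of `getpid()` from the process that loaded the proxy library would give the name that process is described as using.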
Size threshold
The size threshold feature (Linux only) causes all allocations smaller than the given threshold value to go to the default system allocator instead of the proxy library. It can be enabled by adding the `size.threshold=<value>` string to the `UMF_PROXY` environment variable (with `';'` as a separator), for example: `UMF_PROXY="page.disposition=shared-shm;size.threshold=64"`.
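The `UMF_PROXY` value is a `';'`-separated list of `key=value` entries. The sketch below shows one way to look up a key in such a string; it illustrates the format only and makes no claim about how the proxy library itself parses the variable. `proxy_env_get` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: finds `key` in a ';'-separated list of key=value
 * entries and copies its value into `out`. Returns false if the key is
 * absent or the value does not fit into `out`. */
static bool proxy_env_get(const char *env, const char *key,
                          char *out, size_t out_len) {
    assert(env != NULL && key != NULL && out != NULL);
    size_t key_len = strlen(key);
    const char *entry = env;

    while (*entry != '\0') {
        const char *end = strchr(entry, ';');
        size_t entry_len = end ? (size_t)(end - entry) : strlen(entry);

        if (entry_len > key_len && strncmp(entry, key, key_len) == 0 &&
            entry[key_len] == '=') {
            size_t val_len = entry_len - key_len - 1;
            if (val_len >= out_len) {
                return false;
            }
            memcpy(out, entry + key_len + 1, val_len);
            out[val_len] = '\0';
            return true;
        }

        entry += entry_len;
        if (*entry == ';') {
            entry++; /* skip the separator */
        }
    }
    return false;
}
```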
Remark: changing the size of an allocation (using `realloc()`) does not change the allocator (`realloc(malloc(threshold - 1), threshold + 1)` still belongs to the default system allocator and `realloc(malloc(threshold + 1), threshold - 1)` still belongs to the proxy library pool allocator).
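The remark can be restated as a tiny model: the owning allocator is decided once, from the size passed to `malloc()`, and a later `realloc()` keeps it. The types and functions below (`allocator_t`, `model_alloc_t`, `model_malloc`, `model_realloc`) are purely illustrative and allocate no memory; they only track which allocator would own a block.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { ALLOC_SYSTEM, ALLOC_PROXY } allocator_t;

typedef struct {
    allocator_t owner; /* fixed when the block is first allocated */
    size_t size;
} model_alloc_t;

/* Sizes below the threshold go to the system allocator. */
static model_alloc_t model_malloc(size_t size, size_t threshold) {
    model_alloc_t a = { size < threshold ? ALLOC_SYSTEM : ALLOC_PROXY, size };
    return a;
}

/* Resizing never moves a block to the other allocator. */
static model_alloc_t model_realloc(model_alloc_t a, size_t new_size) {
    a.size = new_size;
    return a;
}
```

With a threshold of 64, a block created with size 63 and grown to 65 stays with the system allocator, while a block created with size 65 and shrunk to 63 stays with the proxy pool.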
On Windows, it requires:
- explicitly linking your program dynamically with the `umf_proxy.dll` library
- (C++ code only) including `proxy_lib_new_delete.h` in a single(!) source file in your project to also override the `new`/`delete` operations.
All contributions to the UMF project are most welcome! Before submitting an issue or a Pull Request, please read the Contribution Guide.
To enable logging in UMF source files, please follow the guide in the web documentation.
Integration of UMF into another project via CMake's FetchContent is possible with:
include(FetchContent)
FetchContent_Declare(
    unified-memory-framework
    GIT_REPOSITORY https://github.com/oneapi-src/unified-memory-framework.git
    GIT_TAG main # This will pull the latest (potentially unstable) changes from the main branch
)
FetchContent_MakeAvailable(unified-memory-framework)
add_executable(some_example some_example.cpp)
target_include_directories(some_example PRIVATE ${unified-memory-framework_SOURCE_DIR}/include)
target_link_libraries(some_example PRIVATE umf::umf umf::headers)
The contents of this repository may have been developed with support from one or more Intel-operated generative artificial intelligence solutions.