Memory Allocation Guide
Linux provides a variety of APIs for memory allocation. You can allocate small chunks using kmalloc or kmem_cache_alloc families, large virtually contiguous areas using vmalloc and its derivatives, or you can directly request pages from the page allocator with alloc_pages. It is also possible to use more specialized allocators, for instance cma_alloc or zs_malloc.
Most of the memory allocation APIs use GFP flags to express how that memory should be allocated. The GFP acronym stands for “get free pages”, the underlying memory allocation function.
Diversity of the allocation APIs combined with the numerous GFP flags makes the question “How should I allocate memory?” not that easy to answer, although very likely you should use
kzalloc(<size>, GFP_KERNEL);
Of course there are cases when other allocation APIs and different GFP flags must be used.
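The common case above can be sketched as follows. To keep the sketch buildable outside a kernel tree, kzalloc(), kfree() and GFP_KERNEL are replaced here by userspace stand-ins (in kernel code they come from <linux/slab.h>). The names struct demo_state, demo_state_create() and demo_state_destroy() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Userspace stand-ins so the sketch builds outside the kernel.
 * In kernel code, kzalloc(), kfree() and GFP_KERNEL come from <linux/slab.h>.
 */
typedef unsigned int gfp_t;
#define GFP_KERNEL 0u

static void *kzalloc(size_t size, gfp_t flags)
{
	(void)flags;			/* the stand-in ignores GFP flags */
	return calloc(1, size);		/* zeroed memory, like kzalloc() */
}

static void kfree(const void *p)
{
	free((void *)p);
}

/* Hypothetical object, used only for illustration. */
struct demo_state {
	int id;
	char name[16];
};

static struct demo_state *demo_state_create(int id)
{
	/* sizeof(*st) keeps the size in sync with the pointer's type */
	struct demo_state *st = kzalloc(sizeof(*st), GFP_KERNEL);

	if (!st)
		return NULL;		/* allocation may fail; callers must check */
	st->id = id;
	return st;
}

static void demo_state_destroy(struct demo_state *st)
{
	kfree(st);
}
```

Because the memory is zeroed, fields that are not explicitly initialized (name in this sketch) start out as zero bytes.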
Get Free Page flags
The GFP flags control the allocator's behavior. They tell what memory zones can be used, how hard the allocator should try to find free memory, whether the memory can be accessed by the userspace, etc. Documentation/core-api/mm-api.rst provides reference documentation for the GFP flags and their combinations, and here we briefly outline their recommended usage:
Most of the time GFP_KERNEL is what you need. Memory for the kernel data structures, DMAable memory, inode cache, all these and many other allocation types can use GFP_KERNEL. Note that using GFP_KERNEL implies __GFP_RECLAIM, which means that direct reclaim may be triggered under memory pressure; the calling context must be allowed to sleep.
If the allocation is performed from an atomic context, e.g. an interrupt handler, use GFP_NOWAIT. This flag prevents direct reclaim and IO or filesystem operations. Consequently, under memory pressure a GFP_NOWAIT allocation is likely to fail. Users of this flag need to provide a suitable fallback to cope with such failures where appropriate.
If you think that accessing memory reserves is justified and the kernel will be stressed unless allocation succeeds, you may use GFP_ATOMIC.
Untrusted allocations triggered from userspace should be subject to kmem accounting and must have the __GFP_ACCOUNT bit set. There is the handy GFP_KERNEL_ACCOUNT shortcut for GFP_KERNEL allocations that should be accounted.
Userspace allocations should use one of the GFP_USER, GFP_HIGHUSER or GFP_HIGHUSER_MOVABLE flags. The longer the flag name, the less restrictive it is.
GFP_HIGHUSER_MOVABLE does not require that allocated memory will be directly accessible by the kernel and implies that the data is movable.
GFP_HIGHUSER means that the allocated memory is not movable, but it is not required to be directly accessible by the kernel. An example may be a hardware allocation that maps data directly into userspace but has no addressing limitations.
GFP_USER means that the allocated memory is not movable and it must be directly accessible by the kernel.
You may notice that quite a few allocations in the existing code specify GFP_NOIO or GFP_NOFS. Historically, they were used to prevent recursion deadlocks caused by direct memory reclaim calling back into the FS or IO paths and blocking on already held resources. Since 4.12 the preferred way to address this issue is to use the new scope APIs described in Documentation/core-api/gfp_mask-from-fs-io.rst.
Other legacy GFP flags are GFP_DMA and GFP_DMA32. They are used to ensure that the allocated memory is accessible by hardware with limited addressing capabilities. So unless you are writing a driver for a device with such restrictions, avoid using these flags. And even with hardware with restrictions it is preferable to use the dma_alloc* APIs.
GFP flags and reclaim behavior
Memory allocations may trigger direct or background reclaim and it is useful to understand how hard the page allocator will try to satisfy that or another request.
GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_ attempt to free memory at all. The most lightweight mode, which doesn’t even kick the background reclaim. Should be used carefully because it might deplete the memory and the next user might hit the more aggressive reclaim.
GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT) - optimistic allocation without any attempt to free memory from the current context but can wake kswapd to reclaim memory if the zone is below the low watermark. Can be used from either atomic contexts or when the request is a performance optimization and there is another fallback for a slow path.
(GFP_KERNEL | __GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) - non-sleeping allocation with an expensive fallback so it can access some portion of memory reserves. Usually used from interrupt/bottom-half context with an expensive slow path fallback.
GFP_KERNEL - both background and direct reclaim are allowed and the default page allocator behavior is used. That means that not costly allocation requests are basically no-fail, but there is no guarantee of that behavior, so failures have to be checked properly by callers (e.g. OOM killer victim is allowed to fail currently).
GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior and all allocation requests fail early rather than cause disruptive reclaim (one round of reclaim in this implementation). The OOM killer is not invoked.
GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator behavior and all allocation requests try really hard. The request will fail if the reclaim cannot make any progress. The OOM killer won’t be triggered.
GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior and all allocation requests will loop endlessly until they succeed. This might be really dangerous, especially for larger orders.
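The first few masks above differ only in which reclaim bits remain set. The toy model below illustrates that idea in userspace C. All TOY_ names and bit values are placeholders invented for this sketch and do not match the kernel's real definitions. The model assumes, as in the kernel, that __GFP_RECLAIM is the union of a direct-reclaim bit and a kswapd (background) reclaim bit.

```c
#include <assert.h>

/* Placeholder bit values, for illustration only (not the kernel's). */
#define TOY_DIRECT_RECLAIM	0x1u	/* models __GFP_DIRECT_RECLAIM */
#define TOY_KSWAPD_RECLAIM	0x2u	/* models __GFP_KSWAPD_RECLAIM */
#define TOY_RECLAIM	(TOY_DIRECT_RECLAIM | TOY_KSWAPD_RECLAIM)
#define TOY_OTHER	0x4u	/* stands in for the remaining GFP_KERNEL bits */
#define TOY_GFP_KERNEL	(TOY_RECLAIM | TOY_OTHER)

enum toy_mode {
	TOY_NO_RECLAIM,			/* no attempt to free memory at all */
	TOY_BACKGROUND_ONLY,		/* may wake kswapd, caller never reclaims */
	TOY_BACKGROUND_AND_DIRECT,	/* caller may reclaim, so it may sleep */
};

/* Classify a toy mask by the reclaim bits it still carries. */
static enum toy_mode toy_classify(unsigned int flags)
{
	if (flags & TOY_DIRECT_RECLAIM)
		return TOY_BACKGROUND_AND_DIRECT;
	if (flags & TOY_KSWAPD_RECLAIM)
		return TOY_BACKGROUND_ONLY;
	return TOY_NO_RECLAIM;
}
```

In this model, clearing TOY_RECLAIM from TOY_GFP_KERNEL yields TOY_NO_RECLAIM, clearing only TOY_DIRECT_RECLAIM yields TOY_BACKGROUND_ONLY, and the unmodified mask yields TOY_BACKGROUND_AND_DIRECT.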
Selecting memory allocator
The most straightforward way to allocate memory is to use a function from the kmalloc() family. And, to be on the safe side, it’s best to use routines that set memory to zero, like kzalloc(). If you need to allocate memory for an array, there are kmalloc_array() and kcalloc() helpers. The helpers struct_size(), array_size() and array3_size() can be used to safely calculate object sizes without overflowing.
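To see why an overflow-checked size calculation matters, consider this userspace sketch. It is not the kernel's implementation: sat_array_size() is a hypothetical name, and the sketch relies on the GCC/Clang builtin __builtin_mul_overflow(). Saturating to SIZE_MAX on overflow is the design assumed here, so that an allocator asked for the result fails cleanly instead of returning a buffer that is too small.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Overflow-checked "n elements of elem_size bytes" calculation.
 * Returns SIZE_MAX when the product does not fit in size_t.
 */
static size_t sat_array_size(size_t n, size_t elem_size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, elem_size, &bytes))
		return SIZE_MAX;	/* saturate instead of wrapping around */
	return bytes;
}
```

A plain n * elem_size would silently wrap around for large inputs (for example, (SIZE_MAX / 2 + 1) * 2 wraps to 0), which is exactly the kind of bug the checked helpers are meant to prevent.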
The maximal size of a chunk that can be allocated with kmalloc is limited. The actual limit depends on the hardware and the kernel configuration, but it is a good practice to use kmalloc for objects smaller than page size.
The address of a chunk allocated with kmalloc is aligned to at least ARCH_KMALLOC_MINALIGN bytes. For sizes which are a power of two, the alignment is also guaranteed to be at least the respective size. For other sizes, the alignment is guaranteed to be at least the largest power-of-two divisor of the size.
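The alignment rule can be expressed as a small calculation. The helper below is a hypothetical userspace illustration, not a kernel function. Its min_align parameter stands in for ARCH_KMALLOC_MINALIGN, whose real value depends on the architecture.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Guaranteed alignment for a kmalloc-style request of `size` bytes:
 * the larger of the architecture minimum and the largest power-of-two
 * divisor of size (which is size itself when size is a power of two).
 */
static size_t demo_guaranteed_align(size_t size, size_t min_align)
{
	/* Isolate the lowest set bit: the largest power of two dividing size. */
	size_t pow2_divisor = size & (~size + 1);

	return pow2_divisor > min_align ? pow2_divisor : min_align;
}
```

Assuming a minimum alignment of 8 bytes, a 64-byte request is 64-byte aligned, a 96-byte request is 32-byte aligned (96 = 32 * 3), and a 20-byte request only gets the 8-byte minimum because its largest power-of-two divisor is 4.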
Chunks allocated with kmalloc() can be resized with krealloc(). Similarly to kmalloc_array(), a helper for resizing arrays is provided in the form of krealloc_array().
For large allocations you can use vmalloc() and vzalloc(), or directly request pages from the page allocator. The memory allocated by vmalloc and related functions is not physically contiguous.
If you are not sure whether the allocation size is too large for kmalloc, it is possible to use kvmalloc() and its derivatives. It will try to allocate memory with kmalloc and if the allocation fails it will be retried with vmalloc. There are restrictions on which GFP flags can be used with kvmalloc; please see the kvmalloc_node() reference documentation. Note that kvmalloc may return memory that is not physically contiguous.
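The try-then-fall-back behavior described above can be modeled in userspace as follows. This is only a toy: every toy_ name is hypothetical, both stand-in allocators are backed by malloc(), and TOY_CONTIG_LIMIT is an arbitrary cap chosen for the sketch rather than a real kmalloc limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define TOY_CONTIG_LIMIT 4096	/* arbitrary cap for the contiguous stand-in */

/* Stand-in for kmalloc: refuses requests above the toy limit. */
static void *toy_contig_alloc(size_t size)
{
	return size <= TOY_CONTIG_LIMIT ? malloc(size) : NULL;
}

/* Stand-in for vmalloc: accepts any size malloc() can satisfy. */
static void *toy_virt_alloc(size_t size)
{
	return malloc(size);
}

/*
 * Try the contiguous allocator first and fall back to the virtually
 * contiguous one. *fell_back reports which path served the request.
 */
static void *toy_kvmalloc(size_t size, bool *fell_back)
{
	void *p = toy_contig_alloc(size);

	*fell_back = false;
	if (!p) {
		p = toy_virt_alloc(size);
		*fell_back = (p != NULL);
	}
	return p;
}
```

The caller does not need to know which path succeeded. In the kernel that is why a single kvfree() is provided for memory obtained from kvmalloc(); in this toy both paths are released with free().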
If you need to allocate many identical objects you can use the slab cache allocator. The cache should be set up with kmem_cache_create() or kmem_cache_create_usercopy() before it can be used. The second function should be used if a part of the cache might be copied to the userspace. After the cache is created, kmem_cache_alloc() and its convenience wrappers can allocate memory from that cache.
When the allocated memory is no longer needed it must be freed.
Objects allocated by kmalloc can be freed by kfree or kvfree. Objects allocated by kmem_cache_alloc can be freed with kmem_cache_free, kfree or kvfree, where the latter two might be more convenient thanks to not needing the kmem_cache pointer.
The same rules apply to _bulk and _rcu flavors of freeing functions.
Memory allocated by vmalloc can be freed with vfree or kvfree.
Memory allocated by kvmalloc can be freed with kvfree.
Caches created by kmem_cache_create should be freed with kmem_cache_destroy only after freeing all the allocated objects first.