GFP masks used from FS/IO context
- Date: May, 2018
- Author: Michal Hocko <mhocko@kernel.org>
Introduction
Code paths in the filesystem and IO stacks must be careful when allocating memory to prevent recursion deadlocks caused by direct memory reclaim calling back into the FS or IO paths and blocking on already held resources (e.g. locks, most commonly those used for the transaction context).
The traditional way to avoid this deadlock problem is to clear __GFP_FS or __GFP_IO, respectively, in the gfp mask when calling an allocator (note that clearing the latter implies clearing the former as well). GFP_NOFS and GFP_NOIO, respectively, can be used as shortcuts. It turned out, though, that this approach has led to abuse: the restricted gfp mask is often used “just in case”, without deeper consideration. This causes problems because excessive use of GFP_NOFS/GFP_NOIO can lead to memory over-reclaim or other memory reclaim issues.
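The relationship between the three masks can be sketched in a small userspace model. The `MODEL_*` names and bit values below are hypothetical stand-ins chosen for illustration; only the way the masks are composed (GFP_NOFS drops the FS bit, GFP_NOIO drops both the IO and FS bits) is meant to mirror the kernel.

```c
#include <assert.h>

/*
 * Illustrative bit values only: the real __GFP_* bits are kernel-internal
 * and differ between kernel versions.
 */
#define MODEL_GFP_IO      0x1u /* stands in for __GFP_IO */
#define MODEL_GFP_FS      0x2u /* stands in for __GFP_FS */
#define MODEL_GFP_RECLAIM 0x4u /* stands in for __GFP_RECLAIM */

/* Composition of the common masks, as described in the text above. */
#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_RECLAIM | MODEL_GFP_IO)
#define MODEL_GFP_NOIO   (MODEL_GFP_RECLAIM)

/* Nonzero if reclaim triggered with this mask may call into FS code. */
int model_may_enter_fs(unsigned int gfp)
{
	return (gfp & MODEL_GFP_FS) != 0;
}

/* Nonzero if reclaim triggered with this mask may start IO. */
int model_may_start_io(unsigned int gfp)
{
	return (gfp & MODEL_GFP_IO) != 0;
}

/* Sanity checks for the model: NOIO is strictly more restrictive than NOFS. */
void model_check_masks(void)
{
	assert(model_may_enter_fs(MODEL_GFP_KERNEL));
	assert(!model_may_enter_fs(MODEL_GFP_NOFS));
	assert(model_may_start_io(MODEL_GFP_NOFS));
	assert(!model_may_enter_fs(MODEL_GFP_NOIO));
	assert(!model_may_start_io(MODEL_GFP_NOIO));
}
```

Because the restriction travels with the mask, every call site has to pass the restricted mask explicitly, which is what makes "just in case" use tempting.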
New API
Since 4.12 there is a generic scope API for both NOFS and NOIO contexts: memalloc_nofs_save and memalloc_nofs_restore, and memalloc_noio_save and memalloc_noio_restore, respectively. These functions mark a scope as a critical section from a filesystem or I/O point of view. Any allocation from that scope will inherently drop __GFP_FS or __GFP_IO, respectively, from the given mask, so no memory allocation can recurse back into the FS/IO.
- unsigned int memalloc_nofs_save(void)
Marks implicit GFP_NOFS allocation scope.
Parameters
void: no arguments
Description
This function marks the beginning of the GFP_NOFS allocation scope. All further allocations will implicitly drop the __GFP_FS flag, so they are safe for the FS critical section from the allocation recursion point of view. Use memalloc_nofs_restore to end the scope with the flags returned by this function.
Context
This function is safe to be used from any context.
Return
The saved flags to be passed to memalloc_nofs_restore.
- void memalloc_nofs_restore(unsigned int flags)
Ends the implicit GFP_NOFS scope.
Parameters
unsigned int flags: Flags to restore.
Description
Ends the implicit GFP_NOFS scope started by the memalloc_nofs_save function. Always make sure that the given flags value is the return value from the pairing memalloc_nofs_save call.
- unsigned int memalloc_noio_save(void)
Marks implicit GFP_NOIO allocation scope.
Parameters
void: no arguments
Description
This function marks the beginning of the GFP_NOIO allocation scope. All further allocations will implicitly drop the __GFP_IO flag, so they are safe for the IO critical section from the allocation recursion point of view. Use memalloc_noio_restore to end the scope with the flags returned by this function.
Context
This function is safe to be used from any context.
Return
The saved flags to be passed to memalloc_noio_restore.
- void memalloc_noio_restore(unsigned int flags)
Ends the implicit GFP_NOIO scope.
Parameters
unsigned int flags: Flags to restore.
Description
Ends the implicit GFP_NOIO scope started by the memalloc_noio_save function. Always make sure that the given flags value is the return value from the pairing memalloc_noio_save call.
FS/IO code then simply calls the appropriate save function before any critical section with respect to reclaim is started, e.g. before taking a lock shared with the reclaim context, or when transaction context nesting would be possible via reclaim. The restore function should be called when the critical section ends. Ideally, all of this comes with an explanation of what the reclaim context is, for easier maintenance.
Please note that proper pairing of the save/restore functions allows nesting, so it is safe to call the save and restore functions (e.g. memalloc_noio_save and memalloc_noio_restore) from within an existing NOIO or NOFS scope.
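The save/restore pairing and its nesting behaviour can be illustrated with a userspace model. This is only a sketch: `model_task_flags`, the `MODEL_*` constants and the `model_*` functions are hypothetical stand-ins for per-task state and for the memalloc_* functions, and the bit values are arbitrary. The point is that save returns the previous scope state and restore puts exactly that state back, so an inner scope never tears down an outer one.

```c
#include <assert.h>

/* Illustrative gfp bits; the real values are kernel-internal. */
#define MODEL_GFP_IO      0x1u
#define MODEL_GFP_FS      0x2u
#define MODEL_GFP_RECLAIM 0x4u
#define MODEL_GFP_KERNEL  (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)

/* Illustrative per-task scope bits. */
#define MODEL_PF_NOIO 0x1u
#define MODEL_PF_NOFS 0x2u

/* Stands in for the per-task flags of the current task. */
static unsigned int model_task_flags;

/* Enter an NOFS scope; returns the previous NOFS state for restore. */
unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave an NOFS scope by putting back the state returned by save. */
void model_nofs_restore(unsigned int flags)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | flags;
}

/* Enter an NOIO scope; returns the previous NOIO state for restore. */
unsigned int model_noio_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOIO;

	model_task_flags |= MODEL_PF_NOIO;
	return old;
}

/* Leave an NOIO scope by putting back the state returned by save. */
void model_noio_restore(unsigned int flags)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOIO) | flags;
}

/* Mask an allocator would effectively use inside the current scope. */
unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOIO)
		gfp &= ~(MODEL_GFP_IO | MODEL_GFP_FS);
	else if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/*
 * Opens an NOFS scope nested inside an NOIO scope, closes both in reverse
 * order, and returns the effective mask once no scope is active.
 */
unsigned int model_nested_demo(void)
{
	unsigned int noio = model_noio_save();
	unsigned int nofs = model_nofs_save(); /* nested scope */

	/* Inside both scopes a GFP_KERNEL request loses the IO and FS bits. */
	assert(model_effective_gfp(MODEL_GFP_KERNEL) == MODEL_GFP_RECLAIM);

	model_nofs_restore(nofs);
	/* The outer NOIO scope is still active after the inner restore. */
	assert(model_effective_gfp(MODEL_GFP_KERNEL) == MODEL_GFP_RECLAIM);

	model_noio_restore(noio);
	return model_effective_gfp(MODEL_GFP_KERNEL);
}
```

In this model, callers inside the scope keep passing the unrestricted mask; the restriction comes from the scope rather than from each call site.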
What about __vmalloc(GFP_NOFS)
Since v5.17, and specifically after commit 451769ebb7e79 (“mm/vmalloc: alloc GFP_NO{FS,IO} for vmalloc”), GFP_NOFS/GFP_NOIO are supported in [k]vmalloc by implicitly using the scope API.
In earlier kernels vmalloc didn’t support GFP_NOFS semantics because there were hardcoded GFP_KERNEL allocations deep inside the allocator. That means that calling vmalloc with GFP_NOFS/GFP_NOIO was almost always a bug.
In an ideal world, upper layers should already mark dangerous contexts, so no special care is required and vmalloc can be called without any problems. Sometimes, if the context is not really clear or there are layering violations, the recommended workaround (on pre-v5.17 kernels) is to wrap vmalloc in the scope API with a comment explaining the problem.
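Why passing GFP_NOFS to an old vmalloc did not help, while wrapping the call in a scope does, can be shown with a small userspace model. Everything here is hypothetical: `model_old_vmalloc_internal_gfp` merely imitates an allocator with a hardcoded GFP_KERNEL allocation inside, and the `MODEL_*` values are arbitrary. It is not the kernel's vmalloc implementation.

```c
#include <assert.h>

/* Illustrative gfp bits; the real values are kernel-internal. */
#define MODEL_GFP_IO      0x1u
#define MODEL_GFP_FS      0x2u
#define MODEL_GFP_RECLAIM 0x4u
#define MODEL_GFP_KERNEL  (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS    (MODEL_GFP_RECLAIM | MODEL_GFP_IO)

/* Illustrative per-task scope bit. */
#define MODEL_PF_NOFS 0x1u

/* Stands in for the per-task flags of the current task. */
static unsigned int model_task_flags;

/* Enter an NOFS scope; returns the previous NOFS state for restore. */
unsigned int model_nofs_save(void)
{
	unsigned int old = model_task_flags & MODEL_PF_NOFS;

	model_task_flags |= MODEL_PF_NOFS;
	return old;
}

/* Leave an NOFS scope by putting back the state returned by save. */
void model_nofs_restore(unsigned int flags)
{
	model_task_flags = (model_task_flags & ~MODEL_PF_NOFS) | flags;
}

/* Mask an allocation effectively uses inside the current scope. */
static unsigned int model_effective_gfp(unsigned int gfp)
{
	if (model_task_flags & MODEL_PF_NOFS)
		gfp &= ~MODEL_GFP_FS;
	return gfp;
}

/*
 * Imitates a pre-v5.17 style allocator: the caller's mask never reaches an
 * internal allocation that is hardcoded to GFP_KERNEL. Returns the mask
 * that internal allocation ends up using.
 */
unsigned int model_old_vmalloc_internal_gfp(unsigned int caller_gfp)
{
	(void)caller_gfp; /* ignored by the hardcoded internal allocation */
	return model_effective_gfp(MODEL_GFP_KERNEL);
}

/* The workaround from the text: wrap the call in an NOFS scope. */
unsigned int model_wrapped_vmalloc_internal_gfp(void)
{
	unsigned int flags, used;

	/*
	 * In real code, a comment here should explain which reclaim
	 * recursion is being avoided and why the caller cannot mark it.
	 */
	flags = model_nofs_save();
	used = model_old_vmalloc_internal_gfp(MODEL_GFP_KERNEL);
	model_nofs_restore(flags);
	return used;
}
```

In the model, the unwrapped call still carries the FS bit in its internal allocation even when the caller passes the NOFS mask, whereas the wrapped call does not.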