Memory hotplug
Memory hotplug event notifier
Hotplugging events are sent to a notification queue.
There are six types of notification defined in include/linux/memory.h:
- MEM_GOING_ONLINE
- Generated before new memory becomes available, so that subsystems can prepare to handle it. The page allocator is still unable to allocate from the new memory.
- MEM_CANCEL_ONLINE
- Generated if MEM_GOING_ONLINE fails.
- MEM_ONLINE
- Generated when memory has been successfully brought online. The callback may allocate pages from the new memory.
- MEM_GOING_OFFLINE
- Generated to begin the process of offlining memory. Allocations from the memory are no longer possible, but some of the memory to be offlined is still in use. The callback can be used to free memory known to a subsystem from the indicated memory block.
- MEM_CANCEL_OFFLINE
- Generated if MEM_GOING_OFFLINE fails. Memory is available again from the memory block that we attempted to offline.
- MEM_OFFLINE
- Generated after offlining memory is complete.
A callback routine can be registered by calling:
hotplug_memory_notifier(callback_func, priority)
Callback functions with higher values of priority are called before callback functions with lower values.
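The ordering rule can be illustrated with a small userspace model. This is a sketch only, not the kernel's implementation: struct notifier_block is a local stand-in, and model_chain_register() is a hypothetical helper that keeps a singly linked list sorted by descending priority. Placing a block after existing blocks of equal priority is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for the kernel's struct notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *self,
			     unsigned long action, void *arg);
	struct notifier_block *next;
	int priority;
};

/*
 * Hypothetical helper: insert nb so that the list stays sorted by
 * descending priority. A new block goes after existing blocks that
 * have the same priority (an assumption of this sketch).
 */
void model_chain_register(struct notifier_block **head,
			  struct notifier_block *nb)
{
	assert(head != NULL && nb != NULL);

	/* Skip every block that must be called before nb. */
	while (*head != NULL && (*head)->priority >= nb->priority)
		head = &(*head)->next;

	nb->next = *head;
	*head = nb;
}
```

Registering blocks with priorities 0, 10 and 5 in that order leaves the list as 10, 5, 0, so walking it from the head calls the highest priority callback first.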
A callback function must have the following prototype:
int callback_func(struct notifier_block *self, unsigned long action, void *arg);
The first argument of the callback function (self) is a pointer to the block of the notifier chain that points to the callback function itself. The second argument (action) is one of the event types described above. The third argument (arg) is a pointer to a struct memory_notify:

    struct memory_notify {
            unsigned long start_pfn;
            unsigned long nr_pages;
            int status_change_nid_normal;
            int status_change_nid_high;
            int status_change_nid;
    };

start_pfn is the starting page frame number of the memory being onlined or offlined.
nr_pages is the number of pages of the memory being onlined or offlined.
status_change_nid_normal is set to the node id when the N_NORMAL_MEMORY bit of the nodemask is (or will be) set or cleared. If it is -1, the nodemask status does not change.
status_change_nid_high is set to the node id when the N_HIGH_MEMORY bit of the nodemask is (or will be) set or cleared. If it is -1, the nodemask status does not change.
status_change_nid is set to the node id when the N_MEMORY bit of the nodemask is (or will be) set or cleared. This happens when a previously memoryless node gains memory by onlining, or when a node loses all of its memory. If it is -1, the nodemask status does not change.
If status_change_nid* >= 0, the callback should create or discard structures for the node if necessary.
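A callback that reacts to status_change_nid might look roughly like the sketch below. It is written to build outside the kernel tree, so the MEM_* and NOTIFY_* constants and struct memory_notify are local stand-ins (the MEM_* values are placeholders, the real definitions live in include/linux/memory.h). MAX_NODES, example_node_ready and example_mem_callback() are hypothetical names for an imagined subsystem that keeps one flag per node.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins; in-kernel code would use <linux/memory.h> and
 * <linux/notifier.h>. MEM_* values are placeholders for this sketch. */
#define MEM_ONLINE		(1 << 0)
#define MEM_GOING_OFFLINE	(1 << 1)
#define MEM_OFFLINE		(1 << 2)
#define MEM_GOING_ONLINE	(1 << 3)
#define MEM_CANCEL_ONLINE	(1 << 4)
#define MEM_CANCEL_OFFLINE	(1 << 5)

#define NOTIFY_DONE		0x0000
#define NOTIFY_OK		0x0001
#define NOTIFY_STOP_MASK	0x8000
#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)

struct notifier_block;	/* only used through a pointer here */

struct memory_notify {
	unsigned long start_pfn;
	unsigned long nr_pages;
	int status_change_nid_normal;
	int status_change_nid_high;
	int status_change_nid;
};

#define MAX_NODES 4			/* hypothetical limit */
int example_node_ready[MAX_NODES];	/* hypothetical per-node state */

int example_mem_callback(struct notifier_block *self,
			 unsigned long action, void *arg)
{
	struct memory_notify *mn = arg;
	int nid;

	(void)self;
	assert(mn != NULL);
	nid = mn->status_change_nid;

	switch (action) {
	case MEM_GOING_ONLINE:
		/* A memoryless node is about to gain memory. */
		if (nid >= 0) {
			if (nid >= MAX_NODES)
				return NOTIFY_BAD;	/* cannot prepare: veto */
			example_node_ready[nid] = 1;
		}
		return NOTIFY_OK;
	case MEM_CANCEL_ONLINE:
	case MEM_OFFLINE:
		/* Onlining was rolled back, or the node lost all memory. */
		if (nid >= 0 && nid < MAX_NODES)
			example_node_ready[nid] = 0;
		return NOTIFY_OK;
	default:
		return NOTIFY_DONE;	/* not interested in this event */
	}
}
```

The sketch creates state in MEM_GOING_ONLINE rather than MEM_ONLINE so that a failure can still cancel the operation, and it tears the state down again on MEM_CANCEL_ONLINE.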
The callback routine shall return one of the values NOTIFY_DONE, NOTIFY_OK, NOTIFY_BAD, or NOTIFY_STOP defined in include/linux/notifier.h.
NOTIFY_DONE and NOTIFY_OK have no effect on the further processing.
NOTIFY_BAD is used in response to the MEM_GOING_ONLINE, MEM_GOING_OFFLINE, MEM_ONLINE, or MEM_OFFLINE action to cancel hotplugging. It stops further processing of the notification queue.
NOTIFY_STOP stops further processing of the notification queue.
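The effect of the return values on queue processing can be modelled in userspace as follows. This is a simplified sketch, not the kernel's chain walker: struct notifier_block is a local stand-in, and model_call_chain(), cb_ok() and cb_bad() are hypothetical names. The NOTIFY_* values mirror include/linux/notifier.h, where NOTIFY_BAD and NOTIFY_STOP both carry the NOTIFY_STOP_MASK bit.

```c
#include <assert.h>
#include <stddef.h>

#define NOTIFY_DONE		0x0000
#define NOTIFY_OK		0x0001
#define NOTIFY_STOP_MASK	0x8000
#define NOTIFY_BAD		(NOTIFY_STOP_MASK | 0x0002)
#define NOTIFY_STOP		(NOTIFY_OK | NOTIFY_STOP_MASK)

/* Userspace stand-in for the kernel's struct notifier_block. */
struct notifier_block {
	int (*notifier_call)(struct notifier_block *self,
			     unsigned long action, void *arg);
	struct notifier_block *next;
	int priority;
};

/*
 * Hypothetical walker: call each block in list order and stop as soon
 * as a callback returns a value with NOTIFY_STOP_MASK set. Returns the
 * value of the last callback that ran, or NOTIFY_DONE for an empty list.
 */
int model_call_chain(struct notifier_block *head,
		     unsigned long action, void *arg)
{
	int ret = NOTIFY_DONE;

	for (struct notifier_block *nb = head; nb != NULL; nb = nb->next) {
		assert(nb->notifier_call != NULL);
		ret = nb->notifier_call(nb, action, arg);
		if (ret & NOTIFY_STOP_MASK)
			break;	/* NOTIFY_BAD or NOTIFY_STOP */
	}
	return ret;
}

/* Hypothetical callbacks; each counts its invocation through arg. */
int cb_ok(struct notifier_block *self, unsigned long action, void *arg)
{
	(void)self;
	(void)action;
	++*(int *)arg;
	return NOTIFY_OK;
}

int cb_bad(struct notifier_block *self, unsigned long action, void *arg)
{
	(void)self;
	(void)action;
	++*(int *)arg;
	return NOTIFY_BAD;
}
```

With a chain of cb_ok, cb_bad, cb_ok, only the first two callbacks run and the walker returns NOTIFY_BAD, which the caller would treat as a request to cancel the operation.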
Locking Internals
When adding/removing memory that uses memory block devices (i.e. ordinary RAM), the device_hotplug_lock should be held to:
- synchronize against online/offline requests (e.g. via sysfs). This way, memory block devices can only be accessed (.online/.state attributes) by user space once memory has been fully added. And when removing memory, we know nobody is in critical sections.
- synchronize against CPU hotplug and similar (e.g. relevant for ACPI and PPC)
In particular, holding device_hotplug_lock avoids a possible lock inversion when memory is added and user space tries to online that memory faster than expected:
- device_online() will first take the device_lock(), followed by mem_hotplug_lock
- add_memory_resource() will first take the mem_hotplug_lock, followed by the device_lock() (while creating the devices, during bus_add_device()).
As the device is visible to user space before taking the device_lock(), this can result in a lock inversion.
Onlining/offlining of memory should be done via device_online()/device_offline() to make sure it is properly synchronized with actions via sysfs. Holding device_hotplug_lock is advised (e.g. to protect online_type).
When adding/removing/onlining/offlining memory or adding/removing heterogeneous/device memory, we should always hold the mem_hotplug_lock in write mode to serialise memory hotplug (e.g. access to global/zone variables).
In addition, mem_hotplug_lock (in contrast to device_hotplug_lock) in read mode allows for a quite efficient get_online_mems/put_online_mems implementation, so code accessing memory can protect itself against that memory vanishing.