Subsystem Trace Points: kmem

The kmem tracing system captures events related to object and page allocation within the kernel. Broadly speaking, there are five major subheadings:

  • Slab allocation of small objects of unknown type (kmalloc)
  • Slab allocation of small objects of known type
  • Page allocation
  • Per-CPU Allocator Activity
  • External Fragmentation

This document describes what each of the tracepoints is and why they might be useful.

1. Slab allocation of small objects of unknown type

kmalloc               call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
kmalloc_node          call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
kfree                 call_site=%lx ptr=%p

Heavy activity for these events may indicate that a specific cache is justified, particularly if kmalloc slab pages are suffering significant internal fragmentation as a result of the allocation pattern. By correlating kmalloc with kfree, it may be possible to identify memory leaks and where the allocation sites were.
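The kmalloc/kfree correlation described above can be sketched as a small script over captured trace text. This is only an illustration under stated assumptions: each line is assumed to contain the event name followed by a colon and the key=value fields in the formats listed above, and the function names parse_event and outstanding_allocations are hypothetical, not part of any kernel tool. An allocation that is still outstanding at the end of a capture is not necessarily a leak; it may simply outlive the trace window.

```python
import re

# Assumption: a trace line contains "<event>: key=value key=value ...".
EVENT_RE = re.compile(r"\b(kmalloc_node|kmalloc|kfree):\s+(.*)$")
FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_event(line):
    """Return (event_name, fields) for a kmalloc/kfree line, else None."""
    match = EVENT_RE.search(line)
    if match is None:
        return None
    return match.group(1), dict(FIELD_RE.findall(match.group(2)))


def outstanding_allocations(lines):
    """Map ptr -> call_site for allocations that saw no matching kfree."""
    live = {}
    for line in lines:
        parsed = parse_event(line)
        if parsed is None:
            continue
        name, fields = parsed
        ptr = fields.get("ptr")
        if ptr is None:
            continue
        if name == "kfree":
            # Forget the pointer; frees of untracked pointers are ignored.
            live.pop(ptr, None)
        else:
            # Remember which call site produced this pointer.
            live[ptr] = fields.get("call_site")
    return live
```

Grouping the returned call_site values by frequency gives a first list of allocation sites worth inspecting.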

2. Slab allocation of small objects of known type

kmem_cache_alloc      call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s
kmem_cache_alloc_node call_site=%lx ptr=%p bytes_req=%zu bytes_alloc=%zu gfp_flags=%s node=%d
kmem_cache_free       call_site=%lx ptr=%p

These events are similar in usage to the kmalloc-related events except that it is likely easier to pin the event down to a specific cache. At the time of writing, no information is available on what slab is being allocated from, but the call_site can usually be used to extrapolate that information.
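Since the events do not name the slab, one way to approximate per-cache behaviour is to aggregate by call_site. The sketch below sums bytes_req and bytes_alloc per call site, so the difference gives a rough measure of internal fragmentation. It assumes the field order shown in the formats above; the name waste_by_call_site is hypothetical.

```python
import re
from collections import defaultdict

# Assumption: fields appear in the order given in the event formats.
ALLOC_RE = re.compile(
    r"\b(?:kmem_cache_alloc_node|kmem_cache_alloc|kmalloc_node|kmalloc):\s+"
    r"call_site=(\S+) ptr=\S+ bytes_req=(\d+) bytes_alloc=(\d+)"
)


def waste_by_call_site(lines):
    """Return {call_site: (count, bytes_requested, bytes_allocated)}."""
    totals = defaultdict(lambda: [0, 0, 0])
    for line in lines:
        match = ALLOC_RE.search(line)
        if match is None:
            continue
        entry = totals[match.group(1)]
        entry[0] += 1
        entry[1] += int(match.group(2))
        entry[2] += int(match.group(3))
    return {site: tuple(entry) for site, entry in totals.items()}
```

A call site whose allocated total is much larger than its requested total is a candidate for a better-sized cache.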

3. Page allocation

mm_page_alloc             page=%p pfn=%lu order=%d migratetype=%d gfp_flags=%s
mm_page_alloc_zone_locked page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
mm_page_free              page=%p pfn=%lu order=%d
mm_page_free_batched      page=%p pfn=%lu order=%d cold=%d

These four events deal with page allocation and freeing. mm_page_alloc is a simple indicator of page allocator activity. Pages may be allocated from the per-CPU allocator (high performance) or the buddy allocator.

If pages are allocated directly from the buddy allocator, the mm_page_alloc_zone_locked event is triggered. This event is important as high amounts of activity imply high activity on the zone->lock. Taking this lock impairs performance by disabling interrupts, dirtying cache lines between CPUs and serialising many CPUs.

When a page is freed directly by the caller, only the mm_page_free event is triggered. Significant amounts of activity here could indicate that the callers should be batching their activities.

When pages are freed in batch, the mm_page_free_batched event is also triggered. Broadly speaking, pages are taken off the LRU list in bulk and freed in batch with a page list. Significant amounts of activity here could indicate that the system is under memory pressure and can also indicate contention on the zone->lru_lock.
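A quick first look at these four events is simply to count them in a captured trace. The sketch below does that and also reports the number of mm_page_alloc_zone_locked events per mm_page_alloc event. That ratio is only a rough indicator, not a fraction of allocations, because the locked event fires once per page. The line format and the names count_page_events and zone_locked_per_alloc are assumptions made for illustration.

```python
import re
from collections import Counter

# The trailing colon keeps mm_page_alloc from matching longer event names
# such as mm_page_alloc_extfrag.
PAGE_EVENT_RE = re.compile(
    r"\b(mm_page_alloc_zone_locked|mm_page_alloc|"
    r"mm_page_free_batched|mm_page_free):"
)


def count_page_events(lines):
    """Count occurrences of the four page allocation/free events."""
    counts = Counter()
    for line in lines:
        match = PAGE_EVENT_RE.search(line)
        if match is not None:
            counts[match.group(1)] += 1
    return counts


def zone_locked_per_alloc(counts):
    """Rough indicator of zone->lock activity relative to allocations."""
    allocs = counts["mm_page_alloc"]
    if allocs == 0:
        return 0.0
    return counts["mm_page_alloc_zone_locked"] / allocs
```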

4. Per-CPU Allocator Activity

mm_page_alloc_zone_locked     page=%p pfn=%lu order=%u migratetype=%d cpu=%d percpu_refill=%d
mm_page_pcpu_drain            page=%p pfn=%lu order=%d cpu=%d migratetype=%d

In front of the page allocator is a per-CPU page allocator. It exists only for order-0 pages, reduces contention on the zone->lock and reduces the amount of writing on struct page.

When a per-CPU list is empty or pages of the wrong type are allocated, the zone->lock will be taken once and the per-CPU list refilled. The mm_page_alloc_zone_locked event is triggered for each page allocated, with the event indicating whether it is for a percpu_refill or not.

When the per-CPU list is too full, a number of pages are freed, each one of which triggers a mm_page_pcpu_drain event.

The individual nature of the events is so that pages can be tracked between allocation and freeing. A number of drain or refill pages that occur consecutively imply the zone->lock being taken once. Large amounts of per-CPU refills and drains could imply an imbalance between CPUs where too much work is being concentrated in one place. It could also indicate that the per-CPU lists should be a larger size. Finally, large amounts of refills on one CPU and drains on another could be a factor in causing large amounts of cache line bounces due to writes between CPUs. In that case it is worth investigating whether pages can be allocated and freed on the same CPU through some algorithm change.
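Because consecutive refill or drain events imply a single acquisition of the zone->lock, counting runs of adjacent events gives a rough estimate of how often the lock was taken on each CPU. The sketch below groups adjacent trace lines of the same kind and CPU. It is an approximation that assumes related events appear next to each other in the captured text, and the names classify and lock_acquisition_estimate are hypothetical.

```python
import re
from collections import Counter
from itertools import groupby

# Only zone_locked events flagged as a per-CPU refill count as refills.
REFILL_RE = re.compile(
    r"\bmm_page_alloc_zone_locked:.*\bcpu=(\d+) percpu_refill=1\b"
)
DRAIN_RE = re.compile(r"\bmm_page_pcpu_drain:.*\bcpu=(\d+)\b")


def classify(line):
    """Return ("refill" | "drain", cpu) for a matching line, else None."""
    match = REFILL_RE.search(line)
    if match is not None:
        return ("refill", int(match.group(1)))
    match = DRAIN_RE.search(line)
    if match is not None:
        return ("drain", int(match.group(1)))
    return None


def lock_acquisition_estimate(lines):
    """Count runs of adjacent refill or drain events per (kind, cpu)."""
    runs = Counter()
    for key, _group in groupby(lines, key=classify):
        if key is not None:
            runs[key] += 1
    return runs
```

Comparing the refill and drain counts across CPUs in the result is one way to spot the imbalance described above, with many refills on one CPU and many drains on another.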

5. External Fragmentation

mm_page_alloc_extfrag         page=%p pfn=%lu alloc_order=%d fallback_order=%d pageblock_order=%d alloc_migratetype=%d fallback_migratetype=%d fragmenting=%d change_ownership=%d

External fragmentation affects whether a high-order allocation will be successful or not. For some types of hardware, this is important although it is avoided where possible. If the system is using huge pages and needs to be able to resize the pool over the lifetime of the system, this is important.

Large numbers of this event imply that memory is fragmenting and high-order allocations will start failing at some time in the future. One means of reducing the occurrence of this event is to increase the size of min_free_kbytes in increments of 3*pageblock_size*nr_online_nodes, where pageblock_size is usually the default hugepage size.
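As a worked example of the increment formula, the sketch below computes one step. It assumes pageblock_size is expressed in kilobytes so that the result can be added directly to min_free_kbytes. The 2048 kB value used in the usage note is an assumed hugepage size, not something this document specifies, and the function name min_free_kbytes_step is hypothetical.

```python
def min_free_kbytes_step(pageblock_kbytes, nr_online_nodes):
    """Return one increment of 3 * pageblock_size * nr_online_nodes.

    Assumption: pageblock_kbytes is the pageblock size in kilobytes.
    """
    if pageblock_kbytes <= 0 or nr_online_nodes <= 0:
        raise ValueError("both arguments must be positive")
    return 3 * pageblock_kbytes * nr_online_nodes
```

For example, with an assumed 2048 kB pageblock and 2 online nodes, min_free_kbytes_step(2048, 2) returns 12288, so min_free_kbytes would be raised in steps of 12288 kB.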