Split page table lock

Originally, the mm->page_table_lock spinlock protected all page tables of the mm_struct. But this approach leads to poor page fault scalability of multi-threaded applications due to high contention on the lock. To improve scalability, split page table lock was introduced.

With split page table lock we have a separate per-table lock to serialize access to the table. At the moment we use split lock for PTE and PMD tables. Access to higher-level tables is protected by mm->page_table_lock.

There are helpers to lock/unlock a table and other accessor functions:

  • pte_offset_map_lock()
    maps PTE and takes PTE table lock, returns pointer to the taken lock;
  • pte_unmap_unlock()
    unlocks and unmaps PTE table;
  • pte_alloc_map_lock()
    allocates PTE table if needed and takes the lock, returns pointer to the taken lock or NULL if allocation failed;
  • pte_lockptr()
    returns pointer to PTE table lock;
  • pmd_lock()
    takes PMD table lock, returns pointer to taken lock;
  • pmd_lockptr()
    returns pointer to PMD table lock;
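
The lock/unlock pairing of these helpers can be sketched in user space. The block below is only a model, not kernel code: struct model_pte_table, the model_* functions and the table size are hypothetical names invented for illustration, and a C11 atomic_flag stands in for the spinlock. The model also assumes that the taken lock is reported through an out parameter (ptlp) while the entry pointer is the return value.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MODEL_PTRS_PER_PTE 512  /* assumed table size, for illustration only */

/* Hypothetical stand-in for a PTE table together with its split lock. */
struct model_pte_table {
	atomic_flag ptl;                           /* per-table spinlock */
	unsigned long entries[MODEL_PTRS_PER_PTE]; /* the table itself */
};

static void model_table_init(struct model_pte_table *table)
{
	memset(table->entries, 0, sizeof(table->entries));
	atomic_flag_clear(&table->ptl);
}

/*
 * Models pte_offset_map_lock(): take the table lock, return the entry,
 * and report the taken lock through ptlp.
 */
static unsigned long *model_pte_offset_map_lock(struct model_pte_table *table,
						size_t index, atomic_flag **ptlp)
{
	if (index >= MODEL_PTRS_PER_PTE)
		return NULL;
	while (atomic_flag_test_and_set_explicit(&table->ptl,
						 memory_order_acquire))
		; /* spin until the lock is free */
	*ptlp = &table->ptl;
	return &table->entries[index];
}

/* Models pte_unmap_unlock(): drop the lock taken above. */
static void model_pte_unmap_unlock(unsigned long *pte, atomic_flag *ptl)
{
	(void)pte; /* nothing to unmap in this model */
	atomic_flag_clear_explicit(ptl, memory_order_release);
}

/* Update one entry while holding only that table's lock. */
static int model_set_entry(struct model_pte_table *table, size_t index,
			   unsigned long value)
{
	atomic_flag *ptl;
	unsigned long *pte = model_pte_offset_map_lock(table, index, &ptl);

	if (!pte)
		return -1;
	*pte = value;
	model_pte_unmap_unlock(pte, ptl);
	return 0;
}
```

Two threads working on different tables never touch the same lock in this model, which is the scalability gain the split lock is after.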

Split page table lock for PTE tables is enabled at compile time if CONFIG_SPLIT_PTLOCK_CPUS (usually 4) is less than or equal to NR_CPUS. If split lock is disabled, all tables are guarded by mm->page_table_lock.

Split page table lock for PMD tables is enabled if it’s enabled for PTE tables and the architecture supports it (see below).
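
The two enabling conditions can be written down as plain predicates. This is a sketch only: the kernel evaluates these conditions at compile time, whereas the hypothetical functions below take the values as run-time arguments so that different configurations can be tried.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * PTE split lock: enabled when the configured threshold
 * (CONFIG_SPLIT_PTLOCK_CPUS) is less than or equal to NR_CPUS.
 */
static bool model_use_split_pte_ptlocks(int nr_cpus, int split_ptlock_cpus)
{
	return split_ptlock_cpus <= nr_cpus;
}

/*
 * PMD split lock: requires the PTE split lock plus architecture support
 * (CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK, passed here as a bool).
 */
static bool model_use_split_pmd_ptlocks(int nr_cpus, int split_ptlock_cpus,
					bool arch_enables_split_pmd)
{
	return model_use_split_pte_ptlocks(nr_cpus, split_ptlock_cpus) &&
	       arch_enables_split_pmd;
}
```

For example, with a threshold of 4 a build for NR_CPUS=2 falls back to mm->page_table_lock for every table, while a build for NR_CPUS=8 uses the PTE split lock and, if the architecture opts in, the PMD split lock as well.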

Hugetlb and split page table lock

Hugetlb can support several page sizes. We use split lock only for PMD level, but not for PUD.

Hugetlb-specific helpers:

  • huge_pte_lock()
    takes PMD split lock for PMD_SIZE page, mm->page_table_lock otherwise;
  • huge_pte_lockptr()
    returns pointer to table lock;
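
The lock selection described for the hugetlb helpers can be modelled as follows. This is a user-space sketch under stated assumptions: the types, the model_* names and the 2 MiB value for PMD_SIZE are illustrative (the real value is architecture dependent), and a plain int stands in for the spinlock.

```c
#include <assert.h>

#define MODEL_PMD_SIZE (1UL << 21) /* assumed 2 MiB; depends on the architecture */

typedef int model_lock_t; /* placeholder for spinlock_t */

struct model_mm {
	model_lock_t page_table_lock; /* lock for the whole mm */
};

struct model_pmd_table {
	model_lock_t ptl; /* split lock of one PMD table */
};

/*
 * Pick the lock for a huge page of the given size: the PMD split lock
 * for PMD_SIZE pages, mm->page_table_lock for any other size.
 */
static model_lock_t *model_huge_pte_lockptr(unsigned long huge_page_size,
					    struct model_mm *mm,
					    struct model_pmd_table *pmd_table)
{
	if (huge_page_size == MODEL_PMD_SIZE)
		return &pmd_table->ptl;
	return &mm->page_table_lock;
}
```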

Support of split page table lock by an architecture

No special enabling is needed for PTE split page table lock: everything required is done by pgtable_pte_page_ctor() and pgtable_pte_page_dtor(), which must be called on PTE table allocation / freeing.

Make sure the architecture doesn’t use the slab allocator for page table allocation: slab uses page->slab_cache for its pages, and this field shares storage with page->ptl.

PMD split lock only makes sense if you have more than two page table levels.

Enabling PMD split lock requires a pgtable_pmd_page_ctor() call on PMD table allocation and pgtable_pmd_page_dtor() on freeing.

Allocation usually happens in pmd_alloc_one(), freeing in pmd_free() and pmd_free_tlb(), but make sure you cover all PMD table allocation / freeing paths: e.g. X86_PAE preallocates a few PMDs in pgd_alloc().

With everything in place you can set CONFIG_ARCH_ENABLE_SPLIT_PMD_PTLOCK.

NOTE: pgtable_pte_page_ctor() and pgtable_pmd_page_ctor() can fail; the failure must be handled properly.
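
What handling a constructor failure looks like on an allocation path can be sketched in user space. Everything here is a hypothetical model: struct model_page, the model_* functions and the simulate_failure flag are invented for illustration, and calloc()/free() stand in for the page allocator.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for the struct page of a PMD table. */
struct model_page {
	int *ptl; /* stand-in for the split lock set up by the constructor */
};

/* Models pgtable_pmd_page_ctor(): may fail, reports success as a bool. */
static bool model_pmd_page_ctor(struct model_page *page, bool simulate_failure)
{
	if (simulate_failure)
		return false;
	page->ptl = calloc(1, sizeof(*page->ptl));
	return page->ptl != NULL;
}

/* Models pgtable_pmd_page_dtor(): undo what the constructor set up. */
static void model_pmd_page_dtor(struct model_page *page)
{
	free(page->ptl);
	page->ptl = NULL;
}

/* Allocation path: a failed constructor must release the table page. */
static struct model_page *model_pmd_alloc_one(bool simulate_ctor_failure)
{
	struct model_page *page = calloc(1, sizeof(*page));

	if (!page)
		return NULL;
	if (!model_pmd_page_ctor(page, simulate_ctor_failure)) {
		free(page); /* do not leak or use a half-initialized table */
		return NULL;
	}
	return page;
}

/* Freeing path: run the destructor before giving the page back. */
static void model_pmd_free(struct model_page *page)
{
	model_pmd_page_dtor(page);
	free(page);
}
```

The point of the sketch is the pairing: every path that allocates a table runs the constructor and checks its result, and every path that frees a table runs the destructor first.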

page->ptl

page->ptl is used to access the split page table lock, where ‘page’ is the struct page of the page containing the table. It shares storage with page->private (and a few other fields in the union).

To avoid increasing the size of struct page and to get the best performance, we use a trick:

  • if spinlock_t fits into long, we use page->ptl as the spinlock itself, so we can avoid indirect access and save a cache line;
  • if the size of spinlock_t is bigger than the size of long, we use page->ptl as a pointer to spinlock_t and allocate it dynamically. This allows split lock to be used with DEBUG_SPINLOCK or DEBUG_LOCK_ALLOC enabled, but costs one more cache line for indirect access.

The spinlock_t is allocated in pgtable_pte_page_ctor() for PTE tables and in pgtable_pmd_page_ctor() for PMD tables.
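
The two storage strategies can be compared in a small user-space model. The lock types and the model_* names below are hypothetical: the small type imitates a plain spinlock_t, the big one imitates a spinlock_t enlarged by debug state, and the real kernel makes this choice at compile time rather than through a run-time function.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Imitates a plain spinlock_t that is no larger than a long. */
typedef struct {
	int raw;
} model_small_lock_t;

/* Imitates a spinlock_t enlarged by debugging state. */
typedef struct {
	int raw;
	long debug_state[4];
} model_big_lock_t;

/* The size test behind the trick: can the lock live inside the page? */
static bool model_lock_fits_in_long(size_t lock_size)
{
	return lock_size <= sizeof(long);
}

/* Case 1: the lock is embedded, so no extra memory access is needed. */
struct model_page_embedded {
	model_small_lock_t ptl;
};

static model_small_lock_t *
model_ptlock_ptr_embedded(struct model_page_embedded *page)
{
	return &page->ptl;
}

/* Case 2: the page only holds a pointer to a separately allocated lock. */
struct model_page_indirect {
	model_big_lock_t *ptl;
};

static model_big_lock_t *
model_ptlock_ptr_indirect(struct model_page_indirect *page)
{
	return page->ptl; /* one more dereference, possibly another cache line */
}
```

Callers see the same interface in both cases (a function that returns a pointer to the lock), which is why going through the helpers keeps code independent of the storage layout.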

Please never access page->ptl directly; use the appropriate helper.