Page Table Check
Introduction
Page table check hardens the kernel by ensuring that some types of memory corruption are prevented.
Page table check performs extra verifications at the time when new pages become accessible from userspace by getting their page table entries (PTEs, PMDs, etc.) added into the table.
In most cases where corruption is detected, the kernel is crashed. There is a small performance and memory overhead associated with the page table check. Therefore, it is disabled by default, but can be optionally enabled on systems where the extra hardening outweighs the performance costs. Also, because page table check is synchronous, it can help with debugging double map memory corruption issues by crashing the kernel at the time the wrong mapping occurs instead of later, which is often the case with memory corruption bugs.
It can also be used to check page table entries for various flags and dump warnings when illegal combinations of entry flags are detected. Currently, userfaultfd is the only user of this, to sanity check the wr-protect bit against any writable flags. Illegal flag combinations will not cause data corruption immediately in this case, but they will cause read-only data to be writable, leading to corruption when the page content is later modified.
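The flag check described above can be illustrated with a minimal sketch. The bit positions and the names `EX_PTE_WRITE`, `EX_PTE_UFFD_WP` and `pte_flags_illegal` are hypothetical and chosen only for illustration; real PTE layouts are architecture-specific, and this is not the kernel's own code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag bits for illustration only; real PTE layouts differ
 * per architecture. */
#define EX_PTE_WRITE   (UINT64_C(1) << 1)   /* entry is writable */
#define EX_PTE_UFFD_WP (UINT64_C(1) << 10)  /* userfaultfd wr-protect bit */

/*
 * Returns true when an entry is marked userfaultfd write-protected and
 * is also writable, which is the illegal combination described above.
 */
static bool pte_flags_illegal(uint64_t pte_flags)
{
    return (pte_flags & EX_PTE_UFFD_WP) && (pte_flags & EX_PTE_WRITE);
}
```

A caller in this sketch would emit a warning when `pte_flags_illegal()` returns true, rather than crash, since the combination does not corrupt data until the page is later written.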
Double mapping detection logic
| Current Mapping | New Mapping | Permissions | Rule |
|---|---|---|---|
| Anonymous | Anonymous | Read | Allow |
| Anonymous | Anonymous | Read / Write | Prohibit |
| Anonymous | Named | Any | Prohibit |
| Named | Anonymous | Any | Prohibit |
| Named | Named | Any | Allow |
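The rules in the table above can be written as a small decision function. This is only a sketch of the table, not the kernel's implementation; the names `enum mapping_type` and `double_map_allowed` are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not the kernel's own types. */
enum mapping_type { MAPPING_ANONYMOUS, MAPPING_NAMED };

/*
 * Returns true when a new userspace mapping of a page is allowed, given
 * how the page is currently mapped and whether the mappings are writable.
 */
static bool double_map_allowed(enum mapping_type current_type,
                               enum mapping_type new_type,
                               bool writable)
{
    if (current_type != new_type)
        return false;   /* anonymous <-> named: prohibited for any permissions */
    if (current_type == MAPPING_NAMED)
        return true;    /* named + named: allowed for any permissions */
    return !writable;   /* anonymous + anonymous: allowed only when read-only */
}
```

For example, `double_map_allowed(MAPPING_ANONYMOUS, MAPPING_ANONYMOUS, true)` is false, which matches the "Read / Write, Prohibit" row.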
Enabling Page Table Check
Build the kernel with:
PAGE_TABLE_CHECK=y
Note that it can only be enabled on platforms where ARCH_SUPPORTS_PAGE_TABLE_CHECK is available.
Boot with the ‘page_table_check=on’ kernel parameter.
Optionally, build the kernel with PAGE_TABLE_CHECK_ENFORCED in order to have page table check enabled without an extra kernel parameter.
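How the build option and the boot parameter combine can be sketched as follows. The helper `page_table_check_active` is hypothetical. The sketch assumes that an explicit `on` or `off` boot value overrides the build-time default, and it recognizes only those two strings, which is a simplification.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: decide whether page table check is active.
 *   enforced   - kernel was built with PAGE_TABLE_CHECK_ENFORCED
 *   boot_value - value passed to page_table_check= on the command line,
 *                or NULL when the parameter is absent
 * Assumption: an explicit boot value overrides the build-time default.
 */
static bool page_table_check_active(bool enforced, const char *boot_value)
{
    if (boot_value == NULL)
        return enforced;            /* no parameter: use the build default */
    if (strcmp(boot_value, "on") == 0)
        return true;
    if (strcmp(boot_value, "off") == 0)
        return false;
    return enforced;                /* unrecognized value: keep the default */
}
```

Under these assumptions, a kernel built without PAGE_TABLE_CHECK_ENFORCED and booted without the parameter leaves the check disabled, matching the "disabled by default" behavior described in the introduction.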
Implementation notes
We specifically decided not to use VMA information in order to avoid relying on MM states (except for limited “struct page” info). The page table check is a state machine separate from Linux-MM that verifies that user-accessible pages are not falsely shared.
PAGE_TABLE_CHECK depends on EXCLUSIVE_SYSTEM_RAM. The reason is that without EXCLUSIVE_SYSTEM_RAM, users are allowed to map arbitrary physical memory regions into userspace via /dev/mem. At the same time, pages may change their properties (e.g., from anonymous pages to named pages) while they are still mapped in userspace, leading to “corruption” detected by the page table check.
Even with EXCLUSIVE_SYSTEM_RAM, I/O pages may still be allowed to be mapped via /dev/mem. However, these pages are always considered named pages, so they won’t break the logic used in the page table check.