Memory Layout on AArch64 Linux

Author: Catalin Marinas <catalin.marinas@arm.com>

This document describes the virtual memory layout used by the AArch64 Linux kernel. The architecture allows up to 4 levels of translation tables with a 4KB page size and up to 3 levels with a 64KB page size.

AArch64 Linux uses either 3 levels or 4 levels of translation tables with the 4KB page configuration, allowing 39-bit (512GB) or 48-bit (256TB) virtual addresses, respectively, for both user and kernel. With 64KB pages, only 2 levels of translation tables are used, allowing 42-bit (4TB) virtual addresses, but the memory layout is the same.
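The sizes quoted above follow from simple arithmetic. The sketch below assumes that each translation table fills one page with 8-byte descriptors, so every level resolves (page shift - 3) bits on top of the in-page offset. The helper name va_bits_for() is hypothetical and is not a kernel function.

```c
#include <assert.h>

/*
 * Number of virtual address bits covered by a full table walk.
 * Assumption: each table is one page of 8-byte descriptors, so a level
 * resolves (page_shift - 3) bits; the in-page offset adds page_shift bits.
 * va_bits_for() is a hypothetical helper, not a kernel function.
 */
static unsigned int va_bits_for(unsigned int page_shift, unsigned int levels)
{
	assert(page_shift >= 12 && levels >= 1);
	return page_shift + levels * (page_shift - 3);
}
```

With these assumptions, va_bits_for(12, 3) gives 39, va_bits_for(12, 4) gives 48 and va_bits_for(16, 2) gives 42. For 64KB pages with 3 levels the formula yields an upper bound of 55; the first-level table is only partially used, so the usable size is 48 bits, or 52 bits with hardware support.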

ARMv8.2 adds optional support for Large Virtual Address space. This is only available when running with a 64KB page size and expands the number of descriptors in the first level of translation.

User addresses have bits 63:48 set to 0 while the kernel addresses have the same bits set to 1. TTBRx selection is given by bit 63 of the virtual address. The swapper_pg_dir contains only kernel (global) mappings while the user pgd contains only user (non-global) mappings. The swapper_pg_dir address is written to TTBR1 and never written to TTBR0.
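As an illustration of the rule above, the following sketch classifies a 64-bit virtual address for the 48-bit configuration. The function names are hypothetical and chosen for this example only; they are not kernel APIs.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers for the 48-bit layout; not kernel APIs. */

/* User addresses have bits 63:48 all clear. */
static bool is_user_va48(uint64_t va)
{
	return (va >> 48) == 0;
}

/* Kernel addresses have bits 63:48 all set. */
static bool is_kernel_va48(uint64_t va)
{
	return (va >> 48) == 0xffff;
}

/* Bit 63 selects the translation table base register: 0 or 1. */
static unsigned int ttbr_index(uint64_t va)
{
	return (unsigned int)(va >> 63);
}
```

An address such as 0x0001000000000000 satisfies neither predicate: it lies in the hole between the user and kernel ranges.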

AArch64 Linux memory layout with 4KB pages + 4 levels (48-bit):

Start                 End                     Size            Use
-----------------------------------------------------------------------
0000000000000000      0000ffffffffffff         256TB          user
ffff000000000000      ffff7fffffffffff         128TB          kernel logical memory map
ffff800000000000      ffff9fffffffffff          32TB          kasan shadow region
ffffa00000000000      ffffa00007ffffff         128MB          bpf jit region
ffffa00008000000      ffffa0000fffffff         128MB          modules
ffffa00010000000      fffffdffbffeffff         ~93TB          vmalloc
fffffdffbfff0000      fffffdfffe5f8fff        ~998MB          [guard region]
fffffdfffe5f9000      fffffdfffe9fffff        4124KB          fixed mappings
fffffdfffea00000      fffffdfffebfffff           2MB          [guard region]
fffffdfffec00000      fffffdffffbfffff          16MB          PCI I/O space
fffffdffffc00000      fffffdffffdfffff           2MB          [guard region]
fffffdffffe00000      ffffffffffdfffff           2TB          vmemmap
ffffffffffe00000      ffffffffffffffff           2MB          [guard region]

AArch64 Linux memory layout with 64KB pages + 3 levels (52-bit with HW support):

Start                 End                     Size            Use
-----------------------------------------------------------------------
0000000000000000      000fffffffffffff           4PB          user
fff0000000000000      fff7ffffffffffff           2PB          kernel logical memory map
fff8000000000000      fffd9fffffffffff        1440TB          [gap]
fffda00000000000      ffff9fffffffffff         512TB          kasan shadow region
ffffa00000000000      ffffa00007ffffff         128MB          bpf jit region
ffffa00008000000      ffffa0000fffffff         128MB          modules
ffffa00010000000      fffff81ffffeffff         ~88TB          vmalloc
fffff81fffff0000      fffffc1ffe58ffff          ~3TB          [guard region]
fffffc1ffe590000      fffffc1ffe9fffff        4544KB          fixed mappings
fffffc1ffea00000      fffffc1ffebfffff           2MB          [guard region]
fffffc1ffec00000      fffffc1fffbfffff          16MB          PCI I/O space
fffffc1fffc00000      fffffc1fffdfffff           2MB          [guard region]
fffffc1fffe00000      ffffffffffdfffff        3968GB          vmemmap
ffffffffffe00000      ffffffffffffffff           2MB          [guard region]

Translation table lookup with 4KB pages:

+--------+--------+--------+--------+--------+--------+--------+--------+
|63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
+--------+--------+--------+--------+--------+--------+--------+--------+
 |                 |         |         |         |         |
 |                 |         |         |         |         v
 |                 |         |         |         |   [11:0]  in-page offset
 |                 |         |         |         +-> [20:12] L3 index
 |                 |         |         +-----------> [29:21] L2 index
 |                 |         +---------------------> [38:30] L1 index
 |                 +-------------------------------> [47:39] L0 index
 +-------------------------------------------------> [63] TTBR0/1
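The 4KB lookup can be expressed as shifts and masks. The following is a minimal sketch that extracts each field shown in the diagram; the structure and function names are hypothetical and exist only for this illustration.

```c
#include <stdint.h>

/* Hypothetical container for the fields of a 48-bit VA with 4KB pages. */
struct va_fields_4k {
	unsigned int ttbr;	/* bit 63: TTBR0 or TTBR1 */
	unsigned int l0;	/* bits 47:39 */
	unsigned int l1;	/* bits 38:30 */
	unsigned int l2;	/* bits 29:21 */
	unsigned int l3;	/* bits 20:12 */
	unsigned int offset;	/* bits 11:0, in-page offset */
};

/* Split a virtual address into its table indices (9 bits per level). */
static struct va_fields_4k split_va_4k(uint64_t va)
{
	struct va_fields_4k f = {
		.ttbr   = (unsigned int)((va >> 63) & 0x1),
		.l0     = (unsigned int)((va >> 39) & 0x1ff),
		.l1     = (unsigned int)((va >> 30) & 0x1ff),
		.l2     = (unsigned int)((va >> 21) & 0x1ff),
		.l3     = (unsigned int)((va >> 12) & 0x1ff),
		.offset = (unsigned int)(va & 0xfff),
	};
	return f;
}
```

For example, the address 0xffffa00008001234 (0x1234 bytes into the modules region of the 48-bit table) selects TTBR1 and splits into L0 index 320, L1 index 0, L2 index 64, L3 index 1 and offset 0x234.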

Translation table lookup with 64KB pages:

+--------+--------+--------+--------+--------+--------+--------+--------+
|63    56|55    48|47    40|39    32|31    24|23    16|15     8|7      0|
+--------+--------+--------+--------+--------+--------+--------+--------+
 |                 |    |               |              |
 |                 |    |               |              v
 |                 |    |               |            [15:0]  in-page offset
 |                 |    |               +----------> [28:16] L3 index
 |                 |    +--------------------------> [41:29] L2 index
 |                 +-------------------------------> [47:42] L1 index (48-bit)
 |                                                   [51:42] L1 index (52-bit)
 +-------------------------------------------------> [63] TTBR0/1
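With 64KB pages only the width of the L1 index changes between the 48-bit and 52-bit configurations. The sketch below takes the VA size as a parameter to show this; the structure and function names are hypothetical, and only the values 48 and 52 are accepted.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the fields of a VA with 64KB pages. */
struct va_fields_64k {
	unsigned int ttbr;	/* bit 63: TTBR0 or TTBR1 */
	unsigned int l1;	/* bits 47:42 (48-bit) or 51:42 (52-bit) */
	unsigned int l2;	/* bits 41:29 */
	unsigned int l3;	/* bits 28:16 */
	unsigned int offset;	/* bits 15:0, in-page offset */
};

/* Split a virtual address; va_bits must be 48 or 52. */
static struct va_fields_64k split_va_64k(uint64_t va, unsigned int va_bits)
{
	assert(va_bits == 48 || va_bits == 52);

	/* The L1 index is 6 bits wide for 48-bit VAs, 10 bits for 52-bit. */
	unsigned int l1_bits = va_bits - 42;

	struct va_fields_64k f = {
		.ttbr   = (unsigned int)((va >> 63) & 0x1),
		.l1     = (unsigned int)((va >> 42) & ((1u << l1_bits) - 1)),
		.l2     = (unsigned int)((va >> 29) & 0x1fff),
		.l3     = (unsigned int)((va >> 16) & 0x1fff),
		.offset = (unsigned int)(va & 0xffff),
	};
	return f;
}
```

The highest user address in the 52-bit layout, 0x000fffffffffffff, yields L1 index 1023, which shows the first-level table growing from 64 to 1024 descriptors when 52-bit VAs are in use.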

When using KVM without the Virtualization Host Extensions, the hypervisor maps kernel pages in EL2 at a fixed (and potentially random) offset from the linear mapping. See the kern_hyp_va macro and kvm_update_va_mask function for more details. MMIO devices such as GICv2 get mapped next to the HYP idmap page, as do vectors when ARM64_HARDEN_EL2_VECTORS is selected for particular CPUs.

When using KVM with the Virtualization Host Extensions, no additional mappings are created, since the host kernel runs directly in EL2.

52-bit VA support in the kernel

If the ARMv8.2-LVA optional feature is present and we are running with a 64KB page size, then it is possible to use 52 bits of address space for both userspace and kernel addresses. However, any kernel binary that supports 52-bit must also be able to fall back to 48-bit at early boot time if the hardware feature is not present.

This fallback mechanism requires the kernel .text to be in the higher addresses so that its addresses are invariant to 48/52-bit VAs. Because the kasan shadow is a fraction of the entire kernel VA space, the end of the kasan shadow must also be in the higher half of the kernel VA space for both 48/52-bit. (When switching from 48-bit to 52-bit, the end of the kasan shadow is invariant and dependent on ~0UL, whilst the start address will “grow” towards the lower addresses.)
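The behaviour of the kasan shadow bounds can be checked against the two layout tables. The sketch below assumes the shadow covers one eighth of the kernel VA space, which is the ratio the tables show (32TB of 256TB, 512TB of 4PB), and takes the fixed end address from those tables. The macro and function names are hypothetical.

```c
#include <stdint.h>

/* End of the kasan shadow region, read from the layout tables (exclusive). */
#define KASAN_SHADOW_END_SKETCH 0xffffa00000000000ULL

/*
 * Start of the kasan shadow for a given VA size.
 * Assumption: the shadow is 1/8 of the kernel VA space, i.e.
 * 2^va_bits / 8 = 2^(va_bits - 3) bytes, as the tables suggest.
 */
static uint64_t kasan_shadow_start(unsigned int va_bits)
{
	return KASAN_SHADOW_END_SKETCH - (1ULL << (va_bits - 3));
}
```

Under these assumptions kasan_shadow_start(48) is 0xffff800000000000 and kasan_shadow_start(52) is 0xfffda00000000000: the end stays fixed while the start moves towards lower addresses.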

In order to optimise phys_to_virt and virt_to_phys, the PAGE_OFFSET is kept constant at 0xFFF0000000000000 (corresponding to 52-bit); this obviates the need for an extra variable read. The physvirt offset and vmemmap offsets are computed at early boot to enable this logic.
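A simplified userspace model may help show why a constant PAGE_OFFSET plus one precomputed offset is enough for the linear map conversions. This is a sketch under stated assumptions, not the kernel's implementation: all names are hypothetical, the start of RAM is passed in by the caller, and any adjustment the kernel makes for the 48-bit fallback is not modelled.

```c
#include <stdint.h>

/* Constant PAGE_OFFSET for the 52-bit layout, as given in the text. */
#define PAGE_OFFSET_SKETCH 0xfff0000000000000ULL

/* Hypothetical stand-in for the offset computed once at early boot. */
static uint64_t physvirt_offset_model;

/* phys_base: assumed physical address of the start of RAM. */
static void init_physvirt_offset(uint64_t phys_base)
{
	/* Unsigned arithmetic wraps modulo 2^64, which is intended here. */
	physvirt_offset_model = phys_base - PAGE_OFFSET_SKETCH;
}

/* Linear map conversions reduce to a single add or subtract. */
static uint64_t model_phys_to_virt(uint64_t pa)
{
	return pa - physvirt_offset_model;
}

static uint64_t model_virt_to_phys(uint64_t va)
{
	return va + physvirt_offset_model;
}
```

After init_physvirt_offset(0x80000000), the physical address 0x80000000 maps to 0xfff0000000000000 and the virtual address 0xfff0000000001000 maps back to 0x80001000.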

As a single binary will need to support both 48-bit and 52-bit VA spaces, the VMEMMAP must be sized large enough for 52-bit VAs and also must be sized large enough to accommodate a fixed PAGE_OFFSET.

Most code in the kernel should not need to consider the VA_BITS. For code that does need to know the VA size, the variables are defined as follows:

VA_BITS         constant        the maximum VA space size
VA_BITS_MIN     constant        the minimum VA space size
vabits_actual   variable        the actual VA space size

Maximum and minimum sizes can be useful to ensure that buffers are sized large enough or that addresses are positioned close enough for the “worst” case.
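As one illustration of a “worst” case check, the sketch below tests whether a kernel address lies in the range that is valid for the minimum VA size, and is therefore usable whether or not the hardware supports 52-bit VAs. The constants mirror the 48/52-bit values discussed in this document; the names carry a _SKETCH suffix because they are hypothetical stand-ins, not the kernel's definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a kernel built with 52-bit VA support. */
#define VA_BITS_SKETCH		52
#define VA_BITS_MIN_SKETCH	48

/* Lowest kernel virtual address for a given VA size. */
static uint64_t kernel_va_start(unsigned int va_bits)
{
	return ~0ULL << va_bits;
}

/* True if the address is a kernel address under both VA sizes. */
static bool valid_for_any_va_size(uint64_t va)
{
	return va >= kernel_va_start(VA_BITS_MIN_SKETCH);
}
```

Here kernel_va_start(48) is 0xffff000000000000 and kernel_va_start(52) is 0xfff0000000000000, matching the start of the kernel ranges in the two layout tables. An address in the modules region passes the check, while one in the 52-bit-only [gap] region does not.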

52-bit userspace VAs

To maintain compatibility with software that relies on the ARMv8.0 VA space maximum size of 48 bits, the kernel will, by default, return virtual addresses to userspace from a 48-bit range.

Software can “opt-in” to receiving VAs from a 52-bit space by specifying an mmap hint parameter that is larger than 48-bit.

For example:

maybe_high_address = mmap(~0UL, size, prot, flags, ...);
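A fuller version of the call above might look like the following sketch. The choice of an anonymous private read/write mapping and the wrapper names are assumptions made for illustration. Whether the returned address actually exceeds 48 bits depends on the hardware and kernel configuration; on systems without 52-bit support the call is still expected to succeed and return an address from the usual range.

```c
#define _DEFAULT_SOURCE	/* exposes MAP_ANONYMOUS under strict -std= modes */
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/*
 * Hypothetical wrapper: request an anonymous mapping with a hint above
 * 48 bits. The hint is not MAP_FIXED, so the kernel is free to place
 * the mapping elsewhere. Returns MAP_FAILED on error.
 */
static void *map_maybe_high(size_t size)
{
	return mmap((void *)~0UL, size, PROT_READ | PROT_WRITE,
		    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

/* Nonzero when the address does not fit in 48 bits. */
static int is_above_48bit(const void *p)
{
	return ((uintptr_t)p >> 48) != 0;
}
```

A caller would check the result against MAP_FAILED, optionally inspect it with is_above_48bit(), and release it with munmap() when done.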

It is also possible to build a debug kernel that returns addresses from a 52-bit space by enabling the following kernel config options:

CONFIG_EXPERT=y && CONFIG_ARM64_FORCE_52BIT=y

Note that this option is only intended for debugging applications and should not be used in production.