Memory Protection Keys

Memory Protection Keys provide a mechanism for enforcing page-based protections, but without requiring modification of the page tables when an application changes protection domains.

Pkeys Userspace (PKU) is a feature which can be found on:

  • Intel server CPUs, Skylake and later
  • Intel client CPUs, Tiger Lake (11th Gen Core) and later
  • Future AMD CPUs
  • arm64 CPUs implementing the Permission Overlay Extension (FEAT_S1POE)

x86_64

Pkeys work by dedicating 4 previously Reserved bits in each page table entry to a “protection key”, giving 16 possible keys.

Protections for each key are defined with a per-CPU user-accessible register (PKRU). Each of these is a 32-bit register storing two bits (Access Disable and Write Disable) for each of 16 keys.
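The two-bits-per-key layout can be illustrated with plain bit arithmetic. The sketch below only computes register values and never touches the real PKRU. The helper names (pkru_set_rights, pkru_get_rights) and the PKRU_AD/PKRU_WD macros are hypothetical, and it assumes that Access Disable is the lower bit of each pair and Write Disable the upper one.

```c
#include <assert.h>
#include <stdint.h>

#define PKRU_NR_KEYS      16    /* 16 keys, as described above */
#define PKRU_BITS_PER_KEY 2     /* two bits per key */
#define PKRU_AD           0x1u  /* Access Disable (assumed lower bit) */
#define PKRU_WD           0x2u  /* Write Disable (assumed upper bit) */

/* Return a copy of 'pkru' with the two bits for 'pkey' replaced by 'rights'. */
uint32_t pkru_set_rights(uint32_t pkru, int pkey, uint32_t rights)
{
    assert(pkey >= 0 && pkey < PKRU_NR_KEYS);
    assert((rights & ~(PKRU_AD | PKRU_WD)) == 0);

    unsigned int shift = (unsigned int)pkey * PKRU_BITS_PER_KEY;

    pkru &= ~(0x3u << shift);   /* clear the old bits for this key */
    pkru |= rights << shift;    /* install the new bits */
    return pkru;
}

/* Extract the two bits that apply to 'pkey'. */
uint32_t pkru_get_rights(uint32_t pkru, int pkey)
{
    assert(pkey >= 0 && pkey < PKRU_NR_KEYS);
    return (pkru >> ((unsigned int)pkey * PKRU_BITS_PER_KEY)) & 0x3u;
}
```

For example, under these assumptions, disabling writes for key 1 in an otherwise empty register sets bit 3, so pkru_set_rights(0, 1, PKRU_WD) yields 0x8.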

Being a CPU register, PKRU is inherently thread-local, potentially giving each thread a different set of protections from every other thread.

There are two instructions (RDPKRU/WRPKRU) for reading and writing to the register. The feature is only available in 64-bit mode, even though there is theoretically space in the PAE PTEs. These permissions are enforced on data access only and have no effect on instruction fetches.

arm64

Pkeys use 3 bits in each page table entry to encode a “protection key index”, giving 8 possible keys.

Protections for each key are defined with a per-CPU user-writable system register (POR_EL0). This is a 64-bit register encoding read, write and execute overlay permissions for each protection key index.
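As a rough illustration of how such a register could be manipulated, the sketch below updates the permission field for one protection key index in a 64-bit value. It is pure arithmetic and does not access POR_EL0. It assumes a 4-bit field per index with read, execute and write encoded as bits 0, 1 and 2 respectively; that layout is not stated in this document and should be checked against the architecture reference. All names (por_set_perms, OVL_R, OVL_X, OVL_W) are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define POR_NR_KEYS      8       /* 3-bit key index, as described above */
#define POR_BITS_PER_IDX 4       /* assumed field width per index */
#define OVL_R            0x1ull  /* assumed encoding: read */
#define OVL_X            0x2ull  /* assumed encoding: execute */
#define OVL_W            0x4ull  /* assumed encoding: write */

/* Return a copy of 'por' with the field for index 'idx' replaced by 'perms'. */
uint64_t por_set_perms(uint64_t por, int idx, uint64_t perms)
{
    assert(idx >= 0 && idx < POR_NR_KEYS);
    assert((perms & ~(OVL_R | OVL_X | OVL_W)) == 0);

    unsigned int shift = (unsigned int)idx * POR_BITS_PER_IDX;

    por &= ~(0xfull << shift);   /* clear the old field */
    por |= perms << shift;       /* install the new permissions */
    return por;
}

/* Extract the permission field for index 'idx'. */
uint64_t por_get_perms(uint64_t por, int idx)
{
    assert(idx >= 0 && idx < POR_NR_KEYS);
    return (por >> ((unsigned int)idx * POR_BITS_PER_IDX)) & 0xfull;
}
```

Because execute has its own bit in this layout, a key can be made readable but not executable, which has no equivalent in the two-bit x86_64 scheme.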

Being a CPU register, POR_EL0 is inherently thread-local, potentially giving each thread a different set of protections from every other thread.

Unlike x86_64, the protection key permissions also apply to instruction fetches.

Syscalls

There are 3 system calls which directly interact with pkeys:

    int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
    int pkey_free(int pkey);
    int pkey_mprotect(unsigned long start, size_t len,
                      unsigned long prot, int pkey);

Before a pkey can be used, it must first be allocated with pkey_alloc(). An application writes to the architecture-specific CPU register directly in order to change access permissions to memory covered with a key. In this example this is wrapped by a C function called pkey_set().

    int real_prot = PROT_READ|PROT_WRITE;
    pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
    ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
    ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
    ... application runs here

Now, if the application needs to update the data at ‘ptr’, it can gain access, do the update, then remove its write access:

    pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
    *ptr = foo; // assign something
    pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again

Now when it frees the memory, it will also free the pkey since it is no longer in use:

    munmap(ptr, PAGE_SIZE);
    pkey_free(pkey);

Note

pkey_set() is a wrapper around writing to the CPU register. Example implementations can be found in tools/testing/selftests/mm/pkey-{arm64,powerpc,x86}.h
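The fragments above can be combined into one function. The following is a minimal sketch for x86_64 only, not the selftest implementation: it invokes the three system calls through syscall(), and demo_pkey_set(), x86_rdpkru(), x86_wrpkru() and pkey_demo() are hypothetical names. The instruction bytes for RDPKRU and WRPKRU are hard-coded so that the code assembles with older toolchains. pkey_demo() returns 0 on success, 1 when pkeys are unavailable (unsupported CPU, kernel or architecture), and -1 on any other failure.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              /* for syscall() and MAP_ANONYMOUS */
#endif
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef PKEY_DISABLE_WRITE
#define PKEY_DISABLE_WRITE 0x2   /* fallback for older libc headers */
#endif

#if defined(__x86_64__) && defined(SYS_pkey_alloc)
static uint32_t x86_rdpkru(void)
{
    uint32_t eax, edx;

    /* RDPKRU: ECX must be 0; the value is returned in EAX. */
    __asm__ volatile(".byte 0x0f,0x01,0xee"
                     : "=a"(eax), "=d"(edx) : "c"(0));
    (void)edx;
    return eax;
}

static void x86_wrpkru(uint32_t pkru)
{
    /* WRPKRU: ECX and EDX must be 0; the new value is taken from EAX. */
    __asm__ volatile(".byte 0x0f,0x01,0xef"
                     : : "a"(pkru), "c"(0), "d"(0) : "memory");
}

/* Hypothetical pkey_set(): replace the two PKRU bits that belong to 'pkey'. */
static void demo_pkey_set(int pkey, uint32_t rights)
{
    unsigned int shift = (unsigned int)pkey * 2;
    uint32_t pkru = x86_rdpkru();

    assert(pkey >= 0 && pkey < 16);
    pkru &= ~(0x3u << shift);
    pkru |= (rights & 0x3u) << shift;
    x86_wrpkru(pkru);
}
#endif

int pkey_demo(void)
{
#if defined(__x86_64__) && defined(SYS_pkey_alloc)
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    int ret = -1;

    /* Allocate a key whose initial rights forbid writes. */
    int pkey = (int)syscall(SYS_pkey_alloc, 0UL,
                            (unsigned long)PKEY_DISABLE_WRITE);
    if (pkey < 0)
        return 1;                /* pkeys not available here */

    int *ptr = mmap(NULL, page, PROT_NONE,
                    MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (ptr == MAP_FAILED)
        goto out_free;

    /* Page tables say read/write; the key still forbids writes. */
    if (syscall(SYS_pkey_mprotect, ptr, page,
                (unsigned long)(PROT_READ | PROT_WRITE), pkey) != 0)
        goto out_unmap;

    demo_pkey_set(pkey, 0);                   /* gain write access */
    *ptr = 42;                                /* do the update */
    demo_pkey_set(pkey, PKEY_DISABLE_WRITE);  /* drop write access again */

    ret = (*ptr == 42) ? 0 : -1;              /* reads are still allowed */
out_unmap:
    munmap(ptr, page);
out_free:
    syscall(SYS_pkey_free, pkey);
    return ret;
#else
    return 1;                    /* sketch only covers x86_64 */
#endif
}
```

On systems whose C library provides pkey_alloc(), pkey_mprotect(), pkey_set() and pkey_free() wrappers, those can replace the raw syscall() calls and the inline assembly.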

Behavior

The kernel attempts to make protection keys consistent with the behavior of a plain mprotect(). For instance if you do this:

    mprotect(ptr, size, PROT_NONE);
    something(ptr);

you can expect the same effects with protection keys when doing this:

    pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
    pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
    something(ptr);

That should be true whether something() is a direct access to ‘ptr’ like:

    *ptr = foo;

or when the kernel does the access on the application’s behalf like with a read():

    read(fd, ptr, 1);

The kernel will send a SIGSEGV in both cases, but si_code will be set to SEGV_PKERR when violating protection keys versus SEGV_ACCERR when the plain mprotect() permissions are violated.

Note that kernel accesses from a kthread (such as io_uring) will use a default value for the protection key register and so will not be consistent with userspace’s value of the register or mprotect().