Memory Protection Keys

Memory Protection Keys provide a mechanism for enforcing page-based protections, but without requiring modification of the page tables when an application changes protection domains.

Pkeys Userspace (PKU) is a feature which can be found on:

Intel server CPUs, Skylake and later
Intel client CPUs, Tiger Lake (11th Gen Core) and later
AMD CPUs, Zen 3 and later
arm64 CPUs implementing the Permission Overlay Extension (FEAT_S1POE)

x86_64

Pkeys work by dedicating 4 previously reserved bits in each page table entry to a “protection key”, giving 16 possible keys.
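To make the “4 bits, 16 keys” arithmetic concrete, here is a minimal sketch that decodes and replaces the key field in a raw 64-bit PTE value. It assumes the x86_64 placement of the key in PTE bits 62:59. The helpers pte_pkey() and pte_set_pkey() are hypothetical names used only for illustration; they manipulate an integer and do not touch real page tables.

```c
#include <stdint.h>

/* Assumed x86_64 layout: the protection key occupies PTE bits 62:59. */
#define PTE_PKEY_SHIFT 59
#define PTE_PKEY_MASK  0xFULL   /* 4 bits -> keys 0..15 */

/* Hypothetical helper: decode the protection key from a raw PTE value. */
static inline unsigned int pte_pkey(uint64_t pte)
{
    return (unsigned int)((pte >> PTE_PKEY_SHIFT) & PTE_PKEY_MASK);
}

/* Hypothetical helper: return 'pte' with its key field replaced by 'pkey',
 * leaving every other bit (present, NX, address, ...) unchanged. */
static inline uint64_t pte_set_pkey(uint64_t pte, unsigned int pkey)
{
    pte &= ~(PTE_PKEY_MASK << PTE_PKEY_SHIFT);
    pte |= ((uint64_t)pkey & PTE_PKEY_MASK) << PTE_PKEY_SHIFT;
    return pte;
}
```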

Protections for each key are defined with a per-CPU user-accessible register (PKRU). Each of these is a 32-bit register storing two bits (Access Disable and Write Disable) for each of 16 keys.
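The following sketch shows the PKRU layout as plain bit arithmetic, assuming the x86 encoding in which key k uses bit 2k for Access Disable and bit 2k+1 for Write Disable. The names pkru_get_bits(), pkru_set_bits(), PKRU_AD and PKRU_WD are hypothetical, and the functions operate on a value rather than on the live register. In Linux's x86 code the rights PKEY_DISABLE_ACCESS (0x1) and PKEY_DISABLE_WRITE (0x2) line up with these two bits.

```c
#include <stdint.h>

#define PKRU_AD 0x1u   /* Access Disable: bit 2k for key k   */
#define PKRU_WD 0x2u   /* Write Disable:  bit 2k+1 for key k */

/* Return the two-bit AD/WD field for 'pkey' (0..15) from a PKRU value. */
static inline uint32_t pkru_get_bits(uint32_t pkru, unsigned int pkey)
{
    return (pkru >> (2u * pkey)) & 0x3u;
}

/* Return 'pkru' with the AD/WD field of 'pkey' replaced by 'bits'. */
static inline uint32_t pkru_set_bits(uint32_t pkru, unsigned int pkey, uint32_t bits)
{
    unsigned int shift = 2u * pkey;
    pkru &= ~(0x3u << shift);          /* clear the old AD/WD pair */
    pkru |= (bits & 0x3u) << shift;    /* install the new pair     */
    return pkru;
}
```

For example, the value 0x55555554 sets Access Disable for keys 1 through 15 while leaving key 0 fully accessible.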

Being a CPU register, PKRU is inherently thread-local, potentially giving each thread a different set of protections from every other thread.

There are two instructions (RDPKRU/WRPKRU) for reading and writing to the register. The feature is only available in 64-bit mode, even though there is theoretically space in the PAE PTEs. These permissions are enforced on data access only and have no effect on instruction fetches.

arm64

Pkeys use 3 bits in each page table entry to encode a “protection key index”, giving 8 possible keys.

Protections for each key are defined with a per-CPU user-writable system register (POR_EL0). This is a 64-bit register encoding read, write and execute overlay permissions for each protection key index.
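A comparable sketch for arm64, assuming POR_EL0 holds one 4-bit field per key index (index i in bits 4i+3:4i) and the Arm permission encoding R = 0x1, X = 0x2, W = 0x4 within a field. These constants and the por_get_perm()/por_set_perm() helpers are illustrative assumptions, not a kernel API. Note that the polarity differs from PKRU: a POR_EL0 field grants permissions, so 0 means no access, whereas PKRU bits disable access.

```c
#include <stdint.h>

/* Assumed per-field encoding (one 4-bit field per key index). */
#define POE_NONE 0x0u
#define POE_R    0x1u
#define POE_X    0x2u
#define POE_W    0x4u
#define POE_RWX  (POE_R | POE_W | POE_X)
#define POR_BITS_PER_PKEY 4
#define POR_NR_PKEYS      8   /* 3 PTE bits -> only the low 8 fields are used */

/* Return the permission field for key index 'idx' from a POR_EL0 value. */
static inline unsigned int por_get_perm(uint64_t por, unsigned int idx)
{
    return (unsigned int)((por >> (idx * POR_BITS_PER_PKEY)) & 0xFu);
}

/* Return 'por' with the field for 'idx' replaced by 'perm'. */
static inline uint64_t por_set_perm(uint64_t por, unsigned int idx, unsigned int perm)
{
    unsigned int shift = idx * POR_BITS_PER_PKEY;
    por &= ~((uint64_t)0xF << shift);          /* clear the old field */
    por |= (uint64_t)(perm & 0xFu) << shift;   /* grant the new perms */
    return por;
}
```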

Being a CPU register, POR_EL0 is inherently thread-local, potentially giving each thread a different set of protections from every other thread.

Unlike x86_64, the protection key permissions also apply to instruction fetches.

Syscalls

There are 3 system calls which directly interact with pkeys:

```c
int pkey_alloc(unsigned long flags, unsigned long init_access_rights);
int pkey_free(int pkey);
int pkey_mprotect(unsigned long start, size_t len,
                  unsigned long prot, int pkey);
```
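Because pkey_alloc() fails cleanly when no keys are available, it can double as a feature probe. The sketch below assumes glibc 2.27 or later, which provides pkey_alloc() and pkey_free() wrappers in <sys/mman.h> under _GNU_SOURCE; pkeys_supported() is a hypothetical helper name. Following the pkey_alloc(2) man page, ENOSPC is treated as “no usable keys” (which is also what a system without pkey support reports), and ENOSYS as a missing system call.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sys/mman.h>

/* Returns 1 if a pkey could be allocated (and is immediately freed),
 * 0 if the system reports no usable pkeys, -1 on any other error. */
static int pkeys_supported(void)
{
    int pkey = pkey_alloc(0, 0);   /* flags must be 0; no initial restrictions */
    if (pkey >= 0) {
        pkey_free(pkey);           /* give the key back right away */
        return 1;
    }
    /* ENOSPC: no pkey support, or every key is taken.
     * ENOSYS: the system call is not implemented here. */
    if (errno == ENOSPC || errno == ENOSYS)
        return 0;
    return -1;
}
```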

Before a pkey can be used, it must first be allocated with pkey_alloc(). An application writes to the architecture-specific CPU register directly in order to change access permissions to memory covered with a key. In this example this is wrapped by a C function called pkey_set().

```c
int real_prot = PROT_READ|PROT_WRITE;
pkey = pkey_alloc(0, PKEY_DISABLE_WRITE);
ptr = mmap(NULL, PAGE_SIZE, PROT_NONE, MAP_ANONYMOUS|MAP_PRIVATE, -1, 0);
ret = pkey_mprotect(ptr, PAGE_SIZE, real_prot, pkey);
/* ... application runs here */
```

Now, if the application needs to update the data at ‘ptr’, it can gain access, do the update, then remove its write access:

```c
pkey_set(pkey, 0); // clear PKEY_DISABLE_WRITE
*ptr = foo; // assign something
pkey_set(pkey, PKEY_DISABLE_WRITE); // set PKEY_DISABLE_WRITE again
```

Now when it frees the memory, it will also free the pkey since it is no longer in use:

```c
munmap(ptr, PAGE_SIZE);
pkey_free(pkey);
```

Note

pkey_set() is a wrapper around writing to the CPU register. Example implementations can be found in tools/testing/selftests/mm/pkey-{arm64,powerpc,x86}.h
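As an illustration of such a wrapper on x86_64, the sketch below reads PKRU with RDPKRU, replaces the two bits for one key, and writes the result back with WRPKRU. The instructions are emitted as raw opcode bytes (0f 01 ee and 0f 01 ef) so no special compiler flags are needed, and CPUID leaf 7 ECX bit 4 (OSPKE) is checked first because executing RDPKRU without OS support raises an invalid-opcode fault. example_pkey_set() is a hypothetical name chosen so it does not collide with glibc's pkey_set(); on other architectures it simply fails with ENOSYS.

```c
#include <errno.h>
#include <stdint.h>
#if defined(__x86_64__)
#include <cpuid.h>

/* CPUID.(EAX=7,ECX=0):ECX bit 4 is OSPKE: the OS has enabled PKRU access. */
static int cpu_has_ospke(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return 0;
    return (ecx >> 4) & 1;
}

static inline uint32_t rdpkru(void)
{
    uint32_t eax, edx;
    /* RDPKRU (0f 01 ee): ECX must be 0; PKRU is returned in EAX. */
    __asm__ volatile(".byte 0x0f,0x01,0xee" : "=a"(eax), "=d"(edx) : "c"(0));
    (void)edx;
    return eax;
}

static inline void wrpkru(uint32_t pkru)
{
    /* WRPKRU (0f 01 ef): EAX is written to PKRU; ECX and EDX must be 0. */
    __asm__ volatile(".byte 0x0f,0x01,0xef" : : "a"(pkru), "c"(0), "d"(0) : "memory");
}
#endif

/* Hypothetical wrapper: set 'rights' (0x1 = disable access, 0x2 = disable
 * write) for 'pkey' in this thread's PKRU. Returns 0, or -1 with errno set. */
static int example_pkey_set(int pkey, unsigned int rights)
{
    if (pkey < 0 || pkey > 15 || (rights & ~0x3u)) {
        errno = EINVAL;
        return -1;
    }
#if defined(__x86_64__)
    if (!cpu_has_ospke()) {
        errno = ENOSYS;
        return -1;
    }
    uint32_t shift = 2u * (uint32_t)pkey;
    uint32_t pkru = rdpkru();
    pkru &= ~(0x3u << shift);   /* clear this key's AD/WD pair   */
    pkru |= rights << shift;    /* install the requested rights */
    wrpkru(pkru);
    return 0;
#else
    errno = ENOSYS;             /* PKRU exists only on x86 */
    return -1;
#endif
}
```

The "memory" clobber on WRPKRU acts as a compiler barrier, so loads and stores in the surrounding code are not moved across the permission change.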

Behavior

The kernel attempts to make protection keys consistent with the behavior of a plain mprotect(). For instance, if you do this:

```c
mprotect(ptr, size, PROT_NONE);
something(ptr);
```

you can expect the same effects with protection keys when doing this:

```c
pkey = pkey_alloc(0, PKEY_DISABLE_WRITE | PKEY_DISABLE_READ);
pkey_mprotect(ptr, size, PROT_READ|PROT_WRITE, pkey);
something(ptr);
```

That should be true whether something() is a direct access to ‘ptr’ like:

```c
*ptr = foo;
```

or when the kernel does the access on the application’s behalf, like with a read():

```c
read(fd, ptr, 1);
```

The kernel will send a SIGSEGV in both cases, but si_code will be set to SEGV_PKUERR when violating protection keys versus SEGV_ACCERR when the plain mprotect() permissions are violated.
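A signal handler can tell the two cases apart by inspecting si_code. The sketch below recovers from the fault with sigsetjmp()/siglongjmp() and exercises only the plain mprotect() path, which works on any Linux system; a protection-key violation would reach the same handler with SEGV_PKUERR instead. probe_write(), fault_reason() and demo_mprotect_fault() are hypothetical helpers, and the fallback value 4 for SEGV_PKUERR is an assumption taken from the Linux UAPI siginfo.h.

```c
#define _GNU_SOURCE
#include <setjmp.h>
#include <signal.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef SEGV_PKUERR
#define SEGV_PKUERR 4   /* assumed fallback: value from Linux UAPI siginfo.h */
#endif

static sigjmp_buf fault_env;
static volatile sig_atomic_t fault_code;

static void segv_handler(int sig, siginfo_t *si, void *uctx)
{
    (void)sig;
    (void)uctx;
    fault_code = si->si_code;   /* SEGV_ACCERR, SEGV_PKUERR, ... */
    siglongjmp(fault_env, 1);   /* resume after the faulting store */
}

/* Store one byte to p; return 0 if it succeeded, else the fault's si_code. */
static int probe_write(volatile char *p)
{
    struct sigaction sa, old;
    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    fault_code = 0;
    if (sigsetjmp(fault_env, 1) == 0)   /* 1: restore the signal mask on jump */
        *p = 1;
    sigaction(SIGSEGV, &old, NULL);
    return fault_code;
}

static const char *fault_reason(int code)
{
    switch (code) {
    case SEGV_PKUERR: return "protection key";
    case SEGV_ACCERR: return "mprotect permissions";
    default:          return "other";
    }
}

/* Plain mprotect(PROT_NONE) case: expected to fault with SEGV_ACCERR. */
static int demo_mprotect_fault(void)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    char *ptr = mmap(NULL, page, PROT_READ | PROT_WRITE,
                     MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
    if (ptr == MAP_FAILED)
        return -1;
    int code = -1;
    if (mprotect(ptr, page, PROT_NONE) == 0)
        code = probe_write(ptr);
    munmap(ptr, page);
    return code;
}
```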

Note that kernel accesses from a kthread (such as io_uring) will use a default value for the protection key register and so will not be consistent with userspace’s value of the register or mprotect().


The Linux Kernel
