36. Using XSTATE features in user space applications
The x86 architecture supports floating-point extensions which are enumerated via CPUID. Applications consult CPUID and use XGETBV to evaluate which features have been enabled by the kernel in XCR0.
Up to AVX-512 and PKRU states, these features are automatically enabled by the kernel if available. Features like AMX TILE_DATA (XSTATE component 18) are enabled in XCR0 as well, but the first use of a related instruction is trapped by the kernel because, by default, the required large XSTATE buffers are not allocated automatically.
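The CPUID and XGETBV check described above can be sketched as follows. This is a minimal illustration, not kernel-provided code: the helper names `read_xcr0`, `xcr0_has` and `report_amx` and the `XFEATURE_*` macros are hypothetical, and the sketch assumes a GCC- or Clang-compatible compiler that ships `<cpuid.h>`. On non-x86 builds the helper simply reports failure.

```c
#include <stdint.h>
#include <stdio.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical names for XSTATE component numbers 17 and 18. */
#define XFEATURE_TILECFG  17
#define XFEATURE_TILEDATA 18

/* Read XCR0 into *xcr0. Returns 0 on success, -1 if XGETBV is unusable. */
static int read_xcr0(uint64_t *xcr0)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t lo, hi;

    /* CPUID.1:ECX.OSXSAVE (bit 27) tells us the OS has enabled XGETBV. */
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx) || !(ecx & (1u << 27)))
        return -1;

    /* XGETBV with ECX = 0 returns XCR0 in EDX:EAX. */
    __asm__ volatile("xgetbv" : "=a"(lo), "=d"(hi) : "c"(0u));
    *xcr0 = ((uint64_t)hi << 32) | lo;
    return 0;
#else
    (void)xcr0;
    return -1;
#endif
}

/* Return 1 if the given XSTATE component bit is set in the mask. */
static int xcr0_has(uint64_t xcr0, unsigned int component)
{
    return (int)((xcr0 >> component) & 1);
}

/* Print whether both AMX components are enabled in XCR0. */
static void report_amx(void)
{
    uint64_t xcr0;

    if (read_xcr0(&xcr0) != 0) {
        printf("XCR0 is not readable on this system.\n");
        return;
    }
    printf("XCR0 = %#llx, AMX %s in XCR0\n", (unsigned long long)xcr0,
           xcr0_has(xcr0, XFEATURE_TILECFG) &&
           xcr0_has(xcr0, XFEATURE_TILEDATA) ? "enabled" : "not enabled");
}
```

A set bit in XCR0 only shows that the state component is enabled; for dynamically enabled features the application still needs permission before the first use.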
36.1. The purpose for dynamic features
Legacy userspace libraries often have hard-coded, static sizes for alternate signal stacks, often using MINSIGSTKSZ, which is typically 2KB. That stack must be able to store at least the signal frame that the kernel sets up before jumping into the signal handler. That signal frame must include an XSAVE buffer defined by the CPU.
However, that means that the size of signal stacks is dynamic, not static, because different CPUs have differently-sized XSAVE buffers. A compiled-in size of 2KB in existing applications is too small for new CPU features like AMX. Instead of universally requiring larger stacks, with dynamic enabling the kernel can enforce that userspace applications have properly-sized altstacks.
36.2. Using dynamically enabled XSTATE features in user space applications
The kernel provides an arch_prctl(2) based mechanism for applications to request the usage of such features. The arch_prctl(2) options related to this are:
- ARCH_GET_XCOMP_SUPP
arch_prctl(ARCH_GET_XCOMP_SUPP, &features);
ARCH_GET_XCOMP_SUPP stores the supported features in userspace storage of type uint64_t. The second argument is a pointer to that storage.
- ARCH_GET_XCOMP_PERM
arch_prctl(ARCH_GET_XCOMP_PERM, &features);
ARCH_GET_XCOMP_PERM stores the features for which the userspace process has permission in userspace storage of type uint64_t. The second argument is a pointer to that storage.
- ARCH_REQ_XCOMP_PERM
arch_prctl(ARCH_REQ_XCOMP_PERM, feature_nr);
ARCH_REQ_XCOMP_PERM requests permission for a dynamically enabled feature or a feature set. A feature set can be mapped to a facility, e.g. AMX, and can require one or more XSTATE components to be enabled.
The feature argument is the number of the highest XSTATE component which is required for a facility to work.
When requesting permission for a feature, the kernel checks the availability. The kernel ensures that sigaltstacks in the process’s tasks are large enough to accommodate the resulting large signal frame. It enforces this both during ARCH_REQ_XCOMP_PERM and during any subsequent sigaltstack(2) calls. If an installed sigaltstack is smaller than the resulting sigframe size, ARCH_REQ_XCOMP_PERM results in -ENOSUPP. Also, sigaltstack(2) results in -ENOMEM if the requested altstack is too small for the permitted features.
Permission, when granted, is valid per process. Permissions are inherited on fork(2) and cleared on exec(3).
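As an illustration of the per-process permission and its inheritance on fork(2), the following sketch reads the permitted mask with ARCH_GET_XCOMP_PERM in a parent and in a forked child and compares the two. The helper names `get_xcomp_perm` and `perm_inherited_on_fork` are hypothetical, and the fallback value 0x1022 for ARCH_GET_XCOMP_PERM is an assumption taken from the x86 uapi header rather than from this document. On systems without arch_prctl(2) or without this option, the helpers report failure instead of a result.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef ARCH_GET_XCOMP_PERM
#define ARCH_GET_XCOMP_PERM 0x1022  /* assumed value, see lead-in */
#endif

/* Store the permitted feature mask in *mask.
 * Returns 0 on success, -1 with errno set otherwise. */
static int get_xcomp_perm(uint64_t *mask)
{
#ifdef SYS_arch_prctl
    return syscall(SYS_arch_prctl, ARCH_GET_XCOMP_PERM, mask) ? -1 : 0;
#else
    (void)mask;
    errno = ENOSYS;
    return -1;
#endif
}

/* Returns 1 if a forked child sees the same permitted mask as the parent,
 * 0 if the masks differ, -1 if this could not be determined. */
static int perm_inherited_on_fork(void)
{
    uint64_t parent = 0;
    int status;
    pid_t pid;

    if (get_xcomp_perm(&parent) != 0)
        return -1;

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: query again and report the comparison via exit status. */
        uint64_t child = 0;
        int same = get_xcomp_perm(&child) == 0 && child == parent;
        _exit(same ? 0 : 1);
    }

    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status) == 0;
}
```

Because permissions are cleared on exec, a program started from such a child has to request permission again itself.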
The first use of an instruction related to a dynamically enabled feature is trapped by the kernel. The trap handler checks whether the process has permission to use the feature. If the process has no permission, the kernel sends SIGILL to the application. If the process has permission, the handler allocates a larger xstate buffer for the task so the large state can be context switched. In the unlikely case that the allocation fails, the kernel sends SIGSEGV.
36.2.1. AMX TILE_DATA enabling example
Below is an example of how userspace applications enable TILE_DATA dynamically:
The application first needs to query the kernel for AMX support:
    #include <asm/prctl.h>
    #include <sys/syscall.h>
    #include <stdio.h>
    #include <unistd.h>

    #ifndef ARCH_GET_XCOMP_SUPP
    #define ARCH_GET_XCOMP_SUPP  0x1021
    #endif

    #ifndef ARCH_XCOMP_TILECFG
    #define ARCH_XCOMP_TILECFG   17
    #endif

    #ifndef ARCH_XCOMP_TILEDATA
    #define ARCH_XCOMP_TILEDATA  18
    #endif

    /* Both AMX components must be supported. */
    #define MASK_XCOMP_TILE      ((1 << ARCH_XCOMP_TILECFG) | \
                                  (1 << ARCH_XCOMP_TILEDATA))

    unsigned long features;
    long rc;

    ...

    rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_SUPP, &features);

    if (!rc && (features & MASK_XCOMP_TILE) == MASK_XCOMP_TILE)
        printf("AMX is available.\n");

After determining support for AMX, an application must explicitly ask permission to use it:

    #ifndef ARCH_REQ_XCOMP_PERM
    #define ARCH_REQ_XCOMP_PERM  0x1023
    #endif

    ...

    /* The argument is the highest required component number. */
    rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, ARCH_XCOMP_TILEDATA);

    if (!rc)
        printf("AMX is ready for use.\n");
Note that this example does not include the sigaltstack preparation.
36.3. Dynamic features in signal frames
Dynamically enabled features are not written to the signal frame upon signal entry if the feature is in its initial configuration. This differs from non-dynamic features, which are always written regardless of their configuration. Signal handlers can examine the XSAVE buffer’s XSTATE_BV field to determine if a feature was written.
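The XSTATE_BV check can be sketched as follows for x86-64 Linux. The layout details are assumptions not stated in this document: the handler treats `uc_mcontext.fpregs` as the start of the XSAVE buffer, looks for the FP_XSTATE_MAGIC1 marker in the software-reserved bytes at offset 464 of the legacy area, and reads XSTATE_BV as the first 8 bytes of the XSAVE header at offset 512. All names other than the libc and kernel ones are hypothetical, and on other architectures the handler only records that it ran.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <signal.h>
#include <stdint.h>
#include <string.h>
#include <ucontext.h>

#ifndef FP_XSTATE_MAGIC1
#define FP_XSTATE_MAGIC1 0x46505853U
#endif

#define SW_RESERVED_OFFSET  464  /* assumed: software bytes in legacy area */
#define XSAVE_HEADER_OFFSET 512  /* assumed: XSAVE header follows it */
#define XFEATURE_TILEDATA   18

static volatile sig_atomic_t handler_ran;
/* -1: no XSAVE data found, 0: TILE_DATA not written, 1: written */
static volatile sig_atomic_t tiledata_in_frame = -1;

static void on_signal(int sig, siginfo_t *info, void *ctx)
{
    (void)sig;
    (void)info;
    handler_ran = 1;
#if defined(__x86_64__) && defined(__linux__)
    const ucontext_t *uc = ctx;
    const unsigned char *fp = (const unsigned char *)uc->uc_mcontext.fpregs;
    uint32_t magic1;
    uint64_t xstate_bv;

    if (fp == NULL)
        return;

    /* Without the magic value, only legacy state is in the frame. */
    memcpy(&magic1, fp + SW_RESERVED_OFFSET, sizeof(magic1));
    if (magic1 != FP_XSTATE_MAGIC1)
        return;

    memcpy(&xstate_bv, fp + XSAVE_HEADER_OFFSET, sizeof(xstate_bv));
    tiledata_in_frame = (int)((xstate_bv >> XFEATURE_TILEDATA) & 1);
#else
    (void)ctx;
#endif
}

/* Install the handler for SIGUSR1 and deliver the signal once.
 * Returns 0 on success, non-zero on failure. */
static int probe_signal_frame(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = on_signal;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    return raise(SIGUSR1);
}
```

In a process that has never executed an AMX instruction, `tiledata_in_frame` is expected to stay at 0 or -1 after `probe_signal_frame()` returns.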
36.4. Dynamic features for virtual machines
The permission for the guest state component needs to be managed separately from the host, as they are exclusive to each other. A couple of additional options control the guest permission:
- ARCH_GET_XCOMP_GUEST_PERM
arch_prctl(ARCH_GET_XCOMP_GUEST_PERM, &features);
ARCH_GET_XCOMP_GUEST_PERM is a variant of ARCH_GET_XCOMP_PERM. It provides the same semantics and functionality, but for the guest components.
- ARCH_REQ_XCOMP_GUEST_PERM
arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, feature_nr);
ARCH_REQ_XCOMP_GUEST_PERM is a variant of ARCH_REQ_XCOMP_PERM. It has the same semantics for the guest permission. While providing similar functionality, it comes with a constraint: permission is frozen when the first VCPU is created. Any attempt to change permission after that point is rejected, so the permission has to be requested before the first VCPU creation.
Note that some VMMs may have already established a set of supported state components. These options are not presumed to support any particular VMM.
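A VMM-side sequence for the guest options might look like the sketch below, which requests guest permission for TILE_DATA and then confirms it with ARCH_GET_XCOMP_GUEST_PERM. The helper names `xcomp_prctl` and `enable_guest_tiledata` are hypothetical, and the fallback values 0x1024 and 0x1025 are assumptions taken from the x86 uapi header rather than from this document. The sketch contains no VMM code; it only assumes that the caller runs it before creating the first VCPU.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>

#ifndef ARCH_GET_XCOMP_GUEST_PERM
#define ARCH_GET_XCOMP_GUEST_PERM 0x1024  /* assumed value, see lead-in */
#endif
#ifndef ARCH_REQ_XCOMP_GUEST_PERM
#define ARCH_REQ_XCOMP_GUEST_PERM 0x1025  /* assumed value, see lead-in */
#endif
#ifndef ARCH_XCOMP_TILEDATA
#define ARCH_XCOMP_TILEDATA 18
#endif

/* Thin wrapper: returns 0 on success, -1 with errno set otherwise. */
static int xcomp_prctl(int option, unsigned long arg)
{
#ifdef SYS_arch_prctl
    return syscall(SYS_arch_prctl, option, arg) ? -1 : 0;
#else
    (void)option;
    (void)arg;
    errno = ENOSYS;
    return -1;
#endif
}

/* Request guest permission for TILE_DATA and verify that it was granted.
 * Intended to be called before the first VCPU is created, because the
 * guest permission is frozen from that point on.
 * Returns 0 if the guest may use TILE_DATA, -1 otherwise. */
static int enable_guest_tiledata(void)
{
    uint64_t perm = 0;

    if (xcomp_prctl(ARCH_REQ_XCOMP_GUEST_PERM, ARCH_XCOMP_TILEDATA) != 0)
        return -1;
    if (xcomp_prctl(ARCH_GET_XCOMP_GUEST_PERM, (unsigned long)&perm) != 0)
        return -1;
    return ((perm >> ARCH_XCOMP_TILEDATA) & 1) ? 0 : -1;
}
```

On hardware or kernels without AMX support the request fails and the function returns -1, in which case the VMM would continue without exposing the feature to the guest.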