33. Software Guard eXtensions (SGX)
33.1. Overview
Software Guard eXtensions (SGX) hardware enables user space applications to set aside private memory regions of code and data:
- Privileged (ring-0) ENCLS functions orchestrate the construction of the regions.
- Unprivileged (ring-3) ENCLU functions allow an application to enter and execute inside the regions.
These memory regions are called enclaves. An enclave can only be entered at a fixed set of entry points. Each entry point can hold a single hardware thread at a time. While the enclave is loaded from a regular binary file by using ENCLS functions, only the threads inside the enclave can access its memory. The CPU denies access to the region from outside, and the region is encrypted before it leaves the LLC.
SGX support can be determined with:
grep sgx /proc/cpuinfo
SGX must both be supported in the processor and enabled by the BIOS. If SGX appears to be unsupported on a system which has hardware support, ensure support is enabled in the BIOS. If a BIOS presents a choice between “Enabled” and “Software Enabled” modes for SGX, choose “Enabled”.
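The same check can be made from a program. The following is a minimal sketch, assuming a /proc/cpuinfo-style stream with a "flags" line; has_cpu_flag() is a hypothetical helper, not a kernel or libc interface. Unlike the grep command, it matches whole flag names only, so "sgx" does not match "sgx_lc".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: return true if `flag` appears as a whole word on a
 * "flags" line of a /proc/cpuinfo-style stream. Lines longer than the
 * buffer are only partially scanned in this sketch.
 */
bool has_cpu_flag(FILE *cpuinfo, const char *flag)
{
    char line[8192];

    assert(cpuinfo != NULL && flag != NULL);

    while (fgets(line, sizeof(line), cpuinfo)) {
        char *p;
        char *tok;

        if (strncmp(line, "flags", 5) != 0)
            continue;
        p = strchr(line, ':');
        if (!p)
            continue;
        /* Split the flag list on whitespace and compare each token. */
        for (tok = strtok(p + 1, " \t\n"); tok; tok = strtok(NULL, " \t\n"))
            if (strcmp(tok, flag) == 0)
                return true;
    }
    return false;
}
```

A caller would typically pass a stream opened with fopen("/proc/cpuinfo", "r"). A positive result only shows that the kernel reports the flag; it does not replace the BIOS check described above.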
33.2. Enclave Page Cache
SGX utilizes an Enclave Page Cache (EPC) to store pages that are associated with an enclave. It is contained in a BIOS-reserved region of physical memory. Unlike pages used for regular memory, EPC pages can only be accessed from outside of the enclave during enclave construction with special, limited SGX instructions.
Only a CPU executing inside an enclave can directly access enclave memory. However, a CPU executing inside an enclave may access normal memory outside the enclave.
The kernel manages enclave memory similar to how it treats device memory.
33.2.1. Enclave Page Types
- SGX Enclave Control Structure (SECS)
The enclave’s address range, attributes and other global data are defined by this structure.
- Regular (REG)
Regular EPC pages contain the code and data of an enclave.
- Thread Control Structure (TCS)
Thread Control Structure pages define the entry points to an enclave and track the execution state of an enclave thread.
- Version Array (VA)
Version Array pages contain 512 slots, each of which can contain a version number for a page evicted from the EPC.
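The slot count gives a simple lower bound on how many VA pages are needed to evict a given number of pages. The sketch below assumes 4 KiB EPC pages and 8-byte version slots, which is where the figure of 512 comes from; va_pages_needed() and VA_SLOTS_PER_PAGE are illustrative names, not kernel symbols.

```c
#include <assert.h>
#include <stddef.h>

/* 512 slots per VA page, as stated above. */
#define VA_SLOTS_PER_PAGE 512

/* Assumption: a 4096-byte page divided into 8-byte version slots. */
static_assert(4096 / 8 == VA_SLOTS_PER_PAGE, "slot count mismatch");

/*
 * Hypothetical helper: minimum number of VA pages needed to hold one
 * version slot for each of `evictable_pages` pages (rounded up).
 */
size_t va_pages_needed(size_t evictable_pages)
{
    return evictable_pages / VA_SLOTS_PER_PAGE +
           (evictable_pages % VA_SLOTS_PER_PAGE != 0);
}
```

For example, an enclave with 513 evictable pages needs at least two VA pages under these assumptions.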
33.2.2. Enclave Page Cache Map
The processor tracks EPC pages in a hardware metadata structure called the Enclave Page Cache Map (EPCM). The EPCM contains an entry for each EPC page which describes the owning enclave, access rights and page type, among other things.
EPCM permissions are separate from the normal page tables. This prevents the kernel from, for instance, allowing writes to data which an enclave wishes to remain read-only. EPCM permissions may only impose additional restrictions on top of normal x86 page permissions.
For all intents and purposes, the SGX architecture allows the processor to invalidate all EPCM entries at will. This requires that software be prepared to handle an EPCM fault at any time. In practice, this can happen on events like power transitions when the ephemeral key that encrypts enclave memory is lost.
33.3. Application interface
33.3.1. Enclave build functions
In addition to the traditional compiler and linker build process, SGX has a separate enclave “build” process. Enclaves must be built before they can be executed (entered). The first step in building an enclave is opening the /dev/sgx_enclave device. Since enclave memory is protected from direct access, special privileged instructions are then used to copy data into enclave pages and establish enclave page permissions.
- long sgx_ioc_enclave_create(struct sgx_encl *encl, void __user *arg)
handler for SGX_IOC_ENCLAVE_CREATE
Parameters
struct sgx_encl *encl
An enclave pointer.
void __user *arg
The ioctl argument.
Description
Allocate kernel data structures for the enclave and invoke ECREATE.
Return
0: Success.
-EIO: ECREATE failed.
-errno: POSIX error.
- long sgx_ioc_enclave_add_pages(struct sgx_encl *encl, void __user *arg)
The handler for SGX_IOC_ENCLAVE_ADD_PAGES
Parameters
struct sgx_encl *encl
an enclave pointer
void __user *arg
a user pointer to a struct sgx_enclave_add_pages instance
Description
Add one or more pages to an uninitialized enclave, and optionally extend the measurement with the contents of the page. The SECINFO and measurement mask are applied to all pages.
A SECINFO for a TCS is required to always contain zero permissions because the CPU silently zeros them. Allowing anything else would cause a mismatch in the measurement.
mmap()’s protection bits are capped by the page permissions. For each page address, the maximum protection bits are computed with the following heuristics:
- A regular page: PROT_R, PROT_W and PROT_X match the SECINFO permissions.
- A TCS page: PROT_R | PROT_W.
mmap() is not allowed to surpass the minimum of the maximum protection bits within the given address range.
The function deinitializes the kernel data structures for the enclave and returns -EIO in any of the following conditions:
- The Enclave Page Cache (EPC), the physical memory holding enclaves, has been invalidated. This will cause EADD and EEXTEND to fail.
- The source address is corrupted somehow when executing EADD.
Return
0: Success.
-EACCES: The source page is located in a noexec partition.
-ENOMEM: Out of EPC pages.
-EINTR: The call was interrupted before data was processed.
-EIO: Either EADD or EEXTEND failed because of an invalid source address or a power cycle.
-errno: POSIX error.
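The mmap() capping rule described for SGX_IOC_ENCLAVE_ADD_PAGES can be read as a bitwise intersection: the "minimum" of the per-page maximum protection bits over a range is the set of bits allowed on every page in it. The following sketch illustrates that reading in user space. mmap_prot_allowed() and the max_prot array are hypothetical, and the kernel's actual bookkeeping is not shown in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical check: `max_prot[i]` holds the maximum protection bits of
 * page i in the range (for a TCS page, PROT_READ | PROT_WRITE; for a
 * regular page, the bits matching its SECINFO permissions). Return true
 * if a mapping with `prot` would not exceed the cap on any page.
 */
bool mmap_prot_allowed(const int *max_prot, size_t count, int prot)
{
    int allowed = PROT_READ | PROT_WRITE | PROT_EXEC;
    size_t i;

    assert(max_prot != NULL || count == 0);

    /* Intersect the caps: a bit survives only if every page allows it. */
    for (i = 0; i < count; i++)
        allowed &= max_prot[i];

    /* The request is fine if it asks for no bit outside the intersection. */
    return (prot & ~allowed) == 0;
}
```

With a range made of one read-write page and one read-execute page, only PROT_READ (or PROT_NONE) passes this check, because neither write nor execute is permitted on both pages.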
- long sgx_ioc_enclave_init(struct sgx_encl *encl, void __user *arg)
handler for SGX_IOC_ENCLAVE_INIT
Parameters
struct sgx_encl *encl
an enclave pointer
void __user *arg
userspace pointer to a struct sgx_enclave_init instance
Description
Flush any outstanding enqueued EADD operations and perform EINIT. The Launch Enclave Public Key Hash MSRs are rewritten as necessary to match the enclave’s MRSIGNER, which is calculated from the provided sigstruct.
Return
0: Success.
-EPERM: Invalid SIGSTRUCT.
-EIO: EINIT failed because of a power cycle.
-errno: POSIX error.
- long sgx_ioc_enclave_provision(struct sgx_encl *encl, void __user *arg)
handler for SGX_IOC_ENCLAVE_PROVISION
Parameters
struct sgx_encl *encl
an enclave pointer
void __user *arg
userspace pointer to a struct sgx_enclave_provision instance
Description
Allow ATTRIBUTE.PROVISION_KEY for an enclave by providing a file handle to /dev/sgx_provision.
Return
0: Success.
-errno: Otherwise.
33.3.2. Enclave runtime management
Systems supporting SGX2 additionally support changes to initialized enclaves: modifying enclave page permissions and type, and dynamically adding and removing enclave pages. When an enclave accesses an address within its address range that does not have a backing page, a new regular page is dynamically added to the enclave. The enclave is still required to run EACCEPT on the new page before it can be used.
- long sgx_ioc_enclave_restrict_permissions(struct sgx_encl *encl, void __user *arg)
handler for SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS
Parameters
struct sgx_encl *encl
an enclave pointer
void __user *arg
userspace pointer to a struct sgx_enclave_restrict_permissions instance
Description
SGX2 distinguishes between relaxing and restricting the enclave page permissions maintained by the hardware (EPCM permissions) of pages belonging to an initialized enclave (after SGX_IOC_ENCLAVE_INIT).
EPCM permissions cannot be restricted from within the enclave; the enclave requires the kernel to run the privileged level 0 instructions ENCLS[EMODPR] and ENCLS[ETRACK]. An attempt to relax EPCM permissions with this call will be ignored by the hardware.
Return
0: Success
-errno: Otherwise
- long sgx_ioc_enclave_modify_types(struct sgx_encl *encl, void __user *arg)
handler for SGX_IOC_ENCLAVE_MODIFY_TYPES
Parameters
struct sgx_encl *encl
an enclave pointer
void __user *arg
userspace pointer to a struct sgx_enclave_modify_types instance
Description
The ability to change the enclave page type supports the following use cases:
- It is possible to add TCS pages to an enclave by changing the type of regular pages (SGX_PAGE_TYPE_REG) to TCS (SGX_PAGE_TYPE_TCS) pages. With this support the number of threads supported by an initialized enclave can be increased dynamically.
- Regular or TCS pages can dynamically be removed from an initialized enclave by changing the page type to SGX_PAGE_TYPE_TRIM. Changing the page type to SGX_PAGE_TYPE_TRIM marks the page for removal; the actual removal is done by the handler of the SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl(), called after ENCLU[EACCEPT] is run on the SGX_PAGE_TYPE_TRIM page from within the enclave.
Return
0: Success
-errno: Otherwise
- long sgx_ioc_enclave_remove_pages(struct sgx_encl *encl, void __user *arg)
handler for SGX_IOC_ENCLAVE_REMOVE_PAGES
Parameters
struct sgx_encl *encl
an enclave pointer
void __user *arg
userspace pointer to a struct sgx_enclave_remove_pages instance
Description
Final step of the flow removing pages from an initialized enclave. The complete flow is:
1. User changes the type of the pages to be removed to SGX_PAGE_TYPE_TRIM using the SGX_IOC_ENCLAVE_MODIFY_TYPES ioctl().
2. User approves the page removal by running ENCLU[EACCEPT] from within the enclave.
3. User initiates actual page removal using the SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl() that is handled here.
First remove any page table entries pointing to the page and then proceed with the actual removal of the enclave page and data in support of it.
VA pages are not affected by this removal. It is thus possible that the enclave may end up with more VA pages than needed to support all its pages.
Return
0: Success
-errno: Otherwise
33.3.3. Enclave vDSO
Entering an enclave can only be done through SGX-specific EENTER and ERESUME functions, and is a non-trivial process. Because of the complexity of transitioning to and from an enclave, enclaves typically utilize a library to handle the actual transitions. This is roughly analogous to how glibc implementations are used by most applications to wrap system calls.
Another crucial characteristic of enclaves is that they can generate exceptions as part of their normal operation that need to be handled in the enclave or are unique to SGX.
Instead of the traditional signal mechanism to handle these exceptions, SGX can leverage special exception fixup provided by the vDSO. The kernel-provided vDSO function wraps low-level transitions to/from the enclave like EENTER and ERESUME. The vDSO function intercepts exceptions that would otherwise generate a signal and returns the fault information directly to its caller. This avoids the need to juggle signal handlers.
- vdso_sgx_enter_enclave_t
Typedef: Prototype for __vdso_sgx_enter_enclave(), a vDSO function to enter an SGX enclave.
Syntax
int vdso_sgx_enter_enclave_t(unsigned long rdi, unsigned long rsi, unsigned long rdx, unsigned int function, unsigned long r8, unsigned long r9, struct sgx_enclave_run *run)
Parameters
unsigned long rdi
Pass-through value for RDI
unsigned long rsi
Pass-through value for RSI
unsigned long rdx
Pass-through value for RDX
unsigned int function
ENCLU function, must be EENTER or ERESUME
unsigned long r8
Pass-through value for R8
unsigned long r9
Pass-through value for R9
struct sgx_enclave_run *run
struct sgx_enclave_run, must be non-NULL
NOTE
__vdso_sgx_enter_enclave() does not ensure full compliance with the x86-64 ABI, e.g. doesn’t handle XSAVE state. Except for non-volatile general purpose registers, EFLAGS.DF, and RSP alignment, preserving/setting state in accordance with the x86-64 ABI is the responsibility of the enclave and its runtime, i.e. __vdso_sgx_enter_enclave() cannot be called from C code without careful consideration by both the enclave and its runtime.
All general purpose registers except RAX, RBX and RCX are passed as-is to the enclave. RAX, RBX and RCX are consumed by EENTER and ERESUME and are loaded with function, asynchronous exit pointer, and run.tcs respectively.
RBP and the stack are used to anchor __vdso_sgx_enter_enclave() to the pre-enclave state, e.g. to retrieve run.exception and run.user_handler after an enclave exit. All other registers are available for use by the enclave and its runtime, e.g. an enclave can push additional data onto the stack (and modify RSP) to pass information to the optional user handler (see below).
Most exceptions reported on ENCLU, including those that occur within the enclave, are fixed up and reported synchronously instead of being delivered via a standard signal. Debug Exceptions (#DB) and Breakpoints (#BP) are never fixed up and are always delivered via standard signals. On synchronously reported exceptions, -EFAULT is returned and details about the exception are recorded in run.exception, the optional sgx_enclave_exception struct.
Return
0: ENCLU function was successfully executed.
-EINVAL: Invalid ENCLU function number (neither EENTER nor ERESUME).
33.4. ksgxd
SGX support includes a kernel thread called ksgxd.
33.4.1. EPC sanitization
ksgxd is started when SGX initializes. Enclave memory is typically ready for use when the processor powers on or resets. However, if SGX has been in use since the reset, enclave pages may be in an inconsistent state. This might occur after a crash and kexec() cycle, for instance. At boot, ksgxd reinitializes all enclave pages so that they can be allocated and re-used.
The sanitization is done by going through EPC address space and applying the EREMOVE function to each physical page. Some enclave pages like SECS pages have hardware dependencies on other pages which prevents EREMOVE from functioning. Executing two EREMOVE passes removes the dependencies.
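Why two passes are enough can be seen in a toy model. The sketch below is a user-space simulation only, not kernel code: model_eremove() stands in for EREMOVE and is assumed to fail on an SECS page while any of its child pages is still valid, which is the dependency described above. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of an EPC page left over from an earlier use of SGX. */
struct model_page {
    bool valid;   /* page still holds stale enclave state */
    bool is_secs; /* page is an SECS page */
    int secs;     /* index of the owning SECS page, or -1 */
};

/* Model of EREMOVE: fails on an SECS while a child page is still valid. */
int model_eremove(struct model_page *pages, size_t n, size_t idx)
{
    size_t j;

    assert(pages != NULL && idx < n);

    if (!pages[idx].valid)
        return 0;
    if (pages[idx].is_secs)
        for (j = 0; j < n; j++)
            if (pages[j].valid && pages[j].secs == (int)idx)
                return -1;
    pages[idx].valid = false;
    return 0;
}

/* Sweep all pages `passes` times; return how many pages are still valid. */
size_t model_sanitize(struct model_page *pages, size_t n, int passes)
{
    size_t left = 0;
    size_t i;
    int p;

    for (p = 0; p < passes; p++)
        for (i = 0; i < n; i++)
            model_eremove(pages, n, i);

    for (i = 0; i < n; i++)
        left += pages[i].valid;
    return left;
}
```

In this model, a single sweep that meets an SECS page before its children leaves the SECS page behind. The first pass clears the children, so the second pass can clear the SECS page.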
33.4.2. Page reclaimer
Similar to the core kswapd, ksgxd is responsible for managing the overcommitment of enclave memory. If the system runs out of enclave memory, ksgxd “swaps” enclave memory to normal memory.
33.5. Launch Control
SGX provides a launch control mechanism. After all enclave pages have been copied, the kernel executes the EINIT function, which initializes the enclave. Only after this can the CPU execute inside the enclave.
The EINIT function takes an RSA-3072 signature of the enclave measurement. The function checks that the measurement is correct and that the signature was made with the key whose SHA256 hash is held in the four IA32_SGXLEPUBKEYHASH{0, 1, 2, 3} MSRs.
Those MSRs can be configured by the BIOS to be either readable or writable. Linux supports only the writable configuration in order to give full control to the kernel on launch control policy. Before calling the EINIT function, the driver sets the MSRs to match the enclave’s signing key.
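A 256-bit hash spread over four 64-bit MSRs implies a split into four 8-byte words. The sketch below shows one such split, under the assumption (not stated in the text) that the digest bytes are packed in little-endian order with MSR 0 holding the first eight bytes. pubkey_hash_to_msr_values() is a hypothetical helper; computing the digest and writing the MSRs are out of scope here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: pack a 32-byte SHA-256 digest into four 64-bit
 * values, one per IA32_SGXLEPUBKEYHASH{0, 1, 2, 3} MSR.
 * Assumption: little-endian byte order within each 64-bit value.
 */
void pubkey_hash_to_msr_values(const uint8_t digest[32], uint64_t msr[4])
{
    size_t i, b;

    assert(digest != NULL && msr != NULL);

    for (i = 0; i < 4; i++) {
        msr[i] = 0;
        /* Byte b of word i lands in bits [8*b, 8*b + 7]. */
        for (b = 0; b < 8; b++)
            msr[i] |= (uint64_t)digest[i * 8 + b] << (8 * b);
    }
}
```

Building each word with shifts rather than memcpy() keeps the result independent of the host's byte order.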
33.6. Encryption engines
In order to conceal the enclave data while it is out of the CPU package, the memory controller has an encryption engine to transparently encrypt and decrypt enclave memory.
In CPUs prior to Ice Lake, the Memory Encryption Engine (MEE) is used to encrypt pages leaving the CPU caches. MEE uses an n-ary Merkle tree with its root in SRAM to maintain integrity of the encrypted data. This provides integrity and anti-replay protection but does not scale to large memory sizes because the time required to update the Merkle tree grows logarithmically in relation to the memory size.
CPUs starting from Ice Lake use Total Memory Encryption (TME) in place of MEE. TME-based SGX implementations do not have an integrity Merkle tree, which means integrity and replay attacks are not mitigated. However, TME includes additional changes to prevent cipher text from being returned and SW memory aliases from being created.
DMA to enclave memory is blocked by range registers on both MEE and TME systems (SDM section 41.10).
33.7. Usage Models
33.7.1. Shared Library
Sensitive data and the code that acts on it are partitioned from the application into a separate library. The library is then linked as a DSO which can be loaded into an enclave. The application can then make individual function calls into the enclave through special SGX instructions. A run-time within the enclave is configured to marshal function parameters into and out of the enclave and to call the correct library function.
33.7.2. Application Container
An application may be loaded into a container enclave which is specially configured with a library OS and run-time which permits the application to run. The enclave run-time and library OS work together to execute the application when a thread enters the enclave.
33.8. Impact of Potential Kernel SGX Bugs
33.8.1. EPC leaks
When EPC page leaks happen, a WARNING like this is shown in dmesg:
“EREMOVE returned ... and an EPC page was leaked. SGX may become unusable...”
This is effectively a kernel use-after-free of an EPC page, and due to the way SGX works, the bug is detected at freeing. Rather than adding the page back to the pool of available EPC pages, the kernel intentionally leaks the page to avoid additional errors in the future.
When this happens, the kernel will likely soon leak more EPC pages, and SGX will likely become unusable because the memory available to SGX is limited. However, while this may be fatal to SGX, the rest of the kernel is unlikely to be impacted and should continue to work.
As a result, when this happens, users should stop running any new SGX workloads (or just any new workloads) and migrate all valuable workloads. Although a machine reboot can recover all EPC memory, the bug should be reported to Linux developers.
33.9. Virtual EPC
The implementation also has a virtual EPC driver to support SGX enclaves in guests. Unlike the SGX driver, an EPC page allocated by the virtual EPC driver doesn’t have a specific enclave associated with it. This is because KVM doesn’t track how a guest uses EPC pages.
As a result, the SGX core page reclaimer doesn’t support reclaiming EPC pages allocated to KVM guests through the virtual EPC driver. If the user wants to deploy SGX applications both on the host and in guests on the same machine, the user should reserve enough EPC (by subtracting the total virtual EPC size of all SGX VMs from the physical EPC size) for host SGX applications so they can run with acceptable performance.
Architectural behavior is to restore all EPC pages to an uninitialized state also after a guest reboot. Because this state can be reached only through the privileged ENCLS[EREMOVE] instruction, /dev/sgx_vepc provides the SGX_IOC_VEPC_REMOVE_ALL ioctl to execute the instruction on all pages in the virtual EPC.
EREMOVE can fail for three reasons. Userspace must pay attention to expected failures and handle them as follows:
1. Page removal will always fail when any thread is running in the enclave to which the page belongs. In this case the ioctl will return EBUSY independent of whether it has successfully removed some pages; userspace can avoid these failures by preventing execution of any vcpu which maps the virtual EPC.
2. Page removal will cause a general protection fault if two calls to EREMOVE happen concurrently for pages that refer to the same “SECS” metadata pages. This can happen if there are concurrent invocations of SGX_IOC_VEPC_REMOVE_ALL, or if a /dev/sgx_vepc file descriptor in the guest is closed at the same time as SGX_IOC_VEPC_REMOVE_ALL; it will also be reported as EBUSY. This can be avoided in userspace by serializing calls to the ioctl() and to close(), but in general it should not be a problem.
3. Finally, page removal will fail for SECS metadata pages which still have child pages. Child pages can be removed by executing SGX_IOC_VEPC_REMOVE_ALL on all /dev/sgx_vepc file descriptors mapped into the guest. This means that the ioctl() must be called twice: an initial set of calls to remove child pages and a subsequent set of calls to remove SECS pages. The second set of calls is only required for those mappings that returned a nonzero value from the first call. It indicates a bug in the kernel or the userspace client if any of the second round of SGX_IOC_VEPC_REMOVE_ALL calls has a return code other than 0.
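The two rounds of calls can be sketched as follows. To keep the example runnable without SGX hardware, the ioctl is abstracted behind a callback. vepc_reset(), remove_all_fn and the fake_* test double are hypothetical names; in real use the callback would wrap ioctl(fd, SGX_IOC_VEPC_REMOVE_ALL), and the caller is assumed to have stopped all vcpus and serialized calls beforehand so that EBUSY does not occur.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper type: returns the result of the REMOVE_ALL call. */
typedef long (*remove_all_fn)(int fd);

/*
 * Run the two rounds over all vEPC descriptors of one guest.
 * Returns 0 on success, -1 on allocation failure or if any second-round
 * call returns nonzero (which the text treats as a bug).
 */
int vepc_reset(const int *fds, size_t n, remove_all_fn remove_all)
{
    bool *retry;
    size_t i;
    int ret = 0;

    assert(fds != NULL && remove_all != NULL);

    retry = calloc(n ? n : 1, sizeof(*retry));
    if (!retry)
        return -1;

    /* Round 1: removes child pages on every descriptor. */
    for (i = 0; i < n; i++)
        retry[i] = remove_all(fds[i]) != 0;

    /* Round 2: only for descriptors whose first call returned nonzero. */
    for (i = 0; i < n; i++)
        if (retry[i] && remove_all(fds[i]) != 0)
            ret = -1;

    free(retry);
    return ret;
}

/*
 * Test double, not a real ioctl: the first call on a descriptor reports
 * its SECS pages as not removed; later calls succeed.
 */
long fake_secs_left[4] = { 0, 2, 0, 1 };
int fake_calls[4];

long fake_remove_all(int fd)
{
    long left = fake_secs_left[fd];

    fake_calls[fd]++;
    fake_secs_left[fd] = 0;
    return left;
}
```

Round 1 is completed on every descriptor before round 2 starts, since an SECS page behind one descriptor may have child pages behind another.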