Hypercall Op-codes (hcalls)
Overview
Virtualization on 64-bit Power Book3S platforms is based on the PAPR specification [1], which describes the run-time environment for a guest operating system and how it should interact with the hypervisor for privileged operations. Currently there are two PAPR-compliant hypervisors:
IBM PowerVM (PHYP): IBM’s proprietary hypervisor that supports AIX, IBM-i and Linux as supported guests (termed Logical Partitions or LPARs). It supports the full PAPR specification.
Qemu/KVM: Supports PPC64 Linux guests running on a PPC64 Linux host, though it only implements a subset of the PAPR specification called LoPAPR [2].
On the PPC64 arch, a guest kernel running on top of a PAPR hypervisor is called a pSeries guest. A pseries guest runs in supervisor mode (HV=0) and must issue hypercalls to the hypervisor whenever it needs to perform an action that is hypervisor privileged [3] or needs other services managed by the hypervisor.
Hence a Hypercall (hcall) is essentially a request by the pseries guest asking the hypervisor to perform a privileged operation on behalf of the guest. The guest issues an hcall with the necessary input operands. After performing the privileged operation, the hypervisor returns a status code and output operands back to the guest.
HCALL ABI
The ABI specification for an hcall between a pseries guest and a PAPR hypervisor is covered in section 14.5.3 of ref [2]. Switching to the hypervisor context is done via the HVSC instruction, which expects the opcode for the hcall to be set in r3 and any in-arguments for the hcall to be provided in registers r4-r12. If values have to be passed through a memory buffer, the data stored in that buffer should be in big-endian byte order.
Once control returns to the guest after the hypervisor has serviced the HVSC instruction, the return value of the hcall is available in r3 and any out values are returned in registers r4-r12. As with in-arguments, any out values stored in a memory buffer will be in big-endian byte order.
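Because buffer contents must be big-endian regardless of the guest's own endianness, a little-endian guest has to byte-swap anything it places in, or reads from, such a buffer. The following is a minimal, portable sketch in plain C; the helper names `put_be64` and `get_be64` are hypothetical (inside the kernel the `cpu_to_be64()`/`be64_to_cpu()` family would normally be used instead).

```c
#include <stdint.h>
#include <stddef.h>

/* Store a 64-bit value into buf in big-endian order (most significant
 * byte first), independent of the host's native endianness. */
void put_be64(uint8_t *buf, uint64_t val)
{
	for (size_t i = 0; i < 8; i++)
		buf[i] = (uint8_t)(val >> (56 - 8 * i));
}

/* Load a big-endian 64-bit value from buf into native representation. */
uint64_t get_be64(const uint8_t *buf)
{
	uint64_t val = 0;

	for (size_t i = 0; i < 8; i++)
		val = (val << 8) | buf[i];
	return val;
}
```

Writing through byte shifts rather than casting the buffer to `uint64_t *` keeps the sketch free of alignment and endianness assumptions about the host.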
PowerPC arch code provides convenient wrappers named plpar_hcall_xxx, defined in an arch-specific header [4], to issue hcalls from the Linux kernel running as a pseries guest.
Register Conventions
Any hcall should follow the same register conventions as described in section 2.2.1.1 of “64-Bit ELF V2 ABI Specification: Power Architecture” [5]. The table below summarizes these conventions:
| Register Range | Volatile (Y/N) | Purpose |
|---|---|---|
| r0 | Y | Optional-usage |
| r1 | N | Stack Pointer |
| r2 | N | TOC |
| r3 | Y | hcall opcode/return value |
| r4-r10 | Y | in and out values |
| r11 | Y | Optional-usage/Environmental pointer |
| r12 | Y | Optional-usage/Function entry address at global entry point |
| r13 | N | Thread-Pointer |
| r14-r31 | N | Local Variables |
| LR | Y | Link Register |
| CTR | Y | Loop Counter |
| XER | Y | Fixed-point exception register. |
| CR0-1 | Y | Condition register fields. |
| CR2-4 | N | Condition register fields. |
| CR5-7 | Y | Condition register fields. |
| Others | N | |
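The volatility column for general-purpose registers can be encoded as a small lookup, for example when reasoning about which GPRs a caller must assume are clobbered across an hcall. This is an illustrative sketch covering only r0-r31 from the table above; the function name `gpr_is_volatile` is hypothetical.

```c
#include <stdbool.h>

/* Return true if GPR rN is volatile (caller-saved) across an hcall,
 * following the register table; false if it is non-volatile or if n is
 * not a valid GPR number. */
bool gpr_is_volatile(int n)
{
	if (n < 0 || n > 31)
		return false;
	switch (n) {
	case 1:		/* stack pointer */
	case 2:		/* TOC */
	case 13:	/* thread pointer */
		return false;
	default:
		/* r0 and r3-r12 are volatile; r14-r31 hold local variables. */
		return n <= 12;
	}
}
```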
DRC & DRC Indexes

    DR1                                  Guest
    +--+        +------------+         +---------+
    |  | <----> |            |         |  User   |
    +--+  DRC1  |            |   DRC   |  Space  |
                |    PAPR    |  Index  +---------+
    DR2         | Hypervisor |         |         |
    +--+        |            | <-----> |  Kernel |
    |  | <----> |            |  Hcall  |         |
    +--+  DRC2  +------------+         +---------+

The PAPR hypervisor terms shared hardware resources like PCI devices, NVDIMMs etc. that are available for use by LPARs as Dynamic Resources (DR). When a DR is allocated to an LPAR, PHYP creates a data structure called a Dynamic Resource Connector (DRC) to manage LPAR access. An LPAR refers to a DRC via an opaque 32-bit number called the DRC-Index. The DRC-Index value is provided to the LPAR via the device tree, where it is present as an attribute in the device-tree node associated with the DR.
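Device-tree property values are stored as big-endian 32-bit cells, so reading a DRC-Index out of a raw property amounts to decoding one such cell. The sketch below assumes the caller already has the raw property bytes; the property name `ibm,my-drc-index` mentioned in the comment and the function name `parse_drc_index` are assumptions for illustration (inside the kernel, a helper such as `of_property_read_u32()` would normally do this).

```c
#include <stdint.h>
#include <stddef.h>

/* Decode a DRC index from a raw device-tree property value (assumed here
 * to be something like "ibm,my-drc-index"). The index is one big-endian
 * 32-bit cell. Returns 0 on success, -1 if the property is missing or has
 * the wrong length. */
int parse_drc_index(const uint8_t *prop, size_t len, uint32_t *drc_index)
{
	if (prop == NULL || len != 4)
		return -1;
	*drc_index = ((uint32_t)prop[0] << 24) | ((uint32_t)prop[1] << 16) |
		     ((uint32_t)prop[2] << 8) | (uint32_t)prop[3];
	return 0;
}
```

The decoded value is treated as opaque: the guest only hands it back to the hypervisor in subsequent hcalls.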
HCALL Return-values
After servicing the hcall, the hypervisor sets the return value in r3, indicating success or failure of the hcall. In case of a failure, an error code indicates the cause of the error. These codes are defined and documented in the arch-specific header [4].
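Callers usually need to sort the returned status into "done", "try again" and "give up". The sketch below does that with constants carrying a hypothetical `EX_` prefix; their numeric values are illustrative assumptions, so real code should use the definitions in [4]. Treating H_BUSY, H_CONTINUE and a "long busy" range as retryable is also an assumption of this example.

```c
/* Placeholder status values for this sketch only; use the real
 * definitions from the arch header in actual code. */
#define EX_H_SUCCESS		0L
#define EX_H_BUSY		1L
#define EX_H_CONTINUE		18L
#define EX_H_LONG_BUSY_FIRST	9900L
#define EX_H_LONG_BUSY_LAST	9905L

enum hcall_outcome {
	HCALL_OK,	/* hcall completed successfully */
	HCALL_RETRY,	/* hypervisor asks the guest to re-issue the hcall */
	HCALL_FAILED,	/* any other value is treated as an error */
};

/* Map a raw r3 return value onto one of the three outcomes above. */
enum hcall_outcome classify_hcall_rc(long rc)
{
	if (rc == EX_H_SUCCESS)
		return HCALL_OK;
	if (rc == EX_H_BUSY || rc == EX_H_CONTINUE ||
	    (rc >= EX_H_LONG_BUSY_FIRST && rc <= EX_H_LONG_BUSY_LAST))
		return HCALL_RETRY;
	return HCALL_FAILED;
}
```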
In some cases an hcall can potentially take a long time and need to be issued multiple times in order to be completely serviced. These hcalls will usually accept an opaque value continue-token within their argument list, and a return value of H_CONTINUE indicates that the hypervisor has not yet finished servicing the hcall.
To make such hcalls, the guest needs to set continue-token == 0 for the initial call and use the hypervisor-returned value of continue-token for each subsequent hcall until the hypervisor returns a non-H_CONTINUE return value.
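The continue-token protocol is a simple loop: start with token 0 and keep feeding back whatever token the hypervisor returns while the status is H_CONTINUE. This is a sketch under stated assumptions: the hcall is abstracted behind a hypothetical `hcall_fn` callback, `mock_hcall` stands in for the hypervisor, and the `EX_H_*` values are placeholders rather than the real constants from [4].

```c
#include <stdint.h>

/* Placeholder status codes for the sketch (real values: arch header). */
#define EX_H_SUCCESS	0L
#define EX_H_CONTINUE	18L

/* Hypothetical hcall issuer: takes the continue-token in, writes the next
 * continue-token out, and returns the hcall status. */
typedef long (*hcall_fn)(uint64_t token_in, uint64_t *token_out);

/* Issue the hcall with token 0, then re-issue it with the token returned
 * by the hypervisor until the status is not H_CONTINUE. *calls receives
 * the number of times the hcall was issued. */
long hcall_until_done(hcall_fn fn, unsigned int *calls)
{
	uint64_t token = 0;
	long rc;

	*calls = 0;
	do {
		uint64_t next = 0;

		rc = fn(token, &next);
		(*calls)++;
		token = next;
	} while (rc == EX_H_CONTINUE);
	return rc;
}

/* Mock hypervisor: needs three invocations to finish and uses the amount
 * of work done so far as its opaque token. */
long mock_hcall(uint64_t token_in, uint64_t *token_out)
{
	if (token_in < 2) {
		*token_out = token_in + 1;
		return EX_H_CONTINUE;
	}
	*token_out = 0;
	return EX_H_SUCCESS;
}
```

A real caller would likely also yield the CPU between iterations and bound the number of retries; both are omitted here to keep the protocol visible.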
HCALL Op-codes
Below is a partial list of HCALLs that are supported by PHYP. For the corresponding opcode values, please look into the arch-specific header [4]:
H_SCM_READ_METADATA
Given a DRC Index of an NVDIMM, read N bytes from the metadata area associated with it, at a specified offset, and copy them to the provided buffer. The metadata area stores configuration information such as label information, bad blocks etc. The metadata area is located out-of-band of the NVDIMM storage area, hence separate access semantics are provided.
H_SCM_WRITE_METADATA
Given a DRC Index of an NVDIMM, write N bytes to the metadata area associated with it, at the specified offset, from the provided buffer.
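Seen from the guest, the metadata read/write pair behaves like a bounds-checked copy into and out of an out-of-band area addressed by (DRC index, offset, length). The sketch below models a single NVDIMM in memory; the DRC index `EX_DRC_INDEX`, the area size, the placeholder status values and the function names are all hypothetical.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define EX_H_SUCCESS	0L
#define EX_H_PARAMETER	-4L	/* placeholder for a bad-argument status */
#define EX_META_SIZE	128	/* hypothetical metadata area size */
#define EX_DRC_INDEX	0x1234u	/* hypothetical DRC index of the NVDIMM */

/* Stand-in for the out-of-band metadata area of one NVDIMM. */
static uint8_t meta_area[EX_META_SIZE];

/* Accept only requests for the known NVDIMM that lie entirely inside the
 * metadata area; written so that offset + len cannot overflow. */
static int meta_request_ok(uint32_t drc, size_t offset, size_t len)
{
	return drc == EX_DRC_INDEX && offset <= EX_META_SIZE &&
	       len <= EX_META_SIZE - offset;
}

/* Model of H_SCM_READ_METADATA: copy len bytes at offset into buf. */
long scm_read_metadata(uint32_t drc, size_t offset, void *buf, size_t len)
{
	if (!meta_request_ok(drc, offset, len))
		return EX_H_PARAMETER;
	memcpy(buf, meta_area + offset, len);
	return EX_H_SUCCESS;
}

/* Model of H_SCM_WRITE_METADATA: copy len bytes from buf to offset. */
long scm_write_metadata(uint32_t drc, size_t offset, const void *buf,
			size_t len)
{
	if (!meta_request_ok(drc, offset, len))
		return EX_H_PARAMETER;
	memcpy(meta_area + offset, buf, len);
	return EX_H_SUCCESS;
}
```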
H_SCM_BIND_MEM
Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks (startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind) to the guest at targetLogicalMemoryAddress within the guest physical address space. In case targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF, the hypervisor assigns a target address to the guest. The HCALL can fail if the guest has an active PTE entry to the SCM block being bound.
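The bind request names a half-open range of SCM blocks plus a target address, where the all-ones value asks the hypervisor to choose. The sketch below shows only that argument handling; the address the "hypervisor" picks (`EX_HV_CHOSEN`), the decision to reject an empty range, and all names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* All-ones target address: let the hypervisor choose where to map. */
#define EX_ANY_ADDRESS	0xFFFFFFFFFFFFFFFFULL
#define EX_HV_CHOSEN	0x100000000000ULL	/* hypothetical hv-picked base */

struct bind_req {
	uint64_t start_block;	/* startingScmBlockIndex */
	uint64_t num_blocks;	/* numScmBlocksToBind */
	uint64_t target;	/* targetLogicalMemoryAddress */
};

/* Resolve the guest physical base address for a bind request. */
uint64_t bind_base(const struct bind_req *r)
{
	return r->target == EX_ANY_ADDRESS ? EX_HV_CHOSEN : r->target;
}

/* Compute the exclusive end block index of the requested range; fails on
 * an empty range or if start + count would overflow 64 bits. */
bool bind_range_end(const struct bind_req *r, uint64_t *end)
{
	if (r->num_blocks == 0 || r->start_block > UINT64_MAX - r->num_blocks)
		return false;
	*end = r->start_block + r->num_blocks;
	return true;
}
```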
H_SCM_UNBIND_MEM
Input: drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind
Out: numScmBlocksUnbound
Return Value: H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap, H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec
Given a DRC-Index of an NVDIMM, unmap numScmBlocksToUnbind SCM blocks starting at startingScmLogicalMemoryAddress from the guest physical address space. The HCALL can fail if the guest has an active PTE entry to the SCM block being unbound.
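Since the hcall reports numScmBlocksUnbound and may return H_Busy, a caller can plausibly treat a busy return as partial progress and re-issue the request for the remaining blocks. That interpretation is an assumption of this sketch, as are the block size, the placeholder status values, the `unbind_fn` callback type and the `mock_unbind` hypervisor stand-in.

```c
#include <stdint.h>

#define EX_H_SUCCESS	0L
#define EX_H_BUSY	1L
#define EX_BLOCK_SIZE	(256ULL << 20)	/* hypothetical SCM block size */

/* Hypothetical issuer of H_SCM_UNBIND_MEM: returns the status and stores
 * numScmBlocksUnbound in *unbound. */
typedef long (*unbind_fn)(uint32_t drc, uint64_t addr, uint64_t nblocks,
			  uint64_t *unbound);

/* Unbind nblocks starting at addr, re-issuing the hcall for whatever is
 * left after each partial completion. Returns the final status. */
long unbind_all_blocks(unbind_fn fn, uint32_t drc, uint64_t addr,
		       uint64_t nblocks)
{
	while (nblocks > 0) {
		uint64_t done = 0;
		long rc = fn(drc, addr, nblocks, &done);

		if (done > nblocks)
			done = nblocks;		/* defensive clamp */
		addr += done * EX_BLOCK_SIZE;	/* skip the unbound blocks */
		nblocks -= done;
		if (rc != EX_H_SUCCESS && rc != EX_H_BUSY)
			return rc;		/* hard failure: give up */
	}
	return EX_H_SUCCESS;
}

/* Mock hypervisor: unbinds at most 3 blocks per call and reports H_BUSY
 * while blocks remain. Counts its invocations in mock_calls. */
unsigned int mock_calls;

long mock_unbind(uint32_t drc, uint64_t addr, uint64_t nblocks,
		 uint64_t *unbound)
{
	(void)drc;
	(void)addr;
	mock_calls++;
	*unbound = nblocks < 3 ? nblocks : 3;
	return *unbound == nblocks ? EX_H_SUCCESS : EX_H_BUSY;
}
```

A real caller would also back off (for example, sleep for the hinted period on the long-busy returns) and cap the number of retries.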
H_SCM_QUERY_BLOCK_MEM_BINDING
Given a DRC-Index and an SCM block index, return the guest physical address to which the SCM block is mapped.
H_SCM_QUERY_LOGICAL_MEM_BINDING
Given a guest physical address, return which DRC Index and SCM block are mapped to that address.
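The two query hcalls are inverse lookups over the set of current bindings. The sketch below models them over a small in-memory table of contiguous bindings; the `scm_binding` layout, the block size and the function names are hypothetical and only illustrate the address arithmetic.

```c
#include <stdint.h>
#include <stdbool.h>
#include <stddef.h>

#define EX_BLOCK_SIZE	(256ULL << 20)	/* hypothetical SCM block size */

/* One contiguous binding: nblocks SCM blocks of NVDIMM `drc`, starting at
 * start_block, mapped at guest physical address base. */
struct scm_binding {
	uint32_t drc;
	uint64_t start_block;
	uint64_t nblocks;
	uint64_t base;
};

/* Model of H_SCM_QUERY_BLOCK_MEM_BINDING: (drc, block) -> guest address. */
bool query_block_binding(const struct scm_binding *tbl, size_t n,
			 uint32_t drc, uint64_t block, uint64_t *addr)
{
	for (size_t i = 0; i < n; i++) {
		const struct scm_binding *b = &tbl[i];

		if (b->drc == drc && block >= b->start_block &&
		    block - b->start_block < b->nblocks) {
			*addr = b->base +
				(block - b->start_block) * EX_BLOCK_SIZE;
			return true;
		}
	}
	return false;
}

/* Model of H_SCM_QUERY_LOGICAL_MEM_BINDING: guest address -> (drc, block). */
bool query_logical_binding(const struct scm_binding *tbl, size_t n,
			   uint64_t addr, uint32_t *drc, uint64_t *block)
{
	for (size_t i = 0; i < n; i++) {
		const struct scm_binding *b = &tbl[i];

		if (addr >= b->base &&
		    (addr - b->base) / EX_BLOCK_SIZE < b->nblocks) {
			*drc = b->drc;
			*block = b->start_block +
				 (addr - b->base) / EX_BLOCK_SIZE;
			return true;
		}
	}
	return false;
}
```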
H_SCM_UNBIND_ALL
Depending on the target scope, unmap all SCM blocks belonging to all NVDIMMs, or all SCM blocks belonging to a single NVDIMM identified by its drcIndex, from the LPAR memory.
H_SCM_HEALTH
Given a DRC Index, return the info on predictive failure and overall health of the PMEM device. The asserted bits in the health-bitmap indicate one or more states (described in the table below) of the PMEM device, and the health-bit-valid-bitmap indicates which bits in the health-bitmap are valid. The bits are reported in reverse bit ordering; for example, a value of 0xC400000000000000 indicates bits 0, 1, and 5 are valid.
Health Bitmap Flags:
| Bit | Definition |
|---|---|
| 00 | PMEM device is unable to persist memory contents. If the system is powered down, nothing will be saved. |
| 01 | PMEM device failed to persist memory contents. Either contents were not saved successfully on power down or were not restored properly on power up. |
| 02 | PMEM device contents are persisted from previous IPL. The data from the last boot were successfully restored. |
| 03 | PMEM device contents are not persisted from previous IPL. There was no data to restore from the last boot. |
| 04 | PMEM device memory life remaining is critically low |
| 05 | PMEM device will be garded off next IPL due to failure |
| 06 | PMEM device contents cannot persist due to current platform health status. A hardware failure may prevent data from being saved or restored. |
| 07 | PMEM device is unable to persist memory contents in certain conditions |
| 08 | PMEM device is encrypted |
| 09 | PMEM device has successfully completed a requested erase or secure erase procedure. |
| 10:63 | Reserved / Unused |
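"Reverse bit ordering" means bit 0 is the most significant bit of the 64-bit value, so bit n corresponds to the mask `1ULL << (63 - n)`; this is consistent with the 0xC400000000000000 example, which sets bits 0, 1 and 5. The sketch below decodes a health bitmap that way; the macro and function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* PAPR numbers bits from the most significant end: bit 0 is 1ULL << 63. */
#define EX_HEALTH_BIT(n)	(1ULL << (63 - (n)))

/* Return true if health bit `bit` (0-63) is both asserted in `health`
 * and marked valid in `valid`. */
bool health_bit_set(uint64_t health, uint64_t valid, unsigned int bit)
{
	if (bit > 63)
		return false;
	return (health & valid & EX_HEALTH_BIT(bit)) != 0;
}

/* Write the indices of all bits set in `bits` into out[] in PAPR
 * (MSB-first) order; returns how many were written (at most max). */
unsigned int list_bits(uint64_t bits, unsigned int *out, unsigned int max)
{
	unsigned int n = 0;

	for (unsigned int bit = 0; bit < 64 && n < max; bit++)
		if (bits & EX_HEALTH_BIT(bit))
			out[n++] = bit;
	return n;
}
```

For example, `health_bit_set(health, valid, 8)` reports whether the device is encrypted, per bit 08 of the table.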
H_SCM_PERFORMANCE_STATS
Given a DRC Index, collect the performance statistics for the NVDIMM and copy them to the resultBuffer.
H_SCM_FLUSH
Given a DRC Index, flush the data to the backend NVDIMM device.
The hcall returns H_BUSY when the flush takes a long time and the hcall needs to be issued multiple times in order to be completely serviced. The continue-token from the output is to be passed in the argument list of subsequent hcalls to the hypervisor until the hcall is completely serviced, at which point the hypervisor returns H_SUCCESS or an error code.
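For H_SCM_FLUSH the loop condition is H_BUSY rather than H_CONTINUE, and any other status, success or error, ends the loop. In this sketch `mock_scm_flush` stands in for the hypervisor, and its step count, failure switch and the `EX_H_*` placeholder values are assumptions for illustration.

```c
#include <stdint.h>

#define EX_H_SUCCESS	0L
#define EX_H_BUSY	1L
#define EX_H_HARDWARE	-1L	/* placeholder error status */

/* Mock H_SCM_FLUSH: needs mock_steps calls to finish. If mock_fail is
 * nonzero it reports an error on the last step instead of succeeding. */
unsigned int mock_steps = 3;
int mock_fail;

long mock_scm_flush(uint32_t drc, uint64_t token_in, uint64_t *token_out)
{
	(void)drc;
	if (token_in + 1 < mock_steps) {
		*token_out = token_in + 1;
		return EX_H_BUSY;
	}
	*token_out = 0;
	return mock_fail ? EX_H_HARDWARE : EX_H_SUCCESS;
}

/* Flush an NVDIMM: start with token 0 and re-issue the hcall with the
 * returned continue-token while it reports H_BUSY. Any other status is
 * final and returned; *calls counts the hcalls issued. */
long scm_flush(uint32_t drc, unsigned int *calls)
{
	uint64_t token = 0;
	long rc;

	*calls = 0;
	do {
		rc = mock_scm_flush(drc, token, &token);
		(*calls)++;
	} while (rc == EX_H_BUSY);
	return rc;
}
```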
H_HTM
H_HTM supports setup, configuration, control and dumping of the Hardware Trace Macro (HTM) function and its data. The HTM buffer stores tracing data for functions like core instruction, core LLAT and nest.
References
[1] “Power Architecture Platform Reference” https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference
[2] “Linux on Power Architecture Platform Reference” https://members.openpowerfoundation.org/document/dl/469
[3] “Definitions and Notation” Book III-Section 14.5.3 https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0
[4] arch/powerpc/include/asm/hvcall.h
[5] “64-Bit ELF V2 ABI Specification: Power Architecture” https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture