Hypercall Op-codes (hcalls)

Overview

Virtualization on 64-bit Power Book3S platforms is based on the PAPR specification[1], which describes the run-time environment for a guest operating system and how it should interact with the hypervisor for privileged operations. Currently there are two PAPR-compliant hypervisors:

  • IBM PowerVM (PHYP): IBM’s proprietary hypervisor that supports AIX, IBM-i and Linux as guests (termed Logical Partitions or LPARs). It supports the full PAPR specification.

  • Qemu/KVM: Supports PPC64 Linux guests running on a PPC64 Linux host. It implements only a subset of the PAPR specification, called LoPAPR[2].

On the PPC64 arch, a guest kernel running on top of a PAPR hypervisor is called a pSeries guest. A pSeries guest runs in supervisor mode (HV=0) and must issue hypercalls to the hypervisor whenever it needs to perform an action that is hypervisor privileged[3] or to use other services managed by the hypervisor.

Hence a hypercall (hcall) is essentially a request by the pSeries guest asking the hypervisor to perform a privileged operation on its behalf. The guest issues a hcall with the necessary input operands. After performing the privileged operation, the hypervisor returns a status code and output operands to the guest.

HCALL ABI

The ABI specification for a hcall between a pSeries guest and a PAPR hypervisor is covered in section 14.5.3 of ref[2]. The switch to the hypervisor context is done via the instruction HVCS, which expects the opcode for the hcall to be set in r3 and any in-arguments for the hcall to be provided in registers r4-r12. If values have to be passed through a memory buffer, the data stored in that buffer should be in big-endian byte order.

Once control returns to the guest after the hypervisor has serviced the ‘HVCS’ instruction, the return value of the hcall is available in r3 and any out values are returned in registers r4-r12. As with in-arguments, any out values stored in a memory buffer will be in big-endian byte order.

The powerpc arch code provides convenient wrappers named plpar_hcall_xxx, defined in an arch-specific header[4], to issue hcalls from a Linux kernel running as a pSeries guest.
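The big-endian buffer rule matters mostly for little-endian guests, which must convert multi-byte values before placing them in a buffer and after reading them back. The following is a minimal, portable sketch of that conversion; put_be64() and get_be64() are hypothetical helper names used only for illustration (kernel code would normally rely on its existing cpu_to_be64()/be64_to_cpu() helpers instead).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: store a 64-bit value into buf in big-endian
 * byte order (most significant byte first), independent of the
 * endianness of the CPU running this code.
 */
static void put_be64(uint8_t *buf, uint64_t val)
{
	int i;

	assert(buf != NULL);
	for (i = 0; i < 8; i++)
		buf[i] = (uint8_t)(val >> (56 - 8 * i));
}

/*
 * Hypothetical helper: read back a 64-bit big-endian value from buf,
 * as a guest would do for out values returned through memory.
 */
static uint64_t get_be64(const uint8_t *buf)
{
	uint64_t val = 0;
	int i;

	assert(buf != NULL);
	for (i = 0; i < 8; i++)
		val = (val << 8) | buf[i];
	return val;
}
```

Because the helpers work byte by byte, the same code produces the same buffer layout on both big-endian and little-endian guests.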

Register Conventions

Any hcall should follow the same register convention as described in section 2.2.1.1 of “64-Bit ELF V2 ABI Specification: Power Architecture”[5]. The table below summarizes these conventions:

========  ========  ==========================================
Register  Volatile  Purpose
Range     (Y/N)
========  ========  ==========================================
r0        Y         Optional-usage
r1        N         Stack Pointer
r2        N         TOC
r3        Y         hcall opcode/return value
r4-r10    Y         in and out values
r11       Y         Optional-usage/Environmental pointer
r12       Y         Optional-usage/Function entry address at
                    global entry point
r13       N         Thread-Pointer
r14-r31   N         Local Variables
LR        Y         Link Register
CTR       Y         Loop Counter
XER       Y         Fixed-point exception register.
CR0-1     Y         Condition register fields.
CR2-4     N         Condition register fields.
CR5-7     Y         Condition register fields.
Others    N
========  ========  ==========================================

DRC & DRC Indexes

     DR1                                  Guest
     +--+        +------------+         +---------+
     |  | <----> |            |         |  User   |
     +--+  DRC1  |            |   DRC   |  Space  |
                 |    PAPR    |  Index  +---------+
     DR2         | Hypervisor |         |         |
     +--+        |            | <-----> |  Kernel |
     |  | <----> |            |  Hcall  |         |
     +--+  DRC2  +------------+         +---------+

The PAPR hypervisor terms shared hardware resources, such as PCI devices and NVDIMMs, that are available for use by LPARs as Dynamic Resources (DR). When a DR is allocated to an LPAR, PHYP creates a data structure called a Dynamic Resource Connector (DRC) to manage LPAR access. An LPAR refers to a DRC via an opaque 32-bit number called the DRC-Index. The DRC-Index value is provided to the LPAR via the device tree, where it is present as an attribute in the device tree node associated with the DR.

HCALL Return-values

After servicing the hcall, the hypervisor sets the return value in r3, indicating success or failure of the hcall. In case of a failure, an error code indicates the cause of the error. These codes are defined and documented in the arch-specific header[4].

In some cases a hcall can take a long time and needs to be issued multiple times in order to be completely serviced. These hcalls usually accept an opaque value continue-token within their argument list, and a return value of H_CONTINUE indicates that the hypervisor has not yet finished servicing the hcall.

To make such hcalls the guest needs to set continue-token == 0 for the initial call and use the hypervisor-returned value of continue-token for each subsequent hcall until the hypervisor returns a non-H_CONTINUE return value.
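The continue-token protocol can be sketched as a simple retry loop. The example below is a user-space illustration only, not kernel code: mock_long_hcall() and struct mock_hv are hypothetical stand-ins for a long-running hcall and the hypervisor state behind it, and the numeric values given for H_SUCCESS and H_CONTINUE are illustrative (the authoritative definitions are in the arch-specific header[4]).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative values; see the arch-specific header[4] for the real ones. */
#define H_SUCCESS	0
#define H_CONTINUE	18

/* Hypothetical hypervisor state: work that needs several calls to finish. */
struct mock_hv {
	uint64_t total_steps;	/* number of calls needed to complete */
	uint64_t calls;		/* number of calls issued so far */
};

/*
 * Hypothetical stand-in for a long-running hcall. It performs one step
 * of work per call, hands back an updated opaque token, and returns
 * H_CONTINUE while work remains.
 */
static long mock_long_hcall(struct mock_hv *hv, uint64_t token_in,
			    uint64_t *token_out)
{
	uint64_t done = token_in + 1;

	hv->calls++;
	*token_out = done;
	return done < hv->total_steps ? H_CONTINUE : H_SUCCESS;
}

/* Guest side: start with token 0, feed back the returned token each time. */
static long issue_until_done(struct mock_hv *hv)
{
	uint64_t token = 0;	/* initial call passes continue-token == 0 */
	long rc;

	assert(hv != NULL);
	do {
		rc = mock_long_hcall(hv, token, &token);
	} while (rc == H_CONTINUE);

	return rc;		/* success or an error code */
}
```

The guest treats the token as opaque: it never inspects or modifies the value, it only passes back whatever the previous call returned.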

HCALL Op-codes

Below is a partial list of HCALLs that are supported by PHYP. For the corresponding opcode values, please refer to the arch-specific header[4]:

H_SCM_READ_METADATA

Input: drcIndex, offset, buffer-address, numBytesToRead
Out: numBytesRead
Return Value: H_Success, H_Parameter, H_P2, H_P3, H_Hardware

Given a DRC Index of an NVDIMM, read N bytes from the metadata area associated with it, at a specified offset, and copy them to the provided buffer. The metadata area stores configuration information such as label information, bad blocks etc. The metadata area is located out-of-band of the NVDIMM storage area, hence separate access semantics are provided.

H_SCM_WRITE_METADATA

Input: drcIndex, offset, data, numBytesToWrite
Out: None
Return Value: H_Success, H_Parameter, H_P2, H_P4, H_Hardware

Given a DRC Index of an NVDIMM, write N bytes from the provided buffer to the metadata area associated with it, at the specified offset.

H_SCM_BIND_MEM

Input: drcIndex, startingScmBlockIndex, numScmBlocksToBind, targetLogicalMemoryAddress, continue-token
Out: continue-token, targetLogicalMemoryAddress, numScmBlocksToBound
Return Value: H_Success, H_Parameter, H_P2, H_P3, H_P4, H_Overlap, H_Too_Big, H_P5, H_Busy

Given a DRC-Index of an NVDIMM, map a contiguous range of SCM blocks (startingScmBlockIndex, startingScmBlockIndex+numScmBlocksToBind) to the guest at targetLogicalMemoryAddress within the guest physical address space. If targetLogicalMemoryAddress == 0xFFFFFFFF_FFFFFFFF then the hypervisor assigns a target address to the guest. The HCALL can fail if the guest has an active PTE entry to the SCM block being bound.

H_SCM_UNBIND_MEM

Input: drcIndex, startingScmLogicalMemoryAddress, numScmBlocksToUnbind
Out: numScmBlocksUnbound
Return Value: H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Overlap, H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec

Given a DRC-Index of an NVDIMM, unmap numScmBlocksToUnbind SCM blocks starting at startingScmLogicalMemoryAddress from the guest physical address space. The HCALL can fail if the guest has an active PTE entry to the SCM block being unbound.

H_SCM_QUERY_BLOCK_MEM_BINDING

Input: drcIndex, scmBlockIndex
Out: Guest-Physical-Address
Return Value: H_Success, H_Parameter, H_P2, H_NotFound

Given a DRC-Index and an SCM block index, return the guest physical address to which the SCM block is mapped.

H_SCM_QUERY_LOGICAL_MEM_BINDING

Input: Guest-Physical-Address
Out: drcIndex, scmBlockIndex
Return Value: H_Success, H_Parameter, H_P2, H_NotFound

Given a guest physical address, return the DRC Index and SCM block that are mapped to that address.

H_SCM_UNBIND_ALL

Input: scmTargetScope, drcIndex
Out: None
Return Value: H_Success, H_Parameter, H_P2, H_P3, H_In_Use, H_Busy, H_LongBusyOrder1mSec, H_LongBusyOrder10mSec

Depending on the target scope, unmap from the LPAR memory either all SCM blocks belonging to all NVDIMMs or all SCM blocks belonging to a single NVDIMM identified by its drcIndex.

H_SCM_HEALTH

Input: drcIndex
Out: health-bitmap (r4), health-bit-valid-bitmap (r5)
Return Value: H_Success, H_Parameter, H_Hardware

Given a DRC Index, return information on the predictive failure and overall health of the PMEM device. The asserted bits in the health-bitmap indicate one or more states (described in the table below) of the PMEM device, and health-bit-valid-bitmap indicates which bits in health-bitmap are valid. The bits are reported in reverse bit ordering; for example, a value of 0xC400000000000000 indicates bits 0, 1, and 5 are valid.

Health Bitmap Flags:

=====  ======================================================================
Bit    Definition
=====  ======================================================================
00     PMEM device is unable to persist memory contents.
       If the system is powered down, nothing will be saved.
01     PMEM device failed to persist memory contents. Either contents were
       not saved successfully on power down or were not restored properly on
       power up.
02     PMEM device contents are persisted from previous IPL. The data from
       the last boot were successfully restored.
03     PMEM device contents are not persisted from previous IPL. There was no
       data to restore from the last boot.
04     PMEM device memory life remaining is critically low
05     PMEM device will be garded off next IPL due to failure
06     PMEM device contents cannot persist due to current platform health
       status. A hardware failure may prevent data from being saved or
       restored.
07     PMEM device is unable to persist memory contents in certain conditions
08     PMEM device is encrypted
09     PMEM device has successfully completed a requested erase or secure
       erase procedure.
10:63  Reserved / Unused
=====  ======================================================================
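The reverse bit ordering used by H_SCM_HEALTH means that bit 0 is the most significant bit of the 64-bit value. The sketch below shows one way a guest might test a flag under that numbering and honour the valid bitmap; papr_bit_is_set() and health_flag_asserted() are hypothetical names chosen for this example and are not taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: test bit n of a 64-bit bitmap where bit 0 is the
 * most significant bit and bit 63 the least significant one.
 */
static bool papr_bit_is_set(uint64_t bitmap, unsigned int n)
{
	assert(n < 64);
	return (bitmap >> (63 - n)) & 1;
}

/*
 * Hypothetical helper: a health flag only counts as asserted when the
 * same bit is also set in health-bit-valid-bitmap.
 */
static bool health_flag_asserted(uint64_t health, uint64_t valid,
				 unsigned int n)
{
	return papr_bit_is_set(valid, n) && papr_bit_is_set(health, n);
}
```

With the value from the text, 0xC400000000000000, papr_bit_is_set() reports bits 0, 1 and 5 as set and every other bit as clear.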

H_SCM_PERFORMANCE_STATS

Input: drcIndex, resultBuffer Addr
Out: None
Return Value: H_Success, H_Parameter, H_Unsupported, H_Hardware, H_Authority, H_Privilege

Given a DRC Index, collect the performance statistics for the NVDIMM and copy them to the resultBuffer.

H_SCM_FLUSH

Input: drcIndex, continue-token
Out: continue-token
Return Value: H_SUCCESS, H_Parameter, H_P2, H_BUSY

Given a DRC Index, flush the data to the backend NVDIMM device.

The hcall returns H_BUSY when the flush takes a longer time and the hcall needs to be issued multiple times in order to be completely serviced. The continue-token from the output is to be passed in the argument list of subsequent hcalls to the hypervisor until the hcall is completely serviced, at which point H_SUCCESS or another error is returned by the hypervisor.

H_HTM

Input: flags, target, operation (op), op-param1, op-param2, op-param3
Out: dumphtmbufferdata
Return Value: H_Success, H_Busy, H_LongBusyOrder, H_Partial, H_Parameter, H_P2, H_P3, H_P4, H_P5, H_P6, H_State, H_Not_Available, H_Authority

H_HTM supports setup, configuration, control and dumping of the Hardware Trace Macro (HTM) function and its data. The HTM buffer stores tracing data for functions like core instruction, core LLAT and nest.

References

[1] “Power Architecture Platform Reference” https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference

[2] “Linux on Power Architecture Platform Reference” https://members.openpowerfoundation.org/document/dl/469

[3] “Definitions and Notation” Book III-Section 14.5.3 https://openpowerfoundation.org/?resource_lib=power-isa-version-3-0

[4] arch/powerpc/include/asm/hvcall.h

[5] “64-Bit ELF V2 ABI Specification: Power Architecture” https://openpowerfoundation.org/?resource_lib=64-bit-elf-v2-abi-specification-power-architecture