IBM Virtual Management Channel Kernel Driver (IBMVMC)
Authors: Dave Engebretsen <engebret@us.ibm.com>, Adam Reznechek <adreznec@linux.vnet.ibm.com>, Steven Royer <seroyer@linux.vnet.ibm.com>, Bryant G. Ly <bryantly@linux.vnet.ibm.com>
Introduction
Note: Knowledge of virtualization technology is required to understand this document.
A good reference document would be:
https://openpowerfoundation.org/wp-content/uploads/2016/05/LoPAPR_DRAFT_v11_24March2016_cmt1.pdf
The Virtual Management Channel (VMC) is a logical device which provides a message-passing interface between the hypervisor and a management partition. The management partition is intended to provide an alternative to Hardware Management Console (HMC)-based system management.
The primary hardware management solution developed by IBM relies on an appliance server named the Hardware Management Console (HMC), packaged as an external tower or rack-mounted personal computer. In a Power Systems environment, a single HMC can manage multiple POWER processor-based systems.
Management Application
In the management partition, a management application exists which enables a system administrator to configure the system’s partitioning characteristics via a command line interface (CLI) or Representational State Transfer Application Programming Interfaces (REST APIs).
The management application runs on a Linux logical partition on a POWER8 or newer processor-based server that is virtualized by PowerVM. System configuration, maintenance, and control functions which traditionally require an HMC can be implemented in the management application using a combination of HMC to hypervisor interfaces and existing operating system methods. This tool provides a subset of the functions implemented by the HMC and enables basic partition configuration. The set of HMC to hypervisor messages supported by the management application component is passed to the hypervisor over a VMC interface, which is defined below.
The VMC enables the management partition to provide basic partitioning functions:
- Logical Partitioning Configuration
- Start and stop actions for individual partitions
- Display of partition status
- Management of virtual Ethernet
- Management of virtual Storage
- Basic system management
Virtual Management Channel (VMC)
A logical device, called the Virtual Management Channel (VMC), is defined for communicating between the management application and the hypervisor. It creates the pipes that enable virtualization management software. This device is presented to a designated management partition as a virtual device.
This communication device uses the Command/Response Queue (CRQ) and Remote Direct Memory Access (RDMA) interfaces. A three-way handshake is defined that must take place to establish that both the hypervisor and management partition sides of the channel are running prior to sending or receiving any of the protocol messages.
This driver also utilizes Transport Event CRQs. These CRQ messages are sent when the hypervisor detects that one of the peer partitions has abnormally terminated, or that one side has called H_FREE_CRQ to close its CRQ.
Two new classes of CRQ messages are introduced for the VMC device. VMC Administrative messages are used by each partition using the VMC to communicate capabilities to its partner. HMC Interface messages are used for the actual flow of HMC messages between the management partition and the hypervisor. As most HMC messages far exceed the size of a CRQ buffer, a virtual DMA (RDMA) of the HMC message data is done prior to each HMC Interface CRQ message. Only the management partition drives RDMA operations; the hypervisor never directly causes the movement of message data.
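As a rough illustration of how a receiver might tell transport events apart from ordinary CRQ traffic, the sketch below classifies a queue element by its first byte. The 16-byte element layout, the header byte values (0x80, 0xC0, 0xFF), and every identifier here are assumptions made for illustration; this document does not define them, so the platform reference should be treated as the authority.

```c
#include <stdint.h>

/*
 * Hypothetical view of one CRQ element. The 16-byte size and the
 * meaning of the first byte are assumptions, not taken from this text.
 */
struct crq_element {
    uint8_t valid;       /* header byte: selects the message class */
    uint8_t type;        /* meaning depends on the class */
    uint8_t payload[14]; /* remaining bytes of the element */
};

enum crq_class {
    CRQ_EMPTY,           /* slot holds no message */
    CRQ_PAYLOAD,         /* VMC Administrative or HMC Interface message */
    CRQ_INIT,            /* handshake (initialization) message */
    CRQ_TRANSPORT_EVENT, /* partner terminated or closed its CRQ */
    CRQ_UNKNOWN
};

/* Classify an element by its header byte (values are assumed). */
enum crq_class crq_classify(const struct crq_element *e)
{
    switch (e->valid) {
    case 0x00: return CRQ_EMPTY;
    case 0x80: return CRQ_PAYLOAD;
    case 0xC0: return CRQ_INIT;
    case 0xFF: return CRQ_TRANSPORT_EVENT;
    default:   return CRQ_UNKNOWN;
    }
}
```

A driver would typically dispatch on this class first, and only then inspect the type byte to distinguish individual VMC messages.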
Terminology
- RDMA: Remote Direct Memory Access is a DMA transfer from the server to its client or from the server to its partner partition. DMA refers to both physical I/O to and from memory operations and to memory to memory move operations.
- CRQ: Command/Response Queue, a facility which is used to communicate between partner partitions. Transport events which are signaled from the hypervisor to a partition are also reported in this queue.
Example Management Partition VMC Driver Interface
This section provides an example for the management application implementation where a device driver is used to interface to the VMC device. This driver consists of a new device, for example /dev/ibmvmc, which provides interfaces to open, close, read, write, and perform ioctls against the VMC device.
VMC Interface Initialization
The device driver is responsible for initializing the VMC when the driver is loaded. It first creates and initializes the CRQ. Next, an exchange of VMC capabilities is performed to indicate the code version and number of resources available in both the management partition and the hypervisor. Finally, the hypervisor requests that the management partition create an initial pool of VMC buffers, one buffer for each possible HMC connection, which will be used for management application session initialization. Prior to completion of this initialization sequence, the device returns EBUSY to open() calls. EIO is returned for all other open() failures.
Management Partition                    Hypervisor
                CRQ INIT
---------------------------------------->
            CRQ INIT COMPLETE
<----------------------------------------
              CAPABILITIES
---------------------------------------->
         CAPABILITIES RESPONSE
<----------------------------------------
      ADD BUFFER (HMC IDX=0,1,..)         _
<----------------------------------------  |
          ADD BUFFER RESPONSE              | - Perform # HMCs Iterations
----------------------------------------> -
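The initialization sequence can be modeled as a small state machine. The following is a minimal sketch of that idea, not the driver's actual implementation: the state names, event names, and the `vmc_model_*` functions are hypothetical, and the model assumes at least one HMC connection is negotiated in the capabilities exchange.

```c
#include <errno.h>

/* Hypothetical states, seen from the management partition. */
enum vmc_state {
    VMC_CRQ_INIT_SENT,     /* waiting for CRQ INIT COMPLETE */
    VMC_CAPABILITIES_SENT, /* waiting for CAPABILITIES RESPONSE */
    VMC_SEEDING_BUFFERS,   /* waiting for one ADD BUFFER per HMC index */
    VMC_READY,
    VMC_FAILED
};

/* Hypothetical events: messages arriving from the hypervisor. */
enum vmc_event { EV_CRQ_INIT_COMPLETE, EV_CAPABILITIES_RESPONSE, EV_ADD_BUFFER };

struct vmc_model {
    enum vmc_state state;
    unsigned max_hmcs;      /* assumed to come from the capabilities exchange */
    unsigned buffers_added; /* ADD BUFFER messages handled so far */
};

void vmc_model_start(struct vmc_model *m, unsigned max_hmcs)
{
    m->state = VMC_CRQ_INIT_SENT;
    m->max_hmcs = max_hmcs; /* model assumes max_hmcs >= 1 */
    m->buffers_added = 0;
}

/* Advance the model; an out-of-order message moves it to VMC_FAILED. */
void vmc_model_event(struct vmc_model *m, enum vmc_event ev)
{
    switch (m->state) {
    case VMC_CRQ_INIT_SENT:
        m->state = (ev == EV_CRQ_INIT_COMPLETE) ? VMC_CAPABILITIES_SENT : VMC_FAILED;
        break;
    case VMC_CAPABILITIES_SENT:
        m->state = (ev == EV_CAPABILITIES_RESPONSE) ? VMC_SEEDING_BUFFERS : VMC_FAILED;
        break;
    case VMC_SEEDING_BUFFERS:
        if (ev != EV_ADD_BUFFER)
            m->state = VMC_FAILED;
        else if (++m->buffers_added == m->max_hmcs)
            m->state = VMC_READY;
        break;
    default: /* READY and FAILED ignore further events in this model */
        break;
    }
}

/* Result an open() would see: EBUSY until ready, EIO once failed. */
int vmc_model_open(const struct vmc_model *m)
{
    if (m->state == VMC_FAILED)
        return -EIO;
    if (m->state != VMC_READY)
        return -EBUSY;
    return 0;
}
```

Keeping the open() decision in one function makes the EBUSY versus EIO distinction described above easy to audit.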
VMC Interface Open
After the basic VMC channel has been initialized, an HMC session level connection can be established. The application layer performs an open() to the VMC device and executes an ioctl() against it, indicating the HMC ID (32 bytes of data) for this session. If the VMC device is in an invalid state, EIO will be returned for the ioctl(). The device driver creates a new HMC session value (ranging from 1 to 255) and HMC index value (starting at index 0 and ranging to 254) for this HMC ID. The driver then does an RDMA of the HMC ID to the hypervisor, and then sends an Interface Open message to the hypervisor to establish the session over the VMC. After the hypervisor receives this information, it sends Add Buffer messages to the management partition to seed an initial pool of buffers for the new HMC connection. Finally, the hypervisor sends an Interface Open Response message to indicate that it is ready for normal runtime messaging. The following illustrates this VMC flow:
Management Partition                    Hypervisor
              RDMA HMC ID
---------------------------------------->
             Interface Open
---------------------------------------->
               Add Buffer                 _
<----------------------------------------  |
           Add Buffer Response             | - Perform N Iterations
----------------------------------------> -
         Interface Open Response
<----------------------------------------
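From the application's side, the open flow reduces to an open() followed by one ioctl() carrying the HMC ID. The sketch below shows that shape under stated assumptions: `vmc_open_session` is a hypothetical helper, the ioctl request code is taken as a parameter because this document does not name it, and passing a pointer to a zero-padded 32-byte buffer is an assumed calling convention.

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

#define HMC_ID_LEN 32 /* the HMC ID is 32 bytes of data */

/*
 * Open the VMC device and bind this session to an HMC ID.
 * set_hmc_id_req must be the request code from the driver's own
 * header; it is deliberately not invented here.
 * Returns an open file descriptor, or a negative errno value.
 */
int vmc_open_session(const char *dev_path, unsigned long set_hmc_id_req,
                     const unsigned char *hmc_id, size_t hmc_id_len)
{
    unsigned char id[HMC_ID_LEN] = {0}; /* zero-pad shorter IDs (assumption) */
    int fd, err;

    if (hmc_id_len > HMC_ID_LEN)
        return -EINVAL;
    memcpy(id, hmc_id, hmc_id_len);

    fd = open(dev_path, O_RDWR);
    if (fd < 0)
        return -errno; /* e.g. EBUSY while the channel is still initializing */

    if (ioctl(fd, set_hmc_id_req, id) < 0) {
        err = errno;   /* e.g. EIO if the device is in an invalid state */
        close(fd);
        return -err;
    }
    return fd;
}
```

A caller that sees -EBUSY could reasonably retry after a delay, whereas -EIO indicates a condition that retrying alone is unlikely to fix.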
VMC Interface Runtime
During normal runtime, the management application and the hypervisor exchange HMC messages via the Signal VMC message and RDMA operations. When sending data to the hypervisor, the management application performs a write() to the VMC device, and the driver RDMAs the data to the hypervisor and then sends a Signal Message. If a write() is attempted before VMC device buffers have been made available by the hypervisor, or no buffers are currently available, EBUSY is returned in response to the write(). A write() will return EIO for all other errors, such as an invalid device state. When the hypervisor sends a message to the management partition, the data is put into a VMC buffer and a Signal Message is sent to the VMC driver in the management partition. The driver RDMAs the buffer into the partition and passes the data up to the appropriate management application via a read() to the VMC device. The read() request blocks if there is no buffer available to read. The management application may use select() to wait for the VMC device to become ready with data to read.
Management Partition                    Hypervisor
                MSG RDMA
---------------------------------------->
               SIGNAL MSG
---------------------------------------->
               SIGNAL MSG
<----------------------------------------
                MSG RDMA
<----------------------------------------
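A management application might wrap the runtime calls as in the sketch below, which retries a write() that fails with EBUSY and uses select() to wait for readable data. The helper names, the retry count, and the 10 ms backoff are illustrative assumptions; only write(), read(), select(), and the EBUSY/EIO behavior come from the description above.

```c
#include <errno.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Wait up to timeout_ms for fd to have data to read.
 * Returns 1 if readable, 0 on timeout, or a negative errno value.
 */
int vmc_wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int rc;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    rc = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (rc < 0)
        return -errno;
    return rc > 0 ? 1 : 0;
}

/*
 * Write one message, retrying while the device reports EBUSY
 * (no buffer currently available). Any other error, such as EIO,
 * is returned immediately as a negative errno value.
 */
ssize_t vmc_send(int fd, const void *msg, size_t len, int max_retries)
{
    for (;;) {
        struct timeval backoff = { 0, 10 * 1000 }; /* 10 ms, arbitrary */
        ssize_t n = write(fd, msg, len);
        int err;

        if (n >= 0)
            return n;
        err = errno;
        if (err != EBUSY || max_retries-- <= 0)
            return -err;
        select(0, NULL, NULL, NULL, &backoff); /* portable short sleep */
    }
}
```

After `vmc_wait_readable()` returns 1, a read() on the same descriptor is expected to return without blocking.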
VMC Interface Close
HMC session level connections are closed by the management partition when the application layer performs a close() against the device. This action results in an Interface Close message flowing to the hypervisor, which causes the session to be terminated. The device driver must free any storage allocated for buffers for this HMC connection.
Management Partition                    Hypervisor
             INTERFACE CLOSE
---------------------------------------->
        INTERFACE CLOSE RESPONSE
<----------------------------------------
Additional Information
For more information on CRQ Messages, VMC Messages, HMC interface Buffers, and signal messages, please refer to the Linux on Power Architecture Platform Reference, Section F.