Pcode

Xe PCODE is the component responsible for interfacing with the PCODE firmware. It shall provide a very simple ABI to other Xe components, but be the single and consolidated place that will communicate with PCODE. All read and write operations to PCODE will be internal and private to this component.

What’s next:

  • PCODE hw metrics

  • PCODE for display operations

Internal API

int xe_pcode_request(struct xe_tile *tile, u32 mbox, u32 request, u32 reply_mask, u32 reply, int timeout_base_ms)

send PCODE request until acknowledgment

Parameters

struct xe_tile *tile

tile

u32 mbox

PCODE mailbox ID the request is targeted for

u32 request

request ID

u32 reply_mask

mask used to check for request acknowledgment

u32 reply

value used to check for request acknowledgment

int timeout_base_ms

timeout for polling with preemption enabled

Description

Keep resending the request to mbox until PCODE acknowledges it, PCODE reports an error, or an overall timeout of timeout_base_ms + 50 ms expires. The request is acknowledged once the PCODE reply dword equals reply after applying reply_mask. Polling is first attempted with preemption enabled for timeout_base_ms and, if this times out, for another 50 ms with preemption disabled.

Returns 0 on success, -ETIMEDOUT in case of a timeout, <0 in case of some other error as reported by PCODE.
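The resend-until-acknowledged behaviour described above can be sketched in plain user-space C. This is only an illustration, not the driver's implementation: every `sketch_` name, the callback type, and the fake mailbox are hypothetical, and the loop is bounded by an attempt count rather than by the real time-based limits (timeout_base_ms, then a further 50 ms) so that it stays deterministic.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one mailbox round trip: sends the request and
 * stores the reply dword. Returns 0, or a negative errno reported by PCODE. */
typedef int (*sketch_send_fn)(void *ctx, uint32_t mbox, uint32_t request,
                              uint32_t *reply_dword);

/* Acknowledgment test from the description: mask the reply, then compare. */
static bool sketch_reply_acked(uint32_t reply_dword, uint32_t reply_mask,
                               uint32_t reply)
{
    return (reply_dword & reply_mask) == reply;
}

/* Resend until acknowledged, until the send reports an error, or until
 * max_attempts sends have been made (assumption: attempts replace time). */
static int sketch_request_until_ack(sketch_send_fn send, void *ctx,
                                    uint32_t mbox, uint32_t request,
                                    uint32_t reply_mask, uint32_t reply,
                                    int max_attempts)
{
    for (int i = 0; i < max_attempts; i++) {
        uint32_t dword = 0;
        int err = send(ctx, mbox, request, &dword);

        if (err)
            return err;            /* error reported by the mailbox */
        if (sketch_reply_acked(dword, reply_mask, reply))
            return 0;              /* request acknowledged */
    }
    return -ETIMEDOUT;             /* never acknowledged */
}

/* Fake mailbox for illustration: sets bit 0 of the reply on the Nth send. */
struct sketch_fake_mbox {
    int sends;
    int ack_after;
};

static int sketch_fake_send(void *ctx, uint32_t mbox, uint32_t request,
                            uint32_t *reply_dword)
{
    struct sketch_fake_mbox *m = ctx;

    (void)mbox;
    (void)request;
    m->sends++;
    *reply_dword = (m->sends >= m->ack_after) ? 0x1u : 0x0u;
    return 0;
}
```

With a fake mailbox whose `ack_after` is 3, a call with `reply_mask` and `reply` both set to 0x1 returns 0 after exactly three sends; if the acknowledgment never arrives within `max_attempts`, the result is -ETIMEDOUT.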

int xe_pcode_init_min_freq_table(struct xe_tile *tile, u32 min_gt_freq, u32 max_gt_freq)

Initialize PCODE’s QOS frequency table

Parameters

struct xe_tile *tile

tile instance

u32 min_gt_freq

Minimal (RPn) GT frequency in units of 50MHz.

u32 max_gt_freq

Maximal (RP0) GT frequency in units of 50MHz.

Description

This function initializes PCODE’s QOS frequency table for a proper minimal frequency/power steering decision, depending on the current requested GT frequency. For older platforms this was a more complete table including the IA freq. However, for the latest platforms this table became a simple 1-1 Ring vs GT frequency. Even so, without setting it, PCODE might not take the right decisions for some memory frequencies and affect latency.
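To make the 50 MHz units and the 1-1 Ring vs GT relationship concrete, here is a small user-space sketch. It does not reproduce how the driver programs PCODE: the `sketch_` names, the entry layout, and the choice of one entry per ratio step are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Frequencies are passed in units of 50 MHz, per the parameter docs. */
#define SKETCH_GT_FREQ_UNIT_MHZ 50u

/* Convert a frequency in MHz to 50 MHz units, rounding down. */
static uint32_t sketch_mhz_to_ratio(uint32_t mhz)
{
    return mhz / SKETCH_GT_FREQ_UNIT_MHZ;
}

/* Hypothetical table entry: a GT ratio and the ring ratio paired with it. */
struct sketch_freq_entry {
    uint32_t gt_ratio;
    uint32_t ring_ratio;
};

/* Fill a 1-1 table: one entry per ratio in [min_ratio, max_ratio], with the
 * ring ratio equal to the GT ratio. Returns the number of entries written,
 * never more than cap. */
static size_t sketch_fill_1to1_table(struct sketch_freq_entry *table,
                                     size_t cap, uint32_t min_ratio,
                                     uint32_t max_ratio)
{
    size_t n = 0;

    for (uint32_t r = min_ratio; r <= max_ratio && n < cap; r++) {
        table[n].gt_ratio = r;
        table[n].ring_ratio = r;
        n++;
    }
    return n;
}
```

For example, 300 MHz corresponds to a ratio of 6 and 2400 MHz to a ratio of 48, so a table spanning that range holds 43 entries.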

It returns 0 on success and a negative error number on failure: -EINVAL if the maximum frequency is not higher than the minimum, and other errors directly translated from the PCODE error returns:

  • -ENXIO: “Illegal Command”

  • -ETIMEDOUT: “Timed out”

  • -EINVAL: “Illegal Data”

  • -ENXIO: “Illegal Subcommand”

  • -EBUSY: “PCODE Locked”

  • -EOVERFLOW: “GT ratio out of range”

  • -EACCES: “PCODE Rejected”

  • -EPROTO: “Unknown”
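The translation listed above can be sketched as a lookup table. The errno values and message strings come from the list; the `sketch_` enum is hypothetical, and its numeric values are placeholders rather than the codes PCODE actually reports.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical symbolic error kinds. The numeric values are placeholders,
 * NOT the real PCODE error encodings. */
enum sketch_pcode_err {
    SKETCH_PCODE_OK = 0,
    SKETCH_PCODE_ILLEGAL_CMD,
    SKETCH_PCODE_TIMEOUT,
    SKETCH_PCODE_ILLEGAL_DATA,
    SKETCH_PCODE_ILLEGAL_SUBCMD,
    SKETCH_PCODE_LOCKED,
    SKETCH_PCODE_GT_RATIO_OUT_OF_RANGE,
    SKETCH_PCODE_REJECTED,
    SKETCH_PCODE_ERR_COUNT
};

struct sketch_err_decode {
    int errno_val;
    const char *str;
};

/* errno values and strings as given in the list above. */
static const struct sketch_err_decode sketch_err_table[SKETCH_PCODE_ERR_COUNT] = {
    [SKETCH_PCODE_ILLEGAL_CMD]           = { -ENXIO,     "Illegal Command" },
    [SKETCH_PCODE_TIMEOUT]               = { -ETIMEDOUT, "Timed out" },
    [SKETCH_PCODE_ILLEGAL_DATA]          = { -EINVAL,    "Illegal Data" },
    [SKETCH_PCODE_ILLEGAL_SUBCMD]        = { -ENXIO,     "Illegal Subcommand" },
    [SKETCH_PCODE_LOCKED]                = { -EBUSY,     "PCODE Locked" },
    [SKETCH_PCODE_GT_RATIO_OUT_OF_RANGE] = { -EOVERFLOW, "GT ratio out of range" },
    [SKETCH_PCODE_REJECTED]              = { -EACCES,    "PCODE Rejected" },
};

/* Translate an error kind to a negative errno. Optionally returns the
 * message through msg; anything unrecognised maps to -EPROTO, "Unknown". */
static int sketch_pcode_err_to_errno(unsigned int err, const char **msg)
{
    if (err == SKETCH_PCODE_OK) {
        if (msg)
            *msg = NULL;
        return 0;
    }
    if (err < SKETCH_PCODE_ERR_COUNT) {
        if (msg)
            *msg = sketch_err_table[err].str;
        return sketch_err_table[err].errno_val;
    }
    if (msg)
        *msg = "Unknown";
    return -EPROTO;
}
```

Keeping the errno and the message in one table means a caller can log the human-readable string and propagate the errno from a single lookup.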

int xe_pcode_ready(struct xe_device *xe, bool locked)

Ensure PCODE is initialized

Parameters

struct xe_device *xe

xe instance

bool locked

true if lock held, false otherwise

Description

The PCODE init mailbox is polled only on the root GT of the root tile, as the root tile reports initialization as complete only after all the tiles have completed initialization. Called only on early probe without locks and with locks in the resume path.

Returns 0 on success, and -error number on failure.

void xe_pcode_init(struct xe_tile *tile)

initialize components of PCODE

Parameters

struct xe_tile *tile

tile instance

Description

This function initializes the xe_pcode component. To be called once only during probe.

int xe_pcode_probe_early(struct xe_device *xe)

initializes PCODE

Parameters

struct xe_device *xe

xe instance

Description

This function checks the initialization status of PCODE. To be called once only during early probe without locks.

Returns 0 on success, error code otherwise.

Survivability Mode

Survivability Mode is a software-based workflow for recovering a system in a failed boot state. Here, system recoverability is concerned with recovering the firmware responsible for boot.

Boot Survivability

Boot Survivability is implemented by loading the driver with the bare minimum (no drm card) to allow the firmware to be flashed through the mei driver and to collect telemetry. The driver’s probe flow is modified such that it enters survivability mode when pcode initialization is incomplete and boot status denotes a failure.

Survivability mode can also be entered manually using the survivability mode attribute available through configfs, which is beneficial in several use cases. It can be used to address scenarios where pcode does not detect failure or for validation purposes. It can also be used in In-Field-Repair (IFR) to repair a single card without impacting the other cards in a node.

Use the command below to enable survivability mode manually:

# echo 1 > /sys/kernel/config/xe/0000:03:00.0/survivability_mode

It is the responsibility of the user to clear the mode once firmware flash is complete.

Refer to Xe Configfs for more details on how to use configfs.

Survivability mode is indicated by the below admin-only readable sysfs entry. It provides information about the type of survivability mode (Boot/Runtime).

# cat /sys/bus/pci/devices/<device>/survivability_mode
Boot

Any additional debug information, if present, will be visible under the directory survivability_info:

/sys/bus/pci/devices/<device>/survivability_info/
├── aux_info0
├── aux_info1
├── aux_info2
├── aux_info3
├── aux_info4
├── capability_info
├── fdo_mode
├── postcode_trace
└── postcode_trace_overflow

This directory has the following attributes:

  • capability_info : Indicates Boot status and support for additional information

  • postcode_trace, postcode_trace_overflow : Each postcode is an 8-bit value and represents a boot failure event. When a new failure event is logged by PCODE, the existing postcodes are shifted left. These entries provide a history of 8 postcodes.

  • aux_info<n> : Some failures have additional debug information

  • fdo_mode : To allow recovery in scenarios where MEI itself fails, a new SPI Flash Descriptor Override (FDO) mode is added in v2 survivability breadcrumbs. This mode is enabled by PCODE and provides the ability to directly update the firmware via SPI Driver without any dependency on MEI. Xe KMD initializes the nvm aux driver if FDO mode is enabled.
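The shift-left postcode history described for postcode_trace and postcode_trace_overflow can be illustrated with a short sketch. The layout used here is an assumption, not something the text specifies: each attribute is treated as a 32-bit word holding four postcodes, the newest postcode sits in the least-significant byte of postcode_trace, and postcode_trace_overflow holds the four older ones. All `sketch_` names are hypothetical.

```c
#include <stdint.h>

#define SKETCH_POSTCODE_COUNT 8

/* Split the two words into 8 postcodes, out[0] newest ... out[7] oldest.
 * Assumed layout: overflow forms the upper 32 bits of a 64-bit history. */
static void sketch_decode_postcodes(uint32_t trace, uint32_t overflow,
                                    uint8_t out[SKETCH_POSTCODE_COUNT])
{
    uint64_t history = ((uint64_t)overflow << 32) | trace;

    for (int i = 0; i < SKETCH_POSTCODE_COUNT; i++)
        out[i] = (uint8_t)(history >> (8 * i));
}

/* Model of logging a new failure event: existing postcodes shift left by
 * one byte, the new postcode enters at the low byte, and the oldest of
 * the 8 entries falls off the top. */
static uint64_t sketch_log_postcode(uint64_t history, uint8_t postcode)
{
    return (history << 8) | postcode;
}
```

Under this assumed layout, a postcode_trace value of 0x04030201 together with a postcode_trace_overflow value of 0x08070605 decodes to the postcodes 0x01 (newest) through 0x08 (oldest).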

Runtime Survivability

Certain runtime firmware errors can cause the device to enter a wedged state (Xe Device Wedging), requiring a firmware flash to restore normal operation. Runtime Survivability Mode indicates that a firmware flash is necessary to recover the device and is indicated by the presence of the survivability mode sysfs entry. The survivability mode sysfs entry provides information about the type of survivability mode.

# cat /sys/bus/pci/devices/<device>/survivability_mode
Runtime

When such errors occur, userspace is notified with the drm device wedged uevent and runtime survivability mode. The user can then initiate a firmware flash using userspace tools like fwupd to restore the device to normal operation.