Firmware
Firmware Layout
The CSS-based firmware structure is used for GuC releases on all platforms and for HuC releases up to DG1. Starting from DG2/MTL, the HuC uses the GSC layout instead. The CSS firmware layout looks like this:
    +======================================================================+
    |  Firmware blob                                                       |
    +===============+===============+============+============+============+
    |  CSS header   |     uCode     |  RSA key   |  modulus   |  exponent  |
    +===============+===============+============+============+============+
     <-header size->                 <---header size continued ----------->
     <--- size ----------------------------------------------------------->
                                     <-key size->
                                                  <-mod size->
                                                               <-exp size->
The firmware may or may not include the modulus and exponent data. The header, uCode and RSA signature are mandatory components used by the driver. The length of each component, in dwords, can be found in the header. If the modulus and exponent are not present in the firmware (a.k.a. a truncated image), their length values still appear in the header.
The driver performs some basic firmware size validation based on the following rules:
- The header, uCode and RSA signature are mandatory components.
- All firmware components, if present, appear in the order illustrated in the layout table above.
- The length of each component can be found in the header, in dwords.
- The modulus and exponent are not required by the driver and may be absent from the firmware; in that case the driver loads the truncated image.
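As an illustration, the rules above can be sketched as a size check in C. This is a minimal sketch, not the driver's implementation: `struct css_sizes`, its fields and `css_blob_size_ok()` are hypothetical names, and each length is assumed to cover only its own component (the real CSS header folds some of these sizes together, as the "header size continued" span in the layout shows).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define DW_BYTES 4u /* component lengths are expressed in dwords */

/* Hypothetical per-component lengths taken from a CSS header, in dwords. */
struct css_sizes {
    uint32_t header_dw;   /* CSS header (mandatory) */
    uint32_t ucode_dw;    /* uCode (mandatory) */
    uint32_t key_dw;      /* RSA signature (mandatory) */
    uint32_t modulus_dw;  /* modulus (optional) */
    uint32_t exponent_dw; /* exponent (optional) */
};

/*
 * Return true if a blob of blob_bytes holds at least the header, uCode and
 * RSA signature. *truncated is set when modulus/exponent do not fully fit.
 */
static bool css_blob_size_ok(const struct css_sizes *s, size_t blob_bytes,
                             bool *truncated)
{
    size_t required = ((size_t)s->header_dw + s->ucode_dw + s->key_dw) * DW_BYTES;
    size_t full = required + ((size_t)s->modulus_dw + s->exponent_dw) * DW_BYTES;

    if (blob_bytes < required)
        return false; /* a mandatory component is missing */

    /* Modulus and exponent may be absent: accept, but report truncation. */
    *truncated = blob_bytes < full;
    return true;
}
```

For a header reporting 32 + 100 + 64 dwords of mandatory data, a blob of at least 784 bytes passes; anything shorter is rejected, and a blob that stops before the end of the exponent is flagged as truncated.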
The GSC-based firmware structure is used for GSC releases on all platforms and for HuC releases starting from DG2/MTL. Older HuC releases use the CSS-based layout instead. Unlike the CSS headers, the GSC headers use a directory + entries structure (i.e., there is an array of addresses pointing to specific header extensions identified by a name). Although the header structures are the same, some of the entries are specific to GSC while others are specific to HuC. The manifest header entry, which includes basic information about the binary (like the version), is always present, but it is named differently based on the binary type.
The HuC binary starts with a Code Partition Directory (CPD) header. The entries we’re interested in for use in the driver are:
- “HUCP.man”: points to the manifest header for the HuC.
- “huc_fw”: points to the FW code. On platforms that support load via DMA and 2-step HuC authentication (i.e. MTL+) this is a full CSS-based binary, while if the GSC is the one doing the load (which only happens on DG2) this section only contains the uCode.
The GSC-based HuC firmware layout looks like this:
    +================================================+
    |  CPD Header                                    |
    +================================================+
    |  CPD entries[]                                 |
    |      entry1                                    |
    |      ...                                       |
    |      entryX                                    |
    |          "HUCP.man"                            |
    |           ...                                  |
    |           offset  >----------------------------|------o
    |      ...                                       |      |
    |      entryY                                    |      |
    |          "huc_fw"                              |      |
    |           ...                                  |      |
    |           offset  >----------------------------|----------o
    +================================================+      |   |
                                                            |   |
    +================================================+      |   |
    |   Manifest Header                              |<-----o   |
    |      ...                                       |          |
    |      FW version                                |          |
    |      ...                                       |          |
    +================================================+          |
                                                                |
    +================================================+          |
    |   FW binary                                    |<---------o
    |      CSS (MTL+ only)                           |
    |      uCode                                     |
    |      RSA Key (MTL+ only)                       |
    |      ...                                       |
    +================================================+
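Locating an entry such as “HUCP.man” or “huc_fw” amounts to a name lookup over the CPD entries array. The sketch below assumes a simplified, hypothetical entry layout (`struct cpd_entry` with a 12-byte name, a byte offset and a length); the real on-disk entry format may differ, and `cpd_find()` is an illustrative name only.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified CPD entry: fixed-size name plus location. */
struct cpd_entry {
    char name[12];   /* e.g. "HUCP.man"; NUL-padded, not always terminated */
    uint32_t offset; /* byte offset of the referenced section */
    uint32_t length; /* byte length of the referenced section */
};

/* Return the entry called `name`, or NULL if there is none. */
static const struct cpd_entry *cpd_find(const struct cpd_entry *entries,
                                        size_t count, const char *name)
{
    /* A name longer than the field can never match. */
    if (strlen(name) > sizeof(entries->name))
        return NULL;

    for (size_t i = 0; i < count; i++) {
        /* Bound the compare: the name field need not be NUL-terminated. */
        if (strncmp(entries[i].name, name, sizeof(entries[i].name)) == 0)
            return &entries[i];
    }
    return NULL;
}
```

The bounded `strncmp()` is the main design point: a name that fills all 12 bytes has no terminator, so an unbounded compare could read past the field.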
The GSC binary starts instead with a layout header, which contains the locations of the various partitions of the binary. The one we’re interested in is the boot1 partition, where we can find a BPDT header followed by entries, one of which points to the RBE sub-section of the partition, which contains the CPD. The GSC blob does not contain a CSS-based binary, so we only need to look for the manifest, which is under the “RBEP.man” CPD entry. Note that we have no need to find where the actual FW code is inside the image because the GSC ROM will itself parse the headers to find it and load it. The GSC firmware header layout looks like this:
    +================================================+
    |  Layout Pointers                               |
    |      ...                                       |
    |      Boot1 offset  >---------------------------|------o
    |      ...                                       |      |
    +================================================+      |
                                                            |
    +================================================+      |
    |  BPDT header                                   |<-----o
    +================================================+
    |  BPDT entries[]                                |
    |      entry1                                    |
    |      ...                                       |
    |      entryX                                    |
    |          type == GSC_RBE                       |
    |          offset  >-----------------------------|------o
    |      ...                                       |      |
    +================================================+      |
                                                            |
    +================================================+      |
    |  CPD Header                                    |<-----o
    +================================================+
    |  CPD entries[]                                 |
    |      entry1                                    |
    |      ...                                       |
    |      entryX                                    |
    |          "RBEP.man"                            |
    |           ...                                  |
    |           offset  >----------------------------|------o
    |      ...                                       |      |
    +================================================+      |
                                                            |
    +================================================+      |
    | Manifest Header                                |<-----o
    |  ...                                           |
    |  FW version                                    |
    |  ...                                           |
    |  Security version                              |
    |  ...                                           |
    +================================================+
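Following the chain from the layout pointers to the CPD can be sketched in a similar way. In this hypothetical sketch, `struct bpdt_entry` and `gsc_find_cpd()` are illustrative names, the numeric value of the GSC_RBE type is passed in by the caller rather than assumed, and each BPDT entry offset is assumed to be relative to the start of the boot1 partition (where the BPDT header sits).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified BPDT entry; field names are illustrative only. */
struct bpdt_entry {
    uint32_t type;   /* sub-partition type, e.g. the GSC RBE type */
    uint32_t offset; /* assumed relative to the BPDT header (boot1 start) */
    uint32_t size;   /* sub-partition size in bytes */
};

/*
 * Resolve the absolute blob offset of the CPD header. boot1_offset comes
 * from the layout pointers; the BPDT header sits at that offset.
 * Returns 0 on success, -1 if no entry of type rbe_type exists.
 */
static int gsc_find_cpd(uint32_t boot1_offset, const struct bpdt_entry *entries,
                        size_t count, uint32_t rbe_type, uint64_t *cpd_offset)
{
    for (size_t i = 0; i < count; i++) {
        if (entries[i].type == rbe_type) {
            /* Widen before adding so the sum cannot wrap around. */
            *cpd_offset = (uint64_t)boot1_offset + entries[i].offset;
            return 0;
        }
    }
    return -1;
}
```

The resulting offset would then be the starting point for a CPD name lookup of “RBEP.man” to reach the manifest header.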
Write Once Protected Content Memory (WOPCM) Layout
The layout of the WOPCM is fixed once the GuC WOPCM size and offset registers are written. Their values are calculated from the HuC/GuC firmware sizes and a set of hardware requirements/restrictions, as shown below:
    +=========> +====================+ <== WOPCM Top
    ^           |  HW contexts RSVD  |
    |     +===> +====================+ <== GuC WOPCM Top
    |     ^     |                    |
    |     |     |                    |
    |     |     |                    |
    |    GuC    |                    |
    |   WOPCM   |                    |
    |    Size   +--------------------+
    WOPCM |     |    GuC FW RSVD     |
    |     |     +--------------------+
    |     |     |   GuC Stack RSVD   |
    |     |     +------------------- +
    |     v     |   GuC WOPCM RSVD   |
    |     +===> +====================+ <== GuC WOPCM base
    |           |     WOPCM RSVD     |
    |           +------------------- + <== HuC Firmware Top
    v           |      HuC FW        |
    +=========> +====================+ <== WOPCM Base
GuC-accessible WOPCM starts at the GuC WOPCM base and ends at the GuC WOPCM top. The top part of the WOPCM is reserved for hardware contexts (e.g. the RC6 context).
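The partitioning in the diagram can be sketched as simple arithmetic over its regions. This is a hypothetical sketch: the `struct wopcm_params` fields, the single power-of-two base alignment, and folding the GuC stack and GuC WOPCM reservations into one `guc_rsvd` value are simplifications for illustration, not the actual hardware restrictions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inputs, all in bytes; base_align must be a power of two. */
struct wopcm_params {
    uint32_t wopcm_size;  /* WOPCM Base .. WOPCM Top */
    uint32_t ctx_rsvd;    /* HW contexts RSVD at the top */
    uint32_t huc_fw_size; /* HuC FW at the base */
    uint32_t wopcm_rsvd;  /* WOPCM RSVD above the HuC FW */
    uint32_t guc_fw_size; /* GuC FW that must fit in the GuC WOPCM */
    uint32_t guc_rsvd;    /* GuC stack + GuC WOPCM reservations */
    uint32_t base_align;  /* alignment of the GuC WOPCM base */
};

/* Round x up to a power-of-two alignment. */
static uint64_t align_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/* Compute the GuC WOPCM base and size; false if the layout does not fit. */
static bool wopcm_partition(const struct wopcm_params *p,
                            uint32_t *guc_base, uint32_t *guc_size)
{
    uint64_t base, top;

    if (p->ctx_rsvd > p->wopcm_size)
        return false;

    /* GuC WOPCM starts above the HuC FW and the WOPCM reserved gap... */
    base = align_up((uint64_t)p->huc_fw_size + p->wopcm_rsvd, p->base_align);
    /* ...and ends below the HW contexts reserved area. */
    top = p->wopcm_size - p->ctx_rsvd;

    if (base >= top || (uint64_t)p->guc_fw_size + p->guc_rsvd > top - base)
        return false;

    *guc_base = (uint32_t)base;
    *guc_size = (uint32_t)(top - base);
    return true;
}
```

With a 1 MiB WOPCM, 0x24000 bytes of context reservation, a 0x30100-byte HuC image, a 0x4000 reserved gap and 16K alignment, this sketch places the GuC WOPCM base at 0x38000 with a size of 0xA4000.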
GuC CTB Blob
We allocate a single blob to hold both CTB descriptors and buffers:
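    +--------+-----------------------------+------+
    | offset | contents                    | size |
    +========+=============================+======+
    | 0x0000 | H2G CTB Descriptor (send)   |      |
    +--------+-----------------------------+  4K  |
    | 0x0800 | G2H CTB Descriptor (g2h)    |      |
    +--------+-----------------------------+------+
    | 0x1000 | H2G CT Buffer (send)        | n*4K |
    +--------+-----------------------------+------+
    | 0x1000 | G2H CT Buffer (g2h)         | m*4K |
    | + n*4K |                             |      |
    +--------+-----------------------------+------+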
The size of each CT buffer must be a multiple of 4K. We don’t expect too many messages in flight at any time, unless GuC submission is in use. In that case each request requires a minimum of 2 dwords, which gives us a maximum of 256 queued requests. Hopefully this is enough space to avoid backpressure on the driver. We increase the size of the receive buffer (relative to the send buffer) to ensure a G2H response CTB has a landing spot.
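The offsets in the blob table follow directly from the two buffer sizes. The sketch below computes them under the 4K-multiple rule; `struct ctb_layout`, `ctb_layout_compute()` and `CTB_UNIT` are hypothetical names, and the example sizes used with it are illustrative only.

```c
#include <stdbool.h>
#include <stdint.h>

#define CTB_UNIT 0x1000u /* CT buffers must be a multiple of 4K */

/* Hypothetical description of the single CTB blob; offsets in bytes. */
struct ctb_layout {
    uint32_t h2g_desc; /* H2G CTB descriptor (send) */
    uint32_t g2h_desc; /* G2H CTB descriptor */
    uint32_t h2g_buf;  /* H2G CT buffer (send) */
    uint32_t g2h_buf;  /* G2H CT buffer */
    uint32_t total;    /* size of the whole blob */
};

/* Lay out n*4K of H2G and m*4K of G2H buffer space (sizes in bytes). */
static bool ctb_layout_compute(uint32_t h2g_size, uint32_t g2h_size,
                               struct ctb_layout *l)
{
    if (h2g_size == 0 || g2h_size == 0 ||
        h2g_size % CTB_UNIT || g2h_size % CTB_UNIT)
        return false;

    l->h2g_desc = 0x0000;               /* both descriptors share the first 4K */
    l->g2h_desc = 0x0800;
    l->h2g_buf = 0x1000;                /* buffers start after the descriptors */
    l->g2h_buf = l->h2g_buf + h2g_size; /* 0x1000 + n*4K */
    l->total = l->g2h_buf + g2h_size;
    return true;
}
```

For example, a 4K send buffer with a larger 16K receive buffer puts the G2H buffer at 0x2000 and gives a 0x6000-byte blob; a size that is not a 4K multiple is rejected.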
In addition to submissions, the G2H buffer needs enough space for recoverable page fault notifications. The number of page faults is interrupt driven and can be as large as the number of compute resources available. However, most of the actual work for these is done in a separate page fault worker thread. Therefore we only need to make sure the queue has enough space to handle all of the submissions and responses, plus an extra buffer for incoming page faults.
GuC Power Conservation (PC)
GuC Power Conservation (PC) supports multiple features for the most efficient and performant use of the GT when GuC submission is enabled, including frequency management, Render-C states management, and various algorithms for power balancing.
Single Loop Power Conservation (SLPC) is the name given to the suite of connected power conservation features in the GuC firmware. The firmware exposes a programming interface to the host for the control of SLPC.
Frequency management:
The Xe driver enables SLPC with all of its default features and frequency selection, which varies per platform.
Power profiles add another level of control to SLPC. When the power saving profile is chosen, SLPC uses conservative thresholds to ramp frequency, thus saving power. The base profile is the default and ensures balanced performance for any workload.
Render-C States:
Render-C states are also a GuC PC feature, which is now enabled in Xe for all platforms.
Implementation details:
The implementation for GuC Power Management features is split as follows:
- xe_guc_rc: Logic for handling GuC RC
- xe_gt_idle: Host side logic for RC6 and Coarse Power gating (CPG)
- xe_guc_pc: Logic for all other SLPC related features
There is some cross interaction between these, where host C6 needs to be enabled when we plan to skip GuC RC. Also, the GuC RC mode is currently overridden through 0x3003, which is an SLPC H2G call.
PCIe Gen5 Limitations
The default link speed of discrete GPUs is determined by configuration parameters stored in their flash memory, which are subject to override through user-initiated firmware updates. It has been observed that devices configured with PCIe Gen5 as their default link speed can encounter link quality issues due to host or motherboard limitations, and may have to auto-downgrade their link to PCIe Gen4 speed when faced with an unstable link at Gen5. This makes firmware updates rather risky on such setups. Before pushing a firmware image with PCIe Gen5 as the default configuration, it is required to ensure that the device is capable of auto-downgrading its link to PCIe Gen4 speed. This can be done by reading the auto_link_downgrade_capable sysfs entry, which reports with a boolean value of 0 or 1 whether the device is incapable or capable, respectively, of auto-downgrading its link to PCIe Gen4 speed.
    $ cat /sys/bus/pci/devices/<bdf>/auto_link_downgrade_capable
Pushing a firmware image with PCIe Gen5 as the default configuration to a device that is incapable of auto link downgrade, and then facing link instability due to host or motherboard limitations, can result in the driver failing to bind to the device. This makes further firmware updates impossible, leaving RMA as the only remaining resort.
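Checking the capability from a program is a matter of reading that sysfs file. A minimal sketch in C, assuming only the path shown above; `parse_sysfs_bool()` and `read_pci_bool_attr()` are hypothetical helpers, and any BDF passed in (for example `0000:03:00.0`) is just a placeholder.

```c
#include <stdio.h>

/* Parse a sysfs boolean ("0" or "1", optionally newline-terminated). */
static int parse_sysfs_bool(const char *buf)
{
    if ((buf[0] == '0' || buf[0] == '1') && (buf[1] == '\n' || buf[1] == '\0'))
        return buf[0] - '0';
    return -1;
}

/*
 * Read /sys/bus/pci/devices/<bdf>/<attr>. Returns 0 or 1, or -1 if the
 * path is too long, the file cannot be read, or the value is not boolean.
 */
static int read_pci_bool_attr(const char *bdf, const char *attr)
{
    char path[256];
    char buf[8] = "";
    FILE *f;
    int n = snprintf(path, sizeof(path), "/sys/bus/pci/devices/%s/%s", bdf, attr);

    if (n < 0 || (size_t)n >= sizeof(path))
        return -1; /* path would have been truncated */

    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, sizeof(buf), f))
        buf[0] = '\0'; /* empty or unreadable: parsed as invalid below */
    fclose(f);
    return parse_sysfs_bool(buf);
}
```

A conservative caller might treat only a result of 1 from `read_pci_bool_attr(bdf, "auto_link_downgrade_capable")` as capable, and handle both 0 and -1 (unknown) as incapable.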
The link downgrade status of auto link downgrade capable devices is available through the auto_link_downgrade_status sysfs entry, with a boolean value of 0 or 1: 0 means no auto-downgrading was required during link training (which is the optimal scenario), and 1 means the device has auto-downgraded its link to PCIe Gen4 speed due to an unstable Gen5 link.
    $ cat /sys/bus/pci/devices/<bdf>/auto_link_downgrade_status
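Since the status is only reported for capable devices, the two values can be combined into a single summary. This is a hypothetical sketch: `link_downgrade_summary()` and its messages are illustrative, and its inputs are assumed to be the 0/1 values read from auto_link_downgrade_capable and auto_link_downgrade_status, with -1 standing for a value that could not be read.

```c
/*
 * Summarize the auto link downgrade sysfs values. capable and status are
 * the 0/1 values read from auto_link_downgrade_capable and
 * auto_link_downgrade_status, or -1 if a value could not be read.
 */
static const char *link_downgrade_summary(int capable, int status)
{
    if (capable == 0)
        return "incapable: pushing a Gen5-default image is risky";
    if (capable != 1)
        return "capability unknown";

    /* The status is only meaningful once the device is known to be capable. */
    switch (status) {
    case 0:
        return "capable: no auto-downgrade was required (optimal)";
    case 1:
        return "capable: link auto-downgraded to PCIe Gen4 (unstable Gen5 link)";
    default:
        return "capable: downgrade status unknown";
    }
}
```

Checking capability first mirrors the text: the status entry describes the outcome of link training on capable devices, so it says nothing useful about an incapable one.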
Internal API
TODO