drm/i915 Intel GFX Driver

The drm/i915 driver supports all (with the exception of some very early models) integrated GFX chipsets with both Intel display and rendering blocks. This excludes a set of SoC platforms with an SGX rendering unit; those have basic support through the gma500 drm driver.

Core Driver Infrastructure

This section covers core driver infrastructure used by both the display and the GEM parts of the driver.

Runtime Power Management

The i915 driver supports dynamic enabling and disabling of entire hardware blocks at runtime. This is especially important on the display side where software is supposed to control many power gates manually on recent hardware, since on the GT side a lot of the power management is done by the hardware. But even there some manual control at the device level is required.

Since i915 supports a diverse set of platforms with a unified codebase and hardware engineers just love to shuffle functionality around between power domains there's a sizeable amount of indirection required. This file provides generic functions to the driver for grabbing and releasing references for abstract power domains. It then maps those to the actual power wells present for a given platform.

intel_wakeref_t intel_runtime_pm_get_raw(struct intel_runtime_pm *rpm)

grab a raw runtime pm reference

Parameters

struct intel_runtime_pm *rpm

the intel_runtime_pm structure

Description

This function grabs a device-level runtime pm reference (mostly used for asynchronous PM management from display code) and ensures that it is powered up. Raw references are not considered during wakelock assert checks.

Any runtime pm reference obtained by this function must have a symmetric call to intel_runtime_pm_put_raw() to release the reference again.

Return

the wakeref cookie to pass to intel_runtime_pm_put_raw(), evaluates as True if the wakeref was acquired, or False otherwise.

intel_wakeref_t intel_runtime_pm_get(struct intel_runtime_pm *rpm)

grab a runtime pm reference

Parameters

struct intel_runtime_pm *rpm

the intel_runtime_pm structure

Description

This function grabs a device-level runtime pm reference (mostly used for GEM code to ensure the GTT or GT is on) and ensures that it is powered up.

Any runtime pm reference obtained by this function must have a symmetric call to intel_runtime_pm_put() to release the reference again.

Return

the wakeref cookie to pass to intel_runtime_pm_put()
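
To make the acquire/release contract concrete, here is a minimal usage sketch (a hypothetical caller, not code from the driver) showing the symmetric pairing of the wakeref cookie:

    /* Hypothetical GEM-side caller: hold a wakeref across HW access. */
    static void touch_hw(struct intel_runtime_pm *rpm)
    {
            intel_wakeref_t wakeref;

            wakeref = intel_runtime_pm_get(rpm); /* powers the device up */

            /* ... access GTT/GT registers here ... */

            intel_runtime_pm_put(rpm, wakeref); /* release the same cookie */
    }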

intel_wakeref_t __intel_runtime_pm_get_if_active(struct intel_runtime_pm *rpm, bool ignore_usecount)

grab a runtime pm reference if device is active

Parameters

struct intel_runtime_pm *rpm

the intel_runtime_pm structure

bool ignore_usecount

get a ref even if dev->power.usage_count is 0

Description

This function grabs a device-level runtime pm reference if the device is already active and ensures that it is powered up. It is illegal to try and access the HW should intel_runtime_pm_get_if_active() report failure.

If ignore_usecount is true, a reference will be acquired even if there is no user requiring the device to be powered up (dev->power.usage_count == 0). If the function returns false in this case then it's guaranteed that the device's runtime suspend hook has been called already or that it will be called (and hence it's also guaranteed that the device's runtime resume hook will be called eventually).

Any runtime pm reference obtained by this function must have a symmetric call to intel_runtime_pm_put() to release the reference again.

Return

the wakeref cookie to pass to intel_runtime_pm_put(), evaluates as True if the wakeref was acquired, or False otherwise.
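
A conditional-access pattern follows from the Return semantics; this sketch (hypothetical caller, using the intel_runtime_pm_get_if_active() wrapper referenced above) bails out instead of resuming a suspended device:

    /* Hypothetical: only touch the HW if the device is already awake. */
    static void maybe_touch_hw(struct intel_runtime_pm *rpm)
    {
            intel_wakeref_t wakeref;

            wakeref = intel_runtime_pm_get_if_active(rpm);
            if (!wakeref)
                    return; /* device suspended; touching the HW is illegal */

            /* ... read/write registers ... */

            intel_runtime_pm_put(rpm, wakeref);
    }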

intel_wakeref_t intel_runtime_pm_get_noresume(struct intel_runtime_pm *rpm)

grab a runtime pm reference

Parameters

struct intel_runtime_pm *rpm

the intel_runtime_pm structure

Description

This function grabs a device-level runtime pm reference.

It will _not_ resume the device but instead only get an extra wakeref. Therefore it is only valid to call this function from contexts where the device is known to be active and another wakeref is already held.

Any runtime pm reference obtained by this function must have a symmetric call to intel_runtime_pm_put() to release the reference again.

Return

the wakeref cookie to pass to intel_runtime_pm_put()

void intel_runtime_pm_put_raw(struct intel_runtime_pm *rpm, intel_wakeref_t wref)

release a raw runtime pm reference

Parameters

struct intel_runtime_pm *rpm

the intel_runtime_pm structure

intel_wakeref_t wref

wakeref acquired for the reference that is being released

Description

This function drops the device-level runtime pm reference obtained by intel_runtime_pm_get_raw() and might power down the corresponding hardware block right away if this is the last reference.

void intel_runtime_pm_put_unchecked(struct intel_runtime_pm *rpm)

release an unchecked runtime pm reference

Parameters

struct intel_runtime_pm *rpm

the intel_runtime_pm structure

Description

This function drops the device-level runtime pm reference obtained by intel_runtime_pm_get() and might power down the corresponding hardware block right away if this is the last reference.

This function exists only for historical reasons and should be avoided in new code, as the correctness of its use cannot be checked. Always use intel_runtime_pm_put() instead.

void intel_runtime_pm_put(struct intel_runtime_pm *rpm, intel_wakeref_t wref)

release a runtime pm reference

Parameters

struct intel_runtime_pm *rpm

the intel_runtime_pm structure

intel_wakeref_t wref

wakeref acquired for the reference that is being released

Description

This function drops the device-level runtime pm reference obtained by intel_runtime_pm_get() and might power down the corresponding hardware block right away if this is the last reference.

void intel_runtime_pm_enable(struct intel_runtime_pm *rpm)

enable runtime pm

Parameters

struct intel_runtime_pm *rpm

the intel_runtime_pm structure

Description

This function enables runtime pm at the end of the driver load sequence.

Note that this function does not currently enable runtime pm for the subordinate display power domains. That is done by intel_power_domains_enable().

void intel_uncore_forcewake_get(struct intel_uncore *uncore, enum forcewake_domains fw_domains)

grab forcewake domain references

Parameters

struct intel_uncore *uncore

the intel_uncore structure

enum forcewake_domains fw_domains

forcewake domains to get reference on

Description

This function can be used to get the GT's forcewake domain references. Normal register access will handle the forcewake domains automatically. However if some sequence requires the GT to not power down a particular forcewake domain this function should be called at the beginning of the sequence. And subsequently the reference should be dropped by a symmetric call to intel_uncore_forcewake_put(). Usually the caller wants all the domains to be kept awake, so fw_domains would then be FORCEWAKE_ALL.
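
A minimal sketch of the pattern just described (hypothetical call site, keeping all domains awake as suggested):

    /* Hypothetical multi-register sequence that must not lose forcewake. */
    intel_uncore_forcewake_get(uncore, FORCEWAKE_ALL);

    /* ... program several dependent registers as one sequence ... */

    intel_uncore_forcewake_put(uncore, FORCEWAKE_ALL);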

void intel_uncore_forcewake_user_get(struct intel_uncore *uncore)

claim forcewake on behalf of userspace

Parameters

struct intel_uncore *uncore

the intel_uncore structure

Description

This function is a wrapper around intel_uncore_forcewake_get() to acquire the GT powerwell and in the process disable our debugging for the duration of userspace's bypass.

void intel_uncore_forcewake_user_put(struct intel_uncore *uncore)

release forcewake on behalf of userspace

Parameters

struct intel_uncore *uncore

the intel_uncore structure

Description

This function complements intel_uncore_forcewake_user_get() and releases the GT powerwell taken on behalf of the userspace bypass.

void intel_uncore_forcewake_get__locked(struct intel_uncore *uncore, enum forcewake_domains fw_domains)

grab forcewake domain references

Parameters

struct intel_uncore *uncore

the intel_uncore structure

enum forcewake_domains fw_domains

forcewake domains to get reference on

Description

See intel_uncore_forcewake_get(). This variant places the onus on the caller to explicitly handle the dev_priv->uncore.lock spinlock.

void intel_uncore_forcewake_put(struct intel_uncore *uncore, enum forcewake_domains fw_domains)

release a forcewake domain reference

Parameters

struct intel_uncore *uncore

the intel_uncore structure

enum forcewake_domains fw_domains

forcewake domains to put references

Description

This function drops the device-level forcewakes for specified domains obtained by intel_uncore_forcewake_get().

void intel_uncore_forcewake_flush(struct intel_uncore *uncore, enum forcewake_domains fw_domains)

flush the delayed release

Parameters

struct intel_uncore *uncore

the intel_uncore structure

enum forcewake_domains fw_domains

forcewake domains to flush

void intel_uncore_forcewake_put__locked(struct intel_uncore *uncore, enum forcewake_domains fw_domains)

release forcewake domain references

Parameters

struct intel_uncore *uncore

the intel_uncore structure

enum forcewake_domains fw_domains

forcewake domains to put references

Description

See intel_uncore_forcewake_put(). This variant places the onus on the caller to explicitly handle the dev_priv->uncore.lock spinlock.

int __intel_wait_for_register_fw(struct intel_uncore *uncore, i915_reg_t reg, u32 mask, u32 value, unsigned int fast_timeout_us, unsigned int slow_timeout_ms, u32 *out_value)

wait until register matches expected state

Parameters

struct intel_uncore *uncore

the struct intel_uncore

i915_reg_t reg

the register to read

u32 mask

mask to apply to register value

u32 value

expected value

unsigned int fast_timeout_us

fast timeout in microseconds for atomic/tight wait

unsigned int slow_timeout_ms

slow timeout in milliseconds

u32 *out_value

optional placeholder to hold the register value

Description

This routine waits until the target register reg contains the expected value after applying the mask, i.e. it waits until

(intel_uncore_read_fw(uncore, reg) & mask) == value

Otherwise, the wait will timeout after slow_timeout_ms milliseconds. For atomic context slow_timeout_ms must be zero and fast_timeout_us must be not larger than 20,000 microseconds.

Note that this routine assumes the caller holds forcewake asserted, it is not suitable for very long waits. See intel_wait_for_register() if you wish to wait without holding forcewake for the duration (i.e. you expect the wait to be slow).

Return

0 if the register matches the desired condition, or -ETIMEDOUT.
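
As an illustration, a caller already holding forcewake might poll for a status bit like this (register, bit and device names are made up for the sketch):

    /* Hypothetical: spin up to 500us, then sleep-wait up to 10ms, for a
     * done bit to latch; the last register value lands in val. */
    u32 val;
    int ret;

    ret = __intel_wait_for_register_fw(uncore, MY_STATUS_REG,
                                       MY_DONE_BIT, MY_DONE_BIT,
                                       500, 10, &val);
    if (ret == -ETIMEDOUT)
            drm_err(&i915->drm, "operation timed out, status 0x%08x\n", val);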

int __intel_wait_for_register(struct intel_uncore *uncore, i915_reg_t reg, u32 mask, u32 value, unsigned int fast_timeout_us, unsigned int slow_timeout_ms, u32 *out_value)

wait until register matches expected state

Parameters

struct intel_uncore *uncore

the struct intel_uncore

i915_reg_t reg

the register to read

u32 mask

mask to apply to register value

u32 value

expected value

unsigned int fast_timeout_us

fast timeout in microseconds for atomic/tight wait

unsigned int slow_timeout_ms

slow timeout in milliseconds

u32 *out_value

optional placeholder to hold the register value

Description

This routine waits until the target register reg contains the expected value after applying the mask, i.e. it waits until

(intel_uncore_read(uncore, reg) & mask) == value

Otherwise, the wait will timeout after slow_timeout_ms milliseconds.

Return

0 if the register matches the desired condition, or -ETIMEDOUT.

enum forcewake_domains intel_uncore_forcewake_for_reg(struct intel_uncore *uncore, i915_reg_t reg, unsigned int op)

which forcewake domains are needed to access a register

Parameters

struct intel_uncore *uncore

pointer to struct intel_uncore

i915_reg_t reg

register in question

unsigned int op

operation bitmask of FW_REG_READ and/or FW_REG_WRITE

Description

Returns a set of forcewake domains required to be taken (for example with intel_uncore_forcewake_get()) for the specified register to be accessible in the specified mode (read, write or read/write) with raw mmio accessors.

NOTE

On Gen6 and Gen7 the write forcewake domain (FORCEWAKE_RENDER) requires the callers to do FIFO management on their own or risk losing writes.
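
Putting the lookup together with the raw accessors, a sketch of the intended usage (hypothetical caller; SOME_BIT is a placeholder):

    /* Hypothetical raw read/modify/write holding exactly the needed domains. */
    enum forcewake_domains fw;
    u32 val;

    fw = intel_uncore_forcewake_for_reg(uncore, reg,
                                        FW_REG_READ | FW_REG_WRITE);

    intel_uncore_forcewake_get(uncore, fw);
    val = intel_uncore_read_fw(uncore, reg); /* raw accessor, no implicit fw */
    intel_uncore_write_fw(uncore, reg, val | SOME_BIT);
    intel_uncore_forcewake_put(uncore, fw);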

Interrupt Handling

These functions provide the basic support for enabling and disabling the interrupt handling support. There's a lot more functionality in i915_irq.c and related files, but that will be described in separate chapters.

void intel_irq_init(struct drm_i915_private *dev_priv)

initializes irq support

Parameters

struct drm_i915_private *dev_priv

i915 device instance

Description

This function initializes all the irq support including work items, timers and all the vtables. It does not set up the interrupt itself though.

void intel_irq_suspend(struct drm_i915_private *i915)

Suspend interrupts

Parameters

struct drm_i915_private *i915

i915 device instance

Description

This function is used to disable interrupts at runtime.

void intel_irq_resume(struct drm_i915_private *i915)

Resume interrupts

Parameters

struct drm_i915_private *i915

i915 device instance

Description

This function is used to enable interrupts at runtime.

Intel GVT-g Guest Support (vGPU)

Intel GVT-g is a graphics virtualization technology which shares the GPU among multiple virtual machines on a time-sharing basis. Each virtual machine is presented with a virtual GPU (vGPU), which has features equivalent to the underlying physical GPU (pGPU), so the i915 driver can run seamlessly in a virtual machine. This file provides vGPU specific optimizations when running in a virtual machine, to reduce the complexity of vGPU emulation and to improve the overall performance.

A primary function introduced here is the so-called "address space ballooning" technique. Intel GVT-g partitions global graphics memory among multiple VMs, so each VM can directly access a portion of the memory without the hypervisor's intervention, e.g. filling textures or queuing commands. However with the partitioning an unmodified i915 driver would assume a smaller graphics memory starting from address ZERO, and would then require the vGPU emulation module to translate the graphics address between 'guest view' and 'host view', for all registers and command opcodes which contain a graphics memory address. To reduce the complexity, Intel GVT-g introduces "address space ballooning": the exact partitioning knowledge is told to each guest i915 driver, which then reserves and prevents non-allocated portions from allocation. Thus the vGPU emulation module only needs to scan and validate graphics addresses without the complexity of address translation.
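
Conceptually, ballooning just marks the other VMs' ranges as reserved in the guest's GGTT address space manager. A minimal sketch using the generic drm_mm API (the driver's actual helper and field names may differ):

    #include <drm/drm_mm.h>

    /* Sketch: balloon out one address range by reserving it in the GGTT's
     * drm_mm, so the allocator can never hand it out to the guest. */
    static int balloon_range(struct drm_mm *mm, struct drm_mm_node *node,
                             u64 start, u64 size)
    {
            memset(node, 0, sizeof(*node));
            node->start = start;
            node->size = size;

            return drm_mm_reserve_node(mm, node); /* fails if range is busy */
    }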

void intel_vgpu_detect(struct drm_i915_private *dev_priv)

detect virtual GPU

Parameters

struct drm_i915_private *dev_priv

i915 device private

Description

This function is called at the initialization stage, to detect whether we are running on a vGPU.

void intel_vgt_deballoon(struct i915_ggtt *ggtt)

deballoon reserved graphics address chunks

Parameters

struct i915_ggtt *ggtt

the global GGTT from which we reserved earlier

Description

This function is called to deallocate the ballooned-out graphic memory, when the driver is unloaded or when ballooning fails.

int intel_vgt_balloon(struct i915_ggtt *ggtt)

balloon out reserved graphics address chunks

Parameters

struct i915_ggtt *ggtt

the global GGTT from which to reserve

Description

This function is called at the initialization stage, to balloon out the graphic address space allocated to other vGPUs, by marking these spaces as reserved. The ballooning related knowledge (starting address and size of the mappable/unmappable graphic memory) is described in the vgt_if structure in a reserved mmio range.

To give an example, the drawing below depicts one typical scenario after ballooning. Here vGPU1 has 2 pieces of graphic address spaces ballooned out, one for the mappable and one for the non-mappable part. From the vGPU1 point of view, the total size is the same as the physical one, with the start address of its graphic space being zero. Yet there are some portions ballooned out (the shadowed part, which are marked as reserved by the drm allocator). From the host point of view, the graphic address space is partitioned by multiple vGPUs in different VMs.

                        vGPU1 view         Host view
             0 ------> +-----------+     +-----------+
               ^       |###########|     |   vGPU3   |
               |       |###########|     +-----------+
               |       |###########|     |   vGPU2   |
               |       +-----------+     +-----------+
        mappable GM    | available | ==> |   vGPU1   |
               |       +-----------+     +-----------+
               |       |###########|     |           |
               v       |###########|     |   Host    |
               +=======+===========+     +===========+
               ^       |###########|     |   vGPU3   |
               |       |###########|     +-----------+
               |       |###########|     |   vGPU2   |
               |       +-----------+     +-----------+
      unmappable GM    | available | ==> |   vGPU1   |
               |       +-----------+     +-----------+
               |       |###########|     |           |
               |       |###########|     |   Host    |
               v       |###########|     |           |
 total GM size ------> +-----------+     +-----------+

Return

zero on success, non-zero if configuration invalid or ballooning failed

Intel GVT-g Host Support (vGPU device model)

Intel GVT-g is a graphics virtualization technology which shares the GPU among multiple virtual machines on a time-sharing basis. Each virtual machine is presented with a virtual GPU (vGPU), which has features equivalent to the underlying physical GPU (pGPU), so the i915 driver can run seamlessly in a virtual machine.

To virtualize GPU resources the GVT-g driver depends on hypervisor technology, e.g. KVM/VFIO/mdev, Xen, etc., to provide resource access trapping capability and be virtualized within the GVT-g device module. More architectural design documentation is available on https://github.com/intel/gvt-linux/wiki.

int intel_gvt_init(struct drm_i915_private *dev_priv)

initialize GVT components

Parameters

struct drm_i915_private *dev_priv

drm i915 private data

Description

This function is called at the initialization stage to create a GVT device.

Return

Zero on success, negative error code if failed.

void intel_gvt_driver_remove(struct drm_i915_private *dev_priv)

cleanup GVT components when i915 driver is unbinding

Parameters

struct drm_i915_private *dev_priv

drm i915 private *

Description

This function is called at the i915 driver unloading stage, to shut down GVT components and release the related resources.

void intel_gvt_resume(struct drm_i915_private *dev_priv)

GVT resume routine wrapper

Parameters

struct drm_i915_private *dev_priv

drm i915 private *

Description

This function is called at the i915 driver resume stage to restore the required HW status for GVT so that a vGPU can continue running after resume.

Workarounds

Hardware workarounds are register programming documented to be executed in the driver that fall outside of the normal programming sequences for a platform. There are some basic categories of workarounds, depending on how/when they are applied:

  • Context workarounds: workarounds that touch registers that are saved/restored to/from the HW context image. The list is emitted (via Load Register Immediate commands) once when initializing the device and saved in the default context. That default context is then used on every context creation to have a "primed golden context", i.e. a context image that already contains the changes needed to all the registers.

    Context workarounds should be implemented in the *_ctx_workarounds_init() variants respective to the targeted platforms (a sketch of such an entry appears after this list).

  • Engine workarounds: the list of these WAs is applied whenever the specific engine is reset. It's also possible that a set of engine classes share a common power domain and are reset together. This happens on some platforms with render and compute engines. In this case (at least) one of them needs to keep the workaround programming: the approach taken in the driver is to tie those workarounds to the first compute/render engine that is registered. When executing with GuC submission, engine resets are outside of kernel driver control, hence the list of registers involved is written once, on engine initialization, and then passed to GuC, which saves/restores their values before/after the reset takes place. See drivers/gpu/drm/i915/gt/uc/intel_guc_ads.c for reference.

    Workarounds for registers specific to RCS and CCS should be implemented in rcs_engine_wa_init() and ccs_engine_wa_init(), respectively; those for registers belonging to BCS, VCS or VECS should be implemented in xcs_engine_wa_init(). Workarounds for registers not belonging to a specific engine's MMIO range but that are part of the common RCS/CCS reset domain should be implemented in general_render_compute_wa_init(). The settings for CCS load balancing should be added in ccs_engine_wa_mode().

  • GT workarounds: the list of these WAs is applied whenever these registers revert to their default values: on GPU reset, suspend/resume[1], etc.

    GT workarounds should be implemented in the *_gt_workarounds_init() variants respective to the targeted platforms.

  • Register whitelist: some workarounds need to be implemented in userspace, but need to touch privileged registers. The whitelist in the kernel instructs the hardware to allow the access to happen. From the kernel side, this is just a special case of a MMIO workaround (as we write the list of these to-be-whitelisted registers to some special HW registers).

    Register whitelisting should be done in the *_whitelist_build() variants respective to the targeted platforms.

  • Workaround batchbuffers: buffers that get executed automatically by the hardware on every HW context restore. These buffers are created and programmed in the default context so the hardware always goes through those programming sequences when switching contexts. The support for workaround batchbuffers is enabled by these hardware mechanisms:

    1. INDIRECT_CTX: A batchbuffer and an offset are provided in the default context, pointing the hardware to jump to that location when that offset is reached in the context restore. The workaround batchbuffer in the driver currently uses this mechanism for all platforms.

    2. BB_PER_CTX_PTR: A batchbuffer is provided in the default context, pointing the hardware to a buffer to continue executing after the engine registers are restored in a context restore sequence. This is currently not used in the driver.

  • Other: There are WAs that, due to their nature, cannot be applied from a central place. Those are peppered around the rest of the code, as needed. Workarounds related to the display IP are the main example.

[1]

Technically, some registers are powercontext saved & restored, so they survive a suspend/resume. In practice, writing them again is not too costly and simplifies things, so it's the approach taken in the driver.
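
As referenced in the context workarounds item above, a sketch of what one platform's context workaround entry might look like (names are illustrative; wa_masked_en() mirrors the style of the driver's internal helpers but is not guaranteed to match the exact API):

    /* Illustrative only: enable one chicken bit in the context image for
     * a hypothetical platform via a masked-write register helper. */
    static void xyz_ctx_workarounds_init(struct intel_engine_cs *engine,
                                         struct i915_wa_list *wal)
    {
            /* Wa_12345678:xyz - assumed workaround name, register and bit. */
            wa_masked_en(wal, XYZ_CHICKEN_REG, XYZ_DISABLE_FOO);
    }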

Display Hardware Handling

This section covers everything related to the display hardware including the mode setting infrastructure, plane, sprite and cursor handling and display, output probing and related topics.

Mode Setting Infrastructure

The i915 driver is thus far the only DRM driver which doesn't use the common DRM helper code to implement mode setting sequences. Thus it has its own tailor-made infrastructure for executing a display configuration change.

Frontbuffer Tracking

Many features require us to track changes to the currently active frontbuffer, especially rendering targeted at the frontbuffer.

To be able to do so we track frontbuffers using a bitmask for all possible frontbuffer slots through intel_frontbuffer_track(). The functions in this file are then called when the contents of the frontbuffer are invalidated, when frontbuffer rendering has stopped again to flush out all the changes and when the frontbuffer is exchanged with a flip. Subsystems interested in frontbuffer changes (e.g. PSR, FBC, DRRS) should directly put their callbacks into the relevant places and filter for the frontbuffer slots that they are interested in.

On a high level there are two types of powersaving features. The first type works like a special cache (FBC and PSR) and is interested in when it should stop caching and when to restart caching. This is done by placing callbacks into the invalidate and the flush functions: at invalidate the caching must be stopped and at flush time it can be restarted. And maybe they need to know when the frontbuffer changes (e.g. when the hw doesn't initiate an invalidate and flush on its own) which can be achieved by placing callbacks into the flip functions.

The other type of display power saving feature only cares about busyness (e.g. DRRS). In that case all three (invalidate, flush and flip) indicate busyness. There is no direct way to detect idleness. Instead an idle timer (delayed work) should be started from the flush and flip functions and cancelled as soon as busyness is detected.
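
A sketch of the busyness-only pattern under the stated assumptions (hypothetical feature struct; the 1 second idle period is chosen arbitrarily):

    /* Hypothetical DRRS-like feature: every invalidate/flush/flip re-arms
     * the idle timer; idleness is inferred when the delayed work fires. */
    struct my_feature {
            struct delayed_work idle_work;
            /* ... */
    };

    static void feature_busy(struct my_feature *f)
    {
            /* cancels a pending timer and starts a fresh one */
            mod_delayed_work(system_wq, &f->idle_work, msecs_to_jiffies(1000));
    }

    static void feature_idle_work(struct work_struct *work)
    {
            struct my_feature *f = container_of(work, struct my_feature,
                                                idle_work.work);

            /* f was idle for a full second: drop to the low refresh rate */
    }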

bool intel_frontbuffer_invalidate(struct intel_frontbuffer *front, enum fb_op_origin origin)

invalidate frontbuffer object

Parameters

struct intel_frontbuffer *front

GEM object to invalidate

enum fb_op_origin origin

which operation caused the invalidation

Description

This function gets called every time rendering on the given object starts and frontbuffer caching (fbc, low refresh rate for DRRS, panel self refresh) must be invalidated. For ORIGIN_CS any subsequent invalidation will be delayed until the rendering completes or a flip on this frontbuffer plane is scheduled.

void intel_frontbuffer_flush(struct intel_frontbuffer *front, enum fb_op_origin origin)

flush frontbuffer object

Parameters

struct intel_frontbuffer *front

GEM object to flush

enum fb_op_origin origin

which operation caused the flush

Description

This function gets called every time rendering on the given object has completed and frontbuffer caching can be started again.

void frontbuffer_flush(struct intel_display *display, unsigned int frontbuffer_bits, enum fb_op_origin origin)

flush frontbuffer

Parameters

struct intel_display *display

display device

unsigned int frontbuffer_bits

frontbuffer plane tracking bits

enum fb_op_origin origin

which operation caused the flush

Description

This function gets called every time rendering on the given planes has completed and frontbuffer caching can be started again. Flushes will get delayed if they're blocked by some outstanding asynchronous rendering.

Can be called without any locks held.

void intel_frontbuffer_flip(struct intel_display *display, unsigned frontbuffer_bits)

synchronous frontbuffer flip

Parameters

struct intel_display *display

display device

unsigned frontbuffer_bits

frontbuffer plane tracking bits

Description

This function gets called after scheduling a flip on the given frontbuffer planes. This is for synchronous plane updates which will happen on the next vblank and which will not get delayed by pending gpu rendering.

Can be called without any locks held.

void intel_frontbuffer_queue_flush(struct intel_frontbuffer *front)

queue flushing frontbuffer object

Parameters

struct intel_frontbuffer *front

GEM object to flush

Description

This function is targeted for our dirty callback for queueing a flush when the dma fence is signaled.

void intel_frontbuffer_track(struct intel_frontbuffer *old, struct intel_frontbuffer *new, unsigned int frontbuffer_bits)

update frontbuffer tracking

Parameters

struct intel_frontbuffer *old

current buffer for the frontbuffer slots

struct intel_frontbuffer *new

new buffer for the frontbuffer slots

unsigned int frontbuffer_bits

bitmask of frontbuffer slots

Description

This updates the frontbuffer tracking bits frontbuffer_bits by clearing them from old and setting them in new. Both old and new can be NULL.
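
For instance, a plane commit might hand the slots over from the outgoing buffer to the incoming one and then signal the flip (hypothetical helper built only from the two documented functions):

    /* Hypothetical flip path: update tracking before the flip is latched,
     * then report the synchronous flip to interested consumers. */
    static void flip_frontbuffer(struct intel_display *display,
                                 struct intel_frontbuffer *old_front,
                                 struct intel_frontbuffer *new_front,
                                 unsigned int frontbuffer_bits)
    {
            intel_frontbuffer_track(old_front, new_front, frontbuffer_bits);
            intel_frontbuffer_flip(display, frontbuffer_bits);
    }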

Display FIFO Underrun Reporting

The i915 driver checks for display fifo underruns using the interrupt signals provided by the hardware. This is enabled by default and fairly useful to debug display issues, especially watermark settings.

If an underrun is detected this is logged into dmesg. To avoid flooding logs and occupying the cpu, underrun interrupts are disabled after the first occurrence until the next modeset on a given pipe.

Note that underrun detection on gmch platforms is a bit more ugly since there is no interrupt (despite the fact that the signalling bit is in the PIPESTAT pipe interrupt register). Also on some other platforms underrun interrupts are shared, which means that if we detect an underrun we need to disable underrun reporting on all pipes.

The code also supports underrun detection on the PCH transcoder.

bool intel_set_cpu_fifo_underrun_reporting(struct intel_display *display, enum pipe pipe, bool enable)

set cpu fifo underrun reporting state

Parameters

struct intel_display *display

display device instance

enum pipe pipe

(CPU) pipe to set state for

bool enable

whether underruns should be reported or not

Description

This function sets the fifo underrun state for pipe. It is used in the modeset code to avoid false positives since on many platforms underruns are expected when disabling or enabling the pipe.

Notice that on some platforms disabling underrun reports for one pipe disables it for all due to shared interrupts. Actual reporting is still per-pipe though.

Returns the previous state of underrun reporting.
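
A sketch of the modeset usage described above (hypothetical call site):

    /* Hypothetical pipe-disable sequence: suppress expected underruns
     * while the pipe is reconfigured, then restore the previous state. */
    bool old;

    old = intel_set_cpu_fifo_underrun_reporting(display, pipe, false);

    /* ... disable planes, pipe and port ... */

    intel_set_cpu_fifo_underrun_reporting(display, pipe, old);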

bool intel_set_pch_fifo_underrun_reporting(struct intel_display *display, enum pipe pch_transcoder, bool enable)

set PCH fifo underrun reporting state

Parameters

struct intel_display *display

display device instance

enum pipe pch_transcoder

the PCH transcoder (same as pipe on IVB and older)

bool enable

whether underruns should be reported or not

Description

This function makes us disable or enable PCH fifo underruns for a specific PCH transcoder. Notice that on some PCHs (e.g. CPT/PPT), disabling FIFO underrun reporting for one transcoder may also disable all the other PCH error interrupts for the other transcoders, due to the fact that there's just one interrupt mask/enable bit for all the transcoders.

Returns the previous state of underrun reporting.

void intel_cpu_fifo_underrun_irq_handler(struct intel_display *display, enum pipe pipe)

handle CPU fifo underrun interrupt

Parameters

struct intel_display *display

display device instance

enum pipe pipe

(CPU) pipe to set state for

Description

This handles a CPU fifo underrun interrupt, generating an underrun warning into dmesg if underrun reporting is enabled and then disables the underrun interrupt to avoid an irq storm.

void intel_pch_fifo_underrun_irq_handler(struct intel_display *display, enum pipe pch_transcoder)

handle PCH fifo underrun interrupt

Parameters

struct intel_display *display

display device instance

enum pipe pch_transcoder

the PCH transcoder (same as pipe on IVB and older)

Description

This handles a PCH fifo underrun interrupt, generating an underrun warning into dmesg if underrun reporting is enabled and then disables the underrun interrupt to avoid an irq storm.

void intel_check_cpu_fifo_underruns(struct intel_display *display)

check for CPU fifo underruns immediately

Parameters

struct intel_display *display

display device instance

Description

Check for CPU fifo underruns immediately. Useful on IVB/HSW where the shared error interrupt may have been disabled, and so CPU fifo underruns won't necessarily raise an interrupt, and on GMCH platforms where underruns never raise an interrupt.

void intel_check_pch_fifo_underruns(struct intel_display *display)

check for PCH fifo underruns immediately

Parameters

struct intel_display *display

display device instance

Description

Check for PCH fifo underruns immediately. Useful on CPT/PPT where the shared error interrupt may have been disabled, and so PCH fifo underruns won't necessarily raise an interrupt.

Plane Configuration

This section covers plane configuration and composition with the primary plane, sprites, cursors and overlays. This includes the infrastructure to do atomic vsync'ed updates of all this state and also tightly coupled topics like watermark setup and computation, framebuffer compression and panel self refresh.

Atomic Plane Helpers

The functions here are used by the atomic plane helper functions to implement legacy plane updates (i.e., drm_plane->update_plane() and drm_plane->disable_plane()). This allows plane updates to use the atomic state infrastructure and perform plane updates as separate prepare/check/commit/cleanup steps.

void intel_plane_destroy(struct drm_plane *plane)

destroy a plane

Parameters

struct drm_plane *plane

plane to destroy

Description

Common destruction function for all types of planes (primary, cursor, sprite).

struct drm_plane_state *intel_plane_duplicate_state(struct drm_plane *plane)

duplicate plane state

Parameters

struct drm_plane *plane

drm plane

Description

Allocates and returns a copy of the plane state (both common and Intel-specific) for the specified plane.

Return

The newly allocated plane state, or NULL on failure.

void intel_plane_destroy_state(struct drm_plane *plane, struct drm_plane_state *state)

destroy plane state

Parameters

struct drm_plane *plane

drm plane

struct drm_plane_state *state

state object to destroy

Description

Destroys the plane state (both common and Intel-specific) for the specified plane.

int intel_prepare_plane_fb(struct drm_plane *_plane, struct drm_plane_state *_new_plane_state)

Prepare fb for usage on plane

Parameters

struct drm_plane *_plane

drm plane to prepare for

struct drm_plane_state *_new_plane_state

the plane state being prepared

Description

Prepares a framebuffer for usage on a display plane. Generally this involves pinning the underlying object and updating the frontbuffer tracking bits. Some older platforms need special physical address handling for cursor planes.

Returns 0 on success, negative error code on failure.

void intel_cleanup_plane_fb(struct drm_plane *plane, struct drm_plane_state *_old_plane_state)

Cleans up an fb after plane use

Parameters

struct drm_plane *plane

drm plane to clean up for

struct drm_plane_state *_old_plane_state

the state from the previous modeset

the state from the previous modeset

Description

Cleans up a framebuffer that has just been removed from a plane.

Asynchronous Page Flip

Asynchronous page flip is the implementation for the DRM_MODE_PAGE_FLIP_ASYNC flag. Currently async flip is only supported via the drmModePageFlip IOCTL. Correspondingly, support is currently added for the primary plane only.

Async flip can only change the plane surface address, so anything else changing is rejected by the intel_async_flip_check_hw() function. Once this check is cleared, the flip done interrupt is enabled using the intel_crtc_enable_flip_done() function.

As soon as the surface address register is written, the flip done interrupt is generated and the requested events are sent to the userspace in the interrupt handler itself. The timestamp and sequence sent during the flip done event correspond to the last vblank and have no relation to the actual time when the flip done event was sent.

Output Probing

This section covers output probing and related infrastructure like the hotplug interrupt storm detection and mitigation code. Note that the i915 driver still uses most of the common DRM helper code for output probing, so those sections fully apply.

Hotplug

Simply put, hotplug occurs when a display is connected to or disconnectedfrom the system. However, there may be adapters and docking stations andDisplay Port short pulses and MST devices involved, complicating matters.

Hotplug in i915 is handled in many different levels of abstraction.

The platform dependent interrupt handling code in i915_irq.c enables, disables, and does preliminary handling of the interrupts. The interrupt handlers gather the hotplug detect (HPD) information from relevant registers into a platform independent mask of hotplug pins that have fired.

The platform independent interrupt handler intel_hpd_irq_handler() in intel_hotplug.c does hotplug irq storm detection and mitigation, and passes further processing to appropriate bottom halves (Display Port specific and regular hotplug).

The Display Port work function i915_digport_work_func() calls into intel_dp_hpd_pulse() via hooks, which handles DP short pulses and DP MST long pulses, with failures and non-MST long pulses triggering regular hotplug processing on the connector.

The regular hotplug work function i915_hotplug_work_func() calls connector detect hooks, and, if connector status changes, triggers sending of the hotplug uevent to userspace via drm_kms_helper_hotplug_event().

Finally, the userspace is responsible for triggering a modeset upon receivingthe hotplug uevent, disabling or enabling the crtc as needed.

The hotplug interrupt storm detection and mitigation code keeps track of the number of interrupts per hotplug pin per a period of time, and if the number of interrupts exceeds a certain threshold, the interrupt is disabled for a while before being re-enabled. The intention is to mitigate issues arising from broken hardware triggering massive amounts of interrupts and grinding the system to a halt.

The current implementation expects that a hotplug interrupt storm will not be seen when a display port sink is connected. Hence on platforms whose DP callback is handled by i915_digport_work_func(), re-enabling of hpd is not performed (it was never expected to be disabled in the first place ;) ). This is specific to DP sinks handled by this routine; any other display such as HDMI or DVI enabled on the same port will have the proper logic, since it will use i915_hotplug_work_func() where this logic is handled.

enum hpd_pin intel_hpd_pin_default(enum port port)

return default pin associated with certain port.

Parameters

enum port port

the hpd port to get associated pin

Description

It is only valid and used by digital port encoders.

Returns the pin that is associated with port.

bool intel_hpd_irq_storm_detect(struct intel_display *display, enum hpd_pin pin, bool long_hpd)

gather stats and detect HPD IRQ storm on a pin

Parameters

struct intel_display *display

display device

enum hpd_pin pin

the pin to gather stats on

bool long_hpd

whether the HPD IRQ was long or short

Description

Gather stats about HPD IRQs from the specified pin, and detect IRQ storms. Only the pin specific stats and state are changed, the caller is responsible for further action.

The number of IRQs that are allowed within HPD_STORM_DETECT_PERIOD is stored in display->hotplug.hpd_storm_threshold which defaults to HPD_STORM_DEFAULT_THRESHOLD. Long IRQs count as +10 to this threshold, and short IRQs count as +1. If this threshold is exceeded, it's considered an IRQ storm and the IRQ state is set to HPD_MARK_DISABLED.

By default, most systems will only count long IRQs towards display->hotplug.hpd_storm_threshold. However, some older systems also suffer from short IRQ storms and must also track these. Because short IRQ storms are naturally caused by sideband interactions with DP MST devices, short IRQ detection is only enabled for systems without DP MST support. Systems which are new enough to support DP MST are far less likely to suffer from IRQ storms at all, so this is fine.

The HPD threshold can be controlled through i915_hpd_storm_ctl in debugfs, and should only be adjusted for automated hotplug testing.

Return true if an IRQ storm was detected on pin.
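
The weighting above reduces to a few lines; a simplified sketch of the accounting (the stats structure and field names are assumed for illustration):

    /* Simplified sketch: long IRQs weigh 10, short IRQs weigh 1, counted
     * within one HPD_STORM_DETECT_PERIOD window. */
    static bool storm_detect(struct hpd_pin_stats *stats, int threshold,
                             bool long_hpd)
    {
            stats->count += long_hpd ? 10 : 1;

            if (stats->count > threshold) {
                    stats->state = HPD_MARK_DISABLED; /* mitigation kicks in */
                    return true;
            }

            return false;
    }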

void intel_hpd_trigger_irq(struct intel_digital_port *dig_port)

trigger an hpd irq event for a port

Parameters

struct intel_digital_port *dig_port

digital port

Description

Trigger an HPD interrupt event for the given port, emulating a short pulse generated by the sink, and schedule the dig port work to handle it.

void intel_hpd_irq_handler(struct intel_display *display, u32 pin_mask, u32 long_mask)

main hotplug irq handler

Parameters

struct intel_display *display

display device

u32 pin_mask

a mask of hpd pins that have triggered the irq

u32 long_mask

a mask of hpd pins that may be long hpd pulses

Description

This is the main hotplug irq handler for all platforms. The platform specific irq handlers call the platform specific hotplug irq handlers, which read and decode the appropriate registers into bitmasks about hpd pins that have triggered (pin_mask), and which of those pins may be long pulses (long_mask). The long_mask is ignored if the port corresponding to the pin is not a digital port.

Here, we do hotplug irq storm detection and mitigation, and pass further processing to appropriate bottom halves.

void intel_hpd_init(struct intel_display *display)

initializes and enables hpd support

Parameters

struct intel_display *display

display device instance

Description

This function enables the hotplug support. It requires that interrupts have already been enabled with intel_irq_init_hw(). From this point on hotplug and poll request can run concurrently to other code, so locking rules must be obeyed.

This is a separate step from interrupt enabling to simplify the locking rules in the driver load and resume code.

Also see: intel_hpd_poll_enable() and intel_hpd_poll_disable().

void intel_hpd_poll_enable(struct intel_display *display)

enable polling for connectors with hpd

Parameters

struct intel_display *display

display device instance

Description

This function enables polling for all connectors which support HPD. Under certain conditions HPD may not be functional. On most Intel GPUs, this happens when we enter runtime suspend. On Valleyview and Cherryview systems, this also happens when we shut off all of the powerwells.

Since this function can get called in contexts where we're already holding dev->mode_config.mutex, we do the actual hotplug enabling in a separate worker.

Also see: intel_hpd_init() and intel_hpd_poll_disable().

void intel_hpd_poll_disable(struct intel_display *display)

disable polling for connectors with hpd

Parameters

struct intel_display *display

display device instance

Description

This function disables polling for all connectors which support HPD. Under certain conditions HPD may not be functional. On most Intel GPUs, this happens when we enter runtime suspend. On Valleyview and Cherryview systems, this also happens when we shut off all of the powerwells.

Since this function can get called in contexts where we're already holding dev->mode_config.mutex, we do the actual hotplug disabling in a separate worker.

Also used during driver init to initialize connector->polled appropriately for all connectors.

Also see: intel_hpd_init() and intel_hpd_poll_enable().

void intel_hpd_block(struct intel_encoder *encoder)

Block handling of HPD IRQs on an HPD pin

Parameters

struct intel_encoder *encoder

Encoder to block the HPD handling for

Description

Blocks the handling of HPD IRQs on the HPD pin of encoder.

On return:

  • It's guaranteed that the blocked encoder's HPD pulse handler (via intel_digital_port::hpd_pulse()) is not running.

  • The hotplug event handling (via intel_encoder::hotplug()) of an HPD IRQ pending at the time this function is called may still be running.

  • Detection on the encoder's connector (via drm_connector_helper_funcs::detect_ctx(), drm_connector_funcs::detect()) remains allowed, for instance as part of userspace connector probing, or DRM core's connector polling.

The call must be followed by calling intel_hpd_unblock(), or intel_hpd_clear_and_unblock().

Note that the handling of HPD IRQs for another encoder using the same HPD pin as that of encoder will also be blocked.

void intel_hpd_unblock(struct intel_encoder *encoder)

Unblock handling of HPD IRQs on an HPD pin

Parameters

struct intel_encoder *encoder

Encoder to unblock the HPD handling for

Description

Unblock the handling of HPD IRQs on the HPD pin of encoder, which was previously blocked by intel_hpd_block(). Any HPD IRQ raised on the HPD pin while it was blocked will be handled for encoder and for any other encoder sharing the same HPD pin.

void intel_hpd_clear_and_unblock(struct intel_encoder *encoder)

Unblock handling of new HPD IRQs on an HPD pin

Parameters

struct intel_encoder *encoder

Encoder to unblock the HPD handling for

Description

Unblock the handling of HPD IRQs on the HPD pin of encoder, which was previously blocked by intel_hpd_block(). Any HPD IRQ raised on the HPD pin while it was blocked will be cleared, handling only new IRQs.

High Definition Audio

The graphics and audio drivers together support High Definition Audio over HDMI and Display Port. The audio programming sequences are divided into audio codec and controller enable and disable sequences. The graphics driver handles the audio codec sequences, while the audio driver handles the audio controller sequences.

The disable sequences must be performed before disabling the transcoder or port. The enable sequences may only be performed after enabling the transcoder and port, and after completed link training. Therefore the audio enable/disable sequences are part of the modeset sequence.

The codec and controller sequences could be done either in parallel or serially, but generally the ELDV/PD change in the codec sequence indicates to the audio driver that the controller sequence should start. Indeed, most of the co-operation between the graphics and audio drivers is handled via audio related registers. (The notable exception is the power management, not covered here.)

The struct i915_audio_component is used to interact between the graphics and audio drivers. The struct i915_audio_component_ops ops in it is defined in the graphics driver and called in the audio driver. The struct i915_audio_component_audio_ops audio_ops is called from the i915 driver.

void intel_audio_codec_enable(struct intel_encoder *encoder, const struct intel_crtc_state *crtc_state, const struct drm_connector_state *conn_state)

Enable the audio codec for HD audio

Parameters

struct intel_encoder *encoder

encoder on which to enable audio

const struct intel_crtc_state *crtc_state

pointer to the current crtc state.

const struct drm_connector_state *conn_state

pointer to the current connector state.

Description

The enable sequences may only be performed after enabling the transcoder and port, and after completed link training.

void intel_audio_codec_disable(struct intel_encoder *encoder, const struct intel_crtc_state *old_crtc_state, const struct drm_connector_state *old_conn_state)

Disable the audio codec for HD audio

Parameters

struct intel_encoder *encoder

encoder on which to disable audio

const struct intel_crtc_state *old_crtc_state

pointer to the old crtc state.

const struct drm_connector_state *old_conn_state

pointer to the old connector state.

Description

The disable sequences must be performed before disabling the transcoder or port.

void intel_audio_hooks_init(struct intel_display *display)

Set up chip specific audio hooks

Parameters

struct intel_display *display

display device

void intel_audio_component_init(struct intel_display *display)

initialize and register the audio component

Parameters

struct intel_display *display

display device

Description

This will register with the component framework a child component which will bind dynamically to the snd_hda_intel driver's corresponding master component when the latter is registered. During binding the child initializes an instance of struct i915_audio_component which it receives from the master. The master can then start to use the interface defined by this struct. Each side can break the binding at any point by deregistering its own component after which each side's component unbind callback is called.

We ignore any error during registration and continue with reduced functionality (i.e. without HDMI audio).

void intel_audio_component_cleanup(struct intel_display *display)

deregister the audio component

Parameters

struct intel_display *display

display device

Description

Deregisters the audio component, breaking any existing binding to the corresponding snd_hda_intel driver's master component.

void intel_audio_init(struct intel_display *display)

Initialize the audio driver either using component framework or using lpe audio bridge

Parameters

struct intel_display *display

display device

void intel_audio_deinit(struct intel_display *display)

deinitialize the audio driver

Parameters

struct intel_display *display

display device

struct i915_audio_component

Used for direct communication between i915 and hda drivers

Definition:

struct i915_audio_component {
    struct drm_audio_component base;
    int aud_sample_rate[MAX_PORTS];
};

Members

base

the drm_audio_component base class

aud_sample_rate

the array of audio sample rate per port

Intel HDMI LPE Audio Support

Motivation: Atom platforms (e.g. Valleyview and Cherrytrail) integrate a DMA-based interface as an alternative to the traditional HDaudio path. While this mode is unrelated to the LPE aka SST audio engine, the documentation refers to this mode as LPE so we keep this notation for the sake of consistency.

The interface is handled by a separate standalone driver maintained in the ALSA subsystem for simplicity. To minimize the interaction between the two subsystems, a bridge is set up between the hdmi-lpe-audio driver and i915 (see the sketch below):

1. Create a platform device to share MMIO/IRQ resources.
2. Make the platform device child of the i915 device for runtime PM.
3. Create an IRQ chip to forward the LPE audio irqs.

The hdmi-lpe-audio driver probes the lpe audio device and creates a new sound card.
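
A sketch of steps 1 and 2 under stated assumptions (the resource table and device pointers are placeholders; the driver's actual code may differ):

    /* Hypothetical: register the child platform device carrying the shared
     * MMIO/IRQ resources; parenting it to i915 ties it into i915's
     * runtime PM hierarchy. */
    struct platform_device_info pinfo = {
            .parent   = i915_dev,            /* step 2: child of i915 */
            .name     = "hdmi-lpe-audio",
            .id       = PLATFORM_DEVID_NONE,
            .res      = lpe_res,             /* step 1: MMIO + IRQ */
            .num_res  = ARRAY_SIZE(lpe_res),
            .dma_mask = DMA_BIT_MASK(32),
    };
    struct platform_device *pdev;

    pdev = platform_device_register_full(&pinfo);
    if (IS_ERR(pdev))
            return PTR_ERR(pdev);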

Threats: Due to the restriction in the Linux platform device model, the user needs to manually uninstall the hdmi-lpe-audio driver before uninstalling the i915 module, otherwise we might run into use-after-free issues after i915 removes the platform device: even though the hdmi-lpe-audio driver is released, the module is still in "installed" status.

Implementation: The MMIO/REG platform resources are created according to the registers specification. When forwarding LPE audio irqs, the flow control handler selection depends on the platform, for example on Valleyview handle_simple_irq is enough.

void intel_lpe_audio_irq_handler(struct intel_display *display)

forwards the LPE audio irq

Parameters

struct intel_display *display

display device

Description

The LPE audio irq is forwarded to the irq handler registered by the LPE audio driver.

int intel_lpe_audio_init(struct intel_display *display)

detect and setup the bridge between HDMI LPE Audio driver and i915

Parameters

struct intel_display *display

display device

Return

0 if successful, non-zero if detection or allocation/initialization fails

void intel_lpe_audio_teardown(struct intel_display *display)

destroy the bridge between HDMI LPE audio driver and i915

Parameters

struct intel_display *display

display device

Description

Release all the resources for the LPE audio <-> i915 bridge.

void intel_lpe_audio_notify(struct intel_display *display, enum transcoder cpu_transcoder, enum port port, const void *eld, int ls_clock, bool dp_output)

notify the LPE audio driver of an audio event

Parameters

struct intel_display *display

display device

enum transcoder cpu_transcoder

CPU transcoder

enum port port

port

const void *eld

ELD data

int ls_clock

Link symbol clock in kHz

bool dp_output

Driving a DP output?

Description

Notify the LPE audio driver of an ELD change.

Panel Self Refresh PSR (PSR/SRD)

Since Haswell the display controller supports Panel Self-Refresh on display panels which have a remote frame buffer (RFB) implemented according to the PSR spec in eDP 1.3. The PSR feature allows the display to go to lower standby states when the system is idle but the display is on, as it eliminates display refresh requests to DDR memory completely as long as the frame buffer for that display is unchanged.

Panel Self Refresh must be supported by both Hardware (source) and Panel (sink).

PSR saves power by caching the framebuffer in the panel RFB, which allows us to power down the link and memory controller. For DSI panels the same idea is called "manual mode".

The implementation uses the hardware-based PSR support which automatically enters/exits self-refresh mode. The hardware takes care of sending the required DP aux message and could even retrain the link (that part isn't enabled yet though). The hardware also keeps track of any frontbuffer changes to know when to exit self-refresh mode again. Unfortunately that part doesn't work too well, hence why the i915 PSR support uses the software frontbuffer tracking to make sure it doesn't miss a screen update. For this integration intel_psr_invalidate() and intel_psr_flush() get called by the frontbuffer tracking code. Note that because of locking issues the self-refresh re-enable code is done from a work queue, which must be correctly synchronized/cancelled when shutting down the pipe.

DC3CO (DC3 clock off)

On top of PSR2, GEN12 adds an intermediate power savings state that turns the clock off automatically during PSR2 idle state. The smaller overhead of DC3CO entry/exit vs. the overhead of PSR2 deep sleep entry/exit allows the HW to enter a low-power state even when page flipping periodically (for instance a 30fps video playback scenario).

Every time a flip occurs PSR2 will get out of deep sleep state (if it was in it), DC3CO is enabled, and tgl_dc3co_disable_work is scheduled to run after 6 frames. If no other flip occurs and that work is executed, DC3CO is disabled and PSR2 is configured to enter deep sleep, resetting again in case of another flip. Front buffer modifications do not trigger DC3CO activation on purpose, as that would bring a lot of complexity and most modern systems will only use page flips.

void intel_psr_disable(struct intel_dp *intel_dp, const struct intel_crtc_state *old_crtc_state)

Disable PSR

Parameters

struct intel_dp *intel_dp

Intel DP

const struct intel_crtc_state *old_crtc_state

old CRTC state

Description

This function needs to be called before disabling the pipe.

void intel_psr_pause(struct intel_dp *intel_dp)

Pause PSR

Parameters

struct intel_dp *intel_dp

Intel DP

Description

This function needs to be called after enabling PSR.

void intel_psr_resume(struct intel_dp *intel_dp)

Resume PSR

Parameters

struct intel_dp *intel_dp

Intel DP

Description

This function needs to be called after pausing PSR.

bool intel_psr_needs_vblank_notification(const struct intel_crtc_state *crtc_state)

Check if PSR needs vblank enable/disable notification.

Parameters

const struct intel_crtc_state *crtc_state

CRTC status

Description

We need to block DC6 entry in case of Panel Replay as enabling VBI doesn't prevent it in case of Panel Replay. Panel Replay switches the main link off on DC entry. This means vblank interrupts are not fired, which is a problem if user-space is polling for vblank events. Also Wa_16025596647 needs information when vblank is enabled/disabled.

void intel_psr_trigger_frame_change_event(struct intel_dsb *dsb, struct intel_atomic_state *state, struct intel_crtc *crtc)

Trigger "Frame Change" event

Parameters

struct intel_dsb *dsb

DSB context

struct intel_atomic_state *state

the atomic state

struct intel_crtc *crtc

the CRTC

Description

Generate PSR “Frame Change” event.

int intel_psr_min_set_context_latency(const struct intel_crtc_state *crtc_state)

Minimum 'set context latency' lines needed by PSR

Parameters

const struct intel_crtc_state *crtc_state

the crtc state

Description

Return minimum SCL lines/delay needed by PSR.

void intel_psr_wait_for_idle_locked(const struct intel_crtc_state *new_crtc_state)

wait for PSR to be ready for a pipe update

Parameters

const struct intel_crtc_state *new_crtc_state

new CRTC state

Description

This function is expected to be called from pipe_update_start() where it is not expected to race with PSR enable or disable.

void intel_psr_invalidate(struct intel_display *display, unsigned frontbuffer_bits, enum fb_op_origin origin)

Invalidate PSR

Parameters

struct intel_display *display

display device

unsigned frontbuffer_bits

frontbuffer plane tracking bits

enum fb_op_origin origin

which operation caused the invalidate

Description

Since the hardware frontbuffer tracking has gaps we need to integrate with the software frontbuffer tracking. This function gets called every time frontbuffer rendering starts and a buffer gets dirtied. PSR must be disabled if the frontbuffer mask contains a buffer relevant to PSR.

Dirty frontbuffers relevant to PSR are tracked in busy_frontbuffer_bits.

void intel_psr_flush(struct intel_display *display, unsigned frontbuffer_bits, enum fb_op_origin origin)

Flush PSR

Parameters

struct intel_display *display

display device

unsigned frontbuffer_bits

frontbuffer plane tracking bits

enum fb_op_origin origin

which operation caused the flush

Description

Since the hardware frontbuffer tracking has gaps we need to integrate with the software frontbuffer tracking. This function gets called every time frontbuffer rendering has completed and flushed out to memory. PSR can be enabled again if no other frontbuffer relevant to PSR is dirty.

Dirty frontbuffers relevant to PSR are tracked in busy_frontbuffer_bits.
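
Tying this back to the frontbuffer tracking section, the expected call sites look roughly like this (hypothetical rendering path; ORIGIN_CS as an example origin):

    /* Hypothetical render path: PSR must stop self-refresh while the
     * frontbuffer is dirty, and may resume once it has been flushed. */
    intel_psr_invalidate(display, frontbuffer_bits, ORIGIN_CS);

    /* ... rendering to the frontbuffer completes ... */

    intel_psr_flush(display, frontbuffer_bits, ORIGIN_CS);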

void intel_psr_init(struct intel_dp *intel_dp)

Init basic PSR work and mutex.

Parameters

struct intel_dp *intel_dp

Intel DP

Description

This function is called after initializing the connector (the connector initialization handles the connector capabilities) and it initializes basic PSR stuff for each DP encoder.

bool intel_psr_link_ok(struct intel_dp *intel_dp)

return psr->link_ok

Parameters

struct intel_dp *intel_dp

struct intel_dp

Description

We are seeing unexpected link re-trainings with some panels. This is caused by the panel stating bad link status after PSR is enabled. Code checking link status can call this to ensure it can ignore bad link status stated by the panel, i.e. if the panel is stating bad link and intel_psr_link_ok() is stating the link is ok, the caller should rely on the latter.

Returns the value of link_ok.

voidintel_psr_lock(conststructintel_crtc_state*crtc_state)

grab PSR lock

Parameters

conststructintel_crtc_state*crtc_state

the crtc state

Description

This is initially meant to be used around a CRTC update, when vblank-sensitive registers are updated and we need to grab the lock beforehand to avoid vblank evasion.

voidintel_psr_unlock(conststructintel_crtc_state*crtc_state)

release PSR lock

Parameters

conststructintel_crtc_state*crtc_state

the crtc state

Description

Release the PSR lock that was held during pipe update.

voidintel_psr_notify_dc5_dc6(structintel_display*display)

Notify PSR about enable/disable dc5/dc6

Parameters

structintel_display*display

display device

Description

This is targeted at the underrun-on-idle PSR HW bug (Wa_16025596647): it schedules psr_dc5_dc6_wa_work, which applies/removes the workaround.

voidintel_psr_dc5_dc6_wa_init(structintel_display*display)

Init work for underrun on idle PSR HW bug wa

Parameters

structintel_display*display

display device

Description

This is targeted at the underrun-on-idle PSR HW bug (Wa_16025596647): it initializes psr_dc5_dc6_wa_work, which is used to apply the workaround.

voidintel_psr_notify_pipe_change(structintel_atomic_state*state,structintel_crtc*crtc,boolenable)

Notify PSR about enable/disable of a pipe

Parameters

structintel_atomic_state*state

intel atomic state

structintel_crtc*crtc

intel crtc

boolenable

enable/disable

Description

This is targeted at the underrun-on-idle PSR HW bug (Wa_16025596647): it applies/removes the workaround when a pipe is being enabled/disabled.

voidintel_psr_notify_vblank_enable_disable(structintel_display*display,boolenable)

Notify PSR about enable/disable of vblank

Parameters

structintel_display*display

intel display struct

boolenable

enable/disable

Description

This is targeted at the underrun-on-idle PSR HW bug (Wa_16025596647): it applies/removes the workaround when vblank is being enabled/disabled.

Frame Buffer Compression (FBC)

FBC tries to save memory bandwidth (and so power consumption) by compressing the amount of memory used by the display. It is totally transparent to user space and completely handled in the kernel.

The benefits of FBC are mostly visible with solid backgrounds and variation-less patterns. It comes from keeping the memory footprint small and having fewer memory pages opened and accessed for refreshing the display.

i915 is responsible for reserving stolen memory for FBC and configuring its offset in the proper registers. The hardware takes care of all compression/decompression. However, there are many known cases where we have to forcibly disable it to allow proper screen updates.

voidintel_fbc_disable(structintel_crtc*crtc)

disable FBC if it’s associated with crtc

Parameters

structintel_crtc*crtc

the CRTC

Description

This function disables FBC if it’s associated with the provided CRTC.

voidintel_fbc_handle_fifo_underrun_irq(structintel_display*display)

disable FBC when we get a FIFO underrun

Parameters

structintel_display*display

display

Description

Without FBC, most underruns are harmless and don’t really cause too many problems, except for an annoying message on dmesg. With FBC, underruns can become black screens or even worse, especially when paired with bad watermarks. So in order for us to be on the safe side, completely disable FBC in case we ever detect a FIFO underrun on any pipe. An underrun on any pipe already suggests that watermarks may be bad, so try to be as safe as possible.

This function is called from the IRQ handler.

voidintel_fbc_read_underrun_dbg_info(structintel_display*display,enumpipepipe,boollog)

Read and log FBC-related FIFO underrun debug info

Parameters

structintel_display*display

display device instance

enumpipepipe

the pipe possibly containing the FBC

boollog

log the info?

Description

If pipe does not contain an FBC instance, this function bails early. Otherwise, the FBC-related FIFO underrun information is read and cleared, and then, if log is true, printed with error level.

voidintel_fbc_init(structintel_display*display)

Initialize FBC

Parameters

structintel_display*display

display

Description

This function might be called during PM init process.

voidintel_fbc_sanitize(structintel_display*display)

Sanitize FBC

Parameters

structintel_display*display

display

Description

Make sure FBC is initially disabled since we have no idea e.g. into which parts of stolen memory it might be scribbling.

Display Refresh Rate Switching (DRRS)

Display Refresh Rate Switching (DRRS) is a power conservation feature which enables switching between low and high refresh rates, dynamically, based on the usage scenario. This feature is applicable for internal panels.

Indication that the panel supports DRRS is given by the panel EDID, which would list multiple refresh rates for one resolution.

DRRS is of 2 types - static and seamless. Static DRRS involves changing the refresh rate (RR) by doing a full modeset (may appear as a blink on screen) and is used in dock-undock scenarios. Seamless DRRS involves changing the RR without any visual effect to the user and can be used during normal system usage. This is done by programming certain registers.

Support for static/seamless DRRS may be indicated in the VBT based on inputs from the panel spec.

DRRS saves power by switching to low RR based on usage scenarios.

The implementation is based on the frontbuffer tracking implementation. When there is a disturbance on the screen triggered by user activity or a periodic system activity, DRRS is disabled (the RR is changed to the high RR). When there is no movement on screen, after a timeout of 1 second, a switch to the low RR is made.

For integration with the frontbuffer tracking code, intel_drrs_invalidate() and intel_drrs_flush() are called.

DRRS can be further extended to support other internal panels and also the scenario of video playback wherein the RR is set based on the rate requested by userspace.

voidintel_drrs_activate(conststructintel_crtc_state*crtc_state)

activate DRRS

Parameters

conststructintel_crtc_state*crtc_state

the crtc state

Description

Activates DRRS on the crtc.

voidintel_drrs_deactivate(conststructintel_crtc_state*old_crtc_state)

deactivate DRRS

Parameters

conststructintel_crtc_state*old_crtc_state

the old crtc state

Description

Deactivates DRRS on the crtc.

voidintel_drrs_invalidate(structintel_display*display,unsignedintfrontbuffer_bits)

Disable Idleness DRRS

Parameters

structintel_display*display

display device

unsignedintfrontbuffer_bits

frontbuffer plane tracking bits

Description

This function gets called every time rendering on the given planes starts. Hence DRRS needs to be upclocked (LOW_RR -> HIGH_RR).

Dirty frontbuffers relevant to DRRS are tracked in busy_frontbuffer_bits.

voidintel_drrs_flush(structintel_display*display,unsignedintfrontbuffer_bits)

Restart Idleness DRRS

Parameters

structintel_display*display

display device

unsignedintfrontbuffer_bits

frontbuffer plane tracking bits

Description

This function gets called every time rendering on the given planes has completed or a flip on a crtc is completed. So DRRS should be upclocked (LOW_RR -> HIGH_RR). Also, idleness detection should be started again, if no other planes are dirty.

Dirty frontbuffers relevant to DRRS are tracked in busy_frontbuffer_bits.

voidintel_drrs_crtc_init(structintel_crtc*crtc)

Init DRRS for CRTC

Parameters

structintel_crtc*crtc

crtc

Description

This function is called only once at driver load to initialize basic DRRS stuff.

DPIO

VLV, CHV and BXT have slightly peculiar display PHYs for driving DP/HDMI ports. DPIO is the name given to such a display PHY. These PHYs don’t follow the standard programming model using direct MMIO registers, and instead their registers must be accessed through IOSF sideband. VLV has one such PHY for driving ports B and C, and CHV adds another PHY for driving port D. Each PHY responds to a specific IOSF-SB port.

Each display PHY is made up of one or two channels. Each channel houses a common lane part which contains the PLL and other common logic. CH0 common lane also contains the IOSF-SB logic for the Common Register Interface (CRI), i.e. the DPIO registers. The CRI clock must be running when any DPIO registers are accessed.

In addition to having their own registers, the PHYs are also controlled through some dedicated signals from the display controller. These include PLL reference clock enable, PLL enable, and CRI clock selection, for example.

Each channel also has two splines (also called data lanes), and each spline is made up of one Physical Access Coding Sub-Layer (PCS) block and two TX lanes. So each channel has two PCS blocks and four TX lanes. The TX lanes are used as DP lanes or TMDS data/clock pairs depending on the output type.

Additionally the PHY also contains an AUX lane with AUX blocks for each channel. This is used for DP AUX communication, but this fact isn’t really relevant for the driver since AUX is controlled from the display controller side. No DPIO registers need to be accessed during AUX communication.

Generally on VLV/CHV the common lane corresponds to the pipe and the spline (PCS/TX) corresponds to the port.

For dual channel PHY (VLV/CHV):

pipe A == CMN/PLL/REF CH0

pipe B == CMN/PLL/REF CH1

port B == PCS/TX CH0

port C == PCS/TX CH1

This is especially important when we cross the streams, i.e. drive port B with pipe B, or port C with pipe A.

For single channel PHY (CHV):

pipe C == CMN/PLL/REF CH0

port D == PCS/TX CH0

On BXT the entire PHY channel corresponds to the port. That means the PLL is also now associated with the port rather than the pipe, and so the clock needs to be routed to the appropriate transcoder. Port A PLL is directly connected to transcoder EDP and port B/C PLLs can be routed to any transcoder A/B/C.

Note: DDI0 is digital port B, DDI1 is digital port C, and DDI2 is digital port D (CHV) or port A (BXT).

Dual channel PHY (VLV/CHV/BXT)
---------------------------------
|      CH0      |      CH1      |
|  CMN/PLL/REF  |  CMN/PLL/REF  |
|---------------|---------------| Display PHY
| PCS01 | PCS23 | PCS01 | PCS23 |
|-------|-------|-------|-------|
|TX0|TX1|TX2|TX3|TX0|TX1|TX2|TX3|
---------------------------------
|     DDI0      |     DDI1      | DP/HDMI ports
---------------------------------

Single channel PHY (CHV/BXT)
-----------------
|      CH0      |
|  CMN/PLL/REF  |
|---------------| Display PHY
| PCS01 | PCS23 |
|-------|-------|
|TX0|TX1|TX2|TX3|
-----------------
|     DDI2      | DP/HDMI port
-----------------

DMC Firmware Support

From gen9 onwards we have a newly added DMC (Display Microcontroller) in the display engine to save and restore the state of the display engine when it enters a low-power state and comes back to normal.

voidintel_dmc_block_pkgc(structintel_display*display,enumpipepipe,boolblock)

block PKG C-state

Parameters

structintel_display*display

display instance

enumpipepipe

the pipe whose register is used to block

boolblock

block/unblock

Description

This interface is targeted at Wa_16025596647 usage, i.e. to set/clear the PIPEDMC_BLOCK_PKGC_SW_BLOCK_PKGC_ALWAYS bit in the PIPEDMC_BLOCK_PKGC_SW register.

voidintel_dmc_start_pkgc_exit_at_start_of_undelayed_vblank(structintel_display*display,enumpipepipe,boolenable)

start of PKG C-state exit

Parameters

structintel_display*display

display instance

enumpipepipe

the pipe whose register is used to block

boolenable

enable/disable

Description

This interface is targeted at Wa_16025596647 usage, i.e. to start the package C exit at the start of the undelayed vblank.

voidintel_dmc_load_program(structintel_display*display)

write the firmware from memory to register.

Parameters

structintel_display*display

display instance

Description

The DMC firmware is read from a .bin file and kept in internal memory one time. Every time the display comes back from a low-power state this function is called to copy the firmware from internal memory to registers.

voidintel_dmc_disable_program(structintel_display*display)

disable the firmware

Parameters

structintel_display*display

display instance

Description

Disable all event handlers in the firmware, making sure the firmware is inactive after the display is uninitialized.

voidintel_dmc_init(structintel_display*display)

initialize the firmware loading.

Parameters

structintel_display*display

display instance

Description

This function is called at the time of loading the display driver to read the firmware from a .bin file and copy it into internal memory.

voidintel_dmc_suspend(structintel_display*display)

prepare DMC firmware before system suspend

Parameters

structintel_display*display

display instance

Description

Prepare the DMC firmware before entering system suspend. This includes flushing pending work items and releasing any resources acquired during init.

voidintel_dmc_resume(structintel_display*display)

init DMC firmware during system resume

Parameters

structintel_display*display

display instance

Description

Reinitialize the DMC firmware during system resume, reacquiring any resources released in intel_dmc_suspend().

voidintel_dmc_fini(structintel_display*display)

unload the DMC firmware.

Parameters

structintel_display*display

display instance

Description

Firmware unloading includes freeing the internal memory and resetting the firmware loading status.

DMC Flip Queue

A flip queue is a ring buffer implemented by the pipe DMC firmware. The driver inserts entries into the queues to be executed by the pipe DMC at a specified presentation timestamp (PTS).

Each pipe DMC provides several queues:

  • 1 general queue (two DSB buffers executed per entry)

  • 3 plane queues (one DSB buffer executed per entry)

  • 1 fast queue (deprecated)

DMC wakelock support

Wake lock is the mechanism to cause the display engine to exit DC states to allow programming of registers that are powered down in those states. Previous projects exited DC states automatically when detecting programming. Now software controls the exit by programming the wake lock. This improves system performance and system interactions and better fits the flip queue style of programming. Wake lock is only required when DC5, DC6, or DC6v have been enabled in DC_STATE_EN and the wake lock mode of operation has been enabled.

The wakelock mechanism in DMC allows the display engine to exit DC states explicitly before programming registers that may be powered down. In earlier hardware, this was done automatically and implicitly when the display engine accessed a register. With the wakelock implementation, the driver asserts a wakelock in DMC, which forces it to exit the DC state until the wakelock is deasserted.

The mechanism can be enabled and disabled by writing to the DMC_WAKELOCK_CFG register. There are also 13 control registers that can be used to hold and release different wakelocks. In the current implementation, we only need one wakelock, so only DMC_WAKELOCK1_CTL is used. The other definitions are here for potential future use.

Video BIOS Table (VBT)

The Video BIOS Table, or VBT, provides platform and board specific configuration information to the driver that is not discoverable or available through other means. The configuration is mostly related to display hardware. The VBT is available via the ACPI OpRegion or, on older systems, in the PCI ROM.

The VBT consists of a VBT Header (defined as struct vbt_header), a BDB Header (struct bdb_header), and a number of BIOS Data Blocks (BDB) that contain the actual configuration information. The VBT Header, and thus the VBT, begins with the “$VBT” signature. The VBT Header contains the offset of the BDB Header. The data blocks are concatenated after the BDB Header. The data blocks have a 1-byte Block ID, 2-byte Block Size, and Block Size bytes of data. (Block 53, the MIPI Sequence Block, is an exception.)

The driver parses the VBT during load. The relevant information is stored in driver private data for ease of use, and the actual VBT is not read after that.
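The block layout described above (1-byte Block ID, 2-byte little-endian Block Size, then the payload) lends itself to a simple walk. Below is a minimal sketch assuming an already validated VBT and ignoring the Block 53 exception; find_bdb_block() is a hypothetical helper, not the driver’s actual parser.

        /*
         * Hypothetical helper: walk the BDB data blocks looking for a given
         * Block ID. Assumes the VBT was already validated and ignores the
         * special layout of Block 53 (the MIPI Sequence Block).
         */
        static const u8 *find_bdb_block(const struct bdb_header *bdb, u8 wanted_id)
        {
                const u8 *p = (const u8 *)bdb + bdb->header_size;
                const u8 *end = (const u8 *)bdb + bdb->bdb_size;

                while (p + 3 <= end) {
                        u8 id = p[0];
                        u16 size = p[1] | (p[2] << 8); /* 2-byte Block Size */

                        if (p + 3 + size > end)
                                break; /* truncated block */
                        if (id == wanted_id)
                                return p + 3; /* start of the block data */
                        p += 3 + size;
                }

                return NULL;
        }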

boolintel_bios_is_valid_vbt(structintel_display*display,constvoid*buf,size_tsize)

does the given buffer contain a valid VBT

Parameters

structintel_display*display

display device

constvoid*buf

pointer to a buffer to validate

size_tsize

size of the buffer

Description

Returns true on valid VBT.

voidintel_bios_init(structintel_display*display)

find VBT and initialize settings from the BIOS

Parameters

structintel_display*display

display device instance

Description

Parse and initialize settings from the Video BIOS Tables (VBT). If the VBT was not found in the ACPI OpRegion, try to find it in the PCI ROM first. Also initialize some defaults if the VBT is not present at all.

voidintel_bios_driver_remove(structintel_display*display)

Free any resources allocated byintel_bios_init()

Parameters

structintel_display*display

display device instance

boolintel_bios_is_tv_present(structintel_display*display)

is integrated TV present in VBT

Parameters

structintel_display*display

display device instance

Description

Return true if TV is present. If no child devices were parsed from VBT, assume TV is present.

boolintel_bios_is_lvds_present(structintel_display*display,u8*i2c_pin)

is LVDS present in VBT

Parameters

structintel_display*display

display device instance

u8*i2c_pin

i2c pin for LVDS if present

Description

Return true if LVDS is present. If no child devices were parsed from VBT, assume LVDS is present.

boolintel_bios_is_port_present(structintel_display*display,enumportport)

is the specified digital port present

Parameters

structintel_display*display

display device instance

enumportport

port to check

Description

Return true if the device in port is present.

boolintel_bios_is_dsi_present(structintel_display*display,enumport*port)

is DSI present in VBT

Parameters

structintel_display*display

display device instance

enumport*port

port for DSI if present

Description

Return true if DSI is present, and return the port in port.

structvbt_header

VBT Header structure

Definition:

struct vbt_header {
        u8 signature[20];
        u16 version;
        u16 header_size;
        u16 vbt_size;
        u8 vbt_checksum;
        u8 reserved0;
        u32 bdb_offset;
        u32 aim_offset[4];
};

Members

signature

VBT signature, always starts with “$VBT”

version

Version of this structure

header_size

Size of this structure

vbt_size

Size of VBT (VBT Header, BDB Header and data blocks)

vbt_checksum

Checksum

reserved0

Reserved

bdb_offset

Offset of struct bdb_header from beginning of VBT

aim_offset

Offsets of add-in data blocks from beginning of VBT

structbdb_header

BDB Header structure

Definition:

struct bdb_header {
        u8 signature[16];
        u16 version;
        u16 header_size;
        u16 bdb_size;
};

Members

signature

BDB signature “BIOS_DATA_BLOCK”

version

Version of the data block definitions

header_size

Size of this structure

bdb_size

Size of BDB (BDB Header and data blocks)

Display clocks

The display engine uses several different clocks to do its work. There are two main clocks involved that aren’t directly related to the actual pixel clock or any symbol/bit clock of the actual output port. These are the core display clock (CDCLK) and RAWCLK.

CDCLK clocks most of the display pipe logic, and thus its frequency must be high enough to support the rate at which pixels are flowing through the pipes. Downscaling must also be accounted for, as it increases the effective pixel rate.

On several platforms the CDCLK frequency can be changed dynamically to minimize power consumption for a given display configuration. Typically changes to the CDCLK frequency require all the display pipes to be shut down while the frequency is being changed.

On SKL+ the DMC will toggle the CDCLK off/on during DC5/6 entry/exit. The DMC will not change the active CDCLK frequency however, so that part will still be performed by the driver directly.

There are multiple components involved in the generation of the CDCLK frequency:

  • We have the CDCLK PLL, which generates an output clock based on a reference clock and a ratio parameter.

  • The CD2X Divider, which divides the output of the PLL based on a divisor selected from a set of pre-defined choices.

  • The CD2X Squasher, which further divides the output based on a waveform represented as a sequence of bits where each zero “squashes out” a clock cycle.

  • And, finally, a fixed divider that divides the output frequency by 2.

As such, the resulting CDCLK frequency can be calculated with the following formula:

cdclk = vco / cd2x_div / (sq_len / sq_div) / 2

where vco is the frequency generated by the PLL; cd2x_div represents the CD2X Divider; sq_len and sq_div are the bit length and the number of high bits of the CD2X Squasher waveform, respectively; and 2 represents the fixed divider.

Note that some older platforms do not contain the CD2X Divider and/or CD2X Squasher, in which case we can ignore their respective factors in the formula above.
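As a worked illustration of the formula, consider a small, hypothetical helper (not the driver’s implementation); DIV_ROUND_CLOSEST() is the usual kernel rounding macro.

        /*
         * Hypothetical illustration of the CDCLK formula above. On platforms
         * lacking the CD2X Divider and/or Squasher, pass cd2x_div = 1 and/or
         * sq_len == sq_div so the corresponding factor drops out.
         */
        static unsigned int calc_cdclk(unsigned int vco, unsigned int cd2x_div,
                                       unsigned int sq_len, unsigned int sq_div)
        {
                /* cdclk = vco / cd2x_div / (sq_len / sq_div) / 2 */
                return DIV_ROUND_CLOSEST(vco * sq_div, cd2x_div * sq_len * 2);
        }

For example, a 1200 MHz VCO with cd2x_div = 1 and a 16-bit squash waveform with 8 high bits yields 1200 / (16 / 8) / 2 = 300 MHz.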

Several methods exist to change the CDCLK frequency; which ones are supported depends on the platform:

  • Full PLL disable + re-enable with new VCO frequency. Pipes must be inactive.

  • CD2X divider update. A single pipe can be active, as the divider update can be synchronized with the pipe’s start of vblank.

  • Crawl the PLL smoothly to the new VCO frequency. Pipes can be active.

  • Squash waveform update. Pipes can be active.

  • Crawl and squash can also be done back to back. Pipes can be active.

RAWCLK is a fixed frequency clock, often used by various auxiliary blocks such as AUX CH or backlight PWM. Hence the only thing we really need to know about RAWCLK is its frequency so that various dividers can be programmed correctly.

voidintel_cdclk_init_hw(structintel_display*display)

Initialize CDCLK hardware

Parameters

structintel_display*display

display instance

Description

Initialize CDCLK. This consists mainly of initializing display->cdclk.hw and sanitizing the state of the hardware if needed. This is generally done only during the display core initialization sequence, after which the DMC will take care of turning CDCLK off/on as needed.

voidintel_cdclk_uninit_hw(structintel_display*display)

Uninitialize CDCLK hardware

Parameters

structintel_display*display

display instance

Description

Uninitialize CDCLK. This is done only during the display core uninitialization sequence.

boolintel_cdclk_clock_changed(conststructintel_cdclk_config*a,conststructintel_cdclk_config*b)

Check whether the clock changed

Parameters

conststructintel_cdclk_config*a

first CDCLK configuration

conststructintel_cdclk_config*b

second CDCLK configuration

Return

True if CDCLK changed in a way that requires re-programming and false otherwise.

boolintel_cdclk_can_cd2x_update(structintel_display*display,conststructintel_cdclk_config*a,conststructintel_cdclk_config*b)

Determine if changing between the two CDCLK configurations requires only a cd2x divider update

Parameters

structintel_display*display

display instance

conststructintel_cdclk_config*a

first CDCLK configuration

conststructintel_cdclk_config*b

second CDCLK configuration

Return

True if changing between the two CDCLK configurations can be done with just a cd2x divider update, false if not.

boolintel_cdclk_changed(conststructintel_cdclk_config*a,conststructintel_cdclk_config*b)

Determine if two CDCLK configurations are different

Parameters

conststructintel_cdclk_config*a

first CDCLK configuration

conststructintel_cdclk_config*b

second CDCLK configuration

Return

True if the CDCLK configurations don’t match, false if they do.

voidintel_set_cdclk_pre_plane_update(structintel_atomic_state*state)

Push the CDCLK state to the hardware

Parameters

structintel_atomic_state*state

intel atomic state

Description

Program the hardware before updating the HW plane state based on the new CDCLK state, if necessary.

voidintel_set_cdclk_post_plane_update(structintel_atomic_state*state)

Push the CDCLK state to the hardware

Parameters

structintel_atomic_state*state

intel atomic state

Description

Program the hardware after updating the HW plane state based on the new CDCLK state, if necessary.

voidintel_update_max_cdclk(structintel_display*display)

Determine the maximum supported CDCLK frequency

Parameters

structintel_display*display

display instance

Description

Determine the maximum CDCLK frequency the platform supports, and also derive the maximum dot clock frequency the maximum CDCLK frequency allows.

voidintel_update_cdclk(structintel_display*display)

Determine the current CDCLK frequency

Parameters

structintel_display*display

display instance

Description

Determine the current CDCLK frequency.

u32intel_read_rawclk(structintel_display*display)

Determine the current RAWCLK frequency

Parameters

structintel_display*display

display instance

Description

Determine the current RAWCLK frequency. RAWCLK is a fixed frequency clock so this needs to be done only once.

voidintel_init_cdclk_hooks(structintel_display*display)

Initialize CDCLK related modesetting hooks

Parameters

structintel_display*display

display instance

Display PLLs

Display PLLs used for driving outputs vary by platform. While some have per-pipe or per-encoder dedicated PLLs, others allow the use of any PLL from a pool. In the latter scenario, it is possible that multiple pipes share a PLL if their configurations match.

This file provides an abstraction over display PLLs. The function intel_dpll_init() initializes the PLLs for the given platform. The users of a PLL are tracked and that tracking is integrated with the atomic modeset interface. During an atomic operation, required PLLs can be reserved for a given CRTC and encoder configuration by calling intel_dpll_reserve() and previously reserved PLLs can be released with intel_dpll_release(). Changes to the users are first staged in the atomic state, and then made effective by calling intel_dpll_swap_state() during the atomic commit phase.
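A rough sketch of that flow, using the functions documented below (error handling is reduced to the reserve failure; this is not the driver’s actual modeset code):

        static int example_dpll_modeset(struct intel_atomic_state *state,
                                        struct intel_crtc *crtc,
                                        struct intel_encoder *encoder)
        {
                int ret;

                /* Stage the DPLL users for the new configuration. */
                ret = intel_dpll_reserve(state, crtc, encoder);
                if (ret)
                        return ret;

                /*
                 * If a later step fails, the staging is undone with
                 * intel_dpll_release(state, crtc).
                 */

                /* At commit time, make the staged configuration effective. */
                intel_dpll_swap_state(state);

                return 0;
        }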

structintel_dpll*intel_get_dpll_by_id(structintel_display*display,enumintel_dpll_idid)

get a DPLL given its id

Parameters

structintel_display*display

intel_display device instance

enumintel_dpll_idid

pll id

Return

A pointer to the DPLL withid

voidintel_dpll_enable(conststructintel_crtc_state*crtc_state)

enable a CRTC’s DPLL

Parameters

conststructintel_crtc_state*crtc_state

CRTC, and its state, which has a DPLL

Description

Enable DPLL used bycrtc.

voidintel_dpll_disable(conststructintel_crtc_state*crtc_state)

disable a CRTC’s shared DPLL

Parameters

conststructintel_crtc_state*crtc_state

CRTC, and its state, which has a shared DPLL

Description

Disable DPLL used bycrtc.

voidintel_dpll_crtc_get(conststructintel_crtc*crtc,conststructintel_dpll*pll,structintel_dpll_state*dpll_state)

Get a DPLL reference for a CRTC

Parameters

conststructintel_crtc*crtc

CRTC on which behalf the reference is taken

conststructintel_dpll*pll

DPLL for which the reference is taken

structintel_dpll_state*dpll_state

the DPLL atomic state in which the reference is tracked

Description

Take a reference forpll tracking the use of it bycrtc.

voidintel_dpll_crtc_put(conststructintel_crtc*crtc,conststructintel_dpll*pll,structintel_dpll_state*dpll_state)

Drop a DPLL reference for a CRTC

Parameters

conststructintel_crtc*crtc

CRTC on which behalf the reference is dropped

conststructintel_dpll*pll

DPLL for which the reference is dropped

structintel_dpll_state*dpll_state

the DPLL atomic state in which the reference is tracked

Description

Drop a reference forpll tracking the end of use of it bycrtc.

voidintel_dpll_swap_state(structintel_atomic_state*state)

make atomic DPLL configuration effective

Parameters

structintel_atomic_state*state

atomic state

Description

This is the dpll version of drm_atomic_helper_swap_state() since the helper does not handle driver-specific global state.

For consistency with atomic helpers this function does a complete swap, i.e. it also puts the current state into state, even though there is no need for that at this moment.

voidicl_set_active_port_dpll(structintel_crtc_state*crtc_state,enumicl_port_dpll_idport_dpll_id)

select the active port DPLL for a given CRTC

Parameters

structintel_crtc_state*crtc_state

state for the CRTC to select the DPLL for

enumicl_port_dpll_idport_dpll_id

the activeport_dpll_id to select

Description

Select the given port_dpll_id instance from the DPLLs reserved for the CRTC.

voidintel_dpll_init(structintel_display*display)

Initialize DPLLs

Parameters

structintel_display*display

intel_display device

Description

Initialize DPLLs for display.

intintel_dpll_compute(structintel_atomic_state*state,structintel_crtc*crtc,structintel_encoder*encoder)

compute DPLL state for a CRTC and encoder combination

Parameters

structintel_atomic_state*state

atomic state

structintel_crtc*crtc

CRTC to compute DPLLs for

structintel_encoder*encoder

encoder

Description

This function computes the DPLL state for the given CRTC and encoder.

The new configuration in the atomic commit state is made effective by calling intel_dpll_swap_state().

Return

0 on success, negative error code on failure.

intintel_dpll_reserve(structintel_atomic_state*state,structintel_crtc*crtc,structintel_encoder*encoder)

reserve DPLLs for CRTC and encoder combination

Parameters

structintel_atomic_state*state

atomic state

structintel_crtc*crtc

CRTC to reserve DPLLs for

structintel_encoder*encoder

encoder

Description

This function reserves all required DPLLs for the given CRTC and encoder combination in the current atomic commit state and the new crtc atomic state.

The new configuration in the atomic commit state is made effective by calling intel_dpll_swap_state().

The reserved DPLLs should be released by calling intel_dpll_release().

Return

0 if all required DPLLs were successfully reserved, negative error code otherwise.

voidintel_dpll_release(structintel_atomic_state*state,structintel_crtc*crtc)

end use of DPLLs by CRTC in atomic state

Parameters

structintel_atomic_state*state

atomic state

structintel_crtc*crtc

crtc from which the DPLLs are to be released

Description

This function releases all DPLLs reserved by intel_dpll_reserve() from the current atomic commit state and the old crtc atomic state.

The new configuration in the atomic commit state is made effective by calling intel_dpll_swap_state().

voidintel_dpll_update_active(structintel_atomic_state*state,structintel_crtc*crtc,structintel_encoder*encoder)

update the active DPLL for a CRTC/encoder

Parameters

structintel_atomic_state*state

atomic state

structintel_crtc*crtc

the CRTC for which to update the active DPLL

structintel_encoder*encoder

encoder determining the type of port DPLL

Description

Update the active DPLL for the given crtc/encoder in crtc’s atomic state, from the port DPLLs reserved previously by intel_dpll_reserve(). The DPLL selected will be based on the current mode of the encoder’s port.

intintel_dpll_get_freq(structintel_display*display,conststructintel_dpll*pll,conststructintel_dpll_hw_state*dpll_hw_state)

calculate the DPLL’s output frequency

Parameters

structintel_display*display

intel_display device

conststructintel_dpll*pll

DPLL for which to calculate the output frequency

conststructintel_dpll_hw_state*dpll_hw_state

DPLL state from which to calculate the output frequency

Description

Return the output frequency corresponding to pll’s passed in dpll_hw_state.

boolintel_dpll_get_hw_state(structintel_display*display,structintel_dpll*pll,structintel_dpll_hw_state*dpll_hw_state)

readout the DPLL’s hardware state

Parameters

structintel_display*display

intel_display device instance

structintel_dpll*pll

DPLL for which to calculate the output frequency

structintel_dpll_hw_state*dpll_hw_state

DPLL’s hardware state

Description

Read out pll’s hardware state into dpll_hw_state.

voidintel_dpll_dump_hw_state(structintel_display*display,structdrm_printer*p,conststructintel_dpll_hw_state*dpll_hw_state)

dump hw_state

Parameters

structintel_display*display

intel_display structure

structdrm_printer*p

where to print the state to

conststructintel_dpll_hw_state*dpll_hw_state

hw state to be dumped

Description

Dump out the relevant values in dpll_hw_state.

boolintel_dpll_compare_hw_state(structintel_display*display,conststructintel_dpll_hw_state*a,conststructintel_dpll_hw_state*b)

compare the two states

Parameters

structintel_display*display

intel_display structure

conststructintel_dpll_hw_state*a

first DPLL hw state

conststructintel_dpll_hw_state*b

second DPLL hw state

Description

Compare DPLL hw states a and b.

Return

true if the states are equal, false if they differ

enumintel_dpll_id

possible DPLL ids

Constants

DPLL_ID_PRIVATE

non-shared dpll in use

DPLL_ID_PCH_PLL_A

DPLL A in ILK, SNB and IVB

DPLL_ID_PCH_PLL_B

DPLL B in ILK, SNB and IVB

DPLL_ID_WRPLL1

HSW and BDW WRPLL1

DPLL_ID_WRPLL2

HSW and BDW WRPLL2

DPLL_ID_SPLL

HSW and BDW SPLL

DPLL_ID_LCPLL_810

HSW and BDW 0.81 GHz LCPLL

DPLL_ID_LCPLL_1350

HSW and BDW 1.35 GHz LCPLL

DPLL_ID_LCPLL_2700

HSW and BDW 2.7 GHz LCPLL

DPLL_ID_SKL_DPLL0

SKL and later DPLL0

DPLL_ID_SKL_DPLL1

SKL and later DPLL1

DPLL_ID_SKL_DPLL2

SKL and later DPLL2

DPLL_ID_SKL_DPLL3

SKL and later DPLL3

DPLL_ID_ICL_DPLL0

ICL/TGL combo PHY DPLL0

DPLL_ID_ICL_DPLL1

ICL/TGL combo PHY DPLL1

DPLL_ID_EHL_DPLL4

EHL combo PHY DPLL4

DPLL_ID_ICL_TBTPLL

ICL/TGL TBT PLL

DPLL_ID_ICL_MGPLL1
ICL MG PLL 1 port 1 (C),

TGL TC PLL 1 port 1 (TC1)

DPLL_ID_ICL_MGPLL2
ICL MG PLL 1 port 2 (D)

TGL TC PLL 1 port 2 (TC2)

DPLL_ID_ICL_MGPLL3
ICL MG PLL 1 port 3 (E)

TGL TC PLL 1 port 3 (TC3)

DPLL_ID_ICL_MGPLL4
ICL MG PLL 1 port 4 (F)

TGL TC PLL 1 port 4 (TC4)

DPLL_ID_TGL_MGPLL5

TGL TC PLL port 5 (TC5)

DPLL_ID_TGL_MGPLL6

TGL TC PLL port 6 (TC6)

DPLL_ID_DG1_DPLL0

DG1 combo PHY DPLL0

DPLL_ID_DG1_DPLL1

DG1 combo PHY DPLL1

DPLL_ID_DG1_DPLL2

DG1 combo PHY DPLL2

DPLL_ID_DG1_DPLL3

DG1 combo PHY DPLL3

Description

Enumeration of possible IDs for a DPLL. Real shared dpll ids must be >= 0.

structintel_dpll_state

hold the DPLL atomic state

Definition:

struct intel_dpll_state {
        u8 pipe_mask;
        struct intel_dpll_hw_state hw_state;
};

Members

pipe_mask

mask of pipes using this DPLL, active or not

hw_state

hardware configuration for the DPLL stored in struct intel_dpll_hw_state.

Description

This structure holds an atomic state for the DPLL, that can represent either its current state (in struct intel_shared_dpll) or a desired future state which would be applied by an atomic mode set (stored in a struct intel_atomic_state).

See also intel_reserve_shared_dplls() and intel_release_shared_dplls().

structdpll_info

display PLL platform specific info

Definition:

struct dpll_info {
        const char *name;
        const struct intel_dpll_funcs *funcs;
        enum intel_dpll_id id;
        enum intel_display_power_domain power_domain;
        bool always_on;
        bool is_alt_port_dpll;
};

Members

name

DPLL name; used for logging

funcs

platform specific hooks

id

unique identifier for this DPLL

power_domain

extra power domain required by the DPLL

always_on

Inform the state checker that the DPLL is kept enabled even if not in use by any CRTC.

is_alt_port_dpll

Inform the state checker that the DPLL can be used as a fallback (for TC->TBT fallback).

structintel_dpll

display PLL with tracked state and users

Definition:

struct intel_dpll {
        struct intel_dpll_state state;
        u8 index;
        u8 active_mask;
        bool on;
        const struct dpll_info *info;
        struct ref_tracker *wakeref;
};

Members

state

Store the state for the pll, including its hw state and CRTCs using it.

index

index for atomic state

active_mask

mask of active pipes (i.e. DPMS on) using this DPLL

on

is the PLL actually active? Disabled during modeset

info

platform specific info

wakeref

In some platforms a device-level runtime pm reference may need to be grabbed to disable DC states while this DPLL is enabled

Display State Buffer

A DSB (Display State Buffer) is a queue of MMIO instructions in memory which can be offloaded to the DSB HW in the Display Controller. The DSB HW is a DMA engine that can be programmed to download the DSB from memory. It allows the driver to batch-submit display HW programming. This helps to reduce loading time and CPU activity, thereby making the context switch faster. DSB support was added from Gen12 Intel graphics based platforms.

DSBs can access only the pipe, plane, and transcoder Data Island Packet registers.

The DSB HW can support only register writes (both indexed and direct MMIO writes). There are no register reads possible with the DSB HW engine.

voidintel_dsb_reg_write_indexed(structintel_dsb*dsb,i915_reg_treg,u32val)

Emit indexed register write to the DSB context

Parameters

structintel_dsb*dsb

DSB context

i915_reg_treg

register address.

u32val

value.

Description

This function is used for writing a register-value pair into the command buffer of the DSB.

Note that indexed writes are slower than normal MMIO writes for a small number (less than 5 or so) of writes to the same register.

voidintel_dsb_commit(structintel_dsb*dsb)

Trigger workload execution of DSB.

Parameters

structintel_dsb*dsb

DSB context

Description

This function is used to do the actual write to hardware using the DSB.

structintel_dsb*intel_dsb_prepare(structintel_atomic_state*state,structintel_crtc*crtc,enumintel_dsb_iddsb_id,unsignedintmax_cmds)

Allocate, pin and map the DSB command buffer.

Parameters

structintel_atomic_state*state

the atomic state

structintel_crtc*crtc

the CRTC

enumintel_dsb_iddsb_id

the DSB engine to use

unsignedintmax_cmds

number of commands we need to fit into command buffer

Description

This function prepares the command buffer which is used to store the DSB instructions with data.

Return

DSB context, NULL on failure

voidintel_dsb_cleanup(structintel_dsb*dsb)

To cleanup DSB context.

Parameters

structintel_dsb*dsb

DSB context

Description

This function cleans up the DSB context by unpinning and releasing the VMA object associated with it.
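Putting the functions above together, a rough sketch of the DSB lifecycle; INTEL_DSB_0 and EXAMPLE_REG are placeholder names, and error handling is reduced to the NULL check documented for intel_dsb_prepare().

        static void example_dsb_update(struct intel_atomic_state *state,
                                       struct intel_crtc *crtc)
        {
                struct intel_dsb *dsb;

                /* Allocate, pin and map a buffer sized for 16 commands. */
                dsb = intel_dsb_prepare(state, crtc, INTEL_DSB_0, 16);
                if (!dsb)
                        return; /* fall back to plain MMIO writes */

                /* Queue register writes; nothing hits the hardware yet. */
                intel_dsb_reg_write_indexed(dsb, EXAMPLE_REG, 0x1);
                intel_dsb_reg_write_indexed(dsb, EXAMPLE_REG, 0x2);

                /* Hand the command buffer to the DSB engine for execution. */
                intel_dsb_commit(dsb);

                /*
                 * Unpin and release the command buffer; a real caller would
                 * first wait for the DSB execution to complete.
                 */
                intel_dsb_cleanup(dsb);
        }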

GT Programming

Multicast/Replicated (MCR) Registers

Some GT registers are designed as “multicast” or “replicated” registers: multiple instances of the same register share a single MMIO offset. MCR registers are generally used when the hardware needs to potentially track independent values of a register per hardware unit (e.g., per-subslice, per-L3bank, etc.). The specific types of replication that exist vary per-platform.

MMIO accesses to MCR registers are controlled according to the settings programmed in the platform’s MCR_SELECTOR register(s). MMIO writes to MCR registers can be done in either multicast (i.e., a single write updates all instances of the register to the same value) or unicast (a write updates only one specific instance) form. Reads of MCR registers always operate in a unicast manner regardless of how the multicast/unicast bit is set in MCR_SELECTOR. Selection of a specific MCR instance for unicast operations is referred to as “steering.”

If MCR register operations are steered toward a hardware unit that is fused off or currently powered down due to power gating, the MMIO operation is “terminated” by the hardware. Terminated read operations will return a value of zero and terminated unicast write operations will be silently ignored.
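To make the multicast/unicast distinction concrete, here is a hypothetical sketch using the accessors documented below; EXAMPLE_MCR_REG and the group/instance bounds are placeholders.

        static void example_mcr_access(struct intel_gt *gt)
        {
                int group, instance;
                u32 val;

                /* One multicast write updates every instance at once. */
                intel_gt_mcr_multicast_write(gt, EXAMPLE_MCR_REG, 0x1);

                /* Reads are always unicast and must be steered explicitly. */
                for (group = 0; group < 2; group++) {
                        for (instance = 0; instance < 2; instance++) {
                                val = intel_gt_mcr_read(gt, EXAMPLE_MCR_REG,
                                                        group, instance);
                                /* A terminated instance reads back as zero. */
                                (void)val;
                        }
                }
        }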

voidintel_gt_mcr_lock(structintel_gt*gt,unsignedlong*flags)

Acquire MCR steering lock

Parameters

structintel_gt*gt

GT structure

unsignedlong*flags

storage to save IRQ flags to

Description

Performs locking to protect the steering for the duration of an MCR operation. On MTL and beyond, a hardware lock will also be taken to serialize access not only for the driver, but also for external hardware and firmware agents.

Context

Takes gt->mcr_lock. uncore->lock should not be held when this function is called, although it may be acquired after this function call.

voidintel_gt_mcr_unlock(structintel_gt*gt,unsignedlongflags)

Release MCR steering lock

Parameters

structintel_gt*gt

GT structure

unsignedlongflags

IRQ flags to restore

Description

Releases the lock acquired byintel_gt_mcr_lock().

Context

Releases gt->mcr_lock

voidintel_gt_mcr_lock_sanitize(structintel_gt*gt)

Sanitize MCR steering lock

Parameters

structintel_gt*gt

GT structure

Description

This will be used to sanitize the initial status of the hardware lock during driver load and resume since there won’t be any concurrent access from other agents at those times, but it’s possible that boot firmware may have left the lock in a bad state.

u32intel_gt_mcr_read(structintel_gt*gt,i915_mcr_reg_treg,intgroup,intinstance)

read a specific instance of an MCR register

Parameters

structintel_gt*gt

GT structure

i915_mcr_reg_treg

the MCR register to read

intgroup

the MCR group

intinstance

the MCR instance

Context

Takes and releases gt->mcr_lock

Description

Returns the value read from an MCR register after steering toward a specific group/instance.

voidintel_gt_mcr_unicast_write(structintel_gt*gt,i915_mcr_reg_treg,u32value,intgroup,intinstance)

write a specific instance of an MCR register

Parameters

structintel_gt*gt

GT structure

i915_mcr_reg_treg

the MCR register to write

u32value

value to write

intgroup

the MCR group

intinstance

the MCR instance

Description

Write an MCR register in unicast mode after steering toward a specific group/instance.

Context

Calls a function that takes and releases gt->mcr_lock

voidintel_gt_mcr_multicast_write(structintel_gt*gt,i915_mcr_reg_treg,u32value)

write a value to all instances of an MCR register

Parameters

structintel_gt*gt

GT structure

i915_mcr_reg_treg

the MCR register to write

u32value

value to write

Description

Write an MCR register in multicast mode to update all instances.

Context

Takes and releases gt->mcr_lock

voidintel_gt_mcr_multicast_write_fw(structintel_gt*gt,i915_mcr_reg_treg,u32value)

write a value to all instances of an MCR register

Parameters

structintel_gt*gt

GT structure

i915_mcr_reg_treg

the MCR register to write

u32value

value to write

Description

Write an MCR register in multicast mode to update all instances. This function assumes the caller is already holding any necessary forcewake domains; use intel_gt_mcr_multicast_write() in cases where forcewake should be obtained automatically.

Context

The caller must hold gt->mcr_lock.
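A short sketch pairing the steering lock with this _fw accessor (EXAMPLE_MCR_REG is a placeholder; forcewake handling is omitted):

        unsigned long flags;

        /* The _fw variant requires gt->mcr_lock to be held by the caller. */
        intel_gt_mcr_lock(gt, &flags);
        intel_gt_mcr_multicast_write_fw(gt, EXAMPLE_MCR_REG, 0);
        intel_gt_mcr_unlock(gt, flags);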

u32intel_gt_mcr_multicast_rmw(structintel_gt*gt,i915_mcr_reg_treg,u32clear,u32set)

Performs a multicast RMW operations

Parameters

structintel_gt*gt

GT structure

i915_mcr_reg_treg

the MCR register to read and write

u32clear

bits to clear during RMW

u32set

bits to set during RMW

Description

Performs a read-modify-write on an MCR register in a multicast manner. This operation only makes sense on MCR registers where all instances are expected to have the same value. The read will target any non-terminated instance and the write will be applied to all instances.

This function assumes the caller is already holding any necessary forcewake domains; use intel_gt_mcr_multicast_rmw() in cases where forcewake should be obtained automatically.

Returns the old (unmodified) value read.

Context

Calls functions that take and release gt->mcr_lock

voidintel_gt_mcr_get_nonterminated_steering(structintel_gt*gt,i915_mcr_reg_treg,u8*group,u8*instance)

find group/instance values that will steer a register to a non-terminated instance

Parameters

structintel_gt*gt

GT structure

i915_mcr_reg_treg

register for which the steering is required

u8*group

return variable for group steering

u8*instance

return variable for instance steering

Description

This function returns a group/instance pair that is guaranteed to work for read steering of the given register. Note that a value will be returned even if the register is not replicated and therefore does not actually require steering.

u32intel_gt_mcr_read_any_fw(structintel_gt*gt,i915_mcr_reg_treg)

reads one instance of an MCR register

Parameters

structintel_gt*gt

GT structure

i915_mcr_reg_treg

register to read

Description

Reads a GT MCR register. The read will be steered to a non-terminated instance (i.e., one that isn’t fused off or powered down by power gating). This function assumes the caller is already holding any necessary forcewake domains; use intel_gt_mcr_read_any() in cases where forcewake should be obtained automatically.

Returns the value from a non-terminated instance of reg.

Context

The caller must hold gt->mcr_lock.

u32intel_gt_mcr_read_any(structintel_gt*gt,i915_mcr_reg_treg)

reads one instance of an MCR register

Parameters

structintel_gt*gt

GT structure

i915_mcr_reg_treg

register to read

Description

Reads a GT MCR register. The read will be steered to a non-terminated instance (i.e., one that isn’t fused off or powered down by power gating).

Returns the value from a non-terminated instance of reg.

Context

Calls a function that takes and releases gt->mcr_lock.

voidintel_gt_mcr_get_ss_steering(structintel_gt*gt,unsignedintdss,unsignedint*group,unsignedint*instance)

returns the group/instance steering for a SS

Parameters

structintel_gt*gt

GT structure

unsignedintdss

DSS ID to obtain steering for

unsignedint*group

pointer to storage for steering group ID

unsignedint*instance

pointer to storage for steering instance ID

Description

Returns the steering IDs (via the group and instance parameters) that correspond to a specific subslice/DSS ID.

intintel_gt_mcr_wait_for_reg(structintel_gt*gt,i915_mcr_reg_treg,u32mask,u32value,unsignedintfast_timeout_us,unsignedintslow_timeout_ms)

wait until MCR register matches expected state

Parameters

structintel_gt*gt

GT structure

i915_mcr_reg_treg

the register to read

u32mask

mask to apply to register value

u32value

value to wait for

unsignedintfast_timeout_us

fast timeout in microsecond for atomic/tight wait

unsignedintslow_timeout_ms

slow timeout in millisecond

Description

This routine waits until the target register reg contains the expected value after applying the mask, i.e. it waits until

(intel_gt_mcr_read_any_fw(gt, reg) & mask) == value

Otherwise, the wait will timeout after slow_timeout_ms milliseconds. For atomic context slow_timeout_ms must be zero and fast_timeout_us must not be larger than 20,000 microseconds.

This function is basically an MCR-friendly version of __intel_wait_for_register_fw(). Generally this function will only be used on GAM registers, which are a bit special: although they’re MCR registers, reads (e.g., waiting for status updates) are always directed to the primary instance.

Note that this routine assumes the caller holds forcewake asserted; it is not suitable for very long waits.

Context

Calls a function that takes and releases gt->mcr_lock

Return

0 if the register matches the desired condition, or -ETIMEDOUT.

Memory Management and Command Submission

This section covers all things related to the GEM implementation in the i915 driver.

Intel GPU Basics

An Intel GPU has multiple engines. There are several engine types:

  • Render Command Streamer (RCS). An engine for rendering 3D and performing compute.

  • Blitting Command Streamer (BCS). An engine for performing blitting and/or copying operations.

  • Video Command Streamer (VCS). An engine used for video encoding and decoding. Also sometimes called ‘BSD’ in hardware documentation.

  • Video Enhancement Command Streamer (VECS). An engine for video enhancement. Also sometimes called ‘VEBOX’ in hardware documentation.

  • Compute Command Streamer (CCS). An engine that has access to the media and GPGPU pipelines, but not the 3D pipeline.

  • Graphics Security Controller (GSCCS). A dedicated engine for internal communication with the GSC controller on security related tasks like High-bandwidth Digital Content Protection (HDCP), Protected Xe Path (PXP), and HuC firmware authentication.

The Intel GPU family is a family of integrated GPUs using Unified Memory Access. For having the GPU “do work”, user space will feed the GPU batch buffers via one of the ioctls DRM_IOCTL_I915_GEM_EXECBUFFER2 or DRM_IOCTL_I915_GEM_EXECBUFFER2_WR. Most such batchbuffers will instruct the GPU to perform work (for example rendering) and that work needs memory from which to read and memory to which to write. All memory is encapsulated within GEM buffer objects (usually created with the ioctl DRM_IOCTL_I915_GEM_CREATE). An ioctl providing a batchbuffer for the GPU to execute will also list all GEM buffer objects that the batchbuffer reads and/or writes. For implementation details of memory management see GEM BO Management Implementation Details.

The i915 driver allows user space to create a context via the ioctl DRM_IOCTL_I915_GEM_CONTEXT_CREATE which is identified by a 32-bit integer. Such a context should be viewed by user-space as -loosely- analogous to the idea of a CPU process of an operating system. The i915 driver guarantees that commands issued to a fixed context are to be executed so that writes of a previously issued command are seen by reads of following commands. Actions issued between different contexts (even if from the same file descriptor) are NOT given that guarantee and the only way to synchronize across contexts (even from the same file descriptor) is through the use of fences. At least as far back as Gen4, a context also carries with it a GPU HW context; the HW context is essentially (most of at least) the state of a GPU. In addition to the ordering guarantees, the kernel will restore GPU state via the HW context when commands are issued to a context; this saves user space the need to restore (most of at least) the GPU state at the start of each batchbuffer. The non-deprecated ioctls to submit batchbuffer work can pass that ID (in the lower bits of drm_i915_gem_execbuffer2::rsvd1) to identify what context to use with the command.
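A hedged userspace sketch of the flow described above: create a GEM BO to hold the batch and submit it with DRM_IOCTL_I915_GEM_EXECBUFFER2. A real batch must contain valid commands terminated by MI_BATCH_BUFFER_END and list every BO it references; error handling here is minimal.

        #include <stdint.h>
        #include <sys/ioctl.h>
        #include <drm/i915_drm.h>

        static int submit_batch(int drm_fd, uint32_t ctx_id)
        {
                struct drm_i915_gem_create create = { .size = 4096 };
                struct drm_i915_gem_exec_object2 obj = { 0 };
                struct drm_i915_gem_execbuffer2 execbuf = { 0 };

                /* The batch lives in a GEM BO like any other GPU memory. */
                if (ioctl(drm_fd, DRM_IOCTL_I915_GEM_CREATE, &create))
                        return -1;

                /* List every BO the batch touches; here only the batch BO. */
                obj.handle = create.handle;

                execbuf.buffers_ptr = (uintptr_t)&obj;
                execbuf.buffer_count = 1;
                execbuf.batch_len = 4096;
                execbuf.rsvd1 = ctx_id; /* context ID; 0 = default context */

                return ioctl(drm_fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);
        }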

The GPU has its own memory management and address space. The kernel driver maintains the memory translation table for the GPU. For older GPUs (i.e. those before Gen8), there is a single global such translation table, a global Graphics Translation Table (GTT). For newer generation GPUs each context has its own translation table, called Per-Process Graphics Translation Table (PPGTT). Of important note is that although PPGTT is named per-process it is actually per context. When user space submits a batchbuffer, the kernel walks the list of GEM buffer objects used by the batchbuffer and guarantees that not only is the memory of each such GEM buffer object resident but it is also present in the (PP)GTT. If the GEM buffer object is not yet placed in the (PP)GTT, then it is given an address. Two consequences of this are: the kernel needs to edit the batchbuffer submitted to write the correct value of the GPU address when a GEM BO is assigned a GPU address and the kernel might evict a different GEM BO from the (PP)GTT to make address room for another GEM BO. Consequently, the ioctls submitting a batchbuffer for execution also include a list of all locations within buffers that refer to GPU-addresses so that the kernel can edit the buffer correctly. This process is dubbed relocation.

Locking Guidelines

Note

This is a description of how the locking should be after refactoring is done. It does not necessarily reflect what the locking looks like while WIP.

  1. All locking rules and interface contracts with cross-driver interfaces (dma-buf, dma_fence) need to be followed.

  2. dma_resv will be the outermost lock (when needed) and ww_acquire_ctx is to be hoisted at the highest level and passed down within i915_gem_ctx in the call chain

  3. While holding lru/memory manager (buddy, drm_mm, whatever) locks, system memory allocations are not allowed

  4. Do not nest different lru/memory manager locks within each other. Take them in turn to update memory allocations, relying on the object’s dma_resv ww_mutex to serialize against other operations.

  5. The suggestion for lru/memory manager locks is that they are small enough to be spinlocks.

  6. All features need to come with exhaustive kernel selftests and/or IGT tests when appropriate

  7. All LMEM uAPI paths need to be fully restartable (_interruptible() for all locks/waits/sleeps)

    • Error handling validation through signal injection. Still the best strategy we have for validating GEM uAPI corner cases. Must be excessively used in the IGT, and we need to check that we really have full path coverage of all error cases.

    • -EDEADLK handling with ww_mutex (see the sketch after this list)
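A minimal sketch of that -EDEADLK dance, assuming the i915 ww helpers for_i915_gem_ww() and i915_gem_object_lock() in their current form:

        struct i915_gem_ww_ctx ww;
        int err;

        for_i915_gem_ww(&ww, err, true) { /* true: interruptible */
                err = i915_gem_object_lock(obj, &ww);
                if (err)
                        continue; /* -EDEADLK: back off and retry */

                /* ... operate on obj under its dma_resv lock ... */
        }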

GEM BO Management Implementation Details

A VMA represents a GEM BO that is bound into an address space. Therefore, a VMA’s presence cannot be guaranteed before binding, or after unbinding, the object into/from the address space.

To make things as simple as possible (ie. no refcounting), a VMA’s lifetime will always be <= an object’s lifetime. So object refcounting should cover us.

Buffer Object Eviction

This section documents the interface functions for evicting buffer objects to make space available in the virtual gpu address spaces. Note that this is mostly orthogonal to shrinking buffer object caches, which has the goal to make main memory (shared with the gpu through the unified memory architecture) available.

inti915_gem_evict_something(structi915_address_space*vm,structi915_gem_ww_ctx*ww,u64min_size,u64alignment,unsignedlongcolor,u64start,u64end,unsignedflags)

Evict vmas to make room for binding a new one

Parameters

structi915_address_space*vm

address space to evict from

structi915_gem_ww_ctx*ww

An optional struct i915_gem_ww_ctx.

u64min_size

size of the desired free space

u64alignment

alignment constraint of the desired free space

unsignedlongcolor

color for the desired space

u64start

start (inclusive) of the range from which to evict objects

u64end

end (exclusive) of the range from which to evict objects

unsignedflags

additional flags to control the eviction algorithm

Description

This function will try to evict vmas until a free space satisfying the requirements is found. Callers must check first whether any such hole exists already before calling this function.

This function is used by the object/vma binding code.

Since this function is only used to free up virtual address space it only ignores pinned vmas, and not objects where the backing storage itself is pinned. Hence obj->pages_pin_count does not protect against eviction.

To clarify: This is for freeing up virtual address space, not for freeing memory in e.g. the shrinker.
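A sketch of that calling convention: probe for a hole first and evict only on -ENOSPC, then retry. example_bind() is hypothetical, and the drm_mm calls assume the address space’s embedded struct drm_mm and total size.

        static int example_bind(struct i915_address_space *vm,
                                struct i915_gem_ww_ctx *ww,
                                struct drm_mm_node *node, u64 size)
        {
                int err;

                /* First check whether a suitable hole already exists. */
                err = drm_mm_insert_node_in_range(&vm->mm, node, size, 0, 0,
                                                  0, vm->total,
                                                  DRM_MM_INSERT_BEST);
                if (err != -ENOSPC)
                        return err;

                /* No hole: evict enough vmas, then retry the insertion. */
                err = i915_gem_evict_something(vm, ww, size, 0, 0,
                                               0, vm->total, 0);
                if (err)
                        return err;

                return drm_mm_insert_node_in_range(&vm->mm, node, size, 0, 0,
                                                   0, vm->total,
                                                   DRM_MM_INSERT_BEST);
        }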

inti915_gem_evict_for_node(structi915_address_space*vm,structi915_gem_ww_ctx*ww,structdrm_mm_node*target,unsignedintflags)

Evict vmas to make room for binding a new one

Parameters

structi915_address_space*vm

address space to evict from

structi915_gem_ww_ctx*ww

An optional struct i915_gem_ww_ctx.

structdrm_mm_node*target

range (and color) to evict for

unsignedintflags

additional flags to control the eviction algorithm

Description

This function will try to evict vmas that overlap the target node.

To clarify: This is for freeing up virtual address space, not for freeing memory in e.g. the shrinker.

inti915_gem_evict_vm(structi915_address_space*vm,structi915_gem_ww_ctx*ww,structdrm_i915_gem_object**busy_bo)

Evict all idle vmas from a vm

Parameters

structi915_address_space*vm

Address space to cleanse

structi915_gem_ww_ctx*ww

An optional struct i915_gem_ww_ctx. If not NULL, i915_gem_evict_vm will be able to evict vmas locked by the ww as well.

structdrm_i915_gem_object**busy_bo

Optional pointer to struct drm_i915_gem_object. If not NULL, then in the event i915_gem_evict_vm() is unable to trylock an object for eviction, busy_bo will point to it. -EBUSY is also returned. The caller must drop the vm->mutex before trying again to acquire the contended lock. The caller also owns a reference to the object.

Description

This function evicts all vmas from a vm.

This is used by the execbuf code as a last-ditch effort to defragment the address space.

To clarify: This is for freeing up virtual address space, not for freeing memory in e.g. the shrinker.

Buffer Object Memory Shrinking

This section documents the interface function for shrinking memory usage of buffer object caches. Shrinking is used to make main memory available. Note that this is mostly orthogonal to evicting buffer objects, which has the goal to make space in gpu virtual address spaces.

unsignedlongi915_gem_shrink(structi915_gem_ww_ctx*ww,structdrm_i915_private*i915,unsignedlongtarget,unsignedlong*nr_scanned,unsignedintshrink)

Shrink buffer object caches

Parameters

structi915_gem_ww_ctx*ww

i915 gem ww acquire ctx, or NULL

structdrm_i915_private*i915

i915 device

unsignedlongtarget

amount of memory to make available, in pages

unsignedlong*nr_scanned

optional output for number of pages scanned (incremental)

unsignedintshrink

control flags for selecting cache types

Description

This function is the main interface to the shrinker. It will try to release up to target pages of main memory backing storage from buffer objects. Selection of the specific caches can be done with flags. This is e.g. useful when purgeable objects should be removed from caches preferentially.

Note that it’s not guaranteed that released amount is actually available asfree system memory - the pages might still be in-used to due to other reasons(like cpu mmaps) or the mm core has reused them before we could grab them.Therefore code that needs to explicitly shrink buffer objects caches (e.g. toavoid deadlocks in memory reclaim) must fall back toi915_gem_shrink_all().

Also note that any kind of pinning (both per-vma address space pins andbacking storage pins at the buffer object level) result in the shrinker codehaving to skip the object.

Return

The number of pages of backing storage actually released.
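
A hedged sketch of the fallback described above, using shrink flags from the i915 shrinker code (the exact flag set varies by kernel version):

/* Try a targeted shrink of bound and unbound caches first. */
unsigned long nr = i915_gem_shrink(NULL, i915, target, NULL,
				   I915_SHRINK_BOUND | I915_SHRINK_UNBOUND);
if (nr < target)
	nr += i915_gem_shrink_all(i915);	/* heavyweight fallback */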

unsigned long i915_gem_shrink_all(struct drm_i915_private *i915)

Shrink buffer object caches completely

Parameters

struct drm_i915_private *i915

i915 device

i915 device

Description

This is a simple wrapper around i915_gem_shrink() to aggressively shrink all caches completely. It also first waits for and retires all outstanding requests to also be able to release backing storage for active objects.

This should only be used in code that intentionally quiesces the gpu, or as a last-ditch effort when memory seems to have run out.

Return

The number of pages of backing storage actually released.

void i915_gem_object_make_unshrinkable(struct drm_i915_gem_object *obj)

Hide the object from the shrinker. By default all object types that support shrinking (see IS_SHRINKABLE) will also make the object visible to the shrinker after allocating the system memory pages.

Parameters

struct drm_i915_gem_object *obj

The GEM object.

Description

This is typically used for special kernel internal objects that can’t be easily processed by the shrinker, like if they are perma-pinned.

void __i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj)

Move the object to the tail of the shrinkable list. Objects on this list might be swapped out. Used with WILLNEED objects.

Parameters

struct drm_i915_gem_object *obj

The GEM object.

Description

DO NOT USE. This is intended to be called on very special objects that don’t yet have mm.pages, but are guaranteed to have potentially reclaimable pages underneath.

void __i915_gem_object_make_purgeable(struct drm_i915_gem_object *obj)

Move the object to the tail of the purgeable list. Objects on this list might be swapped out. Used with DONTNEED objects.

Parameters

struct drm_i915_gem_object *obj

The GEM object.

Description

DO NOT USE. This is intended to be called on very special objects that don’t yet have mm.pages, but are guaranteed to have potentially reclaimable pages underneath.

void i915_gem_object_make_shrinkable(struct drm_i915_gem_object *obj)

Move the object to the tail of the shrinkable list. Objects on this list might be swapped out. Used with WILLNEED objects.

Parameters

struct drm_i915_gem_object *obj

The GEM object.

Description

MUST only be called on objects which have backing pages.

MUST be balanced with a previous call to i915_gem_object_make_unshrinkable().

void i915_gem_object_make_purgeable(struct drm_i915_gem_object *obj)

Move the object to the tail of the purgeable list. Used with DONTNEED objects. Unlike with shrinkable objects, the shrinker will attempt to discard the backing pages, instead of trying to swap them out.

Parameters

struct drm_i915_gem_object *obj

The GEM object.

Description

MUST only be called on objects which have backing pages.

MUST be balanced with a previous call to i915_gem_object_make_unshrinkable().
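
A minimal usage sketch of the balanced pair described above (obj assumed to have backing pages; the surrounding pinning logic is elided):

/* Hide the object while it is effectively perma-pinned... */
i915_gem_object_make_unshrinkable(obj);

/* ... and re-expose it to the shrinker once that no longer holds. */
i915_gem_object_make_shrinkable(obj);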

Batchbuffer Parsing

Motivation: Certain OpenGL features (e.g. transform feedback, performance monitoring) require userspace code to submit batches containing commands such as MI_LOAD_REGISTER_IMM to access various registers. Unfortunately, some generations of the hardware will noop these commands in “unsecure” batches (which includes all userspace batches submitted via i915) even though the commands may be safe and represent the intended programming model of the device.

The software command parser is similar in operation to the command parsing done in hardware for unsecure batches. However, the software parser allows some operations that would be noop’d by hardware, if the parser determines the operation is safe, and submits the batch as “secure” to prevent hardware parsing.

Threats: At a high level, the hardware (and software) checks attempt to prevent granting userspace undue privileges. There are three categories of privilege.

First, commands which are explicitly defined as privileged or which should only be used by the kernel driver. The parser rejects such commands.

Second, commands which access registers. To support correct/enhanced userspace functionality, particularly certain OpenGL extensions, the parser provides a whitelist of registers which userspace may safely access.

Third, commands which access privileged memory (i.e. GGTT, HWS page, etc). The parser always rejects such commands.

The majority of the problematic commands fall in the MI_* range, with only a few specific commands on each engine (e.g. PIPE_CONTROL and MI_FLUSH_DW).

Implementation: Each engine maintains tables of commands and registers which the parser uses in scanning batch buffers submitted to that engine.

Since the set of commands that the parser must check for is significantly smaller than the number of commands supported, the parser tables contain only those commands required by the parser. This generally works because command opcode ranges have standard command length encodings. So for commands that the parser does not need to check, it can easily skip them. This is implemented via a per-engine length decoding vfunc.

Unfortunately, there are a number of commands that do not follow the standard length encoding for their opcode range, primarily amongst the MI_* commands. To handle this, the parser provides a way to define explicit “skip” entries in the per-engine command tables.

Other command table entries map fairly directly to the high level categories mentioned above: rejected, register whitelist. The parser implements a number of checks, including the privileged memory checks, via a general bitmasking mechanism.
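
The bitmasking idea can be illustrated with a small hypothetical sketch (the struct and helper below are illustrative, not the driver’s actual tables):

/* A check is a (dword offset, mask, expected value) triple. */
struct cmd_bitmask_check {
	u32 offset;	/* which dword of the command to test */
	u32 mask;	/* bits of interest */
	u32 expected;	/* value those bits must have */
};

static bool cmd_passes_check(const u32 *cmd,
			     const struct cmd_bitmask_check *chk)
{
	return (cmd[chk->offset] & chk->mask) == chk->expected;
}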

int intel_engine_init_cmd_parser(struct intel_engine_cs *engine)

set cmd parser related fields for an engine

Parameters

struct intel_engine_cs *engine

the engine to initialize

Description

Optionally initializes fields related to batch buffer command parsing in the struct intel_engine_cs based on whether the platform requires software command parsing.

void intel_engine_cleanup_cmd_parser(struct intel_engine_cs *engine)

clean up cmd parser related fields

Parameters

struct intel_engine_cs *engine

the engine to clean up

Description

Releases any resources related to command parsing that may have been initialized for the specified engine.

int intel_engine_cmd_parser(struct intel_engine_cs *engine, struct i915_vma *batch, unsigned long batch_offset, unsigned long batch_length, struct i915_vma *shadow, bool trampoline)

parse a batch buffer for privilege violations

Parameters

struct intel_engine_cs *engine

the engine on which the batch is to execute

struct i915_vma *batch

the batch buffer in question

unsigned long batch_offset

byte offset in the batch at which execution starts

unsigned long batch_length

length of the commands in batch_obj

struct i915_vma *shadow

validated copy of the batch buffer in question

bool trampoline

true if we need to trampoline into privileged execution

Description

Parses the specified batch buffer looking for privilege violations as described in the overview.

Return

non-zero if the parser finds violations or otherwise fails; -EACCES if the batch appears legal but should use hardware parsing
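
A hedged caller sketch of the -EACCES convention (variable names illustrative):

err = intel_engine_cmd_parser(engine, batch, batch_offset,
			      batch_length, shadow, trampoline);
if (err == -EACCES) {
	/* Batch appears legal but needs HW parsing: submit the original,
	 * without the secure bit, and let the hardware noop as needed. */
	err = 0;
	use_shadow = false;
}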

int i915_cmd_parser_get_version(struct drm_i915_private *dev_priv)

get the cmd parser version number

Parameters

struct drm_i915_private *dev_priv

i915 device private

Description

The cmd parser maintains a simple increasing integer version number suitable for passing to userspace clients to determine what operations are permitted.

Return

the current version number of the cmd parser

User Batchbuffer Execution

struct i915_gem_engines

A set of engines

Definition:

struct i915_gem_engines {
    union {
        struct list_head link;
        struct rcu_head rcu;
    };
    struct i915_sw_fence fence;
    struct i915_gem_context *ctx;
    unsigned int num_engines;
    struct intel_context *engines[];
};

Members

{unnamed_union}

anonymous

link

Link in i915_gem_context::stale::engines

rcu

RCU to use when freeing

fence

Fence used for delayed destruction of engines

ctx

i915_gem_context backpointer

num_engines

Number of engines in this set

engines

Array of engines

struct i915_gem_engines_iter

Iterator for an i915_gem_engines set

Definition:

struct i915_gem_engines_iter {
    unsigned int idx;
    const struct i915_gem_engines *engines;
};

Members

idx

Index into i915_gem_engines::engines

engines

Engine set being iterated
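
A minimal sketch of walking an engine set with this iterator, using the for_each_gem_engine() helper from i915_gem_context.h (error handling elided):

struct i915_gem_engines_iter it;
struct intel_context *ce;

/* NULL (invalid) slots are skipped by the iterator. */
for_each_gem_engine(ce, engines, it) {
	/* operate on each intel_context in the set */
}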

enum i915_gem_engine_type

Describes the type of an i915_gem_proto_engine

Constants

I915_GEM_ENGINE_TYPE_INVALID

An invalid engine

I915_GEM_ENGINE_TYPE_PHYSICAL

A single physical engine

I915_GEM_ENGINE_TYPE_BALANCED

A load-balanced engine set

I915_GEM_ENGINE_TYPE_PARALLEL

A parallel engine set

struct i915_gem_proto_engine

prototype engine

Definition:

struct i915_gem_proto_engine {
    enum i915_gem_engine_type type;
    struct intel_engine_cs *engine;
    unsigned int num_siblings;
    unsigned int width;
    struct intel_engine_cs **siblings;
    struct intel_sseu sseu;
};

Members

type

Type of this engine

engine

Engine, for physical

num_siblings

Number of balanced or parallel siblings

width

Width of each sibling

siblings

Balanced siblings or num_siblings * width for parallel

sseu

Client-set SSEU parameters

Description

This struct describes an engine that a context may contain. Engines have four types:

  • I915_GEM_ENGINE_TYPE_INVALID: Invalid engines can be created but they show up as a NULL in i915_gem_engines::engines[i] and any attempt to use them by the user results in -EINVAL. They are also useful during proto-context construction because the client may create invalid engines and then set them up later as virtual engines.

  • I915_GEM_ENGINE_TYPE_PHYSICAL: A single physical engine, described by i915_gem_proto_engine::engine.

  • I915_GEM_ENGINE_TYPE_BALANCED: A load-balanced engine set, described by i915_gem_proto_engine::num_siblings and i915_gem_proto_engine::siblings.

  • I915_GEM_ENGINE_TYPE_PARALLEL: A parallel submission engine set, described by i915_gem_proto_engine::width, i915_gem_proto_engine::num_siblings, and i915_gem_proto_engine::siblings.

struct i915_gem_proto_context

prototype context

Definition:

struct i915_gem_proto_context {
    struct drm_i915_file_private *fpriv;
    struct i915_address_space *vm;
    unsigned long user_flags;
    struct i915_sched_attr sched;
    int num_user_engines;
    struct i915_gem_proto_engine *user_engines;
    struct intel_sseu legacy_rcs_sseu;
    bool single_timeline;
    bool uses_protected_content;
    intel_wakeref_t pxp_wakeref;
};

Members

fpriv

Client which creates the context

vm

See i915_gem_context.vm

user_flags

See i915_gem_context.user_flags

sched

See i915_gem_context.sched

num_user_engines

Number of user-specified engines or -1

user_engines

User-specified engines

legacy_rcs_sseu

Client-set SSEU parameters for the legacy RCS

single_timeline

See i915_gem_context.syncobj

uses_protected_content

See i915_gem_context.uses_protected_content

pxp_wakeref

See i915_gem_context.pxp_wakeref

Description

The struct i915_gem_proto_context represents the creation parameters for a struct i915_gem_context. This is used to gather parameters provided either through creation flags or via SET_CONTEXT_PARAM so that, when we create the final i915_gem_context, those parameters can be immutable.

The context uAPI allows for two methods of setting context parameters: SET_CONTEXT_PARAM and CONTEXT_CREATE_EXT_SETPARAM. The former is allowed to be called at any time while the latter happens as part of GEM_CONTEXT_CREATE. Everything settable via one is settable via the other. While some params are fairly simple and setting them on a live context is harmless, such as the context priority, others are far trickier, such as the VM or the set of engines. To avoid some truly nasty race conditions, we don’t allow setting the VM or the set of engines on live contexts.

The way we dealt with this without breaking older userspace that sets the VM or engine set via SET_CONTEXT_PARAM is to delay the creation of the actual context until after the client is done configuring it with SET_CONTEXT_PARAM. From the perspective of the client, it has the same u32 context ID the whole time. From the perspective of i915, however, it’s an i915_gem_proto_context right up until the point where we attempt to do something which the proto-context can’t handle, at which point the real context gets created.

This is accomplished via a little xarray dance. When GEM_CONTEXT_CREATE is called, we create a proto-context, reserve a slot in context_xa but leave it NULL, and place the proto-context in the corresponding slot in proto_context_xa. Then, whenever we go to look up a context, we first check context_xa. If it’s there, we return the i915_gem_context and we’re done. If it’s not, we look in proto_context_xa and, if we find it there, we create the actual context and kill the proto-context.
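
A hedged sketch of that lookup dance (locking elided; the finalize helper name is hypothetical):

ctx = xa_load(&file_priv->context_xa, id);
if (!ctx) {
	pc = xa_load(&file_priv->proto_context_xa, id);
	if (pc)
		/* Create the real context and retire the proto-context. */
		ctx = finalize_context(file_priv, pc, id);	/* hypothetical */
}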

At the time we made this change (April, 2021), we did a fairly complete audit of existing userspace to ensure this wouldn’t break anything:

  • Mesa/i965 didn’t use the engines or VM APIs at all

  • Mesa/ANV used the engines API but via CONTEXT_CREATE_EXT_SETPARAM and didn’t use the VM API.

  • Mesa/iris didn’t use the engines or VM APIs at all

  • The open-source compute-runtime didn’t yet use the engines API but did use the VM API via SET_CONTEXT_PARAM. However, CONTEXT_SETPARAM was always the second ioctl on that context, immediately following GEM_CONTEXT_CREATE.

  • The media driver sets engines and bonding/balancing via SET_CONTEXT_PARAM. However, CONTEXT_SETPARAM to set the VM was always the second ioctl on that context, immediately following GEM_CONTEXT_CREATE, and setting engines immediately followed that.

In order for this dance to work properly, any modification to an i915_gem_proto_context that is exposed to the client via drm_i915_file_private::proto_context_xa must be guarded by drm_i915_file_private::proto_context_lock. The exception is when a proto-context has not yet been exposed, such as when handling CONTEXT_CREATE_SET_PARAM during GEM_CONTEXT_CREATE.

struct i915_gem_context

client state

Definition:

struct i915_gem_context {
    struct drm_i915_private *i915;
    struct drm_i915_file_private *file_priv;
    struct i915_gem_engines *engines;
    struct mutex engines_mutex;
    struct drm_syncobj *syncobj;
    struct i915_address_space *vm;
    struct pid *pid;
    struct list_head link;
    struct i915_drm_client *client;
    struct list_head client_link;
    struct kref ref;
    struct work_struct release_work;
    struct rcu_head rcu;
    unsigned long user_flags;
#define UCONTEXT_NO_ERROR_CAPTURE 1
#define UCONTEXT_BANNABLE         2
#define UCONTEXT_RECOVERABLE      3
#define UCONTEXT_PERSISTENCE      4
#define UCONTEXT_LOW_LATENCY      5
    unsigned long flags;
#define CONTEXT_CLOSED       0
#define CONTEXT_USER_ENGINES 1
    bool uses_protected_content;
    intel_wakeref_t pxp_wakeref;
    struct mutex mutex;
    struct i915_sched_attr sched;
    atomic_t guilty_count;
    atomic_t active_count;
    unsigned long hang_timestamp[2];
#define CONTEXT_FAST_HANG_JIFFIES (120 * HZ)
    u8 remap_slice;
    struct radix_tree_root handles_vma;
    struct mutex lut_mutex;
    char name[TASK_COMM_LEN + 8];
    struct {
        spinlock_t lock;
        struct list_head engines;
    } stale;
};

Members

i915

i915 device backpointer

file_priv

owning file descriptor

engines

User defined engines for this context

Various uAPI offer the ability to look up an index from this array to select an engine to operate on.

Multiple logically distinct instances of the same engine may be defined in the array, as well as composite virtual engines.

Execbuf uses the I915_EXEC_RING_MASK as an index into this array to select which HW context + engine to execute on. For the default array, the user_ring_map[] is used to translate the legacy uABI onto the appropriate index (e.g. both I915_EXEC_DEFAULT and I915_EXEC_RENDER select the same context, and I915_EXEC_BSD is weird). For a user defined array, execbuf uses I915_EXEC_RING_MASK as a plain index.

User defined by I915_CONTEXT_PARAM_ENGINE (when theCONTEXT_USER_ENGINES flag is set).

engines_mutex

guards writes to engines

syncobj

Shared timeline syncobj

When the SHARED_TIMELINE flag is set on context creation, we emulate a single timeline across all engines using this syncobj. For every execbuffer2 call, this syncobj is used as both an in- and out-fence. Unlike the real intel_timeline, this doesn’t provide perfect atomic in-order guarantees if the client races with itself by calling execbuffer2 twice concurrently. However, if userspace races with itself, that’s not likely to yield well-defined results anyway, so we choose to not care.

vm

unique address space (GTT)

In full-ppgtt mode, each context has its own address space, ensuring complete separation of one client from all others.

In other modes, this is a NULL pointer with the expectation that the caller uses the shared global GTT.

pid

process id of creator

Note that whoever created the context may not be the principal user, as the context may be shared across a local socket. However, that should only affect the default context; all contexts created explicitly by the client are expected to be isolated.

link

place with drm_i915_private.context_list

client

struct i915_drm_client

client_link

for linking onto i915_drm_client.ctx_list

ref

reference count

A reference to a context is held by both the client who created it and on each request submitted to the hardware using the request (to ensure the hardware has access to the state until it has finished all pending writes). See i915_gem_context_get() and i915_gem_context_put() for access.

release_work

Work item for deferred cleanup, since i915_gem_context_put() tends to be called from hardirq context.

FIXME: The only real reason for this is i915_gem_engines.fence, all other callers are from process context and need at most some mild shuffling to pull the i915_gem_context_put() call out of a spinlock.

rcu

rcu_head for deferred freeing.

user_flags

small set of booleans controlled by the user

flags

small set of booleans

uses_protected_content

context uses PXP-encrypted objects.

This flag can only be set at ctx creation time and it’s immutable for the lifetime of the context. See I915_CONTEXT_PARAM_PROTECTED_CONTENT in uapi/drm/i915_drm.h for more info on setting restrictions and expected behaviour of marked contexts.

pxp_wakeref

wakeref to keep the device awake when PXP is in use

PXP sessions are invalidated when the device is suspended, which in turn invalidates all contexts and objects using it. To keep the flow simple, we keep the device awake when contexts using PXP objects are in use. It is expected that the userspace application only uses PXP when the display is on, so taking a wakeref here shouldn’t worsen our power metrics.

mutex

guards everything that isn’t engines or handles_vma

sched

scheduler parameters

guilty_count

How many times this context has caused a GPU hang.

active_count

How many times this context was active during a GPU hang, but did not cause it.

hang_timestamp

The last time(s) this context caused a GPU hang

remap_slice

Bitmask of cache lines that need remapping

handles_vma

rbtree to look up our context specific obj/vma for the user handle. (user handles are per fd, but the binding is per vm, which may be one per context or shared with the global GTT)

lut_mutex

Locks handles_vma

name

arbitrary name, used for user debug

A name is constructed for the context from the creator’s process name, pid and user handle in order to uniquely identify the context in messages.

stale

tracks stale engines to be destroyed

Description

The struct i915_gem_context represents the combined view of the driver and logical hardware state for a particular client.

Userspace submits commands to be executed on the GPU as an instruction stream within a GEM object we call a batchbuffer. These instructions may refer to other GEM objects containing auxiliary state such as kernels, samplers, render targets and even secondary batchbuffers. Userspace does not know where in the GPU memory these objects reside and so before the batchbuffer is passed to the GPU for execution, those addresses in the batchbuffer and auxiliary objects are updated. This is known as relocation, or patching. To try and avoid having to relocate each object on the next execution, userspace is told the location of those objects in this pass, but this remains just a hint as the kernel may choose a new location for any object in the future.

At the level of talking to the hardware, submitting a batchbuffer for the GPU to execute is to add content to a buffer from which the HW command streamer is reading.

  1. Add a command to load the HW context. For Logical Ring Contexts, i.e. Execlists, this command is not placed on the same buffer as the remaining items.

  2. Add a command to invalidate caches to the buffer.

  3. Add a batchbuffer start command to the buffer; the start command is essentially a token together with the GPU address of the batchbuffer to be executed.

  4. Add a pipeline flush to the buffer.

  5. Add a memory write command to the buffer to record when the GPU is done executing the batchbuffer. The memory write writes the global sequence number of the request, i915_request::global_seqno; the i915 driver uses the current value in the register to determine if the GPU has completed the batchbuffer.

  6. Add a user interrupt command to the buffer. This command instructs the GPU to issue an interrupt when the command, pipeline flush and memory write are completed.

  7. Inform the hardware of the additional commands added to the buffer (by updating the tail pointer).

Processing an execbuf ioctl is conceptually split up into a few phases.

  1. Validation - Ensure all the pointers, handles and flags are valid.

  2. Reservation - Assign GPU address space for every object

  3. Relocation - Update any addresses to point to the final locations

  4. Serialisation - Order the request with respect to its dependencies

  5. Construction - Construct a request to execute the batchbuffer

  6. Submission (at some point in the future execution)

Reserving resources for the execbuf is the most complicated phase. We neither want to have to migrate the object in the address space, nor do we want to have to update any relocations pointing to this object. Ideally, we want to leave the object where it is and for all the existing relocations to match. If the object is given a new address, or if userspace thinks the object is elsewhere, we have to parse all the relocation entries and update the addresses. Userspace can set the I915_EXEC_NO_RELOC flag to hint that all the target addresses in all of its objects match the value in the relocation entries and that they all match the presumed offsets given by the list of execbuffer objects. Using this knowledge, we know that if we haven’t moved any buffers, all the relocation entries are valid and we can skip the update. (If userspace is wrong, the likely outcome is an impromptu GPU hang.) The requirements for using I915_EXEC_NO_RELOC are (see the sketch after this list):

  1. The addresses written in the objects must match the corresponding reloc.presumed_offset, which in turn must match the corresponding execobject.offset.

  2. Any render targets written to in the batch must be flagged with EXEC_OBJECT_WRITE.

  3. To avoid stalling, execobject.offset should match the current address of that object within the active context.
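
A hedged sketch of the resulting fast path (field names illustrative of the execbuf code, not exact):

/* With NO_RELOC, a still-valid presumed offset means no processing. */
if ((args->flags & I915_EXEC_NO_RELOC) &&
    entry->offset == vma->node.start)
	return 0;	/* skip relocation for this object */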

The reservation is done in multiple phases. First we try to keep any object already bound in its current location - so long as it meets the constraints imposed by the new execbuffer. Any object left unbound after the first pass is then fitted into any available idle space. If an object does not fit, all objects are removed from the reservation and the process rerun after sorting the objects into a priority order (more difficult to fit objects are tried first). Failing that, the entire VM is cleared and we try to fit the execbuf one last time before concluding that it simply will not fit.

A small complication to all of this is that we allow userspace not only to specify an alignment and a size for the object in the address space, but we also allow userspace to specify the exact offset. These objects are simpler to place (the location is known a priori); all we have to do is make sure the space is available.

Once all the objects are in place, patching up the buried pointers to point to the final locations is a fairly simple job of walking over the relocation entry arrays, looking up the right address and rewriting the value into the object. Simple! ... The relocation entries are stored in user memory and so to access them we have to copy them into a local buffer. That copy has to avoid taking any pagefaults as they may lead back to a GEM object requiring the vm->mutex (i.e. recursive deadlock). So once again we split the relocation into multiple passes. First we try to do everything within an atomic context (avoid the pagefaults) which requires that we never wait. If we detect that we may wait, or if we need to fault, then we have to fall back to a slower path. The slowpath has to drop the mutex. (Can you hear alarm bells yet?) Dropping the mutex means that we lose all the state we have built up so far for the execbuf and we must reset any global data. However, we do leave the objects pinned in their final locations - which is a potential issue for concurrent execbufs. Once we have left the mutex, we can allocate and copy all the relocation entries into a large array at our leisure, reacquire the mutex, reclaim all the objects and other state and then proceed to update any incorrect addresses with the objects.

As we process the relocation entries, we maintain a record of whether the object is being written to. Using NORELOC, we expect userspace to provide this information instead. We also check whether we can skip the relocation by comparing the expected value inside the relocation entry with the target’s final address. If they differ, we have to map the current object and rewrite the 4 or 8 byte pointer within.

Serialising an execbuf is quite simple according to the rules of the GEM ABI. Execution within each context is ordered by the order of submission. Writes to any GEM object are in order of submission and are exclusive. Reads from a GEM object are unordered with respect to other reads, but ordered by writes. A write submitted after a read cannot occur before the read, and similarly any read submitted after a write cannot occur before the write. Writes are ordered between engines such that only one write occurs at any time (completing any reads beforehand) - using semaphores where available and CPU serialisation otherwise. Other GEM access obeys the same rules: any write (either via mmaps using set-domain, or via pwrite) must flush all GPU reads before starting, and any read (either using set-domain or pread) must flush all GPU writes before starting. (Note we only employ a barrier before; we currently rely on userspace not concurrently starting a new execution whilst reading or writing to an object. This may be an advantage or not depending on how much you trust userspace not to shoot themselves in the foot.) Serialisation may just result in the request being inserted into a DAG awaiting its turn, but the simplest approach is to wait on the CPU until all dependencies are resolved.

After all of that, it is just a matter of closing the request and handing it to the hardware (well, leaving it in a queue to be executed). However, we also offer the ability for batchbuffers to be run with elevated privileges so that they access otherwise hidden registers. (Used to adjust L3 cache etc.) Before any batch is given extra privileges we first must check that it contains no nefarious instructions: we check that each instruction is from our whitelist and all registers are also from an allowed list. We first copy the user’s batchbuffer to a shadow (so that the user doesn’t have access to it, either by the CPU or GPU, as we scan it) and then parse each instruction. If everything is ok, we set a flag telling the hardware to run the batchbuffer in trusted mode, otherwise the ioctl is rejected.

Scheduling

struct i915_sched_engine

scheduler engine

Definition:

struct i915_sched_engine {
    struct kref ref;
    spinlock_t lock;
    struct list_head requests;
    struct list_head hold;
    struct tasklet_struct tasklet;
    struct i915_priolist default_priolist;
    int queue_priority_hint;
    struct rb_root_cached queue;
    bool no_priolist;
    void *private_data;
    void (*destroy)(struct kref *kref);
    bool (*disabled)(struct i915_sched_engine *sched_engine);
    void (*kick_backend)(const struct i915_request *rq, int prio);
    void (*bump_inflight_request_prio)(struct i915_request *rq, int prio);
    void (*retire_inflight_request_prio)(struct i915_request *rq);
    void (*schedule)(struct i915_request *request, const struct i915_sched_attr *attr);
};

Members

ref

reference count of schedule engine object

lock

protects requests in priority lists, requests, hold and tasklet while running

requests

list of requests inflight on this schedule engine

hold

list of ready requests, but on hold

tasklet

softirq tasklet for submission

default_priolist

priority list for I915_PRIORITY_NORMAL

queue_priority_hint

Highest pending priority.

When we add requests into the queue, or adjust the priority of executing requests, we compute the maximum priority of those pending requests. We can then use this value to determine if we need to preempt the executing requests to service the queue. However, since we may have recorded the priority of an inflight request we wanted to preempt but which has since completed, at the time of dequeuing the priority hint may no longer match the highest available request priority.

queue

queue of requests, in priority lists

no_priolist

priority lists disabled

private_data

private data of the submission backend

destroy

destroy schedule engine / cleanup in backend

disabled

check if backend has disabled submission

kick_backend

kick backend after a request’s priority has changed

bump_inflight_request_prio

update priority of an inflight request

retire_inflight_request_prio

indicate request is retired to priority tracking

schedule

adjust priority of request

Call when the priority on a request has changed and it and its dependencies may need rescheduling. Note the request itself may not be ready to run!

Description

A schedule engine represents a submission queue with different priority bands. It contains all the common state (relative to the backend) to queue, track, and submit a request.

This object at the moment is quite i915 specific but will transition into a container for the drm_gpu_scheduler plus a few other variables once the i915 is integrated with the DRM scheduler.

Logical Rings, Logical Ring Contexts and Execlists

Motivation: GEN8 brings an expansion of the HW contexts: “Logical Ring Contexts”. These expanded contexts enable a number of new abilities, especially “Execlists” (also implemented in this file).

One of the main differences with the legacy HW contexts is that logical ring contexts incorporate many more things to the context’s state, like PDPs or ringbuffer control registers:

The reason why PDPs are included in the context is straightforward: as PPGTTs (per-process GTTs) are actually per-context, having the PDPs contained there means you don’t need to do a ppgtt->switch_mm yourself; instead, the GPU will do it for you on the context switch.

But, what about the ringbuffer control registers (head, tail, etc..)? Shouldn’t we just need a set of those per engine command streamer? This is where the name “Logical Rings” starts to make sense: by virtualizing the rings, the engine cs shifts to a new “ring buffer” with every context switch. When you want to submit a workload to the GPU you: A) choose your context, B) find its appropriate virtualized ring, C) write commands to it and then, finally, D) tell the GPU to switch to that context.

Instead of the legacy MI_SET_CONTEXT, the way you tell the GPU to switch to a context is via a context execution list, ergo “Execlists”.

LRC implementation: Regarding the creation of contexts, we have:

  • One global default context.

  • One local default context for each opened fd.

  • One local extra context for each context create ioctl call.

Now that ringbuffers belong per-context (and not per-engine, like before) and that contexts are uniquely tied to a given engine (and not reusable, like before) we need:

  • One ringbuffer per-engine inside each context.

  • One backing object per-engine inside each context.

The global default context starts its life with these new objects fully allocated and populated. The local default context for each opened fd is more complex, because we don’t know at creation time which engine is going to use them. To handle this, we have implemented a deferred creation of LR contexts:

The local context starts its life as a hollow or blank holder, that only gets populated for a given engine once we receive an execbuffer. If later on we receive another execbuffer ioctl for the same context but a different engine, we allocate/populate a new ringbuffer and context backing object and so on.

Finally, regarding local contexts created using the ioctl call: as they are only allowed with the render ring, we can allocate & populate them right away (no need to defer anything, at least for now).

Execlists implementation: Execlists are the new method by which, on gen8+ hardware, workloads are submitted for execution (as opposed to the legacy, ringbuffer-based, method). This method works as follows:

When a request is committed, its commands (the BB start and any leading or trailing commands, like the seqno breadcrumbs) are placed in the ringbuffer for the appropriate context. The tail pointer in the hardware context is not updated at this time, but instead, kept by the driver in the ringbuffer structure. A structure representing this request is added to a request queue for the appropriate engine: this structure contains a copy of the context’s tail after the request was written to the ring buffer and a pointer to the context itself.

If the engine’s request queue was empty before the request was added, the queue is processed immediately. Otherwise the queue will be processed during a context switch interrupt. In any case, elements on the queue will get sent (in pairs) to the GPU’s ExecLists Submit Port (ELSP, for short) with a globally unique 20-bit submission ID.

When execution of a request completes, the GPU updates the context status buffer with a context complete event and generates a context switch interrupt. During the interrupt handling, the driver examines the events in the buffer: for each context complete event, if the announced ID matches that on the head of the request queue, then that request is retired and removed from the queue.

After processing, if any requests were retired and the queue is not empty then a new execution list can be submitted. The two requests at the front of the queue are next to be submitted but since a context may not occur twice in an execution list, if subsequent requests have the same ID as the first then the two requests must be combined. This is done simply by discarding requests at the head of the queue until either only one request is left (in which case we use a NULL second context) or the first two requests have unique IDs.

By always executing the first two requests in the queue the driver ensures that the GPU is kept as busy as possible. In the case where a single context completes but a second context is still executing, the request for this second context will be at the head of the queue when we remove the first one. This request will then be resubmitted along with a new request for a different context, which will cause the hardware to continue executing the second request and queue the new request (the GPU detects the condition of a context getting preempted with the same context and optimizes the context switch flow by not doing preemption, but just sampling the new tail pointer).
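
An illustrative sketch of the pairing rule above (list, field and helper names hypothetical, not the driver’s actual submission code):

struct i915_request *rq, *first, *second = NULL;

first = list_first_entry(&queue, struct i915_request, link);
rq = first;
list_for_each_entry_continue(rq, &queue, link) {
	if (rq->context == first->context) {
		first = rq;	/* same context: coalesce, keep later tail */
	} else {
		second = rq;	/* distinct context: submit as the pair */
		break;
	}
}
elsp_submit(first, second);	/* hypothetical; second may be NULL */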

Global GTT views

Background and previous state

Historically objects could exist (be bound) in global GTT space only as singular instances, with a view representing all of the object’s backing pages in a linear fashion. This view will be called a normal view.

To support multiple views of the same object, where the number of mapped pages is not equal to the backing store, or where the layout of the pages is not linear, the concept of a GGTT view was added.

One example of an alternative view is a stereo display driven by a single image. In this case we would have a framebuffer looking like this (2x2 pages):

12
34

Above would represent a normal GGTT view as normally mapped for GPU or CPU rendering. In contrast, fed to the display engine would be an alternative view which could look something like this:

1212
3434

In this example both the size and layout of pages in the alternative view is different from the normal view.

Implementation and usage

GGTT views are implemented using VMAs and are distinguished via enum i915_gtt_view_type and struct i915_gtt_view.

A new flavour of core GEM functions which work with GGTT bound objects were added with the _ggtt_ infix, and sometimes with _view postfix, to avoid renaming in large amounts of code. They take the struct i915_gtt_view parameter encapsulating all metadata required to implement a view.

As a helper for callers which are only interested in the normal view, a globally const i915_gtt_view_normal singleton instance exists. All old core GEM API functions, the ones not taking the view parameter, operate on, or with, the normal GGTT view.

Code wanting to add or use a new GGTT view needs to:

  1. Add a new enum with a suitable name.

  2. Extend the metadata in the i915_gtt_view structure if required.

  3. Add support to i915_get_vma_pages().

New views are required to build a scatter-gather table from within the i915_get_vma_pages function. This table is stored in the vma.gtt_view and exists for the lifetime of a VMA.

Core API is designed to have copy semantics, which means that a passed in struct i915_gtt_view does not need to be persistent (left around after calling the core API functions).
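
A minimal sketch relying on those copy semantics (view parameters illustrative; error handling elided):

/* A short-lived, on-stack view descriptor is fine: the core copies it. */
struct i915_gtt_view view = {
	.type = I915_GTT_VIEW_PARTIAL,
	.partial = {
		.offset = 0,				/* first page */
		.size = obj->base.size >> PAGE_SHIFT,	/* in pages */
	},
};
struct i915_vma *vma;

vma = i915_gem_object_ggtt_pin(obj, &view, 0, 0, PIN_MAPPABLE);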

int i915_gem_gtt_reserve(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww, struct drm_mm_node *node, u64 size, u64 offset, unsigned long color, unsigned int flags)

reserve a node in an address_space (GTT)

Parameters

struct i915_address_space *vm

the struct i915_address_space

struct i915_gem_ww_ctx *ww

An optional struct i915_gem_ww_ctx.

struct drm_mm_node *node

the struct drm_mm_node (typically i915_vma.node)

u64 size

how much space to allocate inside the GTT, must be #I915_GTT_PAGE_SIZE aligned

u64 offset

where to insert inside the GTT, must be #I915_GTT_MIN_ALIGNMENT aligned, and the node (offset + size) must fit within the address space

unsigned long color

color to apply to node, if this node is not from a VMA, color must be #I915_COLOR_UNEVICTABLE

unsigned int flags

control search and eviction behaviour

Description

i915_gem_gtt_reserve() tries to insert the node at the exact offset inside the address space (using size and color). If the node does not fit, it tries to evict any overlapping nodes from the GTT, including any neighbouring nodes if the colors do not match (to ensure guard pages between differing domains). See i915_gem_evict_for_node() for the gory details on the eviction algorithm. #PIN_NONBLOCK may be used to prevent waiting on evicting active overlapping objects, and any overlapping node that is pinned or marked as unevictable will also result in failure.

Return

0 on success, -ENOSPC if no suitable hole is found, -EINTR if asked to wait for eviction and interrupted.

int i915_gem_gtt_insert(struct i915_address_space *vm, struct i915_gem_ww_ctx *ww, struct drm_mm_node *node, u64 size, u64 alignment, unsigned long color, u64 start, u64 end, unsigned int flags)

insert a node into an address_space (GTT)

Parameters

struct i915_address_space *vm

the struct i915_address_space

struct i915_gem_ww_ctx *ww

An optional struct i915_gem_ww_ctx.

struct drm_mm_node *node

the struct drm_mm_node (typically i915_vma.node)

u64 size

how much space to allocate inside the GTT, must be #I915_GTT_PAGE_SIZE aligned

u64 alignment

required alignment of starting offset, may be 0 but if specified, this must be a power-of-two and at least #I915_GTT_MIN_ALIGNMENT

unsigned long color

color to apply to node

u64 start

start of any range restriction inside GTT (0 for all), must be #I915_GTT_PAGE_SIZE aligned

u64 end

end of any range restriction inside GTT (U64_MAX for all), must be #I915_GTT_PAGE_SIZE aligned if not U64_MAX

unsigned int flags

control search and eviction behaviour

Description

i915_gem_gtt_insert() first searches for an available hole into which it can insert the node. The hole address is aligned to alignment and its size must then fit entirely within the [start, end] bounds. The nodes on either side of the hole must match color, or else a guard page will be inserted between the two nodes (or the node evicted). If no suitable hole is found, first a victim is randomly selected and tested for eviction; failing that, the LRU list of objects within the GTT is scanned to find the first set of replacement nodes to create the hole. Those old overlapping nodes are evicted from the GTT (and so must be rebound before any future use). Any node that is currently pinned cannot be evicted (see i915_vma_pin()). Similarly, if the node’s VMA is currently active and #PIN_NONBLOCK is specified, that node is also skipped when searching for an eviction candidate. See i915_gem_evict_something() for the gory details on the eviction algorithm.

Return

0 on success, -ENOSPC if no suitable hole is found, -EINTR if asked to wait for eviction and interrupted.
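
A hedged sketch combining the two entry points, trying exact placement first and searching as a fallback (arguments abbreviated):

/* Try the exact offset first, then fall back to a full search. */
err = i915_gem_gtt_reserve(vm, ww, node, size, offset, color, flags);
if (err == -ENOSPC)
	err = i915_gem_gtt_insert(vm, ww, node, size, 0 /* alignment */,
				  color, 0, U64_MAX, flags);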

GTT Fences and Swizzling

void i915_vma_revoke_fence(struct i915_vma *vma)

force-remove fence for a VMA

Parameters

struct i915_vma *vma

vma to map linearly (not through a fence reg)

Description

This function force-removes any fence from the given object, which is useful if the kernel wants to do untiled GTT access.

int i915_vma_pin_fence(struct i915_vma *vma)

set up fencing for a vma

Parameters

struct i915_vma *vma

vma to map through a fence reg

Description

When mapping objects through the GTT, userspace wants to be able to write to them without having to worry about swizzling if the object is tiled. This function walks the fence regs looking for a free one for obj, stealing one if it can’t find any.

It then sets up the reg based on the object’s properties: address, pitch and tiling format.

For an untiled surface, this removes any existing fence.

Return

0 on success, negative error code on failure.
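
A minimal usage sketch of the pin/unpin pairing (error handling elided; note vma->fence may be NULL for untiled surfaces):

err = i915_vma_pin_fence(vma);
if (err == 0) {
	if (vma->fence) {
		/* fenced (detiled) GTT access goes here */
	}
	i915_vma_unpin_fence(vma);
}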

struct i915_fence_reg *i915_reserve_fence(struct i915_ggtt *ggtt)

Reserve a fence for vGPU

Parameters

struct i915_ggtt *ggtt

Global GTT

Description

This function walks the fence regs looking for a free one and removes it from the fence_list. It is used to reserve a fence for vGPU to use.

void i915_unreserve_fence(struct i915_fence_reg *fence)

Reclaim a reserved fence

Parameters

struct i915_fence_reg *fence

the fence reg

Description

This function adds a reserved fence register from vGPU back to the fence_list.

void intel_ggtt_restore_fences(struct i915_ggtt *ggtt)

restore fence state

Parameters

struct i915_ggtt *ggtt

Global GTT

Description

Restore the hw fence state to match the software tracking again, to be called after a gpu reset and on resume. Note that on runtime suspend we only cancel the fences, to be reacquired by the user later.

void detect_bit_6_swizzle(struct i915_ggtt *ggtt)

detect bit 6 swizzling pattern

Parameters

struct i915_ggtt *ggtt

Global GGTT

Description

Detects bit 6 swizzling of address lookup between IGD access and CPU access through main memory.

void i915_gem_object_do_bit_17_swizzle(struct drm_i915_gem_object *obj, struct sg_table *pages)

fixup bit 17 swizzling

Parameters

struct drm_i915_gem_object *obj

i915 GEM buffer object

struct sg_table *pages

the scattergather list of physical pages

Description

This function fixes up the swizzling in case any page frame number for this object has changed in bit 17 since that state has been saved with i915_gem_object_save_bit_17_swizzle().

This is called when pinning backing storage again, since the kernel is free to move unpinned backing storage around (either by directly moving pages or by swapping them out and back in again).

void i915_gem_object_save_bit_17_swizzle(struct drm_i915_gem_object *obj, struct sg_table *pages)

save bit 17 swizzling

Parameters

struct drm_i915_gem_object *obj

i915 GEM buffer object

struct sg_table *pages

the scattergather list of physical pages

Description

This function saves the bit 17 of each page frame number so that swizzling can be fixed up later on with i915_gem_object_do_bit_17_swizzle(). This must be called before the backing storage can be unpinned.

Global GTT Fence Handling

Important to avoid confusions: “fences” in the i915 driver are not execution fences used to track command completion but hardware detiler objects which wrap a given range of the global GTT. Each platform has only a fairly limited set of these objects.

Fences are used to detile GTT memory mappings. They’re also connected to the hardware frontbuffer render tracking and hence interact with frontbuffer compression. Furthermore on older platforms fences are required for tiled objects used by the display engine. They can also be used by the render engine - they’re required for blitter commands and are optional for render commands. But on gen4+ both display (with the exception of fbc) and rendering have their own tiling state bits and don’t need fences.

Also note that fences only support X and Y tiling and hence can’t be used for the fancier new tiling formats like W, Ys and Yf.

Finally note that because fences are such a restricted resource they’re dynamically associated with objects. Furthermore fence state is committed to the hardware lazily to avoid unnecessary stalls on gen2/3. Therefore code must explicitly call i915_gem_object_get_fence() to synchronize fencing status for cpu access. Also note that some code wants an unfenced view, for those cases the fence can be removed forcefully with i915_gem_object_put_fence().

Internally these functions will synchronize with userspace access by removing CPU ptes into GTT mmaps (not the GTT ptes themselves) as needed.

Hardware Tiling and Swizzling Details

The idea behind tiling is to increase cache hit rates by rearranging pixel data so that a group of pixel accesses are in the same cacheline. Performance improvement from doing this on the back/depth buffer are on the order of 30%.

Intel architectures make this somewhat more complicated, though, by adjustments made to addressing of data when the memory is in interleaved mode (matched pairs of DIMMS) to improve memory bandwidth. For interleaved memory, the CPU sends every sequential 64 bytes to an alternate memory channel so it can get the bandwidth from both.

The GPU also rearranges its accesses for increased bandwidth to interleaved memory, and it matches what the CPU does for non-tiled. However, when tiled it does it a little differently, since one walks addresses not just in the X direction but also Y. So, along with alternating channels when bit 6 of the address flips, it also alternates when other bits flip -- Bits 9 (every 512 bytes, an X tile scanline) and 10 (every two X tile scanlines) are common to both the 915 and 965-class hardware.

The CPU also sometimes XORs in higher bits as well, to improve bandwidth doing strided access like we do so frequently in graphics. This is called “Channel XOR Randomization” in the MCH documentation. The result is that the CPU is XORing in either bit 11 or bit 17 to bit 6 of its address decode.

All of this bit 6 XORing has an effect on our memory management, as we need to make sure that the 3d driver can correctly address object contents.

If we don’t have interleaved memory, all tiling is safe and no swizzling is required.

When bit 17 is XORed in, we simply refuse to tile at all. Bit 17 is not just a page offset, so as we page an object out and back in, individual pages in it will have different bit 17 addresses, resulting in each 64 bytes being swapped with its neighbor!

Otherwise, if interleaved, we have to tell the 3d driver what the address swizzling it needs to do is, since it’s writing with the CPU to the pages (bit 6 and potentially bit 11 XORed in), and the GPU is reading from the pages (bit 6, 9, and 10 XORed in), resulting in a cumulative bit swizzling required by the CPU of XORing in bit 6, 9, 10, and potentially 11, in order to match what the GPU expects.
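
For illustration, a tiny helper computing the 9/10/11 variant of the bit-6 swizzle described above (a sketch, not the driver’s code):

/* XOR bits 9, 10 and 11 of the address into bit 6. */
static unsigned long swizzle_addr_9_10_11(unsigned long addr)
{
	unsigned long bit = ((addr >> 9) ^ (addr >> 10) ^ (addr >> 11)) & 1;

	return addr ^ (bit << 6);
}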

Object Tiling IOCTLs

u32 i915_gem_fence_size(struct drm_i915_private *i915, u32 size, unsigned int tiling, unsigned int stride)

required global GTT size for a fence

Parameters

struct drm_i915_private *i915

i915 device

u32 size

object size

unsigned int tiling

tiling mode

unsigned int stride

tiling stride

Description

Return the required global GTT size for a fence (view of a tiled object), taking into account potential fence register mapping.

u32 i915_gem_fence_alignment(struct drm_i915_private *i915, u32 size, unsigned int tiling, unsigned int stride)

required global GTT alignment for a fence

Parameters

struct drm_i915_private *i915

i915 device

u32 size

object size

unsigned int tiling

tiling mode

unsigned int stride

tiling stride

Description

Return the required global GTT alignment for a fence (a view of a tiled object), taking into account potential fence register mapping.

int i915_gem_set_tiling_ioctl(struct drm_device *dev, void *data, struct drm_file *file)

IOCTL handler to set tiling mode

Parameters

struct drm_device *dev

DRM device

void *data

data pointer for the ioctl

struct drm_file *file

DRM file for the ioctl call

Description

Sets the tiling mode of an object, returning the required swizzling of bit 6 of addresses in the object.

Called by the user via ioctl.

Return

Zero on success, negative errno on failure.

int i915_gem_get_tiling_ioctl(struct drm_device *dev, void *data, struct drm_file *file)

IOCTL handler to get tiling mode

Parameters

struct drm_device *dev

DRM device

void *data

data pointer for the ioctl

struct drm_file *file

DRM file for the ioctl call

Description

Returns the current tiling mode and required bit 6 swizzling for the object.

Called by the user via ioctl.

Return

Zero on success, negative errno on failure.

i915_gem_set_tiling_ioctl() and i915_gem_get_tiling_ioctl() are the userspace interface to declare fence register requirements.

In principle GEM doesn’t care at all about the internal data layout of an object, and hence it also doesn’t care about tiling or swizzling. There are two exceptions:

  • For X and Y tiling the hardware provides detilers for CPU access, so called fences. Since there’s only a limited amount of them the kernel must manage these, and therefore userspace must tell the kernel the object tiling if it wants to use fences for detiling.

  • Gen3 and gen4 platforms have a swizzling pattern for tiled objects which depends upon the physical page frame number. When swapping such objects the page frame number might change and the kernel must be able to fix this up, and hence it must know the tiling. Note that on a subset of platforms with asymmetric memory channel population the swizzling pattern changes in an unknown way, and for those the kernel simply forbids swapping completely.

Since neither of these applies for new tiling layouts on modern platforms, like W, Ys and Yf tiling, GEM only allows object tiling to be set to X or Y tiled. Anything else can be handled in userspace entirely without the kernel’s involvement.
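
A userspace-side sketch of declaring X tiling via the ioctl (uapi struct and ioctl names are the real ones from drm/i915_drm.h; error handling elided):

struct drm_i915_gem_set_tiling arg = {
	.handle = handle,		/* GEM handle of the BO */
	.tiling_mode = I915_TILING_X,
	.stride = stride,		/* bytes per tile row */
};

drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_TILING, &arg);
/* On return, arg.swizzle_mode reports the required bit-6 swizzle. */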

Protected Objects

PXP (Protected Xe Path) is a feature available in Gen12 and newer platforms. It allows execution and flip to display of protected (i.e. encrypted) objects. The SW support is enabled via the CONFIG_DRM_I915_PXP kconfig.

Objects can opt-in to PXP encryption at creation time via the I915_GEM_CREATE_EXT_PROTECTED_CONTENT create_ext flag. For objects to be correctly protected they must be used in conjunction with a context created with the I915_CONTEXT_PARAM_PROTECTED_CONTENT flag. See the documentation of those two uapi flags for details and restrictions.
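
A userspace-side sketch of the object half of that opt-in (uapi names are the real ones from drm/i915_drm.h; the context must separately be created with I915_CONTEXT_PARAM_PROTECTED_CONTENT; error handling elided):

struct drm_i915_gem_create_ext_protected_content prot = {
	.base = { .name = I915_GEM_CREATE_EXT_PROTECTED_CONTENT },
};
struct drm_i915_gem_create_ext create = {
	.size = size,
	.extensions = (uintptr_t)&prot,	/* chain the PXP extension */
};

drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE_EXT, &create);
/* create.handle now names a PXP-encrypted BO. */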

Protected objects are tied to a pxp session; currently we only support one session, which i915 manages and whose index is available in the uapi (I915_PROTECTED_CONTENT_DEFAULT_SESSION) for use in instructions targeting protected objects.

The session is invalidated by the HW when certain events occur (e.g. suspend/resume). When this happens, all the objects that were used with the session are marked as invalid and all contexts marked as using protected content are banned. Any further attempt at using them in an execbuf call is rejected, while flips are converted to black frames.

Some of the PXP setup operations are performed by the Management Engine, which is handled by the mei driver; communication between i915 and mei is performed via the mei_pxp component module.

struct intel_pxp

pxp state

Definition:

struct intel_pxp {
    struct intel_gt *ctrl_gt;
    bool platform_cfg_is_bad;
    u32 kcr_base;
    struct gsccs_session_resources {
        u64 host_session_handle;
        struct intel_context *ce;
        struct i915_vma *pkt_vma;
        void *pkt_vaddr;
        struct i915_vma *bb_vma;
        void *bb_vaddr;
    } gsccs_res;
    struct i915_pxp_component *pxp_component;
    struct device_link *dev_link;
    bool pxp_component_added;
    struct intel_context *ce;
    struct mutex arb_mutex;
    bool arb_is_valid;
    u32 key_instance;
    struct mutex tee_mutex;
    struct {
        struct drm_i915_gem_object *obj;
        void *vaddr;
    } stream_cmd;
    bool hw_state_invalidated;
    bool irq_enabled;
    struct completion termination;
    struct work_struct session_work;
    u32 session_events;
#define PXP_TERMINATION_REQUEST  BIT(0)
#define PXP_TERMINATION_COMPLETE BIT(1)
#define PXP_INVAL_REQUIRED       BIT(2)
#define PXP_EVENT_TYPE_IRQ       BIT(3)
};

Members

ctrl_gt

pointer to the tile that owns the controls for the PXP subsystem assets used by the VDBOX, the KCR engine (and the GSC CS, depending on the platform)

platform_cfg_is_bad

used to track if any prior arb session creation resulted in a failure that was caused by a platform configuration issue, meaning that the failure will not get resolved without a change to the platform (not kernel), such as a BIOS configuration or firmware update. This bool gets reflected when GET_PARAM:I915_PARAM_PXP_STATUS is called.

kcr_base

base mmio offset for the KCR engine, which is different on legacy platforms vs newer platforms where the KCR is inside the media-tile.

gsccs_res

resources for request submission for platforms that have a GSC engine.

pxp_component

i915_pxp_component struct of the bound mei_pxp module. Only set and cleared inside component bind/unbind functions, which are protected by tee_mutex.

dev_link

Enforce module relationship for power management ordering.

pxp_component_added

track if the pxp component has been added. Set and cleared in tee init and fini functions respectively.

ce

kernel-owned context used for PXP operations

arb_mutex

protects arb session start

arb_is_valid

tracks arb session status. After a teardown, the arb session can still be in play on the HW even if the keys are gone, so we can’t rely on the HW state of the session to know if it’s valid and need to track the status in SW.

key_instance

tracks which key instance we’re on, so we can use it to determine if an object was created using the current key or a previous one.

tee_mutex

protects the tee channel binding and messaging.

stream_cmd

LMEM obj used to send stream PXP commands to the GSC

hw_state_invalidated

if the HW perceives an attack on the integrity of the encryption it will invalidate the keys and expect SW to re-initialize the session. We keep track of this state to make sure we only re-start the arb session when required.

irq_enabled

tracks the status of the kcr irqs

termination

tracks the status of a pending termination. Only re-initialized under gt->irq_lock and completed in session_work.

session_work

worker that manages session events.

session_events

pending session events, protected with gt->irq_lock.

Microcontrollers

Starting from gen9, three microcontrollers are available on the HW: the graphics microcontroller (GuC), the HEVC/H.265 microcontroller (HuC) and the display microcontroller (DMC). The driver is responsible for loading the firmware on the microcontrollers; the GuC and HuC firmwares are transferred to WOPCM using the DMA engine, while the DMC firmware is written through MMIO.

WOPCM

WOPCM Layout

The layout of the WOPCM will be fixed after writing to the GuC WOPCM size and offset registers, whose values are calculated and determined by the HuC/GuC firmware size and a set of hardware requirements/restrictions as shown below:

   +=========> +====================+ <== WOPCM Top
   ^           |  HW contexts RSVD  |
   |     +===> +====================+ <== GuC WOPCM Top
   |     ^     |                    |
   |     |     |                    |
   |     |     |                    |
   |    GuC    |                    |
   |   WOPCM   |                    |
   |    Size   +--------------------+
 WOPCM   |     |    GuC FW RSVD     |
   |     |     +--------------------+
   |     |     |   GuC Stack RSVD   |
   |     |     +------------------- +
   |     v     |   GuC WOPCM RSVD   |
   |     +===> +====================+ <== GuC WOPCM base
   |           |     WOPCM RSVD     |
   |           +------------------- + <== HuC Firmware Top
   v           |      HuC FW        |
   +=========> +====================+ <== WOPCM Base

GuC accessible WOPCM starts at GuC WOPCM base and ends at GuC WOPCM top. The top part of the WOPCM is reserved for hardware contexts (e.g. RC6 context).

GuC

The GuC is a microcontroller inside the GT HW, introduced in gen9. The GuC is designed to offload some of the functionality usually performed by the host driver; currently the main operations it can take care of are:

  • Authentication of the HuC, which is required to fully enable HuC usage.

  • Low latency graphics context scheduling (a.k.a. GuC submission).

  • GT Power management.

The enable_guc module parameter can be used to select which of those operations to enable within GuC. Note that not all the operations are supported on all gen9+ platforms.

Enabling the GuC is not mandatory and therefore the firmware is only loaded if at least one of the operations is selected. However, not loading the GuC might result in the loss of some features that do require the GuC (currently just the HuC, but more are expected to land in the future).

structintel_guc

Top level structure of GuC.

Definition:

struct intel_guc {
    struct intel_uc_fw fw;
    struct intel_guc_log log;
    struct intel_guc_ct ct;
    struct intel_guc_slpc slpc;
    struct intel_guc_state_capture *capture;
    struct dentry *dbgfs_node;
    struct i915_sched_engine *sched_engine;
    struct i915_request *stalled_request;
    enum {
        STALL_NONE,
        STALL_REGISTER_CONTEXT,
        STALL_MOVE_LRC_TAIL,
        STALL_ADD_REQUEST,
    } submission_stall_reason;
    spinlock_t irq_lock;
    unsigned int msg_enabled_mask;
    atomic_t outstanding_submission_g2h;
    struct xarray tlb_lookup;
    u32 serial_slot;
    u32 next_seqno;
    struct {
        bool enabled;
        void (*reset)(struct intel_guc *guc);
        void (*enable)(struct intel_guc *guc);
        void (*disable)(struct intel_guc *guc);
    } interrupts;
    struct {
        spinlock_t lock;
        struct ida guc_ids;
        int num_guc_ids;
        unsigned long *guc_ids_bitmap;
        struct list_head guc_id_list;
        unsigned int guc_ids_in_use;
        struct list_head destroyed_contexts;
        struct work_struct destroyed_worker;
        struct work_struct reset_fail_worker;
        intel_engine_mask_t reset_fail_mask;
        unsigned int sched_disable_delay_ms;
        unsigned int sched_disable_gucid_threshold;
    } submission_state;
    bool submission_supported;
    bool submission_selected;
    bool submission_initialized;
    struct intel_uc_fw_ver submission_version;
    bool rc_supported;
    bool rc_selected;
    struct i915_vma *ads_vma;
    struct iosys_map ads_map;
    u32 ads_regset_size;
    u32 ads_regset_count[I915_NUM_ENGINES];
    struct guc_mmio_reg *ads_regset;
    u32 ads_golden_ctxt_size;
    u32 ads_waklv_size;
    u32 ads_capture_size;
    struct i915_vma *lrc_desc_pool_v69;
    void *lrc_desc_pool_vaddr_v69;
    struct xarray context_lookup;
    u32 params[GUC_CTL_MAX_DWORDS];
    struct {
        u32 base;
        unsigned int count;
        enum forcewake_domains fw_domains;
    } send_regs;
    i915_reg_t notify_reg;
    u32 mmio_msg;
    struct mutex send_mutex;
    struct {
        spinlock_t lock;
        u64 gt_stamp;
        unsigned long ping_delay;
        struct delayed_work work;
        u32 shift;
        unsigned long last_stat_jiffies;
    } timestamp;
    struct work_struct dead_guc_worker;
    unsigned long last_dead_guc_jiffies;
#ifdef CONFIG_DRM_I915_SELFTEST
    int number_guc_id_stolen;
    u32 fast_response_selftest;
#endif
};

Members

fw

the GuC firmware

log

sub-structure containing GuC log related data and objects

ct

the command transport communication channel

slpc

sub-structure containing SLPC related data and objects

capture

the error-state-capture module’s data and objects

dbgfs_node

debugfs node

sched_engine

Global engine used to submit requests to GuC

stalled_request

if GuC can’t process a request for any reason, we save it until GuC restarts processing. No other request can be submitted until the stalled request is processed.

submission_stall_reason

reason why submission is stalled

irq_lock

protects GuC irq state

msg_enabled_mask

mask of events that are processed when receiving an INTEL_GUC_ACTION_DEFAULT G2H message.

outstanding_submission_g2h

number of outstanding GuC to Host responses related to GuC submission, used to determine if the GT is idle

tlb_lookup

xarray to store all pending TLB invalidation requests

serial_slot

id of the initial waiter created in tlb_lookup, which is used only when we fail to allocate a new waiter.

next_seqno

the next id (sequence number) to allocate.

interrupts

pointers to GuC interrupt-managing functions.

submission_state

sub-structure for submission state protected by a single lock

submission_state.lock

protects everything in submission_state, ce->guc_id.id, and ce->guc_id.ref when transitioning in and out of zero

submission_state.guc_ids

used to allocate new guc_ids, single-lrc

submission_state.num_guc_ids

Number of guc_ids, selftest feature to be able to reduce this number while testing.

submission_state.guc_ids_bitmap

used to allocate new guc_ids, multi-lrc

submission_state.guc_id_list

list of intel_context with valid guc_ids but no refs

submission_state.guc_ids_in_use

Number of single-lrc guc_ids in use

submission_state.destroyed_contexts

list of contexts waiting to be destroyed (deregistered with the GuC)

submission_state.destroyed_worker

worker to deregister contexts, needed as we must take a GT PM reference and can’t do that from the destroy function as it might be in an atomic context (no sleeping)

submission_state.reset_fail_worker

worker to trigger a GT reset after an engine reset fails

submission_state.reset_fail_mask

mask of engines that failed to reset

submission_state.sched_disable_delay_ms

schedule disable delay, in ms, for contexts

submission_state.sched_disable_gucid_threshold

threshold of minimum remaining available guc_ids before we start bypassing the schedule disable delay

submission_supported

tracks whether we support GuC submission on the current platform

submission_selected

tracks whether the user enabled GuC submission

submission_initialized

tracks whether GuC submission has been initialised

submission_version

Submission API version of the currently loaded firmware

rc_supported

tracks whether we support GuC rc on the current platform

rc_selected

tracks whether the user enabled GuC rc

ads_vma

object allocated to hold the GuC ADS

ads_map

contents of the GuC ADS

ads_regset_size

size of the save/restore regsets in the ADS

ads_regset_count

number of save/restore registers in the ADS for each engine

ads_regset

save/restore regsets in the ADS

ads_golden_ctxt_size

size of the golden contexts in the ADS

ads_waklv_size

size of workaround KLVs

ads_capture_size

size of register lists in the ADS used for error capture

lrc_desc_pool_v69

object allocated to hold the GuC LRC descriptor pool

lrc_desc_pool_vaddr_v69

contents of the GuC LRC descriptor pool

context_lookup

used to resolve intel_context from guc_id, if a context is present in this structure it is registered with the GuC

params

Control params for fw initialization

send_regs

GuC’s FW specific registers used for sending MMIO H2G

notify_reg

register used to send interrupts to the GuC FW

mmio_msg

notification bitmask that the GuC writes in one of its registers when the CT channel is disabled, to be processed when the channel is back up.

send_mutex

used to serialize the intel_guc_send actions

timestamp

GT timestamp object that stores a copy of the timestamp and adjusts it for overflow using a worker.

timestamp.lock

Lock protecting the below fields and the engine stats.

timestamp.gt_stamp

64-bit extended value of the GT timestamp.

timestamp.ping_delay

Period for polling the GT timestamp for overflow.

timestamp.work

Periodic work to adjust GT timestamp, engine and context usage for overflows.

timestamp.shift

Right shift value for the gpm timestamp

timestamp.last_stat_jiffies

jiffies at last actual stats collection time. We use this timestamp to ensure we don’t oversample the stats because runtime power management events can trigger stats collection at much higher rates than required.

dead_guc_worker

Asynchronous worker thread for forcing a GuC reset. Specifically used when the G2H handler wants to issue a reset. Resets require flushing the G2H queue, so the G2H processing itself must not trigger a reset directly. Instead, go via this worker.

last_dead_guc_jiffies

timestamp of previous ‘dead guc’ occurrence, used to prevent a fundamentally broken system from continuously reloading the GuC.

number_guc_id_stolen

The number of guc_ids that have been stolen

fast_response_selftest

Backdoor to CT handler for fast response selftest

Description

It handles firmware loading and manages the client pool. intel_guc owns an i915_sched_engine for submission.

u32intel_guc_ggtt_offset(structintel_guc*guc,structi915_vma*vma)

Get and validate the GGTT offset of vma

Parameters

structintel_guc*guc

intel_guc structure.

structi915_vma*vma

i915 graphics virtual memory area.

Description

GuC does not allow any gfx GGTT address that falls into the range [0, ggtt.pin_bias), which is reserved for Boot ROM, SRAM and WOPCM. Currently, in order to exclude the [0, ggtt.pin_bias) address space from the GGTT, all gfx objects used by the GuC are allocated with intel_guc_allocate_vma() and pinned with PIN_OFFSET_BIAS along with the value of ggtt.pin_bias.

Return

GGTT offset of the vma.

GuC Firmware Layout

The GuC/HuC firmware layout looks like this:

+======================================================================+
|  Firmware blob                                                       |
+===============+===============+============+============+============+
|  CSS header   |     uCode     |  RSA key   |  modulus   |  exponent  |
+===============+===============+============+============+============+
 <-header size->                 <---header size continued ----------->
 <--- size ----------------------------------------------------------->
                                 <-key size->
                                              <-mod size->
                                                           <-exp size->

The firmware may or may not have modulus key and exponent data. The header, uCode and RSA signature are must-have components that will be used by the driver. The length of each component, in dwords, can be found in the header. In the case that the modulus and exponent are not present in the fw, a.k.a. a truncated image, the length values still appear in the header.

Driver will do some basic fw size validation based on the following rules:

  1. Header, uCode and RSA are must-have components.

  2. All firmware components, if present, are in the sequence illustrated in the layout table above.

  3. Length info of each component can be found in header, in dwords.

  4. The modulus and exponent keys are not required by the driver. They may not appear in the fw, so the driver will load a truncated firmware in this case.

Starting from DG2, the HuC is loaded by the GSC instead of i915. The GSC firmware performs all the required integrity checks, we just need to check the version. Note that the header for GSC-managed blobs is different from the CSS used for dma-loaded firmwares.

GuC Memory Management

GuC can’t allocate any memory for its own usage, so all the allocations must be handled by the host driver. GuC accesses the memory via the GGTT, with the exception of the top and bottom parts of the 4GB address space, which are instead re-mapped by the GuC HW to the memory location of the FW itself (WOPCM) or other parts of the HW. The driver must take care not to place objects that the GuC is going to access in these reserved ranges. The layout of the GuC address space is shown below:

   +===========> +====================+ <== FFFF_FFFF
   ^             |      Reserved      |
   |             +====================+ <== GUC_GGTT_TOP
   |             |                    |
   |             |        DRAM        |
  GuC            |                    |
 Address   +===> +====================+ <== GuC ggtt_pin_bias
  Space    ^     |                    |
   |       |     |                    |
   |      GuC    |        GuC         |
   |     WOPCM   |       WOPCM        |
   |      Size   |                    |
   |       |     |                    |
   v       v     |                    |
   +=======+===> +====================+ <== 0000_0000

The lower part of the GuC Address Space [0, ggtt_pin_bias) is mapped to the GuC WOPCM while the upper part of the GuC Address Space [ggtt_pin_bias, GUC_GGTT_TOP) is mapped to DRAM. The value of the GuC ggtt_pin_bias is the GuC WOPCM size.

structi915_vma*intel_guc_allocate_vma(structintel_guc*guc,u32size)

Allocate a GGTT VMA for GuC usage

Parameters

structintel_guc*guc

the guc

u32size

size of area to allocate (both virtual space and memory)

Description

This is a wrapper to create an object for use with the GuC. In order to use it inside the GuC, an object needs to be pinned for its lifetime, so we allocate both some backing storage and a range inside the Global GTT. We must pin it in the GGTT somewhere other than [0, GUC ggtt_pin_bias) because that range is reserved inside GuC.

Return

An i915_vma if successful, otherwise an ERR_PTR.
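As a usage illustration, a minimal sketch of allocating and releasing a GuC-accessible buffer with this helper; the teardown call shown is the usual i915 VMA idiom and is an assumption here:

/* Hedged sketch: allocate, use and release a GuC-accessible buffer. */
static int guc_buf_example(struct intel_guc *guc)
{
    struct i915_vma *vma = intel_guc_allocate_vma(guc, PAGE_SIZE);

    if (IS_ERR(vma))
        return PTR_ERR(vma);

    /* hand i915_ggtt_offset(vma) to the GuC; it is >= ggtt_pin_bias */

    i915_vma_unpin_and_release(&vma, 0);
    return 0;
}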

GuC-specific firmware loader

intintel_guc_fw_upload(structintel_guc*guc)

load GuC uCode to device

Parameters

structintel_guc*guc

intel_guc structure

Description

Called from intel_uc_init_hw() during driver load, resume from sleep and after a GPU reset.

The firmware image should have already been fetched into memory, so onlycheck that fetch succeeded, and then transfer the image to the h/w.

Return

non-zero code on error

GuC-based command submission

The Scratch registers:
There are 16 MMIO-based registers starting from 0xC180. The kernel driver writes a value to the action register (SOFT_SCRATCH_0) along with any data. It then triggers an interrupt on the GuC via another register write (0xC4C8). The firmware writes a success/fail code back to the action register after processing the request. The kernel driver polls waiting for this update and then proceeds.

Command Transport buffers (CTBs):
Covered in detail in other sections but CTBs (Host to GuC - H2G, GuC to Host - G2H) are a message interface between the i915 and the GuC.

Context registration:
Before a context can be submitted it must be registered with the GuC via a H2G. A unique guc_id is associated with each context. The context is either registered at request creation time (normal operation) or at submission time (abnormal operation, e.g. after a reset).

Context submission:
The i915 updates the LRC tail value in memory. The i915 must enable the scheduling of the context within the GuC for the GuC to actually consider it. Therefore, the first time a disabled context is submitted we use a schedule enable H2G, while follow up submissions are done via the context submit H2G, which informs the GuC that a previously enabled context has new work available.

Context unpin:
To unpin a context a H2G is used to disable scheduling. When the corresponding G2H returns indicating the scheduling disable operation has completed it is safe to unpin the context. While a disable is in flight it isn’t safe to resubmit the context so a fence is used to stall all future requests of that context until the G2H is returned. Because this interaction with the GuC takes a non-zero amount of time we delay the disabling of scheduling after the pin count goes to zero by a configurable period of time (see SCHED_DISABLE_DELAY_MS). The thought is this gives the user a window of time to resubmit something on the context before doing this costly operation. This delay is only done if the context isn’t closed and the guc_id usage is less than a threshold (see NUM_SCHED_DISABLE_GUC_IDS_THRESHOLD).

Context deregistration:
Before a context can be destroyed or if we steal its guc_id we must deregister the context with the GuC via H2G. If stealing the guc_id it isn’t safe to submit anything to this guc_id until the deregister completes, so a fence is used to stall all requests associated with this guc_id until the corresponding G2H returns indicating the guc_id has been deregistered.

submission_state.guc_ids:
Unique number associated with private GuC context data passed in during context registration / submission / deregistration. 64k available. A simple ida is used for allocation.

Stealing guc_ids:
If no guc_ids are available they can be stolen from another context at request creation time if that context is unpinned. If a guc_id can’t be found we punt this problem to the user as we believe this is near impossible to hit during normal use cases.

Locking:
In the GuC submission code we have 3 basic spin locks which protect everything. Details about each below.

sched_engine->lock
This is the submission lock for all contexts that share an i915 schedule engine (sched_engine), thus only one of the contexts which share a sched_engine can be submitting at a time. Currently only one sched_engine is used for all of GuC submission but that could change in the future.

guc->submission_state.lock
Global lock for GuC submission state. Protects guc_ids and the destroyed contexts list.

ce->guc_state.lock
Protects everything under ce->guc_state. Ensures that a context is in the correct state before issuing a H2G. e.g. We don’t issue a schedule disable on a disabled context (a bad idea), we don’t issue a schedule enable when a schedule disable is in flight, etc... Also protects the list of inflight requests on the context and the priority management state. The lock is individual to each context.

Lock ordering rules:
sched_engine->lock -> ce->guc_state.lock
guc->submission_state.lock -> ce->guc_state.lock

Reset races:
When a full GT reset is triggered it is assumed that some G2H responses to H2Gs can be lost as the GuC is also reset. Losing these G2H can prove to be fatal as we do certain operations upon receiving a G2H (e.g. destroy contexts, release guc_ids, etc...). When this occurs we can scrub the context state and clean up appropriately; however, this is quite racy. To avoid races, the reset code must disable submission before scrubbing for the missing G2H, while the submission code must check for submission being disabled and skip sending H2Gs and updating context states when it is. Both sides must also make sure to hold the relevant locks.

GuC ABI

HXG Message

All messages exchanged with the GuC are defined using 32 bit dwords. The first dword is treated as a message header. The remaining dwords are optional.

DW    Bits   Description
----  -----  ---------------------------------------------------------
0     31     ORIGIN - originator of the message
               • GUC_HXG_ORIGIN_HOST = 0
               • GUC_HXG_ORIGIN_GUC = 1
      30:28  TYPE - message type
               • GUC_HXG_TYPE_REQUEST = 0
               • GUC_HXG_TYPE_EVENT = 1
               • GUC_HXG_TYPE_FAST_REQUEST = 2
               • GUC_HXG_TYPE_NO_RESPONSE_BUSY = 3
               • GUC_HXG_TYPE_NO_RESPONSE_RETRY = 5
               • GUC_HXG_TYPE_RESPONSE_FAILURE = 6
               • GUC_HXG_TYPE_RESPONSE_SUCCESS = 7
      27:0   AUX - auxiliary data (depends on TYPE)
1..n  31:0   PAYLOAD - optional payload (depends on TYPE)
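As an illustration of the header layout in the table above, a minimal sketch of packing and unpacking dword 0 with the kernel’s bitfield helpers; the mask macros are named after the table fields here and are assumptions, not necessarily the driver’s exact identifiers:

#include <linux/bitfield.h>
#include <linux/bits.h>

/* Field masks for HXG dword 0, per the table above (names illustrative). */
#define HXG_MSG_0_ORIGIN  GENMASK(31, 31)
#define HXG_MSG_0_TYPE    GENMASK(30, 28)
#define HXG_MSG_0_AUX     GENMASK(27, 0)

/* Pack a host-originated header of the given TYPE. */
static u32 hxg_pack_header(u32 type, u32 aux)
{
    return FIELD_PREP(HXG_MSG_0_ORIGIN, 0) |   /* GUC_HXG_ORIGIN_HOST */
           FIELD_PREP(HXG_MSG_0_TYPE, type) |
           FIELD_PREP(HXG_MSG_0_AUX, aux);
}

/* Recover the TYPE of an incoming message header. */
static u32 hxg_unpack_type(u32 header)
{
    return FIELD_GET(HXG_MSG_0_TYPE, header);
}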

HXG Request

The HXG Request message should be used to initiate synchronous activity for which confirmation or return data is expected.

The recipient of this message shall use the HXG Response, HXG Failure or HXG Retry message as a definite reply, and may use the HXG Busy message as an intermediate reply.

The format of the DATA0 and all DATAn fields depends on the ACTION code.

DW    Bits   Description
----  -----  ---------------------------------------------------------
0     31     ORIGIN
      30:28  TYPE = GUC_HXG_TYPE_REQUEST
      27:16  DATA0 - request data (depends on ACTION)
      15:0   ACTION - requested action code
1..n  31:0   DATAn - optional data (depends on ACTION)

HXG Fast Request

The HXG Fast Request message should be used to initiate asynchronous activity for which confirmation or return data is not expected.

If confirmation is required then HXG Request shall be used instead.

The recipient of this message may only use the HXG Failure message if it was unable to accept this request (like invalid data).

The format of the HXG Fast Request message is the same as HXG Request except for TYPE.

DW    Bits   Description
----  -----  ---------------------------------------------------------
0     31     ORIGIN - see HXG Message
      30:28  TYPE = GUC_HXG_TYPE_FAST_REQUEST
      27:16  DATA0 - see HXG Request
      15:0   ACTION - see HXG Request
1..n  31:0   DATAn - see HXG Request

HXG Event

The HXG Event message should be used to initiate asynchronous activity that involves neither immediate confirmation nor data.

The format of the DATA0 and all DATAn fields depends on the ACTION code.

DW    Bits   Description
----  -----  ---------------------------------------------------------
0     31     ORIGIN
      30:28  TYPE = GUC_HXG_TYPE_EVENT
      27:16  DATA0 - event data (depends on ACTION)
      15:0   ACTION - event action code
1..n  31:0   DATAn - optional event data (depends on ACTION)

HXG Busy

The HXG Busy message may be used to acknowledge reception of the HXG Request message if the recipient expects that its processing will take longer than the default timeout.

The COUNTER field may be used as a progress indicator.

DW    Bits   Description
----  -----  ---------------------------------------------------------
0     31     ORIGIN
      30:28  TYPE = GUC_HXG_TYPE_NO_RESPONSE_BUSY
      27:0   COUNTER - progress indicator

HXG Retry

The HXG Retry message should be used by the recipient to indicate that the HXG Request message was dropped and should be resent.

The REASON field may be used to provide additional information.

DW    Bits   Description
----  -----  ---------------------------------------------------------
0     31     ORIGIN
      30:28  TYPE = GUC_HXG_TYPE_NO_RESPONSE_RETRY
      27:0   REASON - reason for retry
               • GUC_HXG_RETRY_REASON_UNSPECIFIED = 0

HXG Failure

The HXG Failure message shall be used as a reply to the HXG Request message that could not be processed due to an error.

DW    Bits   Description
----  -----  ---------------------------------------------------------
0     31     ORIGIN
      30:28  TYPE = GUC_HXG_TYPE_RESPONSE_FAILURE
      27:16  HINT - additional error hint
      15:0   ERROR - error/result code

HXG Response

The HXG Response message shall be used as a reply to the HXG Request message that was successfully processed without an error.

DW    Bits   Description
----  -----  ---------------------------------------------------------
0     31     ORIGIN
      30:28  TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS
      27:0   DATA0 - data (depends on ACTION from HXG Request)
1..n  31:0   DATAn - data (depends on ACTION from HXG Request)

GuC MMIO based communication

The MMIO based communication between Host and GuC relies on special hardware registers whose format can be defined by the software (so called scratch registers).

Each MMIO based message, both Host to GuC (H2G) and GuC to Host (G2H), whose maximum length depends on the number of available scratch registers, is directly written into those scratch registers.

For Gen9+, there are 16 software scratch registers 0xC180-0xC1B8, but no H2G command takes more than 4 parameters and the GuC firmware itself uses a 4-element array to store the H2G message.

For Gen11+, there are 4 additional registers 0x190240-0x19024C, which are, regardless of the lower count, preferred over the legacy ones.

The MMIO based communication is mainly used during the driver initialization phase to set up the CTB based communication that will be used afterwards.

MMIO HXG Message

The format of the MMIO messages follows the definition of the HXG Message.

DW    Bits   Description
----  -----  ---------------------------------------------------------
0..n  31:0   [Embedded HXG Message]

CT Buffer

Circular buffer used to send CTB Messages.

CTB Descriptor

DW     Bits   Description
-----  -----  --------------------------------------------------------
0      31:0   HEAD - offset (in dwords) to the last dword that was
              read from the CT Buffer. It can only be updated by the
              receiver.
1      31:0   TAIL - offset (in dwords) to the last dword that was
              written to the CT Buffer. It can only be updated by the
              sender.
2      31:0   STATUS - status of the CTB
                • GUC_CTB_STATUS_NO_ERROR = 0 (normal operation)
                • GUC_CTB_STATUS_OVERFLOW = 1 (head/tail too large)
                • GUC_CTB_STATUS_UNDERFLOW = 2 (truncated message)
                • GUC_CTB_STATUS_MISMATCH = 4 (head/tail modified)
                • GUC_CTB_STATUS_UNUSED = 8 (CTB is not in use)
3..15  31:0   RESERVED = MBZ

CTB Message

DW    Bits   Description
----  -----  ---------------------------------------------------------
0     31:16  FENCE - message identifier
      15:12  FORMAT - format of the CTB message
      11:8   RESERVED
      7:0    NUM_DWORDS - length of the CTB message (w/o header)
1..n  31:0   optional (depends on FORMAT)

CTB HXG Message

DW    Bits   Description
----  -----  ---------------------------------------------------------
0     31:16  FENCE
      15:12  FORMAT = GUC_CTB_FORMAT_HXG
      11:8   RESERVED = MBZ
      7:0    NUM_DWORDS = length (in dwords) of the embedded HXG message
1..n  31:0   [Embedded HXG Message]

CTB based communication

The CTB (command transport buffer) communication between Host and GuC is based on a u32 data stream written to the shared buffer. One buffer can be used to transmit data only in one direction (one-directional channel).

The current status of each buffer is stored in the buffer descriptor. The buffer descriptor holds tail and head fields that represent the active data stream. The tail field is updated by the data producer (sender), and the head field is updated by the data consumer (receiver):

+------------+
| DESCRIPTOR |          +=================+============+========+
+============+          |                 | MESSAGE(s) |        |
| address    |--------->+=================+============+========+
+------------+
| head       |          ^-----head--------^
+------------+
| tail       |          ^---------tail-----------------^
+------------+
| size       |          ^---------------size--------------------^
+------------+

Each message in the data stream starts with a single u32 treated as a header, followed by an optional set of u32 data that makes up the message specific payload:

+------------+---------+---------+---------+
|         MESSAGE                          |
+------------+---------+---------+---------+
|   msg[0]   |   [1]   |   ...   |  [n-1]  |
+------------+---------+---------+---------+
|   MESSAGE  |       MESSAGE PAYLOAD       |
+   HEADER   +---------+---------+---------+
|            |    0    |   ...   |    n    |
+======+=====+=========+=========+=========+
| 31:16| code|         |         |         |
+------+-----+         |         |         |
|  15:5|flags|         |         |         |
+------+-----+         |         |         |
|   4:0|  len|         |         |         |
+------+-----+---------+---------+---------+
             ^-------------len-------------^

The message header consists of:

  • len, indicates length of the message payload (in u32)

  • code, indicates message code

  • flags, holds various bits to control message handling
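Because head and tail chase each other around the circular buffer, the sender must compute the free space before writing a message. A minimal sketch of that calculation (the helper name is illustrative):

/* Hedged sketch: dwords available for writing in a CT buffer.
 * head and tail are dword offsets as held in the descriptor; size is
 * the buffer size in dwords. One slot is kept unused so that
 * head == tail always means "empty" rather than "full".
 */
static u32 ctb_free_dwords(u32 head, u32 tail, u32 size)
{
    u32 used = tail >= head ? tail - head : size - (head - tail);

    return size - used - 1;
}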

HOST2GUC_SELF_CFG

This message is used by the Host KMD to set up the GuC Self Config KLVs.

This message must be sent as MMIO HXG Message.

DW  Bits   Description
--  -----  -----------------------------------------------------------
0   31     ORIGIN = GUC_HXG_ORIGIN_HOST
    30:28  TYPE = GUC_HXG_TYPE_REQUEST
    27:16  DATA0 = MBZ
    15:0   ACTION = GUC_ACTION_HOST2GUC_SELF_CFG = 0x0508
1   31:16  KLV_KEY - KLV key, see GuC Self Config KLVs
    15:0   KLV_LEN - KLV length
             • 32 bit KLV = 1
             • 64 bit KLV = 2
2   31:0   VALUE32 - bits 31-0 of the KLV value
3   31:0   VALUE64 - bits 63-32 of the KLV value (KLV_LEN = 2)

The GuC replies with:

DW  Bits   Description
--  -----  -----------------------------------------------------------
0   31     ORIGIN = GUC_HXG_ORIGIN_GUC
    30:28  TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS
    27:0   DATA0 = NUM - 1 if KLV was parsed, 0 if not recognized
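Putting the request table together, a sketch of assembling the four dwords for a 64 bit KLV; the field masks are illustrative stand-ins for the driver’s actual definitions:

#include <linux/bitfield.h>
#include <linux/bits.h>

/* Hedged sketch: build a HOST2GUC_SELF_CFG request for a 64 bit KLV.
 * The dword layout follows the request table above; macro names are
 * illustrative, not the driver's exact identifiers.
 */
#define HXG_MSG_0_TYPE    GENMASK(30, 28)  /* GUC_HXG_TYPE_REQUEST = 0 */
#define HXG_MSG_0_ACTION  GENMASK(15, 0)
#define SELF_CFG_KLV_KEY  GENMASK(31, 16)
#define SELF_CFG_KLV_LEN  GENMASK(15, 0)

static void self_cfg_pack(u32 msg[4], u16 key, u64 value)
{
    msg[0] = FIELD_PREP(HXG_MSG_0_TYPE, 0) |        /* REQUEST, host origin */
             FIELD_PREP(HXG_MSG_0_ACTION, 0x0508);  /* HOST2GUC_SELF_CFG */
    msg[1] = FIELD_PREP(SELF_CFG_KLV_KEY, key) |
             FIELD_PREP(SELF_CFG_KLV_LEN, 2);       /* 64 bit KLV */
    msg[2] = lower_32_bits(value);                  /* VALUE32 */
    msg[3] = upper_32_bits(value);                  /* VALUE64 */
}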

HOST2GUC_CONTROL_CTB

This H2G action allows the VF Host to enable or disable the H2G and G2H CT Buffers.

This message must be sent as MMIO HXG Message.

DW  Bits   Description
--  -----  -----------------------------------------------------------
0   31     ORIGIN = GUC_HXG_ORIGIN_HOST
    30:28  TYPE = GUC_HXG_TYPE_REQUEST
    27:16  DATA0 = MBZ
    15:0   ACTION = GUC_ACTION_HOST2GUC_CONTROL_CTB = 0x4509
1   31:0   CONTROL - control CTB based communication
             • GUC_CTB_CONTROL_DISABLE = 0
             • GUC_CTB_CONTROL_ENABLE = 1

The GuC replies with:

DW  Bits   Description
--  -----  -----------------------------------------------------------
0   31     ORIGIN = GUC_HXG_ORIGIN_GUC
    30:28  TYPE = GUC_HXG_TYPE_RESPONSE_SUCCESS
    27:0   DATA0 = MBZ

GuC KLV

DW    Bits   Description
----  -----  ---------------------------------------------------------
0     31:16  KEY - KLV key identifier
      15:0   LEN - length of VALUE (in 32bit dwords)
1..n  31:0   VALUE - actual value of the KLV (format depends on KEY)

GuC Self Config KLVs

GuC KLV keys available for use with HOST2GUC_SELF_CFG.

GUC_KLV_SELF_CFG_H2G_CTB_ADDR = 0x0902

Refers to the 64 bit Global Gfx address of the H2G CT Buffer. Should be above the WOPCM address but below the APIC base address for native mode.

GUC_KLV_SELF_CFG_H2G_CTB_DESCRIPTOR_ADDR = 0x0903

Refers to the 64 bit Global Gfx address of the H2G CTB Descriptor. Should be above the WOPCM address but below the APIC base address for native mode.

GUC_KLV_SELF_CFG_H2G_CTB_SIZE = 0x0904

Refers to the size of the H2G CT Buffer in bytes. Should be a multiple of 4K.

GUC_KLV_SELF_CFG_G2H_CTB_ADDR = 0x0905

Refers to the 64 bit Global Gfx address of the G2H CT Buffer. Should be above the WOPCM address but below the APIC base address for native mode.

GUC_KLV_SELF_CFG_G2H_CTB_DESCRIPTOR_ADDR = 0x0906

Refers to the 64 bit Global Gfx address of the G2H CTB Descriptor. Should be above the WOPCM address but below the APIC base address for native mode.

GUC_KLV_SELF_CFG_G2H_CTB_SIZE = 0x0907

Refers to the size of the G2H CT Buffer in bytes. Should be a multiple of 4K.

HuC

The HuC is a dedicated microcontroller for usage in media HEVC (High Efficiency Video Coding) operations. Userspace can directly use the firmware capabilities by adding HuC specific commands to batch buffers.

The kernel driver is only responsible for loading the HuC firmware and triggering its security authentication. This is done differently depending on the platform:

  • older platforms (from Gen9 to most Gen12s): the load is performed via DMA and the authentication via GuC

  • DG2: load and authentication are both performed via GSC.

  • MTL and newer platforms: the load is performed via DMA (same as with non-DG2 older platforms), while the authentication is done in 2 steps, a first auth for clear-media workloads via GuC and a second one for all workloads via GSC.

On platforms where the GuC does the authentication, to correctly do so the HuC binary must be loaded before the GuC one.

Loading the HuC is optional; however, not using the HuC might negatively impact power usage and/or performance of media workloads, depending on the use-cases.

HuC must be reloaded on events that cause the WOPCM to lose its contents (S3/S4, FLR); on older platforms the HuC must also be reloaded on GuC/GT reset, while on newer ones it will survive that.

See https://github.com/intel/media-driver for the latest details on HuC functionality.

intintel_huc_auth(structintel_huc*huc,enumintel_huc_authentication_typetype)

Authenticate HuC uCode

Parameters

structintel_huc*huc

intel_huc structure

enumintel_huc_authentication_typetype

authentication type (via GuC or via GSC)

Description

Called after HuC and GuC firmware loading during intel_uc_init_hw().

This function invokes the GuC action to authenticate the HuC firmware, passing the offset of the RSA signature to intel_guc_auth_huc(). It then waits for up to 50ms for the firmware verification ACK.

HuC Memory Management

Similarly to the GuC, the HuC can’t do any memory allocations on its own, with the difference being that the allocations for HuC usage are handled by the userspace driver instead of the kernel one. The HuC accesses the memory via the PPGTT belonging to the context loaded on the VCS executing the HuC-specific commands.

HuC Firmware Layout

The HuC FW layout is the same as the GuC one, see GuC Firmware Layout.

DMC

See DMC Firmware Support.

Tracing

This section covers all things related to the tracepoints implemented in the i915 driver.

i915_ppgtt_create and i915_ppgtt_release

With full ppgtt enabled each process using drm will allocate at least one translation table. With these traces it is possible to keep track of the allocation and of the lifetime of the tables; this can be used during testing/debug to verify that we are not leaking ppgtts. These traces identify the ppgtt through the vm pointer, which is also printed by the i915_vma_bind and i915_vma_unbind tracepoints.

i915_context_create and i915_context_free

These tracepoints are used to track creation and deletion of contexts. If full ppgtt is enabled, they also print the address of the vm assigned to the context.

Perf

Overview

Gen graphics supports a large number of performance counters that can help driver and application developers understand and optimize their use of the GPU.

This i915 perf interface enables userspace to configure and open a file descriptor representing a stream of GPU metrics which can then be read() as a stream of sample records.

The interface is particularly suited to exposing buffered metrics that are captured by DMA from the GPU, unsynchronized with and unrelated to the CPU.

Streams representing a single context are accessible to applications with a corresponding drm file descriptor, such that OpenGL can use the interface without special privileges. Access to system-wide metrics requires root privileges by default, unless changed via the dev.i915.perf_stream_paranoid sysctl option.

Comparison with Core Perf

The interface was initially inspired by the core Perf infrastructure but some notable differences are:

i915 perf file descriptors represent a “stream” instead of an “event”; where a perf event primarily corresponds to a single 64bit value, while a stream might sample sets of tightly-coupled counters, depending on the configuration. For example the Gen OA unit isn’t designed to support orthogonal configurations of individual counters; it’s configured for a set of related counters. Samples for an i915 perf stream capturing OA metrics will include a set of counter values packed in a compact HW specific format. The OA unit supports a number of different packing formats which can be selected by the user opening the stream. Perf has support for grouping events, but each event in the group is configured, validated and authenticated individually with separate system calls.

i915 perf stream configurations are provided as an array of u64 (key, value) pairs, instead of a fixed struct with multiple miscellaneous config members, interleaved with event-type specific members.

i915 perf doesn’t support exposing metrics via an mmap’d circular buffer. The supported metrics are being written to memory by the GPU unsynchronized with the CPU, using HW specific packing formats for counter sets. Sometimes the constraints on HW configuration require reports to be filtered before it would be acceptable to expose them to unprivileged applications - to hide the metrics of other processes/contexts. For these use cases a read() based interface is a good fit, and provides an opportunity to filter data as it gets copied from the GPU mapped buffers to userspace buffers.
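To make the (key, value) property interface concrete, a minimal userspace sketch of opening a system-wide OA stream; the property, flag and format names are from the i915 uAPI header, while the metrics-set ID shown (1) would normally be looked up in sysfs:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Hedged sketch: open an i915-perf OA stream on an open DRM fd.
 * The metrics-set ID (1 here) is normally read from the sysfs
 * metrics/ directory advertised by i915_perf_register().
 */
static int open_oa_stream(int drm_fd)
{
    uint64_t properties[] = {
        DRM_I915_PERF_PROP_SAMPLE_OA,      1,
        DRM_I915_PERF_PROP_OA_METRICS_SET, 1,
        DRM_I915_PERF_PROP_OA_FORMAT,      I915_OA_FORMAT_A32u40_A4u32_B8_C8,
        DRM_I915_PERF_PROP_OA_EXPONENT,    16,  /* sampling period exponent */
    };
    struct drm_i915_perf_open_param param = {
        .flags = I915_PERF_FLAG_FD_CLOEXEC,
        .num_properties = sizeof(properties) / (2 * sizeof(uint64_t)),
        .properties_ptr = (uintptr_t)properties,
    };

    return ioctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);
}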

Issues hit with first prototype based on Core Perf

The first prototype of this driver was based on the core perf infrastructure, and while we did make that mostly work, with some changes to perf, we found we were breaking or working around too many assumptions baked into perf’s currently cpu-centric design.

In the end we didn’t see a clear benefit to making perf’s implementation and interface more complex by changing design assumptions while we knew we still wouldn’t be able to use any existing perf based userspace tools.

Also considering the Gen specific nature of the Observability hardware and how userspace will sometimes need to combine i915 perf OA metrics with side-band OA data captured via MI_REPORT_PERF_COUNT commands; we’re expecting the interface to be used by a platform specific userspace such as OpenGL or tools. This is to say; we aren’t inherently missing out on having a standard vendor/architecture agnostic interface by not using perf.

For posterity, in case we might re-visit trying to adapt core perf to be better suited to exposing i915 metrics these were the main pain points we hit:

  • The perf based OA PMU driver broke some significant design assumptions:

    Existing perf pmus are used for profiling work on a cpu and we were introducing the idea of _IS_DEVICE pmus with different security implications, the need to fake cpu-related data (such as user/kernel registers) to fit with perf’s current design, and adding _DEVICE records as a way to forward device-specific status records.

    The OA unit writes reports of counters into a circular buffer, without involvement from the CPU, making our PMU driver the first of a kind.

    Given the way we periodically forwarded data from the GPU-mapped OA buffer to perf’s buffer, those bursts of sample writes looked to perf like we were sampling too fast and so we had to subvert its throttling checks.

    Perf supports groups of counters and allows those to be read via transactions internally but transactions currently seem designed to be explicitly initiated from the cpu (say in response to a userspace read()) and while we could pull a report out of the OA buffer we can’t trigger a report from the cpu on demand.

    Related to being report based; the OA counters are configured in HW as a set while perf generally expects counter configurations to be orthogonal. Although counters can be associated with a group leader as they are opened, there’s no clear precedent for being able to provide group-wide configuration attributes (for example we want to let userspace choose the OA unit report format used to capture all counters in a set, or specify a GPU context to filter metrics on). We avoided using perf’s grouping feature and forwarded OA reports to userspace via perf’s ‘raw’ sample field. This suited our userspace well considering how coupled the counters are when dealing with normalizing. It would be inconvenient to split counters up into separate events, only to require userspace to recombine them. For Mesa it’s also convenient to be forwarded raw, periodic reports for combining with the side-band raw reports it captures using MI_REPORT_PERF_COUNT commands.

    • As a side note on perf’s grouping feature; there was also some concern that using PERF_FORMAT_GROUP as a way to pack together counter values would quite drastically inflate our sample sizes, which would likely lower the effective sampling resolutions we could use when the available memory bandwidth is limited.

      With the OA unit’s report formats, counters are packed together as 32 or 40bit values, with the largest report size being 256 bytes.

      PERF_FORMAT_GROUP values are 64bit, but there doesn’t appear to be a documented ordering to the values, implying PERF_FORMAT_ID must also be used to add a 64bit ID before each value; giving 16 bytes per counter.

    Related to counter orthogonality; we can’t time share the OA unit, while event scheduling is a central design idea within perf for allowing userspace to open + enable more events than can be configured in HW at any one time. The OA unit is not designed to allow re-configuration while in use. We can’t reconfigure the OA unit without losing internal OA unit state which we can’t access explicitly to save and restore. Reconfiguring the OA unit is also relatively slow, involving ~100 register writes. From userspace Mesa also depends on a stable OA configuration when emitting MI_REPORT_PERF_COUNT commands and importantly the OA unit can’t be disabled while there are outstanding MI_RPC commands lest we hang the command streamer.

    The contents of sample records aren’t extensible by device drivers (i.e. the sample_type bits). As an example; Sourab Gupta had been looking to attach GPU timestamps to our OA samples. We were shoehorning OA reports into sample records by using the ‘raw’ field, but it’s tricky to pack more than one thing into this field because events/core.c currently only lets a pmu give a single raw data pointer plus len which will be copied into the ring buffer. To include more than the OA report we’d have to copy the report into an intermediate larger buffer. I’d been considering allowing a vector of data+len values to be specified for copying the raw data, but it felt like a kludge to be using the raw field for this purpose.

  • It felt like our perf based PMU was making some technical compromises just for the sake of using perf:

    perf_event_open() requires events to either relate to a pid or a specific cpu core, while our device pmu related to neither. Events opened with a pid will be automatically enabled/disabled according to the scheduling of that process - so not appropriate for us. When an event is related to a cpu id, perf ensures pmu methods will be invoked via an inter process interrupt on that core. To avoid invasive changes our userspace opened OA perf events for a specific cpu. This was workable but it meant the majority of the OA driver ran in atomic context, including all OA report forwarding, which wasn’t really necessary in our case and seemed to make our locking requirements somewhat complex as we handled the interaction with the rest of the i915 driver.

i915 Driver Entry Points

This section covers the entrypoints exported outside of i915_perf.c to integrate with drm/i915 and to handle the DRM_I915_PERF_OPEN ioctl.

inti915_perf_init(structdrm_i915_private*i915)

initialize i915-perf state on module bind

Parameters

structdrm_i915_private*i915

i915 device instance

Description

Initializes i915-perf state without exposing anything to userspace.

Note

i915-perf initialization is split into an ‘init’ and ‘register’ phase with the i915_perf_register() exposing state to userspace.

voidi915_perf_fini(structdrm_i915_private*i915)

Counterpart to i915_perf_init()

Parameters

structdrm_i915_private*i915

i915 device instance

voidi915_perf_register(structdrm_i915_private*i915)

exposes i915-perf to userspace

Parameters

structdrm_i915_private*i915

i915 device instance

Description

In particular OA metric sets are advertised under a sysfs metrics/ directory allowing userspace to enumerate valid IDs that can be used to open an i915-perf stream.

voidi915_perf_unregister(structdrm_i915_private*i915)

hide i915-perf from userspace

Parameters

structdrm_i915_private*i915

i915 device instance

Description

i915-perf state cleanup is split up into an ‘unregister’ and ‘deinit’ phase where the interface is first hidden from userspace by i915_perf_unregister() before cleaning up remaining state in i915_perf_fini().
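Taken together, the four entry points give the following lifecycle, sketched here for clarity:

/* Hedged sketch of the split lifecycle described above. */
i915_perf_init(i915);       /* allocate state; nothing user-visible yet */
i915_perf_register(i915);   /* advertise metrics in sysfs */
/* ... device operational ... */
i915_perf_unregister(i915); /* hide the interface from userspace first */
i915_perf_fini(i915);       /* then free the remaining state */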

inti915_perf_open_ioctl(structdrm_device*dev,void*data,structdrm_file*file)

DRM ioctl() for userspace to open a stream FD

Parameters

structdrm_device*dev

drm device

void*data

ioctl data copied from userspace (unvalidated)

structdrm_file*file

drm file

Description

Validates the stream open parameters given by userspace including flags and an array of u64 key, value pair properties.

Very little is assumed up front about the nature of the stream being opened (for instance we don’t assume it’s for periodic OA unit metrics). An i915-perf stream is expected to be a suitable interface for other forms of buffered data written by the GPU besides periodic OA metrics.

Note we copy the properties from userspace outside of the i915 perf mutex to avoid an awkward lockdep with mmap_lock.

Most of the implementation details are handled by i915_perf_open_ioctl_locked() after taking the gt->perf.lock mutex for serializing with any non-file-operation driver hooks.

Return

A newly opened i915 Perf stream file descriptor or negative error code on failure.

inti915_perf_release(structinode*inode,structfile*file)

handles userspace close() of a stream file

Parameters

structinode*inode

anonymous inode associated with file

structfile*file

An i915 perf stream file

Description

Cleans up any resources associated with an open i915 perf stream file.

NB: close() can’t really fail from the userspace point of view.

Return

zero on success or a negative error code.

inti915_perf_add_config_ioctl(structdrm_device*dev,void*data,structdrm_file*file)

DRM ioctl() for userspace to add a new OA config

Parameters

structdrm_device*dev

drm device

void*data

ioctl data (pointer to struct drm_i915_perf_oa_config) copied from userspace (unvalidated)

structdrm_file*file

drm file

Description

Validates the submitted OA register to be saved into a new OA config that can then be used for programming the OA unit and its NOA network.

Return

A newly allocated config number to be used with the perf open ioctl or a negative error code on failure.

inti915_perf_remove_config_ioctl(structdrm_device*dev,void*data,structdrm_file*file)

DRM ioctl() for userspace to remove an OA config

Parameters

structdrm_device*dev

drm device

void*data

ioctl data (pointer to u64 integer) copied from userspace

structdrm_file*file

drm file

Description

Configs can be removed while being used; they will stop appearing in sysfs and their content will be freed when the stream using the config is closed.

Return

0 on success or a negative error code on failure.

i915 Perf Stream

This section covers the stream-semantics-agnostic structures and functions for representing an i915 perf stream FD and associated file operations.

structi915_perf_stream

state for a single open stream FD

Definition:

struct i915_perf_stream {
    struct i915_perf *perf;
    struct intel_uncore *uncore;
    struct intel_engine_cs *engine;
    struct mutex lock;
    u32 sample_flags;
    int sample_size;
    struct i915_gem_context *ctx;
    bool enabled;
    bool hold_preemption;
    const struct i915_perf_stream_ops *ops;
    struct i915_oa_config *oa_config;
    struct llist_head oa_config_bos;
    struct intel_context *pinned_ctx;
    u32 specific_ctx_id;
    u32 specific_ctx_id_mask;
    struct hrtimer poll_check_timer;
    wait_queue_head_t poll_wq;
    bool pollin;
    bool periodic;
    int period_exponent;
    struct {
        const struct i915_oa_format *format;
        struct i915_vma *vma;
        u8 *vaddr;
        u32 last_ctx_id;
        spinlock_t ptr_lock;
        u32 head;
        u32 tail;
    } oa_buffer;
    struct i915_vma *noa_wait;
    u64 poll_oa_period;
};

Members

perf

i915_perf backpointer

uncore

mmio access path

engine

Engine associated with this performance stream.

lock

Lock associated with operations on stream

sample_flags

Flags representing the DRM_I915_PERF_PROP_SAMPLE_* properties given when opening a stream, representing the contents of a single sample as read() by userspace.

sample_size

Considering the configured contents of a sample combined with the required header size, this is the total size of a single sample record.

ctx

NULL if measuring system-wide across all contexts or a specific context that is being monitored.

enabled

Whether the stream is currently enabled, considering whether the stream was opened in a disabled state and based on I915_PERF_IOCTL_ENABLE and I915_PERF_IOCTL_DISABLE calls.

hold_preemption

Whether preemption is put on hold for command submissions done on the ctx. This is useful for some drivers that cannot easily post process the OA buffer context to subtract the delta of performance counters not associated with ctx.

ops

The callbacks providing the implementation of this specific type of configured stream.

oa_config

The OA configuration used by the stream.

oa_config_bos

A list of struct i915_oa_config_bo allocated lazily each time oa_config changes.

pinned_ctx

The OA context specific information.

specific_ctx_id

The id of the specific context.

specific_ctx_id_mask

The mask used for masking specific_ctx_id bits.

poll_check_timer

High resolution timer that will periodically check for data in the circular OA buffer for notifying userspace (e.g. during a read() or poll()).

poll_wq

The wait queue that the hrtimer callback wakes when it sees data ready to read in the circular OA buffer.

pollin

Whether there is data available to read.

periodic

Whether periodic sampling is currently enabled.

period_exponent

The OA unit sampling frequency is derived from this.

oa_buffer

State of the OA buffer.

oa_buffer.ptr_lock

Locks reads and writes to all head/tail state

Consider: the head and tail pointer state needs to be read consistently from a hrtimer callback (atomic context) and read() fop (user context) with tail pointer updates happening in atomic context and head updates in user context and the (unlikely) possibility of read() errors needing to reset all head/tail state.

Note: Contention/performance aren’t currently a significant concern here considering the relatively low frequency of hrtimer callbacks (5ms period) and that reads typically only happen in response to a hrtimer event and likely complete before the next callback.

Note: This lock is not held while reading and copying data to userspace so the value of head observed in hrtimer callbacks won’t represent any partial consumption of data.

oa_buffer.head

Although we can always read back the head pointer register, we prefer to avoid trusting the HW state, just to avoid any risk that some hardware condition could somehow bump the head pointer unpredictably and cause us to forward the wrong OA buffer data to userspace.

oa_buffer.tail

The last verified tail that can be read by userspace.

noa_wait

A batch buffer doing a wait on the GPU for the NOA logic to be reprogrammed.

poll_oa_period

The period in nanoseconds at which the OA buffer should be checked for available data.

structi915_perf_stream_ops

the OPs to support a specific stream type

Definition:

struct i915_perf_stream_ops {
    void (*enable)(struct i915_perf_stream *stream);
    void (*disable)(struct i915_perf_stream *stream);
    void (*poll_wait)(struct i915_perf_stream *stream, struct file *file, poll_table *wait);
    int (*wait_unlocked)(struct i915_perf_stream *stream);
    int (*read)(struct i915_perf_stream *stream, char __user *buf, size_t count, size_t *offset);
    void (*destroy)(struct i915_perf_stream *stream);
};

Members

enable

Enables the collection of HW samples, either in response to I915_PERF_IOCTL_ENABLE or implicitly called when the stream is opened without I915_PERF_FLAG_DISABLED.

disable

Disables the collection of HW samples, either in response to I915_PERF_IOCTL_DISABLE or implicitly called before destroying the stream.

poll_wait

Call poll_wait, passing a wait queue that will be woken once there is something ready to read() for the stream

wait_unlocked

For handling a blocking read, wait until there is something ready to read() for the stream. E.g. wait on the same wait queue that would be passed to poll_wait().

read

Copy buffered metrics as records to userspace. buf: the userspace destination buffer; count: the number of bytes to copy, requested by userspace; offset: zero at the start of the read, updated as the read proceeds, representing how many bytes have been copied so far and the buffer offset for copying the next record.

Copy as many buffered i915 perf samples and records for this stream to userspace as will fit in the given buffer.

Only write complete records, returning -ENOSPC if there isn’t room for a complete record.

Return any error condition that results in a short read such as -ENOSPC or -EFAULT, even though these may be squashed before returning to userspace.

destroy

Cleanup any stream specific resources.

The stream will always be disabled before this is called.

intread_properties_unlocked(structi915_perf*perf,u64__user*uprops,u32n_props,structperf_open_properties*props)

validate + copy userspace stream open properties

Parameters

structi915_perf*perf

i915 perf instance

u64__user*uprops

The array of u64 key value pairs given by userspace

u32n_props

The number of key value pairs expected in uprops

structperf_open_properties*props

The stream configuration built up while validating properties

Description

Note this function only validates properties in isolation; it doesn’t validate that the combination of properties makes sense or that all properties necessary for a particular kind of stream have been set.

Note that there currently aren’t any ordering requirements for properties so we shouldn’t validate or assume anything about ordering here. This doesn’t rule out defining new properties with ordering requirements in the future.

inti915_perf_open_ioctl_locked(structi915_perf*perf,structdrm_i915_perf_open_param*param,structperf_open_properties*props,structdrm_file*file)

DRM ioctl() for userspace to open a stream FD

Parameters

structi915_perf*perf

i915 perf instance

structdrm_i915_perf_open_param*param

The open parameters passed to DRM_I915_PERF_OPEN

structperf_open_properties*props

individually validated u64 property value pairs

structdrm_file*file

drm file

Description

See i915_perf_open_ioctl() for interface details.

Implements further stream config validation and stream initialization on behalf of i915_perf_open_ioctl() with the gt->perf.lock mutex taken to serialize with any non-file-operation driver hooks.

Note

at this point the props have only been validated in isolation and it’s still necessary to validate that the combination of properties makes sense.

In the case where userspace is interested in OA unit metrics then further config validation and stream initialization details will be handled by i915_oa_stream_init(). The code here should only validate config state that will be relevant to all stream types / backends.

Return

zero on success or a negative error code.

voidi915_perf_destroy_locked(structi915_perf_stream*stream)

destroy an i915 perf stream

Parameters

structi915_perf_stream*stream

An i915 perf stream

Description

Frees all resources associated with the given i915 perf stream, disabling any associated data capture in the process.

Note

The gt->perf.lock mutex has been taken to serialize with any non-file-operation driver hooks.

ssize_ti915_perf_read(structfile*file,char__user*buf,size_tcount,loff_t*ppos)

handles read() FOP for i915 perf stream FDs

Parameters

structfile*file

An i915 perf stream file

char__user*buf

destination buffer given by userspace

size_tcount

the number of bytes userspace wants to read

loff_t*ppos

(inout) file seek position (unused)

Description

The entry point for handling a read() on a stream file descriptor from userspace. Most of the work is left to i915_perf_read_locked() and i915_perf_stream_ops->read but to save having stream implementations (of which we might have multiple later) we handle blocking reads here.

We can also consistently treat trying to read from a disabled stream as an IO error so implementations can assume the stream is enabled while reading.

Return

The number of bytes copied or a negative error code on failure.
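For illustration, a minimal userspace sketch of draining records from a stream fd; the record header and type constant are from the i915 uAPI header, and error handling is trimmed:

#include <stdint.h>
#include <stdio.h>
#include <unistd.h>
#include <drm/i915_drm.h>

/* Hedged sketch: read and walk sample records from an i915-perf fd.
 * Each record starts with struct drm_i915_perf_record_header whose
 * size field includes the header itself.
 */
static void drain_stream(int stream_fd)
{
    uint8_t buf[4096];
    ssize_t len = read(stream_fd, buf, sizeof(buf));
    size_t off = 0;

    if (len < 0)
        return;  /* e.g. EIO while disabled, EAGAIN if non-blocking */

    while (off + sizeof(struct drm_i915_perf_record_header) <= (size_t)len) {
        const struct drm_i915_perf_record_header *hdr =
            (const void *)(buf + off);

        if (hdr->size == 0)
            break;  /* guard against malformed data */

        if (hdr->type == DRM_I915_PERF_RECORD_SAMPLE)
            printf("OA sample, %u bytes\n", hdr->size);

        off += hdr->size;
    }
}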

longi915_perf_ioctl(structfile*file,unsignedintcmd,unsignedlongarg)

support ioctl() usage with i915 perf stream FDs

Parameters

structfile*file

An i915 perf stream file

unsignedintcmd

the ioctl request

unsignedlongarg

the ioctl data

Description

Implementation deferred to i915_perf_ioctl_locked().

Return

zero on success or a negative error code. Returns -EINVAL for an unknown ioctl request.

voidi915_perf_enable_locked(structi915_perf_stream*stream)

handle I915_PERF_IOCTL_ENABLE ioctl

Parameters

structi915_perf_stream*stream

A disabled i915 perf stream

Description

[Re]enables the associated capture of data for this stream.

If a stream was previously enabled then there’s currently no intention to provide userspace any guarantee about the preservation of previously buffered data.

voidi915_perf_disable_locked(structi915_perf_stream*stream)

handle I915_PERF_IOCTL_DISABLE ioctl

Parameters

structi915_perf_stream*stream

An enabled i915 perf stream

Description

Disables the associated capture of data for this stream.

The intention is that disabling and re-enabling a stream will ideally be cheaper than destroying and re-opening a stream with the same configuration, though there are no formal guarantees about what state or buffered data must be retained between disabling and re-enabling a stream.

Note

while a stream is disabled it’s considered an error for userspaceto attempt to read from the stream (-EIO).

__poll_ti915_perf_poll(structfile*file,poll_table*wait)

call poll_wait() with a suitable wait queue for stream

Parameters

structfile*file

An i915 perf stream file

poll_table*wait

poll() state table

Description

For handling userspace polling on an i915 perf stream, this ensures poll_wait() gets called with a wait queue that will be woken for new stream data.

Note

Implementation deferred to i915_perf_poll_locked()

Return

any poll events that are ready without sleeping

__poll_ti915_perf_poll_locked(structi915_perf_stream*stream,structfile*file,poll_table*wait)

poll_wait() with a suitable wait queue for stream

Parameters

structi915_perf_stream*stream

An i915 perf stream

structfile*file

An i915 perf stream file

poll_table*wait

poll() state table

Description

For handling userspace polling on an i915 perf stream, this calls through to i915_perf_stream_ops->poll_wait to call poll_wait() with a wait queue that will be woken for new stream data.

Return

any poll events that are ready without sleeping

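From the userspace side, this supports the usual readiness idiom; a minimal hypothetical sketch (poll_and_read() is invented for the example):

#include <poll.h>
#include <unistd.h>

/* Hypothetical sketch: wait up to 100 ms for new stream data, then read. */
static ssize_t poll_and_read(int stream_fd, void *buf, size_t len)
{
        struct pollfd pfd = { .fd = stream_fd, .events = POLLIN };
        int ret = poll(&pfd, 1, 100);

        if (ret <= 0)
                return ret; /* timeout (0) or error (-1) */

        return read(stream_fd, buf, len);
}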
i915 Perf Observation Architecture Stream

struct i915_oa_ops

Gen specific implementation of an OA unit stream

Definition:

struct i915_oa_ops {
    bool (*is_valid_b_counter_reg)(struct i915_perf *perf, u32 addr);
    bool (*is_valid_mux_reg)(struct i915_perf *perf, u32 addr);
    bool (*is_valid_flex_reg)(struct i915_perf *perf, u32 addr);
    int (*enable_metric_set)(struct i915_perf_stream *stream, struct i915_active *active);
    void (*disable_metric_set)(struct i915_perf_stream *stream);
    void (*oa_enable)(struct i915_perf_stream *stream);
    void (*oa_disable)(struct i915_perf_stream *stream);
    int (*read)(struct i915_perf_stream *stream, char __user *buf, size_t count, size_t *offset);
    u32 (*oa_hw_tail_read)(struct i915_perf_stream *stream);
};

Members

is_valid_b_counter_reg

Validates a register's address for programming boolean counters for a particular platform.

is_valid_mux_reg

Validates a register's address for programming the mux for a particular platform.

is_valid_flex_reg

Validates a register's address for programming flex EU filtering for a particular platform.

enable_metric_set

Selects and applies any MUX configuration to set up the Boolean and Custom (B/C) counters that are part of the counter reports being sampled. May apply system constraints such as disabling EU clock gating as required.

disable_metric_set

Remove system constraints associated with using the OA unit.

oa_enable

Enable periodic sampling

oa_disable

Disable periodic sampling

read

Copy data from the circular OA buffer into a given userspace buffer.

oa_hw_tail_read

read the OA tail pointer register

In particular this enables us to share all the fiddly code for handling the OA unit tail pointer race that affects multiple generations.

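To illustrate the indirection this vtable provides, a platform backend would populate the table once at init time; a hypothetical sketch (every callback name below is an invented placeholder, not one of the driver's actual functions):

/* Hypothetical gen-specific backend wiring; illustrative names only. */
static const struct i915_oa_ops myplat_oa_ops = {
        .is_valid_b_counter_reg = myplat_is_valid_b_counter_addr,
        .is_valid_mux_reg = myplat_is_valid_mux_addr,
        .is_valid_flex_reg = myplat_is_valid_flex_addr,
        .enable_metric_set = myplat_enable_metric_set,
        .disable_metric_set = myplat_disable_metric_set,
        .oa_enable = myplat_oa_enable,
        .oa_disable = myplat_oa_disable,
        .read = myplat_oa_read,
        .oa_hw_tail_read = myplat_oa_hw_tail_read,
};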
int i915_oa_stream_init(struct i915_perf_stream *stream, struct drm_i915_perf_open_param *param, struct perf_open_properties *props)

validate combined props for OA stream and init

Parameters

struct i915_perf_stream *stream

An i915 perf stream

struct drm_i915_perf_open_param *param

The open parameters passed to DRM_I915_PERF_OPEN

struct perf_open_properties *props

The property state that configures stream (individually validated)

Description

While read_properties_unlocked() validates properties in isolation it doesn't ensure that the combination necessarily makes sense.

At this point it has been determined that userspace wants a stream of OA metrics, but we still need to further validate that the combined properties are OK.

If the configuration makes sense then we can allocate memory for a circular OA buffer and apply the requested metric set configuration.

Return

zero on success or a negative error code.

int i915_oa_read(struct i915_perf_stream *stream, char __user *buf, size_t count, size_t *offset)

just calls through to i915_oa_ops->read

Parameters

struct i915_perf_stream *stream

An i915-perf stream opened for OA metrics

char __user *buf

destination buffer given by userspace

size_t count

the number of bytes userspace wants to read

size_t *offset

(inout): the current position for writing into buf

Description

Updates offset according to the number of bytes successfully copied into the userspace buffer.

Return

zero on success or a negative error code

void i915_oa_stream_enable(struct i915_perf_stream *stream)

handle I915_PERF_IOCTL_ENABLE for OA stream

Parameters

struct i915_perf_stream *stream

An i915 perf stream opened for OA metrics

Description

[Re]enables hardware periodic sampling according to the period configured when opening the stream. This also starts a hrtimer that will periodically check for data in the circular OA buffer for notifying userspace (e.g. during a read() or poll()).

void i915_oa_stream_disable(struct i915_perf_stream *stream)

handle I915_PERF_IOCTL_DISABLE for OA stream

Parameters

struct i915_perf_stream *stream

An i915 perf stream opened for OA metrics

Description

Stops the OA unit from periodically writing counter reports into the circular OA buffer. This also stops the hrtimer that periodically checks for data in the circular OA buffer, for notifying userspace.

int i915_oa_wait_unlocked(struct i915_perf_stream *stream)

handles blocking IO until OA data available

Parameters

struct i915_perf_stream *stream

An i915-perf stream opened for OA metrics

Description

Called when userspace tries to read() from a blocking stream FD opened for OA metrics. It waits until the hrtimer callback finds a non-empty OA buffer and wakes us.

Note

it's acceptable to have this return with some false positives since any subsequent read handling will return -EAGAIN if there isn't really data ready for userspace yet.

Return

zero on success or a negative error code

void i915_oa_poll_wait(struct i915_perf_stream *stream, struct file *file, poll_table *wait)

call poll_wait() for an OA stream poll()

Parameters

struct i915_perf_stream *stream

An i915-perf stream opened for OA metrics

struct file *file

An i915 perf stream file

poll_table *wait

poll() state table

Description

For handling userspace polling on an i915 perf stream opened for OA metrics, this starts a poll_wait with the wait queue that our hrtimer callback wakes when it sees data ready to read in the circular OA buffer.

Other i915 Perf Internals

This section simply includes all other currently documented i915 perf internals, in no particular order, but may include some more minor utilities or platform-specific details than are found in the more high-level sections.

struct perf_open_properties

for validated properties given to open a stream

Definition:

struct perf_open_properties {
    u32 sample_flags;
    u64 single_context:1;
    u64 hold_preemption:1;
    u64 ctx_handle;
    int metrics_set;
    int oa_format;
    bool oa_periodic;
    int oa_period_exponent;
    struct intel_engine_cs *engine;
    bool has_sseu;
    struct intel_sseu sseu;
    u64 poll_oa_period;
};

Members

sample_flags

DRM_I915_PERF_PROP_SAMPLE_* properties are tracked as flags

single_context

Whether a single or all gpu contexts should be monitored

hold_preemption

Whether preemption is disabled for the filtered context

ctx_handle

A gem ctx handle for use with single_context

metrics_set

An ID for an OA unit metric set advertised via sysfs

oa_format

An OA unit HW report format

oa_periodic

Whether to enable periodic OA unit sampling

oa_period_exponent

The OA unit sampling period is derived from this

engine

The engine (typically rcs0) being monitored by the OA unit

has_sseu

Whether sseu was specified by userspace

sseu

internal SSEU configuration computed either from the userspace-specified configuration in the opening parameters or a default value (see get_default_sseu_config())

poll_oa_period

The period in nanoseconds at which the CPU will check for OA data availability

Description

As read_properties_unlocked() enumerates and validates the properties given to open a stream of metrics, the configuration is built up in this structure, which starts out zero initialized.

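To make the property flow concrete, a hypothetical userspace open call might look as follows (the metrics set ID would normally be read from sysfs; the format and exponent values are arbitrary for this sketch):

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Hypothetical sketch: open an OA metrics stream on an open DRM fd. */
static int open_oa_stream(int drm_fd, uint64_t metrics_set_id)
{
        uint64_t properties[] = {
                DRM_I915_PERF_PROP_SAMPLE_OA, 1,
                DRM_I915_PERF_PROP_OA_METRICS_SET, metrics_set_id,
                DRM_I915_PERF_PROP_OA_FORMAT, I915_OA_FORMAT_A32u40_A4u32_B8_C8,
                DRM_I915_PERF_PROP_OA_EXPONENT, 16, /* sampling period */
        };
        struct drm_i915_perf_open_param param = {
                .flags = I915_PERF_FLAG_FD_CLOEXEC,
                .num_properties = sizeof(properties) / (2 * sizeof(uint64_t)),
                .properties_ptr = (uintptr_t)properties,
        };

        return ioctl(drm_fd, DRM_IOCTL_I915_PERF_OPEN, &param);
}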
bool oa_buffer_check_unlocked(struct i915_perf_stream *stream)

check for data and update tail ptr state

Parameters

struct i915_perf_stream *stream

i915 stream instance

Description

This is either called via fops (for blocking reads in user ctx) or the poll check hrtimer (atomic ctx) to check the OA buffer tail pointer and check if there is data available for userspace to read.

This function is central to providing a workaround for the OA unit tail pointer having a race with respect to what data is visible to the CPU. It is responsible for reading tail pointers from the hardware and giving the pointers time to 'age' before they are made available for reading. (See description of OA_TAIL_MARGIN_NSEC above for further details.)

Besides returning true when there is data available to read() this function also updates the tail in the oa_buffer object.

Note

It's safe to read OA config state here unlocked, assuming that this is only called while the stream is enabled, while the global OA configuration can't be modified.

Return

true if the OA buffer contains data, else false

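The aging idea can be summarized with a simplified model like the following (a sketch only; the structure layout, margin value and helper name are invented and do not mirror the driver's actual code):

/* Simplified model of the tail-aging workaround described above. */
struct oa_tail_state {
        u32 head;            /* read position, advanced by read() */
        u32 tail;            /* published tail, safe to read up to */
        u32 aging_tail;      /* last tail value observed in hardware */
        u64 aging_timestamp; /* when aging_tail last changed */
};

#define TAIL_MARGIN_NSEC 100000ULL /* illustrative margin */

static bool oa_buffer_has_data(struct oa_tail_state *s, u32 hw_tail, u64 now)
{
        if (hw_tail != s->aging_tail) {
                /* The OA unit wrote more data: restart the aging clock. */
                s->aging_tail = hw_tail;
                s->aging_timestamp = now;
        } else if (now - s->aging_timestamp > TAIL_MARGIN_NSEC) {
                /* Stable long enough; reports up to here have fully landed. */
                s->tail = s->aging_tail;
        }

        return s->tail != s->head;
}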
int append_oa_status(struct i915_perf_stream *stream, char __user *buf, size_t count, size_t *offset, enum drm_i915_perf_record_type type)

Appends a status record to a userspace read() buffer.

Parameters

struct i915_perf_stream *stream

An i915-perf stream opened for OA metrics

char __user *buf

destination buffer given by userspace

size_t count

the number of bytes userspace wants to read

size_t *offset

(inout): the current position for writing into buf

enum drm_i915_perf_record_type type

The kind of status to report to userspace

Description

Writes a status record (such as DRM_I915_PERF_RECORD_OA_REPORT_LOST) into the userspace read() buffer.

The buf offset will only be updated on success.

Return

0 on success, negative error code on failure.

int append_oa_sample(struct i915_perf_stream *stream, char __user *buf, size_t count, size_t *offset, const u8 *report)

Copies single OA report into userspace read() buffer.

Parameters

struct i915_perf_stream *stream

An i915-perf stream opened for OA metrics

char __user *buf

destination buffer given by userspace

size_t count

the number of bytes userspace wants to read

size_t *offset

(inout): the current position for writing into buf

const u8 *report

A single OA report to (optionally) include as part of the sample

Description

The contents of a sample are configured through DRM_I915_PERF_PROP_SAMPLE_* properties when opening a stream, tracked as stream->sample_flags. This function copies the requested components of a single sample to the given read() buf.

The buf offset will only be updated on success.

Return

0 on success, negative error code on failure.

int gen8_append_oa_reports(struct i915_perf_stream *stream, char __user *buf, size_t count, size_t *offset)

Copies all buffered OA reports into userspace read() buffer.

Parameters

struct i915_perf_stream *stream

An i915-perf stream opened for OA metrics

char __user *buf

destination buffer given by userspace

size_t count

the number of bytes userspace wants to read

size_t *offset

(inout): the current position for writing into buf

Description

Notably any error condition resulting in a short read (-ENOSPC or -EFAULT) will be returned even though one or more records may have been successfully copied. In this case it's up to the caller to decide if the error should be squashed before returning to userspace.

Note

reports are consumed from the head, and appended to the tail, so the tail chases the head?... If you think that's mad and back-to-front you're not alone, but this follows the Gen PRM naming convention.

Return

0 on success, negative error code on failure.

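The head/tail convention in the note above amounts to a consume loop of roughly this shape (a schematic only; the size constants and the emit callback are invented for the sketch):

#define OA_BUF_SIZE (16 * 1024 * 1024) /* illustrative buffer size */
#define REPORT_SIZE 256                /* depends on the OA format */

/* Schematic of the convention: hardware appends reports at the tail,
 * software consumes from the head until it catches up. */
static void consume_reports(const u8 *oa_buf, u32 *head, u32 tail,
                            void (*emit)(const u8 *report))
{
        while (*head != tail) {
                emit(oa_buf + *head);
                *head = (*head + REPORT_SIZE) & (OA_BUF_SIZE - 1);
        }
}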
int gen8_oa_read(struct i915_perf_stream *stream, char __user *buf, size_t count, size_t *offset)

copy status records then buffered OA reports

Parameters

struct i915_perf_stream *stream

An i915-perf stream opened for OA metrics

char __user *buf

destination buffer given by userspace

size_t count

the number of bytes userspace wants to read

size_t *offset

(inout): the current position for writing into buf

Description

Checks OA unit status registers and if necessary appends corresponding status records for userspace (such as for a buffer full condition) and then initiates appending any buffered OA reports.

Updates offset according to the number of bytes successfully copied into the userspace buffer.

NB: some data may be successfully copied to the userspace buffer even if an error is returned, and this is reflected in the updated offset.

Return

zero on success or a negative error code

int gen7_append_oa_reports(struct i915_perf_stream *stream, char __user *buf, size_t count, size_t *offset)

Copies all buffered OA reports into userspace read() buffer.

Parameters

struct i915_perf_stream *stream

An i915-perf stream opened for OA metrics

char __user *buf

destination buffer given by userspace

size_t count

the number of bytes userspace wants to read

size_t *offset

(inout): the current position for writing into buf

Description

Notably any error condition resulting in a short read (-ENOSPC or -EFAULT) will be returned even though one or more records may have been successfully copied. In this case it's up to the caller to decide if the error should be squashed before returning to userspace.

Note

reports are consumed from the head, and appended to the tail, so the tail chases the head?... If you think that's mad and back-to-front you're not alone, but this follows the Gen PRM naming convention.

Return

0 on success, negative error code on failure.

int gen7_oa_read(struct i915_perf_stream *stream, char __user *buf, size_t count, size_t *offset)

copy status records then buffered OA reports

Parameters

struct i915_perf_stream *stream

An i915-perf stream opened for OA metrics

char __user *buf

destination buffer given by userspace

size_t count

the number of bytes userspace wants to read

size_t *offset

(inout): the current position for writing into buf

Description

Checks Gen 7 specific OA unit status registers and if necessary appends corresponding status records for userspace (such as for a buffer full condition) and then initiates appending any buffered OA reports.

Updates offset according to the number of bytes successfully copied into the userspace buffer.

Return

zero on success or a negative error code

int oa_get_render_ctx_id(struct i915_perf_stream *stream)

determine and hold ctx hw id

Parameters

struct i915_perf_stream *stream

An i915-perf stream opened for OA metrics

Description

Determine the render context hw id, and ensure it remains fixed for the lifetime of the stream. This ensures that we don't have to worry about updating the context ID in OACONTROL on the fly.

Return

zero on success or a negative error code

void oa_put_render_ctx_id(struct i915_perf_stream *stream)

counterpart to oa_get_render_ctx_id(); releases the hold

Parameters

struct i915_perf_stream *stream

An i915-perf stream opened for OA metrics

Description

In case anything needed doing to ensure the context HW ID would remain valid for the lifetime of the stream, then that can be undone here.

long i915_perf_ioctl_locked(struct i915_perf_stream *stream, unsigned int cmd, unsigned long arg)

support ioctl() usage with i915 perf stream FDs

Parameters

struct i915_perf_stream *stream

An i915 perf stream

unsigned int cmd

the ioctl request

unsigned long arg

the ioctl data

Return

zero on success or a negative error code. Returns -EINVAL for an unknown ioctl request.

int i915_perf_ioctl_version(struct drm_i915_private *i915)

Version of the i915-perf subsystem

Parameters

struct drm_i915_private *i915

The i915 device

Description

This version number is used by userspace to detect available features.

Style

The drm/i915 driver codebase has some style rules in addition to (and, in some cases, deviating from) the kernel coding style.

Register macro definition style

The style guide for i915_reg.h.

Follow the style described here for new macros, and while changing existing macros. Do not mass change existing definitions just to update the style.

File Layout

Keep helper macros near the top. For example, _PIPE() and friends.

Prefix macros that generally should not be used outside of this file with underscore '_'. For example, _PIPE() and friends, single instances of registers that are defined solely for use by function-like macros.

Avoid using the underscore prefixed macros outside of this file. There are exceptions, but keep them to a minimum.

There are two basic types of register definitions: Single registers and register groups. Register groups are registers which have two or more instances, for example one per pipe, port, transcoder, etc. Register groups should be defined using function-like macros.

For single registers, define the register offset first, followed by register contents.

For register groups, define the register instance offsets first, prefixed with underscore, followed by a function-like macro choosing the right instance based on the parameter, followed by register contents.

Define the register contents (i.e. bit and bit field macros) from most significant to least significant bit. Indent the register content macros using two extra spaces between #define and the macro name.

Define bit fields using REG_GENMASK(h, l). Define bit field contents using REG_FIELD_PREP(mask, value). This will define the values already shifted in place, so they can be directly OR'd together. For convenience, function-like macros may be used to define bit fields, but do note that the macros may be needed to read as well as write the register contents.

Define bits using REG_BIT(N). Do not add a _BIT suffix to the name.

Group the register and its contents together without blank lines, separate from other registers and their contents with one blank line.

Indent macro values from macro names using TABs. Align values vertically. Use braces in macro values as needed to avoid unintended precedence after macro substitution. Use spaces in macro values according to kernel coding style. Use lower case in hexadecimal values.

Naming

Try to name registers according to the specs. If the register name changes in the specs from one platform to another, stick to the original name.

Try to reuse existing register macro definitions. Only add new macros for new register offsets, or when the register contents have changed enough to warrant a full redefinition.

When a register macro changes for a new platform, prefix the new macro using the platform acronym or generation. For example, SKL_ or GEN8_. The prefix signifies the start platform/generation using the register.

When a bit (field) macro changes or gets added for a new platform, while retaining the existing register macro, add a platform acronym or generation suffix to the name. For example, _SKL or _GEN8.

Examples

(Note that the values in the example are indented using spaces instead of TABs to avoid misalignment in generated documentation. Use TABs in the definitions.):

#define _FOO_A                      0xf000
#define _FOO_B                      0xf001
#define FOO(pipe)                   _MMIO_PIPE(pipe, _FOO_A, _FOO_B)
#define   FOO_ENABLE                REG_BIT(31)
#define   FOO_MODE_MASK             REG_GENMASK(19, 16)
#define   FOO_MODE_BAR              REG_FIELD_PREP(FOO_MODE_MASK, 0)
#define   FOO_MODE_BAZ              REG_FIELD_PREP(FOO_MODE_MASK, 1)
#define   FOO_MODE_QUX_SNB          REG_FIELD_PREP(FOO_MODE_MASK, 2)

#define BAR                         _MMIO(0xb000)
#define GEN8_BAR                    _MMIO(0xb888)

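As a hypothetical illustration of why REG_FIELD_PREP() values are convenient, the already-shifted fields from the example above OR together directly in an MMIO write (the FOO definitions are reused from the example; intel_uncore_write() is the driver's MMIO write helper):

/* Hypothetical use of the FOO() example definitions above. */
intel_uncore_write(&i915->uncore, FOO(PIPE_A),
                   FOO_ENABLE | FOO_MODE_BAZ);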
i915 DRM client usage stats implementation

The drm/i915 driver implements the DRM client usage stats specification as documented in DRM client usage stats.

Example of the output showing the implemented key-value pairs and the entirety of the currently possible format options:

pos:    0
flags:  0100002
mnt_id: 21
drm-driver: i915
drm-pdev:   0000:00:02.0
drm-client-id:      7
drm-engine-render:  9288864723 ns
drm-engine-copy:    2035071108 ns
drm-engine-video:   0 ns
drm-engine-capacity-video:   2
drm-engine-video-enhance:   0 ns

Possible drm-engine- key names are: render, copy, video and video-enhance.
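Since the drm-engine- values are cumulative nanoseconds of GPU time, client busyness over an interval can be derived from two fdinfo snapshots; a hypothetical helper (the function name and parameters are invented for the sketch):

#include <stdint.h>

/* Hypothetical sketch: percentage utilization of one engine class from
 * two fdinfo samples taken wall_delta_ns apart. engine_ns_t0/t1 are the
 * cumulative drm-engine-<name> values parsed from each sample; capacity
 * is the drm-engine-capacity-<name> value, or 1 if the key is absent. */
static double engine_busy_pct(uint64_t engine_ns_t0, uint64_t engine_ns_t1,
                              uint64_t wall_delta_ns, unsigned int capacity)
{
        return 100.0 * (double)(engine_ns_t1 - engine_ns_t0) /
               ((double)wall_delta_ns * (double)capacity);
}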