Ring Buffer

To handle communication between user space and kernel space, AMD GPUs use a ring buffer design to feed the engines (GFX, Compute, SDMA, UVD, VCE, VCN, VPE, etc.). See the figure below that illustrates how this communication works:

../../_images/ring_buffers.svg

Ring buffers in amdgpu work as a producer-consumer model, where userspace acts as the producer, constantly filling the ring buffer with GPU commands to be executed. Meanwhile, the GPU retrieves the information from the ring, parses it, and distributes the specific set of instructions between the different amdgpu blocks.

Notice from the diagram that the ring has a Read Pointer (rptr), which indicates where the engine is currently reading packets from the ring, and a Write Pointer (wptr), which indicates how many packets software has added to the ring. When the rptr and wptr are equal, the ring is idle. When software adds packets to the ring, it updates the wptr, which causes the engine to start fetching and processing packets. As the engine processes packets, the rptr is updated until it catches up to the wptr and they are equal again.

Usually, ring buffers in the driver have a limited size (search for occurrences of amdgpu_ring_init()). One of the reasons the ring buffer can stay small is that the CP (Command Processor) is capable of following addresses inserted into the ring; this is illustrated in the image by the reference to the IB (Indirect Buffer). The IB gives userspace an area in memory from which the CP can read and feed the hardware with extra instructions.

All ASICs pre-GFX11 use what is called a kernel queue, which means the ring is allocated in kernel space and has some restrictions, such as not being able to be preempted directly by the scheduler. GFX11 and newer support kernel queues, but also provide a new mechanism named user queues, where the queue is moved to user space and can be mapped and unmapped via the scheduler. In practice, both queues insert user-space-generated GPU commands from different jobs into the requested component ring.

Enforce Isolation

Note

After reading this section, you might want to check the Process Isolation page for more details.

Before examining the Enforce Isolation mechanism in the ring buffer context, it is helpful to briefly discuss how instructions from the ring buffer are processed in the graphics pipeline. Let’s expand on this topic by checking the diagram below that illustrates the graphics pipeline:

../../_images/gfx_pipeline_seq.svg

In terms of executing instructions, the GFX pipeline follows the sequence: Shader Export (SX), Geometry Engine (GE), Shader Processor Input (SPI), Scan Converter (SC), Primitive Assembler (PA), and cache manipulation (which may vary across ASICs). Another common way to describe the pipeline is to use Pixel Shader (PS), raster, and Vertex Shader (VS) to symbolize the two shader stages. Now, with this pipeline in mind, let’s assume that Job B causes a hang, but Job C’s instructions might already be executing, leading developers to incorrectly identify Job C as the problematic one. This problem can be mitigated on multiple levels; the diagram below illustrates how to minimize part of it:

../../_images/no_enforce_isolation.svg

Note from the diagram that there is no guarantee of order or a clear separation between instructions, which is not a problem most of the time, and is also good for performance. Furthermore, notice some circles between jobs in the diagram that represent a fence wait used to avoid overlapping work in the ring. At the end of the fence, a cache flush occurs, ensuring that when the next job starts, it begins in a clean state and, if issues arise, the developer can pinpoint the problematic process more precisely.

To increase the level of isolation between jobs, there is the “Enforce Isolation” method described in the picture below:

../../_images/enforce_isolation.svg

As shown in the diagram, enforcing isolation introduces ordering between submissions, since access to GFX/Compute is serialized; think of it as a single-process-at-a-time mode for gfx/compute. Notice that this approach has a significant performance impact, as it allows only one job to submit commands at a time. However, this option can help pinpoint the job that caused the problem. Although enforcing isolation improves the situation, it does not fully resolve the issue of precisely pinpointing bad jobs, since isolation might mask the problem. In summary, identifying which job caused the issue may not be precise, but enforcing isolation might help with the debugging.

Ring Operations

unsigned int amdgpu_ring_max_ibs(enum amdgpu_ring_type type)

Return max IBs that fit in a single submission.

Parameters

enum amdgpu_ring_type type

ring type for which to return the limit.

int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned int ndw)

allocate space on the ring buffer

Parameters

struct amdgpu_ring *ring

amdgpu_ring structure holding ring information

unsigned int ndw

number of dwords to allocate in the ring buffer

Description

Allocate ndw dwords in the ring buffer. The number of dwords should be the sum of all commands written to the ring.

Return

0 on success, otherwise -ENOMEM if it tries to allocate more than the maximum dword allowed for one submission.

void amdgpu_ring_alloc_reemit(struct amdgpu_ring *ring, unsigned int ndw)

allocate space on the ring buffer for reemit

Parameters

struct amdgpu_ring *ring

amdgpu_ring structure holding ring information

unsigned int ndw

number of dwords to allocate in the ring buffer

Description

Allocate ndw dwords in the ring buffer (all asics). Doesn’t check the max_dw limit as we may be reemitting several submissions.

void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count)

insert NOP packets

Parameters

struct amdgpu_ring *ring

amdgpu_ring structure holding ring information

uint32_t count

the number of NOP packets to insert

Description

This is the generic insert_nop function for rings except SDMA

void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib)

pad IB with NOP packets

Parameters

struct amdgpu_ring *ring

amdgpu_ring structure holding ring information

struct amdgpu_ib *ib

IB to add NOP packets to

Description

This is the generic pad_ib function for rings except SDMA

void amdgpu_ring_commit(struct amdgpu_ring *ring)

tell the GPU to execute the new commands on the ring buffer

Parameters

struct amdgpu_ring *ring

amdgpu_ring structure holding ring information

Description

Update the wptr (write pointer) to tell the GPU to execute new commands on the ring buffer (all asics).

void amdgpu_ring_undo(struct amdgpu_ring *ring)

reset the wptr

Parameters

struct amdgpu_ring *ring

amdgpu_ring structure holding ring information

Description

Reset the driver’s copy of the wptr (all asics).

int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring, unsigned int max_dw, struct amdgpu_irq_src *irq_src, unsigned int irq_type, unsigned int hw_prio, atomic_t *sched_score)

init driver ring struct.

Parameters

struct amdgpu_device *adev

amdgpu_device pointer

struct amdgpu_ring *ring

amdgpu_ring structure holding ring information

unsigned int max_dw

maximum number of dw for ring alloc

struct amdgpu_irq_src *irq_src

interrupt source to use for this ring

unsigned int irq_type

interrupt type to use for this ring

unsigned int hw_prio

ring priority (NORMAL/HIGH)

atomic_t *sched_score

optional score atomic shared with other schedulers

Description

Initialize the driver information for the selected ring (all asics). Returns 0 on success, error on failure.

void amdgpu_ring_fini(struct amdgpu_ring *ring)

tear down the driver ring struct.

Parameters

struct amdgpu_ring *ring

amdgpu_ring structure holding ring information

Description

Tear down the driver information for the selected ring (all asics).

void amdgpu_ring_emit_reg_write_reg_wait_helper(struct amdgpu_ring *ring, uint32_t reg0, uint32_t reg1, uint32_t ref, uint32_t mask)

ring helper

Parameters

struct amdgpu_ring *ring

ring to write to

uint32_t reg0

register to write

uint32_t reg1

register to wait on

uint32_t ref

reference value to write/wait on

uint32_t mask

mask to wait on

Description

Helper for rings that don’t support write and wait in a single oneshot packet.

bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid, struct dma_fence *fence)

try to soft recover a ring lockup

Parameters

struct amdgpu_ring *ring

ring to try the recovery on

unsigned int vmid

VMID we try to get going again

struct dma_fence *fence

timed-out fence

Description

Tries to get a ring proceeding again when it is stuck.

int amdgpu_ring_test_helper(struct amdgpu_ring *ring)

tests ring and sets sched readiness status

Parameters

struct amdgpu_ring *ring

ring to test

Description

Tests ring and sets sched readiness status.

Returns 0 on success, error on failure.