Ring Buffer
To handle communication between user space and kernel space, AMD GPUs use a ring buffer design to feed the engines (GFX, Compute, SDMA, UVD, VCE, VCN, VPE, etc.). See the figure below that illustrates how this communication works:
Ring buffers in amdgpu follow a producer-consumer model, where userspace acts as the producer, constantly filling the ring buffer with GPU commands to be executed. Meanwhile, the GPU retrieves the information from the ring, parses it, and distributes the specific set of instructions between the different amdgpu blocks.
Notice from the diagram that the ring has a Read Pointer (rptr), which indicates where the engine is currently reading packets from the ring, and a Write Pointer (wptr), which indicates how many packets software has added to the ring. When the rptr and wptr are equal, the ring is idle. When software adds packets to the ring, it updates the wptr, which causes the engine to start fetching and processing packets. As the engine processes packets, the rptr gets updated until it catches up to the wptr and they are equal again.
Usually, ring buffers in the driver have a limited size (search for occurrences of amdgpu_ring_init()). One of the reasons for the small ring buffer size is that the CP (Command Processor) is capable of following addresses inserted into the ring; this is illustrated in the image by the reference to the IB (Indirect Buffer). The IB gives userspace the possibility to have an area in memory that the CP can read to feed the hardware with extra instructions.
All ASICs pre-GFX11 use what is called a kernel queue, which means the ring is allocated in kernel space and has some restrictions, such as not being able to be preempted directly by the scheduler. GFX11 and newer support kernel queues, but also provide a new mechanism named user queues, where the queue is moved to user space and can be mapped and unmapped via the scheduler. In practice, both queue types insert user-space-generated GPU commands from different jobs into the requested component ring.
Enforce Isolation
Note
After reading this section, you might want to check the Process Isolation page for more details.
Before examining the Enforce Isolation mechanism in the ring buffer context, it is helpful to briefly discuss how instructions from the ring buffer are processed in the graphics pipeline. Let’s expand on this topic by checking the diagram below that illustrates the graphics pipeline:
In terms of executing instructions, the GFX pipeline follows the sequence: Shader Export (SX), Geometry Engine (GE), Shader Processor Input (SPI), Scan Converter (SC), Primitive Assembler (PA), and cache manipulation (which may vary across ASICs). Another common way to describe the pipeline is to use Pixel Shader (PS), raster, and Vertex Shader (VS) to symbolize the two shader stages. Now, with this pipeline in mind, let’s assume that Job B causes a hang, but Job C’s instructions might already be executing, leading developers to incorrectly identify Job C as the problematic one. This problem can be mitigated on multiple levels; the diagram below illustrates how to minimize part of it:
Note from the diagram that there is no guarantee of order or a clear separation between instructions, which is not a problem most of the time and is also good for performance. Furthermore, notice some circles between jobs in the diagram that represent a fence wait used to avoid overlapping work in the ring. At the end of the fence, a cache flush occurs, ensuring that when the next job starts, it begins in a clean state and, if issues arise, the developer can pinpoint the problematic process more precisely.
To increase the level of isolation between jobs, there is the “Enforce Isolation” method described in the picture below:
As shown in the diagram, enforcing isolation introduces ordering between submissions, since access to GFX/Compute is serialized; think of it as a single-process-at-a-time mode for GFX/Compute. Notice that this approach has a significant performance impact, as it allows only one job to submit commands at a time. However, this option can help pinpoint the job that caused the problem. Although enforcing isolation improves the situation, it does not fully resolve the issue of precisely pinpointing bad jobs, since isolation might mask the problem. In summary, identifying which job caused the issue may not be precise, but enforcing isolation can help with debugging.
Ring Operations
- unsigned int amdgpu_ring_max_ibs(enum amdgpu_ring_type type)
Return the maximum number of IBs that fit in a single submission.
Parameters
enum amdgpu_ring_type type: ring type for which to return the limit
- int amdgpu_ring_alloc(struct amdgpu_ring *ring, unsigned int ndw)
allocate space on the ring buffer
Parameters
struct amdgpu_ring *ring: amdgpu_ring structure holding ring information
unsigned int ndw: number of dwords to allocate in the ring buffer
Description
Allocate ndw dwords in the ring buffer. The number of dwords should be the sum of all commands written to the ring.
Return
0 on success, or -ENOMEM if the request exceeds the maximum number of dwords allowed for one submission.
- void amdgpu_ring_alloc_reemit(struct amdgpu_ring *ring, unsigned int ndw)
allocate space on the ring buffer for reemit
Parameters
struct amdgpu_ring *ring: amdgpu_ring structure holding ring information
unsigned int ndw: number of dwords to allocate in the ring buffer
Description
Allocate ndw dwords in the ring buffer (all ASICs). This variant does not check the max_dw limit, as several submissions may be reemitted at once.
- void amdgpu_ring_insert_nop(struct amdgpu_ring *ring, uint32_t count)
insert NOP packets
Parameters
struct amdgpu_ring *ring: amdgpu_ring structure holding ring information
uint32_t count: the number of NOP packets to insert
Description
This is the generic insert_nop function for all rings except SDMA.
- void amdgpu_ring_generic_pad_ib(struct amdgpu_ring *ring, struct amdgpu_ib *ib)
pad IB with NOP packets
Parameters
struct amdgpu_ring *ring: amdgpu_ring structure holding ring information
struct amdgpu_ib *ib: IB to add NOP packets to
Description
This is the generic pad_ib function for all rings except SDMA.
- void amdgpu_ring_commit(struct amdgpu_ring *ring)
tell the GPU to execute the new commands on the ring buffer
Parameters
struct amdgpu_ring *ring: amdgpu_ring structure holding ring information
Description
Update the wptr (write pointer) to tell the GPU to execute the new commands on the ring buffer (all ASICs).
- void amdgpu_ring_undo(struct amdgpu_ring *ring)
reset the wptr
Parameters
struct amdgpu_ring *ring: amdgpu_ring structure holding ring information
Description
Reset the driver’s copy of the wptr (all ASICs).
- int amdgpu_ring_init(struct amdgpu_device *adev, struct amdgpu_ring *ring, unsigned int max_dw, struct amdgpu_irq_src *irq_src, unsigned int irq_type, unsigned int hw_prio, atomic_t *sched_score)
init driver ring struct.
Parameters
struct amdgpu_device *adev: amdgpu_device pointer
struct amdgpu_ring *ring: amdgpu_ring structure holding ring information
unsigned int max_dw: maximum number of dwords for ring allocation
struct amdgpu_irq_src *irq_src: interrupt source to use for this ring
unsigned int irq_type: interrupt type to use for this ring
unsigned int hw_prio: ring priority (NORMAL/HIGH)
atomic_t *sched_score: optional score atomic shared with other schedulers
Description
Initialize the driver information for the selected ring (all ASICs). Returns 0 on success, error on failure.
- void amdgpu_ring_fini(struct amdgpu_ring *ring)
tear down the driver ring struct.
Parameters
struct amdgpu_ring *ring: amdgpu_ring structure holding ring information
Description
Tear down the driver information for the selected ring (all ASICs).
- void amdgpu_ring_emit_reg_write_reg_wait_helper(struct amdgpu_ring *ring, uint32_t reg0, uint32_t reg1, uint32_t ref, uint32_t mask)
ring helper
Parameters
struct amdgpu_ring *ring: ring to write to
uint32_t reg0: register to write
uint32_t reg1: register to wait on
uint32_t ref: reference value to write/wait on
uint32_t mask: mask to wait on
Description
Helper for rings that don’t support write-and-wait in a single one-shot packet.
- bool amdgpu_ring_soft_recovery(struct amdgpu_ring *ring, unsigned int vmid, struct dma_fence *fence)
try to soft recover a ring lockup
Parameters
struct amdgpu_ring *ring: ring to try the recovery on
unsigned int vmid: VMID we try to get going again
struct dma_fence *fence: timed-out fence
Description
Tries to get a stuck ring proceeding again.
- int amdgpu_ring_test_helper(struct amdgpu_ring *ring)
test the ring and set the scheduler readiness status
Parameters
struct amdgpu_ring *ring: ring to test
Description
Tests the ring and sets the scheduler readiness status. Returns 0 on success, error on failure.