drm/v3d Broadcom V3D Graphics Driver
This driver supports the Broadcom V3D 3.3 and 4.1 OpenGL ES GPUs. For V3D 2.x support, see the VC4 driver.
The V3D GPU includes a tiled renderer (composed of a bin pipeline and a render pipeline), the TFU (texture formatting unit), and the CSD (compute shader dispatch).
GPU buffer object (BO) management
Compared to VC4 (V3D 2.x), V3D 3.3 introduces an MMU between the GPU and the bus, allowing us to use shmem objects for our storage instead of CMA.
Physically contiguous objects may still be imported to V3D, but the driver doesn’t allocate physically contiguous objects on its own. Display engines requiring physically contiguous allocations should look into Mesa’s “renderonly” support (as used by the Mesa pl111 driver) for an example of how to integrate with V3D.
Long term, we should support evicting pages from the MMU when under memory pressure (hence the v3d_bo_get_pages() refcounting), but that’s not a high priority since our systems tend not to have swap.
Address space management
The V3D 3.x hardware (compared to VC4) now includes an MMU. It uses a single level of page tables to map the V3D’s 4GB address space to AXI bus addresses, so it can need up to 4MB of physically contiguous memory to store the PTEs.
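The 4MB figure can be checked with a little arithmetic. The sketch below assumes 4KB pages and 32-bit PTEs, neither of which is stated in the text above; the `sketch_` names are hypothetical and are not driver symbols.

```c
#include <assert.h>
#include <stdint.h>

/* From the text: a 4GB (32-bit) GPU address space, one level of page tables. */
#define SKETCH_VA_BITS    32u
/* Assumptions for illustration only: 4KB pages and 4-byte PTEs. */
#define SKETCH_PAGE_SHIFT 12u
#define SKETCH_PTE_BYTES  4u

/* Number of PTEs needed to cover the whole address space. */
static uint64_t sketch_pte_count(void)
{
    return (UINT64_C(1) << SKETCH_VA_BITS) >> SKETCH_PAGE_SHIFT;
}

/* Size in bytes of the single, physically contiguous page table. */
static uint64_t sketch_page_table_bytes(void)
{
    return sketch_pte_count() * SKETCH_PTE_BYTES;
}

/* Index of the PTE that translates a given GPU virtual address. */
static uint32_t sketch_pte_index(uint32_t gpu_va)
{
    return gpu_va >> SKETCH_PAGE_SHIFT;
}
```

Under these assumptions the table has 2^20 entries of 4 bytes each, which matches the 4MB upper bound quoted above.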
Because the 4MB of contiguous memory for page tables is precious, and switching between them is expensive, we load all BOs into the same 4GB address space.
To protect clients from each other, we should use the GMP to quickly mask out (at 128KB granularity) which pages are available to each client. This is not yet implemented.
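Since GMP support is not implemented, the following is only a conceptual model of per-client masking at 128KB granularity. It is not the GMP's actual table format or programming interface. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_REGION_SHIFT 17u                                  /* 128KB == 1 << 17 */
#define SKETCH_REGION_COUNT (1u << (32u - SKETCH_REGION_SHIFT))  /* regions in 4GB */

/* Hypothetical per-client mask: one bit per 128KB region of the shared space. */
struct sketch_client_mask {
    uint32_t bits[SKETCH_REGION_COUNT / 32u];
};

static void sketch_mask_clear(struct sketch_client_mask *m)
{
    memset(m, 0, sizeof(*m));
}

/* Allow every region that overlaps [start, start + size). */
static void sketch_mask_allow(struct sketch_client_mask *m,
                              uint32_t start, uint32_t size)
{
    /* The range must be non-empty and must not wrap past 4GB. */
    assert(size != 0 && size - 1u <= UINT32_MAX - start);

    uint32_t first = start >> SKETCH_REGION_SHIFT;
    uint32_t last = (start + (size - 1u)) >> SKETCH_REGION_SHIFT;

    for (uint32_t r = first; r <= last; r++)
        m->bits[r / 32u] |= 1u << (r % 32u);
}

/* True if the client may touch the region containing addr. */
static bool sketch_mask_allows(const struct sketch_client_mask *m, uint32_t addr)
{
    uint32_t r = addr >> SKETCH_REGION_SHIFT;

    return (m->bits[r / 32u] >> (r % 32u)) & 1u;
}
```

In this model a 4GB space has 32768 regions, so one client's mask is 4KB of bits. Because the mask is coarser than a page, allowing a small BO also exposes whatever else is mapped in the same 128KB region.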
GPU Scheduling
The shared DRM GPU scheduler is used to coordinate submitting jobs to the hardware. Each DRM fd (roughly a client process) gets its own scheduler entity, which will process jobs in order. The GPU scheduler will round-robin between clients to submit the next job.
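The policy described above, in-order queues per client with round-robin selection between them, can be modeled in a few lines. This is a toy illustration with hypothetical names, not the DRM scheduler's data structures or API.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_JOBS 8

/* Hypothetical stand-in for a per-fd scheduler entity: a FIFO of job ids. */
struct sketch_entity {
    int jobs[SKETCH_MAX_JOBS];
    size_t head;   /* index of the next job to run */
    size_t tail;   /* one past the last queued job */
};

/* Queue a job at the back of one client's FIFO. */
static void sketch_entity_push(struct sketch_entity *e, int job_id)
{
    assert(e->tail < SKETCH_MAX_JOBS);
    e->jobs[e->tail++] = job_id;
}

/*
 * Pick the next job to submit. The search starts at the entity after the
 * one served last, skips entities with nothing queued, and pops from the
 * first non-empty one, so each client's jobs stay in order while clients
 * take turns. Returns -1 when no entity has work.
 */
static int sketch_pick_next(struct sketch_entity *ents, size_t n, size_t *last)
{
    for (size_t i = 1; i <= n; i++) {
        size_t idx = (*last + i) % n;
        struct sketch_entity *e = &ents[idx];

        if (e->head < e->tail) {
            *last = idx;
            return e->jobs[e->head++];
        }
    }
    return -1;
}
```

With client A holding jobs 1, 2, 3 and client B holding job 10, repeated calls return 1, 10, 2, 3: B's single job is not stuck behind A's backlog.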
For simplicity, and in order to keep latency low for interactive jobs when bulk background jobs are queued up, we submit a new job to the HW only when it has completed the last one, instead of filling up the CT[01]Q FIFOs with jobs. Similarly, we use v3d_job_dependency() to manage the dependency between bin and render, instead of having the clients submit jobs using the HW’s semaphores to interlock between them.
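The two rules in this paragraph, at most one job in flight per hardware queue and software-tracked dependencies between bin and render, can be sketched as follows. The types and functions are hypothetical models for illustration; they do not show how v3d_job_dependency() is implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical job: it may depend on one earlier job (render on bin). */
struct sketch_job {
    const struct sketch_job *depends_on;   /* NULL if there is no dependency */
    bool done;
};

/* One hardware queue modeled as a single in-flight slot rather than a FIFO. */
struct sketch_hw_queue {
    struct sketch_job *in_flight;
};

/* Submit only if the queue is idle and the job's dependency has completed. */
static bool sketch_try_submit(struct sketch_hw_queue *q, struct sketch_job *job)
{
    if (q->in_flight != NULL)
        return false;   /* the previous job has not completed yet */
    if (job->depends_on != NULL && !job->depends_on->done)
        return false;   /* e.g. render must wait for its bin job */
    q->in_flight = job;
    return true;
}

/* Called when the hardware reports that the in-flight job has finished. */
static void sketch_complete(struct sketch_hw_queue *q)
{
    assert(q->in_flight != NULL);
    q->in_flight->done = true;
    q->in_flight = NULL;
}
```

Keeping the pending work in software means the choice of what runs next is made at each completion, which is what lets a newly arrived interactive job go ahead of queued bulk work.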
Interrupts
When we take a bin done, render done, TFU done, or CSD done interrupt, we need to signal the fence for that job so that the scheduler can queue up the next one and unblock any waiters.
When we take the binner out-of-memory interrupt, we need to allocate some new memory and pass it to the binner so that the current job can make progress.
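The two kinds of interrupt handling described in this section can be modeled as a dispatch on a status word. The bit assignments, the chunk size, and every name below are assumptions made for illustration. They are not the hardware's register layout or the driver's handler.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical status bits, chosen only for this sketch. */
#define SKETCH_INT_BIN_DONE    (1u << 0)
#define SKETCH_INT_RENDER_DONE (1u << 1)
#define SKETCH_INT_TFU_DONE    (1u << 2)
#define SKETCH_INT_CSD_DONE    (1u << 3)
#define SKETCH_INT_BIN_OOM     (1u << 4)

/* Hypothetical amount of memory handed to the binner per OOM event. */
#define SKETCH_OVERFLOW_CHUNK  (256u * 1024u)

struct sketch_irq_state {
    bool fence_signaled[4];   /* bin, render, TFU, CSD */
    uint32_t overflow_bytes;  /* total extra memory given to the binner */
};

static void sketch_handle_irq(struct sketch_irq_state *s, uint32_t status)
{
    /* Done interrupts: signal the fence of the job on that queue. */
    for (unsigned int q = 0; q < 4; q++) {
        if (status & (1u << q))
            s->fence_signaled[q] = true;
    }

    /*
     * Binner out of memory: the job is not finished, so no fence is
     * signaled. More memory is supplied so the current job can continue.
     */
    if (status & SKETCH_INT_BIN_OOM)
        s->overflow_bytes += SKETCH_OVERFLOW_CHUNK;
}
```

The point of the model is the asymmetry: a done interrupt ends a job and lets the scheduler move on, while an out-of-memory interrupt only unblocks the job that is still running.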