drm/vc4 Broadcom VC4 Graphics Driver
The Broadcom VideoCore 4 (present in the Raspberry Pi) contains an OpenGL ES 2.0-compatible 3D engine called V3D, and a highly configurable display output pipeline that supports HDMI, DSI, DPI, and Composite TV output.
The 3D engine also has an interface for submitting arbitrary compute shader-style jobs using the same shader processor as is used for vertex and fragment shaders in GLES 2.0. However, given that the hardware isn’t able to expose any standard interfaces like OpenGL compute shaders or OpenCL, it isn’t supported by this driver.
Display Hardware Handling
This section covers everything related to the display hardware including the mode setting infrastructure, plane, sprite and cursor handling and display, output probing and related topics.
Pixel Valve (DRM CRTC)
In VC4, the Pixel Valve is what most closely corresponds to the DRM’s concept of a CRTC. The PV generates video timings from the encoder’s clock plus its configuration. It pulls scaled pixels from the HVS at that timing, and feeds them to the encoder.
However, the DRM CRTC also collects the configuration of all the DRM planes attached to it. As a result, the CRTC is also responsible for writing the display list for the HVS channel that the CRTC will use.
The BCM2835 has 3 different pixel valves. pv0 in the audio power domain feeds DSI0 or DPI, while pv1 feeds DSI1 or SMI. pv2 in the image domain can feed either HDMI or the SDTV controller. The pixel valve chooses from the CPRMAN clocks (HSM for HDMI, VEC for SDTV, etc.) according to which output type is chosen in the mux.
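The routing just described can be summarized as a small lookup table. The following is a minimal illustrative sketch in Python, not driver code: the names `PIXEL_VALVE_OUTPUTS`, `OUTPUT_CLOCK` and `pixel_valve_for` are hypothetical, and only the HDMI and SDTV clocks are listed because those are the only ones the text names.

```python
# Illustrative model of the BCM2835 pixel valve routing described in the text.
# All names here are hypothetical; this is not the driver's data structure.

PIXEL_VALVE_OUTPUTS = {
    "pv0": ("DSI0", "DPI"),
    "pv1": ("DSI1", "SMI"),
    "pv2": ("HDMI", "SDTV"),
}

# CPRMAN clock per output type. Only the two given in the text are listed.
OUTPUT_CLOCK = {
    "HDMI": "HSM",
    "SDTV": "VEC",
}


def pixel_valve_for(output):
    """Return the name of the pixel valve that can feed the given output."""
    for pv, outputs in PIXEL_VALVE_OUTPUTS.items():
        if output in outputs:
            return pv
    raise ValueError(f"no pixel valve feeds {output!r}")
```

For example, `pixel_valve_for("HDMI")` selects `pv2`, whose timings would then run from the clock named in `OUTPUT_CLOCK["HDMI"]`.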
For power management, the pixel valve’s registers are all clocked by the AXI clock, while the timings and FIFOs make use of the output-specific clock. Since the encoders also directly consume the CPRMAN clocks, and know what timings they need, they are the ones that set the clock.
HVS
The Hardware Video Scaler (HVS) is the piece of hardware that does translation, scaling, colorspace conversion, and compositing of pixels stored in framebuffers into a FIFO of pixels going out to the Pixel Valve (CRTC). It operates at the system clock rate (the system audio clock gate, specifically), which is much higher than the pixel clock rate.
There is a single global HVS, with multiple output FIFOs that can be consumed by the PVs. This file just manages the resources for the HVS, while the vc4_crtc.c code actually drives HVS setup for each CRTC.
HVS planes
Each DRM plane is a layer of pixels being scanned out by the HVS.
At atomic modeset check time, we compute the HVS display element state that would be necessary for displaying the plane (giving us a chance to figure out if a plane configuration is invalid), then at atomic flush time the CRTC will ask us to write our element state into the region of the HVS that it has allocated for us.
HDMI encoder
The HDMI core has a state machine and a PHY. On BCM2835, most of the unit operates off of the HSM clock from CPRMAN. It also internally uses the PLLH_PIX clock for the PHY.
HDMI infoframes are kept within a small packet ram, where each packet can be individually enabled for inclusion in a frame.
HDMI audio is implemented entirely within the HDMI IP block. A register in the HDMI encoder takes SPDIF frames from the DMA engine and transfers them over an internal MAI (multi-channel audio interconnect) bus to the encoder side for insertion into the video blank regions.
The driver’s HDMI encoder does not yet support power management. The HDMI encoder’s power domain and the HSM/pixel clocks are kept continuously running, and only the HDMI logic and packet ram are powered off/on at disable/enable time.
The driver does not yet support CEC control, though the HDMI encoder block has CEC support.
DSI encoder
BCM2835 contains two DSI modules, DSI0 and DSI1. DSI0 is a single-lane DSI controller, while DSI1 is a more modern 4-lane DSI controller.
Most Raspberry Pi boards expose DSI1 as their “DISPLAY” connector, while the compute module brings both DSI0 and DSI1 out.
This driver has currently been tested only for DSI1 video-mode display, with most of the information necessary for DSI0 hopefully present.
DPI encoder
The VC4 DPI hardware supports MIPI DPI type 4 and Nokia ViSSI signals. On BCM2835, these can be routed out to GPIO0-27 with the ALT2 function.
VEC (Composite TV out) encoder
The VEC encoder generates PAL or NTSC composite video output.
TV mode selection is done by an atomic property on the encoder, because a drm_mode_modeinfo is insufficient to distinguish between PAL and PAL-M or NTSC and NTSC-J.
KUnit Tests
The VC4 Driver uses KUnit to perform driver-specific unit and integration tests.
These tests use a mock driver and can be run using the command below, on either arm or arm64 architectures:
$ ./tools/testing/kunit/kunit.py run \
    --kunitconfig=drivers/gpu/drm/vc4/tests/.kunitconfig \
    --cross_compile aarch64-linux-gnu- --arch arm64
Parts of the driver that are currently covered by tests are:
- The HVS to PixelValve dynamic FIFO assignment, for the BCM2835-7 and BCM2711.
Memory Management and 3D Command Submission
This section covers the GEM implementation in the vc4 driver.
GPU buffer object (BO) management
The VC4 GPU architecture (both scanout and rendering) has direct access to system memory with no MMU in between. To support it, we use the GEM DMA helper functions to allocate contiguous ranges of physical memory for our BOs.
Since the DMA allocator is very slow, we keep a cache of recently freed BOs around so that the kernel’s allocation of objects for 3D rendering can return quickly.
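The idea of caching freed buffers can be illustrated with a toy model. This is a minimal sketch in Python under stated assumptions, not the driver's implementation: the class `BoCache`, the `slow_alloc` callback standing in for the DMA allocator, the exact-size buckets, and the absence of any eviction policy are all simplifications made up for illustration.

```python
from collections import defaultdict


class BoCache:
    """Toy cache of freed buffers, bucketed by exact size (hypothetical)."""

    def __init__(self, slow_alloc):
        self._slow_alloc = slow_alloc    # stands in for the slow DMA allocator
        self._free = defaultdict(list)   # size in bytes -> cached free buffers

    def alloc(self, size):
        bucket = self._free[size]
        if bucket:
            return bucket.pop()          # fast path: reuse a recently freed buffer
        return self._slow_alloc(size)    # slow path: fall back to the allocator

    def free(self, buf):
        # Keep the buffer around instead of returning it to the allocator.
        self._free[len(buf)].append(buf)
```

With `cache = BoCache(bytearray)`, allocating 4096 bytes, freeing the buffer, and allocating 4096 bytes again returns the same object without a second call to the allocator.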
V3D binner command list (BCL) validation
Since the VC4 has no IOMMU between it and system memory, a user with access to execute command lists could escalate privilege by overwriting system memory (drawing to it as a framebuffer) or reading system memory it shouldn’t (reading it as a vertex buffer or index buffer).
We validate binner command lists to ensure that all accesses are within the bounds of the GEM objects referenced by the submitted job. The validator explicitly whitelists packets, and looks at the offsets in any address fields to make sure they’re contained within the BOs they reference.
Note that because CL validation is already reading the user-submitted CL and writing the validated copy out to the memory that the GPU will actually read, this is also where GEM relocation processing (turning BO references into actual addresses for the GPU to use) happens.
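The combination of allow-listing, bounds checking, and relocation in a single pass can be sketched as follows. This is an illustrative Python model only: the `Bo` class, its `paddr` field, the packet names in `ALLOWED_PACKETS`, and the `(name, bo_index, offset, length)` packet layout are all hypothetical and do not reflect the real V3D packet formats.

```python
from dataclasses import dataclass


@dataclass
class Bo:
    paddr: int   # address the GPU would use (hypothetical field)
    size: int    # size of the buffer in bytes


# Hypothetical packet names; the real hardware opcodes are not modeled here.
ALLOWED_PACKETS = {"LOAD_VERTICES", "LOAD_INDICES"}


def validate_and_relocate(packets, bos):
    """Check each (name, bo_index, offset, length) packet and return a
    validated copy with BO references replaced by absolute addresses."""
    validated = []
    for name, bo_index, offset, length in packets:
        if name not in ALLOWED_PACKETS:
            raise ValueError(f"packet {name!r} is not on the allow list")
        if not 0 <= bo_index < len(bos):
            raise ValueError(f"invalid BO index {bo_index}")
        bo = bos[bo_index]
        if offset < 0 or length < 0 or offset + length > bo.size:
            raise ValueError(f"access outside BO {bo_index}")
        # Relocation: the copy the GPU reads carries a real address.
        validated.append((name, bo.paddr + offset, length))
    return validated
```

The user-submitted list is never handed to the GPU directly; only the copy returned by the function is, which is why relocation fits naturally into the same loop.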
V3D render command list (RCL) generation
In the V3D hardware, render command lists are what load and store tiles of a framebuffer and optionally call out to binner-generated command lists to do the 3D drawing for that tile.
In the VC4 driver, render command list generation is performed by the kernel instead of userspace. We do this because validating a user-submitted command list is hard to get right and has high CPU overhead, while the number of valid configurations for render command lists is actually fairly low.
Shader validator for VC4
Since the VC4 has no IOMMU between it and system memory, a user with access to execute shaders could escalate privilege by overwriting system memory (using the VPM write address register in the general-purpose DMA mode) or reading system memory it shouldn’t (reading it as a texture, uniform data, or direct-addressed TMU lookup).
The shader validator walks over a shader’s BO, ensuring that its accesses are appropriately bounded, and recording where texture accesses are made so that we can do relocations for them in the uniform stream.
Shader BOs are immutable for their lifetimes (enforced by not allowing mmaps, GEM prime export, or rendering to them from a CL), so this validation is only performed at BO creation time.
V3D Interrupts
We have an interrupt status register (V3D_INTCTL) which reports interrupts, and where writing 1 bits clears those interrupts. There is also a pair of interrupt registers (V3D_INTENA/V3D_INTDIS) where writing a 1 to their bits enables or disables that specific interrupt, and 0s written are ignored (reading either one returns the set of enabled interrupts).
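The write-1-to-clear and write-1-to-enable/disable semantics are easy to get backwards, so a small model may help. This is a toy Python sketch of the behavior described in the text, not a hardware model: the class `V3dIrqRegs` and the `raise_irq` helper are hypothetical, and whether the status register also reports disabled interrupts is not stated in the text and is not modeled.

```python
class V3dIrqRegs:
    """Toy model of the register semantics described in the text."""

    def __init__(self):
        self.status = 0    # pending interrupts, as reported by V3D_INTCTL
        self.enabled = 0   # set of enabled interrupts

    def raise_irq(self, bits):
        # Hypothetical helper standing in for the hardware raising interrupts.
        self.status |= bits

    def write_intctl(self, value):
        self.status &= ~value     # writing 1 bits clears those interrupts

    def write_intena(self, value):
        self.enabled |= value     # 1 bits enable; 0 bits are ignored

    def write_intdis(self, value):
        self.enabled &= ~value    # 1 bits disable; 0 bits are ignored

    def read_intena(self):
        return self.enabled       # either read returns the enabled set

    def read_intdis(self):
        return self.enabled
```

Because 0 bits are ignored on both registers, a single interrupt can be enabled or disabled with one write and no read-modify-write cycle.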
When we take a binning flush done interrupt, we need to submit the next frame for binning and move the finished frame to the render thread.
When we take a render frame interrupt, we need to wake the processes waiting for some frame to be done, and get the next frame submitted ASAP (so the hardware doesn’t sit idle when there’s work to do).
When we take the binner out of memory interrupt, we need to allocate some new memory and pass it to the binner so that the current job can make progress.
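The three interrupt reactions above can be pictured as operations on two job queues. The following Python sketch is a loose illustration under heavy assumptions, not the driver's scheduler: `ToyScheduler`, its queue and method names, and the 4096-byte default chunk size are all hypothetical, and waking waiters is reduced to appending to a list.

```python
from collections import deque


class ToyScheduler:
    """Toy model: the head of each queue is the job the hardware is running."""

    def __init__(self):
        self.bin_queue = deque()      # jobs in or waiting for binning
        self.render_queue = deque()   # jobs in or waiting for rendering
        self.completed = []           # finished jobs, in completion order
        self.binner_memory = 0        # bytes handed to the binner so far

    def submit(self, job):
        self.bin_queue.append(job)

    def on_bin_flush_done(self):
        # The head job finished binning: hand it to rendering. The next
        # queued job, if any, becomes the active binning job.
        self.render_queue.append(self.bin_queue.popleft())

    def on_render_frame_done(self):
        # The head job finished rendering: record it so waiters can be woken.
        # The next queued job, if any, becomes the active render job.
        self.completed.append(self.render_queue.popleft())

    def on_binner_out_of_memory(self, chunk=4096):
        # Give the binner more memory so the current job can make progress.
        self.binner_memory += chunk
```

Keeping binning and rendering in separate queues is what lets one frame be binned while the previous one is still rendering.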