(How to avoid) Botching up ioctls¶
From:https://blog.ffwll.ch/2013/11/botching-up-ioctls.html
By: Daniel Vetter, Copyright © 2013 Intel Corporation
One clear insight kernel graphics hackers gained in the past few years is thattrying to come up with a unified interface to manage the execution units andmemory on completely different GPUs is a futile effort. So nowadays everydriver has its own set of ioctls to allocate memory and submit work to the GPU.Which is nice, since there’s no more insanity in the form of fake-generic, butactually only used once interfaces. But the clear downside is that there’s muchmore potential to screw things up.
To avoid repeating all the same mistakes again I’ve written up some of thelessons learned while botching the job for the drm/i915 driver. Most of theseonly cover technicalities and not the big-picture issues like what the commandsubmission ioctl exactly should look like. Learning these lessons is probablysomething every GPU driver has to do on its own.
Prerequisites¶
First the prerequisites. Without these you have already failed, because youwill need to add a 32-bit compat layer:
Only use fixed sized integers. To avoid conflicts with typedefs in userspacethe kernel has special types like __u32, __s64. Use them.
Align everything to the natural size and use explicit padding. 32-bitplatforms don’t necessarily align 64-bit values to 64-bit boundaries, but64-bit platforms do. So we always need padding to the natural size to getthis right.
Pad the entire
structtoa multiple of 64-bits if the structure contains64-bit types - the structure size will otherwise differ on 32-bit versus64-bit. Having a different structure size hurts when passing arrays ofstructures to the kernel, or if the kernel checks the structure size, whiche.g. the drm core does.Pointers are __u64, cast from/to a uintptr_t on the userspace side andfrom/to a void __user * in the kernel. Try really hard not to delay thisconversion or worse, fiddle the raw __u64 through your code since thatdiminishes the checking tools like sparse can provide. The macrou64_to_user_ptr can be used in the kernel to avoid warnings about integersand pointers of different sizes.
Basics¶
With the joys of writing a compat layer avoided we can take a look at the basicfumbles. Neglecting these will make backward and forward compatibility a realpain. And since getting things wrong on the first attempt is guaranteed youwill have a second iteration or at least an extension for any given interface.
Have a clear way for userspace to figure out whether your new ioctl or ioctlextension is supported on a given kernel. If you can’t rely on old kernelsrejecting the new flags/modes or ioctls (since doing that was botched in thepast) then you need a driver feature flag or revision number somewhere.
Have a plan for extending ioctls with new flags or new fields at the end ofthe structure. The drm core checks the passed-in size for each ioctl calland zero-extends any mismatches between kernel and userspace. That helps,but isn’t a complete solution since newer userspace on older kernels won’tnotice that the newly added fields at the end get ignored. So this stillneeds a new driver feature flags.
Check all unused fields and flags and all the padding for whether it’s 0,and reject the ioctl if that’s not the case. Otherwise your nice plan forfuture extensions is going right down the gutters since someone will submitan ioctl
structwithrandom stack garbage in the yet unused parts. Whichthen bakes in the ABI that those fields can never be used for anything elsebut garbage. This is also the reason why you must explicitly pad allstructures, even if you never use them in an array - the padding the compilermight insert could contain garbage.Have simple testcases for all of the above.
Fun with Error Paths¶
Nowadays we don’t have any excuse left any more for drm drivers being neatlittle root exploits. This means we both need full input validation and soliderror handling paths - GPUs will die eventually in the oddmost corner casesanyway:
The ioctl must check for array overflows. Also it needs to check forover/underflows and clamping issues of integer values in general. The usualexample is sprite positioning values fed directly into the hardware with thehardware just having 12 bits or so. Works nicely until some odd displayserver doesn’t bother with clamping itself and the cursor wraps around thescreen.
Have simple testcases for every input validation failure case in your ioctl.Check that the error code matches your expectations. And finally make surethat you only test for one single error path in each subtest by submittingotherwise perfectly valid data. Without this an earlier check might rejectthe ioctl already and shadow the codepath you actually want to test, hidingbugs and regressions.
Make all your ioctls restartable. First X really loves signals and secondthis will allow you to test 90% of all error handling paths by justinterrupting your main test suite constantly with signals. Thanks to X’slove for signal you’ll get an excellent base coverage of all your errorpaths pretty much for free for graphics drivers. Also, be consistent withhow you handle ioctl restarting - e.g. drm has a tiny drmIoctl helper in itsuserspace library. The i915 driver botched this with the set_tiling ioctl,now we’re stuck forever with some arcane semantics in both the kernel anduserspace.
If you can’t make a given codepath restartable make a stuck task at leastkillable. GPUs just die and your users won’t like you more if you hang theirentire box (by means of an unkillable X process). If the state recovery isstill too tricky have a timeout or hangcheck safety net as a last-ditcheffort in case the hardware has gone bananas.
Have testcases for the really tricky corner cases in your error recovery code- it’s way too easy to create a deadlock between your hangcheck code andwaiters.
Time, Waiting and Missing it¶
GPUs do most everything asynchronously, so we have a need to time operations andwait for outstanding ones. This is really tricky business; at the moment none ofthe ioctls supported by the drm/i915 get this fully right, which means there’sstill tons more lessons to learn here.
Use CLOCK_MONOTONIC as your reference time, always. It’s what alsa, drm andv4l use by default nowadays. But let userspace know which timestamps arederived from different clock domains like your main system clock (providedby the kernel) or some independent hardware counter somewhere else. Clockswill mismatch if you look close enough, but if performance measuring toolshave this information they can at least compensate. If your userspace canget at the raw values of some clocks (e.g. through in-command-streamperformance counter sampling instructions) consider exposing those also.
Use __s64 seconds plus __u64 nanoseconds to specify time. It’s not the mostconvenient time specification, but it’s mostly the standard.
Check that input time values are normalized and reject them if not. Notethat the kernel native
structktimehas a signed integer for both secondsand nanoseconds, so beware here.For timeouts, use absolute times. If you’re a good fellow and made yourioctl restartable relative timeouts tend to be too coarse and canindefinitely extend your wait time due to rounding on each restart.Especially if your reference clock is something really slow like the displayframe counter. With a spec lawyer hat on this isn’t a bug since timeouts canalways be extended - but users will surely hate you if their neat animationsstarts to stutter due to this.
Consider ditching any synchronous wait ioctls with timeouts and just deliveran asynchronous event on a pollable file descriptor. It fits much betterinto event driven applications’ main loop.
Have testcases for corner-cases, especially whether the return values foralready-completed events, successful waits and timed-out waits are all saneand suiting to your needs.
Leaking Resources, Not¶
A full-blown drm driver essentially implements a little OS, but specialized tothe given GPU platforms. This means a driver needs to expose tons of handlesfor different objects and other resources to userspace. Doing that rightentails its own little set of pitfalls:
Always attach the lifetime of your dynamically created resources to thelifetime of a file descriptor. Consider using a 1:1 mapping if your resourceneeds to be shared across processes - fd-passing over unix domain socketsalso simplifies lifetime management for userspace.
Always have O_CLOEXEC support.
Ensure that you have sufficient insulation between different clients. Bydefault pick a private per-fd namespace which forces any sharing to be doneexplicitly. Only go with a more global per-device namespace if the objectsare truly device-unique. One counterexample in the drm modeset interfaces isthat the per-device modeset objects like connectors share a namespace withframebuffer objects, which mostly are not shared at all. A separatenamespace, private by default, for framebuffers would have been moresuitable.
Think about uniqueness requirements for userspace handles. E.g. for most drmdrivers it’s a userspace bug to submit the same object twice in the samecommand submission ioctl. But then if objects are shareable userspace needsto know whether it has seen an imported object from a different processalready or not. I haven’t tried this myself yet due to lack of a new classof objects, but consider using inode numbers on your shared file descriptorsas unique identifiers - it’s how real files are told apart, too.Unfortunately this requires a full-blown virtual filesystem in the kernel.
Last, but not Least¶
Not every problem needs a new ioctl:
Think hard whether you really want a driver-private interface. Of courseit’s much quicker to push a driver-private interface than engaging inlengthy discussions for a more generic solution. And occasionally doing aprivate interface to spearhead a new concept is what’s required. But in theend, once the generic interface comes around you’ll end up maintaining twointerfaces. Indefinitely.
Consider other interfaces than ioctls. A sysfs attribute is much better forper-device settings, or for child objects with fairly static lifetimes (likeoutput connectors in drm with all the detection override attributes). Ormaybe only your testsuite needs this interface, and then debugfs with itsdisclaimer of not having a stable ABI would be better.
Finally, the name of the game is to get it right on the first attempt, since ifyour driver proves popular and your hardware platforms long-lived then you’llbe stuck with a given ioctl essentially forever. You can try to deprecatehorrible ioctls on newer iterations of your hardware, but generally it takesyears to accomplish this. And then again years until the last user able tocomplain about regressions disappears, too.