ioctl based interfaces

ioctl() is the most common way for applications to interface with device drivers. It is flexible and easily extended by adding new commands and can be passed through character devices, block devices as well as sockets and other special file descriptors.

However, it is also very easy to get ioctl command definitions wrong, and hard to fix them later without breaking existing applications, so this documentation tries to help developers get it right.

Command number definitions

The command number, or request number, is the second argument passed to the ioctl system call. While this can be any 32-bit number that uniquely identifies an action for a particular driver, there are a number of conventions around defining them.

include/uapi/asm-generic/ioctl.h provides four macros for defining ioctl commands that follow modern conventions: _IO, _IOR, _IOW, and _IOWR. These should be used for all new commands, with the correct parameters:

_IO/_IOR/_IOW/_IOWR
The macro name specifies how the argument will be used.  It may be a pointer to data to be passed into the kernel (_IOW), out of the kernel (_IOR), or both (_IOWR).  _IO can indicate either commands with no argument or those passing an integer value instead of a pointer. It is recommended to only use _IO for commands without arguments, and use pointers for passing data.
type
An 8-bit number, often a character literal, specific to a subsystem or driver, and listed in Ioctl Numbers
nr
An 8-bit number identifying the specific command, unique for a given value of ‘type’
data_type
The name of the data type pointed to by the argument. The command number encodes the sizeof(data_type) value in a 13-bit or 14-bit integer, leading to a limit of 8191 bytes for the maximum size of the argument. Note: do not pass sizeof(data_type) into _IOR/_IOW/_IOWR, as that will lead to encoding sizeof(sizeof(data_type)), i.e. sizeof(size_t). _IO does not have a data_type parameter.
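
As a minimal sketch of how these macros fit together, the following defines four commands for a hypothetical driver and checks the encoded size from user space. The FOO_* names, struct foo_state and the type value 0xF7 are placeholders invented for illustration; a real driver would use a type value listed in Ioctl Numbers.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/ioctl.h>
#include <linux/types.h>

/* Hypothetical argument structure; every member is naturally aligned. */
struct foo_state {
	__u32 flags;
	__u32 level;
	__u64 timestamp_ns;
};

/* Placeholder type value, not a registered one. */
#define FOO_IOC_MAGIC	0xF7

/* No argument. */
#define FOO_RESET	_IO(FOO_IOC_MAGIC, 0)
/* Data flows out of the kernel. */
#define FOO_GET_STATE	_IOR(FOO_IOC_MAGIC, 1, struct foo_state)
/* Data flows into the kernel. */
#define FOO_SET_STATE	_IOW(FOO_IOC_MAGIC, 2, struct foo_state)
/* Data flows in both directions. */
#define FOO_XCHG_STATE	_IOWR(FOO_IOC_MAGIC, 3, struct foo_state)

/* Passing the type name makes the command encode sizeof(struct foo_state). */
static_assert(_IOC_SIZE(FOO_GET_STATE) == sizeof(struct foo_state),
	      "command number encodes the argument size");

/* The mistake described above: this encodes sizeof(size_t) instead. */
#define FOO_GET_STATE_WRONG \
	_IOR(FOO_IOC_MAGIC, 1, sizeof(struct foo_state))
```

The _IOC_DIR(), _IOC_TYPE(), _IOC_NR() and _IOC_SIZE() macros from the same header decode the individual fields, which is convenient when checking a definition.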

Interface versions

Some subsystems use version numbers in data structures to overload commands with different interpretations of the argument.

This is generally a bad idea, since changes to existing commands tend to break existing applications.

A better approach is to add a new ioctl command with a new number. The old command still needs to be implemented in the kernel for compatibility, but this can be a wrapper around the new implementation.
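
The wrapper approach can be sketched in plain C as follows. This is a user-space model under stated assumptions, not kernel code: the FOO_* commands, both structures, the 0xF7 type value and the meaning of timeout_ns == 0 are all hypothetical.

```c
#include <errno.h>
#include <linux/ioctl.h>
#include <linux/types.h>

#define FOO_IOC_MAGIC	0xF7	/* placeholder type value */

/* Original argument layout; existing applications keep using it. */
struct foo_config {
	__u32 rate;
	__u32 flags;
};

/* Extended layout, introduced together with a new command number. */
struct foo_config2 {
	__u32 rate;
	__u32 flags;
	__u64 timeout_ns;
};

#define FOO_SET_CONFIG	_IOW(FOO_IOC_MAGIC, 4, struct foo_config)
#define FOO_SET_CONFIG2	_IOW(FOO_IOC_MAGIC, 5, struct foo_config2)

/* State of the hypothetical device. */
static struct foo_config2 foo_current;

/* The new implementation does all the work. */
static int foo_set_config2(const struct foo_config2 *cfg)
{
	if (cfg->rate == 0)
		return -EINVAL;
	foo_current = *cfg;
	return 0;
}

/* The old command is a thin wrapper that supplies defaults for new fields. */
static int foo_set_config(const struct foo_config *old)
{
	struct foo_config2 cfg = {
		.rate = old->rate,
		.flags = old->flags,
		.timeout_ns = 0,	/* assumed here to mean "no timeout" */
	};

	return foo_set_config2(&cfg);
}
```

Because the two structures differ in size, the two command numbers differ even if the same nr were reused, but picking a fresh nr keeps the definitions easier to read.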

Return code

ioctl commands can return negative error codes as documented in errno(3); these get turned into errno values in user space. On success, the return code should be zero. It is also possible but not recommended to return a positive ‘long’ value.

When the ioctl callback is called with an unknown command number, the handler returns either -ENOTTY or -ENOIOCTLCMD, which also results in -ENOTTY being returned from the system call. Some subsystems return -ENOSYS or -EINVAL here for historic reasons, but this is wrong.
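
These return conventions can be illustrated with a small dispatcher. The sketch below is a user-space stand-in for a driver's ioctl handler; foo_ioctl_model(), the FOO_* commands and the 0xF7 type value are hypothetical names chosen for the example.

```c
#include <errno.h>
#include <linux/ioctl.h>

#define FOO_IOC_MAGIC	0xF7	/* placeholder type value */
#define FOO_RESET	_IO(FOO_IOC_MAGIC, 0)
#define FOO_START	_IO(FOO_IOC_MAGIC, 6)

static int foo_running;

/* User-space stand-in for a driver's ioctl handler. */
static long foo_ioctl_model(unsigned int cmd, unsigned long arg)
{
	(void)arg;	/* both commands take no argument */

	switch (cmd) {
	case FOO_RESET:
		foo_running = 0;
		return 0;		/* zero on success */
	case FOO_START:
		if (foo_running)
			return -EBUSY;	/* negative errno on failure */
		foo_running = 1;
		return 0;
	default:
		return -ENOTTY;		/* unknown command: not -EINVAL or -ENOSYS */
	}
}
```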

Prior to Linux 5.5, compat_ioctl handlers were required to return -ENOIOCTLCMD in order to use the fallback conversion into native commands. As all subsystems are now responsible for handling compat mode themselves, this is no longer needed, but it may be important to consider when backporting bug fixes to older kernels.

Timestamps

Traditionally, timestamps and timeout values are passed as struct timespec or struct timeval, but these are problematic because of incompatible definitions of these structures in user space after the move to 64-bit time_t.

The struct __kernel_timespec type can be used instead to be embedded in other data structures when separate second/nanosecond values are desired, or passed to user space directly. This is still not ideal though, as the structure matches neither the kernel’s timespec64 nor the user space timespec exactly. The get_timespec64() and put_timespec64() helper functions can be used to ensure that the layout remains compatible with user space and the padding is treated correctly.

As it is cheap to convert seconds to nanoseconds, but the opposite requires an expensive 64-bit division, a simple __u64 nanosecond value can be simpler and more efficient.
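
The asymmetry between the two conversions is visible in a short sketch. The helper names foo_ts_to_ns() and foo_ns_to_ts() and the FOO_NSEC_PER_SEC constant are made up for this example, and the code assumes non-negative, normalized time values.

```c
#include <linux/time_types.h>
#include <linux/types.h>

#define FOO_NSEC_PER_SEC	1000000000ULL

/* Seconds to nanoseconds: one multiplication and one addition. */
static inline __u64 foo_ts_to_ns(const struct __kernel_timespec *ts)
{
	return (__u64)ts->tv_sec * FOO_NSEC_PER_SEC + (__u64)ts->tv_nsec;
}

/* Nanoseconds to seconds: a 64-bit division plus a remainder. */
static inline struct __kernel_timespec foo_ns_to_ts(__u64 ns)
{
	struct __kernel_timespec ts = {
		.tv_sec = (__kernel_time64_t)(ns / FOO_NSEC_PER_SEC),
		.tv_nsec = (long long)(ns % FOO_NSEC_PER_SEC),
	};

	return ts;
}
```

An interface that carries a single __u64 nanosecond value only ever needs the first helper on the side that starts from a timespec.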

Timeout values and timestamps should ideally use CLOCK_MONOTONIC time, as returned by ktime_get_ns() or ktime_get_ts64(). Unlike CLOCK_REALTIME, this makes the timestamps immune from jumping backwards or forwards due to leap second adjustments and clock_settime() calls.

ktime_get_real_ns() can be used for CLOCK_REALTIME timestamps that need to be persistent across a reboot or between multiple machines.

32-bit compat mode

In order to support 32-bit user space running on a 64-bit machine, each subsystem or driver that implements an ioctl callback handler must also implement the corresponding compat_ioctl handler.

As long as all the rules for data structures are followed, this is as easy as setting the .compat_ioctl pointer to a helper function such as compat_ptr_ioctl() or blkdev_compat_ptr_ioctl().

compat_ptr()

On the s390 architecture, 31-bit user space has ambiguous representations for data pointers, with the upper bit being ignored. When running such a process in compat mode, the compat_ptr() helper must be used to clear the upper bit of a compat_uptr_t and turn it into a valid 64-bit pointer. On other architectures, this macro only performs a cast to a void __user * pointer.

In a compat_ioctl() callback, the last argument is an unsigned long, which can be interpreted as either a pointer or a scalar depending on the command. If it is a scalar, then compat_ptr() must not be used, to ensure that the 64-bit kernel behaves the same way as a 32-bit kernel for arguments with the upper bit set.
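
The effect of the upper bit can be modelled in user space. The sketch below is not the kernel's compat_ptr(); the foo_* names are hypothetical, and the functions return plain integers so that the difference between the two behaviours can be inspected directly.

```c
#include <stdint.h>

/* Stand-in for the kernel's compat_uptr_t, a 32-bit user pointer value. */
typedef uint32_t foo_compat_uptr_t;

/* Model of the s390 behaviour: bit 31 is not part of a 31-bit address. */
static inline uint64_t foo_compat_ptr_s390(foo_compat_uptr_t uptr)
{
	return (uint64_t)(uptr & 0x7fffffffU);
}

/* Model of other architectures: a plain zero-extending conversion. */
static inline uint64_t foo_compat_ptr_generic(foo_compat_uptr_t uptr)
{
	return (uint64_t)uptr;
}
```

In this model, foo_compat_ptr_s390(0x80001000) yields 0x1000, which is the desired result for a pointer. A scalar argument such as a bit mask with bit 31 set would lose that bit if it were passed through the same conversion, which is why scalars must bypass it.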

The compat_ptr_ioctl() helper can be used in place of a custom compat_ioctl file operation for drivers that only take arguments that are pointers to compatible data structures.

Structure layout

Compatible data structures have the same layout on all architectures, avoiding all problematic members:

  • long and unsigned long are the size of a register, so they can be either 32-bit or 64-bit wide and cannot be used in portable data structures. Fixed-length replacements are __s32, __u32, __s64 and __u64.

  • Pointers have the same problem, in addition to requiring the use of compat_ptr(). The best workaround is to use __u64 in place of pointers, which requires a cast to uintptr_t in user space, and the use of u64_to_user_ptr() in the kernel to convert it back into a user pointer.

  • On the x86-32 (i386) architecture, the alignment of 64-bit variables is only 32-bit, but they are naturally aligned on most other architectures including x86-64. This means a structure like:

    struct foo {
        __u32 a;
        __u64 b;
        __u32 c;
    };

    has four bytes of padding between a and b on x86-64, plus another four bytes of padding at the end, but no padding on i386, and it needs a compat_ioctl conversion handler to translate between the two formats.

    To avoid this problem, all structures should have their members naturally aligned, or explicit reserved fields added in place of the implicit padding. The pahole tool can be used for checking the alignment.

  • On ARM OABI user space, structures are padded to multiples of 32-bit, making some structs incompatible with modern EABI kernels if they do not end on a 32-bit boundary.

  • On the m68k architecture, struct members are not guaranteed to have an alignment greater than 16-bit, which is a problem when relying on implicit padding.

  • Bitfields and enums generally work as one would expect them to, but some properties of them are implementation-defined, so it is better to avoid them completely in ioctl interfaces.

  • char members can be either signed or unsigned, depending on the architecture, so the __u8 and __s8 types should be used for 8-bit integer values, though char arrays are clearer for fixed-length strings.
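
Two of the rules above, explicit padding and __u64 in place of pointers, can be sketched together. The structures and the foo_xfer_init() helper are hypothetical examples rather than definitions from any real interface; the static assertions fail the build if the layout ever picks up implicit padding.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/types.h>

/*
 * Variant of the three-member structure discussed above with the
 * implicit padding made explicit, so i386 and x86-64 agree on the layout.
 */
struct foo_fixed {
	__u32 a;
	__u32 reserved1;	/* replaces the padding between a and b */
	__u64 b;
	__u32 c;
	__u32 reserved2;	/* replaces the padding at the end */
};

static_assert(offsetof(struct foo_fixed, b) == 8, "b is naturally aligned");
static_assert(sizeof(struct foo_fixed) == 24, "no implicit padding");

/* A user buffer is described by a __u64 address instead of a pointer. */
struct foo_xfer {
	__u64 buf;		/* user space address */
	__u32 len;
	__u32 reserved;
};

static_assert(sizeof(struct foo_xfer) == 16, "no implicit padding");

/* User space side: store the pointer through a uintptr_t cast. */
static inline void foo_xfer_init(struct foo_xfer *x, void *buf, __u32 len)
{
	x->buf = (__u64)(uintptr_t)buf;
	x->len = len;
	x->reserved = 0;	/* reserved fields are always zeroed */
}
```

On the kernel side, the buf member would be turned back into a user pointer with u64_to_user_ptr(), as described in the list above.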

Information leaks

Uninitialized data must not be copied back to user space, as this can cause an information leak, which can be used to defeat kernel address space layout randomization (KASLR), helping in an attack.

For this reason (and for compat support) it is best to avoid any implicit padding in data structures.  Where there is implicit padding in an existing structure, kernel drivers must be careful to fully initialize an instance of the structure before copying it to user space.  This is usually done by calling memset() before assigning to individual members.
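
The memset() pattern looks roughly like this. The sketch runs in user space, so memcpy() stands in for copy_to_user(); struct foo_info and foo_get_info() are hypothetical, and the comment about three padding bytes assumes an architecture where __u32 is 32-bit aligned.

```c
#include <string.h>
#include <linux/types.h>

/* Hypothetical legacy structure that contains implicit padding. */
struct foo_info {
	__u8 mode;
	/* three bytes of implicit padding where __u32 is 32-bit aligned */
	__u32 value;
};

/*
 * Fill a reply and copy it out. memcpy() stands in for copy_to_user(),
 * which is only available inside the kernel.
 */
static void foo_get_info(void *user_buf)
{
	struct foo_info info;

	/* Clear the whole object first, including the padding bytes. */
	memset(&info, 0, sizeof(info));
	info.mode = 1;
	info.value = 42;

	memcpy(user_buf, &info, sizeof(info));
}
```

Without the memset() call, the padding bytes would carry whatever happened to be on the stack, and the full-size copy would hand them to the caller.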

Subsystem abstractions

While some device drivers implement their own ioctl function, most subsystems implement the same command for multiple drivers. Ideally the subsystem has an .ioctl() handler that copies the arguments from and to user space, passing them into subsystem specific callback functions through normal kernel pointers.

This helps in various ways:

  • Applications written for one driver are more likely to work for another one in the same subsystem if there are no subtle differences in the user space ABI.
  • The complexity of user space access and data structure layout is handled in one place, reducing the potential for implementation bugs.
  • It is more likely to be reviewed by experienced developers that can spot problems in the interface when the ioctl is shared between multiple drivers than when it is only used in a single driver.
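
The split between a subsystem handler and per-driver callbacks might look like the following user-space model. Everything here is an assumption made for illustration: the foo_* subsystem, the bar_* driver, the commands and the 0xF7 type value are invented, and memcpy() stands in for copy_from_user() and copy_to_user().

```c
#include <errno.h>
#include <string.h>
#include <linux/ioctl.h>
#include <linux/types.h>

struct foo_state {
	__u32 flags;
	__u32 level;
};

#define FOO_IOC_MAGIC	0xF7	/* placeholder type value */
#define FOO_GET_STATE	_IOR(FOO_IOC_MAGIC, 1, struct foo_state)
#define FOO_SET_STATE	_IOW(FOO_IOC_MAGIC, 2, struct foo_state)

/* Per-driver callbacks only ever see normal pointers. */
struct foo_driver_ops {
	int (*get_state)(struct foo_state *state);
	int (*set_state)(const struct foo_state *state);
};

/* Subsystem handler: the only place that touches the user buffer. */
static long foo_subsys_ioctl(const struct foo_driver_ops *ops,
			     unsigned int cmd, void *user_arg)
{
	struct foo_state state;
	int ret;

	switch (cmd) {
	case FOO_GET_STATE:
		memset(&state, 0, sizeof(state));
		ret = ops->get_state(&state);
		if (ret)
			return ret;
		/* copy_to_user() in a real kernel */
		memcpy(user_arg, &state, sizeof(state));
		return 0;
	case FOO_SET_STATE:
		/* copy_from_user() in a real kernel */
		memcpy(&state, user_arg, sizeof(state));
		return ops->set_state(&state);
	default:
		return -ENOTTY;
	}
}

/* One hypothetical driver plugging into the subsystem. */
static struct foo_state bar_state;

static int bar_get_state(struct foo_state *state)
{
	*state = bar_state;
	return 0;
}

static int bar_set_state(const struct foo_state *state)
{
	bar_state = *state;
	return 0;
}

static const struct foo_driver_ops bar_ops = {
	.get_state = bar_get_state,
	.set_state = bar_set_state,
};
```

A second driver would only provide another foo_driver_ops instance, so the command numbers, structure layout and copy logic stay identical across drivers.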

Alternatives to ioctl

There are many cases in which ioctl is not the best solution for a problem. Alternatives include:

  • System calls are a better choice for a system-wide feature that is not tied to a physical device or constrained by the file system permissions of a character device node.
  • netlink is the preferred way of configuring any network related objects through sockets.
  • debugfs is used for ad-hoc interfaces for debugging functionality that does not need to be exposed as a stable interface to applications.
  • sysfs is a good way to expose the state of an in-kernel object that is not tied to a file descriptor.
  • configfs can be used for more complex configuration than sysfs.
  • A custom file system can provide extra flexibility with a simple user interface but adds a lot of complexity to the implementation.