The Userspace I/O HOWTO¶
| Author: | Hans-Jürgen Koch Linux developer, Linutronix |
|---|---|
| Date: | 2006-12-11 |
About this document¶
Translations¶
If you know of any translations for this document, or you are interested in translating it, please email me at hjk@hansjkoch.de.
Preface¶
For many types of devices, creating a Linux kernel driver is overkill. All that is really needed is some way to handle an interrupt and provide access to the memory space of the device. The logic of controlling the device does not necessarily have to be within the kernel, as the device does not need to take advantage of any of the other resources that the kernel provides. One common class of such devices is industrial I/O cards.
To address this situation, the userspace I/O system (UIO) was designed. For typical industrial I/O cards, only a very small kernel module is needed. The main part of the driver will run in user space. This simplifies development and reduces the risk of serious bugs within a kernel module.
Please note that UIO is not a universal driver interface. Devices that are already handled well by other kernel subsystems (like networking or serial or USB) are no candidates for a UIO driver. Hardware that is ideally suited for a UIO driver fulfills all of the following:
- The device has memory that can be mapped. The device can be controlled completely by writing to this memory.
- The device usually generates interrupts.
- The device does not fit into one of the standard kernel subsystems.
Acknowledgments¶
I’d like to thank Thomas Gleixner and Benedikt Spranger of Linutronix, who have not only written most of the UIO code, but also helped greatly in writing this HOWTO by giving me all kinds of background information.
Feedback¶
Find something wrong with this document? (Or perhaps something right?) I would love to hear from you. Please email me at hjk@hansjkoch.de.
About UIO¶
If you use UIO for your card’s driver, here’s what you get:
- only one small kernel module to write and maintain.
- develop the main part of your driver in user space, with all the tools and libraries you’re used to.
- bugs in your driver won’t crash the kernel.
- updates of your driver can take place without recompiling the kernel.
How UIO works¶
Each UIO device is accessed through a device file and several sysfs attribute files. The device file will be called /dev/uio0 for the first device, and /dev/uio1, /dev/uio2 and so on for subsequent devices.
/dev/uioX is used to access the address space of the card. Just use mmap() to access registers or RAM locations of your card.
Interrupts are handled by reading from /dev/uioX. A blocking read() from /dev/uioX will return as soon as an interrupt occurs. You can also use select() on /dev/uioX to wait for an interrupt. The integer value read from /dev/uioX represents the total interrupt count. You can use this number to figure out if you missed some interrupts.
For some hardware that has more than one interrupt source internally, but not separate IRQ mask and status registers, there might be situations where userspace cannot determine what the interrupt source was if the kernel handler disables them by writing to the chip’s IRQ register. In such a case, the kernel has to disable the IRQ completely to leave the chip’s register untouched. Now the userspace part can determine the cause of the interrupt, but it cannot re-enable interrupts. Another corner case is chips where re-enabling interrupts is a read-modify-write operation to a combined IRQ status/acknowledge register. This would be racy if a new interrupt occurred simultaneously.
To address these problems, UIO also implements a write() function. It is normally not used and can be ignored for hardware that has only a single interrupt source or has separate IRQ mask and status registers. If you need it, however, a write to /dev/uioX will call the irqcontrol() function implemented by the driver. You have to write a 32-bit value that is usually either 0 or 1 to disable or enable interrupts. If a driver does not implement irqcontrol(), write() will return with -ENOSYS.
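The write() interface described above can be sketched from the userspace side as follows. This is a minimal illustration, not part of the UIO sources: the helper name uio_irq_control() is hypothetical, and it assumes fd is a descriptor for /dev/uioX that was opened for writing.

```c
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a 32-bit value to an open UIO descriptor.
 * on = 1 asks the driver's irqcontrol() to enable interrupts, 0 to
 * disable them. Returns 0 on success and -1 if write() failed or wrote
 * fewer than 4 bytes (on failure errno is set by write(), for example
 * to ENOSYS when the driver has no irqcontrol()).
 */
int uio_irq_control(int fd, int32_t on)
{
    ssize_t n = write(fd, &on, sizeof(on));

    return (n == (ssize_t)sizeof(on)) ? 0 : -1;
}
```

A driver loop would typically call uio_irq_control(fd, 1) after it has finished servicing the device and before it blocks again.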
To handle interrupts properly, your custom kernel module can provide its own interrupt handler. It will automatically be called by the built-in handler.
For cards that don’t generate interrupts but need to be polled, there is the possibility to set up a timer that triggers the interrupt handler at configurable time intervals. This interrupt simulation is done by calling uio_event_notify() from the timer’s event handler.
Each driver provides attributes that are used to read or write variables. These attributes are accessible through sysfs files. A custom kernel driver module can add its own attributes to the device owned by the uio driver, but not to the UIO device itself at this time. This might change in the future if it is found to be useful.
The following standard attributes are provided by the UIO framework:
- name: The name of your device. It is recommended to use the name of your kernel module for this.
- version: A version string defined by your driver. This allows the user space part of your driver to deal with different versions of the kernel module.
- event: The total number of interrupts handled by the driver since the last time the device node was read.
These attributes appear under the /sys/class/uio/uioX directory. Please note that this directory might be a symlink, and not a real directory. Any userspace code that accesses it must be able to handle this.
Each UIO device can make one or more memory regions available for memory mapping. This is necessary because some industrial I/O cards require access to more than one PCI memory region in a driver.
Each mapping has its own directory in sysfs. The first mapping appears as /sys/class/uio/uioX/maps/map0/. Subsequent mappings create directories map1/, map2/, and so on. These directories will only appear if the size of the mapping is not 0.
Each mapX/ directory contains four read-only files that show attributes of the memory:
- name: A string identifier for this mapping. This is optional, the string can be empty. Drivers can set this to make it easier for userspace to find the correct mapping.
- addr: The address of memory that can be mapped.
- size: The size, in bytes, of the memory pointed to by addr.
- offset: The offset, in bytes, that has to be added to the pointer returned by mmap() to get to the actual device memory. This is important if the device’s memory is not page aligned. Remember that pointers returned by mmap() are always page aligned, so it is good style to always add this offset.
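As a rough sketch of how userspace might read the numeric attributes listed above, the following helper parses addr, size or offset for a given mapping. The function name and the sysfs_root parameter are assumptions made for this illustration (normally the root would be /sys/class/uio). Parsing with base 0 accepts both hexadecimal values with a 0x prefix and plain decimal, so the sketch does not depend on the exact output format.

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical helper: read one numeric attribute ("addr", "size" or
 * "offset") of mapping map_idx of device uio_idx.
 * Returns 0 on success, -1 if the file is missing or cannot be parsed.
 */
int uio_read_map_attr(const char *sysfs_root, int uio_idx, int map_idx,
                      const char *attr, unsigned long long *value)
{
    char path[256];
    char buf[64];
    char *end;
    FILE *f;

    snprintf(path, sizeof(path), "%s/uio%d/maps/map%d/%s",
             sysfs_root, uio_idx, map_idx, attr);
    f = fopen(path, "r");
    if (!f)
        return -1;              /* mapping or attribute does not exist */
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1;
    }
    fclose(f);

    errno = 0;
    *value = strtoull(buf, &end, 0);    /* base 0: "0x..." or decimal */
    if (errno != 0 || end == buf)
        return -1;
    return 0;
}
```

Because a mapX/ directory only exists when the mapping size is not 0, a failure for a given index can be treated as "no such mapping".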
From userspace, the different mappings are distinguished by adjusting the offset parameter of the mmap() call. To map the memory of mapping N, you have to use N times the page size as your offset:
offset = N * getpagesize();
Sometimes there is hardware with memory-like regions that cannot be mapped with the technique described here, but there are still ways to access them from userspace. The most common example is x86 ioports. On x86 systems, userspace can access these ioports using ioperm(), iopl(), inb(), outb(), and similar functions.
Since these ioport regions cannot be mapped, they will not appear under /sys/class/uio/uioX/maps/ like the normal memory described above. Without information about the port regions a device has to offer, it becomes difficult for the userspace part of the driver to find out which ports belong to which UIO device.
To address this situation, the new directory /sys/class/uio/uioX/portio/ was added. It only exists if the driver wants to pass information about one or more port regions to userspace. If that is the case, subdirectories named port0, port1, and so on, will appear underneath /sys/class/uio/uioX/portio/.
Each portX/ directory contains four read-only files that show name, start, size, and type of the port region:
- name: A string identifier for this port region. The string is optional and can be empty. Drivers can set it to make it easier for userspace to find a certain port region.
- start: The first port of this region.
- size: The number of ports in this region.
- porttype: A string describing the type of port.
Writing your own kernel module¶
Please have a look at uio_cif.c as an example. The following paragraphs explain the different sections of this file.
struct uio_info¶
This structure tells the framework the details of your driver. Some of the members are required, others are optional.
- const char *name: Required. The name of your driver as it will appear in sysfs. I recommend using the name of your module for this.
- const char *version: Required. This string appears in /sys/class/uio/uioX/version.
- struct uio_mem mem[MAX_UIO_MAPS]: Required if you have memory that can be mapped with mmap(). For each mapping you need to fill one of the uio_mem structures. See the description below for details.
- struct uio_port port[MAX_UIO_PORT_REGIONS]: Required if you want to pass information about ioports to userspace. For each port region you need to fill one of the uio_port structures. See the description below for details.
- long irq: Required. If your hardware generates an interrupt, it’s your module’s task to determine the irq number during initialization. If you don’t have a hardware generated interrupt but want to trigger the interrupt handler in some other way, set irq to UIO_IRQ_CUSTOM. If you had no interrupt at all, you could set irq to UIO_IRQ_NONE, though this rarely makes sense.
- unsigned long irq_flags: Required if you’ve set irq to a hardware interrupt number. The flags given here will be used in the call to request_irq().
- int (*mmap)(struct uio_info *info, struct vm_area_struct *vma): Optional. If you need a special mmap() function, you can set it here. If this pointer is not NULL, your mmap() will be called instead of the built-in one.
- int (*open)(struct uio_info *info, struct inode *inode): Optional. You might want to have your own open(), e.g. to enable interrupts only when your device is actually used.
- int (*release)(struct uio_info *info, struct inode *inode): Optional. If you define your own open(), you will probably also want a custom release() function.
- int (*irqcontrol)(struct uio_info *info, s32 irq_on): Optional. If you need to be able to enable or disable interrupts from userspace by writing to /dev/uioX, you can implement this function. The parameter irq_on will be 0 to disable interrupts and 1 to enable them.
Usually, your device will have one or more memory regions that can be mapped to user space. For each region, you have to set up a struct uio_mem in the mem[] array. Here’s a description of the fields of struct uio_mem:
- const char *name: Optional. Set this to help identify the memory region, it will show up in the corresponding sysfs node.
- int memtype: Required if the mapping is used. Set this to UIO_MEM_PHYS if you have physical memory on your card to be mapped. Use UIO_MEM_LOGICAL for logical memory (e.g. allocated with __get_free_pages() but not kmalloc()). There’s also UIO_MEM_VIRTUAL for virtual memory.
- phys_addr_t addr: Required if the mapping is used. Fill in the address of your memory block. This address is the one that appears in sysfs.
- resource_size_t size: Fill in the size of the memory block that addr points to. If size is zero, the mapping is considered unused. Note that you must initialize size with zero for all unused mappings.
- void *internal_addr: If you have to access this memory region from within your kernel module, you will want to map it internally by using something like ioremap(). Addresses returned by this function cannot be mapped to user space, so you must not store them in addr. Use internal_addr instead to remember such an address.
Please do not touch the map element of struct uio_mem! It is used by the UIO framework to set up sysfs files for this mapping. Simply leave it alone.
Sometimes, your device can have one or more port regions which cannot be mapped to userspace. But if there are other possibilities for userspace to access these ports, it makes sense to make information about the ports available in sysfs. For each region, you have to set up a struct uio_port in the port[] array. Here’s a description of the fields of struct uio_port:
- int porttype: Required. Set this to one of the predefined constants. Use UIO_PORT_X86 for the ioports found in x86 architectures.
- unsigned long start: Required if the port region is used. Fill in the number of the first port of this region.
- unsigned long size: Fill in the number of ports in this region. If size is zero, the region is considered unused. Note that you must initialize size with zero for all unused regions.
Please do not touch the portio element of struct uio_port! It is used internally by the UIO framework to set up sysfs files for this region. Simply leave it alone.
Adding an interrupt handler¶
What you need to do in your interrupt handler depends on your hardware and on how you want to handle it. You should try to keep the amount of code in your kernel interrupt handler low. If your hardware requires no action that you have to perform after each interrupt, then your handler can be empty.
If, on the other hand, your hardware needs some action to be performed after each interrupt, then you must do it in your kernel module. Note that you cannot rely on the userspace part of your driver. Your userspace program can terminate at any time, possibly leaving your hardware in a state where proper interrupt handling is still required.
There might also be applications where you want to read data from your hardware at each interrupt and buffer it in a piece of kernel memory you’ve allocated for that purpose. With this technique you could avoid loss of data if your userspace program misses an interrupt.
A note on shared interrupts: Your driver should support interrupt sharing whenever this is possible. It is possible if and only if your driver can detect whether your hardware has triggered the interrupt or not. This is usually done by looking at an interrupt status register. If your driver sees that the IRQ bit is actually set, it will perform its actions, and the handler returns IRQ_HANDLED. If the driver detects that it was not your hardware that caused the interrupt, it will do nothing and return IRQ_NONE, allowing the kernel to call the next possible interrupt handler.
If you decide not to support shared interrupts, your card won’t work in computers with no free interrupts. As this frequently happens on the PC platform, you can save yourself a lot of trouble by supporting interrupt sharing.
Using uio_pdrv for platform devices¶
In many cases, UIO drivers for platform devices can be handled in a generic way. In the same place where you define your struct platform_device, you simply also implement your interrupt handler and fill your struct uio_info. A pointer to this struct uio_info is then used as platform_data for your platform device.
You also need to set up an array of struct resource containing addresses and sizes of your memory mappings. This information is passed to the driver using the .resource and .num_resources elements of struct platform_device.
You now have to set the .name element of struct platform_device to "uio_pdrv" to use the generic UIO platform device driver. This driver will fill the mem[] array according to the resources given, and register the device.
The advantage of this approach is that you only have to edit a file you need to edit anyway. You do not have to create an extra driver.
Using uio_pdrv_genirq for platform devices¶
Especially in embedded devices, you frequently find chips where the irq pin is tied to its own dedicated interrupt line. In such cases, where you can be really sure the interrupt is not shared, we can take the concept of uio_pdrv one step further and use a generic interrupt handler. That’s what uio_pdrv_genirq does.
The setup for this driver is the same as described above for uio_pdrv, except that you do not implement an interrupt handler. The .handler element of struct uio_info must remain NULL. The .irq_flags element must not contain IRQF_SHARED.
You will set the .name element of struct platform_device to "uio_pdrv_genirq" to use this driver.
The generic interrupt handler of uio_pdrv_genirq will simply disable the interrupt line using disable_irq_nosync(). After doing its work, userspace can reenable the interrupt by writing 0x00000001 to the UIO device file. The driver already implements an irqcontrol() function to make this possible; you must not implement your own.
Using uio_pdrv_genirq not only saves a few lines of interrupt handler code. You also do not need to know anything about the chip’s internal registers to create the kernel part of the driver. All you need to know is the irq number of the pin the chip is connected to.
When used in a device-tree enabled system, the driver needs to be probed with the "of_id" module parameter set to the "compatible" string of the node the driver is supposed to handle. By default, the node’s name (without the unit address) is exposed as name for the UIO device in userspace. To set a custom name, a property named "linux,uio-name" may be specified in the DT node.
Using uio_dmem_genirq for platform devices¶
In addition to statically allocated memory ranges, there may also be a desire to use dynamically allocated regions in a user space driver. In particular, being able to access memory made available through the dma-mapping API may be particularly useful. The uio_dmem_genirq driver provides a way to accomplish this.
This driver is used in a similar manner to the "uio_pdrv_genirq" driver with respect to interrupt configuration and handling.
Set the .name element of struct platform_device to "uio_dmem_genirq" to use this driver.
When using this driver, fill in the .platform_data element of struct platform_device, which is of type struct uio_dmem_genirq_pdata and which contains the following elements:
- struct uio_info uioinfo: The same structure used as the uio_pdrv_genirq platform data
- unsigned int *dynamic_region_sizes: Pointer to list of sizes of dynamic memory regions to be mapped into user space.
- unsigned int num_dynamic_regions: Number of elements in dynamic_region_sizes array.
The dynamic regions defined in the platform data will be appended to the mem[] array after the platform device resources, which implies that the total number of static and dynamic memory regions cannot exceed MAX_UIO_MAPS.
The dynamic memory regions will be allocated when the UIO device file, /dev/uioX, is opened. Similar to static memory resources, the memory region information for dynamic regions is then visible via sysfs at /sys/class/uio/uioX/maps/mapY/*. The dynamic memory regions will be freed when the UIO device file is closed. When no processes are holding the device file open, the address returned to userspace is ~0.
Writing a driver in userspace¶
Once you have a working kernel module for your hardware, you can write the userspace part of your driver. You don’t need any special libraries, your driver can be written in any reasonable language, you can use floating point numbers and so on. In short, you can use all the tools and libraries you’d normally use for writing a userspace application.
Getting information about your UIO device¶
Information about all UIO devices is available in sysfs. The first thing you should do in your driver is check name and version to make sure you’re talking to the right device and that its kernel driver has the version you expect.
You should also make sure that the memory mapping you need exists and has the size you expect.
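The name and version check described above might look roughly like the following sketch. Both function names and the sysfs_root parameter are hypothetical and exist only for this illustration; in a real driver the root would normally be /sys/class/uio, and the expected strings are whatever your own kernel module sets in struct uio_info.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: read a string attribute such as "name" or
 * "version" of device uio_idx into buf and strip the trailing newline.
 * Returns 0 on success, -1 on failure.
 */
int uio_read_str_attr(const char *sysfs_root, int uio_idx,
                      const char *attr, char *buf, size_t len)
{
    char path[256];
    FILE *f;

    snprintf(path, sizeof(path), "%s/uio%d/%s", sysfs_root, uio_idx, attr);
    f = fopen(path, "r");
    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Return 1 if the device reports the expected name and version, else 0. */
int uio_device_matches(const char *sysfs_root, int uio_idx,
                       const char *want_name, const char *want_version)
{
    char name[64];
    char version[64];

    if (uio_read_str_attr(sysfs_root, uio_idx, "name", name, sizeof(name)) ||
        uio_read_str_attr(sysfs_root, uio_idx, "version", version,
                          sizeof(version)))
        return 0;
    return strcmp(name, want_name) == 0 &&
           strcmp(version, want_version) == 0;
}
```

A userspace driver could loop over uio0, uio1, and so on, and use the first index for which uio_device_matches() returns 1.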
There is a tool called lsuio that lists UIO devices and their attributes. It is available here:
http://www.osadl.org/projects/downloads/UIO/user/
With lsuio you can quickly check if your kernel module is loaded and which attributes it exports. Have a look at the manpage for details.
The source code of lsuio can serve as an example for getting information about a UIO device. The file uio_helper.c contains a lot of functions you could use in your userspace driver code.
mmap() device memory¶
After you made sure you’ve got the right device with the memory mappings you need, all you have to do is to call mmap() to map the device’s memory to userspace.
The parameter offset of the mmap() call has a special meaning for UIO devices: It is used to select which mapping of your device you want to map. To map the memory of mapping N, you have to use N times the page size as your offset:
offset = N * getpagesize();
N starts from zero, so if you’ve got only one memory range to map, set offset = 0. A drawback of this technique is that memory is always mapped beginning with its start address.
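Put together, selecting a mapping through the offset might look like the sketch below. The helper names uio_map_offset() and uio_mmap() are made up for this example; fd is assumed to be an open descriptor for /dev/uioX, and size would normally be the value read from the mapping’s size attribute in sysfs.

```c
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Offset to pass to mmap() in order to select mapping number n. */
off_t uio_map_offset(int n)
{
    return (off_t)n * getpagesize();
}

/*
 * Hypothetical helper: map `size` bytes of mapping n for reading and
 * writing. Returns MAP_FAILED on error, exactly like mmap() itself.
 */
void *uio_mmap(int fd, int n, size_t size)
{
    return mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED,
                fd, uio_map_offset(n));
}
```

If the device memory is not page aligned, the value of the mapping’s offset attribute in sysfs still has to be added to the returned pointer before accessing registers.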
Waiting for interrupts¶
After you successfully mapped your device’s memory, you can access it like an ordinary array. Usually, you will perform some initialization. After that, your hardware starts working and will generate an interrupt as soon as it’s finished, has some data available, or needs your attention because an error occurred.
/dev/uioX is a read-only file. A read() will always block until an interrupt occurs. There is only one legal value for the count parameter of read(), and that is the size of a signed 32 bit integer (4). Any other value for count causes read() to fail. The signed 32 bit integer read is the interrupt count of your device. If the value is one more than the value you read the last time, everything is OK. If the difference is greater than one, you missed interrupts.
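The read-and-compare logic above can be sketched as follows. Both helper names are hypothetical. The subtraction is done on unsigned values so that the result stays correct if the counter ever wraps around, which is an implementation choice of this sketch rather than something the UIO interface prescribes.

```c
#include <stdint.h>
#include <unistd.h>

/*
 * Block until the next interrupt and fetch the 32-bit interrupt count.
 * Returns 0 on success, -1 if read() failed or returned a short count.
 */
int uio_wait_irq(int fd, int32_t *count)
{
    ssize_t n = read(fd, count, sizeof(*count));

    return (n == (ssize_t)sizeof(*count)) ? 0 : -1;
}

/*
 * Number of interrupts missed between two successive counts.
 * 0 means the new count is exactly one more than the previous one.
 */
uint32_t uio_missed_irqs(int32_t prev, int32_t now)
{
    return (uint32_t)now - (uint32_t)prev - 1u;
}
```

For example, if the previous read returned 4 and the current one returns 7, uio_missed_irqs() reports 2 missed interrupts.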
You can also use select() on /dev/uioX.
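A minimal sketch of waiting with select() and a timeout is shown below, so that the driver can do other work if no interrupt arrives. The helper name uio_wait_readable() and the millisecond timeout parameter are assumptions made for this example.

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Wait up to timeout_ms milliseconds for fd to become readable.
 * Returns 1 if an interrupt is pending (a subsequent read() will not
 * block), 0 on timeout, and -1 on error.
 */
int uio_wait_readable(int fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv;
    int ret;

    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    tv.tv_sec = timeout_ms / 1000;
    tv.tv_usec = (timeout_ms % 1000) * 1000;

    ret = select(fd + 1, &rfds, NULL, NULL, &tv);
    if (ret < 0)
        return -1;
    return ret > 0 ? 1 : 0;
}
```

When the function returns 1, the driver still has to read() the 4-byte interrupt count from the descriptor to consume the event.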
Generic PCI UIO driver¶
The generic driver is a kernel module named uio_pci_generic. It can work with any device compliant to PCI 2.3 (circa 2002) and any compliant PCI Express device. Using this, you only need to write the userspace driver, removing the need to write a hardware-specific kernel module.
Making the driver recognize the device¶
Since the driver does not declare any device ids, it will not get loaded automatically and will not automatically bind to any devices. You must load it and allocate an id to the driver yourself. For example:
modprobe uio_pci_generic
echo "8086 10f5" > /sys/bus/pci/drivers/uio_pci_generic/new_id
If there already is a hardware specific kernel driver for your device, the generic driver still won’t bind to it. In this case, if you want to use the generic driver (why would you?), you’ll have to manually unbind the hardware specific driver and bind the generic driver, like this:
echo -n 0000:00:19.0 > /sys/bus/pci/drivers/e1000e/unbind
echo -n 0000:00:19.0 > /sys/bus/pci/drivers/uio_pci_generic/bind
You can verify that the device has been bound to the driver by looking for it in sysfs, for example like the following:
ls -l /sys/bus/pci/devices/0000:00:19.0/driver
Which if successful should print:
.../0000:00:19.0/driver -> ../../../bus/pci/drivers/uio_pci_generic
Note that the generic driver will not bind to old PCI 2.2 devices. If binding the device failed, run the following command:
dmesg
and look in the output for failure reasons.
Things to know about uio_pci_generic¶
Interrupts are handled using the Interrupt Disable bit in the PCI command register and Interrupt Status bit in the PCI status register. All devices compliant to PCI 2.3 (circa 2002) and all compliant PCI Express devices should support these bits. uio_pci_generic detects this support, and won’t bind to devices which do not support the Interrupt Disable Bit in the command register.
On each interrupt, uio_pci_generic sets the Interrupt Disable bit. This prevents the device from generating further interrupts until the bit is cleared. The userspace driver should clear this bit before blocking and waiting for more interrupts.
Writing userspace driver using uio_pci_generic¶
A userspace driver can use the PCI sysfs interface, or the libpci library that wraps it, to talk to the device and to re-enable interrupts by writing to the command register.
Example code using uio_pci_generic¶
Here is some sample userspace driver code using uio_pci_generic:
#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <errno.h>

int main(void)
{
    int uiofd;
    int configfd;
    int err;
    int i;
    unsigned icount;
    unsigned char command_high;

    uiofd = open("/dev/uio0", O_RDONLY);
    if (uiofd < 0) {
        perror("uio open:");
        return errno;
    }
    configfd = open("/sys/class/uio/uio0/device/config", O_RDWR);
    if (configfd < 0) {
        perror("config open:");
        return errno;
    }

    /* Read and cache the high byte of the command register (offset 5) */
    err = pread(configfd, &command_high, 1, 5);
    if (err != 1) {
        perror("command config read:");
        return errno;
    }
    /* Clear Interrupt Disable (bit 10 of the register, bit 2 of this byte) */
    command_high &= ~0x4;

    for (i = 0;; ++i) {
        /* Print out a message, for debugging. */
        if (i == 0)
            fprintf(stderr, "Started uio test driver.\n");
        else
            fprintf(stderr, "Interrupts: %u\n", icount);

        /****************************************/
        /* Here we got an interrupt from the
           device. Do something to it. */
        /****************************************/

        /* Re-enable interrupts. */
        err = pwrite(configfd, &command_high, 1, 5);
        if (err != 1) {
            perror("config write:");
            break;
        }

        /* Wait for next interrupt. */
        err = read(uiofd, &icount, 4);
        if (err != 4) {
            perror("uio read:");
            break;
        }
    }
    return errno;
}
Generic Hyper-V UIO driver¶
The generic driver is a kernel module named uio_hv_generic. It supports devices on the Hyper-V VMBus similar to uio_pci_generic on PCI bus.
Making the driver recognize the device¶
Since the driver does not declare any device GUIDs, it will not get loaded automatically and will not automatically bind to any devices. You must load it and allocate an id to the driver yourself. For example, to use the network device class GUID:
modprobe uio_hv_generic
echo "f8615163-df3e-46c5-913f-f2d2f965ed0e" > /sys/bus/vmbus/drivers/uio_hv_generic/new_id
If there already is a hardware specific kernel driver for the device, the generic driver still won’t bind to it. In this case, if you want to use the generic driver for a userspace library, you’ll have to manually unbind the hardware specific driver and bind the generic driver, using the device specific GUID like this:
echo -n ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/hv_netvsc/unbind
echo -n ed963694-e847-4b2a-85af-bc9cfc11d6f3 > /sys/bus/vmbus/drivers/uio_hv_generic/bind
You can verify that the device has been bound to the driver by looking for it in sysfs, for example like the following:
ls -l /sys/bus/vmbus/devices/ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver
Which if successful should print:
.../ed963694-e847-4b2a-85af-bc9cfc11d6f3/driver -> ../../../bus/vmbus/drivers/uio_hv_generic
Things to know about uio_hv_generic¶
On each interrupt, uio_hv_generic sets the Interrupt Disable bit. This prevents the device from generating further interrupts until the bit is cleared. The userspace driver should clear this bit before blocking and waiting for more interrupts.
When the host rescinds a device, the interrupt file descriptor is marked down and any reads of the interrupt file descriptor will return -EIO, similar to a closed socket or disconnected serial device.
The vmbus device regions are mapped into uio device resources:
- Channel ring buffers: guest to host and host to guest
- Guest to host interrupt signalling pages
- Guest to host monitor page
- Network receive buffer region
- Network send buffer region
If a subchannel is created by a request to host, then the uio_hv_generic device driver will create a sysfs binary file for the per-channel ring buffer. For example:
/sys/bus/vmbus/devices/3811fe4d-0fa0-4b62-981a-74fc1084c757/channels/21/ring