epoll is a Linux kernel system call for a scalable I/O event notification mechanism, first introduced in version 2.5.45 of the Linux kernel in October 2002.[1][2] Its function is to monitor multiple file descriptors to see whether I/O is possible on any of them. It is meant to replace the older POSIX select(2) and poll(2) system calls, to achieve better performance in more demanding applications, where the number of watched file descriptors is large (unlike the older system calls, which operate in O(n) time, epoll operates in O(1) time).[3]
epoll is similar to FreeBSD's kqueue, in that it consists of a set of user-space functions, each taking a file descriptor argument denoting the configurable kernel object, against which they cooperatively operate. epoll uses a red–black tree (RB-tree) data structure to keep track of all file descriptors that are currently being monitored.[4]
int epoll_create1(int flags);
Creates an epoll object and returns its file descriptor. The flags parameter allows epoll behavior to be modified. It has only one valid value, EPOLL_CLOEXEC. epoll_create() is an older variant of epoll_create1() and is deprecated as of Linux kernel version 2.6.27 and glibc version 2.9.[5]
int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
Controls (configures) which file descriptors are watched by this object, and for which events. op can be EPOLL_CTL_ADD, EPOLL_CTL_MOD or EPOLL_CTL_DEL (add, modify or delete).
int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
Waits for any of the events registered with epoll_ctl, until at least one occurs or the timeout elapses. Returns the occurred events in events, up to maxevents at once. maxevents is the maximum number of epoll_event structures returned by a single call; it does not limit how many file descriptors are monitored.[6][7] In most cases, maxevents is set to the number of elements in the events array (struct epoll_event *events).
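The three calls are typically used together. The following is a minimal sketch, not a canonical implementation: the helper name wait_for_readable and the array size MAX_EVENTS are illustrative assumptions, and a real program would normally keep the epoll object open across many waits rather than creating one per call.

```c
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

#define MAX_EVENTS 8 /* capacity of the events array; arbitrary choice */

/* Hypothetical helper (not part of the epoll API): waits up to timeout_ms
 * milliseconds for fd to become readable. Returns the number of ready
 * events (0 on timeout) or -1 on error. */
int wait_for_readable(int fd, int timeout_ms)
{
    struct epoll_event ev = {0};
    struct epoll_event events[MAX_EVENTS];
    int n;

    /* 1. Create the epoll object. */
    int epfd = epoll_create1(EPOLL_CLOEXEC);
    if (epfd == -1) {
        perror("epoll_create1");
        return -1;
    }

    /* 2. Register fd and ask to be told when it is readable. */
    ev.events = EPOLLIN;
    ev.data.fd = fd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev) == -1) {
        perror("epoll_ctl");
        close(epfd);
        return -1;
    }

    /* 3. Wait; maxevents equals the capacity of the events array. */
    n = epoll_wait(epfd, events, MAX_EVENTS, timeout_ms);
    if (n == -1)
        perror("epoll_wait");

    close(epfd);
    return n;
}
```

Calling wait_for_readable(fd, 0) polls without blocking, while a timeout of -1 blocks until an event arrives.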
epoll provides both edge-triggered and level-triggered modes. In edge-triggered mode, a call to epoll_wait will return only when a new event is enqueued with the epoll object, while in level-triggered mode, epoll_wait will return as long as the condition holds.
For instance, if a pipe registered with epoll has received data, a call to epoll_wait will return, signaling the presence of data to be read. Suppose the reader consumes only part of the data in the buffer. In level-triggered mode, further calls to epoll_wait will return immediately, as long as the pipe's buffer contains data to be read. In edge-triggered mode, however, epoll_wait will return only once new data is written to the pipe.[8]
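The pipe scenario can be reproduced with a short sketch. The helper name events_after_partial_read is an illustrative assumption; it writes four bytes to a pipe, consumes the first notification, reads a single byte, and then polls again with a zero timeout so the call never blocks.

```c
#include <stdint.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical helper (not part of the epoll API): returns the number of
 * events reported by a second epoll_wait after a partial read, or -1 on
 * error. Pass 0 for level-triggered mode or EPOLLET for edge-triggered. */
int events_after_partial_read(uint32_t extra_flags)
{
    struct epoll_event ev = {0};
    struct epoll_event got;
    int pipefd[2];
    int epfd;
    int result = -1;
    char c;

    if (pipe(pipefd) == -1)
        return -1;

    epfd = epoll_create1(0);
    if (epfd == -1)
        goto out_pipe;

    /* Register the read end of the pipe in the requested mode. */
    ev.events = EPOLLIN | extra_flags;
    ev.data.fd = pipefd[0];
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, pipefd[0], &ev) == -1)
        goto out_epoll;

    /* The pipe receives data; the first wait reports it in both modes. */
    if (write(pipefd[1], "abcd", 4) != 4)
        goto out_epoll;
    if (epoll_wait(epfd, &got, 1, 0) != 1)
        goto out_epoll;

    /* Consume only part of the data, leaving three bytes buffered. */
    if (read(pipefd[0], &c, 1) != 1)
        goto out_epoll;

    /* Level-triggered: still ready. Edge-triggered: no new event. */
    result = epoll_wait(epfd, &got, 1, 0);

out_epoll:
    close(epfd);
out_pipe:
    close(pipefd[0]);
    close(pipefd[1]);
    return result;
}
```

Under these assumptions, events_after_partial_read(0) reports one event because unread data remains in the pipe, while events_after_partial_read(EPOLLET) reports none because nothing new has been written. This is why edge-triggered readers usually use non-blocking descriptors and read until the call fails with EAGAIN.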
Bryan Cantrill pointed out that epoll had mistakes that could have been avoided, had it learned from its predecessors: input/output completion ports, event ports (Solaris) and kqueue.[9] However, a large part of his criticism was addressed by epoll's EPOLLONESHOT and EPOLLEXCLUSIVE options. EPOLLONESHOT was added in version 2.6.2 of the Linux kernel mainline, released in February 2004. EPOLLEXCLUSIVE was added in version 4.5, released in March 2016.[10]