Inotify - A Powerful yet Simple File Change Notification System
Document started 15 Mar 2005 by Robert Love <rml@novell.com>
Document updated 4 Jan 2015 by Zhang Zhen <zhenzhang.zhang@huawei.com>
Removed the obsolete interface description; refer to the man pages for the user interface.
Rationale
- Q:
What is the design decision behind not tying the watch to the open fd of the watched object?
- A:
Watches are associated with an open inotify device, not an open file. This solves the primary problem with dnotify: keeping the file open pins the file and thus, worse, pins the mount. Dnotify is therefore infeasible for use on a desktop system with removable media as the media cannot be unmounted. Watching a file should not require that it be open.
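As a rough illustration of the point above, the sketch below registers a watch by path name only. The helper name `watch_by_path` and the chosen event mask are assumptions made for this example; the only file descriptor involved is the one returned by inotify_init1(2), and the watched object itself is never opened.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Start watching `path` without ever calling open() on it.
 * (watch_by_path is a hypothetical helper name used for illustration.)
 * On success, returns the inotify instance fd (the caller closes it) and
 * stores the watch descriptor in *wd_out. Returns -1 on failure.
 */
int watch_by_path(const char *path, int *wd_out)
{
    assert(path != NULL && wd_out != NULL);

    int fd = inotify_init1(IN_CLOEXEC);
    if (fd < 0)
        return -1;

    /* The watch is identified by a small integer, not by a file descriptor. */
    int wd = inotify_add_watch(fd, path, IN_MODIFY | IN_DELETE_SELF);
    if (wd < 0) {
        close(fd);
        return -1;
    }

    *wd_out = wd;
    return fd;
}
```

Because no descriptor is held on the watched object, the filesystem containing it can still be unmounted; inotify(7) documents an IN_UNMOUNT event for that case.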
- Q:
What is the design decision behind using an fd-per-instance as opposed to an fd-per-watch?
- A:
An fd-per-watch quickly consumes more file descriptors than are allowed, more fd’s than are feasible to manage, and more fd’s than are optimally select()-able. Yes, root can bump the per-process fd limit and yes, users can use epoll, but requiring both is a silly and extraneous requirement. A watch consumes less memory than an open file, so separating the number spaces is sensible. The current design is what user-space developers want: users initialize inotify once and add n watches, requiring but one fd and no twiddling with fd limits. Initializing an inotify instance two thousand times is silly. If we can implement user-space’s preferences cleanly--and we can, the idr layer makes stuff like this trivial--then we should.
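The "initialize once, add n watches" pattern might look roughly like the following. The helper name `add_watches` and the event mask are assumptions for this sketch; the point is only that every watch hangs off the same instance fd and is identified by a watch descriptor rather than by a file descriptor of its own.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/inotify.h>

/*
 * Add one watch per path to an existing inotify instance `fd`.
 * (add_watches is a hypothetical helper name used for illustration.)
 * Watch descriptors are stored in wds[]; returns the number of watches
 * added before the first failure, so a return value of n means success.
 */
size_t add_watches(int fd, const char *const paths[], size_t n, int wds[])
{
    assert(fd >= 0 && paths != NULL && wds != NULL);

    size_t added = 0;
    for (size_t i = 0; i < n; i++) {
        wds[i] = inotify_add_watch(fd, paths[i],
                                   IN_CREATE | IN_DELETE | IN_MOVE);
        if (wds[i] < 0)
            break;
        added++;
    }
    return added;
}
```

However many paths are passed in, the process still holds a single file descriptor, so the per-process fd limit does not come into play.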
There are other good arguments. With a single fd, there is a single item to block on, which is mapped to a single queue of events. The single fd returns all watch events and also any potential out-of-band data. If every fd were a separate watch:
  - There would be no way to get event ordering. Events on file foo and file bar would pop poll() on both fd’s, but there would be no way to tell which happened first. A single queue trivially gives you ordering. Such ordering is crucial to existing applications such as Beagle. Imagine “mv a b ; mv b a” events without ordering.
  - We’d have to maintain n fd’s and n internal queues with state, versus just one. It is a lot messier in the kernel. A single, linear queue is the data structure that makes sense.
  - User-space developers prefer the current API. The Beagle guys, for example, love it. Trust me, I asked. It is not a surprise: Who’d want to manage and block on 1000 fd’s via select?
  - There would be no way to get out-of-band data.
  - 1024 is still too low. ;-)
When you talk about designing a file change notification system that scales to 1000s of directories, juggling 1000s of fd’s just does not seem the right interface. It is too heavy.
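The ordering argument can be sketched as follows. This is an illustrative example, not part of the original design notes: the function name `ordering_demo`, the scratch directory names under /tmp, and the file names are all assumptions. It watches two directories through one instance, creates a file in the second directory and then in the first, and checks that a single read(2) returns the two events in that same order.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes mkdtemp() on glibc */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Create an empty file at `path`. Returns 0 on success, -1 on failure. */
static int touch(const char *path)
{
    int f = open(path, O_CREAT | O_WRONLY, 0600);
    return f < 0 ? -1 : close(f);
}

/*
 * Hypothetical demo: returns 1 if the single queue reported the events in
 * the order they happened, 0 if it did not, -1 on setup failure.
 */
int ordering_demo(void)
{
    char dir_a[] = "/tmp/inotify_a_XXXXXX"; /* scratch names are assumptions */
    char dir_b[] = "/tmp/inotify_b_XXXXXX";
    char file_a[64], file_b[64];
    /* Buffer alignment as recommended by inotify(7). */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int seen[2] = { -1, -1 };
    int fd, wd_a, wd_b, count = 0, result = -1;
    ssize_t len;

    if (mkdtemp(dir_a) == NULL)
        return -1;
    if (mkdtemp(dir_b) == NULL) {
        rmdir(dir_a);
        return -1;
    }
    snprintf(file_a, sizeof file_a, "%s/second", dir_a);
    snprintf(file_b, sizeof file_b, "%s/first", dir_b);

    /* One instance, two watches. */
    fd = inotify_init1(IN_CLOEXEC);
    wd_a = fd < 0 ? -1 : inotify_add_watch(fd, dir_a, IN_CREATE);
    wd_b = fd < 0 ? -1 : inotify_add_watch(fd, dir_b, IN_CREATE);
    if (wd_a < 0 || wd_b < 0)
        goto out;
    assert(wd_a != wd_b); /* distinct directories get distinct watches */

    /* dir_b first, then dir_a: the queue should preserve this order. */
    if (touch(file_b) != 0 || touch(file_a) != 0)
        goto out;

    len = read(fd, buf, sizeof buf);
    if (len < 0)
        goto out;

    /* Walk the variable-length events in the order the kernel queued them. */
    for (char *p = buf; p < buf + len; ) {
        const struct inotify_event *ev = (const struct inotify_event *)p;
        if (count < 2)
            seen[count] = ev->wd;
        count++;
        p += sizeof(struct inotify_event) + ev->len;
    }
    result = (count == 2 && seen[0] == wd_b && seen[1] == wd_a);

out:
    if (fd >= 0)
        close(fd);
    unlink(file_a);
    unlink(file_b);
    rmdir(dir_a);
    rmdir(dir_b);
    return result;
}
```

With one fd per watch, the same program would see two readable descriptors after the two creations and would have no way to recover which creation came first.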
Additionally, it _is_ possible to have more than one instance and juggle more than one queue and thus more than one associated fd. There need not be a one-fd-per-process mapping; it is one-fd-per-queue and a process can easily want more than one queue.
- Q:
Why the system call approach?
- A:
The poor user-space interface is the second biggest problem with dnotify. Signals are a terrible, terrible interface for file notification. Or for anything, for that matter. The ideal solution, from all perspectives, is a file descriptor-based one that allows basic file I/O and poll/select. Obtaining the fd and managing the watches could have been done either via a device file or a family of new system calls. We decided to implement a family of system calls because that is the preferred approach for new kernel interfaces. The only real difference was whether we wanted to use open(2) and ioctl(2) or a couple of new system calls. System calls beat ioctls.
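Since the instance is an ordinary file descriptor, it can be handed to poll(2) like any other. The following is a minimal sketch under that assumption; the helper name `wait_for_events` is made up for this example.

```c
#include <assert.h>
#include <poll.h>
#include <sys/inotify.h>
#include <unistd.h>

/*
 * Block until the inotify instance `fd` has events queued or until
 * `timeout_ms` milliseconds have passed.
 * (wait_for_events is a hypothetical helper name used for illustration.)
 * Returns 1 if the fd is readable, 0 on timeout, -1 on error.
 */
int wait_for_events(int fd, int timeout_ms)
{
    assert(fd >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc = poll(&pfd, 1, timeout_ms);
    if (rc < 0)
        return -1;
    return (rc > 0 && (pfd.revents & POLLIN)) ? 1 : 0;
}
```

A caller would typically follow a return value of 1 with a read(2) on the same fd to fetch the queued events. No signal handlers are involved at any point, and the instance is created and managed through the inotify_init1(2), inotify_add_watch(2) and inotify_rm_watch(2) system calls rather than through open(2) and ioctl(2) on a device node.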