Everything is a file

From Wikipedia, the free encyclopedia
Unix philosophy

"Everything is a file" is an approach to interface design in Unix derivatives. While this turn of phrase does not as such figure as a Unix design principle or philosophy, it is a common way to analyse designs, and informs the design of new interfaces in a way that prefers, in rough order of import:

  1. representing objects as file descriptors instead of alternatives like abstract handles or names,
  2. operating on the objects with standard input/output operations, returning byte streams to be interpreted by applications (rather than explicitly structured data), and
  3. allowing the usage or creation of objects by opening or creating files in the global filesystem name space.

The lines between the common interpretations of "file" and "file descriptor" are often blurred when analysing Unix, and nameability of files is the least important part of this principle; thus, it is sometimes described as "Everything is a file descriptor".[1][2][3]

This approach is interpreted differently depending on the era, the philosophy of each system, and the domain to which it is applied. The rest of this article demonstrates notable examples of some of those interpretations, and their repercussions.

Objects as file descriptors

Under Unix, a directory can be opened like a regular file, containing fixed-size records of (i-node, filename), but directories cannot be written to directly, and are modified by the kernel as a side-effect of creating and removing files within the directory.[4]

Some interfaces only follow a subset of these guidelines. For example, pipes do not exist on the filesystem: pipe() creates a pair of unnameable file descriptors.[5] The later invention of named pipes (FIFOs) by POSIX fills this gap.
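As a minimal sketch of the pipe() behaviour described above, assuming a POSIX system and using Python's os.pipe() wrapper around the system call (the message text is purely illustrative):

```python
import os

# pipe() returns two file descriptors: one for reading, one for writing.
# Neither of them has a name anywhere in the filesystem.
read_fd, write_fd = os.pipe()

# Standard I/O operations work on them as on any other descriptor.
os.write(write_fd, b"hello through a pipe\n")
os.close(write_fd)               # closing the write end lets the reader see EOF

data = os.read(read_fd, 1024)
at_eof = os.read(read_fd, 1024)  # an empty result signals end-of-file
os.close(read_fd)

print(data, at_eof)
```

The two descriptors can only be shared by inheritance or by explicitly passing them to another process, since there is no path to open.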

This does not mean that the only operations on an object are reading and writing: ioctl() and similar interfaces allow for object-specific operations (like controlling tty characteristics), and directory file descriptors can be used to alter path look-ups (with a growing number of *at() system call variants like openat()[6]) or to change the working directory to the one represented by the file descriptor,[7] in both cases preventing race conditions and being faster than the alternative of looking up the entire path.[8]
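The directory-descriptor operations can be sketched as follows, assuming a system where Python exposes openat()-style look-ups through the dir_fd parameter and fchdir() through os.fchdir() (both are available on Linux). The file name note.txt is a hypothetical example.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as base:
    # Opening a directory yields an ordinary file descriptor.
    dir_fd = os.open(base, os.O_RDONLY | os.O_DIRECTORY)
    try:
        # openat()-style creation: "note.txt" is resolved relative to dir_fd,
        # not relative to the working directory or a full path.
        fd = os.open("note.txt", os.O_WRONLY | os.O_CREAT, 0o600, dir_fd=dir_fd)
        os.write(fd, b"relative to a directory descriptor\n")
        os.close(fd)

        # fchdir(): change the working directory to the one the fd refers to.
        previous = os.getcwd()
        os.fchdir(dir_fd)
        with open("note.txt", "rb") as f:
            content = f.read()
        os.chdir(previous)
    finally:
        os.close(dir_fd)

print(content)
```

Because the look-up starts from the descriptor, renaming or replacing a parent directory between the two calls cannot redirect the second one.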

Socket file descriptors require configuration (setting the remote address and connecting) after creation before being used for I/O. A server socket may not be used for I/O directly at all: in connection-based protocols, bind() assigns a local address to a socket, listen() marks that socket as accepting connections, and accept() waits until a remote process connects, then returns a new socket file descriptor representing that direct bidirectional connection.
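A rough sketch of this sequence over the loopback interface, using Python's socket module. In the BSD sockets API, the call that waits for a connection and hands back the new descriptor is accept(). The address, port choice, and payload are assumptions made for the example.

```python
import socket

# Server socket: created, bound to a local address, and marked as listening.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))   # port 0 asks the kernel for any free port
server.listen(1)
address = server.getsockname()

# Client socket: must be connected before it can carry any data.
client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(address)

# accept() returns a new socket that represents this one connection;
# the server socket itself is never read from or written to.
connection, peer = server.accept()

client.sendall(b"ping")
received = connection.recv(4)
distinct = connection.fileno() != server.fileno()

for s in (connection, client, server):
    s.close()

print(received, distinct)
```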

This approach allows management of objects used by a program in a standardised manner, just like any other file: after binding to an address, privileges may be dropped, the server socket may be distributed among many processes by fork()ing (respectively closed in subprocesses that should not have access), or the individual connections' sockets may be given as standard input/output to specialised handlers for those connections, as in the super-server/CGI/inetd paradigms.

Many interfaces present in early Unixes that do not use file descriptors became duplicated in later designs: the alarm()/setitimer() system calls schedule the delivery of a signal after the specified time elapses; this timer is inherited by children, and persists after exec(). The POSIX timer_create() API serves a similar function, but destroys the timer in child processes and on exec(); these timers are identified by opaque handles. Both interfaces always deliver their completions asynchronously, and cannot be poll()ed/select()ed, making their integration into a complex event loop more difficult.

The timerfd design (originally found in Linux) turns each timer object into a file descriptor, which can be individually observed with poll() &c. and whose inheritance to child processes can be controlled with the standard close()/CLOEXEC/CLOFORK controls.

While the POSIX API has timer_getoverrun() that returns how many times the timer elapsed, this is returned as the result of read() from a timerfd. This operation blocks, so waiting until a timerfd elapses is as easy as reading from it. There is no way to atomically do this with classic Unix or POSIX timers. The timer can be inspected non-blockingly by performing a non-blocking read (a standard I/O operation).

Objects in the filesystem namespace

Special file types

Device special files are a defining characteristic of Unix: initially, opening a regular file with i-node number ≤40 (traditionally stored under /dev) instead returned a file descriptor corresponding to a device, which was handled by the device driver. The magic i-node number scheme later became codified into files with type S_IFBLK/S_IFCHR.

Opening special files is beholden to the same file-system permissions checks as opening regular files, allowing common access control: chown dmr /usr/dmr /dev/rk0; chmod o= /usr/dmr /dev/rk0 changes the ownership and file access mode of both the directory /usr/dmr and device /dev/rk0.

For block devices (hard disks and tape drives), due to their size, this meant unique semantics: they were block-addressed (see [9]), and programs needed to be written specifically to work correctly with them. This is described as "extremely unfortunate", and later interfaces alleviate this.[a]

In many cases, magnetic tapes continue to have unique semantics: some tapes can be partitioned into "files" and the driver signals an end-of-file condition after the end of a partition is reached, so cp /dev/nrst0 file1; cp /dev/nrst0 file2 will create file1 and file2 consisting of two consecutive partitions of the tape. The driver provides an abstraction layer that presents a tape file descriptor as if it were a regular file to fit into the Everything is a file paradigm. Specialised programs like mt are used to move between partitions on a tape like this.

Named pipes (FIFOs) appear as S_IFIFO-type files in the filesystem, can be renamed, and may be opened like regular files.
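A small sketch of these FIFO properties, assuming a POSIX system; the file names are hypothetical. Opening a FIFO normally blocks until the other end is opened as well, so the reader here is opened with O_NONBLOCK to keep the example in a single process.

```python
import os
import stat
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, "queue")
    os.mkfifo(path, 0o600)
    is_fifo = stat.S_ISFIFO(os.stat(path).st_mode)   # file type is S_IFIFO

    # A FIFO is a directory entry like any other, so it can be renamed.
    renamed = os.path.join(base, "renamed-queue")
    os.rename(path, renamed)

    # Both ends are obtained with plain open() calls on the path.
    reader = os.open(renamed, os.O_RDONLY | os.O_NONBLOCK)
    writer = os.open(renamed, os.O_WRONLY)
    os.write(writer, b"via a named pipe\n")
    message = os.read(reader, 1024)
    os.close(writer)
    os.close(reader)

print(is_fifo, message)
```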

Under Unix derivatives, Unix-domain sockets appear as S_IFSOCK-type files in the filesystem, can be renamed, but cannot be open()ed: one must create the correct type of socket file descriptor and connect() explicitly. Under Plan 9, sockets in the filesystem may be opened like regular files.
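The contrast with FIFOs can be sketched as below, assuming a Unix derivative with AF_UNIX support. The socket name service.sock is hypothetical, and the exact error returned by open() varies between systems, so only the failure itself is checked.

```python
import os
import socket
import stat
import tempfile

with tempfile.TemporaryDirectory() as base:
    path = os.path.join(base, "service.sock")

    # Binding a Unix-domain socket creates an S_IFSOCK entry at the path.
    server = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    server.bind(path)
    server.listen(1)
    is_socket = stat.S_ISSOCK(os.stat(path).st_mode)

    # open() on that entry is refused, unlike for a FIFO.
    try:
        os.close(os.open(path, os.O_RDWR))
        open_failed = False
    except OSError:
        open_failed = True

    # The supported route: create a socket of the right type and connect().
    client = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    client.connect(path)
    connection, _ = server.accept()
    client.sendall(b"hi")
    reply = connection.recv(2)

    for s in (connection, client, server):
        s.close()

print(is_socket, open_failed, reply)
```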

As a replacement for dedicated system calls

Modern systems contain high-performance I/O event notification facilities: kqueue (BSD derivatives), epoll (Linux), IOCP (Windows NT, Solaris), and /dev/poll (Solaris). The control object is generally created (kqueue(), epoll_create()) and configured (kevent(), epoll_ctl()) with dedicated system calls. A /dev/poll instance is instead created by opening the file "/dev/poll" directly, and configured by writing the objects to observe and issuing ioctl()s for additional configuration.
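For comparison, the dedicated-system-call style can be sketched with epoll, assuming Linux and Python's select.epoll wrapper (which calls epoll_create and epoll_ctl underneath). The pipe is only there to give the example something to watch.

```python
import os
import select

epoll = select.epoll()                    # dedicated call creates the control object
read_fd, write_fd = os.pipe()
epoll.register(read_fd, select.EPOLLIN)   # dedicated call configures it

before = epoll.poll(0)                    # nothing is readable yet
os.write(write_fd, b"x")
after = epoll.poll(1)                     # the read end is now reported

# The control object is still a file descriptor, even though it was
# not obtained by opening a path.
control_fd = epoll.fileno()

epoll.close()
os.close(read_fd)
os.close(write_fd)

print(before, after, control_fd)
```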

Memory may be allocated by requesting an anonymous memory mapping, one that doesn't correspond to any file. On modern systems this can be done by specifying no file and MAP_ANONYMOUS; in UNIX System V Release 4, this was done by opening /dev/zero, and mmap()ping it.
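Both styles can be sketched with Python's mmap module, assuming a Linux-like system where /dev/zero exists and can be mapped. Passing -1 as the file descriptor makes the module request an anonymous mapping; the one-page size is an arbitrary choice for the example.

```python
import mmap
import os

PAGE = mmap.PAGESIZE

# Modern style: no file at all (fd -1 selects an anonymous mapping).
anon = mmap.mmap(-1, PAGE)

# SVR4 style: open /dev/zero and map it privately.
zero_fd = os.open("/dev/zero", os.O_RDWR)
via_dev_zero = mmap.mmap(zero_fd, PAGE, flags=mmap.MAP_PRIVATE)
os.close(zero_fd)            # the mapping outlives the descriptor

# Either way the result is zero-filled, writable memory.
zero_filled = anon[:16] == via_dev_zero[:16] == bytes(16)
anon[:5] = b"hello"
via_dev_zero[:5] = b"world"
first, second = anon[:5], via_dev_zero[:5]

anon.close()
via_dev_zero.close()

print(zero_filled, first, second)
```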

API filesystems

Operating system APIs can be implemented as regular system calls, or as synthetic file-systems. In the former case, system state can only be inspected by specially-written programs shipped with the system, and any additional processing desired by the user needs to either filter and parse the output of those programs, execute them to write the desired state, or must be implemented in the native system programming language.

In the latter case, system state is presented as if it were regular files and directories.[12] On systems with a procfs, information about running processes can be obtained by looking at, canonically, /proc, which contains directories named after the PIDs running on the system. These contain files like stat (status) with process metadata; cwd, exe, and root symbolic links to the process' working directory, executable image, and root directory; and directories like fd, which contains symbolic links to the files the process has opened, named after the file descriptors.

Because these attributes are presented as files and symbolic links, standard utilities work on them, and one can, say, inspect the identity of the process with grep Uid /proc/1392400/status, go to the same directory as a process is in with cd /proc/1392400/cwd, look at what files a process has open with ls -l /proc/1392400/fd, then open a file that process has open with less /proc/1392400/fd/8. This improves ergonomics over parsing this data from the output of a utility.[13][14]
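The same inspection needs nothing beyond ordinary file operations from a program, too. The sketch below assumes a Linux procfs mounted at /proc and examines the running process itself rather than the PID used in the shell examples.

```python
import os

proc = f"/proc/{os.getpid()}"

# Equivalent of: grep Uid /proc/<pid>/status
with open(f"{proc}/status") as f:
    uid_line = next(line for line in f if line.startswith("Uid:"))

# Equivalent of resolving /proc/<pid>/cwd
cwd = os.readlink(f"{proc}/cwd")

# Equivalent of: ls /proc/<pid>/fd
open_fds = sorted(int(name) for name in os.listdir(f"{proc}/fd"))

print(uid_line.strip(), cwd, open_fds)
```

No process-inspection API is involved: open(), readlink() and a directory listing are enough.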

Under Linux, symbolic links under procfs are "magic": they can actually behave like cross-filesystem hard links to the files they point to. This behaviour allows recovery of files removed from the filesystem but still open by a process, and permanently persisting files created by O_TMPFILE in the filesystem (which otherwise cannot be named).

4.4BSD-derived sysctls are key/value mappings managed by the sysctl program, which lists all variables with sysctl -a, shows the value of one variable with sysctl net.inet.ip.forwarding, and sets it with sysctl -w net.inet.ip.forwarding=1. Under Linux, the equivalent mechanism is provided by procfs under the /proc/sys tree: the respective operations can be done with find /proc/sys or grep -r ^ /proc/sys, cat /proc/sys/net/ipv4/ip_forward, and echo 1 > /proc/sys/net/ipv4/ip_forward.

For convenience or standards conformance, dedicated inspection tools (like ps and sysctl) may still be provided, using these filesystems as data sources/sinks.

sysfs[15] and debugfs[16] are similar Linux interfaces for further configuring the kernel: writing mem to /sys/power/state will trigger a suspend-to-RAM procedure,[17] and writing 2 to /sys/module/iwlwifi/parameters/led_mode will start blinking the Wi-Fi LED on activity.

These are synthetic file-systems because the contents of each file are not stored anywhere verbatim: when the file is read, the appropriate kernel data structures are serialised into the reading process' input buffer, and when the file is written to, the output buffer is parsed.[15] This means that the file abstraction is broken, since the file metadata isn't valid: depending on the filesystem, each file reports a size of 0 or PAGE_SIZE, even though reading the data will yield a different number of bytes.
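The size mismatch is easy to observe. The sketch below assumes a Linux procfs and uses /proc/self/status as an arbitrary example of a synthetic file.

```python
import os

path = "/proc/self/status"

reported_size = os.stat(path).st_size   # what the file metadata claims
with open(path, "rb") as f:
    data = f.read()                     # what reading actually yields

print(reported_size, len(data))
```

Programs that size a buffer from st_size before reading therefore need a fallback that reads until end-of-file.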

Notes

  a. First in Version 4 Unix by adding special seek() modes that multiply the offset by 512 in the kernel,[10] finally in Version 7 Unix by providing lseek() with a 32-bit argument.[11]

References

  1. "Linus Torvalds - 'everything is a file descriptor or a process'". Yarchive.net. Retrieved 2015-08-28.
  2. "Ghosts of Unix Past". Lwn.net. Retrieved 2015-08-28.
  3. Kernighan, Brian (October 18, 2019). UNIX - A History and a Memoir. Independently published (October 18, 2019). p. 76ff. ISBN 978-1695978553.
  4. Ken Thompson and Dennis Ritchie (3 November 1971). "DIRECTORY (V)" (PDF). UNIX Programmer's Manual. Bell Laboratories.
  5. Ken Thompson and Dennis Ritchie (February 1973). "PIPE (II)". UNIX Programmer's Manual (Third ed.). Bell Laboratories. ./man2/pipe.2
  6. "open, openat — open file". IEEE Std 1003.1-2024, The Open Group Base Specifications Issue 8. The IEEE and The Open Group. 2024.
  7. "fchdir — change working directory". IEEE Std 1003.1-2024, The Open Group Base Specifications Issue 8. The IEEE and The Open Group. 2024.
  8. "D. Portability Considerations (Informative), D.2 Portability Capabilities, D.2.3 Access to Data". IEEE Std 1003.1-2024, The Open Group Base Specifications Issue 8. The IEEE and The Open Group. 2024.
  9. Ken Thompson and Dennis Ritchie (3 November 1971). "/DEV/RF0 (IV)" (PDF). UNIX Programmer's Manual. Bell Laboratories.
  10. Ken Thompson and Dennis Ritchie (November 1973). "PIPE (II)". UNIX Programmer's Manual (Fourth ed.). Bell Laboratories. ./man2/pipe.2, and the "Addressing on the tape files, like that on the RK and RF disks, is block-oriented." stanza is gone.
  11. "LSEEK(2)". UNIX Programmer's Manual (Seventh ed.). Bell Laboratories. January 1979. usr/man/man2/lseek.2
  12. Benvenuti, Christian (2006). "3. User-Space-to-Kernel Interface". Understanding Linux network internals (Nachdr. ed.). Beijing Köln: O'Reilly. p. 58. ISBN 9780596002558.
  13. Xiao, Yang; Li, Frank Haizhon; Chen, Hui (2011). Handbook of security and networks. Hackensack (NJ): World scientific. p. 160. ISBN 9789814273039.
  14. "27. Upgrading and customizing the kernel". Red Hat Linux Networking and System Administration. John Wiley & Sons. 2007. p. 662. ISBN 9780471777311.
  15. Mochel, Patrick; Murphy, Mike (16 August 2011). "sysfs - The filesystem for exporting kernel objects — The Linux Kernel documentation". kernel.org. Archived from the original on 13 March 2024. Retrieved 15 June 2024.
  16. "sysfs, procfs, sysctl, debugfs and other similar kernel interfaces". John's Blog. 2013-11-20. Retrieved 2024-06-15.
  17. Wysocki, Rafael J. "System Power Management Sleep States". kernel.org. Retrieved 15 June 2024.
Retrieved from "https://en.wikipedia.org/w/index.php?title=Everything_is_a_file&oldid=1323922024"