It has been 15 years since I wrote this paper. I have not followed the development of NFS. I occasionally get e-mails about NFS because this paper is online, probably about one per year on average.
For example, Carsten Grohmann contacted me recently and reports:
Currently parts of the documentation are outdated like the locking section. We use NFS locks extensively on RHEL5 and RHEL7 together with NetApp and it works really well.
While the article seems to indicate that I hate NFS, this is not true. I confess that I even use NFS today in my home network to watch recordings from my PVR in another room.
I am releasing this document under a CC-SA license. Enjoy!
— Shane Kerr, 2015-12-31
I. The Problem
II. Introduction to NFS
III. List of Concerns
   a. Time Synchronization
   b. File Locking Semantics
   c. File Locking API
   d. Exclusive File Creation
   e. Delayed Write Caching
   f. Read Caching and File Access Time
   g. Indestructible Files
   h. User and Group Names and Numbers
   i. Superuser Account
   j. Security
   k. Unkillable Processes
   l. Timeouts on Mount and Unmount
   m. Overlapping Exports
   n. Automount
IV. Other Issues to Address
   a. Scalability
   b. Server Replication
V. Typical (Mis)Uses of NFS
   a. Logging
   b. Data Exchange
VI. Reasonable NFS Use
   a. Package Maintenance
   b. User Home Directories
VII. Alternatives
   a. E-mail
   b. Home Directories
   c. Data Migration
VIII. Conclusion
IX. References
I. The Problem

NFS fails at the goal of allowing a computer to access files over a network as if they were on a local disk. In many ways, NFS comes close to the objective, and in certain circumstances (detailed later), this is acceptable. However, the subtle differences can cause subtle bugs and greater system issues. The widespread misconception about the compatibility and transparency of NFS means that it is often used inappropriately, and often put into production when better, more acceptable solutions exist.
II. Introduction to NFS

The Network File System (NFS) is a protocol developed by Sun Microsystems. There are currently two versions (versions 2 and 3), which are documented in various IETF RFCs (these are not IETF standards, merely informational documents), and another version currently being standardized (version 4). NFS is intended to allow a computer to access files over a network as if they were on a local disk. It achieves this via a client-server interface, where one machine exports a drive or portion of a drive, and another machine mounts the export locally. Any combination of imports and mounts is possible, with clients able to mount multiple exports from multiple servers, servers exporting multiple directories to multiple clients, and hosts that act as both clients and servers.

NFS is built on the Remote Procedure Call (RPC) mechanism, making it a stateless protocol. In fact, NFS is not stateless from the client point of view, but it is from the server point of view. This means that the server does not keep any information about the clients using it. The advantage of this is that there is no additional burden placed on the server for additional clients (except for any load incurred by actual data transfer). The idea is that while clients will probably use only a few servers, servers will need to support large numbers of clients. In addition to scalability, using RPC allows NFS to be extremely robust in the event of failure on either the client or server. If a client fails, the server will typically not notice. If a server fails, any pending client operations will typically resume without observable interruption, except for a temporary pause in operations until the server has resumed normal operation.

As protocols go, NFS has been widely accepted. Every network operating system has had NFS ported to it in one form or another, and it is used in almost every Unix environment worldwide. It provides a convenient mechanism for sharing data across platforms, and is a relatively robust, nearly ubiquitous solution to centralized data storage problems. Engineers are familiar with it, users are accustomed to it, and developers continue to improve it.
III. List of Concerns

Following are a few known problems with NFS and suggested workarounds.
a. Time Synchronization
NFS does not synchronize time between client and server, and offers no mechanism for the client to determine what time the server thinks it is. What this means is that a client can update a file, and have the timestamp on the file be either some time long in the past, or even in the future, from its point of view.

While this is generally not an issue if clocks are a few seconds or even a few minutes off, it can be confusing and misleading to humans. Of even greater importance is the effect on programs. Programs often do not expect time differences like this, and may end abnormally or behave strangely, as various tasks time out instantly, or take an extraordinarily long while to time out.

Poor time synchronization also makes debugging problems difficult, because there is no easy way to establish a chronology of events. This is especially problematic when investigating security issues, such as break-in attempts.

Workaround: Use the Network Time Protocol (NTP) religiously. Use of NTP can result in machines that have extremely small time differences.

Note: The NFS protocol version 3 does have support for the client specifying the time when updating a file, but this is not widely implemented. Additionally, it does not help in the case where two clients are accessing the same file from machines with drifting clocks.
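To see the skew from the client's point of view, one can compare the local clock with the modification time the server stamps on a freshly written file. This is a minimal sketch, not part of the original article; the path /mnt/nfs/skewtest is an assumed NFS-mounted location, and attribute caching may round the result.

    /*
     * Hypothetical sketch: estimate client/server clock skew by writing
     * a file on an assumed NFS mount and comparing the server-assigned
     * mtime with the local clock.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/stat.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/mnt/nfs/skewtest";   /* assumed NFS path */

        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) {
            perror("open");
            return EXIT_FAILURE;
        }
        if (write(fd, "x", 1) != 1) {
            perror("write");
            close(fd);
            return EXIT_FAILURE;
        }
        close(fd);

        struct stat st;
        if (stat(path, &st) < 0) {
            perror("stat");
            return EXIT_FAILURE;
        }

        /* st_mtime was stamped by the server; time() is the client clock. */
        double skew = difftime(st.st_mtime, time(NULL));
        printf("apparent client/server skew: %.0f seconds\n", skew);
        return EXIT_SUCCESS;
    }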
b. File Locking Semantics
Programs use file locking to ensure that concurrent access to files does not occur except when guaranteed to be safe. This prevents data corruption, and allows handshaking between cooperative processes.

In Unix, the kernel handles file locking. This is required so that if a program is terminated, any locks that it has are released. It also allows the operations to be atomic, meaning that a lock cannot be obtained by multiple processes.

Because NFS is stateless, there is no way for the server to keep track of file locks - it simply does not know what clients there are or what files they are using. In an effort to solve this, a separate server, the lock daemon, was added. Typically, each NFS server will run a lock daemon.

The combination of lock daemon and NFS server yields a solution that is almost like Unix file locking. Unfortunately, file locking is extremely slow, compared to NFS traffic without file locking (or file locking on a local Unix disk). Of greater concern is the behaviour of NFS locking on failure.

In the event of server failure (e.g. server reboot or lock daemon restart), all client locks are lost. However, the clients are not informed of this, and because the other operations (read, write, and so on) are not visibly interrupted, they have no reliable way to prevent other clients from obtaining a lock on a file they think they have locked.

In the event of client failure, the locks are not immediately freed. Nor is there a timeout. If the client process terminates, the client OS kernel will notify the server, and the lock will be freed. However, if the client system shuts down abnormally (e.g. power failure or kernel panic), then the server will not be notified. When the client reboots and remounts the NFS exports, the server is notified and any client locks are freed.

If the client does not reboot, for example if a frustrated user hits the power switch and goes home for the weekend, or if a computer has had a hardware failure and must wait for replacement parts, then the locks are never freed! In this unfortunate scenario, the server lock daemon must be restarted, with the same effects as a server failure.

Workaround: If possible (given program source and skill with code modification), remove locking and ensure no inconsistency occurs via other mechanisms, possibly using atomic file creation (see below) or some other mechanism for synchronization. Otherwise, build platforms that never fail, and have staff trained on the implications of NFS file locking failure. If NFS is used only for files that are never accessed by more than a single client, locking is not an issue.

Note: A status monitor mechanism exists to monitor client status, and free client locks if a client is unavailable. However, clients may choose not to use this mechanism, and in many implementations do not.
c. File Locking API
In Unix, there are two flavours of file locking, flock() from BSD and lockf() from System V. It varies from system to system which of these mechanisms works with NFS. In Solaris, Sun's Unix variant, lockf() works with NFS, and flock() is implemented via lockf(). On other systems, the results are less consistent. For example, on some systems, lockf() is not implemented at all, and flock() does not support NFS; while on other systems, lockf() supports NFS but flock() does not.

Regardless of the system specifics, programs often assume that if they are unable to obtain a lock, it is because another program has the lock. This can cause problems as programs wait for the lock to be freed. Since the reason the lock fails is that locking is unsupported, the attempt to obtain a lock will never succeed. This results in either the applications waiting forever, or aborting their operation.
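A program can at least avoid waiting forever by distinguishing "lock held by someone else" from "locking not supported here" when a non-blocking lock attempt fails. The following sketch is illustrative only and not from the original article; the exact errno values reported for unsupported locking vary by system.

    /*
     * Hedged sketch: attempt a non-blocking write lock with lockf() and
     * inspect errno instead of retrying blindly.
     */
    #include <errno.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Returns 1 if locked, 0 if held by another process, -1 if locking is unusable. */
    int try_lock(int fd)
    {
        if (lockf(fd, F_TLOCK, 0) == 0)
            return 1;                   /* got the lock */

        if (errno == EACCES || errno == EAGAIN)
            return 0;                   /* genuinely held by another process */

        if (errno == ENOLCK || errno == EOPNOTSUPP)
            return -1;                  /* no usable locking on this mount */

        perror("lockf");                /* some other failure */
        return -1;
    }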
These results will also vary with the support of the server. While typically the NFS server runs an accompanying lock daemon, this is not guaranteed.

Workaround: Upgrade to the latest versions of all operating systems, as they usually have improved and more consistent locking support. Also, use the lock daemon. Additionally, try to use only programs written to handle NFS locking properly, verified either by code review or a vendor compliance statement.
d. Exclusive File Creation
In Unix, when a program creates a file, it may ask for the operation to fail if the file already exists (as opposed to the default behaviour of using the existing file). This allows programs to know that, for example, they have a unique file name for a temporary file. It is also used by various daemons for locking various operations, e.g. modifying mail folders or print queues.

Unfortunately, NFS does not properly implement this behaviour. A file creation will sometimes return success even if the file already exists. Programs written to work on a local file system will experience strange results when they attempt to update a file after using file creation to lock it, only to discover that another program, which "locked" the file via the same mechanism, is modifying it (I have personally seen mailboxes with hundreds of mail messages corrupted because of this).
Workaround: If possible (given program source and skill with code modification), use the following method, as documented in the Linux open() manual page:

    The solution for performing atomic file locking using a lockfile is to
    create a unique file on the same fs (e.g., incorporating hostname and
    pid), use link(2) to make a link to the lockfile and use stat(2) on the
    unique file to check if its link count has increased to 2. Do not use
    the return value of the link() call.

This still leaves the issue of client failure unanswered. The suggested solution for this is to pick a timeout value and assume that if a lock is older than a certain application-specific age, it has been abandoned.
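The following is a rough sketch of that recipe in C. It is illustrative rather than authoritative; the names, the 600-second staleness timeout, and the minimal error handling are all assumptions.

    /*
     * Sketch of the link(2)-based lock described in the open(2) man page
     * quote above, plus the suggested staleness timeout.
     */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>
    #include <unistd.h>

    #define STALE_SECONDS 600   /* application-specific timeout, as suggested */

    int acquire_lock(const char *lockfile)
    {
        char unique[1024];
        char host[256];

        gethostname(host, sizeof(host));
        snprintf(unique, sizeof(unique), "%s.%s.%ld",
                 lockfile, host, (long)getpid());

        /* Create the unique file on the same file system as the lockfile. */
        int fd = open(unique, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd < 0)
            return -1;
        close(fd);

        /* Ignore link()'s return value; trust the link count instead. */
        (void)link(unique, lockfile);

        struct stat st;
        int locked = (stat(unique, &st) == 0 && st.st_nlink == 2);
        unlink(unique);

        if (!locked) {
            /* Possible stale lock left by a dead client: check its age. */
            if (stat(lockfile, &st) == 0 &&
                difftime(time(NULL), st.st_mtime) > STALE_SECONDS) {
                unlink(lockfile);       /* assume abandoned; caller may retry */
            }
            return -1;
        }
        return 0;                       /* lockfile now holds our lock */
    }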
e. Delayed Write Caching
In an effort to improve efficiency, many NFS clients cache writes. This means that they delay sending small writes to the server, with the idea that if the client makes another small write in a short amount of time, the client need only send a single message to the server.

Unix servers typically cache disk writes to local disks the same way. The difference is that Unix servers also keep track of the state of the file in the cache memory versus the state on disk, so programs are all presented with a single view of the file.

In NFS caching, all applications on a single client will typically see the same file contents. However, applications accessing the file from different clients will not see the same file for several seconds.

Workaround: It is often possible to disable client write caching. Unfortunately, this frequently causes unacceptably slow performance, depending on the application. (Applications that perform I/O of large chunks of data should be unaffected, but applications that perform lots of small I/O operations will be severely punished.) If locking is employed, applications can explicitly cooperate and flush files from the local cache to the server, but see the previous sections on locking when employing this solution.
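Where explicit cooperation is possible, fsync() is the usual way to push cached writes out to the server before another client is expected to read the file. A minimal sketch, assuming ordinary POSIX I/O on a path that happens to be NFS-mounted; it narrows, but does not eliminate, the window described above.

    /* Write a buffer and force it out of the client cache to the server. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int write_and_flush(const char *path, const char *buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;

        ssize_t n = write(fd, buf, len);   /* may land only in the client cache */

        if (n != (ssize_t)len || fsync(fd) < 0) {   /* push it to the server */
            close(fd);
            return -1;
        }
        return close(fd);                  /* close() may flush again on NFS */
    }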
f. Read Caching and File Access Time
Unix file systems typically have three times associated with a file: the time of last modification (file creation or write), the time of last "change" (write or change of inode information), and the time of last access (file execution or read). NFS file systems also report this information.

NFS clients perform attribute caching for efficiency reasons. Reading small amounts of data does not update the access time on the server. This means a server may report a file as unaccessed for a much longer time than is accurate.
This can cause problems as administrators and automatic cleanup software may delete files that have remained unused for a long time, expecting them to be stale lock files, abandoned temporary files, and so on.

Workaround: Attribute caching may be disabled on the client, but this is usually not a good idea for performance reasons. Administrators should be trained to understand the behaviour of NFS regarding file access time. Any programs that rely on access time information should be modified to use another mechanism.
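For reference, the three timestamps described at the start of this section can be inspected with stat(2). This small sketch is an illustration only; on an NFS client the values may come from the attribute cache rather than from the server.

    /* Print the three Unix timestamps for a file, as reported locally. */
    #include <stdio.h>
    #include <sys/stat.h>
    #include <time.h>

    void show_times(const char *path)
    {
        struct stat st;
        if (stat(path, &st) != 0) {
            perror("stat");
            return;
        }
        printf("mtime (last modification): %s", ctime(&st.st_mtime));
        printf("ctime (last inode change): %s", ctime(&st.st_ctime));
        printf("atime (last access):       %s", ctime(&st.st_atime));
    }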
g. Indestructible Files
In Unix, when a file is opened, the data of that file is accessible to the process that opened it, even if the file is deleted. The disk blocks the file uses are freed only when the last process which has it open has closed it.

An NFS server, being stateless, has no way to know what clients have a file open. Indeed, NFS clients never really "open" or "close" files. So when a file is deleted, the server merely frees the space. Woe be unto any client that was expecting the file contents to be accessible as before, as in the Unix world!

In an effort to minimize this as much as possible, when a client deletes a file, the operating system checks if any process on the same client box has it open. If it does, the client renames the file to a "hidden" file. Any read or write requests from processes on the client that were to the now-deleted file go to the new file.
This file is named in the form .nfsXXXX, where the XXXX value is determined by the inode of the deleted file - basically a random value. If a process (such as rm) attempts to delete this new file from the client, it is replaced by a new .nfsXXXX file, until the process with the file open closes it.

These files are difficult to get rid of, as the process with the file open needs to be killed, and it is not easy to determine what that process is. These files may have unpleasant side effects such as preventing directories from being removed.
If the server or client crashes while a .nfsXXXX file is in use, it will never be deleted. There is no way for the server or a client to know whether a .nfsXXXX file is currently being used by a client or not.

Workaround: One should be able to delete .nfsXXXX files from another client; however, if a process then writes to the file, it will be created again at that time. It would be best to exit or kill processes using an NFS file before deleting it. Unfortunately, there is no way to know if an uncooperative process has a file open.
h. User and Group Names and Numbers
NFS uses user and group numbers, rather than names. This means that each machine that accesses an NFS export needs (or at least should) to have the same user and group identifiers as the NFS export has. Note that this problem is not unique to NFS, and also applies, for instance, to removable media and archives. It is most frequently an issue with NFS, however.

Workaround: Either the /etc/passwd and /etc/group files must be synchronized, or something like NIS+ needs to be used for this purpose.
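To illustrate the underlying issue: a file's numeric owner is translated into a name using only the client's own password database, so an out-of-sync database shows the "wrong" owner even though the file itself is intact. A small illustrative sketch (not from the original article):

    /* Show how a uid from an NFS server is resolved purely locally. */
    #include <pwd.h>
    #include <stdio.h>
    #include <sys/stat.h>

    void show_owner(const char *path)
    {
        struct stat st;
        if (stat(path, &st) != 0) {
            perror("stat");
            return;
        }

        struct passwd *pw = getpwuid(st.st_uid);   /* local lookup only */
        if (pw != NULL)
            printf("%s is owned by uid %ld (known here as %s)\n",
                   path, (long)st.st_uid, pw->pw_name);
        else
            printf("%s is owned by uid %ld (no such user on this client)\n",
                   path, (long)st.st_uid);
    }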
i. Superuser Account
NFS has special handling of the superuser account (also known as the root account). By default, the root user may not update files on an NFS mount.

Normally on a Unix system, root may do anything to any file. When an NFS drive has been mounted, this is no longer the case. This can confuse scripts and administrators alike.

To clarify: a normal user (for example "shane" or "billg") can update files that the superuser ("root") cannot.

Workaround: Enable root access to specific clients for NFS exports, but only in a trusted environment, since NFS is insecure. Therefore, this does not guarantee that an unauthorized client will be unable to access the mount as root.
j. Security
NFS is inherently insecure. While certain methods of encrypted authentication exist, all data is transmitted in the clear, and accounts may be spoofed by user programs.

Workaround: Only use NFS within a highly trusted environment, behind the appropriate firewalls and with careful attention to host security. Visit the NFS (in)security administration and information clearinghouse for suggestions.
SunWorld offers the following advice in the Solaris Security FAQ:
Disable NFS if possible. NFS traffic flows in clear-text (even when using "AUTH_DES" or "AUTH_KERB" for authentication) so any files transported via NFS are susceptible to snooping.
Sound advice, indeed.
k. Unkillable Processes
When an NFS server is unavailable, the client will typically not return an error to the process attempting to use it. Rather, the client will retry the operation. At some point, it will eventually give up and return an error to the process.

In Unix there are two kinds of devices, slow and fast. The semantics of I/O operations vary depending on the type of device. For example, a read on a fast device will always fill a buffer, whereas a read on a slow device will return any data ready, even if the buffer is not filled. Disks (even floppy disks or CD-ROMs) are considered fast devices.

The Unix kernel typically does not allow fast I/O operations to be interrupted. The idea is to avoid the overhead of putting a process into a suspended state until data is available, because the data is always either available or not. For disk reads, this is not a problem, because a delay of even hundreds of milliseconds while waiting for the I/O to complete is rarely harmful to system operation.

NFS mounts, since they are intended to mimic disks, are also considered fast devices. However, in the event of a server failure, an NFS disk can take minutes to eventually return success or failure to the application. A program using data on an NFS mount can therefore remain in an uninterruptible state until a final timeout occurs.
Workaround: Don't panic when a process will not terminate from repeated kill -9 commands. If ps reports the process is in state D, there is a good chance that it is waiting on an NFS mount. Wait 10 minutes, and if the process has still not terminated, then panic.
l. Timeouts on Mount and Unmount
If the Unix mount or umount command attempts to operate on an NFS mount, it will retry the operation until success or timeout, which can take several minutes. Normally, this is not a great problem when mounting and unmounting manually. However, when added to the file system table for a host, it can slow booting and shutdown considerably. This can impact service negatively, as machines can take much longer to reboot than they would otherwise.

A special case of this is circular mounts. It is possible for a machine to mount a directory from a machine that it is also exporting to. This is not normally a problem in service, but during system boots it can cause problems. During power failures, for instance, systems will often hang waiting to mount each other, until eventually timing out.

Any mounting scenario that eventually leads back to the originating host can cause this problem.

Workaround: File system tables should be audited to make sure that entries for old NFS servers have not been left behind.

Circular mounts should be avoided. (This is sometimes more difficult than it sounds, for instance in environments where mail folders are exported from a single host, and home directories are exported from another single host. Both of these hosts need the mounts from the other. In this case, try automount for these file systems - but see below for automount concerns.)
m. Overlapping Exports
Unlike mounts from secondary storage, NFS exports are not tied to distinct physical media, but are determined by system administrator configuration. Because of this, it is possible to export a directory that refers to a single file system multiple times.

    # export of printer spool directory
    /var/spool/lpr *.ripe.net(rw)

    # export of mail spool, from the same file system
    /var/spool/mail *.ripe.net(rw)

The problem here is that it is not immediately obvious to users and applications on clients who have mounted both exports that these reside on the same physical media. When df reports each partition has 200 Mbyte free, for instance, they may assume there is actually 400 Mbyte free, when in fact there is only the same 200 Mbyte reported twice.
Workaround: Either be careful not to export directories from a single file system separately, or train administrators and possibly users as to the implications of this style of export. If the free space and inode count for several mounts is identical, check if they are actually from the same physical storage.
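That check can be automated on the client with statvfs(). A hedged sketch follows; matching counts are only a strong hint, not proof, that two mounts share the same storage. Applied to the exports above, one might call probably_same_fs("/var/spool/lpr", "/var/spool/mail") on a client that mounts both.

    /* Compare size, free space, and inode counts of two mounted paths. */
    #include <sys/statvfs.h>

    /* Returns 1 if the mounts look identical, 0 if not, -1 on error. */
    int probably_same_fs(const char *a, const char *b)
    {
        struct statvfs sa, sb;

        if (statvfs(a, &sa) != 0 || statvfs(b, &sb) != 0)
            return -1;

        return sa.f_blocks == sb.f_blocks &&
               sa.f_bfree  == sb.f_bfree  &&
               sa.f_files  == sb.f_files  &&
               sa.f_ffree  == sb.f_ffree;
    }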
Note: Certain NFS servers, such as the Network Appliance servers, allow exports to have size restrictions lower than the amount of free space on the physical storage. In this case, there is no way for a client to know if they actually use the same disk(s). Use this "feature" with caution.
n. Automount
Automounting is a method where Unix mounts file systems on an as-needed basis. This is useful for removable media, which may change frequently. It is also useful for NFS, which may not be accessible all the time, and which would require a lot of time to mount large numbers of file systems.

One minor concern is the transitive nature of automounted mounts, which deviates from standard Unix practice. Not a large concern, but it is yet another way in which NFS mounts deviate in operation from normal local mounts.
Another issue is the invisibility of automount mount points. Normally, a Unix filesystem requires that a directory (hopefully empty) exists where a mount is attached. For automount directories this is not the case. An example:

    $ cd /home
    $ ls -F
    $ cat shane/hello.pl
    #!/usr/bin/perl
    print "Hello, world (Perl rules).\n";
    $ ls -F
    shane/
    $

As you see here, the /home directory was empty; then, on access of /home/shane/hello.pl, the directory /home/shane was automounted. At this point, the /home directory contained a directory. Most users would not expect to have a directory created by the cat command.
Workaround: Train administrators on the effects of automounting so they will not be surprised.
IV. Other Issues to Address

a. Scalability
NFS is designed the way it is for scalability. Even so, NFS is typically not used on the scale of more than a few dozen machines. While it is possible that larger configurations exist, it is doubtful that NFS can reasonably support them. The problem is not so much software or system support, but rather bottlenecks on the media. Consider the performance implications of sharing even an extremely fast hard disk with hundreds of other users.

b. Server Replication
Some NFS implementations support server replication. The implications on performance, administration, and data consistency for these configurations need to be seriously examined and evaluated.
V. Typical (Mis)Uses of NFS

NFS is generally considered an easy means of solving data sharing problems because it is often already configured and installed. The problems with this incorrect assumption are often not discovered until a great deal of work is required to re-engineer a solution. As a result, the mostly working, but not best, solution is left in place.

a. Logging
The idea of using NFS for logging is that it allows a single location where an entire site's records can be maintained. This allows for more efficient use of disk resources and centralization of administrative tasks like backup. However, using a centralized log file system makes all applications using it vulnerable to any of them filling the available space. This can happen when a single task on the network runs away.

Also, if a host is having network problems, it may stop logging, making it difficult for the administrator to determine what chain of events led up to the current state. At the very least, the administrator will not be able to access these logs from that host.
Security is always a primary concern with NFS, but with centralized logging, potentially sensitive information may be broadcast to the LAN. For example, "from" and "to" information on e-mail is often logged by e-mail servers. Malicious users can also take advantage of the previously mentioned concerns to confuse administrators and cover their tracks.

Any use of the network runs a higher risk of data loss than local disks. With two hosts, the chance of hardware error is higher than with a single host. Additionally, any switches or cables between the hosts may fail. Hosts are also vulnerable to DNS failures, DHCP failures, as well as security risks like UDP spoofs and the previously mentioned denial of service attacks.
b. Data Exchange
NFS is often used to exchange data between cooperating tasks running on different machines. Typically it involves a well-known directory where one or more machines add files and one or more machines remove the files for processing.

This can work, but coordination is difficult. Developers and administrators need to be aware of the issues documented in the list of concerns above.
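One coordination technique that sidesteps some of these issues (not discussed in the original text, and offered here only as a sketch) is for the producer to write into a temporary name and then rename() the file into the drop directory, so that consumers never pick up a half-written file. The names below are illustrative.

    /* Write data to a temporary name, then rename it into the drop directory. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int drop_file(const char *dir, const char *name,
                  const char *data, size_t len)
    {
        char tmp[1024], final[1024];

        snprintf(tmp,   sizeof(tmp),   "%s/.%s.%ld.tmp", dir, name, (long)getpid());
        snprintf(final, sizeof(final), "%s/%s",          dir, name);

        int fd = open(tmp, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd < 0)
            return -1;

        if (write(fd, data, len) != (ssize_t)len || fsync(fd) < 0) {
            close(fd);
            unlink(tmp);
            return -1;
        }
        close(fd);

        /* rename() is a single operation on the server, so a consumer
           sees either no file or the complete file, never a partial one. */
        return rename(tmp, final);
    }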
Synchronization is often difficult when using NFS for data exchange, especially when using tools not intended for this task. A common backup procedure is for hosts to place archive files in an NFS mount that a different host copies to backup media. This can lead to disaster as file systems fill and the backup procedure takes longer, resulting in the copy to tape starting before the backup has completed. (The real disaster occurs when a restoration from this tape is attempted.)
VI. Reasonable NFS Use

a. Package Maintenance
Administrators often have to maintain packages across a large number of machines. In this case, they desire to reduce the work and chance for misconfiguration by centralizing package installation and configuration. This is an excellent application for NFS. Executables and their configuration files can be exported in a read-only fashion. Users will automatically have the latest versions of their software available.

Changes should still occur during off-peak times to avoid user confusion as much as possible.
b. User Home Directories
NFS is often used to present a consistent environment to users. By using NFS to mount users' home directories, machines can be treated as equivalent by users.

In this scenario, care must be taken to present users with a consistent environment in other regards. This means identical package installation and so on. It is also difficult to do this in a heterogeneous environment, where different operating systems and/or hardware are used. In this case, users should be helped to adapt their scripts or configurations to the different environments where they need them.

Also, users must be informed that their home directories are NFS mounted, and made aware of the implications of that. At the very least, their cron jobs should only operate from a single host, and they will have to be educated on the appropriate mechanism for password changes and such.
VII. Alternatives

Simply asking for an XYZFS to replace NFS is not the solution. Like kicking any bad habit, moving away from NFS use requires some dedication and flexibility. Possible alternatives include:
Other applications will require other solutions, but by and large these are neither especially complicated nor hard to configure and administer.

a. E-mail
Rather than exporting /var/mail or /var/spool/mail, administrators should configure clients to use SMTP and IMAP (or POP3 if for some reason IMAP is not available). This has the added advantage of being easy for users to use outside of the corporate environment, such as when telecommuting or travelling.

b. Home Directories
As noted above, using NFS for home directories is sometimes acceptable. I suggest that for many cases a radical new idea be employed: using workstation local hard drives for a user's home directory.

Modern workstations (often PCs in today's environments) tend to have extremely fast and large hard disks. Being local, they also have full speed access, far beyond anything but the fastest network controllers. The main concern is obtaining proper backups, but this should not be any more work than administering a typical NFS configuration.
A further concern is migrating user data.
c. Data Migration
Processes often run on different platforms for any number of reasons. They often need to share data.

Rather than using NFS for this task, use explicit data duplication via one of the long-established mechanisms designed for this purpose.
One easy solution is to use FTP to move this data. An even easier solution, once installed, is to use the scp program that is included with SSH to move data. This has an added advantage of being optionally encrypted and/or compressed. Do not use rcp for this purpose, as it is inherently insecure and often introduces security holes.
VIII. Conclusion

NFS suffers from a fundamental design flaw: it attempts to find a compromise between dealing with the network environment efficiently and preserving existing access semantics. The advertised ability of NFS to treat network resources as if they were local plainly does not measure up under close scrutiny.
There are two potential solutions to the problem of remote access that do not make this compromise.
One potential solution is to abandon Unix file semantics for remote access. This is the solution taken by many of the advanced file systems researched by the academic community. This allows improved control over the effects of working in a network environment, and also allows for use of special features only available in this environment, such as disconnected activity or using data across multiple hosts. Unfortunately, to take full advantage of it, software must be rewritten with an understanding of the different semantics.
Another solution would be to pursue Unix semantics to their full extent, without compromising for performance or reliability reasons, which has never been seriously attempted. It may be that this would be a poor solution, but I suggest that in a modern environment, with processor speed and memory availability greatly outstripping increases in the number of clients, this would not be the case.
Until a "plug-in" alternative to NFS is developed, administrators,developers, managers, and all others who are in positions to recommendnetworking and storage solutions would do well to avoid use of NFSwhenever possible. If a decision to use NFS is finally reached, at thevery least the issues discussed here should be addressed in the earlyphases of installation, to avoid painful changes late in implementationor delivery.
