Multigrain Timestamps

Introduction

Historically, the kernel has always used coarse time values to stamp inodes. This value is updated once per jiffy, so any change that happens within that jiffy will end up with the same timestamp.

When the kernel needs to stamp an inode (due to a read or write), it first gets the current time and then compares it to the existing timestamp(s) to see whether anything will change. If nothing would change, it can avoid updating the inode’s metadata.
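The check described above can be sketched in userspace. This is a minimal illustration, not kernel code: it assumes a 4 ms jiffy (HZ=250), and coarse_truncate() and timestamp_needs_update() are hypothetical helpers that model a coarse clock and the "would anything change?" comparison.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Assumption: HZ=250, so one jiffy is 4 ms. Real kernels vary. */
#define JIFFY_NS 4000000L

/* Hypothetical helper: round a fine-grained time down to the start of its
 * jiffy, mimicking what a coarse clock would report. */
static struct timespec coarse_truncate(struct timespec fine)
{
	assert(fine.tv_nsec >= 0 && fine.tv_nsec < 1000000000L);
	fine.tv_nsec -= fine.tv_nsec % JIFFY_NS;
	return fine;
}

/* Hypothetical helper: the inode's metadata only needs updating if the new
 * timestamp differs from the one already stored. */
static bool timestamp_needs_update(const struct timespec *old,
				   const struct timespec *now)
{
	return old->tv_sec != now->tv_sec || old->tv_nsec != now->tv_nsec;
}
```

With a 4 ms jiffy, writes 1 ms and 3.5 ms into the same second both map to the same coarse value, so the second write causes no metadata update; a write just past the 4 ms boundary does.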

Coarse timestamps are therefore good from a performance standpoint, since they reduce the need for metadata updates, but bad for determining whether anything has changed, since a lot can happen in a jiffy.

They are particularly troublesome with NFSv3, where unchanging timestamps can make it difficult to tell whether to invalidate caches. NFSv4 provides a dedicated change attribute that should always show a visible change, but not all filesystems implement this properly, causing the NFS server to substitute the ctime in many cases.

Multigrain timestamps aim to remedy this by selectively using fine-grained timestamps when a file has had its timestamps queried recently and the current coarse-grained time does not cause a change.
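The selection rule can be expressed as a small decision function. This is a hypothetical sketch, not the kernel's implementation: mg_pick_timestamp() is an invented name, timestamps are plain nanosecond counts, and it ignores the global floor described later in this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: choose the next ctime/mtime for a file.
 * current_ts: the timestamp currently stored in the inode
 * queried:    whether someone has read the timestamp since it was set
 * coarse_now: the coarse (per-jiffy) clock reading
 * fine_now:   the fine-grained clock reading, sampled at the same moment */
static int64_t mg_pick_timestamp(int64_t current_ts, bool queried,
				 int64_t coarse_now, int64_t fine_now)
{
	/* A coarse reading is a truncated fine reading, so it is never ahead. */
	assert(fine_now >= coarse_now);

	/* Nobody has looked at the old value: a coarse stamp is good enough,
	 * even if it turns out to be identical to the stored one. */
	if (!queried)
		return coarse_now;

	/* The coarse time already moves the timestamp forward: use it. */
	if (coarse_now > current_ts)
		return coarse_now;

	/* Queried, and the coarse time would not show a change: go fine. */
	return fine_now;
}
```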

Inode Timestamps

There are currently three timestamps in the inode that are updated to the current wallclock time in response to different activity:

ctime:

The inode change time. This is stamped with the current time whenever the inode’s metadata is changed. Note that this value is not settable from userland.

mtime:

The inode modification time. This is stamped with the current time any time a file’s contents change.

atime:

The inode access time. This is stamped whenever an inode’s contents are read. Widely considered to be a terrible mistake. Usually avoided with options like noatime or relatime.

Updating the mtime always implies a change to the ctime, but updating the atime due to a read request does not.

Multigrain timestamps are only tracked for the ctime and the mtime. atimes are not affected and always use the coarse-grained value (subject to the floor).

Inode Timestamp Ordering

In addition to providing information about changes to individual files, file timestamps also serve an important purpose in applications like “make”. These programs compare timestamps to determine whether source files might be newer than cached objects.

Userland applications like make can only determine ordering based on operational boundaries. For a syscall those are the syscall entry and exit points. For io_uring or nfsd operations, that’s the request submission and response. In the case of concurrent operations, userland can make no determination about the order in which things will occur.

For instance, if a single thread modifies one file and then another file in sequence, the second file must show an equal or later mtime than the first. The same is true if two threads are issuing similar operations that do not overlap in time.

If, however, two threads have racing syscalls that overlap in time, then there is no such guarantee, and the second file may appear to have been modified before, after or at the same time as the first, regardless of which one was submitted first.

Note that the above assumes that the system doesn’t experience a backward jump of the realtime clock. If that occurs at an inopportune time, then timestamps can appear to go backward, even on a properly functioning system.
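A userland program can observe the single-threaded ordering guarantee with fstat(). The sketch below, using the hypothetical helpers mtime_before() and stamp_new_file(), creates and writes two temporary files one after the other from a single thread and checks that the second file's st_mtim is not earlier than the first's. It assumes a writable /tmp and no backward jump of the realtime clock during the run.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* True if a's mtime is strictly earlier than b's: the kind of comparison a
 * make-like tool uses to decide whether a target is out of date. */
static bool mtime_before(const struct stat *a, const struct stat *b)
{
	if (a->st_mtim.tv_sec != b->st_mtim.tv_sec)
		return a->st_mtim.tv_sec < b->st_mtim.tv_sec;
	return a->st_mtim.tv_nsec < b->st_mtim.tv_nsec;
}

/* Create a temporary file, write one byte, record its stat, remove it.
 * Returns 0 on success, -1 on any I/O error. */
static int stamp_new_file(char *path_template, struct stat *st)
{
	int fd = mkstemp(path_template);
	bool ok;

	if (fd < 0)
		return -1;
	ok = write(fd, "x", 1) == 1 && fstat(fd, st) == 0;
	close(fd);
	unlink(path_template);
	return ok ? 0 : -1;
}

/* Returns 1 if the second file's mtime is equal to or later than the
 * first's, 0 if it appears earlier, -1 on error. */
static int sequential_mtimes_ordered(void)
{
	char first[] = "/tmp/mgts-XXXXXX";
	char second[] = "/tmp/mgts-XXXXXX";
	struct stat st1, st2;

	if (stamp_new_file(first, &st1) != 0 ||
	    stamp_new_file(second, &st2) != 0)
		return -1;
	return !mtime_before(&st2, &st1);
}
```

The guarantee allows equal mtimes, so the check is "not earlier" rather than "strictly later".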

Multigrain Timestamp Implementation

Multigrain timestamps are aimed at ensuring that changes to a single file are always recognizable, without violating the ordering guarantees when multiple different files are modified. This affects the mtime and the ctime, but the atime will always use coarse-grained timestamps.

The implementation uses an otherwise unused bit in the i_ctime_nsec field to record whether the mtime or ctime has been queried. If either or both have been, the kernel takes special care to ensure the next timestamp update will display a visible change. This ensures tight cache coherency for use-cases like NFS, without sacrificing the benefits of reduced metadata updates when files aren’t being watched.
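Because a nanosecond count never exceeds 999,999,999, which is below 2^30, the top bits of a 32-bit nanosecond field are free to carry a flag. The sketch below packs a "queried" flag into bit 31; the name CTIME_QUERIED and the helper functions are illustrative assumptions rather than the kernel's definitions, and it assumes that storing a new timestamp clears the flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative name: bit 31 is never set by a valid nanosecond value. */
#define CTIME_QUERIED ((uint32_t)1 << 31)

/* Record that the timestamps have been observed (e.g. by stat()). */
static void mark_queried(uint32_t *ctime_nsec)
{
	*ctime_nsec |= CTIME_QUERIED;
}

static bool was_queried(uint32_t ctime_nsec)
{
	return (ctime_nsec & CTIME_QUERIED) != 0;
}

/* The nanosecond value with the flag masked off, as a reader should see it. */
static uint32_t ctime_nsec_value(uint32_t ctime_nsec)
{
	return ctime_nsec & ~CTIME_QUERIED;
}

/* Store a new nanosecond value. In this sketch the flag is cleared, so the
 * next update may be coarse again until someone queries the file. */
static void store_ctime_nsec(uint32_t *ctime_nsec, uint32_t nsec)
{
	assert(nsec < 1000000000u);
	*ctime_nsec = nsec;
}
```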

The Ctime Floor Value

It’s not sufficient to simply use fine- or coarse-grained timestamps based on whether the mtime or ctime has been queried. A file could get a fine-grained timestamp, and then a second file modified later could get a coarse-grained one that appears earlier than the first, which would break the kernel’s timestamp ordering guarantees.

To mitigate this problem, the kernel maintains a global floor value that ensures this can’t happen. The two files in the above example may appear to have been modified at the same time in such a case, but they will never show the reverse order. To avoid problems with realtime clock jumps, the floor is managed as a monotonic ktime_t, and the values are converted to realtime clock values as needed.
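The floor can be sketched with C11 atomics. This is a simplified model, not the kernel's code: times are nanosecond counts in a monotonic timebase passed in by the caller; ctime_floor, coarse_stamp(), fine_stamp() and mono_to_real() are hypothetical names; and the monotonic-to-realtime conversion assumes a fixed offset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Global floor: the latest fine-grained timestamp handed out, kept in the
 * monotonic timebase so realtime clock jumps cannot move it backward. */
static _Atomic int64_t ctime_floor;

/* Coarse stamp: never earlier than any fine-grained stamp already issued. */
static int64_t coarse_stamp(int64_t coarse_mono)
{
	int64_t floor = atomic_load(&ctime_floor);

	return coarse_mono > floor ? coarse_mono : floor;
}

/* Fine stamp: try to publish it as the new floor. If another task raced
 * and advanced the floor first, reuse the value that task published. */
static int64_t fine_stamp(int64_t fine_mono)
{
	int64_t old = atomic_load(&ctime_floor);

	/* Defensive: the floor only holds earlier fine readings. */
	if (fine_mono <= old)
		return old;
	if (atomic_compare_exchange_strong(&ctime_floor, &old, fine_mono))
		return fine_mono;
	/* On failure, 'old' was updated to the floor set by the other task. */
	return old;
}

/* Convert to wallclock for storing in the inode, assuming a fixed
 * monotonic-to-realtime offset supplied by the caller. */
static int64_t mono_to_real(int64_t mono, int64_t real_offset)
{
	return mono + real_offset;
}
```

In the two-file scenario above, a file stamped with a fine-grained time of 150 raises the floor to 150, so a second file modified afterwards with a coarse time of 120 is stamped 150 instead: the two appear simultaneous, never reversed.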

Implementation Notes

Multigrain timestamps are intended for use by local filesystems that get ctime values from the local clock. This is in contrast to network filesystems and the like that just mirror timestamp values from a server.

For most filesystems, it’s sufficient to just set the FS_MGTIME flag in fstype->fs_flags in order to opt in, provided that the ctime is only ever set via inode_set_ctime_current(). If the filesystem has a ->getattr routine that doesn’t call generic_fillattr(), then it should call fill_mg_cmtime() to fill in the mtime and ctime values. For setattr, it should use setattr_copy() to update the timestamps, or otherwise mimic its behavior.