Network Filesystem Caching API

Fscache provides an API by which a network filesystem can make use of local caching facilities. The API is arranged around a number of principles:

  1. A cache is logically organised into volumes and data storage objects within those volumes.

  2. Volumes and data storage objects are represented by various types of cookie.

  3. Cookies have keys that distinguish them from their peers.

  4. Cookies have coherency data that allows a cache to determine if the cached data is still valid.

  5. I/O is done asynchronously where possible.

This API is used by:

#include <linux/fscache.h>.

Overview

The fscache hierarchy is organised on two levels from a network filesystem’s point of view. The upper level represents “volumes” and the lower level represents “data storage objects”. These are represented by two types of cookie, hereafter referred to as “volume cookies” and “cookies”.

A network filesystem acquires a volume cookie for a volume using a volume key, which represents all the information that defines that volume (e.g. cell name or server address, volume ID or share name). This must be rendered as a printable string that can be used as a directory name (i.e. no ‘/’ characters and shouldn’t begin with a ‘.’). The maximum name length is one less than the maximum size of a filename component (allowing the cache backend one char for its own purposes).

A filesystem would typically have a volume cookie for each superblock.

The filesystem then acquires a cookie for each file within that volume using an object key. Object keys are binary blobs and only need to be unique within their parent volume. The cache backend is responsible for rendering the binary blob into something it can use and may employ hash tables, trees or whatever to improve its ability to find an object. This is transparent to the network filesystem.

A filesystem would typically have a cookie for each inode, and would acquire it in iget and relinquish it when evicting the inode.

Once it has a cookie, the filesystem needs to mark the cookie as being in use. This causes fscache to send the cache backend off to look up/create resources for the cookie in the background, to check its coherency and, if necessary, to mark the object as being under modification.

A filesystem would typically “use” the cookie in its file open routine and unuse it in file release. It also needs to use the cookie around calls to truncate the cookie locally, and to use the cookie when the pagecache becomes dirty and unuse it when writeback is complete. This is slightly tricky, and provision is made for it.

When performing a read, write or resize on a cookie, the filesystem must first begin an operation. This copies the resources into a holding struct and puts extra pins into the cache to stop cache withdrawal from tearing down the structures being used. The actual operation can then be issued and conflicting invalidations can be detected upon completion.

The filesystem is expected to use netfslib to access the cache, but that’s not actually required and it can use the fscache I/O API directly.

Volume Registration

The first step for a network filesystem is to acquire a volume cookie for the volume it wants to access:

    struct fscache_volume *
    fscache_acquire_volume(const char *volume_key,
                           const char *cache_name,
                           const void *coherency_data,
                           size_t coherency_len);

This function creates a volume cookie with the specified volume key as its name and notes the coherency data.

The volume key must be a printable string with no ‘/’ characters in it. It should begin with the name of the filesystem and should be no longer than 254 characters. It should uniquely represent the volume and will be matched with what’s stored in the cache.
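The rules for a volume key can be checked mechanically. The following is a minimal userspace sketch of such a check, not kernel code: the function names, the "examplefs" prefix and the comma-separated layout are assumptions made purely for illustration, and fscache itself does not provide these helpers.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Longest key this sketch accepts, from the 254-character limit above. */
#define SKETCH_VOLUME_KEY_MAX 254

/* Return 1 if key follows the rules stated above, 0 otherwise. */
int sketch_volume_key_ok(const char *key)
{
    size_t len, i;

    /* Reject missing or empty keys and keys that begin with '.'. */
    if (key == NULL || key[0] == '\0' || key[0] == '.')
        return 0;

    len = strlen(key);
    if (len > SKETCH_VOLUME_KEY_MAX)
        return 0;

    /* Every character must be printable and must not be '/'. */
    for (i = 0; i < len; i++) {
        unsigned char c = (unsigned char)key[i];

        if (c == '/' || !isprint(c))
            return 0;
    }
    return 1;
}

/*
 * Build a key of the hypothetical form "<fsname>,<server>,<volume id>".
 * Returns 0 on success, -1 if the result was truncated or is not valid.
 */
int sketch_build_volume_key(char *buf, size_t size, const char *fsname,
                            const char *server, unsigned int vol_id)
{
    int n;

    assert(buf != NULL && size > 0);
    n = snprintf(buf, size, "%s,%s,%08x", fsname, server, vol_id);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return sketch_volume_key_ok(buf) ? 0 : -1;
}
```

A filesystem would build its key in whatever way suits its own identifiers; the point of the sketch is only that the result begins with the filesystem name, is printable, contains no ‘/’ and stays within the length limit.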

The caller may also specify the name of the cache to use. If specified, fscache will look up or create a cache cookie of that name and will use a cache of that name if it is online or comes online. If no cache name is specified, it will use the first cache that comes to hand and set the name to that.

The specified coherency data is stored in the cookie and will be matched against coherency data stored on disk. The data pointer may be NULL if no data is provided. If the coherency data doesn’t match, the entire cache volume will be invalidated.

This function can return errors such as EBUSY if the volume key is already in use by an acquired volume or ENOMEM if an allocation failure occurred. It may also return a NULL volume cookie if fscache is not enabled. It is safe to pass a NULL cookie to any function that takes a volume cookie. This will cause that function to do nothing.

When the network filesystem has finished with a volume, it should relinquish it by calling:

    void fscache_relinquish_volume(struct fscache_volume *volume,
                                   const void *coherency_data,
                                   bool invalidate);

This will cause the volume to be committed or removed, and if sealed the coherency data will be set to the value supplied. The amount of coherency data must match the length specified when the volume was acquired. Note that all data cookies obtained in this volume must be relinquished before the volume is relinquished.

Data File Registration

Once it has a volume cookie, a network filesystem can use it to acquire a cookie for data storage:

    struct fscache_cookie *
    fscache_acquire_cookie(struct fscache_volume *volume,
                           u8 advice,
                           const void *index_key,
                           size_t index_key_len,
                           const void *aux_data,
                           size_t aux_data_len,
                           loff_t object_size);

This creates the cookie in the volume using the specified index key. The index key is a binary blob of the given length and must be unique for the volume. This is saved into the cookie. There are no restrictions on the content, but its length shouldn’t exceed about three quarters of the maximum filename length to allow for encoding.
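The three-quarters guideline can be understood with a small calculation. The sketch below assumes, purely for illustration, that a backend renders the binary key at 6 bits per printable character without padding and that a filename component may hold 255 characters. The encoding actually used by a given cache backend may differ, and the names sketch_encoded_len() and sketch_index_key_fits() are hypothetical rather than part of the fscache API.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed maximum length of one filename component. */
#define SKETCH_NAME_MAX 255

/* Characters needed to encode n bytes at 6 bits per character, unpadded. */
size_t sketch_encoded_len(size_t n)
{
    assert(n <= (size_t)-1 / 8);    /* keep n * 8 from overflowing */
    return (n * 8 + 5) / 6;
}

/* Return 1 if a key of key_len bytes still fits after encoding, else 0. */
int sketch_index_key_fits(size_t key_len)
{
    return key_len > 0 && sketch_encoded_len(key_len) <= SKETCH_NAME_MAX;
}
```

Under these assumptions a 191-byte key encodes to 255 characters and still fits, whereas a 192-byte key needs 256 characters and does not, which is roughly three quarters of the filename limit.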

The caller should also pass in a piece of coherency data in aux_data. A buffer of size aux_data_len will be allocated and the coherency data copied in. It is assumed that the size is invariant over time. The coherency data is used to check the validity of data in the cache. Functions are provided by which the coherency data can be updated.

The file size of the object being cached should also be provided. This may be used to trim the data and will be stored with the coherency data.

This function never returns an error, though it may return a NULL cookie on allocation failure or if fscache is not enabled. It is safe to pass in a NULL volume cookie and pass the NULL cookie returned to any function that takes it. This will cause that function to do nothing.

When the network filesystem has finished with a cookie, it should relinquish it by calling:

    void fscache_relinquish_cookie(struct fscache_cookie *cookie,
                                   bool retire);

This will cause fscache to either commit the storage backing the cookie or delete it.

Marking A Cookie In-Use

Once a cookie has been acquired by a network filesystem, the filesystem should tell fscache when it intends to use the cookie (typically done on file open) and should say when it has finished with it (typically on file close):

    void fscache_use_cookie(struct fscache_cookie *cookie,
                            bool will_modify);
    void fscache_unuse_cookie(struct fscache_cookie *cookie,
                              const void *aux_data,
                              const loff_t *object_size);

The use function tells fscache that it will use the cookie and, additionally, indicates if the user is intending to modify the contents locally. If not yet done, this will trigger the cache backend to go and gather the resources it needs to access/store data in the cache. This is done in the background, and so may not be complete by the time the function returns.

The unuse function indicates that a filesystem has finished using a cookie. It optionally updates the stored coherency data and object size and then decreases the in-use counter. When the last user unuses the cookie, it is scheduled for garbage collection. If not reused within a short time, the resources will be released to reduce system resource consumption.

A cookie must be marked in-use before it can be accessed for read, write or resize - and an in-use mark must be kept whilst there is dirty data in the pagecache in order to avoid an oops due to trying to open a file during process exit.

Note that in-use marks are cumulative. For each time a cookie is marked in-use, it must be unused.
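The cumulative nature of in-use marks can be pictured as a simple counter. The following userspace sketch models only that counting behaviour; struct sketch_cookie and its fields are hypothetical stand-ins and do not reflect the real layout of struct fscache_cookie or the background work that fscache performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a cookie; only the in-use count is modelled. */
struct sketch_cookie {
    unsigned int n_active;  /* outstanding in-use marks */
    bool will_modify;       /* set once any user intends to modify */
};

/* Take one in-use mark. */
void sketch_use(struct sketch_cookie *c, bool will_modify)
{
    if (c == NULL)
        return;             /* a NULL cookie is accepted and ignored */
    c->n_active++;
    if (will_modify)
        c->will_modify = true;
}

/* Drop one in-use mark; returns true if that was the last one. */
bool sketch_unuse(struct sketch_cookie *c)
{
    if (c == NULL)
        return false;
    assert(c->n_active > 0);    /* every unuse must pair with a use */
    c->n_active--;
    return c->n_active == 0;
}
```

In this model, two opens of the same file take two marks, and only the second close returns true, which corresponds to the point at which the real cookie would become a candidate for having its resources released.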

Resizing A Data File (Truncation)

If a network filesystem file is resized locally by truncation, the following should be called to notify the cache:

    void fscache_resize_cookie(struct fscache_cookie *cookie,
                               loff_t new_size);

The caller must have first marked the cookie in-use. The cookie and the new size are passed in and the cache is synchronously resized. This is expected to be called from the ->setattr() inode operation under the inode lock.

Data I/O API

To do data I/O operations directly through a cookie, the following functions are available:

    int fscache_begin_read_operation(struct netfs_cache_resources *cres,
                                     struct fscache_cookie *cookie);
    int fscache_read(struct netfs_cache_resources *cres,
                     loff_t start_pos,
                     struct iov_iter *iter,
                     enum netfs_read_from_hole read_hole,
                     netfs_io_terminated_t term_func,
                     void *term_func_priv);
    int fscache_write(struct netfs_cache_resources *cres,
                      loff_t start_pos,
                      struct iov_iter *iter,
                      netfs_io_terminated_t term_func,
                      void *term_func_priv);

The begin function sets up an operation, attaching the resources required to the cache resources block from the cookie. Assuming it doesn’t return an error (for instance, it will return -ENOBUFS if given a NULL cookie, but otherwise do nothing), then one of the other two functions can be issued.

The read and write functions initiate a direct-IO operation. Both take the previously set up cache resources block, an indication of the start file position, and an I/O iterator that describes the buffer and indicates the amount of data.

The read function also takes a parameter to indicate how it should handle a partially populated region (a hole) in the disk content. This may be to ignore it, skip over an initial hole and place zeros in the buffer or give an error.

The read and write functions can be given an optional termination function that will be run on completion:

    typedef void (*netfs_io_terminated_t)(void *priv,
                                          ssize_t transferred_or_error,
                                          bool was_async);

If a termination function is given, the operation will be run asynchronously and the termination function will be called upon completion. If not given, the operation will be run synchronously. Note that in the asynchronous case, it is possible for the operation to complete before the function returns.

Both the read and write functions end the operation when they complete, detaching any pinned resources.

The read operation will fail with ESTALE if invalidation occurred whilst the operation was ongoing.
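One way to picture how an invalidation during a read is detected is as a counter that is snapshotted when the operation begins and compared when it completes. The userspace sketch below models just that comparison. The struct and function names are hypothetical; only the idea of an invalidation counter copied into the cache resources, and the -ENOBUFS and -ESTALE results, are taken from this document.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical cookie: only the invalidation counter is modelled. */
struct sketch_inval_cookie {
    unsigned int inval_counter;     /* bumped by every invalidation */
};

/* Hypothetical cache resources block for one operation. */
struct sketch_cres {
    const struct sketch_inval_cookie *cookie;
    unsigned int inval_counter;     /* snapshot taken at begin */
};

/* Begin: refuse a NULL cookie, otherwise snapshot the counter. */
int sketch_begin_read(struct sketch_cres *cres,
                      const struct sketch_inval_cookie *cookie)
{
    if (cookie == NULL)
        return -ENOBUFS;
    cres->cookie = cookie;
    cres->inval_counter = cookie->inval_counter;
    return 0;
}

/* Invalidation: make every outstanding snapshot stale. */
void sketch_invalidate(struct sketch_inval_cookie *cookie)
{
    cookie->inval_counter++;
}

/* Completion: the amount transferred, or -ESTALE if invalidated meanwhile. */
long sketch_read_result(const struct sketch_cres *cres, long transferred)
{
    assert(cres->cookie != NULL);
    if (cres->inval_counter != cres->cookie->inval_counter)
        return -ESTALE;
    return transferred;
}
```

In this model a read that completes without an intervening sketch_invalidate() reports the number of bytes transferred, while one that overlaps an invalidation reports -ESTALE, so the caller knows to discard the result and refetch from the server.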

Data File Coherency

To request an update of the coherency data and file size on a cookie, the following should be called:

    void fscache_update_cookie(struct fscache_cookie *cookie,
                               const void *aux_data,
                               const loff_t *object_size);

This will update the cookie’s coherency data and/or file size.

Data File Invalidation

Sometimes it will be necessary to invalidate an object that contains data. Typically this will be necessary when the server informs the network filesystem of a remote third-party change - at which point the filesystem has to throw away the state and cached data that it had for a file and reload from the server.

To indicate that a cache object should be invalidated, the following should be called:

    void fscache_invalidate(struct fscache_cookie *cookie,
                            const void *aux_data,
                            loff_t size,
                            unsigned int flags);

This increases the invalidation counter in the cookie to cause outstanding reads to fail with -ESTALE, sets the coherency data and file size from the information supplied, blocks new I/O on the cookie and dispatches the cache to go and get rid of the old data.

Invalidation runs asynchronously in a worker thread so that it doesn’t block too much.

Write-Back Resource Management

To write data to the cache from network filesystem writeback, the cache resources required need to be pinned at the point the modification is made (for instance when the page is marked dirty) as it’s not possible to open a file in a thread that’s exiting.

The following facilities are provided to manage this:

  • An inode flag, I_PINNING_FSCACHE_WB, is provided to indicate that an in-use is held on the cookie for this inode. It can only be changed if the inode lock is held.

  • A flag, unpinned_fscache_wb, is placed in the writeback_control struct that gets set if __writeback_single_inode() clears I_PINNING_FSCACHE_WB because all the dirty pages were cleared.

To support this, the following functions are provided:

    bool fscache_dirty_folio(struct address_space *mapping,
                             struct folio *folio,
                             struct fscache_cookie *cookie);
    void fscache_unpin_writeback(struct writeback_control *wbc,
                                 struct fscache_cookie *cookie);
    void fscache_clear_inode_writeback(struct fscache_cookie *cookie,
                                       struct inode *inode,
                                       const void *aux);

The set function, fscache_dirty_folio(), is intended to be called from the filesystem’s dirty_folio address space operation. If I_PINNING_FSCACHE_WB is not set, it sets that flag and increments the use count on the cookie (the caller must already have called fscache_use_cookie()).

The unpin function, fscache_unpin_writeback(), is intended to be called from the filesystem’s write_inode superblock operation. It cleans up after writing by unusing the cookie if unpinned_fscache_wb is set in the writeback_control struct.

The clear function, fscache_clear_inode_writeback(), is intended to be called from the netfs’s evict_inode superblock operation. It must be called after truncate_inode_pages_final(), but before clear_inode(). This cleans up any hanging I_PINNING_FSCACHE_WB. It also allows the coherency data to be updated.
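The interplay between the inode flag, the writeback_control flag and the cookie's use count can be followed with a small state model. The userspace sketch below is a simplification under stated assumptions: the structs and function names are hypothetical, locking is omitted, and each boolean merely stands in for the flag named in its comment.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical inode state relevant to writeback pinning. */
struct sketch_wb_inode {
    bool pinning_wb;            /* stands in for I_PINNING_FSCACHE_WB */
    unsigned int cookie_uses;   /* stands in for the cookie's use count */
};

/* Hypothetical per-pass writeback control state. */
struct sketch_wbc {
    bool unpinned_wb;           /* stands in for unpinned_fscache_wb */
};

/* dirty_folio step: take one extra use when the inode first becomes dirty. */
void sketch_dirty(struct sketch_wb_inode *in)
{
    assert(in->cookie_uses > 0);    /* caller already marked the cookie in-use */
    if (!in->pinning_wb) {
        in->pinning_wb = true;
        in->cookie_uses++;
    }
}

/* Writeback core: all dirty pages are clean, so drop the inode flag. */
void sketch_writeback_done(struct sketch_wb_inode *in, struct sketch_wbc *wbc)
{
    if (in->pinning_wb) {
        in->pinning_wb = false;
        wbc->unpinned_wb = true;
    }
}

/* write_inode step: release the use taken in sketch_dirty(), if flagged. */
void sketch_unpin(struct sketch_wb_inode *in, const struct sketch_wbc *wbc)
{
    if (wbc->unpinned_wb)
        in->cookie_uses--;
}
```

However many times pages are dirtied, the model takes only one extra use on the cookie, and that use is dropped once writeback reports that the last dirty page has been cleaned.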

Caching of Local Modifications

If a network filesystem has locally modified data that it wants to write to the cache, it needs to mark the pages to indicate that a write is in progress, and if the mark is already present, it needs to wait for it to be removed first (presumably due to an already in-progress operation). This prevents multiple competing DIO writes to the same storage in the cache.

Firstly, the netfs should determine if caching is available by doing something like:

    bool caching = fscache_cookie_enabled(cookie);

If caching is to be attempted, pages should be waited for and then marked using the following functions provided by the netfs helper library:

    void set_page_fscache(struct page *page);
    void wait_on_page_fscache(struct page *page);
    int wait_on_page_fscache_killable(struct page *page);

Once all the pages in the span are marked, the netfs can ask fscache to schedule a write of that region:

    void fscache_write_to_cache(struct fscache_cookie *cookie,
                                struct address_space *mapping,
                                loff_t start, size_t len, loff_t i_size,
                                netfs_io_terminated_t term_func,
                                void *term_func_priv,
                                bool caching);

And if an error occurs before that point is reached, the marks can be removed by calling:

    void fscache_clear_page_bits(struct address_space *mapping,
                                 loff_t start, size_t len,
                                 bool caching);

In these functions, a pointer to the mapping to which the source pages are attached is passed in and start and len indicate the size of the region that’s going to be written (it doesn’t have to align to page boundaries necessarily, but it does have to align to DIO boundaries on the backing filesystem). The caching parameter indicates whether caching should be performed; if false, the functions do nothing.
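To illustrate what aligning a region to a DIO boundary means, the sketch below rounds a byte range outwards to a multiple of a block size. It is ordinary userspace arithmetic, not part of the fscache API: sketch_align_span() and struct sketch_span are hypothetical names, and the block size to use is an assumption that depends on the backing filesystem.

```c
#include <assert.h>
#include <stdint.h>

/* A byte range described by its start offset and length. */
struct sketch_span {
    uint64_t start;
    uint64_t len;
};

/* Expand [start, start + len) outwards to multiples of align. */
struct sketch_span sketch_align_span(uint64_t start, uint64_t len,
                                     uint64_t align)
{
    struct sketch_span out;
    uint64_t end = start + len;

    /* align must be a non-zero power of two for the masks to work. */
    assert(align != 0 && (align & (align - 1)) == 0);

    out.start = start & ~(align - 1);           /* round start down */
    end = (end + align - 1) & ~(align - 1);     /* round end up */
    out.len = end - out.start;
    return out;
}
```

For example, with a 4096-byte block size a region starting at offset 100 with length 5000 expands to start 0 and length 8192, while a region that is already aligned is returned unchanged.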

The write function takes some additional parameters: the cookie representing the cache object to be written to, i_size indicates the size of the netfs file and term_func indicates an optional completion function, to which term_func_priv will be passed, along with the error or amount written.

Note that the write function will always run asynchronously and will unmark all the pages upon completion before calling term_func.

Page Release and Invalidation

Fscache keeps track of whether we have any data in the cache yet for a cache object we’ve just created. It knows it doesn’t have to do any reading until it has done a write and then the page it wrote from has been released by the VM, after which it has to look in the cache.

To inform fscache that a page might now be in the cache, the following function should be called from the release_folio address space op:

void fscache_note_page_release(struct fscache_cookie *cookie);

if the page has been released (ie. release_folio returned true).
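The tracking described above can be modelled with two flags: one recording that something has been written to the cache and one recording that reads may still skip the cache. The userspace sketch below is such a model under stated assumptions; the struct, its field names and the functions are hypothetical and do not claim to mirror fscache's internal state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-object state for "is there anything to read yet?". */
struct sketch_state {
    bool have_data;         /* something has been written to the cache */
    bool no_data_to_read;   /* reads may skip the cache entirely */
};

/* A freshly created cache object holds nothing worth reading. */
void sketch_new_object(struct sketch_state *s)
{
    assert(s != NULL);
    s->have_data = false;
    s->no_data_to_read = true;
}

/* Record that a write to the cache has been done. */
void sketch_wrote(struct sketch_state *s)
{
    s->have_data = true;
}

/* Page release: after a write, the cache may hold the only copy. */
void sketch_page_released(struct sketch_state *s)
{
    if (s == NULL)
        return;             /* mirror the NULL-tolerant API style */
    if (s->have_data && s->no_data_to_read)
        s->no_data_to_read = false;
}

/* Must a read consult the cache? */
bool sketch_must_read_cache(const struct sketch_state *s)
{
    return !s->no_data_to_read;
}
```

In this model, releasing a page before anything has been written changes nothing, whereas releasing a page after a write makes later reads consult the cache.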

Page release and page invalidation should also wait for any mark left on the page to say that a DIO write is underway from that page:

    void wait_on_page_fscache(struct page *page);
    int wait_on_page_fscache_killable(struct page *page);

API Function Reference

struct fscache_volume *fscache_acquire_volume(const char *volume_key, const char *cache_name, const void *coherency_data, size_t coherency_len)

Register a volume as desiring caching services

Parameters

const char *volume_key

An identification string for the volume

const char *cache_name

The name of the cache to use (or NULL for the default)

const void *coherency_data

Piece of arbitrary coherency data to check (or NULL)

size_t coherency_len

The size of the coherency data

Description

Register a volume as desiring caching services if they’re available. The caller must provide an identifier for the volume and may also indicate which cache it should be in. If a preexisting volume entry is found in the cache, the coherency data must match otherwise the entry will be invalidated.

Returns a cookie pointer on success, -ENOMEM if out of memory or -EBUSY if a cache volume of that name is already acquired. Note that “NULL” is a valid cookie pointer and can be returned if caching is refused.

void fscache_relinquish_volume(struct fscache_volume *volume, const void *coherency_data, bool invalidate)

Cease caching a volume

Parameters

struct fscache_volume *volume

The volume cookie

const void *coherency_data

Piece of arbitrary coherency data to set (or NULL)

bool invalidate

True if the volume should be invalidated

Description

Indicate that a filesystem no longer desires caching services for a volume. The caller must have relinquished all file cookies prior to calling this. The stored coherency data is updated.

struct fscache_cookie *fscache_acquire_cookie(struct fscache_volume *volume, u8 advice, const void *index_key, size_t index_key_len, const void *aux_data, size_t aux_data_len, loff_t object_size)

Acquire a cookie to represent a cache object

Parameters

struct fscache_volume *volume

The volume in which to locate/create this cookie

u8 advice

Advice flags (FSCACHE_COOKIE_ADV_*)

const void *index_key

The index key for this cookie

size_t index_key_len

Size of the index key

const void *aux_data

The auxiliary data for the cookie (may be NULL)

size_t aux_data_len

Size of the auxiliary data buffer

loff_t object_size

The initial size of object

Description

Acquire a cookie to represent a data file within the given cache volume.

See Network Filesystem Caching API for a complete description.

void fscache_use_cookie(struct fscache_cookie *cookie, bool will_modify)

Request usage of cookie attached to an object

Parameters

struct fscache_cookie *cookie

The cookie representing the cache object

bool will_modify

If cache is expected to be modified locally

Description

Request usage of the cookie attached to an object. The caller should tell the cache if the object’s contents are about to be modified locally and then the cache can apply the policy that has been set to handle this case.

void fscache_unuse_cookie(struct fscache_cookie *cookie, const void *aux_data, const loff_t *object_size)

Cease usage of cookie attached to an object

Parameters

struct fscache_cookie *cookie

The cookie representing the cache object

const void *aux_data

Updated auxiliary data (or NULL)

const loff_t *object_size

Revised size of the object (or NULL)

Description

Cease usage of the cookie attached to an object. When the users count reaches zero then the cookie relinquishment will be permitted to proceed.

void fscache_relinquish_cookie(struct fscache_cookie *cookie, bool retire)

Return the cookie to the cache, maybe discarding it

Parameters

struct fscache_cookie *cookie

The cookie being returned

bool retire

True if the cache object the cookie represents is to be discarded

Description

This function returns a cookie to the cache, forcibly discarding the associated cache object if retire is set to true.

See Network Filesystem Caching API for a complete description.

void fscache_update_cookie(struct fscache_cookie *cookie, const void *aux_data, const loff_t *object_size)

Request that a cache object be updated

Parameters

struct fscache_cookie *cookie

The cookie representing the cache object

const void *aux_data

The updated auxiliary data for the cookie (may be NULL)

const loff_t *object_size

The current size of the object (may be NULL)

Description

Request an update of the index data for the cache object associated with the cookie. The auxiliary data on the cookie will be updated first if aux_data is set and the object size will be updated and the object possibly trimmed if object_size is set.

See Network Filesystem Caching API for a complete description.

void fscache_resize_cookie(struct fscache_cookie *cookie, loff_t new_size)

Request that a cache object be resized

Parameters

struct fscache_cookie *cookie

The cookie representing the cache object

loff_t new_size

The new size of the object

Description

Request that the size of an object be changed.

See Network Filesystem Caching API for a complete description.

void fscache_invalidate(struct fscache_cookie *cookie, const void *aux_data, loff_t size, unsigned int flags)

Notify cache that an object needs invalidation

Parameters

struct fscache_cookie *cookie

The cookie representing the cache object

const void *aux_data

The updated auxiliary data for the cookie (may be NULL)

loff_t size

The revised size of the object.

unsigned int flags

Invalidation flags (FSCACHE_INVAL_*)

Description

Notify the cache that an object needs to be invalidated and that it should abort any retrievals or stores it is doing on the cache. This increments inval_counter on the cookie which can be used by the caller to reconsider I/O requests as they complete.

If flags has FSCACHE_INVAL_DIO_WRITE set, this indicates that this is due to a direct I/O write and will cause caching to be disabled on this cookie until it is completely unused.

See Network Filesystem Caching API for a complete description.

const struct netfs_cache_ops *fscache_operation_valid(const struct netfs_cache_resources *cres)

Return true if operations resources are usable

Parameters

const struct netfs_cache_resources *cres

The resources to check.

Description

Returns a pointer to the operations table if usable or NULL if not.

int fscache_begin_read_operation(struct netfs_cache_resources *cres, struct fscache_cookie *cookie)

Begin a read operation for the netfs lib

Parameters

struct netfs_cache_resources *cres

The cache resources for the read being performed

struct fscache_cookie *cookie

The cookie representing the cache object

Description

Begin a read operation on behalf of the netfs helper library. cres indicates the cache resources to which the operation state should be attached; cookie indicates the cache object that will be accessed.

cres->inval_counter is set from cookie->inval_counter for comparison at the end of the operation. This allows invalidation during the operation to be detected by the caller.

Return

  • 0 - Success

  • -ENOBUFS - No caching available

  • Other error code from the cache, such as -ENOMEM.

void fscache_end_operation(struct netfs_cache_resources *cres)

End the read operation for the netfs lib

Parameters

struct netfs_cache_resources *cres

The cache resources for the read operation

Description

Clean up the resources at the end of the read request.

int fscache_read(struct netfs_cache_resources *cres, loff_t start_pos, struct iov_iter *iter, enum netfs_read_from_hole read_hole, netfs_io_terminated_t term_func, void *term_func_priv)

Start a read from the cache.

Parameters

struct netfs_cache_resources *cres

The cache resources to use

loff_t start_pos

The beginning file offset in the cache file

struct iov_iter *iter

The buffer to fill - and also the length

enum netfs_read_from_hole read_hole

How to handle a hole in the data.

netfs_io_terminated_t term_func

The function to call upon completion

void *term_func_priv

The private data for term_func

Description

Start a read from the cache. cres indicates the cache object to read from and must be obtained by a call to fscache_begin_read_operation() beforehand.

The data is read into the iterator, iter, and that also indicates the size of the operation. start_pos is the start position in the file, though if seek_data is set appropriately, the cache can use SEEK_DATA to find the next piece of data, writing zeros for the hole into the iterator.

Upon termination of the operation, term_func will be called and supplied with term_func_priv plus the amount of data read, if successful, or the error code otherwise.

read_hole indicates how a partially populated region in the cache should behandled. It can be one of a number of settings:

NETFS_READ_HOLE_IGNORE - Just try to read (may return a short read).

NETFS_READ_HOLE_FAIL - Give ENODATA if we encounter a hole.

int fscache_begin_write_operation(struct netfs_cache_resources *cres, struct fscache_cookie *cookie)

Begin a write operation for the netfs lib

Parameters

struct netfs_cache_resources *cres

The cache resources for the write being performed

struct fscache_cookie *cookie

The cookie representing the cache object

Description

Begin a write operation on behalf of the netfs helper library. cres indicates the cache resources to which the operation state should be attached; cookie indicates the cache object that will be accessed.

cres->inval_counter is set from cookie->inval_counter for comparison at the end of the operation. This allows invalidation during the operation to be detected by the caller.

Return

  • 0 - Success

  • -ENOBUFS - No caching available

  • Other error code from the cache, such as -ENOMEM.

int fscache_write(struct netfs_cache_resources *cres, loff_t start_pos, struct iov_iter *iter, netfs_io_terminated_t term_func, void *term_func_priv)

Start a write to the cache.

Parameters

struct netfs_cache_resources *cres

The cache resources to use

loff_t start_pos

The beginning file offset in the cache file

struct iov_iter *iter

The data to write - and also the length

netfs_io_terminated_t term_func

The function to call upon completion

void *term_func_priv

The private data for term_func

Description

Start a write to the cache. cres indicates the cache object to write to and must be obtained by a call to fscache_begin_write_operation() beforehand.

The data to be written is obtained from the iterator, iter, and that also indicates the size of the operation. start_pos is the start position in the file.

Upon termination of the operation, term_func will be called and supplied with term_func_priv plus the amount of data written, if successful, or the error code otherwise.

void fscache_clear_page_bits(struct address_space *mapping, loff_t start, size_t len, bool caching)

Clear the PG_fscache bits from a set of pages

Parameters

struct address_space *mapping

The netfs inode to use as the source

loff_t start

The start position in mapping

size_t len

The amount of data to unlock

bool caching

If PG_fscache has been set

Description

Clear the PG_fscache flag from a sequence of pages and wake up anyone who’s waiting.

void fscache_write_to_cache(struct fscache_cookie *cookie, struct address_space *mapping, loff_t start, size_t len, loff_t i_size, netfs_io_terminated_t term_func, void *term_func_priv, bool using_pgpriv2, bool caching)

Save a write to the cache and clear PG_fscache

Parameters

struct fscache_cookie *cookie

The cookie representing the cache object

struct address_space *mapping

The netfs inode to use as the source

loff_t start

The start position in mapping

size_t len

The amount of data to write back

loff_t i_size

The new size of the inode

netfs_io_terminated_t term_func

The function to call upon completion

void *term_func_priv

The private data for term_func

bool using_pgpriv2

If we’re using PG_private_2 to mark in-progress write

bool caching

If we actually want to do the caching

Description

Helper function for a netfs to write dirty data from an inode into the cache object that’s backing it.

start and len describe the range of the data. This does not need to be page-aligned, but to satisfy DIO requirements, the cache may expand it up to the page boundaries on either end. All the pages covering the range must be marked with PG_fscache.

If given, term_func will be called upon completion and supplied with term_func_priv. Note that if using_pgpriv2 is set, the PG_private_2 flags will have been cleared by this point, so the netfs must retain its own pin on the mapping.

void fscache_note_page_release(struct fscache_cookie *cookie)

Note that a netfs page got released

Parameters

struct fscache_cookie *cookie

The cookie corresponding to the file

Description

Note that a page that has been copied to the cache has been released. This means that future reads will need to look in the cache to see if it’s there.