Linux Filesystems API summary

This section contains API-level documentation, mostly taken from the source code itself.

The Linux VFS

The Filesystem types

enum positive_aop_returns

aop return codes with specific semantics

Constants

AOP_WRITEPAGE_ACTIVATE

Informs the caller that page writeback has completed, that the page is still locked, and should be considered active. The VM uses this hint to return the page to the active list -- it won’t be a candidate for writeback again in the near future. Other callers must be careful to unlock the page if they get this return. Returned by writepage().

AOP_TRUNCATED_PAGE

The AOP method that was handed a locked page has unlocked it and the page might have been truncated. The caller should back up to acquiring a new page and trying again. The aop will be taking reasonable precautions not to livelock. If the caller held a page reference, it should drop it before retrying. Returned by read_folio().

Description

address_space_operation functions return these large constants to indicate special semantics to the caller. These are much larger than the bytes in a page to allow for functions that return the number of bytes operated on in a given page.
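As a rough userspace sketch of the idea above, a caller can tell the special constants apart from ordinary byte counts because they lie well outside the range of a page. The numeric values are assumed to mirror include/linux/fs.h and may differ between kernel versions; classify_aop_return(), enum aop_result and SKETCH_PAGE_SIZE are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Values assumed to mirror enum positive_aop_returns in include/linux/fs.h. */
enum positive_aop_returns {
	AOP_WRITEPAGE_ACTIVATE = 0x80000,
	AOP_TRUNCATED_PAGE     = 0x80001,
};

/* Hypothetical page size for this sketch; kernel code would use PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096

/* The constants only work if no in-page byte count can collide with them. */
static_assert(AOP_WRITEPAGE_ACTIVATE > SKETCH_PAGE_SIZE,
	      "aop codes must exceed any in-page byte count");

/* Hypothetical result categories, not a kernel type. */
enum aop_result { AOP_ERROR, AOP_BYTES, AOP_ACTIVATE, AOP_RETRY };

/* Classify an aop return value the way a caller might. */
enum aop_result classify_aop_return(int ret)
{
	if (ret < 0)
		return AOP_ERROR;	/* negative errno */
	if (ret == AOP_WRITEPAGE_ACTIVATE)
		return AOP_ACTIVATE;	/* page is still locked: caller must unlock it */
	if (ret == AOP_TRUNCATED_PAGE)
		return AOP_RETRY;	/* drop the reference, get a new page, retry */
	return AOP_BYTES;		/* number of bytes operated on in the page */
}
```

A byte count such as 4096 falls through to AOP_BYTES, while 0x80001 is reported as a retry request.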

struct address_space

Contents of a cacheable, mappable object.

Definition:

struct address_space {
    struct inode            *host;
    struct xarray           i_pages;
    struct rw_semaphore     invalidate_lock;
    gfp_t                   gfp_mask;
    atomic_t                i_mmap_writable;
#ifdef CONFIG_READ_ONLY_THP_FOR_FS
    atomic_t                nr_thps;
#endif
    struct rb_root_cached   i_mmap;
    unsigned long           nrpages;
    pgoff_t                 writeback_index;
    const struct address_space_operations *a_ops;
    unsigned long           flags;
    errseq_t                wb_err;
    spinlock_t              i_private_lock;
    struct list_head        i_private_list;
    struct rw_semaphore     i_mmap_rwsem;
    void *                  i_private_data;
};

Members

host

Owner, either the inode or the block_device.

i_pages

Cached pages.

invalidate_lock

Guards coherency between page cache contents and file offset->disk block mappings in the filesystem during invalidates. It is also used to block modification of page cache contents through memory mappings.

gfp_mask

Memory allocation flags to use for allocating pages.

i_mmap_writable

Number of VM_SHARED, VM_MAYWRITE mappings.

nr_thps

Number of THPs in the pagecache (non-shmem only).

i_mmap

Tree of private and shared mappings.

nrpages

Number of page entries, protected by the i_pages lock.

writeback_index

Writeback starts here.

a_ops

Methods.

flags

Error bits and flags (AS_*).

wb_err

The most recent error which has occurred.

i_private_lock

For use by the owner of the address_space.

i_private_list

For use by the owner of the address_space.

i_mmap_rwsem

Protects i_mmap and i_mmap_writable.

i_private_data

For use by the owner of the address_space.

struct file_ra_state

Track a file’s readahead state.

Definition:

struct file_ra_state {
    pgoff_t         start;
    unsigned int    size;
    unsigned int    async_size;
    unsigned int    ra_pages;
    unsigned short  order;
    unsigned short  mmap_miss;
    loff_t          prev_pos;
};

Members

start

Where the most recent readahead started.

size

Number of pages read in the most recent readahead.

async_size

Number of pages that were/are not needed immediately and so were/are genuinely “ahead”. Start next readahead when the first of these pages is accessed.

ra_pages

Maximum size of a readahead request, copied from the bdi.

order

Preferred folio order used for most recent readahead.

mmap_miss

How many mmap accesses missed in the page cache.

prev_pos

The last byte in the most recent read request.

Description

When this structure is passed to ->readahead(), the “most recent” readahead means the current readahead.
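The relationship between start, size and async_size can be illustrated with a small userspace sketch. It assumes the "ahead" pages are the last async_size pages of the window, which is how the member descriptions above read; struct ra_sketch, ra_async_trigger() and ra_index_is_ahead() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>

typedef unsigned long pgoff_t;	/* stand-in for the kernel type */

/* Userspace copy of the three fields this sketch needs. */
struct ra_sketch {
	pgoff_t start;			/* where the most recent readahead started */
	unsigned int size;		/* pages read in that readahead */
	unsigned int async_size;	/* trailing pages that are genuinely "ahead" */
};

/* Index of the first "ahead" page; touching it should start the next readahead. */
pgoff_t ra_async_trigger(const struct ra_sketch *ra)
{
	assert(ra->async_size <= ra->size);
	return ra->start + ra->size - ra->async_size;
}

/* Does an access to page index fall in the "ahead" part of the window? */
int ra_index_is_ahead(const struct ra_sketch *ra, pgoff_t index)
{
	return index >= ra_async_trigger(ra) && index < ra->start + ra->size;
}
```

With start = 100, size = 32 and async_size = 8, the window covers pages 100 to 131 and the first "ahead" page is 124.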

struct file

Represents a file

Definition:

struct file {
    spinlock_t                      f_lock;
    fmode_t                         f_mode;
    const struct file_operations    *f_op;
    struct address_space            *f_mapping;
    void                            *private_data;
    struct inode                    *f_inode;
    unsigned int                    f_flags;
    unsigned int                    f_iocb_flags;
    const struct cred               *f_cred;
    struct fown_struct              *f_owner;
    union {
        const struct path       f_path;
        struct path             __f_path;
    };
    union {
        struct mutex            f_pos_lock;
        u64                     f_pipe;
    };
    loff_t                          f_pos;
#ifdef CONFIG_SECURITY
    void                            *f_security;
#endif
    errseq_t                        f_wb_err;
    errseq_t                        f_sb_err;
#ifdef CONFIG_EPOLL
    struct hlist_head               *f_ep;
#endif
    union {
        struct callback_head    f_task_work;
        struct llist_node       f_llist;
        struct file_ra_state    f_ra;
        freeptr_t               f_freeptr;
    };
    file_ref_t                      f_ref;
};

Members

f_lock

Protects f_ep, f_flags. Must not be taken from IRQ context.

f_mode

FMODE_* flags often used in hotpaths

f_op

file operations

f_mapping

Contents of a cacheable, mappable object.

private_data

filesystem or driver specific data

f_inode

cached inode

f_flags

file flags

f_iocb_flags

iocb flags

f_cred

stashed credentials of creator/opener

f_owner

file owner

{unnamed_union}

anonymous

f_path

path of the file

__f_path

writable alias for f_path; ONLY for core VFS and only before the file gets open

{unnamed_union}

anonymous

f_pos_lock

lock protecting file position

f_pipe

specific to pipes

f_pos

file position

f_security

LSM security context of this file

f_wb_err

writeback error

f_sb_err

per sb writeback errors

f_ep

link of all epoll hooks for this file

{unnamed_union}

anonymous

f_task_work

task work entry point

f_llist

work queue entrypoint

f_ra

file’s readahead state

f_freeptr

Pointer used by SLAB_TYPESAFE_BY_RCU file cache (don’t touch.)

f_ref

reference count

vfsuid_t i_uid_into_vfsuid(struct mnt_idmap *idmap, const struct inode *inode)

map an inode’s i_uid down according to an idmapping

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

const struct inode *inode

inode to map

Return

the inode’s i_uid mapped down according to idmap. If the inode’s i_uid has no mapping INVALID_VFSUID is returned.

bool i_uid_needs_update(struct mnt_idmap *idmap, const struct iattr *attr, const struct inode *inode)

check whether inode’s i_uid needs to be updated

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

const struct iattr *attr

the new attributes of inode

const struct inode *inode

the inode to update

Description

Check whether inode’s i_uid field needs to be updated, taking idmapped mounts into account if the filesystem supports it.

Return

true if inode’s i_uid field needs to be updated, false if not.

void i_uid_update(struct mnt_idmap *idmap, const struct iattr *attr, struct inode *inode)

update inode’s i_uid field

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

const struct iattr *attr

the new attributes of inode

struct inode *inode

the inode to update

Description

Safely update inode’s i_uid field translating the vfsuid of any idmapped mount into the filesystem kuid.

vfsgid_t i_gid_into_vfsgid(struct mnt_idmap *idmap, const struct inode *inode)

map an inode’s i_gid down according to an idmapping

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

const struct inode *inode

inode to map

Return

the inode’s i_gid mapped down according to idmap. If the inode’s i_gid has no mapping INVALID_VFSGID is returned.

bool i_gid_needs_update(struct mnt_idmap *idmap, const struct iattr *attr, const struct inode *inode)

check whether inode’s i_gid needs to be updated

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

const struct iattr *attr

the new attributes of inode

const struct inode *inode

the inode to update

Description

Check whether inode’s i_gid field needs to be updated, taking idmapped mounts into account if the filesystem supports it.

Return

true if inode’s i_gid field needs to be updated, false if not.

void i_gid_update(struct mnt_idmap *idmap, const struct iattr *attr, struct inode *inode)

update inode’s i_gid field

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

const struct iattr *attr

the new attributes of inode

struct inode *inode

the inode to update

Description

Safely update inode’s i_gid field translating the vfsgid of any idmapped mount into the filesystem kgid.

void inode_fsuid_set(struct inode *inode, struct mnt_idmap *idmap)

initialize inode’s i_uid field with caller’s fsuid

Parameters

struct inode *inode

inode to initialize

struct mnt_idmap *idmap

idmap of the mount the inode was found from

Description

Initialize the i_uid field of inode. If the inode was found/created via an idmapped mount, map the caller’s fsuid according to idmap.

void inode_fsgid_set(struct inode *inode, struct mnt_idmap *idmap)

initialize inode’s i_gid field with caller’s fsgid

Parameters

struct inode *inode

inode to initialize

struct mnt_idmap *idmap

idmap of the mount the inode was found from

Description

Initialize the i_gid field of inode. If the inode was found/created via an idmapped mount, map the caller’s fsgid according to idmap.

bool fsuidgid_has_mapping(struct super_block *sb, struct mnt_idmap *idmap)

check whether caller’s fsuid/fsgid is mapped

Parameters

struct super_block *sb

the superblock we want a mapping in

struct mnt_idmap *idmap

idmap of the relevant mount

Description

Check whether the caller’s fsuid and fsgid have a valid mapping in the s_user_ns of the superblock sb. If the caller is on an idmapped mount, map the caller’s fsuid and fsgid according to the idmap first.

Return

true if fsuid and fsgid are mapped, false if not.

struct timespec64 inode_set_ctime(struct inode *inode, time64_t sec, long nsec)

set the ctime in the inode

Parameters

struct inode *inode

inode in which to set the ctime

time64_t sec

tv_sec value to set

long nsec

tv_nsec value to set

Description

Set the ctime in inode to { sec, nsec }.

bool file_write_started(const struct file *file)

check if SB_FREEZE_WRITE is held

Parameters

const struct file *file

the file we write to

Description

May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN. May be false positive with !S_ISREG, because file_start_write() has no effect on !S_ISREG.

bool file_write_not_started(const struct file *file)

check if SB_FREEZE_WRITE is not held

Parameters

const struct file *file

the file we write to

Description

May be false positive with !CONFIG_LOCKDEP/LOCK_STATE_UNKNOWN. May be false positive with !S_ISREG, because file_start_write() has no effect on !S_ISREG.

struct renamedata

contains all information required for renaming

Definition:

struct renamedata {
    struct mnt_idmap        *mnt_idmap;
    struct dentry           *old_parent;
    struct dentry           *old_dentry;
    struct dentry           *new_parent;
    struct dentry           *new_dentry;
    struct delegated_inode  *delegated_inode;
    unsigned int            flags;
};

Members

mnt_idmap

idmap of the mount in which the rename is happening.

old_parent

parent of source

old_dentry

source

new_parent

parent of destination

new_dentry

destination

delegated_inode

returns an inode needing a delegation break

flags

rename flags

bool is_mgtime(const struct inode *inode)

is this inode using multigrain timestamps

Parameters

const struct inode *inode

inode to test for multigrain timestamps

Description

Return true if the inode uses multigrain timestamps, false otherwise.

bool is_idmapped_mnt(const struct vfsmount *mnt)

check whether a mount is mapped

Parameters

const struct vfsmount *mnt

the mount to check

Description

If mnt has an idmap other than nop_mnt_idmap attached to it, then mnt is mapped.

Return

true if mount is mapped, false if not.

void file_start_write(struct file *file)

get write access to a superblock for regular file io

Parameters

struct file *file

the file we want to write to

Description

This is a variant of sb_start_write() which is a noop on non-regular files. Should be matched with a call to file_end_write().

void file_end_write(struct file *file)

drop write access to a superblock of a regular file

Parameters

struct file *file

the file we wrote to

Description

Should be matched with a call to file_start_write().

void kiocb_start_write(struct kiocb *iocb)

get write access to a superblock for async file io

Parameters

struct kiocb *iocb

the io context we want to submit the write with

Description

This is a variant of sb_start_write() for async io submission. Should be matched with a call to kiocb_end_write().

void kiocb_end_write(struct kiocb *iocb)

drop write access to a superblock after async file io

Parameters

struct kiocb *iocb

the io context we submitted the write with

Description

Should be matched with a call to kiocb_start_write().

bool is_dot_dotdot(const char *name, size_t len)

returns true only if name is “.” or “..”

Parameters

const char *name

file name to check

size_t len

length of file name, in bytes
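The check described above is small enough to sketch in userspace. This is an illustrative reimplementation under the stated semantics, not the kernel's code; sketch_is_dot_dotdot() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace sketch: true only if the len-byte name is "." or "..". */
bool sketch_is_dot_dotdot(const char *name, size_t len)
{
	assert(name != NULL);

	if (len == 1)
		return name[0] == '.';
	if (len == 2)
		return name[0] == '.' && name[1] == '.';
	return false;
}
```

Because the length is passed explicitly, the name does not need to be NUL-terminated; only the first len bytes are examined.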

bool name_contains_dotdot(const char *name)

check if a file name contains “..” path components

Parameters

const char *name

File path string to check

Description

Search for “..” surrounded by either ‘/’ or start/end of string.
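A userspace sketch of the search described above: walk the path one component at a time and report a match when a component is exactly “..”. This is one possible way to implement the stated rule, not necessarily how the kernel does it; sketch_name_contains_dotdot() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace sketch: does any '/'-separated component of name equal ".."? */
bool sketch_name_contains_dotdot(const char *name)
{
	const char *p = name;

	assert(name != NULL);

	while (*p) {
		/* Length of the current component, up to the next '/' or the end. */
		size_t len = strcspn(p, "/");

		if (len == 2 && p[0] == '.' && p[1] == '.')
			return true;
		p += len;
		if (*p == '/')
			p++;	/* skip the separator */
	}
	return false;
}
```

Names such as "a..b" or "..." do not match, because the two dots are not bounded by ‘/’ or the ends of the string.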

void inode_dio_begin(struct inode *inode)

signal start of a direct I/O request

Parameters

struct inode *inode

inode the direct I/O happens on

Description

This is called before a direct I/O request is submitted, and marks the request as in flight so that callers waiting for direct I/O to be quiesced wait for it to finish.

void inode_dio_end(struct inode *inode)

signal finish of a direct I/O request

Parameters

struct inode *inode

inode the direct I/O happens on

Description

This is called once we’ve finished processing a direct I/O request, and is used to wake up callers waiting for direct I/O to be quiesced.

bool generic_ci_validate_strict_name(struct inode *dir, const struct qstr *name)

Check if a given name is suitable for a directory

Parameters

struct inode *dir

inode of the directory where the new file will be created

const struct qstr *name

name of the new file

Description

This function checks if the proposed filename is valid for the parent directory. That means that only valid UTF-8 filenames will be accepted for casefold directories from filesystems created with the strict encoding flag. That also means that any name will be accepted for directories that don’t have casefold enabled, or aren’t being strict with the encoding.

Return

  • True: if the filename is suitable for this directory. It can be true if a given name is not suitable for a strict encoding directory, but the directory being used isn’t strict

  • False if the filename isn’t suitable for this directory. This only happens when a directory is casefolded and the filesystem is strict about its encoding.
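The return rules above reduce to a small decision, sketched here in userspace with plain booleans in place of the real inode, superblock and UTF-8 validation. struct toy_dir_flags and sketch_validate_strict_name() are hypothetical names; how the kernel obtains each of the three inputs is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the directory and filesystem properties. */
struct toy_dir_flags {
	bool casefolded;	/* directory has casefold enabled */
	bool strict_encoding;	/* filesystem was created with the strict flag */
};

/* Sketch of the decision: reject only invalid names in strict casefold dirs. */
bool sketch_validate_strict_name(const struct toy_dir_flags *dir,
				 bool name_is_valid_utf8)
{
	assert(dir != NULL);

	if (!dir->casefolded || !dir->strict_encoding)
		return true;	/* any name is accepted */
	return name_is_valid_utf8;
}
```

Only one of the eight input combinations yields false: a casefolded directory on a strict filesystem with a name that is not valid UTF-8.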

The Directory Cache

void d_drop(struct dentry *dentry)

drop a dentry

Parameters

struct dentry *dentry

dentry to drop

Description

d_drop() unhashes the entry from the parent dentry hashes, so that it won’t be found through a VFS lookup any more. Note that this is different from deleting the dentry - d_delete will try to mark the dentry negative if possible, giving a successful _negative_ lookup, while d_drop will just make the cache lookup fail.

d_drop() is used mainly for stuff that wants to invalidate a dentry for some reason (NFS timeouts or autofs deletes).

__d_drop requires dentry->d_lock

___d_drop doesn’t mark dentry as “unhashed” (dentry->d_hash.pprev will be LIST_POISON2, not NULL).

struct dentry *d_find_any_alias(struct inode *inode)

find any alias for a given inode

Parameters

struct inode *inode

inode to find an alias for

Description

If any aliases exist for the given inode, take and return a reference for one of them. If no aliases exist, return NULL.

struct dentry *d_find_alias(struct inode *inode)

grab a hashed alias of inode

Parameters

struct inode *inode

inode in question

Description

If inode has a hashed alias, or is a directory and has any alias, acquire the reference to alias and return it. Otherwise return NULL. Notice that if inode is a directory there can be only one alias and it can be unhashed only if it has no children, or if it is the root of a filesystem, or if the directory was renamed and d_revalidate was the first vfs operation to notice.

If the inode has an IS_ROOT, DCACHE_DISCONNECTED alias, then prefer any other hashed alias over that one.

void d_dispose_if_unused(struct dentry *dentry, struct list_head *dispose)

move unreferenced dentries to shrink list

Parameters

struct dentry *dentry

dentry in question

struct list_head *dispose

head of shrink list

Description

If dentry has no external references, move it to shrink list.

NOTE!!! The caller is responsible for preventing eviction of the dentry by holding dentry->d_inode->i_lock or equivalent.

void shrink_dcache_sb(struct super_block *sb)

shrink dcache for a superblock

Parameters

struct super_block *sb

superblock

Description

Shrink the dcache for the specified super block. This is used to free the dcache before unmounting a file system.

int path_has_submounts(const struct path *parent)

check for mounts over a dentry in the current namespace.

Parameters

const struct path *parent

path to check.

Description

Return true if the parent or its subdirectories contain a mount point in the current namespace.

void d_invalidate(struct dentry *dentry)

detach submounts, prune dcache, and drop

Parameters

struct dentry *dentry

dentry to invalidate (aka detach, prune and drop)

struct dentry *d_alloc(struct dentry *parent, const struct qstr *name)

allocate a dcache entry

Parameters

struct dentry *parent

parent of entry to allocate

const struct qstr *name

qstr of the name

Description

Allocates a dentry. It returns NULL if there is insufficient memory available. On success the dentry is returned. The name passed in is copied and the copy passed in may be reused after this call.

void d_instantiate(struct dentry *entry, struct inode *inode)

fill in inode information for a dentry

Parameters

struct dentry *entry

dentry to complete

struct inode *inode

inode to attach to this dentry

Description

Fill in inode information in the entry.

This turns negative dentries into productive full members of society.

NOTE! This assumes that the inode count has been incremented (or otherwise set) by the caller to indicate that it is now in use by the dcache.

struct dentry *d_obtain_alias(struct inode *inode)

find or allocate a DISCONNECTED dentry for a given inode

Parameters

struct inode *inode

inode to allocate the dentry for

Description

Obtain a dentry for an inode resulting from NFS filehandle conversion or similar open by handle operations. The returned dentry may be anonymous, or may have a full name (if the inode was already in the cache).

When called on a directory inode, we must ensure that the inode only ever has one dentry. If a dentry is found, that is returned instead of allocating a new one.

On successful return, the reference to the inode has been transferred to the dentry. In case of an error the reference on the inode is released. To make it easier to use in export operations a NULL or IS_ERR inode may be passed in and the error will be propagated to the return value, with a NULL inode replaced by ERR_PTR(-ESTALE).

struct dentry *d_obtain_root(struct inode *inode)

find or allocate a dentry for a given inode

Parameters

struct inode *inode

inode to allocate the dentry for

Description

Obtain an IS_ROOT dentry for the root of a filesystem.

We must ensure that directory inodes only ever have one dentry. If a dentry is found, that is returned instead of allocating a new one.

On successful return, the reference to the inode has been transferred to the dentry. In case of an error the reference on the inode is released. A NULL or IS_ERR inode may be passed in and the error will be propagated to the return value, with a NULL inode replaced by ERR_PTR(-ESTALE).

struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode, struct qstr *name)

lookup or allocate new dentry with case-exact name

Parameters

struct dentry *dentry

the negative dentry that was passed to the parent’s lookup func

struct inode *inode

the inode case-insensitive lookup has found

struct qstr *name

the case-exact name to be associated with the returned dentry

Description

This is to avoid filling the dcache with case-insensitive names to the same inode; only the actual correct case is stored in the dcache for case-insensitive filesystems.

For a case-insensitive lookup match and if the case-exact dentry already exists in the dcache, use it and return it.

If no entry exists with the exact case name, allocate a new dentry with the exact case, and return the spliced entry.

bool d_same_name(const struct dentry *dentry, const struct dentry *parent, const struct qstr *name)

compare dentry name with case-exact name

Parameters

const struct dentry *dentry

the negative dentry that was passed to the parent’s lookup func

const struct dentry *parent

parent dentry

const struct qstr *name

the case-exact name to be associated with the returned dentry

Return

true if names are same, or false

struct dentry *d_lookup(const struct dentry *parent, const struct qstr *name)

search for a dentry

Parameters

const struct dentry *parent

parent dentry

const struct qstr *name

qstr of name we wish to find

Return

dentry, or NULL

Description

d_lookup searches the children of the parent dentry for the name in question. If the dentry is found its reference count is incremented and the dentry is returned. The caller must use dput to free the entry when it has finished using it. NULL is returned if the dentry does not exist.
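The reference-counting contract described above (a successful lookup takes a reference that the caller must drop) can be illustrated with a toy userspace model. It uses a flat array instead of the kernel's hash table and ignores locking entirely; struct toy_dentry, struct toy_dir, toy_d_lookup() and toy_dput() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy dentry: just a name and a reference count. */
struct toy_dentry {
	const char *name;
	int refcount;
};

/* Toy parent: a flat array of children instead of a hash table. */
struct toy_dir {
	struct toy_dentry *children;
	size_t nr_children;
};

/* Find name among the parent's children; on a hit, take a reference. */
struct toy_dentry *toy_d_lookup(struct toy_dir *parent, const char *name)
{
	assert(parent != NULL && name != NULL);

	for (size_t i = 0; i < parent->nr_children; i++) {
		struct toy_dentry *d = &parent->children[i];

		if (strcmp(d->name, name) == 0) {
			d->refcount++;	/* the caller now owns one reference */
			return d;
		}
	}
	return NULL;	/* no such child */
}

/* Drop a reference taken by toy_d_lookup(); NULL is tolerated. */
void toy_dput(struct toy_dentry *dentry)
{
	if (dentry)
		dentry->refcount--;
}
```

Every successful toy_d_lookup() should be balanced by exactly one toy_dput(), mirroring the d_lookup()/dput pairing the description requires.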

void d_delete(struct dentry *dentry)

delete a dentry

Parameters

struct dentry *dentry

The dentry to delete

Description

Turn the dentry into a negative dentry if possible, otherwise remove it from the hash queues so it can be deleted later.

void d_rehash(struct dentry *entry)

add an entry back to the hash

Parameters

struct dentry *entry

dentry to add to the hash

Description

Adds a dentry to the hash according to its name.

void d_add(struct dentry *entry, struct inode *inode)

add dentry to hash queues

Parameters

struct dentry *entry

dentry to add

struct inode *inode

The inode to attach to this dentry

Description

This adds the entry to the hash queues and initializes inode. The entry was actually filled in earlier during d_alloc().

struct dentry *d_splice_alias(struct inode *inode, struct dentry *dentry)

splice a disconnected dentry into the tree if one exists

Parameters

struct inode *inode

the inode which may have a disconnected dentry

struct dentry *dentry

a negative dentry which we want to point to the inode.

Description

If inode is a directory and has an IS_ROOT alias, then d_move that in place of the given dentry and return it, else simply d_add the inode to the dentry and return NULL.

If a non-IS_ROOT directory is found, the filesystem is corrupt, and we should error out: directories can’t have multiple aliases.

This is needed in the lookup routine of any filesystem that is exportable (via knfsd) so that we can build dcache paths to directories effectively.

If a dentry was found and moved, then it is returned. Otherwise NULL is returned. This matches the expected return value of ->lookup.

Cluster filesystems may call this function with a negative, hashed dentry. In that case, we know that the inode will be a regular file, and also this will only occur during atomic_open. So we need to check for the dentry being already hashed only in the final case.

bool is_subdir(struct dentry *new_dentry, struct dentry *old_dentry)

is new dentry a subdirectory of old_dentry

Parameters

struct dentry *new_dentry

new dentry

struct dentry *old_dentry

old dentry

Description

Returns true if new_dentry is a subdirectory of the parent (at any depth). Returns false otherwise. Caller must ensure that “new_dentry” is pinned before calling is_subdir()
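The "at any depth" test can be pictured as a walk up the parent chain. The userspace sketch below assumes a root is marked by being its own parent and treats a node as a subdirectory of itself; both are choices of this sketch. It ignores the locking and rename races that kernel code has to handle, and struct toy_node and toy_is_subdir() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy tree node; in this sketch a root is its own parent. */
struct toy_node {
	struct toy_node *parent;
};

/* Walk up from new_dentry; true if old_dentry is met on the way to the root. */
bool toy_is_subdir(const struct toy_node *new_dentry,
		   const struct toy_node *old_dentry)
{
	const struct toy_node *d = new_dentry;

	assert(new_dentry != NULL && old_dentry != NULL);

	for (;;) {
		if (d == old_dentry)
			return true;
		if (d->parent == d)
			return false;	/* reached the root without a match */
		d = d->parent;
	}
}
```

The walk terminates as long as every chain of parent pointers ends in a self-parented root.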

struct dentry *dget_dlock(struct dentry *dentry)

get a reference to a dentry

Parameters

struct dentry *dentry

dentry to get a reference to

Description

Given a live dentry, increment the reference count and return the dentry. Caller must hold dentry->d_lock. Making sure that dentry is alive is caller’s responsibility. There are many conditions sufficient to guarantee that; e.g. anything with non-negative refcount is alive, so’s anything hashed, anything positive, anyone’s parent, etc.

struct dentry *dget(struct dentry *dentry)

get a reference to a dentry

Parameters

struct dentry *dentry

dentry to get a reference to

Description

Given a dentry or NULL pointer increment the reference count if appropriate and return the dentry. A dentry will not be destroyed when it has references. Conversely, a dentry with no references can disappear for any number of reasons, starting with memory pressure. In other words, that primitive is used to clone an existing reference; using it on something with zero refcount is a bug.

NOTE

it will spin if dentry->d_lock is held. From the deadlock avoidance point of view it is equivalent to spin_lock()/increment refcount/spin_unlock(), so calling it under dentry->d_lock is always a bug; so’s calling it under ->d_lock on any of its descendants.

int d_unhashed(const struct dentry *dentry)

is dentry hashed

Parameters

const struct dentry *dentry

entry to check

Description

Returns true if the dentry passed is not currently hashed.

bool d_really_is_negative(const struct dentry *dentry)

Determine if a dentry is really negative (ignoring fallthroughs)

Parameters

const struct dentry *dentry

The dentry in question

Description

Returns true if the dentry represents either an absent name or a name that doesn’t map to an inode (ie. ->d_inode is NULL). The dentry could represent a true miss, a whiteout that isn’t represented by a 0,0 chardev or a fallthrough marker in an opaque directory.

Note! (1) This should be used only by a filesystem to examine its own dentries. It should not be used to look at some other filesystem’s dentries. (2) It should also be used in combination with d_inode() to get the inode. (3) The dentry may have something attached to ->d_lower and the type field of the flags may be set to something other than miss or whiteout.

bool d_really_is_positive(const struct dentry *dentry)

Determine if a dentry is really positive (ignoring fallthroughs)

Parameters

const struct dentry *dentry

The dentry in question

Description

Returns true if the dentry represents a name that maps to an inode (ie. ->d_inode is not NULL). The dentry might still represent a whiteout if that is represented on medium as a 0,0 chardev.

Note! (1) This should be used only by a filesystem to examine its own dentries. It should not be used to look at some other filesystem’s dentries. (2) It should also be used in combination with d_inode() to get the inode.

struct inode *d_inode(const struct dentry *dentry)

Get the actual inode of this dentry

Parameters

const struct dentry *dentry

The dentry to query

Description

This is the helper normal filesystems should use to get at their own inodes in their own dentries and ignore the layering superimposed upon them.

struct inode *d_inode_rcu(const struct dentry *dentry)

Get the actual inode of this dentry with READ_ONCE()

Parameters

const struct dentry *dentry

The dentry to query

Description

This is the helper normal filesystems should use to get at their own inodes in their own dentries and ignore the layering superimposed upon them.

struct inode *d_backing_inode(const struct dentry *upper)

Get upper or lower inode we should be using

Parameters

const struct dentry *upper

The upper layer

Description

This is the helper that should be used to get at the inode that will be used if this dentry were to be opened as a file. The inode may be on the upper dentry or it may be on a lower dentry pinned by the upper.

Normal filesystems should not use this to access their own inodes.

struct dentry *d_real(struct dentry *dentry, enum d_real_type type)

Return the real dentry

Parameters

struct dentry *dentry

the dentry to query

enum d_real_type type

the type of real dentry (data or metadata)

Description

If dentry is on a union/overlay, then return the underlying, real dentry. Otherwise return the dentry itself.

See also: Overview of the Linux Virtual File System

struct inode *d_real_inode(const struct dentry *dentry)

Return the real inode hosting the data

Parameters

const struct dentry *dentry

The dentry to query

Description

If dentry is on a union/overlay, then return the underlying, real inode. Otherwise return d_inode().

Inode Handling

int inode_init_always_gfp(struct super_block *sb, struct inode *inode, gfp_t gfp)

perform inode structure initialisation

Parameters

struct super_block *sb

superblock inode belongs to

struct inode *inode

inode to initialise

gfp_t gfp

allocation flags

Description

These are initializations that need to be done on every inode allocation as the fields are not initialised by slab allocation. If additional allocations are required, gfp is used.

void drop_nlink(struct inode *inode)

directly drop an inode’s link count

Parameters

struct inode *inode

inode

Description

This is a low-level filesystem helper to replace any direct filesystem manipulation of i_nlink. In cases where we are attempting to track writes to the filesystem, a decrement to zero means an imminent write when the file is truncated and actually unlinked on the filesystem.

void clear_nlink(struct inode *inode)

directly zero an inode’s link count

Parameters

struct inode *inode

inode

Description

This is a low-level filesystem helper to replace any direct filesystem manipulation of i_nlink. See drop_nlink() for why we care about i_nlink hitting zero.

void set_nlink(struct inode *inode, unsigned int nlink)

directly set an inode’s link count

Parameters

struct inode *inode

inode

unsigned int nlink

new nlink (should be non-zero)

Description

This is a low-level filesystem helper to replace any direct filesystem manipulation of i_nlink.

void inc_nlink(struct inode *inode)

directly increment an inode’s link count

Parameters

struct inode *inode

inode

Description

This is a low-level filesystem helper to replace any direct filesystem manipulation of i_nlink. Currently, it is only here for parity with dec_nlink().

void inode_sb_list_add(struct inode *inode)

add inode to the superblock list of inodes

Parameters

struct inode *inode

inode to add

void __insert_inode_hash(struct inode *inode, unsigned long hashval)

hash an inode

Parameters

struct inode *inode

unhashed inode

unsigned long hashval

unsigned long value used to locate this object in the inode_hashtable.

Description

Add an inode to the inode hash for this superblock.

void __remove_inode_hash(struct inode *inode)

remove an inode from the hash

Parameters

struct inode *inode

inode to unhash

Description

Remove an inode from the superblock.

void evict_inodes(struct super_block *sb)

evict all evictable inodes for a superblock

Parameters

struct super_block *sb

superblock to operate on

Description

Make sure that no inodes with zero refcount are retained. This is called by superblock shutdown after having SB_ACTIVE flag removed, so any inode reaching zero refcount during or after that call will be immediately evicted.

struct inode *new_inode(struct super_block *sb)

obtain an inode

Parameters

struct super_block *sb

superblock

Description

Allocates a new inode for given superblock. The default gfp_mask for allocations related to inode->i_mapping is GFP_HIGHUSER_MOVABLE. If HIGHMEM pages are unsuitable or it is known that pages allocated for the page cache are not reclaimable or migratable, mapping_set_gfp_mask() must be called with suitable flags on the newly created inode’s mapping.

void unlock_new_inode(struct inode *inode)

clear the I_NEW state and wake up any waiters

Parameters

struct inode *inode

new inode to unlock

Description

Called when the inode is fully initialised to clear the new state of the inode and wake up anyone waiting for the inode to finish initialisation.

void lock_two_nondirectories(struct inode *inode1, struct inode *inode2)

take two i_mutexes on non-directory objects

Parameters

struct inode *inode1

first inode to lock

struct inode *inode2

second inode to lock

Description

Lock any non-NULL argument. Passed objects must not be directories. Zero, one or two objects may be locked by this function.

void unlock_two_nondirectories(struct inode *inode1, struct inode *inode2)

release locks from lock_two_nondirectories()

Parameters

struct inode *inode1

first inode to unlock

struct inode *inode2

second inode to unlock

struct inode *inode_insert5(struct inode *inode, unsigned long hashval, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *data)

obtain an inode from a mounted file system

Parameters

struct inode *inode

pre-allocated inode to use for insert to cache

unsigned long hashval

hash value (usually inode number) to get

int (*test)(struct inode *, void *)

callback used for comparisons between inodes

int (*set)(struct inode *, void *)

callback used to initialize a new struct inode

void *data

opaque data pointer to pass to test and set

Description

Search for the inode specified by hashval and data in the inode cache, and if present return it with an increased reference count. This is a variant of iget5_locked() that doesn’t allocate an inode.

If the inode is not present in the cache, insert the pre-allocated inode and return it locked, hashed, and with the I_NEW flag set. The file system gets to fill it in before unlocking it via unlock_new_inode().

Note that both test and set are called with the inode_hash_lock held, so they can’t sleep.

struct inode *iget5_locked(struct super_block *sb, unsigned long hashval, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *data)

obtain an inode from a mounted file system

Parameters

struct super_block *sb

super block of file system

unsigned long hashval

hash value (usually inode number) to get

int (*test)(struct inode *, void *)

callback used for comparisons between inodes

int (*set)(struct inode *, void *)

callback used to initialize a new struct inode

void *data

opaque data pointer to pass to test and set

Description

Search for the inode specified by hashval and data in the inode cache, and if present return it with an increased reference count. This is a generalized version of iget_locked() for file systems where the inode number is not sufficient for unique identification of an inode.

If the inode is not present in the cache, allocate and insert a new inode and return it locked, hashed, and with the I_NEW flag set. The file system gets to fill it in before unlocking it via unlock_new_inode().

Note that both test and set are called with the inode_hash_lock held, so they can’t sleep.

struct inode *iget5_locked_rcu(struct super_block *sb, unsigned long hashval, int (*test)(struct inode *, void *), int (*set)(struct inode *, void *), void *data)

obtain an inode from a mounted file system

Parameters

struct super_block *sb

super block of file system

unsigned long hashval

hash value (usually inode number) to get

int (*test)(struct inode *, void *)

callback used for comparisons between inodes

int (*set)(struct inode *, void *)

callback used to initialize a new struct inode

void *data

opaque data pointer to pass to test and set

Description

This is equivalent to iget5_locked(), except the test callback must tolerate the inode not being stable, including being mid-teardown.

struct inode *iget_locked(struct super_block *sb, unsigned long ino)

obtain an inode from a mounted file system

Parameters

struct super_block *sb

super block of file system

unsigned long ino

inode number to get

Description

Search for the inode specified by ino in the inode cache and if present return it with an increased reference count. This is for file systems where the inode number is sufficient for unique identification of an inode.

If the inode is not in cache, allocate a new inode and return it locked, hashed, and with the I_NEW flag set. The file system gets to fill it in before unlocking it via unlock_new_inode().
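
The lookup-or-create contract described above can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: model_inode, model_iget_locked(), model_unlock_new_inode(), MODEL_I_NEW and the fixed-size table are hypothetical names invented for the example, and neither locking nor waiting on a new inode is modelled.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_I_NEW  0x1u   /* hypothetical stand-in for the I_NEW state bit */
#define MODEL_SLOTS  8      /* tiny fixed-size "cache", for illustration only */

struct model_inode {
    unsigned long ino;
    unsigned int  state;
    int           count;    /* reference count */
    int           in_use;   /* non-zero once the slot holds an inode */
};

static struct model_inode model_cache[MODEL_SLOTS];

/*
 * Return the cached entry for ino with its count raised, or claim a free
 * slot and return it with MODEL_I_NEW set. NULL means the table is full.
 */
struct model_inode *model_iget_locked(unsigned long ino)
{
    struct model_inode *free_slot = NULL;

    for (size_t i = 0; i < MODEL_SLOTS; i++) {
        struct model_inode *in = &model_cache[i];

        if (in->in_use && in->ino == ino) {
            in->count++;            /* cache hit: just take a reference */
            return in;
        }
        if (!in->in_use && free_slot == NULL)
            free_slot = in;
    }
    if (free_slot == NULL)
        return NULL;

    free_slot->ino = ino;
    free_slot->state = MODEL_I_NEW; /* caller must fill it in, then unlock */
    free_slot->count = 1;
    free_slot->in_use = 1;
    return free_slot;
}

/* Clear the "new" state once the caller has initialised the entry. */
void model_unlock_new_inode(struct model_inode *in)
{
    assert(in != NULL);
    in->state &= ~MODEL_I_NEW;
}
```

In this model a caller checks the returned entry for MODEL_I_NEW, fills it in if the bit is set, and then calls model_unlock_new_inode(); a second lookup of the same number returns the same entry with a higher count.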

ino_t iunique(struct super_block *sb, ino_t max_reserved)

get a unique inode number

Parameters

struct super_block *sb

superblock

ino_t max_reserved

highest reserved inode number

Description

Obtain an inode number that is unique on the system for a given superblock. This is used by file systems that have no natural permanent inode numbering system. An inode number is returned that is higher than the reserved limit but unique.

BUGS: With a large number of inodes live on the file system this function currently becomes quite slow.
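
The idea of returning a number above the reserved limit that is not already taken can be sketched as a retry loop. The following userspace model is an illustration under assumptions, not the kernel implementation: model_iunique(), model_ino_in_use() and the array of live numbers are hypothetical, and counter wrap-around is not handled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: is this number already used by a live inode? */
static bool model_ino_in_use(const unsigned long *live, size_t n,
                             unsigned long ino)
{
    for (size_t i = 0; i < n; i++)
        if (live[i] == ino)
            return true;
    return false;
}

/*
 * Hand out the next number that is above max_reserved and not in use.
 * *counter carries the position between calls.
 */
unsigned long model_iunique(unsigned long *counter, unsigned long max_reserved,
                            const unsigned long *live, size_t n)
{
    unsigned long res;

    assert(counter != NULL);
    do {
        if (*counter <= max_reserved)
            *counter = max_reserved + 1;   /* skip the reserved range */
        res = (*counter)++;
    } while (model_ino_in_use(live, n, res)); /* retry on a collision */

    return res;
}
```

Every candidate number costs one membership check in this model, so a crowded number space means more retries per call.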

struct inode *ilookup5_nowait(struct super_block *sb, unsigned long hashval, int (*test)(struct inode *, void *), void *data, bool *isnew)

search for an inode in the inode cache

Parameters

struct super_block *sb

super block of file system to search

unsigned long hashval

hash value (usually inode number) to search for

int (*test)(struct inode *, void *)

callback used for comparisons between inodes

void *data

opaque data pointer to pass to test

bool *isnew

return argument telling whether I_NEW was set when the inode was found in hash (the caller needs to wait for I_NEW to clear)

Description

Search for the inode specified by hashval and data in the inode cache. If the inode is in the cache, the inode is returned with an incremented reference count.

Note

I_NEW is not waited upon so you have to be very careful what you do with the returned inode. You probably should be using ilookup5() instead.

Note 2: test is called with the inode_hash_lock held, so it can’t sleep.

struct inode *ilookup5(struct super_block *sb, unsigned long hashval, int (*test)(struct inode *, void *), void *data)

search for an inode in the inode cache

Parameters

struct super_block *sb

super block of file system to search

unsigned long hashval

hash value (usually inode number) to search for

int (*test)(struct inode *, void *)

callback used for comparisons between inodes

void *data

opaque data pointer to pass to test

Description

Search for the inode specified by hashval and data in the inode cache, and if the inode is in the cache, return the inode with an incremented reference count. Waits on I_NEW before returning the inode.

This is a generalized version of ilookup() for file systems where the inode number is not sufficient for unique identification of an inode.

Note

test is called with the inode_hash_lock held, so it can’t sleep.

struct inode *ilookup(struct super_block *sb, unsigned long ino)

search for an inode in the inode cache

Parameters

struct super_block *sb

super block of file system to search

unsigned long ino

inode number to search for

Description

Search for the inode ino in the inode cache, and if the inode is in the cache, the inode is returned with an incremented reference count.

struct inode *find_inode_nowait(struct super_block *sb, unsigned long hashval, int (*match)(struct inode *, unsigned long, void *), void *data)

find an inode in the inode cache

Parameters

struct super_block *sb

super block of file system to search

unsigned long hashval

hash value (usually inode number) to search for

int (*match)(struct inode *, unsigned long, void *)

callback used for comparisons between inodes

void *data

opaque data pointer to pass to match

Description

Search for the inode specified by hashval and data in the inode cache, where the helper function match will return 0 if the inode does not match, 1 if the inode does match, and -1 if the search should be stopped. The match function must be responsible for taking the i_lock spin_lock and checking i_state for an inode being freed or being initialized, and incrementing the reference count before returning 1. It also must not sleep, since it is called with the inode_hash_lock spinlock held.

This is an even more generalized version of ilookup5() for when the function must never block (find_inode() can block in __wait_on_freeing_inode()) or when the caller cannot increment the reference count because the resulting iput() might cause an inode eviction. The tradeoff is that the match function must be very carefully implemented.
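
The three-way return convention of the match callback can be sketched in userspace as follows. This is a model under assumptions rather than the kernel's code: model_inode, model_find_nowait(), model_match_ino() and the plain array standing in for a hash bucket are hypothetical, and no locks are taken.

```c
#include <assert.h>
#include <stddef.h>

struct model_inode {
    unsigned long ino;
    int           count;   /* reference count */
    int           dying;   /* stand-in for "being freed or initialized" */
};

/* Contract: 0 = no match, 1 = match (reference taken), -1 = stop searching. */
typedef int (*model_match_t)(struct model_inode *, unsigned long, void *);

struct model_inode *model_find_nowait(struct model_inode *bucket, size_t n,
                                      unsigned long hashval,
                                      model_match_t match, void *data)
{
    assert(match != NULL);
    for (size_t i = 0; i < n; i++) {
        int mval = match(&bucket[i], hashval, data);

        if (mval == 0)
            continue;                       /* keep looking */
        return mval == 1 ? &bucket[i] : NULL; /* 1: found, -1: give up */
    }
    return NULL;
}

/* Example callback: match on the number, refuse entries that are going away. */
int model_match_ino(struct model_inode *inode, unsigned long hashval, void *data)
{
    (void)data;
    if (inode->ino != hashval)
        return 0;
    if (inode->dying)
        return -1;      /* right entry, but unusable: stop the search */
    inode->count++;     /* the callback, not the caller, takes the reference */
    return 1;
}
```

The point of the sketch is that all state checking and reference counting live in the callback, which is why the description asks for it to be written with care.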

struct inode *find_inode_rcu(struct super_block *sb, unsigned long hashval, int (*test)(struct inode *, void *), void *data)

find an inode in the inode cache

Parameters

struct super_block *sb

Super block of file system to search

unsigned long hashval

Key to hash

int (*test)(struct inode *, void *)

Function to test match on an inode

void *data

Data for test function

Description

Search for the inode specified by hashval and data in the inode cache, where the helper function test will return 0 if the inode does not match and 1 if it does. The test function must be responsible for taking the i_lock spin_lock and checking i_state for an inode being freed or being initialized.

If successful, this will return the inode for which the test function returned 1 and NULL otherwise.

The test function is not permitted to take a ref on any inode presented. It is also not permitted to sleep.

The caller must hold the RCU read lock.

struct inode *find_inode_by_ino_rcu(struct super_block *sb, unsigned long ino)

Find an inode in the inode cache

Parameters

struct super_block *sb

Super block of file system to search

unsigned long ino

The inode number to match

Description

Search for the inode with inode number ino in the inode cache.

If successful, this will return the matching inode, and NULL otherwise.

The caller must hold the RCU read lock.

void iput(struct inode *inode)

put an inode

Parameters

struct inode *inode

inode to put

Description

Puts an inode, dropping its usage count. If the inode use count hits zero, the inode is then freed and may also be destroyed.

Consequently, iput() can sleep.

void iput_not_last(struct inode *inode)

put an inode assuming this is not the last reference

Parameters

struct inode *inode

inode to put

int bmap(struct inode *inode, sector_t *block)

find a block number in a file

Parameters

struct inode *inode

inode owning the block number being requested

sector_t *block

pointer containing the block to find

Description

Replaces the value in *block with the block number on the device that corresponds to the requested block number in the file. That is, when asked for block 4 of inode 1, the function replaces the 4 in *block with the disk block, relative to the start of the disk, that holds that block of the file.

Returns -EINVAL in case of error, 0 otherwise. If the mapping falls into a hole, returns 0 and *block is also set to 0.
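
The in/out use of *block and the hole convention can be shown with a small userspace model. This is a sketch, not the kernel function: model_bmap() and its lookup table are hypothetical, a NULL table stands in for the error case, and treating offsets past the end of the table as holes is an assumption of the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * map[i] is the device block backing file block i; 0 marks a hole.
 * On entry *block is a file block number, on return a device block number.
 */
int model_bmap(const unsigned long *map, size_t nblocks, unsigned long *block)
{
    assert(block != NULL);
    if (map == NULL)
        return -EINVAL;     /* models "no mapping available" as the error */

    /* Offsets beyond the table are treated as holes in this model. */
    *block = (*block < nblocks) ? map[*block] : 0;
    return 0;
}
```

With the table { 100, 101, 0, 103, 250 }, asking for file block 4 rewrites *block to 250, while asking for file block 2 returns 0 and sets *block to 0, so a caller has to treat a zero result as "no block" rather than as device block 0.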

int inode_update_timestamps(struct inode *inode, int flags)

update the timestamps on the inode

Parameters

struct inode *inode

inode to be updated

int flags

S_* flags that needed to be updated

Description

The update_time function is called when an inode’s timestamps need to be updated for a read or write operation. This function handles updating the actual timestamps. It’s up to the caller to ensure that the inode is marked dirty appropriately.

In the case where any of S_MTIME, S_CTIME, or S_VERSION need to be updated, attempt to update all three of them. S_ATIME updates can be handled independently of the rest.

Returns a set of S_* flags indicating which values changed.

int generic_update_time(struct inode *inode, int flags)

update the timestamps on the inode

Parameters

struct inode *inode

inode to be updated

int flags

S_* flags that needed to be updated

Description

The update_time function is called when an inode’s timestamps need to be updated for a read or write operation. In the case where any of S_MTIME, S_CTIME, or S_VERSION need to be updated we attempt to update all three of them. S_ATIME updates can be handled independently of the rest.

Returns an S_* mask indicating which fields were updated.

int file_remove_privs(struct file *file)

remove special file privileges (suid, capabilities)

Parameters

struct file *file

file to remove privileges from

Description

When file is modified by a write or truncation ensure that special file privileges are removed.

Return

0 on success, negative errno on failure.

struct timespec64 current_time(struct inode *inode)

Return FS time (possibly fine-grained)

Parameters

struct inode *inode

inode.

Description

Return the current time truncated to the time granularity supported by the fs, as suitable for a ctime/mtime change. If the ctime is flagged as having been QUERIED, get a fine-grained timestamp, but don’t update the floor.

For a multigrain inode, this is effectively an estimate of the timestamp that a file would receive. An actual update must go through inode_set_ctime_current().

int file_update_time(struct file *file)

update mtime and ctime

Parameters

struct file *file

file accessed

Description

Update the mtime and ctime members of an inode and mark the inode for writeback. Note that this function is meant exclusively for usage in the file write path of filesystems, and filesystems may choose to explicitly ignore updates via this function with the S_NOCMTIME inode flag, e.g. for network filesystems where these timestamps are handled by the server. This can return an error for file systems that need to allocate space in order to update an inode.

Return

0 on success, negative errno on failure.

int file_modified(struct file *file)

handle mandated vfs changes when modifying a file

Parameters

struct file *file

file that was modified

Description

When file has been modified ensure that special file privileges are removed and time settings are updated.

Context

Caller must hold the file’s inode lock.

Return

0 on success, negative errno on failure.

int kiocb_modified(struct kiocb *iocb)

handle mandated vfs changes when modifying a file

Parameters

struct kiocb *iocb

iocb that was modified

Description

When file has been modified ensure that special file privileges are removed and time settings are updated.

Context

Caller must hold the file’s inode lock.

Return

0 on success, negative errno on failure.

void inode_init_owner(struct mnt_idmap *idmap, struct inode *inode, const struct inode *dir, umode_t mode)

Init uid, gid, mode for new inode according to POSIX standards

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was created from

struct inode *inode

New inode

const struct inode *dir

Directory inode

umode_t mode

mode of the new inode

Description

If the inode has been created through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions and initializing i_uid and i_gid. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

bool inode_owner_or_capable(struct mnt_idmap *idmap, const struct inode *inode)

check current task permissions to inode

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

const struct inode *inode

inode being checked

Description

Return true if current either has CAP_FOWNER in a namespace with the inode owner uid mapped, or owns the file.

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

void inode_dio_wait(struct inode *inode)

wait for outstanding DIO requests to finish

Parameters

struct inode *inode

inode to wait for

Description

Waits for all pending direct I/O requests to finish so that we can proceed with a truncate or equivalent operation.

Must be called under a lock that serializes taking new references to i_dio_count, usually by inode->i_rwsem.

struct timespec64 timestamp_truncate(struct timespec64 t, struct inode *inode)

Truncate timespec to a granularity

Parameters

struct timespec64 t

Timespec

struct inode *inode

inode being updated

Description

Truncate a timespec to the granularity supported by the fs containing the inode. Always rounds down. gran must not be 0 nor greater than a second (NSEC_PER_SEC, or 10^9 ns).
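
The round-down rule can be written out as a short userspace sketch. The names model_timespec, model_timestamp_truncate() and MODEL_NSEC_PER_SEC are hypothetical, the granularity is passed in directly instead of being taken from the filesystem, and any range limits the filesystem places on the seconds field are not modelled.

```c
#include <assert.h>

#define MODEL_NSEC_PER_SEC 1000000000L

struct model_timespec {
    long long tv_sec;
    long      tv_nsec;
};

/* Round tv_nsec down to a multiple of gran, where 1 <= gran <= 10^9 ns. */
struct model_timespec model_timestamp_truncate(struct model_timespec t,
                                               long gran)
{
    assert(gran >= 1 && gran <= MODEL_NSEC_PER_SEC);

    if (gran == MODEL_NSEC_PER_SEC)
        t.tv_nsec = 0;                  /* whole seconds only */
    else if (gran > 1)
        t.tv_nsec -= t.tv_nsec % gran;  /* drop the remainder */
    /* gran == 1: nanosecond resolution, nothing to do */

    return t;
}
```

For a timestamp with tv_nsec = 123456789, a granularity of 1000 ns gives 123456000, and a granularity of one second gives 0 while leaving tv_sec unchanged.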

struct timespec64 inode_set_ctime_current(struct inode *inode)

set the ctime to current_time

Parameters

struct inode *inode

inode

Description

Set the inode’s ctime to the current value for the inode. Returns the current value that was assigned. If this is not a multigrain inode, then we set it to the later of the coarse time and floor value.

If it is multigrain, then we first see if the coarse-grained timestamp is distinct from what is already there. If so, then use that. Otherwise, get a fine-grained timestamp.

After that, try to swap the new value into i_ctime_nsec. Accept the resulting ctime, regardless of the outcome of the swap. If it has already been replaced, then that timestamp is later than the earlier unacceptable one, and is thus acceptable.

struct timespec64 inode_set_ctime_deleg(struct inode *inode, struct timespec64 update)

try to update the ctime on a delegated inode

Parameters

struct inode *inode

inode to update

struct timespec64 update

timespec64 to set the ctime

Description

Attempt to atomically update the ctime on behalf of a delegation holder.

The nfs server can call back the holder of a delegation to get updated inode attributes, including the mtime. When updating the mtime, update the ctime to a value at least equal to that.

This can race with concurrent updates to the inode, in which case the update is skipped.

Note that this works even when multigrain timestamps are not enabled, so it is used in either case.

bool in_group_or_capable(struct mnt_idmap *idmap, const struct inode *inode, vfsgid_t vfsgid)

check whether caller is CAP_FSETID privileged

Parameters

struct mnt_idmap *idmap

idmap of the mount inode was found from

const struct inode *inode

inode to check

vfsgid_t vfsgid

the new/current vfsgid of inode

Description

Check whether vfsgid is in the caller’s group list or if the caller is privileged with CAP_FSETID over inode. This can be used to determine whether the setgid bit can be kept or must be dropped.

Return

true if the caller is sufficiently privileged, false if not.

umode_t mode_strip_sgid(struct mnt_idmap *idmap, const struct inode *dir, umode_t mode)

handle the sgid bit for non-directories

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was created from

const struct inode *dir

parent directory inode

umode_t mode

mode of the file to be created in dir

Description

If the mode of the new file has both the S_ISGID and S_IXGRP bits raised and dir has the S_ISGID bit raised, ensure that the caller is either in the group of the parent directory or they have CAP_FSETID in their user namespace and are privileged over the parent directory. In all other cases, strip the S_ISGID bit from mode.

Return

the new mode to use for the file

void make_bad_inode(struct inode *inode)

mark an inode bad due to an I/O error

Parameters

struct inode *inode

Inode to mark bad

Description

When an inode cannot be read due to a media or remote network failure this function makes the inode “bad” and causes I/O operations on it to fail from this point on.

bool is_bad_inode(struct inode *inode)

is an inode errored

Parameters

struct inode *inode

inode to test

Description

Returns true if the inode in question has been marked as bad.

void iget_failed(struct inode *inode)

Mark an under-construction inode as dead and release it

Parameters

struct inode *inode

The inode to discard

Description

Mark an under-construction inode as dead and release it.

Registration and Superblocks

void deactivate_locked_super(struct super_block *s)

drop an active reference to superblock

Parameters

struct super_block *s

superblock to deactivate

Description

Drops an active reference to superblock, converting it into a temporary one if there are no other active references left. In that case we tell fs driver to shut it down and drop the temporary reference we had just acquired.

Caller holds exclusive lock on superblock; that lock is released.

void deactivate_super(struct super_block *s)

drop an active reference to superblock

Parameters

struct super_block *s

superblock to deactivate

Description

Variant of deactivate_locked_super(), except that superblock is not locked by caller. If we are going to drop the final active reference, lock will be acquired prior to that.

void retire_super(struct super_block *sb)

prevents superblock from being reused

Parameters

struct super_block *sb

superblock to retire

Description

The function marks superblock to be ignored in superblock test, which prevents it from being reused for any new mounts. If the superblock has a private bdi, it also unregisters it, but doesn’t reduce the refcount of the superblock to prevent potential races. The refcount is reduced by generic_shutdown_super(). The function can not be called concurrently with generic_shutdown_super(). It is safe to call the function multiple times, subsequent calls have no effect.

The marker will affect the re-use only for block-device-based superblocks. Other superblocks will still get marked if this function is used, but that will not affect their reusability.

void generic_shutdown_super(struct super_block *sb)

common helper for ->kill_sb()

Parameters

struct super_block *sb

superblock to kill

Description

generic_shutdown_super() does all fs-independent work on superblock shutdown. Typical ->kill_sb() should pick all fs-specific objects that need destruction out of superblock, call generic_shutdown_super() and release aforementioned objects. Note: dentries and inodes _are_ taken care of and do not need specific handling.

Upon calling this function, the filesystem may no longer alter or rearrange the set of dentries belonging to this super_block, nor may it change the attachments of dentries to inodes.

struct super_block *sget_fc(struct fs_context *fc, int (*test)(struct super_block *, struct fs_context *), int (*set)(struct super_block *, struct fs_context *))

Find or create a superblock

Parameters

struct fs_context *fc

Filesystem context.

int (*test)(struct super_block *, struct fs_context *)

Comparison callback

int (*set)(struct super_block *, struct fs_context *)

Setup callback

Description

Create a new superblock or find an existing one.

The test callback is used to find a matching existing superblock. Whether or not the requested parameters in fc are taken into account is specific to the test callback that is used. They may even be completely ignored.

If an extant superblock is matched, it will be returned unless:

  1. the namespace of the filesystem context fc and the extant superblock’s namespace differ

  2. the filesystem context fc has requested that reusing an extant superblock is not allowed

In both cases EBUSY will be returned.

If no match is made, a new superblock will be allocated and basic initialisation will be performed (s_type, s_fs_info and s_id will be set and the set callback will be invoked), the superblock will be published and it will be returned in a partially constructed state with SB_BORN and SB_ACTIVE as yet unset.

Return

On success, an extant or newly created superblock is returned. On failure an error pointer is returned.

struct super_block *sget(struct file_system_type *type, int (*test)(struct super_block *, void *), int (*set)(struct super_block *, void *), int flags, void *data)

find or create a superblock

Parameters

struct file_system_type *type

filesystem type superblock should belong to

int (*test)(struct super_block *, void *)

comparison callback

int (*set)(struct super_block *, void *)

setup callback

int flags

mount flags

void *data

argument to each of them

void iterate_supers_type(struct file_system_type *type, void (*f)(struct super_block *, void *), void *arg)

call function for superblocks of given type

Parameters

struct file_system_type *type

fs type

void (*f)(struct super_block *, void *)

function to call

void *arg

argument to pass to it

Description

Scans the superblock list and calls given function, passing it locked superblock and given argument.

int get_anon_bdev(dev_t *p)

Allocate a block device for filesystems which don’t have one.

Parameters

dev_t *p

Pointer to a dev_t.

Description

Filesystems which don’t use real block devices can call this function to allocate a virtual block device.

Context

Any context. Frequently called while holding sb_lock.

Return

0 on success, -EMFILE if there are no anonymous bdevs left or -ENOMEM if memory allocation failed.

struct super_block *sget_dev(struct fs_context *fc, dev_t dev)

Find or create a superblock by device number

Parameters

struct fs_context *fc

Filesystem context.

dev_t dev

device number

Description

Find or create a superblock using the provided device number that will be stored in fc->sget_key.

If an extant superblock is matched, then that will be returned with an elevated reference count that the caller must transfer or discard.

If no match is made, a new superblock will be allocated and basic initialisation will be performed (s_type, s_fs_info, s_id, s_dev will be set). The superblock will be published and it will be returned in a partially constructed state with SB_BORN and SB_ACTIVE as yet unset.

Return

an existing or newly created superblock on success, an error pointer on failure.

int get_tree_bdev_flags(struct fs_context *fc, int (*fill_super)(struct super_block *sb, struct fs_context *fc), unsigned int flags)

Get a superblock based on a single block device

Parameters

struct fs_context *fc

The filesystem context holding the parameters

int (*fill_super)(struct super_block *sb, struct fs_context *fc)

Helper to initialise a new superblock

unsigned int flags

GET_TREE_BDEV_* flags

int get_tree_bdev(struct fs_context *fc, int (*fill_super)(struct super_block *, struct fs_context *))

Get a superblock based on a single block device

Parameters

struct fs_context *fc

The filesystem context holding the parameters

int (*fill_super)(struct super_block *, struct fs_context *)

Helper to initialise a new superblock

int vfs_get_tree(struct fs_context *fc)

Get the mountable root

Parameters

struct fs_context *fc

The superblock configuration context.

Description

The filesystem is invoked to get or create a superblock which can then later be used for mounting. The filesystem places a pointer to the root to be used for mounting in fc->root.

int freeze_super(struct super_block *sb, enum freeze_holder who, const void *freeze_owner)

lock the filesystem and force it into a consistent state

Parameters

struct super_block *sb

the super to lock

enum freeze_holder who

context that wants to freeze

const void *freeze_owner

owner of the freeze

Description

Syncs the super to make sure the filesystem is consistent and calls the fs’s freeze_fs. Subsequent calls to this without first thawing the fs may return -EBUSY.

who should be:

  • FREEZE_HOLDER_USERSPACE if userspace wants to freeze the fs;

  • FREEZE_HOLDER_KERNEL if the kernel wants to freeze the fs;

  • FREEZE_MAY_NEST if nesting freeze and thaw requests is allowed.

The who argument distinguishes between the kernel and userspace trying to freeze the filesystem. Although there cannot be multiple kernel freezes or multiple userspace freezes in effect at any given time, the kernel and userspace can both hold a filesystem frozen. The filesystem remains frozen until there are no kernel or userspace freezes in effect.

A filesystem may hold multiple devices and thus a filesystem may be frozen through the block layer via multiple block devices. In this case the request is marked as being allowed to nest by passing FREEZE_MAY_NEST. The filesystem remains frozen until all block devices are unfrozen. If multiple freezes are attempted without FREEZE_MAY_NEST, -EBUSY will be returned.
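
The holder rules above (one freeze per holder unless nesting is requested, and frozen until every holder has thawed) can be modelled in userspace. This is a sketch under assumptions: the MODEL_* flag values, model_sb and the model_* functions are hypothetical, and returning -EINVAL for a thaw by a holder with no freeze in effect is a choice made for the example rather than something stated above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical flag values, for this model only. */
#define MODEL_HOLDER_KERNEL    (1u << 0)
#define MODEL_HOLDER_USERSPACE (1u << 1)
#define MODEL_MAY_NEST         (1u << 2)

struct model_sb {
    int kcount;   /* freezes in effect on behalf of the kernel */
    int ucount;   /* freezes in effect on behalf of userspace */
};

/* Frozen as long as either holder still has a freeze in effect. */
bool model_is_frozen(const struct model_sb *sb)
{
    return sb->kcount + sb->ucount > 0;
}

int model_freeze(struct model_sb *sb, unsigned int who)
{
    int *count = (who & MODEL_HOLDER_KERNEL) ? &sb->kcount : &sb->ucount;

    assert(who & (MODEL_HOLDER_KERNEL | MODEL_HOLDER_USERSPACE));
    if (*count > 0 && !(who & MODEL_MAY_NEST))
        return -EBUSY;          /* same holder again, nesting not allowed */
    (*count)++;
    return 0;
}

int model_thaw(struct model_sb *sb, unsigned int who)
{
    int *count = (who & MODEL_HOLDER_KERNEL) ? &sb->kcount : &sb->ucount;

    if (*count == 0)
        return -EINVAL;         /* assumption: nothing to thaw is an error */
    (*count)--;
    return 0;
}
```

In the model, a userspace freeze followed by a kernel freeze leaves the superblock frozen until both have thawed, and a second non-nesting freeze by the same holder fails with -EBUSY.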

During this function, sb->s_writers.frozen goes through these values:

SB_UNFROZEN: File system is normal, all writes progress as usual.

SB_FREEZE_WRITE: The file system is in the process of being frozen. New writes should be blocked, though page faults are still allowed. We wait for all writes to complete and then proceed to the next stage.

SB_FREEZE_PAGEFAULT: Freezing continues. Now also page faults are blocked but internal fs threads can still modify the filesystem (although they should not dirty new pages or inodes), writeback can run etc. After waiting for all running page faults we sync the filesystem which will clean all dirty pages and inodes (no new dirty pages or inodes can be created when sync is running).

SB_FREEZE_FS: The file system is frozen. Now all internal sources of fs modification are blocked (e.g. XFS preallocation truncation on inode reclaim). This is usually implemented by blocking new transactions for filesystems that have them and need this additional guard. After all internal writers are finished we call ->freeze_fs() to finish filesystem freezing. Then we transition to SB_FREEZE_COMPLETE state. This state is mostly auxiliary for filesystems to verify they do not modify frozen fs.

sb->s_writers.frozen is protected by sb->s_umount.
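
The progression described above is cumulative: each stage blocks one more class of modification than the previous one. A minimal userspace sketch of that ordering follows; the enum and the three predicates are hypothetical model names, and the numeric values are only meant to preserve the order in which the states are listed.

```c
#include <stdbool.h>

/* Model-only names, ordered as in the description above. */
enum model_frozen {
    MODEL_SB_UNFROZEN,
    MODEL_SB_FREEZE_WRITE,
    MODEL_SB_FREEZE_PAGEFAULT,
    MODEL_SB_FREEZE_FS,
    MODEL_SB_FREEZE_COMPLETE,
};

/* New writes are blocked from the first freezing stage onwards. */
bool model_new_writes_blocked(enum model_frozen s)
{
    return s >= MODEL_SB_FREEZE_WRITE;
}

/* Page faults are additionally blocked from the second stage onwards. */
bool model_page_faults_blocked(enum model_frozen s)
{
    return s >= MODEL_SB_FREEZE_PAGEFAULT;
}

/* Internal sources of modification are blocked from the third stage onwards. */
bool model_internal_writers_blocked(enum model_frozen s)
{
    return s >= MODEL_SB_FREEZE_FS;
}
```

Because the states are ordered, a check of the form "is this class of writer blocked yet" reduces to a comparison against the stage that introduces the block.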

Return

If the freeze was successful zero is returned. If the freeze failed a negative error code is returned.

int thaw_super(struct super_block *sb, enum freeze_holder who, const void *freeze_owner)

unlock filesystem

Parameters

struct super_block *sb

the super to thaw

enum freeze_holder who

context that wants to freeze

const void *freeze_owner

owner of the freeze

Description

Unlocks the filesystem and marks it writeable again after freeze_super() if there are no remaining freezes on the filesystem.

who should be:

  • FREEZE_HOLDER_USERSPACE if userspace wants to thaw the fs;

  • FREEZE_HOLDER_KERNEL if the kernel wants to thaw the fs;

  • FREEZE_MAY_NEST if nesting freeze and thaw requests is allowed.

A filesystem may hold multiple devices and thus a filesystem may have been frozen through the block layer via multiple block devices. The filesystem remains frozen until all block devices are unfrozen.

File Locks

bool locks_owner_has_blockers(struct file_lock_context *flctx, fl_owner_t owner)

Check for blocking lock requests

Parameters

struct file_lock_context *flctx

file lock context

fl_owner_t owner

lock owner

Description

Return values:

true: owner has at least one blocker

false: owner has no blockers

int locks_delete_block(struct file_lock *waiter)

stop waiting for a file lock

Parameters

struct file_lock *waiter

the lock which was waiting

Description

lockd/nfsd need to disconnect the lock while working on it.

int posix_lock_file(struct file *filp, struct file_lock *fl, struct file_lock *conflock)

Apply a POSIX-style lock to a file

Parameters

struct file *filp

The file to apply the lock to

struct file_lock *fl

The lock to be applied

struct file_lock *conflock

Place to return a copy of the conflicting lock, if found.

Description

Add a POSIX style lock to a file. We merge adjacent & overlapping locks whenever possible. POSIX locks are sorted by owner task, then by starting address.

Note that if called with an FL_EXISTS argument, the caller may determine whether or not a lock was successfully freed by testing the return value for -ENOENT.
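
Merging adjacent and overlapping ranges can be illustrated with a small userspace sketch. It is a model under assumptions only: model_range and model_try_merge() are hypothetical names, both ranges are taken to belong to the same owner and to have the same lock type, bounds are inclusive, and values are assumed to stay well below the maximum of long long so that end + 1 cannot overflow.

```c
#include <assert.h>
#include <stdbool.h>

/* Inclusive byte range [start, end]. */
struct model_range {
    long long start;
    long long end;
};

/*
 * If b touches or overlaps a, grow a to cover both and return true.
 * Otherwise leave a untouched and return false.
 */
bool model_try_merge(struct model_range *a, const struct model_range *b)
{
    assert(a->start <= a->end && b->start <= b->end);

    /* A gap of at least one byte on either side means no merge. */
    if (b->start > a->end + 1 || b->end + 1 < a->start)
        return false;

    if (b->start < a->start)
        a->start = b->start;
    if (b->end > a->end)
        a->end = b->end;
    return true;
}
```

For example, [0, 9] and [10, 19] are adjacent and merge into [0, 19], whereas [0, 19] and [30, 39] are separated by a gap and stay distinct.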

int __break_lease(struct inode *inode, unsigned int flags)

revoke all outstanding leases on file

Parameters

struct inode *inode

the inode of the file to return

unsigned int flags

LEASE_BREAK_* flags

Description

break_lease (inlined for speed) has checked there already is at least some kind of lock (maybe a lease) on this file. Leases are broken on a call to open() or truncate(). This function can block waiting for the lease break unless you specify LEASE_BREAK_NONBLOCK.

void lease_get_mtime(struct inode *inode, struct timespec64 *time)

update modified time of an inode with exclusive lease

Parameters

struct inode *inode

the inode

struct timespec64 *time

pointer to a timespec which contains the last modified time

Description

This is to force NFS clients to flush their caches for files with exclusive leases. The justification is that if someone has an exclusive lease, then they could be modifying it.

int generic_setlease(struct file *filp, int arg, struct file_lease **flp, void **priv)

sets a lease on an open file

Parameters

struct file *filp

file pointer

int arg

type of lease to obtain

struct file_lease **flp

input - file_lock to use, output - file_lock inserted

void **priv

private data for lm_setup (may be NULL if lm_setup doesn’t require it)

Description

The (input) flp->fl_lmops->lm_break function is required by break_lease().

intvfs_setlease(structfile*filp,intarg,structfile_lease**lease,void**priv)

sets a lease on an open file

Parameters

structfile*filp

file pointer

intarg

type of lease to obtain

structfile_lease**lease

file_lock to use when adding a lease

void**priv

private info for lm_setup when adding a lease (may be NULL if lm_setup doesn’t require it)

Description

Call this to establish a lease on the file. The “lease” argument is not used for F_UNLCK requests and may be NULL. For commands that set or alter an existing lease, the (*lease)->fl_lmops->lm_break operation must be set; if not, this function will return -ENOLCK (and generate a scary-looking stack trace).

The “priv” pointer is passed directly to the lm_setup function as-is. It may be NULL if the lm_setup operation doesn’t require it.

intlocks_lock_inode_wait(structinode*inode,structfile_lock*fl)

Apply a lock to an inode

Parameters

structinode*inode

inode of the file to apply to

structfile_lock*fl

The lock to be applied

Description

Apply a POSIX or FLOCK style lock request to an inode.

intvfs_test_lock(structfile*filp,structfile_lock*fl)

test file byte range lock

Parameters

structfile*filp

The file to test lock for

structfile_lock*fl

The byte-range in the file to test; also used to hold result

Description

On entry, fl does not contain a lock, but identifies a range (fl_start, fl_end) in the file (c.flc_file), and an owner (c.flc_owner) for whom existing locks should be ignored. c.flc_type and c.flc_flags are ignored. Both fl_lmops and fl_ops in fl must be NULL. Returns -ERRNO on failure. Indicates presence of conflicting lock by setting fl->fl_type to something other than F_UNLCK.

If vfs_test_lock() does find a lock and return it, the caller must use locks_free_lock() or locks_release_private() on the returned lock.
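The closest userspace counterpart of this test is the F_GETLK command of fcntl(2), which reports F_UNLCK in l_type when nothing conflicts. The sketch below is a userspace analogy only, not a call into vfs_test_lock(); conflicting_lock_type() is a hypothetical helper name.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Ask whether another owner holds a lock that would conflict with a write
 * lock on the range [start, start + len). Returns F_UNLCK when nothing
 * conflicts, F_RDLCK or F_WRLCK for the type of a conflicting lock, or -1
 * if fcntl() fails.
 */
int conflicting_lock_type(int fd, off_t start, off_t len)
{
	struct flock fl = {
		.l_type = F_WRLCK,
		.l_whence = SEEK_SET,
		.l_start = start,
		.l_len = len,
	};

	if (fcntl(fd, F_GETLK, &fl) == -1)
		return -1;
	/* On success the kernel rewrites fl; l_type carries the verdict. */
	return fl.l_type;
}
```

Locks held by the calling process itself are not reported, which mirrors the "owner for whom existing locks should be ignored" wording above.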

intvfs_lock_file(structfile*filp,unsignedintcmd,structfile_lock*fl,structfile_lock*conf)

file byte range lock

Parameters

structfile*filp

The file to apply the lock to

unsignedintcmd

type of locking operation (F_SETLK, F_GETLK, etc.)

structfile_lock*fl

The lock to be applied

structfile_lock*conf

Place to return a copy of the conflicting lock, if found.

Description

A caller that doesn’t care about the conflicting lock may pass NULL as the final argument.

If the filesystem defines a private ->lock() method, then conf will be left unchanged; so a caller that cares should initialize it to some acceptable default.

To avoid blocking kernel daemons, such as lockd, that need to acquire POSIX locks, the ->lock() interface may return asynchronously, before the lock has been granted or denied by the underlying filesystem, if (and only if) lm_grant is set. Additionally, FOP_ASYNC_LOCK in file_operations fop_flags needs to be set.

Callers expecting ->lock() to return asynchronously will only use F_SETLK, not F_SETLKW; they will set FL_SLEEP if (and only if) the request is for a blocking lock. When ->lock() does return asynchronously, it must return FILE_LOCK_DEFERRED, and call ->lm_grant() when the lock request completes. If the request is for a non-blocking lock the file system should return FILE_LOCK_DEFERRED then try to get the lock and call the callback routine with the result. If the request timed out the callback routine will return a nonzero return code and the file system should release the lock. The file system is also responsible for keeping a corresponding posix lock when it grants a lock so the VFS can find out which locks are locally held and do the correct lock cleanup when required. The underlying filesystem must not drop the kernel lock or call ->lm_grant() before returning to the caller with a FILE_LOCK_DEFERRED return code.

intvfs_cancel_lock(structfile*filp,structfile_lock*fl)

file byte range unblock lock

Parameters

structfile*filp

The file to apply the unblock to

structfile_lock*fl

The lock to be unblocked

Description

Used by lock managers to cancel blocked requests

boolvfs_inode_has_locks(structinode*inode)

are any file locks held on inode?

Parameters

structinode*inode

inode to check for locks

Description

Return true if there are any FL_POSIX or FL_FLOCK locks currently set on inode.

intlease_open_conflict(structfile*filp,constintarg)

see if the given file points to an inode that has an existing open that would conflict with the desired lease.

Parameters

structfile*filp

file to check

constintarg

type of lease that we’re trying to acquire

Description

Check to see if there’s an existing open fd on this file that would conflict with the lease we’re trying to set.

intposix_lock_inode_wait(structinode*inode,structfile_lock*fl)

Apply a POSIX-style lock to a file

Parameters

structinode*inode

inode of file to which lock request should be applied

structfile_lock*fl

The lock to be applied

Description

Apply a POSIX style lock request to an inode.

int__fcntl_getlease(structfile*filp,unsignedintflavor)

Enquire what lease is currently active

Parameters

structfile*filp

the file

unsignedintflavor

type of lease flags to check

Description

The value returned by this function will be one of the following.

If no lease break is pending:

F_RDLCK to indicate a shared lease is held.

F_WRLCK to indicate an exclusive lease is held.

F_UNLCK to indicate no lease is held.

If a lease break is pending:

F_RDLCK to indicate an exclusive lease needs to be changed to a shared lease (or removed).

F_UNLCK to indicate the lease needs to be removed.

XXX: sfr & willy disagree over whether F_INPROGRESS should be returned to userspace.
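From userspace, the same value is visible through the Linux-specific F_GETLEASE command of fcntl(2). The following is a minimal sketch of such a query, assuming a Linux system with glibc; query_lease() is a hypothetical helper name, and the fallback definition of F_GETLEASE is only there in case the extension was not exposed by the headers.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE		/* F_GETLEASE is a Linux extension */
#endif
#include <fcntl.h>

#ifndef F_GETLEASE
#define F_GETLEASE 1025		/* F_LINUX_SPECIFIC_BASE + 1 */
#endif

/*
 * Store the reported lease state in *type and return 1 if it is F_RDLCK or
 * F_WRLCK, 0 if it is F_UNLCK, or -1 if fcntl() fails. Note that while a
 * lease break is pending, the reported value is the state the lease has to
 * move to, as listed above.
 */
int query_lease(int fd, int *type)
{
	int ret = fcntl(fd, F_GETLEASE);

	if (ret == -1)
		return -1;
	*type = ret;
	return ret != F_UNLCK;
}
```

A freshly opened file on which nobody has taken a lease is expected to report F_UNLCK.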

intfcntl_setlease(unsignedintfd,structfile*filp,intarg)

sets a lease on an open file

Parameters

unsignedintfd

open file descriptor

structfile*filp

file pointer

intarg

type of lease to obtain

Description

Call this fcntl to establish a lease on the file. Note that you also need to call F_SETSIG to receive a signal when the lease is broken.

intfcntl_setdeleg(unsignedintfd,structfile*filp,structdelegation*deleg)

sets a delegation on an open file

Parameters

unsignedintfd

open file descriptor

structfile*filp

file pointer

structdelegation*deleg

delegation request from userland

Description

Call this fcntl to establish a delegation on the file. Note that you also need to call F_SETSIG to receive a signal when the lease is broken.

intflock_lock_inode_wait(structinode*inode,structfile_lock*fl)

Apply a FLOCK-style lock to a file

Parameters

structinode*inode

inode of the file to apply to

structfile_lock*fl

The lock to be applied

Description

Apply a FLOCK style lock request to an inode.

longsys_flock(unsignedintfd,unsignedintcmd)

flock() system call.

Parameters

unsignedintfd

the file descriptor to lock.

unsignedintcmd

the type of lock to apply.

Description

Apply a FL_FLOCK style lock to an open file descriptor. The cmd can be one of:

  • LOCK_SH -- a shared lock.

  • LOCK_EX -- an exclusive lock.

  • LOCK_UN -- remove an existing lock.

  • LOCK_MAND -- a ‘mandatory’ flock. (DEPRECATED)

LOCK_MAND support has been removed from the kernel.
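The system call is normally reached through the flock(2) library wrapper. The sketch below shows a non-blocking exclusive lock attempt from userspace; open_for_locking() and try_exclusive_flock() are hypothetical helper names, and LOCK_NB is the standard flag that makes the call fail with EWOULDBLOCK instead of sleeping.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>

/* Open (creating if needed) a file to be used purely as a lock target. */
int open_for_locking(const char *path)
{
	return open(path, O_CREAT | O_RDWR, 0600);
}

/*
 * Try to take an exclusive flock without blocking.
 * Returns 0 if the lock was acquired, 1 if another open file description
 * already holds a conflicting lock, and -1 on any other error.
 */
int try_exclusive_flock(int fd)
{
	if (flock(fd, LOCK_EX | LOCK_NB) == 0)
		return 0;
	return errno == EWOULDBLOCK ? 1 : -1;
}
```

Because flock locks belong to the open file description, two separate open() calls on the same file conflict with each other even inside one process; flock(fd, LOCK_UN) on the first descriptor lets the second attempt succeed.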

pid_tlocks_translate_pid(structfile_lock_core*fl,structpid_namespace*ns)

translate a file_lock’s fl_pid number into a namespace

Parameters

structfile_lock_core*fl

The file_lock whose fl_pid should be translated

structpid_namespace*ns

The namespace into which the pid should be translated

Description

Used to translate a fl_pid into a namespace virtual pid number.

Other Functions

voidmpage_readahead(structreadahead_control*rac,get_block_tget_block)

start reads against pages

Parameters

structreadahead_control*rac

Describes which pages to read.

get_block_tget_block

The filesystem’s block mapper function.

Description

This function walks the pages and the blocks within each page, building and emitting large BIOs.

If anything unusual happens, such as:

  • encountering a page which has buffers

  • encountering a page which has a non-hole after a hole

  • encountering a page with non-contiguous blocks

then this code just gives up and calls the buffer_head-based read function. It does handle a page which has holes at the end - that is a common case: the end-of-file on blocksize < PAGE_SIZE setups.

BH_Boundary explanation:

There is a problem. The mpage read code assembles several pages, gets all their disk mappings, and then submits them all. That’s fine, but obtaining the disk mappings may require I/O. Reads of indirect blocks, for example.

So an mpage read of the first 16 blocks of an ext2 file will cause I/O to be submitted in the following order:

12 0 1 2 3 4 5 6 7 8 9 10 11 13 14 15 16

because the indirect block has to be read to get the mappings of blocks 13, 14, 15, 16. Obviously, this impacts performance.

So what we do is to allow the filesystem’s get_block() function to set BH_Boundary when it maps block 11. BH_Boundary says: mapping of the block after this one will require I/O against a block which is probably close to this one. So you should push what I/O you have currently accumulated.

This all causes the disk requests to be issued in the correct order.
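The effect of the hint can be reproduced with a small userspace simulation of the example above. This is only a model of the ordering argument, not the mpage code: submission_order() and the block-number constants are hypothetical, and "submitting" a block simply means appending its number to an output array.

```c
#include <stdbool.h>
#include <stddef.h>

#define INDIRECT_BLOCK	12	/* holds the mappings for blocks 13..16 */
#define BOUNDARY_BLOCK	11	/* last block that is mapped directly */
#define LAST_BLOCK	16

/* Append all accumulated blocks to order[] and return the new length. */
static size_t flush(const int *pending, size_t npending, int *order, size_t n)
{
	for (size_t i = 0; i < npending; i++)
		order[n++] = pending[i];
	return n;
}

/*
 * Fill order[] (room for 17 entries) with the sequence in which blocks
 * reach the disk, with or without the boundary hint on block 11.
 */
size_t submission_order(bool use_boundary, int *order)
{
	int pending[LAST_BLOCK + 1];
	size_t npending = 0, n = 0;
	bool indirect_read = false;

	for (int blk = 0; blk <= LAST_BLOCK; blk++) {
		if (blk == INDIRECT_BLOCK)
			continue;	/* metadata, not file data */
		if (blk > INDIRECT_BLOCK && !indirect_read) {
			/* Mapping this block reads the indirect block at once. */
			order[n++] = INDIRECT_BLOCK;
			indirect_read = true;
		}
		pending[npending++] = blk;
		if (use_boundary && blk == BOUNDARY_BLOCK) {
			/* The hint: push accumulated I/O before the metadata read. */
			n = flush(pending, npending, order, n);
			npending = 0;
		}
	}
	return flush(pending, npending, order, n);
}
```

Without the hint the model yields 12 first, followed by 0..11 and 13..16; with the hint it yields 0..11, then 12, then 13..16.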

intmpage_writepages(structaddress_space*mapping,structwriteback_control*wbc,get_block_tget_block)

walk the list of dirty pages of the given address space & writepage() all of them

Parameters

structaddress_space*mapping

address space structure to write

structwriteback_control*wbc

subtract the number of written pages from *wbc->nr_to_write

get_block_tget_block

the filesystem’s block mapper function.

Description

This is a library function, which implements the writepages() address_space_operation.

intgeneric_permission(structmnt_idmap*idmap,structinode*inode,intmask)

check for access rights on a Posix-like filesystem

Parameters

structmnt_idmap*idmap

idmap of the mount the inode was found from

structinode*inode

inode to check access rights for

intmask

right to check for (MAY_READ, MAY_WRITE, MAY_EXEC, MAY_NOT_BLOCK ...)

Description

Used to check for read/write/execute permissions on a file. We use “fsuid” for this, letting us set arbitrary permissions for filesystem access without changing the “normal” uids which are used for other things.

generic_permission is rcu-walk aware. It returns -ECHILD in case an rcu-walk request cannot be satisfied (e.g. requires blocking or too much complexity). It would then be called again in ref-walk mode.

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.
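The core of a POSIX-like permission check is the selection of one rwx triplet from the file mode. The userspace sketch below models only that step and makes several assumptions: demo_mode_allows() and the DEMO_MAY_* constants are hypothetical names, the caller has already decided whether the fsuid matches the (idmapped) owner and whether the group matches, and ACLs, capabilities and rcu-walk handling are left out entirely.

```c
#include <stdbool.h>

/* Hypothetical request bits, laid out like one rwx triplet. */
#define DEMO_MAY_EXEC	1u
#define DEMO_MAY_WRITE	2u
#define DEMO_MAY_READ	4u

/*
 * Pick the owner, group or other triplet from mode and check that every
 * requested bit in mask is granted by it. The first matching class wins,
 * even if a later class would be more permissive.
 */
bool demo_mode_allows(unsigned int mode, bool is_owner, bool in_group,
		      unsigned int mask)
{
	unsigned int bits;

	if (is_owner)
		bits = (mode >> 6) & 7;
	else if (in_group)
		bits = (mode >> 3) & 7;
	else
		bits = mode & 7;

	return (mask & ~bits & 7) == 0;
}
```

With mode 0640, for instance, the owner may read and write, a group member may only read, and everybody else is refused.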

intinode_permission(structmnt_idmap*idmap,structinode*inode,intmask)

Check for access rights to a given inode

Parameters

structmnt_idmap*idmap

idmap of the mount the inode was found from

structinode*inode

Inode to check permission on

intmask

Right to check for (MAY_READ, MAY_WRITE, MAY_EXEC)

Description

Check for read/write/execute permissions on an inode. We use fs[ug]id for this, letting us set arbitrary permissions for filesystem access without changing the “normal” UIDs which are used for other things.

When checking for MAY_APPEND, MAY_WRITE must also be set in mask.

voidpath_get(conststructpath*path)

get a reference to a path

Parameters

conststructpath*path

path to get the reference to

Description

Given a path, increment the reference count to the dentry and the vfsmount.

voidpath_put(conststructpath*path)

put a reference to a path

Parameters

conststructpath*path

path to put the reference to

Description

Given a path, decrement the reference count to the dentry and the vfsmount.

voidend_dirop(structdentry*de)

signal completion of a dirop

Parameters

structdentry*de

the dentry which was returned by start_dirop or similar.

Description

If the de is an error, nothing happens. Otherwise any lock taken to protect the dentry is dropped and the dentry itself is released (dput()).

intvfs_path_parent_lookup(structfilename*filename,unsignedintflags,structpath*parent,structqstr*last,int*type,conststructpath*root)

lookup a parent path relative to a dentry-vfsmount pair

Parameters

structfilename*filename

filename structure

unsignedintflags

lookup flags

structpath*parent

pointer to struct path to fill

structqstr*last

last component

int*type

type of the last component

conststructpath*root

pointer to struct path of the base directory

intvfs_path_lookup(structdentry*dentry,structvfsmount*mnt,constchar*name,unsignedintflags,structpath*path)

lookup a file path relative to a dentry-vfsmount pair

Parameters

structdentry*dentry

pointer to dentry of the base directory

structvfsmount*mnt

pointer to vfs mount of the base directory

constchar*name

pointer to file name

unsignedintflags

lookup flags

structpath*path

pointer to struct path to fill

structdentry*try_lookup_noperm(structqstr*name,structdentry*base)

filesystem helper to lookup single pathname component

Parameters

structqstr*name

qstr storing pathname component to lookup

structdentry*base

base directory to lookup from

Description

Look up a dentry by name in the dcache, returning NULL if it does not currently exist. The function does not try to create a dentry and if one is found it doesn’t try to revalidate it.

Note that this routine is purely a helper for filesystem usage and should not be called by generic code. It does no permission checking.

No locks need be held - only a counted reference to base is needed.

structdentry*lookup_noperm(structqstr*name,structdentry*base)

filesystem helper to lookup single pathname component

Parameters

structqstr*name

qstr storing pathname component to lookup

structdentry*base

base directory to lookup from

Description

Note that this routine is purely a helper for filesystem usage and should not be called by generic code. It does no permission checking.

The caller must hold base->i_rwsem.

structdentry*lookup_one(structmnt_idmap*idmap,structqstr*name,structdentry*base)

lookup single pathname component

Parameters

structmnt_idmap*idmap

idmap of the mount the lookup is performed from

structqstr*name

qstr holding pathname component to lookup

structdentry*base

base directory to lookup from

Description

This can be used for in-kernel filesystem clients such as file servers.

The caller must hold base->i_rwsem.

structdentry*lookup_one_unlocked(structmnt_idmap*idmap,structqstr*name,structdentry*base)

lookup single pathname component

Parameters

structmnt_idmap*idmap

idmap of the mount the lookup is performed from

structqstr*name

qstr holding pathname component to lookup

structdentry*base

base directory to lookup from

Description

This can be used for in-kernel filesystem clients such as file servers.

Unlike lookup_one, it should be called without the parent i_rwsem held, and will take the i_rwsem itself if necessary.

structdentry*lookup_one_positive_killable(structmnt_idmap*idmap,structqstr*name,structdentry*base)

lookup single pathname component

Parameters

structmnt_idmap*idmap

idmap of the mount the lookup is performed from

structqstr*name

qstr holding pathname component to lookup

structdentry*base

base directory to lookup from

Description

This helper will yield ERR_PTR(-ENOENT) on negatives. The helper returns known positive or ERR_PTR(). This is what most of the users want.

Note that pinned negative with unlocked parent _can_ become positive at any time, so callers of lookup_one_unlocked() need to be very careful; pinned positives have ->d_inode stable, so this one avoids such problems.

This can be used for in-kernel filesystem clients such as file servers.

It should be called without the parent i_rwsem held, and will take the i_rwsem itself if necessary. If a fatal signal is pending or delivered, it will return -EINTR if the lock is needed.

structdentry*lookup_one_positive_unlocked(structmnt_idmap*idmap,structqstr*name,structdentry*base)

lookup single pathname component

Parameters

structmnt_idmap*idmap

idmap of the mount the lookup is performed from

structqstr*name

qstr holding pathname component to lookup

structdentry*base

base directory to lookup from

Description

This helper will yield ERR_PTR(-ENOENT) on negatives. The helper returns known positive or ERR_PTR(). This is what most of the users want.

Note that pinned negative with unlocked parent _can_ become positive at any time, so callers of lookup_one_unlocked() need to be very careful; pinned positives have ->d_inode stable, so this one avoids such problems.

This can be used for in-kernel filesystem clients such as file servers.

The helper should be called without i_rwsem held.

structdentry*lookup_noperm_unlocked(structqstr*name,structdentry*base)

filesystem helper to lookup single pathname component

Parameters

structqstr*name

pathname component to lookup

structdentry*base

base directory to lookup from

Description

Note that this routine is purely a helper for filesystem usage and should not be called by generic code. It does no permission checking.

Unlike lookup_noperm(), it should be called without the parent i_rwsem held, and will take the i_rwsem itself if necessary.

Unlike try_lookup_noperm() it does revalidate the dentry if it already existed.

structdentry*start_creating(structmnt_idmap*idmap,structdentry*parent,structqstr*name)

prepare to create a given name with permission checking

Parameters

structmnt_idmap*idmap

idmap of the mount

structdentry*parent

directory in which to prepare to create the name

structqstr*name

the name to be created

Description

Locks are taken and a lookup is performed prior to creating an object in a directory. Permission checking (MAY_EXEC) is performed against idmap.

If the name already exists, a positive dentry is returned, so behaviour is similar to O_CREAT without O_EXCL, which doesn’t fail with -EEXIST.

Return

a negative or positive dentry, or an error.

structdentry*start_removing(structmnt_idmap*idmap,structdentry*parent,structqstr*name)

prepare to remove a given name with permission checking

Parameters

structmnt_idmap*idmap

idmap of the mount

structdentry*parent

directory in which to find the name

structqstr*name

the name to be removed

Description

Locks are taken and a lookup is performed prior to removing an object from a directory. Permission checking (MAY_EXEC) is performed against idmap.

If the name doesn’t exist, an error is returned.

end_removing() should be called when removal is complete, or aborted.

Return

a positive dentry, or an error.

structdentry*start_creating_killable(structmnt_idmap*idmap,structdentry*parent,structqstr*name)

prepare to create a given name with permission checking

Parameters

structmnt_idmap*idmap

idmap of the mount

structdentry*parent

directory in which to prepare to create the name

structqstr*name

the name to be created

Description

Locks are taken and a lookup is performed prior to creating an object in a directory. Permission checking (MAY_EXEC) is performed against idmap.

If the name already exists, a positive dentry is returned.

If a signal is received or was already pending, the function aborts with -EINTR.

Return

a negative or positive dentry, or an error.

structdentry*start_removing_killable(structmnt_idmap*idmap,structdentry*parent,structqstr*name)

prepare to remove a given name with permission checking

Parameters

structmnt_idmap*idmap

idmap of the mount

structdentry*parent

directory in which to find the name

structqstr*name

the name to be removed

Description

Locks are taken and a lookup is performed prior to removing an object from a directory. Permission checking (MAY_EXEC) is performed against idmap.

If the name doesn’t exist, an error is returned.

end_removing() should be called when removal is complete, or aborted.

If a signal is received or was already pending, the function aborts with -EINTR.

Return

a positive dentry, or an error.

structdentry*start_creating_noperm(structdentry*parent,structqstr*name)

prepare to create a given name without permission checking

Parameters

structdentry*parent

directory in which to prepare to create the name

structqstr*name

the name to be created

Description

Locks are taken and a lookup is performed prior to creating an object in a directory.

If the name already exists, a positive dentry is returned.

Return

a negative or positive dentry, or an error.

structdentry*start_removing_noperm(structdentry*parent,structqstr*name)

prepare to remove a given name without permission checking

Parameters

structdentry*parent

directory in which to find the name

structqstr*name

the name to be removed

Description

Locks are taken and a lookup is performed prior to removing an object from a directory.

If the name doesn’t exist, an error is returned.

end_removing() should be called when removal is complete, or aborted.

Return

a positive dentry, or an error.

structdentry*start_creating_dentry(structdentry*parent,structdentry*child)

prepare to create a given dentry

Parameters

structdentry*parent

directory in which the dentry should be created

structdentry*child

the dentry to be created

Description

A lock is taken to protect the dentry against other dirops and the validity of the dentry is checked: correct parent and still hashed.

If the dentry is valid and negative, a reference is taken and returned. If not, an error is returned.

end_creating() should be called when creation is complete, or aborted.

Return

the valid dentry, or an error.

structdentry*start_removing_dentry(structdentry*parent,structdentry*child)

prepare to remove a given dentry

Parameters

structdentry*parent

directory from which dentry should be removed

structdentry*child

the dentry to be removed

Description

A lock is taken to protect the dentry against other dirops and the validity of the dentry is checked: correct parent and still hashed.

If the dentry is valid and positive, a reference is taken and returned. If not, an error is returned.

end_removing() should be called when removal is complete, or aborted.

Return

the valid dentry, or an error.

intstart_renaming(structrenamedata*rd,intlookup_flags,structqstr*old_last,structqstr*new_last)

lookup and lock names for rename with permission checking

Parameters

structrenamedata*rd

rename data containing parents and flags, and for receiving found dentries

intlookup_flags

extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,LOOKUP_NO_SYMLINKS etc).

structqstr*old_last

name of object in rd.old_parent

structqstr*new_last

name of object in rd.new_parent

Description

Look up two names and ensure locks are in place for rename.

On success the found dentries are stored in rd.old_dentry, rd.new_dentry. Also the refcount on rd->old_parent is increased. These references and the lock are dropped by end_renaming().

The passed in qstrs need not have the hash calculated, and basic eXecute permission checking is performed against rd.mnt_idmap.

Return

zero or an error.

intstart_renaming_dentry(structrenamedata*rd,intlookup_flags,structdentry*old_dentry,structqstr*new_last)

lookup and lock name for rename with permission checking

Parameters

structrenamedata*rd

rename data containing parents and flags, and for receiving found dentries

intlookup_flags

extra flags to pass to ->lookup (e.g. LOOKUP_REVAL,LOOKUP_NO_SYMLINKS etc).

structdentry*old_dentry

dentry of name to move

structqstr*new_last

name of target in rd.new_parent

Description

Look up target name and ensure locks are in place for rename.

On success the found dentry is stored in rd.new_dentry and rd.old_parent is confirmed to be the parent of old_dentry. If it was originally NULL, it is set. In either case a reference is taken so that end_renaming() can have a stable reference to unlock.

References and the lock can be dropped with end_renaming().

The passed in qstr need not have the hash calculated, and basic eXecute permission checking is performed against rd.mnt_idmap.

Return

zero or an error.

intstart_renaming_two_dentries(structrenamedata*rd,structdentry*old_dentry,structdentry*new_dentry)

Lock two dentries in given parents for rename

Parameters

structrenamedata*rd

rename data containing parent

structdentry*old_dentry

dentry of name to move

structdentry*new_dentry

dentry to move to

Description

Ensure locks are in place for rename and check parentage is still correct.

On success the two dentries are stored in rd.old_dentry and rd.new_dentry and rd.old_parent and rd.new_parent are confirmed to be the parents of the dentries.

References and the lock can be dropped with end_renaming().

Return

zero or an error.

intvfs_create(structmnt_idmap*idmap,structdentry*dentry,umode_tmode,structdelegated_inode*di)

create new file

Parameters

structmnt_idmap*idmap

idmap of the mount the inode was found from

structdentry*dentry

dentry of the child file

umode_tmode

mode of the child file

structdelegated_inode*di

returns parent inode, if the inode is delegated.

Description

Create a new file.

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

structfile*kernel_tmpfile_open(structmnt_idmap*idmap,conststructpath*parentpath,umode_tmode,intopen_flag,conststructcred*cred)

open a tmpfile for kernel internal use

Parameters

structmnt_idmap*idmap

idmap of the mount the inode was found from

conststructpath*parentpath

path of the base directory

umode_tmode

mode of the new tmpfile

intopen_flag

flags

conststructcred*cred

credentials for open

Description

Create and open a temporary file. The file is not accounted in nr_files, hence this is only for kernel internal use, and must not be installed into file tables or such.

voidend_creating_path(conststructpath*path,structdentry*dentry)

finish a code section started by start_creating_path()

Parameters

conststructpath*path

the path instantiated by start_creating_path()

structdentry*dentry

the dentry returned by start_creating_path()

Description

end_creating_path() will unlock any locks taken by start_creating_path() and drop any references that were taken. It should only be called if start_creating_path() returned a non-error. If vfs_mkdir() was called and it returned an error, that error should be passed to end_creating_path() together with the path.

intvfs_mknod(structmnt_idmap*idmap,structinode*dir,structdentry*dentry,umode_tmode,dev_tdev,structdelegated_inode*delegated_inode)

create device node or file

Parameters

structmnt_idmap*idmap

idmap of the mount the inode was found from

structinode*dir

inode of the parent directory

structdentry*dentry

dentry of the child device node

umode_tmode

mode of the child device node

dev_tdev

device number of device to create

structdelegated_inode*delegated_inode

returns parent inode, if the inode is delegated.

Description

Create a device node or file.

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

structdentry*vfs_mkdir(structmnt_idmap*idmap,structinode*dir,structdentry*dentry,umode_tmode,structdelegated_inode*delegated_inode)

create directory returning correct dentry if possible

Parameters

structmnt_idmap*idmap

idmap of the mount the inode was found from

structinode*dir

inode of the parent directory

structdentry*dentry

dentry of the child directory

umode_tmode

mode of the child directory

structdelegated_inode*delegated_inode

returns parent inode, if the inode is delegated.

Description

Create a directory.

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

In the event that the filesystem does not use the *dentry but leaves it negative or unhashes it and possibly splices a different one returning it, the original dentry is dput() and the alternate is returned.

In case of an error the dentry is dput() and an ERR_PTR() is returned.

intvfs_rmdir(structmnt_idmap*idmap,structinode*dir,structdentry*dentry,structdelegated_inode*delegated_inode)

remove directory

Parameters

structmnt_idmap*idmap

idmap of the mount the inode was found from

structinode*dir

inode of the parent directory

structdentry*dentry

dentry of the child directory

structdelegated_inode*delegated_inode

returns parent inode, if it’s delegated.

Description

Remove a directory.

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

intvfs_unlink(structmnt_idmap*idmap,structinode*dir,structdentry*dentry,structdelegated_inode*delegated_inode)

unlink a filesystem object

Parameters

structmnt_idmap*idmap

idmap of the mount the inode was found from

structinode*dir

parent directory

structdentry*dentry

victim

structdelegated_inode*delegated_inode

returns victim inode, if the inode is delegated.

Description

The caller must hold dir->i_rwsem exclusively.

If vfs_unlink discovers a delegation, it will return -EWOULDBLOCK and return a reference to the inode in delegated_inode. The caller should then break the delegation on that inode and retry. Because breaking a delegation may take a long time, the caller should drop dir->i_rwsem before doing so.

Alternatively, a caller may pass NULL for delegated_inode. This may be appropriate for callers that expect the underlying filesystem not to be NFS exported.

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.
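The break-and-retry protocol described for the delegated_inode argument can be sketched as a userspace model. Everything here is a hypothetical stand-in and none of it is the kernel API: demo_unlink() plays the role of the VFS operation, demo_break_deleg_wait() plays the role of waiting for the delegation break, and a boolean stands in for dir->i_rwsem.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_dir   { bool locked; };	/* models dir->i_rwsem */
struct demo_inode { bool delegated; };	/* models an outstanding delegation */

/* Stand-in for the unlink operation: needs the lock, refuses delegated inodes. */
static int demo_unlink(struct demo_dir *dir, struct demo_inode *victim,
		       struct demo_inode **delegated)
{
	if (!dir->locked)
		return -EINVAL;
	if (victim->delegated) {
		if (delegated)
			*delegated = victim;
		return -EWOULDBLOCK;
	}
	return 0;
}

/* Stand-in for the slow delegation break; must run without the dir lock. */
static int demo_break_deleg_wait(struct demo_dir *dir, struct demo_inode *inode)
{
	if (dir->locked)
		return -EDEADLK;
	inode->delegated = false;
	return 0;
}

/* Lock, try, and on -EWOULDBLOCK unlock, break the delegation and retry. */
int demo_unlink_retry(struct demo_dir *dir, struct demo_inode *victim,
		      int *attempts)
{
	for (;;) {
		struct demo_inode *delegated = NULL;
		int err;

		dir->locked = true;
		(*attempts)++;
		err = demo_unlink(dir, victim, &delegated);
		dir->locked = false;	/* always dropped before breaking */

		if (err != -EWOULDBLOCK || !delegated)
			return err;
		err = demo_break_deleg_wait(dir, delegated);
		if (err)
			return err;
	}
}
```

In this model a victim with an outstanding delegation takes two attempts: the first returns -EWOULDBLOCK, the delegation is broken with the directory unlocked, and the second attempt succeeds.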

intvfs_symlink(structmnt_idmap*idmap,structinode*dir,structdentry*dentry,constchar*oldname,structdelegated_inode*delegated_inode)

create symlink

Parameters

structmnt_idmap*idmap

idmap of the mount the inode was found from

structinode*dir

inode of the parent directory

structdentry*dentry

dentry of the child symlink file

constchar*oldname

name of the file to link to

structdelegated_inode*delegated_inode

returns victim inode, if the inode is delegated.

Description

Create a symlink.

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

intvfs_link(structdentry*old_dentry,structmnt_idmap*idmap,structinode*dir,structdentry*new_dentry,structdelegated_inode*delegated_inode)

create a new link

Parameters

structdentry*old_dentry

object to be linked

structmnt_idmap*idmap

idmap of the mount

structinode*dir

new parent

structdentry*new_dentry

where to create the new link

structdelegated_inode*delegated_inode

returns inode needing a delegation break

Description

The caller must hold dir->i_rwsem exclusively.

If vfs_link discovers a delegation on the to-be-linked file in need of breaking, it will return -EWOULDBLOCK and return a reference to the inode in delegated_inode. The caller should then break the delegation and retry. Because breaking a delegation may take a long time, the caller should drop the i_rwsem before doing so.

Alternatively, a caller may pass NULL for delegated_inode. This may be appropriate for callers that expect the underlying filesystem not to be NFS exported.

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

int vfs_rename(struct renamedata *rd)

rename a filesystem object

Parameters

struct renamedata *rd

pointer to struct renamedata info

Description

The caller must hold multiple mutexes; see lock_rename().

If vfs_rename discovers a delegation in need of breaking at either the source or destination, it will return -EWOULDBLOCK and return a reference to the inode in delegated_inode. The caller should then break the delegation and retry. Because breaking a delegation may take a long time, the caller should drop all locks before doing so.

Alternatively, a caller may pass NULL for delegated_inode. This may be appropriate for callers that expect the underlying filesystem not to be NFS exported.

The worst of all namespace operations - renaming a directory. “Perverted” doesn’t even start to describe it. Somebody in UCB had a heck of a trip... Problems:

  1. we can get into loop creation.

  2. race potential - two innocent renames can create a loop together. That’s where 4.4BSD screws up. Current fix: serialization on sb->s_vfs_rename_mutex. We might be more accurate, but that’s another story.

  3. we may have to lock up to _four_ objects - parents and victim (if it exists), and source (if it’s a non-directory or a subdirectory that moves to different parent). And that - after we got ->i_rwsem on parents (until then we don’t know whether the target exists). Solution: try to be smart with locking order for inodes. We rely on the fact that tree topology may change only under ->s_vfs_rename_mutex _and_ that parent of the object we move will be locked. Thus we can rank directories by the tree (ancestors first) and rank all non-directories after them. That works since everybody except rename does “lock parent, lookup, lock child” and rename is under ->s_vfs_rename_mutex. HOWEVER, it relies on the assumption that any object with ->lookup() has no more than 1 dentry. If “hybrid” objects will ever appear, we’d better make sure that there’s no link(2) for them.

  4. conversion from fhandle to dentry may come in the wrong moment - when we are removing the target. Solution: we will have to grab ->i_rwsem in the fhandle_to_dentry code. [FIXME - current nfsfh.c relies on ->i_rwsem on parents, which works but leads to some truly excessive locking].

int vfs_readlink(struct dentry *dentry, char __user *buffer, int buflen)

copy symlink body into userspace buffer

Parameters

struct dentry *dentry

dentry on which to get symbolic link

char __user *buffer

user memory pointer

int buflen

size of buffer

Description

Does not touch atime. That’s up to the caller if necessary.

Does not call security hook.

const char *vfs_get_link(struct dentry *dentry, struct delayed_call *done)

get symlink body

Parameters

struct dentry *dentry

dentry on which to get symbolic link

struct delayed_call *done

caller needs to free returned data with this

Description

Calls security hook and i_op->get_link() on the supplied inode.

It does not touch atime. That’s up to the caller if necessary.

Does not work on “special” symlinks like /proc/$$/fd/N.

const char *page_get_link(struct dentry *dentry, struct inode *inode, struct delayed_call *callback)

An implementation of the get_link inode_operation.

Parameters

struct dentry *dentry

The directory entry which is the symlink.

struct inode *inode

The inode for the symlink.

struct delayed_call *callback

Used to drop the reference to the symlink.

Description

Filesystems which store their symlinks in the page cache should use this to implement the get_link() member of their inode_operations.

Return

A pointer to the NUL-terminated symlink.

void page_put_link(void *arg)

Drop the reference to the symlink.

Parameters

void *arg

The folio which contains the symlink.

Description

This is used internally by page_get_link(). It is exported for use by filesystems which need to implement a variant of page_get_link() themselves. Despite the apparent symmetry, filesystems which use page_get_link() do not need to call page_put_link().

The argument, while it has a void pointer type, must be a pointer to the folio which was retrieved from the page cache. The delayed_call infrastructure is used to drop the reference count once the caller is done with the symlink.
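The deferred-release pattern described above can be modelled in userspace as follows. This is a sketch under stated assumptions: toy_delayed_call, toy_folio and the toy_* functions are hypothetical stand-ins, and the reference count is a plain integer rather than a real folio reference.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a deferred-cleanup handle. */
struct toy_delayed_call {
	void (*fn)(void *);
	void *arg;
};

/* Hypothetical stand-in for the folio holding the symlink body. */
struct toy_folio {
	int refcount;
	const char *data;   /* NUL-terminated symlink body */
};

/* Plays the role described for page_put_link(): arg must be the folio. */
static void toy_put_link(void *arg)
{
	struct toy_folio *folio = arg;

	assert(folio != NULL);
	folio->refcount--;
}

/* A get_link-style helper: take a reference and arm the cleanup. */
static const char *toy_get_link(struct toy_folio *folio, struct toy_delayed_call *done)
{
	folio->refcount++;
	done->fn = toy_put_link;
	done->arg = folio;
	return folio->data;
}

/* The consumer runs the cleanup once it has finished with the string. */
static void toy_do_delayed_call(struct toy_delayed_call *done)
{
	if (done->fn)
		done->fn(done->arg);
}
```

The consumer never names the release function directly; it only runs whatever the provider armed, which is why users of the stock helper do not call the put function themselves.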

void bio_reset(struct bio *bio, struct block_device *bdev, blk_opf_t opf)

reinitialize a bio

Parameters

struct bio *bio

bio to reset

struct block_device *bdev

block device to use the bio for

blk_opf_t opf

operation and flags for bio

Description

After calling bio_reset(), bio will be in the same state as a freshly allocated bio returned by bio_alloc_bioset() - the only fields that are preserved are the ones that are initialized by bio_alloc_bioset(). See comment in struct bio.

void bio_chain(struct bio *bio, struct bio *parent)

chain bio completions

Parameters

struct bio *bio

the target bio

struct bio *parent

the parent bio of bio

Description

The caller won’t have a bi_end_io called when bio completes - instead, parent’s bi_end_io won’t be called until both parent and bio have completed; the chained bio will also be freed when it completes.

The caller must not set bi_private or bi_end_io in bio.
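The completion rule for chained bios can be sketched with a userspace counter model. All names here (toy_bio, toy_chain, toy_endio) are hypothetical, and the "remaining" counter is an assumption of this sketch rather than a description of the kernel's internal bookkeeping.

```c
#include <assert.h>
#include <stddef.h>

struct toy_bio {
	int remaining;                    /* completions still outstanding */
	struct toy_bio *parent;           /* set when chained */
	void (*end_io)(struct toy_bio *); /* completion callback, may be NULL */
	int done;                         /* set by toy_mark_done() */
};

static void toy_mark_done(struct toy_bio *bio)
{
	bio->done = 1;
}

static void toy_init(struct toy_bio *bio, void (*end_io)(struct toy_bio *))
{
	bio->remaining = 1;
	bio->parent = NULL;
	bio->end_io = end_io;
	bio->done = 0;
}

/* Chain child to parent: the parent now waits for one extra completion. */
static void toy_chain(struct toy_bio *child, struct toy_bio *parent)
{
	assert(child->end_io == NULL);   /* the child must not carry its own callback */
	child->parent = parent;
	parent->remaining++;
}

/* Complete one unit of work; run end_io only when nothing is outstanding. */
static void toy_endio(struct toy_bio *bio)
{
	while (bio) {
		if (--bio->remaining > 0)
			return;
		if (bio->end_io) {
			bio->end_io(bio);
			return;
		}
		bio = bio->parent;   /* chained child: propagate to the parent */
	}
}
```

Whichever of the two finishes first, the parent's callback runs exactly once, and only after both have completed.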

struct bio *bio_alloc_bioset(struct block_device *bdev, unsigned short nr_vecs, blk_opf_t opf, gfp_t gfp_mask, struct bio_set *bs)

allocate a bio for I/O

Parameters

struct block_device *bdev

block device to allocate the bio for (can be NULL)

unsigned short nr_vecs

number of bvecs to pre-allocate

blk_opf_t opf

operation and flags for bio

gfp_t gfp_mask

the GFP_* mask given to the slab allocator

struct bio_set *bs

the bio_set to allocate from.

Description

Allocate a bio from the mempools in bs.

If __GFP_DIRECT_RECLAIM is set then bio_alloc will always be able to allocate a bio. This is due to the mempool guarantees. To make this work, callers must never allocate more than 1 bio at a time from the general pool. Callers that need to allocate more than 1 bio must always submit the previously allocated bio for IO before attempting to allocate a new one. Failure to do so can cause deadlocks under memory pressure.

Note that when running under submit_bio_noacct() (i.e. any block driver), bios are not submitted until after you return - see the code in submit_bio_noacct() that converts recursion into iteration, to prevent stack overflows.

This would normally mean allocating multiple bios under submit_bio_noacct() would be susceptible to deadlocks, but we have deadlock avoidance code that resubmits any blocked bios from a rescuer thread.

However, we do not guarantee forward progress for allocations from other mempools. Doing multiple allocations from the same mempool under submit_bio_noacct() should be avoided - instead, use bio_set’s front_pad for per bio allocations.

Return

Pointer to new bio on success, NULL on failure.

struct bio *bio_kmalloc(unsigned short nr_vecs, gfp_t gfp_mask)

kmalloc a bio

Parameters

unsigned short nr_vecs

number of bio_vecs to allocate

gfp_t gfp_mask

the GFP_* mask given to the slab allocator

Description

Use kmalloc to allocate a bio (including bvecs). The bio must be initialized using bio_init() before use. To free a bio returned from this function use kfree() after calling bio_uninit(). A bio returned from this function can be reused by calling bio_uninit() before calling bio_init() again.

Note that unlike bio_alloc() or bio_alloc_bioset(), allocations from this function are not backed by a mempool and can fail. Do not use this function for allocations in the file system I/O path.

Return

Pointer to new bio on success, NULL on failure.

void bio_put(struct bio *bio)

release a reference to a bio

Parameters

struct bio *bio

bio to release reference to

Description

Put a reference to a struct bio, either one you have gotten with bio_alloc, bio_get or bio_clone_*. The last put of a bio will free it.

struct bio *bio_alloc_clone(struct block_device *bdev, struct bio *bio_src, gfp_t gfp, struct bio_set *bs)

clone a bio that shares the original bio’s biovec

Parameters

struct block_device *bdev

block_device to clone onto

struct bio *bio_src

bio to clone from

gfp_t gfp

allocation priority

struct bio_set *bs

bio_set to allocate from

Description

Allocate a new bio that is a clone of bio_src. The caller owns the returned bio, but not the actual data it points to.

The caller must ensure that the returned bio is not freed before bio_src.

int bio_init_clone(struct block_device *bdev, struct bio *bio, struct bio *bio_src, gfp_t gfp)

clone a bio that shares the original bio’s biovec

Parameters

struct block_device *bdev

block_device to clone onto

struct bio *bio

bio to clone into

struct bio *bio_src

bio to clone from

gfp_t gfp

allocation priority

Description

Initialize a new bio in caller provided memory that is a clone of bio_src. The caller owns the returned bio, but not the actual data it points to.

The caller must ensure that bio_src is not freed before bio.

void __bio_add_page(struct bio *bio, struct page *page, unsigned int len, unsigned int off)

add page(s) to a bio in a new segment

Parameters

struct bio *bio

destination bio

struct page *page

start page to add

unsigned int len

length of the data to add, may cross pages

unsigned int off

offset of the data relative to page, may cross pages

Description

Add the data at page + off to bio as a new bvec. The caller must ensure that bio has space for another bvec.

void bio_add_virt_nofail(struct bio *bio, void *vaddr, unsigned len)

add data in the direct kernel mapping to a bio

Parameters

struct bio *bio

destination bio

void *vaddr

data to add

unsigned len

length of the data to add, may cross pages

Description

Add the data at vaddr to bio. The caller must have ensured that a segment is available for the added data. No merging into an existing segment will be performed.

int bio_add_page(struct bio *bio, struct page *page, unsigned int len, unsigned int offset)

attempt to add page(s) to bio

Parameters

struct bio *bio

destination bio

struct page *page

start page to add

unsigned int len

vec entry length, may cross pages

unsigned int offset

vec entry offset relative to page, may cross pages

Description

Attempt to add page(s) to the bio_vec maplist. This will only fail if either bio->bi_vcnt == bio->bi_max_vecs or it’s a cloned bio.

bool bio_add_folio(struct bio *bio, struct folio *folio, size_t len, size_t off)

Attempt to add part of a folio to a bio.

Parameters

struct bio *bio

BIO to add to.

struct folio *folio

Folio to add.

size_t len

How many bytes from the folio to add.

size_t off

First byte in this folio to add.

Description

Filesystems that use folios can call this function instead of calling bio_add_page() for each page in the folio. If off is bigger than PAGE_SIZE, this function can create a bio_vec that starts in a page after the bv_page. BIOs do not support folios that are 4GiB or larger.

Return

Whether the addition was successful.

unsigned int bio_add_vmalloc_chunk(struct bio *bio, void *vaddr, unsigned len)

add a vmalloc chunk to a bio

Parameters

struct bio *bio

destination bio

void *vaddr

vmalloc address to add

unsigned len

total length in bytes of the data to add

Description

Add data starting at vaddr to bio and return how many bytes were added. This may be less than the amount originally asked. Returns 0 if no data could be added to bio.

This helper calls flush_kernel_vmap_range() for the range added. For reads the caller still needs to manually call invalidate_kernel_vmap_range() in the completion handler.

bool bio_add_vmalloc(struct bio *bio, void *vaddr, unsigned int len)

add a vmalloc region to a bio

Parameters

struct bio *bio

destination bio

void *vaddr

vmalloc address to add

unsigned int len

total length in bytes of the data to add

Description

Add data starting at vaddr to bio. Return true on success or false if bio does not have enough space for the payload.

This helper calls flush_kernel_vmap_range() for the range added. For reads the caller still needs to manually call invalidate_kernel_vmap_range() in the completion handler.

int submit_bio_wait(struct bio *bio)

submit a bio, and wait until it completes

Parameters

struct bio *bio

The struct bio which describes the I/O

Description

Simple wrapper around submit_bio(). Returns 0 on success, or the error from bio_endio() on failure.

WARNING: Unlike how submit_bio() is usually used, this function does not consume the bio reference. The caller must drop the reference on its own.

int bdev_rw_virt(struct block_device *bdev, sector_t sector, void *data, size_t len, enum req_op op)

synchronously read into / write from kernel mapping

Parameters

struct block_device *bdev

block device to access

sector_t sector

sector to access

void *data

data to read/write

size_t len

length in bytes to read/write

enum req_op op

operation (e.g. REQ_OP_READ/REQ_OP_WRITE)

Description

Performs synchronous I/O to bdev for data/len. data must be in the kernel direct mapping and not a vmalloc address.

void bio_copy_data(struct bio *dst, struct bio *src)

copy contents of data buffers from one bio to another

Parameters

struct bio *dst

destination bio

struct bio *src

source bio

Description

Stops when it reaches the end of either src or dst - that is, copies min(src->bi_size, dst->bi_size) bytes (or the equivalent for lists of bios).
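The "copy until either side runs out" rule can be illustrated with a userspace sketch over two segmented buffers. The toy_seg and toy_buf types are hypothetical stand-ins for a bio's segment list, and returning the byte count is a convenience of this sketch (the documented function returns void).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One contiguous piece of memory. */
struct toy_seg { char *base; size_t len; };

/* A buffer made of several segments, loosely analogous to a bio's bvecs. */
struct toy_buf {
	struct toy_seg *segs;
	size_t nsegs;
};

/* Copy until either side runs out; returns the number of bytes copied. */
static size_t toy_copy_data(struct toy_buf *dst, const struct toy_buf *src)
{
	size_t di = 0, si = 0, doff = 0, soff = 0, copied = 0;

	assert(dst != NULL && src != NULL);
	while (di < dst->nsegs && si < src->nsegs) {
		size_t dleft = dst->segs[di].len - doff;
		size_t sleft = src->segs[si].len - soff;
		size_t n = dleft < sleft ? dleft : sleft;

		memcpy(dst->segs[di].base + doff, src->segs[si].base + soff, n);
		copied += n;
		doff += n;
		soff += n;
		/* Advance whichever segment has been fully consumed. */
		if (doff == dst->segs[di].len) { di++; doff = 0; }
		if (soff == src->segs[si].len) { si++; soff = 0; }
	}
	return copied;
}
```

Segment boundaries on the two sides need not line up; each step copies the smaller of the two remaining pieces.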

void bio_endio(struct bio *bio)

end I/O on a bio

Parameters

struct bio *bio

bio

Description

bio_endio() will end I/O on the whole bio. bio_endio() is the preferred way to end I/O on a bio. No one should call bi_end_io() directly on a bio unless they own it and thus know that it has an end_io function.

bio_endio() can be called several times on a bio that has been chained using bio_chain(). The ->bi_end_io() function will only be called the last time.

struct bio *bio_split(struct bio *bio, int sectors, gfp_t gfp, struct bio_set *bs)

split a bio

Parameters

struct bio *bio

bio to split

int sectors

number of sectors to split from the front of bio

gfp_t gfp

gfp mask

struct bio_set *bs

bio set to allocate from

Description

Allocates and returns a new bio which represents sectors from the start of bio, and updates bio to represent the remaining sectors.

Unless this is a discard request the newly allocated bio will point to bio’s bi_io_vec. It is the caller’s responsibility to ensure that neither bio nor bs are freed before the split bio.
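The sector arithmetic of a front split can be shown with a tiny userspace model. The toy_range type is hypothetical and tracks only a start sector and a length; the precondition that the split point lies strictly inside the range is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a run of sectors on a device. */
struct toy_range {
	uint64_t sector;      /* first sector */
	uint32_t nr_sectors;  /* length in sectors */
};

/*
 * Carve `sectors` off the front of *r. The returned range is the front
 * piece; *r is updated in place to describe what remains.
 */
static struct toy_range toy_split(struct toy_range *r, uint32_t sectors)
{
	struct toy_range front;

	/* Sketch assumption: the split point is strictly inside the range. */
	assert(sectors > 0 && sectors < r->nr_sectors);
	front.sector = r->sector;
	front.nr_sectors = sectors;
	r->sector += sectors;
	r->nr_sectors -= sectors;
	return front;
}
```

Splitting 6 sectors off a 16-sector range starting at sector 100 yields a front piece covering sectors 100 to 105, while the original is left describing sectors 106 to 115.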

void bio_trim(struct bio *bio, sector_t offset, sector_t size)

trim a bio

Parameters

struct bio *bio

bio to trim

sector_t offset

number of sectors to trim from the front of bio

sector_t size

size we want to trim bio to, in sectors

Description

This function is typically used for bios that are cloned and submitted to the underlying device in parts.

int bioset_init(struct bio_set *bs, unsigned int pool_size, unsigned int front_pad, int flags)

Initialize a bio_set

Parameters

struct bio_set *bs

pool to initialize

unsigned int pool_size

Number of bio and bio_vecs to cache in the mempool

unsigned int front_pad

Number of bytes to allocate in front of the returned bio

int flags

Flags to modify behavior, currently BIOSET_NEED_BVECS and BIOSET_NEED_RESCUER

Description

Set up a bio_set to be used with bio_alloc_bioset. Allows the caller to ask for a number of bytes to be allocated in front of the bio. Front pad allocation is useful for embedding the bio inside another structure, to avoid allocating extra data to go with the bio. Note that the bio must be embedded at the END of that structure always, or things will break badly.

If BIOSET_NEED_BVECS is set in flags, a separate pool will be allocated for allocating iovecs. This pool is not needed e.g. for bio_init_clone().

If BIOSET_NEED_RESCUER is set, a workqueue is created which can be used to dispatch queued requests when the mempool runs out of space.
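The front pad layout can be illustrated in userspace with offsetof() arithmetic. Everything here is a hypothetical model (toy_bio, toy_io and the toy_* helpers are not kernel names); it only demonstrates why the embedded bio has to be the last member of the wrapping structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a bio. */
struct toy_bio { int opf; };

/* Per-I/O wrapper: the bio is the LAST member, as the text requires. */
struct toy_io {
	int tag;
	struct toy_bio bio;
};

/* Bytes that sit in front of the embedded bio. */
#define TOY_FRONT_PAD offsetof(struct toy_io, bio)

/* Allocate front_pad + sizeof(bio) bytes and hand back only the bio pointer. */
static struct toy_bio *toy_bio_alloc(size_t front_pad)
{
	char *p = calloc(1, front_pad + sizeof(struct toy_bio));

	return p ? (struct toy_bio *)(p + front_pad) : NULL;
}

/* Recover the wrapper from the bio, container_of() style. */
static struct toy_io *toy_io_from_bio(struct toy_bio *bio)
{
	assert(bio != NULL);
	return (struct toy_io *)((char *)bio - offsetof(struct toy_io, bio));
}

/* Free the whole allocation, which starts front_pad bytes before the bio. */
static void toy_bio_free(struct toy_bio *bio, size_t front_pad)
{
	free((char *)bio - front_pad);
}
```

The allocator in this model reserves space only in front of the bio and for the bio itself, so any member placed after the bio would fall outside the allocation.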

int seq_open(struct file *file, const struct seq_operations *op)

initialize sequential file

Parameters

struct file *file

file we initialize

const struct seq_operations *op

method table describing the sequence

Description

seq_open() sets up file, associating it with a sequence described by op. op->start() sets the iterator up and returns the first element of the sequence. op->stop() shuts it down. op->next() returns the next element of the sequence. op->show() prints an element into the buffer. In case of error ->start() and ->next() return ERR_PTR(error). At the end of the sequence they return NULL. ->show() returns 0 in case of success and a negative number in case of error. Returning SEQ_SKIP means “discard this element and move on”.

Note

seq_open() will allocate a struct seq_file and store its pointer in file->private_data. This pointer should not be modified.
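The start/show/next/stop protocol can be sketched as a userspace loop. This is a simplified model, not the kernel's traversal code: toy_seq, toy_seq_ops and the toy_* callbacks are hypothetical, the buffer is a fixed array, and error pointers and SEQ_SKIP are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical output buffer, standing in for a seq_file. */
struct toy_seq { char buf[64]; size_t count; };

/* Hypothetical method table mirroring the four documented operations. */
struct toy_seq_ops {
	void *(*start)(struct toy_seq *m, long *pos);
	void *(*next)(struct toy_seq *m, void *v, long *pos);
	void  (*stop)(struct toy_seq *m, void *v);
	int   (*show)(struct toy_seq *m, void *v);
};

static const int toy_items[] = { 10, 20, 30 };
#define TOY_NR ((long)(sizeof(toy_items) / sizeof(toy_items[0])))

/* Return the element at *pos, or NULL at the end of the sequence. */
static void *toy_start(struct toy_seq *m, long *pos)
{
	(void)m;
	return *pos < TOY_NR ? (void *)&toy_items[*pos] : NULL;
}

/* Advance the position and return the next element, or NULL at the end. */
static void *toy_next(struct toy_seq *m, void *v, long *pos)
{
	(void)m; (void)v;
	++*pos;
	return *pos < TOY_NR ? (void *)&toy_items[*pos] : NULL;
}

static void toy_stop(struct toy_seq *m, void *v) { (void)m; (void)v; }

/* Print one element; 0 on success, negative if the buffer is full. */
static int toy_show(struct toy_seq *m, void *v)
{
	size_t room = sizeof(m->buf) - m->count;
	int n = snprintf(m->buf + m->count, room, "%d\n", *(const int *)v);

	if (n < 0 || (size_t)n >= room)
		return -1;
	m->count += (size_t)n;
	return 0;
}

/* Simplified driver: start, then show/next until NULL or error, then stop. */
static int toy_traverse(struct toy_seq *m, const struct toy_seq_ops *op)
{
	long pos = 0;
	int err = 0;
	void *v;

	assert(op->start && op->next && op->stop && op->show);
	v = op->start(m, &pos);
	while (v && !(err = op->show(m, v)))
		v = op->next(m, v, &pos);
	op->stop(m, v);
	return err;
}

static const struct toy_seq_ops toy_ops = { toy_start, toy_next, toy_stop, toy_show };
```

Note that stop is called even when start returns NULL, so any setup done in start is always paired with a teardown.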

ssize_t seq_read(struct file *file, char __user *buf, size_t size, loff_t *ppos)

->read() method for sequential files.

Parameters

struct file *file

the file to read from

char __user *buf

the buffer to read to

size_t size

the maximum number of bytes to read

loff_t *ppos

the current position in the file

Description

Ready-made ->f_op->read()

loff_t seq_lseek(struct file *file, loff_t offset, int whence)

->llseek() method for sequential files.

Parameters

struct file *file

the file in question

loff_t offset

new position

int whence

0 for absolute, 1 for relative position

Description

Ready-made ->f_op->llseek()

int seq_release(struct inode *inode, struct file *file)

free the structures associated with sequential file.

Parameters

struct inode *inode

its inode

struct file *file

file in question

Description

Frees the structures associated with sequential file; can be used as ->f_op->release() if you don’t have private data to destroy.

void seq_escape_mem(struct seq_file *m, const char *src, size_t len, unsigned int flags, const char *esc)

print data into buffer, escaping some characters

Parameters

struct seq_file *m

target buffer

const char *src

source buffer

size_t len

size of source buffer

unsigned int flags

flags to pass to string_escape_mem()

const char *esc

set of characters that need escaping

Description

Puts data into buffer, replacing each occurrence of a character from the given class (defined by flags and esc) with a printable escaped sequence.

Use seq_has_overflowed() to check for errors.

char *mangle_path(char *s, const char *p, const char *esc)

mangle and copy path to buffer beginning

Parameters

char *s

buffer start

const char *p

beginning of path in above buffer

const char *esc

set of characters that need escaping

Description

Copy the path from p to s, replacing each occurrence of a character from esc with the usual octal escape. Returns a pointer past the last written character in s, or NULL in case of failure.
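The octal escaping itself can be sketched in userspace. This is not the kernel function: toy_mangle is a hypothetical helper that writes into a separate destination buffer with an explicit size, whereas the documented function works within a single buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy src into dst, writing every byte found in esc as a backslash plus
 * three octal digits. Returns a pointer past the last byte written, or NULL
 * if dst (of dst_len bytes) is too small. No NUL terminator is written.
 */
static char *toy_mangle(char *dst, size_t dst_len, const char *src, const char *esc)
{
	char *end = dst + dst_len;

	assert(dst != NULL && src != NULL && esc != NULL);
	for (; *src; src++) {
		unsigned char c = (unsigned char)*src;

		if (!strchr(esc, c)) {
			/* Ordinary byte: copy it through unchanged. */
			if ((size_t)(end - dst) < 1)
				return NULL;
			*dst++ = (char)c;
		} else {
			/* Escaped byte: emit \ooo, which needs four bytes. */
			if ((size_t)(end - dst) < 4)
				return NULL;
			*dst++ = '\\';
			*dst++ = (char)('0' + ((c & 0300) >> 6));
			*dst++ = (char)('0' + ((c & 0070) >> 3));
			*dst++ = (char)('0' + (c & 0007));
		}
	}
	return dst;
}
```

With esc set to space and newline, the input "a b" followed by a newline becomes the ten bytes `a\040b\012`.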

int seq_path(struct seq_file *m, const struct path *path, const char *esc)

seq_file interface to print a pathname

Parameters

struct seq_file *m

the seq_file handle

const struct path *path

the struct path to print

const char *esc

set of characters to escape in the output

Description

return the absolute path of ‘path’, as represented by the dentry / mnt pair in the path parameter.

int seq_file_path(struct seq_file *m, struct file *file, const char *esc)

seq_file interface to print a pathname of a file

Parameters

struct seq_file *m

the seq_file handle

struct file *file

the struct file to print

constchar*esc

set of characters to escape in the output

Description

return the absolute path to the file.

int seq_write(struct seq_file *seq, const void *data, size_t len)

write arbitrary data to buffer

Parameters

struct seq_file *seq

seq_file identifying the buffer to which data should be written

const void *data

data address

size_t len

number of bytes

Description

Return 0 on success, non-zero otherwise.

void seq_pad(struct seq_file *m, char c)

write padding spaces to buffer

Parameters

struct seq_file *m

seq_file identifying the buffer to which data should be written

char c

the byte to append after padding if non-zero

struct hlist_node *seq_hlist_start(struct hlist_head *head, loff_t pos)

start an iteration of a hlist

Parameters

struct hlist_head *head

the head of the hlist

loff_t pos

the start position of the sequence

Description

Called at seq_file->op->start().

struct hlist_node *seq_hlist_start_head(struct hlist_head *head, loff_t pos)

start an iteration of a hlist

Parameters

struct hlist_head *head

the head of the hlist

loff_t pos

the start position of the sequence

Description

Called at seq_file->op->start(). Call this function if you want to print a header at the top of the output.

struct hlist_node *seq_hlist_next(void *v, struct hlist_head *head, loff_t *ppos)

move to the next position of the hlist

Parameters

void *v

the current iterator

struct hlist_head *head

the head of the hlist

loff_t *ppos

the current position

Description

Called at seq_file->op->next().

struct hlist_node *seq_hlist_start_rcu(struct hlist_head *head, loff_t pos)

start an iteration of a hlist protected by RCU

Parameters

struct hlist_head *head

the head of the hlist

loff_t pos

the start position of the sequence

Description

Called at seq_file->op->start().

This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by rcu_read_lock().

struct hlist_node *seq_hlist_start_head_rcu(struct hlist_head *head, loff_t pos)

start an iteration of a hlist protected by RCU

Parameters

struct hlist_head *head

the head of the hlist

loff_t pos

the start position of the sequence

Description

Called at seq_file->op->start(). Call this function if you want to print a header at the top of the output.

This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by rcu_read_lock().

struct hlist_node *seq_hlist_next_rcu(void *v, struct hlist_head *head, loff_t *ppos)

move to the next position of the hlist protected by RCU

Parameters

void *v

the current iterator

struct hlist_head *head

the head of the hlist

loff_t *ppos

the current position

Description

Called at seq_file->op->next().

This list-traversal primitive may safely run concurrently with the _rcu list-mutation primitives such as hlist_add_head_rcu() as long as the traversal is guarded by rcu_read_lock().

struct hlist_node *seq_hlist_start_percpu(struct hlist_head __percpu *head, int *cpu, loff_t pos)

start an iteration of a percpu hlist array

Parameters

struct hlist_head __percpu *head

pointer to percpu array of struct hlist_heads

int *cpu

pointer to cpu “cursor”

loff_t pos

start position of sequence

Description

Called at seq_file->op->start().

struct hlist_node *seq_hlist_next_percpu(void *v, struct hlist_head __percpu *head, int *cpu, loff_t *pos)

move to the next position of the percpu hlist array

Parameters

void *v

pointer to current hlist_node

struct hlist_head __percpu *head

pointer to percpu array of struct hlist_heads

int *cpu

pointer to cpu “cursor”

loff_t *pos

start position of sequence

Description

Called at seq_file->op->next().

int register_filesystem(struct file_system_type *fs)

register a new filesystem

Parameters

struct file_system_type *fs

the file system structure

Description

Adds the file system passed to the list of file systems the kernel is aware of for mount and other syscalls. Returns 0 on success, or a negative errno code on an error.

The struct file_system_type that is passed is linked into the kernel structures and must not be freed until the file system has been unregistered.

int unregister_filesystem(struct file_system_type *fs)

unregister a file system

Parameters

struct file_system_type *fs

filesystem to unregister

Description

Remove a file system that was previously successfully registered with the kernel. An error is returned if the file system is not found. Zero is returned on a success.

Once this function has returned the struct file_system_type structure may be freed or reused.
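The lifetime rule (the structure is linked in, so it must outlive its registration) can be illustrated with a userspace registry. This is a sketch only: toy_fs_type and the toy_* functions are hypothetical, there is no locking, and the specific errno values returned are this sketch's choice, since the text above only promises "a negative errno code".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a filesystem type descriptor. */
struct toy_fs_type {
	const char *name;
	struct toy_fs_type *next;   /* linked into the registry while registered */
};

static struct toy_fs_type *toy_fs_list;

/* Find the link that points at `name`, or the terminating NULL link. */
static struct toy_fs_type **toy_find(const char *name)
{
	struct toy_fs_type **p;

	for (p = &toy_fs_list; *p; p = &(*p)->next)
		if (strcmp((*p)->name, name) == 0)
			break;
	return p;
}

/* Link caller-owned memory into the list; refuse duplicate names. */
static int toy_register(struct toy_fs_type *fs)
{
	struct toy_fs_type **p;

	assert(fs != NULL && fs->name != NULL);
	p = toy_find(fs->name);
	if (*p)
		return -EBUSY;      /* sketch's choice of errno */
	fs->next = NULL;
	*p = fs;                    /* the registry now points at caller memory */
	return 0;
}

/* Unlink exactly this structure; fail if it is not on the list. */
static int toy_unregister(struct toy_fs_type *fs)
{
	struct toy_fs_type **p;

	for (p = &toy_fs_list; *p; p = &(*p)->next) {
		if (*p == fs) {
			*p = fs->next;
			fs->next = NULL;
			return 0;
		}
	}
	return -EINVAL;             /* sketch's choice of errno */
}
```

Because the registry stores the caller's pointer rather than a copy, freeing the structure while it is still registered would leave a dangling link in the list.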

void wbc_attach_fdatawrite_inode(struct writeback_control *wbc, struct inode *inode)

associate wbc and inode for fdatawrite

Parameters

struct writeback_control *wbc

writeback_control of interest

struct inode *inode

target inode

Description

This function is to be used by filemap_writeback(), which is an alternative entry point into writeback code, and first ensures inode is associated with a bdi_writeback and attaches it to wbc.

void wbc_detach_inode(struct writeback_control *wbc)

disassociate wbc from inode and perform foreign detection

Parameters

struct writeback_control *wbc

writeback_control of the just finished writeback

Description

To be called after a writeback attempt of an inode finishes and undoes wbc_attach_and_unlock_inode(). Can be called under any context.

As concurrent write sharing of an inode is expected to be very rare and memcg only tracks page ownership on first-use basis severely confining the usefulness of such sharing, cgroup writeback tracks ownership per-inode. While the support for concurrent write sharing of an inode is deemed unnecessary, an inode being written to by different cgroups at different points in time is a lot more common, and, more importantly, charging only by first-use can too readily lead to grossly incorrect behaviors (single foreign page can lead to gigabytes of writeback to be incorrectly attributed).

To resolve this issue, cgroup writeback detects the majority dirtier of an inode and transfers the ownership to it. To avoid unnecessary oscillation, the detection mechanism keeps track of history and gives out the switch verdict only if the foreign usage pattern is stable over a certain amount of time and/or writeback attempts.

On each writeback attempt, wbc tries to detect the majority writer using the Boyer-Moore majority vote algorithm. In addition to the byte count from the majority voting, it also counts the bytes written for the current wb and the last round’s winner wb (max of last round’s current wb, the winner from two rounds ago, and the last round’s majority candidate). Keeping track of the historical winner helps the algorithm to semi-reliably detect the most active writer even when it’s not the absolute majority.

Once the winner of the round is determined, whether the winner is foreign or not and how much IO time the round consumed is recorded in inode->i_wb_frn_history. If the amount of recorded foreign IO time is over a certain threshold, the switch verdict is given.
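The byte-weighted Boyer-Moore vote mentioned above can be sketched in userspace as follows. This is a minimal illustration of the voting step only: toy_vote and toy_vote_account are hypothetical names, writers are identified by plain integers, and the history tracking and switch verdict described in the text are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical running state of a byte-weighted majority vote. */
struct toy_vote {
	int cand_id;        /* current candidate writer */
	size_t cand_bytes;  /* candidate's surviving byte count */
};

/* Feed one write of `bytes` attributed to writer `id` into the vote. */
static void toy_vote_account(struct toy_vote *v, int id, size_t bytes)
{
	assert(v != NULL);
	if (v->cand_bytes == 0)
		v->cand_id = id;         /* no surviving candidate: adopt this writer */
	if (id == v->cand_id)
		v->cand_bytes += bytes;  /* same writer: reinforce the candidate */
	else                             /* other writer: cancel out, clamped at zero */
		v->cand_bytes -= bytes < v->cand_bytes ? bytes : v->cand_bytes;
}
```

The vote needs only constant state per round. If one writer accounts for more than half of the bytes, it is guaranteed to be the surviving candidate at the end, which is why the text adds extra counters for cases where no absolute majority exists.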

void wbc_account_cgroup_owner(struct writeback_control *wbc, struct folio *folio, size_t bytes)

account writeback to update inode cgroup ownership

Parameters

struct writeback_control *wbc

writeback_control of the writeback in progress

struct folio *folio

folio being written out

size_t bytes

number of bytes being written out

Description

bytes from folio are about to be written out during the writeback controlled by wbc. Keep the book for foreign inode detection. See wbc_detach_inode().

void __mark_inode_dirty(struct inode *inode, int flags)

internal function to mark an inode dirty

Parameters

struct inode *inode

inode to mark

int flags

what kind of dirty, e.g. I_DIRTY_SYNC. This can be a combination of multiple I_DIRTY_* flags, except that I_DIRTY_TIME can’t be combined with I_DIRTY_PAGES.

Description

Mark an inode as dirty. We notify the filesystem, then update the inode’s dirty flags. Then, if needed we add the inode to the appropriate dirty list.

Most callers should use mark_inode_dirty() or mark_inode_dirty_sync() instead of calling this directly.

CAREFUL! We only add the inode to the dirty list if it is hashed or if it refers to a blockdev. Unhashed inodes will never be added to the dirty list even if they are later hashed, as they will have been marked dirty already.

In short, ensure you hash any inodes _before_ you start marking them dirty.

Note that for blockdevs, inode->dirtied_when represents the dirtying time of the block-special inode (/dev/hda1) itself. And the ->dirtied_when field of the kernel-internal blockdev inode represents the dirtying time of the blockdev’s pages. This is why for I_DIRTY_PAGES we always use page->mapping->host, so the page-dirtying time is recorded in the internal blockdev inode.

void writeback_inodes_sb_nr(struct super_block *sb, unsigned long nr, enum wb_reason reason)

writeback dirty inodes from given super_block

Parameters

struct super_block *sb

the superblock

unsigned long nr

the number of pages to write

enum wb_reason reason

reason why some writeback work was initiated

Description

Start writeback on some inodes on this super_block. No guarantees are made on how many (if any) will be written, and this function does not wait for IO completion of submitted IO.

void writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)

writeback dirty inodes from given super_block

Parameters

struct super_block *sb

the superblock

enum wb_reason reason

reason why some writeback work was initiated

Description

Start writeback on some inodes on this super_block. No guarantees are made on how many (if any) will be written, and this function does not wait for IO completion of submitted IO.

void try_to_writeback_inodes_sb(struct super_block *sb, enum wb_reason reason)

try to start writeback if none underway

Parameters

struct super_block *sb

the superblock

enum wb_reason reason

reason why some writeback work was initiated

Description

Invoke __writeback_inodes_sb_nr if no writeback is currently underway.

void sync_inodes_sb(struct super_block *sb)

sync sb inode pages

Parameters

struct super_block *sb

the superblock

Description

This function writes and waits on any dirty inode belonging to this super_block.

int write_inode_now(struct inode *inode, int sync)

write an inode to disk

Parameters

struct inode *inode

inode to write to disk

int sync

whether the write should be synchronous or not

Description

This function commits an inode to disk immediately if it is dirty. This is primarily needed by knfsd.

The caller must either have a ref on the inode or must have set I_WILL_FREE.

int sync_inode_metadata(struct inode *inode, int wait)

write an inode to disk

Parameters

struct inode *inode

the inode to sync

int wait

wait for I/O to complete.

Description

Write an inode to disk and adjust its dirty state after completion.

Note

only writes the actual inode, no associated data or other metadata.

struct file *anon_inode_getfile(const char *name, const struct file_operations *fops, void *priv, int flags)

creates a new file instance by hooking it up to an anonymous inode, and a dentry that describes the “class” of the file

Parameters

const char *name

[in] name of the “class” of the new file

const struct file_operations *fops

[in] file operations for the new file

void *priv

[in] private data for the new file (will be file’s private_data)

int flags

[in] flags

Description

Creates a new file by hooking it on a single inode. This is useful for files that do not need to have a full-fledged inode in order to operate correctly. All the files created with anon_inode_getfile() will share a single inode, hence saving memory and avoiding code duplication for the file/inode/dentry setup. Returns the newly created file* or an error pointer.

struct file *anon_inode_getfile_fmode(const char *name, const struct file_operations *fops, void *priv, int flags, fmode_t f_mode)

creates a new file instance by hooking it up to an anonymous inode, and a dentry that describes the “class” of the file

Parameters

const char *name

[in] name of the “class” of the new file

const struct file_operations *fops

[in] file operations for the new file

void *priv

[in] private data for the new file (will be file’s private_data)

int flags

[in] flags

fmode_t f_mode

[in] fmode

Description

Creates a new file by hooking it on a single inode. This is useful for files that do not need to have a full-fledged inode in order to operate correctly. All the files created with anon_inode_getfile() will share a single inode, hence saving memory and avoiding code duplication for the file/inode/dentry setup. Allows setting the fmode. Returns the newly created file* or an error pointer.

struct file *anon_inode_create_getfile(const char *name, const struct file_operations *fops, void *priv, int flags, const struct inode *context_inode)

Like anon_inode_getfile(), but creates a new !S_PRIVATE anon inode rather than reusing the singleton anon inode, and calls the inode_init_security_anon() LSM hook.

Parameters

const char *name

[in] name of the “class” of the new file

const struct file_operations *fops

[in] file operations for the new file

void *priv

[in] private data for the new file (will be file’s private_data)

int flags

[in] flags

const struct inode *context_inode

[in] the logical relationship with the new inode (optional)

Description

Create a new anonymous inode and file pair. This can be done for two reasons:

  • for the inode to have its own security context, so that LSMs can enforce policy on the inode’s creation;

  • if the caller needs a unique inode, for example in order to customize the size returned by fstat()

The LSM may use context_inode in inode_init_security_anon(), but a reference to it is not held.

Returns the newly created file* or an error pointer.

int anon_inode_getfd(const char *name, const struct file_operations *fops, void *priv, int flags)

creates a new file instance by hooking it up to an anonymous inode and a dentry that describe the “class” of the file

Parameters

const char *name

[in] name of the “class” of the new file

const struct file_operations *fops

[in] file operations for the new file

void *priv

[in] private data for the new file (will be file’s private_data)

int flags

[in] flags

Description

Creates a new file by hooking it on a single inode. This is useful for files that do not need to have a full-fledged inode in order to operate correctly. All the files created with anon_inode_getfd() will use the same singleton inode, reducing memory use and avoiding code duplication for the file/inode/dentry setup. Returns a newly created file descriptor or an error code.

int setattr_should_drop_sgid(struct mnt_idmap *idmap, const struct inode *inode)

determine whether the setgid bit needs to be removed

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

const struct inode *inode

inode to check

Description

This function determines whether the setgid bit needs to be removed. We retain backwards compatibility and require setgid bit to be removed unconditionally if S_IXGRP is set. Otherwise we have the exact same requirements as setattr_prepare() and setattr_copy().

Return

ATTR_KILL_SGID if setgid bit needs to be removed, 0 otherwise.

int setattr_should_drop_suidgid(struct mnt_idmap *idmap, struct inode *inode)

determine whether the set{g,u}id bit needs to be dropped

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

struct inode *inode

inode to check

Description

This function determines whether the set{g,u}id bits need to be removed. If the setuid bit needs to be removed ATTR_KILL_SUID is returned. If the setgid bit needs to be removed ATTR_KILL_SGID is returned. If both set{g,u}id bits need to be removed the corresponding mask of both flags is returned.

Return

A mask of ATTR_KILL_S{G,U}ID indicating which - if any - setid bits to remove, 0 otherwise.

int setattr_prepare(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *attr)

check if attribute changes to a dentry are allowed

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

struct dentry *dentry

dentry to check

struct iattr *attr

attributes to change

Description

Check if we are allowed to change the attributes contained in attr in the given dentry. This includes the normal unix access permission checks, as well as checks for rlimits and others. The function also clears SGID bit from mode if user is not allowed to set it. Also file capabilities and IMA extended attributes are cleared if ATTR_KILL_PRIV is set.

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

Should be called as the first thing in ->setattr implementations, possibly after taking additional locks.

int inode_newsize_ok(const struct inode *inode, loff_t offset)

may this inode be truncated to a given size

Parameters

const struct inode *inode

the inode to be truncated

loff_t offset

the new size to assign to the inode

Description

inode_newsize_ok must be called with i_rwsem held exclusively.

inode_newsize_ok will check filesystem limits and ulimits to check that the new inode size is within limits. inode_newsize_ok will also send SIGXFSZ when necessary. Caller must not proceed with inode size change if failure is returned. inode must be a file (not directory), with appropriate permissions to allow truncate (inode_newsize_ok does NOT check these conditions).

Return

0 on success, negative errno on failure
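The limit checks described above can be sketched as a small user-space model. Everything here is an assumption made for illustration: struct size_limits and model_newsize_ok() are hypothetical, the order of the checks and the -EINVAL result for a negative size are choices of this sketch, and instead of sending SIGXFSZ the model only reports when the signal would be due.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical inputs standing in for inode and task state. */
struct size_limits {
	int64_t cur_size;     /* current file size */
	int64_t fs_max_bytes; /* largest size the filesystem supports */
	int64_t rlimit_fsize; /* per-process file size limit, -1 for unlimited */
};

/*
 * Model of the size check: growing a file past either limit fails with
 * -EFBIG. *want_sigxfsz is set when the process limit was the one exceeded,
 * which is the case where the description says SIGXFSZ is sent.
 */
static int model_newsize_ok(const struct size_limits *l, int64_t offset,
			    int *want_sigxfsz)
{
	*want_sigxfsz = 0;
	if (offset < 0)
		return -EINVAL;
	if (offset <= l->cur_size)
		return 0; /* shrinking or no change: limits do not apply */
	if (l->rlimit_fsize >= 0 && offset > l->rlimit_fsize) {
		*want_sigxfsz = 1;
		return -EFBIG;
	}
	if (offset > l->fs_max_bytes)
		return -EFBIG;
	return 0;
}
```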

void setattr_copy(struct mnt_idmap *idmap, struct inode *inode, const struct iattr *attr)

copy simple metadata updates into the generic inode

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

struct inode *inode

the inode to be updated

const struct iattr *attr

the new attributes

Description

setattr_copy must be called with i_rwsem held exclusively.

setattr_copy updates the inode’s metadata with that specified in attr on idmapped mounts. Necessary permission checks to determine whether or not the S_ISGID property needs to be removed are performed with the correct idmapped mount permission helpers. Noticeably missing is inode size update, which is more complex as it requires pagecache updates.

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

The inode is not marked as dirty after this operation. The rationale is that for “simple” filesystems, the struct inode is the inode storage. The caller is free to mark the inode dirty afterwards if needed.

int notify_change(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *attr, struct delegated_inode *delegated_inode)

modify attributes of a filesystem object

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

struct dentry *dentry

object affected

struct iattr *attr

new attributes

struct delegated_inode *delegated_inode

returns inode, if the inode is delegated

Description

The caller must hold the i_rwsem exclusively on the affected object.

If notify_change discovers a delegation in need of breaking, it will return -EWOULDBLOCK and return a reference to the inode in delegated_inode. The caller should then break the delegation and retry. Because breaking a delegation may take a long time, the caller should drop the i_rwsem before doing so.

Alternatively, a caller may pass NULL for delegated_inode. This may be appropriate for callers that expect the underlying filesystem not to be NFS exported. Also, passing NULL is fine for callers holding the file open for write, as there can be no conflicting delegation in that case.

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

char *d_path(const struct path *path, char *buf, int buflen)

return the path of a dentry

Parameters

const struct path *path

path to report

char *buf

buffer to return value in

int buflen

buffer length

Description

Convert a dentry into an ASCII path name. If the entry has been deleted the string “ (deleted)” is appended. Note that this is ambiguous.

Returns a pointer into the buffer or an error code if the path was too long. Note: Callers should use the returned pointer, not the passed in buffer, to use the name! The implementation often starts at an offset into the buffer, and may leave 0 bytes at the start.

“buflen” should be positive.
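Why the returned pointer differs from the buffer start can be seen in a user-space model that assembles a path back to front. This is only a sketch of the general technique: model_build_path() is a hypothetical helper, it knows nothing about dentries, mounts or the “ (deleted)” suffix, and it signals overflow with NULL where the kernel function returns an error code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical user-space model: assemble "/a/b/c" from its components by
 * filling buf from the end towards the start. Returns a pointer to the first
 * character of the result inside buf, or NULL if buf is too small.
 */
static char *model_build_path(const char *const *names, size_t n,
			      char *buf, size_t buflen)
{
	char *p;
	size_t i;

	if (buflen == 0)
		return NULL;
	p = buf + buflen - 1;
	*p = '\0';

	for (i = n; i > 0; i--) {
		size_t len = strlen(names[i - 1]);

		if ((size_t)(p - buf) < len + 1)
			return NULL; /* name plus '/' does not fit */
		p -= len;
		memcpy(p, names[i - 1], len);
		*--p = '/';
	}
	return p;
}
```

Because the string is built from the tail of the buffer, the bytes between buf and the returned pointer are unused, which is why printing buf itself would be a bug.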

struct page *dax_layout_busy_page_range(struct address_space *mapping, loff_t start, loff_t end)

find first pinned page in mapping

Parameters

struct address_space *mapping

address space to scan for a page with ref count > 1

loff_t start

Starting offset. Page containing ‘start’ is included.

loff_t end

End offset. Page containing ‘end’ is included. If ‘end’ is LLONG_MAX, pages from ‘start’ till the end of file are included.

Description

DAX requires ZONE_DEVICE mapped pages. These pages are never ‘onlined’ to the page allocator so they are considered idle when page->count == 1. A filesystem uses this interface to determine if any page in the mapping is busy, i.e. for DMA, or other get_user_pages() usages.

It is expected that the filesystem is holding locks to block the establishment of new mappings in this address_space. I.e. it expects to be able to run unmap_mapping_range() and subsequently not race mapping_mapped() becoming true.

ssize_t dax_iomap_rw(struct kiocb *iocb, struct iov_iter *iter, const struct iomap_ops *ops)

Perform I/O to a DAX file

Parameters

struct kiocb *iocb

The control block for this I/O

struct iov_iter *iter

The addresses to do I/O from or to

const struct iomap_ops *ops

iomap ops passed from the file system

Description

This function performs read and write operations to directly mapped persistent memory. The caller needs to take care of read/write exclusion and evicting any page cache pages in the region under I/O.

vm_fault_t dax_iomap_fault(struct vm_fault *vmf, unsigned int order, unsigned long *pfnp, int *iomap_errp, const struct iomap_ops *ops)

handle a page fault on a DAX file

Parameters

struct vm_fault *vmf

The description of the fault

unsigned int order

Order of the page to fault in

unsigned long *pfnp

PFN to insert for synchronous faults if fsync is required

int *iomap_errp

Storage for detailed error code in case of error

const struct iomap_ops *ops

Iomap ops passed from the file system

Description

When a page fault occurs, filesystems may call this helper in their fault handler for DAX files. dax_iomap_fault() assumes the caller has done all the necessary locking for page fault to proceed successfully.

vm_fault_t dax_finish_sync_fault(struct vm_fault *vmf, unsigned int order, unsigned long pfn)

finish synchronous page fault

Parameters

struct vm_fault *vmf

The description of the fault

unsigned int order

Order of entry to be inserted

unsigned long pfn

PFN to insert

Description

This function ensures that the file range touched by the page fault is stored persistently on the media and handles inserting of appropriate page table entry.

void simple_rename_timestamp(struct inode *old_dir, struct dentry *old_dentry, struct inode *new_dir, struct dentry *new_dentry)

update the various inode timestamps for rename

Parameters

struct inode *old_dir

old parent directory

struct dentry *old_dentry

dentry that is being renamed

struct inode *new_dir

new parent directory

struct dentry *new_dentry

target for rename

Description

POSIX mandates that the old and new parent directories have their ctime and mtime updated, and that inodes of old_dentry and new_dentry (if any), have their ctime updated.

int simple_setattr(struct mnt_idmap *idmap, struct dentry *dentry, struct iattr *iattr)

setattr for simple filesystem

Parameters

struct mnt_idmap *idmap

idmap of the target mount

struct dentry *dentry

dentry

struct iattr *iattr

iattr structure

Description

Returns 0 on success, -error on failure.

simple_setattr is a simple ->setattr implementation without a proper implementation of size changes.

It can either be used for in-memory filesystems or special files on simple regular filesystems. Anything that needs to change on-disk or wire state on size changes needs its own setattr method.

ssize_t simple_read_from_buffer(void __user *to, size_t count, loff_t *ppos, const void *from, size_t available)

copy data from the buffer to user space

Parameters

void __user *to

the user space buffer to read to

size_t count

the maximum number of bytes to read

loff_t *ppos

the current position in the buffer

const void *from

the buffer to read from

size_t available

the size of the buffer

Description

The simple_read_from_buffer() function reads up to count bytes from the buffer from at offset ppos into the user space address starting at to.

On success, the number of bytes read is returned and the offset ppos is advanced by this number, or negative value is returned on error.
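The offset and clamping behaviour described above can be tried out with a user-space model. This is a sketch under stated assumptions: model_read_from_buffer() is a hypothetical name, memcpy() stands in for the copy to user space (so partial copies and faults are not represented), and the -EINVAL result for a negative position is a choice of this model.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * User-space model of the documented semantics: copy at most count bytes
 * starting at *ppos, never past the end of the source buffer, and advance
 * *ppos by the number of bytes copied.
 */
static ssize_t model_read_from_buffer(void *to, size_t count, off_t *ppos,
				      const void *from, size_t available)
{
	off_t pos = *ppos;

	if (pos < 0)
		return -EINVAL;
	if ((size_t)pos >= available || count == 0)
		return 0; /* nothing left to read */
	if (count > available - (size_t)pos)
		count = available - (size_t)pos; /* clamp to the data that remains */

	memcpy(to, (const char *)from + pos, count);
	*ppos = pos + (off_t)count;
	return (ssize_t)count;
}
```

Calling the model repeatedly with the same position variable yields successive chunks and finally 0, which is the pattern a read file operation relies on to signal end of data.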

ssize_t simple_write_to_buffer(void *to, size_t available, loff_t *ppos, const void __user *from, size_t count)

copy data from user space to the buffer

Parameters

void *to

the buffer to write to

size_t available

the size of the buffer

loff_t *ppos

the current position in the buffer

const void __user *from

the user space buffer to read from

size_t count

the maximum number of bytes to read

Description

The simple_write_to_buffer() function reads up to count bytes from the user space address starting at from into the buffer to at offset ppos.

On success, the number of bytes written is returned and the offset ppos is advanced by this number, or negative value is returned on error.
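The write direction behaves symmetrically, with the destination buffer size as the bound. The sketch below is a user-space model, not the kernel code: model_write_to_buffer() is a hypothetical name, memcpy() replaces the copy from user space, and returning 0 once the buffer is full is an assumption of this model.

```c
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/*
 * User-space model: copy at most count bytes into the destination at *ppos,
 * never past its end, and advance *ppos by the number of bytes stored.
 */
static ssize_t model_write_to_buffer(void *to, size_t available, off_t *ppos,
				     const void *from, size_t count)
{
	off_t pos = *ppos;

	if (pos < 0)
		return -EINVAL;
	if ((size_t)pos >= available || count == 0)
		return 0; /* buffer already full, or nothing to write */
	if (count > available - (size_t)pos)
		count = available - (size_t)pos; /* clamp to the space that remains */

	memcpy((char *)to + pos, from, count);
	*ppos = pos + (off_t)count;
	return (ssize_t)count;
}
```

A short return value tells the caller that only part of the data fitted, so callers that need all-or-nothing behaviour have to compare the result with count themselves.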

ssize_t memory_read_from_buffer(void *to, size_t count, loff_t *ppos, const void *from, size_t available)

copy data from the buffer

Parameters

void *to

the kernel space buffer to read to

size_t count

the maximum number of bytes to read

loff_t *ppos

the current position in the buffer

const void *from

the buffer to read from

size_t available

the size of the buffer

Description

The memory_read_from_buffer() function reads up to count bytes from the buffer from at offset ppos into the kernel space address starting at to.

On success, the number of bytes read is returned and the offset ppos is advanced by this number, or negative value is returned on error.

int generic_encode_ino32_fh(struct inode *inode, __u32 *fh, int *max_len, struct inode *parent)

generic export_operations->encode_fh function

Parameters

struct inode *inode

the object to encode

__u32 *fh

where to store the file handle fragment

int *max_len

maximum length to store there (in 4 byte units)

struct inode *parent

parent directory inode, if wanted

Description

This generic encode_fh function assumes that the 32-bit inode number is suitable for locating an inode, and that the generation number can be used to check that it is still valid. It places them in the filehandle fragment where export_decode_fh expects to find them.
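A user-space model makes the “4 byte units” convention of max_len concrete. The layout and return codes below are assumptions of this sketch: struct model_inode, model_encode_fh() and the MODEL_FID_* codes are hypothetical, and the kernel uses its own FILEID_* constants and struct fid layout.

```c
#include <stdint.h>

/* Illustrative type codes; the kernel defines its own FILEID_* constants. */
enum model_fid_type {
	MODEL_FID_INVALID = -1,
	MODEL_FID_INO_GEN = 1,
	MODEL_FID_INO_GEN_PARENT = 2,
};

struct model_inode {
	uint32_t ino;
	uint32_t generation;
};

/*
 * Store inode number and generation (and the parent's, if requested) as
 * consecutive 32-bit words. *max_len is in 4-byte units: on entry the space
 * available, on return the space used, or the space required on failure.
 */
static int model_encode_fh(const struct model_inode *inode, uint32_t *fh,
			   int *max_len, const struct model_inode *parent)
{
	int need = parent ? 4 : 2;

	if (*max_len < need) {
		*max_len = need;
		return MODEL_FID_INVALID;
	}
	fh[0] = inode->ino;
	fh[1] = inode->generation;
	if (parent) {
		fh[2] = parent->ino;
		fh[3] = parent->generation;
	}
	*max_len = need;
	return parent ? MODEL_FID_INO_GEN_PARENT : MODEL_FID_INO_GEN;
}
```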

struct dentry *generic_fh_to_dentry(struct super_block *sb, struct fid *fid, int fh_len, int fh_type, struct inode *(*get_inode)(struct super_block *sb, u64 ino, u32 gen))

generic helper for the fh_to_dentry export operation

Parameters

struct super_block *sb

filesystem to do the file handle conversion on

struct fid *fid

file handle to convert

int fh_len

length of the file handle in bytes

int fh_type

type of file handle

struct inode *(*get_inode)(struct super_block *sb, u64 ino, u32 gen)

filesystem callback to retrieve inode

Description

This function decodes fid as long as it has one of the well-known Linux filehandle types and calls get_inode on it to retrieve the inode for the object specified in the file handle.

struct dentry *generic_fh_to_parent(struct super_block *sb, struct fid *fid, int fh_len, int fh_type, struct inode *(*get_inode)(struct super_block *sb, u64 ino, u32 gen))

generic helper for the fh_to_parent export operation

Parameters

struct super_block *sb

filesystem to do the file handle conversion on

struct fid *fid

file handle to convert

int fh_len

length of the file handle in bytes

int fh_type

type of file handle

struct inode *(*get_inode)(struct super_block *sb, u64 ino, u32 gen)

filesystem callback to retrieve inode

Description

This function decodes fid as long as it has one of the well-known Linux filehandle types and calls get_inode on it to retrieve the inode for the _parent_ object specified in the file handle if it is specified in the file handle, or NULL otherwise.

int __generic_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)

generic fsync implementation for simple filesystems

Parameters

struct file *file

file to synchronize

loff_t start

start offset in bytes

loff_t end

end offset in bytes (inclusive)

int datasync

only synchronize essential metadata if true

Description

This is a generic implementation of the fsync method for simple filesystems which track all non-inode metadata in the buffers list hanging off the address_space structure.

int generic_file_fsync(struct file *file, loff_t start, loff_t end, int datasync)

generic fsync implementation for simple filesystems with flush

Parameters

struct file *file

file to synchronize

loff_t start

start offset in bytes

loff_t end

end offset in bytes (inclusive)

int datasync

only synchronize essential metadata if true

int generic_check_addressable(unsigned blocksize_bits, u64 num_blocks)

Check addressability of file system

Parameters

unsigned blocksize_bits

log of file system block size

u64 num_blocks

number of blocks in file system

Description

Determine whether a file system with num_blocks blocks (and a block size of 2**blocksize_bits) is addressable by the sector_t and page cache of the system. Return 0 if so and -EFBIG otherwise.
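The arithmetic behind such a check can be explored with a user-space model. This is a sketch, not the kernel function: model_check_addressable() and max_value() are hypothetical, the widths of the sector and page-index types and the page shift are passed as parameters (in the kernel they are fixed by the build), 512-byte sectors are assumed, and the -EINVAL result for an out-of-range block size is an assumption of this model.

```c
#include <errno.h>
#include <stdint.h>

/* Largest value representable in an unsigned type of the given width. */
static uint64_t max_value(unsigned bits)
{
	return bits >= 64 ? UINT64_MAX : ((UINT64_C(1) << bits) - 1);
}

/*
 * Hypothetical model: the last block must be expressible as a sector number
 * and the last page as a page index.
 */
static int model_check_addressable(unsigned blocksize_bits, uint64_t num_blocks,
				   unsigned page_shift, unsigned sector_bits,
				   unsigned index_bits)
{
	uint64_t last_block, last_page;

	if (num_blocks == 0)
		return 0;
	if (blocksize_bits < 9 || blocksize_bits > page_shift)
		return -EINVAL;

	last_block = num_blocks - 1;
	last_page = last_block >> (page_shift - blocksize_bits);

	/* Each block covers 2^(blocksize_bits - 9) sectors of 512 bytes. */
	if (last_block > (max_value(sector_bits) >> (blocksize_bits - 9)))
		return -EFBIG;
	if (last_page > max_value(index_bits))
		return -EFBIG;
	return 0;
}
```

With 4 KiB blocks and a 32-bit page index, for example, the model accepts 2^32 blocks and rejects 2^32 + 1.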

int simple_nosetlease(struct file *filp, int arg, struct file_lease **flp, void **priv)

generic helper for prohibiting leases

Parameters

struct file *filp

file pointer

int arg

type of lease to obtain

struct file_lease **flp

new lease supplied for insertion

void **priv

private data for lm_setup operation

Description

Generic helper for filesystems that do not wish to allow leases to be set. All arguments are ignored and it just returns -EINVAL.

const char *simple_get_link(struct dentry *dentry, struct inode *inode, struct delayed_call *done)

generic helper to get the target of “fast” symlinks

Parameters

struct dentry *dentry

not used here

struct inode *inode

the symlink inode

struct delayed_call *done

not used here

Description

Generic helper for filesystems to use for symlink inodes where a pointer to the symlink target is stored in ->i_link. NOTE: this isn’t normally called, since as an optimization the path lookup code uses any non-NULL ->i_link directly, without calling ->get_link(). But ->get_link() still must be set, to mark the inode_operations as being for a symlink.

Return

the symlink target

int generic_ci_d_compare(const struct dentry *dentry, unsigned int len, const char *str, const struct qstr *name)

generic d_compare implementation for casefolding filesystems

Parameters

const struct dentry *dentry

dentry whose name we are checking against

unsigned int len

len of name of dentry

const char *str

str pointer to name of dentry

const struct qstr *name

Name to compare against

Return

0 if names match, 1 if mismatch, or -ERRNO

int generic_ci_d_hash(const struct dentry *dentry, struct qstr *str)

generic d_hash implementation for casefolding filesystems

Parameters

const struct dentry *dentry

dentry of the parent directory

struct qstr *str

qstr of name whose hash we should fill in

Return

0 if hash was successful or unchanged, and -EINVAL on error

int generic_ci_match(const struct inode *parent, const struct qstr *name, const struct qstr *folded_name, const u8 *de_name, u32 de_name_len)

Match a name (case-insensitively) with a dirent. This is a filesystem helper for comparison with directory entries. generic_ci_d_compare should be used in VFS’ ->d_compare instead.

Parameters

const struct inode *parent

Inode of the parent of the dirent under comparison

const struct qstr *name

name under lookup.

const struct qstr *folded_name

Optional pre-folded name under lookup

const u8 *de_name

Dirent name.

u32 de_name_len

dirent name length.

Description

Test whether a case-insensitive directory entry matches the filename being searched. If folded_name is provided, it is used instead of recalculating the casefold of name.

Return

> 0 if the directory entry matches, 0 if it doesn’t match, or < 0 on error.

void generic_set_sb_d_ops(struct super_block *sb)

helper for choosing the set of filesystem-wide dentry operations for the enabled features

Parameters

struct super_block *sb

superblock to be configured

Description

Filesystems supporting casefolding and/or fscrypt can call this helper at mount-time to configure default dentry_operations to the best set of dentry operations required for the enabled features. The helper must be called after these have been configured, but before the root dentry is created.

bool inode_maybe_inc_iversion(struct inode *inode, bool force)

increments i_version

Parameters

struct inode *inode

inode with the i_version that should be updated

bool force

increment the counter even if it’s not necessary?

Description

Every time the inode is modified, the i_version field must be seen to have changed by any observer.

If “force” is set or the QUERIED flag is set, then ensure that we increment the value, and clear the queried flag.

In the common case where neither is set, then we can return “false” without updating i_version.

If this function returns false, and no other metadata has changed, then we can avoid logging the metadata.

u64 inode_query_iversion(struct inode *inode)

read i_version for later use

Parameters

struct inode *inode

inode from which i_version should be read

Description

Read the inode i_version counter. This should be used by callers that wish to store the returned i_version for later comparison. This will guarantee that a later query of the i_version will result in a different value if anything has changed.

In this implementation, we fetch the current value, set the QUERIED flag and then try to swap it into place with a cmpxchg, if it wasn’t already set. If that fails, we try again with the newly fetched value from the cmpxchg.
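The interplay between the QUERIED flag and the lazy increment can be modelled with C11 atomics in user space. This is an illustrative sketch only: model_maybe_inc() and model_query() are hypothetical names, the bit layout (flag in bit 0, counter above it) is an assumption of the model, and the memory-ordering details of the kernel code are not reproduced.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Model layout: bit 0 is the QUERIED flag, the counter lives above it. */
#define MODEL_QUERIED   UINT64_C(1)
#define MODEL_INCREMENT UINT64_C(2)

/* Bump the counter only if forced or if someone queried the old value. */
static bool model_maybe_inc(_Atomic uint64_t *v, bool force)
{
	uint64_t cur = atomic_load(v);
	uint64_t next;

	do {
		if (!force && !(cur & MODEL_QUERIED))
			return false; /* nobody looked: no bump needed */
		next = (cur & ~MODEL_QUERIED) + MODEL_INCREMENT;
	} while (!atomic_compare_exchange_weak(v, &cur, next));
	return true;
}

/* Read the counter and mark it as queried so the next change is visible. */
static uint64_t model_query(_Atomic uint64_t *v)
{
	uint64_t cur = atomic_load(v);

	/* Set the flag unless it is already set; retry if we lose a race. */
	while (!(cur & MODEL_QUERIED) &&
	       !atomic_compare_exchange_weak(v, &cur, cur | MODEL_QUERIED))
		;
	return cur >> 1;
}
```

The design lets unobserved modifications skip the increment entirely, while any value handed out by a query is guaranteed to differ from the value seen after the next modification.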

struct timespec64 simple_inode_init_ts(struct inode *inode)

initialize the timestamps for a new inode

Parameters

struct inode *inode

inode to be initialized

Description

When a new inode is created, most filesystems set the timestamps to the current time. Add a helper to do this.

struct dentry *simple_start_creating(struct dentry *parent, const char *name)

prepare to create a given name

Parameters

struct dentry *parent

directory in which to prepare to create the name

const char *name

the name to be created

Description

Required lock is taken and a lookup is performed prior to creating an object in a directory. No permission checking is performed.

Return

a negative dentry on which vfs_create() or similar may be attempted, or an error.

int posix_acl_chmod(struct mnt_idmap *idmap, struct dentry *dentry, umode_t mode)

chmod a posix acl

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

struct dentry *dentry

dentry to check permissions on

umode_t mode

the new mode of inode

Description

If the dentry has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

int posix_acl_update_mode(struct mnt_idmap *idmap, struct inode *inode, umode_t *mode_p, struct posix_acl **acl)

update mode in set_acl

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

struct inode *inode

target inode

umode_t *mode_p

mode (pointer) for update

struct posix_acl **acl

acl pointer

Description

Update the file mode when setting an ACL: compute the new file permission bits based on the ACL. In addition, if the ACL is equivalent to the new file mode, set *acl to NULL to indicate that no ACL should be set.

As with chmod, clear the setgid bit if the caller is not in the owning group or capable of CAP_FSETID (see inode_change_ok).

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before checking permissions. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

Called from set_acl inode operations.

struct posix_acl *posix_acl_from_xattr(struct user_namespace *userns, const void *value, size_t size)

convert POSIX ACLs from backing store to VFS format

Parameters

struct user_namespace *userns

the filesystem’s idmapping

const void *value

the uapi representation of POSIX ACLs

size_t size

the size of value

Description

Filesystems that store POSIX ACLs in the unaltered uapi format should use posix_acl_from_xattr() when reading them from the backing store and converting them into the struct posix_acl VFS format. The helper is specifically intended to be called from the acl inode operation.

The posix_acl_from_xattr() function will map the raw {g,u}id values stored in ACL_{GROUP,USER} entries into idmapping in userns.

Note that posix_acl_from_xattr() does not take idmapped mounts into account. If it did, calling it from the get acl inode operation would return POSIX ACLs mapped according to an idmapped mount which would mean that the value couldn’t be cached for the filesystem. Idmapped mounts are taken into account on the fly during permission checking or right at the VFS - userspace boundary before reporting them to the user.

Return

Allocated struct posix_acl on success, NULL for a valid header but without actual POSIX ACL entries, or ERR_PTR() encoded error code.

int vfs_set_acl(struct mnt_idmap *idmap, struct dentry *dentry, const char *acl_name, struct posix_acl *kacl)

set posix acls

Parameters

struct mnt_idmap *idmap

idmap of the mount

struct dentry *dentry

the dentry based on which to set the posix acls

const char *acl_name

the name of the posix acl

struct posix_acl *kacl

the posix acls in the appropriate VFS format

Description

This function sets kacl. The caller must call posix_acl_release() on kacl afterwards.

Return

On success 0, on error negative errno.

struct posix_acl *vfs_get_acl(struct mnt_idmap *idmap, struct dentry *dentry, const char *acl_name)

get posix acls

Parameters

struct mnt_idmap *idmap

idmap of the mount

struct dentry *dentry

the dentry based on which to retrieve the posix acls

const char *acl_name

the name of the posix acl

Description

This function retrieves kacl from the filesystem. The caller must call posix_acl_release() on kacl.

Return

On success POSIX ACLs in VFS format, on error negative errno.

int vfs_remove_acl(struct mnt_idmap *idmap, struct dentry *dentry, const char *acl_name)

remove posix acls

Parameters

struct mnt_idmap *idmap

idmap of the mount

struct dentry *dentry

the dentry based on which to remove the posix acls

const char *acl_name

the name of the posix acl

Description

This function removes posix acls.

Return

On success 0, on error negative errno.

void fill_mg_cmtime(struct kstat *stat, u32 request_mask, struct inode *inode)

Fill in the mtime and ctime and flag ctime as QUERIED

Parameters

struct kstat *stat

where to store the resulting values

u32 request_mask

STATX_* values requested

struct inode *inode

inode from which to grab the c/mtime

Description

Given inode, grab the ctime and mtime out of it and store the result in stat. When fetching the value, flag it as QUERIED (if not already) so the next write will record a distinct timestamp.

NB: The QUERIED flag is tracked in the ctime, but we set it there even if only the mtime was requested, as that ensures that the next mtime change will be distinct.

void generic_fillattr(struct mnt_idmap *idmap, u32 request_mask, struct inode *inode, struct kstat *stat)

Fill in the basic attributes from the inode struct

Parameters

struct mnt_idmap *idmap

idmap of the mount the inode was found from

u32 request_mask

statx request_mask

struct inode *inode

Inode to use as the source

struct kstat *stat

Where to fill in the attributes

Description

Fill in the basic attributes in the kstat structure from data that’s to be found on the VFS inode structure. This is the default if no getattr inode operation is supplied.

If the inode has been found through an idmapped mount the idmap of the vfsmount must be passed through idmap. This function will then take care to map the inode according to idmap before filling in the uid and gid fields. On non-idmapped mounts or if permission checking is to be performed on the raw inode simply pass nop_mnt_idmap.

void generic_fill_statx_attr(struct inode *inode, struct kstat *stat)

Fill in the statx attributes from the inode flags

Parameters

struct inode *inode

Inode to use as the source

struct kstat *stat

Where to fill in the attribute flags

Description

Fill in the STATX_ATTR_* flags in the kstat structure for properties of the inode that are published on i_flags and enforced by the VFS.

void generic_fill_statx_atomic_writes(struct kstat *stat, unsigned int unit_min, unsigned int unit_max, unsigned int unit_max_opt)

Fill in atomic writes statx attributes

Parameters

struct kstat *stat

Where to fill in the attribute flags

unsigned int unit_min

Minimum supported atomic write length in bytes

unsigned int unit_max

Maximum supported atomic write length in bytes

unsigned int unit_max_opt

Optimised maximum supported atomic write length in bytes

Description

Fill in the STATX{_ATTR}_WRITE_ATOMIC flags in the kstat structure from atomic write unit_min and unit_max values.

int vfs_getattr_nosec(const struct path *path, struct kstat *stat, u32 request_mask, unsigned int query_flags)

getattr without security checks

Parameters

const struct path *path

file to get attributes from

struct kstat *stat

structure to return attributes in

u32 request_mask

STATX_xxx flags indicating what the caller wants

unsigned int query_flags

Query mode (AT_STATX_SYNC_TYPE)

Description

Get attributes without calling security_inode_getattr.

Currently the only caller other than vfs_getattr is internal to the filehandle lookup code, which uses only the inode number and returns no attributes to any user. Any other code probably wants vfs_getattr.

int vfs_fsync_range(struct file *file, loff_t start, loff_t end, int datasync)

helper to sync a range of data & metadata to disk

Parameters

struct file *file

file to sync

loff_t start

offset in bytes of the beginning of data range to sync

loff_t end

offset in bytes of the end of data range (inclusive)

int datasync

perform only datasync

Description

Write back data in range start..end and metadata for file to disk. If datasync is set only metadata needed to access modified file data is written.

int vfs_fsync(struct file *file, int datasync)

perform a fsync or fdatasync on a file

Parameters

struct file *file

file to sync

int datasync

only perform a fdatasync operation

Description

Write back data and metadata for file to disk. If datasync is set only metadata needed to access modified file data is written.
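Seen from user space, the two modes of the datasync argument correspond to the fsync(2) and fdatasync(2) system calls. The sketch below exercises both on a temporary file; write_and_sync() is a hypothetical helper, and the /tmp location and file contents are arbitrary choices of this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdlib.h>
#include <unistd.h>

/*
 * Write a few bytes to a temporary file and flush them. With datasync != 0
 * fdatasync(2) is used, otherwise fsync(2). Returns 0 on success, -1 on error.
 */
static int write_and_sync(int datasync)
{
	char path[] = "/tmp/fsync-demo-XXXXXX";
	static const char msg[] = "durable\n";
	int fd = mkstemp(path);
	int ret = -1;

	if (fd < 0)
		return -1;
	if (write(fd, msg, sizeof(msg) - 1) == (ssize_t)(sizeof(msg) - 1))
		ret = datasync ? fdatasync(fd) : fsync(fd);
	close(fd);
	unlink(path);
	return ret;
}
```

Choosing the datasync variant is appropriate when only the file contents matter to the application, since metadata that is not needed to read the data back may then be left unwritten.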

int __vfs_setxattr_locked(struct mnt_idmap *idmap, struct dentry *dentry, const char *name, const void *value, size_t size, int flags, struct delegated_inode *delegated_inode)

set an extended attribute while holding the inode lock

Parameters

struct mnt_idmap *idmap

idmap of the mount of the target inode

struct dentry *dentry

object to perform setxattr on

const char *name

xattr name to set

const void *value

value to set name to

size_t size

size of value

int flags

flags to pass into filesystem operations

struct delegated_inode *delegated_inode

on return, will contain an inode pointer that a delegation was broken on, NULL if none.

ssize_t vfs_listxattr(struct dentry *dentry, char *list, size_t size)

retrieve NUL-separated list of xattr names

Parameters

struct dentry *dentry

the dentry from whose inode the xattr names are retrieved

char *list

buffer to store xattr names into

size_t size

size of the buffer

Description

This function returns the names of all xattrs associated with the inode of dentry.

Note, for legacy reasons the vfs_listxattr() function lists POSIX ACLs as well. Since POSIX ACLs are decoupled from IOP_XATTR the vfs_listxattr() function doesn’t check for this flag since a filesystem could implement POSIX ACLs without implementing any other xattrs.

However, all codepaths that remove IOP_XATTR also assign inode operations that either don’t implement or implement a stub ->listxattr() operation.

Return

On success, the size of the buffer that was used. On error a negative error code.
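A consumer of such a list walks it one NUL-terminated name at a time, bounded by the returned size. The helper below is a user-space sketch; count_xattr_names() is a hypothetical name and the sample attribute names in the usage note are made up.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count the names in a buffer of back-to-back NUL-terminated strings, as
 * produced by a listxattr-style call that returned len bytes. Returns -1 if
 * the last name is not terminated within the buffer.
 */
static int count_xattr_names(const char *list, size_t len)
{
	size_t off = 0;
	int n = 0;

	while (off < len) {
		const char *end = memchr(list + off, '\0', len - off);

		if (!end)
			return -1;
		off = (size_t)(end - list) + 1; /* skip past the terminator */
		n++;
	}
	return n;
}
```

For a buffer holding "user.a", "security.selinux" and "user.b", each followed by a NUL byte, the helper reports three names; the same loop structure can be used to print or filter them.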

int __vfs_removexattr_locked(struct mnt_idmap *idmap, struct dentry *dentry, const char *name, struct delegated_inode *delegated_inode)

remove an extended attribute while holding the inode lock

Parameters

struct mnt_idmap *idmap

idmap of the mount of the target inode

struct dentry *dentry

object to perform removexattr on

const char *name

name of xattr to remove

struct delegated_inode *delegated_inode

on return, will contain an inode pointer that a delegation was broken on, NULL if none.

ssize_t generic_listxattr(struct dentry *dentry, char *buffer, size_t buffer_size)

run through a dentry’s xattr list() operations

Parameters

struct dentry *dentry

dentry to list the xattrs

char *buffer

result buffer

size_t buffer_size

size of buffer

Description

Combine the results of the list() operation from every xattr_handler in the xattr_handler stack.

Note that this will not include the entries for POSIX ACLs.

const char *xattr_full_name(const struct xattr_handler *handler, const char *name)

Compute full attribute name from suffix

Parameters

const struct xattr_handler *handler

handler of the xattr_handler operation

const char *name

name passed to the xattr_handler operation

Description

The get and set xattr handler operations are called with the remainder of the attribute name after skipping the handler’s prefix: for example, “foo” is passed to the get operation of a handler with prefix “user.” to get attribute “user.foo”. The full name is still “there” in the name though.

Note

The list xattr handler operation, when called from the vfs, is passed a NULL name; some file systems use this operation internally, with varying semantics.
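Because the suffix handed to a handler points into the buffer that holds the full name, the full name can be recovered by stepping back over the prefix. The following user-space sketch illustrates the idea; model_full_name is a hypothetical helper, and it is only valid under the stated assumption that suffix really points just past prefix inside one buffer.

```c
#include <string.h>

/*
 * Hypothetical user-space model: `suffix` must point into a buffer that
 * holds the full attribute name, immediately after `prefix`.
 */
const char *model_full_name(const char *prefix, const char *suffix)
{
	/* Step back over the prefix to reach the start of the full name. */
	return suffix - strlen(prefix);
}
```

For a buffer holding “user.foo”, passing the prefix “user.” and a pointer to “foo” yields a pointer to the start of the buffer.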

int mnt_get_write_access(struct vfsmount *m)

get write access to a mount without freeze protection

Parameters

struct vfsmount *m

the mount on which to take a write

Description

This tells the low-level filesystem that a write is about to be performed to it, and makes sure that writes are allowed (mount is read-write) before returning success. This operation does not protect against the filesystem being frozen. When the write operation is finished, mnt_put_write_access() must be called. This is effectively a refcount.

int mnt_want_write(struct vfsmount *m)

get write access to a mount

Parameters

struct vfsmount *m

the mount on which to take a write

Description

This tells the low-level filesystem that a write is about to be performed to it, and makes sure that writes are allowed (mount is read-write, filesystem is not frozen) before returning success. When the write operation is finished, mnt_drop_write() must be called. This is effectively a refcount.

int mnt_want_write_file(struct file *file)

get write access to a file’s mount

Parameters

struct file *file

the file on whose mount to take a write

Description

This is like mnt_want_write(), but if the file is already open for writing it skips incrementing mnt_writers (since the open file already has a reference) and instead only does the freeze protection and the check for emergency r/o remounts. This must be paired with mnt_drop_write_file().

void mnt_put_write_access(struct vfsmount *mnt)

give up write access to a mount

Parameters

struct vfsmount *mnt

the mount on which to give up write access

Description

Tells the low-level filesystem that we are done performing writes to it. Must be matched with an earlier mnt_get_write_access() call.

void mnt_drop_write(struct vfsmount *mnt)

give up write access to a mount

Parameters

struct vfsmount *mnt

the mount on which to give up write access

Description

Tells the low-level filesystem that we are done performing writes to it and also allows the filesystem to be frozen again. Must be matched with an earlier mnt_want_write() call.
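The want/drop pairing described above behaves like a reference count that is refused on a read-only mount. The toy user-space model below shows only that pairing. Everything in it is hypothetical (struct toy_mount, toy_want_write, toy_drop_write, toy_update), freeze protection is deliberately left out, and the -EROFS error code is an assumption of this sketch.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for a mount; not the kernel's struct vfsmount. */
struct toy_mount {
	bool readonly;	/* writes are refused when set */
	int writers;	/* outstanding write-access references */
};

/* Take a write reference, or fail if the toy mount is read-only. */
int toy_want_write(struct toy_mount *m)
{
	if (m->readonly)
		return -EROFS;	/* assumed error code for this sketch */
	m->writers++;
	return 0;
}

/* Give the reference back; must follow a successful toy_want_write(). */
void toy_drop_write(struct toy_mount *m)
{
	m->writers--;
}

/* Typical caller shape: take access, modify, then always drop it. */
int toy_update(struct toy_mount *m, int *data, int value)
{
	int err = toy_want_write(m);

	if (err)
		return err;	/* nothing to drop on failure */
	*data = value;
	toy_drop_write(m);
	return 0;
}
```

The point of the shape in toy_update() is that the drop happens on every path that follows a successful want, and on no path that follows a failed one.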

struct vfsmount *vfs_create_mount(struct fs_context *fc)

Create a mount for a configured superblock

Parameters

struct fs_context *fc

The configuration context with the superblock attached

Description

Create a mount to an already configured superblock. If necessary, the caller should invoke vfs_get_tree() before calling this.

Note that this does not attach the mount to anything.

bool path_is_mountpoint(const struct path *path)

Check if path is a mount in the current namespace.

Parameters

const struct path *path

path to check

Description

d_mountpoint() can only be used reliably to establish if a dentry is not mounted in any namespace and that common case is handled inline. d_mountpoint() isn’t aware of the possibility there may be multiple mounts using a given dentry in a different namespace. This function checks if the passed in path is a mountpoint rather than the dentry alone.

int may_umount_tree(struct vfsmount *m)

check if a mount tree is busy

Parameters

struct vfsmount *m

root of mount tree

Description

This is called to check if a tree of mounts has any open files, pwds, chroots or sub mounts that are busy.

int may_umount(struct vfsmount *mnt)

check if a mount point is busy

Parameters

struct vfsmount *mnt

root of mount

Description

This is called to check if a mount point has any open files, pwds, chroots or sub mounts. If the mount has sub mounts this will return busy regardless of whether the sub mounts are busy.

Doesn’t take quota and stuff into account. IOW, in some cases it will give false negatives. The main reason why it’s here is that we need a non-destructive way to look for easily umountable filesystems.

struct vfsmount *clone_private_mount(const struct path *path)

create a private clone of a path

Parameters

const struct path *path

path to clone

Description

This creates a new vfsmount, which will be the clone of path. The new mount will not be attached anywhere in the namespace and will be private (i.e. changes to the originating mount won’t be propagated into this).

This assumes caller has called or done the equivalent of may_mount().

Release with mntput().

void mnt_set_expiry(struct vfsmount *mnt, struct list_head *expiry_list)

Put a mount on an expiration list

Parameters

struct vfsmount *mnt

The mount to list.

struct list_head *expiry_list

The list to add the mount to.

The proc filesystem

sysctl interface

int proc_dostring(const struct ctl_table *table, int dir, void *buffer, size_t *lenp, loff_t *ppos)

read a string sysctl

Parameters

const struct ctl_table *table

the sysctl table

int dir

TRUE if this is a write to the sysctl file

void *buffer

the user buffer

size_t *lenp

the size of the user buffer

loff_t *ppos

file position

Description

Reads/writes a string from/to the user buffer. If the kernel buffer provided is not large enough to hold the string, the string is truncated. The copied string is NUL-terminated. If the string is being read by the user process, it is copied and a newline ‘\n’ is added. It is truncated if the buffer is not large enough.

Returns 0 on success.
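The read-side behaviour described above (copy the string, append a newline if there is room, truncate otherwise) can be modelled in user space as follows. toy_read_string is a hypothetical helper for illustration; it ignores the file position and is not the kernel implementation.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy `data` into `out` (capacity `outlen`), truncating if needed, and
 * append '\n' only when a byte is left over. Returns the bytes stored.
 * Note that the result is not NUL-terminated, as with a read(2) result.
 */
size_t toy_read_string(const char *data, char *out, size_t outlen)
{
	size_t len = strlen(data);

	if (len > outlen)
		len = outlen;		/* truncate to the user buffer */
	memcpy(out, data, len);
	if (len < outlen)
		out[len++] = '\n';	/* room left: add the newline */
	return len;
}
```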

int proc_dobool(const struct ctl_table *table, int dir, void *buffer, size_t *lenp, loff_t *ppos)

read/write a bool

Parameters

const struct ctl_table *table

the sysctl table

int dir

TRUE if this is a write to the sysctl file

void *buffer

the user buffer

size_t *lenp

the size of the user buffer

loff_t *ppos

file position

Description

Reads/writes one integer value from/to the user buffer, treated as an ASCII string.

table->data must point to a bool variable and table->maxlen must be sizeof(bool).

Returns 0 on success.

int proc_dointvec(const struct ctl_table *table, int dir, void *buffer, size_t *lenp, loff_t *ppos)

read a vector of integers

Parameters

const struct ctl_table *table

the sysctl table

int dir

TRUE if this is a write to the sysctl file

void *buffer

the user buffer

size_t *lenp

the size of the user buffer

loff_t *ppos

file position

Description

Reads/writes up to table->maxlen/sizeof(unsigned int) integer values from/to the user buffer, treated as an ASCII string.

Returns 0 on success.

int proc_douintvec(const struct ctl_table *table, int dir, void *buffer, size_t *lenp, loff_t *ppos)

read a vector of unsigned integers

Parameters

const struct ctl_table *table

the sysctl table

int dir

TRUE if this is a write to the sysctl file

void *buffer

the user buffer

size_t *lenp

the size of the user buffer

loff_t *ppos

file position

Description

Reads/writes up to table->maxlen/sizeof(unsigned int) unsigned integer values from/to the user buffer, treated as an ASCII string.

Returns 0 on success.

int proc_dointvec_minmax(const struct ctl_table *table, int dir, void *buffer, size_t *lenp, loff_t *ppos)

read a vector of integers with min/max values

Parameters

const struct ctl_table *table

the sysctl table

int dir

TRUE if this is a write to the sysctl file

void *buffer

the user buffer

size_t *lenp

the size of the user buffer

loff_t *ppos

file position

Description

Reads/writes up to table->maxlen/sizeof(unsigned int) integer values from/to the user buffer, treated as an ASCII string.

This routine will ensure the values are within the range specified by table->extra1 (min) and table->extra2 (max).

Returns 0 on success or -EINVAL when the range check fails and SYSCTL_USER_TO_KERN(dir) == true.
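The range check on the write path can be pictured with a small user-space model for a single value. toy_store_minmax is a hypothetical helper; min and max play the role of table->extra1 and table->extra2, and the parsing rules (strtol(), trailing characters ignored) are assumptions of this sketch rather than the kernel's parser.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse one decimal integer from `in` and store it in `*val` only if it
 * lies within [min, max]. On any failure `*val` is left untouched.
 * Returns 0 on success or -EINVAL.
 */
int toy_store_minmax(const char *in, int *val, int min, int max)
{
	char *end;
	long v;

	errno = 0;
	v = strtol(in, &end, 10);
	if (end == in || errno == ERANGE)
		return -EINVAL;		/* not a number, or overflow */
	if (v < min || v > max)
		return -EINVAL;		/* outside the allowed range */
	*val = (int)v;
	return 0;
}
```

Checking the parsed value in a temporary before storing it is what keeps a rejected write from disturbing the current setting.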

int proc_douintvec_minmax(const struct ctl_table *table, int dir, void *buffer, size_t *lenp, loff_t *ppos)

read a vector of unsigned ints with min/max values

Parameters

const struct ctl_table *table

the sysctl table

int dir

TRUE if this is a write to the sysctl file

void *buffer

the user buffer

size_t *lenp

the size of the user buffer

loff_t *ppos

file position

Description

Reads/writes up to table->maxlen/sizeof(unsigned int) unsigned integer values from/to the user buffer, treated as an ASCII string. Negative strings are not allowed.

This routine will ensure the values are within the range specified by table->extra1 (min) and table->extra2 (max). There is a final sanity check for UINT_MAX to avoid having to support wrap around uses from userspace.

Returns 0 on success or -ERANGE when the range check fails and SYSCTL_USER_TO_KERN(dir) == true.

int proc_dou8vec_minmax(const struct ctl_table *table, int dir, void *buffer, size_t *lenp, loff_t *ppos)

read a vector of unsigned chars with min/max values

Parameters

const struct ctl_table *table

the sysctl table

int dir

TRUE if this is a write to the sysctl file

void *buffer

the user buffer

size_t *lenp

the size of the user buffer

loff_t *ppos

file position

Description

Reads/writes up to table->maxlen/sizeof(u8) unsigned char values from/to the user buffer, treated as an ASCII string. Negative strings are not allowed.

This routine will ensure the values are within the range specified by table->extra1 (min) and table->extra2 (max).

Returns 0 on success, or an error when SYSCTL_USER_TO_KERN(dir) == true and the range check fails.

int proc_doulongvec_minmax(const struct ctl_table *table, int dir, void *buffer, size_t *lenp, loff_t *ppos)

read a vector of unsigned long integers with min/max values

Parameters

const struct ctl_table *table

the sysctl table

int dir

TRUE if this is a write to the sysctl file

void *buffer

the user buffer

size_t *lenp

the size of the user buffer

loff_t *ppos

file position

Description

Reads/writes up to table->maxlen/sizeof(unsigned long) unsigned long values from/to the user buffer, treated as an ASCII string.

This routine will ensure the values are within the range specified by table->extra1 (min) and table->extra2 (max).

Returns 0 on success.

int proc_do_large_bitmap(const struct ctl_table *table, int dir, void *buffer, size_t *lenp, loff_t *ppos)

read/write from/to a large bitmap

Parameters

const struct ctl_table *table

the sysctl table

int dir

TRUE if this is a write to the sysctl file

void *buffer

the user buffer

size_t *lenp

the size of the user buffer

loff_t *ppos

file position

Description

The bitmap is stored at table->data and the bitmap length (in bits) in table->maxlen.

The text format is a comma-separated list of bits and bit ranges (e.g. 1,3-4,10-10) so that large bitmaps may be represented in a compact manner. Writing into the file will clear the bitmap then update it with the given input.

Returns 0 on success.
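The comma-separated range format can be illustrated with a user-space parser sketch. All names here (toy_parse_bitmap, toy_test_bit, TOY_BITS_PER_LONG) are hypothetical, and the error handling is an assumption of the sketch: on malformed input it returns -EINVAL and may leave the map partially updated, which is not a statement about the kernel's behaviour.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define TOY_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

static void toy_set_bit(unsigned long *map, unsigned long bit)
{
	map[bit / TOY_BITS_PER_LONG] |= 1UL << (bit % TOY_BITS_PER_LONG);
}

int toy_test_bit(const unsigned long *map, unsigned long bit)
{
	return (int)((map[bit / TOY_BITS_PER_LONG] >>
		      (bit % TOY_BITS_PER_LONG)) & 1UL);
}

/*
 * Parse "a,b-c,..." into `map`, which holds `nbits` bits. The map is
 * cleared first, then every listed bit or range is set.
 */
int toy_parse_bitmap(const char *in, unsigned long *map, unsigned long nbits)
{
	size_t words = (nbits + TOY_BITS_PER_LONG - 1) / TOY_BITS_PER_LONG;

	memset(map, 0, words * sizeof(*map));
	while (*in) {
		char *end;
		unsigned long lo, hi, bit;

		if (*in < '0' || *in > '9')
			return -EINVAL;
		lo = strtoul(in, &end, 10);
		hi = lo;			/* single bit unless "-hi" follows */
		in = end;
		if (*in == '-') {
			in++;
			if (*in < '0' || *in > '9')
				return -EINVAL;
			hi = strtoul(in, &end, 10);
			in = end;
		}
		if (hi < lo || hi >= nbits)
			return -EINVAL;		/* reversed or out of range */
		for (bit = lo; bit <= hi; bit++)
			toy_set_bit(map, bit);
		if (*in == ',')
			in++;
		else if (*in != '\0')
			return -EINVAL;
	}
	return 0;
}
```

With the example string from the description, bits 1, 3, 4 and 10 end up set and all others clear.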

proc filesystem interface

void proc_flush_pid(struct pid *pid)

Remove dcache entries for pid from the /proc dcache.

Parameters

struct pid *pid

pid that should be flushed.

Description

This function walks a list of inodes (that belong to any proc filesystem) that are attached to the pid and flushes them from the dentry cache.

It is safe and reasonable to cache /proc entries for a task until that task exits. After that they just clog up the dcache with useless entries, possibly causing useful dcache entries to be flushed instead. This routine is provided to flush those useless dcache entries when a process is reaped.

NOTE

This routine is just an optimization, so it does not guarantee that no dcache entries will exist after a process is reaped; it just makes it very unlikely that any will persist.

Events based on file descriptors

void eventfd_signal_mask(struct eventfd_ctx *ctx, __poll_t mask)

Increment the event counter

Parameters

struct eventfd_ctx *ctx

[in] Pointer to the eventfd context.

__poll_t mask

[in] poll mask

Description

This function is supposed to be called by the kernel in paths that do not allow sleeping. In this function we allow the counter to reach the ULLONG_MAX value, and we signal this as an overflow condition by returning an EPOLLERR to poll(2).
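For orientation, the counter that the kernel increments here is the same one user space sees through eventfd(2). The sketch below shows the user-space side only: writes add to the counter, and a read returns the accumulated value and resets it. The helper name eventfd_roundtrip is hypothetical, and the example assumes a Linux system with glibc's <sys/eventfd.h>.

```c
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Add `a` and then `b` to a fresh eventfd counter and read it back.
 * Returns the value read, or 0 on any failure.
 */
uint64_t eventfd_roundtrip(uint64_t a, uint64_t b)
{
	uint64_t out = 0;
	int fd = eventfd(0, EFD_NONBLOCK);

	if (fd < 0)
		return 0;
	if (write(fd, &a, sizeof(a)) == (ssize_t)sizeof(a) &&
	    write(fd, &b, sizeof(b)) == (ssize_t)sizeof(b)) {
		/* Without EFD_SEMAPHORE, read returns the whole counter. */
		if (read(fd, &out, sizeof(out)) != (ssize_t)sizeof(out))
			out = 0;
	}
	close(fd);
	return out;
}
```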

void eventfd_ctx_put(struct eventfd_ctx *ctx)

Releases a reference to the internal eventfd context.

Parameters

struct eventfd_ctx *ctx

[in] Pointer to eventfd context.

Description

The eventfd context reference must have been previously acquired either with eventfd_ctx_fdget() or eventfd_ctx_fileget().

int eventfd_ctx_remove_wait_queue(struct eventfd_ctx *ctx, wait_queue_entry_t *wait, __u64 *cnt)

Reads the current counter and removes the wait queue entry.

Parameters

struct eventfd_ctx *ctx

[in] Pointer to eventfd context.

wait_queue_entry_t *wait

[in] Wait queue to be removed.

__u64 *cnt

[out] Pointer to the 64-bit counter value.

Description

Returns 0 if successful, or the following error codes:

-EAGAIN: The operation would have blocked.

This is used to atomically remove a wait queue entry from the eventfd wait queue head, and read/reset the counter value.

struct file *eventfd_fget(int fd)

Acquire a reference of an eventfd file descriptor.

Parameters

int fd

[in] Eventfd file descriptor.

Description

Returns a pointer to the eventfd file structure in case of success, or one of the following error pointers:

-EBADF: Invalid fd file descriptor.

-EINVAL: The fd file descriptor is not an eventfd file.

struct eventfd_ctx *eventfd_ctx_fdget(int fd)

Acquires a reference to the internal eventfd context.

Parameters

int fd

[in] Eventfd file descriptor.

Description

Returns a pointer to the internal eventfd context, otherwise the error pointers returned by the following functions:

eventfd_fget()

struct eventfd_ctx *eventfd_ctx_fileget(struct file *file)

Acquires a reference to the internal eventfd context.

Parameters

struct file *file

[in] Eventfd file pointer.

Description

Returns a pointer to the internal eventfd context, otherwise the error pointer:

-EINVAL: The file is not an eventfd file.

eventpoll (epoll) interfaces

int ep_events_available(struct eventpoll *ep)

Checks if ready events might be available.

Parameters

struct eventpoll *ep

Pointer to the eventpoll context.

Return

a value different than zero if ready events are available, or zero otherwise.

bool busy_loop_ep_timeout(unsigned long start_time, struct eventpoll *ep)

check if busy poll has timed out. The timeout value from the epoll instance ep is preferred, but if it is not set, fall back to the system-wide global via busy_loop_timeout.

Parameters

unsigned long start_time

The start time used to compute the remaining time until timeout.

struct eventpoll *ep

Pointer to the eventpoll context.

Return

true if the timeout has expired, false otherwise.

int reverse_path_check(void)

The tfile_check_list is a list of epitem_head, which have links that are proposed to be newly added. We need to make sure that those added links don’t add too many paths such that we will spend all our time waking up eventpoll objects.

Parameters

void

no arguments

Return

zero if the proposed links don’t create too many paths, -1 otherwise.

int ep_poll(struct eventpoll *ep, struct epoll_event __user *events, int maxevents, struct timespec64 *timeout)

Retrieves ready events, and delivers them to the caller-supplied event buffer.

Parameters

struct eventpoll *ep

Pointer to the eventpoll context.

struct epoll_event __user *events

Pointer to the userspace buffer where the ready events should be stored.

int maxevents

Size (in terms of number of events) of the caller event buffer.

struct timespec64 *timeout

Maximum timeout for the ready events fetch operation, in timespec. If the timeout is zero, the function will not block, while if the timeout ptr is NULL, the function will block until at least one event has been retrieved (or an error occurred).

Return

the number of ready events which have been fetched, or an error code, in case of error.

int ep_loop_check_proc(struct eventpoll *ep, int depth)

verify that adding an epoll file ep inside another epoll file does not create closed loops, and determine the depth of the subtree starting at ep

Parameters

struct eventpoll *ep

the struct eventpoll to be currently checked.

int depth

Current depth of the path being checked.

Return

depth of the subtree, or INT_MAX if we found a loop or went too deep.

int ep_loop_check(struct eventpoll *ep, struct eventpoll *to)

Performs a check to verify that adding an epoll file (to) into another epoll file (represented by ep) does not create closed loops or too deep chains.

Parameters

struct eventpoll *ep

Pointer to the epoll we are inserting into.

struct eventpoll *to

Pointer to the epoll to be inserted.

Return

zero if adding the epoll to inside the epoll ep does not violate the constraints, or -1 otherwise.
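The kind of check described above can be sketched as a depth-limited walk over a containment graph. This is a toy user-space model, not the kernel algorithm: the adjacency matrix, the names (toy_graph, toy_walk, toy_loop_check) and both limits are assumptions, and only the chain below the inserted epoll is measured.

```c
#include <stdbool.h>

#define TOY_MAX_EPOLLS 8	/* size of the toy graph (assumption) */
#define TOY_MAX_NESTS  4	/* assumed nesting limit for this sketch */

/* contains[a][b] is true when epoll b has been added inside epoll a. */
struct toy_graph {
	bool contains[TOY_MAX_EPOLLS][TOY_MAX_EPOLLS];
};

/* Depth-first walk below `node`; fail on reaching `target` or nesting too deep. */
static int toy_walk(const struct toy_graph *g, int node, int target, int depth)
{
	int child;

	if (node == target || depth > TOY_MAX_NESTS)
		return -1;
	for (child = 0; child < TOY_MAX_EPOLLS; child++) {
		if (g->contains[node][child] &&
		    toy_walk(g, child, target, depth + 1) != 0)
			return -1;
	}
	return 0;
}

/* Zero if adding `to` inside `ep` keeps the graph loop-free and shallow. */
int toy_loop_check(const struct toy_graph *g, int ep, int to)
{
	/* A loop would exist exactly if `ep` is reachable from `to`. */
	return toy_walk(g, to, ep, 1);
}
```

The depth bound also guarantees that the walk terminates, even if the graph handed to it were already cyclic.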

The Filesystem for Exporting Kernel Objects

int sysfs_create_file_ns(struct kobject *kobj, const struct attribute *attr, const void *ns)

create an attribute file for an object with custom ns

Parameters

struct kobject *kobj

object we’re creating for

const struct attribute *attr

attribute descriptor

const void *ns

namespace the new file should belong to

int sysfs_add_file_to_group(struct kobject *kobj, const struct attribute *attr, const char *group)

add an attribute file to a pre-existing group.

Parameters

struct kobject *kobj

object we’re acting for.

const struct attribute *attr

attribute descriptor.

const char *group

group name.

int sysfs_chmod_file(struct kobject *kobj, const struct attribute *attr, umode_t mode)

update the modified mode value on an object attribute.

Parameters

struct kobject *kobj

object we’re acting for.

const struct attribute *attr

attribute descriptor.

umode_t mode

file permissions.

struct kernfs_node *sysfs_break_active_protection(struct kobject *kobj, const struct attribute *attr)

break “active” protection

Parameters

struct kobject *kobj

The kernel object attr is associated with.

const struct attribute *attr

The attribute to break the “active” protection for.

Description

With sysfs, just like kernfs, deletion of an attribute is postponed until all active .show() and .store() callbacks have finished unless this function is called. Hence this function is useful in methods that implement self deletion.

void sysfs_unbreak_active_protection(struct kernfs_node *kn)

restore “active” protection

Parameters

struct kernfs_node *kn

Pointer returned by sysfs_break_active_protection().

Description

Undo the effects of sysfs_break_active_protection(). This function calls kernfs_put() on the kernfs node that corresponds to the ‘attr’ argument passed to sysfs_break_active_protection(), and that attribute may have been removed between the sysfs_break_active_protection() and sysfs_unbreak_active_protection() calls. It is therefore not safe to access kn after this function has returned.

void sysfs_remove_file_ns(struct kobject *kobj, const struct attribute *attr, const void *ns)

remove an object attribute with a custom ns tag

Parameters

struct kobject *kobj

object we’re acting for

const struct attribute *attr

attribute descriptor

const void *ns

namespace tag of the file to remove

Description

Hash the attribute name and namespace tag and kill the victim.

bool sysfs_remove_file_self(struct kobject *kobj, const struct attribute *attr)

remove an object attribute from its own method

Parameters

struct kobject *kobj

object we’re acting for

const struct attribute *attr

attribute descriptor

Description

See kernfs_remove_self() for details.

void sysfs_remove_file_from_group(struct kobject *kobj, const struct attribute *attr, const char *group)

remove an attribute file from a group.

Parameters

struct kobject *kobj

object we’re acting for.

const struct attribute *attr

attribute descriptor.

const char *group

group name.

int sysfs_create_bin_file(struct kobject *kobj, const struct bin_attribute *attr)

create binary file for object.

Parameters

struct kobject *kobj

object.

const struct bin_attribute *attr

attribute descriptor.

void sysfs_remove_bin_file(struct kobject *kobj, const struct bin_attribute *attr)

remove binary file for object.

Parameters

struct kobject *kobj

object.

const struct bin_attribute *attr

attribute descriptor.

int sysfs_file_change_owner(struct kobject *kobj, const char *name, kuid_t kuid, kgid_t kgid)

change owner of a sysfs file.

Parameters

struct kobject *kobj

object.

const char *name

name of the file to change.

kuid_t kuid

new owner’s kuid

kgid_t kgid

new owner’s kgid

Description

This function looks up the sysfs entry name under kobj and changes the ownership to kuid/kgid.

Returns 0 on success or error code on failure.

int sysfs_change_owner(struct kobject *kobj, kuid_t kuid, kgid_t kgid)

change owner of the given object.

Parameters

struct kobject *kobj

object.

kuid_t kuid

new owner’s kuid

kgid_t kgid

new owner’s kgid

Description

Change the owner of the default directory, files, groups, and attributes of kobj to kuid/kgid. Note that sysfs_change_owner() mirrors how the sysfs entries for a kobject are added by driver core. In summary, sysfs_change_owner() takes care of the default directory entry for kobj and the default attributes associated with the ktype of kobj. Additional properties not added by driver core have to be changed by the driver or subsystem which created them. This is similar to how driver/subsystem specific entries are removed.

Returns 0 on success or error code on failure.

int sysfs_emit(char *buf, const char *fmt, ...)

scnprintf equivalent, aware of PAGE_SIZE buffer.

Parameters

char *buf

start of PAGE_SIZE buffer.

const char *fmt

format

...

optional arguments to fmt

Description

Returns number of characters written to buf.

int sysfs_emit_at(char *buf, int at, const char *fmt, ...)

scnprintf equivalent, aware of PAGE_SIZE buffer.

Parameters

char *buf

start of PAGE_SIZE buffer.

int at

offset in buf, in bytes, at which to start writing. at must be >= 0 && < PAGE_SIZE

const char *fmt

format

...

optional arguments to fmt

Description

Returns number of characters written starting at &buf[at].
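The “scnprintf equivalent” behaviour, where the return value is the number of characters actually stored rather than the length that would have been needed, can be modelled in user space as below. toy_emit_at and TOY_PAGE_SIZE are hypothetical; the fixed 4096-byte size is an assumption (the real PAGE_SIZE depends on the architecture), and returning 0 for an out-of-range offset is a choice made for this sketch.

```c
#include <stdarg.h>
#include <stdio.h>

#define TOY_PAGE_SIZE 4096	/* assumption: stands in for PAGE_SIZE */

/*
 * Format into a TOY_PAGE_SIZE buffer starting at offset `at`.
 * Returns the number of characters actually stored (excluding the NUL),
 * or 0 if `at` is out of range.
 */
int toy_emit_at(char *buf, int at, const char *fmt, ...)
{
	va_list args;
	int room, len;

	if (at < 0 || at >= TOY_PAGE_SIZE)
		return 0;
	room = TOY_PAGE_SIZE - at;
	va_start(args, fmt);
	len = vsnprintf(buf + at, (size_t)room, fmt, args);
	va_end(args);
	if (len < 0)
		return 0;
	/* vsnprintf reports the untruncated length; clamp to what fit. */
	return len >= room ? room - 1 : len;
}
```

Because the return value never exceeds the space that was available, a caller can accumulate output with the pattern len += toy_emit_at(buf, len, ...) without running past the end of the buffer.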

ssize_t sysfs_bin_attr_simple_read(struct file *file, struct kobject *kobj, const struct bin_attribute *attr, char *buf, loff_t off, size_t count)

read callback to simply copy from memory.

Parameters

struct file *file

attribute file which is being read.

struct kobject *kobj

object to which the attribute belongs.

const struct bin_attribute *attr

attribute descriptor.

char *buf

destination buffer.

loff_t off

offset in bytes from which to read.

size_t count

maximum number of bytes to read.

Description

Simple ->read() callback for bin_attributes backed by a buffer in memory. The private and size members in struct bin_attribute must be set to the buffer’s location and size before the bin_attribute is created in sysfs.

Bounds check for off and count is done in sysfs_kf_bin_read(). Negative value check for off is done in vfs_setpos() and default_llseek().

Returns number of bytes written to buf.
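A user-space model of such a read callback is a bounded memcpy. In the sketch below, toy_simple_read is hypothetical; priv and size stand in for the private and size members, and the clamp at the top stands in for the bounds check that the description says is done by the caller.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy up to `count` bytes from the backing buffer (`priv`, `size`)
 * starting at `off` into `buf`. Returns the number of bytes copied.
 */
size_t toy_simple_read(const void *priv, size_t size,
		       char *buf, size_t off, size_t count)
{
	if (off >= size)
		return 0;		/* reading at or past the end */
	if (count > size - off)
		count = size - off;	/* clamp to the remaining bytes */
	memcpy(buf, (const char *)priv + off, count);
	return count;
}
```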

int sysfs_create_link(struct kobject *kobj, struct kobject *target, const char *name)

create symlink between two objects.

Parameters

struct kobject *kobj

object whose directory we’re creating the link in.

struct kobject *target

object we’re pointing to.

const char *name

name of the symlink.

int sysfs_create_link_nowarn(struct kobject *kobj, struct kobject *target, const char *name)

create symlink between two objects.

Parameters

struct kobject *kobj

object whose directory we’re creating the link in.

struct kobject *target

object we’re pointing to.

const char *name

name of the symlink.

Description

This function does the same as sysfs_create_link(), but it doesn’t warn if the link already exists.

void sysfs_remove_link(struct kobject *kobj, const char *name)

remove symlink in object’s directory.

Parameters

struct kobject *kobj

object we’re acting for.

const char *name

name of the symlink to remove.

int sysfs_rename_link_ns(struct kobject *kobj, struct kobject *targ, const char *old, const char *new, const void *new_ns)

rename symlink in object’s directory.

Parameters

struct kobject *kobj

object we’re acting for.

struct kobject *targ

object we’re pointing to.

const char *old

previous name of the symlink.

const char *new

new name of the symlink.

const void *new_ns

new namespace of the symlink.

Description

A helper function for the common rename symlink idiom.

The debugfs filesystem

debugfs interface

struct dentry *debugfs_lookup(const char *name, struct dentry *parent)

look up an existing debugfs file

Parameters

const char *name

a pointer to a string containing the name of the file to look up.

struct dentry *parent

a pointer to the parent dentry of the file.

Description

This function will return a pointer to a dentry if it succeeds. If the file doesn’t exist or an error occurs, NULL will be returned. The returned dentry must be passed to dput() when it is no longer needed.

If debugfs is not enabled in the kernel, ERR_PTR(-ENODEV) will be returned.

struct dentry *debugfs_create_file_unsafe(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops)

create a file in the debugfs filesystem

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have.

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

void *data

a pointer to something that the caller will want to get to later on. The inode.i_private pointer will point to this value on the open() call.

const struct file_operations *fops

a pointer to a struct file_operations that should be used for this file.

Description

debugfs_create_file_unsafe() is completely analogous to debugfs_create_file(), the only difference being that the fops handed to it will not get protected against file removals by the debugfs core.

It is your responsibility to protect your struct file_operations methods against file removals by means of debugfs_file_get() and debugfs_file_put(). ->open() is still protected by debugfs though.

Any struct file_operations defined by means of DEFINE_DEBUGFS_ATTRIBUTE() is protected against file removals and thus, may be used here.

void debugfs_create_file_size(const char *name, umode_t mode, struct dentry *parent, void *data, const struct file_operations *fops, loff_t file_size)

create a file in the debugfs filesystem

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have.

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

void *data

a pointer to something that the caller will want to get to later on. The inode.i_private pointer will point to this value on the open() call.

const struct file_operations *fops

a pointer to a struct file_operations that should be used for this file.

loff_t file_size

initial file size

Description

This is the basic “create a file” function for debugfs. It allows for a wide range of flexibility in creating a file, or a directory (if you want to create a directory, using the debugfs_create_dir() function is recommended instead).

struct dentry *debugfs_create_dir(const char *name, struct dentry *parent)

create a directory in the debugfs filesystem

Parameters

const char *name

a pointer to a string containing the name of the directory to create.

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the directory will be created in the root of the debugfs filesystem.

Description

This function creates a directory in debugfs with the given name.

This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove() function when the directory is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here.) If an error occurs, ERR_PTR(-ERROR) will be returned.

If debugfs is not enabled in the kernel, ERR_PTR(-ENODEV) will be returned.

NOTE

It’s expected that most callers should _ignore_ the errors returned by this function. Other debugfs functions handle the fact that the “dentry” passed to them could be an error and they don’t crash in that case. Drivers should generally work fine even if debugfs fails to init anyway.

struct dentry *debugfs_create_automount(const char *name, struct dentry *parent, debugfs_automount_t f, void *data)

create automount point in the debugfs filesystem

Parameters

const char *name

a pointer to a string containing the name of the file to create.

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

debugfs_automount_t f

function to be called when pathname resolution steps on that one.

void *data

opaque argument to pass to f().

Description

f should return what ->d_automount() would.

struct dentry *debugfs_create_symlink(const char *name, struct dentry *parent, const char *target)

create a symbolic link in the debugfs filesystem

Parameters

const char *name

a pointer to a string containing the name of the symbolic link to create.

struct dentry *parent

a pointer to the parent dentry for this symbolic link. This should be a directory dentry if set. If this parameter is NULL, then the symbolic link will be created in the root of the debugfs filesystem.

const char *target

a pointer to a string containing the path to the target of the symbolic link.

Description

This function creates a symbolic link with the given name in debugfs that links to the given target path.

This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove() function when the symbolic link is to be removed (no automatic cleanup happens if your module is unloaded, you are responsible here.) If an error occurs, ERR_PTR(-ERROR) will be returned.

If debugfs is not enabled in the kernel, ERR_PTR(-ENODEV) will be returned.

void debugfs_remove(struct dentry *dentry)

recursively removes a directory

Parameters

struct dentry *dentry

a pointer to the dentry of the directory to be removed. If this parameter is NULL or an error value, nothing will be done.

Description

This function recursively removes a directory tree in debugfs that was previously created with a call to another debugfs function (like debugfs_create_file() or variants thereof.)

This function is required to be called in order for the file to be removed. No automatic cleanup of files will happen when a module is removed; you are responsible here.

void debugfs_lookup_and_remove(const char *name, struct dentry *parent)

lookup a directory or file and recursively remove it

Parameters

const char *name

a pointer to a string containing the name of the item to look up.

struct dentry *parent

a pointer to the parent dentry of the item.

Description

This is the equivalent of doing something like debugfs_remove(debugfs_lookup(..)) but with the proper reference counting handled for the directory being looked up.

int debugfs_change_name(struct dentry *dentry, const char *fmt, ...)

rename a file/directory in the debugfs filesystem

Parameters

struct dentry *dentry

dentry of an object to be renamed.

const char *fmt

format for new name

...

variable arguments

Description

This function renames a file/directory in debugfs. The target must not exist for rename to succeed.

This function will return 0 on success and -E... on failure.

If debugfs is not enabled in the kernel, the value -ENODEV will be returned.

bool debugfs_initialized(void)

Tells whether debugfs has been registered

Parameters

void

no arguments

int debugfs_file_get(struct dentry *dentry)

mark the beginning of file data access

Parameters

struct dentry *dentry

the dentry object whose data is being accessed.

Description

Up to a matching call to debugfs_file_put(), any successive call into the file removing functions debugfs_remove() and debugfs_remove_recursive() will block. Since associated private file data may only get freed after a successful return of any of the removal functions, you may safely access it after a successful call to debugfs_file_get() without worrying about lifetime issues.

If -EIO is returned, the file has already been removed and thus, it is not safe to access any of its data. If, on the other hand, it is allowed to access the file data, zero is returned.

void debugfs_file_put(struct dentry *dentry)

mark the end of file data access

Parameters

struct dentry *dentry

the dentry object formerly passed to debugfs_file_get().

Description

Allow any ongoing concurrent call into debugfs_remove() or debugfs_remove_recursive() blocked by a former call to debugfs_file_get() to proceed and return to its caller.
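The get/put pairing described above might look like the following inside a file handler. This is a minimal sketch that compiles in user space: struct dentry and the two debugfs_file_*() functions below are simplified stand-ins written for illustration (they are not the kernel implementation), and show_value() is a hypothetical handler. Kernel code would take the real declarations from <linux/debugfs.h>.

```c
#include <errno.h>
#include <stdio.h>

/* Stand-in for the kernel's struct dentry; models only what the sketch needs. */
struct dentry {
	int removed;       /* nonzero once the file has been removed */
	int active_users;  /* get() calls without a matching put() */
	int *private_data; /* the data the file exposes */
};

/* Stand-in: refuses access once the file is gone, as the text describes. */
static int debugfs_file_get(struct dentry *dentry)
{
	if (dentry->removed)
		return -EIO;
	dentry->active_users++;
	return 0;
}

/* Stand-in: ends the access window opened by debugfs_file_get(). */
static void debugfs_file_put(struct dentry *dentry)
{
	dentry->active_users--;
}

/* Hypothetical handler: formats the file's value into buf. */
static int show_value(struct dentry *dentry, char *buf, size_t len)
{
	int ret = debugfs_file_get(dentry);

	if (ret)
		return ret; /* -EIO: private_data must not be touched */

	ret = snprintf(buf, len, "%d\n", *dentry->private_data);
	debugfs_file_put(dentry); /* always pair with the successful get */
	return ret;
}
```

The point of the pattern is that private data is only dereferenced between a successful debugfs_file_get() and the matching debugfs_file_put(); on -EIO the handler returns without looking at the data.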

void debugfs_enter_cancellation(struct file *file, struct debugfs_cancellation *cancellation)

enter a debugfs cancellation

Parameters

struct file *file

the file being accessed

struct debugfs_cancellation *cancellation

the cancellation object; the cancel callback inside of it must be initialized

Description

When a debugfs file is removed it needs to wait for all active operations to complete. However, the operation itself may need to wait for hardware or completion of some asynchronous process or similar. As such, it may need to be cancelled to avoid long waits or even deadlocks.

This function can be used inside a debugfs handler that may need to be cancelled. As soon as this function is called, the cancellation’s ‘cancel’ callback may be called, at which point the caller should proceed to call debugfs_leave_cancellation() and leave the debugfs handler function as soon as possible. Note that the ‘cancel’ callback is only ever called in the context of some kind of debugfs_remove().

This function must be paired with debugfs_leave_cancellation().
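The enter/leave pairing and the role of the ‘cancel’ callback can be sketched as a user-space toy model. Everything here is an assumption made for illustration: the struct definitions are stand-ins (the stand-in cancellation carries only a cancel callback and a cancel_data pointer), the enter/leave functions merely record the active cancellation, and fake_remove(), stop_waiting() and wait_for_hw() are hypothetical names. The kernel's real types live in <linux/debugfs.h>.

```c
#include <stddef.h>

struct dentry; /* opaque here; only passed through to the callback */

/* Stand-in for the cancellation object: callback plus its argument. */
struct debugfs_cancellation {
	void (*cancel)(struct dentry *dentry, void *data);
	void *cancel_data;
};

/* Stand-in for struct file: remembers one active cancellation. */
struct file {
	struct dentry *dentry;
	struct debugfs_cancellation *active;
};

static void debugfs_enter_cancellation(struct file *file,
				       struct debugfs_cancellation *c)
{
	file->active = c;
}

static void debugfs_leave_cancellation(struct file *file,
				       struct debugfs_cancellation *c)
{
	if (file->active == c)
		file->active = NULL;
}

/* Hypothetical: models removal invoking the callback of a running handler. */
static void fake_remove(struct file *file)
{
	if (file->active)
		file->active->cancel(file->dentry, file->active->cancel_data);
}

/* Cancel callback: tells the waiting handler to give up. */
static void stop_waiting(struct dentry *dentry, void *data)
{
	(void)dentry;
	*(int *)data = 1;
}

/* Hypothetical handler: polls until cancelled; returns the poll count. */
static int wait_for_hw(struct file *file, int polls_until_remove)
{
	int cancelled = 0;
	struct debugfs_cancellation c = {
		.cancel = stop_waiting,
		.cancel_data = &cancelled,
	};
	int polls = 0;

	debugfs_enter_cancellation(file, &c);
	while (!cancelled) {
		if (++polls == polls_until_remove)
			fake_remove(file); /* simulates a concurrent removal */
	}
	debugfs_leave_cancellation(file, &c);
	return polls;
}
```

In this model the handler leaves its wait loop as soon as the callback fires and then calls the leave function before returning, which is the ordering the description asks for.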

void debugfs_leave_cancellation(struct file *file, struct debugfs_cancellation *cancellation)

leave cancellation section

Parameters

struct file *file

the file being accessed

struct debugfs_cancellation *cancellation

the cancellation previously registered with debugfs_enter_cancellation()

Description

See the documentation of debugfs_enter_cancellation().

void debugfs_create_u8(const char *name, umode_t mode, struct dentry *parent, u8 *value)

create a debugfs file that is used to read and write an unsigned 8-bit value

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

u8 *value

a pointer to the variable that the file should read from and write to.

Description

This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from and written to.

void debugfs_create_u16(const char *name, umode_t mode, struct dentry *parent, u16 *value)

create a debugfs file that is used to read and write an unsigned 16-bit value

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

u16 *value

a pointer to the variable that the file should read from and write to.

Description

This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from and written to.

void debugfs_create_u32(const char *name, umode_t mode, struct dentry *parent, u32 *value)

create a debugfs file that is used to read and write an unsigned 32-bit value

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

u32 *value

a pointer to the variable that the file should read from and write to.

Description

This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from and written to.

void debugfs_create_u64(const char *name, umode_t mode, struct dentry *parent, u64 *value)

create a debugfs file that is used to read and write an unsigned 64-bit value

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

u64 *value

a pointer to the variable that the file should read from and write to.

Description

This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from and written to.

void debugfs_create_ulong(const char *name, umode_t mode, struct dentry *parent, unsigned long *value)

create a debugfs file that is used to read and write an unsigned long value.

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

unsigned long *value

a pointer to the variable that the file should read from and write to.

Description

This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from and written to.

void debugfs_create_x8(const char *name, umode_t mode, struct dentry *parent, u8 *value)

create a debugfs file that is used to read and write an unsigned 8-bit value

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

u8 *value

a pointer to the variable that the file should read from and write to.

void debugfs_create_x16(const char *name, umode_t mode, struct dentry *parent, u16 *value)

create a debugfs file that is used to read and write an unsigned 16-bit value

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

u16 *value

a pointer to the variable that the file should read from and write to.

void debugfs_create_x32(const char *name, umode_t mode, struct dentry *parent, u32 *value)

create a debugfs file that is used to read and write an unsigned 32-bit value

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

u32 *value

a pointer to the variable that the file should read from and write to.

void debugfs_create_x64(const char *name, umode_t mode, struct dentry *parent, u64 *value)

create a debugfs file that is used to read and write an unsigned 64-bit value

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

u64 *value

a pointer to the variable that the file should read from and write to.
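The x8, x16, x32 and x64 helpers take the same parameters as their u8 to u64 counterparts. The difference is assumed to lie only in the text format of the file: decimal for the u* files, zero-padded hexadecimal with a 0x prefix for the x* files. The user-space sketch below illustrates that assumption. The format strings are taken from a reading of fs/debugfs/file.c and should be checked against the kernel version in use; the format_*() function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: text a debugfs_create_u8() file is assumed to show. */
static int format_u8(char *buf, size_t len, unsigned char value)
{
	return snprintf(buf, len, "%llu\n", (unsigned long long)value);
}

/* Hypothetical helper: text a debugfs_create_x8() file is assumed to show. */
static int format_x8(char *buf, size_t len, unsigned char value)
{
	return snprintf(buf, len, "0x%02llx\n", (unsigned long long)value);
}

/* Hypothetical helper: the x32 variant is assumed to pad to eight digits. */
static int format_x32(char *buf, size_t len, unsigned int value)
{
	return snprintf(buf, len, "0x%08llx\n", (unsigned long long)value);
}
```

Under these assumptions the same stored value 10 would read back as a two-digit decimal from a u8 file and as 0x0a from an x8 file, which makes the x* variants the natural choice for register-like values.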

void debugfs_create_size_t(const char *name, umode_t mode, struct dentry *parent, size_t *value)

create a debugfs file that is used to read and write a size_t value

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

size_t *value

a pointer to the variable that the file should read from and write to.

void debugfs_create_atomic_t(const char *name, umode_t mode, struct dentry *parent, atomic_t *value)

create a debugfs file that is used to read and write an atomic_t value

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

atomic_t *value

a pointer to the variable that the file should read from and write to.

void debugfs_create_bool(const char *name, umode_t mode, struct dentry *parent, bool *value)

create a debugfs file that is used to read and write a boolean value

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

bool *value

a pointer to the variable that the file should read from and write to.

Description

This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from and written to.

void debugfs_create_str(const char *name, umode_t mode, struct dentry *parent, char **value)

create a debugfs file that is used to read and write a string value

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

char **value

a pointer to the variable that the file should read from and write to.

Description

This function creates a file in debugfs with the given name that contains the value of the variable value. If the mode variable is so set, it can be read from and written to.

struct dentry *debugfs_create_blob(const char *name, umode_t mode, struct dentry *parent, struct debugfs_blob_wrapper *blob)

create a debugfs file that is used to read and write a binary blob

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

struct debugfs_blob_wrapper *blob

a pointer to a struct debugfs_blob_wrapper which contains a pointer to the blob data and the size of the data.

Description

This function creates a file in debugfs with the given name that exports blob->data as a binary blob. If the mode variable is so set, it can be read from and written to.

This function will return a pointer to a dentry if it succeeds. This pointer must be passed to the debugfs_remove() function when the file is to be removed (no automatic cleanup happens if your module is unloaded; you are responsible here). If an error occurs, ERR_PTR(-ERROR) will be returned.

If debugfs is not enabled in the kernel, the value ERR_PTR(-ENODEV) will be returned.
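ERR_PTR(-ERROR) encodes a negative errno value in the returned pointer itself. The user-space sketch below models that convention so the return-value check can be tried outside the kernel. The three helpers are simplified re-implementations assumed to behave like those in <linux/err.h>; create_when_disabled() and dentry_status() are hypothetical names introduced only for this illustration.

```c
#include <errno.h>
#include <stdint.h>

#define MAX_ERRNO 4095 /* largest errno value encoded in a pointer */

/* Simplified user-space models of the kernel's error-pointer helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* error pointers occupy the top MAX_ERRNO addresses */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical: what a create call returns when debugfs is disabled. */
static void *create_when_disabled(void)
{
	return ERR_PTR(-ENODEV);
}

/* Hypothetical helper: 0 for a usable dentry, negative errno otherwise. */
static long dentry_status(const void *dentry)
{
	return IS_ERR(dentry) ? PTR_ERR(dentry) : 0;
}
```

Note that a NULL pointer is not an error pointer in this scheme, so a caller that needs to distinguish failure has to use IS_ERR() rather than a NULL check.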

void debugfs_create_u32_array(const char *name, umode_t mode, struct dentry *parent, struct debugfs_u32_array *array)

create a debugfs file that is used to read a u32 array.

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have.

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

struct debugfs_u32_array *array

wrapper struct containing data pointer and size of the array.

Description

This function creates a file in debugfs with the given name that exports array as data. If the mode variable is so set, it can be read from. Writing is not supported. Seeking within the file is also not supported. Once the array is created, its size cannot be changed.
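To make the read-only array file more concrete, the user-space sketch below renders a wrapper struct the way the file contents are assumed to look: decimal values separated by single spaces and terminated by a newline. Both the stand-in struct layout (a data pointer plus an element count) and the output format are assumptions based on a reading of the kernel sources; format_u32_array() is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the wrapper struct: data pointer and number of elements. */
struct debugfs_u32_array {
	uint32_t *array;
	uint32_t n_elements;
};

/* Hypothetical helper: render the array as the file is assumed to read. */
static size_t format_u32_array(char *buf, size_t len,
			       const struct debugfs_u32_array *a)
{
	size_t used = 0;
	uint32_t i;

	if (len)
		buf[0] = '\0'; /* an empty array yields an empty string */

	for (i = 0; i < a->n_elements && used < len; i++) {
		char term = (i + 1 == a->n_elements) ? '\n' : ' ';
		int n = snprintf(buf + used, len - used, "%u%c",
				 (unsigned int)a->array[i], term);

		if (n < 0 || (size_t)n >= len - used) {
			buf[used] = '\0'; /* drop the partial element */
			break;
		}
		used += (size_t)n;
	}
	return used;
}
```

Because the whole array is rendered as one line of text, a fixed size makes sense: a reader gets a consistent snapshot length for the lifetime of the file.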

void debugfs_print_regs32(struct seq_file *s, const struct debugfs_reg32 *regs, int nregs, void __iomem *base, char *prefix)

use seq_printf() to describe a set of registers

Parameters

struct seq_file *s

the seq_file structure being used to generate output

const struct debugfs_reg32 *regs

an array of struct debugfs_reg32 structures

int nregs

the length of the above array

void __iomem *base

the base address to be used in reading the registers

char *prefix

a string to be prefixed to every output line

Description

This function outputs a text block describing the current values of some 32-bit hardware registers. It is meant to be used within debugfs files based on seq_file that need to show registers, intermixed with other information. The prefix argument may be used to specify a leading string, because some peripherals have several blocks of identical registers, for example the configuration of DMA channels.
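The register dump described above can be approximated in user space as follows. This is a sketch under stated assumptions: the stand-in struct debugfs_reg32 holds a name and a byte offset, registers are read from a plain array instead of memory-mapped I/O, output goes to a character buffer instead of a seq_file, and the line format (prefix, name, " = ", zero-padded hex value) is assumed from a reading of fs/debugfs/file.c. print_regs32() is a hypothetical name.

```c
#include <stdint.h>
#include <stdio.h>

/* Stand-in for struct debugfs_reg32: register name and byte offset. */
struct debugfs_reg32 {
	const char *name;
	unsigned long offset;
};

/* Hypothetical user-space analogue: reads a plain array, not MMIO. */
static int print_regs32(char *out, size_t len,
			const struct debugfs_reg32 *regs, int nregs,
			const uint32_t *base, const char *prefix)
{
	int used = 0;
	int i;

	for (i = 0; i < nregs; i++, regs++) {
		uint32_t val = base[regs->offset / sizeof(uint32_t)];
		int n = snprintf(out + used, len - (size_t)used,
				 "%s%s = 0x%08x\n",
				 prefix ? prefix : "", regs->name,
				 (unsigned int)val);

		if (n < 0 || (size_t)n >= len - (size_t)used)
			break; /* buffer full; stop like an overflowed seq_file */
		used += n;
	}
	return used;
}
```

Calling the function once per register block with a different prefix (for example one prefix per DMA channel) keeps otherwise identical register names distinguishable in the output.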

void debugfs_create_regset32(const char *name, umode_t mode, struct dentry *parent, struct debugfs_regset32 *regset)

create a debugfs file that returns register values

Parameters

const char *name

a pointer to a string containing the name of the file to create.

umode_t mode

the permission that the file should have

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

struct debugfs_regset32 *regset

a pointer to a struct debugfs_regset32, which contains a pointer to an array of register definitions, the array size and the base address where the register bank is to be found.

Description

This function creates a file in debugfs with the given name that reports the names and values of a set of 32-bit registers. If the mode variable is so set, it can be read from. Writing is not supported.

void debugfs_create_devm_seqfile(struct device *dev, const char *name, struct dentry *parent, int (*read_fn)(struct seq_file *s, void *data))

create a debugfs file that is bound to device.

Parameters

struct device *dev

device related to this debugfs file.

const char *name

name of the debugfs file.

struct dentry *parent

a pointer to the parent dentry for this file. This should be a directory dentry if set. If this parameter is NULL, then the file will be created in the root of the debugfs filesystem.

int (*read_fn)(struct seq_file *s, void *data)

function pointer called to print the seq_file content.