Idmappings

Most filesystem developers will have encountered idmappings. They are used when reading from or writing ownership to disk, reporting ownership to userspace, or for permission checking. This document is aimed at filesystem developers that want to know how idmappings work.

Formal notes

An idmapping is essentially a translation of a range of ids into another or the same range of ids. The notational convention for idmappings that is widely used in userspace is:

u:k:r

u indicates the first element in the upper idmapset U and k indicates the first element in the lower idmapset K. The r parameter indicates the range of the idmapping, i.e. how many ids are mapped. From now on, we will always prefix ids with u or k to make it clear whether we’re talking about an id in the upper or lower idmapset.

To see what this looks like in practice, let’s take the following idmapping:

u22:k10000:r3

and write down the mappings it will generate:

u22 -> k10000
u23 -> k10001
u24 -> k10002

From a mathematical viewpoint U and K are well-ordered sets and an idmapping is an order isomorphism from U into K. So U and K are order isomorphic. In fact, U and K are always well-ordered subsets of the set of all possible ids usable on a given system.

Looking at this mathematically briefly will help us highlight some properties that make it easier to understand how we can translate between idmappings. For example, we know that the inverse idmapping is an order isomorphism as well:

k10000 -> u22
k10001 -> u23
k10002 -> u24
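
The two listings above can be reproduced with a few lines of Python. This is only an illustrative sketch; the helper name expand is hypothetical and does not correspond to any kernel or userspace interface.

```python
def expand(u: int, k: int, r: int) -> dict:
    """Return the upper -> lower pairs generated by the idmapping u:k:r."""
    return {u + i: k + i for i in range(r)}

# The idmapping u22:k10000:r3 from the text.
down = expand(22, 10000, 3)

# Swapping keys and values yields the inverse (lower -> upper) idmapping.
up = {kid: uid for uid, kid in down.items()}

for uid, kid in down.items():
    print(f"u{uid} -> k{kid}")
for kid, uid in up.items():
    print(f"k{kid} -> u{uid}")
```

Because both directions preserve the ordering of ids, the inverse is again an order isomorphism.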

Given that we are dealing with order isomorphisms plus the fact that we’re dealing with subsets we can embed idmappings into each other, i.e. we can sensibly translate between different idmappings. For example, assume we’ve been given the three idmappings:

1. u0:k10000:r10000
2. u0:k20000:r10000
3. u0:k30000:r10000

and id k11000 which has been generated by the first idmapping by mapping u1000 from the upper idmapset down to k11000 in the lower idmapset.

Because we’re dealing with order isomorphic subsets it is meaningful to ask what id k11000 corresponds to in the second or third idmapping. The straightforward algorithm to use is to apply the inverse of the first idmapping, mapping k11000 up to u1000. Afterwards, we can map u1000 down using either the second or the third idmapping. The second idmapping would map u1000 down to k21000. The third idmapping would map u1000 down to k31000.

If we were given the same task for the following three idmappings:

1. u0:k10000:r10000
2. u0:k20000:r200
3. u0:k30000:r300

we would fail to translate as the sets aren’t order isomorphic over the full range of the first idmapping anymore (however, they are order isomorphic over the full range of the second idmapping). Neither the second nor the third idmapping contains u1000 in the upper idmapset U. This is equivalent to not having an id mapped. We can simply say that u1000 is unmapped in the second and third idmapping. The kernel will report unmapped ids as the overflowuid (uid_t)-1 or overflowgid (gid_t)-1 to userspace.

The algorithm to calculate what a given id maps to is pretty simple. First, we need to verify that the range can contain our target id. We will skip this step for simplicity. After that if we want to know what id maps to we can do simple calculations:

  • If we want to map from left to right:

    u:k:r
    id - u + k = n
  • If we want to map from right to left:

    u:k:r
    id - k + u = n

Instead of “left to right” we can also say “down” and instead of “right to left” we can also say “up”. Obviously mapping down and up invert each other.

To see whether the simple formulas above work, consider the following two idmappings:

1. u0:k20000:r10000
2. u500:k30000:r10000

Assume we are given k21000 in the lower idmapset of the first idmapping. We want to know what id this was mapped from in the upper idmapset of the first idmapping. So we’re mapping up in the first idmapping:

id     - k      + u  = n
k21000 - k20000 + u0 = u1000

Now assume we are given the id u1100 in the upper idmapset of the second idmapping and we want to know what this id maps down to in the lower idmapset of the second idmapping. This means we’re mapping down in the second idmapping:

id    - u    + k      = n
u1100 - u500 + k30000 = k30600
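
The two formulas, together with the range check that the text skips for simplicity, can be sketched in Python as follows. The names Idmapping, map_down and map_up are hypothetical and chosen for this illustration only; returning None for an unmapped id is likewise an assumption of the sketch.

```python
from typing import NamedTuple, Optional

class Idmapping(NamedTuple):
    u: int  # first id in the upper idmapset
    k: int  # first id in the lower idmapset
    r: int  # number of ids that are mapped

def map_down(m: Idmapping, uid: int) -> Optional[int]:
    """Left to right: id - u + k. Returns None if uid is unmapped."""
    if m.u <= uid < m.u + m.r:
        return uid - m.u + m.k
    return None

def map_up(m: Idmapping, kid: int) -> Optional[int]:
    """Right to left: id - k + u. Returns None if kid is unmapped."""
    if m.k <= kid < m.k + m.r:
        return kid - m.k + m.u
    return None

first = Idmapping(0, 20000, 10000)
second = Idmapping(500, 30000, 10000)

print(map_up(first, 21000))    # 1000, i.e. k21000 -> u1000
print(map_down(second, 1100))  # 30600, i.e. u1100 -> k30600

# u1000 is outside the range of u0:k20000:r200, so it is unmapped.
print(map_down(Idmapping(0, 20000, 200), 1000))  # None
```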

General notes

In the context of the kernel an idmapping can be interpreted as mapping a range of userspace ids into a range of kernel ids:

userspace-id:kernel-id:range

A userspace id is always an element in the upper idmapset of an idmapping of type uid_t or gid_t and a kernel id is always an element in the lower idmapset of an idmapping of type kuid_t or kgid_t. From now on “userspace id” will be used to refer to the well known uid_t and gid_t types and “kernel id” will be used to refer to kuid_t and kgid_t.

The kernel is mostly concerned with kernel ids. They are used when performing permission checks and are stored in an inode’s i_uid and i_gid field. A userspace id on the other hand is an id that is reported to userspace by the kernel, or is passed by userspace to the kernel, or a raw device id that is written or read from disk.

Note that we are only concerned with idmappings as the kernel stores them, not how userspace would specify them.

For the rest of this document we will prefix all userspace ids with u and all kernel ids with k. Ranges of idmappings will be prefixed with r. So an idmapping will be written as u0:k10000:r10000.

For example, within this idmapping, the id u1000 is an id in the upper idmapset or “userspace idmapset” starting with u0. And it is mapped to k11000 which is a kernel id in the lower idmapset or “kernel idmapset” starting with k10000.

A kernel id is always created by an idmapping. Such idmappings are associated with user namespaces. Since we mainly care about how idmappings work we’re not going to be concerned with how idmappings are created nor how they are used outside of the filesystem context. This is best left to an explanation of user namespaces.

The initial user namespace is special. It always has an idmapping of the following form:

u0:k0:r4294967295

which is an identity idmapping over the full range of ids available on this system.

Other user namespaces usually have non-identity idmappings such as:

u0:k10000:r10000

When a process creates or wants to change ownership of a file, or when the ownership of a file is read from disk by a filesystem, the userspace id is immediately translated into a kernel id according to the idmapping associated with the relevant user namespace.

For instance, consider a file that is stored on disk by a filesystem as being owned by u1000:

  • If a filesystem were to be mounted in the initial user namespace (as most filesystems are) then the initial idmapping will be used. As we saw this is simply the identity idmapping. This would mean id u1000 read from disk would be mapped to id k1000. So an inode’s i_uid and i_gid field would contain k1000.

  • If a filesystem were to be mounted with an idmapping of u0:k10000:r10000 then u1000 read from disk would be mapped to k11000. So an inode’s i_uid and i_gid would contain k11000.

Translation algorithms

We’ve already seen briefly that it is possible to translate between different idmappings. We’ll now take a closer look at how that works.

Crossmapping

This translation algorithm is used by the kernel in quite a few places. For example, it is used when reporting back the ownership of a file to userspace via the stat() system call family.

If we’ve been given k11000 from one idmapping we can map that id up in another idmapping. In order for this to work both idmappings need to contain the same kernel id in their kernel idmapsets. For example, consider the following idmappings:

1. u0:k10000:r10000
2. u20000:k10000:r10000

and we are mapping u1000 down to k11000 in the first idmapping. We can then translate k11000 into a userspace id in the second idmapping using the kernel idmapset of the second idmapping:

/* Map the kernel id up into a userspace id in the second idmapping. */
from_kuid(u20000:k10000:r10000, k11000) = u21000

Note how we can get back to the kernel id in the first idmapping by inverting the algorithm:

/* Map the userspace id down into a kernel id in the second idmapping. */
make_kuid(u20000:k10000:r10000, u21000) = k11000

/* Map the kernel id up into a userspace id in the first idmapping. */
from_kuid(u0:k10000:r10000, k11000) = u1000

This algorithm allows us to answer the question of what userspace id a given kernel id corresponds to in a given idmapping. In order to be able to answer this question both idmappings need to contain the same kernel id in their respective kernel idmapsets.

For example, when the kernel reads a raw userspace id from disk it maps it down into a kernel id according to the idmapping associated with the filesystem. Let’s assume the filesystem was mounted with an idmapping of u0:k20000:r10000 and it reads a file owned by u1000 from disk. This means u1000 will be mapped to k21000 which is what will be stored in the inode’s i_uid and i_gid field.

When someone in userspace calls stat() or a related function to get ownership information about the file the kernel can’t simply map the id back up according to the filesystem’s idmapping as this would give the wrong owner if the caller is using an idmapping.

So the kernel will map the id back up in the idmapping of the caller. Let’s assume the caller has the somewhat unconventional idmapping u3000:k20000:r10000; then k21000 would map back up to u4000. Consequently the user would see that this file is owned by u4000.
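
The stat() scenario just described can be modelled with a small Python sketch. The functions map_down and map_up are hypothetical stand-ins for the arithmetic performed by make_kuid() and from_kuid(); they are not the kernel functions themselves, and the tuple representation of an idmapping is an assumption made for brevity.

```python
from typing import Optional, Tuple

Idmapping = Tuple[int, int, int]  # (u, k, r), hypothetical representation

def map_down(m: Idmapping, uid: int) -> Optional[int]:
    """Model of make_kuid(): userspace id -> kernel id, None if unmapped."""
    u, k, r = m
    return uid - u + k if u <= uid < u + r else None

def map_up(m: Idmapping, kid: int) -> Optional[int]:
    """Model of from_kuid(): kernel id -> userspace id, None if unmapped."""
    u, k, r = m
    return kid - k + u if k <= kid < k + r else None

def crossmap(fs: Idmapping, caller: Idmapping, disk_uid: int) -> Optional[int]:
    """Map down in the filesystem's idmapping, then up in the caller's."""
    kid = map_down(fs, disk_uid)
    return None if kid is None else map_up(caller, kid)

filesystem = (0, 20000, 10000)   # u0:k20000:r10000
caller = (3000, 20000, 10000)    # u3000:k20000:r10000

print(crossmap(filesystem, caller, 1000))  # 4000, the owner the caller sees
```

Both idmappings share the kernel id k21000, which is the precondition for the crossmapping to succeed.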

Remapping

It is possible to translate a kernel id from one idmapping to another one via the userspace idmapset of the two idmappings. This is equivalent to remapping a kernel id.

Let’s look at an example. We are given the following two idmappings:

1. u0:k10000:r10000
2. u0:k20000:r10000

and we are given k11000 in the first idmapping. In order to translate this kernel id in the first idmapping into a kernel id in the second idmapping we need to perform two steps:

  1. Map the kernel id up into a userspace id in the first idmapping:

    /* Map the kernel id up into a userspace id in the first idmapping. */
    from_kuid(u0:k10000:r10000, k11000) = u1000
  2. Map the userspace id down into a kernel id in the second idmapping:

    /* Map the userspace id down into a kernel id in the second idmapping. */
    make_kuid(u0:k20000:r10000, u1000) = k21000

As you can see we used the userspace idmapset in both idmappings to translate the kernel id in one idmapping to a kernel id in another idmapping.

This allows us to answer the question of what kernel id we would need to use to get the same userspace id in another idmapping. In order to be able to answer this question both idmappings need to contain the same userspace id in their respective userspace idmapsets.

Note how we can easily get back to the kernel id in the first idmapping by inverting the algorithm:

  1. Map the kernel id up into a userspace id in the second idmapping:

    /* Map the kernel id up into a userspace id in the second idmapping. */
    from_kuid(u0:k20000:r10000, k21000) = u1000
  2. Map the userspace id down into a kernel id in the first idmapping:

    /* Map the userspace id down into a kernel id in the first idmapping. */
    make_kuid(u0:k10000:r10000, u1000) = k11000

Another way to look at this translation is to treat it as inverting one idmapping and applying another idmapping if both idmappings have the relevant userspace id mapped. This will come in handy when working with idmapped mounts.
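
Read as “invert one idmapping, then apply the other”, the remapping algorithm fits into a single function. The following Python sketch is illustrative only; remap, map_down and map_up are hypothetical names, and None stands for an unmapped id.

```python
from typing import Optional, Tuple

Idmapping = Tuple[int, int, int]  # (u, k, r), hypothetical representation

def map_down(m: Idmapping, uid: int) -> Optional[int]:
    u, k, r = m
    return uid - u + k if u <= uid < u + r else None

def map_up(m: Idmapping, kid: int) -> Optional[int]:
    u, k, r = m
    return kid - k + u if k <= kid < k + r else None

def remap(src: Idmapping, dst: Idmapping, kid: int) -> Optional[int]:
    """Translate a kernel id of src into a kernel id of dst via the
    userspace idmapset: map up in src, then map down in dst."""
    uid = map_up(src, kid)
    return None if uid is None else map_down(dst, uid)

first = (0, 10000, 10000)    # u0:k10000:r10000
second = (0, 20000, 10000)   # u0:k20000:r10000

print(remap(first, second, 11000))  # 21000
print(remap(second, first, 21000))  # 11000, the inverse direction
```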

Invalid translations

It is never valid to use an id in the kernel idmapset of one idmapping as the id in the userspace idmapset of another or the same idmapping. While the kernel idmapset always indicates an idmapset in the kernel id space the userspace idmapset indicates a userspace id. So the following translations are forbidden:

/* Map the userspace id down into a kernel id in the first idmapping. */
make_kuid(u0:k10000:r10000, u1000) = k11000

/* INVALID: Map the kernel id down into a kernel id in the second idmapping. */
make_kuid(u10000:k20000:r10000, k11000) = k21000
                                ~~~~~~

and equally wrong:

/* Map the kernel id up into a userspace id in the first idmapping. */
from_kuid(u0:k10000:r10000, k11000) = u1000

/* INVALID: Map the userspace id up into a userspace id in the second idmapping. */
from_kuid(u20000:k0:r10000, u1000) = u21000
                            ~~~~~

Since userspace ids have type uid_t and gid_t and kernel ids have type kuid_t and kgid_t the compiler will throw an error when they are conflated. So the two examples above would cause a compilation failure.
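
The idea of keeping the two id spaces apart by type can be imitated in Python, with the caveat that Python only detects the mistake at run time whereas the kernel relies on the C compiler. The classes Uid and Kuid and the function make_kuid_model are hypothetical and exist only for this sketch.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Uid:
    val: int  # userspace id

@dataclass(frozen=True)
class Kuid:
    val: int  # kernel id

def make_kuid_model(m: Tuple[int, int, int], uid: Uid) -> Kuid:
    """Map a userspace id down; reject anything that is not a Uid."""
    if not isinstance(uid, Uid):
        raise TypeError("expected a userspace id, got " + type(uid).__name__)
    u, k, r = m
    if not u <= uid.val < u + r:
        raise ValueError("id is unmapped")
    return Kuid(uid.val - u + k)

kid = make_kuid_model((0, 10000, 10000), Uid(1000))
print(kid)  # Kuid(val=11000)

# Feeding the kernel id into another "map down" is the forbidden
# translation from the text; numerically it would yield 21000.
try:
    make_kuid_model((10000, 20000, 10000), kid)
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```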

Idmappings when creating filesystem objects

The concepts of mapping an id down or mapping an id up are expressed in the two kernel functions filesystem developers are rather familiar with and which we’ve already used in this document:

/* Map the userspace id down into a kernel id. */
make_kuid(idmapping, uid)

/* Map the kernel id up into a userspace id. */
from_kuid(idmapping, kuid)

We will take an abbreviated look into how idmappings figure into creating filesystem objects. For simplicity we will only look at what happens when the VFS has already completed path lookup right before it calls into the filesystem itself. So we’re concerned with what happens when e.g. vfs_mkdir() is called. We will also assume that the directory we’re creating filesystem objects in is readable and writable for everyone.

When creating a filesystem object the kernel will look at the caller’s filesystem ids. These are just regular uid_t and gid_t userspace ids but they are exclusively used when determining file ownership which is why they are called “filesystem ids”. They are usually identical to the uid and gid of the caller but can differ. We will just assume they are always identical to not get lost in too many details.

When the caller enters the kernel two things happen:

  1. Map the caller’s userspace ids down into kernel ids in the caller’s idmapping. (To be precise, the kernel will simply look at the kernel ids stashed in the credentials of the current task but for our education we’ll pretend this translation happens just in time.)

  2. Verify that the caller’s kernel ids can be mapped up to userspace ids in the filesystem’s idmapping.

The second step is important as regular filesystems will ultimately need to map the kernel id back up into a userspace id when writing to disk. So with the second step the kernel guarantees that a valid userspace id can be written to disk. If it can’t the kernel will refuse the creation request to not even remotely risk filesystem corruption.

The astute reader will have realized that this is simply a variation of the crossmapping algorithm we mentioned above in a previous section. First, the kernel maps the caller’s userspace id down into a kernel id according to the caller’s idmapping and then maps that kernel id up according to the filesystem’s idmapping.

From an implementation point of view it’s worth mentioning how idmappings are represented. All idmappings are taken from the corresponding user namespace.

  • caller’s idmapping (usually taken from current_user_ns())

  • filesystem’s idmapping (sb->s_user_ns)

  • mount’s idmapping (mnt_idmap(vfsmnt))

Let’s see some examples with caller/filesystem idmapping but without mount idmappings. This will exhibit some problems we can hit. After that we will revisit/reconsider these examples, this time using mount idmappings, to see how they can solve the problems we observed before.

Example 1

caller id:            u1000
caller idmapping:     u0:k0:r4294967295
filesystem idmapping: u0:k0:r4294967295

Both the caller and the filesystem use the identity idmapping:

  1. Map the caller’s userspace ids into kernel ids in the caller’s idmapping:

    make_kuid(u0:k0:r4294967295, u1000) = k1000
  2. Verify that the caller’s kernel ids can be mapped to userspace ids in the filesystem’s idmapping.

    For this second step the kernel will call the function fsuidgid_has_mapping() which ultimately boils down to calling from_kuid():

    from_kuid(u0:k0:r4294967295, k1000) = u1000

In this example both idmappings are the same so there’s nothing exciting going on. Ultimately the userspace id that lands on disk will be u1000.

Example 2

caller id:            u1000
caller idmapping:     u0:k10000:r10000
filesystem idmapping: u0:k20000:r10000

  1. Map the caller’s userspace ids down into kernel ids in the caller’s idmapping:

    make_kuid(u0:k10000:r10000, u1000) = k11000
  2. Verify that the caller’s kernel ids can be mapped up to userspace ids in the filesystem’s idmapping:

    from_kuid(u0:k20000:r10000, k11000) = u-1

It’s immediately clear that while the caller’s userspace id could be successfully mapped down into kernel ids in the caller’s idmapping the kernel ids could not be mapped up according to the filesystem’s idmapping. So the kernel will deny this creation request.

Note that while this example is less common, because most filesystems can’t be mounted with non-initial idmappings, this is a general problem as we can see in the next examples.

Example 3

caller id:            u1000
caller idmapping:     u0:k10000:r10000
filesystem idmapping: u0:k0:r4294967295

  1. Map the caller’s userspace ids down into kernel ids in the caller’s idmapping:

    make_kuid(u0:k10000:r10000, u1000) = k11000
  2. Verify that the caller’s kernel ids can be mapped up to userspace ids in the filesystem’s idmapping:

    from_kuid(u0:k0:r4294967295, k11000) = u11000

We can see that the translation always succeeds. The userspace id that the filesystem will ultimately put to disk will always be identical to the value of the kernel id that was created in the caller’s idmapping. This has mainly two consequences.

First, we can’t allow a caller to ultimately write to disk with another userspace id. We could only do this if we were to mount the whole filesystem with the caller’s or another idmapping. But that solution is limited to a few filesystems and not very flexible, even though this is a use-case that is pretty important in containerized workloads.

Second, the caller will usually not be able to create any files or access directories that have stricter permissions because none of the filesystem’s kernel ids map up into valid userspace ids in the caller’s idmapping:

  1. Map raw userspace ids down to kernel ids in the filesystem’s idmapping:

    make_kuid(u0:k0:r4294967295, u1000) = k1000
  2. Map kernel ids up to userspace ids in the caller’s idmapping:

    from_kuid(u0:k10000:r10000, k1000) = u-1
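
The creation checks of Examples 1 to 3 can be replayed with a short Python sketch. It assumes that an unmapped id is represented by None (the text writes u-1), and the names on_disk_uid, map_down and map_up are hypothetical; the real check is performed by fsuidgid_has_mapping() as mentioned above.

```python
from typing import Optional, Tuple

Idmapping = Tuple[int, int, int]  # (u, k, r), hypothetical representation
INITIAL = (0, 0, 4294967295)      # u0:k0:r4294967295

def map_down(m: Idmapping, uid: int) -> Optional[int]:
    u, k, r = m
    return uid - u + k if u <= uid < u + r else None

def map_up(m: Idmapping, kid: int) -> Optional[int]:
    u, k, r = m
    return kid - k + u if k <= kid < k + r else None

def on_disk_uid(caller: Idmapping, fs: Idmapping, uid: int) -> Optional[int]:
    """Return the userspace id that would be written to disk, or None
    if the creation request would be refused."""
    kid = map_down(caller, uid)
    return None if kid is None else map_up(fs, kid)

container = (0, 10000, 10000)  # u0:k10000:r10000

example1 = on_disk_uid(INITIAL, INITIAL, 1000)
example2 = on_disk_uid(container, (0, 20000, 10000), 1000)
example3 = on_disk_uid(container, INITIAL, 1000)
print(example1, example2, example3)  # 1000 None 11000
```

Example 3 succeeds but writes u11000 rather than u1000 to disk, which is the first of the two consequences described above.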

Example 4

file id:              u1000
caller idmapping:     u0:k10000:r10000
filesystem idmapping: u0:k0:r4294967295

In order to report ownership to userspace the kernel uses the crossmapping algorithm introduced in a previous section:

  1. Map the userspace id on disk down into a kernel id in the filesystem’s idmapping:

    make_kuid(u0:k0:r4294967295, u1000) = k1000
  2. Map the kernel id up into a userspace id in the caller’s idmapping:

    from_kuid(u0:k10000:r10000, k1000) = u-1

The crossmapping algorithm fails in this case because the kernel id in the filesystem idmapping cannot be mapped up to a userspace id in the caller’s idmapping. Thus, the kernel will report the ownership of this file as the overflowid.

Example 5

file id:              u1000
caller idmapping:     u0:k10000:r10000
filesystem idmapping: u0:k20000:r10000

In order to report ownership to userspace the kernel uses the crossmapping algorithm introduced in a previous section:

  1. Map the userspace id on disk down into a kernel id in the filesystem’s idmapping:

    make_kuid(u0:k20000:r10000, u1000) = k21000
  2. Map the kernel id up into a userspace id in the caller’s idmapping:

    from_kuid(u0:k10000:r10000, k21000) = u-1

Again, the crossmapping algorithm fails in this case because the kernel id in the filesystem idmapping cannot be mapped to a userspace id in the caller’s idmapping. Thus, the kernel will report the ownership of this file as the overflowid.

Note how in the last two examples things would be simple if the caller were using the initial idmapping. For a filesystem mounted with the initial idmapping it would be trivial. So we only consider a filesystem with an idmapping of u0:k20000:r10000:

  1. Map the userspace id on disk down into a kernel id in the filesystem’s idmapping:

    make_kuid(u0:k20000:r10000, u1000) = k21000
  2. Map the kernel id up into a userspace id in the caller’s idmapping:

    from_kuid(u0:k0:r4294967295, k21000) = u21000
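
The three ownership reports above (Example 4, Example 5, and the caller using the initial idmapping) can be tabulated with a small Python sketch. The helper names and the use of None for an id that the kernel would report as the overflowid are assumptions of this illustration.

```python
from typing import Optional, Tuple

Idmapping = Tuple[int, int, int]  # (u, k, r), hypothetical representation
INITIAL = (0, 0, 4294967295)      # u0:k0:r4294967295

def map_down(m: Idmapping, uid: int) -> Optional[int]:
    u, k, r = m
    return uid - u + k if u <= uid < u + r else None

def map_up(m: Idmapping, kid: int) -> Optional[int]:
    u, k, r = m
    return kid - k + u if k <= kid < k + r else None

# name -> (filesystem idmapping, caller idmapping); the file id is u1000.
cases = {
    "Example 4": (INITIAL, (0, 10000, 10000)),
    "Example 5": ((0, 20000, 10000), (0, 10000, 10000)),
    "initial caller": ((0, 20000, 10000), INITIAL),
}

results = {}
for name, (fs, caller) in cases.items():
    kid = map_down(fs, 1000)
    results[name] = None if kid is None else map_up(caller, kid)
    shown = "overflowid" if results[name] is None else f"u{results[name]}"
    print(f"{name}: {shown}")
```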

Idmappings on idmapped mounts

The examples we’ve seen in the previous section where the caller’s idmapping and the filesystem’s idmapping are incompatible cause various issues for workloads. For a more complex but common example, consider two containers started on the host. To completely prevent the two containers from affecting each other, an administrator may often use different non-overlapping idmappings for the two containers:

container1 idmapping:  u0:k10000:r10000
container2 idmapping:  u0:k20000:r10000
filesystem idmapping:  u0:k30000:r10000

An administrator wanting to provide easy read-write access to the following set of files:

dir id:       u0
dir/file1 id: u1000
dir/file2 id: u2000

to both containers currently can’t.

Of course the administrator has the option to recursively change ownership via chown(). For example, they could change ownership so that dir and all files below it can be crossmapped from the filesystem’s into the container’s idmapping. Let’s assume they change ownership so it is compatible with the first container’s idmapping:

dir id:       u10000
dir/file1 id: u11000
dir/file2 id: u12000

This would still leave dir rather useless to the second container. In fact, dir and all files below it would continue to appear owned by the overflowid for the second container.

Or consider another increasingly popular example. Some service managers such as systemd implement a concept called “portable home directories”. A user may want to use their home directories on different machines where they are assigned different login userspace ids. Most users will have u1000 as the login id on their machine at home and all files in their home directory will usually be owned by u1000. At uni or at work they may have another login id such as u1125. This makes it rather difficult to interact with their home directory on their work machine.

In both cases changing ownership recursively has grave implications. The most obvious one is that ownership is changed globally and permanently. In the home directory case this change in ownership would even need to happen every time the user switches from their home to their work machine. For really large sets of files this becomes increasingly costly.

If the user is lucky, they are dealing with a filesystem that is mountable inside user namespaces. But this would also change ownership globally and the change in ownership is tied to the lifetime of the filesystem mount, i.e. the superblock. The only way to change ownership is to completely unmount the filesystem and mount it again in another user namespace. This is usually impossible because it would mean that all users currently accessing the filesystem can’t anymore. And it means that dir still can’t be shared between two containers with different idmappings. But usually the user doesn’t even have this option since most filesystems aren’t mountable inside containers. And not having them mountable might be desirable as it doesn’t require the filesystem to deal with malicious filesystem images.

But the usecases mentioned above and more can be handled by idmapped mounts. They allow exposing the same set of dentries with different ownership at different mounts. This is achieved by marking the mounts with a user namespace through the mount_setattr() system call. The idmapping associated with it is then used to translate from the caller’s idmapping to the filesystem’s idmapping and vice versa using the remapping algorithm we introduced above.

Idmapped mounts make it possible to change ownership in a temporary and localized way. The ownership changes are restricted to a specific mount and the ownership changes are tied to the lifetime of the mount. All other users and locations where the filesystem is exposed are unaffected.

Filesystems that support idmapped mounts don’t have any real reason to support being mountable inside user namespaces. A filesystem could be exposed completely under an idmapped mount to get the same effect. This has the advantage that filesystems can leave the creation of the superblock to privileged users in the initial user namespace.

However, it is perfectly possible to combine idmapped mounts with filesystems mountable inside user namespaces. We will touch on this further below.

Filesystem types vs idmapped mount types

With the introduction of idmapped mounts we need to distinguish between filesystem ownership and mount ownership of a VFS object such as an inode. The owner of an inode might be different when looked at from a filesystem perspective than when looked at from an idmapped mount. Such fundamental conceptual distinctions should almost always be clearly expressed in the code. So, to distinguish idmapped mount ownership from filesystem ownership separate types have been introduced.

If a uid or gid has been generated using the filesystem or caller’s idmapping then we will use the kuid_t and kgid_t types. However, if a uid or gid has been generated using a mount idmapping then we will be using the dedicated vfsuid_t and vfsgid_t types.

All VFS helpers that generate or take uids and gids as arguments use the vfsuid_t and vfsgid_t types and we will be able to rely on the compiler to catch errors that originate from conflating filesystem and VFS uids and gids.

The vfsuid_t and vfsgid_t types are often mapped from and to kuid_t and kgid_t types similar to how kuid_t and kgid_t types are mapped from and to uid_t and gid_t types:

uid_t <--> kuid_t <--> vfsuid_t
gid_t <--> kgid_t <--> vfsgid_t

Whenever we report ownership based on a vfsuid_t or vfsgid_t type, e.g., during stat(), or store ownership information in a shared VFS object based on a vfsuid_t or vfsgid_t type, e.g., during chown() we can use the vfsuid_into_kuid() and vfsgid_into_kgid() helpers.

To illustrate why this helper currently exists, consider what happens when we change ownership of an inode from an idmapped mount. After we generated a vfsuid_t or vfsgid_t based on the mount idmapping we later commit to this vfsuid_t or vfsgid_t to become the new filesystem wide ownership. Thus, we are turning the vfsuid_t or vfsgid_t into a global kuid_t or kgid_t. And this can be done by using vfsuid_into_kuid() and vfsgid_into_kgid().

Note, whenever a shared VFS object, e.g., a cached struct inode or a cached struct posix_acl, stores ownership information a filesystem or “global” kuid_t and kgid_t must be used. Ownership expressed via vfsuid_t and vfsgid_t is specific to an idmapped mount.

We already noted that vfsuid_t and vfsgid_t types are generated based on mount idmappings whereas kuid_t and kgid_t types are generated based on filesystem idmappings. To prevent abusing filesystem idmappings to generate vfsuid_t or vfsgid_t types or mount idmappings to generate kuid_t or kgid_t types filesystem idmappings and mount idmappings are different types as well.

All helpers that map to or from vfsuid_t and vfsgid_t types require a mount idmapping to be passed which is of type struct mnt_idmap. Passing a filesystem or caller idmapping will cause a compilation error.

Similar to how we prefix all userspace ids in this document with u and all kernel ids with k we will prefix all VFS ids with v. So a mount idmapping will be written as: u0:v10000:r10000.

Remapping helpers

Idmapping functions were added that translate between idmappings. They make use of the remapping algorithm we’ve introduced earlier. We’re going to look at:

  • i_uid_into_vfsuid() and i_gid_into_vfsgid()

    The i_*id_into_vfs*id() functions translate filesystem’s kernel ids into VFS ids in the mount’s idmapping:

    /* Map the filesystem's kernel id up into a userspace id in the filesystem's idmapping. */
    from_kuid(filesystem, kid) = uid

    /* Map the filesystem's userspace id down into a VFS id in the mount's idmapping. */
    make_kuid(mount, uid) = kuid
  • mapped_fsuid() and mapped_fsgid()

    The mapped_fs*id() functions translate the caller’s kernel ids into kernel ids in the filesystem’s idmapping. This translation is achieved by remapping the caller’s VFS ids using the mount’s idmapping:

    /* Map the caller's VFS id up into a userspace id in the mount's idmapping. */
    from_kuid(mount, kid) = uid

    /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
    make_kuid(filesystem, uid) = kuid
  • vfsuid_into_kuid() and vfsgid_into_kgid()

    Whenever

Note that these two functions invert each other. Consider the following idmappings:

caller idmapping:     u0:k10000:r10000
filesystem idmapping: u0:k20000:r10000
mount idmapping:      u0:v10000:r10000

Assume a file owned by u1000 is read from disk. The filesystem maps this id to k21000 according to its idmapping. This is what is stored in the inode’s i_uid and i_gid fields.

When the caller queries the ownership of this file via stat() the kernel would usually simply use the crossmapping algorithm and map the filesystem’s kernel id up to a userspace id in the caller’s idmapping.

But when the caller is accessing the file on an idmapped mount the kernel will first call i_uid_into_vfsuid() thereby translating the filesystem’s kernel id into a VFS id in the mount’s idmapping:

i_uid_into_vfsuid(k21000):
  /* Map the filesystem's kernel id up into a userspace id. */
  from_kuid(u0:k20000:r10000, k21000) = u1000

  /* Map the filesystem's userspace id down into a VFS id in the mount's idmapping. */
  make_kuid(u0:v10000:r10000, u1000) = v11000

Finally, when the kernel reports the owner to the caller it will turn the VFS id in the mount’s idmapping into a userspace id in the caller’s idmapping:

k11000 = vfsuid_into_kuid(v11000)
from_kuid(u0:k10000:r10000, k11000) = u1000

We can test whether this algorithm really works by verifying what happens when we create a new file. Let’s say the user is creating a file with u1000.

The kernel maps this to k11000 in the caller’s idmapping. Usually the kernel would now apply the crossmapping, verifying that k11000 can be mapped to a userspace id in the filesystem’s idmapping. Since k11000 can’t be mapped up in the filesystem’s idmapping directly this creation request fails.

But when the caller is accessing the file on an idmapped mount the kernel will first call mapped_fs*id() thereby translating the caller’s kernel id into a VFS id according to the mount’s idmapping:

mapped_fsuid(k11000):
   /* Map the caller's kernel id up into a userspace id in the mount's idmapping. */
   from_kuid(u0:k10000:r10000, k11000) = u1000

   /* Map the mount's userspace id down into a kernel id in the filesystem's idmapping. */
   make_kuid(u0:v20000:r10000, u1000) = v21000

When finally writing to disk the kernel will then map v21000 up into a userspace id in the filesystem’s idmapping:

k21000 = vfsuid_into_kuid(v21000)
from_kuid(u0:k20000:r10000, k21000) = u1000

As we can see, we end up with an invertible and therefore information preserving algorithm. A file created from u1000 on an idmapped mount will also be reported as being owned by u1000 and vice versa.
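
The claim that the two translations invert each other can be checked over the whole mapped range with a Python sketch. The functions below are hypothetical models that follow the generic two-step descriptions of i_uid_into_vfsuid() and mapped_fsuid() given earlier; they are not the kernel helpers, and a mount idmapping u:v:r is represented by the same kind of tuple as the other idmappings.

```python
from typing import Optional, Tuple

Idmapping = Tuple[int, int, int]  # (upper, lower, range), hypothetical

def map_down(m: Idmapping, uid: int) -> Optional[int]:
    u, k, r = m
    return uid - u + k if u <= uid < u + r else None

def map_up(m: Idmapping, kid: int) -> Optional[int]:
    u, k, r = m
    return kid - k + u if k <= kid < k + r else None

def i_uid_into_vfsuid_model(mount, fs, kid):
    """Up in the filesystem's idmapping, then down in the mount's."""
    uid = map_up(fs, kid)
    return None if uid is None else map_down(mount, uid)

def mapped_fsuid_model(mount, fs, vfsid):
    """Up in the mount's idmapping, then down in the filesystem's."""
    uid = map_up(mount, vfsid)
    return None if uid is None else map_down(fs, uid)

fs = (0, 20000, 10000)     # u0:k20000:r10000
mount = (0, 10000, 10000)  # u0:v10000:r10000

vfsid = i_uid_into_vfsuid_model(mount, fs, 21000)
print(vfsid)                                 # 11000
print(mapped_fsuid_model(mount, fs, vfsid))  # 21000

# Every kernel id mapped by the filesystem survives the round trip.
roundtrip_ok = all(
    mapped_fsuid_model(mount, fs, i_uid_into_vfsuid_model(mount, fs, k)) == k
    for k in range(20000, 30000)
)
print(roundtrip_ok)  # True
```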

Let’s now briefly reconsider the failing examples from earlier in the context of idmapped mounts.

Example 2 reconsidered

caller id:            u1000
caller idmapping:     u0:k10000:r10000
filesystem idmapping: u0:k20000:r10000
mount idmapping:      u0:v10000:r10000

When the caller is using a non-initial idmapping the common case is to attach the same idmapping to the mount. We now perform three steps:

  1. Map the caller’s userspace ids into kernel ids in the caller’s idmapping:

    make_kuid(u0:k10000:r10000, u1000) = k11000
  2. Translate the caller’s VFS id into a kernel id in the filesystem’s idmapping:

    mapped_fsuid(v11000):
      /* Map the VFS id up into a userspace id in the mount's idmapping. */
      from_kuid(u0:v10000:r10000, v11000) = u1000
      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k20000:r10000, u1000) = k21000
  3. Verify that the caller’s kernel ids can be mapped to userspace ids in the filesystem’s idmapping:

    from_kuid(u0:k20000:r10000, k21000) = u1000

So the ownership that lands on disk will be u1000.

Example 3 reconsidered

caller id:            u1000
caller idmapping:     u0:k10000:r10000
filesystem idmapping: u0:k0:r4294967295
mount idmapping:      u0:v10000:r10000

The same translation algorithm works with the third example.

  1. Map the caller’s userspace ids into kernel ids in the caller’s idmapping:

    make_kuid(u0:k10000:r10000, u1000) = k11000
  2. Translate the caller’s VFS id into a kernel id in the filesystem’s idmapping:

    mapped_fsuid(v11000):
      /* Map the VFS id up into a userspace id in the mount's idmapping. */
      from_kuid(u0:v10000:r10000, v11000) = u1000
      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k0:r4294967295, u1000) = k1000
  3. Verify that the caller’s kernel ids can be mapped to userspace ids in the filesystem’s idmapping:

    from_kuid(u0:k0:r4294967295, k1000) = u1000

So the ownership that lands on disk will be u1000.
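The three creation steps used in Examples 2 and 3 can be sketched as a small Python model. This is an illustration only, not kernel code: `Idmap`, `on_disk_owner` and the Python versions of `make_kuid`/`from_kuid` are hypothetical stand-ins, and `None` models an id that has no mapping.

```python
from typing import NamedTuple, Optional

class Idmap(NamedTuple):
    upper: int  # first id of the upper idmapset (u)
    lower: int  # first id of the lower idmapset (k or v)
    count: int  # number of ids mapped (r)

def make_kuid(idmap: Idmap, uid: Optional[int]) -> Optional[int]:
    """Stand-in for make_kuid(): map an upper id down, None if unmapped."""
    if uid is None or not idmap.upper <= uid < idmap.upper + idmap.count:
        return None
    return idmap.lower + (uid - idmap.upper)

def from_kuid(idmap: Idmap, kid: Optional[int]) -> Optional[int]:
    """Stand-in for from_kuid(): map a lower id up, None if unmapped."""
    if kid is None or not idmap.lower <= kid < idmap.lower + idmap.count:
        return None
    return idmap.upper + (kid - idmap.lower)

def on_disk_owner(uid, caller, mount, filesystem):
    kid = make_kuid(caller, uid)                             # step 1
    fs_kid = make_kuid(filesystem, from_kuid(mount, kid))    # step 2: mapped_fsuid()
    return from_kuid(filesystem, fs_kid)                     # step 3

caller = Idmap(0, 10000, 10000)          # u0:k10000:r10000
mount = Idmap(0, 10000, 10000)           # u0:v10000:r10000
fs_example_2 = Idmap(0, 20000, 10000)    # u0:k20000:r10000
fs_example_3 = Idmap(0, 0, 4294967295)   # u0:k0:r4294967295

print(on_disk_owner(1000, caller, mount, fs_example_2))  # 1000
print(on_disk_owner(1000, caller, mount, fs_example_3))  # 1000
```

Only the filesystem idmapping differs between the two calls. The intermediate kernel id is k21000 in one case and k1000 in the other, but the id written to disk is the same.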

Example 4 reconsidered

file id:              u1000
caller idmapping:     u0:k10000:r10000
filesystem idmapping: u0:k0:r4294967295
mount idmapping:      u0:v10000:r10000

In order to report ownership to userspace the kernel now does three steps using the translation algorithm we introduced earlier:

  1. Map the userspace id on disk down into a kernel id in the filesystem’s idmapping:

    make_kuid(u0:k0:r4294967295, u1000) = k1000
  2. Translate the kernel id into a VFS id in the mount’s idmapping:

    i_uid_into_vfsuid(k1000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k0:r4294967295, k1000) = u1000
      /* Map the userspace id down into a VFS id in the mount's idmapping. */
      make_kuid(u0:v10000:r10000, u1000) = v11000
  3. Map the VFS id up into a userspace id in the caller’s idmapping:

    k11000 = vfsuid_into_kuid(v11000)
    from_kuid(u0:k10000:r10000, k11000) = u1000

Earlier, the caller’s kernel id couldn’t be crossmapped in the filesystem’s idmapping. With the idmapped mount in place it now can be crossmapped into the filesystem’s idmapping via the mount’s idmapping. The file will now be created with u1000 according to the mount’s idmapping.

Example 5 reconsidered

file id:              u1000
caller idmapping:     u0:k10000:r10000
filesystem idmapping: u0:k20000:r10000
mount idmapping:      u0:v10000:r10000

Again, in order to report ownership to userspace the kernel now does three steps using the translation algorithm we introduced earlier:

  1. Map the userspace id on disk down into a kernel id in the filesystem’s idmapping:

    make_kuid(u0:k20000:r10000, u1000) = k21000
  2. Translate the kernel id into a VFS id in the mount’s idmapping:

    i_uid_into_vfsuid(k21000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k20000:r10000, k21000) = u1000
      /* Map the userspace id down into a VFS id in the mount's idmapping. */
      make_kuid(u0:v10000:r10000, u1000) = v11000
  3. Map the VFS id up into a userspace id in the caller’s idmapping:

    k11000 = vfsuid_into_kuid(v11000)
    from_kuid(u0:k10000:r10000, k11000) = u1000

Earlier, the file’s kernel id couldn’t be crossmapped in the filesystem’s idmapping. With the idmapped mount in place it now can be crossmapped into the filesystem’s idmapping via the mount’s idmapping. The file is now owned by u1000 according to the mount’s idmapping.
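The reporting steps of Examples 4 and 5 run the same translation in the opposite direction. The sketch below models them in Python. It is not kernel code: `reported_owner` and the tuple-based `make_kuid`/`from_kuid` helpers are hypothetical stand-ins, and `None` models an unmapped id.

```python
def make_kuid(idmap, uid):
    """Stand-in for make_kuid(): map an upper id down, None if unmapped."""
    upper, lower, count = idmap
    if uid is not None and upper <= uid < upper + count:
        return lower + (uid - upper)
    return None

def from_kuid(idmap, kid):
    """Stand-in for from_kuid(): map a lower id up, None if unmapped."""
    upper, lower, count = idmap
    if kid is not None and lower <= kid < lower + count:
        return upper + (kid - lower)
    return None

def reported_owner(disk_uid, caller, mount, filesystem):
    # Step 1: raw id on disk -> kernel id in the filesystem's idmapping.
    fs_kid = make_kuid(filesystem, disk_uid)
    # Step 2: i_uid_into_vfsuid(), kernel id -> VFS id in the mount's idmapping.
    vfsuid = make_kuid(mount, from_kuid(filesystem, fs_kid))
    # Step 3: vfsuid_into_kuid() keeps the number (v11000 becomes k11000),
    # then the id is mapped up in the caller's idmapping.
    return from_kuid(caller, vfsuid)

caller = (0, 10000, 10000)          # u0:k10000:r10000
mount = (0, 10000, 10000)           # u0:v10000:r10000
fs_example_4 = (0, 0, 4294967295)   # u0:k0:r4294967295
fs_example_5 = (0, 20000, 10000)    # u0:k20000:r10000

print(reported_owner(1000, caller, mount, fs_example_4))  # 1000
print(reported_owner(1000, caller, mount, fs_example_5))  # 1000
```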

Changing ownership on a home directory

We’ve seen above how idmapped mounts can be used to translate between idmappings when either the caller, the filesystem or both use a non-initial idmapping. A wide range of use cases exist when the caller is using a non-initial idmapping. This mostly happens in the context of containerized workloads. The consequence is, as we have seen, that for both filesystems mounted with the initial idmapping and filesystems mounted with non-initial idmappings, access to the filesystem isn’t working because the kernel ids can’t be crossmapped between the caller’s and the filesystem’s idmapping.

As we’ve seen above, idmapped mounts provide a solution to this by remapping the caller’s or filesystem’s idmapping according to the mount’s idmapping.

Aside from containerized workloads, idmapped mounts have the advantage that they also work when both the caller and the filesystem use the initial idmapping, which means users on the host can change the ownership of directories and files on a per-mount basis.

Consider our previous example where a user has their home directory on portable storage. At home they have id u1000 and all files in their home directory are owned by u1000 whereas at uni or work they have login id u1125.

Taking their home directory with them becomes problematic. They can’t easily access their files, they might not be able to write to disk without applying lax permissions or ACLs and even if they can, they will end up with an annoying mix of files and directories owned by u1000 and u1125.

Idmapped mounts solve this problem. A user can create an idmapped mount for their home directory on their work computer or their computer at home, depending on what ownership they would prefer to end up on the portable storage itself.

Let’s assume they want all files on disk to belong to u1000. When the user plugs in their portable storage at their workstation they can set up a job that creates an idmapped mount with the minimal idmapping u1000:k1125:r1. So now when they create a file the kernel performs the following steps we already know from above:

caller id:            u1125
caller idmapping:     u0:k0:r4294967295
filesystem idmapping: u0:k0:r4294967295
mount idmapping:      u1000:v1125:r1
  1. Map the caller’s userspace ids into kernel ids in the caller’s idmapping:

    make_kuid(u0:k0:r4294967295, u1125) = k1125
  2. Translate the caller’s VFS id into a kernel id in the filesystem’s idmapping:

    mapped_fsuid(v1125):
      /* Map the VFS id up into a userspace id in the mount's idmapping. */
      from_kuid(u1000:v1125:r1, v1125) = u1000
      /* Map the userspace id down into a kernel id in the filesystem's idmapping. */
      make_kuid(u0:k0:r4294967295, u1000) = k1000
  3. Verify that the caller’s filesystem ids can be mapped to userspace ids in the filesystem’s idmapping:

    from_kuid(u0:k0:r4294967295, k1000) = u1000

So ultimately the file will be created with u1000 on disk.

Now let’s briefly look at what ownership the caller with id u1125 will see on their work computer:

file id:              u1000
caller idmapping:     u0:k0:r4294967295
filesystem idmapping: u0:k0:r4294967295
mount idmapping:      u1000:v1125:r1
  1. Map the userspace id on disk down into a kernel id in the filesystem’s idmapping:

    make_kuid(u0:k0:r4294967295, u1000) = k1000
  2. Translate the kernel id into a VFS id in the mount’s idmapping:

    i_uid_into_vfsuid(k1000):
      /* Map the kernel id up into a userspace id in the filesystem's idmapping. */
      from_kuid(u0:k0:r4294967295, k1000) = u1000
      /* Map the userspace id down into a VFS id in the mount's idmapping. */
      make_kuid(u1000:v1125:r1, u1000) = v1125
  3. Map the VFS id up into a userspace id in the caller’s idmapping:

    k1125 = vfsuid_into_kuid(v1125)
    from_kuid(u0:k0:r4294967295, k1125) = u1125

So ultimately the file will be reported to the caller as belonging to u1125, which is the caller’s userspace id on their workstation in our example.

The raw userspace id that is put on disk is u1000, so when the user takes their home directory back to their home computer, where they are assigned u1000 using the initial idmapping, and mount the filesystem with the initial idmapping, they will see all those files owned by u1000.
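The home directory scenario can be replayed with a small Python model of both directions. This is a sketch under stated assumptions: `create`, `report` and the tuple-based `make_kuid`/`from_kuid` helpers are hypothetical stand-ins for the kernel logic, and `None` models an id that has no mapping.

```python
INITIAL = (0, 0, 4294967295)  # u0:k0:r4294967295 (caller and filesystem)
MOUNT = (1000, 1125, 1)       # u1000:v1125:r1

def make_kuid(idmap, uid):
    """Stand-in for make_kuid(): map an upper id down, None if unmapped."""
    upper, lower, count = idmap
    if uid is not None and upper <= uid < upper + count:
        return lower + (uid - upper)
    return None

def from_kuid(idmap, kid):
    """Stand-in for from_kuid(): map a lower id up, None if unmapped."""
    upper, lower, count = idmap
    if kid is not None and lower <= kid < lower + count:
        return upper + (kid - lower)
    return None

def create(uid):
    """Userspace id of the caller -> raw id written to disk."""
    kid = make_kuid(INITIAL, uid)
    fs_kid = make_kuid(INITIAL, from_kuid(MOUNT, kid))  # mapped_fsuid()
    return from_kuid(INITIAL, fs_kid)

def report(disk_uid):
    """Raw id on disk -> userspace id reported to the caller."""
    fs_kid = make_kuid(INITIAL, disk_uid)
    vfsuid = make_kuid(MOUNT, from_kuid(INITIAL, fs_kid))  # i_uid_into_vfsuid()
    return from_kuid(INITIAL, vfsuid)

print(create(1125))  # 1000: files created at work land on disk as u1000
print(report(1000))  # 1125: at work those files appear owned by u1125
print(create(1126))  # None: the one-id mount idmapping covers only v1125
```

Because the mount idmapping has a range of one, the model translates only the pair u1000 and v1125. Every other id comes out as `None` in this sketch.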