Credentials in Linux
By: David Howells <dhowells@redhat.com>
Overview
There are several parts to the security check performed by Linux when one object acts upon another:
Objects.
Objects are things in the system that may be acted upon directly by userspace programs. Linux has a variety of actionable objects, including:
- Tasks
- Files/inodes
- Sockets
- Message queues
- Shared memory segments
- Semaphores
- Keys
As a part of the description of all these objects there is a set of credentials. What’s in the set depends on the type of object.
Object ownership.
Amongst the credentials of most objects, there will be a subset that indicates the ownership of that object. This is used for resource accounting and limitation (disk quotas and task rlimits for example).
In a standard UNIX filesystem, for instance, this will be defined by the UID marked on the inode.
The objective context.
Also amongst the credentials of those objects, there will be a subset that indicates the ‘objective context’ of that object. This may or may not be the same set as the ownership subset described above - in standard UNIX files, for instance, this is defined by the UID and the GID marked on the inode.
The objective context is used as part of the security calculation that is carried out when an object is acted upon.
Subjects.
A subject is an object that is acting upon another object.
Most of the objects in the system are inactive: they don’t act on other objects within the system. Processes/tasks are the obvious exception: they do stuff; they access and manipulate things.
Objects other than tasks may under some circumstances also be subjects. For instance an open file may send SIGIO to a task using the UID and EUID given to it by a task that called fcntl(F_SETOWN) upon it. In this case, the file struct will have a subjective context too.
The subjective context.
A subject has an additional interpretation of its credentials. A subset of its credentials forms the ‘subjective context’. The subjective context is used as part of the security calculation that is carried out when a subject acts.
A Linux task, for example, has the FSUID, FSGID and the supplementary group list for when it is acting upon a file - which are quite separate from the real UID and GID that normally form the objective context of the task.
Actions.
Linux has a number of actions available that a subject may perform upon an object. The set of actions available depends on the nature of the subject and the object.
Actions include reading, writing, creating and deleting files; forking or signalling and tracing tasks.
Rules, access control lists and security calculations.
When a subject acts upon an object, a security calculation is made. This involves taking the subjective context, the objective context and the action, and searching one or more sets of rules to see whether the subject is granted or denied permission to act in the desired manner on the object, given those contexts.
There are two main sources of rules:
Discretionary access control (DAC):
Sometimes the object will include sets of rules as part of its description. This is an ‘Access Control List’ or ‘ACL’. A Linux file may supply more than one ACL.
A traditional UNIX file, for example, includes a permissions mask that is an abbreviated ACL with three fixed classes of subject (‘user’, ‘group’ and ‘other’), each of which may be granted certain privileges (‘read’, ‘write’ and ‘execute’ - whatever those map to for the object in question). UNIX file permissions do not allow the arbitrary specification of subjects, however, and so are of limited use.
A Linux file might also sport a POSIX ACL. This is a list of rules that grants various permissions to arbitrary subjects.
Mandatory access control (MAC):
The system as a whole may have one or more sets of rules that get applied to all subjects and objects, regardless of their source. SELinux and Smack are examples of this.
In the case of SELinux and Smack, each object is given a label as part of its credentials. When an action is requested, they take the subject label, the object label and the action and look for a rule that says that this action is either granted or denied.
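The traditional permissions-mask calculation described under discretionary access control can be modelled in userspace roughly as follows. This is only a sketch under stated assumptions: the types subject_ctx and object_ctx, the function names and the WANT_* constants are hypothetical and are not kernel interfaces, and the model deliberately ignores capabilities, POSIX ACLs and any LSM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model types; these are NOT kernel structures. */
struct subject_ctx {
    uid_t fsuid;            /* subjective UID used for file access */
    gid_t fsgid;            /* subjective GID used for file access */
    const gid_t *groups;    /* supplementary group list */
    size_t ngroups;
};

struct object_ctx {
    uid_t uid;              /* UID marked on the inode */
    gid_t gid;              /* GID marked on the inode */
    unsigned int mode;      /* permission bits, e.g. 0640 */
};

enum { WANT_EXEC = 1, WANT_WRITE = 2, WANT_READ = 4 };

/* True if the subject's FSGID or a supplementary group matches 'gid'. */
bool model_in_groups(const struct subject_ctx *s, gid_t gid)
{
    assert(s != NULL);
    if (s->fsgid == gid)
        return true;
    for (size_t i = 0; i < s->ngroups; i++)
        if (s->groups[i] == gid)
            return true;
    return false;
}

/*
 * Pick exactly one class ('user', 'group' or 'other') for the subject and
 * return true only if that class grants every bit requested in 'want'.
 */
bool mode_permits(const struct subject_ctx *s, const struct object_ctx *o,
                  unsigned int want)
{
    unsigned int granted;

    assert(s != NULL && o != NULL);
    if (s->fsuid == o->uid)
        granted = (o->mode >> 6) & 7;   /* 'user' class */
    else if (model_in_groups(s, o->gid))
        granted = (o->mode >> 3) & 7;   /* 'group' class */
    else
        granted = o->mode & 7;          /* 'other' class */

    return (granted & want) == want;
}
```

Note that only the first matching class is consulted, so an owner can be denied access that the 'other' class would have been granted.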
Types of Credentials
The Linux kernel supports the following types of credentials:
Traditional UNIX credentials.
- Real User ID
- Real Group ID
The UID and GID are carried by most, if not all, Linux objects, even if in some cases they have to be invented (FAT or CIFS files for example, which are derived from Windows). These (mostly) define the objective context of that object, with tasks being slightly different in some cases.
- Effective, Saved and FS User ID
- Effective, Saved and FS Group ID
- Supplementary groups
These are additional credentials used by tasks only. Usually, the EUID/EGID/GROUPS will be used as the subjective context, and the real UID/GID will be used as the objective context. For tasks, it should be noted that this is not always true.
Capabilities.
- Set of permitted capabilities
- Set of inheritable capabilities
- Set of effective capabilities
- Capability bounding set
These are only carried by tasks. They indicate superior capabilities granted piecemeal to a task that an ordinary task wouldn’t otherwise have. These are manipulated implicitly by changes to the traditional UNIX credentials, but can also be manipulated directly by the capset() system call.
The permitted capabilities are those caps that the process might grant itself to its effective or permitted sets through capset(). The inheritable set might also be so constrained.
The effective capabilities are the ones that a task is actually allowed to make use of itself.
The inheritable capabilities are the ones that may get passed across execve().
The bounding set limits the capabilities that may be inherited across execve(), especially when a binary is executed that will execute as UID 0.
Secure management flags (securebits).
These are only carried by tasks. These govern the way the above credentials are manipulated and inherited over certain operations such as execve(). They aren’t used directly as objective or subjective credentials.
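How the capability sets listed above combine across execve() can be illustrated with a small userspace model. This is a sketch only: it follows the transformation rules given in the capabilities(7) manual page rather than anything stated in this document, and it leaves out ambient capabilities, the special handling of UID 0 and the effect of securebits. All type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t model_caps;    /* one bit per capability; hypothetical */

/* Hypothetical per-task capability sets. */
struct model_task_caps {
    model_caps permitted;
    model_caps inheritable;
    model_caps effective;
    model_caps bounding;
};

/* Hypothetical capabilities attached to an executable file. */
struct model_file_caps {
    model_caps permitted;
    model_caps inheritable;
    bool effective;             /* the file 'effective' flag is a single bit */
};

/* Compute the sets the task would hold after executing the file. */
struct model_task_caps model_exec_caps(const struct model_task_caps *p,
                                       const struct model_file_caps *f)
{
    struct model_task_caps n;

    assert(p != NULL && f != NULL);
    n.bounding = p->bounding;           /* carried over unchanged */
    n.inheritable = p->inheritable;     /* carried over unchanged */
    /* The bounding set masks what the file may add to the permitted set. */
    n.permitted = (p->inheritable & f->inheritable) |
                  (f->permitted & p->bounding);
    n.effective = f->effective ? n.permitted : 0;
    return n;
}
```

In this model a capability outside the bounding set cannot be gained from the file's permitted set, which is the limiting role described above.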
Keys and keyrings.
These are only carried by tasks. They carry and cache security tokens that don’t fit into the other standard UNIX credentials. They are for making such things as network filesystem keys available to the file accesses performed by processes, without the necessity of ordinary programs having to know about security details involved.
Keyrings are a special type of key. They carry sets of other keys and can be searched for the desired key. Each process may subscribe to a number of keyrings:
- Per-thread keyring
- Per-process keyring
- Per-session keyring
When a process accesses a key, if not already present, it will normally be cached on one of these keyrings for future accesses to find.
For more information on using keys, see Documentation/security/keys/*.
LSM
The Linux Security Module allows extra controls to be placed over the operations that a task may do. Currently Linux supports several LSM options.
Some work by labelling the objects in a system and then applying sets of rules (policies) that say what operations a task with one label may do to an object with another label.
AF_KEY
This is a socket-based approach to credential management for networking stacks [RFC 2367]. It isn’t discussed by this document as it doesn’t interact directly with task and file credentials; rather it keeps system level credentials.
When a file is opened, part of the opening task’s subjective context is recorded in the file struct created. This allows operations using that file struct to use those credentials instead of the subjective context of the task that issued the operation. An example of this would be a file opened on a network filesystem where the credentials of the opened file should be presented to the server, regardless of who is actually doing a read or a write upon it.
File Markings
Files on disk or obtained over the network may have annotations that form the objective security context of that file. Depending on the type of filesystem, this may include one or more of the following:
- UNIX UID, GID, mode;
- Windows user ID;
- Access control list;
- LSM security label;
- UNIX exec privilege escalation bits (SUID/SGID);
- File capabilities exec privilege escalation bits.
These are compared to the task’s subjective security context, and certain operations are allowed or disallowed as a result. In the case of execve(), the privilege escalation bits come into play, and may allow the resulting process extra privileges, based on the annotations on the executable file.
Task Credentials
In Linux, all of a task’s credentials are held in (uid, gid) or through (groups, keys, LSM security) a refcounted structure of type ‘struct cred’. Each task points to its credentials by a pointer called ‘cred’ in its task_struct.
Once a set of credentials has been prepared and committed, it may not be changed, barring the following exceptions:
- its reference count may be changed;
- the reference count on the group_info struct it points to may be changed;
- the reference count on the security data it points to may be changed;
- the reference count on any keyrings it points to may be changed;
- any keyrings it points to may be revoked, expired or have their security attributes changed; and
- the contents of any keyrings to which it points may be changed (the whole point of keyrings being a shared set of credentials, modifiable by anyone with appropriate access).
To alter anything in the cred struct, the copy-and-replace principle must be adhered to. First take a copy, then alter the copy and then use RCU to change the task pointer to make it point to the new copy. There are wrappers to aid with this (see below).
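The copy-and-replace principle can be illustrated outside the kernel with a small model. This is a sketch under stated assumptions: struct model_cred and struct model_task are hypothetical stand-ins, a C11 atomic pointer exchange stands in for the RCU pointer update, and the old copy is simply handed back to the caller, whereas the kernel defers its destruction through RCU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical stand-in for struct cred; not the kernel's definition. */
struct model_cred {
    uid_t uid;
    uid_t suid;
};

/* Hypothetical stand-in for a task holding a pointer to its credentials. */
struct model_task {
    _Atomic(const struct model_cred *) cred;
};

/* Step 1: take a private, mutable copy of the published credentials. */
struct model_cred *model_copy_cred(struct model_task *t)
{
    const struct model_cred *old;
    struct model_cred *copy;

    assert(t != NULL);
    old = atomic_load(&t->cred);
    assert(old != NULL);
    copy = malloc(sizeof(*copy));
    if (copy)
        *copy = *old;
    return copy;    /* Step 2 (altering the copy) is left to the caller. */
}

/*
 * Step 3: switch the task pointer to the altered copy. The previously
 * published set is returned so that the caller can retire it.
 */
const struct model_cred *model_publish_cred(struct model_task *t,
                                            struct model_cred *copy)
{
    assert(t != NULL && copy != NULL);
    return atomic_exchange(&t->cred, copy);
}
```

The published set is never written to in this model; only the private copy is altered before the pointer is switched.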
A task may only alter its _own_ credentials; it is no longer permitted for a task to alter another’s credentials. This means the capset() system call is no longer permitted to take any PID other than the one of the current process. Also, the keyctl_instantiate() and keyctl_negate() functions no longer permit attachment to process-specific keyrings in the requesting process as the instantiating process may need to create them.
Immutable Credentials
Once a set of credentials has been made public (by calling commit_creds() for example), it must be considered immutable, barring two exceptions:
- The reference count may be altered.
- While the keyring subscriptions of a set of credentials may not be changed, the keyrings subscribed to may have their contents altered.
To catch accidental credential alteration at compile time, struct task_struct has _const_ pointers to its credential sets, as does struct file. Furthermore, certain functions such as get_cred() and put_cred() operate on const pointers, thus rendering casts unnecessary, but must temporarily ditch the const qualification to be able to alter the reference count.
Accessing Task Credentials
A task being able to alter only its own credentials permits the current process to read or replace its own credentials without the need for any form of locking, which simplifies things greatly. It can just call:
const struct cred *current_cred()
to get a pointer to its credentials structure, and it doesn’t have to release it afterwards.
There are convenience wrappers for retrieving specific aspects of a task’s credentials (the value is simply returned in each case):
uid_t current_uid(void)                  Current's real UID
gid_t current_gid(void)                  Current's real GID
uid_t current_euid(void)                 Current's effective UID
gid_t current_egid(void)                 Current's effective GID
uid_t current_fsuid(void)                Current's file access UID
gid_t current_fsgid(void)                Current's file access GID
kernel_cap_t current_cap(void)           Current's effective capabilities
void *current_security(void)             Current's LSM security pointer
struct user_struct *current_user(void)   Current's user account
There are also convenience wrappers for retrieving specific associated pairs of a task’s credentials:
void current_uid_gid(uid_t *, gid_t *);
void current_euid_egid(uid_t *, gid_t *);
void current_fsuid_fsgid(uid_t *, gid_t *);
which return these pairs of values through their arguments after retrieving them from the current task’s credentials.
In addition, there is a function for obtaining a reference on the current process’s current set of credentials:
const struct cred *get_current_cred(void);
and functions for getting references to one of the credentials that don’t actually live in struct cred:
struct user_struct *get_current_user(void);
struct group_info *get_current_groups(void);
which get references to the current process’s user accounting structure and supplementary groups list respectively.
Once a reference has been obtained, it must be released with put_cred(), free_uid() or put_group_info() as appropriate.
Accessing Another Task’s Credentials
While a task may access its own credentials without the need for locking, the same is not true of a task wanting to access another task’s credentials. It must use the RCU read lock and rcu_dereference().
The rcu_dereference() call is wrapped by:
const struct cred *__task_cred(struct task_struct *task);
This should be used inside the RCU read lock, as in the following example:
void foo(struct task_struct *t, struct foo_data *f)
{
    const struct cred *tcred;
    ...
    rcu_read_lock();
    tcred = __task_cred(t);
    f->uid = tcred->uid;
    f->gid = tcred->gid;
    f->groups = get_group_info(tcred->groups);
    rcu_read_unlock();
    ...
}
Should it be necessary to hold another task’s credentials for a long period of time, and possibly to sleep while doing so, then the caller should get a reference on them using:
const struct cred *get_task_cred(struct task_struct *task);
This does all the RCU magic inside of it. The caller must call put_cred() on the credentials so obtained when they are no longer needed.
Note
The result of __task_cred() should not be passed directly to get_cred() as this may race with commit_creds().
There are a couple of convenience functions to access bits of another task’s credentials, hiding the RCU magic from the caller:
uid_t task_uid(task)     Task's real UID
uid_t task_euid(task)    Task's effective UID
If the caller is holding the RCU read lock at the time anyway, then:
__task_cred(task)->uid
__task_cred(task)->euid
should be used instead. Similarly, if multiple aspects of a task’s credentials need to be accessed, the RCU read lock should be used, __task_cred() called, the result stored in a temporary pointer and then the credential aspects read from that before dropping the lock. This prevents the potentially expensive RCU magic from being invoked multiple times.
Should some other single aspect of another task’s credentials need to be accessed, then this can be used:
task_cred_xxx(task, member)
where ‘member’ is a non-pointer member of the cred struct. For instance:
uid_t task_cred_xxx(task, suid);
will retrieve ‘struct cred::suid’ from the task, doing the appropriate RCU magic. This may not be used for pointer members as what they point to may disappear the moment the RCU read lock is dropped.
Altering Credentials
As previously mentioned, a task may only alter its own credentials, and may not alter those of another task. This means that it doesn’t need to use any locking to alter its own credentials.
To alter the current process’s credentials, a function should first prepare a new set of credentials by calling:
struct cred *prepare_creds(void);
This locks current->cred_replace_mutex and then allocates and constructs a duplicate of the current process’s credentials, returning with the mutex still held if successful. It returns NULL if not successful (out of memory).
The mutex prevents ptrace() from altering the ptrace state of a process while security checks on credentials construction and changing are taking place, as the ptrace state may alter the outcome, particularly in the case of execve().
The new credentials set should be altered appropriately, and any security checks and hooks done. Both the current and the proposed sets of credentials are available for this purpose as current_cred() will return the current set still at this point.
When replacing the group list, the new list must be sorted before it is added to the credential, as a binary search is used to test for membership. In practice, this means groups_sort() should be called before set_groups() or set_current_groups(). groups_sort() must not be called on a struct group_info which is shared as it may permute elements as part of the sorting process even if the array is already sorted.
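The reason the group list has to be sorted first can be seen in a small userspace model of the membership test. This is only a sketch: model_groups_sort() and model_in_group() are hypothetical helpers built on the C library's qsort() and bsearch(), not the kernel's groups_sort() or its membership check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/types.h>

/* Three-way comparison of two GIDs, usable by both qsort() and bsearch(). */
static int model_cmp_gid(const void *a, const void *b)
{
    gid_t x = *(const gid_t *)a;
    gid_t y = *(const gid_t *)b;

    return (x > y) - (x < y);
}

/* Sort a private (unshared) group array in place. */
void model_groups_sort(gid_t *gids, size_t n)
{
    assert(gids != NULL || n == 0);
    if (n > 1)
        qsort(gids, n, sizeof(*gids), model_cmp_gid);
}

/* Binary search for 'gid'; only meaningful once the array is sorted. */
bool model_in_group(const gid_t *gids, size_t n, gid_t gid)
{
    assert(gids != NULL || n == 0);
    if (n == 0)
        return false;
    return bsearch(&gid, gids, n, sizeof(*gids), model_cmp_gid) != NULL;
}
```

Calling model_in_group() on an unsorted array may miss an entry that is present, which is why the sort has to happen before the list is attached to a credential set.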
When the credential set is ready, it should be committed to the current process by calling:
int commit_creds(struct cred *new);
This will alter various aspects of the credentials and the process, giving the LSM a chance to do likewise, then it will use rcu_assign_pointer() to actually commit the new credentials to current->cred, it will release current->cred_replace_mutex to allow ptrace() to take place, and it will notify the scheduler and others of the changes.
This function is guaranteed to return 0, so that it can be tail-called at the end of such functions as sys_setresuid().
Note that this function consumes the caller’s reference to the new credentials. The caller should _not_ call put_cred() on the new credentials afterwards.
Furthermore, once this function has been called on a new set of credentials, those credentials may _not_ be changed further.
Should the security checks fail or some other error occur after prepare_creds() has been called, then the following function should be invoked:
void abort_creds(struct cred *new);
This releases the lock on current->cred_replace_mutex that prepare_creds() got and then releases the new credentials.
A typical credentials alteration function would look something like this:
int alter_suid(uid_t suid)
{
    struct cred *new;
    int ret;
    new = prepare_creds();
    if (!new)
        return -ENOMEM;
    new->suid = suid;
    ret = security_alter_suid(new);
    if (ret < 0) {
        abort_creds(new);
        return ret;
    }
    return commit_creds(new);
}
Managing Credentials
There are some functions to help manage credentials:
void put_cred(const struct cred *cred);
This releases a reference to the given set of credentials. If the reference count reaches zero, the credentials will be scheduled for destruction by the RCU system.
const struct cred *get_cred(const struct cred *cred);
This gets a reference on a live set of credentials, returning a pointer to that set of credentials.
struct cred *get_new_cred(struct cred *cred);
This gets a reference on a set of credentials that is under construction and is thus still mutable, returning a pointer to that set of credentials.
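The get/put pattern behind these helpers can be sketched with a userspace model. The names below (struct model_rcred, model_get_cred(), model_put_cred() and so on) are hypothetical, the counter is a plain int rather than an atomic, and the object is freed immediately when the count reaches zero, whereas the kernel schedules destruction through RCU.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical refcounted credential set; not the kernel's struct cred. */
struct model_rcred {
    int usage;      /* reference count */
    uid_t uid;
};

/* Counts destroyed sets, purely so the behaviour can be observed. */
int model_destroyed_count;

/* Allocate a new set holding a single reference owned by the caller. */
struct model_rcred *model_alloc_cred(uid_t uid)
{
    struct model_rcred *c = malloc(sizeof(*c));

    if (c) {
        c->usage = 1;
        c->uid = uid;
    }
    return c;
}

/* Take a reference through a const pointer and hand the pointer back. */
const struct model_rcred *model_get_cred(const struct model_rcred *cred)
{
    assert(cred != NULL && cred->usage > 0);
    /* Ditch the const qualification only to alter the count. */
    ((struct model_rcred *)cred)->usage++;
    return cred;
}

/* Drop a reference; destroy the set when the last one goes away. */
void model_put_cred(const struct model_rcred *cred)
{
    struct model_rcred *c = (struct model_rcred *)cred;

    assert(c != NULL && c->usage > 0);
    if (--c->usage == 0) {
        model_destroyed_count++;
        free(c);
    }
}
```

Every successful get must be balanced by exactly one put, and the pointer must not be used after the final put.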
Open File Credentials
When a new file is opened, a reference is obtained on the opening task’s credentials and this is attached to the file struct as f_cred in place of f_uid and f_gid. Code that used to access file->f_uid and file->f_gid should now access file->f_cred->fsuid and file->f_cred->fsgid.
It is safe to access f_cred without the use of RCU or locking because the pointer will not change over the lifetime of the file struct, and nor will the contents of the cred struct pointed to, barring the exceptions listed above (see the Task Credentials section).
To avoid “confused deputy” privilege escalation attacks, access control checks during subsequent operations on an opened file should use these credentials instead of current’s credentials, as the file may have been passed to a more privileged process.
Overriding the VFS’s Use of Credentials
Under some circumstances it is desirable to override the credentials used by the VFS, and that can be done by calling into functions such as vfs_mkdir() with a different set of credentials. This is done in the following places:
- sys_faccessat().
- do_coredump().
- nfs4recover.c.