Adding reference counters (krefs) to kernel objects

Author:

Corey Minyard <minyard@acm.org>

Author:

Thomas Hellström <thomas.hellstrom@linux.intel.com>

A lot of this was lifted from Greg Kroah-Hartman’s 2004 OLS paper and presentation on krefs, which can be found at:

Introduction

krefs allow you to add reference counters to your objects. If you have objects that are used in multiple places and passed around, and you don’t have refcounts, your code is almost certainly broken. If you want refcounts, krefs are the way to go.

To use a kref, add one to your data structures like:

    struct my_data
    {
        .
        .
        struct kref refcount;
        .
        .
    };

The kref can occur anywhere within the data structure.
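The position does not matter because a release callback receives only a pointer to the embedded kref and recovers the enclosing object with container_of(). The following is a userspace sketch of that pointer arithmetic, not kernel code: struct kref_model, struct my_data_model, CONTAINER_OF and owner_of() are illustrative names assumed for this example only.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace stand-in for struct kref (assumption: a plain int counter). */
struct kref_model {
    int count;
};

/* Hypothetical object with the counter placed in the middle. */
struct my_data_model {
    int before;
    struct kref_model refcount;
    int after;
};

/* Simplified container_of(): subtract the member offset from its address. */
#define CONTAINER_OF(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

/* Map a pointer to the embedded counter back to the enclosing object. */
static struct my_data_model *owner_of(struct kref_model *ref)
{
    assert(ref != NULL);
    return CONTAINER_OF(ref, struct my_data_model, refcount);
}
```

Calling owner_of(&d.refcount) on an object d yields &d regardless of which fields precede or follow the counter.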

Initialization

You must initialize the kref after you allocate it. To do this, call kref_init() as follows:

    struct my_data *data;

    data = kmalloc(sizeof(*data), GFP_KERNEL);
    if (!data)
        return -ENOMEM;
    kref_init(&data->refcount);

This sets the refcount in the kref to 1.

Kref rules

Once you have an initialized kref, you must follow these rules:

  1. If you make a non-temporary copy of a pointer, especially if it can be passed to another thread of execution, you must increment the refcount with kref_get() before passing it off:

    kref_get(&data->refcount);

    If you already have a valid pointer to a kref-ed structure (the refcount cannot go to zero) you may do this without a lock.

  2. When you are done with a pointer, you must call kref_put():

    kref_put(&data->refcount, data_release);

    If this is the last reference to the pointer, the release routine will be called. If the code never tries to get a valid pointer to a kref-ed structure without already holding a valid pointer, it is safe to do this without a lock.

  3. If the code attempts to gain a reference to a kref-ed structure without already holding a valid pointer, it must serialize access so that a kref_put() cannot occur during the kref_get(), and the structure must remain valid during the kref_get().
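The counting behind rules 1 and 2 can be sketched in userspace with C11 atomics. This is a simplified model under stated assumptions, not the kernel implementation (the real struct kref wraps a refcount_t with saturation checks); kref_model_*, note_release() and the released counter are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

/* Userspace stand-in for struct kref (assumption: a plain atomic int). */
struct kref_model {
    atomic_int count;
};

static void kref_model_init(struct kref_model *k)
{
    atomic_init(&k->count, 1);  /* the creator holds the first reference */
}

static void kref_model_get(struct kref_model *k)
{
    int old = atomic_fetch_add(&k->count, 1);

    /* Rule 1: the caller must already hold a valid reference. */
    assert(old > 0);
    (void)old;
}

/* Returns 1 if this call dropped the last reference, 0 otherwise. */
static int kref_model_put(struct kref_model *k,
                          void (*release)(struct kref_model *k))
{
    if (atomic_fetch_sub(&k->count, 1) == 1) {
        release(k);
        return 1;
    }
    return 0;
}

static int released;    /* counts release calls, for demonstration only */

static void note_release(struct kref_model *k)
{
    (void)k;
    released++;
}
```

After kref_model_init() and one kref_model_get() the count is 2, so the first kref_model_put() returns 0 and only the second one invokes the release callback and returns 1.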

For example, if you allocate some data and then pass it to another thread to process:

    void data_release(struct kref *ref)
    {
        struct my_data *data = container_of(ref, struct my_data, refcount);

        kfree(data);
    }

    /* kthread_run() expects a thread function of type int (*)(void *). */
    int more_data_handling(void *cb_data)
    {
        struct my_data *data = cb_data;
        .
        . do stuff with data here
        .
        kref_put(&data->refcount, data_release);
        return 0;
    }

    int my_data_handler(void)
    {
        int rv = 0;
        struct my_data *data;
        struct task_struct *task;

        data = kmalloc(sizeof(*data), GFP_KERNEL);
        if (!data)
            return -ENOMEM;
        kref_init(&data->refcount);

        /* Take the thread's reference before handing the pointer over. */
        kref_get(&data->refcount);
        task = kthread_run(more_data_handling, data, "more_data_handling");
        if (IS_ERR(task)) {
            rv = PTR_ERR(task);
            /* The thread never started, so drop its reference here. */
            kref_put(&data->refcount, data_release);
            goto out;
        }
        .
        . do stuff with data here
        .
    out:
        kref_put(&data->refcount, data_release);
        return rv;
    }

This way, it doesn’t matter in what order the two threads handle the data; kref_put() takes care of knowing when the data is no longer referenced and releasing it. The kref_get() does not require a lock, since we already have a valid pointer that we own a refcount for. The put needs no lock because nothing tries to get the data without already holding a pointer.

In the above example, kref_put() will be called 2 times in both success and error paths. This is necessary because the reference count got incremented 2 times by kref_init() and kref_get().

Note that the “before” in rule 1 is very important. You should never do something like:

    task = kthread_run(more_data_handling, data, "more_data_handling");
    if (IS_ERR(task)) {
        rv = PTR_ERR(task);
        goto out;
    } else
        /* BAD BAD BAD - get is after the handoff */
        kref_get(&data->refcount);

Don’t assume you know what you are doing and use the above construct. First of all, you may not know what you are doing. Second, you may know what you are doing (there are some situations where locking is involved where the above may be legal) but someone else who doesn’t know what they are doing may change the code or copy the code. It’s bad style. Don’t do it.

There are some situations where you can optimize the gets and puts. For instance, if you are done with an object and enqueuing it for something else or passing it off to something else, there is no reason to do a get then a put:

    /* Silly extra get and put */
    kref_get(&obj->ref);
    enqueue(obj);
    kref_put(&obj->ref, obj_cleanup);

Just do the enqueue. A comment about this is always welcome:

    enqueue(obj);
    /* We are done with obj, so we pass our refcount off
       to the queue.  DON'T TOUCH obj AFTER HERE! */

The last rule (rule 3) is the nastiest one to handle. Say, for instance, you have a list of items that are each kref-ed, and you wish to get the first one. You can’t just pull the first item off the list and kref_get() it. That violates rule 3 because you are not already holding a valid pointer. You must add a mutex (or some other lock). For instance:

    static DEFINE_MUTEX(mutex);
    static LIST_HEAD(q);
    struct my_data
    {
            struct kref      refcount;
            struct list_head link;
    };

    static struct my_data *get_entry(void)
    {
            struct my_data *entry = NULL;

            mutex_lock(&mutex);
            if (!list_empty(&q)) {
                    entry = container_of(q.next, struct my_data, link);
                    kref_get(&entry->refcount);
            }
            mutex_unlock(&mutex);
            return entry;
    }

    static void release_entry(struct kref *ref)
    {
            struct my_data *entry = container_of(ref, struct my_data, refcount);

            list_del(&entry->link);
            kfree(entry);
    }

    static void put_entry(struct my_data *entry)
    {
            mutex_lock(&mutex);
            kref_put(&entry->refcount, release_entry);
            mutex_unlock(&mutex);
    }

The kref_put() return value is useful if you do not want to hold the lock during the whole release operation. Say you didn’t want to call kfree() with the lock held in the example above (since it is kind of pointless to do so). You could use kref_put() as follows:

    static void release_entry(struct kref *ref)
    {
            /* All work is done after the return from kref_put(). */
    }

    static void put_entry(struct my_data *entry)
    {
            mutex_lock(&mutex);
            if (kref_put(&entry->refcount, release_entry)) {
                    list_del(&entry->link);
                    mutex_unlock(&mutex);
                    kfree(entry);
            } else
                    mutex_unlock(&mutex);
    }

This is really more useful if you have to call other routines as part of the free operations that could take a long time or might claim the same lock. Note that doing everything in the release routine is still preferred as it is a little neater.

The above example could also be optimized using kref_get_unless_zero() in the following way:

    static struct my_data *get_entry(void)
    {
            struct my_data *entry = NULL;

            mutex_lock(&mutex);
            if (!list_empty(&q)) {
                    entry = container_of(q.next, struct my_data, link);
                    if (!kref_get_unless_zero(&entry->refcount))
                            entry = NULL;
            }
            mutex_unlock(&mutex);
            return entry;
    }

    static void release_entry(struct kref *ref)
    {
            struct my_data *entry = container_of(ref, struct my_data, refcount);

            mutex_lock(&mutex);
            list_del(&entry->link);
            mutex_unlock(&mutex);
            kfree(entry);
    }

    static void put_entry(struct my_data *entry)
    {
            kref_put(&entry->refcount, release_entry);
    }

This removes the need to hold the mutex around kref_put() in put_entry(), but it’s important that kref_get_unless_zero() is enclosed in the same critical section that finds the entry in the lookup table; otherwise kref_get_unless_zero() may reference already freed memory. Note that it is illegal to use kref_get_unless_zero() without checking its return value. If you are sure (by already having a valid pointer) that kref_get_unless_zero() will return true, then use kref_get() instead.
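Why the return value must be checked becomes clearer from a compare-and-swap sketch of the operation. This is a userspace model using C11 atomics, offered as an illustration only; in the kernel, kref_get_unless_zero() is built on refcount_inc_not_zero(), and the names struct kref_model and kref_model_get_unless_zero() are assumptions of this example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Userspace stand-in for struct kref (assumption: a plain atomic int). */
struct kref_model {
    atomic_int count;
};

/*
 * Take a reference only if the object is still alive.
 * Returns 1 on success and 0 if the count had already reached zero,
 * in which case the caller must not use the object.
 */
static int kref_model_get_unless_zero(struct kref_model *k)
{
    int old;

    assert(k != NULL);
    old = atomic_load(&k->count);
    while (old != 0) {
        /* On failure, old is reloaded with the current value; retry. */
        if (atomic_compare_exchange_weak(&k->count, &old, old + 1))
            return 1;
    }
    return 0;
}
```

A count of zero means a release is already in progress, so the increment is refused and the count is left untouched. The model still reads k->count, which is why the memory holding the counter has to stay valid for the duration of the call.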

Krefs and RCU

The function kref_get_unless_zero() also makes it possible to use RCU locking for lookups in the above example:

    struct my_data
    {
            struct rcu_head rhead;
            .
            struct kref refcount;
            .
            .
    };

    static struct my_data *get_entry_rcu(void)
    {
            struct my_data *entry = NULL;

            rcu_read_lock();
            if (!list_empty(&q)) {
                    entry = container_of(q.next, struct my_data, link);
                    if (!kref_get_unless_zero(&entry->refcount))
                            entry = NULL;
            }
            rcu_read_unlock();
            return entry;
    }

    static void release_entry_rcu(struct kref *ref)
    {
            struct my_data *entry = container_of(ref, struct my_data, refcount);

            mutex_lock(&mutex);
            list_del_rcu(&entry->link);
            mutex_unlock(&mutex);
            kfree_rcu(entry, rhead);
    }

    static void put_entry(struct my_data *entry)
    {
            kref_put(&entry->refcount, release_entry_rcu);
    }

But note that the struct kref member needs to remain in valid memory for an RCU grace period after release_entry_rcu() was called. That can be accomplished by using kfree_rcu(entry, rhead) as done above, or by calling synchronize_rcu() before using kfree(), but note that synchronize_rcu() may sleep for a substantial amount of time.

Functions and structures

void kref_init(struct kref *kref)

initialize object.

Parameters

struct kref *kref

object in question.

void kref_get(struct kref *kref)

increment refcount for object.

Parameters

struct kref *kref

object.

int kref_put(struct kref *kref, void (*release)(struct kref *kref))

Decrement refcount for object

Parameters

struct kref *kref

Object

void (*release)(struct kref *kref)

Pointer to the function that will clean up the object when the last reference to the object is released.

Description

Decrement the refcount, and if 0, call release. The caller may not pass NULL or kfree() as the release function.

Return

1 if this call removed the object, otherwise return 0. Beware, if this function returns 0, another caller may have removed the object by the time this function returns. The return value is only certain if you want to see if the object is definitely released.

int kref_put_mutex(struct kref *kref, void (*release)(struct kref *kref), struct mutex *mutex)

Decrement refcount for object

Parameters

struct kref *kref

Object

void (*release)(struct kref *kref)

Pointer to the function that will clean up the object when the last reference to the object is released.

struct mutex *mutex

Mutex which protects the release function.

Description

This variant of kref_put() calls the release function with the mutex held. The release function will release the mutex.

int kref_put_lock(struct kref *kref, void (*release)(struct kref *kref), spinlock_t *lock)

Decrement refcount for object

Parameters

struct kref *kref

Object

void (*release)(struct kref *kref)

Pointer to the function that will clean up the object when the last reference to the object is released.

spinlock_t *lock

Spinlock which protects the release function.

Description

This variant of kref_put() calls the release function with the lock held. The release function will release the lock.
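The contract described above (lock taken only when the last reference may be dropped, release called with the lock held, release responsible for unlocking) can be modelled in userspace as follows. This is a rough sketch under stated assumptions rather than the kernel code: struct toy_lock is a toy spinlock standing in for spinlock_t, and every name in the block is hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct kref_model { atomic_int count; };

/* Toy spinlock standing in for spinlock_t (assumption of this sketch). */
struct toy_lock { atomic_flag held; };

static void toy_lock_acquire(struct toy_lock *l)
{
    while (atomic_flag_test_and_set(&l->held))
        ;   /* spin until the flag was previously clear */
}

static void toy_lock_release(struct toy_lock *l)
{
    atomic_flag_clear(&l->held);
}

/* Decrement unless the count is 1 (or lower); false means "not done". */
static bool dec_not_one(struct kref_model *k)
{
    int old = atomic_load(&k->count);

    while (old > 1) {
        if (atomic_compare_exchange_weak(&k->count, &old, old - 1))
            return true;
    }
    return false;
}

static int kref_model_put_lock(struct kref_model *k,
                               void (*release)(struct kref_model *k),
                               struct toy_lock *lock)
{
    if (dec_not_one(k))
        return 0;               /* fast path: not the last reference */

    toy_lock_acquire(lock);
    if (atomic_fetch_sub(&k->count, 1) != 1) {
        toy_lock_release(lock); /* a reference was taken in the meantime */
        return 0;
    }
    release(k);                 /* called with the lock held; it unlocks */
    return 1;
}

static struct toy_lock table_lock = { ATOMIC_FLAG_INIT };
static int released;            /* counts release calls, for demonstration */

static void release_and_unlock(struct kref_model *k)
{
    (void)k;
    released++;                     /* e.g. unlink from a lookup table */
    toy_lock_release(&table_lock);  /* the release function drops the lock */
}
```

With a count of 2, the first put takes the lock-free fast path and returns 0. The second put acquires table_lock, calls release_and_unlock() with it held, and returns 1 with the lock already dropped.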

int kref_get_unless_zero(struct kref *kref)

Increment refcount for object unless it is zero.

Parameters

struct kref *kref

object.

Description

This function is intended to simplify locking around refcounting for objects that can be looked up from a lookup structure, and which are removed from that lookup structure in the object destructor. Operations on such objects require at least a read lock around lookup + kref_get, and a write lock around kref_put + remove from lookup structure. Furthermore, RCU implementations become extremely tricky. With a lookup followed by a kref_get_unless_zero() with return value check, locking in the kref_put path can be deferred to the actual removal from the lookup structure and RCU lookups become trivial.

Return

non-zero if the increment succeeded. Otherwise return 0.