BPF_MAP_TYPE_CGROUP_STORAGE

The BPF_MAP_TYPE_CGROUP_STORAGE map type represents a local fixed-size storage. It is only available with CONFIG_CGROUP_BPF, and to programs that attach to cgroups; the programs are made available by the same Kconfig. The storage is identified by the cgroup the program is attached to.

The map provides local storage at the cgroup that the BPF program is attached to. It provides faster and simpler access than the general-purpose hash table, which performs hash table lookups and requires the user to track live cgroups on their own.

This document describes the usage and semantics of the BPF_MAP_TYPE_CGROUP_STORAGE map type. Some of its behaviors were changed in Linux 5.9, and this document describes the differences.

Usage

The map uses a key of type either __u64 cgroup_inode_id or struct bpf_cgroup_storage_key, declared in linux/bpf.h:

struct bpf_cgroup_storage_key {
        __u64 cgroup_inode_id;
        __u32 attach_type;
};

cgroup_inode_id is the inode id of the cgroup directory. attach_type is the program’s attach type.
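As a hedged illustration of where this id can come from in userspace, the sketch below reads the inode number of a cgroup directory with stat(2). It assumes a cgroup v2 hierarchy on a 64-bit system, where the inode number of the directory can be used as the cgroup id. The helper name cgroup_inode_id_of is hypothetical and is not part of any library.

```c
#include <errno.h>
#include <stdint.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: store the inode number of the directory at `path`
 * in *id. Returns 0 on success and -1 on failure, with errno set.
 */
int cgroup_inode_id_of(const char *path, uint64_t *id)
{
        struct stat st;

        if (stat(path, &st) != 0)
                return -1;              /* errno set by stat() */
        if (!S_ISDIR(st.st_mode)) {     /* a cgroup is always a directory */
                errno = ENOTDIR;
                return -1;
        }
        *id = (uint64_t)st.st_ino;
        return 0;
}
```

The resulting value can then be passed as the cgroup_inode_id part of the key in the userspace lookups shown under Examples.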

Linux 5.9 added support for type __u64 cgroup_inode_id as the key type. When this key type is used, all attach types of the particular cgroup and map share the same storage. Otherwise, if the type is struct bpf_cgroup_storage_key, programs of different attach types are isolated and see different storages.

To access the storage in a program, use bpf_get_local_storage:

void *bpf_get_local_storage(void *map, u64 flags)

flags is reserved for future use and must be 0.

There is no implicit synchronization. Storages of BPF_MAP_TYPE_CGROUP_STORAGE can be accessed by multiple programs across different CPUs, and users should take care of synchronization themselves. The bpf infrastructure provides struct bpf_spin_lock to synchronize the storage. See tools/testing/selftests/bpf/progs/test_spin_lock.c.

Examples

Usage with key type as struct bpf_cgroup_storage_key:

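#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
        __type(key, struct bpf_cgroup_storage_key);
        __type(value, __u32);
} cgroup_storage SEC(".maps");

/* The section name is illustrative; use the one matching your attach type. */
SEC("cgroup_skb/egress")
int program(struct __sk_buff *skb)
{
        /* Storage of the cgroup this program is attached to. */
        __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);

        /* Atomic increment: the program may run on several CPUs at once. */
        __sync_fetch_and_add(ptr, 1);
        return 1; /* for cgroup_skb programs, 1 lets the packet through */
}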

Userspace accessing map declared above:

#include <linux/bpf.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

__u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
{
        struct bpf_cgroup_storage_key key = {
                .cgroup_inode_id = cgrp,
                .attach_type = type,
        };
        __u32 value = 0;

        bpf_map_lookup_elem(bpf_map__fd(map), &key, &value);
        // error checking omitted
        return value;
}

Alternatively, using just __u64 cgroup_inode_id as key type:

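#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE);
        __type(key, __u64);
        __type(value, __u32);
} cgroup_storage SEC(".maps");

/* The section name is illustrative; use the one matching your attach type. */
SEC("cgroup_skb/egress")
int program(struct __sk_buff *skb)
{
        /* Storage shared by all attach types of this cgroup and map. */
        __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0);

        /* Atomic increment: the program may run on several CPUs at once. */
        __sync_fetch_and_add(ptr, 1);
        return 1; /* for cgroup_skb programs, 1 lets the packet through */
}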

And userspace:

#include <linux/bpf.h>
#include <bpf/bpf.h>
#include <bpf/libbpf.h>

__u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type)
{
        __u32 value = 0;

        /* type is unused: with a __u64 key the attach type is ignored */
        bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value);
        // error checking omitted
        return value;
}

Semantics

BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE is a variant of this map type. This per-CPU variant has a different memory region for each CPU for each storage. The non-per-CPU variant has the same memory region for each storage.
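One practical consequence for userspace: a lookup on the per-CPU variant returns one value slot per possible CPU, with each slot padded to a multiple of 8 bytes. The sketch below sums __u32 counters from such a buffer. It assumes the buffer has already been filled, for example by bpf_map_lookup_elem() with room for libbpf_num_possible_cpus() slots; the two helper names are hypothetical and no BPF call is made here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Size of one per-CPU slot: the value size rounded up to 8 bytes. */
size_t percpu_slot_size(size_t value_size)
{
        return (value_size + 7) & ~(size_t)7;
}

/*
 * Sum the 32-bit counters held in a per-CPU lookup buffer.
 * `buf` holds `ncpus` slots of percpu_slot_size(sizeof(uint32_t)) bytes.
 */
uint64_t percpu_sum_u32(const void *buf, size_t ncpus)
{
        const unsigned char *p = buf;
        size_t stride = percpu_slot_size(sizeof(uint32_t));
        uint64_t total = 0;

        for (size_t cpu = 0; cpu < ncpus; cpu++) {
                uint32_t v;

                /* memcpy avoids alignment assumptions about the buffer */
                memcpy(&v, p + cpu * stride, sizeof(v));
                total += v;
        }
        return total;
}
```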

Prior to Linux 5.9, the lifetime of a storage was precisely per-attachment, and for a single CGROUP_STORAGE map, there could be at most one program loaded that used the map. A program could be attached to multiple cgroups or have multiple attach types, and each attach created a fresh zeroed storage. The storage was freed upon detach.

In those versions there was a one-to-one association between the map of each type (per-CPU and non-per-CPU) and the BPF program during load verification time. As a result, each map could only be used by one BPF program and each BPF program could only use one storage map of each type. Because a map could only be used by one BPF program, sharing a cgroup’s storage with other BPF programs was impossible.

Since Linux 5.9, storage can be shared by multiple programs. When a program is attached to a cgroup, the kernel creates a new storage only if the map does not already contain an entry for the cgroup and attach type pair; otherwise the existing storage is reused for the new attachment. If the map is attach type shared, the attach type is simply ignored during comparison. Storage is freed only when either the map or the cgroup it is attached to is being freed. Detaching does not directly free the storage, but it may cause the reference count of the map to reach zero, indirectly freeing all storage in the map.
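The lookup-or-create rule for Linux 5.9 and later can be modelled in plain C. This is a userspace toy and not kernel code: every struct and function name is hypothetical, the table is a fixed-size array, and freeing is not modelled. It only illustrates that an attachment reuses the entry matching the cgroup and attach type pair, and that the attach type is ignored when the map is attach type shared.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ENTRIES 16

struct model_storage {
        uint64_t cgroup_inode_id;
        uint32_t attach_type;
        uint32_t value;           /* zero in a freshly created storage */
};

struct model_map {
        bool attach_type_shared;  /* true when the key type is __u64 */
        size_t nr;                /* number of entries in use */
        struct model_storage entries[MODEL_MAX_ENTRIES];
};

/* Return the storage an attachment would use, or NULL if the toy table is full. */
struct model_storage *model_attach(struct model_map *map, uint64_t cgrp,
                                   uint32_t type)
{
        for (size_t i = 0; i < map->nr; i++) {
                struct model_storage *s = &map->entries[i];

                if (s->cgroup_inode_id != cgrp)
                        continue;
                if (map->attach_type_shared || s->attach_type == type)
                        return s; /* existing storage is reused */
        }
        if (map->nr == MODEL_MAX_ENTRIES)
                return NULL;
        map->entries[map->nr] = (struct model_storage){
                .cgroup_inode_id = cgrp,
                .attach_type = type,
                .value = 0,
        };
        return &map->entries[map->nr++];
}
```

In this model, two attachments to the same cgroup with different attach types get two entries when attach_type_shared is false and a single entry when it is true.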

The map is not associated with any BPF program, thus making sharing possible. However, the BPF program can still only associate with one map of each type (per-CPU and non-per-CPU). A BPF program cannot use more than one BPF_MAP_TYPE_CGROUP_STORAGE or more than one BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE.

In all versions, userspace may use the attach parameters of the cgroup and attach type pair in struct bpf_cgroup_storage_key as the key to the BPF map APIs to read or update the storage for a given attachment. For Linux 5.9 attach type shared storages, only the first value in the struct, the cgroup inode id, is used during comparison, so userspace may just specify a __u64 directly.

The storage is bound at attach time. Even if the program is attached to a parent cgroup and triggers in a child, the storage still belongs to the parent.

Userspace cannot create a new entry in the map or delete an existing entry. Program test runs always use a temporary storage.