| .. SPDX-License-Identifier: GPL-2.0-only | 
 | .. Copyright (C) 2020 Google LLC. | 
 |  | 
 | =========================== | 
 | BPF_MAP_TYPE_CGROUP_STORAGE | 
 | =========================== | 
 |  | 
 | The ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type represents a local fix-sized | 
 | storage. It is only available with ``CONFIG_CGROUP_BPF``, and to programs that | 
 | attach to cgroups; the programs are made available by the same Kconfig. The | 
 | storage is identified by the cgroup the program is attached to. | 
 |  | 
 | The map provide a local storage at the cgroup that the BPF program is attached | 
 | to. It provides a faster and simpler access than the general purpose hash | 
 | table, which performs a hash table lookups, and requires user to track live | 
 | cgroups on their own. | 
 |  | 
 | This document describes the usage and semantics of the | 
 | ``BPF_MAP_TYPE_CGROUP_STORAGE`` map type. Some of its behaviors was changed in | 
 | Linux 5.9 and this document will describe the differences. | 
 |  | 
 | Usage | 
 | ===== | 
 |  | 
 | The map uses key of type of either ``__u64 cgroup_inode_id`` or | 
 | ``struct bpf_cgroup_storage_key``, declared in ``linux/bpf.h``:: | 
 |  | 
 |     struct bpf_cgroup_storage_key { | 
 |             __u64 cgroup_inode_id; | 
 |             __u32 attach_type; | 
 |     }; | 
 |  | 
 | ``cgroup_inode_id`` is the inode id of the cgroup directory. | 
 | ``attach_type`` is the program's attach type. | 
 |  | 
 | Linux 5.9 added support for type ``__u64 cgroup_inode_id`` as the key type. | 
 | When this key type is used, then all attach types of the particular cgroup and | 
 | map will share the same storage. Otherwise, if the type is | 
 | ``struct bpf_cgroup_storage_key``, then programs of different attach types | 
 | be isolated and see different storages. | 
 |  | 
 | To access the storage in a program, use ``bpf_get_local_storage``:: | 
 |  | 
 |     void *bpf_get_local_storage(void *map, u64 flags) | 
 |  | 
 | ``flags`` is reserved for future use and must be 0. | 
 |  | 
 | There is no implicit synchronization. Storages of ``BPF_MAP_TYPE_CGROUP_STORAGE`` | 
 | can be accessed by multiple programs across different CPUs, and user should | 
 | take care of synchronization by themselves. The bpf infrastructure provides | 
 | ``struct bpf_spin_lock`` to synchronize the storage. See | 
 | ``tools/testing/selftests/bpf/progs/test_spin_lock.c``. | 
 |  | 
 | Examples | 
 | ======== | 
 |  | 
 | Usage with key type as ``struct bpf_cgroup_storage_key``:: | 
 |  | 
 |     #include <bpf/bpf.h> | 
 |  | 
 |     struct { | 
 |             __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE); | 
 |             __type(key, struct bpf_cgroup_storage_key); | 
 |             __type(value, __u32); | 
 |     } cgroup_storage SEC(".maps"); | 
 |  | 
 |     int program(struct __sk_buff *skb) | 
 |     { | 
 |             __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0); | 
 |             __sync_fetch_and_add(ptr, 1); | 
 |  | 
 |             return 0; | 
 |     } | 
 |  | 
 | Userspace accessing map declared above:: | 
 |  | 
 |     #include <linux/bpf.h> | 
 |     #include <linux/libbpf.h> | 
 |  | 
 |     __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type) | 
 |     { | 
 |             struct bpf_cgroup_storage_key = { | 
 |                     .cgroup_inode_id = cgrp, | 
 |                     .attach_type = type, | 
 |             }; | 
 |             __u32 value; | 
 |             bpf_map_lookup_elem(bpf_map__fd(map), &key, &value); | 
 |             // error checking omitted | 
 |             return value; | 
 |     } | 
 |  | 
 | Alternatively, using just ``__u64 cgroup_inode_id`` as key type:: | 
 |  | 
 |     #include <bpf/bpf.h> | 
 |  | 
 |     struct { | 
 |             __uint(type, BPF_MAP_TYPE_CGROUP_STORAGE); | 
 |             __type(key, __u64); | 
 |             __type(value, __u32); | 
 |     } cgroup_storage SEC(".maps"); | 
 |  | 
 |     int program(struct __sk_buff *skb) | 
 |     { | 
 |             __u32 *ptr = bpf_get_local_storage(&cgroup_storage, 0); | 
 |             __sync_fetch_and_add(ptr, 1); | 
 |  | 
 |             return 0; | 
 |     } | 
 |  | 
 | And userspace:: | 
 |  | 
 |     #include <linux/bpf.h> | 
 |     #include <linux/libbpf.h> | 
 |  | 
 |     __u32 map_lookup(struct bpf_map *map, __u64 cgrp, enum bpf_attach_type type) | 
 |     { | 
 |             __u32 value; | 
 |             bpf_map_lookup_elem(bpf_map__fd(map), &cgrp, &value); | 
 |             // error checking omitted | 
 |             return value; | 
 |     } | 
 |  | 
 | Semantics | 
 | ========= | 
 |  | 
 | ``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE`` is a variant of this map type. This | 
 | per-CPU variant will have different memory regions for each CPU for each | 
 | storage. The non-per-CPU will have the same memory region for each storage. | 
 |  | 
 | Prior to Linux 5.9, the lifetime of a storage is precisely per-attachment, and | 
 | for a single ``CGROUP_STORAGE`` map, there can be at most one program loaded | 
 | that uses the map. A program may be attached to multiple cgroups or have | 
 | multiple attach types, and each attach creates a fresh zeroed storage. The | 
 | storage is freed upon detach. | 
 |  | 
 | There is a one-to-one association between the map of each type (per-CPU and | 
 | non-per-CPU) and the BPF program during load verification time. As a result, | 
 | each map can only be used by one BPF program and each BPF program can only use | 
 | one storage map of each type. Because of map can only be used by one BPF | 
 | program, sharing of this cgroup's storage with other BPF programs were | 
 | impossible. | 
 |  | 
 | Since Linux 5.9, storage can be shared by multiple programs. When a program is | 
 | attached to a cgroup, the kernel would create a new storage only if the map | 
 | does not already contain an entry for the cgroup and attach type pair, or else | 
 | the old storage is reused for the new attachment. If the map is attach type | 
 | shared, then attach type is simply ignored during comparison. Storage is freed | 
 | only when either the map or the cgroup attached to is being freed. Detaching | 
 | will not directly free the storage, but it may cause the reference to the map | 
 | to reach zero and indirectly freeing all storage in the map. | 
 |  | 
 | The map is not associated with any BPF program, thus making sharing possible. | 
 | However, the BPF program can still only associate with one map of each type | 
 | (per-CPU and non-per-CPU). A BPF program cannot use more than one | 
 | ``BPF_MAP_TYPE_CGROUP_STORAGE`` or more than one | 
 | ``BPF_MAP_TYPE_PERCPU_CGROUP_STORAGE``. | 
 |  | 
 | In all versions, userspace may use the attach parameters of cgroup and | 
 | attach type pair in ``struct bpf_cgroup_storage_key`` as the key to the BPF map | 
 | APIs to read or update the storage for a given attachment. For Linux 5.9 | 
 | attach type shared storages, only the first value in the struct, cgroup inode | 
 | id, is used during comparison, so userspace may just specify a ``__u64`` | 
 | directly. | 
 |  | 
 | The storage is bound at attach time. Even if the program is attached to parent | 
 | and triggers in child, the storage still belongs to the parent. | 
 |  | 
 | Userspace cannot create a new entry in the map or delete an existing entry. | 
 | Program test runs always use a temporary storage. |