| eBPF | |
|---|---|
| Original authors | Alexei Starovoitov, Daniel Borkmann[1][2] |
| Developers | Open source community, Meta, Google, Isovalent, Microsoft, Netflix[1] |
| Initial release | 2014[3] |
| Written in | C |
| Operating system | Linux, Windows[4] |
| Type | Runtime system |
| License | Linux: GPL; Windows: MIT License |
| Website | ebpf.io |
| Repository | Linux: git; Windows: GitHub |
eBPF is a technology that can run programs in a privileged context such as the operating system kernel.[5] It is the successor to the Berkeley Packet Filter (BPF, with the "e" originally meaning "extended") filtering mechanism in Linux and is also used in non-networking parts of the Linux kernel.
It is used to safely and efficiently extend the capabilities of the kernel at runtime without requiring changes to kernel source code or loading kernel modules.[6] Safety is provided through an in-kernel verifier, which performs static code analysis and rejects programs that could crash, hang, or otherwise interfere negatively with the kernel.[7][8]
This validation model differs from sandboxed environments, where the execution environment is restricted and the runtime has no insight into the program.[9] Examples of programs that are automatically rejected are programs without strong exit guarantees (e.g., for/while loops without exit conditions) and programs that dereference pointers without safety checks.[10]
Loaded programs that pass the verifier are either interpreted or in-kernel just-in-time compiled (JIT compiled) for native execution performance. The execution model is event-driven and, with few exceptions, run-to-completion,[2] meaning that programs can be attached to various hook points in the operating system kernel and are run when an event is triggered. eBPF use cases include (but are not limited to) networking such as XDP, tracing, and security subsystems.[5] Because eBPF's efficiency and flexibility opened up new possibilities for solving production issues, Brendan Gregg famously dubbed eBPF "superpowers for Linux".[11] Linus Torvalds said, "BPF has actually been really useful, and the real power of it is how it allows people to do specialized code that isn't enabled until asked for".[12] Due to its success in Linux, the eBPF runtime has been ported to other operating systems such as Windows.[4]
eBPF evolved from the classic Berkeley Packet Filter (cBPF, a retroactively applied name). At the most basic level, it introduced the use of ten 64-bit registers (instead of the two 32-bit registers of cBPF), different jump semantics, a call instruction and corresponding register passing convention, new instructions, and a different encoding for these instructions.[13]
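To make the "different encoding" concrete, the following is a minimal sketch of how a basic eBPF instruction can be packed into 64 bits, assuming the little-endian layout described in RFC 9669 (8-bit opcode, 4-bit destination and source register fields, 16-bit signed offset, 32-bit signed immediate). The helper names `encode_insn` and `decode_insn` are hypothetical and exist only for this illustration.

```python
import struct

# Opcodes taken from the eBPF ISA: 64-bit move of an immediate, and program exit.
MOV64_IMM = 0xB7
EXIT = 0x95


def encode_insn(opcode, dst, src, off, imm):
    """Pack one basic 64-bit eBPF instruction (little-endian layout)."""
    if not (0 <= dst <= 0xF and 0 <= src <= 0xF):
        raise ValueError("register fields are 4 bits wide")
    regs = (src << 4) | dst  # source in the high nibble, destination in the low nibble
    return struct.pack("<BBhi", opcode, regs, off, imm)


def decode_insn(raw):
    """Unpack 8 bytes into (opcode, dst, src, off, imm)."""
    opcode, regs, off, imm = struct.unpack("<BBhi", raw)
    return opcode, regs & 0x0F, regs >> 4, off, imm


# "r0 = 42; exit" as two 8-byte instructions.
program = encode_insn(MOV64_IMM, 0, 0, 0, 42) + encode_insn(EXIT, 0, 0, 0, 0)
```

Each instruction occupies exactly eight bytes, so the two-instruction program above is 16 bytes long.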
| Date | Event |
|---|---|
| April 2011 | The first in-kernel Linux just-in-time compiler (JIT compiler) for the classic Berkeley Packet Filter was merged.[14] |
| January 2012 | The first non-networking use case of the classic Berkeley Packet Filter, seccomp-bpf,[15] appeared; it allows filtering of system calls using a configurable policy implemented through BPF instructions. |
| March 2014 | David S. Miller, primary maintainer of the Linux networking stack, accepted the rework of the old in-kernel BPF interpreter. It was replaced by an eBPF interpreter, and the Linux kernel internally translates classic BPF (cBPF) into eBPF instructions.[16] It was released in version 3.18 of the Linux kernel.[17] |
| March 2015 | The ability to attach eBPF to kprobes as a first tracing use case was merged.[19] In the same month, initial infrastructure work was accepted to attach eBPF to the networking traffic control (tc) layer, allowing eBPF to be attached to the core ingress and, later, egress paths of the network stack. This capability was later heavily used by projects such as Cilium.[20][21][22] |
| August 2015 | The eBPF compiler backend was merged into the LLVM 3.7.0 release.[23] |
| September 2015 | Brendan Gregg announced a collection of new eBPF-based tracing tools as the bcc project, providing a front-end for eBPF to make it easier to write programs.[24] |
| July 2016 | eBPF gained the ability to be attached to the core receive path of network drivers. This layer is known today as eXpress DataPath (XDP) and was added as a response to DPDK to create a fast data path that works in combination with the Linux kernel rather than bypassing it.[25][26][27] |
| August 2016 | Cilium was initially announced during LinuxCon as a project providing fast IPv6 container networking with eBPF and XDP. Today, Cilium has been adopted by major cloud providers' Kubernetes offerings and is one of the most widely used CNIs.[28][22][29] |
| November 2016 | Netronome added offload of eBPF programs for XDP and tc BPF layer to their NIC.[30] |
| May 2017 | Meta's layer 4 load-balancer, Katran, went live. Every packet towards facebook.com since then has been processed by eBPF and XDP.[31] |
| September 2017 | Bpftool was added to the Linux kernel as a user space utility to introspect the eBPF subsystem.[33] |
| November 2017 | eBPF became its own kernel subsystem to ease the management of a continuously growing volume of kernel patches. The first pull request by eBPF maintainers was submitted.[32] |
| January 2018 | A new socket family called AF_XDP was published, allowing for high performance packet processing with zero-copy semantics at the XDP layer.[34] Today, DPDK has official AF_XDP poll-mode driver support.[35] |
| February 2018 | The bpfilter prototype was published, allowing translation of a subset of iptables rulesets into eBPF via a newly developed user mode driver. The work caused controversy due to the ongoing nftables development effort and has not been merged into mainline.[36][37] |
| October 2018 | The new bpftrace tool was announced by Brendan Gregg as DTrace 2.0 for Linux.[38] |
| November 2018 | eBPF introspection was added for kTLS to support in-kernel TLS policy enforcement.[39] |
| November 2018 | BTF (BPF Type Format) was added to the Linux kernel as an efficient metadata format that is approximately 100x smaller than DWARF.[40] |
| December 2019 | The first book on BPF, an 880-page volume written by Brendan Gregg, was released.[41] |
| March 2020 | Google upstreamed BPF LSM support into the Linux kernel, enabling programmable Linux Security Modules (LSMs) through eBPF.[42] |
| September 2020 | The eBPF compiler backend for the GNU Compiler Collection (GCC) was merged.[43] |
| July 2022 | Microsoft released eBPF for Windows, which runs code in the NT kernel.[4] |
| October 2024 | The eBPF instruction set architecture (ISA) was published as RFC 9669. |
eBPF maps are efficient key/value stores that reside in kernel space and can be used to share data among multiple eBPF programs or to communicate between a user space application and eBPF code running in the kernel. eBPF programs can leverage eBPF maps to store and retrieve data in a wide set of data structures. Map implementations are provided by the core kernel. There are various types,[44] including hash maps, arrays, and ring buffers.
In practice, eBPF maps are typically used for scenarios such as a user space program writing configuration information to be retrieved by an eBPF program, an eBPF program storing state for later retrieval by another eBPF program (or a future run of the same program), or an eBPF program writing results or metrics into a map for retrieval by a user space program that will present results.[45]
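The usage patterns above can be sketched with a toy, user-space model. This is not the kernel's map implementation or API: `ToyHashMap`, `toy_program`, and the `drop_port` key are hypothetical names, and the only property borrowed from real maps is that a map has a fixed maximum number of entries chosen at creation time.

```python
class ToyHashMap:
    """A bounded key/value store standing in for an eBPF hash map."""

    def __init__(self, max_entries):
        self.max_entries = max_entries
        self._data = {}

    def update(self, key, value):
        # Inserting a new key into a full map fails; overwriting is always allowed.
        if key not in self._data and len(self._data) >= self.max_entries:
            raise MemoryError("map is full")
        self._data[key] = value

    def lookup(self, key):
        # Returns None when the key is absent.
        return self._data.get(key)


config = ToyHashMap(max_entries=1)    # written by "user space", read by the program
counters = ToyHashMap(max_entries=4)  # written by the program, read by "user space"

# Pattern 1: user space writes configuration for the program to retrieve.
config.update("drop_port", 23)


def toy_program(packet_port):
    """Stand-in for an eBPF program that runs once per packet."""
    verdict = "drop" if packet_port == config.lookup("drop_port") else "pass"
    # Pattern 2: state stored for a future run of the same program.
    counters.update(verdict, (counters.lookup(verdict) or 0) + 1)
    return verdict


for port in (22, 23, 80, 23):
    toy_program(port)

# Pattern 3: user space retrieves the metrics to present them.
summary = {name: counters.lookup(name) for name in ("pass", "drop")}
```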
The eBPF virtual machine runs within the kernel and takes in a program in the form of eBPF bytecode instructions, which are converted to native machine instructions that run on the CPU. Early implementations of eBPF interpreted the bytecode, but interpretation has now largely been replaced by a just-in-time (JIT) compilation process for performance and security-related reasons.[45]
The eBPF virtual machine consists of eleven 64-bit registers with 32-bit subregisters (ten general-purpose registers, R0 to R9, and a read-only frame pointer, R10), a program counter, and a 512-byte BPF stack space. The registers keep track of state while eBPF programs are executed.[46]
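As a rough illustration of this register file, the sketch below models eleven 64-bit registers and a 512-byte stack in plain Python. `ToyRegisterFile` and its methods are hypothetical names. Two behaviours are assumed from the ISA as described in RFC 9669 rather than from the text above: a write to a 32-bit subregister zero-extends into the upper half of the register, and R10 acts as a read-only frame pointer.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
FRAME_POINTER = 10  # assumed read-only, per the ISA


class ToyRegisterFile:
    """Eleven 64-bit registers (R0..R10) plus a 512-byte stack."""

    def __init__(self):
        self.regs = [0] * 11
        self.stack = bytearray(512)

    def write64(self, reg, value):
        if reg == FRAME_POINTER:
            raise PermissionError("R10 (frame pointer) is read-only")
        self.regs[reg] = value & MASK64

    def write32(self, reg, value):
        # Writing the 32-bit subregister clears the upper 32 bits.
        self.write64(reg, value & MASK32)

    def read32(self, reg):
        # The subregister is the low half of the 64-bit register.
        return self.regs[reg] & MASK32


rf = ToyRegisterFile()
rf.write64(1, 0xFFFF_FFFF_FFFF_FFFF)
rf.write32(1, 0x1234)  # the upper half of R1 is now zero
```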
Tail calls can call and execute another eBPF program and replace the execution context, similar to how the execve() system call operates for regular processes. Tail calls are implemented as a long jump that reuses the same stack frame, which makes them particularly useful in eBPF, where the stack is limited to 512 bytes. During runtime, functionality can be added or replaced atomically, thus altering the BPF program's execution behavior.[46] A popular use case for tail calls is to spread the complexity of eBPF programs over several programs. Another use case is replacing or extending logic by replacing the contents of the program array while it is in use, for example, to update a program version without downtime or to enable or disable logic.[47]
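The control flow of a tail call can be modelled in user space as a jump through a table of programs. The following sketch is a toy under stated assumptions, not the kernel mechanism: `run`, `parse`, `handler_v1`, and `handler_v2` are hypothetical names, a program signals a tail call by returning a tuple, and a failed tail call (empty slot or exhausted chain limit) simply yields the caller's fallback value. The limit of 33 chained tail calls is an assumption about current Linux kernels.

```python
MAX_TAIL_CALLS = 33  # assumed kernel limit on chained tail calls


def run(prog_array, entry, ctx):
    """Run `entry`, following tail calls through `prog_array`."""
    prog, calls = entry, 0
    while True:
        result = prog(ctx)
        if result[0] == "exit":
            return result[1]
        _, index, fallback = result  # ("tail_call", slot, value if the call fails)
        target = prog_array.get(index)
        if target is None or calls >= MAX_TAIL_CALLS:
            return fallback
        calls += 1
        prog = target  # a jump, not a nested call: the caller never resumes


def parse(ctx):
    return ("tail_call", 0, "pass")


def handler_v1(ctx):
    return ("exit", f"v1:{len(ctx)}")


def handler_v2(ctx):
    return ("exit", f"v2:{len(ctx)}")


prog_array = {0: handler_v1}
first = run(prog_array, parse, b"ping")

# Swap the logic while the entry program stays attached.
prog_array[0] = handler_v2
second = run(prog_array, parse, b"ping")

# An empty slot makes the tail call fail, so the fallback is used.
del prog_array[0]
third = run(prog_array, parse, b"ping")
```

Because the entry program only refers to a slot index, replacing the slot's contents changes behaviour on the next run without reloading the entry program.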
It is generally considered good practice in software development to group common code into a function, encapsulating logic for reusability. Prior to Linux kernel 4.16 and LLVM 6.0, a typical eBPF C program had to explicitly direct the compiler to inline a function, resulting in a BPF object file that had duplicate functions. This restriction was lifted, and mainstream eBPF compilers now support writing functions naturally in eBPF programs. This reduces the generated eBPF code size, making it friendlier to a CPU instruction cache.[45][46]
The verifier is a core component of eBPF, and its main responsibility is to ensure that an eBPF program is safe to execute. It performs a static analysis of the eBPF bytecode, assessing all possible execution paths. Verification starts with a depth-first search through all possible paths of the program. The verifier simulates the execution of each instruction in order using abstract interpretation,[48] tracking the state of the registers and the stack. If any instruction could lead to an unsafe state, verification fails. This process continues until all paths have been analyzed or a violation is found.

Depending on the type of program, the verifier checks for violations of specific rules. These rules can include:

- An eBPF program must always terminate within a reasonable amount of time (no infinite loops or infinite recursion).
- An eBPF program is not allowed to read arbitrary memory, because doing so could allow it to leak sensitive information.
- Network programs are not allowed to access memory outside of packet bounds, because adjacent memory could contain sensitive information.
- Programs are not allowed to deadlock, so any held spinlocks must be released, and only one lock can be held at a time to avoid deadlocks across multiple programs.
- Programs are not allowed to read uninitialized memory.

This is not an exhaustive list of the checks the verifier performs, and there are exceptions to these rules. For example, tracing programs have access to helpers that allow them to read memory in a controlled way, but these program types require root privileges and thus do not pose a security risk.[47][45]
Over time, the eBPF verifier has evolved to include newer features and optimizations, such as support for bounded loops, dead-code elimination, function-by-function verification, and callbacks.
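The path exploration performed by the verifier can be illustrated with a deliberately tiny model. The sketch below is not the kernel verifier: it assumes a made-up instruction set of tuples (`mov`, `add`, `jeq`, `exit`), tracks only which registers have been initialized, and rejects every backward jump outright, whereas the real verifier tracks far richer state and accepts bounded loops. The function name `verify` is hypothetical.

```python
def verify(prog):
    """Depth-first walk over all paths; returns (accepted, reason)."""
    stack = [(0, frozenset())]  # (program counter, initialized registers)
    seen = set()
    while stack:
        pc, init = stack.pop()
        if (pc, init) in seen:
            continue  # this state was already explored
        seen.add((pc, init))
        if not 0 <= pc < len(prog):
            return False, f"pc {pc}: execution leaves the program"
        op, *args = prog[pc]
        if op == "mov":  # ("mov", dst, imm)
            stack.append((pc + 1, init | {args[0]}))
        elif op == "add":  # ("add", dst, src) reads both registers
            for reg in args:
                if reg not in init:
                    return False, f"pc {pc}: r{reg} read before initialization"
            stack.append((pc + 1, init))
        elif op == "jeq":  # ("jeq", reg, imm, target)
            reg, _imm, target = args
            if reg not in init:
                return False, f"pc {pc}: r{reg} read before initialization"
            if target <= pc:
                return False, f"pc {pc}: backward jump"
            stack.append((target, init))  # branch taken
            stack.append((pc + 1, init))  # fall-through
        elif op == "exit":
            if 0 not in init:
                return False, f"pc {pc}: r0 not set at exit"
        else:
            return False, f"pc {pc}: unknown instruction {op!r}"
    return True, "ok"


good = [("mov", 1, 5), ("jeq", 1, 5, 4), ("mov", 0, 1), ("exit",),
        ("mov", 0, 2), ("exit",)]
one_bad_path = [("mov", 1, 1), ("jeq", 1, 0, 3), ("mov", 0, 7), ("exit",)]
looping = [("mov", 0, 0), ("jeq", 0, 0, 0), ("exit",)]

results = [verify(p) for p in (good, one_bad_path, looping)]
```

The second program is rejected even though its fall-through path is fine, because the taken branch reaches `exit` without setting R0: a single unsafe path is enough to fail verification.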
eBPF programs use memory and data structures from the kernel. Because the Linux kernel is continuously developed, there is no guarantee that internal data structures keep the same memory layout across versions. CO-RE (Compile Once, Run Everywhere) is a fundamental concept in modern eBPF development that allows eBPF programs to be portable across different kernel versions and configurations. It addresses the challenge of kernel structure variations between different Linux distributions and versions. CO-RE comprises the following components:

- BTF (BPF Type Format): a metadata format that describes the types used in the kernel and in eBPF programs, and provides detailed information about struct layouts, field offsets, and data types. It makes kernel types accessible at runtime, which is crucial for BPF program development and verification. BTF is included in the kernel image of BTF-enabled kernels.
- Compiler relocations: special relocations emitted by the compiler (e.g., LLVM) that capture high-level descriptions of what information the eBPF program intends to access.
- The libbpf library: adapts eBPF programs to the data structure layout of the target kernel where they run, even if this layout differs from that of the kernel where the code was compiled. To do this, libbpf needs the BPF CO-RE relocation information generated by Clang as part of the compilation process.[45]

The compiled eBPF program is stored in an ELF (Executable and Linkable Format) object file. This file contains BTF type information and Clang-generated relocations. The ELF format allows the eBPF loader (e.g., libbpf) to process and adjust the BPF program dynamically for the target kernel.[49]
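The relocation idea can be sketched as a lookup performed at load time. The model below is a simplification under stated assumptions and does not use libbpf or real BTF: type information is reduced to a dictionary of field offsets, the offsets themselves are invented for the example, and `relocate`, `KERNEL_A_BTF`, and `KERNEL_B_BTF` are hypothetical names.

```python
# Invented field offsets for the same struct on two kernel builds.
KERNEL_A_BTF = {"task_struct": {"pid": 1256, "comm": 1520}}
KERNEL_B_BTF = {"task_struct": {"pid": 1304, "comm": 1568}}

# The compiled object records *what* each instruction accesses,
# keyed by instruction index, instead of a hard-coded offset.
relocations = {
    3: ("task_struct", "pid"),
    7: ("task_struct", "comm"),
}


def relocate(relocs, target_btf):
    """Resolve each recorded field access against the target kernel's layout."""
    resolved = {}
    for insn_index, (struct_name, field) in relocs.items():
        try:
            resolved[insn_index] = target_btf[struct_name][field]
        except KeyError:
            raise LookupError(
                f"{struct_name}.{field} not found on target kernel"
            ) from None
    return resolved


offsets_on_a = relocate(relocations, KERNEL_A_BTF)
offsets_on_b = relocate(relocations, KERNEL_B_BTF)
```

The same relocation table produces different offsets on each target, which is the property that lets one compiled object run on kernels with different structure layouts.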
The alias eBPF is often used interchangeably with BPF,[2][50] for example by the Linux kernel community. eBPF and BPF are referred to as technology names, like LLVM.[2] eBPF evolved from the machine language for the filtering virtual machine in the Berkeley Packet Filter as an extended version, but as its use cases outgrew networking, today "eBPF" is preferentially interpreted as a pseudo-acronym.[2]
The bee is the official logo for eBPF. At the first eBPF Summit, a vote was taken and the bee mascot was named "eBee".[51][52] The logo was originally created by Vadim Shchekoldin.[52] Earlier unofficial eBPF mascots have existed,[53] but have not seen widespread adoption.
The eBPF Foundation was created in August 2021 with the goal of expanding the contributions being made to extend the powerful capabilities of eBPF and grow beyond Linux.[1] Founding members include Meta, Google, Isovalent, Microsoft, and Netflix. The purpose is to raise, budget, and spend funds in support of various open source, open data and/or open standards projects relating to eBPF technologies[54] to further drive the growth and adoption of the eBPF ecosystem. Since its inception, Red Hat, Huawei, CrowdStrike, Tigera, DaoCloud, Datoms, and FutureWei have also joined.[55]
eBPF has been adopted by a number of large-scale production users.
Due to its ease of programmability, eBPF has been used as a tool for implementing microarchitectural timing side-channel attacks such as Spectre against vulnerable microprocessors.[100] While unprivileged eBPF implements mitigations against Spectre v1, v2, and v4 for x86-64,[101][102] unprivileged use has ultimately been disabled by default by the kernel community to protect users of unsupported architectures and limit the impact of future hardware vulnerabilities.[103] On x86-64, Spectre v1 is mitigated through a combination of branchless bounds-enforcement (e.g., masking instructions) and the verification of speculative execution paths. Spectre v4 is mitigated exclusively through speculation barriers (i.e., lfence), and Spectre v2 is mitigated through retpoline when available[104] or speculation barriers. These mitigations prevent sensitive information owned by the kernel (e.g., kernel addresses) from being leaked by malicious eBPF programs, but are not designed to prevent innocuous eBPF programs from accidentally leaking sensitive information they own or process (e.g., cryptographic keys stored as numbers).[102]
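The idea behind branchless bounds enforcement by masking can be shown with simple arithmetic. The sketch below is an illustration only and cannot demonstrate speculation itself: it assumes an array whose backing storage is padded to a power-of-two size, so that an index combined with a mask stays inside the allocation even if a preceding bounds check were bypassed. `index_mask` and `masked_lookup` are hypothetical names, not kernel functions.

```python
def index_mask(max_entries):
    """Smallest mask of the form 2**k - 1 that covers indices 0..max_entries-1."""
    if max_entries < 1:
        raise ValueError("max_entries must be at least 1")
    return (1 << (max_entries - 1).bit_length()) - 1


def masked_lookup(values, index):
    """Bounds-checked lookup whose actual access is additionally masked."""
    mask = index_mask(len(values))
    # Pad the backing storage so every masked index is a valid position.
    backing = list(values) + [0] * (mask + 1 - len(values))
    if not 0 <= index < len(values):
        return None  # the ordinary (architectural) bounds check
    # The AND needs no branch, so it constrains the access on every path.
    return backing[index & mask]


table = [10, 20, 30, 40, 50]
mask = index_mask(len(table))  # 5 entries are padded to 8 slots
in_bounds = masked_lookup(table, 2)
out_of_bounds = masked_lookup(table, 9)
```

The ordinary comparison decides the result the program observes, while the mask bounds the memory location that could be touched, independently of how the comparison is predicted.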