Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up

bpftune uses BPF to auto-tune Linux systems

License

NotificationsYou must be signed in to change notification settings

oracle/bpftune

Repository files navigation

bpftune aims to provide lightweight, always-on auto-tuning of systembehaviour. The key benefit it provides are

  • by using BPF observability features, we can continuously monitorand adjust system behaviour
  • because we can observe system behaviour at a fine grain (ratherthan using coarse system-wide stats), we can tune at a finer graintoo (individual socket policies, individual device policies etc)

The problem

The Linux kernel contains a large number of tunables; theseoften take the form of sysctl(8) parameters, and are usuallyintroduced for situations where there is no one "right" answerfor a configuration choice. The number of tunables availableis quite daunting. On a 6.2 kernel we see

# sysctl --all 2>/dev/null|wc -l1624

See here for an excellent writeup on network-related tunables..

At the same time, individual systems get a lot less careand adminstrator attention than they used to; phrases like"cattle not pets" exemplify this. Given the modern cloudarchitectures used for most deployments, most systems neverhave any human adminstrator interaction after initialprovisioning; in fact given the scale requirements, thisis often an explicit design goal - "no ssh'ing in!".

These two observations are not unrelated; in an earlierera of fewer, larger systems, tuning by administrators wasmore feasible.

These trends - system complexity combined with minimaladmin interaction suggest a rethink in terms of tunablemanagement.

A lot of lore accumulates around these tunables, and to helpclarify why we developed bpftune, we will use a straw-manversion of the approach taken with tunables:

"find the set of magic numbers that will work for thesystem forever"

This is obviously a caricature of how administratorsapproach the problem, but it does highlight a criticalimplicit assumption - that systems are static.

And that gets to the "BPF" in bpftune; BPF provides meansto carry out low-overhead observability of systems. Sonot only can we observe the system and tune appropriately,we can also observe the effect of that tuning and re-tuneif necessary.

Key design principles

  • Minimize overhead. Use observability features sparingly; do nottrace very high frequency events.
  • Be explicit about policy changes providing both a "what" - whatchange was made - and a "why" - how does it help? syslog loggingmakes policy actions explicit with explanations
  • Get out of the way of the administrator. We can use BPFobservability to see if the admin sets tunable values that weare auto-tuning; if they do, we need to get out of the way anddisable auto-tuning of the related feature set.
  • Don't replace tunables with more tunables! bpftune is designed tobe zero configuration; there are no options, and we try to avoidmagic numbers where possible.
  • Use push-pull approaches. For example, with tcp buffer sizing,we often want to get out of the way of applications and bumpup tcp sndbuf and rcvbuf, but at a certain point we run therisk of exhausting TCP memory. We can however monitor if weare approaching TCP memory pressure and if so we can tune downvalues that we've tuned up. In this way, we can let the systemfind a balance between providing resources and exhausting them.In some cases, we won't need to tune up values; they may be fineas they are. But in other cases these limits block optimal performance,and if they are raised safely - with awareness of global memorylimits - we can get out the way of improved performance. Anotherconcern is that increasing buffer size leads to latency - tohandle that, we correlate buffer size changes and TCP smoothedround-trip time; if the correlation between these exceeds athreshold (0.7) we stop increasing buffer size.

Concepts

The key components are

  • tuners: each tuner manages tunables and handles events sentfrom BPF programs to userspace via the shared ring buffer.Each tuner has an associated set of tunables that it manages.

  • optional strategies: a tuner can specify multiple strategies;after running for a while a strategy times out and we assessif a better strategy is available. Each strategy specifies a

    • name
    • description
    • timeout
    • evaluation function
    • set of BPF program names in tuner associated with strategy

    Strategies are optional and should be set in the tuner init()method via bpftune_strategies_add(). See test/strategyfor a coded example. When a strategy times out, the variousevaluation functions are called and the highest-value evaluationdictates the next stratgey.

    Strategies provide a way of providing multiple schemes forauto-tuning the same set of tunables, where the choice isguided by an evaluation of the effectiveness of the strategies.

  • events specify a

    • tuner id: which tuner the event is destined for
    • a scenario: what happened
    • an associated netns (if supported)
    • information about the event (IP address etc)
  • the tuner then responds to the event guided by the active strategy;increase or decrease a tunable value, etc. Describing the eventin the log is key; this allows an admin to understand whatchanged and why.

Architecture

  • bpftune is a daemon which manages a set of .so plugin tuners;each of these is a shared object that is loaded on start-up.
  • tuners can be enabled or disabled; a tuner is automaticallydisabled if the admin changes associated tunables manually.
  • tuners share a global BPF ring buffer which allows posting ofevents from BPF programs to userspace. For example, if thesysctl tuner sees a systl being set, it posts an event.
  • each tuner has an associated id (set when it is loaded),and events posted contain the tuner id.
  • each tuner has a BPF component (built using a BPF skeleton)and a userspace component. The latter has init(), fini()and event_handler() entrypoints. When an event isreceived, the tuner id is used to identify the appropriateevent handler and its event_handler() callback function is run.
  • init, fini and event_handler functions are loaded from thetuner .so object.
  • BPF components should include bpftune.bpf.h; it containsthe common map definitions (ringbuf, etc) and shared variablessuch as learning rate and tuner ids that each tuner needs.

Supported tuners

  • TCP connection tuner: auto-tune choice of congestion control algorithm.Seebpftune-tcp-conn (8)
  • IP fragmentation tuner: auto-tune IP fragmentation memory limitsto support fragment reassembly. Seebpftune-ip-frag (8)
  • neighbour table tuner: auto-tune neighbour table sizes by growingtables when approaching full. Seebpftune-neigh (8)
  • sysctl tuner: monitor sysctl setting and if it collides with anauto-tuned sysctl value, disable the associated tuner. Seebpftune-sysctl (8)
  • TCP buffer tuner: auto-tune max and initial buffer sizes. Seebpftune-tcp-buffer (8)
  • net buffer tuner: auto-tune tunables related to core networking.Seebpftune-net-buffer (8)
  • netns tuner: notices addition and removal of network namespaces,which helps power namespace awareness for bpftune as a whole.Namespace awareness is important as we want to be able to auto-tunecontainers also. Seebpftune-netns (8)
  • UDP buffer tuner: auto-tune buffers relating to UDP. Seebpftune-udp-buffer (8)

Code organization

Both core bpftune.c and individual tuners use the libbpftune library.It handles logging, tuner init/fini, and BPF init/fini.

Each tuner shared object defines an init(), fini() and event_handler()function. These respectively set up and clean up BPF and handle eventsthat originate from the BPF code.

Getting Started

If building the repository manually, simply run

$ make ; sudo make install

at the top-level of the repository. bpftune also supports a

$ make pkg

target, which will make a bpftune RPM. See ./buildrpm/bpftune.spec

We can also build with non-standard libdir for distros which do notuse /usr/lib64 like CachyOS; in this case to install to /usr/libinstead

$ make libdir=lib$ sudo make install libdir=lib

To build the following packages are needed (names may vary by distro);

  • libbpf, libbpf-devel >= 0.6
  • libcap-devel
  • bpftool >= 4.18
  • libnl3-devel (on some distros like Debian libnl-route-3-dev is needed)
  • clang >= 11
  • llvm >= 11
  • python3-docutils

The bpf components in bpftune can be built via GCC BPF support.Seehttps://gcc.gnu.org/wiki/BPFBackEnd for details on the BPF backend.To build with gcc bpf, specify

$ GCC_BPF=bpf-unknown-none-gcc make

From the kernel side, the kernel needs to support BPF ring buffer(around the 5.6 kernel, though 5.4 is supported on Oracle Linuxas ring buffer support was backported), and kernel BTF isrequired (CONFIG_DEBUG_INFO_BTF=y). Verify /sys/kernel/btf/vmlinuxis present.

To enable bpftune as a service

$ sudo service bpftune start

...and to enable it by default

$ sudo systemctl enable bpftune

bpftune logs to syslog so /var/log/messages will contain detailsof any tuning carried out.

bpftune can also be run in the foreground as a program; to redirectoutput to stdout/stderr, run

$ sudo bpftune -s

On exit, bpftune will summarize any tuning done.

Queries of bpftune state can be done viabpftune -q.

Ansible Install Play

Information: If you are using an Fedora Upstream based Distribution you have to enable the correct repository based on the system you are using, because the libbpf-devel package is getting shipped on additional repository, based on the Distribution. You can look it up here:https://pkgs.org/search/?q=libbpf-devel

- name: bpftune download, build, install and start service  hosts: all  become: true  vars:    repo_url: "https://github.com/oracle/bpftune"    repo_dest: "/root/bpftune/"  tasks:    - name: Install git and system independent build requirements if not present      ansible.builtin.package:        name:          - git          - clang            # build requirement          - llvm             # build requirement          - bpftool          # build requirement          - iperf3           # build requirement          - python3-docutils # build requirement        state: present    - name: Gather package manager fact      ansible.builtin.setup:        filter: ansible_pkg_mgr      register: setup_info    - name: install run and build requirements for bpftune on dnf based systems      ansible.builtin.dnf:        name:          - libbpf       # run requirement          - libnl3       # run requirement          - libcap       # run requirement          - libbpf-devel # build requirement          - libnl3-devel # build requirement          - libcap-devel # build requirement          - clang-libs   # build requirement          - llvm-libs    # build requirement        state: present        # enablerepo: <repository> # !ATTENTION! based on the system you use you have to enable the correct repository based on the system you can look it up here: https://pkgs.org/search/?q=libbpf-devel      when: setup_info.ansible_facts.ansible_pkg_mgr == "dnf"    - name: install build requirements for bpftune on apt based systems      ansible.builtin.apt:        name:          - libbpf-dev        # build requirement          - libcap-dev        # build requirement          - libnl-3-dev       # build requirement          - libnl-route-3-dev # build requirement        state: present        install_recommends: false      when: setup_info.ansible_facts.ansible_pkg_mgr == "apt"    - name: Clone or update the bpftune repository if a new version is available      ansible.builtin.git:        repo: "{{ repo_url }}"        dest: "{{ repo_dest }}"        update: true      register: git_result    - name: if repository changes rebuild software      block:        - name: Run make with taget 'all' for bpftune          community.general.make:            chdir: "{{ repo_dest }}"            target: all        - name: Run make with taget 'install' target for bpftune          community.general.make:            chdir: "{{ repo_dest }}"            target: install        - name: Check bpftune status          command: bpftune -S          register: bpftune_status          changed_when: false          failed_when: "'bpftune works fully' not in bpftune_status.stderr"        - name: restart and enable bpftune service          ansible.builtin.service:            name: bpftune.service            state: started            enabled: yes      when: git_result.changed

Tests

Tests are supplied for each tuner in the tests/ subdirectory."make test" runs all the tests. Tests use network namespacesto simulate interactions with remote hosts. See ./TESTING.mdfor more details.

Does my system support bpftune?

Simply run "bpftune -S" to see:

$ bpftune -Sbpftune works fullybpftune supports per-netns policy (via netns cookie)

Two aspects are important here

  • does the system support fentry/fexit etc? If so full supportis likely.
  • does the system support network namespace cookies? If soper-network-namespace policy is supported.

Demo

Simply starting bpftune and observing changes made via /var/log/messagescan be instructive. For example, on a standard VM with sysctl defaults,I ran

$ service bpftune start

...and went about normal development activities such as cloning gittrees from upstream, building kernels, etc. From the log we seesome of the adjustments bpftune made to accommodate these activities

$ sudo grep bpftune /var/log/messages...Apr 19 16:14:59 bpftest bpftune[2778]: bpftune works fullyApr 19 16:14:59 bpftest bpftune[2778]: bpftune supports per-netns policy (via netns cookie)Apr 19 16:18:40 bpftest bpftune[2778]: Scenario 'specify bbr congestion control' occurred for tunable 'TCP congestion control' in global ns. Because loss rate has exceeded 1 percent for a connection, use bbr congestion control algorithm instead of defaultApr 19 16:18:40 bpftest bpftune[2778]: due to loss events for 145.40.68.75, specify 'bbr' congestion control algorithmApr 19 16:26:53 bpftest bpftune[2778]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughputApr 19 16:26:53 bpftest bpftune[2778]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (4096 131072 6291456) -> (4096 131072 7864320)Apr 19 16:26:53 bpftest bpftune[2778]: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughputApr 19 16:26:53 bpftest bpftune[2778]: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (4096 131072 7864320) -> (4096 131072 9830400)Apr 19 16:29:04 bpftest bpftune[2778]: Scenario 'specify bbr congestion control' occurred for tunable 'TCP congestion control' in global ns. Because loss rate has exceeded 1 percent for a connection, use bbr congestion control algorithm instead of defaultApr 19 16:29:04 bpftest bpftune[2778]: due to loss events for 140.91.12.81, specify 'bbr' congestion control algorithm

To deterministically trigger bpftune behaviour, one approach we cantake is to download a large file with inappropriate settings.

In one window, set tcp rmem max to a too-low value, and run bpftuneas a program logging to stdout/stderr (-s):

$ sudo sysctl -w net.ipv4.tcp_rmem="4096 131072 1310720"net.ipv4.tcp_rmem = 4096 131072 1310720$ sudo bpftune -s

In another window, wget a large file:

$ wget https://yum.oracle.com/ISOS/OracleLinux/OL8/u7/x86_64/OracleLinux-R8-U7-x86_64-dvd.iso

In the first window, we see bpftune tuning up rmem:

bpftune: bpftune works in legacy modebpftune: bpftune does not support per-netns policy (via netns cookie)bpftune: Scenario 'need to increase TCP buffer size(s)' occurred for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughputbpftune: Due to need to increase max buffer size to maximize throughput change net.ipv4.tcp_rmem(min default max) from (4096 131072 1310720) -> (4096 131072 1638400)

This occurs multiple times, and on exit (Ctrl+C) we seethe summary of changes made:

bpftune: Summary: scenario 'need to increase TCP buffer size(s)' occurred 9 times for tunable 'net.ipv4.tcp_rmem' in global ns. Need to increase buffer size(s) to maximize throughputbpftune: sysctl 'net.ipv4.tcp_rmem' changed from (4096 131072 1310720 ) -> (4096 131072 9765625 )

For more info

See the docs/ subdirectory for manual pages covering bpftuneand associated tuners.

bpftune was presented at the eBPF summit;video here.

bpftunewas also discussed on Liz Rice's excellent eCHO eBPF podcast, specifically in the context of using reinforcement learning in BPF

Contributing

This project welcomes contributions from the community. Before submitting a pull request, pleasereview our contribution guide

Security

Please consult thesecurity guide for our responsible security vulnerability disclosure process

License

Copyright (c) 2023 Oracle and/or its affiliates.

This software is available to you under

SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note

Being under the terms of the GNU General Public License version 2.

SPDX-URL:https://spdx.org/licenses/GPL-2.0.html

Seethe license file for more details.

About

bpftune uses BPF to auto-tune Linux systems

Topics

Resources

License

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

[8]ページ先頭

©2009-2025 Movatter.jp