David Gries

Building Secure Foundations: A Practical Guide to Minimizing Linux Services' Attack Surface

Cybersecurity and awareness of it have never been more crucial than they are today. Given the increasing number of attacks, it has become clear that protecting digital assets plays a significant role in software development and operations. What concrete steps can be taken to enhance the security of our services even further?

Starting at a Lower Level

While antivirus software and a well-executed, read-only backup strategy are essential for identifying threats and reducing their impact, it's important to establish a strong security foundation from the outset. Rather than focusing solely on mitigating consequences after the fact, reducing the attack surface should be a primary goal.

This can be done by limiting access to the underlying system, for example by running as an arbitrary non-root user and dropping unneeded privileges. In Kubernetes, this typically means using non-root base images in combination with securityContext definitions.

But in some cases, it's better or even required to deploy directly on virtual machines. So how can a similar strategy be applied there?

🔒 Hardening Nginx: Step by Step

Let's examine a real-world example using the Nginx service file provided by Ubuntu 20.04:

The Defaults

david@proxy:~$ systemctl cat nginx.service

# /lib/systemd/system/nginx.service
[Unit]
Description=A high performance web server and a reverse proxy server
Documentation=man:nginx(8)
After=network.target

[Service]
Type=forking
PIDFile=/run/nginx.pid
ExecStartPre=/usr/sbin/nginx -t -q -g 'daemon on; master_process on;'
ExecStart=/usr/sbin/nginx -g 'daemon on; master_process on;'
ExecReload=/usr/sbin/nginx -g 'daemon on; master_process on;' -s reload
ExecStop=-/sbin/start-stop-daemon --quiet --stop --retry QUIT/5 --pidfile /run/nginx.pid
TimeoutStopSec=5
KillMode=mixed

[Install]
WantedBy=multi-user.target

By default, the service runs as the root user. Therefore, processes spawned by /usr/sbin/nginx have all privileges of the root user and group, which could allow malicious software to take control of every part of the system if Nginx has an exploitable vulnerability. While Nginx can run its worker processes as an unprivileged user by itself (via its user directive), the master process started by the service still has root privileges. In many cases, this is not required and can be avoided by using features already built into Systemd.

Breaking it Down

The systemd-analyze CLI tool can provide an overview of potential security issues in Systemd services:

systemd-analyze security                # provides a high-level overview including a
                                        # numeric "exposure" value of Systemd services
systemd-analyze security <service_name> # shows detailed security-related information
                                        # about a single service
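
If you want to track the score over time, for example in a CI job, the summary line can be extracted from the text output. The following is a minimal Python sketch that assumes the "→ Overall exposure level for <unit>: <score> <LEVEL>" format shown in this post; the wording may differ between Systemd versions, and parse_overall_exposure and analyze are hypothetical helpers, not part of Systemd.

```python
import re
import subprocess
from typing import NamedTuple, Optional


class Exposure(NamedTuple):
    unit: str
    score: float
    level: str


# Matches e.g. "→ Overall exposure level for nginx.service: 9.6 UNSAFE 😨"
_OVERALL_RE = re.compile(
    r"Overall exposure level for (?P<unit>\S+): (?P<score>\d+(?:\.\d+)?) (?P<level>[A-Z]+)"
)


def parse_overall_exposure(output: str) -> Optional[Exposure]:
    """Extract the overall exposure from `systemd-analyze security <unit>` text output."""
    match = _OVERALL_RE.search(output)
    if match is None:
        return None
    return Exposure(match["unit"], float(match["score"]), match["level"])


def analyze(unit: str) -> Optional[Exposure]:
    """Run systemd-analyze on the local host (it must be on PATH) and parse the result."""
    result = subprocess.run(
        ["systemd-analyze", "security", unit, "--no-pager"],
        capture_output=True,
        text=True,
        check=False,
    )
    return parse_overall_exposure(result.stdout)


if __name__ == "__main__":
    print(parse_overall_exposure("→ Overall exposure level for nginx.service: 9.6 UNSAFE 😨"))
```

A pipeline could, for instance, fail when analyze("nginx.service").score exceeds an agreed threshold.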

The output for the nginx service looks like this:

david@proxy:~$ systemd-analyze security nginx.service --no-pager

Nginx Service Security Summary
  NAME                                                        DESCRIPTION                                                            EXPOSURE
✗ PrivateNetwork=                                             Service has access to the host's network                               0.5
✗ User=/DynamicUser=                                          Service runs as root user                                              0.4
✗ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)                Service may change UID/GID identities/capabilities                     0.3
✗ CapabilityBoundingSet=~CAP_SYS_ADMIN                        Service has administrator privileges                                   0.3
✗ CapabilityBoundingSet=~CAP_SYS_PTRACE                       Service has ptrace() debugging abilities                               0.3
✗ RestrictAddressFamilies=~AF_(INET|INET6)                    Service may allocate Internet sockets                                  0.3
✗ RestrictNamespaces=~CLONE_NEWUSER                           Service may create user namespaces                                     0.3
✗ RestrictAddressFamilies=~…                                  Service may allocate exotic sockets                                    0.3
✗ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP)           Service may change file ownership/access mode/capabilities unres…      0.2
✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)         Service may override UNIX file/IPC permission checks                   0.2
✗ CapabilityBoundingSet=~CAP_NET_ADMIN                        Service has network configuration privileges                           0.2
✗ CapabilityBoundingSet=~CAP_RAWIO                            Service has raw I/O access                                             0.2
✗ CapabilityBoundingSet=~CAP_SYS_MODULE                       Service may load kernel modules                                        0.2
✗ CapabilityBoundingSet=~CAP_SYS_TIME                         Service processes may change the system clock                          0.2
✗ DeviceAllow=                                                Service has no device ACL                                              0.2
✗ IPAddressDeny=                                              Service does not define an IP address whitelist                        0.2
✓ KeyringMode=                                                Service doesn't share key material with other services
✗ NoNewPrivileges=                                            Service processes may acquire new privileges                           0.2
✓ NotifyAccess=                                               Service child processes cannot alter service state
✗ PrivateDevices=                                             Service potentially has access to hardware devices                     0.2
✗ PrivateMounts=                                              Service may install system mounts                                      0.2
✗ PrivateTmp=                                                 Service has access to other software's temporary files                 0.2
✗ PrivateUsers=                                               Service has access to other users                                      0.2
✗ ProtectClock=                                               Service may write to the hardware clock or system clock                0.2
✗ ProtectControlGroups=                                       Service may modify the control group file system                       0.2
✗ ProtectHome=                                                Service has full access to home directories                            0.2
✗ ProtectKernelLogs=                                          Service may read from or write to the kernel log ring buffer           0.2
✗ ProtectKernelModules=                                       Service may load or read kernel modules                                0.2
✗ ProtectKernelTunables=                                      Service may alter kernel tunables                                      0.2
✗ ProtectSystem=                                              Service has full access to the OS file hierarchy                       0.2
✗ RestrictAddressFamilies=~AF_PACKET                          Service may allocate packet sockets                                    0.2
✗ RestrictSUIDSGID=                                           Service may create SUID/SGID files                                     0.2
✗ SystemCallArchitectures=                                    Service may execute system calls with all ABIs                         0.2
✗ SystemCallFilter=~@clock                                    Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@debug                                    Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@module                                   Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@mount                                    Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@raw-io                                   Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@reboot                                   Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@swap                                     Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@privileged                               Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@resources                                Service does not filter system calls                                   0.2
✓ AmbientCapabilities=                                        Service process does not receive ambient capabilities
✗ CapabilityBoundingSet=~CAP_AUDIT_*                          Service has audit subsystem access                                     0.1
✗ CapabilityBoundingSet=~CAP_KILL                             Service may send UNIX signals to arbitrary processes                   0.1
✗ CapabilityBoundingSet=~CAP_MKNOD                            Service may create device nodes                                        0.1
✗ CapabilityBoundingSet=~CAP_NET_(BIND_SERVICE|BROADCAST|RAW) Service has elevated networking privileges                             0.1
✗ CapabilityBoundingSet=~CAP_SYSLOG                           Service has access to kernel logging                                   0.1
✗ CapabilityBoundingSet=~CAP_SYS_(NICE|RESOURCE)              Service has privileges to change resource use parameters               0.1
✗ RestrictNamespaces=~CLONE_NEWCGROUP                         Service may create cgroup namespaces                                   0.1
✗ RestrictNamespaces=~CLONE_NEWIPC                            Service may create IPC namespaces                                      0.1
✗ RestrictNamespaces=~CLONE_NEWNET                            Service may create network namespaces                                  0.1
✗ RestrictNamespaces=~CLONE_NEWNS                             Service may create file system namespaces                              0.1
✗ RestrictNamespaces=~CLONE_NEWPID                            Service may create process namespaces                                  0.1
✗ RestrictRealtime=                                           Service may acquire realtime scheduling                                0.1
✗ SystemCallFilter=~@cpu-emulation                            Service does not filter system calls                                   0.1
✗ SystemCallFilter=~@obsolete                                 Service does not filter system calls                                   0.1
✗ RestrictAddressFamilies=~AF_NETLINK                         Service may allocate netlink sockets                                   0.1
✗ RootDirectory=/RootImage=                                   Service runs within the host's root directory                          0.1
  SupplementaryGroups=                                        Service runs as root, option does not matter
✗ CapabilityBoundingSet=~CAP_MAC_*                            Service may adjust SMACK MAC                                           0.1
✗ CapabilityBoundingSet=~CAP_SYS_BOOT                         Service may issue reboot()                                             0.1
✓ Delegate=                                                   Service does not maintain its own delegated control group subtree
✗ LockPersonality=                                            Service may change ABI personality                                     0.1
✗ MemoryDenyWriteExecute=                                     Service may create writable executable memory mappings                 0.1
  RemoveIPC=                                                  Service runs as root, option does not apply
✗ RestrictNamespaces=~CLONE_NEWUTS                            Service may create hostname namespaces                                 0.1
✗ UMask=                                                      Files created by service are world-readable by default                 0.1
✗ CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE                  Service may mark files immutable                                       0.1
✗ CapabilityBoundingSet=~CAP_IPC_LOCK                         Service may lock memory into RAM                                       0.1
✗ CapabilityBoundingSet=~CAP_SYS_CHROOT                       Service may issue chroot()                                             0.1
✗ ProtectHostname=                                            Service may change system host/domainname                              0.1
✗ CapabilityBoundingSet=~CAP_BLOCK_SUSPEND                    Service may establish wake locks                                       0.1
✗ CapabilityBoundingSet=~CAP_LEASE                            Service may create file leases                                         0.1
✗ CapabilityBoundingSet=~CAP_SYS_PACCT                        Service may use acct()                                                 0.1
✗ CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG                   Service may issue vhangup()                                            0.1
✗ CapabilityBoundingSet=~CAP_WAKE_ALARM                       Service may program timers that wake up the system                     0.1
✗ RestrictAddressFamilies=~AF_UNIX                            Service may allocate local sockets                                     0.1

→ Overall exposure level for nginx.service: 9.6 UNSAFE 😨
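
To see which checks weigh most, the individual rows can be parsed as well. This Python sketch assumes the column layout above (marker, setting, description, optional exposure value) with at least two spaces between columns; rows without a marker, such as RemoveIPC= for a root service, are reported as "n/a". Note that the overall score is not the plain sum of the EXPOSURE column (adding up the values above gives far more than 9.6), so the sketch only ranks the failing checks. parse_rows and worst_offenders are hypothetical helpers.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Check:
    status: str                # "fail" (✗), "pass" (✓) or "n/a" (no marker)
    setting: str
    description: str
    exposure: Optional[float]  # None for passing or not applicable checks


# Optional marker, setting, description and optional exposure value,
# with columns separated by at least two spaces.
_ROW_RE = re.compile(
    r"^\s*(?P<mark>[✗✓])?\s*(?P<setting>\S+)\s{2,}(?P<desc>.*?)(?:\s{2,}(?P<exp>\d+\.\d+))?\s*$"
)
_STATUS = {"✗": "fail", "✓": "pass", None: "n/a"}


def parse_rows(output: str) -> List[Check]:
    """Parse the per-setting rows of `systemd-analyze security <unit>` output."""
    checks = []
    for line in output.splitlines():
        # skip the header and the summary line
        if "Overall exposure" in line or line.strip().startswith("NAME"):
            continue
        m = _ROW_RE.match(line)
        if m is None:
            continue
        exposure = float(m["exp"]) if m["exp"] else None
        checks.append(Check(_STATUS[m["mark"]], m["setting"], m["desc"], exposure))
    return checks


def worst_offenders(checks: List[Check], limit: int = 5) -> List[Check]:
    """Return the failing checks with the highest exposure values first."""
    failing = [c for c in checks if c.status == "fail" and c.exposure is not None]
    return sorted(failing, key=lambda c: c.exposure, reverse=True)[:limit]
```

For the default unit, PrivateNetwork= and User=/DynamicUser= would come out on top, which matches where the hardening below starts.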

Many of these privileges are not required to run a web server, so it's best to limit them. As interfacing with the Linux kernel directly can be very complex and is prone to change, Systemd offers a way to declare common restrictions directly in the service files. Given the multitude of configuration parameters for Systemd services, this example concentrates on values that significantly affect security, using a standard Kubernetes securityContext as a foundation.

The Principle of Least Privilege

Adopting the principle of least privilege is crucial. By restricting access and privileges to the bare essentials, the attack surface shrinks significantly. With Kubernetes resources, you would usually use a securityContext definition to limit the privileges of a Pod's containers:

...
securityContext:
  runAsNonRoot: true
  runAsUser: 1001
  runAsGroup: 2001
  allowPrivilegeEscalation: false
  privileged: false
  readOnlyRootFilesystem: true
  capabilities:
    drop:
      - ALL
...

In the above example, the process runs without root privileges on a read-only filesystem and all capabilities are dropped. A similar setup can be achieved using a Systemd service:

  • runAsNonRoot: true ➜ no direct equivalent; if possible, DynamicUser= can be used
  • runAsUser: 1001 ➜ User=<username>
  • runAsGroup: 2001 ➜ Group=<groupname>
  • allowPrivilegeEscalation: false ➜ NoNewPrivileges=true
  • privileged: false ➜ no direct equivalent; PrivateDevices=<...>, Protect<...>=<...> etc. can be used
  • readOnlyRootFilesystem: true ➜ ProtectSystem=strict / TemporaryFileSystem=/:ro (the latter also hides all files and requires Systemd >= 238)
  • capabilities.drop: ["ALL"] ➜ CapabilityBoundingSet=<...>
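
The mapping above can be expressed as a small translation function. This is a hypothetical Python sketch, assuming the container-level securityContext is available as a dict (e.g. loaded from YAML) and that a user and group with the given numeric IDs exist on the VM; Systemd's User= and Group= accept numeric IDs as well as names. privileged: false is only approximated, and only the drop-ALL case is translated into CapabilityBoundingSet=, where an empty value means an empty bounding set.

```python
from typing import Any, Dict, List


def security_context_to_directives(ctx: Dict[str, Any]) -> List[str]:
    """Approximate a Kubernetes container securityContext with systemd [Service] directives."""
    directives: List[str] = []
    if "runAsUser" in ctx:
        directives.append(f"User={ctx['runAsUser']}")
    elif ctx.get("runAsNonRoot"):
        # no direct equivalent: let systemd allocate a transient unprivileged user
        directives.append("DynamicUser=yes")
    if "runAsGroup" in ctx:
        directives.append(f"Group={ctx['runAsGroup']}")
    if ctx.get("allowPrivilegeEscalation") is False:
        directives.append("NoNewPrivileges=yes")
    if ctx.get("privileged") is False:
        # no single equivalent; approximate with a few Private*/Protect* settings
        directives += [
            "PrivateDevices=yes",
            "ProtectKernelModules=yes",
            "ProtectKernelTunables=yes",
            "ProtectControlGroups=yes",
        ]
    if ctx.get("readOnlyRootFilesystem"):
        directives.append("ProtectSystem=strict")

    caps = ctx.get("capabilities", {})
    drop = [c.upper() for c in caps.get("drop", [])]
    # Kubernetes omits the CAP_ prefix (e.g. NET_BIND_SERVICE), systemd expects it
    add = [c.upper() if c.upper().startswith("CAP_") else "CAP_" + c.upper()
           for c in caps.get("add", [])]
    if "ALL" in drop:
        # bounding set = only the added capabilities (empty value = no capabilities at all)
        directives.append("CapabilityBoundingSet=" + " ".join(add))
    if add:
        # lets a non-root User= actually hold the added capabilities
        directives.append("AmbientCapabilities=" + " ".join(add))
    return directives


if __name__ == "__main__":
    example = {
        "runAsNonRoot": True, "runAsUser": 1001, "runAsGroup": 2001,
        "allowPrivilegeEscalation": False, "privileged": False,
        "readOnlyRootFilesystem": True, "capabilities": {"drop": ["ALL"]},
    }
    print("\n".join(security_context_to_directives(example)))
```

The printed lines could, for example, be placed in a drop-in created with systemctl edit and then reviewed by hand, since the mapping is deliberately coarse.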

There are many more ways to control the capabilities and permissions of Systemd services, which are documented in the systemd.exec manual page. After applying some of these parameters to the Nginx service, the unit file looks as follows:

david@proxy:~$ systemctl cat nginx

# /etc/systemd/system/nginx.service
# Rootless Nginx service based on https://github.com/stephan13360/systemd-services/blob/master/nginx/nginx.service

[Unit]
# This is from the default nginx.service
Description=nginx (hardened rootless)
Documentation=https://nginx.org/en/docs/
Documentation=https://github.com/stephan13360/systemd-services/blob/master/nginx/README.md
After=network-online.target remote-fs.target nss-lookup.target
Wants=network-online.target

[Service]
# forking is not necessary as `daemon` is turned off in the nginx config
Type=exec
User=nginx
Group=nginx
## can be used e.g. for accessing directory containing SSL certs
#SupplementaryGroups=acme
# define runtime directory /run/nginx as rootless services can't access /run
RuntimeDirectory=nginx
# write logs to /var/log/nginx
LogsDirectory=nginx
# write cache to /var/cache/nginx
CacheDirectory=nginx
# configuration is in /etc/nginx
ConfigurationDirectory=nginx
ExecStart=/usr/sbin/nginx -c /etc/nginx/nginx.conf
# PID is not necessary here as the service is not forking
ExecReload=/usr/sbin/nginx -s reload
Restart=on-failure
RestartSec=10s

# Hardening
# hide the entire filesystem tree from the service and also make it read only, requires systemd >=238
TemporaryFileSystem=/:ro
# Remount (bind) necessary paths, based on https://gitlab.com/apparmor/apparmor/blob/master/profiles/apparmor.d/abstractions/base,
# https://github.com/jelly/apparmor-profiles/blob/master/usr.bin.nginx,
# https://www.freedesktop.org/software/systemd/man/systemd.exec.html#RootDirectory=
## This gives access to (probably) necessary system files, allows journald logging
BindReadOnlyPaths=/lib/ /lib64/ /usr/lib/ /usr/lib64/ /etc/ld.so.cache /etc/ld.so.conf /etc/ld.so.conf.d/ /etc/bindresvport.blacklist /usr/share/zoneinfo/ /usr/share/locale/ /etc/localtime /usr/share/common-licenses/ /etc/ssl/certs/ /etc/resolv.conf
BindReadOnlyPaths=/dev/log /run/systemd/journal/socket /run/systemd/journal/stdout /run/systemd/notify
# Additional access to service-specific directories
BindReadOnlyPaths=/usr/sbin/nginx
BindReadOnlyPaths=/run/ /usr/share/nginx/
PrivateTmp=true
PrivateDevices=true
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=true
# Network access
RestrictAddressFamilies=AF_UNIX AF_INET AF_INET6
# Miscellaneous
SystemCallArchitectures=native
# also implicit because settings like MemoryDenyWriteExecute are set
NoNewPrivileges=true
MemoryDenyWriteExecute=true
ProtectKernelLogs=true
LockPersonality=true
ProtectHostname=true
RemoveIPC=true
RestrictSUIDSGID=true
ProtectClock=true
# Capabilities to bind low ports (80, 443)
AmbientCapabilities=CAP_NET_BIND_SERVICE

[Install]
WantedBy=multi-user.target
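
To keep a unit like this from losing its hardening unnoticed, for example when it is managed by configuration management, a simple check can compare the [Service] section against a list of expected settings. This Python sketch uses a deliberately minimal parser that ignores line continuations, specifiers and drop-in files; EXPECTED is an example selection taken from the unit above, not a complete policy, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Example selection of [Service] settings from the hardened unit (not a complete policy).
EXPECTED = {
    "NoNewPrivileges": "true",
    "PrivateTmp": "true",
    "PrivateDevices": "true",
    "ProtectControlGroups": "true",
    "ProtectKernelModules": "true",
    "ProtectKernelTunables": "true",
    "RestrictSUIDSGID": "true",
    "TemporaryFileSystem": "/:ro",
}

_BOOLEANS = {"1": "true", "yes": "true", "true": "true", "on": "true",
             "0": "false", "no": "false", "false": "false", "off": "false"}


def _normalize(value: str) -> str:
    """Map systemd's boolean spellings (yes/on/1, no/off/0) to true/false."""
    return _BOOLEANS.get(value.lower(), value)


def parse_unit(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Minimal unit-file parser: section -> key -> all assigned values (keys may repeat)."""
    sections: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    current = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            continue
        key, sep, value = line.partition("=")
        if sep:
            sections[current][key.strip()].append(value.strip())
    return sections


def missing_hardening(text: str) -> List[str]:
    """Return expected settings that are absent or set differently in [Service]."""
    service = parse_unit(text)["Service"]
    problems = []
    has_user = bool(service.get("User")) and service["User"][-1] not in ("", "root", "0")
    has_dynamic = bool(service.get("DynamicUser")) and _normalize(service["DynamicUser"][-1]) == "true"
    if not (has_user or has_dynamic):
        problems.append("User=/DynamicUser= (service runs as root)")
    for key, wanted in EXPECTED.items():
        values = service.get(key)
        # for single-value settings the last assignment wins
        if not values or _normalize(values[-1]) != wanted:
            problems.append(f"{key}={wanted}")
    return problems
```

Run against the default Ubuntu unit, every entry is reported; against the hardened unit, the list is empty.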

Now, not only does the service run as non-root, but the process and its sub-processes also only have access to a very limited part of the system. The entire filesystem is hidden by default, and only the necessary system directories are either made available or substituted by temporary paths. Besides that, data can only be persisted where necessary, which further limits the attack surface. Running systemd-analyze again on the new service shows the effect:
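
The effect of TemporaryFileSystem=/:ro combined with BindReadOnlyPaths= can be approximated with a prefix check: a path is visible only if it lies at or below one of the bound paths. This Python sketch is a simplified model under stated assumptions: the bind entries are plain absolute paths as in the unit above (no "-" prefix or source:destination pairs), and mounts that Systemd sets up on its own, for example for the *Directory= settings or PrivateTmp=, are not modelled. bound_paths and is_visible are hypothetical helpers.

```python
from pathlib import PurePosixPath
from typing import Iterable, List


def bound_paths(bind_values: Iterable[str]) -> List[PurePosixPath]:
    """Collect the paths from one or more BindReadOnlyPaths= values (space-separated)."""
    return [PurePosixPath(p) for value in bind_values for p in value.split()]


def is_visible(path: str, bind_values: Iterable[str]) -> bool:
    """True if `path` is one of the bound paths or lies below one of them.

    Simplified model: everything else is hidden by the empty read-only tmpfs on /.
    """
    target = PurePosixPath(path)
    return any(target == b or b in target.parents for b in bound_paths(bind_values))
```

Checking, for instance, each library reported by ldd /usr/sbin/nginx this way is one possible approach to spotting a missing bind before starting the service.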

david@proxy:~$ systemd-analyze security nginx.service --no-pager

Nginx Service Security Summary
  NAME                                                        DESCRIPTION                                                            EXPOSURE
✗ PrivateNetwork=                                             Service has access to the host's network                               0.5
✓ User=/DynamicUser=                                          Service runs under a static non-root user identity
✗ CapabilityBoundingSet=~CAP_SET(UID|GID|PCAP)                Service may change UID/GID identities/capabilities                     0.3
✗ CapabilityBoundingSet=~CAP_SYS_ADMIN                        Service has administrator privileges                                   0.3
✗ CapabilityBoundingSet=~CAP_SYS_PTRACE                       Service has ptrace() debugging abilities                               0.3
✗ RestrictAddressFamilies=~AF_(INET|INET6)                    Service may allocate Internet sockets                                  0.3
✗ RestrictNamespaces=~CLONE_NEWUSER                           Service may create user namespaces                                     0.3
✓ RestrictAddressFamilies=~…                                  Service cannot allocate exotic sockets
✗ CapabilityBoundingSet=~CAP_(CHOWN|FSETID|SETFCAP)           Service may change file ownership/access mode/capabilities unres…      0.2
✗ CapabilityBoundingSet=~CAP_(DAC_*|FOWNER|IPC_OWNER)         Service may override UNIX file/IPC permission checks                   0.2
✗ CapabilityBoundingSet=~CAP_NET_ADMIN                        Service has network configuration privileges                           0.2
✓ CapabilityBoundingSet=~CAP_RAWIO                            Service has no raw I/O access
✓ CapabilityBoundingSet=~CAP_SYS_MODULE                       Service cannot load kernel modules
✓ CapabilityBoundingSet=~CAP_SYS_TIME                         Service processes cannot change the system clock
✗ DeviceAllow=                                                Service has a device ACL with some special devices                     0.1
✗ IPAddressDeny=                                              Service does not define an IP address whitelist                        0.2
✓ KeyringMode=                                                Service doesn't share key material with other services
✓ NoNewPrivileges=                                            Service processes cannot acquire new privileges
✓ NotifyAccess=                                               Service child processes cannot alter service state
✓ PrivateDevices=                                             Service has no access to hardware devices
✓ PrivateMounts=                                              Service cannot install system mounts
✓ PrivateTmp=                                                 Service has no access to other software's temporary files
✗ PrivateUsers=                                               Service has access to other users                                      0.2
✗ ProtectClock=                                               Service may write to the hardware clock or system clock                0.2
✓ ProtectControlGroups=                                       Service cannot modify the control group file system
✗ ProtectHome=                                                Service has full access to home directories                            0.2
✓ ProtectKernelLogs=                                          Service cannot read from or write to the kernel log ring buffer
✓ ProtectKernelModules=                                       Service cannot load or read kernel modules
✓ ProtectKernelTunables=                                      Service cannot alter kernel tunables (/proc/sys, …)
✗ ProtectSystem=                                              Service has full access to the OS file hierarchy                       0.2
✓ RestrictAddressFamilies=~AF_PACKET                          Service cannot allocate packet sockets
✓ RestrictSUIDSGID=                                           SUID/SGID file creation by service is restricted
✓ SystemCallArchitectures=                                    Service may execute system calls only with native ABI
✗ SystemCallFilter=~@clock                                    Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@debug                                    Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@module                                   Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@mount                                    Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@raw-io                                   Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@reboot                                   Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@swap                                     Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@privileged                               Service does not filter system calls                                   0.2
✗ SystemCallFilter=~@resources                                Service does not filter system calls                                   0.2
✗ AmbientCapabilities=                                        Service process receives ambient capabilities                          0.1
✗ CapabilityBoundingSet=~CAP_AUDIT_*                          Service has audit subsystem access                                     0.1
✗ CapabilityBoundingSet=~CAP_KILL                             Service may send UNIX signals to arbitrary processes                   0.1
✓ CapabilityBoundingSet=~CAP_MKNOD                            Service cannot create device nodes
✗ CapabilityBoundingSet=~CAP_NET_(BIND_SERVICE|BROADCAST|RAW) Service has elevated networking privileges                             0.1
✓ CapabilityBoundingSet=~CAP_SYSLOG                           Service has no access to kernel logging
✗ CapabilityBoundingSet=~CAP_SYS_(NICE|RESOURCE)              Service has privileges to change resource use parameters               0.1
✗ RestrictNamespaces=~CLONE_NEWCGROUP                         Service may create cgroup namespaces                                   0.1
✗ RestrictNamespaces=~CLONE_NEWIPC                            Service may create IPC namespaces                                      0.1
✗ RestrictNamespaces=~CLONE_NEWNET                            Service may create network namespaces                                  0.1
✗ RestrictNamespaces=~CLONE_NEWNS                             Service may create file system namespaces                              0.1
✗ RestrictNamespaces=~CLONE_NEWPID                            Service may create process namespaces                                  0.1
✗ RestrictRealtime=                                           Service may acquire realtime scheduling                                0.1
✗ SystemCallFilter=~@cpu-emulation                            Service does not filter system calls                                   0.1
✗ SystemCallFilter=~@obsolete                                 Service does not filter system calls                                   0.1
✓ RestrictAddressFamilies=~AF_NETLINK                         Service cannot allocate netlink sockets
✗ RootDirectory=/RootImage=                                   Service runs within the host's root directory                          0.1
✓ SupplementaryGroups=                                        Service has no supplementary groups
✗ CapabilityBoundingSet=~CAP_MAC_*                            Service may adjust SMACK MAC                                           0.1
✗ CapabilityBoundingSet=~CAP_SYS_BOOT                         Service may issue reboot()                                             0.1
✓ Delegate=                                                   Service does not maintain its own delegated control group subtree
✓ LockPersonality=                                            Service cannot change ABI personality
✓ MemoryDenyWriteExecute=                                     Service cannot create writable executable memory mappings
✓ RemoveIPC=                                                  Service user cannot leave SysV IPC objects around
✗ RestrictNamespaces=~CLONE_NEWUTS                            Service may create hostname namespaces                                 0.1
✗ UMask=                                                      Files created by service are world-readable by default                 0.1
✗ CapabilityBoundingSet=~CAP_LINUX_IMMUTABLE                  Service may mark files immutable                                       0.1
✗ CapabilityBoundingSet=~CAP_IPC_LOCK                         Service may lock memory into RAM                                       0.1
✗ CapabilityBoundingSet=~CAP_SYS_CHROOT                       Service may issue chroot()                                             0.1
✓ ProtectHostname=                                            Service cannot change system host/domainname
✗ CapabilityBoundingSet=~CAP_BLOCK_SUSPEND                    Service may establish wake locks                                       0.1
✗ CapabilityBoundingSet=~CAP_LEASE                            Service may create file leases                                         0.1
✗ CapabilityBoundingSet=~CAP_SYS_PACCT                        Service may use acct()                                                 0.1
✗ CapabilityBoundingSet=~CAP_SYS_TTY_CONFIG                   Service may issue vhangup()                                            0.1
✓ CapabilityBoundingSet=~CAP_WAKE_ALARM                       Service cannot program timers that wake up the system
✗ RestrictAddressFamilies=~AF_UNIX                            Service may allocate local sockets                                     0.1

→ Overall exposure level for nginx.service: 6.1 MEDIUM 😐

The overall exposure dropped from 9.6 (UNSAFE) to 6.1 (MEDIUM). There's still room for improvement, but many potential attack vectors have been mitigated compared to the unit file shipped with Ubuntu.

🚀 Where to Continue

In summary, Systemd offers a straightforward method for constraining a process's capabilities, primarily by leveraging Linux namespaces. This approach can significantly enhance security, but it has its limits. That is where Mandatory Access Control steps in: tools such as AppArmor and SELinux provide fine-grained control over system access, albeit with a more intricate configuration process. It's worth noting that numerous Linux distributions ship predefined profiles for a wide range of services, which simplifies implementing these controls.

Ultimately, achieving a balance between security and practical implementation boils down to leveraging Systemd's capabilities alongside predefined Mandatory Access Control profiles. This approach strikes an effective compromise, ensuring both enhanced security and efficient deployment timelines.

