Netfilter’s flowtable infrastructure
This documentation describes the Netfilter flowtable infrastructure, which allows you to define a fastpath through the flowtable datapath. This infrastructure also provides hardware offload support. The flowtable supports the layer 3 IPv4 and IPv6 protocols and the layer 4 TCP and UDP protocols.
Overview
Once the first packet of the flow successfully goes through the IP forwarding path, from the second packet on, you might decide to offload the flow to the flowtable through your ruleset. The flowtable infrastructure provides a rule action that allows you to specify when to add a flow to the flowtable.
A packet that finds a matching entry in the flowtable (i.e. a flowtable hit) is transmitted to the output netdevice via neigh_xmit(); hence, such packets bypass the classic IP forwarding path (the visible effect is that you do not see these packets from any of the Netfilter hooks coming after ingress). If there is no matching entry in the flowtable (i.e. a flowtable miss), the packet follows the classic IP forwarding path.
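The hit/miss dispatch can be modelled in a few lines. This is a toy sketch, not kernel code: the class `FlowtableModel`, its `process()` method and the hook names it records are hypothetical, and the flow key is a plain tuple. It shows that a hit leaves the packet with only the ingress hook visited, while a miss walks the classic hooks.

```python
from dataclasses import dataclass, field


@dataclass
class FlowtableModel:
    """Toy model of the flowtable lookup at the ingress hook (hypothetical)."""

    entries: set = field(default_factory=set)       # keys of offloaded flows
    hooks_seen: list = field(default_factory=list)  # hooks the last packet visited

    def process(self, key) -> str:
        """Return which path one packet, identified by `key`, takes."""
        self.hooks_seen = ["ingress"]        # the flowtable is attached to ingress
        if key in self.entries:              # flowtable hit
            return "fastpath"                # straight to neigh_xmit(), later hooks skipped
        # flowtable miss: the packet follows the classic IP forwarding path
        self.hooks_seen += ["prerouting", "forward", "postrouting"]
        return "classic"


ft = FlowtableModel()
flow = ("10.141.10.2", "192.168.10.2", 52728, 5201)
first = ft.process(flow)     # nothing offloaded yet
ft.entries.add(flow)         # what a 'flow add' rule would do
second = ft.process(flow)
print(first, second, ft.hooks_seen)
```

The same model also explains why a forward-chain counter stops increasing once a flow is offloaded: hit packets never reach the forward hook.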
The flowtable uses a resizable hashtable. Lookups are based on the following n-tuple selectors: layer 2 protocol encapsulation (VLAN and PPPoE), layer 3 source and destination, layer 4 source and destination ports and the input interface (useful in case there are several conntrack zones in place).
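As a sketch of how these selectors form a lookup key, the hypothetical `FlowKey` below holds only the fields listed above, and a Python dict (itself a resizable hash table) stands in for the flowtable hashtable. A real implementation would also need to tell, for example, TCP from UDP; that field is omitted here for brevity.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FlowKey:
    """Hypothetical lookup key built from the n-tuple selectors."""

    l2_encap: Optional[Tuple[str, int]]  # e.g. ("vlan", 100), ("pppoe", 0x1234) or None
    src_addr: str
    dst_addr: str
    src_port: int
    dst_port: int
    iif: str                             # input interface


flowtable = {}  # stand-in for the kernel's resizable hashtable

key = FlowKey(None, "10.141.10.2", "192.168.10.2", 52728, 5201, "eth0")
flowtable[key] = "offloaded entry"

same_flow = FlowKey(None, "10.141.10.2", "192.168.10.2", 52728, 5201, "eth0")
other_iif = FlowKey(None, "10.141.10.2", "192.168.10.2", 52728, 5201, "eth1")
vlan_flow = FlowKey(("vlan", 100), "10.141.10.2", "192.168.10.2", 52728, 5201, "eth0")

print(flowtable.get(same_flow))  # hit: every selector matches
print(flowtable.get(other_iif))  # miss: different input interface
print(flowtable.get(vlan_flow))  # miss: different layer 2 encapsulation
```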
The ‘flow add’ action allows you to populate the flowtable: the user selectively specifies which flows are placed into the flowtable. Hence, packets follow the classic IP forwarding path unless the user explicitly instructs flows to use this new alternative forwarding path via policy.
The flowtable datapath is represented in Fig.1, which describes the classic IP forwarding path, including the Netfilter hooks, and the flowtable fastpath bypass.
```
                                          userspace process
                                           ^              |
                                           |              |
                                      _____|____      ____\/___
                                     /          \    /         \
                                     |   input  |    |  output |
                                     \__________/    \_________/
                                           ^              |
                                           |              |
        _________      __________      ----------    _____\/_____
       /         \    /          \     |Routing |   /             \
    -->  ingress  ---> prerouting ---> |decision|   | postrouting |--> neigh_xmit
       \_________/    \__________/     ----------   \_____________/         ^
         |      ^                          |              ^                 |
     flowtable  |                      ____\/___          |                 |
         |      |                     /         \         |                 |
       __\/___  |                     | forward |----------                 |
       |-----|  |                     \_________/                           |
       |-----|  |                 'flow offload' rule                       |
       |-----|  |                    adds entry to                          |
       |_____|  |                      flowtable                            |
         |      |                                                           |
        / \     |                                                           |
       /hit\_no_|                                                           |
       \ ? /                                                                |
        \ /                                                                 |
         |__yes_________________ fastpath bypass ___________________________|

               Fig.1 Netfilter hooks and flowtable interactions
```
The flowtable entry also stores the NAT configuration, so all packets are mangled according to the NAT policy that is specified from the classic IP forwarding path. The TTL is decremented before calling neigh_xmit(). Fragmented traffic is passed up to follow the classic IP forwarding path, given that the transport header is missing; in this case, flowtable lookups are not possible. TCP RST and FIN packets are also passed up to the classic IP forwarding path to release the flow gracefully. Packets that exceed the MTU are also passed up to the classic forwarding path so that packet-too-big ICMP errors are reported to the sender.
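The pass-up rules above can be summarized as a small decision function. This is a sketch under assumptions: the `Packet` fields and `fastpath_or_classic()` are hypothetical, the order of the checks is illustrative rather than the kernel's, and NAT mangling is only noted in a comment.

```python
from dataclasses import dataclass, field


@dataclass
class Packet:
    """Hypothetical packet summary; only the fields the checks need."""

    is_fragment: bool
    length: int
    ttl: int
    tcp_flags: set = field(default_factory=set)


def fastpath_or_classic(pkt: Packet, mtu: int) -> str:
    """Decide which path a packet belonging to an offloaded flow takes."""
    if pkt.is_fragment:
        return "classic"        # no transport header, so no lookup is possible
    if pkt.tcp_flags & {"RST", "FIN"}:
        return "classic"        # let the classic path release the flow gracefully
    if pkt.length > mtu:
        return "classic"        # classic path reports ICMP packet-too-big
    pkt.ttl -= 1                # TTL is decremented before neigh_xmit()
    # NAT mangling from the stored entry would also be applied here (omitted).
    return "fastpath"


pkt = Packet(is_fragment=False, length=1400, ttl=64, tcp_flags={"ACK"})
print(fastpath_or_classic(pkt, mtu=1500), pkt.ttl)
```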
Example configuration
Enabling the flowtable bypass is relatively easy. You only need to create a flowtable and add one rule to your forward chain:
```
table inet x {
    flowtable f {
        hook ingress priority 0; devices = { eth0, eth1 };
    }
    chain y {
        type filter hook forward priority 0; policy accept;
        ip protocol tcp flow add @f
        counter packets 0 bytes 0
    }
}
```

This example adds the flowtable ‘f’ to the ingress hook of the eth0 and eth1 netdevices. You can create as many flowtables as you want in case you need to perform resource partitioning. The flowtable priority defines the order in which hooks are run in the pipeline; this is convenient in case you already have an nftables ingress chain (make sure the flowtable priority is smaller than that of the nftables ingress chain, so that the flowtable runs earlier in the pipeline).
The ‘flow add’ action from the forward chain ‘y’ adds an entry to the flowtable for the TCP syn-ack packet coming in the reply direction. Once the flow is offloaded, you will observe that the counter rule in the example above does not get updated for the packets that are being forwarded through the flowtable bypass.
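The priority rule from the example can be illustrated by sorting hooks on their priority, since hooks registered on the same point run in ascending priority order. The helper `hook_order()` and the priority 10 chosen for the existing ingress chain are assumptions made for this sketch.

```python
def hook_order(hooks):
    """Return hook names sorted by priority; lower values run first."""
    return [name for name, prio in sorted(hooks, key=lambda h: h[1])]


# Hypothetical ingress hooks on eth0: the existing chain uses priority 10,
# so the flowtable (priority 0 in the example) runs before it.
ingress_hooks = [("nftables ingress chain", 10), ("flowtable f", 0)]
print(hook_order(ingress_hooks))
```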
You can identify offloaded flows through the [OFFLOAD] tag when listing your connection tracking table.
```
# conntrack -L
tcp 6 src=10.141.10.2 dst=192.168.10.2 sport=52728 dport=5201 src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=52728 [OFFLOAD] mark=0 use=2
```
Layer 2 encapsulation
Since Linux kernel 5.13, the flowtable infrastructure discovers the real netdevice behind VLAN and PPPoE netdevices. The flowtable software datapath parses the VLAN and PPPoE layer 2 headers to extract the ethertype and the VLAN ID / PPPoE session ID, which are used for the flowtable lookups. The flowtable datapath also deals with layer 2 decapsulation.
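A minimal sketch of that header parsing, assuming an Ethernet frame with at most one 802.1Q tag followed by an optional PPPoE session header. The EtherType and PPP protocol constants are the standard values; the function `parse_l2()` is hypothetical and is not the kernel's parser.

```python
import struct

ETH_P_IP = 0x0800       # IPv4
ETH_P_IPV6 = 0x86DD     # IPv6
ETH_P_8021Q = 0x8100    # 802.1Q VLAN tag
ETH_P_PPP_SES = 0x8864  # PPPoE session stage
PPP_IP = 0x0021         # PPP protocol number for IPv4
PPP_IPV6 = 0x0057       # PPP protocol number for IPv6


def parse_l2(frame: bytes):
    """Return (ethertype, vlan_id, pppoe_session_id, l3_offset).

    Handles a single 802.1Q tag only; ethertype is None for non-IP PPP payloads.
    """
    ethertype = struct.unpack_from("!H", frame, 12)[0]  # after dst/src MAC
    offset = 14
    vlan_id = session_id = None
    if ethertype == ETH_P_8021Q:
        tci, ethertype = struct.unpack_from("!HH", frame, offset)
        vlan_id = tci & 0x0FFF                            # low 12 bits: VLAN ID
        offset += 4
    if ethertype == ETH_P_PPP_SES:
        # ver/type, code, session id, length, then the 2-byte PPP protocol
        _vt, _code, session_id, _len, ppp_proto = struct.unpack_from("!BBHHH", frame, offset)
        ethertype = {PPP_IP: ETH_P_IP, PPP_IPV6: ETH_P_IPV6}.get(ppp_proto)
        offset += 8
    return ethertype, vlan_id, session_id, offset


# Example: VLAN 100 carrying PPPoE session 0x1234 with an IPv4 payload.
frame = (bytes(12)
         + struct.pack("!HH", ETH_P_8021Q, 100)
         + struct.pack("!HBBHHH", ETH_P_PPP_SES, 0x11, 0x00, 0x1234, 2, PPP_IP)
         + b"\x45")
print(parse_l2(frame))  # (2048, 100, 4660, 26)
```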
You do not need to add the PPPoE and VLAN devices to your flowtable; the real device is sufficient for the flowtable to track your flows.
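The idea of resolving a stacked virtual device down to the real one can be sketched as a walk over a lower-device map. The device names `ppp0` and `eth0.100` and the `LOWER_DEV` table are hypothetical; the kernel obtains this topology from the devices themselves.

```python
# Hypothetical device stack: each virtual device maps to the device below it.
LOWER_DEV = {
    "ppp0": "eth0.100",   # PPPoE session running over a VLAN device
    "eth0.100": "eth0",   # 802.1Q VLAN device on top of the real NIC
}


def real_device(dev: str) -> str:
    """Walk down the device stack until a device with no lower device is found."""
    seen = set()
    while dev in LOWER_DEV:
        if dev in seen:
            raise ValueError(f"loop in device stack at {dev}")
        seen.add(dev)
        dev = LOWER_DEV[dev]
    return dev


print(real_device("ppp0"))  # eth0
print(real_device("eth1"))  # eth1
```

Only `eth0` then needs to appear in the flowtable `devices` list.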
Bridge and IP forwarding
Since Linux kernel 5.13, you can add bridge ports to the flowtable. The flowtable infrastructure discovers the topology behind the bridge device. This allows the flowtable to define a fastpath bypass between the bridge ports (represented as eth1 and eth2 in the example figure below) and the gateway device (represented as eth0) in your switch/router.
```
                       fastpath bypass
                .-------------------------.
               /                           \
               |           IP forwarding   |
               |          /             \ \/
               |       br0               eth0 ..... eth0
               .       / \                          *host B*
                -> eth1  eth2
                    .          *switch/router*
                    .
                    .
                  eth0
                *host A*
```
The flowtable infrastructure also supports bridge VLAN filtering actions such as PVID and untagged. You can also stack a classic VLAN device on top of your bridge port.
If you want your flowtable to define a fastpath between your bridge ports and your IP forwarding path, you have to add your bridge ports (as represented by the real netdevice) to your flowtable definition.
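To illustrate what "discovering the topology behind the bridge" means for egress, the sketch below resolves `br0` to a real port by looking up the destination MAC in a forwarding database. The `FDB_BR0` contents, the MAC addresses and `resolve_egress()` are hypothetical, and returning `None` for an unlearned MAC is a simplification of this sketch only.

```python
# Hypothetical forwarding database (FDB) of bridge br0: MAC address -> port.
FDB_BR0 = {
    "02:00:00:00:00:0a": "eth1",   # host A is reachable through eth1
    "02:00:00:00:00:0b": "eth2",
}


def resolve_egress(dev, dst_mac):
    """Resolve the egress netdevice, looking through br0 to the real port."""
    if dev != "br0":
        return dev                      # not a bridge: already a real device
    return FDB_BR0.get(dst_mac)         # None if the MAC has not been learned


print(resolve_egress("br0", "02:00:00:00:00:0a"))  # eth1
print(resolve_egress("eth0", "02:00:00:00:00:ff"))  # eth0
print(resolve_egress("br0", "02:00:00:00:00:ff"))  # None
```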
Counters
The flowtable can synchronize packet and byte counters with the existing connection tracking entry if you specify the counter statement in your flowtable definition, e.g.
```
table inet x {
    flowtable f {
        hook ingress priority 0; devices = { eth0, eth1 };
        counter
    }
}
```

Counter support is available since Linux kernel 5.7.
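A toy model of this synchronization: each fastpath packet is added to the per-direction counters of the conntrack entry, but only when the flowtable has the counter statement. The class `ConntrackCounters` and the function `fastpath_xmit()` are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field

ORIGINAL, REPLY = 0, 1   # conntrack directions


@dataclass
class ConntrackCounters:
    """Hypothetical per-direction packet/byte counters of a conntrack entry."""

    packets: list = field(default_factory=lambda: [0, 0])
    nbytes: list = field(default_factory=lambda: [0, 0])


def fastpath_xmit(ct, direction, length, flowtable_counter):
    """Account one fastpath packet if the flowtable has the counter statement."""
    if flowtable_counter:
        ct.packets[direction] += 1
        ct.nbytes[direction] += length


ct = ConntrackCounters()
fastpath_xmit(ct, ORIGINAL, 1500, flowtable_counter=True)
fastpath_xmit(ct, REPLY, 60, flowtable_counter=True)
fastpath_xmit(ct, REPLY, 60, flowtable_counter=False)  # not accounted
print(ct.packets, ct.nbytes)   # [1, 1] [1500, 60]
```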
Hardware offload
If your network device provides hardware offload support, you can turn it on by means of the ‘offload’ flag in your flowtable definition, e.g.
```
table inet x {
    flowtable f {
        hook ingress priority 0; devices = { eth0, eth1 };
        flags offload;
    }
}
```

There is a workqueue that adds the flows to the hardware. Note that a few packets might still run over the flowtable software path until the workqueue has a chance to offload the flow to the network device.
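The delay between adding a flow and having it in hardware can be sketched with a simple queue. `HwOffloadModel` and its methods are hypothetical; the point is only that packets seen before the worker runs still take the software flowtable path.

```python
from collections import deque


class HwOffloadModel:
    """Toy model: flows wait in a queue until a worker pushes them to hardware."""

    def __init__(self):
        self.pending = deque()   # flows queued for the (hypothetical) worker
        self.in_hw = set()       # flows already installed in the NIC

    def add_flow(self, flow):
        self.pending.append(flow)      # entry already exists in the software flowtable

    def run_workqueue(self):
        while self.pending:
            self.in_hw.add(self.pending.popleft())

    def path_for(self, flow):
        return "hardware" if flow in self.in_hw else "software"


hw = HwOffloadModel()
hw.add_flow("flow-1")
before = hw.path_for("flow-1")   # worker has not run yet
hw.run_workqueue()
after = hw.path_for("flow-1")
print(before, after)
```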
You can identify hardware offloaded flows through the [HW_OFFLOAD] tag when listing your connection tracking table. Note that the [OFFLOAD] tag refers to the software offload mode, so there is a distinction between [OFFLOAD], which refers to the software flowtable fastpath, and [HW_OFFLOAD], which refers to the hardware offload datapath being used by the flow.
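When scripting around `conntrack -L`, the two tags can be told apart with a plain substring check. This sketch assumes the output format shown in the example listing; `offload_mode()` is a hypothetical helper, and it checks [HW_OFFLOAD] first for clarity.

```python
from collections import Counter


def offload_mode(line: str) -> str:
    """Classify one `conntrack -L` line by its offload tag."""
    if "[HW_OFFLOAD]" in line:
        return "hardware"
    if "[OFFLOAD]" in line:
        return "software"
    return "none"


listing = [
    "tcp 6 src=10.141.10.2 dst=192.168.10.2 sport=52728 dport=5201 "
    "src=192.168.10.2 dst=192.168.10.1 sport=5201 dport=52728 [OFFLOAD] mark=0 use=2",
]
print(Counter(offload_mode(line) for line in listing))
```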
The flowtable hardware offload infrastructure also supports DSA (Distributed Switch Architecture).
Limitations
The flowtable behaves like a cache. The flowtable entries might become stale if either the destination MAC address or the egress netdevice that is used for transmission changes.
This might be a problem if:

- You run the flowtable in software mode and you combine bridge and IP forwarding in your setup.
- Hardware offload is enabled.
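The cache behaviour can be made concrete with a sketch in which an entry remembers the egress device and destination MAC it was created with. `CachedXmit` and `is_stale()` are hypothetical, as are the addresses; the scenario of a host moving between bridge ports is one example of a change the cached entry does not follow.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedXmit:
    """What a (hypothetical) flowtable entry remembers for transmission."""

    out_dev: str
    dst_mac: str


def is_stale(entry: CachedXmit, out_dev: str, dst_mac: str) -> bool:
    """An entry is stale once the real egress device or destination MAC changed."""
    return entry.out_dev != out_dev or entry.dst_mac != dst_mac


entry = CachedXmit(out_dev="eth1", dst_mac="02:00:00:00:00:0a")
print(is_stale(entry, "eth1", "02:00:00:00:00:0a"))  # False
# Host A moves from bridge port eth1 to eth2: the cached entry still says eth1.
print(is_stale(entry, "eth2", "02:00:00:00:00:0a"))  # True
```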
More reading
This documentation is based on the LWN.net articles [1][2]. Rafal Milecki also made a very complete and comprehensive summary called “A state of network acceleration” that describes how things were before this infrastructure was mainlined [3], and he also wrote a rough summary of this work [4].
[1] LWN.net article
[2] LWN.net article
[3] http://lists.infradead.org/pipermail/lede-dev/2018-January/010830.html
[4] http://lists.infradead.org/pipermail/lede-dev/2018-January/010829.html