Segmentation Offloads
Introduction
This document describes a set of techniques in the Linux networking stack to take advantage of segmentation offload capabilities of various NICs.
The following technologies are described:
- TCP Segmentation Offload - TSO
- UDP Fragmentation Offload - UFO
- IPIP, SIT, GRE, and UDP Tunnel Offloads
- Generic Segmentation Offload - GSO
- Generic Receive Offload - GRO
- Partial Generic Segmentation Offload - GSO_PARTIAL
- SCTP acceleration with GSO - GSO_BY_FRAGS
TCP Segmentation Offload
TCP segmentation allows a device to segment a single frame into multiple frames with a data payload size specified in skb_shinfo()->gso_size. When TCP segmentation is requested, the bit for either SKB_GSO_TCPV4 or SKB_GSO_TCPV6 should be set in skb_shinfo()->gso_type and skb_shinfo()->gso_size should be set to a non-zero value.
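The relationship between the payload length and gso_size can be sketched as a small userspace model. This is not kernel code: struct tso_request and both helper names are hypothetical, and the model only assumes that every segment carries gso_size payload bytes except possibly the last.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace model of a TSO request; not a kernel structure. */
struct tso_request {
    size_t payload_len;      /* TCP payload bytes in the large frame */
    unsigned short gso_size; /* models skb_shinfo()->gso_size */
};

/* Number of frames produced: ceil(payload_len / gso_size). */
static size_t tso_segment_count(const struct tso_request *req)
{
    assert(req->gso_size != 0); /* a zero gso_size means no segmentation */
    return (req->payload_len + req->gso_size - 1) / req->gso_size;
}

/* Payload bytes carried by segment idx (0-based); only the last may be short. */
static size_t tso_segment_len(const struct tso_request *req, size_t idx)
{
    size_t off, left;

    assert(idx < tso_segment_count(req));
    off = idx * req->gso_size;
    left = req->payload_len - off;
    return left < req->gso_size ? left : req->gso_size;
}
```

For a 4000-byte payload with gso_size set to 1448, this model yields three segments carrying 1448, 1448, and 1104 bytes.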
TCP segmentation is dependent on support for the use of partial checksum offload. For this reason TSO is normally disabled if the Tx checksum offload for a given device is disabled.
In order to support TCP segmentation offload it is necessary to populate the network and transport header offsets of the skbuff so that the device drivers will be able to determine the offsets of the IP or IPv6 header and the TCP header. In addition, as CHECKSUM_PARTIAL is required, csum_start should also point to the TCP header of the packet.
For IPv4 segmentation we support one of two types in terms of the IP ID. The default behavior is to increment the IP ID with every segment. If the GSO type SKB_GSO_TCP_FIXEDID is specified then we will not increment the IP ID and all segments will use the same IP ID.
For encapsulated packets, SKB_GSO_TCP_FIXEDID refers only to the outer header. SKB_GSO_TCP_FIXEDID_INNER can be used to specify the same for the inner header. Any combination of these two GSO types is allowed.
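The two IP ID behaviors can be illustrated with a minimal userspace sketch. The function name seg_ip_id and its fixed_id parameter are hypothetical; fixed_id stands in for whether the relevant FIXEDID GSO type is set for the header in question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * IPv4 ID for segment idx (0-based) of one segmented frame.
 * fixed_id models SKB_GSO_TCP_FIXEDID (or its _INNER variant) being set.
 */
static uint16_t seg_ip_id(uint16_t first_id, unsigned int idx, bool fixed_id)
{
    if (fixed_id)
        return first_id;               /* every segment reuses the same ID */
    return (uint16_t)(first_id + idx); /* default: increment, wrapping at 2^16 */
}
```

Because the ID field is 16 bits wide, the incrementing mode wraps from 0xFFFF back to 0.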
If a device has NETIF_F_TSO_MANGLEID set then the IP ID can be ignored when performing TSO and we will either increment the IP ID for all frames, or leave it at a static value based on driver preference. For encapsulated packets, NETIF_F_TSO_MANGLEID is relevant for both outer and inner headers, unless the DF bit is not set on the outer header, in which case the device driver must guarantee that the IP ID field is incremented in the outer header with every segment.
UDP Fragmentation Offload
UDP fragmentation offload allows a device to fragment an oversized UDP datagram into multiple IPv4 fragments. Many of the requirements for UDP fragmentation offload are the same as TSO. However, the IPv4 ID for fragments should not increment as a single IPv4 datagram is fragmented.
UFO is deprecated: modern kernels will no longer generate UFO skbs, but can still receive them from tuntap and similar devices. Offload of UDP-based tunnel protocols is still supported.
IPIP, SIT, GRE, UDP Tunnel, and Remote Checksum Offloads
In addition to the offloads described above it is possible for a frame to contain additional headers such as an outer tunnel. In order to account for such instances an additional set of segmentation offload types was introduced, including SKB_GSO_IPXIP4, SKB_GSO_IPXIP6, SKB_GSO_GRE, and SKB_GSO_UDP_TUNNEL. These extra segmentation types are used to identify cases where there is more than one set of headers. For example, in the case of IPIP and SIT we should have the network and transport headers moved from the standard list of headers to "inner" header offsets.
Currently only two levels of headers are supported. The convention is to refer to the tunnel headers as the outer headers, while the encapsulated data is normally referred to as the inner headers. Below is the list of calls to access the given headers:
IPIP/SIT Tunnel:

            Outer                   Inner
MAC         skb_mac_header
Network     skb_network_header      skb_inner_network_header
Transport   skb_transport_header
UDP/GRE Tunnel:

            Outer                   Inner
MAC         skb_mac_header          skb_inner_mac_header
Network     skb_network_header      skb_inner_network_header
Transport   skb_transport_header    skb_inner_transport_header
In addition to the above tunnel types there are also SKB_GSO_GRE_CSUM and SKB_GSO_UDP_TUNNEL_CSUM. These two additional tunnel types reflect the fact that the outer header also requests to have a non-zero checksum included in the outer header.
Finally there is SKB_GSO_TUNNEL_REMCSUM which indicates that a given tunnel header has requested a remote checksum offload. In this case the inner headers will be left with a partial checksum and only the outer header checksum will be computed.
Generic Segmentation Offload
Generic segmentation offload is a pure software offload that is meant to deal with cases where device drivers cannot perform the offloads described above. In GSO, a given skbuff has its data broken out over multiple skbuffs that have been resized to match the MSS provided via skb_shinfo()->gso_size.
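The splitting step can be sketched in userspace as follows. This is only an illustration of the idea, not the kernel's skb_segment(): struct frame, MAX_FRAME, and soft_segment are hypothetical names, and the sketch copies the header unchanged into each output frame, whereas a real implementation would also fix up per-segment header fields such as lengths, sequence numbers, and checksums.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FRAME 2048 /* hypothetical upper bound for one output frame */

struct frame {
    unsigned char data[MAX_FRAME];
    size_t len;
};

/*
 * Split payload into frames carrying at most mss payload bytes each,
 * every frame prefixed with a verbatim copy of hdr.
 * Returns the number of frames written, or 0 if there is nothing to
 * send or the output does not fit.
 */
static size_t soft_segment(const unsigned char *hdr, size_t hdr_len,
                           const unsigned char *payload, size_t payload_len,
                           size_t mss, struct frame *out, size_t out_cap)
{
    size_t n = 0, off = 0;

    if (mss == 0 || hdr_len + mss > MAX_FRAME)
        return 0;

    while (off < payload_len) {
        size_t chunk = payload_len - off;

        if (chunk > mss)
            chunk = mss;       /* full-sized segment */
        if (n == out_cap)
            return 0;          /* caller's array is too small */

        memcpy(out[n].data, hdr, hdr_len);
        memcpy(out[n].data + hdr_len, payload + off, chunk);
        out[n].len = hdr_len + chunk;
        off += chunk;
        n++;
    }
    return n;
}
```

With a 4-byte header, a 10-byte payload, and an mss of 4, the sketch produces three frames of 8, 8, and 6 bytes.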
Before enabling any hardware segmentation offload a corresponding software offload is required in GSO. Otherwise it becomes possible for a frame to be re-routed between devices and end up being unable to be transmitted.
Generic Receive Offload
Generic receive offload is the complement to GSO. Ideally any frame assembled by GRO should be segmented to create an identical sequence of frames using GSO, and any sequence of frames segmented by GSO should be able to be reassembled back to the original by GRO.
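The reassembly direction can be sketched in userspace as the inverse of header-plus-payload slicing. The function coalesce_payload is hypothetical and greatly simplified: it assumes every frame starts with the same hdr_len-byte header and refuses to merge frames whose headers differ, standing in for the flow-matching a real receive path performs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Concatenate the payloads of n frames into out.
 * frames[i] points at hdr_len header bytes followed by payload;
 * lens[i] is the total length of frame i.
 * Returns the number of payload bytes written, or 0 on any mismatch
 * (short frame, differing header, or insufficient output space).
 */
static size_t coalesce_payload(const unsigned char *const *frames,
                               const size_t *lens, size_t n, size_t hdr_len,
                               unsigned char *out, size_t out_cap)
{
    size_t total = 0;

    for (size_t i = 0; i < n; i++) {
        size_t chunk;

        if (lens[i] < hdr_len)
            return 0;
        if (memcmp(frames[i], frames[0], hdr_len) != 0)
            return 0; /* headers differ: not mergeable in this model */

        chunk = lens[i] - hdr_len;
        if (chunk > out_cap - total)
            return 0;
        memcpy(out + total, frames[i] + hdr_len, chunk);
        total += chunk;
    }
    return total;
}
```

Feeding this function the frames produced by a header-copying splitter returns the original payload, which is the round-trip property described above.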
Partial Generic Segmentation Offload
Partial generic segmentation offload is a hybrid between TSO and GSO. It takes advantage of certain traits of TCP and tunnels so that instead of having to rewrite the packet headers for each segment, only the inner-most transport header and possibly the outer-most network header need to be updated. This allows devices that do not support tunnel offloads or tunnel offloads with checksum to still make use of segmentation.
With the partial offload, all headers excluding the inner transport header are updated such that they contain the correct values for the case where the header is simply duplicated. The one exception to this is the outer IPv4 ID field. It is up to the device drivers to guarantee that the IPv4 ID field is incremented in the case that a given header does not have the DF bit set.
SCTP acceleration with GSO
SCTP, despite the lack of hardware support, can still take advantage of GSO to pass one large packet through the network stack, rather than multiple small packets.
This requires a different approach to other offloads, as SCTP packets cannot be just segmented to (P)MTU. Rather, the chunks must be contained in IP segments, padding respected. So unlike regular GSO, SCTP can't just generate a big skb, set gso_size to the fragmentation point and deliver it to the IP layer.
Instead, the SCTP protocol layer builds an skb with the segments correctly padded and stored as chained skbs, and skb_segment() splits based on those. To signal this, gso_size is set to the special value GSO_BY_FRAGS.
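Why a plain byte-count split does not work can be seen in a small userspace sketch that packs whole chunks into segments. The names pad4 and pack_chunks are hypothetical, the sketch assumes chunks are padded to 4-byte boundaries as in the SCTP specification, and it ignores the SCTP common header and all other per-packet overhead.

```c
#include <assert.h>
#include <stddef.h>

/* Round a chunk length up to the next multiple of 4 bytes. */
static size_t pad4(size_t len)
{
    return (len + 3) & ~(size_t)3;
}

/*
 * Greedily place whole, padded chunks into segments of at most limit
 * bytes; a chunk is never split across two segments.
 * seg_len[i] receives the number of bytes used by segment i.
 * Returns the number of segments, or 0 if a chunk cannot fit at all
 * or seg_len has too few entries.
 */
static size_t pack_chunks(const size_t *chunk_len, size_t n_chunks,
                          size_t limit, size_t *seg_len, size_t seg_cap)
{
    size_t n_segs = 0, cur = 0;

    for (size_t i = 0; i < n_chunks; i++) {
        size_t need = pad4(chunk_len[i]);

        if (need > limit)
            return 0;
        if (cur + need > limit) { /* close the current segment */
            if (n_segs == seg_cap)
                return 0;
            seg_len[n_segs++] = cur;
            cur = 0;
        }
        cur += need;
    }
    if (cur > 0) {                /* flush the final segment */
        if (n_segs == seg_cap)
            return 0;
        seg_len[n_segs++] = cur;
    }
    return n_segs;
}
```

The resulting segments have uneven sizes that depend on chunk boundaries, which is why a single gso_size value cannot describe them and the segment boundaries have to be carried explicitly.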
Therefore, any code in the core networking stack must be aware of the possibility that gso_size will be GSO_BY_FRAGS and handle that case appropriately.
There are some helpers to make this easier:
- skb_is_gso(skb) && skb_is_gso_sctp(skb) is the best way to see if an skb is an SCTP GSO skb.
- For size checks, the skb_gso_validate_*_len family of helpers correctly considers GSO_BY_FRAGS.
- For manipulating packets, skb_increase_gso_size and skb_decrease_gso_size will check for GSO_BY_FRAGS and WARN if asked to manipulate these skbs.
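The kind of special-casing these helpers perform can be sketched with a userspace model of a size check. Everything here is illustrative: struct model_skb and gso_fits are hypothetical, the chained skbs are reduced to an array of lengths, and the GSO_BY_FRAGS value is restated as an assumption about the kernel's definition rather than taken from this document.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed to match the kernel's sentinel; restated here for the model. */
#define GSO_BY_FRAGS 0xFFFF

/* Hypothetical stand-in for the fields a size check would look at. */
struct model_skb {
    unsigned short gso_size; /* models skb_shinfo()->gso_size */
    size_t hdr_len;          /* headers replicated on every segment */
    const size_t *frag_len;  /* payload length of each chained segment */
    size_t n_frags;
};

/* Would every resulting segment fit within max_len bytes? */
static bool gso_fits(const struct model_skb *skb, size_t max_len)
{
    /* Regular GSO: all segments are bounded by hdr_len + gso_size. */
    if (skb->gso_size != GSO_BY_FRAGS)
        return skb->hdr_len + skb->gso_size <= max_len;

    /* GSO_BY_FRAGS: gso_size is a marker, so check each segment. */
    for (size_t i = 0; i < skb->n_frags; i++)
        if (skb->hdr_len + skb->frag_len[i] > max_len)
            return false;
    return true;
}
```

Treating GSO_BY_FRAGS as an ordinary size would make the first branch compare a meaningless 65535-byte segment against max_len, which is the mistake the helpers exist to prevent.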
This also affects drivers with the NETIF_F_FRAGLIST & NETIF_F_GSO_SCTP bits set. Note also that NETIF_F_GSO_SCTP is included in NETIF_F_GSO_SOFTWARE.