XDP RX Metadata

This document describes how an eXpress Data Path (XDP) program can access hardware metadata related to a packet using a set of helper functions, and how it can pass that metadata on to other consumers.

General Design

XDP has access to a set of kfuncs to manipulate the metadata in an XDP frame. Every device driver that wishes to expose additional packet metadata can implement these kfuncs. The set of kfuncs is declared in include/net/xdp.h via XDP_METADATA_KFUNC_xxx.

Currently, the following kfuncs are supported. In the future, as more metadata is supported, this set will grow:

__bpf_kfunc int bpf_xdp_metadata_rx_timestamp(const struct xdp_md *ctx, u64 *timestamp)

Read XDP frame RX timestamp.

Parameters

const struct xdp_md *ctx

XDP context pointer.

u64 *timestamp

Return value pointer.

Return

  • Returns 0 on success or -errno on error.

  • -EOPNOTSUPP : means device driver does not implement kfunc

  • -ENODATA : means no RX-timestamp available for this frame

__bpf_kfunc int bpf_xdp_metadata_rx_hash(const struct xdp_md *ctx, u32 *hash, enum xdp_rss_hash_type *rss_type)

Read XDP frame RX hash.

Parameters

const struct xdp_md *ctx

XDP context pointer.

u32 *hash

Return value pointer.

enum xdp_rss_hash_type *rss_type

Return value pointer for RSS type.

Description

The RSS hash type (rss_type) specifies which portion of the packet headers the NIC hardware used when calculating the RSS hash value. The RSS type can be decoded via enum xdp_rss_hash_type, either by matching on the individual L3/L4 bits (XDP_RSS_L*) or by the combined traditional RSS hashing types (XDP_RSS_TYPE_L*).

Return

  • Returns 0 on success or -errno on error.

  • -EOPNOTSUPP : means device driver doesn’t implement kfunc

  • -ENODATA : means no RX-hash available for this frame

__bpf_kfunc int bpf_xdp_metadata_rx_vlan_tag(const struct xdp_md *ctx, __be16 *vlan_proto, u16 *vlan_tci)

Get XDP packet outermost VLAN tag.

Parameters

const struct xdp_md *ctx

XDP context pointer.

__be16 *vlan_proto

Destination pointer for VLAN Tag protocol identifier (TPID).

u16 *vlan_tci

Destination pointer for VLAN TCI (VID + DEI + PCP).

Description

On success, vlan_proto contains the Tag protocol identifier (TPID), usually ETH_P_8021Q or ETH_P_8021AD, but some networks can use custom TPIDs. vlan_proto is stored in network byte order (BE) and should be used as follows: if (vlan_proto == bpf_htons(ETH_P_8021Q)) do_something();

vlan_tci contains the remaining 16 bits of a VLAN tag. The driver is expected to provide those in host byte order (usually LE), so the BPF program should not perform byte conversion. According to the 802.1Q standard, the VLAN TCI (Tag control information) is a bit field that contains:

  • VLAN identifier (VID), which can be read with vlan_tci & 0xfff

  • Drop eligible indicator (DEI) - 1 bit

  • Priority code point (PCP) - 3 bits

For the detailed meaning of DEI and PCP, please refer to other sources.

Return

  • Returns 0 on success or -errno on error.

  • -EOPNOTSUPP : device driver doesn’t implement kfunc

  • -ENODATA : VLAN tag was not stripped or is not available
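The VLAN TCI bit field described above can be unpacked with a few shifts and masks. The following is a minimal userspace sketch, not kernel code: the struct and function names are hypothetical, and the bit positions (PCP in bits 15..13, DEI in bit 12, VID in bits 11..0) are taken from the 802.1Q tag layout rather than from this document. It assumes vlan_tci is already in host byte order, as the kfunc is expected to provide it.

```c
#include <stdint.h>

/* Hypothetical container for the three TCI fields (illustration only). */
struct vlan_tci_fields {
	uint16_t vid; /* VLAN identifier, 12 bits */
	uint8_t dei;  /* drop eligible indicator, 1 bit */
	uint8_t pcp;  /* priority code point, 3 bits */
};

/*
 * Split a host-byte-order TCI into its fields.
 * Layout per 802.1Q: PCP = bits 15..13, DEI = bit 12, VID = bits 11..0.
 */
static inline struct vlan_tci_fields vlan_tci_decode(uint16_t vlan_tci)
{
	struct vlan_tci_fields f;

	f.vid = vlan_tci & 0x0fff;
	f.dei = (vlan_tci >> 12) & 0x1;
	f.pcp = (vlan_tci >> 13) & 0x7;
	return f;
}
```

For instance, a TCI of 0xA064 decodes to VID 100, DEI 0 and PCP 5.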

An XDP program can use these kfuncs to read the metadata into stack variables for its own consumption. Or, to pass the metadata on to other consumers, an XDP program can store it into the metadata area carried ahead of the packet. Not all packets will necessarily have the requested metadata available, in which case the driver returns -ENODATA.

Not all kfuncs have to be implemented by the device driver; when not implemented, the default ones that return -EOPNOTSUPP will be used to indicate that the device driver has not implemented this kfunc.
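A caller therefore has to distinguish three outcomes of a metadata kfunc: a valid value, no value for this particular frame, and no driver support at all. The sketch below illustrates that classification in plain userspace C. Everything in it is hypothetical: meta_classify, stub_rx_timestamp and timestamp_or_zero are illustration-only names, the stub merely stands in for a driver that lacks the kfunc, and falling back to 0 is a choice made for this sketch, not a rule.

```c
#include <errno.h>
#include <stdint.h>

/* What the caller learned from one metadata read (hypothetical enum). */
enum meta_status {
	META_OK,          /* 0: the value is valid */
	META_UNAVAILABLE, /* -ENODATA: no value for this frame */
	META_UNSUPPORTED, /* -EOPNOTSUPP: driver does not implement the kfunc */
	META_ERROR,       /* any other negative errno */
};

static inline enum meta_status meta_classify(int ret)
{
	if (ret == 0)
		return META_OK;
	if (ret == -ENODATA)
		return META_UNAVAILABLE;
	if (ret == -EOPNOTSUPP)
		return META_UNSUPPORTED;
	return META_ERROR;
}

/* Hypothetical stand-in for a driver without RX timestamp support. */
static inline int stub_rx_timestamp(uint64_t *timestamp)
{
	(void)timestamp;
	return -EOPNOTSUPP;
}

/* Return the timestamp, or 0 when no valid value was obtained. */
static inline uint64_t timestamp_or_zero(int (*read_ts)(uint64_t *))
{
	uint64_t ts = 0;

	if (meta_classify(read_ts(&ts)) != META_OK)
		return 0;
	return ts;
}
```

If 0 is used as a "not available" marker like this, the consumer of the metadata has to know about that convention as part of the agreed format.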

Within an XDP frame, the metadata layout (accessed via xdp_buff) is as follows:

+----------+-----------------+------+
| headroom | custom metadata | data |
+----------+-----------------+------+
           ^                 ^
           |                 |
 xdp_buff->data_meta   xdp_buff->data

An XDP program can store individual metadata items into this data_meta area in whichever format it chooses. Later consumers of the metadata will have to agree on the format by some out-of-band contract (like for the AF_XDP use case, see below).

AF_XDP

The AF_XDP use case implies that there is a contract between the BPF program that redirects XDP frames into the AF_XDP socket (XSK) and the final consumer. Thus the BPF program manually allocates a fixed number of bytes out of metadata via bpf_xdp_adjust_meta and calls a subset of kfuncs to populate it. The userspace XSK consumer computes xsk_umem__get_data() - METADATA_SIZE to locate that metadata. Note that xsk_umem__get_data is defined in libxdp and METADATA_SIZE is an application-specific constant (the AF_XDP receive descriptor does _not_ explicitly carry the size of the metadata).
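The consumer-side pointer arithmetic can be sketched as follows. This is a minimal illustration under stated assumptions: struct example_meta is a hypothetical layout standing in for whatever the BPF program and the consumer have agreed on, read_meta is a hypothetical helper, and pkt_data stands in for the pointer that xsk_umem__get_data() would return (the sketch does not call libxdp itself).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical metadata layout agreed on out of band (illustration only). */
struct example_meta {
	uint64_t rx_timestamp;
	uint32_t rx_hash;
	uint32_t rx_hash_type;
};

/* Application-specific constant; the RX descriptor does not carry it. */
#define METADATA_SIZE sizeof(struct example_meta)

/*
 * Copy the metadata that sits immediately ahead of the packet data.
 * pkt_data plays the role of the xsk_umem__get_data() result.
 */
static inline void read_meta(const void *pkt_data, struct example_meta *out)
{
	const uint8_t *meta = (const uint8_t *)pkt_data - METADATA_SIZE;

	/* memcpy avoids assuming anything about the area's alignment. */
	memcpy(out, meta, sizeof(*out));
}
```

Because the size is not transmitted with the frame, both sides must be built against the same struct definition; a mismatch silently yields garbage rather than an error.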

Here is the AF_XDP consumer layout (note the missing data_meta pointer):

+----------+-----------------+------+
| headroom | custom metadata | data |
+----------+-----------------+------+
                             ^
                             |
                      rx_desc->address

XDP_PASS

This is the path where the packets processed by the XDP program are passed into the kernel. The kernel creates the skb out of the xdp_buff contents. Currently, every driver has custom kernel code to parse the descriptors and populate skb metadata when doing this xdp_buff->skb conversion, and the XDP metadata is not used by the kernel when building skbs. However, TC-BPF programs can access the XDP metadata area using the data_meta pointer.

In the future, we’d like to support a case where an XDP program can override some of the metadata used for building skbs.

bpf_redirect_map

bpf_redirect_map can redirect the frame to a different device. Some devices (like virtual ethernet links) support running a second XDP program after the redirect. However, the final consumer doesn’t have access to the original hardware descriptor and can’t access any of the original metadata. The same applies to XDP programs installed into devmaps and cpumaps.

This means that for redirected packets only custom metadata is currently supported, which has to be prepared by the initial XDP program before redirect. If the frame is eventually passed to the kernel, the skb created from such a frame won’t have any hardware metadata populated. If such a packet is later redirected into an XSK, that will also only have access to the custom metadata.

bpf_tail_call

Adding programs that access metadata kfuncs to the BPF_MAP_TYPE_PROG_ARRAY is currently not supported.

Supported Devices

It is possible to query which kfuncs a particular netdev implements via netlink. See the xdp-rx-metadata-features attribute set in Documentation/netlink/specs/netdev.yaml.

Driver Implementation

Certain devices may prepend metadata to received packets. However, as of now, AF_XDP lacks the ability to communicate the size of the data_meta area to the consumer. Therefore, it is the responsibility of the driver to copy any device-reserved metadata out from the metadata area and ensure that xdp_buff->data_meta is pointing to xdp_buff->data before presenting the frame to the XDP program. This is necessary so that, after the XDP program adjusts the metadata area, the consumer can reliably retrieve the metadata address using the METADATA_SIZE offset.

The following diagram shows how custom metadata is positioned relative to the packet data and how pointers are adjusted for metadata access:

            |<-- bpf_xdp_adjust_meta(xdp_buff, -METADATA_SIZE) --|
new xdp_buff->data_meta                              old xdp_buff->data_meta
            |                                                    |
            |                                            xdp_buff->data
            |                                                    |
 +----------+----------------------------------------------------+------+
 | headroom |                  custom metadata                   | data |
 +----------+----------------------------------------------------+------+
            |                                                    |
            |                                            xdp_desc->addr
            |<------ xsk_umem__get_data() - METADATA_SIZE -------|

bpf_xdp_adjust_meta ensures that METADATA_SIZE is aligned to 4 bytes, does not exceed 252 bytes, and leaves sufficient space for building the xdp_frame. If these conditions are not met, it returns a negative error. In this case, the BPF program should not proceed to populate data into the data_meta area.
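The two size conditions can be checked ahead of time when choosing a metadata layout. The following is a hedged userspace sketch that mirrors only the documented limits; meta_size_ok, META_ALIGN and META_MAX are hypothetical names, not kernel symbols, and the third condition (enough headroom left for the xdp_frame) depends on the driver's buffer layout and is not modelled here.

```c
#include <stdbool.h>
#include <stddef.h>

#define META_ALIGN 4   /* metadata size must be a multiple of 4 bytes */
#define META_MAX   252 /* largest metadata size accepted, in bytes */

/*
 * Hypothetical pre-check of a candidate METADATA_SIZE.
 * Covers alignment and the upper bound only, not the headroom requirement.
 */
static inline bool meta_size_ok(size_t size)
{
	return size % META_ALIGN == 0 && size <= META_MAX;
}
```

A 16-byte layout passes this check, while a 10-byte one (not a multiple of 4) or a 256-byte one (over the limit) does not. Passing it does not guarantee that bpf_xdp_adjust_meta succeeds, so its return value still has to be checked.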

Example

See tools/testing/selftests/bpf/progs/xdp_metadata.c and tools/testing/selftests/bpf/prog_tests/xdp_metadata.c for an example of a BPF program that handles XDP metadata.