RFC 9347	IP Traffic Flow Security	January 2023
Hopps	Standards Track

Abstract

This document describes a mechanism for aggregation and fragmentation of IP packets when they are being encapsulated in Encapsulating Security Payload (ESP). This new payload type can be used for various purposes, such as decreasing encapsulation overhead for small IP packets; however, the focus in this document is to enhance IP Traffic Flow Security (IP-TFS) by adding Traffic Flow Confidentiality (TFC) to encrypted IP-encapsulated traffic. TFC is provided by obscuring the size and frequency of IP traffic using a fixed-size, constant-send-rate IPsec tunnel. The solution allows for congestion control, as well as nonconstant send-rate usage.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9347.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

1. Introduction

Traffic analysis [RFC4301] [AppCrypt] is the act of extracting information about data being sent through a network. While encryption [RFC4303] directly obscures the data, the patterns in the message traffic may still expose information due to variations in its shape and timing [RFC8546] [AppCrypt]. Hiding the size and frequency of traffic is referred to as Traffic Flow Confidentiality (TFC), per [RFC4303].

[RFC4303] provides for TFC by allowing padding to be added to encrypted IP packets and allowing for transmission of all-pad packets (indicated using protocol 59). This method has the major limitation that it can significantly underutilize the available bandwidth.

This document defines an aggregation and fragmentation (AGGFRAG) mode for ESP, as well as ESP's use for IP Traffic Flow Security (IP-TFS). This solution provides for full TFC without the aforementioned bandwidth limitation. This is accomplished by using a constant-send-rate IPsec [RFC4303] tunnel with fixed-size encapsulating packets; however, these fixed-size packets can contain partial, whole, or multiple IP packets to maximize the bandwidth of the tunnel. A nonconstant send rate is allowed, but the confidentiality properties of its use are outside the scope of this document.

For a comparison of the overhead of IP-TFS with the TFC solution prescribed in [RFC4303], see Appendix C.

Additionally, IP-TFS provides for operating fairly within congested networks [RFC2914]. This is important when the IP-TFS user is not in full control of the domain through which the IP-TFS tunnel path flows.

The mechanisms, such as the AGGFRAG mode, defined in this document are generic with the intent of allowing for non-TFS uses, but such uses are outside the scope of this document.

1.1. Terminology & Concepts

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

This document assumes familiarity with IP security concepts, including TFC, as described in [RFC4301].

2. The AGGFRAG Tunnel

As mentioned in Section 1, the AGGFRAG mode utilizes an IPsec [RFC4303] tunnel as its transport. For the purpose of IP-TFS, fixed-size encapsulating packets are sent at a constant rate on the AGGFRAG tunnel.

The primary input to the tunnel algorithm is the requested bandwidth to be used by the tunnel. Two values are then required to provide for this bandwidth use: the fixed size of the encapsulating packets and the rate at which to send them.

The fixed packet size MAY either be specified manually or be determined through other methods, such as the Packetization Layer MTU Discovery (PLMTUD) [RFC4821] [RFC8899] or Path MTU Discovery (PMTUD) [RFC1191] [RFC8201]. PMTUD is known to have issues, so PLMTUD is considered the more robust option. For PLMTUD, congestion control payloads can be used as in-band probes (see Section 6.1.2 and [RFC8899]).

Given the encapsulating packet size and the requested bandwidth to be used, the corresponding packet send rate can be calculated. The packet send rate is the requested bandwidth divided by the size of the encapsulating packet.
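The calculation above can be sketched as follows. This is a minimal illustration, not part of the specification: the function name, the choice of bits per second for bandwidth, and the choice of octets for packet size are all assumptions, and whether the packet size includes the outer headers is left to the caller.

```python
def packet_send_rate(bandwidth_bps: float, packet_size_octets: int) -> float:
    """Return the packets per second needed to use bandwidth_bps.

    Assumes bandwidth in bits per second and a fixed encapsulating
    packet size in octets (hypothetical units chosen for this sketch).
    """
    if packet_size_octets <= 0:
        raise ValueError("packet size must be positive")
    return bandwidth_bps / (packet_size_octets * 8)


# Example: a 10 Mbit/s tunnel using 1500-octet encapsulating packets.
rate = packet_send_rate(10_000_000, 1500)  # packets per second
interval = 1.0 / rate                      # seconds between packets
```

With these example inputs, the tunnel sends roughly 833 packets per second, or one packet every 1.2 milliseconds.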

The egress (receiving) side of the AGGFRAG tunnel MUST allow for and expect the ingress (sending) side of the AGGFRAG tunnel to vary the size and rate of sent encapsulating packets, unless constrained by other policy.

2.1. Tunnel Content

As previously mentioned, one issue with the TFC padding solution in [RFC4303] is the large amount of wasted bandwidth, as only one IP packet can be sent per encapsulating packet. In order to maximize bandwidth, IP-TFS breaks this one-to-one association by introducing an AGGFRAG mode for ESP.

The AGGFRAG mode aggregates and fragments the inner IP traffic flow into encapsulating IPsec tunnel packets. For IP-TFS, the IPsec encapsulating tunnel packets are a fixed size. Padding is only added to the tunnel packets if there is no data available to be sent at the time of tunnel packet transmission or if fragmentation has been disabled by the receiver.

This is accomplished using a new Encapsulating Security Payload (ESP) [RFC4303] Next Header field value AGGFRAG_PAYLOAD (Section 6.1).

Other non-IP-TFS uses of this AGGFRAG mode have been suggested, such as increased performance through packet aggregation, as well as handling MTU issues using fragmentation. These uses are not defined here but are also not restricted by this document.

2.2. Payload Content

The AGGFRAG_PAYLOAD payload content defined in this document consists of a 4- or 24-octet header, followed by either a partial data block, a full data block, or multiple partial or full data blocks. The following diagram illustrates this payload within the ESP packet. See Section 6.1 for the exact formats of the AGGFRAG_PAYLOAD payload.

 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . Outer Encapsulating Header ...                                .
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
 . ESP Header...                                                 .
 +---------------------------------------------------------------+
 |   [AGGFRAG sub-type/flags]   :           BlockOffset          |
 +---------------------------------------------------------------+
 :                  [Optional Congestion Info]                   :
 +---------------------------------------------------------------+
 |       DataBlocks ...                                          ~
 ~                                                               ~
 ~                                                               |
 +---------------------------------------------------------------|
 . ESP Trailer...                                                .
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

Figure 1: Layout of an AGGFRAG Mode IPsec Packet

The BlockOffset value is either zero or some offset into or past the end of the DataBlocks data.

If the BlockOffset value is zero, it means that the DataBlocks data begins with a new data block.

Conversely, if the BlockOffset value is non-zero, it points to the start of the new data block, and the initial DataBlocks data belongs to the data block that is still being reassembled.

If the BlockOffset points past the end of the DataBlocks data, then the next data block occurs in a subsequent encapsulating packet.

Having the BlockOffset always point at the next available data block allows for recovering the next inner packet in the presence of outer encapsulating packet loss.
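The three BlockOffset cases described above can be sketched as a small helper. This is an illustrative sketch only: the function name and return shape are hypothetical, and treating the overshoot (BlockOffset minus the DataBlocks length) as the number of octets of the in-progress data block still to come in later packets is an assumption made here, not something this section states.

```python
def split_datablocks(block_offset: int, datablocks: bytes):
    """Split DataBlocks data at BlockOffset.

    Returns (continuation, new_blocks, carry), where:
      continuation: octets belonging to the block still being reassembled
      new_blocks:   octets starting at the next new data block
      carry:        how far BlockOffset points past this packet's data
                    (assumed to be octets still owed by later packets)
    """
    continuation = datablocks[:block_offset]   # slicing clamps at the end
    new_blocks = datablocks[block_offset:]
    carry = max(0, block_offset - len(datablocks))
    return continuation, new_blocks, carry


# BlockOffset of zero: the data begins with a new data block.
case_zero = split_datablocks(0, b"abcdef")
# Non-zero BlockOffset inside the data: a fragment tail, then a new block.
case_inside = split_datablocks(2, b"abcdef")
# BlockOffset past the end: the whole payload continues the earlier block.
case_past = split_datablocks(10, b"abcdef")
```

Because a receiver can always locate the next block start this way, it can resume extracting inner packets after a lost outer packet without parsing the lost fragment.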

An example AGGFRAG mode packet flow can be found in Appendix A.

2.2.1. DataBlocks

 +---------------------------------------------------------------+
 | Type  | rest of IPv4, IPv6, or pad...
 +--------

Figure 2: Layout of a Data Block

A data block is defined by a 4-bit type code, followed by the data block data. The type values have been carefully chosen to coincide with the IPv4/IPv6 version field values so that no per-data block type overhead is required to encapsulate an IP packet. Likewise, the length of the data block is extracted from the encapsulated IPv4 Total Length or IPv6 Payload Length field.
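As a rough sketch of how a receiver might read the type and length of a data block from its leading octets: the function name is hypothetical, the pad type code of 0 is an assumption (this section does not give its value), and treating a pad block as running to the end of the buffer is likewise an assumption. The IPv6 case adds the 40-octet fixed header because the IPv6 Payload Length field excludes it.

```python
import struct

IPV6_FIXED_HEADER_LEN = 40  # IPv6 Payload Length excludes this header


def data_block_type_and_length(buf: bytes):
    """Return (type, length) for the data block starting at buf[0].

    length is None when too few octets are present to read it yet.
    """
    block_type = buf[0] >> 4  # the high 4 bits carry the type code
    if block_type == 4:       # IPv4: Total Length at octets 2-3
        if len(buf) < 4:
            return 4, None
        return 4, struct.unpack_from("!H", buf, 2)[0]
    if block_type == 6:       # IPv6: Payload Length at octets 4-5
        if len(buf) < 6:
            return 6, None
        return 6, IPV6_FIXED_HEADER_LEN + struct.unpack_from("!H", buf, 4)[0]
    if block_type == 0:       # assumed pad type; assumed to fill the rest
        return 0, len(buf)
    raise ValueError(f"unknown data block type {block_type}")


ipv4_block = bytes([0x45, 0, 0, 84]) + bytes(16)
ipv6_block = bytes([0x60, 0, 0, 0, 0, 20]) + bytes(34)
pad_block = bytes(10)
```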

2.2.2. End Padding

Since a data block's type is identified in its first 4 bits, the only time padding is required is when there is no data to encapsulate. For this end padding, a Pad Data Block is used.

2.2.3. Fragmentation, Sequence Numbers, and All-Pad Payloads

In order for a receiver to reassemble fragmented inner packets, the sender MUST send the inner packet fragments back to back in the logical outer packet stream (i.e., using consecutive ESP sequence numbers). However, the sender is allowed to insert "all-pad" payloads (i.e., payloads with a BlockOffset of zero and a single pad data block) in between the packets carrying the inner packet fragment payloads. This interleaving of all-pad payloads allows the sender to always send a tunnel packet, regardless of the encapsulation computational requirements.

When a receiver is reassembling an inner packet, and it receives an "all-pad" payload, it increments the sequence number in which it expects the next inner packet fragment to arrive.

Given the above, the receiver will need to handle out-of-order arrival of outer ESP packets prior to reassembly processing. ESP already provides for optionally detecting replay attacks. Detecting replay attacks normally utilizes a window method. A similar sequence-number-based sliding window can be used to correct reordering of the outer packet stream. Receiving a larger (newer) sequence number packet advances the window, and if any older ESP packets whose sequence numbers the window has passed by are received, then the packets are dropped. A good choice for the size of this window depends on the amount of misordering the user is experiencing; however, a value of 3 has been suggested as a default when no more informed choice exists.

As the amount of misordering that may be present is hard to predict, the window size SHOULD be configurable by the user. Implementations MAY also dynamically adjust the reordering window based on actual misordering seen in arriving packets.
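One possible shape for such a reorder window is sketched below. It is an illustration under stated assumptions, not the method this document mandates: the class and method names are hypothetical, the window is interpreted as "give up on a missing sequence number once a packet more than `size` ahead of it has arrived", and sequence number wraparound and the lost packet timer are ignored.

```python
class ReorderWindow:
    """Deliver outer packets in sequence order, skipping lost ones."""

    def __init__(self, size: int = 3):
        self.size = size       # configurable; 3 is the suggested default
        self.next_seq = None   # next sequence number to deliver
        self.held = {}         # seq -> payload, waiting for a gap to fill

    def receive(self, seq: int, payload):
        """Accept one outer packet; return the in-order (seq, payload) list."""
        if self.next_seq is None:
            self.next_seq = seq
        if seq < self.next_seq or seq in self.held:
            return []          # window has passed it, or duplicate: drop
        self.held[seq] = payload
        newest = max(self.held)
        out = []
        # Deliver contiguous packets; skip a gap once it falls out of the window.
        while self.held and (self.next_seq in self.held
                             or self.next_seq < newest - self.size):
            if self.next_seq in self.held:
                out.append((self.next_seq, self.held.pop(self.next_seq)))
            self.next_seq += 1
        return out


window = ReorderWindow(size=3)
first = window.receive(1, "a")      # delivered immediately
early = window.receive(3, "c")      # held, waiting for 2
filled = window.receive(2, "b")     # releases 2 and 3 in order
late = window.receive(2, "again")   # window has passed 2: dropped
```

A production receiver would typically also bound the memory held and add the lost packet timer discussed in this section.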

Please note, when IP-TFS sends a continuous stream of packets, there is no requirement for an explicit lost packet timer; however, using a lost packet timer is RECOMMENDED. If an implementation does not use a lost packet timer and only considers an outer packet lost when the reorder window moves by it, the inner traffic can be delayed by up to the reorder window size times the per-packet send rate. This delay could be significant for slower send rates or when larger reorder window sizes are in use. As the lost packet timer affects the delay of inner packet delivery, an implementation or user could choose to set it proportionate to the tunnel rate.
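To make the delay bound concrete, the sketch below reads "the reorder window size times the per-packet send rate" as the window size multiplied by the time between packets, which is the interpretation that yields a duration. The function name and the example figures are hypothetical.

```python
def worst_case_reorder_delay(window_size: int, packets_per_second: float) -> float:
    """Seconds an inner packet may wait when no lost packet timer is used.

    Assumes the delay is window_size inter-packet intervals, where one
    interval is 1 / packets_per_second.
    """
    if packets_per_second <= 0:
        raise ValueError("send rate must be positive")
    return window_size / packets_per_second


slow_tunnel = worst_case_reorder_delay(3, 10)    # 10 packets per second
fast_tunnel = worst_case_reorder_delay(3, 1000)  # 1000 packets per second
```

At 10 packets per second a window of 3 can hold traffic for about 0.3 seconds, while at 1000 packets per second the same window costs about 3 milliseconds, which is why the timer matters most on slow tunnels.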

While ESP guarantees an increasing sequence number with subsequently sent packets, it does not actually require the sequence numbers to be generated consecutively (e.g., sending only even-numbered sequence numbers would be allowed, as long as they are always increasing). Gaps in the sequence numbers will not work for this document, so the sequence number stream MUST increase monotonically by 1 for each subsequent packet.

When using the AGGFRAG_PAYLOAD in conjunction with replay detection, the window size for both MAY be reduced to the smaller of the two window sizes. This is because packets outside of the smaller window but inside the larger window would still be dropped by the mechanism with the smaller window size. However, there is also no requirement to make these values the same. Indeed, in some cases, such as slow tunnels where a very small or zero reorder window size is appropriate, the user may still want a large replay detection window to log replayed packets. Additionally, large replay windows can be implemented with very little overhead, compared to large reorder windows.

Finally, as sequence numbers are reset when switching Security Associations (SAs) (e.g., when rekeying a Child SA), senders MUST NOT send initial fragments of an inner packet using one SA and subsequent fragments in a different SA.

A note on BlockOffset values: Senders MUST encode the BlockOffset consistently with the immediately preceding non-all-pad payload packet. Specifically, if the immediately preceding non-all-pad payload packet ended with a Pad Data Block, this BlockOffset MUST be zero, as Pad Data Blocks are never fragmented. The BlockOffset MUST be consistent with the remaining size implied by the length field from the fragmented inner packet.

2.2.3.1. Optional Extra Padding

When the tunnel bandwidth is not being fully utilized, a sender MAY pad out the current encapsulating packet in order to deliver an inner packet unfragmented in the following outer packet. The benefit would be to avoid inner packet fragmentation in the presence of a bursty offered load (non-bursty traffic will naturally not fragment). Senders MAY also choose to allow for a minimum fragment size to be configured (e.g., as a percentage of the AGGFRAG_PAYLOAD payload size) to avoid fragmentation at the cost of tunnel bandwidth. The costs with these methods are complexity and an added delay of inner traffic. The main advantage to avoiding fragmentation is to minimize inner packet loss in the presence of outer packet loss. When this is worthwhile (e.g., how much loss and what type of loss is required, given different inner traffic shapes and utilization, for this to make sense) and what values to use for the allowable/added delay may be worth researching but is outside the scope of this document.

While use of padding to avoid fragmentation does not impact interoperability, if padding is used inappropriately, it can reduce the effective throughput of a tunnel. Senders implementing either of the above approaches will need to take care to not reduce the effective capacity, and overall utility, of the tunnel through the overuse of padding.

2.2.4. Empty Payload

To support reporting of congestion control information (described later) using a non-AGGFRAG_PAYLOAD-enabled SA, it is allowed to send an AGGFRAG_PAYLOAD payload with no data blocks (i.e., the ESP payload length is equal to the AGGFRAG_PAYLOAD header length). This special payload is called an empty payload.

Currently, this situation is only applicable in use cases without Internet Key Exchange Protocol Version 2 (IKEv2).

2.2.5. IP Header Value Mapping

[RFC4301] provides some direction on when and how to map various values from an inner IP header to the outer encapsulating header, namely the Don't Fragment (DF) bit [RFC0791], the Differentiated Services (DS) field [RFC2474], and the Explicit Congestion Notification (ECN) field [RFC3168]. Unlike in [RFC4301], the AGGFRAG mode may, and often will, be encapsulating more than one IP packet per ESP packet. To deal with this, these mappings are restricted further.

2.2.5.1. DF Bit

The AGGFRAG mode never maps the inner DF bit, as it is unrelated to the AGGFRAG tunnel functionality; the AGGFRAG mode never needs to IP fragment the inner packets, and the inner packets will not affect the fragmentation of the outer encapsulation packets.

2.2.5.2. ECN Value

The ECN value need not be mapped, as any congestion related to the constant-send-rate IP-TFS tunnel is unrelated (by design) to the inner traffic flow. The sender MAY still set the ECN value of inner packets based on the normal ECN specification [RFC3168] [RFC4301] [RFC6040].

2.2.5.3. DS Field

By default, the DS field SHOULD NOT be copied, although a sender MAY choose to allow for configuration to override this behavior. A sender SHOULD also allow the DS value to be set by configuration.

2.2.6. IPv4 Time To Live (TTL), IPv6 Hop Limit, and ICMP Messages

How to modify the inner packet IPv4 TTL [RFC0791] or IPv6 Hop Limit [RFC8200] is specified in [RFC4301].

[RFC4301] specifies how to apply policy to authenticated and unauthenticated ICMP error packets (e.g., Destination Unreachable) arriving at or being forwarded through the endpoint, in particular, whether to process, ignore, or forward said packets. With the one exception described below, this document does not change the handling of these packets; they should be handled as specified in [RFC4301].

The one way in which an AGGFRAG tunnel differs in ICMP error packet mechanics is with PMTU. When fragmentation is enabled on the AGGFRAG tunnel, then no ICMP "Too Big" errors need to be generated for arriving ingress traffic, as the arriving inner packets will be naturally fragmented by the AGGFRAG encapsulation.

Otherwise, when fragmentation has been disabled on the AGGFRAG tunnel, then the treatment of arriving inner traffic exactly maps to that of a non-AGGFRAG ESP tunnel. Explicitly, IPv4 packets with DF set and IPv6 packets that cannot fit in their own outer packet payload will generate the appropriate ICMP "Too Big" error, as described in [RFC4301], and IPv4 packets without DF set will be IP fragmented, as described in [RFC4301].
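The ingress decision described in the two paragraphs above can be summarized in a short sketch. The function name, the string results, and the `max_inner_len` parameter (the largest inner packet that fits in one outer packet payload) are hypothetical names introduced for illustration; the actual ICMP generation and IP fragmentation follow [RFC4301] and are not shown.

```python
def ingress_action(frag_enabled: bool, ip_version: int, df_set: bool,
                   packet_len: int, max_inner_len: int) -> str:
    """Decide how an arriving inner packet is treated at tunnel ingress."""
    if frag_enabled:
        return "encapsulate"      # AGGFRAG fragments as needed; no ICMP error
    if packet_len <= max_inner_len:
        return "encapsulate"      # fits whole in its own outer packet
    if ip_version == 6 or df_set:
        return "icmp-too-big"     # IPv6, or IPv4 with DF set
    return "ip-fragment"          # IPv4 without DF set


with_frag = ingress_action(True, 4, True, 9000, 1400)
v4_df = ingress_action(False, 4, True, 1500, 1400)
v4_no_df = ingress_action(False, 4, False, 1500, 1400)
v6_large = ingress_action(False, 6, False, 1500, 1400)
```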

Packets egressing the tunnel continue to be handled as specified in [RFC4301].

All other aspects of PMTU and the handling of ICMP "Too Big" messages (i.e., with regards to the outer AGGFRAG/ESP tunnel packet size) also remain unchanged from [RFC4301].

2.2.7. Effective MTU of the Tunnel

Unlike in [RFC4301], there is normally no effective MTU (EMTU) on an AGGFRAG tunnel, as all IP packet sizes are properly transmitted without requiring IP fragmentation prior to tunnel ingress. That said, a sender MAY allow for explicitly configuring an MTU for the tunnel.

If fragmentation has been disabled on the AGGFRAG tunnel, then the tunnel's EMTU and behaviors are the same as normal IPsec tunnels [RFC4301].

2.3. Exclusive SA Use

This document does not specify mixed use of an AGGFRAG_PAYLOAD-enabled SA. A sender MUST only send AGGFRAG_PAYLOAD payloads over an SA configured for AGGFRAG mode.

2.4. Modes of Operation

Just as with normal IPsec/ESP SAs, AGGFRAG SAs are unidirectional. Bidirectional IP-TFS functionality is achieved by setting up 2 AGGFRAG SAs, one in either direction.

An AGGFRAG tunnel used for IP-TFS can operate in 2 modes, a non-congestion-controlled mode and congestion-controlled mode.

2.4.1. Non-Congestion-Controlled Mode

In the non-congestion-controlled mode, IP-TFS sends fixed-size packets over an AGGFRAG tunnel at a constant rate. The packet send rate is constant and is not automatically adjusted, regardless of any network congestion (e.g., packet loss).

For similar reasons as given in [RFC7510], the non-congestion-controlled mode MUST only be used where the user has full administrative control over any path the tunnel will take and MUST NOT be used if this is not the case. This is required so the user can guarantee the bandwidth and also be sure not to negatively affect network congestion [RFC2914]. In this case, packet loss should be reported to the administrator (e.g., via syslog, YANG notification, or SNMP traps) so that any failures due to a lack of bandwidth can be corrected. The use of circuit breakers is also RECOMMENDED (Section 2.4.2.1).

Users that choose the non-congestion-controlled mode need to understand that this mode will send packets at a constant rate, utilizing a constant, fixed bandwidth, and will not adjust based on congestion. Thus, if they do not guarantee the bandwidth required by the tunnel, the tunnel's operation, as well as the rest of their network, may be negatively impacted.

One expected use case for the non-congestion-controlled mode is to guarantee the full tunnel bandwidth is available and preferred over other non-tunnel traffic. In fact, a typical site-to-site use case might have all of the user traffic utilizing the IP-TFS tunnel.

The non-congestion-controlled mode is also appropriate if ESP over TCP is in use [RFC9329]. However, the use of TCP is considered a fallback-only solution for IPsec; it is not a preferred option. This is also one of the reasons that TCP was not chosen as the encapsulation for IP-TFS instead of AGGFRAG.

2.4.2. Congestion-Controlled Mode

With the congestion-controlled mode, IP-TFS adapts to network congestion by lowering the packet send rate to accommodate the congestion, as well as raising the rate when congestion subsides. Since overhead is per packet, by allowing for maximal fixed-size packets and varying the send rate, transport overhead is minimized.

The output of the congestion control algorithm will adjust the rate at which the ingress sends packets. While this document does not require a specific congestion control algorithm, best current practice RECOMMENDS that the algorithm conform to [RFC5348]. Congestion control principles are documented in [RFC2914] as well. There is an example in [RFC4342] of the algorithm in [RFC5348], which matches the requirements of IP-TFS (i.e., designed for fixed-size packets and send rate varied based on congestion).

The required inputs for the TCP-friendly rate control algorithm described in [RFC5348] are the receiver's loss event rate and the sender's estimated round-trip time (RTT). These values are provided by IP-TFS using the congestion information header fields described in Section 3. In particular, these values are sufficient to implement the algorithm described in [RFC5348].

At a minimum, the congestion information MUST be sent, from the receiver and from the sender, at least once per RTT. Prior to establishing an RTT, the information SHOULD be sent constantly from the sender and the receiver so that an RTT estimate can be established. Not receiving this information over multiple consecutive RTT intervals should be considered a congestion event that causes the sender to adjust its sending rate lower. For example, this is called the "no feedback timeout" in [RFC4342], and it is equal to 4 RTT intervals. When a "no feedback timeout" has occurred, the sending rate is halved, as per [RFC4342].
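The "no feedback timeout" rule can be sketched as below. This is a simplified illustration, not the full algorithm of [RFC4342] or [RFC5348]: the function and parameter names are hypothetical, times are assumed to be in seconds on a common clock, and the caller is assumed to restart the timer after each halving.

```python
NO_FEEDBACK_RTT_INTERVALS = 4  # timeout length in RTT intervals


def rate_after_feedback_check(send_rate: float, rtt: float,
                              last_feedback_time: float, now: float) -> float:
    """Return the send rate to use, halved if the feedback timeout expired."""
    if now - last_feedback_time >= NO_FEEDBACK_RTT_INTERVALS * rtt:
        return send_rate / 2   # treat missing feedback as a congestion event
    return send_rate


# RTT of 50 ms: the timeout fires 200 ms after the last feedback.
still_ok = rate_after_feedback_check(1000.0, 0.05, 0.0, 0.1)
timed_out = rate_after_feedback_check(1000.0, 0.05, 0.0, 0.25)
```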

An implementation MAY choose to always include the congestion information in its AGGFRAG payload header if it is sending it on an IP-TFS-enabled SA. Since IP-TFS normally will operate with a large packet size, the congestion information should represent a small portion of the available tunnel bandwidth. An implementation choosing to always send the data MAY also choose to update the LossEventRate and RTT header field values it sends only once per RTT.

When choosing a congestion control algorithm (or a selection of algorithms), note that IP-TFS is not providing for reliable delivery of IP traffic, and so per-packet acknowledgements (ACKs) are not required and are not provided.

It is worth noting that the variable send rate of a congestion-controlled AGGFRAG tunnel is not private; however, this send rate is being driven by network congestion, and as long as the encapsulated (inner) traffic flow shape and timing are not directly affecting the (outer) network congestion, the variations in the tunnel rate will not weaken the provided inner traffic flow confidentiality.

2.4.2.1. Circuit Breakers

In addition to congestion control, implementations that support the non-congestion-control mode SHOULD implement circuit breakers [RFC8084] as a recovery method of last resort. When circuit breakers are enabled, an implementation SHOULD also enable congestion control reports so that circuit breakers have information to act on.

The pseudowire congestion considerations [RFC7893] are equally applicable to the mechanisms defined in this document, notably the text on inelastic traffic.

One example of a simple, slow-trip circuit breaker that an implementation may provide would utilize 2 values: the amount of persistent loss rate required to trip the circuit breaker and the required length of time this persistent loss rate must be seen to trip the circuit breaker. These 2 values are required configurations from the user. When the circuit breaker is tripped, the tunnel traffic is disabled and an appropriate log message or other management type alarm is triggered, indicating operator intervention is required.
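A minimal sketch of the slow-trip circuit breaker described above follows. The class name, method name, and units (loss rate as a fraction, time in seconds) are assumptions made for illustration; disabling the tunnel and raising the alarm are left to the caller, and the breaker stays tripped until an operator replaces it.

```python
class SlowTripCircuitBreaker:
    """Trip when the loss rate stays at or above a threshold long enough."""

    def __init__(self, loss_threshold: float, trip_seconds: float):
        self.loss_threshold = loss_threshold  # user-configured loss rate
        self.trip_seconds = trip_seconds      # user-configured duration
        self.over_since = None                # when persistent loss began
        self.tripped = False

    def update(self, loss_rate: float, now: float) -> bool:
        """Feed one loss report; return True once the breaker has tripped."""
        if self.tripped:
            return True                       # stays tripped until reset
        if loss_rate >= self.loss_threshold:
            if self.over_since is None:
                self.over_since = now
            if now - self.over_since >= self.trip_seconds:
                self.tripped = True
        else:
            self.over_since = None            # loss was not persistent
        return self.tripped


breaker = SlowTripCircuitBreaker(loss_threshold=0.1, trip_seconds=30)
reports = [(0.2, 0), (0.2, 20), (0.05, 25), (0.2, 30), (0.2, 60), (0.0, 61)]
states = [breaker.update(loss, t) for loss, t in reports]
```

In this run the dip in loss at time 25 restarts the measurement, so the breaker trips only at time 60, after 30 continuous seconds above the threshold.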

2.5. Summary of Receiver Processing

An AGGFRAG-enabled SA receiver has a few tasks to perform.

The receiver MAY process incoming AGGFRAG_PAYLOAD payloads as soon as they arrive, as much as it can, i.e., if the incoming AGGFRAG_PAYLOAD packet contains complete inner packet(s), the receiver should extract and transmit them immediately. For partial packets, the receiver needs to keep the partial packets in memory until they fall out of the reordering window or until the missing parts of the packets are received, in which case, it will reassemble and transmit them. If the AGGFRAG_PAYLOAD payload contains multiple packets, they SHOULD be sent out in the order they are in the AGGFRAG_PAYLOAD (i.e., keep the original order they were received on the other end). The cost of using this method is that an amplification of out-of-order delivery of inner packets can occur due to inner packet aggregation.

Instead of the method described in the previous paragraph, the receiver MAY reorder out-of-order AGGFRAG_PAYLOAD payloads received into in-sequence-order AGGFRAG_PAYLOAD payloads (Section 2.2.3), and only after it has an in-order AGGFRAG_PAYLOAD payload stream would the receiver transmit the inner packets. Using this method will ensure the inner packets are sent in order. The cost of this method is that a lost packet will cause a delay of up to the lost packet timer interval (or the full reorder window if no lost packet timer is used). Additionally, there can be extra burstiness in the output stream. This burstiness can happen when a lost packet is dropped from the reorder window, and the remaining outer packets in the reorder window are immediately processed and sent out back to back.

Additionally, if congestion control is enabled, the receiver sends congestion control data (Section 6.1.2) back to the sender, as described in Sections 2.4.2 and 3.

Finally, a note on receiving incorrect BlockOffset values: To account for misbehaving senders, a receiver SHOULD gracefully handle the case where the BlockOffset of consecutive packets, and/or the inner packet they share, do not agree. It MAY drop the inner packet or one or both of the outer packets.

3. Congestion Information

In order to support the congestion-controlled mode, the sender needs to know the loss event rate and to approximate the RTT [RFC5348]. In order to obtain these values, the receiver sends congestion control information on its SA back to the sender. Thus, to support congestion control, the receiver MUST have a paired SA back to the sender (this is always the case when the tunnel was created using IKEv2). If the SA back to the sender is a non-AGGFRAG_PAYLOAD-enabled SA, then an AGGFRAG_PAYLOAD empty payload (i.e., header only) is used to convey the information.

In order to calculate a loss event rate compatible with [RFC5348], the receiver needs to have an RTT estimate. Thus, the sender communicates this estimate in the RTT header field. On startup, this value will be zero, as no RTT estimate is yet known.

In order for the sender to estimate its RTT value, the sender places a timestamp value in the TVal header field. On first receipt of this TVal, the receiver records the new TVal value, along with the time it arrived locally. Subsequent receipt of the same TVal MUST NOT update the recorded time.

When the receiver sends its congestion control header, it places this latest recorded TVal in the TEcho header field, along with 2 delay values: Echo Delay and Transmit Delay. The Echo Delay value is the time delta, in microseconds, between the recorded arrival time of TVal and the current clock. The second value, Transmit Delay, is the receiver's current transmission delay on the tunnel (i.e., the average time between sending packets on its half of the AGGFRAG tunnel).

When the sender receives back its TVal in the TEcho header field, it calculates 2 RTT estimates. The first is the actual delay found by subtracting the TEcho value from its current clock and then subtracting the Echo Delay as well. The second RTT estimate is found by adding the received Transmit Delay header value to the sender's own transmission delay (i.e., the average time between sending packets on its half of the AGGFRAG tunnel). The larger of these 2 RTT estimates SHOULD be used as the RTT value.

The two RTT estimates are required to handle different combinations of faster or slower tunnel packet paths with faster or slower fixed tunnel rates. Choosing the larger of the two values guarantees that the RTT is never considered faster than the aggregate transmission delay based on the IP-TFS send rate (the second estimate), as well as never being considered faster than the actual RTT along the tunnel packet path (the first estimate).
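The two-estimate RTT calculation described above can be written out as a short sketch. The function and parameter names are hypothetical, all values are assumed to be in microseconds on the sender's clock, and timestamp wraparound is ignored.

```python
def estimate_rtt(now_us: int, techo_us: int, echo_delay_us: int,
                 peer_transmit_delay_us: int, own_transmit_delay_us: int) -> int:
    """Return the RTT value as the larger of the two estimates."""
    # First estimate: time since TVal was sent, minus the receiver's hold time.
    path_rtt = now_us - techo_us - echo_delay_us
    # Second estimate: sum of both ends' average inter-packet send delays.
    rate_rtt = peer_transmit_delay_us + own_transmit_delay_us
    return max(path_rtt, rate_rtt)


# Slow path, fast tunnel rate: the measured path delay dominates.
path_bound = estimate_rtt(1_000_000, 900_000, 20_000, 10_000, 10_000)
# Fast path, slow tunnel rate: the transmission delays dominate.
rate_bound = estimate_rtt(1_000_000, 990_000, 5_000, 50_000, 50_000)
```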

The receiver also calculates, and communicates in the LossEventRate header field, the loss event rate for use by the sender. This is slightly different from [RFC4342], which periodically sends all the loss interval data back to the sender so that it can do the calculation. See Appendix B for a suggested way to calculate the loss event rate value. Initially, this value will be zero (indicating no loss) until enough data has been collected by the receiver to update it.

3.1. ECN Support

In addition to normal packet loss information, the AGGFRAG mode supports use of the ECN bits in the encapsulating IP header [RFC3168] for identifying congestion. If ECN use is enabled and a packet arrives at the egress (receiving) side with the Congestion Experienced (CE) value set, then the receiver considers that packet as being dropped, although it does not drop it. The receiver MUST set the E bit in any AGGFRAG_PAYLOAD payload header containing a LossEventRate value derived from a CE value being considered.

In [RFC6040], which updates [RFC3168] and [RFC4301], behaviors for marking the outer ECN field value based on the ECN field of the inner packet are defined. As the AGGFRAG mode may have multiple inner packets present in a single outer packet, and there is no obvious correct way to map these multiple values to the single outer packet ECN field value, the tunnel ingress endpoint SHOULD operate in the "compatibility" mode, rather than the "default" mode from [RFC6040]. In particular, this means that the ingress (sending) endpoint of the tunnel always sets the newly constructed outer encapsulating packet header ECN field to Not-ECT [RFC6040].

4. Configuration of AGGFRAG Tunnels for IP-TFS

IP-TFS is meant to be deployable with a minimal amount of configuration. All IP-TFS-specific configuration should be specified at the unidirectional tunnel ingress (sending) side. It is intended that non-IKEv2 operation is supported, at least, with local static configuration.

YANG and MIB documents have been defined for IP-TFS in [RFC9348] and [RFC9349].

4.1. Bandwidth

Bandwidth is a local configuration option. For the non-congestion-controlled mode, the bandwidth SHOULD be configured. For the congestion-controlled mode, the bandwidth can be configured or the congestion control algorithm discovers and uses the maximum bandwidth available. No standardized configuration method is required.

4.2. Fixed Packet Size

The fixed packet size to be used for the tunnel encapsulation packets MAY be configured manually or can be automatically determined using other methods, such as PLMTUD [RFC4821] [RFC8899] or PMTUD [RFC1191] [RFC8201]. As PMTUD is known to have issues, PLMTUD is considered the more robust option. No standardized configuration method is required.

4.3.Congestion Control

Congestion control is a local configuration option. No standardizedconfiguration method is required.¶

5. IKEv2

5.1. USE_AGGFRAG Notification Message

As mentioned previously, AGGFRAG tunnels utilize ESP payloads of type AGGFRAG_PAYLOAD.

When using IKEv2, a new "USE_AGGFRAG" notification message enables the AGGFRAG_PAYLOAD payload on a Child SA pair. The method used is similar to how USE_TRANSPORT_MODE is negotiated, as described in [RFC7296].

To request use of the AGGFRAG_PAYLOAD payload on the Child SA pair, the initiator includes the USE_AGGFRAG notification in an SA payload requesting a new Child SA (either during the initial IKE_AUTH or during CREATE_CHILD_SA exchanges). If the request is accepted, then the response MUST also include a notification of type USE_AGGFRAG. If the responder declines the request, the Child SA will be established without AGGFRAG_PAYLOAD payload use enabled. If this is unacceptable to the initiator, the initiator MUST delete the Child SA.

As the use of the AGGFRAG_PAYLOAD payload is currently only defined for non-transport-mode tunnels, the USE_AGGFRAG notification MUST NOT be combined with the USE_TRANSPORT_MODE notification.

The USE_AGGFRAG notification contains a 1-octet payload of flags that specify requirements from the sender of the notification. If any requirement flags are not understood or cannot be supported by the receiver, then the receiver SHOULD NOT enable use of AGGFRAG_PAYLOAD (either by not responding with the USE_AGGFRAG notification or, in the case of the initiator, by deleting the Child SA if the now-established non-AGGFRAG_PAYLOAD-using SA is unacceptable).

The notification type and payload flag values are defined in Section 6.1.4.

6. Packet and Data Formats

The packet and data formats defined below are generic with the intent of allowing for non-IP-TFS uses, but such uses are outside the scope of this document.

6.1. AGGFRAG_PAYLOAD Payload

ESP Next Header value: 144

An AGGFRAG payload is identified by the ESP Next Header value AGGFRAG_PAYLOAD (144), which has been reserved in the IP protocol numbers space. The first octet of the payload indicates the format of the remaining payload data.

  0 1 2 3 4 5 6 7
 +-+-+-+-+-+-+-+-+-+-+-
 |   Sub-type    | ...
 +-+-+-+-+-+-+-+-+-+-+-

Figure 3: AGGFRAG_PAYLOAD Payload Format

Sub-type:: An 8-bit value indicating the payload format.

This document defines 2 payload sub-types. These payload formats are defined in the following sections.

6.1.1. Non-Congestion-Control AGGFRAG_PAYLOAD Payload Format

The non-congestion-control AGGFRAG_PAYLOAD payload consists of a 4-octet header, followed by a variable amount of DataBlocks data, as shown below.

                      1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  Sub-Type (0) |   Reserved    |          BlockOffset          |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |       DataBlocks ...
 +-+-+-+-+-+-+-+-+-+-+-

Figure 4: Non-Congestion-Control Payload Format

Sub-type:: An octet indicating the payload format. For this non-congestion-control format, the value is 0.
Reserved:: An octet set to 0 on generation and ignored on receipt.
BlockOffset:: A 16-bit unsigned integer counting the number of octets of DataBlocks data before the start of a new data block. If the start of a new data block occurs in a subsequent payload, the BlockOffset will point past the end of the DataBlocks data. In this case, all the DataBlocks data belongs to the current data block being assembled. When the BlockOffset extends into subsequent payloads, it continues to only count DataBlocks data (i.e., it does not count the non-DataBlocks data of subsequent packets, such as header octets).
DataBlocks:: Variable number of octets that begins with the start of a data block or the continuation of a previous data block, followed by zero or more additional data blocks.
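The field layout above can be exercised with a short Python sketch. It is illustrative only: the function names are hypothetical, and the split helper covers just the BlockOffset rule described in this section (it does not reassemble blocks across payloads).

```python
import struct

NON_CC_SUBTYPE = 0
NON_CC_HEADER = struct.Struct("!BBH")  # Sub-type, Reserved, BlockOffset


def build_non_cc_payload(block_offset: int, data_blocks: bytes) -> bytes:
    """Prefix DataBlocks octets with the 4-octet header (Reserved sent as 0)."""
    return NON_CC_HEADER.pack(NON_CC_SUBTYPE, 0, block_offset) + data_blocks


def parse_non_cc_payload(payload: bytes):
    """Return (block_offset, data_blocks); the Reserved octet is ignored."""
    sub_type, _reserved, block_offset = NON_CC_HEADER.unpack_from(payload)
    if sub_type != NON_CC_SUBTYPE:
        raise ValueError(f"unexpected sub-type {sub_type}")
    return block_offset, payload[NON_CC_HEADER.size:]


def split_at_block_offset(block_offset: int, data_blocks: bytes):
    """Split DataBlocks into (tail of the block in progress, new blocks).

    If BlockOffset points at or past the end, every octet belongs to the
    block being assembled and no new block starts in this payload.
    """
    if block_offset >= len(data_blocks):
        return data_blocks, b""
    return data_blocks[:block_offset], data_blocks[block_offset:]
```

For example, a payload built with a BlockOffset of 100 and 1400 octets of DataBlocks splits into a 100-octet tail of the previous block and 1300 octets that begin a new block.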

6.1.2. Congestion Control AGGFRAG_PAYLOAD Payload Format

The congestion control AGGFRAG_PAYLOAD payload consists of a 24-octet header, followed by a variable amount of DataBlocks data, as shown below.

                      1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  Sub-type (1) |  Reserved |P|E|          BlockOffset          |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                          LossEventRate                        |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                      RTT                  |   Echo Delay ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
      ... Echo Delay   |           Transmit Delay                |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                              TVal                             |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |                             TEcho                             |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |       DataBlocks ...
 +-+-+-+-+-+-+-+-+-+-+-

Figure 5: Congestion Control Payload Format

Sub-type:: An octet indicating the payload format. For this congestion control format, the value is 1.
Reserved:: A 6-bit field set to 0 on generation and ignored on receipt.
P:: A 1-bit value that, if set, indicates that PLMTUD probing is in progress. This information can be used to avoid treating missing packets as loss events by the congestion control algorithm when running the PLMTUD probe algorithm.
E:: A 1-bit value that, if set, indicates that Congestion Experienced (CE) ECN bits were received and used in deriving the reported LossEventRate.
BlockOffset:: The same value as the non-congestion-controlled payload format value.
LossEventRate:: A 32-bit value specifying the inverse of the current loss event rate, as calculated by the receiver. A value of zero indicates no loss. Otherwise, the loss event rate is 1/LossEventRate.
RTT:: A 22-bit value specifying the sender's current RTT estimate in microseconds. The value MAY be zero prior to the sender having calculated an RTT estimate. The value SHOULD be set to zero on non-AGGFRAG_PAYLOAD-enabled SAs. If the RTT is equal to or larger than 0x3FFFFF, the value MUST be set to 0x3FFFFF.
Echo Delay:: A 21-bit value specifying the delay in microseconds incurred between the receiver first receiving the TVal value and sending it back in TEcho. If the delay is equal to or larger than 0x1FFFFF, the value MUST be set to 0x1FFFFF.
Transmit Delay:: A 21-bit value specifying the transmission delay in microseconds. This is the fixed (or average) delay on the receiver between it sending packets on the IP-TFS tunnel. If the delay is equal to or larger than 0x1FFFFF, the value MUST be set to 0x1FFFFF.
TVal:: An opaque, 32-bit value that will be echoed back by the receiver in later packets in the TEcho field, along with an Echo Delay value of how long that echo took.
TEcho:: The opaque, 32-bit value from a received packet's TVal field. The received TVal is placed in TEcho, along with an Echo Delay value indicating how long it has been since receiving the TVal value.
DataBlocks:: Variable number of octets that begins with the start of a data block or the continuation of a previous data block, followed by zero or more additional data blocks. For the special case of sending congestion control information on a non-IP-TFS-enabled SA, this field MUST be empty (i.e., be zero octets long).
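Because the RTT, Echo Delay, and Transmit Delay fields share 64 bits without falling on octet boundaries, a packing sketch may help. The Python below is a minimal illustration under the layout in Figure 5; the CcHeader class and its attribute names are hypothetical, and the saturation to 0x3FFFFF and 0x1FFFFF follows the field definitions above.

```python
import struct
from dataclasses import dataclass

CC_HEADER = struct.Struct("!BBHIQII")  # 24 octets in network byte order
RTT_MAX = 0x3FFFFF     # 22-bit field
DELAY_MAX = 0x1FFFFF   # 21-bit fields


@dataclass
class CcHeader:
    block_offset: int
    loss_event_rate: int = 0   # inverse loss event rate; 0 means no loss
    rtt_us: int = 0
    echo_delay_us: int = 0
    transmit_delay_us: int = 0
    tval: int = 0
    techo: int = 0
    p_bit: bool = False        # PLMTUD probing in progress
    e_bit: bool = False        # CE marks contributed to loss_event_rate

    def pack(self) -> bytes:
        flags = (int(self.p_bit) << 1) | int(self.e_bit)  # Reserved bits stay 0
        # RTT (22 bits), Echo Delay (21), and Transmit Delay (21) share 64 bits;
        # out-of-range values saturate at the field maximum.
        timing = (
            (min(self.rtt_us, RTT_MAX) << 42)
            | (min(self.echo_delay_us, DELAY_MAX) << 21)
            | min(self.transmit_delay_us, DELAY_MAX)
        )
        return CC_HEADER.pack(1, flags, self.block_offset,
                              self.loss_event_rate, timing, self.tval, self.techo)

    @classmethod
    def unpack(cls, payload: bytes) -> "CcHeader":
        sub_type, flags, offset, loss, timing, tval, techo = CC_HEADER.unpack_from(payload)
        if sub_type != 1:
            raise ValueError(f"unexpected sub-type {sub_type}")
        return cls(
            block_offset=offset,
            loss_event_rate=loss,
            rtt_us=timing >> 42,
            echo_delay_us=(timing >> 21) & DELAY_MAX,
            transmit_delay_us=timing & DELAY_MAX,
            tval=tval,
            techo=techo,
            p_bit=bool(flags & 0b10),
            e_bit=bool(flags & 0b01),
        )
```

Treating the three timing fields as one 64-bit integer keeps the shifts simple; an implementation in a language with bit fields might choose a different representation.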

6.1.3. Data Blocks

                      1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | Type  | IPv4, IPv6, or pad...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

Figure 6: Data Block Format

Type:: A 4-bit field where 0x0 identifies a Pad Data Block, 0x4 indicates an IPv4 data block, and 0x6 indicates an IPv6 data block.

6.1.3.1. IPv4 Data Block

                      1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  0x4  |  IHL  |  TypeOfService  |         TotalLength         |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 | Rest of the inner packet ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

Figure 7: IPv4 Data Block Format

These values are the actual values within the encapsulated IPv4 header. In other words, the start of this data block is the start of the encapsulated IP packet.

Type:: A 4-bit value of 0x4 indicating IPv4 (i.e., first nibble of the IPv4 packet).
TotalLength:: The 16-bit unsigned integer "Total Length" field of the IPv4 inner packet.

6.1.3.2. IPv6 Data Block

                      1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  0x6  | TrafficClass  |               FlowLabel               |
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |         PayloadLength         | Rest of the inner packet ...
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-

Figure 8: IPv6 Data Block Format

These values are the actual values within the encapsulated IPv6 header. In other words, the start of this data block is the start of the encapsulated IP packet.

Type:: A 4-bit value of 0x6 indicating IPv6 (i.e., first nibble of the IPv6 packet).
PayloadLength:: The 16-bit unsigned integer "Payload Length" field of the IPv6 inner packet.

6.1.3.3. Pad Data Block

                      1                   2                   3
  0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
 +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
 |  0x0  | Padding ...
 +-+-+-+-+-+-+-+-+-+-+-

Figure 9: Pad Data Block Format

Type:: A 4-bit value of 0x0 indicating a padding data block.
Padding:: Extends to end of the encapsulating packet.
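The three data block types can be told apart, and their lengths found, from the first few octets of each block. The following Python sketch is one way to do that; the function name is hypothetical, and it assumes the block starts in the buffer with enough octets present to read the length field. The 40-octet constant reflects the IPv6 fixed header size from [RFC8200], which the Payload Length field does not include.

```python
IPV6_FIXED_HEADER_LEN = 40  # RFC 8200; PayloadLength excludes this header


def data_block_length(data: bytes, start: int = 0):
    """Return (kind, length) of the data block beginning at data[start].

    Assumes the block starts in this buffer and that enough leading octets
    are present to read the length field.
    """
    block_type = data[start] >> 4  # high nibble of the first octet
    if block_type == 0x0:
        # Pad block: runs to the end of the encapsulating packet.
        return "pad", len(data) - start
    if block_type == 0x4:
        # IPv4 Total Length sits in octets 2-3 and covers header plus payload.
        return "ipv4", int.from_bytes(data[start + 2:start + 4], "big")
    if block_type == 0x6:
        # IPv6 Payload Length sits in octets 4-5 and excludes the fixed header.
        payload_len = int.from_bytes(data[start + 4:start + 6], "big")
        return "ipv6", IPV6_FIXED_HEADER_LEN + payload_len
    raise ValueError(f"unknown data block type {block_type:#x}")
```

A receiver would typically call this at the BlockOffset position and then repeatedly at the end of each completed block until it reaches padding or the end of the payload.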

6.1.4. IKEv2 USE_AGGFRAG Notification Message

As discussed in Section 5.1, a notification message USE_AGGFRAG is used to negotiate use of the ESP AGGFRAG_PAYLOAD Next Header value.

The USE_AGGFRAG Notification Message Status Type is 16442.

The notification payload contains 1 octet of requirement flags. There are currently 2 requirement flags defined. This may be revised by later specifications.

 +-+-+-+-+-+-+-+-+
 |0|0|0|0|0|0|C|D|
 +-+-+-+-+-+-+-+-+

Figure 10: USE_AGGFRAG Requirement Flags

0:: 6 bits - Reserved; MUST be zero on send, unless defined by later specifications.
C:: Congestion Control bit. If set, then the sender is requiring that congestion control information MUST be returned to it periodically, as defined in Section 3.
D:: Don't Fragment bit. If set, it indicates the sender of the notify message does not support receiving packet fragments (i.e., inner packets MUST be sent using a single Data Block). This value only applies to what the sender is capable of receiving; the sender MAY still send packet fragments unless similarly restricted by the receiver in its USE_AGGFRAG notification.
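As a small illustration of the flag octet in Figure 10, the Python sketch below encodes and decodes the C and D bits and reports whether any reserved bit was set. The names are hypothetical; the bit positions are read from the figure with the most significant bit on the left.

```python
FLAG_D = 0x01  # Don't Fragment: sender cannot receive fragmented inner packets
FLAG_C = 0x02  # Congestion Control: sender requires congestion control reports
KNOWN_FLAGS = FLAG_C | FLAG_D


def encode_use_aggfrag_flags(congestion_control: bool, dont_fragment: bool) -> bytes:
    """Build the 1-octet notification payload; reserved bits are sent as zero."""
    flags = (FLAG_C if congestion_control else 0) | (FLAG_D if dont_fragment else 0)
    return bytes([flags])


def decode_use_aggfrag_flags(payload: bytes):
    """Return (congestion_control, dont_fragment, understood).

    understood is False when any bit outside C and D is set, in which case
    the receiver should not enable AGGFRAG_PAYLOAD on the Child SA.
    """
    if len(payload) != 1:
        raise ValueError("USE_AGGFRAG carries exactly 1 octet of flags")
    flags = payload[0]
    understood = (flags & ~KNOWN_FLAGS & 0xFF) == 0
    return bool(flags & FLAG_C), bool(flags & FLAG_D), understood
```

The understood result mirrors the rule in Section 5.1 that a receiver should not enable AGGFRAG_PAYLOAD when it sees requirement flags it does not understand.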

7. IANA Considerations

7.1. ESP Next Header Value

IANA has allocated an IP protocol number from the "Protocol Numbers - Assigned Internet Protocol Numbers" registry as follows.

Decimal:: 144
Keyword:: AGGFRAG
Protocol:: AGGFRAG encapsulation payload for ESP
Reference:: RFC 9347

7.2. AGGFRAG_PAYLOAD Sub-Types

IANA has created a registry called "AGGFRAG_PAYLOAD Sub-Types" under a new category named "ESP AGGFRAG_PAYLOAD". The registration policy for this registry is "Expert Review" [RFC8126] [RFC7120].

Name:: AGGFRAG_PAYLOAD Sub-Types
Description:: AGGFRAG_PAYLOAD Payload Formats
Reference:: RFC 9347

The initial content for this registry is as follows:

Table 1: AGGFRAG_PAYLOAD Sub-Types
Sub-Type	Name	Reference
0	Non-Congestion-Control Format	RFC 9347
1	Congestion Control Format	RFC 9347
3-255	Reserved

7.3. USE_AGGFRAG Notify Message Status Type

IANA has allocated a status type USE_AGGFRAG from the "IKEv2 Notify Message Types - Status Types" registry.

Decimal:: 16442
Name:: USE_AGGFRAG
Reference:: RFC 9347

8. Security Considerations

This document describes an aggregation and fragmentation mechanism to efficiently implement TFC for IP traffic. This approach is expected to reduce the efficacy of traffic analysis on IPsec communication. Other than the additional security afforded by using this mechanism, IP-TFS utilizes the security protocols [RFC4303] and [RFC7296], and so their security considerations apply to IP-TFS as well.

As noted in Section 3.1, the ECN bits are not protected by IPsec and thus may constitute a covert channel. For this reason, ECN use SHOULD NOT be enabled by default.

As noted previously in Section 2.4.2, for TFC to be maintained, the encapsulated traffic flow should not be affecting network congestion in a predictable way, and if it would be, then non-congestion-controlled mode use should be considered instead.

9. References

9.1. Normative References

[RFC2119]: Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, <https://www.rfc-editor.org/info/rfc2119>.
[RFC4303]: Kent, S., "IP Encapsulating Security Payload (ESP)", RFC 4303, DOI 10.17487/RFC4303, December 2005, <https://www.rfc-editor.org/info/rfc4303>.
[RFC7296]: Kaufman, C., Hoffman, P., Nir, Y., Eronen, P., and T. Kivinen, "Internet Key Exchange Protocol Version 2 (IKEv2)", STD 79, RFC 7296, DOI 10.17487/RFC7296, October 2014, <https://www.rfc-editor.org/info/rfc7296>.
[RFC8174]: Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, <https://www.rfc-editor.org/info/rfc8174>.

9.2. Informative References

[AppCrypt]: Schneier, B., "Applied Cryptography: Protocols, Algorithms, and Source Code in C", 1996.
[RFC0791]: Postel, J., "Internet Protocol", STD 5, RFC 791, DOI 10.17487/RFC0791, September 1981, <https://www.rfc-editor.org/info/rfc791>.
[RFC1191]: Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191, DOI 10.17487/RFC1191, November 1990, <https://www.rfc-editor.org/info/rfc1191>.
[RFC2474]: Nichols, K., Blake, S., Baker, F., and D. Black, "Definition of the Differentiated Services Field (DS Field) in the IPv4 and IPv6 Headers", RFC 2474, DOI 10.17487/RFC2474, December 1998, <https://www.rfc-editor.org/info/rfc2474>.
[RFC2914]: Floyd, S., "Congestion Control Principles", BCP 41, RFC 2914, DOI 10.17487/RFC2914, September 2000, <https://www.rfc-editor.org/info/rfc2914>.
[RFC3168]: Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of Explicit Congestion Notification (ECN) to IP", RFC 3168, DOI 10.17487/RFC3168, September 2001, <https://www.rfc-editor.org/info/rfc3168>.
[RFC4301]: Kent, S. and K. Seo, "Security Architecture for the Internet Protocol", RFC 4301, DOI 10.17487/RFC4301, December 2005, <https://www.rfc-editor.org/info/rfc4301>.
[RFC4342]: Floyd, S., Kohler, E., and J. Padhye, "Profile for Datagram Congestion Control Protocol (DCCP) Congestion Control ID 3: TCP-Friendly Rate Control (TFRC)", RFC 4342, DOI 10.17487/RFC4342, March 2006, <https://www.rfc-editor.org/info/rfc4342>.
[RFC4821]: Mathis, M. and J. Heffner, "Packetization Layer Path MTU Discovery", RFC 4821, DOI 10.17487/RFC4821, March 2007, <https://www.rfc-editor.org/info/rfc4821>.
[RFC5348]: Floyd, S., Handley, M., Padhye, J., and J. Widmer, "TCP Friendly Rate Control (TFRC): Protocol Specification", RFC 5348, DOI 10.17487/RFC5348, September 2008, <https://www.rfc-editor.org/info/rfc5348>.
[RFC6040]: Briscoe, B., "Tunnelling of Explicit Congestion Notification", RFC 6040, DOI 10.17487/RFC6040, November 2010, <https://www.rfc-editor.org/info/rfc6040>.
[RFC7120]: Cotton, M., "Early IANA Allocation of Standards Track Code Points", BCP 100, RFC 7120, DOI 10.17487/RFC7120, January 2014, <https://www.rfc-editor.org/info/rfc7120>.
[RFC7510]: Xu, X., Sheth, N., Yong, L., Callon, R., and D. Black, "Encapsulating MPLS in UDP", RFC 7510, DOI 10.17487/RFC7510, April 2015, <https://www.rfc-editor.org/info/rfc7510>.
[RFC7893]: Stein, Y(J)., Black, D., and B. Briscoe, "Pseudowire Congestion Considerations", RFC 7893, DOI 10.17487/RFC7893, June 2016, <https://www.rfc-editor.org/info/rfc7893>.
[RFC8084]: Fairhurst, G., "Network Transport Circuit Breakers", BCP 208, RFC 8084, DOI 10.17487/RFC8084, March 2017, <https://www.rfc-editor.org/info/rfc8084>.
[RFC8126]: Cotton, M., Leiba, B., and T. Narten, "Guidelines for Writing an IANA Considerations Section in RFCs", BCP 26, RFC 8126, DOI 10.17487/RFC8126, June 2017, <https://www.rfc-editor.org/info/rfc8126>.
[RFC8200]: Deering, S. and R. Hinden, "Internet Protocol, Version 6 (IPv6) Specification", STD 86, RFC 8200, DOI 10.17487/RFC8200, July 2017, <https://www.rfc-editor.org/info/rfc8200>.
[RFC8201]: McCann, J., Deering, S., Mogul, J., and R. Hinden, Ed., "Path MTU Discovery for IP version 6", STD 87, RFC 8201, DOI 10.17487/RFC8201, July 2017, <https://www.rfc-editor.org/info/rfc8201>.
[RFC8546]: Trammell, B. and M. Kuehlewind, "The Wire Image of a Network Protocol", RFC 8546, DOI 10.17487/RFC8546, April 2019, <https://www.rfc-editor.org/info/rfc8546>.
[RFC8899]: Fairhurst, G., Jones, T., Tüxen, M., Rüngeler, I., and T. Völker, "Packetization Layer Path MTU Discovery for Datagram Transports", RFC 8899, DOI 10.17487/RFC8899, September 2020, <https://www.rfc-editor.org/info/rfc8899>.
[RFC9329]: Pauly, T. and V. Smyslov, "TCP Encapsulation of Internet Key Exchange Protocol (IKE) and IPsec Packets", RFC 9329, DOI 10.17487/RFC9329, November 2022, <https://www.rfc-editor.org/info/rfc9329>.
[RFC9348]: Fedyk, D. and C. Hopps, "A YANG Data Model for IP Traffic Flow Security", RFC 9348, DOI 10.17487/RFC9348, January 2023, <https://www.rfc-editor.org/info/rfc9348>.
[RFC9349]: Fedyk, D. and E. Kinzie, "Definitions of Managed Objects for IP Traffic Flow Security", RFC 9349, DOI 10.17487/RFC9349, January 2023, <https://www.rfc-editor.org/info/rfc9349>.

Appendix A. Example of an Encapsulated IP Packet Flow

Below, an example inner IP packet flow within the encapsulating tunnel packet stream is shown. Notice how encapsulated IP packets can start and end anywhere, and more than one or less than one may occur in a single encapsulating packet.

  Offset: 0        Offset: 100    Offset: 2000    Offset: 600
 [ ESP1  (1404) ][ ESP2  (1404) ][ ESP3  (1404) ][ ESP4  (1404) ]
 [--750--][--750--][60][-240-][--3000----------------------][pad]

Figure 11: Inner and Outer Packet Flow

Each outer encapsulating ESP space is a fixed size of 1404 octets, the first 4 octets of which contain the AGGFRAG header. The encapsulated IP packet flow (lengths include the IP header and payload) is as follows: a 750-octet packet, a 750-octet packet, a 60-octet packet, a 240-octet packet, and a 3000-octet packet.

The BlockOffset values in the 4 AGGFRAG payload headers for this packet flow would thus be: 0, 100, 2000, and 600, respectively. The first encapsulating packet (ESP1) has a zero BlockOffset, which points at the IP data block immediately following the AGGFRAG header. The following packet's (ESP2) BlockOffset points inward 100 octets to the start of the 60-octet data block. The third encapsulating packet (ESP3) contains the middle portion of the 3000-octet data block, so the offset points past its end and into the fourth encapsulating packet. The fourth packet's (ESP4) offset is 600, pointing at the padding that follows the completion of the continued 3000-octet packet.
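The BlockOffset values in this example can be recomputed mechanically. The Python sketch below assumes the inner packets are sent back to back with padding only after the last one, as in Figure 11; the function name and parameters are hypothetical.

```python
from bisect import bisect_left
from itertools import accumulate


def block_offsets(inner_lengths, outer_size=1404, header_size=4):
    """Return the BlockOffset of each outer payload for a back-to-back flow.

    Inner packets are laid end to end in a stream of DataBlocks octets;
    padding starts where the last inner packet ends and counts as a block.
    """
    capacity = outer_size - header_size          # DataBlocks octets per payload
    # Stream positions where a data block (or the trailing pad) starts.
    starts = [0] + list(accumulate(inner_lengths))
    total = starts[-1]
    outer_count = -(-total // capacity)          # ceiling division
    offsets = []
    for i in range(outer_count):
        payload_start = i * capacity
        # First block start at or after the beginning of this payload.
        next_start = starts[bisect_left(starts, payload_start)]
        offsets.append(next_start - payload_start)
    return offsets


print(block_offsets([750, 750, 60, 240, 3000]))  # [0, 100, 2000, 600]
```

Each payload carries 1400 octets of DataBlocks data, so the third payload begins 2800 octets into the stream while the next block (the padding) starts at octet 4800, giving the offset of 2000.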

Appendix B. A Send and Loss Event Rate Calculation

The current best practice indicates that congestion control SHOULD be done in a TCP-friendly way. A TCP-friendly congestion control algorithm is described in [RFC5348]. For this IP-TFS use case (as with [RFC4342]), the (fixed) packet size is used as the segment size for the algorithm. The main formula in the algorithm for the send rate is then as follows:

                              1
   X = -----------------------------------------------
       R * (sqrt(2*p/3) + 12*sqrt(3*p/8)*p*(1+32*p^2))

X is the send rate in packets per second, R is the RTT estimate, and p is the loss event rate (the inverse of which is provided by the receiver).

In addition, the algorithm in [RFC5348] also uses an X_recv value (the receiver's receive rate). For IP-TFS, one MAY set this value according to the sender's current tunnel send rate (X).

The IP-TFS receiver, having the RTT estimate from the sender, can use the same method as described in [RFC5348] and [RFC4342] to collect the loss intervals and calculate the loss event rate value using the weighted average as indicated. The receiver communicates the inverse of this value back to the sender in the AGGFRAG_PAYLOAD payload header field LossEventRate.

The IP-TFS sender now has both the R and p values and can calculate the correct sending rate. If following [RFC5348], the sender should also use the slow start mechanism described therein when the IP-TFS SA is first established.
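The send rate formula in this appendix can be evaluated directly once R and the LossEventRate field are known. The Python sketch below is a minimal illustration; the function name is hypothetical, the RTT is assumed to have been converted to seconds, and returning infinity for a zero LossEventRate is a choice made here to signal "no loss reported" rather than behavior defined by this document.

```python
import math


def tfrc_send_rate(rtt_seconds: float, loss_event_rate_field: int) -> float:
    """Send rate X in packets per second from R and the LossEventRate field.

    loss_event_rate_field is the inverse loss event rate as carried in the
    header; 0 means no loss, for which the formula gives no finite bound.
    """
    if rtt_seconds <= 0:
        raise ValueError("an RTT estimate is required")
    if loss_event_rate_field == 0:
        return math.inf
    p = 1.0 / loss_event_rate_field
    denominator = rtt_seconds * (
        math.sqrt(2 * p / 3)
        + 12 * math.sqrt(3 * p / 8) * p * (1 + 32 * p ** 2)
    )
    return 1.0 / denominator
```

A real sender would also cap the result at its configured bandwidth and apply the slow start behavior from [RFC5348]; neither is shown here.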

Appendix C. Comparisons of IP-TFS

C.1. Comparing Overhead

For comparing overhead, the overhead of ESP for both normal and AGGFRAG tunnel packets must be calculated, and so an algorithm for encryption and authentication must be chosen. For the data below, AES-GCM-256 was selected. This leads to an IP+ESP overhead of 54 octets.

  54 = 20 (IP) + 8 (ESPH) + 2 (ESPF) + 8 (IV) + 16 (ICV)

Additionally, for IP-TFS, non-congestion-control AGGFRAG_PAYLOAD headers were chosen, which adds 4 octets, for a total overhead of 58.

C.1.1. IP-TFS Overhead

For comparison, the overhead of an AGGFRAG payload is 58 octets per outer packet. Therefore, the octet overhead per inner packet is 58 divided by the number of inner packets carried per outer packet (fractions allowed). The overhead as a percentage of inner packet size is a constant based on the Outer MTU size.

   OH = 58 / (Outer Payload Size / Inner Packet Size)
   OH % of Inner Packet Size = 100 * OH / Inner Packet Size
   OH % of Inner Packet Size = 5800 / Outer Payload Size
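The IP-TFS overhead figures can be checked numerically. The Python sketch below uses hypothetical function names and the 58-octet overhead from this appendix; it reproduces the 11.20%, 4.02%, and 0.65% values for outer payload sizes of 518, 1442, and 8942 octets.

```python
AGGFRAG_OVERHEAD = 58  # 54 octets IP+ESP plus the 4-octet AGGFRAG header


def iptfs_overhead_octets(inner_packet_size: int, outer_payload_size: int) -> float:
    """Share of the per-outer-packet overhead attributable to one inner packet."""
    inner_per_outer = outer_payload_size / inner_packet_size  # fractions allowed
    return AGGFRAG_OVERHEAD / inner_per_outer


def iptfs_overhead_percent(outer_payload_size: int) -> float:
    """Overhead as a percentage of inner packet size; independent of that size."""
    return 100 * AGGFRAG_OVERHEAD / outer_payload_size


for mtu in (576, 1500, 9000):
    psize = mtu - AGGFRAG_OVERHEAD
    print(mtu, psize, f"{iptfs_overhead_percent(psize):.2f}%")
```

Because the inner packet size cancels out of the percentage, every row of the IP-TFS columns carries the same value for a given MTU.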

Table 2: IP-TFS Overhead as Percentage of Inner Packet Size
Type	IP-TFS	IP-TFS	IP-TFS
MTU	576	1500	9000
PSize	518	1442	8942
40	11.20%	4.02%	0.65%
576	11.20%	4.02%	0.65%
1500	11.20%	4.02%	0.65%
9000	11.20%	4.02%	0.65%

C.1.2. ESP with Padding Overhead

The overhead per inner packet for constant-send-rate-padded ESP (i.e., original IPsec TFC) is 36 octets plus any padding, unless fragmentation is required.

When fragmentation of the inner packet is required to fit in the outer IPsec packet, overhead is the number of outer packets required to carry the fragmented inner packet times both the inner IP Overhead (20) and the outer packet overhead (54) minus the initial inner IP Overhead plus any required tail padding in the last encapsulation packet. The required tail padding is the number of required packets times the difference of the Outer Payload Size and the IP Overhead minus the Inner Payload Size. So:

  Inner Payload Size = IP Packet Size - IP Overhead
  Outer Payload Size = MTU - IPsec Overhead

                Inner Payload Size
  NF0 = ----------------------------------
         Outer Payload Size - IP Overhead

  NF = CEILING(NF0)

  OH = NF * (IP Overhead + IPsec Overhead)
       - IP Overhead
       + NF * (Outer Payload Size - IP Overhead)
       - Inner Payload Size

  OH = NF * (IPsec Overhead + Outer Payload Size)
       - (IP Overhead + Inner Payload Size)

  OH = NF * (IPsec Overhead + Outer Payload Size)
       - Inner Packet Size
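The fragmentation case above reduces to a few lines of arithmetic. The following Python sketch uses a hypothetical function name and applies only when the inner packet does not fit in a single outer packet; it uses the 20-octet IP Overhead and 54-octet IPsec Overhead from this appendix.

```python
import math

IP_OVERHEAD = 20      # inner IPv4 header
IPSEC_OVERHEAD = 54   # outer IP + ESP overhead from the AES-GCM-256 example


def esp_pad_fragmented_overhead(inner_packet_size: int, mtu: int) -> int:
    """Overhead in octets when the inner packet must be fragmented."""
    inner_payload = inner_packet_size - IP_OVERHEAD
    outer_payload = mtu - IPSEC_OVERHEAD
    # Each fragment carries its own inner IP header inside the outer payload.
    fragments = math.ceil(inner_payload / (outer_payload - IP_OVERHEAD))
    return fragments * (IPSEC_OVERHEAD + outer_payload) - inner_packet_size
```

For a 1442-octet inner packet on a 576-octet MTU, three fragments are needed and the function returns 286 octets, matching the corresponding ESP+Pad entry in Table 3.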

C.2. Overhead Comparison

The following tables collect the overhead values for some common L3 MTU sizes in order to compare them. The first table is the number of octets of overhead for a given L3 MTU-sized packet. The second table is the percentage of overhead in the same MTU-sized packet.

Table 3: Overhead Comparison in Octets
Type	ESP+Pad	ESP+Pad	ESP+Pad	IP-TFS	IP-TFS	IP-TFS
L3 MTU	576	1500	9000	576	1500	9000
PSize	522	1446	8946	518	1442	8942
40	482	1406	8906	4.5	1.6	0.3
128	394	1318	8818	14.3	5.1	0.8
256	266	1190	8690	28.7	10.3	1.7
518	4	928	8428	58.0	20.8	3.4
576	576	870	8370	64.5	23.2	3.7
1442	286	4	7504	161.5	58.0	9.4
1500	228	1500	7446	168.0	60.3	9.7
8942	1426	1558	4	1001.2	359.7	58.0
9000	1368	1500	9000	1007.7	362.0	58.4

Table 4: Overhead as Percentage of Inner Packet Size
Type	ESP+Pad	ESP+Pad	ESP+Pad	IP-TFS	IP-TFS	IP-TFS
MTU	576	1500	9000	576	1500	9000
PSize	522	1446	8946	518	1442	8942
40	1205.0%	3515.0%	22265.0%	11.20%	4.02%	0.65%
128	307.8%	1029.7%	6889.1%	11.20%	4.02%	0.65%
256	103.9%	464.8%	3394.5%	11.20%	4.02%	0.65%
518	0.8%	179.2%	1627.0%	11.20%	4.02%	0.65%
576	100.0%	151.0%	1453.1%	11.20%	4.02%	0.65%
1442	19.8%	0.3%	520.4%	11.20%	4.02%	0.65%
1500	15.2%	100.0%	496.4%	11.20%	4.02%	0.65%
8942	15.9%	17.4%	0.0%	11.20%	4.02%	0.65%
9000	15.2%	16.7%	100.0%	11.20%	4.02%	0.65%

C.3. Comparing Available Bandwidth

Another way to compare the two solutions is to look at the amount of available bandwidth each solution provides. The following sections consider and compare the percentage of available bandwidth. For the sake of providing a well-understood baseline, normal (unencrypted) Ethernet and normal ESP values are included.

C.3.1. Ethernet

In order to calculate the available bandwidth, the per-packet overhead is calculated first. The total overhead of Ethernet is 14+4 octets of header and Cyclic Redundancy Check (CRC) plus an additional 20 octets of framing (preamble, start, and inter-packet gap), for a total of 38 octets. Additionally, the minimum payload is 46 octets.

Table 5: L2 Octets Per Packet
Size	E + P	E + P	E + P	IPTFS	IPTFS	IPTFS	Enet	ESP
MTU	590	1514	9014	590	1514	9014	any	any
OH	92	92	92	96	96	96	38	74
40	614	1538	9038	47	42	40	84	114
128	614	1538	9038	151	136	129	166	202
256	614	1538	9038	303	273	258	294	330
518	614	1538	9038	614	552	523	574	610
576	1228	1538	9038	682	614	582	614	650
1442	1842	1538	9038	1709	1538	1457	1498	1534
1500	1842	3076	9038	1777	1599	1516	1538	1574
8942	11052	10766	9038	10599	9537	9038	8998	9034
9000	11052	10766	18076	10667	9599	9096	9038	9074

Table 6: Packets Per Second on 10G Ethernet
Size	E + P	E + P	E + P	IPTFS	IPTFS	IPTFS	Enet	ESP
MTU	590	1514	9014	590	1514	9014	any	any
OH	92	92	92	96	96	96	38	74
40	2.0M	0.8M	0.1M	26.4M	29.3M	30.9M	14.9M	11.0M
128	2.0M	0.8M	0.1M	8.2M	9.2M	9.7M	7.5M	6.2M
256	2.0M	0.8M	0.1M	4.1M	4.6M	4.8M	4.3M	3.8M
518	2.0M	0.8M	0.1M	2.0M	2.3M	2.4M	2.2M	2.1M
576	1.0M	0.8M	0.1M	1.8M	2.0M	2.1M	2.0M	1.9M
1442	678K	812K	138K	731K	812K	857K	844K	824K
1500	678K	406K	138K	703K	781K	824K	812K	794K
8942	113K	116K	138K	117K	131K	138K	139K	138K
9000	113K	116K	69K	117K	130K	137K	138K	137K

Table 7: Percentage of Bandwidth on 10G Ethernet
Size	E + P	E + P	E + P	IP-TFS	IP-TFS	IP-TFS	Enet	ESP
MTU	590	1514	9014	590	1514	9014	any	any
OH	92	92	92	96	96	96	38	74
40	6.51%	2.60%	0.44%	84.36%	93.76%	98.94%	47.62%	35.09%
128	20.85%	8.32%	1.42%	84.36%	93.76%	98.94%	77.11%	63.37%
256	41.69%	16.64%	2.83%	84.36%	93.76%	98.94%	87.07%	77.58%
518	84.36%	33.68%	5.73%	84.36%	93.76%	98.94%	93.17%	87.50%
576	46.91%	37.45%	6.37%	84.36%	93.76%	98.94%	93.81%	88.62%
1442	78.28%	93.76%	15.95%	84.36%	93.76%	98.94%	97.43%	95.12%
1500	81.43%	48.76%	16.60%	84.36%	93.76%	98.94%	97.53%	95.30%
8942	80.91%	83.06%	98.94%	84.36%	93.76%	98.94%	99.58%	99.18%
9000	81.43%	83.60%	49.79%	84.36%	93.76%	98.94%	99.58%	99.18%
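The Enet and IP-TFS columns of Table 7 follow from the Ethernet overhead figures given in Appendix C.3.1. The Python sketch below, with hypothetical function names, shows the arithmetic; the ESP and ESP-with-padding columns are not reproduced here.

```python
ETHERNET_OVERHEAD = 38     # 14 header + 4 CRC + 20 framing octets
ETHERNET_MIN_PAYLOAD = 46
AGGFRAG_OVERHEAD = 58      # IP+ESP overhead plus the 4-octet AGGFRAG header


def ethernet_percent(size: int) -> float:
    """Plain Ethernet: short packets are padded up to the minimum payload."""
    return 100 * size / (max(size, ETHERNET_MIN_PAYLOAD) + ETHERNET_OVERHEAD)


def iptfs_percent(l3_mtu: int) -> float:
    """IP-TFS: every outer frame is full, so the share is size-independent."""
    inner_octets = l3_mtu - AGGFRAG_OVERHEAD
    return 100 * inner_octets / (l3_mtu + ETHERNET_OVERHEAD)


for size in (40, 128, 1500):
    print(size, f"{ethernet_percent(size):.2f}%")
for mtu in (576, 1500, 9000):
    print(mtu, f"{iptfs_percent(mtu):.2f}%")
```

The comparison makes the aggregation effect visible: a 40-octet packet uses under half of an Ethernet frame's wire time on its own, while the IP-TFS share depends only on the tunnel MTU.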

A sometimes unexpected result of using an AGGFRAG tunnel (or any packet aggregating tunnel) is that, for small- to medium-sized packets, the available bandwidth is actually greater than plain Ethernet. This is due to the reduction in Ethernet framing overhead. This increased bandwidth is paid for with an increase in latency. This latency is the time to send the unrelated octets in the outer tunnel frame. The following table illustrates the latency for some common values on a 10G Ethernet link. The table also includes latency introduced by padding if using ESP with padding.

Table 8: Added Latency
Size	ESP+Pad	ESP+Pad	IP-TFS	IP-TFS
MTU	1500	9000	1500	9000
40	1.12 us	7.12 us	1.17 us	7.17 us
128	1.05 us	7.05 us	1.10 us	7.10 us
256	0.95 us	6.95 us	1.00 us	7.00 us
518	0.74 us	6.74 us	0.79 us	6.79 us
576	0.70 us	6.70 us	0.74 us	6.74 us
1442	0.00 us	6.00 us	0.05 us	6.05 us
1500	1.20 us	5.96 us	0.00 us	6.00 us

Notice that the latency values are very similar between the two solutions; however, whereas IP-TFS provides for constant high bandwidth, in some cases even exceeding plain Ethernet, ESP with padding often greatly reduces available bandwidth.

Acknowledgements

We would like to thank Don Fedyk for help in reviewing and editing this work. We would also like to thank Michael Richardson, Sean Turner, Valery Smyslov, and Tero Kivinen for reviews and many suggestions for improvements, as well as Joseph Touch for the transport area review and suggested improvements.

Contributors

The following person made significant contributions to this document.

Lou Berger

LabN Consulting, L.L.C.

Email: lberger@labn.net


RFC 9347

Aggregation and Fragmentation Mode for Encapsulating Security Payload (ESP) and Its Use for IP Traffic Flow Security (IP-TFS)