RFC 9347 | IP Traffic Flow Security | January 2023
Hopps | Standards Track
This document describes a mechanism for aggregation and fragmentation of IP packets when they are being encapsulated in Encapsulating Security Payload (ESP). This new payload type can be used for various purposes, such as decreasing encapsulation overhead for small IP packets; however, the focus in this document is to enhance IP Traffic Flow Security (IP-TFS) by adding Traffic Flow Confidentiality (TFC) to encrypted IP-encapsulated traffic. TFC is provided by obscuring the size and frequency of IP traffic using a fixed-size, constant-send-rate IPsec tunnel. The solution allows for congestion control, as well as nonconstant send-rate usage.¶
This is an Internet Standards Track document.¶
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.¶
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9347.¶
Copyright (c) 2023 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Traffic analysis [RFC4301] [AppCrypt] is the act of extracting information about data being sent through a network. While encryption [RFC4303] directly obscures the data, the patterns in the message traffic may still expose information due to variations in their shape and timing [RFC8546] [AppCrypt]. Hiding the size and frequency of traffic is referred to as Traffic Flow Confidentiality (TFC), per [RFC4303].¶
[RFC4303] provides for TFC by allowing padding to be added to encrypted IP packets and allowing for transmission of all-pad packets (indicated using protocol 59). This method has the major limitation that it can significantly underutilize the available bandwidth.¶
This document defines an aggregation and fragmentation (AGGFRAG) mode for ESP, as well as ESP's use for IP Traffic Flow Security (IP-TFS). This solution provides for full TFC without the aforementioned bandwidth limitation. This is accomplished by using a constant-send-rate IPsec [RFC4303] tunnel with fixed-size encapsulating packets; however, these fixed-size packets can contain partial, whole, or multiple IP packets to maximize the bandwidth of the tunnel. A nonconstant send rate is allowed, but the confidentiality properties of its use are outside the scope of this document.¶
For a comparison of the overhead of IP-TFS with the TFC solution prescribed in [RFC4303], see Appendix C.¶
Additionally, IP-TFS provides for operating fairly within congested networks [RFC2914]. This is important when the IP-TFS user is not in full control of the domain through which the IP-TFS tunnel path flows.¶
The mechanisms, such as the AGGFRAG mode, defined in this document are generic, with the intent of allowing for non-TFS uses, but such uses are outside the scope of this document.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
This document assumes familiarity with IP security concepts, including TFC, as described in [RFC4301].¶
As mentioned in Section 1, the AGGFRAG mode utilizes an IPsec [RFC4303] tunnel as its transport. For the purpose of IP-TFS, fixed-size encapsulating packets are sent at a constant rate on the AGGFRAG tunnel.¶
The primary input to the tunnel algorithm is the requested bandwidth to be used by the tunnel. Two values are then required to provide for this bandwidth use: the fixed size of the encapsulating packets and the rate at which to send them.¶
The fixed packet size MAY either be specified manually or be determined through other methods, such as the Packetization Layer MTU Discovery (PLMTUD) [RFC4821] [RFC8899] or Path MTU Discovery (PMTUD) [RFC1191] [RFC8201]. PMTUD is known to have issues, so PLMTUD is considered the more robust option. For PLMTUD, congestion control payloads can be used as in-band probes (see Section 6.1.2 and [RFC8899]).¶
Given the encapsulating packet size and the requested bandwidth to be used, the corresponding packet send rate can be calculated: it is the requested bandwidth divided by the size of the encapsulating packet, with both expressed in the same unit (e.g., bits per second and bits).¶
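As a worked illustration of that division, the following minimal Python sketch computes the send rate and the resulting time between packets. The function name, and the choice of bits per second for bandwidth and octets for packet size, are assumptions made for this example.

```python
def send_rate_pps(bandwidth_bps: float, packet_size_octets: int) -> float:
    """Packets per second needed to use `bandwidth_bps` with fixed-size packets.

    Hypothetical helper: bandwidth is in bits per second, and the packet size
    is the size of the whole encapsulating packet in octets.
    """
    if packet_size_octets <= 0:
        raise ValueError("packet size must be positive")
    return bandwidth_bps / (packet_size_octets * 8)


# Example: 10 Mbit/s using 1250-octet encapsulating packets.
rate = send_rate_pps(10_000_000, 1250)
interval_s = 1 / rate  # time between consecutive packet transmissions
```

Here the tunnel would send 1000 packets per second, or one packet every millisecond.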
The egress (receiving) side of the AGGFRAG tunnel MUST allow for and expect the ingress (sending) side of the AGGFRAG tunnel to vary the size and rate of sent encapsulating packets, unless constrained by other policy.¶
As previously mentioned, one issue with the TFC padding solution in [RFC4303] is the large amount of wasted bandwidth, as only one IP packet can be sent per encapsulating packet. In order to maximize bandwidth, IP-TFS breaks this one-to-one association by introducing an AGGFRAG mode for ESP.¶
The AGGFRAG mode aggregates and fragments the inner IP traffic flow into encapsulating IPsec tunnel packets. For IP-TFS, the IPsec encapsulating tunnel packets are a fixed size. Padding is only added to the tunnel packets if there is no data available to be sent at the time of tunnel packet transmission or if fragmentation has been disabled by the receiver.¶
This is accomplished using a new Encapsulating Security Payload (ESP) [RFC4303] Next Header field value, AGGFRAG_PAYLOAD (Section 6.1).¶
Other non-IP-TFS uses of this AGGFRAG mode have been suggested, such as increased performance through packet aggregation, as well as handling MTU issues using fragmentation. These uses are not defined here but are also not restricted by this document.¶
The AGGFRAG_PAYLOAD payload content defined in this document consists of a 4- or 24-octet header, followed by either a partial data block, a full data block, or multiple partial or full data blocks. The following diagram illustrates this payload within the ESP packet. See Section 6.1 for the exact formats of the AGGFRAG_PAYLOAD payload.¶
   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
   .     Outer Encapsulating Header ...                            .
   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
   .     ESP Header...                                             .
   +---------------------------------------------------------------+
   | [AGGFRAG sub-type/flags]      :         BlockOffset           |
   +---------------------------------------------------------------+
   :        [Optional Congestion Info]                             :
   +---------------------------------------------------------------+
   |       DataBlocks ...                                          ~
   ~                                                               ~
   ~                                                               |
   +---------------------------------------------------------------|
   .     ESP Trailer...                                            .
   . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
The BlockOffset value is either zero or some offset into or past the end of the DataBlocks data.¶
If the BlockOffset value is zero, it means that the DataBlocks data begins with a new data block.¶
Conversely, if the BlockOffset value is non-zero, it points to the start of the new data block, and the initial DataBlocks data belongs to the data block that is still being reassembled.¶
If the BlockOffset points past the end of the DataBlocks data, then the next data block occurs in a subsequent encapsulating packet.¶
Having the BlockOffset always point at the next available data block allows for recovering the next inner packet in the presence of outer encapsulating packet loss.¶
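To make the three BlockOffset cases concrete, here is a minimal Python sketch of how a receiver might split one payload's DataBlocks data. The function name and the tuple it returns are hypothetical. The split point follows the rules above: zero means everything is new data, a value inside the data marks the boundary, and a value past the end means everything continues the current inner packet.

```python
from typing import Tuple


def split_payload(data_blocks: bytes, block_offset: int) -> Tuple[bytes, bytes]:
    """Split DataBlocks data at BlockOffset (hypothetical helper).

    Returns (continuation, new_blocks):
      continuation -- octets belonging to the inner packet still being reassembled
      new_blocks   -- octets starting with a new data block (empty if BlockOffset
                      points at or past the end of this payload's DataBlocks)
    Python slicing clamps to the buffer length, so an offset past the end
    naturally yields (all data, b"").
    """
    if block_offset < 0:
        raise ValueError("BlockOffset cannot be negative")
    return data_blocks[:block_offset], data_blocks[block_offset:]
```

If the preceding outer packet was lost, a receiver can discard `continuation` and resume parsing at `new_blocks`. This is the loss-recovery property described above.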
An example AGGFRAG mode packet flow can be found in Appendix A.¶
   +---------------------------------------------------------------+
   | Type  | rest of IPv4, IPv6, or pad...
   +--------
A data block is defined by a 4-bit type code, followed by the data block data. The type values have been carefully chosen to coincide with the IPv4/IPv6 version field values so that no per-data-block type overhead is required to encapsulate an IP packet. Likewise, the length of the data block is extracted from the encapsulated IPv4's Total Length or IPv6's Payload Length fields (for IPv6, plus the 40-octet fixed header, which Payload Length does not include).¶
Since a data block's type is identified in its first 4 bits, the only time padding is required is when there is no data to encapsulate. For this end padding, a Pad Data Block is used.¶
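The following Python sketch shows how a receiver might find data block boundaries from the first nibble and the IP length fields. Two points are assumptions and are not stated in this excerpt: that the Pad Data Block type code is 0x0 (per the RFC's data block format section), and that a pad block runs to the end of the payload. All function names are hypothetical.

```python
import struct
from typing import List, Optional

PAD_TYPE = 0x0    # assumed Pad Data Block type code
IPV4_TYPE = 0x4   # matches the IPv4 version field
IPV6_TYPE = 0x6   # matches the IPv6 version field
IPV6_HEADER_LEN = 40


def data_block_length(buf: bytes) -> Optional[int]:
    """Length of the data block starting at buf[0], or None if its length
    field has not arrived yet (it lies in a later payload)."""
    block_type = buf[0] >> 4
    if block_type == PAD_TYPE:
        return len(buf)  # padding fills the rest of the payload
    if block_type == IPV4_TYPE:
        if len(buf) < 4:
            return None
        return struct.unpack_from("!H", buf, 2)[0]  # Total Length includes the header
    if block_type == IPV6_TYPE:
        if len(buf) < 6:
            return None
        return IPV6_HEADER_LEN + struct.unpack_from("!H", buf, 4)[0]  # Payload Length excludes it
    raise ValueError(f"unknown data block type {block_type:#x}")


def split_data_blocks(buf: bytes) -> List[bytes]:
    """Split DataBlocks data that begins on a block boundary into blocks.
    The last block may be a fragment that continues in the next payload."""
    blocks = []
    pos = 0
    while pos < len(buf):
        length = data_block_length(buf[pos:])
        if length is None:
            length = len(buf) - pos  # length field itself is split across payloads
        if length < 1:
            raise ValueError("malformed data block length")
        blocks.append(buf[pos:pos + length])  # slicing clamps a truncated final block
        pos += length
    return blocks
```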
In order for a receiver to reassemble fragmented inner packets, the sender MUST send the inner packet fragments back to back in the logical outer packet stream (i.e., using consecutive ESP sequence numbers). However, the sender is allowed to insert "all-pad" payloads (i.e., payloads with a BlockOffset of zero and a single pad data block) in between the packets carrying the inner packet fragment payloads. This interleaving of all-pad payloads allows the sender to always send a tunnel packet, regardless of the encapsulation computational requirements.¶
When a receiver is reassembling an inner packet and receives an "all-pad" payload, it increments the sequence number in which the next inner packet fragment is expected to arrive.¶
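The sequence-number bookkeeping in the two paragraphs above can be sketched as follows. The sketch assumes outer payloads have already been put back in order. The class and method names are hypothetical. An all-pad payload only advances the expected sequence number, and any gap abandons the reassembly.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reassembly:
    """Hypothetical per-SA state for one inner packet being reassembled."""
    expected_seq: int                                # ESP sequence number of the next fragment
    fragments: List[bytes] = field(default_factory=list)

    def on_payload(self, seq: int, all_pad: bool, fragment: bytes = b"") -> bool:
        """Process one in-order outer payload.

        Returns False if a gap shows that an outer packet was lost, in which
        case the partial inner packet cannot be completed.
        """
        if seq != self.expected_seq:
            return False
        self.expected_seq += 1      # all-pad or not, the next fragment moves one number on
        if not all_pad:
            self.fragments.append(fragment)
        return True
```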
Given the above, the receiver will need to handle out-of-order arrival of outer ESP packets prior to reassembly processing. ESP already provides for optionally detecting replay attacks. Detecting replay attacks normally utilizes a window method. A similar sequence-number-based sliding window can be used to correct reordering of the outer packet stream. Receiving a packet with a larger (newer) sequence number advances the window, and any older ESP packets whose sequence numbers the window has already passed are dropped when received. A good choice for the size of this window depends on the amount of misordering the user is experiencing; however, a value of 3 has been suggested as a default when no more informed choice exists.¶
As the amount of misordering that may be present is hard to predict, the window size SHOULD be configurable by the user. Implementations MAY also dynamically adjust the reordering window based on actual misordering seen in arriving packets.¶
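A sequence-number-based reordering window of the kind described above might look like the following Python sketch. The class name and the (delivered, lost) return convention are hypothetical, and the default size of 3 is the suggested default from the text. The window releases packets in sequence order. It declares a missing sequence number lost once a newer packet pushes the window past it, and it drops late arrivals the window has already passed.

```python
from typing import Any, Dict, List, Tuple


class ReorderWindow:
    """Sketch of a sliding reordering window over ESP sequence numbers."""

    def __init__(self, size: int = 3, first_seq: int = 1):
        self.size = size
        self.next_seq = first_seq            # next sequence number to deliver
        self.pending: Dict[int, Any] = {}    # out-of-order packets waiting for earlier ones

    def receive(self, seq: int, payload: Any) -> Tuple[List[Any], List[int]]:
        """Accept one outer packet; return (delivered payloads, lost seq numbers)."""
        delivered: List[Any] = []
        lost: List[int] = []
        if seq < self.next_seq or seq in self.pending:
            return delivered, lost           # window already passed it, or duplicate: drop
        self.pending[seq] = payload
        # A newer packet advances the window; give up on anything it leaves behind.
        while self.next_seq <= seq - self.size:
            if self.next_seq in self.pending:
                delivered.append(self.pending.pop(self.next_seq))
            else:
                lost.append(self.next_seq)
            self.next_seq += 1
        # Release any in-order run now at the head of the window.
        while self.next_seq in self.pending:
            delivered.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return delivered, lost
```

A lost packet timer, which the text RECOMMENDS, would add a time-based path to the same "give up on `next_seq`" step. It is omitted here for brevity.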
Please note, when IP-TFS sends a continuous stream of packets, there is no requirement for an explicit lost packet timer; however, using a lost packet timer is RECOMMENDED. If an implementation does not use a lost packet timer and only considers an outer packet lost when the reorder window moves past it, the inner traffic can be delayed by up to the reorder window size times the per-packet send interval (the inverse of the send rate). This delay could be significant for slower send rates or when larger reorder window sizes are in use. As the lost packet timer affects the delay of inner packet delivery, an implementation or user could choose to set it proportionate to the tunnel rate.¶
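The worst-case extra delay without a lost packet timer is simple arithmetic. The Python sketch below (hypothetical helper name) makes the dependence on the send rate visible: with the suggested window of 3, a 1000 packet/s tunnel adds at most 3 ms, while a 10 packet/s tunnel adds up to 300 ms.

```python
def max_reorder_delay_s(window_size: int, send_rate_pps: float) -> float:
    """Upper bound on added inner-packet delay when loss is only detected by
    the reorder window moving past a packet: one send interval (1 / rate)
    per window slot."""
    if send_rate_pps <= 0:
        raise ValueError("send rate must be positive")
    return window_size / send_rate_pps


fast = max_reorder_delay_s(3, 1000)  # 0.003 s
slow = max_reorder_delay_s(3, 10)    # 0.3 s
```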
While ESP guarantees an increasing sequence number with subsequently sent packets, it does not actually require the sequence numbers to be generated consecutively (e.g., sending only even-numbered sequence numbers would be allowed, as long as they are always increasing). Gaps in the sequence numbers will not work for this document, so the sequence number stream MUST increase monotonically by 1 for each subsequent packet.¶
When using the AGGFRAG_PAYLOAD in conjunction with replay detection, the window size for both MAY be reduced to the smaller of the two window sizes. This is because packets outside of the smaller window but inside the larger window would still be dropped by the mechanism with the smaller window size. However, there is also no requirement to make these values the same. Indeed, in some cases, such as slow tunnels where a very small or zero reorder window size is appropriate, the user may still want a large replay detection window to log replayed packets. Additionally, large replay windows can be implemented with very little overhead, compared to large reorder windows.¶
Finally, as sequence numbers are reset when switching Security Associations (SAs) (e.g., when rekeying a Child SA), senders MUST NOT send initial fragments of an inner packet using one SA and subsequent fragments in a different SA.¶
A note on BlockOffset values: Senders MUST encode the BlockOffset consistently with the immediately preceding non-all-pad payload packet. Specifically, if the immediately preceding non-all-pad payload packet ended with a Pad Data Block, this BlockOffset MUST be zero, as Pad Data Blocks are never fragmented. The BlockOffset MUST be consistent with the remaining size implied by the length field from the fragmented inner packet.¶
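On the sending side, the consistency rule above amounts to carrying the number of octets still owed for a fragmented inner packet from one non-all-pad payload to the next. The Python sketch below assumes a fixed DataBlocks capacity per payload and uses a hypothetical function name. It lists the BlockOffset values for the run of payloads that finish one inner packet. Interleaved all-pad payloads are not counted, since they always carry a BlockOffset of zero and no fragment data.

```python
from typing import List


def continuation_offsets(remaining: int, capacity: int) -> List[int]:
    """BlockOffset values for the payloads carrying the rest of one inner packet.

    remaining -- octets of the fragmented inner packet not yet sent (0 if the
                 previous non-all-pad payload ended with a complete block or a
                 Pad Data Block, so BlockOffset MUST be zero)
    capacity  -- DataBlocks octets available in each payload (assumed fixed)
    The last value is where a new data block may start in the final payload.
    """
    if capacity <= 0:
        raise ValueError("capacity must be positive")
    offsets = [remaining]
    while remaining > capacity:
        remaining -= capacity  # this payload is entirely continuation data
        offsets.append(remaining)
    return offsets
```

For example, 3000 remaining octets with 1400-octet payloads give offsets of 3000 and 1600 (both past the end of their payloads), followed by 200.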
When the tunnel bandwidth is not being fully utilized, a sender MAY pad out the current encapsulating packet in order to deliver an inner packet unfragmented in the following outer packet. The benefit would be to avoid inner packet fragmentation in the presence of a bursty offered load (non-bursty traffic will naturally not fragment). Senders MAY also choose to allow for a minimum fragment size to be configured (e.g., as a percentage of the AGGFRAG_PAYLOAD payload size) to avoid fragmentation at the cost of tunnel bandwidth. The costs of these methods are complexity and an added delay of inner traffic. The main advantage to avoiding fragmentation is to minimize inner packet loss in the presence of outer packet loss. When this is worthwhile (e.g., how much loss and what type of loss is required, given different inner traffic shapes and utilization, for this to make sense) and what values to use for the allowable/added delay may be worth researching but are outside the scope of this document.¶
While use of padding to avoid fragmentation does not impact interoperability, if padding is used inappropriately, it can reduce the effective throughput of a tunnel. Senders implementing either of the above approaches will need to take care to not reduce the effective capacity, and overall utility, of the tunnel through the overuse of padding.¶
To support reporting of congestion control information (described later) using a non-AGGFRAG_PAYLOAD-enabled SA, it is allowed to send an AGGFRAG_PAYLOAD payload with no data blocks (i.e., the ESP payload length is equal to the AGGFRAG_PAYLOAD header length). This special payload is called an empty payload.¶
Currently, this situation is only applicable in use cases without Internet Key Exchange Protocol Version 2 (IKEv2).¶
[RFC4301] provides some direction on when and how to map various values from an inner IP header to the outer encapsulating header, namely the Don't Fragment (DF) bit [RFC0791], the Differentiated Services (DS) field [RFC2474], and the Explicit Congestion Notification (ECN) field [RFC3168]. Unlike in [RFC4301], the AGGFRAG mode may, and often will, be encapsulating more than one IP packet per ESP packet. To deal with this, these mappings are restricted further.¶
The AGGFRAG mode never maps the inner DF bit, as it is unrelated to the AGGFRAG tunnel functionality; the AGGFRAG mode never needs to IP fragment the inner packets, and the inner packets will not affect the fragmentation of the outer encapsulation packets.¶
The ECN value need not be mapped, as any congestion related to the constant-send-rate IP-TFS tunnel is unrelated (by design) to the inner traffic flow. The sender MAY still set the ECN value of inner packets based on the normal ECN specification [RFC3168] [RFC4301] [RFC6040].¶
By default, the DS field SHOULD NOT be copied, although a sender MAY choose to allow for configuration to override this behavior. A sender SHOULD also allow the DS value to be set by configuration.¶
How to modify the inner packet IPv4 TTL [RFC0791] or IPv6 Hop Limit [RFC8200] is specified in [RFC4301].¶
[RFC4301] specifies how to apply policy to authenticated and unauthenticated ICMP error packets (e.g., Destination Unreachable) arriving at or being forwarded through the endpoint, in particular, whether to process, ignore, or forward said packets. With the one exception described in the next paragraph, this document does not change the handling of these packets; they should be handled as specified in [RFC4301].¶
The one way in which an AGGFRAG tunnel differs in ICMP error packet mechanics is with PMTU. When fragmentation is enabled on the AGGFRAG tunnel, no ICMP "Too Big" errors need to be generated for arriving ingress traffic, as the arriving inner packets will be naturally fragmented by the AGGFRAG encapsulation.¶
Otherwise, when fragmentation has been disabled on the AGGFRAG tunnel, the treatment of arriving inner traffic exactly maps to that of a non-AGGFRAG ESP tunnel. Explicitly, IPv4 packets with DF set and IPv6 packets that cannot fit in their own outer packet payload will generate the appropriate ICMP "Too Big" error, as described in [RFC4301], and IPv4 packets without DF set will be IP fragmented, as described in [RFC4301].¶
Packets egressing the tunnel continue to be handled as specified in [RFC4301].¶
All other aspects of PMTU and the handling of ICMP "Too Big" messages (i.e., with regard to the outer AGGFRAG/ESP tunnel packet size) also remain unchanged from [RFC4301].¶
Unlike in [RFC4301], there is normally no effective MTU (EMTU) on an AGGFRAG tunnel, as all IP packet sizes are properly transmitted without requiring IP fragmentation prior to tunnel ingress. That said, a sender MAY allow for explicitly configuring an MTU for the tunnel.¶
If fragmentation has been disabled on the AGGFRAG tunnel, then the tunnel's EMTU and behaviors are the same as normal IPsec tunnels [RFC4301].¶
This document does not specify mixed use of an AGGFRAG_PAYLOAD-enabled SA. A sender MUST only send AGGFRAG_PAYLOAD payloads over an SA configured for AGGFRAG mode.¶
Just as with normal IPsec/ESP SAs, AGGFRAG SAs are unidirectional. Bidirectional IP-TFS functionality is achieved by setting up 2 AGGFRAG SAs, one in each direction.¶
An AGGFRAG tunnel used for IP-TFS can operate in 2 modes: a non-congestion-controlled mode and a congestion-controlled mode.¶
In the non-congestion-controlled mode, IP-TFS sends fixed-size packets over an AGGFRAG tunnel at a constant rate. The packet send rate is constant and is not automatically adjusted, regardless of any network congestion (e.g., packet loss).¶
For similar reasons as given in [RFC7510], the non-congestion-controlled mode MUST only be used where the user has full administrative control over any path the tunnel will take and MUST NOT be used if this is not the case. This is required so the user can guarantee the bandwidth and also be sure not to negatively affect network congestion [RFC2914]. In this case, packet loss should be reported to the administrator (e.g., via syslog, YANG notification, SNMP traps, etc.) so that any failures due to a lack of bandwidth can be corrected. The use of circuit breakers is also RECOMMENDED (Section 2.4.2.1).¶
Users that choose the non-congestion-controlled mode need to understand that this mode will send packets at a constant rate, utilizing a constant, fixed bandwidth, and will not adjust based on congestion. Thus, if they do not guarantee the bandwidth required by the tunnel, the tunnel's operation, as well as the rest of their network, may be negatively impacted.¶
One expected use case for the non-congestion-controlled mode is to guarantee that the full tunnel bandwidth is available and preferred over other non-tunnel traffic. In fact, a typical site-to-site use case might have all of the user traffic utilizing the IP-TFS tunnel.¶
The non-congestion-controlled mode is also appropriate if ESP over TCP is in use [RFC9329]. However, the use of TCP is considered a fallback-only solution for IPsec and is strongly disfavored. This is also one of the reasons that TCP was not chosen instead of AGGFRAG as the encapsulation for IP-TFS.¶
With the congestion-controlled mode, IP-TFS adapts to network congestion by lowering the packet send rate to accommodate the congestion, as well as raising the rate when congestion subsides. Since overhead is per packet, by allowing for maximal fixed-size packets and varying the send rate, transport overhead is minimized.¶
The output of the congestion control algorithm will adjust the rate at which the ingress sends packets. While this document does not require a specific congestion control algorithm, best current practice RECOMMENDS that the algorithm conform to [RFC5348]. Congestion control principles are documented in [RFC2914] as well. [RFC4342] gives an example of the algorithm in [RFC5348] that matches the requirements of IP-TFS (i.e., designed for fixed-size packets and a send rate varied based on congestion).¶
The required inputs for the TCP-friendly rate control algorithm described in [RFC5348] are the receiver's loss event rate and the sender's estimated round-trip time (RTT). These values are provided by IP-TFS using the congestion information header fields described in Section 3. In particular, these values are sufficient to implement the algorithm described in [RFC5348].¶
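To show how the two inputs are used, the Python sketch below evaluates the TCP throughput equation from RFC 5348, Section 3.1. It uses the simplifications that specification recommends: b = 1 and t_RTO approximated as 4 * R. The function name is hypothetical. The wire encoding of the LossEventRate field is not covered here, so the sketch takes p directly as a fraction. With IP-TFS's fixed packet size s, dividing the result by s gives a packet send rate. The sketch does not handle p = 0 (no loss yet), which RFC 5348 treats separately.

```python
import math


def tfrc_rate_bytes_per_s(s: float, rtt: float, p: float, b: float = 1.0) -> float:
    """Allowed sending rate X (octets/second) per the RFC 5348 throughput equation.

    s   -- packet size in octets (the fixed encapsulating packet size)
    rtt -- round-trip time estimate R, in seconds
    p   -- loss event rate, 0 < p <= 1
    b   -- packets acknowledged by a single TCP ACK (RFC 5348 recommends 1)
    """
    if not 0 < p <= 1:
        raise ValueError("equation is only defined for 0 < p <= 1")
    t_rto = 4 * rtt  # simplification recommended by RFC 5348
    denom = (rtt * math.sqrt(2 * b * p / 3)
             + t_rto * (3 * math.sqrt(3 * b * p / 8)) * p * (1 + 32 * p ** 2))
    return s / denom


x = tfrc_rate_bytes_per_s(1500, 0.1, 0.01)  # roughly 168,500 octets/s
packets_per_s = x / 1500                    # roughly 112 packets/s
```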
At a minimum, the congestion information MUST be sent, from the receiver and from the sender, at least once per RTT. Prior to establishing an RTT, the information SHOULD be sent constantly from the sender and the receiver so that an RTT estimate can be established. Not receiving this information over multiple consecutive RTT intervals should be considered a congestion event that causes the sender to adjust its sending rate lower. For example, [RFC4342] calls this the "no feedback timeout", and it is equal to 4 RTT intervals. When a "no feedback timeout" has occurred, the sending rate is halved, as per [RFC4342].¶
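A sender-side "no feedback timeout" can be tracked with a single deadline, as in the Python sketch below. The class and method names are hypothetical. The 4-RTT interval and the halving come from the [RFC4342] example cited above. The sketch keeps the RTT fixed for simplicity, whereas a real sender would update it as new estimates arrive.

```python
class FeedbackMonitor:
    """Sketch of a sender-side no-feedback timer."""

    NO_FEEDBACK_RTTS = 4  # timeout length in RTT intervals, per the example above

    def __init__(self, rate_pps: float, rtt_s: float, now_s: float):
        self.rate_pps = rate_pps
        self.rtt_s = rtt_s
        self.deadline = now_s + self.NO_FEEDBACK_RTTS * rtt_s

    def on_feedback(self, now_s: float) -> None:
        """Congestion information arrived from the receiver: restart the timer."""
        self.deadline = now_s + self.NO_FEEDBACK_RTTS * self.rtt_s

    def poll(self, now_s: float) -> float:
        """Return the current send rate, halving it if the timer has expired."""
        if now_s >= self.deadline:
            self.rate_pps /= 2
            self.deadline = now_s + self.NO_FEEDBACK_RTTS * self.rtt_s
        return self.rate_pps
```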
An implementation MAY choose to always include the congestion information in its AGGFRAG payload header if it is sending it on an IP-TFS-enabled SA. Since IP-TFS normally will operate with a large packet size, the congestion information should represent a small portion of the available tunnel bandwidth. An implementation choosing to always send the data MAY also choose to update the LossEventRate and RTT header field values it sends only once every RTT.¶
When choosing a congestion control algorithm (or a selection of algorithms), note that IP-TFS is not providing for reliable delivery of IP traffic, and so per-packet acknowledgements (ACKs) are not required and are not provided.¶
It is worth noting that the variable send rate of a congestion-controlled AGGFRAG tunnel is not private; however, this send rate is being driven by network congestion, and as long as the encapsulated (inner) traffic flow shape and timing are not directly affecting the (outer) network congestion, the variations in the tunnel rate will not weaken the provided inner traffic flow confidentiality.¶
In addition to congestion control, implementations that support the non-congestion-controlled mode SHOULD implement circuit breakers [RFC8084] as a recovery method of last resort. When circuit breakers are enabled, an implementation SHOULD also enable congestion control reports so that circuit breakers have information to act on.¶
The pseudowire congestion considerations [RFC7893] are equally applicable to the mechanisms defined in this document, notably the text on inelastic traffic.¶
One example of a simple, slow-trip circuit breaker that an implementation may provide would utilize 2 values: the persistent loss rate required to trip the circuit breaker and the length of time this persistent loss rate must be seen to trip the circuit breaker. These 2 values are required configuration from the user. When the circuit breaker is tripped, the tunnel traffic is disabled and an appropriate log message or other management-type alarm is triggered, indicating that operator intervention is required.¶
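The two-value circuit breaker just described could be sketched as follows. The class name, parameter names, and the choice to treat a loss rate at or above the threshold as "persistent loss" are assumptions for illustration. Any sample below the threshold resets the persistence timer. Once tripped, the breaker stays tripped until an operator intervenes.

```python
from typing import Optional


class SlowTripCircuitBreaker:
    """Sketch of a slow-trip circuit breaker configured with two values.

    loss_threshold -- persistent loss rate (0..1) that counts as bad
    trip_after_s   -- how long that loss must persist before tripping
    """

    def __init__(self, loss_threshold: float, trip_after_s: float):
        self.loss_threshold = loss_threshold
        self.trip_after_s = trip_after_s
        self.bad_since: Optional[float] = None  # when the loss first reached the threshold
        self.tripped = False

    def update(self, loss_rate: float, now_s: float) -> bool:
        """Feed one loss-rate sample; return True once the breaker has tripped.

        On a True result the caller would disable tunnel traffic and raise a
        log message or management alarm; that part is outside this sketch.
        """
        if self.tripped:
            return True
        if loss_rate < self.loss_threshold:
            self.bad_since = None             # loss did not persist: start over
        elif self.bad_since is None:
            self.bad_since = now_s
        elif now_s - self.bad_since >= self.trip_after_s:
            self.tripped = True
        return self.tripped
```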
An AGGFRAG-enabled SA receiver has a few tasks to perform.¶
The receiver MAY process incoming AGGFRAG_PAYLOAD payloads as soon as they arrive, as much as it can; i.e., if the incoming AGGFRAG_PAYLOAD packet contains complete inner packet(s), the receiver should extract and transmit them immediately. For partial packets, the receiver needs to keep the partial packets in memory until they fall out from the reordering window or until the missing parts of the packets are received, in which case, it will reassemble and transmit them. If the AGGFRAG_PAYLOAD payload contains multiple packets, they SHOULD be sent out in the order they are in the AGGFRAG_PAYLOAD (i.e., keep the original order in which they were received on the other end). The cost of using this method is that an amplification of out-of-order delivery of inner packets can occur due to inner packet aggregation.¶
Instead of the method described in the previous paragraph, the receiver MAY reorder out-of-order AGGFRAG_PAYLOAD payloads received into in-sequence-order AGGFRAG_PAYLOAD payloads (Section 2.2.3), and only after it has an in-order AGGFRAG_PAYLOAD payload stream would the receiver transmit the inner packets. Using this method will ensure the inner packets are sent in order. The cost of this method is that a lost packet will cause a delay of up to the lost packet timer interval (or the full reorder window if no lost packet timer is used). Additionally, there can be extra burstiness in the output stream. This burstiness can happen when a lost packet is dropped from the reorder window, and the remaining outer packets in the reorder window are immediately processed and sent out back to back.¶
Additionally, if congestion control is enabled, the receiver sends congestion control data (Section 6.1.2) back to the sender, as described in Sections 2.4.2 and 3.¶
Finally, a note on receiving incorrect BlockOffset values: To account for misbehaving senders, a receiver SHOULD gracefully handle the case where the BlockOffset values of consecutive packets, and/or the inner packet they share, do not agree. It MAY drop the inner packet or one or both of the outer packets.¶
In order to support the congestion-controlled mode, the sender needs to know the loss event rate and to approximate the RTT [RFC5348]. In order to obtain these values, the receiver sends congestion control information on its SA back to the sender. Thus, to support congestion control, the receiver MUST have a paired SA back to the sender (this is always the case when the tunnel was created using IKEv2). If the SA back to the sender is a non-AGGFRAG_PAYLOAD-enabled SA, then an AGGFRAG_PAYLOAD empty payload (i.e., header only) is used to convey the information.¶
In order to calculate a loss event rate compatible with [RFC5348], the receiver needs to have an RTT estimate. Thus, the sender communicates this estimate in the RTT header field. On startup, this value will be zero, as no RTT estimate is yet known.¶
In order for the sender to estimate its RTT value, the sender places a timestamp value in the TVal header field. On first receipt of this TVal, the receiver records the new TVal value, along with the time it arrived locally. Subsequent receipt of the same TVal MUST NOT update the recorded time.¶
When the receiver sends its congestion control header, it places this latest recorded TVal in the TEcho header field, along with 2 delay values: Echo Delay and Transmit Delay. The Echo Delay value is the time delta, in microseconds, between the recorded arrival time of TVal and the current clock. The second value, Transmit Delay, is the receiver's current transmission delay on the tunnel (i.e., the average time between sending packets on its half of the AGGFRAG tunnel).¶
When the sender receives back its TVal in the TEcho header field, it calculates 2 RTT estimates. The first is the actual delay, found by subtracting the TEcho value from its current clock and then subtracting the Echo Delay as well. The second RTT estimate is found by adding the received Transmit Delay header value to the sender's own transmission delay (i.e., the average time between sending packets on its half of the AGGFRAG tunnel). The larger of these 2 RTT estimates SHOULD be used as the RTT value.¶
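The sender's calculation reduces to a subtraction, a sum, and a maximum. The Python sketch below shows it under two assumptions: the function name is hypothetical, and all values, including the sender's clock (the units TVal/TEcho carry), are in microseconds.

```python
def rtt_estimate_us(now_us: int, techo_us: int, echo_delay_us: int,
                    peer_transmit_delay_us: int, own_transmit_delay_us: int) -> int:
    """Combine the two RTT estimates; the larger one is used."""
    path_rtt = now_us - techo_us - echo_delay_us               # measured along the tunnel path
    rate_rtt = peer_transmit_delay_us + own_transmit_delay_us  # implied by both send intervals
    return max(path_rtt, rate_rtt)
```

On a fast path with slow tunnel rates, the second estimate dominates. For example, a 500 microsecond path RTT with 100 ms send intervals in each direction yields an RTT of 200 ms.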
The two RTT estimates are required to handle different combinations of faster or slower tunnel packet paths with faster or slower fixed tunnel rates. Choosing the larger of the two values guarantees that the RTT is never considered faster than the aggregate transmission delay based on the IP-TFS send rate (the second estimate), as well as never being considered faster than the actual RTT along the tunnel packet path (the first estimate).¶
The receiver also calculates, and communicates in the LossEventRate header field, the loss event rate for use by the sender. This is slightly different from [RFC4342], which periodically sends all the loss interval data back to the sender so that it can do the calculation. See Appendix B for a suggested way to calculate the loss event rate value. Initially, this value will be zero (indicating no loss) until enough data has been collected by the receiver to update it.¶
In addition to normal packet loss information, the AGGFRAG mode supports use of the ECN bits in the encapsulating IP header [RFC3168] for identifying congestion. If ECN use is enabled and a packet arrives at the egress (receiving) side with the Congestion Experienced (CE) value set, then the receiver considers that packet as being dropped, although it does not drop it. The receiver MUST set the E bit in any AGGFRAG_PAYLOAD payload header containing a LossEventRate value derived from a CE value being considered.¶
[RFC6040], which updates [RFC3168] and [RFC4301], defines behaviors for marking the outer ECN field value based on the ECN field of the inner packet. As the AGGFRAG mode may have multiple inner packets present in a single outer packet, and there is no obvious correct way to map these multiple values to the single outer packet ECN field value, the tunnel ingress endpoint SHOULD operate in the "compatibility" mode, rather than the "default" mode from [RFC6040]. In particular, this means that the ingress (sending) endpoint of the tunnel always sets the newly constructed outer encapsulating packet header ECN field to Not-ECT [RFC6040].¶
IP-TFS is meant to be deployable with a minimal amount of configuration. All IP-TFS-specific configuration should be specified at the unidirectional tunnel ingress (sending) side. It is intended that non-IKEv2 operation is supported, at least, with local static configuration.¶
YANG and MIB documents have been defined for IP-TFS in [RFC9348] and [RFC9349].¶
Bandwidth is a local configuration option. For the non-congestion-controlled mode, the bandwidth SHOULD be configured. For the congestion-controlled mode, the bandwidth can be configured, or the congestion control algorithm can discover and use the maximum bandwidth available. No standardized configuration method is required.¶
The fixed packet size to be used for the tunnel encapsulation packets MAY be configured manually or can be automatically determined using other methods, such as PLMTUD [RFC4821] [RFC8899] or PMTUD [RFC1191] [RFC8201]. As PMTUD is known to have issues, PLMTUD is considered the more robust option. No standardized configuration method is required.¶
Congestion control is a local configuration option. No standardizedconfiguration method is required.¶
As mentioned previously, AGGFRAG tunnels utilize ESP payloads of typeAGGFRAG_PAYLOAD.¶
When using IKEv2, a new "USE_AGGFRAG" notification message enables the AGGFRAG_PAYLOAD payload on a Child SA pair. The method used is similar to how USE_TRANSPORT_MODE is negotiated, as described in [RFC7296].¶
To request use of the AGGFRAG_PAYLOAD payload on the Child SA pair, the initiator includes the USE_AGGFRAG notification in an SA payload requesting a new Child SA (either during the initial IKE_AUTH or during CREATE_CHILD_SA exchanges). If the request is accepted, then the response MUST also include a notification of type USE_AGGFRAG. If the responder declines the request, the Child SA will be established without AGGFRAG_PAYLOAD payload use enabled. If this is unacceptable to the initiator, the initiator MUST delete the Child SA.¶
As the use of the AGGFRAG_PAYLOAD payload is currently only defined for non-transport-mode tunnels, the USE_AGGFRAG notification MUST NOT be combined with the USE_TRANSPORT_MODE notification.¶
The USE_AGGFRAG notification contains a 1-octet payload of flags that specify requirements from the sender of the notification. If any requirement flags are not understood or cannot be supported by the receiver, then the receiver SHOULD NOT enable use of AGGFRAG_PAYLOAD (either by not responding with the USE_AGGFRAG notification or, in the case of the initiator, by deleting the Child SA if the now-established SA that does not use AGGFRAG_PAYLOAD is unacceptable).¶
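The negotiation rules above can be sketched as two small decision functions. This is a hypothetical sketch rather than an IKEv2 implementation: the function names, the locally supported flag set, and the flag values are assumptions. The values assume the usual network bit numbering of the flag octet defined in Section 6.1.4 (bit 0 most significant), which places C at 0x02 and D at 0x01. Treating a request combined with USE_TRANSPORT_MODE as declined is also an assumption of this sketch.

```python
# Hypothetical sketch of USE_AGGFRAG negotiation decisions.
# Assumption: bit 0 of the flag octet is the most significant bit,
# so the C and D flags from Section 6.1.4 map to these values.
FLAG_C = 0x02
FLAG_D = 0x01


def responder_accepts(requested_flags: int, supported_flags: int,
                      transport_mode_requested: bool) -> bool:
    """Decide whether a responder enables AGGFRAG_PAYLOAD on a Child SA.

    Returns False (decline, i.e., omit USE_AGGFRAG from the response)
    if the request is combined with transport mode (an assumption) or
    carries any requirement flag not locally understood and supported.
    """
    if not 0 <= requested_flags <= 0xFF:
        raise ValueError("flags must fit in one octet")
    if transport_mode_requested:
        return False
    unsupported = requested_flags & ~supported_flags & 0xFF
    return unsupported == 0


def initiator_outcome(response_has_use_aggfrag: bool,
                      aggfrag_required: bool) -> str:
    """Return what the initiator does once the Child SA is established."""
    if response_has_use_aggfrag:
        return "use-aggfrag"
    # Responder declined: the SA exists without AGGFRAG_PAYLOAD.
    return "delete-child-sa" if aggfrag_required else "use-plain-esp"
```

Checking unsupported bits with a mask, instead of listing known flags, means any flag defined by a later specification is automatically treated as "not understood".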
The notification type and payload flag values are defined in Section 6.1.4.¶
The packet and data formats defined below are generic, with the intent of allowing for non-IP-TFS uses, but such uses are outside the scope of this document.¶
ESP Next Header value: 144¶
An AGGFRAG payload is identified by the ESP Next Header value AGGFRAG_PAYLOAD (144), which has been reserved in the IP protocol numbers space. The first octet of the payload indicates the format of the remaining payload data.¶
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-
|   Sub-type    | ...
+-+-+-+-+-+-+-+-+-+-+-
This document defines 2 payload sub-types. These payload formats are defined in the following sections.¶
The non-congestion-control AGGFRAG_PAYLOAD payload consists of a 4-octet header, followed by a variable amount of DataBlocks data, as shown below.¶
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sub-Type (0)  |   Reserved    |          BlockOffset          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       DataBlocks ...
+-+-+-+-+-+-+-+-+-+-+-
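As an illustration of the layout above, the following sketch (using Python's standard struct module; the function and type names are hypothetical) reads the Sub-Type octet and the 16-bit BlockOffset, which occupies the same position in both payload formats of this section, and returns the DataBlocks data that follows the header. The header lengths of 4 and 24 octets are those given for the two formats.

```python
import struct
from typing import NamedTuple

AGGFRAG_PAYLOAD = 144           # ESP Next Header value
HEADER_LEN = {0: 4, 1: 24}      # Sub-Type -> AGGFRAG header length in octets


class AggfragHeader(NamedTuple):
    sub_type: int
    block_offset: int
    data_blocks: bytes


def parse_aggfrag(next_header: int, payload: bytes) -> AggfragHeader:
    """Split a decrypted ESP payload into AGGFRAG header fields and DataBlocks."""
    if next_header != AGGFRAG_PAYLOAD:
        raise ValueError("not an AGGFRAG_PAYLOAD payload")
    if not payload:
        raise ValueError("empty payload")
    sub_type = payload[0]
    header_len = HEADER_LEN.get(sub_type)
    if header_len is None:
        raise ValueError(f"unknown sub-type {sub_type}")
    if len(payload) < header_len:
        raise ValueError("truncated AGGFRAG header")
    # BlockOffset occupies octets 2-3, in network byte order, in both formats.
    (block_offset,) = struct.unpack_from("!H", payload, 2)
    return AggfragHeader(sub_type, block_offset, payload[header_len:])
```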
BlockOffset: A 16-bit unsigned integer counting the number of octets of DataBlocks data before the start of a new data block. If the start of a new data block occurs in a subsequent payload, the BlockOffset will point past the end of the DataBlocks data. In this case, all the DataBlocks data belongs to the current data block being assembled. When the BlockOffset extends into subsequent payloads, it continues to only count DataBlocks data (i.e., it does not count the non-DataBlocks data of subsequent packets, such as header octets).¶
The congestion control AGGFRAG_PAYLOAD payload consists of a 24-octet header, followed by a variable amount of DataBlocks data, as shown below.¶
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sub-type (1)  |  Reserved |P|E|          BlockOffset          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         LossEventRate                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   RTT                     |    Echo Delay ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
...    Echo Delay     |             Transmit Delay              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             TVal                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                             TEcho                             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       DataBlocks ...
+-+-+-+-+-+-+-+-+-+-+-
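The three delay fields in the second and third 32-bit words share 64 bits (22 + 21 + 21). The following minimal packing sketch (the function name is hypothetical) clamps each delay to its all-ones maximum, as the field definitions require. It assumes the delay values are already expressed in the units the fields carry, and it sends the Reserved bits as zero.

```python
import struct

RTT_MAX = 0x3FFFFF    # RTT is a 22-bit field
DELAY_MAX = 0x1FFFFF  # Echo Delay and Transmit Delay are 21-bit fields


def pack_cc_header(block_offset: int, loss_event_rate: int, rtt: int,
                   echo_delay: int, transmit_delay: int, tval: int,
                   techo: int, p_flag: bool = False,
                   e_flag: bool = False) -> bytes:
    """Build the 24-octet Sub-Type 1 AGGFRAG header (hypothetical helper)."""
    if min(rtt, echo_delay, transmit_delay) < 0:
        raise ValueError("delay fields must be non-negative")
    rtt = min(rtt, RTT_MAX)
    echo_delay = min(echo_delay, DELAY_MAX)
    transmit_delay = min(transmit_delay, DELAY_MAX)
    # Second octet: 6 Reserved bits (sent as zero), then P, then E.
    flags = (int(p_flag) << 1) | int(e_flag)
    # RTT in the top 22 bits, Echo Delay in the next 21, Transmit Delay in the low 21.
    delays = (rtt << 42) | (echo_delay << 21) | transmit_delay
    # B B H I Q I I = 1 + 1 + 2 + 4 + 8 + 4 + 4 = 24 octets, network byte order.
    return struct.pack("!BBHIQII", 1, flags, block_offset, loss_event_rate,
                       delays, tval, techo)
```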
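LossEventRate: The inverse of the current loss event rate as calculated by the receiver; that is, the loss event rate is 1/LossEventRate.¶
RTT: The sender's current round-trip time estimate (22 bits). If the value is equal to or larger than 0x3FFFFF, the value MUST be set to 0x3FFFFF.¶
Echo Delay: The delay (21 bits) incurred between the receiver receiving the TVal value and sending it back in TEcho. If the delay is equal to or larger than 0x1FFFFF, the value MUST be set to 0x1FFFFF.¶
Transmit Delay (21 bits): If the value is equal to or larger than 0x1FFFFF, the value MUST be set to 0x1FFFFF.¶
TVal: A value that will be echoed back by the receiver in the TEcho field, along with an Echo Delay value of how long that echo took.¶
TEcho: The value from a received packet's TVal field. The received TVal is placed in TEcho, along with an Echo Delay value indicating how long it has been since receiving the TVal value.¶
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Type  | IPv4, IPv6, or pad...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-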
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  0x4  |  IHL  | TypeOfService |          TotalLength          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Rest of the inner packet ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
These values are the actual values within the encapsulated IPv4 header. In other words, the start of this data block is the start of the encapsulated IP packet.¶
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  0x6  | TrafficClass  |               FlowLabel               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         PayloadLength         | Rest of the inner packet ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-
These values are the actual values within the encapsulated IPv6 header. In other words, the start of this data block is the start of the encapsulated IP packet.¶
                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  0x0  | Padding ...
+-+-+-+-+-+-+-+-+-+-+-
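Because each data block starts with its own IP header, a receiver can find the length of a complete block from the inner header: IPv4 TotalLength counts the whole packet, while IPv6 PayloadLength excludes the 40-octet fixed header. The sketch below (hypothetical names) walks the DataBlocks data of one payload and reports a trailing partial block that would continue in the next payload. It assumes the DataBlocks data begins at a block boundary, that is, a BlockOffset of 0.

```python
import struct


def split_data_blocks(data: bytes) -> list[tuple[str, bytes, bool]]:
    """Split DataBlocks data into (kind, octets, complete) tuples.

    Assumes `data` starts at the beginning of a data block.
    """
    blocks = []
    pos = 0
    while pos < len(data):
        block_type = data[pos] >> 4          # high 4 bits of the first octet
        if block_type == 0x0:                # pad: fills the rest of the payload
            blocks.append(("pad", data[pos:], True))
            break
        if block_type == 0x4:
            kind, field_end = "ipv4", 4      # TotalLength in octets 2-3
        elif block_type == 0x6:
            kind, field_end = "ipv6", 6      # PayloadLength in octets 4-5
        else:
            raise ValueError(f"unknown data block type {block_type:#x}")
        if len(data) - pos < field_end:      # the length field itself is split
            blocks.append((kind, data[pos:], False))
            break
        if kind == "ipv4":
            (length,) = struct.unpack_from("!H", data, pos + 2)
        else:
            (payload_len,) = struct.unpack_from("!H", data, pos + 4)
            length = 40 + payload_len        # add the IPv6 fixed header
        if length < field_end:
            raise ValueError("inner length field too small")
        block = data[pos:pos + length]
        blocks.append((kind, block, len(block) == length))
        pos += length
    return blocks
```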
As discussed in Section 5.1, a notification message USE_AGGFRAG is used to negotiate use of the ESP AGGFRAG_PAYLOAD Next Header value.¶
The USE_AGGFRAG Notification Message Status Type is 16442.¶
The notification payload contains 1 octet of requirement flags. There are currently 2 requirement flags defined. This may be revised by later specifications.¶
+-+-+-+-+-+-+-+-+
|0|0|0|0|0|0|C|D|
+-+-+-+-+-+-+-+-+
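D: If set, indicates that the sender of the notification does not support receiving packet fragments (i.e., inner packets MUST be sent using a single Data Block). This value only applies to what the sender is capable of receiving; the sender MAY still send packet fragments unless similarly restricted by the receiver in its USE_AGGFRAG notification.¶
IANA has allocated IP protocol number 144 (AGGFRAG_PAYLOAD) from the "Protocol Numbers - Assigned Internet Protocol Numbers" registry.¶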
IANA has created a registry called "AGGFRAG_PAYLOAD Sub-Types" under a new category named "ESP AGGFRAG_PAYLOAD". The registration policy for this registry is "Expert Review" [RFC8126] [RFC7120].¶
The initial content of this registry is as follows:¶
Sub-Type | Name | Reference |
---|---|---|
0 | Non-Congestion-Control Format | RFC 9347 |
1 | Congestion Control Format | RFC 9347 |
2-255 | Reserved | |
IANA has allocated a status type USE_AGGFRAG from the "IKEv2 Notify Message Types - Status Types" registry.¶
This document describes an aggregation and fragmentation mechanism to efficiently implement TFC for IP traffic. This approach is expected to reduce the efficacy of traffic analysis on IPsec communication. Other than the additional security afforded by using this mechanism, IP-TFS utilizes the security protocols [RFC4303] and [RFC7296], and so their security considerations apply to IP-TFS as well.¶
As noted in Section 3.1, the ECN bits are not protected by IPsec and thus may constitute a covert channel. For this reason, ECN use SHOULD NOT be enabled by default.¶
As noted previously in Section 2.4.2, for TFC to be maintained, the encapsulated traffic flow should not affect network congestion in a predictable way; if it would, then use of the non-congestion-controlled mode should be considered instead.¶
Below, an example inner IP packet flow within the encapsulating tunnel packet stream is shown. Notice how encapsulated IP packets can start and end anywhere, and more than one or less than one may occur in a single encapsulating packet.¶
Offset: 0       Offset: 100     Offset: 2000    Offset: 600
[ ESP1  (1404) ][ ESP2  (1404) ][ ESP3  (1404) ][ ESP4  (1404) ]
[--750--][--750--][60][-240-][--3000----------------------][pad]
Each outer encapsulating ESP payload is a fixed size of 1404 octets, the first 4 octets of which contain the AGGFRAG header, leaving 1400 octets for DataBlocks data. The encapsulated IP packet flow (lengths include the IP header and payload) is as follows: a 750-octet packet, a 750-octet packet, a 60-octet packet, a 240-octet packet, and a 3000-octet packet.¶
The BlockOffset values in the 4 AGGFRAG payload headers for this packet flow would thus be 0, 100, 2000, and 600, respectively. The first encapsulating packet (ESP1) has a zero BlockOffset, which points at the IP data block immediately following the AGGFRAG header. The following packet's (ESP2) BlockOffset points inward 100 octets to the start of the 60-octet data block. The third encapsulating packet (ESP3) contains only the middle portion of the 3000-octet data block, so its offset points past its end and into the fourth encapsulating packet: 1400 octets of ESP3 plus the first 600 octets of ESP4 give 2000. The fourth packet's (ESP4) offset is 600, pointing at the padding that follows the completion of the continued 3000-octet packet.¶
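The offsets in this example can be reproduced by treating the inner packets as a single stream of DataBlocks octets. In the sketch below (hypothetical names), each payload carries `capacity` octets of DataBlocks data (1404 - 4 = 1400 here). A payload's BlockOffset is the distance from the payload's first DataBlocks octet to the next data block start at or after it. The padding that completes the last payload is treated as the data block that follows the final inner packet.

```python
from bisect import bisect_left
from itertools import accumulate


def block_offsets(block_sizes: list[int], capacity: int,
                  num_payloads: int) -> list[int]:
    """Compute the BlockOffset for each AGGFRAG payload in a packet stream."""
    # Start position of every data block; the final entry is where the
    # trailing padding begins.
    starts = [0, *accumulate(block_sizes)]
    if starts[-1] > capacity * num_payloads:
        raise ValueError("blocks do not fit in the given payloads")
    offsets = []
    for k in range(num_payloads):
        payload_start = k * capacity
        i = bisect_left(starts, payload_start)   # first start >= payload_start
        if i == len(starts):
            raise ValueError("payload contains no data block start")
        offsets.append(starts[i] - payload_start)
    return offsets


print(block_offsets([750, 750, 60, 240, 3000], capacity=1400, num_payloads=4))
```

The call prints `[0, 100, 2000, 600]`, matching the values above.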
The current best practice indicates that congestion control SHOULD be done in a TCP-friendly way. A TCP-friendly congestion control algorithm is described in [RFC5348]. For this IP-TFS use case (as with [RFC4342]), the (fixed) packet size is used as the segment size for the algorithm. The main formula in the algorithm for the send rate is then as follows:¶
$$X = \frac{1}{R \left( \sqrt{\frac{2p}{3}} + 12 \sqrt{\frac{3p}{8}} \, p \, (1 + 32p^2) \right)}$$
Here, X is the send rate in packets per second, R is the RTT estimate, and p is the loss event rate (the inverse of which is provided by the receiver).¶
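A direct transcription of the formula above follows, assuming R is given in seconds so that X comes out in packets per second. The helper that turns the receiver's LossEventRate field into p simply inverts it, as the text describes. Both function names are hypothetical.

```python
import math


def loss_event_rate_to_p(loss_event_rate_field: int) -> float:
    """p is the inverse of the LossEventRate value sent by the receiver."""
    if loss_event_rate_field <= 0:
        raise ValueError("expected a positive LossEventRate value")
    return 1.0 / loss_event_rate_field


def tfrc_send_rate(rtt_s: float, p: float) -> float:
    """Send rate X in packets per second for RTT R (seconds) and loss event rate p.

    Uses the form shown above, with one fixed-size packet per segment.
    """
    if rtt_s <= 0 or not 0 < p <= 1:
        raise ValueError("need rtt_s > 0 and 0 < p <= 1")
    denom = rtt_s * (math.sqrt(2 * p / 3)
                     + 12 * math.sqrt(3 * p / 8) * p * (1 + 32 * p ** 2))
    return 1.0 / denom


# Example: 100 ms RTT and a LossEventRate field of 100 (p = 0.01).
x = tfrc_send_rate(0.1, loss_event_rate_to_p(100))
```

With these inputs, X is roughly 112 packets per second. Multiplying X by the fixed outer packet size gives the corresponding octet rate.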
In addition, the algorithm in [RFC5348] also uses an X_recv value (the receiver's receive rate). For IP-TFS, one MAY set this value according to the sender's current tunnel send rate (X).¶
The IP-TFS receiver, having the RTT estimate from the sender, can use the same method as described in [RFC5348] and [RFC4342] to collect the loss intervals and calculate the loss event rate value using the weighted average as indicated. The receiver communicates the inverse of this value back to the sender in the AGGFRAG_PAYLOAD payload header field LossEventRate.¶
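The weighted average that [RFC5348] (Section 5.4) applies to the most recent loss intervals can be sketched as follows. The sketch uses that specification's default of n = 8 intervals and its weights: 1 for the four most recent, then 0.8, 0.6, 0.4, and 0.2. Because p is the inverse of the mean loss interval, the mean interval is the quantity the receiver would report in LossEventRate. The function name, and the rounding and clamping to a 32-bit value, are assumptions of this sketch.

```python
def loss_event_rate_field(intervals: list[int], n: int = 8) -> int:
    """Weighted average loss interval (RFC 5348, Section 5.4), as a 32-bit value.

    intervals[0] is the most recent (still open) interval I_0, followed by
    I_1 ... I_n; exactly n + 1 intervals are required.
    """
    if len(intervals) != n + 1:
        raise ValueError(f"need {n + 1} loss intervals")
    weights = [1.0 if i < n / 2 else 2 * (n - i) / (n + 2) for i in range(n)]
    w_tot = sum(weights)
    # I_tot0 covers I_0 .. I_{n-1}; I_tot1 covers I_1 .. I_n.
    i_tot0 = sum(intervals[i] * weights[i] for i in range(n))
    i_tot1 = sum(intervals[i] * weights[i - 1] for i in range(1, n + 1))
    i_mean = max(i_tot0, i_tot1) / w_tot     # p = 1 / i_mean
    return min(round(i_mean), 0xFFFFFFFF)
```

Taking the maximum of the two sums lets a long open interval I_0 raise the reported value (lowering p) without letting a short one lower it.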
The IP-TFS sender now has both the R and p values and can calculate the correct sending rate. If following [RFC5348], the sender should also use the slow start mechanism described therein when the IP-TFS SA is first established.¶
To compare overhead, the ESP overhead for both normal and AGGFRAG tunnel packets must be calculated, and so an algorithm for encryption and authentication must be chosen. For the data below, AES-GCM-256 was selected. This leads to an IP+ESP overhead of 54 octets.¶
54 = 20 (IP) + 8 (ESPH) + 2 (ESPF) + 8 (IV) + 16 (ICV)¶
Additionally, for IP-TFS, non-congestion-control AGGFRAG_PAYLOAD headers were chosen, which add 4 octets, for a total overhead of 58.¶
Each outer AGGFRAG tunnel packet therefore carries 58 octets of overhead. The octet overhead attributable to one inner packet is 58 divided by the number of inner packets of that size that fit in one outer payload (fractions allowed). The overhead as a percentage of inner packet size is therefore a constant based on the outer MTU size.¶
$$OH = \frac{58}{\text{Outer Payload Size} / \text{Inner Packet Size}}$$
$$\text{OH \% of Inner Packet Size} = \frac{100 \cdot OH}{\text{Inner Packet Size}} = \frac{5800}{\text{Outer Payload Size}}$$
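The formula can be checked against the table that follows with a short sketch (hypothetical names). It assumes the 58-octet per-packet overhead derived above, so the outer payload size is the MTU minus 58 (the PSize row).

```python
AGGFRAG_OVERHEAD = 58  # IP + ESP (AES-GCM-256) + 4-octet AGGFRAG header


def iptfs_overhead(mtu: int, inner_packet_size: int) -> tuple[float, float]:
    """Return (octets of overhead per inner packet, overhead as % of inner size)."""
    outer_payload_size = mtu - AGGFRAG_OVERHEAD
    oh = AGGFRAG_OVERHEAD / (outer_payload_size / inner_packet_size)
    return oh, 100 * oh / inner_packet_size


for mtu in (576, 1500, 9000):
    print(mtu, f"{iptfs_overhead(mtu, 40)[1]:.2f}%")
```

This prints 11.20%, 4.02%, and 0.65% for the three MTUs; the same percentages result for any inner packet size.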
Type | IP-TFS | IP-TFS | IP-TFS |
---|---|---|---|
MTU | 576 | 1500 | 9000 |
PSize | 518 | 1442 | 8942 |
40 | 11.20% | 4.02% | 0.65% |
576 | 11.20% | 4.02% | 0.65% |
1500 | 11.20% | 4.02% | 0.65% |
9000 | 11.20% | 4.02% | 0.65% |
The overhead per inner packet for constant-send-rate-padded ESP (i.e., original IPsec TFC) is 36 octets plus any padding, unless fragmentation is required.¶
When fragmentation of the inner packet is required to fit in the outer IPsec packet, the overhead is the number of outer packets required to carry the fragmented inner packet times the sum of the inner IP Overhead (20) and the outer packet overhead (54), minus the initial inner IP Overhead, plus any required tail padding in the last encapsulation packet. The required tail padding is the number of required packets times the difference of the Outer Payload Size and the IP Overhead, minus the Inner Payload Size. So:¶
$$
\begin{aligned}
\text{Inner Payload Size} &= \text{IP Packet Size} - \text{IP Overhead} \\
\text{Outer Payload Size} &= \text{MTU} - \text{IPsec Overhead} \\
NF_0 &= \frac{\text{Inner Payload Size}}{\text{Outer Payload Size} - \text{IP Overhead}} \\
NF &= \lceil NF_0 \rceil \\
OH &= NF \cdot (\text{IP Overhead} + \text{IPsec Overhead}) - \text{IP Overhead} \\
   &\quad + NF \cdot (\text{Outer Payload Size} - \text{IP Overhead}) - \text{Inner Payload Size} \\
OH &= NF \cdot (\text{IPsec Overhead} + \text{Outer Payload Size}) - (\text{IP Overhead} + \text{Inner Payload Size}) \\
OH &= NF \cdot (\text{IPsec Overhead} + \text{Outer Payload Size}) - \text{Inner Packet Size}
\end{aligned}
$$
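The final form of the equation can be coded directly. The sketch below (hypothetical names) uses the 20-octet IP Overhead and 54-octet IPsec Overhead from above, with integer ceiling division for NF. It is meant for the fragmentation case the formula describes, and it reproduces the fragmented entries in the ESP+Pad columns of the table that follows, such as 286 octets for a 1442-octet packet with a 576-octet MTU.

```python
IP_OVERHEAD = 20
IPSEC_OVERHEAD = 54


def esp_pad_fragmented_overhead(mtu: int, inner_packet_size: int) -> int:
    """Overhead in octets when an inner packet is fragmented over padded ESP."""
    inner_payload_size = inner_packet_size - IP_OVERHEAD
    outer_payload_size = mtu - IPSEC_OVERHEAD
    per_fragment = outer_payload_size - IP_OVERHEAD   # room for inner payload
    nf = -(-inner_payload_size // per_fragment)       # CEILING(NF0)
    return nf * (IPSEC_OVERHEAD + outer_payload_size) - inner_packet_size
```

Since IPsec Overhead plus Outer Payload Size equals the MTU, the result is simply NF times the MTU minus the inner packet size.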
The following tables collect the overhead values for some common L3 MTU sizes in order to compare them. The first table gives the number of octets of overhead for each inner packet size (rows) and L3 MTU (columns). The second table gives the same overhead as a percentage of the inner packet size.¶
Type | ESP+Pad | ESP+Pad | ESP+Pad | IP-TFS | IP-TFS | IP-TFS |
---|---|---|---|---|---|---|
L3 MTU | 576 | 1500 | 9000 | 576 | 1500 | 9000 |
PSize | 522 | 1446 | 8946 | 518 | 1442 | 8942 |
40 | 482 | 1406 | 8906 | 4.5 | 1.6 | 0.3 |
128 | 394 | 1318 | 8818 | 14.3 | 5.1 | 0.8 |
256 | 266 | 1190 | 8690 | 28.7 | 10.3 | 1.7 |
518 | 4 | 928 | 8428 | 58.0 | 20.8 | 3.4 |
576 | 576 | 870 | 8370 | 64.5 | 23.2 | 3.7 |
1442 | 286 | 4 | 7504 | 161.5 | 58.0 | 9.4 |
1500 | 228 | 1500 | 7446 | 168.0 | 60.3 | 9.7 |
8942 | 1426 | 1558 | 4 | 1001.2 | 359.7 | 58.0 |
9000 | 1368 | 1500 | 9000 | 1007.7 | 362.0 | 58.4 |
Type | ESP+Pad | ESP+Pad | ESP+Pad | IP-TFS | IP-TFS | IP-TFS |
---|---|---|---|---|---|---|
MTU | 576 | 1500 | 9000 | 576 | 1500 | 9000 |
PSize | 522 | 1446 | 8946 | 518 | 1442 | 8942 |
40 | 1205.0% | 3515.0% | 22265.0% | 11.20% | 4.02% | 0.65% |
128 | 307.8% | 1029.7% | 6889.1% | 11.20% | 4.02% | 0.65% |
256 | 103.9% | 464.8% | 3394.5% | 11.20% | 4.02% | 0.65% |
518 | 0.8% | 179.2% | 1627.0% | 11.20% | 4.02% | 0.65% |
576 | 100.0% | 151.0% | 1453.1% | 11.20% | 4.02% | 0.65% |
1442 | 19.8% | 0.3% | 520.4% | 11.20% | 4.02% | 0.65% |
1500 | 15.2% | 100.0% | 496.4% | 11.20% | 4.02% | 0.65% |
8942 | 15.9% | 17.4% | 0.0% | 11.20% | 4.02% | 0.65% |
9000 | 15.2% | 16.7% | 100.0% | 11.20% | 4.02% | 0.65% |
Another way to compare the two solutions is to look at the amount of available bandwidth each solution provides. The following sections consider and compare the percentage of available bandwidth. For the sake of providing a well-understood baseline, normal (unencrypted) Ethernet and normal ESP values are included.¶
In order to calculate the available bandwidth, the per-packet overhead is calculated first. The total overhead of Ethernet is 14+4 octets of header and Cyclic Redundancy Check (CRC) plus an additional 20 octets of framing (preamble, start, and inter-packet gap), for a total of 38 octets. Additionally, the minimum payload is 46 octets.¶
Size | E + P | E + P | E + P | IP-TFS | IP-TFS | IP-TFS | Enet | ESP |
---|---|---|---|---|---|---|---|---|
MTU | 590 | 1514 | 9014 | 590 | 1514 | 9014 | any | any |
OH | 92 | 92 | 92 | 96 | 96 | 96 | 38 | 74 |
40 | 614 | 1538 | 9038 | 47 | 42 | 40 | 84 | 114 |
128 | 614 | 1538 | 9038 | 151 | 136 | 129 | 166 | 202 |
256 | 614 | 1538 | 9038 | 303 | 273 | 258 | 294 | 330 |
518 | 614 | 1538 | 9038 | 614 | 552 | 523 | 556 | 592 |
576 | 1228 | 1538 | 9038 | 682 | 614 | 582 | 614 | 650 |
1442 | 1842 | 1538 | 9038 | 1709 | 1538 | 1457 | 1480 | 1516 |
1500 | 1842 | 3076 | 9038 | 1777 | 1599 | 1516 | 1538 | 1574 |
8942 | 11052 | 10766 | 9038 | 10599 | 9537 | 9038 | 8980 | 9016 |
9000 | 11052 | 10766 | 18076 | 10667 | 9599 | 9096 | 9038 | 9074 |
Size | E + P | E + P | E + P | IP-TFS | IP-TFS | IP-TFS | Enet | ESP |
---|---|---|---|---|---|---|---|---|
MTU | 590 | 1514 | 9014 | 590 | 1514 | 9014 | any | any |
OH | 92 | 92 | 92 | 96 | 96 | 96 | 38 | 74 |
40 | 2.0M | 0.8M | 0.1M | 26.4M | 29.3M | 30.9M | 14.9M | 11.0M |
128 | 2.0M | 0.8M | 0.1M | 8.2M | 9.2M | 9.7M | 7.5M | 6.2M |
256 | 2.0M | 0.8M | 0.1M | 4.1M | 4.6M | 4.8M | 4.3M | 3.8M |
518 | 2.0M | 0.8M | 0.1M | 2.0M | 2.3M | 2.4M | 2.2M | 2.1M |
576 | 1.0M | 0.8M | 0.1M | 1.8M | 2.0M | 2.1M | 2.0M | 1.9M |
1442 | 678K | 812K | 138K | 731K | 812K | 857K | 844K | 824K |
1500 | 678K | 406K | 138K | 703K | 781K | 824K | 812K | 794K |
8942 | 113K | 116K | 138K | 117K | 131K | 138K | 139K | 138K |
9000 | 113K | 116K | 69K | 117K | 130K | 137K | 138K | 137K |
Size | E + P | E + P | E + P | IP-TFS | IP-TFS | IP-TFS | Enet | ESP |
---|---|---|---|---|---|---|---|---|
MTU | 590 | 1514 | 9014 | 590 | 1514 | 9014 | any | any |
OH | 92 | 92 | 92 | 96 | 96 | 96 | 38 | 74 |
40 | 6.51% | 2.60% | 0.44% | 84.36% | 93.76% | 98.94% | 47.62% | 35.09% |
128 | 20.85% | 8.32% | 1.42% | 84.36% | 93.76% | 98.94% | 77.11% | 63.37% |
256 | 41.69% | 16.64% | 2.83% | 84.36% | 93.76% | 98.94% | 87.07% | 77.58% |
518 | 84.36% | 33.68% | 5.73% | 84.36% | 93.76% | 98.94% | 93.17% | 87.50% |
576 | 46.91% | 37.45% | 6.37% | 84.36% | 93.76% | 98.94% | 93.81% | 88.62% |
1442 | 78.28% | 93.76% | 15.95% | 84.36% | 93.76% | 98.94% | 97.43% | 95.12% |
1500 | 81.43% | 48.76% | 16.60% | 84.36% | 93.76% | 98.94% | 97.53% | 95.30% |
8942 | 80.91% | 83.06% | 98.94% | 84.36% | 93.76% | 98.94% | 99.58% | 99.18% |
9000 | 81.43% | 83.60% | 49.79% | 84.36% | 93.76% | 98.94% | 99.58% | 99.18% |
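The three tables above can be reproduced with a short sketch (hypothetical names). It assumes a 10 Gbit/s link, which matches the packet rates in the second table, and it uses the 38 octets of Ethernet overhead and the 46-octet minimum payload given earlier. The first table appears to truncate the IP-TFS octet counts to whole octets.

```python
ETH_OVERHEAD = 38          # 14 header + 4 CRC + 20 preamble/start/gap
ETH_MIN_PAYLOAD = 46
AGGFRAG_OVERHEAD = 58      # per outer packet, from the overhead section
LINK_BITS_PER_SEC = 10e9   # assumption: 10G Ethernet


def enet_wire_octets(size: int) -> int:
    """Octets on the wire for one packet sent directly over Ethernet."""
    return max(size, ETH_MIN_PAYLOAD) + ETH_OVERHEAD


def iptfs_wire_octets(mtu: int, size: int) -> float:
    """Share of on-wire octets used by one inner packet in an IP-TFS tunnel."""
    return size * (mtu + ETH_OVERHEAD) / (mtu - AGGFRAG_OVERHEAD)


def packets_per_second(wire_octets: float) -> float:
    return LINK_BITS_PER_SEC / 8 / wire_octets


def efficiency_pct(size: int, wire_octets: float) -> float:
    """Percentage of link capacity carrying inner packet octets."""
    return 100 * size / wire_octets


print(round(efficiency_pct(40, enet_wire_octets(40)), 2))
print(round(efficiency_pct(40, iptfs_wire_octets(1500, 40)), 2))
```

These print 47.62 and 93.76. The IP-TFS efficiency reduces to (MTU - 58) / (MTU + 38), which is why its columns are constant.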
A sometimes unexpected result of using an AGGFRAG tunnel (or any packet-aggregating tunnel) is that, for small- to medium-sized packets, the available bandwidth is actually greater than that of plain Ethernet. This is due to the reduction in Ethernet framing overhead. This increased bandwidth is paid for with an increase in latency. This latency is the time to send the unrelated octets in the outer tunnel frame. The following table illustrates the latency for some common values on a 10G Ethernet link. The table also includes latency introduced by padding if using ESP with padding.¶
Size | ESP+Pad | ESP+Pad | IP-TFS | IP-TFS |
---|---|---|---|---|
MTU | 1500 | 9000 | 1500 | 9000 |
40 | 1.12 us | 7.12 us | 1.17 us | 7.17 us |
128 | 1.05 us | 7.05 us | 1.10 us | 7.10 us |
256 | 0.95 us | 6.95 us | 1.00 us | 7.00 us |
518 | 0.74 us | 6.74 us | 0.79 us | 6.79 us |
576 | 0.70 us | 6.70 us | 0.74 us | 6.74 us |
1442 | 0.00 us | 6.00 us | 0.05 us | 6.05 us |
1500 | 1.20 us | 5.96 us | 0.00 us | 6.00 us |
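The latency values follow from the link rate alone: on a 10 Gbit/s link, each octet takes 0.8 ns. In this sketch (hypothetical names), the IP-TFS figure counts the MTU octets that do not belong to the inner packet, and the ESP+Pad figure counts only the padding after an unfragmented inner packet, which matches the table values.

```python
LINK_BITS_PER_US = 10_000   # 10 Gbit/s expressed in bits per microsecond
IPSEC_OVERHEAD = 54


def iptfs_latency_us(mtu: int, inner_packet_size: int) -> float:
    """Time to send the unrelated octets sharing the outer IP-TFS packet."""
    return (mtu - inner_packet_size) * 8 / LINK_BITS_PER_US


def esp_pad_latency_us(mtu: int, inner_packet_size: int) -> float:
    """Time to send the padding after an unfragmented inner packet."""
    padding = (mtu - IPSEC_OVERHEAD) - inner_packet_size
    if padding < 0:
        raise ValueError("inner packet needs fragmentation")
    return padding * 8 / LINK_BITS_PER_US


print(f"{iptfs_latency_us(1500, 40):.2f} us, {esp_pad_latency_us(9000, 40):.2f} us")
```

This prints "1.17 us, 7.12 us", the values in the first row of the table.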
Notice that the latency values are very similar between the two solutions; however, whereas IP-TFS provides for constant high bandwidth, in some cases even exceeding plain Ethernet, ESP with padding often greatly reduces available bandwidth.¶
We would like to thank Don Fedyk for help in reviewing and editing this work. We would also like to thank Michael Richardson, Sean Turner, Valery Smyslov, and Tero Kivinen for reviews and many suggestions for improvements, as well as Joseph Touch for the transport area review and suggested improvements.¶
The following person made significant contributions to this document.¶