Movatterモバイル変換


[0]ホーム

URL:


RFC 9616Babel RTT ExtensionSeptember 2024
Jonglez & ChroboczekStandards Track[Page]
Stream:
Internet Engineering Task Force (IETF)
RFC:
9616
Category:
Standards Track
Published:
ISSN:
2070-1721
Authors:
B. Jonglez
ENS Lyon
J. Chroboczek
IRIF, Université Paris Cité

RFC 9616

Delay-Based Metric Extension for the Babel Routing Protocol

Abstract

This document defines an extension to the Babel routing protocol thatmeasures the round-trip time (RTT) between routers and makes it possibleto prefer lower-latency links over higher-latency ones.

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained athttps://www.rfc-editor.org/info/rfc9616.

Copyright Notice

Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.

Table of Contents

1.Introduction

The Babel routing protocol[RFC8966] does not mandatea specific algorithm for computing metrics; existing implementations usea packet-loss-based metric on wireless links and a simple hop-count metricon all other types of links. While this strategy works reasonably well inmany networks, it fails to select reasonable routes in some topologiesinvolving tunnels or VPNs.

                   +------------+                   | A (Paris)  +---------------+                   +------------+                \                  /                               \                 /                                 \                /                                   \  +------------+                                     +------------+  | B  (Paris) |                                     | C  (Tokyo) |  +------------+                                     +------------+                \                                   /                 \                                 /                  \                               /                   +------------+                /                   | D (Paris)  +---------------+                   +------------+
Figure 1:Four Routers in a Diamond Topology

For example, consider the topology described inFigure 1,with three routers A, B, and D located in Paris and a fourth routerC located in Tokyo, connected through tunnels in a diamond topology. Whenrouting traffic from A to D, it is obviously preferable to use the localroute through B as this is likely to provide better service quality andlower monetary cost than the distant route through C. However, theexisting implementations of Babel consider both routes as having the samemetric; therefore, they will route the traffic through C in roughly half thecases.

In the first part of this document (Section 3),we specify an extension to the Babel routing protocol that producesa sequence of accurate measurements of the round-trip time (RTT) betweentwo Babel neighbours. These measurements are not directly usable as aninput to Babel's route selection procedure since they tend to be noisyand to cause a negative feedback loop, which might give rise to frequentoscillations. In the second part (Section 4), wedefine an algorithm that maps the sequence of RTT samples to a link costthat can be used for route selection.

1.1.Applicability

The extension defined inSection 3 providesa sequence of accurate but potentially noisy RTT samples. Since theRTT is a symmetric measure of delay, this protocol is onlyapplicable in environments where the symmetric delay is a good predictorof whether a link should be taken by routing traffic, which might notnecessarily be the case in networks built over exotic link technologies.

The extension makes minimal requirements on the nodes. In particular,it does not assume synchronised clocks, and only requires that clock driftbe negligible during the time interval between two Hello TLVs. Since thatis on the order of a few seconds, this requirement is met even with cheapcrystal oscillators, such as the ones used in consumer electronics.

The algorithm defined inSection 4 depends ona number of assumptions about the network. The assumption with the mostsevere consequences is that all links below a certain RTT (rtt-min inSection 4.2) can be grouped in a single category of"good" links. While this is the case in wide-area overlay networks, itmakes the algorithm inapplicable in networks where distinguishing betweenlow-latency links is important.

There are other assumptions, but they are less likely to limit thealgorithm's applicability. The algorithm assumes that all links abovea certain RTT (rtt-max inSection 4.2) are equally bad, and they will only be used as a last resort. Inaddition, in order to avoid oscillations, the algorithm is designed toreact slowly to RTT variations, thus causing suboptimal routing forseconds or even minutes after an RTT change; while this is a desirableproperty in fixed networks, as it avoid excessive route oscillations, itmight be an issue with networks with high rates of node mobility.

2.Specification of Requirements

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14[RFC2119][RFC8174] when, and only when, they appear in all capitals, as shown here.

3.RTT Sampling

3.1.Data Structures

We assume that every Babel speaker maintains a local clock that countsmicroseconds from an arbitrary origin. We do not assume that clocks aresynchronised: clocks local to distinct nodes need not share a commonorigin. The protocol will eventually recover if the clock is stepped, soclocks need not persist across node reboots.

Every Babel speaker maintains a Neighbour Table, described inSection 3.2.4 of [RFC8966]. This extension extends everyentry in the Neighbour Table with the following data:

  • the Origin Timestamp, a 32-bit timestamp (modulo 232) according tothe neighbour's clock;
  • the Receive Timestamp, a 32-bit timestamp (modulo 232) according tothe local clock.

Both values are initially undefined.

3.2.Protocol Operation

The RTT to a neighbour is estimated using an algorithm due to Mills[RFC891], originally developed for the HELLO routingprotocol and later used in NTP[RFC5905].

A Babel speaker periodically sends Hello messages to its neighbours(Section 3.4.1 of [RFC8966]). Additionally, itoccasionally sends a set of IHU ("I Heard You") messages, at most one perneighbour (Section 3.4.2 of [RFC8966]).

   A          B     |      |  t1 +      |     |\     |     | \    |     |  \   |  Hello(t1)     |   \  |     |    \ |     |     \|     |      + t1'     |      |     |      |               RTT = (t2 - t1) - (t2' - t1')     |      |     |      + t2'     |     /|     |    / |     |   /  |     |  /   |  Hello(t2')     | /    |  IHU(t1, t1')     |/     |  t2 +      |     |      |     v      v
Figure 2:Mills' Algorithm

In order to enable the computation of RTTs, a node AMUST include, inevery Hello that it sends, a timestamp t1 (according to A's local clock),as illustrated inFigure 2. When a node B receives A'stimestamped Hello, it computes the time t1' at which the Hello wasreceived (according to B's local clock). It thenMUST record the value t1in the Origin Timestamp field of the Neighbour Table entry correspondingto A and the value t1' in the Receive Timestamp field of the NeighbourTable entry.

When B sends an IHU to A, it checks whether both timestamps are definedin the Neighbour Table. If that is the case, then itMUST ensure that itsIHU TLV is sent in a packet that also contains a timestamped Hello TLV(either a normally scheduled Hello or an unscheduled Hello, seeSection 3.4.1 of [RFC8966]). ItMUST include in the IHUboth the Origin Timestamp and the Receive Timestamp stored in the NeighbourTable.

Upon receiving B's packet, A computes the time t2 (according to itslocal clock) at which it was received. Node AMUST then verify that itcontains both a Hello TLV with timestamp t2' and an IHU TLV with twotimestamps t1 and t1'. If that is the case, A computes the value:

RTT = (t2 - t1) - (t2' - t1')

(where all computations are done modulo232), which is a measurement of the RTT between A and B. (A then storesthe values t2' and t2 in its Neighbour Table, as B did before.)

This algorithm has a number of desirable properties:

  1. Thealgorithm is symmetric: A and B use the same procedures for timestampingpackets and computing RTT samples, and both nodes produce one RTT samplefor each received (Hello, IHU) pair.
  2. Since there is norequirement that t1' and t2' be equal, the protocol is asynchronous: theonly change to Babel's message scheduling is the requirement that a packetcontaining an IHU also contain a Hello.
  3. Since the algorithm only evercomputes differences of timestamps according to a single clock, it doesnot require synchronised clocks.
  4. The algorithm requires very littleadditional state: a node only needs to store the two timestamps associatedwith the last hello received from each neighbour.
  5. Since the algorithm onlyrequires piggybacking one or two timestamps on each Hello and IHU TLV,it makes efficient use of network resources.

In principle, this algorithm is inaccurate in the presence of clockdrift (i.e., when A's clock and B's clock are running at different frequencies).However, t2' - t1' is usually on the order of a few seconds, andsignificant clock drift is unlikely to happen at that time scale.

In order for RTT values to be consistent between implementations,timestamps need to be computed at roughly the same point in the networkstack. Transmit timestampsSHOULD be computed just before the packet ispassed to the network stack (i.e., before it is subjected to any queueingdelays); receive timestampsSHOULD be computed just after the packetis received from the network stack.

3.3.Wrap-Around and Node Restart

Timestamp values are a count of microseconds stored as a 32-bitunsigned integer; thus, they wrap around every 71 minutes or so. What ismore, a node may occasionally reboot and restart its clock at an arbitraryorigin. For these reasons, very old timestamps or nonsensical timestampsMUST NOT be used to yield RTT samples.

The following algorithm can be used to discard obsolete samples. When a nodereceives a packet containing a Hello and an IHU, it compares the currentlocal time t2 with the Origin Timestamp contained in the IHU; if theOrigin Timestamp appears to be in the future, or if it is in the past bymore than a time T (the value T = 3 minutes is recommended), then thetimestamps are still recorded in the Neighbour Table, but they are not used forcomputation of an RTT sample.

Similarly, the node compares the Hello's timestamp with the ReceiveTimestamp recorded in the Neighbour Table; if the Hello's timestampappears to be older than the recorded timestamp, or if it appears to bemore recent by an interval larger than the value T, then the timestampsare not used for computation of an RTT sample.

3.4.Implementation Notes

The accuracy of the computed RTT samples depends on Transmit Timestampsbeing computed as late as possible before a packet containing a Hello TLVis passed to the network stack, and Receive Timestamps being computed asearly as possible after reception of a packet containing a (Hello, IHU)pair. We have found the following implementation strategy to beuseful.

When a Hello TLV is buffered for transmission, we insert a PadN sub-TLV(Section 4.7.2 of[RFC8966]) with a length of 4 octetswithin the TLV. When the packet is ready to be sent, we check whether itcontains a 4-octet PadN sub-TLV; if that's the case, we overwrite the PadNsub-TLV with a Timestamp sub-TLV with the current time, and send out thepacket.

Conversely, when a packet is received, we immediately compute thecurrent time and record it with the received packet. We then process thepacket as usual and use the recorded timestamp in order to compute an RTTsample.

The protocol is designed to survive the clock being reset when a nodereboots; on POSIX systems, this makes it possible to use theCLOCK_MONOTONIC clock for computing timestamps. If CLOCK_MONOTONIC is notavailable, CLOCK_REALTIME may be used, since the protocol is able tosurvive the clock being occasionally stepped.

4.RTT-Based Route Selection

The protocol described above yields a series of RTT samples. Whilethese samples are fairly accurate, they are not directly usable as aninput to the route selection procedure, for at least three reasons:

  1. In the presence of bursty traffic, routers experiencetransient congestion, which causes occasional spikes in the measured RTT.Thus, the RTT signal may be noisy and require smoothing before it canbe used for route selection.
  2. Using the RTT signal for route selection gives rise toa negative feedback loop. When a route has a low RTT, it is deemed to bemore desirable; this causes it to be used for more data traffic, whichmay lead to congestion, which in turn increases the RTT. Without someform of hysteresis, using RTT for route selection would lead tooscillations between parallel routes, which would lead to packetreordering and negatively affect upper-layer protocols (such as TCP).
  3. Even in the absence of congestion, the RTT tends to exhibit somevariation. If the RTTs of two parallel routes oscillate arounda common value, using the RTT as input to route selection will causefrequent routing oscillations, which, again, indicates the need for someform of hysteresis.

In this section, we describe an algorithm that integrates smoothing andhysteresis. It has been shown to behave well both in simulation andexperimentally over the Internet[DELAY-BASED] and isRECOMMENDED when RTT information is being used for route selection. Thealgorithm is structured as follows:

4.1.Smoothing

The RTT samples provided by Mills' algorithm are fairly accurate, butnoisy: experiments indicate the occasional presence of individual samplesthat are much larger than the expected value. Thus, some form ofsmoothingSHOULD be applied in order to avoid instabilities due tooccasional outliers.

An implementationMAY use the exponential average algorithm, which issimple to implement and appears to yield good results in practice[DELAY-BASED]. The algorithm is parameterised by a constant α,where 0 < α < 1, which controls the amount of smoothing beingapplied. For each neighbour, it maintains a smoothed value RTT, which isinitially undefined. When the first sample RTT0 is measured, the smoothedvalue is set to the value of RTT0. At each new sample RTTn, the smoothedvalue is set to a weighted average of the previous smoothed value and thenew sample:

    RTT := α RTT + (1 - α) RTTn

The smoothing constant αSHOULD be between 0.8 and 0.9; the value 0.836is theRECOMMENDED default.

4.2.Cost Computation

The smoothed RTT value obtained in the previous step needs to be mappedto a link cost, suitable for input to the metric computation procedure(Section 3.5.2 of [RFC8966]). Obviously, the mappingshould be monotonic (larger RTTs imply larger costs). In addition, themapping should be constant beyond a certain value (all very bad links areequally bad) so that congested links do not contribute to routinginstability. The mapping should also be constant around 0, so that smalloscillations in the RTT of low-RTT links do not contribute to routinginstability.

  cost    ^    |    |    |                           C + max-rtt-penalty    |                       +---------------------------    |                      /.    |                     / .    |                    /  .    |                   /   .    |                  /    .    |                 /     .    |                /      .    |               /       .    |              /        .    |             /         .  C +------------+          .    |            .          .    |            .          .    |            .          .    |            .          .  0 +---------------------------------------------------->    0         rtt-min    rtt-max                          RTT
Figure 3:Mapping from RTT to Link Cost

ImplementationsSHOULD use the mapping described inFigure 3, which is parameterised by three parameters:rtt-min, rtt-max, and max-rtt-penalty. For RTT values below rtt-min, thelink cost is just the nominal cost C of a single hop. Between rtt-min andrtt-max, the cost increases linearly; above rtt-max, the constant valuemax-rtt-penalty is added to the nominal cost.

The value rtt-min should be slightly larger than the RTT of a local,uncongested link. The value rtt-max should be the RTT above which a linkshould be avoided if possible, either because it is a long-distance linkor because it is congested; reducing the value of rtt-max improvesstability, but prevents the protocol from discriminating betweenhigh-latency links. As for max-rtt-penalty, it controls how much theprotocol will penalise long-distance links. The default valuesrtt-min = 10 ms, rtt-max = 120 ms, and max-rtt-penalty = 150 areRECOMMENDED.

4.3.Hysteresis

Even after applying a bounded mapping from smoothed RTT to a costvalue, the cost may fluctuate when a link's RTT is between rtt-min andrtt-max. ImplementationsSHOULD use a robust hysteresis algorithm, suchas the one described inAppendix A.3 of [RFC8966].

5.Backwards and Forwards Compatibility

This protocol extension stores the data that it requires withinsub-TLVs of Babel's Hello and IHU TLVs. As discussed inAppendix D of [RFC8966], implementations that do not understand thisextension will silently ignore the sub-TLVs while parsing the rest of theTLVs that they contain. In effect, this extension supports buildinghybrid networks consisting of extended and unextended routers; whilesuch networks might suffer from sub-optimal routing, they will not sufferfrom routing loops or other pathologies.

If a sub-TLV defined in this extension is longer than expected, theadditional data is silently ignored. This provision is made in order toallow a future version of this protocol to extend the packet format withadditional data, for example high-precision or absolute timestamps.

6.Packet Format

This extension defines the Timestamp sub-TLV whose Type field has the value 3. This sub-TLV can be contained within a Hello sub-TLV, in which case itcarries a single timestamp, or within an IHU sub-TLV, in which case itcarries two timestamps.

Timestamps are encoded as 32-bit unsigned integers (modulo 232),expressed in units of one microsecond, counting from an arbitrary origin.Timestamps wrap around every 4295 seconds, or roughly 71 minutes (see alsoSection 3.3).

6.1.Timestamp Sub-TLV in Hello TLVs

When contained within a Hello TLV, the Timestamp sub-TLVhas the following format:

 0                   1                   2                   3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|   Type = 3    |    Length     |      Transmit Timestamp       |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|          (continued)          |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type:
Set to 3 to indicate a Timestamp sub-TLV.
Length:
The length of the body in octets, exclusive of the Typeand Length fields.
Transmit Timestamp:
The time at which the packet containingthis sub-TLV was sent, according to the sender's clock.

If the Length field is larger than the expected 4 octets, the sub-TLVMUST be processed normally (the first 4 octets are interpreted asdescribed above) and any extra data contained in this sub-TLVMUST besilently ignored. If the Length field is smaller than the expected4 octets, then this sub-TLVMUST be ignored (and the remainder of theenclosing TLV processed as usual).

6.2.Timestamp Sub-TLV in IHU TLVs

When contained in an IHU TLV, the Timestamp sub-TLV has the followingformat:

 0                   1                   2                   3 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|   Type = 3    |    Length     |        Origin Timestamp       |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|          (continued)          |        Receive Timestamp      |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|          (continued)          |+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type:
Set to 3 to indicate a Timestamp sub-TLV.
Length:
The length of the body in octets, exclusive of the Typeand Length fields.
Origin Timestamp:
A copy of the Transmit Timestamp of the lastTimestamp sub-TLV contained in a Hello TLV received from the node to whichthe enclosing IHU TLV applies.
Receive Timestamp:
The time, according to the sender's clock,at which the last timestamped Hello TLV was received from the node to whichthe enclosing IHU TLV applies.

If the Length field is larger than the expected 8 octets, the sub-TLVMUST be processed normally (the first 8 octets are interpreted asdescribed above), and any extra data contained in this sub-TLVMUST besilently ignored. If the Length field is smaller than the expected8 octets, then this sub-TLVMUST be ignored (and the remainder of theenclosing TLV processed as usual).

7.IANA Considerations

IANA has added the following entry to the "Babel Sub-TLV Types"registry:

Table 1
TypeNameReference
3TimestampRFC 9616

8.Security Considerations

This extension adds timestamping data to two of the TLVs sent by a Babel router. By broadcasting the value of a reasonably accurate local clock, these additional data might make a node more susceptible to timing attacks.

Broadcasting an accurate time raises privacy issues. The timestampsused by this protocol have an arbitrary origin; therefore, they do not leaka node's boot time or time zone. However, having access to accuratetimestamps could allow an attacker to determine the physical location ofa node. Nodes might avoid disclosure of location information by notincluding Timestamp sub-TLVs in the TLVs that they send, which will causetheir neighbours to fall back to hop-count routing.

9.References

9.1.Normative References

[RFC2119]
Bradner, S.,"Key words for use in RFCs to Indicate Requirement Levels",BCP 14,RFC 2119,DOI 10.17487/RFC2119,,<https://www.rfc-editor.org/info/rfc2119>.
[RFC8174]
Leiba, B.,"Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words",BCP 14,RFC 8174,DOI 10.17487/RFC8174,,<https://www.rfc-editor.org/info/rfc8174>.
[RFC8966]
Chroboczek, J. andD. Schinazi,"The Babel Routing Protocol",RFC 8966,DOI 10.17487/RFC8966,,<https://www.rfc-editor.org/info/rfc8966>.

9.2.Informative References

[DELAY-BASED]
Jonglez, B.,Boutier, M., andJ. Chroboczek,"A delay-based routing metric",DOI 10.48550/arXiv.1403.3488,,<http://arxiv.org/abs/1403.3488>.
[RFC891]
Mills, D.,"DCN Local-Network Protocols",STD 44,RFC 891,DOI 10.17487/RFC0891,,<https://www.rfc-editor.org/info/rfc891>.
[RFC5905]
Mills, D.,Martin, J., Ed.,Burbank, J., andW. Kasch,"Network Time Protocol Version 4: Protocol and Algorithms Specification",RFC 5905,DOI 10.17487/RFC5905,,<https://www.rfc-editor.org/info/rfc5905>.

Acknowledgements

The authors are indebted toJean-Paul Smets, who prompted theinvestigation that originally lead to this protocol. We are also gratefultoDonald Eastlake, 3rd,Toke Høiland-Jørgensen,Maria Matejka,David Schinazi,Pascal Thubert,Steffen Vogel, andOndřej Zajiček.

Authors' Addresses

Baptiste Jonglez
ENS Lyon
France
Email:baptiste.jonglez@ens-lyon.org
Juliusz Chroboczek
IRIF, Université Paris Cité
Case 7014
75205 Paris Cedex 13
France
Email:jch@irif.fr

[8]ページ先頭

©2009-2025 Movatter.jp