Internet Engineering Task Force                           Hadi Salim, J
Internet Draft                                                 Nandy, B
                                                             Seddigh, N
                                      Computing Technology Labs, Nortel
                                                              June 1998
<draft-salim-jhsbnns-ecn-00.txt>

   A proposal for Backward ECN for the Internet Protocol (IPv4/IPv6)

Status of this Memo

This document is an Internet-Draft. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, and
its working groups. Note that other groups may also distribute working
documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference material
or to cite them other than as "work in progress."

To view the entire list of current Internet-Drafts, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), ftp.nordu.net (Northern Europe),
ftp.nis.garr.it (Southern Europe), munnari.oz.au (Pacific Rim),
ftp.ietf.org (US East Coast), or ftp.isi.edu (US West Coast).

Abstract

This memo proposes an alternative approach to the current ECN mechanism
as proposed in the internet draft [draft-kksjf]. A Backward-ECN (BECN)
is proposed which uses the existing IP signalling mechanism, the
Internet Control Message Protocol (ICMP) [RFC 792] Source Quench
message. The use of ICMP Source Quench (ISQ) allows a basic ECN
mechanism for IP which does not require any negotiation between end
systems. Congestion notification is kept at the network (IP) level. The
congestion state can be reflected up to the transport layer (e.g. TCP
or UDP) for appropriate action. The ISQ based approach reduces the
reaction time to congestion in the network. In addition, the ISQ
message can include information on the severity of the congestion,
allowing the end host to react accordingly so as to make maximal use of
the resources while maintaining network equilibrium.

Hadi et al               Expires December 1998                 [Page 1]
Internet Draft   Backward ECN for the Internet Protocol       June 1998

1.0 Introduction

IP currently does not have any adhered-to mechanism to notify its
transport protocols of network congestion problems. ISQs have been used
in the past for congestion notification. TCP implements its own
congestion control algorithm and makes inferences about network
congestion: TCP-Reno and its variants use packet losses as an
indicator, whereas TCP-Vegas uses delay/throughput as the indicator.
UDP applications are usually unresponsive, and the protocols running
over UDP (e.g., RTP) use their own congestion control methods, if they
use any at all.

The initial suggestions to introduce a methodology for adding Explicit
Congestion Notification to IP are outlined in [Floyd94] and later in
the IETF draft [draft-kksjf].

1.1 Current ECN Proposal [draft-kksjf]

Bits 10 and 11 in the IPv6 header are proposed for the ECT (ECN Capable
Transport indicator) and CE (Congestion Experienced indicator)
respectively. Bits 6 and 7 of the IPv4 header TOS field are also
proposed as the ECT and CE place holders respectively. The TCP header
is modified to add an additional flag, the ECN Echo, with which the
receiver notifies the sender that it is contributing to congestion. The
flag's bit-space is borrowed from the reserved field in the TCP header.
This bit is also referred to interchangeably as the ECE bit in this
text.

The ECT bit is set by the sender end system if both the end systems are
ECN capable. This is confirmed in the pre-negotiation during the
connection setup phase in TCP.
Packets encountering congestion are marked (CE bit) by a router on
their way from the sender end system to the receiver end system, with a
probability proportional to their bandwidth usage, following the
procedure used in RED [RFC2309] routers. When the receiver end system
receives the congestion-causing packet with the CE and ECT bits set, it
informs the sender end system that it is contributing to congestion by
setting the ECE bit in the ACK packet. The sender end system reacts by
halving the congestion window upon receiving the ACK packet. The sender
end system reacts to ECE messages only once per in-flight window of
messages.

1.2 Limitations of the Current ECN Proposal [draft-kksjf]

1) The [draft-kksjf] proposal's congestion notification is coupled to
the transport layer (TCP) via the use of header information (ECE bit).
Extending this proposal to other transport protocols will require
changes to each of their respective headers.

2) The proposed [draft-kksjf] scheme requires the congestion
notification to incur a round trip time (RTT) before the sender can
react. In a path with a high delay-bandwidth product this would be
problematic for two reasons: i) in the scenario where the
delay-bandwidth product is dominated mostly by the high bandwidth (as
in high-speed networks), a large amount of traffic will pass through
the intermediate routers, causing an increase in congestion level
before the sender is notified. ii) in the scenario where the
delay-bandwidth product is dominated mostly by the high latency/RTT (as
in satellite networks), the reaction will take too long to address the
congestion issue. In both cases, the efficient use of the available
bandwidth is affected.

3) Because of the binary nature of the feedback, the reaction is
limited to halving the window size even if the congestion level is very
low. Network resources could be more effectively utilized if the
feedback were indicative of the congestion level at the overloaded
point in the network.

In this document we introduce a Backward ECN (BECN), which is a binary
feedback mechanism, and then an incremental improvement to BECN that
provides multi-level feedback, which we refer to as Multilevel ECN
(MECN).

Section 2 gives an introduction to our solution and how it addresses
the above limitations: a justification for using ISQ is made, and
Backward ECN (BECN) and then multi-level BECN (MECN) are introduced.
Section 3 goes into the details of BECN and suggests a role for the
router and the end system. Section 4 goes into the details of MECN and
suggests a role for the router and the end system. Section 5 addresses
the situation of multiple congested routers with our scheme. Section 6
is on security issues.

2.0 Network Level Signalling for ECN

We argue that ECN is a network level functionality and should be
decoupled from the transport protocols. A mechanism should be provided
for the end IP layer to inform its transport protocols of congestion
problems without using their header bit(s).
This provides the benefit that all IP transport protocols (including
any new ones that might be added in the future) are notified in the
same manner about network congestion. In this document we only deal
with TCP, and in particular with TCP mechanisms which use packet drops
as indicators of congestion, such as TCP-Reno and its variants.

It is assumed that the participating routers are capable of RED or some
other active queue management mechanism. In such a router, a packet has
a probability of being dropped, where this probability is dependent on
the average queue size. For packets with the ECT bit set in the IP
header, instead of the packet being dropped, it would with a given
probability have the CE bit in the header set before being forwarded if
the average queue size lies between the minimum and maximum
thresholds, as described in [draft-kksjf].

We leverage ICMP's Source Quench message, whose design intent is to
provide feedback to a source end system about network congestion. Both
the CE and ECT bits defined in [draft-kksjf] are maintained. During the
de-multiplexing of the IP message, the values of both CE and ECT are
passed to the transport layer.

We start by introducing a traditional ISQ which comprises a binary
feedback mechanism and a binary reaction at the source end system that
is modified relative to the current requirements for the end host's
reaction to ISQ [RFC1122].

Definition: The term binary congestion feedback is used to define
gathered knowledge of network congestion being passed back to an end
node, explicitly or otherwise, ignoring the levels of congestion. The
data only says that the network is congested.

We then introduce a multilevel congestion feedback mechanism based on
the various incipient congestion levels detected at the RED router. The
sender end system in that scenario has the luxury of having more varied
reactions based on the congestion level that is fed back. This results
in more effective use of the network resources and in better
performance.

Definition: The term multilevel congestion feedback is used to define
gathered knowledge of network congestion being passed back to an end
node with explicit level indicators of how severely the network is
congested.

We propose the multilevel congestion feedback and reaction as an
incremental improvement over the binary congestion feedback and
reaction mechanism.
In Sections 3 and 4 we suggest some simple algorithms for both the
binary and multilevel solutions.

2.1 Backward ECN (BECN)

This section briefly describes the binary feedback-reaction mechanism.

ICMP Source Quench messages (ISQ) are generated by the intermediate
congested RED router and sent back to the source as an indication of
incipient congestion whenever that router decides to mark the CE bit.
ISQs are usually not generated for a packet that has already been
marked by another router, regardless of whether that packet is
contributing to some congestion; however, when the router queue level
mandates that the packet be dropped, an ISQ is sent back to the source
regardless of whether the packet was marked previously or not.
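As a rough illustration of the generation rule just described, the
following Python sketch maps one router decision to a forwarding action
and an ISQ flag. It is not part of the proposal: the names RedVerdict
and becn_router_action are hypothetical, the RED verdict is assumed to
be computed elsewhere, and limiting the ISQ on a forced drop to packets
with the ECT bit set follows the router algorithm given in Section 3.1.

```python
from enum import Enum


class RedVerdict(Enum):
    """Hypothetical outcome of the RED computation for one packet."""
    PASS = "pass"      # RED did not select this packet
    EARLY = "early"    # RED selected it between the min and max thresholds
    FORCED = "forced"  # average queue size is above the maximum threshold


def becn_router_action(verdict, ect, ce):
    """Return (action, send_isq) for one packet.

    action is "forward", "mark" or "drop"; send_isq tells whether an
    ICMP Source Quench should be sent back to the source.
    """
    if verdict is RedVerdict.FORCED:
        # Forced drop: notify ECN-capable sources even if the packet
        # was already marked by an upstream router.
        return ("drop", ect)
    if verdict is RedVerdict.EARLY:
        if not ect:
            # Source is not ECN capable: fall back to a plain RED drop.
            return ("drop", False)
        if ce:
            # Already marked upstream: no second ISQ for this packet.
            return ("forward", False)
        return ("mark", True)
    return ("forward", False)
```

Keeping the decision a pure function of the RED verdict and the two
header bits makes the "already marked" exception easy to check in
isolation from the queue management code.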
The source reacts at the transport protocol level by lowering its data
throughput into the network. In TCP, upon identifying the flow causing
the congestion, the sender reacts by halving both the congestion window
and the slow start threshold value for that flow. The sender does not
react to an ISQ message more than once per window. This is similar to
the algorithm defined in the draft [draft-kksjf].

2.2 Multilevel BECN (MECN)

This section briefly describes the multilevel congestion
feedback-reaction mechanism.

Multi-level ICMP Source Quench messages (ISQ) are generated by the RED
router and sent back to the source as an indication of incipient
congestion whenever the CE bit is marked by the intermediate congested
router. The levels are based on the RED probability, and therefore on
the average queue size, at the time a packet contributing to congestion
arrives at the router. The congestion level sent back is the marking
probability scaled by a multiplicative factor, and is stored in the
32-bit unused field of the ISQ. As an example, the multiplicative value
selected is 100. The upper limit of 100 is returned when the
probability of dropping the packet is equal to one (i.e., the average
queue size is above the maximum threshold). ISQs are not generated for
a packet that has already been marked; however, as in the case of BECN,
when the router queue level mandates that a packet be dropped, an ISQ
is sent back to the source regardless of whether the packet was
previously marked or not. In that case the value is the maximum, i.e.,
100 in the above example.

2.3 The argument to justify the use of ISQ

ISQ messages, generated by a router to an end system, have in the past
been considered inefficient for the following reasons:

1) Gateway CPU abuse while processing these extra messages and

2) Bandwidth consumption on the reverse path.
It is suggested [RFC1812] that routers, if implementing ISQs, should
rate limit their generation because they consume too much bandwidth in
the reverse path.

We argue that CPU time is no longer a constrained resource today and
that the benefits provided by ECN outweigh the small performance hit
added. Moreover, it has been shown [red-paper] that when using RED
(with cooperating end systems) fewer packet drops happen at the router
in comparison to the traditional drop-tail algorithms assumed in the
arguments against ISQ. This implies that the amount of processing
needed at the router is reduced. It has been quantitatively shown in
simulations [kcho-97] that only about 1-5% of the packets are marked or
dropped in a RED gateway under incipient congestion. We argue that a
faster reaction to the problem, as provided by ISQ, would alleviate the
problem faster, resulting in even further reductions.

Using a RED gateway provides us with an advantage. A connection is
notified (by an ISQ in this case) of congestion at a rate proportional
to the connection's share of the bandwidth at the congested gateway.
Generation of ISQ messages will be limited to the period from when
incipient congestion is detected until the source end system adjusts.
In fact, given our scheme, which addresses congested routers
sequentially on a downstream path, we argue that the back path, even if
it is the same as the forward path, is probably not really congested,
since it covers the path only to the first point of congestion along
that path. More details are given in Section 5.

In essence, RED addresses both the backward path congestion problem (if
the back path is the same one as the forward path) and the router
processing concerns.

3.0 Suggested BECN algorithm

This is a binary feedback-reaction mechanism. The ISQs sent by the
router to the source host act as an indication of incipient congestion.
The source reacts at the transport level by lowering its congestion
window. The algorithm supplied here is the same as the one used in the
ECN proposal [draft-kksjf].

3.1 Role of the Router

   If the incoming message causes the average queue size to go above
   the maximum threshold, then:
      drop the segment
      if the ECT bit is marked in the IP header then:
         send an ISQ back to the source.
   else if the incoming message causes the average queue to go between
   the minimum and maximum thresholds then:
      if the RED probability chooses this packet and the ECT bit is
      set and the packet is not already marked then:
         mark the packet (CE bit) and send an ISQ back.
      else if RED chooses this packet and the ECT bit is not set then:
         drop the packet.

3.2 Role of the Source End System

If an ISQ message is received then the sender knows that there is
network congestion. The flow causing the congestion is identified from
the ICMP data. The TCP source reacts by halving both the congestion
window and the slow start threshold value for that flow.
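A minimal Python sketch of this reaction, combined with the
once-per-window rule stated in Section 2.1, might look as follows. The
class BecnSender, its fields, the use of segment units and the lower
bounds of 1 and 2 segments are all assumptions made for illustration;
the draft does not prescribe them.

```python
class BecnSender:
    """Per-flow state of a hypothetical BECN-aware TCP sender (segments)."""

    def __init__(self, cwnd, ssthresh):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.snd_una = 0     # oldest unacknowledged sequence number
        self.snd_nxt = 0     # next sequence number to be sent
        self.recover = None  # snd_nxt when the last ISQ was acted upon

    def on_send(self, segments=1):
        self.snd_nxt += segments

    def on_ack(self, ack):
        self.snd_una = max(self.snd_una, ack)

    def on_isq(self):
        """React to an ISQ for this flow; return True if the window was cut."""
        # Ignore the ISQ while data that was outstanding at the time of
        # the previous reaction is still unacknowledged.
        if self.recover is not None and self.snd_una < self.recover:
            return False
        # Halve both values; the floors are assumed, not from the draft.
        self.cwnd = max(self.cwnd // 2, 1)
        self.ssthresh = max(self.ssthresh // 2, 2)
        self.recover = self.snd_nxt
        return True
```

Recording snd_nxt at reaction time is one simple way to remember which
packets were outstanding; any later ISQ is ignored until the cumulative
acknowledgement reaches that point.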
The sender does not react to ISQ more than once per window. Upon
receipt of an ISQ packet at time t, it notes the packets that are
outstanding at that time (sent but not yet acked) and waits until a
time u when they have all been acknowledged before reacting to a new
ISQ message.

4.0 Suggested MECN algorithm

This is an evolution of BECN. The router now sends levels of congestion
notification and the source end system reacts differently depending on
the severity of the congestion. The level of notification is stored in
the 32-bit unused field in the ISQ.

4.1 Role of the Router

4.1.1 How the congestion level weight is computed

Pb refers to the computed RED packet marking probability. Pb is a
function of the computed average queue size. As the average queue size
varies from the minimum to the maximum threshold, Pb varies between 0
and the maximum value set for it, Maxp. Note that we take Pb to be one
when the average queue size is above the maximum threshold; in that
particular case, the maximum weight is sent to the source system. We
choose, for simplicity's sake, a multiplicative factor of 100 so as to
fashion the weight as a percentage congestion level. Above the maximum
threshold we send a value of 100 in the feedback message, indicating
100% incipient congestion. We multiply Pb by a factor such that we get
a reflection of 99% congestion when Pb reaches its maximum value, and
we add 1 to account for the fact that Pb is zero at the minimum
threshold. The equation used to compute the weight to send between the
minimum and maximum thresholds is:

   level = Pb*(98/Maxp) + 1

At the maximum threshold the weight sent is 99 and at the minimum
threshold the weight sent is 1.
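The computation above can be sketched in Python as follows. The
function name mecn_level is hypothetical, the linear dependence of Pb
on the average queue size is taken from the RED algorithm [red-paper],
and rounding to the nearest integer is an assumption, since the text
only asks for an integer level.

```python
def mecn_level(avg_queue, min_th, max_th, max_p):
    """Return the congestion level for an ISQ, or None below min_th.

    avg_queue, min_th and max_th share one unit (e.g. packets);
    max_p is the RED parameter Maxp, with 0 < max_p <= 1.
    """
    if avg_queue < min_th:
        return None   # no incipient congestion, so no level to report
    if avg_queue > max_th:
        return 100    # Pb is taken to be one: packet is dropped
    # RED marking probability, rising linearly from 0 to max_p.
    pb = max_p * (avg_queue - min_th) / (max_th - min_th)
    # level = Pb*(98/Maxp) + 1, rounded (assumed) to an integer.
    return round(pb * (98 / max_p) + 1)
```

With thresholds of 5 and 15 packets and Maxp of 0.1, an average queue
of 5 yields level 1, an average queue of 10 yields level 50, and an
average queue of 15 yields level 99.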
For efficiency, 98/Maxp could be computed at RED initialization.

4.1.2 The Router functionality

   If the incoming message causes the average queue size to go above
   the maximum threshold, then:
      drop the packet,
      if the ECT bit is marked in the IP header then:
         send an ISQ back to the source with a weight of 100
   If the incoming message causes the average queue to go between the
   minimum and maximum thresholds then:
      if the RED probability picks this packet then:
         if the ECT bit is set and the CE bit is not already marked
         then:
            mark the packet and send an ISQ of integer level
            1+(Pb*98/Maxp) back to the source
         else if the ECT bit is not set in the IP header then:
            drop the packet.

4.2 Role of the End System

The end system can now react to a range of congestion level
notifications.

We show here a simple algorithm that could be incrementally improved.
We react to each ISQ received under the assumption that the effect of
burstiness and spuriousness is accounted for by the RED algorithm at
the router. Since a weight of 100 indicates that the packet was
dropped, we use this information to improve on the retransmission
timeout (RTO) in TCP by retransmitting that packet. Note that the
packet sequence number can be deduced from the 8 bytes of the TCP
header passed back in the ISQ message (ISQs always pass 8 bytes on top
of the IP header's information). The slow start, congestion avoidance
and fast retransmit/recovery mechanics are maintained.

4.2.1 The Source end system functionality

If an ISQ message is received then the sender knows that there is
network congestion. The flow causing the congestion is identified from
the ICMP data and the congestion level is extracted.

   If the congestion level == 100 then:
      extract the TCP sequence number from the ISQ.
      retransmit the packet.
      cut the congestion window and threshold value by 1/2.
   else (we are between max and min threshold at the router) then:
      if congestion level >= 50 then:
         cut the congestion window and threshold value by 1/2.
      else (anything below 50%) then:
         congestion window is linearly decremented by 1.

Note:

a) The usual rules about the lower bounds of the threshold and
congestion window values apply when decrementing.

b) The MECN method outlined above will have interactions with the
existing congestion control mechanisms in TCP. The overall effect still
slows down the system throughput if the congestion levels warrant it.

5.0 Multiple congested routers

Multiple congested routers on the path between the sender and the
receiver have their concerns addressed one at a time in a domino
effect. If any of the downstream routers is congested to the extent of
a packet drop, then that router's congestion concerns are addressed
immediately.

If a packet is marked by a congested router, no further ISQ message is
generated for it on its way to the destination. The exception to the
rule is if, along the path after the marking, some other intermediate
router decides to drop this packet. In that case it will transmit an
ISQ of level 100, in response to which the end system will have to
invoke the congestion reaction immediately. Therefore any router which
is congested to the level of dropping packets will participate in the
congestion control.

Routers which are closer to the source will be favored in the sense
that their incipient congestion levels will be reacted to first. If the
flow is long enough, the router closest to the source will have its
congestion concerns serviced first, with the next downstream router
serviced next and so forth, with the router closest to the destination
being the last one responded to. The bias is more evident when a
further downstream router (other than the one that marked the packet)
would have sent a higher notification level had it had the opportunity,
i.e., had the packet not been marked and given a lesser weight at a
previous router. We feel that this bias is not of great significance,
given that any downstream router dropping a packet will contribute to
the congestion reaction at the source.

6.0 Security issues

ISQ messages can be spoofed. This can be used for a Denial of Service
attack on a source end system. Building authentication is probably too
heavy weight.
This is a problem faced by IP in general and so we have not attempted
to address it.

7.0 References

[draft-kksjf] Ramakrishnan, K.K. and Floyd, S. A proposal to add
Explicit Congestion Notification (ECN) to IPv6 and to TCP, IETF Draft
draft-kksjf-ECN-00.txt, November 1997.

[Floyd94] Floyd, S. TCP and Explicit Congestion Notification, ACM
Computer Communications Review, V.24N, October 1994.

[red-paper] Floyd, S. and Jacobson, V. Random Early Detection Gateways
for Congestion Avoidance, IEEE/ACM Transactions on Networking,
Aug 1993.
[kcho-97] Cho, K.J. ALTQ/RED Performance,
http://www.csl.csl.sony.co.jp/person/kjc/red/perf.html

[RFC 792] Postel, J. Internet Control Message Protocol (Sep 1981).

[RFC1122] Braden, R. (Editor) Requirements for Internet Hosts --
Communication Layers (Oct 1989).

[RFC2309] Braden, B., Clark, D., Crowcroft, J., Davie, B., Deering, S.,
Estrin, D., Floyd, S., Jacobson, V., Minshall, G., Partridge, C.,
Peterson, L., Ramakrishnan, K., Shenker, S., Wroclawski, J., and
Zhang, L. Recommendations on Queue Management and Congestion Avoidance
in the Internet (April 1998).

[RFC1812] Baker, F. Requirements for IPv4 routers (June 1995).

8.0 Acknowledgements

The authors are much indebted to Alan Chapman. Without his insight and
multiple edits the ideas embedded in here would have been much more
difficult to present.

9.0 Authors' Addresses

Jamal Hadi Salim,
Computing Technology Labs,
Nortel Canada,
PO Box 3511 Station C
Ottawa ON K1Y 4H7
Canada
Phone: 613-763-6395
Email: hadi@nortel.com

Biswajit Nandy,
Computing Technology Labs,
Nortel Canada,
PO Box 3511 Station C
Ottawa ON K1Y 4H7
Canada
Phone: 613-765-3709
Email: bnandy@nortel.com

Nabil Seddigh,
Computing Technology Labs,
Nortel Canada,
PO Box 3511 Station C
Ottawa ON K1Y 4H7
Canada
Phone: 613-763-6396
Email: nseddigh@nortel.com