Network Working Group                                        D.L. Mills
Request for Comments: 889                                 December 1983

                       Internet Delay Experiments

This memo reports on some measurement experiments and suggests some possible improvements to the TCP retransmission timeout calculation. This memo is both a status report on the measurements and advice to implementers of TCP.

1. Introduction

This memorandum describes two series of experiments designed to explore the transmission characteristics of the Internet system. One series of experiments was designed to determine the network delays with respect to packet length, while the other was designed to assess the effectiveness of the TCP retransmission-timeout algorithm specified in the standards documents. Both sets of experiments were conducted during the October - November 1983 time frame and used many hosts distributed throughout the Internet system.

The objectives of these experiments were first to accumulate experimental data on actual network paths that could be used as a benchmark of Internet system performance, and second to apply these data to refine individual TCP implementations and improve their performance.

The experiments were done using a specially instrumented measurement host called a Fuzzball, which consists of an LSI-11 running IP/TCP and various application-layer protocols, including TELNET, FTP and SMTP mail. Among the various measurement packages is the original PING (Packet InterNet Groper) program, used over the last six years for numerous tests and measurements of the Internet system and its client nets. This program contains facilities to send various kinds of probe packets, including ICMP Echo messages, process the reply and record elapsed times and other information in a data file, as well as produce real-time snapshot histograms and traces.

Following an experiment run, the data collected in the file were reduced by another set of programs and plotted on a Peritek bit-map display with color monitor.
The plots have been found invaluable in the identification and understanding of the causes of network glitches and other "zoo" phenomena. Finally, summary data were extracted and presented in this memorandum. The raw data files, including bit-map image files of the various plots, are available to other experimenters upon request.

The Fuzzballs and their local-net architecture, called DCN, have about two-dozen clones scattered worldwide, including one (DCN1) at the Linkabit Corporation offices in McLean, Virginia, and another at the Norwegian Telecommunications Administration (NTA) near Oslo, Norway. The DCN1 Fuzzball is connected to the ARPANET at the Mitre IMP by means of 1822 Error Control Units operating over a 56-Kbps line. The NTA Fuzzball is connected to the NTARE Gateway by an 1822 interface and then via VDH/HAP operating over a 9.6-Kbps line to SATNET at the Tanum (Sweden) SIMP. For most experiments described below, these details of the local connectivity can be ignored, since only relatively small delays are involved.
The remote test hosts were selected to represent canonical paths in the Internet system and were scattered all over the world. They included some on the ARPANET, MILNET, MINET, SATNET, TELENET and numerous local nets reachable via these long-haul nets. As an example of the richness of the Internet system connectivity and the experimental data base, data are included for three different paths from the ARPANET-based measurement host to London hosts, two via different satellite links and one via an undersea cable.

2. Packet Length Versus Delay

This set of experiments was designed to determine whether delays across the Internet are significantly influenced by packet length. In cases where the intrinsic propagation delays are high relative to the time to transmit an individual packet, one would expect that delays would not be strongly affected by packet length. This is the case with satellite nets, including SATNET and WIDEBAND, but also with terrestrial nets where the degree of traffic aggregation is high, so that the measured traffic is a small proportion of the total traffic on the path. However, in cases where the intrinsic propagation delays are low and the measured traffic represents the bulk of the traffic on the path, quite the opposite would be expected.

The objective of the experiments was to assess the degree to which TCP performance could be improved by refining the retransmission-timeout algorithm to include a dependency on packet length. Another objective was to determine the nature of the delay characteristic versus packet length on tandem paths spanning networks of widely varying architectures, including local nets, terrestrial long-haul nets and satellite nets.

2.1. Experiment Design

There were two sets of experiments to measure delays as a function of packet length. One of these was based at DCN1, while the other was based at NTA.
All experiments used ICMP Echo/Reply messages with embedded timestamps. A cycle consisted of sending an ICMP Echo message of specified length, waiting for the corresponding ICMP Reply message to come back and recording the elapsed time (normalized to one-way delay). An experiment run, resulting in one line of the table below, consisted of 512 of these volleys.

The length of each ICMP message was determined by a random-number generator uniformly distributed between zero and 256. Lengths less than 40 were rounded up to 40, which is the minimum datagram size for an ICMP message containing timestamps and also happens to be the minimum TCP segment size. The maximum length was chosen to avoid complications due to fragmentation and reassembly, since ICMP messages are not ordinarily fragmented or reassembled by the gateways.

The data collected were first plotted as a scatter diagram on a color bit-map display. For all paths involving the ARPANET, this immediately revealed two distinct characteristics, one for short (single-packet) messages less than 126 octets in length and the other for long (multi-packet) messages
longer than this. Linear regression lines were then fitted to each characteristic, with the results shown in the following table. (Only one characteristic was assumed for ARPANET-exclusive paths.) The table shows for each host the delays, in milliseconds, for each type of message, along with a rate computed on the basis of these delays. The "Host ID" column designates the host at the remote end of the path, with a letter suffix used when necessary to identify a particular run.
Host     Single-packet    Rate   Multi-packet     Rate   Comments
ID         40     125    (bps)     125     256   (bps)
---------------------------------------------------------------------------

DCN1 to nearby local-net hosts (calibration)

DCN5        9      13   366422                           DMA 1822
DCN8       14      20   268017                           Ethernet
IMP17      22      60    45228                           56K 1822/ECU
FORD1      93     274     9540                           9600 DDCMP base
UMD1      102     473     4663                           4800 synch
DCN6      188     550     4782                           4800 DDCMP
FACC      243     770     3282                           9600/4800 DDCMP
FOE       608    1917     1320                           9600/14.4K stat mux

DCN1 to ARPANET hosts and local nets

MILARP     61     105    15358     133     171   27769   MILNET gateway
ISID-L    166     263     6989     403     472   15029   low-traffic period
SCORE     184     318     5088     541     608   15745   low-traffic period
RVAX      231     398     4061     651     740   11781   Purdue local net
AJAX      322     578     2664     944    1081    7681   MIT local net
ISID-H    333     520     3643     715     889    6029   high-traffic period
BERK      336     967     1078    1188    1403    4879   UC Berkeley
WASH      498     776     2441    1256    1348   11379   U Washington

DCN1 to MILNET/MINET hosts and local nets

ISIA-L    460     563     6633    1049    1140   11489   low-traffic period
ISIA-H    564     841     2447    1275    1635    2910   high-traffic period
BRL       560     973     1645    1605    1825    4768   BRL local net
LON       585     835     2724    1775    1998    4696   MINET host (London)
HAWAII    679     980     2257    1817    1931    9238   a long way off
OFFICE3   762    1249     1396    2283    2414    8004   heavily loaded host
KOREA     897    1294     1712    2717    2770   19652   a long, long way off

DCN1 to TELENET hosts via ARPANET

RICE     1456    2358      754    3086    3543    2297   via VAN gateway

DCN1 to SATNET hosts and local nets via ARPANET

UCL      1089    1240     4514    1426    1548    8558   UCL zoo
NTA-L    1132    1417     2382    1524    1838    3339   low-traffic period
NTA-H    1247    1504     2640    1681    1811    8078   high-traffic period

NTA to SATNET hosts

TANUM     107     368     6625                           9600 bps Tanum line
ETAM      964    1274     5576                           Etam channel echo
GOONY     972    1256     6082                           Goonhilly channel echo
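As an illustration of the fitting procedure just described, the sketch below (not part of the original measurement apparatus) fits least-squares lines to 512 synthetic (length, delay) volleys, split at the 126-octet single/multi-packet boundary named in the text. The sample data and their coefficients are invented for demonstration; only the split point, the 40-octet minimum and the least-squares fit come from the text.

```python
# Illustrative sketch: fitting the two linear regression
# characteristics (delay vs. length) described above.
import random

def fit_line(samples):
    """Ordinary least-squares fit of delay = a + b*length."""
    n = len(samples)
    mx = sum(x for x, _ in samples) / n
    my = sum(y for _, y in samples) / n
    b = (sum((x - mx) * (y - my) for x, y in samples)
         / sum((x - mx) ** 2 for x, _ in samples))
    a = my - b * mx
    return a, b

random.seed(1)
# Synthetic (length, one-way delay in ms) volleys: a short-message
# characteristic below 126 octets and a long-message one above it.
volleys = []
for _ in range(512):
    length = max(40, random.randint(0, 256))  # lengths < 40 rounded up
    if length < 126:
        delay = 60 + 1.2 * length + random.gauss(0, 5)
    else:
        delay = 120 + 0.4 * length + random.gauss(0, 5)
    volleys.append((length, delay))

short = [v for v in volleys if v[0] < 126]   # single-packet messages
long_ = [v for v in volleys if v[0] >= 126]  # multi-packet messages
a1, b1 = fit_line(short)
a2, b2 = fit_line(long_)
print("single-packet: delay = %.1f + %.2f*len" % (a1, b1))
print("multi-packet:  delay = %.1f + %.2f*len" % (a2, b2))
```

The flatter multi-packet slope recovered by the fit corresponds to the higher effective rate reported in the multi-packet columns of the table.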
2.2. Analysis of Results

The data clearly show a strong correlation between delay and length, with the longest packets showing delays two to three times those of the shortest. On paths via ARPANET clones the delay characteristic shows a stronger correlation with length for single-packet messages than for multi-packet messages, which is consistent with a design that favors low delays for short messages and high throughputs for longer ones.

Most of the runs were made during off-peak hours. In the few cases where runs were made for a particular host during both on-peak and off-peak hours, comparison shows a greater dependency on packet length than on traffic shift.

TCP implementors should be advised that some dependency on packet length may have to be built into the retransmission-timeout estimation algorithm to ensure good performance over lossy nets like SATNET. They should also be advised that some Internet paths may require stupendous timeout intervals ranging to many seconds for the net alone, not to mention additional delays on host-system queues.

I call to your attention the fact that the delays (at least for the larger packets) from ARPANET hosts (e.g. DCN1) to MILNET hosts (e.g. ISIA) are in the same ballpark as the delays to SATNET hosts (e.g. UCL)! I have also observed that the packet-loss rates on the MILNET path are at present not negligible (18 in 512 for ISIA-2). Presumably, the loss is in the gateways; however, there may well be a host or two out there swamping the gateways with retransmitted data and which have a funny idea of the "normal" timeout interval. The recent discovery of a bug in the TOPS-20 TCP implementation, where spurious ACKs were generated at an alarming rate, would seem to confirm that suspicion.

3. Retransmission-Timeout Algorithm

One of the basic features of TCP that allows it to be used on paths spanning many nets of widely varying delay and packet-loss characteristics is the retransmission-timeout algorithm, sometimes known as the "RSRE Algorithm" after its original designers. The algorithm operates by recording the time and initial sequence number when a segment is transmitted, then computing the elapsed time for that sequence number to be acknowledged. There are various degrees of sophistication in the implementation of the algorithm, ranging from allowing only one such computation to be in progress at a time to allowing one for each segment outstanding on the connection.

The retransmission-timeout algorithm is basically an estimation process. It maintains an estimate of the current roundtrip delay time and updates it as new delay samples are computed. The algorithm smooths these samples and then establishes a timeout which, if exceeded, causes a retransmission. The selection of the parameters of this algorithm is vitally important in order to provide effective data transmission and avoid abuse of the Internet system by excessive retransmissions. I have long been suspicious of the parameters
suggested in the specification and used in some implementations, especially in cases involving long-delay paths over lossy nets. The experiment was designed to simulate the operation of the algorithm using data collected from real paths involving some pretty leaky Internet plumbing.

3.1. Experiment Design

The experiment data base was constructed of well over a hundred runs using ICMP Echo/Reply messages bounced off hosts scattered all over the world. Most runs, including all those summarized here, consisted of 512 echo/reply cycles lasting from several seconds to twenty minutes or so. Other runs designed to detect network glitches lasted several hours. Some runs used packets of constant length, while others used different lengths distributed from 40 to 256 octets. The maximum length was chosen to avoid complications due to fragmentation and reassembly, since ICMP messages are not ordinarily fragmented or reassembled by the gateways.

The object of the experiment was to simulate the packet-delay distribution seen by TCP over the paths measured. Only the network delay is of interest here, not the queueing delays within the hosts themselves, which can be considerable. Also, only a single packet was allowed in flight, so that stress on the network itself was minimal. Some tests were conducted during busy periods of network activity, while others were conducted during quiet hours.

The 512 data points collected during each run were processed by a program which plotted each data point (x,y) on a color bit-map display, where x represents the time since initiation of the experiment and y the measured delay, normalized to the one-way delay. Then, the simulated retransmission-timeout algorithm was run on these data and its computed timeout plotted in the same way. The display immediately reveals how the algorithm behaves in the face of varying traffic loads, network glitches, lost packets and superfluous retransmissions.

Each experiment run also produced summary statistics, which are summarized in the table below.
Each line includes the Host ID, which identifies the run. The suffix -1 indicates 40-octet packets, -2 indicates 256-octet packets and no suffix indicates uniformly distributed lengths between 40 and 256. The Lost Packets columns refer to instances when no ICMP Reply message was received for thirty seconds after transmission of the ICMP Echo message, indicating probable loss of one or both messages. The RTX Packets columns refer to instances when the computed timeout is less than the measured delay, which would result in a superfluous retransmission. For each of these two types of packets the first column indicates the number of instances and the Time column indicates the total accumulated time required for the recovery action.

For reference purposes, the Mean column indicates the computed mean delay of the echo/reply cycles, excluding those cycles involving packet loss, while the CoV column indicates the coefficient of variation. Finally, the Eff
column indicates the efficiency, computed as the ratio of the total time accumulated while sending good data to this time plus the lost-packet and rtx-packet time.

Complete sets of runs were made for each of the hosts in the table below for each of several selections of algorithm parameters. The table itself reflects values, selected as described later, believed to be a good compromise for use on existing paths in the Internet system.
Host      Total   Lost Packets   RTX Packets    Mean    CoV    Eff
ID         Time          Time           Time
-------------------------------------------------------------------

DCN1 to nearby local-net hosts (calibration)

DCN5          5    0      0      0      0        11    .15      1
DCN8          8    0      0      0      0        16    .13      1
IMP17        19    0      0      0      0        38    .33      1
FORD1        86    0      0      1     .2       167    .33    .99
UMD1        135    0      0      2     .5       263    .45    .99
DCN6        177    0      0      0      0       347    .34      1
FACC        368  196  222.1      6    9.2       267    1.1    .37
FOE         670    3    7.5     21   73.3      1150    .69    .87
FOE-1       374    0      0     26   61.9       610    .75    .83
FOE-2      1016    3   16.7     10   47.2      1859    .41    .93

DCN1 to ARPANET hosts and local nets

MILARP       59    0      0      2     .5       115    .39    .99
ISID        163    0      0      1    1.8       316    .47    .98
ISID-1       84    0      0      2      1       163    .18    .98
ISID-2      281    0      0      3     17       516    .91    .93
ISID *      329    0      0      5   12.9       619    .81    .96
SCORE       208    0      0      1     .8       405    .46    .99
RVAX        256    1    1.3      0      0       499    .42    .99
AJAX        365    0      0      0      0       713    .44      1
WASH        494    0      0      2    2.8       960    .39    .99
WASH-1      271    0      0      5      8       514    .34    .97
WASH-2      749    1    9.8      2   17.5      1411     .4    .96
BERK        528   20   50.1      4     35       865   1.13    .83

DCN1 to MILNET/MINET hosts and local nets

ISIA        436    4    7.4      2   15.7       807    .68    .94
ISIA-1      197    0      0      0      0       385    .27      1
ISIA-2      615    0      0      2     15      1172    .36    .97
ISIA *      595   18   54.1      6   33.3       992    .77    .85
BRL         644    1      3      1    1.9      1249    .43    .99
BRL-1       318    0      0      4   13.6       596    .68    .95
BRL-2       962    2    8.4      0      0      1864    .12    .99
LON         677    0      0      3   11.7      1300    .51    .98
LON-1       302    0      0      0      0       589    .06      1
LON-2      1047    0      0      0      0      2044    .03      1
HAWAII      709    4   12.9      3   18.5      1325    .55    .95
OFFICE3     856    3   12.9      3   10.3      1627    .54    .97
OFF3-1      432    2    4.2      2    6.9       823    .31    .97
OFF3-2     1277    7     39      3   41.5      2336    .44    .93
KOREA      1048    3   14.5      2   18.7      1982    .48    .96
KOREA-1     506    4    8.6      1    2.2       967    .18    .97
KOREA-2    1493    6   35.5      2   19.3      2810    .19    .96

DCN1 to TELENET hosts via ARPANET

RICE        677    2    6.8      3   12.1      1286    .41    .97
RICE-1      368    1     .1      3    2.3       715    .11    .99
RICE-2     1002    1    4.4      1    9.5      1930    .19    .98

DCN1 to SATNET hosts and local nets via ARPANET

UCL         689    9   26.8      0      0      1294    .21    .96
UCL-1       623   39   92.8      2    5.3      1025    .32    .84
UCL-2       818    4   13.5      0      0      1571    .15    .98
NTA         779   12   38.7      1    3.7      1438    .24    .94
NTA-1       616   24   56.6      2    5.3      1083    .25    .89
NTA-2       971   19   71.1      0      0      1757     .2    .92

NTA to SATNET hosts and local nets

TANUM       110    3    1.6      0      0       213    .41    .98
GOONY       587   19   44.2      1    2.9      1056    .23    .91
ETAM        608   32   76.3      1    3.1      1032    .29    .86
UCL         612    5   12.6      2    8.5      1154    .24    .96

Note: * indicates randomly distributed packets during periods of high ARPANET activity. The same entry without the * indicates randomly distributed packets during periods of low ARPANET activity.
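The per-run bookkeeping described in section 3.1 can be sketched as follows. This is an illustrative simulation only: the delay trace is synthetic, the thirty-second loss cutoff and the efficiency ratio are taken from the text, the filter constants F1, F2 and G are the compromise values described later in section 3.3, and the exact charging of recovery time (the full cycle delay for a superfluous retransmission) is an assumption.

```python
# Sketch: replaying a delay trace through the simulated
# retransmission-timeout algorithm and accumulating the per-run
# summary statistics (lost packets, RTX packets, efficiency).
import random

F1, F2, G = 15/16, 3/4, 2      # discharge/charge weights, threshold
LOSS_TIMEOUT = 30000           # ms; thirty seconds counts as a loss

random.seed(2)
# Synthetic one-way delays (ms): mostly well-behaved, with an
# occasional "glitch" and an occasional loss (None).
trace = []
for _ in range(512):
    r = random.random()
    if r < 0.01:
        trace.append(None)                        # lost packet
    elif r < 0.03:
        trace.append(random.uniform(3000, 8000))  # glitch
    else:
        trace.append(random.gauss(500, 60))       # normal volley

E = 500.0                      # initial delay estimate (ms)
good = lost_t = rtx_t = 0.0
lost_n = rtx_n = 0
for R in trace:
    if R is None:
        lost_n += 1
        lost_t += LOSS_TIMEOUT
        continue
    if R > G * E:      # computed timeout shorter than actual delay:
        rtx_n += 1     # a superfluous retransmission would occur
        rtx_t += R
    else:
        good += R
    F = F1 if R < E else F2    # two-constant update (section 3.3)
    E = F * E + (1 - F) * R

eff = good / (good + lost_t + rtx_t)
print("lost %d  rtx %d  eff %.2f" % (lost_n, rtx_n, eff))
```

Even a handful of lost packets dominates the lost-time column because each one is charged the full thirty-second cutoff, which is why the efficiency of the lossy SATNET runs in the table suffers so much.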
3.2. Discussion of Results

It is immediately obvious from visual inspection of the bit-map display that the delay distribution is more-or-less Poisson, concentrated about a relatively narrow range, with important exceptions. The exceptions are characterized by occasional spasms where one or more packets can be delayed many times the typical value. Such glitches have been commonly noted before on paths involving ARPANET and SATNET, but the true impact of their occurrence on the timeout algorithm is much greater than I expected. What commonly happens is that the algorithm, when confronted with a short burst of long-delay packets after a relatively long interval of well-mannered behavior, takes much too long to adapt to the spasm, thus inviting many superfluous retransmissions and leading to congestion.

The incidence of long-delay bursts, or glitches, varied widely during the experiments. Some runs were glitch-free, but most had at least one glitch in 512 echo/reply volleys. Glitches did not seem to correlate well with increases in baseline delay, which occur as the result of traffic surges, nor did they correlate well with instances of packet loss. I did not notice any particular periodicity, such as might be expected with regular pinging, for example; however, I did not process the data specially for that.

There was no correction for packet length used in any of these experiments, in spite of the results of the first set of experiments described previously. This may be done in a future set of experiments. The algorithm does cope well in the case of constant-length packets and in the case of randomly distributed packet lengths between 40 and 256 octets, as indicated in the table. Future experiments may involve bursts of short packets followed by bursts of longer ones, so that the speed of adaptation of the algorithm can be directly determined.
One particularly interesting experiment involved the FOE host (FORD-FOE), which is located in London and reached via a 14.4-Kbps undersea cable and statistical multiplexor. The multiplexor introduces a moderate mean delay, but with an extremely large delay dispersion. The specified retransmission-timeout algorithm had a hard time with this circuit, as might be expected; however, with the improvements described below, TCP performance was acceptable. It is unlikely that many instances of such ornery circuits will occur in the Internet system, but it is comforting to know that the algorithm can deal effectively with them.

3.3. Improvements to the Algorithm

The specified retransmission-timeout algorithm, really a first-order linear recursive filter, is characterized by two parameters, a weighting factor F and a threshold factor G. For each measured delay sample R the delay estimator E is updated:

    E = F*E + (1 - F)*R
Then, if an interval equal to G*E expires after transmitting a packet, the packet is retransmitted. The current TCP specification suggests values in the range 0.8 to 0.9 for F and 1.5 to 2.0 for G. These values have been believed reasonable up to now over ARPANET and SATNET paths.

I found that a simple change to the algorithm made a worthwhile change in the efficiency. The change amounts to using two values of F, one (F1) when R < E in the expression above and the other (F2) when R >= E, with F1 > F2. The effect is to make the algorithm more responsive to upward-going trends in delay and less responsive to downward-going trends. After a number of trials I concluded that values of F1 = 15/16 and F2 = 3/4 (with G = 2) gave the best all-around performance. The results on some paths (FOE, ISID, ISIA) were better by some ten percent in efficiency, as compared to the values now used in typical implementations, where F = 7/8 and G = 2. The results on most paths were better by five percent, while on a couple (FACC, UCL) the results were worse by a few percent.

There was no clear-cut gain in fiddling with G. The value G = 2 seemed to represent the best overall compromise. Note that increasing G makes superfluous retransmissions less likely, but increases the total delay when packets are lost. Also, note that increasing F2 too much tends to cause overshoot in the case of network glitches and leads to the same result. The table above was constructed using F1 = 15/16, F2 = 3/4 and G = 2.

Readers familiar with signal-detection theory will recognize my suggestion as analogous to an ordinary peak-detector circuit. F1 represents the discharge time-constant, while F2 represents the charge time-constant. G represents a "squelch" threshold, as used in voice-operated switches, for example. Some wag may even go on to suggest that a network glitch should be called a netspurt.
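The update rule and the suggested two-constant modification can be sketched side by side. The constants are those given in the text; the step-change input (a 500 ms path suddenly jumping to 2000 ms, as during a glitch) is an illustration:

```python
# Sketch: single-constant filter from the specification vs. the
# suggested two-constant ("peak detector") variant, driven by a
# step increase in delay.
def update_single(E, R, F=7/8):
    """Specified first-order filter with one weighting factor."""
    return F * E + (1 - F) * R

def update_dual(E, R, F1=15/16, F2=3/4):
    """Suggested variant: discharge slowly (F1), charge quickly (F2)."""
    F = F1 if R < E else F2
    return F * E + (1 - F) * R

G = 2.0                # threshold factor; timeout = G*E
E1 = E2 = 500.0        # both filters tracking a 500 ms path
for n in range(1, 6):  # delay suddenly steps up to 2000 ms
    E1 = update_single(E1, 2000.0)
    E2 = update_dual(E2, 2000.0)
    print("n=%d  single timeout %5.0f   dual timeout %5.0f"
          % (n, G * E1, G * E2))
```

With these constants the dual filter's timeout G*E climbs past the new 2000 ms delay after two samples, while the single filter needs four, illustrating the faster upward adaptation (and hence fewer superfluous retransmissions) described above.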
Appendix. Index of Test Hosts

Name      Address        NIC Host Name
--------------------------------------

DCN1 to nearby local-net hosts (calibration)

DCN5      128.4.0.5      DCN5
DCN8      128.4.0.8      DCN8
IMP17     10.3.0.17      DCN-GATEWAY
FORD1     128.5.0.1      FORD1
UMD1      128.8.0.1      UMD1
DCN6      128.4.0.6      DCN6
FACC      128.5.32.1     FORD-WDL1
FOE       128.5.0.15     FORD-FOE

DCN1 to ARPANET hosts and local nets

MILARP    10.2.0.28      ARPA-MILNET-GW
ISID      10.0.0.27      USC-ISID
SCORE     10.3.0.11      SU-SCORE
RVAX      128.10.0.2     PURDUE-MORDRED
AJAX      18.10.0.64     MIT-AJAX
WASH      10.0.0.91      WASHINGTON
BERK      10.2.0.78      UCB-VAX

DCN1 to MILNET/MINET hosts and local nets

ISIA      26.3.0.103     USC-ISIA
BRL       192.5.21.6     BRL-VGR
LON       24.0.0.7       MINET-LON-EM
HAWAII    26.1.0.36      HAWAII-EMH
OFFICE3   26.2.0.43      OFFICE-3
KOREA     26.0.0.117     KOREA-EMH

DCN1 to TELENET hosts via ARPANET

RICE      14.0.0.12      RICE

DCN1 to SATNET hosts and local nets via ARPANET

UCL       128.16.9.0     UCL-SAM
NTA       128.39.0.2     NTARE1

NTA to SATNET hosts and local nets

TANUM     4.0.0.64       TANUM-ECHO
GOONY     4.0.0.63       GOONHILLY-ECHO
ETAM      4.0.0.62       ETAM-ECHO