Network Working Group                          Richard Schantz (BBN-TENEX)
Request for Comments: 672                                         Dec 1974
NIC #31440

                   A Multi-Site Data Collection Facility

Preface:

   This RFC reproduces most of a working document prepared during the
   design and implementation of the protocols for the TIP-TENEX
   integrated system for handling TIP accounting.  Bernie Cosell
   (BBN-TIP) and Bob Thomas (BBN-TENEX) have contributed to various
   aspects of this work.  The system has been partially operational for
   about a month on selected hosts.  We feel that the techniques
   described here have wide applicability beyond TIP accounting.

Section I

Protocols for a Multi-site Data Collection Facility

Introduction

   The development of computer networks has provided the groundwork for
distributed computation: one in which a job or task is comprised of
components from various computer systems.  In a single computer system,
the unavailability or malfunction of any of the job components (e.g.
program, file, device, etc.) usually necessitates job termination.  With
computer networks, it becomes feasible to duplicate certain job
components which previously had no basis for duplication.  (In a single
system, it does not matter how many times a process that performs a
certain function is duplicated; a system crash makes all unavailable.)
It is such resource duplication that enables us to utilize the network
to achieve high reliability and load leveling.  In order to realize the
potential of resource duplication, it is necessary to have protocols
which provide for the orderly use of these resources.  In this document,
we first discuss in general terms a problem of protocol definition for
interacting with a multiply defined resource (server).  The problem
deals with providing a highly reliable data collection facility, by
supporting it at many sites throughout the network.  In the second
section of this document, we describe in detail a particular
implementation of the protocol which handles the problem of utilizing
multiple data collector processes for collecting accounting data
generated by the network TIPs.  This example also illustrates the
specialization of hosts to perform parts of a computation they are best
equipped to handle.  The large network hosts (TENEX systems) perform the
accounting function for the small network access TIPs.
   The situation to be discussed is the following: a data generating
process needs to use a data collection service which is provided in
duplicate by processes on a number of network machines.  A request to a
server involves sending the data to be collected.

An Initial Approach

   The data generator could proceed by selecting a particular server and
sending its request to that server.  It might also take the attitude
that if the message reaches the destination host (the communication
subsystem will indicate this) the message will be properly processed to
completion.  Failure of the request message would then lead to selecting
another server, until the request succeeds or all servers have been
tried.
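This initial approach can be sketched in a few lines.  The code below is
an illustration only: the function name and the `deliverable` set (which
stands in for the communication subsystem's delivery indication) are
assumptions, not part of any actual TIP or TENEX interface.

```python
def collect_naively(servers, data, deliverable):
    """Try each collector in turn; return the first server the request
    message can be delivered to, or None if every server fails."""
    for server in servers:
        if server in deliverable:
            # The generator simply assumes this host will now process
            # the data to completion -- the weakness examined next.
            return server
    return None
```

For example, `collect_naively(["A", "B", "C"], b"records", {"B", "C"})`
selects server "B", having learned nothing about whether "B" actually
recorded the data.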

   Such a simple strategy is a poor one.  It makes sense to require that
the servicing process send a positive acknowledgement to the requesting
process.  If nothing else, the reply indicates that the server process
itself is still functioning.  Waiting for such a reply also implies that
there is a strategy for selecting another server if the reply is not
forthcoming.  Herein lies a problem.  If the expected reply is timed
out, and then a new request is sent to another server, we run the risk
of receiving the (delayed) original acknowledgement at a later time.
This could result in having the data entered into the collection system
twice (data duplication).  If the request is re-transmitted to the same
server only, we face the possibility of not being able to access a
collector (data loss).  In addition, for load leveling purposes, we may
wish to send new requests to some (or all) servers.  We can then use
their reply (or lack of reply) as an indicator of load on that
particular instance of the service.  Doing this without data duplication
requires more than a simple request and acknowledgement protocol*.

Extension of the Protocol

   The general protocol developed to handle multiple collection servers
involves having the data generator send the data request to some (or
all) data collectors.  Those willing to handle the request reply with an
"I've got it" message.  They then await further notification before
finalizing the processing of the data.  The data generator sends a "go
ahead" message to one of the replying collectors, and a "discard"
message to all other replying collectors.  The "go ahead" message is the
signal to process the data (i.e. collect permanently), while the
"discard" message indicates that the data is being collected elsewhere
and should not be retained.

   The question now arises as to whether or not the collector process
should acknowledge receipt of the "go ahead" message with a reply of its
own, and then should the generator process acknowledge this
acknowledgement, etc.
We would like to send as few messages as possible to achieve reliable
communication.  Therefore, when a state

--------------------
* If the servers are independent of each other to the extent that if two
or more servers all act on the same request, the end result is the same
as having a single server act on the request, then a simple
request/acknowledgement protocol is adequate.  Such may be the case, for
example, if we subject the totality of collected data (i.e. all data
collected by all collectors for a certain period) to a duplicate
detection scan.  If we could store enough context in each entry to be
able to determine duplicates, then having two or more servers act on the
data would be functionally equivalent to processing by a single server.

is reached for which further acknowledgements lead to a previously
visited state, or when the cost of further acknowledgements outweighs
the increase in reliability they bring, further acknowledgements become
unnecessary.

   The initial question was: should the collector process acknowledge
the "go ahead" message?  Assume for the moment that it should not send
such an acknowledgement.  The data generator could verify, through the
communication subsystem, the transmission of the "go ahead" message to
the host of the collector.  If this message did not arrive correctly,
the generator has the option of re-transmitting it or sending a "go
ahead" to another collector which has acknowledged receipt of the data.
Either strategy involves no risk of duplication.  If the "go ahead"
message arrives correctly, and a collector acknowledgement to the "go
ahead" message is not required, then we incur a vulnerability to
(collector host) system crash from the time the "go ahead" message is
accepted by the host until the time the data is totally processed.
Call the data processing time P.  Once the data generator has selected
a particular collector (on the basis of receiving its "I've got it"
message), we also incur a vulnerability to malfunction of this
collector process.  The vulnerable period is from the time the
collector sends its "I've got it" message until the time the data is
processed.  This amounts to two network transit times (2N) plus IMP and
host overhead for message delivery (O) plus data processing time (P).
[Total time = 2N + O + P].  A malfunction (crash) in this period can
cause the loss of data.  There is no potential for duplication.

   Now, assume that the data collector process must acknowledge the "go
ahead" message.  The question then arises as to when such an
acknowledgement should be sent.  The reasonable choices are either
immediately before final processing of the data (i.e.
before the data is permanently recorded) or immediately after final
processing.  It can be argued that unless another acknowledgement is
required (by the generator to the collector) to this acknowledgement
BEFORE the actual data update, then the best time for the collector to
acknowledge the "go ahead" is after final processing.  This is so
because receiving the acknowledgement conveys more information if it is
sent after processing, while not receiving it (timeout), in either
case, leaves us in an unknown state with respect to the data update.
Depending on the relative speeds of various network and system
components, the data may or may not be permanently entered.  Therefore,
if we interpret the timeout as a signal to have the data processed at
another site, we run the risk of duplication of data.  To avoid data
duplication, the timeout strategy must only involve re-sending the "go
ahead" message to the same collector.  This will only help if the lack
of reply is due to a lost network message.  Our vulnerability intervals
to system and process malfunction remain as before.

   It is our conjecture (to be analyzed further) that any further
acknowledgements to these acknowledgements will have virtually no
effect on reducing the period of vulnerability outlined above.  As
such, the protocol with the fewest messages required is superior.
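The request / "I've got it" / "go ahead" / "discard" exchange can be
sketched as below.  This is an in-memory illustration, not the RFC's
wire format: the class, method names, and the `willing` flag (modeling a
server able to take the request) are all assumptions.

```python
class Collector:
    """One instance of the duplicated collection service."""

    def __init__(self, name, willing=True):
        self.name = name
        self.willing = willing     # can this server handle the request?
        self.state = "idle"
        self.buffered = None

    def offer(self, data):
        """Return True ("I've got it") if the data was buffered."""
        if not self.willing:
            return False           # no reply: the generator times out
        self.buffered = data
        self.state = "buffered"
        return True

    def go_ahead(self):
        self.state = "recorded"    # collect permanently

    def discard(self):
        self.state = "idle"        # the data is collected elsewhere
        self.buffered = None


def run_transaction(collectors, data):
    """Generator side: first replier records; all other repliers discard."""
    repliers = [c for c in collectors if c.offer(data)]
    if not repliers:
        return None                # no collector reachable; retry later
    chosen, rest = repliers[0], repliers[1:]
    chosen.go_ahead()
    for c in rest:
        c.discard()
    return chosen
```

After `run_transaction`, exactly one replying collector holds the data
permanently, which is the no-duplication property the protocol is after.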

Data Dependent Aspects of the Protocol

   As discussed above, a main issue is which process should be the last
to respond (send an acknowledgement).  If the data generator sends the
last message (i.e. "go ahead"), we can only check on its correct
arrival at the destination host.  We must "take on faith" the ability
of the collector to correctly complete the transaction.  This strategy
is geared toward avoiding data duplication.  If, on the other hand, the
protocol specifies that the collector is to send the last message, with
the timeout of such a message causing the data generator to use another
collector, then the protocol is geared toward the best efforts of
recording the data somewhere, at the expense of possible duplication.

   Thus, the nature of the problem will dictate which of the protocols
is appropriate for a given situation.  The next section deals in the
specifics of an implementation of a data collection protocol to handle
the problem of collecting TIP accounting data by using the TENEX
systems for running the collection server processes.  It is shown how
the general protocol is optimized for the accounting data collection.

Section II

Protocol for TIP-TENEX Accounting Server Information Exchange

Overview of the Facility

   When a user initially requests service from a TIP, the TIP will
perform a broadcast ICP to find an available RSEXEC which maintains an
authentication data base.  The user must then complete a login sequence
in order to authenticate himself.  If he is successful, the RSEXEC will
transmit his unique ID code to the TIP.  Failure will cause the RSEXEC
to close the connection and the TIP to hang up on the user.  After the
user is authenticated, the TIP will accumulate accounting data for the
user session.  The data includes a count of messages sent on behalf of
the user, and the connect time for the user.  From time to time the TIP
will transmit intermediate accounting data to Accounting Server
(ACTSER) processes scattered throughout the network.
These accounting servers will maintain files containing intermediate
raw accounting data.  The raw accounting data will periodically be
collected and sorted to produce an accounting data base.  Providing a
number of accounting servers reduces the possibility of being unable to
find a repository for the intermediate data, which otherwise would be
lost due to buffering limitations in the TIPs.  The multitude of
accounting servers can also serve to reduce the load on the individual
hosts providing this facility.
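As a rough picture of what an intermediate raw accounting file might
hold, the sketch below appends one record per user entry: the unique ID
code plus the two per-user counters (messages sent, connect time).  The
JSON-lines layout and all field names are assumptions for illustration,
not the actual TENEX file format.

```python
import json

def append_raw_accounting(path, tip_name, entries):
    """Append intermediate raw accounting data to a local file.

    entries: iterable of (user_id, messages_sent, connect_seconds).
    One JSON line per entry; a later pass collects and sorts these
    files into the accounting data base.
    """
    with open(path, "a") as f:
        for user_id, msgs, secs in entries:
            record = {"tip": tip_name, "id": user_id,
                      "messages": msgs, "connect": secs}
            f.write(json.dumps(record) + "\n")
```

Because each line is self-describing, files from many accounting
servers can simply be concatenated before the periodic sort.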

   The rest of this document details the protocol that has been
developed to ensure delivery of TIP accounting data to one of the
available accounting servers for storage in the intermediate accounting
files.

Adapting the Protocol

   The TIP to Accounting Server data exchange uses a protocol that
allows the TIP to select for data transmission one, some, or all server
hosts, either sequentially or in parallel, yet insures that the data
that becomes part of the accounting file does not contain duplicate
information.  The protocol also minimizes the amount of data buffering
that must be done by the limited capacity TIPs.  The protocol is
applicable to a wide class of data collection problems which use a
number of data generators and collectors.  The following describes how
the protocol works for TIP accounting.

   Each TIP is responsible for maintaining in its memory the cells
indicating the connect time and the number of messages sent for each of
its current users.  These cells are incremented by the TIP for every
quantum of connect time and message sent, as the case may be.  This is
the data generation phase.  Periodically, the TIP will scan all its
active counters, and along with each user ID code, pack the accumulated
data into one network message (i.e. less than 8K bits).  The TIP then
transmits this data to a set of Accounting Server processes residing
throughout the network.  The data transfer is over a specially
designated host-host link.  The accounting servers utilize the raw
network message facility of TENEX 1.32 in order to directly access that
link.  When an ACTSER receives a data message from a TIP, it buffers
the data and replies by returning the entire message to the originating
TIP.  The TIP responds with a positive acknowledgement ("go ahead") to
the first ACTSER which returns the data, and responds with a negative
acknowledgement ("discard") to all subsequent ACTSER data return
messages for this series of transfers.  If the TIP does not receive a
reply from any ACTSER, it accumulates new data (i.e.
the TIP has all the while been incrementing its local counters to
reflect the increased connect time and message count; the current
values will comprise new data transfers) and sends the new data to the
Accounting Server processes.  When an ACTSER receives a positive
acknowledgement from a TIP (i.e. "go ahead"), it appends the
appropriate parts of the buffered data to the locally maintained
accounting information file.  On receiving a negative acknowledgement
from the TIP (i.e. "discard"), the ACTSER discards the data buffered
for this TIP.  In addition, when the TIP responds with a "go ahead" to
the first ACTSER which has accepted the data (acknowledged by returning
the data along with the "I've got it"), the TIP decrements the connect
time and message counters for each user by the amount indicated in the
data returned by the ACTSER.  This data will already be accounted for
in the intermediate accounting files.

   As an aid in determining which ACTSER replies are to current
requests, and which are tardy replies to old requests, the TIP

maintains a sequence number indicator, and appends this number to each
data message sent to an ACTSER.  On receiving a reply from an ACTSER,
the TIP merely checks the returned sequence number to see if this is
the first reply to the current set of TIP requests.  If the returned
sequence number is the same as the current sequence number, then this
is the first reply; a positive acknowledgement is sent off, the
counters are decremented by the returned data, and the sequence number
is incremented.  If the returned sequence number is not the same as the
current one (i.e. not the one we are now seeking a reply for), then a
negative acknowledgement is sent to the replying ACTSER.  After a
positive acknowledgement to an ACTSER (and the implied incrementing of
the sequence number), the TIP can wait for more information to
accumulate, and then start transmitting again using the new sequence
number.

Further Clarification of the Protocol

   There are a number of points concerning the protocol that should be
noted.

   1.  The data generator (TIP) can send different (i.e. updated
versions) data to different data collectors (accounting servers) as
part of the same logical transmission sequence.  This is possible
because the TIP does not account for the data sent until it receives
the acknowledgement of the data echo.  This strategy relieves the TIP
of any buffering in conjunction with re-transmission of data which
hasn't been acknowledged.

   2.  A new data request to an accounting server from a TIP will also
serve as a negative acknowledgement concerning any data already
buffered by the ACTSER for that TIP, but not yet acknowledged.  The old
data will be discarded, and the new data will be buffered and echoed as
an acknowledgement.  This allows the TIP the option of not sending a
negative acknowledgement when it is not convenient to do so, without
having to remember that it must be sent at a later time.  There is one
exception to this convention.
If the new data message has the same sequence number as the old
buffered message, then the new data must be discarded, and the old data
kept and re-echoed.  This is to prevent a slow acknowledgement to the
old data from being accepted by the TIP, after the TIP has already sent
the new data to the slow host.  This caveat can be avoided if the TIP
does not resend to a non-responding server within the time period that
a message could possibly be stuck in the network, but could still be
delivered.  Ignoring this situation may result in some accounting data
being counted twice.  Because of the rule to keep old data when
confronted with matching sequence numbers, on restarting after a crash
the TIP should send a "discard" message to all servers in order to
clear any data which has been buffered for it prior to the crash.  An
alternative to this would be for the TIP to initialize its sequence
number from a varying source, such as time of day.

   3.  The accounting server similarly need not acknowledge receipt of
data (by echoing) if it finds itself otherwise occupied.  This will
mean that the ACTSER is not buffering the data, and hence is not a
candidate for entering the data into the file.  However, the

TIP may try this ACTSER at a later time (even with the same data), with
no ill effects.

   4.  Because of 2 and 3 above, the protocol is robust with respect to
lost or garbled transmissions of TIP data requests and accounting
server echo replies.  That is, in the event of loss of such a message,
a re-transmission will occur as the normal procedure.

   5.  There is no synchronization problem with respect to the sequence
number used for duplicate detection, since this number is maintained
only at the TIP site.  The accounting server merely echoes the sequence
number it has received as part of the data.

   6.  There are, however, some constraints on the size of the sequence
number field.  It must be large enough so that ALL traces of the
previous use of a given sequence number are totally removed from the
network before the number is re-used by the TIP.  The sequence number
is modulo the size of the largest number represented by the number of
bits allocated, and is cyclic.  Problems generally arise when a host
proceeds from a service interruption while it was holding on to a
reply.  If, during the service interruption, we have cycled through our
sequence numbers exactly N times (where N is any integer), this VERY
tardy reply could be mistaken for a reply to the new data, which has
the same sequence number (i.e. N revolutions of sequence numbers
later).  By utilizing a sufficiently large sequence number field (16
bits), and by allowing sufficient time between instances of sending new
data, we can effectively reduce the probability of such an error to
zero.

   7.  Since the data involved in this problem is the source of
accounting information, care must be taken to avoid duplicate entries.
This must be done at the expense of potentially losing data in certain
instances.  Other than the obvious TIP malfunction, there are two known
ways of losing data.
One is the situation where no accounting server responds to a TIP for
an extended period of time, causing the TIP counters to overflow
(highly unlikely if there are sufficient Accounting Servers).  In this
case, the TIP can hold the counters at their maximum value until a
server comes up, thereby keeping the lost accounting data at its
minimum.  The other situation results from adapting the protocol to our
insistence on no duplicate data in the incremental files.  We are
vulnerable to data loss with no recourse from the time the server
receives the "go ahead" to update the file with the buffered data (i.e.
positive acknowledgement) until the time the update is completed and
the file is closed.  An accounting server crash during this period will
cause that accounting data to be lost.  In our initial implementation,
we have slightly extended this period of vulnerability in order to save
the TIP from having to buffer the acknowledged data for a short period
of time.  By updating TIP counters from the returned data in parallel
with sending the "go ahead" acknowledgement, we relieve the TIP of the
burden of buffering this data until the Request for Next Message (RFNM)
from the accounting server IMP is received.  This adds slightly to our
period of vulnerability to malfunction, moving the beginning of the
period from the point when the ACTSER host receives the "go ahead",
back to the point when the TIP sends off

the "go ahead" (i.e. a period of one network transit time plus some IMP
processing time).  However, loss of data in this period is detectable
through the Host Dead or Incomplete Transmission return in place of the
RFNM.  We intend to record such occurrences with the Network Control
Center.  If this data loss becomes intolerable, the TIP program will be
modified to await the RFNM for the positive acknowledgement before
updating its counters.  In such a case, if the RFNM does not come, the
TIP can discard the buffered data and re-transmit new data to other
servers.

   8.  There is adequate protection against the entry of forged data
into the intermediate accounting files.  This is primarily due to the
system enforced limited access to Host-IMP messages and Host-Host
links.  In addition, messages received on such designated limited
access links can be easily verified as coming from a TIP.  The IMP
subnet appends the signature (address) of the sending host to all of
its messages, so there can be no forging.  The Accounting Server is in
a position to check if the source of the message is in fact a TIP data
generator.

Current Parameters of the Protocol

   In the initial implementation, the TIP sends its accumulated
accounting data about once every half hour.  If it gets no positive
acknowledgement, it tries to send with greater frequency (about every 5
minutes) until it finally succeeds.  It can then return to the normal
waiting period.  (A TIP user logout introduces an exception to this
behavior.  In order to re-use the TIP port and its associated counters
as soon as possible, a user terminating his TIP session causes the
accounting data to be sent immediately.)

   Initially, our implementation calls for each TIP to remember a
"favored" accounting server.  At the wait period expiration, the TIP
will try to deposit the data at its "favored" site.  If successful
within a short timeout period, this site remains the favored site, and
the wait interval is reset.
If unsuccessful within the short timeout, the data can be sent to all
servers*.  The one replying first will update its file with the data
and also become the "favored" server for this TIP.  With these
parameters, a host would have to undergo a proceedable service
interruption of more than a year in order for the potential sequence
number problem outlined in (6) above to occur.

Concluding Remarks

   When the implementation is complete, we will have a general data
accumulation and collection system which can be used to gather a wide
variety of information.  The protocol as outlined is geared to
gathering data which is either independent of the previously
accumulated data items (e.g. recording names), or data which adheres
to a commutative relationship (e.g. counting).  This is a

consequence of the policy of retransmission of different versions of
the data to different potential collectors (to relieve TIP buffering
problems).

   In the specified version of the protocol, care was taken to avoid
duplicate data entries, at the cost of possibly losing some data
through collector malfunction.  Data collection problems which require
avoiding such loss (at the cost of possible duplication of some data
items) can easily be accommodated with a slight adjustment to the
protocol.  Collected data which does not adhere to the commutative
relationship indicated above can also be handled by utilizing more
buffer space at the data generator sites.

--------------------
* The sequence number can be incremented for this new set of data
messages, and the new data can also be sent to the slow host.  In this
way we won't be giving the tardy response from the old favored host
unfair advantage in determining which server can respond most quickly.
If there is no reply to this series of messages, the TIP can continue
to resend the new data.  However, the sequence number should not be
incremented, since no reply was received, and since indiscriminate
incrementing of the sequence number increases the chance of recycling
during the lifetime of a message.
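To summarize, the TIP-side sequence-number discipline of Section II can
be sketched as follows.  The class, method names, and counter layout
are assumptions for illustration; only the discipline itself (stamp
each transmission, honor only the first matching echo, decrement the
live counters by the echoed amounts, then advance the 16-bit cyclic
number) comes from the document.

```python
SEQ_MODULUS = 1 << 16              # 16-bit cyclic sequence number field

class TipAccounting:
    def __init__(self):
        self.seq = 0
        self.counters = {}         # user id -> [messages_sent, connect_time]

    def snapshot(self):
        """The (sequence number, data) pair sent to the accounting
        servers; the live counters keep incrementing afterwards."""
        return self.seq, {u: tuple(c) for u, c in self.counters.items()}

    def on_echo(self, echoed_seq, echoed_data):
        """Handle a data echo from an ACTSER; return the TIP's reply."""
        if echoed_seq != self.seq:
            return "discard"       # tardy reply to an earlier request
        # First reply to the current request: the echoed amounts are
        # now accounted for elsewhere, so subtract them from the live
        # counters and advance the sequence number.
        for user, (msgs, secs) in echoed_data.items():
            self.counters[user][0] -= msgs
            self.counters[user][1] -= secs
        self.seq = (self.seq + 1) % SEQ_MODULUS
        return "go ahead"
```

Because the counters are decremented only by the echoed values, any
traffic that arrives between the snapshot and the echo survives in the
counters for the next transmission, which is why the TIP needs no
re-transmission buffering.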
