Movatterモバイル変換
[0]ホーム
[RFC Home] [TEXT|PDF|HTML] [Tracker] [IPR] [Info page]
UNKNOWN
Network Working Group Paul R. Johnson (BBN-TENEX)RFC # 677 Robert H. Thomas (BBN-TENEX)NIC # 31507 January 27, 1975The Maintenance of Duplicate DatabasesPreface:This RFC is a working paper on the problem of maintaining duplicateddatabases in an ARPA-like network. It briefly discusses the generalduplicate database problem, and then outlines in some detail a solutionfor a particular type of duplicate database. The concepts developedhere were used in the design of the User Identification Database for theTIP user authentication and accounting system. We believe that theseconcepts are generally applicable to distributed database problems.Johnson & Thomas [Page 1]
RFC 677 The Maintenance of Duplicate Databases January 1975Introduction There are a number of motivations for maintaining redundant,duplicate copies of databases in a distributed network environment. Twoimportant motivations are: - to increase reliability of data access. The accessibility of critical data can be increased by redundantly maintaining it. The database used for TIP login and accounting is redundantly distributed to achieve highly reliable access. - to insure efficiency of data access. Data can be more quickly and efficiently accessed when it is "near" the accessing process. A copy of the TIP user ID database is maintained at each site supporting the TIP login service to insure rapid, efficient access. (Reliability considerations dictate that this database be redundantly maintained, and efficiency considerations dictate that a copy be maintained at each authentication site.) The design of a system to maintain redundant, duplicate databases isa challenging task because of the inherent communication delay betweencopies of the database, as well as the real world constraints of systemcrashes, operator error, communication channel failure, etc. This paperdiscusses some of the problems we encountered in designing such asystem, and outlines a system design for maintaining a particular typeof database which solves those problems.The Model A system for supporting duplicate copies of a database can bemodeled by a group of independent database management processes (DBMPs)each maintaining its own copy of the database. These processescommunicate with each other over network communication paths. Each DBMPhas complete control over its copy of the database. It handles allaccesses to and modifications of the database in response to requestsfrom other processes. Though the DBMPs act only upon requests, in thefollowing they will often be said to be actually causing or originatingthe modifications. An important design consideration is that the communication pathsbetween the DBMPs are subject to failures. Thus one DBMP may have itsinteractions with other DBMPs interrupted and/or have to wait untilcommunication paths are re-established before it can communicate withother DBMPs. An assumption made in this paper about the communicationJohnson & Thomas [Page 2]
RFC 677 The Maintenance of Duplicate Databases January 1975paths is that messages from one process to another are delivered in thesame order that they are sent. This is true of the ARPANET. For networksthat make no such guarantee, communication protocols between the DBMPscan be used to correctly order the messages. In order to proceed further, it is necessary to be more preciseabout the nature of the duplicated database and the operations allowedon it. A constant, read-only database is at one extreme. The task of theDBMPs is trivial in this case. They simply respond to value retrievalrequests. At the other extreme is a general shared database wherefunctional modification requests (such as "X := f(X,Y,Z)") are allowedand/or where it is necessary to completely restrict access to entrieswhile they are being modified. In this case all the problems of shareddatabases on a single computer system arise (e.g., the need forsynchronization mechanisms and the resulting potential deadlocksituations), as well as those unique to having multiple database copiesdistributed among independent computers. For example, a completelygeneral system must deal with the possibility of communication failureswhich cause the network to become partitioned into two or more sub-networks. Any solution which relies on locking an element of thedatabase for synchronized modification must cope with the possibility ofprocesses in non-communicating sub-networks attempting to lock the sameelement. Either they both must be allowed to do so (which violates thelock discipline), or they both must wait till the partition ceases(which may take arbitrarily long), or some form of centralized orhierarchical control must be used, with a resulting dependency of someDBMPs on others for all modifications and perhaps accesses as well.The Database The type of database to be examined in this paper can be representedas a collection of entries which are (Selector,Value) pairs. Eachselector is unique and the values are atomic entities as far as theDBMPs are concerned. The mechanisms to be presented may be extended tohandle databases with greater structure - where the values maythemselves be collections of (selector,value) pairs - but this extensionwill not be considered further here.Four operations are to be allowed on the database: 1) Selection - given a selector, the current associated value is returned. 2) Assignment - a selector and a value are given and the given value replaces the old value associated with the selector.Johnson & Thomas [Page 3]
RFC 677 The Maintenance of Duplicate Databases January 1975 3) Creation - a new selector and an initial value are given and a new (selector,initial value) entry is added to the database. 4) Deletion- a selector is given and the existing (selector,value) entry is removed from the database.Note that value modification is limited to assignment. Functionalmodification requests - such as "Change X to be Factorial(X)" - arespecifically ruled out. Allowing them would force the use of system widesynchronization interlocks.Consistency The extent to which the copies of the database can be kept"identical" must be examined. Because of the inherent delay incommunications between DBMPs, it is impossible to guarantee that thedata bases are identical at all times. Rather, our goal is to guaranteethat the copies are "consistent" with each other. By this we mean thatgiven a cessation of update activity to any entry, and enough time foreach DBMP to communicate with all other DBMPs, then the state of thatentry (its existence and value) will be identical in all copies of thedatabase.Timestamps We permit modifications to the database to originate at any of theDBMPs maintaining it. These changes must, of course, be communicated tothe other DBMPs. To insure consistency, all of the DBMPs must make thesame decision as to which modification to a particular entry is to beconsidered "final". It is desirable to select the "most recent" change.However, since there is no way to absolutely determine the time sequenceof events in a distributed system without a universal, always accessiblesequence number generator (a network time standard should suffice),"most recent" can only be approximated. We accomplish the approximationby associating a timestamp with each modification and with each entry,the latter being the timestamp of the modification which set its currentvalue.(1) Since the uniqueness of timestamps given out at more than one----------(1) Time is useful in this context because it has the desired propertiesof being monotonically increasing, and of being available with areasonable degree of accuracy. Any other sequence numbering scheme withthese properties can be used, "time-of-day" was chosen because it issimple to obtain. Its main faults are that it is often manually set (andthus prone to error), and it may stop during system serviceJohnson & Thomas [Page 4]
RFC 677 The Maintenance of Duplicate Databases January 1975interruptions. As computer systems learn to adapt to a networkenvironment, the use of an independent time source should become morecommon. This is beginning to happen with the TENEX sites on the ARPANET.DBMP can not be guaranteed, a "DBMP of origin" is included as part ofeach timestamp. By (arbitrarily) ordering the DBMPs, we thus have ameans of uniquely ordering timestamps. Each timestamp is a pair (T,D): T is a time, D is a DBMP identifier. For twotimestamps (T1,D1) and (T2,D2) we have the following: (T1,D1)>(T2,D2) <=> (T1>T2) or (T1=T2 and D1>D2) (T1,D1) is said to be "more recent" than (T2,D2) If D1=D2 and T1=T2 it is assumed that the two modifications are really two copies of the same modification request. In order to insure the uniqueness of timestamps, it is necessarythat each individual DBMP associate different times with differentmodifications. This is certainly possible to do, though the fineness ofthe unit of time may restrict the frequency of modifications at a singleDBMP site. Each entry in the data base is now a triple: E ::= (S,V,T), where S is the selector V is the associated value T is the timestamp (a Time,DBMP pair) of the last change to the entry The task of each DBMP is to keep its copy of the database up-to-date, given the information on modifications that it has received sofar. At the same time it must insure that each of its entries staysconsistent with those of all the other DBMPs. This must be done despitethe fact that the order in which it receives modifications may be verydifferent from the order in which they are received by other DBMPs. Inthe remainder of this paper we examine the allowable database operationswith respect to their effect on DBMP operation.Assignment Consider the case of assignment to an existing entry. When theassignment is initiated (by a person or process) the DBMP involved makesthe change locally, and creates a copy of the modified entry and anassociated list of DBMPs to which the change must be sent. As themodification is delivered to the other DBMPs, they are removed from thelist until no DBMPs remain. The copy of the modification is thendeleted. This distribution mechanism must be error free (i.e., receiptJohnson & Thomas [Page 5]
RFC 677 The Maintenance of Duplicate Databases January 1975of a modification must be positively confirmed before removing a DBMPfrom the list of recipients).(2) When a DBMP receives an assignment modification (either locally orfrom another DBMP) it compares the timestamp of the modification withthe timestamp of the copy of the entry in its database and keepswhichever is more "recent" as defined by the ordering given above. Thuswhen all existing assignments to a given entry have been distributed toall the DBMPs, they are guaranteed have the same "latest" valueassociated with that entry.Creation Creation and deletion of entries pose more of a problem. Note thatthe ability to create new, previously unknown entries requires that aDBMP be able to handle assignments to unknown entries. For example,consider the case of an entry XYZ created by DBMP A, and the followingsequence of events: DBMP A tells DBMP B about the new entry, andsubsequently B assigns a new value to XYZ; DBMP B then tells DBMP Cabout the assignment before C has heard from A about the creation. DBMPC must either save the assignment to XYZ until it hears about thecreation, or simply assume the creation will be coming and use the "new"entry right away. The latter is more in the spirit of trying to keep thedatabase as "up-to-date" as possible and leads to no inconsistencies.Deletion Deletion of entries is the main problem for this database system.If deletion is taken to mean immediate removal from the database, thenproblems arise. Consider the following scenario: XYZ is an entry known by all DBMPs. XYZ is deleted at DBMP A. XYZ is modified at DBMP B (before B is notified of the deletion by A).Now, consider a third DBMP, C. C may hear from DBMP B before DBMP A, inwhich case XYZ ends up deleted at DBMP C. C may however hear from DBMP A----------(2) This same process (local modification and queuing for remotedistribution) is, of course, performed for the other possible operationson the database. The details of how the local modification is donesafely, how the messages are queued, how confirmation of delivery isdone, etc., though important, will not be discussed here. The use of anaddressee list attached to the modification to be delivered isconceptually easy to work with and not difficult to implement inpractice.Johnson & Thomas [Page 6]
RFC 677 The Maintenance of Duplicate Databases January 1975before DBMP B. In this case, if C removes XYZ from its database, thenthe assignment to XYZ initiated by DBMP B will result in the re-creationof XYZ at DBMP C. To prevent this C must remember that XYZ has beendeleted until it is "safe" to completely forget about XYZ. One approach to this problem is to never actually remove an entryfrom the database. Deletion just marks the entry as being deleted bysetting a "deleted" flag associated each entry. However, the problem ofreceiving assignments to deleted entries still exists. For example, DBMPA may receive an assignment from DBMP B to an entry which A has markedas deleted. DBMP A can not tell whether B has not heard about thedeletion, or has heard about it but has also received a re-creationrequest for the entry, which hasn't reached DBMP A. To enable A toresolve such situations we include another timestamp in all entries: thetimestamp of the entry's creation. Thus in the above example, DBMP A cancompare the creation timestamps of the assignment and the deleted entry.The one with the later creation timestamp is kept. Indeed whenever amodification with an old creation timestamp is received it can beignored. We now have a 5-tuple for entries and modifications: E ::= (S,V,F,CT,T) S is the selector V is the associated value F is the deleted/not-deleted flag CT is the timestamp of creation T is the timestamp of this (last) modification Note that the values of the F, CT, and T components of amodification uniquely specify the type of modification. Thus only the5-tuple to become the new entry for a selector, not the type ofmodification, need be communicated: F = not deleted, CT = T => creation F = not deleted, CT < T => assignment F = deleted => deletion The mechanism described above handles all the desired operations onthe distributed database in a way that guarantees the consistency of allcopies. A modification to the database will take effect at each DBMP assoon as it receives the request from the DBMP originating the change. A deficiency with this scheme is that deleted entries are neverremoved from the database. A method which permits "garbage collection"of deleted entries is discussed below.Johnson & Thomas [Page 7]
RFC 677 The Maintenance of Duplicate Databases January 1975Removal of Deleted Entries The basic constraint is that a DBMP should not remove a deletedentry until it will never receive any assignments with the same selector(S) and the same or older create time (CT). If it fails to do this, thenit will be unable to distinguish these "out of date" assignments fromassignments to a newly created entry for the same S. To be able dothis, each DBMP must know for each deleted entry whether all other DBMPshave heard about the deletion. To accomplish this, each DBMP couldnotify the other DBMPs whenever it hears about a deletion. If thesenotifications are transmitted in order with the "normal" sequence ofmodifications, then upon receipt of such a notification a DBMP can besure that the sending DBMP has delivered any outstanding assignments tothe deleted entry, has marked it as deleted, and will not generate anynew assignments to it. Keeping track of, and exchanging messages about,each individual deleted entry in this manner is, however, somewhat moreelaborate than necessary. Having each DBMP deliver all its own modifications in sequentialorder (by timestamp) allows the following simplification. We have allDBMPs maintain a table of the timestamps of the last modificationreceived from each other DBMP. Any DBMP, say A, is then guaranteed tohave received all modifications originating at another DBMP, say B.which have timestamps earlier than (or equal to) the entry for B in A scopy of this table. If this table is exchanged between DBMPs, then allDBMPs would have a second N*N (N= number of DBMPs) table where entry(I,J) would be the timestamp of the last modification received by DBMP Ifrom DBMP J. Thus DBMP A can remove a deleted entry whose deletionoriginated at DBMP K when all entries in the Kth column of this table atDBMP A are later than or equal to the timestamp of the deleted entry.When a DBMP receives a deletion modification, in addition to acting onit and acknowledging receipt of it, the DBMP should also send its tableof last timestamps received to all other DBMPs. This is sent in atimestamped message which is queued with the "normal" modificationmessages. A refinement to this approach, which reduces the amount ofinformation exchanged and the size of the tables, is to have the DBMPsexchange only the oldest of the entries in the first table (oftimestamps of last modifications received from other DBMPs). These wouldthen be saved in a 1*N table, replacing the N*N table. A DBMP receivinga modification with a timestamp equal to or older than the oldesttimestamp in its second table knows that the modification has beenconfirmed as being received by all other DBMPs. A deleted entry can thusbe removed when its timestamp satisfies this condition. A DBMP would,upon receipt of a deletion modification, queue up a message with this"timestamp of oldest last modification received" for delivery to allother DBMPs.Johnson & Thomas [Page 8]
RFC 677 The Maintenance of Duplicate Databases January 1975Summary of solution: An entry in the database is a 5-tuple: (S,V,F,CT,T) where S is an selector used in all references to this entry. V is its present value. F is a deleted/undeleted flag. CT is the timestamp of the creation of this entry. T is the timestamp of the modification which set the current V and/or F of the entry. A timestamp is a pair (time,DBMP) where the DBMP identifies the site at which the time was generated, and the DBMPs are (arbitrarily) ordered, so that timestamps are completely ordered. A modification is a pair (Set-of-DBMPs,Entry) where Set-of-DBMPs is the set of DBMPs to which the Entry has yet to be delivered. An ordered (by timestamp) list of modifications is kept at each DBMP. The DBMP periodically attempts to deliver modification requests to those DBMPs which remain in the Set-of-DBMPs associated with each modification. Modification entries are removed from this list when they have been delivered to all DBMPs. When a DBMP receives a modification request from another DBMP, it compares the timestamps of the request with the timestamps of the corresponding entry (if any) in its database, and acts upon or disregards the new entry accordingly. Each DBMP keeps a vector of the timestamp (T) of the last modification received by it from each other DBMP. When a DBMP receives a deletion modification, it sends a timestamped message to all other DBMPs containing the oldest timestamp in its vector of timestamps of last modification received. Each DBMP keeps a second vector of the last of these timestamps received from each other DBMP. A deleted entry may be removed from the database when its timestamp (T) is older than all the timestamps in this second vector of timestamps received from other DBMPs.Johnson & Thomas [Page 9]
RFC 677 The Maintenance of Duplicate Databases January 1975Conclusion This paper has presented techniques by which a number of looselycoupled processes can maintain duplicate copies of a database, despitethe unreliability of their only means of communication. The copies ofthe database can be kept "consistent'. However it is possible forseemingly anomalous behavior to occur. For example a user may assign avalue to a selector using one DBMP, later use another DBMP and assign anew value, and still later find that the first value is the one thatremains in the database. This can occur if the clocks used by the twoDBMPs for their timestamps are sufficiently out of synchrony that thesecond assignment is timestamped as having taken place before the firstassignment. To the extent that the communication paths can be madereliable, and the clocks used by the processes kept close to synchrony,the probability of seemingly strange behavior can be made very small.However, the distributed nature of the system dictates that thisprobability can never be zero. The major innovation presented here is the explicit representationof the time sequence of modifications through timestamps for bothmodifications and entries. This enables the each DBMP to select the same"most-recent" modification of an entry. In addition, timestamps enablethe DBMPs to decide when a deleted entry is no longer referenced (i.e.,still active anywhere) and can be deallocated. These techniques shouldhave broader utility in building and modeling other systems ofconcurrent, cooperating processes. [ This RFC was put into machine readable form for entry ] [ into the online RFC archives by Alex McKenzie with ] [ support from GTE, formerly BBN Corp. 12/99 ]Johnson & Thomas [Page 10]
[8]ページ先頭