JPH05265780A

Movatterモバイル変換

Info

Publication number: JPH05265780A
Application number: JP6524392A
Authority: JP
Inventors: Hiroaki Higaki; 博章桧垣; Akinao Soneoka; 昭直曽根岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1992-03-23
Filing date: 1992-03-23
Publication date: 1993-10-15

Abstract

(57)【要約】（修正有）【目的】プロセスの故障や回復が発生した場合でもメ
ッセージ送信イベント実行の欠落や重複がなく、信頼性
の高い分散システムを実現する。【構成】プロセス１は他のプロセスとメッセージ通信
を行ないながらアプリケーションプログラム２の処理を
行なう。プロセスグループビュー９は各プロセスグルー
プについてマスタプロセス、バックアッププロセスの識
別を行なう。送信イベント処理部ではマスタプロセスは
グループ間メッセージの送信を、バックアッププロセス
はグループ内メッセージ受信処理部５で送信した送信完
了・故障・回復メッセージの因果関係を登録したグルー
プ内メッセージキュー１１からメッセージを取り出す。
受信イベント処理部４はグループ間メッセージ受信処理
部６で受信したプロセスグループ間メッセージの同一順
序受理のために登録したメッセージを受信可能メッセー
ジキュー１０から取り出す。(57) [Summary] (Correction) [Purpose] Even if a process failure or recovery occurs, there is no omission or duplication of message transmission event execution, and a highly reliable distributed system is realized. [Configuration] Process 1 processes application program 2 while performing message communication with other processes. The process group view 9 identifies a master process and a backup process for each process group. In the transmission event processing unit, the master process retrieves the message between groups, and the backup process retrieves the message from the in-group message queue 11 in which the causal relation between the transmission completion / fault / recovery message transmitted by the in-group message reception processing unit 5 is registered. ..
The reception event processing unit 4 takes out from the receivable message queue 10 the message registered for the same order acceptance of the inter-process messages received by the inter-group message reception processing unit 6.

Description

Translated fromJapanese

【発明の詳細な説明】Detailed Description of the Invention

【０００１】[0001]

【産業上の利用分野】本発明は、分散システムにおいて
アプリケーションプログラムを高い信頼性をもって実行
するように複製プロセスからなるプロセスグループ間の
メッセージ通信を行う複製プロセスグループ間通信方法
に関する。BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a replication process group communication method for performing message communication between process groups consisting of replication processes so that an application program can be executed with high reliability in a distributed system.

【０００２】[0002]

【従来の技術】複数のプロセスがメッセージをやりとり
しながら処理を行なう分散システムにおいて、システム
の信頼性、可用性の向上が望まれている。システムに耐
故障性を持たせるフォールトトレランス技術として、プ
ロセスを複製して異なる計算機上で実行させる方法があ
り、集中システムで実現されている方法は大きく２つに
分類される。2. Description of the Related Art In a distributed system in which a plurality of processes perform processing while exchanging messages, improvement in system reliability and availability is desired. As a fault-tolerance technique for making a system fault-tolerant, there is a method of duplicating a process and executing it on different computers, and methods implemented by a centralized system are roughly classified into two.

【０００３】第１の方法は、複製プロセスのうちの１つ
だけが実際に動作し、このプロセスが定められたチェッ
クポイントで状態情報を複製プロセスに通知し、動作プ
ロセスが故障した場合には、複製プロセスが最初のチェ
ックポイントから動作を再開する方法である。The first method is that only one of the replication processes actually operates, and this process notifies the replication process of state information at a predetermined checkpoint, and when the operating process fails, A method by which the replication process resumes operation from the first checkpoint.

【０００４】第２の方法は、すべての複製プロセスが完
全に同期して動作する方法である。The second method is one in which all replication processes operate in perfect synchronization.

【０００５】第１の方法では、動作プロセスが故障し複
製プロセスに引き継がれる時にチェックポイントまで動
作がロールバックされるため、回復に要する時間的コス
トが大きい。また、チェックポイントでは動作の引き継
ぎに十分なデータの転送と記憶が必要である。さらに、
プログラムへのチェックポイント設定記述といった開発
コストを要する。In the first method, since the operation is rolled back to the checkpoint when the operation process fails and is taken over by the duplication process, the time cost required for recovery is large. At the checkpoint, it is necessary to transfer and store sufficient data to take over the operation. further,
Development cost such as description of checkpoint setting in the program is required.

【０００６】第２の方法はこれらの事項は問題にならな
いものの、複製プロセスを完全に同期させるためのハー
ドウェア装置が必要である。Although the second method does not have these problems, it requires a hardware device for completely synchronizing the duplication process.

【０００７】分散システムにおいては、複製プロセスの
それぞれに対して送信されたメッセージの伝達遅延が異
なるために複製を密に同期させることは困難である。そ
こで、プロセスの動作がその状態と受理メッセージのみ
によって一意に定まる決定的なプロセスからなる分散シ
ステムにおいて、複製プロセスに対するメッセージをア
トミックに送信し、複製プロセスがメッセージを同一順
序で受理することによって動作履歴を同一にする仮想的
な同期を実現する方法が提案されている（文献１”Reli
able communications in the presence of failures
”，K.Birman等，ACM Transactions on Computer Syst
ems，vol.2 ，no.1，pp.47-76（1987））。ここでは、
プロセスはメッセージを複製プロセスからなるプロセス
グループに対して送信する。このメッセージは宛先のプ
ロセスグループに含まれる全てのプロセスに受信される
か、いずれのプロセスにも受信されないかのいずれかで
あり、これをプロセスグループに対するアトミックなメ
ッセージ通信という。In distributed systems, it is difficult to tightly synchronize the replication because of the different propagation delays of the messages sent to each of the replication processes. Therefore, in a distributed system that consists of a deterministic process whose process behavior is uniquely determined only by its state and an acceptance message, a message to a replication process is atomically transmitted, and the replication process accepts the messages in the same order. There has been proposed a method for realizing virtual synchronization that makes the same (Reference 1 "Reli
able communications in the presence of failures
”, K. Birman et al., ACM Transactions on Computer Syst
ems, vol.2, no.1, pp.47-76 (1987)). here,
The process sends the message to a process group of duplicate processes. This message is either received by all processes included in the destination process group or not received by any process, and this is called atomic message communication to the process group.

【０００８】アトミックに送信されたメッセージも伝達
遅延がそれぞれ異なるために、複製プロセスのそれぞれ
によって異なる順序でメッセージが受信されることがあ
る。上記文献１の方法によってこれらを受理する順序を
同一にすることで動作履歴を同一にできる。このような
アプローチによる方法には文献２”Replicated distrib
uted program”，E.C.Cooper，Proceedings of the 10t
h ACM Symposium on Operating System Principles，p
p.63-78（1985），文献３”Fault tolerant computing
in object based distributed operating systems”，
M.Ahamad等，Proceedings of the 6th Symposium on Re
liable Distributed Systems，pp.115-125（1987）とい
った例もある。Since the messages transmitted atomically also have different propagation delays, the messages may be received in different orders by each of the replication processes. The operation history can be made the same by making the order of receiving them the same by the method of the above-mentioned document 1. Reference 2 "Replicated distrib
uted program ”, ECCooper, Proceedings of the 10t
h ACM Symposium on Operating System Principles, p
p.63-78 (1985), Reference 3 "Fault tolerant computing
in object based distributed operating systems ”,
M. Ahamad et al., Proceedings of the 6th Symposium on Re
There are also examples such as liable Distributed Systems, pp.115-125 (1987).

【０００９】しかしながら、いずれもアプリケーション
プログラムのメッセージ送信要求に対して、全ての複製
プロセスが実際に送信を行なう。したがって、プロセス
がＮ多重に複製されている場合、１つのメッセージ送信
の実行に対してＮ²に比例した数のメッセージが必要に
なり通信コストが大きくなる。また、各複製プロセスが
送信した同一内容のメッセージを破棄するための機構が
受信プロセス側で必要となる。However, in all cases, all the duplication processes actually transmit the message transmission request of the application program. Therefore, when the process is duplicated N times, the number of messages proportional to N² is required to execute one message transmission, which increases the communication cost. In addition, the receiving process side needs a mechanism for discarding the message of the same content sent by each duplication process.

【００１０】動作している複製プロセスのうちの１つを
マスタプロセス、他をバックアッププロセスとし、メッ
セージ送信イベントの実行においてマスタプロセスのみ
が実際の送信を行なうことにより、Ｎ多重複製プロセス
からなるプロセスグループ間のメッセージ通信をＮに比
例した数のメッセージを用いて実現する方法がある（文
献４”Implementing fault-tolerant distributed obje
cts ”，K.Birman等，IEEE Transactions on Software
Engineering,SE-11(6)，pp.502-508（1985））。A process group consisting of N multiplex duplication processes by using one of the operating duplication processes as a master process and the other as a backup process, and only the master process actually performs transmission in executing a message transmission event. There is a method for realizing message communication between messages using a number of messages proportional to N (reference 4 "Implementing fault-tolerant distributed obje").
cts ”, K. Birman et al., IEEE Transactions on Software
Engineering, SE-11 (6), pp.502-508 (1985)).

【００１１】しかし、マスタプロセスの故障が発生し、
バックアッププロセスのいずれかが新しいマスタプロセ
スとして動作を開始する場合、これらのプロセスのイベ
ント実行の進行状況が異なるため、故障したマスタプロ
セスが送信イベント実行をどこまで終了したかを新しい
マスタプロセスが知ることができなければ、メッセージ
が重複して送信されたり、実行されるべきメッセージ送
信イベントが欠落したりすることがある。これを防ぐた
めに文献４では、どのプロセスからも読み取り可能な記
憶装置に送信の履歴を残す方法を示している。しかしこ
の方法では、プロセスグループ間通信を行うたびにこの
ような記憶装置へアクセスするというオーバーヘッドが
伴い、マスタプロセスの故障発生時に新しいマスタプロ
セスの処理の方が進んでいる場合には、送信イベント実
行の欠落を防ぐために適当な送信イベントまで処理を戻
すロールバック処理が必要になるといった問題がある。However, when a master process failure occurs,
When one of the backup processes starts to act as a new master process, the progress of event execution for these processes is different, so that the new master process can know how far the failed master process has finished sending event executions. Otherwise, duplicate messages may be sent, or message sending events that should be performed may be missed. In order to prevent this, Document 4 shows a method of leaving a transmission history in a storage device readable by any process. However, this method has the overhead of accessing such a storage device each time communication between process groups is performed, and if a new master process is ahead of processing when a master process failure occurs, the send event execution There is a problem that a rollback process for returning the process to an appropriate transmission event is necessary to prevent the loss of the data.

【００１２】また、文献１では複製プロセスの故障発生
時や回復時の処理については、その故障や回復に関する
情報をシステム内の全てのプロセスに対して通知する方
法が用いられている。これは、プロセスグループに対す
るメッセージ通信が、プロセスグループに属するプロセ
スに対する通信の集合として実現されるために、メッセ
ージを送信するプロセスが宛先のプロセスグループに属
するプロセスを正確に知ることを必要としているためで
ある。これに対処するために、ネットワークにブロード
キャストの機能を仮定した方法（文献”知的分散システ
ムにおける高信頼放送通信機構”、関等、電子情報通信
学会論文誌、vol. J73-D-1，no.2，pp.117-125（199
0））や、パス管理のためのハードウェアの機能を用い
る方法（文献”A message system supporting fault to
lerance ”，A.Brog等，Proceedingsof the 9th ACM Sy
mposium on Operating System Principles ，pp.90-99
（1983））が提案されているが、ハードウェアでなくソ
フトウェアでの対処が望ましい。[0012] Further, in Reference 1, regarding the processing at the time of occurrence of a failure and the recovery of the duplication process, a method of notifying all the processes in the system of information related to the failure or the recovery is used. This is because the message communication to the process group is realized as a set of communication to the processes belonging to the process group, so that the process sending the message needs to know exactly the process belonging to the destination process group. is there. In order to deal with this, a method assuming a broadcasting function in the network (reference “Reliable Broadcast Communication Mechanism in Intelligent Distributed Systems”, Seki et al., IEICE Transactions, vol. J73-D-1, no .2, pp.117-125 (199
0)) or a method of using the hardware function for path management (reference “A message system supporting fault to
lerance ”, A. Brog, Proceedings of the 9th ACM Sy
mposium on Operating System Principles, pp.90-99
(1983)), but it is desirable to deal with it by software instead of hardware.

【００１３】なお、分散システム内のプロセスはタイム
アウトを用いた故障検出が可能であり、故障発生後は誤
ったメッセージを送信することがないフェールストップ
プロセスであるとする。これは文献”Byzantine genera
ls in action:implementingfail-stop processors”，
F.Schneider ，ACM Transactions on Computer Systems
2 (2)，pp.145-154（1984）の方法を用いることにより
実現可能である。また、下位レイヤプロトコルによっ
て、メッセージ通信のためのプロセス間チャネルでのメ
ッセージの紛失は起こらないものとする。It is assumed that a process in the distributed system is a fail-stop process that can detect a failure using a timeout and does not send an erroneous message after a failure occurs. This is the document "Byzantine genera
ls in action: implementingfail-stop processors ”,
F. Schneider, ACM Transactions on Computer Systems
2 (2), pp.145-154 (1984). Also, the lower layer protocols shall not cause message loss on the inter-process channel for message communication.

【００１４】[0014]

【発明が解決しようとする課題】上で述べたように、従
来のプロセス複製による分散システムでのフォールトト
レランス技法では、メッセージ送信イベント実行の重複
や欠落を防ぐためには、特殊な記憶装置へのアクセスを
伴ってメッセージ通信を行なう、プロセスの故障発生時
や回復時にはシステムの全プロセスに対して通知するこ
とが必要である、という問題がある。As described above, according to the conventional fault tolerance technique in the distributed system by the process replication, in order to prevent the duplicate or the loss of the execution of the message sending event, the access to the special storage device is performed. There is a problem that it is necessary to notify all the processes of the system when a process failure occurs or when the process is recovered.

【００１５】本発明は、上記に鑑みてなされたもので、
その目的とするところは、特殊なハードウェアを用いず
に、多重度Ｎに比例した数のメッセージによってバック
アッププロセスの動作履歴をマスタプロセスと同一に
し、プロセスの故障や回復が発生した場合でもメッセー
ジ送信イベント実行の欠落や重複がなく、プロセス故障
発生時の処理オーバーヘッドが小さく、信頼性の高い分
散システムを実現し得る複製プロセスグループ間通信方
法を提供することにある。The present invention has been made in view of the above,
The purpose is to make the operation history of the backup process the same as that of the master process with a number of messages proportional to the multiplicity N without using special hardware, and to send a message even if a process failure or recovery occurs. An object of the present invention is to provide a duplicated process group communication method capable of realizing a highly reliable distributed system with no omission or duplication of event execution, small processing overhead when a process failure occurs.

【００１６】[0016]

【課題を解決するための手段】上記目的を達成するた
め、本発明の複製プロセスグループ間通信方法は、複数
のプロセスが互いにメッセージ通信を行いながら処理を
進める分散環境上で各プロセスの複製を作成し、その１
つをマスタプロセスとし、他をバックアッププロセスと
したプロセスグループ間のメッセージ通信を行う複製プ
ロセスグループ間通信方法であって、送信側のマスタプ
ロセスがプロセスグループ間メッセージを送信するとと
もに、同一プロセスグループに属するバックアッププロ
セスに対して送信終了メッセージを送信し、受信したバ
ックアッププロセスが前記送信終了メッセージを故障通
知メッセージと順序化して受理するプロセスグループ間
メッセージ送信実行手段と、受信したプロセスグループ
間メッセージの受信順序を受信側のマスタプロセスが決
定し、同一プロセスグループに属するバックアッププロ
セスに対して通知し、該通知を受けたバックアッププロ
セスがこの順序に従ってメッセージを受理する受信メッ
セージ処理手段と、プロセス故障を同一プロセスグルー
プ内のプロセスに対して通知し、受信した正常プロセス
が該通知を送信終了メッセージと順序化して受理し、マ
スタプロセス故障メッセージの受理を機会にマスタプロ
セスを切り換える手段とを有することを要旨とする。In order to achieve the above object, the replication process group communication method of the present invention creates a copy of each process in a distributed environment in which a plurality of processes proceed with processing while performing message communication with each other. And that 1
A replication inter-process communication method that performs message communication between process groups with one as a master process and the other as a backup process, in which the master process on the sending side sends an inter-process group message and belongs to the same process group. An inter-process group message transmission executing means for transmitting a transmission end message to the backup process, the received backup process ordering the transmission end message with the failure notification message and accepting, and a reception order of the received inter-process group message. A reception message processing unit that is determined by the master process on the receiving side, notifies the backup processes that belong to the same process group, and the backup processes that have received the notification receive the messages in this order; A process failure is notified to processes in the same process group, the received normal process receives the notification in order with the transmission end message, and receives the master process failure message, and switches the master process. That is the summary.

【００１７】[0017]

【作用】本発明の複製プロセスグループ間通信方法で
は、マスタプロセスはメッセージ送信イベントを実行す
るとき、プロセスグループ間通信メッセージの送信とと
もに複製プロセスに対して送信完了を通知するメッセー
ジをアトミックに送信する。バックアッププロセスは、
送信イベント実行時には実際のメッセージ送信は行なわ
ずに、受信した送信完了通知メッセージまたは故障通知
メッセージをこの送信イベント実行を機会に受理する。In the inter-replication process group communication method of the present invention, when the master process executes the message transmission event, the master process atomically transmits the inter-process group communication message and the transmission completion message to the replication process. The backup process is
When the transmission event is executed, the actual message transmission is not performed, and the received transmission completion notification message or failure notification message is accepted at this transmission event execution opportunity.

【００１８】プロセスグループ間メッセージを受信した
場合は、バックアッププロセスではただちには受理可能
とはしない。受信メッセージの受理順序はマスタプロセ
スが決定する。マスタプロセスは受信メッセージをバッ
クアッププロセスにフォワードし、バックアッププロセ
スではこれを受信してはじめて受信メッセージの受理が
可能になる。メッセージ受信イベント実行時には、この
受理可能になったメッセージを受理可能になった順番に
受け取ることとする。When an inter-process group message is received, the backup process does not immediately accept it. The master process determines the order of receiving the received messages. The master process forwards the received message to the backup process, and the backup process can receive the received message only after receiving the message. When the message reception event is executed, the messages that have become acceptable will be received in the order in which they become acceptable.

【００１９】プロセスの故障および回復の通知は、故障
および回復したプロセスが含まれるプロセスグループに
属するプロセスに対してのみ行なう。この故障および回
復の通知メッセージは送信完了通知メッセージと因果関
係を保存した順序で受理される。プロセス故障によるマ
スタプロセスの交替は、送信イベント実行要求処理時に
故障通知メッセージを受理する機会に行なう。Notification of process failure and recovery is made only to processes belonging to the process group including the failed and recovered process. The failure and recovery notification messages are accepted in the order in which the causal relationship with the transmission completion notification message is preserved. The master process is replaced by a process failure at the time of receiving the failure notification message when processing the transmission event execution request.

【００２０】[0020]

【実施例】以下、図面を用いて本発明の実施例を説明す
る。Embodiments of the present invention will be described below with reference to the drawings.

【００２１】図１は、本発明の一実施例に係わる複製プ
ロセスグループ間通信方法を実施する装置の構成を示す
ブロック図である。図１において、プロセス１は他のプ
ロセスとメッセージ通信を行ないながらアプリケーショ
ンプログラム２に記述された処理を行なう。アプリケー
ションプログラム２にはメッセージ送信イベント７やメ
ッセージ受信イベント８が記述されており、これらの実
行時にはそれぞれ送信イベント処理部３と受信イベント
処理部４に処理要求が出される。プロセスグループビュ
ー９は、システムの全プロセスグループについての各グ
ループに含まれているプロセスの識別子（プロセスＩ
Ｄ）の列である。自プロセスグループビューの先頭に自
プロセスＩＤが登録されている場合はマスタプロセス、
それ以外の場合はバックアッププロセスであるものとし
て各プロセスは動作する。送信イベント処理部３はアプ
リケーションプログラム２からの送信イベント処理要求
の処理を行なう。このとき、マスタプロセスはグループ
間メッセージの送信を行ない、バックアッププロセスは
グループ内メッセージキュー１１からメッセージを取り
出す。受信イベント処理部４はアプリケーションプログ
ラム２からの受信イベント処理要求に対して、受理可能
メッセージキュー１０からメッセージを取り出す。グル
ープ内メッセージ受信処理部５では、受信した送信完了
・故障・回復メッセージを因果関係を保存した順序（文
献”A new algorithm to implement causal orderin
g”，A.Schiper 等，Lecture Notes in Computer Scien
ce 392 ，Distributed Algorithms，pp.219-232（198
9））でグループ内メッセージキュー１１に登録する。
グループ間メッセージ受信処理部６は受信したプロセス
グループ間メッセージの同一順序受理のためにグループ
間メッセージバッファ１２を用いて処理を行ない、受理
可能メッセージを受信可能メッセージキュー１０に登録
する。FIG. 1 is a block diagram showing the configuration of an apparatus for carrying out a replication process group communication method according to an embodiment of the present invention. In FIG. 1, a process 1 performs a process described in an application program 2 while performing message communication with another process. A message transmission event 7 and a message reception event 8 are described in the application program 2, and a processing request is issued to each of the transmission event processing unit 3 and the reception event processing unit 4 when these are executed. The process group view 9 shows the identifiers of processes (process I) included in each group for all process groups of the system.
It is a row of D). Master process if own process ID is registered at the top of own process group view,
In other cases, each process operates as a backup process. The transmission event processing unit 3 processes the transmission event processing request from the application program 2. At this time, the master process sends the inter-group message, and the backup process takes out the message from the intra-group message queue 11. The reception event processing unit 4 extracts a message from the acceptable message queue 10 in response to a reception event processing request from the application program 2. In the group message reception processing unit 5, the order in which the causal relationship of the received transmission completion / fault / recovery messages is preserved (reference “A new algorithm to implement causal order
g ”, A.Schiper et al., Lecture Notes in Computer Scien
ce 392, Distributed Algorithms, pp.219-232 (198
9)) is registered in the group message queue 11.
The inter-group message reception processing unit 6 uses the inter-group message buffer 12 to process the received inter-process messages in the same order, and registers the receivable message in the receivable message queue 10.

【００２２】アプリケーションプログラムにおけるメッ
セージ送信イベント実行について説明する。The message transmission event execution in the application program will be described.

【００２３】アプリケーションプログラム２がメッセー
ジ送信イベント処理を要求した場合の、送信イベント処
理部３における処理アルゴリズムを図２に示す。自プロ
セスが複製プロセスグループのマスタプロセスであるな
らば、実際に送信イベントで要求されたメッセージの送
信を行ない、この時グループ間のメッセージとともにグ
ループ内のバックアッププロセスに対して送信完了メッ
セージを送信する（ステップ２１０〜２３０）。これら
をアトミックに送信することにより、マスタプロセスが
どの送信イベントの実行を終えたかを正しくバックアッ
ププロセスに対して通知することが可能である。一方、
バックアッププロセスである場合には、実際の送信動作
を行なわない（ステップ２２０〜２７０）。マスタプロ
セスからの送信完了メッセージが受信されていることを
確認するだけである。この方法によって、プロセス正常
時にメッセージが重複して送信されることがないことが
保証されるとともに、バックアッププロセスの処理進行
は送信イベントを機会としてマスタプロセスより先行し
ないように制御されるので、マスタプロセスの故障によ
る新しいマスタプロセスへの切り換えのときでも、送信
イベント実行の欠落を防ぐためにロールバック処理を行
なう必要がない。FIG. 2 shows a processing algorithm in the transmission event processing unit 3 when the application program 2 requests message transmission event processing. If the self process is the master process of the duplicate process group, it actually sends the message requested by the send event, and at this time sends the send complete message to the backup process in the group along with the message between groups ( Steps 210-230). By sending these atomically, it is possible to correctly notify the backup process of which send event the master process has finished executing. on the other hand,
If it is the backup process, the actual transmission operation is not performed (steps 220 to 270). It just makes sure that a send complete message from the master process is received. This method guarantees that duplicate messages will not be sent when the process is normal, and the progress of the backup process is controlled so that it does not precede the master process by using the send event as an opportunity. Even if the master process is switched to a new master process due to the failure, there is no need to perform rollback processing in order to prevent the loss of execution of the transmission event.

【００２４】バックアッププロセスが取り出したメッセ
ージが故障通知または回復通知であった場合には自グル
ープのビューの更新を行なう（ステップ２８０〜２９
５）。具体的な更新処理は図９、図１１に示すように、
故障プロセスのプロセスＩＤのグループビューからの削
除、回復プロセスのプロセスＩＤのグループビューの最
後尾への追加である。前述したように、送信イベントの
実行はマスタプロセスではメッセージ送信、バックアッ
ププロセスでは送信完了メッセージの受理によって終了
するので、この更新操作を行なった場合には図２に示す
ようにアルゴリズムの先頭に戻って処理を継続する。マ
スタプロセスの故障通知を受け取り、このグループビュ
ー更新処理によって新しいマスタプロセスとなったプロ
セスは、アルゴリズムの先頭に戻ると今度はマスタプロ
セスとしてメッセージ送信を実際に行なうことになる。If the message retrieved by the backup process is a failure notification or a recovery notification, the view of the own group is updated (steps 280-29).
5). Specific update processing is as shown in FIGS. 9 and 11.
The process ID of the failed process is deleted from the group view, and the process ID of the recovery process is added to the end of the group view. As described above, the execution of the transmission event is completed by the message transmission in the master process and the reception of the transmission completion message in the backup process. Therefore, when this update operation is performed, the process returns to the beginning of the algorithm as shown in FIG. Continue processing. The process that receives the failure notification of the master process and becomes the new master process by this group view update process will actually perform message transmission as the master process when returning to the beginning of the algorithm.

【００２５】アプリケーションプログラムにおけるメッ
セージ受信イベント実行について説明する。The message reception event execution in the application program will be described.

【００２６】アプリケーションプログラムがメッセージ
受信イベントを実行する場合の受信イベント処理部４に
おける処理を図４に示す。受理可能メッセージキューに
は後に示す方法により、複製プロセスの間ですでに同一
の受理順序が決定されているメッセージが、その決定順
にしたがってキューに登録されている。したがって、受
信イベント実行時には先頭から順番に取りだせばよい。FIG. 4 shows the processing in the reception event processing unit 4 when the application program executes the message reception event. In the acceptable message queue, by the method described later, the messages for which the same acceptance order has already been determined during the duplication process are queued according to the determination order. Therefore, when executing a reception event, it is sufficient to fetch them in order from the beginning.

【００２７】グループ間メッセージ受信処理について説
明する。Inter-group message reception processing will be described.

【００２８】他のプロセスグループに含まれるプロセス
から送信されたメッセージを受信した場合の、グループ
間メッセージ受信処理部６におけるグループ間メッセー
ジ受信処理を図５に示す。ここでは、複製プロセスによ
るグループ間メッセージの同一順序での受理を実現す
る。各プロセスはグループ間メッセージを受信してもた
だちにはアプリケーションでの受理を可能とはせず、一
時グループ間メッセージバッファに保持する。受理順序
はプロセスグループのマスタプロセスが決定し、その受
理順序をバックアッププロセスに通知する。このために
用いられるフォワードメッセージにはグループ間メッセ
ージと同一の識別子が付与されている。バックアッププ
ロセスは、このフォワードメッセージを受信して受理順
序を知り、図６に示すように受理可能メッセージキュー
に登録する（ステップ６１０）。グループ間メッセージ
を受信した時に、既にマスタプロセスからのフォワード
メッセージを受信している場合がある。この場合にメッ
セージ受信が重複処理されることを防ぐために、グルー
プ間メッセージより先に受信したフォワードメッセージ
は受理可能キューに登録すると同時に、グループ間メッ
セージバッファに登録しておく。グループ間メッセージ
受信時には、まずこのバッファに同一識別子を付与され
たフォワードメッセージが登録されているかを確認し、
登録されている場合にはただちにこれを削除して処理を
終了する（ステップ５１０，５２０）。FIG. 5 shows an inter-group message reception process in the inter-group message reception processing unit 6 when a message transmitted from a process included in another process group is received. Here, the reception of inter-group messages in the same order by the duplication process is realized. When each process receives an inter-group message, it is not immediately accepted by the application, and is held in a temporary inter-group message buffer. The master process of the process group determines the acceptance order, and notifies the backup process of the acceptance order. The forward message used for this purpose has the same identifier as the inter-group message. The backup process receives this forward message, knows the acceptance order, and registers it in the acceptable message queue as shown in FIG. 6 (step 610). When the inter-group message is received, the forward message from the master process may have already been received. In this case, in order to prevent the message reception from being duplicated, the forward message received prior to the inter-group message is registered in the acceptable queue and at the same time is registered in the inter-group message buffer. When receiving a message between groups, first check if a forward message with the same identifier is registered in this buffer,
If it is registered, it is immediately deleted and the processing is terminated (steps 510 and 520).

【００２９】このマスタプロセスが受信グループ間メッ
セージをフォワードする方法は、同時に複製プロセスの
故障情報をグループ内に隠蔽することを可能にしてい
る。グループ間メッセージを送信するプロセスは自プロ
セスが持っている宛先グループのグループビューに含ま
れているプロセスに対してメッセージを送信する。しか
し、このグループビューは実際に宛先グループに含まれ
ているプロセスとは異なっていることがある。すなわ
ち、複製プロセスの幾つかは既に故障している場合があ
り、また幾つかの複製プロセスが新たにグループに追加
されている場合もある。マスタプロセスがフォワードす
る方法を用いることによって、故障通知や回復通知をシ
ステムの全てのプロセスに通知せずに、同一プロセスグ
ループに含まれるプロセスのみに通知するだけで、グル
ープに含まれるプロセスに正しくグループ間メッセージ
を届けることが可能になっている。ただし、送信プロセ
スの持っているグループビューに含まれていないプロセ
スは、グループ間メッセージを受け取ることができず、
フォワードメッセージだけを受け取るため、前に述べた
ようにフォワードメッセージをグループ間メッセージバ
ッファに登録して、グループ間メッセージの受信を待ち
合わせる処理は無駄である。したがって、グループ間メ
ッセージに送信プロセスの持っている宛先グループのビ
ューを付与し、受信したプロセスがグループ間メッセー
ジの待ち合わせが可能であるか判定する。同時に、この
グループ間メッセージに付与された宛先グループのビュ
ーが実際のプロセスグループ構成とは異なっている場合
には、正しいグループビューを付与したビュー更新メッ
セージを送信プロセスを含むプロセスグループに対して
送る。図７に示すように、このメッセージを受信したプ
ロセスは、自プロセスの持つ宛先プロセスグループビュ
ーをメッセージに付与されているグループビューに変更
する。This method in which the master process forwards the message between the reception groups makes it possible to hide the failure information of the replication process in the group at the same time. The process that sends a message between groups sends a message to the process included in the group view of the destination group that the process owns. However, this group view may differ from the processes actually included in the destination group. That is, some of the replication processes may have already failed, and some replication processes may have been newly added to the group. By using the method in which the master process forwards, it is possible to correctly notify the processes included in the group by notifying all processes in the system of failure notifications and recovery notifications, but only the processes included in the same process group. It is possible to deliver messages between. However, a process that is not included in the group view of the sending process cannot receive the inter-group message,
Since only the forward message is received, the process of registering the forward message in the inter-group message buffer and waiting for the reception of the inter-group message as described above is wasteful. Therefore, a view of the destination group held by the sending process is added to the inter-group message, and it is determined whether the received process can wait for the inter-group message. At the same time, if the view of the destination group given to this inter-group message is different from the actual process group configuration, a view update message with the correct group view is sent to the process group containing the sending process. As shown in FIG. 7, the process receiving this message changes the destination process group view of its own process to the group view attached to the message.

【００３０】グループ内メッセージの受信処理について
説明する。The reception processing of the in-group message will be described.

【００３１】グループ内メッセージ受信処理部５では、
図３の送信完了メッセージ受信処理、図８の故障通知メ
ッセージ受信処理、図１０の回復通知メッセージ受信処
理に示すアルゴリズムにより、これらのメッセージをグ
ループ内メッセージキューに登録する。メッセージの登
録順序はメッセージ間の因果関係を保存している。この
ため、バックアッププロセスの送信イベント処理部が送
信イベント実行の時にグループ内メッセージキューから
マスタプロセスの故障通知を取り出した時には、故障し
たマスタプロセスが故障以前に送った送信完了メッセー
ジは全て受け取られている。したがって、送信イベント
がプロセス故障発生時でも複製プロセスによって重複し
て実行されることがない。In the intra-group message reception processing section 5,
These messages are registered in the intra-group message queue by the algorithm shown in the transmission completion message reception process of FIG. 3, the failure notification message reception process of FIG. 8, and the recovery notification message reception process of FIG. The registration order of messages preserves the causal relationship between messages. Therefore, when the transmission event processing unit of the backup process retrieves the failure notification of the master process from the message queue in the group when executing the transmission event, all the transmission completion messages sent by the failed master process before the failure are received. .. Therefore, the transmission event will not be duplicated by the duplication process even when a process failure occurs.

【００３２】図８は、複製プロセスの故障通知メッセー
ジの受信時のグループ内メッセージ受信処理部５におけ
る処理アルゴリズムである。前に述べたように、故障通
知メッセージは故障プロセスを含むプロセスグループに
対してのみ通知すればよく、これを受信した場合に送信
完了メッセージと順序化して受理することによって、送
信イベント実行の重複や欠落を防ぐことができる。した
がって、故障通知メッセージを受信した場合、マスタプ
ロセスではただちにグループビューから故障プロセスの
プロセスＩＤを削除し（ステップ８３０）、一方、バッ
クアッププロセスではグループ内メッセージキューに保
持する（ステップ８４０）。実際の故障プロセスのプロ
セスＩＤの削除は、送信イベント実行の処理においてこ
のキューから故障通知メッセージが取り出された時に行
なわれる。FIG. 8 shows a processing algorithm in the in-group message reception processing unit 5 when the failure notification message of the duplication process is received. As described above, the failure notification message needs to be notified only to the process group including the failed process, and when it is received, it is received in order as the transmission completion message, so that duplication of the transmission event execution or You can prevent omissions. Therefore, when the failure notification message is received, the master process immediately deletes the process ID of the failed process from the group view (step 830), while the backup process holds it in the in-group message queue (step 840). The process ID of the actual failure process is deleted when the failure notification message is fetched from this queue in the process of executing the transmission event.

【００３３】図１０は、回復通知メッセージのグループ
内メッセージ受信処理部５における処理アルゴリズムで
あるが、基本的には前述の故障通知メッセージの処理と
同様である。ただし、回復プロセスに対してはマスタプ
ロセスの持っている状態情報、本実施例実現のために用
いるプロセスグループビュー、受理可能メッセージキュ
ー、グループ内メッセージキュー、グループ間メッセー
ジキューの内容を通知するためのメッセージを送る。FIG. 10 shows a processing algorithm of the recovery notification message in the in-group message reception processing section 5, but it is basically the same as the processing of the failure notification message described above. However, for the recovery process, the status information of the master process, the process group view used to implement this embodiment, the acceptable message queue, the message queue within the group, and the message queue between groups send a message.

【００３４】本実施例の場合、Ｎ多重に複製されたプロ
セスからなるプロセスグループ間メッセージ通信に必要
なメッセージ数は、グループ間メッセージＮ、送信完了
メッセージＮ−１、フォワードメッセージＮ−１、合計
３Ｎ−２メッセージであり、多重度に比例したメッセー
ジ数で通信が実現されている。また、プロセスの故障や
回復の通知に必要なメッセージ数はＮ−１であり、シス
テムの全グループ数には依存しない。In the case of this embodiment, the number of messages required for message communication between process groups consisting of N-multiplexed processes is 3N in total, which is an inter-group message N, a transmission completion message N-1, and a forward message N-1. -2 messages, and communication is realized with the number of messages proportional to the multiplicity. Further, the number of messages required for notification of a process failure or recovery is N-1, and does not depend on the total number of groups in the system.

【００３５】上述した本発明の複製プロセスグループ間
通信方法では、（１）プロセス故障発生時でも、送信イ
ベント実行におけるメッセージ送信動作が重複したり欠
落せず、（２）フォールトトレランスがプログラマやユ
ーザに対して透過であり、すなわちアプリケーションプ
ログラムへのチェックポイント設定記述が不要であり、
特殊なハードウェア装置の付加を必要とせず、（３）プ
ロセス故障は複製プロセスグループ内に隠蔽されるとい
う特徴を有する。In the above-described communication method between duplicated process groups of the present invention, (1) even when a process failure occurs, message transmission operations in transmission event execution are not duplicated or lost, and (2) fault tolerance is provided to programmers and users. In contrast, it is transparent, that is, checkpoint setting description in the application program is unnecessary,
No special hardware device is required, and (3) the process failure is concealed in the duplicate process group.

【００３６】従って、本プロセスグループ間メッセージ
通信方法を用いることにより、従来必要とされた特殊な
記憶装置を用いずに既存の計算機ネットワークを用い
て、チェックポイント設定やロールバック処理という大
きなオーバーヘッドを伴うプロセス故障からの回復処理
を行なわず、またプロセス故障通知の範囲を狭くするこ
とによってグループ間メッセージ通信、プロセス故障通
知、プロセス回復通知とも多重度Ｎに比例したメッセー
ジ数で、高い信頼性を持つ分散システムの構築が可能に
なる。Therefore, by using the inter-process group message communication method, a large overhead such as checkpoint setting and rollback processing is used by using the existing computer network without using the special storage device conventionally required. By not performing recovery processing from a process failure and narrowing the range of process failure notification, message communication between groups, process failure notification, and process recovery notification are distributed in a highly reliable manner with the number of messages proportional to the multiplicity N. The system can be constructed.

【００３７】[0037]

【発明の効果】以上説明したように、本発明によれば、
複製プロセスからなるプロセスグループ間通信方法を用
いることによって、少ないメッセージ数により通信チャ
ネルを圧迫せず、プロセス故障発生時でもメッセージ送
信イベント実行の重複や欠落を防ぐことによりシステム
の動作に矛盾が生じず、かつこの矛盾防止のためにチェ
ックポイント設定という開発コストや故障回復処理のた
めのロールバック処理という実行時コストがない、高信
頼分散システムが実現できる。As described above, according to the present invention,
By using the communication method between process groups consisting of duplicated processes, the communication channel is not squeezed by a small number of messages, and even if a process failure occurs, there is no inconsistency in system operation by preventing duplicate or missing execution of message transmission events. In addition, a highly reliable distributed system without the development cost of setting checkpoints and the runtime cost of rollback processing for failure recovery processing to prevent this contradiction can be realized.

【図面の簡単な説明】[Brief description of drawings]

【図１】本発明の一実施例に係わる複製プロセスグルー
プ間通信方法を実施する装置の構成を示すブロック図で
ある。FIG. 1 is a block diagram showing a configuration of an apparatus that implements an inter-duplicate process group communication method according to an embodiment of the present invention.

【図２】送信イベント処理部における送信イベント実行
要求を処理するアルゴリズムを示すフローチャートであ
る。FIG. 2 is a flowchart showing an algorithm for processing a transmission event execution request in a transmission event processing unit.

【図３】グループ内メッセージ受信処理における送信完
了メッセージの受信処理アルゴリズムを示すフローチャ
ートである。FIG. 3 is a flowchart showing a reception processing algorithm of a transmission completion message in the intra-group message reception processing.

【図４】受信イベント処理部における受信イベント実行
要求を処理するアルゴリズムを示すフローチャートであ
る。FIG. 4 is a flowchart showing an algorithm for processing a reception event execution request in a reception event processing unit.

【図５】グループ間メッセージ受信処理部におけるグル
ープ間メッセージの受信処理アルゴリズム示すフローチ
ャートである。FIG. 5 is a flowchart showing an inter-group message reception processing algorithm in the inter-group message reception processing unit.

【図６】グループ間メッセージ受信処理部におけるフォ
ワードメッセージの受信処理アルゴリズムを示すフロー
チャートである。FIG. 6 is a flowchart showing a reception processing algorithm of a forward message in an inter-group message reception processing unit.

【図７】グループ間メッセージ受信処理部におけるビュ
ー更新メッセージの受信処理アルゴリズムを示すフロー
チャートである。FIG. 7 is a flowchart showing a view update message reception processing algorithm in the inter-group message reception processing unit.

【図８】グループ内メッセージ受信処理部における故障
通知メッセージの受信処理アルゴリズムを示すフローチ
ャートである。FIG. 8 is a flowchart showing a reception processing algorithm of a failure notification message in the in-group message reception processing unit.

【図９】送信イベント処理部における故障通知メッセー
ジ受理時のグループビュー更新アルゴリズムを示すフロ
ーチャートである。FIG. 9 is a flowchart showing a group view update algorithm when a transmission event processing unit receives a failure notification message.

【図１０】グループ内メッセージ受信処理部におけるプ
ロセス回復処理アルゴリズムを示すフローチャートであ
る。FIG. 10 is a flowchart showing a process recovery processing algorithm in the in-group message reception processing unit.

【図１１】送信イベント処理部における回復通知メッセ
ージ受理時のグループビュー更新アルゴリズムを示すフ
ローチャートである。FIG. 11 is a flowchart showing a group view update algorithm when a recovery notification message is received in the transmission event processing unit.

【符号の説明】[Explanation of symbols]

１プロセス２アプリケーションプログラム３送信イベント処理部４受信イベント処理部５グループ内メッセージ受信処理部６グループ間メッセージ受信処理部 1 process 2 application program 3 transmission event processing unit 4 reception event processing unit 5 intra-group message reception processing unit 6 inter-group message reception processing unit

Claims

Translated fromJapanese

【特許請求の範囲】[Claims]