US20040172628A1

Movatterモバイル変換

Info

Publication number: US20040172628A1
Application number: US10/795,304
Authority: US
Inventors: Hiromitsu Aramaki; Hiroyuki Takatsu; Akio Tatsumi
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2000-03-23
Filing date: 2004-03-09
Publication date: 2004-09-02
Also published as: US20010025200A1; JP2001273122A; US6721612B2

Abstract

A distributing node initiates an install control program in receiving nodes, and then broadcasts or multicasts program data to the receiving nodes. Thereby, the installation of the program into the nodes is carried out in shorter time. In this event, the distributing node and the receiving nodes buffer the program data in units of data block sizes of storage devices associated therewith. The distributing node executes in parallel the processing for storing data read from the storage device in a buffer, and the processing for reading the data from the buffer and broadcasting or multicasting the read data to the receiving node. The receiving node executes in parallel the processing for storing the data received from the distributing node in a buffer, and the processing for reading the program data from the buffer and storing the program data in the storage device thereof.

Description

BACKGROUND OF THE INVENTION

The present invention relates to installation of a program into nodes independent of one another in a parallel computer system including plural nodes.[0001]

An installing method for use in a parallel computer system is disclosed, for example, in JP-A-11-296349, wherein upon completion of installation into a particular node, this node serves as a server machine to sequentially install software into different nodes to reduce time required for the installation in the overall system.[0002]

Also, JP-A-6-309261 discloses a method including a step of sending an install instruction from a server machine, a step of requesting required software from client machines to the server machine, and a step of starting installation of the software into plural client machines.[0003]

Further, JP-A-6-59994 discloses a method including a step of sending an install start instruction and install information from a primary station computer device to plural secondary station computer devices to install a program into plural client machines.[0004]

A parallel computer system may include a number of nodes ranging from several tens to several thousands or more because of requirements imposed thereto to execute a large scale of computations. When the same programs are incorporated into these nodes, it is necessary to reduce time required for installing the programs. In the prior art JP-A-11-296349, assuming that the number of nodes in a system is N, and time required for installation per node is T, time required for the installation into all nodes is expressed by (log[0005]₂N)×T.

SUMMARY OF THE INVENTION

It is an object of the present invention to further reduce the above installation time (log[0006]₂N)×T required for installing into plural nodes.

The present invention is characterized by simultaneously performing install processing in plural nodes by simultaneously transferring data on a program to be installed, utilizing communication means interconnecting the respective nodes.[0007]

An arbitrary node in a parallel computer system reads every predefined amount of programs from a storage medium which stores the programs, and delivers program data to all nodes, into which the programs are to be installed, through the communication means. Each node receives the data and writes the data into a storage device of the node itself to install the same program in parallel.[0008]

Also, a master install control program for distributing a program is executed by one node or an install device in the parallel computer system. The master install control program reads a program from a storage medium which stores programs, and transfers the read program. In this event, plural buffers are used for communication of data associated with the reading and transferring of the program.[0009]

A node receiving the program executes an install control program for receiving the distributed data. The install control program receives data on the program, which is to be executed in the node, from the distribution node, and writes the received data into a storage device of the node itself. Plural buffers are utilized for communication of data during the reception of data and the writing into the storage device.[0010]

The master install control program and the install control program rely on the buffers to process in parallel the reading of the program from the recording medium, the delivery of the read program, the reception of the program, and the writing of the program into the storage device, to reduce time required for installing the program into plural nodes.[0011]

In an environment in which the present invention is implemented under the best condition, transfer time is calculated as follows. Assuming for example that the number of nodes is N; a total data size of a program to be distributed is A; a predefined amount of data size for distribution is B; time required for reading the predefined amount of data is C; time required for transferring the predefined amount of data to all nodes is D; time required for receiving the predefined amount of data is E; and time required for writing the predefined amount of data into an external storage device is F, time required for installing the program into all nodes is expressed by ((A/B)×F)+(C+D+E). (C+D+E) is time taken for transferring the first predefined amount of data in the processing for writing the predefined amount of data into the external storage device. Subsequently, the data read processing, the transfer-to-node processing and the data reception processing are performed in parallel through the buffers, so that time required for the processing is included in time required for writing data into the storage device.[0012]

As described above, since a program is distributed to all nodes at one time, time required for installing the program into all the nodes does not depend on the number of nodes N.[0013]

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram illustrating the configuration of a parallel computer system;[0014]

FIG. 2 is a diagram illustrating a data flow during installation of a program;[0015]

FIG. 3 is a block diagram illustrating the configuration of a master install control program;[0016]

FIG. 4 is a block diagram illustrating the configuration of an install control program;[0017]

FIG. 5 is a flow chart illustrating the flow of processing executed by the master install control program;[0018]

FIG. 6 is a flow chart illustrating data read processing in the master install control program;[0019]

FIG. 7 is a flow chart illustrating data transfer processing in the master install control program;[0020]

FIG. 8 is a flow chart illustrating the flow of processing executed by the install control program;[0021]

FIG. 9 is a flow chart illustrating data reception processing in the install control program; and[0022]

FIG. 10 is a flow chart illustrating disk write processing in the install control program.[0023]

DETAILED DESCRIPTION OF THE EMBODIMENTS

FIG. 1 illustrates the configuration of a parallel computer system. In a[0024]

parallel computer system

1, plural nodes, each including a CPU, a memory and a disk drive, are interconnected through aninternal network11.

For example, a node ([0025]1)2 includes a computation unit (1)21 having a CPU and a memory; adisk22; and aninstall device23 such as a hard disk or a magnetic tape. The node (1)2 is connected to theinternal network11 through acommunication device12 which has a broadcast or multicast transfer function. In this way, all nodes (i.e., node (1)2, node (2)3, node (3)4, node (4)5, . . . , node (n-1)6 and node (n)7) are interconnected to theinternal network11.

The[0026]

disk

22 is an external storage device for storing a distributed program, and may be implemented by a hard disk or the like. Theinstall device23 is an external storage device for storing programs to be distributed, and may be implemented by a hard disk, a magnetic tape, an optical disk or the like. Alternatively, instead of the external storage device connected to each node, a storage region may be reserved on a memory in each node, or a program may be directly stored in a work region.

FIG. 2 is a block diagram illustrating an example in which a program to be distributed is read from the[0027]

install device

23 and installed into all the nodes through theinternal network11.Data read processing81 at the node (1)2 reads every predefined amount of the program to be distributed, todata dividing buffers90 from theinstall device23 storing the program to be distributed.Data transfer processing82 transfers the data read into the buffers to all the nodes to which the program is to be distributed, through theinternal network11. One of all the nodes is represented by the node (2)3. In the node (2)3,data reception processing111 remains in a data waiting state, for waiting data from thenetwork11. As the program data is delivered by thedata transfer processing82, thedata reception processing111 initiates to read the transferred data intodata dividing buffers120.Disk write processing112 writes the data read into the buffers into adisk33 of the node (2)3.

A master[0028]

install control program

80 including thedata read processing81 and thedata transfer processing82, and aninstall control program120 including thedata reception processing111 and thedisk write processing112 are stored in the storage device of each node. Alternatively, theinstall control program120 and the masterinstall control program80 may be stored in a storage device of a distributing node, such that the distributing node distributes theinstall control program120 to receiving nodes when the parallel computer system is powered on. In this event, the distributing node sequentially transfers the install control program to the receiving nodes.

FIG. 3 illustrates the configuration of the master[0029]

install control program

80. In the node (1)2 to which theinstall device23 storing the program to be distributed is connected, the masterinstall control program80 transfers every predefined amount of data read from theinstall device23 to all of the node (1)2, node (2)3, node (3)4, node (4)5, . . . , node (n−1)6 and node (n)7 to which the program is distributed. In the following, the masterinstall control program80 will be described in detail.

The master[0030]

install control program

80 includes thedata read processing81 for reading data from theinstall device23 which stores programs; thedata transfer processing82 for transferring the data read in thedata read processing81 to all the nodes;data dividing buffers90 each of which stores the predefined amount of data read from theinstall device23 which stores programs; and buffer management tables100 for managing thedata dividing buffers90. Each buffer in thedata dividing buffers90 has a size equivalent to that of the predefined amount read from theinstall device23.

The buffer management tables[0031]100 store information indicative of the states of the associated buffers to control thedata dividing buffers90 used in thedata read processing81 and thedata transfer processing82. The buffer management tables100 include table (1)101, table (2)102, . . . , table (m−1)103 and table (m)104 corresponding to buffer (1)91, buffer (2)92, . . . , buffer (m−1)93 and buffer (m)94 in thedata dividing buffers90.

FIG. 4 illustrates the configuration of the[0032]

install control program

110.

The[0033]

install control program

110 in each of the node (1)2, node (2)3, node (3)4, node (4)5, node (n−1)6 and node (n)7 is initiated by the masterinstall control program80.

The[0034]

install control program

110 includesdata reception processing111 for receiving data transferred from the masterinstall control program80;disk write processing112 for writing data read in thedata reception processing111 into a disk;data dividing buffers120 for storing every predefined amount of data transferred from the masterinstall control program80 and received in thedata reception processing111; and buffer management tables130 for managing thedata dividing buffers120. Each buffer in thedata dividing buffers120 has a size equivalent to that of the predefined amount read from the installdevice23.

The operation of the master install[0035]

control program

80 in the configuration of FIG. 3 will be described along flow charts illustrated in FIGS. 5, 6 and7

The flow chart in FIG. 5 illustrates the flow of processing executed by the master install[0036]

control program

80. First, the master installcontrol program80 reserves a region in thedata dividing buffers90 for storing data read from the installdevice23, and a region in the buffer management tables100 for managing the data dividing buffers90 (F50), and initializes the data division management tables100 to an initial state (F51). Next, theprogram80 initiates the installcontrol programs110 in all the nodes (F52). Finally, theprogram80 initiates the data readprocessing81 and thedata transfer processing82 in the master install control program80 (F53). In this event, the installcontrol programs110 in all the nodes are sequentially initiated in each of the nodes (F52).

Each of the receiving nodes is additionally provided with confirmation means for confirming whether a node power source is ON or OFF, and notification means for notifying a distributing node of the power-source state of the node itself. The distributing node may identify receiving nodes in which the power source is in ON state, before initiating the install[0037]

control program

110 in all the nodes (F52), to initiate the installcontrol programs110 in the operable receiving nodes (F52).

FIG. 6 is a flow chart illustrating the data read[0038]

processing

81. The data readprocessing81 is the processing for sequentially reading the predefined amount of data from the installdevice23 into buffer (1)91, buffer (2)92, . . . , buffer (m−1)93 and buffer (m)94.

The data read processing[0039]81 stores the location of the buffer (1)91 which is the head of the data dividing buffers90 (F60). Next, the data readprocessing81 finds a corresponding table in the buffer management tables100 from the stored buffer location (F61), and checks the state of the buffer from the found table (F62). A buffer may take one of the following four states: a state (a reading-in-progress state) in which the predefined amount of data is being read from the installdevice23 into the buffer; a state (a reading-completion state) in which the data has been completely read into the buffer; a state (a transfer-completion state) in which the data has been completely transferred to all the nodes; and a state in which a program has been fully read from the install device23 (End of File). These states are represented by

numerical values

0, 1, 0, −1 from the first state. It should be noted that the reading-in-progress state and the transfer-completion state are synonym. When the buffer is in the state “1,” theprocessing81 waits for an event (F63). When the buffer is in the state “0” theprocessing81 checks whether or not data still remains in the install device23 (F64). If data remains, the predefined amount of data is read from the installdevice23 into the buffer (F65). Then, the processing81 transitions the state of the buffer to the reading-completion state (F66). Theprocessing81 finds and stores the location of the next buffer (F67), and returns to F61. If no data remains in the installdevice23, theprocessing81 sets a table corresponding to the buffer location to “−1” (F68), followed by termination of the flow. The correspondence between the buffers and the tables is made by reserving arrays of the same length. If the end location of the array is reached in determining the next location, the head of the array is pointed. Also, when the state of the buffer is set to “1” (F66), and when thedata transfer processing82 is waiting for an event when the state of the buffer is set to “−1” (F68), thedata transfer processing82 is released from the event waiting state, and forced to continue the processing.

FIG. 7 is a flow chart illustrating the[0040]

data transfer processing

82. Thedata transfer processing82 is the processing for transferring data read into thedata dividing buffers90 in the data readprocessing81 to the installcontrol program110 in each of the nodes. The data transfer processing82 stores the location of the buffer (1)91 which is the head of the data dividing buffers90 (F70).

Next, the[0041]

processing

82 finds a corresponding table in the buffer management tables100 from the stored buffer location (F71), and checks the state of the buffer from the found table (F72, F74). The buffer may take one of the four states similar to the foregoing. When the buffer is in the state “0” theprocessing82 waits for an event (F73). When the buffer is in the state “−1,” theprocessing82 notifies all the nodes that the data has been fully read (the end of data is reached) (F75), followed by termination of the flow. When the buffer is in the state “1,” theprocessing82 transfers the data in the buffer to all the nodes (F76). Then, the processing82 transitions the state of the buffer to the transfer-completion state (F77), stores the location of the next buffer (F78), and returns to F71. If the end location of the array is reached in determining the next location, the head of the array is pointed in a manner similar to the foregoing. Also, when the data readprocessing81 is waiting for an event when the state of the buffer is set to “0” (F77), the data readprocessing81 is released from the event waiting state, and forced to continue the processing.

Next, the operation of the install[0042]

control program

110 in the configuration of FIG. 4 will be described along flow charts illustrated in FIGS. 8, 9 and10.

The flow chart in FIG. 8 illustrates the flow of processing executed by the install[0043]

control program

110. First, theprogram110 reserves a region in thedata dividing buffers120 for storing data transferred from thedata transfer processing82 in the master installcontrol program80, and a region in the buffer management tables130 for managing the data dividing buffers120 (F80), and initializes the data-division management tables130 (F81) into the initial state. Finally, theprogram110 initiates thedata reception processing111 and thedisk write processing112 in the install control program110 (F82).

FIG. 9 is a flow chart illustrating the[0044]

data reception processing

111. Thedata reception processing111 is the processing for sequentially receiving the predefined amount of data from thedata transfer processing82 in the master installcontrol program80 in buffer (1)121, buffer (2)122, buffer (m−1)123 and buffer (m)124.

The[0045]

data reception processing

111 stores the location of the buffer (1)121 which is the head of the data dividing buffers120 (F90).

Next, the[0046]

processing

111 finds a corresponding table in the buffer management tables130 from the stored buffer location (F91), and checks the state of the buffer from the table (F92). The buffer may take one of the following four states: a state (a reception-in-progress state) in which data is being received from thedata transfer processing82; a state (a receiving-completion state) in which data has been completely received; a state (a writing-completion state) in which data has been completely written into a disk, and a state in which the end of data has been reached. The respective states are represented by

numerical values

0, 1, 0, −1 from the first one. It should be noted that the reception-in-progress state and the writing-completion state are synonym. When the buffer is in the state “1,” theprocessing111 waits for an event (F93). When the buffer is in the state “0,” theprocessing111 checks whether or not data to be transferred from thedata transfer processing82 still remains (F94). If data remains, data transferred from thedata transfer processing82 is read into the buffer (F95). Then, the processing111 transitions the state of the buffer to the receiving-completion state (F96). Theprocessing111 finds and stores the location of the next buffer (F97), and returns to F91. If no data remains, theprocessing111 sets a table corresponding to the buffer location to “−1” (F98), followed by termination of the flow. The correspondence between the buffers and the tables is made by reserving arrays of the same length. If the end location of the array is reached in determining the next location, the head of the array is pointed. Also, when the state of the buffer is set to “−1” (F96), and when thedata write processing112 is waiting for an event when the state of the buffer is set to “−1” (F98), thedata write processing112 is released from the event waiting state, and forced to continue the processing.

FIG. 10 is a flow chart illustrating the[0047]

disk write processing

112. Thedisk write processing112 is the processing for writing data read into the data dividing buffers in thedata reception processing111 into a disk connected to the node. Thedisk write processing112 stores the location of the buffer (1)121 which is the head of the data dividing buffers120 (F100).

Next, the[0048]

processing

112 finds a corresponding table in the buffer management tables130 from the stored buffer location (F101), and checks the state of the buffer from the found table (F102, F104). The buffer may take one of the four states similar to the foregoing. When the buffer is in the state “0,” theprocessing112 waits for an event (F103). When the buffer is in the state “−1” the flow is terminated. When the buffer is in the state “1,” theprocessing112 writes the data in the buffer into a disk (Fl05). Then, the processing112 transitions the state of the buffer to the writing-completion state (F106), finds and stores the location of the next buffer (F107), and returns to F101. If the end location of the array is reached in determining the next location, the head of the array is pointed in a manner similar to the foregoing. Also, when thedata reception processing111 is waiting for an event when the state of the buffer is set to “0” (F106), thedata reception processing111 is released from the event waiting state, and forced to continue the processing.

The foregoing description has been made for the embodiment according to the present invention which distributes the executed-in-parallel program in the parallel computer system including plural nodes interconnected through the internal network. It goes without saying, however, that the present invention can be applied to a system which has plural computers connected to a network and executed in parallel by multicasting program data from a distributing node to receiving nodes.[0049]

Claims

1. A method of installing a program in a parallel computer system including plural nodes comprising:

selecting one of said plural nodes as a distributing node for delivering to remaining nodes of said plural nodes a program to be executed in parallel in said parallel computer system, and selecting said remaining nodes as receiving nodes;

providing each of said receiving nodes with an install control program which receives and stores a program from said distributing node;

causing said distributing node to initiate said install control program in each of said receiving nodes; and

broadcasting said program to be executed in parallel, from said distributing node to said receiving nodes, and causing said receiving nodes to simultaneously receive said program.

2. An installing method according toclaim 1, wherein:

said distributing node confirms whether or not each of said receiving nodes is powered on before initiating said install control program in each of said receiving nodes, and broadcasts said program to receiving nodes which are powered on.

3. An installing method according toclaim 1, further comprising:

notifying all said receiving nodes that said program to be executed in parallel has been delivered, after completing the delivery of said program.

4. An installing method according toclaim 1, wherein in broadcasting said program to be executed in parallel, following four steps are executed in parallel:

a step of causing said distributing node to read said program from a storage device, and to store the read program in a first buffer;

a step of causing said distributing node to read said program from said first buffer, and to send the read program to said receiving nodes;

a step of causing said receiving node to store said received program in a second buffer; and

a step of causing said receiving node to read a program distributed from said second buffer, and to store the distributed program in a storage device.

5. An installing method according toclaim 4, wherein said distributing node reads a program distributed in data block units of said storage device, and stores the read program in said first buffer.

6. An installing method according toclaim 4., wherein said receiving node reads said received program from said second buffer in data block units of said storage device of said receiving node itself, and stores the read program in said storage device of said receiving node itself.

7. An installing method according toclaim 4, wherein said distributing node divides said first buffer by a block size of a data block in said storage device, and manages said first buffer as a ring buffer using a first buffer management table.

8. An installing method according toclaim 7, wherein

said first buffer management table stores a three-value buffer state indicative of a state in which a data block is being read or has been completely transmitted to said receiving nodes, a state in which said data block has been completely read, or a state in which all data has been fully read, to manage said first buffer in blocks.

9. An installing method according toclaim 4, wherein said receiving node divides said second buffer by a block size of a data block in the storage device to manages said second buffer as a ring buffer using a second buffer management table.

10. An installing method according toclaim 9, wherein

said second buffer management table stores a three-value buffer state indicative of a state in which data is being received or has been completely written into a disk, or a state in which said data has been completely received, or a state in which all data has been completely written, to manage said second buffer in blocks.

11. A parallel computer system including plural nodes, comprising:

a storage device for storing a program which is executed in parallel in said plural nodes of said parallel computer system;

a distributing node being one of said plural nodes, and for delivering said program to remaining nodes of said plural nodes; and

receiving nodes for receiving said program delivered from said distributing node,

wherein said distributing node includes a master install control unit, said master install control unit notifying said receiving nodes, which receive said program, of delivery of said program before delivering said program, reading said program to be delivered from said storage device, and delivering said program simultaneously to said receiving nodes; and

said receiving node includes an install control unit, said install control unit receiving said program from said distributing node, and storing said program in a storage device of said receiving node itself.

12. A parallel computer system according toclaim 11, wherein

said master install control unit includes a first buffer, said first buffer temporarily holding said program to be delivered which is read from said storage device; and

said master install control unit performs in parallel an operation involved in reading said program to be delivered from said storage device and writing said program to be delivered in said first buffer, and an operation involved in reading said program to be delivered from said first buffer and transferring data to said receiving nodes.

13. A parallel computer system according toclaim 12, wherein

said master install control unit includes a first buffer management table, said first buffer management table managing said first buffer for each data block size of said storage unit; and

said first buffer management table stores a three-value buffer state indicative of a state in which a data block is being read or has been completely transmitted to said receiving nodes, a state in which said data block has been completely read, or a state in which all data has been completely read.

14. A parallel computer system according toclaim 11, wherein

said install control unit includes a second buffer, said second buffer temporarily buffering said program delivered from said distributing node; and

said install control unit performs in parallel an operation involved in receiving said program from said distributing node and writing said program in said second buffer, and an operation involved in reading said program delivered from said second buffer and writing said program in said storage device of said receiving node itself.

15. A parallel computer system according toclaim 14, wherein

said install control unit includes a second buffer management table, said second buffer management table managing said second buffer for each data block size of said storage unit; and

said second buffer management table stores a three-value buffer state indicative of a state in which said program to be delivered is being received or has been completely written into said disk, a state in which said program to be delivered has been completely received, or a state in which all data has been completely written.

16. A parallel computer system according toclaim 11, wherein

said master install control unit includes a power source control unit, said power source control unit checking whether or not said receiving node is powered on before delivering said program.

17. A medium storing a computer executable installing program for a parallel computer system, said computer executable installing program comprising:

a step of causing a distributing node to initiate an install control program in plural receiving nodes which receive a program;

a step of causing said distributing node to read a program to be delivered in order to be executed in each node, from a storage device of said distributing node itself, and to store the read program in a first buffer;

a step of causing said receiving node to store a received program in a second buffer;

a step of causing said receiving node to read a distributed program from said second buffer, and to store the read program in a storage device of said receiving node itself; and

a step of causing said distributing node to notify all said receiving nodes of completion of program delivery after completing delivery of said program.