CN104077188A

Movatterモバイル変換

Info

Publication number: CN104077188A
Application number: CN201310109076.3A
Authority: CN
Inventors: 刘彦; 戚威
Original assignee: Siemens Corp
Current assignee: Siemens Corp
Priority date: 2013-03-29
Filing date: 2013-03-29
Publication date: 2014-10-01

Abstract

本发明涉及一种用于任务调度的方法和装置，该装置包括：决策模块，用于当接收到计算集群中的计算节点发送的请求分配具有指定类型的任务的消息时，根据所存储的发送所述消息的计算节点处理各个任务所花费的处理时间的信息，从待处理的任务的至少一部分任务中确定属于所述指定类型且发送所述消息的计算节点以前处理其所花费的处理时间最小的任务；以及，分配模块，用于将所述确定的任务分配给发送所述消息的计算节点进行处理。利用该方法和装置，能够提高分布式并行计算的总体性能。

The present invention relates to a method and device for task scheduling. The device includes: a decision-making module, which is used for sending a message according to the stored task when receiving a message requesting to allocate a task of a specified type sent by a computing node in a computing cluster. information on the processing time spent by the computing node of the message processing each task, and determine from at least a part of the tasks to be processed that belong to the specified type and that the computing node that sent the message has spent the smallest processing time before processing it and an allocation module, configured to allocate the determined task to the computing node sending the message for processing. By using the method and device, the overall performance of distributed parallel computing can be improved.

Description

Translated fromChinese

一种用于任务调度的方法和装置A method and device for task scheduling

技术领域technical field

本发明涉及分布式并行计算领域，尤其涉及一种用于任务调度的方法和装置。The invention relates to the field of distributed parallel computing, in particular to a method and device for task scheduling.

背景技术Background technique

MapReduce是谷歌公司（Google）提出的一种新型和有效的分布式并行计算架构，其成为了在云计算时代的最广泛使用的架构。MapReduce架构被设计用于在异质（heterogeneous）计算机集群中进行并行计算，以改善并行计算的总体计算性能。MapReduce is a new and effective distributed parallel computing architecture proposed by Google (Google), which has become the most widely used architecture in the era of cloud computing. The MapReduce architecture is designed for parallel computing in heterogeneous computer clusters to improve the overall computing performance of parallel computing.

在MapReduce架构中，每一个工作（Job）被划分成能在计算集群的多个计算节点上并行运行的多个任务（task），这些任务按照其类型可以分为映射（Map）任务和归约（Reduce）任务。In the MapReduce architecture, each job (Job) is divided into multiple tasks (tasks) that can run in parallel on multiple computing nodes of the computing cluster. These tasks can be divided into mapping (Map) tasks and reduction tasks according to their types. (Reduce) tasks.

对于每一个工作的各个任务，哪一个任务由计算集群中的哪一个计算节点来处理并且什么时候进行处理由任务调度（task scheduling）来确定。因此，任务调度在并行计算中很重要，其能影响并行计算的总体性能。For each task of each job, which task is processed by which computing node in the computing cluster and when it is processed is determined by task scheduling. Therefore, task scheduling is very important in parallel computing, which can affect the overall performance of parallel computing.

发明内容Contents of the invention

本发明的实施例提出一种用于任务调度的方法和装置，其能够提高分布式并行计算的总体性能。Embodiments of the present invention provide a method and device for task scheduling, which can improve the overall performance of distributed parallel computing.

按照本发明实施例的一种用于任务调度的方法，包括：当接收到计算集群中的计算节点发送的请求分配具有指定类型的任务的消息时，根据所存储的发送所述消息的计算节点处理各个任务所花费的处理时间的信息，从待处理的任务的至少一部分任务中确定属于所述指定类型且发送所述消息的计算节点以前处理其所花费的处理时间最小的任务；以及，将所述确定的任务分配给发送所述消息的计算节点进行处理。A method for task scheduling according to an embodiment of the present invention, comprising: when receiving a message requesting assignment of a task of a specified type sent by a computing node in a computing cluster, according to the stored computing node sending the message, processing the information of the processing time spent on each task, determining a task belonging to the specified type and having the minimum processing time previously processed by the computing node sending the message from at least a part of the tasks to be processed; and, The determined task is assigned to the computing node that sends the message for processing.

在一种具体实现中，所述确定步骤包括：根据所存储的所述计算集群中的各个计算节点处理所述待处理的任务所花费的处理时间的信息，统计所述待处理的任务各自的最小处理时间；从所述待处理的任务中，查找出其最小处理时间大于第一指定阈值的任务；以及，根据所存储的关于发送所述消息的计算节点处理各个任务所花费的处理时间的信息，从所述查找的任务中检索出属于所述指定类型且发送所述消息的计算节点以前处理其所花费的处理时间最小的任务，作为所述确定的任务。In a specific implementation, the determining step includes: according to the stored information on the processing time spent by each computing node in the computing cluster to process the task to be processed, count the respective The minimum processing time; from the tasks to be processed, find out the tasks whose minimum processing time is greater than the first specified threshold; and, according to the stored processing time spent on processing each task by the computing node sending the message information, and retrieve a task belonging to the specified type from the searched tasks and which has been processed by the computing node sending the message with the smallest processing time before, as the determined task.

其中，所述第一指定阈值是基于所述待处理的任务的最小处理时间确定的。Wherein, the first specified threshold is determined based on the minimum processing time of the tasks to be processed.

在另一种具体实现中，所述确定步骤包括：根据所存储的所述计算集群中的各个计算节点处理所述待处理的任务所花费的处理时间的信息，统计所述待处理的任务各自的最小处理时间；从所述待处理的任务中，查找出其属于所述指定类型且其最小处理时间大于第二指定阈值的任务；以及，根据所存储的关于发送所述消息的计算节点处理各个任务所花费的处理时间的信息，从所述查找的任务中检索出发送所述消息的计算节点以前处理其所花费的处理时间最小的任务，作为所述确定的任务。In another specific implementation, the determining step includes: according to the stored information on the processing time spent by each computing node in the computing cluster to process the task to be processed, making statistics on each of the tasks to be processed The minimum processing time; from the tasks to be processed, find out the tasks that belong to the specified type and whose minimum processing time is greater than the second specified threshold; and process the message according to the stored computing node sending the message For the information on the processing time spent by each task, retrieve the task with the smallest processing time before the computing node sending the message from the searched tasks as the determined task.

其中，所述第二指定阈值是基于所述待处理的任务中的特定任务的最小处理时间确定的，其中所述特定任务是属于所述指定类型的任务。Wherein, the second specified threshold is determined based on the minimum processing time of a specific task among the tasks to be processed, wherein the specific task is a task belonging to the specified type.

在一种具体实现中，所述还包括：在从所述计算集群中的计算节点接收到关于其处理一个任务所花费的处理时间的信息之后，存储所接收的信息。In a specific implementation, the method further includes: after receiving information about the processing time it takes to process a task from the computing nodes in the computing cluster, storing the received information.

按照本发明实施例的一种用于任务调度的装置，包括：决策模块，用于当接收到计算集群中的计算节点发送的请求分配具有指定类型的任务的消息时，根据所存储的发送所述消息的计算节点处理各个任务所花费的处理时间的信息，从待处理的任务的至少一部分任务中确定属于所述指定类型且发送所述消息的计算节点以前处理其所花费的处理时间最小的任务；以及，分配模块，用于将所述确定的任务分配给发送所述消息的计算节点进行处理。An apparatus for task scheduling according to an embodiment of the present invention includes: a decision-making module, configured to, when receiving a message requesting assignment of a task of a specified type sent by a computing node in a computing cluster, send the task according to the stored Information about the processing time spent by the computing node of the message processing each task, and determine from at least a part of the tasks to be processed that belong to the specified type and that the computing node that sent the message previously spent the least processing time on it a task; and an allocation module, configured to allocate the determined task to the computing node sending the message for processing.

在一种具体实现中，所述决策模块包括：第一统计模块，用于根据所存储的所述计算集群中的各个计算节点处理所述待处理的任务所花费的处理时间的信息，统计所述待处理的任务各自的最小处理时间；第一查找模块，用于从所述待处理的任务中，查找出最小处理时间大于第一指定阈值的任务；以及，第一检索模块，用于根据所存储的关于发送所述消息的计算节点处理各个任务所花费的处理时间的信息，从所述查找的任务中检索出其属于所述指定类型且发送所述消息的计算节点以前处理其所花费的处理时间最小的任务，作为所述确定的任务。In a specific implementation, the decision-making module includes: a first statistics module, configured to, according to the stored information on the processing time spent by each computing node in the computing cluster to process the task to be processed, count all The minimum processing time of each of the tasks to be processed; the first search module is used to find out the tasks whose minimum processing time is greater than the first specified threshold from the tasks to be processed; and the first retrieval module is used to according to Stored information about the processing time spent by the computing node that sent the message processing each task, retrieving from the searched tasks that belong to the specified type and that the computing node that sent the message previously spent processing it The task whose processing time is the smallest is the identified task.

在另一种具体实现中，所述决策模块包括：第二统计模块，用于根据所存储的关于所述计算集群中的各个计算节点处理所述待处理的任务所花费的处理时间的信息，统计所述待处理的任务各自的最小处理时间；第二查找模块，用于从所述待处理的任务中，查找出属于所述指定类型且其最小处理时间大于第二指定阈值的任务；以及，第二检索模块，用于根据所存储的关于发送所述消息的计算节点处理各个任务所花费的处理时间的信息，从所述查找的任务中检索出发送所述消息的计算节点以前处理其所花费的处理时间最小的任务，作为所述确定的任务。In another specific implementation, the decision-making module includes: a second statistics module, configured to, according to the stored information about the processing time spent by each computing node in the computing cluster to process the task to be processed, Counting the respective minimum processing times of the tasks to be processed; a second search module, configured to find, from the tasks to be processed, tasks that belong to the specified type and whose minimum processing time is greater than a second specified threshold; and , a second retrieval module, configured to retrieve from the searched tasks the computing node that sent the message before processing the tasks according to the stored information about the processing time spent by the computing node that sent the message The task that takes the least processing time is taken as the determined task.

在一种具体实现中，所述装置还包括：收集模块，用于在从所述计算集群中的计算节点接收到关于其处理某个任务所花费的处理时间的信息之后，存储所接收的信息。In a specific implementation, the device further includes: a collection module, configured to store the received information after receiving information about the processing time it takes to process a certain task from the computing nodes in the computing cluster .

从上面的描述可以看出，本发明实施例的方案根据所存储的关于各个计算节点处理各个任务所花费的处理时间的信息，来确定其以前处理待处理的任务所花费的处理时间最小的计算节点作为用于处理该待处理的任务的计算节点，从而使得对每一个待处理的任务的处理尽可能地在较短时间内完成，因而提高了分布式并行计算的总体性能。It can be seen from the above description that the solution of the embodiment of the present invention determines the computing node that takes the least processing time to process the task to be processed according to the stored information about the processing time spent by each computing node to process each task. The node serves as a computing node for processing the task to be processed, so that the processing of each task to be processed can be completed in a short time as much as possible, thereby improving the overall performance of distributed parallel computing.

附图说明Description of drawings

本发明的其它特征、特点、优点和益处通过以下结合附图的详细描述将变得更加显而易见。Other features, features, advantages and benefits of the present invention will become more apparent from the following detailed description in conjunction with the accompanying drawings.

图1示出了按照本发明一个实施例的并行计算系统的架构示意图。Fig. 1 shows a schematic diagram of the architecture of a parallel computing system according to an embodiment of the present invention.

图2示出了按照本发明一个实施例的用于任务调度的方法的流程图。Fig. 2 shows a flowchart of a method for task scheduling according to an embodiment of the present invention.

图3A示出了按照本发明一个实施例的用于任务调度的装置的示意图。Fig. 3A shows a schematic diagram of an apparatus for task scheduling according to an embodiment of the present invention.

图3B示出了按照本发明一个实施例的决策模块的示意图。Fig. 3B shows a schematic diagram of a decision module according to an embodiment of the present invention.

图3C示出了按照本发明另一实施例的决策模块的示意图。Fig. 3C shows a schematic diagram of a decision module according to another embodiment of the present invention.

图4示出了按照本发明一个实施例的用于任务调度的设备的示意图。Fig. 4 shows a schematic diagram of a device for task scheduling according to an embodiment of the present invention.

具体实施方式Detailed ways

在实际中，有一些计算工作通常是反复执行的，例如系统日志、操作日志和呼叫详细记录的分析统计工作通常一天、一个月或一个季度等要执行一次。在并行计算的架构下执行这些反复执行的工作时，这些反复执行的工作所划分的各个任务通常也是反复地分配给计算集群中的各个计算节点处理的。本发明实施例的方案收集和存储关于计算集群中的各个计算节点处理各个任务所花费的处理时间的信息，然后在接收到空闲计算节点发送的请求分配具有指定类型的任务的消息时根据所收集的关于该空闲的计算节点处理各个任务所花费的处理时间的信息，从待处理的任务中选取该空闲的计算节点以前处理其所花费的处理时间最少的任务并把该选取的任务分配给该空闲的计算节点来执行处理，从而提高分布式并行计算的总体性能。In practice, some computing work is usually performed repeatedly, such as the analysis and statistics of system logs, operation logs and call detailed records, which are usually performed once a day, a month or a quarter. When these repetitive tasks are executed under the parallel computing architecture, each task divided by these repetitive tasks is usually repeatedly assigned to each computing node in the computing cluster for processing. The solution of the embodiment of the present invention collects and stores information about the processing time spent by each computing node in the computing cluster to process each task, and then when receiving a message from an idle computing node requesting to allocate a task of a specified type, according to the collected Information about the processing time spent by the idle computing node processing each task, select the task that the idle computing node has previously spent the least processing time on from the tasks to be processed, and assign the selected task to the Idle computing nodes are used to perform processing, thereby improving the overall performance of distributed parallel computing.

下面，将结合附图详细本发明的各个实施例。In the following, various embodiments of the present invention will be described in detail with reference to the accompanying drawings.

现在参见图1，其示出了按照本发明一个实施例的并行计算系统的架构示意图。如图1所示，并行计算系统10是属于Mapreduce架构的系统，其包括多个计算节点100和主控节点200，其中各个计算节点100可以通信连接到主控节点200。该多个计算节点100形成了并行计算系统10的计算集群。Referring now to FIG. 1 , it shows a schematic diagram of the architecture of a parallel computing system according to an embodiment of the present invention. As shown in FIG. 1 , the parallel computing system 10 is a system belonging to the Mapreduce architecture, which includes a plurality of computing nodes 100 and a master control node 200 , wherein each computing node 100 can be connected to the master control node 200 by communication. The plurality of computing nodes 100 forms a computing cluster of the parallel computing system 10 .

计算节点100可以是任何具有计算能力的设备，例如但不局限于台式计算机、服务器、笔记本电脑等。计算节点100可以具有处理映射（Map）类型的任务和归约（Reduce）类型的任务中的一种或多种的能力。当计算节点100有空闲时间能够处理某一类型（映射或归纳）T的任务时，计算节点100可以向主控节点200发送请求分配具有类型T的任务的消息，以请求主控节点200分配具有类型T的任务进行处理。当从主控节点200接收到一个分配给其处理的类型T的任务时，计算节点100处理该分配的任务，并在完成该任务的处理之后向主控节点200发送关于其处理该任务所花费的处理时间的信息。Computing node 100 may be any device with computing capability, such as but not limited to desktop computer, server, notebook computer and so on. The computing node 100 may have the ability to process one or more of Map (Map) type tasks and Reduce (Reduce) type tasks. When the computing node 100 has free time to process a task of a certain type (mapping or induction), the computing node 100 can send a message requesting to allocate a task of type T to the master control node 200, so as to request the master control node 200 to allocate a task with Tasks of type T are processed. When receiving a task of type T assigned to it to process from the master control node 200, the computing node 100 processes the assigned task, and after completing the processing of the task, sends to the master control node 200 the processing time information.

主控节点200可以是任何具有计算能力的设备，例如但不局限于台式计算机、服务器、笔记本电脑等。主控节点200收集和存储关于各个计算节点100处理各个任务所花费的处理时间的信息，在初始阶段，主控节点200例如可以尽量将同一任务分配给各个计算节点100进行处理，以便能收集到关于各个计算节点100之前处理各个任务所花费的处理时间的信息。当接收到来自计算集群中的任一计算节点100的请求分配指定类型的任务的消息时，主控节点200可以根据所存储的关于该任一计算节点100处理各个任务所花费的处理时间的信息，从待处理的任务的至少一部分任务中确定其属于该指定类型且该任一计算节点100以前处理其所花费的处理时间最小的任务，并将所确定的任务分配给该任一计算节点100进行处理。The master control node 200 may be any device with computing capability, such as but not limited to a desktop computer, a server, a notebook computer, and the like. The master control node 200 collects and stores information about the processing time spent by each computing node 100 to process each task. In the initial stage, the master control node 200 can, for example, try to assign the same task to each computing node 100 for processing, so as to collect Information about the processing time that each computing node 100 previously spent processing each task. When receiving a message from any computing node 100 in the computing cluster requesting to allocate a task of a specified type, the master control node 200 can process the tasks according to the stored information about the processing time spent by any computing node 100 , determining from at least a part of the tasks to be processed which belong to the specified type and which any computing node 100 has previously processed with the minimum processing time, and assigning the determined task to the any computing node 100 to process.

现在参见图2，其示出了按照本发明一个实施例的用于任务调度的方法的流程图。如图2所示，在步骤S200，当计算集群中的任一计算节点100i有空闲资源处理某一类型T的任务时，计算节点100i向主控节点200发送请求分配具有类型T的任务的消息。其中，类型T可以是映射任务或归纳任务。Referring now to FIG. 2 , it shows a flowchart of a method for task scheduling according to an embodiment of the present invention. As shown in Figure 2, in step S200, when any computing node 100i in the computing cluster has free resources to process a task of a certain type T, the computing node 100i sends a message requesting to allocate a task of type T to the master control node 200 . Among them, the type T can be a mapping task or an inductive task.

在步骤S204，在接收到来自计算节点100i的请求分配具有类型T的任务的消息之后，主控节点200统计待处理的任务K各自的最小处理时间。例如，可以从所存储的各个计算节点100处理各个任务所花费的处理时间中，查找各个计算节点100处理待处理的任务K的每一个所花费的处理时间中最小的处理时间，作为待处理的任务K的每一个的最小处理时间。In step S204, after receiving the message requesting to allocate a task of type T from the computing node 100i, the master control node 200 counts the respective minimum processing time of the tasks K to be processed. For example, from the stored processing time spent by each computing node 100 to process each task, the minimum processing time among the processing times spent by each computing node 100 to process each task K to be processed may be found, and used as the processing time to be processed The minimum processing time for each of task K.

在步骤S208，主控节点200计算待处理的任务K各自的最小处理时间中的最大的最小处理时间和最小的最小处理时间的加权平均值，作为指定阈值YZ。In step S208, the master control node 200 calculates the weighted average of the largest minimum processing time and the smallest minimum processing time among the respective minimum processing times of tasks K to be processed, as the specified threshold YZ.

在步骤S212，主控节点200从待处理的任务K中查找出其最小处理时间大于指定阈值YZ的任务。In step S212, the master control node 200 finds out the tasks whose minimum processing time is greater than the specified threshold YZ from the tasks K to be processed.

在步骤S216，主控节点200从查找的任务中搜索出类型属于类型T的任务。In step S216, the master control node 200 searches for tasks whose type belongs to type T from the searched tasks.

在步骤S220，主控节点200根据所存储的关于计算节点100i处理各个任务所花费的处理时间的信息，从所搜索的任务中检索出计算节点100i以前处理其花费的处理时间最小的任务Ki。也就是说，在所搜索的任务中，计算节点100i以前处理任务Ki所花费的处理时间是最小的。In step S220, the master control node 200 retrieves from the searched tasks Ki the task Ki that the computing node 100i previously spent the smallest processing time on processing according to the stored information about the processing time spent by the computing node 100i on processing each task. That is to say, among the searched tasks, the processing time spent by the computing node 100i to process the task Ki before is the smallest.

在步骤S224，主控节点200把所检索的任务Ki分配给计算节点100i。In step S224, the master control node 200 assigns the retrieved task Ki to the computing node 100i.

在步骤S228，在从主控节点200接收到所分配的任务Ki之后，计算节点100i处理任务Ki。In step S228, after receiving the assigned task Ki from the master control node 200, the computing node 100i processes the task Ki.

在步骤S232，在完成任务Ki的处理之后，计算节点100i向主控节点200发送关于计算节点100i处理任务Ki所花费的处理时间的信息。In step S232, after completing the processing of the task Ki, the computation node 100i sends information about the processing time spent by the computation node 100i to process the task Ki to the master control node 200.

在步骤S236，主控节点200存储所接收的关于计算节点100i处理任务Ki所花费的处理时间的信息，以更新以前所存储的关于计算节点100i处理任务Ki所花费的处理时间的信息。In step S236, the master control node 200 stores the received information about the processing time spent by the computing node 100i processing the task Ki to update the previously stored information about the processing time spent by the computing node 100i processing the task Ki.

从以上的描述可以看出，本实施例的方案存储各个计算节点处理各个任务所花费的处理时间，然后在出现空闲的计算节点时根据所存储的该空闲的计算节点处理各个任务所花费的处理时间，从待处理的任务中选取该空闲的计算节点以前处理其所花费的处理时间最少的任务并把所选取的任务分配给该空闲的计算节点来处理，从而使得每一个任务尽可能地在较短时间内被处理，因而提高了分布式并行计算的总体性能。It can be seen from the above description that the solution of this embodiment stores the processing time spent by each computing node to process each task, and then when there is an idle computing node, according to the stored processing time spent by the idle computing node to process each task time, from the tasks to be processed, select the task that the idle computing node has spent the least processing time before and assign the selected task to the idle computing node for processing, so that each task can be processed as quickly as possible It is processed in a shorter time, thus improving the overall performance of distributed parallel computing.

其它变型other variants

本领域技术人员应当理解，虽然在上面的实施例中，指定阈值YZ是待处理的任务K各自的最小处理时间中的最大的最小处理时间和最小的最小处理时间的加权平均值，然而，本发明并不局限于此。在本发明的其它一些实施例中，也可以利用其它任何合适的方式基于待处理的任务K的最小处理时间来计算指定阈值YZ。例如，可以计算待处理的任务K各自的最小处理时间的加权平均值作为指定阈值YZ。又例如，可以使用待处理的任务K中其最小处理时间处于中间位置的任务的最小处理时间作为指定阈值YZ。Those skilled in the art should understand that although in the above embodiment, the specified threshold YZ is the weighted average of the largest minimum processing time and the smallest minimum processing time among the respective minimum processing times of tasks K to be processed, however, this The invention is not limited thereto. In some other embodiments of the present invention, any other suitable manner may also be used to calculate the specified threshold YZ based on the minimum processing time of the task K to be processed. For example, the weighted average of the minimum processing times of the tasks K to be processed may be calculated as the specified threshold YZ. For another example, the minimum processing time of a task whose minimum processing time is in the middle among the tasks K to be processed may be used as the specified threshold YZ.

本领域技术人员应当理解，虽然在上面的实施例中，在步骤S208-S220，主控节点200首先计算待处理的任务K各自的最小处理时间中的最大的最小处理时间和最小的最小处理时间的加权平均值作为指定阈值YZ，然后从待处理的任务K中查找出其最小处理时间大于指定阈值YZ的任务，接着从查找的任务中搜索出其类型属于类型T的任务，最后从所搜索的任务中检索出计算节点100i以前处理其花费的处理时间最小的任务Ki，然而，本发明并不局限于此。Those skilled in the art should understand that, although in the above embodiment, in steps S208-S220, the master control node 200 first calculates the largest minimum processing time and the smallest minimum processing time among the respective minimum processing times of tasks K to be processed The weighted average value of is used as the specified threshold YZ, and then find out the tasks whose minimum processing time is greater than the specified threshold YZ from the pending tasks K, and then search for tasks whose type belongs to the type T from the searched tasks, and finally from the searched tasks Among the tasks, the task Ki that takes the least processing time to be processed by the computing node 100i before is retrieved, however, the present invention is not limited thereto.

在本发明的其它一些实施例中，也可以使用以下步骤来替代步骤S208-S220：主控节点200首先从待处理的任务K中搜索出其类型属于类型T的任务KP，然后计算所搜索的任务KP的最小处理时间中的最大的最小处理时间和最小的最小处理时间的加权平均值作为指定阈值TH，接着从所搜索的任务KP中查找出其最小处理时间大于指定阈值TH的任务KPP，最后从所查找的任务KPP中检索出计算节点100i以前处理其花费的处理时间最小的任务Ki，作为分配给计算节点100i处理的任务。In some other embodiments of the present invention, the following steps can also be used instead of steps S208-S220: the master control node 200 first searches for tasks KP whose type belongs to type T from the tasks K to be processed, and then calculates the searched The weighted average of the maximum minimum processing time and the minimum minimum processing time in the minimum processing time of the task KP is used as the specified threshold TH, and then the task KPP whose minimum processing time is greater than the specified threshold TH is found from the searched tasks KP, Finally, from the searched tasks KPP, the task Ki that takes the least processing time to be processed by the computing node 100i before is retrieved as the task assigned to the computing node 100i for processing.

这里，还可以利用其它任何合适的方式基于所搜索的任务KP的最小处理时间来计算指定阈值TH。例如，可以计算所搜索的任务KP的最小处理时间的加权平均值作为指定阈值TH。又例如，可以使用所搜索的任务KP中其最小处理时间处于中间位置的任务的最小处理时间作为指定阈值TH。Here, the specified threshold TH can also be calculated based on the searched minimum processing time of the task KP in any other suitable manner. For example, a weighted average of the minimum processing times of the searched tasks KP may be calculated as the specified threshold TH. For another example, the minimum processing time of a task whose minimum processing time is at an intermediate position among the searched tasks KP may be used as the designated threshold TH.

本领域技术人员应当理解，虽然在上面的实施例中，从待处理的任务K中的其最小处理时间大于指定阈值的任务中选取合适的任务分配给空闲的计算节点100i，然而，本发明并不局限于此。在本发明的其它一些实施例中，也可以从待处理的任务K的所有任务中选取合适的任务分配给空闲的计算节点100i。Those skilled in the art should understand that although in the above embodiment, an appropriate task is selected from among the tasks K to be processed and whose minimum processing time is greater than a specified threshold, and assigned to the idle computing node 100i, however, the present invention does not It is not limited to this. In some other embodiments of the present invention, an appropriate task may also be selected from all tasks of the task K to be processed and assigned to the idle computing node 100i.

本领域技术人员应当理解，虽然在上面的实施例中，任务被分成映射类型和归约类型，然而，本发明并不局限于此。在本发明的其它一些实施例中，任务的类型可以根据需要任意划分。Those skilled in the art should understand that although in the above embodiments, tasks are divided into mapping type and reduction type, the present invention is not limited thereto. In some other embodiments of the present invention, the types of tasks can be arbitrarily divided according to needs.

本领域技术人员应当理解，虽然在上面的实施例中，并行计算系统10是属于Mapreduce架构的系统，然而，本发明并不局限于此。在本发明的其它一些实施例中，并行计算系统10可以是任何将任务划分成多个子任务并分配给计算集群中的各个计算节点处理的并行计算系统。Those skilled in the art should understand that although in the above embodiments, the parallel computing system 10 is a system belonging to the Mapreduce architecture, the present invention is not limited thereto. In some other embodiments of the present invention, the parallel computing system 10 may be any parallel computing system that divides a task into multiple subtasks and distributes them to each computing node in the computing cluster for processing.

现在参见图3A，其示出了按照本发明一个实施例的用于任务调度的装置的示意图。图3A所示的装置可以安装在主控节点200中，并且可以利用软件、硬件（例如集成电路或FPGA等）或软硬件结合的方式来实现。Referring now to FIG. 3A , it shows a schematic diagram of an apparatus for task scheduling according to an embodiment of the present invention. The device shown in FIG. 3A can be installed in the master control node 200, and can be realized by software, hardware (such as an integrated circuit or FPGA, etc.) or a combination of software and hardware.

如图3A所示，用于任务调度的装置300可以包括决策模块310和分配模块320。其中，决策模块310用于用于当接收到计算集群中的计算节点发送的请求分配具有指定类型的任务的消息时，根据所存储的关于发送所述消息的计算节点处理各个任务所花费的处理时间的信息，从待处理的任务的至少一部分任务中确定其属于所述指定类型且发送所述消息的计算节点以前处理其所花费的处理时间最小的任务。分配模块320用于用于将所述确定的任务分配给发送所述消息的计算节点进行处理。As shown in FIG. 3A , an apparatus 300 for task scheduling may include a decision module 310 and an assignment module 320 . Wherein, the decision-making module 310 is configured to, when receiving a message requesting assignment of a task of a specified type sent by a computing node in the computing cluster, process each task according to the stored information about the computing node sending the message. time information, and determine from at least a part of the tasks to be processed which belong to the specified type and which have been processed by the computing node sending the message with the smallest processing time before. The assignment module 320 is configured to assign the determined task to the computing node sending the message for processing.

其中，在一种具体实现方式中，如图3B所示，决策模块310可以包括第一统计模块311、第一查找模块312和第一检索模块313。其中，第一估计模块311用于根据所存储的关于所述计算集群中的各个计算节点处理所述待处理的任务所花费的处理时间的信息，统计所述待处理的任务各自的最小处理时间。第一查找模块312用于从所述待处理的任务中，查找出其最小处理时间大于第一指定阈值的任务。第一检索模块313用于根据所存储的关于发送所述消息的计算节点处理各个任务所花费的处理时间的信息，从所述查找的任务中检索出其属于所述指定类型且发送所述消息的计算节点以前处理其所花费的处理时间最小的任务，作为所述确定的任务。Wherein, in a specific implementation manner, as shown in FIG. 3B , the decision-making module 310 may include a first statistics module 311 , a first search module 312 and a first retrieval module 313 . Wherein, the first estimating module 311 is configured to count the minimum processing time of each of the tasks to be processed according to the stored information about the processing time spent by each computing node in the computing cluster to process the tasks to be processed . The first search module 312 is configured to find out the tasks whose minimum processing time is greater than a first specified threshold from the tasks to be processed. The first retrieval module 313 is configured to retrieve from the searched tasks that belong to the specified type and send the message according to the stored information about the processing time spent by the computing node that sends the message to process each task The computing node previously processed the task whose processing time was the smallest, as the determined task.

其中，所述第一指定阈值可以是基于所述待处理的任务的最小处理时间确定的。Wherein, the first specified threshold may be determined based on the minimum processing time of the task to be processed.

在另一具体实现方式中，如图3C所示，决策模块310可以包括第二统计模块315、第二查找模块316和第二检索模块317。第二统计模块315用于根据所存储的关于所述计算集群中的各个计算节点处理所述待处理的任务所花费的处理时间的信息，统计所述待处理的任务各自的最小处理时间。第二查找模块316用于从所述待处理的任务中，查找出其属于所述指定类型且其最小处理时间大于第二指定阈值的任务。第二检索模块317用于根据所存储的关于发送所述消息的计算节点处理各个任务所花费的处理时间的信息，从所述查找的任务中检索出发送所述消息的计算节点以前处理其所花费的处理时间最小的任务，作为所述确定的任务。In another specific implementation manner, as shown in FIG. 3C , the decision-making module 310 may include a second statistics module 315 , a second search module 316 and a second retrieval module 317 . The second statistics module 315 is configured to count the minimum processing time of each of the tasks to be processed according to the stored information about the processing time spent by each computing node in the computing cluster to process the tasks to be processed. The second searching module 316 is configured to find, from the pending tasks, tasks that belong to the specified type and whose minimum processing time is greater than a second specified threshold. The second retrieval module 317 is configured to, according to the stored information about the processing time spent by the computing node that sent the message to process each task, retrieve from the searched tasks the computing node that sent the message previously processed the task. The task that takes the least processing time is taken as the determined task.

其中，所述第二指定阈值可以是基于所述待处理的任务中的特定任务的最小处理时间确定的，其中所述特定任务是属于所述指定类型的任务。Wherein, the second specified threshold may be determined based on the minimum processing time of a specific task among the tasks to be processed, wherein the specific task is a task belonging to the specified type.

其中，装置300还可以包括收集模块330，用于在从所述计算集群中的计算节点接收到关于其处理某个任务所花费的处理时间的信息之后，存储所接收的信息。Wherein, the apparatus 300 may further include a collection module 330, configured to store the received information after receiving information about the processing time it takes to process a certain task from the computing nodes in the computing cluster.

现在参见图4，其示出了按照本发明一个实施例的用于任务调度的设备的示意图。如图4所示，设备400可以包括用于存储可执行指令的存储器410和处理器420。其中，处理器420根据存储器410所存储的可执行指令，执行装置300的各个模块所执行的操作。Referring now to FIG. 4 , it shows a schematic diagram of a device for task scheduling according to an embodiment of the present invention. As shown in FIG. 4, device 400 may include a memory 410 and a processor 420 for storing executable instructions. Wherein, the processor 420 executes the operations executed by each module of the device 300 according to the executable instructions stored in the memory 410 .

本发明实施例还提供一种计算节点可读介质，其上存储可执行指令，当该可执行指令被执行时，使得机器执行处理器420所执行的操作。The embodiment of the present invention also provides a computing node readable medium on which executable instructions are stored, and when the executable instructions are executed, the machine executes the operations executed by the processor 420 .

本领域技术人员应当理解，上面公开的各个实施例可以在不偏离发明实质的情况下做出各种变形和修改。因此，本发明的保护范围应当由所附的权利要求书来限定。Those skilled in the art should understand that various variations and modifications can be made to the above-disclosed embodiments without departing from the essence of the invention. Therefore, the protection scope of the present invention should be defined by the appended claims.