CN106155799B

Movatterモバイル変換

Info

Publication number: CN106155799B
Application number: CN201610628188.3A
Authority: CN
Inventors: 裴颂文; 王金恺
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2016-08-03
Filing date: 2016-08-03
Publication date: 2019-07-23
Anticipated expiration: 2036-08-03
Also published as: CN106155799A

Abstract

The present invention relates to a kind of Codelet dispatching method based on genetic algorithm, based on Codelet data-flow computation model, genetic algorithm dispatching method based on multi-core processor parallel system is combined with the task schedule of Codelet model, the dispatching method is when solving Mission Scheduling, using the coding mode of explicit two-dimensional array, by the genetic operators operation such as hybridize, make a variation, using two new arrays of arrays " fusion " generation as follow-on solution.The dispatching method can effectively improve the parallel execution efficiency of task, reduce the free time of execution unit, improve the resource utilization of multi-processor system-on-chip.This method can also be extended to other kinds of data flow model, realize parallel execution of the task in multiple nucleus system.

Description

Translated fromChinese

基于遗传算法的Codelet调度方法A Codelet Scheduling Method Based on Genetic Algorithm

技术领域technical field

本发明涉及一种多核数据流计算机系统任务调度方法的设计，特别涉及一种基于遗传算法的Codele调度方法。The invention relates to the design of a task scheduling method for a multi-core data flow computer system, in particular to a Codele scheduling method based on genetic algorithm.

背景技术Background technique

长期以来，主流的计算机系统执行模型是以控制流为中心的冯诺依曼计算模型。该模型的特点是将一连串顺序执行的指令放入同一地址空间，并由一个单独的程序计数器控制。随着分布式存储计算机模型和多核系统的涌现，并行计算和并行访存成了异构多核计算机系统的基本要求，大数据算法的应用进一步推动了大规模数据流计算模型的发展。For a long time, the mainstream computer system execution model is the Von Neumann computational model centered on the control flow. This model is characterized by placing a series of sequentially executed instructions into the same address space, controlled by a single program counter. With the emergence of distributed storage computer models and multi-core systems, parallel computing and parallel memory access have become the basic requirements of heterogeneous multi-core computer systems, and the application of big data algorithms has further promoted the development of large-scale data flow computing models.

数据流计算模型以其对算法支持的高弹性，较强的系统扩展性，性能功耗比高等特点，被广泛认为是以控制流为中心的传统并行计算模型的潜在替代方案。数据流计算模型是一种天然的、由数据驱动的并行执行模型，数据流的程序逻辑是基于数据流图表达的。当某个程序单元的所有输入数据均就绪时，该程序单元就会被激活。随后所需硬件资源都满足的情况下，该程序单元将在运行时(Runtime)环境下自动执行。这种基于数据流的并行处理模型符合当前多核计算系统基础架构的发展方向，可以有效提高大数据应用的规模化执行效率。The data flow computing model is widely regarded as a potential alternative to the traditional parallel computing model centered on control flow due to its high flexibility for algorithm support, strong system scalability, and high performance-to-power consumption ratio. The data flow computing model is a natural, data-driven parallel execution model, and the program logic of the data flow is expressed based on the data flow graph. A program unit is activated when all its input data are ready. When all required hardware resources are satisfied subsequently, the program unit will be automatically executed in a runtime (Runtime) environment. This data flow-based parallel processing model conforms to the development direction of the current multi-core computing system infrastructure, and can effectively improve the large-scale execution efficiency of big data applications.

针对特拉华大学相关学者提出的一种新型的数据流并行执行模型Codelet，本发明提出一种基于遗传算法的Codelet调度方法。Codelet模型是一种细粒度的、由事件驱动的混合式并行执行模型，由基本的执行单元Codelet和充当容器的线程程序(TreadedProcedure)组成。这种按等级划分的执行模型，采用Codelet大规模并行执行的方式继承了数据流计算模型高并发的特点，还使用线程程序弥补了数据流计算模型空间局部性差的缺陷，成为当前数据流计算模型发展的典型代表。Codelet模型的运行时DARTS提供了三种原生的调度方法，分别为静态调度方法、动态调度方法以及密取(Work Stealing)调度方法。静态调度方法采用轮询的方式依次将激活的Codelet节点分配给各调度器，也可通过人为地设置Codelet的元数据来指定执行Codelet的调度器；动态调度方法将激活的Codelet全部放入线程程序调度器的就绪池(RP)中，调度器空闲时会统一地从该就绪池(RP)中取Codelet；密取调度方法与动态调度方法类似，当某个调度器空闲时，会随机地从其他调度器的就绪池(RP)中取Codelet。以上三种原生的Codelet调度方法的Codelet调度效率低、系统可扩展性差。Aiming at Codelet, a new type of data flow parallel execution model proposed by relevant scholars at the University of Delaware, the present invention proposes a Codelet scheduling method based on genetic algorithm. The Codelet model is a fine-grained, event-driven hybrid parallel execution model, which consists of a basic execution unit Codelet and a thread program (TreadedProcedure) acting as a container. This hierarchical execution model adopts the large-scale parallel execution of Codelet to inherit the high concurrency characteristics of the data flow computing model, and also uses thread programs to make up for the defect of poor spatial locality of the data flow computing model, becoming the current data flow computing model. A typical representative of development. The runtime DARTS of the codelet model provides three native scheduling methods, namely static scheduling, dynamic scheduling, and Work Stealing scheduling. The static scheduling method adopts the polling method to assign the activated Codelet nodes to each scheduler in turn, and can also specify the scheduler that executes the Codelet by manually setting the metadata of the Codelet; the dynamic scheduling method puts all the activated Codelets into the thread program. In the ready pool (RP) of the scheduler, when the scheduler is idle, the codelet will be taken from the ready pool (RP) uniformly; the secret access scheduling method is similar to the dynamic scheduling method. The codelet is taken from the ready pool (RP) of other schedulers. The above three native codelet scheduling methods have low codelet scheduling efficiency and poor system scalability.

有效提高Codelet程序单元执行的高并发度是提高该系统性能的重要途径。因此，本发明提出了一种基于遗传算法的Codelet调度方法，相比于Codelet模型原生的三种调度方法，实验证明本调度方法具有明显的效果。Effectively improving the high concurrency of Codelet program unit execution is an important way to improve the performance of the system. Therefore, the present invention proposes a codelet scheduling method based on a genetic algorithm. Compared with the three original scheduling methods of the Codelet model, experiments have proved that the scheduling method has obvious effects.

发明内容SUMMARY OF THE INVENTION

本发明是针对Codelet模型原生的静态调度和动态调度适应性差、系统并发执行度低的问题，提出了一种基于遗传算法的Codelet调度方法，以Codelet数据流计算模型为基础，设计了一种基于遗传算法的Codelet调度方法，并应用于实际的Codelet模型，该调度方法在求解任务调度问题时，采用显式二维数组的编码方式，通过杂交、变异等遗传算子操作，将两个数组“融合”产生新数组作为下一代的解。实验验证了该方法能提高多核并行系统任务执行的效率和硬件资源的利用率，证实了该方法的有效性。Aiming at the problems of poor adaptability of static scheduling and dynamic scheduling native to the Codelet model and low degree of concurrent execution of the system, the invention proposes a Codelet scheduling method based on genetic algorithm. The Codelet scheduling method of the genetic algorithm is applied to the actual Codelet model. When solving the task scheduling problem, the scheduling method adopts the coding method of an explicit two-dimensional array, and combines the two arrays through the operation of genetic operators such as hybridization and mutation. Fusion" produces a new array as the solution for the next generation. Experiments show that the method can improve the efficiency of multi-core parallel system task execution and the utilization of hardware resources, which proves the effectiveness of the method.

本发明的技术方案为：一种基于遗传算法的Codelet调度方法，具体包括如下步骤：The technical scheme of the present invention is: a codelet scheduling method based on genetic algorithm, which specifically includes the following steps:

1)导入一个Codelet数据流图，计算codelet的深度，并将任务转换成二维矩阵，计算调度时间：1) Import a codelet data flow graph, calculate the depth of the codelet, convert the task into a two-dimensional matrix, and calculate the scheduling time:

一个Codelet数据流图包括A Codelet data flow diagram consists of

A、n个Codelet节点V＝{V₀,V₁,…,V_n-1}；A. n Codelet nodes V={V₀ , V₁ ,...,V_n-1 };

B、各个Codelet节点间的有向边依赖关系，用<V_i,V_j>表示结点V_i与V_j的依赖关系，V_i是V_j的前继，V_j是V_i的后继，即V_i∈PRED(V_j)，V_j∈SUCC(V_i)；B. The directed edge dependency between each Codelet node, use <V_i , V_j > to represent the dependency between node_Vi and V_j , Vi is the_{predecessor of V j, V j}_is_the successor of_Vi , That is, V_i ∈PRED(V_j ), V_j ∈SUCC(V_i );

C、每个Codelet节点的运行时间，Rt(V_i)表示运行结点V_i所花费的时间；C. The running time of each Codelet node, Rt(V_i ) represents the time spent running the node V_i ;

D、每个Codelet节点的深度，结点V_i的深度0≤d≤d_m，d_m是Codelet数据流图中最大的深度；D. The depth of each Codelet node, the depth of node_Vi 0≤d≤d_m , d_m is the maximum depth in the Codelet data flow graph;

二维矩阵：2D matrix:

一个任务对应一个Codelet数据流图，m核处理器系统上执行一个任务，该m核处理器系统的每个处理器核心分别对应一个调度器，共m个调度器，每个调度器内有根据调度任务顺序结合Codelet数据流图中节点依赖关系的任务执行序列，Q_i表示第i个调度器中所包含的任务执行序列，即Q_i＝[V_1i,V_2i,…,V_ki]^T，其中k为该调度器依次执行节点的序列长度；A task corresponds to a codelet data flow diagram, and a task is executed on an m-core processor system. Each processor core of the m-core processor system corresponds to a scheduler, a total of m schedulers, and each scheduler has a basis The order of scheduling tasks is combined with the task execution sequence of node dependencies in the codelet data flow graph_{. Qi represents the task execution sequence included in the i-th scheduler, that is, Qi = [V 1i}_,_V_2i ,...,V_ki ]^T , where k is the sequence length of the nodes that the scheduler executes in turn;

调度S表示成一个二维数组矩阵，即S＝[Q₁,Q₂,…,Q_m]；The schedule S is represented as a two-dimensional array matrix, that is, S=[Q₁ , Q₂ ,...,Q_m ];

调度时间：Schedule time:

根据调度S结合任务中每个Codelet节点运行时间和深度，计算出系统按照调度S执行所有Codelet节点所花费的总时间T(S)；According to the schedule S combined with the running time and depth of each Codelet node in the task, calculate the total time T(S) that the system takes to execute all Codelet nodes according to the schedule S;

2)随机生成初始解群POP(t)，t＝0：2) Randomly generate the initial solution group POP(t), t=0:

初始解群采用平均分配的方式随机生成，一个解对应一个调度器，平均分配表示每个调度器上的任务队列所包含的Codelet节点数量基本相同，从而缩小完成每个调度器上任务队列所需时间的差距；The initial solution group is randomly generated by the method of average distribution. One solution corresponds to one scheduler. The average distribution means that the number of Codelet nodes contained in the task queue on each scheduler is basically the same, thereby reducing the number of codelet nodes required to complete the task queue on each scheduler. time gap;

3)根据选定的适应函数，计算初始解群各解的适应值：3) Calculate the fitness value of each solution of the initial solution group according to the selected fitness function:

适应函数定义为f(S)＝c-T(S)，其中，c设定为所有Codelet预执行时间的总和，即The adaptation function is defined as f(S)=cT(S), where c is set to the sum of all codelet pre-execution times, i.e.

4)检测是否满足迭代收敛中止条件，若满足跳至步骤10)，否则执行步骤5)；4) Detect whether the iterative convergence termination condition is met, if so, skip to step 10), otherwise, execute step 5);

5)使用轮盘赌，选择杂交的母体：5) Using the roulette wheel, choose the parent to cross:

计算解群中各个解的适应值，并相加得出适应值总和；计算个体解的适应值与适应值总和的比值，得出的比值为选择个体的概率值；根据概率值从解群中选择适应强的两个个体解作为杂交操作的父代个体；Calculate the fitness value of each solution in the solution group, and add them to get the sum of fitness values; calculate the ratio of the fitness value of the individual solution to the sum of fitness values, and the obtained ratio is the probability value of selecting an individual; according to the probability value, from the solution group Choose two individual solutions with strong adaptation as the parent individuals of the hybrid operation;

6)根据经验设定概率Pc，根据步骤5)选定的个体的概率值大于Pc，则执行外部杂交操作，小于Pc执行内部杂交操作；6) set probability Pc according to experience, according to step 5) the probability value of the selected individual is greater than Pc, then perform external hybridization operation, and perform internal hybridization operation less than Pc;

7)根据经验设定概率Pm，以概率Pm执行变异操作；7) Set probability Pm according to experience, and perform mutation operation with probability Pm;

8)使用精英策略，保留原有解群中的最优解：8) Use the elite strategy to retain the optimal solution in the original solution group:

保留原解群中适应值最高的解，记为B(t)＝Best(POP(t))；计算新解群中个体解的适应值，记新解群中的最优解为B(t+1)＝Best(POP(t+1))；将POP(t+1)中的最差解替换成POP(t)中的最优解；Retain the solution with the highest fitness value in the original solution group, denoted as B(t)=Best(POP(t)); calculate the fitness value of the individual solutions in the new solution group, and denote the optimal solution in the new solution group as B(t +1)=Best(POP(t+1)); replace the worst solution in POP(t+1) with the optimal solution in POP(t);

9)将最优解作为新的解群POP(t+1)，t＝t+1，计算新的解群各解的适应值，返回执行步骤4)；9) Take the optimal solution as a new solution group POP(t+1), t=t+1, calculate the fitness value of each solution of the new solution group, and return to step 4);

10)输出近似最优解。10) Output the approximate optimal solution.

所述步骤6)中外部杂交操作步骤如下：In the described step 6), the external hybridization operation steps are as follows:

执行交换操作的部分是所有调度之间的所有任务序列：The part that performs the swap operation is all the task sequences between all the schedules:

(1)随机选取两个调度S1和S2；(1) Randomly select two schedules S1 and S2;

(2)随机生成两个数d’₁和d’₂作为杂交点的深度，且0<＝d’₁,d’₂<＝d_m；(2) randomly generate two numbers d'₁ and d'₂ as the depth of the hybridization point, and 0 <= d'₁ , d'₂ <= d_m ;

(3)由d’₁和d’₂确定交叉操作的方式和位置；(3) The mode and position of the crossover operation are determined by d'₁ and d'₂;

(4)若d’₁与d’₂相等，执行两点交叉操作，将调度S1与S2中两个杂交点之间的部分相互交换；(4) if d'₁ and d'₂ are equal, perform a two-point crossover operation, and exchange the parts between the two hybridization points in scheduling S1 and S2;

(5)若d’₁与d’₂不等，执行单点交叉操作，将调度S1与S2杂交点后的部分相互交换。(5) If d'₁ and d'₂ are not equal, a single-point crossover operation is performed, and the parts after the crossover point of schedule S1 and S2 are exchanged.

所述步骤6)中内部杂交操作步骤如下：In the described step 6), the internal hybridization operation steps are as follows:

执行交换操作的部分是每个调度器中的所有任务序列之间执行交换操作：The part that performs the swap operation is to perform the swap operation between all task sequences in each scheduler:

Ⅰ、从一个调度S中随机选取两个任务序列Q1和Q2；1. Randomly select two task sequences Q1 and Q2 from a schedule S;

Ⅱ、随机生成两个数d’₁和d’₂作为杂交点的深度，且0<＝d’₁,d’₂<＝d_m；II. Two numbers d'₁ and d'₂ are randomly generated as the depth of the hybridization point, and 0 <= d'₁ , d'₂ <= d_m ;

Ⅲ、由d’₁和d’₂确定交叉操作的方式和位置；Ⅲ. The mode and position of the crossover operation are determined by d'₁ and d'₂;

Ⅳ、若d’₁与d’₂相等，执行两点交叉操作，将任务序列Q1与Q2中两个杂交点之间的部分相互交换；IV. If d'₁ and d'₂ are equal, perform a two-point crossover operation to exchange the parts between the two hybridization points in the task sequence Q1 and Q2;

Ⅴ、若d’₁与d’₂不等，执行单点交叉操作，将任务序列Q1与Q2中杂交点后的部分相互交换。V. If d'₁ and d'₂ are not equal, perform a single-point crossover operation to exchange the parts after the crossover point in the task sequence Q1 and Q2.

所述步骤7)中变异操作：随机交换一个调度中同一深度下的两个结点，随机生成一个数d’，使0<d’<d_m；随机选择两个深度为d’的结点；交换两个结点，产生新解。Mutation operation in the step 7): randomly exchange two nodes under the same depth in a scheduling, randomly generate a number d', so that 0<d'<d_m ; randomly select two nodes with a depth of d'; Swap two nodes to generate a new solution.

本发明的有益效果在于：本发明基于遗传算法的Codelet调度方法，将基于多核处理器并行系统的遗传算法调度方法与Codelet模型的任务调度相结合，是一种能够适应多核计算系统的并行任务调度方法。该调度方法可有效提高任务的并行执行效率，降低执行单元的空闲时间，提高片上多核系统的资源利用率。该方法也可以扩展应用到其他类型的数据流模型，实现任务在多核系统中的并行执行。The beneficial effects of the present invention are: the genetic algorithm-based Codelet scheduling method of the present invention combines the genetic algorithm scheduling method based on the multi-core processor parallel system with the task scheduling of the Codelet model, which is a parallel task scheduling that can adapt to the multi-core computing system method. The scheduling method can effectively improve the parallel execution efficiency of the task, reduce the idle time of the execution unit, and improve the resource utilization rate of the on-chip multi-core system. The method can also be extended and applied to other types of data flow models to achieve parallel execution of tasks in multi-core systems.

附图说明Description of drawings

图1为本发明Codelet模型示意图；Fig. 1 is the schematic diagram of Codelet model of the present invention;

图2为本发明对应的算法流程图；Fig. 2 is the algorithm flow chart corresponding to the present invention;

图3为本发明任务在单片双核系统上的Codelet调度图；Fig. 3 is the Codelet scheduling diagram of the task of the present invention on the single-chip dual-core system;

图4为本发明调度算法中的外部杂交操作图；Fig. 4 is the external hybridization operation diagram in the scheduling algorithm of the present invention;

图5为本发明调度算法中的内部杂交操作图；Fig. 5 is the internal hybridization operation diagram in the scheduling algorithm of the present invention;

图6为本发明调度算法中的变异交换操作图。FIG. 6 is a diagram of a mutation exchange operation in the scheduling algorithm of the present invention.

具体实施方式Detailed ways

本发明基于Codelet模型作为任务在多核并行系统中的基础执行模型。Codelet是包含一连串机器指令的计算单元或程序的片段，对应于数据流图中的一个执行节点。Codelet模型是由执行单元Codelet及线程程序TP组成。以图1Codelet模型的示意图为例，S和E分别表示Codelet图的开始节点和结束节点，A、B、C、D分别是任务执行过程中间的程序单元节点；线程程序是每个调度器中所有Codelet节点的容器，负责调度器内的数据共享以及调度器间的任务协调。Codelet节点执行的语义为：当且仅当Codelet的同步接口对应的依赖数据均就绪时，Codelet节点被自动激活，并在硬件资源满足的情况下由Codelet任务调度器控制执行Codelet节点内的程序片段(指令序列)。该模型通过将任务转换成包含多重依赖关系的Codelet数据流图，并将Codelet节点按照某种指定的调度方法分配至每个调度器对应的任务队列中。The invention is based on the Codelet model as the basic execution model of tasks in the multi-core parallel system. A codelet is a computational unit or a fragment of a program that contains a sequence of machine instructions, corresponding to an execution node in a dataflow graph. Codelet model is composed of execution unit Codelet and thread program TP. Taking the schematic diagram of the Codelet model in Figure 1 as an example, S and E represent the start node and end node of the Codelet diagram, respectively, A, B, C, and D are the program unit nodes in the middle of the task execution process; the thread program is all in each scheduler. The container of the codelet node is responsible for data sharing within the scheduler and task coordination between schedulers. The semantics of codelet node execution is: if and only when the dependent data corresponding to the synchronization interface of the codelet is ready, the codelet node is automatically activated, and the codelet task scheduler controls the execution of the program fragments in the codelet node when the hardware resources are satisfied. (instruction sequence). This model converts tasks into a Codelet data flow graph containing multiple dependencies, and assigns Codelet nodes to the task queue corresponding to each scheduler according to a specified scheduling method.

本发明的核心思想是将某个任务分割成多个子任务，并表示成一个有向无环的数据流图；再将该数据流图映射至Codelet模型上。根据多核计算系统硬件资源的配置可将Codelet数据流图显式转换成一个二维数组矩阵，该其中的某个二维数组向量对应任务在某个Codelet节点上的调度次序。在满足各Codelet间数据依赖关系的前提下，首先随机地将Codelet节点分配到硬件资源上，完成对任务调度的初始化；然后根据遗传算法经过多次迭代进化得出Codelet模型执行该任务的最优调度方案。The core idea of the present invention is to divide a certain task into a plurality of subtasks and express it as a directed acyclic data flow graph; and then map the data flow graph to the Codelet model. According to the configuration of the hardware resources of the multi-core computing system, the Codelet data flow graph can be explicitly converted into a two-dimensional array matrix, and a certain two-dimensional array vector corresponds to the scheduling order of tasks on a certain Codelet node. On the premise of satisfying the data dependencies between codelets, the codelet nodes are first randomly allocated to hardware resources to complete the initialization of task scheduling; then according to the genetic algorithm, the optimal codelet model to perform the task is obtained through multiple iterations of evolution. scheduling plan.

基于遗传算法的Codelet调度流程如图2所示：具体步骤如下：The codelet scheduling process based on genetic algorithm is shown in Figure 2: The specific steps are as follows:

(1)导入一个Codelet数据流图(CDG)，计算codelet的深度，并将任务转换成二维矩阵，计算调度时间；(1) Import a Codelet data flow graph (CDG), calculate the depth of the codelet, convert the task into a two-dimensional matrix, and calculate the scheduling time;

(2)随机生成初始解群POP(t)，t＝0；(2) Randomly generate the initial solution group POP(t), t=0;

(3)根据适应函数，通过调度时间计算初始解群各解的适应值；(3) According to the adaptation function, the fitness value of each solution of the initial solution group is calculated through the scheduling time;

(4)检测是否满足收敛中止条件，若满足跳至步骤(10)，否则执行步骤(5)；(4) Detect whether the convergence termination condition is satisfied, if it is satisfied, skip to step (10), otherwise, execute step (5);

(5)使用轮盘赌，选择杂交的母体；(5) Use roulette to select the parent of the hybrid;

(6)根据经验设定概率Pc，根据选定的母体的概率值大于Pc，则执行外部杂交操作，小于Pc执行内部杂交操作；(6) set probability Pc according to experience, according to the probability value of the selected mother is greater than Pc, then perform external hybridization operation, less than Pc perform internal hybridization operation;

(7)根据经验设定概率Pm，以概率Pm执行变异操作；(7) Set probability Pm according to experience, and perform mutation operation with probability Pm;

(8)使用精英策略，保留原有解群中的最优解；(8) Use the elite strategy to retain the optimal solution in the original solution group;

(9)将最优解作为新的解群POP(t+1)，t＝t+1，计算新的解群各解的适应值，返回执行步骤(4)；(9) Take the optimal solution as a new solution group POP(t+1), t=t+1, calculate the fitness value of each solution of the new solution group, and return to step (4);

(10)输出近似最优解。(10) Output the approximate optimal solution.

本发明中的调度方法生成最优解的过程包括四个核心模块：(1)Codelet调度的表示，(2)适应函数的表示，(3)初始解群的生成，(4)操作算子。The process of generating the optimal solution by the scheduling method in the present invention includes four core modules: (1) the representation of the Codelet scheduling, (2) the representation of the adaptation function, (3) the generation of the initial solution group, and (4) the operator.

1.Codelet调度的表示1. Representation of Codelet scheduling

首先，任务在Codelet模型中可转换成一个Codelet图(CDG)(中文名称Codelet数据流图)，相应的数据流图CDG可以转化对应于一个二维矩阵。二维数组矩阵的维度对应某个集群或多核计算系统所包含的调度器数量。因此，对于任务在Codelet模型中的调度问题，本发明给出以下几点定义：First, the task can be converted into a Codelet Graph (CDG) (Chinese name Codelet data flow graph) in the Codelet model, and the corresponding data flow graph CDG can be converted to correspond to a two-dimensional matrix. The dimension of the two-dimensional array matrix corresponds to the number of schedulers included in a cluster or multi-core computing system. Therefore, for the task scheduling problem in the Codelet model, the present invention provides the following definitions:

i.记G为某个CDG图，该图可表示为G＝<V,E,Rt,d>，则有：i. Denote G as a CDG graph, which can be expressed as G=<V,E,Rt,d>, then there are:

·V表示n个节点(Codelet)的集合，即V＝{V₀,V₁,…,V_n-1}；·V represents a set of n nodes (Codelets), that is, V={V₀ , V₁ , ..., V_n-1 };

·E表示节点间的有向边(依赖关系)的集合，<V_i,V_j>表示结点V_i与V_j的依赖关系，V_i是V_j的前继，V_j是V_i的后继，即V_i∈PRED(V_j)，V_j∈SUCC(V_i)；E represents the set of directed edges (dependencies) between nodes, <V_i , V_j > represents the dependency relationship between nodes V_i and V_j , V_i is the predecessor of V_j , and V_j is the successor of V_i Successor, namely V_i ∈ PRED(V_j ), V_j ∈ SUCC(V_i );

·Rt表示某个Codelet节点的运行时间，Rt(V_i)表示运行结点V_i所花费的时间；Rt represents the running time of a Codelet node, and Rt(V_i ) represents the time spent running the node_Vi ;

·d表示某个Codelet节点的深度，结点V_i的深度可定义为：d represents the depth of a Codelet node, and the depth of node_Vi can be defined as:

0≤d≤d_m，d_m是CDG图中最大的深度。 0≤d≤d_m , where d_m is the maximum depth in the CDG map.

其中，深度是CDG图的一个重要参数，用于生成初始解群及判断遗传算法解的合法性。Among them, the depth is an important parameter of the CDG graph, which is used to generate the initial solution group and judge the legitimacy of the genetic algorithm solution.

图3为任务在单片双核系统上的Codelet调度图，其中图3(a)所示Codelet数据流图，该CDG图包含了7个Codelet节点(V0～V6)，每个结点的信息包括Codelet的运行时间Rt和深度d(例如V0的Rt为3，d为0)，箭头代表了Codelet节点间的依赖关系。Figure 3 is the codelet scheduling diagram of tasks on a single-chip dual-core system, in which the codelet data flow diagram shown in Figure 3(a), the CDG diagram includes 7 codelet nodes (V0 ~ V6), and the information of each node includes The running time Rt and depth d of the codelet (for example, the Rt of V0 is 3, and d is 0), and the arrows represent the dependencies between the codelet nodes.

ii.记S为某个CDG图的调度，该调度可表示为S＝<C,Q>，则有：ii. Let S be the schedule of a CDG graph, and the schedule can be expressed as S=<C, Q>, then:

·C表示m个调度器的集合，即C＝[C₀,C₁,…,C_m-1]，C_i表示ID为i的调度器；C represents the set of m schedulers, that is, C=[C₀ , C₁ ,..., C_m-1 ], and C_i represents the scheduler whose ID is i;

·Q_i表示调度器C_i中所包含的Codelet任务执行序列，即Q_i＝[V_1i,V_2i,…,V_ki]^T，其中k为该调度器C_i依次执行Codelet节点的序列长度；·Q_i represents the codelet task execution sequence included in the scheduler C_i , that is, Q_i =[V_1i ,V_2i ,...,V_ki ]^T , where k is the sequence length of the codelet node executed by the scheduler C_i in turn ;

·调度S可表示成一个二维数组矩阵，即S＝[Q₁,Q₂,…,Q_m]。· The schedule S can be represented as a two-dimensional array matrix, ie S=[Q₁ , Q₂ , . . . , Q_m ].

·由于调度器是按照任务序列的顺序依次执行Codelet的，打乱CDG图中原有的依赖关系可能会使任务无法正常执行。因此，为了保证系统能够按照调度正常执行Codelet，在执行调度方法计算前先对任务序列中的节点按节点深度进行升序排列。· Since the scheduler executes the codelets in the order of the task sequence, disrupting the original dependencies in the CDG graph may cause the tasks to fail to execute normally. Therefore, in order to ensure that the system can normally execute the Codelet according to the schedule, the nodes in the task sequence are arranged in ascending order of node depth before executing the calculation of the scheduling method.

如图3(b)所示CDG图的调度示意图，假定在某个多核处理器上执行某个任务T，该多核系统的每个处理器核心分别对应一个调度器。如果对于该双核系统有两个调度S1和S2，两个调度S1和S2可完成执行相同的任务T，每个调度分别对应一个任务序列，每个任务序列表示调度器执行Codelet节点的次序。图中S1可表示成{[V0,V1,V3,V6],[V2,V4,V5]}，S2可表示成{[V0,V2,V5],[V1,V3,V4,V6]}。As shown in Fig. 3(b), a schematic diagram of the scheduling of the CDG graph, assuming that a certain task T is executed on a certain multi-core processor, each processor core of the multi-core system corresponds to a scheduler respectively. If there are two schedules S1 and S2 for the dual-core system, the two schedules S1 and S2 can complete and execute the same task T, each schedule corresponds to a task sequence, and each task sequence represents the order in which the scheduler executes Codelet nodes. In the figure, S1 can be expressed as {[V0, V1, V3, V6], [V2, V4, V5]}, and S2 can be expressed as {[V0, V2, V5], [V1, V3, V4, V6]}.

iii.记T(S)为系统按照调度S执行所有Codelet节点所花费的总时间，Ts(Q_i)为执行任务序列Q_i下所有Codelet所花费的总时间，则有T(S)＝MAX(Ts(Q_i))，图3(c)为根据图3(b)的调度任务顺序结合图3(a)每个Codelet节点运行时间和深度绘制的系统执行总时间的示意图，调度S1和S2执行花费时间分别为T(S1)＝14，T(S2)＝13。iii. Let T(S) be the total time spent by the system to execute all Codelet nodes according to the schedule S, and Ts(Q_i ) be the total time spent executing all_Codelets under the task sequence Qi, then T(S)=MAX (Ts(Q_i )), Figure 3(c) is a schematic diagram of the total execution time of the system based on the scheduling task sequence of Figure 3(b) combined with the running time and depth of each Codelet node in Figure 3(a), and the execution time of scheduling S1 and S2 is T ( S1)=14, T(S2)=13.

根据以上定义，一个任务在Codelet模型中的调度最终可转换成一个二维数组矩阵。Codelet调度的目标是为多核系统寻找一种全局最佳的调度S，使Codelet节点按照既定的次序在多个调度器中并行执行，并使系统执行任务的总时间最小，即MIN(T(S))，According to the above definition, the scheduling of a task in the Codelet model can finally be converted into a two-dimensional array matrix. The goal of Codelet scheduling is to find a globally optimal schedule S for multi-core systems, so that Codelet nodes are executed in parallel in multiple schedulers in a given order, and the total time for the system to perform tasks is minimized, that is, MIN(T(S )),

2.适应函数2. Adaptation function

适应函数描述了使用各Codelet节点的执行时间及Codelet间的通信开销来预计任务在系统中的执行总时间，并以该函数的适应值作为某个解优劣性的评价指标。例如:图3展示了同一任务在单片多核处理器上执行两种调度的执行总时间，调度S1的执行时间为14，而调度S2的执行时间为13。因此，根据适应值T(S1)>T(S2)，那么调度S2要优于调度S1。The adaptation function describes the execution time of each codelet node and the communication overhead between codelets to estimate the total execution time of the task in the system, and the fitness value of the function is used as the evaluation index of the pros and cons of a solution. For example: Figure 3 shows the total execution time for the same task to execute two schedules on a single-chip multi-core processor, the execution time of schedule S1 is 14, and the execution time of schedule S2 is 13. Therefore, according to the fitness value T(S1)>T(S2), scheduling S2 is better than scheduling S1.

适应值是遗传算法中用来评价个体优劣的标准，本发明将适应函数定义为f(S)＝c-T(S)，其中，c设定为某个足够大的常数，比如可以取值为所有Codelet预执行时间的总和，即T(S)为系统在分配完Codelet节点后，执行所有Codelet的总时间。因此，f(S)越大，表示系统执行完成该任务的总时间越短。如图3(c)，调度S1的执行时间为T(S1)＝14，所有Codelet执行时间总和为c＝20，则适应值f(S1)＝c-T(S1)＝6，调度S2对应的适应值f(S2)＝c-T(S2)＝7，适应函数可以根据计算自主设定，本发明用了减法，用加法也可以。The fitness value is a standard used to evaluate the quality of an individual in the genetic algorithm. The present invention defines the fitness function as f(S)=cT(S), where c is set to a sufficiently large constant, for example, it can be taken as The sum of all codelet pre-execution times, i.e. T(S) is the total time that the system executes all Codelets after allocating Codelet nodes. Therefore, the larger f(S), the shorter the total time the system takes to complete the task. As shown in Fig. 3(c), the execution time of scheduling S1 is T(S1)=14, and the total execution time of all Codelets is c=20, then the adaptation value f(S1)=cT(S1)=6, the adaptation corresponding to scheduling S2 The value f(S2)=cT(S2)=7, the adaptation function can be set independently according to the calculation, the present invention uses the subtraction, and the addition can also be used.

3.初始解群3. Initial solution group

初始解群采用平均分配的方式随机生成，即每个调度器上的任务队列所包含的Codelet节点数量基本相同，从而缩小完成每个调度器上任务队列所需时间的差距。为了进一步提高初始解群的质量，该算法会先生成M个初始解，之后计算各个解的适应值，从中选取N(N<M)个适应值大的解作为初始解群。The initial solution group is randomly generated by means of average distribution, that is, the number of Codelet nodes contained in the task queue on each scheduler is basically the same, thereby narrowing the gap in the time required to complete the task queue on each scheduler. In order to further improve the quality of the initial solution group, the algorithm will first generate M initial solutions, then calculate the fitness values of each solution, and select N (N<M) solutions with large fitness values as the initial solution group.

由于Codelet模型是一种数据流并行计算模型，可以将Codelet节点均匀地分配给调度器。记n(i)为CDG图中深度为i的Codelet节点数量，m为调度器的数量，并设则同一深度的Codelet节点分配到各调度器的数量为[ave(i),ave(i)+1]，且各调度器分配到的Codelet节点数量的总和应等于n(i)。将同一调度器下的Codelet按深度升序排列就可得到一个合法的任务序列，为一个初始解，各个调度器的Codelet按深度升序排列得到每个合法的任务序列，组成一个初始解群。Since the codelet model is a data flow parallel computing model, the codelet nodes can be evenly distributed to the scheduler. Let n(i) be the number of Codelet nodes with depth i in the CDG graph, m is the number of schedulers, and set Then the number of codelet nodes at the same depth allocated to each scheduler is [ave(i), ave(i)+1], and the sum of the number of codelet nodes allocated by each scheduler should be equal to n(i). A legal task sequence can be obtained by arranging the codelets under the same scheduler in ascending order of depth, which is an initial solution. The codelets of each scheduler are arranged in ascending order of depth to obtain each legal task sequence, forming an initial solution group.

初始解群的优劣决定了遗传算法收敛的速度，且初始生成的解具有很强的随机性。为了进一步加快算法求解的速度，防止过早收敛，会事先随机生成多个初始解，在计算各解的适应值后，从中筛选适应值高的优秀解作为实际的初始解，以提高初始解群的质量。The quality of the initial solution group determines the convergence speed of the genetic algorithm, and the initially generated solution has strong randomness. In order to further speed up the algorithm solution and prevent premature convergence, multiple initial solutions will be randomly generated in advance. After calculating the fitness value of each solution, an excellent solution with a high fitness value is selected as the actual initial solution to improve the initial solution group. the quality of.

4.操作算子4. Operation operator

在遗传算法中，解群的选择和进化是由一系列操作算子来实现的。这些操作算子包括选择算子、杂交算子以及变异算子，选择算子可以采用轮盘赌作为选取个体解的方法，并采用精英策略作为保留优秀个体解至下一代的方法；为了扩大搜索域，杂交算子根据杂交对象的不同又可分为外杂交算子和内部杂交算子；根据杂交点是否相同，杂交的交叉操作又分为单点交叉和两点交叉；变异算子采用传统的交换操作，即交换单个解内的两个Codelet节点。In the genetic algorithm, the selection and evolution of the solution group are realized by a series of operators. These operators include selection operator, hybrid operator and mutation operator. The selection operator can use roulette as a method of selecting individual solutions, and adopt an elite strategy as a method to retain excellent individual solutions to the next generation; in order to expand the search According to the different crossover objects, the crossover operator can be divided into an outer crossover operator and an internal crossover operator; according to whether the crossover points are the same, the crossover operation of the crossover is divided into a single-point crossover and a two-point crossover; the mutation operator adopts the traditional crossover operation. The swap operation of , that is, swapping two Codelet nodes within a single solution.

i.选择算子-轮盘赌i. Selection Operator - Roulette

选择机制是根据适应值的大小选择和保留优秀个体，淘汰劣质个体的过程。本发明采用轮盘赌的方式来决定每次迭代进行杂交的父代个体。具体过程如下：The selection mechanism is the process of selecting and retaining excellent individuals and eliminating inferior individuals according to the fitness value. The present invention uses a roulette wheel to determine the parental individuals to be crossed in each iteration. The specific process is as follows:

(1)计算解群中各个解的适应值，并相加得出适应值总和；(1) Calculate the fitness values of each solution in the solution group, and add them to obtain the sum of fitness values;

(2)计算个体解的适应值与适应值总和的比值，得出的比值为选择个体的概率值；(2) Calculate the ratio of the fitness value of the individual solution to the sum of the fitness values, and the obtained ratio is the probability value of selecting an individual;

(3)根据概率值越大越优的原则(根据本发明的适应值算法，越大越优)，从解群中选取两个个体解作为杂交操作的父代个体。(3) According to the principle that the larger the probability value, the better (according to the fitness value algorithm of the present invention, the larger the better), select two individual solutions from the solution group as the parent individuals of the hybridization operation.

ii.外部杂交算子ii. External hybridization operator

在进行杂交操作之前，根据经验设定进行外部杂交概率Pc，一般Pc＝0.7～0.9，所选择的父代个体的概率值大于选定的概率Pc时进行外部杂交操作，外部杂交操作所选择的父代个体为调度，即由选择算子得到的两个调度，执行交换操作的部分是两个调度的所有任务序列。具体过程如下：Before the hybridization operation, the external hybridization probability Pc is set according to experience. Generally, Pc=0.7~0.9. When the probability value of the selected parental individual is greater than the selected probability Pc, the external hybridization operation is performed. The parent individual is the schedule, that is, the two schedules obtained by the selection operator, and the part that performs the exchange operation is all the task sequences of the two schedules. The specific process is as follows:

(4)若d’₁与d’₂相等，执行两点交叉操作，将调度S1与S2中两个杂交点之间的部分(任务序列片段的集合)相互交换；(4) if d'₁ and d'₂ are equal, perform a two-point crossover operation, and exchange the parts (sets of task sequence fragments) between the two hybridization points in the scheduling S1 and S2;

如图4所示，两个调度执行了外部杂交操作，图中两个杂交点的深度分别为0和2，因此，需对两个调度内深度在0与2之间的Codelet(即深度为1和2的Codelet)执行两点交叉操作，即交换图4中调度S1与调度S2的虚框部分。(S1和S2中的Q2均出现了深度为2两个Codelet，根据图4解释怎么外部交叉)As shown in Figure 4, the two schedules perform external hybridization operations, and the depths of the two hybridization points in the figure are 0 and 2, respectively. Therefore, it is necessary to perform codelets with a depth between 0 and 2 in the two schedules (that is, the depth is 0 and 2). The codelets of 1 and 2) perform a two-point crossover operation, that is, swap the dashed-frame parts of schedule S1 and schedule S2 in FIG. 4 . (Q2 in S1 and S2 both have two Codelets with a depth of 2, explain how to cross externally according to Figure 4)

iii.内部杂交算子iii. Internal hybridization operator

与外部杂交不同的是，内部杂交的操作对象为个体解而非解群，即对两个调度器中任意一个调度器中的两个任务序列Q1和Q2执行交换操作，因此一般不做外部杂交的情况下作内部杂交，在不确定的情况下可以将外部杂交和内部杂交效果进行比对，再确定。内部杂交具体过程如下：Different from external hybridization, the operation object of internal hybridization is individual solution rather than solution group, that is, the exchange operation is performed on the two task sequences Q1 and Q2 in any one of the two schedulers, so external hybridization is generally not performed. In the case of internal hybridization, in the case of uncertainty, the effect of external hybridization and internal hybridization can be compared and then determined. The specific process of internal hybridization is as follows:

(1)从一个调度S中随机选取两个任务序列Q1和Q2；(1) Randomly select two task sequences Q1 and Q2 from a schedule S;

(2)(3)步骤同外部杂交操作相同；(2) (3) step is identical with external hybridization operation;

(4)若d’₁与d’₂相等，执行两点交叉操作，将任务序列Q1与Q2中两个杂交点之间的部分(Codelet的集合)相互交换；(4) If d'₁ and d'₂ are equal, perform a two-point crossover operation, and exchange the part (set of Codelets) between the two hybridization points in the task sequence Q1 and Q2;

(5)若d’₁与d’₂不等，执行单点交叉操作，将任务序列Q1与Q2中杂交点后的部分相互交换。(5) If d'₁ and d'₂ are not equal, perform a single-point crossover operation to exchange the parts after the crossover point in the task sequences Q1 and Q2.

图5展示了两次内部杂交操作，其中，调度S1对应的杂交点深度分别为1和2，因此执行的是两点交叉操作，交叉的部分为两个任务序列中深度在1与2之间的Codelet(即深度为2的Codelet)；由于调度S2的杂交点深度均为1，因此执行的是单点交叉操作，交叉的部分为两个任务序列中深度大于1的Codelet。Figure 5 shows two internal hybridization operations, where the depths of hybridization points corresponding to schedule S1 are 1 and 2, respectively, so a two-point crossover operation is performed, and the intersecting part is the depth between 1 and 2 in the two task sequences The codelet (that is, the codelet with a depth of 2); since the depth of the hybridization point of scheduling S2 is all 1, a single-point crossover operation is performed, and the crossed part is the codelet with a depth greater than 1 in the two task sequences.

iv.变异算子iv. Mutation operator

变异操作是迭代产生新个体的辅助方式，通过变异可维持了解群的多样性，加快最优解的收敛速度。变异算子采用传统的交换操作，即随机交换一个调度中同一深度下的两个结点。具体过程如下：The mutation operation is an auxiliary method to iteratively generate new individuals. Through mutation, the diversity of the understanding group can be maintained and the convergence speed of the optimal solution can be accelerated. The mutation operator adopts the traditional swap operation, that is, randomly swaps two nodes at the same depth in a schedule. The specific process is as follows:

(1)随机生成一个数d’，使0<d’<d_m；(1) Randomly generate a number d', so that 0<d'<d_m ;

(2)随机选择两个深度为d’的结点；(2) Randomly select two nodes with a depth of d';

(3)交换两个结点，产生新解。(3) Exchange two nodes to generate a new solution.

如图6所示，调度S1执行了变异操作，选择变异的深度为2，满足条件的Codelet有V3、V4和V5，因此这三个Codelet在执行变异操作时，会随机选取两个互换位置。As shown in Figure 6, scheduling S1 executes the mutation operation, and selects the depth of mutation as 2. The codelets that meet the conditions are V3, V4, and V5. Therefore, when these three codelets perform the mutation operation, they will randomly select two swap positions. .

v.选择算子-精英策略v. Selection operator - elite strategy

由于轮盘赌的选择方式是基于概率选择的，因此存在统计误差，为了将优秀个体保留至下一代，保证每次迭代最优解的适应值单调递增，保留机制采用精英策略，该策略的主要思想是保持解群中的适应值大的优秀解不丢失。具体过程如下：Since the selection method of roulette is based on probability selection, there are statistical errors. In order to retain excellent individuals to the next generation and ensure that the fitness value of the optimal solution in each iteration increases monotonically, the retention mechanism adopts an elite strategy. The idea is to keep the excellent solutions with large fitness values in the solution group from being lost. The specific process is as follows:

(1)保留原解群中适应值最高的解，记为B(t)＝Best(POP(t))；(1) Retain the solution with the highest fitness value in the original solution group, denoted as B(t)=Best(POP(t));

(2)计算新解群中个体解的适应值，记新解群中的最优解为B(t+1)＝Best(POP(t+1))；(2) Calculate the fitness value of individual solutions in the new solution group, and record the optimal solution in the new solution group as B(t+1)=Best(POP(t+1));

(3)将POP(t+1)中的最差解替换成POP(t)中的最优解。(3) Replace the worst solution in POP(t+1) with the optimal solution in POP(t).

旧解群通过以上各遗传算子的操作，最终得到了新解群，而新生成解群的适应值必定不小于原来的解群，因此，该遗传算法可在多次迭代后逐渐向最优解(即最优调度)收敛。Through the operations of the above genetic operators, the old solution group finally obtains a new solution group, and the fitness value of the newly generated solution group must not be less than the original solution group. Therefore, the genetic algorithm can gradually improve to the optimal solution after many iterations. The solution (ie, the optimal schedule) converges.

5.收敛条件5. Convergence Conditions

设置合理的收敛中止条件，可使遗传算法所生成的解更接近理想的最优解。为了在尽可能短的时间内生成优秀的解，本发明设置了以下两个收敛条件：Setting reasonable convergence termination conditions can make the solution generated by the genetic algorithm closer to the ideal optimal solution. In order to generate an excellent solution in the shortest possible time, the present invention sets the following two convergence conditions:

(1)在给定的迭代收敛阈值后遗传算法收敛，适应值维持不变(在该发明的测试实验设置阈值为500，可根据测试对象定)，将最近生成的解作为最终的结果输出；(1) The genetic algorithm converges after a given iterative convergence threshold, and the fitness value remains unchanged (the threshold is set at 500 in the test experiment of the invention, which can be determined according to the test object), and the recently generated solution is output as the final result;

(2)如果遗传算法在给定的迭代收敛阈值后仍然不收敛，当迭代次数超过给定的最大迭代阈值(在该发明的测试试验中设置的最大迭代阈值为2000，可根据测试对象定)后，为避免在生成最优解的过程中耗费太多的时间，将最近生成的解作为近似的最优解输出。(2) If the genetic algorithm still does not converge after a given iteration convergence threshold, when the number of iterations exceeds the given maximum iteration threshold (the maximum iteration threshold set in the test experiment of the invention is 2000, which can be determined according to the test object) Then, in order to avoid spending too much time in the process of generating the optimal solution, the most recently generated solution is output as an approximate optimal solution.

Claims

Translated fromChinese

1.一种基于遗传算法的Codelet调度方法，其特征在于，具体包括如下步骤：1. a Codelet scheduling method based on genetic algorithm, is characterized in that, specifically comprises the steps:

一个Codelet数据流图包括A Codelet data flow diagram consists of

二维矩阵：2D matrix:

调度时间：Schedule time:

计算解群中各个解的适应值，并相加得出适应值总和；计算个体解的适应值与适应值总和的比值，得出的比值为选择个体的概率值；根据概率值从解群中选择适应强的两个个体解作为杂交操作的父代个体；Calculate the fitness value of each solution in the solution group, and add them to obtain the sum of fitness values; calculate the ratio of the fitness value of the individual solution to the sum of fitness values, and the obtained ratio is the probability value of selecting an individual; according to the probability value, from the solution group Choose two individual solutions with strong adaptation as the parent individuals of the hybrid operation;

10)输出近似最优解。10) Output the approximate optimal solution.

2.根据权利要求1所述基于遗传算法的Codelet调度方法，其特征在于，所述步骤6)中外部杂交操作步骤如下：2. the Codelet scheduling method based on genetic algorithm according to claim 1, is characterized in that, described step 6) middle external hybridization operation step is as follows:

3.根据权利要求1所述基于遗传算法的Codelet调度方法，其特征在于，所述步骤6)中内部杂交操作步骤如下：3. the Codelet scheduling method based on genetic algorithm according to claim 1, is characterized in that, in described step 6), internal hybridization operation step is as follows:

4.根据权利要求1所述基于遗传算法的Codelet调度方法，其特征在于，所述步骤7)中变异操作：随机交换一个调度中同一深度下的两个结点，随机生成一个数d’，使0<d’<d_m；随机选择两个深度为d’的结点；交换两个结点，产生新解。4. the Codelet scheduling method based on genetic algorithm according to claim 1, is characterized in that, in described step 7), mutation operation: randomly exchange two nodes under the same depth in a scheduling, randomly generate a number d', Let 0<d'<_dm; randomly select two nodes with depth d'; exchange the two nodes to generate a new solution.