CN108829500B

Movatterモバイル変換

Info

Publication number: CN108829500B
Application number: CN201810419583.XA
Authority: CN
Inventors: 郝永生; 曹杰; 季赛; 马廷淮; 管廷昭; 李仕强; 夏艳东
Original assignee: Nanjing University of Information Science and Technology
Current assignee: Nanjing University of Information Science and Technology
Priority date: 2018-05-04
Filing date: 2018-05-04
Publication date: 2022-05-27
Anticipated expiration: 2038-05-04
Also published as: CN108829500A

Abstract

Translated fromChinese

本发明公开了一种云环境下模块化并行作业的动态节能调度方法，包括以下步骤：(1)根据系统日志，计算系统平均负载及作业平均并行度；(2)根据系统的平均负载，寻找资源的参考工作状态，包括计算资源的工作频率与工作电压；(3)假设资源工作在参考工作状态下，且作业的并行度为平均并行度，计算此时作业的能耗及工作量；(4)与步骤(3)得到的结果进行对比，寻找可能比步骤(3)更小能耗的调度方法，若有更新调度方法，若没有则采取步骤(3)的方法。本发明不需要知道作业的内部结构，控制简单，首先从全局计算能耗，使能耗处于较低的水平，通过节省资源，实现作业的动态调度。

The invention discloses a dynamic energy-saving scheduling method for modular parallel jobs in a cloud environment, comprising the following steps: (1) calculating the average system load and the average parallelism degree of operations according to the system log; (2) searching for the average load of the system according to the average load of the system The reference working state of the resource, including the working frequency and working voltage of the computing resource; (3) Assuming that the resource works in the reference working state and the parallelism of the job is the average parallelism, calculate the energy consumption and workload of the job at this time; ( 4) Compare with the results obtained in step (3), and find a scheduling method that may consume less energy than step (3). If there is an update scheduling method, if not, adopt the method of step (3). The present invention does not need to know the internal structure of the job, the control is simple, the energy consumption is firstly calculated globally, so that the energy consumption is at a lower level, and the dynamic scheduling of the job is realized by saving resources.

Description

Translated fromChinese

一种云环境下模块化并行作业的动态节能调度方法A dynamic energy-saving scheduling method for modular parallel jobs in cloud environment

技术领域technical field

本发明涉及计算机软件调度节能领域，尤其涉及一种云环境下模块化并行作业的动态节能调度方法。The invention relates to the field of computer software scheduling and energy saving, in particular to a dynamic energy saving scheduling method for modular parallel jobs in a cloud environment.

背景技术Background technique

随着云计算、信息服务、Web2.0、SaaS、互联网+等理念和技术的快速发展，以及大数据平台的出现，并行算法得到了广泛应用。由于并行算法节省了大量的时间，在不同行业均有了优越的表现，例如气象，遥感，军事等。编写并行算法对于非计算机专业的研究人员是件困难的事，因此，在非计算机行业，更多的采用的是模块化并行算法。模块化并行算法的优点是将数据广播，消息传递，计算同步等同步控制信息隐藏起来，保证非计算机行业的科学家，按照自己的思路编写并行程序，减少专业问题之外的内容。同时，模块化并行算法能耗也越来越大。并行作业的节能调度与传统的并行任务调度(非以节能为目地)相比，主要目标不是减少任务总体完成时间(调度长度)，而是尽量减少使用资源数目和占用时间，提高资源利用率，达到整个系统性能和能耗之间的平衡。With the rapid development of concepts and technologies such as cloud computing, information services, Web2.0, SaaS, and Internet+, as well as the emergence of big data platforms, parallel algorithms have been widely used. Because the parallel algorithm saves a lot of time, it has excellent performance in different industries, such as meteorology, remote sensing, military and so on. Writing parallel algorithms is difficult for non-computer researchers. Therefore, in non-computer industries, more modular parallel algorithms are used. The advantage of the modular parallel algorithm is to hide the synchronization control information such as data broadcasting, message transmission, and calculation synchronization, so as to ensure that scientists who are not in the computer industry can write parallel programs according to their own ideas and reduce the content other than professional problems. At the same time, the energy consumption of modular parallel algorithms is also increasing. Compared with the traditional parallel task scheduling (not for the purpose of energy saving), the energy-saving scheduling of parallel jobs is not to reduce the overall task completion time (scheduling length), but to minimize the number of resources used and the occupied time, and to improve resource utilization. To achieve a balance between overall system performance and energy consumption.

现有方法大多数需要知道程序的结构图，实际上，程序的结构图本身可能非常复杂，对其调度，计算复杂度很高。本发明提出了模块化并行程序的调度方法，模块化并行程序的各个作业同时获得资源，作业完成后，同时释放资源。Most of the existing methods need to know the structure diagram of the program. In fact, the structure diagram of the program itself may be very complex, and the scheduling of the program has high computational complexity. The invention proposes a scheduling method for a modular parallel program. Each job of the modular parallel program obtains resources at the same time, and releases the resources at the same time after the job is completed.

发明内容SUMMARY OF THE INVENTION

发明目的：本发明目的是提供一种云环境下模块化并行作业的动态节能调度方法。Purpose of the invention: The purpose of the present invention is to provide a dynamic energy-saving scheduling method for modular parallel jobs in a cloud environment.

技术方案：本发明包括以下步骤：Technical scheme: the present invention comprises the following steps:

(1)根据系统日志，计算系统平均负载及作业平均并行度；(1) According to the system log, calculate the average load of the system and the average degree of parallelism of the job;

(2)根据系统的平均负载，寻找资源的参考工作状态，包括计算资源的工作频率与工作电压；(2) According to the average load of the system, find the reference working state of the resource, including the working frequency and working voltage of the computing resource;

(3)假设资源工作在参考工作状态下，且作业的并行度为平均并行度，计算此时作业的能耗及工作量；(3) Assuming that the resource works in the reference work state, and the parallelism of the job is the average parallelism, calculate the energy consumption and workload of the job at this time;

(4)与步骤(3)得到的结果进行对比，寻找可能比步骤(3)更小能耗的调度方法，若有更新调度方法，若没有则采取步骤(3)的方法。(4) Compare with the results obtained in step (3), and find a scheduling method that may consume less energy than step (3). If there is an update scheduling method, if not, adopt the method of step (3).

所述步骤(1)中系统平均负载及作业平均并行度的计算以资源作业的实际计算量为准。The calculation of the average system load and the average degree of parallelism of the operation in the step (1) is based on the actual calculation amount of the resource operation.

所述步骤(2)中的参考工作状态是在满足系统平均负载且系统总能耗最小时，资源的工作状态。The reference working state in the step (2) is the working state of the resource when the average load of the system is satisfied and the total energy consumption of the system is minimum.

所述步骤(2)中参考工作状态的寻找采用全局查找方法。In the step (2), a global search method is used to search for the reference working state.

所述步骤(4)中的调度方法为：先选择可以满足当前作业的资源种类，再根据作业的时间期限与能耗要求，进行进一步调度。The scheduling method in the step (4) is as follows: first, selecting a resource type that can satisfy the current job, and then performing further scheduling according to the time limit and energy consumption requirements of the job.

有益效果：本发明不需要知道作业的内部结构，控制简单，首先从全局计算能耗，使能耗处于较低的水平，通过节省资源，实现作业的动态调度。Beneficial effects: the present invention does not need to know the internal structure of the job, the control is simple, the energy consumption is firstly calculated globally, so that the energy consumption is at a lower level, and the dynamic scheduling of the job is realized by saving resources.

附图说明Description of drawings

图1为本发明的流程图；Fig. 1 is the flow chart of the present invention;

图2为本发明的实施流程图。FIG. 2 is a flow chart of the implementation of the present invention.

具体实施方式Detailed ways

下面结合附图对本发明作进一步说明。The present invention will be further described below in conjunction with the accompanying drawings.

如图1所示，本发明包括以下步骤：As shown in Figure 1, the present invention comprises the following steps:

(1)根据系统日志，计算系统平均负载及作业平均并行度，其计算以资源作业的实际计算量为准：(1) Calculate the average load of the system and the average degree of parallelism of the job according to the system log. The calculation is based on the actual calculation amount of the resource job:

假设系统已经完成了NUMJ(1≤temp≤NUMJ)个作业：Suppose the system has completed NUMJ (1≤temp≤NUMJ) jobs:

J＝{j₁，...，j_temp，...j_NUMJ} (1)J={j₁ ,...,j_temp ,...j_NUMJ } (1)

对于这些作业，其分配资源数量为：For these jobs, their assigned resource quantities are:

R＝{r₁，...，r_temp，...r_NUMJ} (2)R={r₁ , ..., r_temp , ... r_NUMJ } (2)

这些资源的工作频率及工作电压分别是F及V(计算资源的工作频率及工作电压一经分配，不再改变)：The working frequency and working voltage of these resources are F and V respectively (the working frequency and working voltage of computing resources will not change once they are allocated):

F＝{f₁，...，f_temp，...f_NUMJ} (3)F={f₁ , ..., f_temp , ... f_NUMJ } (3)

则总的计算耗费为CS：Then the total computational cost is CS:

总的执行时间是et，则平均负载AL为:The total execution time is et, then the average load AL is:

对于模块化并行作业j_temp，假设它的并行度为(1,pl_temp)：For a modular parallel job j_temp , assuming its degree of parallelism is (1,pl_temp ):

其中：

表示作业j_temp在并行度为

时，执行时间为

in:

Indicates that job j_temp has a degree of parallelism of

, the execution time is

对于作业集合J，其实际的并行度为：For job set J, its actual parallelism is:

J＝{sl₁，...，sl_temp，...sl_NUMJ} (7)J={sl₁ , ..., sl_temp , ... sl_NUMJ } (7)

则所有作业平均并行度(在实际调度时，我们认为这个是参考并行度)为：Then the average parallelism of all jobs (in actual scheduling, we consider this to be the reference parallelism) is:

(2)根据系统的平均负载，寻找资源的参考工作状态，包括计算资源的工作频率与工作电压，参考工作状态指当系统负载工作在平均水平时，使整个系统的能耗处于最低的工作状体。(2) According to the average load of the system, find the reference working state of the resource, including the working frequency and working voltage of the computing resource. The reference working state means that when the system load works at the average level, the energy consumption of the entire system is at the lowest working state. body.

根据DVFS模型，计算资源P_dy的动态能耗为:According to the DVFS model, the dynamic energy consumption of the computing resource P_dy is:

P_dy＝αC_lV_s²f_s (9)P_dy =αC_l V_s² f_s (9)

其中，α，C_l，V_s及f_s分别表示计算资源的翻转率，负载电容的容值，工作电压，工作频率。其中，一般把αC_l当作常数。Among them, α, C_l , V_s and f_s represent the turnover rate of computing resources, the capacitance of the load capacitor, the working voltage, and the working frequency, respectively. Among them,_αCl is generally regarded as a constant.

基于上述资源的动态能耗的计算方法，模块化并行作业的动态能耗为：Based on the calculation method of the dynamic energy consumption of the above resources, the dynamic energy consumption of the modular parallel job is:

其中，st，num，Ns分别表示计算资源的时钟周期，分配资源数量及计算需要的周期个数(slot)；

分别表示资源在相应的时钟周期内的工作状态(电压、工作频率)。Among them, st, num, and Ns respectively represent the clock cycle of computing resources, the number of allocated resources and the number of cycles required for computing (slot);

Respectively represent the working state (voltage, working frequency) of the resource in the corresponding clock cycle.

由于计算资源的静态能耗P_st是常值，所以静态能耗E_st仅同计算时间及资源类型相关。Since the static energy consumption P_st of computing resources is a constant value, the static energy consumption_Est is only related to the computing time and resource type.

E_st＝st*num*P_st (11)E_st =st*num*P_st (11)

模块化并行作业的计算能耗为E：The computational energy consumption of a modular parallel job is E:

E＝E_dy+E_st (12)E=E_dy +E_st (12)

假设计算资源类型A(CPU类型)，具有nums个工作状态：Suppose computing resource type A (CPU type) has nums working states:

A＝{(v₁，f₁，p₁)，...，(v_temp，f_temp，p_temp)，...(v_nums，f_nums，f_nums)} (13)A={(v₁ , f₁ , p₁ ), ..., (v_temp , f_temp , p_temp ), ... (v_nums , f_nums , f_nums )} (13)

其中，v_temp，f_temp，p_temp分别表示资源工作电压，工作频率及相应的功耗。Among them, v_temp , f_temp , and p_temp respectively represent the working voltage of the resource, the working frequency and the corresponding power consumption.

假设系统总共有NUMK种资源，类型A类资源共有num(A)种。首先对所有资源按照p_temp/v_temp升序排列，平均功耗越低，说明其效率越高。假设公式(10)中，已经完成按平均功耗从低到高的排列顺序。It is assumed that the system has NUMK resources in total, and there are num(A) types of resources of type A. First, arrange all resources in ascending order of p_temp /v_temp . The lower the average power consumption, the higher the efficiency. Assume that in formula (10), the order of the average power consumption from low to high has been completed.

参考工作状态指当系统负载工作在平均水平时，使整个系统的能耗处于最低的工作状体。根据这个目标，采用全局查找方法，查找需要的资源(类型及数量)及其工作状态。假设系统总共有NUMK种资源，第i种资源共有num(i)个资源，并具有nums(A)个状态，假设R是所有资源的集合：R＝[1，2...，NUMK]。The reference working state refers to the working state that makes the energy consumption of the whole system at the lowest level when the system load is working at an average level. According to this goal, a global search method is used to find the required resources (type and quantity) and their working status. Suppose the system has NUMK resources in total, the i-th resource has num(i) resources in total, and has nums(A) states, and suppose R is the set of all resources: R=[1, 2..., NUMK].

对于这些资源，采用算法生成所有可能的排列组合，以此组合作为排列顺序：sol＝perms(R)。For these resources, an algorithm is used to generate all possible permutations, and this combination is used as the permutation order: sol=perms(R).

sol的行数即为可能的调度顺序，总共有Πnum(i)种可能性。假设row(sol)为获得排列组合的行数，下列算法为获得参考工作状态的具体过程：The number of rows in sol is the possible scheduling order, and there are a total of Πnum(i) possibilities. Assuming that row(sol) is the number of rows to obtain the permutation and combination, the following algorithm is the specific process of obtaining the reference working state:

算法1：Algorithm 1:

根据以上方法，资源调度顺序存储于第selrid个组合，相应的资源工作状态存于

每一种资源数量存于rnum中。以此方法，完成资源的参考工作状态的计算。According to the above method, the resource scheduling sequence is stored in the selrid combination, and the corresponding resource working status is stored in

The quantity of each resource is stored in rnum. In this way, the calculation of the reference work state of the resource is completed.

(3)假设作业工作在平均并行度下，资源工作在参考状态下，计算其能耗，及计算总量耗费，并对其时间期限进行判断，判断是否能按要求完成。(3) Assuming that the job is working under the average degree of parallelism and the resource is working under the reference state, calculate its energy consumption and the total consumption of the calculation, and judge its time limit to determine whether it can be completed as required.

其中参考状态能耗计算方式为：平均并行度*资源参考状态下功耗*计算时间。平均并行度为资源数量。计算总量耗费为：平均并行度*资源参考状态下工作频率*计算时间。The reference state energy consumption calculation method is: average parallelism * power consumption in resource reference state * calculation time. The average degree of parallelism is the number of resources. The total calculation cost is: average parallelism * working frequency in resource reference state * calculation time.

(4)基于参考工作状态的资源调度方法为：(4) The resource scheduling method based on the reference working state is:

以上部分已经从全局角度完成了资源的参考状态的计算，但是对于各个作业，由于其本身的特点，时间要求，具体算法，需要进一步调整。对于作业集合J，其对应的时间期限为：The above part has completed the calculation of the reference state of the resource from a global perspective, but for each job, due to its own characteristics, time requirements, and specific algorithms, further adjustments are required. For job set J, the corresponding time period is:

D＝{d₁，...，d_temp，...d_NUMJ}D={d₁ , ..., d_temp , ... d_NUMJ }

在调度过程中，用一个参数saver计算当前作业节省的资源(以参考工作状态的频率为标准)。其调度方法为：先选择可以满足当前作业的资源种类，再根据时间作业的时间期限，能耗要求，进行进一步调度：如果某个方法(作业的并行度，资源的工作状态)比参考状态、平均并行度情况下更能节省能耗，并节省标准工作能源，那么就选择它，并计算saver；否则按照参考电压及工作状态执行。In the scheduling process, a parameter saver is used to calculate the resources saved by the current job (based on the frequency of the reference work status). The scheduling method is: first select the type of resources that can meet the current job, and then perform further scheduling according to the time limit of the time job and energy consumption requirements: if a certain method (the parallelism of the job, the working state of the resource) is better than the reference state, In the case of average parallelism, it can save more energy and save the standard working energy, then select it and calculate the saver; otherwise, it is executed according to the reference voltage and working state.

其具体算法如下：The specific algorithm is as follows:

算法2：Algorithm 2:

图2为本发明的实施流程图：Fig. 2 is the implementation flow chart of the present invention:

1.从历史数据收集所有作业J的分配资源数量R，资源工作频率信息F，并将这些信息作为中心调度的基础信息，以后执行的作业，不断更新历史数据，作为未来调度依据；1. Collect the allocated resource quantity R and resource working frequency information F of all jobs J from the historical data, and use these information as the basic information for central scheduling. For the jobs to be executed in the future, the historical data will be continuously updated as the basis for future scheduling;

2.根据步骤1获得的历史数据，计算平均负载AL，平均并行度AP；2. Calculate the average load AL and the average parallelism AP according to the historical data obtained in step 1;

3.根据历史信息计算资源的参考工作状态，即每一种资源的工作频率与工作电压，参考状态是在满足系统平均负载情况下，各种计算资源调度的数量及处于的工作状态(工作频率、工作电压等)，算法一给出了具体的过程，其具体方法为假设系统的负载为平均负载，通过计算资源可能的分配数量与工作状态，寻找最小能耗时的相关参数(选择计算资源种类、数量及其工作状态等)；3. Calculate the reference working state of resources according to the historical information, that is, the working frequency and working voltage of each resource. The reference state is the number of various computing resources scheduled and the working state (working frequency , working voltage, etc.), Algorithm 1 gives the specific process. The specific method is to assume that the load of the system is an average load, and find the relevant parameters when the minimum energy consumption is calculated by the possible allocation number and working state of the computing resources (select the computing resources type, quantity and its working status, etc.);

4.根据步骤3，假设工作在参考状态下，作业具有平均并行度情况下的能耗seg，及计算总量sin；4. According to step 3, assuming that the work is in the reference state, the energy consumption seg and the calculation total sin when the job has an average degree of parallelism;

5.寻找作业的工作状态，使能耗及计算总量分别小于上一步的结果，若找到，则用新的调度方法替换，算法2给出了具体过程，其具体过程为：通过改变计算资源数量、模块化并行作业并行度、计算资源工作状态，计算不同情况下的能耗及计算规模，如果两者分别小于能耗及计算规模则采取新的调度方案；5. Find the working state of the job so that the energy consumption and the total calculation amount are respectively smaller than the results of the previous step. If found, replace it with a new scheduling method. Algorithm 2 gives the specific process. The specific process is: by changing the computing resources Quantity, degree of parallelism of modularized parallel jobs, working status of computing resources, calculate energy consumption and computing scale in different situations, if the two are less than energy consumption and computing scale respectively, a new scheduling scheme is adopted;

6.经过一定时间段后，重新收集历史数据，转至第一步。6. After a certain period of time, re-collect historical data and go to the first step.

Claims

Translated fromChinese

1.一种云环境下模块化并行作业的动态节能调度方法，其特征在于，包括以下步骤：1. A dynamic energy-saving scheduling method for modularized parallel jobs in a cloud environment, comprising the following steps:

（1）根据系统日志，计算系统平均负载及作业平均并行度；(1) Calculate the average load of the system and the average degree of parallelism of the job according to the system log;

（2）根据系统的平均负载，寻找资源的参考工作状态，包括计算资源的工作频率与工作电压，所述的参考工作状态是在满足系统平均负载且系统总能耗最小时，资源的工作状态；(2) According to the average load of the system, find the reference working state of the resource, including the working frequency and working voltage of the computing resource. The reference working state is the working state of the resource when the system average load is satisfied and the total system energy consumption is the smallest. ;

（3）假设资源工作在参考工作状态下，且作业的并行度为平均并行度，计算此时作业的能耗及工作量；(3) Assuming that the resource is working in the reference work state, and the parallelism of the job is the average parallelism, calculate the energy consumption and workload of the job at this time;

（4）根据步骤（3）得到的结果，判断是否有比步骤（3）更小能耗的调度方法，若有更新调度方法，若没有则采取步骤（3）的方法。(4) According to the result obtained in step (3), determine whether there is a scheduling method with lower energy consumption than step (3). If there is an update scheduling method, if not, adopt the method of step (3).

2.根据权利要求1所述的一种云环境下模块化并行作业的动态节能调度方法，其特征在于，所述步骤（1）中系统平均负载及作业平均并行度的计算以资源作业的实际计算量为准。2 . The dynamic energy-saving scheduling method for modular parallel jobs in a cloud environment according to claim 1 , wherein in the step (1), the average system load and the average job parallelism are calculated based on the actual resource jobs. 3 . The amount of calculation shall prevail.

3.根据权利要求1所述的一种云环境下模块化并行作业的动态节能调度方法，其特征在于，所述步骤（2）中参考工作状态的寻找采用全局查找方法。3 . The method for dynamic energy-saving scheduling of modular parallel jobs in a cloud environment according to claim 1 , wherein in the step (2), a global search method is used to search for the reference working state. 4 .

4.根据权利要求1所述的一种云环境下模块化并行作业的动态节能调度方法，其特征在于，所述步骤（4）中的调度方法为：先选择能够满足当前作业的资源种类，再根据作业的时间期限与能耗要求，进行进一步调度。4 . The dynamic energy-saving scheduling method for modular parallel jobs in a cloud environment according to claim 1 , wherein the scheduling method in step (4) is: first selecting a resource type that can satisfy the current job, 5 . Further scheduling is carried out according to the time limit and energy consumption requirements of the job.