CN114610397A - Load balancing instruction scheduling method, device, equipment and medium - Google Patents

Load balancing instruction scheduling method, device, equipment and medium

Info

Publication number
CN114610397A
Authority
CN
China
Prior art keywords
instruction
instructions
scheduling model
pipeline
operation pipeline
Prior art date
Legal status
Pending
Application number
CN202210268534.7A
Other languages
Chinese (zh)
Inventor
李雅静
Current Assignee
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Original Assignee
Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority date
Filing date
Publication date
Application filed by Shandong Yunhai Guochuang Cloud Computing Equipment Industry Innovation Center Co Ltd
Priority to CN202210268534.7A
Publication of CN114610397A
Legal status: Pending


Abstract

Translated from Chinese



The present invention provides a load balancing instruction scheduling method, device, equipment, and medium. The method includes: establishing an instruction allocation scheduling model based on the issue waiting delays of different instructions; if no data dependency exists among multiple instructions, allocating each instruction to the operation pipeline that minimizes the variance of the corresponding result under the instruction allocation scheduling model; if a data dependency exists among multiple instructions, judging whether the dependent instructions satisfy a preset dependency constraint, and if not, having the dependent instruction with lower execution priority successively select the operation pipeline used by an instruction whose issue waiting delay is smaller than its own, until the constraint is satisfied. Based on this method, the present invention further provides a load balancing instruction scheduling device, which effectively improves the reliability of balanced allocation of scheduled instructions when dependent instructions are executed.


Description

Translated from Chinese
A load balancing instruction scheduling method, device, equipment and medium

Technical Field

The present invention relates to the field of instruction scheduling in processors, and in particular to a load balancing instruction scheduling method, device, equipment, and medium.

Background Art

Superscalar processors exploit parallelism by enlarging the instruction window and making full use of the processor's internal computing units. In general, multiple operation clusters can be provided, each containing operation units for different instruction types. By comprehensively considering performance requirements such as instruction characteristics and hardware overhead, an appropriate issue allocation strategy can make better use of the operation clusters' performance.

In the prior art, instruction allocation methods include simple static allocation methods (software scheduling performed by a compiler), such as the round-robin method, which is easy to implement but performs poorly, and complex allocation methods, which consider more factors and can exploit buffering and computing resources to obtain a better balancing effect.

However, none of the prior art considers the impact of instruction dependencies and data dependencies on instruction scheduling. When instructions have a RAW (read-after-write) dependency, an unreasonable issue allocation order will, at best, stall the pipeline and, at worst, change the instruction order and cause program execution errors.

Summary of the Invention

To solve the problems in the prior art, the present invention proposes a load balancing instruction scheduling method, device, equipment, and medium, which effectively solves the problem that the execution order of dependent instructions becomes disordered because the prior art does not consider instruction dependency, and effectively improves the reliability of balanced allocation of scheduled instructions when dependent instructions are executed.

A first aspect of the present invention provides a load balancing instruction scheduling method, running in a control unit of a processor, comprising:

establishing an instruction allocation scheduling model based on the issue waiting delays of different instructions, for evenly allocating multiple instructions to operation pipelines composed of different operation units;

obtaining the relationships among multiple instructions; if no data dependency exists among the instructions, allocating each instruction to the operation pipeline that minimizes the variance of the corresponding result under the instruction allocation scheduling model; if a data dependency exists among the instructions, allocating each instruction to the operation pipeline that minimizes the variance of the corresponding result under the model while judging whether the dependent instructions satisfy a preset dependency constraint; if not, the dependent instruction with lower execution priority successively selects the operation pipeline used by an instruction whose issue waiting delay is smaller than its own, until the constraint is satisfied.

Optionally, establishing the instruction allocation scheduling model based on the issue waiting delays of different instructions, for evenly allocating multiple instructions to operation pipelines composed of different operation units, specifically comprises:

obtaining the issue waiting delays of different instructions and the issue waiting delays of instructions already present on different operation pipelines;

establishing, according to the obtained issue waiting delays of different instructions and the issue waiting delays of instructions already present on different operation pipelines, an instruction allocation scheduling model for evenly allocating multiple instructions to operation pipelines composed of different operation units.

Further, the established instruction allocation scheduling model is specifically:

$$L_{ij} = I(i,j)\left(w_{ij} + Lat_{ij}\right), \qquad \sum_{j=0}^{m-1} I(i,j) = 1$$

where L_ij is the waiting delay until issue completion when instruction i is allocated to operation pipeline j, m is the total number of operation pipelines, w_ij is the issue waiting delay of instruction i on operation pipeline j, Lat_ij is the issue-completion waiting delay of the instructions already on operation pipeline j when instruction i is allocated to it, and I(i,j) indicates whether instruction i is allocated to operation pipeline j.
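As an illustration of the model, the per-pipeline issue-completion delay L_ij = w_ij + Lat_ij can be evaluated for every candidate pipeline and the smallest chosen; this is a sketch, and the function and variable names are ours, not the patent's:

```python
# Sketch of the allocation scheduling model: for instruction i, the
# issue-completion waiting delay on pipeline j is L_ij = w_ij + Lat_ij.
# Names (w_i, lat) are illustrative assumptions.

def completion_delays(w_i, lat):
    """w_i[j]: issue waiting delay of the instruction on pipeline j;
    lat[j]: delay until pipeline j's already-queued instructions finish issuing."""
    return [w + l for w, l in zip(w_i, lat)]

def best_pipeline(w_i, lat):
    """Choose the pipeline minimizing L_ij (the indicator I(i,j) is 1 there)."""
    delays = completion_delays(w_i, lat)
    return min(range(len(delays)), key=delays.__getitem__)

# Example with 3 pipelines
w_i = [2, 2, 2]          # same issue delay on every pipeline (w_ij = w_i)
lat = [5, 1, 3]          # existing backlog per pipeline
assert completion_delays(w_i, lat) == [7, 3, 5]
assert best_pipeline(w_i, lat) == 1
```

Pipeline 1 wins here because its backlog plus the instruction's own issue delay is smallest.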

Further, allocating each instruction to the operation pipeline that minimizes the variance of the corresponding result under the instruction allocation scheduling model specifically comprises:

using a symmetric greedy method to solve the instruction allocation scheduling model, so that each instruction is allocated to the operation pipeline that minimizes the variance of the corresponding result under the model.

Further, using the symmetric greedy method to solve the instruction allocation scheduling model, so that each instruction is allocated to the operation pipeline that minimizes the variance of the corresponding result under the model, specifically comprises:

Under the symmetric greedy method, the variance of the results corresponding to the n instructions under the instruction allocation scheduling model is:

$$Var_n = \frac{1}{m}\sum_{i=1}^{n} w_i\left(2L_{(i-1)k_i} + w_i\right) + C$$
where Var_n is the variance of the results L_ij corresponding to the n instructions in the instruction allocation scheduling model, w_i is the issue waiting delay of instruction i on operation pipeline j under the symmetric greedy method, C is a constant, k_i is the optimal value of j under the symmetric greedy method, and L_{(i-1)k_i} is the waiting delay until issue completion of the (i-1)-th instruction allocated to the optimal operation pipeline k_i.

Optionally, the instructions having a data dependency include a producer instruction and a consumer instruction, where the producer instruction and the consumer instruction have a read-after-write dependency.

Further, the preset dependency constraint is:

$$\sum_{l=0}^{m-1} I(h,l)\left(Lat_{hl} + w_{hl}\right) \ge \sum_{j=0}^{m-1} I(x,j)\left(Lat_{xj} + w_{xj} + exe_x\right)$$

where Lat_xj is the issue-completion waiting delay of the instructions already on operation pipeline j when producer instruction x is allocated to it, w_xj is the issue waiting delay of producer instruction x on operation pipeline j, I(x,j) indicates whether producer instruction x is allocated to operation pipeline j, and exe_x is the execution time of producer instruction x on operation pipeline j; Lat_hl is the issue-completion waiting delay of the instructions already on operation pipeline l when consumer instruction h is allocated to it, w_hl is the issue waiting delay of consumer instruction h on operation pipeline l, and I(h,l) indicates whether consumer instruction h is allocated to operation pipeline l.
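A minimal sketch of checking this constraint for one producer/consumer pair on their chosen pipelines (names are illustrative assumptions): the consumer's issue time Lat_hl + w_hl must be no earlier than the producer's completion time Lat_xj + w_xj + exe_x.

```python
# Hypothetical check of the preset RAW dependency constraint for a single
# producer x (on pipeline j) and consumer h (on pipeline l).

def constraint_satisfied(lat_xj, w_xj, exe_x, lat_hl, w_hl):
    """True if consumer h on pipeline l issues no earlier than
    producer x on pipeline j finishes executing."""
    return lat_hl + w_hl >= lat_xj + w_xj + exe_x

# Producer finishes at 1 + 2 + 4 = 7; consumer would issue at 3 + 2 = 5 -> violated.
assert not constraint_satisfied(1, 2, 4, 3, 2)
# A more-delayed consumer pipeline (Lat_hl = 6) satisfies it: 6 + 2 >= 7.
assert constraint_satisfied(1, 2, 4, 6, 2)
```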

A second aspect of the present invention provides a load balancing instruction scheduling device, running in a control unit of a processor, comprising:

an establishing module, for establishing an instruction allocation scheduling model based on the issue waiting delays of different instructions, for evenly allocating multiple instructions to operation pipelines composed of different operation units; and

an allocation module, for obtaining the relationships among multiple instructions; if no data dependency exists among the instructions, allocating each instruction to the operation pipeline that minimizes the variance of the corresponding result under the instruction allocation scheduling model; if a data dependency exists, allocating each instruction to the operation pipeline that minimizes the variance of the corresponding result under the model while judging whether the dependent instructions satisfy the preset dependency constraint; if not, the dependent instruction with lower execution priority successively selects the operation pipeline used by an instruction whose issue waiting delay is smaller than its own, until the constraint is satisfied.

A third aspect of the present invention provides an electronic device, comprising: a memory for storing a computer program; and a processor for implementing the steps of the load balancing instruction scheduling method according to the first aspect of the present invention when executing the computer program.

A fourth aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the load balancing instruction scheduling method according to the first aspect of the present invention.

The technical solution adopted by the present invention has the following technical effects:

1. By judging whether dependencies exist between instructions and whether the dependency constraint is satisfied, the technical solution of the present invention effectively solves the problem that the execution order of dependent instructions becomes disordered because the prior art does not consider instruction dependency, and effectively improves the reliability of balanced allocation of scheduled instructions when dependent instructions are executed.

2. Without relying on static compiler scheduling optimization, the technical solution of the present invention can account for data dependencies among multiple simultaneously issued instructions and complete the pipeline scheduling and allocation of different instructions, achieving load balancing, reducing the bottleneck effect of the instruction stream, increasing instruction throughput, and improving computation speed.

3. The technical solution of the present invention uses a symmetric greedy method to solve the instruction allocation scheduling model, so that a relatively good global solution can be found as quickly as possible.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present invention.

Brief Description of the Drawings

To more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

Fig. 1 is a schematic flow chart of the method of Embodiment 1 of the present invention;

Fig. 2 is a schematic flow chart of step S1 in the method of Embodiment 1 of the present invention;

Fig. 3 is a schematic structural diagram of the device of Embodiment 2 of the present invention;

Fig. 4 is a schematic structural diagram of the equipment of Embodiment 3 of the present invention.

Detailed Description of the Embodiments

To clearly explain the technical features of this solution, the present invention is described in detail below through specific embodiments with reference to the accompanying drawings. The following disclosure provides many different embodiments or examples for implementing different structures of the invention. To simplify the disclosure, the components and arrangements of specific examples are described below. Furthermore, the present invention may repeat reference numerals and/or letters in different examples. This repetition is for simplicity and clarity and does not in itself indicate a relationship between the various embodiments and/or arrangements discussed. It should be noted that the components illustrated in the figures are not necessarily drawn to scale. Descriptions of well-known components, processing techniques, and processes are omitted to avoid unnecessarily limiting the present invention.

Embodiment 1

As shown in Fig. 1, the present invention provides a load balancing instruction scheduling method, running in a control unit of a processor, comprising:

S1: establishing an instruction allocation scheduling model based on the issue waiting delays of different instructions, for evenly allocating multiple instructions to operation pipelines composed of different operation units;

S2: obtaining the relationships among multiple instructions and judging whether a data dependency exists among them; if yes, executing step S3; if no, executing step S4;

S3: allocating each instruction to the operation pipeline that minimizes the variance of the corresponding result under the instruction allocation scheduling model, while judging whether the dependent instructions satisfy the preset dependency constraint; if yes, executing step S5; if no, executing step S6;

S4: allocating each instruction to the operation pipeline that minimizes the variance of the corresponding result under the instruction allocation scheduling model;

S5: instruction allocation and scheduling is complete;

S6: the dependent instruction with lower execution priority successively selects the operation pipeline used by an instruction whose issue waiting delay is smaller than its own, until the constraint is satisfied.
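Steps S1-S6 can be sketched end to end as follows. This is a simplified illustration under assumed data structures; all names, and the reduction of S6 to "retry pipelines in order of increasing delay", are our assumptions, not the patent's:

```python
# Simplified sketch of S1-S6: each instruction goes to the pipeline minimizing
# its issue-completion delay (S3/S4); a RAW-dependent consumer is moved to a
# pipeline whose issue time meets the producer's completion time (S6).
def schedule(instrs, lat, producer_done=None):
    """instrs: (name, w) pairs in execution-priority order; lat: per-pipeline
    backlog delays; producer_done: name -> producer completion time for
    RAW-dependent consumers."""
    producer_done = producer_done or {}
    placement = {}
    for name, w in instrs:
        # S3/S4: greedy, variance-minimizing choice (least-delayed pipeline)
        order = sorted(range(len(lat)), key=lambda p: lat[p] + w)
        j = order[0]
        if name in producer_done:
            # S6: walk toward more-delayed pipelines until the constraint holds
            for cand in order:
                if lat[cand] + w >= producer_done[name]:
                    j = cand
                    break
        placement[name] = j
        lat[j] += w  # the chosen pipeline is now busier
    return placement

place = schedule([("a", 2), ("b", 2)], [0, 0, 5], producer_done={"b": 6})
assert place["a"] == 0   # empty pipeline, smallest delay
assert place["b"] == 2   # only pipeline whose issue time (5 + 2) meets t = 6
```

The dependent instruction "b" skips the emptier pipelines because issuing there would precede its producer's completion.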

As shown in Fig. 2, step S1 specifically comprises:

S11: obtaining the issue waiting delays of different instructions and the issue waiting delays of instructions already present on different operation pipelines;

S12: establishing, according to the obtained issue waiting delays of different instructions and the issue waiting delays of instructions already present on different operation pipelines, an instruction allocation scheduling model for evenly allocating multiple instructions to operation pipelines composed of different operation units.

In steps S11-S12, the established instruction allocation scheduling model is specifically:

$$L_{ij} = I(i,j)\left(w_{ij} + Lat_{ij}\right), \qquad \sum_{j=0}^{m-1} I(i,j) = 1$$

where L_ij is the waiting delay until issue completion when instruction i is allocated to operation pipeline j, m is the total number of operation pipelines, w_ij is the issue waiting delay of instruction i on operation pipeline j, Lat_ij is the issue-completion waiting delay of the instructions already on operation pipeline j when instruction i is allocated to it, and I(i,j) indicates whether instruction i is allocated to operation pipeline j.

The operation clusters (a processor contains multiple operation clusters, each comprising multiple operation units) are configured according to the following principles: operation clusters are configured by category (integer, floating point, memory access); each instruction type corresponds to only one kind of operation unit; the number of units is positively correlated with usage frequency; and computing units are configured so that their execution times are complementary.

Instruction issue delay originates from the operation clusters and the operation units. Each operation cluster's issue channel can issue only one instruction per cycle, corresponding to a one-cycle issue delay. In addition, an operation unit can execute a new instruction only after it has finished executing the current instruction and is idle, corresponding to the operation unit's issue delay.

To eliminate false dependencies, the registers in the processor are renamed, leaving only true RAW dependencies: a consumer instruction can execute only after the producer instruction has finished. Each operation cluster corresponds to one instruction type using the same operation units, and to one issue buffer; decoded and renamed instructions are placed in the waiting buffer. Once the hardware design is fixed, the type and number of instructions issued per cycle are fixed. The conditions for issuing and executing an instruction are: the operands are ready, the operation cluster is idle, and the computing unit is idle.

After an instruction is issued, the operation cluster and the computing unit are busy; in the next cycle the operation cluster becomes free, and the computing unit becomes free after execution finishes. Therefore, operand dependencies, the range of operation clusters an instruction may select, and instruction latency affect the instruction execution state. After a program is compiled into instructions by the compiler, the instructions are stored in the instruction memory (registers). After the fetch unit fetches and issues them, the control unit renames the registers, and then they can be executed.

According to the instruction type and the load of each operation cluster, the control unit schedules the instructions in turn to different operation clusters. Each operation cluster places its allocated instructions into a waiting buffer; once an instruction's operands are ready, the corresponding operation cluster is idle, and the computing unit is idle, the instruction is issued to the operation unit for execution.

Assume there are m operation pipelines (operation pipelines composed of multiple operation units in different operation clusters, numbered from 0 to m-1) and n instructions are issued, with fixed instruction execution times. The issue waiting delay of instruction i on operation pipeline j is w_ij; when instruction i cannot be allocated to operation pipeline j, let w_ij = max_Lat. Instructions already on operation pipeline j must wait Lat_j to complete issue. I(i,j) = 1 indicates that instruction i is allocated to operation pipeline j, and I(i,j) = 0 the opposite, i.e., instruction i is not allocated to operation pipeline j; therefore

$$\sum_{j=0}^{m-1} I(i,j) = 1$$
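This setup can be sketched as follows (MAX_LAT and the helper names are our assumptions): pipelines an instruction cannot use receive w_ij = max_Lat so the greedy choice avoids them, and each instruction's indicator row sums to 1.

```python
# Sketch: build w_ij with max_Lat for pipelines instruction i cannot use,
# then allocate and record the indicator I(i,j). Names are illustrative.
MAX_LAT = 10**9   # stands in for max_Lat

def build_w(feasible, w, m):
    """feasible: set of pipelines instruction i may use; w: its issue delay."""
    return [w if j in feasible else MAX_LAT for j in range(m)]

def allocate(w_row, lat):
    """Pick the pipeline minimizing w_ij + Lat_j and return it with I(i, .)."""
    j = min(range(len(lat)), key=lambda p: w_row[p] + lat[p])
    indicator = [1 if p == j else 0 for p in range(len(lat))]
    return j, indicator

w_row = build_w({1, 2}, 3, 4)        # pipelines 0 and 3 infeasible
j, ind = allocate(w_row, [0, 4, 1, 0])
assert j == 2                        # feasible pipeline with smallest w + Lat
assert sum(ind) == 1                 # each instruction lands on exactly one pipeline
```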

In step S2, the instructions having a data dependency include a producer instruction and a consumer instruction with a read-after-write dependency, i.e., the consumer instruction must be issued after the producer instruction has finished executing.

In steps S3-S6, allocating each instruction to the operation pipeline that minimizes the variance of the corresponding result under the instruction allocation scheduling model specifically comprises:

using the symmetric greedy method to solve the instruction allocation scheduling model, so that each instruction is allocated to the operation pipeline that minimizes the variance of the corresponding result under the model.

Specifically, using the symmetric greedy method to solve the instruction allocation scheduling model, so that each instruction is allocated to the operation pipeline that minimizes the variance of the corresponding result under the model, comprises:

Under the symmetric greedy method, the variance of the results corresponding to the n instructions under the instruction allocation scheduling model is:

$$Var_n = \frac{1}{m}\sum_{i=1}^{n} w_i\left(2L_{(i-1)k_i} + w_i\right) + C$$
where Var_n is the variance of the results L_ij corresponding to the n instructions in the instruction allocation scheduling model, w_i is the issue waiting delay of instruction i on operation pipeline j under the symmetric greedy method, C is a constant, k_i is the optimal value of j under the symmetric greedy method, and L_{(i-1)k_i} is the waiting delay until issue completion of the (i-1)-th instruction allocated to the optimal operation pipeline k_i.

In the instruction allocation scheduling model, the load balancing effect of L_ij (the waiting delay until issue completion when instruction i is allocated to operation pipeline j) is expressed by the variance:

$$Var = \frac{1}{m}\sum_{j=0}^{m-1}\left(L_j - \bar{L}\right)^2$$

where

$$\bar{L} = \frac{1}{m}\sum_{j=0}^{m-1} L_j$$

The balancing objective is to solve for the pipeline allocation strategy I(i,j) of each instruction i that minimizes the variance Var.

This nonlinear programming problem can be solved by exhaustive search, but that takes a long time and cannot be implemented in hardware. For the processor, the known state consists of the n instructions after decode and allocation, so a greedy method that finds local optimal solutions can be used; the series of local optimal solutions allows the system to find a relatively good global solution as quickly as possible.

For instruction i, the loads of all operation pipelines are L_{i,j} (j = 0, ..., m-1), with initial load

$$L_{0,j} = Lat_j$$

The symmetric greedy method solves for the pipeline j allocated to the i-th instruction, satisfying the objective

$$k_i = \arg\min_j L_{i-1,j}$$

Suppose the optimal solution for the current j is k_i. If there are multiple solutions, the assignments rotate among them.
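The greedy choice with rotating tie-breaking might look like the following sketch (function and parameter names are illustrative assumptions):

```python
# Symmetric greedy choice k_i = argmin_j L_{i-1,j}; when several pipelines
# tie for the minimum load, assignments rotate among them.

def greedy_pick(loads, rotation):
    """loads: current per-pipeline loads L_{i-1,j}; rotation: counter used to
    rotate among tied minimum-load pipelines."""
    lo = min(loads)
    tied = [j for j, v in enumerate(loads) if v == lo]
    return tied[rotation % len(tied)]

assert greedy_pick([3, 1, 1, 5], rotation=0) == 1
assert greedy_pick([3, 1, 1, 5], rotation=1) == 2   # tie broken by rotation
```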

At this time,

$$L_{i,k_i} = L_{i-1,k_i} + w_i, \qquad L_{i,j} = L_{i-1,j} \;\; (j \ne k_i)$$

After allocating instruction i, the mean of all operation pipeline loads L_i is

$$\bar{L}_i = \frac{1}{m}\sum_{j=0}^{m-1} L_{i,j} = \bar{L}_{i-1} + \frac{w_i}{m}$$

After allocating instruction i, the variance of all operation pipeline loads $L_i$ is

$$\mathrm{Var}_i = \frac{1}{m}\sum_{j=1}^{m}\left(L_{i,j}-\overline{L}_i\right)^{2} = \mathrm{Var}_{i-1} + \frac{2\,w_{i,k_i}}{m}\left(L_{i-1,k_i}-\overline{L}_{i-1}\right) + \frac{w_{i,k_i}^{2}}{m}\left(1-\frac{1}{m}\right)$$

Then, after n instructions, the loads of all operation pipelines are $L_{n,j}$, with variance

$$\mathrm{Var}_n = \frac{1}{m}\sum_{j=1}^{m}\left(L_{n,j}-\overline{L}_n\right)^{2}$$

In the symmetric greedy case, the issue waiting delay of an instruction is the same on whichever operation pipeline it is issued, i.e. $w_{ij} = w_i$; therefore

$$\mathrm{Var}_n = \frac{2}{m}\sum_{i=1}^{n} w_i\,L_{i-1,k_i} + C$$

where C is a constant.

When instruction dependencies are not considered, instruction i selecting the pipeline with the smallest $L_{i-1,j}$ as $k_i$ minimizes the variance; and because $L_{ij}$ increases with i, the per-instruction loads $L_{i-1,k_i}$ also increase with i.

To minimize $\mathrm{Var}_n$, preferably, the instructions are sorted by $w_i$ from large to small, and each instruction takes

$$k_i = \arg\min_{j} L_{i-1,j}$$

which determines the allocation strategy.

Further, when data dependencies exist between instructions, the dependent instructions comprise a production instruction and a consumption instruction, with a read-after-write (RAW) dependency between them.

Correspondingly, the preset related constraint is:

$$\sum_{j=1}^{m} I(x,j)\left(\mathrm{Lat}_{xj} + w_{xj} + ex_x\right) \le \sum_{l=1}^{m} I(h,l)\left(\mathrm{Lat}_{hl} + w_{hl}\right)$$

where $\mathrm{Lat}_{xj}$ is the issue-completion waiting delay of the instructions already on operation pipeline j when the production instruction x is allocated to it, $w_{xj}$ is the issue waiting delay of the production instruction x on pipeline j, $I(x,j)$ denotes the possibility that the production instruction x is allocated to pipeline j, and $ex_x$ is the execution time of the production instruction x on pipeline j; $\mathrm{Lat}_{hl}$ is the issue-completion waiting delay of the instructions already on operation pipeline l when the consumption instruction h is allocated to it, $w_{hl}$ is the issue waiting delay of the consumption instruction h on pipeline l, and $I(h,l)$ denotes the possibility that the consumption instruction h is allocated to pipeline l.

In the prior art, the data dependencies of instructions are generally considered in the buffer, after allocation scheduling has already completed; a consumption instruction may then incur extra delay waiting for the production instruction to produce its operand, even when its operation cluster and functional units are idle. The present invention considers data dependencies in the allocation stage after register renaming (before the instructions are allocated) and uses them as one of the reference inputs to the allocation strategy.

After fetch, decode, and renaming, the processor knows the instruction type and the execution time $ex_x$; when scheduling the consumption instruction h and the production instruction x, $\mathrm{Lat}_{hl}$, $w_{hl}$, $\mathrm{Lat}_{xj}$, and $w_{xj}$ are all known constants.

In the allocation stage after renaming, taking the data dependency constraint into account, the instructions are sorted by $w_i$ from large to small, and each instruction takes the operation pipeline j with the minimum $L_{i-1,j}$, i.e. $k_i$; it is then checked whether the RAW constraint is satisfied. If it is not, the consumption instruction h selects, in turn, the pipelines l used by instructions whose $w$ value is smaller than $w_h$, until the constraint is satisfied.
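A sketch of this dependency-aware fallback (illustrative; the helper names and the scalar form of the constraint are my own simplification of the formula given earlier): the producer x must have its result ready, $\mathrm{Lat}_x + w_x + ex_x$, no later than the consumer h issues at $\mathrm{Lat}_{hl} + w_h$; otherwise h walks down the pipelines used by smaller-$w$ instructions.

```python
def raw_satisfied(lat_x, w_x, ex_x, lat_hl, w_h):
    """RAW constraint: producer result ready no later than the consumer issues."""
    return lat_x + w_x + ex_x <= lat_hl + w_h

def place_consumer(preferred, fallbacks, lat, lat_x, w_x, ex_x, w_h):
    """Try the greedy pick first, then the pipelines used by instructions with
    smaller w (in order), until the RAW constraint holds.

    lat[l] is the existing issue-completion waiting delay on pipeline l.
    """
    for l in [preferred] + list(fallbacks):
        if raw_satisfied(lat_x, w_x, ex_x, lat[l], w_h):
            return l
    return preferred  # no pipeline satisfies the constraint; keep the greedy choice
```

For instance, if the greedy pipeline has no backlog (`lat[0] = 0`) the consumer would issue before the producer finishes, so it falls back to a busier pipeline (`lat[1] = 4`) where the constraint holds.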

By judging whether dependencies exist between instructions and whether the related constraint is satisfied, the technical solution of the present invention effectively solves the problem that the execution order of dependent instructions is disordered because the prior art does not consider instruction dependencies, and effectively improves the reliability of balanced allocation of scheduled instructions when dependent instructions are executed.

Without relying on static compiler scheduling, the technical solution of the present invention can account for data dependencies among multiple instructions issued simultaneously and complete the pipeline scheduling and allocation of the different instructions, achieving load balancing, reducing the bottleneck effect of the instruction stream, increasing instruction throughput, and increasing operation speed.

The technical solution of the present invention uses the symmetric greedy method to solve the instruction allocation scheduling model, so that a relatively good global solution can be found as quickly as possible.

Embodiment 2

As shown in FIG. 3, the technical solution of the present invention further provides a load balancing instruction scheduling apparatus, running in a control unit of a processor, comprising:

an establishing module 101, which establishes an instruction allocation scheduling model based on the issue waiting delays of different instructions, for evenly allocating multiple instructions to operation pipelines composed of different operation units;

an allocation module 102, which obtains the interrelations among multiple instructions; if no data dependency exists among the instructions, it allocates each instruction to the operation pipeline with the minimum variance of the corresponding result under the instruction allocation scheduling model for that instruction; if data dependencies exist, it likewise allocates each instruction to the minimum-variance operation pipeline and additionally judges whether the preset related constraint is satisfied between the dependent instructions; if not, the dependent instruction with the lower execution priority selects, in turn, the operation pipelines used by instructions whose issue waiting delay is smaller than its own, until the constraint is satisfied.


Embodiment 3

As shown in FIG. 4, the technical solution of the present invention further provides an electronic device, comprising: a memory 201 for storing a computer program; and a processor 202 for implementing the steps of the load balancing instruction scheduling method of Embodiment 1 when executing the computer program.

The memory 201 in the embodiments of the present application is used to store various types of data to support the operation of the electronic device, for example any computer program operated on the electronic device. The memory 201 may be volatile memory, non-volatile memory, or both. Non-volatile memory may be Read-Only Memory (ROM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), ferromagnetic random access memory (FRAM), Flash Memory, magnetic surface memory, optical disc, or Compact Disc Read-Only Memory (CD-ROM); magnetic surface memory may be magnetic disk or magnetic tape memory. Volatile memory may be Random Access Memory (RAM), used as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data Rate Synchronous Dynamic Random Access Memory (DDR SDRAM), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM), SyncLink Dynamic Random Access Memory (SLDRAM), and Direct Rambus Random Access Memory (DRRAM). The memory 201 described in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.

The methods disclosed in the above embodiments of the present application may be applied to, or implemented by, the processor 202. The processor 202 may be an integrated circuit chip with signal-processing capability. In implementation, each step of the above method may be completed by an integrated logic circuit of hardware in the processor 202 or by instructions in the form of software. The processor 202 may be a general-purpose processor, a DSP (Digital Signal Processor, i.e. a chip capable of digital signal processing), another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 202 may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of this application. A general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed in the embodiments of this application may be embodied directly as being executed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. A software module may be located in a storage medium located in the memory 201; the processor 202 reads the program in the memory 201 and completes the steps of the foregoing method in combination with its hardware. When the processor 202 executes the program, the corresponding flow in each method of the embodiments of the present application is implemented, which, for brevity, is not repeated here.


Embodiment 4

The technical solution of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the load balancing instruction scheduling method of Embodiment 1.

For example, it includes a memory 201 storing a computer program executable by the processor 202 to complete the steps of the aforementioned method. The computer-readable storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, Flash Memory, magnetic surface memory, optical disc, CD-ROM, or similar memory.

Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by hardware related to program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the storage medium includes removable storage devices, ROM, RAM, magnetic disks, optical discs, and other media that can store program code. Alternatively, if the above integrated units of the present application are implemented as software function modules and sold or used as independent products, they may also be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing an electronic device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes removable storage devices, ROM, RAM, magnetic disks, optical discs, and other media that can store program code.


Although the specific embodiments of the present invention have been described above with reference to the accompanying drawings, they do not limit the protection scope of the present invention. Those skilled in the art should understand that, on the basis of the technical solutions of the present invention, various modifications or variations that can be made without creative effort still fall within the protection scope of the present invention.

Claims (10)

1. A load balancing instruction scheduling method, running in a control unit of a processor, comprising the following steps:
establishing an instruction allocation scheduling model based on the issue waiting delays of different instructions, the model being used to allocate a plurality of instructions in a balanced manner to operation pipelines composed of different operation units;
acquiring the interrelation among a plurality of instructions, and, if no data dependency exists among the plurality of instructions, allocating each instruction to the operation pipeline with the minimum variance of the corresponding result under the instruction allocation scheduling model corresponding to the instruction; if data dependencies exist among the plurality of instructions, allocating each instruction to the operation pipeline with the minimum variance of the corresponding result under the instruction allocation scheduling model corresponding to the instruction, and judging whether a preset related constraint is satisfied among the instructions with data dependencies; if the constraint is not satisfied, the instruction with a data dependency and a low execution priority sequentially selects the operation pipelines used by instructions whose issue waiting delay is smaller than the issue waiting delay corresponding to its own instruction, until the constraint is satisfied.
2. The method as claimed in claim 1, wherein establishing the instruction allocation scheduling model based on the issue waiting delays of different instructions, for allocating a plurality of instructions in a balanced manner to the operation pipelines composed of different operation units, specifically comprises:
acquiring the issue waiting delays of different instructions and the issue-completion waiting delays of the instructions already on different operation pipelines;
and establishing, according to the acquired issue waiting delays of different instructions and the issue-completion waiting delays of the instructions already on different operation pipelines, an instruction allocation scheduling model for uniformly allocating a plurality of instructions to the operation pipelines composed of different operation units.
3. The method as claimed in claim 2, wherein the instruction allocation scheduling model is specifically:

$$L_{ij} = \mathrm{Lat}_{ij} + w_{ij}\,I(i,j), \quad j = 1, \dots, m$$

wherein $L_{ij}$ is the issue-completion waiting delay of instruction i allocated to operation pipeline j, m is the total number of operation pipelines, $w_{ij}$ is the issue waiting delay of instruction i allocated to operation pipeline j, $\mathrm{Lat}_{ij}$ is the issue-completion waiting delay of the instructions already on operation pipeline j when instruction i is allocated to it, and $I(i,j)$ represents the possibility that instruction i is allocated to operation pipeline j.
4. The method as claimed in claim 3, wherein allocating each instruction to the operation pipeline with the minimum variance of the corresponding result under the instruction allocation scheduling model corresponding to the instruction specifically comprises:
adopting a symmetric greedy solution to solve the instruction allocation scheduling model, so that each instruction is allocated to the operation pipeline with the minimum variance of the corresponding result under the instruction allocation scheduling model corresponding to the instruction.
5. The method as claimed in claim 4, wherein adopting a symmetric greedy solution to solve the instruction allocation scheduling model, so that each instruction is allocated to the operation pipeline with the minimum variance of the corresponding result, is specifically:
under the symmetric greedy solution, the variance of the corresponding results of n instructions under the instruction allocation scheduling model is:

$$\mathrm{Var}_n = \frac{2}{m}\sum_{i=1}^{n} w_i\,L_{i-1,k_i} + C$$

wherein $\mathrm{Var}_n$ is the variance of the corresponding results $L_{ij}$ of the n instructions under the instruction allocation scheduling model, $w_i$ is the issue waiting delay of instruction i on any operation pipeline j in the symmetric greedy case, C is a constant, $k_i$ is the optimal solution for the current j in the symmetric greedy case, and $L_{i-1,k_i}$ is the issue-completion waiting delay of the (i-1)-th instruction allocated to the optimal operation pipeline $k_i$.
6. The method according to any one of claims 1-5, wherein the instructions having data dependencies comprise a production instruction and a consumption instruction, the production instruction and the consumption instruction having a read-after-write dependency.
7. The method as claimed in claim 6, wherein the preset related constraint is:

$$\sum_{j=1}^{m} I(x,j)\left(\mathrm{Lat}_{xj} + w_{xj} + ex_x\right) \le \sum_{l=1}^{m} I(h,l)\left(\mathrm{Lat}_{hl} + w_{hl}\right)$$

wherein $\mathrm{Lat}_{xj}$ is the issue-completion waiting delay of the instructions already on operation pipeline j when the production instruction x is allocated to it, $w_{xj}$ is the issue waiting delay of the production instruction x allocated to operation pipeline j, $I(x,j)$ represents the possibility that the production instruction x is allocated to operation pipeline j, and $ex_x$ is the execution time of the production instruction x on operation pipeline j; $\mathrm{Lat}_{hl}$ is the issue-completion waiting delay of the instructions already on operation pipeline l when the consumption instruction h is allocated to it, $w_{hl}$ is the issue waiting delay of the consumption instruction h allocated to operation pipeline l, and $I(h,l)$ represents the possibility that the consumption instruction h is allocated to operation pipeline l.
8. A load balancing instruction scheduling apparatus, running in a control unit of a processor, comprising:
an establishing module, configured to establish an instruction allocation scheduling model based on the issue waiting delays of different instructions, for evenly allocating a plurality of instructions to operation pipelines composed of different operation units;
an allocation module, configured to obtain the interrelation among a plurality of instructions and, if no data dependency exists among the plurality of instructions, allocate each instruction to the operation pipeline with the minimum variance of the corresponding result under the instruction allocation scheduling model corresponding to the instruction; if data dependencies exist among the plurality of instructions, allocate each instruction to the operation pipeline with the minimum variance of the corresponding result under the instruction allocation scheduling model corresponding to the instruction, and judge whether a preset related constraint is satisfied among the instructions with data dependencies; if not, the instruction with a data dependency and a low execution priority sequentially selects the operation pipelines used by instructions whose issue waiting delay is smaller than the issue waiting delay corresponding to its own instruction, until the constraint is satisfied.
9. An electronic device, comprising: a memory for storing a computer program; and a processor for implementing the steps of the load balancing instruction scheduling method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the load balancing instruction scheduling method according to any one of claims 1 to 7.
CN202210268534.7A | 2022-03-18 | 2022-03-18 | Load balancing instruction scheduling method, device, equipment and medium | Pending | CN114610397A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202210268534.7A | 2022-03-18 | 2022-03-18 | Load balancing instruction scheduling method, device, equipment and medium

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202210268534.7A | 2022-03-18 | 2022-03-18 | Load balancing instruction scheduling method, device, equipment and medium

Publications (1)

Publication Number | Publication Date
CN114610397A | 2022-06-10

Family

ID=81864304

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202210268534.7A (Pending) | Load balancing instruction scheduling method, device, equipment and medium (CN114610397A (en)) | 2022-03-18 | 2022-03-18

Country Status (1)

Country | Link
CN (1) | CN114610397A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN116095177A (en) * | 2023-02-03 | 2023-05-09 | Xi'an Jiaotong University | Hierarchical clustering scheduling method, system, medium and equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US5664193A (en) * | 1995-11-17 | 1997-09-02 | Sun Microsystems, Inc. | Method and apparatus for automatic selection of the load latency to be used in modulo scheduling in an optimizing compiler
CN1545026A (en) * | 2003-11-26 | 2004-11-10 | National University of Defense Technology | A Dynamic VLIW Instruction Scheduling Method Based on Deterministic Latency
CN112445528A (en) * | 2019-08-29 | 2021-03-05 | Wuxi Jiangnan Institute of Computing Technology | Result self-checking instruction sequence filling method based on pipeline constraint


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang Guopeng; Yang Jianxin; Yin Fei; Jiang Shengjian: "Load-balanced allocation method for processor computing resources" (负载均衡的处理器运算资源分配方法), Computer Science, no. 08, 31 August 2020 (2020-08-31), pages 41-48 *
Hu Yueming (ed.): "Computer System Architecture" (《计算机系统结构》), Beihang University Press, Beijing, 31 October 2007, pages 52-68 *


Similar Documents

Publication | Title
US10768989B2 (en) | Virtual vector processing
CN111310910B (en) | Computing device and method
US11609792B2 (en) | Maximizing resource utilization of neural network computing system
CN100461094C (en) | An instruction control method for stream processor
US9672035B2 (en) | Data processing apparatus and method for performing vector processing
US9619298B2 (en) | Scheduling computing tasks for multi-processor systems based on resource requirements
US8893104B2 (en) | Method and apparatus for register spill minimization
US20140181477A1 (en) | Compressing execution cycles for divergent execution in a single instruction multiple data (SIMD) processor
US9927859B2 (en) | Internal communication interconnect scalability
US8997071B2 (en) | Optimized division of work among processors in a heterogeneous processing system
US10268519B2 (en) | Scheduling method and processing device for thread groups execution in a computing system
GB2492457A (en) | Predicting out of order instruction level parallelism of threads in a multi-threaded processor
CN111752615A (en) | Apparatus, method and system for ensuring quality of service of multithreaded processor cores
US9886278B2 (en) | Computing architecture and method for processing data
EP1794674A1 (en) | Dynamic loading and unloading for processing unit
US20240086359A1 (en) | Dynamic allocation of arithmetic logic units for vectorized operations
CN111857830B (en) | Method, system and storage medium for designing path for forwarding instruction data in advance
US20110276786A1 (en) | Shared prefetching to reduce execution skew in multi-threaded systems
CN118796277A (en) | Instruction pipeline optimization and dynamic programming method and system based on GPGPU
CN114610397A (en) | Load balancing instruction scheduling method, device, equipment and medium
US20240193721A1 (en) | System and method for adaptive graph-to-stream scheduling
CN118245187A (en) | Thread scheduling method and device, electronic device and storage medium
CN112148106A (en) | System, apparatus and method for hybrid reservation station for processor
CN113703841B (en) | An optimized method, device and medium for register data reading
US11853762B1 (en) | Single instruction multiple data execution with variable size logical registers

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
