CN101630275A

Movatterモバイル変換

Info

Publication number: CN101630275A
Application number: CN200910090403A
Authority: CN
Inventors: 刘雷波; 王延升; 尹首一; 于苏东; 魏少军
Original assignee: Tsinghua University
Current assignee: Shenzhen Pango Microsystems Co Ltd
Priority date: 2009-07-31
Filing date: 2009-07-31
Publication date: 2010-01-20
Anticipated expiration: 2029-07-31
Also published as: CN101630275B

Abstract

本发明公开了一种实现生成循环任务配置信息的方法和装置，属于嵌入式系统领域。当数据流图规模不大于可重构阵列的规模时，所述方法包括：对可重构阵列当前执行数据流的节点进行算子调度；根据算子调度结果，获得关键路径的长度、各个节点的自由度，以及各个节点可处的时钟周期；根据关键路径的长度、可重构阵列的列数，建立矩阵；当数据流中存在未映射的节点时，根据未映射节点的自由度、未映射节点可处的时钟周期，以及映射规则，将未映射的节点映射到矩阵中；根据得到的映射结果，得到循环任务在可重构阵列上的配置信息。本发明提供的方案，只生成一套可重构阵列的配置信息，减少了向可重构阵列传输配置信息的时间，提高了效率。

The invention discloses a method and a device for realizing generating cycle task configuration information, belonging to the field of embedded systems. When the scale of the data flow graph is not larger than the scale of the reconfigurable array, the method includes: performing operator scheduling on the nodes currently executing the data flow in the reconfigurable array; according to the operator scheduling result, obtaining the length of the critical path, each node degree of freedom of each node, and the available clock cycles of each node; according to the length of the critical path and the number of columns of the reconfigurable array, a matrix is established; when there are unmapped nodes in the data stream, according to the degrees of freedom of the unmapped nodes, The available clock cycles of the mapping nodes and the mapping rules map the unmapped nodes into the matrix; according to the obtained mapping results, the configuration information of the cyclic tasks on the reconfigurable array is obtained. The solution provided by the invention only generates a set of configuration information of the reconfigurable array, reduces the time for transmitting configuration information to the reconfigurable array, and improves efficiency.

Description

Translated fromChinese

一种实现生成循环任务配置信息的方法和装置A method and device for generating cyclic task configuration information

技术领域technical field

本发明涉及嵌入式系统领域，特别涉及一种实现生成循环任务配置信息的方法和装置。The invention relates to the field of embedded systems, in particular to a method and device for realizing generating cycle task configuration information.

背景技术Background technique

芯片集成度的日益提高，使得一个芯片内可以集成大量的功能模块，形成片上系统芯片。在集成的功能模块中往往包含了处理器模块和硬件加速模块，如何让处理器与硬件加速模块能够协同工作，是片上系统芯片的软硬件协同设计需要解决的一个问题。With the increasing level of chip integration, a large number of functional modules can be integrated in one chip to form a system-on-chip. The integrated functional module often includes a processor module and a hardware acceleration module. How to make the processor and the hardware acceleration module work together is a problem that needs to be solved in the software-hardware co-design of the SoC.

其中，可重构处理器由主处理器和可重构阵列构成。如图1所示，可重构处理器的软硬件划分是指将应用程序划分为在主处理器上执行的软件部分和在可重构阵列上执行的硬件部分。通常情况下，主处理器运行应用程序的控制部分和计算量相对较小的部分，而可重构阵列运行应用中的计算量相对较大的部分。在可重构阵列上，按照节点在数据流图中的执行顺序将其映射到可重构阵列内部的可重构单元上，由映射的结果可产生可重构阵列的配置信息，该配置信息用于配置可重构单元行，实现可重构单元行的运算功能。Among them, the reconfigurable processor is composed of a main processor and a reconfigurable array. As shown in Figure 1, the software and hardware division of the reconfigurable processor refers to dividing the application program into a software part executed on the main processor and a hardware part executed on the reconfigurable array. Typically, the main processor runs the control and relatively light portions of the application, while the reconfigurable array runs the relatively computationally heavy portions of the application. On the reconfigurable array, the nodes are mapped to the reconfigurable units inside the reconfigurable array according to the execution order of the nodes in the data flow graph, and the configuration information of the reconfigurable array can be generated from the mapping result, the configuration information It is used to configure the reconfigurable unit row and realize the operation function of the reconfigurable unit row.

发明人在实现本发明的过程中发现，现有技术至少存在以下缺点：The inventor finds in the process of realizing the present invention that there are at least the following disadvantages in the prior art:

在可重构阵列执行对循环任务进行软硬件划分之后，可重构阵列当前执行的数据流图的关键路径的长度大于该可重构阵列自身的行数时，现有技术会分多批的将节点映射到可重构阵列上，导致会生成了多套可重构阵列的配置信息，增加了向可重构阵列传输配置信息的时间，无法较快地实现配置可重构单元行的运算功能，降低了可重构阵列的处理效率。After the reconfigurable array executes the software and hardware division of the cyclic tasks, when the length of the critical path of the data flow graph currently executed by the reconfigurable array is greater than the number of rows of the reconfigurable array itself, the existing technology will divide the cyclic tasks into multiple batches. Mapping nodes to reconfigurable arrays will generate multiple sets of configuration information for reconfigurable arrays, which increases the time for transmitting configuration information to reconfigurable arrays, and cannot quickly realize the operation of configuring reconfigurable unit rows function, reducing the processing efficiency of reconfigurable arrays.

发明内容Contents of the invention

当可重构阵列当前执行的数据流图的关键路径的长度大于可重构阵列行数的数据流图时，为了可以一次将节点映射到可重构单元上，减少向可重构阵列传输配置信息的时间，提高处理效率，满足实际应用中的需要，本发明实施例提供了一种实现生成循环任务配置信息的方法和装置，所述技术方案如下：When the length of the critical path of the data flow graph currently executed by the reconfigurable array is greater than the data flow graph of the row number of the reconfigurable array, in order to map nodes to reconfigurable units at one time, reduce the transmission of configuration to the reconfigurable array information, improve processing efficiency, and meet the needs of practical applications. Embodiments of the present invention provide a method and device for generating cyclic task configuration information. The technical solution is as follows:

一种实现生成循环任务配置信息的方法，当数据流的规模不大于可重构阵列的规模时，所述方法包括：A method for generating cyclic task configuration information, when the size of the data stream is not greater than the size of the reconfigurable array, the method includes:

步骤C1：对所述可重构阵列当前执行数据流的节点进行算子调度；Step C1: Perform operator scheduling on the nodes currently executing data streams in the reconfigurable array;

步骤C2：根据算子调度结果，获得关键路径的长度、各个节点的自由度，以及各个节点可处的时钟周期；Step C2: According to the operator scheduling result, obtain the length of the critical path, the degree of freedom of each node, and the available clock cycle of each node;

步骤C3：根据所述关键路径的长度、所述可重构阵列的列数，建立矩阵；Step C3: Establishing a matrix according to the length of the critical path and the number of columns of the reconfigurable array;

步骤C4：当所述数据流中存在未映射的节点时，根据所述未映射节点的自由度、所述未映射节点可处的时钟周期，以及映射规则，将所述未映射的节点映射到所述矩阵中；Step C4: When there are unmapped nodes in the data stream, map the unmapped nodes to In said matrix;

步骤C5：根据步骤C4得到的映射结果，得到所述循环任务在所述可重构阵列上的配置信息。Step C5: Obtain configuration information of the cyclic task on the reconfigurable array according to the mapping result obtained in step C4.

所述步骤C1包括：Said step C1 comprises:

步骤C11：对所述当前执行数据流的节点进行第一算子调度；Step C11: Perform first operator scheduling on the node currently executing the data flow;

步骤C12：对所述当前执行数据流的节点进行第二算子调度；Step C12: Perform second operator scheduling on the node currently executing the data flow;

相应地，所述步骤C2获得的各个节点的自由度具体包括：Correspondingly, the degrees of freedom of each node obtained in step C2 specifically include:

各个节点的自由度＝“第二算子调度中该节点的时钟周期-第一算子调度中该节点的时钟周期+1”。The degree of freedom of each node = "the clock cycle of the node in the second operator schedule - the clock cycle of the node in the first operator schedule + 1".

所述步骤C3具体包括：以所述关键路径的长度为行、以所述可重构阵列的列数为列，建立矩阵。The step C3 specifically includes: establishing a matrix with the length of the critical path as the row and the number of columns of the reconfigurable array as the column.

当所述数据流中存在的未映射的节点个数为多个，且自由度不同时，When there are multiple unmapped nodes in the data stream, and the degrees of freedom are different,

所述步骤C4根据所述未映射节点的自由度、所述未映射节点可处的时钟周期，以及映射规则，将所述未映射的节点映射到所述矩阵中具体包括：The step C4, according to the degree of freedom of the unmapped node, the clock cycle where the unmapped node can be located, and the mapping rule, mapping the unmapped node into the matrix specifically includes:

依次按照所述多个未映射节点的自由度由低到高的顺序，根据所述未映射节点可处的时钟周期，以及映射规则，将所述未映射的节点映射到所述矩阵。Mapping the unmapped nodes to the matrix according to the descending order of the degrees of freedom of the multiple unmapped nodes from low to high, according to the available clock cycles of the unmapped nodes and the mapping rules.

所述映射规则具体包括：The mapping rules specifically include:

在所述矩阵中，将所述节点映射到行号与所述节点的时钟周期相同的行上；In the matrix, mapping the node to a row whose row number is the same as the clock period of the node;

且，在将所述节点映射到所述行时，按照从左至右的顺序进行映射；And, when the node is mapped to the row, the mapping is performed in order from left to right;

且，当对所述节点映射后，在所述矩阵中与所述节点映射的位置同列，且横坐标相差可重构阵列行数的整数倍的所有位置不再被其他未映射节点所映射。Moreover, after the node is mapped, all positions in the matrix that are in the same column as the mapped node and whose abscissa differs from an integer multiple of the number of rows of the reconfigurable array are no longer mapped by other unmapped nodes.

当所述节点可处在不同的时钟周期时，所述在所述矩阵中，将所述节点映射到行号与所述节点的时钟周期相同的行上，还包括：When the nodes can be in different clock periods, in the matrix, mapping the nodes to the row whose row number is the same as the clock period of the node further includes:

I、根据是否影响其他未映射节点的自由度，确定该节点所映射的行；或，I. Determine the row mapped by the node according to whether it affects the degrees of freedom of other unmapped nodes; or,

II、根据该节点可映射的行的填满程度，确定该节点所映射的行；或，II. Determine the row mapped by the node according to the fullness of the mappable rows of the node; or,

III、根据该节点可映射的行的行号，确定该节点所映射的行。III. Determine the row mapped by the node according to the row number of the row mappable by the node.

所述I、II、III执行的优先级顺序为：I＞II＞III。The order of priority for execution of I, II, and III is: I>II>III.

一种生成循环任务配置信息的装置，当数据流图规模不大于可重构阵列的规模时，所述装置包括：A device for generating cyclic task configuration information, when the size of the data flow graph is not larger than the size of the reconfigurable array, the device includes:

调度模块，用于对所述可重构阵列当前执行数据流的节点进行算子调度；A scheduling module, configured to perform operator scheduling on nodes currently executing data streams in the reconfigurable array;

获得模块，用于根据所述调度模块得到的算子调度结果，获得关键路径的长度、各个节点的自由度，以及各个节点可处的时钟周期；An obtaining module, configured to obtain the length of the critical path, the degree of freedom of each node, and the available clock cycle of each node according to the operator scheduling result obtained by the scheduling module;

建立模块，用于根据所述获得模块获得的关键路径的长度、所述可重构阵列的列数，建立矩阵；A building module, configured to create a matrix according to the length of the critical path obtained by the obtaining module and the number of columns of the reconfigurable array;

映射模块，用于当所述数据流中存在未映射的节点时，根据所述所述获得模块获得的未映射节点的自由度、所述未映射节点可处的时钟周期，以及映射规则，将所述未映射的节点映射到所述建立模块建立的矩阵中；a mapping module, configured to, when there is an unmapped node in the data stream, according to the degree of freedom of the unmapped node obtained by the obtaining module, the clock cycle where the unmapped node can be located, and the mapping rule, convert The unmapped nodes are mapped into the matrix established by the establishment module;

生成模块，用于所述映射模块得到的映射结果，得到所述循环任务在所述可重构阵列上的配置信息。The generating module is used for the mapping result obtained by the mapping module to obtain configuration information of the cyclic task on the reconfigurable array.

所述调度模块包括：The scheduling module includes:

第一调度单元，用于对所述当前执行数据流的节点进行第一算子调度；A first scheduling unit, configured to perform first operator scheduling on the node currently executing the data flow;

第二调度单元，用于对所述当前执行数据流的节点进行第二算子调度；a second scheduling unit, configured to perform second operator scheduling on the node currently executing the data flow;

相应地，所述获得模块在获得各个节点的自由度时，根据所述第二调度单元得到的第二算子调度中该节点的时钟周期-所述第一调度单元得到的第一算子调度中该节点的时钟周期+1，得到该节点的自由度。Correspondingly, when the obtaining module obtains the degree of freedom of each node, according to the clock cycle of the node in the second operator schedule obtained by the second scheduling unit - the first operator schedule obtained by the first scheduling unit The clock cycle of the node in +1, get the degree of freedom of the node.

所述建立模块具体用于以所述关键路径的长度为行、以所述可重构阵列的列数为列，建立矩阵。The establishing module is specifically configured to establish a matrix with the length of the critical path as the row and the number of columns of the reconfigurable array as the column.

所述映射模块，具体用于依次按照所述多个未映射节点的自由度由低到高的顺序，根据所述未映射节点可处的时钟周期，以及映射规则，将所述未映射的节点映射到所述矩阵。The mapping module is specifically configured to convert the unmapped nodes to mapped to the matrix.

所述映射规则具体包括：The mapping rules specifically include:

本发明实施例提供的技术方案的有益效果是：通过算子调度获得各个节点的自由度、时钟周期、关键路径的长度等参数，利用上述参数将未映射的节点映射到相对应的矩阵中，可以只获得一套可重构阵列上的配置信息，为此，减少了向可重构阵列传输配置信息的时间，提高了处理效率，满足了实际应用中的需要。The beneficial effect of the technical solution provided by the embodiment of the present invention is: the degree of freedom of each node, the clock cycle, the length of the critical path and other parameters are obtained through operator scheduling, and the unmapped nodes are mapped to the corresponding matrix by using the above parameters, Only one set of configuration information on the reconfigurable array can be obtained, thereby reducing the time for transmitting configuration information to the reconfigurable array, improving processing efficiency, and meeting the needs of practical applications.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present invention or the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only These are some embodiments of the present invention. Those skilled in the art can also obtain other drawings based on these drawings without creative work.

图1是现有技术提供的可重构处理器的软硬件划分框图；FIG. 1 is a block diagram of software and hardware division of a reconfigurable processor provided by the prior art;

图2是本发明实施例提供的算子调度的示意图；Fig. 2 is a schematic diagram of operator scheduling provided by an embodiment of the present invention;

图3是本发明实施例提供的可重构阵列的结构图；Fig. 3 is a structural diagram of a reconfigurable array provided by an embodiment of the present invention;

图4是本发明实施例提供的一种实现生成循环任务配置信息方法的示意图；FIG. 4 is a schematic diagram of a method for generating configuration information of a cyclic task provided by an embodiment of the present invention;

图5是本发明实施例提供的待划分的循环体的数据流图；Fig. 5 is a data flow diagram of a loop body to be divided provided by an embodiment of the present invention;

图6是本发明实施例提供的算子调度结果的示意图；Fig. 6 is a schematic diagram of operator scheduling results provided by an embodiment of the present invention;

图7是本发明实施例提供的映射过程及配置信息的示意图；FIG. 7 is a schematic diagram of a mapping process and configuration information provided by an embodiment of the present invention;

图8是本发明实施例1提供的一种实现生成循环任务配置信息的方法的流程图；FIG. 8 is a flow chart of a method for generating cyclic task configuration information provided byEmbodiment 1 of the present invention;

图9是本发明实施例2提供的一种实现生成循环任务配置信息的装置的示意图；FIG. 9 is a schematic diagram of a device for generating cyclic task configuration information provided byEmbodiment 2 of the present invention;

图10是本发明实施例2提供的一种实现生成循环任务配置信息的装置的具体示意图。FIG. 10 is a specific schematic diagram of an apparatus for generating cyclic task configuration information provided byEmbodiment 2 of the present invention.

具体实施方式Detailed ways

为使本发明的目的、技术方案和优点更加清楚，下面将结合附图对本发明实施方式作进一步地详细描述。In order to make the object, technical solution and advantages of the present invention clearer, the implementation manner of the present invention will be further described in detail below in conjunction with the accompanying drawings.

在基于算子调度对软硬件进行划分时，将数据流图，在时间域上进行划分，确定每一个算子执行时间的早晚和相互间的顺序。算子调度从算子执行时间早晚的角度出发，可分为“尽可能早”算子调度和“尽可能晚”算子调度。所谓“尽可能早”算子调度将所有算子尽可能早的执行；相反，所谓“尽可能晚”算子调度将所有算子尽可能晚的执行。图2给出了一个简单的数据流图，其中各个算子均可在一个时钟周期内完成，算子调度的流程如下：When dividing software and hardware based on operator scheduling, the data flow graph is divided in the time domain to determine the execution time of each operator and the order between them. From the perspective of operator execution time, operator scheduling can be divided into "as early as possible" operator scheduling and "as late as possible" operator scheduling. The so-called "as early as possible" operator scheduling executes all operators as early as possible; on the contrary, the so-called "as late as possible" operator scheduling executes all operators as late as possible. Figure 2 shows a simple data flow diagram, in which each operator can be completed within one clock cycle, and the operator scheduling process is as follows:

1、在数据流图中找出关键路径，关键路径为在所有的输入到输出的路径中，执行时间最长，即算子个数最多的路径，关键路径上的算子不在调度之列，将其按照执行的顺序分配到各个时钟周期。1. Find the critical path in the data flow diagram. The critical path is the path with the longest execution time among all the paths from input to output, that is, the path with the largest number of operators. The operators on the critical path are not included in the scheduling. Distribute them to individual clock cycles in the order of execution.

其中，参见图2，图2中的关键路径为“1→2→3→4→5→6”，算子“1”、“2”、“3”、“4”、“5”、“6”分别被分配到第1、2、3、4、5、6个时钟周期。Among them, referring to Figure 2, the critical path in Figure 2 is "1→2→3→4→5→6", and the operators "1", "2", "3", "4", "5", " 6" are assigned to the 1st, 2nd, 3rd, 4th, 5th, and 6th clock cycles respectively.

2、对于非关键路径上的算子，“尽可能早”算子调度将算子尽可能早的执行。参见图2，算子“8”可以在第1至5个时钟周期之间的任一时钟周期执行，算子“7”可以在第3、4个时钟周期执行(由于从路径服从关键路径的调度关系，算子“7”的输入来自算子“2”的输出，算子“2”位于第2时钟周期，即，算子“7”可以位于第2时钟周期后的周期，而，算子“5”的输入来自算子“7”的输出，算子“5”位于第5个时钟周期，即算子“7”只能位于第5时钟周期以上的时钟周期，综上，即算子“7”可以位于第3、4时钟周期。)，根据“尽可能早”算子调度，算子“8”被分配到第1个时钟周期执行；算子“7”被分配到第3个时钟周期；相反，“尽可能晚”算子调度将算子尽可能晚的执行，参见图2算子“8”被分配到第5个时钟周期执行；算子“7”被分配到第4个时钟周期。2. For operators on non-critical paths, "as early as possible" operator scheduling executes operators as early as possible. Referring to Figure 2, operator "8" can be executed in any clock cycle between the 1st and 5th clock cycles, and operator "7" can be executed in the 3rd and 4th clock cycles (because the slave path obeys the critical path Scheduling relationship, the input of operator "7" comes from the output of operator "2", and operator "2" is located in the second clock cycle, that is, operator "7" can be located in the cycle after the second clock cycle, and the operator The input of sub "5" comes from the output of operator "7", and operator "5" is located in the 5th clock cycle, that is, operator "7" can only be located in the clock cycle above the 5th clock cycle. In summary, the calculation The operator "7" can be located in the 3rd or 4th clock cycle.), according to the "as early as possible" operator scheduling, the operator "8" is assigned to execute in the 1st clock cycle; the operator "7" is assigned to the 3rd clock cycle clock cycle; on the contrary, the "as late as possible" operator scheduling executes the operator as late as possible, see Figure 2. Operator "8" is assigned to execute in the 5th clock cycle; operator "7" is assigned to execute in the5th clock cycle 4 clock cycles.

其中，自由度代表算子可以在不同的时钟周期中被执行的自由程度，可以由“尽可能早”算子调度和“尽可能晚”算子调度的结果计算得到。其中，算子自由度的计算公式为“‘尽可能晚算子调度中算子的时钟周期’-‘尽可能早算子调度中算子的时钟周期’+1”。所以，关键路径上的所有算子的自由度均为1。即，图2中算子“1”、“2”、“3”、“4”、“5”、“6”的自由度为1，算子“7”的自由度为2(具体为，4-3+1＝2)，算子“8”的自由度为5(具体为5-1+1＝5)。Among them, the degree of freedom represents the degree of freedom that operators can be executed in different clock cycles, and can be calculated from the results of "as early as possible" operator scheduling and "as late as possible" operator scheduling. Among them, the calculation formula of the degree of freedom of the operator is "'the clock period of the operator in the operator scheduling as late as possible'-'the clock period of the operator in the operator scheduling as early aspossible'+1". Therefore, all operators on the critical path have 1 degree of freedom. That is, the degree of freedom of operators "1", "2", "3", "4", "5" and "6" in Fig. 2 is 1, and the degree of freedom of operator "7" is 2 (specifically, 4-3+1=2), and the degree of freedom of the operator "8" is 5 (specifically, 5-1+1=5).

为了便于说明，本实施例将上述算子定义为节点，将上述“尽可能早”算子调度定义为第一算子调度，将上述“尽可能晚”算子调度定义为第二算子调度。For ease of description, in this embodiment, the above-mentioned operators are defined as nodes, the above-mentioned "as early as possible" operator scheduling is defined as the first operator scheduling, and the above-mentioned "as late as possible" operator scheduling is defined as the second operator scheduling .

参见图3，图3(a)为本发明实施提供的一种RCA(Reconfigurable Cell Array，可重构阵列)协处理器，其中，主要包括RC(Reconfigurable Cell，可重构单元)、临时数据模块、外部数据寄存器、路由模块4部分。Referring to Fig. 3, Fig. 3 (a) provides a kind of RCA (Reconfigurable Cell Array, reconfigurable array) coprocessor for the implementation of the present invention, wherein, mainly comprise RC (Reconfigurable Cell, reconfigurable unit), temporary data module , external data register,routing module 4 parts.

其中，可重构单元以行为单位，实现算术运算和逻辑运算的功能，一行内的多个RC经配置后在一个时钟周期内并行的完成运算；最后一行的可重构单元的输出经路由模块选择后将作为第一行可重构单元的输入。Among them, the reconfigurable unit realizes the functions of arithmetic operation and logical operation in units of rows, and multiple RCs in one row are configured to complete the operation in parallel within one clock cycle; the output of the last row of reconfigurable units is routed through the routing module When selected it will be the input to the first row of reconfigurable cells.

其中，临时数据模块的输入来自每个RC的输出；Among them, the input of the temporary data module comes from the output of each RC;

临时数据模块的功能是将RC的输出延迟若干个时钟周期，并在需要的时钟周期输出给路由模块。然后，经路由模块选择后，输出给RC作为输入。其中，例如图3(b)数据流图中，节点“1”的输出将作为节点“4”的输入。先将节点“1”、“2”、“3”、“4”分别映射到RCA右起第1列的第1、2、3、4个RC。第1个时钟周期，节点“1”完成运算后，将节点“1”的运算结果输入给临时数据模块。临时数据模块将节点“1”的运算结果延迟2个时钟周期，在第4个时钟周期输出给路由模块4，经路由模块选择后输入给第4行右起第1个RC，即节点“4”对应的RC。The function of the temporary data module is to delay the output of the RC for several clock cycles, and output it to the routing module at the required clock cycle. Then, after being selected by the routing module, it is output to RC as an input. Wherein, for example, in the data flow diagram of Fig. 3(b), the output of node "1" will be used as the input of node "4". First map the nodes "1", "2", "3", and "4" to the 1st, 2nd, 3rd, and 4th RCs in the first column from the right of the RCA, respectively. In the first clock cycle, after node "1" completes the operation, the operation result of node "1" is input to the temporary data module. The temporary data module delays the operation result of node "1" by 2 clock cycles, outputs it to therouting module 4 in the fourth clock cycle, and inputs it to the first RC from the right of the fourth row after being selected by the routing module, that is, the node "4 "Corresponding RC.

其中，路由模块将每行RC连接起来，路由模块的输入来自三个方面：外部数据寄存器、上一行RC的输出、临时数据模块；Among them, the routing module connects each row of RC, and the input of the routing module comes from three aspects: the external data register, the output of the previous row of RC, and the temporary data module;

(1)路由模块的功能是为RC选择输入，其选择外部数据寄存器、上一行RC的输出、临时数据模块的输入输出给RC，作为RC的输入；(1) The function of the routing module is to select the input for RC, which selects the external data register, the output of the previous line of RC, and the input and output of the temporary data module to RC as the input of RC;

(2)最后一行RC的输出经路由模块选择后可作为第一行RC的输入，这一结构使得RCA可以映射关键路径长度大于自身行数的数据流图。例如图3(b)数据流图的关键路径为5，大于RCA的行数4。将节点“1”、“2”、“3”、“4”分别映射到RCA右起第1列的第1、2、3、4个RC，将节点“5”映射到RCA第1行右起第2个RC。通过上述的结构，则节点“4”的输出可以输入给路由模块1，经路由模块选择后输入给节点“5”。这实现了关键路径长度大于RCA行数的任务图到RCA的映射。(2) The output of the last row of RC can be used as the input of the first row of RC after being selected by the routing module. This structure enables RCA to map the data flow graph whose critical path length is greater than its own row number. For example, the critical path of the data flow graph in Figure 3(b) is 5, which is greater than therow number 4 of RCA. Map the nodes "1", "2", "3", and "4" to the 1st, 2nd, 3rd, and 4th RCs in the first column from the right of the RCA, and map the node "5" to the right of the first row of the RCA Start the second RC. Through the above structure, the output of the node "4" can be input to therouting module 1, and then input to the node "5" after being selected by the routing module. This enables mapping of task graphs with critical path lengths greater than the number of RCA rows to RCA.

其中，本发明实施例具体以图3提供的可重构阵列协处理器进行说明，但对此不做限制。Wherein, the embodiment of the present invention is specifically described by using the reconfigurable array coprocessor provided in FIG. 3 , but this is not limited thereto.

参见图4，图4是本发明实施例提供的方法的示意图，下面以一个具体实施例来说明划分算法的执行过程，具体执行过程如下：Referring to FIG. 4, FIG. 4 is a schematic diagram of the method provided by the embodiment of the present invention. The execution process of the division algorithm is described below with a specific embodiment. The specific execution process is as follows:

该方法包括：The method includes:

D1：为对可重构阵列当前执行数据流的节点进行算子调度；D1: Perform operator scheduling for the nodes that currently execute data flow on the reconfigurable array;

D2：为根据算子调度结果，获得关键路径的长度、各个节点的自由度，以及各个节点可处在的时钟周期；D2: According to the operator scheduling results, obtain the length of the critical path, the degrees of freedom of each node, and the clock cycle that each node can be in;

D3：建立以关键路径的长度为行、以可重构阵列的列数为列的矩阵；D3: Establish a matrix with the length of the critical path as the row and the number of columns of the reconfigurable array as the column;

D4：根据未映射节点的自由度、所处的时钟周期，按照映射规则，将未映射的节点映射到矩阵中；D4: According to the degree of freedom of the unmapped nodes and the clock cycle, according to the mapping rules, map the unmapped nodes into the matrix;

D5：将映射结果生成循环任务在可重构阵列上的配置信息。D5: Generate the configuration information of the cyclic task on the reconfigurable array from the mapping result.

为了对上述方法进行详细说明，详见如下，假设，RCA规模为x行y列；数据流图G1、节点个数为n(其中，n＜＝x×y)；In order to describe the above method in detail, see the following for details, assuming that the RCA scale is x rows and y columns; the data flow graph G1 and the number of nodes are n (wherein, n<=x×y);

首先，对G1进行“尽可能早”算子调度(即第一算子调度)；对G1进行“尽可能晚”算子调度(即第二算子调度)；获得数据流图G1的关键路径的长度k和各个节点的自由度，以及各个节点可处在的时钟周期；Firstly, perform "as early as possible" operator scheduling on G1 (that is, the first operator scheduling); perform "as late as possible" operator scheduling on G1 (that is, the second operator scheduling); obtain the critical path of the data flow graph G1 The length k of and the degrees of freedom of each node, and the clock cycle that each node can be in;

其次，建立k行y列的矩阵；Second, create a matrix of k rows and y columns;

其中，k行代表RCA拟在k个时钟周期内执行完数据流图G1；矩阵的每一行对应着RCA的每一行所处的不同时钟周期，矩阵中行号相差x(其中，具体为RCA的行数)的整数倍的所有行是RCA的同一行所处的不同的时钟周期；矩阵的列与RCA的列一致，矩阵的横坐标为从左至右，纵坐标为从上至下。Among them, row k represents that RCA intends to execute the data flow graph G1 within k clock cycles; each row of the matrix corresponds to a different clock cycle of each row of RCA, and the row numbers in the matrix differ by x (wherein, specifically, the row of RCA All the rows that are integer multiples of the number) are different clock cycles of the same row of the RCA; the columns of the matrix are consistent with the columns of the RCA, the abscissa of the matrix is from left to right, and the ordinate is from top to bottom.

然后，当数据流图中有节点未映射时，则对上述数据流图中未映射的节点进行映射。该步骤详见如下：Then, when there are unmapped nodes in the data flow graph, the unmapped nodes in the above data flow graph are mapped. The steps are detailed as follows:

在对上述数据流图中未映射的节点进行映射时，如果这些未映射的节点为多个且自由度不同，则按照各节点的自由度，优先映射未映射的节点中自由度最低的节点。原因如下：这是因为随着越来越多的节点被映射到矩阵上，后映射的节点可选的映射范围将越来越小，只有将自由度高的节点后映射才能尽可能的保证所有的节点都被映射到矩阵上。When mapping the unmapped nodes in the above data flow graph, if there are multiple unmapped nodes with different degrees of freedom, the node with the lowest degree of freedom among the unmapped nodes is mapped preferentially according to the degrees of freedom of each node. The reason is as follows: This is because as more and more nodes are mapped to the matrix, the optional mapping range of the post-mapped nodes will become smaller and smaller. Only by post-mapping the nodes with a high degree of freedom can guarantee all The nodes of are mapped to the matrix.

当确定了需要进行映射的节点后，在进行对该节点的映射时，遵循以下映射规则：After determining the node that needs to be mapped, follow the following mapping rules when mapping the node:

一、各个节点根据“尽可能早”、“尽可能晚”算子调度获得各个节点可处在的时钟周期，被映射到行号与时钟周期相同的行上。在将各个节点映射到矩阵的行上时，按照从左至右的顺序映射，先将节点映射到行上最左侧的点。1. According to the "as early as possible" and "as late as possible" operator scheduling, each node obtains the clock cycle that each node can be in, and is mapped to the row whose row number is the same as the clock cycle. When each node is mapped to the row of the matrix, it is mapped from left to right, and the node is first mapped to the leftmost point on the row.

特别需要注意的是，当一个节点可处在不同的时钟周期时，根据下述确定规则，将该节点映射到其中一个时钟周期所对应的行上。确定规则如下：It should be particularly noted that when a node can be in different clock cycles, the node is mapped to the row corresponding to one of the clock cycles according to the following determination rules. The determination rules are as follows:

I、根据是否影响其他未映射节点的自由度，确定该节点所映射的行。具体如下：将节点映射到矩阵的某一行上时，是以不减小其他未映射节点的自由度为前提的，因为一旦减小了其他未映射节点的自由度，这些未映射节点在向矩阵映射时，可选择的映射范围将减小，这将有可能导致这些未映射的节点不能被映射到矩阵上；I. Determine the row mapped by the node according to whether it affects the degrees of freedom of other unmapped nodes. The details are as follows: when mapping a node to a certain row of the matrix, it is based on the premise that the degrees of freedom of other unmapped nodes are not reduced, because once the degrees of freedom of other unmapped nodes are reduced, these unmapped nodes will contribute to the matrix When mapping, the selectable mapping range will be reduced, which may cause these unmapped nodes to not be mapped to the matrix;

II、根据该节点可映射的行的填满程度，确定该节点所映射的行。具体如下：优先将节点映射到被填满程度小的行上，因为这样可以尽量保持各行的填满程度一致，从而使得未被填满的行数最多。未映射的节点之后在被映射到矩阵上时，可选择的映射范围将更广，这将尽可能的保证所有的节点都能被映射到矩阵上。II. Determine the row mapped by the node according to the fullness of the mappable rows of the node. The details are as follows: Map the nodes to the rows with a small degree of filling first, because in this way, the filling degree of each row can be kept as consistent as possible, so that the number of unfilled rows is the largest. When the unmapped nodes are mapped to the matrix, the optional mapping range will be wider, which will ensure that all nodes can be mapped to the matrix as much as possible.

III、根据该节点可映射的行的行号，确定该节点所映射的行。具体如下：优先将节点映射到行号小的行上。III. Determine the row mapped by the node according to the row number of the row mappable by the node. The details are as follows: firstly map the nodes to the row with the smaller row number.

其中，上述优先级的关系为I＞II＞III，即：不减小未映射节点的自由度的优先级别大于优先将节点映射到被填满程度小的行上，优先将节点映射到被填满程度小的行上的优先级别大于优先将节点映射到行号小的行上。Among them, the above-mentioned priority relationship is I>II>III, that is, the priority level of not reducing the degree of freedom of unmapped nodes is higher than the priority of mapping nodes to rows that are less filled, and the priority of mapping nodes to filled rows The priority level on the row with a small full degree is higher than the priority to map nodes to the row with a small row number.

二、每当一个节点被映射到矩阵上的某一点后，矩阵中与此点同列，且横坐标相差x(其中，具体为RCA的行数)的整数倍的所有点就不能被映射，因为矩阵中行号相差x(RCA的行数)的整数倍的所有行是RCA的同一行所处的不同的时钟周期，所以当矩阵中的某一点所对应的RC被映射为数据流图上的节点后，此RC在其余的时钟周期便不能被映射。2. Whenever a node is mapped to a certain point on the matrix, all points in the matrix that are in the same column as this point and whose abscissa differs by an integer multiple of x (specifically, the number of rows of RCA) cannot be mapped, because All rows in the matrix whose row numbers differ by an integer multiple of x (the number of RCA rows) are different clock cycles in the same row of RCA, so when the RC corresponding to a certain point in the matrix is mapped to a node on the data flow graph After that, the RC cannot be mapped in the remaining clock cycles.

最后，根据循环任务的数据流图在矩阵上的映射结果，生成循环任务在可重构阵列上的配置信息。Finally, according to the mapping result of the data flow graph of the cyclic task on the matrix, the configuration information of the cyclic task on the reconfigurable array is generated.

基于上述描述，下面以一个具体实施例来说明实现生成循环任务配置信息的过程，详见如下：Based on the above description, a specific embodiment is used below to illustrate the process of generating configuration information for cyclic tasks, see the following for details:

实施例1Example 1

本发明以图5的数据流图为例，输入为：4×4RCA；数据流图G1，节点个数16(16＜＝4×4)，图8为本发明实施例提供的实现生成循环任务配置信息的方法的流程图，参见图8，具体步骤如下：The present invention takes the data flow graph of Fig. 5 as an example, the input is: 4*4RCA; the data flow graph G1, the number of nodes is 16 (16<=4*4), and Fig. 8 is the implementation and generation cycle task provided by the embodiment of the present invention For the flow chart of the method for configuring information, see FIG. 8 , the specific steps are as follows:

步骤101：对G1进行“尽可能早”算子调度，得到调度后的数据流图。Step 101: Perform "as early as possible" operator scheduling on G1 to obtain the scheduled data flow graph.

其中，调度后的数据流图如图6左图所示，根据“尽可能早”算子调度，以及从路径必须服从关键路径的调度关系，将节点“2”、节点“11”分配到第1个时钟周期；将节点“6”分配到第2个时钟周期；将节点“9”分配到第3个时钟周期；将节点“12”分配到第4个时钟周期；节点“14”分配到第5个时钟周期；由于节点“14”分配到第5个时钟周期，故节点“16”分配到第6个时钟周期；节点“18”分配到第8个时钟周期。Among them, the data flow diagram after scheduling is shown in the left figure of Figure 6. According to the "as early as possible" operator scheduling and the scheduling relationship that the slave path must obey the critical path, nodes "2" and "11" are assigned to the first 1 clock cycle; assign node "6" to 2nd clock cycle; assign node "9" to 3rd clock cycle; assign node "12" to 4th clock cycle; assign node "14" to The 5th clock cycle; since the node "14" is allocated to the 5th clock cycle, the node "16" is allocated to the 6th clock cycle; the node "18" is allocated to the 8th clock cycle.

步骤102：对G1进行“尽可能晚”算子调度，得到调度后的数据流图。Step 102: Perform "as late as possible" operator scheduling on G1 to obtain the scheduled data flow graph.

其中，调度后的数据流图如图6右图所示，根据“尽可能晚”算子调度，以及从路径必须服从关键路径的调度关系，将节点“18”分配到第8个时钟周期；将节点“16”分配到第7个时钟周期；将节点“14”、节点“12”分配到第6个时钟周期；将节点“11”、节点“9”分配到第5个时钟周期；将节点“6”、分配到第4个时钟周期；将节点“2”分配到第3个时钟周期。Among them, the scheduled data flow diagram is shown in the right figure of Figure 6. According to the "as late as possible" operator scheduling and the scheduling relationship that the slave path must obey the critical path, the node "18" is assigned to the eighth clock cycle; Assign node "16" to the seventh clock cycle; assign node "14" and node "12" to the sixth clock cycle; assign node "11" and node "9" to the fifth clock cycle; assign Node "6" is assigned to the 4th clock cycle; node "2" is assigned to the 3rd clock cycle.

步骤103：根据步骤101、步骤102调度后的数据流图，得到G1的关键路径长度，各节点的自由度，及各节点可处在的时钟周期，参见表1，其中，表1为各个节点的自由度和可处在的时钟周期。Step 103: According to the data flow graph scheduled instep 101 and step 102, the critical path length of G1, the degree of freedom of each node, and the clock cycle that each node can be in are obtained, see Table 1, where Table 1 shows each node degrees of freedom and available clock cycles.

表1各个节点的自由度和可处在的时钟周期Table 1 The degrees of freedom of each node and the available clock cycles

其中，根据算子自由度的计算公式为“‘尽可能晚算子调度中算子的时钟周期’-‘尽可能早算子调度中算子的时钟周期’+1”。所以，关键路径上的所有算子的自由度均为1，关键路径为(其中，关键路径为在所有的输入到输出的路径中，执行时间最长，即算子个数最多的路径)2条，分别为“1→3→7→10→13→15→17”和“1→4→7→10→13→15→17”。即，节点“1”、“3”、“4”、“7”、“10”、“13”、“15”、“17”的自由度为1，关键路径的长度为8。同理根据自由度的计算公式能分别获得节点“2”的自由度为3、节点“6”的自由度为3、节点“9”的自由度为3、节点“11”的自由度为5、节点“12”的自由度为3、节点“14”的自由度为2、节点“16”的自由度为2、节点“18”的自由度为1。Among them, the calculation formula according to the degree of freedom of the operator is "'the clock period of the operator in the operator scheduling as late as possible'-'the clock period of the operator in the operator scheduling as early aspossible'+1". Therefore, the degree of freedom of all operators on the critical path is 1, and the critical path is (among them, the critical path is the path with the longest execution time among all the paths from input to output, that is, the path with the largest number of operators) 2 The bars are "1→3→7→10→13→15→17" and "1→4→7→10→13→15→17". That is, the degree of freedom of nodes "1", "3", "4", "7", "10", "13", "15", and "17" is 1, and the length of the critical path is 8. Similarly, according to the calculation formula of the degree of freedom, the degree of freedom of node "2" is 3, the degree of freedom of node "6" is 3, the degree of freedom of node "9" is 3, and the degree of freedom of node "11" is 5 , the degree of freedom of the node "12" is 3, the degree of freedom of the node "14" is 2, the degree of freedom of the node "16" is 2, and the degree of freedom of the node "18" is 1.

其中，参见图6的左图、右图可以获得各个节点处在的时钟周期，参见表1，详见如下：关键路径上的节点“1”可处在的时钟周期为第1时钟周期、节点“3”可处在的时钟周期为第2时钟周期、节点“4”可处在的时钟周期为第2时钟周期、节点“7”可处在的时钟周期为第3时钟周期、节点“10”可处在的时钟周期为第4时钟周期、节点“13”可处在的时钟周期为第5时钟周期、节点“15”可处在的时钟周期为第6时钟周期、节点“17”可处在的时钟周期为第7时钟周期；从路径上的节点“2”可处在的时钟周期为第1时钟周期或第2时钟周期或第3时钟周期、节点“6”可处在的时钟周期为第2时钟周期或第3时钟周期或第4时钟周期、节点“9”可处在的时钟周期为第3时钟周期或第4时钟周期或第5时钟周期、节点“11”可处在的时钟周期为第1时钟周期或第2时钟周期或第3时钟周期或第4时钟周期或第5时钟周期、节点“12”可处在的时钟周期为第4时钟周期或第5时钟周期或第6时钟周期、节点“14”可处在的时钟周期为第5时钟周期或第6时钟周期、节点“16”可处在的时钟周期为第6时钟周期或第7时钟周期、节点“18”可处在的时钟周期为第8时钟周期。Among them, refer to the left and right diagrams of Figure 6 to obtain the clock cycle of each node, see Table 1, see the following for details: the clock cycle that node "1" on the critical path can be in is the first clock cycle, node The clock cycle that "3" can be in is the second clock cycle, the clock cycle that node "4" can be in is the second clock cycle, the clock cycle that node "7" can be in is the third clock cycle, node "10" "The clock cycle that can be in the 4th clock cycle, the clock cycle that the node "13" can be in is the 5th clock cycle, the clock cycle that the node "15" can be in is the 6th clock cycle, and the node "17" can be in the 6th clock cycle. The clock cycle that is in is the seventh clock cycle; the clock cycle that node "2" on the slave path can be in is the first clock cycle or the second clock cycle or the third clock cycle, and the clock that node "6" can be in The cycle is the 2nd clock cycle or the 3rd clock cycle or the 4th clock cycle, the clock cycle that node "9" can be in is the 3rd clock cycle, the 4th clock cycle or the 5th clock cycle, and the node "11" can be in The clock cycle is the 1st clock cycle or the 2nd clock cycle or the 3rd clock cycle or the 4th clock cycle or the 5th clock cycle, and the clock cycle that node "12" can be in is the 4th clock cycle or the 5th clock cycle or The 6th clock cycle, the clock cycle that node "14" can be in is the 5th clock cycle or the 6th clock cycle, the clock cycle that node "16" can be in is the 6th clock cycle or the 7th clock cycle, node "18" "The available clock cycle is the 8th clock cycle.

步骤104：根据关键路径的长度、可重构阵列的列数，建立矩阵。Step 104: Establish a matrix according to the length of the critical path and the number of columns of the reconfigurable array.

其中，以关键路径的长度为行、以可重构阵列的列数为列，建立该矩阵，本实施例建立的矩阵具体为8行×4列。Wherein, the matrix is established with the length of the critical path as the row and the number of columns of the reconfigurable array as the column, and the matrix established in this embodiment is specifically 8 rows×4 columns.

步骤105：当数据流图中存在节点未被映射时，根据未映射节点的自由度，所处的时钟周期，按照映射规则，将未映射节点映射到矩阵中，得到映射后的矩阵。Step 105: When there is a node in the data flow graph that is not mapped, according to the degree of freedom of the unmapped node and the clock cycle, map the unmapped node into the matrix according to the mapping rule to obtain the mapped matrix.

其中，根据各个节点的自由度，所处的时钟周期，按照映射规则，将节点映射到矩阵中时，按照各节点的自由度的高低，优先对未映射的节点中自由度最低的节点进行映射。Among them, according to the degree of freedom of each node, the clock cycle, according to the mapping rules, when mapping the nodes into the matrix, according to the degree of freedom of each node, the node with the lowest degree of freedom among the unmapped nodes is preferentially mapped .

一、映射所有自由度为1的节点。1. Map all nodes with a degree of freedom of 1.

其中，参见图7(1)，所有自由度为1的节点行号均固定，按节点编号从左至右填充矩阵，每映射一个节点，要对其同列，且行号“+4”或“-4”的矩阵的点进行更新，画上“x”标明其不能再被映射。Among them, see Figure 7(1), the row numbers of all nodes with a degree of freedom of 1 are fixed, and the matrix is filled from left to right according to the node numbers. Every time a node is mapped, it must be in the same column, and the row number "+4" or "- 4" matrix points are updated, and an "x" is drawn to indicate that it can no longer be mapped.

其中，例如将节点“1”映射到第1行第1列，则需要将矩阵的第5行第1列画上“x”，则第5行第1列不能再被映射；同理将节点“3”映射到第2行第1列，则需要将矩阵的第6行第1列画上“x”，则第6行第1列不能再被映射；同理将节点“4”映射到第2行第2列，则需要将矩阵的第6行第2列画上“x”，则第6行第2列不能再被映射；同理节点“7”映射到第3行第1列，则需要将矩阵的第7行第1列画上“x”，则第7行第1列不能再被映射；同理节点“10”映射到第4行第1列，则需要将矩阵的第8行第1列画上“x”，则第8行第1列不能再被映射；由于第5行第1列不能再被映射，即将节点“13”映射到第5行第2列，则需要将矩阵的第1行第2列画上“x”，则第1行第2列不能再被映射；由于第6行第2列不能再被映射，即将节点“15”映射到第6行第3列，则需要将矩阵的第2行第3列画上“x”，则第2行第3列不能再被映射；由于第7行第1列不能再被映射，则需要将节点“17”映射到第7行第2列，则需要将矩阵的第3行第2列画上“x”，则第3行第2列不能再被映射；由于第8行第1列不能再被映射，则需要将节点“18”映射到第8行第2列，则需要将矩阵的第4行第2列画上“x”，则第4行第2列不能再被映射。Among them, for example, to map the node "1" to the first row and the first column, you need to draw "x" on the fifth row and the first column of the matrix, then the fifth row and the first column can no longer be mapped; similarly, the node "3" is mapped to row 2, column 1, then you need to draw "x" on row 6, column 1 of the matrix, then row 6, column 1 can no longer be mapped; in the same way, node "4" is mapped to In row 2, column 2, you need to draw "x" on row 6, column 2 of the matrix, then row 6, column 2 can no longer be mapped; similarly, node "7" is mapped to row 3, column 1 , you need to draw "x" on the 7th row and 1st column of the matrix, then the 7th row and 1st column can no longer be mapped; similarly the node "10" is mapped to the 4th row and 1st column, you need to map the matrix If "x" is drawn on the first column of row 8, the first column of row 8 can no longer be mapped; since the first column of row 5 can no longer be mapped, the node "13" is mapped to the second column of row 5, Then you need to draw "x" on the first row and the second column of the matrix, then the first row and the second column can no longer be mapped; since the sixth row and the second column can no longer be mapped, the node "15" is mapped to the sixth row 3 column, you need to draw "x" on the 2nd row and 3rd column of the matrix, then the 2nd row and 3rd column can no longer be mapped; since the 7th row and 1st column can no longer be mapped, you need to put the node "17" is mapped to row 7, column 2, then you need to draw "x" on row 3, column 2 of the matrix, then row 3, column 2 can no longer be mapped; because row 8, column 1 can no longer is mapped, you need to map node "18" to row 8, column 2, and you need to draw "x" on row 4, column 2 of the matrix, then row 4, column 2 can no longer be mapped.

二、映射所有自由度为2的节点。2. Map all nodes with 2 degrees of freedom.

其中，参见图7(2)，按节点编号先映射节点“14”。参照表1，由于节点“14”可处于第5时钟周期或第6时钟周期，即节点“14”有2个选择，可以选择矩阵的第5行和第6行。其中，由于节点“16”处于第6时钟周期或第7时钟周期，若将节点“14”映射到第6行，即节点“16”只能选择第7时钟周期，则，节点“16”的自由度减小为1(具体为，选择之前节点“16”的自由度为7-6+1＝2，选择后节点“16”的自由度为7-7+1＝1，2-1＝1)，对节点16的自由度存在减小的影响，所以节点“14”选择第5行第3列，则需要将矩阵的第1行第3列画上“x”，则第1行第3列不能再被映射；。Wherein, referring to FIG. 7(2), the node "14" is first mapped according to the node number. Referring to Table 1, since the node "14" can be in the 5th clock cycle or the 6th clock cycle, that is, the node "14" has 2 options, and the 5th row and the 6th row of the matrix can be selected. Among them, since the node "16" is in the 6th clock cycle or the 7th clock cycle, if the node "14" is mapped to the 6th row, that is, the node "16" can only choose the 7th clock cycle, then the node "16" The degree of freedom is reduced to 1 (specifically, the degree of freedom of node "16" before selection is 7-6+1=2, and the degree of freedom of node "16" after selection is 7-7+1=1, 2-1= 1), the degree of freedom ofnode 16 is reduced, so node "14" selectsrow 5 andcolumn 3, then it is necessary to draw "x" onrow 1 andcolumn 3 of the matrix, thenrow 1 andcolumn 3 columns can no longer be mapped;.

其中，再映射节点“16”。节点“16”有2个选择，可以选择矩阵的第6行和第7行。其中，由于节点“16”不存在减小未映射节点自由度的问题，可将其映射到填满度较小的第7行第3列，则需要将矩阵的第3行第3列画上“x”，则第3行第3列不能再被映射。Among them, the node "16" is remapped. Node "16" has 2 selections,row 6 androw 7 of the matrix can be selected. Among them, since the node "16" does not have the problem of reducing the degree of freedom of the unmapped node, it can be mapped to the seventh row and the third column with a small filling degree, and the third row and the third column of the matrix need to be drawn on "x", thenrow 3 andcolumn 3 can no longer be mapped.

三、映射所有自由度为3的节点。3. Map all nodes with 3 degrees of freedom.

其中，参见图7(3)，按节点编号先映射节点“2”，参照表1，由于节点“2”可处于第1时钟周期或第2时钟周期或第3时钟周期，即节点“2”有3个选择，可以选择矩阵的第1行或第2行或第3行。由于节点“2”存在减小未映射节点自由度的问题，即若节点“2”选择第2时钟周期，则对节点“6”的自由度有影响，使得节点“6”的自由度减少1；若节点“2”选择第3时钟周期，则对节点“9”的自由度有影响，使得节点“9”的自由度减少1，综上，节点“2”选择第1行第4列，则需要将矩阵的第5行第4列画上“x”，则第5行第4列不能再被映射。Among them, referring to Figure 7(3), first map node "2" according to the node number, refer to Table 1, since node "2" can be in the first clock cycle or the second clock cycle or the third clock cycle, that is, node "2" There are 3 choices to chooserow 1 orrow 2 orrow 3 of the matrix. Because node "2" has the problem of reducing the degree of freedom of unmapped nodes, that is, if node "2" chooses the second clock cycle, it will affect the degree of freedom of node "6", so that the degree of freedom of node "6" is reduced by 1 ; If the node "2" chooses the third clock cycle, it will affect the degree of freedom of the node "9", so that the degree of freedom of the node "9" will be reduced by 1. In summary, the node "2" chooses the first row and the fourth column, Then you need to draw "x" on the 5th row and 4th column of the matrix, then the 5th row and 4th column can no longer be mapped.

再映射节点“6”，参照表1，由于节点“6”可处于第2时钟周期或第3时钟周期或第4时钟周期，即节点“6”有3个选择，可以选择矩阵的第2行或第3行或第4行。由于节点“6”存在减小未映射节点自由度的问题，即若节点“6”选择第3时钟周期，则对节点“9”的自由度有影响，使得节点“9”的自由度减少1；若节点“6”选择第4时钟周期，则对节点“12”的自由度有影响，使得节点“12”的自由度减少1，综上，节点“6”选择第2行第4列，则需要将矩阵的第6行第4列画上“x”，则第6行第4列不能再被映射。Remap node "6", refer to Table 1, since node "6" can be in the second clock cycle or the third clock cycle or the fourth clock cycle, that is, node "6" has 3 choices, you can choose the second row of the matrix orline 3 orline 4. Because node "6" has the problem of reducing the degree of freedom of unmapped nodes, that is, if node "6" chooses the third clock cycle, it will affect the degree of freedom of node "9", so that the degree of freedom of node "9" is reduced by 1 ; If the node "6" chooses the 4th clock cycle, it will affect the degree of freedom of the node "12", so that the degree of freedom of the node "12" will be reduced by 1. In summary, the node "6" chooses the second row and the fourth column, Then you need to draw "x" on the 6th row and 4th column of the matrix, then the 6th row and 4th column can no longer be mapped.

再映射节点“9”，参照表1，由于节点“9”可处于第3时钟周期或第4时钟周期或第5时钟周期，即节点“9”有3个选择，可以选择矩阵的第3行或第4行或第5行。由于节点“9”存在减小未映射节点自由度的问题，即若节点“9”选择第4时钟周期，则对节点“12”的自由度有影响，使得节点“12”的自由度减少1，即，节点“9”选择第3行第4列，则需要将矩阵的第7行第4列画上“x”，则第7行第4列不能再被映射。Remap node "9", refer to Table 1, since node "9" can be in the 3rd clock cycle or the 4th clock cycle or the 5th clock cycle, that is, node "9" has 3 choices, you can choose the third row of the matrix orline 4 orline 5. Because node "9" has the problem of reducing the degree of freedom of unmapped nodes, that is, if node "9" chooses the fourth clock cycle, it will affect the degree of freedom of node "12", so that the degree of freedom of node "12" is reduced by 1 , That is, if node "9" selectsrow 3 andcolumn 4, it is necessary to draw "x" onrow 7 andcolumn 4 of the matrix, androw 7 andcolumn 4 cannot be mapped anymore.

再映射节点“12”，参照表1，由于节点“12”可处于第4时钟周期或第5时钟周期或第6时钟周期，节点“12”有3个选择，可以选择矩阵的第4行或第5行或第6行。其中，由于此时矩阵中只剩第4行和第8行未被填充满，所以节点“12”被映射到第4行第3列，则需要将矩阵的第8行第3列画上“x”，则第8行第3列不能再被映射。Remap node "12", referring to Table 1, since node "12" can be in the 4th clock cycle, 5th clock cycle or 6th clock cycle, node "12" has 3 options, you can choose the 4th row of the matrix orLine 5 or 6. Among them, since only the 4th row and the 8th row are not filled in the matrix at this time, the node "12" is mapped to the 4th row and the 3rd column, and the 8th row and the 3rd column of the matrix need to be drawn with " x", thenrow 8 andcolumn 3 can no longer be mapped.

四、映射自由度为5的节点。4. Map nodes with 5 degrees of freedom.

其中，参见图7(4)，参照表1，由于节点“11”可处于第1时钟周期或第2时钟周期或第3时钟周期或第4时钟周期或第5时钟周期，节点“11”有5个选择，可以选择矩阵的第1行或第2行或第3行或第4行或第5行。由于此时矩阵中只剩第4行和第8行未被填充满，所以节点“11”被映射到第4行第4列，则需要将矩阵的第8行第4列画上“x”，则第8行第4列不能再被映射。Wherein, referring to Fig. 7 (4), referring to Table 1, since the node "11" can be in the first clock cycle or the second clock cycle or the third clock cycle or the fourth clock cycle or the fifth clock cycle, the node "11" has 5 choices, you can chooserow 1 orrow 2 orrow 3 orrow 4 orrow 5 of the matrix. Since only the 4th row and the 8th row are not filled in the matrix at this time, the node "11" is mapped to the 4th row and 4th column, and the 8th row and 4th column of the matrix need to be drawn with "x" , thenrow 8 andcolumn 4 can no longer be mapped.

步骤106：将映射结果生成循环任务在可重构阵列上的配置信息。Step 106: Generate configuration information of the cyclic task on the reconfigurable array from the mapping result.

其中，参见图7(4)，图7(4)为循环任务的数据流图在矩阵上的映射结果即所得到的映射后的矩阵，相应地，图7(5)为循环任务在可重构阵列上生成的配置信息。Wherein, referring to Fig. 7(4), Fig. 7(4) is the mapping result of the data flow graph of the cyclic task on the matrix, that is, the obtained mapped matrix. Correspondingly, Fig. 7(5) is the reproducible Configuration information generated on the fabric array.

本发明提供的方法，通过只生成了一套可重构阵列的配置信息，减少了向可重构阵列传输配置信息的时间，提高了处理效率，减少了计算量，可以较快地配置可重构单元行的运算功能，满足了实际应用中的需要。The method provided by the present invention reduces the time for transmitting configuration information to the reconfigurable array by only generating a set of configuration information of the reconfigurable array, improves processing efficiency, reduces the amount of calculation, and can quickly configure the reconfigurable array. The operation function of the structural unit line meets the needs of practical applications.

实施例2Example 2

参见图9，为本发明实施例提供的一种实现生成循环任务配置信息的装置示意图，图10为本发明实施例提供的一种实现生成循环任务配置信息的装置具体示意图，包括：Referring to FIG. 9, it is a schematic diagram of a device for generating cyclic task configuration information provided by an embodiment of the present invention. FIG. 10 is a specific schematic diagram of a device for generating cyclic task configuration information provided by an embodiment of the present invention, including:

调度模块201：用于对所述可重构阵列当前执行数据流的节点进行算子调度。Scheduling module 201: used to perform operator scheduling on nodes currently executing data streams of the reconfigurable array.

其中，调度模块具体包括：Among them, the scheduling module specifically includes:

第一调度单元201A，用于对当前执行数据流的节点进行第一算子调度；The first scheduling unit 201A is configured to perform first operator scheduling on nodes currently executing data flows;

第二调度单元201B，用于对当前执行数据流的节点进行第二算子调度；The second scheduling unit 201B is configured to perform second operator scheduling on the node currently executing the data flow;

相应地，获得模块在获得各个节点的自由度时，根据第二调度单元得到的第二算子调度中该节点的时钟周期-第一调度单元得到的第一算子调度中该节点的时钟周期+1，得到该节点的自由度。Correspondingly, when the obtaining module obtains the degree of freedom of each node, according to the clock cycle of the node in the second operator schedule obtained by the second scheduling unit - the clock cycle of the node in the first operator schedule obtained by the first scheduling unit +1 for getting the degrees of freedom for this node.

获得模块202：用于根据调度模块201得到的算子调度结果，获得关键路径的长度、各个节点的自由度，以及各个节点可处的时钟周期。Obtaining module 202: used to obtain the length of the critical path, the degree of freedom of each node, and the available clock cycle of each node according to the operator scheduling result obtained by the scheduling module 201.

建立模块203：用于根据获得模块202获得的关键路径的长度、可重构阵列的列数，建立矩阵。Establishment module 203: used to establish a matrix according to the length of the critical path obtained by the obtaining module 202 and the number of columns of the reconfigurable array.

其中，建立模块203具体用于以关键路径的长度为行、以可重构阵列的列数为列，建立矩阵。Wherein, the establishing module 203 is specifically configured to establish a matrix with the length of the critical path as the row and the number of columns of the reconfigurable array as the column.

映射模块204：用于当数据流中存在未映射的节点时，根据获得模块202获得的未映射节点的自由度、未映射节点可处的时钟周期，以及映射规则，将未映射的节点映射到建立模块203建立的矩阵中。Mapping module 204: used to map the unmapped nodes to In the matrix established by the establishment module 203.

其中，当数据流中存在的未映射的节点个数为多个，且自由度不同时，映射模块，具体用于依次按照多个未映射节点的自由度由低到高的顺序，根据未映射节点可处的时钟周期，以及映射规则，将未映射的节点映射到矩阵。Among them, when the number of unmapped nodes in the data stream is multiple and the degrees of freedom are different, the mapping module is specifically used to sequentially follow the order of the degrees of freedom of the multiple unmapped nodes from low to high, according to the unmapped The clock cycles at which the nodes are available, and the mapping rules that map unmapped nodes to the matrix.

其中，映射规则具体包括：Among them, the mapping rules specifically include:

在矩阵中，将节点映射到行号与节点的时钟周期相同的行上；In the matrix, map the node to the row with the same row number as the clock period of the node;

且，在将节点映射到行时，按照从左至右的顺序进行映射；And, when mapping nodes to rows, map them in order from left to right;

且，当对节点映射后，在矩阵中与节点映射的位置同列，且横坐标相差可重构阵列行数的整数倍的所有位置不再被其他未映射节点所映射。Moreover, after the nodes are mapped, all positions in the matrix that are in the same column as the positions mapped to the nodes and whose abscissas differ by an integer multiple of the number of rows of the reconfigurable array are no longer mapped by other unmapped nodes.

当节点可处在不同的时钟周期时，在矩阵中，将节点映射到行号与节点的时钟周期相同的行上，还包括：When the nodes can be at different clock periods, in the matrix, map the nodes to the row with the same row number as the node's clock period, also include:

其中，I、II、III执行的优先级顺序为：I＞II＞III。Wherein, the order of priority for execution of I, II, and III is: I>II>III.

生成模块205：用于映射模块204得到的映射结果，得到循环任务在可重构阵列上的配置信息。Generation module 205: use the mapping result obtained by the mapping module 204 to obtain the configuration information of the cyclic task on the reconfigurable array.

综上，本发明实施例提出的一种实现生成循环任务配置信息的方法和装置，只生成了一套可重构阵列的配置信息，减少了向可重构阵列传输配置信息的时间，提高了处理效率，可以较快地配置可重构单元行的运算功能，满足了实际应用中的需要。In summary, a method and device for generating cyclic task configuration information proposed by the embodiments of the present invention only generates a set of configuration information of reconfigurable arrays, reduces the time for transmitting configuration information to reconfigurable arrays, and improves The processing efficiency can quickly configure the operation function of the reconfigurable unit row, which meets the needs of practical applications.

以上仅为本发明的较佳实施例，并不用以限制本发明，凡在本发明的精神和原则之内，所作的任何修改、等同替换、改进等，均应包含在本发明的保护范围之内。The above are only preferred embodiments of the present invention, and are not intended to limit the present invention. Any modifications, equivalent replacements, improvements, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention Inside.

Claims

1, a kind of method that realizes generating the cycle task configuration information is characterized in that, when the small scale in reconfigurable arrays of data stream, described method comprises:

Step C1: the node to the current execution data stream of described reconfigurable arrays carries out the operator scheduling;

Step C2: according to the operator scheduling result, obtain the length of critical path, the degree of freedom of each node, and the clock period that can locate of each node;

Step C3:, set up matrix according to the length of described critical path, the columns of described reconfigurable arrays;

Step C4: when having unmapped node in the described data stream, the clock period that can locate according to the degree of freedom of described not mapping node, described not mapping node, and mapping ruler, described unmapped node is mapped in the described matrix;

Step C5:, obtain the configuration information of described cycle task on described reconfigurable arrays according to the mapping result that step C4 obtains.

2, the method for claim 1 is characterized in that, described step C1 comprises:

Step C11: the node to described current execution data stream carries out the scheduling of first operator;

Step C12: the node to described current execution data stream carries out the scheduling of second operator;

Correspondingly, the degree of freedom of each node of described step C2 acquisition specifically comprises:

The degree of freedom of each node=" clock period+1 of this node during clock period-the first operator of this node is dispatched in the scheduling of second operator ".

3, the method for claim 1 is characterized in that, described step C3 specifically comprises: be row, be row with the columns of described reconfigurable arrays with the length of described critical path, set up matrix.

4, method as claimed in claim 3 is characterized in that, the unmapped node number that exists in described data stream is a plurality of, and degree of freedom is not simultaneously,

The clock period that described step C4 can locate according to the degree of freedom of described not mapping node, described not mapping node, and mapping ruler, described unmapped node be mapped in the described matrix specifically comprise:

Successively according to the degree of freedom order from low to high of described a plurality of not mapping nodes, the clock period that can locate according to described not mapping node, and mapping ruler, described unmapped node is mapped to described matrix.

5, as claim 3 or 4 described methods, it is characterized in that described mapping ruler specifically comprises:

In described matrix, described node is mapped on number identical with the clock period of the described node row of row;

And, described node is being mapped to described when capable, shine upon according to order from left to right;

And, after to the mapping of described node, in described matrix with the position same column of described node mapping, and horizontal ordinate differ the reconfigurable arrays line number integral multiple all positions no longer by other not mapping node shine upon.

6, method as claimed in claim 5 is characterized in that, and is described in described matrix when described node can be in the different clock period, and described node is mapped on number identical with the clock period of the described node row of row, also comprises:

Whether I, basis influence other not degree of freedom of mapping node, determine the row that this node shines upon; Or,

The degree of filling up of II, the row that can shine upon according to this node is determined the row that this node shines upon; Or,

The row of III, the row that can shine upon according to this node number is determined the row that this node shines upon.

7, method as claimed in claim 6 is characterized in that, the priority orders that described I, II, III carry out is: I＞II＞III.

8, a kind of device that generates the cycle task configuration information is characterized in that, when the small scale in reconfigurable arrays of data flow diagram, described device comprises:

Scheduler module is used for the node of the current execution data stream of described reconfigurable arrays is carried out the operator scheduling;

Obtain module, be used for the operator scheduling result that obtains according to described scheduler module, obtain the length of critical path, the degree of freedom of each node, and the clock period that can locate of each node;

Set up module, be used for setting up matrix according to the length of the critical path of described acquisition module acquisition, the columns of described reconfigurable arrays;

Mapping block, be used for when there is unmapped node in described data stream, the degree of freedom of the not mapping node that obtains according to described acquisition module, the clock period that described not mapping node can be located, and mapping ruler, described unmapped node is mapped to described foundation in the matrix that module sets up;

Generation module is used for the mapping result that described mapping block obtains, and obtains the configuration information of described cycle task on described reconfigurable arrays.

9, device as claimed in claim 8 is characterized in that, described scheduler module comprises:

First scheduling unit is used for the node of described current execution data stream is carried out the scheduling of first operator;

Second scheduling unit is used for the node of described current execution data stream is carried out the scheduling of second operator;

Correspondingly, described acquisition module is when obtaining the degree of freedom of each node, in second operator scheduling that obtains according to described second scheduling unit this node the clock period-first operator scheduling that described first scheduling unit obtains in clock period+1 of this node, obtain the degree of freedom of this node.

10, device as claimed in claim 8 is characterized in that, describedly sets up module specifically to be used for length with described critical path be row, is row with the columns of described reconfigurable arrays, sets up matrix.

11, device as claimed in claim 10 is characterized in that, the unmapped node number that exists in described data stream is a plurality of, and degree of freedom is not simultaneously,

Described mapping block specifically is used for successively the degree of freedom order from low to high according to described a plurality of not mapping nodes, the clock period that can locate according to described not mapping node, and mapping ruler, described unmapped node is mapped to described matrix.

12, as claim 10 or 11 described devices, it is characterized in that described mapping ruler specifically comprises:

13, device as claimed in claim 12 is characterized in that, and is described in described matrix when described node can be in the different clock period, and described node is mapped on number identical with the clock period of the described node row of row, also comprises:

14, device as claimed in claim 13 is characterized in that, the priority orders that described I, II, III carry out is: I＞II＞III.