CN102360306A

Movatterモバイル変換

Info

Publication number: CN102360306A
Application number: CN2011103191863A
Authority: CN
Inventors: 曹超; 景乃锋; 何卫锋; 绳伟光; 毛志刚; 付宇卓
Original assignee: Shanghai Jiao Tong University
Current assignee: Shanghai Jiao Tong University
Priority date: 2011-10-19
Filing date: 2011-10-19
Publication date: 2012-02-22

Abstract

The invention relates to a method for extracting and optimizing information of cyclic data flow charts in high-level language codes. A cyclic part in a high-level language program is analyzed and developed automatically, memory access operation in the cyclic part is extracted preliminarily and written back, instructions in the cyclic part are extracted into different categories of nodes, directed edges are generated by analyzing the nodes, and then a data flow chart is generated. Compared with the prior art, the method has the advantages of high efficiency, easiness in realization and the like.

Description

The loop-around data flow graph extracts the optimization information disposal route in the higher-level language code

Technical field

The present invention relates to a kind of restructural calculation task compiling neighborhood optimization implementation method, especially relate to loop-around data flow graph extraction optimization information disposal route in a kind of higher-level language code.

Background technology

The reconfigureable computing array processor is the hardware platform to the software-hardware synergism design; Its framework is a kind of hardware structure between traditional general processor and special IC; Taken into account the dirigibility and the high efficiency of system, it is formed and generally comprises general processor and reconfigureable computing array.The performance history of using on the reconfigureable computing array processor comprises application is divided into various tasks; And with the computation-intensive duty mapping to the computing unit array to improve its computing velocity and performance, be mapped as the advantage of software and will control intensive task with the performance general processor.

The automatic technique of compiling of reconfigureable computing array processor can effectively lower the difficulty of application development, uses the processor active task of DFD (Data Flow Graph, be called for short DFG) in representing to use in this technology.Through DFD, program compiler can be analyzed the dependence and the concurrency of computing automatically, thereby helps the mapping of task on restructural computing array.Owing to do not comprise the memory access function in the reconfigureable computing array, should not comprise the data of memory access in the DFD that can directly on this array, shine upon.The shared resource maximum of round-robin operation in application program; Data-intensive computing often all occurs with the round-robin form; The execution speed of cyclic part affects the travelling speed of total system; And cyclic part is the part of most possibly excavating out concurrency in the application program; Circulation is mapped to the computing velocity that can effectively improve the reconfigureable computing array processor on the reconfigurable arrays efficiently, and therefore effectively extracting the round-robin DFD has important effect to whole reconfigureable computing array processor compiling system.

Literature search through to prior art is found; People such as Chongyong Yin have discussed towards the generation method of reconfigurable arrays processor data flow graph in " Compiler framework for reconfigurable computing system " (compiler framework of restructurable computing system) delivered on " International Conference on Communications, Circuits and Systems " in 2009.The method that proposes in this article is the CFG (control flow graph) that at first generates the whole procedure code, then analyzes each fundamental block among the CFG, and generates the DFG of each fundamental block.But this method is not distinguished computation-intensive code and the intensive code of control effectively mainly towards whole procedure rather than towards circulation; And this method is to extract DFD in the fundamental block in the control flow graph, makes that like this DFD small scale quantity that extracts is many, is unfavorable for the division and the mapping optimization of automatic compiling system rear end.

Summary of the invention

The object of the invention is exactly for the defective that overcomes above-mentioned prior art existence loop-around data flow graph extraction optimization information disposal route in a kind of higher-level language code to be provided.

The object of the invention can be realized through following technical scheme:

The loop-around data flow graph extracts the optimization information disposal route in a kind of higher-level language code; It is characterized in that, analyze and launch the cyclic part in the high-level language programs automatically, accessing operation is wherein looked ahead and write back processing; Be different classes of node then with instruction fetch wherein; Generate directed edge through analyzing these nodes, and form DFD, concrete steps are following:

1) the code text scan module carries out morphology to whole higher-level language code and syntactic information is analyzed, and the intermediate representation data of the tree-like formula of generative grammar, and execution in step 2 then);

2) the loop unrolling module with the intermediate representation data of syntax tree form as input; Obtain wherein code structure information; And, at first confirm the round-robin number of times according to round-robin starting condition and termination condition for each circulation, the number of times of then setting according to round-robin number of times or user carries out round-robin to launch; And demarcate the code block after launching, execution in step 3 then);

3) look ahead and write back module the code block of demarcating is carried out looking ahead and writing back of memory access variable; At first the operand with memory read is prefetched in the scalar; The operand of all read-write memories in the circulation is replaced with corresponding scalar; And write back in the storer at the scalar that the end of code block will be referred to memory write, execution in step 4 then);

4) the intermediate representation modular converter transfers the intermediate representation data of syntax tree form to the intermediate representation data of instruction list form; Each bar instruction that the node generation module will be demarcated in the code block is classified according to the number of source operand; Sorted code is carried out the generation of corresponding calculated node CNode, and execution in step 5 then);

5) the limit generation module is analyzed the dependency information between each computing node according to the variable name of the data of opnd in each computing node and dst; The computing of existence from a variable set set to another variable a; Just exist from the set set all variablees to the directed edge of variable a; Thereby the generation directed edge, and form DFD.

The number of times of the loop unrolling described step 2) only extracts the loop body part for being specified by the user or discerning automatically according to the semanteme of program when carrying out the extraction of DFD for the part of not launching here.

Described step 2) used a look-up table to write down the state of these accessing operations in, thereby guaranteed that the dependence after the replacement is constant with semanteme.

Instruction is categorized as single-operand instruction, double operand instruction and 3-operand instruction according to the number of source operand in the described step 4).

The territory of computing node CNode comprises node serial number Num, operational character op, operand number opnd_num, operand opnd and destination operand dst in the described step 4).

Compared with prior art; The present invention has realized effective extraction of round-robin DFD in the higher-level language code; The invention has the beneficial effects as follows that the loop unrolling module can be according to the scale of round-robin scale automatic expansion DFD; Look ahead and write back module and can suffer accessing operation by active set, the access delay that effectively shortens the stratification storage system is more than 30%.

Embodiment

Below in conjunction with specific embodiment the present invention is elaborated.

Embodiment

DFD method for distilling of the present invention can be used for improving the extraction of the for round-robin DFD of C programmer MPEG2, mainly may further comprise the steps:

1) the code text scan module carries out morphology and grammatical analysis to whole higher-level language code, and the intermediate representation of the tree-like formula of generative grammar, and the intermediate representation of syntax tree form has kept the structures of higher-level language code, is easy to the analysis of compiling system.The code text scan module be general senior language compiler system compile and optimize before necessary module.

2) the loop unrolling module with the intermediate representation of syntax tree form as input; Obtain wherein code structure information; And, at first confirm the round-robin times N according to round-robin starting condition Cs and termination condition Ce for each circulation, code structure information comprises process, code block and statement.For the for circulation of C language, the computing formula of cycle index N:

N=(U-L)/step, wherein U is the circulation upper bound, and L is the circulation lower bound, and step is the circulation step-length.

Operation below then carrying out for each circulation:

1. round-robin launches.Then decide the number of times M of expansion, be defaulted as whole expansion, i.e. M=N according to user's input; According to launching number of times M circulation is launched, and the variable of circular correlation is replaced with the irrelevant variable of circulation, to keep the semanteme of program;

2. for the part of not launching (N-M), it is remained the loop structure of original program, and cycle index is set to N-M;

3. the code block after will launching is labeled as B, if can't confirm the round-robin number of times here, then this part is not carried out the round-robin expansion, and loop body is partly demarcated.

3) look ahead and write back module the code block of demarcating is carried out looking ahead and writing back of memory access variable; At first the operand with memory read is prefetched in the scalar; The operand of all read-write memories in the circulation is replaced with corresponding scalar, and write back in the storer at the scalar that the end of code block will be referred to memory write.Need use lookup table technology to come the dependence of record code here; Dependence with after the maintenance replacement is constant with semanteme; So both supported general reconfigurable arrays not have this characteristic of memory access function; Also active set has suffered accessing operation, effectively shortens the access delay of stratification storage system.A newly-built look-up table LUT is used for the state of all memory access variablees of record code piece B.For each accessing operation m among the B, operation below carrying out:

If 1. m is in LUT, then it is replaced with the scalar of respective items among the LUT, if do not exist, then m is replaced with a new scalar, and together be inserted in LUT with this new scalar m;

If 2. m is a read memory operation, then the read-write state of respective items in the LUT table is recorded as and reads; If m is the memory write operation, then the read-write state of respective items in the LUT table is recorded as and writes;

3. each in LUT being shown is carried out corresponding looking ahead and writing back according to its read-write state, and the state that occur promptly a certain the first time is then looked ahead for reading, as long as there is a kind of state need write back for writing then.

4) the intermediate representation modular converter transfers the intermediate representation of syntax tree form the intermediate representation of instruction list form to, and the intermediate representation of instruction list form is simple in structure, is easy to the dependence of routine analyzer.Each bar instruction that the node generation module will be demarcated in the code block is classified according to the number of source operand, is divided into (like absolute value operation, inversion operation etc.); Two operand instruction (like arithmetical operation, logical operation etc.); 3-operand instruction (like three order operations etc.).Sorted code is carried out the generation of corresponding calculated node CNode.The territory of computing node CNode comprises node serial number Num, operational character op, operand number opnd_num, operand opnd, destination operand dst.Node serial number Num is the unique identification code that generates at random, and the source operand in the computations is as the opnd of computing node, and destination operand is as the dst of computing node, and operand is as the operational character op of computing node, and opnd_num is the type of instruction.

5) the limit generation module is analyzed the dependence between each computing node according to the variable name of the data of opnd in each computing node and dst; The computing of existence from variable set set to another variable a just exists from the set set all variablees to the directed edge of variable a.The limit generation module can calculate the complete input and output information of each node, and these information are backfilling in each node, will not have the vertex ticks of forerunner's node to be input node INode, and the vertex ticks that will not have descendant node is output node ONode.At this moment the round-robin DFD generates.

Use the result of implementation of present embodiment to show; Use the loop unrolling module can make the number of DFD increase by 2.7 times; Average individual data flow graph scale increases by 34%; Looking ahead and writing back the module active set has suffered accessing operation, the access delay 46% of eliminating duplication storage system under the constant situation of other conditions.

In sum; The present invention has realized the method for distilling of round-robin DFD in the higher-level language code; Can effectively analyze and extract the DFD in the circulation; The invention has the beneficial effects as follows can be according to the scale of round-robin scale automatic expansion DFD, and can suffer accessing operation by active set, than the access delay of eliminating duplication of prior art storage system.

Claims

Translated fromChinese

1.一种高级语言代码中循环数据流图提取优化信息处理方法，其特征在于，自动分析并展开高级语言程序中的循环部分，对其中的访存操作进行预取和写回处理，接着将其中的指令提取为不同类别的节点，通过分析这些节点生成有向边，并形成数据流图，具体步骤如下：1. A method for extracting and optimizing information processing of circular data flow diagrams in high-level language codes, characterized in that the loop part in the high-level language program is automatically analyzed and expanded, and the memory access operations therein are prefetched and written back, and then processed The instructions are extracted into different types of nodes. By analyzing these nodes, directed edges are generated and a data flow graph is formed. The specific steps are as follows:

1)代码文本扫描模块对整个高级语言代码进行词法和语法信息进行分析，并生成语法树形式的中间表示数据，然后执行步骤2)；1) The code text scanning module analyzes the lexical and grammatical information of the entire high-level language code, and generates intermediate representation data in the form of a syntax tree, and then performs step 2);

2)循环展开模块以语法树形式的中间表示数据作为输入，获取其中代码结构信息，并对于每一个循环，首先根据循环的初始条件和结束条件来确定循环的次数，接着按照循环的次数或者用户设定的次数进行循环的展开，并标定展开后的代码块，然后执行步骤3)；2) The loop unrolling module takes the intermediate representation data in the form of a syntax tree as input, obtains the code structure information in it, and for each loop, first determines the number of loops according to the initial conditions and end conditions of the loop, and then determines the number of loops according to the number of loops or the user Expand the loop for the set number of times, and calibrate the expanded code block, and then perform step 3);

3)预取和写回模块将标定的代码块进行访存变量的预取和写回，首先将读存储器的操作数预取到标量中，将循环中所有读写存储器的操作数替换为对应的标量，并在代码块的结束处将涉及写存储器的标量写回到存储器中，然后执行步骤4)；3) The prefetch and writeback module performs prefetch and writeback of the calibrated code block to access variables, first prefetches the operands of the read memory into scalars, and replaces all the operands of the read and write memory in the loop with the corresponding , and at the end of the code block, write the scalar involved in writing the memory back into the memory, and then perform step 4);

4)中间表示转换模块将语法树形式的中间表示数据转为指令列表形式的中间表示数据，节点生成模块将标定代码块中的每一条指令根据源操作数的数目进行分类，对分类后的代码进行相应的计算节点CNode的生成，然后执行步骤5)；4) The intermediate representation conversion module converts the intermediate representation data in the form of syntax tree into the intermediate representation data in the form of instruction list, and the node generation module classifies each instruction in the marked code block according to the number of source operands, and classifies the classified code Carry out the generation of corresponding computing node CNode, then perform step 5);

5)边生成模块根据每一计算节点中的opnd和dst的数据的变量名来分析各个计算节点之间的依赖关系信息，存在从一个变量集合set到另一变量a的运算，就存在从set集合中所有变量到变量a的有向边，从而生成有向边，并形成数据流图。5) The edge generation module analyzes the dependency information between each computing node according to the variable names of the opnd and dst data in each computing node. If there is an operation from a variable set set to another variable a, there is an operation from set Directed edges from all variables in the set to variable a, thereby generating directed edges and forming a data flow graph.

2.根据权利要求1所述的一种高级语言代码中循环数据流图提取优化信息处理方法，其特征在于，所述的步骤2)中的循环展开的次数为由用户指定或根据程序的语义进行自动识别，对于此处未展开的部分进行数据流图的提取时只提取循环体部分。2. a kind of high-level language code according to claim 1 extracts and optimizes the information processing method of circular data flow diagram, it is characterized in that, the number of times of loop expansion in described step 2) is specified by the user or according to the semantics of program Automatic identification is performed, and only the loop body part is extracted when extracting the data flow graph for the unexpanded part here.

3.根据权利要求1所述的一种高级语言代码中循环数据流图提取优化信息处理方法，其特征在于，所述的步骤2)中使用了一个查找表来记录这些访存操作的状态，从而保证替换后的依赖关系和语义不变。3. in a kind of high-level language code according to claim 1, circular data flow diagram extracts optimization information processing method, it is characterized in that, described step 2) has used a look-up table to record the state of these memory access operations, In this way, the dependencies and semantics after replacement are guaranteed to remain unchanged.

4.根据权利要求1所述的一种高级语言代码中循环数据流图提取优化信息处理方法，其特征在于，所述的步骤4)中指令根据源操作数的数目进行分类为单操作数指令、双操作数指令和三操作数指令。4. a kind of high-level language code according to claim 1 extracts and optimizes the information processing method of circular data flow graph, it is characterized in that, in described step 4), instruction is classified into single operand instruction according to the number of source operand , two-operand instructions and three-operand instructions.

5.根据权利要求1所述的一种高级语言代码中循环数据流图提取优化信息处理方法，其特征在于，所述的步骤4)中计算节点CNode的域包含节点编号Num、操作符op、操作数个数opnd_num、操作数opnd和目的操作数dst。5. in a kind of high-level language code according to claim 1, circular data flow diagram extracts optimization information processing method, it is characterized in that, described step 4) in the domain of computing node CNode comprises node numbering Num, operator op, Operand number opnd_num, operand opnd and destination operand dst.