Background technology
The reconfigureable computing array processor is the hardware platform to the software-hardware synergism design; Its framework is a kind of hardware structure between traditional general processor and special IC; Taken into account the dirigibility and the high efficiency of system, it is formed and generally comprises general processor and reconfigureable computing array.The performance history of using on the reconfigureable computing array processor comprises application is divided into various tasks; And with the computation-intensive duty mapping to the computing unit array to improve its computing velocity and performance, be mapped as the advantage of software and will control intensive task with the performance general processor.
The automatic technique of compiling of reconfigureable computing array processor can effectively lower the difficulty of application development, uses the processor active task of DFD (Data Flow Graph, be called for short DFG) in representing to use in this technology.Through DFD, program compiler can be analyzed the dependence and the concurrency of computing automatically, thereby helps the mapping of task on restructural computing array.Owing to do not comprise the memory access function in the reconfigureable computing array, should not comprise the data of memory access in the DFD that can directly on this array, shine upon.The shared resource maximum of round-robin operation in application program; Data-intensive computing often all occurs with the round-robin form; The execution speed of cyclic part affects the travelling speed of total system; And cyclic part is the part of most possibly excavating out concurrency in the application program; Circulation is mapped to the computing velocity that can effectively improve the reconfigureable computing array processor on the reconfigurable arrays efficiently, and therefore effectively extracting the round-robin DFD has important effect to whole reconfigureable computing array processor compiling system.
Literature search through to prior art is found; People such as Chongyong Yin have discussed towards the generation method of reconfigurable arrays processor data flow graph in " Compiler framework for reconfigurable computing system " (compiler framework of restructurable computing system) delivered on " International Conference on Communications, Circuits and Systems " in 2009.The method that proposes in this article is the CFG (control flow graph) that at first generates the whole procedure code, then analyzes each fundamental block among the CFG, and generates the DFG of each fundamental block.But this method is not distinguished computation-intensive code and the intensive code of control effectively mainly towards whole procedure rather than towards circulation; And this method is to extract DFD in the fundamental block in the control flow graph, makes that like this DFD small scale quantity that extracts is many, is unfavorable for the division and the mapping optimization of automatic compiling system rear end.
Summary of the invention
The object of the invention is exactly for the defective that overcomes above-mentioned prior art existence loop-around data flow graph extraction optimization information disposal route in a kind of higher-level language code to be provided.
The object of the invention can be realized through following technical scheme:
The loop-around data flow graph extracts the optimization information disposal route in a kind of higher-level language code; It is characterized in that, analyze and launch the cyclic part in the high-level language programs automatically, accessing operation is wherein looked ahead and write back processing; Be different classes of node then with instruction fetch wherein; Generate directed edge through analyzing these nodes, and form DFD, concrete steps are following:
1) the code text scan module carries out morphology to whole higher-level language code and syntactic information is analyzed, and the intermediate representation data of the tree-like formula of generative grammar, and execution in step 2 then);
2) the loop unrolling module with the intermediate representation data of syntax tree form as input; Obtain wherein code structure information; And, at first confirm the round-robin number of times according to round-robin starting condition and termination condition for each circulation, the number of times of then setting according to round-robin number of times or user carries out round-robin to launch; And demarcate the code block after launching, execution in step 3 then);
3) look ahead and write back module the code block of demarcating is carried out looking ahead and writing back of memory access variable; At first the operand with memory read is prefetched in the scalar; The operand of all read-write memories in the circulation is replaced with corresponding scalar; And write back in the storer at the scalar that the end of code block will be referred to memory write, execution in step 4 then);
4) the intermediate representation modular converter transfers the intermediate representation data of syntax tree form to the intermediate representation data of instruction list form; Each bar instruction that the node generation module will be demarcated in the code block is classified according to the number of source operand; Sorted code is carried out the generation of corresponding calculated node CNode, and execution in step 5 then);
5) the limit generation module is analyzed the dependency information between each computing node according to the variable name of the data of opnd in each computing node and dst; The computing of existence from a variable set set to another variable a; Just exist from the set set all variablees to the directed edge of variable a; Thereby the generation directed edge, and form DFD.
The number of times of the loop unrolling described step 2) only extracts the loop body part for being specified by the user or discerning automatically according to the semanteme of program when carrying out the extraction of DFD for the part of not launching here.
Described step 2) used a look-up table to write down the state of these accessing operations in, thereby guaranteed that the dependence after the replacement is constant with semanteme.
Instruction is categorized as single-operand instruction, double operand instruction and 3-operand instruction according to the number of source operand in the described step 4).
The territory of computing node CNode comprises node serial number Num, operational character op, operand number opnd_num, operand opnd and destination operand dst in the described step 4).
Compared with prior art; The present invention has realized effective extraction of round-robin DFD in the higher-level language code; The invention has the beneficial effects as follows that the loop unrolling module can be according to the scale of round-robin scale automatic expansion DFD; Look ahead and write back module and can suffer accessing operation by active set, the access delay that effectively shortens the stratification storage system is more than 30%.
Embodiment
Below in conjunction with specific embodiment the present invention is elaborated.
Embodiment
DFD method for distilling of the present invention can be used for improving the extraction of the for round-robin DFD of C programmer MPEG2, mainly may further comprise the steps:
1) the code text scan module carries out morphology and grammatical analysis to whole higher-level language code, and the intermediate representation of the tree-like formula of generative grammar, and the intermediate representation of syntax tree form has kept the structures of higher-level language code, is easy to the analysis of compiling system.The code text scan module be general senior language compiler system compile and optimize before necessary module.
2) the loop unrolling module with the intermediate representation of syntax tree form as input; Obtain wherein code structure information; And, at first confirm the round-robin times N according to round-robin starting condition Cs and termination condition Ce for each circulation, code structure information comprises process, code block and statement.For the for circulation of C language, the computing formula of cycle index N:
N=(U-L)/step, wherein U is the circulation upper bound, and L is the circulation lower bound, and step is the circulation step-length.
Operation below then carrying out for each circulation:
1. round-robin launches.Then decide the number of times M of expansion, be defaulted as whole expansion, i.e. M=N according to user's input; According to launching number of times M circulation is launched, and the variable of circular correlation is replaced with the irrelevant variable of circulation, to keep the semanteme of program;
2. for the part of not launching (N-M), it is remained the loop structure of original program, and cycle index is set to N-M;
3. the code block after will launching is labeled as B, if can't confirm the round-robin number of times here, then this part is not carried out the round-robin expansion, and loop body is partly demarcated.
3) look ahead and write back module the code block of demarcating is carried out looking ahead and writing back of memory access variable; At first the operand with memory read is prefetched in the scalar; The operand of all read-write memories in the circulation is replaced with corresponding scalar, and write back in the storer at the scalar that the end of code block will be referred to memory write.Need use lookup table technology to come the dependence of record code here; Dependence with after the maintenance replacement is constant with semanteme; So both supported general reconfigurable arrays not have this characteristic of memory access function; Also active set has suffered accessing operation, effectively shortens the access delay of stratification storage system.A newly-built look-up table LUT is used for the state of all memory access variablees of record code piece B.For each accessing operation m among the B, operation below carrying out:
If 1. m is in LUT, then it is replaced with the scalar of respective items among the LUT, if do not exist, then m is replaced with a new scalar, and together be inserted in LUT with this new scalar m;
If 2. m is a read memory operation, then the read-write state of respective items in the LUT table is recorded as and reads; If m is the memory write operation, then the read-write state of respective items in the LUT table is recorded as and writes;
3. each in LUT being shown is carried out corresponding looking ahead and writing back according to its read-write state, and the state that occur promptly a certain the first time is then looked ahead for reading, as long as there is a kind of state need write back for writing then.
4) the intermediate representation modular converter transfers the intermediate representation of syntax tree form the intermediate representation of instruction list form to, and the intermediate representation of instruction list form is simple in structure, is easy to the dependence of routine analyzer.Each bar instruction that the node generation module will be demarcated in the code block is classified according to the number of source operand, is divided into (like absolute value operation, inversion operation etc.); Two operand instruction (like arithmetical operation, logical operation etc.); 3-operand instruction (like three order operations etc.).Sorted code is carried out the generation of corresponding calculated node CNode.The territory of computing node CNode comprises node serial number Num, operational character op, operand number opnd_num, operand opnd, destination operand dst.Node serial number Num is the unique identification code that generates at random, and the source operand in the computations is as the opnd of computing node, and destination operand is as the dst of computing node, and operand is as the operational character op of computing node, and opnd_num is the type of instruction.
5) the limit generation module is analyzed the dependence between each computing node according to the variable name of the data of opnd in each computing node and dst; The computing of existence from variable set set to another variable a just exists from the set set all variablees to the directed edge of variable a.The limit generation module can calculate the complete input and output information of each node, and these information are backfilling in each node, will not have the vertex ticks of forerunner's node to be input node INode, and the vertex ticks that will not have descendant node is output node ONode.At this moment the round-robin DFD generates.
Use the result of implementation of present embodiment to show; Use the loop unrolling module can make the number of DFD increase by 2.7 times; Average individual data flow graph scale increases by 34%; Looking ahead and writing back the module active set has suffered accessing operation, the access delay 46% of eliminating duplication storage system under the constant situation of other conditions.
In sum; The present invention has realized the method for distilling of round-robin DFD in the higher-level language code; Can effectively analyze and extract the DFD in the circulation; The invention has the beneficial effects as follows can be according to the scale of round-robin scale automatic expansion DFD, and can suffer accessing operation by active set, than the access delay of eliminating duplication of prior art storage system.