In computing, code generation is part of the process chain of a compiler, in which an intermediate representation of source code is converted into a form (e.g., machine code) that can be readily executed by the target system.
Sophisticated compilers typically perform multiple passes over various intermediate forms. This multi-stage process is used because many algorithms for code optimization are easier to apply one at a time, or because the input to one optimization relies on the completed processing performed by another optimization. This organization also facilitates the creation of a single compiler that can target multiple architectures, as only the last of the code generation stages (the backend) needs to change from target to target. (For more information on compiler design, see Compiler.)
The input to the code generator typically consists of a parse tree or an abstract syntax tree.[1] The tree is converted into a linear sequence of instructions, usually in an intermediate language such as three-address code. Further stages of compilation may or may not be referred to as "code generation", depending on whether they involve a significant change in the representation of the program. (For example, a peephole optimization pass would not likely be called "code generation", although a code generator might incorporate a peephole optimization pass.)
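The flattening of a tree into three-address code can be sketched as follows; this is a minimal illustration, and the class names, the `t1`, `t2`, … temporary-naming scheme, and the `:=` syntax are assumptions for the example rather than any particular compiler's conventions:

```python
# A minimal sketch of flattening an expression AST into three-address code.
from dataclasses import dataclass
import itertools

@dataclass
class Num:
    value: int

@dataclass
class BinOp:
    op: str        # e.g. "+", "*"
    left: object
    right: object

def to_three_address(node, code, fresh):
    """Flatten an expression tree into `code`; return the name holding its value."""
    if isinstance(node, Num):
        return str(node.value)
    left = to_three_address(node.left, code, fresh)
    right = to_three_address(node.right, code, fresh)
    temp = f"t{next(fresh)}"                       # allocate a fresh temporary
    code.append(f"{temp} := {left} {node.op} {right}")
    return temp

code = []
result = to_three_address(
    BinOp("*", BinOp("+", Num(2), Num(3)), Num(4)), code, itertools.count(1))
# code is now ["t1 := 2 + 3", "t2 := t1 * 4"] and result is "t2"
```

Each interior node of the tree becomes exactly one instruction whose operands are either leaves or temporaries produced by earlier instructions, which is what makes the output a linear sequence.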
In addition to the basic conversion from an intermediate representation into a linear sequence of machine instructions, a typical code generator tries to optimize the generated code in some way.
Tasks which are typically part of a sophisticated compiler's "code generation" phase include instruction selection, instruction scheduling, and register allocation.
Instruction selection is typically carried out by doing a recursive postorder traversal on the abstract syntax tree, matching particular tree configurations against templates; for example, the tree W := ADD(X, MUL(Y, Z)) might be transformed into a linear sequence of instructions by recursively generating the sequences for t1 := X and t2 := MUL(Y, Z), and then emitting the instruction ADD W, t1, t2.
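The postorder template-matching scheme described above can be sketched as follows; the tuple encoding of the tree and the `OP dest, src1, src2` instruction template are assumptions for illustration, not a real instruction set:

```python
# A toy sketch of instruction selection by recursive postorder traversal:
# operands are reduced to registers/temporaries first, then a single
# instruction is emitted for the node itself.
import itertools

def select(node, emit, fresh):
    """Return the name holding the node's value, emitting instructions postorder."""
    if isinstance(node, str):          # leaf: a variable such as X
        return node
    op, left, right = node             # interior node: (op, left, right)
    l = select(left, emit, fresh)      # visit children first (postorder)
    r = select(right, emit, fresh)
    t = f"t{next(fresh)}"
    emit(f"{op} {t}, {l}, {r}")        # template: OP dest, src1, src2
    return t

code = []
# The tree ADD(X, MUL(Y, Z)) from the text:
result = select(("ADD", "X", ("MUL", "Y", "Z")), code.append, itertools.count(1))
# code is now ["MUL t1, Y, Z", "ADD t2, X, t1"]
```

Because the traversal is postorder, the multiplication is emitted before the addition that consumes its result, mirroring the ordering in the example in the text.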
In a compiler that uses an intermediate language, there may be two instruction selection stages: one to convert the parse tree into intermediate code, and a second phase much later to convert the intermediate code into instructions from the instruction set of the target machine. This second phase does not require a tree traversal; it can be done linearly, and typically involves a simple replacement of intermediate-language operations with their corresponding opcodes. However, if the compiler is actually a language translator (for example, one that converts Java to C++), then the second code-generation phase may involve building a tree from the linear intermediate code.
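The second, linear stage can be sketched as a simple lookup-driven rewrite; the opcode table below is entirely hypothetical, standing in for whatever the target machine actually provides:

```python
# A sketch of the linear second selection stage: each intermediate-language
# operation is replaced by a (hypothetical) target opcode via table lookup,
# with no tree traversal required.

OPCODES = {"ADD": "add", "MUL": "mul", "LOAD": "ldr", "STORE": "str"}  # assumed mapping

def lower(intermediate):
    """Replace the leading IL operation of each instruction with its opcode."""
    out = []
    for instr in intermediate:
        op, _, rest = instr.partition(" ")
        out.append(f"{OPCODES[op]} {rest}")
    return out

# lower(["MUL t1, Y, Z", "ADD t2, X, t1"]) yields ["mul t1, Y, Z", "add t2, X, t1"]
```

A single left-to-right pass suffices here precisely because the intermediate code is already linear; this is the contrast the text draws with the tree-building case.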
When code generation occurs at runtime, as in just-in-time compilation (JIT), it is important that the entire process be efficient with respect to space and time. For example, when regular expressions are interpreted and used to generate code at runtime, a non-deterministic finite-state machine is often generated instead of a deterministic one, because usually the former can be created more quickly and occupies less memory than the latter. Although it generally generates less efficient code, JIT code generation can take advantage of profiling information that is available only at runtime.
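The space/time trade-off can be seen in how a non-deterministic machine is run: instead of determinizing up front, the simulator tracks a set of current states. The sketch below hand-builds an NFA roughly corresponding to the pattern a+ (one or more a's), expressed nondeterministically; the state numbering and transition encoding are assumptions for the example:

```python
# A minimal sketch of simulating a non-deterministic finite-state machine
# by tracking a set of current states, avoiding up-front determinization.

NFA = {(0, "a"): {0, 1}}   # nondeterministic: on 'a', either keep looping or finish
ACCEPT = {1}

def matches(nfa, accept, text):
    """Run the NFA on `text`, following all possible transitions at once."""
    states = {0}
    for ch in text:
        states = set().union(*(nfa.get((s, ch), set()) for s in states))
        if not states:                  # no live states: reject early
            return False
    return bool(states & accept)

# matches(NFA, ACCEPT, "aaa") is True; matches(NFA, ACCEPT, "") and
# matches(NFA, ACCEPT, "ab") are both False.
```

Constructing such a machine is a near-direct transcription of the pattern, which is why it is cheap to build at runtime; a deterministic equivalent would be faster to run but can be much larger and slower to construct.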
The fundamental task of taking input in one language and producing output in a non-trivially different language can be understood in terms of the core transformational operations of formal language theory. Consequently, some techniques that were originally developed for use in compilers have come to be employed in other ways as well. For example, YACC (Yet Another Compiler-Compiler) takes input in Backus–Naur form and converts it to a parser in C. Though it was originally created for automatic generation of a parser for a compiler, yacc is also often used to automate writing code that needs to be modified each time specifications are changed.[3]
Many integrated development environments (IDEs) support some form of automatic source-code generation, often using algorithms in common with compiler code generators, although commonly less complicated. (See also: Program transformation, Data transformation.)
In general, a syntax and semantic analyzer tries to retrieve the structure of the program from the source code, while a code generator uses this structural information (e.g., data types) to produce code. In other words, the former adds information while the latter loses some of the information. One consequence of this information loss is that reflection becomes difficult or even impossible. To counter this problem, code generators often embed syntactic and semantic information in addition to the code necessary for execution.
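One way to embed such information is sketched below: a generator emits executable code together with a metadata record carrying the semantic facts (here, a field name and declared type, both assumptions for the example) that the raw generated code alone would lose:

```python
# A sketch of a code generator that attaches semantic metadata to its
# output so the structural information remains available for reflection.

def generate_accessor(field_name, field_type):
    """Generate a getter function plus the semantic metadata it would otherwise lose."""
    source = f"def get_{field_name}(obj):\n    return obj['{field_name}']"
    namespace = {}
    exec(source, namespace)                 # compile and run the generated source
    fn = namespace[f"get_{field_name}"]
    # Embed the semantic information alongside the executable code:
    fn.__generated_meta__ = {"field": field_name, "type": field_type}
    return fn

getter = generate_accessor("age", "int")
# getter({"age": 41}) returns 41, and getter.__generated_meta__ records
# the field name and declared type for later reflection.
```

Without the attached record, a tool inspecting `getter` could see only the executable body; with it, the declared type survives generation, which is the counter-measure the text describes.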