An intermediate representation (IR) is the data structure or code used internally by a compiler or virtual machine to represent source code. An IR is designed to be conducive to further processing, such as optimization and translation.[1] A "good" IR must be accurate – capable of representing the source code without loss of information[2] – and independent of any particular source or target language.[1] An IR may take one of several forms: an in-memory data structure, or a special tuple- or stack-based code readable by the program.[3] In the latter case it is also called an intermediate language.
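As an illustration of the two forms mentioned above, the following minimal sketch holds one expression first as an in-memory data structure and then as stack-based code. The names `lower_to_stack` and `run`, the instruction names `push`, `add` and `mul`, and the tuple layout are all hypothetical choices made for this example and are not taken from any particular compiler.

```python
from __future__ import annotations

from typing import Union

# Hypothetical expression tree: an int, or a tuple (operator, left, right).
Expr = Union[int, tuple]


def lower_to_stack(expr: Expr) -> list[tuple]:
    """Lower a nested expression into postfix (stack-based) instructions."""
    if isinstance(expr, int):
        return [("push", expr)]
    op, left, right = expr
    # Operands are emitted first, then the operator that consumes them.
    return lower_to_stack(left) + lower_to_stack(right) + [(op,)]


def run(code: list[tuple]) -> int:
    """Execute the stack code with a minimal interpreter."""
    stack: list[int] = []
    for instr in code:
        if instr[0] == "push":
            stack.append(instr[1])
            continue
        right = stack.pop()
        left = stack.pop()
        if instr[0] == "add":
            stack.append(left + right)
        elif instr[0] == "mul":
            stack.append(left * right)
        else:
            raise ValueError(f"unknown instruction: {instr[0]}")
    return stack.pop()


tree = ("add", 2, ("mul", 3, 4))  # in-memory data structure form
code = lower_to_stack(tree)       # stack-based code form
result = run(code)
```

Here `code` is a flat list of instructions that a program can read and execute one at a time, whereas `tree` keeps the nesting of the original expression.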
A canonical example is found in most modern compilers. For example, the CPython interpreter transforms the linear human-readable text representing a program into an intermediate graph structure that allows flow analysis and re-arrangement before execution. Use of an intermediate representation such as this allows compiler systems like the GNU Compiler Collection and LLVM to be used by many different source languages to generate code for many different target architectures.
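One intermediate form that CPython exposes to user code is the abstract syntax tree, available through the standard library `ast` module. The sketch below parses a single statement, inspects the resulting structure, and then hands the same tree to the built-in `compile` function. The variable names in the sample statement are arbitrary, and the example shows only the tree stage, not any later stages of the interpreter.

```python
import ast

source = "total = price * quantity + tax"
tree = ast.parse(source)  # linear text -> structured intermediate form

assign = tree.body[0]
statement_kind = type(assign).__name__      # "Assign"
top_operator = type(assign.value.op).__name__  # "Add": the multiplication nests below it

# Collect every identifier by walking the tree rather than scanning the text.
names = sorted({n.id for n in ast.walk(tree) if isinstance(n, ast.Name)})

# The tree itself, not the original text, is compiled and executed.
namespace = {"price": 2, "quantity": 3, "tax": 1}
exec(compile(tree, "<example>", "exec"), namespace)
total = namespace["total"]
```

Because the tree records operator precedence explicitly, a tool can analyze or rewrite the program without re-parsing the source text.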
An intermediate language is the language of an abstract machine designed to aid in the analysis of computer programs. The term comes from their use in compilers, where the source code of a program is translated into a form more suitable for code-improving transformations before being used to generate object or machine code for a target machine. The design of an intermediate language typically differs from that of a practical machine language in three fundamental ways:
A popular format for intermediate languages is three-address code.
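In three-address code, each instruction names at most one operator, two operands, and one result. As a rough sketch of the idea, the function below flattens an arithmetic expression into such instructions, borrowing Python's own `ast` module as the parser. The function name `to_three_address` and the temporary names `t1`, `t2`, and so on are assumptions of this example, and only the four basic arithmetic operators are handled.

```python
from __future__ import annotations

import ast
from itertools import count

# Supported binary operators and their textual form in the output.
OPS = {ast.Add: "+", ast.Sub: "-", ast.Mult: "*", ast.Div: "/"}


def to_three_address(source: str) -> list[str]:
    """Translate one arithmetic expression into three-address code."""
    temps = count(1)
    code: list[str] = []

    def visit(node: ast.expr) -> str:
        """Emit code for node and return the name holding its value."""
        if isinstance(node, ast.Name):
            return node.id
        if isinstance(node, ast.Constant):
            return repr(node.value)
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            left = visit(node.left)
            right = visit(node.right)
            temp = f"t{next(temps)}"  # fresh temporary for this result
            code.append(f"{temp} = {left} {OPS[type(node.op)]} {right}")
            return temp
        raise NotImplementedError(type(node).__name__)

    visit(ast.parse(source, mode="eval").body)
    return code


tac = to_three_address("a + b * c - d")
# tac == ["t1 = b * c", "t2 = a + t1", "t3 = t2 - d"]
```

Each nested subexpression receives its own temporary, so every emitted line can be analyzed or rearranged independently of the others.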
The term is also used to refer to languages used as intermediates by some high-level programming languages which do not output object or machine code themselves, but output the intermediate language only. This intermediate language is then submitted to a compiler for that language, which outputs finished object or machine code. This is usually done to ease the process of optimization or to increase portability by using an intermediate language that has compilers for many processors and operating systems, such as C. Languages used for this purpose fall in complexity between high-level languages and low-level languages, such as assembly languages.
Though not explicitly designed as an intermediate language, C's nature as an abstraction of assembly and its ubiquity as the de facto system language in Unix-like and other operating systems have made it a popular intermediate language: Eiffel, Sather, Esterel, some dialects of Lisp (Lush, Gambit), Squeak's Smalltalk-subset Slang, Nim, Cython, SystemTap, Vala, V, and others make use of C as an intermediate language. Variants of C have been designed to provide C's features as a portable assembly language, including C-- and the C Intermediate Language.
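The approach of using C as an intermediate language can be sketched with a toy translator that emits C source text and leaves code generation to whatever C compiler is available on the target. The tuple-based input format and the names `expr_to_c` and `emit_c_function` are hypothetical. The sketch does not reflect how any of the languages listed above implement their C back ends, and it assumes every value fits in a C `long`.

```python
def expr_to_c(expr) -> str:
    """Render a hypothetical expression tree as a C expression."""
    if isinstance(expr, (int, str)):
        return str(expr)  # literal or variable name
    op, left, right = expr
    # Parenthesize every operation so C precedence cannot change the meaning.
    return f"({expr_to_c(left)} {op} {expr_to_c(right)})"


def emit_c_function(name: str, params: list, body) -> str:
    """Emit a complete C function definition as source text."""
    args = ", ".join(f"long {p}" for p in params) or "void"
    return f"long {name}({args}) {{\n    return {expr_to_c(body)};\n}}\n"


c_source = emit_c_function(
    "area_plus_one", ["w", "h"], ("+", ("*", "w", "h"), 1)
)
print(c_source)
```

The emitted text is ordinary C, so the translator itself needs no knowledge of the target processor or operating system.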
Any language targeting a virtual machine or p-code machine can be considered an intermediate language:
The GNU Compiler Collection (GCC) uses several intermediate languages internally to simplify portability and cross-compilation. Among these languages are
GCC supports generating these IRs as a final target:
The LLVM compiler framework is based on the LLVM IR intermediate language, of which the compact, binary serialized representation is also referred to as "bitcode" and has been productized by Apple.[4][5] Like GIMPLE Bytecode, LLVM Bitcode is useful in link-time optimization. Like GCC, LLVM also targets some IRs meant for direct distribution, including Google's PNaCl IR and SPIR. A further development within LLVM is the use of Multi-Level Intermediate Representation (MLIR) with the potential to generate code for different heterogeneous targets, and to combine the outputs of different compilers.[6]
The ILOC intermediate language[7] is used in classes on compiler design as a simple target language.[8]
Static analysis tools often use an intermediate representation. For instance, Radare2 is a toolbox for binary file analysis and reverse engineering. It uses the intermediate languages ESIL[9] and REIL[10] to analyze binary files.