In computer science, a preprocessor (or precompiler)[1] is a program that processes its input data to produce output that is used as input to another program. The output is said to be a preprocessed form of the input data, and it is often consumed by a subsequent program such as a compiler. The amount and kind of processing done depends on the nature of the preprocessor: some preprocessors can only perform relatively simple textual substitutions and macro expansions, while others have the power of full-fledged programming languages.
A common example from computer programming is the processing performed on source code before the next step of compilation. In some computer languages (e.g., C and PL/I), there is a phase of translation known as preprocessing. Preprocessing can include macro processing, file inclusion, and language extensions.
Lexical preprocessors are the lowest level of preprocessors, as they only require lexical analysis; that is, they operate on the source text, prior to any parsing, by substituting tokenized character sequences for other tokenized character sequences according to user-defined rules. They typically perform macro substitution, textual inclusion of other files, and conditional compilation or inclusion.
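The three jobs listed above can be sketched in a few dozen lines of Python. This is a minimal toy, not a model of any real preprocessor: the function `preprocess`, the `files` dictionary standing in for the file system, and the supported directive set (`#include`, parameterless `#define`, and `#ifdef`/`#else`/`#endif`) are assumptions made for illustration. It never parses the text; it only splits non-directive lines into tokens and swaps macro names for their bodies. Unlike the C preprocessor, it neither rescans expanded bodies nor treats string literals specially.

```python
from __future__ import annotations

import re

# A token is an identifier, a run of digits, or any other single
# non-space character; whitespace between tokens is left untouched.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def preprocess(source: str, files: dict[str, str],
               macros: dict[str, str] | None = None) -> str:
    """Toy lexical preprocessor supporting #include, #define and #ifdef.

    Lines starting with '#' are directives; every other line is only
    split into tokens for macro substitution, never parsed.
    """
    macros = {} if macros is None else macros
    output: list[str] = []
    conditions: list[bool] = []  # one entry per currently open #ifdef

    for line in source.splitlines():
        stripped = line.strip()
        enabled = all(conditions)  # False inside any false #ifdef branch
        if stripped.startswith("#"):
            directive, _, rest = stripped[1:].partition(" ")
            rest = rest.strip()
            if directive == "ifdef":
                conditions.append(rest in macros)
            elif directive == "else":
                conditions[-1] = not conditions[-1]
            elif directive == "endif":
                conditions.pop()
            elif enabled and directive == "define":
                name, _, body = rest.partition(" ")
                macros[name] = body.strip()
            elif enabled and directive == "include":
                # Textual inclusion: preprocess the named file with the
                # same macro table and splice its output in place.
                included = preprocess(files[rest.strip('"')], files, macros)
                if included:
                    output.append(included)
            continue
        if enabled:
            # Macro substitution: swap each token that names a macro for
            # its body (bodies are not rescanned, unlike in cpp).
            output.append(TOKEN_RE.sub(
                lambda m: macros.get(m.group(0), m.group(0)), line))
    return "\n".join(output)


if __name__ == "__main__":
    files = {"config.h": '#define GREETING "hello"\n#define DEBUG 1'}
    src = ('#include "config.h"\n#ifdef DEBUG\nlog(GREETING);\n'
           '#else\nquiet();\n#endif\nreturn 0;')
    # Prints two lines: log("hello"); and return 0;
    print(preprocess(src, files))
```

Note that the `#define` inside the included file is visible to the including file, because both share one macro table, just as definitions from a C header remain in effect after the `#include` line.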
The most common example of this is the C preprocessor, which takes lines beginning with '#' as directives. The C preprocessor does not expect its input to use the syntax of the C language. Some languages take a different approach and use built-in language features to achieve similar things. For example, some languages use if-then-else and dead code elimination to achieve conditional compilation.
Other lexical preprocessors include the general-purpose m4, most commonly used in cross-platform build systems such as autoconf, and GEMA, an open-source macro processor that operates on patterns of context.
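Conditional compilation with an ordinary `if` works because a branch whose condition is a known constant can be deleted as dead code. The sketch below performs that step as a source-to-source pass using Python's standard `ast` module. The class `FoldFlags`, the function `specialise`, and the flag names are hypothetical, and only conditions that become a bare constant after flag substitution are folded; a real compiler does this internally during optimization rather than by rewriting source.

```python
import ast


class FoldFlags(ast.NodeTransformer):
    """Substitute known boolean flags and delete if-branches that can never run."""

    def __init__(self, flags: dict[str, bool]):
        self.flags = flags

    def visit_Name(self, node: ast.Name) -> ast.AST:
        # A read of a known flag becomes a literal True/False.
        if isinstance(node.ctx, ast.Load) and node.id in self.flags:
            return ast.copy_location(ast.Constant(value=self.flags[node.id]), node)
        return node

    def visit_If(self, node: ast.If):
        self.generic_visit(node)  # fold the test and both branches first
        if isinstance(node.test, ast.Constant):
            # The condition is known before run time: keep only the live
            # branch. An empty branch becomes `pass` so blocks stay valid.
            live = node.body if node.test.value else node.orelse
            return live or [ast.Pass()]
        return node


def specialise(source: str, flags: dict[str, bool]) -> str:
    """Return `source` with every branch on a known flag resolved."""
    tree = FoldFlags(flags).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

For a module containing `if DEBUG: print('debug')` with an `else: run()` branch, `specialise(src, {"DEBUG": False})` returns just `run()`. The source never contains a preprocessor directive; the "configuration" is an ordinary constant that the pass treats as known.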
Syntactic preprocessors were introduced with the Lisp family of languages. Their role is to transform syntax trees according to a number of user-defined rules. For some programming languages, the rules are written in the same language as the program (compile-time reflection), as is the case with Lisp and OCaml. Other languages rely on a fully external language to define the transformations, such as the XSLT preprocessor for XML, or its statically typed counterpart CDuce.
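A syntactic preprocessor works on the parse tree rather than on tokens. As a sketch of rule-driven tree rewriting, the following uses Python's standard `ast` module, where the rules are ordinary Python functions, loosely analogous to rules written in the host language as in Lisp or OCaml. `RuleRewriter`, `expand`, and the rule that reads `value >> func` as the pipeline `func(value)` are hypothetical illustrations, not part of any existing tool.

```python
import ast
from typing import Callable, Optional

# A rule inspects one node and returns a replacement, or None to leave it alone.
Rule = Callable[[ast.AST], Optional[ast.AST]]


class RuleRewriter(ast.NodeTransformer):
    """Apply user-defined rules to every node, children before parents."""

    def __init__(self, rules: list[Rule]):
        self.rules = rules

    def visit(self, node: ast.AST) -> ast.AST:
        node = self.generic_visit(node)  # rewrite subtrees first (bottom-up)
        for rule in self.rules:
            replacement = rule(node)
            if replacement is not None:
                return ast.copy_location(replacement, node)
        return node


def pipe_rule(node: ast.AST) -> Optional[ast.AST]:
    """Hypothetical rule: read `value >> func` as the pipeline `func(value)`."""
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.RShift):
        return ast.Call(func=node.right, args=[node.left], keywords=[])
    return None


def expand(source: str, rules: list[Rule]) -> str:
    """Parse `source`, rewrite its tree with `rules`, and print it back."""
    tree = RuleRewriter(rules).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

`expand("y = xs >> sorted >> len", [pipe_rule])` yields `y = len(sorted(xs))`. Because the rule matches tree nodes, a `'>>'` inside a string literal is left alone, a distinction a naive textual substitution would not make.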
Syntactic preprocessors are typically used to customize the syntax of a language, extend a language by adding new primitives, or embed a domain-specific programming language (DSL) inside a general-purpose language.
A good example of syntax customization is the existence of two different syntaxes in the Objective Caml programming language.[2] Programs may be written interchangeably using the "normal syntax" or the "revised syntax", and may be pretty-printed with either syntax on demand.
Similarly, a number of programs written in OCaml customize the syntax of the language by adding new operators.
The best examples of language extension through macros are found in the Lisp family of languages. While the languages themselves have simple, dynamically typed functional cores, the standard distributions of Scheme or Common Lisp permit imperative or object-oriented programming, as well as static typing. Almost all of these features are implemented by syntactic preprocessing, although in Lisp the "macro expansion" phase is handled by the compiler itself. This can still be considered a form of preprocessing, since it takes place before the other phases of compilation.
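Macro expansion of this kind can be sketched by representing Lisp code as nested Python lists, with symbols as strings. The two macros below are illustrative: `unless` rewrites into `if` and `not`, and `let` rewrites into an immediately applied `lambda`, a well-known way to define `let` in Scheme. The registry `MACROS`, the `defmacro` decorator, and `macroexpand` are hypothetical names; the sketch has no quoting, hygiene, or evaluator, and it runs as a separate pass, whereas in Lisp the compiler itself performs the expansion.

```python
from typing import Any, Callable, Dict

# S-expressions as Python data: symbols are str, literals are int,
# and compound forms are lists whose first element is the operator.
Expr = Any

MACROS: Dict[str, Callable[..., Expr]] = {}


def defmacro(name: str):
    """Register a Python function as the expander for a macro keyword."""
    def register(expander: Callable[..., Expr]) -> Callable[..., Expr]:
        MACROS[name] = expander
        return expander
    return register


@defmacro("unless")
def _unless(test: Expr, *body: Expr) -> Expr:
    # (unless test body...)  =>  (if (not test) (begin body...))
    return ["if", ["not", test], ["begin", *body]]


@defmacro("let")
def _let(bindings: Expr, *body: Expr) -> Expr:
    # (let ((x 1) (y 2)) body...)  =>  ((lambda (x y) body...) 1 2)
    names = [name for name, _ in bindings]
    values = [value for _, value in bindings]
    return [["lambda", names, *body], *values]


def macroexpand(expr: Expr) -> Expr:
    """Rewrite macro forms until only core forms remain; runs before evaluation."""
    if not isinstance(expr, list) or not expr:
        return expr
    head = expr[0]
    if isinstance(head, str) and head in MACROS:
        # The expansion may itself use macros, so expand the result again.
        return macroexpand(MACROS[head](*expr[1:]))
    return [macroexpand(sub) for sub in expr]
```

Expanding `(let ((x 1)) (unless (zero? x) (display x)))` leaves only `lambda`, `if`, `not`, and `begin`, so a core evaluator never needs to know that `let` or `unless` existed.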
One of the unusual features of the Lisp family of languages is the possibility of using macros to create an internal DSL. Typically, in a large Lisp-based project, a module may be written in a variety of such minilanguages: one perhaps using an SQL-based dialect of Lisp, another written in a dialect specialized for GUIs or pretty-printing, and so on. Common Lisp's standard library contains an example of this level of syntactic abstraction in the form of the LOOP macro, which implements an Algol-like minilanguage to describe complex iteration while still enabling the use of standard Lisp operators.
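A keyword-driven minilanguage that expands into ordinary code can be sketched with the same idea of Lisp code as nested Python lists. The `expand_loop` function below accepts a small, hypothetical subset of LOOP-like clauses (`for VAR in SEQ`, `when TEST`, `collect EXPR`) and rewrites them into nested `map` and `filter` calls. The real Common Lisp LOOP macro supports far more clauses and expands into different, implementation-specific iteration code. The `when` and `collect` arguments are ordinary Lisp expressions, mirroring how LOOP still lets standard operators appear inside its clauses.

```python
from typing import Any, List

Expr = Any  # symbols are str, compound forms are Python lists


def expand_loop(form: List[Expr]) -> Expr:
    """Expand a tiny LOOP-style form into map/filter calls.

    Supported clauses (a hypothetical subset of Common Lisp's LOOP):
      for VAR in SEQ : required; iterate over SEQ
      when TEST      : optional; keep only elements where TEST holds
      collect EXPR   : required; the value gathered for each element
    """
    if not form or form[0] != "loop":
        raise ValueError("not a loop form")
    clauses = form[1:]
    var = seq = test = result = None
    i = 0
    while i < len(clauses):
        keyword, args = clauses[i], clauses[i + 1:]
        if keyword == "for" and len(args) >= 3 and args[1] == "in":
            var, seq = args[0], args[2]
            i += 4
        elif keyword == "when" and args:
            test = args[0]
            i += 2
        elif keyword == "collect" and args:
            result = args[0]
            i += 2
        else:
            raise SyntaxError(f"malformed loop clause at {keyword!r}")
    if var is None or result is None:
        raise SyntaxError("loop needs 'for VAR in SEQ' and 'collect EXPR'")
    if test is not None:
        # Filtering happens before collecting, as the clause order suggests.
        seq = ["filter", ["lambda", [var], test], seq]
    return ["map", ["lambda", [var], result], seq]
```

For example, `(loop for x in xs when (odd? x) collect (* x x))` expands to `(map (lambda (x) (* x x)) (filter (lambda (x) (odd? x)) xs))`.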
The MetaOCaml preprocessor/language provides similar features for external DSLs. This preprocessor takes the description of the semantics of a language (i.e., an interpreter) and, by combining compile-time interpretation and code generation, turns that definition into a compiler to the OCaml programming language; the resulting OCaml code can then be compiled to either bytecode or native code.
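The idea of turning an interpreter into a compiler can be illustrated, very loosely, for a tiny arithmetic language. `interpret` walks the expression tree on every call; `generate` follows the same case structure but emits Python source instead of computing a value, and `compile_expr` hands that source to Python's built-in `compile` once. This is an untyped, string-based stand-in chosen for brevity: MetaOCaml itself uses typed code quotations rather than strings and `eval`, and all names here are hypothetical. Code like this should only ever `eval` trusted input.

```python
from typing import Callable, Union

# Tiny language: int literals, variable names (str), or (op, lhs, rhs)
# tuples where op is "+" or "*".
Expr = Union[int, str, tuple]


def interpret(expr: Expr, env: dict[str, int]) -> int:
    """Direct interpreter: walks the tree every time it is run."""
    if isinstance(expr, int):
        return expr
    if isinstance(expr, str):
        return env[expr]
    op, lhs, rhs = expr
    a, b = interpret(lhs, env), interpret(rhs, env)
    if op == "+":
        return a + b
    if op == "*":
        return a * b
    raise ValueError(f"unknown operator {op!r}")


def generate(expr: Expr) -> str:
    """Same cases as `interpret`, but emits Python code instead of a value."""
    if isinstance(expr, int):
        return repr(expr)
    if isinstance(expr, str):
        return expr  # assumed to be a valid Python identifier
    op, lhs, rhs = expr
    if op not in ("+", "*"):
        raise ValueError(f"unknown operator {op!r}")
    return f"({generate(lhs)} {op} {generate(rhs)})"


def compile_expr(expr: Expr, params: list[str]) -> Callable[..., int]:
    """Generate source once, then let Python compile it to bytecode."""
    source = f"lambda {', '.join(params)}: {generate(expr)}"
    return eval(compile(source, "<generated>", "eval"))
```

With `e = ("+", ("*", "x", "x"), 1)`, `generate(e)` returns `((x * x) + 1)`, and `compile_expr(e, ["x"])(3)` agrees with `interpret(e, {"x": 3})`: both give 10, but the compiled version no longer inspects the tree when it runs.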
Most preprocessors are specific to a particular data processing task (e.g., compiling the C language). A preprocessor may be promoted as being general-purpose, meaning that it is not aimed at a specific usage or programming language and is intended to be used for a wide variety of text processing tasks.
M4 is probably the best-known example of such a general-purpose preprocessor, although the C preprocessor is sometimes used in roles unrelated to C. Examples: