The C preprocessor (CPP) is a text file processor that is used with C, C++ and other programming tools. The preprocessor provides for file inclusion (often header files), macro expansion, conditional compilation, and line control. Although named in association with C and used with C, the preprocessor capabilities are not inherently tied to the C language. It can be and is used to process other kinds of files.[1]
C, C++, and Objective-C compilers provide a preprocessor capability, as it is required by the definition of each language. Some compilers provide extensions to, and deviations from, the target language standard, and some provide options to control standards compliance. For instance, the GNU C preprocessor can be made more standards compliant by supplying certain command-line flags.[2]
The C# programming language also allows for directives, though they are not read by a preprocessor, cannot be used to create macros, and are intended mainly for features such as conditional compilation.[3] C# seldom requires the use of directives; for example, code inclusion does not require a preprocessor at all, since C# relies on a package/namespace system like Java's, so no code needs to be "included". F# and Visual J# support similar directives.
The Haskell programming language also allows the use of the C preprocessor.
Features of the preprocessor are encoded in source code as directives that start with #.
Although C++ source files are often named with a .cpp extension, that is an abbreviation for "C plus plus", not "C preprocessor".
The set of accepted directives differs between languages, as follows.
In C and C++, the following tokens are recognised by the preprocessor in the context of preprocessor directives:

- Directives: #if, #elif, #else, #endif, #ifdef, #ifndef, #elifdef, #elifndef, #define, #undef, #include, #embed, #line, #error, #warning, #pragma
- defined (follows a conditional directive; not actually a directive, but rather an operator)
- Operators: __has_include, __has_cpp_attribute, __has_c_attribute, __has_embed

Until C++26, the C++ keywords import, export, and module were partially handled by the preprocessor as well.
The Haskell programming language also accepts C preprocessor directives; support is enabled by writing {-# LANGUAGE CPP #-} at the top of the file. The accepted preprocessor directives align with those in standard C/C++.
Although C#, F#[4] and Visual J# do not have a separate preprocessor, these directives are processed as if there were one:
#nullable, #if, #elif, #else, #endif, #define, #undef, #region, #endregion, #error, #warning, #line, #pragma

C# does not use a preprocessor to handle these directives; they are read directly by the C# compiler as a feature of the language.
In Objective-C, the following tokens are recognised by the preprocessor in the context of preprocessor directives:
#if, #elif, #else, #endif, #ifdef, #ifndef, #define, #undef, #include, #import, #error, #pragma, defined

The preprocessor was introduced to C around 1973 at the urging of Alan Snyder and also in recognition of the usefulness of the file inclusion mechanisms available in BCPL and PL/I. The first version offered file inclusion via #include and parameterless string replacement macros via #define. It was extended shortly after, first by Mike Lesk and then by John Reiser, to add arguments to macros and to support conditional compilation.[5]
The C preprocessor was part of a long macro-language tradition at Bell Labs, which was started by Douglas Eastwood and Douglas McIlroy in 1959.[6]
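Preprocessing is defined by the first four (of eight) phases of translation specified in the C Standard:

1. Trigraph replacement: the preprocessor replaces trigraph sequences with the characters they represent (trigraphs were removed in C23).
2. Line splicing: physical source lines that end with an escaped newline (a backslash immediately before the line break) are spliced together to form logical lines.
3. Tokenization: the result is broken into preprocessing tokens and whitespace, and comments are replaced by whitespace.
4. Macro expansion and directive handling: directive lines, including file inclusion and conditional compilation, are executed; at the same time, macros are expanded and, since C99, _Pragma operators are handled.

There are two directives in the C preprocessor for including contents of files: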
- #include, used for directly including the contents of a file in place (typically containing code of some kind)
- #embed, used for directly including or embedding the contents of a binary resource in place

To include the content of one file into another, the preprocessor replaces a line that starts with #include with the content of the file specified after the directive. The inclusion may be logical, in the sense that the resulting content need not be stored on disk and is never written back to the source file. The included file need not contain any sort of code, since the directive copies the contents of whatever file is named, but the most typical use of #include is to include a header file (or, more rarely, a source file).
In the following example code, the preprocessor replaces the line #include <stdio.h> with the content of the standard library header file named 'stdio.h', in which the function printf() and other symbols are declared.

```c
#include <stdio.h>

int main(void)
{
    printf("Hello, World!\n");
    return 0;
}
```
In this case, the file name is enclosed in angle brackets to denote that it is a system file. For a file in the codebase being built, double quotes are used instead. The preprocessor may use a different search algorithm to find the file based on this distinction.
For C, a header file is usually named with a .h extension. In C++, conventions vary; common extensions are .h and .hpp. However, the preprocessor includes a file regardless of its extension, and code sometimes even includes .c or .cpp files.
To prevent including the same file multiple times, which often leads to a compiler error, a header file typically contains an #include guard or, if supported by the preprocessor, #pragma once.
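As a minimal sketch of an include guard, assume a hypothetical header point.h whose contents are wrapped in #ifndef POINT_H, #define POINT_H and #endif. Because a single listing cannot show two files, the guarded header body is pasted twice below to mimic the header being included twice; point_sum is likewise a hypothetical helper.

```c
/* First copy of the hypothetical point.h: POINT_H is not yet defined,
   so the body is kept and POINT_H becomes defined. */
#ifndef POINT_H
#define POINT_H
struct point {
    int x;
    int y;
};
#endif

/* Second copy (a repeated inclusion): POINT_H is already defined, so the
   body is skipped. Without the guard, struct point would be defined
   twice, which is a compile error. */
#ifndef POINT_H
#define POINT_H
struct point {
    int x;
    int y;
};
#endif

/* Hypothetical helper showing the type is usable exactly once-defined. */
int point_sum(struct point p)
{
    return p.x + p.y;
}
```

With #pragma once, the three guard lines would be replaced by a single #pragma once at the top of point.h. The guard macro name (POINT_H here) is only a convention; it just needs to be unique among the project's headers.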
C23 and C++26 introduce the #embed directive for binary resource inclusion, which allows the content of a binary file to be included into a source file even if it is not valid C code.[7][8] This allows binary resources (like images) to be included in a program without requiring processing by external tools like xxd -i and without the use of string literals, which have a length limit on MSVC. Similarly to xxd -i, the directive is replaced by a comma-separated list of integers corresponding to the data of the specified resource. More precisely, if an array of type unsigned char is initialized using an #embed directive, the result is the same as if the resource were written to the array using fread (unless a parameter changes the embed element width to something other than CHAR_BIT). Apart from the convenience, #embed is also easier for compilers to handle, since under the as-if rule they are allowed to skip expanding the directive to its full form.
The file to embed is specified the same way as for #include, either with angle brackets or double quotes. The directive also allows certain parameters to be passed to it to customize its behavior. The C standard defines some parameters, and implementations may define additional ones. The limit parameter is used to limit the width of the included data; it is mostly intended for use with "infinite" files like /dev/urandom. The prefix and suffix parameters specify tokens to place before and after the embedded data. Finally, the if_empty parameter specifies tokens that replace the entire directive if the resource is empty. All standard parameters can be surrounded by double underscores, just like standard attributes in C23; for example, __prefix__ is interchangeable with prefix. Implementation-defined parameters use a form similar to attribute syntax (e.g., vendor::attr) but without the square brackets. Although every standard parameter requires an argument (for example, limit requires a width), arguments are optional in general, and the parentheses can be omitted entirely for a parameter that takes no argument, as may be the case for some implementation-defined parameters.
```c
const unsigned char iconDisplayData[] = {
#embed "art.png"
};

// any type which can be initialized from integer constant expressions will do
const char resetBlob[] = {
#embed "data.bin"
};

// attributes work just as well
alignas(8) const signed char alignedDataString[] = {
#embed "attributes.xml"
};

int main(void)
{
    return
#embed </dev/urandom> limit(1)
    ;
}
```
Conditional compilation is supported via the if-else core directives #if, #else, #elif, and #endif, and the contraction directives #ifdef and #ifndef, which stand for #if defined(...) and #if !defined(...), respectively. In the following example code, the printf() call is only included for compilation if VERBOSE is defined.

```c
#ifdef VERBOSE
printf("trace message");
#endif
```
The following demonstrates more complex logic:
```c
#if !(defined __LP64__ || defined __LLP64__) || defined _WIN32 && !defined _WIN64
// code for a 32-bit system
#else
// code for a 64-bit system
#endif
```
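The sketch below combines both forms: #ifdef for an on/off switch and an #if/#elif/#else chain over a numeric setting. VERBOSE comes from the example above; LOG_LEVEL, LOG_LEVEL_NAME and the two functions are hypothetical names chosen for illustration. With GCC or Clang, such macros would typically be set from the command line, for example -DVERBOSE or -DLOG_LEVEL=3.

```c
/* VERBOSE is normally left undefined unless the build defines it. */
int verbose_enabled(void)
{
#ifdef VERBOSE
    return 1;
#else
    return 0;
#endif
}

/* Hypothetical setting with a default that the build may override. */
#ifndef LOG_LEVEL
#define LOG_LEVEL 2
#endif

/* Exactly one branch survives preprocessing; the others are discarded. */
#if LOG_LEVEL >= 3
#define LOG_LEVEL_NAME "debug"
#elif LOG_LEVEL == 2
#define LOG_LEVEL_NAME "info"
#elif LOG_LEVEL == 1
#define LOG_LEVEL_NAME "warning"
#else
#define LOG_LEVEL_NAME "off"
#endif

const char *log_level_name(void)
{
    return LOG_LEVEL_NAME;
}
```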
A macro specifies how to replace text in the source code with other text. An object-like macro defines a token that the preprocessor replaces with other text. It does not include parameter syntax and therefore cannot support parameterization. The following macro definition associates the text "1 / 12" with the token "VALUE":

```c
#define VALUE 1 / 12
```

A function-like macro supports parameters, although the parameter list can be empty. The following macro definition associates the expression "(A + B)" with the token "ADD" that has parameters "A" and "B".

```c
#define ADD(A, B) (A + B)
```

In a function-like macro definition, there must be no whitespace between the macro name and the opening parenthesis. If whitespace is present, the macro is interpreted as object-like, with everything from the parenthesis onward included in the replacement text.
The preprocessor replaces each token of the code that matches a macro name with the associated replacement text, in what is known as macro expansion. The contents of string literals and comments are not examined for macro names and are therefore unaffected by macro expansion. For a function-like macro, the macro parameters are also replaced with the arguments specified in the macro invocation. For example, ADD(VALUE, 2) expands to (1 / 12 + 2).
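A small sketch of how this purely textual substitution plays out, reusing VALUE and ADD from above; the three wrapper functions are hypothetical and exist only so the results can be checked. Because VALUE expands to the unparenthesized text 1 / 12, the same macro gives different results depending on the surrounding operators.

```c
/* VALUE and ADD are the definitions from the text. */
#define VALUE 1 / 12
#define ADD(A, B) (A + B)

/* ADD(VALUE, 2) becomes (1 / 12 + 2). In integer arithmetic 1 / 12 is 0,
   so the result is 2. */
int add_value_two(void) { return ADD(VALUE, 2); }

/* VALUE * 12 becomes 1 / 12 * 12, evaluated as (1 / 12) * 12 == 0. */
int value_times_twelve(void) { return VALUE * 12; }

/* 12 * VALUE becomes 12 * 1 / 12, evaluated as (12 * 1) / 12 == 1. */
int twelve_times_value(void) { return 12 * VALUE; }
```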
A variadic macro (introduced with C99) accepts a varying number of arguments, which is particularly useful when wrapping functions that accept a variable number of parameters, such as printf.
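A minimal sketch of a variadic wrapper, assuming a hypothetical macro FORMAT_INTO that forwards to snprintf. The trailing ... collects every remaining argument, including the format string, and __VA_ARGS__ pastes them into the call unchanged; describe_point and describe_origin are likewise hypothetical.

```c
#include <stdio.h>

/* Everything after `size` is forwarded to snprintf via __VA_ARGS__. */
#define FORMAT_INTO(buf, size, ...) snprintf((buf), (size), __VA_ARGS__)

int describe_point(char *out, size_t size, int x, int y)
{
    return FORMAT_INTO(out, size, "(%d, %d)", x, y);
}

int describe_origin(char *out, size_t size)
{
    /* Because the format string itself is part of __VA_ARGS__, the
       macro also works when no further arguments follow it. */
    return FORMAT_INTO(out, size, "origin");
}
```

Folding the format string into the variadic part is a design choice: a macro with a separate fmt parameter followed by ... needs at least one extra argument in every call.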
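Function-like macro expansion proceeds roughly as follows. Arguments that are operands of the # (stringification) or ## (token pasting) operators are substituted exactly as written, without being expanded, and those operators are then applied. Every other argument is fully macro-expanded before it is substituted for its parameter. Finally, the resulting replacement text is rescanned, together with the rest of the source, for further macros to expand.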
This may produce surprising results:
```c
#define HE HI
#define LLO _THERE
#define HELLO "HI THERE"
#define CAT(a,b) a##b
#define XCAT(a,b) CAT(a,b)
#define CALL(fn) fn(HE,LLO)

CAT(HE,LLO)  // "HI THERE", because concatenation occurs before normal expansion
XCAT(HE,LLO) // HI_THERE, because the tokens originating from parameters ("HE" and "LLO") are expanded first
CALL(CAT)    // "HI THERE", because this expands to CAT(HE,LLO), which is then rescanned
```
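The same definitions can be compiled to confirm the comments above. The HI_THERE variable and the three functions are hypothetical additions; HI_THERE simply gives the identifier produced by XCAT something to refer to.

```c
#define HE HI
#define LLO _THERE
#define HELLO "HI THERE"
#define CAT(a, b) a##b
#define XCAT(a, b) CAT(a, b)
#define CALL(fn) fn(HE, LLO)

/* Hypothetical variable named by the token that XCAT(HE, LLO) produces. */
static const int HI_THERE = 42;

/* HE##LLO pastes the unexpanded tokens into HELLO, which expands to "HI THERE". */
const char *cat_result(void) { return CAT(HE, LLO); }

/* HE and LLO expand to HI and _THERE first; pasting yields HI_THERE. */
int xcat_result(void) { return XCAT(HE, LLO); }

/* CALL(CAT) yields CAT(HE, LLO), which rescanning expands to "HI THERE". */
const char *call_result(void) { return CALL(CAT); }
```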
A macro definition can be removed from the preprocessor context via #undef, so that subsequent references to the macro name are not expanded. For example:
```c
#undef VALUE
```

The preprocessor provides some macro definitions automatically. The C standard specifies that __FILE__ expands to the name of the file being processed and __LINE__ expands to the number of the current source line. The following macro, DEBUGPRINT, formats and prints a message with the file name and line number.
```c
#define DEBUGPRINT(_fmt, ...) printf("[%s:%d]: " _fmt, __FILE__, __LINE__, __VA_ARGS__)
```

For the example code below, located on line 30 of file util.c, and with count equal to 123, the output is [util.c:30]: count=123.
```c
DEBUGPRINT("count=%d\n", count);
```
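DEBUGPRINT has to be a macro rather than a function because __FILE__ and __LINE__ expand where the macro is used, not where it is defined. The sketch below isolates that idea with a hypothetical source_location structure and HERE macro (not part of any standard library) that capture the caller's position using a C99 compound literal.

```c
/* Hypothetical type for a captured source position. */
struct source_location {
    const char *file;
    int line;
};

/* __FILE__ and __LINE__ are expanded at each use of HERE, so every use
   records its own file name and line number. */
#define HERE ((struct source_location){ __FILE__, __LINE__ })

struct source_location where_am_i(void)
{
    return HERE;
}
```

Two uses of HERE on consecutive lines therefore record line numbers that differ by one.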
The first C Standard specified that __STDC__ expands to "1" if the implementation conforms to the ISO standard and "0" otherwise. __STDC_VERSION__, added by the 1995 amendment to that standard, expands to a numeric literal specifying the version of the standard supported by the implementation. Standard C++ compilers support the __cplusplus macro. Compilers running in non-standard mode must not set these macros or must define others to signal the differences.
Other standard macros include __DATE__, the date of translation, and __TIME__, the time of translation.
The second edition of the C Standard, C99, added support for __func__, which contains the name of the function definition within which it is contained. Because the preprocessor is agnostic to the grammar of C, this must be done in the compiler itself, using a variable local to the function.
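A short sketch that reads several of these predefined identifiers at run time; the function names are hypothetical. The values of __DATE__ and __TIME__ change with every build, but their formats ("Mmm dd yyyy" and "hh:mm:ss") fix their lengths. __STDC_VERSION__ is tested with #ifdef because C90 compilers do not define it.

```c
#include <stddef.h>

/* Value of __STDC_VERSION__, or 0 when the macro is not defined. */
long c_standard_version(void)
{
#ifdef __STDC_VERSION__
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}

/* sizeof a string literal counts the terminating null character. */
size_t date_length(void) { return sizeof __DATE__ - 1; }  /* 11 */
size_t time_length(void) { return sizeof __TIME__ - 1; }  /* 8 */

/* __func__ is provided by the compiler, not the preprocessor. */
const char *current_function(void) { return __func__; }
```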
One little-known usage pattern of the C preprocessor is known as X-Macros.[9][10][11] An X-Macro is a header file, commonly with the extension .def instead of the traditional .h. This file contains a list of similar macro calls, which can be referred to as "component macros." The include file is then included repeatedly, with the component macros defined differently each time.
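The sketch below uses a common single-file variant of the pattern: instead of a separate .def file, the list of component macro calls lives in a macro (COLOR_LIST, a hypothetical name) that takes the component macro as a parameter. With the file-based form, each expansion would instead be a #define of the component macro followed by an #include of the (hypothetical) list file. The enum and string table are generated from the same list, so they cannot drift out of sync.

```c
/* The list of component macro calls; X is whatever macro is passed in. */
#define COLOR_LIST(X) \
    X(RED)            \
    X(GREEN)          \
    X(BLUE)

/* Expansion 1: enumerators COLOR_RED, COLOR_GREEN, COLOR_BLUE */
#define AS_ENUM(name) COLOR_##name,
enum color { COLOR_LIST(AS_ENUM) COLOR_COUNT };
#undef AS_ENUM

/* Expansion 2: the matching names as strings, in the same order */
#define AS_STRING(name) #name,
static const char *const color_names[] = { COLOR_LIST(AS_STRING) };
#undef AS_STRING

const char *color_name(enum color c)
{
    return (unsigned)c < (unsigned)COLOR_COUNT ? color_names[c] : "?";
}
```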
Many compilers define additional, non-standard macros. A common reference for these macros is the Pre-defined C/C++ Compiler Macros project, which lists "various pre-defined compiler macros that can be used to identify standards, compilers, operating systems, hardware architectures, and even basic run-time libraries at compile-time."
Most compilers targeting Microsoft Windows implicitly define _WIN32.[12] This allows code, including preprocessor commands, to compile only when targeting Windows systems. A few compilers define WIN32 instead. For such compilers, which do not implicitly define the _WIN32 macro, it can be specified on the compiler's command line using -D_WIN32.

```c
#ifdef __unix__ /* __unix__ is usually defined by compilers targeting Unix systems */
#include <unistd.h>
#elif defined _WIN32 /* _WIN32 is usually defined by compilers targeting 32 or 64 bit Windows systems */
#include <windows.h>
#endif
```
The example code tests if a macro __unix__ is defined. If it is, the file <unistd.h> is then included. Otherwise, it tests if a macro _WIN32 is defined instead. If it is, the file <windows.h> is then included.
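The same tests can select code rather than headers. In this sketch, platform_name is a hypothetical function that reports which branch was compiled; the __APPLE__ branch is an added assumption, since compilers targeting macOS typically do not define __unix__.

```c
/* Only the branch whose condition holds at compile time is kept. */
const char *platform_name(void)
{
#if defined __unix__
    return "unix";
#elif defined _WIN32
    return "windows";
#elif defined __APPLE__
    return "apple";
#else
    return "unknown";
#endif
}
```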
The values of the predefined macros __FILE__ and __LINE__ can be set for a subsequent line via the #line directive. In the code below, __LINE__ expands to 314 and __FILE__ to "pi.c".

```c
#line 314 "pi.c"
printf("line=%d file=%s\n", __LINE__, __FILE__);
```
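A sketch that returns the values instead of printing them, using the same #line 314 "pi.c" directive; both functions are hypothetical. The line immediately after the directive is numbered 314, and the new file name stays in effect for the rest of the file.

```c
#include <string.h>

int line_after_directive(void)
{
#line 314 "pi.c"
    return __LINE__;  /* this source line is now numbered 314 */
}

/* #line remains in effect, so __FILE__ still expands to "pi.c" here. */
int file_is_pi_c(void)
{
    return strcmp(__FILE__, "pi.c") == 0;
}
```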
The preprocessor is capable of interpreting operators and evaluating very basic expressions in #if and #elif directives, using integer constants, arithmetic operators, comparison operators, logical operators, bitwise operations, and the defined operator. (The # stringification operator, by contrast, is used only within macro definitions.) This allows the preprocessor to make evaluations such as:

```c
#if X == 10 // if X equals 10, the preprocessor sees #if 10 == 10
```
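A slightly fuller sketch of such evaluations, assuming X is defined as 10 as in the comment above. X_CHECK, NOT_A_MACRO, UNDEFINED_IS_ZERO and the functions are hypothetical names. It also shows a rule that often surprises readers: after macro expansion, any identifier still left in an #if expression is replaced by 0.

```c
#define X 10

/* 10 == 10 is true, and (10 * 3) % 4 == 30 % 4 == 2 is true. */
#if X == 10 && (X * 3) % 4 == 2
#define X_CHECK 1
#else
#define X_CHECK 0
#endif

/* NOT_A_MACRO is never defined, so in #if it evaluates as 0. */
#if NOT_A_MACRO == 0 && !defined(NOT_A_MACRO)
#define UNDEFINED_IS_ZERO 1
#else
#define UNDEFINED_IS_ZERO 0
#endif

int x_check(void) { return X_CHECK; }
int undefined_is_zero(void) { return UNDEFINED_IS_ZERO; }
```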
Although the defined operator is not a directive in its own right, when it appears in an #if or #elif expression it is interpreted by the preprocessor: it evaluates to 1 if the named macro is currently defined and 0 otherwise.
The following are both accepted ways of invoking thedefined operator.
```c
#if defined(MY_MACRO)
#if defined MY_MACRO
```
The stringification operator (also known as the stringizing operator), denoted by #, converts a token into a string literal, escaping any quotes or backslashes as needed. Given the definition:

```c
#define str(s) #s
```

str(\n) expands to "\n" and str(p = "foo\n";) expands to "p = \"foo\\n\";".
If stringification of the expansion of a macro argument is desired, two levels of macros must be used. Given the definitions:

```c
#define xstr(s) str(s)
#define str(s) #s
#define foo 4
```

str(foo) expands to "foo" and xstr(foo) expands to "4".
A macro argument cannot be combined with additional text and then stringified. However, adjacent string literals, including those produced by stringification, are concatenated by the C compiler.
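A sketch reusing str, xstr and foo from above, plus a hypothetical LABEL macro that relies on adjacent string literal concatenation to attach fixed text to a stringified argument. The three accessor functions are also hypothetical.

```c
#define str(s) #s
#define xstr(s) str(s)
#define foo 4

/* "label: " and #s are two adjacent string literals, joined by the compiler. */
#define LABEL(s) "label: " #s

const char *raw_name(void)      { return str(foo); }     /* "foo": # suppresses expansion */
const char *expanded_name(void) { return xstr(foo); }    /* "4": foo expands before str */
const char *label(void)         { return LABEL(ready); } /* "label: ready" */
```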
The token pasting operator, denoted by ##, concatenates two tokens into one. Given the definition:

```c
#define DECLARE_STRUCT_TYPE(name) typedef struct name##_s name##_t
```

DECLARE_STRUCT_TYPE(g_object) expands to typedef struct g_object_s g_object_t.
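A sketch that uses DECLARE_STRUCT_TYPE from above and adds a hypothetical DEFINE_GETTER macro, showing that pasting can build both a type name and a function name from the same argument. The id field and g_object_get_id are illustrative only.

```c
#define DECLARE_STRUCT_TYPE(name) typedef struct name##_s name##_t

DECLARE_STRUCT_TYPE(g_object); /* typedef struct g_object_s g_object_t; */

/* Complete the struct type named by the typedef (hypothetical field). */
struct g_object_s {
    int id;
};

/* Pasting builds the parameter type (g_object_t) and the function name
   (g_object_get_id) from the arguments. */
#define DEFINE_GETTER(type, field)              \
    int type##_get_##field(const type##_t *obj) \
    {                                           \
        return obj->field;                      \
    }

DEFINE_GETTER(g_object, id)
```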
Processing can be aborted via the#error directive. For example:
```c
#if RUBY_VERSION == 190
#error Ruby version 1.9.0 is not supported
#endif
```
As of C23[13] and C++23,[14] a #warning directive is provided to print a message without aborting. A typical use is to warn about the use of deprecated functionality.
Prior to C23 and C++23, this directive existed in many compilers as a non-standard feature, such as the C compilers by GNU, Intel, Microsoft and IBM. Because it was non-standard, the warning directive had varying forms:

```c
// GNU, Intel and IBM
#warning "Do not use ABC, which is deprecated. Use XYZ instead."

// Microsoft
#pragma message("Do not use ABC, which is deprecated. Use XYZ instead.")
```
The #pragma directive is defined by the language standards, but with few or no requirements for the syntax after its name, so that compilers are free to define subsequent syntax and associated behavior. For instance, a pragma is often used to suppress diagnostic messages, manage heap and stack debugging, and so on.
C99 introduced a few standard pragmas, taking the form #pragma STDC ..., which are used to control the floating-point implementation. The alternative, macro-like form _Pragma(...) was also added.
One of the most popular uses of the #pragma directive is #pragma once, which behaves the same way an #include guard would, condensed into a single directive placed at the top of the file. Despite being non-standard, it is supported by most compilers.
Many implementations do not support trigraphs or do not replace them by default.
Some Unix preprocessors provided an assertion feature, which has little similarity to standard library assertions.[15]
GCC provides #include_next for chaining headers of the same name.[16]
For example, if one overrides the file <stdio.h>, trying to include the standard library <stdio.h> from within the override using #include would find the override itself again and include it recursively. #include_next solves this by including the next <stdio.h> found in the search path.
```c
// override_stdio/stdio.h
#ifndef MY_STDIO_H
#define MY_STDIO_H

// Include the next stdio.h in the search path
#include_next <stdio.h>

// Custom overrides, defined after the real declarations have been seen
#define printf(...) my_custom_printf(__VA_ARGS__)

#endif
```
Unlike C and C++, Objective-C includes an #import directive that is like #include but results in a file being included only once, eliminating the need for include guards and #pragma once. It is a standard part of Objective-C.

```objc
#import <Foundation/Foundation.h>
#import "MyClass.h"
```

In Microsoft Visual C++ (MSVC), there also exists an #import preprocessor directive, used to import type libraries.[17] It is a nonstandard directive.

```cpp
#import "C:\\Program Files\\Common Files\\System\\ado\\msado15.dll" no_namespace rename("EOF", "ADOEOF")
```

Neither #import directive should be confused with the C++ keyword import, which is used to import C++ modules (since C++20) and is not a preprocessor directive.
The null directive, which consists only of the # character alone on a line, has no effect. It is part of standard C and C++ and is also documented for Microsoft Visual C++.[18]
The #nullable directive in C# is used to enable and disable nullable reference types. To enable them, use #nullable enable; to disable them, use #nullable disable.

```csharp
#nullable enable
string? name = null;     // OK
string fullName = null;  // Warning: possible null assignment
#nullable disable
string test = null;      // No warning
```
This directive does not exist in C/C++.
The #region and #endregion directives in C# are used to expand and collapse sections of code in IDEs and have no effect on the actual compilation of the program. They are primarily used for code organisation and readability.

```csharp
#region Helper methods
void Log(string message)
{
    Console.WriteLine(message);
}
#endregion
```

While this directive does not exist in C/C++, MSVC and Visual Studio instead have #pragma region and #pragma endregion.[19] Thus the equivalent C++ code would be:
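```cpp
#include <print>
#include <string_view>

using std::string_view;

#pragma region Helper methods
void log(string_view message)
{
    // std::println requires a compile-time format string
    std::println("{}", message);
}
#pragma endregion
```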
C++/CLI has the #using directive, which is used to import metadata into a program from a Microsoft Intermediate Language file (such as a .dll file).[20]

```cpp
#using <MyComponent.dll>
#using "AssemblyA.dll"
#using "AssemblyB.dll"

using namespace System;

public ref class B {
public:
    void Test(A a) {
        // ...
    }
};

int main(array<String^>^ args) {
    A a;
    B b;
    b.Test(a);
}
```
Traditionally, the C preprocessor was a separate development tool from the compiler with which it is usually used. In that case, it can be used separately from the compiler. Notable examples include use with the (deprecated) imake system and for preprocessing Fortran. However, its use as a general purpose preprocessor is limited, since the source code language must be relatively C-like for the preprocessor to parse it.[2]
The GNU Fortran compiler runs "traditional mode" CPP before compiling Fortran code if certain file extensions are used.[21] Intel offers a Fortran preprocessor, fpp, for use with the ifort compiler, which has similar capabilities.[22]
CPP also works acceptably with most assembly languages and Algol-like languages. This requires that the language syntax not conflict with CPP syntax: no lines may start with #, and double quotes, which CPP interprets as delimiting string literals and therefore leaves unexpanded, must have no other syntactic meaning. The "traditional mode" (acting like a pre-ISO C preprocessor) is generally more permissive and better suited for such use.[23]
Some modern compilers, such as the GNU C Compiler, provide preprocessing as a feature of the compiler rather than as a separate tool.
Text substitution has a relatively high risk of causing a software bug as compared to other programming constructs.[24][25]
Consider the common definition of a max macro:
```c
#define max(a,b) (((a) > (b)) ? (a) : (b))
```

Whichever of the expressions represented by a and b is selected is evaluated twice due to macro expansion, but this aspect is not obvious in the code where the macro is referenced. If the actual expressions have constant value, then multiple evaluation is not problematic from a logic standpoint, even though it can affect runtime performance. But if an expression evaluates to a different value on subsequent evaluation, then the result may be unexpected. For example, given int i = 1, j = 2;, the result of max(i,j) is 2. If a and b were only evaluated once, the result of max(i++,j++) would be the same, but with double evaluation the result is 3.
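A sketch contrasting the macro with an inline function, using the i = 1, j = 2 example from the text. max_int and the two helper functions are hypothetical; the helpers report both the result and the final value of j so the extra increment is visible.

```c
/* The max macro from the text: the selected argument is evaluated twice. */
#define max(a, b) (((a) > (b)) ? (a) : (b))

/* Function alternative: each argument is evaluated exactly once, before
   the call, and static inline (C99) lets the compiler avoid call overhead. */
static inline int max_int(int a, int b)
{
    return a > b ? a : b;
}

int macro_max_side_effects(int *j_after)
{
    int i = 1, j = 2;
    int result = max(i++, j++);  /* 1 > 2 is false, so j++ runs a second time */
    *j_after = j;
    return result;
}

int function_max_side_effects(int *j_after)
{
    int i = 1, j = 2;
    int result = max_int(i++, j++);  /* receives 1 and 2 */
    *j_after = j;
    return result;
}
```

With the macro the result is 3 and j ends at 4; with the function the result is 2 and j ends at 3.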
Failure to bracket arguments can lead to unexpected results. For example, a macro to double a value might be written as:
```c
#define DOUBLE(x) 2 * x
```

But DOUBLE(1 + 2) expands to 2 * 1 + 2, which due to operator precedence evaluates to 4 when the expected result is 6. To mitigate this problem, a macro should parenthesize its entire replacement text and every use of a parameter:

```c
#define DOUBLE(x) (2 * (x))
```

The C preprocessor is not Turing-complete, but comes close. Recursive computations can be specified, but with a fixed upper bound on the amount of recursion performed.[26] However, the C preprocessor is not designed to be, nor does it perform well as, a general-purpose programming language. As the C preprocessor does not have features of some other preprocessors, such as recursive macros, selective expansion according to quoting, and string evaluation in conditionals, it is very limited in comparison to a more general macro processor such as m4.
Due to its limitations and lack of type safety (as the preprocessor is completely oblivious to C/C++ grammar, performing only text substitutions), C and C++ language features have been added over the years to reduce the need for the preprocessor.
For a long time, a preprocessor macro provided the preferred way to define a constant value. An alternative has always been to define a const variable, but such a variable may occupy memory at run time. A newer language construct (since C++11 and C23), constexpr, allows for declaring a compile-time constant value that need not consume runtime memory.[27]
For a long time, a function-like macro was the only way to define function-like behavior that did not incur runtime function call overhead. Via the inline keyword, and with optimizing compilers that inline automatically, some functions can be invoked without call overhead.
The include directive limits code structure since it only allows including the content of one file into another. More modern languages support a module concept that has public symbols that other modules import, instead of including file content. Many contend that the resulting code has less boilerplate and is easier to maintain, since there is only one file for a module, not both a header and a body. C++20 adds modules, and an import statement that is not handled via preprocessing.[28][29] Modules in C++ compile faster and link faster than traditional headers,[30] and eliminate the necessity of #include guards or #pragma once. Until C++26, the import, export, and module keywords were partially handled by the preprocessor.
For code bases that cannot migrate to modules immediately, C++ also offers "header units", which allow header files to be imported in the same way a module would be. Unlike modules, header units make their macros available to the importing code, which minimizes breakage during migration. Header units are designed to be a transitional solution before fully migrating to modules.[31] For instance, one may write import <string>; instead of #include <string>, or import "MyHeader.hpp"; instead of #include "MyHeader.hpp". However, most build systems, such as CMake, do not currently support this feature.
In Clang, a non-standard module feature for C is offered, allowing headers to be imported as modules.[32]
"Having said that, you can often get away with using cpp on things which are not C. Other Algol-ish programming languages are often safe (Ada, etc.) So is assembly, with caution. -traditional-cpp mode preserves more white space, and is otherwise more permissive. Many of the problems can be avoided by writing C or C++ style comments instead of native language comments, and keeping macros simple."