
In praise of the C preprocessor

Let's face it. cpp, the C preprocessor, gets a lot of flak among language designers. People blame it for all sorts of atrocities, including code that doesn't do what it says it does, weird side effects caused by double-evaluation of parameters, pollution of namespaces, slow compilations, and the generally annoying evil that is the whole concept of declaring your API in C/C++ header files.

All of these accusations are true. But there are some things you just can't do without it. Watch.

Example #1: doubling

Here's a typical example of why C/C++ preprocessor macros are "bad." What's wrong with this macro?

#define DOUBLE(x)   ((x)+(x))

(Notice all the parens. Neophytes often leave those out, and hilarity ensues when something like 3*DOUBLE(4) turns into 3*4+4 instead of 3*(4+4).)
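To make the precedence problem concrete, here is a minimal C++ sketch comparing the parenthesized DOUBLE from the text with a hypothetical BAD_DOUBLE that leaves the parens out. The check function name is also made up for illustration.

```cpp
#include <cassert>

// Hypothetical "neophyte" version: no parentheses anywhere.
#define BAD_DOUBLE(x)  x+x
// The version from the text: each use of x and the whole body are parenthesized.
#define DOUBLE(x)      ((x)+(x))

// Hypothetical check function showing how each macro expands inside 3*...
void example_parens()
{
    assert(3*BAD_DOUBLE(4) == 16);   // expands to 3*4+4
    assert(3*DOUBLE(4) == 24);       // expands to 3*((4)+(4))
}
```

The multiplication binds to the first 4 only in the unparenthesized version, because the macro pastes tokens, not values.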

But the above macro has a bug too. Here's a hint: what if you write this?

y = DOUBLE(++x);

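Aha. It expands to y=((++x)+(++x)), so x gets incremented twice instead of just once like you expected. (Strictly speaking, modifying x twice in one expression like that is undefined behaviour in both C and C++, which is even worse.)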
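Because y = DOUBLE(++x) formally has undefined behaviour, the sketch below makes the double evaluation observable in a well-defined way instead: the argument is a call to a hypothetical next_value() function that counts how often it runs.

```cpp
#include <cassert>

#define DOUBLE(x)   ((x)+(x))

// Hypothetical helper with a visible side effect: every call bumps a counter.
static int calls = 0;
int next_value() { return ++calls; }

void example_double_eval()
{
    calls = 0;
    // Expands to ((next_value())+(next_value())): two calls, not one.
    int y = DOUBLE(next_value());
    assert(calls == 2);
    // The two calls return 1 and 2 (in an unspecified order), so the sum is 3, not 2.
    assert(y == 3);
}
```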

Macro-haters correctly point out that in C++ (and most newer C compilers), you can use an inline function to avoid this problem and everything like it:

inline double DOUBLE(double x) { return x+x; }

This works great, and look: I didn't need the extra parens either. That's because C++ language rules require the parameter to be fully evaluated before the function body runs, whether it's inline or not. It would have been totally disastrous if inline functions didn't work like that.

Oh, but actually, that one function isn't really good enough: what if x is an int, or an instance of class Complex? The macro can double anything, but the inline can only double floating point numbers.

Never fear: C++ actually has a replacement macro system that's intended to obsolete cpp. It handles this case perfectly:

template<typename T>
inline T DOUBLE(T x) { return x+x; }

Cool! Now we can double any kind of object we want, assuming it supports the "+" operation. Of course, we're getting a little heavy on screwy syntax - the #define was much easier to read - but it works, and there are never any surprises no matter what you give for "x".
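A sketch of the template version in action. The text doesn't show class Complex, so this assumes a hypothetical minimal Complex struct with an operator+, plus a hypothetical next_value() counter to show that the argument is evaluated exactly once.

```cpp
#include <cassert>

template<typename T>
inline T DOUBLE(T x) { return x+x; }

// Hypothetical stand-in for the text's class Complex; it only needs operator+.
struct Complex {
    double re, im;
};
inline Complex operator+(Complex a, Complex b) { return Complex{a.re + b.re, a.im + b.im}; }

// Hypothetical helper with a visible side effect: every call bumps a counter.
static int calls = 0;
int next_value() { return ++calls; }

void example_template()
{
    assert(DOUBLE(21) == 42);                 // T = int
    assert(DOUBLE(1.5) == 3.0);               // T = double
    Complex c = DOUBLE(Complex{1.0, -2.0});   // T = Complex
    assert(c.re == 2.0 && c.im == -4.0);

    calls = 0;
    int y = DOUBLE(next_value());             // argument evaluated exactly once
    assert(calls == 1 && y == 2);
}
```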

Example #2: logging

In the above example, C++ templated inline functions were definitely better than macros for solving our problem. Now let's look at something slightly different: a log message printer. Which of the following is better, LOGv1 or LOGv2?

#define LOGv1(lvl,str) do { \
        if ((lvl) <= _loglevel) print((str)); \
    } while (0)

inline void LOGv2(int lvl, std::string str)
{
    if (lvl <= _loglevel) print(str);
}

(Trivia: can you figure out why I have to use the weird do { } while (0) notation?)
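One possible answer to the trivia question, sketched with hypothetical stand-ins for print() and _loglevel: without a wrapper, the macro's own if can silently capture an else that belongs to the caller. (A plain { } wrapper has a different problem: the ; after the macro call ends the outer if statement, so a following else fails to compile.) do { } while (0) turns the body into exactly one statement that still requires a trailing semicolon. LOG_NAKED, naked() and wrapped() are illustrative names, not from the text.

```cpp
#include <cassert>

// Hypothetical stand-ins for the text's print() and _loglevel.
static int _loglevel = 5;
static int printed = 0;
static void print(const char *) { ++printed; }

// No wrapper: the macro's own if can steal a caller's else.
#define LOG_NAKED(lvl,str)  if ((lvl) <= _loglevel) print((str))

// do { } while (0): expands to exactly one statement that still needs a ';'.
#define LOGv1(lvl,str) do { \
        if ((lvl) <= _loglevel) print((str)); \
    } while (0)

// Both return 1 when the caller's else branch ran.
int naked(bool verbose)
{
    int took_else = 0;
    if (verbose)
        LOG_NAKED(1, "hello");   // the else below binds to the macro's if
    else
        took_else = 1;
    return took_else;
}

int wrapped(bool verbose)
{
    int took_else = 0;
    if (verbose)
        LOGv1(1, "hello");       // the else below binds to "if (verbose)"
    else
        took_else = 1;
    return took_else;
}

// The standard assert() here only checks the sketch's behaviour.
void example_do_while()
{
    assert(naked(false) == 0);     // surprise: the caller's else never ran
    assert(wrapped(false) == 1);   // what the caller meant
    assert(wrapped(true) == 0);    // logs, and skips the else
    assert(printed == 1);
}
```

Compilers such as GCC can warn about the ambiguous else in naked() with warnings enabled, but it still compiles, which is the dangerous part.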

Notice that the problem from the first example doesn't happen here. As long as you only refer to each parameter once in the definition, you're okay. And you don't need a template for the inline function, because actually the log level is always an int and the thing you're printing is (let's assume) always a string. You could complain about namespace pollution, but they're both global functions and you only get them if you include their header files, so you should be pretty safe.

But my claim is that the #define is much better here. Why? Actually, for the same reason it was worse in the first example: a macro's parameters are evaluated only where, and only as often as, they appear in the expansion. Try this:

LOGv1(1000, hexdump(buffer, 10240));

Let's say _loglevel is less than 1000, so we won't be printing the message. The macro expands to something like

if (1000<= _loglevel) print(hexdump(buffer, 10240));

So the print(), including the hexdump(), is bypassed if the log level is too low. In fact, if _loglevel is a constant (or a #define, it doesn't matter), then the optimizer can throw it away entirely: the if() is always false, and anything inside an if(false) will never, ever run. There's no performance penalty for LOGv1 if your log level is set low enough.

But because of the guaranteed evaluation rules, the inline function actually expands out to something like this:

std::string s = hexdump(buffer, 10240);
if (1000 <= _loglevel) print(s);

The optimizer throws away the print statement, just like before - but unless it can prove hexdump() has no side effects, it's not allowed to discard the hexdump() call! That means your program malloc()s a big string, fills it with stuff, and then free()s it - for no reason.
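A minimal sketch that makes the wasted work visible without relying on the optimizer. print(), hexdump(), _loglevel and buffer are hypothetical stand-ins (the text doesn't define them); this hexdump() just counts its calls and builds a hex string.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-ins: a global log level, a print() that counts output,
// and a hexdump() that counts how often it builds its (expensive) string.
static int _loglevel = 500;
static int hexdump_calls = 0;
static int printed = 0;

static void print(const std::string &) { ++printed; }

static std::string hexdump(const unsigned char *buf, int len)
{
    ++hexdump_calls;
    static const char digits[] = "0123456789abcdef";
    std::string out;
    out.reserve(len * 2);
    for (int i = 0; i < len; ++i) {
        out += digits[buf[i] >> 4];
        out += digits[buf[i] & 0xf];
    }
    return out;
}

#define LOGv1(lvl,str) do { \
        if ((lvl) <= _loglevel) print((str)); \
    } while (0)

inline void LOGv2(int lvl, std::string str)
{
    if (lvl <= _loglevel) print(str);
}

static unsigned char buffer[10240];

// The standard assert() here only checks the sketch's behaviour.
void example_lazy_args()
{
    hexdump_calls = printed = 0;
    LOGv1(1000, hexdump(buffer, 10240));   // 1000 > 500: hexdump() never runs
    assert(hexdump_calls == 0 && printed == 0);

    LOGv2(1000, hexdump(buffer, 10240));   // hexdump() runs, result thrown away
    assert(hexdump_calls == 1 && printed == 0);

    // The manual workaround: the outer if skips the argument evaluation.
    if (1000 <= _loglevel) LOGv2(1000, hexdump(buffer, 10240));
    assert(hexdump_calls == 1 && printed == 0);
}
```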

Now, it's possible that C++ templates - being a full-powered macro system - could be used to work around this, but I don't know how. And I'm pretty smart. So it's effectively impossible for most C++ programmers to get the behaviour they want here without using cpp macros.

Of course, the workaround is to just type this every time instead:

if (1000 <= _loglevel) LOGv2(1000, hexdump(buffer, 10240));

You're comparing to _loglevel twice - before LOGv2 and inside LOGv2 - but since it's inline, the optimizer will throw away the extra compare. But the fact that one if() is outside the function call means it can skip evaluating the hexdump() call.

The fact that you can do this isn't really a justification for leaving out a macro system - of course, anything a macro system can do, I can also do by typing out all the code by hand. But why would I want to?

Java and C# programmers are pretty much screwed here⁽¹⁾ - they have no macro processor at all, and those languages are especially slow so you don't want to needlessly evaluate stuff. The only option is the explicit if statement every time. Blech.

Example #3: assert()

My final example is especially heinous. assert() is one of the most valuable functions to C/C++ programmers (although some of them don't realize it yet). Even if you prefer your assertions to be non-fatal, frameworks like JUnit and NUnit have their own variants of assert() to check unit test results.

Here's what a simplified assert() implementation might look like in C.

/* Simplified: assumes NDEBUG is always defined, as 0 or 1. */
#define assert(cond)  do { \
        if (!NDEBUG && !(cond)) \
            _assert_fail(__FILE__, __LINE__, #cond); \
    } while (0)

We have the same situation as example #2, where if NDEBUG is set, there's no need to evaluate (cond). (Of course, exactly this lack of evaluation is what sometimes confuses people about assert(). Think about what happens with and without NDEBUG if you type assert(--x >= 0).)

But that's the least of our worries: I never use NDEBUG anyway.
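A sketch of the assert(--x >= 0) puzzle, using hypothetical names (MY_NDEBUG, MY_ASSERT, my_assert_fail, decrement_checked) so it doesn't collide with the real <cassert>. Like the simplified macro in the text, it assumes the disable switch is always defined as 0 or 1.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical disable switch: 1 turns checks off, 0 turns them on.
// Like the simplified macro in the text, this sketch needs it defined as 0 or 1
// (the real <assert.h> only checks whether NDEBUG is defined at all).
#define MY_NDEBUG 1

// Hypothetical fatal handler, in the spirit of the text's _assert_fail().
static void my_assert_fail(const char *file, int line, const char *expr)
{
    std::fprintf(stderr, "** Assertion \"%s\" failed at %s line %d\n", expr, file, line);
    std::abort();
}

#define MY_ASSERT(cond)  do { \
        if (!MY_NDEBUG && !(cond)) \
            my_assert_fail(__FILE__, __LINE__, #cond); \
    } while (0)

// With MY_NDEBUG set to 1, "!MY_NDEBUG" is 0, so && short-circuits and --x never runs.
int decrement_checked(int x)
{
    MY_ASSERT(--x >= 0);
    return x;
}

// The standard assert() here only checks the sketch's behaviour.
void example_side_effect()
{
    assert(decrement_checked(5) == 5);   // the decrement was skipped
    assert(decrement_checked(0) == 0);   // and so was the failing check
}
```

With MY_NDEBUG changed to 0, decrement_checked(5) returns 4 instead, and decrement_checked(0) aborts: the same source line behaves differently depending on a build flag.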

The really valuable parts here are some things you just can't do without a preprocessor. __FILE__ and __LINE__ refer to the line where assert() is called, not the line where the macro is declared, or they wouldn't be useful. And the highly magical "#cond" notation - which you've probably never seen before, since it's almost, but not quite, never needed - turns (cond) into a printable string. Why would you want to do that? Well, so that you can have _assert_fail print out something awesome like this:

** Assertion "--x >= 0" failed at mytest.c line 56
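To show where that message comes from, here's a sketch using a hypothetical MY_ASSERT and a non-fatal my_assert_fail() that records the message instead of aborting, so it can be inspected. Unlike the text's version, this sketch has no NDEBUG switch; the check is always on.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical non-fatal handler: it records the message instead of printing and
// aborting, so the sketch can inspect it afterwards.
static std::string last_failure;

static void my_assert_fail(const char *file, int line, const char *expr)
{
    char msg[512];
    std::snprintf(msg, sizeof msg, "** Assertion \"%s\" failed at %s line %d",
                  expr, file, line);
    last_failure = msg;
}

// Hypothetical name, chosen to avoid clashing with the standard assert().
// #cond turns the argument's tokens into a string literal; __FILE__ and
// __LINE__ expand where MY_ASSERT is used, not where it is defined.
#define MY_ASSERT(cond)  do { \
        if (!(cond)) \
            my_assert_fail(__FILE__, __LINE__, #cond); \
    } while (0)

void example_stringize()
{
    int x = 0;
    int expected_line = __LINE__ + 1;
    MY_ASSERT(--x >= 0);   // x becomes -1, so the check fails
    // The standard assert() here only checks what the sketch recorded.
    assert(last_failure.find("\"--x >= 0\"") != std::string::npos);
    assert(last_failure.find(" line " + std::to_string(expected_line)) != std::string::npos);
}
```

After example_stringize() runs, last_failure holds something of the form ** Assertion "--x >= 0" failed at <file> line <n>, where the file name and line number depend on where the code lives.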

Languages without a preprocessor just can't do useful stuff like that, and it's very bothersome. As with any macroless language, you end up typing it yourself, like in JUnit:

assertTrue("oh no, x >= 5!", --x >= 0);

As you can see in the above example, the message is usually a lie, leading to debugging wild goose chases. It's also a lot more typing and discourages people from writing tests. (JUnit does manage to capture the file and function, thankfully, by throwing an exception and looking at its backtrace. It's harder, but still possible, to get the line number too.)

Side note

⁽¹⁾ The C# language designers probably hate me, but actually there's nothing stopping you from passing your C# code through cpp to get these same advantages. Next time someone tells you cpp is poorly designed, ask yourself whether their "well-designed" macro language would let you do that.


2007-08-13»
