Is MaximumValueUntilOverflow a better name than maxval? I don't think so.

What follows is a set of short essays that collectively encourage a philosophy of clarity in programming rather than giving hard rules. I don't expect you to agree with all of them, because they are opinion and opinions change with the times. But they've been accumulating in my head, if not on paper until now, for a long time, and are based on a lot of experience, so I hope they help you understand how to plan the details of a program. (I've yet to see a good essay on how to plan the whole thing, but then that's partly what this course is about.) If you find them idiosyncratic, fine; if you disagree with them, fine; but if they make you think about why you disagree, that's better. Under no circumstances should you program the way I say to because I say to; program the way you think expresses best what you're trying to accomplish in the program. And do so consistently and ruthlessly.
Your comments are welcome.
Typographic conventions consistently held are important to clear presentation, of course - indentation is probably the best known and most useful example - but when the ink obscures the intent, typography has taken over. So even if you stick with plain old typewriter-like output, be conscious of typographic silliness. Avoid decoration; for instance, keep comments brief and banner-free. Say what you want to say in the program, neatly and consistently. Then move on.
Length is not a virtue in a name; clarity of expression is. A global variable rarely used may deserve a long name, maxphysaddr say. An array index used on every line of a loop needn't be named any more elaborately than i. Saying index or elementnumber is more to type (or calls upon your text editor) and obscures the details of the computation. When the variable names are huge, it's harder to see what's going on. This is partly a typographic issue; consider

	for(i=0 to 100)
		array[i]=0

vs.

	for(elementnumber=0 to 100)
		array[elementnumber]=0;

The problem gets worse fast with real examples. Indices are just notation, so treat them as such.
Pointers also require sensible notation. np is just as mnemonic as nodepointer if you consistently use a naming convention from which np means ``node pointer'' is easily derived. More on this in the next essay.
As in all other aspects of readable programming, consistency is important in naming. If you call one variable maxphysaddr, don't call its cousin lowestaddress.
Finally, I prefer minimum-length but maximum-information names, and then let the context fill in the rest. Globals, for instance, typically have little context when they are used, so their names need to be relatively evocative. Thus I say maxphysaddr (not MaximumPhysicalAddress) for a global variable, but np not NodePointer for a pointer locally defined and used. This is largely a matter of taste, but taste is relevant to clarity.
I eschew embedded capital letters in names; to my prose-oriented eyes, they are too awkward to read comfortably. They jangle like bad typography.
Consider: When you have a pointer to an object, it is a name for exactly that object and no other. That sounds trivial, but look at the following two expressions:

	np
	node[i]

The first points to a node, the second evaluates to (say) the same node. But the second form is an expression; it is not so simple. To interpret it, we must know what node is, what i is, and that i and node are related by the (probably unspecified) rules of the surrounding program. Nothing about the expression in isolation can show that i is a valid index of node, let alone the index of the element we want. If i and j and k are all indices into the node array, it's very easy to slip up, and the compiler cannot help. It's particularly easy to make mistakes when passing things to subroutines: a pointer is a single thing; an array and an index must be believed to belong together in the receiving subroutine.

An expression that evaluates to an object is inherently more subtle and error-prone than the address of that object. Correct use of pointers can simplify code:
	parent->link[i].type

vs.

	lp->type.

If we want the next element's type, it's

	parent->link[++i].type

or

	(++lp)->type.

i advances but the rest of the expression must stay constant; with pointers, there's only one thing to advance.

Typographic considerations enter here, too. Stepping through structures using pointers can be much easier to read than with expressions: less ink is needed and less effort is expended by the compiler and computer. A related issue is that the type of the pointer affects how it can be used correctly, which allows some helpful compile-time error checking that array indices cannot share. Also, if the objects are structures, their tag fields are reminders of their type, so

	np->left

is sufficiently evocative; if an array is being indexed the array will have some well-chosen name and the expression will end up longer:

	node[i].left.

Again, the extra characters become more irritating as the examples become larger.
As a rule, if you find code containing many similar, complex expressions that evaluate to elements of a data structure, judicious use of pointers can clear things up. Consider what

	if(goleft)
		p->left=p->right->left;
	else
		p->right=p->left->right;

would look like using a compound expression for p. Sometimes it's worth a temporary variable (here p) or a macro to distill the calculation.

Procedure names should reflect what they do; function names should reflect what they return. Functions are used in expressions, often in things like if's, so they need to read appropriately.

	if(checksize(x))

is unhelpful because we can't deduce whether checksize returns true on error or non-error; instead

	if(validsize(x))

makes the point clear and makes a future mistake in using the routine less likely.
Comments are a delicate matter, requiring taste and judgement; I tend to err on the side of eliminating them. But I do comment sometimes. Almost exclusively, I use them as an introduction to what follows. Examples: explaining the use of global variables and types (the one thing I always comment in large programs); as an introduction to an unusual or critical procedure; or to mark off sections of a large computation.
There is a famously bad comment style:

	i=i+1;		/* Add one to i */

and there are worse ways to do it:

	/**********************************
	 *                                *
	 *          Add one to i          *
	 *                                *
	 **********************************/

		i=i+1;

Don't laugh now, wait until you see it in real life.
Avoid cute typography in comments, avoid big blocks of comments except perhaps before vital sections like the declaration of the central data structure (comments on data are usually much more helpful than on algorithms); basically, avoid comments. If your code needs a comment to be understood, it would be better to rewrite it so it's easier to understand. Which brings us to complexity: most programs are too complicated - that is, more complex than they need to be - to solve their problems efficiently.
Rule 1. You can't tell where a program is going to spend its time. Bottlenecks occur in surprising places, so don't try to second-guess and put in a speed hack until you've proven that's where the bottleneck is.

Rule 2. Measure. Don't tune for speed until you've measured, and even then don't unless one part of the code overwhelms the rest.
Rule 3. Fancy algorithms are slow when n is small, and n is usually small. Fancy algorithms have big constants. Until you know that n is frequently going to be big, don't get fancy. (Even if n does get big, use Rule 2 first.) For example, binary trees are always faster than splay trees for workaday problems.

Rule 4. Fancy algorithms are buggier than simple ones, and they're much harder to implement. Use simple algorithms as well as simple data structures.
The following data structures are a complete list for almost all practical programs:

	array
	linked list
	hash table
	binary tree

Of course, you must also be prepared to collect these into compound data structures. A symbol table, for instance, might be a hash table containing linked lists of arrays of characters.
Rule 5. Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming. (See The Mythical Man-Month: Essays on Software Engineering by F. P. Brooks, page 102.)
Rule 6. There is no Rule 6.
One of the most effective ways to control complexity is data-driven programming: algorithms, or details of algorithms, can often be encoded compactly, efficiently and expressively as data rather than, say, as lots of if statements. The reason is that the complexity of the job at hand, if it is due to a combination of independent details, can be encoded. A classic example of this is parsing tables, which encode the grammar of a programming language in a form interpretable by a fixed, fairly simple piece of code. Finite state machines are particularly amenable to this form of attack, but almost any program that involves the `parsing' of some abstract sort of input into a sequence of some independent `actions' can be constructed profitably as a data-driven algorithm.

Perhaps the most intriguing aspect of this kind of design is that the tables can sometimes be generated by another program - a parser generator, in the classical case. As a more earthy example, if an operating system is driven by a set of tables that connect I/O requests to the appropriate device drivers, the system may be `configured' by a program that reads a description of the particular devices connected to the machine in question and prints the corresponding tables.
One of the reasons data-driven programs are not common, at least among beginners, is the tyranny of Pascal. Pascal, like its creator, believes firmly in the separation of code and data. It therefore (at least in its original form) has no ability to create initialized data. This flies in the face of the theories of Turing and von Neumann, which define the basic principles of the stored-program computer. Code and data are the same, or at least they can be. How else can you explain how a compiler works? (Functional languages have a similar problem with I/O.)
Another result of the tyranny of Pascal is that beginners don't use function pointers. Using function pointers to encode complexity has some interesting properties. Some of the complexity is passed to the routine pointed to. The routine must obey some standard protocol - it's one of a set of routines invoked identically - but beyond that, what it does is its business alone. The complexity is distributed.
There is this idea of a protocol, in that all functions used similarly must behave similarly. This makes for easy documentation, testing, growth and even making the program run distributed over a network - the protocol can be encoded as remote procedure calls.
I argue that clear use of function pointers is the heart of object-oriented programming. Given a set of operations you want to perform on data, and a set of data types you want to respond to those operations, the easiest way to put the program together is with a group of function pointers for each type. This, in a nutshell, defines class and method. The O-O languages give you more of course - prettier syntax, derived types and so on - but conceptually they provide little extra.
Combining data-driven programs with function pointers leads to an astonishingly expressive way of working, a way that, in my experience, has often led to pleasant surprises. Even without a special O-O language, you can get 90% of the benefit for no extra work and be more in control of the result. I cannot recommend an implementation style more highly. All the programs I have organized this way have survived comfortably after much development - far better than with less disciplined approaches. Maybe that's it: the discipline it forces pays off handsomely in the long run.
Simple rule: include files should never include include files. If instead they state (in comments or implicitly) what files they need to have included first, the problem of deciding which files to include is pushed to the user (programmer) but in a way that's easy to handle and that, by construction, avoids multiple inclusions. Multiple inclusions are a bane of systems programming. It's not rare to have files included five or more times to compile a single C source file. The Unix /usr/include/sys stuff is terrible this way.

There's a little dance involving #ifdef's that can prevent a file being read twice, but it's usually done wrong in practice - the #ifdef's are in the file itself, not the file that includes it. The result is often thousands of needless lines of code passing through the lexical analyzer, which is (in good compilers) the most expensive phase.
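The dance looks like this, sketched with a hypothetical header foo.h (the name and guard symbol are made up for illustration):

```c
/* The usual, wasteful form: the guard is inside foo.h itself, so the
 * preprocessor still opens and scans the whole file on every inclusion. */
/* foo.h: */
#ifndef FOO_H
#define FOO_H
/* ...declarations... */
#endif

/* The form the dance wants: the test is in the including file, so a
 * second inclusion never even opens foo.h. */
/* bar.c: */
#ifndef FOO_H
#include "foo.h"
#endif
```

Either way, the rule above - never include include files from include files - avoids the problem entirely.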
Just follow the simple rule.