Movatterモバイル変換


[0]ホーム

URL:




Chapter 26: Concrete Examples

In this chapter concrete examples ofC++ programs, classes and templatesare presented. Topics covered by theC++ Annotations such as virtualfunctions,static members, etc. are illustrated in this chapter. Theexamples roughly follow the organization of earlier chapters.

As an additional topic, not just providing examples ofC++ the subjects ofscanner andparser generators are covered. We show how these tools maybe used inC++ programs. These additional examples assume a certainfamiliarity with the concepts underlying these tools, like grammars,parse-trees and parse-tree decoration. Once the input for a program exceeds a certainlevel of complexity, it's attractive to use scanner- and parser-generators tocreate the code doing the actual input processing. One of theexamples in this chapter describes the usage of thesetools in aC++ environment.

26.1: Using file descriptors with `streambuf' classes

26.1.1: Classes for output operations

Reading and writing from and tofile descriptorsare not part of theC++ standard. But on most operating systems filedescriptorsare available and can be considered adevice. It seemsnatural to use the classstd::streambuf as the starting pointfor constructing classes interfacing such file descriptor devices.

Below we'll construct classes that can be used to write to a device givenits file descriptor. The devices may be files, but they could also bepipesorsockets. Section26.1.2 covers reading from such devices; section26.2.3 reconsiders redirection, discussed earlier in section6.6.2.

Using thestreambuf class as a base class it is relatively easy todesign classes for output operations. The only member function thatmustbe overridden is the (virtual) memberint streambuf::overflow(int c). This member's responsibility is towrite characters to the device. Iffd is an output file descriptor and ifoutput should not be buffered then the memberoverflow() can simply beimplemented as:

    class UnbufferedFD: public std::streambuf    {        public:            int overflow(int c) override;            ...    };    int UnbufferedFD::overflow(int c)    {        if (c != EOF)        {            if (write(d_fd, &c, 1) != 1)                return EOF;        }        return c;    }

The argument received byoverflow is either written to the filedescriptor (and returned fromoverflow), orEOF is returned.

This simple function does not use output buffering. For various reasons,using a buffer is usually a good idea (see also the next section).

When output buffering is used, theoverflow member is a bit morecomplex as it is only called when the buffer is full. Once the buffer is full,wefirst have to flush the buffer. Flushing the buffer is theresponsibility of the (virtual) functionstreambuf::sync. Sincesync is a virtual function, classes derived fromstreambuf mayredefinesync to flush a bufferstreambuf itself doesn't know about.

Overridingsync and using it inoverflow is not all that has to bedone. When the object of the class defining the buffer reaches the end of itslifetime the buffer may be only partially full. In that situation the buffermust also be flushed. This is easily done by simply callingsync from theclass's destructor.

Now that we've considered the consequences of using an output buffer,we're almost ready to design our derived class. Several more featuresare added as well, though:

To save some space in theC++ Annotations, the successful completion of thefunctions designed here is not checked in the example code. In `real life'implementations these checks should of course not be omitted. Our classOFdnStreambuf has the following characteristics: The next program uses theOFfdStreambuf class to copy its standardinput to file descriptorSTDOUT_FILENO, which is the symbolic name of thefile descriptor used for the standard output:
    #include <string>    #include <iostream>    #include <istream>    #include "fdout.h"    using namespace std;    int main(int argc, char **argv)    {        OFdnStreambuf   fds(STDOUT_FILENO, 500);        ostream         os(&fds);        switch (argc)        {            case 1:                for (string  s; getline(cin, s); )                    os << s << '\n';                os << "COPIED cin LINE BY LINE\n";            break;            case 2:                cin >> os.rdbuf();      // Alternatively, use:  cin >> &fds;                os << "COPIED cin BY EXTRACTING TO os.rdbuf()\n";            break;            case 3:                os << cin.rdbuf();                os << "COPIED cin BY INSERTING cin.rdbuf() into os\n";            break;        }    }

26.1.2: Classes for input operations

When classes for input operation are derived fromstd::streambuf, they should be provided with an input bufferof at least one character. The one-character input buffer allows for the useof the member functionsistream::putback oristream::ungetc. Strictlyspeaking it is not necessary to implement a buffer in classes derived fromstreambuf. But using buffers in these classes is strongly advised. Theirimplementation is very simple and straightforward and the applicability ofsuch classes is greatly improved. Therefore, all our classes derived from theclassstreambuf define a buffer ofat least one character.

26.1.2.1: Using a one-character buffer

When deriving a class (e.g.,IFdStreambuf) fromstreambuf using abuffer of one character, at least its memberstreambuf::underflow should be overridden, as this member eventuallyreceives all requests for input. The memberstreambuf::setg is used to inform thestreambuf base class of thesize and location of the input buffer, so that it is able to set up its inputbuffer pointers accordingly. This ensures thatstreambuf::eback,streambuf::gptr, andstreambuf::egptr return correct values.

The classIFdStreambuf is designed like this:

The followingmain function shows howIFdStreambuf can be used:
    int main()    {        IFdStreambuf fds(STDIN_FILENO);        istream      is(&fds);        cout << is.rdbuf();    }

26.1.2.2: Using an n-character buffer

How complex would things get if we decided to use a buffer ofsubstantial size? Not that complex. The following class allows us to specifythe size of a buffer, but apart from that it is basically the same class asIFdStreambuf developed in the previous section. To make things a bit moreinteresting, in the classIFdNStreambuf developed here, the memberstreambuf::xsgetn is also overridden, to optimize reading aseries of characters. Also a default constructor is provided that can be usedin combination with theopen member to construct anistream objectbefore the file descriptor becomes available. In that case, once thedescriptor becomes available, theopen member can be used to initiatethe object's buffer. Later, in section26.2, we'll encounter such asituation.

To save some space, the success of various calls was not checked. In `reallife' implementations, these checks should of course not be omitted. TheclassIFdNStreambuf has the following characteristics:

The member functionxsgetn is called bystreambuf::sgetn,which is astreambuf member. Here is an example illustrating the use ofthis member function with anIFdNStreambuf object:
    #include <unistd.h>    #include <iostream>    #include <istream>    #include "ifdnbuf.h"    using namespace std;    int main()    {                                    // internally: 30 char buffer        IFdNStreambuf fds(STDIN_FILENO, 30);        char buf[80];               // main() reads blocks of 80                                    // chars        while (true)        {            size_t n = fds.sgetn(buf, 80);            if (n == 0)                break;            cout.write(buf, n);        }    }

26.1.2.3: Seeking positions in `streambuf' objects

When devices supportseek operations, classes derived fromstd::streambuf should override the membersstreambuf::seekoff andstreambuf::seekpos. The classIFdSeek, developed in this section, can be used to read information fromdevices supporting seek operations. The classIFdSeek was derived fromIFdStreambuf, so it uses a character buffer of just one character. Thefacilities to perform seek operations, which are added to our new classIFdSeek, ensure that the input buffer is reset when a seek operation isrequested. The class could also be derived from the classIFdNStreambuf. In that case the arguments to reset the input buffermust be adapted so that its second and third parameters point beyond theavailable input buffer. Let's have a look at the characteristics ofIFdSeek: Here is an example of a program using the classIFdSeek. Ifthis program is given its own source file using input redirection thenseeking is supported (and with the exception of the first line, every otherline is shown twice):
    #include "fdinseek.h"    #include <string>    #include <iostream>    #include <istream>    #include <iomanip>    using namespace std;    int main()    {        IFdSeek fds(0);        istream is(&fds);        string  s;        while (true)        {            if (!getline(is, s))                break;            streampos pos = is.tellg();            cout << setw(5) << pos << ": `" << s << "'\n";            if (!getline(is, s))                break;            streampos pos2 = is.tellg();            cout << setw(5) << pos2 << ": `" << s << "'\n";            if (!is.seekg(pos))            {                cout << "Seek failed\n";                break;            }        }    }

26.1.2.4: Multiple `unget' calls in `streambuf' objects

Streambuf classes and classes derived fromstreambuf should supportat least ungetting the last read character. Special care must be takenwhenseries ofunget calls must be supported. In this section theconstruction of a class supporting a configurable number ofistream::ungetoristream::putback calls is discussed.

Support for multiple (say `n')unget calls is implemented byreserving an initial section of the input buffer, which is gradually filled upto contain the lastn characters read. The class is implemented asfollows:

An example using FdUnget

The next example program illustrates the use of the classFdUnget. Itreads at most 10 characters from the standard input, stopping atEOF. A guaranteed unget-buffer of 2 characters is defined in a bufferholding 3 characters. Just before reading a character, the program tries tounget at most 6 characters. This is, of course, not possible; but the programnicely ungets as many characters as possible, considering the actualnumber of characters read:

    #include "fdunget.h"    #include <string>    #include <iostream>    #include <istream>    using namespace std;    int main()    {        FdUnget fds(0, 3, 2);        istream is(&fds);        char    c;        for (int idx = 0; idx < 10; ++idx)        {            cout << "after reading " << idx << " characters:\n";            for (int ug = 0; ug <= 6; ++ug)            {                if (!is.unget())                {                    cout                    << "\tunget failed at attempt " << (ug + 1) << "\n"                    << "\trereading: '";                    is.clear();                    while (ug--)                    {                        is.get(c);                        cout << c;                    }                    cout << "'\n";                    break;                }            }            if (!is.get(c))            {                cout << " reached\n";                break;            }            cout << "Next character: " << c << '\n';        }    }    /*        Generated output after 'echo abcde | program':        after reading 0 characters:                unget failed at attempt 1                rereading: ''        Next character: a        after reading 1 characters:                unget failed at attempt 2                rereading: 'a'        Next character: b        after reading 2 characters:                unget failed at attempt 3                rereading: 'ab'        Next character: c        after reading 3 characters:                unget failed at attempt 4                rereading: 'abc'        Next character: d        after reading 4 characters:                unget failed at attempt 4                rereading: 'bcd'        Next character: e        after reading 5 characters:                unget failed at attempt 4                rereading: 'cde'        Next character:        after reading 6 characters:                unget failed at attempt 4                rereading: 'de        '         reached    */

26.1.3: Fixed-sized field extraction from istream objects

Usually when extracting information fromistream objectsoperator>>, thestandard extraction operator is perfectly suited for the task as in mostcases the extracted fields are white-space (or otherwise clearly) separatedfrom each other. But this does not hold true in all situations. For example,when a web-form is posted to some processing script or program, the receivingprogram may receive the form field's values asurl-encodedcharacters: letters and digits are sent unaltered, blanks are sent as+characters, and all other characters start with% followed by thecharacter'sascii-value represented by its two digit hexadecimal value.

When decoding url-encoded information, simple hexadecimal extraction won'twork, as that extracts as many hexadecimal characters as available,instead of just two. Since the lettersa-f` and0-9 are legalhexadecimal characters, a text likeMy name is `Ed', url-encoded as

    My+name+is+%60Ed%27

results in the extraction of the hexadecimal values60ed and27,instead of60 and27. The nameEd disappears from view, which isclearly not what we want.

In this case, having seen the%, we could extract 2 characters, putthem in anistringstream object, and extract the hexadecimal value fromtheistringstream object. A bit cumbersome, but doable. Other approachesare possible as well.

The classFistream forfixed-sized field istream definesanistream class supporting both fixed-sized field extractions andblank-delimited extractions (as well as unformattedread calls). Theclass may be initialized as awrapper around an existingistream, orit can be initialized using the name of an existing file. The class is derivedfromistream, allowing all extractions and operations supported byistreams in general.Fistream defines the following data members:

Here is the initial section ofFistream's class interface:
    class Fistream: public std::istream    {        std::unique_ptr<std::filebuf> d_filebuf;        std::streambuf *d_streambuf;        std::istringstream d_iss;        size_t d_width;
As stated,Fistream objects can be constructed from either afilename or an existingistream object. The class interface thereforedeclares two constructors:
            Fistream(std::istream &stream);            Fistream(char const *name,                std::ios::openmode mode = std::ios::in);
When anFistream object is constructed using an existingistreamobject, theFistream'sistream part simply uses thestream'sstreambuf object:
Fistream::Fistream(istream &stream):    istream(stream.rdbuf()),    d_streambuf(rdbuf()),    d_width(0){}
When anfstream object is constructed using a filename, theistream base initializer is given a newfilebuf object to be used asitsstreambuf. Since the class's data members are not initialized beforethe class's base class has been constructed,d_filebuf can only beinitialized thereafter. By then, thefilebuf is only available asrdbuf, returning astreambuf. However, as it is actually afilebuf, astatic_cast is used to cast thestreambuf pointerreturned byrdbuf toa filebuf *, sod_filebuf can be initialized:
Fistream::Fistream(char const *name, ios::openmode mode):    istream(new filebuf()),    d_filebuf(static_cast<filebuf *>(rdbuf())),    d_streambuf(d_filebuf.get()),    d_width(0){    d_filebuf->open(name, mode);}

26.1.3.1: Member functions and example

There is only one additional public member:setField(field const&). This member defines the size of the next field to extract. Itsparameter is a reference to afield class, amanipulator classdefining the width of the next field.

Since afield & is mentioned inFistream's interface,fieldmust be declared beforeFistream's interface starts. The classfielditself is simple and declaresFistream as its friend. It has two datamembers:d_width specifies the width of the next field, andd_newWidthwhich is set totrue ifd_width's value should actually be used. Ifd_newWidth is false,Fistream returns to its standard extractionmode. The classfield has two constructors: a defaultconstructor, settingd_newWidth tofalse, and a second constructorexpecting the width of the next field to extract as its value. Here is theclassfield:

    class field    {        friend class Fistream;        size_t d_width;        bool     d_newWidth;        public:            field(size_t width);            field();    };    inline field::field(size_t width)    :        d_width(width),        d_newWidth(true)    {}    inline field::field()    :        d_newWidth(false)    {}
Sincefield declaresFistream as its friend,setField mayinspectfield's members directly.

Time to return tosetField. This function expects a reference to afield object, initialized in one of three different ways:

Here issetField's implementation:
std::istream &Fistream::setField(field const &params){    if (params.d_newWidth)                  // new field size requested        d_width = params.d_width;           // set new width    if (!d_width)                           // no width?        rdbuf(d_streambuf);                 // return to the old buffer    else        setBuffer();                        // define the extraction buffer    return *this;}

The private membersetBuffer defines a buffer ofd_width + 1characters and usesread to fill the buffer withd_widthcharacters. The buffer is an NTBS. This buffer is used to initialize thed_iss member.Fistream'srdbuf member is used to extract thed_str's data via theFistream object itself:

void Fistream::setBuffer(){    char *buffer = new char[d_width + 1];    rdbuf(d_streambuf);                         // use istream's buffer to    buffer[read(buffer, d_width).gcount()] = 0; // read d_width chars,                                                // terminated by a 0-byte    d_iss.str(buffer);    delete[] buffer;    rdbuf(d_iss.rdbuf());                       // switch buffers}

AlthoughsetField could be used to configureFistream to use ornot to use fixed-sized field extraction, using manipulators is probablypreferable. To allowfield objects to be used as manipulators anoverloaded extraction operator was defined. This extraction operator acceptsistream & and afield const & objects. Using this extractionoperator, statements like

fis >> field(2) >> x >> field(0);

are possible (assumingfis is aFistream object). Here is theoverloadedoperator>>, as well as its declaration:

istream &std::operator>>(istream &str, field const &params){    return static_cast<Fistream *>(&str)->setField(params);}
Declaration:
namespace std{    istream &operator>>(istream &str, FBB::field const &params);}

Finally, an example. The following program uses aFistream object tourl-decode url-encoded information appearing at its standard input:

    int main()    {        Fistream fis(cin);        fis >> hex;        while (true)        {            size_t x;            switch (x = fis.get())            {                case '\n':                    cout << '\n';                break;                case '+':                    cout << ' ';                break;                case '%':                    fis >> field(2) >> x >> field(0);                // FALLING THROUGH                default:                    cout << static_cast<char>(x);                break;                case EOF:                return 0;            }        }    }    /*        Generated output after:            echo My+name+is+%60Ed%27 | a.out        My name is `Ed'    */

26.2: The `fork' system call

From theC programming language thefork system call is wellknown. When a program needs to start a new process,system can be used.The functionsystem requires the program to wait for thechild process to terminate. The more general way to spawn subprocesses is to usefork.

In this section we investigate howC++ can be used to wrap classes arounda complex system call likefork. Much of what follows in this sectiondirectly applies to the Unix operating system, and the discussion thereforefocuses on that operating system. Other systems usually providecomparable facilities. What follows is closely related to theTemplate Design Pattern (cf.Gamma et al. (1995)Design Patterns, Addison-Wesley)

Whenfork is called, the current program is duplicated in memory, thuscreating a new process. Following this duplication both processes continuetheir execution just below thefork system call. The two processes mayinspectfork's return value: the return value in theoriginal process (called theparent process) differs from the returnvalue in the newly created process (called thechild process):

26.2.1: A basic Fork class

A basicFork class should hide all bookkeeping details of a systemcall likefork from its users. The classFork developed here doesjust that. The class itself only ensures the proper execution of theforksystem call. Normally,fork is called to start a child process, usuallyboiling down to the execution of a separate process. This child process mayexpect input at its standard input stream and/or may generate output to itsstandard output and/or standard error streams.Fork does not know allthis, and does not have to know what the child process will do.Forkobjects should be able to start their child processes.

Fork's constructor cannot know what actions its childprocess should perform. Similarly, it cannot know what actions the parentprocess should perform. For these kind of situations, thetemplate method design pattern was developed. According to Gamma c.s., thetemplate method designpattern

``Define(s) the skeleton of an algorithm in an operation, deferring some steps to subclasses. [The] Template Method (design pattern) lets subclasses redefine certain steps of an algorithm, without changing the algorithm's structure.''

This design pattern allows us to define anabstract base class already providing the essential steps related to thefork system call,deferring the implementation of other parts of thefork system call tosubclasses.

TheFork abstract base class has the following characteristics:

26.2.2: Parents and Children

The member functionfork calls the system functionfork(Caution: since the system functionfork is called by a memberfunction having the same name, the:: scope resolution operator must beused to prevent a recursive call of the member function itself).The function::fork's return value determines whetherparentProcessorchildProcess is called. Maybe redirection isnecessary.Fork::fork's implementation callschildRedirectionsjust before callingchildProcess, andparentRedirections justbefore callingparentProcess:
    #include "fork.ih"    void Fork::fork()    {        if ((d_pid = ::fork()) < 0)            throw "Fork::fork() failed";        if (d_pid == 0)                 // childprocess has pid == 0        {            childRedirections();            childProcess();            exit(1);                    // we shouldn't come here:        }                               // childProcess() should exit        parentRedirections();        parentProcess();    }
Infork.cc the class'sinternal header filefork.ih is included. This header file takes care of the inclusion of thenecessary system header files, as well as the inclusion offork.hitself. Its implementation is:
    #include "fork.h"    #include <cstdlib>    #include <unistd.h>    #include <sys/types.h>    #include <sys/wait.h>

Child processes should not return: once they have completed their tasks,they should terminate. This happens automatically when the child processperforms a call to a member of theexec... family, but if the childitself remains active, then it must make sure that it terminates properly. Achild process normally usesexit to terminate itself, but note thatexit prevents the activation of destructors of objects defined at the same or more superficial nesting levels than the level atwhichexit is called. Destructors of globally defined objectsareactivated whenexit is used. When usingexit to terminatechildProcess, it should either itself call a support member functiondefining all nested objects it needs, or it should define all its objects in acompound statement (e.g., using athrow block) callingexit beyondthe compound statement.

Parent processes should normally wait for their children to complete.Terminating child processes inform their parents that they are about toterminate by sending asignal that should be caught by their parents. Ifchild processes terminate and their parent processes do not catch thosesignals then such child processes remain visible as so-calledzombieprocesses.

If parent processes must wait for their children to complete, they maycall the memberwaitForChild. This member returns the exit status of achild process to its parent.

There exists a situation where thechild processcontinues tolive, but theparent dies. This is a fairly natural event: parents tend todie before their children do. In our context (i.e.C++), this is called adaemon program. In a daemon the parent process dies and the child programcontinues to run as a child of the basicinit process. Again, when thechild eventually dies a signal is sent to its `step-parent'init. Thisdoes not create a zombie asinit catches the termination signals of allits (step-) children. The construction of a daemon process is very simple,given the availability of the classFork (cf. section26.2.4).

26.2.3: Redirection revisited

Earlier, in section6.6.2 streams were redirectedusing theios::rdbuf member function. By assigning thestreambuf of a stream to another stream, both stream objects access thesamestreambuf, thus implementing redirection at the level of theprogramming language itself.

This may be fine within the context of aC++ program, but once weleave that context the redirection terminates. The operating system does notknow aboutstreambuf objects. This situation is encountered, e.g., when aprogram uses asystem call to start a subprogram. The example program atthe end of this section usesC++ redirection to redirect the informationinserted intocout to a file, and then calls

    system("echo hello world")

to echo a well-known line of text. Sinceecho writes its informationto the standard output, this would be the program's redirected file if theoperating system would recognizeC++'s redirection.

But redirection doesn't happen. Instead,hello world still appears atthe program's standard output and the redirected file is left untouched. Towritehello world to the redirected file redirection must be realized atthe operating system level. Some operating systems (e.g.,Unix andfriends) provide system calls likedup anddup2 to accomplishthis. Examples of the use of these system calls are given in section26.2.5.

Here is the example of thefailing redirection at the system levelfollowingC++ redirection usingstreambuf redirection:

    #include <iostream>    #include <fstream>    #include <cstdlib>    using namespace std;    int main()    {        ofstream of("outfile");        streambuf *buf = cout.rdbuf(of.rdbuf());        cout << "To the of stream\n";        system("echo hello world");        cout << "To the of stream\n";        cout.rdbuf(buf);    }    /*        Generated output: on the file `outfile'        To the of stream        To the of stream        On standard output:        hello world    */

26.2.4: The `Daemon' program

Applications exist in which the only purpose offork is to start achild process. The parent process terminates immediately after spawning thechild process. If this happens, the child process continues to run as a childprocess ofinit, the always running first process onUnix systems. Sucha process is often called adaemon, running as abackground process.

Although the next example can easily be constructed as a plainCprogram, it was included in theC++ Annotations because it is so closelyrelated to the current discussion of theFork class. I thought aboutadding adaemon member to that class, but eventually decided against itbecause the construction of a daemon program is very simple and requires nofeatures other than those currently offered by the classFork. Here is anexample illustrating the construction of such a daemon program. Its childprocess doesn't doexit butthrow 0 which is caught by thecatchclause of the child'smain function. Doing this ensures that any objectsdefined by the child process are properly destroyed:

    #include <iostream>    #include <unistd.h>    #include "fork.h"    class Daemon: public Fork    {        void parentProcess() override       // the parent does nothing.        {}        void childProcess() override        // actions by the child        {            sleep(3);                                            // just a message...            std::cout << "Hello from the child process\n";            throw 0;                        // The child process ends        }    };    int main()    try    {        Daemon{}.fork();    }    catch(...)    {}    /*        Generated output:    The next command prompt, then after 3 seconds:    Hello from the child process    */

26.2.5: The class `Pipe'

Redirection at the system level requires the use offile descriptor, created by thepipe systemcall. When two processes want to communicate using such file descriptors, thefollowing happens: Though basically simple, errors may easily creep in. Functions of filedescriptors available to the two processes (child or parent) may easily getmixed up. To prevent bookkeeping errors, the bookkeeping may be properly setup once, to be hidden thereafter inside a class like thePipe classdeveloped here. Let's have a look at its characteristics (before usingfunctions likepipe anddup the compiler must have read the<unistd.h> header file): Now that redirection can be configured easily using one or morePipeobjects, we'll useFork andPipe in various example programs.

26.2.6: The class `ParentSlurp'

The classParentSlurp, derived fromFork, starts a child processexecuting a stand-alone program (like/bin/ls). The (standard) output ofthe executed program is not shown on the screen but is read by the parentprocess.

For demonstration purposes the parent process writes the lines itreceives to its standard output stream, prepending linenumbers to thelines. It is attractive to redirect the parent's standardinput stream toallow the parent to read theoutput from the child process using itsstd::cininput stream. Therefore, the only pipe in the program is usedas aninput pipe for the parent, and anoutput pipe for the child.

The classParentSlurp has the following characteristics:

The following program simply constructs aParentSlurp object, andcalls itsfork() member. Its output consists of a numbered list of filesin the directory where the program is started. Note that the program alsoneeds thefork.o, pipe.o andwaitforchild.o object files (seeearlier sources):
    int main()    {        ParentSlurp{}.fork();    }    /*        Generated Output (example only, actually obtained output may differ):        1: a.out        2: bitand.h        3: bitfunctional        4: bitnot.h        5: daemon.cc        6: fdinseek.cc        7: fdinseek.h        ...    */

26.2.7: Communicating with multiple children

The next step up the ladder is the construction of a child-processmonitor. Here, the parent process is responsible for all its child processes,but it also must read their standard output. The user enters information atthe standard input of the parent process. A simplecommand language isused for this: If a child process hasn't received text for some time it will complain bysending a message to the parent-process. Those messages are simply transmittedto the user by copying them to the standard output stream.

A problem with programs like our monitor is that they allowasynchronous input from multiple sources. Input may appear at thestandard input as well as at the input-sides of pipes. Also, multiple outputchannels are used. To handle situations like these, theselect systemcall was developed.

26.2.7.1: The class `Selector': interface

Theselect system call was developed to handle asynchronousI/O multiplexing. Theselect system call is used to handle, e.g., input appearingsimultaneously at a set of file descriptors.

Theselect function is rather complex, and its full discussion isbeyond theC++ Annotations' scope. By encapsulatingselect in aclassSelector, hiding its details and offering an intuitively attractiveinterface, its use is simplified. TheSelector class has thesefeatures:

26.2.7.2: The class `Selector': implementation

Selector's member functions serve the following tasks: The class's remaining (two) members are support members, and should not beused by non-member functions. Therefore, they are declared in the class'sprivate section:

26.2.7.3: The class `Monitor': interface

Themonitor program uses aMonitor object doing most of thework. The classMonitor's public interface only offers a defaultconstructor and one member,run, to perform its tasks. All other memberfunctions are located in the class'sprivate section.

Monitor defines theprivate enumCommands, symbolicallylisting the various commands its input language supports, as well as severaldata members. Among the data members are aSelector object and amapusing child order numbers as its keys and pointer toChild objects (seesection26.2.7.7) as its values. Furthermore,Monitor has a static arraymembers_handler[], storing pointers to member functions handling usercommands.

A destructor should be implemented as well, but its implementation is leftas an exercise to the reader. Here isMonitor's interface, including theinterface of the nested classFind that is used to create a functionobject:

    class Monitor    {        enum Commands        {            UNKNOWN,            START,            EXIT,            STOP,            TEXT,            sizeofCommands        };        using MapIntChild = std::map<int, std::shared_ptr<Child>>;        friend class Find;        class Find        {            int     d_nr;            public:                Find(int nr);                bool operator()(MapIntChild::value_type &vt) const;        };        Selector    d_selector;        int         d_nr;        MapIntChild d_child;        static void (Monitor::*s_handler[])(int, std::string const &);        static int s_initialize;        public:            enum Done            {};            Monitor();            void run();        private:            static void killChild(MapIntChild::value_type it);            static int initialize();            Commands    next(int *value, std::string *line);            void    processInput();            void    processChild(int fd);            void    createNewChild(int, std::string const &);            void    exiting(int = 0, std::string const &msg = std::string{});            void    sendChild(int value, std::string const &line);            void    stopChild(int value, std::string const &);            void    unknown(int, std::string const &);    };

Since there's only one non-class type data member, the class's constructoris a very simple function which could be implemented inline:

    inline Monitor::Monitor()    :        d_nr(0)    {}

26.2.7.4: The class `Monitor': s_handler

The arrays_handler, storing pointers to functions needs to be initialized aswell. This can be accomplished in several ways:

26.2.7.5: The class `Monitor': the member `run'

Monitor's core activities are performed byrun. Itperforms the following tasks: Here isrun's implementation ands_initialize's definition:
    #include "monitor.ih"    int Monitor::s_initialize = Monitor::initialize();    void Monitor::run()    {        d_selector.addReadFd(STDIN_FILENO);        while (true)        {            cout << "? " << flush;            try            {                d_selector.wait();                int fd;                while ((fd = d_selector.readFd()) != -1)                {                    if (fd == STDIN_FILENO)                        processInput();                    else                        processChild(fd);                }                cout << "NEXT ...\n";            }            catch (char const *msg)            {                exiting(1, msg);            }        }    }

The member functionprocessInput reads the commands entered by theuser using the program's standard input stream. The member itself is rathersimple. It callsnext to obtain the next command entered by the user, andthen calls the corresponding function using the matching element of thes_handler[] array. Here are the membersprocessInput andnext:

    void Monitor::processInput()    {        string line;        int    value;        Commands cmd = next(&value, &line);        (this->*s_handler[cmd])(value, line);    }
    Monitor::Commands Monitor::next(int *value, string *line)    {        if (!getline(cin, *line))            exiting(1, "Monitor::next(): reading cin failed");        if (*line == "start")            return START;        if (*line == "exit" || *line == "quit")        {            *value = 0;            return EXIT;        }        if (line->find("stop") == 0)        {            istringstream istr(line->substr(4));            istr >> *value;            return !istr ? UNKNOWN : STOP;        }        istringstream istr(line->c_str());        istr >> *value;        if (istr)        {            getline(istr, *line);            return TEXT;        }        return UNKNOWN;    }

All other input sensed byd_select is created by childprocesses. Becaused_select'sreadFd member returns the correspondinginput file descriptor, this descriptor can be passed toprocessChild. Using aIFdStreambuf (see section26.1.2.1), itsinformation is read from an input stream. The communication protocol used hereis rather basic. For every line of input sent to a child, the child replies bysending back exactly one line of text. This line is then read byprocessChild:

    void Monitor::processChild(int fd)    {        IFdStreambuf ifdbuf(fd);        istream istr(&ifdbuf);        string line;        getline(istr, line);        cout << d_child[fd]->pid() << ": " << line << '\n';    }
The constructiond_child[fd]->pid() used in the above source deservessome special attention.Monitor defines the data membermap<int,shared_ptr<Child>> d_child. This map contains the child's order numberas its key, and a (shared) pointer to theChild object as its value. Ashared pointer is used here, rather than aChild object, since we want touse the facilities offered by the map, but don't want to copy aChildobject time and again.

26.2.7.6: The class `Monitor': example

Now thatrun's implementation has been covered, we'll concentrate onthe various commands users might enter: The program'smain function is simple and needs no further comment:
    int main()    try    {        Monitor{}.run();    }    catch (int exitValue)    {        return exitValue;    }

26.2.7.7: The class `Child'

When theMonitor object starts a child process, it creates an objectof the classChild. TheChild class is derived from the classFork, allowing it to operate as adaemon (as discussed in theprevious section). SinceChild is a daemon class, we know that its parentprocess must be defined as an empty function. ItschildProcess memberhas a non-empty implementation. Here are the characteristics of the classChild:

26.3: Adding binary operators to classes

As we've seen in section11.7 binary operators expectingconst & arguments can be implemented using a member implementing theoperation, only offering the basic exception guarantee.This latter function can in turn be implemented using the binaryassignment member. The following examples illustrated this approach for afictitious classBinary:
    class Binary    {        public:            Binary();            Binary(int value);                // copy and move constructors are available by default, or                // they can be explicitly declared and implemented.            Binary &operator+=(Binary const &other) &;    // see the text            Binary &&operator+=(Binary const &other) &&;        private:            void add(Binary const &rhs);        friend Binary operator+(Binary const &lhs, Binary const &rhs);        friend Binary operator+(Binary &&lhs, Binary const &rhs);    };

Eventually, the implementation of binary operators depends on the availabilityof the member implementing the basic binary operation, modifying the objectcalling that member (i.e.,void Binary::add(Binary const &) in theexample).

Since template functions are not instantiated before they are actually used wecan call non-existing functions from template functions that are neverinstantiated. If such a template function is never instantiated, nothinghappens; if it is (accidentally) instantiated, then the compiler generates anerror message, complaining about the missing function.

This allows us to implement all binary operators, movable and non-movable, astemplates. In the following subsections we develop the class templateBinops, prividing binary operators. A complete implementation of a classDerived illustrating how addition and insertion operators can be added toa class is provided in the fileannotations/yo/concrete/examples/binopclasses.cc in theC++ Annotations'source archive.

26.3.1: Merely using operators

In section11.7 addition operators are implemented in terms ofa support memberadd. This is less attractive when developing functiontemplates, asadd is a private member, requiring us to provide frienddeclarations for all function templates so they may access the privateaddmember.

At the end of section11.7 we saw thatadd's implementationcan be provided byoperator+=(Class const &rhs) &&. This operator maythereupon be used when implementing the remaining addition operators:

    inline Binary &operator+=(Binary const &rhs) &    {        return *this = Binary{*this} += rhs;            }    Binary operator+(Binary &&lhs, Binary const &rhs)    {        return std::move(lhs) += rhs;    }    Binary operator+(Binary const &lhs, Binary const &rhs)    {        return Binary{lhs} += rhs;    }

In this implementationadd is no longer required. The plain binaryoperators are free functions, which supposedly can easily be converted tofunction templates. E.g.,

    template <typename Binary>    Binary operator+(Binary const &lhs, Binary const &rhs)    {        return Binary{lhs} += rhs;    }

26.3.1.1: To namespace or not to namespace?

When using the function templateBinary operator+(Binary const &lhs, Binaryconst &rhs), however, we may encounter a subtle and unexpectedcomplication. Consider the following program. When run, it displays the value12, rather than 1:
    enum Values    {        ZERO,        ONE    };        template <typename Tp>    Tp operator+(Tp const &lhs, Tp const &rhs)    {        return static_cast<Tp>(12);    };        int main()    {        cout << (ZERO + ONE);       // shows 12    }

This complication can be avoided by defining the operators in their ownnamespace, but then all classes using the binary operator also have to bedefined in that namespace, which is not a very attractiverestriction. Fortunately, there is a better alternative: using the CRTP(cf. section22.12).

26.3.2: The CRTP and defining operator function templates

When deriving classes from a class templateBinops, using the CRTP theoperators are defined for arguments of the classBinops<Derived>: a baseclass receiving the derived class as its template argument.

Thus the classBinops as well as the additional operators are defined,expectingBinops<Derived> type of arguments:

    template <class Derived>    struct Binops    {        Derived &operator+=(Derived const &rhs) &;    };    template <typename Derived>    Derived operator+(Binops<Derived> const &lhs, Derived const &rhs)    {        return Derived{static_cast<Derived const &>(lhs) } += rhs;    }    // analogous implementation for Binops<Derived> &&lhs

This way, a class that derives fromBinops, and that provides anoperator+= member which is bound to an rvalue reference object, suddenlyalso provides all other binary addition operators:

    class Derived: public Binops<Derived>    {        ...        public:            ...            Derived &&operator+=(Derived const &rhs) &&    };

All, but one....

The operator that's not available is the compound addition operator,bound to an lvalue reference. As its function name is identical to the one inthe classDerived, it is not automatically visible at the user level.

Although this problem can simply be solved by providing the classDerivedwith ausing Binops<Derived>::operator+= declaration, it is not a veryattractive solution, as separate using declarations have to be provided foreach binary operator that is implemented in the classDerived.

But amuch more attractive solution exists. A beautiful out-of-the-boxsolution, completely avoiding the hidden base class operator, was proposedbyWiebe-Marten Wijnja. Wiebe-Marten conjectured thatoperator+=, boundto an lvalue reference could also very well be defined as afreefunction. In that case no inheritance is used and therefore no function hidingoccurs. Consequently, theusing directive can be avoided.

The implementation of this freeoperator+= function looks like this:

    template <class Derived>    Derived &operator+=(Binops<Derived> &lhs, Derived const &rhs)     {        Derived tmp{ Derived{ static_cast<Derived &>(lhs) } += rhs };        tmp.swap(static_cast<Derived &>(lhs));        return static_cast<Derived &>(lhs);    }

The flexibility of this design can be further augmented once we realize thatthe right-hand side operand doesn't have to be aDerived classobject. Consideroperator<<: oftentimes shifts are bit-shifts, using asize_t to specify the number of bits to shift. In fact, the type of theright-hand side operand can completely be generalized by defining a secondtemplate type parameter, which is used to specify the right-hand side'soperand type. It's up to theDerived class to specify the argument type ofitsoperator+= (or any other binary compound operator), whereafter thecompiler will deduce the types of the right-hand side operands for theremaining binary operators. Here is the final implementation of the freeoperator+= function:

    template <class Derived, typename Rhs>    Derived &operator+=(Binops<Derived> &lhs, Rhs const &rhs)     {        Derived tmp{ Derived{ static_cast<Derived &>(lhs) } += rhs };        tmp.swap(static_cast<Derived &>(lhs));        return static_cast<Derived &>(lhs);    }

26.3.3: Insertion and extraction

Classes also frequently define overloaded insertion and extractionoperators. Since there are no `compound insertion operators' the design shownso far cannot be used when overloading these operators. Instead usingstandardized member function signatures is advocated:voidinsert(std::ostream &out) const to insert an object into anostream andvoid extract(std::istream &in) const to extract an object from anistream. As these functions are only used by, respectively, the insertionand extraction operators, they can be declared in theDerived class'sprivate interface. Instead of declaring the insertion and extraction operatorsfriends of the classDerived a singlefriend Binops<Derived> isspecified. This allowsBinops<Derived> to define private, inlineiWrapandeWrap members, merely calling, respectively,Derived's insert andextract members:
    template <typename Derived>    inline void Binops<Derived>::iWrap(std::ostream &out) const    {        static_cast<Derived const &>(*this).insert(out);    }

Binops<Derived> then declares the insertion and extraction operators asits friends, allowing these operators to call, respectively,iWrap andeWrap. Note that the software engineer designing the classDerivedonly has to provide afriend Binops<Derived> declaration. Here is theimplementation of the overloaded insertion operator:

    template <typename Derived>    std::ostream &operator<<(std::ostream &out, Binops<Derived> const &obj)    {        obj.iWrap(out);        return out;    }

This completes the coverage of the essentials of a class templateBinopspotentially offering binary operators and insertion/extraction operators forany class derived fromBinops. Finally, as noted at the beginning of thissection, a complete implementation of a class offering addition and insertionoperators is provided in the fileannotations/yo/concrete/examples/binopclasses.cc in theC++ Annotations'source archive.

26.4: Distinguishing lvalues from rvalues with operator[]()

A problem withoperator[] is that it can't distinguish between its use as anlvalue and as anrvalue. It is a familiar misconception tothink that
    Type const &operator[](size_t index) const

is used asrvalue (as the object isn't modified), and that

    Type &operator[](size_t index)

is used aslvalue (as the returned value can be modified).

The compiler, however, distinguishes between the two operators only by theconst-status of the object for whichoperator[] is called. Withconst objects the former operator is called, with non-const objectsthe latter is always used. It is always used, irrespective of it being used aslvalue or rvalue.

Being able to distinguish between lvalues and rvalues can be veryuseful. Consider the situation where a class supportingoperator[] storesdata of a type that is very hard to copy. With data like that referencecounting (e.g., usingshared_ptrs) is probably used to prevent needlesscopying.

As long asoperator[] is used as rvalue there's no need to copy the data,but the informationmust be copied if it is used as lvalue.

TheProxy Design Pattern (cf.Gamma et al. (1995)) canbe used to distinguish between lvalues and rvalues. With the Proxy DesignPattern an object of another class (the Proxy class) is used to act as astand in for the `real thing'. The proxy class offers functionality thatcannot be offered by the data themselves, like distinguishing between its useas lvalue or rvalue. A proxy class can be used in many situations where accessto the real data cannot or should not be directly provided. In this regarditerator types are examples of proxy classes as they create a layerbetween the real data and the software using the data. Proxy classes couldalso dereference pointers in a class storing its data by pointers.

In this section we concentrate on the distinction between usingoperator[]as lvalue and rvalue. Let's assume we have a classLines storing linesfrom a file. Its constructor expects the name of a stream from which thelines are read and it offers a non-constoperator[] that can be used aslvalue or rvalue (theconst version ofoperator[] is omitted as itcauses no confusion because it is always used as rvalue):

    class Lines    {        std::vector<std::string> d_line;        public:            Lines(std::istream &in);            std::string &operator[](size_t idx);    };

To distinguish between lvalues and rvalues we must find distinguishingcharacteristics of lvalues and rvalues that we can exploit. Suchdistinguishing characteristics areoperator= (which is always used aslvalue) and the conversion operator (which is always used as rvalue). Ratherthan havingoperator[] return astring & we can let it return aProxy object that is able to distinguish between its use as lvalueand rvalue.

The classProxy thus needsoperator=(string const &other) (acting aslvalue) andoperator std::string const &() const (acting as rvalue). Do weneed more operators? Thestd::string class also offersoperator+=, sowe should probably implement that operator as well. Plain characters can alsobe assigned tostring objects (even using their numeric values). Asstring objects cannot beconstructed from plain characterspromotion cannot be used withoperator=(string const &other) if theright-hand side argument is a character. Implementingoperator=(charvalue) could therefore also be considered. These additional operators areleft out of the current implementation but `real life' proxy classes shouldconsider implementing these additional operators as well. Another subtlety isthatProxy'soperator std::string const &() const is not usedwhen usingostream's insertion operator oristream's extractionoperator as these operators are implemented as templates not recognizing ourProxy class type. So when stream insertion and extraction is required (itprobably is) thenProxy must be given its own overloaded insertion andextraction operator. Here is an implementation of the overloaded insertionoperator inserting the object for whichProxy is a stand-in:

inline std::ostream &operator<<(std::ostream &out, Lines::Proxy const &proxy){    return out << static_cast<std::string const &>(proxy);}

There's no need for any code (exceptLines) to create or copyProxyobjects.Proxy's constructor should therefore be made private, andProxy can declareLines to be its friend. In fact,Proxy isintimately related toLines and can be defined as a nested class. In therevisedLines classoperator[] no longer returns astring butinstead aProxy is returned. Here is the revisedLines class,including its nestedProxy class:

    class Lines    {        std::vector<std::string> d_line;        public:            class Proxy;            Proxy operator[](size_t idx);            class Proxy            {                friend Proxy Lines::operator[](size_t idx);                std::string &d_str;                Proxy(std::string &str);                public:                    std::string &operator=(std::string const &rhs);                    operator std::string const &() const;            };            Lines(std::istream &in);    };

Proxy's members are very lightweight and can usually be implementedinline:

    inline Lines::Proxy::Proxy(std::string &str)    :        d_str(str)    {}    inline std::string &Lines::Proxy::operator=(std::string const &rhs)    {        return d_str = rhs;    }    inline Lines::Proxy::operator std::string const &() const    {        return d_str;    }

The memberLines::operator[] can also be implemented inline: it merelyreturns aProxy object initialized with thestring associated withindexidx.

Now that the classProxy has been developed it can be used in aprogram. Here is an example using theProxy object as lvalue or rvalue. Onthe surfaceLines objects won't behave differently fromLines objectsusing the original implementation, but by adding an identifyingcoutstatement toProxy's members it can be shown thatoperator[] behavesdifferently when used as lvalue or as rvalue:

    int main()    {        ifstream in("lines.cc");        Lines lines(in);        string s = lines[0];        // rvalue use        lines[0] = s;               // lvalue use        cout << lines[0] << '\n';   // rvalue use        lines[0] = "hello world";   // lvalue use        cout << lines[0] << '\n';   // rvalue use    }

26.5: Implementing a `reverse_iterator'

In section22.14.1 the construction of iterators andreverse iterators was discussed. In that section the iterator was constructedas an inner class in a class derived from a vector of pointers to strings.

An object of this nested iterator class handles the dereferencing of thepointers stored in the vector. This allowed us to sort thestringspointed to by the vector's elements rather than thepointers.

A drawback of this is that the class implementing the iterator is closelytied to the derived class as the iterator class was implemented as a nestedclass. What if we would like to provide any class derived from a containerclass storing pointers with an iterator handling pointer-dereferencing?

In this section a variant of the earlier (nested class) approach isdiscussed. Here the iterator class is defined as a class template, not only parameterizing the data type to which the container's elementspoint but also the container's iterator type itself. Once again, weconcentrate on developing aRandomIterator as it is the most complexiterator type.

Our class is namedRandomPtrIterator, indicating that it is a randomiterator operating on pointer values. The class template defines threetemplate type parameters:

RandomPtrIterator has one private data member, aBaseIterator. Here is the class interface and the constructor'simplementation:

    #include <iterator>    #include <compare>    template <typename Class, typename BaseIterator, typename Type>    struct RandomPtrIterator;    #define PtrIterator RandomPtrIterator<Class, BaseIterator, Type>    #define PtrIteratorValue RandomPtrIterator<Class, BaseIterator, value_type>    template <typename Class, typename BaseIterator, typename Type>    bool operator==(PtrIterator const &lhs, PtrIterator const &rhs);    template <typename Class, typename BaseIterator, typename Type>    auto operator<=>(PtrIterator const &lhs, PtrIterator const &rhs);    template <typename Class, typename BaseIterator, typename Type>    int operator-(PtrIterator const &lhs, PtrIterator const &rhs);    template <typename Class, typename BaseIterator, typename Type>    struct RandomPtrIterator    {        using iterator_category = std::random_access_iterator_tag;        using difference_type   = std::ptrdiff_t;        using value_type        = Type;        using pointer           = value_type *;        using reference         = value_type &;        friend PtrIterator Class::begin();        friend PtrIterator Class::end();        friend bool operator==<>(RandomPtrIterator const &lhs,                                 RandomPtrIterator const &rhs);        friend auto operator<=><>(RandomPtrIterator const &lhs,                                  RandomPtrIterator const &rhs);        friend int operator-<>(RandomPtrIterator const &lhs,                               RandomPtrIterator const &rhs);        private:            BaseIterator d_current;        public:            int operator-(RandomPtrIterator const &rhs) const;            RandomPtrIterator operator+(int step) const;            value_type &operator*() const;            RandomPtrIterator &operator--();            RandomPtrIterator operator--(int);            RandomPtrIterator &operator++();            RandomPtrIterator operator++(int);            RandomPtrIterator operator-(int step) const;            RandomPtrIterator &operator-=(int step);            RandomPtrIterator &operator+=(int step);            value_type *operator->() const;        private:            RandomPtrIterator(BaseIterator const &current);    };    template <typename Class, typename BaseIterator, typename value_type>    PtrIteratorValue::RandomPtrIterator(BaseIterator const &current)    :        d_current(current)    {}
Looking at itsfriend declarations, we see that the membersbeginandend of a classClass, returning aRandomPtrIterator object forthe typesClass, BaseIterator andType are granted access toRandomPtrIterator's private constructor. That is exactly what wewant. TheClass'sbegin andend members are declared asboundfriends.

AllRandomPtrIterator's remaining members are public. SinceRandomPtrIterator is just a generalization of the nested classiterator developed in section22.14.1, re-implementing the requiredmember functions is easy and only requires us to changeiterator intoRandomPtrIterator and to changestd::string intoType. Forexample,operator<, defined in the classiterator as is now implemented as:

    template <typename Class, typename BaseIterator, typename Type>    inline auto operator<=>(PtrIterator const &lhs, PtrIterator const &rhs)    {        return **lhs.d_current <=> **rhs.d_current;    }
Some additional examples:operator*, defined in the classiterator as
inline std::string &StringPtr::iterator::operator*() const{    return **d_current;}
is now implemented as:
    template <typename Class, typename BaseIterator, typename value_type>    value_type &PtrIteratorValue::operator*() const    {        return **d_current;    }
The pre- and postfix increment operators are now implemented as:
    template <typename Class, typename BaseIterator, typename value_type>    PtrIteratorValue &PtrIteratorValue::operator++()    {        ++d_current;        return *this;    }    template <typename Class, typename BaseIterator, typename value_type>    PtrIteratorValue PtrIteratorValue::operator++(int)    {        return RandomPtrIterator(d_current++);    }
Remaining members can be implemented accordingly, their actualimplementations are left as exercises to the reader (or can be obtained fromthecplusplus.yo.zip archive, of course).

Re-implementing the classStringPtr developed in section22.14.1is not difficult either. Apart from including the header file defining theclass templateRandomPtrIterator, it only requires a single modification.Itsiterator using-declaration must now be associated with aRandomPtrIterator. Here is the full class interface and the class's inlinemember definitions:

    #ifndef INCLUDED_STRINGPTR_H_    #define INCLUDED_STRINGPTR_H_    #include <vector>    #include <string>    #include "iterator.h"    class StringPtr: public std::vector<std::string *>    {        public:            using iterator =                    RandomPtrIterator                    <                        StringPtr,                        std::vector<std::string *>::iterator,                        std::string                    >;            using reverse_iterator = std::reverse_iterator<iterator>;            iterator begin();            iterator end();            reverse_iterator rbegin();            reverse_iterator rend();    };    inline StringPtr::iterator StringPtr::begin()    {        return iterator(this->std::vector<std::string *>::begin() );    }    inline StringPtr::iterator StringPtr::end()    {        return iterator(this->std::vector<std::string *>::end());    }    inline StringPtr::reverse_iterator StringPtr::rbegin()    {        return reverse_iterator(end());    }    inline StringPtr::reverse_iterator StringPtr::rend()    {        return reverse_iterator(begin());    }    #endif

IncludingStringPtr's modified header file into the program given insection22.14.2 results in a program behaving identically to itsearlier version. In this caseStringPtr::begin andStringPtr::endreturn iterator objects constructed from a template definition.

26.6: Using `bisonc++' and `flexc++'

The example discussed below digs into the peculiarities of usingparser- andscanner generators generatingC++ sources. Once theinput for a program exceeds a certain level of complexity, it becomesattractive to use scanner- and parser-generators generating the code whichdoes the actual input recognition.

The examples in this and subsequent sections assume that the reader knows howto use thescanner generatorflex and theparser generatorbison. Bothbison andflex are well documented elsewhere. The originalpredecessors ofbison andflex, calledyacc andlex aredescribed in several books, e.g. in O'Reilly's book`lex & yacc'.

Scanner- and parser generators are also available as free software. Bothbison andflex are usually part of software distributions or they canbe obtained fromftp://prep.ai.mit.edu/pub/non-gnu.Flex creates aC++ classwhen%option c++ is specified.

For parser generators the programbison is available. In the early 90'sAlain Coetmeur (coetmeur@icdc.fr) created aC++ variant (bison++) creating a parser class. Although thebison++ program produces code that can be used inC++ programs it alsoshows many characteristics that are more suggestive of aC context than aC++ context. In January 2005 I rewrote parts of Alain'sbison++program, resulting in the original version of the programbisonc++. Then, in May 2005 a complete rewrite of thebisonc++parser generator was completed (version number 0.98). Current versions ofbisonc++ can be downloaded fromhttps://fbb-git.gitlab.io/bisoncpp/. Binary versions for variousarchitectures are available as, e.g.,Debianpackage (includingbisonc++'s documentation).

Bisonc++ creates a cleaner parser class thanbison++. In particular,it derives the parser class from a base-class, containing the parser's token-and type-definitions as well as all member functions which should not be(re)defined by the programmer. As a result of this approach, the generatedparser class is very small, declaring only members that are actually definedby the programmer (as well as some other members, generated bybisonc++itself, implementing the parser'sparse() member). One member that isnot implemented by default islex, producing the next lexicaltoken. When the directive%scanner (see section26.6.2.1) is used,bisonc++ produces a standard implementation for this member; otherwise itmust be implemented by the programmer.

In early 2012 the programflexc++http://flexcpp.org/ reached its initial release. Likebisonc++ it is part of theDebian linuxdistribution.

Jean-Paul van Oosten (jp@jpvanoosten.nl) and Richard Berendsen(richardberendsen@xs4all.nl) started theflexc++ project in 2008and the final program was completed by Jean-Paul and me between 2010 and 2012.

These sections of theC++ Annotations focus onbisonc++ as ourparser generator andflexc++ as our lexical scannergenerator. Previous releases of theC++ Annotations were usingflex as thescanner generator.

Usingflexc++ andbisonc++class-based scanners and parsers aregenerated. The advantage of this approach is that the interface to the scannerand the parser tends to become cleaner than without usingclassinterfaces. Furthermore, classes allow us to get rid of most if not all globalvariables, making it easy to use multiple parsers in one program.

Below two example programs are developed. The first example only usesflexc++. The generated scanner monitors the production of a file fromseveral parts. That example focuses on the lexical scanner and on switchingfiles while churning through the information. The second example uses bothflexc++ andbisonc++ to generate a scanner and a parser transformingstandard arithmetic expressions to their postfix notations, commonly used incode generated by compilers and inHP-calculators. In the second examplethe emphasis is mainly onbisonc++ and on composing a scanner objectinside a generated parser.

26.6.1: Using `flexc++' to create a scanner

The lexical scanner developed in this section is used to monitor theproduction of a file from several subfiles. The setup is as follows: theinput-language defines#include directives, followed by a textstring specifying the file (path) which should be included at the location ofthe#include.

In order to avoid complexities irrelevant to the current example, the formatof the#include statement is restricted to the form#include<filepath>. The file specified between the angle brackets should beavailable at the location indicated byfilepath. If the file is notavailable, the program terminates after issuing an error message.

The program is started with one or two filename arguments. If the program isstarted with just one filename argument, the output is written to thestandard output streamcout. Otherwise, the output is written tothe stream whose name is given as the program's second argument.

The program defines a maximumnesting depth. Once this maximum is exceeded,the program terminates after issuing an error message. In that case, thefilename stack indicating where which file was included is printed.

An additional feature of the program is that (standardC++) comment-linesare ignored. Include-directives in comment-lines are also ignored.

The program is created in five major steps:

26.6.1.1: The derived class `Scanner'

The function matching theregular expression rules (lex) is amember of the classScanner. SinceScanner is derived fromScannerBase, it has access to all ofScannerBase's protected membersthat execute the lexical scanner's regular expression matching algorithm.

Looking at the regular expressions themselves, notice that we need rulesto recognize comment,#include directives, and all remaining characters.This all is fairly standard practice. When an#include directive issensed, the directive is parsed by the scanner. This too is commonpractice. Our lexical scanner performs the following tasks:

26.6.1.2: The lexical scanner specification file

Thelexical scanner specification file is organized comparably to theone used forflex inC contexts. However, inC++ contexts,flexc++ creates a classScanner, rather than just a scanner function.

Flexc++'s specification file consists of two sections:

26.6.1.3: Implementing `Scanner'

Theclass Scanner is generated once byflexc++. This class hasaccess to several members defined by its base classScannerBase. Some ofthese members have public access rights and can be used by code external tothe classScanner. These members are extensively documented in theflexc++(1) man-page, and the reader is referred to this man-page forfurther information.

Our scanner performs the following tasks:

The#include statements in the input allow the scanner to distill thename of the file where the scanning process must continue. This file name isstored in a local variabled_nextSource and a memberstackSourcehandles the switch to the next source. Nothing else is required. Pushing andpopping input files is handled by the scanner's memberspushStream andpopStream, provided byflexc++.Scanner's interface, therefore,only needs one additional function declaration:switchSource.

Switching streams is handled as follows: once the scanner has extracted afilename from an#include directive, a switch to another file is realizedbyswitchSource. This member callspushStream, defined byflexc++, to stack the current input stream and to switch to the streamwhose name is stored ind_nextSource. This also ends theincludemini-scanner, so to return the scanner to its default scanning modebegin(StartCondition__::INITIAL) is called. Here is its source:

#include "scanner.ih"void Scanner::switchSource(){    pushStream(d_nextSource);    begin(StartCondition__::INITIAL);}
The memberpushStream, defined byflexc++, handles all necessarychecks, throwing an exception if the file could not be opened or if too manyfiles are stacked.

The member performing the lexical scan is defined byflexc++ inScanner::lex, and this member can be called by code to process the tokensreturned by the scanner.

26.6.1.4: Using a `Scanner' object

The program using ourScanner is very simple. It expects a filenameindicating where to start the scanning process.

The program first checks the number of arguments. If at least one argument wasgiven, then that argument is passed toScanner's constructor, togetherwith a second argument"-", indicating that the output should go to thestandard output stream.

If the program receives more than one argument debug output, extensivelydocumenting the lexical scanner's actions, is written to the standard outputstream as well.

Next theScanner'slex member is called. If anything fails, astd::exception is thrown, which is caught bymain's try-block's catchclause. Here is the program's source:

#include "lexer.ih"int main(int argc, char **argv)try{    if (argc == 1)    {        cerr << "Filename argument required\n";        return 1;    }    Scanner scanner(argv[1], "-");    scanner.setDebug(argc > 2);    return scanner.lex();}catch (exception const &exc){    cerr << exc.what() << '\n';    return 1;}

26.6.1.5: Building the program

The final program is constructed in two steps. These steps are given for aUnix system, on whichflexc++ and theGNUC++ compilerg++have been installed:Flexc++ can be downloaded fromhttps://fbb-git.gitlab.io/flexcpp/, and requires thebobcat library, which can be downloaded fromhttp://fbb-git.gitlab.io/bobcat/.

26.6.2: Using `bisonc++' and `flexc++'

Once aninput language exceeds a certain level of complexity, aparseris often used to control the complexity of the language. In this case, aparser generator can be used to generate the code verifying the input'sgrammatical correctness. The lexical scanner (preferably composed into theparser) provides chunks of the input, calledtokens. The parserthen processes the series of tokens generated by the lexical scanner.

Starting point when developing programs that use both parsers and scanners isthegrammar. The grammar defines aset of tokens that can be returnedby the lexical scanner (called thescanner below).

Finally, auxiliary code is provided to `fill in the blanks': theactionsperformed by the parser and by the scanner are not normally specifiedliterally in the grammar rules or lexical regular expressions, butshould be implemented inmember functions, called from the parser'srules or which are associated with the scanner's regular expressions.

In the previous section we've seen an example of aC++ class generated byflexc++. In the current section we concentrate on the parser. The parsercan be generated from a grammar specification file, processed by the programbisonc++. The grammar specification file required bybisonc++ issimilar to the file processed bybison (orbison++,bisonc++'spredecessor, written in the early nineties byAlain Coetmeur).

In this section a program is developed convertinginfix expressions,where binary operators are written between their operands, topostfixexpressions, where operators are written behind their operands. Also, theunary operator- is converted from its prefix notation to a postfix form.The unary+ operator is ignored as it requires no further actions. Inessence our little calculator is a micro compiler, transforming numericexpressions into assembly-like instructions.

Our calculator recognizes a rather basic set of operators:multiplication, addition, parentheses, and the unary minus. We'lldistinguish real numbers from integers, to illustrate a subtlety inbison-like grammar specifications. That's all. The purpose of this section is,after all, to illustrate the construction of aC++ program that uses botha parser and a lexical scanner, rather than to construct a full-fledgedcalculator.

In the coming sections we'll develop the grammar specification forbisonc++. Then, the regular expressions for the scanner arespecified. Following that, the final program is constructed.

26.6.2.1: The `bisonc++' specification file

Thegrammar specification file required bybisonc++ is comparable tothe specification file required bybison. Differences are related to theclass nature of the resulting parser. Our calculator distinguishes realnumbers from integers, and supports a basic set of arithmetic operators.

Bisonc++ should be used as follows:

Thebisonc++ specification file has twosections:

Readers familiar withbison may note that there is noheader section anymore. Header sections are used by bison to providefor the necessary declarations allowing the compiler to compile theCfunction generated bybison. InC++ declarations are part of oralready used by class definitions. Therefore, a parser generator generating aC++ class and some of its member functions does not require a headersection anymore.

The declaration section

Thedeclaration section contains several sets of declarations, amongwhich definitions of all the tokens used in the grammar and the priorities andassociativities of the mathematical operators. Moreover, several new andimportant specifications can be used here as well. Those relevant to ourcurrent example and only available inbisonc++ are discussed here. Thereader is referred tobisonc++'s man-page for a full description. An example of a%union declaration is:
    %union    {        int     i;        double  d;    };

In pre-C++11 code aunion cannot contain objects as its fields, asconstructors cannot be called when a union is created. This means that astring cannot be a member of theunion. Astring *, however,is a possible union member. It might alsobe possible to useunrestricted unions (cf. section9.9), havingclass type objects as fields.

As an aside: the scanner does not have to know about such a union. Itcan simply pass its scanned text to the parser through itsmatched memberfunction. For example using a statement like

    $$.i = A2x(d_scanner.matched());

matched text is converted to a value of an appropriate type.

Tokens and non-terminals can be associated with union fields. This isstrongly advised, as it prevents type mismatches, since the compiler may thencheck for type correctness. At the same time, the bison specificvariables$$,$1,$2, etc. may be used, rather than the full fieldspecification (like$$.i). A non-terminal or a token may be associatedwith a union field using the<fieldname> specification. E.g.,

    %token <i> INT          // token association (deprecated, see below)           <d> DOUBLE    %type  <i> intExpr      // non-terminal association

In the example developed here, both the tokens and the non-terminals canbe associated with a union field. However, as noted before, the scanner doesnot have to know about all this. In our opinion, it is cleaner to let thescanner do just one thing: scan texts. Theparser, knowing what the inputis all about, may then convert strings like"123" to an integervalue. Consequently, the association of a union field and a token isdiscouraged. Below, while describing the grammar's rules, this is furtherillustrated.

In the%union discussion the%token and%type specificationsshould be noted. They are used to specify the tokens (terminal symbols) thatcan be returned by the scanner, and to specify the return types ofnon-terminals. Apart from%token the token declarators%left,%right, and%nonassoc can be used to specify the associativity ofoperators. The tokens mentioned at these indicators are interpreted as tokensindicating operators, associating in the indicated direction. The precedenceof operators is defined by their order: the first specification has the lowestpriority. To overrule a certain precedence in a certain context%prec canbe used. As all this is standardbisonc++ practice, it isn't furtherelaborated here. The documentation provided withbisonc++'s distributionshould be consulted for further reference.

Here is the specification of the calculator's declaration section:

%filenames parser%scanner ../scanner/scanner.h%union {    int i;    double d;};%token  INT DOUBLE%type   <i> intExpr%type   <d> doubleExpr%left   '+'%left   '*'%right  UnaryMinus
In the declaration section%type specifiers are used, associating theintExpr rule's value (see the next section) to thei-field of thesemantic-value union, and associatingdoubleExpr's value to thed-field. This approach, admittedly, is rather complex, as expression rulesmust be included for each of the supported union types. Alternatives aredefinitely possible, and involve the use ofpolymorphic semanticvalues, covered in detail in theBisonc++ user guide.

The grammar rules

The rules and actions of the grammar are specified as usual. The grammarfor our little calculator is given below. There are quite a few rules, butthey illustrate various features offered bybisonc++. In particular, notethat no action block requires more than a single line of code. This keeps thegrammar simple, and therefore enhances its readability andunderstandability. Even the rule defining the parser's proper termination (theempty line in theline rule) uses a single member function calleddone. The implementation of that function is simple, but it is worth whilenoting that it callsParser::ACCEPT, showing thatACCEPT can be calledindirectly from a production rule's action block. Here are the grammar'sproduction rules:
    lines:        lines        line    |        line    ;    line:        intExpr        '\n'        {            display($1);        }    |        doubleExpr        '\n'        {            display($1);        }    |        '\n'        {            done();        }    |        error        '\n'        {            reset();        }    ;    intExpr:        intExpr '*' intExpr        {            $$ = exec('*', $1, $3);        }    |        intExpr '+' intExpr        {            $$ = exec('+', $1, $3);        }    |        '(' intExpr ')'        {            $$ = $2;        }    |        '-' intExpr         %prec UnaryMinus        {            $$ = neg($2);        }    |        INT        {            $$ = convert<int>();        }    ;    doubleExpr:        doubleExpr '*' doubleExpr        {            $$ = exec('*', $1, $3);        }    |        doubleExpr '*' intExpr        {            $$ = exec('*', $1, d($3));        }    |        intExpr '*' doubleExpr        {            $$ = exec('*', d($1), $3);        }    |        doubleExpr '+' doubleExpr        {            $$ = exec('+', $1, $3);        }    |        doubleExpr '+' intExpr        {            $$ = exec('+', $1, d($3));        }    |        intExpr '+' doubleExpr        {            $$ = exec('+', d($1), $3);        }    |        '(' doubleExpr ')'        {            $$ = $2;        }    |        '-' doubleExpr         %prec UnaryMinus        {            $$ = neg($2);        }    |        DOUBLE        {            $$ = convert<double>();        }    ;
This grammar is used to implement a simple calculator in which integer andreal values can be negated, added, and multiplied and in which standardpriority rules can be overruled by parentheses. The grammar shows the use oftyped nonterminal symbols:doubleExpr is linked to real (double) values,intExpr is linked to integer values. Precedence and type association isdefined in the parser's definition section.

The Parser's header file

Several class members called from the grammar are defined as membertemplates.Bisonc++ generates multiple files, among which the filedefining the parser's class. Functions called from the production rule'saction blocks are usually member functions of the parser. These memberfunctions must be declared and defined. Oncebisonc++ has generated theheader file defining the parser's class, that header file isn't automaticallyrewritten, allowing the programmer to add new members to the parser classwhenever required. Here is `parser.h' as used in our little calculator:
#ifndef Parser_h_included#define Parser_h_included#include <iostream>#include <sstream>#include <bobcat/a2x>#include "parserbase.h"#include "../scanner/scanner.h"#undef Parserclass Parser: public ParserBase{    std::ostringstream d_rpn;    // $insert scannerobject    Scanner d_scanner;    public:        int parse();    private:        template <typename Type>        Type exec(char c, Type left, Type right);        template <typename Type>        Type neg(Type op);        template <typename Type>        Type convert();        void display(int x);        void display(double x);        void done() const;        void reset();        void error(char const *msg);        int lex();        void print();        static double d(int i);    // support functions for parse():        void executeAction(int d_ruleNr);        void errorRecovery();        int lookup(bool recovery);        void nextToken();        void print__();};inline double Parser::d(int i){    return i;}template <typename Type>Type Parser::exec(char c, Type left, Type right){    d_rpn << " " << c << " ";    return c == '*' ? left * right : left + right;}template <typename Type>Type Parser::neg(Type op){    d_rpn << " n ";    return -op;}template <typename Type>Type Parser::convert(){    Type ret = FBB::A2x(d_scanner.matched());    d_rpn << " " << ret << " ";    return ret;}inline void Parser::error(char const *msg){    std::cerr << msg << '\n';}inline int Parser::lex(){    return d_scanner.lex();}inline void Parser::print(){}#endif

26.6.2.2: The `flexc++' specification file

The flex-specification file used by the calculator is simple: blanks areignored, single characters are returned, and numeric values are returned aseitherParser::INT orParser::DOUBLE tokens.

Theflexc++ directive%interactive is provided since thecalculator is a program actively interacting with its human user.

Here is the completeflexc++ specification file:

%interactive%filenames scanner%%[ \t]                       // ignored[0-9]+                      return Parser::INT;"."[0-9]+                   |[0-9]+"."[0-9]*             return Parser::DOUBLE;.|\n                        return matched()[0];

26.6.2.3: Building the program

The calculator is built usingbisonc++ andflexc++. Here is theimplementation of the calculator'smain function:
#include "parser/parser.h"using namespace std;int main(){    Parser parser;    cout << "Enter (nested) expressions containing ints, doubles, *, + and "            "unary -\n"            "operators. Enter an empty line to stop.\n";    return parser.parse();}

The parser's filesparse.cc andparserbase.h are generated by thecommand:

    bisonc++ grammar

The fileparser.h is created only once, to allow the developer to addmembers to theParser class occe the need for them arises.

The programflexc++ is used to create a lexical scanner:

    flexc++ lexer

OnUnix systems a command like

    g++ -Wall -o calc *.cc -lbobcat -s

can be used to compile and link the source of the main program and thesources produced by the scanner and parser generators. The example uses theA2x class, which is part of thebobcat library (cf. section26.6.1.5) (the bobcat library is available on systems offering eitherbisonc++ orflexc++).Bisonc++ can be downloaded from

http://fbb-git.gitlab.io/bisoncpp/.




[8]ページ先頭

©2009-2025 Movatter.jp