Movatterモバイル変換

Chapter 26: Concrete Examples

In this chapter concrete examples ofC++ programs, classes and templatesare presented. Topics covered by theC++ Annotations such as virtualfunctions,static members, etc. are illustrated in this chapter. Theexamples roughly follow the organization of earlier chapters.

As an additional topic, not just providing examples ofC++ the subjects ofscanner andparser generators are covered. We show how these tools maybe used inC++ programs. These additional examples assume a certainfamiliarity with the concepts underlying these tools, like grammars,parse-trees and parse-tree decoration. Once the input for a program exceeds a certainlevel of complexity, it's attractive to use scanner- and parser-generators tocreate the code doing the actual input processing. One of theexamples in this chapter describes the usage of thesetools in aC++ environment.

26.1: Using file descriptors with `streambuf' classes

26.1.1: Classes for output operations

Reading and writing from and tofile descriptorsare not part of theC++ standard. But on most operating systems filedescriptorsare available and can be considered adevice. It seemsnatural to use the classstd::streambuf as the starting pointfor constructing classes interfacing such file descriptor devices.

Below we'll construct classes that can be used to write to a device givenits file descriptor. The devices may be files, but they could also bepipesorsockets. Section26.1.2 covers reading from such devices; section26.2.3 reconsiders redirection, discussed earlier in section6.6.2.

Using thestreambuf class as a base class it is relatively easy todesign classes for output operations. The only member function thatmustbe overridden is the (virtual) memberint streambuf::overflow(int c). This member's responsibility is towrite characters to the device. Iffd is an output file descriptor and ifoutput should not be buffered then the memberoverflow() can simply beimplemented as:

    class UnbufferedFD: public std::streambuf    {        public:            int overflow(int c) override;            ...    };    int UnbufferedFD::overflow(int c)    {        if (c != EOF)        {            if (write(d_fd, &c, 1) != 1)                return EOF;        }        return c;    }

The argument received byoverflow is either written to the filedescriptor (and returned fromoverflow), orEOF is returned.

This simple function does not use output buffering. For various reasons,using a buffer is usually a good idea (see also the next section).

When output buffering is used, theoverflow member is a bit morecomplex as it is only called when the buffer is full. Once the buffer is full,wefirst have to flush the buffer. Flushing the buffer is theresponsibility of the (virtual) functionstreambuf::sync. Sincesync is a virtual function, classes derived fromstreambuf mayredefinesync to flush a bufferstreambuf itself doesn't know about.

Overridingsync and using it inoverflow is not all that has to bedone. When the object of the class defining the buffer reaches the end of itslifetime the buffer may be only partially full. In that situation the buffermust also be flushed. This is easily done by simply callingsync from theclass's destructor.

Now that we've considered the consequences of using an output buffer,we're almost ready to design our derived class. Several more featuresare added as well, though:

First, we should allow the user of the class to specify the size ofthe output buffer.
Second, it should be possible to construct an object of our classbefore the file descriptor is actually known. Later, in section26.2we'll encounter a situation where this feature is actually used.

To save some space in theC++ Annotations, the successful completion of thefunctions designed here is not checked in the example code. In `real life'implementations these checks should of course not be omitted. Our classOFdnStreambuf has the following characteristics:

Its member functions use low-level functions operating on filedescriptors. So apart fromstreambuf the<unistd.h> header file musthave been read by the compiler before its member functions can be compiled.
The class is derived fromstd::streambuf.

It defines three data members. These data members keep track of,respectively, the size of the buffer, the file descriptor, and the bufferitself. Here is the full class interface

    class OFdnStreambuf: public std::streambuf    {        int     d_fd = -1;        size_t  d_bufsize = 0;        char   *d_buffer = 0;        public:            OFdnStreambuf() = default;            OFdnStreambuf(int fd, size_t bufsize = 1);            ~OFdnStreambuf() override;            void open(int fd, size_t bufsize = 1);        private:            int sync() override;            int overflow(int c) override;    };

Its default constructor merely initializes the buffer to 0. Slightlymore interesting is its constructor expecting a file descriptor and a buffersize. This constructor passes its arguments on to the class'sopenmember (see below). Here are the constructors:
```
    inline OFdnStreambuf::OFdnStreambuf(int fd, size_t bufsize)    {        open(fd, bufsize);    }
```
The destructor callssync, flushing any characters stored in theoutput buffer to the device. In implementations not using a buffer thedestructor can be given a default implementation:
```
    inline OFdnStreambuf::~OFdnStreambuf()    {        if (d_buffer)        {            sync();            delete[] d_buffer;        }    }
```
This implementation does not close the device. It is left as an exerciseto the reader to change this class in such a way that the device is optionallyclosed (or optionally remains open). This approach was adopted by, e.g., theBobcat library. See also section26.1.2.2.

Theopen member initializes the buffer. Usingstreambuf::setp, the begin and end points of the buffer aredefined. This is used by thestreambuf base class to initializestreambuf::pbase,streambuf::pptr, andstreambuf::epptr:

    inline void OFdnStreambuf::open(int fd, size_t bufsize)    {        d_fd = fd;        d_bufsize = bufsize == 0 ? 1 : bufsize;        delete[] d_buffer;        d_buffer = new char[d_bufsize];        setp(d_buffer, d_buffer + d_bufsize);    }

The membersync flushes the as yet unflushed content of thebuffer to the device. After the flush the buffer is reinitialized usingsetp. After successfully flushing the buffersync returns 0:

    inline int OFdnStreambuf::sync()    {        if (pptr() > pbase())        {            write(d_fd, d_buffer, pptr() - pbase());            setp(d_buffer, d_buffer + d_bufsize);        }        return 0;    }

The memberstreambuf::overflow is alsooverridden. Since this member is called from thestreambuf base class whenthe buffer is full it should first callsync to flush the buffer to thedevice. Next it should write the characterc to the (now empty)buffer. The characterc should be written usingpptr andstreambuf::pbump. Entering a character into the buffer should beimplemented using availablestreambuf member functions, rather than `byhand' as doing so might invalidatestreambuf's internal bookkeeping. Hereisoverflow's implementation:
```
    inline int OFdnStreambuf::overflow(int c)    {        sync();        if (c != EOF)        {            *pptr() = c;            pbump(1);        }        return c;    }
```

The next program uses theOFfdStreambuf class to copy its standardinput to file descriptorSTDOUT_FILENO, which is the symbolic name of thefile descriptor used for the standard output:

    #include <string>    #include <iostream>    #include <istream>    #include "fdout.h"    using namespace std;    int main(int argc, char **argv)    {        OFdnStreambuf   fds(STDOUT_FILENO, 500);        ostream         os(&fds);        switch (argc)        {            case 1:                for (string  s; getline(cin, s); )                    os << s << '\n';                os << "COPIED cin LINE BY LINE\n";            break;            case 2:                cin >> os.rdbuf();      // Alternatively, use:  cin >> &fds;                os << "COPIED cin BY EXTRACTING TO os.rdbuf()\n";            break;            case 3:                os << cin.rdbuf();                os << "COPIED cin BY INSERTING cin.rdbuf() into os\n";            break;        }    }

26.1.2: Classes for input operations

When classes for input operation are derived fromstd::streambuf, they should be provided with an input bufferof at least one character. The one-character input buffer allows for the useof the member functionsistream::putback oristream::ungetc. Strictlyspeaking it is not necessary to implement a buffer in classes derived fromstreambuf. But using buffers in these classes is strongly advised. Theirimplementation is very simple and straightforward and the applicability ofsuch classes is greatly improved. Therefore, all our classes derived from theclassstreambuf define a buffer ofat least one character.

26.1.2.1: Using a one-character buffer

When deriving a class (e.g.,IFdStreambuf) fromstreambuf using abuffer of one character, at least its memberstreambuf::underflow should be overridden, as this member eventuallyreceives all requests for input. The memberstreambuf::setg is used to inform thestreambuf base class of thesize and location of the input buffer, so that it is able to set up its inputbuffer pointers accordingly. This ensures thatstreambuf::eback,streambuf::gptr, andstreambuf::egptr return correct values.

The classIFdStreambuf is designed like this:

Its member functions use low-level functions operating on filedescriptors. Therefore, in addition tostreambuf, the<unistd.h>header file must have been read by the compiler before its member functionscan be compiled.
Like most classes designed for input operations, this class is derivedfromstd::streambuf as well.

The class defines two data members, one of them a fixed-sized onecharacter buffer. The data members are defined asprotected data membersso that derived classes (e.g., see section26.1.2.3) can access them. Hereis the full class interface:

    class IFdStreambuf: public std::streambuf    {        protected:            int     d_fd;            char    d_buffer[1];        public:            IFdStreambuf(int fd);        private:            int underflow() override;    };

The constructor initializes the buffer. However, the initializationsetsgptr's return value equal toegptr's return value. Thisimplies that the buffer is empty sounderflow is immediately calledto fill the buffer:
```
    inline IFdStreambuf::IFdStreambuf(int fd)    :        d_fd(fd)    {        setg(d_buffer, d_buffer + 1, d_buffer + 1);    }
```
Finallyunderflow is overridden. The buffer is refilled byreading from the file descriptor. If this fails (for whatever reason),EOF is returned. More sophisticated implementations could act moreintelligently here, of course. If the buffer could be refilled,setg iscalled to set upstreambuf's buffer pointers correctly:
```
    inline int IFdStreambuf::underflow()    {        if (read(d_fd, d_buffer, 1) <= 0)            return EOF;        setg(d_buffer, d_buffer, d_buffer + 1);        return static_cast<unsigned char>(*gptr());    }
```

The followingmain function shows howIFdStreambuf can be used:

    int main()    {        IFdStreambuf fds(STDIN_FILENO);        istream      is(&fds);        cout << is.rdbuf();    }

26.1.2.2: Using an n-character buffer

How complex would things get if we decided to use a buffer ofsubstantial size? Not that complex. The following class allows us to specifythe size of a buffer, but apart from that it is basically the same class asIFdStreambuf developed in the previous section. To make things a bit moreinteresting, in the classIFdNStreambuf developed here, the memberstreambuf::xsgetn is also overridden, to optimize reading aseries of characters. Also a default constructor is provided that can be usedin combination with theopen member to construct anistream objectbefore the file descriptor becomes available. In that case, once thedescriptor becomes available, theopen member can be used to initiatethe object's buffer. Later, in section26.2, we'll encounter such asituation.

To save some space, the success of various calls was not checked. In `reallife' implementations, these checks should of course not be omitted. TheclassIFdNStreambuf has the following characteristics:

Its member functions use low-level functions operating on filedescriptors. So apart fromstreambuf the<unistd.h> header file musthave been read by the compiler before its member functions can be compiled.
As usual, it is derived fromstd::streambuf.

Like the classIFdStreambuf (section26.1.2.1), its datamembers are protected. Since the buffer's size is configurable, this size iskept in a dedicated data member,d_bufsize:

    class IFdNStreambuf: public std::streambuf    {        protected:            int         d_fd = -1;            size_t      d_bufsize = 0;            char*       d_buffer = 0;        public:            IFdNStreambuf() = default;            IFdNStreambuf(int fd, size_t bufsize = 1);            ~IFdNStreambuf() override;            void open(int fd, size_t bufsize = 1);        private:            int underflow() override;            std::streamsize xsgetn(char *dest, std::streamsize n) override;    };

The default constructor does not allocate a buffer. It can be usedto construct an object before the file descriptor becomes known. A secondconstructor simply passes its arguments toopen.Open will theninitialize the object so that it can actually be used:
```
    inline IFdNStreambuf::IFdNStreambuf(int fd, size_t bufsize)    {        open(fd, bufsize);    }
```
Once the object has been initialized byopen, its destructor willboth delete the object's buffer and use the file descriptor to close thedevice:
```
    IFdNStreambuf::~IFdNStreambuf()    {        if (d_bufsize)        {            close(d_fd);            delete[] d_buffer;        }    }
```
Even though the device is closed in the above implementation this may notalways be desirable. In cases where the open file descriptor is alreadyavailable the intention may be to use that descriptor repeatedly, each timeusing a newly constructedIFdNStreambuf object. It is left as an exerciseto the reader to change this class in such a way that the device mayoptionally be closed. This approach was followed in, e.g., theBobcat library.
Theopen member simply allocates the object's buffer. It isassumed that the calling program has already opened the device. Once thebuffer has been allocated, the base class membersetg is used to ensurethatstreambuf::ebackstreambuf::gptr andstreambuf::egptr return correct values:
```
    void IFdNStreambuf::open(int fd, size_t bufsize)    {        d_fd = fd;        d_bufsize =  bufsize == 0 ? 1 : bufsize;        delete[] d_buffer;        d_buffer = new char[d_bufsize];        setg(d_buffer, d_buffer + d_bufsize, d_buffer + d_bufsize);    }
```

The overridden memberunderflow is implemented almostidentically toIFdStreambuf's (section26.1.2.1) member. The onlydifference is that the current class supports buffers of largersizes. Therefore, more characters (up tod_bufsize) may be read from thedevice at once:

    int IFdNStreambuf::underflow()    {        if (gptr() < egptr())            return *gptr();        int nread = read(d_fd, d_buffer, d_bufsize);        if (nread <= 0)            return EOF;        setg(d_buffer, d_buffer, d_buffer + nread);        return static_cast<unsigned char>(*gptr());    }

Finallyxsgetn is overridden. In a loop,n is reduced until0, at which point the function terminates. Alternatively, the member returnsifunderflow fails to obtain more characters. This member optimizes thereading of series of characters. Instead of callingstreambuf::sbumpcn times, a block ofavail characters is copiedto the destination, usingstreambuf::gbump to consumeavailcharacters from the buffer using one function call:

    std::streamsize IFdNStreambuf::xsgetn(char *dest, std::streamsize n)    {        int nread = 0;        while (n)        {            if (!in_avail())            {                if (underflow() == EOF)                    break;            }            int avail = in_avail();            if (avail > n)                avail = n;            memcpy(dest + nread, gptr(), avail);            gbump(avail);            nread += avail;            n -= avail;        }        return nread;    }

The member functionxsgetn is called bystreambuf::sgetn,which is astreambuf member. Here is an example illustrating the use ofthis member function with anIFdNStreambuf object:

    #include <unistd.h>    #include <iostream>    #include <istream>    #include "ifdnbuf.h"    using namespace std;    int main()    {                                    // internally: 30 char buffer        IFdNStreambuf fds(STDIN_FILENO, 30);        char buf[80];               // main() reads blocks of 80                                    // chars        while (true)        {            size_t n = fds.sgetn(buf, 80);            if (n == 0)                break;            cout.write(buf, n);        }    }

26.1.2.3: Seeking positions in `streambuf' objects

When devices supportseek operations, classes derived fromstd::streambuf should override the membersstreambuf::seekoff andstreambuf::seekpos. The classIFdSeek, developed in this section, can be used to read information fromdevices supporting seek operations. The classIFdSeek was derived fromIFdStreambuf, so it uses a character buffer of just one character. Thefacilities to perform seek operations, which are added to our new classIFdSeek, ensure that the input buffer is reset when a seek operation isrequested. The class could also be derived from the classIFdNStreambuf. In that case the arguments to reset the input buffermust be adapted so that its second and third parameters point beyond theavailable input buffer. Let's have a look at the characteristics ofIFdSeek:

As mentioned,IFdSeek is derived fromIFdStreambuf. Like thelatter class,IFdSeek's member functions use facilities declared inunistd.h. So, the header file<unistd.h> must have been read by thecompiler before it can compile the class's members functions. To reduce theamount of typing when specifying types and constants fromstreambuf andstd::ios, several using-declarations are defined by the class.These using-declarations refer to types that are defined in the header file<ios>, which must therefore also be included before the compiler cancompileIFdSeek's class interface:
```
    class IFdSeek: public IFdStreambuf    {        using pos_type = std::streambuf::pos_type;        using off_type = std::streambuf::off_type;        using seekdir = std::ios::seekdir;        using openmode = std::ios::openmode;        public:            IFdSeek(int fd);        private:            pos_type seekoff(off_type offset, seekdir dir, openmode);            pos_type seekpos(pos_type offset, openmode mode);    };
```
The class has a very basic interface. Its (only)constructor expects the device'sfile descriptor. It has no special tasksto perform and just calls itsbase class constructor:
```
    inline IFdSeek::IFdSeek(int fd)    :        IFdStreambuf(fd)    {}
```

The memberseek_off is responsible for performing the actualseek operations. It callslseek to seek a new position in a device whosefile descriptor is known. If seeking succeeds,setg is called to definean already empty buffer, so that the base class'sunderflow memberrefills the buffer at the next input request.

    IFdSeek::pos_type IFdSeek::seekoff(off_type off, seekdir dir, openmode)    {        pos_type pos =            lseek            (                d_fd, off,                (dir ==  std::ios::beg) ? SEEK_SET :                (dir ==  std::ios::cur) ? SEEK_CUR :                                          SEEK_END            );        if (pos < 0)            return -1;        setg(d_buffer, d_buffer + 1, d_buffer + 1);        return pos;    }

Finally, the companion functionseekpos is overridden as well:it is actually defined as a call toseekoff:

    inline IFdSeek::pos_type IFdSeek::seekpos(pos_type off, openmode mode)    {        return seekoff(off, std::ios::beg, mode);    }

Here is an example of a program using the classIFdSeek. Ifthis program is given its own source file using input redirection thenseeking is supported (and with the exception of the first line, every otherline is shown twice):

    #include "fdinseek.h"    #include <string>    #include <iostream>    #include <istream>    #include <iomanip>    using namespace std;    int main()    {        IFdSeek fds(0);        istream is(&fds);        string  s;        while (true)        {            if (!getline(is, s))                break;            streampos pos = is.tellg();            cout << setw(5) << pos << ": `" << s << "'\n";            if (!getline(is, s))                break;            streampos pos2 = is.tellg();            cout << setw(5) << pos2 << ": `" << s << "'\n";            if (!is.seekg(pos))            {                cout << "Seek failed\n";                break;            }        }    }

26.1.2.4: Multiple `unget' calls in `streambuf' objects

Streambuf classes and classes derived fromstreambuf should supportat least ungetting the last read character. Special care must be takenwhenseries ofunget calls must be supported. In this section theconstruction of a class supporting a configurable number ofistream::ungetoristream::putback calls is discussed.

Support for multiple (say `n')unget calls is implemented byreserving an initial section of the input buffer, which is gradually filled upto contain the lastn characters read. The class is implemented asfollows:

Once again, the class is derived fromstd::streambuf. Itdefines several data members, allowing the class to perform the bookkeepingrequired to maintain an unget-buffer of a configurable size:

    class FdUnget: public std::streambuf    {        int     d_fd;        size_t  d_bufsize;        size_t  d_reserved;        char   *d_buffer;        char   *d_base;        public:            FdUnget(int fd, size_t bufsz, size_t unget);            ~FdUnget() override;        private:            int underflow() override;    };

The class's constructor expects afile descriptor, a buffer sizeand the number of characters that can be ungot or pushed back as itsarguments. This number determines the size of areserved area,defined as the firstd_reserved bytes of the class's input buffer.
- The input buffer will always be at least one byte larger thand_reserved. So, a certain number of bytes may be read. Onced_reservedbytes have been read at mostd_reserved bytes can be ungot.
- Next, the starting point for reading operations is configured. Itis calledd_base, pointing to a locationd_reserved bytes beyond thelocation represented byd_buffer. This is always the location where bufferrefills start.
- Now that the buffer has been constructed, we're ready to definestreambuf's buffer pointers usingsetg. As no characters have beenread yet, all pointers are set to point tod_base. Ifunget iscalled at this point, no characters are available, andunget(correctly) fails.
- Eventually, the refill buffer's size is determined as thenumber of allocated bytes minus the size of the reserved area.
Here is the class's constructor:
```
    FdUnget::FdUnget(int fd, size_t bufsz, size_t unget)    :        d_fd(fd),        d_reserved(unget)    {        size_t allocate =                bufsz > d_reserved ?                    bufsz                :                    d_reserved + 1;        d_buffer = new char[allocate];        d_base = d_buffer + d_reserved;        setg(d_base, d_base, d_base);        d_bufsize = allocate - d_reserved;    }
```
The class's destructor simply returns the memory allocated for thebuffer to the common pool:
```
    inline FdUnget::~FdUnget()    {        delete[] d_buffer;    }
```
Finally,underflow is overridden as follows:
- Firstunderflow determines the number of characters thatcould potentially be ungot. If that number of characters are ungot, the inputbuffer is exhausted. So this value may be any value between 0 (the initialstate) or the input buffer's size (when the reserved area has been filled upcompletely, and all current characters in the remaining section of the bufferhave also been read);
- Next the number of bytes to move into the reserved area iscomputed. This number is at mostd_reserved, but it is set equal to theactual number of characters that can be ungot if this value is smaller;
- Now that the number of characters to move into the reserved areais known, this number of characters is moved from the input buffer's end tothe area immediately befored_base;
- Then the buffer is refilled. This all is standard, but noticethat reading starts fromd_base and not fromd_buffer;
- Finally,streambuf's read buffer pointers are set up.Eback is set tomove locations befored_base, thusdefining the guaranteed unget-area,gptr is set tod_base, since that's the location of thefirst read character after a refill, andegptr is set just beyond the location of the last characterread into the buffer.
Here isunderflow's implementation:
```
    int FdUnget::underflow()    {        size_t ungetsize = gptr() - eback();        size_t move = std::min(ungetsize, d_reserved);        memcpy(d_base - move, egptr() - move, move);        int nread = read(d_fd, d_base, d_bufsize);        if (nread <= 0)       // none read -> return EOF            return EOF;        setg(d_base - move, d_base, d_base + nread);        return static_cast<unsigned char>(*gptr());    }
```

An example using FdUnget

The next example program illustrates the use of the classFdUnget. Itreads at most 10 characters from the standard input, stopping atEOF. A guaranteed unget-buffer of 2 characters is defined in a bufferholding 3 characters. Just before reading a character, the program tries tounget at most 6 characters. This is, of course, not possible; but the programnicely ungets as many characters as possible, considering the actualnumber of characters read:

    #include "fdunget.h"    #include <string>    #include <iostream>    #include <istream>    using namespace std;    int main()    {        FdUnget fds(0, 3, 2);        istream is(&fds);        char    c;        for (int idx = 0; idx < 10; ++idx)        {            cout << "after reading " << idx << " characters:\n";            for (int ug = 0; ug <= 6; ++ug)            {                if (!is.unget())                {                    cout                    << "\tunget failed at attempt " << (ug + 1) << "\n"                    << "\trereading: '";                    is.clear();                    while (ug--)                    {                        is.get(c);                        cout << c;                    }                    cout << "'\n";                    break;                }            }            if (!is.get(c))            {                cout << " reached\n";                break;            }            cout << "Next character: " << c << '\n';        }    }    /*        Generated output after 'echo abcde | program':        after reading 0 characters:                unget failed at attempt 1                rereading: ''        Next character: a        after reading 1 characters:                unget failed at attempt 2                rereading: 'a'        Next character: b        after reading 2 characters:                unget failed at attempt 3                rereading: 'ab'        Next character: c        after reading 3 characters:                unget failed at attempt 4                rereading: 'abc'        Next character: d        after reading 4 characters:                unget failed at attempt 4                rereading: 'bcd'        Next character: e        after reading 5 characters:                unget failed at attempt 4                rereading: 'cde'        Next character:        after reading 6 characters:                unget failed at attempt 4                rereading: 'de        '         reached    */

26.1.3: Fixed-sized field extraction from istream objects

Usually when extracting information fromistream objectsoperator>>, thestandard extraction operator is perfectly suited for the task as in mostcases the extracted fields are white-space (or otherwise clearly) separatedfrom each other. But this does not hold true in all situations. For example,when a web-form is posted to some processing script or program, the receivingprogram may receive the form field's values asurl-encodedcharacters: letters and digits are sent unaltered, blanks are sent as+characters, and all other characters start with% followed by thecharacter'sascii-value represented by its two digit hexadecimal value.

When decoding url-encoded information, simple hexadecimal extraction won'twork, as that extracts as many hexadecimal characters as available,instead of just two. Since the lettersa-f` and0-9 are legalhexadecimal characters, a text likeMy name is `Ed', url-encoded as

    My+name+is+%60Ed%27

results in the extraction of the hexadecimal values60ed and27,instead of60 and27. The nameEd disappears from view, which isclearly not what we want.

In this case, having seen the%, we could extract 2 characters, putthem in anistringstream object, and extract the hexadecimal value fromtheistringstream object. A bit cumbersome, but doable. Other approachesare possible as well.

The classFistream forfixed-sized field istream definesanistream class supporting both fixed-sized field extractions andblank-delimited extractions (as well as unformattedread calls). Theclass may be initialized as awrapper around an existingistream, orit can be initialized using the name of an existing file. The class is derivedfromistream, allowing all extractions and operations supported byistreams in general.Fistream defines the following data members:

d_filebuf: a filebuffer used whenFistream reads its information from a named (existing) file. Since the filebuffer is only needed in that case, and since it must be allocated dynamically, it is defined as aunique_ptr<filebuf> object.
d_streambuf: a pointer toFistream'sstreambuf. It points tod_filebuf whenFistream opens a file by name. When an existingistream is used to construct anFistream, it points to the existingistream'sstreambuf.
d_iss: anistringstream object used for the fixed field extractions.
d_width: asize_t indicating the width of the field to extract. If 0 no fixed field extractions is used, but information is extracted from theistream base class object using standard extractions.

Here is the initial section ofFistream's class interface:

    class Fistream: public std::istream    {        std::unique_ptr<std::filebuf> d_filebuf;        std::streambuf *d_streambuf;        std::istringstream d_iss;        size_t d_width;

As stated,Fistream objects can be constructed from either afilename or an existingistream object. The class interface thereforedeclares two constructors:

            Fistream(std::istream &stream);            Fistream(char const *name,                std::ios::openmode mode = std::ios::in);

When anFistream object is constructed using an existingistreamobject, theFistream'sistream part simply uses thestream'sstreambuf object:

Fistream::Fistream(istream &stream):    istream(stream.rdbuf()),    d_streambuf(rdbuf()),    d_width(0){}

When anfstream object is constructed using a filename, theistream base initializer is given a newfilebuf object to be used asitsstreambuf. Since the class's data members are not initialized beforethe class's base class has been constructed,d_filebuf can only beinitialized thereafter. By then, thefilebuf is only available asrdbuf, returning astreambuf. However, as it is actually afilebuf, astatic_cast is used to cast thestreambuf pointerreturned byrdbuf toa filebuf *, sod_filebuf can be initialized:

Fistream::Fistream(char const *name, ios::openmode mode):    istream(new filebuf()),    d_filebuf(static_cast<filebuf *>(rdbuf())),    d_streambuf(d_filebuf.get()),    d_width(0){    d_filebuf->open(name, mode);}

26.1.3.1: Member functions and example

There is only one additional public member:setField(field const&). This member defines the size of the next field to extract. Itsparameter is a reference to afield class, amanipulator classdefining the width of the next field.

Since afield & is mentioned inFistream's interface,fieldmust be declared beforeFistream's interface starts. The classfielditself is simple and declaresFistream as its friend. It has two datamembers:d_width specifies the width of the next field, andd_newWidthwhich is set totrue ifd_width's value should actually be used. Ifd_newWidth is false,Fistream returns to its standard extractionmode. The classfield has two constructors: a defaultconstructor, settingd_newWidth tofalse, and a second constructorexpecting the width of the next field to extract as its value. Here is theclassfield:

    class field    {        friend class Fistream;        size_t d_width;        bool     d_newWidth;        public:            field(size_t width);            field();    };    inline field::field(size_t width)    :        d_width(width),        d_newWidth(true)    {}    inline field::field()    :        d_newWidth(false)    {}

Sincefield declaresFistream as its friend,setField mayinspectfield's members directly.

Time to return tosetField. This function expects a reference to afield object, initialized in one of three different ways:

field(): WhensetField's argument is afield object constructed by its default constructor the next extraction will use the same field width as the previous extraction.
field(0): When thisfield object is used assetField's argument, fixed-sized field extraction stops, and theFistream acts like any standardistream object again.
field(x): When thefield object itself is initialized by a non-zero size_t valuex, then the next field width isx characters wide. The preparation of such a field is left tosetBuffer,Fistream's only private member.

Here issetField's implementation:

std::istream &Fistream::setField(field const &params){    if (params.d_newWidth)                  // new field size requested        d_width = params.d_width;           // set new width    if (!d_width)                           // no width?        rdbuf(d_streambuf);                 // return to the old buffer    else        setBuffer();                        // define the extraction buffer    return *this;}

The private membersetBuffer defines a buffer ofd_width + 1characters and usesread to fill the buffer withd_widthcharacters. The buffer is an NTBS. This buffer is used to initialize thed_iss member.Fistream'srdbuf member is used to extract thed_str's data via theFistream object itself:

void Fistream::setBuffer(){    char *buffer = new char[d_width + 1];    rdbuf(d_streambuf);                         // use istream's buffer to    buffer[read(buffer, d_width).gcount()] = 0; // read d_width chars,                                                // terminated by a 0-byte    d_iss.str(buffer);    delete[] buffer;    rdbuf(d_iss.rdbuf());                       // switch buffers}

AlthoughsetField could be used to configureFistream to use ornot to use fixed-sized field extraction, using manipulators is probablypreferable. To allowfield objects to be used as manipulators anoverloaded extraction operator was defined. This extraction operator acceptsistream & and afield const & objects. Using this extractionoperator, statements like

fis >> field(2) >> x >> field(0);

are possible (assumingfis is aFistream object). Here is theoverloadedoperator>>, as well as its declaration:

istream &std::operator>>(istream &str, field const &params){    return static_cast<Fistream *>(&str)->setField(params);}

Declaration:

namespace std{    istream &operator>>(istream &str, FBB::field const &params);}

Finally, an example. The following program uses aFistream object tourl-decode url-encoded information appearing at its standard input:

    int main()    {        Fistream fis(cin);        fis >> hex;        while (true)        {            size_t x;            switch (x = fis.get())            {                case '\n':                    cout << '\n';                break;                case '+':                    cout << ' ';                break;                case '%':                    fis >> field(2) >> x >> field(0);                // FALLING THROUGH                default:                    cout << static_cast<char>(x);                break;                case EOF:                return 0;            }        }    }    /*        Generated output after:            echo My+name+is+%60Ed%27 | a.out        My name is `Ed'    */

26.2: The `fork' system call

From theC programming language thefork system call is wellknown. When a program needs to start a new process,system can be used.The functionsystem requires the program to wait for thechild process to terminate. The more general way to spawn subprocesses is to usefork.

In this section we investigate howC++ can be used to wrap classes arounda complex system call likefork. Much of what follows in this sectiondirectly applies to the Unix operating system, and the discussion thereforefocuses on that operating system. Other systems usually providecomparable facilities. What follows is closely related to theTemplate Design Pattern (cf.Gamma et al. (1995)Design Patterns, Addison-Wesley)

Whenfork is called, the current program is duplicated in memory, thuscreating a new process. Following this duplication both processes continuetheir execution just below thefork system call. The two processes mayinspectfork's return value: the return value in theoriginal process (called theparent process) differs from the returnvalue in the newly created process (called thechild process):

In theparent processfork returns theprocess ID of the(child) process that was created by thefork system call. This is apositive integer value.
In thechild processfork returns 0.
Iffork fails, -1 is returned.

26.2.1: A basic Fork class

A basicFork class should hide all bookkeeping details of a systemcall likefork from its users. The classFork developed here doesjust that. The class itself only ensures the proper execution of theforksystem call. Normally,fork is called to start a child process, usuallyboiling down to the execution of a separate process. This child process mayexpect input at its standard input stream and/or may generate output to itsstandard output and/or standard error streams.Fork does not know allthis, and does not have to know what the child process will do.Forkobjects should be able to start their child processes.

Fork's constructor cannot know what actions its childprocess should perform. Similarly, it cannot know what actions the parentprocess should perform. For these kind of situations, thetemplate method design pattern was developed. According to Gamma c.s., thetemplate method designpattern

``Define(s) the skeleton of an algorithm in an operation, deferring some steps to subclasses. [The] Template Method (design pattern) lets subclasses redefine certain steps of an algorithm, without changing the algorithm's structure.''

This design pattern allows us to define anabstract base class already providing the essential steps related to thefork system call,deferring the implementation of other parts of thefork system call tosubclasses.

TheFork abstract base class has the following characteristics:

It defines a data memberd_pid. In the parent process this datamember contains the child'sprocess id and in the child process it hasthe value 0. Its public interface declares only two members:

afork member function, responsible for the actual forking (i.e., it creates the (new) child process);
avirtual destructor~Fork (having an empty body).

Here isFork's interface:

    class Fork    {        int d_pid;        public:            virtual ~Fork();            void fork();        protected:            int pid() const;            int waitForChild();                 // returns the status        private:            virtual void childRedirections();            virtual void parentRedirections();            virtual void childProcess() = 0;    // pure virtual members            virtual void parentProcess() = 0;    };

All other non-virtual member functions are declared in the class'sprotected section and can thusonly be used by derived classes. Theyare:
- pid(): The member functionpid allows derived classes to access the systemfork's return value:
```
    inline int Fork::pid() const    {        return d_pid;    }
```
- waitForChild(): The memberint waitForChild can be called by parent processes to wait for the completion of their child processes (as discussed below). This member is declared in the class interface. Its implementation is:
```
    #include "fork.ih"    int Fork::waitForChild()    {        int status;        waitpid(d_pid, &status, 0);        return WEXITSTATUS(status);    }
```
  This simple implementation returns the child'sexit status to the parent. The called system functionwaitpidblocks until the child terminates.
Whenfork system calls are used,parent processes andchild processes must always be distinguished. The main distinction between these processes is thatd_pid becomes the child's process-id in the parent process, whiled_pid becomes 0 in the child process itself. Since these two processes must always be distinguished (and present), their implementation by classes derived fromFork is enforced byFork's interface: the memberschildProcess, defining the child process' actions andparentProcess, defining the parent process' actions were defined as pure virtual functions.
communication between parent- and child processes may use standard streams or other facilities, likepipes (cf. section26.2.5). To facilitate this inter-process communication, derived classesmay implement:
- childRedirections(): this member should be overridden by derived classes if any standard stream (cin, cout,) orcerr must be redirected in thechild process (cf. section26.2.3). By default it has an empty implementation;
- parentRedirections(): this member should be overridden by derived classes if any standard stream (cin, cout,) orcerr must be redirected in theparent process. By default it has an empty implementation.
Redirection of the standard streams is necessary if parent andchild processes must communicate with each other via the standard streams.Here are their default definitions. Since these functions are virtualfunctions they should not be implemented inline, but in their own source file:
```
    void Fork::childRedirections()    {}    void Fork::parentRedirections()    {}
```

26.2.2: Parents and Children

The member functionfork calls the system functionfork(Caution: since the system functionfork is called by a memberfunction having the same name, the:: scope resolution operator must beused to prevent a recursive call of the member function itself).The function::fork's return value determines whetherparentProcessorchildProcess is called. Maybe redirection isnecessary.Fork::fork's implementation callschildRedirectionsjust before callingchildProcess, andparentRedirections justbefore callingparentProcess:

    #include "fork.ih"    void Fork::fork()    {        if ((d_pid = ::fork()) < 0)            throw "Fork::fork() failed";        if (d_pid == 0)                 // childprocess has pid == 0        {            childRedirections();            childProcess();            exit(1);                    // we shouldn't come here:        }                               // childProcess() should exit        parentRedirections();        parentProcess();    }

Infork.cc the class'sinternal header filefork.ih is included. This header file takes care of the inclusion of thenecessary system header files, as well as the inclusion offork.hitself. Its implementation is:

    #include "fork.h"    #include <cstdlib>    #include <unistd.h>    #include <sys/types.h>    #include <sys/wait.h>

Child processes should not return: once they have completed their tasks,they should terminate. This happens automatically when the child processperforms a call to a member of theexec... family, but if the childitself remains active, then it must make sure that it terminates properly. Achild process normally usesexit to terminate itself, but note thatexit prevents the activation of destructors of objects defined at the same or more superficial nesting levels than the level atwhichexit is called. Destructors of globally defined objectsareactivated whenexit is used. When usingexit to terminatechildProcess, it should either itself call a support member functiondefining all nested objects it needs, or it should define all its objects in acompound statement (e.g., using athrow block) callingexit beyondthe compound statement.

Parent processes should normally wait for their children to complete.Terminating child processes inform their parents that they are about toterminate by sending asignal that should be caught by their parents. Ifchild processes terminate and their parent processes do not catch thosesignals then such child processes remain visible as so-calledzombieprocesses.

If parent processes must wait for their children to complete, they maycall the memberwaitForChild. This member returns the exit status of achild process to its parent.

There exists a situation where thechild processcontinues tolive, but theparent dies. This is a fairly natural event: parents tend todie before their children do. In our context (i.e.C++), this is called adaemon program. In a daemon the parent process dies and the child programcontinues to run as a child of the basicinit process. Again, when thechild eventually dies a signal is sent to its `step-parent'init. Thisdoes not create a zombie asinit catches the termination signals of allits (step-) children. The construction of a daemon process is very simple,given the availability of the classFork (cf. section26.2.4).

26.2.3: Redirection revisited

Earlier, in section6.6.2 streams were redirectedusing theios::rdbuf member function. By assigning thestreambuf of a stream to another stream, both stream objects access thesamestreambuf, thus implementing redirection at the level of theprogramming language itself.

This may be fine within the context of aC++ program, but once weleave that context the redirection terminates. The operating system does notknow aboutstreambuf objects. This situation is encountered, e.g., when aprogram uses asystem call to start a subprogram. The example program atthe end of this section usesC++ redirection to redirect the informationinserted intocout to a file, and then calls

    system("echo hello world")

to echo a well-known line of text. Sinceecho writes its informationto the standard output, this would be the program's redirected file if theoperating system would recognizeC++'s redirection.

But redirection doesn't happen. Instead,hello world still appears atthe program's standard output and the redirected file is left untouched. Towritehello world to the redirected file redirection must be realized atthe operating system level. Some operating systems (e.g.,Unix andfriends) provide system calls likedup anddup2 to accomplishthis. Examples of the use of these system calls are given in section26.2.5.

Here is the example of thefailing redirection at the system levelfollowingC++ redirection usingstreambuf redirection:

    #include <iostream>    #include <fstream>    #include <cstdlib>    using namespace std;    int main()    {        ofstream of("outfile");        streambuf *buf = cout.rdbuf(of.rdbuf());        cout << "To the of stream\n";        system("echo hello world");        cout << "To the of stream\n";        cout.rdbuf(buf);    }    /*        Generated output: on the file `outfile'        To the of stream        To the of stream        On standard output:        hello world    */

26.2.4: The `Daemon' program

Applications exist in which the only purpose offork is to start achild process. The parent process terminates immediately after spawning thechild process. If this happens, the child process continues to run as a childprocess ofinit, the always running first process onUnix systems. Sucha process is often called adaemon, running as abackground process.

Although the next example can easily be constructed as a plainCprogram, it was included in theC++ Annotations because it is so closelyrelated to the current discussion of theFork class. I thought aboutadding adaemon member to that class, but eventually decided against itbecause the construction of a daemon program is very simple and requires nofeatures other than those currently offered by the classFork. Here is anexample illustrating the construction of such a daemon program. Its childprocess doesn't doexit butthrow 0 which is caught by thecatchclause of the child'smain function. Doing this ensures that any objectsdefined by the child process are properly destroyed:

    #include <iostream>    #include <unistd.h>    #include "fork.h"    class Daemon: public Fork    {        void parentProcess() override       // the parent does nothing.        {}        void childProcess() override        // actions by the child        {            sleep(3);                                            // just a message...            std::cout << "Hello from the child process\n";            throw 0;                        // The child process ends        }    };    int main()    try    {        Daemon{}.fork();    }    catch(...)    {}    /*        Generated output:    The next command prompt, then after 3 seconds:    Hello from the child process    */

26.2.5: The class `Pipe'

Redirection at the system level requires the use offile descriptor, created by thepipe systemcall. When two processes want to communicate using such file descriptors, thefollowing happens:

The process constructs twoassociated file descriptors using thepipe system call. One of the file descriptors is used for writing, theother file descriptor is used for reading.
Forking takes place (i.e., the systemfork function is called),duplicating the file descriptors. Now we have four file descriptors asthe child process and the parent process both have their own copies of the twofile descriptors created bypipe.
One process (say, the parent process) uses the file descriptorsforreading. It should close its file descriptor intended forwriting.
The other process (say, the child process) uses the file descriptorsforwriting. It should therefore close its file descriptor intended forreading.
All information written by the child process to the filedescriptor intended for writing, can now be read by the parent process fromthe corresponding file descriptor intended for reading, thus establishing acommunication channel between the child and the parent process.

Though basically simple, errors may easily creep in. Functions of filedescriptors available to the two processes (child or parent) may easily getmixed up. To prevent bookkeeping errors, the bookkeeping may be properly setup once, to be hidden thereafter inside a class like thePipe classdeveloped here. Let's have a look at its characteristics (before usingfunctions likepipe anddup the compiler must have read the<unistd.h> header file):

Thepipe system call expects a pointer to twoint values,representing, respectively, the file descriptor used for reading and the filedescriptor used for writing. To avoid confusion, the classPipe defines anenum having values associating the indices of the array of 2-ints withsymbolic constants. The two file descriptors themselves are stored in a datamemberd_fd. Here is the initial section of the class's interface:
```
    class Pipe    {        enum    RW { READ, WRITE };        int     d_fd[2];
```
The class only needs a default constructor. This constructorcallspipe to create a set of associated file descriptors used foraccessing both ends of a pipe:
```
    Pipe::Pipe()    {        if (pipe(d_fd))            throw "Pipe::Pipe(): pipe() failed";    }
```
The membersreadOnly andreadFrom are used to configure thepipe's reading end. The latter function is used when using redirection. It isprovided with an alternate file descriptor to be used for reading from thepipe. Usually this alternate file descriptor isSTDIN_FILENO, allowingcin to extract information from the pipe. The former function is merelyused to configure the reading end of the pipe. It closes the matching writingend and returns a file descriptor that can be used to read from the pipe:
```
    int Pipe::readOnly()    {        close(d_fd[WRITE]);        return d_fd[READ];    }    void Pipe::readFrom(int fd)    {        readOnly();        redirect(d_fd[READ], fd);        close(d_fd[READ]);    }
```
writeOnly and twowrittenBy members are available toconfigure the writing end of a pipe. The former function is only used toconfigure the writing end of the pipe. It closes the reading end, andreturns a file descriptor that can be used for writing to the pipe:
```
    int Pipe::writeOnly()    {        close(d_fd[READ]);        return d_fd[WRITE];    }    void Pipe::writtenBy(int fd)    {        writtenBy(&fd, 1);    }    void Pipe::writtenBy(int const *fd, size_t n)    {        writeOnly();        for (size_t idx = 0; idx < n; idx++)            redirect(d_fd[WRITE], fd[idx]);        close(d_fd[WRITE]);    }
```
For the latter member two overloaded versions are available:
- writtenBy(int fd) is used to configuresingleredirection, so that a specific file descriptor (usuallySTDOUT_FILENOorSTDERR_FILENO) can be used to write to the pipe;
- (writtenBy(int const *fd, size_t n)) may be usedto configuremultiple redirection, providing an array argument containingfile descriptors. Information written to any of these file descriptors isactually written to the pipe.
The class has one private data member,redirect, used to set upredirection through thedup2 system call. This function expects two filedescriptors. The first file descriptor represents a file descriptor that canbe used to access the device's information; the second file descriptor is analternate file descriptor that may also be used to access the device'sinformation. Here isredirect's implementation:
```
    void Pipe::redirect(int d_fd, int alternateFd)    {        if (dup2(d_fd, alternateFd) < 0)            throw "Pipe: redirection failed";    }
```

Now that redirection can be configured easily using one or morePipeobjects, we'll useFork andPipe in various example programs.

26.2.6: The class `ParentSlurp'

The classParentSlurp, derived fromFork, starts a child processexecuting a stand-alone program (like/bin/ls). The (standard) output ofthe executed program is not shown on the screen but is read by the parentprocess.

For demonstration purposes the parent process writes the lines itreceives to its standard output stream, prepending linenumbers to thelines. It is attractive to redirect the parent's standardinput stream toallow the parent to read theoutput from the child process using itsstd::cininput stream. Therefore, the only pipe in the program is usedas aninput pipe for the parent, and anoutput pipe for the child.

The classParentSlurp has the following characteristics:

It is derived fromFork. Before startingParentSlurp's classinterface, the compiler must have readfork.h andpipe.h. The classonly uses one data member, aPipe objectd_pipe.
AsPipe's constructor already defines a pipe, and asd_pipeis automatically initialized byParentSlurp's default constructor, whichis implicitly provided, all additional members only exist forParentSlurp's own benefit so they can be defined in the class's (implicit)private section. Here is the class's interface:
```
    class ParentSlurp: public Fork    {        Pipe    d_pipe;        void childRedirections() override;        void parentRedirections() override;        void childProcess() override;        void parentProcess() override;    };
```
ThechildRedirections member configures the writing end of thepipe. So, all information written to the child's standard output stream endsup in the pipe. The big advantage of this is that no additional streams areneeded to write to a file descriptor:
```
    inline void ParentSlurp::childRedirections()    {        d_pipe.writtenBy(STDOUT_FILENO);    }
```
TheparentRedirections member, configures the reading end ofthe pipe. It does so by connecting the reading end of the pipe to the parent'sstandard input file descriptor (STDIN_FILENO). This allows the parent toperform extractions fromcin, not requiring any additional streams forreading.
```
    inline void ParentSlurp::parentRedirections()    {        d_pipe.readFrom(STDIN_FILENO);    }
```
ThechildProcess member only needs to concentrate on its ownactions. As it only needs to execute a program (writing information to itsstandard output), the member can consist of one single statement:
```
    inline void ParentSlurp::childProcess()    {        execl("/bin/ls", "/bin/ls", 0);    }
```
TheparentProcess member simply `slurps' the informationappearing at its standard input. Doing so, it actually reads the child'soutput. It copies the received lines to its standard output stream prefixingline numbers to them:
```
    void ParentSlurp::parentProcess()    {        std::string     line;        size_t    nr = 1;        while (getline(std::cin, line))            std::cout << nr++ << ": " << line << '\n';        waitForChild();    }
```

The following program simply constructs aParentSlurp object, andcalls itsfork() member. Its output consists of a numbered list of filesin the directory where the program is started. Note that the program alsoneeds thefork.o, pipe.o andwaitforchild.o object files (seeearlier sources):

    int main()    {        ParentSlurp{}.fork();    }    /*        Generated Output (example only, actually obtained output may differ):        1: a.out        2: bitand.h        3: bitfunctional        4: bitnot.h        5: daemon.cc        6: fdinseek.cc        7: fdinseek.h        ...    */

26.2.7: Communicating with multiple children

The next step up the ladder is the construction of a child-processmonitor. Here, the parent process is responsible for all its child processes,but it also must read their standard output. The user enters information atthe standard input of the parent process. A simplecommand language isused for this:

start: this starts a new child process. The parent returns thechild's ID (a number) to the user. The ID is thereupon be used to identify aparticular child process;
<nr> text sends ``text'' to the child process having ID<nr>;
stop <nr> terminates the child process having ID<nr>;
exit terminates the parent as well as all its child processes.

If a child process hasn't received text for some time it will complain bysending a message to the parent-process. Those messages are simply transmittedto the user by copying them to the standard output stream.

A problem with programs like our monitor is that they allowasynchronous input from multiple sources. Input may appear at thestandard input as well as at the input-sides of pipes. Also, multiple outputchannels are used. To handle situations like these, theselect systemcall was developed.

26.2.7.1: The class `Selector': interface

Theselect system call was developed to handle asynchronous I/O multiplexing. Theselect system call is used to handle, e.g., input appearingsimultaneously at a set of file descriptors.

Theselect function is rather complex, and its full discussion isbeyond theC++ Annotations' scope. By encapsulatingselect in aclassSelector, hiding its details and offering an intuitively attractiveinterface, its use is simplified. TheSelector class has thesefeatures:

Efficiency. As most ofSelect's members are very small,most members can be implemented inline. The class requires quite a few datamembers. Most of these data members belong to types that require some systemheaders to be included first:
```
    #include <limits.h>    #include <unistd.h>    #include <sys/time.h>    #include <sys/types.h>
```

The class interface can now be defined. The data typefd_set is atype designed to be used byselect and variables of this type contain theset of file descriptors on whichselect may sense someactivity. Furthermore,select allows us to fire anasynchronous alarm. To set the alarm time, the classSelectordefines atimeval data member. Other members are used for internalbookkeeping purposes. Here is the classSelector's interface:

    class Selector    {        fd_set          d_read;        fd_set          d_write;        fd_set          d_except;        fd_set          d_ret_read;        fd_set          d_ret_write;        fd_set          d_ret_except;        timeval         d_alarm;        int             d_max;        int             d_ret;        int             d_readidx;        int             d_writeidx;        int             d_exceptidx;        public:            Selector();            int exceptFd();            int nReady();            int readFd();            int wait();            int writeFd();            void addExceptFd(int fd);            void addReadFd(int fd);            void addWriteFd(int fd);            void noAlarm();            void rmExceptFd(int fd);            void rmReadFd(int fd);            void rmWriteFd(int fd);            void setAlarm(int sec, int usec = 0);        private:            int checkSet(int *index, fd_set &set);            void addFd(fd_set *set, int fd);    };

26.2.7.2: The class `Selector': implementation

Selector's member functions serve the following tasks:

Selector(): the (default) constructor. Itclears the read, write, and executefd_set variables, and switches off thealarm. Except ford_max, the remaining data members do not requirespecific initializations:
```
    Selector::Selector()    {        FD_ZERO(&d_read);        FD_ZERO(&d_write);        FD_ZERO(&d_except);        noAlarm();        d_max = 0;    }
```

int wait(): this memberblocks until thealarm timesout or until activity is sensed at any of the file descriptors monitored bytheSelector object. It throws an exception when theselect systemcall itself fails:

    int Selector::wait()    {        timeval t = d_alarm;        d_ret_read = d_read;        d_ret_write = d_write;        d_ret_except = d_except;        d_readidx = 0;        d_writeidx = 0;        d_exceptidx = 0;        d_ret = select(d_max, &d_ret_read, &d_ret_write, &d_ret_except,                       t.tv_sec == -1 && t.tv_usec == -1 ? 0 : &t);        if (d_ret < 0)            throw "Selector::wait()/select() failed";        return d_ret;    }

int nReady: this member function's return value is onlydefined whenwait has returned. In that case it returns 0 for analarm-timeout, -1 ifselect failed, and otherwise the number of filedescriptors on which activity was sensed:
```
    inline int Selector::nReady()    {        return d_ret;    }
```
int readFd(): this member function's returnvalue is also only defined afterwait has returned. Its return value is-1 if no (more) input file descriptors are available. Otherwise the next filedescriptor available for reading is returned:
```
    inline int Selector::readFd()    {        return checkSet(&d_readidx, d_ret_read);    }
```
int writeFd(): operating analogously toreadFd, itreturns the next file descriptor to which output is written. It usesd_writeidx andd_ret_read and is implemented analogously toreadFd;
int exceptFd(): operating analogously toreadFd, itreturns the next exception file descriptor on which activity was sensed. Itusesd_except_idx andd_ret_except and is implemented analogously toreadFd;
void setAlarm(int sec, int usec = 0): this memberactivatesSelect's alarm facility. At least the number of seconds to waitfor the alarm to go off must be specified. It simply assigns values tod_alarm's fields. At the nextSelect::wait call, the alarm fires(i.e.,wait returns with return value 0) once the configuredalarm-interval has passed:
```
    inline void Selector::setAlarm(int sec, int usec)    {        d_alarm.tv_sec  = sec;        d_alarm.tv_usec = usec;    }
```
void noAlarm(): this member switches off the alarm, bysimply setting the alarm interval to a very long period:
```
    inline void Selector::noAlarm()    {        setAlarm(-1, -1);    }
```
void addReadFd(int fd): this member adds afile descriptor to the set of input file descriptors monitored by theSelector object. The member functionwait returns once input isavailable at the indicated file descriptor:
```
    inline void Selector::addReadFd(int fd)    {        addFd(&d_read, fd);    }
```
void addWriteFd(int fd): this member adds a filedescriptor to the set of output file descriptors monitored by theSelectorobject. The member functionwait returns once output is available atthe indicated file descriptor. Usingd_write, it is implementedanalogously toaddReadFd;
void addExceptFd(int fd): this member adds a filedescriptor to the set of exception file descriptors to be monitored by theSelector object. The member functionwait returns once activityis sensed at the indicated file descriptor. Usingd_except, it isimplemented analogously toaddReadFd;
void rmReadFd(int fd): this member removes a filedescriptor from the set of input file descriptors monitored by theSelector object:
```
    inline void Selector::rmReadFd(int fd)    {        FD_CLR(fd, &d_read);    }
```
void rmWriteFd(int fd): this member removes a filedescriptor from the set of output file descriptors monitored by theSelector object. Usingd_write, it is implemented analogously tormReadFd;
void rmExceptFd(int fd): this member removes a filedescriptor from the set of exception file descriptors to be monitored by theSelector object. Usingd_except, it is implemented analogously tormReadFd;

The class's remaining (two) members are support members, and should not beused by non-member functions. Therefore, they are declared in the class'sprivate section:

The memberaddFd adds a file descriptor to afd_set:

    void Selector::addFd(fd_set *set, int fd)    {        FD_SET(fd, set);        if (fd >= d_max)            d_max = fd + 1;    }

The membercheckSet tests whether a file descriptor (*index)is found in afd_set:

    int Selector::checkSet(int *index, fd_set &set)    {        int &idx = *index;        while (idx < d_max && !FD_ISSET(idx, &set))            ++idx;        return idx == d_max ? -1 : idx++;    }

26.2.7.3: The class `Monitor': interface

Themonitor program uses aMonitor object doing most of thework. The classMonitor's public interface only offers a defaultconstructor and one member,run, to perform its tasks. All other memberfunctions are located in the class'sprivate section.

Monitor defines theprivate enumCommands, symbolicallylisting the various commands its input language supports, as well as severaldata members. Among the data members are aSelector object and amapusing child order numbers as its keys and pointer toChild objects (seesection26.2.7.7) as its values. Furthermore,Monitor has a static arraymembers_handler[], storing pointers to member functions handling usercommands.

A destructor should be implemented as well, but its implementation is leftas an exercise to the reader. Here isMonitor's interface, including theinterface of the nested classFind that is used to create a functionobject:

    class Monitor    {        enum Commands        {            UNKNOWN,            START,            EXIT,            STOP,            TEXT,            sizeofCommands        };        using MapIntChild = std::map<int, std::shared_ptr<Child>>;        friend class Find;        class Find        {            int     d_nr;            public:                Find(int nr);                bool operator()(MapIntChild::value_type &vt) const;        };        Selector    d_selector;        int         d_nr;        MapIntChild d_child;        static void (Monitor::*s_handler[])(int, std::string const &);        static int s_initialize;        public:            enum Done            {};            Monitor();            void run();        private:            static void killChild(MapIntChild::value_type it);            static int initialize();            Commands    next(int *value, std::string *line);            void    processInput();            void    processChild(int fd);            void    createNewChild(int, std::string const &);            void    exiting(int = 0, std::string const &msg = std::string{});            void    sendChild(int value, std::string const &line);            void    stopChild(int value, std::string const &);            void    unknown(int, std::string const &);    };

Since there's only one non-class type data member, the class's constructoris a very simple function which could be implemented inline:

    inline Monitor::Monitor()    :        d_nr(0)    {}

26.2.7.4: The class `Monitor': s_handler

The arrays_handler, storing pointers to functions needs to be initialized aswell. This can be accomplished in several ways:

Since theCommands enumeration only specifies a fairly limitedset of commands, compile-time initialization could be considered:
```
    void (Monitor::*Monitor::s_handler[])(int, string const &)  =    {        &Monitor::unknown,              // order follows enum Command's        &Monitor::createNewChild,       // elements        &Monitor::exiting,        &Monitor::stopChild,        &Monitor::sendChild,    };
```
The advantage of this is that it's simple, not requiring any run-timeeffort. The disadvantage is of course relatively complex maintenance. If forsome reasonCommands is modified,s_handler must be modified aswell. In cases like these, compile-time initialization often isasking for trouble. There is a simple alternative though.
Looking atMonitor's interface we see a static data members_initialize and a static member functioninitialize. The staticmember function handles the initialization of thes_handler array. Itexplicitly assigns the array's elements and any modification in ordering ofenum Commands' values is automatically accounted for by recompilinginitialize:
```
    void (Monitor::*Monitor::s_handler[sizeofCommands])(int, string const &);    int Monitor::initialize()    {        s_handler[UNKNOWN] =    &Monitor::unknown;        s_handler[START] =      &Monitor::createNewChild;        s_handler[EXIT] =       &Monitor::exiting;        s_handler[STOP] =       &Monitor::stopChild;        s_handler[TEXT] =       &Monitor::sendChild;        return 0;    }
```
The memberinitialize is a static member and so it can becalled to initializes_initialize, a staticint variable. Theinitialization is enforced by placing the initialization statement in thesource file of a function that is known to be executed. It could bemain,but if we'reMonitor's maintainers and only have control over the librarycontainingMonitor's code then that's not an option. In those cases thesource file containing the destructor is avery good candidate. If a classhas only one constructor and it'snot defined inline then theconstructor's source file is a good candidate as well. InMonitor'scurrent implementation the initialization statement is put inrun's sourcefile, reasoning thats_handler is only needed whenrun is used.

26.2.7.5: The class `Monitor': the member `run'

Monitor's core activities are performed byrun. Itperforms the following tasks:

Initially, theMonitor object only monitors its standardinput. The set of input file descriptors to whichd_selector listensis initialized toSTDIN_FILENO.
Then, in a loopd_selector'swait function is called.If input oncin is available, it is processed byprocessInput.Otherwise, the input has arrived from a child process. Information sent bychildren is processed byprocessChild.
To preventzombies, the child processes must catchtheirchildren's termination signals. This is discussed below.
As noted by Ben Simons (ben at mrxfx dot com)Monitor must not catchthe termination signals. Instead, the process spawning child processes hasthat responsibility (the underlying principle being that a parent process isresponsible for its child processes; a child process, in turn, is responsiblefor its own child processes).
As stated,run's source file also defines and initializess_initialize to ensure the proper initialization of thes_handlerarray.

Here isrun's implementation ands_initialize's definition:

    #include "monitor.ih"    int Monitor::s_initialize = Monitor::initialize();    void Monitor::run()    {        d_selector.addReadFd(STDIN_FILENO);        while (true)        {            cout << "? " << flush;            try            {                d_selector.wait();                int fd;                while ((fd = d_selector.readFd()) != -1)                {                    if (fd == STDIN_FILENO)                        processInput();                    else                        processChild(fd);                }                cout << "NEXT ...\n";            }            catch (char const *msg)            {                exiting(1, msg);            }        }    }

The member functionprocessInput reads the commands entered by theuser using the program's standard input stream. The member itself is rathersimple. It callsnext to obtain the next command entered by the user, andthen calls the corresponding function using the matching element of thes_handler[] array. Here are the membersprocessInput andnext:

    void Monitor::processInput()    {        string line;        int    value;        Commands cmd = next(&value, &line);        (this->*s_handler[cmd])(value, line);    }

    Monitor::Commands Monitor::next(int *value, string *line)    {        if (!getline(cin, *line))            exiting(1, "Monitor::next(): reading cin failed");        if (*line == "start")            return START;        if (*line == "exit" || *line == "quit")        {            *value = 0;            return EXIT;        }        if (line->find("stop") == 0)        {            istringstream istr(line->substr(4));            istr >> *value;            return !istr ? UNKNOWN : STOP;        }        istringstream istr(line->c_str());        istr >> *value;        if (istr)        {            getline(istr, *line);            return TEXT;        }        return UNKNOWN;    }

All other input sensed byd_select is created by childprocesses. Becaused_select'sreadFd member returns the correspondinginput file descriptor, this descriptor can be passed toprocessChild. Using aIFdStreambuf (see section26.1.2.1), itsinformation is read from an input stream. The communication protocol used hereis rather basic. For every line of input sent to a child, the child replies bysending back exactly one line of text. This line is then read byprocessChild:

    void Monitor::processChild(int fd)    {        IFdStreambuf ifdbuf(fd);        istream istr(&ifdbuf);        string line;        getline(istr, line);        cout << d_child[fd]->pid() << ": " << line << '\n';    }

The constructiond_child[fd]->pid() used in the above source deservessome special attention.Monitor defines the data membermap<int,shared_ptr<Child>> d_child. This map contains the child's order numberas its key, and a (shared) pointer to theChild object as its value. Ashared pointer is used here, rather than aChild object, since we want touse the facilities offered by the map, but don't want to copy aChildobject time and again.

26.2.7.6: The class `Monitor': example

Now thatrun's implementation has been covered, we'll concentrate onthe various commands users might enter:

When thestart command is issued, a new child process is started.A new element is added tod_child by the membercreateNewChild. Next,theChild object should start its activities, but theMonitor objectcan not wait for the child process to complete its activities, as there is nowell-defined endpoint in the near future, and the user probably wants to beable to enter more commands. Therefore, theChild process must run as adaemon. So the forked process terminates immediately, but its own childprocess continues to run (in the background). Consequently,createNewChild calls the child'sfork member. Although it is thechild'sfork function that is called, it is still the monitor programwherein thatfork function is called. So, themonitor program isduplicated byfork. Execution then continues:
- At theChild'sparentProcess in its parent process;
- At theChild'schildProcess in its child process
As theChild'sparentProcess is an empty function, returningimmediately, theChild's parent process effectively continues immediatelybelowcreateNewChild'scp->fork() statement. As the child processnever returns (see section26.2.7.7), the code belowcp->fork() is neverexecuted by theChild's child process. This is exactly as it should be.
In the parent process,createNewChild's remaining code simplyadds the file descriptor that's available for reading information from thechild to the set of input file descriptors monitored byd_select, andusesd_child to establish the association between thatfile descriptor and theChild object's address:
```
    void Monitor::createNewChild(int, string const &)    {        Child *cp = new Child{ ++d_nr };        cp->fork();        int fd = cp->readFd();        d_selector.addReadFd(fd);        d_child[fd].reset(cp);        cerr << "Child " << d_nr << " started\n";    }
```
Direct communication with the child is required for thestop <nr>and<nr> text commands. The former command terminates child process<nr>, by callingstopChild. This function locates the child processhaving the order number using an anonymous object of the classFind,nested insideMonitor. The classFind simply compares theprovidednr with the children's order number returned by theirnrmembers:
```
    inline Monitor::Find::Find(int nr)    :        d_nr(nr)    {}    inline bool Monitor::Find::operator()(MapIntChild::value_type &vt) const    {        return d_nr == vt.second->nr();    }
```
If the child process having order numbernr was found, its filedescriptor is removed fromd_selector's set of input filedescriptors. Then the child process itself is terminated by the static memberkillChild. The memberkillChild is declared as astatic memberfunction, as it is used as function argument of thefor_each genericalgorithm byexiting (see below). Here iskillChild'simplementation:
```
    void Monitor::killChild(MapIntChild::value_type it)    {        if (kill(it.second->pid(), SIGTERM))            cerr << "Couldn't kill process " << it.second->pid() << '\n';        // reap defunct child process        int status = 0;        while( waitpid( it.second->pid(), &status, WNOHANG) > -1)            ;    }
```
Having terminated the specified child process, the correspondingChildobject is destroyed and its pointer is removed fromd_child:
```
    void Monitor::stopChild(int nr, string const &)    {        auto it = find_if(d_child.begin(), d_child.end(), Find{ nr });        if (it == d_child.end())            cerr << "No child number " << nr << '\n';        else        {            d_selector.rmReadFd(it->second->readFd());            d_child.erase(it);        }    }
```

The command<nr> text sendstext to child processnrusing the member functionsendChild. This function also uses aFindobject to locate the child-process having order numbernr, and simplyinserts the text into the writing end of a pipe connected to that childprocess:

    void Monitor::sendChild(int nr, string const &line)    {        auto it = find_if(d_child.begin(), d_child.end(), Find(nr));        if (it == d_child.end())            cerr << "No child number " << nr << '\n';        else        {            OFdnStreambuf ofdn{ it->second->writeFd() };            ostream out(&ofdn);            out << line << '\n';        }    }

When users enterexit orquit the memberexiting iscalled. It terminates all child processes using thefor_each genericalgorithm (see section19.1.18) to visit all elements ofd_child. Then the program itself ends:

    void Monitor::exiting(int value, string const &msg)    {        for_each(d_child.begin(), d_child.end(), killChild);        if (msg.length())            cerr << msg << '\n';        throw value;    }

The program'smain function is simple and needs no further comment:

    int main()    try    {        Monitor{}.run();    }    catch (int exitValue)    {        return exitValue;    }

26.2.7.7: The class `Child'

When theMonitor object starts a child process, it creates an objectof the classChild. TheChild class is derived from the classFork, allowing it to operate as adaemon (as discussed in theprevious section). SinceChild is a daemon class, we know that its parentprocess must be defined as an empty function. ItschildProcess memberhas a non-empty implementation. Here are the characteristics of the classChild:

TheChild class has twoPipe data members, to handlecommunications between its own child- and parent processes. As these pipes areused by theChild's child process, their names refer to the childprocess. The child process reads fromd_in, and writes tod_out. Hereis the interface of the classChild:

    class Child: public Fork    {        Pipe                d_in;        Pipe                d_out;        int         d_parentReadFd;        int         d_parentWriteFd;        int         d_nr;        public:            Child(int nr);            ~Child() override;            int readFd() const;            int writeFd() const;            int pid() const;            int nr() const;        private:            void childRedirections() override;            void parentRedirections() override;            void childProcess() override;            void parentProcess() override;    };

TheChild's constructor simply stores its argument, achild-process order number, in its ownd_nr data member:
```
    inline Child::Child(int nr)    :        d_nr(nr)    {}
```
TheChild's child process obtains commands from its standardinput stream and writes its output to its standard output stream. Since theactual communication channels are pipes, redirections must be used. ThechildRedirections member looks like this:
```
    void Child::childRedirections()    {        d_in.readFrom(STDIN_FILENO);        d_out.writtenBy(STDOUT_FILENO);    }
```
Although the parent process performs no actions, it must configuresome redirections. Realizing that the names of the pipes indicate theirfunctions in the child process. So the parentwrites tod_in andreads fromd_out. Here isparentRedirections:
```
    void Child::parentRedirections()    {        d_parentReadFd = d_out.readOnly();        d_parentWriteFd = d_in.writeOnly();    }
```
TheChild object exists until it is destroyed by theMonitor'sstopChild member. By allowing its creator, theMonitorobject, to access the parent-side ends of the pipes, theMonitor objectcan communicate with theChild's child process via those pipe-ends. ThemembersreadFd andwriteFd allow theMonitor object to accessthese pipe-ends:
```
    inline int Child::readFd() const    {        return d_parentReadFd;    }    inline int Child::writeFd() const    {        return d_parentWriteFd;    }
```

TheChild object's child process performs two tasks:

It must reply to information appearing at its standard inputstream;
If no information has appeared within a certain time frame (theimplementations uses an interval of five seconds), then a message is writtento its standard output stream.

To implement this behavior,childProcess defines a localSelector object, addingSTDIN_FILENO to its set of monitored inputfile descriptors.

Then, in an endless loop,childProcess waits forselector.wait()to return. When the alarm goes off it sends a message to its standard output(hence, into the writing pipe). Otherwise, it echoes the messages appearingat its standard input to its standard output. Here is thechildProcessmember:

    void Child::childProcess()    {        Selector    selector;        size_t    message = 0;        selector.addReadFd(STDIN_FILENO);        selector.setAlarm(5);        while (true)        {            try            {                if (!selector.wait())       // timeout                    cout << "Child " << d_nr << ": standing by\n";                else                {                    string  line;                    getline(cin, line);                    cout << "Child " << d_nr << ":" << ++message << ": " <<                                                        line << '\n';                }            }            catch (...)            {                    cout << "Child " << d_nr << ":" << ++message << ": " <<                                "select() failed" << '\n';            }        }        exit(0);    }

Two accessors are defined allowing theMonitor object to obtaintheChild's process ID and its order number:

    inline int Child::pid() const    {        return Fork::pid();    }    inline int Child::nr() const    {        return d_nr;    }

AChild process terminates when the user enters astopcommand. When an existing child process number was entered, the correspondingChild object is removed fromMonitor'sd_child map. As a result,its destructor is called.Child's destructor callskill to terminateits child, and then waits for the child to terminate. Once its child hasterminated, the destructor has completed its work and returns, thus completingthe erasure fromd_child. The current implementation fails if the childprocess doesn't react to theSIGTERM signal. In this demonstration programthis does not happen. In `real life' more elaborate killing-procedures may berequired (e.g., usingSIGKILL in addition toSIGTERM). As discussed insection10.12 itis important to ensure the properdestruction. Here is theChild's destructor:
```
    Child::~Child()    {        if (pid())        {            cout << "Killing process " << pid() << "\n";            kill(pid(), SIGTERM);            int status;            wait(&status);        }    }
```

26.3: Adding binary operators to classes

As we've seen in section11.7 binary operators expectingconst & arguments can be implemented using a member implementing theoperation, only offering the basic exception guarantee.This latter function can in turn be implemented using the binaryassignment member. The following examples illustrated this approach for afictitious classBinary:

    class Binary    {        public:            Binary();            Binary(int value);                // copy and move constructors are available by default, or                // they can be explicitly declared and implemented.            Binary &operator+=(Binary const &other) &;    // see the text            Binary &&operator+=(Binary const &other) &&;        private:            void add(Binary const &rhs);        friend Binary operator+(Binary const &lhs, Binary const &rhs);        friend Binary operator+(Binary &&lhs, Binary const &rhs);    };

Eventually, the implementation of binary operators depends on the availabilityof the member implementing the basic binary operation, modifying the objectcalling that member (i.e.,void Binary::add(Binary const &) in theexample).

Since template functions are not instantiated before they are actually used wecan call non-existing functions from template functions that are neverinstantiated. If such a template function is never instantiated, nothinghappens; if it is (accidentally) instantiated, then the compiler generates anerror message, complaining about the missing function.

This allows us to implement all binary operators, movable and non-movable, astemplates. In the following subsections we develop the class templateBinops, prividing binary operators. A complete implementation of a classDerived illustrating how addition and insertion operators can be added toa class is provided in the fileannotations/yo/concrete/examples/binopclasses.cc in theC++ Annotations'source archive.

26.3.1: Merely using operators

In section11.7 addition operators are implemented in terms ofa support memberadd. This is less attractive when developing functiontemplates, asadd is a private member, requiring us to provide frienddeclarations for all function templates so they may access the privateaddmember.

At the end of section11.7 we saw thatadd's implementationcan be provided byoperator+=(Class const &rhs) &&. This operator maythereupon be used when implementing the remaining addition operators:

    inline Binary &operator+=(Binary const &rhs) &    {        return *this = Binary{*this} += rhs;            }    Binary operator+(Binary &&lhs, Binary const &rhs)    {        return std::move(lhs) += rhs;    }    Binary operator+(Binary const &lhs, Binary const &rhs)    {        return Binary{lhs} += rhs;    }

In this implementationadd is no longer required. The plain binaryoperators are free functions, which supposedly can easily be converted tofunction templates. E.g.,

    template <typename Binary>    Binary operator+(Binary const &lhs, Binary const &rhs)    {        return Binary{lhs} += rhs;    }

26.3.1.1: To namespace or not to namespace?

When using the function templateBinary operator+(Binary const &lhs, Binaryconst &rhs), however, we may encounter a subtle and unexpectedcomplication. Consider the following program. When run, it displays the value12, rather than 1:

    enum Values    {        ZERO,        ONE    };        template <typename Tp>    Tp operator+(Tp const &lhs, Tp const &rhs)    {        return static_cast<Tp>(12);    };        int main()    {        cout << (ZERO + ONE);       // shows 12    }

This complication can be avoided by defining the operators in their ownnamespace, but then all classes using the binary operator also have to bedefined in that namespace, which is not a very attractiverestriction. Fortunately, there is a better alternative: using the CRTP(cf. section22.12).

26.3.2: The CRTP and defining operator function templates

When deriving classes from a class templateBinops, using the CRTP theoperators are defined for arguments of the classBinops<Derived>: a baseclass receiving the derived class as its template argument.

Thus the classBinops as well as the additional operators are defined,expectingBinops<Derived> type of arguments:

    template <class Derived>    struct Binops    {        Derived &operator+=(Derived const &rhs) &;    };    template <typename Derived>    Derived operator+(Binops<Derived> const &lhs, Derived const &rhs)    {        return Derived{static_cast<Derived const &>(lhs) } += rhs;    }    // analogous implementation for Binops<Derived> &&lhs

This way, a class that derives fromBinops, and that provides anoperator+= member which is bound to an rvalue reference object, suddenlyalso provides all other binary addition operators:

    class Derived: public Binops<Derived>    {        ...        public:            ...            Derived &&operator+=(Derived const &rhs) &&    };

All, but one....

The operator that's not available is the compound addition operator,bound to an lvalue reference. As its function name is identical to the one inthe classDerived, it is not automatically visible at the user level.

Although this problem can simply be solved by providing the classDerivedwith ausing Binops<Derived>::operator+= declaration, it is not a veryattractive solution, as separate using declarations have to be provided foreach binary operator that is implemented in the classDerived.

But amuch more attractive solution exists. A beautiful out-of-the-boxsolution, completely avoiding the hidden base class operator, was proposedbyWiebe-Marten Wijnja. Wiebe-Marten conjectured thatoperator+=, boundto an lvalue reference could also very well be defined as afreefunction. In that case no inheritance is used and therefore no function hidingoccurs. Consequently, theusing directive can be avoided.

The implementation of this freeoperator+= function looks like this:

    template <class Derived>    Derived &operator+=(Binops<Derived> &lhs, Derived const &rhs)     {        Derived tmp{ Derived{ static_cast<Derived &>(lhs) } += rhs };        tmp.swap(static_cast<Derived &>(lhs));        return static_cast<Derived &>(lhs);    }

The flexibility of this design can be further augmented once we realize thatthe right-hand side operand doesn't have to be aDerived classobject. Consideroperator<<: oftentimes shifts are bit-shifts, using asize_t to specify the number of bits to shift. In fact, the type of theright-hand side operand can completely be generalized by defining a secondtemplate type parameter, which is used to specify the right-hand side'soperand type. It's up to theDerived class to specify the argument type ofitsoperator+= (or any other binary compound operator), whereafter thecompiler will deduce the types of the right-hand side operands for theremaining binary operators. Here is the final implementation of the freeoperator+= function:

    template <class Derived, typename Rhs>    Derived &operator+=(Binops<Derived> &lhs, Rhs const &rhs)     {        Derived tmp{ Derived{ static_cast<Derived &>(lhs) } += rhs };        tmp.swap(static_cast<Derived &>(lhs));        return static_cast<Derived &>(lhs);    }

26.3.3: Insertion and extraction

Classes also frequently define overloaded insertion and extractionoperators. Since there are no `compound insertion operators' the design shownso far cannot be used when overloading these operators. Instead usingstandardized member function signatures is advocated:voidinsert(std::ostream &out) const to insert an object into anostream andvoid extract(std::istream &in) const to extract an object from anistream. As these functions are only used by, respectively, the insertionand extraction operators, they can be declared in theDerived class'sprivate interface. Instead of declaring the insertion and extraction operatorsfriends of the classDerived a singlefriend Binops<Derived> isspecified. This allowsBinops<Derived> to define private, inlineiWrapandeWrap members, merely calling, respectively,Derived's insert andextract members:

    template <typename Derived>    inline void Binops<Derived>::iWrap(std::ostream &out) const    {        static_cast<Derived const &>(*this).insert(out);    }

Binops<Derived> then declares the insertion and extraction operators asits friends, allowing these operators to call, respectively,iWrap andeWrap. Note that the software engineer designing the classDerivedonly has to provide afriend Binops<Derived> declaration. Here is theimplementation of the overloaded insertion operator:

    template <typename Derived>    std::ostream &operator<<(std::ostream &out, Binops<Derived> const &obj)    {        obj.iWrap(out);        return out;    }

This completes the coverage of the essentials of a class templateBinopspotentially offering binary operators and insertion/extraction operators forany class derived fromBinops. Finally, as noted at the beginning of thissection, a complete implementation of a class offering addition and insertionoperators is provided in the fileannotations/yo/concrete/examples/binopclasses.cc in theC++ Annotations'source archive.

26.4: Distinguishing lvalues from rvalues with operator[]()

A problem withoperator[] is that it can't distinguish between its use as anlvalue and as anrvalue. It is a familiar misconception tothink that

    Type const &operator[](size_t index) const

is used asrvalue (as the object isn't modified), and that

    Type &operator[](size_t index)

is used aslvalue (as the returned value can be modified).

The compiler, however, distinguishes between the two operators only by theconst-status of the object for whichoperator[] is called. Withconst objects the former operator is called, with non-const objectsthe latter is always used. It is always used, irrespective of it being used aslvalue or rvalue.

Being able to distinguish between lvalues and rvalues can be veryuseful. Consider the situation where a class supportingoperator[] storesdata of a type that is very hard to copy. With data like that referencecounting (e.g., usingshared_ptrs) is probably used to prevent needlesscopying.

As long asoperator[] is used as rvalue there's no need to copy the data,but the informationmust be copied if it is used as lvalue.

TheProxy Design Pattern (cf.Gamma et al. (1995)) canbe used to distinguish between lvalues and rvalues. With the Proxy DesignPattern an object of another class (the Proxy class) is used to act as astand in for the `real thing'. The proxy class offers functionality thatcannot be offered by the data themselves, like distinguishing between its useas lvalue or rvalue. A proxy class can be used in many situations where accessto the real data cannot or should not be directly provided. In this regarditerator types are examples of proxy classes as they create a layerbetween the real data and the software using the data. Proxy classes couldalso dereference pointers in a class storing its data by pointers.

In this section we concentrate on the distinction between usingoperator[]as lvalue and rvalue. Let's assume we have a classLines storing linesfrom a file. Its constructor expects the name of a stream from which thelines are read and it offers a non-constoperator[] that can be used aslvalue or rvalue (theconst version ofoperator[] is omitted as itcauses no confusion because it is always used as rvalue):

    class Lines    {        std::vector<std::string> d_line;        public:            Lines(std::istream &in);            std::string &operator[](size_t idx);    };

To distinguish between lvalues and rvalues we must find distinguishingcharacteristics of lvalues and rvalues that we can exploit. Suchdistinguishing characteristics areoperator= (which is always used aslvalue) and the conversion operator (which is always used as rvalue). Ratherthan havingoperator[] return astring & we can let it return aProxy object that is able to distinguish between its use as lvalueand rvalue.

The classProxy thus needsoperator=(string const &other) (acting aslvalue) andoperator std::string const &() const (acting as rvalue). Do weneed more operators? Thestd::string class also offersoperator+=, sowe should probably implement that operator as well. Plain characters can alsobe assigned tostring objects (even using their numeric values). Asstring objects cannot beconstructed from plain characterspromotion cannot be used withoperator=(string const &other) if theright-hand side argument is a character. Implementingoperator=(charvalue) could therefore also be considered. These additional operators areleft out of the current implementation but `real life' proxy classes shouldconsider implementing these additional operators as well. Another subtlety isthatProxy'soperator std::string const &() const is not usedwhen usingostream's insertion operator oristream's extractionoperator as these operators are implemented as templates not recognizing ourProxy class type. So when stream insertion and extraction is required (itprobably is) thenProxy must be given its own overloaded insertion andextraction operator. Here is an implementation of the overloaded insertionoperator inserting the object for whichProxy is a stand-in:

inline std::ostream &operator<<(std::ostream &out, Lines::Proxy const &proxy){    return out << static_cast<std::string const &>(proxy);}

There's no need for any code (exceptLines) to create or copyProxyobjects.Proxy's constructor should therefore be made private, andProxy can declareLines to be its friend. In fact,Proxy isintimately related toLines and can be defined as a nested class. In therevisedLines classoperator[] no longer returns astring butinstead aProxy is returned. Here is the revisedLines class,including its nestedProxy class:

    class Lines    {        std::vector<std::string> d_line;        public:            class Proxy;            Proxy operator[](size_t idx);            class Proxy            {                friend Proxy Lines::operator[](size_t idx);                std::string &d_str;                Proxy(std::string &str);                public:                    std::string &operator=(std::string const &rhs);                    operator std::string const &() const;            };            Lines(std::istream &in);    };

Proxy's members are very lightweight and can usually be implementedinline:

    inline Lines::Proxy::Proxy(std::string &str)    :        d_str(str)    {}    inline std::string &Lines::Proxy::operator=(std::string const &rhs)    {        return d_str = rhs;    }    inline Lines::Proxy::operator std::string const &() const    {        return d_str;    }

The memberLines::operator[] can also be implemented inline: it merelyreturns aProxy object initialized with thestring associated withindexidx.

Now that the classProxy has been developed it can be used in aprogram. Here is an example using theProxy object as lvalue or rvalue. Onthe surfaceLines objects won't behave differently fromLines objectsusing the original implementation, but by adding an identifyingcoutstatement toProxy's members it can be shown thatoperator[] behavesdifferently when used as lvalue or as rvalue:

    int main()    {        ifstream in("lines.cc");        Lines lines(in);        string s = lines[0];        // rvalue use        lines[0] = s;               // lvalue use        cout << lines[0] << '\n';   // rvalue use        lines[0] = "hello world";   // lvalue use        cout << lines[0] << '\n';   // rvalue use    }

26.5: Implementing a `reverse_iterator'

In section22.14.1 the construction of iterators andreverse iterators was discussed. In that section the iterator was constructedas an inner class in a class derived from a vector of pointers to strings.

An object of this nested iterator class handles the dereferencing of thepointers stored in the vector. This allowed us to sort thestringspointed to by the vector's elements rather than thepointers.

A drawback of this is that the class implementing the iterator is closelytied to the derived class as the iterator class was implemented as a nestedclass. What if we would like to provide any class derived from a containerclass storing pointers with an iterator handling pointer-dereferencing?

In this section a variant of the earlier (nested class) approach isdiscussed. Here the iterator class is defined as a class template, not only parameterizing the data type to which the container's elementspoint but also the container's iterator type itself. Once again, weconcentrate on developing aRandomIterator as it is the most complexiterator type.

Our class is namedRandomPtrIterator, indicating that it is a randomiterator operating on pointer values. The class template defines threetemplate type parameters:

The first parameter specifies the derived class type(Class). Like before,RandomPtrIterator'sconstructor is private. Thereforefriend declarations are needed toallow client classes to constructRandomPtrIterators. However, afriend class Class cannot be used as template parameter types cannot beused infriend class ... declarations. But this is a minor problem as notevery member of the client class needs to construct iterators. In fact, onlyClass'sbegin andend members must constructiterators. Using the template's first parameter, friend declarations can bespecified for the client'sbegin andend members.
The second template parameter parameterizes the container's iteratortype (BaseIterator);
The third template parameter indicates the data type to which thepointers point (Type).

RandomPtrIterator has one private data member, aBaseIterator. Here is the class interface and the constructor'simplementation:

    #include <iterator>    #include <compare>    template <typename Class, typename BaseIterator, typename Type>    struct RandomPtrIterator;    #define PtrIterator RandomPtrIterator<Class, BaseIterator, Type>    #define PtrIteratorValue RandomPtrIterator<Class, BaseIterator, value_type>    template <typename Class, typename BaseIterator, typename Type>    bool operator==(PtrIterator const &lhs, PtrIterator const &rhs);    template <typename Class, typename BaseIterator, typename Type>    auto operator<=>(PtrIterator const &lhs, PtrIterator const &rhs);    template <typename Class, typename BaseIterator, typename Type>    int operator-(PtrIterator const &lhs, PtrIterator const &rhs);    template <typename Class, typename BaseIterator, typename Type>    struct RandomPtrIterator    {        using iterator_category = std::random_access_iterator_tag;        using difference_type   = std::ptrdiff_t;        using value_type        = Type;        using pointer           = value_type *;        using reference         = value_type &;        friend PtrIterator Class::begin();        friend PtrIterator Class::end();        friend bool operator==<>(RandomPtrIterator const &lhs,                                 RandomPtrIterator const &rhs);        friend auto operator<=><>(RandomPtrIterator const &lhs,                                  RandomPtrIterator const &rhs);        friend int operator-<>(RandomPtrIterator const &lhs,                               RandomPtrIterator const &rhs);        private:            BaseIterator d_current;        public:            int operator-(RandomPtrIterator const &rhs) const;            RandomPtrIterator operator+(int step) const;            value_type &operator*() const;            RandomPtrIterator &operator--();            RandomPtrIterator operator--(int);            RandomPtrIterator &operator++();            RandomPtrIterator operator++(int);            RandomPtrIterator operator-(int step) const;            RandomPtrIterator &operator-=(int step);            RandomPtrIterator &operator+=(int step);            value_type *operator->() const;        private:            RandomPtrIterator(BaseIterator const &current);    };    template <typename Class, typename BaseIterator, typename value_type>    PtrIteratorValue::RandomPtrIterator(BaseIterator const &current)    :        d_current(current)    {}

Looking at itsfriend declarations, we see that the membersbeginandend of a classClass, returning aRandomPtrIterator object forthe typesClass, BaseIterator andType are granted access toRandomPtrIterator's private constructor. That is exactly what wewant. TheClass'sbegin andend members are declared asboundfriends.

AllRandomPtrIterator's remaining members are public. SinceRandomPtrIterator is just a generalization of the nested classiterator developed in section22.14.1, re-implementing the requiredmember functions is easy and only requires us to changeiterator intoRandomPtrIterator and to changestd::string intoType. Forexample,operator<, defined in the classiterator as is now implemented as:

    template <typename Class, typename BaseIterator, typename Type>    inline auto operator<=>(PtrIterator const &lhs, PtrIterator const &rhs)    {        return **lhs.d_current <=> **rhs.d_current;    }

Some additional examples:operator*, defined in the classiterator as

inline std::string &StringPtr::iterator::operator*() const{    return **d_current;}

is now implemented as:

    template <typename Class, typename BaseIterator, typename value_type>    value_type &PtrIteratorValue::operator*() const    {        return **d_current;    }

The pre- and postfix increment operators are now implemented as:

    template <typename Class, typename BaseIterator, typename value_type>    PtrIteratorValue &PtrIteratorValue::operator++()    {        ++d_current;        return *this;    }    template <typename Class, typename BaseIterator, typename value_type>    PtrIteratorValue PtrIteratorValue::operator++(int)    {        return RandomPtrIterator(d_current++);    }

Remaining members can be implemented accordingly, their actualimplementations are left as exercises to the reader (or can be obtained fromthecplusplus.yo.zip archive, of course).

Re-implementing the classStringPtr developed in section22.14.1is not difficult either. Apart from including the header file defining theclass templateRandomPtrIterator, it only requires a single modification.Itsiterator using-declaration must now be associated with aRandomPtrIterator. Here is the full class interface and the class's inlinemember definitions:

    #ifndef INCLUDED_STRINGPTR_H_    #define INCLUDED_STRINGPTR_H_    #include <vector>    #include <string>    #include "iterator.h"    class StringPtr: public std::vector<std::string *>    {        public:            using iterator =                    RandomPtrIterator                    <                        StringPtr,                        std::vector<std::string *>::iterator,                        std::string                    >;            using reverse_iterator = std::reverse_iterator<iterator>;            iterator begin();            iterator end();            reverse_iterator rbegin();            reverse_iterator rend();    };    inline StringPtr::iterator StringPtr::begin()    {        return iterator(this->std::vector<std::string *>::begin() );    }    inline StringPtr::iterator StringPtr::end()    {        return iterator(this->std::vector<std::string *>::end());    }    inline StringPtr::reverse_iterator StringPtr::rbegin()    {        return reverse_iterator(end());    }    inline StringPtr::reverse_iterator StringPtr::rend()    {        return reverse_iterator(begin());    }    #endif

IncludingStringPtr's modified header file into the program given insection22.14.2 results in a program behaving identically to itsearlier version. In this caseStringPtr::begin andStringPtr::endreturn iterator objects constructed from a template definition.

26.6: Using `bisonc++' and `flexc++'

The example discussed below digs into the peculiarities of usingparser- andscanner generators generatingC++ sources. Once theinput for a program exceeds a certain level of complexity, it becomesattractive to use scanner- and parser-generators generating the code whichdoes the actual input recognition.

The examples in this and subsequent sections assume that the reader knows howto use thescanner generatorflex and theparser generatorbison. Bothbison andflex are well documented elsewhere. The originalpredecessors ofbison andflex, calledyacc andlex aredescribed in several books, e.g. in O'Reilly's book`lex & yacc'.

Scanner- and parser generators are also available as free software. Bothbison andflex are usually part of software distributions or they canbe obtained from ftp://prep.ai.mit.edu/pub/non-gnu.Flex creates aC++ classwhen%option c++ is specified.

For parser generators the programbison is available. In the early 90'sAlain Coetmeur (coetmeur@icdc.fr) created aC++ variant (bison++) creating a parser class. Although thebison++ program produces code that can be used inC++ programs it alsoshows many characteristics that are more suggestive of aC context than aC++ context. In January 2005 I rewrote parts of Alain'sbison++program, resulting in the original version of the programbisonc++. Then, in May 2005 a complete rewrite of thebisonc++parser generator was completed (version number 0.98). Current versions ofbisonc++ can be downloaded fromhttps://fbb-git.gitlab.io/bisoncpp/. Binary versions for variousarchitectures are available as, e.g.,Debianpackage (includingbisonc++'s documentation).

Bisonc++ creates a cleaner parser class thanbison++. In particular,it derives the parser class from a base-class, containing the parser's token-and type-definitions as well as all member functions which should not be(re)defined by the programmer. As a result of this approach, the generatedparser class is very small, declaring only members that are actually definedby the programmer (as well as some other members, generated bybisonc++itself, implementing the parser'sparse() member). One member that isnot implemented by default islex, producing the next lexicaltoken. When the directive%scanner (see section26.6.2.1) is used,bisonc++ produces a standard implementation for this member; otherwise itmust be implemented by the programmer.

In early 2012 the programflexc++http://flexcpp.org/ reached its initial release. Likebisonc++ it is part of theDebian linuxdistribution.

Jean-Paul van Oosten (jp@jpvanoosten.nl) and Richard Berendsen(richardberendsen@xs4all.nl) started theflexc++ project in 2008and the final program was completed by Jean-Paul and me between 2010 and 2012.

These sections of theC++ Annotations focus onbisonc++ as ourparser generator andflexc++ as our lexical scannergenerator. Previous releases of theC++ Annotations were usingflex as thescanner generator.

Usingflexc++ andbisonc++class-based scanners and parsers aregenerated. The advantage of this approach is that the interface to the scannerand the parser tends to become cleaner than without usingclassinterfaces. Furthermore, classes allow us to get rid of most if not all globalvariables, making it easy to use multiple parsers in one program.

Below two example programs are developed. The first example only usesflexc++. The generated scanner monitors the production of a file fromseveral parts. That example focuses on the lexical scanner and on switchingfiles while churning through the information. The second example uses bothflexc++ andbisonc++ to generate a scanner and a parser transformingstandard arithmetic expressions to their postfix notations, commonly used incode generated by compilers and inHP-calculators. In the second examplethe emphasis is mainly onbisonc++ and on composing a scanner objectinside a generated parser.

26.6.1: Using `flexc++' to create a scanner

The lexical scanner developed in this section is used to monitor theproduction of a file from several subfiles. The setup is as follows: theinput-language defines#include directives, followed by a textstring specifying the file (path) which should be included at the location ofthe#include.

In order to avoid complexities irrelevant to the current example, the formatof the#include statement is restricted to the form#include<filepath>. The file specified between the angle brackets should beavailable at the location indicated byfilepath. If the file is notavailable, the program terminates after issuing an error message.

The program is started with one or two filename arguments. If the program isstarted with just one filename argument, the output is written to thestandard output streamcout. Otherwise, the output is written tothe stream whose name is given as the program's second argument.

The program defines a maximumnesting depth. Once this maximum is exceeded,the program terminates after issuing an error message. In that case, thefilename stack indicating where which file was included is printed.

An additional feature of the program is that (standardC++) comment-linesare ignored. Include-directives in comment-lines are also ignored.

The program is created in five major steps:

First, the filelexer is constructed, containing theinput-language specifications.
From the specifications inlexer the requirements for theclass Scanner evolve. TheScanner class derives from the base classScannerBase generated byflexc++.
Next,main is constructed. AScanner object is createdinspecting the command-line arguments. If successful, the scanner's memberlex is called to produce the program's output.
Now that the global setup of the program has been specified,the member functions of the various classes are implemented.
Finally, the program is compiled and linked.

26.6.1.1: The derived class `Scanner'

The function matching theregular expression rules (lex) is amember of the classScanner. SinceScanner is derived fromScannerBase, it has access to all ofScannerBase's protected membersthat execute the lexical scanner's regular expression matching algorithm.

Looking at the regular expressions themselves, notice that we need rulesto recognize comment,#include directives, and all remaining characters.This all is fairly standard practice. When an#include directive issensed, the directive is parsed by the scanner. This too is commonpractice. Our lexical scanner performs the following tasks:

As usual,preprocessor directives are notanalyzed by a parser, but by the lexical scanner;
The scanner uses amini scanner to extract the filename from thedirective, throwing a exception if this fails;
If the filename could be extracted, processing switches to the nextstream, controlling for a maximum nesting depth.
Once the end of the current file has been reached processingautomatically returns to the previous file, restoring the previous file nameand line number. The scanner returns 0 if all files have been processed.

26.6.1.2: The lexical scanner specification file

Thelexical scanner specification file is organized comparably to theone used forflex inC contexts. However, inC++ contexts,flexc++ creates a classScanner, rather than just a scanner function.

Flexc++'s specification file consists of two sections:

The specification file's first section isflexc++'ssymbol area, used to define symbols, like amini scanner, oroptions. The following options are suggested:
- %debug: includesdebugging code into the code generated byflexc++. Calling the memberfunctionsetDebug(true) activates this debugging code at run-time. Whenactivated, information about the matching process is written to thestandard output stream. The execution of debug code is suppressed aftercalling the member functionsetDebug(false).
- %filenames: defines the base-name of the class header filesgenerated byflexc++. By default the class name (itself using the defaultScanner) is used.
Here is the specification files' symbol area:
```
%filenames scanner%debug%max-depth 3%x      comment%x      include
```

The specification file's second section is arules section inwhich the regular expressions and their associated actions are defined. In the example developed here, the lexer shouldcopy information from the standard input stream (std::cin) to thestandard output stream (std::cout). For this the predefined macroECHO can be used. Here are the rules:

%%    // The comment-rules: comment is ignored."//".*                  // ignore eoln comment"/*"                    begin(StartCondition__::comment);<comment>{    .|\n                // ignore all characters in std C comment    "*/"                begin(StartCondition__::INITIAL);}    // File switching: #include <filepath>#include[ \t]+"<"      begin(StartCondition__::include);<include>{    [^ \t>]+            d_nextSource = matched();    ">"[ \t]*\n         switchSource();    .|\n                throw runtime_error("Invalid include statement");}    // The default rule: echo anything else to std::cout.|\n                    echo();

26.6.1.3: Implementing `Scanner'

Theclass Scanner is generated once byflexc++. This class hasaccess to several members defined by its base classScannerBase. Some ofthese members have public access rights and can be used by code external tothe classScanner. These members are extensively documented in theflexc++(1) man-page, and the reader is referred to this man-page forfurther information.

Our scanner performs the following tasks:

it matches regular expressions, ignoring comment, and writing thematched text to the standard output stream;
it switches to other files, and returns to the previous file once afile has completely been processed, ending the lexical scan once the end ofthe first input file has been reached.

The#include statements in the input allow the scanner to distill thename of the file where the scanning process must continue. This file name isstored in a local variabled_nextSource and a memberstackSourcehandles the switch to the next source. Nothing else is required. Pushing andpopping input files is handled by the scanner's memberspushStream andpopStream, provided byflexc++.Scanner's interface, therefore,only needs one additional function declaration:switchSource.

Switching streams is handled as follows: once the scanner has extracted afilename from an#include directive, a switch to another file is realizedbyswitchSource. This member callspushStream, defined byflexc++, to stack the current input stream and to switch to the streamwhose name is stored ind_nextSource. This also ends theincludemini-scanner, so to return the scanner to its default scanning modebegin(StartCondition__::INITIAL) is called. Here is its source:

#include "scanner.ih"void Scanner::switchSource(){    pushStream(d_nextSource);    begin(StartCondition__::INITIAL);}

The memberpushStream, defined byflexc++, handles all necessarychecks, throwing an exception if the file could not be opened or if too manyfiles are stacked.

The member performing the lexical scan is defined byflexc++ inScanner::lex, and this member can be called by code to process the tokensreturned by the scanner.

26.6.1.4: Using a `Scanner' object

The program using ourScanner is very simple. It expects a filenameindicating where to start the scanning process.

The program first checks the number of arguments. If at least one argument wasgiven, then that argument is passed toScanner's constructor, togetherwith a second argument"-", indicating that the output should go to thestandard output stream.

If the program receives more than one argument debug output, extensivelydocumenting the lexical scanner's actions, is written to the standard outputstream as well.

Next theScanner'slex member is called. If anything fails, astd::exception is thrown, which is caught bymain's try-block's catchclause. Here is the program's source:

#include "lexer.ih"int main(int argc, char **argv)try{    if (argc == 1)    {        cerr << "Filename argument required\n";        return 1;    }    Scanner scanner(argv[1], "-");    scanner.setDebug(argc > 2);    return scanner.lex();}catch (exception const &exc){    cerr << exc.what() << '\n';    return 1;}

26.6.1.5: Building the program

The final program is constructed in two steps. These steps are given for aUnix system, on whichflexc++ and theGNUC++ compilerg++have been installed:

First, the lexical scanner's source is created usingflexc++. Forthis the following command can be given:
```
    flexc++ lexer
```
Next, all sources are compiled and linked:
```
    g++ -Wall *.cc
```

Flexc++ can be downloaded from https://fbb-git.gitlab.io/flexcpp/, and requires thebobcat library, which can be downloaded fromhttp://fbb-git.gitlab.io/bobcat/.

26.6.2: Using `bisonc++' and `flexc++'

Once aninput language exceeds a certain level of complexity, aparseris often used to control the complexity of the language. In this case, aparser generator can be used to generate the code verifying the input'sgrammatical correctness. The lexical scanner (preferably composed into theparser) provides chunks of the input, calledtokens. The parserthen processes the series of tokens generated by the lexical scanner.

Starting point when developing programs that use both parsers and scanners isthegrammar. The grammar defines aset of tokens that can be returnedby the lexical scanner (called thescanner below).

Finally, auxiliary code is provided to `fill in the blanks': theactionsperformed by the parser and by the scanner are not normally specifiedliterally in the grammar rules or lexical regular expressions, butshould be implemented inmember functions, called from the parser'srules or which are associated with the scanner's regular expressions.

In the previous section we've seen an example of aC++ class generated byflexc++. In the current section we concentrate on the parser. The parsercan be generated from a grammar specification file, processed by the programbisonc++. The grammar specification file required bybisonc++ issimilar to the file processed bybison (orbison++,bisonc++'spredecessor, written in the early nineties byAlain Coetmeur).

In this section a program is developed convertinginfix expressions,where binary operators are written between their operands, topostfixexpressions, where operators are written behind their operands. Also, theunary operator- is converted from its prefix notation to a postfix form.The unary+ operator is ignored as it requires no further actions. Inessence our little calculator is a micro compiler, transforming numericexpressions into assembly-like instructions.

Our calculator recognizes a rather basic set of operators:multiplication, addition, parentheses, and the unary minus. We'lldistinguish real numbers from integers, to illustrate a subtlety inbison-like grammar specifications. That's all. The purpose of this section is,after all, to illustrate the construction of aC++ program that uses botha parser and a lexical scanner, rather than to construct a full-fledgedcalculator.

In the coming sections we'll develop the grammar specification forbisonc++. Then, the regular expressions for the scanner arespecified. Following that, the final program is constructed.

26.6.2.1: The `bisonc++' specification file

Thegrammar specification file required bybisonc++ is comparable tothe specification file required bybison. Differences are related to theclass nature of the resulting parser. Our calculator distinguishes realnumbers from integers, and supports a basic set of arithmetic operators.

Bisonc++ should be used as follows:

As usual, a grammar is defined. Withbisonc++ this is no different, andbisonc++ grammar definitions are for all practical purposes identical tobison's grammar definitions.
Having specified the grammar and (usually) some declarationsbisonc++ can generate files defining the parser class and the implementation of the member functionparse.
All class members (except those that are required for the proper functioning of the memberparse) must be separately implemented. Of course, they should also be declared in the parser class's header. At the very least the memberlex must be implemented. This member is called byparse to obtain the next available token. However,bisonc++ offers a facility providing a standard implementation of the functionlex. The member functionerror(char const *msg) is given a simple default implementation that may be modified by the programmer. The member functionerror is called whenparse detects (syntactic) errors.
The parser can now be used in a program. A very simple example would be:
```
int main(){    Parser parser;    return parser.parse();}
```

Thebisonc++ specification file has twosections:

Thedeclaration section. In this section bison's tokens, and the priority rules for the operators are declared. However,bisonc++ also supports several new declarations. These new declarations are important and are discussed below.
Therules section. Thegrammatical rules define the grammar. This section is identical to the one required bybison, albeit that some members that were available inbison andbison++ are obsolete inbisonc++, while other members can be used in a wider context. For example,ACCEPT andABORT can be called from any member called from the parser's action blocks to terminate the parsing process.

Readers familiar withbison may note that there is noheader section anymore. Header sections are used by bison to providefor the necessary declarations allowing the compiler to compile theCfunction generated bybison. InC++ declarations are part of oralready used by class definitions. Therefore, a parser generator generating aC++ class and some of its member functions does not require a headersection anymore.

The declaration section

Thedeclaration section contains several sets of declarations, amongwhich definitions of all the tokens used in the grammar and the priorities andassociativities of the mathematical operators. Moreover, several new andimportant specifications can be used here as well. Those relevant to ourcurrent example and only available inbisonc++ are discussed here. Thereader is referred tobisonc++'s man-page for a full description.

%baseclass-preincludeheader
Useheader as the pathname to the file pre-included in the parser's base-class header. This declaration is useful in situations where the base class header file refers to types which might not yet be known. E.g., with%union astd::string * field might be used. Since the classstd::string might not yet be known to the compiler once it processes the base class header file we need a way to inform the compiler about these classes and types. The suggested procedure is to use a pre-include header file declaring the required types. By defaultheader is surrounded by double quotes (using, e.g.,#include "header"). When the argument is surrounded by angle brackets#include <header> is included. In the latter case, quotes might be required to escape interpretation by the shell (e.g., using-H '<header>').
%filenamesheader
Defines the generic name of all generated files, unless overridden by specific names. By default the generated files use the class-name as the generic file name.
%scannerheader
Useheader as the pathname to the file pre-included in the parser's class header. This file should define a classScanner, offering a memberint lex() producing the next token from the input stream to be analyzed by the parser generated bybisonc++. When this option is used the parser's memberint lex() is predefined as (assuming the default parser class nameParser is used):
```
inline int Parser::lex(){    return d_scanner.lex();}
```
and an objectScanner d_scanner is composed into the parser. Thed_scanner object is constructed by its default constructor. If another constructor is required, the parser class may be provided with an appropriate (overloaded) parser constructor after having constructed the default parser class header file usingbisonc++. By defaultheader is surrounded by double quotes (using, e.g.,#include "header"). When the argument is surrounded by angle brackets#include <header> is included.
%stypetypename
The type of the semantic value of tokens. The specificationtypename should be the name of an unstructured type (e.g.,size_t). By default it isint. SeeYYSTYPE inbison. It should not be used if a%union specification is used. Within the parser class, this type may be used asSTYPE.
%unionunion-definition
Acts identically to thebison declaration. As withbison this generates a union for the parser's semantic type. The union type is namedSTYPE. If no%union is declared, a simple stack-type may be defined using the%stype declaration. If no%stype declaration is used, the default stacktype (int) is used.

An example of a%union declaration is:

    %union    {        int     i;        double  d;    };

In pre-C++11 code aunion cannot contain objects as its fields, asconstructors cannot be called when a union is created. This means that astring cannot be a member of theunion. Astring *, however,is a possible union member. It might alsobe possible to useunrestricted unions (cf. section9.9), havingclass type objects as fields.

As an aside: the scanner does not have to know about such a union. Itcan simply pass its scanned text to the parser through itsmatched memberfunction. For example using a statement like

    $$.i = A2x(d_scanner.matched());

matched text is converted to a value of an appropriate type.

Tokens and non-terminals can be associated with union fields. This isstrongly advised, as it prevents type mismatches, since the compiler may thencheck for type correctness. At the same time, the bison specificvariables$$,$1,$2, etc. may be used, rather than the full fieldspecification (like$$.i). A non-terminal or a token may be associatedwith a union field using the<fieldname> specification. E.g.,

    %token <i> INT          // token association (deprecated, see below)           <d> DOUBLE    %type  <i> intExpr      // non-terminal association

In the example developed here, both the tokens and the non-terminals canbe associated with a union field. However, as noted before, the scanner doesnot have to know about all this. In our opinion, it is cleaner to let thescanner do just one thing: scan texts. Theparser, knowing what the inputis all about, may then convert strings like"123" to an integervalue. Consequently, the association of a union field and a token isdiscouraged. Below, while describing the grammar's rules, this is furtherillustrated.

In the%union discussion the%token and%type specificationsshould be noted. They are used to specify the tokens (terminal symbols) thatcan be returned by the scanner, and to specify the return types ofnon-terminals. Apart from%token the token declarators%left,%right, and%nonassoc can be used to specify the associativity ofoperators. The tokens mentioned at these indicators are interpreted as tokensindicating operators, associating in the indicated direction. The precedenceof operators is defined by their order: the first specification has the lowestpriority. To overrule a certain precedence in a certain context%prec canbe used. As all this is standardbisonc++ practice, it isn't furtherelaborated here. The documentation provided withbisonc++'s distributionshould be consulted for further reference.

Here is the specification of the calculator's declaration section:

%filenames parser%scanner ../scanner/scanner.h%union {    int i;    double d;};%token  INT DOUBLE%type   <i> intExpr%type   <d> doubleExpr%left   '+'%left   '*'%right  UnaryMinus

In the declaration section%type specifiers are used, associating theintExpr rule's value (see the next section) to thei-field of thesemantic-value union, and associatingdoubleExpr's value to thed-field. This approach, admittedly, is rather complex, as expression rulesmust be included for each of the supported union types. Alternatives aredefinitely possible, and involve the use ofpolymorphic semanticvalues, covered in detail in theBisonc++ user guide.

The grammar rules

The rules and actions of the grammar are specified as usual. The grammarfor our little calculator is given below. There are quite a few rules, butthey illustrate various features offered bybisonc++. In particular, notethat no action block requires more than a single line of code. This keeps thegrammar simple, and therefore enhances its readability andunderstandability. Even the rule defining the parser's proper termination (theempty line in theline rule) uses a single member function calleddone. The implementation of that function is simple, but it is worth whilenoting that it callsParser::ACCEPT, showing thatACCEPT can be calledindirectly from a production rule's action block. Here are the grammar'sproduction rules:

    lines:        lines        line    |        line    ;    line:        intExpr        '\n'        {            display($1);        }    |        doubleExpr        '\n'        {            display($1);        }    |        '\n'        {            done();        }    |        error        '\n'        {            reset();        }    ;    intExpr:        intExpr '*' intExpr        {            $$ = exec('*', $1, $3);        }    |        intExpr '+' intExpr        {            $$ = exec('+', $1, $3);        }    |        '(' intExpr ')'        {            $$ = $2;        }    |        '-' intExpr         %prec UnaryMinus        {            $$ = neg($2);        }    |        INT        {            $$ = convert<int>();        }    ;    doubleExpr:        doubleExpr '*' doubleExpr        {            $$ = exec('*', $1, $3);        }    |        doubleExpr '*' intExpr        {            $$ = exec('*', $1, d($3));        }    |        intExpr '*' doubleExpr        {            $$ = exec('*', d($1), $3);        }    |        doubleExpr '+' doubleExpr        {            $$ = exec('+', $1, $3);        }    |        doubleExpr '+' intExpr        {            $$ = exec('+', $1, d($3));        }    |        intExpr '+' doubleExpr        {            $$ = exec('+', d($1), $3);        }    |        '(' doubleExpr ')'        {            $$ = $2;        }    |        '-' doubleExpr         %prec UnaryMinus        {            $$ = neg($2);        }    |        DOUBLE        {            $$ = convert<double>();        }    ;

This grammar is used to implement a simple calculator in which integer andreal values can be negated, added, and multiplied and in which standardpriority rules can be overruled by parentheses. The grammar shows the use oftyped nonterminal symbols:doubleExpr is linked to real (double) values,intExpr is linked to integer values. Precedence and type association isdefined in the parser's definition section.

The Parser's header file

Several class members called from the grammar are defined as membertemplates.Bisonc++ generates multiple files, among which the filedefining the parser's class. Functions called from the production rule'saction blocks are usually member functions of the parser. These memberfunctions must be declared and defined. Oncebisonc++ has generated theheader file defining the parser's class, that header file isn't automaticallyrewritten, allowing the programmer to add new members to the parser classwhenever required. Here is `parser.h' as used in our little calculator:

#ifndef Parser_h_included#define Parser_h_included#include <iostream>#include <sstream>#include <bobcat/a2x>#include "parserbase.h"#include "../scanner/scanner.h"#undef Parserclass Parser: public ParserBase{    std::ostringstream d_rpn;    // $insert scannerobject    Scanner d_scanner;    public:        int parse();    private:        template <typename Type>        Type exec(char c, Type left, Type right);        template <typename Type>        Type neg(Type op);        template <typename Type>        Type convert();        void display(int x);        void display(double x);        void done() const;        void reset();        void error(char const *msg);        int lex();        void print();        static double d(int i);    // support functions for parse():        void executeAction(int d_ruleNr);        void errorRecovery();        int lookup(bool recovery);        void nextToken();        void print__();};inline double Parser::d(int i){    return i;}template <typename Type>Type Parser::exec(char c, Type left, Type right){    d_rpn << " " << c << " ";    return c == '*' ? left * right : left + right;}template <typename Type>Type Parser::neg(Type op){    d_rpn << " n ";    return -op;}template <typename Type>Type Parser::convert(){    Type ret = FBB::A2x(d_scanner.matched());    d_rpn << " " << ret << " ";    return ret;}inline void Parser::error(char const *msg){    std::cerr << msg << '\n';}inline int Parser::lex(){    return d_scanner.lex();}inline void Parser::print(){}#endif

26.6.2.2: The `flexc++' specification file

The flex-specification file used by the calculator is simple: blanks areignored, single characters are returned, and numeric values are returned aseitherParser::INT orParser::DOUBLE tokens.

Theflexc++ directive%interactive is provided since thecalculator is a program actively interacting with its human user.

Here is the completeflexc++ specification file:

%interactive%filenames scanner%%[ \t]                       // ignored[0-9]+                      return Parser::INT;"."[0-9]+                   |[0-9]+"."[0-9]*             return Parser::DOUBLE;.|\n                        return matched()[0];

26.6.2.3: Building the program

The calculator is built usingbisonc++ andflexc++. Here is theimplementation of the calculator'smain function:

#include "parser/parser.h"using namespace std;int main(){    Parser parser;    cout << "Enter (nested) expressions containing ints, doubles, *, + and "            "unary -\n"            "operators. Enter an empty line to stop.\n";    return parser.parse();}

The parser's filesparse.cc andparserbase.h are generated by thecommand:

    bisonc++ grammar

The fileparser.h is created only once, to allow the developer to addmembers to theParser class occe the need for them arises.

The programflexc++ is used to create a lexical scanner:

    flexc++ lexer

OnUnix systems a command like

    g++ -Wall -o calc *.cc -lbobcat -s

can be used to compile and link the source of the main program and thesources produced by the scanner and parser generators. The example uses theA2x class, which is part of thebobcat library (cf. section26.6.1.5) (the bobcat library is available on systems offering eitherbisonc++ orflexc++).Bisonc++ can be downloaded from

http://fbb-git.gitlab.io/bisoncpp/.

[8]ページ先頭