static members, etc. are illustrated in this chapter. The examples roughly follow the organization of earlier chapters. As an additional topic, beyond providing examples of C++, the subjects of scanner and parser generators are covered. We show how these tools may be used in C++ programs. These additional examples assume a certain familiarity with the concepts underlying these tools, like grammars, parse-trees and parse-tree decoration. Once the input for a program exceeds a certain level of complexity, it's attractive to use scanner- and parser-generators to create the code doing the actual input processing. One of the examples in this chapter describes the usage of these tools in a C++ environment.
std::streambuf as the starting point for constructing classes interfacing such file descriptor devices. Below we'll construct classes that can be used to write to a device given its file descriptor. The devices may be files, but they could also be pipes or sockets. Section 26.1.2 covers reading from such devices; section 26.2.3 reconsiders redirection, discussed earlier in section 6.6.2.
Using the streambuf class as a base class it is relatively easy to design classes for output operations. The only member function that must be overridden is the (virtual) member int streambuf::overflow(int c). This member's responsibility is to write characters to the device. If fd is an output file descriptor and if output should not be buffered then the member overflow() can simply be implemented as:
    class UnbufferedFD: public std::streambuf
    {
        public:
            int overflow(int c) override;
            ...
    };

    int UnbufferedFD::overflow(int c)
    {
        if (c != EOF)
        {
            char ch = static_cast<char>(c); // write the character itself, not
            if (write(d_fd, &ch, 1) != 1)   // the first byte of an int
                return EOF;
        }
        return c;
    }

The argument received by overflow is either written to the file descriptor (and returned from overflow), or EOF is returned.
This simple function does not use output buffering. For various reasons,using a buffer is usually a good idea (see also the next section).
When output buffering is used, the overflow member is a bit more complex as it is only called when the buffer is full. Once the buffer is full, we first have to flush the buffer. Flushing the buffer is the responsibility of the (virtual) function streambuf::sync. Since sync is a virtual function, classes derived from streambuf may redefine sync to flush a buffer streambuf itself doesn't know about.
Overriding sync and using it in overflow is not all that has to be done. When the object of the class defining the buffer reaches the end of its lifetime the buffer may be only partially full. In that situation the buffer must also be flushed. This is easily done by simply calling sync from the class's destructor.
Now that we've considered the consequences of using an output buffer, we're almost ready to design our derived class. Several more features are added as well, though:
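The interplay between overflow, sync, and the destructor can be tried out without a real device. What follows is only a minimal sketch, not the class developed in this section: the class name SinkBuf, the helper throughSink, and the use of a std::string as the `device' are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdio>       // EOF
#include <ostream>
#include <streambuf>
#include <string>

// Hypothetical class: buffers characters and `flushes' them to a
// std::string that stands in for the device.
class SinkBuf: public std::streambuf
{
    std::string &d_sink;
    char d_buffer[4];               // deliberately tiny, forcing overflows

    public:
        explicit SinkBuf(std::string &sink)
        :
            d_sink(sink)
        {
            setp(d_buffer, d_buffer + sizeof(d_buffer));
        }
        ~SinkBuf() override
        {
            sync();                 // flush a partially filled buffer
        }
    private:
        int sync() override
        {
            d_sink.append(pbase(), pptr());     // `write' the buffered chars
            setp(d_buffer, d_buffer + sizeof(d_buffer));
            return 0;
        }
        int overflow(int c) override
        {
            sync();                 // buffer is full: flush it first
            if (c != EOF)
            {
                *pptr() = static_cast<char>(c); // store c in the empty buffer
                pbump(1);
            }
            return c;
        }
};

// Hypothetical helper: inserts text into an ostream using a SinkBuf and
// returns whatever reached the `device'.
std::string throughSink(std::string const &text)
{
    std::string sink;
    {
        SinkBuf buf(sink);
        std::ostream out(&buf);
        out << text;
    }                               // buf's destructor flushes the remainder
    return sink;
}
```

Without the call to sync in the destructor the characters still waiting in d_buffer would never reach the sink.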
The class OFdnStreambuf has the following characteristics:

- Its member functions use low-level functions (like write) to access the device. So, apart from streambuf the <unistd.h> header file must have been read by the compiler before its member functions can be compiled.

- The class is derived from std::streambuf.

    class OFdnStreambuf: public std::streambuf
    {
        int d_fd = -1;
        size_t d_bufsize = 0;
        char *d_buffer = 0;

        public:
            OFdnStreambuf() = default;
            OFdnStreambuf(int fd, size_t bufsize = 1);
            ~OFdnStreambuf() override;
            void open(int fd, size_t bufsize = 1);
        private:
            int sync() override;
            int overflow(int c) override;
    };

- Its default constructor leaves the object without a device and without a buffer. The constructor expecting a file descriptor and a buffer size passes its arguments to the class's open member (see below). Here is that constructor:

    inline OFdnStreambuf::OFdnStreambuf(int fd, size_t bufsize)
    {
        open(fd, bufsize);
    }

- The destructor calls sync, flushing any characters stored in the output buffer to the device. In implementations not using a buffer the destructor can be given a default implementation:

    inline OFdnStreambuf::~OFdnStreambuf()
    {
        if (d_buffer)
        {
            sync();
            delete[] d_buffer;
        }
    }

  This implementation does not close the device. It is left as an exercise to the reader to change this class in such a way that the device is optionally closed (or optionally remains open). This approach was adopted by, e.g., the Bobcat library. See also section 26.1.2.2.

- The open member initializes the buffer. Using streambuf::setp, the begin and end points of the buffer are defined. This is used by the streambuf base class to initialize streambuf::pbase, streambuf::pptr, and streambuf::epptr:

    inline void OFdnStreambuf::open(int fd, size_t bufsize)
    {
        d_fd = fd;
        d_bufsize = bufsize == 0 ? 1 : bufsize;

        delete[] d_buffer;
        d_buffer = new char[d_bufsize];
        setp(d_buffer, d_buffer + d_bufsize);
    }

- The member sync flushes the as yet unflushed content of the buffer to the device. After the flush the buffer is reinitialized using setp. After successfully flushing the buffer sync returns 0:

    inline int OFdnStreambuf::sync()
    {
        if (pptr() > pbase())
        {
            write(d_fd, d_buffer, pptr() - pbase());
            setp(d_buffer, d_buffer + d_bufsize);
        }
        return 0;
    }

- The member streambuf::overflow is also overridden.
Since this member is called from the streambuf base class when the buffer is full it should first call sync to flush the buffer to the device. Next it should write the character c to the (now empty) buffer. The character c should be written using pptr and streambuf::pbump. Entering a character into the buffer should be implemented using available streambuf member functions, rather than `by hand' as doing so might invalidate streambuf's internal bookkeeping. Here is overflow's implementation:

    inline int OFdnStreambuf::overflow(int c)
    {
        sync();
        if (c != EOF)
        {
            *pptr() = c;
            pbump(1);
        }
        return c;
    }

The next program uses the OFdnStreambuf class to copy its standard input to file descriptor STDOUT_FILENO, which is the symbolic name of the file descriptor used for the standard output:

    #include <unistd.h>
    #include <string>
    #include <iostream>
    #include <istream>
    #include "fdout.h"
    using namespace std;

    int main(int argc, char **argv)
    {
        OFdnStreambuf fds(STDOUT_FILENO, 500);
        ostream os(&fds);

        switch (argc)
        {
            case 1:
                for (string s; getline(cin, s); )
                    os << s << '\n';
                os << "COPIED cin LINE BY LINE\n";
            break;

            case 2:
                cin >> os.rdbuf();  // Alternatively, use: cin >> &fds;
                os << "COPIED cin BY EXTRACTING TO os.rdbuf()\n";
            break;

            case 3:
                os << cin.rdbuf();
                os << "COPIED cin BY INSERTING cin.rdbuf() into os\n";
            break;
        }
    }

When classes to be used for input operations are derived from std::streambuf, they should be provided with an input buffer of at least one character. The one-character input buffer allows for the use of the member functions istream::putback or istream::unget. Strictly speaking it is not necessary to implement a buffer in classes derived from streambuf. But using buffers in these classes is strongly advised. Their implementation is very simple and straightforward and the applicability of such classes is greatly improved. Therefore, all our classes derived from the class streambuf define a buffer of at least one character.

When deriving a class (e.g., IFdStreambuf) from streambuf using a buffer of one character, at least its member streambuf::underflow should be overridden, as this member eventually receives all requests for input.
The member streambuf::setg is used to inform the streambuf base class of the size and location of the input buffer, so that it is able to set up its input buffer pointers accordingly. This ensures that streambuf::eback, streambuf::gptr, and streambuf::egptr return correct values. The class IFdStreambuf is designed like this:
- Its member functions use low-level functions (like read) to access the device. So, apart from streambuf, the <unistd.h> header file must have been read by the compiler before its member functions can be compiled.

- The class is derived from std::streambuf as well.

- Its data members are defined as protected data members so that derived classes (e.g., see section 26.1.2.3) can access them. Here is the full class interface:

    class IFdStreambuf: public std::streambuf
    {
        protected:
            int d_fd;
            char d_buffer[1];
        public:
            IFdStreambuf(int fd);
        private:
            int underflow() override;
    };

- The constructor initializes the buffer, setting gptr's return value equal to egptr's return value. This implies that the buffer is empty so underflow is immediately called to fill the buffer:

    inline IFdStreambuf::IFdStreambuf(int fd)
    :
        d_fd(fd)
    {
        setg(d_buffer, d_buffer + 1, d_buffer + 1);
    }

- The member underflow is overridden. The buffer is refilled by reading from the file descriptor. If this fails (for whatever reason), EOF is returned. More sophisticated implementations could act more intelligently here, of course. If the buffer could be refilled, setg is called to set up streambuf's buffer pointers correctly:

    inline int IFdStreambuf::underflow()
    {
        if (read(d_fd, d_buffer, 1) <= 0)
            return EOF;

        setg(d_buffer, d_buffer, d_buffer + 1);
        return static_cast<unsigned char>(*gptr());
    }

The following main function shows how IFdStreambuf can be used:

    int main()
    {
        IFdStreambuf fds(STDIN_FILENO);
        istream is(&fds);

        cout << is.rdbuf();
    }

The next class is designed along the same lines as the class IFdStreambuf developed in the previous section. To make things a bit more interesting, in the class IFdNStreambuf developed here, the member streambuf::xsgetn is also overridden, to optimize reading a series of characters. Also a default constructor is provided that can be used in combination with the open member to construct an istream object before the file descriptor becomes available. In that case, once the descriptor becomes available, the open member can be used to initiate the object's buffer. Later, in section 26.2, we'll encounter such a situation.

To save some space, the success of various calls was not checked. In `real life' implementations, these checks should of course not be omitted.
The class IFdNStreambuf has the following characteristics:
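As an illustration of such a check, the sketch below wraps read in a helper that retries calls interrupted by a signal and that lets its caller tell end-of-file from an error. The helper's name readSome and its return convention are assumptions; it is not part of the classes developed in this chapter.

```cpp
#include <cerrno>
#include <cstddef>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical helper: reads at most `size' bytes from `fd' into `buffer'.
// Returns the number of bytes read, 0 at end-of-file, or -1 on error
// (errno then tells what went wrong). A read that was interrupted by a
// signal (EINTR) is simply retried.
ssize_t readSome(int fd, char *buffer, size_t size)
{
    while (true)
    {
        ssize_t nread = read(fd, buffer, size);

        if (nread >= 0 || errno != EINTR)
            return nread;
    }
}
```

An underflow member could call readSome instead of read, returning EOF when the result is not positive and, if desired, recording whether that was caused by end-of-file or by an error.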
- Its member functions use low-level functions (like read and close) to access the device. So, apart from streambuf the <unistd.h> header file must have been read by the compiler before its member functions can be compiled.

- The class is derived from std::streambuf.

- Like the class IFdStreambuf (section 26.1.2.1), its data members are protected. Since the buffer's size is configurable, this size is kept in a dedicated data member, d_bufsize:

    class IFdNStreambuf: public std::streambuf
    {
        protected:
            int d_fd = -1;
            size_t d_bufsize = 0;
            char* d_buffer = 0;
        public:
            IFdNStreambuf() = default;
            IFdNStreambuf(int fd, size_t bufsize = 1);
            ~IFdNStreambuf() override;
            void open(int fd, size_t bufsize = 1);
        private:
            int underflow() override;
            std::streamsize xsgetn(char *dest, std::streamsize n) override;
    };

- The default constructor does not allocate a buffer. It can be used to construct an object before the file descriptor becomes known. A second constructor simply passes its arguments to open. Open will then initialize the object so that it can actually be used:

    inline IFdNStreambuf::IFdNStreambuf(int fd, size_t bufsize)
    {
        open(fd, bufsize);
    }

- If the object has been initialized by open, its destructor will both delete the object's buffer and use the file descriptor to close the device:

    IFdNStreambuf::~IFdNStreambuf()
    {
        if (d_bufsize)
        {
            close(d_fd);
            delete[] d_buffer;
        }
    }

  Even though the device is closed in the above implementation this may not always be desirable. In cases where the open file descriptor is already available the intention may be to use that descriptor repeatedly, each time using a newly constructed IFdNStreambuf object. It is left as an exercise to the reader to change this class in such a way that the device may optionally be closed. This approach was followed in, e.g., the Bobcat library.

- The open member simply allocates the object's buffer. It is assumed that the calling program has already opened the device. Once the buffer has been allocated, the base class member setg is used to ensure that streambuf::eback, streambuf::gptr, and streambuf::egptr return correct values:

    void IFdNStreambuf::open(int fd, size_t bufsize)
    {
        d_fd = fd;
        d_bufsize = bufsize == 0 ? 1 : bufsize;

        delete[] d_buffer;
        d_buffer = new char[d_bufsize];
        setg(d_buffer, d_buffer + d_bufsize, d_buffer + d_bufsize);
    }

- The overridden member underflow is implemented almost identically to IFdStreambuf's (section 26.1.2.1) member. The only difference is that the current class supports buffers of larger sizes. Therefore, more characters (up to d_bufsize) may be read from the device at once:

    int IFdNStreambuf::underflow()
    {
        if (gptr() < egptr())
            return static_cast<unsigned char>(*gptr());

        int nread = read(d_fd, d_buffer, d_bufsize);

        if (nread <= 0)
            return EOF;

        setg(d_buffer, d_buffer, d_buffer + nread);
        return static_cast<unsigned char>(*gptr());
    }

- Finally, xsgetn is overridden. In a loop, n is reduced until 0, at which point the function terminates. Alternatively, the member returns if underflow fails to obtain more characters. This member optimizes the reading of series of characters. Instead of calling streambuf::sbumpc n times, a block of avail characters is copied to the destination, using streambuf::gbump to consume avail characters from the buffer using one function call (memcpy is declared in <cstring>):

    std::streamsize IFdNStreambuf::xsgetn(char *dest, std::streamsize n)
    {
        std::streamsize nread = 0;

        while (n)
        {
            if (!in_avail())
            {
                if (underflow() == EOF)
                    break;
            }

            std::streamsize avail = in_avail();

            if (avail > n)
                avail = n;

            memcpy(dest + nread, gptr(), avail);
            gbump(static_cast<int>(avail));

            nread += avail;
            n -= avail;
        }

        return nread;
    }

The member function xsgetn is called by streambuf::sgetn, which is a streambuf member. Here is an example illustrating the use of this member function with an IFdNStreambuf object:

    #include <unistd.h>
    #include <iostream>
    #include <istream>
    #include "ifdnbuf.h"
    using namespace std;

    int main()
    {
                                    // internally: 30 char buffer
        IFdNStreambuf fds(STDIN_FILENO, 30);

        char buf[80];               // main() reads blocks of 80
                                    // chars
        while (true)
        {
            size_t n = fds.sgetn(buf, 80);
            if (n == 0)
                break;
            cout.write(buf, n);
        }
    }

When devices support seek operations, classes derived from std::streambuf should override the members streambuf::seekoff and streambuf::seekpos. The class IFdSeek, developed in this section, can be used to read information from devices supporting seek operations.
The class IFdSeek was derived from IFdStreambuf, so it uses a character buffer of just one character. The facilities to perform seek operations, which are added to our new class IFdSeek, ensure that the input buffer is reset when a seek operation is requested. The class could also be derived from the class IFdNStreambuf. In that case the arguments to reset the input buffer must be adapted so that its second and third parameters point beyond the available input buffer. Let's have a look at the characteristics of IFdSeek:

- As mentioned, IFdSeek is derived from IFdStreambuf. Like the latter class, IFdSeek's member functions use facilities declared in unistd.h. So, the header file <unistd.h> must have been read by the compiler before it can compile the class's member functions. To reduce the amount of typing when specifying types and constants from streambuf and std::ios, several using-declarations are defined by the class. These using-declarations refer to types that are defined in the header file <ios>, which must therefore also be included before the compiler can compile IFdSeek's class interface:

    class IFdSeek: public IFdStreambuf
    {
        using pos_type = std::streambuf::pos_type;
        using off_type = std::streambuf::off_type;
        using seekdir  = std::ios::seekdir;
        using openmode = std::ios::openmode;

        public:
            IFdSeek(int fd);
        private:
            pos_type seekoff(off_type offset, seekdir dir, openmode) override;
            pos_type seekpos(pos_type offset, openmode mode) override;
    };

    inline IFdSeek::IFdSeek(int fd)
    :
        IFdStreambuf(fd)
    {}

- The member seekoff is responsible for performing the actual seek operations. It calls lseek to seek a new position in a device whose file descriptor is known. If seeking succeeds, setg is called to define an already empty buffer, so that the base class's underflow member refills the buffer at the next input request.

    IFdSeek::pos_type IFdSeek::seekoff(off_type off, seekdir dir, openmode)
    {
        pos_type pos =
            lseek
            (
                d_fd, off,
                (dir ==  std::ios::beg) ? SEEK_SET :
                (dir ==  std::ios::cur) ? SEEK_CUR :
                                          SEEK_END
            );

        if (pos < 0)
            return -1;

        setg(d_buffer, d_buffer + 1, d_buffer + 1);
        return pos;
    }

- The companion function seekpos is overridden as well: it is actually defined as a call to seekoff:

    inline IFdSeek::pos_type IFdSeek::seekpos(pos_type off, openmode mode)
    {
        return seekoff(off, std::ios::beg, mode);
    }

Here is an example of a program using the class IFdSeek. If this program is given its own source file using input redirection then seeking is supported (and with the exception of the first line, every other line is shown twice):

    #include "fdinseek.h"
    #include <string>
    #include <iostream>
    #include <istream>
    #include <iomanip>
    using namespace std;

    int main()
    {
        IFdSeek fds(0);
        istream is(&fds);
        string s;

        while (true)
        {
            if (!getline(is, s))
                break;

            streampos pos = is.tellg();

            cout << setw(5) << pos << ": `" << s << "'\n";

            if (!getline(is, s))
                break;

            streampos pos2 = is.tellg();

            cout << setw(5) << pos2 << ": `" << s << "'\n";

            if (!is.seekg(pos))
            {
                cout << "Seek failed\n";
                break;
            }
        }
    }

Streambuf classes and classes derived from streambuf should support at least ungetting the last read character. Special care must be taken when series of unget calls must be supported. In this section the construction of a class supporting a configurable number of istream::unget or istream::putback calls is discussed.

Support for multiple (say `n') unget calls is implemented by reserving an initial section of the input buffer, which is gradually filled up to contain the last n characters read. The class is implemented as follows:
- Once again, the class is derived from std::streambuf. It defines several data members, allowing the class to perform the bookkeeping required to maintain an unget-buffer of a configurable size:

    class FdUnget: public std::streambuf
    {
        int d_fd;
        size_t d_bufsize;
        size_t d_reserved;
        char *d_buffer;
        char *d_base;

        public:
            FdUnget(int fd, size_t bufsz, size_t unget);
            ~FdUnget() override;
        private:
            int underflow() override;
    };

- The class's constructor expects a file descriptor, a buffer size, and the number of characters that can be ungot as its arguments. That number determines the size of a reserved area, defined as the first d_reserved bytes of the class's input buffer.
    - The input buffer is always at least one byte larger than d_reserved. So, a certain number of bytes may be read. Once d_reserved bytes have been read at most d_reserved bytes can be ungot.
    - Next, the starting point for reading operations is configured. It is called d_base, pointing to a location d_reserved bytes beyond the location represented by d_buffer. This is always the location where buffer refills start.
    - Now that the buffer has been constructed, we're ready to define streambuf's buffer pointers using setg. As no characters have been read yet, all pointers are set to point to d_base. If unget is called at this point, no characters are available, and unget (correctly) fails.
    - Eventually, the refill buffer's size is determined as the number of allocated bytes minus the size of the reserved area.

  Here is the class's constructor, followed by its destructor:

    FdUnget::FdUnget(int fd, size_t bufsz, size_t unget)
    :
        d_fd(fd),
        d_reserved(unget)
    {
        size_t allocate =
                bufsz > d_reserved ?
                    bufsz
                :
                    d_reserved + 1;

        d_buffer = new char[allocate];

        d_base = d_buffer + d_reserved;
        setg(d_base, d_base, d_base);

        d_bufsize = allocate - d_reserved;
    }

    inline FdUnget::~FdUnget()
    {
        delete[] d_buffer;
    }

- Finally, underflow is overridden as follows:
    - First underflow determines the number of characters that could potentially be ungot. If that number of characters are ungot, the input buffer is exhausted. So this value may be any value between 0 (the initial state) and the input buffer's size (when the reserved area has been filled up completely, and all current characters in the remaining section of the buffer have also been read);
    - Next the number of bytes to move into the reserved area is computed. This number is at most d_reserved, but it is set equal to the actual number of characters that can be ungot if this value is smaller;
    - Now that the number of characters to move into the reserved area is known, this number of characters is moved from the input buffer's end to the area immediately before d_base. As the source and destination areas may overlap when small buffers are used, memmove (declared in <cstring>) is used rather than memcpy;
    - Then the buffer is refilled. This all is standard, but notice that reading starts from d_base and not from d_buffer;
    - Finally, streambuf's read buffer pointers are set up. Eback is set to move locations before d_base, thus defining the guaranteed unget-area, gptr is set to d_base, since that's the location of the first read character after a refill, and egptr is set just beyond the location of the last character read into the buffer.

  Here is underflow's implementation:

    int FdUnget::underflow()
    {
        size_t ungetsize = gptr() - eback();
        size_t move = std::min(ungetsize, d_reserved);

        memmove(d_base - move, egptr() - move, move);

        int nread = read(d_fd, d_base, d_bufsize);
        if (nread <= 0)       // none read -> return EOF
            return EOF;

        setg(d_base - move, d_base, d_base + nread);

        return static_cast<unsigned char>(*gptr());
    }

An example using FdUnget
The next example program illustrates the use of the class FdUnget. It reads at most 10 characters from the standard input, stopping at EOF. A guaranteed unget-buffer of 2 characters is defined in a buffer holding 3 characters. Just before reading a character, the program tries to unget at most 6 characters. This is, of course, not possible; but the program nicely ungets as many characters as possible, considering the actual number of characters read:
    #include "fdunget.h"
    #include <string>
    #include <iostream>
    #include <istream>
    using namespace std;

    int main()
    {
        FdUnget fds(0, 3, 2);
        istream is(&fds);
        char c;

        for (int idx = 0; idx < 10; ++idx)
        {
            cout << "after reading " << idx << " characters:\n";
            for (int ug = 0; ug <= 6; ++ug)
            {
                if (!is.unget())
                {
                    cout
                    << "\tunget failed at attempt " << (ug + 1) << "\n"
                    << "\trereading: '";

                    is.clear();
                    while (ug--)
                    {
                        is.get(c);
                        cout << c;
                    }
                    cout << "'\n";
                    break;
                }
            }

            if (!is.get(c))
            {
                cout << " reached\n";
                break;
            }
            cout << "Next character: " << c << '\n';
        }
    }
    /*
        Generated output after 'echo abcde | program':

        after reading 0 characters:
            unget failed at attempt 1
            rereading: ''
        Next character: a
        after reading 1 characters:
            unget failed at attempt 2
            rereading: 'a'
        Next character: b
        after reading 2 characters:
            unget failed at attempt 3
            rereading: 'ab'
        Next character: c
        after reading 3 characters:
            unget failed at attempt 4
            rereading: 'abc'
        Next character: d
        after reading 4 characters:
            unget failed at attempt 4
            rereading: 'bcd'
        Next character: e
        after reading 5 characters:
            unget failed at attempt 4
            rereading: 'cde'
        Next character:

        after reading 6 characters:
            unget failed at attempt 4
            rereading: 'de
        '
         reached
    */

Usually, when extracting information from istream objects operator>>, the standard extraction operator, is perfectly suited for the task as in most cases the extracted fields are white-space (or otherwise clearly) separated from each other. But this does not hold true in all situations. For example, when a web-form is posted to some processing script or program, the receiving program may receive the form field's values as url-encoded characters: letters and digits are sent unaltered, blanks are sent as + characters, and all other characters start with % followed by the character's ascii-value represented by its two digit hexadecimal value.

When decoding url-encoded information, simple hexadecimal extraction won't work, as that extracts as many hexadecimal characters as available, instead of just two.
Since the characters a-f and 0-9 are legal hexadecimal characters, a text like My name is `Ed', url-encoded as
My+name+is+%60Ed%27
results in the extraction of the hexadecimal values 60ed and 27, instead of 60 and 27. The name Ed disappears from view, which is clearly not what we want.
In this case, having seen the %, we could extract 2 characters, put them in an istringstream object, and extract the hexadecimal value from the istringstream object. A bit cumbersome, but doable. Other approaches are possible as well.
The class Fistream for fixed-sized field istream defines an istream class supporting both fixed-sized field extractions and blank-delimited extractions (as well as unformatted read calls). The class may be initialized as a wrapper around an existing istream, or it can be initialized using the name of an existing file. The class is derived from istream, allowing all extractions and operations supported by istreams in general. Fistream defines the following data members:
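The `cumbersome but doable' approach can be sketched as follows. This is merely an illustration of the idea: the function names greedyHex and urlDecode are assumptions, and copying malformed %-sequences unaltered is a design choice made for this sketch. The function greedyHex shows the problem (hexadecimal extraction consumes all available hexadecimal characters), urlDecode isolates exactly two characters in an istringstream before extracting.

```cpp
#include <cctype>
#include <cstddef>
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper showing the problem: plain hexadecimal extraction
// consumes as many hexadecimal characters as are available.
unsigned greedyHex(std::string const &text)
{
    std::istringstream in(text);
    unsigned value = 0;
    in >> std::hex >> value;
    return value;
}

// Hypothetical helper: url-decodes `encoded'. A '+' becomes a blank, %XX
// becomes the character having hexadecimal value XX. Malformed
// %-sequences are copied unaltered.
std::string urlDecode(std::string const &encoded)
{
    std::string result;

    for (size_t idx = 0; idx != encoded.size(); ++idx)
    {
        char ch = encoded[idx];

        if (ch == '+')
            result += ' ';
        else if
        (
            ch == '%' && idx + 2 < encoded.size()
            && std::isxdigit(static_cast<unsigned char>(encoded[idx + 1]))
            && std::isxdigit(static_cast<unsigned char>(encoded[idx + 2]))
        )
        {
                                        // a field of exactly two characters
            std::istringstream field(encoded.substr(idx + 1, 2));
            unsigned value = 0;
            field >> std::hex >> value;
            result += static_cast<char>(value);
            idx += 2;                   // skip the two hexadecimal digits
        }
        else
            result += ch;
    }
    return result;
}
```

Here greedyHex("60Ed%27") returns the value 0x60ed, whereas urlDecode("My+name+is+%60Ed%27") returns the text My name is `Ed'.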
- d_filebuf: a filebuffer used when Fistream reads its information from a named (existing) file. Since the filebuffer is only needed in that case, and since it must be allocated dynamically, it is defined as a unique_ptr<filebuf> object.
- d_streambuf: a pointer to Fistream's streambuf. It points to d_filebuf when Fistream opens a file by name. When an existing istream is used to construct an Fistream, it points to the existing istream's streambuf.
- d_iss: an istringstream object used for the fixed field extractions.
- d_width: a size_t indicating the width of the field to extract. If 0 no fixed field extraction is used, but information is extracted from the istream base class object using standard extractions.

Here is the initial section of Fistream's class interface:

    class Fistream: public std::istream
    {
        std::unique_ptr<std::filebuf> d_filebuf;
        std::streambuf *d_streambuf;
        std::istringstream d_iss;
        size_t d_width;

As stated, Fistream objects can be constructed from either a filename or an existing istream object. The class interface therefore declares two constructors:

    Fistream(std::istream &stream);
    Fistream(char const *name,
             std::ios::openmode mode = std::ios::in);

When an Fistream object is constructed using an existing istream object, the Fistream's istream part simply uses the stream's streambuf object:

    Fistream::Fistream(istream &stream)
    :
        istream(stream.rdbuf()),
        d_streambuf(rdbuf()),
        d_width(0)
    {}

When an Fistream object is constructed using a filename, the istream base initializer is given a new filebuf object to be used as its streambuf. Since the class's data members are not initialized before the class's base class has been constructed, d_filebuf can only be initialized thereafter. By then, the filebuf is only available as rdbuf, returning a streambuf. However, as it is actually a filebuf, a static_cast is used to cast the streambuf pointer returned by rdbuf to a filebuf *, so d_filebuf can be initialized:

    Fistream::Fistream(char const *name, ios::openmode mode)
    :
        istream(new filebuf()),
        d_filebuf(static_cast<filebuf *>(rdbuf())),
        d_streambuf(d_filebuf.get()),
        d_width(0)
    {
        d_filebuf->open(name, mode);
    }

There is only one additional public member: setField(field const &). This member defines the size of the next field to extract. Its parameter is a reference to a field class, a manipulator class defining the width of the next field.

Since a field & is mentioned in Fistream's interface, field must be declared before Fistream's interface starts. The class field itself is simple and declares Fistream as its friend. It has two data members: d_width specifies the width of the next field, and d_newWidth which is set to true if d_width's value should actually be used. If d_newWidth is false, Fistream keeps using its current field width. The class field has two constructors: a default constructor, setting d_newWidth to false, and a second constructor expecting the width of the next field to extract as its value. Here is the class field:
    class field
    {
        friend class Fistream;
        size_t d_width;
        bool d_newWidth;

        public:
            field(size_t width);
            field();
    };

    inline field::field(size_t width)
    :
        d_width(width),
        d_newWidth(true)
    {}

    inline field::field()
    :
        d_newWidth(false)
    {}

Since field declares Fistream as its friend, setField may inspect field's members directly.

Time to return to setField. This function expects a reference to a field object, initialized in one of three different ways:

- field(): When setField's argument is a field object constructed by its default constructor the next extraction will use the same field width as the previous extraction.
- field(0): When this field object is used as setField's argument, fixed-sized field extraction stops, and the Fistream acts like any standard istream object again.
- field(x): When the field object itself is initialized by a non-zero size_t value x, then the next field width is x characters wide. The preparation of such a field is left to setBuffer, Fistream's only private member.

Here is setField's implementation:

    std::istream &Fistream::setField(field const &params)
    {
        if (params.d_newWidth)          // new field size requested
            d_width = params.d_width;   // set new width

        if (!d_width)                   // no width?
            rdbuf(d_streambuf);         // return to the old buffer
        else
            setBuffer();                // define the extraction buffer

        return *this;
    }

The private member setBuffer defines a buffer of d_width + 1 characters and uses read to fill the buffer with d_width characters. The buffer is an NTBS. This buffer is used to initialize the d_iss member. Fistream's rdbuf member is used to extract d_iss's data via the Fistream object itself:

    void Fistream::setBuffer()
    {
        char *buffer = new char[d_width + 1];

        rdbuf(d_streambuf);                         // use istream's buffer to
        buffer[read(buffer, d_width).gcount()] = 0; // read d_width chars,
                                                    // terminated by a 0-byte
        d_iss.str(buffer);
        delete[] buffer;

        rdbuf(d_iss.rdbuf());                       // switch buffers
    }

Although setField could be used to configure Fistream to use or not to use fixed-sized field extraction, using manipulators is probably preferable. To allow field objects to be used as manipulators an overloaded extraction operator was defined. This extraction operator accepts an istream & and a field const & object. Using this extraction operator, statements like
fis >> field(2) >> x >> field(0);
are possible (assuming fis is a Fistream object). Here is the overloaded operator>>, as well as its declaration:

    istream &std::operator>>(istream &str, field const &params)
    {
        return static_cast<Fistream *>(&str)->setField(params);
    }

Declaration:

    namespace std
    {
        istream &operator>>(istream &str, FBB::field const &params);
    }

Finally, an example. The following program uses a Fistream object to url-decode url-encoded information appearing at its standard input:
    int main()
    {
        Fistream fis(cin);

        fis >> hex;
        while (true)
        {
            size_t x;
            int ch = fis.get();         // int: can be compared to EOF
            switch (ch)
            {
                case '\n':
                    cout << '\n';
                break;
                case '+':
                    cout << ' ';
                break;
                case '%':
                    fis >> field(2) >> x >> field(0);
                    cout << static_cast<char>(x);
                break;
                default:
                    cout << static_cast<char>(ch);
                break;
                case EOF:
                return 0;
            }
        }
    }
    /*
        Generated output after:
            echo My+name+is+%60Ed%27 | a.out

        My name is `Ed'
    */

From the C programming language the fork system call is well known. When a program needs to start a new process, system can be used. The function system requires the program to wait for the child process to terminate. The more general way to spawn subprocesses is to use fork.

In this section we investigate how C++ can be used to wrap classes around a complex system call like fork. Much of what follows in this section directly applies to the Unix operating system, and the discussion therefore focuses on that operating system. Other systems usually provide comparable facilities. What follows is closely related to the Template Method Design Pattern (cf. Gamma et al. (1995) Design Patterns, Addison-Wesley).
When fork is called, the current program is duplicated in memory, thus creating a new process. Following this duplication both processes continue their execution just below the fork system call. The two processes may inspect fork's return value: the return value in the original process (called the parent process) differs from the return value in the newly created process (called the child process):
- In the parent process fork returns the process ID of the (child) process that was created by the fork system call. This is a positive integer value.
- In the child process fork returns 0.
- If fork fails, -1 is returned.

A basic Fork class should hide all bookkeeping details of a system call like fork from its users. The class Fork developed here does just that. The class itself only ensures the proper execution of the fork system call. Normally, fork is called to start a child process, usually boiling down to the execution of a separate process. This child process may expect input at its standard input stream and/or may generate output to its standard output and/or standard error streams. Fork does not know all this, and does not have to know what the child process will do. Fork objects should be able to start their child processes.

Fork's constructor cannot know what actions its child process should perform. Similarly, it cannot know what actions the parent process should perform. For these kinds of situations, the template method design pattern was developed. According to Gamma c.s., the template method design pattern
``Define(s) the skeleton of an algorithm in an operation, deferring some steps to subclasses. [The] Template Method (design pattern) lets subclasses redefine certain steps of an algorithm, without changing the algorithm's structure.''
This design pattern allows us to define an abstract base class already providing the essential steps related to the fork system call, deferring the implementation of other parts of the fork system call to subclasses.
The Fork abstract base class has the following characteristics:
- It defines a data member d_pid. In the parent process this data member contains the child's process id and in the child process it has the value 0. Its public interface declares only two members:
    - a fork member function, responsible for the actual forking (i.e., it creates the (new) child process);
    - a virtual destructor ~Fork (having an empty body).

  Here is Fork's interface:

    class Fork
    {
        int d_pid;

        public:
            virtual ~Fork();
            void fork();

        protected:
            int pid() const;
            int waitForChild();     // returns the status

        private:
            virtual void childRedirections();
            virtual void parentRedirections();

            virtual void childProcess() = 0;    // pure virtual members
            virtual void parentProcess() = 0;
    };

- Two member functions are declared in the class's protected section and can thus only be used by derived classes. They are:
    - pid(): The member function pid allows derived classes to access the system fork's return value:

        inline int Fork::pid() const
        {
            return d_pid;
        }

    - waitForChild(): The member int waitForChild can be called by parent processes to wait for the completion of their child processes (as discussed below). This member is declared in the class interface. Its implementation is:

        #include "fork.ih"

        int Fork::waitForChild()
        {
            int status;

            waitpid(d_pid, &status, 0);

            return WEXITSTATUS(status);
        }

      This simple implementation returns the child's exit status to the parent. The called system function waitpid blocks until the child terminates.

- When fork system calls are used, parent processes and child processes must always be distinguished. The main distinction between these processes is that d_pid becomes the child's process-id in the parent process, while d_pid becomes 0 in the child process itself.
  Since these two processes must always be distinguished (and present), their implementation by classes derived from Fork is enforced by Fork's interface: the members childProcess, defining the child process' actions, and parentProcess, defining the parent process' actions, were defined as pure virtual functions.

- childRedirections(): this member should be overridden by derived classes if any standard stream (cin, cout, or cerr) must be redirected in the child process (cf. section 26.2.3). By default it has an empty implementation;

- parentRedirections(): this member should be overridden by derived classes if any standard stream (cin, cout, or cerr) must be redirected in the parent process. By default it has an empty implementation.

        void Fork::childRedirections()
        {}
        void Fork::parentRedirections()
        {}

The member function fork calls the system function fork (Caution: since the system function fork is called by a member function having the same name, the :: scope resolution operator must be used to prevent a recursive call of the member function itself). The function ::fork's return value determines whether parentProcess or childProcess is called. Redirection may be necessary: Fork::fork's implementation calls childRedirections just before calling childProcess, and parentRedirections just before calling parentProcess:

        #include "fork.ih"

        void Fork::fork()
        {
            if ((d_pid = ::fork()) < 0)
                throw "Fork::fork() failed";

            if (d_pid == 0)                 // childprocess has pid == 0
            {
                childRedirections();
                childProcess();
                exit(1);                    // we shouldn't come here:
            }                               // childProcess() should exit

            parentRedirections();
            parentProcess();
        }

In fork.cc the class's internal header file fork.ih is included. This header file takes care of the inclusion of the necessary system header files, as well as the inclusion of fork.h itself. Its implementation is:

        #include "fork.h"
        #include <cstdlib>
        #include <unistd.h>
        #include <sys/types.h>
        #include <sys/wait.h>
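To see how a derived class plugs into this skeleton, here is a minimal sketch. It repeats a condensed copy of Fork so that it compiles on its own; the derived class ExitWith and its members are hypothetical and not part of the Annotations. Its child merely terminates with a given exit code, and its parent stores the status returned by waitForChild. The child uses _exit rather than exit only to avoid flushing stream buffers inherited from the parent.

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Condensed copy of the Fork base class described above.
class Fork
{
    int d_pid = -1;

    public:
        virtual ~Fork() {}
        void fork();

    protected:
        int pid() const { return d_pid; }
        int waitForChild();

    private:
        virtual void childRedirections() {}
        virtual void parentRedirections() {}
        virtual void childProcess() = 0;
        virtual void parentProcess() = 0;
};

void Fork::fork()
{
    if ((d_pid = ::fork()) < 0)
        throw "Fork::fork() failed";

    if (d_pid == 0)                     // the child process
    {
        childRedirections();
        childProcess();
        exit(1);                        // not reached if childProcess exits
    }

    parentRedirections();
    parentProcess();
}

int Fork::waitForChild()
{
    int status;
    waitpid(d_pid, &status, 0);
    return WEXITSTATUS(status);
}

// Hypothetical derived class: the child exits with d_code, the parent
// stores the exit status it receives.
class ExitWith: public Fork
{
    int d_code;
    int d_status = -1;

    public:
        explicit ExitWith(int code) : d_code(code) {}
        int status() const { return d_status; }

    private:
        void childProcess() override  { _exit(d_code); }
        void parentProcess() override { d_status = waitForChild(); }
};
```

After constructing an ExitWith object with some small exit code and calling its fork member, status() is expected to return that code in the parent process.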
Child processes should not return: once they have completed their tasks, they should terminate. This happens automatically when the child process performs a call to a member of the exec... family, but if the child itself remains active, then it must make sure that it terminates properly. A child process normally uses exit to terminate itself, but note that exit prevents the activation of destructors of objects defined at the same or more superficial nesting levels than the level at which exit is called. Destructors of globally defined objects are activated when exit is used. When using exit to terminate childProcess, it should either itself call a support member function defining all nested objects it needs, or it should define all its objects in a compound statement (e.g., using a try block) calling exit beyond the compound statement.
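The effect of exit on local objects can be made visible with a small experiment. The sketch below is only an illustration: the class Reporter and the function destructorRan are hypothetical names. The child defines a Reporter object in a compound statement; the object's destructor writes one byte into a pipe. When exit is called beyond the compound statement the parent receives the byte; when exit is called inside it, the destructor is skipped and the parent merely sees end-of-file.

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper type: its destructor writes one byte to d_fd.
class Reporter
{
    int d_fd;

    public:
        explicit Reporter(int fd) : d_fd(fd) {}
        ~Reporter()
        {
            char ch = 'D';
            if (write(d_fd, &ch, 1) != 1)
            {}                          // nothing sensible to do on failure
        }
};

// Forks a child and returns true if the child's Reporter destructor ran.
bool destructorRan(bool exitInsideScope)
{
    int fd[2];
    if (pipe(fd) != 0)
        return false;

    pid_t pid = fork();
    if (pid < 0)
        return false;

    if (pid == 0)                       // child
    {
        close(fd[0]);
        {
            Reporter reporter{ fd[1] };
            if (exitInsideScope)
                exit(0);                // reporter's destructor is skipped
        }                               // otherwise it runs here
        exit(0);                        // exit beyond the compound statement
    }

    close(fd[1]);                       // parent: read at most one byte
    char ch;
    bool ran = read(fd[0], &ch, 1) == 1;
    close(fd[0]);

    int status;
    waitpid(pid, &status, 0);
    return ran;
}
```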
Parent processes should normally wait for their children to complete. Terminating child processes inform their parents that they are about to terminate by sending a signal that should be caught by their parents. If child processes terminate and their parent processes do not catch those signals then such child processes remain visible as so-called zombie processes.
If parent processes must wait for their children to complete, they may call the member waitForChild. This member returns the exit status of a child process to its parent.
There exists a situation where the child process continues to live, but the parent dies. This is a fairly natural event: parents tend to die before their children do. In our context (i.e. C++), this is called a daemon program. In a daemon the parent process dies and the child program continues to run as a child of the basic init process. Again, when the child eventually dies a signal is sent to its `step-parent' init. This does not create a zombie as init catches the termination signals of all its (step-) children. The construction of a daemon process is very simple, given the availability of the class Fork (cf. section 26.2.4).
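As a rough stand-alone illustration of what waiting for a child involves, the following sketch uses fork and waitpid directly. The function name reapedExitStatus is hypothetical. Unlike waitForChild it also checks WIFEXITED, so a child that was terminated by a signal is reported as -1 instead of being given a meaningless exit status.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: forks a child that immediately exits with `code`.
// The parent waits for it (so no zombie remains) and returns the child's
// exit status, or -1 if forking or waiting failed or if the child did not
// terminate normally.
int reapedExitStatus(int code)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)                       // child: terminate at once
        _exit(code);

    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;

    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```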
Earlier, in section 6.6.2, streams were redirected using the ios::rdbuf member function. By assigning the streambuf of a stream to another stream, both stream objects access the same streambuf, thus implementing redirection at the level of the programming language itself.

This may be fine within the context of a C++ program, but once we leave that context the redirection terminates. The operating system does not know about streambuf objects. This situation is encountered, e.g., when a program uses a system call to start a subprogram. The example program at the end of this section uses C++ redirection to redirect the information inserted into cout to a file, and then calls

        system("echo hello world")

to echo a well-known line of text. Since echo writes its information to the standard output, this would be the program's redirected file if the operating system recognized C++'s redirection.
But redirection doesn't happen. Instead, hello world still appears at the program's standard output and the redirected file is left untouched. To write hello world to the redirected file, redirection must be realized at the operating system level. Some operating systems (e.g., Unix and friends) provide system calls like dup and dup2 to accomplish this. Examples of the use of these system calls are given in section 26.2.5.
Here is the example of the failing redirection at the system level following C++ redirection using streambuf redirection:
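For comparison, redirection at the operating system level might look like the following sketch. It is not the Annotations' own example (those follow in section 26.2.5); the function names systemToFile and firstLineOf are hypothetical. File descriptor 1 is temporarily made to refer to a file using dup2, so that the output of the command started by system ends up in that file, after which the original standard output is restored.

```cpp
#include <cstdio>
#include <cstdlib>
#include <fcntl.h>
#include <fstream>
#include <iostream>
#include <string>
#include <unistd.h>

// Hypothetical helper: runs `command` via system() while file descriptor 1
// refers to the file `path`. Returns true if the command could be run and
// reported success.
bool systemToFile(char const *command, char const *path)
{
    std::cout.flush();                  // don't let pending output end up
    fflush(stdout);                     // in the wrong place

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return false;

    int saved = dup(STDOUT_FILENO);     // remember the current stdout
    if (saved < 0 || dup2(fd, STDOUT_FILENO) < 0)
    {
        close(fd);
        if (saved >= 0)
            close(saved);
        return false;
    }
    close(fd);                          // fd 1 now refers to the file

    int ret = system(command);

    dup2(saved, STDOUT_FILENO);         // restore the original stdout
    close(saved);
    return ret == 0;
}

// Hypothetical helper: returns the first line of the file `path`.
std::string firstLineOf(char const *path)
{
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    return line;
}
```

Calling systemToFile("echo hello world", ...) with a writable path is expected to leave hello world in that file rather than on the program's standard output.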
        #include <iostream>
        #include <fstream>
        #include <cstdlib>

        using namespace std;

        int main()
        {
            ofstream of("outfile");

            streambuf *buf = cout.rdbuf(of.rdbuf());
            cout << "To the of stream\n";
            system("echo hello world");
            cout << "To the of stream\n";
            cout.rdbuf(buf);
        }
        /*
            Generated output: on the file `outfile'

            To the of stream
            To the of stream

            On standard output:

            hello world
        */

In some applications the only purpose of fork is to start a child process. The parent process terminates immediately after spawning the child process. If this happens, the child process continues to run as a child process of init, the always running first process on Unix systems. Such a process is often called a daemon, running as a background process.

Although the next example can easily be constructed as a plain C program, it was included in the C++ Annotations because it is so closely related to the current discussion of the Fork class. I thought about adding a daemon member to that class, but eventually decided against it because the construction of a daemon program is very simple and requires no features other than those currently offered by the class Fork. Here is an example illustrating the construction of such a daemon program. Its child process doesn't call exit but uses throw 0, which is caught by the catch clause of the child's main function. Doing this ensures that any objects defined by the child process are properly destroyed:
        #include <iostream>
        #include <unistd.h>
        #include "fork.h"

        class Daemon: public Fork
        {
            void parentProcess() override       // the parent does nothing.
            {}
            void childProcess() override        // actions by the child
            {
                sleep(3);
                                                // just a message...
                std::cout << "Hello from the child process\n";
                throw 0;                        // The child process ends
            }
        };

        int main()
        try
        {
            Daemon{}.fork();
        }
        catch(...)
        {}
        /*
            Generated output:
            The next command prompt, then after 3 seconds:
            Hello from the child process
        */

Processes can communicate through file descriptors created by the pipe system call. When two processes want to communicate using such file descriptors, the following happens:

- Two associated file descriptors are created using the pipe system call. One of the file descriptors is used for writing, the other file descriptor is used for reading.
- Forking takes place (i.e., the system fork function is called), duplicating the file descriptors. Now we have four file descriptors as the child process and the parent process both have their own copies of the two file descriptors created by pipe.

The details are handled by the Pipe class developed here. Let's have a look at its characteristics (before using functions like pipe and dup the compiler must have read the <unistd.h> header file):

- The pipe system call expects a pointer to two int values, representing, respectively, the file descriptor used for reading and the file descriptor used for writing. To avoid confusion, the class Pipe defines an enum having values associating the indices of the array of 2 ints with symbolic constants. The two file descriptors themselves are stored in a data member d_fd. Here is the initial section of the class's interface:

        class Pipe
        {
            enum RW { READ, WRITE };
            int d_fd[2];

- The class's constructor calls pipe to create a set of associated file descriptors used for accessing both ends of a pipe:

        Pipe::Pipe()
        {
            if (pipe(d_fd))
                throw "Pipe::Pipe(): pipe() failed";
        }

- The members readOnly and readFrom are used to configure the pipe's reading end. The latter function is used when using redirection. It is provided with an alternate file descriptor to be used for reading from the pipe. Usually this alternate file descriptor is STDIN_FILENO, allowing cin to extract information from the pipe.
  The former function is merely used to configure the reading end of the pipe. It closes the matching writing end and returns a file descriptor that can be used to read from the pipe:

        int Pipe::readOnly()
        {
            close(d_fd[WRITE]);
            return d_fd[READ];
        }
        void Pipe::readFrom(int fd)
        {
            readOnly();

            redirect(d_fd[READ], fd);
            close(d_fd[READ]);
        }

- The member writeOnly and two writtenBy members are available to configure the writing end of a pipe. The former function is only used to configure the writing end of the pipe. It closes the reading end, and returns a file descriptor that can be used for writing to the pipe:

        int Pipe::writeOnly()
        {
            close(d_fd[READ]);
            return d_fd[WRITE];
        }
        void Pipe::writtenBy(int fd)
        {
            writtenBy(&fd, 1);
        }
        void Pipe::writtenBy(int const *fd, size_t n)
        {
            writeOnly();

            for (size_t idx = 0; idx < n; idx++)
                redirect(d_fd[WRITE], fd[idx]);

            close(d_fd[WRITE]);
        }

  For the latter member two overloaded versions are available: writtenBy(int fd) is used to configure single redirection, so that a specific file descriptor (usually STDOUT_FILENO or STDERR_FILENO) can be used to write to the pipe; writtenBy(int const *fd, size_t n) may be used to configure multiple redirection, providing an array argument containing file descriptors. Information written to any of these file descriptors is actually written to the pipe.

- The member redirect is used to set up redirection through the dup2 system call. This function expects two file descriptors. The first file descriptor represents a file descriptor that can be used to access the device's information; the second file descriptor is an alternate file descriptor that may also be used to access the device's information. Here is redirect's implementation:

        void Pipe::redirect(int d_fd, int alternateFd)
        {
            if (dup2(d_fd, alternateFd) < 0)
                throw "Pipe: redirection failed";
        }

Having Pipe objects available, we'll use Fork and Pipe in various example programs.

The class ParentSlurp, derived from Fork, starts a child process executing a stand-alone program (like /bin/ls).
The (standard) output of the executed program is not shown on the screen but is read by the parent process. For demonstration purposes the parent process writes the lines it receives to its standard output stream, prepending line numbers to the lines. It is attractive to redirect the parent's standard input stream to allow the parent to read the output from the child process using its std::cin input stream. Therefore, the only pipe in the program is used as an input pipe for the parent, and an output pipe for the child.
The class ParentSlurp has the following characteristics:
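The mechanism behind such a program can also be sketched with plain system calls, without the Fork and Pipe classes. The function slurp below is a hypothetical illustration, not part of the Annotations, and it deviates from ParentSlurp in two ways: the parent reads directly from the pipe's file descriptor instead of redirecting its standard input, and the child runs its command through /bin/sh (assumed to be available).

```cpp
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper: runs `shellCommand` in a child process and returns
// everything the child wrote to its standard output.
std::string slurp(char const *shellCommand)
{
    int fd[2];                          // fd[0]: reading, fd[1]: writing
    if (pipe(fd) != 0)
        throw "slurp: pipe() failed";

    pid_t pid = fork();
    if (pid < 0)
        throw "slurp: fork() failed";

    if (pid == 0)                       // child: stdout goes into the pipe
    {
        close(fd[0]);
        dup2(fd[1], STDOUT_FILENO);
        close(fd[1]);
        execl("/bin/sh", "sh", "-c", shellCommand,
              static_cast<char *>(nullptr));
        _exit(127);                     // only reached if execl failed
    }

    close(fd[1]);                       // parent: only reads from the pipe

    std::string output;
    char buffer[256];
    ssize_t nRead;
    while ((nRead = read(fd[0], buffer, sizeof(buffer))) > 0)
        output.append(buffer, nRead);
    close(fd[0]);

    int status;
    waitpid(pid, &status, 0);           // don't leave a zombie behind
    return output;
}
```

Note that both processes close the pipe end they don't use: the parent's read only reports end-of-file once all copies of the writing end have been closed.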
- It is derived from Fork. Before starting ParentSlurp's class interface, the compiler must have read fork.h and pipe.h. The class only uses one data member, a Pipe object d_pipe.

- As Pipe's constructor already defines a pipe, and as d_pipe is automatically initialized by ParentSlurp's default constructor, which is implicitly provided, all additional members only exist for ParentSlurp's own benefit so they can be defined in the class's (implicit) private section. Here is the class's interface:

        class ParentSlurp: public Fork
        {
            Pipe d_pipe;

            void childRedirections() override;
            void parentRedirections() override;
            void childProcess() override;
            void parentProcess() override;
        };

- The childRedirections member configures the writing end of the pipe. So, all information written to the child's standard output stream ends up in the pipe. The big advantage of this is that no additional streams are needed to write to a file descriptor:

        inline void ParentSlurp::childRedirections()
        {
            d_pipe.writtenBy(STDOUT_FILENO);
        }

- The parentRedirections member configures the reading end of the pipe. It does so by connecting the reading end of the pipe to the parent's standard input file descriptor (STDIN_FILENO). This allows the parent to perform extractions from cin, not requiring any additional streams for reading.

        inline void ParentSlurp::parentRedirections()
        {
            d_pipe.readFrom(STDIN_FILENO);
        }

- The childProcess member only needs to concentrate on its own actions. As it only needs to execute a program (writing information to its standard output), the member can consist of one single statement:

        inline void ParentSlurp::childProcess()
        {
            execl("/bin/ls", "/bin/ls", static_cast<char *>(0));
        }

- The parentProcess member simply `slurps' the information appearing at its standard input. Doing so, it actually reads the child's output.
  It copies the received lines to its standard output stream prefixing line numbers to them:

        void ParentSlurp::parentProcess()
        {
            std::string line;
            size_t nr = 1;

            while (getline(std::cin, line))
                std::cout << nr++ << ": " << line << '\n';

            waitForChild();
        }

The program itself simply constructs a ParentSlurp object, and calls its fork() member. Its output consists of a numbered list of files in the directory where the program is started. Note that the program also needs the fork.o, pipe.o and waitforchild.o object files (see earlier sources):

        int main()
        {
            ParentSlurp{}.fork();
        }
        /*
            Generated Output (example only, actually obtained output may differ):

            1: a.out
            2: bitand.h
            3: bitfunctional
            4: bitnot.h
            5: daemon.cc
            6: fdinseek.cc
            7: fdinseek.h
            ...
        */

The monitor program supports the following commands:

- start: this starts a new child process. The parent returns the child's ID (a number) to the user. The ID is thereupon used to identify a particular child process;
- <nr> text sends ``text'' to the child process having ID <nr>;
- stop <nr> terminates the child process having ID <nr>;
- exit terminates the parent as well as all its child processes.

A problem with programs like our monitor is that they allow asynchronous input from multiple sources. Input may appear at the standard input as well as at the input-sides of pipes. Also, multiple output channels are used. To handle situations like these, the select system call was developed.
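To give a first impression of what a call to select involves, the sketch below waits until a single file descriptor becomes readable or until a timeout expires. The function name readableWithin is hypothetical; only the select call and the FD_ZERO, FD_SET and FD_ISSET macros belong to the system interface.

```cpp
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// Hypothetical helper: returns 1 if `fd` becomes readable within the
// specified time, 0 when the timeout expires first, and -1 if select fails.
int readableWithin(int fd, int sec, int usec = 0)
{
    fd_set readSet;
    FD_ZERO(&readSet);                  // start with an empty set
    FD_SET(fd, &readSet);               // monitor fd for input

    timeval timeout;
    timeout.tv_sec = sec;
    timeout.tv_usec = usec;

    // select's first argument is the highest monitored descriptor plus one
    int ret = select(fd + 1, &readSet, nullptr, nullptr, &timeout);

    if (ret < 0)
        return -1;

    return ret > 0 && FD_ISSET(fd, &readSet) ? 1 : 0;
}
```

With multiple file descriptors the bookkeeping (maintaining the sets, tracking the highest descriptor, copying the sets before each call) quickly becomes tedious, which is what a wrapping class can take care of.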
The select system call was developed to handle asynchronous I/O multiplexing. The select system call is used to handle, e.g., input appearing simultaneously at a set of file descriptors.

The select function is rather complex, and its full discussion is beyond the C++ Annotations' scope. By encapsulating select in a class Selector, hiding its details and offering an intuitively attractive interface, its use is simplified. The Selector class has these features:

- As most of Selector's members are very small, most members can be implemented inline. The class requires quite a few data members. Most of these data members belong to types that require some system headers to be included first:

        #include <limits.h>
        #include <unistd.h>
        #include <sys/time.h>
        #include <sys/types.h>
- fd_set is a type designed to be used by select, and variables of this type contain the set of file descriptors on which select may sense some activity. Furthermore, select allows us to fire an asynchronous alarm. To set the alarm time, the class Selector defines a timeval data member. Other members are used for internal bookkeeping purposes. Here is the class Selector's interface:

        class Selector
        {
            fd_set d_read;
            fd_set d_write;
            fd_set d_except;
            fd_set d_ret_read;
            fd_set d_ret_write;
            fd_set d_ret_except;
            timeval d_alarm;
            int d_max;
            int d_ret;
            int d_readidx;
            int d_writeidx;
            int d_exceptidx;

            public:
                Selector();

                int exceptFd();
                int nReady();
                int readFd();
                int wait();
                int writeFd();
                void addExceptFd(int fd);
                void addReadFd(int fd);
                void addWriteFd(int fd);
                void noAlarm();
                void rmExceptFd(int fd);
                void rmReadFd(int fd);
                void rmWriteFd(int fd);
                void setAlarm(int sec, int usec = 0);

            private:
                int checkSet(int *index, fd_set &set);
                void addFd(fd_set *set, int fd);
        };

- Selector(): the (default) constructor. It clears the read, write, and exception fd_set variables, and switches off the alarm. Except for d_max, the remaining data members do not require specific initializations:

        Selector::Selector()
        {
            FD_ZERO(&d_read);
            FD_ZERO(&d_write);
            FD_ZERO(&d_except);
            noAlarm();
            d_max = 0;
        }

- int wait(): this member blocks until the alarm times out or until activity is sensed at any of the file descriptors monitored by the Selector object. It throws an exception when the select system call itself fails:

        int Selector::wait()
        {
            timeval t = d_alarm;

            d_ret_read = d_read;
            d_ret_write = d_write;
            d_ret_except = d_except;

            d_readidx = 0;
            d_writeidx = 0;
            d_exceptidx = 0;

            d_ret = select(d_max, &d_ret_read, &d_ret_write, &d_ret_except,
                            t.tv_sec == -1 && t.tv_usec == -1 ? 0 : &t);

            if (d_ret < 0)
                throw "Selector::wait()/select() failed";

            return d_ret;
        }

- int nReady: this member function's return value is only defined when wait has returned.
  In that case it returns 0 for an alarm-timeout, -1 if select failed, and otherwise the number of file descriptors on which activity was sensed:

        inline int Selector::nReady()
        {
            return d_ret;
        }

- int readFd(): this member function's return value is also only defined after wait has returned. Its return value is -1 if no (more) input file descriptors are available. Otherwise the next file descriptor available for reading is returned:

        inline int Selector::readFd()
        {
            return checkSet(&d_readidx, d_ret_read);
        }

- int writeFd(): operating analogously to readFd, it returns the next file descriptor to which output is written. It uses d_writeidx and d_ret_write and is implemented analogously to readFd;

- int exceptFd(): operating analogously to readFd, it returns the next exception file descriptor on which activity was sensed. It uses d_exceptidx and d_ret_except and is implemented analogously to readFd;

- void setAlarm(int sec, int usec = 0): this member activates Selector's alarm facility. At least the number of seconds to wait for the alarm to go off must be specified. It simply assigns values to d_alarm's fields. At the next Selector::wait call, the alarm fires (i.e., wait returns with return value 0) once the configured alarm-interval has passed:

        inline void Selector::setAlarm(int sec, int usec)
        {
            d_alarm.tv_sec = sec;
            d_alarm.tv_usec = usec;
        }

- void noAlarm(): this member switches off the alarm by setting both fields of the alarm interval to -1, which wait interprets as a request not to use a timeout at all:

        inline void Selector::noAlarm()
        {
            setAlarm(-1, -1);
        }

- void addReadFd(int fd): this member adds a file descriptor to the set of input file descriptors monitored by the Selector object. The member function wait returns once input is available at the indicated file descriptor:

        inline void Selector::addReadFd(int fd)
        {
            addFd(&d_read, fd);
        }

- void addWriteFd(int fd): this member adds a file descriptor to the set of output file descriptors monitored by the Selector object. The member function wait returns once output is available at the indicated file descriptor.
  Using d_write, it is implemented analogously to addReadFd;

- void addExceptFd(int fd): this member adds a file descriptor to the set of exception file descriptors to be monitored by the Selector object. The member function wait returns once activity is sensed at the indicated file descriptor. Using d_except, it is implemented analogously to addReadFd;

- void rmReadFd(int fd): this member removes a file descriptor from the set of input file descriptors monitored by the Selector object:

        inline void Selector::rmReadFd(int fd)
        {
            FD_CLR(fd, &d_read);
        }

- void rmWriteFd(int fd): this member removes a file descriptor from the set of output file descriptors monitored by the Selector object. Using d_write, it is implemented analogously to rmReadFd;

- void rmExceptFd(int fd): this member removes a file descriptor from the set of exception file descriptors to be monitored by the Selector object. Using d_except, it is implemented analogously to rmReadFd;

- Two support members are defined in the class's private section:

  addFd adds a file descriptor to a fd_set:

        void Selector::addFd(fd_set *set, int fd)
        {
            FD_SET(fd, set);
            if (fd >= d_max)
                d_max = fd + 1;
        }

  checkSet tests whether a file descriptor (*index) is found in a fd_set:

        int Selector::checkSet(int *index, fd_set &set)
        {
            int &idx = *index;

            while (idx < d_max && !FD_ISSET(idx, &set))
                ++idx;

            return idx == d_max ? -1 : idx++;
        }

The monitor program uses a Monitor object doing most of the work. The class Monitor's public interface only offers a default constructor and one member, run, to perform its tasks. All other member functions are located in the class's private section.

Monitor defines the private enum Commands, symbolically listing the various commands its input language supports, as well as several data members. Among the data members are a Selector object and a map whose keys are the file descriptors used for reading from the children (see createNewChild below) and whose values are pointers to Child objects (see section 26.2.7.7). Furthermore, Monitor has a static array member s_handler[], storing pointers to member functions handling user commands.
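The idea of a static array of pointers to member functions, indexed by the values of an enum, can be tried out in isolation. The class Dispatcher below and all its members are hypothetical; it only mimics the role that s_handler plays and is not Monitor's actual code. Each handler returns a string so that the effect of a dispatch can be inspected.

```cpp
#include <string>

// Hypothetical class illustrating an enum-indexed table of pointers to
// member functions.
class Dispatcher
{
    public:
        enum Command
        {
            UNKNOWN,
            START,
            STOP,
            sizeofCommands
        };
        using Handler = std::string (Dispatcher::*)(int);

        std::string handle(Command cmd, int value)
        {                               // call the handler matching cmd
            return (this->*s_handler[cmd])(value);
        }

    private:
        std::string unknown(int)
        {
            return "unknown";
        }
        std::string start(int value)
        {
            return "start " + std::to_string(value);
        }
        std::string stop(int value)
        {
            return "stop " + std::to_string(value);
        }

        static Handler const s_handler[sizeofCommands];
};

// The order of the initializers must follow the order of the enum's values.
Dispatcher::Handler const Dispatcher::s_handler[Dispatcher::sizeofCommands] =
{
    &Dispatcher::unknown,
    &Dispatcher::start,
    &Dispatcher::stop,
};
```

Because the array's initializer is part of the class's scope, it may refer to the private handlers. The call syntax (this->*s_handler[cmd])(value) requires the parentheses around the member selection.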
A destructor should be implemented as well, but its implementation is left as an exercise to the reader. Here is Monitor's interface, including the interface of the nested class Find that is used to create a function object:
        class Monitor
        {
            enum Commands
            {
                UNKNOWN,
                START,
                EXIT,
                STOP,
                TEXT,
                sizeofCommands
            };

            using MapIntChild = std::map<int, std::shared_ptr<Child>>;

            friend class Find;
            class Find
            {
                int d_nr;
                public:
                    Find(int nr);
                    bool operator()(MapIntChild::value_type &vt) const;
            };

            Selector d_selector;
            int d_nr;
            MapIntChild d_child;

            static void (Monitor::*s_handler[])(int, std::string const &);
            static int s_initialize;

            public:
                enum Done
                {};

                Monitor();
                void run();

            private:
                static void killChild(MapIntChild::value_type it);
                static int initialize();

                Commands next(int *value, std::string *line);
                void processInput();
                void processChild(int fd);

                void createNewChild(int, std::string const &);
                void exiting(int = 0, std::string const &msg = std::string{});
                void sendChild(int value, std::string const &line);
                void stopChild(int value, std::string const &);
                void unknown(int, std::string const &);
        };

Since there's only one non-class type data member, the class's constructor is a very simple function which could be implemented inline:
        inline Monitor::Monitor()
        :
            d_nr(0)
        {}

The static array s_handler, storing pointers to member functions, needs to be initialized as well. This can be accomplished in several ways:

- Since the Commands enumeration only specifies a fairly limited set of commands, compile-time initialization could be considered:

        void (Monitor::*Monitor::s_handler[])(int, string const &) =
        {
            &Monitor::unknown,              // order follows enum Commands'
            &Monitor::createNewChild,       // elements
            &Monitor::exiting,
            &Monitor::stopChild,
            &Monitor::sendChild,
        };

  The advantage of this is that it's simple, not requiring any run-time effort. The disadvantage is of course relatively complex maintenance. If for some reason Commands is modified, s_handler must be modified as well. In cases like these, compile-time initialization often is asking for trouble. There is a simple alternative though.

- In Monitor's interface we see a static data member s_initialize and a static member function initialize. The static member function handles the initialization of the s_handler array. It explicitly assigns the array's elements and any modification in ordering of enum Commands' values is automatically accounted for by recompiling initialize:

        void (Monitor::*Monitor::s_handler[sizeofCommands])(int, string const &);

        int Monitor::initialize()
        {
            s_handler[UNKNOWN] = &Monitor::unknown;
            s_handler[START] = &Monitor::createNewChild;
            s_handler[EXIT] = &Monitor::exiting;
            s_handler[STOP] = &Monitor::stopChild;
            s_handler[TEXT] = &Monitor::sendChild;
            return 0;
        }

  The member initialize is a static member and so it can be called to initialize s_initialize, a static int variable. The initialization is enforced by placing the initialization statement in the source file of a function that is known to be executed. It could be main, but if we're Monitor's maintainers and only have control over the library containing Monitor's code then that's not an option. In those cases the source file containing the destructor is a very good candidate.
  If a class has only one constructor and it's not defined inline then the constructor's source file is a good candidate as well. In Monitor's current implementation the initialization statement is put in run's source file, reasoning that s_handler is only needed when run is used.

Monitor's core activities are performed by run. It performs the following tasks:

- Initially, the Monitor object only monitors its standard input. The set of input file descriptors to which d_selector listens is initialized to STDIN_FILENO.
- Then, in a loop, d_selector's wait function is called. If input on cin is available, it is processed by processInput. Otherwise, the input has arrived from a child process. Information sent by children is processed by processChild.

As noted by Ben Simons (ben at mrxfx dot com) Monitor must not catch the termination signals. Instead, the process spawning child processes has that responsibility (the underlying principle being that a parent process is responsible for its child processes; a child process, in turn, is responsible for its own child processes).
The source file containing run also defines and initializes s_initialize to ensure the proper initialization of the s_handler array. Here are run's implementation and s_initialize's definition:

        #include "monitor.ih"

        int Monitor::s_initialize = Monitor::initialize();

        void Monitor::run()
        {
            d_selector.addReadFd(STDIN_FILENO);

            while (true)
            {
                cout << "? " << flush;
                try
                {
                    d_selector.wait();

                    int fd;
                    while ((fd = d_selector.readFd()) != -1)
                    {
                        if (fd == STDIN_FILENO)
                            processInput();
                        else
                            processChild(fd);
                    }
                    cout << "NEXT ...\n";
                }
                catch (char const *msg)
                {
                    exiting(1, msg);
                }
            }
        }

The member function processInput reads the commands entered by the user using the program's standard input stream. The member itself is rather simple. It calls next to obtain the next command entered by the user, and then calls the corresponding function using the matching element of the s_handler[] array. Here are the members processInput and next:
        void Monitor::processInput()
        {
            string line;
            int value;
            Commands cmd = next(&value, &line);
            (this->*s_handler[cmd])(value, line);
        }

        Monitor::Commands Monitor::next(int *value, string *line)
        {
            if (!getline(cin, *line))
                exiting(1, "Monitor::next(): reading cin failed");

            if (*line == "start")
                return START;

            if (*line == "exit" || *line == "quit")
            {
                *value = 0;
                return EXIT;
            }

            if (line->find("stop") == 0)
            {
                istringstream istr(line->substr(4));
                istr >> *value;
                return !istr ? UNKNOWN : STOP;
            }

            istringstream istr(line->c_str());
            istr >> *value;
            if (istr)
            {
                getline(istr, *line);
                return TEXT;
            }

            return UNKNOWN;
        }

All other input sensed by d_selector is created by child processes. Because d_selector's readFd member returns the corresponding input file descriptor, this descriptor can be passed to processChild. Using an IFdStreambuf (see section 26.1.2.1), its information is read from an input stream. The communication protocol used here is rather basic. For every line of input sent to a child, the child replies by sending back exactly one line of text. This line is then read by processChild:
        void Monitor::processChild(int fd)
        {
            IFdStreambuf ifdbuf(fd);
            istream istr(&ifdbuf);
            string line;

            getline(istr, line);
            cout << d_child[fd]->pid() << ": " << line << '\n';
        }

The construction d_child[fd]->pid() used in the above source deserves some special attention. Monitor defines the data member map<int, shared_ptr<Child>> d_child. As the expression shows, this map uses the file descriptor for reading from a child as its key, and a (shared) pointer to the Child object as its value. A shared pointer is used here, rather than a Child object, since we want to use the facilities offered by the map, but don't want to copy a Child object time and again.

Now that run's implementation has been covered, we'll concentrate on the various commands users might enter:

- When the start command is issued, a new child process is started. A new element is added to d_child by the member createNewChild. Next, the Child object should start its activities, but the Monitor object can not wait for the child process to complete its activities, as there is no well-defined endpoint in the near future, and the user probably wants to be able to enter more commands. Therefore, the Child process must run as a daemon. So the forked process terminates immediately, but its own child process continues to run (in the background). Consequently, createNewChild calls the child's fork member. Although it is the child's fork function that is called, it is still the monitor program wherein that fork function is called. So, the monitor program is duplicated by fork. Execution then continues:

  with Child's parentProcess in its parent process;
  with Child's childProcess in its child process.

  As Child's parentProcess is an empty function, returning immediately, the Child's parent process effectively continues immediately below createNewChild's cp->fork() statement. As the child process never returns (see section 26.2.7.7), the code below cp->fork() is never executed by the Child's child process.
  This is exactly as it should be.

  In the parent process, createNewChild's remaining code simply adds the file descriptor that's available for reading information from the child to the set of input file descriptors monitored by d_selector, and uses d_child to establish the association between that file descriptor and the Child object's address:
        void Monitor::createNewChild(int, string const &)
        {
            Child *cp = new Child{ ++d_nr };

            cp->fork();

            int fd = cp->readFd();

            d_selector.addReadFd(fd);
            d_child[fd].reset(cp);

            cerr << "Child " << d_nr << " started\n";
        }

- Direct communication with the child is required for the stop <nr> and <nr> text commands. The former command terminates child process <nr>, by calling stopChild. This function locates the child process having the order number using an anonymous object of the class Find, nested inside Monitor. The class Find simply compares the provided nr with the children's order number returned by their nr members:

        inline Monitor::Find::Find(int nr)
        :
            d_nr(nr)
        {}
        inline bool Monitor::Find::operator()(MapIntChild::value_type &vt) const
        {
            return d_nr == vt.second->nr();
        }

  If the child process having order number nr was found, its file descriptor is removed from d_selector's set of input file descriptors. Then the child process itself is terminated by the static member killChild. The member killChild is declared as a static member function, as it is used as function argument of the for_each generic algorithm by exiting (see below). Here is killChild's implementation:

        void Monitor::killChild(MapIntChild::value_type it)
        {
            if (kill(it.second->pid(), SIGTERM))
                cerr << "Couldn't kill process " << it.second->pid() << '\n';

            // reap defunct child process
            int status = 0;
            while (waitpid(it.second->pid(), &status, WNOHANG) > -1)
                ;
        }

  Having terminated the specified child process, the corresponding Child object is destroyed and its pointer is removed from d_child:

        void Monitor::stopChild(int nr, string const &)
        {
            auto it = find_if(d_child.begin(), d_child.end(), Find{ nr });

            if (it == d_child.end())
                cerr << "No child number " << nr << '\n';
            else
            {
                d_selector.rmReadFd(it->second->readFd());
                d_child.erase(it);
            }
        }

- The command <nr> text sends text to child process nr using the member function sendChild.
  This function also uses a Find object to locate the child-process having order number nr, and simply inserts the text into the writing end of a pipe connected to that child process:

        void Monitor::sendChild(int nr, string const &line)
        {
            auto it = find_if(d_child.begin(), d_child.end(), Find(nr));

            if (it == d_child.end())
                cerr << "No child number " << nr << '\n';
            else
            {
                OFdnStreambuf ofdn{ it->second->writeFd() };
                ostream out(&ofdn);

                out << line << '\n';
            }
        }

- When users enter exit or quit the member exiting is called. It terminates all child processes using the for_each generic algorithm (see section 19.1.18) to visit all elements of d_child. Then the program itself ends:

        void Monitor::exiting(int value, string const &msg)
        {
            for_each(d_child.begin(), d_child.end(), killChild);
            if (msg.length())
                cerr << msg << '\n';
            throw value;
        }

The program's main function is simple and needs no further comment:

        int main()
        try
        {
            Monitor{}.run();
        }
        catch (int exitValue)
        {
            return exitValue;
        }

When the Monitor object starts a child process, it creates an object of the class Child. The Child class is derived from the class Fork, allowing it to operate as a daemon (as discussed in the previous section). Since Child is a daemon class, we know that its parent process must be defined as an empty function. Its childProcess member has a non-empty implementation. Here are the characteristics of the class Child:

- The Child class has two Pipe data members, to handle communications between its own child- and parent processes. As these pipes are used by the Child's child process, their names refer to the child process. The child process reads from d_in, and writes to d_out.
Here is the interface of the class Child:

    class Child: public Fork
    {
        Pipe d_in;
        Pipe d_out;

        int d_parentReadFd;
        int d_parentWriteFd;
        int d_nr;

        public:
            Child(int nr);
            ~Child() override;

            int readFd() const;
            int writeFd() const;
            int pid() const;
            int nr() const;

        private:
            void childRedirections() override;
            void parentRedirections() override;
            void childProcess() override;
            void parentProcess() override;
    };

Child's constructor simply stores its argument, a child-process order number, in its own d_nr data member:

    inline Child::Child(int nr)
    :
        d_nr(nr)
    {}

Child's child process obtains commands from its standard input stream and writes its output to its standard output stream. Since the actual communication channels are pipes, redirections must be used. The childRedirections member looks like this:

    void Child::childRedirections()
    {
        d_in.readFrom(STDIN_FILENO);
        d_out.writtenBy(STDOUT_FILENO);
    }

The parent process writes to d_in and reads from d_out. Here is parentRedirections:

    void Child::parentRedirections()
    {
        d_parentReadFd = d_out.readOnly();
        d_parentWriteFd = d_in.writeOnly();
    }

The Child object exists until it is destroyed by the Monitor's stopChild member. By allowing its creator, the Monitor object, to access the parent-side ends of the pipes, the Monitor object can communicate with the Child's child process via those pipe-ends. The members readFd and writeFd allow the Monitor object to access these pipe-ends:

    inline int Child::readFd() const
    {
        return d_parentReadFd;
    }

    inline int Child::writeFd() const
    {
        return d_parentWriteFd;
    }

The Child object's child process performs two tasks:

childProcess defines a local Selector object, adding STDIN_FILENO to its set of monitored input file descriptors.

Then, in an endless loop, childProcess waits for selector.wait() to return. When the alarm goes off it sends a message to its standard output (hence, into the writing pipe). Otherwise, it echoes the messages appearing at its standard input to its standard output. Here is the childProcess member:
    void Child::childProcess()
    {
        Selector selector;
        size_t message = 0;

        selector.addReadFd(STDIN_FILENO);
        selector.setAlarm(5);

        while (true)
        {
            try
            {
                if (!selector.wait())           // timeout
                    cout << "Child " << d_nr << ": standing by\n";
                else
                {
                    string line;
                    getline(cin, line);
                    cout << "Child " << d_nr << ":" << ++message << ": " <<
                            line << '\n';
                }
            }
            catch (...)
            {
                cout << "Child " << d_nr << ":" << ++message << ": " <<
                        "select() failed" << '\n';
            }
        }
        exit(0);
    }

The following members allow the Monitor object to obtain the Child's process ID and its order number:

    inline int Child::pid() const
    {
        return Fork::pid();
    }

    inline int Child::nr() const
    {
        return d_nr;
    }

A Child process terminates when the user enters a stop command. When an existing child process number was entered, the corresponding Child object is removed from Monitor's d_child map. As a result, its destructor is called. Child's destructor calls kill to terminate its child, and then waits for the child to terminate. Once its child has terminated, the destructor has completed its work and returns, thus completing the erasure from d_child. The current implementation fails if the child process doesn't react to the SIGTERM signal. In this demonstration program this does not happen. In `real life' more elaborate killing-procedures may be required (e.g., using SIGKILL in addition to SIGTERM). As discussed in section 10.12 it is important to ensure the proper destruction. Here is the Child's destructor:

    Child::~Child()
    {
        if (pid())
        {
            cout << "Killing process " << pid() << "\n";
            kill(pid(), SIGTERM);
            int status;
            wait(&status);
        }
    }

const & arguments can be implemented using a member implementing the operation, only offering the basic exception guarantee. This latter function can in turn be implemented using the binary assignment member. The following example illustrates this approach for a fictitious class Binary:

    class Binary
    {
        public:
            Binary();
            Binary(int value);

                // copy and move constructors are available by default, or
                // they can be explicitly declared and implemented.
            Binary &operator+=(Binary const &other) &;    // see the text
            Binary &&operator+=(Binary const &other) &&;

        private:
            void add(Binary const &rhs);

            friend Binary operator+(Binary const &lhs, Binary const &rhs);
            friend Binary operator+(Binary &&lhs, Binary const &rhs);
    };

Eventually, the implementation of binary operators depends on the availability of the member implementing the basic binary operation, modifying the object calling that member (i.e., void Binary::add(Binary const &) in the example).
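As a minimal sketch of how these pieces may fit together, the following fills in the fictitious class Binary using a single int data member. The d_value member, the value accessor and all function bodies are assumptions made for this illustration; only the declarations are taken from the interface shown above.

```cpp
#include <utility>

class Binary
{
    int d_value = 0;                // assumed payload, for illustration only

    public:
        Binary() = default;
        Binary(int value)
        :
            d_value(value)
        {}

        int value() const           // hypothetical accessor
        {
            return d_value;
        }

            // lvalue: work on a copy, commit only once add has succeeded
        Binary &operator+=(Binary const &other) &
        {
            Binary tmp{ *this };
            tmp.add(other);
            std::swap(d_value, tmp.d_value);
            return *this;
        }

            // rvalue: the temporary itself may be modified
        Binary &&operator+=(Binary const &other) &&
        {
            add(other);
            return std::move(*this);
        }

    private:
        void add(Binary const &rhs) // the basic binary operation
        {
            d_value += rhs.d_value;
        }

        friend Binary operator+(Binary const &lhs, Binary const &rhs);
        friend Binary operator+(Binary &&lhs, Binary const &rhs);
};

Binary operator+(Binary const &lhs, Binary const &rhs)
{
    Binary tmp{ lhs };              // lhs must not be modified: use a copy
    tmp.add(rhs);
    return tmp;
}

Binary operator+(Binary &&lhs, Binary const &rhs)
{
    lhs.add(rhs);                   // lhs is a temporary: modify it directly
    return std::move(lhs);
}
```

With these definitions an expression like a + b (both lvalues) selects the const & overload, whereas Binary{ 1 } + b selects the overload expecting an rvalue reference, so no copy of the left-hand side operand is made in the latter case.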
Since template functions are not instantiated before they are actually used we can call non-existing functions from template functions that are never instantiated. If such a template function is never instantiated, nothing happens; if it is (accidentally) instantiated, then the compiler generates an error message, complaining about the missing function.
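The following sketch illustrates this characteristic for members of a class template. The class Holder and the two demo functions are hypothetical names, introduced only for this illustration. Holder<int> can be used although int has no length member, as long as Holder<int>::length itself is never called:

```cpp
#include <cstddef>
#include <string>

template <typename Type>
class Holder
{
    Type d_value;

    public:
        explicit Holder(Type const &value)
        :
            d_value(value)
        {}

        Type const &get() const
        {
            return d_value;
        }

            // only compiled for a specific Type once it is actually
            // called: int has no length member, but Holder<int> is fine
            // as long as Holder<int>::length isn't used
        std::size_t length() const
        {
            return d_value.length();
        }
};

inline int intDemo()
{
    Holder<int> holder{ 12 };       // OK: length is not instantiated
    return holder.get();
}

inline std::size_t stringDemo()
{
    Holder<std::string> holder{ "hello" };
    return holder.length();         // OK: std::string offers length
}
```

Adding a call of holder.length() to intDemo results in a compilation error, as the compiler then has to instantiate length for int.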
This allows us to implement all binary operators, movable and non-movable, as templates. In the following subsections we develop the class template Binops, providing binary operators. A complete implementation of a class Derived illustrating how addition and insertion operators can be added to a class is provided in the file annotations/yo/concrete/examples/binopclasses.cc in the C++ Annotations' source archive.
add. This is less attractive when developing function templates, as add is a private member, requiring us to provide friend declarations for all function templates so they may access the private add member.

At the end of section 11.7 we saw that add's implementation can be provided by operator+=(Class const &rhs) &&. This operator may thereupon be used when implementing the remaining addition operators:
    inline Binary &Binary::operator+=(Binary const &rhs) &
    {
        return *this = Binary{*this} += rhs;
    }

    Binary operator+(Binary &&lhs, Binary const &rhs)
    {
        return std::move(lhs) += rhs;
    }

    Binary operator+(Binary const &lhs, Binary const &rhs)
    {
        return Binary{lhs} += rhs;
    }

In this implementation add is no longer required. The plain binary operators are free functions, which supposedly can easily be converted to function templates. E.g.,
    template <typename Binary>
    Binary operator+(Binary const &lhs, Binary const &rhs)
    {
        return Binary{lhs} += rhs;
    }

With a function template like Binary operator+(Binary const &lhs, Binary const &rhs), however, we may encounter a subtle and unexpected complication. Consider the following program. When run, it displays the value 12, rather than 1:

    enum Values
    {
        ZERO,
        ONE
    };

    template <typename Tp>
    Tp operator+(Tp const &lhs, Tp const &rhs)
    {
        return static_cast<Tp>(12);
    }

    int main()
    {
        cout << (ZERO + ONE);       // shows 12
    }

This complication can be avoided by defining the operators in their own namespace, but then all classes using the binary operator also have to be defined in that namespace, which is not a very attractive restriction. Fortunately, there is a better alternative: using the CRTP (cf. section 22.12).
Binops, using the CRTP the operators are defined for arguments of the class Binops<Derived>: a base class receiving the derived class as its template argument.

Thus the class Binops as well as the additional operators are defined, expecting Binops<Derived> type of arguments:
    template <class Derived>
    struct Binops
    {
        Derived &operator+=(Derived const &rhs) &;
    };

    template <typename Derived>
    Derived operator+(Binops<Derived> const &lhs, Derived const &rhs)
    {
        return Derived{ static_cast<Derived const &>(lhs) } += rhs;
    }
    // analogous implementation for Binops<Derived> &&lhs

This way, a class that derives from Binops, and that provides an operator+= member which is bound to an rvalue reference object, suddenly also provides all other binary addition operators:
    class Derived: public Binops<Derived>
    {
        ...
        public:
            ...
            Derived &&operator+=(Derived const &rhs) &&;
    };

All, but one....
The operator that's not available is the compound addition operator, bound to an lvalue reference. As its function name is identical to the one in the class Derived, it is not automatically visible at the user level.
Although this problem can simply be solved by providing the class Derived with a using Binops<Derived>::operator+= declaration, it is not a very attractive solution, as separate using declarations have to be provided for each binary operator that is implemented in the class Derived.
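The hiding effect and its repair by a using declaration can be demonstrated with a small, non-template sketch. All names used here (Base, Hidden, Exposed, LvalueAddable) are hypothetical; the detection template merely checks at compile time whether lvalue += 1 compiles for a given type, and requires C++17 because of std::void_t:

```cpp
#include <type_traits>
#include <utility>

struct Base
{
    int d_value = 0;

    Base &operator+=(int rhs) &         // bound to lvalues
    {
        d_value += rhs;
        return *this;
    }
};

struct Hidden: Base                     // its operator+= hides Base's
{
    Hidden &&operator+=(int rhs) &&     // bound to rvalues
    {
        d_value += rhs;
        return std::move(*this);
    }
};

struct Exposed: Base
{
    using Base::operator+=;             // Base's operator+= is visible again

    Exposed &&operator+=(int rhs) &&
    {
        d_value += rhs;
        return std::move(*this);
    }
};

    // value is true if `lvalue += 1' compiles for Type
template <typename Type, typename = void>
struct LvalueAddable: std::false_type
{};

template <typename Type>
struct LvalueAddable<Type,
            std::void_t<decltype(std::declval<Type &>() += 1)>>
:
    std::true_type
{};

static_assert(not LvalueAddable<Hidden>::value,
              "Base::operator+= is hidden by Hidden::operator+=");
static_assert(LvalueAddable<Exposed>::value,
              "the using declaration makes Base::operator+= available");
```

As the two operators differ in their reference qualifiers, the using declaration in Exposed adds an overload instead of causing a conflict.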
But a much more attractive solution exists. A beautiful out-of-the-box solution, completely avoiding the hidden base class operator, was proposed by Wiebe-Marten Wijnja. Wiebe-Marten conjectured that operator+=, bound to an lvalue reference could also very well be defined as a free function. In that case no inheritance is used and therefore no function hiding occurs. Consequently, the using declaration can be avoided.
The implementation of this free operator+= function looks like this:
    template <class Derived>
    Derived &operator+=(Binops<Derived> &lhs, Derived const &rhs)
    {
        Derived tmp{ Derived{ static_cast<Derived &>(lhs) } += rhs };
        tmp.swap(static_cast<Derived &>(lhs));
        return static_cast<Derived &>(lhs);
    }

The flexibility of this design can be further augmented once we realize that the right-hand side operand doesn't have to be a Derived class object. Consider operator<<: oftentimes shifts are bit-shifts, using a size_t to specify the number of bits to shift. In fact, the type of the right-hand side operand can completely be generalized by defining a second template type parameter, which is used to specify the right-hand side's operand type. It's up to the Derived class to specify the argument type of its operator+= (or any other binary compound operator), whereafter the compiler will deduce the types of the right-hand side operands for the remaining binary operators. Here is the final implementation of the free operator+= function:
    template <class Derived, typename Rhs>
    Derived &operator+=(Binops<Derived> &lhs, Rhs const &rhs)
    {
        Derived tmp{ Derived{ static_cast<Derived &>(lhs) } += rhs };
        tmp.swap(static_cast<Derived &>(lhs));
        return static_cast<Derived &>(lhs);
    }

void insert(std::ostream &out) const to insert an object into an ostream and void extract(std::istream &in) const to extract an object from an istream. As these functions are only used by, respectively, the insertion and extraction operators, they can be declared in the Derived class's private interface. Instead of declaring the insertion and extraction operators friends of the class Derived a single friend Binops<Derived> is specified. This allows Binops<Derived> to define private, inline iWrap and eWrap members, merely calling, respectively, Derived's insert and extract members:

    template <typename Derived>
    inline void Binops<Derived>::iWrap(std::ostream &out) const
    {
        static_cast<Derived const &>(*this).insert(out);
    }

Binops<Derived> then declares the insertion and extraction operators as its friends, allowing these operators to call, respectively, iWrap and eWrap. Note that the software engineer designing the class Derived only has to provide a friend Binops<Derived> declaration. Here is the implementation of the overloaded insertion operator:
    template <typename Derived>
    std::ostream &operator<<(std::ostream &out, Binops<Derived> const &obj)
    {
        obj.iWrap(out);
        return out;
    }

This completes the coverage of the essentials of a class template Binops potentially offering binary operators and insertion/extraction operators for any class derived from Binops. Finally, as noted at the beginning of this section, a complete implementation of a class offering addition and insertion operators is provided in the file annotations/yo/concrete/examples/binopclasses.cc in the C++ Annotations' source archive.
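The addition operators discussed in this section can be combined into a compact sketch. It is not the implementation found in binopclasses.cc: the client class Number, its d_value member and its value and swap members are assumptions made for this illustration, and Binops is reduced to an empty base class since the insertion and extraction wrappers are left out.

```cpp
#include <utility>

template <class Derived>
struct Binops
{};

    // compound addition for lvalues, defined as a free function: it is
    // not hidden by Derived's own operator+=
template <class Derived, typename Rhs>
Derived &operator+=(Binops<Derived> &lhs, Rhs const &rhs)
{
    Derived &self = static_cast<Derived &>(lhs);

    Derived tmp{ Derived{ self } += rhs };  // uses Derived::operator+= &&
    tmp.swap(self);
    return self;
}

template <class Derived, typename Rhs>
Derived operator+(Binops<Derived> const &lhs, Rhs const &rhs)
{
    return Derived{ static_cast<Derived const &>(lhs) } += rhs;
}

template <class Derived, typename Rhs>
Derived operator+(Binops<Derived> &&lhs, Rhs const &rhs)
{
    return std::move(static_cast<Derived &>(lhs)) += rhs;
}

class Number: public Binops<Number>         // hypothetical client class
{
    int d_value;

    public:
        explicit Number(int value = 0)
        :
            d_value(value)
        {}

        int value() const
        {
            return d_value;
        }

        void swap(Number &other)
        {
            std::swap(d_value, other.d_value);
        }

            // the only addition operator the client class has to provide
        Number &&operator+=(Number const &rhs) &&
        {
            d_value += rhs.d_value;
            return std::move(*this);
        }
};

enum Values                                 // not derived from Binops
{
    ZERO,
    ONE
};
```

Given Number a{ 3 } and Number b{ 4 }, the expressions a + b, Number{ 1 } + b and a += b all compile although Number itself only defines the rvalue-bound operator+=. Since the operator templates only accept arguments derived from Binops, ZERO + ONE still uses the built-in addition and evaluates to 1.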
operator[] is that it can't distinguish between its use as an lvalue and as an rvalue. It is a familiar misconception to think that

    Type const &operator[](size_t index) const

is used as rvalue (as the object isn't modified), and that

    Type &operator[](size_t index)

is used as lvalue (as the returned value can be modified).
The compiler, however, distinguishes between the two operators only by the const-status of the object for which operator[] is called. With const objects the former operator is called, with non-const objects the latter is always used. It is always used, irrespective of it being used as lvalue or rvalue.
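This selection rule can be made visible with a small sketch. The class Store, its d_last member and the three helper functions are hypothetical; each operator[] merely records that it was called:

```cpp
#include <cstddef>
#include <string>
#include <vector>

class Store                         // hypothetical class, for illustration
{
    std::vector<int> d_data{ 10, 20, 30 };
    mutable std::string d_last;     // the most recently called operator[]

    public:
        int const &operator[](std::size_t idx) const
        {
            d_last = "const";
            return d_data[idx];
        }

        int &operator[](std::size_t idx)
        {
            d_last = "non-const";
            return d_data[idx];
        }

        std::string const &last() const
        {
            return d_last;
        }
};

inline std::string rvalueUse(Store &store)
{
    int value = store[0];           // rvalue use, non-const object
    static_cast<void>(value);
    return store.last();
}

inline std::string lvalueUse(Store &store)
{
    store[0] = 5;                   // lvalue use, non-const object
    return store.last();
}

inline std::string constUse(Store const &store)
{
    int value = store[0];           // rvalue use, const object
    static_cast<void>(value);
    return store.last();
}
```

Both rvalueUse and lvalueUse report that the non-const operator was called; only constUse, receiving a Store const &, reaches the const operator.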
Being able to distinguish between lvalues and rvalues can be very useful. Consider the situation where a class supporting operator[] stores data of a type that is very hard to copy. With data like that reference counting (e.g., using shared_ptrs) is probably used to prevent needless copying.
As long as operator[] is used as rvalue there's no need to copy the data, but the information must be copied if it is used as lvalue.
The Proxy Design Pattern (cf. Gamma et al. (1995)) can be used to distinguish between lvalues and rvalues. With the Proxy Design Pattern an object of another class (the Proxy class) is used to act as a stand in for the `real thing'. The proxy class offers functionality that cannot be offered by the data themselves, like distinguishing between its use as lvalue or rvalue. A proxy class can be used in many situations where access to the real data cannot or should not be directly provided. In this regard iterator types are examples of proxy classes as they create a layer between the real data and the software using the data. Proxy classes could also dereference pointers in a class storing its data by pointers.
In this section we concentrate on the distinction between using operator[] as lvalue and rvalue. Let's assume we have a class Lines storing lines from a file. Its constructor expects the name of a stream from which the lines are read and it offers a non-const operator[] that can be used as lvalue or rvalue (the const version of operator[] is omitted as it causes no confusion because it is always used as rvalue):
    class Lines
    {
        std::vector<std::string> d_line;

        public:
            Lines(std::istream &in);
            std::string &operator[](size_t idx);
    };

To distinguish between lvalues and rvalues we must find distinguishing characteristics of lvalues and rvalues that we can exploit. Such distinguishing characteristics are operator= (which is always used as lvalue) and the conversion operator (which is always used as rvalue). Rather than having operator[] return a string & we can let it return a Proxy object that is able to distinguish between its use as lvalue and rvalue.
The class Proxy thus needs operator=(string const &other) (acting as lvalue) and operator std::string const &() const (acting as rvalue). Do we need more operators? The std::string class also offers operator+=, so we should probably implement that operator as well. Plain characters can also be assigned to string objects (even using their numeric values). As string objects cannot be constructed from plain characters promotion cannot be used with operator=(string const &other) if the right-hand side argument is a character. Implementing operator=(char value) could therefore also be considered. These additional operators are left out of the current implementation but `real life' proxy classes should consider implementing these additional operators as well. Another subtlety is that Proxy's operator std::string const &() const is not used when using ostream's insertion operator or istream's extraction operator as these operators are implemented as templates not recognizing our Proxy class type. So when stream insertion and extraction is required (it probably is) then Proxy must be given its own overloaded insertion and extraction operator. Here is an implementation of the overloaded insertion operator inserting the object for which Proxy is a stand-in:
    inline std::ostream &operator<<(std::ostream &out, Lines::Proxy const &proxy)
    {
        return out << static_cast<std::string const &>(proxy);
    }

There's no need for any code (except Lines) to create or copy Proxy objects. Proxy's constructor should therefore be made private, and Proxy can declare Lines to be its friend. In fact, Proxy is intimately related to Lines and can be defined as a nested class. In the revised Lines class operator[] no longer returns a string but instead a Proxy is returned. Here is the revised Lines class, including its nested Proxy class:
    class Lines
    {
        std::vector<std::string> d_line;

        public:
            class Proxy;
            Proxy operator[](size_t idx);

            class Proxy
            {
                friend Proxy Lines::operator[](size_t idx);
                std::string &d_str;
                Proxy(std::string &str);

                public:
                    std::string &operator=(std::string const &rhs);
                    operator std::string const &() const;
            };

            Lines(std::istream &in);
    };

Proxy's members are very lightweight and can usually be implemented inline:
    inline Lines::Proxy::Proxy(std::string &str)
    :
        d_str(str)
    {}

    inline std::string &Lines::Proxy::operator=(std::string const &rhs)
    {
        return d_str = rhs;
    }

    inline Lines::Proxy::operator std::string const &() const
    {
        return d_str;
    }

The member Lines::operator[] can also be implemented inline: it merely returns a Proxy object initialized with the string associated with index idx.
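Putting these pieces together, a self-contained variant might look as follows. It deviates from the class shown above in several ways that are assumptions of this sketch: Lines is constructed from a vector of strings rather than from a stream, Proxy declares the full class Lines its friend, and the hypothetical d_reads and d_writes counters (with their accessors) replace the identifying output statements, so the distinction between lvalue and rvalue use can be inspected.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class Lines
{
    std::vector<std::string> d_line;
    std::size_t d_reads = 0;        // hypothetical: counts rvalue uses
    std::size_t d_writes = 0;       // hypothetical: counts lvalue uses

    public:
        class Proxy
        {
            friend class Lines;

            Lines &d_lines;
            std::string &d_str;

            Proxy(Lines &lines, std::string &str)
            :
                d_lines(lines),
                d_str(str)
            {}

            public:
                    // lvalue use: the string is modified
                std::string &operator=(std::string const &rhs)
                {
                    ++d_lines.d_writes;
                    return d_str = rhs;
                }

                    // rvalue use: the string is merely inspected
                operator std::string const &() const
                {
                    ++d_lines.d_reads;
                    return d_str;
                }
        };

        explicit Lines(std::vector<std::string> lines)
        :
            d_line(std::move(lines))
        {}

        Proxy operator[](std::size_t idx)
        {
            return Proxy(*this, d_line[idx]);
        }

        std::size_t reads() const
        {
            return d_reads;
        }

        std::size_t writes() const
        {
            return d_writes;
        }
};
```

After std::string s = lines[0] the reads counter has been incremented, after lines[1] = s the writes counter: the same operator[] call ends up in different Proxy members, depending on how its return value is used.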
Now that the class Proxy has been developed it can be used in a program. Here is an example using the Proxy object as lvalue or rvalue. On the surface Lines objects won't behave differently from Lines objects using the original implementation, but by adding an identifying cout statement to Proxy's members it can be shown that operator[] behaves differently when used as lvalue or as rvalue:
    int main()
    {
        ifstream in("lines.cc");
        Lines lines(in);

        string s = lines[0];            // rvalue use
        lines[0] = s;                   // lvalue use
        cout << lines[0] << '\n';       // rvalue use
        lines[0] = "hello world";       // lvalue use
        cout << lines[0] << '\n';       // rvalue use
    }

An object of this nested iterator class handles the dereferencing of the pointers stored in the vector. This allowed us to sort the strings pointed to by the vector's elements rather than the pointers.
A drawback of this is that the class implementing the iterator is closely tied to the derived class as the iterator class was implemented as a nested class. What if we would like to provide any class derived from a container class storing pointers with an iterator handling pointer-dereferencing?
In this section a variant of the earlier (nested class) approach is discussed. Here the iterator class is defined as a class template, not only parameterizing the data type to which the container's elements point but also the container's iterator type itself. Once again, we concentrate on developing a RandomIterator as it is the most complex iterator type.
Our class is named RandomPtrIterator, indicating that it is a random iterator operating on pointer values. The class template defines three template type parameters:
The first parameter specifies the client class (Class). Like before, RandomPtrIterator's constructor is private. Therefore friend declarations are needed to allow client classes to construct RandomPtrIterators. However, a friend class Class cannot be used as template parameter types cannot be used in friend class ... declarations. But this is a minor problem as not every member of the client class needs to construct iterators. In fact, only Class's begin and end members must construct iterators. Using the template's first parameter, friend declarations can be specified for the client's begin and end members.

The second parameter specifies the container's iterator type (BaseIterator);

The third parameter specifies the data type to which the container's elements point (Type).

RandomPtrIterator has one private data member, a BaseIterator. Here is the class interface and the constructor's implementation:
    #include <iterator>
    #include <compare>

    template <typename Class, typename BaseIterator, typename Type>
    struct RandomPtrIterator;

    #define PtrIterator RandomPtrIterator<Class, BaseIterator, Type>
    #define PtrIteratorValue RandomPtrIterator<Class, BaseIterator, value_type>

    template <typename Class, typename BaseIterator, typename Type>
    bool operator==(PtrIterator const &lhs, PtrIterator const &rhs);

    template <typename Class, typename BaseIterator, typename Type>
    auto operator<=>(PtrIterator const &lhs, PtrIterator const &rhs);

    template <typename Class, typename BaseIterator, typename Type>
    int operator-(PtrIterator const &lhs, PtrIterator const &rhs);

    template <typename Class, typename BaseIterator, typename Type>
    struct RandomPtrIterator
    {
        using iterator_category = std::random_access_iterator_tag;
        using difference_type   = std::ptrdiff_t;
        using value_type        = Type;
        using pointer           = value_type *;
        using reference         = value_type &;

        friend PtrIterator Class::begin();
        friend PtrIterator Class::end();

        friend bool operator==<>(RandomPtrIterator const &lhs,
                                 RandomPtrIterator const &rhs);
        friend auto operator<=><>(RandomPtrIterator const &lhs,
                                  RandomPtrIterator const &rhs);
        friend int operator-<>(RandomPtrIterator const &lhs,
                               RandomPtrIterator const &rhs);

        private:
            BaseIterator d_current;

        public:
            int operator-(RandomPtrIterator const &rhs) const;
            RandomPtrIterator operator+(int step) const;
            value_type &operator*() const;
            RandomPtrIterator &operator--();
            RandomPtrIterator operator--(int);
            RandomPtrIterator &operator++();
            RandomPtrIterator operator++(int);
            RandomPtrIterator operator-(int step) const;
            RandomPtrIterator &operator-=(int step);
            RandomPtrIterator &operator+=(int step);
            value_type *operator->() const;

        private:
            RandomPtrIterator(BaseIterator const &current);
    };

    template <typename Class, typename BaseIterator, typename value_type>
    PtrIteratorValue::RandomPtrIterator(BaseIterator const &current)
    :
        d_current(current)
    {}

Looking at its friend declarations, we see that the members begin and end
of a class Class, returning a RandomPtrIterator object for the types Class, BaseIterator and Type are granted access to RandomPtrIterator's private constructor. That is exactly what we want. The Class's begin and end members are declared as bound friends.

All RandomPtrIterator's remaining members are public. Since RandomPtrIterator is just a generalization of the nested class iterator developed in section 22.14.1, re-implementing the required member functions is easy and only requires us to change iterator into RandomPtrIterator and to change std::string into Type. For example, operator<, defined in the class iterator, is now implemented (through operator<=>) as:
    template <typename Class, typename BaseIterator, typename Type>
    inline auto operator<=>(PtrIterator const &lhs, PtrIterator const &rhs)
    {
        return **lhs.d_current <=> **rhs.d_current;
    }

Some additional examples: operator*, defined in the class iterator as

    inline std::string &StringPtr::iterator::operator*() const
    {
        return **d_current;
    }

is now implemented as:

    template <typename Class, typename BaseIterator, typename value_type>
    value_type &PtrIteratorValue::operator*() const
    {
        return **d_current;
    }

The pre- and postfix increment operators are now implemented as:

    template <typename Class, typename BaseIterator, typename value_type>
    PtrIteratorValue &PtrIteratorValue::operator++()
    {
        ++d_current;
        return *this;
    }

    template <typename Class, typename BaseIterator, typename value_type>
    PtrIteratorValue PtrIteratorValue::operator++(int)
    {
        return RandomPtrIterator(d_current++);
    }

Remaining members can be implemented accordingly, their actual implementations are left as exercises to the reader (or can be obtained from the cplusplus.yo.zip archive, of course).

Re-implementing the class StringPtr developed in section 22.14.1 is not difficult either. Apart from including the header file defining the class template RandomPtrIterator, it only requires a single modification. Its iterator using-declaration must now be associated with a RandomPtrIterator. Here is the full class interface and the class's inline member definitions:
    #ifndef INCLUDED_STRINGPTR_H_
    #define INCLUDED_STRINGPTR_H_

    #include <vector>
    #include <string>
    #include "iterator.h"

    class StringPtr: public std::vector<std::string *>
    {
        public:
            using iterator =
                    RandomPtrIterator
                    <
                        StringPtr,
                        std::vector<std::string *>::iterator,
                        std::string
                    >;

            using reverse_iterator = std::reverse_iterator<iterator>;

            iterator begin();
            iterator end();
            reverse_iterator rbegin();
            reverse_iterator rend();
    };

    inline StringPtr::iterator StringPtr::begin()
    {
        return iterator(this->std::vector<std::string *>::begin());
    }

    inline StringPtr::iterator StringPtr::end()
    {
        return iterator(this->std::vector<std::string *>::end());
    }

    inline StringPtr::reverse_iterator StringPtr::rbegin()
    {
        return reverse_iterator(end());
    }

    inline StringPtr::reverse_iterator StringPtr::rend()
    {
        return reverse_iterator(begin());
    }

    #endif

Including StringPtr's modified header file into the program given in section 22.14.2 results in a program behaving identically to its earlier version. In this case StringPtr::begin and StringPtr::end return iterator objects constructed from a template definition.
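To see a pointer-dereferencing iterator template at work without the surrounding StringPtr class, consider the following reduced sketch. It is not RandomPtrIterator itself: the name DerefIterator, the omission of the Class parameter and of the friend declarations, the public constructor and the helper function sortPointedTo are all simplifications assumed for this illustration. Its comparison operators compare iterator positions, which is what std::sort expects from the iterators it receives.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

template <typename BaseIterator, typename Type>
class DerefIterator
{
    BaseIterator d_current;

    public:
        using iterator_category = std::random_access_iterator_tag;
        using difference_type   = std::ptrdiff_t;
        using value_type        = Type;
        using pointer           = value_type *;
        using reference         = value_type &;
        using Self              = DerefIterator;

        explicit DerefIterator(BaseIterator const &current)
        :
            d_current(current)
        {}

            // dereferencing reaches the object the stored pointer points to
        reference operator*() const     { return **d_current; }
        pointer operator->() const      { return *d_current; }
        reference operator[](difference_type idx) const
        {
            return *d_current[idx];
        }

        Self &operator++()              { ++d_current; return *this; }
        Self operator++(int)            { return Self(d_current++); }
        Self &operator--()              { --d_current; return *this; }
        Self operator--(int)            { return Self(d_current--); }

        Self &operator+=(difference_type step)
        {
            d_current += step;
            return *this;
        }
        Self &operator-=(difference_type step)
        {
            d_current -= step;
            return *this;
        }

        friend Self operator+(Self it, difference_type step)
        {
            return it += step;
        }
        friend Self operator+(difference_type step, Self it)
        {
            return it += step;
        }
        friend Self operator-(Self it, difference_type step)
        {
            return it -= step;
        }
        friend difference_type operator-(Self const &lhs, Self const &rhs)
        {
            return lhs.d_current - rhs.d_current;
        }

            // iterators are compared by position
        friend bool operator==(Self const &lhs, Self const &rhs)
        {
            return lhs.d_current == rhs.d_current;
        }
        friend bool operator!=(Self const &lhs, Self const &rhs)
        {
            return !(lhs == rhs);
        }
        friend bool operator<(Self const &lhs, Self const &rhs)
        {
            return lhs.d_current < rhs.d_current;
        }
        friend bool operator>(Self const &lhs, Self const &rhs)
        {
            return rhs < lhs;
        }
        friend bool operator<=(Self const &lhs, Self const &rhs)
        {
            return !(rhs < lhs);
        }
        friend bool operator>=(Self const &lhs, Self const &rhs)
        {
            return !(lhs < rhs);
        }
};

    // sorts the strings reached via the pointers; the pointers stay put
inline void sortPointedTo(std::vector<std::string *> &ptrs)
{
    using Iter = DerefIterator<std::vector<std::string *>::iterator,
                               std::string>;
    std::sort(Iter(ptrs.begin()), Iter(ptrs.end()));
}
```

After calling sortPointedTo the strings themselves have been rearranged: visiting the unchanged pointers in order now yields the strings in sorted order.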
The examples in this and subsequent sections assume that the reader knows how to use the scanner generator flex and the parser generator bison. Both bison and flex are well documented elsewhere. The original predecessors of bison and flex, called yacc and lex, are described in several books, e.g. in O'Reilly's book `lex & yacc'.
Scanner- and parser generators are also available as free software. Both bison and flex are usually part of software distributions or they can be obtained from ftp://prep.ai.mit.edu/pub/non-gnu. Flex creates a C++ class when %option c++ is specified.
For parser generators the program bison is available. In the early 90's Alain Coetmeur (coetmeur@icdc.fr) created a C++ variant (bison++) creating a parser class. Although the bison++ program produces code that can be used in C++ programs it also shows many characteristics that are more suggestive of a C context than a C++ context. In January 2005 I rewrote parts of Alain's bison++ program, resulting in the original version of the program bisonc++. Then, in May 2005 a complete rewrite of the bisonc++ parser generator was completed (version number 0.98). Current versions of bisonc++ can be downloaded from https://fbb-git.gitlab.io/bisoncpp/. Binary versions for various architectures are available as, e.g., a Debian package (including bisonc++'s documentation).
Bisonc++ creates a cleaner parser class than bison++. In particular, it derives the parser class from a base-class, containing the parser's token- and type-definitions as well as all member functions which should not be (re)defined by the programmer. As a result of this approach, the generated parser class is very small, declaring only members that are actually defined by the programmer (as well as some other members, generated by bisonc++ itself, implementing the parser's parse() member). One member that is not implemented by default is lex, producing the next lexical token. When the directive %scanner (see section 26.6.2.1) is used, bisonc++ produces a standard implementation for this member; otherwise it must be implemented by the programmer.
In early 2012 the program flexc++ (http://flexcpp.org/) reached its initial release. Like bisonc++ it is part of the Debian Linux distribution.
Jean-Paul van Oosten (jp@jpvanoosten.nl) and Richard Berendsen (richardberendsen@xs4all.nl) started the flexc++ project in 2008 and the final program was completed by Jean-Paul and me between 2010 and 2012.
These sections of the C++ Annotations focus on bisonc++ as our parser generator and flexc++ as our lexical scanner generator. Previous releases of the C++ Annotations were using flex as the scanner generator.
Using flexc++ and bisonc++ class-based scanners and parsers are generated. The advantage of this approach is that the interface to the scanner and the parser tends to become cleaner than without using class interfaces. Furthermore, classes allow us to get rid of most if not all global variables, making it easy to use multiple parsers in one program.
Below two example programs are developed. The first example only uses flexc++. The generated scanner monitors the production of a file from several parts. That example focuses on the lexical scanner and on switching files while churning through the information. The second example uses both flexc++ and bisonc++ to generate a scanner and a parser transforming standard arithmetic expressions to their postfix notations, commonly used in code generated by compilers and in HP-calculators. In the second example the emphasis is mainly on bisonc++ and on composing a scanner object inside a generated parser.
#include directives, followed by a text string specifying the file (path) which should be included at the location of the #include.

In order to avoid complexities irrelevant to the current example, the format of the #include statement is restricted to the form #include <filepath>. The file specified between the angle brackets should be available at the location indicated by filepath. If the file is not available, the program terminates after issuing an error message.
The program is started with one or two filename arguments. If the program is started with just one filename argument, the output is written to the standard output stream cout. Otherwise, the output is written to the stream whose name is given as the program's second argument.
The program defines a maximum nesting depth. Once this maximum is exceeded, the program terminates after issuing an error message. In that case, the filename stack indicating where which file was included is printed.
An additional feature of the program is that (standard C++) comment-lines are ignored. Include-directives in comment-lines are also ignored.
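To make the intended behavior concrete before turning to the generated scanner, here is a hand-written sketch of the same idea. It does not use flexc++ and is not the program developed in this section. It rests on several assumptions made for the illustration: files are simulated by a map from names to contents (the hypothetical Files type), only comment lines starting with // are recognized, an include directive must be written as #include <filepath> using exactly one blank, and the file that is processed first counts as nesting depth 1.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

    // hypothetical stand-in for the file system: name -> contents
using Files = std::map<std::string, std::string>;

void expandInto(std::string &out, Files const &files,
                std::string const &name,
                std::size_t depth, std::size_t maxDepth)
{
    if (depth > maxDepth)
        throw std::runtime_error("max. nesting depth exceeded at " + name);

    auto it = files.find(name);
    if (it == files.end())
        throw std::runtime_error("cannot open " + name);

    std::string const directive = "#include <";

    std::istringstream in(it->second);
    std::string line;
    while (std::getline(in, line))
    {
        if (line.compare(0, 2, "//") == 0)      // comment line: ignored,
            continue;                           // including its directives

        if (line.compare(0, directive.size(), directive) == 0)
        {
            std::size_t close = line.find('>', directive.size());
            if (close == std::string::npos)
                throw std::runtime_error("Invalid include statement");

                                                // switch to the next file
            expandInto(out, files,
                       line.substr(directive.size(),
                                   close - directive.size()),
                       depth + 1, maxDepth);
            continue;
        }

        out += line;                            // default: echo the line
        out += '\n';
    }
}

std::string expand(Files const &files, std::string const &name,
                   std::size_t maxDepth = 3)
{
    std::string out;
    expandInto(out, files, name, 1, maxDepth);
    return out;
}
```

Here the recursion takes over the role of the stack of input streams: returning from expandInto corresponds to popping a stream and continuing with the file containing the directive. A file including itself eventually exceeds maxDepth, ending the expansion with an exception.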
The program is created in five major steps:
The file lexer is constructed, containing the input-language specifications.

From the specifications in lexer the requirements for the class Scanner evolve. The Scanner class derives from the base class ScannerBase generated by flexc++.

Next, main is constructed. A Scanner object is created inspecting the command-line arguments. If successful, the scanner's member lex is called to produce the program's output.

lex) is a member of the class Scanner. Since Scanner is derived from ScannerBase, it has access to all of ScannerBase's protected members that execute the lexical scanner's regular expression matching algorithm.

Looking at the regular expressions themselves, notice that we need rules to recognize comment, #include directives, and all remaining characters. This all is fairly standard practice. When an #include directive is sensed, the directive is parsed by the scanner. This too is common practice. Our lexical scanner performs the following tasks:
flex in C contexts. However, in C++ contexts, flexc++ creates a class Scanner, rather than just a scanner function.

Flexc++'s specification file consists of two sections:
flexc++'s symbol area, used to define symbols, like a mini scanner, or options. The following options are suggested:

%debug: includes debugging code into the code generated by flexc++. Calling the member function setDebug(true) activates this debugging code at run-time. When activated, information about the matching process is written to the standard output stream. The execution of debug code is suppressed after calling the member function setDebug(false).

%filenames: defines the base-name of the class header files generated by flexc++. By default the class name (itself using the default Scanner) is used.

    %filenames scanner
    %debug
    %max-depth 3

    %x comment
    %x include
std::cin) to the standard output stream (std::cout). For this the predefined macro ECHO can be used. Here are the rules:

    %%
            // The comment-rules: comment is ignored.
    "//".*                  // ignore eoln comment
    "/*"                    begin(StartCondition__::comment);
    <comment>{
        .|\n                // ignore all characters in std C comment
        "*/"                begin(StartCondition__::INITIAL);
    }

            // File switching: #include <filepath>
    #include[ \t]+"<"       begin(StartCondition__::include);
    <include>{
        [^ \t>]+            d_nextSource = matched();
        ">"[ \t]*\n         switchSource();
        .|\n                throw runtime_error("Invalid include statement");
    }

            // The default rule: echo anything else to std::cout
    .|\n                    echo();

The class Scanner is generated once by flexc++. This class has access to several members defined by its base class ScannerBase. Some of these members have public access rights and can be used by code external to the class Scanner. These members are extensively documented in the flexc++(1) man-page, and the reader is referred to this man-page for further information.

Our scanner performs the following tasks:
The #include statements in the input allow the scanner to distill the name of the file where the scanning process must continue. This file name is stored in a local variable d_nextSource and a member switchSource handles the switch to the next source. Nothing else is required. Pushing and popping input files is handled by the scanner's members pushStream and popStream, provided by flexc++. Scanner's interface, therefore, only needs one additional function declaration: switchSource.
Switching streams is handled as follows: once the scanner has extracted a filename from an #include directive, a switch to another file is realized by switchSource. This member calls pushStream, defined by flexc++, to stack the current input stream and to switch to the stream whose name is stored in d_nextSource. This also ends the include mini-scanner, so to return the scanner to its default scanning mode begin(StartCondition__::INITIAL) is called. Here is its source:
    #include "scanner.ih"

    void Scanner::switchSource()
    {
        pushStream(d_nextSource);
        begin(StartCondition__::INITIAL);
    }

The member pushStream, defined by flexc++, handles all necessary checks, throwing an exception if the file could not be opened or if too many files are stacked.

The member performing the lexical scan is defined by flexc++ in Scanner::lex, and this member can be called by code to process the tokens returned by the scanner.
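The checks attributed to pushStream can be illustrated by a small stand-alone sketch. It is not flexc++'s implementation: the class SourceStack, its members, and the in-memory map standing in for the file system are all hypothetical, and the maxDepth argument merely plays the role of a limit like the one set by %max-depth.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a scanner's stream stack; none of these names
// are part of flexc++.
class SourceStack
{
    std::map<std::string, std::string> d_files;     // in-memory "files"
    std::vector<std::istringstream> d_stack;        // currently stacked sources
    std::size_t d_maxDepth;

    public:
        SourceStack(std::map<std::string, std::string> const &files,
                    std::size_t maxDepth)
        :
            d_files(files),
            d_maxDepth(maxDepth)
        {}

        // Stack a new source after checking the depth and the name.
        void push(std::string const &name)
        {
            if (d_stack.size() == d_maxDepth)
                throw std::runtime_error("too many nested sources");

            auto iter = d_files.find(name);
            if (iter == d_files.end())
                throw std::runtime_error("cannot open " + name);

            d_stack.emplace_back(iter->second);
        }

        // Drop the current source; returns false once no source is left.
        bool pop()
        {
            if (d_stack.empty())
                return false;

            d_stack.pop_back();
            return !d_stack.empty();
        }

        std::size_t depth() const
        {
            return d_stack.size();
        }
};
```

A caller would push the name found in an include directive and pop once the end of the current source is reached, continuing with the previous source as long as pop returns true.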
The program using the Scanner is very simple. It expects a filename indicating where to start the scanning process.

The program first checks the number of arguments. If at least one argument was given, then that argument is passed to Scanner's constructor, together with a second argument "-", indicating that the output should go to the standard output stream.
If the program receives more than one argument, debug output, extensively documenting the lexical scanner's actions, is written to the standard output stream as well.
Next the Scanner's lex member is called. If anything fails, a std::exception is thrown, which is caught by the catch clause of main's try-block. Here is the program's source:
    #include "lexer.ih"

    int main(int argc, char **argv)
    try
    {
        if (argc == 1)
        {
            cerr << "Filename argument required\n";
            return 1;
        }

        Scanner scanner(argv[1], "-");
        scanner.setDebug(argc > 2);

        return scanner.lex();
    }
    catch (exception const &exc)
    {
        cerr << exc.what() << '\n';
        return 1;
    }

Assuming flexc++ and the GNU C++ compiler g++ have been installed, the scanner's source is first generated by flexc++. For this the following command can be given:

    flexc++ lexer

Next the sources are compiled and linked:

    g++ -Wall *.cc
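To see what the scanner's comment rules amount to, their effect can be mimicked by a hand-written state machine. The following is only a sketch: the function stripComments is a hypothetical helper, not something flexc++ generates, and its two states merely mirror the INITIAL and comment start conditions. Like the rules themselves it ignores the possibility of comment markers inside string literals, and it does not handle #include directives.

```cpp
#include <cstddef>
#include <string>

// Hand-written counterpart of the comment rules (hypothetical helper).
std::string stripComments(std::string const &input)
{
    enum class State { INITIAL, COMMENT };  // mirrors the start conditions
    State state = State::INITIAL;
    std::string out;

    for (std::size_t idx = 0; idx != input.size(); ++idx)
    {
        bool hasNext = idx + 1 != input.size();

        if (state == State::COMMENT)
        {
            // inside a C comment: drop everything up to and including "*/"
            if (input[idx] == '*' && hasNext && input[idx + 1] == '/')
            {
                state = State::INITIAL;
                ++idx;
            }
        }
        else if (input[idx] == '/' && hasNext && input[idx + 1] == '/')
        {
            // end-of-line comment: skip up to, but not including, '\n'
            while (idx + 1 != input.size() && input[idx + 1] != '\n')
                ++idx;
        }
        else if (input[idx] == '/' && hasNext && input[idx + 1] == '*')
        {
            state = State::COMMENT;
            ++idx;
        }
        else
            out += input[idx];              // default rule: echo the character
    }

    return out;
}
```

The generated scanner reaches the same result by selecting the longest matching regular expression in the active start condition, which is why its specification needs far fewer lines than the explicit loop.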
Flexc++ can be downloaded from https://fbb-git.gitlab.io/flexcpp/, and requires the bobcat library, which can be downloaded from http://fbb-git.gitlab.io/bobcat/.

Starting point when developing programs that use both parsers and scanners is the grammar. The grammar defines a set of tokens that can be returned by the lexical scanner (called the scanner below).
Finally, auxiliary code is provided to `fill in the blanks': the actions performed by the parser and by the scanner are not normally specified literally in the grammar rules or lexical regular expressions, but should be implemented in member functions, called from the parser's rules or associated with the scanner's regular expressions.
In the previous section we've seen an example of a C++ class generated by flexc++. In the current section we concentrate on the parser. The parser can be generated from a grammar specification file, processed by the program bisonc++. The grammar specification file required by bisonc++ is similar to the file processed by bison (or bison++, bisonc++'s predecessor, written in the early nineties by Alain Coetmeur).
In this section a program is developed converting infix expressions, where binary operators are written between their operands, to postfix expressions, where operators are written behind their operands. Also, the unary operator - is converted from its prefix notation to a postfix form. The unary + operator is ignored as it requires no further actions. In essence our little calculator is a micro compiler, transforming numeric expressions into assembly-like instructions.
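The infix-to-postfix conversion itself can be illustrated without a parser generator. The sketch below is a hand-written recursive descent converter, which is a different technique from the generated parser developed in this section; the class Postfix and its members are hypothetical names. It assumes that + has the lowest priority, * a higher one and the unary minus the highest, and it writes the postfix negation as n.

```cpp
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical hand-written converter (not the generated parser).
class Postfix
{
    std::string d_in;
    std::size_t d_pos = 0;
    std::string d_out;

    public:
        explicit Postfix(std::string const &in)
        :
            d_in(in)
        {}

        std::string convert()
        {
            sum();
            skipBlanks();
            if (d_pos != d_in.size())
                throw std::runtime_error("unexpected input");
            return d_out;
        }

    private:
        void skipBlanks()
        {
            while (d_pos != d_in.size() &&
                   (d_in[d_pos] == ' ' || d_in[d_pos] == '\t'))
                ++d_pos;
        }

        bool accept(char ch)            // consume ch if it comes next
        {
            skipBlanks();
            if (d_pos == d_in.size() || d_in[d_pos] != ch)
                return false;
            ++d_pos;
            return true;
        }

        void emit(std::string const &item)
        {
            if (!d_out.empty())
                d_out += ' ';
            d_out += item;
        }

        void sum()                      // lowest priority: '+'
        {
            product();
            while (accept('+'))
            {
                product();
                emit("+");              // operator follows its operands
            }
        }

        void product()                  // higher priority: '*'
        {
            unary();
            while (accept('*'))
            {
                unary();
                emit("*");
            }
        }

        void unary()                    // highest priority: unary - and +
        {
            if (accept('-'))
            {
                unary();
                emit("n");              // postfix negation
            }
            else if (accept('+'))
                unary();                // unary + requires no action
            else
                primary();
        }

        void primary()                  // parenthesized expression or number
        {
            if (accept('('))
            {
                sum();
                if (!accept(')'))
                    throw std::runtime_error("')' expected");
                return;
            }

            std::size_t begin = d_pos;
            while (d_pos != d_in.size() &&
                   (std::isdigit(static_cast<unsigned char>(d_in[d_pos]))
                    || d_in[d_pos] == '.'))
                ++d_pos;

            if (begin == d_pos)
                throw std::runtime_error("number expected");
            emit(d_in.substr(begin, d_pos - begin));
        }
};
```

For instance, Postfix("1 + 2 * 3").convert() returns "1 2 3 * +". Each priority level needs its own function here, whereas a grammar specification expresses the same priorities declaratively.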
Our calculator recognizes a rather basic set of operators: multiplication, addition, parentheses, and the unary minus. We'll distinguish real numbers from integers, to illustrate a subtlety in bison-like grammar specifications. That's all. The purpose of this section is, after all, to illustrate the construction of a C++ program that uses both a parser and a lexical scanner, rather than to construct a full-fledged calculator.
In the coming sections we'll develop the grammar specification for bisonc++. Then, the regular expressions for the scanner are specified. Following that, the final program is constructed.
The grammar specification file required by bisonc++ is comparable to the specification file required by bison. Differences are related to the class nature of the resulting parser. Our calculator distinguishes real numbers from integers, and supports a basic set of arithmetic operators.

Bisonc++ should be used as follows:
bisonc++ grammar definitions are for all practical purposes identical to bison's grammar definitions.

bisonc++ can generate files defining the parser class and the implementation of the member function parse.

Members other than those generated for parse must be separately implemented. Of course, they should also be declared in the parser class's header. At the very least the member lex must be implemented. This member is called by parse to obtain the next available token. However, bisonc++ offers a facility providing a standard implementation of the function lex. The member function error(char const *msg) is given a simple default implementation that may be modified by the programmer. The member function error is called when parse detects (syntactic) errors.

A minimal program using the parser:

    int main()
    {
        Parser parser;
        return parser.parse();
    }

The bisonc++ specification file has two sections:
bisonc++ also supports several new declarations. These new declarations are important and are discussed below.

bison, albeit that some members that were available in bison and bison++ are obsolete in bisonc++, while other members can be used in a wider context. For example, ACCEPT and ABORT can be called from any member called from the parser's action blocks to terminate the parsing process.

Readers familiar with bison may note that there is no header section anymore. Header sections are used by bison to provide for the necessary declarations allowing the compiler to compile the C function generated by bison. In C++ declarations are part of or already used by class definitions. Therefore, a parser generator generating a C++ class and some of its member functions does not require a header section anymore.

Only some of the declarations supported by bisonc++ are discussed here. The reader is referred to bisonc++'s man-page for a full description.

header
    header is used as the pathname to the file pre-included in the parser's base-class header. This declaration is useful in situations where the base class header file refers to types which might not yet be known. E.g., with %union a std::string * field might be used. Since the class std::string might not yet be known to the compiler once it processes the base class header file we need a way to inform the compiler about these classes and types. The suggested procedure is to use a pre-include header file declaring the required types. By default header is surrounded by double quotes (using, e.g., #include "header"). When the argument is surrounded by angle brackets #include <header> is included. In the latter case, quotes might be required to escape interpretation by the shell (e.g., using -H '<header>').

header
    header is used as the pathname to the file pre-included in the parser's class header. This file should define a class Scanner, offering a member int lex() producing the next token from the input stream to be analyzed by the parser generated by bisonc++.
    When this option is used the parser's member int lex() is predefined as (assuming the default parser class name Parser is used):

        inline int Parser::lex()
        {
            return d_scanner.lex();
        }

    and an object Scanner d_scanner is composed into the parser. The d_scanner object is constructed by its default constructor. If another constructor is required, the parser class may be provided with an appropriate (overloaded) parser constructor after having constructed the default parser class header file using bisonc++. By default header is surrounded by double quotes (using, e.g., #include "header"). When the argument is surrounded by angle brackets #include <header> is included.
%stype typename
    typename should be the name of an unstructured type (e.g., size_t). By default it is int. See YYSTYPE in bison. It should not be used if a %union specification is used. Within the parser class, this type may be used as STYPE.

%union union-definition
    Corresponds to the identically named bison declaration. As with bison this generates a union for the parser's semantic type. The union type is named STYPE. If no %union is declared, a simple stack-type may be defined using the %stype declaration. If no %stype declaration is used, the default stacktype (int) is used.

An example of a %union declaration is:

    %union
    {
        int     i;
        double  d;
    };

In pre-C++11 code a union cannot contain objects as its fields, as constructors cannot be called when a union is created. This means that a string cannot be a member of the union. A string *, however, is a possible union member. It might also be possible to use unrestricted unions (cf. section 9.9), having class type objects as fields.
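What using an unrestricted union involves can be sketched in plain C++, independently of bisonc++. The class Semantic below is a hypothetical example, not code generated by the parser generator: it combines an int and a std::string in one union, and therefore has to construct and destroy the string explicitly while a tag records which field is in use.

```cpp
#include <new>
#include <string>
#include <utility>

// Hypothetical semantic value type; all names are illustrative.
class Semantic
{
    public:
        enum class Tag { INT, TEXT };

    private:
        Tag d_tag;
        union                       // unrestricted: has a class-type field
        {
            int d_int;
            std::string d_text;
        };

    public:
        explicit Semantic(int value)
        :
            d_tag(Tag::INT),
            d_int(value)
        {}

        explicit Semantic(std::string text)
        :
            d_tag(Tag::TEXT)
        {
            // the union does not construct its fields: use placement new
            new (&d_text) std::string(std::move(text));
        }

        // copying is left out to keep the sketch short
        Semantic(Semantic const &) = delete;
        Semantic &operator=(Semantic const &) = delete;

        ~Semantic()
        {
            if (d_tag == Tag::TEXT)
                d_text.~basic_string();     // explicit destruction
        }

        Tag tag() const
        {
            return d_tag;
        }

        int number() const                  // only valid when tag() == INT
        {
            return d_int;
        }

        std::string const &text() const     // only valid when tag() == TEXT
        {
            return d_text;
        }
};
```

The bookkeeping shown here is the price of storing class-type objects in a union; a plain pointer field avoids it, but then the pointed-to object's lifetime has to be managed elsewhere.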
As an aside: the scanner does not have to know about such a union. It can simply pass its scanned text to the parser through its matched member function. For example, using a statement like
$$.i = A2x(d_scanner.matched());
matched text is converted to a value of an appropriate type.
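The idea behind such a conversion can be sketched using the standard library only. The function template fromText below is a hypothetical helper and not the bobcat implementation of A2x: it extracts a value of the requested type from the text and reports a failed extraction by throwing an exception.

```cpp
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts text to a value of type Type by stream
// extraction.
template <typename Type>
Type fromText(std::string const &text)
{
    std::istringstream in(text);
    Type value;

    if (!(in >> value))
        throw std::invalid_argument("cannot convert '" + text + "'");

    return value;
}
```

With this helper a parser action could, for example, use fromText<int>(d_scanner.matched()) for an integer token and fromText<double>(d_scanner.matched()) for a real number, so the scanner itself never needs to know the semantic type.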
Tokens and non-terminals can be associated with union fields. This is strongly advised, as it prevents type mismatches, since the compiler may then check for type correctness. At the same time, the bison specific variables $$, $1, $2, etc. may be used, rather than the full field specification (like $$.i). A non-terminal or a token may be associated with a union field using the <fieldname> specification. E.g.,
    %token <i> INT          // token association (deprecated, see below)
           <d> DOUBLE
    %type  <i> intExpr      // non-terminal association
In the example developed here, both the tokens and the non-terminals can be associated with a union field. However, as noted before, the scanner does not have to know about all this. In our opinion, it is cleaner to let the scanner do just one thing: scan texts. The parser, knowing what the input is all about, may then convert strings like "123" to an integer value. Consequently, the association of a union field and a token is discouraged. Below, while describing the grammar's rules, this is further illustrated.
In the %union discussion the %token and %type specifications should be noted. They are used to specify the tokens (terminal symbols) that can be returned by the scanner, and to specify the return types of non-terminals. Apart from %token the token declarators %left, %right, and %nonassoc can be used to specify the associativity of operators. The tokens mentioned at these indicators are interpreted as tokens indicating operators, associating in the indicated direction. The precedence of operators is defined by their order: the first specification has the lowest priority. To overrule a certain precedence in a certain context %prec can be used. As all this is standard bisonc++ practice, it isn't further elaborated here. The documentation provided with bisonc++'s distribution should be consulted for further reference.
Here is the specification of the calculator's declaration section:
    %filenames parser
    %scanner ../scanner/scanner.h

    %union {
        int i;
        double d;
    };

    %token  INT DOUBLE

    %type   <i> intExpr
    %type   <d> doubleExpr

    %left   '+'
    %left   '*'
    %right  UnaryMinus

In the declaration section %type specifiers are used, associating the intExpr rule's value (see the next section) to the i-field of the semantic-value union, and associating doubleExpr's value to the d-field. This approach, admittedly, is rather complex, as expression rules must be included for each of the supported union types. Alternatives are definitely possible, and involve the use of polymorphic semantic values, covered in detail in the Bisonc++ user guide.

Note that no action block in the grammar's rules requires more than a single line of code. This keeps the grammar simple, and therefore enhances its readability and understandability. Even the rule defining the parser's proper termination (the empty line in the line rule) uses a single member function called done. The implementation of that function is simple, but it is worth noting that it calls Parser::ACCEPT, showing that ACCEPT can be called indirectly from a production rule's action block.
Here are the grammar's production rules:

    lines:
        lines line
    |   line
    ;

    line:
        intExpr '\n'                        { display($1); }
    |   doubleExpr '\n'                     { display($1); }
    |   '\n'                                { done(); }
    |   error '\n'                          { reset(); }
    ;

    intExpr:
        intExpr '*' intExpr                 { $$ = exec('*', $1, $3); }
    |   intExpr '+' intExpr                 { $$ = exec('+', $1, $3); }
    |   '(' intExpr ')'                     { $$ = $2; }
    |   '-' intExpr %prec UnaryMinus        { $$ = neg($2); }
    |   INT                                 { $$ = convert<int>(); }
    ;

    doubleExpr:
        doubleExpr '*' doubleExpr           { $$ = exec('*', $1, $3); }
    |   doubleExpr '*' intExpr              { $$ = exec('*', $1, d($3)); }
    |   intExpr '*' doubleExpr              { $$ = exec('*', d($1), $3); }
    |   doubleExpr '+' doubleExpr           { $$ = exec('+', $1, $3); }
    |   doubleExpr '+' intExpr              { $$ = exec('+', $1, d($3)); }
    |   intExpr '+' doubleExpr              { $$ = exec('+', d($1), $3); }
    |   '(' doubleExpr ')'                  { $$ = $2; }
    |   '-' doubleExpr %prec UnaryMinus     { $$ = neg($2); }
    |   DOUBLE                              { $$ = convert<double>(); }
    ;

This grammar is used to implement a simple calculator in which integer and real values can be negated, added, and multiplied and in which standard priority rules can be overruled by parentheses. The grammar shows the use of typed nonterminal symbols: doubleExpr is linked to real (double) values, intExpr is linked to integer values. Precedence and type association are defined in the parser's definition section.

Bisonc++ generates multiple files, among which the file defining the parser's class. Functions called from the production rule's action blocks are usually member functions of the parser. These member functions must be declared and defined. Once bisonc++ has generated the header file defining the parser's class, that header file isn't automatically rewritten, allowing the programmer to add new members to the parser class whenever required.
Here is `parser.h' as used in our little calculator:

    #ifndef Parser_h_included
    #define Parser_h_included

    #include <iostream>
    #include <sstream>
    #include <bobcat/a2x>

    #include "parserbase.h"
    #include "../scanner/scanner.h"

    #undef Parser
    class Parser: public ParserBase
    {
        std::ostringstream d_rpn;
        // $insert scannerobject
        Scanner d_scanner;

        public:
            int parse();

        private:
            template <typename Type>
            Type exec(char c, Type left, Type right);

            template <typename Type>
            Type neg(Type op);

            template <typename Type>
            Type convert();

            void display(int x);
            void display(double x);
            void done() const;
            void reset();

            void error(char const *msg);
            int lex();
            void print();

            static double d(int i);

        // support functions for parse():
            void executeAction(int d_ruleNr);
            void errorRecovery();
            int lookup(bool recovery);
            void nextToken();
            void print__();
    };

    inline double Parser::d(int i)
    {
        return i;
    }

    template <typename Type>
    Type Parser::exec(char c, Type left, Type right)
    {
        d_rpn << " " << c << " ";
        return c == '*' ? left * right : left + right;
    }

    template <typename Type>
    Type Parser::neg(Type op)
    {
        d_rpn << " n ";
        return -op;
    }

    template <typename Type>
    Type Parser::convert()
    {
        Type ret = FBB::A2x(d_scanner.matched());
        d_rpn << " " << ret << " ";
        return ret;
    }

    inline void Parser::error(char const *msg)
    {
        std::cerr << msg << '\n';
    }

    inline int Parser::lex()
    {
        return d_scanner.lex();
    }

    inline void Parser::print()
    {}

    #endif

The scanner returns Parser::INT or Parser::DOUBLE tokens. The flexc++ directive %interactive is provided since the calculator is a program actively interacting with its human user.
Here is the complete flexc++ specification file:
    %interactive
    %filenames scanner

    %%

    [ \t]                       // ignored

    [0-9]+                      return Parser::INT;

    "."[0-9]+                   |
    [0-9]+"."[0-9]*             return Parser::DOUBLE;

    .|\n                        return matched()[0];
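Which token the regular expressions above assign to a piece of text can be checked with a small sketch based on std::regex. It classifies a lexeme that has already been isolated and does not perform the longest-match scanning of the generated scanner. The function classify and the token values are assumptions made for the illustration; the real values are defined by the generated parser class.

```cpp
#include <regex>
#include <string>

// Assumed token values, chosen outside of the range of plain characters.
enum Token { INT_TOKEN = 257, DOUBLE_TOKEN = 258 };

// Hypothetical helper: returns the token for an isolated lexeme.
int classify(std::string const &text)
{
    static std::regex const intRe("[0-9]+");
    static std::regex const doubleRe("\\.[0-9]+|[0-9]+\\.[0-9]*");

    if (std::regex_match(text, intRe))
        return INT_TOKEN;

    if (std::regex_match(text, doubleRe))
        return DOUBLE_TOKEN;

    // any other text: its first character serves as the token
    return text.empty() ? 0 : static_cast<unsigned char>(text[0]);
}
```

Note that "3." and ".5" are both classified as real numbers, whereas a lone "." matches neither pattern and is returned as a plain character, just like the operators.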
The calculator is built using bisonc++ and flexc++. Here is the implementation of the calculator's main function:

    #include "parser/parser.h"

    using namespace std;

    int main()
    {
        Parser parser;

        cout << "Enter (nested) expressions containing ints, doubles, *, + and "
                "unary -\n"
                "operators. Enter an empty line to stop.\n";

        return parser.parse();
    }

The parser's files parse.cc and parserbase.h are generated by the command:
bisonc++ grammar
The file parser.h is created only once, to allow the developer to add members to the Parser class once the need for them arises.
The program flexc++ is used to create a lexical scanner:
flexc++ lexer
A command like

    g++ -Wall -o calc *.cc -lbobcat -s

can be used to compile and link the source of the main program and the sources produced by the scanner and parser generators. The example uses the A2x class, which is part of the bobcat library (cf. section 26.6.1.5); the bobcat library is available on systems offering either bisonc++ or flexc++. Bisonc++ can be downloaded from
http://fbb-git.gitlab.io/bisoncpp/.