I have known aboutstd::variant for a while but had dismissed it as not very useful (I have revised that opinion now). But I would like a second opinion on my usage to see if I can improve this.
The original class looked like this:
namespace ThorsAnvil::Serialize { class ParserInterface { public: ParserInterface(std::istream& stream) : input(stream) {} virtual ~ParserInterface() {} virtual ParserToken getNextToken() = 0; virtual std::string getKey() = 0; virtual void getValue(short int&) = 0; virtual void getValue(int&) = 0; virtual void getValue(long int&) = 0; virtual void getValue(long long int&) = 0; virtual void getValue(unsigned short int&) = 0; virtual void getValue(unsigned int&) = 0; virtual void getValue(unsigned long int&) = 0; virtual void getValue(unsigned long long int&)= 0; virtual void getValue(float&) = 0; virtual void getValue(double&) = 0; virtual void getValue(long double&) = 0; virtual void getValue(bool&) = 0; virtual void getValue(std::string&) = 0; virtual bool isValueNull() = 0; virtual std::string getRawValue() = 0; protected: bool read(char* dst, std::size_t size) { return static_cast<bool>(input.read(dst, size)); } bool readTo(std::string& dst, char delim) { return static_cast<bool>(std::getline(input, dst, delim)); } std::size_t lastReadCount() const { return input.gcount(); } std::streampos getPos() { return input.tellg(); } int get() { return input.get(); } int peek() { return input.peek(); } void ignore(std::size_t size) { input.ignore(size); } void clear() { input.clear(); } void unget() { input.unget(); } bool ok() const { return !input.fail(); } void setFail() { input.setstate(std::ios::failbit); } template<typename T> bool readValue(T& value) { return static_cast<bool>(input >> value); } private: std::istream& input; };}I then have implementations for Json/Yaml/Bson that override the virtual functions to provide the actual implementation. But all these implementations use the protected methods to read from the stream.
I did some experiments and found that I this could be much more efficient with astd::string as the underlying storage. I know I can move astd::string into astd::stringstream and simply use the code above, but the overhead of the string stream is very high. So I decided I wanted to modify the class to support strings directly.
This is wherestd::variant comes in. Without changing anything (drastically) I made the input typestd::variant<std::istream&, StringInput>. Note: I had to changestd::istream& tostd::istream* asstd::variant does not support references (and I could not move the stream). AlsoStringInput is a very basic wrapper aroundstd::string_view that simply supports the methods I need.
Thus the class I would like reviewed is:
namespace ThorsAnvil::Serialize{ class ParserInterface { public: // Added a new constructor // So it can take a string in addition to a stream. ParserInterface(std::string_view const& str) : input(str) {} ParserInterface(std::istream& stream, ParserConfig config = ParserConfig{}) : input(&stream) {} virtual ~ParserInterface() {} virtual ParserToken getNextToken() = 0; virtual std::string getKey() = 0; virtual void getValue(short int&) = 0; virtual void getValue(int&) = 0; virtual void getValue(long int&) = 0; virtual void getValue(long long int&) = 0; virtual void getValue(unsigned short int&) = 0; virtual void getValue(unsigned int&) = 0; virtual void getValue(unsigned long int&) = 0; virtual void getValue(unsigned long long int&)= 0; virtual void getValue(float&) = 0; virtual void getValue(double&) = 0; virtual void getValue(long double&) = 0; virtual void getValue(bool&) = 0; virtual void getValue(std::string&) = 0; virtual bool isValueNull() = 0; virtual std::string getRawValue() = 0; protected: // The protected functions are all modified to read from // either a string or stream using the `std::visit()` method. bool read(char* dst, std::size_t size) { struct Read { char* dst; std::size_t size; Read(char* dst, std::size_t size):dst(dst),size(size){} bool operator()(std::istream* input) {return static_cast<bool>(input->read(dst, size));} bool operator()(StringInput& input) {return input.read(dst, size);} }; return std::visit(Read{dst, size}, input); } bool readTo(std::string& dst, char delim) { struct ReadTo { std::string& dst; char delim; ReadTo(std::string& dst, char delim):dst(dst),delim(delim){} bool operator()(std::istream* input) {return static_cast<bool>(std::getline((*input), dst, delim));} bool operator()(StringInput& input) {return input.readTo(dst, delim);} }; return std::visit(ReadTo(dst, delim), input); } std::size_t lastReadCount() const { struct LastReadCount { std::size_t operator()(std::istream const* input) {return input->gcount();} std::size_t operator()(StringInput const& input) {return input.getLastReadCount();} }; return std::visit(LastReadCount{}, input); } std::streampos getPos() { struct GetPos { std::streampos operator()(std::istream* input) {return input->tellg();} std::streampos operator()(StringInput& input) {return input.getPos();} }; return std::visit(GetPos{}, input); } int get() { struct Get { int operator()(std::istream* input) {return input->get();} int operator()(StringInput& input) {return input.get();} }; return std::visit(Get{}, input); } int peek() { struct Peek { int operator()(std::istream* input) {return input->peek();} int operator()(StringInput& input) {return input.peek();} }; return std::visit(Peek{}, input); } void ignore(std::size_t size) { struct Ignore { std::size_t size; Ignore(std::size_t size): size(size) {} void operator()(std::istream* input) {input->ignore(size);} void operator()(StringInput& input) {input.ignore(size);} }; std::visit(Ignore{size}, input); } void clear() { struct Clear { void operator()(std::istream* input) {input->clear();} void operator()(StringInput& input) {input.clear();} }; std::visit(Clear{}, input); } void unget() { struct Unget { void operator()(std::istream* input) {input->unget();} void operator()(StringInput& input) {input.unget();} }; std::visit(Unget{}, input); } bool ok() const { struct OK { bool operator()(std::istream const* input) {return !input->fail();} bool operator()(StringInput const& input) {return input.isOk();} }; return std::visit(OK{}, input); } void setFail() { struct SetFail { void operator()(std::istream* input) {input->setstate(std::ios::failbit);} void operator()(StringInput& input) {input.setFail();} }; std::visit(SetFail{}, input); } template<typename T> bool readValue(T& value) { struct ReadValue { T& value; ReadValue(T& value) :value(value) {} bool operator()(std::istream* input) {return static_cast<bool>((*input) >> value);} bool operator()(StringInput& input) {input.readValue(value);return true;} }; return std::visit(ReadValue{value}, input); } private: using DataInputStream = std::variant<std::istream*, StringInput>; DataInputStream input; };}For completness the StringInput class:
namespace ThorsAnvil::Serialize{ struct StringInput { std::string_view data; std::size_t position; std::size_t lastRead; public: StringInput(std::string_view const& view) : data(view) , position(0) , lastRead(0) {} bool read(char* dst, std::size_t size) { std::size_t copySize = std::min(size, data.size() - position); std::copy(&data[position], &data[position + copySize], dst); position += copySize; lastRead = copySize; return position <= data.size(); } bool readTo(std::string& dst, char delim) { auto find = data.find(delim, position); if (find == std::string::npos) { find = data.size(); } auto size = find - position; dst.resize(size); std::copy(&data[position], &data[position + size], &dst[0]); position += (size + 1); return position <= data.size(); } std::size_t getLastReadCount() const {return lastRead;} std::streampos getPos() {return position;} int get() {return data[position++];} int peek() {return data[position];} void ignore(std::size_t size) {position += size;} void clear() {position = 0;} void unget() {--position;} bool isOk() const {return position <= data.size();{ void setFail() {position = data.size() + 1;} template<typename T> void readValue(T& value) { using std::from_chars; #if defined(NO_STD_SUPPORT_FROM_CHAR_DOUBLE) && (NO_STD_SUPPORT_FROM_CHAR_DOUBLE >= 1) using fast_float::from_chars; #endif auto start = &data[position]; auto result = from_chars(start, &data[0] + data.size(), value); if (result.ec != std::errc::invalid_argument) { lastRead = (result.ptr - start); position+= lastRead; } } void readValue(char& value) { while (position < data.size() && std::isspace(data[position])) { ++position; } value = (position < data.size()) ? data[position] : -1; ++position; } }}- \$\begingroup\$
std::variantcan store references if you usestd::reference_wrapper. But if you already working withstd::istreams, then to read from a string the caller can just make astd::istringstreamor since C++23 make astd::ispanstreamfor an existingstd::string.\$\endgroup\$G. Sliepen– G. Sliepen2024-09-17 05:17:05 +00:00CommentedSep 17, 2024 at 5:17 - \$\begingroup\$@G.Sliepen Yes you can use the
std::istringstream. But I am optimizing for performance. The stream interface has an excissively high overhead. By adding the theStringInputclass I have increased throughput by a factor of 4.\$\endgroup\$Loki Astari– Loki Astari2024-09-17 07:14:37 +00:00CommentedSep 17, 2024 at 7:14 - 1\$\begingroup\$You don’t want to store references as data members in any case, and
reference_wrapperis really just a pointer. Might as well just store the pointer.\$\endgroup\$indi– indi2024-09-17 19:11:45 +00:00CommentedSep 17, 2024 at 19:11 - \$\begingroup\$As an experiment I tried converting this variant interface to a polymorphic one. I did not see any improvement (which was surprising).\$\endgroup\$Loki Astari– Loki Astari2024-09-17 22:07:04 +00:00CommentedSep 17, 2024 at 22:07
1 Answer1
I can hardly criticize the overall design all that much, because I’ve done more or less the same thing. 😏
It sounds like you’re primarily interested in the use ofstd::variant, so I’ll focus on that.
So, as I mentioned, I’ve done more or less the same thing. The difference is that I made myStringInput a implementation detail of the parser class. I regret that choice, but I also don’t think that it makes much sense as an independent type. What I planned to do (never got around to it), was to build a generalized input shim class, andthat would wrap theistream,string_view, and any other input sources, in a pleasant and useful universal interface. In other words, I madeStringInput an implementation detail… you madeDataInputStream an implementation detail… but I think theright solution would be makeDataInputStream (or something much like it) an independent, reusable type.
IfDataInputStream were a completely independent type, then any complexity inside it wouldn’t be that big a deal; the point of a type is to wrap complexity and insulate other business logic from it. So all those visitor classes that take up so much space inParserInterface would be a wash.
For now, let’s see if we can reduce the footprintvariant takes up inParserInterface.
To start, I notice that there are several functions where the only difference between the stream action and the string action is whether the call is direct or indirect:
int get() { struct Get { int operator()(std::istream* input) {return input->get();} int operator()(StringInput& input) {return input.get();} }; return std::visit(Get{}, input); } int peek() { struct Peek { int operator()(std::istream* input) {return input->peek();} int operator()(StringInput& input) {return input.peek();} }; return std::visit(Peek{}, input); }// etc. ...An easy way to simplify this near-duplication is witha wrapper that overloads the indirection operator. It’s basically just 3 lines that turnmultiple functions from this:
int get() { struct Get { int operator()(std::istream* input) {return input->get();} int operator()(StringInput& input) {return input.get();} }; return std::visit(Get{}, input); }… to this:
int get() { return std::visit([](auto& input) { return input->get(); }, input); }And it shouldn’t add any noticeable overhead, because both the compiler and the CPU can trivially see through that extra indirection, so prefetch is on your side. (You’re already paying for a hidden branch when selecting the active variant member.)
Generally, I would suggest this pattern:
auto func(...){ struct Func { auto operator()(T1& t) { return t.func(...); } auto operator()(T2& t) { return t.func(...); } auto operator()(T3& t) { return t.func(...); } // ... etc. // ... data members ... }; return std::visit(Func{...}, v);}… is not a good idea.
The big hazard is that an unfortunate typo could potentially silently “work”. Consider what might happen if there happened to be a completely unrelated but visible class namedFnuc, and—by accident—you typed thevisit line above asreturn std::visit(Fnuc{}, v);, andFnuc happens to have anoperator() defined that happens to accept the same types that are in the variant (not so far fetched if the call operator is a template that just acceptseverything). Who knows what could happen. It could even appear to work.
The above scenario is notlikely… but it is also not exactlyridiculous. That’s what makes it concerning for me. If it were alikely problem, you’d be inclined to check for it. But because it is anunlikely problem, you probably won’t bother to check all the time, and then when it silently “works”….
The other thing that bothers me about the above pattern is how non-intuitive it is. Normally you read a function downward—top to bottom—and follow the logic step-by-step as you go. (This is obviously a simplification, because loops, but it is part of the reason why “goto considered harmful” became a thing.) But to read the function above, you kinda have to jump around a bit. First you read thisstruct nested in the function, and be like… “ooookay, what’s this about?”. Andthen you see thevisit() call and have the “aha” moment… and then you have to goback up, now that you know what’s going on and understand that the multiple call operator functions are branches of a pattern-matching visit operation.
Whenever I use variants, I always use either a lambda that is defined in situ:
auto func(...){ return std::visit([...](auto& t) { using T = std::remove_cvref_t<decltype(t)>; if constexpr (std::is_same_v<T, T1>) { return t.func(...); } else if constexpr (std::is_same_v<T, T2>) { return t.func(...); } else if constexpr (std::is_same_v<T, T3>) { return t.func(...); } // ... etc. // Optional catch-all: else { ... } }, v);}… or the “overload” pattern:
// Assuming:template<typename... Ts>struct overloaded : Ts... { using Ts::operator()...; };auto func(...){ return std::visit(overloaded{ [](T1& t) { return t.func(...); }, [](T2& t) { return t.func(...); }, [](T3& t) { return t.func(...); }, // ... etc. // Optional catch-all: [](auto& t) { ... } }, v);}Either way I think is easier to read and understand, because the logic flows naturally.std::visit() basically becomes the head of a pattern-matching tree—or aswitch on steroids—and the pattern branches are all right there in order. You’re not just confronted with a set of (what you will later realize are) match patterns with no context.
Also, it makes the hazard of a typo creating a silent match to some external class impossible.
Also, you can use member function templates (either as the in situ lambda itself, of as a catch-all matcher), which you can’t do with local classes.
(Also, keep in mind that the future is pattern matching, whichprobably will look like this:
auto func(...){ return v match { T1: let t => t.func(...); T2: let t => t.func(...); T3: let t => t.func(...); // ... etc. // Optional catch-all: let t => ...; };}… which looks a lot more like either the in situ lambda or the “overload” version than it does the version with the local class.)
Of the three versions, I would recommend the “overload” version, because it is both simpler and shorter. Having a type likeoverloaded defined in your library would be generally useful in any case.
Some more before and after.
// Before:bool read(char* dst, std::size_t size){ struct Read { char* dst; std::size_t size; Read(char* dst, std::size_t size):dst(dst),size(size){} bool operator()(std::istream* input) {return static_cast<bool>(input->read(dst, size));} bool operator()(StringInput& input) {return input.read(dst, size);} }; return std::visit(Read{dst, size}, input);}// After:bool read(char* dst, std::size_t size){ return std::visit(overloaded{ [dst, size](std::istream* input) { return static_cast<bool>(input->read(dst, size)); }, [dst, size](StringInput& input) { return input.read(dst, size); } }, input);}// Or, assuming a wrapper with the indirection operator...:bool read(char* dst, std::size_t size){ // Don't even need overloaded if there's only one branch: return std::visit([dst, size](auto& input) { return bool(input->read(dst, size)); }, input);}// ---------------------------------------------// Before:bool readTo(std::string& dst, char delim){ struct ReadTo { std::string& dst; char delim; ReadTo(std::string& dst, char delim):dst(dst),delim(delim){} bool operator()(std::istream* input) {return static_cast<bool>(std::getline((*input), dst, delim));} bool operator()(StringInput& input) {return input.readTo(dst, delim);} }; return std::visit(ReadTo(dst, delim), input);}// After:bool readTo(std::string& dst, char delim){ return std::visit(overloaded{ // Fine to use single-letter identifiers, because the scope is so tiny: [&dst, delim](std::istream* i) { return bool(std::getline(*i, dst, delim)); }, [&dst, delim](StringInput& i) { return i.readTo(dst, delim); } }, input);}… and so on.
Hope this helps!
- \$\begingroup\$Thanks that was useful.\$\endgroup\$Loki Astari– Loki Astari2024-09-27 17:00:28 +00:00CommentedSep 27, 2024 at 17:00
- \$\begingroup\$I agree that readability is slightly lower the way I am doing it. So I will look into your other forms.\$\endgroup\$Loki Astari– Loki Astari2024-09-27 17:01:56 +00:00CommentedSep 27, 2024 at 17:01
- \$\begingroup\$I am really surprised (Impressed) by how efficient this technique is. I have been trying all sorts of optimizations but the code generated by the visitor pattern is as good as any other technique I have tried). And most of the other techniques are a lot messier in code.\$\endgroup\$Loki Astari– Loki Astari2024-09-27 17:03:57 +00:00CommentedSep 27, 2024 at 17:03
- \$\begingroup\$The one thing I have done since this is convert
ReadandReadTo,IgnoreandReadValueinto stateless classes (no constructor or members). This does mean the parameters to visit become a bit more funky (as they all need to be variants) but withusingit is still readable.\$\endgroup\$Loki Astari– Loki Astari2024-09-27 17:05:46 +00:00CommentedSep 27, 2024 at 17:05 - \$\begingroup\$Take everything I say next with aheavy grain of salt, because my focus is on high-level code architecture, and not low-level code gen or CPU architecture: It seems to me that visiting a variant, no matter how you do it, should be more efficient than a classic OO polymorphic implementation… or, in the worst case, not less efficient. 1/5\$\endgroup\$indi– indi2024-09-28 12:26:42 +00:00CommentedSep 28, 2024 at 12:26
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.

