-2
\$\begingroup\$
Closed. This question isoff-topic. It is not currently accepting answers.

Missing Review Context: Code Review requiresconcrete code from a project, with enough code and / or context for reviewers to understand how that code is used. Pseudocode, stub code, hypothetical code, obfuscated code, and generic best practices are outside the scope of this site.

Closed3 years ago.

I want to serialize a C++ classRamdomclass . Below is the serialization function.

std::vector<uint8_t> Serialize(Ramdomclass &Ramdom){    // First encode the dom content    std::vector<uint8_t> Buf;    // First Part    // Name: vec(byte)    // vec(byte): size:u32, bytes    auto EncodedSize = EncodeU32(Ramdom.getName().length());    Buf.insert(Buf.end(), EncodedSize.begin(), EncodedSize.end());    Buf.insert(Buf.end(), Ramdom.getName().cbegin(), Ramdom.getName().cend());    // Second Part    // Buf: vec(byte)    // vec(byte): size:u32, bytes    EncodedSize = EncodeU32(Ramdom.getContent().size());    Buf.insert(Buf.end(), EncodedSize.begin(), EncodedSize.end());    Buf.insert(Buf.end(), Ramdom.getContent().cbegin(), Ramdom.getContent().cend());    //  dom ID    std::vector<uint8_t> Result;    Result.push_back(0x00U);    // The dom size.    EncodedSize = EncodeU32(Buf.size());    Result.insert(Result.end(), EncodedSize.begin(), EncodedSize.end());    // add dom content.    Result.insert(Result.end(), Buf.begin(), Buf.end());    // This is an bruteforce example. Can you do more fast ?.    return Result;}std::vector<uint8_t> EncodeU32(uint32_t d){    std::vector<uint8_t> encoded;    // unsigned LEB128 encoding    do    {        auto x = d & 0b01111111;        d >>= 7;        if (d)            x |= 0b10000000;        encoded.push_back(x);    } while (d);    return encoded; // this will be in bytes if we want to see it then unsigned(encoded[i]) will be a number betweern 0 to 255}

I think I can improve in terms of the way I am appending different parts instd::vector<uint8_t> Buf;. I want to know is there any better way of doing this ? maybe instead of using insert can I used anyone other way

Note:: All the things I am appending tostd::vector<uint8_t> Buf are in bytes(binary).

askedOct 15, 2022 at 15:41
Sebastian Orteho's user avatar
\$\endgroup\$
4
  • 2
    \$\begingroup\$There's a lot of omission that makes this code hard to review. Without the definition ofRandomclass and without the deserialisation code, it's difficult to be sure it's even correct. For a good review, I recommend you include both of those, and the unit-tests, too.\$\endgroup\$CommentedOct 15, 2022 at 15:51
  • \$\begingroup\$Personally, I don't like binary serialization. As a lead, I would push back and ask why not use a text based serialization method as a first version. It's easier to do, easier to debug, and easier to fix when it goes wrong. There are reason to use binary serialization (speed/size), but you have to justify the use case because of all the extra headaches (and thus work) it introduces. So my first question is: Why not serialize into JSON or YAML?\$\endgroup\$CommentedOct 16, 2022 at 18:11
  • \$\begingroup\$Also why are you serializing to in memory buffer. I would abstract this to a stream. The stream could be an in memory buffer but could also be a file or a socket (your code should not need to know where the data is going). By using a stream you will remove the need to serialize to a buffer than copy to the final destination.\$\endgroup\$CommentedOct 16, 2022 at 18:17
  • \$\begingroup\$@MartinYork, thanks. Ramdomclass is only one class there are many such class that need to be serialize.. In my case serialize and deserialize works as an API. Can you show how to do with stream ? consideringSerialize(Classes &AllClasses){ Serialize(Ramdomclass &Ramdom) ; Serialize(Ramclass &Ram) ; Serialize(Rodonclass &Rodan); .... } and every class contains same members .\$\endgroup\$CommentedOct 17, 2022 at 6:14

1 Answer1

1
\$\begingroup\$

Improving oninsert()

Your spider senses are tingling for good reasons.std::vector'sinsert() function gets two iterators that describe the range to copy from, but if it's written in a generic way it cannot make any assumptions about those iterators, so it might just copy character by character instead of doing a call tomemcpy(). Also, every time it adds another character, it has to make sureBuf is large enough, and reallocate its storage if not.

However, it can in principle detect what you are trying to do and implement an optimized version that basically just reallocates and thenmemcpy()s. Some compilers do this, while others don't. You could "help" the compiler by making this explicit; instead of callinginsert(), write:

auto old_size = Buf.size();Buf.resize(old_size + Random.getName().size());std::copy_n(Ramdom.getName().cbegin(), Ramdom.getName().size(),            Buf.begin() + old_size);

However, that might also generate worse output, asresize() might cause value-initialization of the extra storage being added, even though it will be overwritten by the call tostd::copy_n(). It all depends on how the compiler optimizes your code.

While the above shows how you could change a single call toinsert(), consider that you could also calculate the size of everything you are going to add up-front, then do a singleresize() inSerialize(), and then just copy all the data to the right place insideBuf.

If you want to write reasonably performant, maintainable code, then I would keep the calls toinsert() as you have now. If you need to get every bit of performance you can and maintainability is of lesser concern, then try the above, create realistic performance benchmarks, and measure this on several platforms with different compilers to see if it's worth it or not.

Callreserve() if possible

Regardless of how you add elements to vectors, if you have some idea of how much you are going to add, you can callreserve() first. This might avoid multiple memory (re)allocations later on.

Avoid code duplication

Your code basically does the same thing three times: add a size to a buffer and then copy the data. Consider writing a function that does that:

template<typename T>static void append_data(std::vector<uint8_t> &Buf, const T &Data){    auto EncodedSize = EncodeU32(Data.size());    Buf.insert(Buf.end(), EncodedSize.begin(), EncodedSize.end());    Buf.insert(Buf.end(), Data.cbegin(), Data.cend());}std::vector<uint8_t> Serialize(Ramdomclass &Ramdom){    std::vector<uint8_t> Buf;    append_data(Buf, Ramdom.getName());    append_data(Buf, Ramdom.getContent());    std::vector<uint8_t> Result;    Result.push_back(0x0);    append_data(Result, Buf);    return Result;}

Alternative ways to serialize data

There are other approaches to serialize data that you could use, or perhaps look at to get inspiration for your own serialization code.

If you want to have a clean and extensible way to serialize your own data structures in C++, thenBoost::Serialization might be interesting for you. The drawback is that it doesn't do any compression of integers as far as I am aware.

Instead of a custom serializationformat, you could use a standard format. For example,MessagePack seems like a good fit here, but there aremany more.

answeredOct 15, 2022 at 17:39
G. Sliepen's user avatar
\$\endgroup\$
6
  • \$\begingroup\$Thanks, in what case will copy_n gives undefined behavior considering my case?\$\endgroup\$CommentedOct 18, 2022 at 10:17
  • \$\begingroup\$I'm not sure what you mean by that? The example I showed is correct and should not have any undefined behavior.\$\endgroup\$CommentedOct 18, 2022 at 11:15
  • \$\begingroup\$thanks again.. Is your way ie by using copy_n() is the fastest way to add elements into a vector compare to using a insert()?\$\endgroup\$CommentedOct 18, 2022 at 12:05
  • \$\begingroup\$You have to combine it withresize(). And as I wrote in my answer, it depends on the implementation the standard library and your compiler which way will be faster.\$\endgroup\$CommentedOct 18, 2022 at 12:53
  • \$\begingroup\$I tried your solution but I am getting the error.. I am not getting why is happening check hereonlinegdb.com/8MZTHabr1 . Can you please help me\$\endgroup\$CommentedOct 18, 2022 at 13:55

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.