6
\$\begingroup\$

I've written a bytes to number converter for cpp similar to the pythonint.from_bytes.

It does not make any assumptions on:

  • The endianness of the input buffer
  • The endianness of the system
  • The endianness consistency of the the system ( = no assumption that all numeric types have the same endianness )

Additionally the source buffer can be smaller than the target.

#include <cstdint>#include <span>#include <stdexcept>namespace Binary{    // Used to indicate the endianness of a type    // Better readability than a simple bool    enum class Endianness    {        Little,        Big,    };    namespace Int {        // Base is the underlying type that will be used to store the eventual result        template< typename Base >        Base FromBytes( std::span< std::uint8_t > inBuffer, Binary::Endianness inEndianness )        {            // Check if we have enough space            if( sizeof( Base ) < inBuffer.size() )            {                throw std::length_error( "unable to fit data in requested type" );            }            // Determine endianness of the Base type on this system            union {                Base i;                char c[ sizeof( Base ) ];            } test = { .i = 1 };            bool tmp = test.c[ sizeof( Base ) - 1 ] == '\1';            Binary::Endianness type_endianness = tmp ? Binary::Endianness::Big : Binary::Endianness::Little;            // Always initialize your variables            Base result = Base{ 0 };            std::uint8_t * destination = ( std::uint8_t * ) &result;            // If we have big endian and we have a smaller buffer the first bytes should            // remain zero, meaning we should start filling further back            if( type_endianness == Binary::Endianness::Big ) { destination += sizeof( Base ) - inBuffer.size(); }            if( type_endianness == inEndianness )            {                // equal endianness => simple copy                memcpy( destination, inBuffer.data(), inBuffer.size() );            }            else            {                // differing endianness => copy in reverse                for( size_t i = 0; i < inBuffer.size(); i+=1 )                {                    destination[ i ] = inBuffer[ inBuffer.size() - i - 1 ];                }            }            return result;        }    }}

Are there any improvements I could make?

Sᴀᴍ Onᴇᴌᴀ's user avatar
Sᴀᴍ Onᴇᴌᴀ
29.6k16 gold badges46 silver badges203 bronze badges
askedMar 31, 2023 at 14:22
Sam Coutteau's user avatar
\$\endgroup\$

1 Answer1

3
\$\begingroup\$

I like the enum.Except that it is different from the one C++20defines.

We're missing some review context,like the range of use cases we're trying to address.This submission contains no unit tests or other demo code.It's unclear if caller is likely to call into thisrepeatedly through an array loop.Perhaps "exotic" 24-bit quantities motivated this code.


use constexpr

Every call toFromBytes asks about native endianess.It's a nice piece of code, very clear, but we shouldcache the result as it won't change.Or better, find the result at compile time.

Even better, dispense with it and rely onstd::endian::native.Or write down the underlying requirementsso we understand what systems are beingtargeted and what language / compiler variantsare within scope.

nit: Assigningtmp = test.c[0] == '\1';seems a little more convenient. But perhapsyou wanted the order for the ternaryto be "big, little" for aesthetic reasons.Consider renamingtmp tois_big.


Delete uninformative comments like "Always initialize your variables".

I am glad to see that.size() is the first thing we check. Good.


nit: I find the second form slightly easier to read.

... inBuffer[ inBuffer.size() - i - 1 ]
... inBuffer[ inBuffer.size() - 1 - i ]

Why? I read it as "constant minusi"where I can reason about the constant locationindependent of the loop.Rather than having to puzzle out the numericrelationship between size andi, observeit is always positive, and then decrementto get to the zero-th element.


document language version

std::reverse_copyhas been available since C++17,but we choose not to use it, unclear why.

Using a standard routine might yield performance identical to a naïve loop.But maybe a vendor went to some trouble tovectorize it for your target processor,and it will go quicker.Use it for the same reason you prefer memcpy().

Tell us which language variants are acceptablefor this project, write it down so futuremaintainers will know the rules.


I was slightly surprised to not seentohl mentioned,norbe64toh, but maybe {3, 5, 6, 7}-byte inBuffersare just as important for the use case we're addressing.Those standard functions should absolutely appear inyour automated test suite.


document the sign bit

The brief review context explicitly says that we implement awell-knowninterface.Wedo support a corresponding endianness parameter.But interpretation of the source sign bit isleft implicit. Presumably it matches theBase type.This should be explicitly written down.

(Also, "Base" seems a slightly odd name as itsuggests "Radix" which isn't relevant here.Maybe "NumericType"?)


API design

Imagine the caller is assigning a great many 16-bit integersvia an array loop.

I worry about whether we're giving the compiler a goodchance at vectorizing such a bulk assignment.Consider offering a second method that moves K integersof identical size.

Imagine the input numbers arealready in native orderand they have a standard {16, 32, 64}-bit size.

Would it help your use case if this routine was giventhe flexibility to report no-op by returningaddress of input buffer, avoiding the memcpy() ?


This code achieves its design goals.Before merging it tomain it should be better documentedand should be accompanied by unit tests.

I would be happy to delegate or accept maintenancetasks on this codebase.

answeredMar 31, 2023 at 16:35
J_H's user avatar
\$\endgroup\$

You mustlog in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.