I've written a bytes to number converter for cpp similar to the pythonint.from_bytes.
It does not make any assumptions on:
- The endianness of the input buffer
- The endianness of the system
- The endianness consistency of the the system ( = no assumption that all numeric types have the same endianness )
Additionally the source buffer can be smaller than the target.
#include <cstdint>#include <span>#include <stdexcept>namespace Binary{ // Used to indicate the endianness of a type // Better readability than a simple bool enum class Endianness { Little, Big, }; namespace Int { // Base is the underlying type that will be used to store the eventual result template< typename Base > Base FromBytes( std::span< std::uint8_t > inBuffer, Binary::Endianness inEndianness ) { // Check if we have enough space if( sizeof( Base ) < inBuffer.size() ) { throw std::length_error( "unable to fit data in requested type" ); } // Determine endianness of the Base type on this system union { Base i; char c[ sizeof( Base ) ]; } test = { .i = 1 }; bool tmp = test.c[ sizeof( Base ) - 1 ] == '\1'; Binary::Endianness type_endianness = tmp ? Binary::Endianness::Big : Binary::Endianness::Little; // Always initialize your variables Base result = Base{ 0 }; std::uint8_t * destination = ( std::uint8_t * ) &result; // If we have big endian and we have a smaller buffer the first bytes should // remain zero, meaning we should start filling further back if( type_endianness == Binary::Endianness::Big ) { destination += sizeof( Base ) - inBuffer.size(); } if( type_endianness == inEndianness ) { // equal endianness => simple copy memcpy( destination, inBuffer.data(), inBuffer.size() ); } else { // differing endianness => copy in reverse for( size_t i = 0; i < inBuffer.size(); i+=1 ) { destination[ i ] = inBuffer[ inBuffer.size() - i - 1 ]; } } return result; } }}Are there any improvements I could make?
1 Answer1
I like the enum.Except that it is different from the one C++20defines.
We're missing some review context,like the range of use cases we're trying to address.This submission contains no unit tests or other demo code.It's unclear if caller is likely to call into thisrepeatedly through an array loop.Perhaps "exotic" 24-bit quantities motivated this code.
use constexpr
Every call toFromBytes asks about native endianess.It's a nice piece of code, very clear, but we shouldcache the result as it won't change.Or better, find the result at compile time.
Even better, dispense with it and rely onstd::endian::native.Or write down the underlying requirementsso we understand what systems are beingtargeted and what language / compiler variantsare within scope.
nit: Assigningtmp = test.c[0] == '\1';seems a little more convenient. But perhapsyou wanted the order for the ternaryto be "big, little" for aesthetic reasons.Consider renamingtmp tois_big.
Delete uninformative comments like "Always initialize your variables".
I am glad to see that.size() is the first thing we check. Good.
nit: I find the second form slightly easier to read.
... inBuffer[ inBuffer.size() - i - 1 ]... inBuffer[ inBuffer.size() - 1 - i ]Why? I read it as "constant minusi"where I can reason about the constant locationindependent of the loop.Rather than having to puzzle out the numericrelationship between size andi, observeit is always positive, and then decrementto get to the zero-th element.
document language version
std::reverse_copyhas been available since C++17,but we choose not to use it, unclear why.
Using a standard routine might yield performance identical to a naïve loop.But maybe a vendor went to some trouble tovectorize it for your target processor,and it will go quicker.Use it for the same reason you prefer memcpy().
Tell us which language variants are acceptablefor this project, write it down so futuremaintainers will know the rules.
I was slightly surprised to not seentohl mentioned,norbe64toh, but maybe {3, 5, 6, 7}-byte inBuffersare just as important for the use case we're addressing.Those standard functions should absolutely appear inyour automated test suite.
document the sign bit
The brief review context explicitly says that we implement awell-knowninterface.Wedo support a corresponding endianness parameter.But interpretation of the source sign bit isleft implicit. Presumably it matches theBase type.This should be explicitly written down.
(Also, "Base" seems a slightly odd name as itsuggests "Radix" which isn't relevant here.Maybe "NumericType"?)
API design
Imagine the caller is assigning a great many 16-bit integersvia an array loop.
I worry about whether we're giving the compiler a goodchance at vectorizing such a bulk assignment.Consider offering a second method that moves K integersof identical size.
Imagine the input numbers arealready in native orderand they have a standard {16, 32, 64}-bit size.
Would it help your use case if this routine was giventhe flexibility to report no-op by returningaddress of input buffer, avoiding the memcpy() ?
This code achieves its design goals.Before merging it tomain it should be better documentedand should be accompanied by unit tests.
I would be happy to delegate or accept maintenancetasks on this codebase.
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.

