Type punning

From Wikipedia, the free encyclopedia

Technique circumventing programming language data typing

In computer science, type punning is any programming technique that subverts or circumvents the type system of a programming language in order to achieve an effect that would be difficult or impossible to achieve within the bounds of the formal language.

In C and C++, constructs such as pointer type conversion and union (C++ adds reference type conversion and reinterpret_cast to this list) are provided in order to permit many kinds of type punning, although some kinds are not actually supported by the standard language.

In the Pascal programming language, records with variants may be used to treat a particular data type in more than one manner, or in a manner not normally permitted.

Sockets example

One classic example of type punning is found in the Berkeley sockets interface. The function to bind an opened but uninitialized socket to an IP address is declared as follows:

    int bind(int sockfd, struct sockaddr *my_addr, socklen_t addrlen);

The bind function is usually called as follows:

    struct sockaddr_in sa = {0};
    int sockfd = ...;
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    bind(sockfd, (struct sockaddr *)&sa, sizeof sa);

The Berkeley sockets library fundamentally relies on the fact that in C, a pointer to struct sockaddr_in is freely convertible to a pointer to struct sockaddr; and, in addition, that the two structure types share a common initial memory layout, each beginning with the address-family field. Therefore, a reference to the structure field my_addr->sa_family (where my_addr is of type struct sockaddr*) will actually refer to the field sa.sin_family (where sa is of type struct sockaddr_in). In other words, the sockets library uses type punning to implement a rudimentary form of polymorphism or inheritance.
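The same idea can be sketched without the real sockets headers. The following is a minimal illustration in C, not the actual sockets structures: every name in it (demo_addr, demo_addr_inet, demo_addr_local, demo_port_of) is hypothetical. It uses a stricter variant of the technique in which each specific type embeds the generic header as its first member, a case where C guarantees that a pointer to the structure and a pointer to its first member can be converted to one another.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; they only mimic the sockets design. */
enum { DEMO_FAMILY_LOCAL = 1, DEMO_FAMILY_INET = 2 };

struct demo_addr {              /* generic header, in the role of struct sockaddr */
    unsigned short family;
};

struct demo_addr_inet {         /* specific type, in the role of struct sockaddr_in */
    struct demo_addr base;      /* must remain the first member */
    unsigned short   port;
    unsigned char    host[4];
};

struct demo_addr_local {        /* a second specific type with a different payload */
    struct demo_addr base;
    char             path[16];
};

static_assert(offsetof(struct demo_addr_inet, base) == 0,
              "generic header must be the first member");
static_assert(offsetof(struct demo_addr_local, base) == 0,
              "generic header must be the first member");

/* Accepts any address type through the generic pointer and dispatches on
   the family tag. Returns the port for inet addresses and 0 otherwise. */
unsigned demo_port_of(const struct demo_addr *addr)
{
    if (addr->family == DEMO_FAMILY_INET) {
        const struct demo_addr_inet *in = (const struct demo_addr_inet *)addr;
        return in->port;
    }
    return 0;
}
```

A caller passes (const struct demo_addr *)&specific_address, in the same way that a sockets program casts its struct sockaddr_in pointer when calling bind.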

Another common practice is the use of "padded" data structures, which allow different kinds of values to be stored in what is effectively the same storage space. This is often done as an optimization when two structures are used in mutual exclusivity.

Floating-point example

Not all examples of type punning involve structures, as the previous example did. Suppose we want to determine whether a floating-point number is negative. We could write:

    bool is_negative(float x) {
        return x < 0.0f;
    }

However, supposing that floating-point comparisons are expensive, and also supposing that float is represented according to the IEEE floating-point standard, and integers are 32 bits wide, we could engage in type punning to extract the sign bit of the floating-point number using only integer operations:

    bool is_negative(float x) {
        int *i = (int *)&x;
        return *i < 0;
    }

Note that the behaviour will not be exactly the same: in the special case of x being negative zero, the first implementation yields false while the second yields true. Also, the first implementation will return false for any NaN value, but the latter might return true for NaN values with the sign bit set. Lastly, floating-point data may be stored in big-endian or little-endian memory order, so the byte that holds the sign bit may be stored first or last in memory, and the integer test is only valid if integers use the same byte order. Type punning on floating-point data is therefore a questionable method with platform-dependent results.
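The byte-order point can be checked empirically. The following is a minimal sketch in C, assuming a 4-byte IEEE 754 float in which negative zero has only the sign bit set; the function name sign_byte_index is hypothetical. It inspects the object representation through unsigned char, which the C aliasing rules permit for any object.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: float is an IEEE 754 binary32, so -0.0f has exactly one
   bit set (the sign bit) in its object representation. */
static_assert(sizeof(float) == 4, "sketch assumes a 4-byte float");

/* Returns the index of the byte that holds the sign bit, or -1 if every
   byte of -0.0f is zero (meaning the assumption above does not hold). */
int sign_byte_index(void)
{
    float neg_zero = -0.0f;
    const unsigned char *bytes = (const unsigned char *)&neg_zero;

    for (size_t k = 0; k < sizeof neg_zero; k++) {
        if (bytes[k] != 0)
            return (int)k;
    }
    return -1;
}
```

On a little-endian machine the result is typically 3 (the last byte), and on a big-endian machine typically 0 (the first byte).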

This kind of type punning is more dangerous than most. Whereas the former example relied only on guarantees made by the C programming language about structure layout and pointer convertibility, the latter example relies on assumptions about a particular system's hardware. The C99 language specification (ISO 9899:1999) has the following warning in section 6.3.2.3, Pointers: "A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined." Therefore, one should be very careful with the use of type punning.

Some situations, such as time-critical code that the compiler otherwise fails to optimize, may require dangerous code. In these cases, documenting all such assumptions in comments, and introducing static assertions to verify portability expectations, helps to keep the code maintainable.

Practical examples of floating-point punning include fast inverse square root popularized by Quake III, fast FP comparison as integers,^[1] and finding neighboring values by incrementing as an integer (implementing nextafter).^[2]
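The last of these tricks can be sketched as follows. This is a minimal illustration in C, not the implementation of the standard nextafter function: the name next_up_positive is hypothetical, and the sketch assumes an IEEE 754 binary32 float and an input that is finite, has its sign bit clear, and is smaller than FLT_MAX. It copies the representation with memcpy rather than casting pointers.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Next representable float above x.
   Assumptions: IEEE 754 binary32; x is finite, has its sign bit clear,
   and is smaller than FLT_MAX. Other inputs are not handled. */
float next_up_positive(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* copy the representation out */
    bits += 1;                       /* adjacent bit patterns are adjacent values */
    memcpy(&x, &bits, sizeof x);     /* copy the new representation back */
    return x;
}
```

For example, next_up_positive(1.0f) yields 1.0f + FLT_EPSILON, because FLT_EPSILON is defined as the gap between 1 and the next representable float.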

By language

C and C++

In addition to the assumption about bit-representation of floating-point numbers, the above floating-point type-punning example also violates the C language's constraints on how objects are accessed:^[3] the declared type of x is float but it is read through an expression of type int. On many common platforms, this use of pointer punning can create problems if different pointers are aligned in machine-specific ways. Furthermore, pointers of different sizes can alias accesses to the same memory, causing problems that are unchecked by the compiler. Even when data size and pointer representation match, however, compilers can rely on the non-aliasing constraints to perform optimizations that would be unsafe in the presence of disallowed aliasing.

Use of pointers

A naive attempt at type punning can be made by using pointers. (The following running example assumes IEEE 754 bit-representation for type float.)

    #include <stdbool.h>
    #include <stdint.h>

    bool is_negative(float x) {
        int32_t i = *(int32_t *)&x; // In C++ this is equivalent to: int32_t i = *reinterpret_cast<int32_t*>(&x);
        return i < 0;
    }

The C standard's aliasing rules state that an object shall have its stored value accessed only by an lvalue expression of a compatible type.^[4] The types float and int32_t are not compatible, therefore this code's behavior is undefined. Although on GCC and LLVM this particular program compiles and runs as expected, more complicated examples may interact with assumptions made by strict aliasing and lead to unwanted behavior. The option -fno-strict-aliasing will ensure correct behavior of code using this form of type punning, although using other forms of type punning is recommended.^[5]
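One commonly used alternative that stays within the aliasing rules is to copy the object representation with memcpy instead of reading it through a converted pointer. The following is a minimal sketch in C under the same IEEE 754 assumption; the name is_negative_memcpy is hypothetical, chosen to distinguish it from the running example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Tests the sign bit of x without violating the aliasing rules.
   Assumption: float is an IEEE 754 binary32 with the sign in bit 31. */
bool is_negative_memcpy(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);  /* byte-wise copy, no pointer punning */
    return (bits >> 31) != 0;
}
```

Like the pointer version, this returns true for negative zero. Optimizing compilers typically reduce such a small fixed-size copy to a register move, so the function call to memcpy usually does not survive in the generated code.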

Use of `union`

In C, but not in C++, it is sometimes possible to perform type punning via a union.

    bool is_negative(float x) {
        union {
            int i;
            float d;
        } my_union;
        my_union.d = x;
        return my_union.i < 0;
    }

Accessing my_union.i after most recently writing to the other member, my_union.d, is an allowed form of type punning in C,^[6] provided that the member read is not larger than the one whose value was set (otherwise the read has unspecified behavior^[7]). The same is syntactically valid but has undefined behavior in C++,^[8] however, where only the last-written member of a union is considered to have any value at all.
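The size condition can be stated in the code itself. The following is a minimal sketch in C only (the same source would have undefined behavior in C++), assuming an IEEE 754 binary32 float; the names float_bits and biased_exponent are hypothetical. A static assertion confirms that both members have the same size before one is read in place of the other.

```c
#include <assert.h>
#include <stdint.h>

union float_bits {
    float    f;
    uint32_t u;
};

/* Both members must have the same size, so the member that is read is
   never larger than the member that was written. */
static_assert(sizeof(float) == sizeof(uint32_t),
              "sketch assumes a 32-bit float");

/* Returns the biased exponent field (bits 23 to 30) of x.
   Assumption: float is an IEEE 754 binary32. */
unsigned biased_exponent(float x)
{
    union float_bits pun;
    pun.f = x;                              /* write one member ... */
    return (unsigned)((pun.u >> 23) & 0xFFu); /* ... read the other */
}
```

Under these assumptions biased_exponent(1.0f) is 127, the exponent bias of the binary32 format.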

For another example of type punning, see Stride of an array.

Use of `bit_cast`

In C++20, the std::bit_cast function allows type punning with no undefined behavior. It also allows the function to be labeled constexpr.

    #include <bit>
    #include <cstdint>
    #include <limits>

    constexpr bool is_negative(float x) noexcept {
        static_assert(std::numeric_limits<float>::is_iec559); // (enable only on IEEE 754)
        auto i = std::bit_cast<std::int32_t>(x);
        return i < 0;
    }

Pascal

A variant record permits treating a data type as multiple kinds of data depending on which variant is being referenced. In the following example, integer is presumed to be 16 bits wide, longint and real are presumed to be 32 bits wide, and char is presumed to be 8 bits wide:

    type
        VariantRecord = record
            case RecType : LongInt of
                1: (I : array[1..2] of Integer);  (* not shown here: there can be several variables in a variant record's case statement *)
                2: (L : LongInt);
                3: (R : Real);
                4: (C : array[1..4] of Char);
        end;

    var
        V  : VariantRecord;
        K  : Integer;
        LA : LongInt;
        RA : Real;
        Ch : Char;

    V.I[1] := 1;
    Ch     := V.C[1];  (* this would extract the first byte of V.I *)
    V.R    := 8.3;
    LA     := V.L;     (* this would store a Real into an Integer *)

In Pascal, copying a real to an integer converts it to the truncated value. This method would translate the binary value of the floating-point number into whatever it is as a long integer (32 bit), which will not be the same and may be incompatible with the long integer value on some systems.

These examples could be used to create strange conversions, although, in some cases, there may be legitimate uses for these types of constructs, such as for determining locations of particular pieces of data. In the following example a pointer and a longint are both presumed to be 32 bit:

    type
        PA = ^Arec;

        Arec = record
            case RT : LongInt of
                1: (P : PA);
                2: (L : LongInt);
        end;

    var
        PP : PA;
        K  : LongInt;

    New(PP);
    PP^.P := PP;
    WriteLn('Variable PP is located at address ', Hex(PP^.L));

Here, "new" is the standard routine in Pascal for allocating memory for a pointer, and "hex" is presumably a routine to print the hexadecimal string describing the value of an integer. This would allow the display of the address of a pointer, something which is not normally permitted. (Pointers cannot be read or written, only assigned.) Assigning a value to an integer variant of a pointer would allow examining or writing to any location in system memory:

    PP^.L := 0;
    PP    := PP^.P;  (* PP now points to address 0     *)
    K     := PP^.L;  (* K contains the value of word 0 *)
    WriteLn('Word 0 of this machine contains ', K);

This construct may cause a program check or protection violation if address 0 is protected against reading on the machine the program is running upon or the operating system it is running under.

The reinterpret cast technique from C/C++ also works in Pascal. This can be useful, for example, when reading dwords from a byte stream that are to be treated as floating-point values. The following example reinterpret-casts a dword to a float:

    type
        pReal = ^Real;

    var
        DW : DWord;
        F  : Real;

    F := pReal(@DW)^;

C#

In C# (and other .NET languages), type punning is a little harder to achieve because of the type system, but can be done nonetheless, using pointers or struct unions.

Pointers

C# only allows pointers to so-called native types, i.e. any primitive type (except string), enum, array or struct that is composed only of other native types. Note that pointers are only allowed in code blocks marked 'unsafe'.

    float pi = 3.14159f;
    uint piAsRawData = *(uint*)&pi;

Struct unions

Struct unions are allowed without any notion of 'unsafe' code, but they do require the definition of a new type.

    [StructLayout(LayoutKind.Explicit)]
    struct FloatAndUIntUnion
    {
        [FieldOffset(0)]
        public float DataAsFloat;

        [FieldOffset(0)]
        public uint DataAsUInt;
    }

    // ...

    FloatAndUIntUnion union = default; // initialize so that both fields count as assigned
    union.DataAsFloat = 3.14159f;
    uint piAsRawData = union.DataAsUInt;

Raw CIL code

Raw CIL can be used instead of C#, because it doesn't have most of the type limitations. This allows one to, for example, combine two enum values of a generic type:

    TEnum a = ...;
    TEnum b = ...;
    TEnum combined = a | b; // illegal

This can be circumvented by the following CIL code:

    .method public static hidebysig
        !!TEnum CombineEnums<valuetype .ctor ([mscorlib]System.ValueType) TEnum>(
            !!TEnum a,
            !!TEnum b
        ) cil managed
    {
        .maxstack 2

        ldarg.0
        ldarg.1
        or  // this will not cause an overflow, because a and b have the same type, and therefore the same size.
        ret
    }

The cpblk CIL opcode allows for some other tricks, such as converting a struct to a byte array:

    .method public static hidebysig
        uint8[] ToByteArray<valuetype .ctor ([mscorlib]System.ValueType) T>(
            !!T& v // 'ref T' in C#
        ) cil managed
    {
        .locals init (
            [0] uint8[]
        )

        .maxstack 3

        // create a new byte array with length sizeof(T) and store it in local 0
        sizeof !!T
        newarr uint8
        dup           // keep a copy on the stack for later (1)
        stloc.0

        ldc.i4.0
        ldelema uint8

        // memcpy(local 0, &v, sizeof(T));
        // <the array is still on the stack, see (1)>
        ldarg.0 // this is the *address* of 'v', because its type is '!!T&'
        sizeof !!T
        cpblk

        ldloc.0
        ret
    }

References

[1] Herf, Michael (December 2001). "radix tricks". stereopsis : graphics.
[2] "Stupid Float Tricks". Random ASCII - tech blog of Bruce Dawson. 24 January 2012.
[3] ISO/IEC 9899:1999 s6.5/7
[4] "§ 6.5/7" (PDF), ISO/IEC 9899:2018, 2018, p. 55, archived from the original (PDF) on 2018-12-30, "An object shall have its stored value accessed only by an lvalue expression that has one of the following types: [...]"
[5] "GCC Bugs - GNU Project". gcc.gnu.org.
[6] "§ 6.5.2.3/3, footnote 97" (PDF), ISO/IEC 9899:2018, 2018, p. 59, archived from the original (PDF) on 2018-12-30, "If the member used to read the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation."
[7] "§ J.1/1, bullet 11" (PDF), ISO/IEC 9899:2018, 2018, p. 403, archived from the original (PDF) on 2018-12-30, "The following are unspecified: … The values of bytes that correspond to union members other than the one last stored into (6.2.6.1)."
[8] ISO/IEC 14882:2011 Section 9.5

External links

Section of the GCC manual on -fstrict-aliasing, which defeats some type punning
Defect Report 257 to the C99 standard, incidentally defining "type punning" in terms of union, and discussing the issues surrounding the implementation-defined behavior of the last example above
Defect Report 283 on the use of unions for type punning