A computer program exhibits undefined behavior (UB) when it contains, or is executing, code for which its programming language specification does not mandate any specific requirements.[1] This is different from unspecified behavior, for which the language specification does not prescribe a result, and from implementation-defined behavior, which defers to the documentation of another component of the platform (such as the ABI or the translator documentation).
In the C programming community, undefined behavior may be humorously referred to as "nasal demons", after a comp.std.c post that explained undefined behavior as allowing the compiler to do anything it chooses, even "to make demons fly out of your nose".[2]
Some programming languages allow a program to operate differently or even have a different control flow from the source code, as long as it exhibits the same user-visible side effects, provided that undefined behavior never happens during program execution. Undefined behavior is the name of a list of conditions that the program must not meet.
In the early versions of C, undefined behavior's primary advantage was the production of performant compilers for a wide variety of machines: a specific construct could be mapped to a machine-specific feature, and the compiler did not have to generate additional code for the runtime to adapt the side effects to match semantics imposed by the language. The program source code was written with prior knowledge of the specific compiler and of the platforms that it would support.
However, progressive standardization of the platforms has made this less of an advantage, especially in newer versions of C. Now, the cases for undefined behavior typically represent unambiguous bugs in the code, for example indexing an array outside of its bounds. By definition, the runtime can assume that undefined behavior never happens; therefore, some invalid conditions do not need to be checked against. For a compiler, this also means that various program transformations become valid, or their proofs of correctness are simplified; this allows for various kinds of optimizations whose correctness depends on the assumption that the program state never meets any such condition. The compiler can also remove explicit checks that may have been in the source code, without notifying the programmer; for example, detecting undefined behavior by testing whether it happened is not guaranteed to work, by definition. This makes it hard or impossible to program a portable fail-safe option (non-portable solutions are possible for some constructs).
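For some constructs, a check can be made robust by testing the operands before the operation instead of testing the result afterwards. The following is a minimal sketch for signed addition; the helper name `add_would_overflow` is hypothetical and does not come from any standard library.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: reports whether a + b would overflow int.
   It only evaluates subtractions that cannot overflow, so the check
   itself never invokes undefined behavior. */
bool add_would_overflow(int a, int b) {
    if (b > 0) {
        return a > INT_MAX - b;   /* a + b would exceed INT_MAX */
    }
    return a < INT_MIN - b;       /* b <= 0: a + b would fall below INT_MIN */
}
```

A caller would perform `a + b` only when this function returns false. Because the addition is never executed in the overflowing case, the compiler has no undefined behavior to reason from and cannot discard the check.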
Current compiler development usually evaluates and compares compiler performance with benchmarks designed around micro-optimizations, even on platforms that are mostly used on the general-purpose desktop and laptop market (such as amd64). Therefore, undefined behavior provides ample room for compiler performance improvement, as the machine code generated for a specific source code statement is allowed to do anything at runtime.
For C and C++, the compiler is allowed to give a compile-time diagnostic in these cases, but is not required to: the implementation will be considered correct whatever it does in such cases, analogous to don't-care terms in digital logic. It is the responsibility of the programmer to write code that never invokes undefined behavior. Modern compilers have flags that enable such diagnostics; for example, -fsanitize=undefined enables the "undefined behavior sanitizer" (UBSan) in gcc 4.9[3] and in clang. However, this flag is not enabled by default, and enabling it is a choice of the person who builds the code.
Under some circumstances there can be specific restrictions on undefined behavior. For example, the instruction set specifications of a CPU might leave the behavior of some forms of an instruction undefined, but if the CPU supports memory protection then the specification will probably include a blanket rule stating that no user-accessible instruction may cause a hole in the operating system's security; so an actual CPU would be permitted to corrupt user registers in response to such an instruction, but would not be allowed to, for example, switch into supervisor mode.
The runtime platform can also provide some restrictions or guarantees on undefined behavior, if the toolchain or the runtime explicitly documents that specific constructs found in the source code are mapped to specific well-defined mechanisms available at runtime. For example, an interpreter may document a particular behavior for some operations that are undefined in the language specification, while other interpreters or compilers for the same language may not. A compiler produces executable code for a specific ABI, filling the semantic gap in ways that depend on the compiler version: the documentation for that compiler version and the ABI specification can provide restrictions on undefined behavior. Relying on these implementation details makes the software non-portable, but portability may not be a concern if the software is not supposed to be used outside of a specific runtime.
Undefined behavior can result in a program crash or even in failures that are harder to detect and make the program look like it is working normally, such as silent loss of data and production of incorrect results.
In the design of programming languages, an erroneous program is one whose semantics are not well-defined, but where the language implementation is not obligated to signal an error either at compile or at execution time. The Ada language specification, for example, uses this classification.
Defining a condition as "erroneous" means that the language implementation need not perform a potentially expensive check (e.g. that a global variable refers to the same object as a subroutine parameter) but may nonetheless depend on a condition being true in defining the semantics of the program.
Documenting an operation as undefined behavior allows compilers to assume that this operation will never happen in a conforming program. This gives the compiler more information about the code and this information can lead to more optimization opportunities.
An example for the C language:
    int foo(unsigned char x) {
        int value = 2147483600; // assuming 32-bit int and 8-bit char
        value += x;
        if (value < 2147483600) {
            bar();
        }
        return value;
    }
The value of x cannot be negative and, given that signed integer overflow is undefined behavior in C, the compiler can assume that value < 2147483600 will always be false. Thus the if statement, including the call to the function bar, can be ignored by the compiler since the test expression in the if has no side effects and its condition will never be satisfied. The code is therefore semantically equivalent to:
    int foo(unsigned char x) {
        int value = 2147483600;
        value += x;
        return value;
    }
Had the compiler been forced to assume that signed integer overflow has wraparound behavior, then the transformation above would not have been legal.
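For contrast, unsigned arithmetic in C is defined to wrap around, so a check performed after the addition is meaningful and must be preserved by the compiler. A small sketch follows; the function name `unsigned_add_wrapped` is chosen here purely for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* Unsigned arithmetic is defined to wrap modulo UINT_MAX + 1,
   so inspecting the result after the addition is well-defined. */
bool unsigned_add_wrapped(unsigned int a, unsigned int b) {
    unsigned int sum = a + b;   /* wraps on overflow, never undefined */
    return sum < a;             /* true exactly when the addition wrapped */
}
```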
Such optimizations become hard for humans to spot when the code is more complex and other optimizations, like inlining, take place. For example, another function may call the above function:
    void run_tasks(unsigned char *ptrx) {
        int z;
        z = foo(*ptrx);
        while (*ptrx > 60) {
            run_one_task(ptrx, z);
        }
    }
The compiler is free to optimize away the while-loop here by applying value range analysis: by inspecting foo(), it knows that the initial value pointed to by ptrx cannot possibly exceed 47 (as any more would trigger undefined behavior in foo()); therefore, the initial check of *ptrx > 60 will always be false in a conforming program. Going further, since the result z is now never used and foo() has no side effects, the compiler can optimize run_tasks() to be an empty function that returns immediately. The disappearance of the while-loop may be especially surprising if foo() is defined in a separately compiled object file.
Another benefit from allowing signed integer overflow to be undefined is that it makes it possible to store and manipulate a variable's value in a processor register that is larger than the size of the variable in the source code. For example, if the type of a variable as specified in the source code is narrower than the native register width (such as int on a 64-bit machine, a common scenario), then the compiler can safely use a signed 64-bit integer for the variable in the machine code it produces, without changing the defined behavior of the code. If a program depended on the behavior of a 32-bit integer overflow, then a compiler would have to insert additional logic when compiling for a 64-bit machine, because the overflow behavior of most machine instructions depends on the register width.[5]
Undefined behavior also allows more compile-time checks by both compilers and static program analysis.
The C and C++ standards have several forms of undefined behavior throughout, which offer increased liberty in compiler implementations and compile-time checks at the expense of undefined run-time behavior if present. In particular, the ISO standard for C has an appendix listing common sources of undefined behavior.[6] Moreover, compilers are not required to diagnose code that relies on undefined behavior. Hence, it is common for programmers, even experienced ones, to rely on undefined behavior either by mistake, or simply because they are not well-versed in the rules of the language, which can span hundreds of pages. This can result in bugs that are exposed when a different compiler, or different settings, are used. Testing or fuzzing with dynamic undefined behavior checks enabled, e.g., the Clang sanitizers, can help to catch undefined behavior not diagnosed by the compiler or static analyzers.[7]
Undefined behavior can lead to security vulnerabilities in software. For example, buffer overflows and other security vulnerabilities in the major web browsers are due to undefined behavior. When GCC's developers changed their compiler in 2008 such that it omitted certain overflow checks that relied on undefined behavior, CERT issued a warning against the newer versions of the compiler.[8] Linux Weekly News pointed out that the same behavior was observed in PathScale C, Microsoft Visual C++ 2005 and several other compilers;[9] the warning was later amended to warn about various compilers.[10]
The major forms of undefined behavior in C can be broadly classified as:[11] spatial memory safety violations, temporal memory safety violations, integer overflow, strict aliasing violations, alignment violations, unsequenced modifications, data races, and loops that neither perform I/O nor terminate.
In C, the use of any automatic variable before it has been initialized yields undefined behavior, as does integer division by zero, signed integer overflow, indexing an array outside of its defined bounds (see buffer overflow), or null pointer dereferencing. In general, any instance of undefined behavior leaves the abstract execution machine in an unknown state, and causes the behavior of the entire program to be undefined.
Because string literals are usually stored in read-only memory, attempting to modify one causes undefined behavior:[12]
    char *p = "wikipedia"; // valid C, deprecated in C++98/C++03, ill-formed as of C++11
    p[0] = 'W'; // undefined behavior
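A well-defined alternative is to initialize a character array from the literal, which creates a modifiable copy. The sketch below assumes this is the intent; the function name `capitalized` is hypothetical.

```c
/* Returns a pointer to a modifiable array initialized from the literal.
   `static` keeps the array alive after the function returns. */
const char *capitalized(void) {
    static char text[] = "wikipedia";   /* array copy, not a pointer to the literal */
    text[0] = 'W';                      /* fine: modifies the array */
    return text;
}
```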
Integer division by zero results in undefined behavior:[13]
    int x = 1;
    return x / 0; // undefined behavior
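A defensive division can test the divisor first. The sketch below also rejects `INT_MIN / -1`, whose mathematical result is not representable on two's complement systems and is likewise undefined; the names `div_is_defined` and `checked_div` are hypothetical, and the fallback-value convention is only one possible design.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true when num / den has a defined result. */
bool div_is_defined(int num, int den) {
    if (den == 0) {
        return false;                       /* division by zero */
    }
    if (num == INT_MIN && den == -1) {
        return false;                       /* quotient not representable in int */
    }
    return true;
}

/* Divides only when the operation is defined, otherwise returns fallback. */
int checked_div(int num, int den, int fallback) {
    return div_is_defined(num, den) ? num / den : fallback;
}
```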
Certain pointer operations may result in undefined behavior:[14]
    int arr[4] = {0, 1, 2, 3};
    int *p = arr + 5; // undefined behavior for indexing out of bounds
    p = nullptr;
    int a = *p; // undefined behavior for dereferencing a null pointer
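Both cases can be avoided by validating the pointer and the index before the access. This is a minimal sketch under the assumption that the caller knows the array length; `get_or` is a hypothetical helper, not a standard function.

```c
#include <stddef.h>

/* Hypothetical helper: returns arr[i] when arr is non-null and i is
   within [0, len), otherwise returns the caller-supplied fallback. */
int get_or(const int *arr, size_t len, size_t i, int fallback) {
    if (arr == NULL || i >= len) {
        return fallback;   /* no pointer arithmetic or dereference happens */
    }
    return arr[i];
}
```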
In C and C++, the relational comparison of pointers to objects (for less-than or greater-than comparison) is only strictly defined if the pointers point to members of the same object, or elements of the same array.[15] Example:
    int main(void) {
        int a = 0;
        int b = 0;
        return &a < &b; // undefined behavior
    }
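When an ordering of unrelated addresses is genuinely needed, one common non-portable workaround is to compare the pointers after converting them to integers. The sketch below assumes the implementation provides the optional type `uintptr_t`; the resulting order is implementation-defined rather than undefined, and `address_less` is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

/* Compares two addresses as integers. The pointer-to-integer mapping
   is implementation-defined, so the order carries no portable meaning,
   but the comparison itself is not undefined behavior. */
bool address_less(const void *a, const void *b) {
    return (uintptr_t)a < (uintptr_t)b;
}
```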
Reaching the end of a value-returning function (other than main()) without a return statement results in undefined behavior if the value of the function call is used by the caller.[16] In C++, the behavior is undefined even if the value is not used.
    int f() {
    }

    int x = f(); // undefined behavior
Modifying an object between two sequence points more than once produces undefined behavior.[17] There are considerable changes in what causes undefined behavior in relation to sequence points as of C++11.[18] Modern compilers can emit warnings when they encounter multiple unsequenced modifications to the same object.[19][20] The following example will cause undefined behavior in both C and C++.
    int f(int i) {
        // undefined behavior: two unsequenced modifications to i
        return i++ + i++;
    }
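Splitting the expression into separate statements places a sequence point between the two modifications. The sketch below assumes a left-to-right reading of the original expression was intended, and that `i` is small enough for the increments and the sum not to overflow; the function name is hypothetical.

```c
/* Each modification of i is in its own full expression,
   so the two increments are sequenced one after the other. */
int sum_of_two_post_increments(int i) {
    int first = i++;    /* first gets the original i, then i is incremented */
    int second = i++;   /* second gets the incremented value */
    return first + second;
}
```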
When modifying an object between two sequence points, reading the value of the object for any other purpose than determining the value to be stored is also undefined behavior.[21]
    a[i] = i++; // undefined behavior
    printf("%d %d\n", ++n, pow(2, n)); // also undefined behavior
In C and C++, bitwise shifting a value by a number of bits that is either negative or greater than or equal to the total number of bits in the value results in undefined behavior. The safest approach (regardless of compiler vendor) is to always keep the number of bits to shift (the right operand of the << and >> bitwise operators) within the range [0, sizeof value * CHAR_BIT - 1] (where value is the left operand).
    int num = -1;
    unsigned int val = 1 << num; // shifting by a negative number - undefined behavior

    num = 32; // or any number greater than 31
    val = 1 << num; // the literal '1' is typed as a 32-bit integer - in this case shifting by more than 31 bits is undefined behavior

    num = 64; // or any number greater than 63
    unsigned long long val2 = 1ULL << num; // the literal '1ULL' is typed as a 64-bit integer - in this case shifting by more than 63 bits is undefined behavior
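The range rule above can be enforced with a small wrapper. This is a sketch only: `shl_or_zero` is a hypothetical name, returning 0 for an out-of-range count is one possible convention among several, and the width computation assumes `unsigned int` has no padding bits.

```c
#include <limits.h>

/* Shifts left only when count is within [0, width - 1];
   otherwise returns 0 instead of performing an undefined shift. */
unsigned int shl_or_zero(unsigned int value, unsigned int count) {
    if (count >= sizeof value * CHAR_BIT) {
        return 0u;                 /* shift count out of range */
    }
    return value << count;         /* defined: unsigned operand, valid count */
}
```

Taking the count as an unsigned type also rules out negative shift counts at the interface.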
In C#, undefined behavior can be invoked in an unsafe context.
    using System;

    unsafe
    {
        int* p = (int*)0x12345678;
        Console.WriteLine(*p); // reading an arbitrary memory address
    }
Dereferencing a pointer to a local variable after its function has returned (a use-after-free of stack memory) also triggers undefined behavior.
    using System;

    unsafe int* GetPointer()
    {
        int x = 100;
        return &x;
    }

    int* p = GetPointer();
    Console.WriteLine(*p); // gets a pointer to no-longer valid stack memory
In Java, native interop and data races are the most notable cases where undefined behavior occurs.
The following data race can trigger undefined behavior by violating the Java Memory Model.
    int x = 0;
    boolean ready = false;

    Thread t1 = new Thread(() -> {
        x = 33;
        ready = true;
    });

    Thread t2 = new Thread(() -> {
        if (ready) {
            System.out.println(x); // may print 33 or 0
        }
    });

    t1.start();
    t2.start();
Native code invoked through the Java Native Interface can also cause undefined behavior. From C:
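    #include <jni.h>

    // The function name encodes the fully qualified Java class name
    // (org.wikipedia.examples.Crash) followed by the method name.
    JNIEXPORT jint JNICALL Java_org_wikipedia_examples_Crash_boom(JNIEnv *env, jclass cls) {
        int *p = NULL;
        return *p; // dereferencing a null pointer
    }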
From Java:
    package org.wikipedia.examples;

    public class Crash {
        static {
            System.loadLibrary("crash");
        }

        private static native int boom();

        public static void main(String[] args) {
            System.out.println(boom());
        }
    }
Whilst undefined behaviour can generally be expected not to occur in safe Rust, improper unsafe code can still expose UB to safe code in what is known as soundness holes.[22]
As an example, many data types in Rust make use of invariants that allow for useful optimisations. References are one example: whilst fundamentally having the same representation as raw pointers, they may never be null, unaligned, or otherwise point to invalid destinations. Thus, breaking any of these invariants is undefined no matter how the resulting reference is used:
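    use std::mem;

    /// Constructs a null reference.
    pub const fn null_ref<'a, T: ?Sized>() -> &'a T {
        // An explicit lifetime is needed because there is no input to elide it from.
        unsafe { mem::zeroed() }
    }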
Calling null_ref always results in undefined behavior due to the invariants imposed on all reference types, even though the function itself is not an unsafe fn item and can be called from safe code.
Furthermore, dereferencing any null pointer is undefined, although many host systems are still designed to handle such cases with a segmentation fault:
    use std::ptr;

    fn main() {
        let p: *const i32 = ptr::null();
        // SAFETY: `p` is null and may not be dereferenced.
        unsafe { *p };
    }