Constant Interpreter¶
Introduction¶
The constexpr interpreter aims to replace the existing tree evaluator inclang, improving performance on constructs which are executed inefficientlyby the evaluator. The interpreter is activated using the following flags:
-fexperimental-new-constant-interpreterenables the interpreter,emitting an error if an unsupported feature is encountered
Bytecode Compilation¶
Bytecode compilation is handled inCompiler.h for statementsand for expressions. The compiler has two differentbackends: one to generate bytecode for functions (ByteCodeEmitter) andone to directly evaluate expressions as they are compiled, withoutgenerating bytecode (EvalEmitter). All functions are compiled tobytecode, while toplevel expressions used in constant contexts are directlyevaluated since the bytecode would never be reused. This mechanism aims topave the way towards replacing the evaluator, improving its performance onfunctions and loops, while being just as fast on single-use toplevelexpressions.
The interpreter relies on stack-based, strongly-typed opcodes. The gluelogic between the code generator, along with the enumeration anddescription of opcodes, can be found inOpcodes.td. The opcodes areimplemented as generic template methods inInterp.h and instantiatedwith the relevant primitive types by the interpreter loop or by theevaluating emitter.
Primitive Types¶
PT_{U|S}int{8|16|32|64}Signed or unsigned integers of a specific bit width, implemented usingthe
`Integral`type.PT_IntAP{S}Signed or unsigned integers of an arbitrary, but fixed width used toimplement integral types which are required by the target, but are notsupported by the host. Under the hood, they rely on
APInt. TheIntegralspecialisation for these types is required by opcodes toshare an implementation with fixed integrals.PT_BoolRepresentation for boolean types, essentially a 1-bit unsigned
Integral.PT_FloatArbitrary, but fixed precision floating point numbers. Could bespecialised in the future similarly to integers in order to improvefloating point performance.
PT_PtrPointer type, defined in
"Pointer.h". The most common type ofpointer is a “BlockPointer”, which points to aninterp::Block.But other pointer types exist, such as typeid pointers orintegral pointers.PT_FnPtrFunction pointer type, can also be a null function pointer. Definedin
"FunctionPointer.h".PT_MemberPtrMember pointer type, can also be a null member pointer. Definedin
"MemberPointer.h"
Composite types¶
The interpreter distinguishes two kinds of composite types: arrays andrecords (structs and classes). Unions are represented as records, exceptat most a single field can be marked as active. The contents of inactivefields are kept until they are reactivated and overwritten.Complex numbers (_Complex) and vectors(__attribute((vector_size(16)))) are treated as arrays.
Bytecode Execution¶
Bytecode is executed using a stack-based interpreter. The executioncontext consists of anInterpStack, along with a chain ofInterpFrame objects storing the call frames. Frames are built bycall instructions and destroyed by return instructions. They performone allocation to reserve space for all locals in a single block.These objects store all the required information to emit stack traceswhenever evaluation fails.
Memory Organisation¶
Memory management in the interpreter relies on 3 data structures:Blockobjects which store the data and associated inline metadata,Pointerobjects which refer to or into blocks, andDescriptor structures whichdescribe blocks and subobjects nested inside blocks.
Blocks¶
Blocks contain data interleaved with metadata. They are allocated eitherstatically in the code generator (globals, static members, dummy parametervalues etc.) or dynamically in the interpreter, when creating the framecontaining the local variables of a function. Blocks are associated with adescriptor that characterises the entire allocation, along with a fewadditional attributes:
IsStaticindicates whether the block has static duration in theinterpreter, i.e. it is not a local in a frame.DeclIDidentifies each global declaration (it is set to an invalidand irrelevant value for locals) in order to prevent illegal writes andreads involving globals and temporaries with static storage duration.
Static blocks are never deallocated, but local ones might be deallocatedeven when there are live pointers to them. Pointers are only valid aslong as the blocks they point to are valid, so a block with pointers toit whose lifetime ends is kept alive until all pointers to it go out ofscope. Since the frame is destroyed on function exit, such blocks areturned into aDeadBlock and copied to storage managed by theinterpreter itself, not the frame. Reads and writes to these blocks areillegal and cause an appropriate diagnostic to be emitted. When the lastpointer goes out of scope, dead blocks are also deallocated.
The lifetime of blocks is managed through 3 methods stored in thedescriptor of the block:
CtorFn: initializes the metadata which is store in the block,alongside actual data. Invokes the default constructors of objectswhich are not trivial (
Pointer,RealFP, etc.)DtorFn: invokes the destructors of non-trivial objects.
MoveFn: moves a block to dead storage.
Non-static blocks track all the pointers into them through an intrusivedoubly-linked list, required to adjust and invalidate all pointers whentransforming a block into a dead block. If the lifetime of an object ends,all pointers to it are invalidated, emitting the appropriate diagnostics whendereferenced.
The interpreter distinguishes 3 different kinds of blocks:
Primitives
A block containing a single primitive with no additional metadata.
Arrays of primitives
An array of primitives contains a pointer to an
InitMapstorage as itsfirst field: the initialisation map is a bit map indicating all elements ofthe array which were initialised. If the pointer is null, no elements wereinitialised, while a value of(InitMap*)-1indicates that the object wasfully initialised. When all fields are initialised, the map is deallocatedand replaced with that token.Array elements are stored sequentially, without padding, after the pointerto the map.
Arrays of composites and records
Each element in an array of composites is preceded by an
InlineDescriptorwhich stores the attributes specific to the field and not the wholeallocation site. Descriptors and elements are stored sequentially in theblock.Records are laid out identically to arrays of composites: each field and baseclass is preceded by an inline descriptor. TheInlineDescriptorhas the following fields:Offset: byte offset into the array or record, used to step back to theparent array or record.
IsConst: flag indicating if the field is const-qualified.
IsInitialized: flag indicating whether the field or element wasinitialized. For non-primitive fields, this is only relevant to determinethe dynamic type of objects during construction.
IsBase: flag indicating whether the record is a base class. In thatcase, the offset can be used to identify the derived class.
IsActive: indicates if the field is the active field of a union.
IsMutable: indicates if the field is marked as mutable.
Inline descriptors are filled in by theCtorFn of blocks, which leaves storagein an uninitialised, but valid state.
Descriptors¶
Descriptors are generated at bytecode compilation time and contain informationrequired to determine if a particular memory access is allowed in constexpr.They also carry all the information required to emit a diagnostic involvinga memory access, such as the declaration which originates the block.Currently there is a single kind of descriptor encoding information for allblock types.
Pointers¶
Pointers, implemented inPointer.h are represented as a tagged union.
BlockPointer: used to reference memory allocated and managed by theinterpreter, being the only pointer kind which allows dereferencing in theinterpreter
TypeIDPointer: tracks information for the opaque type returned by
typeidIntegralPointer: a pointer formed from an integer,think
(int*)123.
Besides the previously mentioned union, a number of other pointer-like typeshave their own type:
FunctionPointer tracks functions.
MemberPointer tracks C++ object members
BlockPointer¶
Block pointers track aPointee, the block to which they point, alongwith aBase and anOffset. The base identifies the innermost field,while the offset points to an array element relative to the base (includingone-past-end pointers). The offset identifies the array element or fieldwhich is referenced, while the base points to the outer object or array whichcontains the field. These two fields allow all pointers to be uniquelyidentified, disambiguated and characterised.
As an example, consider the following structure:
structA{structB{intx;inty;}b;structC{inta;intb;}c[2];intz;};constexprAa;
On the target,&a and&a.b.x are equal. So are&a.c[0] and&a.c[0].a. In the interpreter, all these pointers must bedistinguished since the are all allowed to address distinct range ofmemory.
In the interpreter, the object would require 240 bytes of storage andwould have its field interleaved with metadata. The pointers which canbe derived to the object are illustrated in the following diagram:
016324056648096112120136144160176184200208224240+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---++B|D|D|x|D|y|D|D|D|a|D|b|D|D|a|D|b|D|z|+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+^^^^^^^^^^^^|||||||&a.c[0].b||&a.c[1].b|a|&a.b.x&a.y&a.c|&a.c[0].a|&a.c[1].a|&a.b&a.c[0]&a.c[1]&a.z
TheBase offset of all pointers points to the start of a field oran array and is preceded by an inline descriptor (unlessBase iszero, pointing to the root). All the relevant attributes can be readfrom either the inline descriptor or the descriptor of the block.
Array elements are identified by theOffset field of pointers,pointing to past the inline descriptors for composites and beforethe actual data in the case of primitive arrays. TheOffsetpoints to the offset where primitives can be read from. As an example,a.c+1 would have the same base asa.c since it is an elementofa.c, but its offset would point to&a.c[1]. Thearray-to-pointer decay operation adjusts a pointer to an array (wherethe offset is equal to the base) to a pointer to the first element.
TypeInfoPointer¶
TypeInfoPointer tracks two types: the type assigned tostd::type_info and the type which was passed totypeinfo.It is part of the taged union inPointer.