Expression templates are aC++template metaprogramming technique that builds structures representing a computation at compile time, where expressions areevaluated only as needed to produce efficient code for the entire computation.[1] Expression templates thus allow programmers to bypass the normal order of evaluation of the C++ language and achieve optimizations such asloop fusion.
Expression templates were invented independently by Todd Veldhuizen and David Vandevoorde;[2][3] it was Veldhuizen who gave them their name.[3] They are a popular technique for the implementation oflinear algebra software.[1]
Consider a library representingvectors and operations on them. One common mathematical operation is to add two vectorsu andv, element-wise, to produce a new vector. The obvious C++ implementation of this operation would be anoverloadedoperator+ that returns a new vector object:
importstd;usingstd::array;/// @brief class representing a mathematical 3D vectorclassVec3:publicarray<double,3>{public:Vec3():array<double,3>(){}};/// @brief sum 'u' and 'v' into a new instance of Vec3Vec3operator+(Vec3const&u,Vec3const&v){Vec3sum;for(size_ti=0;i<u.size();i++){sum[i]=u[i]+v[i];}returnsum;}
Users of this class can now writeVec3 x = a + b; wherea andb are both instances ofVec3.
A problem with this approach is that more complicated expressions such asVec3 x = a + b + c are implemented inefficiently. The implementation first produces a temporaryVec3 to holda + b, then produces anotherVec3 with the elements ofc added in. Even withreturn value optimization this will allocate memory at least twice and require two loops.
Delayed evaluation solves this problem, and can be implemented in C++ by lettingoperator+ return an object of an auxiliary type, sayVec3Sum, that represents the unevaluated sum of twoVec3s, or a vector with aVec3Sum, etc. Larger expressions then effectively buildexpression trees that are evaluated only when assigned to an actualVec3 variable. But this requires traversing such trees to do the evaluation, which is in itself costly.[4]
Expression templates implement delayed evaluation using expression trees that only exist at compile time. Each assignment to aVec3, such asVec3 x = a + b + c, generates a newVec3 constructor if needed by template instantiation. This constructor operates on threeVec3; it allocates the necessary memory and then performs the computation. Thus only one memory allocation is performed.
An example implementation of expression templates looks like the following. A base classVec3Expression represents any vector-valued expression. It is templated on the actual expression typeE to be implemented, per thecuriously recurring template pattern. The existence of a base class likeVecExpression is not strictly necessary for expression templates to work. It will merely serve as a function argument type to distinguish the expressions from other types (note the definition of aVec3 constructor andoperator+ below).
importstd;template<typenameE>classVecExpression{public:staticconstexprboolIS_LEAF=false;[[nodiscard]]doubleoperator[](size_ti)const{// Delegation to the actual expression type. This avoids dynamic polymorphism (a.k.a. virtual functions in C++)returnstatic_cast<Econst&>(*this)[i];}[[nodiscard]]size_tsize()const{returnstatic_cast<Econst&>(*this).size();}};
The Booleanis_leaf is there to tagVecExpressions that are "leafs", i.e. that actually contain data. TheVec3 class is a leaf that stores the coordinates of a fully evaluated vector expression, and becomes a subclass ofVecExpression.
importstd;usingstd::array;usingstd::initializer_list;classVec3:publicVecExpression<Vec3>{private:array<double,3>elems;public:staticconstexprboolIS_LEAF=true;[[nodiscard]]doubleoperator[](size_ti)constnoexcept{returnelems[i];}double&operator[](size_ti)noexcept{returnelems[i];}[[nodiscard]]size_tsize()constnoexcept{returnelems.size();}// construct Vec using initializer listVec3(initializer_list<double>init){std::ranges::copy(init,elems.begin());}// A Vec can be constructed from any VecExpression, forcing its evaluation.template<typenameE>Vec3(VecExpression<E>const&expr){for(size_ti=0;i!=expr.size();++i){elems[i]=expr[i];}}};
The sum of twoVec3s is represented by a new type,VecSum, that is templated on the types of the left- and right-hand sides of the sum so that it can be applied to arbitrary pairs ofVec3 expressions. An overloadedoperator+ serves assyntactic sugar for theVecSum constructor. A subtlety intervenes in this case: in order to reference the original data when summing twoVecExpressions,VecSum needs to store aconstreference to eachVecExpression if it is a leaf, otherwise it is a temporary object that needs to be copied to be properly saved.
importstd;usingstd::conditional;template<typenameE1,typenameE2>classVec3Sum:publicVecExpression<Vec3Sum<E1,E2>>{private:// cref if leaf, copy otherwisetypenameconditional<E1::is_leaf,constE1&,constE1>::typeu;typenameconditional<E2::is_leaf,constE2&,constE2>::typev;public:staticconstexprboolIS_LEAF=false;Vec3Sum(E1const&u,E2const&v):u{u},v{v}{assert(u.size()==v.size());}[[nodiscard]]doubleoperator[](size_ti)constnoexcept{returnu[i]+v[i];}[[nodiscard]]size_tsize()constnoexcept{returnv.size();}};template<typenameE1,typenameE2>Vec3Sum<E1,E2>operator+(VecExpression<E1>const&u,VecExpression<E2>const&v){returnVec3Sum<E1,E2>(*static_cast<constE1*>(&u),*static_cast<constE2*>(&v));}
With the above definitions, the expressiona + b + c is of type
Vec3Sum<Vec3Sum<Vec3,Vec3>,Vec3>
soVec3 x = a + b + c invokes the templatedVec3 constructorVec3(VecExpression<E> const& expr) with its template argumentE being this type (meaningVec3Sum<Vec3Sum<Vec3, Vec3>, Vec3>). Inside this constructor, the loop body
elems[i]=expr[i];
is effectively expanded (following the recursive definitions ofoperator+ andoperator[]on this type) to
elems[i]=a.elems[i]+b.elems[i]+c.elems[i];
with no temporaryVec objects needed and only one pass through each memory block.
The following demonstrates basic usage of the above:
importstd;intmain(){Vec3v0{23.4,12.5,144.56};Vec3v1{67.12,34.8,90.34};Vec3v2{34.90,111.9,45.12};// Following assignment will call the ctor of Vec3 which accept type of// `VecExpression<E> const&`. Then expand the loop body to// a.elems[i] + b.elems[i] + c.elems[i]Vec3sumOfVecType=v0+v1+v2;for(size_ti=0;i<sumOfVecType.size();++i){std::println("{}",sumOfVecType[i]);}// To avoid creating any extra storage, other than v0, v1, v2// one can do the following (Tested with C++11 on GCC 5.3.0)autosum=v0+v1+v2;for(size_ti=0;i<sum.size();++i){std::println("{}",sum[i]);}// Observe that in this case typeid(sum) will be Vec3Sum<Vec3Sum<Vec3, Vec3>, Vec3>// and this chaining of operations can go on.}
Expression templates have been found especially useful by the authors of libraries for linear algebra, that is, for dealing with vectors andmatrices of numbers. Among libraries employing expression template areDlib,Armadillo,Blaze,[5]Blitz++,[6]Boost uBLAS,[7]Eigen,[8] POOMA,[9]Stan Math Library,[10] and xtensor.[11] Expression templates can also accelerate C++automatic differentiation implementations,[12] as demonstrated in theAdept library.
Outside of vector math, theSpirit parser framework uses expression templates to representformal grammars and compile these into parsers.