TableGen Overview¶
Introduction¶
TableGen’s purpose is to help a human develop and maintain records of domain-specific information. Because there may be a large number of these records, it is specifically designed to allow writing flexible descriptions and for common features of these records to be factored out. This reduces the amount of duplication in the description, reduces the chance of error, and makes it easier to structure domain-specific information.
The TableGen front end parses a file, instantiates the declarations, and hands the result off to a domain-specific backend for processing. See the TableGen Programmer’s Reference for an in-depth description of TableGen. See tblgen - Description to C++ Code for details on the *-tblgen commands that run the various flavors of TableGen.
The current major users of TableGen are The LLVM Target-Independent Code Generator and the Clang diagnostics and attributes.
Note that if you work with TableGen frequently and use emacs or vim, you can find an emacs “TableGen mode” and a vim language file in the llvm/utils/emacs and llvm/utils/vim directories of your LLVM distribution, respectively.
The TableGen program¶
TableGen files are interpreted by the TableGen program, llvm-tblgen, which is available in your build directory under bin. It is not installed on the system (or wherever your sysroot is set to), since it has no use beyond LLVM’s build process.
Running TableGen¶
TableGen runs just like any other LLVM tool. The first (optional) argument specifies the file to read. If a filename is not specified, llvm-tblgen reads from standard input.
To be useful, one of the backends must be used. These backends are selectable on the command line (type ‘llvm-tblgen -help’ for a list). For example, to get a list of all of the definitions that subclass a particular type (which can be useful for building up an enum list of these records), use the -print-enums option:
    $ llvm-tblgen X86.td -print-enums -class=Register
    AH, AL, AX, BH, BL, BP, BPL, BX, CH, CL, CX, DH, DI, DIL, DL, DX, EAX, EBP, EBX,
    ECX, EDI, EDX, EFLAGS, EIP, ESI, ESP, FP0, FP1, FP2, FP3, FP4, FP5, FP6, IP,
    MM0, MM1, MM2, MM3, MM4, MM5, MM6, MM7, R10, R10B, R10D, R10W, R11, R11B, R11D,
    R11W, R12, R12B, R12D, R12W, R13, R13B, R13D, R13W, R14, R14B, R14D, R14W, R15,
    R15B, R15D, R15W, R8, R8B, R8D, R8W, R9, R9B, R9D, R9W, RAX, RBP, RBX, RCX, RDI,
    RDX, RIP, RSI, RSP, SI, SIL, SP, SPL, ST0, ST1, ST2, ST3, ST4, ST5, ST6, ST7,
    XMM0, XMM1, XMM10, XMM11, XMM12, XMM13, XMM14, XMM15, XMM2, XMM3, XMM4, XMM5,
    XMM6, XMM7, XMM8, XMM9,

    $ llvm-tblgen X86.td -print-enums -class=Instruction
    ABS_F, ABS_Fp32, ABS_Fp64, ABS_Fp80, ADC32mi, ADC32mi8, ADC32mr, ADC32ri,
    ADC32ri8, ADC32rm, ADC32rr, ADC64mi32, ADC64mi8, ADC64mr, ADC64ri32, ADC64ri8,
    ADC64rm, ADC64rr, ADD16mi, ADD16mi8, ADD16mr, ADD16ri, ADD16ri8, ADD16rm,
    ADD16rr, ADD32mi, ADD32mi8, ADD32mr, ADD32ri, ADD32ri8, ADD32rm, ADD32rr,
    ADD64mi32, ADD64mi8, ADD64mr, ADD64ri32, ...
The default backend prints out all of the records. There is also a general backend which outputs all the records as a JSON data structure, enabled using the -dump-json option.
If you plan to use TableGen, you will most likely have to write a backend that extracts the information specific to what you need and formats it in the appropriate way. You can do this by extending TableGen itself in C++, or by writing a script in any language that can consume the JSON output.
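As a rough illustration of the scripting route, the sketch below reads a JSON dump and extracts one piece of information from it. The embedded sample is hypothetical and heavily trimmed; it assumes the layout documented for the JSON backend, where the top-level "!instanceof" key maps each class name to the definitions derived from it and every definition is an object keyed by field name. The function name and the output format are illustrative only.

```python
import json

# Hypothetical, heavily trimmed stand-in for `llvm-tblgen -dump-json` output.
# A real dump contains many more records and many more fields per record.
SAMPLE_DUMP = """
{
  "!tablegen_json_version": 1,
  "!instanceof": {"Instruction": ["ADD32rr", "SUB32rr"]},
  "ADD32rr": {"!name": "ADD32rr", "!anonymous": false,
              "!superclasses": ["Instruction"],
              "Namespace": "X86", "isCommutable": 1},
  "SUB32rr": {"!name": "SUB32rr", "!anonymous": false,
              "!superclasses": ["Instruction"],
              "Namespace": "X86", "isCommutable": 0}
}
"""


def commutable_instructions(dump):
    """Return namespace-qualified names of Instruction records marked commutable."""
    result = []
    for name in dump["!instanceof"].get("Instruction", []):
        record = dump[name]
        # Fields of type `bit` are assumed to appear as the integers 0 or 1.
        if record.get("isCommutable") == 1:
            result.append(f'{record["Namespace"]}::{name}')
    return result


dump = json.loads(SAMPLE_DUMP)
print(commutable_instructions(dump))
```

In practice the script would read the output of llvm-tblgen run with -dump-json instead of the embedded string; the rest of the logic would stay the same.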
Example¶
With no other arguments, llvm-tblgen parses the specified file and prints out all of the classes, then all of the definitions. This is a good way to see what the various definitions expand to fully. Running this on the X86.td file prints this (at the time of this writing):
    ...
    def ADD32rr {   // Instruction X86Inst I
      string Namespace = "X86";
      dag OutOperandList = (outs GR32:$dst);
      dag InOperandList = (ins GR32:$src1, GR32:$src2);
      string AsmString = "add{l}\t{$src2, $dst|$dst, $src2}";
      list<dag> Pattern = [(set GR32:$dst, (add GR32:$src1, GR32:$src2))];
      list<Register> Uses = [];
      list<Register> Defs = [EFLAGS];
      list<Predicate> Predicates = [];
      int CodeSize = 3;
      int AddedComplexity = 0;
      bit isReturn = 0;
      bit isBranch = 0;
      bit isIndirectBranch = 0;
      bit isBarrier = 0;
      bit isCall = 0;
      bit canFoldAsLoad = 0;
      bit mayLoad = 0;
      bit mayStore = 0;
      bit isImplicitDef = 0;
      bit isConvertibleToThreeAddress = 1;
      bit isCommutable = 1;
      bit isTerminator = 0;
      bit isReMaterializable = 0;
      bit isPredicable = 0;
      bit hasDelaySlot = 0;
      bit usesCustomInserter = 0;
      bit hasCtrlDep = 0;
      bit isNotDuplicable = 0;
      bit hasSideEffects = 0;
      InstrItinClass Itinerary = NoItinerary;
      string Constraints = "";
      string DisableEncoding = "";
      bits<8> Opcode = { 0, 0, 0, 0, 0, 0, 0, 1 };
      Format Form = MRMDestReg;
      bits<6> FormBits = { 0, 0, 0, 0, 1, 1 };
      ImmType ImmT = NoImm;
      bits<3> ImmTypeBits = { 0, 0, 0 };
      bit hasOpSizePrefix = 0;
      bit hasAdSizePrefix = 0;
      bits<4> Prefix = { 0, 0, 0, 0 };
      bit hasREX_WPrefix = 0;
      FPFormat FPForm = ?;
      bits<3> FPFormBits = { 0, 0, 0 };
    }
    ...
This definition corresponds to the 32-bit register-register add instruction of the x86 architecture. def ADD32rr defines a record named ADD32rr, and the comment at the end of the line indicates the superclasses of the definition. The body of the record contains all of the data that TableGen assembled for the record, indicating that the instruction is part of the “X86” namespace, the pattern indicating how the instruction is selected by the code generator, that it is a two-address instruction, has a particular encoding, etc. The contents and semantics of the information in the record are specific to the needs of the X86 backend, and are only shown as an example.
As you can see, a lot of information is needed for every instruction supported by the code generator, and specifying it all manually would be unmaintainable, prone to bugs, and tiring to do in the first place. Because we are using TableGen, all of the information was derived from the following definition:
    let Defs = [EFLAGS],
        isCommutable = 1,                  // X = ADD Y,Z --> X = ADD Z,Y
        isConvertibleToThreeAddress = 1 in // Can transform into LEA.
    def ADD32rr  : I<0x01, MRMDestReg, (outs GR32:$dst),
                                       (ins GR32:$src1, GR32:$src2),
                     "add{l}\t{$src2, $dst|$dst, $src2}",
                     [(set GR32:$dst, (add GR32:$src1, GR32:$src2))]>;
This definition makes use of the custom class I (extended from the custom class X86Inst), which is defined in the X86-specific TableGen file, to factor out the common features that instructions of its class share. A key feature of TableGen is that it allows the end-user to define the abstractions they prefer to use when describing their information.
Syntax¶
TableGen has a syntax that is loosely based on C++ templates, with built-in types and specification. In addition, TableGen’s syntax introduces some automation concepts like multiclass, foreach, let, etc.
Basic concepts¶
TableGen files consist of two key parts: ‘classes’ and ‘definitions’, both of which are considered ‘records’.
TableGen records have a unique name, a list of values, and a list of superclasses. The list of values is the main data that TableGen builds for each record; it is this that holds the domain-specific information for the application. The interpretation of this data is left to a specific backend, but the structure and format rules are taken care of and are fixed by TableGen.
TableGen definitions are the concrete form of ‘records’. These generally do not have any undefined values, and are marked with the ‘def’ keyword.
    def FeatureFPARMv8 : SubtargetFeature<"fp-armv8", "HasFPARMv8", "true",
                                          "Enable ARMv8 FP">;
In this example, FeatureFPARMv8 is a SubtargetFeature record initialised with some values. The names of the classes are defined via the keyword class, either in the same file or in some other included file. Most target TableGen files include the generic ones in include/llvm/Target.
TableGen classes are abstract records that are used to build and describe other records. These classes allow the end-user to build abstractions for either the domain they are targeting (such as “Register”, “RegisterClass”, and “Instruction” in the LLVM code generator) or for the implementor to help factor out common properties of records (such as “FPInst”, which is used to represent floating point instructions in the X86 backend). TableGen keeps track of all of the classes that are used to build up a definition, so the backend can find all definitions of a particular class, such as “Instruction”.
    class ProcNoItin<string Name, list<SubtargetFeature> Features>
          : Processor<Name, NoItineraries, Features>;
Here, the class ProcNoItin, which takes a parameter Name of type string and a list of target features, specializes the class Processor by passing those arguments down and hard-coding NoItineraries.
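Because TableGen records which classes each definition was built from, a consumer can query definitions by class. The sketch below shows one way a script might do that against a JSON dump. The sample data is hypothetical and trimmed to the bare minimum; it assumes the JSON backend's "!instanceof" and "!superclasses" keys, and the helper names are made up for this example.

```python
import json

# Hypothetical, heavily trimmed stand-in for a JSON dump of a target file.
SAMPLE_DUMP = """
{
  "!tablegen_json_version": 1,
  "!instanceof": {
    "Register": ["EAX", "AL"],
    "Instruction": ["ADD32rr"],
    "X86Inst": ["ADD32rr"],
    "I": ["ADD32rr"]
  },
  "AL": {"!name": "AL", "!anonymous": false, "!superclasses": ["Register"]},
  "EAX": {"!name": "EAX", "!anonymous": false, "!superclasses": ["Register"]},
  "ADD32rr": {"!name": "ADD32rr", "!anonymous": false,
              "!superclasses": ["Instruction", "X86Inst", "I"]}
}
"""


def defs_of_class(dump, class_name):
    """Return the sorted names of all definitions derived from class_name."""
    return sorted(dump["!instanceof"].get(class_name, []))


def superclasses_of(dump, def_name):
    """Return the classes that were used to build the definition def_name."""
    return list(dump[def_name]["!superclasses"])


dump = json.loads(SAMPLE_DUMP)
print(defs_of_class(dump, "Register"))    # ['AL', 'EAX']
print(superclasses_of(dump, "ADD32rr"))   # ['Instruction', 'X86Inst', 'I']
```

The first query is the scripted counterpart of running llvm-tblgen with -print-enums -class=Register; the second recovers the same superclass list that the default backend prints as a comment after the def line.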
TableGen multiclasses are groups of abstract records that are instantiated all at once. Each instantiation can result in multiple TableGen definitions. If a multiclass inherits from another multiclass, the definitions in the sub-multiclass become part of the current multiclass, as if they were declared in the current multiclass.
    multiclass ro_signed_pats<string T, string Rm, dag Base, dag Offset, dag Extend,
                              dag address, ValueType sty> {
    def : Pat<(i32 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "w_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;

    def : Pat<(i64 (!cast<SDNode>("sextload" # sty) address)),
              (!cast<Instruction>("LDRS" # T # "x_" # Rm # "_RegOffset")
                Base, Offset, Extend)>;
    }

    defm : ro_signed_pats<"B", Rm, Base, Offset, Extend,
                          !foreach(decls.pattern, address,
                                   !subst(SHIFT, imm_eq0, decls.pattern)),
                          i8>;
See the TableGen Programmer’s Reference for an in-depth description of TableGen.
TableGen backends¶
TableGen files have no real meaning without a backend. The default operation when running *-tblgen is to print the information in a textual format, but that’s only useful for debugging the TableGen files themselves. The power of TableGen, however, lies in interpreting the source files into an internal representation from which a backend can generate anything you want.
TableGen is currently used to create huge include files with tables that you can either include directly (if the output is in the language you’re coding in) or use in pre-processing via macros surrounding the include of the file.
Direct output can be used if the backend already prints a table in C format or if the output is just a list of strings (for error and warning messages). Pre-processed output should be used if the same information needs to be used in different contexts (like Instruction names), so your backend should print a meta-information list that can be shaped into different compile-time formats.
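To make the pre-processed style more concrete, here is a minimal sketch of a script acting as a backend that prints such a meta-information list: one macro invocation per record, with the macro left for the including file to define. The macro name HANDLE_INSTR, the function name, and the exact guard lines are assumptions made for this example and do not describe any existing backend.

```python
def emit_macro_list(names, macro="HANDLE_INSTR"):
    """Render one MACRO(name) line per record name.

    The includer is expected to define the macro before including the
    generated file; the file undefines it again at the end.
    """
    lines = [
        f"#ifndef {macro}",
        f'#error "define {macro} before including this file"',
        "#endif",
        "",
    ]
    lines += [f"{macro}({name})" for name in names]
    lines += ["", f"#undef {macro}"]
    return "\n".join(lines) + "\n"


# Hypothetical record names; a real script would take them from TableGen output.
print(emit_macro_list(["ADD32rr", "SUB32rr"]), end="")
```

A C or C++ consumer could then include the generated file more than once with different macro definitions, for instance once to build an enum of instruction names and once to build a table of name strings, which is what allows the same list to serve several contexts.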
See TableGen BackEnds for a list of available backends, and see the TableGen Backend Developer’s Guide for information on how to write and debug a new backend.
Tools and Resources¶
In addition to this documentation, a list of tools and resources for TableGen can be found in TableGen’s README.
TableGen Deficiencies¶
Despite being very generic, TableGen has some deficiencies that have been pointed out numerous times. The common theme is that, while TableGen allows you to build domain-specific languages, the final languages that you create lack the power of other DSLs, which in turn considerably increases the size and complexity of TableGen files.
At the same time, TableGen allows you to create virtually any meaning of the basic concepts via custom-made backends, which can pervert the original design and make it very hard for newcomers to understand the evil TableGen file.
Some are in favor of extending the semantics even more, while making sure backends adhere to strict rules. Others suggest moving to fewer, more powerful DSLs designed for specific purposes, or even reusing existing DSLs.