[TableGen][DecoderEmitter] Add option to emit type-specialized code #146593

Open
jurahul wants to merge 14 commits into llvm:main from jurahul:decoder_emitter_type_specialization

Conversation

@jurahul (Contributor) commented Jul 1, 2025 (edited)

This change attempts to reduce the size of the disassembler code generated by DecoderEmitter.

Current state:

  1. Currently, the code generated by the decoder emitter consists of two key functions: decodeInstruction, which is the entry point into the generated code, and decodeToMCInst, which is invoked when a decode op is reached while traversing the decoder table. Both functions are templated on InsnType, which is the type of the raw instruction bits supplied to decodeInstruction.
  2. Several backends call decodeInstruction with different types, leading to several template instantiations of this function in the final code. As an example, AMDGPU instantiates this function with type DecoderUInt128 for decoding 96/128-bit instructions, uint64_t for decoding 64-bit instructions, and uint32_t for decoding 32-bit instructions.
  3. Since there is just one decodeToMCInst generated, it has code that handles all instruction sizes. The decoders emitted for different instruction sizes rarely have any intersection with each other. That means, in the AMDGPU case, the instantiation with InsnType == DecoderUInt128 has decoder code for 32/64-bit instructions that is never exercised. Conversely, the instantiation with InsnType == uint64_t has decoder code for 128/96/32-bit instructions that is never exercised. This leads to unnecessary dead code in the generated disassembler binary.

With this change, the DecoderEmitter will stop generating a single templated decodeInstruction and will instead generate several overloaded versions of this function, along with the associated decodeToMCInst functions. Instead of using the templated InsnType, it will use an auto-inferred type, which can be a standard C++ integer type, APInt, or a std::bitset. As a result, decoders for 32-bit instructions will appear only in the 32-bit variant of decodeToMCInst and 64-bit decoders will appear only in the 64-bit variant, which fixes the code duplication in the templated variant.
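
To make the difference concrete, here is a minimal sketch of the two shapes. It is not the emitter's actual output: the function bodies, the string return values, the parameter order, and the name decodeToMCInstTemplated are all stand-ins chosen for illustration, and the real generated code operates on MCInst and decoder tables.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Before (illustrative): one templated function holds the decoders for every
// width, so each instantiation also compiles cases it can never reach.
template <typename InsnType>
std::string decodeToMCInstTemplated(unsigned DecodeIdx, InsnType Insn) {
  (void)Insn; // The sketch does not inspect the instruction bits.
  switch (DecodeIdx) {
  case 0:
    return "32-bit decoder";
  case 1:
    return "64-bit decoder"; // Never exercised when InsnType is uint32_t.
  default:
    return "invalid";
  }
}

// After (illustrative): one function per bitwidth, each holding only its own
// decoders. The decoder index restarts at 0 for every width.
std::string decodeToMCInst32(unsigned DecodeIdx, uint32_t Insn) {
  (void)Insn;
  return DecodeIdx == 0 ? "32-bit decoder" : "invalid";
}

std::string decodeToMCInst64(unsigned DecodeIdx, uint64_t Insn) {
  (void)Insn;
  return DecodeIdx == 0 ? "64-bit decoder" : "invalid";
}

// Overload resolution on the instruction type picks the matching variant.
std::string decodeInstruction(uint32_t Insn, unsigned DecodeIdx) {
  return decodeToMCInst32(DecodeIdx, Insn);
}

std::string decodeInstruction(uint64_t Insn, unsigned DecodeIdx) {
  return decodeToMCInst64(DecodeIdx, Insn);
}
```

Callers need to pass a value of the exact type (for example a uint32_t variable) so that overload resolution is unambiguous; passing a plain int literal would not select either overload cleanly.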

Additionally, the DecodeIndex will now be computed per instruction bitwidth instead of globally across all bitwidths, as it was before. So, the values will generally be smaller than before and hence will consume fewer bytes in their ULEB128 encoding in the decoder tables, resulting in a further reduction in the size of the decode tables.
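
The ULEB128 effect can be illustrated with a small helper that counts encoded bytes. This is a standalone sketch; the name uleb128Size is made up here and is not taken from the PR. ULEB128 stores 7 payload bits per byte, so an index below 128 costs one byte and an index from 128 to 16383 costs two.

```cpp
#include <cassert>
#include <cstdint>

// Number of bytes needed to encode Value as ULEB128 (7 payload bits per byte).
unsigned uleb128Size(uint64_t Value) {
  unsigned Size = 0;
  do {
    Value >>= 7; // Each emitted byte consumes 7 bits of the value.
    ++Size;
  } while (Value != 0);
  return Size;
}
```

If a target has, say, a few hundred decoders in total but fewer than 128 for a given bitwidth, per-bitwidth numbering keeps every index for that width in the one-byte range.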

Since this non-templated decoder also needs some changes in the C++ code, I added an option GenerateTemplatedDecoder to InstrInfo that defaults to false, but which targets can set to true to fall back to using templated code. The goal is to migrate all targets to the non-templated decoder and deprecate this option in the future.

Adopt this feature for the AMDGPU backend. In a release build, this results in a net 35% reduction in the .text size of libLLVMAMDGPUDisassembler.so and a 5% reduction in the .rodata size. Actual numbers measured locally for a Linux x86_64 build using the clang-18.1.3 toolchain are:

.text 378780 -> 244684, i.e., a 35% reduction in size
.rodata 352648 -> 334960, i.e., a 5% reduction in size
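
As a quick check of the arithmetic, the percentages follow directly from the section sizes quoted above. The helper name percentReduction is made up for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Percentage by which After is smaller than Before.
double percentReduction(uint64_t Before, uint64_t After) {
  return 100.0 * static_cast<double>(Before - After) /
         static_cast<double>(Before);
}
```

With the numbers above, percentReduction(378780, 244684) is roughly 35.4 and percentReduction(352648, 334960) is roughly 5.0.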

For targets that do not use multiple instantiations of decodeInstruction, opting in to this feature may not improve code/data size, but it may improve compile time by avoiding the use of templated code.

@topperc (Collaborator)

What's the motivation?

@jurahul (Contributor, Author)

Please see the PR description that I just added. I also have data to support this, will add soon (tabulating it ATM)

@jurahul (Contributor, Author)

Failure is in the unit test (TableGen/VarLenDecoder.td), I need to update it.

@topperc (Collaborator)

Why can't the disassembler emitter figure out the bitwidths from the tablegen input? It already makes separate tables for each instruction size.

@topperc (Collaborator)

Oh is it because RISC-V uses 3 bit widths, but only 2 types for DecodeInstruction?

@topperc (Collaborator)

Can we store the type in the Instruction class in the .td files like the bitwidth instead of introducing a complex command line argument?

@jurahul (Contributor, Author)

Right, this is a POC at this point, which shows that the proposed optimization works. I am open to changing the interface here as well. The command line one was simple enough to not mess with the TableGen instruction class etc. Storing it there is an option, though it feels more intrusive. The command line is moderately complex and localized to the decoder emitter.

@jurahul (Contributor, Author) commented Jul 1, 2025 (edited)

Repeating the type in each per-instruction record might be redundant (and we would also need more verification to check that, for a given size, all insts of that size have the C++ type specified and that it's consistent). One option is to add a new InstructionTypeAndSize class that records this information; the DecoderEmitter can use that if it's present, else fall back to templated code. Something like

    class InstructionDecoderTypeAndSize<string CPPType, list<int> Bitwidths> {}
    class InstructionDecoderTypeAndSizes<list<InstructionDecoderTypeAndSize>> {}

and a particular backend can define a single record of type InstructionDecoderTypeAndSizes<> which the DecoderEmitter will use. This is essentially encoding the command line option as a record.

    // RISCV.td
    // Opt-in to non-templated decoder code.
    def : InstructionDecoderTypeAndSizes<[
                    InstructionDecoderTypeAndSize<"uint64_t", [48]>,
                    InstructionDecoderTypeAndSize<"uint32_t", [16,32]>]>;

or more simply

    class InstructionDecoderTypeAndSizes<list<string> CPPTypes, list<list<int>> Bitwidths> {}
    def : InstructionDecoderTypeAndSizes<
               [ "uint32_t", "uint64_t"],
               [ [16,32],    [64]     ]>;
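
On the consumer side, such a record boils down to a bitwidth-to-type lookup with one consistency rule: no bitwidth may be claimed twice. The following C++17 sketch shows that idea only. TypeAndSizes and buildWidthToType are hypothetical names for illustration and do not reflect how the DecoderEmitter actually reads records.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory form of one InstructionDecoderTypeAndSize entry.
struct TypeAndSizes {
  std::string CPPType;
  std::vector<unsigned> Bitwidths;
};

// Builds a bitwidth -> C++ type map. Returns std::nullopt if any bitwidth is
// listed more than once, since the mapping would then be ambiguous.
std::optional<std::map<unsigned, std::string>>
buildWidthToType(const std::vector<TypeAndSizes> &Entries) {
  std::map<unsigned, std::string> WidthToType;
  for (const TypeAndSizes &Entry : Entries)
    for (unsigned Width : Entry.Bitwidths)
      if (!WidthToType.emplace(Width, Entry.CPPType).second)
        return std::nullopt; // Width was already assigned a type.
  return WidthToType;
}
```

Because the mapping lives in one standalone record, this single duplicate check replaces the per-instruction consistency validation that the per-record approach would need.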

@jurahul force-pushed the decoder_emitter_type_specialization branch from 30d0838 to 2d7d1dc, July 1, 2025 22:19

@topperc (Collaborator)

RISCV uses a common base class for each of the 3 instruction sizes. Other targets may be similar.

    class RVInst<dag outs, dag ins, string opcodestr, string argstr,
                 list<dag> pattern, InstFormat format>
        : RVInstCommon<outs, ins, opcodestr, argstr, pattern, format> {
      field bits<32> Inst;
      // SoftFail is a field the disassembler can use to provide a way for
      // instructions to not match without killing the whole decode process. It is
      // mainly used for ARM, but Tablegen expects this field to exist or it fails
      // to build the decode table.
      field bits<32> SoftFail = 0;
      let Size = 4;
    }

    class RVInst48<dag outs, dag ins, string opcodestr, string argstr,
                   list<dag> pattern, InstFormat format>
        : RVInstCommon<outs, ins, opcodestr, argstr, pattern, format> {
      field bits<48> Inst;
      field bits<48> SoftFail = 0;
      let Size = 6;
    }

    class RVInst64<dag outs, dag ins, string opcodestr, string argstr,
                   list<dag> pattern, InstFormat format>
        : RVInstCommon<outs, ins, opcodestr, argstr, pattern, format> {
      field bits<64> Inst;
      field bits<64> SoftFail = 0;
      let Size = 8;
    }

@jurahul (Contributor, Author)

Right, but nonetheless, we will have the type specified per instruction instance, and we will still need to validate, for example, that all instructions with a particular size have the same type string. To me that seems like unnecessary duplication of this information, followed by additional verification to make sure that it's consistent. Also, unlike the size in bytes, which is a core property of the instruction, the C++ type used to represent its bits in memory does not seem to be a core property. Many backends seem to choose the same type (for example uint64_t) for all their 16/32/48/64-bit insts. Adoption-wise as well, sticking it in the per-inst record seems more invasive (for example, in our backend and several other downstream backends the core instruction records are auto-generated, so the adoption curve for this increases further).

@jurahul (Contributor, Author)

Requesting not a review per se, but opinions on the user interface for this optimization. The choices proposed are:

  • Command line option, as in this PR. +: non-intrusive in terms of .td files. -: need to parse it, and parsing can be flaky.
  • Per-instruction record carries the C++ type (@topperc's suggestion). +: no command line option parsing flakiness. -: (IMO) too invasive; I see difficulty or increased complexity in adoption for auto-generated inst defs used in several downstream backends, and it needs additional validation for consistency across all insts of a given size.
  • Option embedded as a new singleton record in the .td file. +: no command line option parsing flakiness, less intrusive than the per-instruction option above (as the single def is standalone and not attached to anything else), no consistency checks. -: ?

@topperc (Collaborator)

> Many backends seem to choose the same type (for example uint64_t) for all their 16/32/48/64-bit insts.

I'm probably going to change to uint64_t for RISC-V. The 48-bit instructions are only used by one vendor and are relatively recent additions. I think the duplication cost just wasn't considered when they were added.

I agree adding to the Inst class might be too invasive. I still think it should be in .td files somehow. Needing to change a CMake file and replicating to GN and the other build systems when a new instruction width is added seems bad.

@jurahul (Contributor, Author)

Right, is option #3 above palatable? We essentially encode it as a standalone record that the DecoderEmitter will look for.

@topperc (Collaborator)

Maybe it should be stored in the InstrInfo class like isLittleEndianEncoding, or in the Target class?

topperc added a commit to topperc/llvm-project that referenced this pull request, Jul 2, 2025
…6. NFC
Insn is passed to decodeInstruction which is a template function based on the type of Insn. By using uint64_t we ensure only one version of decodeInstruction is created. This reduces the file size of RISCVDisassembler.cpp.o by ~25% in my local build. This should get even more size benefit than llvm#146593.

@jurahul (Contributor, Author)

InstrInfo seems reasonable. Let me rework the PR to add that

@jurahul force-pushed the decoder_emitter_type_specialization branch from 2d7d1dc to 04366ee, July 2, 2025 14:38

@jurahul (Contributor, Author)

@topperc Please let me know if this new scheme looks ok. If yes, I'll migrate the rest of the targets (Right now I just changed AARCH64 and RISCV) to use this, and add some unit tests for a final review.

@topperc (Collaborator)

Does this have any binary size effect on RISCV after #146619?

@jurahul (Contributor, Author) commented Jul 2, 2025 (edited)

I have not tested. My speculation is that there is no binary size change, just a minor compile time improvement from avoiding template specialization. I'll check and report back.

@jurahul (Contributor, Author)

Looks like templating adds a little bit to the code size. Building the RISCVDisassembler.cpp.o in a release config with/without this change results in the following:

Old: 196112 ./build/lib/Target/RISCV/Disassembler/CMakeFiles/LLVMRISCVDisassembler.dir/RISCVDisassembler.cpp.o
New: 196096 ./build/lib/Target/RISCV/Disassembler/CMakeFiles/LLVMRISCVDisassembler.dir/RISCVDisassembler.cpp.o

So, 16 bytes less. Not significant though.

@topperc (Collaborator)

Could just be a difference in the name mangling of the function name? Or are you checking the .text size?

@jurahul (Contributor, Author) commented Jul 2, 2025 (edited)

Yeah, your guess was right. I dumped the sizes with size -A and I see:

New:
.text._ZN12_GLOBAL__N_114decodeToMCInstEjN4llvm14MCDisassembler12DecodeStatusEmRNS0_6MCInstEmPKS1_Rb     27023
.text._ZN12_GLOBAL__N_117decodeInstructionEPKhRN4llvm6MCInstEmmPKNS2_14MCDisassemblerERKNS2_15MCSubtargetInfoE        9238
Old:
.text._ZN12_GLOBAL__N_114decodeToMCInstImEEN4llvm14MCDisassembler12DecodeStatusEjS3_T_RNS1_6MCInstEmPKS2_Rb                             27023
.text._ZN12_GLOBAL__N_117decodeInstructionImEEN4llvm14MCDisassembler12DecodeStatusEPKhRNS1_6MCInstET_mPKS2_RKNS1_15MCSubtargetInfoE      9238

That is, text sizes are the same but mangled names are different and that likely leads to larger object file sizes.

@jurahul (Contributor, Author)

Note though that what you did for RISCV may not be applicable/desirable for all targets. For example, AMDGPU has 128 bit instructions, so I am assuming if we just use a 128-bit type for all instructions, we may pay a penalty in terms of the bit extraction costs (32 vs 64-bit may not be as bad).
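
To illustrate why a wider type could cost more per field, here is a sketch of bit extraction from a single 32-bit word next to extraction from a 128-bit value held as two 64-bit halves. Wide128, extractField32 and extractField128 are hypothetical names for this example, not the helpers the generated decoder uses, and the sketch makes no claim about measured cost.

```cpp
#include <cassert>
#include <cstdint>

// Extracts NumBits bits starting at StartBit from one 32-bit word.
// Assumes 1 <= NumBits <= 32 and StartBit + NumBits <= 32.
inline uint32_t extractField32(uint32_t Insn, unsigned StartBit,
                               unsigned NumBits) {
  uint32_t Mask = NumBits == 32 ? ~0u : ((1u << NumBits) - 1);
  return (Insn >> StartBit) & Mask; // One shift and one mask.
}

// Hypothetical 128-bit instruction stored as two 64-bit halves.
struct Wide128 {
  uint64_t Lo;
  uint64_t Hi;
};

// Extracts NumBits bits starting at StartBit from a 128-bit value.
// Assumes 1 <= NumBits <= 64 and StartBit + NumBits <= 128.
inline uint64_t extractField128(const Wide128 &Insn, unsigned StartBit,
                                unsigned NumBits) {
  uint64_t Mask = NumBits == 64 ? ~0ull : ((1ull << NumBits) - 1);
  if (StartBit >= 64) // Field lies entirely in the high half.
    return (Insn.Hi >> (StartBit - 64)) & Mask;
  uint64_t Value = Insn.Lo >> StartBit;
  if (StartBit + NumBits > 64) // Field straddles both halves.
    Value |= Insn.Hi << (64 - StartBit);
  return Value & Mask;
}
```

The 128-bit path needs a branch on the field position and, for straddling fields, a second shift and an OR, which is the kind of extra work the comment above speculates about.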

@jurahul (Contributor, Author)

@topperc My question is still unanswered. WDYT of this new interface to opt in to this optimization?

@github-actions (bot) commented Jul 12, 2025 (edited)

✅ With the latest revision this PR passed the C/C++ code formatter.

@jurahul force-pushed the decoder_emitter_type_specialization branch from d51627f to 1824e68, July 12, 2025 01:03

@s-barannikov (Contributor) left a comment

Minor notes/suggestions

@jurahul force-pushed the decoder_emitter_type_specialization branch from 1824e68 to 17a507f, July 14, 2025 19:19
@jurahul force-pushed the decoder_emitter_type_specialization branch 2 times, most recently from 23e63c2 to 357a409, July 14, 2025 23:33

@s-barannikov (Contributor) left a comment

.

@jurahul force-pushed the decoder_emitter_type_specialization branch from 9c3defd to 180acbb, July 15, 2025 22:59
@jurahul marked this pull request as ready for review, July 16, 2025 03:05
@jurahul changed the title from "[TableGen][DecoderEmitter] Add option to emit type-specialized decodeToMCInst" to "[TableGen][DecoderEmitter] Add option to emit type-specialized code", Jul 16, 2025
@jurahul force-pushed the decoder_emitter_type_specialization branch from 180acbb to 326a481, July 18, 2025 17:10

@jurahul (Contributor, Author)

I made one other minor change. We used to generate, for each bitwidth, a decodeInstruction<N> function as well as a decodeInstruction that just calls decodeInstruction<N>. Since we don't really use decodeInstruction<N> directly, I am now generating just decodeInstruction. So, for example, for 16 and 32-bit instructions, the old code has:

    decodeToMCInst16() {}
    decodeInstruction16 {
        call decodeToMCInst16()
    }
    decodeInstruction(..., uint16_t insn, ...) {
        return decodeInstruction16(...)
    }
    decodeToMCInst32() {}
    decodeInstruction32 {
        call decodeToMCInst32()
    }
    decodeInstruction(..., uint32_t insn, ...) {
        return decodeInstruction32(...)
    }

Now it will just have:

    decodeToMCInst16() {}
    decodeInstruction(..., uint16_t insn, ...) {
        call decodeToMCInst16()
    }
    decodeToMCInst32() {}
    decodeInstruction(..., uint32_t insn, ...) {
        call decodeToMCInst32()
    }

Reviewers
s-barannikov: approved these changes
mshockwave: awaiting requested review
topperc: awaiting requested review

3 participants: @jurahul, @topperc, @s-barannikov