//===-- AMDGPULowerModuleLDSPass.cpp ------------------------------*- C++ -*-=//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass eliminates local data store, LDS, uses from non-kernel functions.
// LDS is contiguous memory allocated per kernel execution.
//
// The programming model is global variables, or equivalently function local
// static variables, accessible from kernels or other functions. For uses from
// kernels this is straightforward - assign an integer to the kernel for the
// memory required by all the variables combined, allocate them within that.
// For uses from functions there are performance tradeoffs to choose between.
//
// This model means the GPU runtime can specify the amount of memory allocated.
// If this is more than the kernel assumed, the excess can be made available
// using a language specific feature, which IR represents as a variable with
// no initializer. This feature is referred to here as "Dynamic LDS" and is
// lowered slightly differently to the normal case.
//
// Consequences of this GPU feature:
// - memory is limited and exceeding it halts compilation
// - a global accessed by one kernel exists independent of other kernels
// - a global exists independent of simultaneous execution of the same kernel
// - the address of the global may be different from different kernels as they
//   do not alias, which permits only allocating variables they use
// - if the address is allowed to differ, functions need help to find it
//
// Uses from kernels are implemented here by grouping them in a per-kernel
// struct instance. This duplicates the variables, accurately modelling their
// aliasing properties relative to a single global representation.
// It also permits control over alignment via padding.
//
// Uses from functions are more complicated and the primary purpose of this
// IR pass. Several different lowerings are chosen between to meet requirements
// to avoid allocating any LDS where it is not necessary, as that impacts
// occupancy and may fail the compilation, while not imposing overhead on a
// feature whose primary advantage over global memory is performance. The basic
// design goal is to avoid one kernel imposing overhead on another.
//
// LDS variables with constant annotation or non-undef initializer are passed
// through unchanged for simplification or error diagnostics in later passes.
// Non-undef initializers are not yet implemented for LDS.
//
// LDS variables that are always allocated at the same address can be found
// by lookup at that address. Otherwise runtime information/cost is required.
//
// The simplest strategy possible is to group all LDS variables in a single
// struct and allocate that struct in every kernel such that the original
// variables are always at the same address. LDS is however a limited resource
// so this strategy is unusable in practice. It is not implemented here.
//
// Strategy | Precise allocation | Zero runtime cost | General purpose |
//  --------+--------------------+-------------------+-----------------+
//   Module |        No          |        Yes        |       Yes       |
//    Table |        Yes         |        No         |       Yes       |
//   Kernel |        Yes         |        Yes        |       No        |
//   Hybrid |        Yes         |        Partial    |       Yes       |
//
// "Module" spends LDS memory to save cycles. "Table" spends cycles and global
// memory to save LDS. "Kernel" is as fast as kernel allocation but only works
// for variables that are known reachable from a single kernel. "Hybrid" picks
// between all three. When forced to choose between LDS and cycles we minimise
// LDS use.
//
// The "module" lowering implemented here finds LDS variables which are used by
// non-kernel functions and creates a new struct with a field for each of those
// LDS variables.
// Variables that are only used from kernels are excluded.
//
// The "table" lowering implemented here has three components.
// First kernels are assigned a unique integer identifier which is available in
// functions it calls through the intrinsic amdgcn_lds_kernel_id. The integer
// is passed through a specific SGPR, thus works with indirect calls.
// Second, each kernel allocates LDS variables independent of other kernels and
// writes the addresses it chose for each variable into an array in consistent
// order. If the kernel does not allocate a given variable, it writes undef to
// the corresponding array location. These arrays are written to a constant
// table in the order matching the kernel unique integer identifier.
// Third, uses from non-kernel functions are replaced with a table lookup using
// the intrinsic function to find the address of the variable.
//
// "Kernel" lowering is only applicable for variables that are unambiguously
// reachable from exactly one kernel. For those cases, accesses to the variable
// can be lowered to ConstantExpr address of a struct instance specific to that
// one kernel. This is zero cost in space and in compute. It will raise a fatal
// error on any variable that might be reachable from multiple kernels and is
// thus most easily used as part of the hybrid lowering strategy.
//
// Hybrid lowering is a mixture of the above. It uses the zero cost kernel
// lowering where it can. It lowers the variable accessed by the greatest
// number of kernels using the module strategy as that is free for the first
// variable. Any further variables that can be lowered with the module strategy
// without incurring LDS memory overhead are. The remaining ones are lowered
// via table.
//
// - No heuristics or user controlled magic numbers, hybrid is the right choice
// - Kernels that don't use functions (or have had them all inlined) are not
//   affected by any lowering for kernels that do.
// - Kernels that don't make indirect function calls are not affected by those
//   that do.
// - Variables which are used by lots of kernels, e.g. those injected by a
//   language runtime in most kernels, are expected to have no overhead
// - Implementations that instantiate templates per-kernel where those
//   templates use LDS are expected to hit the "Kernel" lowering strategy
// - The runtime properties impose a cost in compiler implementation complexity
//
// Dynamic LDS implementation
// Dynamic LDS is lowered similarly to the "table" strategy above and uses the
// same intrinsic to identify which kernel is at the root of the dynamic call
// graph. This relies on the specified behaviour that all dynamic LDS variables
// alias one another, i.e. are at the same address, with respect to a given
// kernel. Therefore this pass creates new dynamic LDS variables for each
// kernel that allocates any dynamic LDS and builds a table of addresses out of
// those. The AMDGPUPromoteAlloca pass skips kernels that use dynamic LDS.
// The corresponding optimisation for "kernel" lowering where the table lookup
// is elided is not implemented.
//
// Implementation notes / limitations
// A single LDS global variable represents an instance per kernel that can
// reach said variables. This pass essentially specialises said variables per
// kernel. Handling ConstantExpr during the pass complicated this significantly
// so now all ConstantExpr uses of LDS variables are expanded to instructions.
// This may need amending when implementing non-undef initialisers.
//
// Lowering is split between this IR pass and the back end. This pass chooses
// where given variables should be allocated and marks them with metadata,
// MD_absolute_symbol. The backend places the variables in coincidentally the
// same location and raises a fatal error if something has gone awry. This works
// in practice because the only pass between this one and the backend that
// changes LDS is PromoteAlloca and the changes it makes do not conflict.
//
// Addresses are written to constant global arrays based on the same metadata.
//
// The backend lowers LDS variables in the order of traversal of the function.
// This is at odds with the deterministic layout required. The workaround is to
// allocate the fixed-address variables immediately upon starting the function
// where they can be placed as intended. This requires a means of mapping from
// the function to the variables that it allocates. For the module scope lds,
// this is via metadata indicating whether the variable is not required. If a
// pass deletes that metadata, a fatal error on disagreement with the absolute
// symbol metadata will occur. For kernel scope and dynamic, this is by _name_
// correspondence between the function and the variable. It requires the
// kernel to have a name (which is only a limitation for tests in practice) and
// for nothing to rename the corresponding symbols. This is a hazard if the
// pass is run multiple times during debugging. Alternative schemes considered
// all involve bespoke metadata.
//
// If the name correspondence can be replaced, multiple distinct kernels that
// have the same memory layout can map to the same kernel id (as the address
// itself is handled by the absolute symbol metadata) and that will allow more
// uses of the "kernel" style faster lowering and reduce the size of the lookup
// tables.
//
// There is a test that checks this does not fire for a graphics shader. This
// lowering is expected to work for graphics if the isKernel test is changed.
//
// The current markUsedByKernel is sufficient for PromoteAlloca but is elided
// before codegen.
// Replacing this with an equivalent intrinsic which lasts until shortly after
// the machine function lowering of LDS would help break the name mapping. The
// other part needed is probably to amend PromoteAlloca to embed the LDS
// variables it creates in the same struct created here. That avoids the
// current hazard where a PromoteAlloca LDS variable might be allocated before
// the kernel scope (and thus error on the address check). Given a new
// invariant that no LDS variables exist outside of the structs managed here,
// and an intrinsic that lasts until after the LDS frame lowering, it should be
// possible to drop the name mapping and fold equivalent memory layouts.
//
//===----------------------------------------------------------------------===//

#include "llvm/IR/IntrinsicsAMDGPU.h"

#define DEBUG_TYPE "amdgpu-lower-module-lds"

using namespace AMDGPU;
static cl::opt<bool> SuperAlignLDSGlobals(
    "amdgpu-super-align-lds-globals",
    cl::desc("Increase alignment of LDS if it is not on align boundary"),
    cl::init(true), cl::Hidden);
enum class LoweringKind { module, table, kernel, hybrid };

static cl::opt<LoweringKind> LoweringKindLoc(
    "amdgpu-lower-module-lds-strategy",
    cl::desc("Specify lowering strategy for function LDS access:"), cl::Hidden,
    cl::init(LoweringKind::hybrid),
    cl::values(
        clEnumValN(LoweringKind::table, "table", "Lower via table lookup"),
        clEnumValN(LoweringKind::module, "module", "Lower via module struct"),
        clEnumValN(
            LoweringKind::kernel, "kernel",
            "Lower variables reachable from one kernel, otherwise abort"),
        clEnumValN(LoweringKind::hybrid, "hybrid",
                   "Lower via mixture of above strategies")));
template <typename T> std::vector<T> sortByName(std::vector<T> &&V) {
  llvm::sort(V.begin(), V.end(), [](const auto *L, const auto *R) {
    return L->getName() < R->getName();
  });
  return {std::move(V)};
}
class AMDGPULowerModuleLDS {

  static void removeLocalVarsFromUsedLists(
      Module &M, const DenseSet<GlobalVariable *> &LocalVars) {
    // The verifier rejects used lists containing an inttoptr of a constant
    // so remove the variables from these lists before replaceAllUsesWith
    for (GlobalVariable *LocalVar : LocalVars)
      LocalVarsSet.insert(cast<Constant>(LocalVar->stripPointerCasts()));

    for (GlobalVariable *LocalVar : LocalVars)
      LocalVar->removeDeadConstantUsers();
  // The llvm.amdgcn.module.lds instance is implicitly used by all kernels
  // that might call a function which accesses a field within it. This is
  // presently approximated to 'all kernels' if there are any such functions
  // in the module. This implicit use is redefined as an explicit use here so
  // that later passes, specifically PromoteAlloca, account for the required
  // memory without any knowledge of this transform.

  // An operand bundle on llvm.donothing works because the call instruction
  // survives until after the last pass that needs to account for LDS. It is
  // better than inline asm as the latter survives until the end of codegen. A
  // totally robust solution would be a function with the same semantics as
  // llvm.donothing that takes a pointer to the instance and is lowered to a
  // no-op after LDS is allocated, but that is not presently necessary.

  // This intrinsic is eliminated shortly before instruction selection. It
  // does not suffice to indicate to ISel that a given global which is not
  // immediately used by the kernel must still be allocated by it. An
  // equivalent target specific intrinsic which lasts until immediately after
  // codegen would suffice for that, but one would still need to ensure that
  // the variables are allocated in the anticipated order.
    IRBuilder<> Builder(Entry, Entry->getFirstNonPHIIt());

        Func->getParent(), Intrinsic::donothing, {});

    Value *UseInstance[1] = {
  struct LDSVariableReplacement {

  // remap from lds global to a constantexpr gep to where it has been moved to
  // an array with an element for each kernel containing where the
  // corresponding variable was remapped to

  static Constant *getAddressesOfVariablesInKernel(
    // Create a ConstantArray containing the address of each Variable within
    // the kernel corresponding to LDSVarsToConstantGEP, or poison if that
    // kernel does not allocate it
    // TODO: Drop the ptrtoint conversion
    ArrayType *KernelOffsetsType = ArrayType::get(I32, Variables.size());

      auto ConstantGepIt = LDSVarsToConstantGEP.find(GV);
      if (ConstantGepIt != LDSVarsToConstantGEP.end()) {
        Elements.push_back(elt);

    if (Variables.empty()) {

    const size_t NumberVariables = Variables.size();
    const size_t NumberKernels = kernels.size();

        ArrayType::get(KernelOffsetsType, NumberKernels);

    std::vector<Constant *> overallConstantExprElts(NumberKernels);
    for (size_t i = 0; i < NumberKernels; i++) {
      auto Replacement = KernelToReplacement.find(kernels[i]);
      overallConstantExprElts[i] =
          (Replacement == KernelToReplacement.end())
              ? PoisonValue::get(KernelOffsetsType)
              : getAddressesOfVariablesInKernel(
                    Ctx, Variables, Replacement->second.LDSVarsToConstantGEP);
      Value *OptionalIndex) {
    // Table is a constant array of the same length as OrderedKernels
    auto *I = cast<Instruction>(U.getUser());

    Value *tableKernelIndex = getTableLookupKernelIndex(M, I->getFunction());

    if (auto *Phi = dyn_cast<PHINode>(I)) {

        ConstantInt::get(I32, 0),

        LookupTable->getValueType(), LookupTable, GEPIdx, GV->getName());
  void replaceUsesInInstructionsWithTableLookup(

    for (size_t Index = 0; Index < ModuleScopeVariables.size(); Index++) {
      auto *GV = ModuleScopeVariables[Index];

        auto *I = dyn_cast<Instruction>(U.getUser());

        replaceUseWithTableLookup(M, Builder, LookupTable, GV, U,
                                  ConstantInt::get(I32, Index));
    if (VariableSet.empty())

    for (Function &Func : M.functions()) {

          KernelSet.insert(&Func);
  chooseBestVariableForModuleStrategy(const DataLayout &DL,
    // Find the global variable with the most indirect uses from kernels

    CandidateTy() = default;

        : GV(GV), UserCount(UserCount), Size(AllocSize) {}

      // Fewer users makes module scope variable less attractive
      if (UserCount < Other.UserCount) {

      if (UserCount > Other.UserCount) {

      // Bigger makes module scope variable less attractive

      // Arbitrary but consistent

    CandidateTy MostUsed;

    for (auto &K : LDSVars) {
      if (K.second.size() <= 1) {
        // A variable reachable by only one kernel is best lowered with
        // kernel strategy

      CandidateTy Candidate(

      if (MostUsed < Candidate)
        MostUsed = Candidate;
  // Write the specified address into metadata where it can be retrieved by
  // the assembler. Format is a half open range, [Address Address+1)

    // Accesses from a function use the amdgcn_lds_kernel_id intrinsic which
    // lowers to a read from a live in register. Emit it once in the entry
    // block to spare deduplicating it later.
    auto [It, Inserted] = tableKernelIndexCache.try_emplace(F);

    auto InsertAt = F->getEntryBlock().getFirstNonPHIOrDbgOrAlloca();
  static std::vector<Function *> assignLDSKernelIDToEachKernel(
    // Associate kernels in the set with an arbitrary but reproducible order
    // and annotate them with that order in metadata. This metadata is
    // recognised by the backend and lowered to a SGPR which can be read from
    // using amdgcn_lds_kernel_id.

    std::vector<Function *> OrderedKernels;
    if (!KernelsThatAllocateTableLDS.empty() ||
        !KernelsThatIndirectlyAllocateDynamicLDS.empty()) {

      for (Function &Func : M->functions()) {
        if (Func.isDeclaration())

        if (KernelsThatAllocateTableLDS.contains(&Func) ||
            KernelsThatIndirectlyAllocateDynamicLDS.contains(&Func)) {
          assert(Func.hasName()); // else fatal error earlier
          OrderedKernels.push_back(&Func);

      // Put them in an arbitrary but reproducible order
      OrderedKernels = sortByName(std::move(OrderedKernels));

      // Annotate the kernels with their order in this vector
      if (OrderedKernels.size() > UINT32_MAX) {
        // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU

      for (size_t i = 0; i < OrderedKernels.size(); i++) {
        OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",

    return OrderedKernels;
  static void partitionVariablesIntoIndirectStrategies(

    GlobalVariable *HybridModuleRoot =
        LoweringKindLoc != LoweringKind::hybrid
            ? nullptr
            : chooseBestVariableForModuleStrategy(
                  M.getDataLayout(), LDSToKernelsThatNeedToAccessItIndirectly);

    DenseSet<Function *> const &HybridModuleRootKernels =
        HybridModuleRoot
            ? LDSToKernelsThatNeedToAccessItIndirectly[HybridModuleRoot]
            : EmptySet;

    for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
      // Each iteration of this loop assigns exactly one global variable to
      // exactly one of the implementation strategies.

      assert(K.second.size() != 0);

        DynamicVariables.insert(GV);

      switch (LoweringKindLoc) {
      case LoweringKind::module:
        ModuleScopeVariables.insert(GV);
        break;

      case LoweringKind::table:
        TableLookupVariables.insert(GV);
        break;

      case LoweringKind::kernel:
        if (K.second.size() == 1) {
          KernelAccessVariables.insert(GV);
        } else {
          report_fatal_error(
              "cannot lower LDS '" + GV->getName() +
              "' to kernel access as it is reachable from multiple kernels");
        }
        break;

      case LoweringKind::hybrid: {
        if (GV == HybridModuleRoot) {
          assert(K.second.size() != 1);
          ModuleScopeVariables.insert(GV);
        } else if (K.second.size() == 1) {
          KernelAccessVariables.insert(GV);
        } else if (set_is_subset(K.second, HybridModuleRootKernels)) {
          ModuleScopeVariables.insert(GV);
        } else {
          TableLookupVariables.insert(GV);
        }
        break;
      }
      }

    // All LDS variables accessed indirectly have now been partitioned into
    // the distinct lowering strategies.
    assert(ModuleScopeVariables.size() + TableLookupVariables.size() +
               KernelAccessVariables.size() + DynamicVariables.size() ==
           LDSToKernelsThatNeedToAccessItIndirectly.size());
  // Create a struct to hold the ModuleScopeVariables
  // Replace all uses of those variables from non-kernel functions with the
  // new struct instance. Replace only the uses from kernel functions that
  // will allocate this instance. That is a space optimisation - kernels that
  // use a subset of the module scope struct and do not need to allocate it
  // for indirect calls will only allocate the subset they use (they do so as
  // part of the per-kernel lowering).
    if (ModuleScopeVariables.empty()) {

    LDSVariableReplacement ModuleScopeReplacement =
        createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",
                                     ModuleScopeVariables);

            cast<Constant>(ModuleScopeReplacement.SGV),
            PointerType::getUnqual(Ctx)))});

    // module.lds will be allocated at zero in any kernel that allocates it
    recordLDSAbsoluteAddress(&M, ModuleScopeReplacement.SGV, 0);

    removeLocalVarsFromUsedLists(M, ModuleScopeVariables);
    // Replace all uses of module scope variable from non-kernel functions
    replaceLDSVariablesWithStruct(
        M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {

    // Replace uses of module scope variable from kernel functions that
    // allocate the module scope variable, otherwise leave them unchanged
    // Record on each kernel whether the module scope global is used by it

    for (Function &Func : M.functions()) {

      if (KernelsThatAllocateModuleLDS.contains(&Func)) {
        replaceLDSVariablesWithStruct(
            M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {

        markUsedByKernel(&Func, ModuleScopeReplacement.SGV);

    return ModuleScopeReplacement.SGV;
  lowerKernelScopeStructVariables(

    // Create a struct for each kernel for the non-module-scope variables.

    for (Function &Func : M.functions()) {

      // Allocating variables that are used directly in this struct to get
      // alignment aware allocation and predictable frame size.
        KernelUsedVariables.insert(v);

      // Allocating variables that are accessed indirectly so that a lookup
      // of this struct instance can find them from nested functions.
        KernelUsedVariables.insert(v);

      // Variables allocated in module lds must all resolve to that struct,
      // not to the per-kernel instance.
      if (KernelsThatAllocateModuleLDS.contains(&Func)) {
          KernelUsedVariables.erase(v);

      if (KernelUsedVariables.empty()) {
        // Either used no LDS, or the LDS it used was all in the module
        // struct or dynamically sized

      // The association between kernel function and LDS struct is done by
      // symbol name, which only works if the function in question has a
      // name. This is not expected to be a problem in practice as kernels
      // are called by name making anonymous ones (which are named by the
      // backend) difficult to use. This does mean that llvm test cases need
      // to name the kernels.
      if (!Func.hasName()) {

      std::string VarName =
          (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();

          createLDSVariableReplacement(M, VarName, KernelUsedVariables);

      // If any indirect uses, create a direct use to ensure allocation
      // TODO: Simpler to unconditionally mark used but that regresses
      // codegen in test/CodeGen/AMDGPU/noclobber-barrier.ll
          !Accesses->second.empty())
        markUsedByKernel(&Func, Replacement.SGV);

      // remove preserves existing codegen
      removeLocalVarsFromUsedLists(M, KernelUsedVariables);
      KernelToReplacement[&Func] = Replacement;

      // Rewrite uses within kernel to the new struct
      replaceLDSVariablesWithStruct(
          M, KernelUsedVariables, Replacement, [&Func](Use &U) {
            return I && I->getFunction() == &Func;

    return KernelToReplacement;
  // Create a dynamic lds variable with a name associated with the passed
  // function that has the maximum alignment of any dynamic lds variable
  // reachable from this kernel. Dynamic LDS is allocated after the static
  // LDS allocation, possibly after alignment padding. The representative
  // variable created here has the maximum alignment of any other dynamic
  // variable reachable by that kernel. All dynamic LDS variables are
  // allocated at the same address in each kernel in order to provide the
  // documented aliasing semantics. Setting the alignment here allows this IR
  // pass to accurately predict the exact constant at which it will be
  // allocated.

    Align MaxDynamicAlignment(1);

      MaxDynamicAlignment =

      UpdateMaxAlignment(GV);

      UpdateMaxAlignment(GV);

    N->setAlignment(MaxDynamicAlignment);
      std::vector<Function *> const &OrderedKernels) {

    if (!KernelsThatIndirectlyAllocateDynamicLDS.empty()) {

      std::vector<Constant *> newDynamicLDS;

      // Table is built in the same order as OrderedKernels
      for (auto &func : OrderedKernels) {

        if (KernelsThatIndirectlyAllocateDynamicLDS.contains(func)) {

          if (!func->hasName()) {

              buildRepresentativeDynamicLDSInstance(M, LDSUsesInfo, func);

          KernelToCreatedDynamicLDS[func] = N;

          markUsedByKernel(func, N);

              emptyCharArray, N, ConstantInt::get(I32, 0), true);

      assert(OrderedKernels.size() == newDynamicLDS.size());

      ArrayType *t = ArrayType::get(I32, newDynamicLDS.size());

          "llvm.amdgcn.dynlds.offset.table", nullptr,

          auto *I = dyn_cast<Instruction>(U.getUser());

          replaceUseWithTableLookup(M, Builder, table, GV, U, nullptr);

    return KernelToCreatedDynamicLDS;
    bool NeedsReplacement = false;
      if (auto *I = dyn_cast<Instruction>(U.getUser())) {
          NeedsReplacement = true;

    if (!NeedsReplacement)

    // Create a new GV used only by this kernel and its function
      if (auto *I = dyn_cast<Instruction>(U.getUser())) {
          U.getUser()->replaceUsesOfWith(GV, NewGV);
  bool lowerSpecialLDSVariables(
    // The 1st round: give module-absolute assignments
    std::vector<GlobalVariable *> OrderedGVs;
    for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
      // give a module-absolute assignment if it is indirectly accessed by
      // multiple kernels. This is not precise, but we don't want to
      // duplicate a function when it is called by multiple kernels.
      if (LDSToKernelsThatNeedToAccessItIndirectly[GV].size() > 1) {
        OrderedGVs.push_back(GV);
      }
      // leave it to the 2nd round, which will give a kernel-relative
      // assignment if it is only indirectly accessed by one kernel
      LDSToKernelsThatNeedToAccessItIndirectly.erase(GV);

    OrderedGVs = sortByName(std::move(OrderedGVs));

      int BarId = ++NumAbsolutes;
      // 4 bits for alignment, 5 bits for the barrier num,
      // 3 bits for the barrier scope
      unsigned Offset = 0x802000u | BarrierScope << 9 | BarId << 4;
      recordLDSAbsoluteAddress(&M, GV, Offset);
    // The 2nd round: give a kernel-relative assignment for GV that is
    // either only indirectly accessed by a single kernel or only directly
    // accessed by multiple kernels.
    std::vector<Function *> OrderedKernels;

      OrderedKernels.push_back(F);

    OrderedKernels = sortByName(std::move(OrderedKernels));

        OrderedGVs.push_back(GV);

      OrderedGVs = sortByName(std::move(OrderedGVs));
        // GV could also be used directly by other kernels. If so, we need to
        // create a new GV used only by this kernel and its function.
        auto NewGV = uniquifyGVPerKernel(M, GV, F);
        Changed |= (NewGV != GV);
        int BarId = (NumAbsolutes + 1);
        if (Kernel2BarId.find(F) != Kernel2BarId.end()) {
          BarId = (Kernel2BarId[F] + 1);
        }
        Kernel2BarId[F] = BarId;
        unsigned Offset = 0x802000u | BarrierScope << 9 | BarId << 4;
        recordLDSAbsoluteAddress(&M, NewGV, Offset);
    // Also erase those special LDS variables from indirect_access.

  bool runOnModule(Module &M) {

    bool Changed = superAlignLDSGlobals(M);

      Changed = true; // todo: narrow this down

    // For each kernel, what variables does it access directly or through
    // callees

    // For each variable accessed through callees, which kernels access it
          LDSToKernelsThatNeedToAccessItIndirectly[GV].insert(F);

    // Special LDS variables need special address assignment
    Changed |= lowerSpecialLDSVariables(
        M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly);
    // Partition variables accessed indirectly into the different strategies
    partitionVariablesIntoIndirectStrategies(
        M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly,
        ModuleScopeVariables, TableLookupVariables, KernelAccessVariables,

    // If the kernel accesses a variable that is going to be stored in the
    // module instance through a call then that kernel needs to allocate the
    // module instance.
        kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
                                                        ModuleScopeVariables);

        kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
                                                        TableLookupVariables);

        kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,

    GlobalVariable *MaybeModuleScopeStruct = lowerModuleScopeStructVariables(
        M, ModuleScopeVariables, KernelsThatAllocateModuleLDS);

        lowerKernelScopeStructVariables(M, LDSUsesInfo, ModuleScopeVariables,
                                        KernelsThatAllocateModuleLDS,
                                        MaybeModuleScopeStruct);
    // Lower zero cost accesses to the kernel instances just created
    for (auto &GV : KernelAccessVariables) {
      auto &funcs = LDSToKernelsThatNeedToAccessItIndirectly[GV];
      assert(funcs.size() == 1); // Only one kernel can access it
      LDSVariableReplacement Replacement =
          KernelToReplacement[*(funcs.begin())];

      replaceLDSVariablesWithStruct(M, Vec, Replacement, [](Use &U) {
        return isa<Instruction>(U.getUser());

    // The ith element of this vector is kernel id i
    std::vector<Function *> OrderedKernels =
        assignLDSKernelIDToEachKernel(&M, KernelsThatAllocateTableLDS,
                                      KernelsThatIndirectlyAllocateDynamicLDS);
    if (!KernelsThatAllocateTableLDS.empty()) {

      // The order must be consistent between lookup table and accesses to
      // the table
      auto TableLookupVariablesOrdered =
          sortByName(std::vector<GlobalVariable *>(TableLookupVariables.begin(),
                                                   TableLookupVariables.end()));

          M, TableLookupVariablesOrdered, OrderedKernels, KernelToReplacement);
      replaceUsesInInstructionsWithTableLookup(M, TableLookupVariablesOrdered,

      lowerDynamicLDSVariables(M, LDSUsesInfo,
                               KernelsThatIndirectlyAllocateDynamicLDS,
                               DynamicVariables, OrderedKernels);
    // Strip amdgpu-no-lds-kernel-id from all functions reachable from the
    // kernel. We may have inferred this wasn't used prior to the pass.
    // TODO: We could filter out subgraphs that do not access LDS globals.
    for (auto *KernelSet : {&KernelsThatIndirectlyAllocateDynamicLDS,
                            &KernelsThatAllocateTableLDS})

    // All kernel frames have been allocated. Calculate and record the
    // addresses.
    for (Function &Func : M.functions()) {

      // All three of these are optional. The first variable is allocated at
      // zero. They are allocated by AMDGPUMachineFunction as one block.
      // dynamic lds variables

      const bool AllocateModuleScopeStruct =
          MaybeModuleScopeStruct &&
          KernelsThatAllocateModuleLDS.contains(&Func);

      auto Replacement = KernelToReplacement.find(&Func);
      const bool AllocateKernelScopeStruct =
          Replacement != KernelToReplacement.end();

      const bool AllocateDynamicVariable =
          KernelToCreatedDynamicLDS.contains(&Func);

      if (AllocateModuleScopeStruct) {
        // Allocated at zero, recorded once on construction, not once per
        // kernel

      if (AllocateKernelScopeStruct) {
        recordLDSAbsoluteAddress(&M, KernelStruct, Offset);

      // If there is dynamic allocation, the alignment needed is included in
      // the static frame size. There may be no reference to the dynamic
      // variable in the kernel itself, so without including it here, that
      // alignment padding could be missed.
      if (AllocateDynamicVariable) {
        GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];
        recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);

      (void)TM; // TODO: Account for target maximum LDS

      // Instead of explicitly marking kernels that access dynamic variables
      // using special case metadata, annotate with min-lds == max-lds, i.e.
      // that there is no more space available for allocating more static
      // LDS variables. That is the right condition to prevent allocating
      // more variables which would collide with the addresses assigned to
      // dynamic variables.
      if (AllocateDynamicVariable)
        Func.addFnAttr("amdgpu-lds-size", Buffer);
// probably want to remove from used lists

// Increase the alignment of LDS globals if necessary to maximise the chance
// that we can use aligned LDS instructions to access them.
static bool superAlignLDSGlobals(Module &M) {
  const DataLayout &DL = M.getDataLayout();
  bool Changed = false;
  if (!SuperAlignLDSGlobals) {
    return Changed;
  }

  for (auto &GV : M.globals()) {
    if (GV.getType()->getPointerAddressSpace() != AMDGPUAS::LOCAL_ADDRESS) {
      // Only changing alignment of LDS variables
      continue;
    }
    if (!GV.hasInitializer()) {
      // cuda/hip extern __shared__ variable, leave alignment alone
      continue;
    }

    if (GV.isAbsoluteSymbolRef()) {
      // If the variable is already allocated, don't change the alignment
      continue;
    }

    Align Alignment = AMDGPU::getAlign(DL, &GV);
    TypeSize GVSize = DL.getTypeAllocSize(GV.getValueType());

    if (GVSize > 8) {
      // We might want to use a b96 or b128 load/store
      Alignment = std::max(Alignment, Align(16));
    } else if (GVSize > 4) {
      // We might want to use a b64 load/store
      Alignment = std::max(Alignment, Align(8));
    } else if (GVSize > 2) {
      // We might want to use a b32 load/store
      Alignment = std::max(Alignment, Align(4));
    } else if (GVSize > 1) {
      // We might want to use a b16 load/store
      Alignment = std::max(Alignment, Align(2));
    }

    if (Alignment != AMDGPU::getAlign(DL, &GV)) {
      Changed = true;
      GV.setAlignment(Alignment);
    }
  }
  return Changed;
}
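The size-to-alignment bucketing in superAlignLDSGlobals can be captured in one pure function. This is an illustrative sketch with a hypothetical name (`superAlign`), not LLVM API: it picks the alignment that would make the widest useful LDS load/store (b16 through b128) legal for a variable of the given size, never lowering an alignment that is already stronger.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Alignment superAlignLDSGlobals would request for a variable of the
// given byte size: >8 bytes wants 16 (b96/b128), >4 wants 8 (b64),
// >2 wants 4 (b32), >1 wants 2 (b16). Existing alignment is kept if larger.
static uint64_t superAlign(uint64_t Size, uint64_t CurrentAlign) {
  uint64_t Wanted =
      Size > 8 ? 16 : Size > 4 ? 8 : Size > 2 ? 4 : Size > 1 ? 2 : 1;
  return std::max(CurrentAlign, Wanted);
}
```

A 12-byte variable is bumped to 16-byte alignment, while a 6-byte variable already aligned to 8 is left alone.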
static LDSVariableReplacement createLDSVariableReplacement(
    Module &M, std::string VarName,
    DenseSet<GlobalVariable *> const &LDSVarsToTransform) {
  // Create a struct instance containing LDSVarsToTransform and map from those
  // variables to ConstantExprGEP
  // Variables may be introduced to meet alignment requirements. No aliasing
  // metadata is useful for these as they have no uses. Erased before return.

  LLVMContext &Ctx = M.getContext();
  const DataLayout &DL = M.getDataLayout();
  assert(!LDSVarsToTransform.empty());

  SmallVector<OptimizedStructLayoutField, 8> LayoutFields;
  LayoutFields.reserve(LDSVarsToTransform.size());
  {
    // The order of fields in this struct depends on the order of
    // variables in the argument which varies when changing how they
    // are identified, leading to spurious test breakage.
    auto Sorted = sortByName(std::vector<GlobalVariable *>(
        LDSVarsToTransform.begin(), LDSVarsToTransform.end()));

    for (GlobalVariable *GV : Sorted) {
      OptimizedStructLayoutField F(GV,
                                   DL.getTypeAllocSize(GV->getValueType()),
                                   AMDGPU::getAlign(DL, GV));
      LayoutFields.emplace_back(F);
    }
  }

  performOptimizedStructLayout(LayoutFields);

  std::vector<GlobalVariable *> LocalVars;
  BitVector IsPaddingField;
  LocalVars.reserve(LDSVarsToTransform.size()); // will be at least this large
  IsPaddingField.reserve(LDSVarsToTransform.size());
  {
    uint64_t CurrentOffset = 0;
    for (auto &F : LayoutFields) {
      GlobalVariable *FGV =
          static_cast<GlobalVariable *>(const_cast<void *>(F.Id));
      Align DataAlign = F.Alignment;

      uint64_t DataAlignV = DataAlign.value();
      if (uint64_t Rem = CurrentOffset % DataAlignV) {
        uint64_t Padding = DataAlignV - Rem;

        // Append an array of padding bytes to meet alignment requested
        // Note (o + (a - (o % a)) ) % a == 0
        //      (offset + Padding   ) % align == 0

        Type *ATy = ArrayType::get(Type::getInt8Ty(Ctx), Padding);
        LocalVars.push_back(new GlobalVariable(
            M, ATy, false, GlobalValue::InternalLinkage,
            PoisonValue::get(ATy), "", nullptr, GlobalValue::NotThreadLocal,
            AMDGPUAS::LOCAL_ADDRESS, false));
        IsPaddingField.push_back(true);
        CurrentOffset += Padding;
      }

      LocalVars.push_back(FGV);
      IsPaddingField.push_back(false);
      CurrentOffset += F.Size;
    }
  }

  std::vector<Type *> LocalVarTypes;
  LocalVarTypes.reserve(LocalVars.size());
  std::transform(
      LocalVars.cbegin(), LocalVars.cend(), std::back_inserter(LocalVarTypes),
      [](const GlobalVariable *V) -> Type * { return V->getValueType(); });

  StructType *LDSTy = StructType::create(Ctx, LocalVarTypes, VarName + ".t");

  Align StructAlign = AMDGPU::getAlign(DL, LocalVars[0]);

  GlobalVariable *SGV = new GlobalVariable(
      M, LDSTy, false, GlobalValue::InternalLinkage, PoisonValue::get(LDSTy),
      VarName, nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
      false);
  SGV->setAlignment(StructAlign);

  DenseMap<GlobalVariable *, Constant *> Map;
  Type *I32 = Type::getInt32Ty(Ctx);
  for (size_t I = 0; I < LocalVars.size(); I++) {
    GlobalVariable *GV = LocalVars[I];
    Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};
    Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx, true);
    if (IsPaddingField[I]) {
      assert(GV->use_empty());
      GV->eraseFromParent();
    } else {
      Map[GV] = GEP;
    }
  }
  assert(Map.size() == LDSVarsToTransform.size());
  return {SGV, std::move(Map)};
}
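The padding identity noted in the comment above, (o + (a - (o % a))) % a == 0, is easy to check in isolation. The helper name `paddingFor` is hypothetical; it computes the byte count the layout loop inserts before a field so that the field starts at a multiple of its alignment.

```cpp
#include <cassert>
#include <cstdint>

// Bytes of padding needed so that Offset + Padding is a multiple of Align,
// i.e. (o + (a - (o % a))) % a == 0 when o % a != 0, and 0 otherwise.
static uint64_t paddingFor(uint64_t Offset, uint64_t Align) {
  uint64_t Rem = Offset % Align;
  return Rem ? Align - Rem : 0;
}
```

So a field with 4-byte alignment placed at running offset 5 is preceded by a 3-byte i8 array, and one already at offset 8 gets none.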
template <typename PredicateTy>
static void replaceLDSVariablesWithStruct(
    Module &M, DenseSet<GlobalVariable *> const &LDSVarsToTransformArg,
    const LDSVariableReplacement &Replacement, PredicateTy Predicate) {
  LLVMContext &Ctx = M.getContext();
  const DataLayout &DL = M.getDataLayout();

  // A hack... we need to insert the aliasing info in a predictable order for
  // lit tests. Would like to have them in a stable order already, ideally the
  // same order they get allocated, which might mean an ordered set container
  auto LDSVarsToTransform = sortByName(std::vector<GlobalVariable *>(
      LDSVarsToTransformArg.begin(), LDSVarsToTransformArg.end()));

  // Create alias.scope and their lists. Each field in the new structure
  // does not alias with all other fields.
  SmallVector<MDNode *> AliasScopes;
  SmallVector<Metadata *> NoAliasList;
  const size_t NumberVars = LDSVarsToTransform.size();
  if (NumberVars > 1) {
    MDBuilder MDB(Ctx);
    AliasScopes.reserve(NumberVars);
    MDNode *Domain = MDB.createAnonymousAliasScopeDomain();
    for (size_t I = 0; I < NumberVars; I++) {
      MDNode *Scope = MDB.createAnonymousAliasScope(Domain);
      AliasScopes.push_back(Scope);
    }
    NoAliasList.append(&AliasScopes[1], AliasScopes.end());
  }

  // Replace uses of ith variable with a constantexpr to the corresponding
  // field of the instance that will be allocated by AMDGPUMachineFunction
  for (size_t I = 0; I < NumberVars; I++) {
    GlobalVariable *GV = LDSVarsToTransform[I];
    Constant *GEP = Replacement.LDSVarsToConstantGEP.at(GV);

    GV->replaceUsesWithIf(GEP, Predicate);

    APInt APOff(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
    GEP->stripAndAccumulateInBoundsConstantOffsets(DL, APOff);
    uint64_t Offset = APOff.getZExtValue();

    Align A =
        commonAlignment(Replacement.SGV->getAlign().valueOrOne(), Offset);

    if (I)
      NoAliasList[I - 1] = AliasScopes[I - 1];
    MDNode *NoAlias =
        NoAliasList.empty() ? nullptr : MDNode::get(Ctx, NoAliasList);
    MDNode *AliasScope =
        AliasScopes.empty() ? nullptr : MDNode::get(Ctx, {AliasScopes[I]});

    refineUsesAlignmentAndAA(GEP, A, DL, AliasScope, NoAlias);
  }
}
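The incremental NoAliasList update above is subtle: the list starts as scopes 1..N-1 (correct for field 0), and before processing field I the slot I-1 is overwritten with scope I-1, so the list is always "every scope except field I's own". A standalone sketch with ints standing in for metadata nodes (`buildNoAliasLists` is a hypothetical name) makes the invariant visible:

```cpp
#include <cassert>
#include <vector>

// Reproduce the incremental noalias-list construction: Out[I] must contain
// every scope except Scopes[I].
static std::vector<std::vector<int>> buildNoAliasLists(int N) {
  std::vector<int> Scopes(N);
  for (int I = 0; I < N; I++)
    Scopes[I] = I;
  // Initial list: scopes 1..N-1, which is correct for field 0.
  std::vector<int> NoAlias(Scopes.begin() + 1, Scopes.end());
  std::vector<std::vector<int>> Out;
  for (int I = 0; I < N; I++) {
    if (I)
      NoAlias[I - 1] = Scopes[I - 1]; // swap field I's scope out, I-1's in
    Out.push_back(NoAlias);
  }
  return Out;
}
```

For three fields this produces {1,2}, {0,2}, {0,1}: each field's noalias set is the other two scopes, without rebuilding the list from scratch per field.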
static void refineUsesAlignmentAndAA(Value *Ptr, Align A,
                                     const DataLayout &DL, MDNode *AliasScope,
                                     MDNode *NoAlias, unsigned MaxDepth = 5) {
  if (!MaxDepth || (A == 1 && !AliasScope))
    return;

  for (User *U : Ptr->users()) {
    if (auto *I = dyn_cast<Instruction>(U)) {
      if (AliasScope && I->mayReadOrWriteMemory()) {
        MDNode *AS = I->getMetadata(LLVMContext::MD_alias_scope);
        AS = (AS ? MDNode::getMostGenericAliasScope(AS, AliasScope)
                 : AliasScope);
        I->setMetadata(LLVMContext::MD_alias_scope, AS);

        MDNode *NA = I->getMetadata(LLVMContext::MD_noalias);
        NA = (NA ? MDNode::intersect(NA, NoAlias) : NoAlias);
        I->setMetadata(LLVMContext::MD_noalias, NA);
      }
    }

    if (auto *LI = dyn_cast<LoadInst>(U)) {
      LI->setAlignment(std::max(A, LI->getAlign()));
      continue;
    }
    if (auto *SI = dyn_cast<StoreInst>(U)) {
      if (SI->getPointerOperand() == Ptr)
        SI->setAlignment(std::max(A, SI->getAlign()));
      continue;
    }
    if (auto *AI = dyn_cast<AtomicRMWInst>(U)) {
      // None of atomicrmw operations can work on pointers, but let's
      // check it anyway in case it will or we will process ConstantExpr.
      if (AI->getPointerOperand() == Ptr)
        AI->setAlignment(std::max(A, AI->getAlign()));
      continue;
    }
    if (auto *AI = dyn_cast<AtomicCmpXchgInst>(U)) {
      if (AI->getPointerOperand() == Ptr)
        AI->setAlignment(std::max(A, AI->getAlign()));
      continue;
    }
    if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
      APInt Off(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
      if (GEP->getPointerOperand() == Ptr) {
        Align GA;
        if (GEP->accumulateConstantOffset(DL, Off))
          GA = commonAlignment(A, Off.getLimitedValue());
        refineUsesAlignmentAndAA(GEP, GA, DL, AliasScope, NoAlias,
                                 MaxDepth - 1);
      }
      continue;
    }
    if (auto *I = dyn_cast<Instruction>(U)) {
      if (I->getOpcode() == Instruction::BitCast ||
          I->getOpcode() == Instruction::AddrSpaceCast)
        refineUsesAlignmentAndAA(I, A, DL, AliasScope, NoAlias, MaxDepth - 1);
    }
  }
}
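The commonAlignment calls above compute the alignment that survives adding a constant offset to an aligned base. A minimal standalone sketch (hypothetical name `commonAlign`; LLVM's real `llvm::commonAlignment` works on the `Align` type) shows the idea: the result is the smaller of the base alignment and the largest power of two dividing the offset.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Largest alignment guaranteed at base + Offset when base is A-aligned
// (A a power of two). Offset & -Offset isolates the lowest set bit, i.e.
// the largest power of two dividing Offset; a zero offset keeps A.
static uint64_t commonAlign(uint64_t A, uint64_t Offset) {
  return Offset ? std::min(A, Offset & (~Offset + 1)) : A;
}
```

A 16-aligned struct base plus a field offset of 4 yields only 4-byte alignment for that field's uses, which is why deeper GEPs get progressively weaker guarantees.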
class AMDGPULowerModuleLDSLegacy : public ModulePass {
public:
  const AMDGPUTargetMachine *TM;
  static char ID;

  AMDGPULowerModuleLDSLegacy(const AMDGPUTargetMachine *TM_ = nullptr)
      : ModulePass(ID), TM(TM_) {
    initializeAMDGPULowerModuleLDSLegacyPass(*PassRegistry::getPassRegistry());
  }

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    if (!TM)
      AU.addRequired<TargetPassConfig>();
  }

  bool runOnModule(Module &M) override {
    if (!TM) {
      auto &TPC = getAnalysis<TargetPassConfig>();
      TM = &TPC.getTM<AMDGPUTargetMachine>();
    }

    return AMDGPULowerModuleLDS(*TM).runOnModule(M);
  }
};

char AMDGPULowerModuleLDSLegacy::ID = 0;

char &llvm::AMDGPULowerModuleLDSLegacyPassID = AMDGPULowerModuleLDSLegacy::ID;

INITIALIZE_PASS_BEGIN(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
                      "Lower uses of LDS variables from non-kernel functions",
                      false, false)
INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
INITIALIZE_PASS_END(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
                    "Lower uses of LDS variables from non-kernel functions",
                    false, false)

ModulePass *
llvm::createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM) {
  return new AMDGPULowerModuleLDSLegacy(TM);
}