//===-- AMDGPULowerModuleLDSPass.cpp ------------------------------*- C++ -*-=//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass eliminates local data store, LDS, uses from non-kernel functions.
// LDS is contiguous memory allocated per kernel execution.
//
// Background.
//
// The programming model is global variables, or equivalently function local
// static variables, accessible from kernels or other functions. For uses from
// kernels this is straightforward - assign an integer to the kernel for the
// memory required by all the variables combined, allocate them within that.
// For uses from functions there are performance tradeoffs to choose between.
//
// This model means the GPU runtime can specify the amount of memory allocated.
// If this is more than the kernel assumed, the excess can be made available
// using a language specific feature, which IR represents as a variable with
// no initializer. This feature is referred to here as "Dynamic LDS" and is
// lowered slightly differently to the normal case.
//
// Consequences of this GPU feature:
// - memory is limited and exceeding it halts compilation
// - a global accessed by one kernel exists independent of other kernels
// - a global exists independent of simultaneous execution of the same kernel
// - the address of the global may be different from different kernels as they
//   do not alias, which permits only allocating variables they use
// - if the address is allowed to differ, functions need help to find it
//
// Uses from kernels are implemented here by grouping them in a per-kernel
// struct instance. This duplicates the variables, accurately modelling their
// aliasing properties relative to a single global representation. It also
// permits control over alignment via padding.
//
// Uses from functions are more complicated and are the primary purpose of this
// IR pass. Several different lowerings are chosen between to meet requirements
// to avoid allocating any LDS where it is not necessary, as that impacts
// occupancy and may fail the compilation, while not imposing overhead on a
// feature whose primary advantage over global memory is performance. The basic
// design goal is to avoid one kernel imposing overhead on another.
//
// Implementation.
//
// LDS variables with constant annotation or non-undef initializer are passed
// through unchanged for simplification or error diagnostics in later passes.
// Non-undef initializers are not yet implemented for LDS.
//
// LDS variables that are always allocated at the same address can be found
// by lookup at that address. Otherwise runtime information/cost is required.
//
// The simplest strategy possible is to group all LDS variables in a single
// struct and allocate that struct in every kernel such that the original
// variables are always at the same address. LDS is however a limited resource
// so this strategy is unusable in practice. It is not implemented here.
//
// Strategy | Precise allocation | Zero runtime cost | General purpose |
//  --------+--------------------+-------------------+-----------------+
//   Module |                 No |               Yes |             Yes |
//    Table |                Yes |                No |             Yes |
//   Kernel |                Yes |               Yes |              No |
//   Hybrid |                Yes |           Partial |             Yes |
//
// "Module" spends LDS memory to save cycles. "Table" spends cycles and global
// memory to save LDS. "Kernel" is as fast as kernel allocation but only works
// for variables that are known reachable from a single kernel. "Hybrid" picks
// between all three. When forced to choose between LDS and cycles we minimise
// LDS use.

// The "module" lowering implemented here finds LDS variables which are used by
// non-kernel functions and creates a new struct with a field for each of those
// LDS variables. Variables that are only used from kernels are excluded.
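//
// As an illustrative sketch (variable names invented), module lowering of
// i32 variables @x and @y that are used from non-kernel functions creates:
//   %llvm.amdgcn.module.lds.t = type { i32, i32 }
//   @llvm.amdgcn.module.lds = internal addrspace(3) global
//       %llvm.amdgcn.module.lds.t poison, align 4
// and rewrites their uses to constantexpr geps into that instance, which is
// allocated at address zero by every kernel that may call such functions.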
//
// The "table" lowering implemented here has three components.
// First kernels are assigned a unique integer identifier which is available in
// functions they call through the intrinsic amdgcn_lds_kernel_id. The integer
// is passed through a specific SGPR, thus works with indirect calls.
// Second, each kernel allocates LDS variables independent of other kernels and
// writes the addresses it chose for each variable into an array in consistent
// order. If the kernel does not allocate a given variable, it writes poison to
// the corresponding array location. These arrays are written to a constant
// table in the order matching the kernel unique integer identifier.
// Third, uses from non-kernel functions are replaced with a table lookup using
// the intrinsic function to find the address of the variable.
//
// "Kernel" lowering is only applicable for variables that are unambiguously
// reachable from exactly one kernel. For those cases, accesses to the variable
// can be lowered to ConstantExpr address of a struct instance specific to that
// one kernel. This is zero cost in space and in compute. It will raise a fatal
// error on any variable that might be reachable from multiple kernels and is
// thus most easily used as part of the hybrid lowering strategy.
//
// Hybrid lowering is a mixture of the above. It uses the zero cost kernel
// lowering where it can. It lowers the variable accessed by the greatest
// number of kernels using the module strategy as that is free for the first
// variable. Any further variables that can be lowered with the module strategy
// without incurring LDS memory overhead are lowered that way. The remaining
// ones are lowered via table.
//
// Consequences
// - No heuristics or user controlled magic numbers, hybrid is the right choice
// - Kernels that don't use functions (or have had them all inlined) are not
//   affected by any lowering for kernels that do.
// - Kernels that don't make indirect function calls are not affected by those
//   that do.
// - Variables which are used by lots of kernels, e.g. those injected by a
//   language runtime in most kernels, are expected to have no overhead
// - Implementations that instantiate templates per-kernel where those templates
//   use LDS are expected to hit the "Kernel" lowering strategy
// - The runtime properties impose a cost in compiler implementation complexity
//
// Dynamic LDS implementation
// Dynamic LDS is lowered similarly to the "table" strategy above and uses the
// same intrinsic to identify which kernel is at the root of the dynamic call
// graph. This relies on the specified behaviour that all dynamic LDS variables
// alias one another, i.e. are at the same address, with respect to a given
// kernel. Therefore this pass creates new dynamic LDS variables for each kernel
// that allocates any dynamic LDS and builds a table of addresses out of those.
// The AMDGPUPromoteAlloca pass skips kernels that use dynamic LDS.
// The corresponding optimisation for "kernel" lowering where the table lookup
// is elided is not implemented.
//
//
// Implementation notes / limitations
// A single LDS global variable represents an instance per kernel that can
// reach said variable. This pass essentially specialises those variables per
// kernel. Handling ConstantExpr during the pass complicated this significantly,
// so now all ConstantExpr uses of LDS variables are expanded to instructions.
// This may need amending when implementing non-undef initialisers.
//
// Lowering is split between this IR pass and the back end. This pass chooses
// where given variables should be allocated and marks them with metadata,
// MD_absolute_symbol. The backend places the variables in coincidentally the
// same location and raises a fatal error if something has gone awry. This works
// in practice because the only pass between this one and the backend that
// changes LDS is PromoteAlloca and the changes it makes do not conflict.
//
// Addresses are written to constant global arrays based on the same metadata.
//
// The backend lowers LDS variables in the order of traversal of the function.
// This is at odds with the deterministic layout required. The workaround is to
// allocate the fixed-address variables immediately upon starting the function
// where they can be placed as intended. This requires a means of mapping from
// the function to the variables that it allocates. For the module scope lds,
// this is via metadata indicating whether the variable is not required. If a
// pass deletes that metadata, a fatal error on disagreement with the absolute
// symbol metadata will occur. For kernel scope and dynamic, this is by _name_
// correspondence between the function and the variable. It requires the
// kernel to have a name (which is only a limitation for tests in practice) and
// for nothing to rename the corresponding symbols. This is a hazard if the pass
// is run multiple times during debugging. Alternative schemes considered all
// involve bespoke metadata.
//
// If the name correspondence can be replaced, multiple distinct kernels that
// have the same memory layout can map to the same kernel id (as the address
// itself is handled by the absolute symbol metadata) and that will allow more
// uses of the "kernel" style faster lowering and reduce the size of the lookup
// tables.
//
// There is a test that checks this does not fire for a graphics shader. This
// lowering is expected to work for graphics if the isKernel test is changed.
//
// The current markUsedByKernel is sufficient for PromoteAlloca but is elided
// before codegen. Replacing this with an equivalent intrinsic which lasts until
// shortly after the machine function lowering of LDS would help break the name
// mapping. The other part needed is probably to amend PromoteAlloca to embed
// the LDS variables it creates in the same struct created here. That avoids the
// current hazard where a PromoteAlloca LDS variable might be allocated before
// the kernel scope (and thus error on the address check). Given a new invariant
// that no LDS variables exist outside of the structs managed here, and an
// intrinsic that lasts until after the LDS frame lowering, it should be
// possible to drop the name mapping and fold equivalent memory layouts.
//
//===----------------------------------------------------------------------===//

#include "AMDGPU.h"
#include "AMDGPUMemoryUtils.h"
#include "AMDGPUTargetMachine.h"
#include "Utils/AMDGPUBaseInfo.h"
#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetOperations.h"
#include "llvm/Analysis/CallGraph.h"
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/ReplaceConstant.h"
#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/Format.h"
#include "llvm/Support/OptimizedStructLayout.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"

#include <vector>

#include <cstdio>

#define DEBUG_TYPE "amdgpu-lower-module-lds"

using namespace llvm;
using namespace AMDGPU;

namespace {

cl::opt<bool> SuperAlignLDSGlobals(
    "amdgpu-super-align-lds-globals",
    cl::desc("Increase alignment of LDS if it is not on align boundary"),
    cl::init(true), cl::Hidden);

enum class LoweringKind { module, table, kernel, hybrid };
cl::opt<LoweringKind> LoweringKindLoc(
    "amdgpu-lower-module-lds-strategy",
    cl::desc("Specify lowering strategy for function LDS access:"), cl::Hidden,
    cl::init(LoweringKind::hybrid),
    cl::values(
        clEnumValN(LoweringKind::table, "table", "Lower via table lookup"),
        clEnumValN(LoweringKind::module, "module", "Lower via module struct"),
        clEnumValN(
            LoweringKind::kernel, "kernel",
            "Lower variables reachable from one kernel, otherwise abort"),
        clEnumValN(LoweringKind::hybrid, "hybrid",
                   "Lower via mixture of above strategies")));

template <typename T> std::vector<T> sortByName(std::vector<T> &&V) {
  llvm::sort(V.begin(), V.end(), [](const auto *L, const auto *R) {
    return L->getName() < R->getName();
  });
  return {std::move(V)};
}

class AMDGPULowerModuleLDS {
  const AMDGPUTargetMachine &TM;

  static void
  removeLocalVarsFromUsedLists(Module &M,
                               const DenseSet<GlobalVariable *> &LocalVars) {
    // The verifier rejects used lists containing an inttoptr of a constant
    // so remove the variables from these lists before replaceAllUsesWith
    SmallPtrSet<Constant *, 8> LocalVarsSet;
    for (GlobalVariable *LocalVar : LocalVars)
      LocalVarsSet.insert(cast<Constant>(LocalVar->stripPointerCasts()));

    removeFromUsedLists(
        M, [&LocalVarsSet](Constant *C) { return LocalVarsSet.count(C); });

    for (GlobalVariable *LocalVar : LocalVars)
      LocalVar->removeDeadConstantUsers();
  }

  static void markUsedByKernel(Function *Func, GlobalVariable *SGV) {
    // The llvm.amdgcn.module.lds instance is implicitly used by all kernels
    // that might call a function which accesses a field within it. This is
    // presently approximated to 'all kernels' if there are any such functions
    // in the module. This implicit use is redefined as an explicit use here so
    // that later passes, specifically PromoteAlloca, account for the required
    // memory without any knowledge of this transform.

    // An operand bundle on llvm.donothing works because the call instruction
    // survives until after the last pass that needs to account for LDS. It is
    // better than inline asm as the latter survives until the end of codegen. A
    // totally robust solution would be a function with the same semantics as
    // llvm.donothing that takes a pointer to the instance and is lowered to a
    // no-op after LDS is allocated, but that is not presently necessary.

    // This intrinsic is eliminated shortly before instruction selection. It
    // does not suffice to indicate to ISel that a given global which is not
    // immediately used by the kernel must still be allocated by it. An
    // equivalent target specific intrinsic which lasts until immediately after
    // codegen would suffice for that, but one would still need to ensure that
    // the variables are allocated in the anticipated order.
    BasicBlock *Entry = &Func->getEntryBlock();
    IRBuilder<> Builder(Entry, Entry->getFirstNonPHIIt());

    Function *Decl = Intrinsic::getOrInsertDeclaration(
        Func->getParent(), Intrinsic::donothing, {});

    Value *UseInstance[1] = {
        Builder.CreateConstInBoundsGEP1_32(SGV->getValueType(), SGV, 0)};

    Builder.CreateCall(
        Decl, {}, {OperandBundleDefT<Value *>("ExplicitUse", UseInstance)});
  }
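  // The call emitted above prints roughly as (sketch; the bundled pointer is
  // whichever struct instance was passed in, e.g. the module scope struct):
  //   call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3)
  //       @llvm.amdgcn.module.lds) ]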

public:
  AMDGPULowerModuleLDS(const AMDGPUTargetMachine &TM_) : TM(TM_) {}

  struct LDSVariableReplacement {
    GlobalVariable *SGV = nullptr;
    DenseMap<GlobalVariable *, Constant *> LDSVarsToConstantGEP;
  };

  // Remap from an LDS global to the constantexpr GEP it has been moved to
  // within each kernel, and an array with an element for each kernel
  // containing where the corresponding variable was remapped to.

  static Constant *getAddressesOfVariablesInKernel(
      LLVMContext &Ctx, ArrayRef<GlobalVariable *> Variables,
      const DenseMap<GlobalVariable *, Constant *> &LDSVarsToConstantGEP) {
    // Create a ConstantArray containing the address of each Variable within the
    // kernel corresponding to LDSVarsToConstantGEP, or poison if that kernel
    // does not allocate it
    // TODO: Drop the ptrtoint conversion

    Type *I32 = Type::getInt32Ty(Ctx);

    ArrayType *KernelOffsetsType = ArrayType::get(I32, Variables.size());

    SmallVector<Constant *> Elements;
    for (GlobalVariable *GV : Variables) {
      auto ConstantGepIt = LDSVarsToConstantGEP.find(GV);
      if (ConstantGepIt != LDSVarsToConstantGEP.end()) {
        auto *elt = ConstantExpr::getPtrToInt(ConstantGepIt->second, I32);
        Elements.push_back(elt);
      } else {
        Elements.push_back(PoisonValue::get(I32));
      }
    }
    return ConstantArray::get(KernelOffsetsType, Elements);
  }

  static GlobalVariable *buildLookupTable(
      Module &M, ArrayRef<GlobalVariable *> Variables,
      ArrayRef<Function *> kernels,
      DenseMap<Function *, LDSVariableReplacement> &KernelToReplacement) {
    if (Variables.empty()) {
      return nullptr;
    }
    LLVMContext &Ctx = M.getContext();

    const size_t NumberVariables = Variables.size();
    const size_t NumberKernels = kernels.size();

    ArrayType *KernelOffsetsType =
        ArrayType::get(Type::getInt32Ty(Ctx), NumberVariables);

    ArrayType *AllKernelsOffsetsType =
        ArrayType::get(KernelOffsetsType, NumberKernels);

    Constant *Missing = PoisonValue::get(KernelOffsetsType);
    std::vector<Constant *> overallConstantExprElts(NumberKernels);
    for (size_t i = 0; i < NumberKernels; i++) {
      auto Replacement = KernelToReplacement.find(kernels[i]);
      overallConstantExprElts[i] =
          (Replacement == KernelToReplacement.end())
              ? Missing
              : getAddressesOfVariablesInKernel(
                    Ctx, Variables, Replacement->second.LDSVarsToConstantGEP);
    }

    Constant *init =
        ConstantArray::get(AllKernelsOffsetsType, overallConstantExprElts);

    return new GlobalVariable(
        M, AllKernelsOffsetsType, true, GlobalValue::InternalLinkage, init,
        "llvm.amdgcn.lds.offset.table", nullptr, GlobalValue::NotThreadLocal,
        AMDGPUAS::CONSTANT_ADDRESS);
  }
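  // The table is, schematically (sketch):
  //   @llvm.amdgcn.lds.offset.table = internal addrspace(4) constant
  //       [NumberKernels x [NumberVariables x i32]] [...]
  // where element [i][j] holds the LDS address of variable j within kernel
  // i, or poison if kernel i does not allocate that variable.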

  void replaceUseWithTableLookup(Module &M, IRBuilder<> &Builder,
                                 GlobalVariable *LookupTable,
                                 GlobalVariable *GV, Use &U,
                                 Value *OptionalIndex) {
    // Table is a constant array of the same length as OrderedKernels
    LLVMContext &Ctx = M.getContext();
    Type *I32 = Type::getInt32Ty(Ctx);
    auto *I = cast<Instruction>(U.getUser());

    Value *tableKernelIndex = getTableLookupKernelIndex(M, I->getFunction());

    if (auto *Phi = dyn_cast<PHINode>(I)) {
      BasicBlock *BB = Phi->getIncomingBlock(U);
      Builder.SetInsertPoint(&(*(BB->getFirstInsertionPt())));
    } else {
      Builder.SetInsertPoint(I);
    }

    SmallVector<Value *, 3> GEPIdx = {
        ConstantInt::get(I32, 0),
        tableKernelIndex,
    };
    if (OptionalIndex)
      GEPIdx.push_back(OptionalIndex);

    Value *Address = Builder.CreateInBoundsGEP(
        LookupTable->getValueType(), LookupTable, GEPIdx, GV->getName());

    Value *loaded = Builder.CreateLoad(I32, Address);

    Value *replacement =
        Builder.CreateIntToPtr(loaded, GV->getType(), GV->getName());

    U.set(replacement);
  }
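  // The sequence emitted for one use is roughly (sketch; the final index is
  // present when indexing a multi-variable table):
  //   %id  = call i32 @llvm.amdgcn.lds.kernel.id()
  //   %gep = getelementptr inbounds <table type>, ptr addrspace(4) <table>,
  //          i32 0, i32 %id[, i32 <variable index>]
  //   %off = load i32, ptr addrspace(4) %gep
  //   %ptr = inttoptr i32 %off to ptr addrspace(3)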

  void replaceUsesInInstructionsWithTableLookup(
      Module &M, ArrayRef<GlobalVariable *> ModuleScopeVariables,
      GlobalVariable *LookupTable) {

    LLVMContext &Ctx = M.getContext();
    IRBuilder<> Builder(Ctx);
    Type *I32 = Type::getInt32Ty(Ctx);

    for (size_t Index = 0; Index < ModuleScopeVariables.size(); Index++) {
      auto *GV = ModuleScopeVariables[Index];

      for (Use &U : make_early_inc_range(GV->uses())) {
        auto *I = dyn_cast<Instruction>(U.getUser());
        if (!I)
          continue;

        replaceUseWithTableLookup(M, Builder, LookupTable, GV, U,
                                  ConstantInt::get(I32, Index));
      }
    }
  }

  static DenseSet<Function *> kernelsThatIndirectlyAccessAnyOfPassedVariables(
      Module &M, LDSUsesInfoTy &LDSUsesInfo,
      DenseSet<GlobalVariable *> const &VariableSet) {

    DenseSet<Function *> KernelSet;

    if (VariableSet.empty())
      return KernelSet;

    for (Function &Func : M.functions()) {
      if (Func.isDeclaration() || !isKernelLDS(&Func))
        continue;
      for (GlobalVariable *GV : LDSUsesInfo.indirect_access[&Func]) {
        if (VariableSet.contains(GV)) {
          KernelSet.insert(&Func);
          break;
        }
      }
    }

    return KernelSet;
  }

  static GlobalVariable *
  chooseBestVariableForModuleStrategy(const DataLayout &DL,
                                      VariableFunctionMap &LDSVars) {
    // Find the global variable with the most indirect uses from kernels

    struct CandidateTy {
      GlobalVariable *GV = nullptr;
      size_t UserCount = 0;
      size_t Size = 0;

      CandidateTy() = default;

      CandidateTy(GlobalVariable *GV, uint64_t UserCount, uint64_t AllocSize)
          : GV(GV), UserCount(UserCount), Size(AllocSize) {}

      bool operator<(const CandidateTy &Other) const {
        // Fewer users makes module scope variable less attractive
        if (UserCount < Other.UserCount) {
          return true;
        }
        if (UserCount > Other.UserCount) {
          return false;
        }

        // Bigger makes module scope variable less attractive
        if (Size < Other.Size) {
          return false;
        }

        if (Size > Other.Size) {
          return true;
        }

        // Arbitrary but consistent
        return GV->getName() < Other.GV->getName();
      }
    };

    CandidateTy MostUsed;

    for (auto &K : LDSVars) {
      GlobalVariable *GV = K.first;
      if (K.second.size() <= 1) {
        // A variable reachable by only one kernel is best lowered with kernel
        // strategy
        continue;
      }
      CandidateTy Candidate(
          GV, K.second.size(),
          DL.getTypeAllocSize(GV->getValueType()).getFixedValue());
      if (MostUsed < Candidate)
        MostUsed = Candidate;
    }

    return MostUsed.GV;
  }

  static void recordLDSAbsoluteAddress(Module *M, GlobalVariable *GV,
                                       uint32_t Address) {
    // Write the specified address into metadata where it can be retrieved by
    // the assembler. Format is a half open range, [Address Address+1)
    LLVMContext &Ctx = M->getContext();
    auto *IntTy =
        M->getDataLayout().getIntPtrType(Ctx, AMDGPUAS::LOCAL_ADDRESS);
    auto *MinC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address));
    auto *MaxC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address + 1));
    GV->setMetadata(LLVMContext::MD_absolute_symbol,
                    MDNode::get(Ctx, {MinC, MaxC}));
  }
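  // For example, recording address zero on a target with 32-bit local
  // pointers attaches:
  //   !absolute_symbol !{i32 0, i32 1}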

  DenseMap<Function *, Value *> tableKernelIndexCache;
  Value *getTableLookupKernelIndex(Module &M, Function *F) {
    // Accesses from a function use the amdgcn_lds_kernel_id intrinsic which
    // lowers to a read from a live in register. Emit it once in the entry
    // block to spare deduplicating it later.
    auto [It, Inserted] = tableKernelIndexCache.try_emplace(F);
    if (Inserted) {
      auto InsertAt = F->getEntryBlock().getFirstNonPHIOrDbgOrAlloca();
      IRBuilder<> Builder(&*InsertAt);

      It->second =
          Builder.CreateIntrinsic(Intrinsic::amdgcn_lds_kernel_id, {}, {});
    }

    return It->second;
  }

  static std::vector<Function *> assignLDSKernelIDToEachKernel(
      Module *M, DenseSet<Function *> const &KernelsThatAllocateTableLDS,
      DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS) {
    // Associate kernels in the set with an arbitrary but reproducible order and
    // annotate them with that order in metadata. This metadata is recognised by
    // the backend and lowered to a SGPR which can be read from using
    // amdgcn_lds_kernel_id.

    std::vector<Function *> OrderedKernels;
    if (!KernelsThatAllocateTableLDS.empty() ||
        !KernelsThatIndirectlyAllocateDynamicLDS.empty()) {

      for (Function &Func : M->functions()) {
        if (Func.isDeclaration())
          continue;
        if (!isKernelLDS(&Func))
          continue;

        if (KernelsThatAllocateTableLDS.contains(&Func) ||
            KernelsThatIndirectlyAllocateDynamicLDS.contains(&Func)) {
          assert(Func.hasName()); // else fatal error earlier
          OrderedKernels.push_back(&Func);
        }
      }

      // Put them in an arbitrary but reproducible order
      OrderedKernels = sortByName(std::move(OrderedKernels));

      // Annotate the kernels with their order in this vector
      LLVMContext &Ctx = M->getContext();
      IRBuilder<> Builder(Ctx);

      if (OrderedKernels.size() > UINT32_MAX) {
        // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU
        report_fatal_error("Unimplemented LDS lowering for > 2**32 kernels");
      }

      for (size_t i = 0; i < OrderedKernels.size(); i++) {
        Metadata *AttrMDArgs[1] = {
            ConstantAsMetadata::get(Builder.getInt32(i)),
        };
        OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",
                                       MDNode::get(Ctx, AttrMDArgs));
      }
    }
    return OrderedKernels;
  }
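  // A kernel that is assigned id 0 then carries an attachment like (sketch):
  //   define amdgpu_kernel void @k() !llvm.amdgcn.lds.kernel.id !0 { ... }
  //   !0 = !{i32 0}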

  static void partitionVariablesIntoIndirectStrategies(
      Module &M, LDSUsesInfoTy const &LDSUsesInfo,
      VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly,
      DenseSet<GlobalVariable *> &ModuleScopeVariables,
      DenseSet<GlobalVariable *> &TableLookupVariables,
      DenseSet<GlobalVariable *> &KernelAccessVariables,
      DenseSet<GlobalVariable *> &DynamicVariables) {

    GlobalVariable *HybridModuleRoot =
        LoweringKindLoc != LoweringKind::hybrid
            ? nullptr
            : chooseBestVariableForModuleStrategy(
                  M.getDataLayout(), LDSToKernelsThatNeedToAccessItIndirectly);

    DenseSet<Function *> const EmptySet;
    DenseSet<Function *> const &HybridModuleRootKernels =
        HybridModuleRoot
            ? LDSToKernelsThatNeedToAccessItIndirectly[HybridModuleRoot]
            : EmptySet;

    for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
      // Each iteration of this loop assigns exactly one global variable to
      // exactly one of the implementation strategies.

      GlobalVariable *GV = K.first;
      assert(AMDGPU::isLDSVariableToLower(*GV));
      assert(K.second.size() != 0);

      if (AMDGPU::isDynamicLDS(*GV)) {
        DynamicVariables.insert(GV);
        continue;
      }

      switch (LoweringKindLoc) {
      case LoweringKind::module:
        ModuleScopeVariables.insert(GV);
        break;

      case LoweringKind::table:
        TableLookupVariables.insert(GV);
        break;

      case LoweringKind::kernel:
        if (K.second.size() == 1) {
          KernelAccessVariables.insert(GV);
        } else {
          report_fatal_error(
              "cannot lower LDS '" + GV->getName() +
              "' to kernel access as it is reachable from multiple kernels");
        }
        break;

      case LoweringKind::hybrid: {
        if (GV == HybridModuleRoot) {
          assert(K.second.size() != 1);
          ModuleScopeVariables.insert(GV);
        } else if (K.second.size() == 1) {
          KernelAccessVariables.insert(GV);
        } else if (set_is_subset(K.second, HybridModuleRootKernels)) {
          ModuleScopeVariables.insert(GV);
        } else {
          TableLookupVariables.insert(GV);
        }
        break;
      }
      }
    }

    // All LDS variables accessed indirectly have now been partitioned into
    // the distinct lowering strategies.
    assert(ModuleScopeVariables.size() + TableLookupVariables.size() +
               KernelAccessVariables.size() + DynamicVariables.size() ==
           LDSToKernelsThatNeedToAccessItIndirectly.size());
  }
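  // Worked example of the hybrid case (invented variables and kernels): if
  // @a is reachable from kernels {k0, k1, k2} and is chosen as the root,
  // then @b reachable from {k0, k1} joins it in the module struct (a subset
  // of the root's kernels, so no extra LDS is allocated), @c reachable from
  // {k2, k3} is lowered via table, and @d reachable only from {k3} uses the
  // kernel strategy.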

  static GlobalVariable *lowerModuleScopeStructVariables(
      Module &M, DenseSet<GlobalVariable *> const &ModuleScopeVariables,
      DenseSet<Function *> const &KernelsThatAllocateModuleLDS) {
    // Create a struct to hold the ModuleScopeVariables.
    // Replace all uses of those variables from non-kernel functions with the
    // new struct instance. Replace only the uses from kernel functions that
    // will allocate this instance. That is a space optimisation - kernels that
    // use a subset of the module scope struct and do not need to allocate it
    // for indirect calls will only allocate the subset they use (they do so as
    // part of the per-kernel lowering).
    if (ModuleScopeVariables.empty()) {
      return nullptr;
    }

    LLVMContext &Ctx = M.getContext();

    LDSVariableReplacement ModuleScopeReplacement =
        createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",
                                     ModuleScopeVariables);

    appendToCompilerUsed(M, {static_cast<GlobalValue *>(
                            ConstantExpr::getPointerBitCastOrAddrSpaceCast(
                                cast<Constant>(ModuleScopeReplacement.SGV),
                                PointerType::getUnqual(Ctx)))});

    // module.lds will be allocated at zero in any kernel that allocates it
    recordLDSAbsoluteAddress(&M, ModuleScopeReplacement.SGV, 0);

    // historic
    removeLocalVarsFromUsedLists(M, ModuleScopeVariables);

    // Replace all uses of module scope variable from non-kernel functions
    replaceLDSVariablesWithStruct(
        M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
          Instruction *I = dyn_cast<Instruction>(U.getUser());
          if (!I) {
            return false;
          }
          Function *F = I->getFunction();
          return !isKernelLDS(F);
        });

    // Replace uses of module scope variable from kernel functions that
    // allocate the module scope variable, otherwise leave them unchanged
    // Record on each kernel whether the module scope global is used by it

    for (Function &Func : M.functions()) {
      if (Func.isDeclaration() || !isKernelLDS(&Func))
        continue;

      if (KernelsThatAllocateModuleLDS.contains(&Func)) {
        replaceLDSVariablesWithStruct(
            M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
              Instruction *I = dyn_cast<Instruction>(U.getUser());
              if (!I) {
                return false;
              }
              Function *F = I->getFunction();
              return F == &Func;
            });

        markUsedByKernel(&Func, ModuleScopeReplacement.SGV);
      }
    }

    return ModuleScopeReplacement.SGV;
  }

  static DenseMap<Function *, LDSVariableReplacement>
  lowerKernelScopeStructVariables(
      Module &M, LDSUsesInfoTy &LDSUsesInfo,
      DenseSet<GlobalVariable *> const &ModuleScopeVariables,
      DenseSet<Function *> const &KernelsThatAllocateModuleLDS,
      GlobalVariable *MaybeModuleScopeStruct) {

    // Create a struct for each kernel for the non-module-scope variables.

    DenseMap<Function *, LDSVariableReplacement> KernelToReplacement;
    for (Function &Func : M.functions()) {
      if (Func.isDeclaration() || !isKernelLDS(&Func))
        continue;

      DenseSet<GlobalVariable *> KernelUsedVariables;
      // Allocating variables that are used directly in this struct to get
      // alignment aware allocation and predictable frame size.
      for (auto &v : LDSUsesInfo.direct_access[&Func]) {
        if (!AMDGPU::isDynamicLDS(*v)) {
          KernelUsedVariables.insert(v);
        }
      }

      // Allocating variables that are accessed indirectly so that a lookup of
      // this struct instance can find them from nested functions.
      for (auto &v : LDSUsesInfo.indirect_access[&Func]) {
        if (!AMDGPU::isDynamicLDS(*v)) {
          KernelUsedVariables.insert(v);
        }
      }

      // Variables allocated in module lds must all resolve to that struct,
      // not to the per-kernel instance.
      if (KernelsThatAllocateModuleLDS.contains(&Func)) {
        for (GlobalVariable *v : ModuleScopeVariables) {
          KernelUsedVariables.erase(v);
        }
      }

      if (KernelUsedVariables.empty()) {
        // Either used no LDS, or the LDS it used was all in the module struct
        // or dynamically sized
        continue;
      }

      // The association between kernel function and LDS struct is done by
      // symbol name, which only works if the function in question has a name.
      // This is not expected to be a problem in practice as kernels are
      // called by name, making anonymous ones (which are named by the
      // backend) difficult to use. This does mean that llvm test cases need
      // to name the kernels.
      if (!Func.hasName()) {
        report_fatal_error("Anonymous kernels cannot use LDS variables");
      }

      std::string VarName =
          (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();

      auto Replacement =
          createLDSVariableReplacement(M, VarName, KernelUsedVariables);

      // If any indirect uses, create a direct use to ensure allocation
      // TODO: Simpler to unconditionally mark used but that regresses
      // codegen in test/CodeGen/AMDGPU/noclobber-barrier.ll
      auto Accesses = LDSUsesInfo.indirect_access.find(&Func);
      if ((Accesses != LDSUsesInfo.indirect_access.end()) &&
          !Accesses->second.empty())
        markUsedByKernel(&Func, Replacement.SGV);

      // remove preserves existing codegen
      removeLocalVarsFromUsedLists(M, KernelUsedVariables);
      KernelToReplacement[&Func] = Replacement;

      // Rewrite uses within kernel to the new struct
      replaceLDSVariablesWithStruct(
          M, KernelUsedVariables, Replacement, [&Func](Use &U) {
            Instruction *I = dyn_cast<Instruction>(U.getUser());
            return I && I->getFunction() == &Func;
          });
    }
    return KernelToReplacement;
  }

  static GlobalVariable *
  buildRepresentativeDynamicLDSInstance(Module &M, LDSUsesInfoTy &LDSUsesInfo,
                                        Function *func) {
    // Create a dynamic lds variable with a name associated with the passed
    // function that has the maximum alignment of any dynamic lds variable
    // reachable from this kernel. Dynamic LDS is allocated after the static LDS
    // allocation, possibly after alignment padding. The representative variable
    // created here has the maximum alignment of any other dynamic variable
    // reachable by that kernel. All dynamic LDS variables are allocated at the
    // same address in each kernel in order to provide the documented aliasing
    // semantics. Setting the alignment here allows this IR pass to accurately
    // predict the exact constant at which it will be allocated.

    assert(isKernelLDS(func));

    LLVMContext &Ctx = M.getContext();
    const DataLayout &DL = M.getDataLayout();
    Align MaxDynamicAlignment(1);

    auto UpdateMaxAlignment = [&MaxDynamicAlignment, &DL](GlobalVariable *GV) {
      if (AMDGPU::isDynamicLDS(*GV)) {
        MaxDynamicAlignment =
            std::max(MaxDynamicAlignment, AMDGPU::getAlign(DL, GV));
      }
    };

    for (GlobalVariable *GV : LDSUsesInfo.indirect_access[func]) {
      UpdateMaxAlignment(GV);
    }

    for (GlobalVariable *GV : LDSUsesInfo.direct_access[func]) {
      UpdateMaxAlignment(GV);
    }

    assert(func->hasName()); // Checked by caller
    auto *emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
    GlobalVariable *N = new GlobalVariable(
        M, emptyCharArray, false, GlobalValue::ExternalLinkage, nullptr,
        Twine("llvm.amdgcn." + func->getName() + ".dynlds"), nullptr,
        GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS, false);
    N->setAlignment(MaxDynamicAlignment);

    assert(AMDGPU::isDynamicLDS(*N));
    return N;
  }
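  // For a kernel @k the representative variable is, schematically:
  //   @llvm.amdgcn.k.dynlds = external addrspace(3) global [0 x i8],
  //       align <maximum alignment of dynamic LDS reachable from @k>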

  DenseMap<Function *, GlobalVariable *> lowerDynamicLDSVariables(
      Module &M, LDSUsesInfoTy &LDSUsesInfo,
      DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS,
      DenseSet<GlobalVariable *> const &DynamicVariables,
      std::vector<Function *> const &OrderedKernels) {
    DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS;
    if (!KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
      LLVMContext &Ctx = M.getContext();
      IRBuilder<> Builder(Ctx);
      Type *I32 = Type::getInt32Ty(Ctx);

      std::vector<Constant *> newDynamicLDS;

      // Table is built in the same order as OrderedKernels
      for (auto &func : OrderedKernels) {

        if (KernelsThatIndirectlyAllocateDynamicLDS.contains(func)) {
          assert(isKernelLDS(func));
          if (!func->hasName()) {
            report_fatal_error("Anonymous kernels cannot use LDS variables");
          }

          GlobalVariable *N =
              buildRepresentativeDynamicLDSInstance(M, LDSUsesInfo, func);

          KernelToCreatedDynamicLDS[func] = N;

          markUsedByKernel(func, N);

          auto *emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
          auto *GEP = ConstantExpr::getGetElementPtr(
              emptyCharArray, N, ConstantInt::get(I32, 0), true);
          newDynamicLDS.push_back(ConstantExpr::getPtrToInt(GEP, I32));
        } else {
          newDynamicLDS.push_back(PoisonValue::get(I32));
        }
      }
      assert(OrderedKernels.size() == newDynamicLDS.size());

      ArrayType *t = ArrayType::get(I32, newDynamicLDS.size());
      Constant *init = ConstantArray::get(t, newDynamicLDS);
      GlobalVariable *table = new GlobalVariable(
          M, t, true, GlobalValue::InternalLinkage, init,
          "llvm.amdgcn.dynlds.offset.table", nullptr,
          GlobalValue::NotThreadLocal, AMDGPUAS::CONSTANT_ADDRESS);

      for (GlobalVariable *GV : DynamicVariables) {
        for (Use &U : make_early_inc_range(GV->uses())) {
          auto *I = dyn_cast<Instruction>(U.getUser());
          if (!I)
            continue;
          if (isKernelLDS(I->getFunction()))
            continue;

          replaceUseWithTableLookup(M, Builder, table, GV, U, nullptr);
        }
      }
    }
    return KernelToCreatedDynamicLDS;
  }

  static GlobalVariable *uniquifyGVPerKernel(Module &M, GlobalVariable *GV,
                                             Function *KF) {
    bool NeedsReplacement = false;
    for (Use &U : GV->uses()) {
      if (auto *I = dyn_cast<Instruction>(U.getUser())) {
        Function *F = I->getFunction();
        if (isKernelLDS(F) && F != KF) {
          NeedsReplacement = true;
          break;
        }
      }
    }
    if (!NeedsReplacement)
      return GV;
    // Create a new GV used only by this kernel and its function
    GlobalVariable *NewGV = new GlobalVariable(
        M, GV->getValueType(), GV->isConstant(), GV->getLinkage(),
        GV->getInitializer(), GV->getName() + "." + KF->getName(), nullptr,
        GV->getThreadLocalMode(), GV->getType()->getAddressSpace());
    NewGV->copyAttributesFrom(GV);
    for (Use &U : make_early_inc_range(GV->uses())) {
      if (auto *I = dyn_cast<Instruction>(U.getUser())) {
        Function *F = I->getFunction();
        if (!isKernelLDS(F) || F == KF) {
          U.getUser()->replaceUsesOfWith(GV, NewGV);
        }
      }
    }
    return NewGV;
  }

  bool lowerSpecialLDSVariables(
      Module &M, LDSUsesInfoTy &LDSUsesInfo,
      VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly) {
    bool Changed = false;
    // The 1st round: give module-absolute assignments
    int NumAbsolutes = 0;
    std::vector<GlobalVariable *> OrderedGVs;
    for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
      GlobalVariable *GV = K.first;
      if (!isNamedBarrier(*GV))
        continue;
      // give a module-absolute assignment if it is indirectly accessed by
      // multiple kernels. This is not precise, but we don't want to duplicate
      // a function when it is called by multiple kernels.
      if (LDSToKernelsThatNeedToAccessItIndirectly[GV].size() > 1) {
        OrderedGVs.push_back(GV);
      } else {
        // leave it to the 2nd round, which will give a kernel-relative
        // assignment if it is only indirectly accessed by one kernel
        LDSUsesInfo.direct_access[*K.second.begin()].insert(GV);
      }
      LDSToKernelsThatNeedToAccessItIndirectly.erase(GV);
    }
    OrderedGVs = sortByName(std::move(OrderedGVs));
    for (GlobalVariable *GV : OrderedGVs) {
      int BarId = ++NumAbsolutes;
      unsigned BarrierScope = llvm::AMDGPU::Barrier::BARRIER_SCOPE_WORKGROUP;
      // 4 bits for alignment, 5 bits for the barrier num,
      // 3 bits for the barrier scope
      unsigned Offset = 0x802000u | BarrierScope << 9 | BarId << 4;
      recordLDSAbsoluteAddress(&M, GV, Offset);
    }
    OrderedGVs.clear();

    // The 2nd round: give a kernel-relative assignment for a GV that is
    // either only indirectly accessed by a single kernel or only directly
    // accessed by multiple kernels.
    std::vector<Function *> OrderedKernels;
    for (auto &K : LDSUsesInfo.direct_access) {
      Function *F = K.first;
      assert(isKernelLDS(F));
      OrderedKernels.push_back(F);
    }
    OrderedKernels = sortByName(std::move(OrderedKernels));

    llvm::DenseMap<Function *, uint32_t> Kernel2BarId;
    for (Function *F : OrderedKernels) {
      for (GlobalVariable *GV : LDSUsesInfo.direct_access[F]) {
        if (!isNamedBarrier(*GV))
          continue;

        LDSUsesInfo.direct_access[F].erase(GV);
        if (GV->isAbsoluteSymbolRef()) {
          // already assigned
          continue;
        }
        OrderedGVs.push_back(GV);
      }
      OrderedGVs = sortByName(std::move(OrderedGVs));
      for (GlobalVariable *GV : OrderedGVs) {
        // GV could also be used directly by other kernels. If so, we need to
        // create a new GV used only by this kernel and its function.
        auto NewGV = uniquifyGVPerKernel(M, GV, F);
        Changed |= (NewGV != GV);
        int BarId = (NumAbsolutes + 1);
        if (Kernel2BarId.find(F) != Kernel2BarId.end()) {
          BarId = (Kernel2BarId[F] + 1);
        }
        Kernel2BarId[F] = BarId;
        unsigned BarrierScope = llvm::AMDGPU::Barrier::BARRIER_SCOPE_WORKGROUP;
        unsigned Offset = 0x802000u | BarrierScope << 9 | BarId << 4;
        recordLDSAbsoluteAddress(&M, NewGV, Offset);
      }
      OrderedGVs.clear();
    }
    // Also erase those special LDS variables from indirect_access.
    for (auto &K : LDSUsesInfo.indirect_access) {
      assert(isKernelLDS(K.first));
      for (GlobalVariable *GV : K.second) {
        if (isNamedBarrier(*GV))
          K.second.erase(GV);
      }
    }
    return Changed;
  }
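  // Reading the encoding used above: bits [3:0] hold the alignment (zero
  // here), bits [8:4] the barrier number and bits [11:9] the barrier scope,
  // all OR'd into the 0x802000 base; e.g. barrier id 1 contributes
  // 1 << 4 = 0x10 to the recorded absolute address.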

  bool runOnModule(Module &M) {
    CallGraph CG = CallGraph(M);
    bool Changed = superAlignLDSGlobals(M);

    Changed |= eliminateConstantExprUsesOfLDSFromAllInstructions(M);

    Changed = true; // todo: narrow this down

    // For each kernel, what variables does it access directly or through
    // callees
    LDSUsesInfoTy LDSUsesInfo = getTransitiveUsesOfLDS(CG, M);

    // For each variable accessed through callees, which kernels access it
    VariableFunctionMap LDSToKernelsThatNeedToAccessItIndirectly;
    for (auto &K : LDSUsesInfo.indirect_access) {
      Function *F = K.first;
      assert(isKernelLDS(F));
      for (GlobalVariable *GV : K.second) {
        LDSToKernelsThatNeedToAccessItIndirectly[GV].insert(F);
      }
    }

    if (LDSUsesInfo.HasSpecialGVs) {
      // Special LDS variables need special address assignment
      Changed |= lowerSpecialLDSVariables(
          M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly);
    }

    // Partition variables accessed indirectly into the different strategies
    DenseSet<GlobalVariable *> ModuleScopeVariables;
    DenseSet<GlobalVariable *> TableLookupVariables;
    DenseSet<GlobalVariable *> KernelAccessVariables;
    DenseSet<GlobalVariable *> DynamicVariables;
    partitionVariablesIntoIndirectStrategies(
        M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly,
        ModuleScopeVariables, TableLookupVariables, KernelAccessVariables,
        DynamicVariables);

    // If the kernel accesses a variable that is going to be stored in the
    // module instance through a call then that kernel needs to allocate the
    // module instance
    const DenseSet<Function *> KernelsThatAllocateModuleLDS =
        kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
                                                        ModuleScopeVariables);
    const DenseSet<Function *> KernelsThatAllocateTableLDS =
        kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
                                                        TableLookupVariables);

    const DenseSet<Function *> KernelsThatIndirectlyAllocateDynamicLDS =
        kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
                                                        DynamicVariables);

    GlobalVariable *MaybeModuleScopeStruct = lowerModuleScopeStructVariables(
        M, ModuleScopeVariables, KernelsThatAllocateModuleLDS);

    DenseMap<Function *, LDSVariableReplacement> KernelToReplacement =
        lowerKernelScopeStructVariables(M, LDSUsesInfo, ModuleScopeVariables,
                                        KernelsThatAllocateModuleLDS,
                                        MaybeModuleScopeStruct);

    // Lower zero cost accesses to the kernel instances just created
    for (auto &GV : KernelAccessVariables) {
      auto &funcs = LDSToKernelsThatNeedToAccessItIndirectly[GV];
      assert(funcs.size() == 1); // Only one kernel can access it
      LDSVariableReplacement Replacement =
          KernelToReplacement[*(funcs.begin())];

      DenseSet<GlobalVariable *> Vec;
      Vec.insert(GV);

      replaceLDSVariablesWithStruct(M, Vec, Replacement, [](Use &U) {
        return isa<Instruction>(U.getUser());
      });
    }

    // The ith element of this vector is kernel id i
    std::vector<Function *> OrderedKernels =
        assignLDSKernelIDToEachKernel(&M, KernelsThatAllocateTableLDS,
                                      KernelsThatIndirectlyAllocateDynamicLDS);

    if (!KernelsThatAllocateTableLDS.empty()) {
      LLVMContext &Ctx = M.getContext();
      IRBuilder<> Builder(Ctx);

      // The order must be consistent between lookup table and accesses to
      // lookup table
      auto TableLookupVariablesOrdered = sortByName(std::vector<GlobalVariable *>(
          TableLookupVariables.begin(), TableLookupVariables.end()));

      GlobalVariable *LookupTable = buildLookupTable(
          M, TableLookupVariablesOrdered, OrderedKernels, KernelToReplacement);
      replaceUsesInInstructionsWithTableLookup(M, TableLookupVariablesOrdered,
                                               LookupTable);
    }

    DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS =
        lowerDynamicLDSVariables(M, LDSUsesInfo,
                                 KernelsThatIndirectlyAllocateDynamicLDS,
                                 DynamicVariables, OrderedKernels);

    // Strip amdgpu-no-lds-kernel-id from all functions reachable from the
    // kernel. We may have inferred this wasn't used prior to the pass.
    // TODO: We could filter out subgraphs that do not access LDS globals.
    for (auto *KernelSet : {&KernelsThatIndirectlyAllocateDynamicLDS,
                            &KernelsThatAllocateTableLDS})
      for (Function *F : *KernelSet)
        removeFnAttrFromReachable(CG, F, {"amdgpu-no-lds-kernel-id"});

    // All kernel frames have been allocated. Calculate and record the
    // addresses.
    {
      const DataLayout &DL = M.getDataLayout();

      for (Function &Func : M.functions()) {
        if (Func.isDeclaration() || !isKernelLDS(&Func))
          continue;

        // All three of these are optional. The first variable is allocated at
        // zero. They are allocated by AMDGPUMachineFunction as one block.
        // Layout:
        // {
        //   module.lds
        //   alignment padding
        //   kernel instance
        //   alignment padding
        //   dynamic lds variables
        // }

        const bool AllocateModuleScopeStruct =
            MaybeModuleScopeStruct &&
            KernelsThatAllocateModuleLDS.contains(&Func);

        auto Replacement = KernelToReplacement.find(&Func);
        const bool AllocateKernelScopeStruct =
            Replacement != KernelToReplacement.end();

        const bool AllocateDynamicVariable =
            KernelToCreatedDynamicLDS.contains(&Func);

        uint32_t Offset = 0;

        if (AllocateModuleScopeStruct) {
          // Allocated at zero, recorded once on construction, not once per
          // kernel
          Offset += DL.getTypeAllocSize(MaybeModuleScopeStruct->getValueType());
        }

        if (AllocateKernelScopeStruct) {
          GlobalVariable *KernelStruct = Replacement->second.SGV;
          Offset = alignTo(Offset, AMDGPU::getAlign(DL, KernelStruct));
          recordLDSAbsoluteAddress(&M, KernelStruct, Offset);
          Offset += DL.getTypeAllocSize(KernelStruct->getValueType());
        }

        // If there is dynamic allocation, the alignment needed is included in
        // the static frame size. There may be no reference to the dynamic
        // variable in the kernel itself, so without including it here, that
        // alignment padding could be missed.
        if (AllocateDynamicVariable) {
          GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];
          Offset = alignTo(Offset, AMDGPU::getAlign(DL, DynamicVariable));
          recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);
        }

        if (Offset != 0) {
          (void)TM; // TODO: Account for target maximum LDS
          std::string Buffer;
          raw_string_ostream SS{Buffer};
          SS << format("%u", Offset);

          // Instead of explicitly marking kernels that access dynamic variables
          // using special case metadata, annotate with min-lds == max-lds, i.e.
          // that there is no more space available for allocating more static
          // LDS variables. That is the right condition to prevent allocating
          // more variables which would collide with the addresses assigned to
          // dynamic variables.
          if (AllocateDynamicVariable)
            SS << format(",%u", Offset);

          Func.addFnAttr("amdgpu-lds-size", Buffer);
        }
      }
    }

    for (auto &GV : make_early_inc_range(M.globals()))
      if (AMDGPU::isLDSVariableToLower(GV)) {
        // probably want to remove from used lists
        GV.removeDeadConstantUsers();
        if (GV.use_empty())
          GV.eraseFromParent();
      }

    return Changed;
  }

private:
  // Increase the alignment of LDS globals if necessary to maximise the chance
  // that we can use aligned LDS instructions to access them.
  static bool superAlignLDSGlobals(Module &M) {
    const DataLayout &DL = M.getDataLayout();
    bool Changed = false;
    if (!SuperAlignLDSGlobals) {
      return Changed;
    }

    for (auto &GV : M.globals()) {
      if (GV.getType()->getPointerAddressSpace() != AMDGPUAS::LOCAL_ADDRESS) {
        // Only changing alignment of LDS variables
        continue;
      }
      if (!GV.hasInitializer()) {
        // cuda/hip extern __shared__ variable, leave alignment alone
        continue;
      }

      if (GV.isAbsoluteSymbolRef()) {
        // If the variable is already allocated, don't change the alignment
        continue;
      }

      Align Alignment = AMDGPU::getAlign(DL, &GV);
      TypeSize GVSize = DL.getTypeAllocSize(GV.getValueType());

      if (GVSize > 8) {
        // We might want to use a b96 or b128 load/store
        Alignment = std::max(Alignment, Align(16));
      } else if (GVSize > 4) {
        // We might want to use a b64 load/store
        Alignment = std::max(Alignment, Align(8));
      } else if (GVSize > 2) {
        // We might want to use a b32 load/store
        Alignment = std::max(Alignment, Align(4));
      } else if (GVSize > 1) {
        // We might want to use a b16 load/store
        Alignment = std::max(Alignment, Align(2));
      }

      if (Alignment != AMDGPU::getAlign(DL, &GV)) {
        Changed = true;
        GV.setAlignment(Alignment);
      }
    }
    return Changed;
  }

  static LDSVariableReplacement createLDSVariableReplacement(
      Module &M, std::string VarName,
      DenseSet<GlobalVariable *> const &LDSVarsToTransform) {
    // Create a struct instance containing LDSVarsToTransform and map from those
    // variables to ConstantExprGEP
    // Variables may be introduced to meet alignment requirements. No aliasing
    // metadata is useful for these as they have no uses. Erased before return.

    LLVMContext &Ctx = M.getContext();
    const DataLayout &DL = M.getDataLayout();
    assert(!LDSVarsToTransform.empty());

    SmallVector<OptimizedStructLayoutField, 8> LayoutFields;
    LayoutFields.reserve(LDSVarsToTransform.size());
    {
      // The order of fields in this struct depends on the order of
      // variables in the argument which varies when changing how they
      // are identified, leading to spurious test breakage.
      auto Sorted = sortByName(std::vector<GlobalVariable *>(
          LDSVarsToTransform.begin(), LDSVarsToTransform.end()));

      for (GlobalVariable *GV : Sorted) {
        OptimizedStructLayoutField F(GV,
                                     DL.getTypeAllocSize(GV->getValueType()),
                                     AMDGPU::getAlign(DL, GV));
        LayoutFields.emplace_back(F);
      }
    }

    performOptimizedStructLayout(LayoutFields);

    std::vector<GlobalVariable *> LocalVars;
    BitVector IsPaddingField;
    LocalVars.reserve(LDSVarsToTransform.size()); // will be at least this large
    IsPaddingField.reserve(LDSVarsToTransform.size());
    {
      uint64_t CurrentOffset = 0;
      for (auto &F : LayoutFields) {
        GlobalVariable *FGV =
            static_cast<GlobalVariable *>(const_cast<void *>(F.Id));
        Align DataAlign = F.Alignment;

        uint64_t DataAlignV = DataAlign.value();
        if (uint64_t Rem = CurrentOffset % DataAlignV) {
          uint64_t Padding = DataAlignV - Rem;

          // Append an array of padding bytes to meet alignment requested
          // Note (o + (a - (o % a)) ) % a == 0
          //      (offset + Padding  ) % align == 0

          Type *ATy = ArrayType::get(Type::getInt8Ty(Ctx), Padding);
          LocalVars.push_back(new GlobalVariable(
              M, ATy, false, GlobalValue::InternalLinkage,
              PoisonValue::get(ATy), "", nullptr, GlobalValue::NotThreadLocal,
              AMDGPUAS::LOCAL_ADDRESS, false));
          IsPaddingField.push_back(true);
          CurrentOffset += Padding;
        }

        LocalVars.push_back(FGV);
        IsPaddingField.push_back(false);
        CurrentOffset += F.Size;
      }
    }

    std::vector<Type *> LocalVarTypes;
    LocalVarTypes.reserve(LocalVars.size());
    std::transform(
        LocalVars.cbegin(), LocalVars.cend(), std::back_inserter(LocalVarTypes),
        [](const GlobalVariable *V) -> Type * { return V->getValueType(); });

    StructType *LDSTy = StructType::create(Ctx, LocalVarTypes, VarName + ".t");

    Align StructAlign = AMDGPU::getAlign(DL, LocalVars[0]);

    GlobalVariable *SGV = new GlobalVariable(
        M, LDSTy, false, GlobalValue::InternalLinkage, PoisonValue::get(LDSTy),
        VarName, nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
        false);
    SGV->setAlignment(StructAlign);

    DenseMap<GlobalVariable *, Constant *> Map;
    Type *I32 = Type::getInt32Ty(Ctx);
    for (size_t I = 0; I < LocalVars.size(); I++) {
      GlobalVariable *GV = LocalVars[I];
      Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};
      Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx, true);
      if (IsPaddingField[I]) {
        assert(GV->use_empty());
        GV->eraseFromParent();
      } else {
        Map[GV] = GEP;
      }
    }
    assert(Map.size() == LDSVarsToTransform.size());
    return {SGV, std::move(Map)};
  }
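  // As a sketch with invented variables, packing an i64 @a and an i8 @b for
  // kernel @k can produce:
  //   %llvm.amdgcn.kernel.k.lds.t = type { i64, i8 }
  //   @llvm.amdgcn.kernel.k.lds = internal addrspace(3) global
  //       %llvm.amdgcn.kernel.k.lds.t poison, align 8
  // with @a mapped to the constantexpr gep of field 0 and @b to field 1.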

  template <typename PredicateTy>
  static void replaceLDSVariablesWithStruct(
      Module &M, DenseSet<GlobalVariable *> const &LDSVarsToTransformArg,
      const LDSVariableReplacement &Replacement, PredicateTy Predicate) {
    LLVMContext &Ctx = M.getContext();
    const DataLayout &DL = M.getDataLayout();

    // A hack... we need to insert the aliasing info in a predictable order for
    // lit tests. Would like to have them in a stable order already, ideally the
    // same order they get allocated, which might mean an ordered set container
    auto LDSVarsToTransform = sortByName(std::vector<GlobalVariable *>(
        LDSVarsToTransformArg.begin(), LDSVarsToTransformArg.end()));

    // Create alias.scope and their lists. Each field in the new structure
    // does not alias with all other fields.
    SmallVector<MDNode *> AliasScopes;
    SmallVector<Metadata *> NoAliasList;
    const size_t NumberVars = LDSVarsToTransform.size();
    if (NumberVars > 1) {
      MDBuilder MDB(Ctx);
      AliasScopes.reserve(NumberVars);
      MDNode *Domain = MDB.createAnonymousAliasScopeDomain();
      for (size_t I = 0; I < NumberVars; I++) {
        MDNode *Scope = MDB.createAnonymousAliasScope(Domain);
        AliasScopes.push_back(Scope);
      }
      NoAliasList.append(&AliasScopes[1], AliasScopes.end());
    }

    // Replace uses of ith variable with a constantexpr to the corresponding
    // field of the instance that will be allocated by AMDGPUMachineFunction
    for (size_t I = 0; I < NumberVars; I++) {
      GlobalVariable *GV = LDSVarsToTransform[I];
      Constant *GEP = Replacement.LDSVarsToConstantGEP.at(GV);

      GV->replaceUsesWithIf(GEP, Predicate);

      APInt APOff(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
      GEP->stripAndAccumulateInBoundsConstantOffsets(DL, APOff);
      uint64_t Offset = APOff.getZExtValue();

      Align A =
          commonAlignment(Replacement.SGV->getAlign().valueOrOne(), Offset);

      if (I)
        NoAliasList[I - 1] = AliasScopes[I - 1];
      MDNode *NoAlias =
          NoAliasList.empty() ? nullptr : MDNode::get(Ctx, NoAliasList);
      MDNode *AliasScope =
          AliasScopes.empty() ? nullptr : MDNode::get(Ctx, {AliasScopes[I]});

      refineUsesAlignmentAndAA(GEP, A, DL, AliasScope, NoAlias);
    }
  }
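  // The effect is that an access through the ith former variable carries
  // !alias.scope naming the scope for field i and !noalias listing the
  // scopes of the other fields, letting later passes reorder accesses to
  // distinct former variables.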
1438
  static void refineUsesAlignmentAndAA(Value *Ptr, Align A,
                                       const DataLayout &DL, MDNode *AliasScope,
                                       MDNode *NoAlias, unsigned MaxDepth = 5) {
    if (!MaxDepth || (A == 1 && !AliasScope))
      return;

    for (User *U : Ptr->users()) {
      if (auto *I = dyn_cast<Instruction>(U)) {
        if (AliasScope && I->mayReadOrWriteMemory()) {
          MDNode *AS = I->getMetadata(LLVMContext::MD_alias_scope);
          AS = (AS ? MDNode::getMostGenericAliasScope(AS, AliasScope)
                   : AliasScope);
          I->setMetadata(LLVMContext::MD_alias_scope, AS);

          MDNode *NA = I->getMetadata(LLVMContext::MD_noalias);
          NA = (NA ? MDNode::intersect(NA, NoAlias) : NoAlias);
          I->setMetadata(LLVMContext::MD_noalias, NA);
        }
      }

      if (auto *LI = dyn_cast<LoadInst>(U)) {
        LI->setAlignment(std::max(A, LI->getAlign()));
        continue;
      }
      if (auto *SI = dyn_cast<StoreInst>(U)) {
        if (SI->getPointerOperand() == Ptr)
          SI->setAlignment(std::max(A, SI->getAlign()));
        continue;
      }
      if (auto *AI = dyn_cast<AtomicRMWInst>(U)) {
        // No atomicrmw operation works on pointers today, but check anyway
        // in case that changes or we are processing a ConstantExpr.
        if (AI->getPointerOperand() == Ptr)
          AI->setAlignment(std::max(A, AI->getAlign()));
        continue;
      }
      if (auto *AI = dyn_cast<AtomicCmpXchgInst>(U)) {
        if (AI->getPointerOperand() == Ptr)
          AI->setAlignment(std::max(A, AI->getAlign()));
        continue;
      }
      if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
        unsigned BitWidth = DL.getIndexTypeSizeInBits(GEP->getType());
        APInt Off(BitWidth, 0);
        if (GEP->getPointerOperand() == Ptr) {
          Align GA;
          if (GEP->accumulateConstantOffset(DL, Off))
            GA = commonAlignment(A, Off.getLimitedValue());
          refineUsesAlignmentAndAA(GEP, GA, DL, AliasScope, NoAlias,
                                   MaxDepth - 1);
        }
        continue;
      }
      if (auto *I = dyn_cast<Instruction>(U)) {
        if (I->getOpcode() == Instruction::BitCast ||
            I->getOpcode() == Instruction::AddrSpaceCast)
          refineUsesAlignmentAndAA(I, A, DL, AliasScope, NoAlias, MaxDepth - 1);
      }
    }
  }
};
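The alignment refinement above leans on commonAlignment: a pointer known to be Offset bytes past a base of alignment A is aligned to the largest power of two dividing both the base alignment and the offset. A minimal sketch of that arithmetic, using only llvm/Support/Alignment.h:

#include "llvm/Support/Alignment.h"
#include <cassert>

using namespace llvm;

int main() {
  Align Base(16); // alignment of the replacement struct
  assert(commonAlignment(Base, 0) == Align(16));  // field at offset 0
  assert(commonAlignment(Base, 4) == Align(4));   // small offset caps the result
  assert(commonAlignment(Base, 32) == Align(16)); // multiples of 16 keep it
  return 0;
}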
class AMDGPULowerModuleLDSLegacy : public ModulePass {
public:
  const AMDGPUTargetMachine *TM;
  static char ID;

  AMDGPULowerModuleLDSLegacy(const AMDGPUTargetMachine *TM_ = nullptr)
      : ModulePass(ID), TM(TM_) {
    initializeAMDGPULowerModuleLDSLegacyPass(*PassRegistry::getPassRegistry());
  }

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    if (!TM)
      AU.addRequired<TargetPassConfig>();
  }

  bool runOnModule(Module &M) override {
    if (!TM) {
      auto &TPC = getAnalysis<TargetPassConfig>();
      TM = &TPC.getTM<AMDGPUTargetMachine>();
    }

    return AMDGPULowerModuleLDS(*TM).runOnModule(M);
  }
};

} // namespace
char AMDGPULowerModuleLDSLegacy::ID = 0;

char &llvm::AMDGPULowerModuleLDSLegacyPassID = AMDGPULowerModuleLDSLegacy::ID;

INITIALIZE_PASS_BEGIN(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
                      "Lower uses of LDS variables from non-kernel functions",
                      false, false)
INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
INITIALIZE_PASS_END(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
                    "Lower uses of LDS variables from non-kernel functions",
                    false, false)
ModulePass *
llvm::createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM) {
  return new AMDGPULowerModuleLDSLegacy(TM);
}
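As a usage sketch (hypothetical helper, not part of this file): the factory slots into a legacy pass pipeline as below. It assumes the caller already owns an initialized AMDGPUTargetMachine, which requires the AMDGPU target to be registered, so it is written as a helper rather than a complete program.

#include "AMDGPU.h"              // createAMDGPULowerModuleLDSLegacyPass
#include "AMDGPUTargetMachine.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"

// Returns true if the pass modified the module.
bool lowerModuleLDS(llvm::Module &M, const llvm::AMDGPUTargetMachine &TM) {
  llvm::legacy::PassManager PM;
  PM.add(llvm::createAMDGPULowerModuleLDSLegacyPass(&TM));
  return PM.run(M);
}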
PreservedAnalyses AMDGPULowerModuleLDSPass::run(Module &M,
                                                ModuleAnalysisManager &) {
  return AMDGPULowerModuleLDS(TM).runOnModule(M) ? PreservedAnalyses::none()
                                                 : PreservedAnalyses::all();
}
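The new pass-manager equivalent, under the same assumption about the target machine; an empty analysis manager suffices here only because run ignores its second argument, as the definition above shows.

#include "AMDGPU.h" // AMDGPULowerModuleLDSPass
#include "AMDGPUTargetMachine.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"

llvm::PreservedAnalyses lowerModuleLDS(llvm::Module &M,
                                       const llvm::AMDGPUTargetMachine &TM) {
  llvm::ModuleAnalysisManager MAM; // unused by this pass
  return llvm::AMDGPULowerModuleLDSPass(TM).run(M, MAM);
}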