//===-- AMDGPULowerModuleLDSPass.cpp ------------------------------*- C++ -*-=//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass eliminates local data store, LDS, uses from non-kernel functions.
// LDS is contiguous memory allocated per kernel execution.
//
// Background.
//
// The programming model is global variables, or equivalently function local
// static variables, accessible from kernels or other functions. For uses from
// kernels this is straightforward - assign an integer to the kernel for the
// memory required by all the variables combined, allocate them within that.
// For uses from functions there are performance tradeoffs to choose between.
//
// This model means the GPU runtime can specify the amount of memory allocated.
// If this is more than the kernel assumed, the excess can be made available
// using a language specific feature, which IR represents as a variable with
// no initializer. This feature is referred to here as "Dynamic LDS" and is
// lowered slightly differently to the normal case.
//
// Consequences of this GPU feature:
// - memory is limited and exceeding it halts compilation
// - a global accessed by one kernel exists independent of other kernels
// - a global exists independent of simultaneous execution of the same kernel
// - the address of the global may be different from different kernels as they
//   do not alias, which permits only allocating variables they use
// - if the address is allowed to differ, functions need help to find it
//
// Uses from kernels are implemented here by grouping them in a per-kernel
// struct instance. This duplicates the variables, accurately modelling their
// aliasing properties relative to a single global representation. It also
// permits control over alignment via padding.
//
// Uses from functions are more complicated and are the primary purpose of this
// IR pass. Several different lowerings are chosen between to meet requirements
// to avoid allocating any LDS where it is not necessary, as that impacts
// occupancy and may fail the compilation, while not imposing overhead on a
// feature whose primary advantage over global memory is performance. The basic
// design goal is to avoid one kernel imposing overhead on another.
//
// Implementation.
//
// LDS variables with constant annotation or non-undef initializer are passed
// through unchanged for simplification or error diagnostics in later passes.
// Non-undef initializers are not yet implemented for LDS.
//
// LDS variables that are always allocated at the same address can be found
// by lookup at that address. Otherwise runtime information/cost is required.
//
// The simplest strategy possible is to group all LDS variables in a single
// struct and allocate that struct in every kernel such that the original
// variables are always at the same address. LDS is however a limited resource
// so this strategy is unusable in practice. It is not implemented here.
//
// Strategy | Precise allocation | Zero runtime cost | General purpose |
//  --------+--------------------+-------------------+-----------------+
//   Module |                 No |               Yes |             Yes |
//    Table |                Yes |                No |             Yes |
//   Kernel |                Yes |               Yes |              No |
//   Hybrid |                Yes |           Partial |             Yes |
//
// "Module" spends LDS memory to save cycles. "Table" spends cycles and global
// memory to save LDS. "Kernel" is as fast as kernel allocation but only works
// for variables that are known reachable from a single kernel. "Hybrid" picks
// between all three. When forced to choose between LDS and cycles we minimise
// LDS use.

// The "module" lowering implemented here finds LDS variables which are used by
// non-kernel functions and creates a new struct with a field for each of those
// LDS variables. Variables that are only used from kernels are excluded.
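//
// As an illustrative sketch (variable names invented), module lowering of
// i32 variables @x and @y that are used from non-kernel functions creates:
//   %llvm.amdgcn.module.lds.t = type { i32, i32 }
//   @llvm.amdgcn.module.lds = internal addrspace(3) global
//       %llvm.amdgcn.module.lds.t poison, align 4
// and rewrites their uses to constantexpr geps into that instance, which is
// allocated at address zero by every kernel that may call such functions.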
//
// The "table" lowering implemented here has three components.
// First kernels are assigned a unique integer identifier which is available in
// functions they call through the intrinsic amdgcn_lds_kernel_id. The integer
// is passed through a specific SGPR, thus works with indirect calls.
// Second, each kernel allocates LDS variables independent of other kernels and
// writes the addresses it chose for each variable into an array in consistent
// order. If the kernel does not allocate a given variable, it writes poison to
// the corresponding array location. These arrays are written to a constant
// table in the order matching the kernel unique integer identifier.
// Third, uses from non-kernel functions are replaced with a table lookup using
// the intrinsic function to find the address of the variable.
//
// "Kernel" lowering is only applicable for variables that are unambiguously
// reachable from exactly one kernel. For those cases, accesses to the variable
// can be lowered to ConstantExpr address of a struct instance specific to that
// one kernel. This is zero cost in space and in compute. It will raise a fatal
// error on any variable that might be reachable from multiple kernels and is
// thus most easily used as part of the hybrid lowering strategy.
//
// Hybrid lowering is a mixture of the above. It uses the zero cost kernel
// lowering where it can. It lowers the variable accessed by the greatest
// number of kernels using the module strategy as that is free for the first
// variable. Any further variables that can be lowered with the module strategy
// without incurring LDS memory overhead are lowered that way. The remaining
// ones are lowered via table.
//
// Consequences
// - No heuristics or user controlled magic numbers, hybrid is the right choice
// - Kernels that don't use functions (or have had them all inlined) are not
//   affected by any lowering for kernels that do.
// - Kernels that don't make indirect function calls are not affected by those
//   that do.
// - Variables which are used by lots of kernels, e.g. those injected by a
//   language runtime in most kernels, are expected to have no overhead
// - Implementations that instantiate templates per-kernel where those templates
//   use LDS are expected to hit the "Kernel" lowering strategy
// - The runtime properties impose a cost in compiler implementation complexity
//
// Dynamic LDS implementation
// Dynamic LDS is lowered similarly to the "table" strategy above and uses the
// same intrinsic to identify which kernel is at the root of the dynamic call
// graph. This relies on the specified behaviour that all dynamic LDS variables
// alias one another, i.e. are at the same address, with respect to a given
// kernel. Therefore this pass creates new dynamic LDS variables for each kernel
// that allocates any dynamic LDS and builds a table of addresses out of those.
// The AMDGPUPromoteAlloca pass skips kernels that use dynamic LDS.
// The corresponding optimisation for "kernel" lowering where the table lookup
// is elided is not implemented.
//
//
// Implementation notes / limitations
// A single LDS global variable represents an instance per kernel that can
// reach said variable. This pass essentially specialises those variables per
// kernel. Handling ConstantExpr during the pass complicated this significantly,
// so now all ConstantExpr uses of LDS variables are expanded to instructions.
// This may need amending when implementing non-undef initialisers.
//
// Lowering is split between this IR pass and the back end. This pass chooses
// where given variables should be allocated and marks them with metadata,
// MD_absolute_symbol. The backend places the variables in coincidentally the
// same location and raises a fatal error if something has gone awry. This works
// in practice because the only pass between this one and the backend that
// changes LDS is PromoteAlloca and the changes it makes do not conflict.
//
// Addresses are written to constant global arrays based on the same metadata.
//
// The backend lowers LDS variables in the order of traversal of the function.
// This is at odds with the deterministic layout required. The workaround is to
// allocate the fixed-address variables immediately upon starting the function
// where they can be placed as intended. This requires a means of mapping from
// the function to the variables that it allocates. For the module scope lds,
// this is via metadata indicating whether the variable is not required. If a
// pass deletes that metadata, a fatal error on disagreement with the absolute
// symbol metadata will occur. For kernel scope and dynamic, this is by _name_
// correspondence between the function and the variable. It requires the
// kernel to have a name (which is only a limitation for tests in practice) and
// for nothing to rename the corresponding symbols. This is a hazard if the pass
// is run multiple times during debugging. Alternative schemes considered all
// involve bespoke metadata.
//
// If the name correspondence can be replaced, multiple distinct kernels that
// have the same memory layout can map to the same kernel id (as the address
// itself is handled by the absolute symbol metadata) and that will allow more
// uses of the "kernel" style faster lowering and reduce the size of the lookup
// tables.
//
// There is a test that checks this does not fire for a graphics shader. This
// lowering is expected to work for graphics if the isKernel test is changed.
//
// The current markUsedByKernel is sufficient for PromoteAlloca but is elided
// before codegen. Replacing this with an equivalent intrinsic which lasts until
// shortly after the machine function lowering of LDS would help break the name
// mapping. The other part needed is probably to amend PromoteAlloca to embed
// the LDS variables it creates in the same struct created here. That avoids the
// current hazard where a PromoteAlloca LDS variable might be allocated before
// the kernel scope (and thus error on the address check). Given a new invariant
// that no LDS variables exist outside of the structs managed here, and an
// intrinsic that lasts until after the LDS frame lowering, it should be
// possible to drop the name mapping and fold equivalent memory layouts.
//
//===----------------------------------------------------------------------===//

#include "AMDGPU.h"
#include "AMDGPUMemoryUtils.h"
#include "AMDGPUTargetMachine.h"
#include "Utils/AMDGPUBaseInfo.h"
#include "llvm/ADT/BitVector.h"
#include "llvm/ADT/DenseMap.h"
#include "llvm/ADT/DenseSet.h"
#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SetOperations.h"
#include "llvm/Analysis/CallGraph.h"
#include "llvm/CodeGen/TargetPassConfig.h"
#include "llvm/IR/Constants.h"
#include "llvm/IR/DerivedTypes.h"
#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/InlineAsm.h"
#include "llvm/IR/Instructions.h"
#include "llvm/IR/IntrinsicsAMDGPU.h"
#include "llvm/IR/MDBuilder.h"
#include "llvm/IR/ReplaceConstant.h"
#include "llvm/InitializePasses.h"
#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"
#include "llvm/Support/Format.h"
#include "llvm/Support/OptimizedStructLayout.h"
#include "llvm/Support/raw_ostream.h"
#include "llvm/Transforms/Utils/BasicBlockUtils.h"
#include "llvm/Transforms/Utils/ModuleUtils.h"

#include <vector>

#include <cstdio>

#define DEBUG_TYPE "amdgpu-lower-module-lds"

using namespace llvm;
using namespace AMDGPU;

namespace {

cl::opt<bool> SuperAlignLDSGlobals(
    "amdgpu-super-align-lds-globals",
    cl::desc("Increase alignment of LDS if it is not on align boundary"),
    cl::init(true), cl::Hidden);

enum class LoweringKind { module, table, kernel, hybrid };
cl::opt<LoweringKind> LoweringKindLoc(
    "amdgpu-lower-module-lds-strategy",
    cl::desc("Specify lowering strategy for function LDS access:"), cl::Hidden,
    cl::init(LoweringKind::hybrid),
    cl::values(
        clEnumValN(LoweringKind::table, "table", "Lower via table lookup"),
        clEnumValN(LoweringKind::module, "module", "Lower via module struct"),
        clEnumValN(
            LoweringKind::kernel, "kernel",
            "Lower variables reachable from one kernel, otherwise abort"),
        clEnumValN(LoweringKind::hybrid, "hybrid",
                   "Lower via mixture of above strategies")));

template <typename T> std::vector<T> sortByName(std::vector<T> &&V) {
  llvm::sort(V.begin(), V.end(), [](const auto *L, const auto *R) {
    return L->getName() < R->getName();
  });
  return {std::move(V)};
}

class AMDGPULowerModuleLDS {
  const AMDGPUTargetMachine &TM;

  static void
  removeLocalVarsFromUsedLists(Module &M,
                               const DenseSet<GlobalVariable *> &LocalVars) {
    // The verifier rejects used lists containing an inttoptr of a constant
    // so remove the variables from these lists before replaceAllUsesWith
    SmallPtrSet<Constant *, 8> LocalVarsSet;
    for (GlobalVariable *LocalVar : LocalVars)
      LocalVarsSet.insert(cast<Constant>(LocalVar->stripPointerCasts()));

    removeFromUsedLists(
        M, [&LocalVarsSet](Constant *C) { return LocalVarsSet.count(C); });

    for (GlobalVariable *LocalVar : LocalVars)
      LocalVar->removeDeadConstantUsers();
  }

  static void markUsedByKernel(Function *Func, GlobalVariable *SGV) {
    // The llvm.amdgcn.module.lds instance is implicitly used by all kernels
    // that might call a function which accesses a field within it. This is
    // presently approximated to 'all kernels' if there are any such functions
    // in the module. This implicit use is redefined as an explicit use here so
    // that later passes, specifically PromoteAlloca, account for the required
    // memory without any knowledge of this transform.

    // An operand bundle on llvm.donothing works because the call instruction
    // survives until after the last pass that needs to account for LDS. It is
    // better than inline asm as the latter survives until the end of codegen. A
    // totally robust solution would be a function with the same semantics as
    // llvm.donothing that takes a pointer to the instance and is lowered to a
    // no-op after LDS is allocated, but that is not presently necessary.

    // This intrinsic is eliminated shortly before instruction selection. It
    // does not suffice to indicate to ISel that a given global which is not
    // immediately used by the kernel must still be allocated by it. An
    // equivalent target specific intrinsic which lasts until immediately after
    // codegen would suffice for that, but one would still need to ensure that
    // the variables are allocated in the anticipated order.
    BasicBlock *Entry = &Func->getEntryBlock();
    IRBuilder<> Builder(Entry, Entry->getFirstNonPHIIt());

    Function *Decl = Intrinsic::getOrInsertDeclaration(
        Func->getParent(), Intrinsic::donothing, {});

    Value *UseInstance[1] = {
        Builder.CreateConstInBoundsGEP1_32(SGV->getValueType(), SGV, 0)};

    Builder.CreateCall(
        Decl, {}, {OperandBundleDefT<Value *>("ExplicitUse", UseInstance)});
  }
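  // The call emitted above prints roughly as (sketch; the bundled pointer is
  // whichever struct instance was passed in, e.g. the module scope struct):
  //   call void @llvm.donothing() [ "ExplicitUse"(ptr addrspace(3)
  //       @llvm.amdgcn.module.lds) ]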

public:
  AMDGPULowerModuleLDS(const AMDGPUTargetMachine &TM_) : TM(TM_) {}

  struct LDSVariableReplacement {
    GlobalVariable *SGV = nullptr;
    DenseMap<GlobalVariable *, Constant *> LDSVarsToConstantGEP;
  };

  // Remap from an LDS global to the constantexpr GEP it has been moved to
  // within each kernel, and an array with an element for each kernel
  // containing where the corresponding variable was remapped to.

  static Constant *getAddressesOfVariablesInKernel(
      LLVMContext &Ctx, ArrayRef<GlobalVariable *> Variables,
      const DenseMap<GlobalVariable *, Constant *> &LDSVarsToConstantGEP) {
    // Create a ConstantArray containing the address of each Variable within the
    // kernel corresponding to LDSVarsToConstantGEP, or poison if that kernel
    // does not allocate it
    // TODO: Drop the ptrtoint conversion

    Type *I32 = Type::getInt32Ty(Ctx);

    ArrayType *KernelOffsetsType = ArrayType::get(I32, Variables.size());

    SmallVector<Constant *> Elements;
    for (GlobalVariable *GV : Variables) {
      auto ConstantGepIt = LDSVarsToConstantGEP.find(GV);
      if (ConstantGepIt != LDSVarsToConstantGEP.end()) {
        auto *elt = ConstantExpr::getPtrToInt(ConstantGepIt->second, I32);
        Elements.push_back(elt);
      } else {
        Elements.push_back(PoisonValue::get(I32));
      }
    }
    return ConstantArray::get(KernelOffsetsType, Elements);
  }

  static GlobalVariable *buildLookupTable(
      Module &M, ArrayRef<GlobalVariable *> Variables,
      ArrayRef<Function *> kernels,
      DenseMap<Function *, LDSVariableReplacement> &KernelToReplacement) {
    if (Variables.empty()) {
      return nullptr;
    }
    LLVMContext &Ctx = M.getContext();

    const size_t NumberVariables = Variables.size();
    const size_t NumberKernels = kernels.size();

    ArrayType *KernelOffsetsType =
        ArrayType::get(Type::getInt32Ty(Ctx), NumberVariables);

    ArrayType *AllKernelsOffsetsType =
        ArrayType::get(KernelOffsetsType, NumberKernels);

    Constant *Missing = PoisonValue::get(KernelOffsetsType);
    std::vector<Constant *> overallConstantExprElts(NumberKernels);
    for (size_t i = 0; i < NumberKernels; i++) {
      auto Replacement = KernelToReplacement.find(kernels[i]);
      overallConstantExprElts[i] =
          (Replacement == KernelToReplacement.end())
              ? Missing
              : getAddressesOfVariablesInKernel(
                    Ctx, Variables, Replacement->second.LDSVarsToConstantGEP);
    }

    Constant *init =
        ConstantArray::get(AllKernelsOffsetsType, overallConstantExprElts);

    return new GlobalVariable(
        M, AllKernelsOffsetsType, true, GlobalValue::InternalLinkage, init,
        "llvm.amdgcn.lds.offset.table", nullptr, GlobalValue::NotThreadLocal,
        AMDGPUAS::CONSTANT_ADDRESS);
  }
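  // The table is, schematically (sketch):
  //   @llvm.amdgcn.lds.offset.table = internal addrspace(4) constant
  //       [NumberKernels x [NumberVariables x i32]] [...]
  // where element [i][j] holds the LDS address of variable j within kernel
  // i, or poison if kernel i does not allocate that variable.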

  void replaceUseWithTableLookup(Module &M, IRBuilder<> &Builder,
                                 GlobalVariable *LookupTable,
                                 GlobalVariable *GV, Use &U,
                                 Value *OptionalIndex) {
    // Table is a constant array of the same length as OrderedKernels
    LLVMContext &Ctx = M.getContext();
    Type *I32 = Type::getInt32Ty(Ctx);
    auto *I = cast<Instruction>(U.getUser());

    Value *tableKernelIndex = getTableLookupKernelIndex(M, I->getFunction());

    if (auto *Phi = dyn_cast<PHINode>(I)) {
      BasicBlock *BB = Phi->getIncomingBlock(U);
      Builder.SetInsertPoint(&(*(BB->getFirstInsertionPt())));
    } else {
      Builder.SetInsertPoint(I);
    }

    SmallVector<Value *, 3> GEPIdx = {
        ConstantInt::get(I32, 0),
        tableKernelIndex,
    };
    if (OptionalIndex)
      GEPIdx.push_back(OptionalIndex);

    Value *Address = Builder.CreateInBoundsGEP(
        LookupTable->getValueType(), LookupTable, GEPIdx, GV->getName());

    Value *loaded = Builder.CreateLoad(I32, Address);

    Value *replacement =
        Builder.CreateIntToPtr(loaded, GV->getType(), GV->getName());

    U.set(replacement);
  }
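  // The sequence emitted for one use is roughly (sketch; the final index is
  // present when indexing a multi-variable table):
  //   %id  = call i32 @llvm.amdgcn.lds.kernel.id()
  //   %gep = getelementptr inbounds <table type>, ptr addrspace(4) <table>,
  //          i32 0, i32 %id[, i32 <variable index>]
  //   %off = load i32, ptr addrspace(4) %gep
  //   %ptr = inttoptr i32 %off to ptr addrspace(3)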

  void replaceUsesInInstructionsWithTableLookup(
      Module &M, ArrayRef<GlobalVariable *> ModuleScopeVariables,
      GlobalVariable *LookupTable) {

    LLVMContext &Ctx = M.getContext();
    IRBuilder<> Builder(Ctx);
    Type *I32 = Type::getInt32Ty(Ctx);

    for (size_t Index = 0; Index < ModuleScopeVariables.size(); Index++) {
      auto *GV = ModuleScopeVariables[Index];

      for (Use &U : make_early_inc_range(GV->uses())) {
        auto *I = dyn_cast<Instruction>(U.getUser());
        if (!I)
          continue;

        replaceUseWithTableLookup(M, Builder, LookupTable, GV, U,
                                  ConstantInt::get(I32, Index));
      }
    }
  }

  static DenseSet<Function *> kernelsThatIndirectlyAccessAnyOfPassedVariables(
      Module &M, LDSUsesInfoTy &LDSUsesInfo,
      DenseSet<GlobalVariable *> const &VariableSet) {

    DenseSet<Function *> KernelSet;

    if (VariableSet.empty())
      return KernelSet;

    for (Function &Func : M.functions()) {
      if (Func.isDeclaration() || !isKernelLDS(&Func))
        continue;
      for (GlobalVariable *GV : LDSUsesInfo.indirect_access[&Func]) {
        if (VariableSet.contains(GV)) {
          KernelSet.insert(&Func);
          break;
        }
      }
    }

    return KernelSet;
  }

  static GlobalVariable *
  chooseBestVariableForModuleStrategy(const DataLayout &DL,
                                      VariableFunctionMap &LDSVars) {
    // Find the global variable with the most indirect uses from kernels

    struct CandidateTy {
      GlobalVariable *GV = nullptr;
      size_t UserCount = 0;
      size_t Size = 0;

      CandidateTy() = default;

      CandidateTy(GlobalVariable *GV, uint64_t UserCount, uint64_t AllocSize)
          : GV(GV), UserCount(UserCount), Size(AllocSize) {}

      bool operator<(const CandidateTy &Other) const {
        // Fewer users makes module scope variable less attractive
        if (UserCount < Other.UserCount) {
          return true;
        }
        if (UserCount > Other.UserCount) {
          return false;
        }

        // Bigger makes module scope variable less attractive
        if (Size < Other.Size) {
          return false;
        }

        if (Size > Other.Size) {
          return true;
        }

        // Arbitrary but consistent
        return GV->getName() < Other.GV->getName();
      }
    };

    CandidateTy MostUsed;

    for (auto &K : LDSVars) {
      GlobalVariable *GV = K.first;
      if (K.second.size() <= 1) {
        // A variable reachable by only one kernel is best lowered with kernel
        // strategy
        continue;
      }
      CandidateTy Candidate(
          GV, K.second.size(),
          DL.getTypeAllocSize(GV->getValueType()).getFixedValue());
      if (MostUsed < Candidate)
        MostUsed = Candidate;
    }

    return MostUsed.GV;
  }

  static void recordLDSAbsoluteAddress(Module *M, GlobalVariable *GV,
                                       uint32_t Address) {
    // Write the specified address into metadata where it can be retrieved by
    // the assembler. Format is a half open range, [Address Address+1)
    LLVMContext &Ctx = M->getContext();
    auto *IntTy =
        M->getDataLayout().getIntPtrType(Ctx, AMDGPUAS::LOCAL_ADDRESS);
    auto *MinC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address));
    auto *MaxC = ConstantAsMetadata::get(ConstantInt::get(IntTy, Address + 1));
    GV->setMetadata(LLVMContext::MD_absolute_symbol,
                    MDNode::get(Ctx, {MinC, MaxC}));
  }
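  // For example, recording address zero on a target with 32-bit local
  // pointers attaches:
  //   !absolute_symbol !{i32 0, i32 1}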

  DenseMap<Function *, Value *> tableKernelIndexCache;
  Value *getTableLookupKernelIndex(Module &M, Function *F) {
    // Accesses from a function use the amdgcn_lds_kernel_id intrinsic which
    // lowers to a read from a live in register. Emit it once in the entry
    // block to spare deduplicating it later.
    auto [It, Inserted] = tableKernelIndexCache.try_emplace(F);
    if (Inserted) {
      auto InsertAt = F->getEntryBlock().getFirstNonPHIOrDbgOrAlloca();
      IRBuilder<> Builder(&*InsertAt);

      It->second =
          Builder.CreateIntrinsic(Intrinsic::amdgcn_lds_kernel_id, {}, {});
    }

    return It->second;
  }

  static std::vector<Function *> assignLDSKernelIDToEachKernel(
      Module *M, DenseSet<Function *> const &KernelsThatAllocateTableLDS,
      DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS) {
    // Associate kernels in the set with an arbitrary but reproducible order and
    // annotate them with that order in metadata. This metadata is recognised by
    // the backend and lowered to a SGPR which can be read from using
    // amdgcn_lds_kernel_id.

    std::vector<Function *> OrderedKernels;
    if (!KernelsThatAllocateTableLDS.empty() ||
        !KernelsThatIndirectlyAllocateDynamicLDS.empty()) {

      for (Function &Func : M->functions()) {
        if (Func.isDeclaration())
          continue;
        if (!isKernelLDS(&Func))
          continue;

        if (KernelsThatAllocateTableLDS.contains(&Func) ||
            KernelsThatIndirectlyAllocateDynamicLDS.contains(&Func)) {
          assert(Func.hasName()); // else fatal error earlier
          OrderedKernels.push_back(&Func);
        }
      }

      // Put them in an arbitrary but reproducible order
      OrderedKernels = sortByName(std::move(OrderedKernels));

      // Annotate the kernels with their order in this vector
      LLVMContext &Ctx = M->getContext();
      IRBuilder<> Builder(Ctx);

      if (OrderedKernels.size() > UINT32_MAX) {
        // 32 bit keeps it in one SGPR. > 2**32 kernels won't fit on the GPU
        report_fatal_error("Unimplemented LDS lowering for > 2**32 kernels");
      }

      for (size_t i = 0; i < OrderedKernels.size(); i++) {
        Metadata *AttrMDArgs[1] = {
            ConstantAsMetadata::get(Builder.getInt32(i)),
        };
        OrderedKernels[i]->setMetadata("llvm.amdgcn.lds.kernel.id",
                                       MDNode::get(Ctx, AttrMDArgs));
      }
    }
    return OrderedKernels;
  }
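  // A kernel that is assigned id 0 then carries an attachment like (sketch):
  //   define amdgpu_kernel void @k() !llvm.amdgcn.lds.kernel.id !0 { ... }
  //   !0 = !{i32 0}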

  static void partitionVariablesIntoIndirectStrategies(
      Module &M, LDSUsesInfoTy const &LDSUsesInfo,
      VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly,
      DenseSet<GlobalVariable *> &ModuleScopeVariables,
      DenseSet<GlobalVariable *> &TableLookupVariables,
      DenseSet<GlobalVariable *> &KernelAccessVariables,
      DenseSet<GlobalVariable *> &DynamicVariables) {

    GlobalVariable *HybridModuleRoot =
        LoweringKindLoc != LoweringKind::hybrid
            ? nullptr
            : chooseBestVariableForModuleStrategy(
                  M.getDataLayout(), LDSToKernelsThatNeedToAccessItIndirectly);

    DenseSet<Function *> const EmptySet;
    DenseSet<Function *> const &HybridModuleRootKernels =
        HybridModuleRoot
            ? LDSToKernelsThatNeedToAccessItIndirectly[HybridModuleRoot]
            : EmptySet;

    for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
      // Each iteration of this loop assigns exactly one global variable to
      // exactly one of the implementation strategies.

      GlobalVariable *GV = K.first;
      assert(AMDGPU::isLDSVariableToLower(*GV));
      assert(K.second.size() != 0);

      if (AMDGPU::isDynamicLDS(*GV)) {
        DynamicVariables.insert(GV);
        continue;
      }

      switch (LoweringKindLoc) {
      case LoweringKind::module:
        ModuleScopeVariables.insert(GV);
        break;

      case LoweringKind::table:
        TableLookupVariables.insert(GV);
        break;

      case LoweringKind::kernel:
        if (K.second.size() == 1) {
          KernelAccessVariables.insert(GV);
        } else {
          report_fatal_error(
              "cannot lower LDS '" + GV->getName() +
              "' to kernel access as it is reachable from multiple kernels");
        }
        break;

      case LoweringKind::hybrid: {
        if (GV == HybridModuleRoot) {
          assert(K.second.size() != 1);
          ModuleScopeVariables.insert(GV);
        } else if (K.second.size() == 1) {
          KernelAccessVariables.insert(GV);
        } else if (set_is_subset(K.second, HybridModuleRootKernels)) {
          ModuleScopeVariables.insert(GV);
        } else {
          TableLookupVariables.insert(GV);
        }
        break;
      }
      }
    }

    // All LDS variables accessed indirectly have now been partitioned into
    // the distinct lowering strategies.
    assert(ModuleScopeVariables.size() + TableLookupVariables.size() +
               KernelAccessVariables.size() + DynamicVariables.size() ==
           LDSToKernelsThatNeedToAccessItIndirectly.size());
  }
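  // Worked example of the hybrid case (invented variables and kernels): if
  // @a is reachable from kernels {k0, k1, k2} and is chosen as the root,
  // then @b reachable from {k0, k1} joins it in the module struct (a subset
  // of the root's kernels, so no extra LDS is allocated), @c reachable from
  // {k2, k3} is lowered via table, and @d reachable only from {k3} uses the
  // kernel strategy.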

  static GlobalVariable *lowerModuleScopeStructVariables(
      Module &M, DenseSet<GlobalVariable *> const &ModuleScopeVariables,
      DenseSet<Function *> const &KernelsThatAllocateModuleLDS) {
    // Create a struct to hold the ModuleScopeVariables.
    // Replace all uses of those variables from non-kernel functions with the
    // new struct instance. Replace only the uses from kernel functions that
    // will allocate this instance. That is a space optimisation - kernels that
    // use a subset of the module scope struct and do not need to allocate it
    // for indirect calls will only allocate the subset they use (they do so as
    // part of the per-kernel lowering).
    if (ModuleScopeVariables.empty()) {
      return nullptr;
    }

    LLVMContext &Ctx = M.getContext();

    LDSVariableReplacement ModuleScopeReplacement =
        createLDSVariableReplacement(M, "llvm.amdgcn.module.lds",
                                     ModuleScopeVariables);

    appendToCompilerUsed(M, {static_cast<GlobalValue *>(
                            ConstantExpr::getPointerBitCastOrAddrSpaceCast(
                                cast<Constant>(ModuleScopeReplacement.SGV),
                                PointerType::getUnqual(Ctx)))});

    // module.lds will be allocated at zero in any kernel that allocates it
    recordLDSAbsoluteAddress(&M, ModuleScopeReplacement.SGV, 0);

    // historic
    removeLocalVarsFromUsedLists(M, ModuleScopeVariables);

    // Replace all uses of module scope variable from non-kernel functions
    replaceLDSVariablesWithStruct(
        M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
          Instruction *I = dyn_cast<Instruction>(U.getUser());
          if (!I) {
            return false;
          }
          Function *F = I->getFunction();
          return !isKernelLDS(F);
        });

    // Replace uses of module scope variable from kernel functions that
    // allocate the module scope variable, otherwise leave them unchanged
    // Record on each kernel whether the module scope global is used by it

    for (Function &Func : M.functions()) {
      if (Func.isDeclaration() || !isKernelLDS(&Func))
        continue;

      if (KernelsThatAllocateModuleLDS.contains(&Func)) {
        replaceLDSVariablesWithStruct(
            M, ModuleScopeVariables, ModuleScopeReplacement, [&](Use &U) {
              Instruction *I = dyn_cast<Instruction>(U.getUser());
              if (!I) {
                return false;
              }
              Function *F = I->getFunction();
              return F == &Func;
            });

        markUsedByKernel(&Func, ModuleScopeReplacement.SGV);
      }
    }

    return ModuleScopeReplacement.SGV;
  }

  static DenseMap<Function *, LDSVariableReplacement>
  lowerKernelScopeStructVariables(
      Module &M, LDSUsesInfoTy &LDSUsesInfo,
      DenseSet<GlobalVariable *> const &ModuleScopeVariables,
      DenseSet<Function *> const &KernelsThatAllocateModuleLDS,
      GlobalVariable *MaybeModuleScopeStruct) {

    // Create a struct for each kernel for the non-module-scope variables.

    DenseMap<Function *, LDSVariableReplacement> KernelToReplacement;
    for (Function &Func : M.functions()) {
      if (Func.isDeclaration() || !isKernelLDS(&Func))
        continue;

      DenseSet<GlobalVariable *> KernelUsedVariables;
      // Allocating variables that are used directly in this struct to get
      // alignment aware allocation and predictable frame size.
      for (auto &v : LDSUsesInfo.direct_access[&Func]) {
        if (!AMDGPU::isDynamicLDS(*v)) {
          KernelUsedVariables.insert(v);
        }
      }

      // Allocating variables that are accessed indirectly so that a lookup of
      // this struct instance can find them from nested functions.
      for (auto &v : LDSUsesInfo.indirect_access[&Func]) {
        if (!AMDGPU::isDynamicLDS(*v)) {
          KernelUsedVariables.insert(v);
        }
      }

      // Variables allocated in module lds must all resolve to that struct,
      // not to the per-kernel instance.
      if (KernelsThatAllocateModuleLDS.contains(&Func)) {
        for (GlobalVariable *v : ModuleScopeVariables) {
          KernelUsedVariables.erase(v);
        }
      }

      if (KernelUsedVariables.empty()) {
        // Either used no LDS, or the LDS it used was all in the module struct
        // or dynamically sized
        continue;
      }

      // The association between kernel function and LDS struct is done by
      // symbol name, which only works if the function in question has a name.
      // This is not expected to be a problem in practice as kernels are
      // called by name, making anonymous ones (which are named by the
      // backend) difficult to use. This does mean that llvm test cases need
      // to name the kernels.
      if (!Func.hasName()) {
        report_fatal_error("Anonymous kernels cannot use LDS variables");
      }

      std::string VarName =
          (Twine("llvm.amdgcn.kernel.") + Func.getName() + ".lds").str();

      auto Replacement =
          createLDSVariableReplacement(M, VarName, KernelUsedVariables);

      // If any indirect uses, create a direct use to ensure allocation
      // TODO: Simpler to unconditionally mark used but that regresses
      // codegen in test/CodeGen/AMDGPU/noclobber-barrier.ll
      auto Accesses = LDSUsesInfo.indirect_access.find(&Func);
      if ((Accesses != LDSUsesInfo.indirect_access.end()) &&
          !Accesses->second.empty())
        markUsedByKernel(&Func, Replacement.SGV);

      // remove preserves existing codegen
      removeLocalVarsFromUsedLists(M, KernelUsedVariables);
      KernelToReplacement[&Func] = Replacement;

      // Rewrite uses within kernel to the new struct
      replaceLDSVariablesWithStruct(
          M, KernelUsedVariables, Replacement, [&Func](Use &U) {
            Instruction *I = dyn_cast<Instruction>(U.getUser());
            return I && I->getFunction() == &Func;
          });
    }
    return KernelToReplacement;
  }

  static GlobalVariable *
  buildRepresentativeDynamicLDSInstance(Module &M, LDSUsesInfoTy &LDSUsesInfo,
                                        Function *func) {
    // Create a dynamic lds variable with a name associated with the passed
    // function that has the maximum alignment of any dynamic lds variable
    // reachable from this kernel. Dynamic LDS is allocated after the static LDS
    // allocation, possibly after alignment padding. The representative variable
    // created here has the maximum alignment of any other dynamic variable
    // reachable by that kernel. All dynamic LDS variables are allocated at the
    // same address in each kernel in order to provide the documented aliasing
    // semantics. Setting the alignment here allows this IR pass to accurately
    // predict the exact constant at which it will be allocated.

    assert(isKernelLDS(func));

    LLVMContext &Ctx = M.getContext();
    const DataLayout &DL = M.getDataLayout();
    Align MaxDynamicAlignment(1);

    auto UpdateMaxAlignment = [&MaxDynamicAlignment, &DL](GlobalVariable *GV) {
      if (AMDGPU::isDynamicLDS(*GV)) {
        MaxDynamicAlignment =
            std::max(MaxDynamicAlignment, AMDGPU::getAlign(DL, GV));
      }
    };

    for (GlobalVariable *GV : LDSUsesInfo.indirect_access[func]) {
      UpdateMaxAlignment(GV);
    }

    for (GlobalVariable *GV : LDSUsesInfo.direct_access[func]) {
      UpdateMaxAlignment(GV);
    }

    assert(func->hasName()); // Checked by caller
    auto *emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
    GlobalVariable *N = new GlobalVariable(
        M, emptyCharArray, false, GlobalValue::ExternalLinkage, nullptr,
        Twine("llvm.amdgcn." + func->getName() + ".dynlds"), nullptr,
        GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS, false);
    N->setAlignment(MaxDynamicAlignment);

    assert(AMDGPU::isDynamicLDS(*N));
    return N;
  }
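  // For a kernel @k the representative variable is, schematically:
  //   @llvm.amdgcn.k.dynlds = external addrspace(3) global [0 x i8],
  //       align <maximum alignment of dynamic LDS reachable from @k>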

  DenseMap<Function *, GlobalVariable *> lowerDynamicLDSVariables(
      Module &M, LDSUsesInfoTy &LDSUsesInfo,
      DenseSet<Function *> const &KernelsThatIndirectlyAllocateDynamicLDS,
      DenseSet<GlobalVariable *> const &DynamicVariables,
      std::vector<Function *> const &OrderedKernels) {
    DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS;
    if (!KernelsThatIndirectlyAllocateDynamicLDS.empty()) {
      LLVMContext &Ctx = M.getContext();
      IRBuilder<> Builder(Ctx);
      Type *I32 = Type::getInt32Ty(Ctx);

      std::vector<Constant *> newDynamicLDS;

      // Table is built in the same order as OrderedKernels
      for (auto &func : OrderedKernels) {

        if (KernelsThatIndirectlyAllocateDynamicLDS.contains(func)) {
          assert(isKernelLDS(func));
          if (!func->hasName()) {
            report_fatal_error("Anonymous kernels cannot use LDS variables");
          }

          GlobalVariable *N =
              buildRepresentativeDynamicLDSInstance(M, LDSUsesInfo, func);

          KernelToCreatedDynamicLDS[func] = N;

          markUsedByKernel(func, N);

          auto *emptyCharArray = ArrayType::get(Type::getInt8Ty(Ctx), 0);
          auto *GEP = ConstantExpr::getGetElementPtr(
              emptyCharArray, N, ConstantInt::get(I32, 0), true);
          newDynamicLDS.push_back(ConstantExpr::getPtrToInt(GEP, I32));
        } else {
          newDynamicLDS.push_back(PoisonValue::get(I32));
        }
      }
      assert(OrderedKernels.size() == newDynamicLDS.size());

      ArrayType *t = ArrayType::get(I32, newDynamicLDS.size());
      Constant *init = ConstantArray::get(t, newDynamicLDS);
      GlobalVariable *table = new GlobalVariable(
          M, t, true, GlobalValue::InternalLinkage, init,
          "llvm.amdgcn.dynlds.offset.table", nullptr,
          GlobalValue::NotThreadLocal, AMDGPUAS::CONSTANT_ADDRESS);

      for (GlobalVariable *GV : DynamicVariables) {
        for (Use &U : make_early_inc_range(GV->uses())) {
          auto *I = dyn_cast<Instruction>(U.getUser());
          if (!I)
            continue;
          if (isKernelLDS(I->getFunction()))
            continue;

          replaceUseWithTableLookup(M, Builder, table, GV, U, nullptr);
        }
      }
    }
    return KernelToCreatedDynamicLDS;
  }

  static GlobalVariable *uniquifyGVPerKernel(Module &M, GlobalVariable *GV,
                                             Function *KF) {
    bool NeedsReplacement = false;
    for (Use &U : GV->uses()) {
      if (auto *I = dyn_cast<Instruction>(U.getUser())) {
        Function *F = I->getFunction();
        if (isKernelLDS(F) && F != KF) {
          NeedsReplacement = true;
          break;
        }
      }
    }
    if (!NeedsReplacement)
      return GV;
    // Create a new GV used only by this kernel and its function
    GlobalVariable *NewGV = new GlobalVariable(
        M, GV->getValueType(), GV->isConstant(), GV->getLinkage(),
        GV->getInitializer(), GV->getName() + "." + KF->getName(), nullptr,
        GV->getThreadLocalMode(), GV->getType()->getAddressSpace());
    NewGV->copyAttributesFrom(GV);
    for (Use &U : make_early_inc_range(GV->uses())) {
      if (auto *I = dyn_cast<Instruction>(U.getUser())) {
        Function *F = I->getFunction();
        if (!isKernelLDS(F) || F == KF) {
          U.getUser()->replaceUsesOfWith(GV, NewGV);
        }
      }
    }
    return NewGV;
  }

  bool lowerSpecialLDSVariables(
      Module &M, LDSUsesInfoTy &LDSUsesInfo,
      VariableFunctionMap &LDSToKernelsThatNeedToAccessItIndirectly) {
    bool Changed = false;
    // The 1st round: give module-absolute assignments
    int NumAbsolutes = 0;
    std::vector<GlobalVariable *> OrderedGVs;
    for (auto &K : LDSToKernelsThatNeedToAccessItIndirectly) {
      GlobalVariable *GV = K.first;
      if (!isNamedBarrier(*GV))
        continue;
      // give a module-absolute assignment if it is indirectly accessed by
      // multiple kernels. This is not precise, but we don't want to duplicate
      // a function when it is called by multiple kernels.
      if (LDSToKernelsThatNeedToAccessItIndirectly[GV].size() > 1) {
        OrderedGVs.push_back(GV);
      } else {
        // leave it to the 2nd round, which will give a kernel-relative
        // assignment if it is only indirectly accessed by one kernel
        LDSUsesInfo.direct_access[*K.second.begin()].insert(GV);
      }
      LDSToKernelsThatNeedToAccessItIndirectly.erase(GV);
    }
    OrderedGVs = sortByName(std::move(OrderedGVs));
    for (GlobalVariable *GV : OrderedGVs) {
      int BarId = ++NumAbsolutes;
      unsigned BarrierScope = llvm::AMDGPU::Barrier::BARRIER_SCOPE_WORKGROUP;
      // 4 bits for alignment, 5 bits for the barrier num,
      // 3 bits for the barrier scope
      unsigned Offset = 0x802000u | BarrierScope << 9 | BarId << 4;
      recordLDSAbsoluteAddress(&M, GV, Offset);
    }
    OrderedGVs.clear();

    // The 2nd round: give a kernel-relative assignment for a GV that is
    // either only indirectly accessed by a single kernel or only directly
    // accessed by multiple kernels.
    std::vector<Function *> OrderedKernels;
    for (auto &K : LDSUsesInfo.direct_access) {
      Function *F = K.first;
      assert(isKernelLDS(F));
      OrderedKernels.push_back(F);
    }
    OrderedKernels = sortByName(std::move(OrderedKernels));

    llvm::DenseMap<Function *, uint32_t> Kernel2BarId;
    for (Function *F : OrderedKernels) {
      for (GlobalVariable *GV : LDSUsesInfo.direct_access[F]) {
        if (!isNamedBarrier(*GV))
          continue;

        LDSUsesInfo.direct_access[F].erase(GV);
        if (GV->isAbsoluteSymbolRef()) {
          // already assigned
          continue;
        }
        OrderedGVs.push_back(GV);
      }
      OrderedGVs = sortByName(std::move(OrderedGVs));
      for (GlobalVariable *GV : OrderedGVs) {
        // GV could also be used directly by other kernels. If so, we need to
        // create a new GV used only by this kernel and its function.
        auto NewGV = uniquifyGVPerKernel(M, GV, F);
        Changed |= (NewGV != GV);
        int BarId = (NumAbsolutes + 1);
        if (Kernel2BarId.find(F) != Kernel2BarId.end()) {
          BarId = (Kernel2BarId[F] + 1);
        }
        Kernel2BarId[F] = BarId;
        unsigned BarrierScope = llvm::AMDGPU::Barrier::BARRIER_SCOPE_WORKGROUP;
        unsigned Offset = 0x802000u | BarrierScope << 9 | BarId << 4;
        recordLDSAbsoluteAddress(&M, NewGV, Offset);
      }
      OrderedGVs.clear();
    }
    // Also erase those special LDS variables from indirect_access.
    for (auto &K : LDSUsesInfo.indirect_access) {
      assert(isKernelLDS(K.first));
      for (GlobalVariable *GV : K.second) {
        if (isNamedBarrier(*GV))
          K.second.erase(GV);
      }
    }
    return Changed;
  }
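  // Reading the encoding used above: bits [3:0] hold the alignment (zero
  // here), bits [8:4] the barrier number and bits [11:9] the barrier scope,
  // all OR'd into the 0x802000 base; e.g. barrier id 1 contributes
  // 1 << 4 = 0x10 to the recorded absolute address.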

  bool runOnModule(Module &M) {
    CallGraph CG = CallGraph(M);
    bool Changed = superAlignLDSGlobals(M);

    Changed |= eliminateConstantExprUsesOfLDSFromAllInstructions(M);

    Changed = true; // todo: narrow this down

    // For each kernel, what variables does it access directly or through
    // callees
    LDSUsesInfoTy LDSUsesInfo = getTransitiveUsesOfLDS(CG, M);

    // For each variable accessed through callees, which kernels access it
    VariableFunctionMap LDSToKernelsThatNeedToAccessItIndirectly;
    for (auto &K : LDSUsesInfo.indirect_access) {
      Function *F = K.first;
      assert(isKernelLDS(F));
      for (GlobalVariable *GV : K.second) {
        LDSToKernelsThatNeedToAccessItIndirectly[GV].insert(F);
      }
    }

    if (LDSUsesInfo.HasSpecialGVs) {
      // Special LDS variables need special address assignment
      Changed |= lowerSpecialLDSVariables(
          M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly);
    }

    // Partition variables accessed indirectly into the different strategies
    DenseSet<GlobalVariable *> ModuleScopeVariables;
    DenseSet<GlobalVariable *> TableLookupVariables;
    DenseSet<GlobalVariable *> KernelAccessVariables;
    DenseSet<GlobalVariable *> DynamicVariables;
    partitionVariablesIntoIndirectStrategies(
        M, LDSUsesInfo, LDSToKernelsThatNeedToAccessItIndirectly,
        ModuleScopeVariables, TableLookupVariables, KernelAccessVariables,
        DynamicVariables);

    // If the kernel accesses a variable that is going to be stored in the
    // module instance through a call then that kernel needs to allocate the
    // module instance
    const DenseSet<Function *> KernelsThatAllocateModuleLDS =
        kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
                                                        ModuleScopeVariables);
    const DenseSet<Function *> KernelsThatAllocateTableLDS =
        kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
                                                        TableLookupVariables);

    const DenseSet<Function *> KernelsThatIndirectlyAllocateDynamicLDS =
        kernelsThatIndirectlyAccessAnyOfPassedVariables(M, LDSUsesInfo,
                                                        DynamicVariables);

    GlobalVariable *MaybeModuleScopeStruct = lowerModuleScopeStructVariables(
        M, ModuleScopeVariables, KernelsThatAllocateModuleLDS);

    DenseMap<Function *, LDSVariableReplacement> KernelToReplacement =
        lowerKernelScopeStructVariables(M, LDSUsesInfo, ModuleScopeVariables,
                                        KernelsThatAllocateModuleLDS,
                                        MaybeModuleScopeStruct);

    // Lower zero cost accesses to the kernel instances just created
    for (auto &GV : KernelAccessVariables) {
      auto &funcs = LDSToKernelsThatNeedToAccessItIndirectly[GV];
      assert(funcs.size() == 1); // Only one kernel can access it
      LDSVariableReplacement Replacement =
          KernelToReplacement[*(funcs.begin())];

      DenseSet<GlobalVariable *> Vec;
      Vec.insert(GV);

      replaceLDSVariablesWithStruct(M, Vec, Replacement, [](Use &U) {
        return isa<Instruction>(U.getUser());
      });
    }

    // The ith element of this vector is kernel id i
    std::vector<Function *> OrderedKernels =
        assignLDSKernelIDToEachKernel(&M, KernelsThatAllocateTableLDS,
                                      KernelsThatIndirectlyAllocateDynamicLDS);

    if (!KernelsThatAllocateTableLDS.empty()) {
      LLVMContext &Ctx = M.getContext();
      IRBuilder<> Builder(Ctx);

      // The order must be consistent between lookup table and accesses to
      // lookup table
      auto TableLookupVariablesOrdered = sortByName(std::vector<GlobalVariable *>(
          TableLookupVariables.begin(), TableLookupVariables.end()));

      GlobalVariable *LookupTable = buildLookupTable(
          M, TableLookupVariablesOrdered, OrderedKernels, KernelToReplacement);
      replaceUsesInInstructionsWithTableLookup(M, TableLookupVariablesOrdered,
                                               LookupTable);
    }

    DenseMap<Function *, GlobalVariable *> KernelToCreatedDynamicLDS =
        lowerDynamicLDSVariables(M, LDSUsesInfo,
                                 KernelsThatIndirectlyAllocateDynamicLDS,
                                 DynamicVariables, OrderedKernels);

    // Strip amdgpu-no-lds-kernel-id from all functions reachable from the
    // kernel. We may have inferred this wasn't used prior to the pass.
    // TODO: We could filter out subgraphs that do not access LDS globals.
    for (auto *KernelSet : {&KernelsThatIndirectlyAllocateDynamicLDS,
                            &KernelsThatAllocateTableLDS})
      for (Function *F : *KernelSet)
        removeFnAttrFromReachable(CG, F, {"amdgpu-no-lds-kernel-id"});

    // All kernel frames have been allocated. Calculate and record the
    // addresses.
    {
      const DataLayout &DL = M.getDataLayout();

      for (Function &Func : M.functions()) {
        if (Func.isDeclaration() || !isKernelLDS(&Func))
          continue;

        // All three of these are optional. The first variable is allocated at
        // zero. They are allocated by AMDGPUMachineFunction as one block.
        // Layout:
        // {
        //   module.lds
        //   alignment padding
        //   kernel instance
        //   alignment padding
        //   dynamic lds variables
        // }

        const bool AllocateModuleScopeStruct =
            MaybeModuleScopeStruct &&
            KernelsThatAllocateModuleLDS.contains(&Func);

        auto Replacement = KernelToReplacement.find(&Func);
        const bool AllocateKernelScopeStruct =
            Replacement != KernelToReplacement.end();

        const bool AllocateDynamicVariable =
            KernelToCreatedDynamicLDS.contains(&Func);

        uint32_t Offset = 0;

        if (AllocateModuleScopeStruct) {
          // Allocated at zero, recorded once on construction, not once per
          // kernel
          Offset += DL.getTypeAllocSize(MaybeModuleScopeStruct->getValueType());
        }

        if (AllocateKernelScopeStruct) {
          GlobalVariable *KernelStruct = Replacement->second.SGV;
          Offset = alignTo(Offset, AMDGPU::getAlign(DL, KernelStruct));
          recordLDSAbsoluteAddress(&M, KernelStruct, Offset);
          Offset += DL.getTypeAllocSize(KernelStruct->getValueType());
        }

        // If there is dynamic allocation, the alignment needed is included in
        // the static frame size. There may be no reference to the dynamic
        // variable in the kernel itself, so without including it here, that
        // alignment padding could be missed.
        if (AllocateDynamicVariable) {
          GlobalVariable *DynamicVariable = KernelToCreatedDynamicLDS[&Func];
          Offset = alignTo(Offset, AMDGPU::getAlign(DL, DynamicVariable));
          recordLDSAbsoluteAddress(&M, DynamicVariable, Offset);
        }

        if (Offset != 0) {
          (void)TM; // TODO: Account for target maximum LDS
          std::string Buffer;
          raw_string_ostream SS{Buffer};
          SS << format("%u", Offset);

          // Instead of explicitly marking kernels that access dynamic variables
          // using special case metadata, annotate with min-lds == max-lds, i.e.
          // that there is no more space available for allocating more static
          // LDS variables. That is the right condition to prevent allocating
          // more variables which would collide with the addresses assigned to
          // dynamic variables.
          if (AllocateDynamicVariable)
            SS << format(",%u", Offset);

          Func.addFnAttr("amdgpu-lds-size", Buffer);
        }
      }
    }

    for (auto &GV : make_early_inc_range(M.globals()))
      if (AMDGPU::isLDSVariableToLower(GV)) {
        // probably want to remove from used lists
        GV.removeDeadConstantUsers();
        if (GV.use_empty())
          GV.eraseFromParent();
      }

    return Changed;
  }

private:
  // Increase the alignment of LDS globals if necessary to maximise the chance
  // that we can use aligned LDS instructions to access them.
  static bool superAlignLDSGlobals(Module &M) {
    const DataLayout &DL = M.getDataLayout();
    bool Changed = false;
    if (!SuperAlignLDSGlobals) {
      return Changed;
    }

    for (auto &GV : M.globals()) {
      if (GV.getType()->getPointerAddressSpace() != AMDGPUAS::LOCAL_ADDRESS) {
        // Only changing alignment of LDS variables
        continue;
      }
      if (!GV.hasInitializer()) {
        // cuda/hip extern __shared__ variable, leave alignment alone
        continue;
      }

      if (GV.isAbsoluteSymbolRef()) {
        // If the variable is already allocated, don't change the alignment
        continue;
      }

      Align Alignment = AMDGPU::getAlign(DL, &GV);
      TypeSize GVSize = DL.getTypeAllocSize(GV.getValueType());

      if (GVSize > 8) {
        // We might want to use a b96 or b128 load/store
        Alignment = std::max(Alignment, Align(16));
      } else if (GVSize > 4) {
        // We might want to use a b64 load/store
        Alignment = std::max(Alignment, Align(8));
      } else if (GVSize > 2) {
        // We might want to use a b32 load/store
        Alignment = std::max(Alignment, Align(4));
      } else if (GVSize > 1) {
        // We might want to use a b16 load/store
        Alignment = std::max(Alignment, Align(2));
      }

      if (Alignment != AMDGPU::getAlign(DL, &GV)) {
        Changed = true;
        GV.setAlignment(Alignment);
      }
    }
    return Changed;
  }

  static LDSVariableReplacement createLDSVariableReplacement(
      Module &M, std::string VarName,
      DenseSet<GlobalVariable *> const &LDSVarsToTransform) {
    // Create a struct instance containing LDSVarsToTransform and map from those
    // variables to ConstantExprGEP
    // Variables may be introduced to meet alignment requirements. No aliasing
    // metadata is useful for these as they have no uses. Erased before return.

    LLVMContext &Ctx = M.getContext();
    const DataLayout &DL = M.getDataLayout();
    assert(!LDSVarsToTransform.empty());

    SmallVector<OptimizedStructLayoutField, 8> LayoutFields;
    LayoutFields.reserve(LDSVarsToTransform.size());
    {
      // The order of fields in this struct depends on the order of
      // variables in the argument which varies when changing how they
      // are identified, leading to spurious test breakage.
      auto Sorted = sortByName(std::vector<GlobalVariable *>(
          LDSVarsToTransform.begin(), LDSVarsToTransform.end()));

      for (GlobalVariable *GV : Sorted) {
        OptimizedStructLayoutField F(GV,
                                     DL.getTypeAllocSize(GV->getValueType()),
                                     AMDGPU::getAlign(DL, GV));
        LayoutFields.emplace_back(F);
      }
    }

    performOptimizedStructLayout(LayoutFields);

    std::vector<GlobalVariable *> LocalVars;
    BitVector IsPaddingField;
    LocalVars.reserve(LDSVarsToTransform.size()); // will be at least this large
    IsPaddingField.reserve(LDSVarsToTransform.size());
    {
      uint64_t CurrentOffset = 0;
      for (auto &F : LayoutFields) {
        GlobalVariable *FGV =
            static_cast<GlobalVariable *>(const_cast<void *>(F.Id));
        Align DataAlign = F.Alignment;

        uint64_t DataAlignV = DataAlign.value();
        if (uint64_t Rem = CurrentOffset % DataAlignV) {
          uint64_t Padding = DataAlignV - Rem;

          // Append an array of padding bytes to meet alignment requested
          // Note (o + (a - (o % a)) ) % a == 0
          //      (offset + Padding  ) % align == 0

          Type *ATy = ArrayType::get(Type::getInt8Ty(Ctx), Padding);
          LocalVars.push_back(new GlobalVariable(
              M, ATy, false, GlobalValue::InternalLinkage,
              PoisonValue::get(ATy), "", nullptr, GlobalValue::NotThreadLocal,
              AMDGPUAS::LOCAL_ADDRESS, false));
          IsPaddingField.push_back(true);
          CurrentOffset += Padding;
        }

        LocalVars.push_back(FGV);
        IsPaddingField.push_back(false);
        CurrentOffset += F.Size;
      }
    }

    std::vector<Type *> LocalVarTypes;
    LocalVarTypes.reserve(LocalVars.size());
    std::transform(
        LocalVars.cbegin(), LocalVars.cend(), std::back_inserter(LocalVarTypes),
        [](const GlobalVariable *V) -> Type * { return V->getValueType(); });

    StructType *LDSTy = StructType::create(Ctx, LocalVarTypes, VarName + ".t");

    Align StructAlign = AMDGPU::getAlign(DL, LocalVars[0]);

    GlobalVariable *SGV = new GlobalVariable(
        M, LDSTy, false, GlobalValue::InternalLinkage, PoisonValue::get(LDSTy),
        VarName, nullptr, GlobalValue::NotThreadLocal, AMDGPUAS::LOCAL_ADDRESS,
        false);
    SGV->setAlignment(StructAlign);

    DenseMap<GlobalVariable *, Constant *> Map;
    Type *I32 = Type::getInt32Ty(Ctx);
    for (size_t I = 0; I < LocalVars.size(); I++) {
      GlobalVariable *GV = LocalVars[I];
      Constant *GEPIdx[] = {ConstantInt::get(I32, 0), ConstantInt::get(I32, I)};
      Constant *GEP = ConstantExpr::getGetElementPtr(LDSTy, SGV, GEPIdx, true);
      if (IsPaddingField[I]) {
        assert(GV->use_empty());
        GV->eraseFromParent();
      } else {
        Map[GV] = GEP;
      }
    }
    assert(Map.size() == LDSVarsToTransform.size());
    return {SGV, std::move(Map)};
  }
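  // As a sketch with invented variables, packing an i64 @a and an i8 @b for
  // kernel @k can produce:
  //   %llvm.amdgcn.kernel.k.lds.t = type { i64, i8 }
  //   @llvm.amdgcn.kernel.k.lds = internal addrspace(3) global
  //       %llvm.amdgcn.kernel.k.lds.t poison, align 8
  // with @a mapped to the constantexpr gep of field 0 and @b to field 1.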

  template <typename PredicateTy>
  static void replaceLDSVariablesWithStruct(
      Module &M, DenseSet<GlobalVariable *> const &LDSVarsToTransformArg,
      const LDSVariableReplacement &Replacement, PredicateTy Predicate) {
    LLVMContext &Ctx = M.getContext();
    const DataLayout &DL = M.getDataLayout();

    // A hack... we need to insert the aliasing info in a predictable order for
    // lit tests. Would like to have them in a stable order already, ideally the
    // same order they get allocated, which might mean an ordered set container
    auto LDSVarsToTransform = sortByName(std::vector<GlobalVariable *>(
        LDSVarsToTransformArg.begin(), LDSVarsToTransformArg.end()));

    // Create alias.scope and their lists. Each field in the new structure
    // does not alias with all other fields.
    SmallVector<MDNode *> AliasScopes;
    SmallVector<Metadata *> NoAliasList;
    const size_t NumberVars = LDSVarsToTransform.size();
    if (NumberVars > 1) {
      MDBuilder MDB(Ctx);
      AliasScopes.reserve(NumberVars);
      MDNode *Domain = MDB.createAnonymousAliasScopeDomain();
      for (size_t I = 0; I < NumberVars; I++) {
        MDNode *Scope = MDB.createAnonymousAliasScope(Domain);
        AliasScopes.push_back(Scope);
      }
      NoAliasList.append(&AliasScopes[1], AliasScopes.end());
    }

    // Replace uses of ith variable with a constantexpr to the corresponding
    // field of the instance that will be allocated by AMDGPUMachineFunction
    for (size_t I = 0; I < NumberVars; I++) {
      GlobalVariable *GV = LDSVarsToTransform[I];
      Constant *GEP = Replacement.LDSVarsToConstantGEP.at(GV);

      GV->replaceUsesWithIf(GEP, Predicate);

      APInt APOff(DL.getIndexTypeSizeInBits(GEP->getType()), 0);
      GEP->stripAndAccumulateInBoundsConstantOffsets(DL, APOff);
      uint64_t Offset = APOff.getZExtValue();

      Align A =
          commonAlignment(Replacement.SGV->getAlign().valueOrOne(), Offset);

      if (I)
        NoAliasList[I - 1] = AliasScopes[I - 1];
      MDNode *NoAlias =
          NoAliasList.empty() ? nullptr : MDNode::get(Ctx, NoAliasList);
      MDNode *AliasScope =
          AliasScopes.empty() ? nullptr : MDNode::get(Ctx, {AliasScopes[I]});

      refineUsesAlignmentAndAA(GEP, A, DL, AliasScope, NoAlias);
    }
  }
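  // The effect is that an access through the ith former variable carries
  // !alias.scope naming the scope for field i and !noalias listing the
  // scopes of the other fields, letting later passes reorder accesses to
  // distinct former variables.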
1438
  static void refineUsesAlignmentAndAA(Value *Ptr, Align A,
                                       const DataLayout &DL, MDNode *AliasScope,
                                       MDNode *NoAlias, unsigned MaxDepth = 5) {
    if (!MaxDepth || (A == 1 && !AliasScope))
      return;

    for (User *U : Ptr->users()) {
      if (auto *I = dyn_cast<Instruction>(U)) {
        if (AliasScope && I->mayReadOrWriteMemory()) {
          MDNode *AS = I->getMetadata(LLVMContext::MD_alias_scope);
          AS = (AS ? MDNode::getMostGenericAliasScope(AS, AliasScope)
                   : AliasScope);
          I->setMetadata(LLVMContext::MD_alias_scope, AS);

          MDNode *NA = I->getMetadata(LLVMContext::MD_noalias);
          NA = (NA ? MDNode::intersect(NA, NoAlias) : NoAlias);
          I->setMetadata(LLVMContext::MD_noalias, NA);
        }
      }

      if (auto *LI = dyn_cast<LoadInst>(U)) {
        LI->setAlignment(std::max(A, LI->getAlign()));
        continue;
      }
      if (auto *SI = dyn_cast<StoreInst>(U)) {
        if (SI->getPointerOperand() == Ptr)
          SI->setAlignment(std::max(A, SI->getAlign()));
        continue;
      }
      if (auto *AI = dyn_cast<AtomicRMWInst>(U)) {
        // No atomicrmw operation works on pointers today, but check anyway
        // in case that changes or we are processing a ConstantExpr.
        if (AI->getPointerOperand() == Ptr)
          AI->setAlignment(std::max(A, AI->getAlign()));
        continue;
      }
      if (auto *AI = dyn_cast<AtomicCmpXchgInst>(U)) {
        if (AI->getPointerOperand() == Ptr)
          AI->setAlignment(std::max(A, AI->getAlign()));
        continue;
      }
      if (auto *GEP = dyn_cast<GetElementPtrInst>(U)) {
        unsigned BitWidth = DL.getIndexTypeSizeInBits(GEP->getType());
        APInt Off(BitWidth, 0);
        if (GEP->getPointerOperand() == Ptr) {
          Align GA;
          if (GEP->accumulateConstantOffset(DL, Off))
            GA = commonAlignment(A, Off.getLimitedValue());
          refineUsesAlignmentAndAA(GEP, GA, DL, AliasScope, NoAlias,
                                   MaxDepth - 1);
        }
        continue;
      }
      if (auto *I = dyn_cast<Instruction>(U)) {
        if (I->getOpcode() == Instruction::BitCast ||
            I->getOpcode() == Instruction::AddrSpaceCast)
          refineUsesAlignmentAndAA(I, A, DL, AliasScope, NoAlias, MaxDepth - 1);
      }
    }
  }
};
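The alignment refinement above leans on commonAlignment: a pointer known to be Offset bytes past a base of alignment A is aligned to the largest power of two dividing both the base alignment and the offset. A minimal sketch of that arithmetic, using only llvm/Support/Alignment.h:

#include "llvm/Support/Alignment.h"
#include <cassert>

using namespace llvm;

int main() {
  Align Base(16); // alignment of the replacement struct
  assert(commonAlignment(Base, 0) == Align(16));  // field at offset 0
  assert(commonAlignment(Base, 4) == Align(4));   // small offset caps the result
  assert(commonAlignment(Base, 32) == Align(16)); // multiples of 16 keep it
  return 0;
}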
class AMDGPULowerModuleLDSLegacy : public ModulePass {
public:
  const AMDGPUTargetMachine *TM;
  static char ID;

  AMDGPULowerModuleLDSLegacy(const AMDGPUTargetMachine *TM_ = nullptr)
      : ModulePass(ID), TM(TM_) {
    initializeAMDGPULowerModuleLDSLegacyPass(*PassRegistry::getPassRegistry());
  }

  void getAnalysisUsage(AnalysisUsage &AU) const override {
    if (!TM)
      AU.addRequired<TargetPassConfig>();
  }

  bool runOnModule(Module &M) override {
    if (!TM) {
      auto &TPC = getAnalysis<TargetPassConfig>();
      TM = &TPC.getTM<AMDGPUTargetMachine>();
    }

    return AMDGPULowerModuleLDS(*TM).runOnModule(M);
  }
};

} // namespace
char AMDGPULowerModuleLDSLegacy::ID = 0;

char &llvm::AMDGPULowerModuleLDSLegacyPassID = AMDGPULowerModuleLDSLegacy::ID;

INITIALIZE_PASS_BEGIN(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
                      "Lower uses of LDS variables from non-kernel functions",
                      false, false)
INITIALIZE_PASS_DEPENDENCY(TargetPassConfig)
INITIALIZE_PASS_END(AMDGPULowerModuleLDSLegacy, DEBUG_TYPE,
                    "Lower uses of LDS variables from non-kernel functions",
                    false, false)
ModulePass *
llvm::createAMDGPULowerModuleLDSLegacyPass(const AMDGPUTargetMachine *TM) {
  return new AMDGPULowerModuleLDSLegacy(TM);
}
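As a usage sketch (hypothetical helper, not part of this file): the factory slots into a legacy pass pipeline as below. It assumes the caller already owns an initialized AMDGPUTargetMachine, which requires the AMDGPU target to be registered, so it is written as a helper rather than a complete program.

#include "AMDGPU.h"              // createAMDGPULowerModuleLDSLegacyPass
#include "AMDGPUTargetMachine.h"
#include "llvm/IR/LegacyPassManager.h"
#include "llvm/IR/Module.h"

// Returns true if the pass modified the module.
bool lowerModuleLDS(llvm::Module &M, const llvm::AMDGPUTargetMachine &TM) {
  llvm::legacy::PassManager PM;
  PM.add(llvm::createAMDGPULowerModuleLDSLegacyPass(&TM));
  return PM.run(M);
}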
PreservedAnalyses AMDGPULowerModuleLDSPass::run(Module &M,
                                                ModuleAnalysisManager &) {
  return AMDGPULowerModuleLDS(TM).runOnModule(M) ? PreservedAnalyses::none()
                                                 : PreservedAnalyses::all();
}
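The new pass-manager equivalent, under the same assumption about the target machine; an empty analysis manager suffices here only because run ignores its second argument, as the definition above shows.

#include "AMDGPU.h" // AMDGPULowerModuleLDSPass
#include "AMDGPUTargetMachine.h"
#include "llvm/IR/Module.h"
#include "llvm/IR/PassManager.h"

llvm::PreservedAnalyses lowerModuleLDS(llvm::Module &M,
                                       const llvm::AMDGPUTargetMachine &TM) {
  llvm::ModuleAnalysisManager MAM; // unused by this pass
  return llvm::AMDGPULowerModuleLDSPass(TM).run(M, MAM);
}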