1//===- LoadStoreVectorizer.cpp - GPU Load & Store Vectorizer --------------===// 3// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions. 4// See https://llvm.org/LICENSE.txt for license information. 5// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception 7//===----------------------------------------------------------------------===// 9// This pass merges loads/stores to/from sequential memory addresses into vector 10// loads/stores. Although there's nothing GPU-specific in here, this pass is 11// motivated by the microarchitectural quirks of nVidia and AMD GPUs. 13// (For simplicity below we talk about loads only, but everything also applies 16// This pass is intended to be run late in the pipeline, after other 17// vectorization opportunities have been exploited. So the assumption here is 18// that immediately following our new vector load we'll need to extract out the 19// individual elements of the load, so we can operate on them individually. 21// On CPUs this transformation is usually not beneficial, because extracting the 22// elements of a vector register is expensive on most architectures. It's 23// usually better just to load each element individually into its own scalar 26// However, nVidia and AMD GPUs don't have proper vector registers. Instead, a 27// "vector load" loads directly into a series of scalar registers. In effect, 28// extracting the elements of the vector is free. It's therefore always 29// beneficial to vectorize a sequence of loads on these architectures. 31// Vectorizing (perhaps a better name might be "coalescing") loads can have 32// large performance impacts on GPU kernels, and opportunities for vectorizing 33// are common in GPU code. This pass tries very hard to find such 34// opportunities; its runtime is quadratic in the number of loads in a BB. 36// Some CPU architectures, such as ARM, have instructions that load into 37// multiple scalar registers, similar to a GPU vectorized load. In theory ARM 38// could use this pass (with some modifications), but currently it implements 39// its own pass to do something similar to what we do here. 41// Overview of the algorithm and terminology in this pass: 43// - Break up each basic block into pseudo-BBs, composed of instructions which 44// are guaranteed to transfer control to their successors. 45// - Within a single pseudo-BB, find all loads, and group them into 46// "equivalence classes" according to getUnderlyingObject() and loaded 47// element size. Do the same for stores. 48// - For each equivalence class, greedily build "chains". Each chain has a 49// leader instruction, and every other member of the chain has a known 50// constant offset from the first instr in the chain. 51// - Break up chains so that they contain only contiguous accesses of legal 52// size with no intervening may-alias instrs. 53// - Convert each chain to vector instructions. 55// The O(n^2) behavior of this pass comes from initially building the chains. 56// In the worst case we have to compare each new instruction to all of those 57// that came before. To limit this, we only calculate the offset to the leaders 58// of the N most recently-used chains. 114#include <type_traits> 120#define DEBUG_TYPE "load-store-vectorizer" 122STATISTIC(NumVectorInstructions,
"Number of vector accesses generated");
123STATISTIC(NumScalarsVectorized,
"Number of scalar accesses vectorized");
127// Equivalence class key, the initial tuple by which we group loads/stores. 128// Loads/stores with different EqClassKeys are never merged. 130// (We could in theory remove element-size from the this tuple. We'd just need 131// to fix up the vector packing/unpacking code.) 133 std::tuple<
constValue *
/* result of getUnderlyingObject() */,
134unsigned/* AddrSpace */,
135unsigned/* Load/Store element size bits */,
136char/* IsLoad; char b/c bool can't be a DenseMap key */ 139const EqClassKey &K) {
142 <<
" of element size " << ElementSize <<
" bits in addrspace " 146// A Chain is a set of instructions such that: 147// - All instructions have the same equivalence class, so in particular all are 148// loads, or all are stores. 149// - We know the address accessed by the i'th chain elem relative to the 150// chain's leader instruction, which is the first instr of the chain in BB 153// Chains have two canonical orderings: 154// - BB order, sorted by Instr->comesBefore. 155// - Offset order, sorted by OffsetFromLeader. 156// This pass switches back and forth between these orders. 159APInt OffsetFromLeader;
161 : Inst(
std::
move(Inst)), OffsetFromLeader(
std::
move(OffsetFromLeader)) {}
165void sortChainInBBOrder(Chain &
C) {
166sort(
C, [](
auto &
A,
auto &
B) {
returnA.Inst->comesBefore(
B.Inst); });
169void sortChainInOffsetOrder(Chain &
C) {
170sort(
C, [](
constauto &
A,
constauto &
B) {
171if (
A.OffsetFromLeader !=
B.OffsetFromLeader)
172returnA.OffsetFromLeader.slt(
B.OffsetFromLeader);
173returnA.Inst->comesBefore(
B.Inst);
// stable tiebreaker 178for (
constauto &E :
C) {
179dbgs() <<
" " << *E.Inst <<
" (offset " << E.OffsetFromLeader <<
")\n";
183usingEquivalenceClassMap =
186// FIXME: Assuming stack alignment of 4 is always good enough 187constexprunsigned StackAdjustedAlignment = 4;
191for (
const ChainElem &E :
C)
197constLoadInst *LI = dyn_cast<LoadInst>(
I);
198return LI !=
nullptr && LI->
hasMetadata(LLVMContext::MD_invariant_load);
201/// Reorders the instructions that I depends on (the instructions defining its 202/// operands), to ensure they dominate I. 208while (!Worklist.
empty()) {
211for (
intIdx = 0;
Idx < NumOperands;
Idx++) {
213if (!IM || IM->
getOpcode() == Instruction::PHI)
216// If IM is in another BB, no need to move it, because this pass only 217// vectorizes instructions within one BB. 221assert(IM !=
I &&
"Unexpected cycle while re-ordering instructions");
224 InstructionsToMove.
insert(IM);
230// All instructions to move should follow I. Start from I, not from begin(). 231for (
auto BBI =
I->getIterator(), E =
I->getParent()->end(); BBI != E;) {
233if (!InstructionsToMove.
contains(IM))
249// We could erase instrs right after vectorizing them, but that can mess up 250// our BB iterators, and also can make the equivalence class keys point to 251// freed memory. This is fixable, but it's simpler just to wait until we're 252// done with the BB and erase all at once. 258 :
F(
F), AA(AA), AC(AC), DT(DT), SE(SE),
TTI(
TTI),
259DL(
F.getDataLayout()), Builder(SE.getContext()) {}
264staticconstunsigned MaxDepth = 3;
266 /// Runs the vectorizer on a "pseudo basic block", which is a range of 267 /// instructions [Begin, End) within one BB all of which have 268 /// isGuaranteedToTransferExecutionToSuccessor(I) == true. 271 /// Runs the vectorizer on one equivalence class, i.e. one set of loads/stores 272 /// in the same BB with the same value for getUnderlyingObject() etc. 273bool runOnEquivalenceClass(
const EqClassKey &EqClassKey,
276 /// Runs the vectorizer on one chain, i.e. a subset of an equivalence class 277 /// where all instructions access a known, constant offset from the first 279bool runOnChain(Chain &
C);
281 /// Splits the chain into subchains of instructions which read/write a 282 /// contiguous block of memory. Discards any length-1 subchains (because 283 /// there's nothing to vectorize in there). 284 std::vector<Chain> splitChainByContiguity(Chain &
C);
286 /// Splits the chain into subchains where it's safe to hoist loads up to the 287 /// beginning of the sub-chain and it's safe to sink loads up to the end of 288 /// the sub-chain. Discards any length-1 subchains. 289 std::vector<Chain> splitChainByMayAliasInstrs(Chain &
C);
291 /// Splits the chain into subchains that make legal, aligned accesses. 292 /// Discards any length-1 subchains. 293 std::vector<Chain> splitChainByAlignment(Chain &
C);
295 /// Converts the instrs in the chain into a single vectorized load or store. 296 /// Adds the old scalar loads/stores to ToErase. 297bool vectorizeChain(Chain &
C);
299 /// Tries to compute the offset in bytes PtrB - PtrA. 300 std::optional<APInt> getConstantOffset(
Value *PtrA,
Value *PtrB,
303 std::optional<APInt> getConstantOffsetComplexAddrs(
Value *PtrA,
Value *PtrB,
306 std::optional<APInt> getConstantOffsetSelects(
Value *PtrA,
Value *PtrB,
310 /// Gets the element type of the vector that the chain will load or store. 311 /// This is nontrivial because the chain may contain elements of different 312 /// types; e.g. it's legal to have a chain that contains both i32 and float. 313Type *getChainElemTy(
const Chain &
C);
315 /// Determines whether ChainElem can be moved up (if IsLoad) or down (if 316 /// !IsLoad) to ChainBegin -- i.e. there are no intervening may-alias 319 /// The map ChainElemOffsets must contain all of the elements in 320 /// [ChainBegin, ChainElem] and their offsets from some arbitrary base 321 /// address. It's ok if it contains additional entries. 322template <
bool IsLoadChain>
327 /// Merges the equivalence classes if they have underlying objects that differ 328 /// by one level of indirection (i.e., one is a getelementptr and the other is 329 /// the base pointer in that getelementptr). 330void mergeEquivalenceClasses(EquivalenceClassMap &EQClasses)
const;
332 /// Collects loads and stores grouped by "equivalence class", where: 333 /// - all elements in an eq class are a load or all are a store, 334 /// - they all load/store the same element size (it's OK to have e.g. i8 and 335 /// <4 x i8> in the same class, but not i32 and <4 x i8>), and 336 /// - they all have the same value for getUnderlyingObject(). 340 /// Partitions Instrs into "chains" where every instruction has a known 341 /// constant offset from the first instr in the chain. 343 /// Postcondition: For all i, ret[i][0].second == 0, because the first instr 344 /// in the chain is the leader, and an instr touches distance 0 from itself. 360return"GPU Load and Store Vectorizer";
373}
// end anonymous namespace 375char LoadStoreVectorizerLegacyPass::ID = 0;
378"Vectorize load and Store instructions",
false,
false)
389returnnew LoadStoreVectorizerLegacyPass();
392bool LoadStoreVectorizerLegacyPass::runOnFunction(
Function &
F) {
393// Don't vectorize when the attribute NoImplicitFloat is used. 394if (skipFunction(
F) ||
F.hasFnAttribute(Attribute::NoImplicitFloat))
397AliasAnalysis &AA = getAnalysis<AAResultsWrapperPass>().getAAResults();
398DominatorTree &DT = getAnalysis<DominatorTreeWrapperPass>().getDomTree();
399ScalarEvolution &SE = getAnalysis<ScalarEvolutionWrapperPass>().getSE();
401 getAnalysis<TargetTransformInfoWrapperPass>().getTTI(
F);
404 getAnalysis<AssumptionCacheTracker>().getAssumptionCache(
F);
406return Vectorizer(
F, AA, AC, DT, SE,
TTI).run();
411// Don't vectorize when the attribute NoImplicitFloat is used. 412if (
F.hasFnAttribute(Attribute::NoImplicitFloat))
421bool Changed = Vectorizer(
F, AA, AC, DT, SE,
TTI).
run();
427bool Vectorizer::run() {
429// Break up the BB if there are any instrs which aren't guaranteed to transfer 430// execution to their successor. 432// Consider, for example: 434// def assert_arr_len(int n) { if (n < 2) exit(); } 437// call assert_array_len(arr.length) 440// Even though assert_arr_len does not read or write any memory, we can't 441// speculate the second load before the call. More info at 442// https://github.com/llvm/llvm-project/issues/52950. 444// BB must at least have a terminator. 454for (
auto It = Barriers.
begin(),
End = std::prev(Barriers.
end()); It !=
End;
456 Changed |= runOnPseudoBB(*It, *std::next(It));
473dbgs() <<
"LSV: Running on pseudo-BB [" << *Begin <<
" ... ";
474if (
End != Begin->getParent()->end())
482for (
constauto &[EqClassKey, EqClass] :
483 collectEquivalenceClasses(Begin,
End))
484 Changed |= runOnEquivalenceClass(EqClassKey, EqClass);
489bool Vectorizer::runOnEquivalenceClass(
const EqClassKey &EqClassKey,
494dbgs() <<
"LSV: Running on equivalence class of size " << EqClass.
size()
495 <<
" keyed on " << EqClassKey <<
":\n";
497dbgs() <<
" " << *
I <<
"\n";
500 std::vector<Chain> Chains = gatherChains(EqClass);
502 <<
" nontrivial chains.\n";);
503for (Chain &
C : Chains)
504 Changed |= runOnChain(
C);
508bool Vectorizer::runOnChain(Chain &
C) {
510dbgs() <<
"LSV: Running on chain with " <<
C.size() <<
" instructions:\n";
514// Split up the chain into increasingly smaller chains, until we can finally 515// vectorize the chains. 517// (Don't be scared by the depth of the loop nest here. These operations are 518// all at worst O(n lg n) in the number of instructions, and splitting chains 519// doesn't change the number of instrs. So the whole loop nest is O(n lg n).) 521for (
auto &
C : splitChainByMayAliasInstrs(
C))
522for (
auto &
C : splitChainByContiguity(
C))
523for (
auto &
C : splitChainByAlignment(
C))
524 Changed |= vectorizeChain(
C);
528std::vector<Chain> Vectorizer::splitChainByMayAliasInstrs(Chain &
C) {
532 sortChainInBBOrder(
C);
535dbgs() <<
"LSV: splitChainByMayAliasInstrs considering chain:\n";
539// We know that elements in the chain with nonverlapping offsets can't 540// alias, but AA may not be smart enough to figure this out. Use a 544 ChainOffsets.insert({&*E.Inst, E.OffsetFromLeader});
546// Loads get hoisted up to the first load in the chain. Stores get sunk 547// down to the last store in the chain. Our algorithm for loads is: 549// - Take the first element of the chain. This is the start of a new chain. 550// - Take the next element of `Chain` and check for may-alias instructions 551// up to the start of NewChain. If no may-alias instrs, add it to 552// NewChain. Otherwise, start a new NewChain. 554// For stores it's the same except in the reverse direction. 556// We expect IsLoad to be an std::bool_constant. 557auto Impl = [&](
auto IsLoad) {
558// MSVC is unhappy if IsLoad is a capture, so pass it as an arg. 559auto [ChainBegin, ChainEnd] = [&](
auto IsLoad) {
560ifconstexpr (IsLoad())
561return std::make_pair(
C.begin(),
C.end());
563return std::make_pair(
C.rbegin(),
C.rend());
565assert(ChainBegin != ChainEnd);
567 std::vector<Chain> Chains;
570for (
auto ChainIt = std::next(ChainBegin); ChainIt != ChainEnd; ++ChainIt) {
571if (isSafeToMove<IsLoad>(ChainIt->Inst, NewChain.
front().Inst,
573LLVM_DEBUG(
dbgs() <<
"LSV: No intervening may-alias instrs; can merge " 574 << *ChainIt->Inst <<
" into " << *ChainBegin->Inst
579dbgs() <<
"LSV: Found intervening may-alias instrs; cannot merge " 580 << *ChainIt->Inst <<
" into " << *ChainBegin->Inst <<
"\n");
581if (NewChain.
size() > 1) {
583dbgs() <<
"LSV: got nontrivial chain without aliasing instrs:\n";
586 Chains.emplace_back(std::move(NewChain));
593if (NewChain.
size() > 1) {
595dbgs() <<
"LSV: got nontrivial chain without aliasing instrs:\n";
598 Chains.emplace_back(std::move(NewChain));
603if (isa<LoadInst>(
C[0].Inst))
604return Impl(
/*IsLoad=*/std::bool_constant<true>());
606assert(isa<StoreInst>(
C[0].Inst));
607return Impl(
/*IsLoad=*/std::bool_constant<false>());
610std::vector<Chain> Vectorizer::splitChainByContiguity(Chain &
C) {
614 sortChainInOffsetOrder(
C);
617dbgs() <<
"LSV: splitChainByContiguity considering chain:\n";
621 std::vector<Chain>
Ret;
622Ret.push_back({
C.front()});
624for (
auto It = std::next(
C.begin()),
End =
C.end(); It !=
End; ++It) {
625// `prev` accesses offsets [PrevDistFromBase, PrevReadEnd). 626auto &CurChain =
Ret.back();
627const ChainElem &Prev = CurChain.back();
629assert(SzBits % 8 == 0 &&
"Non-byte sizes should have been filtered out by " 630"collectEquivalenceClass");
631APInt PrevReadEnd = Prev.OffsetFromLeader + SzBits / 8;
633// Add this instruction to the end of the current chain, or start a new one. 634bool AreContiguous = It->OffsetFromLeader == PrevReadEnd;
636 << (AreContiguous ?
"" :
"not ") <<
"contiguous: " 637 << *Prev.Inst <<
" (ends at offset " << PrevReadEnd
638 <<
") -> " << *It->Inst <<
" (starts at offset " 639 << It->OffsetFromLeader <<
")\n");
641 CurChain.push_back(*It);
646// Filter out length-1 chains, these are uninteresting. 647llvm::erase_if(Ret, [](
constauto &Chain) {
return Chain.size() <= 1; });
651Type *Vectorizer::getChainElemTy(
const Chain &
C) {
654// - If there are any pointer types in the chain, use an integer type. 655// - Prefer an integer type if it appears in the chain. 656// - Otherwise, use the first type in the chain. 658// The rule about pointer types is a simplification when we merge e.g. a load 659// of a ptr and a double. There's no direct conversion from a ptr to a 660// double; it requires a ptrtoint followed by a bitcast. 662// It's unclear to me if the other rules have any practical effect, but we do 663// it to match this pass's previous behavior. 664if (
any_of(
C, [](
const ChainElem &E) {
672for (
const ChainElem &E :
C)
678std::vector<Chain> Vectorizer::splitChainByAlignment(Chain &
C) {
679// We use a simple greedy algorithm. 680// - Given a chain of length N, find all prefixes that 681// (a) are not longer than the max register length, and 682// (b) are a power of 2. 683// - Starting from the longest prefix, try to create a vector of that length. 684// - If one of them works, great. Repeat the algorithm on any remaining 685// elements in the chain. 686// - If none of them work, discard the first element and repeat on a chain 691 sortChainInOffsetOrder(
C);
694dbgs() <<
"LSV: splitChainByAlignment considering chain:\n";
698bool IsLoadChain = isa<LoadInst>(
C[0].Inst);
699auto GetVectorFactor = [&](
unsigned VF,
unsigned LoadStoreSize,
702 ChainSizeBytes, VecTy)
704 ChainSizeBytes, VecTy);
708for (
constauto &E :
C) {
711"Should have filtered out non-power-of-two elements in " 712"collectEquivalenceClasses.");
719 std::vector<Chain>
Ret;
720for (
unsigned CBegin = 0; CBegin <
C.size(); ++CBegin) {
721// Find candidate chains of size not greater than the largest vector reg. 722// These chains are over the closed interval [CBegin, CEnd]. 723SmallVector<std::pair<
unsigned/*CEnd*/,
unsigned/*SizeBytes*/>, 8>
725for (
unsigned CEnd = CBegin + 1,
Size =
C.size(); CEnd <
Size; ++CEnd) {
726APInt Sz =
C[CEnd].OffsetFromLeader +
728C[CBegin].OffsetFromLeader;
729if (Sz.
sgt(VecRegBytes))
731 CandidateChains.emplace_back(CEnd,
735// Consider the longest chain first. 736for (
auto It = CandidateChains.rbegin(),
End = CandidateChains.rend();
738auto [CEnd, SizeBytes] = *It;
740dbgs() <<
"LSV: splitChainByAlignment considering candidate chain [" 741 << *
C[CBegin].Inst <<
" ... " << *
C[CEnd].Inst <<
"]\n");
743Type *VecElemTy = getChainElemTy(
C);
744// Note, VecElemTy is a power of 2, but might be less than one byte. For 745// example, we can vectorize 2 x <2 x i4> to <4 x i4>, and in this case 746// VecElemTy would be i4. 747unsigned VecElemBits =
DL.getTypeSizeInBits(VecElemTy);
749// SizeBytes and VecElemBits are powers of 2, so they divide evenly. 750assert((8 * SizeBytes) % VecElemBits == 0);
751unsigned NumVecElems = 8 * SizeBytes / VecElemBits;
753unsigned VF = 8 * VecRegBytes / VecElemBits;
755// Check that TTI is happy with this vectorization factor. 756unsigned TargetVF = GetVectorFactor(VF, VecElemBits,
757 VecElemBits * NumVecElems / 8, VecTy);
758if (TargetVF != VF && TargetVF < NumVecElems) {
760dbgs() <<
"LSV: splitChainByAlignment discarding candidate chain " 762 << TargetVF <<
" != VF=" << VF
763 <<
" and TargetVF < NumVecElems=" << NumVecElems <<
"\n");
767// Is a load/store with this alignment allowed by TTI and at least as fast 768// as an unvectorized load/store? 770// TTI and F are passed as explicit captures to WAR an MSVC misparse (??). 771auto IsAllowedAndFast = [&, SizeBytes = SizeBytes, &
TTI =
TTI,
773if (Alignment.value() % SizeBytes == 0)
775unsigned VectorizedSpeed = 0;
777F.getContext(), SizeBytes * 8, AS, Alignment, &VectorizedSpeed);
778if (!AllowsMisaligned) {
780 <<
"LSV: Access of " << SizeBytes <<
"B in addrspace " 781 << AS <<
" with alignment " << Alignment.value()
782 <<
" is misaligned, and therefore can't be vectorized.\n");
786unsigned ElementwiseSpeed = 0;
787 (
TTI).allowsMisalignedMemoryAccesses((
F).getContext(), VecElemBits, AS,
788 Alignment, &ElementwiseSpeed);
789if (VectorizedSpeed < ElementwiseSpeed) {
791 <<
"LSV: Access of " << SizeBytes <<
"B in addrspace " 792 << AS <<
" with alignment " << Alignment.value()
793 <<
" has relative speed " << VectorizedSpeed
794 <<
", which is lower than the elementwise speed of " 796 <<
". Therefore this access won't be vectorized.\n");
802// If we're loading/storing from an alloca, align it if possible. 804// FIXME: We eagerly upgrade the alignment, regardless of whether TTI 805// tells us this is beneficial. This feels a bit odd, but it matches 806// existing tests. This isn't *so* bad, because at most we align to 4 807// bytes (current value of StackAdjustedAlignment). 809// FIXME: We will upgrade the alignment of the alloca even if it turns out 810// we can't vectorize for some other reason. 812bool IsAllocaAccess = AS ==
DL.getAllocaAddrSpace() &&
815Align PrefAlign =
Align(StackAdjustedAlignment);
816if (IsAllocaAccess && Alignment.
value() % SizeBytes != 0 &&
817 IsAllowedAndFast(PrefAlign)) {
819 PtrOperand, PrefAlign,
DL,
C[CBegin].Inst,
nullptr, &DT);
820if (NewAlign >= Alignment) {
822 <<
"LSV: splitByChain upgrading alloca alignment from " 823 << Alignment.
value() <<
" to " << NewAlign.
value()
825 Alignment = NewAlign;
829if (!IsAllowedAndFast(Alignment)) {
831dbgs() <<
"LSV: splitChainByAlignment discarding candidate chain " 832"because its alignment is not AllowedAndFast: " 833 << Alignment.
value() <<
"\n");
842dbgs() <<
"LSV: splitChainByAlignment discarding candidate chain " 843"because !isLegalToVectorizeLoad/StoreChain.");
847// Hooray, we can vectorize this chain! 848 Chain &NewChain =
Ret.emplace_back();
849for (
unsignedI = CBegin;
I <= CEnd; ++
I)
850 NewChain.emplace_back(
C[
I]);
851 CBegin = CEnd;
// Skip over the instructions we've added to the chain. 858bool Vectorizer::vectorizeChain(Chain &
C) {
862 sortChainInOffsetOrder(
C);
865dbgs() <<
"LSV: Vectorizing chain of " <<
C.size() <<
" instructions:\n";
869Type *VecElemTy = getChainElemTy(
C);
870bool IsLoadChain = isa<LoadInst>(
C[0].Inst);
872unsigned ChainBytes = std::accumulate(
873C.begin(),
C.end(), 0u, [&](
unsigned Bytes,
const ChainElem &E) {
874 return Bytes + DL.getTypeStoreSize(getLoadStoreType(E.Inst));
876assert(ChainBytes %
DL.getTypeStoreSize(VecElemTy) == 0);
877// VecTy is a power of 2 and 1 byte at smallest, but VecElemTy may be smaller 878// than 1 byte (e.g. VecTy == <32 x i1>). 880 VecElemTy, 8 * ChainBytes /
DL.getTypeSizeInBits(VecElemTy));
883// If this is a load/store of an alloca, we might have upgraded the alloca's 884// alignment earlier. Get the new alignment. 885if (AS ==
DL.getAllocaAddrSpace()) {
886 Alignment = std::max(
892// All elements of the chain must have the same scalar-type size. 894for (
const ChainElem &E :
C)
896DL.getTypeStoreSize(VecElemTy));
901// Loads get hoisted to the location of the first load in the chain. We may 902// also need to hoist the (transitive) operands of the loads. 903 Builder.SetInsertPoint(
905returnA.Inst->comesBefore(
B.Inst);
908// Chain is in offset order, so C[0] is the instr with the lowest offset, 909// i.e. the root of the vector. 910 VecInst = Builder.CreateAlignedLoad(VecTy,
915for (
const ChainElem &E :
C) {
919if (
auto *VT = dyn_cast<FixedVectorType>(
T)) {
920autoMask = llvm::to_vector<8>(
921 llvm::seq<int>(VecIdx, VecIdx + VT->getNumElements()));
922V = Builder.CreateShuffleVector(VecInst, Mask,
I->getName());
923 VecIdx += VT->getNumElements();
925V = Builder.CreateExtractElement(VecInst, Builder.getInt32(VecIdx),
929if (
V->getType() !=
I->getType())
930V = Builder.CreateBitOrPointerCast(V,
I->getType());
931I->replaceAllUsesWith(V);
934// Finally, we need to reorder the instrs in the BB so that the (transitive) 935// operands of VecInst appear before it. To see why, suppose we have 936// vectorized the following code: 939// load1 = load i32 ptr1 941// load0 = load i32 ptr0 943// We will put the vectorized load at the location of the earliest load in 944// the BB, i.e. load1. We get: 947// loadv = load <2 x i32> ptr0 948// load0 = extractelement loadv, 0 949// load1 = extractelement loadv, 1 952// Notice that loadv uses ptr0, which is defined *after* it! 955// Stores get sunk to the location of the last store in the chain. 957returnA.Inst->comesBefore(
B.Inst);
960// Build the vector to store. 963auto InsertElem = [&](
Value *
V) {
964if (
V->getType() != VecElemTy)
965 V = Builder.CreateBitOrPointerCast(V, VecElemTy);
966 Vec = Builder.CreateInsertElement(Vec, V, Builder.getInt32(VecIdx++));
968for (
const ChainElem &E :
C) {
969auto *
I = cast<StoreInst>(E.Inst);
972for (
int J = 0, JE = VT->getNumElements(); J < JE; ++J) {
973 InsertElem(Builder.CreateExtractElement(
I->getValueOperand(),
974 Builder.getInt32(J)));
977 InsertElem(
I->getValueOperand());
981// Chain is in offset order, so C[0] is the instr with the lowest offset, 982// i.e. the root of the vector. 983 VecInst = Builder.CreateAlignedStore(
991for (
const ChainElem &E :
C)
992 ToErase.emplace_back(E.Inst);
994 ++NumVectorInstructions;
995 NumScalarsVectorized +=
C.size();
999template <
bool IsLoadChain>
1000bool Vectorizer::isSafeToMove(
1004 << *ChainBegin <<
")\n");
1006assert(isa<LoadInst>(ChainElem) == IsLoadChain);
1007if (ChainElem == ChainBegin)
1010// Invariant loads can always be reordered; by definition they are not 1011// clobbered by stores. 1012if (isInvariantLoad(ChainElem))
1015auto BBIt = std::next([&] {
1016ifconstexpr (IsLoadChain)
1021auto BBItEnd = std::next([&] {
1022ifconstexpr (IsLoadChain)
1028constAPInt &ChainElemOffset = ChainOffsets.
at(ChainElem);
1029constunsigned ChainElemSize =
1032for (; BBIt != BBItEnd; ++BBIt) {
1035if (!
I->mayReadOrWriteMemory())
1038// Loads can be reordered with other loads. 1039if (IsLoadChain && isa<LoadInst>(
I))
1042// Stores can be sunk below invariant loads. 1043if (!IsLoadChain && isInvariantLoad(
I))
1046// If I is in the chain, we can tell whether it aliases ChainIt by checking 1047// what offset ChainIt accesses. This may be better than AA is able to do. 1049// We should really only have duplicate offsets for stores (the duplicate 1050// loads should be CSE'ed), but in case we have a duplicate load, we'll 1051// split the chain so we don't have to handle this case specially. 1052if (
auto OffsetIt = ChainOffsets.
find(
I); OffsetIt != ChainOffsets.
end()) {
1053// I and ChainElem overlap if: 1054// - I and ChainElem have the same offset, OR 1055// - I's offset is less than ChainElem's, but I touches past the 1056// beginning of ChainElem, OR 1057// - ChainElem's offset is less than I's, but ChainElem touches past the 1059constAPInt &IOffset = OffsetIt->second;
1061if (IOffset == ChainElemOffset ||
1062 (IOffset.
sle(ChainElemOffset) &&
1063 (IOffset + IElemSize).sgt(ChainElemOffset)) ||
1064 (ChainElemOffset.
sle(IOffset) &&
1065 (ChainElemOffset + ChainElemSize).sgt(OffsetIt->second))) {
1067// Double check that AA also sees this alias. If not, we probably 1071dbgs() <<
"LSV: Found alias in chain: " << *
I <<
"\n";
1073returnfalse;
// We found an aliasing instruction; bail. 1076continue;
// We're confident there's no alias. 1083 <<
" Aliasing instruction:\n" 1085 <<
" Aliased instruction and pointer:\n" 1086 <<
" " << *ChainElem <<
'\n' 1104unsigned MatchingOpIdxB,
boolSigned) {
1105LLVM_DEBUG(
dbgs() <<
"LSV: checkIfSafeAddSequence IdxDiff=" << IdxDiff
1106 <<
", AddOpA=" << *AddOpA <<
", MatchingOpIdxA=" 1107 << MatchingOpIdxA <<
", AddOpB=" << *AddOpB
1108 <<
", MatchingOpIdxB=" << MatchingOpIdxB
1109 <<
", Signed=" <<
Signed <<
"\n");
1110// If both OpA and OpB are adds with NSW/NUW and with one of the operands 1111// being the same, we can guarantee that the transformation is safe if we can 1112// prove that OpA won't overflow when Ret added to the other operand of OpA. 1114// %tmp7 = add nsw i32 %tmp2, %v0 1115// %tmp8 = sext i32 %tmp7 to i64 1117// %tmp11 = add nsw i32 %v0, 1 1118// %tmp12 = add nsw i32 %tmp2, %tmp11 1119// %tmp13 = sext i32 %tmp12 to i64 1121// Both %tmp7 and %tmp12 have the nsw flag and the first operand is %tmp2. 1122// It's guaranteed that adding 1 to %tmp7 won't overflow because %tmp11 adds 1123// 1 to %v0 and both %tmp11 and %tmp12 have the nsw flag. 1125 AddOpB->
getOpcode() == Instruction::Add &&
1129Value *OtherOperandA = AddOpA->
getOperand(MatchingOpIdxA == 1 ? 0 : 1);
1130Value *OtherOperandB = AddOpB->
getOperand(MatchingOpIdxB == 1 ? 0 : 1);
1131Instruction *OtherInstrA = dyn_cast<Instruction>(OtherOperandA);
1132Instruction *OtherInstrB = dyn_cast<Instruction>(OtherOperandB);
1133// Match `x +nsw/nuw y` and `x +nsw/nuw (y +nsw/nuw IdxDiff)`. 1134if (OtherInstrB && OtherInstrB->
getOpcode() == Instruction::Add &&
1136 isa<ConstantInt>(OtherInstrB->
getOperand(1))) {
1138 cast<ConstantInt>(OtherInstrB->
getOperand(1))->getSExtValue();
1139if (OtherInstrB->
getOperand(0) == OtherOperandA &&
1143// Match `x +nsw/nuw (y +nsw/nuw -Idx)` and `x +nsw/nuw (y +nsw/nuw x)`. 1144if (OtherInstrA && OtherInstrA->
getOpcode() == Instruction::Add &&
1146 isa<ConstantInt>(OtherInstrA->
getOperand(1))) {
1148 cast<ConstantInt>(OtherInstrA->
getOperand(1))->getSExtValue();
1149if (OtherInstrA->
getOperand(0) == OtherOperandB &&
1153// Match `x +nsw/nuw (y +nsw/nuw c)` and 1154// `x +nsw/nuw (y +nsw/nuw (c + IdxDiff))`. 1155if (OtherInstrA && OtherInstrB &&
1156 OtherInstrA->
getOpcode() == Instruction::Add &&
1157 OtherInstrB->
getOpcode() == Instruction::Add &&
1160 isa<ConstantInt>(OtherInstrA->
getOperand(1)) &&
1161 isa<ConstantInt>(OtherInstrB->
getOperand(1))) {
1163 cast<ConstantInt>(OtherInstrA->
getOperand(1))->getSExtValue();
1165 cast<ConstantInt>(OtherInstrB->
getOperand(1))->getSExtValue();
1174std::optional<APInt> Vectorizer::getConstantOffsetComplexAddrs(
1176LLVM_DEBUG(
dbgs() <<
"LSV: getConstantOffsetComplexAddrs PtrA=" << *PtrA
1177 <<
" PtrB=" << *PtrB <<
" ContextInst=" << *ContextInst
1178 <<
" Depth=" <<
Depth <<
"\n");
1179auto *GEPA = dyn_cast<GetElementPtrInst>(PtrA);
1180auto *GEPB = dyn_cast<GetElementPtrInst>(PtrB);
1182return getConstantOffsetSelects(PtrA, PtrB, ContextInst,
Depth);
1184// Look through GEPs after checking they're the same except for the last 1186if (GEPA->getNumOperands() != GEPB->getNumOperands() ||
1187 GEPA->getPointerOperand() != GEPB->getPointerOperand())
1191for (
unsignedI = 0, E = GEPA->getNumIndices() - 1;
I < E; ++
I) {
1206// Only look through a ZExt/SExt. 1207if (!isa<SExtInst>(OpA) && !isa<ZExtInst>(OpA))
1210boolSigned = isa<SExtInst>(OpA);
1212// At this point A could be a function parameter, i.e. not an instruction 1214 OpB = dyn_cast<Instruction>(OpB->
getOperand(0));
1229LLVM_DEBUG(
dbgs() <<
"LSV: getConstantOffsetComplexAddrs IdxDiff=" << IdxDiff
1232// Now we need to prove that adding IdxDiff to ValA won't overflow. 1235// First attempt: if OpB is an add with NSW/NUW, and OpB is IdxDiff added to 1237if (OpB->
getOpcode() == Instruction::Add &&
1239 IdxDiff.
sle(cast<ConstantInt>(OpB->
getOperand(1))->getSExtValue()) &&
1243// Second attempt: check if we have eligible add NSW/NUW instruction 1245 OpA = dyn_cast<Instruction>(ValA);
1246if (!Safe && OpA && OpA->
getOpcode() == Instruction::Add &&
1249// In the checks below a matching operand in OpA and OpB is an operand which 1250// is the same in those two instructions. Below we account for possible 1251// orders of the operands of these add instructions. 1252for (
unsigned MatchingOpIdxA : {0, 1})
1253for (
unsigned MatchingOpIdxB : {0, 1})
1263// Assuming IdxDiff is positive: If all set bits of IdxDiff or any higher 1264// order bit other than the sign bit are known to be zero in ValA, we can add 1265// Diff to it while guaranteeing no overflow of any sort. 1267// If IdxDiff is negative, do the same, but swap ValA and ValB. 1269// When computing known bits, use the GEPs as context instructions, since 1270// they likely are in the same BB as the load/store. 1277if (BitsAllowedToBeSet.
ult(IdxDiff.
abs()))
1283return IdxDiff * Stride;
1287std::optional<APInt> Vectorizer::getConstantOffsetSelects(
1289if (
Depth++ == MaxDepth)
1292if (
auto *SelectA = dyn_cast<SelectInst>(PtrA)) {
1293if (
auto *SelectB = dyn_cast<SelectInst>(PtrB)) {
1294if (SelectA->getCondition() != SelectB->getCondition())
1296LLVM_DEBUG(
dbgs() <<
"LSV: getConstantOffsetSelects, PtrA=" << *PtrA
1297 <<
", PtrB=" << *PtrB <<
", ContextInst=" 1298 << *ContextInst <<
", Depth=" <<
Depth <<
"\n");
1299 std::optional<APInt> TrueDiff = getConstantOffset(
1300 SelectA->getTrueValue(), SelectB->getTrueValue(), ContextInst,
Depth);
1303 std::optional<APInt> FalseDiff =
1304 getConstantOffset(SelectA->getFalseValue(), SelectB->getFalseValue(),
1306if (TrueDiff == FalseDiff)
1313void Vectorizer::mergeEquivalenceClasses(EquivalenceClassMap &EQClasses)
const{
1314if (EQClasses.size() < 2)
// There is nothing to merge. 1317// The reduced key has all elements of the ECClassKey except the underlying 1318// object. Check that EqClassKey has 4 elements and define the reduced key. 1319static_assert(std::tuple_size_v<EqClassKey> == 4,
1320"EqClassKey has changed - EqClassReducedKey needs changes too");
1321usingEqClassReducedKey =
1322 std::tuple<std::tuple_element_t<1, EqClassKey>
/* AddrSpace */,
1323 std::tuple_element_t<2, EqClassKey>
/* Element size */,
1324 std::tuple_element_t<3, EqClassKey>
/* IsLoad; */>;
1325usingECReducedKeyToUnderlyingObjectMap =
1329// Form a map from the reduced key (without the underlying object) to the 1330// underlying objects: 1 reduced key to many underlying objects, to form 1331// groups of potentially merge-able equivalence classes. 1332 ECReducedKeyToUnderlyingObjectMap RedKeyToUOMap;
1333bool FoundPotentiallyOptimizableEC =
false;
1334for (
constauto &EC : EQClasses) {
1335constauto &
Key =
EC.first;
1336 EqClassReducedKey RedKey{std::get<1>(Key), std::get<2>(Key),
1338 RedKeyToUOMap[RedKey].
insert(std::get<0>(Key));
1339if (RedKeyToUOMap[RedKey].
size() > 1)
1340 FoundPotentiallyOptimizableEC =
true;
1342if (!FoundPotentiallyOptimizableEC)
1346dbgs() <<
"LSV: mergeEquivalenceClasses: before merging:\n";
1347for (
constauto &EC : EQClasses) {
1348dbgs() <<
" Key: {" <<
EC.first <<
"}\n";
1349for (
constauto &Inst :
EC.second)
1350dbgs() <<
" Inst: " << *Inst <<
'\n';
1354dbgs() <<
"LSV: mergeEquivalenceClasses: RedKeyToUOMap:\n";
1355for (
constauto &RedKeyToUO : RedKeyToUOMap) {
1356dbgs() <<
" Reduced key: {" << std::get<0>(RedKeyToUO.first) <<
", " 1357 << std::get<1>(RedKeyToUO.first) <<
", " 1358 <<
static_cast<int>(std::get<2>(RedKeyToUO.first)) <<
"} --> " 1359 << RedKeyToUO.second.size() <<
" underlying objects:\n";
1360for (
auto UObject : RedKeyToUO.second)
1361dbgs() <<
" " << *UObject <<
'\n';
1367// Compute the ultimate targets for a set of underlying objects. 1368auto GetUltimateTargets =
1370 UObjectToUObjectMap IndirectionMap;
1371for (
constauto *UObject : UObjects) {
1372constunsigned MaxLookupDepth = 1;
// look for 1-level indirections only 1374if (UltimateTarget != UObject)
1375 IndirectionMap[UObject] = UltimateTarget;
1377 UObjectToUObjectMap UltimateTargetsMap;
1378for (
constauto *UObject : UObjects) {
1380auto It = IndirectionMap.find(
Target);
1381for (; It != IndirectionMap.end(); It = IndirectionMap.find(
Target))
1383 UltimateTargetsMap[UObject] =
Target;
1385return UltimateTargetsMap;
1388// For each item in RedKeyToUOMap, if it has more than one underlying object, 1389// try to merge the equivalence classes. 1390for (
auto &[RedKey, UObjects] : RedKeyToUOMap) {
1391if (UObjects.size() < 2)
1393auto UTMap = GetUltimateTargets(UObjects);
1394for (
constauto &[UObject, UltimateTarget] : UTMap) {
1395if (UObject == UltimateTarget)
1398 EqClassKey KeyFrom{UObject, std::get<0>(RedKey), std::get<1>(RedKey),
1399 std::get<2>(RedKey)};
1400 EqClassKey KeyTo{UltimateTarget, std::get<0>(RedKey), std::get<1>(RedKey),
1401 std::get<2>(RedKey)};
1402// The entry for KeyFrom is guarantted to exist, unlike KeyTo. Thus, 1403// request the reference to the instructions vector for KeyTo first. 1404constauto &VecTo = EQClasses[KeyTo];
1405constauto &VecFrom = EQClasses[KeyFrom];
1407 std::merge(VecFrom.begin(), VecFrom.end(), VecTo.begin(), VecTo.end(),
1408 std::back_inserter(MergedVec),
1410 return A && B && A->comesBefore(B);
1412 EQClasses[KeyTo] = std::move(MergedVec);
1413 EQClasses.erase(KeyFrom);
1417dbgs() <<
"LSV: mergeEquivalenceClasses: after merging:\n";
1418for (
constauto &EC : EQClasses) {
1419dbgs() <<
" Key: {" <<
EC.first <<
"}\n";
1420for (
constauto &Inst :
EC.second)
1421dbgs() <<
" Inst: " << *Inst <<
'\n';
1429 EquivalenceClassMap
Ret;
1431auto GetUnderlyingObject = [](
constValue *
Ptr) ->
constValue * {
1433if (
constauto *Sel = dyn_cast<SelectInst>(ObjPtr)) {
1434// The select's themselves are distinct instructions even if they share 1435// the same condition and evaluate to consecutive pointers for true and 1436// false values of the condition. Therefore using the select's themselves 1437// for grouping instructions would put consecutive accesses into different 1438// lists and they won't be even checked for being consecutive, and won't 1440return Sel->getCondition();
1446auto *LI = dyn_cast<LoadInst>(&
I);
1447auto *
SI = dyn_cast<StoreInst>(&
I);
1451if ((LI && !LI->
isSimple()) || (SI && !
SI->isSimple()))
1462// Skip weird non-byte sizes. They probably aren't worth the effort of 1463// handling correctly. 1464unsigned TySize =
DL.getTypeSizeInBits(Ty);
1465if ((TySize % 8) != 0)
1468// Skip vectors of pointers. The vectorizeLoadChain/vectorizeStoreChain 1469// functions are currently using an integer type for the vectorized 1470// load/store, and does not support casting between the integer type and a 1471// vector of pointers (e.g. i64 to <2 x i16*>) 1476unsigned AS =
Ptr->getType()->getPointerAddressSpace();
1479unsigned VF = VecRegSize / TySize;
1482// Only handle power-of-two sized elements. 1484 (VecTy && !
isPowerOf2_32(
DL.getTypeSizeInBits(VecTy->getScalarType()))))
1487// No point in looking at these if they're too big to vectorize. 1488if (TySize > VecRegSize / 2 ||
1492Ret[{GetUnderlyingObject(
Ptr), AS,
1494/*IsLoad=*/LI !=
nullptr}]
1498 mergeEquivalenceClasses(Ret);
1507unsigned ASPtrBits =
DL.getIndexSizeInBits(AS);
1510// Check that Instrs is in BB order and all have the same addr space. 1511for (
size_tI = 1;
I < Instrs.
size(); ++
I) {
1512assert(Instrs[
I - 1]->comesBefore(Instrs[
I]));
1517// Machinery to build an MRU-hashtable of Chains. 1519// (Ideally this could be done with MapVector, but as currently implemented, 1520// moving an element to the front of a MapVector is O(n).) 1521structInstrListElem :
ilist_node<InstrListElem>,
1522 std::pair<Instruction *, Chain> {
1526structInstrListElemDenseMapInfo {
1529static InstrListElem *getEmptyKey() {
return PtrInfo::getEmptyKey(); }
1530static InstrListElem *getTombstoneKey() {
1531return PtrInfo::getTombstoneKey();
1533staticunsigned getHashValue(
const InstrListElem *
E) {
1534return IInfo::getHashValue(
E->first);
1536staticboolisEqual(
const InstrListElem *
A,
const InstrListElem *
B) {
1537if (
A == getEmptyKey() ||
B == getEmptyKey())
1538returnA == getEmptyKey() &&
B == getEmptyKey();
1539if (
A == getTombstoneKey() ||
B == getTombstoneKey())
1540returnA == getTombstoneKey() &&
B == getTombstoneKey();
1541return IInfo::isEqual(
A->first,
B->first);
1548// Compare each instruction in `instrs` to leader of the N most recently-used 1549// chains. This limits the O(n^2) behavior of this pass while also allowing 1550// us to build arbitrarily long chains. 1552constexprint MaxChainsToTry = 64;
1554bool MatchFound =
false;
1555auto ChainIter = MRU.
begin();
1556for (
size_t J = 0; J < MaxChainsToTry && ChainIter != MRU.
end();
1558if (std::optional<APInt>
Offset = getConstantOffset(
1562 (ChainIter->first->comesBefore(
I) ?
I : ChainIter->first))) {
1563// `Offset` might not have the expected number of bits, if e.g. AS has a 1564// different number of bits than opaque pointers. 1565 ChainIter->second.emplace_back(
I,
Offset.value());
1566// Move ChainIter to the front of the MRU list. 1575APInt ZeroOffset(ASPtrBits, 0);
1576 InstrListElem *E =
new (
Allocator.Allocate()) InstrListElem(
I);
1577 E->second.emplace_back(
I, ZeroOffset);
1583 std::vector<Chain>
Ret;
1585// Iterate over MRU rather than Chains so the order is deterministic. 1587if (E.second.size() > 1)
1588Ret.emplace_back(std::move(E.second));
1592std::optional<APInt> Vectorizer::getConstantOffset(
Value *PtrA,
Value *PtrB,
1596 <<
", PtrB=" << *PtrB <<
", ContextInst= " << *ContextInst
1597 <<
", Depth=" <<
Depth <<
"\n");
1598// We'll ultimately return a value of this bit width, even if computations 1599// happen in a different width. 1600unsigned OrigBitWidth =
DL.getIndexTypeSizeInBits(PtrA->
getType());
1601APInt OffsetA(OrigBitWidth, 0);
1602APInt OffsetB(OrigBitWidth, 0);
1605unsigned NewPtrBitWidth =
DL.getTypeStoreSizeInBits(PtrA->
getType());
1606if (NewPtrBitWidth !=
DL.getTypeStoreSizeInBits(PtrB->
getType()))
1609// If we have to shrink the pointer, stripAndAccumulateInBoundsConstantOffsets 1610// should properly handle a possible overflow and the value should fit into 1611// the smallest data type used in the cast/gep chain. 1612assert(OffsetA.getSignificantBits() <= NewPtrBitWidth &&
1613 OffsetB.getSignificantBits() <= NewPtrBitWidth);
1615 OffsetA = OffsetA.sextOrTrunc(NewPtrBitWidth);
1616 OffsetB = OffsetB.sextOrTrunc(NewPtrBitWidth);
1618return (OffsetB - OffsetA).sextOrTrunc(OrigBitWidth);
1620// Try to compute B - A. 1623LLVM_DEBUG(
dbgs() <<
"LSV: SCEV PtrB - PtrA =" << *DistScev <<
"\n");
1626// Handle index width (the width of Dist) != pointer width (the width of 1627// the Offset*s at this point). 1629return (OffsetB - OffsetA + Dist).
sextOrTrunc(OrigBitWidth);
1632if (std::optional<APInt> Diff =
1633 getConstantOffsetComplexAddrs(PtrA, PtrB, ContextInst,
Depth))
1634return (OffsetB - OffsetA + Diff->sext(OffsetB.getBitWidth()))
AMDGPU Mark last scratch load
This file implements a class to represent arbitrary precision integral constant values and operations...
MachineBasicBlock MachineBasicBlock::iterator DebugLoc DL
Expand Atomic instructions
static bool isEqual(const Function &Caller, const Function &Callee)
This file contains the simple types necessary to represent the attributes associated with functions a...
static GCRegistry::Add< OcamlGC > B("ocaml", "ocaml 3.10-compatible GC")
static GCRegistry::Add< ErlangGC > A("erlang", "erlang-compatible garbage collector")
static GCRegistry::Add< CoreCLRGC > E("coreclr", "CoreCLR-compatible GC")
This file contains the declarations for the subclasses of Constant, which represent the different fla...
Returns the sub type a function will return at a given Idx Should correspond to the result type of an ExtractValue instruction executed with just that one unsigned Idx
This file defines the DenseMap class.
Module.h This file contains the declarations for the Module class.
static bool checkNoWrapFlags(Instruction *I, bool Signed)
static bool checkIfSafeAddSequence(const APInt &IdxDiff, Instruction *AddOpA, unsigned MatchingOpIdxA, Instruction *AddOpB, unsigned MatchingOpIdxB, bool Signed)
This file implements a map that provides insertion order iteration.
This file provides utility analysis objects describing memory locations.
#define INITIALIZE_PASS_DEPENDENCY(depName)
#define INITIALIZE_PASS_END(passName, arg, name, cfg, analysis)
#define INITIALIZE_PASS_BEGIN(passName, arg, name, cfg, analysis)
This file builds on the ADT/GraphTraits.h file to build a generic graph post order iterator.
static bool isSafeToMove(const MachineInstr &From, const MachineInstr &To)
Check if it's safe to move From down to To, checking that no physical registers are clobbered.
assert(ImpDefSCC.getReg()==AMDGPU::SCC &&ImpDefSCC.isDef())
This file contains some templates that are useful if you are working with the STL at all.
Provides some synthesis utilities to produce sequences of values.
This file defines the SmallPtrSet class.
This file defines the SmallVector class.
This file defines the 'Statistic' class, which is designed to be an easy way to expose various metric...
#define STATISTIC(VARNAME, DESC)
This pass exposes codegen information to IR-level passes.
A manager for alias analyses.
A wrapper pass to provide the legacy pass manager access to a suitably prepared AAResults object.
ModRefInfo getModRefInfo(const Instruction *I, const std::optional< MemoryLocation > &OptLoc)
Check whether or not an instruction may read or write the optionally specified memory location.
Class for arbitrary precision integers.
void clearBit(unsigned BitPosition)
Set a given bit to 0.
APInt zext(unsigned width) const
Zero extend to a new width.
APInt abs() const
Get the absolute value.
bool sgt(const APInt &RHS) const
Signed greater than comparison.
unsigned getBitWidth() const
Return the number of bits in the APInt.
bool ult(const APInt &RHS) const
Unsigned less than comparison.
bool sle(const APInt &RHS) const
Signed less or equal comparison.
APInt sextOrTrunc(unsigned width) const
Sign extend or truncate to width.
uint64_t getLimitedValue(uint64_t Limit=UINT64_MAX) const
If this value is smaller than the specified limit, return it, otherwise return the limit value.
bool sge(const APInt &RHS) const
Signed greater or equal comparison.
int64_t getSExtValue() const
Get sign extended value.
A container for analyses that lazily runs them and caches their results.
PassT::Result & getResult(IRUnitT &IR, ExtraArgTs... ExtraArgs)
Get the result of an analysis pass for a given IR unit.
Represent the analysis usage information of a pass.
AnalysisUsage & addRequired()
void setPreservesCFG()
This function should be called by the pass, iff they do not:
ArrayRef - Represent a constant reference to an array (0 or more elements consecutively in memory),...
size_t size() const
size - Get the array size.
bool empty() const
empty - Check if the array is empty.
A function analysis which provides an AssumptionCache.
AssumptionCache run(Function &F, FunctionAnalysisManager &)
An immutable pass that tracks lazily created AssumptionCache objects.
A cache of @llvm.assume calls within a function.
LLVM Basic Block Representation.
InstListType::reverse_iterator reverse_iterator
InstListType::iterator iterator
Instruction iterators...
Represents analyses that only rely on functions' control flow.
This class represents a range of values.
const APInt * getSingleElement() const
If this set contains a single element, return it, otherwise return null.
bool isSingleElement() const
Return true if this set contains exactly one member.
A parsed version of the target data layout string in and methods for querying it.
iterator find(const_arg_type_t< KeyT > Val)
const ValueT & at(const_arg_type_t< KeyT > Val) const
at - Return the entry for the specified key, or abort if no such entry exists.
Implements a dense probed hash-table based set.
Analysis pass which computes a DominatorTree.
Legacy analysis pass which computes a DominatorTree.
Concrete subclass of DominatorTreeBase that is used to compute a normal dominator tree.
Class to represent fixed width SIMD vectors.
static FixedVectorType * get(Type *ElementType, unsigned NumElts)
FunctionPass class - This class is used to implement most global optimizations.
virtual bool runOnFunction(Function &F)=0
runOnFunction - Virtual method overriden by subclasses to do the per-function processing of the pass.
Legacy wrapper pass to provide the GlobalsAAResult object.
This provides a uniform API for creating instructions and inserting them into a basic block: either a...
bool hasNoUnsignedWrap() const LLVM_READONLY
Determine whether the no unsigned wrap flag is set.
bool hasNoSignedWrap() const LLVM_READONLY
Determine whether the no signed wrap flag is set.
bool hasMetadata() const
Return true if this instruction has any metadata attached to it.
bool comesBefore(const Instruction *Other) const
Given an instruction Other in the same basic block as this instruction, return true if this instructi...
unsigned getOpcode() const
Returns a member of one of the enums like Instruction::Add.
void moveBefore(Instruction *MovePos)
Unlink this instruction from its current basic block and insert it into the basic block that MovePos ...
An instruction for reading from memory.
PreservedAnalyses run(Function &F, FunctionAnalysisManager &AM)
This class implements a map that also provides access to all stored values in a deterministic order.
std::pair< iterator, bool > insert(const std::pair< KeyT, ValueT > &KV)
static MemoryLocation get(const LoadInst *LI)
Return a location with information about the memory reference by the given instruction.
static PassRegistry * getPassRegistry()
getPassRegistry - Access the global registry object, which is automatically initialized at applicatio...
Pass interface - Implemented by all 'passes'.
virtual void getAnalysisUsage(AnalysisUsage &) const
getAnalysisUsage - This function should be overriden by passes that need analysis information to do t...
virtual StringRef getPassName() const
getPassName - Return a nice clean name for a pass.
static PoisonValue * get(Type *T)
Static factory methods - Return an 'poison' object of the specified type.
A set of analyses that are preserved following a run of a transformation pass.
static PreservedAnalyses all()
Construct a special preserved set that preserves all passes.
void preserveSet()
Mark an analysis set as preserved.
Legacy wrapper pass to provide the SCEVAAResult object.
This class represents an analyzed expression in the program.
Analysis pass that exposes the ScalarEvolution for a function.
The main scalar evolution driver.
const SCEV * getSCEV(Value *V)
Return a SCEV expression for the full generality of the specified expression.
ConstantRange getSignedRange(const SCEV *S)
Determine the signed range for a particular SCEV.
const SCEV * getMinusSCEV(const SCEV *LHS, const SCEV *RHS, SCEV::NoWrapFlags Flags=SCEV::FlagAnyWrap, unsigned Depth=0)
Return LHS-RHS.
const SCEV * getCouldNotCompute()
A templated base class for SmallPtrSet which provides the typesafe interface that is common across al...
std::pair< iterator, bool > insert(PtrType Ptr)
Inserts Ptr if and only if there is no element in the container equal to Ptr.
bool contains(ConstPtrType Ptr) const
SmallPtrSet - This class implements a set which is optimized for holding SmallSize or less elements.
reference emplace_back(ArgTypes &&... Args)
This is a 'vector' (really, a variable-sized array), optimized for the case when the array is small.
A BumpPtrAllocator that allows only elements of a specific type to be allocated.
StringRef - Represent a constant reference to a string, i.e.
Analysis pass providing the TargetTransformInfo.
Wrapper pass for TargetTransformInfo.
This pass provides access to the codegen interfaces that are needed for IR-level transformations.
bool isLegalToVectorizeLoad(LoadInst *LI) const
bool isLegalToVectorizeStore(StoreInst *SI) const
unsigned getStoreVectorFactor(unsigned VF, unsigned StoreSize, unsigned ChainSizeInBytes, VectorType *VecTy) const
bool isLegalToVectorizeStoreChain(unsigned ChainSizeInBytes, Align Alignment, unsigned AddrSpace) const
unsigned getLoadStoreVecRegBitWidth(unsigned AddrSpace) const
unsigned getLoadVectorFactor(unsigned VF, unsigned LoadSize, unsigned ChainSizeInBytes, VectorType *VecTy) const
bool allowsMisalignedMemoryAccesses(LLVMContext &Context, unsigned BitWidth, unsigned AddressSpace=0, Align Alignment=Align(1), unsigned *Fast=nullptr) const
Determine if the target supports unaligned memory accesses.
bool isLegalToVectorizeLoadChain(unsigned ChainSizeInBytes, Align Alignment, unsigned AddrSpace) const
Target - Wrapper for Target specific information.
The instances of the Type class are immutable: once they are created, they are never changed.
bool isVectorTy() const
True if this is an instance of VectorType.
bool isPointerTy() const
True if this is an instance of PointerType.
static IntegerType * getIntNTy(LLVMContext &C, unsigned N)
unsigned getScalarSizeInBits() const LLVM_READONLY
If this is a vector type, return the getPrimitiveSizeInBits value for the element type.
bool isPtrOrPtrVectorTy() const
Return true if this is a pointer type or a vector of pointer types.
Type * getScalarType() const
If this is a vector type, return the element type, otherwise return 'this'.
Value * getOperand(unsigned i) const
unsigned getNumOperands() const
LLVM Value Representation.
Type * getType() const
All values are typed, get the type of this value.
const Value * stripAndAccumulateInBoundsConstantOffsets(const DataLayout &DL, APInt &Offset) const
This is a wrapper around stripAndAccumulateConstantOffsets with the in-bounds requirement set to fals...
const Value * stripPointerCasts() const
Strip off pointer casts, all-zero GEPs and address space casts.
static bool isValidElementType(Type *ElemTy)
Return true if the specified type is valid as a element type.
std::pair< iterator, bool > insert(const ValueT &V)
TypeSize getSequentialElementStride(const DataLayout &DL) const
Value * getOperand() const
const ParentTy * getParent() const
This class implements an extremely fast bulk output stream that can only output to a stream.
A simple intrusive list implementation.
void push_front(reference Node)
Insert a node at the front; never copies.
void remove(reference N)
Remove a node by reference; never deletes.
This provides a very simple, boring adaptor for a begin and end iterator into a range type.
constexpr std::underlying_type_t< E > Mask()
Get a bitmask with 1s in all places up to the high-order bit of E's largest value.
@ C
The default llvm calling convention, compatible with C.
unsigned ID
LLVM IR allows to use arbitrary numbers as calling convention identifiers.
PointerTypeMap run(const Module &M)
Compute the PointerTypeMap for the module M.
This is an optimization pass for GlobalISel generic memory operations.
auto min_element(R &&Range)
Provide wrappers to std::min_element which take ranges instead of having to pass begin/end explicitly...
unsigned getLoadStoreAddressSpace(const Value *I)
A helper function that returns the address space of the pointer operand of load or store instruction.
auto size(R &&Range, std::enable_if_t< std::is_base_of< std::random_access_iterator_tag, typename std::iterator_traits< decltype(Range.begin())>::iterator_category >::value, void > *=nullptr)
Get the size of a range.
bool RecursivelyDeleteTriviallyDeadInstructions(Value *V, const TargetLibraryInfo *TLI=nullptr, MemorySSAUpdater *MSSAU=nullptr, std::function< void(Value *)> AboutToDeleteCallback=std::function< void(Value *)>())
If the specified value is a trivially dead instruction, delete it.
Pass * createLoadStoreVectorizerPass()
Create a legacy pass manager instance of the LoadStoreVectorizer pass.
const Value * getLoadStorePointerOperand(const Value *V)
A helper function that returns the pointer operand of a load or store instruction.
iterator_range< T > make_range(T x, T y)
Convenience function for iterating over sub-ranges.
const Value * getUnderlyingObject(const Value *V, unsigned MaxLookup=6)
This method strips off any GEP address adjustments, pointer casts or llvm.threadlocal....
iterator_range< po_iterator< T > > post_order(const T &G)
Align getLoadStoreAlignment(const Value *I)
A helper function that returns the alignment of load or store instruction.
Instruction * propagateMetadata(Instruction *I, ArrayRef< Value * > VL)
Specifically, let Kinds = [MD_tbaa, MD_alias_scope, MD_noalias, MD_fpmath, MD_nontemporal,...
bool any_of(R &&range, UnaryPredicate P)
Provide wrappers to std::any_of which take ranges instead of having to pass begin/end explicitly.
constexpr bool isPowerOf2_32(uint32_t Value)
Return true if the argument is a power of two > 0.
Align getOrEnforceKnownAlignment(Value *V, MaybeAlign PrefAlign, const DataLayout &DL, const Instruction *CxtI=nullptr, AssumptionCache *AC=nullptr, const DominatorTree *DT=nullptr)
Try to ensure that the alignment of V is at least PrefAlign bytes.
bool isModSet(const ModRefInfo MRI)
void sort(IteratorTy Start, IteratorTy End)
raw_ostream & dbgs()
dbgs() - This returns a reference to a raw_ostream for debugging messages.
bool isModOrRefSet(const ModRefInfo MRI)
ModRefInfo
Flags indicating whether a memory access modifies or references memory.
void computeKnownBits(const Value *V, KnownBits &Known, const DataLayout &DL, unsigned Depth=0, AssumptionCache *AC=nullptr, const Instruction *CxtI=nullptr, const DominatorTree *DT=nullptr, bool UseInstrInfo=true)
Determine which bits of V are known to be either zero or one and return them in the KnownZero/KnownOn...
auto max_element(R &&Range)
Provide wrappers to std::max_element which take ranges instead of having to pass begin/end explicitly...
raw_ostream & operator<<(raw_ostream &OS, const APFixedPoint &FX)
constexpr unsigned BitWidth
OutputIt move(R &&Range, OutputIt Out)
Provide wrappers to std::move which take ranges instead of having to pass begin/end explicitly.
bool isGuaranteedToTransferExecutionToSuccessor(const Instruction *I)
Return true if this function can prove that the instruction I will always transfer execution to one o...
gep_type_iterator gep_type_begin(const User *GEP)
void erase_if(Container &C, UnaryPredicate P)
Provide a container algorithm similar to C++ Library Fundamentals v2's erase_if which is equivalent t...
Type * getLoadStoreType(const Value *I)
A helper function that returns the type of a load or store instruction.
void initializeLoadStoreVectorizerLegacyPassPass(PassRegistry &)
Implement std::hash so that hash_code can be used in STL containers.
This struct is a compact representation of a valid (non-zero power of two) alignment.
uint64_t value() const
This is a hole in the type system and should not be abused.
An information struct used to provide DenseMap with the various necessary components for a given valu...
This struct is a compact representation of a valid (power of two) or undefined (0) alignment.