//===- StackColoring.cpp --------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// This pass implements the stack-coloring optimization that looks for
// lifetime marker machine instructions (LIFETIME_START and LIFETIME_END),
// which represent the possible lifetime of stack slots. It attempts to
// merge disjoint stack slots and reduce the used stack space.
// NOTE: This pass is not StackSlotColoring, which optimizes spill slots.
//
// TODO: In the future we plan to improve stack coloring in the following ways:
// 1. Allow merging multiple small slots into a single larger slot at different
// 2. Merge this pass with StackSlotColoring and allow merging of allocas with
//
//===----------------------------------------------------------------------===//

#include "llvm/Config/llvm-config.h"

#define DEBUG_TYPE "stack-coloring"

/// The user may write code that uses allocas outside of the declared lifetime
/// zone. This can happen when the user returns a reference to a local
/// data-structure. We can detect these cases and decide not to optimize the
/// code. If this flag is enabled, we try to save the user. This option
/// is treated as overriding LifetimeStartOnFirstUse below.
cl::desc("Do not optimize lifetime zones that "

/// Enable enhanced dataflow scheme for lifetime analysis (treat first
/// use of stack slot as start of slot lifetime, as opposed to looking
/// for LIFETIME_START marker). See "Implementation notes" below for
cl::desc("Treat stack lifetimes as starting on first use, not on START marker."));

STATISTIC(NumMarkerSeen, "Number of lifetime markers found.");
STATISTIC(StackSpaceSaved, "Number of bytes saved due to merging slots.");
STATISTIC(StackSlotMerged, "Number of stack slot merged.");
STATISTIC(EscapedAllocas, "Number of allocas that escaped the lifetime region");

//===----------------------------------------------------------------------===//
//===----------------------------------------------------------------------===//
//
// Stack Coloring reduces stack usage by merging stack slots when they
// can't be used together. For example, consider the following C program:
//
//     void bar(char *, int);
//     void foo(bool var) {
//
// Naively-compiled, this program would use 12k of stack space. However, the
// stack slot corresponding to `z` is always destroyed before either of the
// stack slots for `x` or `y` are used, and then `x` is only used if `var`
// is true, while `y` is only used if `var` is false. So at no time are two
// of the stack slots in use together, and therefore we can merge them,
// compiling the function using only a single 4k alloca:
//
//     void foo(bool var) { // equivalent
//
// This is an important optimization if we want stack space to be under
// control in large functions, both open-coded ones and ones created by
//
// Implementation Notes:
// ---------------------
//
// An important part of the above reasoning is that `z` can't be accessed
// while the latter 2 calls to `bar` are running. This is justified because
// `z`'s lifetime is over after we exit from block `A:`, so any further
// accesses to it would be UB. The way we represent this information
// in LLVM is by having frontends delimit blocks with `lifetime.start`
// and `lifetime.end` intrinsics.
//
// The effect of these intrinsics seems to be as follows (maybe I should
// specify this in the reference?):
//
//   L1) at start, each stack-slot is marked as *out-of-scope*, unless no
//   lifetime intrinsic refers to that stack slot, in which case
//   it is marked as *in-scope*.
//   L2) on a `lifetime.start`, a stack slot is marked as *in-scope* and
//   the stack slot is overwritten with `undef`.
//   L3) on a `lifetime.end`, a stack slot is marked as *out-of-scope*.
//   L4) on function exit, all stack slots are marked as *out-of-scope*.
//   L5) `lifetime.end` is a no-op when called on a slot that is already
//   L6) memory accesses to *out-of-scope* stack slots are UB.
//   L7) when a stack-slot is marked as *out-of-scope*, all pointers to it
//   are invalidated, unless the slot is "degenerate". This is used to
//   justify not marking slots as in-use until the pointer to them is
//   used, but feels a bit hacky in the presence of things like LICM. See
//   the "Degenerate Slots" section for more details.
//
// Now, let's ground stack coloring on these rules. We'll define a slot
// as *in-use* at a (dynamic) point in execution if it either can be
// written to at that point, or if it has a live and non-undef content
//
// Obviously, slots that are never *in-use* together can be merged, and
// in our example `foo`, the slots for `x`, `y` and `z` are never
// in-use together (of course, sometimes slots that *are* in-use together
// might still be mergeable, but we don't care about that here).
//
// In this implementation, we successively merge pairs of slots that are
// not *in-use* together. We could be smarter - for example, we could merge
// a single large slot with 2 small slots, or we could construct the
// interference graph and run a "smart" graph coloring algorithm, but with
// that aside, how do we find out whether a pair of slots might be *in-use*
//
// From our rules, we see that *out-of-scope* slots are never *in-use*,
// and from (L7) we see that "non-degenerate" slots remain non-*in-use*
// until their address is taken. Therefore, we can approximate slot activity
//
// A subtle point: naively, we might try to figure out which pairs of
// stack-slots interfere by propagating `S in-use` through the CFG for every
// stack-slot `S`, and having `S` and `T` interfere if there is a CFG point in
// which they are both *in-use*.
//
// That is sound, but overly conservative in some cases: in our (artificial)
// example `foo`, either `x` or `y` might be in use at the label `B:`, but
// as `x` is only in use if we came in from the `var` edge and `y` only
// if we came from the `!var` edge, they still can't be in use together.
// See PR32488 for an important real-life case.
//
// If we wanted to find all points of interference precisely, we could
// propagate `S in-use` and `S&T in-use` predicates through the CFG. That
// would be precise, but requires propagating `O(n^2)` dataflow facts.
//
// However, we aren't interested in the *set* of points of interference
// between 2 stack slots, only *whether* there *is* such a point. So we
// can rely on a little trick: for `S` and `T` to be in-use together,
// one of them needs to become in-use while the other is in-use (or
// they might both become in use simultaneously). We can check this
// by also keeping track of the points at which a stack slot might *start*
//
// Consider the following motivating example:
//
//     char b1[1024], b2[1024];
//       char b4[1024], b5[1024];
//       <uses of b2, b4, b5>;
//
// In the code above, "b3" and "b4" are declared in distinct lexical
// scopes, meaning that it is easy to prove that they can share the
// same stack slot. Variables "b1" and "b2" are declared in the same
// scope, meaning that from a lexical point of view, their lifetimes
// overlap. From a control flow point of view, however, the two
// variables are accessed in disjoint regions of the CFG, thus it
// should be possible for them to share the same stack slot. An ideal
// stack allocation for the function above would look like:
//
// Achieving this allocation is tricky, however, due to the way
// lifetime markers are inserted.
// Here is a simplified view of the
// control flow graph for the code above:
//
//                +------  block 0 -------+
//               0| LIFETIME_START b1, b2 |
//               1| <test 'if' condition> |
//                +-----------------------+
//   +------  block 1 -------+   +------  block 2 -------+
//  2| LIFETIME_START b3     |  5| LIFETIME_START b4, b5 |
//  3| <uses of b1, b3>      |  6| <uses of b2, b4, b5>  |
//  4| LIFETIME_END b3       |  7| LIFETIME_END b4, b5   |
//   +-----------------------+   +-----------------------+
//                +------  block 3 -------+
//               9| LIFETIME_END b1, b2   |
//                +-----------------------+
//
// If we create live intervals for the variables above strictly based
// on the lifetime markers, we'll get the set of intervals on the
// left. If we ignore the lifetime start markers and instead treat a
// variable's lifetime as beginning with the first reference to the
// var, then we get the intervals on the right.
//
//            LIFETIME_START      First Use
//     b1:    [0,9]               [3,4] [8,9]
//
// For the intervals on the left, the best we can do is overlap two
// variables (b3 and b4, for example); this gives us a stack size of
// 4*1024 bytes, not ideal. When treating first-use as the start of a
// lifetime, we can additionally overlap b1 and b5, giving us a 3*1024
// byte stack (better).
//
// Relying entirely on first-use of stack slots is problematic,
// however, due to the fact that optimizations can sometimes migrate
// uses of a variable outside of its lifetime start/end region. Here
//
//     char b1[1024], b2[1024];
//
// Before optimization, the control flow graph for the code above
// might look like the following:
//
//                +------  block 0 -------+
//               0| LIFETIME_START b1, b2 |
//               1| <test 'if' condition> |
//                +-----------------------+
//   +------  block 1 -------+   +------- block 2 -------+
//  2| <uses of b2>          |  3| <uses of b1>          |
//   +-----------------------+   +-----------------------+
//              |                +------- block 3 -------+ <-\.
//              |               4| <while condition>     |    |
//              |                +-----------------------+    |
//              |              /  +------- block 4 -------+
//              \             /  5| LIFETIME_START b3     |   |
//               \           /   6| <uses of b3>          |   |
//                \         /    7| LIFETIME_END b3       |   |
//                 \        |    +------------------------+   |
//            +------  block 5 -----+          \---------------
//           9| LIFETIME_END b1, b2 |
//            +---------------------+
//
// During optimization, however, it can happen that an instruction
// computing an address in "b3" (for example, a loop-invariant GEP) is
// hoisted up out of the loop from block 4 to block 2. [Note that
// this is not an actual load from the stack, only an instruction that
// computes the address to be loaded]. If this happens, there is now a
// path leading from the first use of b3 to the return instruction
// that does not encounter the b3 LIFETIME_END, hence b3's lifetime is
// now larger than if we were computing live intervals strictly based
// on lifetime markers. In the example above, this lengthened lifetime
// would mean that it would appear illegal to overlap b3 with b2.
//
// To deal with such cases, the code in ::collectMarkers() below
// tries to identify "degenerate" slots -- those slots where on a single
// forward pass through the CFG we encounter a first reference to slot
// K before we hit the slot K lifetime start marker. For such slots,
// we fall back on using the lifetime start marker as the beginning of
// the variable's lifetime. NB: with this implementation, slots can
// appear degenerate in cases where there is unstructured control flow:
//
//       memcpy(&b[0], ...);
//
// If in the RPO ordering chosen to walk the CFG we happen to visit the b[k]
// before visiting the memcpy block (which will contain the lifetime start
// for "b"), then it will appear that 'b' has a degenerate lifetime.

/// StackColoring - A machine pass for merging disjoint stack allocations,
/// marked by the LIFETIME_START and LIFETIME_END pseudo instructions.
  /// A class representing liveness information for a single basic block.
  /// Each bit in the BitVector represents the liveness property
  /// for a different stack slot.
  struct BlockLifetimeInfo {
    /// Which slots BEGIN in each basic block.
    /// Which slots END in each basic block.
    /// Which slots are marked as LIVE_IN, coming into each basic block.
    /// Which slots are marked as LIVE_OUT, coming out of each basic block.

  /// Maps active slots (per bit) for each basic block.
  LivenessMap BlockLiveness;

  /// Maps serial numbers to basic blocks.
  /// Maps basic blocks to a serial number.
  /// Maps slots to their use interval. Outside of this interval, slot
  /// values are either dead or `undef` and they will not be written to.
  /// Maps slots to the points where they can become in-use.
  /// VNInfo is used for the construction of LiveIntervals.
  /// SlotIndex analysis object.
  /// The list of lifetime markers found. These markers are to be removed
  /// once the coloring is done.
  /// Record the FI slots for which we have seen some sort of
  /// lifetime marker (either start or end).
  /// FI slots that need to be handled conservatively (for these
  /// slots lifetime-start-on-first-use is disabled).
  /// Number of iterations taken during data flow analysis.
  unsigned NumIterations;

  StackColoring(SlotIndexes *Indexes) : Indexes(Indexes) {}

  /// Used in collectMarkers

  void dumpIntervals() const;
  void dumpBV(const char *tag, const BitVector &BV) const;

  /// Removes all of the lifetime marker instructions from the function.
  /// \returns true if any markers were removed.
  bool removeAllMarkers();

  /// Scan the machine function and find all of the lifetime markers.
  /// Record the findings in the BEGIN and END vectors.
  /// \returns the number of markers found.
  unsigned collectMarkers(unsigned NumSlot);

  /// Perform the dataflow calculation and calculate the lifetime for each of
  /// the slots, based on the BEGIN/END vectors. Set the LifetimeLIVE_IN and
  /// LifetimeLIVE_OUT maps that represent which stack slots are live coming
  /// in and out blocks.
  void calculateLocalLiveness();

  /// Returns TRUE if we're using the first-use-begins-lifetime method for
  /// this slot (if FALSE, then the start marker is treated as start of lifetime).
  bool applyFirstUse(int Slot) {
    if (ConservativeSlots.test(Slot))

  /// Examines the specified instruction and returns TRUE if the instruction
  /// represents the start or end of an interesting lifetime. The slot or slots
  /// starting or ending are added to the vector "slots" and "isStart" is set
  /// \returns True if inst contains a lifetime start or end

  /// Construct the LiveIntervals for the slots.
  void calculateLiveIntervals(unsigned NumSlots);

  /// Go over the machine function and change instructions which use stack
  /// slots to use the joint slots.

  /// The input program may contain instructions which are not inside lifetime
  /// markers. This can happen due to a bug in the compiler or due to a bug in
  /// user code (for example, returning a reference to a local variable).
  /// This procedure checks all of the instructions in the function and
  /// invalidates lifetime ranges which do not contain all of the instructions
  /// which access that frame slot.
  void removeInvalidSlotRanges();

  /// Map entries which point to other entries to their destination.
  ///   A->B->C becomes A->C.

} // end anonymous namespace

char StackColoringLegacy::ID = 0;
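
The chain-collapsing step documented above ("A->B->C becomes A->C") can be illustrated outside of LLVM. The following is a minimal sketch, not the pass's own code: `collapseRemapChains` is a hypothetical name, and `std::unordered_map` stands in for whatever map type the pass uses. It assumes the remap contains no cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>

// Hypothetical stand-alone helper: rewrites every entry of SlotRemap so that
// it points directly at the final destination of its chain.
void collapseRemapChains(std::unordered_map<int, int> &SlotRemap,
                         unsigned NumSlots) {
  for (unsigned I = 0; I < NumSlots; ++I) {
    auto It = SlotRemap.find(static_cast<int>(I));
    if (It == SlotRemap.end())
      continue; // Slot I is not remapped.

    // As long as the target is itself remapped, keep following the chain.
    int Target = It->second;
    std::size_t Steps = 0;
    for (auto Next = SlotRemap.find(Target); Next != SlotRemap.end();
         Next = SlotRemap.find(Target)) {
      Target = Next->second;
      ++Steps;
      assert(Steps <= SlotRemap.size() && "remap chain must not contain a cycle");
    }
    It->second = Target;
  }
}
```

With the remap {0->1, 1->2, 3->0}, every key ends up pointing at slot 2, so a later rewrite of frame indices needs only one lookup per operand.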
                    "Merge disjoint stack slots", false, false)

void StackColoringLegacy::getAnalysisUsage(AnalysisUsage &AU) const {

#if !defined(NDEBUG) || defined(LLVM_ENABLE_DUMP)
  dbgs() << tag << " : { ";
  for (unsigned I = 0, E = BV.size(); I != E; ++I)

  LivenessMap::const_iterator BI = BlockLiveness.find(MBB);
  assert(BI != BlockLiveness.end() && "Block not found");
  const BlockLifetimeInfo &BlockInfo = BI->second;

  dumpBV("BEGIN", BlockInfo.Begin);
  dumpBV("END", BlockInfo.End);
  dumpBV("LIVE_IN", BlockInfo.LiveIn);
  dumpBV("LIVE_OUT", BlockInfo.LiveOut);

  for (unsigned I = 0, E = Intervals.size(); I != E; ++I) {
    dbgs() << "Interval[" << I << "]:\n";
    Intervals[I]->dump();

  assert((MI.getOpcode() == TargetOpcode::LIFETIME_START ||
          MI.getOpcode() == TargetOpcode::LIFETIME_END) &&
         "Expected LIFETIME_START or LIFETIME_END op");

  // At the moment the only way to end a variable lifetime is with
  // a LIFETIME_END op (which can't contain a start). If things
  // change and the IR allows for a single inst that both begins
  // and ends lifetime(s), this interface will need to be reworked.
  if (MI.getOpcode() == TargetOpcode::LIFETIME_START ||
      MI.getOpcode() == TargetOpcode::LIFETIME_END) {
    if (!InterestingSlots.test(Slot))
    slots.push_back(Slot);
    if (MI.getOpcode() == TargetOpcode::LIFETIME_END) {
    if (!applyFirstUse(Slot)) {
  if (!MI.isDebugInstr()) {
      int Slot = MO.getIndex();
      if (InterestingSlots.test(Slot) && applyFirstUse(Slot)) {
        slots.push_back(Slot);

unsigned StackColoring::collectMarkers(unsigned NumSlot) {
  unsigned MarkersFound = 0;
  BlockBitVecMap SeenStartMap;
  InterestingSlots.clear();
  InterestingSlots.resize(NumSlot);
  ConservativeSlots.clear();
  ConservativeSlots.resize(NumSlot);

  // number of start and end lifetime ops for each slot

  // Step 1: collect markers and populate the "InterestingSlots"
  // and "ConservativeSlots" sets.

    // Compute the set of slots for which we've seen a START marker but have
    // not yet seen an END marker at this point in the walk (e.g. on entry
    BetweenStartEnd.resize(NumSlot);
      BlockBitVecMap::const_iterator I = SeenStartMap.find(Pred);
      if (I != SeenStartMap.end()) {
        BetweenStartEnd |= I->second;

    // Walk the instructions in the block to look for start/end ops.
      if (MI.isDebugInstr())
      if (MI.getOpcode() == TargetOpcode::LIFETIME_START ||
          MI.getOpcode() == TargetOpcode::LIFETIME_END) {
        InterestingSlots.set(Slot);
        if (MI.getOpcode() == TargetOpcode::LIFETIME_START) {
          BetweenStartEnd.set(Slot);
          NumStartLifetimes[Slot] += 1;
          BetweenStartEnd.reset(Slot);
          NumEndLifetimes[Slot] += 1;
                            << " with allocation: " << Allocation->getName() << "\n");
          int Slot = MO.getIndex();
          if (!BetweenStartEnd.test(Slot)) {
            ConservativeSlots.set(Slot);
    SeenStart |= BetweenStartEnd;

  // PR27903: slots with multiple start or end lifetime ops are not
  // safe to enable for "lifetime-start-on-first-use".
  for (unsigned slot = 0; slot < NumSlot; ++slot) {
    if (NumStartLifetimes[slot] > 1 || NumEndLifetimes[slot] > 1)
      ConservativeSlots.set(slot);

  // The write to the catch object by the personality function is not properly
  // modeled in IR: It happens before any cleanuppads are executed, even if the
  // first mention of the catch object is in a catchpad. As such, mark catch
  // object slots as conservative, so they are excluded from first-use analysis.
        if (H.CatchObj.FrameIndex != std::numeric_limits<int>::max() &&
            H.CatchObj.FrameIndex >= 0)
          ConservativeSlots.set(H.CatchObj.FrameIndex);

  LLVM_DEBUG(dumpBV("Conservative slots", ConservativeSlots));

  // Step 2: compute begin/end sets for each block

  // NOTE: We use a depth-first iteration to ensure that we obtain a
  // deterministic numbering.
    // Assign a serial number to this basic block.
    BasicBlocks[MBB] = BasicBlockNumbering.size();

    // Keep a reference to avoid repeated lookups.
    BlockLifetimeInfo &BlockInfo = BlockLiveness[MBB];

    BlockInfo.Begin.resize(NumSlot);
    BlockInfo.End.resize(NumSlot);

      if (isLifetimeStartOrEnd(MI, slots, isStart)) {
          assert(slots.size() == 1 && "unexpected: MI ends multiple slots");
          if (BlockInfo.Begin.test(Slot)) {
            BlockInfo.Begin.reset(Slot);
          BlockInfo.End.set(Slot);
          for (auto Slot : slots) {
                         << " with allocation: " << Allocation->getName());
            if (BlockInfo.End.test(Slot)) {
              BlockInfo.End.reset(Slot);
            BlockInfo.Begin.set(Slot);

  NumMarkerSeen += MarkersFound;

void StackColoring::calculateLocalLiveness() {
  unsigned NumIters = 0;

    // Create BitVector outside the loop and reuse them to avoid repeated heap
      // Use an iterator to avoid repeated lookups.
      LivenessMap::iterator BI = BlockLiveness.find(BB);
      assert(BI != BlockLiveness.end() && "Block not found");
      BlockLifetimeInfo &BlockInfo = BI->second;

      // Compute LiveIn by unioning together the LiveOut sets of all preds.
        LivenessMap::const_iterator I = BlockLiveness.find(Pred);
        // PR37130: transformations prior to stack coloring can
        // sometimes leave behind statically unreachable blocks; these
        // can be safely skipped here.
        if (I != BlockLiveness.end())
          LocalLiveIn |= I->second.LiveOut;

      // Compute LiveOut by subtracting out lifetimes that end in this
      // block, then adding in lifetimes that begin in this block. If
      // we have both BEGIN and END markers in the same basic block
      // then we know that the BEGIN marker comes after the END,
      // because we already handle the case where the BEGIN comes
      // before the END when collecting the markers (and building the
      // BEGIN/END vectors).
      LocalLiveOut = LocalLiveIn;
      LocalLiveOut.reset(BlockInfo.End);
      LocalLiveOut |= BlockInfo.Begin;

      // Update block LiveIn set, noting whether it has changed.
      if (LocalLiveIn.test(BlockInfo.LiveIn)) {
        BlockInfo.LiveIn |= LocalLiveIn;

      // Update block LiveOut set, noting whether it has changed.
      if (LocalLiveOut.test(BlockInfo.LiveOut)) {
        BlockInfo.LiveOut |= LocalLiveOut;

  NumIterations = NumIters;
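
The fixed-point iteration above (LiveIn is the union of the predecessors' LiveOut; LiveOut is LiveIn minus END plus BEGIN) can be sketched without any LLVM types. This is an illustrative model under stated assumptions, not the pass's implementation: `Bits`, `BlockInfo` and `solveLiveness` are hypothetical names, `std::vector<bool>` stands in for `BitVector`, and blocks are identified by their index in a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Bits = std::vector<bool>; // one bit per stack slot

struct BlockInfo {
  Bits Begin, End;                 // slots that begin / end in this block
  Bits LiveIn, LiveOut;            // computed by solveLiveness
  std::vector<std::size_t> Preds;  // indices of predecessor blocks
};

// Iterates to a fixed point and returns the number of sweeps performed.
unsigned solveLiveness(std::vector<BlockInfo> &Blocks, std::size_t NumSlots) {
  for (BlockInfo &B : Blocks) {
    assert(B.Begin.size() == NumSlots && B.End.size() == NumSlots);
    B.LiveIn.assign(NumSlots, false);
    B.LiveOut.assign(NumSlots, false);
  }

  unsigned NumIters = 0;
  bool Changed = true;
  while (Changed) {
    Changed = false;
    ++NumIters;
    for (BlockInfo &B : Blocks) {
      // LiveIn: union of the LiveOut sets of all predecessors.
      Bits In(NumSlots, false);
      for (std::size_t P : B.Preds)
        for (std::size_t S = 0; S < NumSlots; ++S)
          if (Blocks[P].LiveOut[S])
            In[S] = true;

      // LiveOut: remove lifetimes ending here, then add those beginning here.
      Bits Out(NumSlots, false);
      for (std::size_t S = 0; S < NumSlots; ++S)
        Out[S] = (In[S] && !B.End[S]) || B.Begin[S];

      if (In != B.LiveIn) {
        B.LiveIn = In;
        Changed = true;
      }
      if (Out != B.LiveOut) {
        B.LiveOut = Out;
        Changed = true;
      }
    }
  }
  return NumIters;
}
```

Because the sets only grow from empty, assigning the freshly computed sets gives the same result as or-ing them into the previous ones, which is what the code above does with `|=`.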

void StackColoring::calculateLiveIntervals(unsigned NumSlots) {
  // For each block, find which slots are active within this block
  // and update the live intervals.
    DefinitelyInUse.clear();
    DefinitelyInUse.resize(NumSlots);

    // Start the interval of the slots that we previously found to be 'in-use'.
    BlockLifetimeInfo &MBBLiveness = BlockLiveness[&MBB];
    for (int pos = MBBLiveness.LiveIn.find_first(); pos != -1;
         pos = MBBLiveness.LiveIn.find_next(pos)) {

    // Create the interval for the basic blocks containing lifetime begin/end.
      if (!isLifetimeStartOrEnd(MI, slots, IsStart))
      for (auto Slot : slots) {
          // If a slot is already definitely in use, we don't have to emit
          // a new start marker because there is already a pre-existing
          if (!DefinitelyInUse[Slot]) {
            DefinitelyInUse[Slot] = true;
            Starts[Slot] = ThisIndex;
            VNInfo *VNI = Intervals[Slot]->getValNumInfo(0);
            Intervals[Slot]->addSegment(
            DefinitelyInUse[Slot] = false;

    // Finish up started segments
    for (unsigned i = 0; i < NumSlots; ++i) {
      VNInfo *VNI = Intervals[i]->getValNumInfo(0);

bool StackColoring::removeAllMarkers() {
    MI->eraseFromParent();

  unsigned FixedInstr = 0;
  unsigned FixedMemOp = 0;
  unsigned FixedDbg = 0;

  // Remap debug information that refers to stack slots.
  for (auto &VI : MF->getVariableDbgInfo()) {
    if (!VI.Var || !VI.inStackSlot())
    int Slot = VI.getStackSlot();
    if (SlotRemap.count(Slot)) {
                        << cast<DILocalVariable>(VI.Var)->getName() << "].\n");
      VI.updateStackSlot(SlotRemap[Slot]);

  // Keep a list of *allocas* which need to be remapped.
  // Keep a list of allocas that have been affected by the remap.
  for (const std::pair<int, int> &SI : SlotRemap) {
    assert(To && From && "Invalid allocation object");

    // If From is before To, it's possible that there is a use of From between
    if (From->comesBefore(To))

    // AA might be used later for instruction scheduling, and we need it to be
    // able to deduce the correct aliasing relationships between pointers
    // derived from the alloca being remapped and the target of that remapping.
    // The only safe way, without directly informing AA about the remapping
    // somehow, is to directly update the IR to reflect the change being made

    // We keep both slots to maintain AliasAnalysis metadata later.

    // Transfer the stack protector layout tag, but make sure that SSPLK_AddrOf
    // does not overwrite SSPLK_SmallArray or SSPLK_LargeArray, and make sure
    // that SSPLK_SmallArray does not overwrite SSPLK_LargeArray.

    // The new alloca might not be valid in a llvm.dbg.declare for this
    // variable, so poison out the use to make the verifier happy.
    for (auto &Use : FromAI->uses()) {
        if (BCI->isUsedByMetadata())

    // Note that this will not replace uses in MMOs (which we'll update below),
    // or anywhere else (which is why we won't delete the original

  // Remap all instructions to the new stack slots.
  std::vector<std::vector<MachineMemOperand *>> SSRefs(
      // Skip lifetime markers. We'll remove them soon.
      if (I.getOpcode() == TargetOpcode::LIFETIME_START ||
          I.getOpcode() == TargetOpcode::LIFETIME_END)

      // Update the MachineMemOperand to use the new alloca.
        // We've replaced IR-level uses of the remapped allocas, so we only
        // need to replace direct uses here.
        const AllocaInst *AI = dyn_cast_or_null<AllocaInst>(MMO->getValue());
        if (!Allocas.count(AI))
        MMO->setValue(Allocas[AI]);

      // Update all of the machine instruction operands.
        int FromSlot = MO.getIndex();

        // Don't touch arguments.

        // Only look at mapped slots.
        if (!SlotRemap.count(FromSlot))

        // In a debug build, check that the instruction that we are modifying is
        // inside the expected live range. If the instruction is not inside
        // the calculated range then it means that the alloca usage moved
        // outside of the lifetime markers, or that the user has a bug.
        // NOTE: Alloca address calculations which happen outside the lifetime
        // zone are okay, despite the fact that we don't have a good way
        // for validating all of the usages of the calculation.
        bool TouchesMemory = I.mayLoadOrStore();
        // If we *don't* protect the user from escaped allocas, don't bother
        // validating the instructions.
                 "Found instruction usage outside of live range.");

        // Fix the machine instructions.
        int ToSlot = SlotRemap[FromSlot];
        MO.setIndex(ToSlot);

      // We adjust AliasAnalysis information for merged stack slots.
      bool ReplaceMemOps = false;
        // Collect MachineMemOperands which reference
        // FixedStackPseudoSourceValues with old frame indices.
        if (const auto *FSV = dyn_cast_or_null<FixedStackPseudoSourceValue>(
                MMO->getPseudoValue())) {
          int FI = FSV->getFrameIndex();
          auto To = SlotRemap.find(FI);
          if (To != SlotRemap.end())
            SSRefs[FI].push_back(MMO);

        // If this memory location can be a slot remapped here,
        // we remove AA information.
        bool MayHaveConflictingAAMD = false;
        if (MMO->getAAInfo()) {
          if (const Value *MMOV = MMO->getValue()) {
              MayHaveConflictingAAMD = true;
              for (Value *V : Objs) {
                // If this memory location comes from a known stack slot
                // that is not remapped, we continue checking.
                // Otherwise, we need to invalidate AA information.
                const AllocaInst *AI = dyn_cast_or_null<AllocaInst>(V);
                if (AI && MergedAllocas.count(AI)) {
                  MayHaveConflictingAAMD = true;
        if (MayHaveConflictingAAMD) {
          ReplaceMemOps = true;

      // If any memory operand is updated, set memory references of
        I.setMemRefs(*MF, NewMMOs);

  // Rewrite MachineMemOperands that reference old frame indices.
    if (!E.value().empty()) {
          MF->getPSVManager().getFixedStack(SlotRemap.find(E.index())->second);
        Ref->setValue(NewSV);

  // Update the location of C++ catch objects for the MSVC personality routine.
        if (H.CatchObj.FrameIndex != std::numeric_limits<int>::max() &&
            SlotRemap.count(H.CatchObj.FrameIndex))
          H.CatchObj.FrameIndex = SlotRemap[H.CatchObj.FrameIndex];

  LLVM_DEBUG(dbgs() << "Fixed " << FixedMemOp << " machine memory operands.\n");
  LLVM_DEBUG(dbgs() << "Fixed " << FixedDbg << " debug locations.\n");
  LLVM_DEBUG(dbgs() << "Fixed " << FixedInstr << " machine instructions.\n");

void StackColoring::removeInvalidSlotRanges() {
      if (I.getOpcode() == TargetOpcode::LIFETIME_START ||
          I.getOpcode() == TargetOpcode::LIFETIME_END || I.isDebugInstr())

      // Some intervals are suspicious! In some cases we find address
      // calculations outside of the lifetime zone, but not actual memory
      // read or write. Memory accesses outside of the lifetime zone are a clear
      // violation, but address calculations are okay. This can happen when
      // GEPs are hoisted outside of the lifetime zone.
      // So, in here we only check instructions which can read or write memory.
      if (!I.mayLoad() && !I.mayStore())

      // Check all of the machine operands.
        int Slot = MO.getIndex();
        if (Intervals[Slot]->empty())

        // Check that the used slot is inside the calculated lifetime range.
        // If it is not, warn about it and invalidate the range.

  // Expunge slot remap map.
  for (unsigned i=0; i < NumSlots; ++i) {
    // If we are remapping i
    if (SlotRemap.count(i)) {
      // As long as our target is mapped to something else, follow it.

  StackColoring SC(&getAnalysis<SlotIndexesWrapperPass>().getSI());

                    << "********** Function: " << Func.getName() << '\n');
  BlockLiveness.clear();
  BasicBlocks.clear();
  BasicBlockNumbering.clear();
  VNInfoAllocator.Reset();

  // If there are no stack slots then there are no markers to remove.
  SortedSlots.reserve(NumSlots);
  LiveStarts.resize(NumSlots);

  unsigned NumMarkers = collectMarkers(NumSlots);

  unsigned TotalSize = 0;
  LLVM_DEBUG(dbgs() << "Found " << NumMarkers << " markers and " << NumSlots
  LLVM_DEBUG(dbgs() << "Total Stack size: " << TotalSize << " bytes\n\n");

  // Don't continue because there are not enough lifetime markers, or the
  // stack is too small, or we are told not to optimize the slots.
    return removeAllMarkers();

  for (unsigned i=0; i < NumSlots; ++i) {
    std::unique_ptr<LiveInterval> LI(new LiveInterval(i, 0));
    LI->getNextValue(Indexes->getZeroIndex(), VNInfoAllocator);

  // Calculate the liveness of each block.
  calculateLocalLiveness();
  LLVM_DEBUG(dbgs() << "Dataflow iterations: " << NumIterations << "\n");

  // Propagate the liveness information.
  calculateLiveIntervals(NumSlots);

  // Search for allocas which are used outside of the declared lifetime
    removeInvalidSlotRanges();

  // Maps old slots to new slots.
  unsigned RemovedSlots = 0;
  unsigned ReducedSize = 0;

  // Do not bother looking at empty intervals.
  for (unsigned I = 0; I < NumSlots; ++I) {
    if (Intervals[SortedSlots[I]]->empty())
      SortedSlots[I] = -1;
1281// This is a simple greedy algorithm for merging allocas. First, sort the 1282// slots, placing the largest slots first. Next, perform an n^2 scan and look 1283// for disjoint slots. When you find disjoint slots, merge the smaller one 1284// into the bigger one and update the live interval. Remove the small alloca 1287// Sort the slots according to their size. Place unused slots at the end. 1288// Use stable sort to guarantee deterministic code generation. 1290// We use -1 to denote a uninteresting slot. Place these slots at the end. 1295// Sort according to size. 1299for (
auto &s : LiveStarts)

  for (unsigned I = 0; I < NumSlots; ++I) {
    if (SortedSlots[I] == -1)

    for (unsigned J = I + 1; J < NumSlots; ++J) {
      if (SortedSlots[J] == -1)

      int FirstSlot = SortedSlots[I];
      int SecondSlot = SortedSlots[J];

      // Objects with different stack IDs cannot be merged.

      auto &FirstS = LiveStarts[FirstSlot];
      auto &SecondS = LiveStarts[SecondSlot];

      // Merge disjoint slots. This is a little bit tricky - see the
      // Implementation Notes section for an explanation.
      if (!First->isLiveAtIndexes(SecondS) &&

        First->MergeSegmentsInAsValue(*Second, First->getValNumInfo(0));

        int OldSize = FirstS.size();
        FirstS.append(SecondS.begin(), SecondS.end());
        auto Mid = FirstS.begin() + OldSize;
        std::inplace_merge(FirstS.begin(), Mid, FirstS.end());

        SlotRemap[SecondSlot] = FirstSlot;
        SortedSlots[J] = -1;

                   << SecondSlot << " together.\n");

               "Merging a small object into a larger one");

  // Record statistics.
  StackSpaceSaved += ReducedSize;
  StackSlotMerged += RemovedSlots;

                    << ReducedSize << " bytes\n");

  // Scan the entire function and update all machine operands that use frame
  // indices to use the remapped frame index.
  if (!SlotRemap.empty()) {
    expungeSlotMap(SlotRemap, NumSlots);
    remapInstructions(SlotRemap);

  return removeAllMarkers();
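
The greedy merge described in the comments of this listing can be sketched outside of LLVM as follows. This is a minimal, standalone illustration under stated assumptions, not the pass's implementation: `Slot`, `overlaps`, and `mergeDisjointSlots` are hypothetical names, live ranges are modeled as plain half-open integer intervals, and disjointness is a simple pairwise overlap check rather than the `isLiveAtIndexes` test on sorted start points used in the listing.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical stand-in for a stack slot: a size in bytes plus the
// half-open ranges [begin, end) during which the slot is live.
struct Slot {
  unsigned Size;
  std::vector<std::pair<int, int>> Live;
};

// True if any live range of A overlaps any live range of B.
static bool overlaps(const Slot &A, const Slot &B) {
  for (const auto &RA : A.Live)
    for (const auto &RB : B.Live)
      if (RA.first < RB.second && RB.first < RA.second)
        return true;
  return false;
}

// Greedy merge. Returns Remap, where Remap[i] is the slot that slot i was
// folded into, or i itself if the slot was kept.
std::vector<int> mergeDisjointSlots(std::vector<Slot> Slots) {
  const int N = static_cast<int>(Slots.size());
  std::vector<int> Sorted(N), Remap(N);
  for (int I = 0; I < N; ++I)
    Sorted[I] = Remap[I] = I;

  // Slots that are never live are uninteresting; mark them with -1.
  for (int &S : Sorted)
    if (Slots[S].Live.empty())
      S = -1;

  // Largest slots first, -1 entries last. A stable sort keeps the result
  // deterministic when sizes tie.
  std::stable_sort(Sorted.begin(), Sorted.end(), [&](int L, int R) {
    if (L == -1)
      return false;
    if (R == -1)
      return true;
    return Slots[L].Size > Slots[R].Size;
  });

  // n^2 scan: fold each later (smaller or equal) disjoint slot into the
  // earlier (larger) one and extend the larger slot's live ranges.
  for (int I = 0; I < N; ++I) {
    if (Sorted[I] == -1)
      continue;
    for (int J = I + 1; J < N; ++J) {
      if (Sorted[J] == -1)
        continue;
      Slot &First = Slots[Sorted[I]];
      Slot &Second = Slots[Sorted[J]];
      if (overlaps(First, Second))
        continue;
      assert(Remap[Sorted[J]] == Sorted[J] && "slot merged twice");
      First.Live.insert(First.Live.end(), Second.Live.begin(),
                        Second.Live.end());
      Remap[Sorted[J]] = Sorted[I];
      Sorted[J] = -1;
    }
  }
  return Remap;
}
```

In this sketch, three slots of sizes 8, 16, and 4 bytes that are live over [0, 10), [5, 20), and [10, 15) leave the 16-byte slot alone, since it overlaps both others, and fold the 4-byte slot into the 8-byte one. Because the outer loop visits slots in sorted order, a slot that absorbs others is never itself removed later, so the remap needs no chain resolution here.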