[LV] Use ExtractLane(LastActiveLane, V) live outs when tail-folding. (WIP) #149042


Open

fhahn wants to merge 2 commits into llvm:main from fhahn:lv-tf-external-users

Conversation

fhahn (Contributor)

Building on top of #148817, use ExtractLane + FirstActiveLane to support vectorizing external users when tail-folding.

Currently marked as WIP because there is a regression when -prefer-predicate-over-epilogue=predicate-else-scalar-epilogue is used: we bail out when building VPlans, so we cannot recover and switch to non-tail-folding.

Ideally we would have built both VPlans (#148882).

See also #148603.

Depends on #148817 (included in the PR).
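
For context, the "external users" in question are values defined inside a loop and used after it. A minimal hypothetical example (not taken from the PR) of such a loop:

```cpp
// Hypothetical example: Last is defined in the loop and read after it
// (an "external user", i.e. a live-out). Under tail-folding the final
// vector iteration is masked, so the live-out must be taken from the
// last *active* lane of that iteration, not from the vector's last lane.
int lastValue(const int *A, int N) {
  int Last = 0;
  for (int I = 0; I < N; ++I)
    Last = A[I] + 1;
  return Last; // external user of a loop-defined value
}
```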

This patch adds a new ExtractLane VPInstruction which extracts across multiple parts using a wide index, to be used in combination with FirstActiveLane.

The patch updates early-exit codegen to use it instead of ExtractElement, which is only per-part. With this change, interleaving should work correctly with early-exit loops.

The patch removes the restrictions added in 6f43754 (llvm#145877), but does not yet automatically select interleave counts > 1 for early-exit loops. I'll share a patch as follow-up. The cost of extracting a lane adds non-trivial overhead in the exit block, so that should be considered when picking the interleave count.
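
As a rough scalar model of the new opcode's semantics (names assumed for illustration, not LLVM API): with interleaving, the UF per-part vectors form one logical concatenation of UF * VF lanes, and ExtractLane selects a single element from it using one wide index. The actual lowering in VPInstruction::generate (see the diff below) emits a chain of extractelement and select, since the parts are separate vector values.

```cpp
#include <cstddef>
#include <vector>

// Scalar sketch of ExtractLane semantics, assuming UF parts of VF lanes
// each: treat Parts[0] .. Parts[UF-1] as one concatenated vector and
// return the element at the wide index LaneToExtract.
int extractLane(size_t LaneToExtract,
                const std::vector<std::vector<int>> &Parts) {
  const size_t VF = Parts.front().size(); // lanes per part
  const size_t Part = LaneToExtract / VF; // part containing the lane
  const size_t Lane = LaneToExtract % VF; // lane within that part
  return Parts[Part][Lane];
}
```

For example, with VF = 4 and UF = 2, extractLane(5, {{0, 1, 2, 3}, {4, 5, 6, 7}}) returns 5, i.e. lane 1 of the second part.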
@llvmbot (Member) commented Jul 16, 2025 (edited)

@llvm/pr-subscribers-backend-risc-v
@llvm/pr-subscribers-vectorizers

@llvm/pr-subscribers-llvm-transforms

Author: Florian Hahn (fhahn)

Changes

Patch is 134.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/149042.diff

17 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp (-18)
  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+20-11)
  • (modified) llvm/lib/Transforms/Vectorize/VPlan.h (+4)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp (+2)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp (+47-5)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp (+45-3)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp (+4-4)
  • (modified) llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp (+7)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll (+56-6)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/scalable-tailfold.ll (+43-9)
  • (modified) llvm/test/Transforms/LoopVectorize/RISCV/uniform-load-store.ll (+95-38)
  • (modified) llvm/test/Transforms/LoopVectorize/X86/small-size.ll (+71-10)
  • (modified) llvm/test/Transforms/LoopVectorize/pr43166-fold-tail-by-masking.ll (+73-15)
  • (modified) llvm/test/Transforms/LoopVectorize/single-early-exit-interleave-hint.ll (+15-8)
  • (modified) llvm/test/Transforms/LoopVectorize/single-early-exit-interleave.ll (+334-32)
  • (modified) llvm/test/Transforms/LoopVectorize/use-scalar-epilogue-if-tp-fails.ll (+108-88)
  • (modified) llvm/test/Transforms/LoopVectorize/vector-loop-backedge-elimination-early-exit.ll (+25-22)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
index 969d225c6ef2e..b3b5f2aa39540 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorizationLegality.cpp
@@ -1929,24 +1929,6 @@ bool LoopVectorizationLegality::canFoldTailByMasking() const {
   for (const auto &Reduction : getReductionVars())
     ReductionLiveOuts.insert(Reduction.second.getLoopExitInstr());
-  // TODO: handle non-reduction outside users when tail is folded by masking.
-  for (auto *AE : AllowedExit) {
-    // Check that all users of allowed exit values are inside the loop or
-    // are the live-out of a reduction.
-    if (ReductionLiveOuts.count(AE))
-      continue;
-    for (User *U : AE->users()) {
-      Instruction *UI = cast<Instruction>(U);
-      if (TheLoop->contains(UI))
-        continue;
-      LLVM_DEBUG(
-          dbgs()
-          << "LV: Cannot fold tail by masking, loop has an outside user for "
-          << *UI << "\n");
-      return false;
-    }
-  }
-
   for (const auto &Entry : getInductionVars()) {
     PHINode *OrigPhi = Entry.first;
     for (User *U : OrigPhi->users()) {
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index ceeabd65cced3..dbd97cdad607f 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -8446,7 +8446,9 @@ static void addScalarResumePhis(VPRecipeBuilder &Builder, VPlan &Plan,
 /// exit block. The penultimate value of recurrences is fed to their LCSSA phi
 /// users in the original exit block using the VPIRInstruction wrapping to the
 /// LCSSA phi.
-static void addExitUsersForFirstOrderRecurrences(VPlan &Plan, VFRange &Range) {
+static bool addExitUsersForFirstOrderRecurrences(VPlan &Plan, VFRange &Range) {
+  using namespace llvm::VPlanPatternMatch;
+
   VPRegionBlock *VectorRegion = Plan.getVectorLoopRegion();
   auto *ScalarPHVPBB = Plan.getScalarPreheader();
   auto *MiddleVPBB = Plan.getMiddleBlock();
@@ -8465,6 +8467,15 @@ static void addExitUsersForFirstOrderRecurrences(VPlan &Plan, VFRange &Range) {
     assert(VectorRegion->getSingleSuccessor() == Plan.getMiddleBlock() &&
            "Cannot handle loops with uncountable early exits");
+    // TODO: Support ExtractLane of last-active-lane with first-order
+    // recurrences.
+
+    if (any_of(FOR->users(), [FOR](VPUser *U) {
+          return match(U, m_VPInstruction<VPInstruction::ExtractLane>(
+                              m_VPValue(), m_Specific(FOR)));
+        }))
+      return false;
+
     // This is the second phase of vectorizing first-order recurrences, creating
     // extract for users outside the loop. An overview of the transformation is
     // described below. Suppose we have the following loop with some use after
@@ -8536,10 +8547,10 @@ static void addExitUsersForFirstOrderRecurrences(VPlan &Plan, VFRange &Range) {
     // Extract the penultimate value of the recurrence and use it as operand for
     // the VPIRInstruction modeling the phi.
     for (VPUser *U : FOR->users()) {
-      using namespace llvm::VPlanPatternMatch;
       if (!match(U, m_VPInstruction<VPInstruction::ExtractLastElement>(
                         m_Specific(FOR))))
         continue;
+
       // For VF vscale x 1, if vscale = 1, we are unable to extract the
       // penultimate value of the recurrence. Instead we rely on the existing
       // extract of the last element from the result of
@@ -8547,13 +8558,14 @@ static void addExitUsersForFirstOrderRecurrences(VPlan &Plan, VFRange &Range) {
       // TODO: Consider vscale_range info and UF.
       if (LoopVectorizationPlanner::getDecisionAndClampRange(IsScalableOne,
                                                              Range))
-        return;
+        return true;
       VPValue *PenultimateElement = MiddleBuilder.createNaryOp(
           VPInstruction::ExtractPenultimateElement, {FOR->getBackedgeValue()},
           {}, "vector.recur.extract.for.phi");
       cast<VPInstruction>(U)->replaceAllUsesWith(PenultimateElement);
     }
   }
+  return true;
 }
 
 VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
@@ -8758,7 +8770,8 @@ VPlanPtr LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(
     R->setOperand(1, WideIV->getStepValue());
   }
 
-  addExitUsersForFirstOrderRecurrences(*Plan, Range);
+  if (!addExitUsersForFirstOrderRecurrences(*Plan, Range))
+    return nullptr;
 
   DenseMap<VPValue *, VPValue *> IVEndValues;
   addScalarResumePhis(RecipeBuilder, *Plan, IVEndValues);
@@ -9170,7 +9183,9 @@ void LoopVectorizationPlanner::adjustRecipesForReductions(
         continue;
       U->replaceUsesOfWith(OrigExitingVPV, FinalReductionResult);
       if (match(U, m_VPInstruction<VPInstruction::ExtractLastElement>(
-                       m_VPValue())))
+                       m_VPValue())) ||
+          match(U, m_VPInstruction<VPInstruction::ExtractLane>(m_VPValue(),
+                                                               m_VPValue())))
         cast<VPInstruction>(U)->replaceAllUsesWith(FinalReductionResult);
     }
@@ -10022,12 +10037,6 @@ bool LoopVectorizePass::processLoop(Loop *L) {
   // Get user vectorization factor and interleave count.
   ElementCount UserVF = Hints.getWidth();
   unsigned UserIC = Hints.getInterleave();
-  if (LVL.hasUncountableEarlyExit() && UserIC != 1) {
-    UserIC = 1;
-    reportVectorizationInfo("Interleaving not supported for loops "
-                            "with uncountable early exits",
-                            "InterleaveEarlyExitDisabled", ORE, L);
-  }
 
   // Plan how to best vectorize.
   LVP.plan(UserVF, UserIC);
diff --git a/llvm/lib/Transforms/Vectorize/VPlan.h b/llvm/lib/Transforms/Vectorize/VPlan.h
index 703cfe969577d..a81dc0bb0bef6 100644
--- a/llvm/lib/Transforms/Vectorize/VPlan.h
+++ b/llvm/lib/Transforms/Vectorize/VPlan.h
@@ -1012,6 +1012,10 @@ class LLVM_ABI_FOR_TEST VPInstruction : public VPRecipeWithIRFlags,
     ReductionStartVector,
     // Creates a step vector starting from 0 to VF with a step of 1.
     StepVector,
+    /// Extracts a single lane (first operand) from a set of vector operands.
+    /// The lane specifies an index into a vector formed by combining all vector
+    /// operands (all operands after the first one).
+    ExtractLane,
   };
diff --git a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
index b27a7ffeed208..a0f5f10beb9fa 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanAnalysis.cpp
@@ -109,6 +109,8 @@ Type *VPTypeAnalysis::inferScalarTypeForRecipe(const VPInstruction *R) {
   case VPInstruction::BuildStructVector:
   case VPInstruction::BuildVector:
     return SetResultTyFromOp();
+  case VPInstruction::ExtractLane:
+    return inferScalarType(R->getOperand(1));
   case VPInstruction::FirstActiveLane:
     return Type::getIntNTy(Ctx, 64);
   case VPInstruction::ExtractLastElement:
diff --git a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
index f0cab79197b4d..9a1e25ee2f28c 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanPredicator.cpp
@@ -14,11 +14,13 @@
 #include "VPRecipeBuilder.h"
 #include "VPlan.h"
 #include "VPlanCFG.h"
+#include "VPlanPatternMatch.h"
 #include "VPlanTransforms.h"
 #include "VPlanUtils.h"
 #include "llvm/ADT/PostOrderIterator.h"
 
 using namespace llvm;
+using namespace VPlanPatternMatch;
 
 namespace {
 class VPPredicator {
@@ -42,11 +44,6 @@ class VPPredicator {
   /// possibly inserting new recipes at \p Dst (using Builder's insertion point)
   VPValue *createEdgeMask(VPBasicBlock *Src, VPBasicBlock *Dst);
 
-  /// Returns the *entry* mask for \p VPBB.
-  VPValue *getBlockInMask(VPBasicBlock *VPBB) const {
-    return BlockMaskCache.lookup(VPBB);
-  }
-
   /// Record \p Mask as the *entry* mask of \p VPBB, which is expected to not
   /// already have a mask.
   void setBlockInMask(VPBasicBlock *VPBB, VPValue *Mask) {
@@ -66,6 +63,11 @@ class VPPredicator {
   }
 
 public:
+  /// Returns the *entry* mask for \p VPBB.
+  VPValue *getBlockInMask(VPBasicBlock *VPBB) const {
+    return BlockMaskCache.lookup(VPBB);
+  }
+
   /// Returns the precomputed predicate of the edge from \p Src to \p Dst.
   VPValue *getEdgeMask(const VPBasicBlock *Src, const VPBasicBlock *Dst) const {
     return EdgeMaskCache.lookup({Src, Dst});
@@ -300,5 +302,45 @@ VPlanTransforms::introduceMasksAndLinearize(VPlan &Plan, bool FoldTail) {
 
     PrevVPBB = VPBB;
   }
+
+  // If we folded the tail and introduced a header mask, any extract of the
+  // last element must be updated to only extract the last-active-lane of the
+  // header mask.
+  if (FoldTail) {
+    assert(Plan.getExitBlocks().size() == 1 &&
+           "only a single-exit block is supported currently");
+    VPBasicBlock *EB = Plan.getExitBlocks().front();
+    assert(EB->getSinglePredecessor() == Plan.getMiddleBlock() &&
+           "the exit block must have middle block as single predecessor");
+
+    VPValue *LastActiveLane = nullptr;
+    VPBuilder B(Plan.getMiddleBlock()->getTerminator());
+    for (auto &P : EB->phis()) {
+      auto *ExitIRI = cast<VPIRPhi>(&P);
+      VPValue *Inc = ExitIRI->getIncomingValue(0);
+      VPValue *Op;
+      if (!match(Inc, m_VPInstruction<VPInstruction::ExtractLastElement>(
+                          m_VPValue(Op))))
+        continue;
+
+      if (!LastActiveLane) {
+        // Compute the index of the last active lane, by getting the
+        // first-active-lane of the negated header mask (which is the first lane
+        // the original header mask was false) and subtract 1.
+        VPValue *HeaderMask = Predicator.getBlockInMask(
+            Plan.getVectorLoopRegion()->getEntryBasicBlock());
+        LastActiveLane = B.createNaryOp(
+            Instruction::Sub,
+            {B.createNaryOp(VPInstruction::FirstActiveLane,
+                            {B.createNot(HeaderMask)}),
+             Plan.getOrAddLiveIn(ConstantInt::get(
+                 IntegerType::get(
+                     Plan.getScalarHeader()->getIRBasicBlock()->getContext(),
+                     64),
+                 1))});
+      }
+      auto *Ext =
+          B.createNaryOp(VPInstruction::ExtractLane, {LastActiveLane, Op});
+      Inc->replaceAllUsesWith(Ext);
+    }
+  }
   return Predicator.getBlockMaskCache();
 }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
index 1664bcc3881aa..cd95f648ffc11 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanRecipes.cpp
@@ -862,6 +862,31 @@ Value *VPInstruction::generate(VPTransformState &State) {
       Res = Builder.CreateOr(Res, State.get(Op));
     return Builder.CreateOrReduce(Res);
   }
+  case VPInstruction::ExtractLane: {
+    Value *LaneToExtract = State.get(getOperand(0), true);
+    Type *IdxTy = State.TypeAnalysis.inferScalarType(getOperand(0));
+    Value *Res = nullptr;
+    Value *RuntimeVF = getRuntimeVF(State.Builder, IdxTy, State.VF);
+
+    for (unsigned Idx = 1; Idx != getNumOperands(); ++Idx) {
+      Value *VectorStart =
+          Builder.CreateMul(RuntimeVF, ConstantInt::get(IdxTy, Idx - 1));
+      Value *VectorIdx = Idx == 1
+                             ? LaneToExtract
+                             : Builder.CreateSub(LaneToExtract, VectorStart);
+      Value *Ext = State.VF.isScalar()
+                       ? State.get(getOperand(Idx))
+                       : Builder.CreateExtractElement(
+                             State.get(getOperand(Idx)), VectorIdx);
+      if (Res) {
+        Value *Cmp = Builder.CreateICmpUGE(LaneToExtract, VectorStart);
+        Res = Builder.CreateSelect(Cmp, Ext, Res);
+      } else {
+        Res = Ext;
+      }
+    }
+    return Res;
+  }
   case VPInstruction::FirstActiveLane: {
     if (getNumOperands() == 1) {
       Value *Mask = State.get(getOperand(0));
@@ -876,8 +901,17 @@ Value *VPInstruction::generate(VPTransformState &State) {
     unsigned LastOpIdx = getNumOperands() - 1;
     Value *Res = nullptr;
     for (int Idx = LastOpIdx; Idx >= 0; --Idx) {
-      Value *TrailingZeros = Builder.CreateCountTrailingZeroElems(
-          Builder.getInt64Ty(), State.get(getOperand(Idx)), true, Name);
+      Value *TrailingZeros =
+          State.VF.isScalar()
+              ? Builder.CreateZExt(
+                    Builder.CreateICmpEQ(State.get(getOperand(Idx)),
+                                         Builder.getInt1(0)),
+                    Builder.getInt64Ty())
+              : Builder.CreateCountTrailingZeroElems(
+                    //      Value *TrailingZeros =
+                    //      Builder.CreateCountTrailingZeroElems(
+                    Builder.getInt64Ty(), State.get(getOperand(Idx)), true,
+                    Name);
       Value *Current = Builder.CreateAdd(
           Builder.CreateMul(RuntimeVF, Builder.getInt64(Idx)), TrailingZeros);
       if (Res) {
@@ -920,7 +954,8 @@ InstructionCost VPInstruction::computeCost(ElementCount VF,
   }
 
   switch (getOpcode()) {
-  case Instruction::ExtractElement: {
+  case Instruction::ExtractElement:
+  case VPInstruction::ExtractLane: {
     // Add on the cost of extracting the element.
     auto *VecTy = toVectorTy(Ctx.Types.inferScalarType(getOperand(0)), VF);
     return Ctx.TTI.getVectorInstrCost(Instruction::ExtractElement, VecTy,
@@ -982,6 +1017,7 @@ bool VPInstruction::isVectorToScalar() const {
   return getOpcode() == VPInstruction::ExtractLastElement ||
          getOpcode() == VPInstruction::ExtractPenultimateElement ||
          getOpcode() == Instruction::ExtractElement ||
+         getOpcode() == VPInstruction::ExtractLane ||
          getOpcode() == VPInstruction::FirstActiveLane ||
          getOpcode() == VPInstruction::ComputeAnyOfResult ||
          getOpcode() == VPInstruction::ComputeFindIVResult ||
@@ -1040,6 +1076,7 @@ bool VPInstruction::opcodeMayReadOrWriteFromMemory() const {
   case VPInstruction::BuildVector:
   case VPInstruction::CalculateTripCountMinusVF:
   case VPInstruction::CanonicalIVIncrementForPart:
+  case VPInstruction::ExtractLane:
   case VPInstruction::ExtractLastElement:
   case VPInstruction::ExtractPenultimateElement:
   case VPInstruction::FirstActiveLane:
@@ -1088,6 +1125,8 @@ bool VPInstruction::onlyFirstLaneUsed(const VPValue *Op) const {
  case VPInstruction::ComputeAnyOfResult:
  case VPInstruction::ComputeFindIVResult:
    return Op == getOperand(1);
+  case VPInstruction::ExtractLane:
+    return Op == getOperand(0);
  };
  llvm_unreachable("switch should return");
 }
@@ -1166,6 +1205,9 @@ void VPInstruction::print(raw_ostream &O, const Twine &Indent,
   case VPInstruction::BuildVector:
     O << "buildvector";
     break;
+  case VPInstruction::ExtractLane:
+    O << "extract-lane";
+    break;
   case VPInstruction::ExtractLastElement:
     O << "extract-last-element";
     break;
diff --git a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
index 6a3b3e6e41955..338001820d593 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanTransforms.cpp
@@ -774,10 +774,10 @@ static VPValue *optimizeEarlyExitInductionUser(VPlan &Plan,
   using namespace VPlanPatternMatch;
 
   VPValue *Incoming, *Mask;
-  if (!match(Op, m_VPInstruction<Instruction::ExtractElement>(
-                     m_VPValue(Incoming),
+  if (!match(Op, m_VPInstruction<VPInstruction::ExtractLane>(
                      m_VPInstruction<VPInstruction::FirstActiveLane>(
-                         m_VPValue(Mask)))))
+                         m_VPValue(Mask)),
+                     m_VPValue(Incoming))))
     return nullptr;
 
   auto *WideIV = getOptimizableIVOf(Incoming);
@@ -2831,7 +2831,7 @@ void VPlanTransforms::handleUncountableEarlyExit(
           VPInstruction::FirstActiveLane, {CondToEarlyExit}, nullptr,
           "first.active.lane");
       IncomingFromEarlyExit = EarlyExitB.createNaryOp(
-          Instruction::ExtractElement, {IncomingFromEarlyExit, FirstActiveLane},
+          VPInstruction::ExtractLane, {FirstActiveLane, IncomingFromEarlyExit},
           nullptr, "early.exit.value");
       ExitIRI->setOperand(EarlyExitIdx, IncomingFromEarlyExit);
     }
diff --git a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
index b89cd21595efd..871e37ef3966a 100644
--- a/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
+++ b/llvm/lib/Transforms/Vectorize/VPlanUnroll.cpp
@@ -363,6 +363,13 @@ void UnrollState::unrollBlock(VPBlockBase *VPB) {
       continue;
     }
     VPValue *Op0;
+    if (match(&R, m_VPInstruction<VPInstruction::ExtractLane>(
+                      m_VPValue(Op0), m_VPValue(Op1)))) {
+      addUniformForAllParts(cast<VPInstruction>(&R));
+      for (unsigned Part = 1; Part != UF; ++Part)
+        R.addOperand(getValueForPart(Op1, Part));
+      continue;
+    }
     if (match(&R, m_VPInstruction<VPInstruction::ExtractLastElement>(
                       m_VPValue(Op0))) ||
         match(&R, m_VPInstruction<VPInstruction::ExtractPenultimateElement>(
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll b/llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll
index 61ef3cef603fa..c7be4593c6a9c 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/single-early-exit-interleave.ll
@@ -14,15 +14,16 @@ define i64 @same_exit_block_pre_inc_use1() #0 {
 ; CHECK-NEXT:    call void @init_mem(ptr [[P1]], i64 1024)
 ; CHECK-NEXT:    call void @init_mem(ptr [[P2]], i64 1024)
 ; CHECK-NEXT:    [[TMP0:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 16
-; CHECK-NEXT:    br i1 false, label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
+; CHECK-NEXT:    [[TMP1:%.*]] = mul nuw i64 [[TMP0]], 64
+; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 510, [[TMP1]]
+; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label [[SCALAR_PH:%.*]], label [[VECTOR_PH:%.*]]
 ; CHECK:       vector.ph:
 ; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP3:%.*]] = mul nuw i64 [[TMP2]], 16
+; CHECK-NEXT:    [[TMP3:%.*]] = mul nuw i64 [[TMP2]], 64
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 510, [[TMP3]]
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 510, [[N_MOD_VF]]
 ; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP5:%.*]] = mul nuw i64 [[TMP4]], 16
+; CHECK-NEXT:    [[TMP5:%.*]] = mul nuw i64 [[TMP4]], 64
 ; CHECK-NEXT:    [[INDEX_NEXT:%.*]] = add i64 3, [[N_VEC]]
 ; CHECK-NEXT:    br label [[LOOP:%.*]]
 ; CHECK:       vector.body:
@@ -30,13 +31,43 @@ define i64 @same_exit_block_pre_inc_use1() #0 {
 ; CHECK-NEXT:    [[OFFSET_IDX:%.*]] = add i64 3, [[INDEX1]]
 ; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr inbounds i8, ptr [[P1]], i64 [[OFFSET_IDX]]
 ; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds i8, ptr [[TMP7]], i32 0
+; CHECK-NEXT:    [[TMP18:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP19:%.*]] = mul nuw i64 [[TMP18]], 16
+; CHECK-NEXT:    [[TMP11:%.*]] = getelementptr inbounds i8, ptr [[TMP7]], i64 [[TMP19]]
+; CHECK-NEXT:    [[TMP29:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP36:%.*]] = mul nuw i64 [[TMP29]], 32
+; CHECK-NEXT:    [[TMP37:%.*]] = getelementptr inbounds i8, ptr [[TMP7]], i64 [[TMP36]]
+; CHECK-NEXT:    [[TMP15:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP38:%.*]] = mul nuw i64 [[TMP15]], 48
+; CHECK-NEXT:    [[TMP54:%.*]] = getelementptr inbounds i8, ptr [[TMP7]], i64 [[TMP38]]
 ; CHECK-NEXT:    [[WIDE_LOAD4:%.*]] = load <vscale x 16 x i8>, ptr [[TMP8]], align 1
+; CHECK-NEXT:    [[WIDE_LOAD2:%.*]] = load <vscale x 16 x i8>, ptr [[TMP11]], align 1
+; CHECK-NEXT:    [[WIDE_LOAD3:%.*]] = load <vscale x 16 x i8>, ptr [[TMP37]], align 1
+; CHECK-NEXT:    [[WIDE_LOAD5:%.*]] = load <vscale x 16 x i8>, ptr [[TMP54]], align 1
 ; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr inbounds i8, ptr [[P2]], i64 [[OFFSET_IDX]]
 ; CHECK-NEXT:    [[TMP10:%.*]] = getelementp...[truncated]
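
The key piece of the live-out support is the last-active-lane computation added to introduceMasksAndLinearize in VPlanPredicator.cpp above. A minimal sketch of that logic on plain vectors, assuming a tail-folding header mask (a run of true lanes followed only by false lanes); the helper names are illustrative, not LLVM API:

```cpp
#include <cstddef>
#include <vector>

// Index of the first true lane, mirroring FirstActiveLane /
// llvm.experimental.cttz.elts semantics (count of leading false lanes).
size_t firstActiveLane(const std::vector<bool> &Mask) {
  size_t I = 0;
  while (I < Mask.size() && !Mask[I])
    ++I;
  return I;
}

// Sketch of the middle-block rewrite: the first active lane of the
// negated header mask is the first lane where the original mask was
// false; subtracting 1 gives the last active lane, which is then used
// to extract the live-out value, i.e. ExtractLane(LastActiveLane, V).
int extractLastActiveLane(const std::vector<bool> &HeaderMask,
                          const std::vector<int> &LiveOutVec) {
  std::vector<bool> NotMask(HeaderMask.size());
  for (size_t I = 0; I < HeaderMask.size(); ++I)
    NotMask[I] = !HeaderMask[I];
  size_t LastActiveLane = firstActiveLane(NotMask) - 1;
  return LiveOutVec[LastActiveLane];
}
```

For instance, HeaderMask = {1, 1, 1, 0} and LiveOutVec = {10, 20, 30, 40} yield 30, the value from the last iteration whose lane was active. If the mask is all true, the negated mask has no active lane, firstActiveLane returns the vector length, and the result is the last lane, as expected.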

@github-actions (GitHub Actions bot) commented Jul 16, 2025 (edited)

✅ With the latest revision this PR passed the C/C++ code formatter.

Reviewers

@preames (awaiting requested review)
@lukel97 (awaiting requested review)
@ayalz (awaiting requested review)
@aniragil (awaiting requested review)

Assignees: no one assigned
Projects: none yet
Milestone: no milestone

2 participants: @fhahn, @llvmbot
