[LLVM][LV] Improve UF calculation for vscale based scalar loops. #146102


Open

paulwalker-arm wants to merge 4 commits into llvm:main from paulwalker-arm:interleave-sve-sized-loop

Conversation

paulwalker-arm
Collaborator

Update getSmallConstantTripCount() to return scalable ElementCount values that are used to accurately determine the maximum value for UF, namely:

TripCount / VF ==> X * VScale / Y * VScale ==> X / Y

This improves the chances of being able to remove the scalar loop and also fixes an issue where UF=2 is chosen for a scalar loop with exactly VF (= X * VScale) iterations.
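
For illustration only, a minimal standalone sketch (not code from this patch) of the arithmetic the description relies on: when both the trip count and the VF are multiples of vscale, the common vscale factor cancels, so the interleaving headroom can be computed from the known-min values alone.

#include <algorithm>

// Sketch: with TripCount = X * VScale and VF = Y * VScale, TripCount / VF
// reduces to X / Y regardless of the runtime value of vscale.
unsigned maxUsefulUF(unsigned X /* trip-count multiplier */,
                     unsigned Y /* VF known-min */) {
  // Example: X == Y means the loop runs exactly VF iterations, so anything
  // above UF=1 leaves the interleaved vector body unable to execute and the
  // scalar loop ends up handling every iteration.
  return std::max(1u, X / Y);
}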

@llvmbot
Member

@llvm/pr-subscribers-llvm-transforms

Author: Paul Walker (paulwalker-arm)

Changes

Update getSmallConstantTripCount() to return scalable ElementCount values that are used to accurately determine the maximum value for UF, namely:

TripCount / VF ==> X * VScale / Y * VScale ==> X / Y

This improves the chances of being able to remove the scalar loop and also fixes an issue where UF=2 is chosen for a scalar loop with exactly VF (= X * VScale) iterations.


Full diff: https://github.com/llvm/llvm-project/pull/146102.diff

2 Files Affected:

  • (modified) llvm/lib/Transforms/Vectorize/LoopVectorize.cpp (+31-6)
  • (modified) llvm/test/Transforms/LoopVectorize/AArch64/sve-vscale-based-trip-counts.ll (+40-81)
diff --git a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
index bb29e4fc6d232..8a4435eed95bf 100644
--- a/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
+++ b/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
@@ -423,7 +423,26 @@ static bool hasIrregularType(Type *Ty, const DataLayout &DL) {
 /// ElementCount to include loops whose trip count is a function of vscale.
 static ElementCount getSmallConstantTripCount(ScalarEvolution *SE,
                                               const Loop *L) {
-  return ElementCount::getFixed(SE->getSmallConstantTripCount(L));
+  if (unsigned ExpectedTC = SE->getSmallConstantTripCount(L))
+    return ElementCount::getFixed(ExpectedTC);
+
+  const SCEV *BTC = SE->getBackedgeTakenCount(L);
+
+  if (isa<SCEVCouldNotCompute>(BTC))
+    return ElementCount::getFixed(0);
+
+  const SCEV *ExitCount = SE->getTripCountFromExitCount(BTC, BTC->getType(), L);
+
+  if (isa<SCEVVScale>(ExitCount))
+    return ElementCount::getScalable(1);
+
+  if (auto *Mul = dyn_cast<SCEVMulExpr>(ExitCount))
+    if (Mul->getNumOperands() == 2 && isa<SCEVConstant>(Mul->getOperand(0)) &&
+        isa<SCEVVScale>(Mul->getOperand(1)))
+      return ElementCount::getScalable(
+          cast<SCEVConstant>(Mul->getOperand(0))->getValue()->getZExtValue());
+
+  return ElementCount::getFixed(0);
 }
 
 /// Returns "best known" trip count, which is either a valid positive trip count
@@ -4813,16 +4832,22 @@ LoopVectorizationCostModel::selectInterleaveCount(VPlan &Plan, ElementCount VF,
       MaxInterleaveCount = ForceTargetMaxVectorInterleaveFactor;
   }
 
-  unsigned EstimatedVF = getEstimatedRuntimeVF(VF, VScaleForTuning);
-
   // Try to get the exact trip count, or an estimate based on profiling data or
   // ConstantMax from PSE, failing that.
-  if (auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop)) {
+  auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop);
+
+  // For fixed length VFs treat a scalable trip count as unknown.
+  if (BestKnownTC && (BestKnownTC->isFixed() || VF.isScalable())) {
+    // Re-evaluate VF to be in the same numerical space as the trip count.
+    unsigned EstimatedVF = VF.getKnownMinValue();
+    if (VF.isScalable() && BestKnownTC->isFixed())
+      EstimatedVF = getEstimatedRuntimeVF(VF, VScaleForTuning);
+
     // At least one iteration must be scalar when this constraint holds. So the
     // maximum available iterations for interleaving is one less.
     unsigned AvailableTC = requiresScalarEpilogue(VF.isVector())
-                               ? BestKnownTC->getFixedValue() - 1
-                               : BestKnownTC->getFixedValue();
+                               ? BestKnownTC->getKnownMinValue() - 1
+                               : BestKnownTC->getKnownMinValue();
 
     unsigned InterleaveCountLB = bit_floor(std::max(
         1u, std::min(AvailableTC / (EstimatedVF * 2), MaxInterleaveCount)));
diff --git a/llvm/test/Transforms/LoopVectorize/AArch64/sve-vscale-based-trip-counts.ll b/llvm/test/Transforms/LoopVectorize/AArch64/sve-vscale-based-trip-counts.ll
index e2c7469a97819..dd1606c0f349a 100644
--- a/llvm/test/Transforms/LoopVectorize/AArch64/sve-vscale-based-trip-counts.ll
+++ b/llvm/test/Transforms/LoopVectorize/AArch64/sve-vscale-based-trip-counts.ll
@@ -10,62 +10,35 @@ define void @vscale_mul_4(ptr noalias noundef readonly captures(none) %a, ptr no
 ; CHECK-NEXT:    [[TMP0:%.*]] = tail call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[TMP1:%.*]] = shl nuw nsw i64 [[TMP0]], 2
 ; CHECK-NEXT:    [[TMP2:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP3:%.*]] = mul nuw i64 [[TMP2]], 8
-; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[TMP1]], [[TMP3]]
-; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
-; CHECK:       [[VECTOR_PH]]:
+; CHECK-NEXT:    [[TMP3:%.*]] = mul nuw i64 [[TMP2]], 4
 ; CHECK-NEXT:    [[TMP4:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP5:%.*]] = mul nuw i64 [[TMP4]], 8
+; CHECK-NEXT:    [[TMP5:%.*]] = mul nuw i64 [[TMP4]], 4
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[TMP1]], [[TMP5]]
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[TMP1]], [[N_MOD_VF]]
-; CHECK-NEXT:    [[TMP10:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP7:%.*]] = mul nuw i64 [[TMP10]], 8
-; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
-; CHECK:       [[VECTOR_BODY]]:
-; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
-; CHECK-NEXT:    [[TMP13:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP14:%.*]] = getelementptr inbounds nuw float, ptr [[TMP13]], i32 0
-; CHECK-NEXT:    [[TMP18:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP11:%.*]] = mul nuw i64 [[TMP18]], 4
-; CHECK-NEXT:    [[TMP26:%.*]] = getelementptr inbounds nuw float, ptr [[TMP13]], i64 [[TMP11]]
-; CHECK-NEXT:    [[WIDE_LOAD2:%.*]] = load <vscale x 4 x float>, ptr [[TMP14]], align 4
-; CHECK-NEXT:    [[WIDE_LOAD1:%.*]] = load <vscale x 4 x float>, ptr [[TMP26]], align 4
-; CHECK-NEXT:    [[TMP12:%.*]] = getelementptr inbounds nuw float, ptr [[B]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP17:%.*]] = getelementptr inbounds nuw float, ptr [[TMP12]], i32 0
-; CHECK-NEXT:    [[TMP15:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP16:%.*]] = mul nuw i64 [[TMP15]], 4
-; CHECK-NEXT:    [[TMP27:%.*]] = getelementptr inbounds nuw float, ptr [[TMP12]], i64 [[TMP16]]
-; CHECK-NEXT:    [[WIDE_LOAD3:%.*]] = load <vscale x 4 x float>, ptr [[TMP17]], align 4
-; CHECK-NEXT:    [[WIDE_LOAD4:%.*]] = load <vscale x 4 x float>, ptr [[TMP27]], align 4
-; CHECK-NEXT:    [[TMP19:%.*]] = fmul <vscale x 4 x float> [[WIDE_LOAD2]], [[WIDE_LOAD3]]
-; CHECK-NEXT:    [[TMP28:%.*]] = fmul <vscale x 4 x float> [[WIDE_LOAD1]], [[WIDE_LOAD4]]
-; CHECK-NEXT:    [[TMP20:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP21:%.*]] = mul nuw i64 [[TMP20]], 4
-; CHECK-NEXT:    [[TMP22:%.*]] = getelementptr inbounds nuw float, ptr [[TMP12]], i64 [[TMP21]]
-; CHECK-NEXT:    store <vscale x 4 x float> [[TMP19]], ptr [[TMP17]], align 4
-; CHECK-NEXT:    store <vscale x 4 x float> [[TMP28]], ptr [[TMP22]], align 4
-; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP7]]
-; CHECK-NEXT:    [[TMP23:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT:    br i1 [[TMP23]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
-; CHECK:       [[MIDDLE_BLOCK]]:
+; CHECK-NEXT:    [[TMP6:%.*]] = call i64 @llvm.vscale.i64()
+; CHECK-NEXT:    [[TMP7:%.*]] = mul nuw i64 [[TMP6]], 4
+; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i32 0
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 4 x float>, ptr [[TMP8]], align 4
+; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr inbounds nuw float, ptr [[B]], i32 0
+; CHECK-NEXT:    [[WIDE_LOAD1:%.*]] = load <vscale x 4 x float>, ptr [[TMP9]], align 4
+; CHECK-NEXT:    [[TMP10:%.*]] = fmul <vscale x 4 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]
+; CHECK-NEXT:    [[TMP11:%.*]] = getelementptr inbounds nuw float, ptr [[B]], i32 0
+; CHECK-NEXT:    store <vscale x 4 x float> [[TMP10]], ptr [[TMP11]], align 4
 ; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[TMP1]], [[N_VEC]]
-; CHECK-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]]
-; CHECK:       [[SCALAR_PH]]:
-; CHECK-NEXT:    [[BC_RESUME_VAL:%.*]] = phi i64 [ [[N_VEC]], %[[MIDDLE_BLOCK]] ], [ 0, %[[ENTRY]] ]
-; CHECK-NEXT:    br label %[[FOR_BODY:.*]]
+; CHECK-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP:.*]], label %[[FOR_BODY:.*]]
 ; CHECK:       [[FOR_COND_CLEANUP]]:
 ; CHECK-NEXT:    ret void
 ; CHECK:       [[FOR_BODY]]:
-; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
+; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ], [ [[N_VEC]], %[[ENTRY]] ]
 ; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDVARS_IV]]
-; CHECK-NEXT:    [[TMP24:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[TMP12:%.*]] = load float, ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX3:%.*]] = getelementptr inbounds nuw float, ptr [[B]], i64 [[INDVARS_IV]]
-; CHECK-NEXT:    [[TMP25:%.*]] = load float, ptr [[ARRAYIDX3]], align 4
-; CHECK-NEXT:    [[MUL4:%.*]] = fmul float [[TMP24]], [[TMP25]]
+; CHECK-NEXT:    [[TMP13:%.*]] = load float, ptr [[ARRAYIDX3]], align 4
+; CHECK-NEXT:    [[MUL4:%.*]] = fmul float [[TMP12]], [[TMP13]]
 ; CHECK-NEXT:    store float [[MUL4]], ptr [[ARRAYIDX3]], align 4
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[TMP1]]
-; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP0:![0-9]+]]
 ;
 entry:
   %0 = tail call i64 @llvm.vscale.i64()
@@ -136,7 +109,7 @@ define  void @vscale_mul_8(ptr noalias noundef readonly captures(none) %a, ptr n
 ; CHECK-NEXT:    store float [[MUL5]], ptr [[ARRAYIDX4]], align 4
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[MUL1]]
-; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP3:![0-9]+]]
 ;
 entry:
   %0 = tail call i64 @llvm.vscale.i64()
@@ -166,43 +139,30 @@ define void @vscale_mul_12(ptr noalias noundef readonly captures(none) %a, ptr n
 ; CHECK-NEXT:    [[TMP0:%.*]] = tail call i64 @llvm.vscale.i64()
 ; CHECK-NEXT:    [[MUL1:%.*]] = mul nuw nsw i64 [[TMP0]], 12
 ; CHECK-NEXT:    [[TMP1:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP2:%.*]] = mul nuw i64 [[TMP1]], 8
+; CHECK-NEXT:    [[TMP2:%.*]] = mul nuw i64 [[TMP1]], 4
 ; CHECK-NEXT:    [[MIN_ITERS_CHECK:%.*]] = icmp ult i64 [[MUL1]], [[TMP2]]
 ; CHECK-NEXT:    br i1 [[MIN_ITERS_CHECK]], label %[[SCALAR_PH:.*]], label %[[VECTOR_PH:.*]]
 ; CHECK:       [[VECTOR_PH]]:
 ; CHECK-NEXT:    [[TMP3:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP4:%.*]] = mul nuw i64 [[TMP3]], 8
+; CHECK-NEXT:    [[TMP4:%.*]] = mul nuw i64 [[TMP3]], 4
 ; CHECK-NEXT:    [[N_MOD_VF:%.*]] = urem i64 [[MUL1]], [[TMP4]]
 ; CHECK-NEXT:    [[N_VEC:%.*]] = sub i64 [[MUL1]], [[N_MOD_VF]]
 ; CHECK-NEXT:    [[TMP5:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 8
+; CHECK-NEXT:    [[TMP6:%.*]] = mul nuw i64 [[TMP5]], 4
 ; CHECK-NEXT:    br label %[[VECTOR_BODY:.*]]
 ; CHECK:       [[VECTOR_BODY]]:
 ; CHECK-NEXT:    [[INDEX:%.*]] = phi i64 [ 0, %[[VECTOR_PH]] ], [ [[INDEX_NEXT:%.*]], %[[VECTOR_BODY]] ]
 ; CHECK-NEXT:    [[TMP7:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDEX]]
 ; CHECK-NEXT:    [[TMP8:%.*]] = getelementptr inbounds nuw float, ptr [[TMP7]], i32 0
-; CHECK-NEXT:    [[TMP9:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP10:%.*]] = mul nuw i64 [[TMP9]], 4
-; CHECK-NEXT:    [[TMP11:%.*]] = getelementptr inbounds nuw float, ptr [[TMP7]], i64 [[TMP10]]
-; CHECK-NEXT:    [[WIDE_LOAD1:%.*]] = load <vscale x 4 x float>, ptr [[TMP8]], align 4
-; CHECK-NEXT:    [[WIDE_LOAD2:%.*]] = load <vscale x 4 x float>, ptr [[TMP11]], align 4
-; CHECK-NEXT:    [[TMP12:%.*]] = getelementptr inbounds nuw float, ptr [[B]], i64 [[INDEX]]
-; CHECK-NEXT:    [[TMP13:%.*]] = getelementptr inbounds nuw float, ptr [[TMP12]], i32 0
-; CHECK-NEXT:    [[TMP14:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP15:%.*]] = mul nuw i64 [[TMP14]], 4
-; CHECK-NEXT:    [[TMP16:%.*]] = getelementptr inbounds nuw float, ptr [[TMP12]], i64 [[TMP15]]
-; CHECK-NEXT:    [[WIDE_LOAD3:%.*]] = load <vscale x 4 x float>, ptr [[TMP13]], align 4
-; CHECK-NEXT:    [[WIDE_LOAD4:%.*]] = load <vscale x 4 x float>, ptr [[TMP16]], align 4
-; CHECK-NEXT:    [[TMP18:%.*]] = fmul <vscale x 4 x float> [[WIDE_LOAD1]], [[WIDE_LOAD3]]
-; CHECK-NEXT:    [[TMP25:%.*]] = fmul <vscale x 4 x float> [[WIDE_LOAD2]], [[WIDE_LOAD4]]
-; CHECK-NEXT:    [[TMP19:%.*]] = call i64 @llvm.vscale.i64()
-; CHECK-NEXT:    [[TMP20:%.*]] = mul nuw i64 [[TMP19]], 4
-; CHECK-NEXT:    [[TMP21:%.*]] = getelementptr inbounds nuw float, ptr [[TMP12]], i64 [[TMP20]]
-; CHECK-NEXT:    store <vscale x 4 x float> [[TMP18]], ptr [[TMP13]], align 4
-; CHECK-NEXT:    store <vscale x 4 x float> [[TMP25]], ptr [[TMP21]], align 4
+; CHECK-NEXT:    [[WIDE_LOAD:%.*]] = load <vscale x 4 x float>, ptr [[TMP8]], align 4
+; CHECK-NEXT:    [[TMP9:%.*]] = getelementptr inbounds nuw float, ptr [[B]], i64 [[INDEX]]
+; CHECK-NEXT:    [[TMP10:%.*]] = getelementptr inbounds nuw float, ptr [[TMP9]], i32 0
+; CHECK-NEXT:    [[WIDE_LOAD1:%.*]] = load <vscale x 4 x float>, ptr [[TMP10]], align 4
+; CHECK-NEXT:    [[TMP11:%.*]] = fmul <vscale x 4 x float> [[WIDE_LOAD]], [[WIDE_LOAD1]]
+; CHECK-NEXT:    store <vscale x 4 x float> [[TMP11]], ptr [[TMP10]], align 4
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP6]]
-; CHECK-NEXT:    [[TMP22:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT:    br i1 [[TMP22]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
+; CHECK-NEXT:    [[TMP12:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
+; CHECK-NEXT:    br i1 [[TMP12]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP4:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[MUL1]], [[N_VEC]]
 ; CHECK-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]]
@@ -214,14 +174,14 @@ define void @vscale_mul_12(ptr noalias noundef readonly captures(none) %a, ptr n
 ; CHECK:       [[FOR_BODY]]:
 ; CHECK-NEXT:    [[INDVARS_IV:%.*]] = phi i64 [ [[BC_RESUME_VAL]], %[[SCALAR_PH]] ], [ [[INDVARS_IV_NEXT:%.*]], %[[FOR_BODY]] ]
 ; CHECK-NEXT:    [[ARRAYIDX:%.*]] = getelementptr inbounds nuw float, ptr [[A]], i64 [[INDVARS_IV]]
-; CHECK-NEXT:    [[TMP23:%.*]] = load float, ptr [[ARRAYIDX]], align 4
+; CHECK-NEXT:    [[TMP13:%.*]] = load float, ptr [[ARRAYIDX]], align 4
 ; CHECK-NEXT:    [[ARRAYIDX4:%.*]] = getelementptr inbounds nuw float, ptr [[B]], i64 [[INDVARS_IV]]
-; CHECK-NEXT:    [[TMP24:%.*]] = load float, ptr [[ARRAYIDX4]], align 4
-; CHECK-NEXT:    [[MUL5:%.*]] = fmul float [[TMP23]], [[TMP24]]
+; CHECK-NEXT:    [[TMP14:%.*]] = load float, ptr [[ARRAYIDX4]], align 4
+; CHECK-NEXT:    [[MUL5:%.*]] = fmul float [[TMP13]], [[TMP14]]
 ; CHECK-NEXT:    store float [[MUL5]], ptr [[ARRAYIDX4]], align 4
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[MUL1]]
-; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP5:![0-9]+]]
 ;
 entry:
   %0 = tail call i64 @llvm.vscale.i64()
@@ -287,7 +247,7 @@ define void @vscale_mul_31(ptr noalias noundef readonly captures(none) %a, ptr n
 ; CHECK-NEXT:    store <vscale x 4 x float> [[TMP18]], ptr [[TMP21]], align 4
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP6]]
 ; CHECK-NEXT:    [[TMP22:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT:    br i1 [[TMP22]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
+; CHECK-NEXT:    br i1 [[TMP22]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP6:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[MUL1]], [[N_VEC]]
 ; CHECK-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]]
@@ -306,7 +266,7 @@ define void @vscale_mul_31(ptr noalias noundef readonly captures(none) %a, ptr n
 ; CHECK-NEXT:    store float [[MUL5]], ptr [[ARRAYIDX4]], align 4
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[MUL1]]
-; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP7:![0-9]+]]
 ;
 entry:
   %0 = tail call i64 @llvm.vscale.i64()
@@ -372,7 +332,7 @@ define void @vscale_mul_64(ptr noalias noundef readonly captures(none) %a, ptr n
 ; CHECK-NEXT:    store <vscale x 4 x float> [[TMP18]], ptr [[TMP21]], align 4
 ; CHECK-NEXT:    [[INDEX_NEXT]] = add nuw i64 [[INDEX]], [[TMP6]]
 ; CHECK-NEXT:    [[TMP22:%.*]] = icmp eq i64 [[INDEX_NEXT]], [[N_VEC]]
-; CHECK-NEXT:    br i1 [[TMP22]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
+; CHECK-NEXT:    br i1 [[TMP22]], label %[[MIDDLE_BLOCK:.*]], label %[[VECTOR_BODY]], !llvm.loop [[LOOP8:![0-9]+]]
 ; CHECK:       [[MIDDLE_BLOCK]]:
 ; CHECK-NEXT:    [[CMP_N:%.*]] = icmp eq i64 [[MUL1]], [[N_VEC]]
 ; CHECK-NEXT:    br i1 [[CMP_N]], label %[[FOR_COND_CLEANUP:.*]], label %[[SCALAR_PH]]
@@ -391,7 +351,7 @@ define void @vscale_mul_64(ptr noalias noundef readonly captures(none) %a, ptr n
 ; CHECK-NEXT:    store float [[MUL5]], ptr [[ARRAYIDX4]], align 4
 ; CHECK-NEXT:    [[INDVARS_IV_NEXT]] = add nuw nsw i64 [[INDVARS_IV]], 1
 ; CHECK-NEXT:    [[EXITCOND_NOT:%.*]] = icmp eq i64 [[INDVARS_IV_NEXT]], [[MUL1]]
-; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP10:![0-9]+]]
+; CHECK-NEXT:    br i1 [[EXITCOND_NOT]], label %[[FOR_COND_CLEANUP]], label %[[FOR_BODY]], !llvm.loop [[LOOP9:![0-9]+]]
 ;
 entry:
   %0 = tail call i64 @llvm.vscale.i64()
@@ -419,14 +379,13 @@ declare i64 @llvm.vscale.i64()
 attributes #0 = { vscale_range(1,16) "target-features"="+sve" }
 ;.
 ; CHECK: [[LOOP0]] = distinct !{[[LOOP0]], [[META1:![0-9]+]], [[META2:![0-9]+]]}
-; CHECK: [[META1]] = !{!"llvm.loop.isvectorized", i32 1}
-; CHECK: [[META2]] = !{!"llvm.loop.unroll.runtime.disable"}
-; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META2]], [[META1]]}
+; CHECK: [[META1]] = !{!"llvm.loop.unroll.runtime.disable"}
+; CHECK: [[META2]] = !{!"llvm.loop.isvectorized", i32 1}
+; CHECK: [[LOOP3]] = distinct !{[[LOOP3]], [[META1]], [[META2]]}
 ; CHECK: [[LOOP4]] = distinct !{[[LOOP4]], [[META2]], [[META1]]}
 ; CHECK: [[LOOP5]] = distinct !{[[LOOP5]], [[META1]], [[META2]]}
 ; CHECK: [[LOOP6]] = distinct !{[[LOOP6]], [[META2]], [[META1]]}
 ; CHECK: [[LOOP7]] = distinct !{[[LOOP7]], [[META1]], [[META2]]}
 ; CHECK: [[LOOP8]] = distinct !{[[LOOP8]], [[META2]], [[META1]]}
 ; CHECK: [[LOOP9]] = distinct !{[[LOOP9]], [[META1]], [[META2]]}
-; CHECK: [[LOOP10]] = distinct !{[[LOOP10]], [[META2]], [[META1]]}
 ;.

@llvmbot
Member

@llvm/pr-subscribers-vectorizers

    return ElementCount::getScalable(1);

  if (auto *Mul = dyn_cast<SCEVMulExpr>(ExitCount))
    if (Mul->getNumOperands() == 2 && isa<SCEVConstant>(Mul->getOperand(0)) &&
Collaborator

Do we need to check overflow here?

Collaborator (Author)

Fixed both in terms of requiring NUW flags and ensuring the constant fits within the range of ElementCount.
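
For context, a hedged sketch of the kind of guard being described (hypothetical shape only, not the exact revision pushed to the PR): require a non-wrapping multiply and a constant that fits ElementCount's 32-bit known component before forming a scalable ElementCount.

// Hypothetical sketch only; the real patch may structure this differently.
#include "llvm/Analysis/ScalarEvolutionExpressions.h"
#include "llvm/Support/TypeSize.h"
using namespace llvm;

static ElementCount scalableTripCountFromExitCount(const SCEV *ExitCount) {
  if (auto *Mul = dyn_cast<SCEVMulExpr>(ExitCount))
    if (Mul->hasNoUnsignedWrap() && Mul->getNumOperands() == 2)
      if (auto *C = dyn_cast<SCEVConstant>(Mul->getOperand(0)))
        if (isa<SCEVVScale>(Mul->getOperand(1)) && C->getAPInt().isIntN(32))
          // C * vscale with no unsigned wrap and C representable in 32 bits.
          return ElementCount::getScalable(
              static_cast<unsigned>(C->getAPInt().getZExtValue()));
  return ElementCount::getFixed(0); // treat as unknown otherwise
}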

@@ -423,7 +423,26 @@ static bool hasIrregularType(Type *Ty, const DataLayout &DL) {
 /// ElementCount to include loops whose trip count is a function of vscale.
 static ElementCount getSmallConstantTripCount(ScalarEvolution *SE,
                                               const Loop *L) {
-  return ElementCount::getFixed(SE->getSmallConstantTripCount(L));
+  if (unsigned ExpectedTC = SE->getSmallConstantTripCount(L))
Collaborator

This routine seems like it should live on SCEV itself. I was originally going to propose we change getSmallConstantTripCount to return ElementCount, but that looks mildly invasive. Maybe for the moment have a "getSmallConstantTripElementCount"? Not a huge fan of that name, but it's at least close...

Collaborator (Author)

That's the route I started with but was advised to make the function specific to LoopVectorize, presumably until such a time that its applicability extends beyond vectorisation.

  auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop);

  // For fixed length VFs treat a scalable trip count as unknown.
  if (BestKnownTC && (BestKnownTC->isFixed() || VF.isScalable())) {
Collaborator

Style-wise:

if (auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop);
    BestKnownTC && (BestKnownTC->isFixed() || VF.isScalable())) {

paulwalker-arm reacted with thumbs up emoji
    // At least one iteration must be scalar when this constraint holds. So the
    // maximum available iterations for interleaving is one less.
    unsigned AvailableTC = requiresScalarEpilogue(VF.isVector())
-                              ? BestKnownTC->getFixedValue() - 1
-                              : BestKnownTC->getFixedValue();
+                              ? BestKnownTC->getKnownMinValue() - 1
Collaborator

For a scalable trip count, using the min value here seems off. Shouldn't we be using VScaleForTuning here?

Collaborator (Author)

I don't think it matters, because when BestKnownTC is scalable, EstimatedVF must also be scalable, with all uses serving to create ratios between the two. By using getKnownMinValue() we're just removing the common vscale factor.

That said, I don't much like getKnownMinValue(), so I have changed the implementation to force everything to unsigned, be they known values or estimates, which as expected does not seem to affect the result but I feel looks more readable?

Please let me know if you'd rather I stick with the original approach.
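
A small numeric illustration of the point above (illustrative values only): when the trip count and the VF carry the same vscale factor, the ratio computed from known-min values matches the one computed with any vscale estimate.

#include <cassert>

int main() {
  unsigned VScaleForTuning = 4;              // any estimate gives the same ratio
  unsigned TCKnownMin = 12, VFKnownMin = 4;  // TC = 12 * vscale, VF = <vscale x 4>
  unsigned RatioKnownMin = TCKnownMin / VFKnownMin;                    // 3
  unsigned RatioEstimated =
      (TCKnownMin * VScaleForTuning) / (VFKnownMin * VScaleForTuning); // 3
  assert(RatioKnownMin == RatioEstimated);
  return 0;
}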

-  if (auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop)) {
+  // ConstantMax from PSE, failing that. For fixed length VFs treat a scalable
+  // trip count as if unknown.
+  if (auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop);
Contributor

I know this works, but I personally find it more readable to avoid multiple statements separated by ';' within an if (...), i.e. something like

  auto BestKnownTC = getSmallBestKnownTC(PSE, TheLoop);
  if (BestKnownTC && (BestKnownTC->isFixed() || VF.isScalable())

Collaborator (Author)

That is the style I originally used and @preames requested it be changed to this.

br i1 %exitcond.not, label %for.cond.cleanup, label %for.body
}

; The known component of ElementCount is a 32-bit value.
Contributor

Shouldn't that be "The known component of ElementCount does not fit into 32 bits"?

Collaborator (Author)

No. This is stating a fact, namely "ElementCount stores its known component as a 32-bit value", so as to explain why the test's trip count is too big for ElementCount.
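
For reference, a tiny hedged sketch of the constraint being discussed (the helper name is made up): since ElementCount stores its known component as a 32-bit unsigned value, a vscale multiplier that does not fit in 32 bits cannot be expressed as ElementCount::getScalable(C) and has to be treated as unknown, which is what the test exercises.

#include <cstdint>
#include <limits>

// Hypothetical helper for illustration; not part of the patch.
bool fitsScalableElementCount(uint64_t C) {
  // ElementCount's known-min quantity is unsigned (32-bit), so only
  // multipliers up to UINT32_MAX can be represented as C * vscale.
  return C <= std::numeric_limits<uint32_t>::max();
}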

Reviewers

@preames left review comments
@david-arm left review comments
@fhahn: awaiting requested review

Assignees: no one assigned
Projects: none yet
Milestone: no milestone

4 participants: @paulwalker-arm, @llvmbot, @preames, @david-arm
