NotificationsYou must be signed in to change notification settings
Fork5.2k
Star17.2k

Skip FP registers processing if method don't have floating points use#85744

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Merged

kunalspathak merged 4 commits intodotnet:mainfromkunalspathak:lsra-throughput

May 6, 2023

Merged

Skip FP registers processing if method don't have floating points use#85744

kunalspathak merged 4 commits intodotnet:mainfromkunalspathak:lsra-throughput

May 6, 2023

Conversation

Copy link

Contributor

kunalspathak commentedMay 4, 2023•
edited
Loading

Fixes:#83109
Contributes to#83946

skip processing floating point registers if method doesn't have one

fe09c7f

ghost added the area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label

May 4, 2023

ghost assignedkunalspathak

May 4, 2023

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

A safe change would be to haveneedFloatingRegisters |= (theRegisterType != IntRegisterType); innewInterval() method but want to see if my current changes achieve the purpose.

kunalspathak changed the title~~Skip processing floating point registers if method doesn't have one~~Skip FP registers processing if method don't have floating points use

May 4, 2023

Copy link

ContributorAuthor

kunalspathak commentedMay 4, 2023

NiceTP Diffs

Copy link

ContributorAuthor

kunalspathak commentedMay 4, 2023

/azp run runtime-coreclr jitstress-jitstressregs, runtime-coreclr jitstress

Copy link

azure-pipelinesbot commentedMay 4, 2023

Azure Pipelines successfully started running 1 pipeline(s).

tannergooding reviewed

May 4, 2023

View reviewed changes

src/coreclr/jit/lsra.cpp OutdatedShow resolvedHide resolved

Copy link

Member

tannergooding commentedMay 4, 2023•
edited
Loading

NiceTP Diffs

Do we know what causes the potential hit to Arm64 minopts? Seems odd that we see such huge improvements elsewhere, but not always in minopts but only on Arm64

I seetests in particular are the "worst offender", is that maybe due to the number ofHWIntrinsic tests? If so, do we know why we don't see similar in full-opts?

kunalspathak mentioned this pull request

May 4, 2023

AVX-512 throughput improvement opportunties#83946

Open

Rename to needNonIntegerRegisters

8a7d379

Copy link

ContributorAuthor

kunalspathak commentedMay 4, 2023

Jitstress is green.

Copy link

ContributorAuthor

kunalspathak commentedMay 4, 2023

I seetests in particular are the "worst offender", is that maybe due to the number ofHWIntrinsic tests? If so, do we know why we don't see similar in full-opts?

I will try to find it out.

kunalspathak marked this pull request as ready for review

May 4, 2023 05:53

Copy link

ContributorAuthor

kunalspathak commentedMay 4, 2023

@dotnet/jit-contrib

build-analysisbot mentioned this pull request

May 4, 2023

Unable to load Analyzer assembly .../Microsoft.CodeAnalysis.Analyzers.dll : Not a valid assembly#85082

Closed

Merge remote-tracking branch 'origin/main' into lsra-throughput

e295312

Copy link

ContributorAuthor

kunalspathak commentedMay 5, 2023

I seetests in particular are the "worst offender", is that maybe due to the number ofHWIntrinsic tests? If so, do we know why we don't see similar in full-opts?
I will try to find it out.

PS E:\spmi\pintools\1.0\windows> .\analyze-pin-trace-diff.ps1 -b E:\features\lsra\floatregs\base_arm64_coreclr.txt E:\features\lsra\floatregs\diff_arm64_coreclr.txtBase: 431066232654, Diff: 431355053761, +0.0670%?buildIntervals@LinearScan@@QEAAXXZ                                                                                           : 571725277  : +46.55%  : 27.20% : +0.1326%?processBlockStartLocations@LinearScan@@AEAAXPEAUBasicBlock@@@Z                                                               : 349854912  : +27.75%  : 16.64% : +0.0812%?resolveRegisters@LinearScan@@QEAAXXZ                                                                                         : 270767431  : +3.38%   : 12.88% : +0.0628%?impImportAndPushBox@Compiler@@IEAAXPEAUCORINFO_RESOLVED_TOKEN@@@Z                                                            : 2230836    : +6.41%   : 0.11%  : +0.0005%?impPushOnStack@Compiler@@IEAAXPEAUGenTree@@VtypeInfo@@@Z                                                                     : -2256082   : -0.75%   : 0.11%  : -0.0005%?impImportCall@Compiler@@IEAA?AW4var_types@@W4opcode_t@@PEAUCORINFO_RESOLVED_TOKEN@@1PEAUGenTree@@HPEAUCORINFO_CALL_INFO@@I@Z : -2438804   : -0.20%   : 0.12%  : -0.0006%??$allocateRegisters@$0A@@LinearScan@@QEAAXXZ                                                                                 : -436643544 : -1.57%   : 20.77% : -0.1013%?buildPhysRegRecords@LinearScan@@AEAAXXZ                                                                                      : -464797849 : -100.00% : 22.11% : -0.1078%

I investigated the top methods and one thing that stands out is that earlier thefor-loop of registers was unrolled and now it no longer does, but iterating it fewer times is still better.

The call tobuildPhysRegsRecord() is now getting inlined insidebuildIntervals().

Other then that, the only extra thing I am doing is settingneedNonIntegerRegisters = true insideinternalFloatRegCandidates().

Copy link

ContributorAuthor

kunalspathak commentedMay 5, 2023

It also occurred to me that for xarch, we already updated the definition ofAVAILABLE_REG_COUNT to a getter, but for arm64, it was a constant. This PR makes arm64 definition to getter as well and that could reflect the regression on arm64 and not on x64.

tannergooding reviewed

May 5, 2023

View reviewed changes

src/coreclr/jit/lsrabuild.cpp

Comment on lines +1949 to +1950

		// For that `compFloatingPointUsed` should be set accurately
		// before invoking allocator.

Copy link

Member

tannergoodingMay 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think this bit in particular is a bit difficult as we need to pre-emptively setcompFloatingPointUsed for any SIMD code. But that can't really account for dead code elimination or other factors that might later remove or transforms the nodes to no longer need it.

Copy link

Member

tannergoodingMay 5, 2023•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Would it instead be possible to haveregMaskTP LinearScan::internalFloatRegCandidates() lazily do the below, or is there other factors that require it?

That is, have it do something like:

regMaskTPLinearScan::internalFloatRegCandidates(){if (!needNonIntegerRegisters)    {// lazily initialize fpreg state herefor (unsignedint i =0; i < lsraRegOrderFltSize; i++)        {            regNumber  reg  = lsraRegOrderFlt[i];            RegRecord* curr = &physRegs[reg];            curr->regOrder  = (unsignedchar)i;        }        needNonIntegerRegisters =true;    }if (compiler->compFloatingPointUsed)    {return availableFloatRegs;    }else    {return RBM_FLT_CALLEE_TRASH;    }}

Copy link

ContributorAuthor

kunalspathakMay 5, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Unfortunately we need the physRegRecord early right after identifying candidates.

Copy link

ContributorAuthor

kunalspathakMay 5, 2023•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

But that can't really account for dead code elimination or other factors that might later remove or transforms the nodes to no longer need it.

True, but this is the best we can do I suppose, unless you are proposing to just start from scratch in LSRA and set it totrue duringBuildNode() iftree->GetType() != TYP_INT.

Edit: Doing the check inBuildNode() might prove costly though because we will be doing that for every singleGenTree* node and not sure how many cases we will optimize for the scenarios where we didn't account for dead code elimination.