Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

Skip FP registers processing if method don't have floating points use#85744

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
kunalspathak merged 4 commits intodotnet:mainfromkunalspathak:lsra-throughput
May 6, 2023

Conversation

@kunalspathak
Copy link
Contributor

@kunalspathakkunalspathak commentedMay 4, 2023
edited
Loading

Fixes:#83109
Contributes to#83946

@ghostghost added the area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labelMay 4, 2023
@ghost
Copy link

Tagging subscribers to this area:@JulieLeeMSFT,@jakobbotsch
See info inarea-owners.md if you want to be subscribed.

Issue Details

null

Author:kunalspathak
Assignees:-
Labels:

area-CodeGen-coreclr

Milestone:-

unsignedintcurrentSpill[TYP_COUNT];
boolneedFloatTmpForFPCall;
boolneedDoubleTmpForFPCall;
boolneedFloatingRegisters;
Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

A safe change would be to haveneedFloatingRegisters |= (theRegisterType != IntRegisterType); innewInterval() method but want to see if my current changes achieve the purpose.

@kunalspathakkunalspathak changed the titleSkip processing floating point registers if method doesn't have oneSkip FP registers processing if method don't have floating points useMay 4, 2023
@kunalspathak
Copy link
ContributorAuthor

NiceTP Diffs

image

@kunalspathak
Copy link
ContributorAuthor

/azp run runtime-coreclr jitstress-jitstressregs, runtime-coreclr jitstress

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@tannergooding
Copy link
Member

tannergooding commentedMay 4, 2023
edited
Loading

NiceTP Diffs

Do we know what causes the potential hit to Arm64 minopts? Seems odd that we see such huge improvements elsewhere, but not always in minopts but only on Arm64

I seetests in particular are the "worst offender", is that maybe due to the number ofHWIntrinsic tests? If so, do we know why we don't see similar in full-opts?

@kunalspathak
Copy link
ContributorAuthor

Jitstress is green.

@kunalspathak
Copy link
ContributorAuthor

I seetests in particular are the "worst offender", is that maybe due to the number ofHWIntrinsic tests? If so, do we know why we don't see similar in full-opts?

I will try to find it out.

@kunalspathakkunalspathak marked this pull request as ready for reviewMay 4, 2023 05:53
@kunalspathak
Copy link
ContributorAuthor

@dotnet/jit-contrib

@kunalspathak
Copy link
ContributorAuthor

I seetests in particular are the "worst offender", is that maybe due to the number ofHWIntrinsic tests? If so, do we know why we don't see similar in full-opts?

I will try to find it out.

PS E:\spmi\pintools\1.0\windows> .\analyze-pin-trace-diff.ps1 -b E:\features\lsra\floatregs\base_arm64_coreclr.txt E:\features\lsra\floatregs\diff_arm64_coreclr.txtBase: 431066232654, Diff: 431355053761, +0.0670%?buildIntervals@LinearScan@@QEAAXXZ                                                                                           : 571725277  : +46.55%  : 27.20% : +0.1326%?processBlockStartLocations@LinearScan@@AEAAXPEAUBasicBlock@@@Z                                                               : 349854912  : +27.75%  : 16.64% : +0.0812%?resolveRegisters@LinearScan@@QEAAXXZ                                                                                         : 270767431  : +3.38%   : 12.88% : +0.0628%?impImportAndPushBox@Compiler@@IEAAXPEAUCORINFO_RESOLVED_TOKEN@@@Z                                                            : 2230836    : +6.41%   : 0.11%  : +0.0005%?impPushOnStack@Compiler@@IEAAXPEAUGenTree@@VtypeInfo@@@Z                                                                     : -2256082   : -0.75%   : 0.11%  : -0.0005%?impImportCall@Compiler@@IEAA?AW4var_types@@W4opcode_t@@PEAUCORINFO_RESOLVED_TOKEN@@1PEAUGenTree@@HPEAUCORINFO_CALL_INFO@@I@Z : -2438804   : -0.20%   : 0.12%  : -0.0006%??$allocateRegisters@$0A@@LinearScan@@QEAAXXZ                                                                                 : -436643544 : -1.57%   : 20.77% : -0.1013%?buildPhysRegRecords@LinearScan@@AEAAXXZ                                                                                      : -464797849 : -100.00% : 22.11% : -0.1078%

I investigated the top methods and one thing that stands out is that earlier thefor-loop of registers was unrolled and now it no longer does, but iterating it fewer times is still better.
image

The call tobuildPhysRegsRecord() is now getting inlined insidebuildIntervals().

image

Other then that, the only extra thing I am doing is settingneedNonIntegerRegisters = true insideinternalFloatRegCandidates().

@kunalspathak
Copy link
ContributorAuthor

It also occurred to me that for xarch, we already updated the definition ofAVAILABLE_REG_COUNT to a getter, but for arm64, it was a constant. This PR makes arm64 definition to getter as well and that could reflect the regression on arm64 and not on x64.

tannergooding reacted with thumbs up emoji

Comment on lines +1949 to +1950
// For that `compFloatingPointUsed` should be set accurately
// before invoking allocator.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I think this bit in particular is a bit difficult as we need to pre-emptively setcompFloatingPointUsed for any SIMD code. But that can't really account for dead code elimination or other factors that might later remove or transforms the nodes to no longer need it.

Copy link
Member

@tannergoodingtannergoodingMay 5, 2023
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Would it instead be possible to haveregMaskTP LinearScan::internalFloatRegCandidates() lazily do the below, or is there other factors that require it?

That is, have it do something like:

regMaskTPLinearScan::internalFloatRegCandidates(){if (!needNonIntegerRegisters)    {// lazily initialize fpreg state herefor (unsignedint i =0; i < lsraRegOrderFltSize; i++)        {            regNumber  reg  = lsraRegOrderFlt[i];            RegRecord* curr = &physRegs[reg];            curr->regOrder  = (unsignedchar)i;        }        needNonIntegerRegisters =true;    }if (compiler->compFloatingPointUsed)    {return availableFloatRegs;    }else    {return RBM_FLT_CALLEE_TRASH;    }}

Copy link
ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Unfortunately we need the physRegRecord early right after identifying candidates.

Copy link
ContributorAuthor

@kunalspathakkunalspathakMay 5, 2023
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

But that can't really account for dead code elimination or other factors that might later remove or transforms the nodes to no longer need it.

True, but this is the best we can do I suppose, unless you are proposing to just start from scratch in LSRA and set it totrue duringBuildNode() iftree->GetType() != TYP_INT.

Edit: Doing the check inBuildNode() might prove costly though because we will be doing that for every singleGenTree* node and not sure how many cases we will optimize for the scenarios where we didn't account for dead code elimination.

@kunalspathakkunalspathak merged commit8854509 intodotnet:mainMay 6, 2023
@kunalspathakkunalspathak deleted the lsra-throughput branchMay 6, 2023 03:37
@ghostghost locked asresolvedand limited conversation to collaboratorsJun 5, 2023
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.

Reviewers

@tannergoodingtannergoodingtannergooding approved these changes

+1 more reviewer

@BruceForstallBruceForstallBruceForstall approved these changes

Reviewers whose approvals may not affect merge requirements

Assignees

@kunalspathakkunalspathak

Labels

area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

Projects

None yet

Milestone

No milestone

Development

Successfully merging this pull request may close these issues.

Segregate the register iterations based on intType/floatType

3 participants

@kunalspathak@tannergooding@BruceForstall

[8]ページ先頭

©2009-2025 Movatter.jp