Movatterモバイル変換


[0]ホーム

URL:


Skip to content

Navigation Menu

Sign in
Appearance settings

Search code, repositories, users, issues, pull requests...

Provide feedback

We read every piece of feedback, and take your input very seriously.

Saved searches

Use saved searches to filter your results more quickly

Sign up
Appearance settings

[JIT] Optimize constant V512 vector with broadcast#92017

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Merged
BruceForstall merged 8 commits intodotnet:mainfromRuihan-Yin:broadcastMov
Nov 21, 2023

Conversation

Ruihan-Yin
Copy link
Member

This PR is trying to solve#90328.

The optimization is implemented by replacing the constant V512 vector by a V128 and abroadcasti128 node when loweringGT_STOREIND plus an eligible constant V512 vector as its operand.

Currently, the implementation only coversV512 -> broadcasti128(V128), we are open to adjust the implementation or bring more situations into this PR, ideallyV512/256 -> broadcasti128(V128), when AVX512 is available. (Possibly plusV512 -> broadcast64x4(V256).)

@ghostghost added area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI community-contributionIndicates that the PR has been added by a community member labelsSep 13, 2023
@ghost
Copy link

Tagging subscribers to this area:@JulieLeeMSFT,@jakobbotsch
See info inarea-owners.md if you want to be subscribed.

Issue Details

This PR is trying to solve#90328.

The optimization is implemented by replacing the constant V512 vector by a V128 and abroadcasti128 node when loweringGT_STOREIND plus an eligible constant V512 vector as its operand.

Currently, the implementation only coversV512 -> broadcasti128(V128), we are open to adjust the implementation or bring more situations into this PR, ideallyV512/256 -> broadcasti128(V128), when AVX512 is available. (Possibly plusV512 -> broadcast64x4(V256).)

Author:Ruihan-Yin
Assignees:-
Labels:

area-CodeGen-coreclr

Milestone:-

@Ruihan-YinRuihan-Yin changed the title[JIT] Optimize constant V512 vector[JIT] Optimize constant V512 vector with broadcastSep 13, 2023
@Ruihan-Yin
Copy link
MemberAuthor

Ruihan-Yin commentedSep 14, 2023
edited
Loading

Ran the test suite twice, should be some known or random fails, turning PR to ready for review.

@Ruihan-YinRuihan-Yin marked this pull request as ready for reviewSeptember 14, 2023 23:14
@Ruihan-Yin
Copy link
MemberAuthor

Hi@EgorBo, this PR is ready for review, please see if this PR is able to cover#90328, thanks!

@tannergooding
Copy link
Member

tannergooding commentedSep 18, 2023
edited
Loading

Is this going to perform better or just save on the size of the rodata section?

What about on hardware without AVX-512 (Haswell,Skylake, etc)?

What aboutscalar->V128,scalar->V256,scalar->V512,V128->V256,V256->V512, etc?

For AVX-512, thescalar->Vector scenario can at least be covered by an embedded broadcast. But for other scenarios, this seems like its trading more instructions for smaller data section.

@Ruihan-Yin
Copy link
MemberAuthor

Is this going to perform better or just save on the size of the rodata section?

The expected improvement is saving some memory space for constant values.

What about on hardware without AVX-512 (Haswell,Skylake, etc)?

I was intended to useVBROADCASTI32X4, which is an AVX512 only instruction, but seemsVBROADCASTI128 can also do this job forV256->V128.

What aboutscalar->V128,scalar->V256,scalar->V512,V128->V256,V256->V512, etc?

I presume the scope is to achieve compressing larger existing constant vector to smaller vector in a pure memory operation, which embedded broadcast might not be able to handle.

If we want to take compressing to scalar into consideration, we might also have the opportunity: V128/256/512 ->Byte/Word/DWord/QWord.

For AVX-512, thescalar->Vector scenario can at least be covered by an embedded broadcast. But for other scenarios, this seems like its trading more instructions for smaller data section.

From my understanding of#90328, the issue is for a pure store instruction case, then the code gen is mostly:

vmovups zmm, zmmword ptr[constant section]vmovups zmmword ptr[target], zmm

the optimization is essentially replacing the first load with a broadcast instruction with a smaller constant operand.

I might get the issue wrong or incompletely, so please correct me if I have any misunderstanding.

@tannergooding
Copy link
Member

the optimization is essentially replacing the first load with a broadcast instruction with a smaller constant operand.

👍, if its primarily for the case where we'd otherwise have avmovups reg1, [addr] then it sounds great to replace that withvbroadcast reg1, [addr] where possible.

I was initially concerned it would also change:

vadd reg1, reg2, [addr]

into

vbroadcast reg3, [addr]vadd reg1, reg2, reg3

@Ruihan-Yin
Copy link
MemberAuthor

the optimization is essentially replacing the first load with a broadcast instruction with a smaller constant operand.

👍, if its primarily for the case where we'd otherwise have avmovups reg1, [addr] then it sounds great to replace that withvbroadcast reg1, [addr] where possible.

I was initially concerned it would also change:

vadd reg1, reg2, [addr]

into

vbroadcast reg3, [addr]vadd reg1, reg2, reg3

I think it wouldn't cover that case (at least it is not intended to cover), as the entry point of this opt isLowerStoreIndir().

@Ruihan-Yin
Copy link
MemberAuthor

Fail should be unrelated.

Hi,@tannergooding@EgorBo, I added the optimization for V512->V256 and V256->V128, and I think it reaches the expected coverage and ready for the reviews.

@EgorBoEgorBo self-requested a reviewOctober 16, 2023 10:33
@JulieLeeMSFT
Copy link
Member

@tannergooding, this community PR is ready to review. PTAL.

return;
}

if (!node->Data()->AsVecCon()->TypeIs(TYP_SIMD32) && !node->Data()->AsVecCon()->TypeIs(TYP_SIMD64))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I believe you can just do:

Suggested change
if (!node->Data()->AsVecCon()->TypeIs(TYP_SIMD32) && !node->Data()->AsVecCon()->TypeIs(TYP_SIMD64))
if (!node->Data()->AsVecCon()->TypeIs(TYP_SIMD32,TYP_SIMD64))

Copy link
Member

@tannergoodingtannergooding left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

LGTM. This should get a secondary review from someone on the JIT team

CC. @dotnet/jit-contrib

@tannergooding
Copy link
Member

CC.@jakobbotsch,@EgorBo in particular

@Ruihan-Yin
Copy link
MemberAuthor

Hi@jakobbotsch@EgorBo, this PR is ready for review, would you please take a look? Thanks!

@jakobbotsch
Copy link
Member

Since this is AVX-512 backend work I think@BruceForstall should take a look... On my quick glance it seemed a bit odd to do it duringSTORE_INDIR lowering when presumably constants can benefit in many other cases (as long as they're not already contained), but I am not very familiar with these instructions. Also going to close and reopen this to rerun CI.

Ruihan-Yin reacted with thumbs up emoji

@BruceForstall
Copy link
Contributor

Diffs

@BruceForstallBruceForstall merged commita672cf1 intodotnet:mainNov 21, 2023
@Ruihan-Yin
Copy link
MemberAuthor

Thanks everyone for reviewing on this!

@github-actionsgithub-actionsbot locked and limited conversation to collaboratorsDec 22, 2023
Sign up for freeto subscribe to this conversation on GitHub. Already have an account?Sign in.
Reviewers

@tannergoodingtannergoodingtannergooding approved these changes

@BruceForstallBruceForstallBruceForstall approved these changes

@EgorBoEgorBoAwaiting requested review from EgorBo

Assignees
No one assigned
Labels
area-CodeGen-coreclrCLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMIcommunity-contributionIndicates that the PR has been added by a community member
Projects
None yet
Milestone
No milestone
Development

Successfully merging this pull request may close these issues.

5 participants
@Ruihan-Yin@tannergooding@JulieLeeMSFT@jakobbotsch@BruceForstall

[8]ページ先頭

©2009-2025 Movatter.jp