[AMDGPU] Add scheduling stage to rewrite MFMA from VGPR to AGPR #149367


Draft
jrbyrnes wants to merge 1 commit into llvm:main from jrbyrnes:MFMASchedRewriteRebase0

jrbyrnes (Contributor) commented Jul 17, 2025 (edited)

After #145025, we will always produce the VGPR MFMA form. While this is beneficial in some cases, there are still cases where the AGPR form is preferred: specifically, when we have high per-iteration register pressure (RP) coming from MFMAs and no in-loop VGPR users of MFMAs. In such cases, selecting the VGPR form may cause an explosion in VGPR pressure, which degrades scheduling quality. The PostRA MFMA rewriter can help improve register allocation (RA) for some of these cases, but it will not help the scheduler.

This PR performs the rewrite during scheduling, as a separate scheduling stage. It will only try to go from the VGPR form to the AGPR form if ArchVGPR pressure is over the addressable limit and if we find that we will not need to issue any cross-register-class (RC) copies in the loop. We could also implement the AGPR form -> VGPR form rewrite, but the assumption is that we will always produce the VGPR form.
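As an illustration only, the gating condition described above can be sketched as a small standalone predicate. This is not code from the patch: `MFMAInfo`, `RegionInfo`, `needsCrossClassCopyInLoop`, and `shouldRewriteToAGPR` are hypothetical names, and the real stage presumably derives these facts from the scheduler's pressure tracking and use-def information, not from precomputed flags.

```cpp
// Hypothetical, simplified model of the rewrite gate. None of these
// types or function names come from the LLVM patch.
#include <vector>

struct MFMAInfo {
  // True if some in-loop user of this MFMA would need its value in a VGPR,
  // which would force a cross-register-class copy after the rewrite.
  bool HasInLoopVGPRUser;
};

struct RegionInfo {
  unsigned ArchVGPRPressure;     // maximum ArchVGPR pressure in the region
  unsigned AddressableArchVGPRs; // addressable ArchVGPR limit for the target
  std::vector<MFMAInfo> MFMAs;   // rewrite candidates in the region
};

// Returns true if rewriting any candidate would require an in-loop copy.
bool needsCrossClassCopyInLoop(const RegionInfo &R) {
  for (const MFMAInfo &M : R.MFMAs)
    if (M.HasInLoopVGPRUser)
      return true;
  return false;
}

// Attempt the VGPR -> AGPR rewrite only when ArchVGPR pressure exceeds the
// addressable limit and no in-loop cross-class copy would be introduced.
bool shouldRewriteToAGPR(const RegionInfo &R) {
  if (R.MFMAs.empty())
    return false;
  if (R.ArchVGPRPressure <= R.AddressableArchVGPRs)
    return false;
  return !needsCrossClassCopyInLoop(R);
}
```

In this model a single candidate with an in-loop VGPR user blocks the rewrite for the whole region; whether the actual heuristic is that conservative is one of the open questions noted in the description.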

This is a WIP:
- Needs more testing
- Still a bit undecided about the heuristic
- Considering making the implementation more generalized for other types of rewriting / transformations, though this may be left as a TODO

Putting up draft for any feedback.

Change-Id: I47b2a4274a35f3cf0a6d064674d1d29526e4dfd2
lucas-rami (Contributor) commented:

About the heuristic, instead of relying on cycle depth, how about using block frequencies and latency estimates of a cross-class copy vs a spill save/restore to determine how much copying we can afford without increasing latency? This is what I am doing to estimate rematerialization benefit in my upcoming scoring system for remat candidates (branch), so I think the cost of deriving block frequencies could even be factored in among the scheduler's stages.
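One way to picture the suggestion above is a frequency-weighted comparison between the copies a rewrite would add and the spill traffic it would avoid. The sketch below is an assumption-laden illustration, not the scoring system from the referenced branch: `BlockCost`, `Latencies`, and all function names are hypothetical, and the frequencies and latencies would have to come from real block-frequency and scheduling-model data.

```cpp
// Hypothetical cost comparison; names and structure are illustrative only.
#include <vector>

struct BlockCost {
  double Frequency;          // relative execution frequency of the block
  unsigned CrossClassCopies; // copies the rewrite would add in this block
  unsigned SpillPairs;       // spill save/restore pairs the rewrite would avoid
};

struct Latencies {
  double Copy;             // estimated latency of one cross-class copy
  double SpillSaveRestore; // estimated latency of one save plus one restore
};

// Total latency added by cross-class copies, weighted by block frequency.
double weightedCopyCost(const std::vector<BlockCost> &Blocks,
                        const Latencies &L) {
  double Cost = 0.0;
  for (const BlockCost &B : Blocks)
    Cost += B.Frequency * B.CrossClassCopies * L.Copy;
  return Cost;
}

// Total latency saved by avoided spills, weighted by block frequency.
double weightedSpillCost(const std::vector<BlockCost> &Blocks,
                         const Latencies &L) {
  double Cost = 0.0;
  for (const BlockCost &B : Blocks)
    Cost += B.Frequency * B.SpillPairs * L.SpillSaveRestore;
  return Cost;
}

// The rewrite pays off when the copies cost less than the spills they avoid.
bool rewriteIsProfitable(const std::vector<BlockCost> &Blocks,
                         const Latencies &L) {
  return weightedCopyCost(Blocks, L) < weightedSpillCost(Blocks, L);
}
```

Compared with a cycle-depth check, a model like this can accept a few copies in a cold block when they remove spills from a hot one, at the price of needing block frequencies to be available to the scheduler.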

Reviewers (awaiting requested review): arsenm, kerbowa, rampitec, lucas-rami, srpande

