NotificationsYou must be signed in to change notification settings
Fork5.2k
Star17.2k

Fix SQRT, RCP, RSQRT PerfScores#50813

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Jump to bottom

Merged

briansull merged 1 commit intodotnet:mainfrompentp:sqrt-perfscore

Apr 15, 2021

Merged

Fix SQRT, RCP, RSQRT PerfScores#50813

briansull merged 1 commit intodotnet:mainfrompentp:sqrt-perfscore

Apr 15, 2021

Conversation

Copy link

Contributor

tannergoodingApr 7, 2021•
edited
Loading

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Onhttps://uops.info/table.html, I see the following for Skylake:

sqrtsd   64, TP 1.00 / 4.50, LAT [≤13.0;≤19]sqrtpd  128, TP 1.00 / 4.50, LAT [≤13.0;≤19]sqrtpd  256, TP 1.00 / 9.00, LAT [≤13.0;≤20]sqrtss   32, TP 1.00 / 3.00, LAT [≤12.0;≤13]sqrtps  128, TP 1.00 / 3.00, LAT [≤12.0;≤13]sqrtps  256, TP 1.00 / 6.00, LAT [≤12.0;≤13]rsqrtss  32, TP 1.00 / 1.00, LAT [≤4.0;≤5]rsqrtps 128, TP 1.00 / 1.00, LAT [≤4.0;≤5]rsqrtps 256, TP 1.00 / 1.00, LAT [≤4.0;≤5]rcpss    32, TP 1.00 / 1.00, LAT 4rcpps   128, TP 1.00 / 1.00, LAT 4rcpps   256, TP 1.00 / 1.00, LAT 4

Copy link

Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

In particular, it looks like the 256-bit variants can be twice as expensive on throughput (as least for non-reciprocal cases)

Copy link

ContributorAuthor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Yes, I saw that also when I checked the values, thought of even coding the argument size check, but then decided against it based on the fact that it wouldn't have any practical effect since the PerfScore is dominated by the larger latency value.

Copy link

Member

tannergoodingApr 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

That's fair, just thought I'd call it out 😄

It might be worth an up-for-grabs issue to get these correctly tracked. Given thatuops.info has everything available in an xml file (https://uops.info/xml.html), it would likely be possible to have a small generator that fills in a metadata header file (much like is used forhwintrinsiclistxarch.h) so this can be automated and is trivial to update to newer or older micro-architectures.

Copy link

ContributorAuthor

pentpApr 8, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

I'm not too familiar with PerfScore code, but such an approach might be quite complicated - the uops.info measurements are made for each variant of encoding/reg-vs-mem/op-size and latencies are for each combination of input and output parameters. Currently PerfScore values are more of an art than science (e.g. latency for DIV is just set to the arithmetic mean of min/max, while in practice it might actually be on the lower end for 128:64 div as the upper 64bits are basically always zero) or the latency for MUL rdx:rax,reg is set to 4 (high half of output used) as it's the common use case, not 3 (only low half used) which is only rarely emitted.

Copy link

Member

AndyAyersMS commentedApr 9, 2021

@briansull PTAL

briansull approved these changes

Apr 14, 2021

View reviewed changes

Copy link

Contributor

briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others.Learn more.

Looks Good,
Thanks for your contribution

briansull merged commit0cf4f15 intodotnet:main

Apr 15, 2021

JulieLeeMSFT assignedpentp

Apr 15, 2021

JulieLeeMSFT added this to the6.0.0 milestone

Apr 15, 2021

pentp deleted the sqrt-perfscore branch

April 15, 2021 23:58

ghost locked asresolvedand limited conversation to collaborators

May 16, 2021

Labels

area-CodeGen-coreclr

CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI

5 participants

		result.insThroughput =PERFSCORE_THROUGHPUT_3C;
		result.insLatency +=PERFSCORE_LATENCY_12C;
		result.insThroughput =PERFSCORE_THROUGHPUT_1C;
		result.insLatency +=PERFSCORE_LATENCY_4C;

Movatterモバイル変換

Fix SQRT, RCP, RSQRT PerfScores#50813

Fix SQRT, RCP, RSQRT PerfScores#50813

Uh oh!

Conversation

pentp commentedApr 6, 2021

Uh oh!

tannergoodingApr 7, 2021• editedLoading Uh oh!There was an error while loading.Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tannergoodingApr 7, 2021

Choose a reason for hiding this comment

Uh oh!

pentpApr 7, 2021

Choose a reason for hiding this comment

Uh oh!

tannergoodingApr 7, 2021

Choose a reason for hiding this comment

Uh oh!

pentpApr 8, 2021

Choose a reason for hiding this comment

Uh oh!

AndyAyersMS commentedApr 9, 2021

Uh oh!

briansull left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

5 participants

tannergoodingApr 7, 2021•
edited
Loading