precise_math attribute on functions #2080
dneto0 left a comment
Thanks for taking this stab. It's getting there
wgsl/index.bs Outdated
| <tr><td><dfn noexport dfn-for="attribute">`precise_math`</dfn> | ||
| <td>*None* | ||
| Indicates that the arithmetic computations in the function need to be performed with |
I have trouble with the word "precision" here, because that means "with more bits represented".
(Never mind the "precise" part of the attribute name, inherited from GLSL. It's good to reuse the GLSL word.)
Also, this should be constrained to floating point, I think.
How about:
Indicates that the floating point arithmetic computations in the function should be performed
- without [=reassociation/reassociating=] subexpressions
- while preserving infinities, NaNs, and signed zeroes
Apply this attribute when the correctness of the function is numerically sensitive, and it is acceptable to incur potential performance loss when forbidding such optimizations.
blah blah blah?
Thank you!
I took the liberty of modifying this a bit more. Let me know if it needs more fixing!
I felt that it was important to refer to the floating point evaluation section from here
dneto0 left a comment
Seems ok to me now.
The key word is "should", instead of "must"
The group should review this.
litherum commented Sep 7, 2021 • edited
Metal exposes fastmath on the entire module: https://developer.apple.com/documentation/metal/mtlcompileoptions?language=objc. So this is a good idea, but it should be elevated to module-level (either by something at the global scope in the language, or as additional data to
litherum commented Sep 7, 2021
The SPIR-V registry says SignedZeroInfNanPreserve is missing before version 1.4. The earliest version of Vulkan to require SPIR-V 1.4 is Vulkan 1.2, which I thought was unavailable on most Android devices. Can we really require it?
kvark commented Sep 7, 2021
I don't think we require this
There is definitely value in having it exposed at a more granular level than the module scope:
kdashg commented Sep 8, 2021
WGSL meeting minutes 2021-09-07
dneto0 commented Sep 8, 2021
To fill in a detail:
kvark commented Sep 8, 2021
My reading of the current state of the debate is that we need to decide if this functionality is testable or not. I believe having it testable would make a stronger API, and thus we need to explore this path before proceeding (with this PR as it stands now). It sounds like DX12 and Metal support this "precise" mode unconditionally, and there is a chance we'll be able to test it. In Vulkan, it's more complicated. As @dneto0 noted, there is an extension. However, one has to check for the properties of this extension before using them: https://vulkan.gpuinfo.org/listpropertiesextensions.php?extension=VK_KHR_shader_float_controls&platform=all If we make this an optional feature, we'd deny access to it for users who either don't care about
dneto0 commented Sep 8, 2021
Agreed. I wasn't sure on the call yesterday, so I investigated what Vulkan does to test NoContract: The NoContract feature has been supported by SPIR-V / Vulkan from the start. The test tempts the compiler to fuse a multiply-add into one operation (FMA). This depends crucially on the fact that certain basic operations (add, subtract, multiply) are "correctly rounded" (as defined by IEEE 754, and adopted by Vulkan and WGSL). In general, catastrophic cancellation can be used to magnify errors for other undesirable cases: reassociation, distribution of multiply over addition. So I think fusing, reassociation, and distribution aspects are testable.
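A minimal sketch of why fusing and reassociation are observable, written in Python rather than WGSL or SPIR-V so that it can be run anywhere. It is not the Vulkan test itself. The assumptions are loud ones: it uses IEEE 754 binary64 on the CPU (shaders would normally use f32), and the helper names `fused_mul_add` and `unfused_mul_add` are hypothetical. The fused result is emulated with exact rational arithmetic followed by a single rounding.

```python
from fractions import Fraction


def fused_mul_add(a: float, b: float, c: float) -> float:
    """Emulate an FMA: compute a*b + c exactly, then round once to binary64."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def unfused_mul_add(a: float, b: float, c: float) -> float:
    """Separate multiply and add: the product is rounded before the add."""
    return a * b + c


# Fusing: a*b is exactly 1 - 2**-60, which rounds to 1.0 as a separate multiply.
# Subtracting 1.0 then cancels everything, so the two evaluations disagree.
a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0
unfused = unfused_mul_add(a, b, c)  # rounded product 1.0, minus 1.0
fused = fused_mul_add(a, b, c)      # keeps the -2**-60 term

# Reassociation: the same three operands grouped two different ways.
x, y, z = 1e16, -1e16, 1.0
left = (x + y) + z   # cancellation happens first, then z is added
right = x + (y + z)  # z is absorbed into y by rounding, then cancelled

print(unfused, fused, left, right)
```

In both cases catastrophic cancellation turns a rounding difference of a fraction of an ulp into a difference between zero and non-zero, which is what makes the behavior checkable by a test.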
mrshannon commented Sep 8, 2021 • edited
In an ideal world
We already have the
kvark commented Sep 8, 2021
Metal is not exactly all or nothing. As @kainino0x pointed out in #2076 (comment), we can pick a subset of fast-math stuff. It sounds like you are suggesting to adopt the current PR but cut out everything related to Then we can have an optional feature exposing something that captures
mrshannon commented Sep 8, 2021 • edited
Yes, just disable fusing, reassociation, and distribution. With signed zeros and such not universal, and the lack of example code that would be affected by them, I am proposing scaling back to only what
I was not sure if that was kosher since it was not documented in the Metal spec.
Or you could wait until someone needs it. It's probably a failure of my imagination, but I can't think of a case where asymptotic limits would be of use in rendering.
kvark commented Sep 8, 2021
The last commit here describes these semantics. I'm sure @dneto0 would want to put more technical details of what is preserved, adding examples and such, and I'm hoping we can follow up with this.
munrocket commented Sep 8, 2021
Love to see where it is going, thanks @kvark. Floating point expansions definitely do not rely on NaN/Infinity/SignedZero.
litherum commented Sep 9, 2021
From talking with the Metal team, we haven't gotten requests to apply fastMath per function rather than per MTLLibrary. This makes intuitive sense, because the use cases that need IEEE precision are things like scientific computing, where it's likely that all the functions in the library will need to be precise. Conversely, for use cases like games, it's likely that none of the functions in the library will need to be precise. (Games do need things like the
litherum commented Sep 9, 2021 • edited
These things aren't API. Ideally, WebGPU / WGSL wouldn't rely on anything that isn't API in the 3 backend APIs. The API is a single boolean switch. (Anything that isn't API is unsupported, and able/willing to be removed at any point in the future.)
litherum commented Sep 9, 2021 • edited
It would be unfortunate to make fastMath a "best effort" attribute. From an author's perspective, what's the point of a precision guarantee if the guarantee isn't actually guaranteed? From an implementor's perspective, why would an implementor implement any of the feature at all if it just slows down code and doesn't actually have any expected (testable) behavior? Or, stated a different way: Let's say I want to implement this feature in a particular WebGPU implementation, and I sit down and start typing code into the computer to do it. How do I know when I'm done? Why shouldn't I consider myself to be done implementing the feature before writing a single line of code?
kvark commented Sep 9, 2021
@litherum it sounds like the desire to have this behavior testable is shared between all parties, so it's good to have this settled. The last version of the PR, which I mentioned in #2080 (comment), already makes it normative. It just doesn't spell out the exact norms affecting it, which are intended to be written at some point. So, no "best effort" any more. As for the scope of the change, I'm curious what use cases there are to consider. From a distance, it felt useful to be able to make, say, vertex shaders precise but not the fragment shaders. Or even just the computation of one specific output of a vertex shader. But I haven't used this myself, so happy to hear ISV feedback! @mrshannon could you share the intended usage of this attribute? Would you be doing it for the whole module, or potentially more granularly?
mrshannon commented Sep 9, 2021 • edited
First, I am specifically talking about
The first is extremely large scale terrain generation in a compute shader which requires double precision. An existing example of this is Elite Dangerous which uses real doubles on some cards and emulated doubles (which require
Use in the wild: Generating the Universe in Elite Dangerous
The second case is when rendering very large objects (which cannot be handled in other ways). To avoid jitter we need to perform the model to camera space transform in double precision sometimes. Therefore, again emulated doubles. But in this case the calculation is in the vertex shader and is pretty narrow in scope as it is just used for the model to world transform and furthermore is only used on a small subset of vertices (those close to the camera). Therefore it would be undesirable to require
Use in the wild: 3D Engine Design for Virtual Globes
This is not true, see Generating the Universe in Elite Dangerous. What is required is not IEEE but specifically the guarantees that HLSL gives with its
In general there are cases where floating point error needs to be mitigated, even in rendering, which requires controlling the order of operations.
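To illustrate why emulated doubles depend on the order of operations, here is a sketch of the classic error-free addition (Knuth's TwoSum), which is a common building block of double-word arithmetic. It is an illustration and not code from either cited project. It is written in Python using binary64 as an assumption (a shader would pair two f32 values), and the name `two_sum` is hypothetical.

```python
from fractions import Fraction


def two_sum(a: float, b: float) -> tuple[float, float]:
    """Return (s, err) with s = round(a + b) and s + err == a + b exactly.

    Every parenthesis matters: under real-number algebra `err` simplifies
    to zero, so an optimizer that reassociates may delete it entirely.
    """
    s = a + b
    b_virtual = s - a                          # the part of b that made it into s
    err = (a - (s - b_virtual)) + (b - b_virtual)
    return s, err


s, err = two_sum(1e16, 1.0)  # 1.0 is below the rounding granularity of 1e16
print(s, err)

# The pair (s, err) represents the exact sum, checked here with rationals.
exact = Fraction(1e16) + Fraction(1.0)
recovered = Fraction(s) + Fraction(err)
print(recovered == exact)
```

The high word alone loses the small addend; the low word recovers it only if the subtractions are evaluated exactly as written.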
kainino0x commented Sep 10, 2021
FWIW the flags I pointed to can probably only be used when invoking an MSL compiler via command line, but not via newLibraryWithSource. However I found the associated clang pragmas: Of course @litherum's point that these aren't officially supported still stands.
mrshannon commented Sep 14, 2021 • edited
@munrocket Not sure that it is working on Windows; is the top of the fractal supposed to be filled with strange bands? Also not sure you need
munrocket commented Sep 14, 2021 • edited
@mrshannon yes, it shows float32 with limited precision. It’s intentional. I am starting to think that fast-math is pretty OK even for this purpose, because the Dekker multiplication algorithm becomes about 10x smaller (2 FLOP vs 17 FLOP) with a hardware fma instruction. It is implicitly inherited from fma(a,b,c) in the current WebGPU implementation in Chrome/Firefox. Also with If you are going to expose precise math in this PR, then fma(a, b, c) will become the twice-rounded expression RN(RN(a * b) + c), and you will need to use a slower algorithm. I don’t know whether you could add support for hardware fma in this PR or not. But currently it is a trade-off: fast multiplication and slow summation vs. slow multiplication and fast summation.
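A sketch of the 2-FLOP product mentioned above, and of what happens to it when fma is evaluated as the twice-rounded RN(RN(a * b) + c). It is written in Python with binary64 as an assumption (the discussion is about f32 shaders). The single-rounding fma is emulated with exact rationals, and the names `two_prod`, `fma_single_rounding` and `fma_twice_rounded` are hypothetical.

```python
from fractions import Fraction


def fma_single_rounding(a: float, b: float, c: float) -> float:
    """Emulated hardware-style fma: exact a*b + c, rounded once."""
    return float(Fraction(a) * Fraction(b) + Fraction(c))


def fma_twice_rounded(a: float, b: float, c: float) -> float:
    """RN(RN(a * b) + c): the product is rounded before the add."""
    return a * b + c


def two_prod(a: float, b: float, fma) -> tuple[float, float]:
    """Two-operation product: p = RN(a*b), e = fma(a, b, -p)."""
    p = a * b
    e = fma(a, b, -p)
    return p, e


a = b = 1.0 + 2.0**-27  # exact product is 1 + 2**-26 + 2**-54
p, e_exact = two_prod(a, b, fma_single_rounding)
_, e_lost = two_prod(a, b, fma_twice_rounded)
print(p, e_exact, e_lost)

# With a single rounding, p + e reproduces the product exactly.
is_error_free = Fraction(p) + Fraction(e_exact) == Fraction(a) * Fraction(b)
print(is_error_free)
```

With the twice-rounded form the error term collapses to zero, which is why a slower splitting-based multiplication would be needed in that case.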
mrshannon commented Sep 14, 2021 • edited
@munrocket We just tested the We are likely to use it over this PR (even if it is merged in) as it has better performance on Metal due to not disabling all optimizations and works at the expression and not at the module level.
munrocket commented Sep 15, 2021
Glad to help. The only reason someone would still need to turn off fast-math is if they detect that the host doesn’t support FMA in hardware. After that fma emulation with I am removed
kvark commented Sep 15, 2021
Hey users, if you keep finding nice hacks and workarounds for this, we'll have no incentive to do anything with the spec! 🤪
munrocket commented Sep 15, 2021 • edited
Ha-ha, that was fun. It's actually a miracle that it works, because the current rounding is UB and not specified, as is fast-math mode. This PR still has potential. For example, if somebody figures out how to turn on correct math and fused-multiply-add at the same time, mrshannon will probably use it.
kdashg commented Sep 15, 2021
WGSL meeting minutes 2021-09-14
dneto0 commented Sep 15, 2021
Hey @munrocket thanks for this technique! And thank you also for a nice compact test case. We had been discussing the need for a good way to test the behaviour. Some thoughts:
So two thumbs up for this technique!
dneto0 commented Sep 15, 2021
It's a feature, not a bug. :-)
dneto0 commented Sep 15, 2021
Another thing about the performance cost: Yes, this prevents the compiler from rearranging code to go faster, but that's exactly what the programmer wanted.
munrocket commented Sep 15, 2021
It works with round-to-nearest-even floating point rounding, which is usually the default, but is not specified for some reason in DX11, for example. Also as mentioned: floating point arithmetic is not associative, and muladd should be allowed only for fma.
Usually emulated double addition costs 20 flop, and multiplication costs 24 flop with software FMA or 9 flop with hardware. So it's cheap. When we using
Interesting, if we need stronger confidence, we can pass a variable there.
dneto0 commented Sep 15, 2021
About the performance cost, I meant the additional performance cost of using the select. Thanks for the extra info for the cost of the double precision emulation. :-) Right, rounding mode is not specified for graphics APIs because some devices use round-to-even, some use round-to-zero (which is cheaper in hardware). Does select do branching? It is common for GPUs to use predicated execution: they execute both paths, but selectively turn off side effects of the path "not taken", and then only use the chosen result. (wikipedia) This trades off possibly wasting cycles stepping through the dead code path, but saves the machine from taking a branch and destroying internal state. So that's why I would hope to make the evaluation of the "other" path and the condition cheap: we want that so on a predicated execution they don't waste too much extra time.
litherum commented Sep 21, 2021 • edited
Both of these use cases are supported by putting the precise attribute on the entire module rather than the individual function - just put these two entry points and their dependent functionality in a separate module. Linking a vertex shader from one module and a fragment shader from a different module is supported.
kvark commented Sep 21, 2021
If I understand correctly, we are mostly fine with introducing
dneto0 commented Sep 21, 2021
The user has a reasonable workaround, and it appears to be performant and likely stable over time. I thought this was an easy "not in V1.0" decision.
kvark commented Sep 21, 2021
I'm not happy about this workaround becoming sort of a tribal knowledge thing. fn compute(val: T) -> T So doing
mrshannon commented Sep 21, 2021
It would be wasteful in the 2nd case. Perhaps as much as 10% of vertices (depending on camera location) in any given object need emulated double vertex position. The rest can take the faster 32-bit float path as they are further from the camera.
mrshannon commented Sep 21, 2021
Either this or actual function scope (including Metal) would keep us from using the
kvark commented Sep 28, 2021
@litherum it looks like MSL supports
kainino0x commented Sep 28, 2021
kdashg commented Sep 29, 2021
WGSL meeting minutes 2021-09-28
revoking my own review. Let's reconsider with fresh eyes
greggman commented Apr 7, 2024
I'm not sure this idea appeared above but .... what about a module level flag that only works if a feature like "high-precision" exists? So you check if the adapter supports "high-precision". If it does you request a device with This way, if a GPU/driver can't pass the high-precision CTS tests it doesn't advertise the feature. If you don't like features bleeding into WGSL you could move the check into pipeline creation where you use the precision keywords/options in WGSL but when you go try to make a pipeline, if you didn't request the
TimTheBig commented Oct 26, 2025
Is there any way I can get this moving again?
| * without[=Reassociation|reassociating=] subexpressions | ||
| Note: this translates to `NoContraction` decoration in SPIR-V, `precise` qualifier in HLSL, | ||
| and a subset of `"-fno-fast-math" group of compile options in MSL. |
and a subset of `"-fno-fast-math" group of compile options in MSL.
Should the subset not be documented?
Closes #2077
Fixes #2076
NoContraction. MTLLibrary. precise to the variable declarations used by the affected functions.
Note: SignedZeroInfNanPreserve and other features of VK_KHR_shader_float_controls are intentionally not included.