This is the link to godbolt with the full reproducer: https://godbolt.org/z/qYczcba39
The problem is that the pragma doesn't switch the floating-point mode when the intrinsics are used directly, but it does work when using the operators on `__m128` values.
I originally discovered this in clang 14.0.1.
The code to see the problem is this (compiled with `-Ofast -msse4.2 -mrecip=none`):

    __m128 func(__m128 d, float oldLen, float newLen) {
    #pragma float_control(precise, on)
        return _mm_div_ps(_mm_mul_ps(d, _mm_set1_ps(oldLen)), _mm_set1_ps(newLen));
    }

    __m128 func1(__m128 d, float oldLen, float newLen) {
    #pragma float_control(precise, on)
        return d * oldLen / newLen;
    }

And it leads to this assembly:

    .LCPI1_0:
            .long   0x3f800000                        # float 1
    func(float __vector(4), float, float):            # @func(float __vector(4), float, float)
            shufps  xmm1, xmm1, 0                     # xmm1 = xmm1[0,0,0,0]
            mulps   xmm0, xmm1
            movss   xmm1, dword ptr [rip + .LCPI1_0]  # xmm1 = mem[0],zero,zero,zero
            divss   xmm1, xmm2
            shufps  xmm1, xmm1, 0                     # xmm1 = xmm1[0,0,0,0]
            mulps   xmm0, xmm1
            ret
    func1(float __vector(4), float, float):           # @func1(float __vector(4), float, float)
            shufps  xmm1, xmm1, 0                     # xmm1 = xmm1[0,0,0,0]
            mulps   xmm0, xmm1
            shufps  xmm2, xmm2, 0                     # xmm2 = xmm2[0,0,0,0]
            divps   xmm0, xmm2
            ret

Generally, the `x * (1/a)` optimization used for `func` here seems questionable, and clang doesn't apply it to scalars, only to vector/SIMD types. Is this another bug that needs to be reported separately?
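To illustrate why replacing a division with a multiplication by the reciprocal matters under precise semantics, here is a minimal scalar sketch. It is not part of the reproducer: the helper names `div_exact`, `div_via_recip` and `count_mismatches` are made up for illustration, and it assumes IEEE-754 single precision with round-to-nearest and a build without fast-math flags. The `volatile` temporaries are only there to force each intermediate result to be rounded to `float`.

```c
/* Exact division: a single correctly rounded operation. */
float div_exact(float x, float a) {
    volatile float q = x / a;
    return q;
}

/* Reciprocal form: 1/a is rounded first, then the product is rounded
 * again, so the result can differ from x / a in the last bit. */
float div_via_recip(float x, float a) {
    volatile float r = 1.0f / a;
    volatile float q = x * r;
    return q;
}

/* Count the integer pairs (x, a) in [1, n] x [1, n] for which the two
 * forms give different float results. */
int count_mismatches(int n) {
    int mismatches = 0;
    for (int a = 1; a <= n; ++a) {
        for (int x = 1; x <= n; ++x) {
            if (div_exact((float)x, (float)a) !=
                div_via_recip((float)x, (float)a)) {
                ++mismatches;
            }
        }
    }
    return mismatches;
}
```

For instance, under these assumptions `div_exact(49.0f, 7.0f)` is exactly `7.0f`, while `div_via_recip(49.0f, 7.0f)` comes out one ulp above it, which is the kind of difference `float_control(precise, on)` is expected to rule out.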