This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.
Re: [PATCH] x86-64: Optimize e_expf with FMA [BZ #21912]
On 16/08/2017 11:56, Szabolcs Nagy wrote:
> On 16/08/17 15:31, Arjan van de Ven wrote:
>> On 8/16/2017 7:04 AM, Carlos O'Donell wrote:
>>> On 08/16/2017 09:34 AM, H.J. Lu wrote:
>>>> FMA optimized e_expf improves performance by more than 50% on Skylake.
>>>>
>>>> Any comments?
>>>
>>> Exactly how much of e_expf-fma.S do you need to achieve that 50% speedup?
>>
>> the core "fast path"
>> (the bit after /* Main path: here if 2^(-28)<=|x|<125*log(2) */ )
>>
>>>
>>> How does this algorithm compare to what is already implemented for e_expf?
>>
>> I started with the SSE version of that e_expf, turned it into AVX, used FMA where possible and fixed a few
>> glass jaws in the fast path that you hit on skylake.
>>
>> the slow path is more a direct 1:1 translation from SSE to AVX (because mixing SSE and AVX
>> is generally a bad idea)
>>
>
> based on my benchmarks portable c code can
> easily beat the hand written sse asm
> (i haven't tested with avx+fma though).
>
> the idea is that the x86 asm has overkill
> precision (very close to 0.5 ulp error, but
> not correctly rounded), we can debate this
> later, but i think the polynomial can be
> reduced and there should not be much difference
> between asm and c performance (only the
> round/convert to int operation is tricky:
> for different targets the optimal code is
> different, but that can be a target specific
> macro hook).
>
> anyway i posted my code to the arm
> optimized-routines github repo, i'll start
> posting the patches to glibc soon.
>
> (one of the reasons posting glibc patches is
> difficult is the nonsensical target specific
> asm codes and ifunc resolvers that break when
> i update the generic code in a way that
> bypasses the wrapper function which is another
> source of improvements.)
>

Yes, the include of generic implementation for ifunc default version could use some cleanup. However mostly, if not all, can be checked by build-many-glibc.py (it would take time though).