This is the mail archive of the libc-alpha@sourceware.org mailing list for the glibc project.



Re: Variations of memset()


On Fri, Aug 4, 2017 at 12:54 PM, Carlos O'Donell <carlos@redhat.com> wrote:
> On 08/04/2017 03:11 PM, Carlos O'Donell wrote:
>> On 08/04/2017 03:02 PM, Matthew Wilcox wrote:
>>> Here's the sample usage from the symbios driver:
>>>
>>> -               for (i = 0 ; i < 64 ; i++)
>>> -                       tp->luntbl[i] = cpu_to_scr(vtobus(&np->badlun_sa));
>>> +               memset32(tp->luntbl, cpu_to_scr(vtobus(&np->badlun_sa)), 64);
>>>
>>> I expect a lot of users would be of this type; simply replacing the
>>> explicit for-loop equivalent with a library call.
>>
>> Have you measured the performance of this kind of conversion when using a
>> simple application and a library implementing your various memset routines?
>> In the kernel is one thing, outside of the kernel we have dynamic linking
>> and no inlining across that shared object boundary.
>
> I want to reiterate that measuring the performance of various options in
> userspace is going to be relevant (particularly when they vary from the kernel):
>
> * Application doing the naive loop above (-O0).
>
> * Application doing the naive loop above ([-O2,-O3] + <vectorize options>).
>
> * Application calling memset32 (-O0)
>
> * Application calling memset32 (-O3)
>
> <vectorize options>="-ftree-vectorize [-msse2,-mavx] -fopt-info-missed=missed.all"
>
> You need to split the memset32 into another DSO to simulate this accurately.
>

These functions aren't very useful for x86-64, where wmemset,
aka memset32, is implemented with memset:

Dump of assembler code for function __wmemset_sse2_unaligned:
   0x0000000000000020 <+0>: shl    $0x2,%rdx
   0x0000000000000024 <+4>: movd   %esi,%xmm0
   0x0000000000000028 <+8>: mov    %rdi,%rax
   0x000000000000002b <+11>: pshufd $0x0,%xmm0,%xmm0
   0x0000000000000030 <+16>: jmp    0x64 <__memset_sse2_unaligned+20>
End of assembler dump.

-- 
H.J.
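
For readers who want to try the comparison discussed in this message, here is a minimal userspace sketch of the two variants: the explicit loop and a 32-bit fill routed through wmemset. The names fill32_loop and fill32_wmemset are hypothetical and do not come from glibc or the kernel. The sketch assumes wchar_t is 32 bits wide, as it is on Linux with glibc, and it is only an illustration of the idea, not the proposed memset32 implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Assumption: wchar_t is 32 bits wide (true on Linux/glibc, not on every
   platform). Without this, wmemset cannot stand in for a 32-bit fill. */
_Static_assert(sizeof(wchar_t) == sizeof(uint32_t),
               "this sketch requires a 32-bit wchar_t");

/* Hypothetical helper: the naive loop from the quoted driver patch. */
void fill32_loop(uint32_t *dst, uint32_t value, size_t count)
{
    for (size_t i = 0; i < count; i++)
        dst[i] = value;
}

/* Hypothetical helper: the same fill expressed as a wmemset call.
   count is a number of 32-bit elements, not bytes. */
uint32_t *fill32_wmemset(uint32_t *dst, uint32_t value, size_t count)
{
    /* The value conversion is implementation-defined for values above
       INT_MAX when wchar_t is signed; GCC keeps the bit pattern. */
    wmemset((wchar_t *)dst, (wchar_t)value, count);
    return dst;
}
```

To follow the measurement advice quoted above, fill32_wmemset would presumably be built into its own shared object (for example with gcc -shared -fPIC) so the call crosses a DSO boundary, while fill32_loop stays in the application and is compiled at the optimization levels being compared.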
