DEV Community
Volatile Delegate

SIMD aggregate performance

Foreword 🍓

.NET provides several types, some of them under the System.Runtime.Intrinsics namespace, that let the hardware apply a single instruction to multiple values in parallel (SIMD).

    using System.Runtime.Intrinsics;

    Vector512<int> v512;
    Vector256<int> v256;
    Vector128<int> v128;

The number suffix (512, 256, 128) indicates the size in bits of the vector that the hardware can process in parallel.
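To make the suffix concrete, here is a small sketch of how many elements ("lanes") fit in each vector type. The LaneCounts class and its method names are hypothetical, introduced only for illustration; the Count properties are part of System.Runtime.Intrinsics, and Vector512 requires .NET 8.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper, not part of the article's repository.
public static class LaneCounts
{
    // Lanes = vector width in bits / element width in bits.
    // For 32-bit int: 128 / 32, 256 / 32 and 512 / 32.
    public static int[] ForInt32() => new[]
    {
        Vector128<int>.Count,
        Vector256<int>.Count,
        Vector512<int>.Count,
    };

    // The same 256-bit vector holds more lanes when the element type is smaller.
    public static int[] For256Bits() => new[]
    {
        Vector256<byte>.Count,   // 8-bit elements
        Vector256<int>.Count,    // 32-bit elements
        Vector256<double>.Count, // 64-bit elements
    };
}
```

Count depends only on the vector width and the element size, so these values are the same whether or not the hardware accelerates that width.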

This has a positive impact on operations that compute aggregates, especially in loops over large arrays.

To find out whether the hardware supports vectors of a given size, we can check the static read-only property IsHardwareAccelerated:

    if (Vector256.IsHardwareAccelerated)
    {
        _is256 = true;
        ...
    }

The above code tests whether our hardware supports 256-bit vector operations through JIT intrinsics.
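The same check can be repeated for each width to pick the widest one available. This is a minimal sketch: the SimdSupport class and the WidestAcceleratedBits method are hypothetical names, and Vector512 assumes .NET 8 or later.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical helper, not part of the article's repository.
public static class SimdSupport
{
    // Returns the widest vector width, in bits, that the JIT accelerates
    // on the current machine, or 0 when none of them is accelerated.
    public static int WidestAcceleratedBits()
    {
        if (Vector512.IsHardwareAccelerated) return 512;
        if (Vector256.IsHardwareAccelerated) return 256;
        if (Vector128.IsHardwareAccelerated) return 128;
        return 0;
    }
}
```

On the Ryzen 7 1700 used in the benchmark further down, a check like this would be expected to stop at 256, since that machine reports AVX2 and has no Vector512 support.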

Exploring 🧠

Suppose we want to calculate the maximum and minimum of a sequence of integers in a single pass using Vector256.

The process consists of a loop that walks the sequence in 256-bit chunks, updating the running maximum and minimum with each chunk.

    (T Min, T Max) MinMax256<T>(ReadOnlySpan<T> source)
        where T : struct, INumber<T>
    {
    }

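First we initialize some references: one to the current element, one just past the last element, and one to the last position from which a full vector can still be loaded (the to variable). We also load the first vector as the initial minimum and maximum.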

    // Note: this assumes source holds at least Vector256<T>.Count elements.
    ref T current = ref MemoryMarshal.GetReference(source);
    ref T last = ref Unsafe.Add(ref current, source.Length);       // one past the end
    ref T to = ref Unsafe.Add(ref last, -Vector256<T>.Count);      // last full-vector start

    Vector256<T> minElement = Vector256.LoadUnsafe(ref current);
    Vector256<T> maxElement = minElement;

Then we start the loop. Inside, we load the data in 256-bit chunks by calling Vector256.LoadUnsafe:

    while (Unsafe.IsAddressLessThan(ref current, ref to))
    {
        Vector256<T> tempElement = Vector256.LoadUnsafe(ref current);
        minElement = Vector256.Min(minElement, tempElement);
        maxElement = Vector256.Max(maxElement, tempElement);
        current = ref Unsafe.Add(ref current, Vector256<T>.Count);
    }

We use the static Min and Max methods of Vector256, which compare the two vectors lane by lane, and store the results in minElement and maxElement.

Finally, we advance the current reference by Vector256<T>.Count elements, that is, by 256 bits.
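To see how the loop bounds split the work, the sketch below mirrors the same arithmetic with plain indices instead of refs. ChunkMath and Split are hypothetical names used only for illustration.

```csharp
using System;

// Hypothetical helper that replays the article's loop bounds with indices.
public static class ChunkMath
{
    // length: number of elements in the source.
    // lanes:  elements per vector (Vector256<int>.Count is 8).
    public static (int VectorIterations, int TailElements) Split(int length, int lanes)
    {
        int to = length - lanes; // same role as the "to" reference
        int current = 0;
        int iterations = 0;

        // Same condition as Unsafe.IsAddressLessThan(ref current, ref to).
        while (current < to)
        {
            iterations++;
            current += lanes;
        }

        // Whatever is left is handled one element at a time.
        return (iterations, length - current);
    }
}
```

For 10,000 integers and 8 lanes, Split(10000, 8) gives 1,249 vector iterations and 8 tail elements. Because the comparison is strict, a final block that exactly fills one vector is left to the scalar loop. That costs a few extra comparisons but does not change the result.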

Once the loop has passed the last full-vector position, each lane of minElement and maxElement holds a partial result, so we reduce the lanes one by one to a single minimum and maximum:

    T min = minElement[0];
    T max = maxElement[0];

    for (int i = 1; i < Vector256<T>.Count; i++)
    {
        T tempMin = minElement[i];
        if (tempMin < min)
        {
            min = tempMin;
        }

        T tempMax = maxElement[i];
        if (tempMax > max)
        {
            max = tempMax;
        }
    }

After that we calculate the remaining elements if any:

    while (Unsafe.IsAddressLessThan(ref current, ref last))
    {
        if (current < min)
        {
            min = current;
        }

        if (current > max)
        {
            max = current;
        }

        current = ref Unsafe.Add(ref current, 1);
    }

And that's all, we return the results:

    return (min, max);
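For reference, the fragments above can be assembled into one method roughly as follows. This is a sketch, not the code from the repository: the SimdAggregates class name, the empty-input check, and the scalar fallback for short inputs or unsupported element types are assumptions added so that the first LoadUnsafe never reads past the end of the span.

```csharp
using System;
using System.Numerics;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;
using System.Runtime.Intrinsics;

// Hypothetical container class; the method body follows the article's steps.
public static class SimdAggregates
{
    public static (T Min, T Max) MinMax256<T>(ReadOnlySpan<T> source)
        where T : struct, INumber<T>
    {
        if (source.IsEmpty)
            throw new ArgumentException("Source is empty.", nameof(source));

        // Assumption (not in the article): plain scalar loop when T is not a
        // supported vector element type or the input is shorter than one vector.
        if (!Vector256<T>.IsSupported || source.Length < Vector256<T>.Count)
        {
            T lo = source[0], hi = source[0];
            for (int i = 1; i < source.Length; i++)
            {
                if (source[i] < lo) lo = source[i];
                if (source[i] > hi) hi = source[i];
            }
            return (lo, hi);
        }

        ref T current = ref MemoryMarshal.GetReference(source);
        ref T last = ref Unsafe.Add(ref current, source.Length);
        ref T to = ref Unsafe.Add(ref last, -Vector256<T>.Count);

        Vector256<T> minElement = Vector256.LoadUnsafe(ref current);
        Vector256<T> maxElement = minElement;

        // Vector part: lane-wise min and max over full 256-bit chunks.
        while (Unsafe.IsAddressLessThan(ref current, ref to))
        {
            Vector256<T> tempElement = Vector256.LoadUnsafe(ref current);
            minElement = Vector256.Min(minElement, tempElement);
            maxElement = Vector256.Max(maxElement, tempElement);
            current = ref Unsafe.Add(ref current, Vector256<T>.Count);
        }

        // Reduce the lanes to single values.
        T min = minElement[0];
        T max = maxElement[0];
        for (int i = 1; i < Vector256<T>.Count; i++)
        {
            if (minElement[i] < min) min = minElement[i];
            if (maxElement[i] > max) max = maxElement[i];
        }

        // Scalar tail: elements that were not covered by the vector loop.
        while (Unsafe.IsAddressLessThan(ref current, ref last))
        {
            if (current < min) min = current;
            if (current > max) max = current;
            current = ref Unsafe.Add(ref current, 1);
        }

        return (min, max);
    }
}
```

With an int[] named data, a call such as SimdAggregates.MinMax256<int>(data) returns both values in one pass. The explicit type argument is needed there because T cannot be inferred through the array-to-span conversion.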

Benchmark 🔥

A quick test with BenchmarkDotNet, calculating the maximum and minimum of an array of 10_000 integers, shows the Vector256 version on .NET 8 running about 146 times faster than the LINQ version on .NET Framework 4.8.

💡 Ryzen 7 1700, 1 CPU
.NET SDK=8.0.100-rc.1.23455.8

| Method | Runtime | Mean (ns) |
|---|---|---|
| 🐢 MinMaxLinq | .NET Framework 4.8 | 118,675.226 |
| MinMaxSimd | .NET 8.0 | 808.150 |

Farewell

All the code, along with a more elaborate example, is hosted on GitHub. Be happy and love your family 💖

Simd Iteration

Test SIMD 512, 256, 128 registers for fast aggregate calculations.

Unfortunately my hardware doesn't support Vector512.

Anyway, the performance improvement is mindblowing.

Important

MinMaxSimd on .NET 8 is about 146 times faster than MinMaxLinq on .NET Framework 4.8 at calculating the Min and Max at the same time!

Results

  • BenchmarkDotNet=v0.13.5, OS=Windows 10 (10.0.19044.3086/21H2/November2021Update)
  • AMD Ryzen 7 1700, 1 CPU, 16 logical and 8 physical cores
  • .NET SDK=8.0.100-rc.1.23455.8
  • [Host] : .NET 8.0.0 (8.0.23.41904), X64 RyuJIT AVX2
  • .NET 7.0 : .NET 7.0.11 (7.0.1123.42427), X64 RyuJIT AVX2
  • .NET 8.0 : .NET 8.0.0 (8.0.23.41904), X64 RyuJIT AVX2
  • .NET Framework 4.8 : .NET Framework 4.8 (4.8.4644.0), X64 RyuJIT VectorSize=256
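| Method | Runtime | Size | Mean | Allocated |
|---|---|---|---|---|
| MinMaxLinq | .NET Framework 4.8 | 10000 | 118,675.226 ns | 65 B |
| MinMaxLinq | .NET 7.0 | 10000 | 2,350.046 ns | - |
| MinMaxLinq | .NET 8.0 | 10000 | 1,228.518 ns | - |
| MinMaxSimd | .NET 7.0 | 10000 | 834.291 ns | - |
| MinMaxSimd | .NET 8.0 | 10000 | 808.150 ns | - |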

References

System.Runtime.Intrinsics Namespace | Microsoft Learn

Contains types used to create and convey register states in various sizes and formats for use with instruction-set extensions. For instructions for manipulating these registers, see System.Runtime.Intrinsics.X86 and System.Runtime.Intrinsics.Arm.

learn.microsoft.com

System.Runtime.Intrinsics work planned for .NET 8 #79005

This is a work in progress as we develop our .NET 8 plans. This list is expected to change throughout the release cycle according to ongoing planning and discussions, with possible additions and subtractions to the scope.

Summary

During .NET 8, we will be focusing on AVX-512, an effort that includes the addition of a new intrinsic type Vector512 as well as Vector<T> improvements. Beyond that major theme, we will invest in quality, enhancements and new APIs. This is an ambitious set of work, so it's likely that several of the items below will be pushed out beyond .NET 8. It is also likely additional items will be added throughout the year.
