[wasm] Initial SIMD support #70086


Merged
radekdoulik merged 25 commits into dotnet:main from radekdoulik:pr-wasm-simd-wip-3
Jun 6, 2022

radekdoulik (Member)

Add initial SIMD support for wasm. This is a subset of the original draft PR without the public API additions. I left the underlying parts of the new WasmBase class implementation here as well so as not to lose them.

Add the WasmSIMD property to enable SIMD in AOT builds. With the property enabled, apps built with AOT get SIMD intrinsics inlined for parts of the S.R.I.Vector128 and S.R.I.Vector128<T> API.

Add a test to build and run a simple app with SIMD enabled.

Example of the produced code:

> wa-info -d -f Vector.*Multiply.*RunStep src/mono/sample/wasm/browser-bench/bin/Debug/AppBundle/dotnet.wasm
(func Wasm_Browser_Bench_Sample_Sample_VectorTask_Multiply_RunStep(param $0 i32, $1 i32))
 local.get $0
 i32.eqz
 if
  call mini_llvmonly_throw_nullref_exception
  unreachable
 local.get $0
 local.get $0
 v128.load offset:24    [SIMD]
 local.get $0
 v128.load offset:8    [SIMD]
 i32x4.mul    [SIMD]
 v128.store offset:40    [SIMD]

The C# code:

            ...
            Vector128<int> vector1, vector2, vector3;
            public override void RunStep() => vector3 = vector1 * vector2;
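For readers who want to try the same operation outside the benchmark, here is a minimal, self-contained sketch. The class and method names (VectorMultiplyDemo, Multiply) are hypothetical and not part of this PR, and it assumes .NET 7 or later, where the `*` operator on `Vector128<T>` is available. Whether the multiply is lowered to `i32x4.mul` depends on building with AOT and SIMD enabled as described above; on other targets it simply runs through whatever path the runtime provides.

```csharp
using System;
using System.Runtime.Intrinsics;

// Hypothetical stand-in for the benchmark's VectorTask; names are illustrative only.
public static class VectorMultiplyDemo
{
    // Element-wise multiply of two 4-lane int vectors, mirroring RunStep.
    public static Vector128<int> Multiply(Vector128<int> a, Vector128<int> b) => a * b;

    public static void Main()
    {
        Vector128<int> vector1 = Vector128.Create(1, 2, 3, 4);
        Vector128<int> vector2 = Vector128.Create(5, 6, 7, 8);
        Vector128<int> vector3 = Multiply(vector1, vector2);

        // Read the lanes back one at a time; each lane holds the product
        // of the corresponding lanes of vector1 and vector2.
        for (int i = 0; i < Vector128<int>.Count; i++)
            Console.WriteLine(vector3.GetElement(i));
    }
}
```

Each of the four 32-bit lanes is multiplied independently, which is what the single `i32x4.mul` instruction in the disassembly does in one step.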

And a comparison to the non-SIMD version:

> wa-diff -d -f Vector.*Multiply.*RunStep src/mono/sample/wasm/browser-bench/bin-nosimd/Debug/AppBundle/dotnet.wasm src/mono/sample/wasm/browser-bench/bin/Debug/AppBundle/dotnet.wasm
(func Wasm_Browser_Bench_Sample_Sample_VectorTask_Multiply_RunStep(param i32, i32))
-  local $2 i32
+  local.get $0
-  local $3 i32
-  local $4 i64
-  local $5 i64
-  local $6 i64
-  local $7 i64
-  local $8 i32
-  global.get $__stack_pointer
-  i32.const 80
-  i32.sub
-  local.tee $2
-  global.set $__stack_pointer
-  i32.const 3121246
-  i32.load8.u
   i32.eqz
   if
-   i32.const 2056316
-   call mono_aot_Wasm_Browser_Bench_Sample_init_method
+   call mini_llvmonly_throw_nullref_exception
+   unreachable
-   i32.const 3121246
-   i32.const 1
-   i32.store8
   local.get $0
-  if
-   local.get $2
   local.get $0
-   i64.load offset:16
+  v128.load offset:24    [SIMD]
-   local.tee $4
-   i64.store offset:40 align:3
-   local.get $2
   local.get $0
-   i64.load offset:8
-   local.tee $5
+  v128.load offset:8    [SIMD]
+  i32x4.mul    [SIMD]
-   i64.store offset:32 align:3
-   local.get $2
-   local.get $0
-   i64.load offset:32
-   local.tee $6
-   i64.store offset:56 align:3
-   local.get $2
-   local.get $0
-   i64.load offset:24
-   local.tee $7
-   i64.store offset:48 align:3
-   i32.const 3114552
-   i32.load align:2
-   local.tee $3
-   i32.load offset:4 align:2
-   local.set $8
-   local.get $3
-   i32.load align:2
-   local.set $3
-   local.get $2
-   local.get $4
-   i64.store offset:24 align:3
-   local.get $2
-   local.get $6
-   i64.store offset:8 align:3
-   local.get $2
-   local.get $5
-   i64.store offset:16 align:3
-   local.get $2
-   local.get $7
-   i64.store align:3
-   local.get $2
-   i32.const -1
-   i32.sub
-   local.get $2
-   i32.const 16
-   i32.add
-   local.get $2
-   local.get $8
-   local.get $3
-   call.indirect (func (param i32, i32, i32, i32))
-   local.get $0
-   local.get $2
-   v128.load offset:64 align:3    [SIMD]
   v128.store offset:40    [SIMD]
-   local.get $2
-   i32.const 80
-   i32.add
-   global.set $__stack_pointer
-   return
-
-  call mini_llvmonly_throw_nullref_exception
-  unreachable

Measurements of the bench-sample (aot and aot + SIMD are relevant here):

browser-bench/Release configuration

.NET7 May 19th

*1 .NET7 May 13th + emscripten 3.1.9 + SIMD

Chrome Version 101.0.4951.67 (Official Build) (64-bit)

| measurement | aot | aot + EH | aot + SIMD *1 | aot + EH + SIMD *1 | interp | interp + EH |
|---|---|---|---|---|---|---|
| AppStart, Page show | 26.1179ms | 29.3718ms | 31.4959ms | 25.8079ms | 34.3243ms | 24.7880ms |
| AppStart, Reach managed | 213.6154ms | 199.3214ms | 204.4444ms | 198.1786ms | 201.6667ms | 196.5714ms |
| Exceptions, NoExceptionHandling | 0.0537us | 0.0555us | 0.0547us | 0.0532us | 0.1068us | 0.0938us |
| Exceptions, TryCatch | 0.0767us | 0.0755us | 0.0916us | 0.0781us | 0.1042us | 0.0950us |
| Exceptions, TryCatchThrow | 0.0080ms | 0.0079ms | 0.0078ms | 0.0079ms | 0.0019ms | 0.0019ms |
| Exceptions, TryCatchFilter | 0.0776us | 0.0751us | 0.0820us | 0.0752us | 0.1039us | 0.0983us |
| Exceptions, TryCatchFilterInline | 0.0531us | 0.0521us | 0.0563us | 0.0532us | 0.0889us | 0.0864us |
| Exceptions, TryCatchFilterThrow | 0.0128ms | 0.0122ms | 0.0120ms | 0.0123ms | 0.0026ms | 0.0026ms |
| Exceptions, TryCatchFilterThrowApplies | 0.0100ms | 0.0096ms | 0.0097ms | 0.0098ms | 0.0019ms | 0.0018ms |
| Json, non-ASCII text serialize | 0.3699ms | 0.3529ms | 0.3553ms | 0.3681ms | 8.1483ms | 7.9713ms |
| Json, non-ASCII text deserialize | 1.5710ms | 1.5414ms | 1.5283ms | 1.5166ms | 12.3198ms | 12.3657ms |
| Json, small serialize | 0.0371ms | 0.0370ms | 0.0359ms | 0.0360ms | 0.2512ms | 0.2510ms |
| Json, small deserialize | 0.0557ms | 0.0546ms | 0.0537ms | 0.0533ms | 0.4237ms | 0.3976ms |
| Json, large serialize | 10.1708ms | 9.9264ms | 9.6885ms | 9.5669ms | 75.5970ms | 72.3194ms |
| Json, large deserialize | 15.5446ms | 15.0997ms | 14.9708ms | 14.7407ms | 112.7872ms | 107.1837ms |
| Vector, Create Vector128 | 0.0602us | 0.0581us | 0.0450us | 0.0503us | 0.1793us | 0.1610us |
| Vector, Add 2 Vector128's | 0.5338us | 0.5359us | 0.0450us | 0.0481us | 0.2434us | 0.2325us |
| Vector, Multiply 2 Vector128's | 0.5331us | 0.5460us | 0.0451us | 0.0481us | 0.2421us | 0.2303us |
| WebSocket, PartialSend 1B | 0.4781us | 0.4582us | 0.4422us | 0.4462us | 0.0017ms | 0.0017ms |
| WebSocket, PartialSend 64KB | 0.0628ms | 0.0703ms | 0.0653ms | 0.0617ms | 0.0685ms | 0.0640ms |
| WebSocket, PartialSend 1MB | 0.9000ms | 0.9545ms | 0.9727ms | 0.9364ms | 0.9727ms | 0.9455ms |
| WebSocket, PartialReceive 1B | 0.8237us | 0.8083us | 0.7852us | 0.7852us | 0.0023ms | 0.0023ms |
| WebSocket, PartialReceive 10KB | 0.0020ms | 0.0020ms | 0.0040ms | 0.0020ms | 0.0040ms | 0.0040ms |
| WebSocket, PartialReceive 100KB | 0.0000us | 0.0000us | 0.0000us | 0.0000us | 0.0000us | 0.0000us |
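To put the Vector rows in perspective, the sketch below divides the aot time by the aot + SIMD time for each of them. The helper name SimdSpeedup is hypothetical; the numbers are copied from the table above. Note that the two columns come from slightly different builds (.NET7 May 19th versus .NET7 May 13th + emscripten 3.1.9), so the ratios are only a rough indication.

```csharp
using System;

// Hypothetical helper, not part of the PR: computes aot / (aot + SIMD) ratios.
public static class SimdSpeedup
{
    // Ratio of the scalar AOT time to the AOT + SIMD time (both in the same unit).
    public static double Speedup(double aotMicros, double simdMicros) => aotMicros / simdMicros;

    public static void Main()
    {
        // (measurement, aot, aot + SIMD) in microseconds, taken from the Vector rows.
        var rows = new (string Name, double Aot, double Simd)[]
        {
            ("Create Vector128", 0.0602, 0.0450),
            ("Add 2 Vector128's", 0.5338, 0.0450),
            ("Multiply 2 Vector128's", 0.5331, 0.0451),
        };

        foreach (var (name, aot, simd) in rows)
            Console.WriteLine($"{name}: {Speedup(aot, simd):F1}x");
    }
}
```

With these inputs the Add and Multiply rows come out a little under 12x, and Create at about 1.3x. The other rows of the table are close to unchanged between the two columns.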

The llvm code generator works nicely with them.
So that C#
    WasmBase.Constant(0xff11ff22ff33ff44, 0xff55ff66ff77ff88)
is compiled into wasm code
    v128.const 0xff11ff22ff33ff44ff55ff66ff77ff88    [SIMD]
This will need more work, as it crashes clang during the 'WebAssembly Instruction Selection' pass:
    WasmApp.Native.targets(353,5): error : 3.    Running pass 'WebAssembly Instruction Selection' on function '@corlib_System_Runtime_Intrinsics_Wasm_WasmBase_Shuffle_System_Runtime_Intrinsics_Vector128_1_byte_System_Runtime_Intrinsics_Vector128_1_byte_System_Runtime_Intrinsics_Vector128_1_byte'
Also add "experimental" to the property comment
fanyang-mono (Member)

What does EH stand for?

SamMonoRT (Member)

> What does EH stand for?

Exception Handling


@ghost added the needs-author-action label (An issue or pull request that requires more info or actions from the author.) on Jun 1, 2022
@ghost removed the needs-author-action label on Jun 2, 2022
@radical added the arch-wasm label (WebAssembly architecture) on Jun 3, 2022
ghost

Tagging subscribers to 'arch-wasm': @lewing
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: radekdoulik
Assignees: radekdoulik
Labels: arch-wasm, area-Build-mono
Milestone: -

radical (Member) left a comment

The build and test changes look good.

radical (Member)

Are there benchmarks in dotnet/performance that need to be enabled now, or do new ones need to be added?

fanyang-mono (Member)

> Are there benchmarks in dotnet/performance that need to be enabled now, or do new ones need to be added?

These should see improvements: https://github.com/dotnet/performance/blob/main/src/benchmarks/micro/libraries/System.Numerics.Vectors/Perf_VectorOfT.cs

But they should be running already with bigger sets of instructions.

@radekdoulik merged commit 1f2eaaa into dotnet:main on Jun 6, 2022
@ghost locked as resolved and limited conversation to collaborators on Jul 6, 2022
Reviewers

@radical approved these changes
@lewing approved these changes
@vargaz approved these changes
@fanyang-mono approved these changes
@lambdageek: awaiting requested review
@SamMonoRT: awaiting requested review
@marek-safar: awaiting requested review

Assignees
@radekdoulik

Labels
arch-wasm (WebAssembly architecture), area-Build-mono
Projects
None yet
Milestone
No milestone

6 participants
@radekdoulik, @fanyang-mono, @SamMonoRT, @radical, @lewing, @vargaz
