Accelerate Vector128<long>::op_Multiply on x64 #103555

Merged
EgorBo merged 21 commits into dotnet:main from EgorBo:arm-mul-64bit
Jun 28, 2024
Changes from 19 commits
21 commits
ae17211 Accelerate Vector128 mul for long/ulong (EgorBo, Jun 17, 2024)
afda312 better ulong version (EgorBo, Jun 17, 2024)
ab01574 fix build (EgorBo, Jun 17, 2024)
21b42de Update Vector128_1.cs (EgorBo, Jun 17, 2024)
581f1e2 Sse41 version (EgorBo, Jun 17, 2024)
49a359f Update src/libraries/System.Private.CoreLib/src/System/Runtime/Intrin… (EgorBo, Jun 17, 2024)
57898f0 Update Vector128_1.cs (EgorBo, Jun 17, 2024)
f1be705 Update Vector128_1.cs (EgorBo, Jun 19, 2024)
95d0eb8 Update Vector128_1.cs (EgorBo, Jun 19, 2024)
dcfd93d Update Vector128_1.cs (EgorBo, Jun 19, 2024)
7fec9e3 Update Vector128_1.cs (EgorBo, Jun 19, 2024)
e172296 Update Vector128_1.cs (EgorBo, Jun 19, 2024)
0456d12 Update Vector128_1.cs (EgorBo, Jun 20, 2024)
8af6104 Merge branch 'main' of https://github.com/dotnet/runtime into arm-mul… (EgorBo, Jun 20, 2024)
69572a2 Update Vector128_1.cs (EgorBo, Jun 20, 2024)
d607549 clean up (EgorBo, Jun 20, 2024)
573a37b move to jit (EgorBo, Jun 20, 2024)
3790d1b clean up (EgorBo, Jun 20, 2024)
60441f3 revert unrelated changes (EgorBo, Jun 21, 2024)
5b78ddd Merge branch 'main' of https://github.com/dotnet/runtime into arm-mul… (EgorBo, Jun 28, 2024)
cc257dd Address feedback (EgorBo, Jun 28, 2024)
48 changes: 46 additions & 2 deletions src/coreclr/jit/gentree.cpp
@@ -21592,11 +21592,55 @@ GenTree* Compiler::gtNewSimdBinOpNode(
     {
         intrinsic = NI_AVX10v1_MultiplyLow;
     }
-    else
+    else if (compOpportunisticallyDependsOn(InstructionSet_AVX512DQ_VL))
     {
-        assert(compIsaSupportedDebugOnly(InstructionSet_AVX512DQ_VL));
         intrinsic = NI_AVX512DQ_VL_MultiplyLow;
     }
+    else
+    {
+        assert(((simdSize == 16) && compOpportunisticallyDependsOn(InstructionSet_SSE41)) ||
+               ((simdSize == 32) && compOpportunisticallyDependsOn(InstructionSet_AVX2)));
+
+        // Make op1 and op2 multi-use:
+        GenTree* op1Dup = fgMakeMultiUse(&op1);
+        GenTree* op2Dup = fgMakeMultiUse(&op2);
+
+        const bool is256 = simdSize == 32;
+
+        // Vector256<ulong> tmp0 = Avx2.Multiply(left, right);
+        GenTreeHWIntrinsic* tmp0 =
+            gtNewSimdHWIntrinsicNode(type, op1, op2, is256 ? NI_AVX2_Multiply : NI_SSE2_Multiply,
+                                     CORINFO_TYPE_ULONG, simdSize);
+
+        // Vector256<uint> tmp1 = Avx2.Shuffle(right.AsUInt32(), ZWXY);
+        GenTree* shuffleMask = gtNewIconNode(SHUFFLE_ZWXY, TYP_INT);
+        GenTreeHWIntrinsic* tmp1 = gtNewSimdHWIntrinsicNode(type, op2Dup, shuffleMask,
+                                                            is256 ? NI_AVX2_Shuffle : NI_SSE2_Shuffle,
+                                                            CORINFO_TYPE_UINT, simdSize);
+
+        // Vector256<uint> tmp2 = Avx2.MultiplyLow(left.AsUInt32(), tmp1);
+        GenTreeHWIntrinsic* tmp2 =
+            gtNewSimdHWIntrinsicNode(type, op1Dup, tmp1,
+                                     is256 ? NI_AVX2_MultiplyLow : NI_SSE41_MultiplyLow,
+                                     CORINFO_TYPE_UINT, simdSize);
+
+        // Vector256<int> tmp3 = Avx2.HorizontalAdd(tmp2.AsInt32(), Vector256<int>.Zero);
+        GenTreeHWIntrinsic* tmp3 =
+            gtNewSimdHWIntrinsicNode(type, tmp2, gtNewZeroConNode(type),
+                                     is256 ? NI_AVX2_HorizontalAdd : NI_SSSE3_HorizontalAdd,
+                                     CORINFO_TYPE_UINT, simdSize);
Comment on lines +21536 to +21540

I know in other places we've started avoiding hadd in favor of shuffle+add, might be worth seeing if that's appropriate here too (low priority, non blocking)

Reply (Member, Author):

I tried to benchmark different implementations for it and they all were equally fast, e.g. #99871 (comment)

tannergooding reacted with thumbs up emoji
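
To make the trade-off in this thread concrete, the sketch below models both ways of forming the cross-term sum on plain arrays: the hadd followed by Shuffle(YWXW) sequence used in the PR, and one possible shuffle + add + 64-bit shift variant. Everything here is an assumption made for illustration: the helper names (`Shuffle`, `CrossSumViaHadd`, `CrossSumViaShuffleAdd`) are hypothetical, the 0xB1 swap mask is inferred from the SHUFFLE_* naming convention, and the second variant is not necessarily the sequence the reviewer had in mind. The sketch only shows that both sequences produce the same lanes; it says nothing about speed.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// A 128-bit vector modelled as four 32-bit elements (element 0 is the lowest).
using U32x4 = std::array<uint32_t, 4>;

// pshufd model: destination element i takes source element ((imm >> 2*i) & 3).
inline U32x4 Shuffle(const U32x4& v, uint8_t imm)
{
    return {v[imm & 3], v[(imm >> 2) & 3], v[(imm >> 4) & 3], v[(imm >> 6) & 3]};
}

// Sequence from the PR: phaddd against zero, then pshufd with SHUFFLE_YWXW (0x73).
inline U32x4 CrossSumViaHadd(const U32x4& t)
{
    U32x4 hadd = {t[0] + t[1], t[2] + t[3], 0u, 0u};
    return Shuffle(hadd, 0x73);
}

// Hypothetical alternative: swap neighbours (0xB1, assumed), add, then shift
// each 64-bit lane left by 32 so the sum lands in the high half.
inline U32x4 CrossSumViaShuffleAdd(const U32x4& t)
{
    U32x4 s   = Shuffle(t, 0xB1); // [t1, t0, t3, t2]
    U32x4 sum = {t[0] + s[0], t[1] + s[1], t[2] + s[2], t[3] + s[3]};
    return {0u, sum[0], 0u, sum[2]}; // models a 64-bit shift left by 32
}

// Both variants should agree for any input, including wrapping sums.
inline void CheckVariantsAgree(const U32x4& t)
{
    assert(CrossSumViaHadd(t) == CrossSumViaShuffleAdd(t));
}
```

Either way the result is [0, t0+t1, 0, t2+t3], which is the shape tmp4 needs before the final 64-bit add.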

+        // Vector256<int> tmp4 = Avx2.Shuffle(tmp3, YWXW);
+        shuffleMask = gtNewIconNode(SHUFFLE_YWXW, TYP_INT);
+        GenTreeHWIntrinsic* tmp4 =
+            gtNewSimdHWIntrinsicNode(type, tmp3, shuffleMask, is256 ? NI_AVX2_Shuffle : NI_SSE2_Shuffle,
+                                     CORINFO_TYPE_UINT, simdSize);
+
+        // result = tmp0 + tmp4;
+        op1 = tmp0;
+        op2 = tmp4;
+        intrinsic = simdSize == 32 ? NI_AVX2_Add : NI_SSE2_Add;
+    }

     break;
 }
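
The fallback above builds a 64-bit product out of 32-bit pieces: writing a = aHi·2^32 + aLo and b likewise, a·b mod 2^64 = aLo·bLo + ((aLo·bHi + aHi·bLo) << 32). The sketch below replays the tmp0..tmp4 steps on plain arrays to make the data flow explicit. It is a scalar lane model written for illustration, not the JIT's code: the type aliases and helper names are hypothetical, and the SHUFFLE_ZWXY value 0xB1 is inferred from the naming convention in simd.h rather than shown in this diff.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

using U32x4 = std::array<uint32_t, 4>; // 128 bits as four 32-bit elements
using U64x2 = std::array<uint64_t, 2>; // the same 128 bits as two 64-bit lanes

constexpr uint8_t kShuffleZWXY = 0xB1; // 10 11 00 01 (assumed, see lead-in)
constexpr uint8_t kShuffleYWXW = 0x73; // 01 11 00 11 (added to simd.h by this PR)

inline U32x4 AsU32(const U64x2& v)
{
    return {static_cast<uint32_t>(v[0]), static_cast<uint32_t>(v[0] >> 32),
            static_cast<uint32_t>(v[1]), static_cast<uint32_t>(v[1] >> 32)};
}

inline U64x2 AsU64(const U32x4& v)
{
    return {v[0] | (static_cast<uint64_t>(v[1]) << 32),
            v[2] | (static_cast<uint64_t>(v[3]) << 32)};
}

// pshufd model: destination element i takes source element ((imm >> 2*i) & 3).
inline U32x4 Shuffle(const U32x4& v, uint8_t imm)
{
    return {v[imm & 3], v[(imm >> 2) & 3], v[(imm >> 4) & 3], v[(imm >> 6) & 3]};
}

inline U64x2 MultiplyLong(const U64x2& left, const U64x2& right)
{
    U32x4 l = AsU32(left);
    U32x4 r = AsU32(right);

    // tmp0: pmuludq, full 64-bit product of the low halves of each lane.
    U64x2 tmp0 = {static_cast<uint64_t>(l[0]) * r[0], static_cast<uint64_t>(l[2]) * r[2]};
    // tmp1: swap the low and high halves of each lane of `right`.
    U32x4 tmp1 = Shuffle(r, kShuffleZWXY);
    // tmp2: pmulld, low 32 bits of lo*hi and hi*lo for each lane.
    U32x4 tmp2 = {l[0] * tmp1[0], l[1] * tmp1[1], l[2] * tmp1[2], l[3] * tmp1[3]};
    // tmp3: phaddd against zero, one cross-term sum per lane in elements 0 and 1.
    U32x4 tmp3 = {tmp2[0] + tmp2[1], tmp2[2] + tmp2[3], 0u, 0u};
    // tmp4: move each sum into the high half of its lane, zero the low half.
    U64x2 tmp4 = AsU64(Shuffle(tmp3, kShuffleYWXW));

    return {tmp0[0] + tmp4[0], tmp0[1] + tmp4[1]};
}

// Compare against the native wrapping 64-bit multiply.
inline void CheckAgainstScalar(const U64x2& a, const U64x2& b)
{
    U64x2 r = MultiplyLong(a, b);
    assert(r[0] == a[0] * b[0]);
    assert(r[1] == a[1] * b[1]);
}
```

The aHi·bHi term never appears because it is shifted out of the low 64 bits entirely, which is why three 32-bit multiplies per lane are enough. For the 256-bit case the same reasoning applies within each 128-bit half, since the AVX2 shuffle and horizontal add operate per 128-bit lane.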
22 changes: 15 additions & 7 deletions src/coreclr/jit/hwintrinsicxarch.cpp
@@ -2706,17 +2706,25 @@ GenTree* Compiler::impSpecialIntrinsic(NamedIntrinsic intrinsic,

 if (varTypeIsLong(simdBaseType))
 {
-    if (simdSize != 64 && !canUseEvexEncoding())
+    if (TARGET_POINTER_SIZE == 4)
     {
-        // TODO-XARCH-CQ: We should support long/ulong multiplication
+        // TODO-XARCH-CQ: 32bit support

Review comment:

What's blocking 32-bit support? It doesn't look like we're using any _X64 intrinsics in the fallback logic?

Reply (Member, Author):

Not sure to be honest, that check was pre-existing, I only changed the comment

         break;
     }
-    // else if simdSize == 64 then above assert would check if baseline isa supported

-#if defined(TARGET_X86)
-    // TODO-XARCH-CQ: We need to support 64-bit CreateBroadcast
-    break;
-#endif // TARGET_X86
+    if ((simdSize == 32) && compOpportunisticallyDependsOn(InstructionSet_AVX2))
+    {
+        // Emulate NI_AVX512DQ_VL_MultiplyLow with AVX2 for SIMD32
+    }
+    else if ((simdSize == 16) && compOpportunisticallyDependsOn(InstructionSet_SSE41))
+    {
+        // Emulate NI_AVX512DQ_VL_MultiplyLow with SSE41 for SIMD16
+    }
+    else
+    {
+        // Software fallback
+        break;
+    }
 }

 CORINFO_ARG_LIST_HANDLE arg1 = sig->args;
1 change: 1 addition & 0 deletions src/coreclr/jit/simd.h
@@ -1015,6 +1015,7 @@ void BroadcastConstantToSimd(TSimd* result, TBase arg0)
 #define SHUFFLE_XYZW 0x1B // 00 01 10 11
 #define SHUFFLE_YXYX 0x44 // 01 00 01 00
 #define SHUFFLE_YWXZ 0x72 // 01 11 00 10
+#define SHUFFLE_YWXW 0x73 // 01 11 00 11
 #define SHUFFLE_YYZZ 0x5A // 01 01 10 10
 #define SHUFFLE_ZXXX 0x80 // 10 00 00 00
 #define SHUFFLE_ZXXY 0x81 // 10 00 00 01
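
The comment column in these defines lists the four 2-bit selectors from the highest bit pair down to the lowest, with X = 0, Y = 1, Z = 2 and W = 3. A minimal sketch of that naming scheme follows, inferred only from the constants shown above; `MakeShuffle` and `ShuffleName` are hypothetical helpers, not part of the JIT.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Pack four 2-bit selectors, given from the highest bit pair to the lowest,
// in the same order as the letters in a SHUFFLE_* name.
constexpr uint8_t MakeShuffle(unsigned s3, unsigned s2, unsigned s1, unsigned s0)
{
    return static_cast<uint8_t>((s3 << 6) | (s2 << 4) | (s1 << 2) | s0);
}

// Recover the four-letter name from an 8-bit shuffle immediate.
inline std::string ShuffleName(uint8_t imm)
{
    static const char kLetters[] = "XYZW";
    std::string name;
    for (int shift = 6; shift >= 0; shift -= 2)
    {
        name += kLetters[(imm >> shift) & 3];
    }
    return name;
}

// Y W X W -> 01 11 00 11 -> 0x73, the constant added in this hunk.
static_assert(MakeShuffle(1, 3, 0, 3) == 0x73, "SHUFFLE_YWXW should encode to 0x73");

inline void CheckShuffleNames()
{
    assert(ShuffleName(0x73) == "YWXW");
    assert(ShuffleName(0x1B) == "XYZW");
}
```

Since the lowest bit pair selects the source for destination element 0, the name reads in the opposite order to the element index: YWXW places source element 3 (W) in elements 0 and 2, source element 0 (X) in element 1, and source element 1 (Y) in element 3.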
@@ -1,4 +1,4 @@
-// Licensed to the .NET Foundation under one or more agreements.
+// Licensed to the .NET Foundation under one or more agreements.
 // The .NET Foundation licenses this file to you under the MIT license.

 using System.Diagnostics;