This repository was archived by the owner on Jan 23, 2023. It is now read-only.
Implement the SSE hardware intrinsics. #15538
Merged
Changes from 1 commit. Commits (32, all by tannergooding):
- bd56bb3 Adding the remaining SSE intrinsics to hwintrinsiclistxarch.h
- adadee1 Adding support for the SSE And, AndNot, Divide, Max, Min, MoveHighToL…
- 1cb5722 Adding tests for the And, AndNot, Divide, Max, Min, MoveHighToLow, Mo…
- 402550c Adding support for the SSE compare eq, gt, ge, lt, le, ne, ord, and u…
- 77fa074 Adding tests for the SSE compare eq, gt, ge, lt, le, ne, ord, and uno…
- 91c7550 Adding support for the SSE Reciprocal, ReciprocalSqrt, and Sqrt intri…
- c09ad38 Adding tests for the SSE Reciprocal, ReciprocalSqrt, and Sqrt intrinsics
- 3f115ef Adding support for the SSE Set, SetAll, and SetZero intrinsics
- fbc91fc Adding tests for the SSE Set, SetAll, and SetZero intrinsics
- 94f82fb Adding support for the SSE Shuffle intrinsic
- 9e54585 Adding tests for the SSE Shuffle intrinsic
- 224b8dc Adding support for the SSE StaticCast intrinsic
- 1283d87 Adding tests for the SSE StaticCast intrinsic
- bfc992e Adding support for the SSE Add, Divide, Max, Min, Move, Multiply, and…
- 38af536 Adding tests for the SSE Add, Divide, Max, Min, Move, Multiply, and S…
- eaf9aef Adding support for the SSE compare eq, gt, ge, lt, le, ne, ord, and u…
- cd60a85 Adding tests for the SSE compare eq, gt, ge, lt, le, ne, ord, and uno…
- 3fcdaf8 Adding support for the SSE Reciprocal, ReciprocalSqrt, and Sqrt scala…
- db75c98 Adding tests for the SSE Reciprocal, ReciprocalSqrt, and Sqrt scalar …
- e84b55e Adding support for the SSE ConvertTo Int32, Int32WithTruncation, Int6…
- 2a256bd Adding tests for the SSE ConvertTo Int32, Int32WithTruncation, Int64W…
- 887d5c4 Adding support for the SSE Compare<op>Ordered and Compare<op>Unordere…
- 0817912 Adding tests for the SSE Compare<op>Ordered and Compare<op>Unordered …
- 5ef844b Adding support for the SSE Set scalar intrinsic
- dee8fb7 Adding tests for the SSE Set scalar intrinsic
- baeed0c Adding support for the SSE MoveMask intrinsic
- 51a1a59 Adding tests for the SSE MoveMask intrinsic
- a15aa12 Updating the SSE HWIntrinsics to share code where possible.
- 5ca9417 Updating most of the SSE Compare intrinsics to support containment
- 677c5c3 Adding support for the SSE Load, LoadAligned, LoadHigh, LoadLow, and …
- 11b6ac8 Adding tests for the SSE Load, LoadAligned, LoadHigh, LoadLow, and Lo…
- a8db845 Resolving PR feedback
Adding support for the SSE Shuffle intrinsic
commit 94f82fb98657a27165846d5f18a1afee4efc88cd
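This commit lowers the SSE Shuffle intrinsic to the x86 `shufps` instruction. As a rough reference for what that instruction computes, here is a scalar C++ model. `Vec4` and `shufps` are hypothetical helpers for illustration only, not JIT or runtime APIs, and a `std::array<float, 4>` stands in for an XMM register.

```cpp
#include <array>
#include <cassert>

// Four packed single-precision lanes, lane 0 first (stand-in for one XMM register).
using Vec4 = std::array<float, 4>;

// Scalar model of SHUFPS dst, a, b, imm8. Each 2-bit field of imm8 selects a lane:
//   dst[0] = a[imm8 bits 1:0]    dst[1] = a[imm8 bits 3:2]
//   dst[2] = b[imm8 bits 5:4]    dst[3] = b[imm8 bits 7:6]
Vec4 shufps(const Vec4& a, const Vec4& b, unsigned imm8)
{
    // The real instruction encodes the control value as an 8-bit immediate.
    assert(imm8 <= 0xFF && "shufps takes an 8-bit immediate");
    return Vec4{{a[imm8 & 3], a[(imm8 >> 2) & 3], b[(imm8 >> 4) & 3], b[(imm8 >> 6) & 3]}};
}
```

For example, with `a = {0, 1, 2, 3}` and `b = {10, 11, 12, 13}`, a control byte of `0x1B` selects `{3, 2, 11, 10}`, and `shufps(a, a, 0xE4)` returns `a` unchanged. Because the control byte is encoded in the instruction itself, a single `shufps` can only be emitted when that byte is known at compile time.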
src/jit/compiler.h: 7 changes (7 additions, 0 deletions)
src/jit/gentree.cpp: 12 changes (12 additions, 0 deletions)
src/jit/hwintrinsiccodegenxarch.cpp: 89 changes (89 additions, 0 deletions)
@@ -353,6 +353,95 @@ void CodeGen::genSSEIntrinsic(GenTreeHWIntrinsic* node)

```cpp
            emit->emitIns_SIMD_R_R_R(INS_xorps, targetReg, targetReg, targetReg, TYP_SIMD16);
            break;

        case NI_SSE_Shuffle:
        {
            GenTreeArgList* argList;

            // Shuffle takes 3 operands, so op1 should be an arg list with two
            // additional nodes in the chain.
            assert(baseType == TYP_FLOAT);
            assert(op1->OperIsList());
            assert(op1->AsArgList()->Rest() != nullptr);
            assert(op1->AsArgList()->Rest()->Rest() != nullptr);
            assert(op1->AsArgList()->Rest()->Rest()->Rest() == nullptr);
            assert(op2 == nullptr);

            argList = op1->AsArgList();
            op1 = argList->Current();
            op1Reg = op1->gtRegNum;
            genConsumeRegs(op1);

            argList = argList->Rest();
            op2 = argList->Current();
            op2Reg = op2->gtRegNum;
            genConsumeRegs(op2);

            argList = argList->Rest();
            op3 = argList->Current();
            genConsumeRegs(op3);

            if (op3->IsCnsIntOrI())
            {
                ssize_t ival = op3->AsIntConCommon()->IconValue();
                emit->emitIns_SIMD_R_R_R_I(INS_shufps, targetReg, op1Reg, op2Reg, (int)ival, TYP_SIMD16);
            }
            else
            {
                // We emit a fallback case for the scenario when op3 is not a constant. This should normally
                // happen when the intrinsic is called indirectly, such as via Reflection. However, it can
                // also occur if the consumer calls it directly and just doesn't pass a constant value.
```

Review comment: This looks great, though I'm wondering if it will be productive to try to abstract this so that we don't have to duplicate so much of this for other "immediate-only" instructions.

Reply from tannergooding (Member, Author): I think it might be good to abstract it as well.

```cpp
                const unsigned jmpCount = 256;
                BasicBlock*    jmpTable[jmpCount];

                unsigned jmpTableBase = emit->emitBBTableDataGenBeg(jmpCount, true);
                unsigned jmpTableOffs = 0;

                // Emit the jump table
                JITDUMP("\n J_M%03u_DS%02u LABEL DWORD\n", Compiler::s_compMethodsCount, jmpTableBase);

                for (unsigned i = 0; i < jmpCount; i++)
                {
                    jmpTable[i] = genCreateTempLabel();
                    JITDUMP(" DD L_M%03u_BB%02u\n", Compiler::s_compMethodsCount, jmpTable[i]->bbNum);
                    emit->emitDataGenData(i, jmpTable[i]);
                }

                emit->emitDataGenEnd();

                // Compute and jump to the appropriate offset in the switch table
                regNumber baseReg = node->ExtractTempReg();   // the start of the switch table
                regNumber offsReg = node->GetSingleTempReg(); // the offset into the switch table

                emit->emitIns_R_C(INS_lea, emitTypeSize(TYP_I_IMPL), offsReg, compiler->eeFindJitDataOffs(jmpTableBase),
                                  0);

                emit->emitIns_R_ARX(INS_mov, EA_4BYTE, offsReg, offsReg, op3->gtRegNum, 4, 0);
                emit->emitIns_R_L(INS_lea, EA_PTR_DSP_RELOC, compiler->fgFirstBB, baseReg);
                emit->emitIns_R_R(INS_add, EA_PTRSIZE, offsReg, baseReg);
                emit->emitIns_R(INS_i_jmp, emitTypeSize(TYP_I_IMPL), offsReg);

                // Emit the switch table entries
                BasicBlock* switchTableBeg = genCreateTempLabel();
                BasicBlock* switchTableEnd = genCreateTempLabel();

                genDefineTempLabel(switchTableBeg);

                for (unsigned i = 0; i < jmpCount; i++)
                {
                    genDefineTempLabel(jmpTable[i]);
                    emit->emitIns_SIMD_R_R_R_I(INS_shufps, targetReg, op1Reg, op2Reg, i, TYP_SIMD16);
                    emit->emitIns_J(INS_jmp, switchTableEnd);
                }

                genDefineTempLabel(switchTableEnd);
            }
            break;
        }

        case NI_SSE_Sqrt:
            assert(baseType == TYP_FLOAT);
            assert(op2 == nullptr);
```
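When op3 is not a constant, the codegen above builds a 256-entry jump table whose 4-byte entries are added to the address of the method's first block; each target runs `shufps` with its own immediate and then jumps to a common end label. The reviewer asks whether this could be shared with other "immediate-only" instructions. The C++14 sketch below models that idea at the source level under simplifying assumptions: a binary operation on four floats, one template specialization per possible imm8, and a table of function pointers indexed at run time in place of the JIT's indirect `jmp`. `Vec4`, `ShufpsOp`, `makeImm8Table`, and `dispatchImm8` are hypothetical names, not part of the JIT.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>

// Four float lanes, lane 0 first (stand-in for one XMM register).
using Vec4 = std::array<float, 4>;
using BinaryImmFn = Vec4 (*)(const Vec4&, const Vec4&);

// An "immediate-only" operation: Imm must be a compile-time constant,
// just as shufps encodes its control byte in the instruction itself.
struct ShufpsOp
{
    template <unsigned Imm>
    static Vec4 apply(const Vec4& a, const Vec4& b)
    {
        static_assert(Imm < 256, "shufps takes an 8-bit immediate");
        return Vec4{{a[Imm & 3], a[(Imm >> 2) & 3], b[(Imm >> 4) & 3], b[(Imm >> 6) & 3]}};
    }
};

// One entry per possible imm8, analogous to the 256 labels the JIT emits,
// each of which runs the operation with its own baked-in immediate.
template <typename Op, std::size_t... Is>
constexpr std::array<BinaryImmFn, sizeof...(Is)> makeImm8Table(std::index_sequence<Is...>)
{
    return {{&Op::template apply<static_cast<unsigned>(Is)>...}};
}

// Run-time fallback: index the table with the non-constant control value
// and call the matching specialization.
template <typename Op>
Vec4 dispatchImm8(const Vec4& a, const Vec4& b, unsigned control)
{
    static constexpr std::array<BinaryImmFn, 256> table =
        makeImm8Table<Op>(std::make_index_sequence<256>{});
    assert(control < table.size());
    return table[control](a, b);
}
```

Another immediate-only operation with the same shape could reuse `dispatchImm8` by supplying its own `Op` type with a `template <unsigned Imm> static Vec4 apply(const Vec4&, const Vec4&)` member. The cost mirrors the JIT fallback: 256 copies of the operation in exchange for accepting any run-time control value.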
src/jit/hwintrinsicxarch.cpp: 26 changes (25 additions, 1 deletion)
src/jit/lowerxarch.cpp: 12 changes (12 additions, 0 deletions)
src/jit/lsraxarch.cpp: 21 changes (21 additions, 0 deletions)