US20210132950A1 - Bit shuffle processors, methods, systems, and instructions - Google Patents

Bit shuffle processors, methods, systems, and instructions

Info

Publication number
US20210132950A1
Authority
US
United States
Prior art keywords
bit
bits
instruction
lane
operand
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
US16/928,501
Inventor
Roger Espasa
Guillem Sole
David GUILLEN FANDOS
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Intel Corp
Original Assignee
Intel Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2014-09-25
Filing date
2020-07-14
Publication date
2021-05-06
Application filed by Intel Corp
Priority to US16/928,501
Publication of US20210132950A1

Abstract

A processor includes packed data registers and a decode unit to decode an instruction. The instruction is to indicate a first source operand having at least one lane of bits, and a second source packed data operand having a number of sub-lane sized bit selection elements. An execution unit is coupled with the packed data registers and the decode unit. The execution unit, in response to the instruction, stores a result operand in a destination storage location. The result operand includes a different corresponding bit for each of the number of sub-lane sized bit selection elements. A value of each bit of the result operand corresponding to a sub-lane sized bit selection element is that of a bit of a corresponding lane of bits, of the at least one lane of bits of the first source operand, which is indicated by the corresponding sub-lane sized bit selection element.
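
The operation summarized in the abstract can be modeled in a few lines of Python. The sketch below is illustrative only and is not taken from the patent: the function name bit_shuffle_to_mask, the 64-bit lane width, the use of only the low six bits of each selection element (suggested by claims 13 and 14), and the even split of selection elements across lanes are all assumptions made for the example.

```python
LANE_BITS = 64  # assumed lane width; claim 13 mentions 64-bit lanes


def bit_shuffle_to_mask(lanes, selectors):
    """Software model of a bit shuffle that writes a mask-style result.

    lanes:     list of LANE_BITS-wide integers (first source operand).
    selectors: list of small integers (second source operand), assumed to be
               split evenly across the lanes, in order.
    Returns an integer whose bit i is the source bit chosen by selectors[i].
    """
    if not lanes or len(selectors) % len(lanes) != 0:
        raise ValueError("selectors must divide evenly across the lanes")
    per_lane = len(selectors) // len(lanes)

    result = 0
    for i, sel in enumerate(selectors):
        lane = lanes[i // per_lane]          # lane this selector belongs to
        bit_index = sel & (LANE_BITS - 1)    # keep only the low 6 bits (assumption)
        bit = (lane >> bit_index) & 1        # read the selected source bit
        result |= bit << i                   # result bit sits at the selector's position
    return result


# One lane holding 0b1010; selectors pick bits 1, 0, 3, 2 in that order.
print(bin(bit_shuffle_to_mask([0b1010], [1, 0, 3, 2])))  # 0b101
```

In this model a result of the kind described in claims 7 and 8 (one mask bit per selection element, in the same relative position) is simply the returned integer.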

Claims (20)

What is claimed is:
1. A processor comprising:
a plurality of packed data registers;
a decode unit to decode an instruction, the instruction to indicate a first source operand that is to have at least one lane of bits, and the instruction to indicate a second source packed data operand that is to have a number of sub-lane sized bit selection elements; and
an execution unit coupled with the packed data registers and the decode unit, the execution unit, in response to the instruction, to store a result operand in a destination storage location that is to be indicated by the instruction, the result operand to include, a different corresponding bit for each of the number of sub-lane sized bit selection elements, a value of each bit of the result operand corresponding to a sub-lane sized bit selection element to be that of a bit of a corresponding lane of bits, of the at least one lane of bits of the first source operand, which is indicated by the corresponding sub-lane sized bit selection element.
2. The processor of claim 1, wherein the number of sub-lane sized bit selection elements include a plurality of subsets that each correspond to a different one of a plurality of lanes of bits, and wherein the execution unit, in response to the instruction, is to use each subset of the sub-lane sized bit selection elements to select bits from within only a corresponding lane of bits.
3. The processor of claim 2, wherein the execution unit, in response to the instruction, is to store the result operand in a packed data register having the plurality of lanes of bits.
4. The processor of claim 3, wherein the execution unit, in response to the instruction, is to store the bits selected by each subset of the sub-lane sized bit selection elements in a corresponding lane of bits of the packed data register.
5. The processor of claim 4, wherein the execution unit, in response to the instruction, is to store at least one replica of the bits selected by each subset of the sub-lane sized bit selection elements in the corresponding lane of bits of the packed data register.
6. The processor of claim 5, wherein the decode unit is to decode the instruction that is to indicate a source predicate mask operand, and wherein the execution unit, in response to the instruction, is to use the source predicate mask operand to predicate storage of the bits selected by each subset of the sub-lane sized bit selection elements and replicas thereof in the corresponding lane of bits of the packed data register.
7. The processor of claim 1, wherein each sub-lane sized bit selection element corresponds to a bit of the result operand in a same relative position, and wherein the second source packed data operand has at least sixteen sub-lane sized bit selection elements.
8. The processor of claim 1, wherein the execution unit, in response to the instruction, is to store the result operand in the destination storage location which is a packed data operation mask register.
9. The processor of claim 1, wherein the execution unit, in response to the instruction, is to store the result operand in the destination storage location which is a general-purpose register.
10. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the first source operand that is to have a single lane of bits, wherein all of the number of sub-lane sized bit selection elements are to correspond to the single lane of bits, and wherein the execution unit, in response to the instruction, is to store a bit of the single lane of bits to the result operand for each of the number of sub-lane sized bit selection elements.
11. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the first source operand is to have a plurality of lanes of bits.
12. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the first source operand that is to have a single lane of bits, and wherein the processor, in response to the instruction, is to replicate the single lane of bits of the first source operand a plurality of times to create a plurality of lanes of bits.
13. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the first source operand that is to have at least one 64-bit lane of bits, and is to indicate the second source packed data operand that is to have the number of at least 6-bit sized bit selection elements.
14. The processor of claim 13, wherein each at least 6-bit bit selection element is in a different corresponding 8-bit byte of the second source packed data operand, and wherein the second source packed data operand has at least sixteen bit selection elements.
15. The processor of claim 1, wherein the decode unit is to decode the instruction that is to indicate the second source packed data operand that is to have a same number of sub-lane sized bit selection elements as a number of bits in each of the at least one lane of bits of the first source operand.
16. A method in a processor comprising:
receiving an instruction, the instruction indicating a first source operand having at least one lane of bits, and the instruction indicating a second source packed data operand having a number of sub-lane sized bit selection elements; and
storing a result operand in a destination storage location indicated by the instruction in response to the instruction, the result operand including a different corresponding bit for each of the number of sub-lane sized bit selection elements, a value of each bit of the result operand that corresponds to a sub-lane sized bit selection element being that of a bit of a corresponding lane of bits, of the at least one lane of bits of the first source operand, indicated by the corresponding sub-lane sized bit selection element.
17. The method of claim 16, wherein storing comprises storing the result operand in the destination storage location which is a predicate mask register, and wherein each bit of the result operand corresponds to a sub-lane sized bit selection element in a same relative position.
18. The method of claim 16, wherein receiving comprises receiving the instruction indicating the second source packed data operand having the number of sub-lane sized bit selection elements including a plurality of subsets that each correspond to a different one of a plurality of lanes of bits, and further comprising using each subset of the sub-lane sized bit selection elements to select bits from within only a corresponding lane of bits.
19. The method of claim 16, wherein storing comprises storing the result operand in a packed data register having a plurality of lanes of bits, and wherein a lane of bits of the result operand includes the bits selected by the corresponding subset of the sub-lane sized bit selection elements as well as a plurality of replicas of the bits selected by the corresponding subset.
20. A system to process instructions comprising:
an interconnect;
a processor coupled with the interconnect, the processor to receive an instruction that is to indicate a first source operand that is to have at least one lane of bits, to indicate a second source packed data operand that is to have a number of sub-lane sized bit selection elements, and to indicate a destination storage location, the processor, in response to the instruction, to store a result operand in the destination storage location, the result operand to include a different corresponding bit for each of the number of sub-lane sized bit selection elements, a value of each bit of the result operand corresponding to a sub-lane sized bit selection element to be that of a bit of a corresponding lane of bits, of the at least one lane of bits of the first source operand, which is indicated by the corresponding sub-lane sized bit selection element; and
a dynamic random access memory (DRAM) coupled with the interconnect.
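
The per-lane variant recited in claims 2 through 6 and claim 19 (selected bits stored back into the corresponding lane of a packed register, with replicas, under a predicate mask) can be sketched as follows. This is a hedged illustration, not the claimed implementation: the function name bit_shuffle_to_lanes, the 64-bit lane width, filling each lane completely with replicas, the choice of one predicate bit per replica group, and the zeroing of masked-off groups are all assumptions that the claims do not specify.

```python
LANE_BITS = 64  # assumed lane width; claim 13 mentions 64-bit lanes


def bit_shuffle_to_lanes(lanes, selectors, predicate=None):
    """Model a bit shuffle whose result is a packed register with lanes.

    lanes:     list of LANE_BITS-wide integers (first source operand).
    selectors: list of integers, split evenly into one subset per lane.
    predicate: optional integer mask; bit g enables replica group g
               (assumed granularity). None means every group is stored.
    Returns a list of result lanes, one per source lane.
    """
    if not lanes or len(selectors) % len(lanes) != 0:
        raise ValueError("selectors must divide evenly across the lanes")
    per_lane = len(selectors) // len(lanes)
    if LANE_BITS % per_lane != 0:
        raise ValueError("subset size must divide the lane width")
    copies = LANE_BITS // per_lane           # replicas that fit in one lane

    out = []
    for lane_idx, lane in enumerate(lanes):
        subset = selectors[lane_idx * per_lane:(lane_idx + 1) * per_lane]
        picked = 0
        for j, sel in enumerate(subset):     # select only within this lane
            picked |= ((lane >> (sel & (LANE_BITS - 1))) & 1) << j

        result_lane = 0
        for c in range(copies):              # write the selection and its replicas
            group = lane_idx * copies + c
            if predicate is None or (predicate >> group) & 1:
                result_lane |= picked << (c * per_lane)
            # masked-off groups stay zero (zeroing behavior is an assumption)
        out.append(result_lane)
    return out


# Eight selectors alternate between bit 1 and bit 0 of a lane holding 0b10.
print([hex(x) for x in bit_shuffle_to_lanes([0b10], [1, 0] * 4)])
# ['0x5555555555555555']
```

With a predicate of 0b1 in the same example only the first 8-bit group is written, so the result lane is 0x55.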
US16/928,501 | 2014-09-25 | 2020-07-14 | Bit shuffle processors, methods, systems, and instructions | Pending | US20210132950A1 (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
US16/928,501, US20210132950A1 (en) | 2014-09-25 | 2020-07-14 | Bit shuffle processors, methods, systems, and instructions

Applications Claiming Priority (5)

Application Number | Priority Date | Filing Date | Title
EP14382361.5 | 2014-09-25 | |
EP14382361.5A, EP3001307B1 (en) | 2014-09-25 | 2014-09-25 | Bit shuffle processors, methods, systems, and instructions
PCT/US2015/048627, WO2016048631A1 (en) | 2014-09-25 | 2015-09-04 | Bit shuffle processors, methods, systems, and instructions
US201715508284A | 2017-03-02 | 2017-03-02 |
US16/928,501, US20210132950A1 (en) | 2014-09-25 | 2020-07-14 | Bit shuffle processors, methods, systems, and instructions

Related Parent Applications (2)

Application Number | Priority Date | Filing Date | Title
US15/508,284, Continuation, US10713044B2 (en) | 2014-09-25 | 2015-09-04 | Bit shuffle processors, methods, systems, and instructions
PCT/US2015/048627, Continuation, WO2016048631A1 (en) | 2014-09-25 | 2015-09-04 | Bit shuffle processors, methods, systems, and instructions

Publications (1)

Publication Number | Publication Date
US20210132950A1 (en) | 2021-05-06

Family

ID=51730476

Family Applications (2)

Application Number | Priority Date | Filing Date | Title
US15/508,284, Active, US10713044B2 (en) | 2014-09-25 | 2015-09-04 | Bit shuffle processors, methods, systems, and instructions
US16/928,501, Pending, US20210132950A1 (en) | 2014-09-25 | 2020-07-14 | Bit shuffle processors, methods, systems, and instructions

Family Applications Before (1)

Application Number | Priority Date | Filing Date | Title
US15/508,284, Active, US10713044B2 (en) | 2014-09-25 | 2015-09-04 | Bit shuffle processors, methods, systems, and instructions

Country Status (7)

Country | Link
US (2) | US10713044B2 (en)
EP (1) | EP3001307B1 (en)
JP (1) | JP6526175B2 (en)
KR (2) | KR102296800B1 (en)
CN (2) | CN106575217B (en)
TW (1) | TWI556165B (en)
WO (1) | WO2016048631A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US12079627B1 (en)* | 2023-03-23 | 2024-09-03 | Qualcomm Incorporated | Predicated compare-exchange-shuffle instruction for parallel processor

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US9513906B2 (en) | 2013-01-23 | 2016-12-06 | International Business Machines Corporation | Vector checksum instruction
US9804840B2 (en) | 2013-01-23 | 2017-10-31 | International Business Machines Corporation | Vector Galois Field Multiply Sum and Accumulate instruction
US9471308B2 (en)* | 2013-01-23 | 2016-10-18 | International Business Machines Corporation | Vector floating point test data class immediate instruction
EP3001307B1 (en) | 2014-09-25 | 2019-11-13 | Intel Corporation | Bit shuffle processors, methods, systems, and instructions
US10296489B2 (en)* | 2014-12-27 | 2019-05-21 | Intel Corporation | Method and apparatus for performing a vector bit shuffle
KR102659495B1 (en)* | 2016-12-02 | 2024-04-22 | 삼성전자주식회사 | Vector processor and control methods thererof
US9959247B1 (en) | 2017-02-17 | 2018-05-01 | Google Llc | Permuting in a matrix-vector processor
CN110312993B (en)* | 2017-02-23 | 2024-04-19 | Arm有限公司 | Vector element-by-element operations in data processing devices
US10191740B2 (en)* | 2017-02-28 | 2019-01-29 | Intel Corporation | Deinterleave strided data elements processors, methods, systems, and instructions
KR102503176B1 (en)* | 2018-03-13 | 2023-02-24 | 삼성디스플레이 주식회사 | Data transmitting system and display apparatus including the same method of transmitting data using the same
US11789734B2 (en)* | 2018-08-30 | 2023-10-17 | Advanced Micro Devices, Inc. | Padded vectorization with compile time known masks
US10818359B2 (en) | 2018-12-21 | 2020-10-27 | Micron Technology, Inc. | Apparatuses and methods for organizing data in a memory device
US10838732B2 (en) | 2018-12-21 | 2020-11-17 | Micron Technology, Inc. | Apparatuses and methods for ordering bits in a memory device
CN112650496B (en)* | 2019-10-09 | 2024-04-26 | 安徽寒武纪信息科技有限公司 | Shuffling method and computing device
FR3101980B1 (en) | 2019-10-11 | 2021-12-10 | St Microelectronics Grenoble 2 | Processor
CN114297138B (en) | 2021-12-10 | 2023-12-26 | 龙芯中科技术股份有限公司 | Vector shuffling method, processor and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
US6052769A (en)* | 1998-03-31 | 2000-04-18 | Intel Corporation | Method and apparatus for moving select non-contiguous bytes of packed data in a single instruction
US20040133617A1 (en)* | 2001-10-29 | 2004-07-08 | Yen-Kuang Chen | Method and apparatus for computing matrix transformations
US20140149713A1 (en)* | 2011-12-23 | 2014-05-29 | Ashish Jha | Multi-register gather instruction

Family Cites Families (41)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
GB9509987D0 (en) | 1995-05-17 | 1995-07-12 | Sgs Thomson Microelectronics | Manipulation of data
JPH11203106A (en) | 1998-01-13 | 1999-07-30 | Hitachi Ltd | Processor
US6041404A (en) | 1998-03-31 | 2000-03-21 | Intel Corporation | Dual function system and method for shuffling packed data elements
GB2352536A (en) | 1999-07-21 | 2001-01-31 | Element 14 Ltd | Conditional instruction execution
US6493819B1 (en)* | 1999-11-16 | 2002-12-10 | Advanced Micro Devices, Inc. | Merging narrow register for resolution of data dependencies when updating a portion of a register in a microprocessor
US6922472B2 (en)* | 2000-05-05 | 2005-07-26 | Teleputers, Llc | Method and system for performing permutations using permutation instructions based on butterfly networks
JP2003533829A (en)* | 2000-05-05 | 2003-11-11 | リー,ルビー,ビー. | Method and system for performing permutations using improved omega and flip stage based permutation instructions
US7174014B2 (en)* | 2000-05-05 | 2007-02-06 | Teleputers, Llc | Method and system for performing permutations with bit permutation instructions
US7155601B2 (en) | 2001-02-14 | 2006-12-26 | Intel Corporation | Multi-element operand sub-portion shuffle instruction execution
US6986025B2 (en)* | 2001-06-11 | 2006-01-10 | Broadcom Corporation | Conditional execution per lane
US20040054877A1 (en)* | 2001-10-29 | 2004-03-18 | Macy William W. | Method and apparatus for shuffling data
US7685212B2 (en) | 2001-10-29 | 2010-03-23 | Intel Corporation | Fast full search motion estimation with SIMD merge instruction
GB2409064B (en)* | 2003-12-09 | 2006-09-13 | Advanced Risc Mach Ltd | A data processing apparatus and method for performing in parallel a data processing operation on data elements
GB2409066B (en)* | 2003-12-09 | 2006-09-27 | Advanced Risc Mach Ltd | A data processing apparatus and method for moving data between registers and memory
JP2005218055A (en) | 2004-02-02 | 2005-08-11 | Toshiba Corp | Image processing apparatus, image processing method, and image processing program
US8078836B2 (en)* | 2007-12-30 | 2011-12-13 | Intel Corporation | Vector shuffle instructions operating on multiple lanes each having a plurality of data elements using a common set of per-lane control bits
GB2456775B (en)* | 2008-01-22 | 2012-10-31 | Advanced Risc Mach Ltd | Apparatus and method for performing permutation operations on data
JP2009282744A (en) | 2008-05-22 | 2009-12-03 | Toshiba Corp | Computing unit and semiconductor integrated circuit device
US9086872B2 (en) | 2009-06-30 | 2015-07-21 | Intel Corporation | Unpacking packed data in multiple lanes
US9747105B2 (en)* | 2009-12-17 | 2017-08-29 | Intel Corporation | Method and apparatus for performing a shift and exclusive or operation in a single instruction
US9003170B2 (en) | 2009-12-22 | 2015-04-07 | Intel Corporation | Bit range isolation instructions, methods, and apparatus
CN102253824A (en) | 2010-05-18 | 2011-11-23 | 江苏芯动神州科技有限公司 | Method for shuffling byte nepit data
US20120254588A1 (en)* | 2011-04-01 | 2012-10-04 | Jesus Corbal San Adrian | Systems, apparatuses, and methods for blending two source operands into a single destination using a writemask
JP2013057872A (en) | 2011-09-09 | 2013-03-28 | Micro Uintekku Kk | Electromagnetic driving device
JP2013060328A (en) | 2011-09-14 | 2013-04-04 | Sumitomo Electric Ind Ltd | Method for manufacturing silicon carbide crystal
WO2013057872A1 (en)* | 2011-10-18 | 2013-04-25 | パナソニック株式会社 | Shuffle pattern generating circuit, processor, shuffle pattern generating method, and instruction
US9513918B2 (en) | 2011-12-22 | 2016-12-06 | Intel Corporation | Apparatus and method for performing permute operations
US20160041827A1 (en)* | 2011-12-23 | 2016-02-11 | Jesus Corbal | Instructions for merging mask patterns
CN104025040B (en)* | 2011-12-23 | 2017-11-21 | 英特尔公司 | Apparatus and method for shuffling floating point or integer values
WO2013095609A1 (en)* | 2011-12-23 | 2013-06-27 | Intel Corporation | Systems, apparatuses, and methods for performing conversion of a mask register into a vector register
CN104137054A (en)* | 2011-12-23 | 2014-11-05 | 英特尔公司 | Systems, apparatuses, and methods for performing conversion of a list of index values into a mask value
CN104067224B (en)* | 2011-12-23 | 2017-05-17 | 英特尔公司 | Instruction execution that broadcasts and masks data values at different levels of granularity
US9218182B2 (en)* | 2012-06-29 | 2015-12-22 | Intel Corporation | Systems, apparatuses, and methods for performing a shuffle and operation (shuffle-op)
US9342479B2 (en)* | 2012-08-23 | 2016-05-17 | Qualcomm Incorporated | Systems and methods of data extraction in a vector processor
US9557993B2 (en)* | 2012-10-23 | 2017-01-31 | Analog Devices Global | Processor architecture and method for simplifying programming single instruction, multiple data within a register
EP2728491A1 (en)* | 2012-10-31 | 2014-05-07 | MStar Semiconductor, Inc | Stream Data Processor
US20140281418A1 (en)* | 2013-03-14 | 2014-09-18 | Shihjong J. Kuo | Multiple Data Element-To-Multiple Data Element Comparison Processors, Methods, Systems, and Instructions
US9244684B2 (en)* | 2013-03-15 | 2016-01-26 | Intel Corporation | Limited range vector memory access instructions, processors, methods, and systems
CN105653499B (en)* | 2013-03-15 | 2019-01-01 | 甲骨文国际公司 | Hardware-efficient for single instruction multidata processor instructs
US9411593B2 (en) | 2013-03-15 | 2016-08-09 | Intel Corporation | Processors, methods, systems, and instructions to consolidate unmasked elements of operation masks
EP3001307B1 (en) | 2014-09-25 | 2019-11-13 | Intel Corporation | Bit shuffle processors, methods, systems, and instructions

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Lee et al. (Efficient Permutation Instructions for Fast Software Cryptography, Dec 2001, pgs. 56-69) (Year: 2001)*
McGregor et al. (Architectural Techniques for Accelerating Subword Permutations With Repetitions, June 2003, pgs. 325-335) (Year: 2003)*

Also Published As

Publication number | Publication date
KR102296800B1 (en) | 2021-09-02
TW201631468A (en) | 2016-09-01
US10713044B2 (en) | 2020-07-14
CN114020328A (en) | 2022-02-08
KR20210111866A (en) | 2021-09-13
JP2017529601A (en) | 2017-10-05
CN106575217B (en) | 2024-01-30
US20170286112A1 (en) | 2017-10-05
EP3001307B1 (en) | 2019-11-13
EP3001307A1 (en) | 2016-03-30
CN106575217A (en) | 2017-04-19
KR102354842B1 (en) | 2022-01-24
JP6526175B2 (en) | 2019-06-05
KR20170033890A (en) | 2017-03-27
WO2016048631A1 (en) | 2016-03-31
TWI556165B (en) | 2016-11-01

Legal Events

Code | Title | Description
STPP | Information on status: patent application and granting procedure in general | Free format text: APPLICATION DISPATCHED FROM PREEXAM, NOT YET DOCKETED
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED
STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION
STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED
STPP | Information on status: patent application and granting procedure in general | Free format text: RESPONSE TO NON-FINAL OFFICE ACTION ENTERED AND FORWARDED TO EXAMINER
STPP | Information on status: patent application and granting procedure in general | Free format text: FINAL REJECTION MAILED
STCV | Information on status: appeal procedure | Free format text: NOTICE OF APPEAL FILED

