Movatterモバイル変換

[0]ホーム

Jump to content

Advanced Vector Extensions

Edit links

From Wikipedia, the free encyclopedia

Wikibooks has a book on the topic of:X86 Assembly/AVX, AVX2, FMA3, FMA4

Instructions for the x86 microprocessors

Advanced Vector Extensions (AVX, also known asGesher New Instructions and thenSandy Bridge New Instructions) areSIMD extensions to thex86 instruction set architecture formicroprocessors fromIntel andAdvanced Micro Devices (AMD). They were proposed by Intel in March 2008 and first supported by Intel with theSandy Bridge^[1] microarchitecture shipping in Q1 2011 and later by AMD with theBulldozer^[2] microarchitecture shipping in Q4 2011. AVX provides new features, new instructions, and a new coding scheme.

AVX2 (also known asHaswell New Instructions) expands most integer commands to 256 bits and introduces new instructions. They were first supported by Intel with theHaswell microarchitecture, which shipped in 2013.

AVX-512 expands AVX to 512-bit support using a newEVEX prefix encoding proposed by Intel in July 2013 and first supported by Intel with theKnights Landing co-processor, which shipped in 2016.^[3]^[4] In conventional processors, AVX-512 was introduced withSkylake server and HEDT processors in 2017.

511256	255128	1270

ZMM0	YMM0	XMM0
ZMM1	YMM1	XMM1
ZMM2	YMM2	XMM2
ZMM3	YMM3	XMM3
ZMM4	YMM4	XMM4
ZMM5	YMM5	XMM5
ZMM6	YMM6	XMM6
ZMM7	YMM7	XMM7
ZMM8	YMM8	XMM8
ZMM9	YMM9	XMM9
ZMM10	YMM10	XMM10
ZMM11	YMM11	XMM11
ZMM12	YMM12	XMM12
ZMM13	YMM13	XMM13
ZMM14	YMM14	XMM14
ZMM15	YMM15	XMM15
ZMM16	YMM16	XMM16
ZMM17	YMM17	XMM17
ZMM18	YMM18	XMM18
ZMM19	YMM19	XMM19
ZMM20	YMM20	XMM20
ZMM21	YMM21	XMM21
ZMM22	YMM22	XMM22
ZMM23	YMM23	XMM23
ZMM24	YMM24	XMM24
ZMM25	YMM25	XMM25
ZMM26	YMM26	XMM26
ZMM27	YMM27	XMM27
ZMM28	YMM28	XMM28
ZMM29	YMM29	XMM29
ZMM30	YMM30	XMM30
ZMM31	YMM31	XMM31

Instruction	Description
`VBROADCASTSS`,`VBROADCASTSD`,`VBROADCASTF128`	Copy a 32-bit, 64-bit or 128-bit memory operand to all elements of a XMM or YMM vector register.
`VINSERTF128`	Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is unchanged.
`VEXTRACTF128`	Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand.
`VMASKMOVPS`,`VMASKMOVPD`	Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged. On the AMD Jaguar processor architecture, this instruction with a memory source operand takes more than 300 clock cycles when the mask is zero, in which case the instruction should do nothing. This appears to be a design flaw.^[7]
`VPERMILPS`,`VPERMILPD`	Permute In-Lane. Shuffle the 32-bit or 64-bit vector elements of one input operand. These are in-lane 256-bit instructions, meaning that they operate on all 256 bits with two separate 128-bit shuffles, so they can not shuffle across the 128-bit lanes.^[8]
`VPERM2F128`	Shuffle the four 128-bit vector elements of two 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector.
`VTESTPS`,`VTESTPD`	Packed bit test of the packed single-precision or double-precision floating-point sign bits, setting or clearing the ZF flag based on AND and CF flag based on ANDN.
`VZEROALL`	Set all YMM registers to zero and tag them as unused. Used when switching between 128-bit use and 256-bit use.
`VZEROUPPER`	Set the upper half of all YMM registers to zero. Used when switching between 128-bit use and 256-bit use.

Instruction	Description
`VBROADCASTSS`,`VBROADCASTSD`	Copy a 32-bit or 64-bit register operand to all elements of a XMM or YMM vector register. These are register versions of the same instructions in AVX1. There is no 128-bit version, but the same effect can be simply achieved using VINSERTF128.
`VPBROADCASTB`,`VPBROADCASTW`,`VPBROADCASTD`,`VPBROADCASTQ`	Copy an 8, 16, 32 or 64-bit integer register or memory operand to all elements of a XMM or YMM vector register.
`VBROADCASTI128`	Copy a 128-bit memory operand to all elements of a YMM vector register.
`VINSERTI128`	Replaces either the lower half or the upper half of a 256-bit YMM register with the value of a 128-bit source operand. The other half of the destination is unchanged.
`VEXTRACTI128`	Extracts either the lower half or the upper half of a 256-bit YMM register and copies the value to a 128-bit destination operand.
`VGATHERDPD`,`VGATHERQPD`,`VGATHERDPS`,`VGATHERQPS`	Gathers single- or double-precision floating-point values using either 32- or 64-bit indices and scale.
`VPGATHERDD`,`VPGATHERDQ`,`VPGATHERQD`,`VPGATHERQQ`	Gathers 32 or 64-bit integer values using either 32- or 64-bit indices and scale.
`VPMASKMOVD`,`VPMASKMOVQ`	Conditionally reads any number of elements from a SIMD vector memory operand into a destination register, leaving the remaining vector elements unread and setting the corresponding elements in the destination register to zero. Alternatively, conditionally writes any number of elements from a SIMD vector register operand to a vector memory operand, leaving the remaining elements of the memory operand unchanged.
`VPERMPS`,`VPERMD`	Shuffle the eight 32-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with a register or memory operand as selector.
`VPERMPD`,`VPERMQ`	Shuffle the four 64-bit vector elements of one 256-bit source operand into a 256-bit destination operand, with a register or memory operand as selector.
`VPERM2I128`	Shuffle (two of) the four 128-bit vector elements oftwo 256-bit source operands into a 256-bit destination operand, with an immediate constant as selector.
`VPBLENDD`	Doubleword immediate version of the PBLEND instructions fromSSE4.
`VPSLLVD`,`VPSLLVQ`	Shift left logical. Allows variable shifts where each element is shifted according to the packed input.
`VPSRLVD`,`VPSRLVQ`	Shift right logical. Allows variable shifts where each element is shifted according to the packed input.
`VPSRAVD`	Shift right arithmetically. Allows variable shifts where each element is shifted according to the packed input.

Subset	F	ER	4FMAPS	VPOPCNTDQ	VL	IFMA	VBMI2	VNNI	BF16	VPCLMULQDQ	VP2INTERSECT	FP16
IntelKnights Landing (2016)	Yes	Yes	No
IntelKnights Mill (2017)		Yes	Yes		No
IntelSkylake-SP,Skylake-X (2017)		No		No	Yes	No
IntelCannon Lake (2018)						Yes	No
IntelCascade Lake-SP (2019)						No		Yes	No
IntelCooper Lake (2020)						No		Yes		No
IntelIce Lake (2019)				Yes					No	Yes	No
IntelTiger Lake (2020)											Yes	No
IntelRocket Lake (2021)											No
IntelAlder Lake (2021)	Partial^{Note 1}			Partial^{Note 1}
AMDZen 4 (2022)	Yes			Yes							No
IntelSapphire Rapids (2023)											No	Yes
AMDZen 5 (2024)											Yes	No

v t e Intel technology
Platforms	Centrino Centrino 2 Viiv MID Tablet CULV Ultrabook Skulltrail NUC Galileo Edison Curie Evo
Discontinued	Common Building Block MultiProcessor Specification Intel Communication Streaming Architecture Intel Inboard 386 Intel Play MMC-1 MMC-2
Current	Advanced Programmable Interrupt Controller CNVi Intel Turbo Boost vPro Intel Secure Key Intel Management Engine Active Management Technology AMT versions High-bandwidth Digital Content Protection High Definition Audio Hub Architecture Rapid Storage Technology SpeedStep Serial Digital Video Out Host Embedded Controller Interface Hyper-threading Omni-Path Platform Environment Control Interface QuickPath Interconnect Platform Controller Hub System Management Bus Thunderbolt Ultra Path Interconnect
Upcoming	Silicon Photonics Link

v t e Instruction set extensions
SIMD (RISC)	Alpha MVI ARM NEON SVE MIPS MDMX MIPS-3D MXU MIPS SIMD PA-RISC MAX Power ISA VMX SPARC VIS
SIMD (x86)	MMX (1996) 3DNow! (1998) SSE (1999) SSE2 (2001) SSE3 (2004) SSSE3 (2006) SSE4 (2006) SSE5~~(2007)~~ AVX (2008) F16C (2009) XOP (2009) FMA (FMA4: 2011, FMA3: 2012) AVX2 (2013) AVX-512 (2015) AMX (2022) AVX10 (2023)
Bit manipulation	BMI (ABM: 2007, BMI1: 2012, BMI2: 2013, TBM: 2012) ADX (2014)
Compressed instructions	Thumb MIPS16e ASE RVC
Security andcryptography	PadLock (2003) AES-NI (2008); ARMv8 also has AES instructions CLMUL (2010) RDRAND (2012) SHA (2013) MPX (2015) SGX (2015) TDX (2021)
Transactional memory	TSX (2013) ASF
Virtualization	VT-x (2005) AMD-V (2006) VT-d (AMD-Vi)
Suspended extensions' dates are~~struck through~~.

Movatterモバイル変換

Advanced Vector Extensions

New instructions

CPUs with AVX

Compiler and assembler support

Operating system support

Advanced Vector Extensions 2

New instructions

CPUs with AVX2

AVX-512

AVX-512 CPU compatibility table

Compilers supporting AVX-512

Assemblers supporting AVX-512

AVX-VNNI, AVX-IFMA

CPUs with AVX-VNNI

CPUs with AVX-IFMA

AVX10

APX

Applications

Software

Downclocking

See also

References

External links