FMA instruction set

From Wikipedia, the free encyclopedia
X86 instruction set extension developed by Intel

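The FMA instruction set is an extension to the 128- and 256-bit Streaming SIMD Extensions instructions in the x86 microprocessor instruction set to perform fused multiply–add (FMA) operations.[1] There are two variants:
  • FMA4, supported in AMD processors starting with Bulldozer in 2011.
  • FMA3, supported in AMD processors starting with Piledriver in 2012 and in Intel processors starting with Haswell in 2013.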

Instructions

FMA3 and FMA4 instructions have almost identical functionality, but are not compatible. Both contain fused multiply–add (FMA) instructions for floating-point scalar and SIMD operations, but FMA3 instructions have three operands, while FMA4 instructions have four. The FMA operation has the form d = round(a · b + c), where round performs a single rounding of the exact value of a · b + c, so that the result fits in the destination register when it has too many significant bits.

The four-operand form (FMA4) allows a, b, c and d to be four different registers, while the three-operand form (FMA3) requires that d be the same register as a, b or c. The three-operand form makes the code shorter and the hardware implementation slightly simpler, while the four-operand form provides more programming flexibility.

See XOP instruction set for more discussion of compatibility issues between Intel and AMD.

FMA3 instruction set

CPUs with FMA3
  • AMD
    • Piledriver (2012) and newer microarchitectures[3]
      • 2nd gen APUs, "Trinity" (32nm), May 15, 2012
      • 2nd gen "Bulldozer" (bdver2) with Piledriver cores, October 23, 2012
  • Intel
    • Haswell processors (2013) and newer microarchitectures

Excerpt from FMA3

Supported instructions include:

Mnemonic | Operation
VFMADD | result = + a · b + c
VFMSUB | result = + a · b − c
VFNMADD | result = − a · b + c
VFNMSUB | result = − a · b − c
VFMADDSUB | result = a · b + c for i = 1, 3, ...; result = a · b − c for i = 0, 2, ...
VFMSUBADD | result = a · b − c for i = 1, 3, ...; result = a · b + c for i = 0, 2, ...
Note
  • VFNMADD is result = − a · b + c, not result = − (a · b + c).
  • VFNMSUB generates −0 when all inputs are +0.

Explicit order of operands is included in the mnemonic using numbers "132", "213", and "231":

Postfix 1 | Operation | Possible memory operand | Overwrites
132 | a = a · c + b | c (factor) | a (other factor)
213 | a = b · a + c | c (summand) | a (factor)
231 | a = b · c + a | c (factor) | a (summand)

as well as operand format (packed or scalar) and size (single or double).

Postfix 2 | Precision | Size
SS | Single | 32 bit
PSx | Single | 4 × 32 bit
PSy | Single | 8 × 32 bit
PSz | Single | 16 × 32 bit
SD | Double | 64 bit
PDx | Double | 2 × 64 bit
PDy | Double | 4 × 64 bit
PDz | Double | 8 × 64 bit

For VFMADD, this results in:

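Encoding | Mnemonic | Operands | Operation
VEX.256.66.0F38.W1 98 /r | VFMADD132PDy | ymm, ymm, ymm/m256 | a = a · c + b
VEX.256.66.0F38.W0 98 /r | VFMADD132PSy | ymm, ymm, ymm/m256 | a = a · c + b
VEX.128.66.0F38.W1 98 /r | VFMADD132PDx | xmm, xmm, xmm/m128 | a = a · c + b
VEX.128.66.0F38.W0 98 /r | VFMADD132PSx | xmm, xmm, xmm/m128 | a = a · c + b
VEX.LIG.66.0F38.W1 99 /r | VFMADD132SD | xmm, xmm, xmm/m64 | a = a · c + b
VEX.LIG.66.0F38.W0 99 /r | VFMADD132SS | xmm, xmm, xmm/m32 | a = a · c + b
VEX.256.66.0F38.W1 A8 /r | VFMADD213PDy | ymm, ymm, ymm/m256 | a = b · a + c
VEX.256.66.0F38.W0 A8 /r | VFMADD213PSy | ymm, ymm, ymm/m256 | a = b · a + c
VEX.128.66.0F38.W1 A8 /r | VFMADD213PDx | xmm, xmm, xmm/m128 | a = b · a + c
VEX.128.66.0F38.W0 A8 /r | VFMADD213PSx | xmm, xmm, xmm/m128 | a = b · a + c
VEX.LIG.66.0F38.W1 A9 /r | VFMADD213SD | xmm, xmm, xmm/m64 | a = b · a + c
VEX.LIG.66.0F38.W0 A9 /r | VFMADD213SS | xmm, xmm, xmm/m32 | a = b · a + c
VEX.256.66.0F38.W1 B8 /r | VFMADD231PDy | ymm, ymm, ymm/m256 | a = b · c + a
VEX.256.66.0F38.W0 B8 /r | VFMADD231PSy | ymm, ymm, ymm/m256 | a = b · c + a
VEX.128.66.0F38.W1 B8 /r | VFMADD231PDx | xmm, xmm, xmm/m128 | a = b · c + a
VEX.128.66.0F38.W0 B8 /r | VFMADD231PSx | xmm, xmm, xmm/m128 | a = b · c + a
VEX.LIG.66.0F38.W1 B9 /r | VFMADD231SD | xmm, xmm, xmm/m64 | a = b · c + a
VEX.LIG.66.0F38.W0 B9 /r | VFMADD231SS | xmm, xmm, xmm/m32 | a = b · c + a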
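The VFMADD encodings follow a regular pattern: the operand order selects opcode 98, A8 or B8, the scalar forms use the next (odd) opcode, and VEX.W is 1 for double precision and 0 for single precision. The hypothetical decoder below only restates that pattern for the VFMADD rows; it does not emit complete instructions (no VEX prefix bytes or ModRM), and the type and function names are invented for illustration:

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical summary of one VFMADD encoding row. */
typedef struct {
    unsigned char opcode;  /* byte following the 0F38 opcode map */
    int vex_w;             /* 1 = double precision, 0 = single precision */
    bool scalar;           /* scalar forms ignore VEX.L (LIG) */
} FmaEncoding;

/* order: 132, 213 or 231; suffix: "PS", "PD", "SS" or "SD".
   Returns false for anything outside the VFMADD rows. */
bool vfmadd_encoding(int order, const char *suffix, FmaEncoding *out) {
    unsigned char base;
    switch (order) {
        case 132: base = 0x98; break;
        case 213: base = 0xA8; break;
        case 231: base = 0xB8; break;
        default: return false;
    }
    if (strlen(suffix) != 2 || (suffix[0] != 'P' && suffix[0] != 'S') ||
        (suffix[1] != 'S' && suffix[1] != 'D')) {
        return false;
    }
    out->scalar = (suffix[0] == 'S');
    /* packed forms use the even opcode, scalar forms the following odd one */
    out->opcode = (unsigned char)(base + (out->scalar ? 1 : 0));
    out->vex_w = (suffix[1] == 'D') ? 1 : 0;
    return true;
}
```

For instance, vfmadd_encoding(213, "SS", &e) yields opcode A9 with W0, matching the VEX.LIG.66.0F38.W0 A9 /r row for VFMADD213SS.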

FMA4 instruction set

CPUs with FMA4
  • AMD
    • "Heavy Equipment" processors
    • Zen: WikiChip's testing shows that FMA4 still appears to work (under the conditions of the tests) despite not being officially supported and not even being reported by CPUID. Agner Fog has also confirmed this.[8] However, other tests gave wrong results.[9] AMD's official website lists FMA4 support for these Zen CPUs: AMD ThreadRipper 1900x, R7 Pro 1800, 1700, R5 Pro 1600, 1500, R3 Pro 1300, 1200, R3 2200G, R5 2400G.[10][11][12]
  • Intel
    • Intel has not released CPUs with support for FMA4.
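The feature flags themselves can be read with the CPUID instruction: FMA3 is reported in bit 12 of ECX for leaf 1, and FMA4 in bit 16 of ECX for extended leaf 0x80000001. The sketch below assumes GCC or Clang, whose <cpuid.h> header provides __get_cpuid; FmaFlags and cpuid_fma_flags are hypothetical names. On a first-generation Zen processor it would be expected to report FMA4 as absent even though, as noted above, the instructions reportedly still execute:

```c
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical result type: which FMA variants CPUID advertises. */
typedef struct {
    bool fma3;
    bool fma4;
} FmaFlags;

/* Read the CPUID feature bits for FMA3 and FMA4.
   FMA3: leaf 1, ECX bit 12. FMA4: leaf 0x80000001, ECX bit 16.
   __get_cpuid returns 0 if the requested leaf is not supported.
   On non-x86 targets both flags stay false. */
FmaFlags cpuid_fma_flags(void) {
    FmaFlags flags = { false, false };
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        flags.fma3 = ((ecx >> 12) & 1u) != 0;
    }
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx)) {
        flags.fma4 = ((ecx >> 16) & 1u) != 0;
    }
#endif
    return flags;
}
```

A set flag only shows that the processor implements the instructions; before executing any VEX-encoded FMA code, dispatch logic must also confirm that the operating system saves AVX register state, which this sketch does not check.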

Excerpt from FMA4

Mnemonic (AT&T) | Operands | Operation
VFMADDPDx | xmm, xmm, xmm/m128, xmm/m128 | a = b · c + d
VFMADDPDy | ymm, ymm, ymm/m256, ymm/m256 | a = b · c + d
VFMADDPSx | xmm, xmm, xmm/m128, xmm/m128 | a = b · c + d
VFMADDPSy | ymm, ymm, ymm/m256, ymm/m256 | a = b · c + d
VFMADDSD | xmm, xmm, xmm/m64, xmm/m64 | a = b · c + d
VFMADDSS | xmm, xmm, xmm/m32, xmm/m32 | a = b · c + d

History

The incompatibility between Intel's FMA3 and AMD's FMA4 is due to both companies changing plans without coordinating coding details with each other. AMD changed their plans from FMA3 to FMA4 while Intel changed their plans from FMA4 to FMA3 almost at the same time. The history can be summarized as follows:

  • August 2007: AMD announces the SSE5 instruction set, which includes 3-operand FMA instructions. A new coding scheme (DREX) is introduced to allow instructions to have three operands.[13]
  • April 2008: Intel announces their AVX and FMA instruction sets, including 4-operand FMA instructions. The coding of these instructions uses the new VEX coding scheme,[14] which is more flexible than AMD's DREX scheme.
  • December 2008: Intel changes the specification for their FMA instructions from 4-operand to 3-operand instructions. The VEX coding scheme is still used.[15]
  • May 2009: AMD changes the specification of their FMA instructions from the 3-operand DREX form to the 4-operand VEX form, compatible with the April 2008 Intel specification rather than the December 2008 Intel specification.[16]
  • October 2011: The AMD Bulldozer processor supports FMA4.[17]
  • January 2012: AMD announces FMA3 support in future processors codenamed Trinity and Vishera; they are based on the Piledriver architecture.[18]
  • May 2012: The AMD Piledriver processor supports both FMA3 and FMA4.[17]
  • June 2013: The Intel Haswell processor supports FMA3.[19]
  • February 2017: The first generation of AMD Ryzen processors officially supports FMA3, but not FMA4, according to the CPUID instruction.[2] There has been confusion about whether FMA4 was implemented on this processor, due to an error in the initial patch to the GNU Binutils package that has since been corrected.[20][21] One unconfirmed report of wrong results[9] raised some doubt, but Mysticial (Alexander Yee, developer of y-cruncher) debunked it:[22] FMA4 had worked for bit-exact bignum calculations on his Zen 1 system for years, and the single report on Reddit was widely repeated without any follow-up investigation to rule out mistakes in the testing software. The initial Ryzen CPUs could be crashed by a particular sequence of FMA3 instructions, but updated CPU microcode fixes the problem.[23]
  • July 2019: AMD Zen 2 and later Ryzen processors don't support FMA4 at all.[24] They continue to support FMA3. Only Zen 1 and Zen+ have unofficial FMA4 support.

Compiler and assembler support

Different compilers provide different levels of support for FMA:

References

  1. Woltmann, George (Prime95). "Intel AVX and GIMPS". mersenneforum.org. Great Internet Mersenne Prime Search (GIMPS) project. Retrieved 27 July 2011. "FMA3 and FMA4 are not instruction sets, they are individual instructions -- fused multiply add. They could be quite useful depending on how Intel and AMD implement them"
  2. "The microarchitecture of Intel, AMD and VIA CPUs An optimization guide for assembly programmers and compiler makers" (PDF). Retrieved 2017-05-02.
  3. Maffeo, Robin (March 1, 2012). "AMD and the Visual Studio 11 Beta". AMD. Archived from the original on November 9, 2013. Retrieved 2018-11-07.
  4. "CPU-Z - ID : y5z6gq". Retrieved 2022-05-01.
  5. "CPU-Z - ID : kr2mlx". Retrieved 2022-05-01.
  6. "AMD64 Architecture Programmer's Manual Volume 6: 128-Bit and 256-Bit XOP, FMA4 and CVT16 Instructions" (PDF). AMD. May 1, 2009.
  7. "New "Bulldozer" and "Piledriver" Instructions A step forward for high performance software development" (PDF). AMD. October 2012.
  8. "Agner's CPU blog - Test results for AMD Ryzen". 2017-05-02.
  9. "Discussion – Ryzen has undocumented support for FMA4". Retrieved 2017-05-10.
  10. "www.amd.com, FMA4 support model list".
  11. "www.amd.com, FMA4 support model list".
  12. "www.amd.com, FMA4 support model list".
  13. "128-Bit SSE5 Instruction Set". AMD Developer Central. Archived from the original on 2008-01-15. Retrieved 2008-01-28.
  14. "Intel Advanced Vector Extensions Programming Reference" (PDF). Intel. Retrieved 2008-04-05.
  15. "Intel Advanced Vector Extensions Programming Reference". Intel. Retrieved 2009-05-06.
  16. "Striking a balance". Dave Christie, AMD Developer blogs. May 6, 2009. Archived from the original on July 8, 2012. Retrieved 2018-11-07.
  17. "New Bulldozer and Piledriver Instructions" (PDF). AMD. Retrieved 25 July 2013.
  18. "Software Optimization Guide for AMD Family 15h Processors" (PDF). AMD. Retrieved 19 April 2012.
  19. "Intel Architecture Instruction Set Extensions Programming Reference" (PDF). Intel. Retrieved 25 July 2013.
  20. Gopalasubramanian, Ganesh (2015-03-10). "[PATCH] add znver1 processor". Retrieved 2022-05-01.
  21. Pawar, Amit (2015-08-07). "[PATCH] Remove CpuFMA4 from Znver1 CPU Flags". Retrieved 2022-05-01.
  22. "Stack Overflow comment by Mysticial". 2019-07-16. Archived from the original on 2019-08-22. Retrieved 2023-09-01.
  23. "AMD Ryzen Machine Crashes to a Sequence of FMA3 Instructions". 16 March 2017. Retrieved 2017-09-10.
  24. "Stack Overflow comment by Mysticial". 2019-07-16. Retrieved 2023-09-01.
  25. Latif, Lawrence (Nov 14, 2011). "AMD Bulldozer only FMA4 and XOP instructions are supported by GCC Intel still mute". The Inquirer. Archived from the original on November 17, 2011.
  26. "FMA4 Intrinsics Added for Visual Studio 2010 SP1". 4 February 2013.
  27. "EKOPath man doc". Archived from the original on 2016-06-23. Retrieved 2013-07-24.
  28. "LLVM 3.1 Release Notes".
  29. "Enable detection of AVX and AVX2 support through CPUID". LLVM. 2012-04-26.
Retrieved from "https://en.wikipedia.org/w/index.php?title=FMA_instruction_set&oldid=1278654648"