In computing, especially digital signal processing, the multiply–accumulate (MAC) or multiply–add (MAD) operation is a common step that computes the product of two numbers and adds that product to an accumulator. The hardware unit that performs the operation is known as a multiplier–accumulator (MAC unit); the operation itself is also often called a MAC or a MAD operation. The MAC operation modifies an accumulator a:

a ← a + (b × c)
When done with floating-point numbers, it might be performed with two roundings (typical in many DSPs), or with a single rounding. When performed with a single rounding, it is called a fused multiply–add (FMA) or fused multiply–accumulate (FMAC).
Modern computers may contain a dedicated MAC, consisting of a multiplier implemented in combinational logic followed by an adder and an accumulator register that stores the result. The output of the register is fed back to one input of the adder, so that on each clock cycle, the output of the multiplier is added to the register. Combinational multipliers require a large amount of logic, but can compute a product much more quickly than the method of shifting and adding typical of earlier computers. Percy Ludgate was the first to conceive a MAC in his Analytical Machine of 1909,[1] and the first to exploit a MAC for division (using multiplication seeded by reciprocal, via the convergent series (1 + x)⁻¹). The first modern processors to be equipped with MAC units were digital signal processors, but the technique is now also common in general-purpose processors.[2][3][4][5]
When done with integers, the operation is typically exact (computed modulo some power of two). However, floating-point numbers have only a certain amount of mathematical precision. That is, digital floating-point arithmetic is generally not associative or distributive. (See Floating-point arithmetic § Accuracy problems.) Therefore, it makes a difference to the result whether the multiply–add is performed with two roundings, or in one operation with a single rounding (a fused multiply–add). IEEE 754-2008 specifies that it must be performed with one rounding, yielding a more accurate result.[6]
A fused multiply–add (FMA or fmadd)[7] is a floating-point multiply–add operation performed in one step (fused operation), with a single rounding. That is, where an unfused multiply–add would compute the product b × c, round it to N significant bits, add the result to a, and round back to N significant bits, a fused multiply–add would compute the entire expression a + (b × c) to its full precision before rounding the final result down to N significant bits.
A fast FMA can speed up and improve the accuracy of many computations that involve the accumulation of products, such as dot products, matrix multiplication, polynomial evaluation (e.g., with Horner's rule), Newton's method for evaluating functions, and convolutions.
Fused multiply–add can usually be relied on to give more accurate results. However, William Kahan has pointed out that it can give problems if used unthinkingly.[8] If x² − y² is evaluated as ((x × x) − y × y) (following Kahan's suggested notation, in which the redundant parentheses direct the compiler to round the (x × x) term first) using fused multiply–add, then the result may be negative even when x = y, because the first multiplication discards low-significance bits. This could then lead to an error if, for instance, the square root of the result is then evaluated.
When implemented inside a microprocessor, an FMA can be faster than a multiply operation followed by an add. However, standard industrial implementations based on the original IBM RS/6000 design require a 2N-bit adder to compute the sum properly.[9]
Another benefit of including this instruction is that it allows an efficient software implementation of division (see division algorithm) and square root (see methods of computing square roots) operations, thus eliminating the need for dedicated hardware for those operations.[10]
Some machines combine multiple fused multiply–add operations into a single step, e.g. performing a four-element dot product on two 128-bit SIMD registers, a0×b0 + a1×b1 + a2×b2 + a3×b3, with single-cycle throughput.
The FMA operation is included in IEEE 754-2008.
The 1999 standard of the C programming language supports the FMA operation through the fma() standard math library function, as well as the automatic transformation of a multiplication followed by an addition (contraction of floating-point expressions), which can be explicitly enabled or disabled with standard pragmas (#pragma STDC FP_CONTRACT). The GCC and Clang C compilers do such transformations by default for processor architectures that support FMA instructions. With GCC, which does not support the aforementioned pragma,[11] this can be globally controlled by the -ffp-contract command-line option.[12]
The fused multiply–add operation was introduced as "multiply–add fused" in the IBM POWER1 (1990) processor,[13] but has since been added to numerous other processors: