
x87


From Wikipedia, the free encyclopedia


Subset of x86 instruction set architecture for floating-point arithmetic

x87 is a floating-point-related subset of the x86 architecture instruction set. It originated as an extension of the 8086 instruction set in the form of optional floating-point coprocessors that work in tandem with corresponding x86 CPUs. These microchips have names ending in "87". The extension is also known as the NPX (numeric processor extension). Like other extensions to the basic instruction set, x87 instructions are not strictly needed to construct working programs, but they provide hardware and microcode implementations of common numerical tasks, allowing these tasks to be performed much faster than corresponding machine code routines can. The x87 instruction set includes instructions for basic floating-point operations such as addition, subtraction and comparison, as well as for more complex numerical operations, such as the computation of the tangent function and its inverse.

Most x86 processors since the Intel 80486 have had these x87 instructions implemented in the main CPU, but the term is sometimes still used to refer to that part of the instruction set. Before x87 instructions were standard in PCs, compilers or programmers had to use rather slow library calls to perform floating-point operations, a method that is still common in (low-cost) embedded systems.

Description


The x87 registers form an eight-level deep, non-strict stack structure ranging from ST(0) to ST(7). The registers can be pushed and popped, and can also be accessed directly by either operand using an offset relative to the top. (This scheme may be compared to how a stack frame may be both pushed/popped and indexed.)

There are instructions to push, calculate, and pop values on top of this stack; unary operations (FSQRT, FPTAN, etc.) then implicitly address the topmost ST(0), while binary operations (FADD, FMUL, FCOM, etc.) implicitly address ST(0) and ST(1). The non-strict stack model also allows binary operations to use ST(0) together with a direct memory operand or with an explicitly specified stack register, ST(x), in a role similar to a traditional accumulator (a combined destination and left operand). This can also be reversed on an instruction-by-instruction basis, with ST(0) as the unmodified operand and ST(x) as the destination. Furthermore, the contents of ST(0) can be exchanged with another stack register using an instruction called FXCH ST(x).
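The addressing scheme described above can be sketched with a small software model. This is only an illustration, not a description of the hardware: the class name `X87Stack`, the `depth` counter (standing in for the real tag bits) and the `to_sti` flag are hypothetical, and exceptions, rounding and the 80-bit format are ignored.

```python
class X87Stack:
    """Toy model of the eight-entry x87 register stack (values only)."""

    SIZE = 8

    def __init__(self):
        self.regs = [0.0] * self.SIZE  # physical registers
        self.top = 0                   # physical index of ST(0)
        self.depth = 0                 # live entries (assumed stand-in for tag bits)

    def _phys(self, i):
        # ST(i) is addressed relative to the current top, wrapping modulo 8.
        if not 0 <= i < self.depth:
            raise IndexError(f"ST({i}) is empty")
        return (self.top + i) % self.SIZE

    def st(self, i=0):
        return self.regs[self._phys(i)]

    def fld(self, value):
        """Push: the new value becomes ST(0), the old ST(0) becomes ST(1)."""
        if self.depth == self.SIZE:
            raise OverflowError("x87 stack overflow")
        self.top = (self.top - 1) % self.SIZE
        self.depth += 1
        self.regs[self.top] = float(value)

    def fstp(self):
        """Pop ST(0) and return it."""
        value = self.st(0)
        self.top = (self.top + 1) % self.SIZE
        self.depth -= 1
        return value

    def fadd(self, i=1, to_sti=False):
        """ST(0) += ST(i), or ST(i) += ST(0) when to_sti is set."""
        dst, src = (i, 0) if to_sti else (0, i)
        self.regs[self._phys(dst)] += self.st(src)

    def fmul(self, i=1, to_sti=False):
        """ST(0) *= ST(i), or ST(i) *= ST(0) when to_sti is set."""
        dst, src = (i, 0) if to_sti else (0, i)
        self.regs[self._phys(dst)] *= self.st(src)

    def fxch(self, i=1):
        """Exchange ST(0) with ST(i)."""
        a, b = self._phys(0), self._phys(i)
        self.regs[a], self.regs[b] = self.regs[b], self.regs[a]


fpu = X87Stack()
for v in (3.0, 2.5, 1.5):
    fpu.fld(v)               # ST(0)=1.5, ST(1)=2.5, ST(2)=3.0
fpu.fadd(1)                  # ST(0) = 1.5 + 2.5
fpu.fmul(2)                  # ST(0) = 4.0 * 3.0, using an explicit ST(2)
fpu.fxch(2)                  # swap ST(0) and ST(2)
fpu.fadd(2, to_sti=True)     # reversed form: ST(2) is the destination
```

In this sketch ST(0) plays the accumulator role for `fadd(1)` and `fmul(2)`, while the final call uses the reversed form in which ST(0) is left unmodified.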

These properties make the x87 stack usable as seven freely addressable registers plus a dedicated accumulator (or as seven independent accumulators). This is especially applicable on superscalar x86 processors (such as the Pentium of 1993 and later), where these exchange instructions (opcodes D9C8h..D9CFh) are optimized down to a zero clock penalty by using one of the integer paths for FXCH ST(x) in parallel with the FPU instruction. Although the stack model is natural and convenient for human assembly language programmers, some compiler writers have found it complicated to construct automatic code generators that schedule x87 code effectively. Such a stack-based interface can potentially minimize the need to save scratch variables in function calls compared with a register-based interface,^[1] although historically, design issues in the 8087 implementation limited that potential.^[2]^[3]

The x87 provides single-precision, double-precision and 80-bit double-extended precision binary floating-point arithmetic as per the IEEE 754-1985 standard. By default, the x87 processors all use 80-bit double-extended precision internally (to allow sustained precision over many calculations; see IEEE 754 design rationale). A given sequence of arithmetic operations may thus behave slightly differently compared to a strict single-precision or double-precision IEEE 754 FPU.^[4] This can be problematic for semi-numerical calculations that assume double precision for correct operation. To avoid such problems, the x87 can be configured, using a special configuration/status register, to automatically round to single or double precision after each operation. Since the introduction of SSE2, the x87 instructions are not as essential as they once were, but they remain important as a high-precision scalar unit for numerical calculations sensitive to round-off error and requiring the 64-bit mantissa precision and extended range available in the 80-bit format.
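One way such a difference can arise is double rounding: a result is first rounded to the 64-bit significand of the extended format and later rounded again to the 53-bit significand of double precision. The following sketch models this with exact rational arithmetic rather than real x87 hardware; the helper `round_to_bits` is hypothetical, handles only positive values, and ignores exponent range and denormals.

```python
from fractions import Fraction


def round_to_bits(x, p):
    """Round a positive Fraction to p significant bits, ties to even."""
    if x <= 0:
        raise ValueError("this sketch only handles positive values")
    # Find e such that 2**e <= x < 2**(e + 1).
    e = x.numerator.bit_length() - x.denominator.bit_length()
    if Fraction(2) ** e > x:
        e -= 1
    ulp = Fraction(2) ** (e - p + 1)   # spacing of p-bit values near x
    return round(x / ulp) * ulp        # round() on a Fraction is ties-to-even


# An exact value lying just above the midpoint of two adjacent doubles.
exact = 1 + Fraction(1, 2**53) + Fraction(1, 2**64)

direct = round_to_bits(exact, 53)                    # strict double precision
twice = round_to_bits(round_to_bits(exact, 64), 53)  # extended, then double

print(direct == 1 + Fraction(1, 2**52))  # True: rounded up
print(twice == 1)                        # True: rounded down after two steps
```

The first rounding to 64 bits lands exactly on the midpoint between two doubles, so the second rounding resolves a tie that the exact value never presented. The two paths differ by one unit in the last place of the double result.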

Performance


Clock cycle counts for examples of typical x87 FPU instructions (only register-register versions shown here).^[5]

The A…B notation (minimum to maximum) covers timing variations dependent on transient pipeline status and the arithmetic precision chosen (32, 64 or 80 bits); it also includes variations due to numerical cases (such as the number of set bits, zero, etc.). The L → H notation depicts values corresponding to the lowest (L) and the highest (H) maximal clock frequencies that were available.

x87 implementation	FADD	FMUL	FDIV	FXCH	FCOM	FSQRT	FPTAN	FPATAN	Max clock (MHz)	Peak FMUL (millions/s)	Peak FMUL^§ relative to 5 MHz 8087
8087	70…100	90…145	193…203	10…15	40…50	180…186	30…540	250…800	5 → 10	0.034…0.055 → 0.100…0.111	1 → 2× as fast
80287 (original)	70…100	90…145	193…203	10…15	40…50	180…186	30…540	250…800	6 → 12	0.041…0.066 → 0.083…0.133	1.2 → 2.4×
80387 (and later 287 models)	23…34	29…57	88…91	18	24	122…129	191…497	314…487	16 → 33	0.280…0.552 → 0.580…1.1	~10 → 20×
80486 (or 80487)	8…20	16	73	4	4	83…87	200…273	218…303	16 → 50	1.0 → 3.1	~18 → 56×
Cyrix 6x86, Cyrix MII	4…7	4…6	24…34	2	4	59…60	117…129	97…161	66 → 300	11…16 → 50…75	~320 → 1400×
AMD K6 (including K6 II/III)	2	2	21…41	2	3	21…41	?	?	166 → 550	83 → 275	~1500 → 5000×
Pentium / Pentium MMX	1…3	1…3	39	1 (0*)	1…4	70	17…173	19…134	60 → 300	20…60 → 100…300	~1100 → 5400×
Pentium Pro	1…3	2…5	16…56		1	28…68	?	?	150 → 200	30…75 → 40…100	~1400 → 1800×
Pentium II / III	1…3	2…5	17…38		1	27…50	?	?	233 → 1400	47…116 → 280…700	~2100 → 13000×
Athlon (K7)	1…4	1…4	13…24		1…2	16…35	?	?	500 → 2330	125…500 → 580…2330	~9000 → 42000×
Athlon 64 (K8)	1…4	1…4	13…24		1…2	16…35	?	?	1000 → 3200	250…1000 → 800…3200	~18000 → 58000×
Pentium 4	1…5	2…7	20…43	multiple cycles	1	20…43	?	?	1300 → 3800	186…650 → 543…1900	~11000 → 34000×

* An effective zero clock delay is often possible, via superscalar execution.

^§ The 5 MHz 8087 was the original x87 processor. Compared to typical software-implemented floating-point routines on an 8086 (without an 8087), the factors would be even larger, perhaps by another factor of 10 (i.e., a correct floating-point addition in assembly language may well consume over 1000 cycles).
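The peak FMUL column appears to follow from dividing the clock frequency by the FMUL cycle count, and the relative column from comparing against the best case of the 5 MHz 8087 (90 cycles). The sketch below reproduces a few table entries under that assumption; the function names are hypothetical, and the inputs are taken from the table above.

```python
def peak_millions_per_second(clock_mhz, cycles):
    """Peak operations per second, in millions, for a given cycle count."""
    return clock_mhz / cycles


def relative_to_8087(clock_mhz, cycles):
    """Speed-up over a 5 MHz 8087 running FMUL in its best case of 90 cycles."""
    baseline = peak_millions_per_second(5, 90)
    return peak_millions_per_second(clock_mhz, cycles) / baseline


# 8087 at 5 MHz: FMUL takes 90 to 145 cycles.
print(peak_millions_per_second(5, 145), peak_millions_per_second(5, 90))

# 80486 at 16 and 50 MHz: FMUL takes 16 cycles.
print(peak_millions_per_second(16, 16), peak_millions_per_second(50, 16))
print(relative_to_8087(16, 16), relative_to_8087(50, 16))
```

For the 80486 this gives 1.0 and 3.125 million multiplications per second, or roughly 18 and 56 times the 8087 baseline, in line with the corresponding table row.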

Manufacturers


Companies that have designed or manufactured^[a] floating-point units compatible with the Intel 8087 or later models include AMD (287, 387, 486DX, 5x86, K5, K6, K7, K8), Chips and Technologies (the Super MATH coprocessors), Cyrix (the FasMath, Cx87SLC, Cx87DLC, etc., 6x86, Cyrix MII), Fujitsu (early Pentium Mobile etc.), Harris Semiconductor (manufactured 80387 and 486DX processors), IBM (various 387 and 486 designs), IDT (the WinChip, C3, C7, Nano, etc.), IIT (the 2C87, 3C87, etc.), LC Technology (the Green MATH coprocessors), National Semiconductor (the Geode GX1, Geode GXm, etc.), NexGen (the Nx587), Rise Technology (the mP6), ST Microelectronics (manufactured 486DX, 5x86, etc.), Texas Instruments (manufactured 486DX processors etc.), Transmeta (the TM5600 and TM5800), ULSI (the Math·Co coprocessors), VIA (the C3, C7, and Nano, etc.), Weitek (the 1067, 1167, 3167 and 4167), and Xtend (the 83S87SX-25 and other coprocessors).

Architectural generations


8087


Main article: Intel 8087

The 8087 was the first math coprocessor for 16-bit processors designed by Intel. It was built to be paired with the Intel 8088 or 8086 microprocessors. (Intel's earlier 8231 and 8232 floating-point processors, marketed for use with the i8080 CPU, were in fact licensed versions of AMD's Am9511 and Am9512 FPUs from 1977 and 1979.^[6])

80C187


Although the original 1982 datasheets for the (NMOS-based) 80188 and 80186 seem to mention specific math coprocessors,^[7] both chips were actually paired with an 8087.

However, in 1987, to work with the refreshed CMOS-based Intel 80C186 CPU, Intel introduced the 80C187^[8] math coprocessor. The 80C187's interface to the main processor is the same as that of the 8087, but its core is essentially that of an 80387SX. It is thus fully IEEE 754-compliant and capable of executing all the 80387's extra instructions.^[9]

80287


The 80287 (i287) is the math coprocessor for the Intel 80286 series of microprocessors. Intel's models included variants with specified upper frequency limits ranging from 6 to 12 MHz. The NMOS versions were available at 6, 8 and 10 MHz.^[10] The 10 MHz Intel 80287-10 Numerics Coprocessor was available for USD $250 in quantities of 100.^[11] Boxed versions of the 80287, 80287-8, and 80287-10 were available for USD $212, $326, and $374 respectively, and a boxed version of the 80C287A was available for USD $457.^[12] Other 287 models with 387-like performance are the Intel 80C287, built using CHMOS III, and the AMD 80EC287, manufactured in AMD's CMOS process using only fully static gates.

Later followed the i80287XL, which combines the 387SX microarchitecture with a 287 pinout,^[13] and the i80287XLT, a special version intended for laptops, as well as other variants. The 80287XL contains an internal 3/2 multiplier, so that motherboards that ran the coprocessor at 2/3 of the CPU speed could instead run the FPU at the same speed as the CPU. Both the 80287XL and the 80287XLT offered 50% better performance, 83% less power consumption, and additional instructions.^[14]

The 80287 also works with the 80386 microprocessor and was initially the only coprocessor available for the 80386, until the introduction of the 80387 in 1987. However, the 80387 is strongly preferred for its higher performance and the greater capability of its instruction set.

Images: 6 MHz version of the Intel 80287; Intel 80287 die shot; Intel 80287XL; Intel 80287XLT.

80387


The 80387 (387 or i387) is the first Intel coprocessor to be fully compliant with the IEEE 754-1985 standard. Released in 1987,^[15] two years after the 386 chip, the i387 offers much improved speed over Intel's previous 8087/80287 coprocessors and improved characteristics of its trigonometric functions. It was made available for USD $500 in quantities of 100.^[16] Shortly afterwards, it was made available through Intel's Personal Computer Enhancement Operation for a retail market price of USD $795.^[17] The 25 MHz version was available in the retail channel for USD $1395.^[18] A 33 MHz version of the 387DX was also available, with a performance of 3.4 megawhetstones per second.^[20] Boxed versions of the 16-, 20-, 25-, and 33-MHz 387DX math coprocessor were available for USD $570, $647, $814, and $994 respectively.^[21]

The Intel M387 math coprocessor met the MIL-STD-883 Rev. C standard. Its testing included temperature cycling between −55 and 125 °C, hermetic sealing and extended burn-in. This military version operates at 16 MHz and was available in 68-lead PGA and quad flatpack packages; the PGA version cost USD $1155 in quantities of 100.^[19]

The 8087 and 80287's FPTAN and FPATAN instructions are limited to an argument in the range ±π/4 (±45°), and the 8087 and 80287 have no direct instructions for the SIN and COS functions.^[22]^{[full citation needed]}

Without a coprocessor, the 386 normally performs floating-point arithmetic through (relatively slow) software routines, implemented at runtime through a software exception handler. When a math coprocessor is paired with the 386, the coprocessor performs the floating-point arithmetic in hardware, returning results much faster than an (emulating) software library call.

The i387 is compatible only with the standard i386 chip, which has a 32-bit processor bus. The later cost-reduced i386SX, which has a narrower 16-bit data bus, cannot interface with the i387's 32-bit bus. The i386SX requires its own coprocessor, the 80387SX, which is compatible with the SX's narrower 16-bit data bus. Intel also released a low-power version of the 387SX coprocessor.^[20]

In addition, to pair with the i386SL used in laptops, Intel released the i387SL (N80387SL).^[23] Marketed as the "Intel387 SL Mobile Math CoProcessor", it included power-management features that allowed it to run without significantly reducing battery life. There are two battery-saving power-down features. The first stops the coprocessor's clock when the CPU goes into "stop clock" mode; the 387SL consumes about 25 microamperes when its clock is stopped. The second operates automatically while the CPU is running, putting the 387SL into "idle mode" when it is not executing an instruction. When active, the 387SL typically consumes 30 percent less battery power (about 100 mA) than the 387SX. In idle mode, it consumes 4 mA, a 96 percent power reduction compared to the active mode. It works in the range of 16 to 25 MHz and does not require BIOS or hardware reconfiguration.^[24] It was initially available for USD $189.^[25]

Images: i387; i387SX; i387DX; i387 microarchitecture with 16-bit barrel shifter and CORDIC unit; i386DX with i387DX; socket for the 80387.

80487


The i487SX (P23N) was marketed as a floating-point unit coprocessor for Intel i486SX machines. It actually contained a full-blown i486DX implementation. When installed into an i486SX system, the i487 disabled the main CPU and took over all CPU operations. The i487 took measures to detect the presence of an i486SX and would not function without the original CPU in place.^[26]^[27]^{[failed verification]}

80587


The Nx587 was the last FPU for x86 to be manufactured separately from the CPU, in this case NexGen's Nx586.

Notes


[a] Fabless companies design a chip and rely on a fabbed company to manufacture it, while fabbed companies can do both the design and the manufacture by themselves.

References


[1] William Kahan (2 November 1990). "On the advantages of 8087's stack" (PDF). Unpublished course notes, Computer Science Division, University of California at Berkeley. Archived from the original (PDF) on 18 January 2017.
[2] William Kahan (8 July 1989). "How Intel 8087 stack overflow/underflow should have been handled" (PDF). Archived from the original (PDF) on 12 June 2013.
[3] Jack Woehr (1 November 1997). "A conversation with William Kahan".
[4] David Monniaux (May 2008). "The pitfalls of verifying floating-point computations". ACM Transactions on Programming Languages and Systems. 30 (3): 1–41. arXiv:cs/0701192. doi:10.1145/1353445.1353446. S2CID 218578808.
[5] Numbers are taken from respective processors' data sheets, programming manuals, and optimization manuals.
[6] "Arithmetic Processors: Then and Now". www.cpushack.com. 23 September 2010. Retrieved 3 May 2023.
[7] Intel (1983). Intel Microprocessor & Peripherals Handbook. pp. 3-25 (iAPX 186/20) and 3-106 (iAPX 188/20).
[8] "CPU Collection – Model 80187". cpu-info.com. Archived from the original on 23 July 2011. Retrieved 14 April 2018.
[9] "80C187 80-BIT MATH COPROCESSOR" (PDF). November 1992. Retrieved 3 May 2023.
[10] Yoshida, Stacy, "Math Coprocessors: Keeping Your Computer Up for the Count", Intel Corporation, Microcomputer Solutions, September/October 1990, page 16.
[11] Intel Corporation, "New Product Focus Component: A 32-Bit Microprocessor With A Little Help From Some Friends", Special 32-Bit Issue Solutions, November/December 1985, page 13.
[12] Intel Corporation, "Personal Computer Enhancement", Personal Computer Enhancement Operation, Order No. 245.2, 10-89/75K/AL/GO, October 1989, page 4.
[13] Intel Corporation, "New Product Focus: Systems: SnapIn 386 Module Upgrades PS/2 PCs", Microcomputer Solutions, September/October 1991, page 12.
[14] Yoshida, Stacy, "Math Coprocessors: Keeping Your Computer Up for the Count", Intel Corporation, Microcomputer Solutions, September/October 1990, page 16.
[15] Moran, Tom (1987-02-16). "Chips to Improve Performance Of 386 Machines, Intel Says". InfoWorld. Vol. 9, no. 7. p. 5. ISSN 0199-6649.
[16] "New Product Focus Components: The 32-Bit Computing Engine Full Speed Ahead". Solutions. Intel Corporation: 10. May–June 1987.
[17] "NewsBit: Intel 80387 Available Through Retail Channels". Solutions. Intel Corporation: 1. July–August 1987.
[18] Intel Corporation, "NewsBits: 25 MHZ 80387 Available Through Retail Channels", Microcomputer Solutions, September/October 1988, page 1.
[19] Intel Corporation, "Focus: Components: Militarized Peripherals Support M386 Microprocessor", Microcomputer Solutions, March/April 1989, page 12.
[20] Lewnes, Ann, "The Intel386 Architecture Here to Stay", Intel Corporation, Microcomputer Solutions, July/August 1989, page 2.
[21] Intel Corporation, "Personal Computer Enhancement", Personal Computer Enhancement Operation, Order No. 245.2, 10-89/75K/AL/GO, October 1989.
[22] Borland Turbo Assembler documentation.
[23] "Intel N80387SL". www.cpu-world.com. Retrieved 4 December 2024.
[24] "Intel 387 SL Math Coprocessor". PC World. Vol. 10, no. 7. July 1992. p. 72.
[25] Intel Corporation, "New Product Focus: End-User: Math Coprocessor Brings Desktop Performance To Portables", Microcomputer Solutions, May/June 1992, pages 16–17.
[26] Intel 487SX at the Free On-line Dictionary of Computing.
[27] "Intel 80487". www.cpu-world.com. Retrieved 9 June 2021.

Intel 64 and IA-32 Architectures Software Developer's Manual Volume 1: Basic Architecture (PDF). Intel.

External links


Everything you always wanted to know about math coprocessors
