TO:   Whom it may concern
FROM: Thomas R. Nicely (current e-mail address)
RE:   Pentium FDIV flaw
DATE: 0900 GMT 19 August 2011

Freeware copyright (c) 2011 Thomas R. Nicely. Released into the public
domain by the author, who disclaims any legal liability arising from
its use.

Enumerated below are several questions that have frequently been posed
to me, regarding the discovery, nature, and implications of the
Pentium FDIV flaw. Each question is followed by my response.
Many of these questions were submitted by Dr. Denis Delbecq of the
Paris based computer periodical "Science et Vie Micro."

/*************************************************************/
Q1:  How can a user check a Pentium machine for the presence of the
     bug?
/**************************************************************/

Perform Coe's calculation (see Question 5 below). That is, carry
out the following division problem:

4195835.0/3145727.0 = 1.333 820 449 136 241 002 5  (Correct value)
4195835.0/3145727.0 = 1.333 739 068 902 037 589 4  (Flawed Pentium)

The consequence of the flaw can be made more glaring by performing
the following related calculation, which is the one employed in
the test code provided in pentbug.zip.

4195835.0 - 3145727.0*(4195835.0/3145727.0) = 0    (Correct value)
4195835.0 - 3145727.0*(4195835.0/3145727.0) = 256  (Flawed Pentium)

The calculation can be done in BASIC, in a spreadsheet (such as
Quattro Pro, Excel, or Microsoft Works), in the Microsoft Windows
calculator, or in some other programming language such as Pascal,
C, or Fortran.

Make sure that the FPU has not been disabled (this usually has to
be done intentionally through some specific action). GW-BASIC
and QBasic usually ignore the FPU. If you compile your code, turn
off all optimization.

I have provided a C source code, and corresponding DOS executable,
for the purpose of testing for the bug; see pentbug.zip.

/*************************************************************/
Q2:  Could you summarize how you discovered the problem?
     Were you doing research calculations or were you studying the
     problem of accuracy with computers?
/**************************************************************/

RESPONSE:  I was pursuing a research project in an area of pure
mathematics known as computational number theory. Specifically, I
have written a code which enumerates the primes, twin primes, prime
triplets, and prime quadruplets for all positive integers up to an
extremely large upper bound (currently 6.4*10^15). The totals are
written to a file at intervals of 10^10 (earlier 10^9). Also computed
are the sums of the reciprocals of the twin primes, the triplets, and
the quadruplets; each of these can be proved to converge to a limit,
but the limit of the sum of the reciprocals of the twin primes is
known imprecisely, and the others have not been previously
computed. Large gaps between consecutive primes have also been recorded.

Some of these results have been published in journal papers;
many others are being published and updated at my Web site,
http://www.trnicely.net,
where additional details and information are available.

The code is written so that the computation can be distributed
asynchronously over a large number of independent systems, with
the final results combined upon completion. The calculation has run
for more than 13 years simultaneously on a number of systems (varying
from a few to more than two dozen, mostly Pentiums but with a few
486s and 386s); the first Pentium was added in March, 1994. As of
September, 2006, calculations have been completed to 6.4*10^15;
the latest version of the code, ported to GNU C, has a throughput
(for intervals near 6.4*10^15) of approximately 50 million integers
per second on a 3.0 GHz Pentium 4 (800 MHz FSB, 533 MHz DDR memory)
running under GNU/Linux.

Simultaneously with the calculation of the unknown quantities, a
number of checks are maintained by calculating previously published
values (such as pi(x), the number of primes <= x).
As an additional check, the reciprocal sums are computed by two
different methods.

First, reciprocals are computed using the Intel x87 floating-point
co-processor unit (FPU, NPX), which provides 80-bit registers and a
64-bit significand, equivalent to 19S (19 significant decimal digits);
this is also referred to as extended precision or long double precision
(on other platforms, long double may represent a different accuracy,
e.g., 53 bits [16D] or 113 bits [33D]), or as an "extended real"
or "temporary real" data type.

Secondly, the reciprocals are
calculated to 53 (earlier 26) decimal places (53D) using scaled
ultraprecision integer arithmetic; this is accomplished using a
modification of the BIGINT code written and contributed to the
public domain by Arjen K. Lenstra (with additions by Mark Manasse,
Marc Ringuette, and Mark Riordan) circa 1988-1991. Lenstra's
C code represents very large integers by arrays of smaller (typically
32-bit) integers; it also retains some minor dependency on
floating-point arithmetic. Lenstra's code is a precursor of, and
alternative to, the GMP (GNU multiple precision) library; eventually I
may replace my calls to Lenstra's code with calls to GMP.

Calculations began in early 1993. On 13 June 1994, the outputs of several
runs were assembled, and I found that the computed value for pi(2*10^13)
disagreed with the published value. This led to a long search for
logic errors and sources of reduced precision in my source code (some
3000 lines in all). In the process, I found that the Borland C++ 4.02
compiler was producing erroneous code when compiled in 32-bit mode with
certain optimizations (-Op -Om -Og) enabled. For some time I
believed this to be the source of my woes.

After eliminating this source of error, and rewriting the
code to convert certain floating-point calculations from double
precision to long double precision, I put the revised code into use on
10 September.
To my dismay, I soon discovered (on 4 October) that I was
now encountering a new error, a discrepancy in the long double (sum of
the) floating-point reciprocals returned by the x87 FPU. The results for
the first trillion, as computed on the Pentium-60, differed from the
results obtained on a 486DX-33 by an amount orders of magnitude in excess
of that expected from rounding or truncation error accumulation
(the floating-point and ultraprecision sums also differed, but
by an amount less than the expected floating-point noise). Through
trial and error and finally a binary search, the discrepancy was
isolated to the twin-prime pair (824633702441, 824633702443), which
was producing incorrect floating-point reciprocals (the ultraprecision
reciprocals were also in error, by a lesser amount, evidently due to
the incidental dependency on floating-point arithmetic in Lenstra's
original integer arithmetic code).

My first conjecture was that the error was again an artifact of the
Borland compiler, but even completely disabling optimization failed
to eliminate the problem. Tracing the source of the error was
further complicated by the fact that on one occasion I tested the
code with the Pentium FPU locked out, and the error was still
present (this never happened again, and was apparently due to my
own failure to properly disable the FPU). The problem might also be
in the PCI bus on the Pentiums, rather than the CPU. After all, a
number of Pentium PCI systems had been reported in the trade press as
corrupting data due to faulty design of the interface with the PCI
bus (this was especially true of Intel motherboards using the
Neptune chipset).

The final pieces of the puzzle fell into place during the week of
16-22 October. On 17 October I gained access to a second Pentium,
which had a motherboard from a different manufacturer. The error
was present in this machine as well. During 17-19 October, I
reproduced the error in a code written in Power Basic, eliminating
the C compiler as a cause.
I reproduced the error in a Quattro Pro
spreadsheet, and also verified that the error disappeared when the
FPU was locked out in real-mode DOS (this is difficult to do in
Windows code or 32-bit code, which I was using for my main
application). On 21 October, I ran the test code on a 486DX2-66
with a PCI bus; when no error appeared, I felt that the PCI bus had
been eliminated as a cause. On 22 October, I tested the code on
still a third Pentium on display at Staples, a local office supply
store; this Packard-Bell machine also produced the error. I was
now certain that the error was in the FPU of the Pentium chip.

On or about 19 October, I contacted tech support at Micron, Inc.,
from whom I purchased my system, but they were unable to provide me
with any information regarding the problem. On 24 October, I
contacted Intel tech support. After six days, they still had no
answer to the problem, only an informal acknowledgement that the error
had been reproduced there on a 66-MHz Pentium. On 26 October, I mailed
a copy of the bug demonstration codes to Mr. Tim Wetzel at Micron tech
support (no reply was ever received). On 27 October, I provided a
colleague with a copy of the test code; her husband is an engineer in
the nuclear reactor group at the local firm of Babcock and Wilcox.
A. B. Copsey of Babcock and Wilcox Nuclear Technologies reported to me
on 28 October that their new P90 Gateway Pentiums all appeared to have
the bug (this was the first e-mail exchanged in regard to the bug).

In the absence of any meaningful response from Intel or Micron,
on 30 October I sent e-mail (see the file bugmail1.html)
to a number of individuals and organizations whom I felt would have access
to many other Pentium systems, and asked them to check for the problem.
I believe you are aware of events from that point on.

/**************************************************************/
Q3:  Can you reveal the parties to whom you addressed the initial
     e-mail inquiries of 30 October 1994?
     Were any parties informed of the problem prior to that date?  This
     knowledge might be useful in tracing the process by which the
     public became aware of the problem.
/***************************************************************/

Actually, I did not maintain a definitive list of these parties.
It did not seem of any importance at the time, and I merely chose
some individuals and organizations whom I felt would be likely to
have access to numerous Pentiums and other systems, so that they
might test for the error. Following below is the chronology as
best I can now reconstruct it.

March 1993      Approximate date of beginning of calculations.

March 1994      Pentium system is added to computational group.

13 June 1994    A discrepancy (incorrect count of primes < 2*10^13) is
                first noted in the research code results. Some
                of my department colleagues are made aware of this.
                This initial error was probably related to the FPU FPREMx
                instructions through the C fmod and fmodl functions.

June-Sept       A long process attempting to pinpoint and eliminate
                the error is carried out. Large parts of the code
                are rewritten; the Borland compiler bugs are accounted
                for; other sources of potential error are eliminated.

10 September    The revised code is put into production.

4 October       A new error is noticed:  the FPU values for the sum of
                the reciprocals of the twins, as computed on the Pentium-60
                and a 486DX-33, diverge within the first 10^12. After
                several days, the discrepancy is tracked down to
                the twin prime pair (824633702441, 824633702443), and it
                is noted that the elementary operation 1/824633702441 is
                returning an incorrect value from the FPU in C++.

17 October      The code is tested on a colleague's brand new Pentium,
                and the same error is noted.
                The error does not appear
                on 486s. The new Pentium has an Intel motherboard; mine
                has a Micronics motherboard.

18 October      The error is reproduced on the Pentium in Power Basic
                and Quattro Pro, thus is not language dependent. It
                disappears when the FPU is disabled.

19 October      Results of 18 October are confirmed on the new Pentium
                system. I inform tech support at Micron Computers of
                the problem, but they have no explanation.

21 October      A 486 system with a PCI bus does not show the error,
                eliminating the PCI bus as the source.

22 October      A third Pentium system displays the error---a Packard-Bell
                system on display at Staples' office supply. It is
                confirmed in the Microsoft Works spreadsheet.

24 October      I call Intel tech support and inform them of the problem.
                A response is promised within a few days. A colleague in
                Great Britain is informed of the problem by letter (he
                did not receive the letter for several days, and was
                apparently unable to gain access to a Pentium to check
                for the bug).

26 October      I mail floppy disks containing test codes for the bug to
                Micron tech support (Tim Wetzel). No response is ever
                received. Also about this time, my colleague informed
                Insight, Inc. tech support that his new Pentium had the
                problem (with no substantive response).

27 October      I give a floppy disk containing copies of the bug detection
                codes to a colleague whose husband works at Babcock &
                Wilcox Nuclear Technologies (Lynchburg, Virginia; later
                known as BWXT/Framatome).

28 October      A. B. Copsey of BWNT informs me by e-mail that their new
                Gateway P90s all show the bug, using my test codes.
                This
                is the first e-mail transmission on the subject.

30 October      With no substantive response to this point from either
                Micron tech support or Intel tech support, I dispatch my
                initial e-mail inquiry (see the file bugmail1.html)
                to several individuals and groups (at approximately
                3:20:49 pm EST). The following listing is approximate:

                         1 Andrew Schulman
                         2 Ralf Brown
                         3 David Maxey
                         4 Jim Kyle
                         5 Raymond J. Michaels
                         6 Tom Halfhill (Byte magazine)
                         7 Ziff-Davis Labs
                         8 Spencer Katt (PC Week)
                         9 <157.9301@mcimail.com>
                        10 Brett Glass (Infoworld)
                        11 John Dvorak (PC Magazine)
                        12 Robert X. Cringely (Infoworld)

                The first five are the authors of "Undocumented DOS,"
                2nd edition. I'm not sure who had address 9; research
                by Gideon Yuval indicates that it may have been the
                address for PC Magazine's "Letters to the Editor"
                section. Most of the above parties never responded (of
                the trade publications, only Tom Halfhill responded,
                saying he would refer the inquiry to Byte's labs).
                Robert X. Cringely apparently refused to use or
                acknowledge the information, on the rather curious
                grounds that my request for attribution constituted a
                copyright, and was also unprofessional. So far as I
                know, only Andrew Schulman made a real effort to
                investigate the problem; he forwarded the inquiry to
                Richard Smith at Phar Lap, Inc. Ralf Brown also sent a
                response.

31 October      An Intel engineer calls and asks that a diskette with
                copies of the bug detection codes be shipped to them
                Fed Ex overnight.
                The package is sent out at about
                6:30 pm EST.

1 November      Richard Smith of Phar Lap posts the original inquiry
                on the Canopus forum of Compuserve. For further details
                on the early propagation of the inquiry message on the
                Internet, see Smith's account at rsmith.html.

2 November      Richard Smith informs me that the bug has been detected
                on some of Phar Lap's Pentiums (others had apparently
                already received replacement chips some weeks earlier).

2 November      An Intel program manager calls to acknowledge receipt
                of the bug codes. He says my analysis is essentially
                correct, that Intel itself had noticed the problem during
                its own testing (I later learned that this apparently
                happened during testing of a similar FPU intended for
                the P6, in May or June of 1994), and that a new stepping
                with the problem fixed is out in sample quantities. He
                offers to ship two replacement chips (one for my system and
                one for my colleague's).

2 November      I receive an inquiry from Alexander Wolfe of Electronic
                Engineering Times regarding the flaw.

3 November      The two replacement chips arrive.

4 November      I install one of the replacement chips in my home Pentium.
                First tests indicate that the bug has been fixed.

7 November      Alexander Wolfe's article appears in Electronic Engineering
                Times. The matter is now fully public.

21 November     Steve Young, chief financial correspondent for CNN Cable
                News, is the first mainstream media journalist to break
                the story of the Pentium FDIV flaw and its implications
                for Intel.
                The story is then picked up by other national
                and international media.

30 November     Intel releases an in-house study of the flaw, "Statistical
                Analysis of Floating Point Flaw in the Pentium Processor
                (1994)," H. P. Sharangpani and M. L. Barton, Intel
                Corporation. This study minimizes the potential impact
                of the flaw on the vast majority of users, a conclusion
                with which I largely agree.

12 December     IBM releases its own study of the potential impact of
                the flaw, challenging Intel's analysis and concluding
                that the flaw will seriously impact the work of a large
                number of users both within and outside the scientific
                community. My own analysis is closer to Intel's position.

20 December     In response to a firestorm of public opinion, Intel
                announces plans for a total recall, replacement, and
                destruction of the flawed Pentium processors.

17 Jan 1995     Intel announces a pre-tax charge of 475 million dollars
                against earnings, ostensibly the total cost associated
                with replacement of the flawed processors.

/**************************************************************/
Q4:  In which fields of mathematics and numerical models could the
     FDIV roundoff error reduce significantly confidence in the
     results?  Many people talk about the formulas that demonstrate
     the problem.
/***************************************************************/

RESPONSE:  Clearly, computational number theory is one area
affected.
Other areas with the potential for major difficulties
include computations in chaos theory (non-linear dynamics), linear
programming or finite element analysis (where ill-conditioned
matrices may be involved), and areas requiring numerical solution
of differential equations by iterative methods (if high precision
is required in the extrapolated result, as in orbital dynamics).

Bear in mind, however, that the likelihood is 1000 to 1000000 times
greater that any erroneous results obtained on a Pentium are due to
software errors, rather than any error in the CPU. For the average
user, I do not believe the bug has a significant impact,
particularly in comparison to other sources of error.

However, for users in mathematics, science, and engineering, we
must each be our own judge as to the danger posed by the bug. In
any case, whether you are using the Pentium or some other CPU,
mission-critical applications and those which may affect the health
and welfare of others should be performed in duplicate, preferably
on systems with different CPUs, operating systems, and application
software.

/***************************************************************/
Q5:  What does this FDIV problem signify at the logical level of
     the FPU?  Does it occur with some specific mantissa schemes?
/***************************************************************/

RESPONSE:  The difficulty apparently arises from an error in the
lookup tables used to implement the hardware division algorithm;
five cells in a lookup table were accidentally left blank. The Pentium
apparently attempts to use a much more aggressive algorithm for
hardware floating-point division than did the 486; this is
indicated by the fact that it uses only about half as many clock
cycles per floating-point division. Evidently the 486 is
attempting to generate one bit of the quotient per iteration, while
the Pentium attempts to generate two bits per iteration.
According
to Coe and Intel, the critical denominators (those that might produce
a flawed division or remainder) are those that have bits 2 through 7
inclusive on in the mantissa (significand) of the 80-bit IEEE temporary
real representation (employed by Intel x87 numeric coprocessors); this
is borne out by my own experience. Thus problem denominators can be
identified by masking the most significant word of the denominator
mantissa. Only a small portion of even these mantissas produces an error.
The sign and exponent are irrelevant. The worst case error is the one
first discovered by Coe: 4195835.0/3145727.0 is returned correctly to
only 12 matching bits and 14 significant bits (the 5th decimal digit and
all beyond are in error; the flawed result is accurate to only five
significant digits; the difference in the two binary values is
zero through the 14 leading bits):

4195835.0/3145727.0 = 1.333 820 449 136 241 002 5  (Correct value)
4195835.0/3145727.0 = 1.333 739 068 902 037 589 4  (Flawed Pentium)

So far as is known, this is the worst-case error possible (in a simple
long double division x/y of floating-point numbers x and y) as a result
of the flaw.
Reports of the fourth decimal digit being in error are
simply variations of the above example (e.g., multiply the numerator
by 5; the results will now differ in the fourth significant digit,
and the fifth digit [fourth one to the right of the decimal point]
is no longer significant, but the relative error is still the same,
and the number of matching and significant bits is still the same).

Note that the FPU instructions FPREM and FPREM1 (floating-point
remainders, as called by fmod in C) are also subject to the bug.
In fact, it was probably one of these that caused my original
13 June error, rather than the FDIV instruction; all these
instructions rely on the same hardware divider unit.

A more detailed analysis of the flaw can be found in the papers
cited in the bibliography at the end of this document.

/****************************************************************/
Q6:  Do your calculations of the relative frequency of the error
     agree with those publicized by Intel?
/****************************************************************/

RESPONSE:  Yes, within an order of magnitude. Intel quotes an error
rate of about 1 in 8.77*10^9 random divisions. The exact frequency
depends on the type and precision of the operands; single-precision
reciprocals, for example, are always returned correctly.

Note, however, that many authorities consider statistical sampling
rates to be unrepresentative of the problem, since the values
appearing in a particular application may not constitute a random
sample of all possible mantissas. In particular, the analysis
publicized by IBM on 12 December 1994 claims that the numerical values
appearing in spreadsheets are heavily biased toward the bit
patterns subject to error, and that consequently the error occurs
thousands or millions of times more often in common usage than is
indicated by Intel's "White Paper" analysis.
I personally regard
Intel's analysis as more realistic, if a bit optimistic (as I stated
in my San Francisco Examiner article of 18 December 1994, I would be
surprised if the average user noticed any effect from the error within
the lifetime of the chip). Aside from Intel's analysis, there is
one compelling piece of empirical evidence to support the belief that
the error is not of consequence to most users: after over a year of
worldwide use of Pentium systems, not a single one of roughly a million
users had noticed the error. Thus either the error is inconsequential
for almost all users, or almost all users are extremely sloppy in their
work. Over a period of five years at my workplace, no person was ever
able to collect a reward offered for exhibiting (other than with a code
artificially contrived to demonstrate the error), on either of two
publicly available systems intentionally left with flawed CPUs
installed, an error caused by the flaw.

The actual number of different division problems (long double operand
pairs, excluding pairs which include or produce denormals) which produce
an erroneous result appears to be roughly 3*10^37, out of a total of
2.28*10^47 possible such pairs.

/****************************************************************/
Q7:  Do the replacement Pentium chips you received from Intel
     appear to eliminate the bug?
/****************************************************************/

RESPONSE:  Yes. I have tested the replacement chips with billions
of divisions and reciprocals involving the critical bit patterns,
and have observed zero errors. The critical cases, such as my
original example and Tim Coe's example, have also been tested
individually.

/***************************************************************/
Q8:  What about the so-called "workarounds" for the bug?
/***************************************************************/

RESPONSE:  The workaround finally recommended by Intel is to replace
each division operation by a function call.
The function checks the
divisor for the critical bit pattern; if it is not present, the result
of a normal division is returned; if the critical pattern is found,
the numerator and denominator are each multiplied by 15/16 before
the division is performed. The factor 15/16 was determined to shift
critical bit patterns to benign ones, while it does not shift any
benign bit patterns to critical ones. The replacement
function for long double division in C might look like the following.

long double ldQuotient(long double ldNumerator, long double
         ldDenominator)
{
unsigned short int ui, *uip;

/* View the 80-bit denominator as 16-bit words (x87 layout assumed). */
uip = (unsigned short int *)(&ldDenominator);
/* Word 3 is the most significant word of the mantissa. */
ui = *(uip + 3);
/* Critical pattern: all six bits selected by the mask are set. */
if ((ui & 0x07e0)==0x07e0)
        return(((15.0L/16.0L)*ldNumerator)/((15.0L/16.0L)*ldDenominator));
else
        return(ldNumerator/ldDenominator);
}

Variations are required for other precisions and for the remaindering
functions fmod and fmodl.

Of course, the workaround only succeeds in applications whose code
has been rewritten and/or recompiled, and reshipped since the bug
appeared. Updated versions of some compilers provide the developer with
the option of automatically trapping each division for the flaw (via a
compilation switch such as -fp). Previously existing binaries can avoid
the bug only by locking out the FPU (e.g., by setting 87=NO and NO87=NO87
in DOS, or by resetting the emulation bit in the machine status word of CR0
otherwise, as can be done using utilities which have been made
available by several companies, including Compaq). It is also
possible to trap the relevant instructions with a TSR or a VxD,
then check for and correct erroneous operations, but this apparently
slows the machine down almost as much as locking the FPU out.

The workaround slows the machine down slightly, perhaps 20 % (this
is application dependent).
Locking out the FPU may slow the
machine down by a factor of five or ten, depending on the
application; and some applications will not function without an
active coprocessor present.

/***************************************************************/
Q9:  Why do you think this particular bug has received an
     inordinate amount of publicity, making it such a public relations
     nightmare for Intel?
/***************************************************************/

I believe several factors contributed to this phenomenon.

*    Intel's initial failure to publicize the problem, even in a
     listing of errata to their OEMs and most valued customers, was
     in retrospect a mistake which alienated these constituencies.

*    Even more baffling, Intel failed to warn their tech support
     desk to immediately report any external complaint about the
     bug, so that it could be given special handling.

*    Intel's subsequent response, once the bug had been detected
     independently, was considered unsatisfactory by nearly
     everyone outside the company.

*    The Pentium CPU has been the subject of a high-profile
     advertising campaign by Intel.

*    In contrast to most previous errors found in CPUs, this one
     occurs in an elementary, frequently-used operation which is
     easy to demonstrate to the non-specialist, even those who have
     little or no computer training.

*    The bug was found late in the life cycle of the chip, after
     millions of them were already distributed or in production.

*    The existence of the Internet, and its current widespread
     availability, caused the news and the reaction to Intel's
     response to spread much more rapidly than for previous bugs.
     Unfortunately, many of the Internet discussions generated more
     heat than light.

*    One of Intel's principal competitors decided it was in their
     interest to publicize an estimate of the flaw's impact which
     I believe to be exaggerated, and in obvious disagreement
     with user experience prior to public knowledge of the flaw.

/***************************************************************/
Q10: Do you believe Intel's eventual total recall of the flawed
     chips on 20 December 1994 was appropriate?
/***************************************************************/

Certainly, it was appropriate from a public relations standpoint.
My own feeling is that a great many people overestimated the
importance, impact, and peril of the flaw; for example, I consider
IBM's analysis of 12 December 1994 to be a serious exaggeration of the
impact of the flaw. Intel's action, under tremendous pressure from
customers, establishes a new level of accountability in the industry.
If chip manufacturers such as Intel, IBM, and Motorola are now to be
expected to offer unconditional replacement of a chip each time a new
flaw is found, we may very well see prices and/or time to market greatly
increase. There may unfortunately be an even greater incentive for
the manufacturers to keep the discovery of flaws secret. We could even
see a two-tiered pricing system, with one price for chips "as is" and
a much higher one offering unlimited replacements.

My own feeling is that fewer than 10 % of all users needed to have any
real concern about the flaw, and probably fewer than 1 % would
actually be impacted by it.
Thus, in one sense, the recall is a waste
of resources at a time when society in general can ill afford such an
extravagance; it was simply not worth more than 100 million dollars to
correct this flaw (Intel announced on 17 January 1995 a pre-tax charge of
475 million dollars against earnings, ostensibly the total cost
associated with replacement of the flawed chips).

Even more distressing is Intel's decision to destroy the flawed chips,
rather than donating them (as is, without liability, not for sale or
resale) to educational and non-profit institutions. This is an even
worse instance of waste than Apple Computer's decision some years ago
to bury the last few thousand Lisas in a landfill.

/***************************************************************/
Q11: What lessons should be learned by the general public from this
     experience?
/***************************************************************/

I would hope that computers and computer analysis would lose some of
the aura of invincibility with which they have been treated. Computer
generated results need to be treated with some enlightened skepticism.
No system or microprocessor can be expected to produce results which
are absolutely reliable.

Computations which are mission critical, which might affect someone's
life or well being, should be carried out in two entirely different
ways, with the results checked against each other. While this still
will not guarantee absolute reliability, it would represent a major
advance. If two totally different platforms are not available, then
as much as possible of the calculations should be done in two or more
independent ways.
Do not assume that a single computational run of
anything is going to give correct results---check your work!

At the same time, we must be conscious that the chips are one of the least
likely sources of error; user input, application software, system
software, and other system hardware are much more likely to cause errors.
This is an even better reason for running check calculations. Few users
are aware that even electromagnetic or particle flux can cause errors.

Since the Pentium flaw affair, I have encountered machine errors on more
than fifty other occasions. These were due to defective memory chips,
soft memory errors, disk subsystem malfunctions, and possibly
operating system errors. In several of these instances, I had no reason
to be suspicious of the result except that a second machine produced a
different result.

Thomas R. Nicely
Bibliography
Statistical analysis of floating point flaw in the Pentium Processor (1994). H. P. Sharangpani and M. L. Barton, Intel Corporation (30 November 1994). This is Intel's "White Paper."
Inside the Pentium FDIV bug. Tim Coe. Dr. Dobb's Journal (April 1995) #229, pp. 129-135 and 148.
Computational aspects of the Pentium affair. Tim Coe, Terje Mathisen, Cleve Moler, and Vaughan Pratt. IEEE Computational Science & Engineering (ISSN 1070-9924, March 1995) Vol. 2, #1, pp. 18-31.
Higher-radix division using estimates of the divisor and partial remainder. Daniel E. Atkins. IEEE Transactions on Computers C-17:925-935 (1968).
A zipfile containing the C source code and corresponding DOS executable for a program which will check for the flaw. Thomas R. Nicely (26 April 2003).
Original e-mail message announcing the discovery of the Pentium division flaw. Thomas R. Nicely (30 October 1994).
An account of the spread of the Pentium flaw announcement across the Internet during the first few days. Richard M. Smith, President of Phar Lap Software, Inc. (27 December 1994).
Pentium study. IBM Research, IBM Corporation <ibmstudy@watson.ibm.com> (12 December 1994).
The Pentium division flaw. Thomas R. Nicely. Virginia Scientists Newsletter (April 1995) Vol. 1, p. 3.
Untitled newspaper article concerning the Pentium division flaw. Thomas R. Nicely. San Francisco Examiner, San Francisco CA USA (18 December 1994) p. B-5.