Subnormal number

From Wikipedia, the free encyclopedia

(Redirected from Denormal number)

Denormalized floating-point numbers near zero

An unaugmented floating-point system would contain only normalized numbers (indicated in red). Allowing denormalized numbers (blue) extends the system's range.

In computer science, subnormal numbers are the subset of denormalized numbers (sometimes called denormals) that fill the underflow gap around zero in floating-point arithmetic. Any non-zero number with magnitude smaller than the smallest positive normal number is subnormal, while denormal can also refer to numbers outside that range.

Terminology

In some older documents (especially standards documents such as the initial releases of IEEE 754 and the C language), "denormal" is used to refer exclusively to subnormal numbers. This usage persists in various standards documents, especially when discussing hardware that is incapable of representing any other denormalized numbers, but the discussion here uses the term "subnormal" in line with the 2008 revision of IEEE 754. In casual discussions the terms subnormal and denormal are often used interchangeably, in part because there are no denormalized IEEE binary numbers outside the subnormal range.

The term "number" is used rather loosely, to describe a particular sequence of digits, rather than a mathematical abstraction; see Floating-point arithmetic for details of how real numbers relate to floating-point representations. "Representation" rather than "number" may be used when clarity is required.

Definition

Mathematical real numbers may be approximated by multiple floating-point representations. One representation is defined as normal, and others are defined as subnormal, denormal, or unnormal by their relationship to normal.

In a normal floating-point value, there are no leading zeros in the significand (also commonly called mantissa); rather, leading zeros are removed by adjusting the exponent (for example, the number 0.0123 would be written as 1.23×10⁻²). Conversely, a denormalized floating-point value has a significand with a leading digit of zero. Of these, the subnormal numbers represent values which if normalized would have exponents below the smallest representable exponent (the exponent having a limited range).

The significand (or mantissa) of an IEEE floating-point number is the part of a floating-point number that represents the significant digits. For a positive normalised number, it can be represented as m₀.m₁m₂m₃...mₚ₋₂mₚ₋₁ (where m represents a significant digit, and p is the precision) with non-zero m₀. Notice that for a binary radix, the leading binary digit is always 1. In a subnormal number, since the exponent is the least that it can be, zero is the leading significant digit (0.m₁m₂m₃...mₚ₋₂mₚ₋₁), allowing the representation of numbers closer to zero than the smallest normal number. In the IEEE binary formats, a non-zero floating-point number can be recognized as subnormal by its exponent field having the least possible value.

By filling the underflow gap like this, significant digits are lost, but not as abruptly as when using the flush to zero on underflow approach (discarding all significant digits when underflow is reached). Hence the production of a subnormal number is sometimes called gradual underflow because it allows a calculation to lose precision slowly when the result is small.
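As a rough illustration of gradual underflow, the following C sketch repeatedly halves a value and counts the steps until the result becomes exactly zero. The helper name `halvings_until_zero` is made up for this example, and the sketch assumes that `float` is IEEE 754 binary32, that rounding is to nearest, and that no flush-to-zero mode is active.

```c
#include <float.h>

/* Hypothetical helper: counts how many times x must be halved before it
   becomes exactly zero. Assumes x is finite and non-negative. */
int halvings_until_zero(float x) {
    volatile float v = x;  /* volatile: force rounding to binary32 at each step */
    int steps = 0;
    while (v != 0.0f) {
        v = v * 0.5f;      /* each halving below FLT_MIN drops one significant bit */
        steps++;
    }
    return steps;
}
```

Under these assumptions, `halvings_until_zero(FLT_MIN)` passes through 23 non-zero subnormal values and reaches zero only on the 24th halving. With flush to zero on underflow, the first halving would already produce zero.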

In IEEE 754-2008, denormal numbers are renamed subnormal numbers and are supported in both binary and decimal formats. In binary interchange formats, subnormal numbers are encoded with a biased exponent of 0, but are interpreted with the value of the smallest allowed exponent, which is one greater (i.e., as if it were encoded as a 1). In decimal interchange formats they require no special encoding because the format supports unnormalized numbers directly.
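The binary encoding rule can be sketched in C for the binary32 format. The type `binary32_fields` and the functions `split_binary32` and `subnormal_value` are names invented for this illustration, and the code assumes that `float` is IEEE 754 binary32 with the same byte order as `uint32_t`. For a subnormal, the exponent field is 0 but the value uses the exponent of field value 1 (1 − 127 = −126) with a leading digit of 0, so the magnitude is fraction × 2⁻²³ × 2⁻¹²⁶ = fraction × 2⁻¹⁴⁹.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical container for the three binary32 fields. */
typedef struct {
    uint32_t sign;      /* 1 bit */
    uint32_t exponent;  /* 8-bit biased exponent field */
    uint32_t fraction;  /* 23-bit trailing significand field */
} binary32_fields;

/* Splits a float into its raw fields (assumes float is IEEE 754 binary32). */
binary32_fields split_binary32(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    binary32_fields f = { bits >> 31, (bits >> 23) & 0xFFu, bits & 0x7FFFFFu };
    return f;
}

/* Rebuilds the value of a subnormal encoding (exponent field == 0):
   leading digit 0, exponent -126, so magnitude = fraction * 2^-149. */
float subnormal_value(binary32_fields f) {
    float magnitude = (float)f.fraction * 0x1p-149f;
    return f.sign ? -magnitude : magnitude;
}
```

For example, 1.5 × 2⁻¹³⁰ splits into an exponent field of 0 and a fraction of 786432 (3 × 2¹⁸), and `subnormal_value` reproduces the original value exactly.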

Mathematically speaking, the normalized floating-point numbers of a given sign are roughly logarithmically spaced, so a finite set of normal floats cannot include zero. The subnormal floats are a linearly spaced set of values that spans the gap between the negative and positive normal floats.
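The difference between the two spacings can be checked with a small C sketch. The helpers `next_up` and `gap_above` are hypothetical names for this example; they assume that `float` is IEEE 754 binary32 and that the argument is a non-negative finite value below the largest finite float (and not negative zero), so that adding 1 to the bit pattern yields the next larger float.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: next representable binary32 above x.
   Assumes 0 <= x < FLT_MAX and that x is not negative zero. */
float next_up(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits += 1;                      /* adjacent positive floats have adjacent bit patterns */
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Distance from x to the next larger float. */
float gap_above(float x) {
    return next_up(x) - x;
}
```

Under these assumptions the gap is 2⁻¹⁴⁹ everywhere in the subnormal range, while for normal values it doubles with each power of two: `gap_above(1.0f)` is 2⁻²³ and `gap_above(2.0f)` is 2⁻²².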

Background

Subnormal numbers provide the guarantee that addition and subtraction of floating-point numbers never underflow; two nearby floating-point numbers always have a representable non-zero difference. Without gradual underflow, the subtraction a − b can underflow and produce zero even though the values are not equal. This can, in turn, lead to division by zero errors that cannot occur when gradual underflow is used.^[1]
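A minimal C sketch of this guarantee follows. The function `difference_flushed` only emulates an abrupt-underflow system in software by replacing results smaller than `FLT_MIN` with zero; it is an assumption made for illustration, not how any particular hardware implements flushing. Both function names are hypothetical, and the code assumes IEEE 754 binary32 `float` with subnormals enabled.

```c
#include <float.h>

/* a - b, rounded to binary32. */
float difference(float a, float b) {
    volatile float d = a - b;   /* volatile: avoid excess intermediate precision */
    return d;
}

/* Software emulation of flush to zero on underflow (illustration only):
   any result smaller in magnitude than FLT_MIN is replaced by zero. */
float difference_flushed(float a, float b) {
    float d = difference(a, b);
    float magnitude = d < 0.0f ? -d : d;
    return magnitude < FLT_MIN ? 0.0f : d;
}
```

With a = 1.5 × 2⁻¹²⁶ and b = 2⁻¹²⁶, the two values are unequal and `difference(a, b)` is the subnormal 2⁻¹²⁷, whereas `difference_flushed(a, b)` is zero, so a later division by the difference would divide by zero.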

Subnormal numbers were implemented in the Intel 8087 while the IEEE 754 standard was being written. They were by far the most controversial feature in the K-C-S format proposal that was eventually adopted,^[2] but this implementation demonstrated that subnormal numbers could be supported in a practical implementation. Some implementations of floating-point units do not directly support subnormal numbers in hardware, but rather trap to some kind of software support. While this may be transparent to the user, it can result in calculations that produce or consume subnormal numbers being much slower than similar calculations on normal numbers.

IEEE

In IEEE binary floating-point formats, subnormals are represented by having a zero exponent field with a non-zero significand field.^[3]

No other denormalized numbers exist in the IEEE binary floating-point formats, but they do exist in some other formats, including the IEEE decimal floating-point formats.
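The field-based test for binary formats can be written in C in a few lines and compared against the standard C99 `fpclassify` macro from `<math.h>`. The function names `is_subnormal_bits` and `is_subnormal_std` are invented for this sketch, which assumes that `float` is IEEE 754 binary32.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: zero exponent field and non-zero significand field. */
int is_subnormal_bits(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint32_t exponent = (bits >> 23) & 0xFFu;  /* 8-bit biased exponent */
    uint32_t fraction = bits & 0x7FFFFFu;      /* 23-bit significand field */
    return exponent == 0 && fraction != 0;
}

/* The same question asked through the standard library. */
int is_subnormal_std(float x) {
    return fpclassify(x) == FP_SUBNORMAL;
}
```

The non-zero significand condition is what separates subnormals from zero, which also has an exponent field of 0.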

Performance issues

Some systems handle subnormal values in hardware, in the same way as normal values. Others leave the handling of subnormal values to system software ("assist"), only handling normal values and zero in hardware. Handling subnormal values in software always leads to a significant decrease in performance. When subnormal values are entirely computed in hardware, implementation techniques exist to allow their processing at speeds comparable to normal numbers.^[4] However, the speed of computation remains significantly reduced on many modern x86 processors; in extreme cases, instructions involving subnormal operands may take as many as 100 additional clock cycles, causing the fastest instructions to run as much as six times slower.^[5]^[6]

This speed difference can be a security risk. Researchers showed that it provides a timing side channel that allows a malicious web site to extract page content from another site inside a web browser.^[7]

Some applications need to contain code to avoid subnormal numbers, either to maintain accuracy, or in order to avoid the performance penalty in some processors. For instance, in audio processing applications, subnormal values usually represent a signal so quiet that it is out of the human hearing range. Because of this, a common measure to avoid subnormals on processors where there would be a performance penalty is to cut the signal to zero once it reaches subnormal levels or mix in an extremely quiet noise signal.^[8] Other methods of preventing subnormal numbers include adding a DC offset, quantizing numbers, adding a Nyquist signal, etc.^[9] Since the SSE2 processor extension, Intel has provided such a functionality in CPU hardware, which rounds subnormal numbers to zero.^[10]
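The "cut the signal to zero" measure might look like the following C sketch, applied to the feedback state of a simple one-pole smoothing filter. The filter, the type `one_pole`, and the functions `snap_to_zero`, `one_pole_process`, and `run_silence` are all hypothetical and chosen only to show where the check goes; the threshold `FLT_MIN` assumes binary32 samples.

```c
#include <float.h>

/* Hypothetical one-pole smoothing filter: y[n] = x[n] + coeff * (y[n-1] - x[n]). */
typedef struct {
    float state;  /* previous output sample y[n-1] */
    float coeff;  /* feedback coefficient, 0 <= coeff < 1 */
} one_pole;

/* Replaces any value below the smallest normal magnitude with zero. */
static float snap_to_zero(float x) {
    return (x > -FLT_MIN && x < FLT_MIN) ? 0.0f : x;
}

/* Processes one sample and keeps the stored state free of subnormals. */
float one_pole_process(one_pole *f, float input) {
    float y = input + f->coeff * (f->state - input);
    f->state = snap_to_zero(y);
    return f->state;
}

/* Feeds n samples of silence and returns the final state. */
float run_silence(one_pole *f, int n) {
    for (int i = 0; i < n; i++) {
        one_pole_process(f, 0.0f);
    }
    return f->state;
}
```

Without the check, the decaying tail of such a filter would spend many samples in the subnormal range after the input goes silent; with it, the state drops to exactly zero as soon as it falls below `FLT_MIN`.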

Disabling subnormal floats at the code level

Intel SSE

Intel's C and Fortran compilers enable the DAZ (denormals-are-zero) and FTZ (flush-to-zero) flags for SSE by default for optimization levels higher than -O0.^[11] The effect of DAZ is to treat subnormal input arguments to floating-point operations as zero, and the effect of FTZ is to return zero instead of a subnormal float for operations that would result in a subnormal float, even if the input arguments are not themselves subnormal. clang and gcc have varying default states depending on platform and optimization level.

A non-C99-compliant method of enabling the DAZ and FTZ flags on targets supporting SSE is given below, but is not widely supported. It is known to work on Mac OS X since at least 2006.^[12]

    #include <fenv.h>
    #pragma STDC FENV_ACCESS ON
    // Sets DAZ and FTZ, clobbering other CSR settings.
    // See https://opensource.apple.com/source/Libm/Libm-287.1/Source/Intel/, fenv.c and fenv.h.
    fesetenv(FE_DFL_DISABLE_SSE_DENORMS_ENV);
    // fesetenv(FE_DFL_ENV) // Disable both, clobbering other CSR settings.

For other x86-SSE platforms where the C library has not yet implemented this flag, the following may work:^[13]

    #include <xmmintrin.h>
    _mm_setcsr(_mm_getcsr() | 0x0040);  // DAZ
    _mm_setcsr(_mm_getcsr() | 0x8000);  // FTZ
    _mm_setcsr(_mm_getcsr() | 0x8040);  // Both
    _mm_setcsr(_mm_getcsr() & ~0x8040); // Disable both

The _MM_SET_DENORMALS_ZERO_MODE and _MM_SET_FLUSH_ZERO_MODE macros wrap a more readable interface for the code above.^[14]

    // To enable DAZ
    #include <pmmintrin.h>
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);

    // To enable FTZ
    #include <xmmintrin.h>
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);

Most compilers already provide the previous macro by default; otherwise, the following definitions can be used (the definitions for FTZ are analogous):

    #define _MM_DENORMALS_ZERO_MASK   0x0040
    #define _MM_DENORMALS_ZERO_ON     0x0040
    #define _MM_DENORMALS_ZERO_OFF    0x0000
    #define _MM_SET_DENORMALS_ZERO_MODE(mode) _mm_setcsr((_mm_getcsr() & ~_MM_DENORMALS_ZERO_MASK) | (mode))
    #define _MM_GET_DENORMALS_ZERO_MODE()                (_mm_getcsr() &  _MM_DENORMALS_ZERO_MASK)

The default denormalization behavior is mandated by the ABI, and therefore well-behaved software should save and restore the denormalization mode before returning to the caller or calling code in other libraries.
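One possible shape for the save-and-restore pattern is sketched below in C, using the `_mm_getcsr` and `_mm_setcsr` intrinsics from `<xmmintrin.h>`. The wrapper `with_flush_to_zero`, the callback type `kernel_fn`, the sample kernel `half_sum`, and the helper `denormal_mode_bits` are all hypothetical names. The sketch assumes an x86-64 target; on other targets it simply calls the kernel without changing any mode.

```c
#if defined(__x86_64__) || defined(_M_X64)
#include <xmmintrin.h>
#define HAVE_MXCSR 1
#else
#define HAVE_MXCSR 0
#endif

#define MXCSR_FTZ_DAZ 0x8040u  /* FTZ (0x8000) and DAZ (0x0040) bits */

/* Hypothetical callback type for a numeric kernel. */
typedef float (*kernel_fn)(const float *data, int n);

/* Returns the current FTZ/DAZ bits (always 0 where MXCSR is unavailable). */
unsigned int denormal_mode_bits(void) {
#if HAVE_MXCSR
    return _mm_getcsr() & MXCSR_FTZ_DAZ;
#else
    return 0u;
#endif
}

/* Runs kernel with FTZ and DAZ enabled, then restores the caller's MXCSR. */
float with_flush_to_zero(kernel_fn kernel, const float *data, int n) {
#if HAVE_MXCSR
    unsigned int saved = _mm_getcsr();   /* save the caller's mode */
    _mm_setcsr(saved | MXCSR_FTZ_DAZ);   /* enable FTZ and DAZ */
    float result = kernel(data, n);
    _mm_setcsr(saved);                   /* restore before returning */
    return result;
#else
    return kernel(data, n);
#endif
}

/* Sample kernel: sums half of each element. */
float half_sum(const float *data, int n) {
    float total = 0.0f;
    for (int i = 0; i < n; i++) {
        total += data[i] * 0.5f;
    }
    return total;
}
```

Restoring the whole saved register, rather than only clearing the two bits, leaves the caller's mode unchanged even if the caller had already enabled one of the flags.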

ARM

AArch32 NEON (SIMD) FPU always uses a flush-to-zero mode [citation needed], which is the same as FTZ + DAZ. For the scalar FPU and in the AArch64 SIMD, the flush-to-zero behavior is optional and controlled by the FZ bit of the control register – FPSCR in Arm32 and FPCR in AArch64.^[15]

One way to set the FZ bit on AArch64 is:

    #include <stdint.h>

    #if defined(__arm64__) || defined(__aarch64__)
        uint64_t fpcr;
        asm volatile("mrs %0, fpcr" : "=r"(fpcr));               // Read the FPCR register
        asm volatile("msr fpcr, %0" :: "r"(fpcr | (1u << 24)));  // Set bit 24 (FZ) to 1
    #endif

Some ARM processors have hardware handling of subnormals.

References

[1] William Kahan. "IEEE 754R meeting minutes, 2002". Archived from the original on 15 October 2016. Retrieved 29 December 2013.
[2] "An Interview with the Old Man of Floating-Point". University of California, Berkeley.
[3] "Denormalized numbers". Caldera International. Retrieved 11 October 2023. (Note that the XenuOS documentation uses denormal where IEEE 754 uses subnormal.)
[4] Schwarz, E.M.; Schmookler, M.; Son Dao Trong (July 2005). "FPU Implementations with Denormalized Numbers" (PDF). IEEE Transactions on Computers. 54 (7): 825–836. doi:10.1109/TC.2005.118. S2CID 26470540.
[5] Dooley, Isaac; Kale, Laxmikant (12 September 2006). "Quantifying the Interference Caused by Subnormal Floating-Point Values" (PDF). Retrieved 30 November 2010.
[6] Fog, Agner. "Instruction tables: Lists of instruction latencies, throughputs and microoperation breakdowns for Intel, AMD and VIA CPUs" (PDF). Retrieved 25 January 2011.
[7] Andrysco, Marc; Kohlbrenner, David; Mowery, Keaton; Jhala, Ranjit; Lerner, Sorin; Shacham, Hovav. "On Subnormal Floating Point and Abnormal Timing" (PDF). Retrieved 5 October 2015.
[8] Serris, John (16 April 2002). "Pentium 4 denormalization: CPU spikes in audio applications". Archived from the original on 25 February 2012. Retrieved 29 April 2015.
[9] de Soras, Laurent (19 April 2005). "Denormal numbers in floating point signal processing applications" (PDF).
[10] Casey, Shawn (16 October 2008). "x87 and SSE Floating Point Assists in IA-32: Flush-To-Zero (FTZ) and Denormals-Are-Zero (DAZ)". Retrieved 3 September 2010.
[11] "Intel® MPI Library – Documentation". Intel.
[12] "Re: Macbook pro performance issue". Apple Inc. Archived from the original on 26 August 2016.
[13] "Re: Changing floating point state (Was: double vs float performance)". Apple Inc. Archived from the original on 15 January 2014. Retrieved 24 January 2013.
[14] "C++ Compiler for Linux* Systems User's Guide". Intel.
[15] "Aarch64 Registers". Arm.
