Half-precision floating-point format

From Wikipedia, the free encyclopedia
16-bit computer number format
Not to be confused with bfloat16, a different 16-bit floating-point format.

In computing, half precision (sometimes called FP16 or float16) is a binary floating-point computer number format that occupies 16 bits (two bytes in modern computers) in computer memory. It is intended for storage of floating-point values in applications where higher precision is not essential, in particular image processing and neural networks.

Almost all modern uses follow the IEEE 754-2008 standard, where the 16-bit base-2 format is referred to as binary16, and the exponent uses 5 bits. This can express values in the range ±65,504, with the minimum value above 1 being 1 + 1/1024.


History


Several earlier 16-bit floating-point formats have existed, including that of Hitachi's HD61810 DSP of 1982 (a 4-bit exponent and a 12-bit mantissa),[1] the top 16 bits of a 32-bit float (8 exponent and 7 mantissa bits) called bfloat16, Thomas J. Scott's WIF of 1991 (5 exponent bits, 10 mantissa bits),[2] and the 3dfx Voodoo Graphics processor of 1995 (same as Hitachi).[3]

ILM was searching for an image format that could handle a wide dynamic range, but without the hard drive and memory cost of single- or double-precision floating point.[4] The hardware-accelerated programmable shading group led by John Airey at SGI (Silicon Graphics) used the s10e5 data type in 1997 as part of the "bali" design effort. This is described in a SIGGRAPH 2000 paper[5] (see section 4.3) and further documented in US patent 7518615.[6] It was popularized by its use in the open-source OpenEXR image format.

Nvidia and Microsoft defined the half data type in the Cg language, released in early 2002, and implemented it in silicon in the GeForce FX, released in late 2002.[7] However, hardware support for accelerated 16-bit floating point was later dropped by Nvidia before being reintroduced in the Tegra X1 mobile GPU in 2015.

The F16C extension in 2012 allows x86 processors to convert half-precision floats to and from single-precision floats with a machine instruction.

IEEE 754 half-precision binary floating-point format: binary16


The IEEE 754 standard[8] specifies a binary16 as having the following format:

  • Sign bit: 1 bit
  • Exponent width: 5 bits
  • Significand precision: 11 bits (10 explicitly stored)

The format is assumed to have an implicit lead bit with value 1 unless the exponent field is stored with all zeros. Thus, only 10 bits of the significand appear in the memory format, but the total precision is 11 bits. In IEEE 754 parlance, there are 10 bits of significand, but there are 11 bits of significand precision (log₁₀(2¹¹) ≈ 3.311 decimal digits, or 4 digits ± slightly less than 5 units in the last place).
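
The decoding rules above (implicit leading bit, exponent bias of 15, special all-zeros and all-ones exponents) can be sketched as a small Python function. This is an illustrative helper, not code from any standard or library:

```python
def decode_binary16(bits: int) -> float:
    """Decode a 16-bit pattern (1 sign bit, 5 exponent bits, 10 significand
    bits) into a float, following the rules described above."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    if exponent == 0:
        # Exponent field all zeros: no implicit leading 1 (zero/subnormals).
        value = (fraction / 1024) * 2.0**-14
    elif exponent == 0x1F:
        # Exponent field all ones: infinity or NaN.
        value = float("nan") if fraction else float("inf")
    else:
        # Normal numbers carry an implicit leading 1 bit.
        value = (1 + fraction / 1024) * 2.0**(exponent - 15)
    return -value if sign else value

print(decode_binary16(0x3C00))  # 1.0
print(decode_binary16(0x3555))  # 0.333251953125, nearest half value to 1/3
```

Note that the stored significand is interpreted as a fraction over 1024 (2¹⁰), with the implicit leading bit contributing the "1 +" term for normal numbers only.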

Exponent encoding


The half-precision binary floating-point exponent is encoded using an offset-binary representation, with the zero offset being 15; this is also known as the exponent bias in the IEEE 754 standard.[8]

  • Emin = 00001₂ − 01111₂ = −14
  • Emax = 11110₂ − 01111₂ = 15
  • Exponent bias = 01111₂ = 15

Thus, as defined by the offset-binary representation, the offset of 15 must be subtracted from the stored exponent to obtain the true exponent.

The stored exponents 00000₂ and 11111₂ are interpreted specially.

Exponent             | Significand = zero | Significand ≠ zero      | Equation
00000₂               | zero, −0           | subnormal numbers       | (−1)^signbit × 2⁻¹⁴ × 0.significantbits₂
00001₂, ..., 11110₂  | normalized value   | normalized value        | (−1)^signbit × 2^(exponent−15) × 1.significantbits₂
11111₂               | ±infinity          | NaN (quiet, signaling)  |

The minimum strictly positive (subnormal) value is 2⁻²⁴ ≈ 5.96 × 10⁻⁸. The minimum positive normal value is 2⁻¹⁴ ≈ 6.10 × 10⁻⁵. The maximum representable value is (2 − 2⁻¹⁰) × 2¹⁵ = 65504.
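
As a quick check, these limits can be computed directly from the format parameters in Python (not part of the article); Python's struct module uses the format code 'e' for IEEE binary16:

```python
import struct

# The three limits above, derived from bias 15 and 10 stored significand bits.
min_subnormal = 2.0**-24               # smallest positive subnormal
min_normal = 2.0**-14                  # smallest positive normal
max_finite = (2 - 2.0**-10) * 2.0**15  # largest finite value

print(min_subnormal)  # ≈ 5.96e-08
print(min_normal)     # ≈ 6.10e-05
print(max_finite)     # 65504.0

# The maximum survives a round-trip through an actual binary16 encoding.
rt = struct.unpack('<e', struct.pack('<e', max_finite))[0]
print(rt)             # 65504.0
```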

Half precision examples


These examples are given in bit representation of the floating-point value. This includes the sign bit, (biased) exponent, and significand.

Binary              | Hex  | Value                                    | Notes
0 00000 0000000000  | 0000 | 0                                        |
0 00000 0000000001  | 0001 | 2⁻¹⁴ × (0 + 1/1024) ≈ 0.000000059604645  | smallest positive subnormal number
0 00000 1111111111  | 03ff | 2⁻¹⁴ × (0 + 1023/1024) ≈ 0.000060975552  | largest subnormal number
0 00001 0000000000  | 0400 | 2⁻¹⁴ × (1 + 0/1024) ≈ 0.00006103515625   | smallest positive normal number
0 01101 0101010101  | 3555 | 2⁻² × (1 + 341/1024) ≈ 0.33325195        | nearest value to 1/3
0 01110 1111111111  | 3bff | 2⁻¹ × (1 + 1023/1024) ≈ 0.99951172       | largest number less than one
0 01111 0000000000  | 3c00 | 2⁰ × (1 + 0/1024) = 1                    | one
0 01111 0000000001  | 3c01 | 2⁰ × (1 + 1/1024) ≈ 1.00097656           | smallest number larger than one
0 11110 1111111111  | 7bff | 2¹⁵ × (1 + 1023/1024) = 65504            | largest normal number
0 10000 1001001000  | 4248 | 2¹ × (1 + 584/1024) = 3.140625           | closest value to π
0 11111 0000000000  | 7c00 | infinity                                 |
1 00000 0000000000  | 8000 | −0                                       |
1 10000 0000000000  | c000 | (−1)¹ × 2¹ × (1 + 0/1024) = −2           |
1 11111 0000000000  | fc00 | −∞                                       | negative infinity

By default, 1/3 rounds down like for double precision, because of the odd number of bits in the significand. The bits beyond the rounding point are 0101..., which is less than 1/2 of a unit in the last place.
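
This rounding behavior can be observed directly (an illustrative check, not from the article) with Python's struct module, whose 'e' format code is IEEE binary16:

```python
import struct

# Round 1/3 to half precision by packing it as binary16 and unpacking.
third = struct.unpack('<e', struct.pack('<e', 1/3))[0]

print(third)        # 0.333251953125: the value rounded *down*
print(third < 1/3)  # True, confirming the round-down described above

# The little-endian bytes b'\x55\x35' correspond to the bit pattern 0x3555
# from the examples table.
print(struct.pack('<e', 1/3).hex())
```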

ARM alternative half-precision


ARM processors support (via a floating-point control register bit) an "alternative half-precision" format, which does away with the special case for an exponent value of 31 (11111₂).[9] It is almost identical to the IEEE format, but there is no encoding for infinity or NaNs; instead, an exponent of 31 encodes normalized numbers in the range 65536 to 131008.
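
The difference is only in how an exponent field of 31 is interpreted. A hypothetical decoder for the alternative format (an illustrative sketch, not an ARM API) treats that exponent like any other normal exponent:

```python
def decode_arm_alternative(bits: int) -> float:
    """Decode ARM alternative half precision: identical to IEEE binary16
    except exponent 31 encodes ordinary normalized numbers, so there are
    no infinities or NaNs."""
    sign = (bits >> 15) & 0x1
    exponent = (bits >> 10) & 0x1F
    fraction = bits & 0x3FF
    if exponent == 0:
        value = (fraction / 1024) * 2.0**-14       # zero / subnormals
    else:
        value = (1 + fraction / 1024) * 2.0**(exponent - 15)  # all normals
    return -value if sign else value

print(decode_arm_alternative(0x7C00))  # 65536.0 (IEEE binary16: +infinity)
print(decode_arm_alternative(0x7FFF))  # 131008.0, the format's maximum
```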

Uses of half precision


Half precision is used in several computer graphics environments to store pixels, including MATLAB, OpenEXR, JPEG XR, GIMP, OpenGL, Vulkan,[10] Cg, Direct3D, and D3DX. The advantage over 8-bit or 16-bit integers is that the increased dynamic range allows for more detail to be preserved in highlights and shadows for images, and avoids gamma correction. The advantage over 32-bit single-precision floating point is that it requires half the storage and bandwidth (at the expense of precision and range).[4]

Half precision can be useful for mesh quantization. Mesh data is usually stored using 32-bit single-precision floats for the vertices; however, in some situations it is acceptable to reduce the precision to only 16-bit half precision, requiring only half the storage at the expense of some precision. Mesh quantization can also be done with 8-bit or 16-bit fixed precision depending on the requirements.[11]
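
A minimal sketch of this storage trade-off (with made-up vertex data, using Python's binary16 struct code 'e'):

```python
import struct

# Hypothetical vertex coordinates, normally stored as 32-bit floats.
vertices = [0.125, 1.5, -2.75, 3.140625]

# Quantize to half precision: 2 bytes per value instead of 4.
packed = struct.pack(f'<{len(vertices)}e', *vertices)
print(len(packed))  # 8 bytes instead of 16

# Values exactly representable in binary16 survive the round-trip;
# in general, each coordinate is rounded to the nearest half value.
restored = struct.unpack(f'<{len(vertices)}e', packed)
print(restored)
```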

Hardware and software for machine learning or neural networks tend to use half precision: such applications usually do a large amount of calculation, but don't require a high level of precision. Because hardware often does not support 16-bit half-precision floats, neural networks frequently use the bfloat16 format instead, which is the single-precision float format truncated to 16 bits.
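
"Truncated to 16 bits" means keeping the top 16 bits of the float32 encoding, which preserves the sign and full 8-bit exponent but only 7 significand bits. A small illustration (hypothetical helper names, truncation rather than the rounding real hardware typically performs):

```python
import struct

def to_bfloat16_bits(x: float) -> int:
    """Truncate a float32 to bfloat16 by keeping its top 16 bits."""
    f32_bits = struct.unpack('<I', struct.pack('<f', x))[0]
    return f32_bits >> 16

def from_bfloat16_bits(bits: int) -> float:
    """Reinflate bfloat16 bits to float32 by appending 16 zero bits."""
    return struct.unpack('<f', struct.pack('<I', bits << 16))[0]

print(hex(to_bfloat16_bits(1.0)))                  # 0x3f80
print(from_bfloat16_bits(to_bfloat16_bits(3.25)))  # 3.25, exactly representable
```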

If the hardware has instructions to compute half-precision math, it is often faster than single or double precision. If the system hasSIMD instructions that can handle multiple floating-point numbers within one instruction, half precision can be twice as fast by operating on twice as many numbers simultaneously.[12]

Support by programming languages


Zig provides support for half precision with its f16 type.[13]

.NET 5 introduced half-precision floating-point numbers with the System.Half standard library type.[14][15] As of 16 July 2025[update], no .NET language (C#, F#, Visual Basic, C++/CLI, or C++/CX) has literals (e.g. in C#, 1.0f has type System.Single and 1.0m has type System.Decimal) or a keyword for the type.[16][17][18]

Swift introduced half-precision floating-point numbers in Swift 5.3 with the Float16 type.[19]

OpenCL also supports half-precision floating-point numbers with the half data type, using the IEEE 754-2008 half-precision storage format.[20]

As of 16 July 2025[update], Rust is working on adding a new f16 type for IEEE half-precision 16-bit floats.[21]

Julia provides support for half-precision floating-point numbers with the Float16 type.[22]

C++23 introduced half precision with the std::float16_t type.[23] GCC already implements support for it.[24]

Hardware support


Several versions of theARM architecture have support for half precision.[25]

Support for conversions with half-precision floats in the x86 instruction set is specified in the F16C instruction set extension, first introduced in 2009 by AMD and fairly broadly adopted in AMD and Intel CPUs by 2012. This was further extended by the AVX-512 FP16 instruction set extension implemented in the Intel Sapphire Rapids processor.[26]

On RISC-V, the Zfh and Zfhmin extensions provide hardware support for 16-bit half-precision floats. The Zfhmin extension is a minimal alternative to Zfh.[27]

On Power ISA, VSX and the not-yet-approved SVP64 extension provide hardware support for 16-bit half-precision floats as of Power ISA v3.1B and later.[28][29]

Support for half precision on IBM Z is part of the neural-network-processing-assist facility that IBM introduced with Telum. IBM refers to half-precision floating-point data as NNP-Data-Type 1 (16-bit).


References

  1. ^ "hitachi :: dataBooks :: HD61810 Digital Signal Processor Users Manual". Archive.org. Retrieved 2017-07-14.
  2. ^ Scott, Thomas J. (March 1991). "Mathematics and computer science at odds over real numbers". Proceedings of the twenty-second SIGCSE technical symposium on Computer science education - SIGCSE '91. Vol. 23. pp. 130–139. doi:10.1145/107004.107029. ISBN 0897913779. S2CID 16648394.
  3. ^ "/home/usr/bk/glide/docs2.3.1/GLIDEPGM.DOC". Gamers.org. Retrieved 2017-07-14.
  4. ^ a b "OpenEXR". OpenEXR. Archived from the original on 2013-05-08. Retrieved 2017-07-14.
  5. ^ Mark S. Peercy; Marc Olano; John Airey; P. Jeffrey Ungar. "Interactive Multi-Pass Programmable Shading" (PDF). People.csail.mit.edu. Retrieved 2017-07-14.
  6. ^ "Patent US7518615 - Display system having floating point rasterization and floating point ... - Google Patents". Google.com. Retrieved 2017-07-14.
  7. ^ "vs_2_sw". Cg 3.1 Toolkit Documentation. Nvidia. Retrieved 17 August 2016.
  8. ^ a b IEEE Standard for Floating-Point Arithmetic. IEEE STD 754-2019 (Revision of IEEE 754-2008). July 2019. pp. 1–84. doi:10.1109/ieeestd.2019.8766229. ISBN 978-1-5044-5924-2.
  9. ^ "Half-precision floating-point number support". RealView Compilation Tools Compiler User Guide. 10 December 2010. Retrieved 2015-05-05.
  10. ^ Garrard, Andrew. "10.1. 16-bit floating-point numbers". Khronos Data Format Specification v1.2 rev 1. Khronos. Retrieved 2023-08-05.
  11. ^ "KHR_mesh_quantization". GitHub. Khronos Group. Retrieved 2023-07-02.
  12. ^ Ho, Nhut-Minh; Wong, Weng-Fai (September 1, 2017). "Exploiting half precision arithmetic in Nvidia GPUs" (PDF). Department of Computer Science, National University of Singapore. Retrieved July 13, 2020. "Nvidia recently introduced native half precision floating point support (FP16) into their Pascal GPUs. This was mainly motivated by the possibility that this will speed up data intensive and error tolerant applications in GPUs."
  13. ^ "Floats". ziglang.org. Retrieved 7 January 2024.
  14. ^ "Half Struct (System)". learn.microsoft.com. Retrieved 2024-02-01.
  15. ^ Govindarajan, Prashanth (2020-08-31). "Introducing the Half type!". .NET Blog. Retrieved 2024-02-01.
  16. ^ "Floating-point numeric types — C# reference". learn.microsoft.com. 2022-09-29. Retrieved 2024-02-01.
  17. ^ "Literals — F# language reference". learn.microsoft.com. 2022-06-15. Retrieved 2024-02-01.
  18. ^ "Data Type Summary — Visual Basic language reference". learn.microsoft.com. 2021-09-15. Retrieved 2024-02-01.
  19. ^ "swift-evolution/proposals/0277-float16.md at main · apple/swift-evolution". github.com. Retrieved 13 May 2024.
  20. ^ "cl_khr_fp16 extension". registry.khronos.org. Retrieved 31 May 2024.
  21. ^ Cross, Travis. "Tracking Issue for f16 and f128 float types". GitHub. Retrieved 2024-07-05.
  22. ^ "Integers and Floating-Point Numbers · The Julia Language". docs.julialang.org. Retrieved 2024-07-11.
  23. ^ "P1467R9: Extended floating-point types and standard names". www.open-std.org. Retrieved 2024-10-18.
  24. ^ "106652 – [C++23] P1467 - Extended floating-point types and standard names". gcc.gnu.org. Retrieved 2024-10-18.
  25. ^ "Half-precision floating-point number format". ARM Compiler armclang Reference Guide Version 6.7. ARM Developer. Retrieved 13 May 2022.
  26. ^ Towner, Daniel. "Intel® Advanced Vector Extensions 512 - FP16 Instruction Set for Intel® Xeon® Processor Based Products" (PDF). Intel® Builders Programs. Retrieved 13 May 2022.
  27. ^ "RISC-V Instruction Set Manual, Volume I: RISC-V User-Level ISA". Five EmbedDev. Retrieved 2023-07-02.
  28. ^ "OPF_PowerISA_v3.1B.pdf". OpenPOWER Files. OpenPOWER Foundation. Retrieved 2023-07-02.
  29. ^ "ls005.xlen.mdwn". libre-soc.org Git. Retrieved 2023-07-02.


