Minifloat

From Wikipedia, the free encyclopedia
Floating-point values encoded with very few bits

In computing, minifloats are floating-point values represented with very few bits. This reduced precision makes them ill-suited for general-purpose numerical calculations, but they are useful for special purposes such as:

  • Computer graphics, where human perception of color and light levels has low precision.[1] The 16-bit half-precision format is very popular.
  • Machine learning, which can be relatively insensitive to numeric precision. 16-bit, 8-bit, and even 4-bit floats are increasingly being used.[2]

Additionally, they are frequently encountered as a pedagogical tool in computer-science courses to demonstrate the properties and structures of floating-point arithmetic and IEEE 754 numbers.

Depending on context, minifloat may mean any size less than 32 bits, any size less than or equal to 16 bits, or any size less than 16 bits. The term microfloat may mean any size less than or equal to 8 bits.[3]

Notation


This page uses the notation (S.E.M) to describe a minifloat:

  • S is the length of the sign field (0 or 1).
  • E is the length of the exponent field.
  • M is the length of the mantissa (significand) field.

Minifloats can be designed following the principles of the IEEE 754 standard. Almost all use the smallest exponent for subnormal and normal numbers. Many use the largest exponent for infinity and NaN, indicated by (special exponent) S_E = 1. Some minifloats use this exponent value normally, in which case S_E = 0.

The exponent bias is B = 2^(E−1) − S_E. This value ensures that all representable numbers have a representable reciprocal.

The notation can be converted to a (B, P, L, U) format as (2, M + 1, S_E − 2^(E−1) + 1, 2^(E−1) − 1).

A common notation used in the field of machine learning is FPn EeMm, where the lowercase letters are replaced by numbers. For example, FP8 E4M3 is the same as (1.4.3).
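As a concrete illustration of the formulas above, here is a minimal Python sketch (illustrative, not from any cited source) that derives the bias, precision, and exponent range of an (S.E.M) format; S_E = 1 is assumed, as in the (1.4.3) example later in this article:

    def minifloat_params(E, M, SE=1):
        """Derive (B, P, L, U) for an (S.E.M) minifloat per the formulas above."""
        bias = 2 ** (E - 1) - SE       # B = 2^(E-1) - S_E
        precision = M + 1              # P: mantissa bits plus the implicit leading bit
        emin = SE - 2 ** (E - 1) + 1   # L: least exponent
        emax = 2 ** (E - 1) - 1        # U: greatest exponent
        return bias, precision, emin, emax

    print(minifloat_params(4, 3))      # FP8 E4M3 as (1.4.3): (7, 4, -6, 7)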

Usage


Many situations that call for floating-point numbers do not actually require much precision. This is typical for high-dynamic-range graphics and image processing. It is also typical for larger neural networks, a property that has been exploited since the 2020s to allow increasingly large language models to be trained and deployed. A more general-purpose example is fp16 (1.5.10) in IEEE 754-2008, called "half-precision" (as opposed to 32-bit single and 64-bit double precision).

The bfloat16 (1.8.7) format consists of the first 16 bits of a standard single-precision number and was often used in image processing and machine learning before hardware support was added for other formats.

Graphics


The Radeon R300 and R420 GPUs used an "fp24" floating-point format (1.7.16).[4] "Full Precision" in Direct3D 9.0 is a proprietary 24-bit floating-point format. Microsoft's D3D9 (Shader Model 2.0) graphics API initially supported both FP24 (as in ATI's R300 chip) and FP32 (as in Nvidia's NV30 chip) as "Full Precision", as well as FP16 as "Partial Precision", for vertex and pixel shader calculations performed by the graphics hardware.

In 2016, Khronos defined 10-bit (0.5.5) and 11-bit (0.5.6) unsigned formats for use with Vulkan.[5][6] These can be converted from positive half-precision values by truncating the sign and trailing digits, as sketched below.
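Since binary16 uses a 1-bit sign, 5-bit exponent, and 10-bit mantissa, this truncation amounts to a simple shift. A minimal sketch, assuming the inputs are non-negative IEEE binary16 bit patterns (function names are illustrative):

    def half_to_unsigned11(h16: int) -> int:
        # keep the 5 exponent bits and the top 6 of the 10 mantissa bits
        return (h16 & 0x7FFF) >> 4

    def half_to_unsigned10(h16: int) -> int:
        # keep the 5 exponent bits and the top 5 of the 10 mantissa bits
        return (h16 & 0x7FFF) >> 5

    assert half_to_unsigned11(0x3C00) == 0b01111_000000   # 1.0 survives truncation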

Microcontroller


Minifloats are also commonly used in embedded devices such as microcontrollers, where floating-point arithmetic needs to be emulated in software. To speed up the computation, the mantissa typically occupies exactly half of the bits, so the register boundary automatically separates the parts without shifting (e.g., (1.3.4) on 4-bit devices).[citation needed]

Machine learning


In 2022, Nvidia and others announced support for the "fp8" format E5M2 (1.5.2). Values in this format can be converted from half-precision by truncating the trailing digits, and the format supports special values such as NaN and infinity. They also announced a format without infinity and with only two representations (positive and negative) for NaN, FP8 E4M3 (1.4.3): after all, special values are unnecessary in the inference (forward-running, as opposed to training via backpropagation) of neural networks.[2] These formats have been made into an industrial standard called OCP-FP8.[7] Further compression, such as FP4 E2M1 (1.2.1), has also proven fruitful.[8]
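Because E5M2 matches the top bits of binary16, the truncating conversion is just a byte shift. A minimal sketch (note that practical converters usually round to nearest even rather than truncate):

    def half_to_fp8_e5m2(h16: int) -> int:
        # FP8 E5M2 is the top byte of an IEEE binary16 bit pattern:
        # sign, 5 exponent bits, and the top 2 of the 10 mantissa bits
        return (h16 >> 8) & 0xFF

    assert half_to_fp8_e5m2(0x3C00) == 0x3C   # 1.0 in binary16 -> 1.0 in E5M2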

The FP4-E2M1 format
     | .000 | .001 | .010 | .011 | .100 | .101 | .110 | .111
0... |  0   |  0.5 |  1   |  1.5 |  2   |  3   |  4   |  6
1... | −0   | −0.5 | −1   | −1.5 | −2   | −3   | −4   | −6

Since 2023, IEEE SA Working Group P3109 has been working on a standard for minifloats optimized for machine learning by systematizing current practice. Interim Report version 3.0 (August 2025) defines a family of formats under the systematic name "binaryKpP[s/u][e/f]", where K is the total bit length, P is the number of mantissa bits, s/u (signed/unsigned) refers to whether a sign bit is present, and e/f (extended/finite) refers to whether infinity is included. By convention, s and e may be omitted. To save space for more numbers, there is no "negative zero", and there is only one representation for NaN; for signed formats, the NaN can thus use the bit pattern of what would have been negative zero. For example, the FP4-E2M1 format can be approximated as the following in P3109:[9]

The binary4p2sf format
     | .000 | .001 | .010 | .011 | .100 | .101 | .110 | .111
0... |  0   |  0.5 |  1   |  1.5 |  2   |  3   |  4   |  6
1... | NaN  | −0.5 | −1   | −1.5 | −2   | −3   | −4   | −6

(For binary4p2se, ±6 are replaced by ±Infinity.)
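A small illustrative decoder for the binary4p2sf table above (a sketch written for this table, not taken from the P3109 report) shows how the single NaN occupies the slot that a signed format would otherwise give to −0:

    def decode_binary4p2sf(code: int) -> float:
        if code == 0b1000:                 # the would-be -0 bit pattern encodes NaN
            return float("nan")
        sign = -1.0 if code & 0b1000 else 1.0
        exp = (code >> 1) & 0b11
        man = code & 0b1
        if exp == 0:                       # subnormal: 0.m * 2^0
            return sign * (man / 2)
        return sign * (1 + man / 2) * 2.0 ** (exp - 1)   # normal: 1.m * 2^(e-1)

    assert [decode_binary4p2sf(c) for c in range(8)] == [0, 0.5, 1, 1.5, 2, 3, 4, 6]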

A downside of very small minifloats is that they have very little representable dynamic range. To address this problem, the machine learning industry has invented "microscaling formats" (MX), a kind of block floating-point. In an MX format, a group of 32 minifloats shares an additional scaling factor represented by an "E8M0" minifloat (which is able to represent powers of 2 between 2^−127 and 2^127). MX has been defined for FP8-E5M2, FP8-E4M3, FP6-E3M2 (1.3.2), FP6-E2M3 (1.2.3), and FP4-E2M1.[10]
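The following simplified sketch illustrates the MX idea: a block of 32 values shares one power-of-two scale, and each element is quantized to a minifloat after scaling. The scale selection here is a plausible simplification and quantize_elem is a hypothetical stand-in; the OCP specification defines both precisely:

    import math

    def mx_block_encode(block, quantize_elem):
        """Encode 32 values as one shared power-of-two scale plus 32 minifloats."""
        assert len(block) == 32
        amax = max(abs(x) for x in block)
        # shared scale: a power of two, storable as an E8M0 scale element
        scale_exp = math.floor(math.log2(amax)) if amax > 0 else 0
        return scale_exp, [quantize_elem(x / 2.0 ** scale_exp) for x in block]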

Examples


8-bit (1.4.3)


A minifloat in 1 byte (8 bits) with 1 sign bit, 4 exponent bits, and 3 significand bits (1.4.3) is demonstrated here. The exponent bias is defined as 7, centering the values around 1 to match other IEEE 754 floats,[11][12] so (for most values) the actual multiplier for exponent x is 2^(x−7). All IEEE 754 principles should be valid.[13] This form is quite common for instruction.[citation needed]

Sign | Exponent | Significand
  0  |   0000   |    000

Zero is represented as a zero exponent with a zero mantissa. The zero exponent means zero is a subnormal number with a leading "0." prefix, and with a zero mantissa all bits after the binary point are zero, so this value is interpreted as 0.000_2 × 2^−6 = 0. Floating-point numbers use a signed zero, so −0 is also available and is equal to positive 0.

0 0000 000 = 0
1 0000 000 = −0

For the lowest exponent (all zeros), the significand is extended with "0." and the stored exponent is treated as if it were 1, the same exponent as the least normalized numbers:

0 0000 001 = 0.001_2 × 2^(1−7) = 0.125 × 2^−6 = 0.001953125 (least subnormal number)
...
0 0000 111 = 0.111_2 × 2^(1−7) = 0.875 × 2^−6 = 0.013671875 (greatest subnormal number)

For all other exponents, the significand is extended with "1.":

0 0001 000 = 1.000_2 × 2^(1−7) = 1 × 2^−6 = 0.015625 (least normalized number)
0 0001 001 = 1.001_2 × 2^(1−7) = 1.125 × 2^−6 = 0.017578125
...
0 0111 000 = 1.000_2 × 2^(7−7) = 1 × 2^0 = 1
0 0111 001 = 1.001_2 × 2^(7−7) = 1.125 × 2^0 = 1.125 (least value above 1)
...
0 1110 000 = 1.000_2 × 2^(14−7) = 1.000 × 2^7 = 128
0 1110 001 = 1.001_2 × 2^(14−7) = 1.125 × 2^7 = 144
...
0 1110 110 = 1.110_2 × 2^(14−7) = 1.750 × 2^7 = 224
0 1110 111 = 1.111_2 × 2^(14−7) = 1.875 × 2^7 = 240 (greatest normalized number)

Infinity values have the highest exponent, with the mantissa set to zero. The sign bit can be either positive or negative.

0 1111 000 = +infinity
1 1111 000 = −infinity

NaN values have the highest exponent, with the mantissa non-zero.

s 1111 mmm = NaN (if mmm ≠ 000)

This is a chart of all possible values for this example 8-bit float:

         | …000 | …001 | …010 | …011 | …100 | …101 | …110 | …111
0 0000 … | 0 | 0.001953125 | 0.00390625 | 0.005859375 | 0.0078125 | 0.009765625 | 0.01171875 | 0.013671875
0 0001 … | 0.015625 | 0.017578125 | 0.01953125 | 0.021484375 | 0.0234375 | 0.025390625 | 0.02734375 | 0.029296875
0 0010 … | 0.03125 | 0.03515625 | 0.0390625 | 0.04296875 | 0.046875 | 0.05078125 | 0.0546875 | 0.05859375
0 0011 … | 0.0625 | 0.0703125 | 0.078125 | 0.0859375 | 0.09375 | 0.1015625 | 0.109375 | 0.1171875
0 0100 … | 0.125 | 0.140625 | 0.15625 | 0.171875 | 0.1875 | 0.203125 | 0.21875 | 0.234375
0 0101 … | 0.25 | 0.28125 | 0.3125 | 0.34375 | 0.375 | 0.40625 | 0.4375 | 0.46875
0 0110 … | 0.5 | 0.5625 | 0.625 | 0.6875 | 0.75 | 0.8125 | 0.875 | 0.9375
0 0111 … | 1 | 1.125 | 1.25 | 1.375 | 1.5 | 1.625 | 1.75 | 1.875
0 1000 … | 2 | 2.25 | 2.5 | 2.75 | 3 | 3.25 | 3.5 | 3.75
0 1001 … | 4 | 4.5 | 5 | 5.5 | 6 | 6.5 | 7 | 7.5
0 1010 … | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15
0 1011 … | 16 | 18 | 20 | 22 | 24 | 26 | 28 | 30
0 1100 … | 32 | 36 | 40 | 44 | 48 | 52 | 56 | 60
0 1101 … | 64 | 72 | 80 | 88 | 96 | 104 | 112 | 120
0 1110 … | 128 | 144 | 160 | 176 | 192 | 208 | 224 | 240
0 1111 … | Inf | NaN | NaN | NaN | NaN | NaN | NaN | NaN
1 0000 … | −0 | −0.001953125 | −0.00390625 | −0.005859375 | −0.0078125 | −0.009765625 | −0.01171875 | −0.013671875
1 0001 … | −0.015625 | −0.017578125 | −0.01953125 | −0.021484375 | −0.0234375 | −0.025390625 | −0.02734375 | −0.029296875
1 0010 … | −0.03125 | −0.03515625 | −0.0390625 | −0.04296875 | −0.046875 | −0.05078125 | −0.0546875 | −0.05859375
1 0011 … | −0.0625 | −0.0703125 | −0.078125 | −0.0859375 | −0.09375 | −0.1015625 | −0.109375 | −0.1171875
1 0100 … | −0.125 | −0.140625 | −0.15625 | −0.171875 | −0.1875 | −0.203125 | −0.21875 | −0.234375
1 0101 … | −0.25 | −0.28125 | −0.3125 | −0.34375 | −0.375 | −0.40625 | −0.4375 | −0.46875
1 0110 … | −0.5 | −0.5625 | −0.625 | −0.6875 | −0.75 | −0.8125 | −0.875 | −0.9375
1 0111 … | −1 | −1.125 | −1.25 | −1.375 | −1.5 | −1.625 | −1.75 | −1.875
1 1000 … | −2 | −2.25 | −2.5 | −2.75 | −3 | −3.25 | −3.5 | −3.75
1 1001 … | −4 | −4.5 | −5 | −5.5 | −6 | −6.5 | −7 | −7.5
1 1010 … | −8 | −9 | −10 | −11 | −12 | −13 | −14 | −15
1 1011 … | −16 | −18 | −20 | −22 | −24 | −26 | −28 | −30
1 1100 … | −32 | −36 | −40 | −44 | −48 | −52 | −56 | −60
1 1101 … | −64 | −72 | −80 | −88 | −96 | −104 | −112 | −120
1 1110 … | −128 | −144 | −160 | −176 | −192 | −208 | −224 | −240
1 1111 … | −Inf | NaN | NaN | NaN | NaN | NaN | NaN | NaN

There are only 242 different non-NaN values (if +0 and −0 are regarded as different), because 14 of the bit patterns represent NaNs.

To convert to or from 8-bit floats in programming languages, libraries or functions are usually required, since this format is not standardized. For example, here is an example implementation in C++ (public domain).
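As a minimal sketch of such a conversion (written for this article's (1.4.3) layout, independent of the linked C++ code), a decoder needs only the three cases worked out above:

    def decode_143(byte: int) -> float:
        """Decode one (1.4.3) bit pattern with bias 7 to a Python float."""
        sign = -1.0 if byte & 0x80 else 1.0
        exp = (byte >> 3) & 0xF
        man = byte & 0x7
        if exp == 0:                  # subnormal: 0.mmm * 2^(1-7)
            return sign * (man / 8) * 2.0 ** -6
        if exp == 15:                 # special exponent: Inf or NaN
            return sign * float("inf") if man == 0 else float("nan")
        return sign * (1 + man / 8) * 2.0 ** (exp - 7)   # normal: 1.mmm * 2^(e-7)

    # Counting the non-NaN bit patterns reproduces the figure of 242 above.
    assert sum(1 for b in range(256) if decode_143(b) == decode_143(b)) == 242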

8-bit (1.4.3) with B = −2


At these small sizes, other bias values may be interesting; for instance, a bias of −2 makes the numbers 0–16 have the same bit representations as the integers 0–16, at the cost that no non-integer values can be represented.

0 0000 000 = 0.000_2 × 2^(1−(−2)) = 0.0 × 2^3 = 0 (subnormal number)
0 0000 001 = 0.001_2 × 2^(1−(−2)) = 0.125 × 2^3 = 1 (subnormal number)
0 0000 111 = 0.111_2 × 2^(1−(−2)) = 0.875 × 2^3 = 7 (subnormal number)
0 0001 000 = 1.000_2 × 2^(1−(−2)) = 1.000 × 2^3 = 8 (normalized number)
0 0001 111 = 1.111_2 × 2^(1−(−2)) = 1.875 × 2^3 = 15 (normalized number)
0 0010 000 = 1.000_2 × 2^(2−(−2)) = 1.000 × 2^4 = 16 (normalized number)
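A quick sketch verifying this property under the formulas above (the special top exponent is ignored, since it is not reached here):

    def decode_143_biased(byte: int, bias: int) -> float:
        exp = (byte >> 3) & 0xF
        man = byte & 0x7
        if exp == 0:                  # subnormal: 0.mmm * 2^(1-B)
            return (man / 8) * 2.0 ** (1 - bias)
        return (1 + man / 8) * 2.0 ** (exp - bias)   # normal: 1.mmm * 2^(e-B)

    # With B = -2, the bit patterns 0..16 decode to the integers 0..16.
    assert all(decode_143_biased(i, -2) == i for i in range(17))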

8-bit (1.3.4)


Any bit allocation is possible. A format could give more of the bits to the exponent for more dynamic range with less precision, or more of the bits to the significand for more precision with less dynamic range. At the extreme, it is possible to allocate all bits to the exponent (1.7.0), or all but one of the bits to the significand (1.1.6), leaving the exponent with only one bit. The exponent must be given at least one bit, or else it no longer makes sense as a float; it just becomes a signed number.

Here is a chart of all possible values for (1.3.4). M ≥ 2^(E−1) ensures that the precision remains at least 0.5 throughout the entire range.[14]

        | …0000 | …0001 | …0010 | …0011 | …0100 | …0101 | …0110 | …0111 | …1000 | …1001 | …1010 | …1011 | …1100 | …1101 | …1110 | …1111
0 000 … | 0 | 0.015625 | 0.03125 | 0.046875 | 0.0625 | 0.078125 | 0.09375 | 0.109375 | 0.125 | 0.140625 | 0.15625 | 0.171875 | 0.1875 | 0.203125 | 0.21875 | 0.234375
0 001 … | 0.25 | 0.265625 | 0.28125 | 0.296875 | 0.3125 | 0.328125 | 0.34375 | 0.359375 | 0.375 | 0.390625 | 0.40625 | 0.421875 | 0.4375 | 0.453125 | 0.46875 | 0.484375
0 010 … | 0.5 | 0.53125 | 0.5625 | 0.59375 | 0.625 | 0.65625 | 0.6875 | 0.71875 | 0.75 | 0.78125 | 0.8125 | 0.84375 | 0.875 | 0.90625 | 0.9375 | 0.96875
0 011 … | 1 | 1.0625 | 1.125 | 1.1875 | 1.25 | 1.3125 | 1.375 | 1.4375 | 1.5 | 1.5625 | 1.625 | 1.6875 | 1.75 | 1.8125 | 1.875 | 1.9375
0 100 … | 2 | 2.125 | 2.25 | 2.375 | 2.5 | 2.625 | 2.75 | 2.875 | 3 | 3.125 | 3.25 | 3.375 | 3.5 | 3.625 | 3.75 | 3.875
0 101 … | 4 | 4.25 | 4.5 | 4.75 | 5 | 5.25 | 5.5 | 5.75 | 6 | 6.25 | 6.5 | 6.75 | 7 | 7.25 | 7.5 | 7.75
0 110 … | 8 | 8.5 | 9 | 9.5 | 10 | 10.5 | 11 | 11.5 | 12 | 12.5 | 13 | 13.5 | 14 | 14.5 | 15 | 15.5
0 111 … | Inf | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN
1 000 … | −0 | −0.015625 | −0.03125 | −0.046875 | −0.0625 | −0.078125 | −0.09375 | −0.109375 | −0.125 | −0.140625 | −0.15625 | −0.171875 | −0.1875 | −0.203125 | −0.21875 | −0.234375
1 001 … | −0.25 | −0.265625 | −0.28125 | −0.296875 | −0.3125 | −0.328125 | −0.34375 | −0.359375 | −0.375 | −0.390625 | −0.40625 | −0.421875 | −0.4375 | −0.453125 | −0.46875 | −0.484375
1 010 … | −0.5 | −0.53125 | −0.5625 | −0.59375 | −0.625 | −0.65625 | −0.6875 | −0.71875 | −0.75 | −0.78125 | −0.8125 | −0.84375 | −0.875 | −0.90625 | −0.9375 | −0.96875
1 011 … | −1 | −1.0625 | −1.125 | −1.1875 | −1.25 | −1.3125 | −1.375 | −1.4375 | −1.5 | −1.5625 | −1.625 | −1.6875 | −1.75 | −1.8125 | −1.875 | −1.9375
1 100 … | −2 | −2.125 | −2.25 | −2.375 | −2.5 | −2.625 | −2.75 | −2.875 | −3 | −3.125 | −3.25 | −3.375 | −3.5 | −3.625 | −3.75 | −3.875
1 101 … | −4 | −4.25 | −4.5 | −4.75 | −5 | −5.25 | −5.5 | −5.75 | −6 | −6.25 | −6.5 | −6.75 | −7 | −7.25 | −7.5 | −7.75
1 110 … | −8 | −8.5 | −9 | −9.5 | −10 | −10.5 | −11 | −11.5 | −12 | −12.5 | −13 | −13.5 | −14 | −14.5 | −15 | −15.5
1 111 … | −Inf | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN

Tables like the above can be generated for any combination of S, E, M, and B (sign, exponent, mantissa/significand, and bias) values using a script in Python or in GDScript; a similar sketch is shown below.
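In the same spirit as those scripts (though this sketch is not one of them), a chart generator only needs the decoding rule; S_E = 1 is assumed:

    def value(sign, exp, man, E, M, bias):
        """Decode one field combination of a minifloat with S_E = 1."""
        s = -1.0 if sign else 1.0
        if exp == 0:                          # subnormal
            return s * (man / 2 ** M) * 2.0 ** (1 - bias)
        if exp == 2 ** E - 1:                 # reserved top exponent
            return s * float("inf") if man == 0 else float("nan")
        return s * (1 + man / 2 ** M) * 2.0 ** (exp - bias)

    def print_chart(S, E, M, bias):
        for sign in range(S + 1):
            for exp in range(2 ** E):
                row = [value(sign, exp, man, E, M, bias) for man in range(2 ** M)]
                print(f"{sign} {exp:0{E}b} ...", row)

    print_chart(1, 3, 4, 3)   # reproduces the (1.3.4) chart above (bias 3)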

6-bit (1.3.2)


With only 64 values, it is possible to plot all the values in a diagram, which can be instructive.

These graphics demonstrate arithmetic on two 6-bit (1.3.2) minifloats, following the rules of IEEE 754 exactly. Green X's mark NaN results, cyan X's +infinity results, and magenta X's −infinity results. The range of the finite results is filled with curves joining equal values, blue for positive and red for negative.

  • Addition
  • Subtraction
  • Multiplication
  • Division

4-bit (1.2.1)


The smallest possible float size that follows all IEEE principles, including normalized numbers, subnormal numbers, signed zero, signed infinity, and multiple NaN values, is a 4-bit float with 1-bit sign, 2-bit exponent, and 1-bit mantissa.[15]

     | .000 | .001 | .010 | .011 | .100 | .101 | .110 | .111
0... |  0   |  0.5 |  1   |  1.5 |  2   |  3   |  Inf | NaN
1... | −0   | −0.5 | −1   | −1.5 | −2   | −3   | −Inf | NaN

This example 4-bit float is very similar to the FP4-E2M1 format used by Nvidia and the IEEE P3109 binary4p2s* formats, as described above, except for a different allocation of the special Infinity and NaN values. This example uses a traditional allocation following the same principles as other float sizes. Formats intended for application-specific use cases instead of general-purpose interoperability may prefer to use those bit patterns for finite numbers, depending on the needs of the application.

3-bit (1.1.1)


If normalized numbers are not required, the size can be reduced to 3 bits by shrinking the exponent field to 1 bit.

     | .00 | .01 | .10  | .11
0... |  0  |  1  | Inf  | NaN
1... | −0  | −1  | −Inf | NaN

2-bit (0.1.1) and 3-bit (0.2.1)


In situations where the sign bit can be excluded, each of the above examples can be reduced by 1 further bit, keeping only the first row of each table. A 2-bit float with a 1-bit exponent and a 1-bit mantissa would have only the values 0, 1, Inf, and NaN.

1-bit (0.1.0)


Removing the mantissa would allow only two values: 0 and Inf. Removing the exponent instead does not work: the formulae above would produce the values 0 and sqrt(2)/2 (with E = 0, the bias becomes B = 2^(0−1) − 0 = 0.5, so the lone nonzero mantissa pattern decodes to 0.1_2 × 2^(1−0.5) = 2^(−0.5) = sqrt(2)/2). The exponent must be at least 1 bit, or else it no longer makes sense as a float (it would just be a signed number).


References

  1. ^ Mocerino, Luca; Calimera, Andrea (24 November 2021). "AxP: A HW-SW Co-Design Pipeline for Energy-Efficient Approximated ConvNets via Associative Matching". Applied Sciences. 11 (23) 11164. doi:10.3390/app112311164.
  2. ^ a b https://developer.nvidia.com/blog/nvidia-arm-and-intel-publish-fp8-specification-for-standardization-as-an-interchange-format-for-ai/ (joint announcement by Intel, Nvidia, and Arm); https://arxiv.org/abs/2209.05433 (preprint paper jointly written by researchers from the aforementioned three companies)
  3. ^ https://www.mrob.com/pub/math/floatformats.html#microfloat Microfloats
  4. ^ Buck, Ian (13 March 2005), "Chapter 32. Taking the Plunge into GPU Computing", in Pharr, Matt (ed.), GPU Gems, Addison-Wesley, ISBN 0-321-33559-7, retrieved 5 April 2018.
  5. ^ Garrard, Andrew. "10.3. Unsigned 10-bit floating-point numbers". Khronos Data Format Specification v1.2 rev 1. Khronos Group. Retrieved 10 August 2023.
  6. ^ Garrard, Andrew. "10.2. Unsigned 11-bit floating-point numbers". Khronos Data Format Specification v1.2 rev 1. Khronos Group. Retrieved 10 August 2023.
  7. ^ "OCP 8-bit Floating Point Specification (OFP8) Revision 1.0". May 2023.
  8. ^ "Accelerate LLM Inference on Your Local PC".
  9. ^ "IEEE Working Group P3109 Interim Report on Binary Floating-point Formats for Machine Learning" (PDF). GitHub. IEEE Working Group P3109. 5 August 2025. (latest version)
  10. ^ "OCP Microscaling Formats (MX) Specification Version 1.0". Open Compute Project. Archived from the original on 24 February 2024. Retrieved 21 February 2025.
  11. ^ IEEE half-precision has 5 exponent bits with bias 15 (2^(5−1) − 1 = 15), IEEE single-precision has 8 exponent bits with bias 127 (2^(8−1) − 1 = 127), IEEE double-precision has 11 exponent bits with bias 1023 (2^(11−1) − 1 = 1023), and IEEE quadruple-precision has 15 exponent bits with bias 16383 (2^(15−1) − 1 = 16383). See the Exponent bias article for more detail.
  12. ^ O'Hallaron, David R.; Bryant, Randal E. (2010). Computer Systems: A Programmer's Perspective (2nd ed.). Boston, Massachusetts, USA: Prentice Hall. ISBN 978-0-13-610804-7.
  13. ^ Burch, Carl. "Floating-point representation". Hendrix College. Retrieved 29 August 2023.
  14. ^ "An 8-Bit Floating Point Representation" (PDF). Archived from the original (PDF) on 8 July 2024.
  15. ^ Shaneyfelt, Ted. "Dr. Shaneyfelt's Floating Point Construction Gizmo". Retrieved 29 August 2023.
