RFC 8949	CBOR	December 2020
Bormann & Hoffman	Standards Track

Stream:: Internet Engineering Task Force (IETF)
RFC:: 8949
STD:: 94
Obsoletes:: 7049
Category:: Standards Track
Published:: December 2020
ISSN:: 2070-1721
Authors:: C. Bormann
Universität Bremen TZI
P. Hoffman
ICANN

Status of This Memo

This is an Internet Standards Track document.

This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.

Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8949.

Copyright Notice

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

There are hundreds of standardized formats for binary representation of structured data (also known as binary serialization formats). Of those, some are for specific domains of information, while others are generalized for arbitrary data. In the IETF, probably the best-known formats in the latter category are ASN.1's BER and DER [ASN.1].

The format defined here follows some specific design goals that are not well met by current formats. The underlying data model is an extended version of the JSON data model [RFC8259]. It is important to note that this is not a proposal that the grammar in RFC 8259 be extended in general, since doing so would cause a significant backwards incompatibility with already deployed JSON documents. Instead, this document simply defines its own data model that starts from JSON.

Appendix E lists some existing binary formats and discusses how well they do or do not fit the design objectives of the Concise Binary Object Representation (CBOR).

This document obsoletes [RFC7049], providing editorial improvements, new details, and errata fixes while keeping full compatibility with the interchange format of RFC 7049. It does not create a new version of the format.

1.1. Objectives

The objectives of CBOR, roughly in decreasing order of importance, are:

1. The representation must be able to unambiguously encode most common data formats used in Internet standards.
   - It must represent a reasonable set of basic data types and structures using binary encoding. "Reasonable" here is largely influenced by the capabilities of JSON, with the major addition of binary byte strings. The structures supported are limited to arrays and trees; loops and lattice-style graphs are not supported.
   - There is no requirement that all data formats be uniquely encoded; that is, it is acceptable that the number "7" might be encoded in multiple different ways.
2. The code for an encoder or decoder must be able to be compact in order to support systems with very limited memory, processor power, and instruction sets.
   - An encoder and a decoder need to be implementable in a very small amount of code (for example, in class 1 constrained nodes as defined in [RFC7228]).
   - The format should use contemporary machine representations of data (for example, not requiring binary-to-decimal conversion).
3. Data must be able to be decoded without a schema description.
   - Similar to JSON, encoded data should be self-describing so that a generic decoder can be written.
4. The serialization must be reasonably compact, but data compactness is secondary to code compactness for the encoder and decoder.
   - "Reasonable" here is bounded by JSON as an upper bound in size and by the implementation complexity, which limits the amount of effort that can go into achieving that compactness. Using either general compression schemes or extensive bit-fiddling violates the complexity goals.
5. The format must be applicable to both constrained nodes and high-volume applications.
   - This means it must be reasonably frugal in CPU usage for both encoding and decoding. This is relevant both for constrained nodes and for potential usage in applications with a very high volume of data.
6. The format must support all JSON data types for conversion to and from JSON.
   - It must support a reasonable level of conversion as long as the data represented is within the capabilities of JSON. It must be possible to define a unidirectional mapping towards JSON for all types of data.
7. The format must be extensible, and the extended data must be decodable by earlier decoders.
   - The format is designed for decades of use.
   - The format must support a form of extensibility that allows fallback so that a decoder that does not understand an extension can still decode the message.
   - The format must be able to be extended in the future by later IETF standards.

1.2. Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.

The term "byte" is used in its now-customary sense as a synonym for "octet". All multi-byte values are encoded in network byte order (that is, most significant byte first, also known as "big-endian").

This specification makes use of the following terminology:

Data item:: A single piece of CBOR data. The structure of a data item may contain zero, one, or more nested data items. The term is used both for the data item in representation format and for the abstract idea that can be derived from that by a decoder; the former can be addressed specifically by using the term "encoded data item".
Decoder:: A process that decodes a well-formed encoded CBOR data item and makes it available to an application. Formally speaking, a decoder contains a parser to break up the input using the syntax rules of CBOR, as well as a semantic processor to prepare the data in a form suitable to the application.
Encoder:: A process that generates the (well-formed) representation format of a CBOR data item from application information.
Data Stream:: A sequence of zero or more data items, not further assembled into a larger containing data item (see [RFC8742] for one application). The independent data items that make up a data stream are sometimes also referred to as "top-level data items".
Well-formed:: A data item that follows the syntactic structure of CBOR. A well-formed data item uses the initial bytes and the byte strings and/or data items that are implied by their values as defined in CBOR and does not include following extraneous data. CBOR decoders by definition only return contents from well-formed data items.
Valid:: A data item that is well-formed and also follows the semantic restrictions that apply to CBOR data items (Section 5.3).
Expected:: Besides its normal English meaning, the term "expected" is used to describe requirements beyond CBOR validity that an application has on its input data. Well-formed (processable at all), valid (checked by a validity-checking generic decoder), and expected (checked by the application) form a hierarchy of layers of acceptability.
Stream decoder:: A process that decodes a data stream and makes each of the data items in the sequence available to an application as they are received.

Terms and concepts for floating-point values such as Infinity, NaN (not a number), negative zero, and subnormal are defined in [IEEE754].

Where bit arithmetic or data types are explained, this document uses the notation familiar from the programming language C [C], except that ".." denotes a range that includes both ends given, and superscript notation denotes exponentiation. For example, 2 to the power of 64 is notated: 2⁶⁴. In the plain-text version of this specification, superscript notation is not available and therefore is rendered by a surrogate notation. That notation is not optimized for this RFC; it is unfortunately ambiguous with C's exclusive-or (which is only used in the appendices, which in turn do not use exponentiation) and requires circumspection from the reader of the plain-text version.

Examples and pseudocode assume that signed integers use two's complement representation and that right shifts of signed integers perform sign extension; these assumptions are also specified in Sections 6.8.1 (basic.fundamental) and 7.6.7 (expr.shift) of the 2020 version of C++ (currently available as a final draft, [Cplusplus20]).

Similar to the "0x" notation for hexadecimal numbers, numbers in binary notation are prefixed with "0b". Underscores can be added to a number solely for readability, so 0b00100001 (0x21) might be written 0b001_00001 to emphasize the desired interpretation of the bits in the byte; in this case, it is split into three bits and five bits. Encoded CBOR data items are sometimes given in the "0x" or "0b" notation; these values are first interpreted as numbers as in C and are then interpreted as byte strings in network byte order, including any leading zero bytes expressed in the notation.

Words may be italicized for emphasis; in the plain text form of this specification, this is indicated by surrounding words with underscore characters. Verbatim text (e.g., names from a programming language) may be set in monospace type; in plain text, this is approximated somewhat ambiguously by surrounding the text in double quotes (which also retain their usual meaning).

2. CBOR Data Models

CBOR is explicit about its generic data model, which defines the set of all data items that can be represented in CBOR. Its basic generic data model is extensible by the registration of "simple values" and tags. Applications can then create a subset of the resulting extended generic data model to build their specific data models.

Within environments that can represent the data items in the generic data model, generic CBOR encoders and decoders can be implemented (which usually involves defining additional implementation data types for those data items that do not already have a natural representation in the environment). The ability to provide generic encoders and decoders is an explicit design goal of CBOR; however, many applications will provide their own application-specific encoders and/or decoders.

In the basic (unextended) generic data model defined in Section 3, a data item is one of the following:

- an integer in the range -2⁶⁴..2⁶⁴-1 inclusive
- a simple value, identified by a number between 0 and 255, but distinct from that number itself
- a floating-point value, distinct from an integer, out of the set representable by IEEE 754 binary64 (including non-finites) [IEEE754]
- a sequence of zero or more bytes ("byte string")
- a sequence of zero or more Unicode code points ("text string")
- a sequence of zero or more data items ("array")
- a mapping (mathematical function) from zero or more data items ("keys") each to a data item ("values"), ("map")
- a tagged data item ("tag"), comprising a tag number (an integer in the range 0..2⁶⁴-1) and the tag content (a data item)

Note that integer and floating-point values are distinct in this model, even if they have the same numeric value.

Also note that serialization variants are not visible at the generic data model level. This deliberate absence of visibility includes the number of bytes of the encoded floating-point value. It also includes the choice of encoding for an "argument" (see Section 3) such as the encoding for an integer, the encoding for the length of a text or byte string, the encoding for the number of elements in an array or pairs in a map, or the encoding for a tag number.

2.1. Extended Generic Data Models

This basic generic data model has been extended in this document by the registration of a number of simple values and tag numbers, such as:

- false, true, null, and undefined (simple values identified by 20..23, Section 3.3)
- integer and floating-point values with a larger range and precision than the above (tag numbers 2 to 5, Section 3.4)
- application data types such as a point in time or date/time string defined in RFC 3339 (tag numbers 1 and 0, Section 3.4)

Additional elements of the extended generic data model can be (and have been) defined via the IANA registries created for CBOR. Even if such an extension is unknown to a generic encoder or decoder, data items using that extension can be passed to or from the application by representing them at the application interface within the basic generic data model, i.e., as generic simple values or generic tags.

In other words, the basic generic data model is stable as defined in this document, while the extended generic data model expands by the registration of new simple values or tag numbers, but never shrinks.

While there is a strong expectation that generic encoders and decoders can represent false, true, and null (undefined is intentionally omitted) in the form appropriate for their programming environment, the implementation of the data model extensions created by tags is truly optional and a matter of implementation quality.

2.2. Specific Data Models

The specific data model for a CBOR-based protocol usually takes a subset of the extended generic data model and assigns application semantics to the data items within this subset and its components. When documenting such specific data models and specifying the types of data items, it is preferable to identify the types by their generic data model names ("negative integer", "array") instead of referring to aspects of their CBOR representation ("major type 1", "major type 4").

Specific data models can also specify value equivalency (including values of different types) for the purposes of map keys and encoder freedom. For example, in the generic data model, a valid map MAY have both 0 and 0.0 as keys, and an encoder MUST NOT encode 0.0 as an integer (major type 0, Section 3.1). However, if a specific data model declares that floating-point and integer representations of integral values are equivalent, using both map keys 0 and 0.0 in a single map would be considered duplicates, even while encoded as different major types, and so invalid; and an encoder could encode integral-valued floats as integers or vice versa, perhaps to save encoded bytes.
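
As an illustration of how a platform type can silently impose such an equivalence, the following Python sketch shows that the built-in dict treats 0 and 0.0 as the same key. The `generic_key` helper is a hypothetical name introduced only for this example; it is one possible way to keep the two keys distinct, as the generic data model requires, and is not part of any CBOR library.

```python
# Python's dict compares keys with ==, and 0 == 0.0, so both spellings land
# on the same entry. A decoder that maps CBOR maps straight onto dict thus
# behaves like a specific data model that declares the two keys equivalent.
collapsed = {0: "integer key", 0.0: "float key"}


def generic_key(value):
    """Hypothetical helper: pair a key with its type name so that the
    integer 0 and the float 0.0 stay distinct, as in the generic data model."""
    return (type(value).__name__, value)


distinct = {
    generic_key(0): "integer key",
    generic_key(0.0): "float key",
}

print(len(collapsed), len(distinct))
```

Whether collapsing the keys is acceptable depends on the specific data model of the application; a generic decoder would need something like the second form.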

3. Specification of the CBOR Encoding

A CBOR data item (Section 2) is encoded to or decoded from a byte string carrying a well-formed encoded data item as described in this section. The encoding is summarized in Table 7 in Appendix B, indexed by the initial byte. An encoder MUST produce only well-formed encoded data items. A decoder MUST NOT return a decoded data item when it encounters input that is not a well-formed encoded CBOR data item (this does not detract from the usefulness of diagnostic and recovery tools that might make available some information from a damaged encoded CBOR data item).

The initial byte of each encoded data item contains both information about the major type (the high-order 3 bits, described in Section 3.1) and additional information (the low-order 5 bits). With a few exceptions, the additional information's value describes how to load an unsigned integer "argument":

Less than 24:: The argument's value is the value of the additional information.
24, 25, 26, or 27:: The argument's value is held in the following 1, 2, 4, or 8 bytes, respectively, in network byte order. For major type 7 and additional information value 25, 26, 27, these bytes are not used as an integer argument, but as a floating-point value (see Section 3.3).
28, 29, 30:: These values are reserved for future additions to the CBOR format. In the present version of CBOR, the encoded item is not well-formed.
31:: No argument value is derived. If the major type is 0, 1, or 6, the encoded item is not well-formed. For major types 2 to 5, the item's length is indefinite, and for major type 7, the byte does not constitute a data item at all but terminates an indefinite-length item; all are described in Section 3.2.

The initial byte and any additional bytes consumed to construct the argument are collectively referred to as the head of the data item.
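
The rules for loading the argument can be sketched in Python as follows. This is a minimal illustration, not a complete decoder: the function name `decode_head` and its return shape are assumptions made for this example, and the caller is still responsible for reinterpreting the bytes as a floating-point value for major type 7 with additional information 25 to 27 (Section 3.3).

```python
def decode_head(data: bytes, offset: int = 0):
    """Parse one head; return (major_type, additional_info, argument, next_offset).

    argument is None when additional information is 31 (no argument derived).
    """
    if offset >= len(data):
        raise ValueError("not well-formed: input ends before the head")
    initial = data[offset]
    major = initial >> 5        # high-order 3 bits
    info = initial & 0x1F       # low-order 5 bits
    offset += 1

    if info < 24:
        # The argument is the additional information itself.
        return major, info, info, offset

    if info <= 27:
        # 24, 25, 26, 27 -> 1, 2, 4, 8 following bytes in network byte order.
        size = 1 << (info - 24)
        raw = data[offset:offset + size]
        if len(raw) != size:
            raise ValueError("not well-formed: input ends inside the head")
        return major, info, int.from_bytes(raw, "big"), offset + size

    if info < 31:
        raise ValueError("not well-formed: additional information 28..30 is reserved")

    # info == 31: indefinite length (major types 2..5) or "break" (major type 7).
    if major in (0, 1, 6):
        raise ValueError("not well-formed: 31 is not allowed for major types 0, 1, 6")
    return major, info, None, offset


print(decode_head(bytes.fromhex("1901f4")))
```

For the three bytes 0x1901f4 this reports major type 0, additional information 25, and argument 500, with the head occupying all three bytes.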

The meaning of this argument depends on the major type. For example, in major type 0, the argument is the value of the data item itself (and in major type 1, the value of the data item is computed from the argument); in major type 2 and 3, it gives the length of the string data in bytes that follow; and in major types 4 and 5, it is used to determine the number of data items enclosed.

If the encoded sequence of bytes ends before the end of a data item, that item is not well-formed. If the encoded sequence of bytes still has bytes remaining after the outermost encoded item is decoded, that encoding is not a single well-formed CBOR item. Depending on the application, the decoder may either treat the encoding as not well-formed or just identify the start of the remaining bytes to the application.

A CBOR decoder implementation can be based on a jump table with all 256 defined values for the initial byte (Table 7). A decoder in a constrained implementation can instead use the structure of the initial byte and following bytes for more compact code (see Appendix C for a rough impression of how this could look).

3.1. Major Types

The following lists the major types and the additional information and other bytes associated with the type.

Major type 0:: An unsigned integer in the range 0..2⁶⁴-1 inclusive. The value of the encoded item is the argument itself. For example, the integer 10 is denoted as the one byte 0b000_01010 (major type 0, additional information 10). The integer 500 would be 0b000_11001 (major type 0, additional information 25) followed by the two bytes 0x01f4, which is 500 in decimal.
Major type 1:: A negative integer in the range -2⁶⁴..-1 inclusive. The value of the item is -1 minus the argument. For example, the integer -500 would be 0b001_11001 (major type 1, additional information 25) followed by the two bytes 0x01f3, which is 499 in decimal.
Major type 2:: A byte string. The number of bytes in the string is equal to the argument. For example, a byte string whose length is 5 would have an initial byte of 0b010_00101 (major type 2, additional information 5 for the length), followed by 5 bytes of binary content. A byte string whose length is 500 would have 3 initial bytes of 0b010_11001 (major type 2, additional information 25 to indicate a two-byte length) followed by the two bytes 0x01f4 for a length of 500, followed by 500 bytes of binary content.
Major type 3:: A text string (Section 2) encoded as UTF-8 [RFC3629]. The number of bytes in the string is equal to the argument. A string containing an invalid UTF-8 sequence is well-formed but invalid (Section 1.2). This type is provided for systems that need to interpret or display human-readable text, and allows the differentiation between unstructured bytes and text that has a specified repertoire (that of Unicode) and encoding (UTF-8). In contrast to formats such as JSON, the Unicode characters in this type are never escaped. Thus, a newline character (U+000A) is always represented in a string as the byte 0x0a, and never as the bytes 0x5c6e (the characters "\" and "n") nor as 0x5c7530303061 (the characters "\", "u", "0", "0", "0", and "a").
Major type 4:: An array of data items. In other formats, arrays are also called lists, sequences, or tuples (a "CBOR sequence" is something slightly different, though [RFC8742]). The argument is the number of data items in the array. Items in an array do not need to all be of the same type. For example, an array that contains 10 items of any type would have an initial byte of 0b100_01010 (major type 4, additional information 10 for the length) followed by the 10 remaining items.
Major type 5:: A map of pairs of data items. Maps are also called tables, dictionaries, hashes, or objects (in JSON). A map is comprised of pairs of data items, each pair consisting of a key that is immediately followed by a value. The argument is the number of pairs of data items in the map. For example, a map that contains 9 pairs would have an initial byte of 0b101_01001 (major type 5, additional information 9 for the number of pairs) followed by the 18 remaining items. The first item is the first key, the second item is the first value, the third item is the second key, and so on. Because items in a map come in pairs, their total number is always even: a map that contains an odd number of items (no value data present after the last key data item) is not well-formed. A map that has duplicate keys may be well-formed, but it is not valid, and thus it causes indeterminate decoding; see also Section 5.6.
Major type 6:: A tagged data item ("tag") whose tag number, an integer in the range 0..2⁶⁴-1 inclusive, is the argument and whose enclosed data item (tag content) is the single encoded data item that follows the head. See Section 3.4.
Major type 7:: Floating-point numbers and simple values, as well as the "break" stop code. See Section 3.3.
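
The integer examples for major types 0 and 1 can be reproduced with a short Python sketch. The names `encode_head` and `encode_int` are assumptions made for this illustration. The sketch always picks the shortest head that fits the argument; this section permits longer heads as well, so that choice is a simplification rather than a requirement stated here.

```python
def encode_head(major: int, argument: int) -> bytes:
    """Build the head for a major type (0..7) and an unsigned argument."""
    if not 0 <= argument < 2**64:
        raise ValueError("argument must be in the range 0..2**64-1")
    if argument < 24:
        # The argument fits directly into the 5 bits of additional information.
        return bytes([(major << 5) | argument])
    # Additional information 24, 25, 26, 27 -> 1, 2, 4, 8 following bytes.
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: the range was checked above")


def encode_int(value: int) -> bytes:
    """Encode an integer in the range -2**64..2**64-1."""
    if value >= 0:
        return encode_head(0, value)       # major type 0: value is the argument
    return encode_head(1, -1 - value)      # major type 1: value is -1 - argument


for n in (10, 500, -500):
    print(n, encode_int(n).hex())
```

The three values come out as 0a, 1901f4, and 3901f3, matching the bit patterns given in the text above.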

These eight major types lead to a simple table showing which of the 256 possible values for the initial byte of a data item are used (Table 7).

In major types 6 and 7, many of the possible values are reserved for future specification. See Section 9 for more information on these values.

Table 1 summarizes the major types defined by CBOR, ignoring Section 3.2 for now. The number N in this table stands for the argument.

Table 1: Overview of the Definite-Length Use of CBOR Major Types (N = Argument)
Major Type	Meaning	Content
0	unsigned integer N	-
1	negative integer -1-N	-
2	byte string	N bytes
3	text string	N bytes (UTF-8 text)
4	array	N data items (elements)
5	map	2N data items (key/value pairs)
6	tag of number N	1 data item
7	simple/float	-
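
Table 1 maps fairly directly onto a small recursive encoder. The Python sketch below covers major types 0 to 5 only; the mapping from Python types to CBOR data items (int, bytes, str, list, dict) and the function names are assumptions made for this illustration, and tags, simple values, and floating-point values are deliberately left out.

```python
def encode_head(major: int, argument: int) -> bytes:
    """Build a head with the shortest argument encoding that fits."""
    if not 0 <= argument < 2**64:
        raise ValueError("argument must be in the range 0..2**64-1")
    if argument < 24:
        return bytes([(major << 5) | argument])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major << 5) | info]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: the range was checked above")


def encode(item) -> bytes:
    """Definite-length encoding for major types 0..5, following Table 1."""
    if isinstance(item, bool):
        # bool is a subclass of int in Python; simple values are out of scope.
        raise TypeError("simple values are not covered by this sketch")
    if isinstance(item, int):
        return encode_head(0, item) if item >= 0 else encode_head(1, -1 - item)
    if isinstance(item, bytes):
        return encode_head(2, len(item)) + item            # N bytes
    if isinstance(item, str):
        utf8 = item.encode("utf-8")
        return encode_head(3, len(utf8)) + utf8            # N bytes of UTF-8
    if isinstance(item, list):
        return encode_head(4, len(item)) + b"".join(encode(x) for x in item)
    if isinstance(item, dict):
        body = b"".join(encode(k) + encode(v) for k, v in item.items())
        return encode_head(5, len(item)) + body            # 2N data items
    raise TypeError(f"unsupported type: {type(item).__name__}")


print(encode([1, [2, 3], [4, 5]]).hex())
```

Note that the argument of a text string counts UTF-8 bytes, not code points, which is why the string is encoded before its length is taken.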

3.2. Indefinite Lengths for Some Major Types

Four CBOR items (arrays, maps, byte strings, and text strings) can be encoded with an indefinite length using additional information value 31. This is useful if the encoding of the item needs to begin before the number of items inside the array or map, or the total length of the string, is known. (The ability to start sending a data item before all of it is known is often referred to as "streaming" within that data item.)

Indefinite-length arrays and maps are dealt with differently than indefinite-length strings (byte strings and text strings).

3.2.1. The "break" Stop Code

The "break" stop code is encoded with major type 7 and additional information value 31 (0b111_11111). It is not itself a data item: it is just a syntactic feature to close an indefinite-length item.

If the "break" stop code appears where a data item is expected, other than directly inside an indefinite-length string, array, or map -- for example, directly inside a definite-length array or map -- the enclosing item is not well-formed.

3.2.2. Indefinite-Length Arrays and Maps

Indefinite-length arrays and maps are represented using their major type with the additional information value of 31, followed by an arbitrary-length sequence of zero or more items for an array or key/value pairs for a map, followed by the "break" stop code (Section 3.2.1). In other words, indefinite-length arrays and maps look identical to other arrays and maps except for beginning with the additional information value of 31 and ending with the "break" stop code.

If the "break" stop code appears after a key in a map, in place of that key's value, the map is not well-formed.

There is no restriction against nesting indefinite-length array or map items. A "break" only terminates a single item, so nested indefinite-length items need exactly as many "break" stop codes as there are type bytes starting an indefinite-length item.

For example, assume an encoder wants to represent the abstract array [1, [2, 3], [4, 5]]. The definite-length encoding would be 0x8301820203820405:

83        -- Array of length 3
   01     -- 1
   82     -- Array of length 2
      02  -- 2
      03  -- 3
   82     -- Array of length 2
      04  -- 4
      05  -- 5

Indefinite-length encoding could be applied independently to each of the three arrays encoded in this data item, as required, leading to representations such as:

0x9f018202039f0405ffff
9F        -- Start indefinite-length array
   01     -- 1
   82     -- Array of length 2
      02  -- 2
      03  -- 3
   9F     -- Start indefinite-length array
      04  -- 4
      05  -- 5
      FF  -- "break" (inner array)
   FF     -- "break" (outer array)

0x9f01820203820405ff
9F        -- Start indefinite-length array
   01     -- 1
   82     -- Array of length 2
      02  -- 2
      03  -- 3
   82     -- Array of length 2
      04  -- 4
      05  -- 5
   FF     -- "break"

0x83018202039f0405ff
83        -- Array of length 3
   01     -- 1
   82     -- Array of length 2
      02  -- 2
      03  -- 3
   9F     -- Start indefinite-length array
      04  -- 4
      05  -- 5
      FF  -- "break"

0x83019f0203ff820405
83        -- Array of length 3
   01     -- 1
   9F     -- Start indefinite-length array
      02  -- 2
      03  -- 3
      FF  -- "break"
   82     -- Array of length 2
      04  -- 4
      05  -- 5

An example of an indefinite-length map (that happens to have two key/value pairs) might be:

0xbf6346756ef563416d7421ff
BF           -- Start indefinite-length map
   63        -- First key, UTF-8 string length 3
      46756e --   "Fun"
   F5        -- First value, true
   63        -- Second key, UTF-8 string length 3
      416d74 --   "Amt"
   21        -- Second value, -2
   FF        -- "break"
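
To show that the definite-length and indefinite-length forms decode to the same data items, here is a compact Python sketch of a recursive decoder. It is an illustration under several assumptions: the names `decode_item` and `BREAK` are invented for this example, only integers, definite-length strings, arrays, maps, and the simple values false and true are handled, truncated input is not diagnosed carefully, and maps are returned as Python dicts (so keys must be hashable).

```python
BREAK = object()  # sentinel for the "break" stop code; not a data item


def decode_item(data: bytes, pos: int = 0, allow_break: bool = False):
    """Decode one item starting at pos; return (value, next_pos)."""
    initial = data[pos]
    major, info = initial >> 5, initial & 0x1F
    pos += 1
    if initial == 0xFF:
        if not allow_break:
            raise ValueError("not well-formed: unexpected 'break'")
        return BREAK, pos

    if info < 24:
        arg = info
    elif info <= 27:
        size = 1 << (info - 24)
        arg = int.from_bytes(data[pos:pos + size], "big")
        pos += size
    elif info == 31 and major in (4, 5):
        arg = None                      # indefinite-length array or map
    else:
        raise ValueError("not handled by this sketch")

    if major == 0:
        return arg, pos
    if major == 1:
        return -1 - arg, pos
    if major == 2:
        return data[pos:pos + arg], pos + arg
    if major == 3:
        return data[pos:pos + arg].decode("utf-8"), pos + arg
    if major == 4:
        items = []
        while arg is None or len(items) < arg:
            item, pos = decode_item(data, pos, allow_break=arg is None)
            if item is BREAK:
                break
            items.append(item)
        return items, pos
    if major == 5:
        pairs, count = {}, 0
        while arg is None or count < arg:
            key, pos = decode_item(data, pos, allow_break=arg is None)
            if key is BREAK:
                break
            # A "break" in place of a value is not well-formed, so it is
            # never allowed here.
            value, pos = decode_item(data, pos)
            pairs[key] = value
            count += 1
        return pairs, pos
    if major == 7 and info in (20, 21):
        return info == 21, pos          # false / true
    raise ValueError("not handled by this sketch")


for hex_form in ("8301820203820405", "9f018202039f0405ffff", "83019f0203ff820405"):
    print(hex_form, decode_item(bytes.fromhex(hex_form))[0])
```

All three inputs yield the array [1, [2, 3], [4, 5]]. The `allow_break` flag is only set while reading directly inside an indefinite-length container, which is how the sketch rejects a "break" inside a definite-length array or after a map key.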

3.2.3. Indefinite-Length Byte Strings and Text Strings

Indefinite-length strings are represented by a byte containing the major type for byte string or text string with an additional information value of 31, followed by a series of zero or more strings of the specified type ("chunks") that have definite lengths, and finished by the "break" stop code (Section 3.2.1). The data item represented by the indefinite-length string is the concatenation of the chunks. If no chunks are present, the data item is an empty string of the specified type. Zero-length chunks, while not particularly useful, are permitted.

If any item between the indefinite-length string indicator (0b010_11111 or 0b011_11111) and the "break" stop code is not a definite-length string item of the same major type, the string is not well-formed.

The design does not allow nesting indefinite-length strings as chunks into indefinite-length strings. If it were allowed, it would require decoder implementations to keep a stack, or at least a count, of nesting levels. It is unnecessary on the encoder side because the inner indefinite-length string would consist of chunks, and these could instead be put directly into the outer indefinite-length string.

If any definite-length text string inside an indefinite-length text string is invalid, the indefinite-length text string is invalid. Note that this implies that the UTF-8 bytes of a single Unicode code point (scalar value) cannot be spread between chunks: a new chunk of a text string can only be started at a code point boundary.

For example, assume an encoded data item consisting of the bytes:

0b010_11111 0b010_00100 0xaabbccdd 0b010_00011 0xeeff99 0b111_11111

5F              -- Start indefinite-length byte string
   44           -- Byte string of length 4
      aabbccdd  -- Bytes content
   43           -- Byte string of length 3
      eeff99    -- Bytes content
   FF           -- "break"

After decoding, this results in a single byte string with seven bytes: 0xaabbccddeeff99.
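
The chunk rules can be sketched in Python as follows. The function name `decode_indefinite_string` is an assumption made for this illustration; it handles only a single top-level indefinite-length string and ignores any bytes after the "break". Each text chunk is decoded on its own, so a code point split across two chunks is reported as an error, in line with the rule above.

```python
def decode_indefinite_string(data: bytes):
    """Decode one indefinite-length byte string (0x5f) or text string (0x7f)."""
    major, info = data[0] >> 5, data[0] & 0x1F
    if major not in (2, 3) or info != 31:
        raise ValueError("not an indefinite-length string")
    pos = 1
    chunks = []
    while True:
        if pos >= len(data):
            raise ValueError("not well-formed: input ends before 'break'")
        head = data[pos]
        pos += 1
        if head == 0xFF:                      # "break" stop code
            break
        chunk_major, chunk_info = head >> 5, head & 0x1F
        # Chunks must be definite-length strings of the same major type;
        # this also rejects nested indefinite-length strings (info 31).
        if chunk_major != major or chunk_info > 27:
            raise ValueError("not well-formed: bad chunk")
        if chunk_info < 24:
            length = chunk_info
        else:
            size = 1 << (chunk_info - 24)
            if pos + size > len(data):
                raise ValueError("not well-formed: input ends inside a head")
            length = int.from_bytes(data[pos:pos + size], "big")
            pos += size
        chunk = data[pos:pos + length]
        if len(chunk) != length:
            raise ValueError("not well-formed: input ends inside a chunk")
        pos += length
        # Text chunks are validated one by one: UnicodeDecodeError (a
        # ValueError subclass) signals an invalid chunk.
        chunks.append(chunk if major == 2 else chunk.decode("utf-8"))
    return b"".join(chunks) if major == 2 else "".join(chunks)


print(decode_indefinite_string(bytes.fromhex("5f44aabbccdd43eeff99ff")).hex())
```

For the example bytes above, the result is the seven-byte string aabbccddeeff99.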

3.2.4. Summary of Indefinite-Length Use of Major Types

Table 2 summarizes the major types defined by CBOR as used for indefinite-length encoding (with additional information set to 31).

Table 2: Overview of the Indefinite-Length Use of CBOR Major Types (Additional Information = 31)
Major Type	Meaning	Enclosed up to "break" Stop Code
0	(not well-formed)	-
1	(not well-formed)	-
2	byte string	definite-length byte strings
3	text string	definite-length text strings
4	array	data items (elements)
5	map	data items (key/value pairs)
6	(not well-formed)	-
7	"break" stop code	-

3.3. Floating-Point Numbers and Values with No Content

Major type 7 is for two types of data: floating-point numbers and "simple values" that do not need any content. Each value of the 5-bit additional information in the initial byte has its own separate meaning, as defined in Table 3. Like the major types for integers, items of this major type do not carry content data; all the information is in the initial bytes (the head).

Table 3: Values for Additional Information in Major Type 7
5-Bit Value	Semantics
0..23	Simple value (value 0..23)
24	Simple value (value 32..255 in following byte)
25	IEEE 754 Half-Precision Float (16 bits follow)
26	IEEE 754 Single-Precision Float (32 bits follow)
27	IEEE 754 Double-Precision Float (64 bits follow)
28-30	Reserved, not well-formed in the present document
31	"break" stop code for indefinite-length items (Section 3.2.1)

As with all other major types, the 5-bit value 24 signifies a single-byte extension: it is followed by an additional byte to represent the simple value. (To minimize confusion, only the values 32 to 255 are used.) This maintains the structure of the initial bytes: as for the other major types, the length of these always depends on the additional information in the first byte. Table 4 lists the numeric values assigned and available for simple values.

Table 4: Simple Values
Value	Semantics
0..19	(unassigned)
20	false
21	true
22	null
23	undefined
24..31	(reserved)
32..255	(unassigned)

An encoder MUST NOT issue two-byte sequences that start with 0xf8 (major type 7, additional information 24) and continue with a byte less than 0x20 (32 decimal). Such sequences are not well-formed. (This implies that an encoder cannot encode false, true, null, or undefined in two-byte sequences and that only the one-byte variants of these are well-formed; more generally speaking, each simple value only has a single representation variant).
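
A short Python sketch of this rule follows; the function name `encode_simple` is an assumption made for this illustration. Values 0 to 23 go into the initial byte, values 32 to 255 use the 0xf8 prefix, and the reserved range 24 to 31 is refused, so every simple value has exactly one encoding.

```python
def encode_simple(value: int) -> bytes:
    """Encode a simple value using its single well-formed representation."""
    if 0 <= value <= 23:
        # One byte: major type 7 (0b111_00000) plus the value itself.
        return bytes([0xE0 | value])
    if 32 <= value <= 255:
        # Two bytes: 0xf8 (additional information 24) plus the value.
        return bytes([0xF8, value])
    raise ValueError("simple values 24..31 are reserved; valid range is 0..255")


for name, number in (("false", 20), ("true", 21), ("null", 22), ("undefined", 23)):
    print(name, encode_simple(number).hex())
```

The four assigned values come out as the single bytes f4, f5, f6, and f7.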

The 5-bit values of 25, 26, and 27 are for 16-bit, 32-bit, and 64-bit IEEE 754 binary floating-point values [IEEE754]. These floating-point values are encoded in the additional bytes of the appropriate size. (See Appendix D for some information about 16-bit floating-point numbers.)
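
As a sketch of the three floating-point widths, the following Python code uses the standard `struct` module, whose format characters `e`, `f`, and `d` correspond to IEEE 754 binary16, binary32, and binary64. The function names are assumptions made for this illustration, and the caller chooses the width explicitly; the sketch does not try to pick the shortest encoding that preserves the value.

```python
import struct

# Initial byte and big-endian struct format for each width in bits.
_FLOAT_FORMATS = {16: (0xF9, ">e"), 32: (0xFA, ">f"), 64: (0xFB, ">d")}


def encode_float(value: float, bits: int = 64) -> bytes:
    """Encode a float as major type 7 with additional information 25, 26, or 27."""
    initial, fmt = _FLOAT_FORMATS[bits]
    return bytes([initial]) + struct.pack(fmt, value)


def decode_float(data: bytes) -> float:
    """Decode a float whose initial byte is 0xf9, 0xfa, or 0xfb."""
    for initial, fmt in _FLOAT_FORMATS.values():
        if data[0] == initial:
            size = struct.calcsize(fmt)
            return struct.unpack(fmt, data[1:1 + size])[0]
    raise ValueError("not a floating-point item")


print(encode_float(1.0, 16).hex(), encode_float(100000.0, 32).hex(), encode_float(1.1).hex())
```

The three items are 3, 5, and 9 bytes long: one initial byte followed by 2, 4, or 8 bytes in network byte order.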

3.4. Tagging of Items

In CBOR, a data item can be enclosed by a tag to give it some additional semantics, as uniquely identified by a tag number. The tag is major type 6, its argument (Section 3) indicates the tag number, and it contains a single enclosed data item, the tag content. (If a tag requires further structure to its content, this structure is provided by the enclosed data item.) We use the term tag for the entire data item consisting of both a tag number and the tag content: the tag content is the data item that is being tagged.

For example, assume that a byte string of length 12 is marked with a tag of number 2 to indicate it is an unsigned bignum (Section 3.4.3). The encoded data item would start with a byte 0b110_00010 (major type 6, additional information 2 for the tag number) followed by the encoded tag content: 0b010_01100 (major type 2, additional information 12 for the length) followed by the 12 bytes of the bignum.

In the extended generic data model, a tag number's definition describes the additional semantics conveyed with the tag number. These semantics may include equivalence of some tagged data items with other data items, including some that can be represented in the basic generic data model. For instance, 0xc24101, a bignum the tag content of which is the byte string with the single byte 0x01, is equivalent to an integer 1, which could also be encoded as 0x01, 0x1801, or 0x190001. The tag definition may specify a preferred serialization (Section 4.1) that is recommended for generic encoders; this may prefer basic generic data model representations over ones that employ a tag.
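
The equivalence between the tagged bignum and the plain integer encodings can be checked with a small Python sketch. The function names are assumptions made for this illustration, and the sketch relies on the tag 2 content being the magnitude in network byte order, which is how Section 3.4.3 defines unsigned bignums; only definite-length tag content is handled.

```python
def read_head(data: bytes, pos: int):
    """Return (major_type, argument, next_pos) for a definite-length head."""
    major, info = data[pos] >> 5, data[pos] & 0x1F
    pos += 1
    if info < 24:
        return major, info, pos
    if info > 27:
        raise ValueError("not handled by this sketch")
    size = 1 << (info - 24)
    return major, int.from_bytes(data[pos:pos + size], "big"), pos + size


def decode_integer(data: bytes) -> int:
    """Decode major type 0, major type 1, or a tag 2 unsigned bignum."""
    major, arg, pos = read_head(data, 0)
    if major == 0:
        return arg
    if major == 1:
        return -1 - arg
    if major == 6 and arg == 2:
        content_major, length, pos = read_head(data, pos)
        if content_major != 2:
            raise ValueError("tag 2 content must be a byte string")
        content = data[pos:pos + length]
        if len(content) != length:
            raise ValueError("not well-formed: input ends inside the content")
        # Assumption from Section 3.4.3: big-endian unsigned magnitude.
        return int.from_bytes(content, "big")
    raise ValueError("not an integer in this sketch")


for hex_form in ("c24101", "01", "1801", "190001"):
    print(hex_form, decode_integer(bytes.fromhex(hex_form)))
```

All four encodings decode to the integer 1. A decoder that does not understand tag 2 could instead hand the tag number and the byte string to the application unchanged.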

The tag definition usually defines which nested data items arevalid for such tags. Tag definitions may restrict their content to avery specific syntactic structure, as the tags defined in thisdocument do, or they may define their content more semantically. Anexample for the latter is how tags 40 and 1040 accept multiple ways torepresent arrays[RFC8746].¶

As a matter of convention, many tags do not acceptnull orundefinedvalues as tag content; instead, the expectation is that anull orundefined value can be used in place of the entire tag;Section 3.4.2 provides some further considerations for one specific tag about the handling of this convention in application protocols and in mapping to platform types.¶

Decoders do not need to understand tags of every tag number, and tags may be oflittle value in applications where the implementation creating aparticular CBOR data item and the implementation decoding that streamknow the semantic meaning of each item in the data flow. The primarypurpose of tags in this specification is to define common data types such asdates. A secondary purpose is to provide conversion hints when it isforeseen that the CBOR data item needs to be translated into adifferent format, requiring hints about the content of items.Understanding the semantics of tags isoptional for a decoder; it can simply present both the tag number andthe tag content to the application, without interpreting the additionalsemantics of the tag.¶

A tag applies semantics to the data item it encloses.Tags can nest: if tag A encloses tag B, which encloses data item C,tag A applies to the result of applying tag B on data item C.¶

IANA maintains a registry of tag numbers as described inSection 9.2.Table 5 provides a list of tag numbers that were defined in[RFC7049] with definitions inthe rest of this section.(Tag number 35 was also defined in[RFC7049]; a discussion of thistag number follows inSection 3.4.5.3.)Note that many other tag numbers have been defined since the publication of[RFC7049];see the registry described atSection 9.2 for the complete list.¶

Table 5: Tag Numbers Defined in RFC 7049
Tag	Data Item	Semantics
0	text string	Standard date/time string; see Section 3.4.1
1	integer or float	Epoch-based date/time; see Section 3.4.2
2	byte string	Unsigned bignum; see Section 3.4.3
3	byte string	Negative bignum; see Section 3.4.3
4	array	Decimal fraction; see Section 3.4.4
5	array	Bigfloat; see Section 3.4.4
21	(any)	Expected conversion to base64url encoding; see Section 3.4.5.2
22	(any)	Expected conversion to base64 encoding; see Section 3.4.5.2
23	(any)	Expected conversion to base16 encoding; see Section 3.4.5.2
24	byte string	Encoded CBOR data item; see Section 3.4.5.1
32	text string	URI; see Section 3.4.5.3
33	text string	base64url; see Section 3.4.5.3
34	text string	base64; see Section 3.4.5.3
36	text string	MIME message; see Section 3.4.5.3
55799	(any)	Self-described CBOR; see Section 3.4.6

Conceptually, tags are interpreted in the generic data model, not at (de-)serialization time. A small number of tags (at this time, tag number 25 and tag number 29 [IANA.cbor-tags]) have been registered with semantics that may require processing at (de-)serialization time: the decoder needs to be aware of, and the encoder needs to be in control of, the exact sequence in which data items are encoded into the CBOR data item. This means these tags cannot be implemented on top of an arbitrary generic CBOR encoder/decoder (which might not reflect the serialization order for entries in a map at the data model level and vice versa); their implementation therefore typically needs to be integrated into the generic encoder/decoder. The definition of new tags with this property is NOT RECOMMENDED.¶

IANA allocated tag numbers 65535, 4294967295, and 18446744073709551615 (binary all-ones in 16-bit, 32-bit, and 64-bit). These can be used as a convenience for implementers who want a single-integer data structure to indicate either the presence of a specific tag or absence of a tag. That allocation is described in Section 10 of [CBOR-TAGS]. These tags are not intended to occur in actual CBOR data items; implementations MAY flag such an occurrence as an error.¶

Protocols can extend the generic data model (Section 2) with data items representing points in time by using tag numbers 0 and 1, with arbitrarily sized integers by using tag numbers 2 and 3, and with floating-point values of arbitrary size and precision by using tag numbers 4 and 5.¶

3.4.1.Standard Date/Time String

Tag number 0 contains a text string in the standard format described by the date-time production in [RFC3339], as refined by Section 3.3 of [RFC4287], representing the point in time described there. A nested item of another type or a text string that doesn't match the format described in [RFC4287] is invalid.¶

3.4.2.Epoch-Based Date/Time

Tag number 1 contains a numerical value counting the number of seconds from 1970-01-01T00:00Z in UTC time to the represented point in civil time.¶

The tag content MUST be an unsigned or negative integer (major types 0 and 1) or a floating-point number (major type 7 with additional information 25, 26, or 27). Other contained types are invalid.¶

Nonnegative values (major type 0 and nonnegative floating-point numbers) stand for time values on or after 1970-01-01T00:00Z UTC and are interpreted according to POSIX [TIME_T]. (POSIX time is also known as "UNIX Epoch time".) Leap seconds are handled specially by POSIX time, and this results in a 1-second discontinuity several times per decade. Note that applications that require the expression of times beyond early 2106 cannot leave out support of 64-bit integers for the tag content.¶

Negative values (major type 1 and negative floating-point numbers) are interpreted as determined by the application requirements as there is no universal standard for UTC count-of-seconds time before 1970-01-01T00:00Z (this is particularly true for points in time that precede discontinuities in national calendars). The same applies to non-finite values.¶

To indicate fractional seconds, floating-point values can be used within tag number 1 instead of integer values. Note that this generally requires binary64 support, as binary16 and binary32 provide nonzero fractions of seconds only for a short period of time around early 1970. An application that requires tag number 1 support may restrict the tag content to be an integer (or a floating-point value) only.¶
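The precision limit mentioned above can be observed directly. The following is a minimal sketch using Python's standard `struct` module; the helper `roundtrip` and the sample timestamp are assumptions chosen for illustration, not values taken from this specification.

```python
import struct


def roundtrip(fmt: str, value: float) -> float:
    """Pack a float in the given IEEE 754 format and unpack it again."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


# Hypothetical epoch-based time in late 2023 with half a second of fraction.
timestamp = 1_700_000_000.5

as_binary64 = roundtrip(">d", timestamp)  # 64-bit float
as_binary32 = roundtrip(">f", timestamp)  # 32-bit float

print(as_binary64 == timestamp)  # True
print(as_binary32 == timestamp)  # False
```

At this magnitude, adjacent binary32 values are 128 seconds apart, so the fraction cannot survive; close to the epoch (for example, 0.5 seconds after it) binary32 still represents the value exactly.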

Note that platform types for date/time may include null or undefined values, which may also be desirable at an application protocol level. While emitting tag number 1 values with non-finite tag content values (e.g., with NaN for undefined date/time values or with Infinity for an expiry date that is not set) may seem an obvious way to handle this, using untagged null or undefined avoids the use of non-finites and results in a shorter encoding. Application protocol designers are encouraged to consider these cases and include clear guidelines for handling them.¶

3.4.3.Bignums

Protocols using tag numbers 2 and 3 extend the generic data model (Section 2) with "bignums" representing arbitrarily sized integers. In the basic generic data model, bignum values are not equal to integers from the same model, but the extended generic data model created by this tag definition defines equivalence based on numeric value, and preferred serialization (Section 4.1) never makes use of bignums that also can be expressed as basic integers (see below).¶

Bignums are encoded as a byte string data item, which is interpreted as an unsigned integer n in network byte order. Contained items of other types are invalid. For tag number 2, the value of the bignum is n. For tag number 3, the value of the bignum is -1 - n. The preferred serialization of the byte string is to leave out any leading zeroes (note that this means the preferred serialization for n = 0 is the empty byte string, but see below). Decoders that understand these tags MUST be able to decode bignums that do have leading zeroes. The preferred serialization of an integer that can be represented using major type 0 or 1 is to encode it this way instead of as a bignum (which means that the empty string never occurs in a bignum when using preferred serialization). Note that this means the non-preferred choice of a bignum representation instead of a basic integer for encoding a number is not intended to have application semantics (just as the choice of a longer basic integer representation than needed, such as 0x1800 for 0x00, does not).¶

For example, the number 18446744073709551616 (2⁶⁴) is represented as 0b110_00010 (major type 6, tag number 2), followed by 0b010_01001 (major type 2, length 9), followed by 0x010000000000000000 (one byte 0x01 and eight bytes 0x00). In hexadecimal:¶

C2                        -- Tag 2
   49                     -- Byte string of length 9
      010000000000000000  -- Bytes content
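The bignum rules can be sketched in a few lines. The function names `encode_bignum` and `decode_bignum` are hypothetical, and the sketch makes two simplifying assumptions: it only handles byte strings shorter than 24 bytes (so the length fits in the initial byte), and it always emits a bignum, whereas preferred serialization would use major type 0 or 1 for any value that fits there.

```python
def encode_bignum(value: int) -> bytes:
    """Encode an integer as a tag 2 (unsigned) or tag 3 (negative) bignum."""
    tag_head = 0xC2 if value >= 0 else 0xC3
    n = value if value >= 0 else -1 - value  # tag 3 carries n where value = -1 - n
    content = n.to_bytes((n.bit_length() + 7) // 8, "big")  # no leading zeroes
    if len(content) >= 24:
        raise ValueError("sketch only handles byte strings shorter than 24 bytes")
    return bytes([tag_head, 0x40 | len(content)]) + content


def decode_bignum(data: bytes) -> int:
    """Decode a tag 2 or tag 3 bignum, accepting leading zeroes in the content."""
    if len(data) < 2:
        raise ValueError("too short to hold a tag head and a byte string head")
    tag_head, string_head, content = data[0], data[1], data[2:]
    if tag_head not in (0xC2, 0xC3) or string_head >> 5 != 2:
        raise ValueError("not a tag 2 or tag 3 item enclosing a byte string")
    length = string_head & 0x1F
    if length >= 24 or length != len(content):
        raise ValueError("sketch only handles short, complete byte strings")
    n = int.from_bytes(content, "big")
    return n if tag_head == 0xC2 else -1 - n


print(encode_bignum(2**64).hex())  # c249010000000000000000
```

Decoding 0xc24101 and 0xc2420001 with `decode_bignum` yields 1 in both cases, which illustrates that leading zeroes change the encoding but not the value.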

3.4.4.Decimal Fractions and Bigfloats

Protocols using tag number 4 extend the generic data model with data items representing arbitrary-length decimal fractions of the form m*(10^e). Protocols using tag number 5 extend the generic data model with data items representing arbitrary-length binary fractions of the form m*(2^e). As with bignums, values of different types are not equal in the generic data model.¶

Decimal fractions combine an integer mantissa with a base-10 scaling factor. They are most useful if an application needs the exact representation of a decimal fraction such as 1.1 because there is no exact representation for many decimal fractions in binary floating-point representations.¶

"Bigfloats" combine an integer mantissa with a base-2 scaling factor. They are binary floating-point values that can exceed the range or the precision of the three IEEE 754 formats supported by CBOR (Section 3.3). Bigfloats may also be used by constrained applications that need some basic binary floating-point capability without the need for supporting IEEE 754.¶

A decimal fraction or a bigfloat is represented as a tagged array that contains exactly two integer numbers: an exponent e and a mantissa m. Decimal fractions (tag number 4) use base-10 exponents; the value of a decimal fraction data item is m*(10^e). Bigfloats (tag number 5) use base-2 exponents; the value of a bigfloat data item is m*(2^e). The exponent e MUST be represented in an integer of major type 0 or 1, while the mantissa can also be a bignum (Section 3.4.3). Contained items with other structures are invalid.¶

An example of a decimal fraction is the representation of the number 273.15 as 0b110_00100 (major type 6 for tag, additional information 4 for the tag number), followed by 0b100_00010 (major type 4 for the array, additional information 2 for the length of the array), followed by 0b001_00001 (major type 1 for the first integer, additional information 1 for the value of -2), followed by 0b000_11001 (major type 0 for the second integer, additional information 25 for a two-byte value), followed by 0b0110101010110011 (27315 in two bytes). In hexadecimal:¶

C4             -- Tag 4
   82          -- Array of length 2
      21       -- -2
      19 6ab3  -- 27315

An example of a bigfloat is the representation of the number 1.5 as 0b110_00101 (major type 6 for tag, additional information 5 for the tag number), followed by 0b100_00010 (major type 4 for the array, additional information 2 for the length of the array), followed by 0b001_00000 (major type 1 for the first integer, additional information 0 for the value of -1), followed by 0b000_00011 (major type 0 for the second integer, additional information 3 for the value of 3). In hexadecimal:¶

C5             -- Tag 5
   82          -- Array of length 2
      20       -- -1
      03       -- 3
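The values of the two examples can be checked with exact rational arithmetic. This is a minimal sketch; the function names `decimal_fraction` and `bigfloat` are assumptions for illustration, and the arguments are the already-decoded array elements [e, m] rather than CBOR bytes.

```python
from fractions import Fraction


def decimal_fraction(exponent: int, mantissa: int) -> Fraction:
    """Value of a tag 4 item with content [exponent, mantissa]: m * 10^e."""
    return mantissa * Fraction(10) ** exponent


def bigfloat(exponent: int, mantissa: int) -> Fraction:
    """Value of a tag 5 item with content [exponent, mantissa]: m * 2^e."""
    return mantissa * Fraction(2) ** exponent


print(decimal_fraction(-2, 27315) == Fraction("273.15"))  # True
print(float(bigfloat(-1, 3)))  # 1.5
```

Using `Fraction` rather than `float` keeps the decimal fraction exact, which is the reason the text gives for using tag number 4 in the first place.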

Decimal fractions and bigfloats provide no representation of Infinity, -Infinity, or NaN; if these are needed in place of a decimal fraction or bigfloat, the IEEE 754 half-precision representations from Section 3.3 can be used.¶

3.4.5.Content Hints

The tags in this section are for content hints that might be used by generic CBOR processors. These content hints do not extend the generic data model.¶

3.4.5.1.Encoded CBOR Data Item

Sometimes it is beneficial to carry an embedded CBOR data item that is not meant to be decoded immediately at the time the enclosing data item is being decoded. Tag number 24 (CBOR data item) can be used to tag the embedded byte string as a single data item encoded in CBOR format. Contained items that aren't byte strings are invalid. A contained byte string is valid if it encodes a well-formed CBOR data item; validity checking of the decoded CBOR item is not required for tag validity (but could be offered by a generic decoder as a special option).¶

3.4.5.2.Expected Later Encoding for CBOR-to-JSON Converters

Tag numbers 21 to 23 indicate that a byte string might require a specific encoding when interoperating with a text-based representation. These tags are useful when an encoder knows that the byte string data it is writing is likely to be later converted to a particular JSON-based usage. That usage specifies that some strings are encoded as base64, base64url, and so on. The encoder uses byte strings instead of doing the encoding itself to reduce the message size, to reduce the code size of the encoder, or both. The encoder does not know whether or not the converter will be generic, and therefore wants to say what it believes is the proper way to convert binary strings to JSON.¶

The data item tagged can be a byte string or any other data item. In the latter case, the tag applies to all of the byte string data items contained in the data item, except for those contained in a nested data item tagged with an expected conversion.¶

These three tag numbers suggest conversions to three of the base data encodings defined in [RFC4648]. Tag number 21 suggests conversion to base64url encoding (Section 5 of [RFC4648]) where padding is not used (see Section 3.2 of [RFC4648]); that is, all trailing equals signs ("=") are removed from the encoded string. Tag number 22 suggests conversion to classical base64 encoding (Section 4 of [RFC4648]) with padding as defined in RFC 4648. For both base64url and base64, padding bits are set to zero (see Section 3.5 of [RFC4648]), and the conversion to alternate encoding is performed on the contents of the byte string (that is, without adding any line breaks, whitespace, or other additional characters). Tag number 23 suggests conversion to base16 (hex) encoding with uppercase alphabetics (see Section 8 of [RFC4648]). Note that, for all three tag numbers, the encoding of the empty byte string is the empty text string.¶
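A converter could apply the three suggested conversions roughly as follows. This is a sketch built on Python's standard `base64` module; the function name `expected_conversion` is an assumption for illustration, and a real converter would also have to walk nested data items as described above.

```python
import base64


def expected_conversion(tag_number: int, data: bytes) -> str:
    """Convert a byte string to text as suggested by tag number 21, 22, or 23."""
    if tag_number == 21:
        # base64url with all trailing "=" padding characters removed
        return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")
    if tag_number == 22:
        # classical base64, padding kept
        return base64.b64encode(data).decode("ascii")
    if tag_number == 23:
        # base16 with uppercase alphabetics
        return base64.b16encode(data).decode("ascii")
    raise ValueError("not an expected-conversion tag number")


sample = bytes.fromhex("fbff")
print(expected_conversion(21, sample))  # -_8
print(expected_conversion(22, sample))  # +/8=
print(expected_conversion(23, sample))  # FBFF
```

The sample bytes were picked so that the two base64 alphabets differ in the output; for all three tag numbers, the empty byte string converts to the empty text string.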

3.4.5.3.Encoded Text

Some text strings hold data that have formats widely used on the Internet, and sometimes those formats can be validated and presented to the application in appropriate form by the decoder. There are tags for some of these formats.¶

Tag number 32 is for URIs, as defined in [RFC3986]. If the text string doesn't match the URI-reference production, the string is invalid.¶
Tag numbers 33 and 34 are for base64url- and base64-encoded text strings, respectively, as defined in [RFC4648]. If any of the following apply:¶
- the encoded text string contains non-alphabet characters or only 1 alphabet character in the last block of 4 (where alphabet is defined by Section 5 of [RFC4648] for tag number 33 and Section 4 of [RFC4648] for tag number 34), or¶
- the padding bits in a 2- or 3-character block are not 0, or¶
- the base64 encoding has the wrong number of padding characters, or¶
- the base64url encoding has padding characters,¶
the string is invalid.¶
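The four conditions for tag numbers 33 and 34 can be turned into a small checker. This is a sketch under stated assumptions: the function name `is_valid_base_text` is hypothetical, and the code only tests the conditions listed above rather than decoding the data.

```python
import string

# Alphabets from Section 4 (base64) and Section 5 (base64url) of RFC 4648.
B64_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"
B64URL_ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "-_"


def is_valid_base_text(text: str, tag_number: int) -> bool:
    """Check the content of tag 33 (base64url) or tag 34 (base64)."""
    if tag_number not in (33, 34):
        raise ValueError("only tag numbers 33 and 34 are handled")
    alphabet = B64URL_ALPHABET if tag_number == 33 else B64_ALPHABET
    body = text.rstrip("=")
    padding = len(text) - len(body)
    if any(ch not in alphabet for ch in body):
        return False  # non-alphabet character (including "=" before the end)
    tail = len(body) % 4  # alphabet characters in the last, incomplete block
    if tail == 1:
        return False  # only 1 alphabet character in the last block of 4
    if tag_number == 33 and padding != 0:
        return False  # base64url must not carry padding characters
    if tag_number == 34 and padding != (4 - tail) % 4:
        return False  # base64 has the wrong number of padding characters
    if tail == 2 and alphabet.index(body[-1]) & 0x0F:
        return False  # 4 padding bits of a 2-character block must be 0
    if tail == 3 and alphabet.index(body[-1]) & 0x03:
        return False  # 2 padding bits of a 3-character block must be 0
    return True


print(is_valid_base_text("AQIDBA", 33))    # True
print(is_valid_base_text("AQIDBA==", 33))  # False
```

The second call fails because tag number 33 does not allow padding characters, while the same string is valid content for tag number 34.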
Tag number 36 is for MIME messages (including all headers), as defined in [RFC2045]. A text string that isn't a valid MIME message is invalid. (For this tag, validity checking may be particularly onerous for a generic decoder and might therefore not be offered. Note that many MIME messages are general binary data and therefore cannot be represented in a text string; [IANA.cbor-tags] lists a registration for tag number 257 that is similar to tag number 36 but uses a byte string as its tag content.)¶

Note that tag numbers 33 and 34 differ from 21 and 22 in that the data is transported in base-encoded form for the former and in raw byte string form for the latter.¶

[RFC7049] also defined a tag number 35 for regular expressions that are in Perl Compatible Regular Expressions (PCRE/PCRE2) form [PCRE] or in JavaScript regular expression syntax [ECMA262]. The state of the art in these regular expression specifications has since advanced and is continually advancing, so this specification does not attempt to update the references. Instead, this tag remains available (as registered in [RFC7049]) for applications that specify the particular regular expression variant they use out-of-band (possibly by limiting the usage to a defined common subset of both PCRE and ECMA262). As this specification clarifies tag validity beyond [RFC7049], we note that due to the open way the tag was defined in [RFC7049], any contained string value needs to be valid at the CBOR tag level (but then may not be "expected" at the application level).¶

3.4.6.Self-Described CBOR

In many applications, it will be clear from the context that CBOR is being employed for encoding a data item. For instance, a specific protocol might specify the use of CBOR, or a media type is indicated that specifies its use. However, there may be applications where such context information is not available, such as when CBOR data is stored in a file that does not have disambiguating metadata. Here, it may help to have some distinguishing characteristics for the data itself.¶

Tag number 55799 is defined for this purpose, specifically for use at the start of a stored encoded CBOR data item as specified by an application. It does not impart any special semantics on the data item that it encloses; that is, the semantics of the tag content enclosed in tag number 55799 are exactly identical to the semantics of the tag content itself.¶

The serialization of this tag's head is 0xd9d9f7, which does not appear to be in use as a distinguishing mark for any frequently used file types. In particular, 0xd9d9f7 is not a valid start of a Unicode text in any Unicode encoding if it is followed by a valid CBOR data item.¶

For instance, a decoder might be able to decode both CBOR and JSON. Such a decoder would need to mechanically distinguish the two formats. An easy way for an encoder to help the decoder would be to tag the entire CBOR item with tag number 55799, the serialization of which will never be found at the beginning of a JSON text.¶
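A decoder that accepts both formats might sniff the first three bytes. The sketch below is an assumption about how such a check could look; the names `SELF_DESCRIBED_HEAD` and `looks_like_self_described_cbor` are hypothetical, and the check is only meaningful for data whose encoder actually added the tag.

```python
# Head of tag number 55799: major type 6, additional information 25, 0xd9f7.
SELF_DESCRIBED_HEAD = bytes.fromhex("d9d9f7")


def looks_like_self_described_cbor(data: bytes) -> bool:
    """Return True if the data starts with the head of tag number 55799."""
    return data.startswith(SELF_DESCRIBED_HEAD)


cbor_input = SELF_DESCRIBED_HEAD + bytes.fromhex("a0")  # tagged empty map
json_input = b'{"a": 1}'

print(looks_like_self_described_cbor(cbor_input))  # True
print(looks_like_self_described_cbor(json_input))  # False
```

Data that fails the check is not necessarily JSON; it may simply be CBOR that was stored without the tag.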

4.Serialization Considerations

4.1.Preferred Serialization

For some values at the data model level, CBOR provides multiple serializations. For many applications, it is desirable that an encoder always chooses a preferred serialization (preferred encoding); however, the present specification does not put the burden of enforcing this preference on either the encoder or decoder.¶

Some constrained decoders may be limited in their ability to decode non-preferred serializations: for example, if only integers below 1_000_000_000 (one billion) are expected in an application, the decoder may leave out the code that would be needed to decode 64-bit arguments in integers. An encoder that always uses preferred serialization ("preferred encoder") interoperates with this decoder for the numbers that can occur in this application. Generally speaking, a preferred encoder is more universally interoperable (and also less wasteful) than one that, say, always uses 64-bit integers.¶

Similarly, a constrained encoder may be limited in the variety of representation variants it supports such that it does not emit preferred serializations ("variant encoder"). For instance, a constrained encoder could be designed to always use the 32-bit variant for an integer that it encodes even if a short representation is available (assuming that there is no application need for integers that can only be represented with the 64-bit variant). A decoder that does not rely on receiving only preferred serializations ("variation-tolerant decoder") can therefore be said to be more universally interoperable (it might very well optimize for the case of receiving preferred serializations, though). Full implementations of CBOR decoders are by definition variation tolerant; the distinction is only relevant if a constrained implementation of a CBOR decoder meets a variant encoder.¶

The preferred serialization always uses the shortest form of representing the argument (Section 3); it also uses the shortest floating-point encoding that preserves the value being encoded.¶

The preferred serialization for a floating-point value is the shortest floating-point encoding that preserves its value, e.g., 0xf94580 for the number 5.5, and 0xfa45ad9c00 for the number 5555.5. For NaN values, a shorter encoding is preferred if zero-padding the shorter significand towards the right reconstitutes the original NaN value (for many applications, the single NaN encoding 0xf97e00 will suffice).¶
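One way to pick the shortest value-preserving float encoding is to try the narrower formats first and keep the first one that round-trips. The sketch below uses Python's `struct` module; the function name `encode_float` is hypothetical, and the sketch assumes that NaN payloads do not need to be preserved, so every NaN becomes 0xf97e00.

```python
import math
import struct


def encode_float(value: float) -> bytes:
    """Encode a float using the shortest IEEE 754 format that preserves it."""
    if math.isnan(value):
        # Assumption: payloads and signaling NaNs are not of interest.
        return bytes.fromhex("f97e00")
    # Try binary16 (0xf9), then binary32 (0xfa); fall back to binary64 (0xfb).
    for initial_byte, fmt in ((0xF9, ">e"), (0xFA, ">f")):
        try:
            packed = struct.pack(fmt, value)
        except OverflowError:
            continue  # magnitude too large for this format
        if struct.unpack(fmt, packed)[0] == value:
            return bytes([initial_byte]) + packed
    return bytes([0xFB]) + struct.pack(">d", value)


print(encode_float(5.5).hex())     # f94580
print(encode_float(5555.5).hex())  # fa45ad9c00
print(encode_float(1.1).hex())     # fb3ff199999999999a
```

The value 1.1 needs all 64 bits because neither shorter format reproduces it exactly, while positive and negative Infinity fit in binary16.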

Definite-length encoding is preferred whenever the length is known at the time the serialization of the item starts.¶

4.2.Deterministically Encoded CBOR

Some protocols may want encoders to only emit CBOR in a particular deterministic format; those protocols might also have the decoders check that their input is in that deterministic format. Those protocols are free to define what they mean by a "deterministic format" and what encoders and decoders are expected to do. This section defines a set of restrictions that can serve as the base of such a deterministic format.¶

4.2.1.Core Deterministic Encoding Requirements

A CBOR encoding satisfies the "core deterministic encoding requirements" if it satisfies the following restrictions:¶

- Preferred serialization MUST be used. In particular, this means that arguments (see Section 3) for integers, lengths in major types 2 through 5, and tags MUST be as short as possible, for instance:¶
  - 0 to 23 and -1 to -24 MUST be expressed in the same byte as the major type;¶
  - 24 to 255 and -25 to -256 MUST be expressed only with an additional uint8_t;¶
  - 256 to 65535 and -257 to -65536 MUST be expressed only with an additional uint16_t;¶
  - 65536 to 4294967295 and -65537 to -4294967296 MUST be expressed only with an additional uint32_t.¶
  Floating-point values also MUST use the shortest form that preserves the value, e.g., 1.5 is encoded as 0xf93e00 (binary16) and 1000000.5 as 0xfa49742408 (binary32). (One implementation of this is to have all floats start as a 64-bit float, then do a test conversion to a 32-bit float; if the result is the same numeric value, use the shorter form and repeat the process with a test conversion to a 16-bit float. This also works to select 16-bit float for positive and negative Infinity.)¶
- Indefinite-length items MUST NOT appear. They can be encoded as definite-length items instead.¶
- The keys in every map MUST be sorted in the bytewise lexicographic order of their deterministic encodings. For example, the following keys are sorted correctly:¶
  1. 10, encoded as 0x0a.¶
  2. 100, encoded as 0x1864.¶
  3. -1, encoded as 0x20.¶
  4. "z", encoded as 0x617a.¶
  5. "aa", encoded as 0x626161.¶
  6. [100], encoded as 0x811864.¶
  7. [-1], encoded as 0x8120.¶
  8. false, encoded as 0xf4.¶

Implementation note: the self-delimiting nature of the CBOR encoding means that there are no two well-formed CBOR encoded data items where one is a prefix of the other. The bytewise lexicographic comparison of deterministic encodings of different map keys therefore always ends in a position where the byte differs between the keys, before the end of a key is reached.¶
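The key ordering of the core requirements amounts to a plain bytewise sort of the encoded keys. The sketch below starts from the already-encoded example keys in shuffled order; the variable names are assumptions for illustration. Python compares `bytes` objects lexicographically byte by byte, which is the comparison required here.

```python
# (label, hex encoding) pairs for the example keys, deliberately out of order.
labelled_keys = [
    ("false", "f4"),
    ('"aa"', "626161"),
    ("-1", "20"),
    ("[100]", "811864"),
    ("10", "0a"),
    ('"z"', "617a"),
    ("[-1]", "8120"),
    ("100", "1864"),
]

# Sort by the encoded bytes only; the labels are carried along for display.
core_order = sorted(labelled_keys, key=lambda item: bytes.fromhex(item[1]))

print([label for label, _ in core_order])
# ['10', '100', '-1', '"z"', '"aa"', '[100]', '[-1]', 'false']
```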

4.2.2.Additional Deterministic Encoding Considerations

CBOR tags present additional considerations for deterministic encoding. If a CBOR-based protocol were to provide the same semantics for the presence and absence of a specific tag (e.g., by allowing both tag 1 data items and raw numbers in a date/time position, treating the latter as if they were tagged), the deterministic format would not allow the presence of the tag, based on the "shortest form" principle. For example, a protocol might give encoders the choice of representing a URL as either a text string or, using Section 3.4.5.3, tag number 32 containing a text string. This protocol's deterministic encoding needs either to require that the tag is present or to require that it is absent, not allow either one.¶

In a protocol that does require tags in certain places to obtain specific semantics, the tag needs to appear in the deterministic format as well. Deterministic encoding considerations also apply to the content of tags.¶

If a protocol includes a field that can express integers with an absolute value of 2⁶⁴ or larger using tag numbers 2 or 3 (Section 3.4.3), the protocol's deterministic encoding needs to specify whether smaller integers are also expressed using these tags or using major types 0 and 1. Preferred serialization uses the latter choice, which is therefore recommended.¶

Protocols that include floating-point values, whether represented using basic floating-point values (Section 3.3) or using tags (or both), may need to define extra requirements on their deterministic encodings, such as:¶

- Although IEEE floating-point values can represent both positive and negative zero as distinct values, the application might not distinguish these and might decide to represent all zero values with a positive sign, disallowing negative zero. (The application may also want to restrict the precision of floating-point values in such a way that there is never a need to represent 64-bit -- or even 32-bit -- floating-point values.)¶
- If a protocol includes a field that can express floating-point values, with a specific data model that declares integer and floating-point values to be interchangeable, the protocol's deterministic encoding needs to specify whether, for example, the integral value 1.0 is encoded as 0x01 (unsigned integer), 0xf93c00 (binary16), 0xfa3f800000 (binary32), or 0xfb3ff0000000000000 (binary64). Example rules for this are:¶
  1. Encode integral values that fit in 64 bits as values from major types 0 and 1, and other values as the preferred (smallest of 16-, 32-, or 64-bit) floating-point representation that accurately represents the value,¶
  2. Encode all values as the preferred floating-point representation that accurately represents the value, even for integral values, or¶
  3. Encode all values as 64-bit floating-point representations.¶
  Rule 1 straddles the boundaries between integers and floating-point values, and Rule 3 does not use preferred serialization, so Rule 2 may be a good choice in many cases.¶
- If NaN is an allowed value, and there is no intent to support NaN payloads or signaling NaNs, the protocol needs to pick a single representation, typically 0xf97e00. If that simple choice is not possible, specific attention will be needed for NaN handling.¶
- Subnormal numbers (nonzero numbers with the lowest possible exponent of a given IEEE 754 number format) may be flushed to zero outputs or be treated as zero inputs in some floating-point implementations. A protocol's deterministic encoding may want to specifically accommodate such implementations while creating an onus on other implementations by excluding subnormal numbers from interchange, interchanging zero instead.¶
- The same number can be represented by different decimal fractions, by different bigfloats, and by different forms under other tags that may be defined to express numeric values. Depending on the implementation, it may not always be practical to determine whether any of these forms (or forms in the basic generic data model) are equivalent. An application protocol that presents choices of this kind for the representation format of numbers needs to be explicit about how the formats for deterministic encoding are to be chosen.¶

4.2.3.Length-First Map Key Ordering

The core deterministic encoding requirements (Section 4.2.1) sort map keys in a different order from the one suggested by Section 3.9 of [RFC7049] (called "Canonical CBOR" there). Protocols that need to be compatible with the order specified in [RFC7049] can instead be specified in terms of this specification's "length-first core deterministic encoding requirements":¶

A CBOR encoding satisfies the "length-first core deterministic encoding requirements" if it satisfies the core deterministic encoding requirements except that the keys in every map MUST be sorted such that:¶

1. If two keys have different lengths, the shorter one sorts earlier;¶
2. If two keys have the same length, the one with the lower value in (bytewise) lexical order sorts earlier.¶

For example, under the length-first core deterministic encoding requirements, the following keys are sorted correctly:¶

1. 10, encoded as 0x0a.¶
2. -1, encoded as 0x20.¶
3. false, encoded as 0xf4.¶
4. 100, encoded as 0x1864.¶
5. "z", encoded as 0x617a.¶
6. [-1], encoded as 0x8120.¶
7. "aa", encoded as 0x626161.¶
8. [100], encoded as 0x811864.¶
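The length-first ordering differs from the core ordering only in its sort key: the length of the encoded key is compared first, and the bytes are compared only for equal lengths. A minimal sketch, again starting from already-encoded keys and using variable names that are assumptions for illustration:

```python
# (label, hex encoding) pairs for the example keys, deliberately out of order.
labelled_keys = [
    ("[100]", "811864"),
    ("100", "1864"),
    ("false", "f4"),
    ('"aa"', "626161"),
    ("10", "0a"),
    ("[-1]", "8120"),
    ("-1", "20"),
    ('"z"', "617a"),
]


def length_first_key(item):
    """Sort key: encoded length first, then bytewise lexical order."""
    encoded = bytes.fromhex(item[1])
    return (len(encoded), encoded)


length_first_order = sorted(labelled_keys, key=length_first_key)

print([label for label, _ in length_first_order])
# ['10', '-1', 'false', '100', '"z"', '[-1]', '"aa"', '[100]']
```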

Although [RFC7049] used the term "Canonical CBOR" for its form of requirements on deterministic encoding, this document avoids this term because "canonicalization" is often associated with specific uses of deterministic encoding only. The terms are essentially interchangeable, however, and the set of core requirements in this document could also be called "Canonical CBOR", while the length-first-ordered version of that could be called "Old Canonical CBOR".¶

5.Creating CBOR-Based Protocols

Data formats such as CBOR are often used in environments where there is no format negotiation. A specific design goal of CBOR is to not need any included or assumed schema: a decoder can take a CBOR item and decode it with no other knowledge.¶

Of course, in real-world implementations, the encoder and the decoder will have a shared view of what should be in a CBOR data item. For example, an agreed-to format might be "the item is an array whose first value is a UTF-8 string, second value is an integer, and subsequent values are zero or more floating-point numbers" or "the item is a map that has byte strings for keys and contains a pair whose key is 0xab01".¶

CBOR-based protocols MUST specify how their decoders handle invalid and other unexpected data. CBOR-based protocols MAY specify that they treat arbitrary valid data as unexpected. Encoders for CBOR-based protocols MUST produce only valid items, that is, the protocol cannot be designed to make use of invalid items. An encoder can be capable of encoding as many or as few types of values as is required by the protocol in which it is used; a decoder can be capable of understanding as many or as few types of values as is required by the protocols in which it is used. This lack of restrictions allows CBOR to be used in extremely constrained environments.¶

The rest of this section discusses some considerations in creating CBOR-based protocols. With few exceptions, it is advisory only and explicitly excludes any language from BCP 14 [RFC2119] [RFC8174] other than words that could be interpreted as "MAY" in the sense of BCP 14. The exceptions aim at facilitating interoperability of CBOR-based protocols while making use of a wide variety of both generic and application-specific encoders and decoders.¶

5.1.CBOR in Streaming Applications

In a streaming application, a data stream may be composed of a sequence of CBOR data items concatenated back-to-back. In such an environment, the decoder immediately begins decoding a new data item if data is found after the end of a previous data item.¶

Not all of the bytes making up a data item may be immediately available to the decoder; some decoders will buffer additional data until a complete data item can be presented to the application. Other decoders can present partial information about a top-level data item to an application, such as the nested data items that could already be decoded, or even parts of a byte string that hasn't completely arrived yet. Such an application also MUST have a matching streaming security mechanism, where the desired protection is available for incremental data presented to the application.¶

Note that some applications and protocols will not want to use indefinite-length encoding. Using indefinite-length encoding allows an encoder to not need to marshal all the data for counting, but it requires a decoder to allocate increasing amounts of memory while waiting for the end of the item. This might be fine for some applications but not others.¶

5.2.Generic Encoders and Decoders

A generic CBOR decoder can decode all well-formed encoded CBOR data items and present the data items to an application. See Appendix C. (The diagnostic notation, Section 8, may be used to present well-formed CBOR values to humans.)¶

Generic CBOR encoders provide an application interface that allows the application to specify any well-formed value to be encoded as a CBOR data item, including simple values and tags unknown to the encoder.¶

Even though CBOR attempts to minimize these cases, not all well-formed CBOR data is valid: for example, the encoded text string 0x62c0ae does not contain valid UTF-8 (because [RFC3629] requires always using the shortest form) and so is not a valid CBOR item. Also, specific tags may make semantic constraints that may be violated, for instance, by a bignum tag enclosing another tag or by an instance of tag number 0 containing a byte string or containing a text string with contents that do not match the date-time production of [RFC3339]. There is no requirement that generic encoders and decoders make unnatural choices for their application interface to enable the processing of invalid data. Generic encoders and decoders are expected to forward simple values and tags even if their specific codepoints are not registered at the time the encoder/decoder is written (Section 5.4).¶
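The 0x62c0ae example can be reproduced with a short Python sketch. The function name `decode_short_text` is hypothetical, and the sketch handles only text strings shorter than 24 bytes. It relies on Python's strict UTF-8 decoder, which rejects non-shortest forms.

```python
def decode_short_text(encoded: bytes) -> str:
    """Decode a text string whose length fits in the initial byte."""
    initial = encoded[0]
    major, info = initial >> 5, initial & 0x1F
    if major != 3 or info >= 24:
        raise ValueError("sketch handles short text strings only")
    payload = encoded[1:1 + info]
    if len(payload) != info:
        raise ValueError("truncated item")  # not even well-formed
    # Well-formed from here on; validity depends on the UTF-8 check.
    return payload.decode("utf-8")  # raises UnicodeDecodeError if invalid
```

Here 0x62c3a9 decodes to "é", whereas 0x62c0ae is well-formed (a two-byte text string) but raises `UnicodeDecodeError`.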

5.3.Validity of Items

A well-formed but invalid CBOR data item (Section 1.2) presents a problem with interpreting the data encoded in it in the CBOR data model. A CBOR-based protocol could be specified in several layers, in which the lower layers don't process the semantics of some of the CBOR data they forward. These layers can't notice any validity errors in data they don't process and MUST forward that data as-is. The first layer that does process the semantics of an invalid CBOR item MUST pick one of two choices:¶

Replace the problematic item with an error marker and continue with the next item, or¶
Issue an error and stop processing altogether.¶

A CBOR-based protocol MUST specify which of these options its decoders take for each kind of invalid item they might encounter.¶

Such problems might occur at the basic validity level of CBOR or in the context of tags (tag validity).¶

5.3.1.Basic validity

Two kinds of validity errors can occur in the basic generic data model:¶

Duplicate keys in a map:: Generic decoders (Section 5.2) make data available to applications using the native CBOR data model. That data model includes maps (key-value mappings with unique keys), not multimaps (key-value mappings where multiple entries can have the same key). Thus, a generic decoder that gets a CBOR map item that has duplicate keys will decode to a map with only one instance of that key, or it might stop processing altogether. On the other hand, a "streaming decoder" may not even be able to notice. See Section 5.6 for more discussion of keys in maps.¶
Invalid UTF-8 string:: A decoder might or might not want to verify that the sequence of bytes in a UTF-8 string (major type 3) is actually valid UTF-8 and react appropriately.¶

5.3.2.Tag validity

Two additional kinds of validity errors are introduced by adding tags to the basic generic data model:¶

Inadmissible type for tag content:: Tag numbers (Section 3.4) specify what type of data item is supposed to be used as their tag content; for example, the tag numbers for unsigned or negative bignums are supposed to be put on byte strings. A decoder that decodes the tagged data item into a native representation (a native big integer in this example) is expected to check the type of the data item being tagged. Even decoders that don't have such native representations available in their environment may perform the check on those tags known to them and react appropriately.¶
Inadmissible value for tag content:: The type of data item may be admissible for a tag's content, but the specific value may not be; e.g., a value of "yesterday" is not acceptable for the content of tag 0, even though it properly is a text string. A decoder that normally ingests such tags into equivalent platform types might present this tag to the application in a similar way to how it would present a tag with an unknown tag number (Section 5.4).¶
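The two kinds of tag validity errors can be told apart as in the following Python sketch. The `Tag` class and `check_tag` function are hypothetical. The regular expression only approximates the date-time production of [RFC3339] (it checks syntax, not field ranges), and only tags 0, 2, and 3 are covered.

```python
import re
from dataclasses import dataclass
from typing import Any


@dataclass
class Tag:
    """Hypothetical in-memory form of a decoded tagged data item."""
    number: int
    content: Any


# Approximation of the RFC 3339 date-time syntax; ranges are not checked.
_DATE_TIME = re.compile(
    r"\d{4}-\d{2}-\d{2}[Tt]\d{2}:\d{2}:\d{2}(\.\d+)?([Zz]|[+-]\d{2}:\d{2})"
)


def check_tag(tag: Tag) -> str:
    """Classify a tag as ok, wrong content type, wrong value, or unknown."""
    if tag.number in (2, 3):  # bignums are put on byte strings
        return "ok" if isinstance(tag.content, bytes) else "inadmissible type"
    if tag.number == 0:  # date/time string
        if not isinstance(tag.content, str):
            return "inadmissible type"
        if _DATE_TIME.fullmatch(tag.content):
            return "ok"
        return "inadmissible value"
    return "unknown tag"
```

With this sketch, `Tag(0, "yesterday")` is reported as an inadmissible value, while `Tag(0, b"2013")` and `Tag(2, "x")` are reported as inadmissible types.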

5.4.Validity and Evolution

A decoder with validity checking will expend the effort to reliably detect data items with validity errors. For example, such a decoder needs to have an API that reports an error (and does not return data) for a CBOR data item that contains any of the validity errors listed in the previous subsection.¶

The set of tags defined in the "Concise Binary Object Representation (CBOR) Tags" registry (Section 9.2), as well as the set of simple values defined in the "Concise Binary Object Representation (CBOR) Simple Values" registry (Section 9.1), can grow at any time beyond the set understood by a generic decoder. A validity-checking decoder can do one of two things when it encounters such a case that it does not recognize:¶

It can report an error (and not return data). Note that treating this case as an error can cause ossification and is thus not encouraged. This error is not a validity error, per se. This kind of error is more likely to be raised by a decoder that would be performing validity checking if this were a known case.¶
It can emit the unknown item (type, value, and, for tags, the decoded tagged data item) to the application calling the decoder, and then give the application an indication that the decoder did not recognize that tag number or simple value.¶

The latter approach, which is also appropriate for decoders that do not support validity checking, provides forward compatibility with newly registered tags and simple values without the requirement to update the encoder at the same time as the calling application. (For this, the decoder's API needs the ability to mark unknown items so that the calling application can handle them in a manner appropriate for the program.)¶
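One possible shape for such an API is sketched below in Python. The `UnknownTag` marker type, the `KNOWN_TAGS` table, and `present_tag` are hypothetical names. Only the two bignum tags are treated as known, to keep the example short.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass
class UnknownTag:
    """Marker handed to the application for a tag the decoder does not know."""
    number: int
    content: Any


# Hypothetical table of tags this decoder converts to native values.
KNOWN_TAGS: Dict[int, Callable[[Any], Any]] = {
    2: lambda content: int.from_bytes(content, "big"),       # unsigned bignum
    3: lambda content: -1 - int.from_bytes(content, "big"),  # negative bignum
}


def present_tag(number: int, content: Any) -> Any:
    """Convert known tags; forward unknown ones, marked as such."""
    handler = KNOWN_TAGS.get(number)
    if handler is None:
        return UnknownTag(number, content)  # no error, no data loss
    return handler(content)
```

The application can then test for `UnknownTag` and decide for itself whether the tag number is one it understands, even when the decoder predates the tag's registration.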

Since some of the processing needed for validity checking may have an appreciable cost (in particular with duplicate detection for maps), support of validity checking is not a requirement placed on all CBOR decoders.¶

Some encoders will rely on their applications to provide input data in such a way that valid CBOR results from the encoder. A generic encoder may also want to provide a validity-checking mode where it reliably limits its output to valid CBOR, independent of whether or not its application is indeed providing API-conformant data.¶

5.5.Numbers

CBOR-based protocols should take into account that different language environments pose different restrictions on the range and precision of numbers that are representable. For example, the basic JavaScript number system treats all numbers as floating-point values, which may result in the silent loss of precision in decoding integers with more than 53 significant bits. Another example is that, since CBOR keeps the sign bit for its integer representation in the major type, it has one bit more for signed numbers of a certain length (e.g., -2⁶⁴..2⁶⁴-1 for 1+8-byte integers) than the typical platform signed integer representation of the same length (-2⁶³..2⁶³-1 for 8-byte int64_t). A protocol that uses numbers should define its expectations on the handling of nontrivial numbers in decoders and receiving applications.¶
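The ranges mentioned above can be compared directly in a short Python sketch. The function name `describe_integer` and its result keys are hypothetical. Python integers are unbounded, so the limits are spelled out as constants.

```python
INT64_MIN, INT64_MAX = -(2**63), 2**63 - 1        # 8-byte int64_t
CBOR_INT_MIN, CBOR_INT_MAX = -(2**64), 2**64 - 1  # 1+8-byte CBOR integers


def describe_integer(n: int) -> dict:
    """Report which of the representations discussed above can hold n."""
    return {
        "cbor_1_plus_8": CBOR_INT_MIN <= n <= CBOR_INT_MAX,
        "int64": INT64_MIN <= n <= INT64_MAX,
        # Integers of magnitude below 2**53 survive a trip through binary64.
        "safe_in_binary64": -(2**53) < n < 2**53,
    }


# Silent precision loss: 2**53 + 1 rounds to 2**53 as a binary64 value.
lossy = float(2**53 + 1) == float(2**53)
```

For instance, 2⁶³ fits a 1+8-byte CBOR unsigned integer but neither an int64_t nor the exact-integer range of binary64.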

A CBOR-based protocol that includes floating-point numbers can restrict which of the three formats (half-precision, single-precision, and double-precision) are to be supported. For an integer-only application, a protocol may want to completely exclude the use of floating-point values.¶

A CBOR-based protocol designed for compactness may want to exclude specific integer encodings that are longer than necessary for the application, such as to save the need to implement 64-bit integers. There is an expectation that encoders will use the most compact integer representation that can represent a given value. However, a compact application that does not require deterministic encoding should accept values that use a longer-than-needed encoding (such as encoding "0" as 0b000_11001 followed by two bytes of 0x00) as long as the application can decode an integer of the given size. Similar considerations apply to floating-point values; decoding both preferred serializations and longer-than-needed ones is recommended.¶
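A minimal Python sketch of this asymmetry, restricted to unsigned integers (major type 0): the encoder always emits the shortest form, while the decoder accepts longer forms and merely reports whether the input was the shortest one. The names `encode_uint` and `decode_uint` are hypothetical.

```python
def encode_uint(n: int) -> bytes:
    """Encode an unsigned integer using the most compact representation."""
    if n < 0:
        raise ValueError("sketch handles unsigned integers only")
    if n < 24:
        return bytes([n])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if n < (1 << (8 * size)):
            return bytes([info]) + n.to_bytes(size, "big")
    raise ValueError("value does not fit in 64 bits")


def decode_uint(data: bytes) -> tuple:
    """Return (value, was_shortest) for a major type 0 item."""
    initial = data[0]
    if initial >> 5 != 0:
        raise ValueError("not an unsigned integer")
    info = initial & 0x1F
    if info < 24:
        return info, True
    if info > 27:
        raise ValueError("not well-formed")
    size = 1 << (info - 24)
    if len(data) < 1 + size:
        raise ValueError("truncated item")
    value = int.from_bytes(data[1:1 + size], "big")
    # Longer-than-needed encodings are accepted, just flagged.
    return value, encode_uint(value) == data[:1 + size]
```

The example from the text, 0x19 0x00 0x00, decodes to 0 with the flag set to false; a protocol that requires deterministic encoding could reject the item on that basis.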

CBOR-based protocols for constrained applications that provide a choice between representing a specific number as an integer and as a decimal fraction or bigfloat (such as when the exponent is small and nonnegative) might express a quality-of-implementation expectation that the integer representation is used directly.¶

5.6.Specifying Keys for Maps

The encoding and decoding applications need to agree on what types of keys are going to be used in maps. In applications that need to interwork with JSON-based applications, conversion is simplified by limiting keys to text strings only; otherwise, there has to be a specified mapping from the other CBOR types to text strings, and this often leads to implementation errors. In applications where keys are numeric in nature, and numeric ordering of keys is important to the application, directly using the numbers for the keys is useful.¶

If multiple types of keys are to be used, consideration should be given to how these types would be represented in the specific programming environments that are to be used. For example, in JavaScript Maps [ECMA262], a key of integer 1 cannot be distinguished from a key of floating-point 1.0. This means that, if integer keys are used, the protocol needs to avoid the use of floating-point keys the values of which happen to be integer numbers in the same map.¶

Decoders that deliver data items nested within a CBOR data item immediately on decoding them ("streaming decoders") often do not keep the state that is necessary to ascertain uniqueness of a key in a map. Similarly, an encoder that can start encoding data items before the enclosing data item is completely available ("streaming encoder") may want to reduce its overhead significantly by relying on its data source to maintain uniqueness.¶

A CBOR-based protocol MUST define what to do when a receiving application sees multiple identical keys in a map. The resulting rule in the protocol MUST respect the CBOR data model: it cannot prescribe a specific handling of the entries with the identical keys, except that it might have a rule that having identical keys in a map indicates a malformed map and that the decoder has to stop with an error. When processing maps that exhibit entries with duplicate keys, a generic decoder might do one of the following:¶

Not accept maps with duplicate keys (that is, enforce validity for maps, see also Section 5.4). These generic decoders are universally useful. An application may still need to perform its own duplicate checking based on application rules (for instance, if the application equates integers and floating-point values in map key positions for specific maps).¶
Pass all map entries to the application, including ones with duplicate keys. This requires that the application handle (check against) duplicate keys, even if the application rules are identical to the generic data model rules.¶
Lose some entries with duplicate keys, e.g., deliver only the final (or first) entry out of the entries with the same key. With such a generic decoder, applications may get different results for a specific key on different runs, and with different generic decoders, which value is returned is based on generic decoder implementation and the actual order of keys in the map. In particular, applications cannot validate key uniqueness on their own as they do not necessarily see all entries; they may not be able to use such a generic decoder if they need to validate key uniqueness. These generic decoders can only be used in situations where the data source and transfer always provide valid maps; this is not possible if the data source and transfer can be attacked.¶

Generic decoders need to document which of these three approaches they implement.¶
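The three approaches can be contrasted in a small Python sketch. The function `build_map` and its mode names are hypothetical, and the sketch assumes that all keys are text strings, so that Python equality coincides with key equivalence in the generic data model.

```python
def build_map(pairs, mode: str):
    """Turn decoded (key, value) pairs into an application-facing map.

    mode "reject":    refuse maps with duplicate keys (enforce validity)
    mode "pass_all":  hand every entry to the application, duplicates included
    mode "last_wins": silently keep only the final entry for each key
    """
    if mode not in ("reject", "pass_all", "last_wins"):
        raise ValueError("unknown mode")
    if mode == "pass_all":
        return list(pairs)  # the application must check for duplicates
    result = {}
    for key, value in pairs:
        if mode == "reject" and key in result:
            raise ValueError(f"duplicate map key: {key!r}")
        result[key] = value  # in "last_wins", earlier entries are lost
    return result
```

Only the "pass_all" result still lets the application detect the duplicate; with "last_wins", the information is gone by the time the application sees the map.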

The CBOR data model for maps does not allow ascribing semantics to the order of the key/value pairs in the map representation. Thus, a CBOR-based protocol MUST NOT specify that changing the key/value pair order in a map changes the semantics, except to specify that some orders are disallowed, for example, where they would not meet the requirements of a deterministic encoding (Section 4.2). (Any secondary effects of map ordering such as on timing, cache usage, and other potential side channels are not considered part of the semantics but may be enough reason on their own for a protocol to require a deterministic encoding format.)¶

Applications for constrained devices should consider using small integers as keys if they have maps with a small number of frequently used keys; for instance, a set of 24 or fewer keys can be encoded in a single byte as unsigned integers, up to 48 if negative integers are also used. Less frequently occurring keys can then use integers with longer encodings.¶
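The counts of 24 and 48 follow from the head encoding, as the following Python sketch shows. The helper names are hypothetical, and the text key "temperature" is only an illustrative comparison.

```python
def head_size(arg: int) -> int:
    """Number of bytes in the head needed to carry the given argument."""
    if arg < 24:
        return 1
    for size in (1, 2, 4, 8):
        if arg < (1 << (8 * size)):
            return 1 + size
    raise ValueError("argument does not fit in 64 bits")


def int_key_size(key: int) -> int:
    """Encoded size of an integer key (major type 0 or 1)."""
    arg = key if key >= 0 else -1 - key  # negative n carries -1 - n
    return head_size(arg)


def text_key_size(key: str) -> int:
    """Encoded size of a text string key (major type 3)."""
    payload = key.encode("utf-8")
    return head_size(len(payload)) + len(payload)


# Keys that fit in a single byte: 0..23 and -24..-1.
single_byte_keys = [k for k in range(-30, 30) if int_key_size(k) == 1]
```

In this sketch the key 7 takes one byte, whereas the text key "temperature" takes twelve (one head byte plus eleven bytes of UTF-8).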

5.6.1.Equivalence of Keys

The specific data model that applies to a CBOR data item is used to determine whether keys occurring in maps are duplicates or distinct.¶

At the generic data model level, numerically equivalent integer and floating-point values are distinct from each other, as they are from the various big numbers (Tags 2 to 5). Similarly, text strings are distinct from byte strings, even if composed of the same bytes. A tagged value is distinct from an untagged value or from a value tagged with a different tag number.¶

Within each of these groups, numeric values are distinct unless they are numerically equal (specifically, -0.0 is equal to 0.0); for the purpose of map key equivalence, NaN values are equivalent if they have the same significand after zero-extending both significands at the right to 64 bits.¶

Both byte strings and text strings are compared byte by byte, arrays are compared element by element, and are equal if they have the same number of bytes/elements and the same values at the same positions. Two maps are equal if they have the same set of pairs regardless of their order; pairs are equal if both the key and value are equal.¶

Tagged values are equal if both the tag number and the tag content are equal. (Note that a generic decoder that provides processing for a specific tag may not be able to distinguish some semantically equivalent values, e.g., if leading zeroes occur in the content of tag 2 or tag 3 (Section 3.4.3).) Simple values are equal if they simply have the same value. Nothing else is equal in the generic data model; a simple value 2 is not equivalent to an integer 2, and an array is never equivalent to a map.¶

As discussed in Section 2.2, specific data models can make values equivalent for the purpose of comparing map keys that are distinct in the generic data model. Note that this implies that a generic decoder may deliver a decoded map to an application that needs to be checked for duplicate map keys by that application (alternatively, the decoder may provide a programming interface to perform this service for the application). Specific data models are not able to distinguish values for map keys that are equal for this purpose at the generic data model level.¶
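The generic data model rules differ from the native equality of many languages; in Python, for instance, `1 == 1.0` and `True == 1` hold. The following sketch maps decoded values to a comparison key that follows the rules above. The names `generic_key` and `duplicate_keys` are hypothetical. Tags, maps used as keys, and simple values other than false, true, and null are left out. Floating-point values are assumed to have been widened to binary64 by the decoder, so NaNs are compared on their 52-bit significand field.

```python
import math
import struct


def generic_key(value):
    """Return a hashable key that is equal exactly when two values are
    equivalent under the (simplified) generic data model rules."""
    if value is None or isinstance(value, bool):
        return ("simple", value)  # bool is tested before int on purpose
    if isinstance(value, int):
        return ("int", value)  # distinct from a float of equal value
    if isinstance(value, float):
        if math.isnan(value):
            bits = struct.unpack(">Q", struct.pack(">d", value))[0]
            return ("nan", bits & ((1 << 52) - 1))  # significand only
        return ("float", value)  # -0.0 and 0.0 compare and hash equal
    if isinstance(value, bytes):
        return ("bytes", value)
    if isinstance(value, str):
        return ("text", value)  # never equal to a byte string
    if isinstance(value, (list, tuple)):
        return ("array", tuple(generic_key(v) for v in value))
    raise TypeError("type not covered by this sketch")


def duplicate_keys(pairs):
    """Return the keys of entries that repeat an earlier key."""
    seen, duplicates = set(), []
    for key, _ in pairs:
        marker = generic_key(key)
        if marker in seen:
            duplicates.append(key)
        seen.add(marker)
    return duplicates
```

An application with a specific data model that equates, say, integer 1 and floating-point 1.0 would need an additional check on top of this one.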

5.7.Undefined Values

In some CBOR-based protocols, the simple value (Section 3.3) of undefined might be used by an encoder as a substitute for a data item with an encoding problem, in order to allow the rest of the enclosing data items to be encoded without harm.¶

6.Converting Data between CBOR and JSON

This section gives non-normative advice about converting between CBOR and JSON. Implementations of converters MAY use whichever advice here they want.¶

It is worth noting that a JSON text is a sequence of characters, not an encoded sequence of bytes, while a CBOR data item consists of bytes, not characters.¶

6.1.Converting from CBOR to JSON

Most of the types in CBOR have direct analogs in JSON. However, some do not, and someone implementing a CBOR-to-JSON converter has to consider what to do in those cases. The following non-normative advice deals with these by converting them to a single substitute value, such as a JSON null.¶

An integer (major type 0 or 1) becomes a JSON number.¶
A byte string (major type 2) that is not embedded in a tag that specifies a proposed encoding is encoded in base64url without padding and becomes a JSON string.¶
A UTF-8 string (major type 3) becomes a JSON string. Note that JSON requires escaping certain characters ([RFC8259], Section 7): quotation mark (U+0022), reverse solidus (U+005C), and the "C0 control characters" (U+0000 through U+001F). All other characters are copied unchanged into the JSON UTF-8 string.¶
An array (major type 4) becomes a JSON array.¶
A map (major type 5) becomes a JSON object. This is possible directly only if all keys are UTF-8 strings. A converter might also convert other keys into UTF-8 strings (such as by converting integers into strings containing their decimal representation); however, doing so introduces a danger of key collision. Note also that, if tags on UTF-8 strings are ignored as proposed below, this will cause a key collision if the tags are different but the strings are the same.¶
False (major type 7, additional information 20) becomes a JSON false.¶
True (major type 7, additional information 21) becomes a JSON true.¶
Null (major type 7, additional information 22) becomes a JSON null.¶
A floating-point value (major type 7, additional information 25 through 27) becomes a JSON number if it is finite (that is, it can be represented in a JSON number); if the value is non-finite (NaN, or positive or negative Infinity), it is represented by the substitute value.¶
Any other simple value (major type 7, any additional information value not yet discussed) is represented by the substitute value.¶
A bignum (major type 6, tag number 2 or 3) is represented by encoding its byte string in base64url without padding and becomes a JSON string. For tag number 3 (negative bignum), a "~" (ASCII tilde) is inserted before the base-encoded value. (The conversion to a binary blob instead of a number is to prevent a likely numeric overflow for the JSON decoder.)¶
A byte string with an encoding hint (major type 6, tag number 21 through 23) is encoded as described by the hint and becomes a JSON string.¶
For all other tags (major type 6, any other tag number), the tag content is represented as a JSON value; the tag number is ignored.¶
Indefinite-length items are made definite before conversion.¶
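The advice in the list above can be sketched as a converter over already-decoded values. The `Tag` class, the `SUBSTITUTE` constant, and `to_json_value` are hypothetical names. JSON null is used as the substitute value, maps with non-text keys are rejected rather than converted, and the handling of tags 21 to 23 reflects one reading of the encoding hints in Section 3.4.5.2, applied only when the tag content itself is a byte string.

```python
import base64
import json
import math
from dataclasses import dataclass
from typing import Any


@dataclass
class Tag:
    """Hypothetical in-memory form of a decoded tagged data item."""
    number: int
    content: Any


SUBSTITUTE = None  # assumption: JSON null stands in for unrepresentable items


def b64url(data: bytes) -> str:
    """base64url without padding."""
    return base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")


def to_json_value(item: Any) -> Any:
    """Map a decoded CBOR value to something json.dumps accepts."""
    if item is None or isinstance(item, (bool, int, str)):
        return item
    if isinstance(item, float):
        return item if math.isfinite(item) else SUBSTITUTE
    if isinstance(item, bytes):
        return b64url(item)
    if isinstance(item, list):
        return [to_json_value(element) for element in item]
    if isinstance(item, dict):
        if not all(isinstance(key, str) for key in item):
            raise ValueError("map key is not a text string")
        return {key: to_json_value(value) for key, value in item.items()}
    if isinstance(item, Tag):
        content = item.content
        if isinstance(content, bytes):
            if item.number == 2:
                return b64url(content)
            if item.number == 3:
                return "~" + b64url(content)
            if item.number == 21:
                return b64url(content)
            if item.number == 22:  # assumed: classical base64 with padding
                return base64.b64encode(content).decode("ascii")
            if item.number == 23:  # assumed: base16 with uppercase letters
                return content.hex().upper()
        return to_json_value(content)  # other tags: tag number is ignored
    return SUBSTITUTE  # e.g., other simple values


example = json.dumps(
    to_json_value([1, b"\x12\x34\x56\x78", float("inf"), Tag(3, b"\x01")])
)
```

The value of `example` is the JSON text `[1, "EjRWeA", null, "~AQ"]`.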

A CBOR-to-JSON converter may want to keep to the JSON profile I-JSON [RFC7493], to maximize interoperability and increase confidence that the JSON output can be processed with predictable results. For example, this has implications on the range of integers that can be represented reliably, as well as on the top-level items that may be supported by older JSON implementations.¶

6.2.Converting from JSON to CBOR

All JSON values, once decoded, directly map into one or more CBOR values. As with any kind of CBOR generation, decisions have to be made with respect to number representation. In a suggested conversion:¶

JSON numbers without fractional parts (integer numbers) are represented as integers (major types 0 and 1, possibly major type 6, tag number 2 and 3), choosing the shortest form; integers longer than an implementation-defined threshold may instead be represented as floating-point values. The default range that is represented as integer is -2⁵³+1..2⁵³-1 (fully exploiting the range for exact integers in the binary64 representation often used for decoding JSON [RFC7493]). A CBOR-based protocol, or a generic converter implementation, may choose -2³²..2³²-1 or -2⁶⁴..2⁶⁴-1 (fully using the integer ranges available in CBOR with uint32_t or uint64_t, respectively) or even -2³¹..2³¹-1 or -2⁶³..2⁶³-1 (using popular ranges for two's complement signed integers). (If the JSON was generated from a JavaScript implementation, its precision is already limited to 53 bits maximum.)¶
Numbers with fractional parts are represented as floating-point values, performing the decimal-to-binary conversion based on the precision provided by IEEE 754 binary64. The mathematical value of the JSON number is converted to binary64 using the roundTiesToEven procedure in Section 4.3.1 of [IEEE754]. Then, when encoding in CBOR, the preferred serialization uses the shortest floating-point representation exactly representing this conversion result; for instance, 1.5 is represented in a 16-bit floating-point value (not all implementations will be capable of efficiently finding the minimum form, though). Instead of using the default binary64 precision, there may be an implementation-defined limit to the precision of the conversion that will affect the precision of the represented values. Decimal representation should only be used on the CBOR side if that is specified in a protocol.¶
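A minimal Python sketch of the suggested number conversion follows. The function names are hypothetical. The sketch relies on `json.loads` returning an `int` for number tokens without fraction or exponent and a binary64 `float` otherwise, uses the default integer range -2⁵³+1..2⁵³-1, and finds the shortest floating-point form by a round-trip test.

```python
import json
import struct


def encode_int(n: int) -> bytes:
    """Shortest-form CBOR integer (major type 0 or 1)."""
    major, arg = (0, n) if n >= 0 else (1, -1 - n)
    if arg < 24:
        return bytes([(major << 5) | arg])
    for info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if arg < (1 << (8 * size)):
            return bytes([(major << 5) | info]) + arg.to_bytes(size, "big")
    raise ValueError("integer does not fit in 64 bits")


def encode_float_shortest(x: float) -> bytes:
    """Shortest of half, single, double that represents x exactly."""
    for fmt, initial in ((">e", 0xF9), (">f", 0xFA)):
        try:
            packed = struct.pack(fmt, x)
        except OverflowError:  # magnitude too large for this format
            continue
        if struct.unpack(fmt, packed)[0] == x:  # survives the round trip
            return bytes([initial]) + packed
    return bytes([0xFB]) + struct.pack(">d", x)


def json_number_to_cbor(value) -> bytes:
    """Convert a number produced by json.loads into encoded CBOR."""
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        raise TypeError("not a JSON number")
    if isinstance(value, int):
        if -(2**53) < value < 2**53:
            return encode_int(value)
        value = float(value)  # beyond the threshold: floating point instead
    return encode_float_shortest(value)
```

With this sketch, the JSON number 1.5 becomes 0xf93e00 (half precision), 100000.0 becomes 0xfa47c35000 (single precision), and 1.1 needs the full 0xfb3ff199999999999a.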

CBOR has been designed to generally provide a more compact encoding than JSON. One implementation strategy that might come to mind is to perform a JSON-to-CBOR encoding in place in a single buffer. This strategy would need to carefully consider a number of pathological cases, such as that some strings represented with no or very few escapes and longer (or much longer) than 255 bytes may expand when encoded as UTF-8 strings in CBOR. Similarly, a few of the binary floating-point representations might cause expansion from some short decimal representations (1.1, 1e9) in JSON. This may be hard to get right, and any ensuing vulnerabilities may be exploited by an attacker.¶

7.Future Evolution of CBOR

Successful protocols evolve over time. New ideas appear, implementation platforms improve, related protocols are developed and evolve, and new requirements from applications and protocols are added. Facilitating protocol evolution is therefore an important design consideration for any protocol development.¶

For protocols that will use CBOR, CBOR provides some useful mechanisms to facilitate their evolution. Best practices for this are well known, particularly from JSON format development of JSON-based protocols. Therefore, such best practices are outside the scope of this specification.¶

However, facilitating the evolution of CBOR itself is very well within its scope. CBOR is designed to both provide a stable basis for development of CBOR-based protocols and to be able to evolve. Since a successful protocol may live for decades, CBOR needs to be designed for decades of use and evolution. This section provides some guidance for the evolution of CBOR. It is necessarily more subjective than other parts of this document. It is also necessarily incomplete, lest it turn into a textbook on protocol development.¶

7.1.Extension Points

In a protocol design, opportunities for evolution are often included in the form of extension points. For example, there may be a codepoint space that is not fully allocated from the outset, and the protocol is designed to tolerate and embrace implementations that start using more codepoints than initially allocated.¶

Sizing the codepoint space may be difficult because the range required may be hard to predict. Protocol designs should attempt to make the codepoint space large enough so that it can slowly be filled over the intended lifetime of the protocol.¶

CBOR has three major extension points:¶

the "simple" space (values in major type 7):: Of the 24 efficient (and 224 slightly less efficient) values, only a small number have been allocated. Implementations receiving an unknown simple data item may easily be able to process it as such, given that the structure of the value is indeed simple. The IANA registry in Section 9.1 is the appropriate way to address the extensibility of this codepoint space.¶
the "tag" space (values in major type 6):: The total codepoint space is abundant; only a tiny part of it has been allocated. However, not all of these codepoints are equally efficient: the first 24 only consume a single ("1+0") byte, and half of them have already been allocated. The next 232 values only consume two ("1+1") bytes, with nearly a quarter already allocated. These subspaces need some curation to last for a few more decades. Implementations receiving an unknown tag number can choose to process just the enclosed tag content or, preferably, to process the tag as an unknown tag number wrapping the tag content. The IANA registry in Section 9.2 is the appropriate way to address the extensibility of this codepoint space.¶
the "additional information" space:: An implementation receiving an unknown additional information value has no way to continue decoding, so allocating codepoints in this space is a major step beyond just exercising an extension point. There are also very few codepoints left. See also Section 7.2.¶

7.2.Curating the Additional Information Space

The human mind is sometimes drawn to filling in little perceived gaps to make something neat. We expect the remaining gaps in the codepoint space for the additional information values to be an attractor for new ideas, just because they are there.¶

The present specification does not manage the additional information codepoint space by an IANA registry. Instead, allocations out of this space can only be done by updating this specification.¶

For an additional information value of n >= 24, the size of the additional data typically is 2ⁿ⁻²⁴ bytes. Therefore, additional information values 28 and 29 should be viewed as candidates for 128-bit and 256-bit quantities, in case a need arises to add them to the protocol. Additional information value 30 is then the only additional information value available for general allocation, and there should be a very good reason for allocating it before assigning it through an update of the present specification.¶
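The size rule is simple enough to tabulate in Python. The function name `argument_bytes` is hypothetical. Values 28 and 29 are unassigned; the sizes shown for them are only the candidate sizes named above.

```python
def argument_bytes(info: int) -> int:
    """Size in bytes of the data following the initial byte, 2**(n - 24)."""
    if not 24 <= info <= 29:
        raise ValueError("rule applies to values 24..29 only")
    return 2 ** (info - 24)


# 24..27 are defined (1, 2, 4, 8 bytes); 28 and 29 are candidates only.
sizes = {info: argument_bytes(info) for info in range(24, 30)}
```

The entries for 28 and 29 come out as 16 and 32 bytes, that is, 128-bit and 256-bit quantities.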

8.Diagnostic Notation

CBOR is a binary interchange format. To facilitate documentation and debugging, and in particular to facilitate communication between entities cooperating in debugging, this section defines a simple human-readable diagnostic notation. All actual interchange always happens in the binary format.¶

Note that this truly is a diagnostic format; it is not meant to be parsed. Therefore, no formal definition (as in ABNF) is given in this document. (Implementers looking for a text-based format for representing CBOR data items in configuration files may also want to consider YAML [YAML].)¶

The diagnostic notation is loosely based on JSON as it is defined in RFC 8259, extending it where needed.¶

The notation borrows the JSON syntax for numbers (integer and floating-point), True (>true<), False (>false<), Null (>null<), UTF-8 strings, arrays, and maps (maps are called objects in JSON; the diagnostic notation extends JSON here by allowing any data item in the key position). Undefined is written >undefined< as in JavaScript. The non-finite floating-point numbers Infinity, -Infinity, and NaN are written exactly as in this sentence (this is also a way they can be written in JavaScript, although JSON does not allow them). A tag is written as an integer number for the tag number, followed by the tag content in parentheses; for instance, a date in the format specified by RFC 3339 (ISO 8601) could be notated as:¶

0("2013-03-21T20:04:00Z")¶

or the equivalent relative time as the following:¶

1(1363896240)¶
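That the two notations above denote the same instant can be checked with Python's standard library. The function name `tag0_to_tag1` is hypothetical, and the sketch accepts only the "Z" form without fractional seconds.

```python
from datetime import datetime, timezone


def tag0_to_tag1(text: str) -> str:
    """Convert the content of a tag 0 item to tag 1 diagnostic notation."""
    moment = datetime.strptime(text, "%Y-%m-%dT%H:%M:%SZ")
    moment = moment.replace(tzinfo=timezone.utc)  # "Z" means UTC
    return f"1({int(moment.timestamp())})"


notation = tag0_to_tag1("2013-03-21T20:04:00Z")
```

Here `notation` is the string `1(1363896240)`.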

Byte strings are notated in one of the base encodings, without padding, enclosed in single quotes, prefixed by >h< for base16, >b32< for base32, >h32< for base32hex, >b64< for base64 or base64url (the actual encodings do not overlap, so the string remains unambiguous). For example, the byte string 0x12345678 could be written h'12345678', b32'CI2FM6A', or b64'EjRWeA'.¶
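The three forms in the example can be produced with Python's `base64` module, as in the following sketch. The function name `diag_bytes` is hypothetical, the >h32< prefix is omitted for brevity, and base64url is used for >b64<.

```python
import base64


def diag_bytes(data: bytes, prefix: str = "h") -> str:
    """Render a byte string in diagnostic notation, without padding."""
    if prefix == "h":
        body = data.hex()
    elif prefix == "b32":
        body = base64.b32encode(data).decode("ascii").rstrip("=")
    elif prefix == "b64":
        body = base64.urlsafe_b64encode(data).decode("ascii").rstrip("=")
    else:
        raise ValueError("prefix not covered by this sketch")
    return f"{prefix}'{body}'"


forms = [diag_bytes(bytes.fromhex("12345678"), p) for p in ("h", "b32", "b64")]
```

The list `forms` then holds h'12345678', b32'CI2FM6A', and b64'EjRWeA'.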

Unassigned simple values are given as "simple()" with the appropriate integer in the parentheses. For example, "simple(42)" indicates major type 7, value 42.¶

A number of useful extensions to the diagnostic notation defined here are provided in Appendix G of [RFC8610], "Extended Diagnostic Notation" (EDN). Similarly, this notation could be extended in a separate document to provide documentation for NaN payloads, which are not covered in this document.¶

8.1.Encoding Indicators

Sometimes it is useful to indicate in the diagnostic notation which of several alternative representations were actually used; for example, a data item written >1.5< by a diagnostic decoder might have been encoded as a half-, single-, or double-precision float.¶

The convention for encoding indicators is that anything starting with an underscore and all following characters that are alphanumeric or underscore is an encoding indicator, and can be ignored by anyone not interested in this information. For example, _ or _3. Encoding indicators are always optional.¶

A single underscore can be written after the opening brace of a map or the opening bracket of an array to indicate that the data item was represented in indefinite-length format. For example, [_ 1, 2] contains an indicator that an indefinite-length representation was used to represent the data item [1, 2].¶

An underscore followed by a decimal digit n indicates that the preceding item (or, for arrays and maps, the item starting with the preceding bracket or brace) was encoded with an additional information value of 24+n. For example, 1.5_1 is a half-precision floating-point number, while 1.5_3 is encoded as double precision. This encoding indicator is not shown in Appendix A. (Note that the encoding indicator "_" is thus an abbreviation of the full form "_7", which is not used.)¶
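For floating-point items, the indicator can be derived from the initial byte, as in this Python sketch. The function name `diag_float` is hypothetical, and Python's `repr` is used as a stand-in for the number syntax of the notation.

```python
import struct

# Initial byte -> struct format for half, single, and double precision.
_FLOAT_FORMATS = {0xF9: ">e", 0xFA: ">f", 0xFB: ">d"}


def diag_float(encoded: bytes) -> str:
    """Render an encoded float with its _n encoding indicator."""
    initial = encoded[0]
    fmt = _FLOAT_FORMATS.get(initial)
    if fmt is None:
        raise ValueError("not a floating-point item")
    size = struct.calcsize(fmt)
    (value,) = struct.unpack(fmt, encoded[1:1 + size])
    n = (initial & 0x1F) - 24  # additional information is 24 + n
    return f"{value!r}_{n}"
```

The three encodings of 1.5 (0xf93e00, 0xfa3fc00000, and 0xfb3ff8000000000000) render as 1.5_1, 1.5_2, and 1.5_3, respectively.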

The detailed chunk structure of byte and text strings of indefinite length can be notated in the form (_ h'0123', h'4567') and (_ "foo", "bar"). However, for an indefinite-length string with no chunks inside, (_ ) would be ambiguous as to whether a byte string (0x5fff) or a text string (0x7fff) is meant and is therefore not used. The basic forms ''_ and ""_ can be used instead and are reserved for the case of no chunks only -- not as short forms for the (permitted, but not really useful) encodings with only empty chunks, which need to be notated as (_ ''), (_ ""), etc., to preserve the chunk structure.¶

9.IANA Considerations

IANA has created two registries for new CBOR values. The registries are separate, that is, not under an umbrella registry, and follow the rules in [RFC8126]. IANA has also assigned a new media type, an associated CoAP Content-Format entry, and a structured syntax suffix.¶

9.1.CBOR Simple Values Registry

IANA has created the "Concise Binary Object Representation (CBOR) Simple Values" registry at [IANA.cbor-simple-values]. The initial values are shown in Table 4.¶

New entries in the range 0 to 19 are assigned by Standards Action [RFC8126]. It is suggested that IANA allocate values starting with the number 16 in order to reserve the lower numbers for contiguous blocks (if any).¶

New entries in the range 32 to 255 are assigned by Specification Required.¶

9.2.CBOR Tags Registry

IANA has created the "Concise Binary Object Representation (CBOR) Tags" registry at [IANA.cbor-tags]. The tags that were defined in [RFC7049] are described in detail in Section 3.4, and other tags have already been defined since then.¶

New entries in the range 0 to 23 ("1+0") are assigned by Standards Action. New entries in the ranges 24 to 255 ("1+1") and 256 to 32767 (lower half of "1+2") are assigned by Specification Required. New entries in the range 32768 to 18446744073709551615 (upper half of "1+2", "1+4", and "1+8") are assigned by First Come First Served. The template for registration requests is:¶

Data item¶
Semantics (short form)¶

In addition, First Come First Served requests should include:¶

Point of contact¶
Description of semantics (URL) -- This description is optional; the URL can point to something like an Internet-Draft or a web page.¶

Applicants exercising the First Come First Served range and making a suggestion for a tag number that is not representable in 32 bits (i.e., larger than 4294967295) should be aware that this could reduce interoperability with implementations that do not support 64-bit numbers.¶
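The registration ranges for tag numbers given in this section can be summarized as a lookup, sketched here in Python. The function names are hypothetical; the policy strings are those of [RFC8126].

```python
def tag_registration_policy(number: int) -> str:
    """Return the registration policy for a new tag number."""
    if not 0 <= number <= 18446744073709551615:
        raise ValueError("tag numbers are unsigned 64-bit integers")
    if number <= 23:  # "1+0"
        return "Standards Action"
    if number <= 32767:  # "1+1" and lower half of "1+2"
        return "Specification Required"
    return "First Come First Served"  # upper half of "1+2", "1+4", "1+8"


def needs_64_bit_support(number: int) -> bool:
    """True if the tag number is not representable in 32 bits."""
    return number > 4294967295
```

An applicant could use the second function to judge whether a suggested tag number risks the interoperability problem described above.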

9.3.Media Types Registry

The Internet media type [RFC6838] ("MIME type") for a single encoded CBOR data item is "application/cbor" as defined in the "Media Types" registry [IANA.media-types]:¶

Type name:

application¶

Subtype name:

cbor¶

Required parameters:

n/a¶

Optional parameters:

n/a¶

Encoding considerations:

Binary¶

Security considerations:

See Section 10 of RFC 8949.¶

Interoperability considerations:

n/a¶

Published specification:

RFC 8949¶

Applications that use this media type:

Many¶

Additional information:

Magic number(s):: n/a¶
File extension(s):: .cbor¶
Macintosh file type code(s):: n/a¶

Person & email address to contact for further information:

IETF CBOR Working Group (cbor@ietf.org) or IETF Applications and Real-Time Area (art@ietf.org)¶

Intended usage:

COMMON¶

Restrictions on usage:

none¶

Author:

IETF CBOR Working Group (cbor@ietf.org)¶

Change controller:

The IESG (iesg@ietf.org)¶

9.4.CoAP Content-Format Registry

The CoAP Content-Format for CBOR has been registered in the "CoAP Content-Formats" subregistry within the "Constrained RESTful Environments (CoRE) Parameters" registry [IANA.core-parameters]:¶

Media Type:: application/cbor¶
Encoding:: -¶
ID:: 60¶
Reference:: RFC 8949¶

9.5.Structured Syntax Suffix Registry

The structured syntax suffix [RFC6838] for media types based on a single encoded CBOR data item is +cbor, which IANA has registered in the "Structured Syntax Suffixes" registry [IANA.structured-suffix]:¶

Name:

Concise Binary Object Representation (CBOR)¶

+suffix:

+cbor¶

References:

RFC 8949¶

Encoding Considerations:

CBOR is a binary format.¶

Interoperability Considerations:

n/a¶

Fragment Identifier Considerations:

The syntax and semantics of fragment identifiers specified for +cbor SHOULD be as specified for "application/cbor". (At publication of RFC 8949, there is no fragment identification syntax defined for "application/cbor".)¶

The syntax and semantics for fragment identifiers for a specific "xxx/yyy+cbor" SHOULD be processed as follows:¶

For cases defined in +cbor, where the fragment identifier resolves per the +cbor rules, then process as specified in +cbor.¶
For cases defined in +cbor, where the fragment identifier does not resolve per the +cbor rules, then process as specified in "xxx/yyy+cbor".¶
For cases not defined in +cbor, then process as specified in "xxx/yyy+cbor".¶

Security Considerations:

See Section 10 of RFC 8949.¶

Contact:

IETF CBOR Working Group (cbor@ietf.org) or IETF Applications and Real-Time Area (art@ietf.org)¶

Author/Change Controller:

IETF¶

10.Security Considerations

A network-facing application can exhibit vulnerabilities in its processing logic for incoming data. Complex parsers are well known as a likely source of such vulnerabilities, such as the ability to remotely crash a node, or even remotely execute arbitrary code on it. CBOR attempts to narrow the opportunities for introducing such vulnerabilities by reducing parser complexity, by giving the entire range of encodable values a meaning where possible.¶

Because CBOR decoders are often used as a first step in processing unvalidated input, they need to be fully prepared for all types of hostile input that may be designed to corrupt, overrun, or achieve control of the system decoding the CBOR data item. A CBOR decoder needs to assume that all input may be hostile even if it has been checked by a firewall, has come over a secure channel such as TLS, is encrypted or signed, or has come from some other source that is presumed trusted.¶

Section 4.1 gives examples of limitations in interoperability when using a constrained CBOR decoder with input from a CBOR encoder that uses a non-preferred serialization. When a single data item is consumed both by such a constrained decoder and a full decoder, it can lead to security issues that can be exploited by an attacker who can inject or manipulate content.¶

As discussed throughout this document, there are many values that can be considered "equivalent" in some circumstances and "not equivalent" in others. As just one example, the numeric value for the number "one" might be expressed as an integer or a bignum. A system interpreting CBOR input might accept either form for the number "one", or might reject one (or both) forms. Such acceptance or rejection can have security implications in the program that is using the interpreted input.¶

Hostile input may be constructed to overrun buffers, to overflow or underflow integer arithmetic, or to cause other decoding disruption. CBOR data items might have lengths or sizes that are intentionally extremely large or too short. Resource exhaustion attacks might attempt to lure a decoder into allocating very big data items (strings, arrays, maps, or even arbitrary precision numbers) or exhaust the stack depth by setting up deeply nested items. Decoders need to have appropriate resource management to mitigate these attacks. (Items for which very large sizes are given can also attempt to exploit integer overflow vulnerabilities.)¶
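
One possible form of such resource management is to compare the size claimed in a head against the input that is actually left, before allocating anything. The Python sketch below is only an illustration under stated assumptions: the function name claim_is_plausible, its parameters, and the default depth limit of 64 are hypothetical, and the check assumes a decoder that holds the complete input rather than a streaming one.

```python
def claim_is_plausible(major_type: int, claimed: int, remaining: int,
                       depth: int, max_depth: int = 64) -> bool:
    """Decide whether a decoded head can be satisfied by the remaining input.

    major_type and claimed come from the head that was just decoded;
    remaining is the number of input bytes not yet consumed;
    depth is the current nesting level of the decoder.
    """
    if depth > max_depth:          # guard against stack exhaustion
        return False
    if major_type in (2, 3):       # strings: claimed counts content bytes
        needed = claimed
    elif major_type == 4:          # arrays: each item occupies at least 1 byte
        needed = claimed
    elif major_type == 5:          # maps: each pair occupies at least 2 bytes
        needed = 2 * claimed
    else:                          # other major types claim no size here
        needed = 0
    return needed <= remaining
```

A decoder would reject the input when this returns False instead of reserving memory for the claimed size. Python integers do not overflow; in a language with fixed-width integers, the multiplication for maps would itself need an overflow check.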

A CBOR decoder, by definition, only accepts well-formed CBOR; this is the first step to its robustness. Input that is not well-formed CBOR causes no further processing from the point where the lack of well-formedness was detected. If possible, any data decoded up to this point should have no impact on the application using the CBOR decoder.¶

In addition to ascertaining well-formedness, a CBOR decoder might also perform validity checks on the CBOR data. Alternatively, it can leave those checks to the application using the decoder. This choice needs to be clearly documented in the decoder. Beyond the validity at the CBOR level, an application also needs to ascertain that the input is in alignment with the application protocol that is serialized in CBOR.¶

The input check itself may consume resources. This is usually linear in the size of the input, which means that an attacker has to spend resources that are commensurate to the resources spent by the defender on input validation. However, an attacker might be able to craft inputs that will take longer for a target decoder to process than for the attacker to produce. Processing for arbitrary-precision numbers may exceed linear effort. Also, some hash-table implementations that are used by decoders to build in-memory representations of maps can be attacked to spend quadratic effort, unless a secret key (see Section 7 of [SIPHASH_LNCS], also [SIPHASH_OPEN]) or some other mitigation is employed. Such superlinear efforts can be exploited by an attacker to exhaust resources at or before the input validator; they therefore need to be avoided in a CBOR decoder implementation. Note that tag number definitions and their implementations can add security considerations of this kind; this should then be discussed in the security considerations of the tag number definition.¶

CBOR encoders do not receive input directly from the network and are thus not directly attackable in the same way as CBOR decoders. However, CBOR encoders often have an API that takes input from another level in the implementation and can be attacked through that API. The design and implementation of that API should assume the behavior of its caller may be based on hostile input or on coding mistakes. It should check inputs for buffer overruns, overflow and underflow of integer arithmetic, and other such errors that are aimed to disrupt the encoder.¶

Protocols should be defined in such a way that potential multiple interpretations are reliably reduced to a single interpretation. For example, an attacker could make use of invalid input such as duplicate keys in maps, or exploit different precision in processing numbers to make one application base its decisions on a different interpretation than the one that will be used by a second application. To facilitate consistent interpretation, encoder and decoder implementations should provide a validity-checking mode of operation (Section 5.4). Note, however, that a generic decoder cannot know about all requirements that an application poses on its input data; it is therefore not relieving the application from performing its own input checking. Also, since the set of defined tag numbers evolves, the application may employ a tag number that is not yet supported for validity checking by the generic decoder it uses. Generic decoders therefore need to document which tag numbers they support and what validity checking they provide for those tag numbers as well as for basic CBOR (UTF-8 checking, duplicate map key checking).¶
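
As an illustration of one such validity check, the following Python sketch looks for duplicate map keys. It assumes a hypothetical decoder that hands over a map as a list of (key, value) pairs in input order, with keys already converted to hashable Python values; the function name find_duplicate_keys is likewise an assumption. It does not address keys such as arrays, maps, or NaN.

```python
def find_duplicate_keys(pairs):
    """Return the keys that occur more than once, in order of detection.

    pairs is a list of (key, value) tuples in the order they were decoded.
    """
    seen = set()
    duplicates = []
    for key, _value in pairs:
        # Python considers 1, 1.0, and True equal as set members, whereas
        # the generic data model keeps integers and floats distinct, so the
        # Python type is made part of the identity.
        identity = (type(key), key)
        if identity in seen:
            duplicates.append(key)
        else:
            seen.add(identity)
    return duplicates
```

A decoder running in a validity-checking mode could treat a non-empty result as an error rather than silently keeping the first or the last value.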

Section 3.4.3 notes that using the non-preferred choice of a bignum representation instead of a basic integer for encoding a number is not intended to have application semantics, but it can have such semantics if an application receiving CBOR data is using a decoder in the basic generic data model. This disparity causes a security issue if the two sets of semantics differ. Thus, applications using CBOR need to specify the data model that they are using for each use of CBOR data.¶

It is common to convert CBOR data to other formats. In many cases, CBOR has more expressive types than other formats; this is particularly true for the common conversion to JSON. The loss of type information can cause security issues for the systems that are processing the less-expressive data.¶

Section 6.2 describes a possibly common usage scenario of converting between CBOR and JSON that could allow an attack if the attacker knows that the application is performing the conversion.¶

Security considerations for the use of base16 and base64 from [RFC4648], and the use of UTF-8 from [RFC3629], are relevant to CBOR as well.¶

Appendix A.Examples of Encoded CBOR Data Items

The following table provides some CBOR-encoded values in hexadecimal (right column), together with diagnostic notation for these values (left column). Note that the string "\u00fc" is one form of diagnostic notation for a UTF-8 string containing the single Unicode character U+00FC (LATIN SMALL LETTER U WITH DIAERESIS, "ü"). Similarly, "\u6c34" is a UTF-8 string in diagnostic notation with a single character U+6C34 (CJK UNIFIED IDEOGRAPH-6C34, "水"), often representing "water", and "\ud800\udd51" is a UTF-8 string in diagnostic notation with a single character U+10151 (GREEK ACROPHONIC ATTIC FIFTY STATERS, "𐅑"). (Note that all these single-character strings could also be represented in native UTF-8 in diagnostic notation, just not if an ASCII-only specification is required.) In the diagnostic notation provided for bignums, their intended numeric value is shown as a decimal number (such as 18446744073709551616) instead of a tagged byte string (such as 2(h'010000000000000000')).¶

Table 6:Examples of Encoded CBOR Data Items
Diagnostic	Encoded
0	0x00
1	0x01
10	0x0a
23	0x17
24	0x1818
25	0x1819
100	0x1864
1000	0x1903e8
1000000	0x1a000f4240
1000000000000	0x1b000000e8d4a51000
18446744073709551615	0x1bffffffffffffffff
18446744073709551616	0xc249010000000000000000
-18446744073709551616	0x3bffffffffffffffff
-18446744073709551617	0xc349010000000000000000
-1	0x20
-10	0x29
-100	0x3863
-1000	0x3903e7
0.0	0xf90000
-0.0	0xf98000
1.0	0xf93c00
1.1	0xfb3ff199999999999a
1.5	0xf93e00
65504.0	0xf97bff
100000.0	0xfa47c35000
3.4028234663852886e+38	0xfa7f7fffff
1.0e+300	0xfb7e37e43c8800759c
5.960464477539063e-8	0xf90001
0.00006103515625	0xf90400
-4.0	0xf9c400
-4.1	0xfbc010666666666666
Infinity	0xf97c00
NaN	0xf97e00
-Infinity	0xf9fc00
Infinity	0xfa7f800000
NaN	0xfa7fc00000
-Infinity	0xfaff800000
Infinity	0xfb7ff0000000000000
NaN	0xfb7ff8000000000000
-Infinity	0xfbfff0000000000000
false	0xf4
true	0xf5
null	0xf6
undefined	0xf7
simple(16)	0xf0
simple(255)	0xf8ff
0("2013-03-21T20:04:00Z")	0xc074323031332d30332d32315432303a30343a30305a
1(1363896240)	0xc11a514b67b0
1(1363896240.5)	0xc1fb41d452d9ec200000
23(h'01020304')	0xd74401020304
24(h'6449455446')	0xd818456449455446
32("http://www.example.com")	0xd82076687474703a2f2f7777772e6578616d706c652e636f6d
h''	0x40
h'01020304'	0x4401020304
""	0x60
"a"	0x6161
"IETF"	0x6449455446
"\"\\"	0x62225c
"\u00fc"	0x62c3bc
"\u6c34"	0x63e6b0b4
"\ud800\udd51"	0x64f0908591
[]	0x80
[1, 2, 3]	0x83010203
[1, [2, 3], [4, 5]]	0x8301820203820405
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]	0x98190102030405060708090a0b0c0d0e0f101112131415161718181819
{}	0xa0
{1: 2, 3: 4}	0xa201020304
{"a": 1, "b": [2, 3]}	0xa26161016162820203
["a", {"b": "c"}]	0x826161a161626163
{"a": "A", "b": "B", "c": "C", "d": "D", "e": "E"}	0xa56161614161626142616361436164614461656145
(_ h'0102', h'030405')	0x5f42010243030405ff
(_ "strea", "ming")	0x7f657374726561646d696e67ff
[_ ]	0x9fff
[_ 1, [2, 3], [_ 4, 5]]	0x9f018202039f0405ffff
[_ 1, [2, 3], [4, 5]]	0x9f01820203820405ff
[1, [2, 3], [_ 4, 5]]	0x83018202039f0405ff
[1, [_ 2, 3], [4, 5]]	0x83019f0203ff820405
[_ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]	0x9f0102030405060708090a0b0c0d0e0f101112131415161718181819ff
{_ "a": 1, "b": [_ 2, 3]}	0xbf61610161629f0203ffff
["a", {_ "b": "c"}]	0x826161bf61626163ff
{_ "Fun": true, "Amt": -2}	0xbf6346756ef563416d7421ff
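
Several rows of the table can be reproduced with a few lines of code. The Python sketch below is a minimal illustration, not a complete encoder; the function names encode_head, encode_int, and encode_text are hypothetical, and only integers that fit major types 0 and 1 and text strings are covered.

```python
import struct


def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a head, using the shortest form that holds the argument."""
    if argument < 0:
        raise ValueError("argument must be nonnegative")
    if argument < 24:
        return bytes([(major_type << 5) | argument])
    forms = ((24, ">B", 1 << 8), (25, ">H", 1 << 16),
             (26, ">I", 1 << 32), (27, ">Q", 1 << 64))
    for additional_info, fmt, limit in forms:
        if argument < limit:
            initial = bytes([(major_type << 5) | additional_info])
            return initial + struct.pack(fmt, argument)
    raise ValueError("argument does not fit in 64 bits")


def encode_int(n: int) -> bytes:
    """Encode an integer in the range -2**64..2**64-1 (no bignums)."""
    if n >= 0:
        return encode_head(0, n)        # major type 0: the value itself
    return encode_head(1, -1 - n)       # major type 1: argument is -1-n


def encode_text(s: str) -> bytes:
    """Encode a text string; the length counts UTF-8 bytes, not characters."""
    data = s.encode("utf-8")
    return encode_head(3, len(data)) + data
```

For example, encode_int(1000).hex() gives "1903e8" and encode_text("\U00010151").hex() gives "64f0908591", matching the corresponding rows above.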

Appendix B.Jump Table for Initial Byte

For brevity, this jump table does not show initial bytes that are reserved for future extension. It also only shows a selection of the initial bytes that can be used for optional features. (All unsigned integers are in network byte order.)¶

Table 7:Jump Table for Initial Byte
Byte	Structure/Semantics
0x00..0x17	unsigned integer 0x00..0x17 (0..23)
0x18	unsigned integer (one-byte uint8_t follows)
0x19	unsigned integer (two-byte uint16_t follows)
0x1a	unsigned integer (four-byte uint32_t follows)
0x1b	unsigned integer (eight-byte uint64_t follows)
0x20..0x37	negative integer -1-0x00..-1-0x17 (-1..-24)
0x38	negative integer -1-n (one-byte uint8_t for n follows)
0x39	negative integer -1-n (two-byte uint16_t for n follows)
0x3a	negative integer -1-n (four-byte uint32_t for n follows)
0x3b	negative integer -1-n (eight-byte uint64_t for n follows)
0x40..0x57	byte string (0x00..0x17 bytes follow)
0x58	byte string (one-byte uint8_t for n, and then n bytes follow)
0x59	byte string (two-byte uint16_t for n, and then n bytes follow)
0x5a	byte string (four-byte uint32_t for n, and then n bytes follow)
0x5b	byte string (eight-byte uint64_t for n, and then n bytes follow)
0x5f	byte string, byte strings follow, terminated by "break"
0x60..0x77	UTF-8 string (0x00..0x17 bytes follow)
0x78	UTF-8 string (one-byte uint8_t for n, and then n bytes follow)
0x79	UTF-8 string (two-byte uint16_t for n, and then n bytes follow)
0x7a	UTF-8 string (four-byte uint32_t for n, and then n bytes follow)
0x7b	UTF-8 string (eight-byte uint64_t for n, and then n bytes follow)
0x7f	UTF-8 string, UTF-8 strings follow, terminated by "break"
0x80..0x97	array (0x00..0x17 data items follow)
0x98	array (one-byte uint8_t for n, and then n data items follow)
0x99	array (two-byte uint16_t for n, and then n data items follow)
0x9a	array (four-byte uint32_t for n, and then n data items follow)
0x9b	array (eight-byte uint64_t for n, and then n data items follow)
0x9f	array, data items follow, terminated by "break"
0xa0..0xb7	map (0x00..0x17 pairs of data items follow)
0xb8	map (one-byte uint8_t for n, and then n pairs of data items follow)
0xb9	map (two-byte uint16_t for n, and then n pairs of data items follow)
0xba	map (four-byte uint32_t for n, and then n pairs of data items follow)
0xbb	map (eight-byte uint64_t for n, and then n pairs of data items follow)
0xbf	map, pairs of data items follow, terminated by "break"
0xc0	text-based date/time (data item follows; see Section 3.4.1)
0xc1	epoch-based date/time (data item follows; see Section 3.4.2)
0xc2	unsigned bignum (data item "byte string" follows)
0xc3	negative bignum (data item "byte string" follows)
0xc4	decimal fraction (data item "array" follows; see Section 3.4.4)
0xc5	bigfloat (data item "array" follows; see Section 3.4.4)
0xc6..0xd4	(tag)
0xd5..0xd7	expected conversion (data item follows; see Section 3.4.5.2)
0xd8..0xdb	(more tags; 1/2/4/8 bytes of tag number and then a data item follow)
0xe0..0xf3	(simple value)
0xf4	false
0xf5	true
0xf6	null
0xf7	undefined
0xf8	(simple value, one byte follows)
0xf9	half-precision float (two-byte IEEE 754)
0xfa	single-precision float (four-byte IEEE 754)
0xfb	double-precision float (eight-byte IEEE 754)
0xff	"break" stop code
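
The regularity behind the jump table is that every initial byte splits into a 3-bit major type and 5 bits of additional information. The Python sketch below illustrates that split; the names MAJOR_TYPE_NAMES, split_initial_byte, and argument_length are hypothetical.

```python
MAJOR_TYPE_NAMES = (
    "unsigned integer", "negative integer", "byte string", "text string",
    "array", "map", "tag", "simple/float",
)


def split_initial_byte(ib: int):
    """Return (major type, additional information) for an initial byte."""
    if not 0 <= ib <= 0xff:
        raise ValueError("initial byte must be in 0..255")
    return ib >> 5, ib & 0x1f


def argument_length(additional_info: int) -> int:
    """Number of bytes after the initial byte that carry the argument."""
    if additional_info < 24:
        return 0                          # argument is in the initial byte
    if additional_info <= 27:
        return 1 << (additional_info - 24)  # 1, 2, 4, or 8 bytes
    if additional_info == 31:
        return 0                          # indefinite length or "break"
    raise ValueError("additional information 28..30 is reserved")
```

For instance, the row for 0x1a corresponds to major type 0 with additional information 26, which means four argument bytes follow.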

Appendix C.Pseudocode

The well-formedness of a CBOR item can be checked by the pseudocode in Figure 1. The data is well-formed if and only if:¶

the pseudocode does not "fail";¶
after execution of the pseudocode, no bytes are left in the input (except in streaming applications).¶

The pseudocode has the following prerequisites:¶

take(n) reads n bytes from the input data and returns them as a byte string. If n bytes are no longer available, take(n) fails.¶
uint() converts a byte string into an unsigned integer by interpreting the byte string in network byte order.¶
Arithmetic works as in C.¶
All variables are unsigned integers of sufficient range.¶

Note that well_formed returns the major type for well-formed definite-length items, but 99 for an indefinite-length item (or -1 for a "break" stop code, only if breakable is set). This is used in well_formed_indefinite to ascertain that indefinite-length strings only contain definite-length strings as chunks.¶

well_formed(breakable = false) {
  // process initial bytes
  ib = uint(take(1));
  mt = ib >> 5;
  val = ai = ib & 0x1f;
  switch (ai) {
    case 24: val = uint(take(1)); break;
    case 25: val = uint(take(2)); break;
    case 26: val = uint(take(4)); break;
    case 27: val = uint(take(8)); break;
    case 28: case 29: case 30: fail();
    case 31:
      return well_formed_indefinite(mt, breakable);
  }
  // process content
  switch (mt) {
    // case 0, 1, 7 do not have content; just use val
    case 2: case 3: take(val); break; // bytes/UTF-8
    case 4: for (i = 0; i < val; i++) well_formed(); break;
    case 5: for (i = 0; i < val*2; i++) well_formed(); break;
    case 6: well_formed(); break;     // 1 embedded data item
    case 7: if (ai == 24 && val < 32) fail(); // bad simple
  }
  return mt;                    // definite-length data item
}

well_formed_indefinite(mt, breakable) {
  switch (mt) {
    case 2: case 3:
      while ((it = well_formed(true)) != -1)
        if (it != mt)           // need definite-length chunk
          fail();               //    of same type
      break;
    case 4: while (well_formed(true) != -1); break;
    case 5: while (well_formed(true) != -1) well_formed(); break;
    case 7:
      if (breakable)
        return -1;              // signal break out
      else fail();              // no enclosing indefinite
    default: fail();            // wrong mt
  }
  return 99;                    // indefinite-length data item
}

Figure 1:Pseudocode for Well-Formedness Check
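
For readers who want to experiment, the pseudocode of Figure 1 can be transcribed into runnable Python. The following is a sketch of such a transcription, not a reference implementation; the names NotWellFormed, Checker, and is_well_formed are assumptions, and the input is assumed to be a complete bytes object holding exactly one data item.

```python
class NotWellFormed(Exception):
    """Raised wherever the pseudocode calls fail()."""


BREAK = -1        # result for a "break" stop code
INDEFINITE = 99   # result for an indefinite-length item


class Checker:
    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0

    def take(self, n: int) -> bytes:
        if self.pos + n > len(self.data):
            raise NotWellFormed("fewer than n bytes are available")
        chunk = self.data[self.pos:self.pos + n]
        self.pos += n
        return chunk

    def well_formed(self, breakable: bool = False) -> int:
        ib = self.take(1)[0]
        mt, ai = ib >> 5, ib & 0x1f
        val = ai
        if 24 <= ai <= 27:               # 1, 2, 4, or 8 argument bytes
            val = int.from_bytes(self.take(1 << (ai - 24)), "big")
        elif 28 <= ai <= 30:
            raise NotWellFormed("reserved additional information")
        elif ai == 31:
            return self.well_formed_indefinite(mt, breakable)
        if mt in (2, 3):                 # byte string or text string
            self.take(val)
        elif mt == 4:                    # array of val items
            for _ in range(val):
                self.well_formed()
        elif mt == 5:                    # map of val pairs
            for _ in range(val * 2):
                self.well_formed()
        elif mt == 6:                    # tag: one embedded data item
            self.well_formed()
        elif mt == 7 and ai == 24 and val < 32:
            raise NotWellFormed("bad simple value")
        return mt

    def well_formed_indefinite(self, mt: int, breakable: bool) -> int:
        if mt in (2, 3):
            while True:
                it = self.well_formed(True)
                if it == BREAK:
                    break
                if it != mt:             # chunk must be definite, same type
                    raise NotWellFormed("bad chunk in indefinite string")
        elif mt == 4:
            while self.well_formed(True) != BREAK:
                pass
        elif mt == 5:
            while self.well_formed(True) != BREAK:
                self.well_formed()       # value; a break here fails
        elif mt == 7:
            if breakable:
                return BREAK
            raise NotWellFormed("break without enclosing indefinite item")
        else:
            raise NotWellFormed("indefinite length with major type 0, 1, 6")
        return INDEFINITE


def is_well_formed(data: bytes) -> bool:
    checker = Checker(data)
    try:
        checker.well_formed()
    except NotWellFormed:
        return False
    return checker.pos == len(data)      # no bytes may be left over
```

Like the pseudocode, this sketch recurses once per nesting level, so very deeply nested input would exhaust Python's recursion limit; a production decoder would bound the depth explicitly.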

Note that the remaining complexity of a complete CBOR decoder is about presenting data that has been decoded to the application in an appropriate form.¶

Major types 0 and 1 are designed in such a way that they can be encoded in C from a signed integer without actually doing an if-then-else for positive/negative (Figure 2). This uses the fact that (-1-n), the transformation for major type 1, is the same as ~n (bitwise complement) in C unsigned arithmetic; ~n can then be expressed as (-1)^n for the negative case, while 0^n leaves n unchanged for nonnegative. The sign of a number can be converted to -1 for negative and 0 for nonnegative (0 or positive) by arithmetic-shifting the number by one bit less than the bit length of the number (for example, by 63 for 64-bit numbers).¶

void encode_sint(int64_t n) {
  uint64_t ui = n >> 63;    // extend sign to whole length
  unsigned mt = ui & 0x20;  // extract (shifted) major type
  ui ^= n;                  // complement negatives
  if (ui < 24)
    *p++ = mt + ui;
  else if (ui < 256) {
    *p++ = mt + 24;
    *p++ = ui;
  } else
       ...

Figure 2:Pseudocode for Encoding a Signed Integer
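
The same sign trick can be tried out in Python, where integers are unbounded and the 64-bit behavior therefore has to be emulated with a mask. The sketch below is illustrative; MASK64 is a hypothetical helper constant, the function returns the encoded bytes instead of writing through a pointer, and the longer argument forms that Figure 2 elides are filled in here following the head encoding used elsewhere in this document.

```python
MASK64 = (1 << 64) - 1


def encode_sint(n: int) -> bytes:
    """Encode a signed 64-bit integer without branching on its sign."""
    if not -(1 << 63) <= n < (1 << 63):
        raise ValueError("n must fit in a signed 64-bit integer")
    ui = (n >> 63) & MASK64   # 0 for nonnegative, all ones for negative
    mt = ui & 0x20            # 0x00 (major type 0) or 0x20 (major type 1)
    ui ^= n & MASK64          # n unchanged, or ~n, which equals -1-n
    if ui < 24:
        return bytes([mt + ui])
    if ui < 1 << 8:
        return bytes([mt + 24, ui])
    if ui < 1 << 16:
        return bytes([mt + 25]) + ui.to_bytes(2, "big")
    if ui < 1 << 32:
        return bytes([mt + 26]) + ui.to_bytes(4, "big")
    return bytes([mt + 27]) + ui.to_bytes(8, "big")
```

For example, encode_sint(-1000) yields the bytes 39 03 e7: the sign extension gives all ones, the mask 0x20 selects major type 1, and the exclusive-or turns -1000 into 999 (0x03e7).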

See Section 1.2 for some specific assumptions about the profile of the C language used in these pieces of code.¶

Appendix D.Half-Precision

As half-precision floating-point numbers were only added to IEEE 754 in 2008 [IEEE754], today's programming platforms often still only have limited support for them. It is very easy to include at least decoding support for them even without such support. An example of a small decoder for half-precision floating-point numbers in the C language is shown in Figure 3. A similar program for Python is in Figure 4; this code assumes that the 2-byte value has already been decoded as an (unsigned short) integer in network byte order (as would be done by the pseudocode in Appendix C).¶

#include <math.h>

double decode_half(unsigned char *halfp) {
  unsigned half = (halfp[0] << 8) + halfp[1];
  unsigned exp = (half >> 10) & 0x1f;
  unsigned mant = half & 0x3ff;
  double val;
  if (exp == 0) val = ldexp(mant, -24);
  else if (exp != 31) val = ldexp(mant + 1024, exp - 25);
  else val = mant == 0 ? INFINITY : NAN;
  return half & 0x8000 ? -val : val;
}

Figure 3:C Code for a Half-Precision Decoder

import struct
from math import ldexp

def decode_single(single):
    return struct.unpack("!f", struct.pack("!I", single))[0]

def decode_half(half):
    valu = (half & 0x7fff) << 13 | (half & 0x8000) << 16
    if ((half & 0x7c00) != 0x7c00):
        return ldexp(decode_single(valu), 112)
    return decode_single(valu | 0x7f800000)

Figure 4:Python Code for a Half-Precision Decoder
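
On platforms that do support half precision, a manual decoder can be cross-checked against the built-in conversion. As one example, the struct module of Python 3.6 and later accepts the format character "e" for IEEE 754 binary16. The sketch below relies on that; the function name decode_half_struct is hypothetical.

```python
import struct


def decode_half_struct(half: int) -> float:
    """Decode a binary16 value given as an unsigned 16-bit integer."""
    # "!H" packs the integer in network byte order; "!e" reads the same
    # two bytes back as a half-precision float.
    return struct.unpack("!e", struct.pack("!H", half))[0]
```

Applied to the half-precision rows of Appendix A, decode_half_struct(0x3c00) gives 1.0 and decode_half_struct(0x7bff) gives 65504.0.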

Appendix E.Comparison of Other Binary Formats to CBOR's Design Objectives

The proposal for CBOR follows a history of binary formats that is as long as the history of computers themselves. Different formats have had different objectives. In most cases, the objectives of the format were never stated, although they can sometimes be implied by the context where the format was first used. Some formats were meant to be universally usable, although history has proven that no binary format meets the needs of all protocols and applications.¶

CBOR differs from many of these formats due to it starting with a set of objectives and attempting to meet just those. This section compares a few of the dozens of formats with CBOR's objectives in order to help the reader decide if they want to use CBOR or a different format for a particular protocol or application.¶

Note that the discussion here is not meant to be a criticism of any format: to the best of our knowledge, no format before CBOR was meant to cover CBOR's objectives in the priority we have assigned them. A brief recap of the objectives from Section 1.1 is:¶

unambiguous encoding of most common data formats from Internet standards¶
code compactness for encoder or decoder¶
no schema description needed¶
reasonably compact serialization¶
applicability to constrained and unconstrained applications¶
good JSON conversion¶
extensibility¶

A discussion of CBOR and other formats with respect to a different set of design objectives is provided in Section 5 and Appendix C of [RFC8618].¶

E.1.ASN.1 DER, BER, and PER

[ASN.1] has many serializations. In the IETF, DER and BER are the most common. The serialized output is not particularly compact for many items, and the code needed to decode numeric items can be complex on a constrained device.¶

Few (if any) IETF protocols have adopted one of the several variants of Packed Encoding Rules (PER). There could be many reasons for this, but one that is commonly stated is that PER makes use of the schema even for parsing the surface structure of the data item, requiring significant tool support. There are different versions of the ASN.1 schema language in use, which has also hampered adoption.¶

E.2.MessagePack

[MessagePack] is a concise, widely implemented counted binary serialization format, similar in many properties to CBOR, although somewhat less regular. While the data model can be used to represent JSON data, MessagePack has also been used in many remote procedure call (RPC) applications and for long-term storage of data.¶

MessagePack has been essentially stable since it was first published around 2011; it has not yet had a transition. The evolution of MessagePack is impeded by an imperative to maintain complete backwards compatibility with existing stored data, while only few bytecodes are still available for extension. Repeated requests over the years from the MessagePack user community to separate out binary and text strings in the encoding recently have led to an extension proposal that would leave MessagePack's "raw" data ambiguous between its usages for binary and text data. The extension mechanism for MessagePack remains unclear.¶

E.3.BSON

[BSON] is a data format that was developed for the storage of JSON-like maps (JSON objects) in the MongoDB database. Its major distinguishing feature is the capability for in-place update, which prevents a compact representation. BSON uses a counted representation except for map keys, which are null-byte terminated. While BSON can be used for the representation of JSON-like objects on the wire, its specification is dominated by the requirements of the database application and has become somewhat baroque. The status of how BSON extensions will be implemented remains unclear.¶

E.4.MSDTP: RFC 713

Message Services Data Transmission (MSDTP) is a very early example of a compact message format; it is described in [RFC0713], written in 1976. It is included here for its historical value, not because it was ever widely used.¶

E.5.Conciseness on the Wire

While CBOR's design objective of code compactness for encoders and decoders is a higher priority than its objective of conciseness on the wire, many people focus on the wire size. Table 8 shows some encoding examples for the simple nested array [1, [2, 3]]; where some form of indefinite-length encoding is supported by the encoding, [_ 1, [2, 3]] (indefinite length on the outer array) is also shown.¶

Table 8:Examples for Different Levels of Conciseness
Format	[1, [2, 3]]	[_ 1, [2, 3]]
RFC 713	c2 05 81 c2 02 82 83
ASN.1 BER	30 0b 02 01 01 30 06 02 01 02 02 01 03	30 80 02 01 01 30 06 02 01 02 02 01 03 00 00
MessagePack	92 01 92 02 03
BSON	22 00 00 00 10 30 00 01 00 00 00 04 31 00 13 00 00 00 10 30 00 02 00 00 00 10 31 00 03 00 00 00 00 00
CBOR	82 01 82 02 03	9f 01 82 02 03 ff
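
The byte counts behind Table 8 can be tallied mechanically. The Python sketch below copies the definite-length column of the table into a dictionary (the names DEFINITE_LENGTH_EXAMPLES and sizes are hypothetical) and counts the bytes of each encoding.

```python
DEFINITE_LENGTH_EXAMPLES = {
    "RFC 713": "c2 05 81 c2 02 82 83",
    "ASN.1 BER": "30 0b 02 01 01 30 06 02 01 02 02 01 03",
    "MessagePack": "92 01 92 02 03",
    "BSON": ("22 00 00 00 10 30 00 01 00 00 00 04 31 00 13 00 00 00 "
             "10 30 00 02 00 00 00 10 31 00 03 00 00 00 00 00"),
    "CBOR": "82 01 82 02 03",
}

# bytes.fromhex ignores the spaces between the byte values
sizes = {name: len(bytes.fromhex(hexstr))
         for name, hexstr in DEFINITE_LENGTH_EXAMPLES.items()}
```

With these inputs, sizes maps RFC 713 to 7 bytes, ASN.1 BER to 13, MessagePack and CBOR to 5 each, and BSON to 34.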

Appendix F.Well-Formedness Errors and Examples

There are three basic kinds of well-formedness errors that can occur in decoding a CBOR data item:¶

Too much data:: There are input bytes left that were not consumed. This is only an error if the application assumed that the input bytes would span exactly one data item. Where the application uses the self-delimiting nature of CBOR encoding to permit additional data after the data item, as is done in CBOR sequences [RFC8742], for example, the CBOR decoder can simply indicate which part of the input has not been consumed.¶
Too little data:: The input data available would need additional bytes added at their end for a complete CBOR data item. This may indicate the input is truncated; it is also a common error when trying to decode random data as CBOR. For some applications, however, this may not actually be an error, as the application may not be certain it has all the data yet and can obtain or wait for additional input bytes. Some of these applications may have an upper limit for how much additional data can appear; here the decoder may be able to indicate that the encoded CBOR data item cannot be completed within this limit.¶
Syntax error:: The input data are not consistent with the requirements of the CBOR encoding, and this cannot be remedied by adding (or removing) data at the end.¶

In Appendix C, errors of the first kind are addressed in the first paragraph and bullet list (requiring "no bytes are left"), and errors of the second kind are addressed in the second paragraph/bullet list (failing "if n bytes are no longer available"). Errors of the third kind are identified in the pseudocode by specific instances of calling fail(), in order:¶

a reserved value is used for additional information (28, 29, 30)¶
major type 7, additional information 24, value < 32 (incorrect)¶
incorrect substructure of indefinite-length byte string or text string (may only contain definite-length strings of the same major type)¶
"break" stop code (major type 7, additional information 31) occurs in a value position of a map, or anywhere except directly in an indefinite-length item at a position where another enclosed data item could also occur¶
additional information 31 used with major type 0, 1, or 6¶

F.1.Examples of CBOR Data Items That Are Not Well-Formed

This subsection shows a few examples for CBOR data items that are not well-formed. Each example is a sequence of bytes, each shown in hexadecimal; multiple examples in a list are separated by commas.¶

Examples for well-formedness error kind 1 (too much data) can easily be formed by adding data to a well-formed encoded CBOR data item.¶

Similarly, examples for well-formedness error kind 2 (too little data) can be formed by truncating a well-formed encoded CBOR data item. In test suites, it may be beneficial to specifically test with incomplete data items that would require large amounts of addition to be completed (for instance by starting the encoding of a string of a very large size).¶

A premature end of the input can occur in a head or within the enclosed data, which may be bare strings or enclosed data items that are either counted or should have been ended by a "break" stop code.¶

End of input in a head:: 18, 19, 1a, 1b, 19 01, 1a 01 02, 1b 01 02 03 04 05 06 07, 38, 58, 78, 98, 9a 01 ff 00, b8, d8, f8, f9 00, fa 00 00, fb 00 00 00¶
Definite-length strings with short data:: 41, 61, 5a ff ff ff ff 00, 5b ff ff ff ff ff ff ff ff 01 02 03, 7a ff ff ff ff 00, 7b 7f ff ff ff ff ff ff ff 01 02 03¶
Definite-length maps and arrays not closed with enough items:: 81, 81 81 81 81 81 81 81 81 81, 82 00, a1, a2 01 02, a1 00, a2 00 00 00¶
Tag number not followed by tag content:: c0¶
Indefinite-length strings not closed by a "break" stop code:: 5f 41 00, 7f 61 00¶
Indefinite-length maps and arrays not closed by a "break" stop code:: 9f, 9f 01 02, bf, bf 01 02 01 02, 81 9f, 9f 80 00, 9f 9f 9f 9f 9f ff ff ff ff, 9f 81 9f 81 9f 9f ff ff ff¶

A few examples for the five subkinds of well-formedness error kind 3 (syntax error) are shown below.¶

Subkind 1:

Reserved additional information values:: 1c, 1d, 1e, 3c, 3d, 3e, 5c, 5d, 5e, 7c, 7d, 7e, 9c, 9d, 9e, bc, bd, be, dc, dd, de, fc, fd, fe¶

Subkind 2:

Reserved two-byte encodings of simple values:: f8 00, f8 01, f8 18, f8 1f¶

Subkind 3:

Indefinite-length string chunks not of the correct type:: 5f 00 ff, 5f 21 ff, 5f 61 00 ff, 5f 80 ff, 5f a0 ff, 5f c0 00 ff, 5f e0 ff, 7f 41 00 ff¶
Indefinite-length string chunks not definite length:: 5f 5f 41 00 ff ff, 7f 7f 61 00 ff ff¶

Subkind 4:

Break occurring on its own outside of an indefinite-length item:: ff¶
Break occurring in a definite-length array or map or a tag:: 81 ff, 82 00 ff, a1 ff, a1 ff 00, a1 00 ff, a2 00 00 ff, 9f 81 ff, 9f 82 9f 81 9f 9f ff ff ff ff¶
Break in an indefinite-length map that would lead to an odd number of items (break in a value position):: bf 00 ff, bf 00 00 00 ff¶

Subkind 5:

Major type 0, 1, 6 with additional information 31:: 1f, 3f, df¶
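
The examples for subkinds 1 and 5 consist of a single initial byte each, so they can be enumerated from the rules. The Python sketch below is one way a test suite might generate them; the function names reserved_initial_bytes and invalid_indefinite_initial_bytes are hypothetical.

```python
def reserved_initial_bytes():
    """Initial bytes whose additional information is 28, 29, or 30."""
    return [(mt << 5) | ai for mt in range(8) for ai in (28, 29, 30)]


def invalid_indefinite_initial_bytes():
    """Additional information 31 combined with major type 0, 1, or 6."""
    return [(mt << 5) | 31 for mt in (0, 1, 6)]
```

The first function yields the 24 bytes listed under subkind 1 (1c through fe), and the second yields 1f, 3f, and df.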

Appendix G.Changes from RFC 7049

As discussed in the introduction, this document formally obsoletes RFC 7049 while keeping full compatibility with the interchange format from RFC 7049. This document provides editorial improvements, added detail, and fixed errata. This document does not create a new version of the format.¶

G.1.Errata Processing and Clerical Changes

The two verified errata on RFC 7049, [Err3764] and [Err3770], concerned two encoding examples in the text that have been corrected (Section 3.4.3: "29" -> "49", Section 5.5: "0b000_11101" -> "0b000_11001"). Also, RFC 7049 contained an example using the numeric value 24 for a simple value [Err5917], which is not well-formed; this example has been removed. Errata report 5763 [Err5763] pointed to an error in the wording of the definition of tags; this was resolved during a rewrite of Section 3.4. Errata report 5434 [Err5434] pointed out that the Universal Binary JSON (UBJSON) example in Appendix E no longer complied with the version of UBJSON current at the time of the errata report submission. It turned out that the UBJSON specification had completely changed since 2013; this example therefore was removed. Other errata reports [Err4409] [Err4963] [Err4964] complained that the map key sorting rules for canonical encoding were onerous; these led to a reconsideration of the canonical encoding suggestions and replacement by the deterministic encoding suggestions (described below). An editorial suggestion in errata report 4294 [Err4294] was also implemented (improved symmetry by adding "Second value" to a comment to the last example in Section 3.2.2).¶

Other clerical changes include:¶

the use of new xml2rfc functionality [RFC7991];¶
more explanation of the notation used;¶
the update of references, e.g., from RFC 4627 to [RFC8259], from CNN-TERMS to [RFC7228], and from the 5.1 edition to the 11th edition of [ECMA262]; the addition of a reference to [IEEE754] and importation of required definitions; the addition of references to [C] and [Cplusplus20]; and the addition of a reference to [RFC8618] that further illustrates the discussion in Appendix E;¶
in the discussion of diagnostic notation (Section 8), the "Extended Diagnostic Notation" (EDN) defined in [RFC8610] is now mentioned, the gap in representing NaN payloads is now highlighted, and an explanation of representing indefinite-length strings with no chunks has been added (Section 8.1);¶
the addition of this appendix.¶

G.2.Changes in IANA Considerations

The IANA considerations were generally updated (clerical changes, e.g., now pointing to the CBOR Working Group as the author of the specification). References to the respective IANA registries were added to the informative references.¶

In the "Concise Binary Object Representation (CBOR) Tags" registry [IANA.cbor-tags], tags in the space from 256 to 32767 (lower half of "1+2") are no longer assigned by First Come First Served; this range is now Specification Required.¶

G.3.Changes in Suggestions and Other Informational Components

While revising the document, beyond the addressing of the errata reports, the working group drew upon nearly seven years of experience with CBOR in a diverse set of applications. This led to a number of editorial changes, including adding tables for illustration, but also emphasizing some aspects and de-emphasizing others.¶

A significant addition is Section 2, which discusses the CBOR data model and its small variations involved in the processing of CBOR. The introduction of terms for those variations (basic generic, extended generic, specific) enables more concise language in other places of the document and also helps to clarify expectations of implementations and of the extensibility features of the format.¶

As a format derived from the JSON ecosystem, RFC 7049 was influenced by the JSON number system that was in turn inherited from JavaScript at the time. JSON does not provide distinct integers and floating-point values (and the latter are decimal in the format). CBOR provides binary representations of numbers, which do differ between integers and floating-point values. Experience from implementation and use suggested that the separation between these two number domains should be more clearly drawn in the document; language that suggested an integer could seamlessly stand in for a floating-point value was removed. Also, a suggestion (based on I-JSON [RFC7493]) was added for handling these types when converting JSON to CBOR, and the use of a specific rounding mechanism has been recommended.¶
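
To illustrate the separation between the two number domains, the following is a minimal sketch of how a JSON-to-CBOR converter might decide between an integer and a floating-point value. The function name `json_number_to_model` and the constant `EXACT_INT_MAX` are hypothetical. The threshold of 2**53 - 1 is an assumption taken from the exact-integer range of binary64 that I-JSON refers to, and treating any token with a fraction or exponent as floating point is a simplification; a protocol may choose differently.

```python
# Assumed threshold: largest magnitude for which binary64 represents
# every integer exactly (the range I-JSON advises staying within).
EXACT_INT_MAX = 2**53 - 1


def json_number_to_model(token: str):
    """Map a JSON number token to an int or a float in the data model.

    Tokens with a fraction or exponent become floats; integer tokens stay
    integers unless they fall outside the assumed exact-integer range.
    """
    if any(c in token for c in ".eE"):
        return float(token)
    n = int(token)
    if -EXACT_INT_MAX <= n <= EXACT_INT_MAX:
        return n
    # Out of range: fall back to floating point, accepting the rounding
    # performed by the decimal-to-binary64 conversion.
    return float(token)
```

In this sketch `json_number_to_model("1")` yields the integer 1, while `json_number_to_model("1.0")` yields a floating-point value; the two would be encoded with different major types and are not interchangeable.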

For a single value in the data model, CBOR often provides multiple encoding options. A new section (Section 4) introduces the term "preferred serialization" (Section 4.1) and defines it for various kinds of data items. On the basis of this terminology, the section then discusses how a CBOR-based protocol can define "deterministic encoding" (Section 4.2), which avoids the terms "canonical" and "canonicalization" from RFC 7049. The suggestion of "Core Deterministic Encoding Requirements" (Section 4.2.1) enables generic support for such protocol-defined encoding requirements. This document further eases the implementation of deterministic encoding by simplifying the map ordering suggested in RFC 7049 to a simple lexicographic ordering of encoded keys. A description of the older suggestion is kept as an alternative, now termed "length-first map key ordering" (Section 4.2.3).¶
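
The difference between the two map key orderings can be sketched on already-encoded keys. The function names below are hypothetical; the sample keys are the encodings of 10, 100, -1, "z", "aa", [100], [-1], and false.

```python
def bytewise_order(encoded_keys):
    """Sort encoded keys in plain bytewise lexicographic order."""
    # Python compares bytes objects lexicographically, byte by byte.
    return sorted(encoded_keys)


def length_first_order(encoded_keys):
    """Sort encoded keys by length first, then bytewise (older suggestion)."""
    return sorted(encoded_keys, key=lambda k: (len(k), k))


# Encoded forms of 10, 100, -1, "z", "aa", [100], [-1], false
sample_keys = [bytes.fromhex(h) for h in
               ("0a", "1864", "20", "617a", "626161", "811864", "8120", "f4")]

bytewise = [k.hex() for k in bytewise_order(sample_keys)]
length_first = [k.hex() for k in length_first_order(sample_keys)]
```

With these keys, the bytewise order is 0a, 1864, 20, 617a, 626161, 811864, 8120, f4, whereas the length-first order is 0a, 20, f4, 1864, 617a, 8120, 626161, 811864. The bytewise variant needs no knowledge of key lengths, which is what makes it the simpler of the two to implement.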

The terminology for well-formed and valid data was sharpened and more stringently used, avoiding less well-defined alternative terms such as "syntax error", "decoding error", and "strict mode" outside of examples. Also, a third level of requirements that an application has on its input data beyond CBOR-level validity is now explicitly called out. Well-formed (processable at all), valid (checked by a validity-checking generic decoder), and expected input (as checked by the application) are treated as a hierarchy of layers of acceptability.¶

The handling of non-well-formed simple values was clarified in text and pseudocode. Appendix F was added to discuss well-formedness errors and provide examples for them. The pseudocode was updated to be more portable, and some portability considerations were added.¶

The discussion of validity has been sharpened in two areas. Map validity (handling of duplicate keys) was clarified, and the domain of applicability of certain implementation choices explained. Also, while streamlining the terminology for tags, tag numbers, and tag content, discussion was added on tag validity, and the restrictions were clarified on tag content, in general and specifically for tag 1.¶

An implementation note (and note for future tag definitions) was added to Section 3.4 about defining tags with semantics that depend on serialization order.¶

Tag 35 is not defined by this document; the registration based on the definition in RFC 7049 remains in place.¶

Terminology was introduced in Section 3 for "argument" and "head", simplifying further discussion.¶
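
As a rough illustration of these two terms, the sketch below builds and parses the head of a data item: the initial byte (major type plus additional information) followed by any bytes that carry the argument. The function names `encode_head` and `decode_head` are hypothetical. The encoder picks the shortest argument form and is restricted to major types 0 to 6, on the assumption that major type 7 (simple values and floating-point numbers) needs separate handling.

```python
def encode_head(major_type: int, argument: int) -> bytes:
    """Encode a head for major types 0..6 using the shortest argument form."""
    if not 0 <= major_type <= 6:
        raise ValueError("this sketch handles major types 0..6 only")
    if not 0 <= argument < 2**64:
        raise ValueError("argument must fit in 64 bits")
    if argument < 24:
        # The argument is carried directly in the additional information.
        return bytes([(major_type << 5) | argument])
    for ai, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if argument < 1 << (8 * size):
            return bytes([(major_type << 5) | ai]) + argument.to_bytes(size, "big")
    raise AssertionError("unreachable: argument was range-checked above")


def decode_head(data: bytes):
    """Return (major_type, additional_info, argument, head_length).

    The argument is None for additional information 31, which carries none.
    """
    if not data:
        raise ValueError("empty input")
    mt, ai = data[0] >> 5, data[0] & 0x1F
    if ai < 24:
        return mt, ai, ai, 1
    if ai <= 27:
        size = 1 << (ai - 24)
        if len(data) < 1 + size:
            raise ValueError("truncated head")
        return mt, ai, int.from_bytes(data[1:1 + size], "big"), 1 + size
    if ai == 31:
        return mt, ai, None, 1
    raise ValueError("additional information 28..30 is reserved")
```

For instance, `encode_head(0, 100)` produces the two bytes 18 64, and `decode_head` applied to those bytes reports major type 0, additional information 24, argument 100, and a head length of 2.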

The security considerations (Section 10) were mostly rewritten and significantly expanded; in multiple other places, the document is now more explicit that a decoder cannot simply condone well-formedness errors.¶

Acknowledgements

CBOR was inspired by MessagePack. MessagePack was developed and promoted by Sadayuki Furuhashi ("frsyuki"). This reference to MessagePack is solely for attribution; CBOR is not intended as a version of, or replacement for, MessagePack, as it has different design goals and requirements.¶

The need for functionality beyond the original MessagePack specification became obvious to many people at about the same time around the year 2012. BinaryPack is a minor derivation of MessagePack that was developed by Eric Zhang for the binaryjs project. A similar, but different, extension was made by Tim Caswell for his msgpack-js and msgpack-js-browser projects. Many people have contributed to the discussion about extending MessagePack to separate text string representation from byte string representation.¶

The encoding of the additional information in CBOR was inspired by the encoding of length information designed by Klaus Hartke for CoAP.¶

This document also incorporates suggestions made by many people, notably Dan Frost, James Manger, Jeffrey Yasskin, Joe Hildebrand, Keith Moore, Laurence Lundblade, Matthew Lepinski, Michael Richardson, Nico Williams, Peter Occil, Phillip Hallam-Baker, Ray Polk, Stuart Cheshire, Tim Bray, Tony Finch, Tony Hansen, and Yaron Sheffer. Benjamin Kaduk provided an extensive review during IESG processing. Éric Vyncke, Erik Kline, Robert Wilton, and Roman Danyliw provided further IESG comments, which included an IoT directorate review by Eve Schooler.¶