RFC 9682	CDDL grammar updates	November 2024
Bormann	Standards Track

2. Clarifications and Changes Based on Errata Reports

A number of errata reports have been made regarding some details of text string and byte string literal syntax: for example, [Err6527] and [Err6543]. These are being addressed in this section, updating details of the ABNF for these literal syntaxes. Also, the changes described in [Err6526] need to be applied (backslashes have been lost during the RFC publication process of Appendix G.2 of [RFC8610], garbling the text explaining backslash escaping).

These changes are intended to mirror the way existing implementations have dealt with the errata reports. This document also uses the opportunity presented by the necessary cleanup of the grammar of string literals for a backward-compatible addition to the syntax for hexadecimal escapes. The latter change is not automatically forward compatible (i.e., CDDL specifications that make use of this syntax do not necessarily work with existing implementations until these are updated, which is recommended by this specification).

2.1. Updates to String Literal Grammar

2.1.1. Erratum ID 6527 (Text String Literals)

The ABNF used in [RFC8610] for the content of text string literals is rather permissive:

; ABNF from RFC 8610:

text = %x22 *SCHAR %x22
SCHAR = %x20-21 / %x23-5B / %x5D-7E / %x80-10FFFD / SESC
SESC = "\" (%x20-7E / %x80-10FFFD)

Figure 1: Original ABNF from RFC 8610 for Strings with Permissive ABNF for SESC (Which Did Not Allow Hex Escapes)

This allows almost any non-C0 character to be escaped by a backslash, but critically misses out on the \uXXXX and \uHHHH\uLLLL forms that JSON allows to specify characters in hex (which should apply here according to item 6 of Section 3.1 of [RFC8610]). (Note that CDDL imports from JSON the unwieldy \uHHHH\uLLLL syntax, which represents Unicode code points beyond U+FFFF by making them look like UTF-16 surrogate pairs; CDDL text strings do not use UTF-16 or surrogates.)

Both can be solved by updating the SESC rule. This document uses the opportunity to add a popular form of directly specifying characters in strings using hexadecimal escape sequences of the form \u{hex}, where hex is the hexadecimal representation of the Unicode scalar value. The result is the new set of rules defining SESC in Figure 2.

; new rules collectively defining SESC:

SESC = "\" ( %x22 / "/" / "\" /                 ; \" \/ \\
             %x62 / %x66 / %x6E / %x72 / %x74 / ; \b \f \n \r \t
             (%x75 hexchar) )                   ; \uXXXX

hexchar = "{" (1*"0" [ hexscalar ] / hexscalar) "}" /
          non-surrogate / (high-surrogate "\" %x75 low-surrogate)
non-surrogate = ((DIGIT / "A"/"B"/"C" / "E"/"F") 3HEXDIG) /
                ("D" %x30-37 2HEXDIG )
high-surrogate = "D" ("8"/"9"/"A"/"B") 2HEXDIG
low-surrogate = "D" ("C"/"D"/"E"/"F") 2HEXDIG
hexscalar = "10" 4HEXDIG / HEXDIG1 4HEXDIG
          / non-surrogate / 1*3HEXDIG
HEXDIG1 = DIGIT1 / "A" / "B" / "C" / "D" / "E" / "F"

Figure 2: Update to String ABNF in Appendix B of [RFC8610]: Allow Hex Escapes
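
As an illustration of how a tool might resolve the escapes that the updated SESC rule admits, the following Python sketch maps each escape form to its character. It is not taken from any existing CDDL implementation: the names `unescape`, `ESCAPE`, and `SIMPLE_ESCAPES` are hypothetical, and the sketch checks the `\u{hex}` form numerically (value at most 0x10FFFF and not a surrogate), which is assumed here to accept the same inputs as the hexscalar rule with its optional leading zeros. It does not validate the unescaped characters around the escapes.

```python
import re

# Single-character escapes permitted by the updated SESC rule.
SIMPLE_ESCAPES = {'"': '"', '/': '/', '\\': '\\',
                  'b': '\b', 'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}

# Groups: 1 = \u{hex}, 2/3 = surrogate pair, 4 = \uXXXX, 5 = any other char.
ESCAPE = re.compile(
    r'\\(?:'
    r'u\{([0-9A-Fa-f]+)\}'
    r'|u([Dd][89ABab][0-9A-Fa-f]{2})\\u([Dd][C-Fc-f][0-9A-Fa-f]{2})'
    r'|u([0-9A-Fa-f]{4})'
    r'|(.?)'
    r')', re.DOTALL)


def unescape(body: str) -> str:
    """Resolve SESC escapes in the body of a string literal (quotes removed)."""
    def replace(match):
        braced, high, low, bmp, simple = match.groups()
        if braced is not None:
            code_point = int(braced, 16)
        elif high is not None:
            # Combine the \uHHHH\uLLLL pair into one code point beyond U+FFFF.
            code_point = (0x10000 + ((int(high, 16) - 0xD800) << 10)
                          + (int(low, 16) - 0xDC00))
        elif bmp is not None:
            code_point = int(bmp, 16)
        else:
            if simple not in SIMPLE_ESCAPES:
                raise ValueError("escape not allowed by SESC: \\" + simple)
            return SIMPLE_ESCAPES[simple]
        # Only Unicode scalar values are allowed: no surrogates, no overflow.
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            raise ValueError("not a Unicode scalar value: %X" % code_point)
        return chr(code_point)

    return ESCAPE.sub(replace, body)
```

With this sketch, a lone surrogate such as `\uD83C` and the sequence `\'` (which SESC no longer covers) are both rejected with `ValueError`.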

Notes: In ABNF, strings such as "A", "B", etc., are case insensitive, as is intended here. The rules above could have also used %s"b", etc., instead of %x62, but didn't, in order to maximize compatibility with ABNF tools.

Now that SESC is more restrictively formulated, an update to the BCHAR rule used in the ABNF syntax for byte string literals is also required:

; ABNF from RFC 8610:

bytes = [bsqual] %x27 *BCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
bsqual = "h" / "b64"

Figure 3: ABNF from RFC 8610 for BCHAR

With the SESC updated as above, \' is no longer allowed in BCHAR and now needs to be explicitly included there; see Figure 4.

2.1.2. Erratum ID 6278 (Consistent String Literals)

Updating BCHAR also provides an opportunity to address [Err6278], which points to an inconsistency in treating U+007F (DEL) between SCHAR and BCHAR. As U+007F is not printable, including it in a byte string literal is as confusing as for a text string literal; therefore, it should be excluded from BCHAR as it is from SCHAR. The same reasoning also applies to the C1 control characters, so the updated ABNF actually excludes the entire range from U+007F to U+009F. The same reasoning also applies to text in comments (PCHAR). For completeness, all these rules should also explicitly exclude the code points that have been set aside for UTF-16 surrogates.

; new rules for SCHAR, BCHAR, and PCHAR:

SCHAR = %x20-21 / %x23-5B / %x5D-7E / NONASCII / SESC
BCHAR = %x20-26 / %x28-5B / %x5D-7E / NONASCII / SESC / "\'" / CRLF
PCHAR = %x20-7E / NONASCII
NONASCII = %xA0-D7FF / %xE000-10FFFD

Figure 4: Update to ABNF in Appendix B of [RFC8610]: BCHAR, SCHAR, and PCHAR
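
The code point ranges in Figure 4 can be written as simple predicates, which makes the excluded ranges easy to check. The Python sketch below is illustrative only; the function names are hypothetical, and the SCHAR and BCHAR predicates deliberately cover just the single-character alternatives (not SESC, `\'`, or CRLF, which are sequences or line endings handled elsewhere in a parser).

```python
def is_nonascii(cp: int) -> bool:
    """NONASCII = %xA0-D7FF / %xE000-10FFFD (no C1 controls, no surrogates)."""
    return 0xA0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0x10FFFD


def is_pchar(cp: int) -> bool:
    """PCHAR = %x20-7E / NONASCII (characters allowed in comments)."""
    return 0x20 <= cp <= 0x7E or is_nonascii(cp)


def is_schar_literal(cp: int) -> bool:
    """SCHAR without SESC: excludes the double quote and the backslash."""
    return (0x20 <= cp <= 0x21 or 0x23 <= cp <= 0x5B
            or 0x5D <= cp <= 0x7E or is_nonascii(cp))


def is_bchar_literal(cp: int) -> bool:
    """BCHAR without SESC, "\\'" and CRLF: excludes the single quote and backslash."""
    return (0x20 <= cp <= 0x26 or 0x28 <= cp <= 0x5B
            or 0x5D <= cp <= 0x7E or is_nonascii(cp))
```

For example, U+007F (DEL) and the C1 control U+009F fail all four predicates, while U+00A0 passes them.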

(Note that, apart from addressing the inconsistencies, there is no attempt to further exclude non-printable characters from the ABNF; doing this properly would draw in complexity from the ongoing evolution of the Unicode standard [UNICODE] that is not needed here.)

2.1.3. Addressing Erratum ID 6526 and Erratum ID 6543

The above changes also cover [Err6543] (a proposal to split off qualified byte string literals from UTF-8 byte string literals) and [Err6526] (lost backslashes); see Appendix B for details.

2.2. Examples Demonstrating the Updated String Syntaxes

The CDDL example in Figure 5 demonstrates various escaping techniques now available for (byte and text) strings in CDDL. Obviously, in the literals for a and x, there is no need to escape the second character, an o, as \u{6f}; this is just for demonstration. Similarly, as shown in c and z, there also is no need to escape the "🁳" (DOMINO TILE VERTICAL-02-02, U+1F073) or "⌘" (PLACE OF INTEREST SIGN, U+2318); however, escaping them may be convenient in order to limit the character repertoire of a CDDL file itself to ASCII [STD80].

start = [a, b, c, x, y, z]

; "🁳", DOMINO TILE VERTICAL-02-02, and
; "⌘", PLACE OF INTEREST SIGN, in a text string:

a = "D\u{6f}mino's \u{1F073} + \u{2318}"      ; \u{}-escape 3 chars
b = "Domino's \uD83C\uDC73 + \u2318"          ; escape JSON-like
c = "Domino's 🁳 + ⌘"                          ; unescaped

; in a byte string given as text, the ' needs to be escaped:

x = 'D\u{6f}mino\u{27}s \u{1F073} + \u{2318}' ; \u{}-escape 4 chars
y = 'Domino\'s \uD83C\uDC73 + \u2318'         ; escape JSON-like
z = 'Domino\'s 🁳 + ⌘'                         ; escape ' only

Figure 5: Example Text and Byte String Literals with Various Escaping Techniques

In this example, the rules a to c and x to z all produce strings with byte-wise identical content: a to c are text strings and x to z are byte strings. Figure 6 illustrates this by showing the output generated from the start rule in Figure 5, using pretty-printed hexadecimal.

86                                      # array(6)
   73                                   # text(19)
      446f6d696e6f277320f09f81b3202b20e28c98 # "Domino's 🁳 + ⌘"
   73                                   # text(19)
      446f6d696e6f277320f09f81b3202b20e28c98 # "Domino's 🁳 + ⌘"
   73                                   # text(19)
      446f6d696e6f277320f09f81b3202b20e28c98 # "Domino's 🁳 + ⌘"
   53                                   # bytes(19)
      446f6d696e6f277320f09f81b3202b20e28c98 # "Domino's 🁳 + ⌘"
   53                                   # bytes(19)
      446f6d696e6f277320f09f81b3202b20e28c98 # "Domino's 🁳 + ⌘"
   53                                   # bytes(19)
      446f6d696e6f277320f09f81b3202b20e28c98 # "Domino's 🁳 + ⌘"

Figure 6: Generated CBOR from CDDL Example (Pretty-Printed Hexadecimal)
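
The bytes in Figure 6 can be reproduced by hand. The Python sketch below is a minimal illustration, not a general CBOR encoder: the helper `short_head` is a hypothetical name and only handles arguments below 24, which is all this example needs (an array of 6 items and strings of 19 bytes).

```python
def short_head(major_type: int, argument: int) -> bytes:
    """Build a one-byte CBOR head; only arguments 0..23 fit in this form."""
    if not 0 <= argument < 24:
        raise ValueError("this sketch only covers arguments below 24")
    return bytes([(major_type << 5) | argument])


# The common content of all six literals, after escape processing.
content = "Domino's \U0001F073 + \u2318"
utf8 = content.encode("utf-8")

text_item = short_head(3, len(utf8)) + utf8    # major type 3: text string
bytes_item = short_head(2, len(utf8)) + utf8   # major type 2: byte string

# start = [a, b, c, x, y, z]: three text strings, then three byte strings.
encoded = short_head(4, 6) + text_item * 3 + bytes_item * 3

print(encoded.hex())
```

The text and byte string items differ only in their first byte (0x73 versus 0x53), which is the sense in which the six rules produce byte-wise identical content.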

3. Small Enabling Grammar Changes

Each subsection that follows specifies a small change to the grammar that is intended to enable certain kinds of specifications. These changes are backward compatible (i.e., CDDL files that comply with [RFC8610] continue to match the updated grammar) but not necessarily forward compatible (i.e., CDDL specifications that make use of these changes cannot necessarily be processed by existing implementations of [RFC8610]).

3.1. Empty Data Models

[RFC8610] requires a CDDL file to have at least one rule.

; ABNF from RFC 8610:

cddl = S 1*(rule S)

Figure 7: ABNF from RFC 8610 for Top-Level Rule cddl

This makes sense when the file has to stand alone, as a CDDL data model needs to have at least one rule to provide an entry point (i.e., a start rule).

With CDDL modules [CDDL-MODULES], CDDL files can also include directives, and these might be the source of all the rules that ultimately make up the module created by the file. Any other rule content in the file has to be available for directive processing, making the requirement for at least one rule cumbersome.

Therefore, the present update extends the grammar as in Figure 8 and turns the existence of at least one rule into a semantic constraint, to be fulfilled after processing of all directives.

; new top-level rule:

cddl = S *(rule S)

Figure 8: Update to Top-Level ABNF in Appendices B and C of RFC 8610

3.2. Non-Literal Tag Numbers and Simple Values

The existing ABNF syntax for expressing tags in CDDL is as follows:

; extracted from the ABNF in RFC 8610:

type2 =/ "#" "6" ["." uint] "(" S type S ")"

Figure 9: Original ABNF from RFC 8610 for Tag Syntax

This means tag numbers can only be given as literal numbers (uints). Some specifications operate on ranges of tag numbers; for example, [RFC9277] has a range of tag numbers 1668546817 (0x63740101) to 1668612095 (0x6374FFFF) to tag specific content formats. This cannot currently be expressed in CDDL. Similar considerations apply to simple values (#7.xx).

This update extends the syntax to the following:

; new rules collectively defining the tagged case:

type2 =/ "#" "6" ["." head-number] "(" S type S ")"
       / "#" "7" ["." head-number]
head-number = uint / ("<" type ">")

Figure 10: Update to Tag and Simple Value ABNF in Appendices B and C of RFC 8610

For #6, the head-number stands for the tag number. For #7, the head-number stands for the simple value if it is in the ranges 0..23 or 32..255 (as per Section 3.3 of RFC 8949 [STD94], the simple values 24..31 are not used). For 24..31, the head-number stands for the "additional information", e.g., #7.25 or #7.<25> is a float16, etc. (All ranges mentioned here are inclusive.)

So the above range can be expressed in a CDDL fragment such as:

ct-tag<content> = #6.<ct-tag-number>(content)
ct-tag-number = 1668546817..1668612095
; or use 0x63740101..0x6374FFFF
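
To make the range concrete, the following Python sketch checks whether a tag number falls in the range given above and shows how such a tag number is carried in a CBOR tag head (major type 6). The names `is_ct_tag_number` and `tag_head` are hypothetical; the encoding picks the shortest argument form, as is usual for CBOR, and is a sketch rather than a complete encoder.

```python
CT_TAG_FIRST = 0x63740101  # 1668546817
CT_TAG_LAST = 0x6374FFFF   # 1668612095


def is_ct_tag_number(tag_number: int) -> bool:
    """True if tag_number matches ct-tag-number (range is inclusive)."""
    return CT_TAG_FIRST <= tag_number <= CT_TAG_LAST


def tag_head(tag_number: int) -> bytes:
    """Encode the head of a CBOR tag using the shortest argument form."""
    if tag_number < 0:
        raise ValueError("tag numbers are unsigned")
    if tag_number < 24:
        return bytes([0xC0 | tag_number])
    # Additional information 24..27 selects a 1-, 2-, 4-, or 8-byte argument.
    for additional_info, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if tag_number < 1 << (8 * size):
            return bytes([0xC0 | additional_info]) + tag_number.to_bytes(size, "big")
    raise ValueError("tag number does not fit in 64 bits")


print(tag_head(CT_TAG_FIRST).hex())
```

Every tag number in this range needs the 4-byte argument form, so each head in the range starts with the byte 0xDA.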

Notes:

* This syntax reuses the angle bracket syntax for generics; this reuse is innocuous because a generic parameter or argument only ever occurs after a rule name (id), while it occurs after the "." (dot) character here. (Whether there is potential for human confusion can be debated; the above example deliberately uses generics as well.)
* The updated ABNF grammar makes it a bit more explicit that the number given after the optional dot is the value of the argument: for tags and simple values, it is not giving the CBOR "additional information", as it is with other uses of # in CDDL. (Adding this observation to Section 2.2.3 of [RFC8610] is the subject of [Err6575]; it is correctly noted in Section 3.6 of [RFC8610].) In hindsight, maybe a different character than the dot should have been chosen for this special case; however, changing the grammar in the current document would have been too disruptive.
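
The interpretation of the head-number for #7 described in this section can be summarized in a short Python sketch. The function names are hypothetical, and the encoding of simple values (one byte for 0..23, the byte 0xF8 followed by the value for 32..255) is taken from RFC 8949 rather than from the text above.

```python
def simple_head_meaning(head_number: int) -> str:
    """Say what the number after "#7." stands for."""
    if 0 <= head_number <= 23 or 32 <= head_number <= 255:
        return "simple value"
    if 24 <= head_number <= 31:
        return "additional information"
    raise ValueError("head-number out of range for #7")


def encode_simple_value(value: int) -> bytes:
    """Encode a CBOR simple value (major type 7)."""
    if 0 <= value <= 23:
        return bytes([0xE0 | value])
    if 32 <= value <= 255:
        return bytes([0xF8, value])
    raise ValueError("not a usable simple value")


print(simple_head_meaning(25))          # the float16 case, e.g. #7.25
print(encode_simple_value(20).hex())    # simple value 20 is "false"
```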

6. References

6.1. Normative References

[RFC8610]: Birkholz, H., Vigano, C., and C. Bormann, "Concise Data Definition Language (CDDL): A Notational Convention to Express Concise Binary Object Representation (CBOR) and JSON Data Structures", RFC 8610, DOI 10.17487/RFC8610, June 2019, <https://www.rfc-editor.org/info/rfc8610>.
[STD68]: Internet Standard 68, <https://www.rfc-editor.org/info/std68>.
At the time of writing, this STD comprises the following:
Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, DOI 10.17487/RFC5234, January 2008, <https://www.rfc-editor.org/info/rfc5234>.
[STD94]: Internet Standard 94, <https://www.rfc-editor.org/info/std94>.
At the time of writing, this STD comprises the following:
Bormann, C. and P. Hoffman, "Concise Binary Object Representation (CBOR)", STD 94, RFC 8949, DOI 10.17487/RFC8949, December 2020, <https://www.rfc-editor.org/info/rfc8949>.

6.2. Informative References

[CDDL-MODULES]: Bormann, C. and B. Moran, "CDDL Module Structure", Work in Progress, Internet-Draft, draft-ietf-cbor-cddl-modules-03, 1 September 2024, <https://datatracker.ietf.org/doc/html/draft-ietf-cbor-cddl-modules-03>.
[EDN-LITERALS]: Bormann, C., "CBOR Extended Diagnostic Notation (EDN)", Work in Progress, Internet-Draft, draft-ietf-cbor-edn-literals-13, 3 November 2024, <https://datatracker.ietf.org/doc/html/draft-ietf-cbor-edn-literals-13>.
[Err6278]: RFC Errata, Erratum ID 6278, RFC 8610, <https://www.rfc-editor.org/errata/eid6278>.
[Err6526]: RFC Errata, Erratum ID 6526, RFC 8610, <https://www.rfc-editor.org/errata/eid6526>.
[Err6527]: RFC Errata, Erratum ID 6527, RFC 8610, <https://www.rfc-editor.org/errata/eid6527>.
[Err6543]: RFC Errata, Erratum ID 6543, RFC 8610, <https://www.rfc-editor.org/errata/eid6543>.
[Err6575]: RFC Errata, Erratum ID 6575, RFC 8610, <https://www.rfc-editor.org/errata/eid6575>.
[RFC7405]: Kyzivat, P., "Case-Sensitive String Support in ABNF", RFC 7405, DOI 10.17487/RFC7405, December 2014, <https://www.rfc-editor.org/info/rfc7405>.
[RFC9165]: Bormann, C., "Additional Control Operators for the Concise Data Definition Language (CDDL)", RFC 9165, DOI 10.17487/RFC9165, December 2021, <https://www.rfc-editor.org/info/rfc9165>.
[RFC9277]: Richardson, M. and C. Bormann, "On Stable Storage for Items in Concise Binary Object Representation (CBOR)", RFC 9277, DOI 10.17487/RFC9277, August 2022, <https://www.rfc-editor.org/info/rfc9277>.
[STD80]: Internet Standard 80, <https://www.rfc-editor.org/info/std80>.
At the time of writing, this STD comprises the following:
Cerf, V., "ASCII format for network interchange", STD 80, RFC 20, DOI 10.17487/RFC0020, October 1969, <https://www.rfc-editor.org/info/rfc20>.
[UNICODE]: The Unicode Consortium, "The Unicode Standard", <https://www.unicode.org/versions/latest/>.

Appendix A. Updated Collected ABNF for CDDL

This appendix is normative.

It provides the full ABNF from [RFC8610] as updated by the present document.

cddl = S *(rule S)
rule = typename [genericparm] S assignt S type
     / groupname [genericparm] S assigng S grpent

typename = id
groupname = id

assignt = "=" / "/="
assigng = "=" / "//="

genericparm = "<" S id S *("," S id S ) ">"
genericarg = "<" S type1 S *("," S type1 S ) ">"

type = type1 *(S "/" S type1)

type1 = type2 [S (rangeop / ctlop) S type2]
; space may be needed before the operator if type2 ends in a name

type2 = value
      / typename [genericarg]
      / "(" S type S ")"
      / "{" S group S "}"
      / "[" S group S "]"
      / "~" S typename [genericarg]
      / "&" S "(" S group S ")"
      / "&" S groupname [genericarg]
      / "#" "6" ["." head-number] "(" S type S ")"
      / "#" "7" ["." head-number]
      / "#" DIGIT ["." uint]                ; major/ai
      / "#"                                 ; any
head-number = uint / ("<" type ">")

rangeop = "..." / ".."

ctlop = "." id

group = grpchoice *(S "//" S grpchoice)

grpchoice = *(grpent optcom)

grpent = [occur S] [memberkey S] type
       / [occur S] groupname [genericarg]  ; preempted by above
       / [occur S] "(" S group S ")"

memberkey = type1 S ["^" S] "=>"
          / bareword S ":"
          / value S ":"

bareword = id

optcom = S ["," S]

occur = [uint] "*" [uint]
      / "+"
      / "?"

uint = DIGIT1 *DIGIT
     / "0x" 1*HEXDIG
     / "0b" 1*BINDIG
     / "0"

value = number
      / text
      / bytes

int = ["-"] uint

; This is a float if it has fraction or exponent; int otherwise
number = hexfloat / (int ["." fraction] ["e" exponent ])
hexfloat = ["-"] "0x" 1*HEXDIG ["." 1*HEXDIG] "p" exponent
fraction = 1*DIGIT
exponent = ["+"/"-"] 1*DIGIT

text = %x22 *SCHAR %x22
SCHAR = %x20-21 / %x23-5B / %x5D-7E / NONASCII / SESC

SESC = "\" ( %x22 / "/" / "\" /                 ; \" \/ \\
             %x62 / %x66 / %x6E / %x72 / %x74 / ; \b \f \n \r \t
             (%x75 hexchar) )                   ; \uXXXX

hexchar = "{" (1*"0" [ hexscalar ] / hexscalar) "}" /
          non-surrogate / (high-surrogate "\" %x75 low-surrogate)
non-surrogate = ((DIGIT / "A"/"B"/"C" / "E"/"F") 3HEXDIG) /
                ("D" %x30-37 2HEXDIG )
high-surrogate = "D" ("8"/"9"/"A"/"B") 2HEXDIG
low-surrogate = "D" ("C"/"D"/"E"/"F") 2HEXDIG
hexscalar = "10" 4HEXDIG / HEXDIG1 4HEXDIG
          / non-surrogate / 1*3HEXDIG

bytes = [bsqual] %x27 *BCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-7E / NONASCII / SESC / "\'" / CRLF
bsqual = "h" / "b64"

id = EALPHA *(*("-" / ".") (EALPHA / DIGIT))
ALPHA = %x41-5A / %x61-7A
EALPHA = ALPHA / "@" / "_" / "$"
DIGIT = %x30-39
DIGIT1 = %x31-39
HEXDIG = DIGIT / "A" / "B" / "C" / "D" / "E" / "F"
HEXDIG1 = DIGIT1 / "A" / "B" / "C" / "D" / "E" / "F"
BINDIG = %x30-31

S = *WS
WS = SP / NL
SP = %x20
NL = COMMENT / CRLF
COMMENT = ";" *PCHAR CRLF
PCHAR = %x20-7E / NONASCII
NONASCII = %xA0-D7FF / %xE000-10FFFD
CRLF = %x0A / %x0D.0A

Figure 11: ABNF for CDDL as Updated

Appendix B. Details about Covering Erratum ID 6543

This appendix is informative.

[Err6543] notes that the ABNF used in [RFC8610] for the content of byte string literals lumps together byte strings notated as text with byte strings notated in base16 (hex) or base64 (but see also updated BCHAR rule in Figure 4):

; ABNF from RFC 8610:

bytes = [bsqual] %x27 *BCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF

Figure 12: Original ABNF from RFC 8610 for BCHAR

B.1. Change Proposed by Erratum ID 6543

Erratum ID 6543 proposes handling the two cases in separate ABNF rules (where, with an updated SESC, BCHAR obviously needs to be updated as above):

; Proposal from Erratum ID 6543:

bytes = %x27 *BCHAR %x27
      / bsqual %x27 *QCHAR %x27
BCHAR = %x20-26 / %x28-5B / %x5D-10FFFD / SESC / CRLF
QCHAR = DIGIT / ALPHA / "+" / "/" / "-" / "_" / "=" / WS

Figure 13: Proposal from Erratum ID 6543 to Split the Byte String Rules

This potentially causes a subtle change, which is hidden in the WS rule:

; ABNF from RFC 8610:

WS = SP / NL
SP = %x20
NL = COMMENT / CRLF
COMMENT = ";" *PCHAR CRLF
PCHAR = %x20-7E / %x80-10FFFD
CRLF = %x0A / %x0D.0A

Figure 14: ABNF Definition of WS from RFC 8610

This allows any non-C0 character in a comment, so this fragment becomes possible:

foo = h'
   43424F52 ; 'CBOR'
   0A       ; LF, but don't use CR!
'

The current text does not say unambiguously whether the three apostrophes need to be escaped with a \ or not, as in:

foo = h'
   43424F52 ; \'CBOR\'
   0A       ; LF, but don\'t use CR!
'

... which would be supported by the existing ABNF in [RFC8610].

B.2. No Further Change Needed after Updating String Literal Grammar

This document takes the simpler approach of leaving the processing of the content of the byte string literal to a semantic step after processing the syntax of the bytes and BCHAR rules, as updated by Figures 2 and 4 in Section 2.1 (updates prompted by the combination of [Err6527] and [Err6278]).

Therefore, the rules in Figure 14 (as updated by Figure 4) are applied to the result of this processing where bsqual is given as h or b64.
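
The second step of this two-step processing can be sketched for the h case as follows. This Python example is an illustration under stated assumptions, not a reference implementation: the name `decode_hex_literal` is hypothetical, its input is assumed to be the literal's content after the bytes/BCHAR syntax (including any `\'` escapes) has already been processed, and comment content is not checked against PCHAR.

```python
import re

# COMMENT = ";" *PCHAR CRLF, simplified: everything from ";" to the line end.
COMMENT = re.compile(r';[^\r\n]*\r?\n')
# Remaining whitespace per WS: a space or a line ending.
WHITESPACE = re.compile(r' |\r?\n')


def decode_hex_literal(content: str) -> bytes:
    """Decode the already-unescaped content of an h'...' literal."""
    without_comments = COMMENT.sub("", content)
    digits = WHITESPACE.sub("", without_comments)
    # bytes.fromhex raises ValueError on non-hex input or an odd digit count.
    return bytes.fromhex(digits)


# Content of the "foo" example from B.1 once the apostrophes are unescaped.
example = "\n   43424F52 ; 'CBOR'\n   0A       ; LF, but don't use CR!\n"
print(decode_hex_literal(example))
```

Because the apostrophes inside the comments are resolved in the first (syntactic) step, the second step sees them as ordinary comment characters and discards them together with the rest of the comment.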

Note that this approach also works well with the use of byte strings in Section 3 of [RFC9165]. It does require some care when copying-and-pasting into CDDL models from ABNF that contains single quotes (which may also hide as apostrophes in comments); these need to be escaped or possibly replaced by %x27.

Finally, the approach taken lends support to extending bsqual in CDDL similar to the way this is done for CBOR diagnostic notation in [EDN-LITERALS]. (Note that, at the time of writing, the processing of string literals is quite similar for both CDDL and Extended Diagnostic Notation (EDN), except that CDDL has end-of-line comments that are ";" based and EDN has two comment syntaxes: one in-line "/" based and one end-of-line "#" based.)
