RFC 9292 | Binary HTTP Messages | August 2022 |
Thomson & Wood | Standards Track | [Page] |
This document defines a binary format for representing HTTP messages.
This is an Internet Standards Track document.
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc9292.
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
This document defines a simple format for representing an HTTP message [HTTP], either request or response. This allows for the encoding of HTTP messages that can be conveyed outside an HTTP protocol. This enables the transformation of entire messages, including the application of authenticated encryption.
The design of this format is informed by the framing structure of HTTP/2 [HTTP/2] and HTTP/3 [HTTP/3]. Rules for constructing messages rely on the rules defined in HTTP/2, but the format itself is distinct; see Section 6.
This format defines "message/bhttp", a binary alternative to the "message/http" content type defined in [HTTP/1.1]. A binary format permits more efficient encoding and processing of messages. A binary format also reduces exposure to security problems related to processing of HTTP messages.
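Two modes for encoding are described: a known-length encoding, in which each field section and the content carry a length prefix, and an indeterminate-length encoding, in which field sections and content are ended by a terminator so that a message can be produced before its lengths are known.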
This format is designed to convey the semantics of valid HTTP messages as simply and efficiently as possible. It is not designed to capture all of the details of the encoding of messages from specific HTTP versions [HTTP/1.1] [HTTP/2] [HTTP/3]. As such, this format is unlikely to be suitable for applications that depend on an exact recording of the encoding of messages.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.
This document uses terminology from HTTP [HTTP] and notation from QUIC (Section 1.3 of [QUIC]).
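Section 6 of [HTTP] defines the general structure of HTTP messages and composes those messages into distinct parts. This format describes how those parts are composed into a sequence of bytes. At a high level, binary messages consist of the following parts, in order: a framing indicator (Section 3.3); for a response, zero or more informational responses (Section 3.5.1); control data, which is the method and request target for a request (Section 3.4) or the status code for a response (Section 3.5); a header section (Section 3.6); content; a trailer section; and optional padding.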
All lengths and numeric values are encoded using the variable-length integer encoding from Section 16 of [QUIC]. Integer values do not need to be encoded on the minimum number of bytes necessary.
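The integer encoding referenced above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation; the function names `encode_varint` and `decode_varint` are hypothetical. The encoder always picks the shortest form, while the decoder accepts any of the four lengths, since values need not be minimally encoded.

```python
def encode_varint(value: int) -> bytes:
    """Encode using the shortest of the 1-, 2-, 4-, or 8-byte forms."""
    for prefix, length in enumerate((1, 2, 4, 8)):
        bits = 8 * length - 2  # usable bits after the 2-bit length prefix
        if 0 <= value < (1 << bits):
            return (value | (prefix << bits)).to_bytes(length, "big")
    raise ValueError("value out of range for a variable-length integer")


def decode_varint(data: bytes, offset: int = 0):
    """Decode one integer; return (value, offset of the next byte)."""
    if offset >= len(data):
        raise ValueError("no bytes left to decode")
    length = 1 << (data[offset] >> 6)  # top two bits select 1, 2, 4, or 8 bytes
    if offset + length > len(data):
        raise ValueError("truncated integer")
    value = int.from_bytes(data[offset:offset + length], "big")
    value &= (1 << (8 * length - 2)) - 1  # clear the two length bits
    return value, offset + length
```

For instance, the status code 200 needs the two-byte form because the one-byte form only holds values below 64.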
A request or response that has a known length at the time of construction uses the format shown in Figure 1.

Known-Length Request {
  Framing Indicator (i) = 0,
  Request Control Data (..),
  Known-Length Field Section (..),
  Known-Length Content (..),
  Known-Length Field Section (..),
  Padding (..),
}

Known-Length Response {
  Framing Indicator (i) = 1,
  Known-Length Informational Response (..) ...,
  Final Response Control Data (..),
  Known-Length Field Section (..),
  Known-Length Content (..),
  Known-Length Field Section (..),
  Padding (..),
}

Known-Length Field Section {
  Length (i),
  Field Line (..) ...,
}

Known-Length Content {
  Content Length (i),
  Content (..),
}

Known-Length Informational Response {
  Informational Response Control Data (..),
  Known-Length Field Section (..),
}
A known-length request consists of a framing indicator (Section 3.3), request control data (Section 3.4), a header section with a length prefix, binary content with a length prefix, a trailer section with a length prefix, and padding.
A known-length response contains the same fields, with the exception that request control data is replaced by zero or more informational responses (Section 3.5.1) followed by response control data (Section 3.5).
For a known-length encoding, the length prefix on field sections and content is a variable-length encoding of an integer. This integer is the number of bytes in the field section or content, not including the length field itself.
Fields in the header and trailer sections consist of a length-prefixed name and length-prefixed value; see Section 3.6.
The format allows for the message to be truncated before any of the length prefixes that precede the field sections or content; see Section 3.8.
The variable-length integer encoding means that there is a limit of 2^62-1 bytes for each field section and the message content.
A request or response that is constructed without encoding a known length for each section uses the format shown in Figure 2:

Indeterminate-Length Request {
  Framing Indicator (i) = 2,
  Request Control Data (..),
  Indeterminate-Length Field Section (..),
  Indeterminate-Length Content (..),
  Indeterminate-Length Field Section (..),
  Padding (..),
}

Indeterminate-Length Response {
  Framing Indicator (i) = 3,
  Indeterminate-Length Informational Response (..) ...,
  Final Response Control Data (..),
  Indeterminate-Length Field Section (..),
  Indeterminate-Length Content (..),
  Indeterminate-Length Field Section (..),
  Padding (..),
}

Indeterminate-Length Content {
  Indeterminate-Length Content Chunk (..) ...,
  Content Terminator (i) = 0,
}

Indeterminate-Length Content Chunk {
  Chunk Length (i) = 1..,
  Chunk (..),
}

Indeterminate-Length Field Section {
  Field Line (..) ...,
  Content Terminator (i) = 0,
}

Indeterminate-Length Informational Response {
  Informational Response Control Data (..),
  Indeterminate-Length Field Section (..),
}
An indeterminate-length request consists of a framing indicator (Section 3.3), request control data (Section 3.4), a header section that is terminated by a zero value, any number of non-zero-length chunks of binary content, a zero value, a trailer section that is terminated by a zero value, and padding.
An indeterminate-length response contains the same fields, with the exception that request control data is replaced by zero or more informational responses (Section 3.5.1) and response control data (Section 3.5).
The indeterminate-length encoding only uses length prefixes for content blocks. Multiple length-prefixed portions of content can be included, each prefixed by a non-zero Chunk Length integer describing the number of bytes in the block. The Chunk Length is encoded as a variable-length integer.
Each Field Line in an Indeterminate-Length Field Section starts with a Name Length field. An Indeterminate-Length Field Section ends with a Content Terminator field. The zero value of the Content Terminator distinguishes it from the Name Length field, which cannot contain a value of 0.
Indeterminate-length messages can be truncated in a way similar to that for known-length messages; see Section 3.8.
Indeterminate-length messages use the same encoding for Field Line as known-length messages; see Section 3.6.
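The start of each binary message is a framing indicator, a single integer that describes the structure of the subsequent sections. The framing indicator can take just four values: 0 for a request of known length, 1 for a response of known length, 2 for a request of indeterminate length, and 3 for a response of indeterminate length.
Other values cause the message to be invalid; see Section 4.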
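As an illustration of the rule above, a decoder might start by reading the framing indicator and rejecting anything other than the four defined values. The sketch below is one possible shape; `FRAMING_INDICATORS`, `read_varint`, and `read_framing_indicator` are hypothetical names, and raising `ValueError` is just one way to signal an invalid message.

```python
FRAMING_INDICATORS = {
    0: ("request", "known-length"),
    1: ("response", "known-length"),
    2: ("request", "indeterminate-length"),
    3: ("response", "indeterminate-length"),
}


def read_varint(data: bytes, offset: int = 0):
    """Decode a QUIC variable-length integer; return (value, next offset)."""
    if offset >= len(data):
        raise ValueError("message ends before integer")
    length = 1 << (data[offset] >> 6)  # 1, 2, 4, or 8 bytes
    if offset + length > len(data):
        raise ValueError("truncated integer")
    value = int.from_bytes(data[offset:offset + length], "big")
    value &= (1 << (8 * length - 2)) - 1  # clear the two length bits
    return value, offset + length


def read_framing_indicator(message: bytes):
    """Return (message kind, framing mode, offset of the control data)."""
    value, offset = read_varint(message)
    if value not in FRAMING_INDICATORS:
        raise ValueError(f"invalid framing indicator {value}")
    kind, framing = FRAMING_INDICATORS[value]
    return kind, framing, offset
```

Because the indicator is a variable-length integer, the sketch returns the offset at which control data begins rather than assuming the indicator occupies one byte.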
The control data for a request message contains the method and request target. That information is encoded as an ordered sequence of fields: Method, Scheme, Authority, Path. Each of these fields is prefixed with a length.
The values of these fields follow the rules in HTTP/2 (Section 8.3.1 of [HTTP/2]) that apply to the ":method", ":scheme", ":authority", and ":path" pseudo-header fields, respectively. However, where the ":authority" pseudo-header field might be omitted in HTTP/2, a zero-length value is encoded instead.
The format of request control data is shown in Figure 3.

Request Control Data {
  Method Length (i),
  Method (..),
  Scheme Length (i),
  Scheme (..),
  Authority Length (i),
  Authority (..),
  Path Length (i),
  Path (..),
}
The control data for a response message consists of the status code. The status code (Section 15 of [HTTP]) is encoded as a variable-length integer, not a length-prefixed decimal string.
The format of final response control data is shown in Figure 4.

Final Response Control Data {
  Status Code (i) = 200..599,
}

Responses that include informational status codes (see Section 15.2 of [HTTP]) are encoded by repeating the response control data and associated header section until a final status code is encoded; that is, a Status Code field with a value from 200 to 599 (inclusive). The status code distinguishes between informational and final responses.
The format of the informational response control data is shown in Figure 5.

Informational Response Control Data {
  Status Code (i) = 100..199,
}

A response message can include any number of informational responses that precede a final status code. These convey an informational status code and a header block.
If the response control data includes an informational status code (that is, a value between 100 and 199 inclusive), the control data is followed by a header section (encoded with known length or indeterminate length according to the framing indicator) and another block of control data. This pattern repeats until the control data contains a final status code (200 to 599 inclusive).
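The repetition described above can be sketched as a loop that keeps reading a status code and a header section until the status code is final. The sketch handles only the indeterminate-length response framing and stops after the final header section; all function names are hypothetical, and field-name validation is omitted for brevity.

```python
def read_varint(data: bytes, offset: int):
    """Decode a QUIC variable-length integer; return (value, next offset)."""
    if offset >= len(data):
        raise ValueError("message ends before integer")
    length = 1 << (data[offset] >> 6)
    if offset + length > len(data):
        raise ValueError("truncated integer")
    value = int.from_bytes(data[offset:offset + length], "big")
    return value & ((1 << (8 * length - 2)) - 1), offset + length


def take(data: bytes, offset: int, length: int):
    """Return exactly `length` bytes starting at `offset`, or fail."""
    if offset + length > len(data):
        raise ValueError("field extends past end of message")
    return data[offset:offset + length], offset + length


def read_field_section(data: bytes, offset: int):
    """Read field lines until the zero-valued terminator."""
    fields = []
    while True:
        name_length, offset = read_varint(data, offset)
        if name_length == 0:  # terminator, not a Name Length
            return fields, offset
        name, offset = take(data, offset, name_length)
        value_length, offset = read_varint(data, offset)
        value, offset = take(data, offset, value_length)
        fields.append((name, value))


def read_responses(message: bytes):
    """Return ([(status, header fields), ...], offset of the content)."""
    indicator, offset = read_varint(message, 0)
    if indicator != 3:
        raise ValueError("not an indeterminate-length response")
    responses = []
    while True:
        status, offset = read_varint(message, offset)
        if not 100 <= status <= 599:
            raise ValueError(f"invalid status code {status}")
        fields, offset = read_field_section(message, offset)
        responses.append((status, fields))
        if status >= 200:  # final response control data ends the loop
            return responses, offset
```

The last entry in the returned list is always the final response; every entry before it is informational.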
Header and trailer sections consist of zero or more field lines; see Section 5 of [HTTP]. The format of a field section depends on whether the message is of known length or indeterminate length.
Each Field Line encoding includes a name and a value. Both the name and value are length-prefixed sequences of bytes. The Name field is a minimum of one byte. The format of a Field Line is shown in Figure 6.

Field Line {
  Name Length (i) = 1..,
  Name (..),
  Value Length (i),
  Value (..),
}

For field names, byte values that are not permitted in an HTTP field name cause the message to be invalid; see Section 5.1 of [HTTP] for a definition of what is valid and Section 4 regarding the handling of invalid messages. A recipient MUST treat a message that contains field values that would cause an HTTP/2 message to be malformed according to Section 8.2.1 of [HTTP/2] as invalid; see Section 4.
The same field name can be repeated over more than one field line; see Section 5.2 of [HTTP] for the semantics of repeated field names and rules for combining values.
Messages are invalid (Section 4) if they contain fields named ":method", ":scheme", ":authority", ":path", or ":status". Other pseudo-fields that are defined by protocol extensions MAY be included; pseudo-fields cannot be included in trailers (see Section 8.1 of [HTTP/2]). A Field Line containing pseudo-fields MUST precede other Field Line values. A message that contains a pseudo-field after any other field is invalid; see Section 4.
Fields that relate to connections (Section 7.6.1 of [HTTP]) cannot be used to produce the effect on a connection in this context. These fields SHOULD be removed when constructing a binary message. However, they do not cause a message to be invalid (Section 4); permitting these fields allows a binary message to capture messages that are exchanged in a protocol context.
Like HTTP/2 or HTTP/3, this format has an exception for the combination of multiple instances of the Cookie field. Instances of fields with the ASCII-encoded value of "cookie" are combined using a semicolon octet (0x3b) rather than a comma; see Section 8.2.3 of [HTTP/2].
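A rough sketch of combining repeated field lines, with the Cookie exception, might look like the following. The function name `combine_field_lines` is hypothetical. The sketch assumes the two-byte delimiter "; " for cookies, following the HTTP/2 text referenced above, and ", " for all other fields; it does not handle fields that cannot be combined safely.

```python
def combine_field_lines(field_lines):
    """Merge repeated field names into one value per name.

    `field_lines` is an iterable of (name, value) byte-string pairs in the
    order they appear in the field section.
    """
    combined = {}
    for name, value in field_lines:
        if name not in combined:
            combined[name] = value
        elif name == b"cookie":
            combined[name] += b"; " + value  # semicolon (0x3b), then space
        else:
            combined[name] += b", " + value  # ordinary list combination
    return combined


headers = [
    (b"cookie", b"a=1"),
    (b"accept", b"text/html"),
    (b"cookie", b"b=2"),
    (b"accept", b"*/*"),
]
merged = combine_field_lines(headers)
```

Here `merged[b"cookie"]` holds both cookie pairs in a single value, which is the form a recipient that expects one Cookie field would need.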
The content of messages is a sequence of bytes of any length. Though a known-length message has a limit, this limit is large enough that it is unlikely to be a practical limitation. There is no limit to the size of content in an indeterminate-length message.
Messages can be padded with any number of zero-valued bytes. Non-zero padding bytes cause a message to be invalid (see Section 4). Unlike other parts of a message, a processor MAY decide not to validate the value of padding bytes.
Truncation can be used to reduce the size of messages that have no data in trailing field sections or content. If the trailers of a message are empty, they MAY be omitted by the encoder in place of adding a length field equal to zero. An encoder MAY omit empty content in the same way if the trailers are also empty. A message that is truncated at any other point is invalid; see Section 4.
Decoders MUST treat missing truncated fields as equivalent to having been sent with the length field set to zero.
Padding is compatible with truncation of empty parts of the messages. Zero-valued bytes will be interpreted as a zero-length part, which is semantically equivalent to the part being absent.
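The interaction of truncation and padding can be illustrated with a decoder for the tail of a known-length message, that is, the content and trailer section that follow the header section. This is a sketch under stated assumptions: `read_section` and `read_tail` are hypothetical names, the trailer section is returned as raw bytes rather than parsed, and padding is validated even though a processor is allowed to skip that check.

```python
def read_varint(data: bytes, offset: int):
    """Decode a QUIC variable-length integer; return (value, next offset)."""
    length = 1 << (data[offset] >> 6)
    if offset + length > len(data):
        raise ValueError("truncated integer")
    value = int.from_bytes(data[offset:offset + length], "big")
    return value & ((1 << (8 * length - 2)) - 1), offset + length


def read_section(data: bytes, offset: int):
    """Read one length-prefixed section; a missing section is empty."""
    if offset >= len(data):
        return b"", offset  # truncated: same as a zero length prefix
    length, offset = read_varint(data, offset)
    if offset + length > len(data):
        raise ValueError("message truncated inside a section")
    return data[offset:offset + length], offset + length


def read_tail(data: bytes, offset: int):
    """Return (content, raw trailer section) and check the padding."""
    content, offset = read_section(data, offset)
    trailers, offset = read_section(data, offset)
    if any(data[offset:]):
        raise ValueError("non-zero padding")
    return content, trailers
```

With this structure, an absent tail, a tail of two zero bytes, and a tail of many zero bytes all decode to empty content and empty trailers.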
This document describes a number of ways that a message can be invalid. Invalid messages MUST NOT be processed further except to log an error and produce an error response.
The format is designed to allow incremental processing. Implementations need to be aware of the possibility that an error might be detected after performing incremental processing.
This section includes example requests and responses encoded in both known-length and indeterminate-length forms.
The example HTTP/1.1 message in Figure 7 shows the content in the "message/http" format.
Valid HTTP/1.1 messages require lines terminated with CRLF (the two bytes 0x0d and 0x0a). For simplicity and consistency, the content of these examples is limited to text, which also uses CRLF for line endings.

GET /hello.txt HTTP/1.1
User-Agent: curl/7.16.3 libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3
Host: www.example.com
Accept-Language: en, mi

This can be expressed as a binary message (type "message/bhttp") using a known-length encoding as shown in hexadecimal in Figure 8. Figure 8 includes text alongside to show that most of the content is not modified.

00034745 54056874 74707300 0a2f6865  ..GET.https../he
6c6c6f2e 74787440 6c0a7573 65722d61  llo.txt@l.user-a
67656e74 34637572 6c2f372e 31362e33  gent4curl/7.16.3
206c6962 6375726c 2f372e31 362e3320   libcurl/7.16.3
4f70656e 53534c2f 302e392e 376c207a  OpenSSL/0.9.7l z
6c69622f 312e322e 3304686f 73740f77  lib/1.2.3.host.w
77772e65 78616d70 6c652e63 6f6d0f61  ww.example.com.a
63636570 742d6c61 6e677561 67650665  ccept-language.e
6e2c206d 690000                      n, mi..

This example shows that the Host header field is not replicated in the ":authority" field, as is required for ensuring that the request is reproduced accurately; see Section 8.3.1 of [HTTP/2].
The same message can be truncated with no effect on interpretation. In this case, the last two bytes -- corresponding to content and a trailer section -- can each be removed without altering the semantics of the message.
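The known-length request above can be reproduced with a small encoder. The following is a sketch rather than a complete implementation: the function names are hypothetical, integers are always encoded in their shortest form, and no validation of field names or values is performed.

```python
def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer in the shortest of 1, 2, 4, or 8 bytes."""
    for prefix, length in enumerate((1, 2, 4, 8)):
        bits = 8 * length - 2
        if 0 <= value < (1 << bits):
            return (value | (prefix << bits)).to_bytes(length, "big")
    raise ValueError("value out of range for a variable-length integer")


def length_prefixed(data: bytes) -> bytes:
    return encode_varint(len(data)) + data


def encode_field_lines(fields) -> bytes:
    """Concatenate Field Line encodings: prefixed name, then prefixed value."""
    return b"".join(length_prefixed(n) + length_prefixed(v) for n, v in fields)


def encode_known_length_request(method, scheme, authority, path,
                                headers, content=b"", trailers=()):
    return b"".join([
        encode_varint(0),                              # framing indicator
        length_prefixed(method),                       # request control data
        length_prefixed(scheme),
        length_prefixed(authority),
        length_prefixed(path),
        length_prefixed(encode_field_lines(headers)),  # header section
        length_prefixed(content),
        length_prefixed(encode_field_lines(trailers)), # trailer section
    ])


message = encode_known_length_request(
    b"GET", b"https", b"", b"/hello.txt",
    [
        (b"user-agent",
         b"curl/7.16.3 libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3"),
        (b"host", b"www.example.com"),
        (b"accept-language", b"en, mi"),
    ],
)
truncated = message[:-2]  # drop the empty content and trailer prefixes
```

In this sketch, `message` ends with two zero bytes for the empty content and trailer section, and `truncated` is the same request with both omitted.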
The same message, encoded using an indeterminate-length encoding, is shown in Figure 9. As the content of this message is empty, the difference in formats is negligible.

02034745 54056874 74707300 0a2f6865  ..GET.https../he
6c6c6f2e 7478740a 75736572 2d616765  llo.txt.user-age
6e743463 75726c2f 372e3136 2e33206c  nt4curl/7.16.3 l
69626375 726c2f37 2e31362e 33204f70  ibcurl/7.16.3 Op
656e5353 4c2f302e 392e376c 207a6c69  enSSL/0.9.7l zli
622f312e 322e3304 686f7374 0f777777  b/1.2.3.host.www
2e657861 6d706c65 2e636f6d 0f616363  .example.com.acc
6570742d 6c616e67 75616765 06656e2c  ept-language.en,
206d6900 00000000 00000000 00000000   mi.............

This indeterminate-length encoding contains 10 bytes of padding. As two additional bytes can be truncated in the same way as the known-length example, anything up to 12 bytes can be removed from this message without affecting its meaning.
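For comparison, an indeterminate-length encoder replaces the section length prefixes with zero-valued terminators and writes content as a series of non-empty chunks. The sketch below uses hypothetical names and a hypothetical `padding` parameter for the number of zero bytes to append; empty chunks are skipped because a zero Chunk Length would be read as the terminator.

```python
def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer in the shortest of 1, 2, 4, or 8 bytes."""
    for prefix, length in enumerate((1, 2, 4, 8)):
        bits = 8 * length - 2
        if 0 <= value < (1 << bits):
            return (value | (prefix << bits)).to_bytes(length, "big")
    raise ValueError("value out of range for a variable-length integer")


def length_prefixed(data: bytes) -> bytes:
    return encode_varint(len(data)) + data


def encode_field_section(fields) -> bytes:
    """Field lines followed by the zero-valued terminator."""
    lines = b"".join(length_prefixed(n) + length_prefixed(v) for n, v in fields)
    return lines + encode_varint(0)


def encode_indeterminate_request(method, scheme, authority, path, headers,
                                 chunks=(), trailers=(), padding=0):
    out = encode_varint(2)  # framing indicator
    for part in (method, scheme, authority, path):
        out += length_prefixed(part)
    out += encode_field_section(headers)
    for chunk in chunks:
        if chunk:  # a zero-length chunk would look like the terminator
            out += length_prefixed(chunk)
    out += encode_varint(0)  # content terminator
    out += encode_field_section(trailers)
    return out + bytes(padding)  # zero-valued padding bytes


message = encode_indeterminate_request(
    b"GET", b"https", b"", b"/hello.txt",
    [
        (b"user-agent",
         b"curl/7.16.3 libcurl/7.16.3 OpenSSL/0.9.7l zlib/1.2.3"),
        (b"host", b"www.example.com"),
        (b"accept-language", b"en, mi"),
    ],
    padding=10,
)
```

Because each chunk is written as soon as it is available, an encoder of this shape never needs to buffer the whole content to learn its length.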
Response messages can contain interim (1xx) status codes, as the message in Figure 10 shows. Figure 10 includes examples of informational status codes 102 and 103, as defined in [RFC2518] (now obsolete but defines status code 102) and [RFC8297], respectively.

HTTP/1.1 102 Processing
Running: "sleep 15"

HTTP/1.1 103 Early Hints
Link: </style.css>; rel=preload; as=style
Link: </script.js>; rel=preload; as=script

HTTP/1.1 200 OK
Date: Mon, 27 Jul 2009 12:28:53 GMT
Server: Apache
Last-Modified: Wed, 22 Jul 2009 19:15:56 GMT
ETag: "34aa387-d-1568eb00"
Accept-Ranges: bytes
Content-Length: 51
Vary: Accept-Encoding
Content-Type: text/plain

Hello World! My content includes a trailing CRLF.

As this is a longer example, only the indeterminate-length encoding is shown in Figure 11. Note here that the specific text used in the reason phrase is not retained by this encoding.

03406607 72756e6e 696e670a 22736c65  .@f.running."sle
65702031 35220040 67046c69 6e6b233c  ep 15".@g.link#<
2f737479 6c652e63 73733e3b 2072656c  /style.css>; rel
3d707265 6c6f6164 3b206173 3d737479  =preload; as=sty
6c65046c 696e6b24 3c2f7363 72697074  le.link$</script
2e6a733e 3b207265 6c3d7072 656c6f61  .js>; rel=preloa
643b2061 733d7363 72697074 0040c804  d; as=script.@..
64617465 1d4d6f6e 2c203237 204a756c  date.Mon, 27 Jul
20323030 39203132 3a32383a 35332047   2009 12:28:53 G
4d540673 65727665 72064170 61636865  MT.server.Apache
0d6c6173 742d6d6f 64696669 65641d57  .last-modified.W
65642c20 3232204a 756c2032 30303920  ed, 22 Jul 2009
31393a31 353a3536 20474d54 04657461  19:15:56 GMT.eta
67142233 34616133 38372d64 2d313536  g."34aa387-d-156
38656230 30220d61 63636570 742d7261  8eb00".accept-ra
6e676573 05627974 65730e63 6f6e7465  nges.bytes.conte
6e742d6c 656e6774 68023531 04766172  nt-length.51.var
790f4163 63657074 2d456e63 6f64696e  y.Accept-Encodin
670c636f 6e74656e 742d7479 70650a74  g.content-type.t
6578742f 706c6169 6e003348 656c6c6f  ext/plain.3Hello
20576f72 6c642120 4d792063 6f6e7465   World! My conte
6e742069 6e636c75 64657320 61207472  nt includes a tr
61696c69 6e672043 524c462e 0d0a0000  ailing CRLF.....

A response that uses the chunked encoding (see Section 7.1 of [HTTP/1.1]) as shown in Figure 12 can be encoded using indeterminate-length encoding, which minimizes buffering needed to translate into the binary format. However, chunk boundaries do not need to be retained, and any chunk extensions cannot be conveyed using the binary format; see Section 6.

HTTP/1.1 200 OK
Transfer-Encoding: chunked

4
This
6
 conte
13;chunk-extension=foo
nt contains CRLF.

0
Trailer: text

Figure 13 shows this message using the known-length encoding. Note that the Transfer-Encoding header field is removed.

0140c800 1d546869 7320636f 6e74656e  .@...This conten
7420636f 6e746169 6e732043 524c462e  t contains CRLF.
0d0a0d07 74726169 6c657204 74657874  ....trailer.text

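The translation from the chunked message to the known-length response can be sketched as below. The chunk data is joined into a single content value, so chunk boundaries and the chunk extension disappear, and the Transfer-Encoding field is simply not passed to the encoder. Function names and the `informational` parameter are hypothetical.

```python
def encode_varint(value: int) -> bytes:
    """QUIC variable-length integer in the shortest of 1, 2, 4, or 8 bytes."""
    for prefix, length in enumerate((1, 2, 4, 8)):
        bits = 8 * length - 2
        if 0 <= value < (1 << bits):
            return (value | (prefix << bits)).to_bytes(length, "big")
    raise ValueError("value out of range for a variable-length integer")


def length_prefixed(data: bytes) -> bytes:
    return encode_varint(len(data)) + data


def encode_field_section(fields) -> bytes:
    """Known-length field section: total length, then the field lines."""
    lines = b"".join(length_prefixed(n) + length_prefixed(v) for n, v in fields)
    return length_prefixed(lines)


def encode_known_length_response(status, headers, content=b"", trailers=(),
                                 informational=()):
    if not 200 <= status <= 599:
        raise ValueError("final status code must be 200..599")
    out = encode_varint(1)  # framing indicator
    for info_status, info_fields in informational:
        if not 100 <= info_status <= 199:
            raise ValueError("informational status code must be 100..199")
        out += encode_varint(info_status) + encode_field_section(info_fields)
    out += encode_varint(status)  # an integer, not a decimal string
    out += encode_field_section(headers)
    out += length_prefixed(content)
    out += encode_field_section(trailers)
    return out


# Chunk data from the chunked example; the last chunk ends with CRLF.
chunks = [b"This", b" conte", b"nt contains CRLF.\r\n"]
message = encode_known_length_response(
    200,
    headers=[],  # Transfer-Encoding is dropped
    content=b"".join(chunks),
    trailers=[(b"trailer", b"text")],
)
```

Joining the chunks requires buffering all of the content first, which is the cost the indeterminate-length encoding avoids.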
This format is designed to carry HTTP semantics just like HTTP/1.1 [HTTP/1.1], HTTP/2 [HTTP/2], or HTTP/3 [HTTP/3]. However, there are some notable differences between this format and the format used in an interactive protocol version.
In particular, as a standalone representation, this format lacks the following features of the formats used in those protocols:
Some of these features are also absent in HTTP/2 and HTTP/3.
Unlike HTTP/2 and HTTP/3, this format uses a fixed format for control data rather than using pseudo-fields.
Note that while some messages -- CONNECT or upgrade requests in particular -- can be represented using this format, doing so serves no purpose, as these requests are used to affect protocol behavior, which this format cannot do without additional mechanisms.
The "message/bhttp" media type can be used to enclose a single HTTP request or response message, provided that it obeys the MIME restrictions for all "message" types regarding line length and encodings.
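Type name: message
Subtype name: bhttp
Required parameters: N/A
Optional parameters: N/A
Encoding considerations: Only "8bit" or "binary" is permitted.
Interoperability considerations: N/A
Published specification: RFC 9292
Applications that use this media type: Applications seeking to convey HTTP semantics that are independent of a specific protocol.
Fragment identifier considerations: N/A
Person and email address to contact for further information: See the Authors' Addresses section.
Intended usage: COMMON
Restrictions on usage: N/A
Author: See the Authors' Addresses section.
Change controller: IESG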
Many of the considerations that apply to HTTP message handling apply to this format; see Section 17 of [HTTP] and Section 11 of [HTTP/1.1] for common issues in handling HTTP messages.
Strict parsing of the format with no tolerance for errors can help avoid a number of attacks. However, implementations still need to be aware of the possibility of resource exhaustion attacks that might arise from receiving large messages, particularly those with large numbers of fields.
Implementations need to ensure that they aren't subject to resource exhaustion attacks from maliciously crafted messages. Overall, the format is designed to allow for minimal state when processing messages. However, producing a combined field value (Section 5.2 of [HTTP]) for fields might require the commitment of resources. In particular, combining might be necessary for the Cookie field when translating this format for use in other contexts, such as use in an API or translation to HTTP/1.1 [HTTP/1.1], where the recipient of the field might not expect multiple values.
IANA has added the media type "message/bhttp" to the "Media Types" registry at <https://www.iana.org/assignments/media-types>. See Section 7 for registration information.
Julian Reschke, David Schinazi, Lucas Pardue, and Tommy Pauly provided excellent feedback on both the design and its documentation.