| draft-ietf-httpbis-header-structure-01.txt | draft-ietf-httpbis-header-structure-02.txt | |||
|---|---|---|---|---|
| HTTP Working Group P-H. Kamp | HTTP Working Group M. Nottingham | |||
| Internet-Draft The Varnish Cache Project | Internet-Draft Fastly | |||
| Intended status: Standards Track April 24, 2017 | Intended status: Standards Track P-H. Kamp | |||
| Expires: October 26, 2017 | Expires: May 31, 2018 The Varnish Cache Project | |||
| November 27, 2017 | ||||
| HTTP Header Common Structure | Structured Headers for HTTP | |||
| draft-ietf-httpbis-header-structure-01 | draft-ietf-httpbis-header-structure-02 | |||
| Abstract | Abstract | |||
| An abstract data model for HTTP headers, "Common Structure", and a | This document describes Structured Headers, a way of simplifying HTTP | |||
| HTTP/1 serialization of it, generalized from current HTTP headers. | header field definition and parsing. It is intended for use by new | |||
| specifications of HTTP header fields. This includes revisions of | ||||
| existing specifications when doing so does not cause interoperability | ||||
| issues. | ||||
| Note to Readers | Note to Readers | |||
| Discussion of this draft takes place on the HTTP working group | Discussion of this draft takes place on the HTTP working group | |||
| mailing list (ietf-http-wg@w3.org), which is archived at | mailing list (ietf-http-wg@w3.org), which is archived at | |||
| https://lists.w3.org/Archives/Public/ietf-http-wg/ . | https://lists.w3.org/Archives/Public/ietf-http-wg/ [1]. | |||
| Working Group information can be found at http://httpwg.github.io/ ; | _RFC EDITOR: please remove this section before publication_ | |||
| source code and issues list for this draft can be found at | ||||
| https://github.com/httpwg/http-extensions/labels/header-structure . | Working Group information can be found at https://httpwg.github.io/ | |||
| [2]; source code and issues list for this draft can be found at | ||||
| https://github.com/httpwg/http-extensions/labels/header-structure | ||||
| [3]. | ||||
| Status of This Memo | Status of This Memo | |||
| This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
| provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
| Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
| Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
| working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
| Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
| Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
| and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
| time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
| material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
| This Internet-Draft will expire on October 26, 2017. | This Internet-Draft will expire on May 31, 2018. | |||
| Copyright Notice | Copyright Notice | |||
| Copyright (c) 2017 IETF Trust and the persons identified as the | Copyright (c) 2017 IETF Trust and the persons identified as the | |||
| document authors. All rights reserved. | document authors. All rights reserved. | |||
| This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
| Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
| (http://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
| publication of this document. Please review these documents | publication of this document. Please review these documents | |||
| carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
| to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
| include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
| the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
| described in the Simplified BSD License. | described in the Simplified BSD License. | |||
| 1. Introduction | 1. Introduction | |||
| The HTTP protocol does not impose any structure or datamodel on the | Specifying the syntax of new HTTP header fields is an onerous task; | |||
| information in HTTP headers, the HTTP/1 serialization is the | even with the guidance in [RFC7231], Section 8.3.1, there are many | |||
| datamodel: An ASCII string without control characters. | decisions - and pitfalls - for a prospective HTTP header field | |||
| author. | ||||
| HTTP header definitions specify how the string must be formatted and | ||||
| while families of similar headers exist, it still requires an | ||||
| uncomfortably large number of bespoke parser and validation routines | ||||
| to process HTTP traffic correctly. | ||||
| In order to improve performance, HTTP/2 and HPACK use naive text- | ||||
| compression, which incidentally decoupled the on-the-wire | ||||
| serialization from the data model. | ||||
| During the development of HPACK it became evident that significantly | ||||
| bigger gains were available if semantic compression could be used, | ||||
| most notably with timestamps. However, the lack of a common data | ||||
| structure for HTTP headers would make semantic compression one long | ||||
| list of special cases. | ||||
| Parallel to this, various proposals for how to fulfill data- | ||||
| transportation needs, and to a lesser degree to impose some kind of | ||||
| order on HTTP headers, at least going forward, were floated. | ||||
| All of these proposals, JSON, CBOR etc. run into the same basic | ||||
| problem: Their serialization is incompatible with RFC 7230's | ||||
| [RFC7230] ABNF definition of 'field-value'. | ||||
| For binary formats, such as CBOR, a wholesale base64/85 | ||||
| reserialization would be needed, with negative results for both | ||||
| debuggability and bandwidth. | ||||
| For textual formats, such as JSON, the format must first be neutered | ||||
| to not violate field-value's ABNF, and then workarounds added to | ||||
| reintroduce the features just lost, for instance UNICODE strings. | ||||
| The post-surgery format is no longer JSON, and experience | ||||
| indicates that almost-but-not-quite compatibility is worse than no | ||||
| compatibility. | ||||
| This proposal starts from the other end, and builds and generalizes a | ||||
| data structure definition from existing HTTP headers, which means | ||||
| that HTTP/1 serialization and 'field-value' compatibility is built | ||||
| in. | ||||
| If all future HTTP headers are defined to fit into this Common | ||||
| Structure we have at least halted the proliferation of bespoke | ||||
| parsers and started to pave the road for semantic compression | ||||
| serializations of HTTP traffic. | ||||
| 1.1. Terminology | ||||
| In this document, the key words "MUST", "MUST NOT", "REQUIRED", | ||||
| "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", | ||||
| and "OPTIONAL" are to be interpreted as described in BCP 14, RFC 2119 | ||||
| [RFC2119]. | ||||
| 2. Definition of HTTP Header Common Structure | ||||
| The data model of Common Structure is an ordered sequence of named | Likewise, bespoke parsers often need to be written for specific HTTP | |||
| dictionaries. Please see Appendix A for how this model was derived. | headers, because each has slightly different handling of what looks | |||
| like common syntax. | ||||
| The definition of the data model is on purpose abstract, uncoupled | This document introduces structured HTTP header field values | |||
| from any protocol serialization or programming environment | (hereafter, Structured Headers) to address these problems. | |||
| representation, it is meant as the foundation on which all such | Structured Headers define a generic, abstract model for data, along | |||
| manifestations of the model can be built. | with a concrete serialisation for expressing that model in textual | |||
| HTTP headers, as used by HTTP/1 [RFC7230] and HTTP/2 [RFC7540]. | ||||
| Common Structure in ABNF (Slightly bastardized relative to RFC5234 | HTTP headers that are defined as Structured Headers use the types | |||
| [RFC5234]): | defined in this specification to define their syntax and basic | |||
| handling rules, thereby simplifying both their definition and | ||||
| parsing. | ||||
| import token from RFC7230 | Additionally, future versions of HTTP can define alternative | |||
| import DIGIT from RFC5234 | serialisations of the abstract model of Structured Headers, allowing | |||
| headers that use it to be transmitted more efficiently without being | ||||
| redefined. | ||||
| common-structure = 1* ( identifier dictionary ) | Note that it is not a goal of this document to redefine the syntax of | |||
| existing HTTP headers; the mechanisms described herein are only | ||||
| intended to be used with headers that explicitly opt into them. | ||||
| dictionary = * ( identifier [ value ] ) | To specify a header field that uses Structured Headers, see | |||
| Section 2. | ||||
| value = identifier / | Section 4 defines a number of abstract data types that can be used in | |||
| integer / | Structured Headers, of which only three are allowed at the "top" | |||
| number / | level: lists, dictionaries, or items. | |||
| ascii-string / | ||||
| unicode-string / | ||||
| blob / | ||||
| timestamp / | ||||
| common-structure | ||||
| Recursion is included as a way to support deep and more general | Those abstract types can be serialised into textual headers - such as | |||
| data structures, but its use is highly discouraged and where it is | those used in HTTP/1 and HTTP/2 - using the algorithms described in | |||
| used the depth of recursion SHALL always be explicitly limited in the | Section 3. | |||
| specifications of the HTTP headers which allow it. | ||||
| identifier = token [ "/" token ] | 1.1. Notational Conventions | |||
| integer = ["-"] 1*19 DIGIT | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
| "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and | ||||
| "OPTIONAL" in this document are to be interpreted as described in BCP | ||||
| 14 [RFC2119] [RFC8174] when, and only when, they appear in all | ||||
| capitals, as shown here. | ||||
| Integers SHALL be in the range +/- 2^63-1 (= +/- 9223372036854775807) | This document uses the Augmented Backus-Naur Form (ABNF) notation of | |||
| [RFC5234], including the DIGIT, ALPHA and DQUOTE rules from that | ||||
| document. It also includes the OWS rule from [RFC7230]. | ||||
| number = ["-"] DIGIT '.' 1*14DIGIT / | 2. Specifying Structured Headers | |||
| ["-"] 2DIGIT '.' 1*13DIGIT / | ||||
| ["-"] 3DIGIT '.' 1*12DIGIT / | ||||
| ... / | ||||
| ["-"] 12DIGIT '.' 1*3DIGIT / | ||||
| ["-"] 13DIGIT '.' 1*2DIGIT / | ||||
| ["-"] 14DIGIT '.' 1DIGIT | ||||
| The limit of 15 significant digits is chosen so that numbers can be | HTTP headers that use Structured Headers need to be defined to do so | |||
| correctly represented by IEEE754 64 bit binary floating point. | explicitly; recipients and generators need to know that the | |||
| requirements of this document are in effect. The simplest way to do | ||||
| that is by referencing this document in its definition. | ||||
| ascii-string = * %x20-7e | The field's definition will also need to specify the field-value's | |||
| allowed syntax, in terms of the types described in Section 4, along | ||||
| with their associated semantics. | ||||
| This is intended to be an efficient, "safe" and uncomplicated string | Field definitions MUST NOT relax or otherwise modify the requirements | |||
| type, for uses where the string content is culturally neutral or | of this specification; doing so would preclude handling by generic | |||
| where it will not be user visible. | software. | |||
| unicode-string = * UNICODE | However, field definitions are encouraged to clearly state additional | |||
| constraints upon the syntax, as well as the consequences when those | ||||
| constraints are violated. | ||||
| UNICODE = <U+0000-U+D7FF / U+E000-U+10FFFF> | For example: | |||
| # UNICODE nicked from draft-seantek-unicode-in-abnf-02 | ||||
| Unicode-strings are unrestricted because there is no sane and/or | # FooExample Header | |||
| culturally neutral way to subset or otherwise make unicode "safe", | ||||
| and Unicode is still evolving new and interesting code points. | ||||
| Users of unicode-string SHALL be prepared for the full gamut of | The FooExample HTTP header field conveys a list of numbers about how | |||
| glyph-gymnastics in order to avoid U+1F4A9 U+08 U+1F574. | much Foo the sender has. | |||
| blob = * %0x00-ff | FooExample is a Structured header [RFCxxxx]. Its value MUST be a | |||
| dictionary ([RFCxxxx], Section Y.Y). | ||||
| Blobs are intended primarily for cryptographic data, but can be used | The dictionary MUST contain: | |||
| for any otherwise unsatisfied needs. | ||||
| timestamp = number | * A member whose key is "foo", and whose value is an integer | |||
| ([RFCxxxx], Section Y.Y), indicating the number of foos in | ||||
| the message. | ||||
| * A member whose key is "bar", and whose value is a string | ||||
| ([RFCxxxx], Section Y.Y), conveying the characteristic bar-ness | ||||
| of the message. | ||||
| A timestamp counts seconds since the UNIX time_t epoch, including the | If the parsed header field does not contain both, it MUST be ignored. | |||
| "invisible leap-seconds" misfeature. | ||||
| 3. HTTP/1 Serialization of HTTP Header Common Structure | Note that empty header field values are not allowed by the syntax, | |||
| and therefore will be considered errors. | ||||
| In ABNF: | 3. Parsing Requirements for Textual Headers | |||
| import OWS from RFC7230 | When a receiving implementation parses textual HTTP header fields | |||
| import HEXDIG, DQUOTE from RFC5234 | (e.g., in HTTP/1 or HTTP/2) that are known to be Structured Headers, | |||
| import EmbeddedUnicodeChar from RFC5137 | it is important that care be taken, as there are a number of edge | |||
| cases that can cause interoperability or even security problems. | ||||
| This section specifies the algorithm for doing so. | ||||
| h1-common-structure-header = | Given an ASCII string input_string that represents the chosen | |||
| h1-common-structure-legacy-header / | header's field-value, return the parsed header value. Note that | |||
| h1-common-structure-self-identifying-header | input_string may incorporate multiple header lines combined into one | |||
| comma-separated field-value, as per [RFC7230], Section 3.2.2. | ||||
| h1-common-structure-legacy-header = | 1. Discard any OWS from the beginning of input_string. | |||
| field-name ":" OWS h1-common-structure | ||||
| Only white-listed legacy headers (see Section 8) can use this format. | 2. If the field-value is defined to be a dictionary, return the | |||
| result of Parsing a Dictionary from Textual Headers | ||||
| (Section 4.7). | ||||
| h1-common-structure-self-identifying-header: | 3. If the field-value is defined to be a list, return the result of | |||
| field-name ":" OWS ">" h1-common-structure "<" | Parsing a List from Textual Headers (Section 4.8). | |||
| h1-common-structure = h1-element * ("," h1-element) | 4. If the field-value is defined to be a parameterised label, return | |||
| the result of Parsing a Parameterised Label from Textual Headers | ||||
| (Section 4.4). | ||||
| h1-element = identifier * (";" identifier ["=" h1-value]) | 5. Otherwise, return the result of Parsing an Item from Textual | |||
| Headers (Section 4.6). | ||||
| h1-value = identifier / | Note that in the case of lists and dictionaries, this has the effect | |||
| integer / | of combining multiple instances of the header field into one. | |||
| number / | However, for singular items and parameterised labels, it has the | |||
| h1-ascii-string / | effect of selecting the first value and ignoring any subsequent | |||
| h1-unicode-string / | instances of the field, as well as extraneous text afterwards. | |||
| h1-blob / | ||||
| h1-timestamp / | ||||
| ">" h1-common-structure "<" | ||||
| h1-ascii-string = DQUOTE *( | Additionally, note that the effect of the parsing algorithms as | |||
| ( "\" DQUOTE ) / | specified is generally intolerant of syntax errors; if one is | |||
| ( "\" "\" ) / | encountered, the typical response is to throw an error, thereby | |||
| 0x20-21 / | discarding the entire header field value. This includes any non- | |||
| 0x23-5B / | ASCII characters in input_string. | |||
| 0x5D-7E | ||||
| ) DQUOTE | ||||
| h1-unicode-string = DQUOTE *( | 4. Structured Header Data Types | |||
| ( "\" DQUOTE ) / | ||||
| ( "\" "\" ) / | ||||
| EmbeddedUnicodeChar / | ||||
| 0x20-21 / | ||||
| 0x23-5B / | ||||
| 0x5D-7E | ||||
| ) DQUOTE | ||||
| The dim prospects of ever getting a majority of HTTP1 paths 8-bit | This section defines the abstract value types that can be composed | |||
| clean makes UTF-8 unviable as H1 serialization. Given that very | into Structured Headers, along with the textual HTTP serialisations | |||
| little of the information in HTTP headers is presented to users in | of them. | |||
| the first place, improving H1 and HPACK efficiency by inventing a | ||||
| more efficient RFC5137 compliant escape-sequences seems unwarranted. | ||||
| h1-blob = ":" base64 ":" | 4.1. Numbers | |||
| # XXX: where to import base64 from ? | ||||
| h1-timestamp = number | Abstractly, numbers are integers with an optional fractional part. | |||
| They have a maximum of fifteen digits available to be used in one or | ||||
| both of the parts, as reflected in the ABNF below; this allows them | ||||
| to be stored as IEEE 754 double precision numbers (binary64) | ||||
| ([IEEE754]). | ||||
| XXX: Allow OWS in parsers, but not in generators ? | The textual HTTP serialisation of numbers allows a maximum of fifteen | |||
| In programming environments which do not define a native | digits between the integer and fractional part, along with an | |||
| representation or serialization of Common Structure, the HTTP/1 | optional "-" indicating negative numbers. | |||
| serialization should be used. | ||||
| 4. When to use Common Structure Parser | number = ["-"] ( "." 1*15DIGIT / | |||
| DIGIT "." 1*14DIGIT / | ||||
| 2DIGIT "." 1*13DIGIT / | ||||
| 3DIGIT "." 1*12DIGIT / | ||||
| 4DIGIT "." 1*11DIGIT / | ||||
| 5DIGIT "." 1*10DIGIT / | ||||
| 6DIGIT "." 1*9DIGIT / | ||||
| 7DIGIT "." 1*8DIGIT / | ||||
| 8DIGIT "." 1*7DIGIT / | ||||
| 9DIGIT "." 1*6DIGIT / | ||||
| 10DIGIT "." 1*5DIGIT / | ||||
| 11DIGIT "." 1*4DIGIT / | ||||
| 12DIGIT "." 1*3DIGIT / | ||||
| 13DIGIT "." 1*2DIGIT / | ||||
| 14DIGIT "." 1DIGIT / | ||||
| 15DIGIT ) | ||||
| All future standardized and all private HTTP headers using Common | integer = ["-"] 1*15DIGIT | |||
| Structure should self identify as such. In the HTTP/1 serialization | unsigned = 1*15DIGIT | |||
| by making the first character ">" and the last "<". (These two | ||||
| characters are deliberately "the wrong way" to not clash with | ||||
| existing usages.) | ||||
| Legacy HTTP headers which fit into Common Structure, are marked as | integer and unsigned are defined as conveniences to specification | |||
| such in the IANA Message Header Registry (see Section 8), and a | authors; if their use is specified and their ABNF is not matched, a | |||
| snapshot of the registry can be used to trigger parsing according to | parser MUST consider it to be invalid. | |||
| Common Structure of these headers. | ||||
| 5. Desired Normative Effects | For example, a header whose value is defined as a number could look | |||
| like: | ||||
| All new HTTP headers SHOULD use the Common Structure if at all | ExampleNumberHeader: 4.5 | |||
| possible. | ||||
| 6. Open/Outstanding issues to resolve | 4.1.1. Parsing Numbers from Textual Headers | |||
| 6.1. Single/Multiple Headers | TBD | |||
| Should we allow splitting common structure data over multiple headers | 4.2. Strings | |||
| ? | ||||
| Pro: | Abstractly, strings are ASCII strings [RFC0020], excluding control | |||
| characters (i.e., the range 0x20 to 0x7E). Note that this excludes | ||||
| tabs, newlines and carriage returns. They may be at most 1024 | ||||
| characters long. | ||||
| Avoids size restrictions, easier on-the-fly editing | The textual HTTP serialisation of strings uses a backslash ("\") to | |||
| escape double quotes and backslashes in strings. | ||||
| Contra: | string = DQUOTE 1*1024(char) DQUOTE | |||
| char = unescaped / escape ( DQUOTE / "\" ) | ||||
| unescaped = %x20-21 / %x23-5B / %x5D-7E | ||||
| escape = "\" | ||||
| For example, a header whose value is defined as a string could look | ||||
| like: | ||||
| Cannot act on any such header until all headers have been received. | ExampleStringHeader: "hello world" | |||
| We must define where headers can be split (between identifier and | Note that strings only use DQUOTE as a delimiter; single quotes do | |||
| dictionary ?, in the middle of dictionaries ?) | not delimit strings. Furthermore, only DQUOTE and "\" can be escaped; | |||
| other sequences MUST generate an error. | ||||
| Most on-the-fly editing is hackish at best. | Unicode is not directly supported in Structured Headers, because it | |||
| causes a number of interoperability issues, and - with few exceptions | ||||
| - header values do not require it. | ||||
| 7. Future Work | When it is necessary for a field value to convey non-ASCII string | |||
| 7.1. Redefining existing headers for better performance | content, binary content (Section 4.5) SHOULD be specified, along with | |||
| a character encoding (most likely, UTF-8). | ||||
| The HTTP/1 serializations self-identification mechanism makes it | 4.2.1. Parsing a String from Textual Headers | |||
| possible to extend the definition of existing Appendix A.5 headers | ||||
| into Common Structure. | ||||
| For instance one could imagine: | Given an ASCII string input_string, return an unquoted string. | |||
| input_string is modified to remove the parsed value. | ||||
| Date: >1475061449.201< | 1. Let output_string be an empty string. | |||
| Which would be faster to parse and validate than the current | 2. If the first character of input_string is not DQUOTE, throw an | |||
| definition of the Date header and more precise too. | error. | |||
| Some kind of signal/negotiation mechanism would be required to make | 3. Discard the first character of input_string. | |||
| this work in practice. | ||||
| 7.2. Define a validation dictionary | 4. If input_string contains more than 1025 characters, throw an | |||
| error. | ||||
| A machine-readable specification of the legal contents of HTTP | 5. While input_string is not empty: | |||
| headers would go a long way to improve efficiency and security in | ||||
| HTTP implementations. | ||||
| 8. IANA Considerations | 1. Let char be the result of removing the first character of | |||
| input_string. | ||||
| The IANA Message Header Registry will be extended with an additional | 2. If char is a backslash ("\"): | |||
| field named "Common Structure" which can have the values "True", | ||||
| "False" or "Unknown". | ||||
| The RFC723x headers listed in Appendix A.4 will get the value "True" | 1. If input_string is now empty, throw an error. | |||
| in the new field. | ||||
| The RFC723x headers listed in Appendix A.5 will get the value "False" | 2. Else: | |||
| in the new field. | ||||
| All other existing entries in the registry will be set to "Unknown" | 1. Let next_char be the result of removing the first | |||
| until and if the owner of the entry requests otherwise. | character of input_string. | |||
| 9. Security Considerations | 2. If next_char is not DQUOTE or "\", throw an error. | |||
| Unique dictionary keys are required to reduce the risk of smuggling | 3. Append next_char to output_string. | |||
| attacks. | ||||
| 10. References | 3. Else, if char is DQUOTE, remove the first character of | |||
| 10.1. Normative References | input_string and return output_string. | |||
| [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | 4. Else, append char to output_string. | |||
| Requirement Levels", BCP 14, RFC 2119, | ||||
| DOI 10.17487/RFC2119, March 1997, | ||||
| <http://www.rfc-editor.org/info/rfc2119>. | ||||
| [RFC5137] Klensin, J., "ASCII Escaping of Unicode Characters", | 6. Otherwise, throw an error. | |||
| BCP 137, RFC 5137, DOI 10.17487/RFC5137, February 2008, | ||||
| <http://www.rfc-editor.org/info/rfc5137>. | ||||
| [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax | 4.3. Labels | |||
| Specifications: ABNF", STD 68, RFC 5234, | ||||
| DOI 10.17487/RFC5234, January 2008, | ||||
| <http://www.rfc-editor.org/info/rfc5234>. | ||||
| [RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer | Labels are short (up to 256 characters) textual identifiers; their | |||
| Protocol (HTTP/1.1): Message Syntax and Routing", | abstract model is identical to their expression in the textual HTTP | |||
| RFC 7230, DOI 10.17487/RFC7230, June 2014, | serialisation. | |||
| <http://www.rfc-editor.org/info/rfc7230>. | ||||
| 10.2. Informative References | label = lcalpha *255( lcalpha / DIGIT / "_" / "-"/ "*" / "/" ) | |||
| lcalpha = %x61-7A ; a-z | ||||
| [RFC7231] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer | Note that labels can only contain lowercase letters. | |||
| Protocol (HTTP/1.1): Semantics and Content", RFC 7231, | ||||
| DOI 10.17487/RFC7231, June 2014, | ||||
| <http://www.rfc-editor.org/info/rfc7231>. | ||||
| [RFC7232] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer | For example, a header whose value is defined as a label could look | |||
| Protocol (HTTP/1.1): Conditional Requests", RFC 7232, | like: | |||
| DOI 10.17487/RFC7232, June 2014, | ||||
| <http://www.rfc-editor.org/info/rfc7232>. | ||||
| [RFC7233] Fielding, R., Ed., Lafon, Y., Ed., and J. Reschke, Ed., | ExampleLabelHeader: foo/bar | |||
| "Hypertext Transfer Protocol (HTTP/1.1): Range Requests", | ||||
| RFC 7233, DOI 10.17487/RFC7233, June 2014, | ||||
| <http://www.rfc-editor.org/info/rfc7233>. | ||||
| [RFC7234] Fielding, R., Ed., Nottingham, M., Ed., and J. Reschke, | 4.3.1. Parsing a Label from Textual Headers | |||
| Ed., "Hypertext Transfer Protocol (HTTP/1.1): Caching", | ||||
| RFC 7234, DOI 10.17487/RFC7234, June 2014, | ||||
| <http://www.rfc-editor.org/info/rfc7234>. | ||||
| [RFC7235] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer | Given an ASCII string input_string, return a label. input_string is | |||
| Protocol (HTTP/1.1): Authentication", RFC 7235, | modified to remove the parsed value. | |||
| DOI 10.17487/RFC7235, June 2014, | ||||
| <http://www.rfc-editor.org/info/rfc7235>. | ||||
| [RFC7239] Petersson, A. and M. Nilsson, "Forwarded HTTP Extension", | 1. If input_string contains more than 256 characters, throw an | |||
| RFC 7239, DOI 10.17487/RFC7239, June 2014, | error. | |||
| <http://www.rfc-editor.org/info/rfc7239>. | ||||
| [RFC7694] Reschke, J., "Hypertext Transfer Protocol (HTTP) Client- | 2. If the first character of input_string is not lcalpha, throw an | |||
| Initiated Content-Encoding", RFC 7694, | error. | |||
| DOI 10.17487/RFC7694, November 2015, | ||||
| <http://www.rfc-editor.org/info/rfc7694>. | ||||
| Appendix A. Do HTTP headers have any common structure ? | 3. Let output_string be an empty string. | |||
| Several proposals have been floated in recent years to use some | 4. While input_string is not empty: | |||
| preexisting structured data serialization or other for HTTP headers, | ||||
| to impose some sanity. | ||||
| None of these proposals have gained traction and no obvious candidate | 1. Let char be the result of removing the first character of | |||
| data serializations have been left unexamined. | input_string. | |||
| This effort tries to tackle the question from the other side, by | 2. If char is not one of lcalpha, DIGIT, "_", "-", "*" or "/": | |||
| asking if there is a common structure in existing HTTP headers we can | ||||
| generalize for this purpose. | ||||
| A.1. Survey of HTTP header structure | 1. Prepend char to input_string. | |||
| The RFC723x family of HTTP/1 standards control 49 entries in the IANA | 2. Return output_string. | |||
| Message Header Registry, and they share two common motifs. | ||||
| The majority of RFC723x HTTP headers are lists. A few of them are | 3. Append char to output_string. | |||
| ordered, ('Content-Encoding'), some are unordered ('Connection') and | ||||
| some are ordered by 'q=%f' weight parameters ('Accept') | ||||
| In most cases, the list elements are some kind of identifier, usually | 5. Return output_string. | |||
| derived from ABNF 'token' as defined by [RFC7230]. | ||||
| A subgroup of headers, mostly related to MIME, uses what one could | 4.4. Parameterised Labels | |||
| call a 'qualified token':: | ||||
| qualified-token = token-or-asterix [ "/" token-or-asterix ] | Parameterised Labels are labels (Section 4.3) with up to 256 | |||
| parameters; each parameter has a label and an optional value that is | ||||
| an item (Section 4.6). Ordering between parameters is not | ||||
| significant, and duplicate parameters MUST be considered an error. | ||||
| The second motif is parameterized list elements. The best known is | The textual HTTP serialisation uses semicolons (";") to delimit the | |||
| the "q=0.5" weight parameter, but other parameters exist as well. | parameters from each other, and equals ("=") to delimit the parameter | |||
| name from its value. | ||||
| Generalizing from these motifs, our candidate "Common Structure" data | parameterised = label *256( OWS ";" OWS label [ "=" item ] ) | |||
| model becomes an ordered list of named dictionaries. | ||||
| In pidgin ABNF, ignoring white-space for the sake of clarity, the | For example, | |||
| HTTP/1.1 serialization of Common Structure is something like: | ||||
| token-or-asterix = token from RFC7230, but also allowing "*" | ExampleParamHeader: abc; a=1; b=2; c | |||
| qualified-token = token-or-asterix [ "/" token-or-asterix ] | 4.4.1. Parsing a Parameterised Label from Textual Headers | |||
| field-name, see RFC7230 | Given an ASCII string input_string, return a label with a mapping of | |||
| parameters. input_string is modified to remove the parsed value. | ||||
| Common-Structure-Header = field-name ":" 1#named-dictionary | 1. Let primary_label be the result of Parsing a Label from Textual | |||
| Headers (Section 4.3) from input_string. | ||||
| named-dictionary = qualified-token [ *(";" param) ] | 2. Let parameters be an empty mapping. | |||
| param = token [ "=" value ] | 3. In a loop: | |||
| value = we'll get back to this in a moment. | 1. Consume any OWS from the beginning of input_string. | |||
| Nineteen out of the RFC723x's 48 headers, almost 40%, can already be | 2. If the first character of input_string is not ";", exit the | |||
| parsed using this definition, and none of the rest have requirements | loop. | |||
| which could not be met by this data model. See Appendix A.4 and | ||||
| Appendix A.5 for the full survey details. | ||||
| A.2. Survey of values in HTTP headers | 3. Consume a ";" character from the beginning of input_string. | |||
| Surveying the datatypes of HTTP headers, standardized as well as | 4. Consume any OWS from the beginning of input_string. | |||
| private, the following picture emerges: | ||||
| A.2.1. Numbers | 5. Let param_name be the result of Parsing a Label from Textual | |||
| Headers (Section 4.3) from input_string. | ||||
| Integer and floating point are both used. Range and precision is | 6. If param_name is already present in parameters, throw an | |||
| mostly unspecified in controlling documents. | error. | |||
| Scientific notation (9.192631770e9) does not seem to be used | 7. Let param_value be a null value. | |||
| anywhere. | ||||
| The ranges used seem to be minus several thousand to plus a couple of | 8. If the first character of input_string is "=": | |||
| billions, the high end almost exclusively being POSIX time_t | ||||
| timestamps. | ||||
| A.2.2. Timestamps | 1. Consume the "=" character at the beginning of | |||
| input_string. | ||||
| RFC723x text format, but POSIX time_t represented as integer or | 2. Let param_value be the result of Parsing an Item from | |||
| floating point is not uncommon. ISO8601 has also been spotted. | Textual Headers (Section 4.6) from input_string. | |||
| A.2.3. Strings | 9. If parameters has more than 255 members, throw an error. | |||
| The vast majority are pure ASCII strings, with either no escapes, %xx | 10. Add param_name to parameters with the value param_value. | |||
| URL-like escapes or C-style back-slash escapes, possibly with the | ||||
| addition of \uxxxx UNICODE escapes. | ||||
| Where non-ASCII character sets are used, they are almost always | 4. Return the tuple (primary_label, parameters). | |||
| implicit, rather than explicit. UTF8 and ISO-8859-1 seem to be most | ||||
| common. | ||||
| A.2.4. Binary blobs | 4.5. Binary Content | |||
| Often used for cryptographic data. Usually in base64 encoding, | Arbitrary binary content up to 16K in size can be conveyed in | |||
| sometimes ""-quoted, more often not. base85 encoding is also seen, | Structured Headers. | |||
| usually quoted. | ||||
| A.2.5. Identifiers | The textual HTTP serialisation indicates their presence by a leading | |||
| "*", with the data encoded using Base 64 Encoding [RFC4648], without | ||||
| padding (as "=" might be confused with the use of dictionaries). | ||||
| Seems to almost always fit in the RFC723x 'token' definition. | binary = "*" 1*21846(base64) | |||
| base64 = ALPHA / DIGIT / "+" / "/" | ||||
| A.3. Is this actually a useful thing to generalize ? | For example, a header whose value is defined as binary content could | |||
| look like: | ||||
| The number one wishlist item seems to be UNICODE strings, with a big | ExampleBinaryHeader: *cHJldGVuZCB0aGlzIGlzIGJpbmFyeSBjb250ZW50Lg | |||
| side order of not having to write a new parser routine every time | ||||
| somebody comes up with a new header. | ||||
| Having a common parser would indeed be a good thing, and having an | 4.5.1. Parsing Binary Content from Textual Headers | |||
| underlying data model which makes it possible to define a compressed | ||||
| serialization, rather than rely on serialization to text followed by | ||||
| text compression (ie: HPACK) seems like a good idea too. | ||||
| However, when using a datamodel and a parser general enough to | Given an ASCII string input_string, return binary content. | |||
| transport useful data, it will have to be followed by a validation | input_string is modified to remove the parsed value. | |||
| step, which checks that the data also makes sense. | ||||
| Today validation, such as it is, is often done by the bespoke | 1. If the first character of input_string is not "*", throw an | |||
| parsers. | error. | |||
| This then is probably where the next big potential for improvement | 2. Discard the first character of input_string. | |||
| lies: | ||||
| Ideally a machine readable "data dictionary" which makes it possible | 3. Let b64_content be the result of removing content of input_string | |||
| to copy that text out of RFCs, run it through a code generator which | up to but not including the first character that is not in ALPHA, | |||
| spits out validation code which operates on the output of the common | DIGIT, "+" or "/". | |||
| parser. | ||||
| But history has been particularly unkind to that idea. | 4. Let binary_content be the result of Base 64 Decoding [RFC4648] | |||
| b64_content, synthesising padding if necessary. If an error is | ||||
| encountered, throw it. | ||||
| Most attempts studied as part of this effort, have sunk under | 5. Return binary_content. | |||
| complexity caused by reaching for generality, but where scope has | ||||
| been wisely limited, it seems to be possible. | ||||
| So file that idea under "future work". | 4.6. Items | |||
| A.4. RFC723x headers with "common structure" | An item can be a number (Section 4.1), string (Section 4.2), label | |||
| (Section 4.3) or binary content (Section 4.5). | ||||
| o Accept [RFC7231], Section 5.3.2 | item = number / string / label / binary | |||
| o Accept-Charset [RFC7231], Section 5.3.3 | 4.6.1. Parsing an Item from Textual Headers | |||
| o Accept-Encoding [RFC7231], Section 5.3.4, [RFC7694], Section 3 | Given an ASCII string input_string, return an item. input_string is | |||
| modified to remove the parsed value. | ||||
| o Accept-Language [RFC7231], Section 5.3.5 | 1. Discard any OWS from the beginning of input_string. | |||
| o Age [RFC7234], Section 5.1 | 2. If the first character of input_string is a "-" or a DIGIT, | |||
| process input_string as a number (Section 4.1) and return the | ||||
| result, throwing any errors encountered. | ||||
| o Allow [RFC7231], Section 7.4.1 | 3. If the first character of input_string is a DQUOTE, process | |||
| input_string as a string (Section 4.2) and return the result, | ||||
| throwing any errors encountered. | ||||
| o Connection [RFC7230], Section 6.1 | 4. If the first character of input_string is "*", process | |||
| input_string as binary content (Section 4.5) and return the | ||||
| result, throwing any errors encountered. | ||||
| o Content-Encoding [RFC7231], Section 3.1.2.2 | 5. If the first character of input_string is an lcalpha, process | |||
| input_string as a label (Section 4.3) and return the result, | ||||
| throwing any errors encountered. | ||||
| o Content-Language [RFC7231], Section 3.1.3.2 | 6. Otherwise, throw an error. | |||
| o Content-Length [RFC7230], Section 3.3.2 | 4.7. Dictionaries | |||
| o Content-Type [RFC7231], Section 3.1.1.5 | Dictionaries are unordered maps of key-value pairs, where the keys | |||
| are labels (Section 4.3) and the values are items (Section 4.6). | ||||
| There can be between 1 and 1024 members, and keys are required to be | ||||
| unique. | ||||
| o Expect [RFC7231], Section 5.1.1 | In the textual HTTP serialisation, keys and values are separated by | |||
| "=" (without whitespace), and key/value pairs are separated by a | ||||
| comma with optional whitespace. | ||||
| o Max-Forwards [RFC7231], Section 5.1.2 | dictionary = label "=" item *1023( OWS "," OWS label "=" item ) | |||
| o MIME-Version [RFC7231], Appendix A.1 | For example, a header field whose value is defined as a dictionary | |||
| could look like: | ||||
| o TE [RFC7230], Section 4.3 | ExampleDictHeader: foo=1.23, en="Applepie", da=*w4ZibGV0w6ZydGUK | |||
| o Trailer [RFC7230], Section 4.4 | Typically, a header field specification will define the semantics of | |||
| individual keys, as well as whether their presence is required or | ||||
| optional. Recipients MUST ignore keys that are undefined or unknown, | ||||
| unless the header field's specification specifically disallows them. | ||||
| o Transfer-Encoding [RFC7230], Section 3.3.1 | 4.7.1. Parsing a Dictionary from Textual Headers | |||
| o Upgrade [RFC7230], Section 6.7 | Given an ASCII string input_string, return a mapping of (label, | |||
| item). input_string is modified to remove the parsed value. | ||||
| o Vary [RFC7231], Section 7.1.4 | 1. Let dictionary be an empty mapping. | |||
| A.5. RFC723x headers with "uncommon structure" | 2. While input_string is not empty: | |||
| 1 of the RFC723x headers is only reserved, and therefore has no | 1. Let this_key be the result of running Parse Label from | |||
| structure at all: | Textual Headers (Section 4.3) with input_string. If an error | |||
| is encountered, throw it. | ||||
| o Close [RFC7230], Section 8.1 | 2. If dictionary already contains this_key, raise an error. | |||
| 5 of the RFC723x headers are HTTP dates: | 3. Consume a "=" from input_string; if none is present, raise an | |||
| error. | ||||
| o Date [RFC7231], Section 7.1.1.2 | 4. Let this_value be the result of running Parse Item from | |||
| Textual Headers (Section 4.6) with input_string. If an error | ||||
| is encountered, throw it. | ||||
| o Expires [RFC7234], Section 5.3 | 5. Add key this_key with value this_value to dictionary. | |||
| o If-Modified-Since [RFC7232], Section 3.3 | 6. Discard any leading OWS from input_string. | |||
| o If-Unmodified-Since [RFC7232], Section 3.4 | 7. If input_string is empty, return dictionary. | |||
| o Last-Modified [RFC7232], Section 2.2 | 8. Consume a COMMA from input_string; if no comma is present, | |||
| raise an error. | ||||
| 24 of the RFC723x headers use bespoke formats which only a single or | 9. Discard any leading OWS from input_string. | |||
| in rare cases two headers share: | ||||
| o Accept-Ranges [RFC7233], Section 2.3 | 3. Return dictionary. | |||
| * bytes-unit / other-range-unit | 4.8. Lists | |||
| o Authorization [RFC7235], Section 4.2 | Lists are arrays of items (Section 4.6) or parameterised labels | |||
| (Section 4.4), with one to 1024 members. | ||||
| o Proxy-Authorization [RFC7235], Section 4.4 | In the textual HTTP serialisation, each member is separated by a | |||
| comma and optional whitespace. | ||||
| * credentials | list = list_member *1023( OWS "," OWS list_member ) | |||
| list_member = item / parameterised | ||||
| o Cache-Control [RFC7234], Section 5.2 | For example, a header field whose value is defined as a list of | |||
| labels could look like: | ||||
| * 1#cache-directive | ExampleLabelListHeader: foo, bar, baz_45 | |||
| o Content-Location [RFC7231], Section 3.1.4.2 | and a header field whose value is defined as a list of parameterised | |||
| labels could look like: | ||||
| * absolute-URI / partial-URI | ExampleParamListHeader: abc/def; g="hi";j, klm/nop | |||
| o Content-Range [RFC7233], Section 4.2 | 4.8.1. Parsing a List from Textual Headers | |||
| * byte-content-range / other-content-range | Given an ASCII string input_string, return a list of items. | |||
| input_string is modified to remove the parsed value. | ||||
| o ETag [RFC7232], Section 2.3 | 1. Let items be an empty array. | |||
| * entity-tag | 2. While input_string is not empty: | |||
| o Forwarded [RFC7239] | 1. Let item be the result of running Parse Item from Textual | |||
| Headers (Section 4.6) with input_string. If an error is | ||||
| encountered, throw it. | ||||
| * 1#forwarded-element | 2. Append item to items. | |||
| o From [RFC7231], Section 5.5.1 | 3. Discard any leading OWS from input_string. | |||
| * mailbox | 4. If input_string is empty, return items. | |||
| o If-Match [RFC7232], Section 3.1 | 5. Consume a COMMA from input_string; if no comma is present, | |||
| o If-None-Match [RFC7232], Section 3.2 | raise an error. | |||
| * "*" / 1#entity-tag | 6. Discard any leading OWS from input_string. | |||
| o If-Range [RFC7233], Section 3.2 | 3. Return items. | |||
| * entity-tag / HTTP-date | 5. IANA Considerations | |||
| o Host [RFC7230], Section 5.4 | This draft has no actions for IANA. | |||
| * uri-host [ ":" port ] | 6. Security Considerations | |||
| o Location [RFC7231], Section 7.1.2 | TBD | |||
| * URI-reference | 7. References | |||
| o Pragma [RFC7234], Section 5.4 | 7.1. Normative References | |||
| * 1#pragma-directive | [RFC0020] Cerf, V., "ASCII format for network interchange", STD 80, | |||
| RFC 20, DOI 10.17487/RFC0020, October 1969, | ||||
| <https://www.rfc-editor.org/info/rfc20>. | ||||
| o Range [RFC7233], Section 3.1 | [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate | |||
| Requirement Levels", BCP 14, RFC 2119, | ||||
| DOI 10.17487/RFC2119, March 1997, | ||||
| <https://www.rfc-editor.org/info/rfc2119>. | ||||
| * byte-ranges-specifier / other-ranges-specifier | [RFC4648] Josefsson, S., "The Base16, Base32, and Base64 Data | |||
| Encodings", RFC 4648, DOI 10.17487/RFC4648, October 2006, | ||||
| <https://www.rfc-editor.org/info/rfc4648>. | ||||
| o Referer [RFC7231], Section 5.5.2 | [RFC5234] Crocker, D., Ed. and P. Overell, "Augmented BNF for Syntax | |||
| Specifications: ABNF", STD 68, RFC 5234, | ||||
| DOI 10.17487/RFC5234, January 2008, | ||||
| <https://www.rfc-editor.org/info/rfc5234>. | ||||
| * absolute-URI / partial-URI | [RFC7230] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer | |||
| Protocol (HTTP/1.1): Message Syntax and Routing", | ||||
| RFC 7230, DOI 10.17487/RFC7230, June 2014, | ||||
| <https://www.rfc-editor.org/info/rfc7230>. | ||||
| o Retry-After [RFC7231], Section 7.1.3 | [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC | |||
| 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, | ||||
| May 2017, <https://www.rfc-editor.org/info/rfc8174>. | ||||
| * HTTP-date / delay-seconds | 7.2. Informative References | |||
| o Server [RFC7231], Section 7.4.2 | [IEEE754] IEEE, "IEEE Standard for Floating-Point Arithmetic", 2008, | |||
| <http://grouper.ieee.org/groups/754/>. | ||||
| o User-Agent [RFC7231], Section 5.5.3 | [RFC7231] Fielding, R., Ed. and J. Reschke, Ed., "Hypertext Transfer | |||
| Protocol (HTTP/1.1): Semantics and Content", RFC 7231, | ||||
| DOI 10.17487/RFC7231, June 2014, | ||||
| <https://www.rfc-editor.org/info/rfc7231>. | ||||
| * product *( RWS ( product / comment ) ) | [RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext | |||
| Transfer Protocol Version 2 (HTTP/2)", RFC 7540, | ||||
| DOI 10.17487/RFC7540, May 2015, | ||||
| <https://www.rfc-editor.org/info/rfc7540>. | ||||
| o Via [RFC7230], Section 5.7.1 | 7.3. URIs | |||
| * 1#( received-protocol RWS received-by [ RWS comment ] ) | [1] https://lists.w3.org/Archives/Public/ietf-http-wg/ | |||
| o Warning [RFC7234], Section 5.5 | [2] https://httpwg.github.io/ | |||
| * 1#warning-value | [3] https://github.com/httpwg/http-extensions/labels/header-structure | |||
| o Proxy-Authenticate [RFC7235], Section 4.3 | Appendix A. Changes | |||
| o WWW-Authenticate [RFC7235], Section 4.1 | ||||
| * 1#challenge | A.1. Since draft-ietf-httpbis-header-structure-01 | |||
| Appendix B. Changes | Replaced with draft-nottingham-structured-headers. | |||
| B.1. Since draft-ietf-httpbis-header-structure-00 | A.2. Since draft-ietf-httpbis-header-structure-00 | |||
| Added signed 64bit integer type. | Added signed 64bit integer type. | |||
| Drop UTF8, and settle on BCP137 [RFC5137]::EmbeddedUnicodeChar for | Drop UTF8, and settle on BCP137 ::EmbeddedUnicodeChar for h1-unicode- | |||
| h1-unicode-string. | string. | |||
| Change h1_blob delimiter to ":" since "'" is valid t_char | Change h1_blob delimiter to ":" since "'" is valid t_char | |||
| Author's Address | Authors' Addresses | |||
| Mark Nottingham | ||||
| Fastly | ||||
| Email: mnot@mnot.net | ||||
| URI: https://www.mnot.net/ | ||||
| Poul-Henning Kamp | Poul-Henning Kamp | |||
| The Varnish Cache Project | The Varnish Cache Project | |||
| Email: phk@varnish-cache.org | Email: phk@varnish-cache.org | |||
| End of changes. 212 change blocks. | ||||
| 442 lines changed or deleted | 435 lines changed or added | |||
This html diff was produced by rfcdiff 1.45. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ | ||||
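
The parsing steps in Sections 4.4.1, 4.5.1 and 4.6.1 of the -02 draft can be sketched in Python roughly as follows. This is an illustrative sketch, not a reference implementation. The function names, the label pattern `LABEL_RE` (Section 4.3 is not shown in this part of the diff) and the integer-only number handling are assumptions made for the example. Quoted strings (Section 4.2) are deliberately left unsupported. The length check in `parse_binary` is taken from the ABNF rather than from the numbered parsing steps. Each function returns the parsed value together with the remaining input instead of modifying `input_string` in place.

```python
import base64
import binascii
import re
import string
from typing import Dict, Optional, Tuple, Union

Item = Union[int, str, bytes]

# Assumption: label grammar; Section 4.3 is outside this excerpt.
LABEL_RE = re.compile(r"[a-z][a-z0-9_\-*/]*")
# Assumption: only integers are handled; Section 4.1 numbers are richer.
INTEGER_RE = re.compile(r"-?[0-9]+")
B64_CHARS = frozenset(string.ascii_letters + string.digits + "+/")
MAX_B64_CHARS = 21846  # from the ABNF: binary = "*" 1*21846(base64)
OWS = " \t"


def parse_label(s: str) -> Tuple[str, str]:
    match = LABEL_RE.match(s)
    if match is None:
        raise ValueError("expected a label")
    return match.group(0), s[match.end():]


def parse_binary(s: str) -> Tuple[bytes, str]:
    """Section 4.5.1: return (binary_content, remaining input)."""
    if not s.startswith("*"):
        raise ValueError("binary content must start with '*'")
    s = s[1:]
    end = 0
    while end < len(s) and s[end] in B64_CHARS:
        end += 1
    b64_content, rest = s[:end], s[end:]
    if not 1 <= len(b64_content) <= MAX_B64_CHARS:
        raise ValueError("binary content length out of range")
    # Synthesise the padding that the textual serialisation omits.
    padded = b64_content + "=" * (-len(b64_content) % 4)
    try:
        return base64.b64decode(padded, validate=True), rest
    except binascii.Error as exc:
        raise ValueError("invalid base64 content") from exc


def parse_item(s: str) -> Tuple[Item, str]:
    """Section 4.6.1, simplified: dispatch on the first character."""
    s = s.lstrip(OWS)
    if not s:
        raise ValueError("expected an item")
    first = s[0]
    if first in "-0123456789":
        match = INTEGER_RE.match(s)
        if match is None:
            raise ValueError("expected a number")
        return int(match.group(0)), s[match.end():]
    if first == "*":
        return parse_binary(s)
    if first in string.ascii_lowercase:
        return parse_label(s)
    raise ValueError("unsupported or invalid item")  # strings not sketched


def parse_parameterised(s: str) -> Tuple[str, Dict[str, Optional[Item]], str]:
    """Section 4.4.1: return (primary_label, parameters, remaining input)."""
    primary_label, s = parse_label(s)
    parameters: Dict[str, Optional[Item]] = {}
    while True:
        s = s.lstrip(OWS)
        if not s.startswith(";"):
            break
        s = s[1:].lstrip(OWS)
        param_name, s = parse_label(s)
        if param_name in parameters:
            raise ValueError("duplicate parameter: " + param_name)
        param_value: Optional[Item] = None
        if s.startswith("="):
            param_value, s = parse_item(s[1:])
        if len(parameters) > 255:
            raise ValueError("too many parameters")
        parameters[param_name] = param_value
    return primary_label, parameters, s
```

Applied to the draft's own examples, `parse_parameterised("abc; a=1; b=2; c")` yields the label `abc` with parameters `a`, `b` and `c` (the last with a null value), and `parse_binary` decodes the ExampleBinaryHeader value to the bytes of "pretend this is binary content.".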