From: Bruce Lilly
To: Jacob Palme
Date: Fri, 24 Sep 2004 09:06:48 -0400

Jacob Palme wrote:

> I have received e-mail from an implementor, who has noted a discrepancy between RFC 2110 and 2112 regarding the syntax of the start parameter to the Content-Type: Multipart/related defined in RFC2112 and used in some examples in RFC 2110.
>
> RFC2112 says that there should be angle brackets around the value of the start parameter, e.g.
>
> Content-Type: Multipart/related; boundary="boundary-example-1"; type=Text/HTML; start=
>
> but RFC2110 has some examples without these angle brackets, e.g.
>
> Content-Type: Multipart/related; boundary="boundary-example-1"; type=Text/HTML; start=foo3*foo1@bar.net
>
> My conclusion is that since RFC2112 is the document which defines this parameter, it should take precedence and RFC2110 is incorrect in its examples. However, a good implementor should certainly strip these angle brackets from all Message-IDs (wherever they occur) before comparing them with other Message-IDs or using them in any way. A good implementor should also accept the incorrect syntax without these angle brackets in what it receives, but always use the angle brackets in what it produces.
>
> With this message I only want to check that other e-mail experts agree with this.

First, both RFC 2110 and 2112 are obsolete, 2110 having been superseded by 2557 and 2112 having been obsoleted by RFC 2387.

Second, there are several different types of things being discussed:

    Content-ID (RFC 2045)
    Message-ID (RFCs 822/2822)
    start parameters (RFCs 2387/2045/2231 and the RFC editor errata page)

Related constructs include cid and mid URIs (RFC 2392).

RFC 2387 goes to some length to clarify the issue described; the start parameter is supposed to include the angle brackets which are an integral part of the RFC 822 msg-id construct.

Content identifiers and message identifiers use similar syntax. However they serve distinct purposes.
Prior to RFC 2822, the syntax was identical, viz. RFC 822 msg-id, which in turn was identical in syntax to an RFC 822 route-addr with no route, i.e. an angle-bracketed addr-spec (the latter consisting of a local-part, '@', and a domain). RFC 2822 defines msg-id differently, using id-left and id-right, merely recommending that id-right be a domain. Having said that, all of the other RFCs mentioned above use RFC 822 as their basis, not 2822. There are implications for comparisons (below).

Temporarily leaving aside "strip these angle brackets", there are several issues that should be taken into consideration when comparing identifiers:

* domain names are case-insensitive, so "<1234@foo.example.net>" is semantically identical to "<1234@FoO.ExAmPlE.nEt>". RFC 2822 presents a problem here, because a receiver can never be sure whether an RFC 2822 id-right is a case-insensitive domain name or something else (which might be case-sensitive).
* local-parts and domain literals need to be canonicalized w.r.t. quoting conventions prior to comparisons. The following are all semantically identical:
    (canonical form)
    <"foo.bar"@[1\.2.3.4]>
    <"f\oo.bar"@[1\.2.3.4]>
    <"f\oo\.bar"@[1\.2.3.4]>
    <"f\oo\.bar"@[1\.\2.3.4]>
* if one or more identifiers being compared are in a parameter of a MIME Content-Type or Content-Disposition field, an implementation must reassemble any such parameter fragments, remove any RFC 2231-specific character encoding and/or quoting present, convert to a common charset, and possibly consider the specified language, paying particular attention to the published errata ( http://www.rfc-editor.org/cgi-bin/errata.pl ) for RFC 2231 and any other relevant RFCs.
* if one or more identifiers were obtained from a URI, any URI-encoding (RFC 2396) must be undone prior to comparison.
* identifier syntax (modulo RFC 2822 introductions) and the context in which identifiers appear generally permit comments, whitespace, and line-folding, which should be removed prior to comparison.

Now, regarding "strip these angle-brackets":

* if an implementation chooses to strip the angle brackets which are an integral part of a msg-id as used in Content-ID and Message-ID fields and in "start" parameters (but NOT in CID or MID URIs), it must be done carefully and properly, not with reckless abandon by amateurish programmers. Note that '<', '>', '@', or any other special character may appear in a local-part if quoted using either a quoted-string or qpair backslash quoting; such characters do NOT signal the end of the identifier and must NOT be stripped (though quoting must be canonicalized for comparison as noted above).
* handling of the angle brackets must also be performed with due consideration to security issues, as these characters may have special meaning to some library functions, and there may well be security implications.

In the specific case of CID or MID URIs, the angle brackets are omitted. When comparing a CID or MID URI to an identifier obtained from a different source (e.g. when comparing an MID URI to a Message-ID header field body), an implementation could either properly and carefully strip the delimiting angle brackets (ONLY!) from the non-CID/MID identifier or add brackets to the CID/MID-derived identifier.
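
To illustrate the CID/MID case just described, here is a minimal sketch in Python. The function names are hypothetical, and it assumes the field body has already been unfolded and had comments removed. It removes only the two delimiting brackets from the field body, undoes URI-encoding on the URI side, and then compares the raw strings; it does not canonicalize quoting or case, and it does not handle the two-part "mid:message-id/content-id" form.

```python
from urllib.parse import unquote


def strip_delimiting_brackets(msg_id: str) -> str:
    """Remove ONLY the outermost '<' and '>' of a msg-id.

    Assumes the value is already unfolded and comment-free.
    Any '<' or '>' inside a quoted local-part is left untouched.
    """
    s = msg_id.strip()
    if len(s) < 3 or not (s.startswith("<") and s.endswith(">")):
        raise ValueError("missing delimiting angle brackets: %r" % msg_id)
    return s[1:-1]


def id_from_cid_or_mid_uri(uri: str) -> str:
    """Return the identifier carried by a cid: or mid: URI.

    The URI form has no angle brackets; URI-encoding is undone here.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or scheme.lower() not in ("cid", "mid"):
        raise ValueError("not a cid: or mid: URI: %r" % uri)
    return unquote(rest)


def uri_matches_field(uri: str, field_body: str) -> bool:
    """Raw comparison only; quoting and case are NOT canonicalized."""
    return id_from_cid_or_mid_uri(uri) == strip_delimiting_brackets(field_body)
```

For example, uri_matches_field("cid:foo4%25foo1@bar.net", "<foo4%foo1@bar.net>") is true under these assumptions, because the "%25" in the URI decodes to "%".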
My personal recommendation for such comparisons would be to reassemble fragments, undo quoting and encoding, carefully and correctly separate the identifier into local-part and domain, checking for correct identifier syntax (exactly one '@', which incidentally in very old messages might be " at "), carefully and correctly canonicalize the local-part and domain, then perform a case-insensitive comparison of the domains and a case-sensitive comparison of the local-parts.

Specifically for identifiers, charset and language can probably be ignored in the event that one identifier is obtained from a parameter (because identifiers in field bodies (excepting parameters) are always in a subset of US-ASCII which is invariant across charsets likely to be encountered, and have no language).

I would verify that the angle brackets were present where required, and not present where forbidden (e.g. after handling the considerations mentioned above, if there is a '>' at the end of the domain part, somebody fouled up badly (or RFC 2822 syntax is being used), because a domain name never contains that character).
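
The comparison procedure recommended above might be sketched roughly as follows in Python. This is an illustration under stated assumptions, not a complete RFC 822 parser: the helper names are hypothetical, the input is assumed to be already reassembled, decoded, unfolded and comment-free, RFC 822 (not RFC 2822) syntax is assumed, and the old " at " form is not handled.

```python
from typing import Tuple


def _remove_quoting(text: str, drop_dquotes: bool) -> str:
    """Undo backslash quoted-pairs; optionally drop quoted-string quotes."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch == "\\" and i + 1 < len(text):
            out.append(text[i + 1])  # keep the quoted character itself
            i += 2
            continue
        if not (drop_dquotes and ch == '"'):
            out.append(ch)
        i += 1
    return "".join(out)


def _split_addr_spec(spec: str) -> Tuple[str, str]:
    """Split at the one '@' outside quotes, quoted-pairs and domain literals."""
    in_quotes = in_literal = False
    at_positions = []
    i = 0
    while i < len(spec):
        ch = spec[i]
        if ch == "\\":
            i += 2  # skip the quoted-pair entirely
            continue
        if ch == '"' and not in_literal:
            in_quotes = not in_quotes
        elif ch == "[" and not in_quotes:
            in_literal = True
        elif ch == "]" and not in_quotes:
            in_literal = False
        elif ch == "@" and not in_quotes and not in_literal:
            at_positions.append(i)
        i += 1
    if len(at_positions) != 1:
        raise ValueError("expected exactly one unquoted '@': %r" % spec)
    pos = at_positions[0]
    return spec[:pos], spec[pos + 1:]


def canonical_msg_id(msg_id: str) -> Tuple[str, str]:
    """Return (local-part, domain) with quoting removed and domain lowercased."""
    s = msg_id.strip()
    if len(s) < 3 or s[0] != "<" or s[-1] != ">":
        raise ValueError("angle brackets required here: %r" % msg_id)
    local, domain = _split_addr_spec(s[1:-1])
    return (_remove_quoting(local, drop_dquotes=True),
            _remove_quoting(domain, drop_dquotes=False).lower())


def same_identifier(a: str, b: str) -> bool:
    """Case-sensitive on the local-part, case-insensitive on the domain."""
    return canonical_msg_id(a) == canonical_msg_id(b)
```

With this sketch, the quoted variants listed earlier (for instance <"foo.bar"@[1\.2.3.4]> and <"f\oo\.bar"@[1\.\2.3.4]>) compare equal, as do <1234@foo.example.net> and <1234@FoO.ExAmPlE.nEt>, while identifiers whose local-parts differ only in case do not.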