By Jacob Palme, e-mail: jpalme@dsv.su.se, at the research group for CMC (Computer Mediated Communication) in the Department of Computer and Systems Sciences at Stockholm University and KTH.
MHTML is the IETF working group for developing standards for sending HTML-formatted text in e-mail.
5.1 By "exact matches", do we mean case-sensitive matches with no resolution such as "file%20name" to "file name"? Note: This should not be a problem if standards are adhered to, since spaces are not legal in URLs. However, it is accepted practice for Web browsers to accept many kinds of illegal URLs, and the two most widely used products both accept spaces in URLs in hyperlinks in HTML documents. How should such a URL be handled in the Content-Location statement? Should the space be converted to %20 (in which case the words about exact matching in mhtml-spec chapter 8.2.2 must be changed), or should it be put in illegal format in the Content-Location header, too?
The MHTML proposed standard (RFC 2110) at present says that URL-s in e-mail headers are to be encoded using the encoding method of RFC 2017, and RFC 2017 refers to RFC 1738, which specifies that illegal characters in URL-s are to be encoded using the % method; for example, a space is encoded as %20. Ed Levinson has proposed that the encoding method of RFC 2047 should be used instead in the special case where RFC 1738 encoding would make it impossible to make the exact match required by RFC 2110. The advantage of this is that when the RFC 2047 encoding is reversed, we get back the same string, and can do the exact match. If RFC 2017/RFC 1738 encoding is used, reversal may reverse too much (for example, a %20 that was already present in the HTML text is also turned into a space), so that the exact match will not work.
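The difference between the two encodings can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: `percent_encode_illegal` is a hypothetical sender that escapes spaces only and leaves existing %XX sequences alone, the sample URL is invented, and the RFC 2047 round trip uses Python's standard `email.header` module rather than anything prescribed by RFC 2110.

```python
from email.header import Header, decode_header
from urllib.parse import unquote


def percent_encode_illegal(url: str) -> str:
    """Hypothetical sender: escape spaces only, keep existing %XX as-is."""
    return url.replace(" ", "%20")


def rfc2047_round_trip(value: str) -> str:
    """Encode a value as an RFC 2047 encoded-word, then decode it again."""
    encoded = Header(value, charset="utf-8").encode()
    return "".join(
        chunk.decode(charset or "ascii") if isinstance(chunk, bytes) else chunk
        for chunk, charset in decode_header(encoded)
    )


# A URL as written in the HTML text: one literal "%20" and one raw space
# (invented sample value).
original = "my%20file name.gif"

percent_encoded = percent_encode_illegal(original)
percent_decoded = unquote(percent_encoded)

# Percent decoding also undoes the "%20" that was already in the HTML,
# so an exact match against the original string fails.
print(percent_decoded == original)

# The RFC 2047 round trip gives back the original string unchanged.
print(rfc2047_round_trip(original) == original)
```

The sketch shows why reversal "may reverse too much": once both the pre-existing %20 and the newly encoded space look the same in the header, the receiver cannot tell which one to decode.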
5.2 Does this apply only to relative Content-Locations without any Content-Base? Should we say something about exactness of matchings when URL-s are resolved using a Content-Base? If so, what?
5.3 What about the case where the URL is relative and unresolvable in the header, but absolute in the HTML text? The present spec does not say what should be done in that case.
Assume you have an HTML document which contains the following element:
<IMG SRC="file name.gif">
and the owner of this HTML document requests that it be sent by e-mail.
What should the e-mail look like in this case?
(a)
    Content-Type: Text/HTML

    <IMG SRC="file name.gif">

    Content-Type: Image/GIF
    Content-Location: "file name.gif"
(b)
    Content-Type: Text/HTML

    <IMG SRC="file%20name.gif">

    Content-Type: Image/GIF
    Content-Location: "file%20name.gif"
(c)
    Content-Type: Text/HTML

    <IMG SRC="file name.gif">

    Content-Type: Image/GIF
    Content-Location: "file%20name.gif"
(a) is not in agreement with RFC 2017, which RFC 2110 refers to, so if we choose (a), RFC 2110 or RFC 2017 must be changed.
(b) means you have to edit the HTML text before sending it, which is not so nice, since you are then opening a big can of worms: Which corrections of faulty HTML should you make before sending it via e-mail?
(c) requires a change in the text about "exact match" in RFC 2110.
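What the three alternatives demand of a receiving mailer can be sketched in a few lines of Python. This is a hedged illustration, not text from RFC 2110: `escape_spaces` and `matches` are hypothetical helpers, and the normalisation covers spaces only.

```python
def escape_spaces(url: str) -> str:
    """Hypothetical normalisation: percent-encode spaces, nothing else."""
    return url.replace(" ", "%20")


def matches(src: str, content_location: str, exact: bool) -> bool:
    """Compare the URL in the HTML text with a Content-Location value."""
    if exact:
        return src == content_location
    return escape_spaces(src) == escape_spaces(content_location)


# (URL in the HTML text, Content-Location value) for each alternative.
alternatives = {
    "a": ("file name.gif", "file name.gif"),
    "b": ("file%20name.gif", "file%20name.gif"),
    "c": ("file name.gif", "file%20name.gif"),
}

for label, (src, location) in alternatives.items():
    print(label,
          matches(src, location, exact=True),
          matches(src, location, exact=False))
```

Under exact matching only (a) and (b) link the image to the HTML text; (c) works only if the receiver normalises before comparing, which is why it requires the "exact match" wording to change.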
If there is both a Content-Base and a Content-Location header, which of them should take precedence in resolving URL-s in the HTML content?
Should the Content-Base and Content-Location be allowed in cases where they do not influence functionality, as a way of informing the reader that a body part was taken from a certain web location?
Is there any reason to remove this passage in RFC 2110 section 4.1:
These two headers may occur both inside and outside of a multipart/related part.
JP comment: The statement is true. The specific usage of Content-Base and Content-Location described in RFC 2110 SHOULD only occur inside Multipart/related, but these two headers can also occur as information to the reader that the body part is also available at a certain URL. And since Text/html can occur outside of Multipart/related (Multipart/related is only needed when the Text/html contains links to other body parts in the same message), Content-Base and Content-Location can also occur outside of Multipart/related, and in my opinion this text should not be removed. Possibly we could change the paragraph to the following.
These two headers may occur both inside and outside of a multipart/related part, but their usage for handling HTML links between body parts in a message SHOULD only occur inside Multipart/related.
Should we allow the same Content-Location on two body parts, if they resolve to different URLs (last paragraph of section 7 in mhtml-spec)?
Suggestion: Yes.
Suppose there are two body parts in a multipart/related. One of them has a Content-Base statement, the other does not.
Example:
Part 1:
    Content-Type: Text/html
    Content-Base: http://foo.net

    <IMG SRC="picture.gif">

Part 2:
    Content-Type: Image/gif
    Content-Location: picture.gif
In this case, should relative-to-absolute conversion take place on "picture.gif" in Part 1, so that it will not match the relative URL in Part 2?
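The consequence of each answer can be seen in a small Python sketch of the example above. It uses the standard `urllib.parse.urljoin` for the relative-to-absolute conversion; the variable names are illustrative only, and the sketch does not claim to be the resolution procedure RFC 2110 intends.

```python
from urllib.parse import urljoin

content_base_part1 = "http://foo.net"   # Content-Base of Part 1
src_in_part1 = "picture.gif"            # URL in the HTML text of Part 1
location_in_part2 = "picture.gif"       # Content-Location of Part 2 (no base)

# Relative-to-absolute conversion of the URL in Part 1.
resolved_src = urljoin(content_base_part1, src_in_part1)
print(resolved_src)

# Without conversion, the two relative URLs match exactly.
print(src_in_part1 == location_in_part2)

# With conversion in Part 1 only, the match against Part 2 is lost.
print(resolved_src == location_in_part2)
```

If conversion is applied, Part 2 would need either its own Content-Base or an absolute Content-Location for the image to be found.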
Should the standard include the new chapter 13. Robustness Principle as suggested in draft-ietf-mhtml-spec-07, or should this chapter be put into the informational draft draft-ietf-mhtml-info, or not be published at all?
Note: Compare the present work in the IETF DRUMS working group, where this kind of information is included, under the title "4. Obsolete Syntax", in the standard-to-be draft-ietf-drums-msg-fmt.
Every single subchapter in chapter 13. Robustness Principle is controversial and we should decide for or against having it (this applies whether this chapter goes into the standard or the informational document).
Should liberal implementations accept input where the type parameter is wrong or omitted?
Should liberal implementations accept input where the type parameter is not quoted?
Should liberal implementations accept input where the start parameter is not quoted with angle brackets?
Should liberal implementations accept and try to use, if necessary, Content-Base and Content-Location headers in multipart headings?
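As a rough illustration of what "liberal" acceptance of the type and start parameters could mean, the following Python sketch reads them with the standard `email` package and tolerates missing quotes and missing angle brackets. The function name `liberal_related_params` and the sample headers are hypothetical; whether an implementation should behave this way is exactly the open question.

```python
from email import message_from_string


def liberal_related_params(header_line: str) -> dict:
    """Hypothetical liberal reader for multipart/related parameters."""
    msg = message_from_string(header_line + "\n\n")
    type_param = msg.get_param("type")    # None if the parameter is omitted
    start_param = msg.get_param("start")
    if start_param is not None:
        # Accept the start value with or without angle brackets.
        start_param = start_param.strip().strip("<>")
    return {"type": type_param, "start": start_param}


# Sample headers (invented values): one fully quoted, one unquoted.
strict = ('Content-Type: multipart/related; type="text/html"; '
          'start="<part1@example.com>"')
sloppy = ("Content-Type: multipart/related; type=text/html; "
          "start=part1@example.com")

print(liberal_related_params(strict))
print(liberal_related_params(sloppy))
print(liberal_related_params("Content-Type: multipart/related"))
```

Both spellings yield the same values here. An omitted type parameter simply comes back as None, leaving the caller to decide whether to fall back on the content type of the root body part.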
Is there any reason to change this passage in RFC 2110 section 4.1:
These two headers are valid only for exactly the content heading or message heading where they occurs and its text. They are thus not valid for the parts inside multipart headings, and are thus meaningless in multipart headings.
Can some of the implementors, who have executable code which can check examples, provide better examples? By better examples I mean examples which are both correct and clarify the controversial points.
Are we aiming at revising RFC 2110 into a revised proposed standard or into a draft standard?
Is it time now to publish draft-ietf-mhtml-info-06.txt as an informational RFC?
Is there any need for a discussion about the charter of the working group, and about whether the working group should be designated as "active" or "inactive"?
The present MHTML standard (RFC 2110 and RFC 2112) says that if the root body part of a multipart/related is of type multipart/alternative, then the type parameter of multipart/related should be "multipart/alternative". It has been suggested that this be changed, so that the type parameter tells what the main part of the multipart/alternative is. One solution might be to change the syntax of the type parameter so that it can, for example, have the value "multipart/alternative;text/html" to indicate that the root is a multipart/alternative whose primary alternative is of type text/html.
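How a receiver might read such a value can be sketched in Python as follows. The function `split_type_parameter` is hypothetical, and the syntax it parses is only the suggestion above, not anything defined in RFC 2110 or RFC 2112. Because the value contains a semicolon, it would have to be sent as a quoted string in the header.

```python
from typing import Optional, Tuple


def split_type_parameter(value: str) -> Tuple[str, Optional[str]]:
    """Split a (proposed) type value into root type and primary alternative."""
    root, _, primary = value.partition(";")
    primary = primary.strip().lower()
    return root.strip().lower(), (primary or None)


# Proposed extended syntax: root type plus primary alternative.
print(split_type_parameter("multipart/alternative;text/html"))

# Present syntax: root type only, no primary alternative given.
print(split_type_parameter("text/html"))
```

A receiver that only understands the present syntax would see the whole string as an unknown media type, so backward compatibility would need to be considered for this proposal.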