New features in version 04 of the MHTML IETF drafts

I have just submitted two new IETF drafts from the MHTML working group on sending HTML via e-mail.

The two new drafts are:

draft-ietf-mhtml-spec-04.txt

and

draft-ietf-mhtml-info-04.txt

You can download them from the following anonymous FTP URLs:

ftp://ftp.dsv.su.se/users/jpalme/draft-ietf-mhtml-spec-04.txt

ftp://ftp.dsv.su.se/users/jpalme/draft-ietf-mhtml-info-04.txt

This message summarizes the major changes to the documents. There are also a number of minor changes in wording, which are not documented in this message.

Changes in draft-ietf-mhtml-spec-04.txt:

Added text in section 1. Introduction:

An informational RFC [MHTML-INFO] will be published as a supplement to
this standard. The informational RFC will discuss implementation methods
and some implementation problems. Implementors are recommended to read
this informational RFC when developing implementations of the MHTML
standard.

Added text in section 7: Use of the Content-Type: Multipart/related:

In certain special cases this will not work if the original HTML
document contains URIs as parameters to objects and applets. In such a
case, it might be better to rewrite the document before sending it. This
problem is discussed in more detail in the informational RFC which will
be published as a supplement to this standard.

Added text in section 12: Security considerations:

Some WWW applications hide passwords and tickets (access tokens to
information which may not be available to anyone) and other sensitive
information in hidden fields in the web documents or in on-the-fly
constructed URLs. If a person gets such a document, and forwards it via
e-mail, the person may inadvertently disclose sensitive information.

Added text in section 14: References:

[MHTML-INFO] J. Palme: "Sending HTML in E-mail, an informational
supplement to RFC ???: MIME E-mail Encapsulation of
Aggregate HTML Documents (MHTML)", to be published as an
informational supplement to the MHTML standard.

Changes in draft-ietf-mhtml-info-04.txt:

The sections have been renumbered.

Added phrase in section 1. Abstract:

problems with rewriting of URIs

A whole new section 5:

5. Problems with rewriting URIs when copying HTML documents

Sending of HTML-formatted messages is based on the assumption that an
HTML document, together with in-line objects like images, applets and
frames, can be copied into an e-mail message. Such copying may require
rewriting of URIs containing references between the different message
parts. The MHTML standard [MHTML] has been carefully prepared to allow
existing web pages to be copied without such rewriting, through the use
of the Content-Base and Content-Location MIME content heading fields.

There is however a problem if the source HTML document contains relative
URIs in parameters to objects and applets, such as in the example below:

From: foo1@bar.net
To: foo2@bar.net
Subject: A simple example
Mime-Version: 1.0
Content-Type: multipart/related; boundary="boundary-example-1";
type=Text/HTML
Content-Base: "http://www.ietf.cnri.reston.va.us"

--boundary-example-1
Content-Type: Text/HTML; charset=US-ASCII

... text of the HTML document...
<OBJECT
CLASSID = "clsid:5220cb21-c88d-11cf-b347-00aa00a28331">
<PARAM NAME="imageurl" VALUE="image.gif">
</OBJECT>
...etc...

--boundary-example-1
Content-Location: "image.gif"
Content-Type: IMAGE/GIF
Content-Transfer-Encoding: BASE64

R0lGODlhGAGgAPEAAP/////ZRaCgoAAAACH+PUNvcHlyaWdodCAoQykgMTk5
...etc...

--boundary-example-1--

Only the object might know that the imageurl parameter is a relative
URI. It's nearly impossible for the HTML parser to understand that the
parameter is a relative URI. Simply searching for "image.gif" is not
robust, as the string "image.gif" may be used elsewhere. URIs in scripts
can also have similar problems.
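
The weakness of plain string searching can be illustrated with a minimal Python sketch. The sample markup, the function name naive_rewrite and the replacement URI "cid:part1@example.invalid" are assumptions made for this illustration only; they do not come from the drafts.

```python
# Sample document: "image.gif" occurs once as visible text and once as
# the value of an applet/object parameter.
html = (
    "<P>Download image.gif from our archive.</P>\n"
    '<OBJECT CLASSID="clsid:5220cb21-c88d-11cf-b347-00aa00a28331">\n'
    '<PARAM NAME="imageurl" VALUE="image.gif">\n'
    "</OBJECT>\n"
)


def naive_rewrite(document, relative_uri, replacement):
    """Replace every occurrence of relative_uri, URI or not."""
    return document.replace(relative_uri, replacement)


# Hypothetical replacement URI, used only for this illustration.
rewritten = naive_rewrite(html, "image.gif", "cid:part1@example.invalid")
print(rewritten)
```

The visible sentence is rewritten along with the parameter value, which is exactly the kind of damage a simple search cannot avoid.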

One might envisage even more difficult cases. An applet might take a
parameter "subject" and another parameter "range", and when
subject="auto" and range="1-5" it could compute, and try to use,
auto1.gif, auto2.gif ... auto5.gif as relative URLs.
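
A rough Python sketch of such an applet's logic shows why no rewriter could find these URLs. The function name computed_uris and the PARAM markup are hypothetical; a real applet could combine its parameters in any way it likes.

```python
# Parameters as they might appear in the HTML source (hypothetical markup).
param_markup = (
    '<PARAM NAME="subject" VALUE="auto">\n'
    '<PARAM NAME="range" VALUE="1-5">\n'
)


def computed_uris(subject, value_range):
    """Build the relative URLs the way the imagined applet would."""
    first, last = (int(number) for number in value_range.split("-"))
    return ["%s%d.gif" % (subject, index) for index in range(first, last + 1)]


uris = computed_uris("auto", "1-5")
print(uris)

# None of the computed URLs appears literally in the document text.
print(any(uri in param_markup for uri in uris))
```

Since the file names exist only after the applet has run, neither a parser nor a string search over the document can discover them.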

Some implementation methods described in chapter 4 above, for example
method 2 described in chapter 4.2, may require rewriting of the URIs in
the HTML document.

There is no perfect solution to this problem.

One way of alleviating the problem is to produce the original document
using only absolute URIs, preferably of the CID type, since they are
more easily identifiable.
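
As a sketch of this first approach, the following Python fragment builds a message in which the object parameter already holds a CID URI. It uses the standard library email package; the addresses are taken from the example above, while the variable names and the placeholder image bytes are assumptions for illustration.

```python
from email.message import EmailMessage
from email.utils import make_msgid

# make_msgid() returns a unique identifier wrapped in angle brackets.
content_id = make_msgid(domain="bar.net")
cid_uri = "cid:" + content_id[1:-1]  # the URI form drops the brackets

html = (
    '<OBJECT CLASSID="clsid:5220cb21-c88d-11cf-b347-00aa00a28331">\n'
    '<PARAM NAME="imageurl" VALUE="' + cid_uri + '">\n'
    "</OBJECT>\n"
)

message = EmailMessage()
message["From"] = "foo1@bar.net"
message["To"] = "foo2@bar.net"
message["Subject"] = "A simple example"
message.set_content(html, subtype="html")

# Placeholder bytes stand in for real image data.
message.add_related(b"GIF89a", maintype="image", subtype="gif",
                    cid=content_id)

print(message.get_content_type())
```

Because every such URI starts with "cid:", a receiving implementation that does have to rewrite references can recognise them far more reliably than a bare file name.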

Another way of alleviating the problem is to make all URIs and
Content-Locations into simple relative URIs containing file names only
(without paths, preferably using a file name format common to most
platforms, i.e. 1-6 ASCII letters or digits, a period, and 1-3 extension
ASCII letters or digits). An implementation using method 2 described in
chapter 4.2 above can then just store the parts as files in an empty
directory on the recipient computer with the Content-Locations as file
names, and then turn the start HTML file over to a web browser, and need
not rewrite the URIs at all. This simple variant of use of the MHTML
standard is probably the most robust, and those implementors who can control
the production of the HTML documents to be sent as e-mail are thus
recommended to use this variant.
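
A receiving implementation along these lines might look roughly like the Python sketch below, which uses the standard library email package. The sample message, the Content-Location "start.htm" on the HTML part, the placeholder image bytes and the names SIMPLE_NAME and unpack are all assumptions for illustration. Handing the start file to a browser is left out.

```python
import base64
import email
import re
import tempfile
from email import policy
from pathlib import Path

# The simple file name format described above: 1-6 ASCII letters or
# digits, a period, and 1-3 extension letters or digits.
SIMPLE_NAME = re.compile(r"[A-Za-z0-9]{1,6}\.[A-Za-z0-9]{1,3}")

PLACEHOLDER_IMAGE = b"GIF89a (placeholder bytes, not a real image)"

RAW_MESSAGE = "\n".join([
    "Mime-Version: 1.0",
    'Content-Type: multipart/related; boundary="boundary-example-1";',
    ' type="text/html"',
    "",
    "--boundary-example-1",
    "Content-Location: start.htm",
    "Content-Type: Text/HTML; charset=US-ASCII",
    "",
    '<OBJECT><PARAM NAME="imageurl" VALUE="image.gif"></OBJECT>',
    "",
    "--boundary-example-1",
    'Content-Location: "image.gif"',
    "Content-Type: IMAGE/GIF",
    "Content-Transfer-Encoding: BASE64",
    "",
    base64.b64encode(PLACEHOLDER_IMAGE).decode("ascii"),
    "",
    "--boundary-example-1--",
    "",
]).encode("ascii")


def unpack(raw, directory):
    """Write each body part to directory under its Content-Location."""
    message = email.message_from_bytes(raw, policy=policy.default)
    written = []
    for part in message.iter_parts():
        location = str(part["Content-Location"] or "").strip().strip('"')
        # Refuse anything that is not a plain file name (no paths).
        if not SIMPLE_NAME.fullmatch(location):
            raise ValueError("not a simple file name: %r" % location)
        target = directory / location
        target.write_bytes(part.get_payload(decode=True))
        written.append(target)
    return written


with tempfile.TemporaryDirectory() as temporary:
    files = unpack(RAW_MESSAGE, Path(temporary))
    names = sorted(path.name for path in files)
    image_bytes = (Path(temporary) / "image.gif").read_bytes()

print(names)
```

Checking each Content-Location against the simple pattern before writing also keeps a message from placing files outside the chosen directory.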


I especially want to thank Lewis Geer, Ed Levinson and Larry Masinter for help in preparing the new text.