Eudora is a very capable mail client, but its facilities for organizing, ordering and searching the e-mail data base could be improved. This proposal covers one part of that: the introduction of thread support to Eudora. Although the proposal is based on Eudora, its ideas could also be implemented in other mail clients.
By Professor Jacob Palme, |
||
What is a thread? |
||
A thread is a set of messages which are responses to each other. More exact definitions vary; I suggest the following one here. A starting message is a message which does not contain any In-Reply-To, References or Supersedes header. A thread is the set of all messages which can be reached by recursively following In-Reply-To, References and Supersedes links from a starting message, that is, the starting message together with every message which refers to it directly or indirectly. Note that with this definition the same message can belong to more than one thread, if it is a response to different messages in different threads. In particular, a message with In-Reply-To links to two messages from different threads will not cause those two threads to merge into one. Note also that with this definition a change in the value of the Subject header will not break a thread.
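The definition above can be illustrated with a minimal Python sketch. It assumes the link headers of every stored message have already been parsed into a dictionary; the function name `build_threads` and the input layout are hypothetical and not part of Eudora.

```python
from collections import defaultdict


def build_threads(messages):
    """Group messages into threads according to the definition above.

    messages: dict mapping a Message-ID to the list of Message-IDs found in
    its In-Reply-To, References and Supersedes headers (assumed layout).
    Returns a dict mapping each starting Message-ID to the set of
    Message-IDs in its thread.
    """
    # Reverse the links: referenced message -> messages which refer to it.
    referenced_by = defaultdict(set)
    for msg_id, links in messages.items():
        for parent in links:
            referenced_by[parent].add(msg_id)

    threads = {}
    for msg_id, links in messages.items():
        if links:
            continue  # has a link header, so it is not a starting message
        thread, stack = set(), [msg_id]
        while stack:
            current = stack.pop()
            if current in thread:
                continue
            thread.add(current)
            stack.extend(referenced_by.get(current, ()))
        threads[msg_id] = thread
    return threads


# "d" replies to messages in two different threads.
example = {"a": [], "b": [], "c": ["a"], "d": ["a", "b"], "e": ["d"]}
threads = build_threads(example)
```

In this example, "d" and its reply "e" appear in both the thread starting at "a" and the thread starting at "b", and the two threads remain separate, as the definition requires.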
||
Proposed change in the Eudora user interface |
||
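Display of a message which belongs to a thread

Here is an example of how a message is displayed to the user by Eudora (Macintosh version) today:

[Figure: current Eudora message display (not reproduced)]

Here is a proposed change to this to support threads:

[Figure: proposed message display with thread links (not reproduced)]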
||
When the user clicks on any of the blue, underscored links above, a new window is opened, displaying the referenced message.

A Note About the Supersedes Header

Since I know that Pete Resnick is concerned about the security risks with this second way of implementing Supersedes, I suggest that those who share his concerns implement Supersedes using method 1 rather than method 2.

Value for Users

The value of thread support for users is that they can easily see the position of a message in its thread. Of special value is that users can see whether other people have replied to a message before they write their own reply, and whether a message has been superseded before they read it in the belief that it is still valid. It will also be easy for users to scan a thread.
||
Implementation |
||
Here is a suggestion for a simple way to implement this in Eudora.

A Simple New Message-ID Data Base

The handling of threads requires a new data base. This data base, however, can be very simple. All that is needed is to enter the data base with a Message-ID value as key, and get back a list of all mailboxes, and positions within those mailboxes, where a message with this Message-ID occurs. Because of its simplicity, such a data base is very easy to implement. I would suggest hashing the Message-ID to a value between 1 and 8191. This hash value would refer to a bucket large enough to store 10 message references. The size of the data base would then be about one megabyte. If there are more than 10 messages in the same bucket, the data base file could be extended with overflow buckets. A full bucket would end with the number of its overflow bucket, with 0 indicating that there is no overflow bucket for this hash value.

Storage of Thread Information in Messages

For each message, the Message-IDs of the messages which refer to it should be stored where the message itself is stored. (The Message-IDs of the messages which this message refers to are already available in header fields like "In-Reply-To" and "References", according to IETF standards.) The new header fields "Replied-By", "Referenced-By" and "Superseded-By" could be stored either in the normal message header or in an auxiliary area related to each message. (Eudora already has such an auxiliary area.) They would not be sent out if the message is resent or forwarded, since their values would not be complete: new referencing messages may arrive at a later time.

A Note about the References Header

The References header usually contains not only the directly referenced message, but all messages in the path from the referenced message to the start of the thread. It is not necessary to show all of these messages in the displayed header; showing only the last values in this header may be enough.
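The Message-ID data base with fixed-size buckets and overflow chaining could be sketched as follows. This is an in-memory illustration only, not an on-disk format. The class name `MessageIdIndex`, the use of CRC-32 as hash function and the tuple layout of a message reference are assumptions made for the example; the proposal specifies none of them.

```python
import zlib

NUM_BUCKETS = 8191    # hash values 1..8191, as suggested in the proposal
BUCKET_CAPACITY = 10  # message references per bucket


class MessageIdIndex:
    def __init__(self):
        # Slot 0 is unused so that 0 can mean "no overflow bucket".
        self.buckets = [self._new_bucket() for _ in range(NUM_BUCKETS + 1)]

    @staticmethod
    def _new_bucket():
        return {"entries": [], "overflow": 0}

    @staticmethod
    def _hash(message_id):
        # CRC-32 is an arbitrary choice for this sketch.
        return zlib.crc32(message_id.encode("utf-8")) % NUM_BUCKETS + 1

    def add(self, message_id, mailbox, position):
        number = self._hash(message_id)
        while True:
            bucket = self.buckets[number]
            if len(bucket["entries"]) < BUCKET_CAPACITY:
                bucket["entries"].append((message_id, mailbox, position))
                return
            if bucket["overflow"] == 0:
                # Bucket is full: extend the data base with an overflow bucket.
                self.buckets.append(self._new_bucket())
                bucket["overflow"] = len(self.buckets) - 1
            number = bucket["overflow"]

    def lookup(self, message_id):
        """Return every (mailbox, position) where this Message-ID occurs."""
        found = []
        number = self._hash(message_id)
        while number != 0:
            bucket = self.buckets[number]
            found.extend((mailbox, position)
                         for mid, mailbox, position in bucket["entries"]
                         if mid == message_id)
            number = bucket["overflow"]
        return found

    def remove(self, message_id, mailbox, position):
        """Remove one reference; return True if it was present."""
        entry = (message_id, mailbox, position)
        number = self._hash(message_id)
        while number != 0:
            bucket = self.buckets[number]
            if entry in bucket["entries"]:
                bucket["entries"].remove(entry)
                return True
            number = bucket["overflow"]
        return False


index = MessageIdIndex()
index.add("<1@example.org>", "In", 3)
index.add("<1@example.org>", "Archive", 17)
locations = index.lookup("<1@example.org>")  # [("In", 3), ("Archive", 17)]
```

A size of about one megabyte spread over 8191 buckets corresponds to roughly 120 to 130 bytes per bucket, so an on-disk version would need a compact, fixed-width encoding for each of the 10 references and for the overflow bucket number.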
A user who wants to see the whole thread, rather than only the last References values, can instead use the proposed new command Show Thread.

Updating of Thread Information

Whenever a new message arrives and is stored in a mailbox, the Message-ID data base must be updated. When a message is moved, copied or filtered between mailboxes, the Message-ID data base must also be updated. When a mailbox is rebuilt, the information about its messages in the Message-ID data base should be checked and, if needed, updated. When a message is deleted, information about it must be removed from the Message-ID data base. When a mailbox is deleted, information about all its messages must be removed from the Message-ID data base.

Messages arriving in the wrong order

Occasionally, a message may arrive before another message which it references. To cater for this special case, I suggest a very small data base with a list of the Message-IDs of such missing messages. This data base could be of fixed size, maybe a maximum of ten records, with deletion of the oldest entry if a new entry arrives when the data base is full. When a message arrives whose Message-ID is in this data base, its entry is removed from the data base.

Duplicate messages

A side effect of the
Message-ID data base is that duplicate arrivals of the same message can be noticed. If such a duplicate arrives, it is usually not exactly identical: at least the Received headers are probably different, since the duplicates may have arrived via different routes. Other headers might also differ, such as Resent headers or the comment in the From header (which Eudora itself changes when resending messages). The simplest way to handle duplicate messages is to treat them in the same way as the duplicates which occur when the user copies a message from one mailbox to another, i.e. just note in the Message-ID data base where the different copies occur. One risk with this implementation is that when a user clicks on a thread link, the user may be shown the wrong copy of the message. This risk cannot be avoided: if there is more than one message with the same Message-ID, and a new message arrives with, for example, "In-Reply-To" referring to this Message-ID, there is no way of finding out which of the copies the "In-Reply-To" refers to.
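The arrival-time bookkeeping described in this section (updating the Message-ID data base, recording "Replied-By" back-links, tracking messages which arrive in the wrong order, and noticing duplicates) could be combined roughly as in the sketch below. The class `ThreadStore` and its attributes are hypothetical. For brevity it handles only In-Reply-To links, and it assumes that the small data base of missing messages also remembers which messages are waiting for each missing Message-ID, a detail the proposal does not spell out.

```python
from collections import OrderedDict, defaultdict

MAX_MISSING = 10  # fixed size suggested in the proposal


class ThreadStore:
    def __init__(self):
        # Message-ID -> list of (mailbox, position) for every stored copy.
        self.locations = defaultdict(list)
        # Message-ID -> Message-IDs of replies ("Replied-By" back-links).
        self.replied_by = defaultdict(list)
        # Missing Message-ID -> Message-IDs waiting for it, oldest first.
        self.missing = OrderedDict()

    def on_arrival(self, message_id, in_reply_to, mailbox, position):
        """Record a new message; return True if it is a duplicate."""
        is_duplicate = message_id in self.locations
        self.locations[message_id].append((mailbox, position))
        if is_duplicate:
            return True  # only the extra location is noted

        # Replies which arrived before this message can now be linked.
        for waiting in self.missing.pop(message_id, []):
            self.replied_by[message_id].append(waiting)

        for parent in in_reply_to:
            if parent in self.locations:
                self.replied_by[parent].append(message_id)
            else:
                self._remember_missing(parent, message_id)
        return False

    def _remember_missing(self, parent, child):
        if parent not in self.missing and len(self.missing) >= MAX_MISSING:
            self.missing.popitem(last=False)  # drop the oldest entry
        self.missing.setdefault(parent, []).append(child)


store = ThreadStore()
store.on_arrival("<reply@example.org>", ["<orig@example.org>"], "In", 1)
store.on_arrival("<orig@example.org>", [], "In", 2)
duplicate = store.on_arrival("<orig@example.org>", [], "Archive", 5)
```

After these three calls, the reply which arrived first is listed under the back-links of the original message, the list of missing messages is empty again, and the second copy of the original is reported as a duplicate with both of its locations recorded.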