A Proposal for Extending Eudora
with Thread Support

Eudora is a very capable mail client, but its facilities for organizing, ordering and searching the e-mail data base could be improved. This proposal covers one part of that: the introduction of thread support to Eudora. Although the proposal is based on Eudora, its ideas could also be implemented in other mail clients.

By Professor Jacob Palme,
Stockholm University and KTH Technical University
E-mail: jpalme@dsv.su.se

Table of contents

What is a thread?
Proposed change in the Eudora user interface
Implementation

What is a thread?

A thread is a set of messages that are responses to each other. There are several more exact definitions of a thread. I suggest the following: A starting message is a message which does not contain any In-Reply-To, References or Supersedes header. A thread is the set of all messages which can be reached by recursively following In-Reply-To, References and Supersedes links from a starting message. Note that with this definition, the same message can belong to more than one thread, if it is a response to different messages in different threads. Note that if a message has In-Reply-To links to two messages from different threads, this will not cause these two threads to merge into one thread. Note also that with this definition, a change in the value of the Subject header will not break a thread.
   There are other possible definitions of a thread. Since some mailers do not produce In-Reply-To, References or Supersedes headers, some people prefer to recognize threads by a common subject. However, since most mailers, including Eudora, do produce such headers, I suggest the definition of a thread given above.
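
The definition above can be illustrated with a small sketch. It is not part of any existing Eudora code: the dictionary layout, the function names and the sample Message-IDs are all assumptions made for the example. Each message is represented only by its link headers, and a thread is collected by starting at a starting message and repeatedly visiting the messages that link to what has already been found.

```python
from collections import defaultdict, deque

# The three headers that the definition treats as thread links.
LINK_HEADERS = ("In-Reply-To", "References", "Supersedes")


def build_reverse_links(messages):
    """Map each Message-ID to the IDs of the messages that link to it."""
    referenced_by = defaultdict(set)
    for msg_id, headers in messages.items():
        for name in LINK_HEADERS:
            for target in headers.get(name, []):
                referenced_by[target].add(msg_id)
    return referenced_by


def is_starting_message(headers):
    """A starting message carries none of the link headers."""
    return not any(headers.get(name) for name in LINK_HEADERS)


def thread_of(start_id, referenced_by):
    """Collect every message reachable from start_id via the links."""
    seen = {start_id}
    queue = deque([start_id])
    while queue:
        current = queue.popleft()
        for child in referenced_by.get(current, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical sample data: <d@x> replies to two different starting messages.
messages = {
    "<a@x>": {},
    "<b@x>": {},
    "<c@x>": {"In-Reply-To": ["<a@x>"]},
    "<d@x>": {"In-Reply-To": ["<a@x>", "<b@x>"]},
    "<e@x>": {"In-Reply-To": ["<c@x>"], "References": ["<a@x>", "<c@x>"]},
}

referenced_by = build_reverse_links(messages)
threads = {
    msg_id: thread_of(msg_id, referenced_by)
    for msg_id, headers in messages.items()
    if is_starting_message(headers)
}
```

In this sample, <d@x> ends up in both threads, but the two threads stay separate: <b@x> does not become part of the thread starting at <a@x>, which matches the notes in the definition.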

Proposed change in the Eudora user interface

Display of a message which belongs to a thread

Here is an example of how a message is displayed to the user by Eudora (Macintosh version) today:

[Figure: example of a Eudora message]

Here is a proposed change to this to support threads:

[Figure: Eudora message with added clickable headers such as In-Reply-To and Replied-In]

When the user clicks on any of the blue, underscored links above, a new window is opened, displaying the referenced message.
   In addition, a new command named Show Thread might be added to one of the menus. This command would create a new, temporary mailbox listing all messages in the thread to which the message in the topmost window belongs, and open a window listing the contents of this temporary mailbox. The listed messages would be copied to the new mailbox, not removed from their original place. The temporary mailbox would be purged when its window is closed. A user who wants to copy the thread contents to an existing or new mailbox could use the commands Select All and Transfer from this temporary mailbox.

A Note About the Supersedes Header

A simple and good way of handling the Supersedes header is to treat it in exactly the same way as the In-Reply-To and References headers. Implemented in that way, it will not create any more security problems than In-Reply-To and References do. It will just be a new kind of link. Whether the content of a superseding message really does supersede the superseded message will be judged by the reader, in the same way as the reader judges whether the content of a replying message really is a reply to the replied-to message.
Alternatively, one might mark a superseded message with a new value, "Superseded", in the "Status" column. Such messages might then be handled as seen messages, i.e. not shown as new. The recipient will still see the "Superseded-In" header in the superseding message, and can click on it to see the previous version. This implementation is a little less secure, since it allows a malicious user to send a false superseding message and in that way suppress the viewing of the superseded message.

Since I know that Pete Resnick is concerned about the security risks of this second way of implementing Supersedes, I suggest that those who share his concerns implement Supersedes using the first method (treating it as an ordinary link) rather than the second (marking superseded messages as seen).

Value for Users

The value of thread support for users is that they will easily be able to see the position of a message in its thread. Of special value is that users can see whether other people have replied to a message before they write their own reply, and whether a message has been superseded before they read it in the belief that it is still valid. It will also be easy for users to scan a thread.
   Today, in Eudora, users can scan a thread by sorting a mailbox by subject, but this is not very reliable, because (i) a thread may contain messages which have been filtered to different mailboxes, and (ii) the subject is often changed within a thread.

Implementation

Here is a suggestion for a simple way to implement this in Eudora.

A Simple New Message-ID Data Base

The handling of threads requires a new data base. This data base, however, can be very simple. All that is needed is to look up a Message-ID value as key, and get back a list of all mailboxes, and positions within those mailboxes, where a message with this Message-ID occurs. Such a data base, because of its simplicity, is very easy to implement. I would suggest hashing the Message-ID to a value between 1 and 8191. This hash value would refer to a bucket large enough to store 10 message references. The size of the data base would then be about one megabyte. If there are more than 10 messages in the same bucket, the data base file could be extended with overflow buckets, and a full bucket could end with the number of its overflow bucket, with 0 indicating that there is no overflow bucket for this hash value.
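
As a rough illustration of this bucket layout, here is a minimal in-memory sketch. It is only one possible reading of the suggestion: the class and method names are invented, CRC-32 is an arbitrary choice of hash function, and a real implementation would keep fixed-size records in a file rather than Python lists. (Taking the figures above at face value, one megabyte spread over 8191 buckets of 10 references leaves roughly a dozen bytes per reference; that is an inference from the numbers, not something the proposal states.)

```python
import zlib

NUM_BUCKETS = 8191       # primary buckets, numbered 1..8191
BUCKET_CAPACITY = 10     # message references per bucket


class Bucket:
    def __init__(self):
        self.entries = []   # (message_id, mailbox, position) tuples
        self.overflow = 0   # number of the overflow bucket, 0 = none


class MessageIdIndex:
    """Hypothetical Message-ID data base with chained overflow buckets."""

    def __init__(self):
        # Slot 0 is unused so that 0 can mean "no overflow bucket".
        self.buckets = [None] + [Bucket() for _ in range(NUM_BUCKETS)]

    @staticmethod
    def bucket_number(message_id):
        # Assumption: any stable hash will do; CRC-32 is used here.
        return zlib.crc32(message_id.encode("utf-8")) % NUM_BUCKETS + 1

    def _chain(self, message_id):
        """Yield the primary bucket and then its overflow buckets."""
        number = self.bucket_number(message_id)
        while number != 0:
            bucket = self.buckets[number]
            yield bucket
            number = bucket.overflow

    def add(self, message_id, mailbox, position):
        entry = (message_id, mailbox, position)
        last = None
        for bucket in self._chain(message_id):
            last = bucket
            if len(bucket.entries) < BUCKET_CAPACITY:
                bucket.entries.append(entry)
                return
        # Every bucket in the chain is full: append a new overflow bucket.
        overflow = Bucket()
        overflow.entries.append(entry)
        self.buckets.append(overflow)
        last.overflow = len(self.buckets) - 1

    def lookup(self, message_id):
        """Return all (mailbox, position) pairs stored for a Message-ID."""
        return [
            (mailbox, position)
            for bucket in self._chain(message_id)
            for (stored_id, mailbox, position) in bucket.entries
            if stored_id == message_id
        ]

    def remove(self, message_id, mailbox, position):
        entry = (message_id, mailbox, position)
        for bucket in self._chain(message_id):
            if entry in bucket.entries:
                bucket.entries.remove(entry)
                return True
        return False


index = MessageIdIndex()
for position in range(25):                 # forces two overflow buckets
    index.add("<dup@example.org>", "In", position)
index.add("<other@example.org>", "Out", 3)
```

Because several Message-IDs can hash to the same bucket, each entry keeps the full Message-ID and lookup filters on it. A file-based version would probably store a shorter fingerprint instead to keep the records small.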

Storage of Thread Information in Messages

For each message, the Message-IDs of the messages which refer to it should be stored where the message itself is stored. (The Message-IDs of the messages which this message refers to are already available in header fields like "In-Reply-To" and "References", according to IETF standards.) The new header fields "Replied-By", "Referenced-By" and "Superseded-By" could be stored either in the normal message header or in an auxiliary area related to each message. (Eudora already has such an auxiliary area.) They would not be sent out if the message is resent or forwarded, since their values would not be complete: new referencing messages may arrive at a later time.
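
A minimal sketch of how these reverse fields might be maintained is shown here. The auxiliary area is modelled as a plain dictionary keyed by Message-ID, and the function names are invented for the example; how Eudora's real auxiliary area is laid out is not covered by this proposal.

```python
# Forward header -> reverse field stored with the referenced message.
REVERSE_FIELD = {
    "In-Reply-To": "Replied-By",
    "References": "Referenced-By",
    "Supersedes": "Superseded-By",
}


def record_arrival(aux, msg_id, headers):
    """Note the new message in the reverse fields of every message it links to.

    aux is a hypothetical stand-in for the per-message auxiliary area:
    a dict mapping Message-ID -> {reverse field name: [Message-IDs]}.
    """
    for forward, reverse in REVERSE_FIELD.items():
        for target in headers.get(forward, []):
            ids = aux.setdefault(target, {}).setdefault(reverse, [])
            if msg_id not in ids:
                ids.append(msg_id)


def headers_for_sending(headers):
    """Drop the locally maintained reverse fields before resending or forwarding."""
    local_only = set(REVERSE_FIELD.values())
    return {name: value for name, value in headers.items() if name not in local_only}


aux = {}
record_arrival(
    aux,
    "<reply@x>",
    {"In-Reply-To": ["<orig@x>"], "References": ["<root@x>", "<orig@x>"]},
)
outgoing = headers_for_sending({"Subject": "Hi", "Replied-By": ["<reply@x>"]})
```

Storing the reverse fields in the auxiliary area rather than in the header itself would make the second function unnecessary, since nothing local would ever be mixed into the transmitted header.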

A Note about the References Header

The References header usually contains not only the directly referenced message, but all messages in the path from the referenced message to the start of the thread. One need not show all these messages in the header; it may be enough to show only the last values. A user who wants to see the whole thread can instead use the proposed Show Thread command.
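
Trimming the displayed header could be as simple as the following sketch. It assumes that the header value is a whitespace-separated list of Message-IDs with the oldest first, and the function name and the default of two displayed values are arbitrary choices for the example.

```python
def shown_references(references_value, keep_last=2):
    """Return only the last few Message-IDs of a References header value."""
    if keep_last <= 0:
        return []
    ids = references_value.split()
    return ids[-keep_last:]


visible = shown_references("<a@x> <b@x> <c@x>")
```

The stored header would stay untouched; only the displayed list is shortened.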

Updating of Thread Information

Whenever a new message arrives and is stored in a mailbox, the Message-ID data base must be updated. When a message is moved, copied or filtered between mailboxes, the Message-ID data base must also be updated. When a mailbox is rebuilt, information about its messages in the Message-ID data base should be checked, and, if needed, updated. When a message is deleted, information about it must be removed from the Message-ID data base. When a mailbox is deleted, information about all its messages must be removed from the Message-ID data base.
   There is a risk that a user will delete a mailbox by deleting its file, instead of by using the Eudora command for deleting mailboxes. If this occurs, the Message-ID data base will contain references to non-existing mailboxes. These references might be purged when they are found, or by a background purging procedure executed now and then. The only disadvantage of having these entries is that the data base becomes a little larger than necessary, so there is no need for continuous purging to keep the data base slim.
   I suggest that the In-Reply-To, References, Supersedes, Replied-In, Referenced-In and Superseded-By headers are not reduced or removed when the message they refer to is deleted. It is of value to the user to know the information in these headers, even if the referred-to messages are no longer available. When a user clicks on a reference to a non-existing message, the user would get an error message: "This message has been deleted".
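
The update rules in this section might be sketched as follows. For brevity the data base is modelled as a dictionary from Message-ID to a set of (mailbox, position) pairs instead of hash buckets. All names are hypothetical, and the set of existing mailboxes is simply passed in by the caller.

```python
from collections import defaultdict


class LocationIndex:
    """Hypothetical stand-in for the Message-ID data base."""

    def __init__(self):
        self.locations = defaultdict(set)   # Message-ID -> {(mailbox, position)}

    def message_stored(self, message_id, mailbox, position):
        # Called on arrival, and for the target of a copy or filter action.
        self.locations[message_id].add((mailbox, position))

    def message_deleted(self, message_id, mailbox, position):
        self.locations[message_id].discard((mailbox, position))
        if not self.locations[message_id]:
            del self.locations[message_id]

    def message_moved(self, message_id, old, new):
        self.message_deleted(message_id, *old)
        self.message_stored(message_id, *new)

    def mailbox_deleted(self, mailbox):
        for message_id in list(self.locations):
            remaining = {loc for loc in self.locations[message_id] if loc[0] != mailbox}
            if remaining:
                self.locations[message_id] = remaining
            else:
                del self.locations[message_id]

    def resolve(self, message_id, existing_mailboxes):
        """Find a copy to display, purging stale references found on the way."""
        live = {
            loc for loc in self.locations.get(message_id, ())
            if loc[0] in existing_mailboxes
        }
        if not live:
            self.locations.pop(message_id, None)
            raise LookupError("This message has been deleted")
        self.locations[message_id] = live
        return min(live)


index = LocationIndex()
index.message_stored("<m@x>", "In", 4)
index.message_moved("<m@x>", ("In", 4), ("Archive", 0))
found = index.resolve("<m@x>", {"In", "Archive"})
```

Here resolve purges stale entries only when a link is actually followed, which corresponds to the "purged when they are found" option; a background purge would run the same check over every key.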

Messages arriving in the wrong order

Occasionally, a message may arrive before another message which it references. To cater for this special case, I suggest a very small data base with a list of the Message-IDs of such missing messages. This data base could be of fixed size, maybe a maximum of ten records, with deletion of the oldest entry if a new entry arrives and the data base is full. When a message arrives whose Message-ID is in this data base, its entry is removed from the data base.
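
One way to picture this small data base is the sketch below. It goes slightly beyond the text in one respect, which should be read as an assumption: besides the missing Message-ID, each record also remembers which already-arrived messages are waiting for it, so that their links can be completed when the missing message turns up. The class and method names are invented.

```python
from collections import OrderedDict


class MissingMessages:
    """Fixed-size list of referenced Message-IDs that have not arrived yet."""

    def __init__(self, capacity=10):
        self.capacity = capacity
        # missing Message-ID -> IDs of the messages that reference it
        self.entries = OrderedDict()

    def note_missing(self, missing_id, waiting_id):
        if missing_id not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)    # drop the oldest record
        self.entries.setdefault(missing_id, []).append(waiting_id)

    def arrived(self, message_id):
        """Remove the record, if any, and return the messages waiting for it."""
        return self.entries.pop(message_id, [])


pending = MissingMessages(capacity=3)
for n in range(4):
    pending.note_missing(f"<missing{n}@x>", f"<reply{n}@x>")
waiting = pending.arrived("<missing2@x>")
```

With a capacity of three, the fourth record pushes out the oldest one, so a very late arrival of <missing0@x> would no longer be matched. That is the trade-off accepted by keeping the data base at a fixed size.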

Duplicate messages

A side effect of the Message-ID data base is that duplicate arrivals of the same message could be noted. If such a duplicate arrives, it is usually not exactly identical. At least the Received headers are probably different, since the duplicates may have arrived via different routes. Other headers might also differ, such as Resent- headers or the comment in the From header (which Eudora itself changes when resending messages). The simplest way to handle duplicate messages is to treat them in the same way as the duplicates which occur when the user copies a message from one mailbox to another, i.e. just note in the Message-ID data base where the different copies occur. One risk with this implementation is that when a user clicks on a thread link, the user may be shown the wrong copy of the message. This risk cannot be avoided: if there is more than one message with the same Message-ID, and a new message arrives with, for example, "In-Reply-To" referring to this Message-ID, there is no way of finding out which of them the "In-Reply-To" refers to.
   This is only a problem when two messages arrive with the same Message-ID but with substantially different content. This will probably happen so seldom that there is no need to cater for this case. A perfect implementation might handle this case by creating a new link, "Same-As", between the not-quite-identical copies. This link would be a clickable header field in the same way as the other thread header fields described earlier in this proposal.

More about message threading in general:
http://dsv.su.se/jpalme/ietf/message-threading.html

More about the Supersedes header:
http://dsv.su.se/jpalme/ietf/jp-ietf-home.html#newfields

