GHOSTS: Design and System Description
The base for this text is a draft written by Olle Palmgren and
Daniel Pargman, dated September 27, 1993. Transcription, editing and
comments by Fredrik Kilander [FK]. This version June 1, 1994.
Here we describe the design of the GHOSTS system.
The GHOSTS system is intended to reduce information overload when
reading Usenet News or email. It does this by filtering.
The filtering is designed to work on a sequential stream of messages,
as is the case with email. With Usenet News the situation is slightly
different: instead of a stream of messages, we have a stream of
groups, where each group in the group stream contains a stream of
messages, very much like the stream of email messages.
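The two stream shapes can be sketched in C++ as follows. This is only an illustration under assumed names (Message, Group, filterStream and filterGroups are hypothetical, not taken from the GHOSTS sources); it shows how a filter written once for a single message stream can be reused for news by applying it to each group's stream in turn.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical types for illustration; not the actual GHOSTS classes.
struct Message {
    std::string text;
};

// Email: a single sequential stream of messages.
using MessageStream = std::vector<Message>;

// News: a stream of groups, each holding its own stream of messages.
struct Group {
    std::string name;
    MessageStream messages;
};
using GroupStream = std::vector<Group>;

using KeepPredicate = std::function<bool(const Message&)>;

// The message filter is written once, against one message stream:
// it returns the messages that the predicate lets through.
MessageStream filterStream(const MessageStream& stream, const KeepPredicate& keep) {
    MessageStream out;
    for (const Message& m : stream) {
        if (keep(m)) out.push_back(m);
    }
    return out;
}

// News filtering reuses it unchanged, one group's stream at a time.
GroupStream filterGroups(const GroupStream& groups, const KeepPredicate& keep) {
    GroupStream out;
    for (const Group& g : groups) {
        out.push_back(Group{g.name, filterStream(g.messages, keep)});
    }
    return out;
}
```

The design point is that nothing in filterStream knows whether its input came from a mailbox or from one newsgroup.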
The actual GHOSTS system consists of four parts:
- nnghost: The Usenet News filter. This program is invoked
automatically each time the user starts the modified nn news reader
distributed together with GHOSTS. It then monitors the stream of
messages within each group and applies the user-defined actions to
the messages that it recognizes.
- mailghost: The email filter. It is invoked each time the user
receives email. It filters the message stream before the user is
notified that mail has arrived, and handles the messages in the same
way as the news filter.
- ruled: The rule editor. The user runs this program to tell the
filters what to do, and to which messages.
- grouped: The group editor. This program is used to filter groups. It
can also be used to subscribe to and unsubscribe from groups, and to
get a general overview of the available newsgroups. The latter in
particular can be of great use to a novice user, since the group
structure of Usenet News is huge.
The user interacts only with the group editor and the rule editor in
order to define the system's behaviour. The filter (ghost) parts are
completely transparent to the user once they are set up; the only
thing he notices of their presence is the effect of their work.
A possible advantage of GHOSTS is that its user can continue to work
with the same email or news system that he is accustomed to. He does
not have to learn a new set of commands, a new interface, and so on,
in order to cut the information flow down to a more manageable level.
The design has been influenced by a number of considerations, among
them:
- It should be highly portable. The different parts (modules) of
the code should be as independent of each other as possible.
- The non-portable portions of the system should be isolated in
separate modules, easily replaceable if the need for porting arises.
- The underlying engine should be isolated from the actual rules
and message types. This is because we want the system's different
parts to be as self-contained as possible.
- It should be highly reusable. It should be easy to "tear" out a
part of the implementation and use it in another system.
- It should be as open to extension as possible. It should be easy
to add functionality to the programs within their current structure,
i.e. without having to redesign the programs.
We found that the object-oriented paradigm fit the bill to a high
degree. We chose to implement GHOSTS in C++
[Str80]. Some of the factors that affected our
decision were:
- The particular C++ compiler we used was the GNU C++ compiler
g++, version 2.4.5. This compiler has a number of non-ANSI extensions
to C++. To keep the code as portable as possible between different
compilers, we avoided these extensions as far as possible.
- It was possible to use classes for information hiding, isolating
modules from the rest of the code and making them responsible for
their own functionality.
- The ability to declare public, protected and private parts of a
class in C++ allowed us to define a class interface. This interface
defines the information that is available from a class. Code inside
the class can be replaced transparently as long as the class
interface stays unaltered, which makes it possible to rewrite a class
without affecting any other classes.
- The flow of control in an object-oriented program becomes very
decentralized. Each class is responsible for its own behaviour and
data. If an object wishes to perform an operation on another class's
data, it has to ask the owner of the data to perform the operation
for it. This abstracts a lot of book-keeping away from the algorithms,
yielding shorter, more succinct code.
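As a small illustration of the class interface idea, here is a hypothetical Folder class (not from the GHOSTS sources). Other code can only save messages and query the folder through its public member functions; the private storage could later be replaced, for instance by a file on disk, without touching any caller. The fileMessage function, also hypothetical, shows the "ask the owner" style: the caller never manipulates the folder's data directly.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical example class, not taken from the GHOSTS sources.
class Folder {
public:
    explicit Folder(std::string name) : name_(std::move(name)) {}

    // Public interface: the only way other objects can use a folder.
    void save(const std::string& message) { messages_.push_back(message); }
    std::size_t size() const { return messages_.size(); }
    const std::string& name() const { return name_; }

private:
    // Private representation: an in-memory list here. Swapping it for
    // something else does not affect callers as long as the public
    // member functions keep their signatures and meaning.
    std::string name_;
    std::vector<std::string> messages_;
};

// A client that wants a message stored asks the folder to do it,
// instead of manipulating the folder's data itself.
void fileMessage(Folder& folder, const std::string& message) {
    folder.save(message);
}
```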
The only place where we had to let go of the design considerations
above was in coding the visual user interfaces of the group and rule
editors.
The interface was constructed in Motif [Hel91].
Motif is a widget set written in C for constructing visual user
interfaces under the X Window System. Even though Motif is designed
in an object-oriented way, it was too much work to remodel that design
in C++. Instead we encapsulate the interface part in a class and let
the applications communicate with this class. All the
interface-specific code is found in this class, so if another
interface is wanted, only one module needs to be rewritten.
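One way to keep all toolkit code in one place can be sketched as follows, assuming the interface class is given an abstract interface (the text only says the interface part is encapsulated in a class). EditorInterface, RecordingInterface and RuleEditorApp are hypothetical names. A Motif-based subclass would hold all widget code; a trivial recording implementation stands in here so the sketch compiles without Motif.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Everything the application needs from the user interface.
// A Motif implementation would keep all widget code behind these calls.
class EditorInterface {
public:
    virtual ~EditorInterface() = default;
    virtual void showRule(const std::string& ruleText) = 0;
    virtual void showStatus(const std::string& text) = 0;
};

// Trivial stand-in implementation that records what it was asked to show.
class RecordingInterface : public EditorInterface {
public:
    void showRule(const std::string& ruleText) override { shown.push_back("rule: " + ruleText); }
    void showStatus(const std::string& text) override { shown.push_back("status: " + text); }
    std::vector<std::string> shown;
};

// The application talks only to EditorInterface, never to the toolkit.
class RuleEditorApp {
public:
    explicit RuleEditorApp(EditorInterface& ui) : ui_(ui) {}
    void displayRules(const std::vector<std::string>& rules) {
        for (const std::string& r : rules) ui_.showRule(r);
        ui_.showStatus(std::to_string(rules.size()) + " rules loaded");
    }

private:
    EditorInterface& ui_;
};
```

Replacing the interface then means writing one new subclass of EditorInterface; RuleEditorApp is untouched.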
The Filter
The filter interacts with a sequential stream of messages. It reads
and intercepts one message at a time from the stream. It then tries to
classify the message and, if allowed, performs some actions upon it.
These actions could be to save the message in a folder, discard it,
forward it, etc. If the filter fails to classify the message, the
message is passed unaltered to its original destination.
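The per-message loop described above might look roughly like the following sketch. The Action values, the Classifier type, FilterResult and runFilter are assumptions made for illustration; the "if allowed" check is left out, and a classifier returning no action stands for a message the filter could not classify.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <vector>

struct Message {
    std::string header;
    std::string body;
};

// Actions mentioned in the text; the enum itself is hypothetical.
enum class Action { Save, Discard, Forward };

// Returns an action for a recognized message, or std::nullopt if the
// message cannot be classified.
using Classifier = std::function<std::optional<Action>(const Message&)>;

struct FilterResult {
    std::vector<Message> delivered;  // unclassified: passed on unaltered
    std::vector<Message> saved;
    std::vector<Message> forwarded;
    std::size_t discarded = 0;
};

FilterResult runFilter(const std::vector<Message>& stream, const Classifier& classify) {
    FilterResult result;
    for (const Message& m : stream) {  // one message at a time
        std::optional<Action> action = classify(m);
        if (!action) {
            result.delivered.push_back(m);
            continue;
        }
        switch (*action) {
            case Action::Save:    result.saved.push_back(m); break;
            case Action::Forward: result.forwarded.push_back(m); break;
            case Action::Discard: ++result.discarded; break;
        }
    }
    return result;
}
```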
To be able to make any judgement at all about the messages, the
filter contains a small expert system. The filter passes each message
it reads from the input stream to the expert system and asks it to
evaluate the message according to a set of user-defined rules. The
expert system is also responsible for applying the actions in those
rules to the message.
The filter engine (at the moment, the expert system) is constructed to
work on a generalized model of rules and messages. This makes it
possible to construct new kinds of rules and messages without changing
the filter engine.
This is implemented in C++ using abstract base classes and
inheritance. The actual rule classes and message classes inherit an
interface from their abstract base classes, and each must provide its
own specific behaviour for the virtual member functions declared by
its base class. When the engine operates on a specific rule or message
instance, it is this class-specific behaviour that is used. The engine
sees the rules and messages only as objects of the base class type; it
does not care which actual type of rule or message it is working with.
This design works well as long as all the functionality required by
the filter engine is provided by the generic rule and message classes.
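A minimal sketch of this arrangement, using hypothetical class names (GenericMessage, GenericRule, Engine, MailMessage, SubjectSaveRule): the engine holds rules only through pointers to the abstract base class and calls their virtual member functions, so a new rule or message type can be added by deriving a new class without editing Engine. The "save" action is simulated by recording the message body.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Abstract message interface seen by the engine (hypothetical names).
class GenericMessage {
public:
    virtual ~GenericMessage() = default;
    virtual std::string field(const std::string& name) const = 0;
    virtual std::string body() const = 0;
};

// Abstract rule interface: test a message, then apply the rule's actions.
class GenericRule {
public:
    virtual ~GenericRule() = default;
    virtual bool matches(const GenericMessage& m) const = 0;
    virtual void apply(const GenericMessage& m) = 0;
};

// The engine only knows the abstract types; it never asks which concrete
// rule or message class it is dealing with.
class Engine {
public:
    void addRule(std::unique_ptr<GenericRule> rule) { rules_.push_back(std::move(rule)); }

    // Applies every matching rule; returns false if no rule matched,
    // i.e. the message could not be classified.
    bool evaluate(const GenericMessage& m) {
        bool matched = false;
        for (auto& rule : rules_) {
            if (rule->matches(m)) {
                rule->apply(m);
                matched = true;
            }
        }
        return matched;
    }

private:
    std::vector<std::unique_ptr<GenericRule>> rules_;
};

// Concrete classes supply their own behaviour for the inherited interface.
class MailMessage : public GenericMessage {
public:
    MailMessage(std::string subject, std::string body)
        : subject_(std::move(subject)), body_(std::move(body)) {}
    std::string field(const std::string& name) const override {
        return name == "Subject" ? subject_ : std::string();
    }
    std::string body() const override { return body_; }

private:
    std::string subject_;
    std::string body_;
};

class SubjectSaveRule : public GenericRule {
public:
    explicit SubjectSaveRule(std::string subject) : subject_(std::move(subject)) {}
    bool matches(const GenericMessage& m) const override {
        return m.field("Subject") == subject_;
    }
    // Stands in for a real "save" action: remember the message body.
    void apply(const GenericMessage& m) override { saved.push_back(m.body()); }

    std::vector<std::string> saved;

private:
    std::string subject_;
};
```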
The News Filter (nnghost)
The news filter is part of a modified version of the nn program, a
reader for Usenet News. The nnghost program intercepts all of nn's
requests for messages and filters them as described in section
The Filter. The nnghost program is
totally transparent to the user, the news server and the original
parts of nn. The filter implements most of its services using
functions and message properties already provided by the news server
and nn. For example:
1. The user enters a new newsgroup (within nn).
2. The nn program asks the news server for a message header in the
group.
3. The nnghost program intercepts the header, filters it and finds
that the appropriate action is to mark the message for reading.
4. The nnghost program gives the message to nn with the information
that it was autoselected (i.e. nn believes that it was
flagged by nn's original selection mechanism).
5. Steps 2 through 4 are then repeated until all message headers in
the newsgroup have been processed.
6. The nn program displays all the message headers to the user.
In effect, nn is kept in the belief that it interacts with the user
and the news server, when in reality it interacts with the user and
the nnghost process. The news server sees just another client, such as
tin, mxrn, rn, gnus or nn, but in this case the client is nnghost,
acting as an invisible intermediary between the news reader and the
news server.
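The interception can be pictured as a proxy sitting where nn expects the news server. The sketch below is hypothetical: it does not use nn's real internals or the news server protocol, and Header, HeaderSource, Selector and GhostProxy are names invented for illustration. It fetches the headers of a group on nn's behalf, filters each one, and hands nn a list in which matching headers are already flagged as autoselected.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// A header as handed to nn, carrying the "autoselected" flag that nn's own
// selection mechanism would normally set (hypothetical representation).
struct Header {
    std::string subject;
    bool autoSelected = false;
};

// Stand-in for the news server: returns the header subjects of a group.
using HeaderSource = std::function<std::vector<std::string>(const std::string& group)>;
// Stand-in for the rule evaluation: true if the header should be selected.
using Selector = std::function<bool(const std::string& subject)>;

class GhostProxy {
public:
    GhostProxy(HeaderSource server, Selector select)
        : server_(std::move(server)), select_(std::move(select)) {}

    // Called when the user enters a group in nn. Each header fetched from
    // the server is filtered and, if selected, flagged before nn sees it.
    std::vector<Header> enterGroup(const std::string& group) const {
        std::vector<Header> forNn;
        for (const std::string& subject : server_(group)) {
            Header h;
            h.subject = subject;
            h.autoSelected = select_(subject);
            forNn.push_back(h);
        }
        return forNn;
    }

private:
    HeaderSource server_;
    Selector select_;
};
```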
The Email Filter (mailghost)
The email filter is notified as soon as a new mail message arrives.
(The invocation mechanism is most likely the .forward
file in the user's home directory. [FK])
It then reads and handles the message as described in section
The Filter.
The Rule Editor (ruled)
The rule editor reads a rule file, displays its contents to the user
as a set of rules and prepares for editing. The user may browse
through the rules, change or delete them, or add completely new rules
to the set.
The interface part of ruled is written using the Motif widget set. All
Motif-specific code is collected in a single interface class. This
class is responsible for all user interaction. If the interface is to
be changed, there is just one class to rewrite.
The rule editor is used to maintain rule sets for both the email
filter mailghost and the Usenet News filter
nnghost.
The Group Editor (grouped)
The group editor is an interactive, visually oriented editor for the
user's personal .newsrc
file. This file is used by almost all
Usenet News readers to maintain the user's position in the flow of
messages. The file contains entries which define the newsgroups the
user is subscribing to and which messages the user has seen in each
newsgroup.
The structure of the newsgroups forms a tree, not unlike Internet
domain names or file name paths. The tree structure classifies
newsgroups from general to specialized topics. The grouped program
visualizes this tree for the user and allows him to orient himself
spatially as well as conceptually. In particular, grouped offers the
possibility of hiding uninteresting groups from view, as well as
providing visual cues about the properties of a particular newsgroup
or class of newsgroups. The user interacts with the editor through the
traditional means: the pointing device and the keyboard.
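The tree that grouped visualizes can be derived directly from the dot-separated newsgroup names. The following sketch (GroupNode, insertGroup, findNode and visibleGroups are hypothetical names, not the grouped implementation) builds such a tree and shows how hiding a single node removes a whole branch from view.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <sstream>
#include <string>
#include <vector>

// One node per name component; std::map keeps children sorted.
struct GroupNode {
    std::map<std::string, std::unique_ptr<GroupNode>> children;
    bool isGroup = false;  // a real newsgroup ends at this node
    bool hidden = false;   // the user has hidden this node and everything below it
};

// Split "comp.lang.c++" into {"comp", "lang", "c++"}.
std::vector<std::string> components(const std::string& name) {
    std::vector<std::string> parts;
    std::istringstream in(name);
    std::string part;
    while (std::getline(in, part, '.')) parts.push_back(part);
    return parts;
}

void insertGroup(GroupNode& root, const std::string& name) {
    GroupNode* node = &root;
    for (const std::string& c : components(name)) {
        std::unique_ptr<GroupNode>& child = node->children[c];
        if (!child) child = std::make_unique<GroupNode>();
        node = child.get();
    }
    node->isGroup = true;
}

// Finds the node for a full or partial name such as "alt"; nullptr if absent.
GroupNode* findNode(GroupNode& root, const std::string& name) {
    GroupNode* node = &root;
    for (const std::string& c : components(name)) {
        auto it = node->children.find(c);
        if (it == node->children.end()) return nullptr;
        node = it->second.get();
    }
    return node;
}

// Collects the full names of all groups that are not inside a hidden branch.
void visibleGroups(const GroupNode& node, const std::string& prefix,
                   std::vector<std::string>& out) {
    if (node.hidden) return;
    if (node.isGroup) out.push_back(prefix);
    for (const auto& [part, child] : node.children) {
        visibleGroups(*child, prefix.empty() ? part : prefix + "." + part, out);
    }
}
```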
Syntax for Rules and Messages
Rule Syntax
The structure of the rules is presently very simple, but it can easily
be extended by adding new production rules. We can probably use the
same rules for both news and email, since the formats of a news
message and an email message are so similar. The main difference
between the two lies in the actions, which should reflect the actions
that the user may perform manually.
The rules have the following structure:
RULE --> "rule" String
"if" TVS "then" ACTIONS "end"
TVS --> "(" TVS ")"
TVS --> TVS "and" TVS
TVS --> TVS "or" TVS
TVS --> STATEMENT "==" STATEMENT
TVS --> STATEMENT "!=" STATEMENT
STATEMENT --> String
STATEMENT --> COMMAND
COMMAND --> "field" String
COMMAND --> "body"
ACTIONS --> epsilon (The empty string? [FK])
ACTIONS --> ACTION ACTIONS
ACTION --> "save" String
ACTION --> "forward" String
Here's an example of a rule in the above syntax:
rule example
if (field == "C++")
then
save "c++.folder"
forward "friend@student.docs.uu.se"
end
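Once a rule has been parsed, evaluating its TVS part is a straightforward recursion over the grammar above. The sketch below covers only the condition, not the actions, and its names (ParsedMessage, Statement, Tvs, compare, combine) are hypothetical. Note that the grammar requires a string after "field" naming the header field, which the example rule leaves out.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <utility>

// A message as seen by the rules: header fields by name, plus the body.
struct ParsedMessage {
    std::map<std::string, std::string> fields;
    std::string body;
};

// STATEMENT --> String | "field" String | "body"
struct Statement {
    enum Kind { Literal, Field, Body };
    Kind kind = Literal;
    std::string text;  // the literal, or the field name for Field

    std::string value(const ParsedMessage& m) const {
        switch (kind) {
            case Literal: return text;
            case Field: {
                auto it = m.fields.find(text);
                return it == m.fields.end() ? std::string() : it->second;
            }
            case Body: return m.body;
        }
        return std::string();
    }
};

// TVS --> TVS "and" TVS | TVS "or" TVS
//       | STATEMENT "==" STATEMENT | STATEMENT "!=" STATEMENT
// Parenthesised TVSs only affect parsing, so they need no node type.
struct Tvs {
    enum Op { And, Or, Eq, Neq };
    Op op = Eq;
    std::unique_ptr<Tvs> left, right;  // operands of And / Or
    Statement lhs, rhs;                // operands of Eq / Neq

    bool eval(const ParsedMessage& m) const {
        switch (op) {
            case And: return left->eval(m) && right->eval(m);
            case Or:  return left->eval(m) || right->eval(m);
            case Eq:  return lhs.value(m) == rhs.value(m);
            case Neq: return lhs.value(m) != rhs.value(m);
        }
        return false;
    }
};

std::unique_ptr<Tvs> compare(Tvs::Op op, Statement a, Statement b) {
    auto node = std::make_unique<Tvs>();
    node->op = op;
    node->lhs = std::move(a);
    node->rhs = std::move(b);
    return node;
}

std::unique_ptr<Tvs> combine(Tvs::Op op, std::unique_ptr<Tvs> a, std::unique_ptr<Tvs> b) {
    auto node = std::make_unique<Tvs>();
    node->op = op;
    node->left = std::move(a);
    node->right = std::move(b);
    return node;
}
```

For instance, a condition such as field "Subject" == "C++" (Subject being an assumed field name) would be built as compare(Tvs::Eq, Statement{Statement::Field, "Subject"}, Statement{Statement::Literal, "C++"}) and then evaluated with eval on each incoming message.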
Message Syntax
At the current development stage we regard the syntax of Usenet News
messages as a subset of the syntax for an email message. The only
significant difference is that a news message does not start with a
from:
line. The syntax used for email messages is
simplified in accordance with [Cro81]:
"Some mail-reading software systems may wish to perform only minimal
processing, ignoring the internal syntax of structured field-bodies
and treating them the same as unstructured field-bodies. Such software
will need only to distinguish:
- Header fields from the message body,
- Beginnings of fields from lines which continue fields,
- Field-names from field-contents.
The abbreviated set of syntactic rules which follows will suffice for
this purpose. It describes a limited view of messages and is a subset
of the syntactic rules provided in the main part of this
specification. One small exception is that the contents of
field-bodies consist only of text."
The syntax is as follows:
MESSAGE --> HEADER
MESSAGE --> HEADER CRLF BODY
HEADER --> epsilon
HEADER --> FIELD HEADER
FIELD --> FIELD-NAME ":" CRLF
FIELD --> FIELD-NAME ":" FIELD-BODY CRLF
FIELD-NAME --> Any chars except {CTLs, space and ':'}.
FIELD-BODY --> BODY
FIELD-BODY --> BODY CRLF LWSP FIELD-BODY
BODY --> epsilon
BODY --> TEXT BODY
TEXT --> Any chars except CR immediately followed by LF
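The "minimal processing" view quoted above needs only three distinctions, and a parser for it can be short. The following is a sketch, not the GHOSTS implementation: SimpleMessage, isFieldName and parseMessage are hypothetical names, the caller is assumed to have already split the input into lines at each CRLF, and header lines that are neither fields nor continuations are simply skipped.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

struct SimpleMessage {
    // (field-name, field-body) pairs in header order; each field body keeps
    // whatever followed the ':' exactly, including leading whitespace.
    std::vector<std::pair<std::string, std::string>> fields;
    std::string body;
};

// FIELD-NAME: printable ASCII only; CTLs (0-31, 127), space, ':' and
// non-ASCII bytes are rejected.
bool isFieldName(const std::string& s) {
    if (s.empty()) return false;
    for (unsigned char c : s) {
        if (c <= 32 || c >= 127 || c == ':') return false;
    }
    return true;
}

SimpleMessage parseMessage(const std::vector<std::string>& lines) {
    SimpleMessage msg;
    std::size_t i = 0;
    // Header: fields up to the first empty line.
    for (; i < lines.size(); ++i) {
        const std::string& line = lines[i];
        if (line.empty()) {  // bare CRLF: header ends, body follows
            ++i;
            break;
        }
        bool continuation = (line[0] == ' ' || line[0] == '\t');
        if (continuation && !msg.fields.empty()) {
            // Unfolding: a line starting with LWSP continues the previous field.
            msg.fields.back().second += line;
            continue;
        }
        std::size_t colon = line.find(':');
        if (colon == std::string::npos) continue;  // not a field; skipped
        std::string name = line.substr(0, colon);
        if (!isFieldName(name)) continue;
        msg.fields.emplace_back(name, line.substr(colon + 1));
    }
    // Body: everything after the empty line, re-joined with CRLF.
    for (; i < lines.size(); ++i) {
        msg.body += lines[i];
        msg.body += "\r\n";
    }
    return msg;
}
```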
References
Str80:missing
Hel91:missing
Cro81:missing