Choice of Coding Format for Senior Online

By: Jacob Palme, Stockholm University and KTH Technical University
E-mail: jpalme@dsv.su.se
Last change: 99-04-04
File name: coding-format-choice.html

An important and difficult choice is the selection of the coding format and the base web protocol for communication between servers in Senior Online. Two major types of such communication are envisaged:

  1. The protocol for communication between groupware servers
  2. The protocol for communication between groupware servers and portal servers

Here is a short description of the choices.


Coding format choices

Note: Some of these formats can be combined. For example, the e-mail formats can be used for coding of the actual text of messages, combined with the other formats for other information.

| Format | Description |
|---|---|
| MIME | The standard format for complex e-mail messages, where the body can be split recursively into multiple body parts. |
| MFORM = Multipart/form-data | A variant of MIME; one of the formats used when a web user fills in a form on a web page and pushes the SEND button. |
| MHTML = Multipart/alternative, Text/html and Multipart/related | A variant of MIME; the commonly used format for sending HTML-formatted messages via e-mail. Used by KOM 2000 when sending messages from KOM 2000 to e-mail. (Web for Groups probably also uses this format in communication with e-mail?) |
| XML | A currently very popular format, strongly supported by IBM and Microsoft, for sending structured information on the Internet. Good for complex structures, not so good for binary information (like pictures or attachments). |
| ASN.1 | A complex and powerful binary format, used by LDAP. |
| LDAP | The currently most popular format for communication with directory systems. Good for complex structures and for distributed directory databases. Uses ASN.1. |
| LDIF | The LDAP Data Interchange Format, a variant of LDAP with textual, instead of binary, encoding. |
| RFC822 header format | A simple format common in many protocols, including e-mail headers and HTTP headers. |
| Corba | A "remote procedure call" protocol for communication between program modules on different servers, written in common programming languages. |


As an aid in selecting this format, here is a table of choices and their pros and cons. Question marks indicate that I do not know or am not sure.

| Criterion | MFORM | MIME | XML | LDAP | LDIF | RFC822 | Corba |
|---|---|---|---|---|---|---|---|
| Easy to produce manually and debug | Very much (5) | Yes (4) | Yes (4) | Bad (1) | Yes (4) | Very much (5) | Yes (4) |
| Ease of coding | OK (3) | OK (3) | OK (3) | Difficult (1) | OK (3) | Easy (4) | Very easy (5) |
| Portability | Good (4) | Good (4) | Good (4) | Good (4) | Good (4) | Good (4) | Bad (1) |
| Binary data | Good (4) | Good (4) | No (1) | ? (3) | ? (3) | No (3) | Yes? (4) |
| Acceptability as a future standard | Good (4) | Good (4) | Very good (5) | Very good (5) | Good (4) | Good (4) | Bad (1) |
| Ease of specification | OK (3) | OK (3) | Good (4) | Good (4) | Good (4) | Good (4) | Good? (4) |
| Total score | 23 | 22 | 21 | 18 | 22 | 24 | 19 |
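
The totals row is a plain, unweighted sum of the per-criterion scores. A minimal Python sketch that recomputes it from the scores listed above follows; the variable names are illustrative, and equal weighting of all criteria is an assumption implied by the simple sum.

```python
# Per-criterion scores copied from the comparison table, in column order.
formats = ["MFORM", "MIME", "XML", "LDAP", "LDIF", "RFC822", "Corba"]
scores = {
    "Easy to produce manually and debug": [5, 4, 4, 1, 4, 5, 4],
    "Ease of coding": [3, 3, 3, 1, 3, 4, 5],
    "Portability": [4, 4, 4, 4, 4, 4, 1],
    "Binary data": [4, 4, 1, 3, 3, 3, 4],
    "Acceptability as a future standard": [4, 4, 5, 5, 4, 4, 1],
    "Ease of specification": [3, 3, 4, 4, 4, 4, 4],
}

# Sum each column (one column per format), giving every criterion equal weight.
totals = {
    name: sum(row[i] for row in scores.values())
    for i, name in enumerate(formats)
}

# The format with the highest unweighted total.
best = max(totals, key=totals.get)

for name in formats:
    print(f"{name:7s} {totals[name]}")
print("Highest total:", best)
```

Changing the weighting (for example, counting binary data support twice) only requires multiplying the relevant row before summing.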

Recommendation: I suggest that we start with the RFC822 header format combined with MIME for the formatting of messages.
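
To illustrate how the two recommended formats fit together, here is a minimal sketch using Python's standard `email` package: control information travels in RFC822-style header fields, and the message text and a binary attachment travel as MIME body parts. The header names `X-Conference` and `X-Operation`, the addresses, and the attachment bytes are hypothetical and are not part of any Senior Online specification.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage

# Placeholder binary payload standing in for a picture or other attachment.
PICTURE_BYTES = b"\x00\x01\x02binary-data\xff"

msg = EmailMessage()
# Ordinary RFC822 header fields.
msg["From"] = "server-a@example.org"
msg["To"] = "server-b@example.org"
msg["Subject"] = "Meeting notes"
# Hypothetical extension headers carrying server-to-server information.
msg["X-Conference"] = "gardening"
msg["X-Operation"] = "add-message"

# MIME body: a text part plus a binary attachment (sent base64-encoded).
msg.set_content("Plain-text version of the message.")
msg.add_attachment(
    PICTURE_BYTES,
    maintype="application",
    subtype="octet-stream",
    filename="picture.bin",
)

# Serialize to the on-the-wire form, then parse it back as a receiver would.
wire = msg.as_bytes()
parsed = message_from_bytes(wire, policy=policy.default)

print(parsed["X-Conference"])
print(parsed.get_content_type())
attachment = next(parsed.iter_attachments())
print(attachment.get_filename(), len(attachment.get_content()), "bytes")
```

The serialized form stays readable in a text editor, which matches the high "easy to produce manually and debug" scores, while the MIME attachment part covers the binary data that plain RFC822 headers cannot carry.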

Protocol format choice

Possible choices for the protocol (to be extended for our needs):

| Choice | SMTP | HTTP | Corba |
|---|---|---|---|
| Description | The Internet e-mail protocol, based on store-and-forward of messages. | The WWW protocol, based on direct connections, popular as a base for new protocols. | A remote procedure call method, popular in the telecom industry. |
| Advantage | Good for sending messages; we have to implement it anyway in order to handle e-mail connectivity; built-in queuing and resending facility when the destination server is down. | Easy to use, popular. | Easy to use. |
| Disadvantage | Store-and-forward means that you get no direct responses to queries. | Complex, but you can choose a subset suitable for your needs. | Limited platform availability, not acceptable for a standard protocol. |


Recommendation: I suggest we use HTTP for all communication except the sending of messages. For the sending of messages, I am not sure whether to recommend SMTP or HTTP.
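
The practical difference for queries is that an HTTP client gets its answer on the same connection, whereas a store-and-forward protocol only confirms that the message was accepted for delivery. The sketch below uses Python's standard library to run a throwaway HTTP server on the local machine and query it. The handler, the path `/conferences/gardening`, and the RFC822-style response body are all hypothetical illustrations, not a proposed Senior Online interface.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class QueryHandler(BaseHTTPRequestHandler):
    """Answers every GET with a small RFC822-style body (illustrative only)."""

    def do_GET(self):
        body = f"Status: ok\r\nPath: {self.path}\r\n".encode("iso-8859-1")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=ISO-8859-1")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def query(path):
    """Start a local server, send one query, and return (status, body)."""
    server = HTTPServer(("127.0.0.1", 0), QueryHandler)  # port 0: any free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", path)
        response = conn.getresponse()  # the answer arrives on the same connection
        result = response.status, response.read().decode("iso-8859-1")
        conn.close()
        return result
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    status, body = query("/conferences/gardening")
    print(status)
    print(body)
```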

Character set format choices

| Choice | ISO Latin 1 | Charset | UTF-7, UTF-8 |
|---|---|---|---|
| Description | ISO 8859-1, an 8-bit standard with 256 code positions. | Several character sets, with a charset parameter to indicate which one is used where. | UTF-7 or UTF-8 encodings of the Unicode/ISO 10646 character set. |
| Advantage | Easy to use. | The format used today in web and e-mail. | Expected to be what all computers use in the future, but not yet well supported by all platforms. |
| Disadvantage | Only good for Western European languages (not, for example, Polish or Hungarian, or languages written in Cyrillic or Arabic script). | Difficult to implement, especially for the search engine. | Some debugging problems, because it is not well supported by existing protocol debugging software like telnet and text editors. |


UTF-7 and UTF-8 are encodings of the future character set standards Unicode and ISO 10646. These encodings are especially suitable for Internet protocols, because the unaccented Latin letters, the digits and some common punctuation characters are encoded exactly as in ASCII. The IETF recommends UTF-8. The only advantage of UTF-7 is that it can be sent in e-mail without further encoding, since it uses only 7-bit characters.
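
The difference between the two encodings can be checked with Python's built-in `utf-8` and `utf-7` codecs. This is a small sketch; the sample strings are arbitrary.

```python
def is_seven_bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits, i.e. is safe for 7-bit e-mail transport."""
    return all(byte < 128 for byte in data)


ascii_text = "abc XYZ 0123"
swedish_text = "Smörgåsbord 123"

# Plain ASCII letters, digits and spaces are encoded identically in both.
same_in_utf8 = ascii_text.encode("utf-8") == ascii_text.encode("ascii")
same_in_utf7 = ascii_text.encode("utf-7") == ascii_text.encode("ascii")

# Non-ASCII letters: UTF-8 produces bytes above 127, UTF-7 stays 7-bit.
utf8_bytes = swedish_text.encode("utf-8")
utf7_bytes = swedish_text.encode("utf-7")

print("ASCII unchanged in UTF-8:", same_in_utf8)
print("ASCII unchanged in UTF-7:", same_in_utf7)
print("UTF-8 is 7-bit clean:", is_seven_bit(utf8_bytes))
print("UTF-7 is 7-bit clean:", is_seven_bit(utf7_bytes))
print("UTF-7 on the wire:", utf7_bytes.decode("ascii"))
```

UTF-8 text sent through a 7-bit mail path therefore needs a transfer encoding such as quoted-printable or base64, which is the "further encoding" that UTF-7 avoids.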

Recommendation: I recommend that we start with the Charset choice, but only using one charset, ISO Latin 1. This can in the future be extended to either full Charset or Charset with a choice between ISO Latin 1 and UTF-7 or UTF-8.
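
One way to read this recommendation: the receiver always looks at the charset parameter, but falls back to ISO Latin 1 when none is given, so that UTF-8 can be accepted later without changing the format. A minimal sketch using Python's standard `email` package follows; the function name `decode_body` is hypothetical, and treating ISO 8859-1 as the default is an assumption drawn from the recommendation above.

```python
from email.message import Message

DEFAULT_CHARSET = "iso-8859-1"  # assumed default, per the recommendation


def decode_body(content_type: str, body: bytes) -> str:
    """Decode body bytes using the charset parameter of a Content-Type value.

    Falls back to ISO Latin 1 when no charset parameter is present.
    An unknown charset name raises LookupError.
    """
    header = Message()
    header["Content-Type"] = content_type
    charset = header.get_content_charset(failobj=DEFAULT_CHARSET)
    return body.decode(charset)


# No charset parameter: interpreted as ISO Latin 1.
print(decode_body("text/plain", b"caf\xe9"))

# Explicit charset parameter: the same text sent as UTF-8.
print(decode_body("text/plain; charset=UTF-8", "café".encode("utf-8")))
```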