By: Jacob Palme, Stockholm University and KTH Technical University
E-mail: jpalme@dsv.su.se
Last change: 99-04-04
File name: coding-format-choice.html
An important, and difficult, choice is the selection of coding format and base
web protocol for the communication between servers in Senior Online. Two major
types of such communication are envisaged:
Here is a short description of the choices.
| Format | Description |
|---|---|
| MIME | The standard format for complex e-mail messages, where the body can be split recursively into multiple body parts. |
| MFORM = Multipart/form-data | Variant of MIME; one of the formats used when a web user fills in a form in a web page and presses the SEND button. |
| MHTML = Multipart/alternative, Text/html and Multipart/related | Variant of MIME, the commonly used format for sending HTML-formatted messages via e-mail. Used by KOM 2000 when sending messages from KOM 2000 to e-mail. (? Web for Groups probably also uses this format in communication with e-mail?) |
| XML | A currently very popular format, strongly supported by IBM and Microsoft, for sending structured information on the Internet. Good for complex structures, not so good for binary information (such as pictures or attachments). |
| ASN.1 | A complex and powerful notation for describing data structures, usually transmitted in a binary encoding (BER); used by LDAP. |
| LDAP | The currently most popular protocol for communication with directory systems. Good for complex structures and for distributed directory databases. Uses ASN.1. |
| LDIF | A variant of LDAP with textual, instead of binary, encoding. |
| RFC822 header format | A simple format common in many protocols, including e-mail headers and HTTP headers. |
| Corba | A "remote procedure call" protocol for communication between program modules on different servers, written in common programming languages. |
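To make the differences between the text-based formats more concrete, here is a minimal sketch, assuming Python's standard library, that encodes the same record in the RFC822 header format, in MFORM (multipart/form-data) and in XML. The record and its field names are hypothetical, chosen only for illustration. ASN.1/LDAP and Corba need libraries outside the standard library, so they are left out.

```python
from email.message import Message
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
import xml.etree.ElementTree as ET

# Hypothetical record; the field names are illustrative, not part of any spec.
RECORD = {"Name": "Anna Svensson", "Email": "anna@example.org", "Group": "Gardening"}


def to_rfc822_headers(record):
    """RFC822 header format: one 'Field: value' line per field."""
    msg = Message()
    for field, value in record.items():
        msg[field] = value
    # With no body, as_string() yields the header lines plus a blank line.
    return msg.as_string()


def to_multipart_form_data(record):
    """MFORM: one MIME body part per field, named via Content-Disposition."""
    outer = MIMEMultipart("form-data")
    for field, value in record.items():
        part = MIMEText(value, "plain")  # ASCII values are sent as 7-bit text
        part.add_header("Content-Disposition", "form-data", name=field.lower())
        outer.attach(part)
    return outer.as_string()


def to_xml(record):
    """XML: one child element per field under a root element."""
    root = ET.Element("member")
    for field, value in record.items():
        ET.SubElement(root, field.lower()).text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for encode in (to_rfc822_headers, to_multipart_form_data, to_xml):
        print(f"--- {encode.__name__} ---")
        print(encode(RECORD))
```

Printing the three results side by side shows why the RFC822 header format scores well for manual production and debugging: it is the shortest and needs no nesting, while MFORM adds a boundary and per-part headers and XML adds start and end tags.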
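As an aid in selecting this format, here is a table of choices and their pros
and cons. Each entry has a score in parentheses, from 1 (worst) to 5 (best), and
the last row sums the scores for each format. Question marks indicate that I do
not know or am not sure.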
| Criterion | MFORM | MIME | XML | LDAP | LDIF | RFC822 | Corba |
|---|---|---|---|---|---|---|---|
| Easy to produce manually and debug | Very much (5) | Yes (4) | Yes (4) | Bad (1) | Yes (4) | Very much (5) | Yes (4) |
| Ease of coding | OK (3) | OK (3) | OK (3) | Difficult (1) | OK (3) | Easy (4) | Very easy (5) |
| Portability | Good (4) | Good (4) | Good (4) | Good (4) | Good (4) | Good (4) | Bad (1) |
| Binary data | Good (4) | Good (4) | No (1) | ? (3) | ? (3) | No (3) | Yes? (4) |
| Acceptability as a future standard | Good (4) | Good (4) | Very good (5) | Very good (5) | Good (4) | Good (4) | Bad (1) |
| Ease of specification | OK (3) | OK (3) | Good (4) | Good (4) | Good (4) | Good (4) | Good? (4) |
| Total score | 23 | 22 | 21 | 18 | 22 | 24 | 19 |
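The total score is an unweighted sum of the six per-criterion scores. A minimal Python sketch of that arithmetic, with the scores copied from the table above, makes it easy to recheck the totals or to try out weights. The `weights` parameter is a hypothetical extension; the table itself counts every criterion once.

```python
# Per-criterion scores (1 = worst, 5 = best) copied from the table above.
CRITERIA = [
    "Easy to produce manually and debug",
    "Ease of coding",
    "Portability",
    "Binary data",
    "Acceptability as a future standard",
    "Ease of specification",
]

SCORES = {
    "MFORM":  [5, 3, 4, 4, 4, 3],
    "MIME":   [4, 3, 4, 4, 4, 3],
    "XML":    [4, 3, 4, 1, 5, 4],
    "LDAP":   [1, 1, 4, 3, 5, 4],
    "LDIF":   [4, 3, 4, 3, 4, 4],
    "RFC822": [5, 4, 4, 3, 4, 4],
    "Corba":  [4, 5, 1, 4, 1, 4],
}


def total(scores, weights=None):
    """Weighted sum of scores; with no weights every criterion counts once."""
    if weights is None:
        weights = [1] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("need one weight per criterion")
    return sum(s * w for s, w in zip(scores, weights))


def ranking(weights=None):
    """Formats sorted by total score, highest first (ties keep table order)."""
    totals = {fmt: total(s, weights) for fmt, s in SCORES.items()}
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for fmt, score in ranking():
        print(f"{fmt:7} {score}")
```

If, for example, binary data turned out to be more important than the other criteria, passing a weight list such as `[1, 1, 1, 2, 1, 1]` to `ranking()` shows how sensitive the ordering is to that judgment.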
| Choice | SMTP | HTTP | Corba |
|---|---|---|---|
| Description | The Internet e-mail format, based on store-and-forward of messages. | The WWW protocol, based on direct connections, popular as a base for new protocols. | A remote procedure call method, popular in the telecom industry. |
| Advantage | Good for sending messages; we have to implement it anyway in order to handle e-mail connectivity; built-in queuing and resending when the destination server is down. | Easy to use, popular. | Easy to use. |
| Disadvantage | Store and forward means that you get no direct responses to queries. | Complex, but you can choose a subset suitable for your needs. | Limited platform availability, not acceptable for a standard protocol. |
Recommendation: I suggest we use HTTP for all communication except the sending
of messages. For the sending of messages, I am not sure whether to recommend
SMTP or HTTP.
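The two sides of this recommendation can be sketched with Python's standard library: a query is an HTTP POST whose answer comes back on the same connection, while a message is built as an e-mail that can be handed to an SMTP server for store-and-forward delivery (for example with `smtplib.SMTP.send_message`). This is only a sketch: the URL, parameter names and addresses are hypothetical, and nothing is actually sent.

```python
from email.message import EmailMessage
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the text does not define any Senior Online URL.
QUERY_URL = "http://server-b.example/senior-online/query"


def build_query(params):
    """Build (but do not send) an HTTP POST for a query that expects a direct answer."""
    body = urlencode(params).encode("ascii")
    return Request(
        QUERY_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def build_message(sender, recipient, subject, text):
    """Build a message suitable for SMTP store-and-forward delivery."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(text)
    return msg


if __name__ == "__main__":
    req = build_query({"type": "member", "name": "Svensson"})
    print(req.get_method(), req.full_url, req.data)
    msg = build_message("a@server-a.example", "b@server-b.example", "Hello", "Hi there")
    print(msg.as_string())
```

Sending the query with `urllib.request.urlopen(req)` would block until server B answers, which is exactly the direct response that store-and-forward SMTP cannot give.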
| Choice | ISO Latin 1 | Charset | UTF-7, UTF-8 |
|---|---|---|---|
| Description | ISO 8859-1, a 256-character standard | Several character sets, with a charset parameter to indicate which is used where | UTF-7 or UTF-8 encodings of the Unicode/ISO 10646 character set |
| Advantage | Easy to use | The format used today in web and e-mail | Expected to be what all computers use in the future, but not yet well supported by all platforms |
| Disadvantage | Only good for Western European languages (not, for example, Polish, Hungarian, or languages written in Cyrillic or Arabic script) | Difficult to implement, especially for the search engine | Some debugging problems, because it is not well supported by existing protocol debugging tools such as telnet and text editors |
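The limitation of ISO Latin 1, and what the Charset choice implies, can be shown with a small Python sketch: Polish or Russian text cannot be encoded in ISO 8859-1 at all, so with the Charset choice every text must carry a label saying which character set it uses. The candidate list below is a hypothetical selection (ASCII, Western European, Central European, Cyrillic), not a proposal from this document.

```python
# Hypothetical candidate list: ASCII, Western European, Central European, Cyrillic.
CANDIDATES = ("us-ascii", "iso-8859-1", "iso-8859-2", "iso-8859-5")


def choose_charset(text, candidates=CANDIDATES):
    """Return the first candidate charset that can encode text, or None."""
    for charset in candidates:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # this charset lacks at least one character of text
        return charset
    return None


def label(text):
    """Return (Content-Type value with a charset parameter, encoded bytes)."""
    charset = choose_charset(text)
    if charset is None:
        raise ValueError("no single candidate charset covers this text")
    return f"text/plain; charset={charset}", text.encode(charset)


if __name__ == "__main__":
    for sample in ("Hello", "Åsa", "Łódź", "Москва", "Łódź Москва"):
        print(sample, choose_charset(sample))
```

The last sample, mixing Polish and Cyrillic, fits none of the single-byte charsets. It also hints at why the search engine is harder with several charsets: the same letter has different byte values in different charsets, so every text must be decoded using its label before it can be indexed or compared.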
UTF-7 and UTF-8 are encodings of the future character set standards Unicode
and ISO 10646. These encodings are especially suitable for Internet protocols,
because all unaccented Latin letters, digits and some common punctuation
characters are encoded exactly as in ASCII. IETF recommends UTF-8. The only
advantage of UTF-7 is that it uses only 7-bit characters, so it can be sent in
e-mail without further encoding.
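A short Python sketch illustrates both points: plain ASCII text is byte-for-byte identical in ASCII, UTF-8 and UTF-7, while a non-ASCII letter such as "ö" becomes 8-bit bytes in UTF-8 but stays within 7-bit characters in UTF-7. The sample strings and the `is_7bit` helper are illustrative only.

```python
def is_7bit(data):
    """True if every byte is below 128, i.e. safe for 7-bit e-mail transport."""
    return all(byte < 128 for byte in data)


def compare_encodings(text):
    """Return the UTF-8 and UTF-7 encodings of text."""
    return text.encode("utf-8"), text.encode("utf-7")


if __name__ == "__main__":
    for sample in ("Senior Online 99", "Göteborg"):
        utf8, utf7 = compare_encodings(sample)
        print(sample, utf8, is_7bit(utf8), utf7, is_7bit(utf7))
```

For "Göteborg", UTF-8 uses two bytes above 127 for "ö", while UTF-7 writes the same letter as a short base64 sequence introduced by "+", which is why UTF-7 passes through 7-bit e-mail without further encoding but is harder to read in telnet or a text editor.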
Recommendation: I recommend that we start with the Charset choice, but only using one charset, ISO Latin 1. This can in the future be extended to either full Charset or Charset with a choice between ISO Latin 1 and UTF-7 or UTF-8.
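The recommended starting point can be sketched as a receiver that always reads the charset parameter but, for now, accepts only ISO Latin 1. Extending to UTF-8 or UTF-7 later would then mean adding one entry to `SUPPORTED_CHARSETS`. Treating a missing charset parameter as ISO Latin 1 is an assumption made for this sketch, as are the function and constant names.

```python
import codecs
from email.message import Message

# Today only ISO Latin 1; codecs.lookup() normalizes aliases such as "latin1".
SUPPORTED_CHARSETS = {codecs.lookup("iso-8859-1").name}


def decode_body(content_type, body):
    """Decode body according to the charset parameter in content_type."""
    msg = Message()
    msg["Content-Type"] = content_type
    # Assumption: a missing charset parameter means ISO Latin 1.
    label = msg.get_content_charset("iso-8859-1")
    try:
        canonical = codecs.lookup(label).name
    except LookupError:
        raise ValueError(f"unknown charset: {label}") from None
    if canonical not in SUPPORTED_CHARSETS:
        raise ValueError(f"charset not supported yet: {label}")
    return body.decode(canonical)


if __name__ == "__main__":
    print(decode_body("text/plain; charset=ISO-8859-1", "Åsa".encode("iso-8859-1")))
```

Because the charset is checked on every message from the start, no protocol change is needed when more charsets are allowed; only the set of supported names grows.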