000522-1

When you send data from one computer to another, you need to mark where each data item ends. How can you, for example, include the "." character in a string if "." (or whichever other character you have chosen as the end-of-string mark) is used to mark the end of the string? Discuss different methods for handling this problem in protocols based on ABNF, ASN.1 and XML, and their pros and cons.

Answer

  1. Put a length counter in front of the data. The data can then contain anything. This is the main method in BER, and it is also used to some extent in HTTP, through the Content-Length header (2).
  2. Split the data into chunks, with a length counter in front of each chunk. Again, anything can be included, and the sender does not even need to know all the data before starting to send it. Also used in BER and in the e-mail "chunking" method (1).
  3. Forbid certain characters in the data (1). If they occur anyway, encode them in some special way. The most common such special ways are:
    1. Double all occurrences of the forbidden character. Example: Encode 'His name is "John" today' as '"His name is ""John"" today"' (0.5).
    2. Put a special quoting character in front of forbidden characters. Example: "John F. Nilsson" as "John\ F\.\ Nilsson". Used in e-mail (0.5).
    3. Encode using the hexadecimal or decimal value of the character. Example: "Göran Åberg" as "G&#246;ran &#197;berg" or "G%f6ran %c5berg". Used in HTML and many other standards (0.5). An extreme variant of this is BASE64, where all characters are encoded.
    4. Encode using a "name" of the character. Example: "Göran Åberg" as "G&ouml;ran &Aring;berg" (0.5).
    5. Let a line break indicate the end of a string, but allow line breaks in the string if they are followed by linear white space (e-mail headers) (0.5).
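
Methods 1 and 2 can be illustrated with a minimal Python sketch. The fixed 4-byte big-endian length field and the function names are assumptions made for this illustration only; BER and HTTP each define their own length formats. A zero-length chunk is assumed to mark the end of a chunked item.

```python
import io
import struct


def write_length_prefixed(out, payload: bytes) -> None:
    """Method 1: a 4-byte length (illustrative format), then the raw payload."""
    out.write(struct.pack(">I", len(payload)))
    out.write(payload)


def read_length_prefixed(inp) -> bytes:
    """Read the length first, then exactly that many bytes of data."""
    (length,) = struct.unpack(">I", inp.read(4))
    return inp.read(length)


def write_chunked(out, pieces) -> None:
    """Method 2: one length per chunk; a zero length ends the item."""
    for piece in pieces:
        if piece:  # empty pieces are skipped: length 0 is the terminator
            write_length_prefixed(out, piece)
    out.write(struct.pack(">I", 0))


def read_chunked(inp) -> bytes:
    """Collect chunks until the zero-length terminator is seen."""
    parts = []
    while True:
        chunk = read_length_prefixed(inp)
        if not chunk:
            return b"".join(parts)
        parts.append(chunk)


if __name__ == "__main__":
    buf = io.BytesIO()
    write_length_prefixed(buf, b"a.b.")  # the "." needs no special treatment
    # The sender can emit pieces as they become available.
    write_chunked(buf, [b"His name is ", b'"John".'])
    buf.seek(0)
    print(read_length_prefixed(buf))
    print(read_chunked(buf))
```

In both cases the receiver never inspects the data for an end mark, which is why any byte value may occur in it. The cost is that method 1 requires the sender to know the total length in advance; method 2 removes that requirement at the price of a few extra length fields.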

Some of these methods have special problems if the character which needs to be encoded or the encoded variant is at the end of the string to be transmitted.
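
The end-of-string cases can be made concrete with a small Python sketch of methods 3.1 and 3.2. All function names, and the choice of "." and space as forbidden characters, are assumptions for illustration; they are not taken from any particular standard.

```python
def quote_by_doubling(text: str, mark: str = '"') -> str:
    """Method 3.1: wrap the data in marks and double every mark inside it."""
    return mark + text.replace(mark, mark * 2) + mark


def read_doubled(stream: str, start: int = 0, mark: str = '"'):
    """Scan one quoted item; return (decoded text, index just past its end)."""
    if stream[start:start + 1] != mark:
        raise ValueError("item must start with the mark")
    out = []
    i = start + 1
    while i < len(stream):
        ch = stream[i]
        if ch != mark:
            out.append(ch)
            i += 1
        elif stream[i + 1:i + 2] == mark:  # doubled mark: literal data
            out.append(mark)
            i += 2
        else:  # single mark: end of item, known only after looking ahead
            return "".join(out), i + 1
    raise ValueError("unterminated item")


def escape_backslash(text: str, forbidden: str = ". ", esc: str = "\\") -> str:
    """Method 3.2: put the quoting character before each forbidden character.

    The quoting character itself must be escaped too, or decoding is ambiguous.
    """
    return "".join(esc + ch if ch in forbidden or ch == esc else ch for ch in text)


def unescape_backslash(encoded: str, esc: str = "\\") -> str:
    """Drop each quoting character and keep the character that follows it."""
    out = []
    chars = iter(encoded)
    for ch in chars:
        if ch == esc:
            ch = next(chars, None)
            if ch is None:
                raise ValueError("dangling quoting character at end of data")
        out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(quote_by_doubling('say "hi"'))       # data ends with the mark
    print(read_doubled('"say ""hi""" rest'))
    print(read_doubled('"a""b"'))              # read as one item, not two
    print(escape_backslash("John F. Nilsson"))
```

With doubling, data that ends in the mark produces three marks in a row, and the receiver must look one character past a mark before it can decide whether the item has ended. Two items written back to back, such as "a" followed by "b", are read as the single item a"b, so a separator between items is needed. With a quoting character, data truncated just after that character cannot be decoded, which the sketch reports as an error.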
