Here are notes on what happened during some of the Application area sessions at the Internet Engineering Task Force (IETF) meeting in Munich, August 1997. These are my personal notes, not official minutes. Much of what I write below quotes what someone said at the meeting, and is not necessarily my own opinion on the issue.
By Jacob Palme, e-mail: email@example.com, at the research group for CMC (Computer Mediated Communication) in the Department of Computer and Systems Sciences at Stockholm University and KTH.
"We reject kings, presidents and voting.
We believe in rough consensus and running code."
- Dave Clark (1992)
Comment: This is a commonly quoted credo of the IETF, but it is not quite true. I have participated in several votes during IETF face-to-face working group meetings. However, special voting algorithms are then usually used. Example: "Who has read the draft?", "Who among those who have read the draft are of the opinion that...?" I believe that voting with normal voting algorithms is not suitable for IETF work, but that voting with specialised voting algorithms may in some cases be useful. Some time I will write a proposal about this.
The task of this group is to handle HTTP 1.1, which is already a proposed standard, and to advance it to draft standard. This means that the work on HTTP 1.1 is far advanced, and the things discussed are nitty-gritty details of unclarities or other problems with a standard being implemented. But such discussions are also interesting: you can recognize the typical problems which crop up at this stage of standards development.
The meeting rather rapidly went through a very long issue list. Most issues were related to special cases or unclarities in particular uses of particular header fields.
Is an asterisk (*) permitted as a wildcard for the character set?
Should Content-Disposition (from e-mail standards RFC 1806) be used also in HTTP? Content-Disposition allows a sender to designate a body part as inline or attachment, and indicate a suggested file name.
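To make the header under discussion concrete, here is a minimal sketch of what a Content-Disposition value looks like, built with Python's standard email package. The file name is a made-up example, and the sketch only shows the header syntax; it says nothing about how HTTP servers or clients should treat it.

```python
from email.message import Message

# Hypothetical file name, used only for illustration.
msg = Message()
msg.add_header("Content-Disposition", "attachment", filename="report.pdf")

# The disposition type comes first, followed by the suggested file name.
disposition = msg["Content-Disposition"]
print(disposition)
```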
The problem is that this has been changed from old practice to HTTP 1.1, and that the old practice is codified in a number of CGI scripts. These scripts, when used by new 1.1 servers, will continue to perform according to the old practice, but the clients expect the new practice, and so the connection does not work. Similar problems may occur in other cases where 1.1 has changed older practice.
This issue should at least be discussed in a revision of RFC 2145, even if there is no good solution.
One solution used by one server: Turn off 1.1 as soon as you are communicating with a proxy.
Some web pages cause 5-6 redirects in sequence. This is not very nice. Many walkers stop at redirections and do not follow them.
There is a conflict between HTML and HTTP in interpretation of the link header. Should this be discussed in the HTTP working group, or in the groups working on HTML standards? Is the semantics of the HTTP Link header exactly the same as that of the HTML Link element?
Should we add a note explaining how CGI can be cached? History: Caching proxies started out not caching queries or things with CGI-BIN in them, since these usually return different values with every use and are not worth caching. So servers learnt this, and began to assume that such things are not cached, and did not put in the "don't cache" header. And then they began to ask for the standard to say that proxies should not cache such things!
Options are insufficiently defined. Discussion seems to be converging. Internet draft with solution will be written. RFC numbers are used in the protocol for option negotiations.
Encourage synchronised clocks!
Advice on SHOULD and DON'T should be in the main text, not in security considerations, someone said.
Redirection inherently raises a privacy concern.
A table of requirements like those in RFC 1122 and 1123 is wanted. A sample of such a table can be found in Draft-08. Example: Which are the requirements for a pure origin server?
To go to draft standard, you have to have at least two independent interoperable implementations. Collecting this data is difficult for such a long and complex standard as HTTP 1.1. There was a question whether two independent interoperable implementations meant two servers or two clients or what. Answer: Two independent servers and two independent clients which are all interoperable!
The goal of this group is to develop better methods to support authoring of WWW documents. Authoring requires:
This IETF working group is developing a way of associating a set of Properties with collections of web resources. Typical properties are "author" or "readonly". Each property has a name and can have a value. The syntax and semantics of the value is specified by the name. HTTP will be extended with facilities for getting and setting the properties of resources. This has some similarity to the META values which can occur in HTML heads. A Schema is a set of properties defined for use within a special set of resources. In particular, the Webdav group is going to define a schema named DAV.
This group is backing away from providing a general-purpose search feature. A standard for general-purpose search could be a task for a new IETF working group. The Webdav group will only provide an intentionally limited facility called FINDPROP. FINDPROP allows retrieval of which resources have certain properties in a collection, but does not provide more general-purpose search like searching on the contents of resources.
It is possible to apply a method to one resource or to all resources in a collection. A depth value can have the value "0" (only this particular resource), "1" (all direct members of a collection) or "infinite" (all direct and indirect members of a collection). This is controversial, and will be moved to a separate document, with "0" as the only option in the base specification. Example: Copy and Move are very complex methods if applied to more than one single resource.
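The three depth values can be illustrated with a small sketch. The collection tree and the helper name select_by_depth are hypothetical and not taken from any Webdav draft; the meaning of each value follows the wording of these notes.

```python
# Hypothetical in-memory tree: each collection maps to its direct members.
TREE = {
    "/docs/": ["/docs/a.html", "/docs/sub/"],
    "/docs/sub/": ["/docs/sub/b.html"],
}


def select_by_depth(resource, depth):
    """Return the resources a method would act on for a given depth value."""
    if depth == "0":
        # Only this particular resource.
        return [resource]
    members = TREE.get(resource, [])
    if depth == "1":
        # All direct members of the collection.
        return list(members)
    if depth == "infinite":
        # All direct and indirect members, found by walking the tree.
        selected = []
        for member in members:
            selected.append(member)
            selected.extend(select_by_depth(member, "infinite"))
        return selected
    raise ValueError("unknown depth: %r" % (depth,))
```

Even in this toy form, "infinite" needs a recursive walk, which hints at why Copy and Move become complex when applied to more than one resource.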
There is a need for a facility for an atomic operation to lock a set of resources simultaneously. Reason: Avoid the problem where someone asks for a lock on a set of resources and gets locks on only some of them. Difficult to implement. There is discussion whether to have a locking facility on a set of resources, or on only one at a time. Compromise: The need is there, try to get it working, if we cannot, we will have to accept that we could not meet this requirement. The problem: in HTTP, all commands operate on only one URI. There is no HTTP facility for performing an operation on more than one URI at the same time.
Possible solution: Special lock servers called arbitrators. But then before trying to lock a set of resources, you must find an arbitrator capable of handling locking of all resources in this set.
Note: An atomic operation is an operation which can only be performed in full or not at all. But it need not be performed exactly simultaneously on all the resources on which the atomic operation acts.
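The all-or-nothing requirement can be sketched as follows. This is a single-process illustration only, with a plain dictionary standing in for a lock table; the function name lock_all is hypothetical, and the sketch does not address the hard part discussed at the meeting, namely that the resources may live on different servers.

```python
def lock_all(resources, locks, owner):
    """Lock every resource for owner, or none of them.

    locks is a dict mapping a resource URI to the current lock owner.
    Returns True if all locks were taken, False if nothing was changed.
    """
    acquired = []
    for uri in resources:
        if uri in locks:
            # Someone holds this lock: release everything taken so far.
            for taken in acquired:
                del locks[taken]
            return False
        locks[uri] = owner
        acquired.append(uri)
    return True
```

The locks are taken one after another, not simultaneously, yet the caller only ever observes the full set or nothing, which is the sense of "atomic" in the note above.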
There was a discussion about language variants. They seemed to be oriented towards a special construct for handling variants, rather than the simple solution of a "translation-of" link between objects, which we have chosen in Web4Groups. I tried to argue for our solution, but people seemed to think according to different models than we do.
An observation: The issues in this IETF group seem to be of interest to people working on the BSCW and in the EU-funded research project CoopWWW.
This meeting was a so-called BOF (Birds of a Feather) which in IETF means a meeting for which there is no IETF working group yet. Chairman: Harald T Alvestrand.
He started with a very good overview of the issues:
Texts in many languages get badly mangled if shown as plain ASCII.
Why declare character set: It will take time until ISO 10646 is universally accepted, and we need a way out if ISO 10646 is found not to suffice at some time in the future.
Why ISO 10646: Richest today, good opportunities for further extension. Problem may be that this standard will change in the future. It is well known, maintained and extended. Problem: Unstable, but this can also be an advantage, since mistakes can be expected to be corrected. A problem with ISO 10646 is that because the character set is so large, many implementations will only be capable of handling a subset of all the characters in ISO 10646. A method may then be needed to indicate which subset of ISO 10646 a computer can handle, and what to do when characters outside of this set are encountered.
Why UTF-8: One way is better than many ways. UTF-8 is backwards compatible with ASCII; ASCII data will look like normal ASCII. Disadvantage: Requires 8-bit clean channels and is a variable-length encoding.
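These three properties of UTF-8 can be checked with a few lines of Python. The sample characters are arbitrary choices for illustration.

```python
ascii_text = "Hello"

# Pure ASCII text has exactly the same bytes in UTF-8 as in ASCII.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Other characters need two or more bytes each: a variable-length encoding.
# "\u00f6" is o with diaeresis, "\u20ac" is the euro sign.
byte_lengths = {ch: len(ch.encode("utf-8")) for ch in ["A", "\u00f6", "\u20ac"]}

# Every byte of a non-ASCII character has its high bit set,
# which is why 8-bit clean channels are required.
high_bit_set = all(b >= 0x80 for b in "\u00f6".encode("utf-8"))
```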
UTF-1, UTF-2 and UTF-file-system-safe are precursors or earlier names of UTF-8; these designations should not be used any more.
Why language tags: People can designate which language version to read, much processing, like indexing and sorting, depends on the language. RFC 1766 is recommended because the ISO standard 639 is not complete, it can only handle about 50 languages. Larger ISO schemes are in development. RFC 1766 is a flexible scheme under IETF control.
ISO 10646 is better maintained than ISO 639 because there is stronger industry pressure to get 10646 working. ISO 639 is handled by linguists who do not understand the urgency needed to get working standards reasonably fast.
What about names: Who sees them, who types them, who misunderstands them? Names are often used with, for example, Norwegian-specific characters in English-language texts. Examples: Torbjörn, Torbjørn. JP comment: The problem with names arises because you have one language and character encoding for a whole body part. If you can switch language and character encoding within a string, the name problem disappears. Also, 10646 might solve this problem since it allows all characters, so you need never switch character set within a string to handle a name.
Problems: ISO 10646 has different characters which look the same. Case handling (upper and lower case) may not be well-defined for non-Latin characters, and sorting is a problem both because of case handling and for other reasons. Comparison of two strings is a problem. This could be solved by normalizing methods or rules.
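The comparison problem, and one normalizing approach to it, can be sketched with Python's unicodedata module. This uses the Unicode normalization form NFC as one possible rule; the meeting did not settle on any particular method, and the helper name same_text is hypothetical.

```python
import unicodedata

# Two spellings of the same name that look identical on screen.
composed = "Torbj\u00f6rn"     # o with diaeresis as a single character
decomposed = "Torbjo\u0308rn"  # plain o followed by a combining diaeresis


def same_text(a, b):
    """Compare two strings after bringing both to the same normal form."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

A plain comparison of the two strings reports them as different, while same_text treats them as equal. Look-alike characters from different scripts are a separate problem that this kind of normalization does not solve.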
There was a long discussion about different variants of ISO 10646 and various problems with 10646 and about normalising of 10646 strings. There was only one hour allocated to this BOF, so many important issues never got to be discussed.
This group is needed because the registries have not agreed on a joint format. This group will develop protocols for some basic information exchange between different registries based on different standards (whois++, X.500, etc.)
We have to persuade the area directors that an IETF working group on this is needed, said Keith Moore.
Three registries are already in existence; in the future, hundreds of them are expected.
A host object has been added; required/optional and single/multiple have been added.
There must be a way for objects to reference other objects in other registries. Every reference has a globally unique registry identifier, and a local identifier unique within this registry. The global registry id is the domain name. It can be registered in the DNS. Each server can be queried with the local identifier.
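The two-part reference can be pictured as a small data structure. The class name, field names and the example registry domain below are all hypothetical; the sketch only shows how a domain name plus a local identifier is enough to locate an object.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectRef:
    registry: str  # globally unique registry id: a domain name
    local_id: str  # identifier unique within that registry


def resolve(ref, registries):
    """Look up ref in a mapping of registry domain -> {local id: object}."""
    return registries[ref.registry][ref.local_id]


# Hypothetical registry contents, standing in for a query to the server.
registries = {"registry-a.example": {"HOST-1": {"type": "host"}}}
found = resolve(ObjectRef("registry-a.example", "HOST-1"), registries)
```

The same local identifier in two different registries yields two distinct references, since the registry domain is part of the key.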
No central authority is needed for managing who is a registry and who is not, but the registries.int domain, managed by IANA, is used.
There was a controversy on what information to store in the DNS. Some want to put much info there, others not so much. The DNS might store server, protocol and protocol options, what kind of data is available, or it might only store how to find a registry and nothing more. There was a lot of discussion on this issue.
Our main customers are the regional registration authorities. We are not defining a new registry system, just exchange of data between existing registries.
Operations needed: Version negotiation, identification to the server, retrieval of the full dataset for a specified object type or all types, retrieval of data that has changed since last retrieval (optional).
Should non-ASCII characters be allowed in registry entries? Registries should be international, and retrievable by anyone anywhere. (Registries might store extra data in non-ASCII fields, but the registry must look, when accessed from other registries, as if it had only ASCII fields.)
Which data needs to be exchanged between registries, how to handle references between data store in different registries, how to find Internet registries in the global Internet and how to obtain authoritative information on which protocols they use, what data formats should be used.
One person who said this group was not needed at all was booed down.
A proposed standard for sending HTML in e-mail is out, and most of the major e-mail software vendors are busy implementing it (including Eudora, Microsoft, Netscape and others). Several problems have cropped up in the exact implementation of the proposed standard, and some bugs have been found, so we decided to develop a new proposed standard, which we hope to submit to the IESG for last call at the end of September 1997.
The most important issues were:
We decided to accept the illegal URL and repeat it, if necessary, in the Content-Location statement, too, as shown in the example above, rather than having to rewrite the HTML text. This is of course still illegal and not recommended.
Is the Content-Base on the Multipart/related to be used as a base for URLs in the sub-parts? RFC 2110 says no, draft-fielding-url-syntax-05 says yes. We decided to keep saying no, but to contact Fielding to ensure that both documents agree.
We want to have an MHTML meeting at that meeting. It will probably discuss what features are implemented and not implemented, as a basis for going to draft standard.
DRUMS is in the final stages of developing its documents, so the discussion was mostly on small, but not unimportant, technical details.
There was continued discussion on ABNF, the most widely used syntax specification language in the IETF, which is to become a separate standard and not part of RFC 822. Should ABNF cater for ISO 10646/UTF-8 characters? If so, how?
Should we allow multiple "To:" lines? Eudora, on receipt, will ignore all "To:" lines except the first. RFC 822 says that the behaviour if you get multiple "To:" lines is undefined. Same for "Cc:" and "Bcc:". Conclusion: Generate only one, accept multiple, and if you get multiple, handle them as one long concatenated string.
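The "accept multiple" half of this conclusion could look roughly like the sketch below. Headers are represented as a list of (name, value) pairs, the function name is hypothetical, and joining the values with a comma is my assumption about what "one long concatenated string" means for an address list.

```python
def merge_recipient_headers(headers, name):
    """Combine repeated address headers into one comma-separated value.

    headers is a list of (name, value) pairs in the order received.
    """
    values = [value.strip() for key, value in headers if key.lower() == name.lower()]
    return ", ".join(v for v in values if v)


# Example addresses are made up for illustration.
received = [
    ("To", "a@example.com"),
    ("Subject", "Hi"),
    ("To", "b@example.com, c@example.com"),
]
merged_to = merge_recipient_headers(received, "To")
```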
SMTP: Should there be any limit on length of lines or of e-mail addresses? This is related to the issue: Should we stay compatible with RFC822 or write what we think is best for the future?
When should the responses 452 and 552 be given to RCPT TO?
It is important that the server gives the client adequate information on whether the client should try again a few hours later, or abandon the attempt to send this message to this server.
How should Message-ID be constructed to ensure global uniqueness? We agreed to give implementors freedom, but could describe different methods of achieving uniqueness.
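One possible method, given here only as an illustration and not as something the group agreed on, is to combine a random unique string with a domain name that the sender controls. The function name and the example domain are hypothetical.

```python
import uuid


def make_message_id(domain):
    """Build a Message-ID from a random unique part and the sender's domain."""
    # uuid4 gives 122 random bits, so collisions are extremely unlikely.
    return "<%s@%s>" % (uuid.uuid4().hex, domain)


first = make_message_id("example.com")
second = make_message_id("example.com")
```

Other methods, such as combining a timestamp with a process number or a counter, fit the same left-part@domain shape.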
Allow "group: LWSP ;"? Example: "To: foo@bar, via postal mail: (Mary Smith);".
"free-form-name" -> "display-name".
Four-digit-year: Generate grammar, must be four-digit. Receive grammar: SHOULD be able to handle it. Two-digit years NN SHOULD be interpreted as "20NN" if NN < 59, "19NN" if NN > 60.
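The two-digit year rule can be written down directly. The thresholds below are copied from these notes exactly as written, which leaves 59 and 60 uncovered; the sketch returns None for those two values and does not guess. The function name is hypothetical.

```python
def expand_two_digit_year(nn):
    """Interpret a two-digit year NN according to the rule in the notes."""
    if not 0 <= nn <= 99:
        raise ValueError("not a two-digit year: %r" % (nn,))
    if nn < 59:
        return 2000 + nn
    if nn > 60:
        return 1900 + nn
    # 59 and 60 are not covered by the rule as noted.
    return None
```

For example, 97 becomes 1997 and 5 becomes 2005.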
Forward means: I want to discuss this message with the new recipient or with the new recipient together with the original recipient.
Resent means: You are the person who should be the recipient of this message, not me.
Current practice is a mess. Whatever we decide will require changes of existing browsers.
Two current uses:
My opinion: Deprecate this. Other people mainly wanted choice 1 above, and said this was the original intention. Any deprecation will be done by a new standard defining two new replacements; this new standard is not the MSGFMT document.
An IANA registry is wanted. IANA wants rules on what to accept. A mailing list for community review, reasonable headers accepted, area director decides in controversial cases. This is part of the DRUMS work.
We discussed how strong the control should be on new e-mail headers before acceptance in the registry, and concluded that control of the same kind as is presently used for registration of new MIME subtypes is suitable. Stronger control would cause the registry to be too little used, and the goal of the registry (to reduce the risk of synonyms and homonyms in header names) would not be fulfilled; weaker control might get too many unreasonable headers registered.
A new version of the IETF draft on this issue is available.