Towards a Unification of Internet Applications
==============================================

Author: Uzi Paz
E-mail: for e-mail contact, the user is uzi4wg and the domain is uzipaz.com
First version: 18/08/1997
Recent version: 03/01/1998
Main source: http://www.spoofers.net/uzi/eng/unified.txt
          or http://www.geocities.com/uzipaz/eng/unified.txt

Copyright notice: you may not copy this document, or any part of it, to a
public location, nor publicize it in any other manner, without prior
permission from the author.

This document is NOT intended as a learning resource, but rather as material
for further work. It should be treated only as a work in progress.

AN IMPORTANT COMMENT: the author does not claim to be knowledgeable in the
field. If you know of relevant issues that you think I am not aware of,
please contact me.

Index
=====

1. Abstract
2. Motivation
3. Unified Framework for Internet Extensions
   3.1 Definition of the Criteria
       3.1.1 Objects
       3.1.2 Distance
       3.1.3 Expiration
       3.1.4 Automatic and Manual Change of Distance
       3.1.5 Identity of the Object
       3.1.6 Output Presentation on Local Site
       3.1.7 Foreground vs. Background
       3.1.8 Topic Classification
       3.1.9 Location Identifiers
   3.2 Examination According to Criteria
       3.2.1 Personal E-Mail
       3.2.2 Unmoderated Mailing Lists
       3.2.3 Unmoderated Newsgroups
       3.2.4 FTP Sites
       3.2.5 WWW-Sites
       3.2.6 Intermediate Stages
             3.2.6.1 Proxy Servers
             3.2.6.2 News Servers
4. Putting Everything Inside
   4.1 Objects
   4.2 Distance
   4.3 Expiration
   4.4 Automatic and Manual Foreground/Background Transfer of Messages
   4.5 Identity of the Object
   4.6 Output Presentation on Local Site
   4.7 Topic Classification
   4.8 Location Identifiers
   4.9 Intermediate stages
5. Transition Problems
6. Last Remarks
7. References

1 Abstract
----------

We discuss the different Internet applications from a unified standpoint. We
believe that such a standpoint is the correct framework for any further
extensions.
We treat the subject not from the technical point of view, but rather from
the user's point of view.

2 Motivation
------------

A few years ago, before the WWW (World Wide Web) was invented, there was a
clear distinction between the various Internet protocols. FTP was for
transferring files, Telnet was for connecting to a remote account, and SMTP
(e-mail) was for transferring e-mail messages between users. Usenet was
invented without any connection to the Internet, and was later adopted by
it. Mailing lists were well developed on Bitnet before they appeared on the
Internet, but since transferring messages between the two networks posed no
problems, many Internet accounts were subscribed to Bitnet mailing lists.

At present, we still think of Usenet as being transmitted over the Internet
via NNTP, e-mail via SMTP, Web pages via HTTP, files via FTP, and mailing
list messages via SMTP (or the secured extensions of these protocols), but
this is no longer accurate. There are gateways from e-mail to Usenet and
vice versa, and from Usenet to the WWW and vice versa. More and more mailing
lists have WWW archives with an interface for posting messages via the WWW.
Even private e-mail can be accessed through a WWW interface (e.g. Hotmail).
Many newsgroups have mirror mailing lists, so that every message posted to
the mailing list is gatewayed to the mirror newsgroup, and vice versa.

The invention of the URL was an important step towards a unification of the
different applications. A single browser lets you access various
applications via various protocols. Many browsers do not really care whether
a message came via FTP, HTTP or NNTP: if it is an HTML message, they will
offer an (optional) HTML interpretation of it. So it hardly matters whether
you use HTTP or FTP to get the file.

Each of the services and programs tries to provide better options. You can
use the WWW to access Usenet and still have all the advantages of a regular
newsreader (e.g. Billy News [1] uses cookies to let the reader subscribe to
newsgroups and mark messages as read while accessing Usenet over HTTP, and
more such services keep appearing), and mail-to-news gateways cause header
fields that belong to one system to be borrowed by the other.

Some mail clients can sort incoming mail into different "incoming" folders,
so that the user can decide that mail from one mailing list is kept in one
folder, mail from another mailing list in another, private e-mail in yet
another "incoming" folder, and so on. Automatic separation of messages into
logical "folders" by discussion-group name, once a feature belonging solely
to Usenet, is increasingly becoming a feature of mailing lists as well.

There is much discussion about borrowing header fields from Usenet for
e-mail (e.g. the "Expires:" field [2], so that the sender of an e-mail
message could set an expiration date). This would make e-mail even more
similar to Usenet. It has also been suggested to add new header fields to
mailing list messages, with URL-like structures for controlling
subscription options. This would let the e-mail client add simple "buttons"
to its interface for controlling subscription, and would bring mailing
lists even closer to newsgroups than they are now. HTML is now fully
implemented by many MUAs, and links from e-mail messages to other resources
are also supported.

To make things even more complicated, new creatures have been invented:
newsgroup-like Web-based discussion groups (Hypernews [3]), and
mailing-list-like Web-based discussion groups (Web4Groups [4]) which also
have a standard e-mail interface. Each of them tries to bring the
advantages of the WWW to discussion groups.

Take one of these applications and use it through a gateway from e-mail or
from Usenet: you can enjoy a mailing-list-like WWW-based application via
Usenet, or a Usenet-like WWW-based application via Usenet. Or perhaps you
wish to use the Usenet-like WWW-based application via e-mail? Would you like
all the smart features of Usenet to be available to you through e-mail when
accessing Usenet through a gateway? With a smart enough gateway, smart
enough protocol extensions, and a smart enough mail client, nothing is
impossible.

This overlapping of areas is not simple, and gateways may produce
compatibility problems. For example, suppose you post a message and put two
different mailing lists in the "To:" field, each of them mirrored to a
different newsgroup. Instead of being posted to Usenet once, with a
"Newsgroups:" field containing both newsgroups, the message is posted twice
with the same Message-ID: once with only the first newsgroup in the
"Newsgroups:" field, and once with only the second.

The more each application tries to be compatible with gatewaying services
from other applications, and the more each adopts advantages and features
from the others, the less clear the borders between the applications will
become. At the moment there are still clear differences between the
applications, but the direction is clear.

There is already a trend towards unification, but it is driven from the
inside out, i.e. from application-specific points of view, and by IETF
groups each working on extensions of specific protocols. The suggestion made
here is to develop a unified framework for discussing future extensions to
existing protocols and applications, and to discuss their unification. More
specifically, we wish to discuss the possibility of pushing this
unification to the extreme.

At first sight this looks really crazy: how can a private e-mail message
sent by one person to another, and a Web page, be treated as two points on
one continuous map? We are not in a position to answer this question, but
we wish to initiate the discussion, without knowing where it will lead us.

Any attempt to invent a new protocol that unifies all the features of all
existing protocols immediately faces these questions:

 a) What guarantee do we have that it won't become just another protocol,
    with gateways to and from other systems?
 b) What compatibility problems will arise with non-Internet networks when
    gateways are used?

Before trying to unify the different systems and protocols, we should
discuss the existing ones from a unified standpoint. Even without trying to
unify the protocols, such a unified approach should be a good framework for
discussing further extensions of existing software and standards.

3 Unified Framework for Internet Extensions
-------------------------------------------

We shall try to define the different applications according to various
criteria. We shall not enter into the technical aspects of the different
applications and protocols, and we shall ignore many aspects and details
which, in our view, should be discussed only at a later stage. To keep the
overall picture transparent, we do not go into detail on the examples, nor
do we intend to discuss all known examples and special cases.

The information below is probably not new to most readers; its main value
lies in the way it is presented.

3.1 Definition of the Criteria
------------------------------

3.1.1 Objects
-------------

An object is any file or stream delivered using Internet protocols.
It may be:

 1) a text file;
 2) a MIME-enhanced file;
 3) a binary file (not defined as a MIME enhancement);
 4) an HTML file (not defined as a MIME enhancement);
 5) a stream of text, such as the output of a LIST request in NNTP.

3.1.2 Distance
--------------

We shall roughly define three different distances for any object, each
defined with respect to a particular recipient. The definition is connected
neither to physical distance nor to logical Internet distance.

REMOTE:       an FTP site, a WWW site, the NNTP server of the poster of a
              news message.
INTERMEDIATE: proxy servers, news servers. Relay systems will not be
              considered.
LOCAL:        the local computer of the recipient, or a relay system of the
              recipient's ISP.

Different systems and protocols give different degrees of control to the
sender, the recipient, and the intermediate maintainer (i-maint) over the
transfer of an object from one distance to another.

3.1.3 Expiration
----------------

For different objects and different distances, different people have
control over the expiration of the object. In some cases an expiration is
set; in some there is no expiration; and in some, expiration is set at one
stage but ignored by later stages.

3.1.4 Automatic and Manual Change of Distance
---------------------------------------------

Sometimes the recipient requests that a specific object, or set of objects,
be sent to the local distance. In other cases, the recipient requests that
whenever objects meet certain criteria, they be moved to the local distance
automatically. This might be generalized to let the recipient request only
an automatic transfer to an intermediate distance, and then, while online,
request that the objects be transferred to the local distance.

3.1.5 Identity of the Object
----------------------------

Some identities are supposed to be unique (the Message-ID). Others may
refer to different objects at different times (e.g. the URL of a file that
is updated from time to time).
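To make the contrast concrete, here is a minimal Python sketch. The class and method names are hypothetical and do not describe any real server. A store keyed by Message-ID binds a name once, much as a news server refuses an article whose Message-ID it has already seen, while a store keyed by URL silently replaces whatever was stored under that name before.

```python
class MessageIdStore:
    """Hypothetical store for objects named by Message-ID.

    A name is bound once: an object arriving with a Message-ID that is
    already known is treated as a duplicate and dropped.
    """

    def __init__(self) -> None:
        self._objects: dict[str, str] = {}

    def add(self, message_id: str, body: str) -> bool:
        """Store the object; return False if the Message-ID was already used."""
        if message_id in self._objects:
            return False
        self._objects[message_id] = body
        return True

    def get(self, message_id: str) -> str:
        return self._objects[message_id]


class UrlStore:
    """Hypothetical store for objects named by location (URL).

    Storing under an existing URL replaces the previous object, so the
    same name refers to different objects at different times.
    """

    def __init__(self) -> None:
        self._objects: dict[str, str] = {}

    def put(self, url: str, body: str) -> None:
        self._objects[url] = body

    def get(self, url: str) -> str:
        return self._objects[url]


if __name__ == "__main__":
    news = MessageIdStore()
    news.add("<unified-1@example.com>", "first version")
    news.add("<unified-1@example.com>", "updated version")  # rejected as a duplicate
    print(news.get("<unified-1@example.com>"))  # prints: first version

    web = UrlStore()
    web.put("http://www.example.com/unified.txt", "first version")
    web.put("http://www.example.com/unified.txt", "updated version")
    print(web.get("http://www.example.com/unified.txt"))  # prints: updated version
```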

A combination of a date/time and a URL is a good URI. On the other hand, if
there are several copies of the same object, there might be several
different URIs for the same object.

3.1.6 Output Presentation on Local Site
---------------------------------------

This has more to do with the recipient's software, but it has to be
discussed because it is inseparable from the rest of the discussion.

Output will usually be directed to a file at the local site. It might be a
temporary file plus the screen (WWW browsers), an incoming mail buffer
(e-mail), or a regular file (FTP).

A related question is who is in charge of the local presentation. On the
WWW, a site developer may choose the fonts and presentation of the service;
software on the local machine may let the user choose the fonts and
presentation. Using a small number of standard font codes, or presentation
preferences, lets the local user choose his/her own preferences for each of
them (standard codes might be underlined/regular/bold/blinking/etc.). Given
some convention on how each standard code is used, this gives the user easy
control over presentation. At the other extreme, a huge number of fonts and
total flexibility with no standards may give the designer of the object a
lot of freedom, but gives the user no easy way to choose his/her own
preferences.

3.1.7 Foreground vs. Background
-------------------------------

You may open your Web browser, request an object, and wait for it to arrive
at your local site. This is usually the case with the WWW and FTP. E-mail is
different, since you may use it to get objects in the background: you may
use WWW-by-mail servers (e.g. Agora servers) to get an object in the
background, or you may run FTP or HTTP in the background. At the moment,
most people who wish to transfer objects to their local distance in the
background will find the by-e-mail services the most useful tools.
When discussing a unified approach, we should look at this aspect as well.

3.1.8 Topic Classification
--------------------------

This criterion concerns the classification of objects, both to allow the
automatic receipt of objects and to help locate links and threads. There
are many different topic classifications, and HTML allows everyone to
construct a different one. Do we want a single standard classification?
Can we unify a Usenet-like classification (i.e. by threads, newsgroups,
sub-hierarchies and hierarchies) with the many different clever types of
classification found on WWW pages and FTP sites?

3.1.9 Location Identifiers
--------------------------

Location identifiers should provide information on where to find a specific
object. Do you wish to receive a document matching a specific
specification? Where does it exist? This is probably the hardest part of
the unification. It is not enough for the object itself to hold the
identifiers; those identifiers must also help users who do not have the
object to find its location.

3.2 Examination According to Criteria
-------------------------------------

We shall now examine how e-mail, mailing lists, news messages, FTP sites and
WWW sites fit the above criteria. Many of the definitions here are not
exact, and there are plenty of exceptions. Commenting on all the exceptions
would make the discussion much less transparent, so many cases and
exceptions are ignored. For the same reason, we ignore other applications
such as IRC and Internet telephony.

3.2.1 Personal E-Mail
---------------------

A personal e-mail is any e-mail for which the sender addresses the object to
a list of addresses, each of which is the incoming mailbox of a single
person. In general:

 (1) The poster has manual control over transferring the object to the
     local distance (we ignore filters and kill files).
 (2) The recipient has no control in general, apart from kill files and
     filters, which are considered automatic controls.
 (3) No intermediate distance.
 (4) Usually no expiration (the recipient has to delete the object
     manually).
 (5) The object may be of type 1 or 2 (see 3.1.1). It might of course be of
     type 4, but many mail clients will not interpret it as such.
 (6) The identity is supposed to be unique.
 (7) A reply-to header might in theory be used to generate threads, but in
     practice this is not done. On the local machine, one may organize the
     messages into different folders, either manually or automatically.
 (8) No location identifiers are needed; the location is the local site.
     The "Received:" field serves as a path recorder, and might be extended
     for use as a location identifier.

3.2.2 Unmoderated Mailing Lists
-------------------------------

In general:

 (1) The poster has manual control over transferring the object to the
     local distances.
 (2) The recipient has automatic control, by choosing which mailing lists
     to subscribe to.
 (3) No intermediate distance.
 (4) Usually no expiration.
 (5) The types of objects are the same as in 3.2.1.
 (6) The identity is supposed to be unique.
 (7) Mailing lists are one kind of topic classification: each mailing list
     is supposed to carry messages related to a specific topic. A reply-to
     header might in theory be used to generate threads, but in practice
     this is not done. On the local machine, one may organize the messages
     into different folders, either manually or automatically.
 (8) Same as in 3.2.1.

3.2.3 Unmoderated Newsgroups
----------------------------

In general:

 (1) The same as for mailing lists.
 (2) The same as for mailing lists, up to the changes implied by (3).
 (3) Intermediate news servers: the maintainer of a news server will
     usually restrict access to it.
 (4) Expiration is set either by the intermediate maintainer or by the
     poster.
 (5) The types of objects are as in 3.2.1.
 (6) The identity is supposed to be unique.
 (7) Topic classification: hierarchies - sub-hierarchies - newsgroups -
     threads.
 (8) Location identifiers: not much needed.
     The "Path:" header field may play a role equivalent to that of the
     "Received:" field in e-mail.

3.2.4 FTP Sites
---------------

In general (ignoring proxy servers for FTP sites):

 (1) The poster has control over transferring the object to the remote
     distance.
 (2) The recipient has manual control over transferring the object from
     the remote site to the local site.
 (3) Intermediate distance: FTP proxy servers.
 (4) No expiration, but at the remote site the poster has control.
 (5) Originally, the types of objects were 1 and 3.
 (6) The identity may refer to different objects at different times.
 (7) Classification according to location.
 (8) Location identifiers: none. The URL includes an identifier of the
     remote location, and the stream (e.g. a directory listing) provides
     partial location information.

3.2.5 WWW-Sites
---------------

In general:

 (1) The poster has control over transferring the object to the remote
     site.
 (2) The recipient has manual control over transferring the object to the
     local site.
 (3) Some clever proxy servers let the recipient save time by
     automatically keeping objects at an intermediate site.
 (4) No expiration. At the remote and intermediate sites, the poster has
     control over deleting and replacing the object.
 (5) All object types 1-4.
 (6) The identity may refer to different objects at different times; the
     URL plus a time gives a unique identity. There might be several copies
     of the same object with different URIs.
 (7) The primary classification is by location, but it is usually not
     used. A free classification may be built by linking; there is no
     standard for such a classification.
 (8) Location identifiers appear in the URL and include the remote
     location.

3.2.6 Intermediate Stages
-------------------------

We shall now discuss the different intermediate stages from the unified
point of view, looking at two such systems: news servers and proxy
servers. On any intermediate site that serves many users, the maintainer
must have some control over how the resources are used, so that they are
used as efficiently as possible.
For news servers, this is done by subscribing to only some of the
newsgroups, and by setting different expiration times for different
hierarchies.

3.2.6.1 Proxy Servers
---------------------

For this discussion we treat a proxy server as an intermediate site which
saves the recipient time by keeping popular objects closer to the
recipient's site. I do not wish to go into other possible uses of proxy
servers (such as firewalls).

A proxy server makes a non-automatic transfer of objects from a remote site
to an intermediate site. It is usually done in the foreground, but may be
done in the background. After the object has been transferred from the
remote site to the intermediate site, users can get it from the
intermediate site instead of the remote site, and hence save time.

An object has an expiration on the intermediate site. The expiration
depends on the resources of the intermediate site and on the time that has
passed since a recipient last requested the object. After expiration, the
recipient will get the object from the remote distance, through the proxy
server.

3.2.6.2 News Servers
--------------------

A news server provides a feature similar to that of a proxy server, in the
sense that it brings objects closer to the recipient. Of course, the news
server is a basic element of the Usenet logical structure, since Usenet
messages do not have a fixed location.

A news server makes an automatic transfer of objects from a remote site to
the intermediate site, according to the decisions of its maintainer. Since
it is done automatically, it is done in the background.

Objects have an expiration date, set by the maintainer. The poster may
request a specific expiration by setting a header field; the maintainer
may decide whether or not to respect this request.

4 Putting Everything Inside
---------------------------

We wish to discuss a new creature which combines the most flexible features
with respect to the criteria mentioned in 3.1.
The purpose is not to offer a specific proposal, although we may be able to
make one at a later stage; for now, we only wish to concentrate on the
discussion itself. We shall treat the creature as if it were a new logical
structure (like FTP, the WWW, Usenet, e-mail, etc.).

4.1 Objects
-----------

Our structure has to support all the objects mentioned in 3.1.1. One may
argue that the structure can handle binary files through MIME, and hence
there is no need for other types of binary files. It is important to state,
however, that we want the structure to be compatible with all existing
objects. We do not go into the technical details of how binary files are
transferred, but it should be clear that the structure must be able to
handle even binary objects that have no MIME header fields.

As for the `stream' object mentioned in 3.1.1, a local or intermediate
agent may request such an object from an intermediate or remote site.
Since listings are delivered as files by some systems and as streams by
others, we want our structure to treat a stream like any other object.

4.2 Distance
------------

The structure has to support an intermediate stage as a basic element. In
general, the nature of the intermediate servers should be very flexible.
For any object, the poster, the (human) recipient and the maintainer all
have control over the passing and expiration of the object on the
intermediate server. There might be more than one intermediate stage.

4.3 Expiration
--------------

Expiration is always set for every stage, by whoever maintains that stage:
for the local stage by the recipient, for the intermediate stage by its
maintainer, and for the remote stage by the poster. A poster may request a
different expiration, and the recipient and the intermediate-server
maintainer may (automatically) accept the request or not.
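As an illustration of this rule, here is a minimal Python sketch under assumptions of our own: each stage carries a default lifetime chosen by whoever maintains it, and that maintainer's acceptance policy is modelled as a simple upper bound on the lifetime a poster may request. The `Stage` class, its fields and the bound are hypothetical. The poster's request is read from an "Expires:"-style date string with Python's standard `email.utils.parsedate_to_datetime`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


@dataclass
class Stage:
    """One stage (local, intermediate or remote) and its expiration policy.

    Hypothetical model: `default_lifetime` is chosen by whoever maintains the
    stage (recipient, intermediate maintainer or poster); `max_lifetime` is
    the longest lifetime that maintainer grants to a poster's request, or
    None to accept any request.
    """
    name: str
    default_lifetime: timedelta
    max_lifetime: Optional[timedelta] = None

    def expiry(self, arrived: datetime, requested: Optional[datetime] = None) -> datetime:
        """Return the moment an object that arrived at `arrived` expires here."""
        default = arrived + self.default_lifetime
        if requested is None:
            return default  # no request from the poster: the stage's own rule applies
        if self.max_lifetime is None or requested - arrived <= self.max_lifetime:
            return requested  # the poster's request is accepted automatically
        return default  # the request is refused; fall back to the default


if __name__ == "__main__":
    arrived = datetime(1998, 1, 3, 12, 0, tzinfo=timezone.utc)
    # The poster's request, as it might appear in an "Expires:" header field.
    requested = parsedate_to_datetime("Mon, 02 Feb 1998 12:00:00 GMT")

    local = Stage("local", default_lifetime=timedelta(days=365))
    news = Stage("intermediate", default_lifetime=timedelta(days=7),
                 max_lifetime=timedelta(days=14))

    print(local.expiry(arrived, requested))  # 1998-02-02 12:00:00+00:00
    print(news.expiry(arrived, requested))   # 1998-01-10 12:00:00+00:00
```

A remote stage would be modelled the same way, with the poster as its maintainer. The bound is only one possible acceptance rule; as 3.2.6.2 notes for news servers, whether a request is honoured remains the maintainer's decision.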

4.4 Automatic and Manual Foreground/Background Transfer of Messages
-------------------------------------------------------------------

The recipient may request that a specific object be transferred to the
local site, or may specify that whenever a new object matching some
criteria is found at a certain remote or intermediate site, it should be
transferred to the local site automatically. The recipient may also request
that certain objects be transferred, automatically or manually, from a
remote site to an intermediate site in the background, so that they can
later be fetched faster in the foreground. Any such use of the intermediate
server should be governed by the maintainer of the intermediate site, who
has to define good rules for a fair sharing of the server's resources among
the local users.

4.5 Identity of the Object
--------------------------

On Usenet and in e-mail every object has its own identity, and there are no
specific rules for naming. On the WWW and FTP the identity is defined by the
location of the object, and hence replacing the object with an updated one
does not change its identity. Identity by location has a clear
disadvantage when the object has no fixed location.

Leaving aside for a moment the question of transition (which we address in
section 5), a good candidate for an identity would be the (e-mail?) address
of the creator of the object together with the exact time of its creation.
This leaves open the question of how to define the location of an object,
which we address in 4.8.

4.6 Output Presentation on Local Site
-------------------------------------

This part concerns specific client software considerations. Each object has
to be presented according to its type, whether it is an HTML file, a text
file, etc. Each object has a header, which is either a stream of
information about it (such as the output of the FTP dir command for a file)
or the header of an e-mail or news message.
All of these should be treated as the header of the object. The interface
should present the header and the object as distinct but connected parts of
the object. The header frame might be used to generate a continuation of
the thread (a reply, a forward, etc.). The header deserves further
discussion: how does it tell us the locations of the object?

4.7 Classification of Objects According to Topic
------------------------------------------------

In section 3.2 we saw a few examples of classification, which fall into
three main types:

 (i)   classification by hierarchies/sub-hierarchies/names/threads (Usenet);
 (ii)  classification by location (FTP);
 (iii) free classification (WWW).

In practice, I see no advantage in the second type (which should not be
confused with the third, although the two may be related). I believe the
first and third types must coexist: a standard type (i) classification,
and a free (i.e. non-standard) classification for HTML objects. The type
(i) classification should be standard for all types of objects, and may
carry in the object's header fields the information that until now could
be found in the protocol envelope. It might be presented as an
HTML-interpreted standard structure of the header frame (see 4.6).

4.8 Location Identifiers
------------------------

Location identifiers have to identify locations as efficiently as possible.
Where possible, they should hold information on as many locations of a
specific object as possible, with an expiration date for each of them. If
an object never expired at the remote site, there would be no problem and
no need for further identifiers: the remote location would always remain
valid, and intermediate sites that hold the object would not need to use
it.

4.9 Intermediate stages
-----------------------

The recipient must be able to locate the objects relevant to him/her.
For this, it is important that, if not the objects themselves, then at
least their headers (or indices of some kind) be pushed to the
intermediate sites. There might be a standard remote location for each
hierarchy or sub-hierarchy, so that whenever a user wishes to receive a
list of objects matching a specific classification, the list is pulled
from the remote site.

There is much more room to discuss this issue; I am totally unsatisfied
with what I have at the moment.

5 Transition Problems
---------------------

The last thing we want is to add just another protocol, while many of the
sites that exist at the moment continue to use the current protocols. For
the new protocol to be able to replace the existing ones, it should be
compatible with most of them, so that all existing services can be accessed
through it. Another direction is to extend the existing protocols and
applications so that they support the new protocol. Yet another option is
to decide that a specific protocol (e.g. HTTP) will be extended to support
the whole spectrum of options discussed here.

6 Last Remarks
--------------

The draft above is just a starting point. Quite a number of problems have
to be solved before a standards draft can be written. I do not consider it
a good starting point, but it is what I was able to do at this stage, and I
do believe that this direction of research is important. Any comments are
welcome.

7 References
------------

1. Billy News: http://www.billyboard.com/
2. "Expires:" field for e-mail (Internet-Draft):
   ftp://ftp.dsv.su.se/users/jpalme/draft-ietf-mailext-new-fields-09.txt
3. Hypernews: http://union.ncsa.uiuc.edu/HyperNews/get/hypernews.html
4. Web4Groups: http://www.dsv.su.se/~jpalme/w4g/web4groups-summary.html