Nagarajan Gopalan: Writing: Thesis

CHAPTER 5.0

INTERNET TOOLS

Two distinctive features of the Internet are the use of TCP/IP (discussed in the previous chapter) and the use of the client/server model of information interchange.

In the client/server model, work is shared between two computers: a "host" machine that serves out information and a "client" machine that receives it. Client software installed on a personal computer takes over tasks such as displaying menus on the screen, negotiating connections to a remote computer, saving files and creating a screen environment, while the server, or remote host computer, performs tasks such as searching a database and sending the results back to the client.

This provides an efficient division of labor between client and server: the client computer takes care of displaying the information, and the server does not have to handle the overhead of supporting large numbers of logged-on users. With this model, developers can improve and change client software and server software independently.

A number of Internet tools have been developed employing TCP/IP and the client/server model and these are discussed in this chapter.

5.1 Email

Electronic Mail or Email is probably the most commonly used and well known application on the Internet. It is popular because it offers a fast, convenient method of transferring information and can accommodate small notes as well as large voluminous documents within a single mechanism.

It differs from the other tools because the sending and receiving computers do not have to be able to communicate directly to make it work. In all the other tools, network protocols send packets directly to destinations, using time-out and re-transmission for individual segments if no acknowledgment returns. In the case of Email, however, the system must provide for instances when the remote machine or the network connections have failed: the sender does not want to wait for the remote machine to become available, nor does he want the transfer to abort merely because the remote machine becomes temporarily unavailable.

To handle this requirement, Email systems use a technique known as spooling. This is shown schematically in the diagram below.

Source: Comer, Douglas E. Internetworking with TCP/IP: Principles, Protocols and Architecture. Englewood Cliffs: Prentice Hall Inc., 1991.
Figure 5.1 Email Process Model

When the user "sends" an Email message, the system places a copy in its private storage area (also referred to as the mail spool) along with identification of the sender, recipient, destination machine and time of deposit. The system then initiates transfer to the remote machine as a background activity allowing the sender to proceed with other activities.

The Email system then behaves as a client: it maps the destination machine to an IP address and attempts to form a TCP connection to the Email system on the destination machine. If it succeeds, it passes a copy of the message to the remote machine and deletes the message from its mail spool. If the transfer cannot be achieved, it records the time of attempted delivery and terminates the session. The Email system periodically sweeps through the spool area checking for undelivered mail and attempts to deliver it. If delivery cannot be accomplished for an extended period of time, the message is usually returned to the sender.

The Email system on the remote machine retains the message in its mail spool and delivers it to the user it is intended for when that person logs on to the system.
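The spooling behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the design of any particular mail system: the names SpooledMessage, MailSpool, deliver and max_age are hypothetical, and the actual network transfer is replaced by a caller-supplied function that reports success or failure.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SpooledMessage:
    sender: str
    recipient: str
    destination: str                      # destination machine
    body: str
    deposited_at: float                   # time of deposit
    last_attempt: Optional[float] = None  # time of last failed delivery


class MailSpool:
    """Hypothetical spool: holds messages until the remote side accepts them."""

    def __init__(self, deliver: Callable[[SpooledMessage], bool], max_age: float):
        self.deliver = deliver            # stand-in for the real transfer
        self.max_age = max_age            # seconds before mail is returned
        self.queue: List[SpooledMessage] = []
        self.returned: List[SpooledMessage] = []

    def send(self, msg: SpooledMessage) -> None:
        # The user "sends": the message is only placed in the spool.
        self.queue.append(msg)

    def sweep(self, now: float) -> None:
        # Periodic background pass over all undelivered mail.
        still_waiting = []
        for msg in self.queue:
            if self.deliver(msg):
                continue                  # delivered, so drop it from the spool
            msg.last_attempt = now        # record the failed attempt
            if now - msg.deposited_at > self.max_age:
                self.returned.append(msg) # give up and return to sender
            else:
                still_waiting.append(msg)
        self.queue = still_waiting
```

The sender's call to send returns immediately; all delivery work happens in sweep, which is what lets the user proceed with other activities while the remote machine is unreachable.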

Email hinges on the concept of an address, which provides the information that allows a message to reach the person it is intended for. The Email address usually takes the form of

user-identification@domain-name.

Thus the address

gopalan.3@postbox.acs.ohio-state.edu

would correspond to user gopalan.3 on the system postbox.acs.ohio-state.edu.
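As a small illustration, the address form can be split into its two parts programmatically. The function name split_address is hypothetical, and the sketch checks only the basic shape of an address rather than the full syntax permitted by the mail standards.

```python
from typing import Tuple


def split_address(address: str) -> Tuple[str, str]:
    """Split user-identification@domain-name into (user, domain)."""
    # Split on the last "@" so the domain name is taken from the right.
    user, separator, domain = address.rpartition("@")
    if not separator or not user or not domain:
        raise ValueError("not of the form user-identification@domain-name")
    return user, domain


user, domain = split_address("gopalan.3@postbox.acs.ohio-state.edu")
```

Here user holds the user identification and domain holds the name of the system that the mail software must map to an IP address.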

The transfer of Email is governed by a separate protocol called SMTP (Simple Mail Transfer Protocol). This protocol specifies the exact format of messages a client on one machine uses to transfer mail to a server on another. It does not specify how the Email system accepts mail from a user or how the user interface presents the user with incoming mail. It also does not specify how mail is stored or how frequently the mail system attempts to send messages. These are controlled by the client software and by the administrators of the Email systems.
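To give a feel for what the protocol covers, the following sketch lists the commands a client sends during one simple SMTP transfer. It is a simplified illustration: the function name smtp_client_commands and the example host and addresses are hypothetical, the server's numeric replies are omitted, and only a single recipient is handled.

```python
from typing import List


def smtp_client_commands(client_host: str, sender: str,
                         recipient: str, body_lines: List[str]) -> List[str]:
    """Return the client's side of a minimal SMTP dialogue, one line each."""
    commands = [
        "HELO " + client_host,           # client identifies itself
        "MAIL FROM:<" + sender + ">",    # who the mail is from
        "RCPT TO:<" + recipient + ">",   # who the mail is for
        "DATA",                          # message text follows
    ]
    for line in body_lines:
        # A line that starts with "." gets an extra "." so it is not
        # mistaken for the end-of-message marker.
        commands.append("." + line if line.startswith(".") else line)
    commands.append(".")                 # a lone "." ends the message
    commands.append("QUIT")
    return commands


dialogue = smtp_client_commands(
    "client.example.org", "sender@client.example.org",
    "gopalan.3@postbox.acs.ohio-state.edu", ["Hello,", "This is a test."])
```

Nothing in this exchange says how the message was composed, stored or displayed, which reflects the division of responsibility described above.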

A specific application which uses Email as a means for communication and dialogue between people with common interests is the Mailing List. This is discussed in a subsequent chapter.

5.2 USENET Newsgroups

USENET Newsgroups are the Internet's equivalent of a discussion group or a BBS (Bulletin Board System). USENET is a worldwide conferencing system encompassing all kinds of organizations and people, and it provides a forum for communication and discussion.

Source: LaQuey, Tracy. Internet Companion Plus: A Beginner's Start-Up Kit for Global Networking. Reading: Addison-Wesley Publishing Company, 1994.
Figure 5.2 USENET Newsgroups

The basic building block of USENET is the newsgroup, which is a collection of messages with a related theme (on other networks, these would be called conferences, forums, bboards or special-interest groups). There are a huge number of Newsgroups and in order to make them manageable, they are organized into a tree structure usually referred to as a hierarchy.

The major hierarchies are:

comp Computer hardware, software and protocol discussion.

misc Subjects which do not fit in anywhere else

news USENET software, network administration

rec Recreational subjects and hobbies

sci Topics in established sciences

soc Socializing or discussion of social issues or world culture

talk Debates and discussions on current events and issues

In addition there are several other hierarchies such as:

alt Groups that discuss alternative ways of looking at things

clari Commercial news service from ClariNet

biz Business related topics

Individual organizations may also have their own newsgroup hierarchies, usually for internal use.
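The tree structure follows directly from the dotted newsgroup names. The sketch below builds such a tree as nested dictionaries; the function name build_hierarchy is hypothetical and the group names are used purely as examples.

```python
from typing import Dict, Iterable


def build_hierarchy(names: Iterable[str]) -> Dict[str, dict]:
    """Turn dotted newsgroup names into a nested dictionary tree."""
    tree: Dict[str, dict] = {}
    for name in names:
        node = tree
        # Each dot-separated part is one level further down the tree.
        for part in name.split("."):
            node = node.setdefault(part, {})
    return tree


tree = build_hierarchy([
    "comp.lang.c",
    "comp.protocols.tcp-ip",
    "rec.music.classical",
])
```

The top-level keys of the resulting tree correspond to the hierarchies listed above, and each level below narrows the subject.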

Newsgroups may be moderated or unmoderated. In the former, control over content is maintained by a moderator who looks at individual messages and decides whether they should be posted or not.

The user accesses newsgroups with a client program. Capabilities vary, but most client programs offer the options of reading messages, replying to them (by Email or through a posting), marking messages as read and posting new messages. Most client programs organize messages into threads so that a message and the postings in reply to it are displayed sequentially.
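One way a client might arrange messages into threads is sketched below. It assumes each message records the identifier of the message it replies to; the tuple layout and the function name thread_order are hypothetical, and replies whose parent is missing are simply left out in this simplified version.

```python
from typing import Dict, List, Optional, Tuple

# (message id, id of the message replied to or None, subject)
Message = Tuple[str, Optional[str], str]


def thread_order(messages: List[Message]) -> List[Tuple[int, str]]:
    """Return (depth, subject) pairs with replies listed under their parent."""
    children: Dict[Optional[str], List[Tuple[str, str]]] = {}
    for message_id, parent_id, subject in messages:
        children.setdefault(parent_id, []).append((message_id, subject))

    ordered: List[Tuple[int, str]] = []

    def visit(parent_id: Optional[str], depth: int) -> None:
        # Depth-first walk: a message is followed at once by its replies.
        for message_id, subject in children.get(parent_id, []):
            ordered.append((depth, subject))
            visit(message_id, depth + 1)

    visit(None, 0)
    return ordered
```

The depth value can be used by the client to indent replies beneath the message they answer.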

The key difference between Email and USENET is that in the latter, messages are not automatically delivered to the user but are stored on the news server. They are delivered to the user only when the user chooses to read them.

USENET has its own set of rules of behavior, usually referred to as Netiquette. Brendan P. Kehoe's Zen and the Art of the Internet (1994) has an excellent section on Netiquette. Another excellent source of information on Netiquette is "Emily Postnews Answers your Questions on Netiquette", an FAQ, which is frequently posted on USENET. An FAQ (Frequently Asked Questions) is a document updated and posted periodically to newsgroups to answer the typical questions posted by new entrants, in order to avoid frequent repetition.

The user's view of USENET is shown below:

Source: Krol, Ed. The Whole Internet User's Guide and Catalog. Sebastopol: O'Reilly & Associates, 1994.
Figure 5.3 User's View of USENET

The actual implementation of the News system is shown in the following diagram and is somewhat different from the user's view of it. Essentially, it consists of a number of News Servers which provide information feeds to, and accept them from, one or more other servers.

Source: Krol, Ed. The Whole Internet User's Guide and Catalog. Sebastopol: O'Reilly & Associates, 1994.
Figure 5.4 Newsfeed System on USENET

The transmission of USENET news is entirely cooperative. Information feeds from one USENET site to another are generally provided out of goodwill and the desire to distribute information, though some commercial services do exist that provide feeds for a fee. A server's administrator makes bilateral agreements with other administrators to transfer certain newsgroups, usually over the Internet, between each other.

There are two major transport methods: UUCP (Unix-to-Unix Copy Program) and NNTP (Network News Transfer Protocol). With UUCP, news is transferred in batches when a neighbor site calls or the feed site happens to call. NNTP, on the other hand, offers a little more flexibility. Using a unique Message ID that is associated with each message, this protocol allows a site to ensure that a neighbor does not already have a particular message before sending it through. This prevents a site from receiving multiple copies of the same message from different neighbors.
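The Message ID check can be illustrated with a toy model of news servers exchanging articles. This is a sketch of the idea only, not of the NNTP wire protocol; the class NewsServer and its methods are hypothetical names.

```python
from typing import Dict


class NewsServer:
    """Toy news server that stores articles keyed by Message ID."""

    def __init__(self, name: str):
        self.name = name
        self.articles: Dict[str, str] = {}

    def wants(self, message_id: str) -> bool:
        # A neighbor asks first; the article is wanted only if it is new here.
        return message_id not in self.articles

    def accept(self, message_id: str, text: str) -> None:
        self.articles[message_id] = text

    def feed(self, neighbor: "NewsServer") -> int:
        """Offer every stored article to a neighbor; return how many were sent."""
        sent = 0
        for message_id, text in self.articles.items():
            if neighbor.wants(message_id):
                neighbor.accept(message_id, text)
                sent += 1
        return sent
```

If servers A and B both feed server C, C receives a given article from whichever neighbor offers it first and declines it from the other, so only one copy is ever transferred to C.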

When a message is posted to a particular newsgroup, it does not get posted on other servers simultaneously. But over a period of time, it will gradually spread until it appears on all the news servers in the world.

A message has a limited "life" which is controlled by the local news administrator. The life of a message varies from newsgroup to newsgroup depending on the nature of its subject and the volume of messages posted to it. Without such a limit, the news server would quickly run out of disk space because it stores copies of all messages.
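Expiry of this kind might be sketched as follows. The per-newsgroup lifetimes, the tuple layout and the function name expire are all hypothetical; a real news server's expiry policy is set by its administrator and may use quite different criteria.

```python
from typing import Dict, List, Tuple

# (newsgroup, arrival time in seconds, text)
Article = Tuple[str, float, str]


def expire(articles: List[Article], lifetimes: Dict[str, float],
           default_lifetime: float, now: float) -> List[Article]:
    """Keep only the articles that are younger than their group's lifetime."""
    kept = []
    for group, arrived, text in articles:
        # Busy groups can be given a shorter lifetime than quiet ones.
        lifetime = lifetimes.get(group, default_lifetime)
        if now - arrived < lifetime:
            kept.append((group, arrived, text))
    return kept
```

Running such a pass regularly bounds the disk space used by each newsgroup regardless of how long the server has been operating.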

5.3 Telnet

Telnet is the main Internet protocol for creating an interactive connection with a remote machine by allowing the user to be on one computer system and work on another remote system, which may be across the street or across the world.

Telnet allows a host on the Internet to emulate a terminal circuit with any other host on the Internet. It establishes a TCP connection to a login server, and passes on keystrokes from the user's terminal directly to the remote machine exactly as if they had been typed in on a terminal at the remote machine.

Source: Comer, Douglas E. Internetworking with TCP/IP: Principles, Protocols and Architecture. Englewood Cliffs: Prentice Hall Inc., 1991.
Figure 5.5 Telnet Process Model

Telnet offers three basic services (Comer 1991). First, it defines a virtual network terminal that provides a standard interface to remote systems. Second, it includes a mechanism that allows the client and server to negotiate options and it provides a set of standard options. Third, it treats both ends of the connection symmetrically.
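Option negotiation can be illustrated with the simplest possible participant: one that refuses every option the other side proposes. The sketch below separates negotiation commands from ordinary data in an incoming byte stream and builds the refusals. The function name refuse_all_options is hypothetical; the byte values are the standard Telnet command codes. Subnegotiation is not handled, which is acceptable only because no option is ever agreed to.

```python
from typing import Tuple

# Telnet command bytes.
IAC, DONT, DO, WONT, WILL = 255, 254, 253, 252, 251


def refuse_all_options(stream: bytes) -> Tuple[bytes, bytes]:
    """Return (plain data, replies to send) for one chunk of Telnet input."""
    data = bytearray()
    replies = bytearray()
    i = 0
    while i < len(stream):
        byte = stream[i]
        if byte != IAC:
            data.append(byte)            # ordinary terminal data
            i += 1
            continue
        if i + 1 >= len(stream):
            break                        # incomplete command at end of chunk
        command = stream[i + 1]
        if command == IAC:
            data.append(IAC)             # doubled IAC is a literal 255 byte
            i += 2
        elif command in (DO, DONT, WILL, WONT):
            if i + 2 >= len(stream):
                break                    # option byte not yet received
            option = stream[i + 2]
            if command == DO:            # asked to enable: decline
                replies += bytes([IAC, WONT, option])
            elif command == WILL:        # offer to enable: decline
                replies += bytes([IAC, DONT, option])
            i += 3                       # DONT and WONT need no reply here
        else:
            i += 2                       # other two-byte commands are ignored
    return bytes(data), bytes(replies)
```

Because the same rules apply whichever side sends the request, the function could run unchanged at either end of the connection, which is the symmetry referred to above.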

5.4 FTP

FTP (File Transfer Protocol) is a protocol which allows files to be transferred from one computer to another over the Internet. It is one of the most popular Internet applications in terms of the network traffic it generates, though it is now being overtaken by the World Wide Web.

There is often confusion between FTP and Telnet in the minds of users because both allow one to connect to other computers and obtain information. The essential difference is that in FTP files are physically copied from one computer to another whereas in Telnet, the user only interacts with another computer's services without any actual transfer of files. This is illustrated in the following diagram.

Source: LaQuey, Tracy. Internet Companion Plus: A Beginner's Start-Up Kit for Global Networking. Reading: Addison-Wesley Publishing Company, 1994.
Figure 5.6 Difference between FTP and Telnet

The key issues in FTP are that the client and server must agree on authorization, notions of file ownership and access protection and data formats. This is particularly important when one is dealing with heterogeneous machines with different operating platforms.

Thus, FTP needs to provide several facilities beyond the transfer function itself (Comer 1991):

Interactive Access, which allows users to interact more efficiently with the remote server.
Format Specification, which allows the client to specify the type and format of stored data.
Authentication Control, which requires clients to authorize themselves by sending a login name and password to the server before requesting file transfers.

The FTP process model is shown below.

Source: Comer, Douglas E. Internetworking with TCP/IP: Principles, Protocols and Architecture. Englewood Cliffs: Prentice Hall Inc., 1991.
Figure 5.7 FTP Process Model

A special kind of FTP that is extremely popular on the Internet is Anonymous FTP. This allows users who do not have a login name or password to access certain files on a machine.

When anonymous FTP is enabled on a server, a special login name is created which allows anybody to log in with the username anonymous. Although any character string may be used as the password, it is generally considered good form to use your Email address as your password so that the administrators of the server have some idea of who you are and can get in touch with you if necessary. In fact, nowadays, many servers check your password to make sure it looks like an Email address (though, of course, there is no way they can ensure that the address is authentic).
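The kind of check a server might apply to an anonymous login can be sketched as follows. The function name accept_anonymous_login and the pattern used are hypothetical; actual servers may apply looser or stricter rules. As noted above, a check like this can confirm only that the password has the shape of an Email address, not that the address is genuine.

```python
import re

# Very loose shape test: something@something.something, with no spaces.
_EMAIL_SHAPE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def accept_anonymous_login(username: str, password: str) -> bool:
    """Accept the login only for user 'anonymous' with an address-like password."""
    if username != "anonymous":
        return False
    return bool(_EMAIL_SHAPE.match(password))
```

For example, the password gopalan.3@postbox.acs.ohio-state.edu would pass this test, while a password such as guest would not.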

There are a lot of resources available through anonymous FTP including software, archives of messages from mailing lists, USENET newsgroups, graphics, electronic texts etc. In order to conserve bandwidth, files are often compressed before being placed on-line. Also, many binary files on FTP sites are encoded as text to ensure correct transmission. So, in order to be able to use them, it is necessary to have utilities on the user's computer to decode and decompress the files. Fortunately, many FTP clients have the built in capability to decode files as they are brought in. Decompression usually requires a separate utility, though many sites use self-extracting archives which can expand themselves. Popular utilities on various platforms for decoding and decompressing files are also available from FTP sites.
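The two steps, compression and text encoding, and their reversal on the user's machine can be illustrated with Python's standard library. Note the assumption: gzip and base64 are used here only as stand-ins, since the formats actually found on a given FTP site vary, and the function names are hypothetical.

```python
import base64
import gzip


def prepare_for_archive(raw: bytes) -> bytes:
    """Site side: compress to save bandwidth, then encode the result as text."""
    return base64.encodebytes(gzip.compress(raw))


def recover_from_archive(encoded: bytes) -> bytes:
    """User side: undo the steps in reverse order, decode then decompress."""
    return gzip.decompress(base64.decodebytes(encoded))


original = b"This line repeats many times.\n" * 200
stored = prepare_for_archive(original)
restored = recover_from_archive(stored)
```

The order matters: the file must be decoded back to binary before it can be decompressed, which is why a client that decodes automatically may still leave decompression to a separate utility.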

The owners of the FTP sites are making their resources available to the general public without expectation of return. So it is a good idea to be considerate while using anonymous FTP. This includes providing your Email address (as mentioned above), reading the README files where they are provided (these usually contain information that the site owners feel you should be aware of), connecting to servers close to you wherever possible (many popular FTP sites are "mirrored" at many locations across the world for this purpose) and limiting connections to times which are outside normal working hours (in the local time of the site, keeping in mind that the Internet allows access to FTP sites all over the world).

5.5 Gopher

Gopher is a system which was created by researchers at the University of Minnesota, which probably accounts for its name (the gopher being the school mascot). It is a distributed document delivery service which allows users to seamlessly access data residing on multiple hosts through a hierarchical arrangement of documents (Lane and Summerhill 1993).

In a sense, Gopher and FTP are competing standards for information retrieval. But Gopher has several advantages over FTP:

Gopher clients usually provide a far more user friendly interface than FTP clients.
It provides access to information through one interface from many different kinds of sites including FTP, Telnet and WAIS (Wide Area Information Servers).
It provides a simple hierarchical menu system which is easy to understand and use.
The user is not required to know or use the actual addresses or locations of any of the information sources. That job is done by the gopher client by automatically connecting to the relevant host based on the user's selection.

The Gopher protocol rests on a metaphor of a file system and presents the user with a menu in the form of a list of document titles. Each Gopher menu item has a script associated with it which establishes the location and appropriate type of connection to access it. This includes the domain name of the server, the server port number, the type of the document, its title and a selector string identifying the document's location in the server's file system. The user does not see all this information though- all that is displayed is the title of the document.
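The information attached to a menu item can be seen by parsing one line of a Gopher menu as a server sends it: a one-character type followed by the title, then the selector string, host name and port, separated by tab characters. In the sketch below the class and function names are hypothetical and the server shown is a made-up example.

```python
from dataclasses import dataclass


@dataclass
class GopherItem:
    item_type: str   # one character, e.g. "0" for a text file, "1" for a menu
    title: str       # the only part the user normally sees
    selector: str    # identifies the document on the server
    host: str        # domain name of the server
    port: int        # server port number


def parse_menu_line(line: str) -> GopherItem:
    """Parse one tab-separated line of a Gopher menu."""
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 4 or not fields[0]:
        raise ValueError("not a Gopher menu line")
    display, selector, host, port = fields[:4]
    # The first character is the type; the rest is the title shown to the user.
    return GopherItem(display[0], display[1:], selector, host, int(port))


item = parse_menu_line("0About this server\t/about.txt\tgopher.example.org\t70\r\n")
```

A client would display only item.title and keep the remaining fields to open the right connection when the user selects the item.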

When the user selects an item from the menu, the Gopher client automatically establishes an appropriate type of connection to retrieve the selected document, linking up to a different server if necessary.

The fact that the user does not have to know the location of a document to retrieve it is probably one of Gopher's greatest strengths. However, it can also prove to be a liability since the user has absolutely no idea of where on the Internet they are being taken by the client, resulting in getting "lost in Gopher space".

5.6 World Wide Web

The World Wide Web, started by Tim Berners-Lee while at CERN (the European Laboratory for Particle Physics), seeks to build a "distributed hypermedia system." In practice, the Web is a vast collection of interconnected documents, spanning the world.

While the Gopher system described in the previous section does offer significant value in terms of usability and organization of information, this very hierarchical organization can be a limitation because it requires sequential, hierarchical navigation to reach a document.

The World Wide Web uses a different model of organization: the hypertext model. Hypertext is data that contains links to other data. The advantage of hypertext is that in a hypertext document, if the user wants more information about a particular subject mentioned, he/she can usually "just click on it" to get more information. On the World Wide Web, links from a document can lead to any other kind of document including text, graphics, FTP files, sound and video.
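The idea of data that contains links to other data can be made concrete by extracting the links from a small hypertext document. The sketch uses Python's standard HTML parser; the class name LinkCollector, the sample document and its addresses are made up for illustration.

```python
from html.parser import HTMLParser
from typing import List


class LinkCollector(HTMLParser):
    """Collect the targets of all links found in an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Links are written as anchor tags; href holds the target.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


document = (
    '<p>See <a href="ftp://ftp.example.org/pub/notes.txt">the notes</a> '
    'or return to the <a href="http://www.example.org/">home page</a>.</p>'
)
collector = LinkCollector()
collector.feed(document)
```

After the call to feed, collector.links holds the two targets in document order; one points to a file on an FTP server and the other to another Web document, showing how a single page can lead to different kinds of resources.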

The World Wide Web has created a much more user friendly and flexible means for users to navigate through the many different kinds of resources available on the Internet. This is the primary reason for its ever increasing popularity. But this very flexibility is also creating its own new problems by producing an extremely chaotic environment.

The World Wide Web is discussed in more detail in the next chapter.