A commonly asked question is "What is the Internet?" It is asked so often because there is no agreed-upon answer that neatly sums the Internet up.
The Internet can be thought of in terms of its common protocols, as a physical collection of routers and circuits, as a set of shared resources, or even as an attitude about interconnection and intercommunication. Some common definitions given in the past include (Krol 1992):
The Internet is a large collection of networks (all of which run the TCP/IP protocols) that are tied together so that users of any of the networks can use the network services provided by TCP/IP to reach users on any of the other networks. (Malkin and Marine 1991).
Probably the best way of technically describing the Internet is as a global network of networks enabling computers of all kinds to directly and transparently communicate and share services throughout much of the world.
However, this definition does not consider the fact that the Internet is more than just the computers that compose it; it is also the people and information resources behind those machines.
At one level it is a vast collection of large and small interconnected computer networks extending all the way around the world. At another level it is all the people whose active participation makes the Internet a valuable information resource. (Butler 1994).
The Internet consists of a mind-bogglingly huge number of participants, connected machines, software programs and a massive quantity of information spread all over the world. (Engst 1994).
While the networks that make up the Internet are based on a standard set of protocols (a mutually agreed upon method of communication between parties), the Internet also has gateways to networks and services that are based on other protocols.
No one knows how big the Internet really is and no single person or organization runs it. It is the collective effort of many thousands of individuals and organizations in many different countries. The Internet is a sort of confederation: a worldwide collection of national, regional, campus and corporate networks.
Today's Internet is a global resource connecting millions of users. It began more than 25 years ago as an experiment by the U.S. Department of Defense, whose Advanced Research Projects Agency initiated a project in 1968 to connect four sites: the Stanford Research Institute, the University of California at Los Angeles, the University of California at Santa Barbara and the University of Utah. In today's terminology, the network they were building would be called a WAN (Wide Area Network). The four sites were connected in 1969, marking the birth of a precursor to today's Internet. Called the ARPANET, it used a scheme called the Network Control Protocol (NCP) to manage the flow of data.
In 1973, the U.S. Defense Advanced Research Projects Agency (DARPA) initiated a research program to investigate techniques and technologies for interlinking packet networks of various kinds. The objective was to develop communication protocols which would allow networked computers to communicate transparently across multiple, linked packet networks. This was called the Internetting project and the system of networks which emerged from the research was known as the "Internet."
The ARPANET model was designed to support military research, in particular research into how to build networks that could withstand partial outages (such as those caused by bomb attacks) and still function. In the ARPANET model, the network itself is assumed to be unreliable; any portion of it could disappear at any moment. The design required a minimum amount of information from the computer clients, and the communicating computers themselves, not the network, were given the responsibility for ensuring that communication was accomplished. The philosophy was that every computer on the network could talk, as a peer, with any other computer. (Krol 1994).
The system of protocols which was developed over the course of this research effort became known as the TCP/IP Protocol Suite, after the two initial protocols developed: Transmission Control Protocol (TCP) and Internet Protocol (IP).
In 1986, the U.S. National Science Foundation (NSF) initiated the development of the NSFNET which, today, provides a major backbone communication service for the Internet. With its 45 megabit per second facilities, the NSFNET carries on the order of 12 billion packets per month between the networks it links.
The National Aeronautics and Space Administration (NASA) and the U.S. Department of Energy contributed additional backbone facilities in the form of the NSINET and ESNET respectively. In Europe, major international backbones such as NORDUNET and others provide connectivity to over one hundred thousand computers on a large number of networks.
Commercial network providers in the U.S. and Europe now offer Internet backbone and access support on a competitive basis to any interested parties. "Regional" support for the Internet is provided by various consortium networks and "local" support is provided through each of the research and educational institutions. Within the United States, much of this support has come from the federal and state governments, but a considerable contribution has been made by industry. In Europe and elsewhere, support arises from cooperative international efforts and through national research organizations.
During the course of its evolution, particularly after 1989, the Internet system began to integrate support for other protocol suites into its basic networking fabric. The present emphasis in the system is on multi-protocol interworking, in particular on the integration of the Open Systems Interconnection (OSI) protocols into the architecture. Both public domain and commercial implementations of the roughly 100 protocols of the TCP/IP protocol suite became available in the 1980s.
During the early 1990s, OSI protocol implementations also became available and, by the end of 1991, the Internet had grown to include some 5,000 networks in over three dozen countries, serving over 700,000 host computers used by over 4,000,000 people.
A great deal of support for the Internet community has come from the U.S. Federal Government, since the Internet was originally part of a federally-funded research program and, subsequently, has become a major part of the U.S. research infrastructure. During the late 1980s, however, the population of Internet users and network constituents expanded internationally and began to include commercial facilities. Indeed, the bulk of the system today is made up of private networking facilities in educational and research institutions, businesses and government organizations across the globe.
Internet Technical Evolution
The Internet has always functioned as a collaboration among cooperating parties. Certain key functions have been critical for its operation, not the least of which is the specification of the protocols by which the components of the system operate. These were originally developed in the DARPA research program mentioned above, but in recent times this work has been undertaken on a wider basis with support from government agencies in many countries, industry and the academic community.
The Internet Activities Board (IAB) was created in 1983 to guide the evolution of the TCP/IP Protocol Suite and to provide research advice to the Internet community. During the course of its existence, the IAB has reorganized several times. It now has two primary components: the Internet Engineering Task Force and the Internet Research Task Force.
The former has primary responsibility for further evolution of the TCP/IP protocol suite, its standardization with the concurrence of the IAB, and the integration of other protocols into Internet operation (e.g. the Open Systems Interconnection protocols). The Internet Research Task Force continues to organize and explore advanced concepts in networking under the guidance of the Internet Activities Board and with support from various government agencies.
Throughout the development of the Internet, its protocols and other aspects of its operation have been documented first in a series of documents called Internet Experiment Notes and, later, in a series of documents called Requests for Comment (RFCs). The latter were used initially to document the protocols of the first packet switching network developed by DARPA, the ARPANET, beginning in 1969, and have become the principal archive of information about the Internet.
There are a number of Network Information Centers (NICs) located throughout the Internet to serve its users with documentation, guidance, advice and assistance. As the Internet continues to grow internationally, the need for high-quality NIC functions increases. Although the initial community of Internet users was drawn from the ranks of computer science and engineering, its users now come from a wide range of disciplines in the sciences, arts, letters, business, the military and government administration.
Related Networks
In 1980-81, two other networking projects, BITNET and CSNET, were initiated. BITNET adopted the IBM RSCS protocol suite and featured direct leased line connections between participating sites. Most of the original BITNET connections linked IBM mainframes in university data centers. This rapidly changed as protocol implementations became available for other machines. From the beginning, BITNET has been multi-disciplinary in nature with users in all academic areas. It has also provided a number of unique services to its users (e.g., LISTSERV).
Today, BITNET and its parallel networks in other parts of the world (e.g., EARN in Europe) have several thousand participating sites. In recent years, BITNET has established a backbone which uses the TCP/IP protocols with RSCS-based applications running above TCP.
CSNET was initially funded by the National Science Foundation (NSF) to provide networking for university, industry and government computer science research groups. CSNET used the Phonenet MMDF protocol for telephone-based electronic mail relaying and, in addition, pioneered the first use of TCP/IP over X.25 using commercial public data networks. The CSNET name server provided an early example of a white pages directory service and this software is still in use at numerous sites.
At its peak, CSNET had approximately 200 participating sites and international connections to approximately fifteen countries. In 1987, BITNET and CSNET merged to form the Corporation for Research and Educational Networking (CREN). In the Fall of 1991, CSNET service was discontinued, having fulfilled its important early role in the provision of academic networking service.
Today, the Internet also interacts with the networks of information service providers such as CompuServe and America Online through gateways.
There are two types of connections that can be provided between two computers: circuit-switched and packet-switched (Comer 1991).
Circuit switched networks operate by forming a dedicated connection (circuit) between two points. The advantage of circuit switching lies in its guaranteed capacity: once a circuit is established, no other network activity will decrease the capacity of the circuit. An example of a circuit-switched network is the telephone system.
Packet-switched networks take a different approach: traffic on the network is divided into small pieces called packets that are multiplexed onto high-capacity inter-machine connections. A packet, which may contain only a few hundred bytes of data, carries identification that enables computers on the network to know whether it is intended for them or how to send it on to its correct destination. The main advantage of packet switching is that multiple communications among computers on a network can proceed concurrently; consequently, fewer interconnections are required and cost is kept low. Most computer networks are packet-switched.
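As a rough sketch of this idea, the following Python fragment divides a message into individually addressed packets and reassembles them. The 200-byte payload size and the address strings are illustrative assumptions, not properties of any particular network.

from dataclasses import dataclass

@dataclass
class Packet:
    source: str       # identification of the sending machine
    destination: str  # lets computers recognize or forward the packet
    seq: int          # position of this piece within the original message
    payload: bytes    # a few hundred bytes of the actual data

def packetize(message: bytes, src: str, dst: str, size: int = 200):
    """Divide a message into small, individually addressed packets."""
    return [Packet(src, dst, i, message[off:off + size])
            for i, off in enumerate(range(0, len(message), size))]

def reassemble(packets):
    """Rebuild the message, regardless of the order packets arrived in."""
    return b"".join(p.payload for p in sorted(packets, key=lambda p: p.seq))

parts = packetize(b"..." * 500, "host-a", "host-b")
assert reassemble(reversed(parts)) == b"..." * 500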
Packet switched networks have become extremely popular for connecting computers because of their cost and performance. Packet switched networks that span large geographical distances (between cities or countries) are fundamentally different from those that span short distances (within a computer lab). To help characterize these differences, packet switched technologies are often divided into three broad categories: Wide Area Networks (WANs), Metropolitan Area Networks (MANs) and Local Area Networks (LANs) (Comer 1991).
WAN technologies allow endpoints to be arbitrarily far apart and are intended for use over large distances. They usually operate at slower speeds, ranging from 9.6 Kbps (thousand bits per second) to 45 Mbps (million bits per second). A WAN usually consists of a series of complex switches interconnected by communication lines, and the network can be extended by adding a new switch and another communication line.
MAN technologies span intermediate geographic areas and operate at medium to high speeds. Typical MANs operate at 56 Kbps to 100 Mbps. In MAN technologies, a network contains active switching elements that introduce short delays as they route data to its destination.
LAN technologies provide the highest speed but sacrifice the ability to span large distances. A typical LAN spans a building or a campus and operates between 4 Mbps and 2 Gbps (billion bits per second). In LAN technologies, each computer usually contains a network interface device that connects the machine directly to the network medium. Often, the network itself is passive, depending on electronic devices in the attached computers to generate and receive the necessary signals.
The goal of network protocol design is to hide the technological differences between networks, making interconnection independent of the underlying hardware. The protocol used on the Internet is called TCP/IP (Transmission Control Protocol/Internet Protocol).
4.2.1 TCP/IP
TCP/IP is a set of protocols developed to allow cooperating computers to share resources across a network.
TCP/IP is built on "connectionless" technology. Information is transferred as a sequence of "datagrams". A datagram is a collection of data that is sent as a single message. The terms "datagram" and "packet" often seem to be nearly interchangeable. Technically, datagram is the right word to use when describing TCP/IP. A datagram is a unit of data, which is what the protocols deal with. A packet is a physical thing, appearing on an Ethernet or some wire. In most cases a packet simply contains a datagram, so there is very little difference.
Each of these datagrams is sent through the network individually. TCP (Transmission Control Protocol) is responsible for breaking up the message into datagrams, reassembling them at the other end, re-sending anything that gets lost, and putting things back in the right order. IP (Internet Protocol) is responsible for routing individual datagrams.
Transmission Control Protocol, in a nutshell, breaks data down into packets, each of which has a header containing the address of the source host, the address of the destination host, information for putting the data back together and information for verifying that the packets have not been corrupted. It uses positive acknowledgment with re-transmission to ensure reliable transfer: the recipient computer communicates with the source, sending back an acknowledgment message as it receives data. The sender keeps a record of each packet it sends and waits for an acknowledgment before sending the next packet. The sender also starts a timer when it sends a packet and re-transmits the packet if the timer expires before an acknowledgment arrives. A simple positive acknowledgment protocol wastes a substantial amount of network bandwidth because it must delay sending a new packet until it receives an acknowledgment for the previous one. To make transmission more efficient, a sliding-window scheme is used, which allows the sender to transmit multiple packets before waiting for an acknowledgment.
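The following Python sketch illustrates the sliding-window idea under stated assumptions: acknowledgments are cumulative, and the transmit and recv_ack functions (both hypothetical, supplied by the caller) stand in for the actual network. Real TCP adds checksums, adaptive time-outs and flow control on top of this basic mechanism.

import time

def send_reliably(packets, transmit, recv_ack, timeout=1.0, window=4):
    """Sliding-window sender: up to `window` unacknowledged packets in flight."""
    base = 0           # oldest packet not yet acknowledged
    next_seq = 0       # next packet to be sent for the first time
    sent_at = {}       # sequence number -> time of last transmission
    while base < len(packets):
        # Send new packets while the window has room.
        while next_seq < base + window and next_seq < len(packets):
            transmit(next_seq, packets[next_seq])
            sent_at[next_seq] = time.monotonic()
            next_seq += 1
        # Wait briefly for a cumulative acknowledgment (None on time-out).
        ack = recv_ack(timeout)
        if ack is not None:
            base = max(base, ack + 1)   # slide the window forward
        # Re-send any packet whose retransmission timer has expired.
        for seq in range(base, next_seq):
            if time.monotonic() - sent_at[seq] > timeout:
                transmit(seq, packets[seq])
                sent_at[seq] = time.monotonic()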
The Internet Protocol specifies the exact format of all data that passes across a TCP/IP network and includes a set of rules that characterize how hosts and gateways should process packets, how and when error messages should be generated and the conditions under which packets can be discarded. The Internet Protocol's main function is routing: moving datagrams through an interconnected set of networks from source to destination. This is done by passing the datagrams from one Internet module to another until the destination is reached. To allow gateways or other intermediate systems to forward the datagram, IP adds its own header, whose main contents are the source and destination Internet addresses. IP routing consists of deciding where to send a datagram based on its destination IP address. The route is direct if the destination machine lies on a network to which the sending machine attaches. The route is indirect if the datagram must be sent to a gateway for delivery. Generally, hosts send indirectly routed datagrams to the nearest gateway: the datagrams travel through the Internet from gateway to gateway until they can be delivered directly across one physical network.
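The direct/indirect decision can be sketched in a few lines of Python. The addresses, network numbers and gateways below are hypothetical, and the standard ipaddress module (with modern prefix notation) is used purely for convenience; the classful Internet of the time expressed the same decision in terms of network classes.

import ipaddress

def route(destination, attached_net, gateway_table, default_gateway):
    """Decide whether a datagram is delivered directly or via a gateway."""
    dest = ipaddress.ip_address(destination)
    if dest in ipaddress.ip_network(attached_net):
        return "direct", destination        # destination is on our own network
    for net, gateway in gateway_table.items():
        if dest in ipaddress.ip_network(net):
            return "indirect", gateway      # hand the datagram to that gateway
    return "indirect", default_gateway      # nearest gateway handles the rest

# A host on network 128.146.0.0/16 routing two datagrams (all values hypothetical).
table = {"10.0.0.0/8": "128.146.1.1"}
print(route("128.146.7.9", "128.146.0.0/16", table, "128.146.1.254"))  # direct
print(route("10.3.2.1", "128.146.0.0/16", table, "128.146.1.254"))     # via 128.146.1.1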
This method of routing has the advantage that if a connection is disrupted, an alternative route can usually be found. In fact, it is quite possible that within a single data transfer, various packets might follow different routes. Another advantage of this routing system is that as conditions change, it can use the best connection available.
4.2.2 IP Addresses
Datagrams are routed from one Internet module to another through individual networks based on the interpretation of an Internet address. Thus, an important mechanism of TCP/IP is the Internet address.
Each host on the Internet is assigned a unique integer address called its Internet address or, as it is more commonly known, IP address. The IP address is a 32-bit number which is carefully chosen to make routing efficient: it identifies the network to which the host is attached as well as a unique host on that network. Conceptually, each address is a pair consisting of a network id and a host id. There are three classes of IP addresses: Class A addresses are used for the few large networks that have more than 65,536 hosts; Class B addresses are used for intermediate-size networks with between 256 and 65,536 hosts; and Class C addresses are used for networks with fewer than 256 hosts.
IP addresses are written as four decimal integers separated by decimal points, where each integer gives the value of one octet of the IP address. To ensure that the network portion of the IP address is unique, all IP addresses are assigned by a central authority, the Network Information Center (NIC). The central authority only assigns the network portion of the address and delegates responsibility for assigning host addresses to the requesting organization.
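As a short sketch of how the class and the network/host split can be read off the leading bits of the 32-bit address, consider the following Python fragment; the address used is purely illustrative.

def parse_ip(dotted):
    """Interpret a dotted-decimal IP address under the classful scheme."""
    a, b, c, d = (int(octet) for octet in dotted.split("."))
    value = (a << 24) | (b << 16) | (c << 8) | d   # the 32-bit address
    if a < 128:                  # leading bit 0: Class A, 8-bit network id
        cls, net_bits = "A", 8
    elif a < 192:                # leading bits 10: Class B, 16-bit network id
        cls, net_bits = "B", 16
    elif a < 224:                # leading bits 110: Class C, 24-bit network id
        cls, net_bits = "C", 24
    else:                        # Classes D and E (multicast/experimental) omitted
        raise ValueError("not a Class A, B or C address")
    network_id = value >> (32 - net_bits)
    host_id = value & ((1 << (32 - net_bits)) - 1)
    return cls, network_id, host_id

print(parse_ip("128.146.7.9"))   # ('B', 32914, 1801): network 128.146, host 7.9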
4.2.3 Domain Names
IP addresses provide a convenient, compact representation for specifying the source and destination of packets sent across the Internet but are not convenient for humans to use and remember.
The domain name system was created to address this need. This involves a hierarchical scheme where the authority for names in subdivisions is passed on to a designated agent. A domain name consists of a sequence of sub-names separated by periods.
For example: office1.id.ohio-state.edu
In any machine name, the final word after the last dot is the top-level domain, and only a limited number of these exist:
.com Commercial Organizations
.edu Educational Institutions
.gov Government Institutions
.mil Military Establishments
.net Network Support Centers
.org Organizations other than the above
In addition to the above, networks in countries other than the US use a two-letter country code as the top-level domain.
The next level in the domain name usually corresponds to the organization within which the network resides. For example, all hosts at The Ohio State University have domain names ending in .ohio-state.edu
Subsequent levels are assigned by the organizations themselves and may vary quite a lot. Thus, all computers at the Department of Industrial Design at The Ohio State University have names ending in .id.ohio-state.edu.
The first word in the domain name is the name of the individual computer.
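To make the hierarchy concrete, the small sketch below decomposes the example name given earlier; it assumes a US-style name without a country code.

def describe(domain):
    labels = domain.split(".")
    return {
        "host": labels[0],              # the individual computer
        "subdomains": labels[1:-2],     # levels assigned within the organization
        "organization": labels[-2],     # the owning organization
        "top_level_domain": labels[-1]  # one of .com, .edu, .gov, ...
    }

print(describe("office1.id.ohio-state.edu"))
# {'host': 'office1', 'subdomains': ['id'],
#  'organization': 'ohio-state', 'top_level_domain': 'edu'}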
While these names are relatively easy for human users to remember, there must still be a means of translating them into IP addresses and vice versa. The domain name scheme therefore also includes an efficient, reliable, general-purpose, distributed system for mapping names to addresses.
This involves a number of independent, cooperating systems called name servers. There is a root server which contains information about the top-level domains, and a server for each organization. Each name server maintains information on all the hosts within its sub-domain. When a domain name server receives a query, it checks to see if the name is in the sub-domain for which it is an authority. If so, it translates the name to an IP address and returns it to the client. If the name server cannot resolve the name completely, it contacts a domain name server that can resolve the name and returns the answer to the client. Internet name servers use name caching to optimize search costs: each server maintains a cache of recently used names, and when it receives a request it first checks whether the name has been resolved recently before contacting another name server.
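This resolution-with-caching behavior can be modeled in miniature as below. The server layout and the address returned are hypothetical, and real name servers exchange DNS protocol messages rather than calling one another directly; the sketch only shows the authority check, the delegation and the cache.

class NameServer:
    def __init__(self, authority_suffix, records, parent=None):
        self.authority_suffix = authority_suffix  # sub-domain this server is authoritative for
        self.records = records                    # name -> IP address, for that sub-domain
        self.parent = parent                      # server to consult when not authoritative
        self.cache = {}                           # recently resolved names

    def resolve(self, name):
        if name in self.cache:                    # answer from the cache if possible
            return self.cache[name]
        if name.endswith(self.authority_suffix):  # we are the authority: look it up
            address = self.records[name]
        else:                                     # otherwise ask a server that can resolve it
            address = self.parent.resolve(name)
        self.cache[name] = address                # remember the answer for next time
        return address

root = NameServer("", {"office1.id.ohio-state.edu": "128.146.7.9"})
local = NameServer(".cs.utah.edu", {}, parent=root)
print(local.resolve("office1.id.ohio-state.edu"))  # resolved via the root, then cached locally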
There are three basic levels of connectivity to the Internet:
Conventional Dialup
SLIP/PPP Dialup
Direct Access
Conventional Dialup is the most basic level of access, in which the user dials into an interactive system offered by a service provider. The user's workstation runs a terminal program allowing access to application programs on the service provider's computer. This is primarily a text-based view of the Internet.
SLIP/PPP Dialup access makes it possible for IP to communicate over dialup lines. With SLIP (Serial Line Internet Protocol) or PPP (Point-to-Point Protocol), the user's computer can conduct TCP/IP communications as if it were directly connected to the Internet. This means that the user can run any client software that can be used on the Internet.
Direct Access to the Internet most commonly involves a Local Area Network directly connected to the Internet. The speed of the connection can range from a telephone line that runs at 56 Kbps to a T1 line which runs at 1.544 Mbps.
There is no single parameter that can be used as a measure of the size and growth of the Internet; various indicators, such as counts of connected networks, host computers and users, have been used in the past to gauge its rate of growth. From these figures it is amply clear that the Internet is a truly global network that is growing at a phenomenal rate.
It is extremely difficult to get accurate information on how many people actually have access to the Internet. A recent survey (based on a sample of 4200 adults) reported in the Columbus Dispatch (Tebben 1996) indicates that 40 percent of Columbus adults own a home computer. Of these computer owners, 60 percent own a modem and 19 percent subscribe to an on-line service. This would indicate that less than 10 percent of households have on-line access. However, the actual number of people who have Internet access in some form or other is probably much higher, since the survey did not measure those who have access to the Internet through their employer or through an educational institution.
Other surveys quoted in "The Networking of America" (Piper Resources, http://www.piperinfo.com/) have revealed comparable levels of Internet access among adult Americans.
While these figures themselves do not indicate a high level of Internet access, what is interesting is the trend: Internet access is growing at a remarkable rate. Again, according to "The Networking of America" (Piper Resources, http://www.piperinfo.com/):
"While the size of the on-line audience may not now justify classifying it as a mass medium (30 percent is normally the figure advertising agencies use to gauge "mass" penetration), the pace of growth is rather phenomenal. And the direction of growth is definitely moving towards broader consumer acceptance."
The importance of access to the information highway was recognized by the US government when it launched its National Information Infrastructure (NII) initiative in September 1993. According to the Agenda for Action, the aim of the NII is to complement and enhance the efforts of the private sector and assure the growth of an information infrastructure available to all Americans at reasonable cost; the Agenda also sets out a number of key principles and objectives toward this aim.
The definition of the NII is relatively broad and includes much more than just the Internet. According to the Agenda for Action, "the NII includes more than just the physical facilities used to transmit, store, process, and display voice, data, and images. It encompasses a wide range and ever-expanding range of equipment including cameras, scanners, keyboards, telephones, fax machines, computers, switches, compact disks, video and audio tape, cable, wire, satellites, optical fiber transmission lines, microwave nets, switches, televisions, monitors, printers, and much more. The NII will integrate and interconnect these physical components in a technologically neutral manner so that no one industry will be favored over any other. Most importantly, the NII requires building foundations for living in the Information Age and for making these technological advances useful to the public, business, libraries, and other non-governmental entities."
But there are strong economic drivers which point to an increasing importance of the Internet as part of the National Information Infrastructure. Kahin (1995) presents a comparison of the cost, in US dollars, of transmitting four pages of ASCII text through various means:
                        Local    Coast-to-Coast    Trans-Pacific
Voice                    0.10         2.00             16.00
Fax                      0.06         1.00              8.00
Commercial Email         0.50         0.50              0.50
Commercial Internet      0.02         0.02              0.02
The Internet's technological environment offers dramatic economies of scale. It allows extremely efficient aggregation of traffic and sharing of network resources. Specifically, it allows traffic from a large number of users to be combined, with the expectation that usage will average out over time through "statistical multiplexing" (Kahin 1995).
Based on the preceding discussion, it is possible to hypothesize that access to the Internet, in some form or another, will become practically universal in the near future. In fact, it is becoming increasingly difficult to avoid on-line access. According to a feature article in Time (Ratan 1995), access to the information superhighway may prove to be less a question of privilege or position than one of the basic ability to function in a democratic society. It may determine how well people are educated, the kind of job they eventually get, how they are retrained if they lose their job, how much access they have to their government and how they learn about the critical issues affecting them and their country.
While the trends discussed earlier in this section are particular to the United States, the globalization drivers discussed in an earlier chapter are making the same thing happen in countries all over the world. This is supported by the following figure showing host count growth in the top 16 countries excluding the United States.