Client/server model, Internet protocols

What is the Internet?

If two or more computers are connected by a network, the result is called an internet (with a lower-case i). The Internet with a capital I refers to millions of computers connected to a gigantic network and communicating via TCP/IP protocols. A protocol is a pre-defined way for a computer to communicate with another computer, for instance when requesting a service, such as an FTP service, or when forwarding some information to another machine. Each computer at any given time has a unique address on the Internet. This is its IP address.

A bit of Internet history

Until the 1960s, very little communication took place between computers, and what there was relied on the technology of telephone networks, i.e. circuit switching. In the early 1960s Paul Baran, and a few years later Donald Davies, independently proposed the idea of a robust, efficient, store-and-forward data network based on packets, i.e. units of data carried independently of one another. The technology was called packet switching, and it was implemented in the late 1960s as ARPANET (a network of research sites in the US, funded by the US Department of Defense). In the early 1970s ARPANET spanned the continental US, and by 1973 it had connections to Europe.

ARPANET had several protocols for communication between computers and with the network. Starting in 1974, a new, more robust suite of protocols was developed for the ARPANET, based on Transmission Control Protocol (TCP) and Internet Protocol (IP). The standard includes a large collection of protocols, some of which we mention later. These protocols have been modified several times since then, but the essential ideas of the original protocols are still preserved. In 1983 the US Department of Defense mandated that all of its computer systems use TCP/IP protocols, which boosted the use of these protocols in the US and throughout the world. Another boost to TCP/IP came from its inclusion, also in 1983, in the communication kernel of the University of California's UNIX implementation.

In 1986 the National Science Foundation (NSF) built a backbone network (i.e. a network of fast, powerful computers which can quickly forward information to each other and to other computers) called NSFNET. Eventually it grew to provide connectivity between various networks, forming the foundation of what is currently known as the Internet.

In 1993 the NSF started to reduce its role in governing the Internet. Currently the administration of the Internet is divided between several international organizations, such as ISOC (Internet Society) and IAB (Internet Architecture Board, formerly the Internet Activities Board), which supervise various technical, administrative, and other aspects of the Internet.

The modern Internet has over 32,000,000 registered domain names (according to domainstats.com/, provided by ISOC). The size of the Internet doubles every 10-12 months. According to cnn.com, more than 50% of American households are now connected to the Internet. The following pictures illustrate the growth in the number of Internet hosts and web sites:

Number of Internet hosts

Number of WWW web sites

Source: http://www.zakon.org/robert/internet/timeline/, see also other fascinating information there.

Client/server model

Computers on the Internet that provide services to other machines are called servers (such as Web servers or FTP servers). Machines that request services from another machine are called clients. When you want to read a web page and type its URL into a browser, the browser requests this page from the Web server on which the page resides. A server may provide more than one service, for instance a machine may be a database server and a web server at the same time. A client requests a specific service from a server.

A machine can be both a client and a server. For instance, a large UNIX machine may be a Web server and an FTP server, but for a user who uses it to read a web page in a browser it would also act as a client. While the roles of machines on the Internet are not fixed, some machines (usually large, fast, reliable, and always connected to the Internet) act primarily as servers, while others (smaller, with slower modems, connected to the Internet only as needed) act as clients.

Domain names and IP addresses

Internet hosts use a hierarchical naming structure consisting of a top-level domain name, a domain name, an optional subdomain name, and a host name. For instance, cs.wellesley.edu refers to a host (the machine called puma) at the domain wellesley within the educational top-level domain. Other top-level domains include .com for commercial organizations, .net for network providers, .gov for the US government, .mil for the US military, and so on. International top-level domains use two-letter country codes, such as fr for France, jp for Japan, and so on. Interestingly, the United Kingdom uses uk as its top-level domain even though its official two-letter country code is gb (Great Britain). The country code reflects the affiliation of the host, and not necessarily where it is located.

Domain names are informative and convenient for people to remember. However, computers use numerical IP addresses instead. Domain names are translated to IP addresses via the Domain Name System (DNS), a distributed database which keeps track of computers' names and their corresponding IP addresses. Many computers connected to the Internet host part of the DNS database and allow others to access it. These computers are known as DNS servers. No DNS server contains the entire database; each contains only a subset of it. If a DNS server does not contain the domain name requested by another computer, it redirects the requesting computer to another DNS server higher in the hierarchy.
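The referral idea can be sketched with a toy model. This is only an illustration of the lookup order, not how a real resolver works: the server names "campus" and "root", the example.org record, and the resolve helper are all hypothetical, and the mapping of cs.wellesley.edu to puma's address is assumed from the examples on this page.

```python
# Toy model of DNS referral. Each server knows a few names and, optionally,
# a parent server to ask when a name is unknown. All data is illustrative.

class ToyDnsServer:
    def __init__(self, name, records, parent=None):
        self.name = name
        self.records = records   # domain name -> IP address
        self.parent = parent     # server higher in the hierarchy, or None


def resolve(server, domain):
    """Return (ip, path), where path lists the servers that were asked."""
    path = []
    while server is not None:
        path.append(server.name)
        if domain in server.records:
            return server.records[domain], path
        server = server.parent   # referral to the next server up
    return None, path


# Hypothetical two-level hierarchy.
root = ToyDnsServer("root", {"example.org": "192.0.2.10"})
campus = ToyDnsServer("campus", {"cs.wellesley.edu": "149.130.13.118"}, parent=root)

print(resolve(campus, "cs.wellesley.edu"))  # answered by the first server asked
print(resolve(campus, "example.org"))       # needs a referral to the parent
print(resolve(campus, "nowhere.invalid"))   # nobody knows this name
```

The second lookup shows the case described above: the first server does not hold the name, so the question moves up to the next server.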

Every computer connected to the Internet has an IP address which no other computer can have at the same time. If a computer connects to the Internet via an Internet Service Provider (ISP), it may be assigned a temporary IP address for the duration of the session. IP addresses have the form nnn.nnn.nnn.nnn, where each nnn must be a number from 0 to 255. For instance, puma has the IP address 149.130.13.118.
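As a small illustration of the nnn.nnn.nnn.nnn format, the sketch below checks that an address has four parts in the range 0 to 255 and packs them into the single 32-bit number that the dotted form stands for. The helper name dotted_quad_to_int is made up for this example; the standard ipaddress module is used only to cross-check the result.

```python
import ipaddress


def dotted_quad_to_int(text):
    """Validate an nnn.nnn.nnn.nnn address and return it as a 32-bit integer."""
    parts = text.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated numbers: %r" % text)
    value = 0
    for part in parts:
        if not part.isdigit():
            raise ValueError("not a number: %r" % part)
        number = int(part)
        if not 0 <= number <= 255:
            raise ValueError("out of range 0-255: %r" % part)
        value = value * 256 + number   # each part is one byte of the address
    return value


address = "149.130.13.118"   # puma's address from the text
print(dotted_quad_to_int(address))
# The standard library agrees on the numeric value.
print(dotted_quad_to_int(address) == int(ipaddress.IPv4Address(address)))
```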

Internet Infrastructure and Routing

Suppose I want to send a message (some piece of data) to another computer. Given the computer's IP address, how does the data find its way on the Internet? The information used to get data to its destination is contained in routing tables kept by each router, i.e. a machine that routes packets. Each router knows about its sub-networks and which IP addresses they use. The router usually doesn't know what IP addresses are in other sub-networks of the Internet. Routers form a tree-like structure on the Internet with NSP (Network Service Provider) backbones at the roots, connected to one another.

When a packet arrives at a router, the router examines the IP address of its destination and checks its routing table. If the network containing the IP address is found, the packet is sent to that network. If it is not found, the router sends the packet on a default route, usually up the backbone hierarchy to the next router. Hopefully the next router will know where to send the packet. If it does not, the packet is again routed upwards until it reaches an NSP backbone. The routers connected to the NSP backbones hold the largest routing tables, and here the packet will be routed to the correct backbone, where it will begin its journey 'downward' through smaller and smaller networks until it finds its destination.

This process is known as packet routing, where a packet is a piece of data wrapped into an "envelope" with all the information necessary for routing the packet and for delivering it to the correct application on the destination machine. The TCP/IP protocols described below provide the wrapping and unwrapping.
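The routing-table lookup described above can be sketched as follows. This is a simplified illustration, not a real router: the network prefixes, the next-hop labels, and the next_hop function are assumptions made for the example. When several known networks contain the address, the sketch picks the most specific one, a detail the text does not go into.

```python
import ipaddress

# Hypothetical routing table: (destination network, where to send the packet).
ROUTES = [
    (ipaddress.ip_network("149.130.13.0/24"), "local subnet"),
    (ipaddress.ip_network("149.130.0.0/16"), "campus router"),
]
DEFAULT_ROUTE = "upstream router"   # used when no known network matches


def next_hop(destination):
    """Pick the most specific matching network, else fall back to the default."""
    address = ipaddress.ip_address(destination)
    matches = [(net, hop) for net, hop in ROUTES if address in net]
    if not matches:
        return DEFAULT_ROUTE
    net, hop = max(matches, key=lambda match: match[0].prefixlen)
    return hop


print(next_hop("149.130.13.118"))  # inside the most specific known network
print(next_hop("149.130.200.7"))   # only the larger network matches
print(next_hop("192.0.2.1"))       # unknown network, so the default route
```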

TCP/IP protocols

There are four protocol layers that data goes through before it gets sent off to another computer. If the message to be sent is long, it will be broken into smaller chunks called packets.

  1. Application Protocols Layer: protocols specific to applications such as WWW, e-mail, FTP, etc.
  2. Transmission Control Protocol Layer: TCP attaches the port number of the destination service to packets to be sent out, and directs arriving packets to a specific application on a computer using a port number. Every Internet service communicates at a specific port. Ports are integer numbers which allow a computer to distinguish between several Internet communications happening at the same time but facilitated by different Internet applications. For instance, since FTP and HTTP use different ports, a computer can be downloading a file and displaying web pages at the same time. Some of the most common ports are 21 (FTP), 23 (Telnet), 25 (SMTP), and 80 (HTTP).

    A port number may be specified explicitly, in addition to a domain name, to connect to a specific port. For instance, http://cs.wellesley.edu:80 means that you are connecting to puma at port 80. Since it happens to be the default port for HTTP, adding the port number sends the request to the port where it was going by default anyway. However, if someone is running a Web service at some other port, say 111, you need to specify that you are sending a request to a non-standard port if you want to get a web page from their server: http://someone.whatever.edu:111

    TCP also handles the disassembly of long messages into packets (smaller chunks of data) and their reassembly. Each packet will be "wrapped" separately by all the protocols that the packet goes through. It will also get a sequence number, so that the packets can be reassembled in the correct order at the destination (packets may arrive out of order). The last packet is marked to indicate that the message is complete.

  3. Internet Protocol Layer: IP attaches the IP addresses of the destination and of the sending machine (the destination address having been looked up from the domain name via DNS) and directs packets to a specific computer using its IP address.
  4. Hardware Layer: converts binary packet data to network signals and back (e.g. Ethernet network card, modem for phone lines, etc.). That's how the data actually gets transmitted.
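The remark about explicit and default ports in URLs can be illustrated with Python's standard urllib.parse module. This is a minimal sketch: the DEFAULT_PORTS table and the effective_port helper are assumptions made for the example, and only the two schemes listed are handled.

```python
from urllib.parse import urlsplit

# Well-known default ports for two of the services mentioned in the text.
DEFAULT_PORTS = {"http": 80, "ftp": 21}


def effective_port(url):
    """Return the port a request to this URL would go to."""
    parts = urlsplit(url)
    if parts.port is not None:        # port given explicitly, e.g. ":111"
        return parts.port
    return DEFAULT_PORTS[parts.scheme]  # KeyError for schemes not in the table


for url in (
    "http://cs.wellesley.edu",
    "http://cs.wellesley.edu:80",
    "http://someone.whatever.edu:111",
):
    print(url, "->", effective_port(url))
```

The first two URLs end up at the same port, which is the point made above: writing :80 for HTTP only states the default explicitly.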

More about TCP and IP

TCP is a connection-oriented, reliable, byte stream service. Connection-oriented means that two applications using TCP must first establish a connection before exchanging data. TCP is reliable because for each packet received, an acknowledgement is sent to the sender to confirm the delivery. TCP also includes a checksum in its header for error-checking the received data. It sends data in byte streams, not character streams, meaning that any binary data (pictures, Excel files, etc.) may be sent using TCP.
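To give a feel for the error check, here is a sketch of the 16-bit ones' complement checksum arithmetic used by the TCP/IP family (described in RFC 1071). It is only the arithmetic: a real TCP checksum is computed over the TCP header, some IP header fields, and the data, which this example does not model. The function name and the sample messages are made up for illustration.

```python
def internet_checksum(data: bytes) -> int:
    """Ones' complement of the ones' complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]     # next 16-bit word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


message = b"hello, internet"
corrupted = b"hellp, internet"   # one byte damaged in transit

sent_checksum = internet_checksum(message)
# The receiver recomputes the checksum and compares it with the one sent.
print(internet_checksum(message) == sent_checksum)
print(internet_checksum(corrupted) == sent_checksum)
```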

Unlike TCP, IP is an unreliable, connectionless protocol. IP doesn't care whether a packet gets to its destination or not. Nor does IP know about connections and port numbers. IP's job is to send and route packets to other computers. IP packets are independent entities and may arrive out of order or not at all. It is TCP's job to make sure packets arrive and are in the correct order.
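The idea of sequence numbers and a marked last packet can be sketched as below. This is a deliberately simplified model: real TCP numbers the bytes of a stream rather than whole packets, and the tuple layout, function names, and chunk size here are assumptions made for the example.

```python
import random


def split_into_packets(message: bytes, size: int):
    """Return a list of (sequence_number, is_last, chunk) tuples."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)] or [b""]
    last = len(chunks) - 1
    return [(seq, seq == last, chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets):
    """Put packets back in order and check that none are missing."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    numbers = [seq for seq, _, _ in ordered]
    if not ordered or not ordered[-1][1] or numbers != list(range(len(ordered))):
        raise ValueError("message incomplete")
    return b"".join(chunk for _, _, chunk in ordered)


message = b"IP may deliver these packets in any order."
packets = split_into_packets(message, 8)
random.shuffle(packets)                  # simulate out-of-order arrival
print(reassemble(packets) == message)
```

If a packet is missing, reassemble raises an error instead of returning a damaged message; in real TCP the missing data would be retransmitted.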

The picture given here is greatly simplified: both TCP and IP are actually not single protocols, but suites of small protocols, each performing small tasks, such as establishing a connection with another machine, checking packets for corrupted data, and so on.

Application level protocols

Application level protocols, such as HTTP or FTP, build on top of TCP and IP.

HTTP is a text-based protocol. Although it runs on top of TCP, HTTP itself does not keep a connection open from one request to the next. Clients (web browsers) send requests to web servers for web elements such as web pages and images. After the request is serviced by a server, the connection between client and server across the Internet is closed, and a new connection must be made for each request (this is the behavior of HTTP/1.0; HTTP/1.1 allows a connection to be reused for several requests). On the practical side, it means that after a requested HTML file gets downloaded, your computer is disconnected from the server, then your browser discovers that it needs a .gif image for the page, makes another request to the server, disconnects again, and so on.
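Because HTTP is text based, a request can be written out by hand. The sketch below composes a minimal HTTP/1.0 GET request and reads the first line of a reply. Nothing is sent over the network: the path /index.html and the sample response are made up for the example, and the two helper functions are illustrative only.

```python
def build_get_request(host, path="/"):
    """Compose the text of a minimal HTTP/1.0 GET request."""
    lines = [
        "GET %s HTTP/1.0" % path,   # request line: method, path, version
        "Host: %s" % host,          # header naming the server being asked
        "",                         # empty line marks the end of the headers
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def parse_status_line(response: bytes):
    """Split the first line of a response into (version, code, reason)."""
    status_line = response.split(b"\r\n", 1)[0].decode("ascii")
    version, code, reason = status_line.split(" ", 2)
    return version, int(code), reason


print(build_get_request("cs.wellesley.edu", "/index.html").decode("ascii"))

# A made-up server reply, as it might look on the wire.
sample_response = b"HTTP/1.0 200 OK\r\nContent-Type: text/html\r\n\r\n<html></html>"
print(parse_status_line(sample_response))
```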

Telnet is another application level protocol, which allows a user to log in from one Internet host to another.

FTP (file transfer protocol) facilitates file transfer between a local and a remote host. Applications based on FTP include FTP programs for Windows, Fetch for Macintosh, and others.

SMTP (simple mail transfer protocol) is a protocol for exchanging electronic mail.

SSL (secure sockets layer protocol) is a protocol for transferring encrypted data.


Some material on this page has been adapted from a subset of online sources listed here
This page has been created and is maintained by Elena Machkasova
Comments and suggestions are welcome at emachkas@wellesley.edu

Spring Semester 2002