Networks and Distributed Computing CS-233/333

Homework 3

    Due Thursday, May 1st, at the beginning of class.

Reading Assignment

    Read Chapter 4 of Tanenbaum.

Programming Exercise

    The point of this assignment is to build a simple web server.

HTTP Protocol Overview

An HTTP client issues a `GET' request to a server in order to retrieve a file. The general syntax of such a request is given below:

GET <sp> <Document Requested> <sp> HTTP/1.0 <crlf>
{<Other Header Information> <crlf>}*
<crlf>

where:

  • <sp> stands for a space character (ascii character 32).
  • <crlf> stands for a carriage return-linefeed pair, i.e. a carriage return (ascii character 13) followed by a linefeed (ascii character 10).
  • <Document Requested> gives us the name of the file requested by the client. This could be just a forward slash ( / ) if the client is requesting the default file on the server.
  • {<Other Header Information> <crlf>}* contains useful (but not critical) information sent by a client. These lines can be ignored for this lab. Note that this part can be composed of several lines, each terminated by a <crlf>.

Finally, observe that the client ends the request with two carriage return-linefeed character pairs: <crlf> <crlf>
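As a rough illustration of the request format above, the following C sketch pulls the document name out of the request line and checks whether the terminating blank line has arrived. The function names parse_request_line and request_is_complete, and the buffer sizes, are assumptions made for this example; they are not part of the provided sample code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the requested document name from a request
 * whose first line has the form
 *     GET <sp> <Document Requested> <sp> HTTP/1.0 <crlf>
 * Returns 0 on success and -1 if the line is not a well-formed GET. */
int parse_request_line(const char *request, char *path, size_t path_size)
{
    char method[8];
    char target[256];
    char version[16];

    assert(request != NULL && path != NULL && path_size > 0);

    /* The field widths bound each token to the size of its buffer. */
    if (sscanf(request, "%7s %255s %15s", method, target, version) != 3)
        return -1;
    if (strcmp(method, "GET") != 0)
        return -1;
    if (strncmp(version, "HTTP/", 5) != 0)
        return -1;
    if (strlen(target) >= path_size)
        return -1;

    strcpy(path, target);
    return 0;
}

/* Hypothetical helper: returns nonzero once the NUL-terminated buffer
 * contains the <crlf> <crlf> sequence that ends a request. */
int request_is_complete(const char *buffer)
{
    assert(buffer != NULL);
    return strstr(buffer, "\r\n\r\n") != NULL;
}
```

A server could keep reading from the connection until request_is_complete reports true, then hand the buffer to parse_request_line. A path of "/" would then be mapped to whatever default file the server chooses.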

The function of an HTTP server is to parse the above request from a client, identify the file being requested, and send the file across to the client. However, before sending the actual document, the HTTP server must send a response header to the client. The following shows a typical response from an HTTP server when the requested file is found on the server:

HTTP/1.1 <sp> 200 <sp> Document <sp> follows <crlf>
Server: <sp> <Server-Type> <crlf>
Content-type: <sp> <Document-Type> <crlf>
{<Other Header Information> <crlf>}*
<crlf>
<Document Data>

where:

  • <Server-Type> identifies the manufacturer/version of the server. For this lab, you can set this to cs 233 hw3.
  • <Document-Type> tells the client the MIME type of the document being sent. This should be text/html for an html document, image/gif for a gif file, text/plain for plain text, etc.
  • {<Other Header Information><crlf>}*, as before, contains some additional useful header information for the client to use. These lines may be ignored for this lab.
  • <Document Data> is the actual document requested. Observe that this is separated from the response headers by two carriage return-linefeed pairs.
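One possible way to frame this header in C is sketched below. The function name frame_ok_header and its buffer-based interface are assumptions chosen for illustration; the status line and the Server value follow the template above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write the "document found" response header into buf.
 * Returns the header length in bytes, or -1 if buf is too small. */
int frame_ok_header(char *buf, size_t size, const char *content_type)
{
    int n;

    assert(buf != NULL && content_type != NULL);

    /* The final "\r\n" is the blank line that separates the headers
     * from the document data. */
    n = snprintf(buf, size,
                 "HTTP/1.1 200 Document follows\r\n"
                 "Server: cs 233 hw3\r\n"
                 "Content-type: %s\r\n"
                 "\r\n",
                 content_type);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

The returned length is the number of bytes to write to the connection before the document data itself.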

If the requested file cannot be found on the server, the server must send a response header indicating the error. The following shows a typical response:

HTTP/1.1 <sp> 404 File Not Found <crlf>
Server: <sp> <Server-Type> <crlf>
Content-type: <sp> <Document-Type> <crlf>
<crlf>
<Error Message>

where:

  • <Document-Type> indicates the type of document (i.e. error message in this case) being sent. Since you are going to send a plain text message, this should be set to text/plain.
  • <Error Message> is a human-readable description of the error, in plain text or html format (e.g. Could not find the specified URL. The server returned an error).
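
The error case can be framed in the same style. The sketch below builds the whole 404 response, header plus message, in one buffer; the function name frame_not_found is an assumption for this example, and the message text is the example wording given above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write a complete "not found" response (header and
 * plain text error message) into buf.
 * Returns the response length in bytes, or -1 if buf is too small. */
int frame_not_found(char *buf, size_t size)
{
    int n;

    assert(buf != NULL);

    n = snprintf(buf, size,
                 "HTTP/1.1 404 File Not Found\r\n"
                 "Server: cs 233 hw3\r\n"
                 "Content-type: text/plain\r\n"
                 "\r\n"
                 "Could not find the specified URL. "
                 "The server returned an error.\r\n");
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```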

Sample Code

Sample code is available in a tar file archive here. (See instructions in the last lab for dealing with the tarball.) The sample code includes a daytime server, TCPdaytimed, and a timecat client. After expanding the sample code, build everything with make:

senator> make

Then run the server in the background, specifying a port to use. Port numbers below 1024 are privileged (you need to be root to open one), so pick a number above 1024 and avoid port numbers already in use.

senator> TCPdaytimed 2049 &

Now, to check that your copy of TCPdaytimed is running correctly, use timecat to connect to that port from another window with a command like "timecat <yourhostname> 2049". Among other things, you should see the time printed.

As you work with your code, you may hang up ports and create zombie processes. Once you create a zombie or hung process on a port number, you will not be able to open up another server on that port number, so you should use another port number above 1024. Eventually, you may create enough zombie processes that you may need to switch to another machine. Email techstaff the old machine name so they can reboot it. Use the man command to learn the details of each system call in the TCPdaytimed sample code.

There are also sample http server root directory files provided in a tar file, or you can again directly copy them from ~nugent/html/cs233/root. If you are copying them, be sure to use "cp -r" to recursively copy the subdirectories.

Included in the sample files are html files, plain text files and gif files. Your server should handle text/html and text/plain file types. You should recognize the file types by the suffix of the files. It is ok to hardcode those into your program. Test your server with any web browser, or use lynx if you are a purist.
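
Recognizing the file type by suffix could look something like the following sketch. The function name mime_type_for is hypothetical, and treating ".htm" as html and falling back to text/plain for unknown suffixes are assumptions of this example, not requirements of the assignment.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: map a file name to a MIME type using its suffix.
 * Unknown or missing suffixes fall back to text/plain (an assumption). */
const char *mime_type_for(const char *filename)
{
    const char *dot;

    assert(filename != NULL);

    dot = strrchr(filename, '.');   /* last '.' in the name, if any */
    if (dot == NULL)
        return "text/plain";
    if (strcmp(dot, ".html") == 0 || strcmp(dot, ".htm") == 0)
        return "text/html";
    if (strcmp(dot, ".gif") == 0)
        return "image/gif";
    return "text/plain";
}
```

The returned string can be used directly as the <Document-Type> value in the response header.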

Procedure and Algorithm Details

You will implement an iterative HTTP server that implements the following basic algorithm:

  • Open Passive Socket.
  • Do Forever
    • Accept new TCP connection
    • Read request from TCP connection and parse it.
    • Frame the appropriate response header depending on whether the URL requested is found on the server or not.
    • Write the response header to TCP connection.
    • Write requested document (if found) to TCP connection.
    • Close TCP connection
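
For the step that writes the document to the connection, keep in mind that write() may accept fewer bytes than it was given. The sketch below shows one way to handle that; the names write_all and copy_fd are assumptions for this example, and the 4096-byte buffer size is arbitrary.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes are sent.
 * Returns 0 on success, -1 on error. */
int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;           /* interrupted by a signal: retry */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical helper: copy everything readable from in_fd (e.g. an open
 * file) to out_fd (e.g. the client socket) until end of file.
 * Returns 0 on success, -1 on error. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[4096];
    ssize_t n;

    assert(in_fd >= 0 && out_fd >= 0);

    while ((n = read(in_fd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (write_all(out_fd, buf, (size_t)n) < 0)
            return -1;
    }
    return 0;
}
```

With helpers like these, sending a found document amounts to writing the response header with write_all and then calling copy_fd on the opened file.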

The server that you will implement in this step will not be concurrent, i.e., it will not serve more than one client at a time (remaining requests are queued while each request is processed). You can base your implementation on the TCPdaytimed server provided. The server should work as specified in the overview above.
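
The overall shape of an iterative server might look like the sketch below. It is not the TCPdaytimed code; the names open_passive_socket and serve_iteratively, the callback interface, and the use of SO_REUSEADDR are all assumptions made for illustration. Waiting clients are held in the listen() backlog queue while one connection is being handled.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: create a TCP socket listening on the given port.
 * Returns the socket descriptor, or -1 on error. */
int open_passive_socket(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int on = 1;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0)
        return -1;

    /* Assumption: allow quick rebinding of a recently used port. */
    setsockopt(fd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof on);

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper: accept and serve one client at a time. Each client
 * is passed to handle_client, then its connection is closed. Returns -1
 * only if accept() fails for a reason other than a signal. */
int serve_iteratively(int listen_fd, void (*handle_client)(int client_fd))
{
    assert(listen_fd >= 0 && handle_client != NULL);

    for (;;) {
        int client_fd = accept(listen_fd, NULL, NULL);
        if (client_fd < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        handle_client(client_fd);   /* read request, write response */
        close(client_fd);
    }
}
```

The handle_client callback is where the request would be read and parsed and the response header and document written.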

Hints

You should implement your server in a single file. Copy the TCPdaytimed.c file to be your own httpd.c file, then copy the Makefile entry for TCPdaytimed, changing the name on its first line to match. You might want to create a separate function to parse the request string. You can also have a separate function that frames the response header to be sent to a client. As before, you can group all the socket functionality into a single function.

If you want to test your server with IE, Netscape, lynx or another web browser, use a url of "detroit.cs.uchicago.edu:2049" if your server is on port 2049 and you are on host detroit.

The grade will be based on how well your server works, the organization of your code, and any extra features you include in your project. Future programming labs will add features to this code, so don't skip this lab.

Deliverables

    Please turn in your source and makefile to the grader for this week.