Friday 24 March 2017

A Caching Proxy HTTP Server - Java Tutorials ( Page 1 of 2 )

In the remainder of this section, we will develop a simple caching proxy HTTP server, called httpd, to demonstrate client and server sockets. httpd supports only GET operations and a very limited range of hard-coded MIME types. (MIME types are the type descriptors for multimedia content.) The proxy HTTP server is single threaded, in that each request is handled in turn while all others wait. Its caching strategy is fairly naive: it keeps everything in RAM forever. When it is acting as a proxy server, httpd also copies every file it gets to a local disk cache, which it never refreshes or garbage collects. All of these caveats aside, httpd is a useful example of client and server sockets, and it is fun to explore and easy to extend.


Source Code

The implementation of this HTTP server is presented here in five classes and one interface. A more complete implementation would likely split many of the methods out of the main class, httpd, in order to abstract more of the components. For space considerations in this book, most of the functionality is in the single class, and the small support classes are only acting as data structures. We will take a close look at each class and method to examine how this server works, starting with the support classes and ending with the main program.

MimeHeader.java
MIME is an Internet standard for communicating multimedia content over e-mail systems. This standard was created by Nat Borenstein in 1992. The HTTP protocol uses and extends the notion of MIME headers to pass general attribute/value pairs between the HTTP client and server.

CONSTRUCTORS   This class is a subclass of Hashtable so that it can conveniently store and retrieve the key/value pairs associated with a MIME header. It has two constructors. One creates a blank MimeHeader with no keys. The other takes a string formatted as a MIME header and parses it for the initial contents of the object. See parse( ) next.

parse( )    The parse( ) method is used to take a raw MIME-formatted string and enter its key/value pairs into a given instance of MimeHeader. It uses a StringTokenizer to split the input data into individual lines, marked by the CRLF (\r\n) sequence. It then iterates through each line using the canonical while ... hasMoreTokens( ) ... nextToken( ) sequence.

For each line of the MIME header, the parse( ) method splits the line into two strings separated by a colon (:). The two variables key and val are set by the substring( ) method: key receives the characters before the colon, and val receives the characters after the colon and its following space character. Once these two strings have been extracted, the put( ) method is used to store this association between the key and value in the Hashtable.

toString( )    The toString( ) method (used by the String concatenation operator, +) is simply the reverse of parse( ). It takes the current key/value pairs stored in the MimeHeader and returns a string representation of them in the MIME format, where keys are printed followed by a colon and a space, and then the value followed by a CRLF.

put( ), get( ), AND fix( )    The put( ) and get( ) methods in Hashtable would work fine for this application if not for one rather odd thing. The MIME specification defined several important keys, such as Content-Type and Content-Length. Some early implementors of MIME systems, notably web browsers, took liberties with the capitalization of these fields. Some use Content-type, others content-type. To avoid mishaps, our HTTP server tries to convert all incoming and outgoing MimeHeader keys to be in the canonical form, Content-Type. Thus, we override put( ) and get( ) to convert the keys' capitalization, using the method fix( ), before entering them into the Hashtable and before looking up a given key.

THE CODE    Here is the source code for MimeHeader:

  import java.util.*;

  class MimeHeader extends Hashtable {
    void parse(String data) {
      StringTokenizer st = new StringTokenizer(data, "\r\n");

      while (st.hasMoreTokens()) {
        String s = st.nextToken();
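        int colon = s.indexOf(':');
        if (colon < 0) continue; // skip malformed lines that have no colon
        String key = s.substring(0, colon);
        // Text after the colon, minus the space that normally follows it.
        String val = s.substring(colon + 1).trim();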
        put(key, val);
      }
    }

    MimeHeader() {}

    MimeHeader(String d) {
      parse(d);
    }

    public String toString() {
      String ret = "";
      Enumeration e = keys();

      while(e.hasMoreElements()) {
        String key = (String) e.nextElement();
        String val = (String) get(key);
        ret += key + ": " + val + "\r\n";
      } 
      return ret;
    }

    // This simple function converts a mime string from
    // any variant of capitalization to a canonical form.
    // For example: CONTENT-TYPE or content-type to Content-Type,
    // or Content-length or CoNTeNT-LENgth to Content-Length.
    private String fix(String ms) {
      char chars[] = ms.toLowerCase().toCharArray();
      boolean upcaseNext = true;

      // Visit every character, including the last one.
      for (int i = 0; i < chars.length; i++) {
        char ch = chars[i];
        if (upcaseNext && 'a' <= ch && ch <= 'z') {
          chars[i] = (char) (ch - ('a' - 'A'));
        }
        upcaseNext = ch == '-';
      }
      return new String(chars);
    }

    public String get(String key) {
      return (String) super.get(fix(key));
    }

    public void put(String key, String val) {
      super.put(fix(key), val);
    }
  }


HttpResponse.java

The HttpResponse class is a wrapper around everything associated with a reply from an HTTP server. This is used by the proxy part of our httpd class. When you send a request to an HTTP server, it responds with an integer status code, which we store in statusCode, and a textual equivalent, which we store in reasonPhrase. This single-line response is followed by a MIME header, which contains further information about the reply. We use the previously explained MimeHeader object to parse this string. The MimeHeader object is stored inside the HttpResponse class in the mh variable. These variables are not made private so that the httpd class can use them directly.

CONSTRUCTORS     If you construct an HttpResponse with a string argument, this is taken to be a raw response from an HTTP server and is passed to parse( ), described next, to initialize the object. Alternatively, you can pass in a precomputed status code, reason phrase, and MIME header.

parse( )     The parse( ) method takes the raw data that was read from the HTTP server, parses the statusCode and reasonPhrase from the first line, and then constructs a MimeHeader out of the remaining lines.

toString( )     The toString( ) method is the inverse of parse( ). It takes the current values of the HttpResponse object and returns a string that an HTTP client would expect to read back from a server.

THE CODE     Here is the source code for HttpResponse:

  import java.io.*;
  /*
   * HttpResponse
   * Parse a return message and MIME header from a server.
   * HTTP/1.0 302 Found = redirection, check Location for where.
   * HTTP/1.0 200 OK = file data comes after mime header.
   */

  class HttpResponse
  {
    int statusCode; // Status-Code in spec
    String reasonPhrase; // Reason-Phrase in spec
    MimeHeader mh;
    static String CRLF = "\r\n";

    void parse(String request) {

      int fsp = request.indexOf(' ');
      int nsp = request.indexOf(' ', fsp+1);
      int eol = request.indexOf('\n');
      String protocol = request.substring(0, fsp);
      statusCode = Integer.parseInt(request.substring(fsp+1, nsp));
      // trim() drops the '\r' that precedes '\n' in a CRLF-terminated line.
      reasonPhrase = request.substring(nsp+1, eol).trim();
      String raw_mime_header = request.substring(eol + 1);
      mh = new MimeHeader(raw_mime_header);
    }

    HttpResponse(String request) {
      parse(request);
    }

    HttpResponse(int code, String reason, MimeHeader m) {
      statusCode = code;
      reasonPhrase = reason;
      mh = m;
    }

    public String toString() {
      return "HTTP/1.0 " + statusCode + " " + reasonPhrase + CRLF +
        mh + CRLF;
    }
  }


UrlCacheEntry.java

To cache the contents of a document on a server, we need to make an association between the URL that was used to retrieve the document and the description of the document itself. A document is described by its MimeHeader and the raw data. For example, an image might be described by a MimeHeader with Content-Type: image/gif, and the raw image data is just an array of bytes. Similarly, a web page will likely have a Content-Type: text/html key/value pair in its MimeHeader, while the raw data is the contents of the HTML page. Again, the instance variables are not marked as private so that httpd can have free access to them.

CONSTRUCTOR     The constructor for a UrlCacheEntry object requires the URL to use as the key and a MimeHeader to associate with it. If the MimeHeader has a field in it called Content-Length (most do), the data area is preallocated to be large enough to hold such content.

append( )     The append( ) method is used to add data to a UrlCacheEntry object. The reason this isn’t simply a setData( ) method is that the data might be streaming in over a network and need to be stored a chunk at a time. The append( ) method deals with three cases. In the first case, the data buffer has not been allocated at all. In the second, the data buffer is too small to accommodate the incoming data, so it is reallocated. In the last case, the incoming data fits just fine and is inserted into the buffer. At any time, the length member variable holds the current valid size of the data buffer.

THE CODE     Here is the source code for UrlCacheEntry:

  class UrlCacheEntry
  {
    String url;
    MimeHeader mh;
    byte data[];
    int length = 0;

    public UrlCacheEntry(String u, MimeHeader m) {
      url = u;
      mh = m;
      String cl = mh.get("Content-Length");
      if (cl != null) {
        data = new byte[Integer.parseInt(cl)];
      }
    }

    void append(byte d[], int n) {
      if (data == null) {
        data = new byte[n];
        System.arraycopy(d, 0, data, 0, n);
        length = n;
      } else if (length + n > data.length) {
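        // Grow the buffer, keeping only the bytes that are valid so far,
        // and place the new data directly after them.
        byte old[] = data;
        data = new byte[length + n];
        System.arraycopy(old, 0, data, 0, length);
        System.arraycopy(d, 0, data, length, n);
        length += n;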
      } else {
        System.arraycopy(d, 0, data, length, n);
        length += n;
      }
    }
  }


LogMessage.java

LogMessage is a simple interface that declares one method, log( ), which takes a single String parameter. This is used to abstract the output of messages from the httpd. In the application case, this method is implemented to print to the standard output of the console in which the application was started. In the applet case, the data is appended to a windowed text buffer.

THE CODE     Here is the source code for LogMessage:

  interface LogMessage {
    public void log(String msg);
  }

httpd.java
This is a really big class that does a lot. We will walk through it method by method.

CONSTRUCTOR     There are five main instance variables: port, docRoot, log, cache, and stopFlag, and all of them are private. Three of these can be set by httpd’s lone constructor, shown here:

  httpd(int p, String dr, LogMessage lm)

It initializes the port to listen on, the directory to retrieve files from, and the interface to send messages to.

The fourth instance variable, cache, is the Hashtable where all of the files are cached in RAM, and is initialized when the object is created. stopFlag controls the execution of the program.

STATIC SECTION     There are several important static variables in this class. The version reported in the “Server” field of the MIME header is found in the variable version. A few constants are defined next: the MIME type for HTML files, mime_text_html; the MIME end-of-line sequence, CRLF; the name of the HTML file to return in place of raw directory requests, indexfile; and the size of the data buffer used in I/O, buffer_size.

Then mt defines a list of filename extensions and the corresponding MIME types for those files. The types Hashtable is statically initialized in the next block to contain the array mt as alternating keys and values. Then the fnameToMimeType( ) method can be used to return the proper MIME type for each filename passed in. If the filename does not have one of the extensions from the mt table, the method returns the defaultExt, or “text/plain.”
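The table-driven lookup described above can be sketched roughly as follows. This is a minimal illustration rather than the server's actual listing: the class name MimeTypeSketch and the particular entries in MT are assumptions, and a HashMap stands in for the Hashtable.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class MimeTypeSketch {
    // Alternating extension / MIME type pairs (assumed sample entries).
    private static final String[] MT = {
        "txt",  "text/plain",
        "html", "text/html",
        "htm",  "text/html",
        "gif",  "image/gif",
        "jpg",  "image/jpeg",
    };
    private static final String DEFAULT_TYPE = "text/plain";
    private static final Map<String, String> TYPES = new HashMap<>();

    // Load the array into the map as alternating keys and values.
    static {
        for (int i = 0; i < MT.length; i += 2) {
            TYPES.put(MT[i], MT[i + 1]);
        }
    }

    // Return the MIME type for a filename, or the default if the
    // extension is missing or unknown.
    static String fnameToMimeType(String filename) {
        int dot = filename.lastIndexOf('.');
        String ext = (dot >= 0)
            ? filename.substring(dot + 1).toLowerCase(Locale.ROOT)
            : "";
        return TYPES.getOrDefault(ext, DEFAULT_TYPE);
    }
}
```

Keeping the pairs in one flat array makes it easy to add a type: append two strings and the static block picks them up.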

STATISTICAL COUNTERS     Next, we declare five more instance variables. These are left without the private modifier so that an external monitor can inspect these values to display them graphically. (We will show this in action later.) These variables represent the usage statistics of our web server. The raw number of hits and bytes served is stored in hits_served and bytes_served. The number of files and bytes currently stored in the cache is stored in files_in_cache and bytes_in_cache. Finally, we store the number of hits that were successfully served out of the cache in hits_to_cache.

toBytes( )     Next, we have a convenience routine, toBytes( ), which converts its string argument to an array of bytes. This is necessary, because Java String objects are stored as Unicode characters, while the lingua franca of Internet protocols such as HTTP is good old 8-bit ASCII.

makeMimeHeader( )     The makeMimeHeader( ) method is another convenience routine that is used to create a MimeHeader object with a few key values filled in. The MimeHeader that is returned from this method has the current time and date in the Date field, the name and version of our server in the Server field, the type parameter in the Content-Type field, and the length parameter in the Content-Length field.
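As a rough sketch of what makeMimeHeader( ) assembles, the following uses a plain LinkedHashMap in place of MimeHeader so that it stands alone. The server name "SketchServer", the VERSION value, and the choice of java.time for the date are assumptions made for illustration; the current time is passed in as a parameter so the result is reproducible.

```java
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

class MakeHeaderSketch {
    static final String VERSION = "1.0"; // hypothetical version string

    // Build the four standard fields for a reply of the given type and length.
    static Map<String, String> makeMimeHeader(ZonedDateTime now,
                                              String type, int length) {
        Map<String, String> mh = new LinkedHashMap<>();
        // RFC 1123 is the date format HTTP uses in its Date field.
        mh.put("Date", DateTimeFormatter.RFC_1123_DATE_TIME.format(now));
        mh.put("Server", "SketchServer/" + VERSION); // hypothetical name
        mh.put("Content-Type", type);
        mh.put("Content-Length", String.valueOf(length));
        return mh;
    }
}
```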

error( )     The error( ) method is used to format an HTML page to send back to web clients who make requests that cannot be completed. The first parameter, code, is the error code to return. Typically, this will be between 400 and 499. Our server sends back 404 and 405 errors. It uses the HttpResponse class to encapsulate the return code with the appropriate MimeHeader. The method returns the string representation of that response concatenated with the HTML page to show the user. The page includes a human-readable version of the error code, msg, and the url request that caused the error.
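A response of the kind error( ) returns might be built along these lines. This is a sketch under stated assumptions: it writes the status line and header fields directly instead of going through HttpResponse and MimeHeader, the HTML layout is invented, and the escape( ) helper is an addition that keeps a hostile URL from injecting markup into the page.

```java
class ErrorPageSketch {
    static final String CRLF = "\r\n";

    // Minimal HTML escaping for text echoed back to the browser (an addition).
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Return the full reply: status line, header, blank line, HTML page.
    static String error(int code, String msg, String url) {
        String html = "<html><body>" + CRLF
            + "<h1>" + code + " " + escape(msg) + "</h1>" + CRLF
            + escape(url) + CRLF
            + "</body></html>" + CRLF;
        // html.length() equals the byte count only for ASCII content.
        return "HTTP/1.0 " + code + " " + msg + CRLF
            + "Content-Type: text/html" + CRLF
            + "Content-Length: " + html.length() + CRLF
            + CRLF
            + html;
    }
}
```

Calling error(404, "Not Found", "/missing.html") yields a reply whose first line is "HTTP/1.0 404 Not Found".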

getRawRequest( )     The getRawRequest( ) method is very simple. It reads data from a stream until it gets two consecutive newline characters. It ignores carriage returns and just looks for newlines. Once it has found the second newline, it turns the array of bytes into a String object and returns it. It will return null if the input stream does not produce two consecutive newlines before it ends. This is how messages from HTTP servers and clients are formatted. They begin with one line of status and then are immediately followed by a MIME header. The end of the MIME header is separated from the rest of the content by two newlines.
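The scanning rule is easy to get subtly wrong, so here is a small sketch of it. The class name and the use of ByteArrayOutputStream as the buffer are assumptions; the logic follows the description above: carriage returns are skipped, and the method returns as soon as it sees a newline directly after another newline.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

class RawRequestSketch {
    // Read up to (but not including) the blank line that ends the header.
    // Returns null if the stream ends before a blank line is seen.
    static String getRawRequest(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean lastWasNewline = false;
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\r') {
                continue;                 // ignore carriage returns
            }
            if (c == '\n') {
                if (lastWasNewline) {     // second newline in a row: done
                    return new String(buf.toByteArray(),
                                      StandardCharsets.US_ASCII);
                }
                lastWasNewline = true;
            } else {
                lastWasNewline = false;
            }
            buf.write(c);
        }
        return null;
    }
}
```

Because carriage returns are dropped, the returned string separates lines with a bare newline, and any bytes after the blank line remain unread in the stream for the caller.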

logEntry( )     The logEntry( ) method is used to report on each hit to the HTTP server in a standard format. The format this method produces may seem odd, but it matches the current standard for HTTP log files. This method has several helper variables and methods that are used to format the date stamp on each log entry. The months array is used to convert the month to a string representation. The host variable is set by the main HTTP loop when it accepts a connection from a given host. The fmt02d( ) method formats integers between 0 and 9 as two-digit, leading-zero numbers. The resulting string is then passed through the LogMessage interface variable log.
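To make the log format concrete, here is a sketch that produces one line in the widely used Common Log Format. It is an approximation of what logEntry( ) does, not its listing: the method returns the string instead of passing it to log, the timestamp is supplied as a java.time.ZonedDateTime parameter, and the request is assumed to be HTTP/1.0.

```java
import java.time.ZonedDateTime;

class LogEntrySketch {
    static final String[] MONTHS = {
        "Jan", "Feb", "Mar", "Apr", "May", "Jun",
        "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"
    };

    // Format 0..9 with a leading zero; other values are printed as-is.
    static String fmt02d(int i) {
        return (i >= 0 && i < 10) ? "0" + i : String.valueOf(i);
    }

    // host - - [dd/Mon/yyyy:HH:mm:ss +zzzz] "CMD url HTTP/1.0" code size
    static String logEntry(String host, ZonedDateTime t, String cmd,
                           String url, int code, int size) {
        int offMin = t.getOffset().getTotalSeconds() / 60;
        String zone = (offMin < 0 ? "-" : "+")
            + fmt02d(Math.abs(offMin) / 60) + fmt02d(Math.abs(offMin) % 60);
        String stamp = fmt02d(t.getDayOfMonth()) + "/"
            + MONTHS[t.getMonthValue() - 1] + "/" + t.getYear() + ":"
            + fmt02d(t.getHour()) + ":" + fmt02d(t.getMinute()) + ":"
            + fmt02d(t.getSecond()) + " " + zone;
        return host + " - - [" + stamp + "] \"" + cmd + " " + url
            + " HTTP/1.0\" " + code + " " + size;
    }
}
```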

writeString( )     Another convenience method, writeString( ), is used to hide the conversion of a String to an array of bytes so that it can be written out to a stream.
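The two conversion helpers, toBytes( ) and writeString( ), could be as small as the following sketch. Using StandardCharsets.US_ASCII is an assumption about how the conversion is done; with this charset, any character outside ASCII is replaced by a question mark.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

class WriteStringSketch {
    // Convert a String to 8-bit ASCII bytes.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // Hide the conversion so callers can write strings to a byte stream.
    static void writeString(OutputStream out, String s) throws IOException {
        out.write(toBytes(s));
    }
}
```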

writeUCE( )     The writeUCE( ) method takes an OutputStream and a UrlCacheEntry. It extracts the information out of the cache entry in order to send a message to a web client containing the appropriate response code, MIME header, and content.

serveFromCache( )     This Boolean method attempts to find a particular URL in the cache. If it is successful, then the contents of that cache entry are written to the client, the hits_to_cache variable is incremented, and the method returns true. Otherwise, it simply returns false.

loadFile( )     This method takes an InputStream, the url that corresponds to it, and the MimeHeader for that URL. A new UrlCacheEntry is created with the information stored in the MimeHeader. The input stream is read in chunks of buffer_size bytes and appended to the UrlCacheEntry. The resulting UrlCacheEntry is stored in the cache. The files_in_cache and bytes_in_cache variables are updated, and the UrlCacheEntry is returned to the caller.
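The chunked read and the bookkeeping can be sketched as follows. To keep the example self-contained it caches a plain byte array per URL instead of a UrlCacheEntry and omits the MimeHeader parameter; the BUFFER_SIZE value and the camel-case counter names are assumptions.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

class LoadFileSketch {
    static final int BUFFER_SIZE = 8192; // assumed chunk size

    final Map<String, byte[]> cache = new HashMap<>();
    int filesInCache = 0;
    int bytesInCache = 0;

    // Read the whole stream in BUFFER_SIZE chunks and cache it under url.
    byte[] loadFile(InputStream in, String url) throws IOException {
        ByteArrayOutputStream data = new ByteArrayOutputStream();
        byte[] chunk = new byte[BUFFER_SIZE];
        int n;
        while ((n = in.read(chunk)) != -1) {
            data.write(chunk, 0, n);     // only the n bytes actually read
        }
        byte[] bytes = data.toByteArray();
        byte[] old = cache.put(url, bytes);
        if (old == null) {
            filesInCache++;              // a new entry
        } else {
            bytesInCache -= old.length;  // replacing an existing entry
        }
        bytesInCache += bytes.length;
        return bytes;
    }
}
```

Note that read( ) may return fewer bytes than the buffer holds, especially on network streams, which is why each write uses the returned count.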

readFile( )     The readFile( ) method might seem redundant with the loadFile( ) method. It isn’t. This method is strictly for reading files out of a local file system, whereas loadFile( ) is used to talk to streams of any sort. If the File object, f, exists, then an InputStream is created for it. The size of the file is determined and the MIME type is derived from the filename. These two values are used to create the appropriate MimeHeader, then loadFile( ) is called to do the actual reading and caching.

writeDiskCache( )     The writeDiskCache( ) method takes a UrlCacheEntry object and writes it persistently into the local disk. It constructs a directory name out of the URL, making sure to replace the slash (/) characters with the system-dependent separatorChar. Then it calls mkdirs( ) to make sure that the local disk path exists for this URL. Lastly, it opens a FileOutputStream, writes all the data into it, and closes it.
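The URL-to-path mapping and the write can be sketched like this. The cacheRoot parameter, the helper cacheFileFor( ), and the "index.html" name used for URLs ending in a slash are assumptions made for illustration. The sketch does not sanitize the URL, so a real server would also need to reject ".." segments and characters that are illegal in local filenames.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;

class DiskCacheSketch {
    // Map "http://host/dir/page.html" to <cacheRoot>/host/dir/page.html.
    static File cacheFileFor(File cacheRoot, String url) {
        int p = url.indexOf("://");
        String path = (p >= 0) ? url.substring(p + 3) : url;
        if (path.endsWith("/")) {
            path += "index.html";        // assumed name for directory URLs
        }
        return new File(cacheRoot, path.replace('/', File.separatorChar));
    }

    // Write the first length bytes of data to the file for this URL.
    static void writeDiskCache(File cacheRoot, String url,
                               byte[] data, int length) throws IOException {
        File f = cacheFileFor(cacheRoot, url);
        File dir = f.getParentFile();
        if (!dir.isDirectory() && !dir.mkdirs()) {
            throw new IOException("cannot create directory " + dir);
        }
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(data, 0, length);
        }
    }
}
```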

handleProxy( )     The handleProxy( ) routine is one of the two major modes of this server. The basic idea is this: If you set your browser to use this server as a proxy server, then the requests that will be sent to it will include the complete URL, whereas normal GETs omit the “http://” and host name part. We simply pick apart the complete URL, looking for the “://” sequence, the next slash (/), and optionally another colon (:) for servers using nonstandard port numbers. Once we’ve found these characters, we know the intended host and port number as well as the URL we need to fetch from there. We can then attempt to load a previously saved version of this document out of our RAM cache. If this fails, we can attempt to load it from the file system into the RAM cache and reattempt loading it from the cache. If that fails, then it gets interesting, because we must read the document from the remote site.

To do this, we open a socket to the remote site and port. We send a GET request, asking for the URL that was passed to us. Whatever response header we get back from the remote site, we send on to the client. If that code was 200, for successful file transfer, we also read the ensuing data stream into a new UrlCacheEntry and write it onto the client socket. After that, we call writeDiskCache( ) to save the results of that transfer to the local disk. We log the transaction, close the sockets, and return.
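The URL-splitting step at the start of handleProxy( ) can be sketched as follows. The class ProxyTargetSketch and its field names are hypothetical, port 80 is assumed as the default, and bracketed IPv6 host literals are not handled.

```java
class ProxyTargetSketch {
    final String host;
    final int port;
    final String path;

    ProxyTargetSketch(String host, int port, String path) {
        this.host = host;
        this.port = port;
        this.path = path;
    }

    // Split "scheme://host[:port]/path" into its parts.
    static ProxyTargetSketch parse(String url) {
        int scheme = url.indexOf("://");
        if (scheme < 0) {
            throw new IllegalArgumentException("not a complete URL: " + url);
        }
        int start = scheme + 3;
        int slash = url.indexOf('/', start);
        String hostPort = (slash < 0) ? url.substring(start)
                                      : url.substring(start, slash);
        String path = (slash < 0) ? "/" : url.substring(slash);
        int colon = hostPort.indexOf(':');
        if (colon < 0) {
            return new ProxyTargetSketch(hostPort, 80, path);
        }
        return new ProxyTargetSketch(
            hostPort.substring(0, colon),
            Integer.parseInt(hostPort.substring(colon + 1)),
            path);
    }
}
```

For example, parse("http://example.com:8080/a/b.html") gives host example.com, port 8080, and path /a/b.html, which is what the proxy would send in its own GET to the remote site.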

handleGet( )     The handleGet( ) method is called when the http daemon is acting like a normal web server. It has a local disk document root out of which it is serving files. The parameters to handleGet( ) tell it where to write the results, the URL to look up, and the MimeHeader from the requesting web browser. This MIME header will include the User-Agent string and other useful attributes. First we attempt to serve the URL out of the RAM cache. If this fails, we look in the file system for the URL. If the file does not exist or is unreadable, we report an error back to the web client. Otherwise, we just use readFile( ) to get the contents of the file and put them in the cache. Then writeUCE( ) is used to send the contents of the file down the client socket.

doRequest( )     The doRequest( ) method is called once per connection to the server. It parses the request string and incoming MIME header. It decides to call either handleProxy( ) or handleGet( ), based on whether there is a “://” in the request string. If any methods are used other than GET, such as HEAD or POST, this routine returns a 405 error to the client. Note that the HTTP request is ignored if stopFlag is true.
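The dispatch decision can be reduced to a few lines. In this sketch the Action enum and the dispatch( ) method are hypothetical stand-ins: the real doRequest( ) calls the handlers and writes to the client socket, while this version only reports which branch would be taken. The BAD_REQUEST case for a malformed request line is an addition.

```java
import java.util.StringTokenizer;

class DispatchSketch {
    enum Action { PROXY, GET, METHOD_NOT_ALLOWED, BAD_REQUEST }

    // Decide how to handle a request line such as "GET /index.html HTTP/1.0".
    static Action dispatch(String requestLine) {
        StringTokenizer st = new StringTokenizer(requestLine);
        if (st.countTokens() < 2) {
            return Action.BAD_REQUEST;           // no method or no URL
        }
        String method = st.nextToken();
        String url = st.nextToken();
        if (!method.equals("GET")) {
            return Action.METHOD_NOT_ALLOWED;    // reply with a 405 error
        }
        // A complete URL means the client is using us as a proxy.
        return url.contains("://") ? Action.PROXY : Action.GET;
    }
}
```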

run( )     The run( ) method is called when the server thread is started. It creates a new ServerSocket on the given port, goes into an infinite loop calling accept( ) on the server socket, and then passes the resulting Socket off to doRequest( ) for inspection.
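A single-threaded accept loop of this kind might look like the sketch below. Several details are assumptions: the ServerSocket is passed in already bound, the loop checks stopFlag itself and uses a short accept timeout so that it can exit, and doRequest( ) here is a placeholder that sends a fixed reply instead of parsing the request.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

class AcceptLoopSketch implements Runnable {
    private final ServerSocket server;
    private volatile boolean stopFlag = false;

    AcceptLoopSketch(ServerSocket server) {
        this.server = server;
    }

    void stop() {
        stopFlag = true;
    }

    public void run() {
        try {
            server.setSoTimeout(200); // wake up regularly to re-check stopFlag
            while (!stopFlag) {
                // One connection at a time: others wait in the accept queue.
                try (Socket client = server.accept()) {
                    doRequest(client);
                } catch (SocketTimeoutException e) {
                    // No connection arrived within the timeout; loop again.
                }
            }
        } catch (IOException e) {
            System.err.println("server error: " + e);
        }
    }

    // Placeholder handler: always sends the same small reply.
    private void doRequest(Socket client) throws IOException {
        OutputStream out = client.getOutputStream();
        out.write("HTTP/1.0 200 OK\r\n\r\nhello\r\n"
            .getBytes(StandardCharsets.US_ASCII));
        out.flush();
    }
}
```

To try it, create the socket with new ServerSocket(port), pass it to the constructor, and start the object on a new Thread.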

start( ) AND stop( )     These are two methods used to start and stop the server process. These methods set the value of stopFlag.

main( )     You can use the main( ) method to run this application from a command line. It sets the LogMessage parameter to be the server itself, and then provides a simple console output implementation of log( ).
