Friday 24 March 2017

TCP/IP Client Sockets, URL & URLConnection - Java Tutorials

TCP/IP Client Sockets

TCP/IP sockets are used to implement reliable, bidirectional, persistent, point-to-point, stream-based connections between hosts on the Internet. A socket can be used to connect Java’s I/O system to other programs that may reside either on the local machine or on any other machine on the Internet.

Applets may only establish socket connections back to the host from which the applet was downloaded. This restriction exists because it would be dangerous for applets loaded through a firewall to have access to any arbitrary machine.

There are two kinds of TCP sockets in Java. One is for servers, and the other is for clients. The ServerSocket class is designed to be a “listener,” which waits for clients to connect before doing anything. The Socket class is designed to connect to server sockets and initiate protocol exchanges.

The creation of a Socket object implicitly establishes a connection between the client and server. There are no methods or constructors that explicitly expose the details of establishing that connection. Here are two constructors used to create client sockets:

Socket(String hostName, int port):  Creates a socket connecting the local host to the named host and port; can throw an UnknownHostException or an IOException.

Socket(InetAddress ipAddress, int port):  Creates a socket using a preexisting InetAddress object and a port; can throw an IOException.
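As a rough illustration of the two constructors, the sketch below connects to a throwaway listener on the loopback address instead of a real remote host. The class name SocketConstructorsDemo and the helper connectBothWays( ) are invented for this example; the listener is there only so that the connections have something to reach.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SocketConstructorsDemo {
    // Connects twice to a temporary local listener, once per constructor,
    // and reports whether both connections were established.
    static boolean connectBothWays() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the operating system for any free port.
        try (ServerSocket listener = new ServerSocket(0, 50, loopback)) {
            int port = listener.getLocalPort();
            // Each constructor returns only after the TCP connection is set up.
            // The pending connections wait in the listener's backlog, so this
            // sketch does not need to call accept().
            try (Socket byName = new Socket(loopback.getHostAddress(), port);
                 Socket byAddress = new Socket(loopback, port)) {
                return byName.isConnected() && byAddress.isConnected();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("Both connected: " + connectBothWays());
    }
}
```

Both forms end up in the same state; the String form simply resolves the host name first, which is why it can additionally fail with an UnknownHostException.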


A socket can be examined at any time for the address and port information associated with it, by use of the following methods:

InetAddress getInetAddress( ):  Returns the InetAddress associated with the Socket object.

int getPort( ):  Returns the remote port to which this Socket object is connected.

int getLocalPort( ):  Returns the local port to which this Socket object is bound.
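The three accessors above can be tried out with a small sketch like the following. It again uses a temporary loopback listener (an assumption of this example, not part of the original text), and the names SocketInfoDemo and inspect( ) are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

class SocketInfoDemo {
    // Returns { remote port, listener port, local port } for one loopback connection.
    static int[] inspect() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 50, loopback);
             Socket client = new Socket(loopback, listener.getLocalPort())) {
            System.out.println("Remote address: " + client.getInetAddress());
            System.out.println("Remote port:    " + client.getPort());
            System.out.println("Local port:     " + client.getLocalPort());
            return new int[] {
                client.getPort(), listener.getLocalPort(), client.getLocalPort()
            };
        }
    }

    public static void main(String[] args) throws IOException {
        inspect();
    }
}
```

The remote port matches the port the listener was bound to, while the local port is an ephemeral one picked by the operating system for the client side of the connection.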


Once the Socket object has been created, it can also be examined to gain access to the input and output streams associated with it. Each of these methods can throw an IOException if the sockets have been invalidated by a loss of connection on the Net. These streams are used exactly like the I/O streams described in Chapter 17 to send and receive data.

InputStream getInputStream( ):  Returns the InputStream associated with the invoking socket.

OutputStream getOutputStream( ):  Returns the OutputStream associated with the invoking socket.
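To show the two streams working together, here is a minimal sketch of a round trip over a loopback connection. The tiny uppercase-echo server, the class name StreamEchoDemo, and the helpers reader( ), writer( ), and roundTrip( ) are all assumptions made for the example; wrapping the raw streams in a BufferedReader and PrintWriter is one common choice, not the only one.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

class StreamEchoDemo {
    // Wraps the socket's InputStream in a line-oriented UTF-8 reader.
    static BufferedReader reader(Socket s) throws IOException {
        return new BufferedReader(
            new InputStreamReader(s.getInputStream(), StandardCharsets.UTF_8));
    }

    // Wraps the socket's OutputStream in a UTF-8 writer that flushes on println().
    static PrintWriter writer(Socket s) throws IOException {
        return new PrintWriter(
            new OutputStreamWriter(s.getOutputStream(), StandardCharsets.UTF_8), true);
    }

    // Sends one line to a local server thread and returns the line it sends back.
    static String roundTrip(String message) throws IOException, InterruptedException {
        try (ServerSocket listener =
                 new ServerSocket(0, 50, InetAddress.getLoopbackAddress())) {
            Thread server = new Thread(() -> {
                try (Socket peer = listener.accept()) {
                    String line = reader(peer).readLine();
                    if (line != null) {
                        writer(peer).println(line.toUpperCase(Locale.ROOT));
                    }
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            server.start();

            try (Socket client =
                     new Socket(listener.getInetAddress(), listener.getLocalPort())) {
                writer(client).println(message);          // write to the socket
                String reply = reader(client).readLine(); // read from the socket
                server.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));
    }
}
```

Closing the Socket also closes both of its streams, which is why the sketch only manages the sockets themselves in try-with-resources.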


Java 2, version 1.4 added the getChannel( ) method to Socket. This method returns the SocketChannel associated with the Socket object, or null if the socket was not created through a channel. Channels are used by the new I/O classes contained in java.nio.
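The behavior of getChannel( ) is easy to check without any network traffic. The sketch below, with the made-up names ChannelDemo and check( ), compares an unconnected socket created with the no-argument Socket constructor against one obtained from SocketChannel.open( ).

```java
import java.io.IOException;
import java.net.Socket;
import java.nio.channels.SocketChannel;

class ChannelDemo {
    // Returns { plain socket has no channel, channel-backed socket reports its channel }.
    static boolean[] check() throws IOException {
        try (Socket plain = new Socket();                  // unconnected, no channel
             SocketChannel channel = SocketChannel.open()) {
            Socket fromChannel = channel.socket();         // socket view of the channel
            return new boolean[] {
                plain.getChannel() == null,
                fromChannel.getChannel() == channel
            };
        }
    }

    public static void main(String[] args) throws IOException {
        boolean[] r = check();
        System.out.println("Plain socket has no channel: " + r[0]);
        System.out.println("Channel socket returns its channel: " + r[1]);
    }
}
```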


Whois

The very simple example that follows opens a connection to a “whois” port on the InterNIC server, sends the command-line argument down the socket, and then prints the data that is returned. InterNIC will try to look up the argument as a registered Internet domain name, then send back the IP address and contact information for that site.

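  // Demonstrate Sockets.
  import java.net.*;
  import java.io.*;
  import java.nio.charset.StandardCharsets;

  class Whois {
    public static void main(String[] args) throws Exception {
      // A whois query is one line of text; RFC 3912 terminates it with CR LF.
      String str = (args.length == 0 ? "osborne.com" : args[0]) + "\r\n";

      // try-with-resources closes the socket even if an exception is thrown.
      try (Socket s = new Socket("internic.net", 43)) {
        InputStream in = s.getInputStream();
        OutputStream out = s.getOutputStream();

        // Send the query.
        out.write(str.getBytes(StandardCharsets.US_ASCII));
        out.flush();

        // Print the reply until the server closes the connection.
        int c;
        while ((c = in.read()) != -1) {
          System.out.print((char) c);
        }
      }
    }
  }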

If, for example, you obtained information about osborne.com, you’d get something similar to the following:

  Whois Server Version 1.3

  Domain names in the .com, .net, and .org domains can now be 
  registered with many different competing registrars. Go to 
  http://www.internic.net for detailed information.

    Domain Name: OSBORNE.COM
    Registrar: NETWORK SOLUTIONS, INC.
    Whois Server: whois.networksolutions.com
    Referral URL: http://www.networksolutions.com
    Name Server: NS1.EPPG.COM
    Name Server: NS2.EPPG.COM
    Updated Date: 16-jan-2002

  >> Last update of whois database: Thu, 25 Apr 2002 05:05:52 EDT <<
  
  The Registry database contains ONLY .COM, .NET, .ORG, .EDU 
  domains and Registrars.





URL


That last example was rather obscure, because the modern Internet is not about the older protocols, like whois, finger, and FTP. It is about WWW, the World Wide Web. The Web is a loose collection of higher-level protocols and file formats, all unified in a web browser. One of the most important aspects of the Web is that Tim Berners-Lee devised a scalable way to locate all of the resources of the Net. Once you can reliably name anything and everything, it becomes a very powerful paradigm. The Uniform Resource Locator (URL) does exactly that.

The URL provides a reasonably intelligible form to uniquely identify or address information on the Internet. URLs are ubiquitous; every browser uses them to identify information on the Web. In fact, the Web is really just that same old Internet with all of its resources addressed as URLs plus HTML. Within Java’s network class library, the URL class provides a simple, concise API to access information across the Internet using URLs.


Format

Two examples of URLs are http://www.osborne.com/ and http://www.osborne.com:80/index.htm. A URL specification is based on four components. The first is the protocol to use, separated from the rest of the locator by a colon (:). Common protocols are http, ftp, gopher, and file, although these days almost everything is being done via HTTP (in fact, most browsers will proceed correctly if you leave off the “http://” from your URL specification). The second component is the host name or IP address of the host to use; this is delimited on the left by double slashes (//) and on the right by a slash (/) or optionally a colon (:). The third component, the port number, is an optional parameter, delimited on the left from the host name by a colon (:) and on the right by a slash (/). (It defaults to port 80, the predefined HTTP port; thus “:80” is redundant.) The fourth part is the actual file path. Most HTTP servers will append a file named index.html or index.htm to URLs that refer directly to a directory resource. Thus, http://www.osborne.com/ is the same as http://www.osborne.com/index.htm.
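The components described above map directly onto accessors of the URL class. The following sketch (UrlPartsDemo and describe( ) are names invented here) prints them for the two example URLs, including getDefaultPort( ), which reports the protocol's default when no port is written in the URL. Note that recent JDK releases deprecate the URL constructors in favor of URI.toURL( ); they are used here because they are the subject of this section.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlPartsDemo {
    // Formats protocol, host, explicit port, default port, and path of a URL.
    static String describe(String spec) throws MalformedURLException {
        URL u = new URL(spec);
        return u.getProtocol() + " | " + u.getHost() + " | " + u.getPort()
            + " | " + u.getDefaultPort() + " | " + u.getPath();
    }

    public static void main(String[] args) throws MalformedURLException {
        // prints: http | www.osborne.com | -1 | 80 | /
        System.out.println(describe("http://www.osborne.com/"));
        // prints: http | www.osborne.com | 80 | 80 | /index.htm
        System.out.println(describe("http://www.osborne.com:80/index.htm"));
    }
}
```

Parsing a URL this way involves no network access, so the class cannot tell you whether the server would actually treat the two URLs as the same resource.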

Java’s URL class has several constructors, and each can throw a MalformedURLException. One commonly used form specifies the URL with a string that is identical to what you see displayed in a browser:

      URL(String urlSpecifier)

The next two forms of the constructor allow you to break up the URL into its component parts:

      URL(String protocolName, String hostName, int port, String path)
      URL(String protocolName, String hostName, String path)

Another frequently used constructor allows you to use an existing URL as a reference context and then create a new URL from that context. Although this sounds a little contorted, it’s really quite easy and useful.

      URL(URL urlObj, String urlSpecifier)
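A short sketch may make the component and context forms more concrete. The class UrlConstructorsDemo, its helpers, and the paths files/demo.zip and /about.htm are hypothetical and chosen only for illustration.

```java
import java.net.MalformedURLException;
import java.net.URL;

class UrlConstructorsDemo {
    // Builds a URL from its parts, with an explicit port.
    static String fromParts(String protocol, String host, int port, String path)
            throws MalformedURLException {
        return new URL(protocol, host, port, path).toExternalForm();
    }

    // Resolves a (possibly relative) specifier against an existing URL.
    static String resolve(String base, String spec) throws MalformedURLException {
        URL context = new URL(base);
        return new URL(context, spec).toExternalForm();
    }

    public static void main(String[] args) throws MalformedURLException {
        // prints: http://www.osborne.com:80/index.htm
        System.out.println(fromParts("http", "www.osborne.com", 80, "/index.htm"));

        String base = "http://www.osborne.com/downloads/index.htm";
        // A relative path replaces the last segment of the context's path.
        // prints: http://www.osborne.com/downloads/files/demo.zip
        System.out.println(resolve(base, "files/demo.zip"));
        // A path starting with a slash replaces the whole path.
        // prints: http://www.osborne.com/about.htm
        System.out.println(resolve(base, "/about.htm"));
    }
}
```

The context form is what a browser does when it follows a relative link: the page's own URL is the context and the link's href is the specifier.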

In the following example, we create a URL to Osborne’s download page and then examine its properties:

  // Demonstrate URL.
  import java.net.*;
  class URLDemo {
    public static void main(String args[]) throws 
                                           MalformedURLException {
      URL hp = new URL("http://www.osborne.com/downloads");

      System.out.println("Protocol: " + hp.getProtocol());
      System.out.println("Port: " + hp.getPort());
      System.out.println("Host: " + hp.getHost());
      System.out.println("File: " + hp.getFile());
      System.out.println("Ext:" + hp.toExternalForm());
    }
  }

When you run this, you will get the following output:

  Protocol: http
  Port: -1
  Host: www.osborne.com
  File: /downloads
  Ext:http://www.osborne.com/downloads

Notice that the port is -1; this means that one was not explicitly set.

Now that we have created a URL object, we want to retrieve the data associated with it. To access the actual bits or content information of a URL, you create a URLConnection object from it, using its openConnection( ) method, like this:

  url.openConnection()

openConnection( ) has the following general form:

      URLConnection openConnection( )

It returns a URLConnection object associated with the invoking URL object. It may throw an IOException.
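One detail worth knowing is that openConnection( ) only creates the connection object; the network connection itself is made later, for example when connect( ) or getInputStream( ) is called. That leaves room to configure the connection first. The sketch below assumes a five-second timeout, and the names OpenConnectionDemo and prepare( ) are invented for the example.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLConnection;

class OpenConnectionDemo {
    // Creates and configures a connection object without contacting the server.
    static URLConnection prepare(String spec) throws IOException {
        URLConnection con = new URL(spec).openConnection();
        con.setConnectTimeout(5_000); // milliseconds allowed for connecting
        con.setReadTimeout(5_000);    // milliseconds allowed for each read
        return con;
    }

    public static void main(String[] args) throws IOException {
        URLConnection con = prepare("http://www.osborne.com/downloads");
        // For an http URL the returned object is an HttpURLConnection.
        System.out.println("HTTP connection: " + (con instanceof HttpURLConnection));
        System.out.println("Connect timeout: " + con.getConnectTimeout());
    }
}
```

Without explicit timeouts the defaults are zero, meaning a read on an unresponsive server can block indefinitely.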




URLConnection

URLConnection is a general-purpose class for accessing the attributes of a remote resource. Once you make a connection to a remote server, you can use URLConnection to inspect the properties of the remote object before actually transporting it locally. These attributes are exposed by the HTTP protocol specification and, as such, only make sense for URL objects that are using the HTTP protocol. We’ll examine the most useful elements of URLConnection here.

In the following example, we create a URLConnection using the openConnection( ) method of a URL object and then use it to examine the document’s properties and content:

  // Demonstrate URLConnection.
  import java.net.*;
  import java.io.*;
  import java.util.Date;

  class UCDemo
  {
    public static void main(String args[]) throws Exception {
      int c;
      URL hp = new URL("http://www.internic.net");
      URLConnection hpCon = hp.openConnection();

      // get date
      long d = hpCon.getDate();
      if(d==0)
        System.out.println("No date information.");
      else
        System.out.println("Date: " + new Date(d));

      // get content type
      System.out.println("Content-Type: " + hpCon.getContentType());

      // get expiration date
      d = hpCon.getExpiration();
      if(d==0)
        System.out.println("No expiration information.");
      else
        System.out.println("Expires: " + new Date(d));

      // get last-modified date
      d = hpCon.getLastModified();
      if(d==0)
        System.out.println("No last-modified information.");
      else
        System.out.println("Last-Modified: " + new Date(d));

      // get content length
      int len = hpCon.getContentLength();
      if(len == -1)
        System.out.println("Content length unavailable.");
      else
        System.out.println("Content-Length: " + len);

      if(len != 0) {
        System.out.println("=== Content ===");
        // Read and print the body until the end of the stream.
        InputStream input = hpCon.getInputStream();
        while ((c = input.read()) != -1) {
          System.out.print((char) c);
        }
        input.close();

      } else {
        System.out.println("No content available.");
      }
    }
  }

The program establishes an HTTP connection to www.internic.net over port 80. We then list out the header values and retrieve the content. Here are the first lines of the output (the precise output will vary over time).

      Date: Sat Apr 27 12:17:32 CDT 2002
      Content-Type: text/html
      No expiration information.
      Last-Modified: Tue Mar 19 17:52:42 CST 2002
      Content-Length: 5299
      === Content ===

      <html>
      
      <head>
      <title>InterNIC | The Internet's Network Information Center</title>
      <meta name="keywords" content="internic,network information, domain registration">
      <style type="text/css">
      <!--
      p, li, td, ul { font-family: Arial, Helvetica, sans-serif}
      -->
      </style>
      </head>
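Because the output above depends on a remote site, it can be handy to try the same header accessors against a server you control. The sketch below is one way to do that, assuming the small HTTP server that ships with the JDK in the com.sun.net.httpserver package; the class name LocalHeaderDemo, the helper fetch( ), and the one-line page it serves are made up for the example.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Proxy;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

class LocalHeaderDemo {
    // Serves one tiny page on the loopback address, fetches it through
    // URLConnection, and returns "content-type|content-length|body".
    static String fetch() throws IOException {
        byte[] page = "<html>ok</html>".getBytes(StandardCharsets.UTF_8);
        InetAddress loopback = InetAddress.getLoopbackAddress();

        // Port 0 lets the OS pick a free port; backlog 0 means the system default.
        HttpServer server = HttpServer.create(new InetSocketAddress(loopback, 0), 0);
        server.createContext("/", exchange -> {
            exchange.getResponseHeaders().set("Content-Type", "text/html");
            exchange.sendResponseHeaders(200, page.length);
            try (OutputStream body = exchange.getResponseBody()) {
                body.write(page);
            }
        });
        server.start();
        try {
            int port = server.getAddress().getPort();
            URL url = new URL("http", loopback.getHostAddress(), port, "/");
            // NO_PROXY keeps the request local even if a proxy is configured.
            URLConnection con = url.openConnection(Proxy.NO_PROXY);

            String type = con.getContentType();
            int length = con.getContentLength();

            // Copy the body into memory, closing the stream when done.
            ByteArrayOutputStream content = new ByteArrayOutputStream();
            try (InputStream in = con.getInputStream()) {
                byte[] buf = new byte[1024];
                int n;
                while ((n = in.read(buf)) != -1) {
                    content.write(buf, 0, n);
                }
            }
            String body = new String(content.toByteArray(), StandardCharsets.UTF_8);
            return type + "|" + length + "|" + body;
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(fetch());
    }
}
```

Asking for a header such as the content type is what triggers the actual request, so the header calls here come before the body is read, just as in the UCDemo program.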

The URL and URLConnection classes are good enough for simple programs that want to connect to HTTP servers to fetch content. For more complex applications, you’ll probably find that you are better off studying the specification of the HTTP protocol and implementing your own wrappers.
