Friday, 24 March 2017

Networking Basics, Java and the Net & InetAddress - Java Tutorials

Networking Basics

Ken Thompson and Dennis Ritchie developed UNIX in concert with the C language at Bell Telephone Laboratories, Murray Hill, New Jersey, in 1969. For many years, the development of UNIX remained in Bell Labs and in a few universities and research facilities that had the DEC PDP machines it was designed to be run on. In 1978, Bill Joy was leading a project at Cal Berkeley to add many new features to UNIX, such as virtual memory and full-screen display capabilities. By early 1984, just as Bill was leaving to found Sun Microsystems, he shipped 4.2BSD, commonly known as Berkeley UNIX.

4.2BSD came with a fast file system, reliable signals, interprocess communication, and, most important, networking. The networking support first found in 4.2 eventually became the de facto standard for the Internet. Berkeley’s implementation of TCP/IP remains the primary standard for communications within the Internet. The socket paradigm for interprocess and network communication has also been widely adopted outside of Berkeley. Even Windows and the Macintosh started talking “Berkeley sockets” in the late ’80s.


Socket Overview

A network socket is a lot like an electrical socket. Various plugs around the network have a standard way of delivering their payload. Anything that understands the standard protocol can “plug in” to the socket and communicate. With electrical sockets, it doesn’t matter if you plug in a lamp or a toaster; as long as they are expecting 60Hz, 115-volt electricity, the devices will work. Think how your electric bill is created. There is a meter somewhere between your house and the rest of the network. For each kilowatt-hour of energy that goes through that meter, you are billed. The bill comes to your “address.” So even though the electricity flows freely around the power grid, all of the sockets in your house have a particular address.

The same idea applies to network sockets, except we talk about TCP/IP packets and IP addresses rather than electrons and street addresses. Internet Protocol (IP) is a low-level routing protocol that breaks data into small packets and sends them to an address across a network; it does not guarantee that those packets are delivered to the destination. Transmission Control Protocol (TCP) is a higher-level protocol that strings these packets together robustly, sorting and retransmitting them as necessary to transmit your data reliably. A third protocol, User Datagram Protocol (UDP), sits next to TCP and can be used directly to support fast, connectionless, unreliable transport of packets.
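To make the connectionless model concrete, here is a minimal sketch that sends a single UDP datagram to a receiver on the same machine. The class name UdpLoopbackSketch, the two-second timeout, and the use of the loopback interface are assumptions made for this illustration, not part of the original text. Because UDP gives no delivery guarantee, the receiver sets a timeout instead of waiting forever.

```java
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one datagram sent and received on the loopback interface.
class UdpLoopbackSketch {
    // Sends one datagram to a local receiver and returns the text that arrived.
    static String sendAndReceive(String message) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket receiver = new DatagramSocket(0, loopback); // port 0: let the OS pick
             DatagramSocket sender = new DatagramSocket()) {
            receiver.setSoTimeout(2000); // no delivery guarantee, so never block forever

            // No connection is set up: the packet itself carries the destination.
            byte[] payload = message.getBytes(StandardCharsets.UTF_8);
            sender.send(new DatagramPacket(payload, payload.length,
                                           loopback, receiver.getLocalPort()));

            byte[] buffer = new byte[1024];
            DatagramPacket incoming = new DatagramPacket(buffer, buffer.length);
            receiver.receive(incoming);
            return new String(incoming.getData(), 0, incoming.getLength(),
                              StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(sendAndReceive("hello over UDP"));
    }
}
```

With TCP, the same exchange would first establish a connection and would then read and write through streams rather than individual packets.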


Client/Server

You often hear the term client/server mentioned in the context of networking. It seems complicated when you read about it in corporate marketing statements, but it is actually quite simple. A server is anything that has some resource that can be shared. There are compute servers, which provide computing power; print servers, which manage a collection of printers; disk servers, which provide networked disk space; and web servers, which store web pages. A client is simply any other entity that wants to gain access to a particular server. The interaction between client and server is just like the interaction between a lamp and an electrical socket. The power grid of the house is the server, and the lamp is a power client. The server is a permanently available resource, while the client is free to “unplug” after it has been served.

In Berkeley sockets, the notion of a socket allows a single computer to serve many different clients at once, as well as serving many different types of information. This feat is managed by the introduction of a port, which is a numbered socket on a particular machine. A server process is said to “listen” to a port until a client connects to it. A server is allowed to accept multiple clients connected to the same port number, although each session is unique. To manage multiple client connections, a server process must be multithreaded or have some other means of multiplexing the simultaneous I/O.
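The following is a minimal sketch of a server that listens on one port and serves two clients that are connected at the same time, using one thread per session. The class name EchoPortSketch, the upper-casing "service", and the choice of an operating-system-assigned port on the loopback interface are assumptions made for this illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.Locale;

// Hypothetical demo class: many clients, one port, one thread per session.
class EchoPortSketch {
    // One session: read a line from this client and send it back in upper case.
    static void serve(Socket client) {
        try (Socket s = client) {
            BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
            PrintWriter out = new PrintWriter(s.getOutputStream(), true);
            String line = in.readLine();
            if (line != null) out.println(line.toUpperCase(Locale.ROOT));
        } catch (IOException e) {
            System.err.println("session failed: " + e);
        }
    }

    // Sends one line over an already connected socket and returns the reply.
    static String exchange(Socket s, String line) throws IOException {
        PrintWriter out = new PrintWriter(s.getOutputStream(), true);
        BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
        out.println(line);
        return in.readLine();
    }

    // Listens on an OS-chosen port and serves two simultaneously connected clients.
    static String[] demo() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket listener = new ServerSocket(0, 50, loopback)) {
            Thread acceptor = new Thread(() -> {
                try {
                    while (true) {
                        Socket client = listener.accept();        // a new Socket per session
                        new Thread(() -> serve(client)).start();  // one thread per client
                    }
                } catch (IOException closed) {
                    // the listener was closed, so stop accepting
                }
            });
            acceptor.setDaemon(true);
            acceptor.start();

            int port = listener.getLocalPort();
            try (Socket a = new Socket(loopback, port);
                 Socket b = new Socket(loopback, port)) {
                String second = exchange(b, "second"); // answered while a is still connected
                String first = exchange(a, "first");
                return new String[] { first, second };
            }
        }
    }

    public static void main(String[] args) throws IOException {
        for (String reply : demo()) System.out.println(reply);
    }
}
```

Both clients connect to the same port number, yet each accept( ) call hands the server a separate Socket, which is what keeps the sessions distinct. A thread per client is only one way to multiplex; a selector-based design is another.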


Reserved Sockets

Once connected, a higher-level protocol ensues, which is dependent on which port you are using. TCP/IP reserves the lower 1,024 ports for specific protocols. Many of these will seem familiar to you if you have spent any time surfing the Internet. Port number 21 is for FTP, 23 is for Telnet, 25 is for e-mail, 79 is for finger, 80 is for HTTP, 119 is for netnews—and the list goes on. It is up to each protocol to determine how a client should interact with the port.

For example, HTTP is the protocol that web browsers and servers use to transfer hypertext pages and images. It is quite a simple protocol for a basic page-browsing web server. Here’s how it works. When a client requests a file from an HTTP server, an action known as a hit, it simply prints the name of the file in a special format to a predefined port and reads back the contents of the file. The server also responds with a status code number to tell the client whether the request can be fulfilled and why.

Here’s an example of a client requesting a single file, /index.html, and the server replying that it has successfully found the file and is sending it to the client:

Server  ---  Client

Listens to port 80.  ---  Connects to port 80.

Accepts the connection.  ---  Writes “GET /index.html HTTP/1.0\n\n”.

Reads up until the second end-of-line (\n). Sees that GET is a known command and that HTTP/1.0 is a valid protocol version. Reads a local file called /index.html. Writes “HTTP/1.0 200 OK\n\n”.  ---  “200” means “here comes the file.”

Copies the contents of the file into the socket.  ---  Reads the contents of the file and displays it.

Hangs up.  ---  Hangs up.


Obviously, the HTTP protocol is much more complicated than this example shows, but this is an actual transaction that you could have with any web server near you.
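The exchange in the table can be sketched in Java with both sides running on one machine. Several things here are assumptions made for the illustration: the class name TinyHttpExchange, the in-memory string PAGE that stands in for the local file /index.html, the 404 reply for any other path, and the use of an operating-system-assigned port instead of port 80. The sketch also writes line endings as \r\n, which is what the HTTP/1.0 specification prescribes; the table abbreviates them as \n.

```java
import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: one HTTP/1.0 request and response over loopback.
class TinyHttpExchange {
    // Stand-in for the contents of the local file /index.html.
    static final String PAGE = "<html><body>Hello</body></html>";

    // Server side: accept one connection, read the request, reply, hang up.
    static void serveOnce(ServerSocket listener) {
        try (Socket s = listener.accept()) {
            BufferedReader in = new BufferedReader(
                new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            String requestLine = in.readLine();   // e.g. "GET /index.html HTTP/1.0"
            String line;
            while ((line = in.readLine()) != null && !line.isEmpty()) {
                // read up to the blank line that ends the request
            }
            String[] parts = requestLine == null ? new String[0] : requestLine.split(" ");
            boolean found = parts.length == 3 && parts[0].equals("GET")
                && parts[1].equals("/index.html") && parts[2].startsWith("HTTP/1.");
            String reply = found ? "HTTP/1.0 200 OK\r\n\r\n" + PAGE
                                 : "HTTP/1.0 404 Not Found\r\n\r\n";
            OutputStream out = s.getOutputStream();
            out.write(reply.getBytes(StandardCharsets.US_ASCII));
            out.flush();
        } catch (IOException e) {
            System.err.println("server side failed: " + e);
        }
    }

    // Client side: connect, write the request, read until the server hangs up.
    static String fetch(int port, String path) throws IOException {
        try (Socket s = new Socket(InetAddress.getLoopbackAddress(), port)) {
            OutputStream out = s.getOutputStream();
            out.write(("GET " + path + " HTTP/1.0\r\n\r\n").getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = s.getInputStream();
            ByteArrayOutputStream received = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            int n;
            while ((n = in.read(buffer)) != -1) received.write(buffer, 0, n);
            return new String(received.toByteArray(), StandardCharsets.US_ASCII);
        }
    }

    // Runs one exchange on the loopback interface and returns the raw response.
    static String demo(String path) throws IOException {
        try (ServerSocket listener = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            new Thread(() -> serveOnce(listener)).start();
            return fetch(listener.getLocalPort(), path);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(demo("/index.html"));
    }
}
```

The client knows the response is complete only because the server hangs up, which is how this simplest form of HTTP/1.0 marks the end of the file.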


Proxy Servers

A proxy server speaks the client side of a protocol to another server. This is often required when clients have certain restrictions on which servers they can connect to. Thus, a client would connect to a proxy server, which did not have such restrictions, and the proxy server would in turn communicate for the client. A proxy server has the additional ability to filter certain requests or cache the results of those requests for future use. A caching proxy HTTP server can help reduce the bandwidth demands on a local network’s connection to the Internet. When a popular web site is being hit by hundreds of users, a proxy server can get the contents of the web server’s popular pages once, saving expensive internetwork transfers while providing faster access to those pages to the clients.

Later in this chapter, we will actually build a complete caching proxy HTTP server. The interesting part about this sample program is that it is both a client and a server. To serve certain pages, it must act as a client to other servers to obtain a copy of the requested content.
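Ahead of that full program, the caching idea on its own can be sketched in a few lines. This is not the chapter's proxy server: the class name CachingFetcherSketch is hypothetical, and the origin function is a stand-in for a real HTTP request to the remote web server, so the sketch runs without a network.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical demo class: answer repeated requests from a local copy.
class CachingFetcherSketch {
    private final Map<String, String> cache = new HashMap<>();
    private final Function<String, String> origin; // stand-in for a real HTTP fetch
    private int originRequests = 0;

    CachingFetcherSketch(Function<String, String> origin) {
        this.origin = origin;
    }

    // Returns the cached copy if there is one; otherwise asks the origin once and keeps the answer.
    String get(String url) {
        String cached = cache.get(url);
        if (cached != null) return cached;
        originRequests++;                 // the expensive internetwork transfer
        String body = origin.apply(url);
        cache.put(url, body);
        return body;
    }

    // How many requests actually went out to the origin server.
    int originRequests() {
        return originRequests;
    }

    public static void main(String[] args) {
        CachingFetcherSketch proxy = new CachingFetcherSketch(url -> "contents of " + url);
        for (int user = 0; user < 100; user++) proxy.get("http://example.test/popular.html");
        System.out.println("origin requests: " + proxy.originRequests());
    }
}
```

A real caching proxy also has to decide when a cached page is stale; this sketch keeps every entry forever.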


Internet Addressing

Every computer on the Internet has an address. An Internet address is a number that uniquely identifies each computer on the Net. Originally, all Internet addresses consisted of 32-bit values. This address type was specified by IPv4 (Internet Protocol, version 4). However, a new addressing scheme, called IPv6 (Internet Protocol, version 6) has come into play. IPv6 uses a 128-bit value to represent an address. Although there are several reasons for and advantages to IPv6, the main one is that it supports a much larger address space than does IPv4. Fortunately, IPv6 is downwardly compatible with IPv4. Currently, IPv4 is by far the most widely used scheme, but this situation is likely to change over time.

Because of the emerging importance of IPv6, Java 2, version 1.4 has begun to add support for it. However, at the time of this writing, IPv6 is not supported by all environments. Furthermore, for the next few years, IPv4 will continue to be the dominant form of addressing. For these reasons, the form of Internet addresses discussed here, and used in this chapter, are the IPv4 form. As mentioned, IPv4 is, loosely, a subset of IPv6, and the material contained in this chapter is largely applicable to both forms of addressing.

There are 32 bits in an IPv4 address, and we often refer to them as a sequence of four numbers between 0 and 255 separated by dots (.). This makes them easier to remember, because they are not randomly assigned; they are hierarchically assigned. The first few bits define which class of network, lettered A, B, C, D, or E, the address represents. Most Internet users are on a class C network, since there are over two million networks in class C. The first byte of a class C address is between 192 and 223, and the last byte identifies an individual computer among the 256 addresses (254 of them usable by hosts) on a single class C network. This scheme allows for roughly half a billion devices to live on class C networks.
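As a small worked example, the sketch below packs a dotted address into its 32-bit value and reads the network class off the first byte. The class and method names (Ipv4ClassSketch, toBits, classOf) are hypothetical, and the ranges in the comments follow the classful scheme described above: class C covers first bytes 192 through 223, which gives 2^21 networks of 256 addresses each, or about 537 million addresses.

```java
// Hypothetical demo class: dotted-quad parsing and classful network lookup.
class Ipv4ClassSketch {
    // Packs "a.b.c.d" into a 32-bit value, held in a long to avoid sign problems.
    static long toBits(String dotted) {
        String[] parts = dotted.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected four numbers: " + dotted);
        }
        long bits = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("out of range: " + part);
            }
            bits = (bits << 8) | octet;   // each number contributes one byte
        }
        return bits;
    }

    // Determines the class from the leading bits, that is, from the first byte.
    static char classOf(String dotted) {
        int first = (int) (toBits(dotted) >>> 24);
        if (first < 128) return 'A';   // leading bit  0
        if (first < 192) return 'B';   // leading bits 10
        if (first < 224) return 'C';   // leading bits 110
        if (first < 240) return 'D';   // leading bits 1110
        return 'E';                    // leading bits 1111
    }

    public static void main(String[] args) {
        String address = "192.9.9.1";
        // prints "192.9.9.1 = 3221817601, class C"
        System.out.println(address + " = " + toBits(address) + ", class " + classOf(address));
    }
}
```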


Domain Name System (DNS)

The Internet wouldn’t be a very friendly place to navigate if everyone had to refer to their addresses as numbers. For example, it is difficult to imagine seeing “http://192.9.9.1/” at the bottom of an advertisement. Thankfully, a clearinghouse exists for a parallel hierarchy of names to go with all these numbers. It is called the Domain Name System (DNS). Just as the four numbers of an IP address describe a network hierarchy from left to right, the name of an Internet address, called its domain name, describes a machine’s location in a name space, from right to left. For example, www.osborne.com is in the COM domain (intended for commercial sites), it is called osborne (after the company name), and www is the name of the specific computer that is Osborne’s web server. www loosely corresponds to the rightmost number in the equivalent IP address.
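The right-to-left reading of a domain name can be illustrated with a few lines of Java. This sketch only splits the text of the name; it performs no lookup, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class: list the labels of a domain name from the top of the hierarchy down.
class DomainLabelsSketch {
    // "www.osborne.com" becomes [com, osborne, www]: most general label first.
    static List<String> fromTopLevelDown(String domainName) {
        List<String> labels = new ArrayList<>(Arrays.asList(domainName.split("\\.")));
        Collections.reverse(labels);
        return labels;
    }

    public static void main(String[] args) {
        // prints "[com, osborne, www]"
        System.out.println(fromTopLevelDown("www.osborne.com"));
    }
}
```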




Java and the Net

Now that the stage has been set, let’s take a look at how Java relates to all of these network concepts. Java supports TCP/IP both by extending the already established stream I/O interface introduced in Chapter 17 and by adding the features required to build I/O objects across the network. Java supports both the TCP and UDP protocol families. TCP is used for reliable stream-based I/O across the network. UDP supports a simpler, hence faster, point-to-point datagram-oriented model.


The Networking Classes and Interfaces

The classes contained in the java.net package are listed here:

  Authenticator (Java 2)        InetSocketAddress (Java 2, v1.4)   SocketImpl
  ContentHandler                JarURLConnection (Java 2)          SocketPermission
  DatagramPacket                MulticastSocket                    URI (Java 2, v1.4)
  DatagramSocket                NetPermission                      URL
  DatagramSocketImpl            NetworkInterface (Java 2, v1.4)    URLClassLoader (Java 2)
  HttpURLConnection             PasswordAuthentication (Java 2)    URLConnection
  InetAddress                   ServerSocket                       URLDecoder (Java 2)
  Inet4Address (Java 2, v1.4)   Socket                             URLEncoder
  Inet6Address (Java 2, v1.4)   SocketAddress (Java 2, v1.4)       URLStreamHandler

As you can see, several new classes were added by Java 2, version 1.4. Some of these are to support the new IPv6 addressing scheme. Others provide some added flexibility to the original java.net package. Java 2, version 1.4 also added functionality, such as support for the new I/O classes, to several of the preexisting networking classes. Most of the additions made by Java 2, version 1.4 are beyond the scope of this chapter, but three new classes, Inet4Address, Inet6Address, and URI, are briefly discussed at the end. The java.net package’s interfaces are listed here:

  ContentHandlerFactory    SocketImplFactory    URLStreamHandlerFactory
  FileNameMap              SocketOptions        DatagramSocketImplFactory (added by Java 2, v1.3)

In the sections that follow, we will examine the main networking classes and show several examples that apply them.




InetAddress

Whether you are making a phone call, sending mail, or establishing a connection across the Internet, addresses are fundamental. The InetAddress class is used to encapsulate both the numerical IP address we discussed earlier and the domain name for that address. You interact with this class by using the name of an IP host, which is more convenient and understandable than its IP address. The InetAddress class hides the number inside. As of Java 2, version 1.4, InetAddress can handle both IPv4 and IPv6 addresses. This discussion assumes IPv4.


Factory Methods

The InetAddress class has no visible constructors. To create an InetAddress object, you have to use one of the available factory methods. Factory methods are merely a convention whereby static methods in a class return an instance of that class. This is done in lieu of overloading a constructor with various parameter lists when having unique method names makes the results much clearer. Three commonly used InetAddress factory methods are shown here.

      static InetAddress getLocalHost( )
          throws UnknownHostException
     
      static InetAddress getByName(String hostName)
          throws UnknownHostException

      static InetAddress[ ] getAllByName(String hostName)
          throws UnknownHostException

The getLocalHost( ) method simply returns the InetAddress object that represents the local host. The getByName( ) method returns an InetAddress for a host name passed to it. If these methods are unable to resolve the host name, they throw an UnknownHostException.

On the Internet, it is common for a single name to be used to represent several machines. In the world of web servers, this is one way to provide some degree of scaling. The getAllByName( ) factory method returns an array of InetAddresses that represent all of the addresses that a particular name resolves to. It will also throw an UnknownHostException if it can’t resolve the name to at least one address.

Java 2, version 1.4 also includes the factory method getByAddress( ), which takes an IP address and returns an InetAddress object. Either an IPv4 or an IPv6 address can be used. The following example prints the addresses and names of the local machine and two well-known Internet web sites:

  // Demonstrate InetAddress.
  import java.net.*;

  class InetAddressTest
  {
    public static void main(String[] args) throws UnknownHostException {
      // The address of the machine this program is running on.
      InetAddress address = InetAddress.getLocalHost();
      System.out.println(address);

      // One address for the given host name.
      address = InetAddress.getByName("osborne.com");
      System.out.println(address);

      // Every address that the given host name resolves to.
      InetAddress[] all = InetAddress.getAllByName("www.nba.com");
      for (InetAddress a : all)
        System.out.println(a);
    }
  }

Here is the output produced by this program. (Of course, the output you see will be slightly different.)

  default/206.148.209.138
  osborne.com/198.45.24.162
  www.nba.com/64.241.238.153
  www.nba.com/64.241.238.142
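The getByAddress( ) factory method mentioned above can be tried without any network access, because it builds the object directly from raw bytes and performs no name lookup. In the sketch below, the class name, the helper fromOctets, and the host name example.test are made up for the illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical demo class: building InetAddress objects from raw address bytes.
class GetByAddressSketch {
    // Builds an IPv4 InetAddress from four numbers; no name lookup is performed.
    static InetAddress fromOctets(int a, int b, int c, int d) throws UnknownHostException {
        return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress v4 = fromOctets(192, 9, 9, 1);
        System.out.println(v4.getHostAddress());

        // The two-argument overload attaches a name without checking it against DNS.
        InetAddress named = InetAddress.getByAddress("example.test",
                                                     new byte[] {(byte) 192, 9, 9, 1});
        System.out.println(named);

        // A 16-byte array is treated as an IPv6 address (here ::1, the IPv6 loopback).
        byte[] v6 = new byte[16];
        v6[15] = 1;
        System.out.println(InetAddress.getByAddress(v6).getHostAddress());
    }
}
```

An array of any other length causes getByAddress( ) to throw an UnknownHostException.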


Instance Methods

The InetAddress class also has several other methods, which can be used on the objects returned by the methods just discussed. Here are some of the most commonly used.

boolean equals(Object other):  Returns true if this object has the same Internet address as other.

byte[ ] getAddress( ):  Returns a byte array that represents the object’s Internet address in network byte order.

String getHostAddress( ):  Returns a string that represents the host address associated with the InetAddress object.

String getHostName( ):  Returns a string that represents the host name associated with the InetAddress object.

boolean isMulticastAddress( ):  Returns true if this Internet address is a multicast address. Otherwise, it returns false.

String toString( ):  Returns a string that lists the host name and the IP address for convenience.
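Here is a short sketch that exercises each of these methods. To keep it independent of any DNS server, the object is built with getByAddress( ) from a made-up host name (www.example.test) and the address used earlier in this chapter; the class name is also hypothetical.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Arrays;

// Hypothetical demo class: the commonly used InetAddress instance methods.
class InetAddressMethodsSketch {
    // A sample address with a name attached; no lookup is performed.
    static InetAddress sample() throws UnknownHostException {
        return InetAddress.getByAddress("www.example.test",
                                        new byte[] {(byte) 192, 9, 9, 1});
    }

    public static void main(String[] args) throws UnknownHostException {
        InetAddress addr = sample();
        InetAddress sameNumber = InetAddress.getByAddress(new byte[] {(byte) 192, 9, 9, 1});

        System.out.println(addr.getHostName());              // the name given above
        System.out.println(addr.getHostAddress());           // dotted form of the address
        System.out.println(Arrays.toString(addr.getAddress())); // bytes are signed in Java
        System.out.println(addr.equals(sameNumber));          // compares the address only
        System.out.println(addr.isMulticastAddress());        // 192.x.x.x is not class D
        System.out.println(addr);                             // toString(): name/address
    }
}
```

Note that getAddress( ) returns signed bytes, so the first number, 192, appears as -64 when printed.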


Internet addresses are looked up in a series of hierarchically cached servers. That means that your local computer might know a particular name-to-IP-address mapping automatically, such as for itself and nearby servers. For other names, it may ask a local DNS server for IP address information. If that server doesn’t have a particular address, it can go to a remote site and ask for it. This can continue all the way up to the root server, called InterNIC (internic.net). This process might take a long time, so it is wise to structure your code so that you cache IP address information locally rather than look it up repeatedly.
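One way to follow that advice is to keep the answers in a map, as in this minimal sketch. The class name CachingResolverSketch and its Lookup interface are hypothetical; the lookup is made pluggable only so that the cache can be tried without a network, and in normal use it would be InetAddress.getByName( ). Entries in this sketch never expire, which a long-running program would probably need to reconsider.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical demo class: remember name-to-address answers locally.
class CachingResolverSketch {
    // The real lookup is pluggable so the cache can be exercised without a network.
    interface Lookup {
        InetAddress resolve(String host) throws UnknownHostException;
    }

    private final Map<String, InetAddress> cache = new ConcurrentHashMap<>();
    private final Lookup lookup;

    CachingResolverSketch(Lookup lookup) {
        this.lookup = lookup;
    }

    // Returns the remembered address if there is one; otherwise looks it up once and keeps it.
    InetAddress resolve(String host) throws UnknownHostException {
        InetAddress known = cache.get(host);
        if (known == null) {
            known = lookup.resolve(host);   // the potentially slow step
            cache.put(host, known);
        }
        return known;
    }

    public static void main(String[] args) throws UnknownHostException {
        CachingResolverSketch resolver = new CachingResolverSketch(InetAddress::getByName);
        System.out.println(resolver.resolve("localhost"));
        System.out.println(resolver.resolve("localhost")); // served from the cache
    }
}
```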
