Friday 24 March 2017

Using Stream I/O - Java Tutorials

The following example demonstrates several of Java’s I/O character stream classes and methods. This program implements a simple version of the standard Unix wc (word count) command. The program has two modes: if no filenames are provided as arguments, the program operates on the standard input stream. If one or more filenames are specified, the program operates on each of them.

  // A word counting utility.
  import java.io.*;

  class WordCount {
    public static int words = 0;
    public static int lines = 0;
    public static int chars = 0;

    // Counts the characters, lines, and words read from a character stream.
    public static void wc(Reader r) throws IOException {
      int c;
      int last = -1;            // last character read from this stream
      boolean lastWhite = true; // true while we are between words
      String whiteSpace = " \t\n\r";

      while ((c = r.read()) != -1) {
        // Count characters
        chars++;
        // Count lines
        if (c == '\n') {
          lines++;
        }
        // Count words by detecting the start of a word
        if (whiteSpace.indexOf(c) == -1) {
          if (lastWhite) {
            words++;
          }
          lastWhite = false;
        } else {
          lastWhite = true;
        }
        last = c;
      }
      // Count a final line that is not terminated by a newline
      if (last != -1 && last != '\n') {
        lines++;
      }
    }

    public static void main(String[] args) {
      try {
        if (args.length == 0) { // We're working with stdin
          wc(new InputStreamReader(System.in));
        } else { // We're working with a list of files
          for (String name : args) {
            // try-with-resources closes each file when wc() returns
            try (FileReader fr = new FileReader(name)) {
              wc(fr);
            }
          }
        }
      } catch (IOException e) {
        System.err.println("Error: " + e.getMessage());
        return;
      }
      System.out.println(lines + " " + words + " " + chars);
    }
  }

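The wc( ) method reads from the character stream passed to it and counts the number of characters, lines, and words. The lastWhite variable records whether the previous character was whitespace, so a word is counted each time a non-whitespace character follows whitespace (or begins the input).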
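A minimal sketch of the same whitespace-to-word transition, pulled out into a hypothetical helper named countWords( ) that works on a String so it can be tried without a file or standard input. It assumes the same four whitespace characters as the program.

```java
// Sketch: the whitespace-to-word transition used by wc( ), applied to a String.
// countWords() is a hypothetical helper, not part of the program above.
class WordBoundaryDemo {
  static final String WHITESPACE = " \t\n\r";

  // Counts words by counting each transition from whitespace to non-whitespace.
  static int countWords(String text) {
    int words = 0;
    boolean lastWhite = true; // the start of the text behaves like whitespace
    for (int i = 0; i < text.length(); i++) {
      boolean white = WHITESPACE.indexOf(text.charAt(i)) != -1;
      if (!white && lastWhite) {
        words++; // this character starts a new word
      }
      lastWhite = white;
    }
    return words;
  }

  public static void main(String[] args) {
    System.out.println(countWords("  hello   world\n"));  // 2
    System.out.println(countWords(""));                   // 0
    System.out.println(countWords("one\ttwo\r\nthree"));  // 3
  }
}
```

Runs of whitespace do not inflate the count, because only the first non-whitespace character after whitespace increments it.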

When executed with no arguments, WordCount creates an InputStreamReader that uses System.in as its source and passes it to wc( ), which does the actual counting. When executed with one or more arguments, WordCount treats them as filenames, creates a FileReader for each one, and passes it to wc( ). In either case, the counts accumulate in the static fields lines, words, and chars, and the program prints them before exiting unless an I/O error occurs.


Improving wc( ) Using a StreamTokenizer

An even better way to look for patterns in an input stream is to use another of Java’s I/O classes: StreamTokenizer. Like StringTokenizer, it breaks its input into tokens delimited by sets of characters, but it reads from a character stream rather than from a String. It has this constructor:

      StreamTokenizer(Reader inStream)

Here inStream must be some form of Reader.

StreamTokenizer defines several methods; this example uses only a few. The resetSyntax( ) method clears the default syntax table so that every character is treated as an ordinary character, after which we declare our own word and whitespace characters. This is necessary because the default table is tuned for tokenizing source code such as Java programs (it parses numbers, quoted strings, and '/' comments) and is thus too specialized for this example. We declare that our tokens, or “words,” are any consecutive string of visible characters delimited on both sides by whitespace.
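To see why resetSyntax( ) matters, the sketch below tokenizes one arbitrary line twice: once with the default syntax table and once with the reset table used by the word count program. The class and method names are hypothetical; the StreamTokenizer calls are the standard ones described in this section.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Sketch: compares the default syntax table with the reset table used
// by the word count program. SyntaxDemo and tokens() are hypothetical names.
class SyntaxDemo {
  // Returns a readable description of every token in the input.
  static List<String> tokens(String input, boolean reset) {
    StreamTokenizer tok = new StreamTokenizer(new StringReader(input));
    if (reset) {
      tok.resetSyntax();           // every character becomes ordinary
      tok.wordChars(33, 255);      // visible characters form words
      tok.whitespaceChars(0, ' '); // space and control characters separate words
    }
    List<String> out = new ArrayList<>();
    try {
      while (tok.nextToken() != StreamTokenizer.TT_EOF) {
        switch (tok.ttype) {
          case StreamTokenizer.TT_WORD:   out.add("word:" + tok.sval); break;
          case StreamTokenizer.TT_NUMBER: out.add("number:" + tok.nval); break;
          default:                        out.add("char:" + (char) tok.ttype); break;
        }
      }
    } catch (IOException e) {
      throw new RuntimeException(e); // not expected when reading from a StringReader
    }
    return out;
  }

  public static void main(String[] args) {
    String line = "x = 3.5 // note";
    System.out.println(tokens(line, false)); // [word:x, char:=, number:3.5]
    System.out.println(tokens(line, true));  // [word:x, word:=, word:3.5, word://, word:note]
  }
}
```

With the default table, '=' is an ordinary character, 3.5 is parsed as a number, and the '/' comment character discards the rest of the line. After the reset, all five whitespace-separated pieces come back as words.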

We use the eolIsSignificant( ) method to ensure that newline characters will be delivered as tokens, so we can count the number of lines as well as words. It has this general form:

      void eolIsSignificant(boolean eolFlag)

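If eolFlag is true, end-of-line characters are returned as TT_EOL tokens; a carriage return ('\r'), a newline ('\n'), or a carriage return immediately followed by a newline each produce a single token. If eolFlag is false, end-of-line characters are treated as whitespace and serve only to separate tokens.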
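The following sketch, assuming the default syntax table and an arbitrary sample string, counts TT_EOL tokens with the flag on and off. The class and method names are hypothetical. Note that the "\r\n" pair yields one token, not two.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Sketch: counts TT_EOL tokens with eolIsSignificant on and off.
// EolDemo and countEol() are hypothetical names.
class EolDemo {
  static int countEol(String input, boolean eolSignificant) {
    StreamTokenizer tok = new StreamTokenizer(new StringReader(input));
    tok.eolIsSignificant(eolSignificant);
    int eols = 0;
    try {
      while (tok.nextToken() != StreamTokenizer.TT_EOF) {
        if (tok.ttype == StreamTokenizer.TT_EOL) {
          eols++;
        }
      }
    } catch (IOException e) {
      throw new RuntimeException(e); // not expected when reading from a StringReader
    }
    return eols;
  }

  public static void main(String[] args) {
    String text = "one two\r\nthree\nfour\r";
    System.out.println(countEol(text, true));  // 3: "\r\n", "\n" and "\r"
    System.out.println(countEol(text, false)); // 0: line ends are plain whitespace
  }
}
```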

The wordChars( ) method is used to specify the range of characters that can be used in words. Its general form is shown here:

      void wordChars(int start, int end)

Here, start and end specify the inclusive range of valid word characters. In the program, characters 33 ('!') through 255 are word characters, that is, every character code above the space up to 255.
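A sketch of how the word-character range shapes tokens, using a deliberately narrower range than the program's (only 'a' through 'z'); the class name is hypothetical. Any character outside both the word and whitespace ranges comes back as an ordinary single-character token.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Sketch: only 'a'..'z' are word characters, so '-' becomes its own token.
// WordCharsDemo and tokens() are hypothetical names.
class WordCharsDemo {
  static List<String> tokens(String input) {
    StreamTokenizer tok = new StreamTokenizer(new StringReader(input));
    tok.resetSyntax();             // start with every character ordinary
    tok.wordChars('a', 'z');       // inclusive range of word characters
    tok.whitespaceChars(0, ' ');   // space and control characters separate tokens
    List<String> out = new ArrayList<>();
    try {
      while (tok.nextToken() != StreamTokenizer.TT_EOF) {
        out.add(tok.ttype == StreamTokenizer.TT_WORD
            ? tok.sval                          // a run of word characters
            : "'" + (char) tok.ttype + "'");    // one ordinary character
      }
    } catch (IOException e) {
      throw new RuntimeException(e); // not expected when reading from a StringReader
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(tokens("well-known fact")); // [well, '-', known, fact]
  }
}
```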

The whitespace characters are specified using whitespaceChars( ). It has this general form:

      void whitespaceChars(int start, int end)

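Here, start and end specify the inclusive range of characters to treat as whitespace. In the program, characters 0 through 32 (' ') are whitespace, which covers the space, tab, carriage return, and newline.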
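A sketch assuming comma- and semicolon-separated input (the class name and sample text are hypothetical): re-declaring those characters as whitespace turns them into separators. whitespaceChars( ) clears any other attributes of the characters in its range, so calling it after wordChars( ) overrides the word setting for ',' and ';'.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Sketch: treating ',' and ';' as whitespace so they separate words.
// WhitespaceCharsDemo and words() are hypothetical names.
class WhitespaceCharsDemo {
  static List<String> words(String input) {
    StreamTokenizer tok = new StreamTokenizer(new StringReader(input));
    tok.resetSyntax();
    tok.wordChars(33, 255);        // visible characters form words...
    tok.whitespaceChars(0, ' ');   // ...space and control characters separate them
    tok.whitespaceChars(',', ','); // re-declare ',' as whitespace
    tok.whitespaceChars(';', ';'); // re-declare ';' as whitespace
    List<String> out = new ArrayList<>();
    try {
      while (tok.nextToken() != StreamTokenizer.TT_EOF) {
        out.add(tok.sval); // with these ranges, every token is a word
      }
    } catch (IOException e) {
      throw new RuntimeException(e); // not expected when reading from a StringReader
    }
    return out;
  }

  public static void main(String[] args) {
    System.out.println(words("red,green; blue")); // [red, green, blue]
  }
}
```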

The next token is obtained from the input stream by calling nextToken( ). It returns the type of the token, which is also stored in the ttype field.

StreamTokenizer defines four int constants: TT_EOF, TT_EOL, TT_NUMBER, and TT_WORD. It also has three public instance variables. nval is a double that holds the value of a number once it is recognized. sval is a String that holds the text of a word once it is recognized. ttype is an int indicating the type of the token just read by the nextToken( ) method. If the token is a word, ttype equals TT_WORD. If the token is a number, ttype equals TT_NUMBER. If the token is a single ordinary character, ttype contains that character's value. If an end-of-line has been encountered, ttype equals TT_EOL (this assumes that eolIsSignificant( ) was invoked with a true argument). If the end of the stream has been encountered, ttype equals TT_EOF.
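A sketch of the usual nextToken( ) loop that dispatches on ttype, using the default syntax table with eolIsSignificant(true). The class name, the tally layout (words, numbers, ordinary characters, line ends), and the sample text are arbitrary choices for illustration.

```java
import java.io.IOException;
import java.io.StreamTokenizer;
import java.io.StringReader;

// Sketch: tallies each token type reported in ttype.
// TokenTypeDemo and tally() are hypothetical names.
class TokenTypeDemo {
  // Returns {words, numbers, ordinaryChars, lineEnds}.
  static int[] tally(String input) {
    StreamTokenizer tok = new StreamTokenizer(new StringReader(input));
    tok.eolIsSignificant(true);
    int[] counts = new int[4];
    try {
      // nextToken() returns the token type, which is also stored in ttype
      while (tok.nextToken() != StreamTokenizer.TT_EOF) {
        switch (tok.ttype) {
          case StreamTokenizer.TT_WORD:   counts[0]++; break; // text is in sval
          case StreamTokenizer.TT_NUMBER: counts[1]++; break; // value is in nval
          case StreamTokenizer.TT_EOL:    counts[3]++; break;
          default:                        counts[2]++; break; // ttype is the character
        }
      }
    } catch (IOException e) {
      throw new RuntimeException(e); // not expected when reading from a StringReader
    }
    return counts;
  }

  public static void main(String[] args) {
    int[] c = tally("total = 12 + 30;\nitems 2\n");
    System.out.println(java.util.Arrays.toString(c)); // [2, 3, 3, 2]
  }
}
```

Here "total" and "items" are words; 12, 30, and 2 are numbers; '=', '+', and ';' are ordinary characters; and the two newlines are line ends.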

The word count program revised to use a StreamTokenizer is shown here:

  // Enhanced word count program that uses a StreamTokenizer
  import java.io.*;

  class WordCount {
    public static int words = 0;
    public static int lines = 0;
    public static int chars = 0;

    public static void wc(Reader r) throws IOException {
      StreamTokenizer tok = new StreamTokenizer(r);

      tok.resetSyntax();           // make every character ordinary
      tok.wordChars(33, 255);      // visible characters form words
      tok.whitespaceChars(0, ' '); // space and control characters separate words
      tok.eolIsSignificant(true);  // report line ends as TT_EOL tokens

      while (tok.nextToken() != StreamTokenizer.TT_EOF) {
        switch (tok.ttype) {
          case StreamTokenizer.TT_EOL:
            lines++;
            chars++; // a line end counts as one character, even "\r\n"
            break;
          case StreamTokenizer.TT_WORD:
            words++;
            chars += tok.sval.length();
            break;
          default:
            chars++; // any other token is a single ordinary character
            break;
        }
      }
      // Note: spaces and tabs are skipped as whitespace,
      // so they are not included in chars.
    }

    public static void main(String[] args) {
      if (args.length == 0) { // We're working with stdin
        try {
          wc(new InputStreamReader(System.in));
          System.out.println(lines + " " + words + " " + chars);
        } catch (IOException e) {
          System.err.println("stdin: " + e.getMessage());
        }
      } else { // We're working with a list of files
        int twords = 0, tchars = 0, tlines = 0;
        for (String name : args) {
          words = chars = lines = 0;
          // try-with-resources closes each file when wc() returns
          try (FileReader fr = new FileReader(name)) {
            wc(fr);
            twords += words;
            tchars += chars;
            tlines += lines;
            System.out.println(name + ": " +
              lines + " " + words + " " + chars);
          } catch (IOException e) {
            System.out.println(name + ": error.");
          }
        }
        System.out.println("total: " +
          tlines + " " + twords + " " + tchars);
      }
    }
  }
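The program above keeps its counts in static fields and must reset them before each file. One alternative, sketched here with hypothetical Counts and count( ) names, is to return the counts from the method instead. The tokenizer setup is the same as in the program, so spaces and tabs are still not included in the character count.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Sketch: tokenizer-based counting that returns its results instead of
// updating static fields. TokenWordCount, Counts and count() are hypothetical.
class TokenWordCount {
  // Immutable holder for one set of results.
  static final class Counts {
    final int lines, words, chars;

    Counts(int lines, int words, int chars) {
      this.lines = lines;
      this.words = words;
      this.chars = chars;
    }

    @Override
    public String toString() {
      return lines + " " + words + " " + chars;
    }
  }

  static Counts count(Reader r) throws IOException {
    StreamTokenizer tok = new StreamTokenizer(r);
    tok.resetSyntax();
    tok.wordChars(33, 255);
    tok.whitespaceChars(0, ' ');
    tok.eolIsSignificant(true);

    int lines = 0, words = 0, chars = 0;
    while (tok.nextToken() != StreamTokenizer.TT_EOF) {
      if (tok.ttype == StreamTokenizer.TT_EOL) {
        lines++;
        chars++;                    // one character per line end
      } else if (tok.ttype == StreamTokenizer.TT_WORD) {
        words++;
        chars += tok.sval.length(); // whitespace between words is not counted
      } else {
        chars++;                    // a single ordinary character
      }
    }
    return new Counts(lines, words, chars);
  }

  // Convenience overload for in-memory text.
  static Counts count(String text) {
    try {
      return count(new StringReader(text));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(count("to be\nor not\n")); // 2 4 11
  }
}
```

Because each call builds its own Counts, a caller can sum per-file results for a total without resetting any shared state between files.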
