Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax Free Open Book


Receiving Characters

When the parser reads #PCDATA, it passes this text to the characters() method as an array of chars, together with an index into that array (start) and a count of characters (length). Although it would be simpler if characters() just took a String as an argument, using a char[] array allows certain performance optimizations. In particular, parsers often store a large chunk of the original document in a single array and repeatedly pass that same array to the characters() method, updating only the values of start and length. Consequently, only the characters from start through start + length - 1 belong to the current call; the rest of the array must not be read.

On the flip side, when there's a large amount of text between two tags with no intervening markup, the parser may choose to call characters() multiple times even though it doesn't need to. Xerces generally won't pass more than 16K of text in one call. Crimson is limited to about 8K of text per call. At the extreme, I have even seen a parser pass a single character at a time to the characters() method. You must not assume that the parser will pass the maximum contiguous run of text in a single call to characters().
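To make the consequences concrete, here is a minimal sketch that plays the part of that extreme parser: it reuses one shared char[] and passes a single character per call, moving start forward each time. The class and method names (ChunkedCharactersDemo, feedOneCharAtATime(), and the two handlers) are hypothetical. A handler that overwrites its text on every call keeps only the last chunk, while one that appends the valid range to a buffer recovers the full content.

```java
import org.xml.sax.helpers.DefaultHandler;

public class ChunkedCharactersDemo {

  // Hypothetical handler that (wrongly) keeps only the most recent chunk.
  static class LastChunkHandler extends DefaultHandler {
    String text = "";

    @Override
    public void characters(char[] ch, int start, int length) {
      text = new String(ch, start, length); // discards everything seen earlier
    }
  }

  // Hypothetical handler that accumulates every chunk it is given.
  static class AccumulatingHandler extends DefaultHandler {
    final StringBuilder buffer = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
      buffer.append(ch, start, length); // copies only the range valid for this call
    }
  }

  // Simulates a parser that reuses one array and passes one character per call.
  static void feedOneCharAtATime(char[] shared, DefaultHandler handler) throws Exception {
    for (int i = 0; i < shared.length; i++) {
      handler.characters(shared, i, 1);
    }
  }

  public static void main(String[] args) throws Exception {
    char[] shared = "Birdsong Clock".toCharArray();
    LastChunkHandler naive = new LastChunkHandler();
    AccumulatingHandler accumulating = new AccumulatingHandler();
    feedOneCharAtATime(shared, naive);
    feedOneCharAtATime(shared, accumulating);
    System.out.println(naive.text);                     // k
    System.out.println(accumulating.buffer.toString()); // Birdsong Clock
  }
}
```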

Text split across several calls can lead to some uncomfortable contortions when processing many documents. Given an element such as <Name>Birdsong Clock</Name>, you typically want to process the entire content as a unit. This requires you to set a boolean flag when startElement() reports the element's start-tag; accumulate the text passed to characters() into a buffer of some kind, often a StringBuffer; and act on the data only when you reach the end-tag for the element, as signaled by the endElement() method.
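The three steps can be sketched with the SAX parser built into the JDK, obtained through JAXP's SAXParserFactory rather than the Xerces class name used in the example below. The NameCollector class and the small catalog document are hypothetical; the point is only where each step lives: the flag is set in startElement(), text is appended in characters(), and the finished string is used in endElement(). StringBuilder stands in for StringBuffer as its unsynchronized equivalent.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NameCollector extends DefaultHandler {

  private boolean inName = false;                  // true only between <Name> and </Name>
  private final StringBuilder buffer = new StringBuilder();
  private final List<String> names = new ArrayList<>();

  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) {
    // The default factory is not namespace-aware, so qName is always supplied.
    if (qName.equals("Name")) {
      inName = true;
      buffer.setLength(0);                         // fresh buffer for this element
    }
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    if (inName) buffer.append(ch, start, length);  // may run many times per element
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if (qName.equals("Name")) {
      names.add(buffer.toString());                // the whole content is known only now
      inName = false;
    }
  }

  public static List<String> collect(String xml) throws Exception {
    SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
    NameCollector handler = new NameCollector();
    parser.parse(new InputSource(new StringReader(xml)), handler);
    return handler.names;
  }

  public static void main(String[] args) throws Exception {
    String xml = "<Catalog><Name>Birdsong Clock</Name><Name>Cuckoo Clock</Name></Catalog>";
    System.out.println(collect(xml)); // [Birdsong Clock, Cuckoo Clock]
  }
}
```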

For an example, I'm going to revisit the Fibonacci XML-RPC client program from Chapter 5. This time, rather than printing the result on System.out, I'm going to collect the result and make it available as a BigInteger. Once again, this will require the ContentHandler to recognize the contents of the single double element in the response while ignoring everything else. Example 6.10 demonstrates this.

Example 6.10 A SAX Client for the Fibonacci XML-RPC Server
import java.net.*;
import java.io.*;
import java.math.BigInteger;
import org.xml.sax.*;
import org.xml.sax.helpers.*;


public class NewFibonacciClient {

  public final static String DEFAULT_SERVER
   = "http://www.elharo.com/fibonacci/XML-RPC";

  public static BigInteger calculateFibonacci(int index,
   String server) throws IOException, SAXException {

      // Connect to the server
      URL u = new URL(server);
      URLConnection uc = u.openConnection();
      HttpURLConnection connection = (HttpURLConnection) uc;
      connection.setDoOutput(true);
      connection.setDoInput(true);
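      connection.setRequestMethod("POST");
      // The XML-RPC specification requires a Content-Type of text/xml;
      // it must be set before getOutputStream() opens the connection
      connection.setRequestProperty("Content-Type", "text/xml");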
      OutputStream out = connection.getOutputStream();
      Writer wout = new OutputStreamWriter(out, "UTF-8");

      // Transmit the request XML document
      wout.write("<?xml version=\"1.0\"?>\r\n");
      wout.write("<methodCall>\r\n");
      wout.write(
       "  <methodName>calculateFibonacci</methodName>\r\n");
      wout.write("  <params>\r\n");
      wout.write("    <param>\r\n");
      wout.write("      <value><int>" + index
       + "</int></value>\r\n");
      wout.write("    </param>\r\n");
      wout.write("  </params>\r\n");
      wout.write("</methodCall>\r\n");

      wout.flush();
      wout.close();

      // Read the response XML document
      XMLReader parser = XMLReaderFactory.createXMLReader(
        "org.apache.xerces.parsers.SAXParser"
      );
      FibonacciHandler handler = new FibonacciHandler();
      parser.setContentHandler(handler);

      InputStream in = connection.getInputStream();
      InputSource source = new InputSource(in);
      parser.parse(source);

      in.close();
      connection.disconnect();
      return handler.result;

  }

  static class FibonacciHandler extends DefaultHandler {

    StringBuffer buffer = null;
    BigInteger result = null;

    public void startElement(String namespaceURI,
     String localName, String qualifiedName, Attributes atts) {

      if (qualifiedName.equals("double")) {
        buffer = new StringBuffer();
      }

    }

    public void endElement(String namespaceURI, String localName,
     String qualifiedName) {

      if (qualifiedName.equals("double")) {
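        // BigInteger's constructor rejects surrounding whitespace,
        // so trim the accumulated text before converting it
        String accumulatedText = buffer.toString().trim();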
        result = new BigInteger(accumulatedText);
        buffer = null;
      }

    }

    public void characters(char[] text, int start, int length)
     throws SAXException {

      if (buffer != null) {
        buffer.append(text, start, length);
      }

    }

  }

  public static void main(String[] args) {

    int index;
    try {
      index = Integer.parseInt(args[0]);
    }
    catch (Exception e) {
      System.out.println(
       "Usage: java NewFibonacciClient number url"
      );
      return;
    }

    String server = DEFAULT_SERVER;
    if (args.length >= 2) server = args[1];

    try {
      BigInteger result = calculateFibonacci(index, server);
      System.out.println(result);
    }
    catch (Exception e) {
      e.printStackTrace();
    }
  }

}

The return value is stored in the handler's BigInteger field named result. The value of this field makes sense only after the response has been received and parsed; therefore, I hide the ContentHandler in a static nested class, which is used only through the static calculateFibonacci() method. Because ContentHandler methods often need to be called in a specific order and from a particular context, the strategy of hiding them inside a nonpublic, often nested, class is quite common. It's not absolutely required, but it does make the class safer and its public interface much simpler.
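The same hide-the-handler design can be tried offline. The sketch below, a hypothetical DoubleExtractor class, keeps its handler as a private static nested class and exposes only a static parse() method, so a caller never sees a result before parsing has finished. It assumes the JDK's built-in JAXP parser and parses a canned XML-RPC methodResponse string instead of contacting the server.

```java
import java.io.StringReader;
import java.math.BigInteger;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public final class DoubleExtractor {

  private DoubleExtractor() {} // no instances: parse() is the only entry point

  // Private, so no caller can drive the callbacks or read a half-filled result.
  private static class Handler extends DefaultHandler {
    private StringBuilder buffer = null; // non-null only inside <double>
    private BigInteger result = null;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
      if (qName.equals("double")) buffer = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      if (buffer != null) buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      if (qName.equals("double")) {
        result = new BigInteger(buffer.toString().trim());
        buffer = null;
      }
    }
  }

  // Parses a complete response and returns the value only after parsing has finished.
  public static BigInteger parse(String responseXml) throws Exception {
    Handler handler = new Handler();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(responseXml)), handler);
    return handler.result;
  }

  public static void main(String[] args) throws Exception {
    String response = "<?xml version=\"1.0\"?>"
        + "<methodResponse><params><param>"
        + "<value><double>6765</double></value>"
        + "</param></params></methodResponse>";
    System.out.println(parse(response)); // 6765
  }
}
```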

What's really new here is how the characters() method operates. Fibonacci numbers grow exponentially, so they quickly become arbitrarily large. Sooner or later there is a Fibonacci number, exactly which one depends on the parser, whose digits will not all be delivered in a single call to characters(). Consequently, rather than simply storing a boolean flag that tells us whether we're in the double element, we use a StringBuffer field that doubles as the flag: it is null outside the double element and non-null inside it. While it is non-null, the characters() method appends data to the buffer. That data is acted on (in this case, converted to a BigInteger) only when the end-tag is seen and the endElement() method is invoked.
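To check that the null-buffer strategy survives arbitrary splitting, one can drive a copy of the handler directly, without any parser, feeding the digits of a large Fibonacci number in fixed-size chunks from a single shared array. This is a sketch under stated assumptions: SplitDeliveryDemo, deliverInChunks(), and the chunk size of 7 are hypothetical, and the nested Handler mirrors FibonacciHandler rather than being the original class.

```java
import java.math.BigInteger;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

public class SplitDeliveryDemo {

  // Same strategy as FibonacciHandler: a null buffer means "not inside <double>".
  static class Handler extends DefaultHandler {
    StringBuilder buffer = null;
    BigInteger result = null;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
      if (qName.equals("double")) buffer = new StringBuilder();
    }

    @Override
    public void characters(char[] ch, int start, int length) {
      if (buffer != null) buffer.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
      if (qName.equals("double")) {
        result = new BigInteger(buffer.toString());
        buffer = null;
      }
    }
  }

  // Iterative F(n) with F(0) = 0 and F(1) = 1.
  static BigInteger fibonacci(int n) {
    BigInteger a = BigInteger.ZERO, b = BigInteger.ONE;
    for (int i = 0; i < n; i++) {
      BigInteger next = a.add(b);
      a = b;
      b = next;
    }
    return a;
  }

  // Simulates a parser that hands over the element's text in chunks of chunkSize.
  static BigInteger deliverInChunks(String digits, int chunkSize) throws Exception {
    Handler h = new Handler();
    char[] shared = digits.toCharArray();
    h.startElement("", "double", "double", new AttributesImpl());
    for (int start = 0; start < shared.length; start += chunkSize) {
      h.characters(shared, start, Math.min(chunkSize, shared.length - start));
    }
    h.endElement("", "double", "double");
    return h.result;
  }

  public static void main(String[] args) throws Exception {
    BigInteger expected = fibonacci(300);
    System.out.println(deliverInChunks(expected.toString(), 7).equals(expected)); // true
  }
}
```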

This general approach of accumulating data into a buffer and acting on it only after the last character of data has been seen is very common in SAX programs. Elements that contain mixed content are handled similarly. Elements that can recursively contain other elements with the same name (for example, in XHTML a div can contain another div) are trickier, but they can normally be handled by using a stack of element-name flags rather than a single boolean flag. Indeed, stacks are often very convenient data structures when processing XML with SAX, as is evident in earlier examples and will be seen again before this chapter is done.
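One way to apply the stack idea to nested div elements is sketched below, assuming the JDK's built-in SAX parser. Instead of a stack of boolean flags, this variant keeps a stack of buffers, one per open div, so each div collects only its own direct text; the class name NestedDivCollector is hypothetical. For <div>outer<div>inner</div>tail</div> the inner div ends first, so its text is recorded first.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class NestedDivCollector extends DefaultHandler {

  // One buffer per currently open div; the top of the stack is the innermost div.
  private final Deque<StringBuilder> open = new ArrayDeque<>();
  private final List<String> texts = new ArrayList<>();

  @Override
  public void startElement(String uri, String localName, String qName, Attributes atts) {
    if (qName.equals("div")) open.push(new StringBuilder());
  }

  @Override
  public void characters(char[] ch, int start, int length) {
    // Text belongs to the innermost open div, if there is one.
    if (!open.isEmpty()) open.peek().append(ch, start, length);
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    // Closing a div pops its own buffer; the enclosing div's buffer is untouched.
    if (qName.equals("div")) texts.add(open.pop().toString());
  }

  public static List<String> collect(String xml) throws Exception {
    NestedDivCollector handler = new NestedDivCollector();
    SAXParserFactory.newInstance().newSAXParser()
        .parse(new InputSource(new StringReader(xml)), handler);
    return handler.texts;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(collect("<div>outer<div>inner</div>tail</div>")); // [inner, outertail]
  }
}
```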
