Processing XML with Java - A Guide to SAX, DOM, JDOM, JAXP, and TrAX

DOM

The Document Object Model, DOM, is the second major standard API for XML parsers, and the first tree-based API I'll consider. Most major parsers implement both SAX and DOM. DOM programs start off similarly to SAX programs, by having a parser object read an XML document from an input stream or other source. However, whereas the SAX parser returns the document broken up into a series of small pieces, the equivalent DOM method returns an entire Document object that contains everything in the original XML document. You read information from the document by invoking methods on this Document object or on the other objects it contains. This makes DOM much more convenient when random access to widely separated parts of the original document is required. However, it is quite memory intensive compared with SAX, and not nearly as well suited to streaming applications.
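The random-access point can be illustrated with a small sketch. It assumes the JAXP classes bundled with the JDK (javax.xml.parsers) rather than the Xerces-specific classes used later in this section, and the sample document and the class and method names are invented for illustration. Once the whole document is in memory, its parts can be read in any order:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class RandomAccessDemo {

  // Hypothetical sample data, not output from the Fibonacci servlet
  static final String XML
   = "<list><item>first</item><item>second</item><item>third</item></list>";

  // Parses the complete string into a single in-memory Document
  static Document parse(String xml) throws Exception {
    DocumentBuilder builder
     = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    return builder.parse(new InputSource(new StringReader(xml)));
  }

  // Returns the text of the item element at the given position
  static String itemText(Document doc, int position) {
    NodeList items = doc.getElementsByTagName("item");
    return items.item(position).getFirstChild().getNodeValue();
  }

  public static void main(String[] args) throws Exception {
    Document doc = parse(XML);
    // Read the last item before the first; the order doesn't matter
    // because the entire tree is already available
    System.out.println(itemText(doc, 2)); // prints "third"
    System.out.println(itemText(doc, 0)); // prints "first"
  }

}
```

A SAX program would see these items only once, in document order, and would have to store for itself anything it wanted to revisit.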

A second advantage to DOM is that it is a read-write API. Whereas SAX can only parse existing XML documents, DOM can also create them. Documents created in this fashion are automatically well-formed. Attempting to create a malformed document throws an exception. Example 5.5 is a DOM-based program for connecting to the Fibonacci XML-RPC servlet. The request is formed as a new DOM document, and the response is read as a parsed DOM document.
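One such check can be seen in isolation. The sketch below assumes the DOM implementation that JAXP hands back in a standard JDK; the class and method names are hypothetical. It tries to create an element whose name contains a space, which is not a legal XML name, and reports whether the implementation refuses with a DOMException. Exactly which constraints are enforced, and when, may vary between implementations.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

public class IllegalNameDemo {

  // Returns true if the DOM implementation rejects the element name
  // with an INVALID_CHARACTER_ERR, false if the element is created
  static boolean isRejected(String name) throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder().newDocument();
    try {
      doc.createElement(name);
      return false;
    }
    catch (DOMException e) {
      return e.code == DOMException.INVALID_CHARACTER_ERR;
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(isRejected("methodName"));  // a legal XML name
    System.out.println(isRejected("method name")); // contains a space
  }

}
```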

Example 5.5 A DOM-Based Client for the Fibonacci XML-RPC Server
import java.net.*;
import java.io.*;
import org.w3c.dom.*;
import org.apache.xerces.dom.*;
import org.apache.xerces.parsers.*;
import org.apache.xml.serialize.*;
import org.xml.sax.InputSource;


public class FibonacciDOMClient {

  public final static String DEFAULT_SERVER
   = "http://www.elharo.com/fibonacci/XML-RPC";

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println(
       "Usage: java FibonacciDOMClient number url"
      );
      return;
    }

    String server = DEFAULT_SERVER;
    if (args.length >= 2) server = args[1];

    try {
 
      // Build the request document
      DOMImplementation impl
       = DOMImplementationImpl.getDOMImplementation();

      Document request
       = impl.createDocument(null, "methodCall", null);

      Element methodCall = request.getDocumentElement();

      Element methodName = request.createElement("methodName");
      Text text = request.createTextNode("calculateFibonacci");
      methodName.appendChild(text);
      methodCall.appendChild(methodName);

      Element params = request.createElement("params");
      methodCall.appendChild(params);

      Element param = request.createElement("param");
      params.appendChild(param);

      Element value = request.createElement("value");
      param.appendChild(value);

      // Had to break the naming convention here because of a
      // conflict with the Java keyword int
      Element intElement = request.createElement("int");
      Text index = request.createTextNode(args[0]);
      intElement.appendChild(index);
      value.appendChild(intElement);

      // Transmit the request document
      URL u = new URL(server);
      URLConnection uc = u.openConnection();
      HttpURLConnection connection = (HttpURLConnection) uc;
      connection.setDoOutput(true);
      connection.setDoInput(true);
      connection.setRequestMethod("POST");
      OutputStream out = connection.getOutputStream();

      OutputFormat fmt = new OutputFormat(request);
      XMLSerializer serializer = new XMLSerializer(out, fmt);
      serializer.serialize(request);

      out.flush();
      out.close();

      // Read the response
      DOMParser parser = new DOMParser();
      InputStream in = connection.getInputStream();
      InputSource source = new InputSource(in);
      parser.parse(source);
      in.close();
      connection.disconnect();

      Document doc = parser.getDocument();
      NodeList doubles = doc.getElementsByTagName("double");
      Node datum = doubles.item(0);
      Text result = (Text) datum.getFirstChild();
      System.out.println(result.getNodeValue());
    }
    catch (Exception e) {
      System.err.println(e);
    }

  }

}

In DOM the request document is built as a tree. Everything in the document is a node in this tree, including not only elements but also text, comments, processing instructions, and more. The document serves as a factory for creating the various kinds of node objects. Each node in this tree belongs to exactly one document. After being created, the node object is appended to the child list of its parent node.
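The ownership rule has a practical consequence that a short sketch can show. It assumes the JDK's bundled JAXP implementation, and the class and method names are made up for the example. A node created by one document cannot simply be appended to another; it first has to be copied with importNode(), which produces a node owned by the target document.

```java
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class NodeOwnershipDemo {

  // Creates a new document containing only a root element
  static Document newDocument(String rootName) throws Exception {
    DocumentBuilder builder
     = DocumentBuilderFactory.newInstance().newDocumentBuilder();
    Document doc = builder.newDocument();
    doc.appendChild(doc.createElement(rootName));
    return doc;
  }

  // Tries to append a node made by one document to another document
  // and reports whether that fails with WRONG_DOCUMENT_ERR
  static boolean foreignAppendFails(Document maker, Document target) {
    Element foreign = maker.createElement("param");
    try {
      target.getDocumentElement().appendChild(foreign);
      return false;
    }
    catch (DOMException e) {
      return e.code == DOMException.WRONG_DOCUMENT_ERR;
    }
  }

  // Copies a node into the target document and appends the copy
  static Node copyInto(Document target, Node source) {
    Node copy = target.importNode(source, true);
    target.getDocumentElement().appendChild(copy);
    return copy;
  }

  public static void main(String[] args) throws Exception {
    Document first = newDocument("methodCall");
    Document second = newDocument("methodCall");
    System.out.println(foreignAppendFails(first, second));
    Node copy = copyInto(second, first.createElement("params"));
    System.out.println(copy.getOwnerDocument() == second);
  }

}
```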

Once the Document object has been created and populated, it needs to be serialized onto the output stream of the URLConnection. Unfortunately, there is no standard parser-independent way to do this in DOM2. This will be added in DOM3. In the meantime, you will need to resort to parser-specific classes and methods. Here I've used Xerces's org.apache.xml.serialize package. The basic design is that an XMLSerializer object is connected to an output stream. Options for serialization such as where to place line breaks and what character encoding to use are specified by an OutputFormat object. Here I just used the default OutputFormat. The document is written onto the stream using the XMLSerializer's serialize() method.
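For comparison, the DOM Level 3 Load and Save module has since been standardized, and its interfaces ship with the JDK in the org.w3c.dom.ls package. Assuming an implementation that supports the "LS" feature is available, parser-independent serialization might look roughly like the following sketch. The class and method names are hypothetical, and the request is abbreviated to two elements.

```java
import java.io.OutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

public class LoadSaveDemo {

  // Builds an abbreviated request: methodCall containing methodName
  static Document buildRequest() throws Exception {
    Document doc = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder().newDocument();
    Element methodCall = doc.createElement("methodCall");
    doc.appendChild(methodCall);
    Element methodName = doc.createElement("methodName");
    methodName.appendChild(doc.createTextNode("calculateFibonacci"));
    methodCall.appendChild(methodName);
    return doc;
  }

  // Asks the document's own implementation for Load and Save support
  static DOMImplementationLS loadSave(Document doc) {
    return (DOMImplementationLS)
     doc.getImplementation().getFeature("LS", "3.0");
  }

  // Serializes the document to a string
  static String serialize(Document doc) {
    LSSerializer serializer = loadSave(doc).createLSSerializer();
    return serializer.writeToString(doc);
  }

  // Serializes the document onto a byte stream as UTF-8,
  // for example the output stream of a URLConnection
  static void write(Document doc, OutputStream out) {
    DOMImplementationLS ls = loadSave(doc);
    LSOutput output = ls.createLSOutput();
    output.setByteStream(out);
    output.setEncoding("UTF-8");
    ls.createLSSerializer().write(doc, output);
  }

  public static void main(String[] args) throws Exception {
    System.out.println(serialize(buildRequest()));
  }

}
```

Here LSSerializer plays the role that XMLSerializer plays in Example 5.5, and LSOutput carries the stream and encoding choices.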

Once the server receives and parses the request, it calculates and transmits its response as an XML document. This document must be parsed to extract the single string you actually want. DOM includes a number of methods and classes to extract particular parts of a document without necessarily walking down the entire tree. The one I use here is the Document interface's getElementsByTagName() method. This method returns a NodeList containing one Node object for each element in the input document that has the name double. In this case there's exactly one of those, so it's extracted from the list. I then get the first child of that node, which happens to be a Text node that contains the value I want. This value is retrieved by the getNodeValue() method.
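The extraction step can be tried without a network connection. The sketch below assumes a response shaped like a typical XML-RPC methodResponse; the sample string is invented rather than captured from the servlet, and the class and method names are hypothetical. Unlike the listing above, it checks for an empty NodeList before asking for the first item, since a fault response would contain no double element.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ResponseReaderDemo {

  // Assumed response format; not actual output from the server
  static final String SAMPLE_RESPONSE
   = "<methodResponse><params><param><value>"
   + "<double>55</double>"
   + "</value></param></params></methodResponse>";

  // Parses the response text into a DOM Document
  static Document parse(String xml) throws Exception {
    return DocumentBuilderFactory.newInstance().newDocumentBuilder()
     .parse(new InputSource(new StringReader(xml)));
  }

  // Returns the text of the first double element in the document
  static String firstDouble(Document doc) {
    NodeList doubles = doc.getElementsByTagName("double");
    if (doubles.getLength() == 0) {
      throw new IllegalStateException("No double element in response");
    }
    Node text = doubles.item(0).getFirstChild();
    // An empty <double/> element has no Text child
    return text == null ? "" : text.getNodeValue();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(firstDouble(parse(SAMPLE_RESPONSE))); // prints "55"
  }

}
```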

The first problem with DOM should now be apparent. It's more than a little complex, even for very simple problems like this one. However, DOM does have an internal logic, and once you become accustomed to it, you'll find it's actually not that hard to use. Still, the learning curve is quite steep, and frequent reference to the documentation is a necessity.

The second downside to DOM is that it does not expose as much of the information in an XML document as SAX does. Although the basic content of elements, text, and attributes is well supported by both DOM and SAX, there are many more esoteric aspects of XML documents that SAX provides but DOM does not. These include unparsed entities, notations, attribute types, and declarations in the DTD. Some of this will be provided in DOM3.

The third downside to DOM is that it's not as complete as SAX. Much of the code in Example 5.5 is actually part of the Xerces parser rather than standard DOM. Such parser-specific code is virtually impossible to avoid when programming in DOM. That's because DOM doesn't give you any way to create a new XML document, create a new parser, or write a Document onto an output stream. All of these have to be provided by the parser. If I were to port Example 5.5 to Crimson or GNU JAXP, I'd have to rewrite about half of it. DOM3 is going to fill in a lot of these holes. However, because DOM3 is still just a working draft with little parser support, I chose to stick to DOM2 for the time being. JAXP can also plug a few of these holes.
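To give a sense of how JAXP plugs some of these holes, here is a rough sketch of the request-building half of Example 5.5 with the Xerces-specific classes replaced. It assumes a JAXP implementation is on the classpath (one is bundled with the JDK), creates the document through DocumentBuilderFactory, and serializes it with a TrAX identity transform. The class name and the addChild() helper are hypothetical, and the network code is omitted.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class JaxpRequestDemo {

  // Hypothetical helper: creates a child element and appends it
  static Element addChild(Element parent, String name) {
    Element child = parent.getOwnerDocument().createElement(name);
    parent.appendChild(child);
    return child;
  }

  // Builds the same request tree as Example 5.5 without naming
  // any particular parser's classes
  static Document buildRequest(String index) throws Exception {
    Document request = DocumentBuilderFactory.newInstance()
     .newDocumentBuilder().newDocument();
    Element methodCall = request.createElement("methodCall");
    request.appendChild(methodCall);

    Element methodName = addChild(methodCall, "methodName");
    methodName.appendChild(request.createTextNode("calculateFibonacci"));

    Element params = addChild(methodCall, "params");
    Element param = addChild(params, "param");
    Element value = addChild(param, "value");
    Element intElement = addChild(value, "int");
    intElement.appendChild(request.createTextNode(index));
    return request;
  }

  // Serializes with an identity transform: a Transformer created
  // without a stylesheet copies its source to its result unchanged
  static String serialize(Document doc) throws Exception {
    Transformer identity = TransformerFactory.newInstance().newTransformer();
    identity.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
    StringWriter out = new StringWriter();
    identity.transform(new DOMSource(doc), new StreamResult(out));
    return out.toString();
  }

  public static void main(String[] args) throws Exception {
    System.out.println(serialize(buildRequest("10")));
  }

}
```

To send the request rather than print it, the StreamResult could wrap the connection's output stream instead of a StringWriter.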
