Processing XML with Java: A Guide to SAX, DOM, JDOM, JAXP, and TrAX

Exceptions and Errors

The XML specification defines three classes of problems that can occur in an XML document. In order of decreasing severity, these are as follows:

Fatal Error

A well-formedness error. As soon as the parser detects it, it must throw in the towel and stop parsing. The parse() method throws a SAXParseException when a fatal error is detected. Parsers have a little leeway in whether they detect fatal errors. In particular, nonvalidating parsers may not catch certain fatal errors that occur in the external DTD subset, and many parsers don't actually check everything they're supposed to check. However, if a parser does detect a fatal error, then it must give up and stop parsing.

Error

An error but not a well-formedness error. The most common is a validity error, although there are a few other kinds as well. Some parsers classify violations of namespace well-formedness as errors. Parsers may or may not detect these errors. If a parser does detect one of these errors, it may or may not throw a SAXParseException and it may or may not continue parsing. (Validity errors generally do not cause SAXParseExceptions. Other kinds of errors may, depending on the parser.) These sorts of errors are a source of some interoperability problems in XML, because two parsers may behave differently given the same document.

Warning

Not itself an error. Nonetheless, it may indicate a mistake of some kind in the document. For example, a parser might issue a warning if it encountered an element named XMLDocument. That's because all names beginning with "XML" (in any arrangement of case) are reserved by the W3C for future standards. Parsers may or may not detect these types of problems. If a parser does detect one, it will not throw an exception but will continue parsing.

In addition, a parser may encounter an I/O problem that has nothing to do with XML. For example, your cat might knock the Ethernet cable out of the back of your PC while you're downloading a large XML document from a remote web server.

If the parser detects a well-formedness error in the document it's parsing, then parse() throws a SAXException. In the event of an I/O error, it throws an IOException. The parser may or may not throw a SAXException in the event of a nonfatal error, and it will not throw an exception for a warning.

As you can see, the only kind of XML problem the parser is guaranteed to tell you about through an exception is the well-formedness error. If you want to be informed of the other kinds of errors and possible problems, you need to implement the ErrorHandler interface, and register your ErrorHandler implementation with the XMLReader.

SAXExceptions

The SAXException class, summarized in Example 7.4, is the generic exception class for almost anything (other than an I/O problem) that can go wrong while processing an XML document with SAX. Not only the parse() method but also most of the callback methods in the various SAX interfaces are declared to throw this exception. If you detect a problem while processing an XML document, your code can throw its own SAXException.

Example 7.4 The SAXException Class
package org.xml.sax;

public class SAXException extends Exception {

  public SAXException()
  public SAXException(String message)
  public SAXException(Exception rootCause)
  public SAXException(String message, Exception e)

  public String    getMessage()
  public Exception getException()
  public String    toString()

}

Nested Exceptions

SAXException may not always be the exception you want to throw, however. For example, suppose you're parsing a document containing an XML digital signature, and the endElement() method notices that the base64 encoded text provided in the P element (which represents the prime modulus of a DSA key) does not decode to a prime number the way it's supposed to. You naturally want to throw a java.security.InvalidKeyException to warn the client application of this. But endElement() cannot throw a java.security.InvalidKeyException—only a SAXException. In this case, you wrap the exception you really want to throw inside a SAXException and throw the SAXException instead. For example,

Exception nestedException 
 = new InvalidKeyException("Modulus is not prime!");
SAXException e = new SAXException(nestedException);
throw e;

The code that catches the SAXException can retrieve the original exception using the getException() method. For example, the client application method might indeed be declared to throw an InvalidKeyException, so you could cast the nested exception to its real type and throw it into the appropriate catch block elsewhere in the call chain:

catch (SAXException e) {
  Exception rootCause = e.getException();
  if (rootCause == null) {
    // handle it as an XML problem...
  }
  else {
    if (rootCause instanceof InvalidKeyException) {
      InvalidKeyException ike = (InvalidKeyException) rootCause;
      throw ike;
    }
    else if (rootCause instanceof SomeOtherException) {
      SomeOtherException soe = (SomeOtherException) rootCause;
      throw soe;
    }
    ...
  }
}
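
The two fragments above can be combined into one small runnable sketch. This is only an illustration under stated assumptions: the NestedExceptionDemo class, its KeyHandler, and the check() method are hypothetical names, the handler rejects every P element rather than really testing for primality, and the parser is obtained through JAXP's SAXParserFactory so that no driver property is needed.

```java
import java.io.IOException;
import java.io.StringReader;
import java.security.InvalidKeyException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class NestedExceptionDemo {

  // Hypothetical handler: it rejects every P element so that the
  // wrapping is easy to see. A real one would decode and test the key.
  static class KeyHandler extends DefaultHandler {
    @Override
    public void endElement(String uri, String localName, String qName)
        throws SAXException {
      if ("P".equals(qName)) {
        // endElement() may only throw SAXException, so wrap the real cause
        throw new SAXException(
            new InvalidKeyException("Modulus is not prime!"));
      }
    }
  }

  // Parses the text and rethrows a nested InvalidKeyException
  // under its real type; everything else stays a SAXException.
  public static void check(String xml)
      throws InvalidKeyException, SAXException, IOException,
             ParserConfigurationException {
    XMLReader parser =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    parser.setContentHandler(new KeyHandler());
    try {
      parser.parse(new InputSource(new StringReader(xml)));
    }
    catch (SAXException e) {
      Exception rootCause = e.getException();
      if (rootCause instanceof InvalidKeyException) {
        throw (InvalidKeyException) rootCause;
      }
      throw e; // an XML problem, or some other nested exception
    }
  }

  public static void main(String[] args) throws Exception {
    try {
      check("<DSAKeyValue><P>AAAA</P></DSAKeyValue>");
    }
    catch (InvalidKeyException e) {
      System.out.println("Caught: " + e.getMessage());
    }
  }

}
```

The client code never sees the SAXException wrapper in the InvalidKeyException case; it only needs a catch block for the exception type it actually cares about.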

SAXException Subclasses

SAX defines several more specific subclasses of SAXException for specific problems, even though most methods are only declared to throw a generic SAXException. These subclasses include SAXParseException, SAXNotRecognizedException, and SAXNotSupportedException. In addition, parsers can extend SAXException with their own custom subclasses, but few do this.

A SAXParseException indicates a fatal error, error, or warning in an XML document. The parse() method of the XMLReader interface throws this when it encounters a well-formedness error. SAXParseException is also passed as an argument to the methods of the ErrorHandler interface to signal any of the three kinds of problems an XML document may contain.

In addition to the usual exception methods like getMessage() and printStackTrace() that SAXParseException inherits from its superclasses, it provides methods to get the public ID and system ID of the entity where the problem occurs (remember, XML documents that use external parsed entities can be divided among multiple separate files) and the line number and column number within that entity where the problem occurs.

Example 7.5 The SAXParseException Class
package org.xml.sax;

public class SAXParseException extends SAXException {

  public SAXParseException(String message, Locator locator)
  public SAXParseException(String message, Locator locator,
   Exception e)
  public SAXParseException(String message, String publicID,
   String systemID, int lineNumber, int columnNumber)
  public SAXParseException(String message, String publicID,
   String systemID, int lineNumber, int columnNumber,
   Exception e)

  public String getPublicId()
  public String getSystemId()
  public int    getLineNumber()
  public int    getColumnNumber()

}

The line and column numbers that the parser reports for the problem may not always be perfectly accurate. Nonetheless, they should be close to where the problem begins or ends. (Some parsers give the line and column numbers for the start-tag of a problem element. Others give the line and column numbers for the end-tag.) If the document is so malformed that the parser can't even begin working with it, particularly if it isn't an XML document at all, then the parser will probably indicate that the error occurred at line -1, column -1.
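
Your own handlers can attach the same location information to problems they detect by passing the parser's Locator to the two-argument SAXParseException constructor. The following is a rough sketch rather than a recommended design: the LocatedErrorDemo class, its NoScriptHandler, and the rule that script elements are forbidden are all invented for illustration, and the parser comes from JAXP's SAXParserFactory instead of XMLReaderFactory. If the parser never supplies a Locator, the constructor accepts null and the exception reports -1 for both line and column.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.Locator;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class LocatedErrorDemo {

  // Hypothetical rule: script elements are not allowed in the document.
  static class NoScriptHandler extends DefaultHandler {

    private Locator locator; // stays null if the parser supplies none

    @Override
    public void setDocumentLocator(Locator locator) {
      this.locator = locator;
    }

    @Override
    public void startElement(String uri, String localName,
        String qName, Attributes atts) throws SAXException {
      if ("script".equals(qName)) {
        // The locator stamps the exception with the current position
        throw new SAXParseException(
            "script elements are not allowed", locator);
      }
    }
  }

  // Returns "no problem" or a description of where parsing stopped.
  public static String findProblem(String xml)
      throws IOException, SAXException, ParserConfigurationException {
    XMLReader parser =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    parser.setContentHandler(new NoScriptHandler());
    try {
      parser.parse(new InputSource(new StringReader(xml)));
      return "no problem";
    }
    catch (SAXParseException e) {
      return "line " + e.getLineNumber()
          + ", column " + e.getColumnNumber() + ": " + e.getMessage();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(findProblem("<html>\n  <script/>\n</html>"));
    System.out.println(findProblem("<html>\n  <p/>\n</html>"));
  }

}
```

Because the exception is a SAXParseException, the same catch block handles both the parser's own well-formedness errors and the handler's application-level complaint.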

Example 7.6 enhances last chapter's SAXChecker program so that it reports the line numbers of any well-formedness errors. There are two catch blocks—one for SAXParseException and another one for the more generic SAXException—so it's possible to distinguish between well-formedness errors and other problems, such as not being able to find the right XMLReader implementation class.

Example 7.6 A SAX Program That Parses a Document and Identifies the Line Numbers of Any Well-Formedness Errors
import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;
import java.io.IOException;

public class BetterSAXChecker {

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java BetterSAXChecker URL");
      return;
    }
    String document = args[0];

    try {
      XMLReader parser = XMLReaderFactory.createXMLReader();
      parser.parse(document);
      System.out.println(document + " is well-formed.");
    }
    catch (SAXParseException e) {
      System.out.print(document + " is not well-formed at ");
      System.out.print("line " + e.getLineNumber()
       + ", column " +  e.getColumnNumber() );
      System.out.println(" in the entity " + e.getSystemId());
    }
    catch (SAXException e) {
      System.out.println("Could not check document because "
       + e.getMessage());
    }
    catch (IOException e) {
      System.out.println(
       "Due to an IOException, the parser could not check "
       + document
      );
    }

  }

}

Following is the output I got when I first ran this program across my Cafe con Leche home page. The first time I neglected to specify a parser, which produced a generic SAXException. The second time I corrected that mistake, and a SAXParseException signaled a well-formedness error.

%java BetterSAXChecker http://www.cafeconleche.org 
Could not check document because System property
 org.xml.sax.driver not specified
%java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser
 BetterSAXChecker http://www.cafeconleche.org
http://www.cafeconleche.org is not well-formed at line 64,
column 64 in the entity http://www.cafeconleche.org/

Not-Necessarily-Fatal Errors

XML includes a few errors that fall into a gray area. These are errors but neither fatal well-formedness errors nor nonfatal validity errors. The most common such error is an ambiguous content model in an element declaration. For example, consider the following declaration, which states that an Actor can have between zero and two Parts:

<!ELEMENT Actor (Part?, Part?)> 

The problem occurs with an Actor element that has one Part, like this:

<Actor> 
  <Part>Cyrano</Part>
</Actor>

Does this one Part match the first Part in the content model or the second one? There's no way to tell. Some parsers have trouble with this construct, and other parsers don't notice any problem at all. The XML specification calls this an error but does not classify it as a fatal error.

Different parsers treat these not-necessarily-fatal errors differently. Some parsers throw a SAXParseException when one is encountered. Other parsers let them pass without comment. And still others report them in a different way but do not throw an exception. For maximum compatibility, try to design your DTDs and instance documents to avoid this problem.

The ErrorHandler Interface

Throwing an exception aborts the parsing process, but not all problems encountered in an XML document necessarily require such a radical step. In particular, validity errors are not signaled by an exception because that would stop parsing. If you want your program to be informed of nonfatal errors, then you must register an ErrorHandler object with the XMLReader. Then the parser will tell you about problems in the document by passing (not throwing!) a SAXParseException to one of the methods in this object.

Example 7.7 summarizes the ErrorHandler interface. As you can see, it has three callback methods corresponding to the three different kinds of problems a parser may detect. When the parser detects one of these problems, it passes a SAXParseException to the appropriate method. If you want to treat errors or warnings as fatal, then you can throw the exception you were passed. (The parse() method will always throw an exception for a fatal error, even if you don't.) If you don't want to treat them as fatal (and most often you don't), then you can do something else with the information wrapped in the exception.

Example 7.7 The ErrorHandler Interface
package org.xml.sax;

public interface ErrorHandler {

  public void warning(SAXParseException exception)
   throws SAXException;
  public void error(SAXParseException exception)
   throws SAXException;
  public void fatalError(SAXParseException exception)
   throws SAXException;

}

The XMLReader interface declares the following two methods, which install an ErrorHandler and retrieve the currently installed one:

public void setErrorHandler (ErrorHandler handler)

public ErrorHandler getErrorHandler()

You can uninstall an ErrorHandler by passing null to setErrorHandler().
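
As a small illustration of treating errors as fatal, here is a sketch of a handler that rethrows whatever it is passed in error(), so that any nonfatal error aborts the parse. The StrictHandlerDemo and StrictErrorHandler names and the isAcceptable() method are hypothetical. To provoke a nonfatal error at all, the sketch assumes a parser that honors the http://xml.org/sax/features/validation feature, and it obtains that parser from JAXP's SAXParserFactory.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class StrictHandlerDemo {

  // Hypothetical handler: warnings are logged, everything else is fatal.
  static class StrictErrorHandler implements ErrorHandler {

    @Override
    public void warning(SAXParseException exception) {
      System.out.println("Warning: " + exception.getMessage());
    }

    @Override
    public void error(SAXParseException exception) throws SAXException {
      throw exception; // rethrowing makes a nonfatal error stop the parse
    }

    @Override
    public void fatalError(SAXParseException exception)
        throws SAXException {
      throw exception;
    }
  }

  // Returns true only if no error or fatal error was reported.
  public static boolean isAcceptable(String xml)
      throws IOException, ParserConfigurationException, SAXException {
    XMLReader parser =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    // Assumption: validation is switched on so that validity errors
    // are reported to the handler's error() method.
    parser.setFeature("http://xml.org/sax/features/validation", true);
    parser.setErrorHandler(new StrictErrorHandler());
    try {
      parser.parse(new InputSource(new StringReader(xml)));
      return true;
    }
    catch (SAXException e) {
      return false;
    }
  }

  public static void main(String[] args) throws Exception {
    String valid = "<!DOCTYPE a [<!ELEMENT a EMPTY>]><a/>";
    String invalid =
        "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]><a/>";
    System.out.println("valid document accepted: "
        + isAcceptable(valid));
    System.out.println("invalid document accepted: "
        + isAcceptable(invalid));
  }

}
```

A handler like this trades completeness for simplicity: you learn about the first error only, because the parse stops as soon as error() throws.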

Example 7.8 is a program that checks documents for well-formedness errors and other problems. It reports all errors detected, no matter how small, through the ErrorHandler interface.

Example 7.8 A SAX Program That Reports All Problems Found in an XML Document
import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;
import java.io.IOException;


public class BestSAXChecker implements ErrorHandler {

  public void warning(SAXParseException exception) {

    System.out.println("Warning: " + exception.getMessage());
    System.out.println(" at line " + exception.getLineNumber()
     + ", column " + exception.getColumnNumber());
    System.out.println(" in entity " + exception.getSystemId());

  }

  public void error(SAXParseException exception) {

    System.out.println("Error: " + exception.getMessage());
    System.out.println(" at line " + exception.getLineNumber()
     + ", column " + exception.getColumnNumber());
    System.out.println(" in entity " + exception.getSystemId());

  }

  public void fatalError(SAXParseException exception) {

    System.out.println("Fatal Error: " + exception.getMessage());
    System.out.println(" at line " + exception.getLineNumber()
     + ", column " + exception.getColumnNumber());
    System.out.println(" in entity " + exception.getSystemId());

  }

  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java BestSAXChecker URL");
      return;
    }
    String document = args[0];

    try {
      XMLReader parser = XMLReaderFactory.createXMLReader();
      ErrorHandler handler = new BestSAXChecker();
      parser.setErrorHandler(handler);
      parser.parse(document);
      // If the document isn't well-formed, an exception has
      // already been thrown and this has been skipped.
      System.out.println(document + " is well-formed.");
    }
    catch (SAXParseException e) {
      System.out.print(document + " is not well-formed at ");
      System.out.println("line " + e.getLineNumber()
       + ", column " +  e.getColumnNumber() );
      System.out.println(" in entity " + e.getSystemId());
    }
    catch (SAXException e) {
      System.out.println("Could not check document because "
       + e.getMessage());
    }
    catch (IOException e) {
      System.out.println(
       "Due to an IOException, the parser could not check "
       + document
      );
    }

  }

}

Following is the output from running BestSAXChecker across the Docbook XML source code for an early version of this chapter:

%java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser 
 BestSAXChecker xmlreader.xml
Error: The namespace prefix "xinclude" was not declared.
 at line 349, column 92
 in entity file:///D:/books/XMLJAVA/xmlreader.xml
Error: The namespace prefix "xinclude" was not declared.
 at line 530, column 95
 in entity file:///D:/books/XMLJAVA/xmlreader.xml
Error: The namespace prefix "xinclude" was not declared.
 at line 545, column 84
 in entity file:///D:/books/XMLJAVA/xmlreader.xml
Error: The namespace prefix "xinclude" was not declared.
 at line 688, column 93
 in entity file:///D:/books/XMLJAVA/xmlreader.xml
Fatal Error: The element type "para" must be terminated by the
matching end-tag "</para>".
 at line 706, column 42
 in entity file:///D:/books/XMLJAVA/xmlreader.xml
Could not check document because Stopping after fatal error:
The element type "para" must be terminated by the matching
 end-tag "</para>".

BestSAXChecker complains several times about an undeclared namespace prefix for the XInclude elements I use to merge in source code examples like Example 7.8. Then, about three-quarters of the way through the document, it encounters a well-formedness error where I neglected to put an end-tag in the right place. At this point parsing stops. If there are any errors after that point, they aren't reported. Once I fixed those problems, the file became well-formed and valid:

%java -Dorg.xml.sax.driver=org.apache.xerces.parsers.SAXParser 
 BestSAXChecker xmlreader.xml
xmlreader.xml is well-formed.

Beyond simple well-formedness, the errors that this program catches depend on the underlying parser. All conformant parsers detect all well-formedness errors. Most modern parsers should also catch any violations of namespace well-formedness. Whether this program catches validity errors depends on the parser. Most parsers do not validate by default. Instead they require the client application to explicitly request validation by setting the http://xml.org/sax/features/validation feature to true. I take this subject up next.
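
As a brief preview, the following sketch turns on that feature and collects whatever the parser reports instead of printing it. Treat it as an assumption-laden illustration: the ValidationDemo class, its CollectingHandler, and the validate() method are hypothetical names, the parser is obtained from JAXP's SAXParserFactory, and the number and wording of the reported errors depend on the parser in use.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.ErrorHandler;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.SAXParseException;
import org.xml.sax.XMLReader;

public class ValidationDemo {

  // Hypothetical handler that records nonfatal problems and keeps going.
  static class CollectingHandler implements ErrorHandler {

    final List<String> problems = new ArrayList<>();

    @Override
    public void warning(SAXParseException exception) {
      problems.add("warning: " + exception.getMessage());
    }

    @Override
    public void error(SAXParseException exception) {
      problems.add("error: " + exception.getMessage());
    }

    @Override
    public void fatalError(SAXParseException exception)
        throws SAXException {
      throw exception; // well-formedness errors still end the parse
    }
  }

  // Parses with validation requested and returns everything reported.
  public static List<String> validate(String xml)
      throws IOException, ParserConfigurationException, SAXException {
    XMLReader parser =
        SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    parser.setFeature("http://xml.org/sax/features/validation", true);
    CollectingHandler handler = new CollectingHandler();
    parser.setErrorHandler(handler);
    parser.parse(new InputSource(new StringReader(xml)));
    return handler.problems;
  }

  public static void main(String[] args) throws Exception {
    String invalid =
        "<!DOCTYPE a [<!ELEMENT a (b)><!ELEMENT b EMPTY>]><a/>";
    for (String problem : validate(invalid)) {
      System.out.println(problem);
    }
  }

}
```

Note that parse() returns normally for the invalid document here. The validity error reaches the program only through the error() callback, which is exactly why an ErrorHandler is needed.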
