Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax Free Open Book

DTDHandler

SAX is mostly about the instance document, not the DTD or schema. However, given a validating parser, or at least an internal DTD subset, the DTD can affect the contents of the instance document in six ways:

  1. It can provide default values for attributes.

  2. It can assign types to attributes, which affects their normalized value.

  3. It can distinguish between ignorable and non-ignorable white space.

  4. It can declare general entities.

  5. It can declare unparsed entities.

  6. It can declare notations.

The first four are resolved silently. For example, when applying a default value for an attribute to an element, the parser simply adds that attribute to the Attributes object it passes to startElement(). It doesn't tell you that it's done it. It just does it.
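The silent defaulting is easy to observe with a small inline document. The following is a minimal sketch rather than part of the book's examples: the class name DefaultAttributeDemo and the sample document are hypothetical, and it obtains its XMLReader through JAXP's SAXParserFactory, since XMLReaderFactory is deprecated on newer JDKs. The root element carries no attributes in the instance document, yet the handler still sees the units attribute declared in the internal DTD subset.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class DefaultAttributeDemo {

  // Hypothetical sample document. The internal DTD subset gives
  // the units attribute a default value of "metric"; the start-tag
  // itself has no attributes at all.
  static final String DOC =
      "<?xml version=\"1.0\"?>\n"
    + "<!DOCTYPE length [\n"
    + "  <!ELEMENT length (#PCDATA)>\n"
    + "  <!ATTLIST length units CDATA \"metric\">\n"
    + "]>\n"
    + "<length>12</length>";

  // Parses DOC and returns the attributes reported for each element,
  // keyed by qualified name.
  public static Map<String, String> attributesOfRoot()
   throws Exception {

    final Map<String, String> seen = new LinkedHashMap<>();
    XMLReader parser = SAXParserFactory.newInstance()
     .newSAXParser().getXMLReader();
    parser.setContentHandler(new DefaultHandler() {
      @Override
      public void startElement(String namespaceURI,
       String localName, String qualifiedName,
       Attributes attributes) {
        // Defaulted attributes appear here exactly like
        // attributes that were written in the document.
        for (int i = 0; i < attributes.getLength(); i++) {
          seen.put(attributes.getQName(i), attributes.getValue(i));
        }
      }
    });
    parser.parse(new InputSource(new StringReader(DOC)));
    return seen;

  }

  public static void main(String[] args) throws Exception {
    System.out.println(attributesOfRoot());
  }

}
```

Nothing in the callback distinguishes a defaulted attribute from a specified one. If that distinction matters, the SAX2 extension interface Attributes2 offers an isSpecified() method, though not every parser supports it.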

The DTDHandler interface covers the last two effects. Because notations and unparsed entities are so infrequently used, they're not made a part of the main ContentHandler interface. Instead they're given their own callback interface, DTDHandler, which is summarized in Example 7.16. The few developers who need this functionality can use it. Everyone else can ignore it.

Example 7.16 The DTDHandler Interface
package org.xml.sax;

public interface DTDHandler {

  public void notationDecl(String name, String publicID,
   String systemID) throws SAXException;

  public void unparsedEntityDecl(String name, String publicID,
   String systemID, String notationName) throws SAXException;

}

As with other callback interfaces, developers implement this interface in a class of their own choosing. An instance of that class is registered with the XMLReader through its setDTDHandler() method. For parallelism, there's also a getDTDHandler() method, although it isn't much needed in practice:

public void setDTDHandler (DTDHandler handler) 

public DTDHandler getDTDHandler()

As with the other callback interfaces, you can uninstall a DTDHandler by passing null to setDTDHandler().
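As a rough illustration of the registration step, the sketch below installs an anonymous DTDHandler and records the names it is told about. The class name DTDHandlerDemo, the sample document, and the example.com URLs are all hypothetical, and the XMLReader comes from JAXP's SAXParserFactory rather than XMLReaderFactory. Only names are recorded, because a parser may resolve system IDs to absolute URLs before reporting them.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.DTDHandler;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;

public class DTDHandlerDemo {

  // Hypothetical sample document that declares one notation and
  // one unparsed entity in its internal DTD subset.
  static final String DOC =
      "<?xml version=\"1.0\"?>\n"
    + "<!DOCTYPE figure [\n"
    + "  <!ELEMENT figure EMPTY>\n"
    + "  <!ATTLIST figure source ENTITY #REQUIRED>\n"
    + "  <!NOTATION gif SYSTEM \"http://www.example.com/gif\">\n"
    + "  <!ENTITY logo SYSTEM \"http://www.example.com/logo.gif\""
    + " NDATA gif>\n"
    + "]>\n"
    + "<figure source=\"logo\"/>";

  // Parses DOC and returns one line per DTDHandler callback.
  public static List<String> declarations() throws Exception {

    final List<String> events = new ArrayList<>();
    XMLReader parser = SAXParserFactory.newInstance()
     .newSAXParser().getXMLReader();

    parser.setDTDHandler(new DTDHandler() {
      @Override
      public void notationDecl(String name, String publicID,
       String systemID) {
        events.add("notation " + name);
      }

      @Override
      public void unparsedEntityDecl(String name, String publicID,
       String systemID, String notationName) {
        events.add("entity " + name + " NDATA " + notationName);
      }
    });

    parser.parse(new InputSource(new StringReader(DOC)));
    return events;

  }

  public static void main(String[] args) throws Exception {
    for (String event : declarations()) {
      System.out.println(event);
    }
  }

}
```

No ContentHandler is installed here at all. The DTDHandler callbacks arrive while the parser reads the document type declaration, before the root element's start-tag is reported.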

The most common thing to do with a DTDHandler is simply to store all of the information provided about the notations and unparsed entities. Then the ContentHandler can refer back to this when it needs to resolve an unparsed entity. Example 7.17 is a simple DTDHandler implementation that stores the notations and unparsed entities declared in the DTD in two hash tables.

Example 7.17 A Caching DTDHandler
import org.xml.sax.*;
import java.util.Hashtable;

public class UnparsedCache implements DTDHandler {

  private Hashtable notations = new Hashtable();
  private Hashtable entities = new Hashtable();

  public void notationDecl(String name, String publicID,
   String systemID) {

    notations.put(name, new Notation(name, publicID, systemID));

  }

  public void unparsedEntityDecl(String name, String publicID,
   String systemID, String notationName) {

    entities.put(name, new UnparsedEntity(name, publicID,
     systemID, notationName));

  }

  // Returns null if no unparsed entity with this name was declared
  public UnparsedEntity getUnparsedEntity(String name) {
    return (UnparsedEntity) entities.get(name);
  }

  // Returns null if no notation with this name was declared
  public Notation getNotation(String name) {
    return (Notation) notations.get(name);
  }

}

For the convenience of tracking the several strings associated with each notation and unparsed entity, I wrap each one in a very simple class that just has a constructor, some getter methods, the equals() and hashCode() methods needed to store these objects in hash tables, and a toString() method for convenient output. The Notation class is shown in Example 7.18, and the UnparsedEntity class is shown in Example 7.19. Once you learn about DOM, an alternative would be to use that API's Notation and Entity interfaces instead.

Example 7.18 A Notation Utility Class
public class Notation {

  private String name;
  private String publicID;
  private String systemID;

  public Notation(String name, String publicID,
   String systemID) {

    this.name = name;
    this.publicID = publicID;
    this.systemID = systemID;

  }

  public String getName() {
    return this.name;
  }

  public String getSystemID() {
    return this.systemID;
  }

  public String getPublicID() {
    return this.publicID;
  }

  public boolean equals(Object o) {

    if (o instanceof Notation) {
      Notation n = (Notation) o;
      // Well-formedness requires every notation to have
      // at least a SYSTEM or a PUBLIC ID so both should not be
      // simultaneously null as long as the UnparsedCache built
      // this object.
      if (publicID == null) {
        return name.equals(n.name)
         && systemID.equals(n.systemID);
      }
      else if (systemID == null) {
        return name.equals(n.name)
         && publicID.equals(n.publicID);
      }
      else {
        return name.equals(n.name)
         && publicID.equals(n.publicID)
         && systemID.equals(n.systemID);
      }
    }
    return false;

  }

  public int hashCode() {

    if (publicID == null) {
      return name.hashCode() ^ systemID.hashCode();
    }
    else if (systemID == null) {
      return name.hashCode() ^ publicID.hashCode();
    }
    else {
      return name.hashCode() ^ publicID.hashCode()
       ^ systemID.hashCode();
    }

  }

  public String toString() {

    StringBuffer result = new StringBuffer(name);
    if (publicID != null) {
      result.append(" PUBLIC ");
      result.append(publicID);
      if (systemID != null) {
        result.append(" ");
        result.append(systemID);
      }
    }
    else {
      result.append(" SYSTEM ");
      result.append(systemID);
    }
    return result.toString();

  }

}

Example 7.19 An UnparsedEntity Utility Class
public class UnparsedEntity {

  private String name;
  private String publicID;
  private String systemID;
  private String notationName;

  public UnparsedEntity(String name, String publicID,
   String systemID, String notationName) {

    this.name = name;
    this.publicID = publicID;
    this.systemID = systemID;
    this.notationName = notationName;

  }

  public String getName() {
    return this.name;
  }

  public String getSystemID() {
    return this.systemID;
  }

  public String getPublicID() {
    return this.publicID;
  }

  public String getNotationName() {
    return this.notationName;
  }

  public boolean equals(Object o) {

    if (o instanceof UnparsedEntity) {
      UnparsedEntity entity = (UnparsedEntity) o;
      if (publicID == null) {
        return name.equals(entity.name)
         && systemID.equals(entity.systemID)
         && notationName.equals(entity.notationName);
      }
      else {
        return name.equals(entity.name)
         && systemID.equals(entity.systemID)
         && publicID.equals(entity.publicID)
         && notationName.equals(entity.notationName);
      }
    }
    return false;

  }

  public int hashCode() {

    if (publicID == null) {
      return name.hashCode() ^ systemID.hashCode()
       ^ notationName.hashCode();
    }
    else {
      return name.hashCode() ^ publicID.hashCode()
       ^ systemID.hashCode() ^ notationName.hashCode();
    }

  }

  public String toString() {

    StringBuffer result = new StringBuffer(name);
    // Only print the PUBLIC keyword and ID when there is one
    if (publicID != null) {
      result.append(" PUBLIC ");
      result.append(publicID);
    }
    else {
      result.append(" SYSTEM");
    }
    result.append(" ");
    result.append(systemID);

    return result.toString();
  }

}
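The equals() and hashCode() methods in Examples 7.18 and 7.19 branch on which IDs are null. On Java 7 and later, java.util.Objects can handle the null checks instead. The following is only a sketch of that alternative, using a hypothetical class name NullSafeNotation so it doesn't collide with Example 7.18. It is slightly stricter than the original: a null public ID is considered equal only to another null public ID.

```java
import java.util.Objects;

// Hypothetical null-safe variant of the Notation class in
// Example 7.18; only the comparison logic differs.
public final class NullSafeNotation {

  private final String name;
  private final String publicID;
  private final String systemID;

  public NullSafeNotation(String name, String publicID,
   String systemID) {
    this.name = name;
    this.publicID = publicID;
    this.systemID = systemID;
  }

  @Override
  public boolean equals(Object o) {

    if (!(o instanceof NullSafeNotation)) return false;
    NullSafeNotation n = (NullSafeNotation) o;
    // Objects.equals() treats two nulls as equal and a null
    // and a non-null value as unequal, without throwing.
    return Objects.equals(name, n.name)
     && Objects.equals(publicID, n.publicID)
     && Objects.equals(systemID, n.systemID);

  }

  @Override
  public int hashCode() {
    // Objects.hash() accepts null arguments
    return Objects.hash(name, publicID, systemID);
  }

}
```

The same pattern would apply to UnparsedEntity by adding notationName to both methods.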

When you later encounter an attribute of type ENTITY, ENTITIES, or NOTATION in the ContentHandler, you can use the getUnparsedEntity() and getNotation() methods to return the relevant data for that item. Example 7.20 is a simple program to list the unparsed entities and notations discovered in an XML document.

Example 7.20 A Program That Lists the Unparsed Entities and Notations Used in an XML Document
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.io.IOException;
import java.util.StringTokenizer;

public class EntityLister extends DefaultHandler {

  private UnparsedCache cache;

  public EntityLister(UnparsedCache cache) {
    this.cache = cache;
  }

  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes attributes) {

    for (int i = 0; i < attributes.getLength(); i++) {

      if (attributes.getType(i).equals("NOTATION")) {
        Notation n = cache.getNotation(attributes.getValue(i));
        System.out.println("Element " + qualifiedName
         + " has notation " + n);
      }
      else if (attributes.getType(i).equals("ENTITY")) {
        UnparsedEntity e = cache.getUnparsedEntity(
         attributes.getValue(i));
        System.out.println("Entity: " + e);
      }
      else if (attributes.getType(i).equals("ENTITIES")) {
        String entityNames = attributes.getValue(i);
        StringTokenizer st
         = new StringTokenizer(entityNames);
        while (st.hasMoreTokens()) {
           String name = st.nextToken();
           UnparsedEntity e = cache.getUnparsedEntity(name);
           System.out.println("Entity: " + e);
        }

      }

    }

  }


  public static void main(String[] args) {

    if (args.length <= 0) {
      System.out.println("Usage: java EntityLister URL");
      return;
    }
    String document = args[0];

    try {
      XMLReader parser = XMLReaderFactory.createXMLReader();

      // I want to use qualified names
      parser.setFeature(
       "http://xml.org/sax/features/namespace-prefixes", true);

      UnparsedCache cache = new UnparsedCache();
      parser.setDTDHandler(cache);
      parser.setContentHandler(new EntityLister(cache));

      parser.parse(document);
    }
    catch (Exception e) {
      System.out.println("Could not read document because "
       + e.getMessage());
    }

  }

}
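For a quick check without a real-world document, the same lookup can be tried against a small inline document. The sketch below is self-contained and deliberately simpler than Examples 7.17 through 7.20: the class name EntityAttributeDemo and the sample document are hypothetical, a plain map from entity name to notation name stands in for UnparsedCache, and the parser is obtained from JAXP's SAXParserFactory. One object serves as both the DTDHandler and the ContentHandler, because DefaultHandler implements both interfaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

public class EntityAttributeDemo extends DefaultHandler {

  // Hypothetical sample document with one ENTITY-typed attribute
  static final String DOC =
      "<?xml version=\"1.0\"?>\n"
    + "<!DOCTYPE figure [\n"
    + "  <!ELEMENT figure EMPTY>\n"
    + "  <!ATTLIST figure source ENTITY #REQUIRED>\n"
    + "  <!NOTATION gif SYSTEM \"http://www.example.com/gif\">\n"
    + "  <!ENTITY logo SYSTEM \"http://www.example.com/logo.gif\""
    + " NDATA gif>\n"
    + "]>\n"
    + "<figure source=\"logo\"/>";

  // Entity name -> notation name, filled in while the DTD is read
  private final Map<String, String> notationOf = new HashMap<>();
  private final List<String> report = new ArrayList<>();

  @Override
  public void unparsedEntityDecl(String name, String publicID,
   String systemID, String notationName) {
    notationOf.put(name, notationName);
  }

  @Override
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes attributes) {

    for (int i = 0; i < attributes.getLength(); i++) {
      if ("ENTITY".equals(attributes.getType(i))) {
        // The attribute value is the name of an unparsed entity
        String entity = attributes.getValue(i);
        report.add(qualifiedName + "/" + attributes.getQName(i)
         + " -> " + entity + " NDATA " + notationOf.get(entity));
      }
    }

  }

  // Parses DOC and returns one line per ENTITY attribute found
  public static List<String> resolve() throws Exception {

    EntityAttributeDemo handler = new EntityAttributeDemo();
    XMLReader parser = SAXParserFactory.newInstance()
     .newSAXParser().getXMLReader();
    parser.setDTDHandler(handler);
    parser.setContentHandler(handler);
    parser.parse(new InputSource(new StringReader(DOC)));
    return handler.report;

  }

  public static void main(String[] args) throws Exception {
    for (String line : resolve()) {
      System.out.println(line);
    }
  }

}
```

Because the DTD is reported before any element, the map is already populated by the time startElement() looks up the entity name.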

It took me a while to find an XML document in the wild that actually used notations and unparsed entities. But David Carlisle pointed out to me that DocBook uses notations to identify preformatted elements in which white space should be preserved. This book is written in DocBook, so I decided to run EntityLister across a rough draft of this chapter. Here's what came out:

% java EntityLister xmlreader.xml 
Element screen has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
Element screen has notation linespecific SYSTEM linespecific
Element programlisting has notation linespecific
 SYSTEM linespecific
...