Processing XML with Java - A Guide to SAX, DOM, JDOM, JAXP, and TrAX

Receiving Documents

In general, a single XMLReader may parse multiple documents and may do so with the same ContentHandler. Consequently, it's important to tell where one document ends and the next document begins. To provide this information, the parser invokes startDocument() as soon as it begins parsing a new document and before it invokes any other methods in ContentHandler. It calls endDocument() after it has finished parsing the document, and it will not report any further content from that document. No arguments are passed to either of these methods, which serve no purpose other than marking the beginning and end of a complete XML document.
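The callback order can be observed directly. The following is a minimal sketch, not one of the book's examples: it assumes the XMLReader is obtained through JAXP's SAXParserFactory, and the DocumentBoundaryDemo and BoundaryLogger names are hypothetical. It uses the DefaultHandler adapter class so that only the callbacks of interest need to be written.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the names are invented for illustration.
class DocumentBoundaryDemo {

  // Records the order in which the parser invokes the callbacks.
  static class BoundaryLogger extends DefaultHandler {
    final List<String> events = new ArrayList<>();

    @Override
    public void startDocument() {
      events.add("startDocument");
    }

    @Override
    public void endDocument() {
      events.add("endDocument");
    }

    @Override
    public void startElement(String namespaceURI, String localName,
     String qualifiedName, Attributes atts) {
      events.add("startElement:" + qualifiedName);
    }
  }

  // Parses each document in turn with one XMLReader and one handler.
  static List<String> parseAll(String... xmlDocuments) throws Exception {
    // Assumption: the parser comes from JAXP rather than XMLReaderFactory.
    XMLReader reader
     = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    BoundaryLogger logger = new BoundaryLogger();
    reader.setContentHandler(logger);
    for (String xml : xmlDocuments) {
      reader.parse(new InputSource(new StringReader(xml)));
    }
    return logger.events;
  }

  public static void main(String[] args) throws Exception {
    for (String event : parseAll("<a/>", "<b><c/></b>")) {
      System.out.println(event);
    }
  }
}
```

For the two small documents in main(), each document's element events are bracketed by its own startDocument and endDocument pair, so the handler can always tell which document an event belongs to.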

Because an XMLReader may parse multiple documents with the same ContentHandler object, per-document data structures are normally initialized in the startDocument() method rather than in a constructor. These data structures can be flushed, saved, or committed as appropriate by the endDocument() method.

Caution

If you are using one ContentHandler for multiple documents, do not assume that the endDocument() method for the previous document actually ran. If one of the earlier methods such as startElement() threw an exception, it's likely that the parsing was not finished and that any cleanup code you put in endDocument() was not executed. For safety, it's a good idea to reinitialize all per-document data structures in startDocument().
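To make the risk concrete, here is a minimal sketch under stated assumptions: the ElementCounter handler, the "forbidden" element name, and the use of JAXP's SAXParserFactory are all invented for illustration. The handler aborts the first parse by throwing a SAXException from startElement(), then the same handler is reused for a second document.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class; the names are invented for illustration.
class ResetOnStartDemo {

  // Counts elements per document and commits each count in endDocument().
  static class ElementCounter extends DefaultHandler {
    final List<Integer> counts = new ArrayList<>();
    private int current;

    @Override
    public void startDocument() {
      // Reset here, in case the previous parse was abandoned midway
      // and endDocument() never got a chance to clean up.
      current = 0;
    }

    @Override
    public void startElement(String namespaceURI, String localName,
     String qualifiedName, Attributes atts) throws SAXException {
      if ("forbidden".equals(qualifiedName)) {
        throw new SAXException("Rejected element: " + qualifiedName);
      }
      current++;
    }

    @Override
    public void endDocument() {
      counts.add(current);
    }
  }

  static List<Integer> run() throws Exception {
    // Assumption: the parser comes from JAXP rather than XMLReaderFactory.
    XMLReader reader
     = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    ElementCounter counter = new ElementCounter();
    reader.setContentHandler(counter);
    try {
      // The handler abandons this parse after counting two elements.
      reader.parse(new InputSource(
       new StringReader("<a><b/><forbidden/></a>")));
    } catch (SAXException expected) {
      // The first document was rejected; continue with the next one.
    }
    reader.parse(new InputSource(new StringReader("<x><y/><z/></x>")));
    return counter.counts;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(run());
  }
}
```

Because the counter is reset in startDocument(), the last count recorded is 3, the number of elements in the second document. Had the reset lived only in endDocument(), the two elements counted before the exception could have leaked into the second document's total.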


For example, let's revise the tag stripper program so that it can operate on multiple XML documents in series. Furthermore, rather than printing the results on a Writer, we'll store them in a List of Strings. As is common in SAX programs, we need a data structure that holds the information collected from each document. For this program a StringBuffer suffices, stored in the currentDocument field. This field is initialized to a new StringBuffer object in the startDocument() method, then converted to a String and added to the documents list in the endDocument() method. The characters() method simply appends text to the currentDocument buffer. Example 6.6 demonstrates the necessary ContentHandler class.

Example 6.6 A ContentHandler Implementation That Resets Its Data Structures Between Documents
import org.xml.sax.*;
import java.util.List;

public class MultiTextExtractor implements ContentHandler {

  private final List<String> documents;

  // This field is deliberately not initialized in the
  // constructor. It is initialized for each document parsed, not
  // for each object constructed.
  private StringBuffer currentDocument;

  public MultiTextExtractor(List<String> documents) {

    if (documents == null) {
      throw new NullPointerException(
       "Documents list must be non-null");
    }
    this.documents = documents;
  }

  // Initialize the per-document data structures
  public void startDocument() {

    currentDocument = new StringBuffer();
  }

  // Flush and commit the per-document data structures
  public void endDocument() {

    String text = currentDocument.toString();
    documents.add(text);

  }

  // Update the per-document data structures
  public void characters(char[] text, int start, int length) {

    currentDocument.append(text, start, length);

  }

  // do-nothing methods
  public void setDocumentLocator(Locator locator) {}
  public void startPrefixMapping(String prefix, String uri) {}
  public void endPrefixMapping(String prefix) {}
  public void startElement(String namespaceURI, String localName,
   String qualifiedName, Attributes atts) {}
  public void endElement(String namespaceURI, String localName,
   String qualifiedName) {}
  public void ignorableWhitespace(char[] text, int start,
   int length) {}
  public void processingInstruction(String target,
   String data) {}
  public void skippedEntity(String name) {}

}
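
A handler like this might be driven as in the following sketch. To keep the block self-contained it does not reuse Example 6.6 itself. Instead it defines a hypothetical TextCollector that applies the same start/commit pattern on top of the DefaultHandler adapter class, which already supplies the do-nothing methods. Obtaining the XMLReader from JAXP's SAXParserFactory and using a StringBuilder in place of a StringBuffer are assumptions of this sketch, as are the sample documents.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical driver; the names are invented for illustration.
class MultiTextDemo {

  // Same pattern as Example 6.6, but DefaultHandler provides the
  // empty implementations of the callbacks that are not needed.
  static class TextCollector extends DefaultHandler {
    private final List<String> documents;
    private StringBuilder currentDocument;

    TextCollector(List<String> documents) {
      this.documents = documents;
    }

    @Override
    public void startDocument() {
      currentDocument = new StringBuilder();
    }

    @Override
    public void characters(char[] text, int start, int length) {
      currentDocument.append(text, start, length);
    }

    @Override
    public void endDocument() {
      documents.add(currentDocument.toString());
    }
  }

  // Extracts the text of each document, one list entry per document.
  static List<String> extract(String... xmlDocuments) throws Exception {
    List<String> results = new ArrayList<>();
    // Assumption: the parser comes from JAXP rather than XMLReaderFactory.
    XMLReader reader
     = SAXParserFactory.newInstance().newSAXParser().getXMLReader();
    reader.setContentHandler(new TextCollector(results));
    for (String xml : xmlDocuments) {
      reader.parse(new InputSource(new StringReader(xml)));
    }
    return results;
  }

  public static void main(String[] args) throws Exception {
    System.out.println(
     extract("<p>Hello <b>world</b></p>", "<note>Second</note>"));
  }
}
```

Each parsed document contributes exactly one entry to the list, so the two sample documents yield the strings "Hello world" and "Second". The parser may deliver the text of one document across several characters() calls, which is why the handler appends to a buffer rather than storing each call's text separately.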