Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax Free Open Book


Receiving Skipped Entities

Validating parsers resolve all general entity references that occur in element content and in attribute values. Nonvalidating parsers, however, are not required to read the external DTD subset. Consider the simple XHTML document in Example 6.14.

Example 6.14 An XML Document Containing a Potentially Skipped Entity Reference
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
              "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
     <h1>My resum&eacute;</h1>
  </body>
</html>

If a parser does not read the DTD, it has no way of knowing what the entity reference &eacute; stands for, or even whether that entity is properly declared. Rather than treating the reference as an error, such a nonvalidating parser assumes that the entity is declared in the external DTD subset it didn't read. Because it doesn't know the replacement text, it reports a skipped entity instead, by invoking the ContentHandler's skippedEntity() callback method:

public void skippedEntity(String name) throws SAXException
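To watch this callback fire, the following sketch parses Example 6.14 with the external DTD turned off and records every name passed to skippedEntity(). It assumes the JDK's built-in SAX parser, which is derived from Apache Xerces, recognizes the Xerces-specific feature http://apache.org/xml/features/nonvalidating/load-external-dtd; other parsers may use a different switch or none at all. The class name SkippedEntityLogger and the method collectSkipped() are hypothetical, chosen only for this illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: lists the entities a nonvalidating parser skips
public class SkippedEntityLogger {

    // Example 6.14, inlined so the sketch needs no external files
    static final String EXAMPLE_6_14 =
          "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
        + "  \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n"
        + "  <body>\n"
        + "     <h1>My resum&eacute;</h1>\n"
        + "  </body>\n"
        + "</html>\n";

    // Parses without reading the external DTD subset and returns the
    // name of every entity reported through skippedEntity()
    static List<String> collectSkipped(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Assumption: Xerces-specific feature, honored by the JDK's parser
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        SAXParser parser = factory.newSAXParser();

        List<String> skipped = new ArrayList<>();
        parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler() {
            @Override
            public void skippedEntity(String name) {
                skipped.add(name);
            }
        });
        return skipped;
    }

    public static void main(String[] args) throws Exception {
        for (String name : collectSkipped(EXAMPLE_6_14)) {
            System.out.println("Skipped entity: " + name);
        }
    }
}
```

Because the DTD is never fetched, the sketch also runs offline. If the feature were left at its default, the parser would try to download the XHTML DTD and resolve &eacute; normally.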

For example, according to the XHTML 1.0 specification, if a User Agent such as a browser "encounters an entity reference (other than one of the predefined HTML entities) for which the User Agent has processed no declaration (which could happen if the declaration is in the external subset which the User Agent hasn't read), the entity reference should be processed as the characters (starting with the ampersand and ending with the semi-colon) that make up the entity reference."

In other words, rather than rendering &prescription_take; as the symbol ℞, the browser is supposed to display the literal text &prescription_take;. If you were writing an XHTML browser that did not validate but still conformed fully to XHTML 1.0, you would probably implement skippedEntity() by passing an ampersand, the entity's name, and a semicolon to the characters() method of the same content handler, like this:

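public void skippedEntity(String name)
 throws SAXException {

  // Parameter entities (reported as "%name") and the external
  // DTD subset (reported as "[dtd]") are not part of the text
  if (name.startsWith("%") || name.equals("[dtd]")) return;

  // Rebuild the literal reference, e.g. "&eacute;", and pass it
  // to characters() as if it were ordinary text
  char[] text = ("&" + name + ";").toCharArray();
  this.characters(text, 0, text.length);

}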
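A handler along these lines can be exercised end to end. The sketch below is a hypothetical LiteralEntityRenderer that accumulates all character data and, following the XHTML 1.0 rule quoted earlier, turns each skipped general entity back into its literal reference. As before, it assumes the JDK's built-in parser honors the Xerces-specific load-external-dtd feature; the class, the SAMPLE constant, and the render() helper are illustrative names, not part of SAX.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical "renderer": collects text, showing undeclared entities literally
public class LiteralEntityRenderer extends DefaultHandler {

    // A compact variant of Example 6.14 with no whitespace-only text nodes
    static final String SAMPLE =
          "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
        + "  \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<h1>My resum&eacute;</h1></body></html>";

    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    @Override
    public void skippedEntity(String name) throws SAXException {
        // Skipped parameter entities and the external subset are not text
        if (name.startsWith("%") || name.equals("[dtd]")) return;
        char[] ref = ("&" + name + ";").toCharArray();
        this.characters(ref, 0, ref.length);
    }

    // Parses xml without loading the external DTD and returns its text
    static String render(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Assumption: Xerces-specific feature, honored by the JDK's parser
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        LiteralEntityRenderer renderer = new LiteralEntityRenderer();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), renderer);
        return renderer.text.toString().trim();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render(SAMPLE));
    }
}
```

Routing the reference through characters() rather than appending to the buffer directly keeps a single code path for all text, so any later processing in characters() applies to skipped entities as well.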

Skipped entities can also appear in attribute values. For example:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" 
              "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
     <div purpose="resum&eacute;">
     ...
     </div>
  </body>
</html>

This is one of the few holes in SAX: the parser does not report a skipped entity in an attribute value at all. Instead, it calculates the attribute's value by simply deleting the entity reference. In this example, if the parser does not read the DTD, it reports the value of the purpose attribute as "resum".
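The following sketch probes this behavior by recording both the purpose attribute and any skipped-entity callbacks for the attribute example. It is an illustration under assumptions: it relies on the JDK's built-in, Xerces-derived parser and on the Xerces-specific load-external-dtd feature, and AttributeEntityProbe and probe() are hypothetical names. Other parsers could in principle behave differently.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical probe: what happens to a skipped entity inside an attribute?
public class AttributeEntityProbe extends DefaultHandler {

    static final String SAMPLE =
          "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Strict//EN\"\n"
        + "  \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd\">\n"
        + "<html xmlns=\"http://www.w3.org/1999/xhtml\"><body>"
        + "<div purpose=\"resum&eacute;\">text</div></body></html>";

    final List<String> skipped = new ArrayList<>();
    String purpose;

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        // The factory is not namespace aware, so match on the qualified name
        if (qName.equals("div")) purpose = atts.getValue("purpose");
    }

    @Override
    public void skippedEntity(String name) {
        skipped.add(name);
    }

    static AttributeEntityProbe probe(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Assumption: Xerces-specific feature, honored by the JDK's parser
        factory.setFeature(
            "http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        AttributeEntityProbe handler = new AttributeEntityProbe();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return handler;
    }

    public static void main(String[] args) throws Exception {
        AttributeEntityProbe result = probe(SAMPLE);
        System.out.println("purpose = \"" + result.purpose + "\"");
        System.out.println("skipped = " + result.skipped);
    }
}
```

If the parser behaves as described, the attribute value comes back truncated and the skipped list never mentions eacute, so a handler has no way to tell that anything was lost.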
