Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax Free Open Book


"Ignorable White Space"

One of the more obscure parts of the XML 1.0 specification is the perhaps misleadingly named "ignorable white space." This is white space that occurs between tags in places where the DTD does not allow mixed content. Consider the XML-RPC document in Example 6.13.

Example 6.13 A Document That Uses Ignorable White Space to Prettify the XML
<?xml version="1.0"?>
<!DOCTYPE methodCall [
  <!ELEMENT methodCall (methodName, params)>
  <!ELEMENT params (param+)>
  <!ELEMENT param (value)>
  <!ELEMENT value (string)>
  <!ELEMENT methodName (#PCDATA)>
  <!ELEMENT string (#PCDATA)>
]>
<methodCall>
  <methodName>lookupSymbol</methodName>
  <params>
    <param>
      <value>
        <string>
          Red Hat
        </string>
      </value>
    </param>
  </params>
</methodCall>

This example has quite a bit of white space that exists just for indenting. In particular, the spaces, carriage returns, and linefeeds between the following pairs of tags serve no other purpose:

  • <methodCall> and <methodName>

  • </methodName> and <params>

  • <params> and <param>

  • <param> and <value>

  • <value> and <string>

  • </string> and </value>

  • </value> and </param>

  • </param> and </params>

  • </params> and </methodCall>

Furthermore, the DTD declares methodCall, params, param, and value with element-only content, so none of them can contain #PCDATA, and the white space inside them is therefore known to be ignorable. A validating parser does not pass these white space characters to the characters() method. Instead it passes them to the ignorableWhitespace() method. A nonvalidating parser might do the same, or it might pass the ignorable white space to the characters() method instead. If the distinction matters to you, make sure you use a validating parser.
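As a rough illustration of that split, the following sketch counts how many characters arrive through each callback. The class name WhitespaceCounter, its two fields, and the count() helper are invented for this example; the parser is whatever JAXP implementation SAXParserFactory happens to locate, with validation switched on.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: tallies characters by the callback that delivered them.
class WhitespaceCounter extends DefaultHandler {

    int ignorable = 0;   // characters reported through ignorableWhitespace()
    int significant = 0; // characters reported through characters()

    @Override
    public void ignorableWhitespace(char[] text, int start, int length) {
        ignorable += length;
    }

    @Override
    public void characters(char[] text, int start, int length) {
        significant += length;
    }

    @Override
    public void error(SAXParseException e) throws SAXParseException {
        throw e; // treat validity errors as fatal instead of ignoring them
    }

    // Parses the document in the given string with a validating parser.
    static WhitespaceCounter count(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setValidating(true);
            SAXParser parser = factory.newSAXParser();
            WhitespaceCounter handler = new WhitespaceCounter();
            parser.parse(new InputSource(new StringReader(xml)), handler);
            return handler;
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse document", e);
        }
    }
}
```

Calling WhitespaceCounter.count() on Example 6.13 should leave the indentation between the structural tags in ignorable, and the content of the string element, white space included, in significant.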

The space and line break characters in the string element are not ignorable because the DTD allows this element to contain #PCDATA. This white space is passed to the characters() method along with the words Red and Hat. White space is considered ignorable only where #PCDATA is invalid.
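Because this white space is real content as far as the parser is concerned, any trimming is the application's job. The sketch below collects the text of each string element and keeps both the raw and the trimmed form. The class name StringValueHandler and its fields are hypothetical, and whether trimming is appropriate at all is an application-level assumption, not something the parser or the DTD decides.

```java
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: captures the content of the most recent <string> element.
class StringValueHandler extends DefaultHandler {

    private final StringBuilder buffer = new StringBuilder();
    private boolean inString = false;

    String raw;     // content exactly as the parser reported it
    String trimmed; // same content with leading and trailing white space removed

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        if (qName.equals("string")) {
            inString = true;
            buffer.setLength(0);
        }
    }

    @Override
    public void characters(char[] text, int start, int length) {
        // May be called several times for one element, so append each piece.
        if (inString) {
            buffer.append(text, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equals("string")) {
            inString = false;
            raw = buffer.toString();
            trimmed = raw.trim();
        }
    }
}
```

For Example 6.13, raw would include the line breaks and indentation around the words, while trimmed would hold just Red Hat.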

For purposes of this method, white space consists exclusively of the ASCII space (&#x20;), tab (&#x9;), carriage return (&#xD;), and linefeed (&#xA;). Unicode includes many more space characters, including next line (&#x85;), em space (&#x2003;), en space (&#x2002;), and more. However, these characters are never ignorable.
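If you need to test for this narrow definition in your own code, a small helper is enough. The sketch below uses a made-up class name, XmlWhiteSpace, and made-up method names. Note that Java's Character.isWhitespace() follows a broader definition (it accepts the em space, for example), so it is not a substitute.

```java
// Hypothetical helper: recognizes only the four XML 1.0 white space characters.
class XmlWhiteSpace {

    // True for space, tab, carriage return, and linefeed; false for everything else.
    static boolean isXmlWhiteSpace(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // True if every character in text[start] .. text[start+length-1] is XML white space.
    static boolean isAllXmlWhiteSpace(char[] text, int start, int length) {
        for (int i = start; i < start + length; i++) {
            if (!isXmlWhiteSpace(text[i])) {
                return false;
            }
        }
        return true;
    }
}
```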

The ignorableWhitespace() method has the same arguments and the same caveats as the characters() method. For example, there's no guarantee that each call to this method will contain the maximum contiguous run of ignorable white space; a parser may split one run across several calls. However, its text[] argument should contain nothing except space characters, tabs, carriage returns, and linefeeds, at least in the subarray that begins at start and extends for length characters.
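If you care about whole runs and not individual calls, you have to stitch the pieces together yourself. One possible approach, sketched here with the hypothetical class name IgnorableRunCollector, is to buffer each piece and close off the run at the next element boundary. This is a simplification: it assumes that only start-tags and end-tags separate runs, and it ignores comments and processing instructions.

```java
import java.util.ArrayList;
import java.util.List;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: reassembles runs of ignorable white space
// that the parser may have delivered in several pieces.
class IgnorableRunCollector extends DefaultHandler {

    private final StringBuilder current = new StringBuilder();
    final List<String> runs = new ArrayList<>();

    @Override
    public void ignorableWhitespace(char[] text, int start, int length) {
        // Read only the subarray the parser points at, never the whole array.
        current.append(text, start, length);
    }

    @Override
    public void startElement(String uri, String localName, String qName,
                             Attributes atts) {
        flush();
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        flush();
    }

    // Ends the current run, if any, and records it.
    private void flush() {
        if (current.length() > 0) {
            runs.add(current.toString());
            current.setLength(0);
        }
    }
}
```

Appending with the start and length arguments matters here because the rest of the text[] array may hold unrelated characters from the parser's internal buffer.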
