Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax Free Open Book


Alternatives to Java

When all you have is a hammer, most problems look a lot like nails. Since you're reading this book, I'm willing to bet that Java is your hammer of choice, and indeed Java is a very powerful hammer. But sometimes you really could use a screwdriver, and this may be one of those times. I must admit that the solution for imposing hierarchy developed in the last section feels more than a little like pounding a screw with a hammer. Maybe it would be better to use the hammer to set the screw and then use a screwdriver to drive it in. In this section I want to explore a few possible screwdrivers, including XSLT and XQuery. Rather than writing such complex Java code, I'll work in two steps. First, I'll use Java to get the data into the same simple XML format produced by Example 4.2, which closely matches the flat input data. Then I'll use XSLT to transform this simple intermediate format into the more hierarchical final format. To refresh your memory, the flat XML data is organized like this:

<?xml version="1.0"?> 
<Budget>
  <LineItem>
    <FY1994>-1982</FY1994>
    <FY1993>4946</FY1993>
    <FY1992>-3251</FY1992>
    <FY1991>-17373</FY1991>
    <FY1990>-90008</FY1990>
    <AccountCode>265197</AccountCode>
    <On-Off-BudgetIndicator>On-budget</On-Off-BudgetIndicator>
    <TransitionQuarter>0</TransitionQuarter>
    <FY1989>-80069</FY1989>
    <AccountName>Sale of scrap and salvage materials</AccountName>
    <FY1988>-72411</FY1988>
    <FY1987>-60964</FY1987>
    <FY1986>-61462</FY1986>
    <FY1985>-68182</FY1985>
    <FY1984>-79482</FY1984>
    <FY1983>0</FY1983>
    <FY1982>0</FY1982>
    <SubfunctionCode>051</SubfunctionCode>
    <FY1981>0</FY1981>
    <FY2006>-1000</FY2006>
    <FY1980>0</FY1980>
    <FY2005>-1000</FY2005>
    <FY2004>-1000</FY2004>
    <FY2003>-1000</FY2003>
    <FY2002>-1000</FY2002>
    <FY2001>-1000</FY2001>
    <FY2000>-2000</FY2000>
    <AgencyCode>007</AgencyCode>
    <BEACategory>Mandatory</BEACategory>
    <FY1979>0</FY1979>
    <FY1978>0</FY1978>
    <FY1977>0</FY1977>
    <FY1976>0</FY1976>
    <TreasuryAgencyCode>97</TreasuryAgencyCode>
    <AgencyName>Department of Defense--Military</AgencyName>
    <BureauCode>00</BureauCode>
    <BureauName>Department of Defense--Military</BureauName>
    <FY1999>-1000</FY1999>
    <FY1998>-2000</FY1998>
    <FY1997>-4000</FY1997>
    <FY1996>-1000</FY1996>
    <SubfunctionTitle>Department of Defense-Military</SubfunctionTitle>
    <FY1995>-1000</FY1995>
  </LineItem>
  <!-- several thousand more LineItem elements... -->
</Budget>

Imposing Hierarchy with XSLT

The XSLT stylesheet shown in Example 4.12 converts flat XML budget data of this type into an output document of the same form as that produced by Example 4.11. Because the input file is so large, you may need to raise the memory allocation for your XSLT processor before running the transform.

Example 4.12 An XSLT Stylesheet That Converts Flat XML Data to Hierarchical XML Data
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Try to make the output look half decent -->
  <xsl:output indent="yes" encoding="ISO-8859-1"/>

  <!-- Muenchian method -->
  <xsl:key name="agencies" match="LineItem" use="AgencyCode"/>
  <xsl:key name="bureaus"  match="LineItem"
    use="concat(AgencyCode,'+',BureauCode)"/>
  <xsl:key name="accounts" match="LineItem"
    use="concat(AgencyCode,'+',BureauCode,'+',AccountCode)"/>
  <xsl:key name="subfunctions" match="LineItem"
    use="concat(AgencyCode,'+',BureauCode,'+',AccountCode,
    '+',SubfunctionCode)"/>

  <xsl:template match="Budget">
    <Budget year='2001'>
      <xsl:for-each select="LineItem[generate-id()
       = generate-id(key('agencies',AgencyCode)[1])]">
        <Agency>
          <Name><xsl:value-of select="AgencyName"/></Name>
          <Code><xsl:value-of select="AgencyCode"/></Code>
          <xsl:for-each
            select="/Budget/LineItem[AgencyCode
            =current()/AgencyCode]
             [generate-id() =
               generate-id(key('bureaus',
                     concat(AgencyCode, '+', BureauCode))[1])]">
            <Bureau>
              <Name><xsl:value-of select="BureauName"/></Name>
              <Code><xsl:value-of select="BureauCode"/></Code>
              <xsl:for-each select="/Budget/LineItem
                  [AgencyCode=current()/AgencyCode]
                  [BureauCode=current()/BureauCode]
                  [generate-id() = generate-id(key('accounts',
                   concat(AgencyCode,'+',BureauCode,'+',
                                            AccountCode))[1])]">
                <Account>
                  <Name>
                    <xsl:value-of select="AccountName"/>
                  </Name>
                  <Code>
                    <xsl:value-of select="AccountCode"/>
                  </Code>
                  <xsl:for-each select=
                    "/Budget/LineItem
                     [AgencyCode=current()/AgencyCode]
                     [BureauCode=current()/BureauCode]
                     [AccountCode=current()/AccountCode]
                     [generate-id()=generate-id(
                       key('subfunctions', concat(AgencyCode,'+',
                       BureauCode,'+',AccountCode,'+',
                       SubfunctionCode))[1])]">
                    <Subfunction BEACategory="{BEACategory}"
                     BudgetIndicator="{On-Off-BudgetIndicator}">
                      <Title>
                       <xsl:value-of select="SubfunctionTitle"/>
                      </Title>
                      <Code>
                       <xsl:value-of  select="SubfunctionCode"/>
                      </Code>
                      <Amount>
                        <xsl:value-of select="FY2001"/>
                      </Amount>
                    </Subfunction>
                  </xsl:for-each>
                </Account>
              </xsl:for-each>
            </Bureau>
          </xsl:for-each>
        </Agency>
      </xsl:for-each>
    </Budget>
  </xsl:template>

</xsl:stylesheet>

The algorithm for converting flat data to hierarchical data with XSLT is known as the Muenchian method, after its inventor, Steve Muench of Oracle. The trick of the Muenchian method is to use the xsl:key element and the key() function to create node sets of all the LineItem elements that share the same agency, bureau, account, or subfunction. Inside the template, the generate-id() function compares the current node to the first node in its group, so output is generated only when the stylesheet is processing the first LineItem element with a given agency, bureau, account, or subfunction code. Also note that the select attributes in the xsl:for-each elements keep returning to the root rather than processing children and descendants as is customary. This reflects the fact that the hierarchy in the input is not the same as the hierarchy in the output.
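A stylesheet like Example 4.12 can be run from Java through TrAX (javax.xml.transform). To make the Muenchian test easier to see in isolation, the sketch below applies a cut-down, hypothetical stylesheet that groups LineItem elements by AgencyCode alone and reports how many items fall into each group. It is not Example 4.12; the class name, the stylesheet, and the three-item input are illustrative assumptions.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class MuenchianDemo {

    // Hypothetical one-level version of Example 4.12: group by AgencyCode only.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + "    xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "  <xsl:output method='xml' omit-xml-declaration='yes'/>"
      + "  <xsl:key name='agencies' match='LineItem' use='AgencyCode'/>"
      + "  <xsl:template match='Budget'>"
      + "    <Budget>"
      // Visit only the first LineItem of each AgencyCode group
      + "      <xsl:for-each select=\"LineItem[generate-id()"
      + "          = generate-id(key('agencies', AgencyCode)[1])]\">"
      + "        <Agency code='{AgencyCode}'>"
      // key() returns every LineItem in the current group
      + "          <xsl:value-of select=\"count(key('agencies', AgencyCode))\"/>"
      + "        </Agency>"
      + "      </xsl:for-each>"
      + "    </Budget>"
      + "  </xsl:template>"
      + "</xsl:stylesheet>";

    // Compiles the stylesheet with the default TrAX factory and applies it.
    static String transform(String stylesheet, String input)
            throws TransformerException {
        TransformerFactory factory = TransformerFactory.newInstance();
        Transformer transformer = factory.newTransformer(
            new StreamSource(new StringReader(stylesheet)));
        StringWriter out = new StringWriter();
        transformer.transform(
            new StreamSource(new StringReader(input)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String input = "<Budget>"
            + "<LineItem><AgencyCode>001</AgencyCode></LineItem>"
            + "<LineItem><AgencyCode>001</AgencyCode></LineItem>"
            + "<LineItem><AgencyCode>007</AgencyCode></LineItem>"
            + "</Budget>";
        System.out.println(transform(STYLESHEET, input));
    }
}
```

To apply the full Example 4.12 to real data, the same calls accept StreamSource objects constructed from files instead of strings.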

One minor advantage of using XSLT instead of Java data structures is that XSLT preserves the order of the input data. You'll notice that the output begins with the Legislative Branch agency, bureau, and Receipts, Central fiscal operations account, just as the input data does. The output produced by the Java program did not preserve this order.

Note

XSLT 2.0 will make it much easier to write stylesheets that group elements in this fashion. This will likely involve a new xsl:for-each-group element that groups elements according to an XPath expression, and a current-group() function that selects all members of the current group so that they can be processed together.

The output from Example 4.12 begins like this:

<?xml version="1.0" encoding="ISO-8859-1"?> 
<Budget year="2001">
   <Agency>
      <Name>Legislative Branch</Name>
      <Code>001</Code>
      <Bureau>
         <Name>Legislative Branch</Name>
         <Code>00</Code>
         <Account>
            <Name>Receipts, Central fiscal operations</Name>
            <Code/>
            <Subfunction BEACategory="Mandatory"
              BudgetIndicator="On-budget">
               <Title>Central fiscal operations</Title>
               <Code>803</Code>
               <Amount>0</Amount>
            </Subfunction>
            <Subfunction BEACategory="Net interest"
              BudgetIndicator="On-budget">
               <Title>Other interest</Title>
               <Code>908</Code>
               <Amount>0</Amount>
            </Subfunction>
         </Account>
         <Account>
            <Name>Charges for services to trust funds</Name>
            ...
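Preserving input order is also possible with Java data structures. This section doesn't show the Java program's grouping code, so the cause of its reordered output is an assumption here: grouping with a HashMap, whose iteration order is unspecified, would have exactly that effect. The sketch below instead groups line items by agency, and then by bureau within each agency, using LinkedHashMap, which iterates in insertion order. The LineItem record and its three fields are hypothetical simplifications of the flat data, and the record syntax needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderedGrouping {

    // Hypothetical stand-in for one flat LineItem element.
    record LineItem(String agencyCode, String bureauCode, String subfunctionCode) {}

    // Groups items by agency, then by bureau within that agency.
    // LinkedHashMap iterates in insertion order, so each group appears
    // where its first member appeared in the input.
    static Map<String, Map<String, List<LineItem>>> group(List<LineItem> items) {
        Map<String, Map<String, List<LineItem>>> agencies = new LinkedHashMap<>();
        for (LineItem item : items) {
            agencies
                .computeIfAbsent(item.agencyCode(), k -> new LinkedHashMap<>())
                .computeIfAbsent(item.bureauCode(), k -> new ArrayList<>())
                .add(item);
        }
        return agencies;
    }

    public static void main(String[] args) {
        List<LineItem> items = List.of(
            new LineItem("001", "00", "803"),
            new LineItem("001", "00", "908"),
            new LineItem("007", "00", "051"));
        // Prints one line per agency/bureau pair with its member count
        group(items).forEach((agency, bureaus) ->
            bureaus.forEach((bureau, members) ->
                System.out.println(agency + "/" + bureau + ": " + members.size())));
    }
}
```

Nesting the bureau map inside the agency map also keeps a bureau code such as 00, which both the Legislative Branch and the Department of Defense use in this data, from being merged across agencies.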

The XML Query Language

XSLT is Turing complete. Nonetheless, some operations are more than a little cumbersome in XSLT. XSLT's inventors definitely did not envision using the Muenchian method to impose hierarchy. The W3C has begun work on a language more suitable for querying XML documents, called, simply enough, the XML Query Language, or XQuery for short. XQuery is to XML documents what SQL is to relational tables. However, XQuery is limited to SELECT. It has no equivalent of INSERT, UPDATE, or DELETE. It is a read-only language.

Caution

This section describes bleeding-edge technology. The broad picture presented here is likely to be correct, but the details are almost certain to change. Furthermore, the exact subset of XQuery implemented by early experimental tools varies significantly from one product to the next.


XQuery queries are not, in general, well-formed XML. Although there is an XML syntax for XQuery, it is not intended to be used by human beings. Instead, humans are supposed to write in a more natural 4GL syntax, which will be compiled into XML documents if necessary. If you think about it, this shouldn't be so surprising: SQL statements aren't tables, so why should XQuery statements be XML documents?

The basic building block of an XQuery query is the FLWR (pronounced "flower") expression. FLWR is an acronym for for-let-where-return, the clauses that make up the expression. In brief: for each node in a node set, let a variable have a certain value, where some condition is true, and return an XML fragment based on the values of these variables. Variables are set and XML is returned using XPath 2.0 expressions.
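Java has no FLWR syntax, but the clauses map naturally onto an ordinary loop. This sketch uses the JDK's javax.xml.xpath API to mimic a hypothetical query, shown in the comments, that returns the agency name of every mandatory line item. The query, the class name, and the two-item sample document are illustrative assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FlwrInJava {

    // Mirrors:  for $item in /Budget/LineItem
    //           let $agency := $item/AgencyName
    //           where $item/BEACategory = "Mandatory"
    //           return $agency
    static List<String> mandatoryAgencies(Document doc) throws Exception {
        XPath xpath = XPathFactory.newInstance().newXPath();
        NodeList items = (NodeList) xpath.evaluate(
            "/Budget/LineItem", doc, XPathConstants.NODESET);       // for
        List<String> results = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            String agency = xpath.evaluate("AgencyName", item);      // let
            if ("Mandatory".equals(xpath.evaluate("BEACategory", item))) { // where
                results.add(agency);                                 // return
            }
        }
        return results;
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<Budget>"
            + "<LineItem><AgencyName>Legislative Branch</AgencyName>"
            + "<BEACategory>Mandatory</BEACategory></LineItem>"
            + "<LineItem><AgencyName>Judicial Branch</AgencyName>"
            + "<BEACategory>Discretionary</BEACategory></LineItem>"
            + "</Budget>");
        System.out.println(mandatoryAgencies(doc));
    }
}
```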

For example, here's an XQuery that generates a list of agency names from the flat XML budget:

for $name in document("budauth.xml")/Budget/LineItem/AgencyName 
return $name

The for clause iterates over every node in the node set returned by the XPath 2.0 expression document("budauth.xml")/Budget/LineItem/AgencyName. This expression returns a node set containing 3,175 AgencyName elements. The XQuery variable $name is set to each of these elements in turn. The return clause is evaluated for each value of $name. In this case, the return clause says simply to return the node to which the $name variable currently points. In this example, the $name variable always points to an AgencyName element; therefore, the output would begin like this:

<AgencyName>Legislative Branch</AgencyName> 
<AgencyName>Legislative Branch</AgencyName>
<AgencyName>Legislative Branch</AgencyName>
<AgencyName>Legislative Branch</AgencyName>
<AgencyName>Legislative Branch</AgencyName>
...

This is not a well-formed XML document because it does not have a root element. However, it is a well-formed XML document fragment.

You can use the XPath 2.0 distinct-values() function around the XPath expression to select only one of each AgencyName element:

for $name in distinct-values(document("budauth.xml")/Budget/ 
    LineItem/AgencyName)
return $name

The output would now begin like this, listing each agency name only once:

<AgencyName>Legislative Branch</AgencyName> 
<AgencyName>Judicial Branch</AgencyName>
<AgencyName>Department of Agriculture</AgencyName>
<AgencyName>Department of Commerce</AgencyName>
<AgencyName>Department of Defense--Military</AgencyName>
...
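In Java, a LinkedHashSet gives roughly the same effect as distinct-values(): repeated names are dropped, and each name keeps the position of its first occurrence. One difference in this sketch is that it collects the names as strings rather than as AgencyName elements. The class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.LinkedHashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class DistinctAgencies {

    // Rough equivalent of distinct-values(/Budget/LineItem/AgencyName),
    // keeping first-occurrence order.
    static Set<String> distinctAgencyNames(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        NodeList names = (NodeList) XPathFactory.newInstance().newXPath()
            .evaluate("/Budget/LineItem/AgencyName", doc, XPathConstants.NODESET);
        Set<String> distinct = new LinkedHashSet<>();
        for (int i = 0; i < names.getLength(); i++) {
            distinct.add(names.item(i).getTextContent()); // duplicates ignored
        }
        return distinct;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<Budget>"
            + "<LineItem><AgencyName>Legislative Branch</AgencyName></LineItem>"
            + "<LineItem><AgencyName>Legislative Branch</AgencyName></LineItem>"
            + "<LineItem><AgencyName>Judicial Branch</AgencyName></LineItem>"
            + "</Budget>";
        System.out.println(distinctAgencyNames(xml));
    }
}
```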

As well as copying existing elements, XQuery can create new elements. You can type the tags precisely where you want them to appear. To include the value of a variable (or other expression) inside the tags, enclose it in curly braces. For example, the following query places <Name> and </Name> tags around each agency name, rather than <AgencyName> and </AgencyName>. Notice also that it selects only the text content of each AgencyName element, rather than the complete element node:

for $name in distinct-values( 
    document("budauth.xml")//AgencyName/text())
return <Name>{$name }</Name>

The output now begins like this:

<Name>Legislative Branch</Name> 
<Name>Judicial Branch</Name>
<Name>Department of Agriculture</Name>
<Name>Department of Commerce</Name>
<Name>Department of Defense--Military</Name>
...
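The DOM API can construct new elements in a similar way. This sketch wraps each name in a freshly created Name element and serializes the result with an identity Transformer. Because a bare sequence of Name elements would be only a fragment, it adds a hypothetical Agencies root element so that the output is a well-formed document; the class and root names are assumptions.

```java
import java.io.StringWriter;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class NameElements {

    // Analogous to: return <Name>{$name}</Name>, inside a hypothetical root.
    static String toNameElements(List<String> names) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder().newDocument();
        Element root = doc.createElement("Agencies");
        doc.appendChild(root);
        for (String name : names) {
            Element nameElement = doc.createElement("Name");
            nameElement.setTextContent(name);
            root.appendChild(nameElement);
        }
        // Identity transform serializes the DOM tree as text
        Transformer serializer = TransformerFactory.newInstance().newTransformer();
        serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toNameElements(
            List.of("Legislative Branch", "Judicial Branch")));
    }
}
```

setTextContent() stores the raw string; the serializer escapes characters such as & when it writes the output.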

More complex queries typically require multiple variables. These can be set in a let clause based on XPath expressions that refer to the variable in the for clause. For example, this query selects distinct agency codes but returns agency names:

for $code in distinct-values(document("budauth.xml")//AgencyCode) 
let $name := $code/../AgencyName
return $name

A where clause further restricts the members of the node set for which results are generated. Its conditions can use boolean connectors such as and, or, and not(). For example, this query finds all the bureaus in the Department of Agriculture:

for $bureau in distinct-values(document("budauth.xml")/Budget/ 
    LineItem/BureauName)
where $bureau/../AgencyName = "Department of Agriculture"
return $bureau

XQuery expressions can nest: the return clause of one FLWR expression may contain another FLWR expression. For example, this query lists all the bureau names inside their respective agencies:

for $ac in distinct-values(document("budauth.xml")//AgencyCode) 
return
  <Agency>
    <Name>{$ac/../AgencyName/text() }</Name>
    {
      for $bc in distinct-values(
        document("budauth.xml")//LineItem[AgencyCode = $ac]/BureauCode)
      return
        <Bureau>
          <Name>{$bc/../BureauName/text() }</Name>
        </Bureau>
    }
  </Agency>

The output now begins like this:

<Agency> 
  <Name>Legislative Branch</Name>
  <Bureau>
    <Name>Legislative Branch</Name>
  </Bureau>
  <Bureau>
    <Name>Senate</Name>
  </Bureau>
  <Bureau>
    <Name>House of Representatives</Name>
  </Bureau>
  <Bureau>
    <Name>Joint Items</Name>
  </Bureau>
...

This is all the syntax needed to write a query that will convert flat budget data such as that produced by Example 4.2 into a hierarchical XML document. Example 4.13, which selects the data from 2001, demonstrates such a query.

Example 4.13 An XQuery That Converts Flat Data to Hierarchical Data
<Budget year="2001">
  {
  for $ac in distinct-values(document("budauth.xml")//AgencyCode)
  return
    <Agency>
      <Name>{$ac/../AgencyName/text() }</Name>
      <Code>{$ac/text() }</Code>
      {
        for $bc in distinct-values(
          document("budauth.xml")//LineItem[AgencyCode = $ac]/BureauCode)
        return
          <Bureau>
            <Name>{$bc/../BureauName/text() }</Name>
            <Code>{$bc/text() }</Code>
            {
            for $acct in distinct-values(
              document("budauth.xml")//LineItem
                [AgencyCode = $ac][BureauCode = $bc]/AccountCode)
            return
              <Account
                BEACategory="{$acct/../BEACategory/text() }">
                <Name>{$acct/../AccountName/text() }</Name>
                <Code>{$acct/text() }</Code>
                {
                  for $sfx
                    in document("budauth.xml")//SubfunctionCode
                  where $sfx/../AgencyCode = $ac
                    and $sfx/../BureauCode = $bc
                    and $sfx/../AccountCode = $acct
                  return
                    <Subfunction>
                      <Title>{$sfx/../SubfunctionTitle/text() }</Title>
                      <Code>{$sfx/text() }</Code>
                      <Amount>{$sfx/../FY2001/text() }</Amount>
                    </Subfunction>
                }
              </Account>
            }
          </Bureau>
      }
    </Agency>
  }
</Budget>

There's a lot more to XQuery, but this should give you an idea of what it can do. It's definitely worth a look any time you need to perform database-like operations on XML documents.
