Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax Free Open Book

Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax

Previous Section Next Section

Parsing Non-XML Documents

One of the neatest tricks that a SAX filter can perform is presenting something that isn't an XML document as if it were one. The client application doesn't need to know that what it's parsing is really a tab-delimited text file or a relational database if the filter hides all of those details and just presents a view of that source as a well-formed XML document.

There are several advantages to using this approach over actually converting the source. For one, it's a lot more dynamically adaptable to changing data. For another, it's easier to integrate into existing XML processing chains. A small advantage is that you don't need to worry about escaping illegal characters like < and & because these would normally be unescaped before the parser passed them back to the client application.

For example, JDBC makes it easy to write a SAX filter that puts an XML face on an SQL table. The parse() method reads from a JDBC ResultSet which it annotates with tags and attributes by firing the appropriate events. The entire ResultSet becomes a single root table element. Each record is presented in a record element, which contains one field element for each field. Each field element contains the value of the field as text. It also has an xsi:type attribute that identifies the type of the field, and a name attribute that identifies the name of the field. An alternate approach would use the SQL field names as the element names rather than the generic field with a name attribute. Example 8.13 demonstrates.

Example 8.13 Accessing Databases through SAX
import org.xml.sax.*;
import org.xml.sax.helpers.*;
import java.sql.*;
import java.io.*;

public class DatabaseFilter extends XMLFilterImpl {
  private Connection connection;

  // The string passed to the constructor must be a JDBC URL that
  // contains all necessary information for connecting to the
  // database such as host, port, username, password, and
  // database name. For example,
  // jdbc:mysql://host:port]/dbname?user=username&password=pass
  // The driver should have been loaded before this method is
  // called
  public DatabaseFilter(String jdbcURL) throws SQLException {
    connection = DriverManager.getConnection(jdbcURL);
  }
  public void parse(InputSource in) throws SAXException,
   IOException, UnsupportedOperationException {
    throw new UnsupportedOperationException(
     "Can't read a database from an InputStream or Reader"
    );
  }

  public void parse(String selectQuery)
   throws SAXException, IOException {

    try {
      Statement statement = connection.createStatement();
      ResultSet data = statement.executeQuery(selectQuery);
      ResultSetMetaData metadata = data.getMetaData();
      int numFields = metadata.getColumnCount();

      Attributes emptyAttributes = new AttributesImpl();
      startElement("", "table", "table", emptyAttributes);

      while (data.next()) {
        startElement("", "record", "record", emptyAttributes);
        for (int field = 1; field <= numFields; field++) {
          AttributesImpl fieldAtts = new AttributesImpl();
          int type = metadata.getColumnType(field);
          String typeName = getSchemaType(type);
          fieldAtts.addAttribute(
           "http://www.w3.org/2001/XMLSchema-instance", "type",
           "xsi:type", "NMTOKEN", typeName);
          String name = metadata.getColumnName(field);
          fieldAtts.addAttribute(
           "", "name", "name", "NMTOKEN", name);

          // Convert nulls to empty elements with xsi:nil="true"
          Object value = data.getObject(field);
          if (value == null) {// null value in database
            fieldAtts.addAttribute(
             "http://www.w3.org/2001/XMLSchema-instance", "nil",
             "xsi:nil", "NMTOKEN", "true");
            startElement("", "field", "field", fieldAtts);
            endElement("", "field", "field");
          }
          else {// non-null value
            startElement("", "field", "field", fieldAtts);
            convertToXML(data, field, type);
            endElement("", "field", "field");
          }
        }
        endElement("", "record", "record");
      }
      endElement("", "table", "table");
      statement.close();
    }
    catch (SQLException e) { // convert exception type
      throw new SAXException(e);
    }

  }

  // I want the XML document to store values in the standard W3C
  // XML Schema Language forms. This requires certain conversions
  // depending on the type of the data
  private void convertToXML(ResultSet data, int field, int type)
   throws SQLException, SAXException {
    switch (type) {
      case Types.BINARY:
      case Types.VARBINARY:
      case Types.LONGVARBINARY:
        hexEncode(data.getBinaryStream(field));
        break;
      case Types.BLOB:
        Blob blob = data.getBlob(field);
        hexEncode(blob.getBinaryStream());
        break;
      case Types.CLOB:
        Clob clob = data.getClob(field);
        Reader r = clob.getCharacterStream();
        char[] text = new char[1024];
        int numRead;
        try {
          while ((numRead = r.read(text, 0, 1024)) != -1) {
            escapeText(text, 0, numRead);
            characters(text, 0, numRead);
          }
        }
        catch (IOException e) {
          throw new SAXException("Read from CLOB failed", e);
        }
        break;
      case Types.ARRAY:
        Array array = data.getArray(field);
        writeArray(array);
        break;
      default: // All other types can be handled as strings
        Object o = data.getObject(field);
        if (o == null) return;
        String s = o.toString();
        char[] value = s.toCharArray();
        escapeText(value, 0, value.length);
        characters(value, 0, value.length);
    }
  }

  private void hexEncode(InputStream in)
   throws SQLException, SAXException {
    try {
      int octet;
      while ((octet = in.read()) != -1) {
        StringWriter out = new StringWriter(2);
        if (octet < 16) out.write('0');
        out.write(Integer.toHexString(octet));
        char[] text = out.toString().toCharArray();
        characters(text, 0, 2);
      }
    }
    catch (IOException e) {
      throw new SAXException(e);
    }

  }

  // String types may contain C0 control characters that are
  // not legal in XML. I convert these to the Unicode replacement
  // character 0xFFFD
  private void escapeText(char[] text, int start, int length) {
    for (int i = start; i < length; i++) {
        text[i] = escapeChar(text[i]);
    }
  }

  private char escapeChar(char c) {
    if (c >= 0x20) return c;
    else if (c == '\n') return c;
    else if (c == '\r') return c;
    else if (c == '\t') return c;
    return '\uFFFD';
  }

  private void writeArray(Array array)
   throws SQLException, SAXException {

    ResultSet data = array.getResultSet();
    int type = array.getBaseType();
    String typeName = getSchemaType(type);

    while (data.next()) {
      AttributesImpl fieldAtts = new AttributesImpl();
      fieldAtts.addAttribute(
        "http://www.w3.org/2001/XMLSchema-instance", "type",
        "xsi:type", "NMTOKEN", typeName);
      startElement("", "component", "component", fieldAtts);
      convertToXML(data, 2, type);
      endElement("", "component", "component");
    }

  }

  public static String getSchemaType(int type) {

    switch (type) {
      case Types.ARRAY:         return "array";
      case Types.BIGINT:        return "xsd:long";
      case Types.BINARY:        return "xsd:hexBinary";
      case Types.BIT:           return "xsd:boolean";
      case Types.BLOB:          return "xsd:hexBinary";
      case Types.CHAR:          return "xsd:string";
      case Types.CLOB:          return "xsd:string";
      case Types.DATE:          return "xsd:date";
      case Types.DECIMAL:       return "xsd:decimal";
      case Types.DOUBLE:        return "xsd:double";
      case Types.FLOAT:         return "xsd:decimal";
      case Types.INTEGER:       return "xsd:int";
      case Types.JAVA_OBJECT:   return "xsd:string";
      case Types.LONGVARBINARY: return "xsd:hexBinary";
      case Types.LONGVARCHAR:   return "xsd:string";
      case Types.NUMERIC:       return "xsd:decimal";
      case Types.REAL:          return "xsd:float";
      case Types.REF:           return "xsd:IDREF";
      case Types.SMALLINT:      return "xsd:short";
      case Types.STRUCT:        return "struct";
      case Types.TIME:          return "xsd:time";
      case Types.TIMESTAMP:     return "xsd:dateTime";
      case Types.TINYINT:       return "xsd:byte";
      case Types.VARBINARY:     return "xsd:hexBinary";
                                // most general type
      default:                  return "xsd:string";
    }

  }
  // Warn clients that this filter does not receive its events
  // from another XML parser
  public void setParent(XMLReader parent)
   throws UnsupportedOperationException {
    throw new UnsupportedOperationException(
     "A DatabaseFilter reads from an underlying SQL database;"
     + " not an underlying XML parser"
    );
  }

}

The trickiest part of this design was not the XML output. It was figuring out how to pass in the database connection parameters and the SQL queries. Sun has defined a JDBC URL, but such a URL only indicates the database, username, and password to access. It does not go all the way down to the level of the SQL query, so it could not be used as a system ID. I chose instead to pass in the database connection parameters through constructors, and the SQL query as a system ID string passed to the parse() method. This is not the customary URI system ID, but because this will only be used in this program and not directly in XML documents, this doesn't cause any major problems. Nonetheless, I changed the name of the formal argument to the parse() method from systemID to SQLQuery to try to make this more obvious. This filter cannot parse InputSource objects because it's not really possible to read a database from a stream. Thus the parse() method that takes an InputSource as an argument throws an UnsupportedOperationException.

A large part of the logic in this filter involves converting JDBC results into text. Most of the JDBC/SQL types have natural string representations that conveniently match W3C XML Schema Language primitive types. You can retrieve these by calling toString() on the corresponding Java object returned by the JDBC getObject() method. But first the binary types (BINARY, VARBINARY, LONGVARBINARY, and BLOB) must be hex encoded.[3] CLOBs don't have to be hex encoded, but they aren't available as a single string and thus have to be read from a stream too. Finally, arrays and structs require more detailed treatment as a complex type rather than a simple type.

[3] Base64 encoding would be more efficient, but I didn't want to introduce another class library just to do base64 encoding.

The sample driver program for this filter, shown in Example 8.14, is very similar to earlier driver programs, except that it requires the user to specify two arguments on the command line: the JDBC URL giving the connection parameters and the SQL query to execute. Because SQL queries normally contain white space, it must be enclosed in double quotes. A more serious example would reuse the same driver (or filter) for multiple queries.

Example 8.14 A Very Simple User Interface for Extracting XML Data from a Relational Database
import org.xml.sax.*;
import org.xml.sax.helpers.XMLReaderFactory;
import java.io.*;
import com.megginson.sax.DataWriter;
import java.sql.SQLException;

public class SQLDriver {

  public static void main(String[] args) {

    if (args.length < 2) {
      System.out.println(
       "Usage: java SQLDriver URL query driverClass");
      return;
    }
    String url = args[0];
    String query = args[1];
    String driverClass = "org.gjt.mm.mysql.Driver"; // MySQL
    if (args.length >= 3) driverClass = args[2];

    try {
      // Load JDBC driver
      Class.forName(driverClass).newInstance();
      // Technically, the newInstance() call isn't needed,
      // but the MM.MySQL documentation suggests this to
      // "work around some broken JVMs"

      XMLFilter filter = new DatabaseFilter(url);
      DataWriter out = new DataWriter();
      out.forceNSDecl(
       "http://www.w3.org/2001/XMLSchema-instance", "xsi");
      out.forceNSDecl("http://www.w3.org/2001/XMLSchema", "xsd");
      out.setIndentStep(2);
      filter.setContentHandler(out);
      filter.parse(query);
      out.flush();
    }
    catch (InstantiationException e) {
      System.out.println(driverClass
       + " could not be instantiated");
    }
    catch (ClassNotFoundException e) {
      System.out.println(driverClass + " could not be found");
    }
    catch (Exception e) {// SQL, SAX, and IO
      e.printStackTrace();
      System.out.println(e);
    }

  }

}

Instead of XMLWriter, this driver uses David Megginson's other public-domain writer program, DataWriter, which is designed for record-oriented data of this sort and inserts extra white space to prettify the output. Following is some of the output from when I ran SQLDriver against one of my databases:

% java -Dorg.xml.sax.driver=gnu.xml.aelfred2.XmlReader  SQLDriver 
'jdbc:mysql://luna.oit.unc.edu/NYWC?user=elharo&password=Password'
   "SELECT LastName, FirstName FROM composers
    WHERE LastName like 'A%'"
<table xmlns:xsd="http://www.w3.org/2001/XMLSchema"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <record>
    <field xsi:type="xsd:string" name="LastName">Anderson</field>
    <field xsi:type="xsd:string" name="FirstName">Beth</field>
  </record>
  <record>
    <field xsi:type="xsd:string" name="LastName">Austin</field>
    <field xsi:type="xsd:string" name="FirstName">Dorothea</field>
  </record>
  <record>
    <field xsi:type="xsd:string" name="LastName">Austin</field>
    <field xsi:type="xsd:string" name="FirstName">Elizabeth</field>
  </record>
  <record>
    <field xsi:type="xsd:string" name="LastName">Ayers</field>
    <field xsi:type="xsd:string" name="FirstName">Lydia</field>
  </record>
</table>

Warning

Depending on the platform, inputting the above arguments can be a little tricky. The SQL query contains white space, and the JDBC URL contains characters that are important to the shell, such as &. You may need to use quote marks to prevent some of these arguments from being interpreted by the shell. Details vary from platform to platform. Of course, this is just a quick hack to demonstrate the filter. A real program would provide a GUI, making these points moot.


    Previous Section Next Section


         Main Menu
    Main Page
    Table of content
    Copyright
    Praise for Elliotte Rusty Harold's 'Processing XML with Java™'
    List of Examples
    List of Figures
    Preface
    Part I: XML
    Part II: SAX
    Chapter 6. SAX
    Chapter 7. The XMLReader Interface
    Chapter 8. SAX Filters
    The Filter Architecture
    The XMLFilter Interface
    Content Filters
    The XMLFilterImpl Class
    Parsing Non-XML Documents
    Multihandler Adapters
    Summary
    Part III: DOM
    Part IV: JDOM
    Part V: XPath/XSLT
    Part VI: Appendixes


    More Books
    PHP Hacks
    Processing Xml With Java - A Guide To Sax, Dom, Jdom, Jaxp, And Trax
    The Koran (Holy Qur'an)
    Macromedia Flash 8 Bible
    Search Engine Optimization for Dummies
    YouTube Traffic
    PHP 5 for Dummies
    Harry Potter and The Chamber of Secrets
    Harry Potter and the Sorcerer's Stone
    The Pilgrim's Progress
    Wireless Hacks
    Flash Hacks. 100 Industrial-Strength Tips & Tools
    PayPal Hacks. 100 Industrial-Strength Tips and Tools
    Amazon Hacks
    Pdf Hacks
    The Da Vinci Code
    Google Hacks
    The Holy Bible
    Windows XP For Dummies
    Harry Potter and the Half-Blood Prince
    Seo Book
    Upgrading and Repairing Networks
    Macromedia Dreamweaver 8 UNLEASHED
    Windows XP Annoyances
    Windows XP Hacks
    Microsoft Windows XP Power Toolkit
    Teach Yourself MS Office In 24Hours
    iPod & iTunes Missing Manual
    PC Hacks 100 Industrial-Strength Tips and Tools
    PC Overclocking, Optimization, and Tuning - 2th Edition
    PC Hardware In A Nutshell 3rd Edition
    PC Hardware in a Nutshell, 2nd Edition
    Upgrading and Repairing PCs
    Google for Dummies
    MySQL Cookbook
    Teach Yourself Macromedia Flash 8 In 24 Hours
    PHP CookBook
    Sams Teach Yourself JavaScript in 24 Hours
    PHP5 Manual
    Free Games Paper Airplanes
    500 Juegos Gratis 500 Giochi Gratis 500 Jeux Gratuits 500 Jogos Gratis 500 Kostenlose Spiele