PHP CookBook

Recipe 12.4 Parsing XML with the DOM

12.4.1 Problem

You want to parse an XML file using the DOM API. This puts the file into a tree, which you can process using DOM functions. With the DOM, it's easy to search for and retrieve elements that fit a certain set of criteria.

12.4.2 Solution

Use PHP's DOM XML extension. Here's how to read XML from a file:

$dom = domxml_open_file('books.xml');

Here's how to read XML from a variable:

$dom = domxml_open_mem($books);

You can also get just a single node. Here's how to get the root node:

$root = $dom->document_element( );

Here's how to do a depth-first recursion to process all the nodes in a document:

function process_node($node) {
    if ($node->has_child_nodes( )) {
        foreach($node->child_nodes( ) as $n) {
            process_node($n);
        }
    }

    // process leaves
    if ($node->node_type( ) == XML_TEXT_NODE) {
        $content = rtrim($node->node_value( ));
        if ($content !== '') {   // unlike empty(), this keeps a text node containing "0"
            print "$content\n";
        }
    }

}
process_node($root);
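
The domxml functions above exist only in PHP 4. As a rough sketch of the same depth-first traversal for readers on PHP 5 or later, the version below assumes the bundled DOM extension (the DOMDocument class), which this recipe does not cover. The sample XML string and the collect_text( ) helper are made up for illustration; instead of printing each leaf as it is found, the helper gathers the values in an array.

```php
<?php
// Assumption: PHP 5+ with the bundled DOM extension, not the PHP 4 domxml API.
// The XML below is a made-up sample.
$books = '<books><book><title>PHP Cookbook</title><author>Sklar</author></book></books>';

$dom = new DOMDocument();
$dom->loadXML($books);
$root = $dom->documentElement;

// Hypothetical helper: depth-first walk that gathers non-blank text nodes.
function collect_text($node, &$found) {
    if ($node->hasChildNodes()) {
        foreach ($node->childNodes as $n) {
            collect_text($n, $found);
        }
    }

    // process leaves
    if ($node->nodeType == XML_TEXT_NODE) {
        $content = rtrim($node->nodeValue);
        if ($content !== '') {
            $found[] = $content;
        }
    }
}

$found = array();
collect_text($root, $found);
print implode("\n", $found) . "\n";
```

Returning the values rather than printing them inside the recursion makes the walk easier to reuse, but either approach visits the nodes in the same order.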

12.4.3 Discussion

The W3C's DOM provides a platform- and language-neutral interface to the structure and content of a document. Using the DOM, you can read an XML document into a tree of nodes and then maneuver through the tree to locate information about a particular element or elements that match your criteria. This is called tree-based parsing. In contrast, the non-DOM XML functions allow you to do event-based parsing.

Additionally, you can modify the structure by creating, editing, and deleting nodes. In fact, you can use the DOM XML functions to author a new XML document from scratch; see Recipe 12.3.

One of the major advantages of the DOM is that, because implementations follow the W3C's specification, many languages implement DOM functions in a similar manner. Therefore, the work of translating logic and instructions from one application to another is considerably simplified. PHP 4.3 comes with an updated series of DOM functions that are in stricter compliance with the DOM standard than previous versions of PHP. However, the functions are not yet 100% compliant. Future PHP versions should bring a closer alignment, but this may break some applications, which will then need minor updates. Check the DOM XML material in the online PHP Manual at http://www.php.net/domxml for changes. Functions from earlier versions of PHP are still available, but deprecated.

The DOM is large and complex. For more information, read the specification at http://www.w3.org/DOM/ or pick up a copy of XML in a Nutshell; Chapter 18 discusses the DOM.

For DOM parsing, PHP uses libxml, developed for the Gnome project. You can download it from http://www.xmlsoft.org. To activate it, configure PHP with --with-dom.
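
Because the extension is optional, a script can check for it before calling any DOM XML function. The sketch below is one way to do that, not an official recommendation. The dom_backend( ) helper is hypothetical, and the fallback check for the DOMDocument class assumes the DOM extension bundled with PHP 5 and later, which is outside the scope of this recipe.

```php
<?php
// Hypothetical helper: report which DOM API, if any, this PHP build offers.
function dom_backend() {
    // PHP 4 configured with --with-dom exposes the domxml_* functions.
    if (function_exists('domxml_open_file')) {
        return 'domxml';
    }
    // Assumption: PHP 5+ builds expose the DOMDocument class instead.
    if (class_exists('DOMDocument')) {
        return 'dom';
    }
    return 'none';
}

print "DOM backend: " . dom_backend() . "\n";
```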

DOM functions in PHP are object-oriented. To move from one node to another, call methods such as $node->child_nodes( ), which returns an array of node objects, and $node->parent_node( ), which returns the parent node object. Therefore, to process a node, check its type and call a corresponding method:

// $node is the DOM parsed node <book cover="soft">PHP Cookbook</book>
$type = $node->node_type();

switch($type) { 
case XML_ELEMENT_NODE:
    // I'm a tag. I have a tagname property.
    print $node->node_name();  // prints the tagname property: "book" 
    print $node->node_value(); // null
    break;
case XML_ATTRIBUTE_NODE:
    // I'm an attribute. I have a name and a value property.
    print $node->node_name();  // prints the name property: "cover"
    print $node->node_value(); // prints the value property: "soft"
    break;
case XML_TEXT_NODE:
    // I'm a piece of text inside an element.
    // I have a name and a content property.
    print $node->node_name();  // prints the name property: "#text"
    print $node->node_value(); // prints the content property: "PHP Cookbook"
    break;
default:
    // another type
    break;
}
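
For comparison, here is a minimal sketch of the same type check on PHP 5 or later. It assumes the bundled DOM extension (DOMDocument), where the information is exposed as the nodeType, nodeName, and nodeValue properties rather than as methods. The describe_node( ) helper is hypothetical.

```php
<?php
// Assumption: PHP 5+ DOM extension (DOMDocument), not the PHP 4 domxml API.
$dom = new DOMDocument();
$dom->loadXML('<book cover="soft">PHP Cookbook</book>');

$element   = $dom->documentElement;               // the <book> element
$attribute = $element->getAttributeNode('cover'); // the cover attribute
$text      = $element->firstChild;                // the text inside <book>

// Hypothetical helper: describe a node according to its type.
function describe_node($node) {
    switch ($node->nodeType) {
    case XML_ELEMENT_NODE:
        return 'element ' . $node->nodeName;
    case XML_ATTRIBUTE_NODE:
        return 'attribute ' . $node->nodeName . '=' . $node->nodeValue;
    case XML_TEXT_NODE:
        return 'text ' . $node->nodeName . ' ' . $node->nodeValue;
    default:
        return 'other';
    }
}

foreach (array($element, $attribute, $text) as $node) {
    print describe_node($node) . "\n";
}
```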

To automatically search through a DOM tree for specific elements, use get_elements_by_tagname( ). Here's how to do so with multiple book records:

<books>
    <book>
        <title>PHP Cookbook</title>
        <author>Sklar</author>
        <author>Trachtenberg</author>
        <subject>PHP</subject>
    </book>
    <book>
        <title>Perl Cookbook</title>
        <author>Christiansen</author>
        <author>Torkington</author>
        <subject>Perl</subject>
    </book>
</books>

Here's how to find all authors:

// find and print all authors
$authors = $dom->get_elements_by_tagname('author');

// loop through author elements
foreach ($authors as $author) { 
    // child_nodes( ) returns the text nodes that hold the author values
    $text_nodes = $author->child_nodes( );

    foreach ($text_nodes as $text) {
        print $text->node_value( );
    }
    }
    print "\n";
}

The get_elements_by_tagname( ) function returns an array of element node objects. By looping through each element's children, you can get to the text node associated with that element. From there, you can pull out the node values, which in this case are the names of the book authors, such as Sklar and Trachtenberg.
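
A tag-name search can also be scoped to a single element, for example to group the authors under their book's title. The following is a minimal sketch of that idea for PHP 5 or later, assuming the bundled DOM extension (DOMDocument) rather than the PHP 4 domxml functions shown above. The $authors_by_title array is an illustrative name.

```php
<?php
// Assumption: PHP 5+ DOM extension (DOMDocument), not the PHP 4 domxml API.
$books = <<<XML
<books>
    <book>
        <title>PHP Cookbook</title>
        <author>Sklar</author>
        <author>Trachtenberg</author>
        <subject>PHP</subject>
    </book>
    <book>
        <title>Perl Cookbook</title>
        <author>Christiansen</author>
        <author>Torkington</author>
        <subject>Perl</subject>
    </book>
</books>
XML;

$dom = new DOMDocument();
$dom->loadXML($books);

// Map each book title to the list of its authors.
$authors_by_title = array();
foreach ($dom->getElementsByTagName('book') as $book) {
    // Search only inside the current <book> element.
    $title = $book->getElementsByTagName('title')->item(0)->textContent;
    $authors_by_title[$title] = array();
    foreach ($book->getElementsByTagName('author') as $author) {
        $authors_by_title[$title][] = $author->textContent;
    }
}

foreach ($authors_by_title as $title => $authors) {
    print $title . ': ' . implode(', ', $authors) . "\n";
}
```

Calling the search on each book element rather than on the whole document keeps every author attached to the right record.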

12.4.4 See Also

Recipe 12.2 for writing XML without DOM; Recipe 12.3 for writing XML with DOM; Recipe 12.5 for event-based XML parsing; documentation on domxml_open_file( ) at http://www.php.net/domxml-open-file, domxml_open_mem( ) at http://www.php.net/domxml-open-mem, and the DOM functions in general at http://www.php.net/domxml; more information about the underlying DOM C library at http://xmlsoft.org/.
