PHP CookBook Free Open Book

11.1 Introduction

Most of the time, PHP is part of a web server, sending content to browsers. Even when you run it from the command line, it usually performs a task and then prints some output. PHP can also be useful, however, playing the role of a web browser — retrieving URLs and then operating on the content. Most recipes in this chapter cover retrieving URLs and processing the results, although there are a few other tasks in here as well, such as using templates and processing server logs.

There are four ways to retrieve a remote URL in PHP. Choosing one method over another depends on your needs for simplicity, control, and portability. The four methods are to use fopen( ), fsockopen( ), the cURL extension, or the HTTP_Request class from PEAR.

Using fopen( ) is simple and convenient. We discuss it in Recipe 11.2. The fopen( ) function automatically follows redirects, so if you use this function to retrieve the directory http://www.example.com/people and the server redirects you to http://www.example.com/people/, you'll get the contents of the directory index page, not a message telling you that the URL has moved. The fopen( ) function also works with both HTTP and FTP. It has three downsides, however: it can handle only HTTP GET requests (not HEAD or POST), you can't send additional headers or any cookies with the request, and you can retrieve only the response body with it, not the response headers.
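As a rough illustration of the fopen( ) approach, the sketch below reads a whole response body into a string. The helper name fetch_body( ) is made up for this example, and the code assumes the allow_url_fopen setting is enabled so that fopen( ) accepts URLs. It is not necessarily how Recipe 11.2 does it.

```php
<?php
// Sketch only: fetch_body() is a hypothetical helper, not a PHP built-in.
// Reads everything fopen( ) can deliver for $url and returns it as a string,
// or false if the stream could not be opened.
function fetch_body($url)
{
    // 'r' opens the stream read-only; for an http:// URL this sends a GET request.
    $fh = @fopen($url, 'r');
    if ($fh === false) {
        return false; // bad URL, network failure, or URL wrappers disabled
    }

    $body = '';
    while (!feof($fh)) {
        $chunk = fread($fh, 8192);
        if ($chunk === false) {
            break;
        }
        $body .= $chunk;
    }
    fclose($fh);

    return $body;
}

// Usage (requires network access):
// $page = fetch_body('http://www.example.com/people');
// if ($page === false) { die("Can't retrieve the page"); }
```

Because only the body comes back, a caller cannot tell from this return value whether a redirect happened along the way.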

Using fsockopen( ) requires more work but gives you more flexibility. We use fsockopen( ) in Recipe 11.3. After opening a socket with fsockopen( ), you need to write the appropriate HTTP request to that socket and then read and parse the response. This lets you add headers to the request and gives you access to all the response headers. However, you need additional code to parse the response properly and take any appropriate action, such as following a redirect.
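The extra work that fsockopen( ) involves might look something like the following sketch. The three helper functions are hypothetical names chosen for this example. The code speaks HTTP/1.0 so that the server closes the connection when it is done and does not use chunked encoding, and it keeps only the last value when a header appears more than once.

```php
<?php
// Hypothetical helpers for illustration; none of these names are PHP built-ins.

// Build the text of an HTTP/1.0 GET request, including any extra headers.
function build_get_request($host, $path, array $headers = array())
{
    $request = "GET $path HTTP/1.0\r\n";
    $request .= "Host: $host\r\n";
    foreach ($headers as $name => $value) {
        $request .= "$name: $value\r\n";
    }
    // A blank line marks the end of the request headers.
    return $request . "\r\n";
}

// Split a raw response into its status line, headers, and body.
function split_response($raw)
{
    // The first blank line separates the headers from the body.
    $parts = explode("\r\n\r\n", $raw, 2);
    $lines = explode("\r\n", $parts[0]);
    $status = array_shift($lines);

    $headers = array();
    foreach ($lines as $line) {
        $pair = explode(':', $line, 2);
        if (count($pair) == 2) {
            // Header names are case-insensitive, so store them lowercased.
            // A repeated header overwrites the earlier value in this sketch.
            $headers[strtolower(trim($pair[0]))] = trim($pair[1]);
        }
    }

    $body = isset($parts[1]) ? $parts[1] : '';
    return array($status, $headers, $body);
}

// Open a socket, send the request, and return the unparsed response
// (or false if the connection failed).
function fetch_raw($host, $path, array $headers = array(), $port = 80)
{
    $fh = @fsockopen($host, $port, $errno, $errstr, 10); // 10-second connect timeout
    if ($fh === false) {
        return false;
    }
    fwrite($fh, build_get_request($host, $path, $headers));
    $raw = stream_get_contents($fh);
    fclose($fh);
    return $raw;
}
```

With the response split this way, following a redirect means checking the status line for a 301 or 302 code and then making a second request to the URL in the location header.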

If you have access to the cURL extension or PEAR's HTTP_Request class, you should use those rather than fsockopen( ). cURL supports a number of different protocols (including HTTPS, discussed in Recipe 11.6) and gives you access to response headers. We use cURL in most of the recipes in this chapter. To use cURL, you must have the cURL library installed, available at http://curl.haxx.se. Also, PHP must be built with the --with-curl configuration option.
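For comparison, here is a minimal sketch of a cURL fetch that returns both the response headers and the body. The function name fetch_with_curl( ) and the keys of the array it returns are assumptions made for this example. The curl_* functions and CURLOPT_* constants come from the cURL extension, which must be loaded for the code to run.

```php
<?php
// Sketch only: fetch_with_curl() is a hypothetical helper name.
function fetch_with_curl($url)
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
    curl_setopt($ch, CURLOPT_HEADER, true);         // include response headers in the returned string
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects

    $response = curl_exec($ch);
    if ($response === false) {
        return array('error' => curl_error($ch));
    }

    // CURLINFO_HEADER_SIZE is the combined length of all header blocks received,
    // including those of any redirects that were followed.
    $header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    return array(
        'status'  => $status,
        'headers' => substr($response, 0, $header_size),
        'body'    => substr($response, $header_size),
    );
}

// Usage (requires network access):
// $result = fetch_with_curl('http://www.example.com/people');
// if (isset($result['error'])) { die($result['error']); }
// print $result['body'];
```

Unlike the socket approach, the redirect handling and response parsing are left to the library. The code only has to separate the headers from the body.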

PEAR's HTTP_Request class, which we use in Recipe 11.3, Recipe 11.4, and Recipe 11.5, doesn't support HTTPS, but does give you access to headers and can use any HTTP method. If this PEAR module isn't installed on your system, you can download it from http://pear.php.net/get/HTTP_Request. You can use the module as long as its files are in your include_path, which makes it a very portable solution.

Recipe 11.7 helps you go behind the scenes of an HTTP request to examine the headers in a request and response. If a request you're making from a program isn't giving you the results you're looking for, examining the headers often provides clues as to what's wrong.

Once you've retrieved the contents of a web page into a program, use Recipe 11.8 through Recipe 11.12 to help you manipulate those page contents. Recipe 11.8 demonstrates how to mark up certain words in a page with blocks of color. This technique is useful for highlighting search terms, for example. Recipe 11.9 provides a function to find all the links in a page. This is an essential building block for a web spider or a link checker. Converting between plain ASCII and HTML is covered in Recipe 11.10 and Recipe 11.11. Recipe 11.12 shows how to remove all HTML and PHP tags from a web page.
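To give a flavor of two of these manipulations, the short sketch below converts plain text to HTML and strips the tags from an HTML fragment using PHP's built-in functions. The sample strings are invented for this example, and the recipes themselves may take a different or more thorough approach.

```php
<?php
// Plain text to HTML: escape the special characters first,
// then put a <br /> tag in front of each newline.
$ascii = "Fish & chips\n<cheap>";
$html = nl2br(htmlspecialchars($ascii));

// HTML to plain text: strip_tags( ) removes HTML and PHP tags
// but keeps the text between them.
$page = '<p>Hello, <b>world</b>!</p>';
$text = strip_tags($page);

print $html . "\n";
print $text . "\n";
```

Escaping before calling nl2br( ) matters here: done in the other order, the inserted line-break tags would themselves be escaped.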

Another kind of page manipulation is using a templating system. Discussed in Recipe 11.13, templates give you freedom to change the look and feel of your web pages without changing the PHP plumbing that populates the pages with dynamic data. Similarly, you can make changes to the code that drives the pages without affecting the look and feel. Recipe 11.14 discusses a common server administration task — parsing your web server's access log files.

Two sample programs use the link extractor from Recipe 11.9. The program in Recipe 11.15 scans the links in a page and reports which are still valid, which have been moved, and which no longer work. The program in Recipe 11.16 reports on the freshness of links. It tells you when a linked-to page was last modified and if it's been moved.
