Google Hacks
Hack 36 XooMLe: The Google API in Plain Old XML
Getting Google results in XML using the XooMLe wrapper.

When Google released its Web APIs in April 2002, everyone agreed that it was fantastic, but some thought it could have been better. Google's API was driven by the Simple Object Access Protocol (SOAP), which wasn't exactly what everyone was hoping for.

What's wrong with SOAP? Google made the biggest, best search engine in the world available as a true web service, so it must be a good thing, right? Sure, but a lot of people argued that by using SOAP, Google had made it unnecessarily difficult to access its service. They argued that simple HTTP-based technologies would have provided everything they needed while also making the service much simpler to use.

The irony was not lost on anyone: Google, so well known and widely used in part because of its simplicity, was now being slammed for making its service difficult for developers to access. The argument was out there: SOAP was bad, Google needed a REST!

Representational State Transfer (REST) is a model for web services that makes use of existing protocols and technologies, such as HTTP GET requests, URIs, and XML, to provide a transaction-based approach to web services. The argument was that REST provided a much simpler means of achieving the same results, given Google's limited array of functionality. REST proponents claimed that Google should have made its API available through the simpler approach of requesting a defined URI, including query string-based parameters such as the search term and the output encoding. The response would then be a simple XML document that included either results or an error of some sort.

After playing with the Google API, I had enough exposure to at least know my way around the WSDL and the other bits and pieces involved in working with Google. I read a lot of the suggestions and proposals for how Google "should have done it" and set about actually doing it.
The result was XooMLe (http://www.dentedreality.com.au/xoomle/).

The first step was to create a solid architecture for accessing the Google API. I was working in PHP, a popular and very powerful scripting language, so this was made very simple by grabbing a copy of Dietrich Ayala's SOAP access class, NuSOAP. Once I had that in place, it was a simple matter of writing a few functions to call the SOAP class, query Google, and then reformat the response into something a bit "lighter."

I chose to implement a system that would accept a request for a single URL (because at this stage I wasn't too familiar with the RESTful way of doing things) containing a number of parameters, depending on which Google method was being called. The information returned would depend on the type of request, as outlined here:

All the methods would also optionally return a standardized, XML-encoded error message if something went wrong, which would allow developers to easily determine whether their requests were successful.

Providing this interface required only a small amount of processing before returning the information to the user. For a call to doGoogleSearch, the results were just mapped across to the XML template and returned. doSpellingSuggestion just had to pull out the suggestion and send that back. doGetCachedPage had to decode the result (from base-64 encoding) and then strip off the first 5 lines of HTML, which contained a Google header. This allowed XooMLe to return just what the user requested: a clean, cached copy of a page, a simple spelling suggestion, or a set of results matching a search term.

Searching was XooMLe's first hurdle: returning SOAP-encoded results from Google in clean, custom XML tags, minus the "fluff." I chose to use an XML template rather than hardcoding the structure directly into my code. The template holds the basic structure of a result set returned from Google.
It includes things like the amount of time the search took, the title of each result, their URLs, plus other information that Google tracks. This XML template is based directly on the structure outlined in the WSDL and on the actual information returned from Google. It is parsed, and then sections of it are duplicated and modified as required, so that a clean XML document is populated with the results and sent to the user. If something goes wrong, an error message is encoded in a different XML template and sent back instead.

Once searching was operational, spelling suggestions were quickly added, simply removing the suggestion from its SOAP envelope and returning it as plain text. The cached pages required a small amount of manipulation: the returned information had to be converted back to a plain string (Google sends it as a base-64 encoded string), and then the Google header, which is automatically added to the pages in Google's cache, had to be removed. Once that was complete, the page was streamed back to the user, so that if she printed the results of the request directly to the screen, a cached copy of the web page would be displayed.

After I posted the results of this burst of development to the DentedReality web site, nothing much happened. No one knew about XooMLe, so no one used it. I happened to be reading Dave Winer's Scripting News, so I fired off an email to him about XooMLe, just suggesting that he might be interested in it. Five minutes later (literally) there was a link to it on Scripting News describing it as a "REST-style interface," and within 12 hours the site had received approximately 700 hits! It didn't stop there; the next morning when I checked my email, I had a message from Paul Prescod with some suggestions for making it more RESTful and improving its general functionality as a service.
After exchanging a few emails directly with Prescod, plus receiving a few other suggestions and comments from people on the REST-discuss Yahoo! Group (which I quickly became a member of), I went ahead and made a major revision to XooMLe. This version introduced a number of changes:
And thus a RESTful web service was born. XooMLe implements the full functionality of the Google API (and actually extends it in a few places), using a much simpler interface and output format. A XooMLe result set can be bookmarked, a spelling suggestion can be very easily obtained via a bookmarklet, results can be parsed in pretty much every programming language using simple, native functions, and cached pages are immediately usable upon retrieval.

XooMLe demonstrates that it was indeed quite feasible for Google to implement its API using the REST architecture, and it provides a useful wrapper to the SOAP functionality Google has chosen to expose. It is currently being used as an example of "REST done right" by a number of proponents of the model, including some Amazon/Google linked services being developed by one of the REST-discuss members.

On its own, XooMLe may not be particularly useful, but teamed with the imagination and programming prowess of the web services community, it will no doubt help create a new wave of toys, tools, and talking points.

36.1 How It Works

Basically, to use XooMLe you just need to "request" a web page, then do something with the result that gets sent back to you. Some people might call this a request-response architecture. Whatever you call it, you ask XooMLe for something, it asks Google for the same thing, formats the results, and hands them to you; from there on, you can do what you like with them. Long story short: everything you can ask the Google SOAP API, you can ask XooMLe.

36.1.1 Google Method: doGoogleSearch
36.1.2 Extra Features
36.1.3 Google Method: doSpellingSuggestion
36.1.4 Google Method: doGetCachedPage
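The cached-page clean-up described earlier (decode the base-64 payload, then drop Google's header lines) can be sketched in a few lines of Perl. This is only an illustration: XooMLe itself is written in PHP, the helper name clean_cached_page is hypothetical, and the default of 5 header lines is simply the figure quoted in the text.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64 encode_base64);

# Hypothetical helper, not XooMLe's own code: decode a base-64 payload
# and drop a fixed number of leading header lines.
sub clean_cached_page {
    my ($encoded, $header_lines) = @_;
    $header_lines = 5 unless defined $header_lines;    # figure quoted in the text

    # A limit of -1 keeps trailing empty fields, so a final newline survives
    my @lines = split /\n/, decode_base64($encoded), -1;
    splice @lines, 0, $header_lines;
    return join "\n", @lines;
}

# Invented sample: five stand-in header lines followed by a tiny page
my @header = map { "header line $_" } 1 .. 5;
my $sample = encode_base64(
    join "\n", @header, '<html>', '<body>Hello</body>', '</html>'
);
print clean_cached_page($sample), "\n";
```

Counting lines is fragile if Google ever changes its header, so a real wrapper might prefer to look for a marker in the page instead.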
36.1.5 Asking XooMLe Something (Forming Requests)

Asking XooMLe something is really easy; you can do it in a normal hyperlink, a bookmark, a Favorite, whatever. A request to XooMLe exists as a URL, which contains some special information. It looks something like this:

http://xoomle.dentedreality.com.au/search/?key=YourGoogleDeveloperKey&q=dented+reality

Enough generic examples! If you are talking to XooMLe, the address you need is:

http://xoomle.dentedreality.com.au/<method keyword>/
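A request URL of this shape can also be assembled programmatically. The sketch below is a minimal example using only core Perl; the helper names url_encode and xoomle_url are hypothetical, and it assumes plain byte-string values. Note that it encodes a space as %20 rather than the + seen in hand-written query strings.

```perl
use strict;
use warnings;

# Percent-encode everything except the RFC 3986 unreserved characters.
# Assumes byte strings (ASCII or already-encoded text).
sub url_encode {
    my ($value) = @_;
    $value =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord $1)/ge;
    return $value;
}

# Hypothetical helper: join a method keyword and an ordered list of
# name/value pairs into a XooMLe-style request URL.
sub xoomle_url {
    my ($method, @params) = @_;
    my @pairs;
    while (my ($name, $value) = splice @params, 0, 2) {
        push @pairs, url_encode($name) . '=' . url_encode($value);
    }
    return "http://xoomle.dentedreality.com.au/$method/?" . join '&', @pairs;
}

print xoomle_url('search',
    'key' => 'YourGoogleDeveloperKey',
    'q'   => 'dented reality',
), "\n";
# prints http://xoomle.dentedreality.com.au/search/?key=YourGoogleDeveloperKey&q=dented%20reality
```

Passing the parameters as an ordered list rather than a hash keeps them in the order given, which makes the resulting URL predictable.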
Your requests might be as short as the search URL shown above, or they might be fully fleshed out like the following:

http://xoomle.dentedreality.com.au/search/
?key=YourKey
&q=dented+reality
&maxResults=1
&start=0
&hl=en
&ie=ISO-8859-1
&filter=0
&restrict=countryAU
&safeSearch=1
&lr=en
&ie=latin1
&oe=latin1
&xsl=myxsl.xsl

Note that each option is on a different line so they're easier to read; properly formatted, they would be in one long string. All the available parameters are defined in the Google documentation, but just to refresh your memory: key means your Google Developer Key. Go get one if you don't have one already (and remember to URL-encode it when passing it in the query string as well!).

Another thing you might like to know is that XooMLe makes use of some fancy looping to allow you to request more than the allowed 10 results in one request. If you ask XooMLe to get (for example) 300 results, it will perform multiple queries to Google and send back 300 results to you in XML format. Keep in mind that this still uses up your request limit (1,000 queries per day) in blocks of 10, so in this case you'd drop 30 queries in one hit (and it would take a while to return that many results).

36.2 Error Messages

If you do something wrong, XooMLe will tell you in a nice little XML package. The errors all look something like this, but they have a different error message and contain that "arguments" array, which includes everything you asked it. Below are all the error messages that you can get, and why you will get them.
36.3 Putting XooMLe to Work: A SOAP::Lite Substitution Module

XooMLe is not only a handy way to get Google results in XML, it's a handy way to replace the required SOAP::Lite module that a lot of ISPs don't support. XooMLe.pm is a little Perl module best saved into the same directory as your hacks themselves.

# XooMLe.pm
# XooMLe is a drop-in replacement for SOAP::Lite designed to use
# the plain old XML to Google SOAP bridge provided by the XooMLe
# service.

package XooMLe;

use strict;
use LWP::Simple;
use XML::Simple;
use URI::Escape;

sub new {
    my $self = {};
    bless($self);
    return $self;
}

sub doGoogleSearch {
    # Arguments arrive in the same positional order as the SOAP call
    my($self, %args);
    ($self, @args{qw/ key q start maxResults
        filter restrict safeSearch lr ie oe /}) = @_;

    my $xoomle_url = 'http://xoomle.dentedreality.com.au';
    my $xoomle_service = 'search';

    # Query Google via XooMLe, URL-encoding each parameter value
    my $content = get(
        "$xoomle_url/$xoomle_service/?" .
        join '&', map { "$_=" . uri_escape($args{$_}) } keys %args
    );
    defined $content or die "No response from XooMLe\n";

    # Parse the XML; ForceArray keeps a lone <item> as a one-element list
    my $results = XMLin($content, ForceArray => ['item']);

    # Normalize
    $results->{GoogleSearchResult}->{resultElements} =
        $results->{GoogleSearchResult}->{resultElements}->{item};

    foreach (@{$results->{GoogleSearchResult}->{'resultElements'}}) {
        $_->{URL} = $_->{URL}->{content};
        # Empty elements parse as empty hashes; turn them into empty strings
        ref $_->{snippet} eq 'HASH' and $_->{snippet} = '';
        ref $_->{title} eq 'HASH' and $_->{title} = '';
    }

    return $results->{GoogleSearchResult};
}

1;
36.4 Using the XooMLe Module

Here's a little script to show our home-brewed XooMLe module in action. It's no different, really, from any number of hacks in this book. The only minor alterations necessary to make use of XooMLe instead of SOAP::Lite are the use XooMLe line and the new XooMLe call; the SOAP::Lite lines they replace are left in as comments.

#!/usr/bin/perl
# xoomle_google2csv.pl
# Google Web Search Results via XooMLe 3rd party web service
# exported to CSV suitable for import into Excel
# Usage: xoomle_google2csv.pl "{query}" [> results.csv]

# Your Google API developer's key
my $google_key = 'insert key here';

use strict;

# Uses our home-brewed XooMLe Perl module
# use SOAP::Lite
use XooMLe;

$ARGV[0] or die qq{usage: perl xoomle_google2csv.pl "{query}"\n};

# Create a new XooMLe object rather than using SOAP::Lite
# my $google_search = SOAP::Lite->service("file:$google_wdsl");
my $google_search = new XooMLe;

my $results = $google_search -> doGoogleSearch(
    $google_key, shift @ARGV, 0, 10, "false", "",
    "false", "", "latin1", "latin1"
);

@{$results->{'resultElements'}} or warn 'No results';

print qq{"title","url","snippet"\n};

foreach (@{$results->{'resultElements'}}) {
    # double escape " marks
    $_->{title} =~ s!"!""!g;
    $_->{snippet} =~ s!"!""!g;
    my $output = qq{"$_->{title}","$_->{URL}","$_->{snippet}"\n};
    # drop all HTML tags
    $output =~ s!<.+?>!!g;
    print $output;
}
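To try the CSV formatting without a developer key or network access, the same quoting steps can be exercised on a hand-made result structure. This is a sketch only: the csv_row helper is hypothetical, the sample data is invented, and the hash layout (a resultElements list of title, URL, and snippet) is assumed from what the module above returns. Unlike the script, it strips tags from each field before quoting it.

```perl
use strict;
use warnings;

# Hypothetical helper: one result hash in, one CSV line out.
sub csv_row {
    my ($result) = @_;
    my @fields = map {
        my $text = defined $result->{$_} ? $result->{$_} : '';
        $text =~ s/<.+?>//g;    # drop HTML tags such as <b>
        $text =~ s/"/""/g;      # double any embedded quote marks
        qq{"$text"};
    } qw(title URL snippet);
    return join(',', @fields) . "\n";
}

# Invented sample data shaped like the module's return value
my $results = {
    resultElements => [
        {
            title   => '<b>Dented</b> Reality',
            URL     => 'http://www.dentedreality.com.au/',
            snippet => 'Home of "XooMLe"',
        },
    ],
};

print qq{"title","url","snippet"\n};
print csv_row($_) for @{ $results->{resultElements} };
```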
36.5 Running the Hack

Run the script from the command line, providing a query and sending the output to a CSV file you wish to create or to which you wish to append additional results (use >> rather than > to append to an existing file). For example, using "restful SOAP" as our query and results.csv as our output:

$ perl xoomle_google2csv.pl "restful SOAP" > results.csv
Leaving off the > and CSV filename sends the results to the screen for your perusal.

36.6 Applicability

In the same manner, you can adapt just about any SOAP::Lite-based hack in this book, and those you've made up yourself, to use the XooMLe module.
In general, bear in mind that your mileage may vary, and don't be afraid to tweak.

36.7 See Also
—Beau Lebens and Rael Dornfest