Google Hacks

5.7 Understanding the Google API Query

The core of a Google application is the query. Without the query, there's no Google data, and without that, you don't have much of an application. Because of its importance, it's worth taking a little time to look into the anatomy of a typical query.

5.7.1 Query Essentials

The command in a typical Perl-based Google API application that sends a query to Google looks like:

my $results = $google_search ->
  doGoogleSearch(
    key, query, start, maxResults, 
    filter, restrict, safeSearch, lr, 
    ie, oe
  );

Usually the items within the parentheses are variables, numbers, or Boolean values (true or false). In the example above, I've included the names of the arguments themselves rather than sample values so you can see their definitions here:

key

This is where you put your Google API developer's key [Chapter 1]. Without a key, the query won't get very far.

query

This is your query, composed of keywords, phrases, and special syntaxes.

start

Also known as the offset, this integer value specifies at what result to start counting when determining which 10 results to return. If this number were 16, the Google API would return results 16-25. If 300, results 300-309 (assuming, of course, that your query found that many results). This is what's known as a "zero-based index"; counting starts at 0, not 1. The first result is result 0, and the 999th, 998. It's a little odd, admittedly, but you get used to it quickly—especially if you go on to do much programming. Acceptable values are 0 to 999, because Google only returns up to a thousand results for a query.
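To make the zero-based counting concrete, here is a minimal sketch that lists the start values you'd need to walk through a result set ten at a time. The start_offsets helper is hypothetical (it isn't part of the Google API); it simply applies the limits described above.

```perl
use strict;
use warnings;

# Hypothetical helper: list the start offsets needed to collect
# $wanted results in batches of $per_page (at most 10 per request).
sub start_offsets {
    my ($wanted, $per_page) = @_;
    $per_page = 10
        if !defined $per_page || $per_page < 1 || $per_page > 10;
    $wanted = 1000 if $wanted > 1000;    # Google returns at most 1,000 results
    my @offsets;
    for (my $start = 0; $start < $wanted; $start += $per_page) {
        push @offsets, $start;           # zero-based: 0, 10, 20, ...
    }
    return @offsets;
}

print join(", ", start_offsets(35)), "\n";    # prints "0, 10, 20, 30"
```

Each offset in the list would be passed as the start argument of one query, so collecting 35 results costs four queries.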

maxResults

This integer specifies the number of results you'd like the API to return. The API returns results in batches of up to ten, so acceptable values are 1 through 10.

filter

You might think the filter option concerns the SafeSearch filter for adult content. It doesn't. This Boolean value (true or false) specifies whether your results go through automatic query filtering, removing near-duplicate content (titles and snippets are very similar) and multiple (more than two) results from the same host or site. With filtering enabled, only the first two results from each host are included in the result set.
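If the per-host rule is hard to picture, the following sketch imitates it on a local list. It is an illustration only, not Google's implementation: the limit_per_host function and the host field are made up for the example, and the near-duplicate check is left out entirely.

```perl
use strict;
use warnings;

# Illustration only: keep at most $limit results per host, preserving order.
# Each result is a hash reference with a hypothetical "host" field.
sub limit_per_host {
    my ($limit, @results) = @_;
    my %seen;
    return grep { ++$seen{ $_->{host} } <= $limit } @results;
}

my @results = map { +{ host => $_ } } qw(a.com a.com b.org a.com b.org);
my @kept    = limit_per_host(2, @results);

print join(" ", map { $_->{host} } @kept), "\n";
# prints "a.com a.com b.org b.org"
```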

restrict

No, restrict doesn't have anything to do with SafeSearch either. It allows for restricting your search to one of Google's topical searches or to a specific country. Google has four topic restricts: U.S. Government (unclesam), Linux (linux), Macintosh (mac), and FreeBSD (bsd). You'll find the complete country list in the Google Web API documentation. To leave your search unrestricted, leave this option blank (usually signified by empty quotation marks, "").

safeSearch

Now here's the SafeSearch filtering option. This Boolean (true or false) specifies whether results returned will be filtered for questionable (read: adult) content.

lr

This stands for "language restrict" and it's a bit tricky. Google has a list of languages in its API documentation to which you can restrict search results, or you can simply leave this option blank and have no language restrictions.

There are several ways you can restrict to language. First, you can simply include a language code. If you wanted to restrict results to English, for example, you'd use lang_en. But you can also restrict results to more than one language, separating each language code with a | (pipe), signifying OR. lang_en|lang_de, then, constrains results to only those "in English or German."

You can omit languages from results by prepending them with a - (minus sign). -lang_en returns all results but those in English.
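Typing these strings by hand is easy to get wrong, so you might wrap the two rules in a pair of small functions. This is a sketch; lr_any and lr_not are hypothetical names, and they take bare language codes such as en or de.

```perl
use strict;
use warnings;

# Hypothetical helpers for building the lr argument.
sub lr_any {                      # OR several languages together
    my @codes = @_;
    return join("|", map("lang_$_", @codes));
}

sub lr_not {                      # exclude a single language
    my ($code) = @_;
    return "-lang_$code";
}

print lr_any("en", "de"), "\n";   # prints "lang_en|lang_de"
print lr_not("en"), "\n";         # prints "-lang_en"
```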

ie

This stands for "input encoding," allowing you to specify the character encoding used in the query you're feeding the API. Google's documentation says, "Clients should encode all request data in UTF-8 and should expect results to be in UTF-8." In the first iteration of Google's API program, the Google API documentation offered a table of encoding options (latin1, cyrillic, etc.), but now everything is UTF-8. In fact, requests for anything other than UTF-8 are summarily ignored.

oe

This stands for "output encoding." As with input encoding, everything's UTF-8.

5.7.2 A Sample

Enough with the placeholders; what does an actual query look like?

Take for example a query that uses variables for the key and the query, requests 10 results starting at result number 100 (actually the hundred-and-first result), and specifies filtering and SafeSearch be turned on. That query in Perl would look like this:

my $results = $google_search ->
  doGoogleSearch(
    $google_key, $query, 100, 10,
    "true", "", "true", "",
    "utf8", "utf8"
  );

Note that the key and query could just as easily have been passed along as quote-delimited strings:

my $results = $google_search ->
  doGoogleSearch(
    "12BuCK13mY5h0E/34KN0cK@ttH3Do0R", "+paleontology +dentistry", 100, 10,
    "true", "", "true", "",
    "utf8", "utf8"
  );

While things appear a little more complex when you start fiddling with the language and topic restrictions, the core query remains mostly unchanged; only the values of the options change.
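Ten positional arguments are easy to put in the wrong order. One way around that, sketched here, is a small function that accepts named options and returns the list in the order doGoogleSearch expects. The search_args name and the default values are my own assumptions, not part of the API.

```perl
use strict;
use warnings;

# Hypothetical helper: turn named options into the ten positional
# arguments, in the order doGoogleSearch expects them.
sub search_args {
    my (%opt) = @_;
    my %default = (            # assumed defaults, chosen for illustration
        start      => 0,
        maxResults => 10,
        filter     => "true",
        restrict   => "",
        safeSearch => "false",
        lr         => "",
        ie         => "utf8",
        oe         => "utf8",
    );
    my %arg = (%default, %opt);    # caller's options override the defaults
    return map { $arg{$_} }
        qw(key query start maxResults filter restrict safeSearch lr ie oe);
}

my @args = search_args(
    key        => "your-key-here",
    query      => "+paleontology +dentistry",
    start      => 100,
    safeSearch => "true",
);
print scalar(@args), " arguments, starting at result $args[2]\n";
# prints "10 arguments, starting at result 100"
```

With a $google_search object like the one in the examples above, the call would then be $google_search->doGoogleSearch(@args).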

5.7.3 Intersecting Language, Country, and Topic Restrictions

Sometimes you might want to restrict your results to a particular language in a particular country, or a particular language, particular country, and particular topic. Now here's where things start looking a little on the odd side.

Here are the rules:

  • Omit something by prepending it with a - (minus sign).

  • Separate restrictions with a . (period, or full stop)—spaces are not allowed.

  • Specify an OR relationship between two restrictions with a | (pipe).

  • Group restrictions with parentheses.

Let's say you want a query to return results in French, draw only from Canadian sites, and focus only within the Linux topic. Your query would look something like this:

my $results = $google_search -> 
  doGoogleSearch(
    $google_key, $query, 100, 10, 
    "true", "linux.countryCA", "true", "lang_fr", 
    "utf8", "utf8"
  );

For results from Canada or from France, you'd use:

"linux.(countryCA|countryFR)"

Or maybe you want results in French, yet from anywhere but France:

"linux.(-countryFR)"
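The four rules translate directly into three string-building functions. This is only a sketch, and the names r_not, r_any, and r_all are hypothetical; the output matches the restrict strings shown above.

```perl
use strict;
use warnings;

# Hypothetical helpers that follow the rules listed above.
sub r_not {                          # omit one restriction
    my ($part) = @_;
    return "(-$part)";
}

sub r_any {                          # OR, grouped in parentheses
    return "(" . join("|", @_) . ")";
}

sub r_all {                          # combine with periods, no spaces
    return join(".", @_);
}

print r_all("linux", "countryCA"), "\n";
# prints "linux.countryCA"
print r_all("linux", r_any("countryCA", "countryFR")), "\n";
# prints "linux.(countryCA|countryFR)"
print r_all("linux", r_not("countryFR")), "\n";
# prints "linux.(-countryFR)"
```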

5.7.4 Putting Query Elements to Use

You might use the different elements of the query as follows:

Using SafeSearch

If you're building a program that's for family-friendly use, you'll probably want to have SafeSearch turned on as a matter of course. But you can also use it to compare safe and unsafe results; [Hack #81] does just that. You could create a program that takes a word from a web form and checks its result counts in filtered and unfiltered searches, providing a "naughty rating" for the word based on the counts.
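How you turn two counts into a rating is up to you. As one possibility, this sketch scores a word by the percentage of results that disappear when SafeSearch is switched on. The naughty_rating function and its formula are assumptions made for illustration; the two counts would come from running the same query twice, once with safeSearch set to true and once with it set to false.

```perl
use strict;
use warnings;

# Hypothetical scoring: the share of results that SafeSearch removes,
# as a whole-number percentage.
sub naughty_rating {
    my ($unfiltered, $filtered) = @_;
    return 0 if $unfiltered <= 0;
    my $removed = $unfiltered - $filtered;
    $removed = 0 if $removed < 0;      # guard against inconsistent counts
    return sprintf("%.0f", 100 * $removed / $unfiltered);
}

print naughty_rating(2000, 500), "%\n";    # prints "75%"
```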

Setting search result numbers

Whether you request 1 or 10 results, you're still using one of your developer key's daily dose of a thousand Google Web API queries. Wouldn't it then make sense to always request ten? Not necessarily; if you're only using the top result (to bounce the browser to another page, generate a random query string for a password, or whatever), you might as well add even the minutest amount of speed to your application by not requesting results you're just going to throw out or ignore.

Searching different topics

With four different specialty topics available for searching through the Google API, dozens of different languages, and dozens of different countries, there are thousands of combinations of topic/language/country restriction that you could work through.

Consider an "open source country" application. You could create a list of keywords very specific to open source (like linux, perl, etc.) and create a program that cycles through a series of queries that restricts your search to an open source topic (like linux) and a particular country. So you might discover that perl was mentioned in France in the linux topic 15 times, in Germany 20 times, etc.
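The cycling part of such a program could start out like this sketch, which only builds the list of query and restrict pairs without contacting Google. The restrict_plan function is hypothetical, and the two-letter country codes are assumed to follow the countryXX pattern used earlier in this section.

```perl
use strict;
use warnings;

# Hypothetical sketch: list the (query, restrict) pairs such a program
# would send, one for every keyword and country combination.
sub restrict_plan {
    my ($topic, $keywords, $countries) = @_;
    my @plan;
    for my $keyword (@$keywords) {
        for my $country (@$countries) {
            push @plan, [ $keyword, "$topic.country$country" ];
        }
    }
    return @plan;
}

my @plan = restrict_plan("linux", [qw(perl linux)], [qw(FR DE)]);
for my $pair (@plan) {
    printf "%-6s %s\n", @$pair;    # keyword, then the restrict string
}
print scalar(@plan), " queries needed\n";    # prints "4 queries needed"
```

Every pair costs one query, so a long keyword list multiplied by dozens of countries can use up a day's allowance of a thousand queries quickly.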

You could also concentrate less on the program itself and more on an interface to access these variables. How about a form with pull-down menus that allowed you to restrict your searches by continent (instead of country)? You could specify which continent in a variable that's passed to the query. Or how about an interface that lets the user specify a topic and cycles through a list of countries and languages, pulling result counts for each one?
