Google Hacks Free Open Book

Google Hacks

Hack 76 Digging Deeper into Sites

Dig deeper into the hierarchies of web sites matching your search criteria.

One of Google's big strengths is that it can find your search term instantly and with great precision. But sometimes you're not interested so much in one definitive result as in lots of diverse results; maybe you even want some that are a bit more on the obscure side.

One method I've found rather useful is to ignore all results shallower than a particular level in a site's directory hierarchy. You avoid all the clutter of finds on home pages and go for subject matter otherwise often hidden away in the depths of a site's structure. While content comes and goes, ebbs and flows from a site's main focus, it tends to gather in more permanent locales, categorized and archived, like with like.

This script asks for a query along with a preferred depth; results shallower than that depth are thrown out. Specify a depth of four and your results will come only from pages at least four levels down, such as http://example.com/a/b/c/d, not from /a, /a/b/, or /a/b/c.
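The depth rule can be tried on its own, away from the CGI and the Google API. The following is a minimal sketch: url_depth is a hypothetical helper name, not part of the hack, and the example.com URLs are placeholders. It uses the same idea as the script, stripping the scheme and a trailing slash and then counting the slash-separated pieces after the host name.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: count the path levels that follow the host name.
sub url_depth {
    my ($url) = @_;
    $url =~ s!^\w+://!!;             # drop the scheme, e.g. "http://"
    $url =~ s!/$!!;                  # drop a single trailing slash
    my @parts = split m{/}, $url;    # host name first, then one part per level
    return @parts - 1;
}

my $wanted = 4;    # the preferred depth from the example above

for my $url (qw(
    http://example.com/
    http://example.com/a
    http://example.com/a/b/
    http://example.com/a/b/c
    http://example.com/a/b/c/d
)) {
    my $depth = url_depth($url);
    printf "%-28s depth %d  %s\n",
        $url, $depth, $depth >= $wanted ? 'kept' : 'tossed';
}
```

With a preferred depth of four, only the last URL in the list is kept.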

Because you're already limiting the kinds of results you see, it's best to use more common words for what you're looking for. Obscure query terms can often cause absolutely no results to turn up.

The default number of loops, retrieving 10 items apiece, is set to 10. This is to ensure you glean a decent number of results, because many will be tossed. You can, of course, alter this number, but bear in mind that each loop uses one query from your daily quota of 1,000 Google API queries per developer's key.
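To get a feel for the trade-off, here is a rough sketch of the arithmetic. The helper name quota_cost is made up for illustration, and the figures assume one API query per loop and searches that run through every loop.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: what does a given loop count cost against the quota?
sub quota_cost {
    my ($loops, $daily_quota) = @_;
    my $results_scanned  = $loops * 10;                  # 10 results per query
    my $searches_per_day = int($daily_quota / $loops);   # whole searches only
    return ($results_scanned, $searches_per_day);
}

for my $loops (10, 50) {
    my ($scanned, $searches) = quota_cost($loops, 1_000);
    print "$loops loops: up to $scanned results scanned per search, ",
          "about $searches searches per day\n";
}
```

A search that runs out of results early stops looping, so it spends fewer queries than this upper bound.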

76.1 The Code

#!/usr/local/bin/perl
# deep_blue_g.cgi
# Limiting search results to a particular depth in a web 
# site's hierarchy.
# deep_blue_g.cgi is called as a CGI with form input

# Your Google API developer's key
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file
my $google_wdsl = "./GoogleSearch.wsdl";

# Number of times to loop, retrieving 10 results at a time
my $loops = 10;

use SOAP::Lite;
use CGI qw/:standard *table/;

print
  header(  ),
  start_html("Fishing in the Deep Blue G"),
  h1("Fishing in the Deep Blue G"),
  start_form(-method=>'GET'),
  'Query: ', textfield(-name=>'query'),
  br(  ),
  'Depth: ', textfield(-name=>'depth', -default=>4),
  br(  ),
  submit(-name=>'submit', -value=>'Search'),
  end_form(  ), p(  );

# Make sure a query and numeric depth are provided
if (param('query') and param('depth') =~ /^\d+$/) {

  # Create a new SOAP object
  my $google_search  = SOAP::Lite->service("file:$google_wdsl");

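  # Run exactly $loops queries; each one fetches the next 10 results
  for (my $offset = 0; $offset < $loops*10; $offset += 10) {
    my $results = $google_search ->
      doGoogleSearch(
        $google_key, scalar(param('query')), $offset, 10, "false", "",
        "false", "", "latin1", "latin1"
      );

    # Stop when the call fails or no more results come back
    last unless $results and @{$results->{resultElements} || []};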

    foreach my $result (@{$results->{'resultElements'}}) {

      # Determine depth
      my $url = $result->{URL};
      $url =~ s!^\w+://|/$!!g;

      # Output only those deep enough
      ( split(/\//, $url) - 1) >= param('depth') and 
        print 
          p(
            b(a({href=>$result->{URL}},$result->{title}||'no title')), br(  ),
            $result->{URL}, br(  ),
            i($result->{snippet}||'no snippet')
          );
    }
  }

}

# Close the page whether or not a search was run
print end_html;

76.2 Running the Hack

This hack runs as a CGI script. Point your browser at it, fill out the query and depth fields, and click the "Search" button.

Figure 6-18 shows a query for "Jacques Cousteau", restricting results to a depth of 6—that's six levels down from the site's home page. You'll notice some pretty long URLs in there.
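Because the form is submitted with GET, the same search can also be reached through a plain URL. The sketch below builds one for this query. The script location is a made-up placeholder, and escape_param is a hypothetical helper that percent-encodes ASCII text by hand so that the example needs no extra modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: percent-encode everything except unreserved
# characters. ASCII input is assumed.
sub escape_param {
    my ($text) = @_;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf('%%%02X', ord($1))/ge;
    return $text;
}

# Assumed location of the installed script; adjust for your own server.
my $script = 'http://localhost/cgi-bin/deep_blue_g.cgi';

my $url = sprintf '%s?query=%s&depth=%d',
    $script, escape_param('"Jacques Cousteau"'), 6;

print "$url\n";
```

The query and depth parameter names match the text fields in the script's form.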

Figure 6-18. A search for "Jacques Cousteau", restricting results to six levels down

76.3 Hacking the Hack

Perhaps you're interested in just the opposite of what this hack provides: you want only results from higher up in a site's hierarchy. Hacking this hack is simple enough: swap in a <= (less than or equal to) in place of the >= (greater than or equal to) in the depth test, so that it reads as in the following line:

      ( split(/\//, $url) - 1) <= param('depth') and 
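If you would rather switch between the two behaviors than edit the script each time, the comparison can be wrapped in a small function. This is only a sketch: keep_result and its 'deep' and 'shallow' mode names are invented here and are not part of the hack.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: decide whether a result at $depth should be shown.
# $mode is 'deep' (the hack as written) or 'shallow' (the variation above).
sub keep_result {
    my ($depth, $limit, $mode) = @_;
    return $mode eq 'shallow' ? $depth <= $limit : $depth >= $limit;
}

for my $depth (1 .. 6) {
    printf "depth %d: deep=%s shallow=%s\n", $depth,
        keep_result($depth, 4, 'deep')    ? 'keep' : 'toss',
        keep_result($depth, 4, 'shallow') ? 'keep' : 'toss';
}
```

In the script itself, the mode could come from an extra form field, and the call would replace the depth comparison that guards the print statement.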

76.4 See Also
