Google Hacks

Hack 77 Summarizing Results by Domain

Getting an overview of the sorts of domains (educational, commercial, foreign, and so forth) found in the results of a Google query.

You want to know about a topic, so you do a search. But what do you have? A list of pages. You can't get a good idea of the types of pages these are without taking a close look at the list of sites.

This hack is an attempt to get a "snapshot" of the types of sites that result from a query. It does this by taking a "suffix census," a count of the domain suffixes (.com, .edu, .uk, and so on) that appear in the search results.

This hack is best suited to link: queries, where it gives a good idea of what kinds of domains (commercial, educational, military, foreign, etc.) are linking to a particular page.

You could also run it to see where technical terms, slang terms, and unusual words turn up. Which kinds of sites mention a particular singer more often? Or a political figure? Does the word "democrat" come up more often on .com or .edu sites?

Of course, this snapshot doesn't provide a complete inventory, but as overviews go, it's rather interesting.

77.1 The Code

#!/usr/local/bin/perl
# suffixcensus.cgi
# Generates a snapshot of the kinds of sites responding to a
# query.  The suffix is the .com, .net, or .uk part.
# suffixcensus.cgi is called as a CGI with form input

# Your Google API developer's key
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file
my $google_wdsl = "./GoogleSearch.wsdl";

# Number of times to loop, retrieving 10 results at a time
my $loops = 10;

use SOAP::Lite;
use CGI qw/:standard *table/;

print
  header(  ),
  start_html("SuffixCensus"),
  h1("SuffixCensus"),
  start_form(-method=>'GET'),
  'Query: ', textfield(-name=>'query'),
  '   ',
  submit(-name=>'submit', -value=>'Search'),
  end_form(  ), p(  );

if (param('query')) {
  my $google_search  = SOAP::Lite->service("file:$google_wdsl");
  my %suffixes;

  for (my $offset = 0; $offset < $loops*10; $offset += 10) {

    my $results = $google_search ->
      doGoogleSearch(
        $google_key, param('query'), $offset, 10, "false", "",  "false",
        "", "latin1", "latin1"
      );
      
    last unless @{$results->{resultElements}};

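    # Tally the suffix of each result's URL, skipping any URL
    # the pattern cannot parse
    foreach my $result (@{$results->{resultElements}}) {
      my ($suffix) = $result->{URL} =~ m#://.+?\.(\w{2,4})/#;
      $suffixes{lc $suffix}++ if defined $suffix;
    }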
  }

  print
    h2('Results: '), p(  ),
    start_table({cellpadding => 5, cellspacing => 0, border => 1}),
    map( { Tr(td(uc $_),td($suffixes{$_})) } sort keys %suffixes ),
    end_table(  );
}

print end_html(  );
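
The suffix-counting step at the heart of the script can be tried on its own, without an API key. The following is a minimal sketch rather than the hack's own code: the url_suffix helper and the sample URLs are made up for illustration, and the pattern is a somewhat stricter variant of the one above that looks only at the host name, so it also copes with URLs that carry a port or lack a trailing slash.

```perl
#!/usr/bin/perl
# Standalone illustration of the suffix census (no Google API involved)
use strict;
use warnings;

# Return the lowercased last label of the URL's host name,
# or undef if the URL has no alphabetic suffix (e.g., a bare IP address)
sub url_suffix {
  my ($url) = @_;
  # scheme://host[:port][/path] -- capture the host part only
  my ($host) = $url =~ m{^\w+://([^/:?#]+)} or return;
  # the suffix is whatever follows the final dot in the host
  my ($suffix) = $host =~ m{\.([A-Za-z]{2,})$} or return;
  return lc $suffix;
}

# Hypothetical sample URLs standing in for Google results
my @urls = (
  'http://www.example.com/',
  'http://www.example.com/drinks/index.html',
  'http://news.example.co.uk/story?id=1',
  'http://www.example.edu',
  'http://example.org:8080/a.b/',
  'http://192.0.2.1/',
);

my %suffixes;
for my $url (@urls) {
  my $suffix = url_suffix($url);
  next unless defined $suffix;   # skip URLs without a usable suffix
  $suffixes{$suffix}++;
}

printf "%-5s %d\n", uc($_), $suffixes{$_} for sort keys %suffixes;
```

Working on the host name alone avoids accidentally matching a dotted segment in the path, at the cost of two pattern matches per URL instead of one.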

77.2 Running the Hack

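This hack runs as a CGI script. Install it in your cgi-bin or appropriate directory, make sure the GoogleSearch.wsdl file sits where $google_wdsl points (by default, the same directory as the script), and point your browser at it.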

77.3 The Results

Searching for the prevalence of "soda pop" by suffix finds, as one might expect, the most mentions on .com sites, as Figure 6-19 shows.

Figure 6-19. Prevalence of "soda pop" by suffix

77.4 Hacking the Hack

There are a couple of ways to hack this hack.

77.4.1 Going back for more

This script, by default, visits Google 10 times, grabbing the top 100 results (or fewer, if there aren't that many). To increase or decrease the number of visits, simply change the value of the $loops variable at the top of the script. Bear in mind, however, that while setting $loops = 50 might net you 500 results, each visit counts as one query, so you're also eating quickly into your daily allotment of 1,000 Google API queries.
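
The arithmetic behind that warning is easy to check. Here is a rough sketch, assuming each pass through the loop costs exactly one API query and returns at most 10 results, as in the script above; the searches_per_day helper and the variable names other than $loops are made up for illustration.

```perl
use strict;
use warnings;

my $daily_quota       = 1000;  # daily allotment of Google API queries
my $results_per_query = 10;    # results fetched per visit, as in the script

# How many complete searches fit into one day's quota for a given $loops
sub searches_per_day {
  my ($loops) = @_;
  return int($daily_quota / $loops);
}

for my $loops (10, 50, 100) {
  printf "loops=%-3d  up to %4d results  about %3d searches per day\n",
    $loops, $loops * $results_per_query, searches_per_day($loops);
}
```

At $loops = 50, a single key is good for roughly 20 searches a day, and fewer if the same key is shared with other scripts.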

77.4.2 Comma-separated

It's rather simple to adjust this script to run from the command line and return comma-separated output suitable for Excel or your average database. Remove the starting HTML, form, and ending HTML output, and alter the code that prints out the results. In the end, you come to something like this:

#!/usr/local/bin/perl
# suffixcensus_csv.pl
# Generates a snapshot of the kinds of sites responding to a
# query.  The suffix is the .com, .net, or .uk part.
# usage: perl suffixcensus_csv.pl query="your query" > results.csv

# Your Google API developer's key
my $google_key='insert key';

# Location of the GoogleSearch WSDL file
my $google_wdsl = "./GoogleSearch.wsdl";

# Number of times to loop, retrieving 10 results at a time
# (2 loops fetch up to 20 results)
my $loops = 2;

use SOAP::Lite;
use CGI qw/:standard/;

param('query')
  or die qq{usage: suffixcensus_csv.pl query="{query}" [> results.csv]\n};

print qq{"suffix","count"\n};

my $google_search  = SOAP::Lite->service("file:$google_wdsl");

my %suffixes;

for (my $offset = 0; $offset < $loops*10; $offset += 10) {

  my $results = $google_search ->
    doGoogleSearch(
      $google_key, param('query'), $offset, 10, "false", "",  "false",
      "", "latin1", "latin1"
    );

  last unless @{$results->{resultElements}};

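  # Tally the suffix of each result's URL, skipping any URL
  # the pattern cannot parse
  foreach my $result (@{$results->{resultElements}}) {
    my ($suffix) = $result->{URL} =~ m#://.+?\.(\w{2,4})/#;
    $suffixes{lc $suffix}++ if defined $suffix;
  }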
}

print map { qq{"$_", "$suffixes{$_}"\n} } sort keys %suffixes;

Invoke the script from the command line like so:

$ perl suffixcensus_csv.pl query="query" > results.csv 
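
Once the counts are in this form, turning them into shares takes only a few more lines. The sketch below is an illustration rather than part of the hack: it takes rows in the format the script prints (the sample rows are the "colddrink" counts shown below), skips the header, and lists each suffix with its percentage of the total. A real version would presumably read the rows from results.csv instead of an in-script list.

```perl
use strict;
use warnings;

# Rows in the format printed by suffixcensus_csv.pl
my @lines = (
  '"suffix","count"',
  '"com", "12"',
  '"info", "1"',
  '"net", "1"',
  '"za", "6"',
);

my %count;
my $total = 0;
for my $line (@lines) {
  # Match "suffix", "count" rows; the header has no digits and is skipped
  next unless $line =~ /^"([^"]+)",\s*"(\d+)"\s*$/;
  $count{$1} = $2;
  $total += $2;
}

# Largest share first, ties broken alphabetically
for my $suffix (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
  printf "%-5s %3d  %5.1f%%\n", $suffix, $count{$suffix},
    100 * $count{$suffix} / $total;
}
```

For these rows, .com accounts for 60 percent of the 20 results and .za for 30 percent, with .info and .net at 5 percent each.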

Searching for mentions of "colddrink," the South African version of "soda pop," sending the output straight to the screen rather than a results.csv file, looks like this:

$ perl suffixcensus_csv.pl query="colddrink" 
"suffix","count"
"com", "12"
"info", "1"
"net", "1"
"za", "6" 