Google Hacks

Hack 80 Comparing Google Results with Those of Other Search Engines

Comparing Google search results with results from other search engines.

True Google fanatics might not like to think so, but there's really more than one search engine. Google's competitors include the likes of AltaVista, AlltheWeb, and Teoma.

Equally surprising to the average Google fanatic is the fact that Google doesn't index the entire Web. There are, at the time of this writing, over 2 billion web pages in the Google index, but that's just a fraction of the Web. You'd be amazed how much non-overlapping content there is in each search engine. A query that brings back only a few results on one search engine can bring back plenty on another.

This hack gives you a program that compares counts for Google and several other search engines, with an easy way to plug in new search engines that you want to include. This version of the hack searches different domains for the query, in addition to getting the full count for the query itself.

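This hack requires the LWP::Simple (http://search.cpan.org/search?query=LWP%3A%3ASimple) module to run. The code also loads the SOAP::Lite and CGI modules and expects a copy of the GoogleSearch.wsdl file in the same directory as the script.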

80.1 The Code

#!/usr/local/bin/perl
# google_compare.cgi
# Compares Google results against those of other search engines

# Your Google API developer's key
my $google_key='insert key here';

# Location of the GoogleSearch WSDL file
my $google_wdsl = "./GoogleSearch.wsdl";

use strict;

use SOAP::Lite;
use LWP::Simple qw(get);
use CGI qw{:standard};

my $googleSearch = SOAP::Lite->service("file:$google_wdsl");

# setup our browser output.
print "Content-type: text/html\n\n";
print "<html><title>Google Compare Results</title><body>\n";

# ask and we shall receive.
my $query = param('query');
unless ($query) {
   print "<h1>No query defined.</h1></body></html>\n\n";
   exit; # If there's no query there's no program. 
}

# spit out the original before we encode, escaping it for HTML
# so the query can't inject markup into the page.
print "<h1>Your original query was '", CGI::escapeHTML($query), "'.</h1>\n";

$query =~ s/\s/\+/g ;  #changing the spaces to + signs
$query =~ s/\"/%22/g;  #changing the quotes to %22

# Create some hashes of queries for various search engines.  
# We have four types of queries ("plain", "com", "org", and "net"), 
# and three search engines ("Google", "AlltheWeb", and "Altavista"). 
# Each engine has a name, query, and regular expression used to 
# scrape the results.
my $query_hash = { 
   plain => {
      Google => { name => "Google", query => $query, },
      AlltheWeb => {
         name   => "AlltheWeb",
         regexp => "Displaying results <b>.*<\/b> of <b>(.*)<\/b>",
         query  => "http://www.alltheweb.com/search?cat=web&q=$query",
      },
      Altavista => {
         name  => "Altavista", 
         regexp => "We found (.*) results",
         query => "http://www.altavista.com/sites/search/web?q=$query",
      }
   },
   com => {
      Google => { name => "Google", query => "$query site:com", },
      AlltheWeb => {
         name   => "AlltheWeb",
         regexp => "Displaying results <b>.*<\/b> of <b>(.*)<\/b>",
         query  => "http://www.alltheweb.com/search?cat=web&q=$query+domain%3Acom",
      },
      Altavista => {
         name  => "Altavista", regexp => "We found (.*) results",
         query => "http://www.altavista.com/sites/search/web?q=$query+domain%3Acom",
      }
   },
   org => {
      Google => { name => "Google", query => "$query site:org", },
      AlltheWeb => {
         name   => "AlltheWeb",
         regexp => "Displaying results <b>.*<\/b> of <b>(.*)<\/b>",
         query  => "http://www.alltheweb.com/search?cat=web&q=$query+domain%3Aorg",
      },
      Altavista => {
         name  => "Altavista", regexp => "We found (.*) results",
         query => "http://www.altavista.com/sites/search/web?q=$query+domain%3Aorg",
      }
   },
   net => {
      Google => { name => "Google", query => "$query site:net", },
      AlltheWeb => {
         name   => "AlltheWeb",
         regexp => "Displaying results <b>.*<\/b> of <b>(.*)<\/b>",
         query  => "http://www.alltheweb.com/search?cat=web&q=$query+domain%3Anet",
      },
      Altavista => {
         name  => "Altavista", regexp => "We found (.*) results",
         query => "http://www.altavista.com/sites/search/web?q=$query+domain%3Anet",
      }
   }
};

# now, we loop through each of our query types,
# under the assumption there's a matching
# hash that contains our engines and string.
foreach my $query_type (keys (%$query_hash)) {
   print "<h2>Results for a '$query_type' search:</h2>\n";

   # now, loop through each engine we have and get/print the results.
   foreach my $engine (values %{$query_hash->{$query_type}}) {
      my $results_count; 

      # if this is Google, we use the API and not port 80.
      if ($engine->{name} eq "Google") {
         my $result = $googleSearch->doGoogleSearch(
             $google_key, $engine->{query}, 0, 1,
             "false", "", "false", "", "latin1", "latin1");
         $results_count = $result->{estimatedTotalResultsCount};
         # the google api doesn't format numbers with commas.
         my $rresults_count = reverse $results_count;
         $rresults_count =~ s/(\d\d\d)(?=\d)(?!\d*\.)/$1,/g;
         $results_count = scalar reverse $rresults_count;
      }

      # it's not google, so we GET like everyone else.
      else {
         my $data = get($engine->{query});
         print "ERROR: no response from $engine->{name}.<br />\n" unless defined $data;
         # only trust $1 if the match actually succeeded.
         $results_count = (defined $data && $data =~ /$engine->{regexp}/) ? $1 : 0;
      }

      # and print out the results.
      print "<strong>$engine->{name}</strong>: $results_count<br />\n";
   }
}

# close the page we opened at the top.
print "</body></html>\n";
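
Plugging in another engine means adding one entry per query type, each with a name, a query URL, and a regular expression that captures the count. The following is only a rough sketch of how that repetition could be folded into a helper. The add_engine subroutine and the %suffix_for table are illustrative names that do not appear in the hack, and "ExampleSearch", its URL, and its regexp are placeholders rather than a real service. It also assumes the new engine accepts the same domain%3A restriction that the AlltheWeb and AltaVista URLs use, which you would need to check for any real engine.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Query types and the domain restriction each one appends.
# "plain" adds nothing; the others use the domain%3A<tld> form
# seen in the hack's AlltheWeb and AltaVista URLs.
my %suffix_for = (
   plain => "",
   com   => "+domain%3Acom",
   org   => "+domain%3Aorg",
   net   => "+domain%3Anet",
);

# Add one scraped engine to every query type in $query_hash.
# $base_url must end with the engine's query parameter ("...?q=").
sub add_engine {
   my ($query_hash, $name, $base_url, $regexp, $query) = @_;
   foreach my $type (keys %suffix_for) {
      $query_hash->{$type}{$name} = {
         name   => $name,
         regexp => $regexp,
         query  => $base_url . $query . $suffix_for{$type},
      };
   }
   return $query_hash;
}

# HYPOTHETICAL engine: the URL and regexp below are placeholders.
my $query_hash = {};
add_engine($query_hash, "ExampleSearch",
   "http://search.example.com/find?q=", "About (.*) results",
   "google+hacks");

# Show the URL built for each query type.
foreach my $type (sort keys %$query_hash) {
   print "$type: $query_hash->{$type}{ExampleSearch}{query}\n";
}
```

Because the main loop only looks at each entry's name, query, and regexp, entries built this way would be handled by the same scraping branch as AlltheWeb and AltaVista.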

80.2 Running the Hack

This hack runs as a CGI script, called from your web browser as: google_compare.cgi?query=your query keywords.
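
If you build that URL in a script rather than typing it into the browser, spaces and quotes in the query need to be percent-encoded first. Here is a minimal sketch of one way to do that with core Perl only. The encode_query subroutine is an illustrative name, the host and path are placeholders for wherever you install the script, and the code assumes a plain ASCII query.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Percent-encode everything except the URL "unreserved" characters
# (letters, digits, hyphen, period, underscore, and tilde).
# Assumes plain ASCII input; other text would first need to be
# encoded to bytes (UTF-8, for example).
sub encode_query {
   my ($text) = @_;
   $text =~ s/([^A-Za-z0-9\-._~])/sprintf("%%%02X", ord($1))/ge;
   return $text;
}

# Hypothetical location; substitute wherever the script is installed.
my $base = "http://localhost/cgi-bin/google_compare.cgi";
my $url  = $base . "?query=" . encode_query('"google hacks" perl');

print "$url\n";
# prints http://localhost/cgi-bin/google_compare.cgi?query=%22google%20hacks%22%20perl
```

The CGI module decodes the parameter again on the way in, so the script still sees the original quoted phrase.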

80.3 Why?

You might be wondering why you would want to compare result counts across search engines. It's a good idea to follow what different search engines offer in terms of results. While a phrase might return only a few results on one search engine, another might return results a-plenty. It makes sense to spend your time and energy using the latter for the research at hand.

—Tara Calishain and Morbus Iff
