Google Hacks
Hack 76 Digging Deeper into Sites
Dig deeper into the hierarchies of web sites matching your search criteria.

One of Google's big strengths is that it can find your search term instantly and with great precision. But sometimes you're interested not so much in one definitive result as in lots of diverse results; you may even want some that are a bit on the obscure side.

One method I've found rather useful is to ignore all results shallower than a particular level in a site's directory hierarchy. You avoid the clutter of home-page hits and go for subject matter that is otherwise often hidden away in the depths of a site's structure. While content comes and goes, drifting in and out of a site's main focus, it tends to settle in more permanent locales, categorized and archived, like with like.

This script asks for a query along with a minimum depth; results shallower than that depth are thrown out. Specify a depth of four and your results will come only from URLs at least as deep as http://example.com/a/b/c/d, never from /a, /a/b, or /a/b/c.

Because you're already limiting the kinds of results you see, it's best to search on fairly common words. Obscure query terms can easily turn up no results at all.
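To make the depth rule concrete, here is a minimal sketch that counts depth the same way the script does: strip the scheme and any trailing slash, split what is left on "/", and subtract one for the host name. The url_depth helper and the example.com URLs are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper mirroring the script's depth test:
# drop the scheme and a trailing slash, split on "/",
# and subtract one so the host name itself is not counted.
sub url_depth {
    my ($url) = @_;
    $url =~ s!^\w+://|/$!!g;
    my @parts = split m{/}, $url;
    return @parts - 1;
}

for my $url (qw(
    http://example.com/
    http://example.com/a
    http://example.com/a/b/
    http://example.com/a/b/c
    http://example.com/a/b/c/d
)) {
    printf "%-30s depth %d\n", $url, url_depth($url);
}
```

With a depth of 4, only the last of these URLs (and anything deeper) would pass the filter.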
76.1 The Code

#!/usr/local/bin/perl
# deep_blue_g.cgi
# Limiting search results to a particular depth in a web
# site's hierarchy.
# deep_blue_g.cgi is called as a CGI with form input

use strict;
use warnings;
use SOAP::Lite;
use CGI qw/:standard *table/;

# Your Google API developer's key
my $google_key = 'insert key here';

# Location of the GoogleSearch WSDL file
my $google_wsdl = "./GoogleSearch.wsdl";

# Number of times to loop, retrieving 10 results at a time
my $loops = 10;

print
  header( ),
  start_html("Fishing in the Deep Blue G"),
  h1("Fishing in the Deep Blue G"),
  start_form(-method=>'GET'),
  'Query: ', textfield(-name=>'query'),
  br( ),
  'Depth: ', textfield(-name=>'depth', -default=>4),
  br( ),
  submit(-name=>'submit', -value=>'Search'),
  end_form( ), p( );

# Fetch the query as a single scalar value
my $query = param('query');

# Make sure a query and a purely numeric depth are provided
if ($query and (param('depth') // '') =~ /^\d+$/) {

  # Create a new SOAP object from the WSDL description
  my $google_search = SOAP::Lite->service("file:$google_wsdl");

  # $loops pages of 10 results: offsets 0, 10, ..., ($loops-1)*10
  for (my $offset = 0; $offset < $loops*10; $offset += 10) {
    my $results = $google_search->doGoogleSearch(
      $google_key, $query, $offset, 10, "false", "", "false",
      "", "latin1", "latin1"
    );

    # Stop when a request fails or no more results come back
    last unless $results and @{$results->{resultElements} || []};

    foreach my $result (@{$results->{resultElements}}) {
      # Determine depth: strip the scheme and any trailing slash,
      # then count the path segments after the host name
      my $url = $result->{URL};
      $url =~ s!^\w+://|/$!!g;

      # Output only those deep enough
      ( split(/\//, $url) - 1) >= param('depth') and
        print
          p(
            b(a({href=>$result->{URL}},$result->{title}||'no title')), br( ),
            $result->{URL}, br( ),
            i($result->{snippet}||'no snippet')
          );
    }
  }
}

# Close the page whether or not a search was run
print end_html;
76.2 Running the Hack

This hack runs as a CGI script. Point your browser at it, fill out the query and depth fields, and click the "Search" button. Figure 6-18 shows a query for "Jacques Cousteau", restricting results to a depth of 6, that is, six or more levels below the site's home page. You'll notice some pretty long URLs in there.

Figure 6-18. A search for "Jacques Cousteau", restricting results to six levels down

76.3 Hacking the Hack

Perhaps you're interested in just the opposite of what this hack provides: you want only results from higher up in a site's hierarchy. Hacking this hack is simple enough: swap the >= (greater than or equal to) in the depth test for <= (less than or equal to), so that the line reads:

( split(/\//, $url) - 1) <= param('depth') and
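Rather than editing the operator each time, one option is to accept both a minimum and a maximum depth. The sketch below is hypothetical: the depth_in_range helper and its $min/$max arguments are not part of the original script, and wiring them in would mean adding a second form field (say, maxdepth) alongside depth.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: true when the URL's depth lies between
# $min and $max inclusive; pass undef for $max to mean "no limit".
sub depth_in_range {
    my ($url, $min, $max) = @_;
    (my $path = $url) =~ s!^\w+://|/$!!g;   # same cleanup as the script
    my @parts = split m{/}, $path;
    my $depth = @parts - 1;                 # the host name is not a level
    return 0 if $depth < $min;
    return 0 if defined $max and $depth > $max;
    return 1;
}

# Original behavior: depth 4 or deeper
print depth_in_range('http://example.com/a/b/c/d', 4) ? "keep\n" : "skip\n";

# The opposite: no deeper than 2
print depth_in_range('http://example.com/a/b/c/d', 0, 2) ? "keep\n" : "skip\n";
```

In the script, the depth test could then become depth_in_range($result->{URL}, param('depth'), scalar param('maxdepth')), where scalar keeps CGI.pm from returning an empty list when the hypothetical maxdepth field is left out.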
76.4 See Also