Hack 14 inurl: Versus site:
 
Use the inurl: syntax to search a site's
subdirectories.
The site: special
syntax is perfect when you want to restrict
your search to a certain domain or domain suffix, such as
"example.com,"
"www.example.org," or
"edu" (for example, site:edu).
But it breaks down when you're trying to search for
a site that exists beneath the main or default site (i.e., in a
subdirectory like /~sam/album/).
For example, if you're looking for something below
the main GeoCities site, you can't use
site: to find all the pages in
http://www.geocities.com/Heartland/Meadows/6485/;
Google will return no results. Enter inurl:, a
Google special syntax [Section 1.5] for specifying a string to be found in a
result's URL. Rewritten with inurl:, the query works as expected:
inurl:www.geocities.com/Heartland/Meadows/6485/
While Google summarily ignores the http:// prefix in a URL
when it is used with site:, search
results come up short when you include it in an
inurl: query. Be sure to remove prefixes in any
inurl: query for the best (read: any) results.
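If you build such queries from a list of addresses, it may help to strip the prefix automatically. The following is a minimal Python sketch, not part of Google's syntax; the function name inurl_query is a hypothetical helper introduced here for illustration, and it assumes that only http:// and https:// prefixes need removing.

```python
def inurl_query(url: str) -> str:
    """Build an inurl: query string from a URL.

    Hypothetical helper: drops a leading http:// or https://
    (matched case-insensitively) before prepending "inurl:".
    """
    for prefix in ("http://", "https://"):
        if url.lower().startswith(prefix):
            url = url[len(prefix):]
            break
    return "inurl:" + url


if __name__ == "__main__":
    # The GeoCities address used in the text above.
    print(inurl_query("http://www.geocities.com/Heartland/Meadows/6485/"))
    # prints: inurl:www.geocities.com/Heartland/Meadows/6485/
```

An address that already lacks a prefix passes through unchanged, so the helper is safe to apply to mixed input.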
You'll see that using the inurl:
query instead of the site: query has two immediate
advantages:
14.1 How Many Subdomains?
You can also use inurl: in combination with the
site: syntax to get information about
subdomains. For example, how many
subdomains does O'Reilly.com really have? You
can't get that information via the query
site:oreilly.com, but neither can you get it just
from the query inurl:"*.oreilly.com" (because that
query will pick up mirrors and other pages containing the string
oreilly.com that aren't at the
O'Reilly site).
However, this query will work just fine:
site:oreilly.com inurl:"*.oreilly" -inurl:"www.oreilly"
This query says to Google, "Look on the site oreilly.com for pages whose URLs contain the string '*.oreilly' (remember the full-word wildcard? [Hack #13]) but ignore URLs that contain the string 'www.oreilly'" (because that's a subdomain you're already very familiar with).
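The same pattern can be generated for any domain, and the result URLs can then be tallied by hostname. Here is a rough Python sketch under stated assumptions: subdomain_query and distinct_subdomains are hypothetical helper names, the result URLs are assumed to have been collected by hand (nothing here contacts Google), and the sample addresses are made-up placeholders on example.com rather than real search results.

```python
from urllib.parse import urlsplit


def subdomain_query(domain: str) -> str:
    """Compose the site:/inurl: query shown above for a given domain."""
    name = domain.split(".")[0]  # e.g. "oreilly" from "oreilly.com"
    return f'site:{domain} inurl:"*.{name}" -inurl:"www.{name}"'


def distinct_subdomains(result_urls, domain):
    """Return the sorted, de-duplicated subdomain hostnames under `domain`.

    Expects full URLs including the scheme; URLs on other hosts and the
    www host itself are skipped.
    """
    hosts = set()
    for url in result_urls:
        host = urlsplit(url).hostname  # lowercased hostname, or None
        if host and host.endswith("." + domain) and host != "www." + domain:
            hosts.add(host)
    return sorted(hosts)


if __name__ == "__main__":
    print(subdomain_query("oreilly.com"))

    # Placeholder result URLs, invented for illustration only.
    sample = [
        "http://alpha.example.com/a",
        "http://www.example.com/",
        "http://beta.example.com/x",
        "http://alpha.example.com/b",
    ]
    print(distinct_subdomains(sample, "example.com"))
```

Counting the length of the returned list gives a lower bound on the number of subdomains, limited by however many results you gathered.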