Hack 97 Being a Good Search Engine Citizen
 
Five don'ts and one do for
getting your site indexed by Google.
A high ranking in Google can mean a great
deal of traffic. Because of that, there are lots of people spending
lots of time trying to figure out the infallible way to get a high
ranking from Google. Add this. Remove that. Get a link from this.
Don't post a link to that.
Submitting your site to Google to be
indexed is simple enough. Google's got a site
submission form (http://www.google.com/addurl.html), though
they say if your site has at least a few inbound links (other sites
that link to you), they should find you that way. In fact, Google
encourages URL submitters to get listed on The Open Directory Project
(DMOZ, http://www.dmoz.org/) or
Yahoo! (http://www.yahoo.com/).
Nobody knows a secret for getting a high PageRank without effort.
Google uses a variety of elements, including page popularity, to
determine PageRank, and PageRank is one of the factors determining how
high up a page appears in search results. But there are several
things you should not do, along with one big thing you
absolutely should.
Does breaking one of these rules mean that you're
automatically going to be thrown out of Google's
index? No; there are over 2 billion pages in
Google's index at this writing, and
it's unlikely that they'll find out
about your rule-breaking immediately. But there's a
good chance they'll find out eventually. Is it worth
having your site removed from the most popular search engine on
the Internet?
97.1 Thou shalt not:
Cloak. "Cloaking" is when your web site is
set up such that search engine spiders get different pages from those
human surfers get. How does the web site know which are the spiders
and which are the humans? By identifying the
spider's User Agent or IP—the latter being the
more reliable method.
An IP (Internet Protocol) address is the computer address from which
a spider comes. Everything that connects to the Internet has an
IP address. Sometimes the IP address is always the same, as with web
sites. Sometimes the IP address changes—that's
called a dynamic address. (If you use a dial-up modem, chances are
good that every time you log on to the Internet your IP address is
different. That's a dynamic IP address.)
A "User Agent" is the way a program that surfs the Web
identifies itself. Internet browsers like Mozilla use User Agents, as
do search engine spiders. There are literally dozens of different
kinds of User Agents; see the Web Robots Database (http://www.robotstxt.org/wc/active.html) for
an extensive list.
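To make the User Agent idea concrete, here is a minimal sketch of how a server-side script might guess whether a visitor is a spider from the string it sends. The marker list and the function name looks_like_spider are assumptions made up for this illustration, not anything Google publishes. The sample strings are only examples of the general shape such strings take. Because a program reports its own User Agent, the string is easy to fake, which is why the IP address is the more reliable signal.

```python
# Hypothetical marker substrings for illustration only; a real list would
# come from a source such as the Web Robots Database.
SPIDER_MARKERS = ("googlebot", "slurp", "crawler", "spider", "bot")


def looks_like_spider(user_agent: str) -> bool:
    """Return True if the User Agent string contains a known spider marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in SPIDER_MARKERS)


# A spider-style string and a browser-style string (examples only).
print(looks_like_spider("Googlebot/2.1 (+http://www.googlebot.com/bot.html)"))
print(looks_like_spider("Mozilla/5.0 (X11; Linux i686; rv:1.0) Gecko/20020529"))
```

The first call prints True and the second prints False. Recognizing spiders this way for logging or statistics is harmless; serving them different pages is what turns it into cloaking.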
Advocates of cloaking claim that it is a useful way to optimize
content for spiders. Critics claim that
cloaking is an easy way to misrepresent site content—feeding a
spider a page that's designed to get the site hits
for pudding cups when actually it's all about
baseball bats. You can get more details about cloaking and different
perspectives on it at http://pandecta.com/,
http://www.apromotionguide.com/cloaking.html,
and http://www.webopedia.com/TERM/C/cloaking.html.
Hide text. Text is hidden by putting words or links in a web page that are the
same color as the page's background—putting
white words on a white background, for example. This is also called
"fontmatching." Why would you do
this? Because a search engine spider could read the words
you've hidden on the page while a human visitor
couldn't. Again, doing this and getting caught could
get you banned from Google's index, so
don't.
That goes for other page content
tricks too, like title stacking (putting multiple copies of a title
tag on one page), putting keywords in comment tags,
keyword stuffing (putting multiple
copies of keywords in a very small font on a page), putting keywords not
relevant to your site in your META tags, and so
on. Google doesn't provide an exhaustive list of
these types of tricks on their site, but any attempt to circumvent or
fool their ranking system is likely to be frowned upon. Their
attitude is more like: "You can do anything you want
to with your pages, and we can do anything we want to with our
index—like exclude your pages."
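If you want to check your own pages for two of these tricks, a rough audit might look like the following sketch. It counts title tags and flags font tags whose color attribute is identical to the bgcolor attribute of the body tag. The names PageAudit and audit are invented for this example, and the check is deliberately narrow: it only sees old-style HTML attributes, so text hidden through stylesheets, near-matching colors, or tiny fonts would slip past it.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Count <title> tags and <font> colors that equal the body background."""

    def __init__(self):
        super().__init__()
        self.title_count = 0
        self.background = None   # value of <body bgcolor="...">, lowercased
        self.matching_fonts = 0  # <font color="..."> equal to the background

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.title_count += 1
        elif tag == "body":
            self.background = (attrs.get("bgcolor") or "").lower() or None
        elif tag == "font":
            color = (attrs.get("color") or "").lower()
            if color and color == self.background:
                self.matching_fonts += 1


def audit(html: str) -> dict:
    """Return the title count and the number of background-colored fonts."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    return {"titles": parser.title_count, "hidden_fonts": parser.matching_fonts}


# A made-up page with two title tags and one white-on-white font tag.
sample = """<html><head><title>Bats</title><title>Baseball bats</title></head>
<body bgcolor="#FFFFFF"><font color="#ffffff">pudding cups</font>
<font color="#000000">Baseball bats for sale</font></body></html>"""

print(audit(sample))
```

For the sample page, the report shows two titles and one hidden font. A clean page should show one title and no hidden fonts.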
Use doorway pages. Sometimes doorway pages are called "gateway
pages." These are pages that are aimed very
specifically at one topic, which don't have a lot of
their own original content, and which lead to the main page of a site
(thus the name doorway pages).
For example, say you have a page devoted to cooking. You create
doorway pages for several genres of cooking—French cooking,
Chinese cooking, vegetarian cooking, etc. The pages contain terms and
META tags relevant to each genre, but most of the
text is a copy of all the other doorway pages, and all it does is
point to your main site.
This breaks Google's rules and annoys Google users;
don't do it. You can learn more about doorway pages
at http://searchenginewatch.com/webmasters/bridge.html or
http://www.searchengineguide.com/whalen/2002/0530_jw1.html.
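The telltale sign of doorway pages in the cooking example is that most of the text is copied from page to page. As a rough self-check, the sketch below compares pages word by word and lists the pairs that are nearly identical. The function names, the sample pages, and the 0.8 threshold are all assumptions chosen for illustration; nothing here reflects how Google itself detects doorway pages.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Word-level similarity between two texts, from 0.0 to 1.0."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()


def near_duplicates(pages: dict, threshold: float = 0.8) -> list:
    """Return pairs of page names whose text similarity meets the threshold."""
    return [
        (x, y)
        for x, y in combinations(sorted(pages), 2)
        if similarity(pages[x], pages[y]) >= threshold
    ]


# Made-up page text: two doorway-style pages and one unrelated page.
boilerplate = "cooking recipes and tips click here to visit our main site"
pages = {
    "french": "French " + boilerplate,
    "chinese": "Chinese " + boilerplate,
    "about": "Contact us by email or telephone during business hours",
}

print(near_duplicates(pages))
```

Here only the French and Chinese pages are reported as a pair, because they differ by a single word. Pages flagged this way are candidates for merging or for real, original content.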
Check your link rank with automated queries. Using automated queries (except for the sanctioned Google API) is
against Google's Terms of Service anyway. Using an
automated query to check your PageRank every 12 seconds is triple
bad; it's not what the search engine was built for
and Google probably considers it a waste of their time and resources.
Link to "bad neighborhoods." Bad
neighborhoods are those sites that exist only to propagate links.
Because link popularity is one aspect of how Google determines
PageRank, some sites have set up "link
farms"—sites that exist only for the purpose
of building site popularity with bunches of links. The links are not
topical, like a specialty subject index, and they're
not well-reviewed, like Yahoo!; they're just a pile
of links. Another example of a "bad
neighborhood" is a general FFA page. FFA stands for
"free for all";
it's a page where anyone can add their link. Linking
to pages like that is grounds for a penalty from Google.
Now, what happens if a page like that links to
you? Will Google penalize your page? No. Google
accepts that you have no control over who links to your site.
97.2 Thou shalt:
Create great content. All the HTML contortions in the world will do you little good if
you've got lousy, old, or limited content. If you
create great content and promote it without playing search engine
games, you'll get noticed and
you'll get links. Remember
Sturgeon's Law ("Ninety percent of
everything is crud"). Why not make your web site an
exception?
97.3 What Happens if You Reform?
Maybe
you've got a site that's not
exactly the work of a good search engine citizen. Maybe
you've got 500 doorway pages, 10
title tags per page, and enough hidden text to
make an O'Reilly Pocket Guide. But maybe now you
want to reform. You want to have a clean, lovely site and leave the
doorway pages to Better Homes and Gardens. Are
you doomed? Will Google ban your site for the rest of its life?
No. The first thing you need to do is clean up your site—remove
all traces of rule breaking. Next, send a note about your site
changes and the URL to help@google.com. Note
that Google really doesn't have the resources to
answer every email about why they did or didn't
index a site—otherwise, they'd be answering
emails all day—and there's no guarantee that
they will reindex your kinder, gentler site. But they will look at
your message.
97.4 What Happens if You Spot Google Abusers in the Index?
What if some other site that you come across in your Google searching
is abusing Google's spider and PageRank mechanism?
You have two options. You can send an email to spamreport@google.com
or fill out the form at http://www.google.com/contact/spamreport.html.
(I'd fill out the form; it reports the abuse in a
standard format that Google's used to seeing.)