Getting into Google
You can get your site into the Google index in two ways:
- Submit the site manually
- Let the crawl find it
Both methods lead to unpredictable results. Google offers no assurance that submitted sites will be added to the index. Google does not respond to submissions, and it does not promise to add or discard a site within any particular time frame. You may submit and then wait for the crawl, or you may simply wait for the crawl. Submitting neither directs the crawl toward you nor deflects it. Google is impassive and promises nothing. But Google does sometimes add sites that are not linked on other pages and would probably not be found by the crawl.
| Remember |
If you have added a new page to a site already in the Google index, you do not need to submit the new page. Under most circumstances, Google will find it the next time your site is crawled. But you might as well submit an entirely new site, even if it consists of a single page. Do so at this URL:
www.google.com/addurl.html
|
The submission form could hardly be simpler. Enter your URL, and add whatever descriptive comments you feel might help your cause. Then click the Add URL button, whose name is a bit misleading. Submitting a site is not the same as adding it to the index! Only the Google crawler or a human Google staffer can make additions to the index.
Luring the spider
The key to attracting Google’s spider is getting linked on other sites. Google finds your content by following links to your pages. With no incoming links, you’re an unreachable island as far as the Google crawl is concerned. Of course, anybody can reach you directly by entering the URL, but you won’t pluck the spider’s web until you get other sites to link to you.
In theory, any single page currently crawled by Google (that is, in the index) that links to your page or site is enough to send Google’s spider crawling toward you. In practice, you want as many incoming links as possible, both to increase your chance of being crawled (sounds a little uncomfortable, doesn’t it?) and to improve your PageRank after your site is in the index.
| Remember |
Keep your pipes clean. That is to say, don’t make life difficult for Google’s spider. In other words (how many different ways can I say this before I finally make myself clear?), host your site with a reliable Web host, and keep your pages in good working order. The Google crawl attempts to break through connection problems, but it doesn’t keep trying forever. If it can’t get through in the monthly deep crawl and your site isn’t included in the fresh crawl, you could suffer a longish, unnecessary delay before getting into the index.
|
| Tip |
Don’t expect instant recognition in Google when you add a page to your site. If your site is part of the fresh crawl, new pages show up fairly quickly in search results, but there’s no firm formula for the frequency of the fresh crawl or the implementation of its results. If the spider hits your site only during the deep crawl, the wait for fresh pages to appear in the index is considerably longer. The same factors apply if you move your site from one URL to another (although not if you merely change hosts and keep the same URL). A further complication is that your site at the old address might remain cached (stored) in Google’s index, even while search results are matching keywords to your site at the new address. This confusion is one reason some Webmasters don’t like the Google cache: when they make a change to a site or its address, they don’t want the old information living on in the world’s most popular search index.
|
Spider-friendly tips
Getting into the Google index is largely a waiting game, in which preparation, persistence, and patience are the tools of success. However, a number of techniques incline Google’s spider to look on you more favorably:
- Place important content outside dynamically generated pages: A dynamic page is one created on the fly based on choices made by the site visitor. This method of page generation works fine when the visitor is a thinking human. (Or even a relatively thoughtless human.) But when an index robot hits such a site, it can generate huge numbers of pages unintentionally (assuming robots ever have intentions), sometimes crashing the site or its server. The Google spider picks up some dynamically generated pages, but generally backs off when it encounters dynamic content. Weblog pages do not fall into this category: they are dynamically generated by you, the Webmaster, but not by your visitors.
- Don’t use splash pages: Splash pages (which Google calls doorway pages) are content-empty entry pages to Web sites. You’ve probably seen them. Some splash pages employ cool multimedia introductions to the content within. Others are mere static welcome mats that force users to click again before getting into the site. Google does not like pointing its searchers to splash pages. In fact, these tedious welcome mats are bad site design by any standard, even if you don’t care about Google indexing, and I recommend getting rid of them. Give your visitors, and Google, meaningful content from the first click, and you’ll be rewarded with happier visitors and better placement in Google’s index.
- Use frames sparingly: Frames have been generally loathed since their introduction into the HTML specification early in the Web’s history. They wreak havoc with the Back button, and they confuse the fundamental format of Web addresses (one page per address) by including independent page functions within one Web page. However, frames do have legitimate uses. Google itself uses frames to display threads in Google Groups (see Chapter 4). But the Google crawler turns up its nose when it encounters frames. That’s not to say that framed pages necessarily remain out of the Web index. But errors can ensue, hurting both the index and your visitors: either your framed pages won’t be included, or searchers are sent to the wrong page because of addressing confusion. If you do use frames, make your site Google friendly (and human friendly) by providing links to unframed versions of the same content. These links give Google’s diligent spider another route to your valuable content, and give us (Google’s users) better addresses with which to find your stuff. And your visitors get a choice of viewing modes, so everybody wins.
- Divide content topically: How long should a Web page be? The answer differs depending on the nature of the page, the type of visitor it attracts, how heavy (with graphics and other modem-choking material) it is, and how on-topic the entire page is. Long pages are sometimes the result of lazy site building, because it takes effort to spin off a new page, address it, link to it, and integrate it into the overall site design. From Google’s perspective, and in the context of securing better representation in the index, breaking up content is good, as long as it makes topical sense. If you operate a fan page for a local music group, and the site contains bios, music clips, concert schedules, and lyrics, Google could make more sense of it all if you devote a separate page to each of those content groups. Google also likes to see page titles relating closely to page content. Keeping your information bites mouth-sized helps Google index your stuff better.
- Keep your link structure tidy: Google’s spider is efficient, but it’s not a mind-reader. Nor does it make up URL variations, hoping to find hidden content. The Google crawler is a slave to the link. If you want all your pages represented in the index, make sure each one has a link leading to it from within your site. Many site-building programs contain link-checking routines and administrative checks to diagnose linkage problems. Simple sites might not warrant such firepower; in that case, check your navigation sidebars and section headers to make sure you’re not leaving out anything.
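For a small site, the link-structure check described in the last tip can be sketched in a few lines of Python. This is only an illustration, not a description of how Google's crawler or any site-building program actually works. The `find_orphans` function and the sample `site` map are hypothetical names made up for this example, and the sketch assumes you have already listed, by hand or otherwise, which pages each of your pages links to.

```python
from collections import deque


def find_orphans(links, start):
    """Return the pages in `links` that cannot be reached from `start`.

    `links` maps each page on the site to the list of pages it links to.
    The search follows links only, the way a link-following crawler would.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # Ignore links that leave the site or were already visited.
            if target in links and target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(links) - seen)


# Hypothetical fan site: each page and the pages it links to.
site = {
    "/index.html": ["/bios.html", "/schedule.html"],
    "/bios.html": ["/index.html"],
    "/schedule.html": ["/index.html", "/clips.html"],
    "/clips.html": [],
    "/lyrics.html": ["/index.html"],  # nothing links TO this page
}

print(find_orphans(site, "/index.html"))  # ['/lyrics.html']
```

In this made-up site, the lyrics page links back to the home page, but no page links to it, so a crawler that starts at the home page and only follows links would never find it. Adding a link to it from a navigation sidebar would fix the problem.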