Google For Dummies Free Open Book

Google For Dummies

Keeping Google Out

Your priority might run contrary to this chapter, in that you want to prevent Google from crawling your site and putting it in the Web search index. It does seem pushy, when you think about it, for any search engine to invade your Web space, suck up all your text, and make it available to anyone with a matching keyword. Some people feel that Google’s cache is more than just pushy and infringes copyright by storing an unauthorized copy of a site.

If you want to keep the Google crawl out of your site, get familiar with the robots.txt file, which implements the Robots Exclusion Protocol. Google’s spider understands and obeys this protocol.

The robots.txt file is a short, simple text file that you place in the top-level directory (root directory) of your domain server. (If you use server space provided by a utility ISP, such as AOL, you probably need administrative help in placing the robots.txt file.) The file contains two instructions:

  • User-agent: This instruction specifies which search engine crawler must follow the robots.txt instructions.

  • Disallow: This line specifies which directories (Web page folders) or specific pages at your site are off-limits to the search engine. You must include a separate Disallow line for each excluded directory.
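Because the file has to sit in the root directory, its address follows directly from any page address on the same site. Here is a minimal Python sketch of that idea, assuming the standard library's urllib.parse module; the function name robots_txt_url is made up for illustration.

```python
from urllib.parse import urljoin


def robots_txt_url(page_url):
    """Return the root-level robots.txt address for the site hosting page_url."""
    # A path that starts with "/" replaces the entire path of the base URL,
    # so the result always points at the top-level directory.
    return urljoin(page_url, "/robots.txt")


print(robots_txt_url("http://www.bradhill.com/family/reunion-notes.htm"))
```

Any page on the site, however deeply nested, leads back to the same single robots.txt address.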

A sample robots.txt file looks like this:

User-agent: *
Disallow: /

This example is the simplest robots.txt file, and it shuts out every crawler. The asterisk after User-agent means that the instructions apply to all spiders. The forward slash after Disallow means that all site directories are off-limits.
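If you want to check how a well-behaved crawler would read this file, Python's standard urllib.robotparser module can interpret it. The sketch below is only an approximation of what a real spider does, and the helper name is_allowed and the crawler name SomeOtherBot are invented for the example.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(robots_txt, user_agent, url):
    """Report whether user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Every crawler is refused every address on the site.
print(is_allowed(ROBOTS_TXT, "Googlebot", "http://www.bradhill.com/index.htm"))
print(is_allowed(ROBOTS_TXT, "SomeOtherBot", "http://www.bradhill.com/photos/"))
```

Both checks report that the crawl is refused, whatever the crawler is called.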

The name of Google’s spider is Googlebot. (“Here, Googlebot! Come to Daddy! Sit. Good Googlebot! Who’s a good boy?”) If you want to exclude only Google and no other search engines, use this robots.txt file:

User-agent: Googlebot
Disallow: /
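The difference between shutting out Googlebot and shutting out everyone can be seen with the same standard-library parser. This is a rough sketch, not a statement of how Google itself evaluates the file; SomeOtherBot is a hypothetical crawler name.

```python
from urllib.robotparser import RobotFileParser

# The Googlebot-only file, given as a list of lines.
parser = RobotFileParser()
parser.parse([
    "User-agent: Googlebot",
    "Disallow: /",
])

page = "http://www.bradhill.com/index.htm"

# Googlebot matches the User-agent line and is turned away.
googlebot_ok = parser.can_fetch("Googlebot", page)
# A crawler with no matching User-agent line is not restricted.
other_ok = parser.can_fetch("SomeOtherBot", page)

print("Googlebot allowed:", googlebot_ok)
print("SomeOtherBot allowed:", other_ok)
```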

You may identify certain directories as impervious to the crawl, either from Google or all spiders:

User-agent: *
Disallow: /cgi-bin/
Disallow: /family/
Disallow: /photos/

Notice the forward slash at each end of the directory string in the preceding examples. Google understands that the first slash implies your domain address before it. So, if the first Disallow line were found at the bradhill.com site, the line would be shorthand for http://www.bradhill.com/cgi-bin/, and Google would know to exclude that directory from the crawl. The second forward slash is the indicator that you are excluding an entire directory: a Disallow line matches any address that begins with the listed path, so /family/ covers everything inside that directory.
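The effect of the trailing slash is easier to see by testing a few addresses. The sketch below uses Python's standard urllib.robotparser module, which matches a Disallow path as a prefix of the address; treat it as an illustration of the protocol rather than a guarantee of any particular search engine's behavior. The helper name blocked and the page family.htm are made up for the example.

```python
from urllib.robotparser import RobotFileParser

SITE = "http://www.bradhill.com"


def blocked(disallow_path, url, user_agent="Googlebot"):
    """Return True if a one-rule robots.txt with disallow_path excludes url."""
    parser = RobotFileParser()
    parser.parse(["User-agent: *", "Disallow: " + disallow_path])
    return not parser.can_fetch(user_agent, url)


# With the trailing slash, only addresses inside the directory are excluded.
inside_dir = blocked("/family/", SITE + "/family/reunion-notes.htm")
same_name_page = blocked("/family/", SITE + "/family.htm")
# Without it, anything whose path starts with "/family" is excluded too.
no_slash = blocked("/family", SITE + "/family.htm")

print(inside_dir, same_name_page, no_slash)
```

Leaving off the ending slash widens the rule, which is why the directory examples include it.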

To exclude individual pages, type the page address following the first forward slash, and leave off the ending forward slash, like this:

User-agent: *
Disallow: /family/reunion-notes.htm
Disallow: /blog/archive00082.htm

Remember

Each excluded directory and page must be listed on its own Disallow line. Do not group multiple items on one line.
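If you keep your exclusions in a list, a few lines of Python can write the file with one Disallow line per item. This is just a sketch; the function name build_robots_txt is hypothetical, and the paths are the ones used in the examples above.

```python
def build_robots_txt(user_agent, paths):
    """Return robots.txt text with a separate Disallow line for each path."""
    lines = ["User-agent: " + user_agent]
    for path in paths:
        # One item per line: never join several paths on a single Disallow line.
        lines.append("Disallow: " + path)
    return "\n".join(lines) + "\n"


robots_txt = build_robots_txt("*", [
    "/cgi-bin/",
    "/family/reunion-notes.htm",
    "/blog/archive00082.htm",
])
print(robots_txt, end="")
```

Save the resulting text as robots.txt in the root directory of your site.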

Tip 

You may adjust the robots.txt file as often as you like. It’s a good tool when building out fresh pages that you don’t want indexed while still under construction. When they’re finished, take them out of the robots.txt file.

