There are many times you may wish to exclude certain pages from being indexed by certain search engines. One way to do this is to create a robots.txt file and upload it to the root directory of your Web site.
Just create a text file with Windows Notepad or any other editor that can save plain ASCII .txt files.
Use the following syntax: a User-Agent line naming the spider, followed by one or more Disallow lines, each giving the path of a file that spider should not index.
For example, to tell Inktomi’s spider, called Slurp, to not index files called orderform.html and junk.html, create a robots.txt file as follows:
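As a minimal sketch, the file for this example would plausibly contain the three lines held in ROBOTS_TXT below. The Python around them is an addition, not part of the original article: it feeds those lines to the standard library's urllib.robotparser to check how a compliant spider would read them. The agent name Googlebot and the page /index.html are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents of robots.txt for the Slurp example.
ROBOTS_TXT = """\
User-Agent: Slurp
Disallow: /orderform.html
Disallow: /junk.html
"""

def load_rules(text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

rules = load_rules(ROBOTS_TXT)

# Slurp is kept away from the two listed files only.
slurp_orderform = rules.can_fetch("Slurp", "/orderform.html")
slurp_junk = rules.can_fetch("Slurp", "/junk.html")
slurp_index = rules.can_fetch("Slurp", "/index.html")

# A spider that is not named in the file is unaffected.
other_orderform = rules.can_fetch("Googlebot", "/orderform.html")

print(slurp_orderform, slurp_junk, slurp_index, other_orderform)
```

A result of False means the spider is asked not to fetch that page; True means the file places no restriction on it.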
You would then upload this robots.txt file to the root directory of your Web site. Although this is a voluntary protocol, most major search engines will honor it.
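Spiders request the file from one fixed location at the top of the site, which is why the root directory matters. The short sketch below derives that location from any page address; the function name robots_url and the example.com address are hypothetical, used only for illustration.

```python
from urllib.parse import urljoin

def robots_url(page_url):
    """Return the robots.txt location for the site that serves page_url."""
    # A leading slash makes the path relative to the site root, so any
    # directory that appears in page_url is discarded.
    return urljoin(page_url, "/robots.txt")

location = robots_url("http://www.example.com/products/orderform.html")
print(location)
```

A robots.txt file uploaded into a subdirectory such as /products/ would not be found at this address.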
You can exclude pages from other engines by adding another User-Agent line to the same file, followed by more Disallow lines. Each Disallow statement applies to the most recent User-Agent line above it. If you want to exclude an entire directory, use this syntax:
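The sketch below combines both ideas: one record per spider, and a Disallow line that names a directory rather than a single file. The directory /private/ and the second agent name Googlebot are assumptions made for the example, and the check again uses Python's standard urllib.robotparser rather than any tool mentioned in the article.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: one record per spider, separated by a blank line.
# "/private/" is a hypothetical directory name.
ROBOTS_TXT = """\
User-Agent: Slurp
Disallow: /orderform.html

User-Agent: Googlebot
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Each Disallow line applies only to the User-Agent named above it.
slurp_form = parser.can_fetch("Slurp", "/orderform.html")
slurp_private = parser.can_fetch("Slurp", "/private/a.html")

# A directory rule covers every path that starts with "/private/".
google_private = parser.can_fetch("Googlebot", "/private/a.html")
google_form = parser.can_fetch("Googlebot", "/orderform.html")

print(slurp_form, slurp_private, google_private, google_form)
```

Because a Disallow value is matched as a path prefix, the trailing slash is what restricts the rule to the directory's contents.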
Another option is to exclude a page from all spiders by using an asterisk (*) as the User-Agent value:
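A minimal sketch of such a file, assuming the page to hide is the same orderform.html used earlier. The agent names passed to the check are arbitrary; SomeOtherBot is made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Assumed robots.txt: the asterisk addresses every spider at once.
ROBOTS_TXT = """\
User-Agent: *
Disallow: /orderform.html
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Every agent name falls under the "*" record.
slurp_blocked = parser.can_fetch("Slurp", "/orderform.html")
other_blocked = parser.can_fetch("SomeOtherBot", "/orderform.html")

# Pages that are not listed remain available to all spiders.
index_allowed = parser.can_fetch("SomeOtherBot", "/index.html")

print(slurp_blocked, other_blocked, index_allowed)
```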
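Do NOT rely on the wildcard (*) character in the Disallow line. The original robots.txt standard does not define it there, so a spider that follows only that standard will not treat it as a wildcard.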
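To illustrate the risk, the sketch below uses Python's standard urllib.robotparser as a stand-in for a spider that does plain prefix matching, which is how that module appears to behave in current releases. The rule /*.html is a deliberately faulty example, not a recommended pattern.

```python
from urllib.robotparser import RobotFileParser

# Faulty robots.txt: the author hopes "*" will match any file name.
WILDCARD_TXT = """\
User-Agent: *
Disallow: /*.html
"""

# Working alternative: name the file explicitly.
EXPLICIT_TXT = """\
User-Agent: *
Disallow: /junk.html
"""

def is_allowed(robots_text, agent, path):
    """Return True if the given rules let the agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, path)

# A prefix-matching parser reads "*" as a literal character, so the
# wildcard rule does not block /junk.html.
wildcard_result = is_allowed(WILDCARD_TXT, "Slurp", "/junk.html")
explicit_result = is_allowed(EXPLICIT_TXT, "Slurp", "/junk.html")

print(wildcard_result, explicit_result)
```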
Make sure you use the proper syntax. If you misspell something, it’s not going to work.
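A misspelled line usually fails silently rather than producing an error, which makes the mistake easy to miss. The sketch below shows this with Python's standard urllib.robotparser; the misspelling Dissallow is a made-up example of the kind of typo to watch for.

```python
from urllib.robotparser import RobotFileParser

# "Dissallow" is misspelled on purpose.
MISSPELLED_TXT = """\
User-Agent: Slurp
Dissallow: /junk.html
"""

CORRECT_TXT = """\
User-Agent: Slurp
Disallow: /junk.html
"""

def is_allowed(robots_text, agent, path):
    """Return True if the given rules let the agent fetch the path."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(agent, path)

# The unrecognized line is skipped, so nothing is excluded.
misspelled_result = is_allowed(MISSPELLED_TXT, "Slurp", "/junk.html")
correct_result = is_allowed(CORRECT_TXT, "Slurp", "/junk.html")

print(misspelled_result, correct_result)
```

Running a check like this before uploading the file is one way to confirm that each rule does what you intended.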
This article is copyrighted and has been reprinted with permission from FirstPlace Software, the makers of WebPosition Gold. FirstPlace Software helped define the SEO industry with the introduction of the first product to track your rankings on the major search engines and to help you improve those rankings. A free trial of WebPosition Gold is available from their Web site.