Blocking search engine bots from caching specific webpages
by rajesh 2007-09-05 10:09:32
Before accessing/caching a page on your site, every search engine bot will look for a file named robots.txt in the root directory of the site. Example: http://hiox.org/robots.txt
If the file is not present, the bot will cache every page it can crawl through.
We can add a robots.txt file and set specific rules, such as allowing only certain bots or disallowing certain files and folders.
Some examples follow:
First, create a file named robots.txt in the root directory of your website and proceed.
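If you want to confirm the file is publicly reachable, you can fetch it yourself, for example with a command-line tool like curl (http://sitename/ below stands in for your own domain):
curl http://sitename/robots.txt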
To allow all robots complete access, add the following code to the robots.txt file:
User-agent: *
Disallow:
To exclude all robots from the entire server (note the forward slash: an empty Disallow allows everything, while Disallow: / blocks the whole site):
User-agent: *
Disallow: /
To disallow all robots from accessing/caching particular folders on the server:
User-agent: *
Disallow: /foldername/
Disallow: /example/
When you use the above code, no robot will access/cache any file under the http://sitename/foldername/ and http://sitename/example/ folders.
To disallow all robots from accessing/caching a single file on the server:
User-agent: *
Disallow: /foldername/filename.html
When you use the above code, no robot will access/cache the file http://sitename/foldername/filename.html.
Blocking a specific search engine or bot
User-agent: Googlebot
Disallow: /
The above code blocks the Googlebot. Just replace Googlebot with the user-agent name of whichever robot you want to block.
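For instance, Yahoo!'s crawler identifies itself as Slurp, so the following should block it (check a bot's own documentation if you are unsure of its exact user-agent name):
User-agent: Slurp
Disallow: /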
Example: If you want to block access to all bots other than the Googlebot, you can use the following syntax:
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow:
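Here the Googlebot matches its own record, whose empty Disallow grants it full access, while every other bot falls back to the * record and is blocked. A robot obeys only the most specific record that matches its user-agent, so the two records do not conflict.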