Blocking search engine bots from caching specific webpages
by rajesh 2007-09-05 10:09:32
Before accessing/caching a page on your site, every search engine bot will look for a file named robots.txt in the root directory of the site. Example: http://hiox.org/robots.txt
If the file is not present, the bot will cache every page it can crawl through.
We can add a robots.txt file and set specific rules, such as allowing only certain bots or disallowing certain files and folders.
Some examples follow:
First, create a file named robots.txt in the root directory of your website and proceed.
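If you want to confirm the file is publicly reachable, you can fetch it yourself, for example with a command-line tool like curl (http://sitename/ below stands in for your own domain):
curl http://sitename/robots.txt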
To allow all robots complete access, add the following code to the robots.txt file:
User-agent: *
Disallow:
To exclude all robots from the entire server (note the forward slash: an empty Disallow allows everything, while Disallow: / blocks the whole site):
User-agent: *
Disallow: /
To disallow all robots from accessing/caching particular folders on the server:
User-agent: *
Disallow: /foldername/
Disallow: /example/
When you use the above code, no robot will access/cache any file under the http://sitename/foldername/ and http://sitename/example/ folders.
To disallow all robots from accessing/caching a single file on the server:
User-agent: *
Disallow: /foldername/filename.html
When you use the above code, no robot will access/cache the file http://sitename/foldername/filename.html.
Blocking a specific search engine or bot
User-agent: Googlebot
Disallow: /
The above code blocks the Googlebot. Just replace Googlebot with the user-agent name of whichever robot you want to block.
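For instance, Yahoo!'s crawler identifies itself as Slurp, so the following should block it (check a bot's own documentation if you are unsure of its exact user-agent name):
User-agent: Slurp
Disallow: /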
Example: If you want to block access to all bots other than the Googlebot, you can use the following syntax:
User-agent: *
Disallow: /
User-agent: Googlebot
Disallow:
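Here the Googlebot matches its own record, whose empty Disallow grants it full access, while every other bot falls back to the * record and is blocked. A robot obeys only the most specific record that matches its user-agent, so the two records do not conflict.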