Robots.txt and SEO
by Vickram H, 2013-08-27 16:45:08
What Does Robots.txt Mean?
A robots.txt file implements the Robots Exclusion Protocol (REP), a group of web standards that regulates web robot behavior and search engine indexing; it tells crawlers which parts of a site they may and may not fetch.
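The file is plain text served from the root of the host it governs; a minimal sketch (example.com and the /private/ path are placeholders):

# Served from http://www.example.com/robots.txt
# Each record starts with a User-agent line, followed by one or more rules.
User-agent: *
Disallow: /private/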
Pattern Matching:
Google and Bing both honor two pattern-matching characters that can be used to identify pages or sub-folders an SEO wants excluded; both are demonstrated in the sketch after this list.
* = a wildcard that matches any sequence of characters
$ = matches the end of the URL
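For example, a sketch that uses both characters (the query-string and PDF rules are hypothetical choices, not part of the original examples):

User-agent: *
# * matches any sequence of characters, so this blocks any URL containing a query string
Disallow: /*?
# $ anchors the match to the end of the URL, so this blocks only URLs that end in .pdf
Disallow: /*.pdf$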
Block all web crawlers from all content:
User-agent: *
Disallow: /
Block a specific web crawler from a specific folder:
User-agent: Googlebot
Disallow: /no-google/
Block a specific web crawler from a specific web page:
User-agent: Googlebot
Disallow: /no-google/blocked-page.html
Allow a specific web crawler to visit a specific web page:
User-agent: *
Disallow: /no-bots/block-all-bots-except-rogerbot-page.html
User-agent: rogerbot
Allow: /no-bots/block-all-bots-except-rogerbot-page.html
Sitemap Directive:
User-agent: *
Disallow:
Sitemap: http://www.example.com/none-standard-location/sitemap.xml
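The Sitemap directive stands outside any User-agent group, and more than one may be listed; a sketch with hypothetical sitemap file names:

# Multiple sitemaps can be declared in the same robots.txt
Sitemap: http://www.example.com/sitemap-pages.xml
Sitemap: http://www.example.com/sitemap-posts.xml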