Wednesday, July 16, 2014
Webmaster level: intermediate-advanced
To crawl, or not to crawl, that is the robots.txt question.
Making and maintaining correct robots.txt files can sometimes be difficult. While most sites have it easy (tip: they often don't even need a robots.txt file!), finding the directives within a large robots.txt file that are or were blocking individual URLs can be quite tricky. To make that easier, we're now announcing an updated robots.txt testing tool in Webmaster Tools.
You can find the updated testing tool in Webmaster Tools within the Crawl section.
Here you'll see the current robots.txt file, and can test new URLs to see whether they're disallowed for crawling. To guide your way through complicated directives, it will highlight the specific one that led to the final decision. You can make changes in the file and test those too; you'll just need to upload the new version of the file to your server afterwards for the changes to take effect. Our developers site has more about robots.txt directives and how the files are processed.
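If you'd like to run a similar allow/disallow check locally before uploading a file, the idea can be sketched with Python's standard `urllib.robotparser` module. This is only a rough illustration: the robots.txt content, the `is_allowed` helper, and the example.com URLs are made up for the example, and Python's parser may not resolve conflicting directives exactly the way Googlebot does, so treat the testing tool as the reference.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Disallow: /tmp/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Hypothetical helper: it parses the text in memory, so nothing is
    downloaded from the network.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/private/report.html",
        "https://example.com/index.html",
    ):
        verdict = "allowed" if is_allowed(ROBOTS_TXT, "Googlebot", url) else "disallowed"
        print(url, verdict)
```

Unlike the testing tool, this sketch only reports the final decision; it doesn't point out which directive was responsible.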
Additionally, you'll be able to review older versions of your robots.txt file, and see when access issues block us from crawling. For example, if Googlebot sees a 500 server error for the robots.txt file, we'll generally pause further crawling of the website.
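To catch this kind of access issue on your own, you could periodically check the HTTP status your server returns for the robots.txt file. The sketch below is a minimal example under stated assumptions: the function names and the example.com URL are hypothetical, and treating every 5xx status like the 500 case above is an assumption of this example rather than a description of Googlebot's exact behavior.

```python
import urllib.error
import urllib.request


def is_server_error(status: int) -> bool:
    """Return True for 5xx statuses.

    Assumption for this sketch: any 5xx response for robots.txt is treated
    like the 500 case, i.e. as something that may pause crawling.
    """
    return 500 <= status <= 599


def robots_txt_status(url: str, timeout: float = 10.0) -> int:
    """Fetch url and return the HTTP status code (hypothetical helper)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx; the status is in err.code.
        return err.code


if __name__ == "__main__":
    # Placeholder URL; replace it with your own site's robots.txt location.
    status = robots_txt_status("https://example.com/robots.txt")
    if is_server_error(status):
        print(f"robots.txt returned {status}: crawling may be paused")
    else:
        print(f"robots.txt returned {status}")
```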
We hope this updated tool makes it easier for you to test and maintain the robots.txt file. Should you have any questions, or need help with crafting a good set of directives, you can drop by our webmaster's help forum!