Useful robots.txt rules

Here are some common useful robots.txt rules:

Disallow crawling of the entire site

Keep in mind that in some situations URLs from the site may still be indexed, even if they haven't been crawled.

User-agent: *
Disallow: /

Disallow crawling of a directory and its contents

Append a forward slash to the directory name to disallow crawling of a whole directory.

User-agent: *
Disallow: /calendar/
Disallow: /junk/
Disallow: /books/fiction/contemporary/

Allow access to a single crawler

Only googlebot-news may crawl the whole site.

User-agent: Googlebot-news
Allow: /

User-agent: *
Disallow: /
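
If you want to sanity-check how a per-crawler rule set like this resolves for different user agents, Python's standard-library urllib.robotparser can parse the rules and answer per-agent queries. This is a minimal sketch; the example URL is illustrative, and note that the stdlib parser uses simple prefix matching and does not implement Google's * and $ wildcards or longest-match precedence, so results can differ for more complex files.

from urllib import robotparser

# The rules from the example above, exactly as they would appear in robots.txt.
rules = """\
User-agent: Googlebot-news
Allow: /

User-agent: *
Disallow: /
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

# Googlebot-news matches the first group; every other crawler falls through to *.
print(rp.can_fetch("Googlebot-news", "https://example.com/page.html"))  # True
print(rp.can_fetch("SomeOtherBot", "https://example.com/page.html"))    # False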

Allow access to all but a single crawler

Unnecessarybot may not crawl the site; all other bots may.

User-agent: Unnecessarybot
Disallow: /

User-agent: *
Allow: /

Disallow crawling of a single web page

For example, disallow the useless_file.html page located at https://example.com/useless_file.html, and the other_useless_file.html page in the junk directory.

User-agent: *
Disallow: /useless_file.html
Disallow: /junk/other_useless_file.html

Disallow crawling of the whole site except a subdirectory

Crawlers may only access the public subdirectory.

User-agent: *
Disallow: /
Allow: /public/
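
This pattern works because of rule precedence: when more than one rule matches a URL, Google applies the most specific rule (the one with the longest matching path), and the less restrictive rule wins a tie, so Allow: /public/ overrides Disallow: / for URLs under /public/. The sketch below is a simplified illustration of that longest-match logic, not Google's actual parser; it uses plain prefix matching and ignores wildcards.

def most_specific_verdict(path, rules):
    # rules: list of (directive, rule_path) tuples, e.g. ("allow", "/public/").
    # The longest matching rule path wins; "allow" wins a tie.
    matches = [(directive, rule_path) for directive, rule_path in rules
               if path.startswith(rule_path)]
    if not matches:
        return "allow"  # no rule matches: crawling is allowed by default
    matches.sort(key=lambda m: (len(m[1]), m[0] == "allow"), reverse=True)
    return matches[0][0]

rules = [("disallow", "/"), ("allow", "/public/")]
print(most_specific_verdict("/public/page.html", rules))   # allow
print(most_specific_verdict("/private/page.html", rules))  # disallow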

Block a specific image from Google Images

For example, disallow the dogs.jpg image.

User-agent: Googlebot-Image
Disallow: /images/dogs.jpg

Block all images on your site from Google Images

Google can't index images and videos without crawling them.

User-agent: Googlebot-Image
Disallow: /

Disallow crawling of files of a specific file type

For example, disallow crawling of all .gif files.

User-agent: Googlebot
Disallow: /*.gif$

Disallow crawling of an entire site, but allow Mediapartners-Google

This implementation hides your pages from search results, but the Mediapartners-Google web crawler can still analyze them to decide what ads to show visitors on your site.

User-agent: *
Disallow: /

User-agent: Mediapartners-Google
Allow: /

Use the * and $ wildcards to match URLs that end with a specific string

For example, disallow all .xls files.

User-agent: Googlebot
Disallow: /*.xls$
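
In this matching, * stands for any sequence of characters and a trailing $ anchors the rule to the end of the URL; without $, a rule matches as a prefix. The helper below is a rough, illustrative approximation of that behavior in Python (not Google's actual parser, and it ignores percent-encoding details).

import re

def rule_to_regex(rule_path):
    # "*" matches any sequence of characters; a trailing "$" anchors the
    # rule to the end of the URL. Everything else is a literal prefix match.
    anchored = rule_path.endswith("$")
    if anchored:
        rule_path = rule_path[:-1]
    pattern = ".*".join(re.escape(part) for part in rule_path.split("*"))
    return re.compile("^" + pattern + ("$" if anchored else ""))

rule = rule_to_regex("/*.xls$")
print(bool(rule.match("/reports/q3.xls")))       # True: ends with .xls
print(bool(rule.match("/reports/q3.xls?dl=1")))  # False: "$" anchors to the end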