Useful robots.txt rules

Here are some common, useful robots.txt rules:

Useful rules
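Disallow crawling of the entire site	Keep in mind that in some situations URLs from the site may still be indexed even if they have not been crawled. Note: this rule does not match the various AdsBot crawlers, which must be named explicitly.
    User-agent: *
    Disallow: /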
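Disallow crawling of a directory and its contents	Append a forward slash to the directory name to disallow crawling of the whole directory. Note: do not use robots.txt to block access to private content; use proper authentication instead. URLs disallowed by the robots.txt file may still be indexed without being crawled, and because anyone can view the robots.txt file, it may also disclose the location of your private content.
    User-agent: *
    Disallow: /calendar/
    Disallow: /junk/
    Disallow: /books/fiction/contemporary/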
Allow access to a single crawler	Only `googlebot-news` may crawl the whole site.
    User-agent: Googlebot-news
    Allow: /
    User-agent: *
    Disallow: /
Allow access to all but a single crawler	`Unnecessarybot` may not crawl the site; all other bots may.
    User-agent: Unnecessarybot
    Disallow: /
    User-agent: *
    Allow: /
Disallow crawling of a single web page	For example, disallow the `useless_file.html` page located at `https://example.com/useless_file.html`, and the `other_useless_file.html` page in the `junk` directory.
    User-agent: *
    Disallow: /useless_file.html
    Disallow: /junk/other_useless_file.html
Disallow crawling of the whole site except a subdirectory	Crawlers may only access the `public` subdirectory.
    User-agent: *
    Disallow: /
    Allow: /public/
Block a specific image from Google Images	For example, disallow the `dogs.jpg` image.
    User-agent: Googlebot-Image
    Disallow: /images/dogs.jpg
Block all images on your site from Google Images	Google cannot index images and videos without crawling them.
    User-agent: Googlebot-Image
    Disallow: /
Disallow crawling of files of a specific type	For example, disallow crawling of all `.gif` files.
    User-agent: Googlebot
    Disallow: /*.gif$
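Disallow crawling of the entire site, but allow `Mediapartners-Google`	This keeps your pages out of search results, but the `Mediapartners-Google` web crawler can still analyze them to decide which ads to show visitors on your site.
    User-agent: *
    Disallow: /
    User-agent: Mediapartners-Google
    Allow: /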
Use the `*` and `$` wildcards to match URLs that end with a specific string	For example, disallow all `.xls` files.
    User-agent: Googlebot
    Disallow: /*.xls$
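
Several of the rules above rely on how `Allow`, `Disallow`, `*` and `$` interact. The following is a minimal Python sketch of that matching for a single user-agent group. It assumes the longest-match precedence described in RFC 9309, with `Allow` winning a tie. The function names `pattern_to_regex` and `is_allowed` are hypothetical, and the sketch is not any crawler's actual parser: it skips user-agent group selection and percent-encoding normalization.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal piece, then join the pieces with ".*" for every "*".
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + (r"\Z" if anchored else ""))


def is_allowed(rules, path: str) -> bool:
    """Decide whether `path` may be crawled under one group's rules.

    `rules` is a list of ("allow" | "disallow", pattern) tuples.
    The longest matching pattern wins; on a tie, "allow" wins.
    """
    best_len = -1
    allowed = True  # no matching rule means crawling is allowed
    for directive, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        if pattern_to_regex(pattern).match(path):
            is_allow = directive == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# Rules taken from the table above.
public_only = [("disallow", "/"), ("allow", "/public/")]
no_gifs = [("disallow", "/*.gif$")]

print(is_allowed(public_only, "/public/index.html"))  # True
print(is_allowed(public_only, "/private.html"))       # False
print(is_allowed(no_gifs, "/images/cat.gif"))         # False
print(is_allowed(no_gifs, "/images/cat.gif?size=2"))  # True
```

The last call shows why the trailing `$` matters: with it, the pattern only matches URLs that end in `.gif`, so a URL with a query string after the extension is not blocked.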