Annotations: Defining Sites to Search
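This page describes how to define the coverage of your search engine using an XML annotations file.
- Overview
- Using the Programmable Search XML Format
- Improving Search Coverage
- Annotations Limits
Overview
If you are building a large search engine, managing a large collection of sites can be tedious. Instead, you can add and manage many sites by listing them in an annotations file and uploading it. Annotations files also give you greater control over the ranking of search results.
An annotations file is simply a list of annotations. Each annotation has two components: the site and its associated labels. The label tells Programmable Search Engine how to handle a site; that is, whether the site should be included, excluded, promoted, or demoted. In the context file, you define labels; in the annotations file, you tag sites with the appropriate labels.
When you start editing your annotations file, begin with a small number of annotations. A handful of annotations makes it easier to test and troubleshoot your search engine. When you get the results you expect, incrementally add more annotations.
You can upload the annotations file to the Control Panel. For details about file limits, see the "Annotations Limits" section.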
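Using the Programmable Search XML Format
If you want to take advantage of all the features available in the Programmable Search Engine configuration file, XML is the recommended format.
XML Annotations
The following is an example of XML annotations. Among other rules, this annotations file tells Programmable Search Engine to include everything under www.webmd.com/hw/* but exclude everything under www.webmd.com/hw/cancer/*.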
<Annotations>
  <Annotation about="www.cancer.gov/cancertopics/types/liver/*">
    <Label name="_include_"/>
    <Comment>government site</Comment>
  </Annotation>
  <Annotation about="www.medicinenet.com/liver_cancer/">
    <Label name="_exclude_"/>
    <Comment>site on symptoms</Comment>
  </Annotation>
  <Annotation about="www.webmd.com/hw/*">
    <Label name="_include_"/>
    <Comment>great sites for patients!</Comment>
  </Annotation>
  <Annotation about="www.webmd.com/hw/cancer/*">
    <Label name="_exclude_"/>
    <Comment>great sites for patients!</Comment>
  </Annotation>
  <Annotation about="www.oncologychannel.com/*/treatment">
    <Label name="_exclude_"/>
  </Annotation>
</Annotations>
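The annotations file has four elements in the following hierarchy:
- Annotations (root element)
  - Annotation
    - Label
    - Comment (optional)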
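To make the element hierarchy concrete, here is a minimal sketch that reads an annotations file with Python's standard `xml.etree.ElementTree` module and lists each URL pattern with its labels and optional comment. The helper name `read_annotations` and the shortened `SAMPLE` text are illustrative assumptions, not part of Programmable Search Engine.

```python
import xml.etree.ElementTree as ET

# Shortened sample in the same shape as the annotations file format (illustrative).
SAMPLE = """\
<Annotations>
  <Annotation about="www.webmd.com/hw/*">
    <Label name="_include_"/>
    <Comment>great sites for patients!</Comment>
  </Annotation>
  <Annotation about="www.webmd.com/hw/cancer/*">
    <Label name="_exclude_"/>
  </Annotation>
</Annotations>
"""


def read_annotations(xml_text):
    """Return a list of (url_pattern, label_names, comment) tuples.

    Hypothetical helper: it only walks the Annotations > Annotation >
    Label / Comment hierarchy and does not validate label names.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "Annotations":
        raise ValueError("root element must be <Annotations>")
    rows = []
    for annotation in root.findall("Annotation"):
        labels = [label.get("name") for label in annotation.findall("Label")]
        # findtext returns None when the optional <Comment> is absent.
        comment = annotation.findtext("Comment")
        rows.append((annotation.get("about"), labels, comment))
    return rows


if __name__ == "__main__":
    for pattern, labels, comment in read_annotations(SAMPLE):
        print(pattern, labels, comment)
```

Running a check like this on a small file before uploading is one way to catch a misspelled element name early.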
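Creating External Annotations
To list the sites you want your search engine to cover, do the following:
- Start the file with the <Annotations></Annotations> root element.
- Create an annotation by adding the <Annotation></Annotation> tags, and then define the about attribute with the URL pattern of the site.
  <Annotations>
    <Annotation about="www.webmd.com/hw/cancer/*">
    </Annotation>
  </Annotations>
- Associate the site with the search engine by using the <Label name=" "/> tag, and specify how the search engine should treat that site. You can get the labels for your search engine from its context file. You'll find two labels: one for adding sites to your Programmable Search Engine and one for excluding sites from it. If you have not changed the search engine label names in the context file, the label for including sites is in the form of _include_, and the label for excluding sites is in the form of _exclude_. To avoid errors, copy and paste these labels instead of typing them by hand.
  <Annotations>
    <Annotation about="http://www.solarenergy.org/*">
      <Label name="_include_"/>
    </Annotation>
  </Annotations>
  A single site can have multiple labels associated with it.
  If you have changed the name of a label in the context file, remember to update the Label name values in your annotations file.
- To add more sites, create and define another Annotation element.
- Save the XML file.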
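For a long list of sites, the steps above can be scripted instead of typed by hand. The following is a sketch only, using Python's standard `xml.etree.ElementTree`; the function names `build_annotations`, `to_xml`, and `save` are hypothetical, and the label names assume you have not renamed `_include_` and `_exclude_` in your context file.

```python
import xml.etree.ElementTree as ET


def build_annotations(sites):
    """Build an <Annotations> tree from (url_pattern, label_names) pairs."""
    root = ET.Element("Annotations")  # the root element
    for url_pattern, label_names in sites:
        # One <Annotation> per site, with the URL pattern in "about".
        annotation = ET.SubElement(root, "Annotation", {"about": url_pattern})
        # A site can carry more than one label.
        for name in label_names:
            ET.SubElement(annotation, "Label", {"name": name})
    return root


def to_xml(root):
    """Serialize the tree as an indented string (ET.indent needs Python 3.9+)."""
    ET.indent(root, space="  ")
    return ET.tostring(root, encoding="unicode")


def save(root, path):
    """Write the tree to disk as UTF-8 with an XML declaration."""
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    tree = build_annotations([
        ("http://www.solarenergy.org/*", ["_include_"]),
        ("www.webmd.com/hw/cancer/*", ["_exclude_"]),
    ])
    print(to_xml(tree))
```

Generating the file from a list keeps label names consistent, which serves the same purpose as the advice to copy and paste labels rather than type them.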
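Improving Search Coverage
Programmable Search Engine is built on top of the Google index. This means that webpages in the Google index are available to your search engine; conversely, webpages that Google has not crawled will not show up in your search results. If you want your Programmable Search Engine to include sites that are not currently in the Google index, submit a Sitemap to Google Search Console.
A Sitemap includes a list of the pages on your site, along with information about how often the pages are updated and their importance relative to each other. Submitting a Sitemap helps Google discover your webpages and improves the crawling schedule. To learn more about Sitemaps, see the Webmaster Help Center and Using the Sitemap Protocol. For details on building richer Sitemaps, see http://www.sitemaps.org/protocol.php.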
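As an illustration of what such a file contains, the sketch below builds a small Sitemap with the `loc`, `changefreq`, and `priority` elements defined by the sitemaps.org protocol. The function name `build_sitemap` and the example.com URLs are assumptions for illustration; check the protocol page for the full set of elements and allowed values.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build Sitemap XML from (loc, changefreq, priority) tuples.

    Hypothetical helper: it emits only three of the protocol's per-URL
    elements and does not validate the values passed in.
    """
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, changefreq, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc                # page address
        ET.SubElement(url, "changefreq").text = changefreq  # update frequency hint
        ET.SubElement(url, "priority").text = f"{priority:.1f}"  # relative importance
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    print(build_sitemap([
        ("https://www.example.com/", "daily", 1.0),
        ("https://www.example.com/archive/", "yearly", 0.3),
    ]))
```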
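Submitting Sitemaps is particularly helpful if your site has the following:
- Dynamic content
- Webpages that aren't easily discovered by Googlebot (Google's web crawler), such as pages with rich AJAX or Flash features
- Few websites linking to it. Googlebot crawls the web by following links from one page to another, so a site that isn't well linked is hard for the crawler to discover. A new website probably has few other sites pointing to it.
- A large archive of content pages that does not have a strong network of cross-linking
Google can index only pages it can access. So, if you use a robots.txt file or robots meta tags in your webpages, make sure they don't block crawlers from those pages.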
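One rough way to spot-check robots.txt rules is Python's standard `urllib.robotparser`. This is a sketch under assumptions: the `ROBOTS_TXT` content and example.com URLs are made up, the helper `is_crawlable` is hypothetical, and the standard-library parser may not match Googlebot's own rule matching in every case. It also does not look at robots meta tags.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt content for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_crawlable(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text allows user_agent to fetch url.

    Hypothetical helper: parses the text locally instead of downloading it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://www.example.com/hw/page.html",
                "https://www.example.com/private/page.html"):
        print(url, is_crawlable(ROBOTS_TXT, url))
```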
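Improved coverage is not instantaneous, because it takes some time for the pages to be crawled and indexed. But once your webpages are in the index, they can appear in both Google Search and your Programmable Search Engine.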
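Annotations Limits
The following table lists the limits for annotations files that are uploaded to Programmable Search Engine:
Note: Follow the limits closely; if you exceed them, your search engine might not show results.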
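Aspect | Limit
File size (context or annotations files) | 30KB
Maximum number of annotations per search engine | 5,000
Tip: If you find your search engine outgrowing the 5,000-site limit, consider consolidating individual URLs into URL patterns.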
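A pre-upload check against these limits might look like the sketch below. It assumes that "30KB" means 30 * 1024 bytes and that the 5,000-annotation limit applies to the total across all annotations files for one search engine; both readings are assumptions, and `check_limits` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

MAX_FILE_BYTES = 30 * 1024  # assumption: "30KB" read as 30 * 1024 bytes
MAX_ANNOTATIONS = 5000      # per search engine, summed over all files


def check_limits(files):
    """Return a list of problems for {file_name: xml_text}; empty if none."""
    problems = []
    total = 0
    for name, xml_text in files.items():
        size = len(xml_text.encode("utf-8"))  # size as stored in UTF-8
        if size > MAX_FILE_BYTES:
            problems.append(
                f"{name}: {size} bytes exceeds the {MAX_FILE_BYTES}-byte file limit")
        total += len(ET.fromstring(xml_text).findall("Annotation"))
    if total > MAX_ANNOTATIONS:
        problems.append(f"{total} annotations exceeds the limit of {MAX_ANNOTATIONS}")
    return problems


if __name__ == "__main__":
    sample = ('<Annotations><Annotation about="www.example.com/*">'
              '<Label name="_include_"/></Annotation></Annotations>')
    print(check_limits({"annotations.xml": sample}) or "within limits")
```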
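Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-07-25 UTC.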