Annotations: Defining Sites to Search
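This page describes how to define the coverage of your search engine using an XML annotations file.
- Overview
- Using the Programmable Search XML Format
- Improving Search Coverage
- Annotations Limits
Overview
Managing a large collection of sites one by one is tedious when you are building a large search engine. Instead, you can list the sites in an annotations file and upload it. Annotations files also give you finer control over the ranking of search results.
An annotations file is a list of annotations. Each annotation has two components: the site and its associated labels. The label tells Programmable Search Engine how to handle a site; that is, whether the site should be included, excluded, promoted, or demoted. You define labels in the context file, and you tag sites with the appropriate labels in the annotations file.
When you start editing your annotations file, begin with a small number of annotations. A handful of annotations makes it easier to test and troubleshoot your search engine. Once you get the results you expect, add more annotations incrementally.
You can upload the annotations file to the Control Panel. For details about file limits, see the Annotations Limits section.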
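Using the Programmable Search XML Format
If you want to take advantage of all the features available in the Programmable Search Engine configuration file, use XML.
XML Annotations
The following is an example of XML annotations. This annotations file tells Programmable Search Engine to include everything under www.webmd.com/hw/* but exclude everything under www.webmd.com/hw/cancer/*.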
<Annotations>
  <Annotation about="www.cancer.gov/cancertopics/types/liver/*">
    <Label name="_include_"/>
    <Comment>government site</Comment>
  </Annotation>
  <Annotation about="www.medicinenet.com/liver_cancer/">
    <Label name="_exclude_"/>
    <Comment>site on symptoms</Comment>
  </Annotation>
  <Annotation about="www.webmd.com/hw/*">
    <Label name="_include_"/>
    <Comment>great sites for patients!</Comment>
  </Annotation>
  <Annotation about="www.webmd.com/hw/cancer/*">
    <Label name="_exclude_"/>
    <Comment>great sites for patients!</Comment>
  </Annotation>
  <Annotation about="www.oncologychannel.com/*/treatment">
    <Label name="_exclude_"/>
  </Annotation>
</Annotations>
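The annotations file has four elements in the following hierarchy:
- Annotations (root element)
  - Annotation
    - Label
    - Comment (optional)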
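To make the hierarchy concrete, here is a minimal sketch that walks an annotations document with Python's standard `xml.etree.ElementTree` module and lists each site with its labels and optional comment. The function name `read_annotations` and the `SAMPLE` text are illustrative assumptions, not part of Programmable Search Engine.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the patterns are taken from the example on this page.
SAMPLE = """<Annotations>
  <Annotation about="www.webmd.com/hw/*">
    <Label name="_include_"/>
    <Comment>great sites for patients!</Comment>
  </Annotation>
  <Annotation about="www.webmd.com/hw/cancer/*">
    <Label name="_exclude_"/>
  </Annotation>
</Annotations>"""


def read_annotations(xml_text):
    """Return a list of (about, labels, comment) tuples from an annotations document."""
    root = ET.fromstring(xml_text)
    if root.tag != "Annotations":
        raise ValueError("root element must be <Annotations>")
    rows = []
    for annotation in root.findall("Annotation"):
        about = annotation.get("about")
        # An annotation may carry more than one Label element.
        labels = [label.get("name") for label in annotation.findall("Label")]
        # Comment is optional; findtext returns None when it is absent.
        comment = annotation.findtext("Comment")
        rows.append((about, labels, comment))
    return rows


if __name__ == "__main__":
    for about, labels, comment in read_annotations(SAMPLE):
        print(about, labels, comment)
```

Reading a file back this way can be a quick sanity check before uploading: a missing `about` attribute or an empty label list shows up immediately in the printed rows.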
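Creating External Annotations
To list the sites you want your search engine to cover, do the following:
- Start the file with the <Annotations></Annotations> root element.
- Create an annotation by adding the <Annotation></Annotation> tags, and then set the about attribute to the URL pattern of the site.
  <Annotations>
    <Annotation about="www.webmd.com/hw/cancer/*">
    </Annotation>
  </Annotations>
- Associate the site with the search engine by using the <Label name=" "/> tag, which also specifies how the search engine should treat that site. You can get the labels for your search engine from its context file. You'll find two labels: one for adding sites to your Programmable Search Engine and one for excluding sites from it. If you have not changed the name of the search engine label in the context file, the label for including sites is _include_ and the label for excluding sites is _exclude_. To avoid errors, copy and paste these labels instead of typing them by hand.
  <Annotations>
    <Annotation about="http://www.solarenergy.org/*">
      <Label name="_include_"/>
    </Annotation>
  </Annotations>
  A single site can have multiple labels associated with it.
  If you have changed the name of a label in the context file, remember to update the Label name values in your annotations file.
- To add more sites, create and define another Annotation element.
- Save the XML file.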
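The steps above can be sketched as a small script. This is only an illustration under stated assumptions: the helper names `build_annotations` and `to_xml` are hypothetical, the labels are the default `_include_` and `_exclude_` names described above, and the output is printed rather than uploaded.

```python
import xml.etree.ElementTree as ET


def build_annotations(sites):
    """Build an <Annotations> tree from (url_pattern, label_name) pairs."""
    root = ET.Element("Annotations")  # step 1: root element
    for pattern, label in sites:
        # step 2: one Annotation per site, with the URL pattern in "about"
        annotation = ET.SubElement(root, "Annotation", {"about": pattern})
        # step 3: the Label tells the search engine how to treat the site
        ET.SubElement(annotation, "Label", {"name": label})
    return root


def to_xml(root):
    """Serialize the tree to a string, indenting it when the runtime supports it."""
    if hasattr(ET, "indent"):  # ET.indent is available in Python 3.9 and later
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sites = [
        ("www.webmd.com/hw/*", "_include_"),
        ("www.webmd.com/hw/cancer/*", "_exclude_"),
    ]
    print(to_xml(build_annotations(sites)))
```

Generating the file from a list keeps the label names in one place, which helps if you later rename a label in the context file and need to update every annotation.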
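Improving Search Coverage
Programmable Search Engine is built on top of the Google index. This means that webpages in the Google index are available to your search engine; conversely, webpages that Google has not crawled will not show up in your search results. If you want your Programmable Search Engine to include sites that are not currently in the Google index, submit a Sitemap to Google Search Console.
A Sitemap includes a list of the pages in your site, along with information about how often the pages are updated and their importance relative to each other. Submitting a Sitemap helps Google discover your webpages and improve the crawling schedule. To learn more about Sitemaps, see the Webmaster Help Center and Using the Sitemap Protocol. If you are interested in building more elaborate Sitemaps, see http://www.sitemaps.org/protocol.php.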
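As a rough illustration of what such a file contains, the following sketch builds a minimal Sitemap using the `urlset`, `url`, `loc`, `changefreq`, and `priority` elements defined by the sitemaps.org protocol. The helper name `build_sitemap` and the example.com URLs are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Build a Sitemap string from (url, changefreq, priority) tuples."""
    urlset = ET.Element("urlset", {"xmlns": SITEMAP_NS})
    for loc, changefreq, priority in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        # How often the page is expected to change, e.g. "daily" or "monthly".
        ET.SubElement(url, "changefreq").text = changefreq
        # Importance relative to other pages on the same site, from 0.0 to 1.0.
        ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical pages used only for illustration.
    print(build_sitemap([
        ("https://www.example.com/", "daily", 1.0),
        ("https://www.example.com/archive/2020.html", "yearly", 0.3),
    ]))
```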
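Submitting Sitemaps is particularly helpful if your site has the following:
- Dynamic content
- Webpages that aren't easily discovered by Googlebot (Google's web crawler), such as pages with rich AJAX or Flash features
- Few websites linking to it. Googlebot crawls the web by following links from one page to another, so a site that isn't well linked is hard for the crawler to discover. A new website probably has few other sites pointing to it.
- A large archive of content pages that does not have a strong network of cross-linking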
Google can index only the pages it can access. If you use a robots.txt file or robots meta tags in your webpages, make sure they don't block crawlers from those pages.
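One way to spot an accidental block is to test your rules locally. The sketch below uses Python's standard `urllib.robotparser` module; the helper name `googlebot_can_fetch`, the sample rules, and the example.com URLs are assumptions for illustration. Note that this module applies simple prefix matching and may not mirror every detail of how Googlebot interprets robots.txt, so treat the result as a rough check only.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content used only for illustration.
SAMPLE_ROBOTS = """User-agent: *
Disallow: /private/
"""


def googlebot_can_fetch(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://www.example.com/docs/index.html",
        "https://www.example.com/private/report.html",
    ):
        print(url, googlebot_can_fetch(SAMPLE_ROBOTS, url))
```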
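Improved coverage is not instantaneous, because it takes some time for Google to crawl and index the pages. Once your webpages are in the index, they can appear in both Google Search and your Programmable Search Engine.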
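Annotations Limits
The following table lists the limits for annotations files that are uploaded to Programmable Search Engine:
Note: Follow the limits closely; if you exceed them, your search engine might not show results.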
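| Aspect | Limit |
|---|---|
| File size (context or annotations files) | 30KB |
| Maximum number of annotations per search engine | 5,000 |

Tip: If you find your search engine outgrowing the 5,000-annotation limit, consider consolidating individual URLs into URL patterns.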
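A pre-upload check along the following lines can catch oversized files early. This is a sketch under two assumptions that the page does not confirm: that 30KB means 30 * 1024 bytes, and that counting the annotations in a single file is a useful proxy for the per-search-engine limit. The name `check_limits` is hypothetical.

```python
import xml.etree.ElementTree as ET

MAX_FILE_BYTES = 30 * 1024  # assumption: "30KB" interpreted as 30 * 1024 bytes
MAX_ANNOTATIONS = 5000      # limit applies per search engine; checked per file here


def check_limits(xml_bytes):
    """Return a list of human-readable problems; an empty list means no limit was hit."""
    problems = []
    if len(xml_bytes) > MAX_FILE_BYTES:
        problems.append(
            f"file size is {len(xml_bytes)} bytes, above {MAX_FILE_BYTES}"
        )
    count = len(ET.fromstring(xml_bytes).findall("Annotation"))
    if count > MAX_ANNOTATIONS:
        problems.append(
            f"file has {count} annotations, above {MAX_ANNOTATIONS}"
        )
    return problems


if __name__ == "__main__":
    sample = (
        b'<Annotations><Annotation about="www.webmd.com/hw/*">'
        b'<Label name="_include_"/></Annotation></Annotations>'
    )
    print(check_limits(sample) or "within limits")
```

If the size check fails well before the annotation count does, that is usually a sign that many entries share a prefix and could be merged into a single URL pattern.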
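Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-07-25 UTC.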