Feedfetcher
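Google uses Feedfetcher to crawl RSS or Atom feeds for Google News and PubSubHubbub.
Feedfetcher stores and periodically refreshes feeds that are requested by users of an app or service. Only podcast feeds get indexed in Google Search; however, a feed that doesn't follow the Atom or RSS specification may still be indexed. Below are answers to some commonly asked questions about how this user-controlled feed fetcher works.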
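How do I request that Google not retrieve some or all of my site's feeds?
When users add a service or app that uses Feedfetcher data, Google's Feedfetcher attempts to obtain the content of the feed in order to display it. Because Feedfetcher requests come from explicit action by human users, and not from automated crawlers, Feedfetcher ignores robots.txt rules.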
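If your feed is publicly available, Google can't restrict users from accessing it. One solution is to configure your site to serve a 404, 410, or other error status to the Feedfetcher-Google user agent.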
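As a rough illustration of the configuration described above, the following sketch wraps a Python WSGI application so that requests whose User-Agent header contains the Feedfetcher-Google token receive a 410 response. The names `block_feedfetcher` and `feed_app` are hypothetical, and the choice of WSGI is an assumption; the same idea applies to whatever server or framework actually serves the feed.

```python
BLOCKED_AGENT = "Feedfetcher-Google"  # user-agent token named in the text above


def block_feedfetcher(app, status="410 Gone"):
    """Wrap a WSGI app so that Feedfetcher requests get an error status."""

    def middleware(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        # Compare case-insensitively, since the capitalization seen in real
        # request headers may differ from the token as written here.
        if BLOCKED_AGENT.lower() in user_agent.lower():
            body = b"Feed not available.\n"
            start_response(status, [
                ("Content-Type", "text/plain; charset=utf-8"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        # Every other client is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware


def feed_app(environ, start_response):
    """Hypothetical stand-in for the application that serves the feed."""
    body = b"<rss version='2.0'><channel><title>Example</title></channel></rss>"
    start_response("200 OK", [
        ("Content-Type", "application/rss+xml"),
        ("Content-Length", str(len(body))),
    ])
    return [body]


application = block_feedfetcher(feed_app)
```

Passing `status="404 Not Found"` would return a 404 instead. For a quick local trial, `application` could be served with the standard library's `wsgiref.simple_server.make_server`.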
If your feed is provided by a blog or site hosting service, work directly with that service to restrict access to your feed.
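How often will Feedfetcher retrieve my feeds?
For most sites, Feedfetcher retrieves feeds no more than once every hour on average. Some frequently updated sites may be refreshed more often. Note, however, that because of network delays, Feedfetcher may briefly appear to retrieve your feeds more frequently.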
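To see how often Feedfetcher actually requests a feed on your own server, you could count its requests per hour in an access log. The sketch below assumes the common "combined" log layout and a hypothetical helper name, `feedfetcher_hits_per_hour`; the regular expression would need adjusting for other log formats.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed "combined" access-log layout; adjust the pattern for other formats.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def feedfetcher_hits_per_hour(lines):
    """Count Feedfetcher requests per (path, hour) bucket."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # line does not follow the assumed layout
        if "feedfetcher-google" not in match["agent"].lower():
            continue  # request came from some other client
        stamp = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        hits[(match["path"], stamp.strftime("%Y-%m-%d %H:00"))] += 1
    return hits
```

Calling the function on an open log file, for example `feedfetcher_hits_per_hour(open("access.log"))` with a hypothetical file name, returns a `Counter` keyed by feed path and hour. Buckets with a count above 1 are the hours in which a feed was requested more than once.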
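Why is Feedfetcher trying to download incorrect links from my server, or from a domain that doesn't exist?
Feedfetcher retrieves feeds at the request of services or apps installed by users. A user may simply have requested a feed URL that does not exist.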
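Why is Feedfetcher downloading information from my "secret" web server?
Feedfetcher retrieves feeds at the request of services or apps installed by users. The request may have come from a user who knows about your "secret" server, or who typed its address by mistake.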
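Why isn't Feedfetcher obeying my robots.txt file?
Feedfetcher retrieves feeds only after users have explicitly started a service or app that requests data from the feed. Feedfetcher behaves as a direct agent of the human user, not as a robot, so it ignores robots.txt entries. Because Feedfetcher acts as an agent for multiple users, it conserves bandwidth by requesting a common feed only once on behalf of all users who requested that feed through an app or service. The common feed formats are RSS and Atom.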
You can prevent Feedfetcher from crawling your site by configuring your server to serve a 404, 410, or other error status to the Feedfetcher-Google user agent.
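Why are there visits from multiple machines at Google.com, all with user-agent Feedfetcher?
Feedfetcher is distributed across several machines to improve performance and to scale as the web grows. To cut down on bandwidth usage, the machines used are often located close, in network terms, to the sites that they're retrieving.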
Can you tell me the IP addresses from which Feedfetcher makes requests so that I can filter my logs?
The IP addresses used by Feedfetcher are included in the user-triggered-fetchers-google.json object.
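One way to use that object for log filtering is to load its ranges and test each client address against them. The sketch below is a minimal example under a stated assumption: that the JSON has a top-level `prefixes` list whose entries carry either an `ipv4Prefix` or an `ipv6Prefix` key. Check this against the file you actually download. The helper names `load_networks` and `is_listed` are hypothetical.

```python
import ipaddress
import json


def load_networks(ranges_json):
    """Parse the ranges document (a JSON string) into ip_network objects."""
    data = json.loads(ranges_json)
    networks = []
    for entry in data.get("prefixes", []):
        # Assumed keys: each entry holds either ipv4Prefix or ipv6Prefix.
        prefix = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        if prefix:
            networks.append(ipaddress.ip_network(prefix))
    return networks


def is_listed(ip, networks):
    """Return True if the address falls inside any of the given networks."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)


# Placeholder ranges from the documentation address blocks, not real Google data.
SAMPLE = '{"prefixes": [{"ipv4Prefix": "192.0.2.0/24"}, {"ipv6Prefix": "2001:db8::/32"}]}'
sample_networks = load_networks(SAMPLE)
```

With the real file saved locally, `load_networks(open(path).read())` would give the list to pass to `is_listed` for each client address found in your logs.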
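Why is Feedfetcher downloading the same page on my site multiple times?
In general, Feedfetcher downloads only one copy of each file from your site during a given feed retrieval. Very occasionally, the machines are stopped and restarted, which may cause Feedfetcher to retrieve pages again that it recently visited.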
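What kinds of links does Feedfetcher crawl?
Unlike normal web crawlers, Feedfetcher doesn't discover links to crawl at all. It crawls only the single URL provided to it by users of a service or app that uses Feedfetcher.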
My Feedfetcher question isn't answered here. Where can I get more help?
If you're still having trouble, try posting your question in the Google Search Central forum.
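Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-04 UTC.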