Monday, December 9, 2024
Allow us to cache, pretty please.
As the internet grew over the years, so did the amount of crawling Google does. Google's crawling infrastructure supports heuristic caching mechanisms, and always has, but the share of requests that can be served from local caches has decreased: 10 years ago about 0.026% of total fetches were cacheable, which was already not that impressive; today that number is 0.017%.
Why is caching important?
Caching is a critical piece of the large puzzle that is the internet. It allows pages to load lightning fast on revisits, saves computing resources (and thus also natural resources), and saves a tremendous amount of expensive bandwidth for both clients and servers.
Especially if you have a large site with rarely-changing content under individual URLs, allowing caching locally may help your site be crawled more efficiently. Google's crawling infrastructure supports heuristic HTTP caching as defined by the HTTP caching standard, specifically through the ETag response header and If-None-Match request header, and the Last-Modified response header and If-Modified-Since request header.
We strongly recommend using ETag because it's less prone to errors and mistakes (its value is not structured, unlike the Last-Modified value). And, if you have the option, set them both: the internet will thank you. Maybe.
As for what you consider a change that requires clients to refresh their caches, that's up to you. Our recommendation is that you require a cache refresh on significant changes to your content; if you only updated the copyright date at the bottom of your page, that's probably not significant.
ETag and If-None-Match
Google's crawlers support ETag based conditional requests exactly as defined in the HTTP caching standard. That is, to signal caching preference to Google's crawlers, set the ETag value to any arbitrary ASCII string (usually a hash of the content or a version number, but it could also be a piece of π, up to you) unique to the representation of the content hosted by the accessed URL. For example, if you host different versions of the same content under the same URL (say, a mobile and a desktop version), each version could have its own unique ETag value.
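As a rough illustration of the "hash of the content" option, here is a minimal Python sketch. The function name `make_etag` and the `variant` label are assumptions made for this example, not anything Google's crawlers require; any stable string that changes when the representation changes would do.

```python
import hashlib


def make_etag(body: bytes, variant: str = "") -> str:
    """Return a quoted ETag value derived from the response body.

    `variant` is a hypothetical label (for example "mobile" or "desktop")
    so that two representations served under one URL get distinct values
    even if their bytes happen to be identical.
    """
    digest = hashlib.sha256(variant.encode("utf-8") + b"\0" + body).hexdigest()
    # ETag values are sent wrapped in double quotes; 32 hex characters
    # are plenty to tell versions of a page apart.
    return f'"{digest[:32]}"'


desktop_etag = make_etag(b"<html>Hello, desktop</html>", "desktop")
mobile_etag = make_etag(b"<html>Hello, mobile</html>", "mobile")
```

If only significant changes should invalidate caches, one option is to hash just the main content rather than the full page, so that a changed copyright year in the footer leaves the ETag untouched.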
Google's crawlers that support caching will send the ETag value returned for a previous crawl of that URL in the If-None-Match header. If the ETag value sent by the crawler matches the current value the server generated, your server should return an HTTP 304 (Not Modified) status code with no HTTP body. This last bit, no HTTP body, is the important part for a couple of reasons:
- your server doesn't have to spend compute resources on actually generating content; that is, you save money
- your server doesn't have to transfer the HTTP body; that is, you save money
On the client side, like a user's browser or Googlebot, the content under that URL is retrieved from the client's internal cache. Because there's no data transfer involved, this happens lightning fast, making users happy and potentially saving some resources for them, too.
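The server-side decision described above can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation of the standard: the helper names (`etag_matches`, `respond`, `render_body`) are made up for this example, and splitting `If-None-Match` on commas assumes the ETag values themselves contain no commas.

```python
from typing import Callable, Dict, Optional, Tuple


def _opaque(tag: str) -> str:
    """Strip whitespace and a weak-validator prefix (W/) from an ETag."""
    tag = tag.strip()
    return tag[2:] if tag.startswith("W/") else tag


def etag_matches(if_none_match: Optional[str], current_etag: str) -> bool:
    """Compare the If-None-Match request header with the current ETag."""
    if not if_none_match:
        return False
    if if_none_match.strip() == "*":
        return True
    current = _opaque(current_etag)
    return any(_opaque(c) == current for c in if_none_match.split(","))


def respond(
    request_headers: Dict[str, str],
    current_etag: str,
    render_body: Callable[[], bytes],
) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, body) for a GET request."""
    if etag_matches(request_headers.get("If-None-Match"), current_etag):
        # 304: no body is generated or transferred.
        return 304, {"ETag": current_etag}, b""
    return 200, {"ETag": current_etag}, render_body()
```

Note that `render_body` is only called on the 200 path, which is where the compute savings come from; this assumes the current ETag can be looked up more cheaply than the page can be rendered.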
Last-Modified and If-Modified-Since
Similarly to ETag, Google's crawlers also support Last-Modified based conditional requests, exactly as defined in the HTTP caching standard. Semantically, this works the same way as ETag (an identifier is used to decide whether the cached copy can be reused), and provides the same benefits as ETag on the client side.
We have just a couple of recommendations if you're using Last-Modified as a caching directive:
- The date in the Last-Modified header must be formatted according to the HTTP standard. To avoid parsing issues, we recommend using the following date format: "Weekday, DD Mon YYYY HH:MM:SS Timezone". For example, "Fri, 04 Sep 1998 19:15:56 GMT".
- While not required, consider also setting the max-age field of the Cache-Control header to help crawlers determine when to recrawl the specific URL. Set the value of the max-age field to the expected number of seconds the content will be unchanged. For example, Cache-Control: max-age=94043.
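For illustration, here is one way to produce and check these headers with Python's standard library. The function names (`http_date`, `cache_headers`, `is_not_modified`) are assumptions for this sketch; `email.utils.format_datetime` with `usegmt=True` emits the date format recommended above.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Dict, Optional


def http_date(dt: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    return format_datetime(dt.astimezone(timezone.utc), usegmt=True)


def cache_headers(last_modified: datetime, max_age_seconds: int) -> Dict[str, str]:
    """Build the Last-Modified and Cache-Control response headers."""
    return {
        "Last-Modified": http_date(last_modified),
        "Cache-Control": f"max-age={max_age_seconds}",
    }


def is_not_modified(if_modified_since: Optional[str], last_modified: datetime) -> bool:
    """True if the resource has not changed since the date the client sent."""
    if not if_modified_since:
        return False
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return False  # An unparseable date is ignored: serve a normal 200.
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds.
    return last_modified.replace(microsecond=0) <= since


modified = datetime(1998, 9, 4, 19, 15, 56, tzinfo=timezone.utc)
print(cache_headers(modified, 94043))
```

When `is_not_modified` returns true, the server would answer with a 304 and no body, just as in the ETag case.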
Examples
If you're like me, wrapping your head around how heuristic caching works is challenging; seeing an example of the chain of requests and responses helps. Here are two chains, one for ETag/If-None-Match and one for Last-Modified/If-Modified-Since, to visualize how it's supposed to work:
| | ETag/If-None-Match | Last-Modified/If-Modified-Since |
|---|---|---|
| A server's response to a crawl: This is the response from which a crawler can save the precondition header fields ETag and Last-Modified. | HTTP/1.1 200 OK<br>Content-Type: text/plain<br>Date: Fri, 04 Sep 1998 19:15:50 GMT<br>ETag: "34aa387-d-1568eb00"<br>... | HTTP/1.1 200 OK<br>Content-Type: text/plain<br>Date: Fri, 04 Sep 1998 19:15:50 GMT<br>Last-Modified: Fri, 04 Sep 1998 19:15:56 GMT<br>Cache-Control: max-age=94043<br>... |
| Subsequent crawler conditional request: The conditional request is based on the precondition header values saved from a previous request. The values are sent back to the server for validation in the If-None-Match and If-Modified-Since request headers. | GET /hello.world HTTP/1.1<br>Host: www.example.com<br>Accept-Language: en, hu<br>User-Agent: Googlebot/2.1 (+http://www.google.com/bot.html)<br>If-None-Match: "34aa387-d-1568eb00"<br>... | GET /hello.world HTTP/1.1<br>Host: www.example.com<br>Accept-Language: en, hu<br>User-Agent: Googlebot/2.1 (+http://www.google.com/bot.html)<br>If-Modified-Since: Fri, 04 Sep 1998 19:15:56 GMT<br>... |
| Server response to the conditional request: Because the precondition header values sent by the crawler validate successfully on the server's side, the server returns a 304 HTTP status code (without an HTTP body) to the crawler. This happens for every subsequent request until the preconditions fail to validate (the ETag or the Last-Modified date changes on the server's side). | HTTP/1.1 304 Not Modified<br>Date: Fri, 04 Sep 1998 19:15:50 GMT<br>Expires: Fri, 04 Sep 1998 19:15:52 GMT<br>Vary: Accept-Encoding<br>ETag: "34aa387-d-1568eb00"<br>... | HTTP/1.1 304 Not Modified<br>Date: Fri, 04 Sep 1998 19:15:50 GMT<br>Expires: Fri, 04 Sep 1998 19:15:51 GMT<br>Vary: Accept-Encoding<br>Last-Modified: Fri, 04 Sep 1998 19:15:56 GMT<br>... |
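To make the client side of the ETag chain above concrete, here is a small self-contained Python simulation. Everything in it is a stand-in invented for this sketch: `fake_origin` plays the server, `CrawlerCache` plays a caching client, and nothing here reflects how Googlebot is actually implemented.

```python
ETAG = '"34aa387-d-1568eb00"'
BODY = b"Hello, world"


def fake_origin(request_headers):
    """Stand-in for a server: 304 if the validator matches, else 200."""
    if request_headers.get("If-None-Match") == ETAG:
        return 304, {"ETag": ETAG}, b""
    return 200, {"ETag": ETAG}, BODY


class CrawlerCache:
    """Hypothetical client cache keyed by URL."""

    def __init__(self):
        self._entries = {}

    def fetch(self, url, origin):
        cached = self._entries.get(url)
        request_headers = {}
        if cached is not None:
            # Send back the validator saved from the previous response.
            request_headers["If-None-Match"] = cached["etag"]
        status, headers, body = origin(request_headers)
        if status == 304 and cached is not None:
            # Nothing was transferred; reuse the stored body.
            return status, cached["body"]
        if status == 200 and "ETag" in headers:
            self._entries[url] = {"etag": headers["ETag"], "body": body}
        return status, body


cache = CrawlerCache()
first = cache.fetch("https://www.example.com/hello.world", fake_origin)
second = cache.fetch("https://www.example.com/hello.world", fake_origin)
print(first[0], second[0])
```

The first fetch stores the ETag and body from the 200 response; the second sends the stored ETag, receives a 304 with an empty body, and serves the content from the local cache.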
If you're in the business of making your users happy and perhaps also want to save a few bucks on your hosting bill, talk to your hosting or CMS provider, or your developers, about how to enable HTTP caching for your site. If nothing else, your users will like you a bit more.
If you wanna chat about caching, head to your nearest Search Central help community, and if you have comments about how we're caching, leave feedback on the documentation about caching that we published together with this blog post.
Want to learn more about crawling? Check out the rest of the Crawling December series.