Real Time Mode

When clients choose to use Google Safe Browsing v5 in real-time mode, they maintain in their local database: (i) a Global Cache of likely-benign sites, formatted as SHA256 hashes of host-suffix/path-prefix URL expressions, and (ii) a set of threat lists, formatted as SHA256 hash prefixes of host-suffix/path-prefix URL expressions. The high-level idea is that whenever the client wishes to check a particular URL, it first performs a local check against the Global Cache. If that check passes (that is, one of the URL's expression hashes is found in the Global Cache), a local threat list check is performed. Otherwise, the client continues with the real-time hash check as detailed below.

Besides the local database, the client will maintain a local cache. Such a local cache need not be in persistent storage and may be cleared in case of memory pressure.
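
As an illustration, a local cache of this kind could be sketched in Python as below. The class and field names (LocalCache, CacheEntry, full_hashes, expiration) are hypothetical and not part of the protocol. The sketch assumes entries are keyed by 4-byte hash prefix, as in the check procedure, and that expiration is an absolute timestamp in seconds.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional, Set


@dataclass
class CacheEntry:
    full_hashes: Set[bytes]  # full SHA256 hashes known for this prefix
    expiration: float        # assumed: absolute expiry, seconds since the epoch


@dataclass
class LocalCache:
    # In-memory only: nothing here is written to persistent storage.
    entries: Dict[bytes, CacheEntry] = field(default_factory=dict)

    def put(self, prefix: bytes, full_hashes: Iterable[bytes], expiration: float) -> None:
        self.entries[prefix] = CacheEntry(set(full_hashes), expiration)

    def get(self, prefix: bytes, now: Optional[float] = None) -> Optional[CacheEntry]:
        """Return the unexpired entry for prefix, or None. Expired entries are dropped."""
        now = time.time() if now is None else now
        entry = self.entries.get(prefix)
        if entry is None:
            return None
        if now > entry.expiration:
            del self.entries[prefix]
            return None
        return entry

    def clear(self) -> None:
        """Drop everything, for example under memory pressure."""
        self.entries.clear()
```

Because a cleared or missing entry only causes the client to ask the server again, dropping the whole cache at any time does not affect correctness.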

A detailed specification of the procedure is available below.

The Real-Time URL Check Procedure

This procedure takes a single URL u and returns SAFE, UNSAFE or UNSURE. If it returns SAFE, the URL is deemed safe by Google Safe Browsing. If it returns UNSAFE, the URL is deemed potentially unsafe by Google Safe Browsing and appropriate action should be taken, such as showing a warning to the end user, moving a received message to the spam folder, or requiring extra confirmation by the user before proceeding. If it returns UNSURE, the following local-check procedure should be used afterwards.

  1. Let expressions be a list of suffix/prefix expressions generated by the URL u.
  2. Let expressionHashes be a list, where the elements are SHA256 hashes of each expression in expressions.
  3. For each hash of expressionHashes:
    1. If hash can be found in the Global Cache, return UNSURE.
  4. Let expressionHashPrefixes be a list, where the elements are the first 4 bytes of each hash in expressionHashes.
  5. For each expressionHashPrefix of expressionHashPrefixes:
    1. Look up expressionHashPrefix in the local cache.
    2. If the cached entry is found:
      1. Determine whether the current time is greater than its expiration time.
      2. If it is greater:
        1. Remove the found cached entry from the local cache.
        2. Continue with the loop.
      3. If it is not greater:
        1. Remove this particular expressionHashPrefix from expressionHashPrefixes.
        2. Check whether the corresponding full hash within expressionHashes is found in the cached entry.
        3. If found, return UNSAFE.
        4. If not found, continue with the loop.
    3. If the cached entry is not found, continue with the loop.
  6. Send expressionHashPrefixes to the Google Safe Browsing v5 server using RPC SearchHashes or the REST method hashes.search. If an error occurs (including network errors, HTTP errors, etc.), return UNSURE. Otherwise, let response be the response received from the Safe Browsing server, which is a list of full hashes together with some auxiliary information identifying the nature of the threat (social engineering, malware, etc.), as well as the cache expiration time expiration.
  7. For each fullHash of response:
    1. Insert fullHash into the local cache, together with expiration.
  8. For each fullHash of response:
    1. Let isFound be the result of finding fullHash in expressionHashes.
    2. If isFound is False, continue with the loop.
    3. If isFound is True, return UNSAFE.
  9. Return SAFE.
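
The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a reference implementation: the function name real_time_check, the dict-based local cache, and the search_hashes callable (standing in for the SearchHashes / hashes.search call) are all hypothetical. The sketch takes the suffix/prefix expressions as input rather than deriving them from u, treats expiration as an absolute timestamp, and skips the server request when no prefixes remain to be sent.

```python
import hashlib
import time
from typing import Callable, Dict, Iterable, List, Optional, Set, Tuple

SAFE, UNSAFE, UNSURE = "SAFE", "UNSAFE", "UNSURE"

# Hypothetical shapes: prefix -> (full hashes, absolute expiration time).
LocalCache = Dict[bytes, Tuple[Set[bytes], float]]
# Hypothetical server call: prefixes -> (matching full hashes, expiration).
SearchFn = Callable[[List[bytes]], Tuple[List[bytes], float]]


def real_time_check(expressions: Iterable[str], global_cache: Set[bytes],
                    local_cache: LocalCache, search_hashes: SearchFn,
                    now: Optional[float] = None) -> str:
    now = time.time() if now is None else now

    # Steps 1-2: hash every suffix/prefix expression with SHA256.
    expression_hashes = [hashlib.sha256(e.encode("utf-8")).digest() for e in expressions]

    # Step 3: a Global Cache hit defers to the local threat list check.
    if any(h in global_cache for h in expression_hashes):
        return UNSURE

    # Steps 4-5: consult the local cache; collect prefixes that still need the server.
    to_send: List[bytes] = []
    for full_hash in expression_hashes:
        prefix = full_hash[:4]
        entry = local_cache.get(prefix)
        if entry is not None:
            cached_hashes, expiration = entry
            if now <= expiration:
                if full_hash in cached_hashes:
                    return UNSAFE
                continue  # fresh entry, no match: this prefix is not sent
            del local_cache[prefix]  # expired entry: remove it and ask the server
        to_send.append(prefix)

    if to_send:  # assumption: nothing to ask if the cache answered every prefix
        # Step 6: any failure of the server call yields UNSURE.
        try:
            full_hashes, expiration = search_hashes(to_send)
        except Exception:
            return UNSURE

        # Step 7: cache every returned full hash under its 4-byte prefix.
        for fh in full_hashes:
            hashes, _ = local_cache.get(fh[:4], (set(), expiration))
            local_cache[fh[:4]] = (hashes | {fh}, expiration)

        # Step 8: a returned full hash equal to one of ours means UNSAFE.
        if any(fh in expression_hashes for fh in full_hashes):
            return UNSAFE

    # Step 9
    return SAFE
```

Building a separate to_send list, instead of removing items from expressionHashPrefixes while iterating over it, avoids mutating a list during the loop; the set of prefixes that reaches the server is the same.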

While this protocol specifies when the client sends expressionHashPrefixes to the server, it purposefully does not specify exactly how to send them. For example, it is acceptable for the client to send all the expressionHashPrefixes in a single request, and it is also acceptable for the client to send each individual prefix in expressionHashPrefixes to the server in separate requests (perhaps proceeding in parallel). It is also acceptable for the client to send unrelated or randomly generated hash prefixes together with the hash prefixes in expressionHashPrefixes, as long as the number of hash prefixes sent in a single request does not exceed 30.
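
One possible way to combine these options is sketched below in Python. The helper name batch_with_decoys and its parameters are hypothetical; the sketch simply mixes randomly generated 4-byte prefixes with the real ones, shuffles them, and splits the result into requests of at most 30 prefixes each. How many decoys to add, if any, is left to the client.

```python
import random
import secrets
from typing import Iterable, List

MAX_PREFIXES_PER_REQUEST = 30  # limit stated in the text above


def batch_with_decoys(prefixes: Iterable[bytes], decoy_count: int = 0,
                      max_per_request: int = MAX_PREFIXES_PER_REQUEST) -> List[List[bytes]]:
    """Mix real prefixes with random 4-byte decoys and split into request-sized batches."""
    padded = list(prefixes) + [secrets.token_bytes(4) for _ in range(decoy_count)]
    # Shuffle so that position in the request does not reveal which prefixes are real.
    random.SystemRandom().shuffle(padded)
    return [padded[i:i + max_per_request]
            for i in range(0, len(padded), max_per_request)]
```

Each returned batch can then be sent as its own request, sequentially or in parallel.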