Local Database
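Google Safe Browsing v5 expects the client to maintain a local database, except when the client chooses the No-Storage Real-Time Mode. The format and storage of this local database are up to the client. Conceptually, its contents can be thought of as a folder containing the various lists as files, where each file holds SHA256 hashes or prefixes of those hashes; four-byte hash prefixes are the most commonly used length.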
Available Lists
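Each list is identified by a distinct name. Names follow a convention in which a suffix indicates the length of the hashes to expect in the list. Hash lists that share a threat type but differ in hash length are separate lists, each named with the suffix for its hash length.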
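The naming convention can be illustrated with a small parser. This is only a sketch: the pattern `<name>-<N>b`, with N read as a hash length in bytes, is inferred from names such as se-4b and gc-32b, and the helper name `hash_length_from_list_name` is hypothetical.

```python
import re

# Assumed pattern, inferred from names such as "se-4b" and "gc-32b":
# a lowercase identifier, a hyphen, a decimal byte count, and a trailing "b".
_LIST_NAME = re.compile(r"^(?P<kind>[a-z]+)-(?P<length>\d+)b$")


def hash_length_from_list_name(name: str) -> int:
    """Return the hash length in bytes implied by a list name's suffix."""
    match = _LIST_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized list name: {name!r}")
    return int(match.group("length"))


print(hash_length_from_list_name("se-4b"))   # 4
print(hash_length_from_list_name("gc-32b"))  # 32
```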
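The following lists are available for use with the hash list methods.

| List Name | Corresponding v4 ThreatType Enum | Description |
|-----------|----------------------------------|-------------|
| gc-32b | None | This list is the Global Cache list. It is a special list used only in the Real-Time mode of operation. |
| se-4b | SOCIAL_ENGINEERING | This list contains threats of the SOCIAL_ENGINEERING threat type. |
| mw-4b | MALWARE | This list contains threats of the MALWARE threat type for desktop platforms. |
| uws-4b | UNWANTED_SOFTWARE | This list contains threats of the UNWANTED_SOFTWARE threat type for desktop platforms. |
| uwsa-4b | UNWANTED_SOFTWARE | This list contains threats of the UNWANTED_SOFTWARE threat type for Android platforms. |
| pha-4b | POTENTIALLY_HARMFUL_APPLICATION | This list contains threats of the POTENTIALLY_HARMFUL_APPLICATION threat type for Android platforms. |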
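Additional lists may become available later. When that happens, the table above will be expanded, and the hashList.list method will return a similar result containing the most up-to-date lists.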
Database Updates
The client regularly calls the hashList.get method or the hashLists.batchGet method to update the database. Since a typical client wants to update several lists at a time, the hashLists.batchGet method is recommended.
List names will never be renamed. Furthermore, once a list has appeared, it will never be removed; a list that is no longer useful becomes empty but continues to exist. It is therefore appropriate to hard-code these names in Google Safe Browsing client code.
Both the hashList.get method and the hashLists.batchGet method support incremental updates, which save bandwidth and improve performance. An incremental update delivers the delta between the client's version of the list and the latest version. (A newly deployed client that has no version yet can request a full update.) The incremental update contains removal indices and additions. The client is expected to first remove the entries at the specified indices from its local database, and then apply the additions.
Finally, to prevent corruption, the client should check the stored data against the checksum provided by the server. Whenever the checksum does not match, the client should perform a full update.
Decoding the List Content
Decoding Hashes and Hash Prefixes
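All lists are delivered using a special encoding to reduce their size. The encoding relies on the observation that a Google Safe Browsing list is, conceptually, a set of hashes or hash prefixes that are statistically indistinguishable from random integers. If these integers are sorted and the differences between adjacent values are taken, those differences are expected to be "small" in a certain sense. Golomb-Rice encoding then exploits this smallness.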
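Suppose that three host-suffix path-prefix expressions, namely a.example.com/, b.example.com/, and y.example.com/, are to be transmitted using 4-byte hash prefixes. Further suppose that the Rice parameter, denoted by k, is chosen to be 30. The server starts by calculating the full hash of each of these strings, which are, respectively: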
291bc5421f1cd54d99afcc55d166e2b9fe42447025895bf09dd41b2110a687dc a.example.com/
1d32c5084a360e58f1b87109637a6810acad97a861a7769e8f1841410d2a960c b.example.com/
f7a502e56e8b01c6dc242b35122683c9d25d07fb1f532d9853eb0ef3ff334f03 y.example.com/
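The server then forms a 4-byte hash prefix for each of the above: the first 4 bytes of the 32-byte full hash, interpreted as a big-endian 32-bit integer. Big-endian here means that the first byte of the full hash becomes the most significant byte of the 32-bit integer. This step yields the integers 0x291bc542, 0x1d32c508, and 0xf7a502e5.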
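The prefix step can be sketched in Python as follows. The function names are hypothetical, and `prefix_for_expression` assumes the full hash is the SHA256 digest of the expression's ASCII bytes; the printed value is taken from the full hash listed above rather than recomputed.

```python
import hashlib


def prefix_from_full_hash(full_hash: bytes, length: int = 4) -> int:
    """Interpret the first `length` bytes of a full hash as a big-endian integer."""
    return int.from_bytes(full_hash[:length], byteorder="big")


def prefix_for_expression(expression: str) -> int:
    """Hash an expression with SHA256 and return its 4-byte prefix (assumed flow)."""
    return prefix_from_full_hash(hashlib.sha256(expression.encode("ascii")).digest())


# Full hash of a.example.com/ as listed in the text above.
full_hash = bytes.fromhex(
    "291bc5421f1cd54d99afcc55d166e2b9fe42447025895bf09dd41b2110a687dc"
)
print(hex(prefix_from_full_hash(full_hash)))  # 0x291bc542
```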
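The server must sort these three hash prefixes lexicographically (equivalent to numerical sorting of the big-endian integers), which gives 0x1d32c508, 0x291bc542, 0xf7a502e5. The first hash prefix is stored unchanged in the first_value field.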
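The server then calculates the two adjacent differences, which are 0xbe9003a and 0xce893da3. With k chosen to be 30, the server splits each 32-bit difference into a quotient part (the upper 2 bits) and a remainder part (the lower 30 bits). For the first number, the quotient is 0 and the remainder is 0xbe9003a; for the second number, the quotient is 3, because its two most significant bits are 11 in binary, and the remainder is 0xe893da3. A quotient q is encoded as the value (1 << q) - 1 using exactly 1 + q bits; the remainder is encoded directly using k bits. The quotient part of the first number is therefore encoded as 0 and its remainder part is, in binary, 001011111010010000000000111010; the quotient part of the second number is encoded as 0111 and its remainder part is 001110100010010011110110100011.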
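The difference and split steps can be checked with a short sketch. The helper names `rice_split` and `rice_bits` are hypothetical; the bit strings are written most significant bit first, matching the notation in the text.

```python
def rice_split(value: int, k: int) -> tuple[int, int]:
    """Split a value into (quotient, remainder), where the remainder is the low k bits."""
    return value >> k, value & ((1 << k) - 1)


def rice_bits(value: int, k: int) -> tuple[str, str]:
    """Return the encoded quotient and remainder as bit strings, MSB first."""
    quotient, remainder = rice_split(value, k)
    # A quotient q is written as the value (1 << q) - 1 in exactly q + 1 bits.
    quotient_bits = format((1 << quotient) - 1, f"0{quotient + 1}b")
    remainder_bits = format(remainder, f"0{k}b")
    return quotient_bits, remainder_bits


prefixes = sorted([0x291BC542, 0x1D32C508, 0xF7A502E5])
deltas = [b - a for a, b in zip(prefixes, prefixes[1:])]
print([hex(d) for d in deltas])  # ['0xbe9003a', '0xce893da3']
for delta in deltas:
    print(rice_bits(delta, 30))
# ('0', '001011111010010000000000111010')
# ('0111', '001110100010010011110110100011')
```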
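When these numbers are packed into a byte string, little-endian order is used. Conceptually, it may be easier to picture one long bit string being built up from the least significant bits: take the quotient part of the first number and prepend the remainder part of the first number; then prepend the quotient part of the second number, and finally prepend its remainder part. This yields the following large number (line breaks and comments added for clarity):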
001110100010010011110110100011 # Second number, remainder part
0111 # Second number, quotient part
001011111010010000000000111010 # First number, remainder part
0 # First number, quotient part
Written on a single line, this is:
00111010001001001111011010001101110010111110100100000000001110100
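This number clearly exceeds the 8 bits available in a single byte. The little-endian encoding takes the least significant 8 bits of the number and outputs them as the first byte, which is 01110100. For clarity, the bit string above can be grouped into sets of eight, starting from the least significant bits: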
0 01110100 01001001 11101101 00011011 10010111 11010010 00000000 01110100
The little-endian encoding then takes each byte from the right and appends it to the byte string:
01110100
00000000
11010010
10010111
00011011
11101101
01001001
01110100
00000000
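Because new parts are conceptually prepended to the large number on the left (that is, added as more significant bits) while encoding proceeds from the right (that is, from the least significant bits), both encoding and decoding can be performed incrementally.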
The final result is:
additions_four_bytes {
first_value: 489866504
rice_parameter: 30
entries_count: 2
encoded_data: "t\000\322\227\033\355It\000"
}
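The whole encoding walk-through can be reproduced with a minimal sketch. The function name `rice_encode` is hypothetical, and the code uses one arbitrary-precision integer as the "large number" for clarity; a production encoder would more likely stream bits into a buffer.

```python
def rice_encode(sorted_values: list[int], k: int) -> bytes:
    """Rice-encode the adjacent differences of sorted values into little-endian bytes."""
    accumulator = 0  # the conceptual "large number"
    bit_count = 0    # number of bits written so far
    for previous, current in zip(sorted_values, sorted_values[1:]):
        delta = current - previous
        quotient, remainder = delta >> k, delta & ((1 << k) - 1)
        # Quotient: the value (1 << q) - 1 in q + 1 bits, placed above existing bits.
        accumulator |= ((1 << quotient) - 1) << bit_count
        bit_count += quotient + 1
        # Remainder: k bits, placed above the quotient.
        accumulator |= remainder << bit_count
        bit_count += k
    # Emit the least significant byte first, padding the last byte with zero bits.
    return accumulator.to_bytes((bit_count + 7) // 8, byteorder="little")


values = [0x1D32C508, 0x291BC542, 0xF7A502E5]
print(values[0])                # 489866504, the first_value field
print(rice_encode(values, 30))  # b't\x00\xd2\x97\x1b\xedIt\x00'
```

The printed bytes are the same nine bytes as the encoded_data field above, which writes them with octal escapes.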
The client follows the steps above in reverse to decode the hash prefixes.
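A client-side decoder for the example message might look like the sketch below. The function name `rice_decode` is hypothetical, and the code assumes that entries_count counts the encoded differences (two here), so the decoded list has entries_count + 1 values.

```python
def rice_decode(first_value: int, k: int, entries_count: int, encoded: bytes) -> list[int]:
    """Decode Rice-encoded adjacent differences back into the sorted values."""
    bits = int.from_bytes(encoded, byteorder="little")
    position = 0  # index of the next unread bit, counted from the least significant
    values = [first_value]
    for _ in range(entries_count):
        # Quotient: count one-bits up to the terminating zero-bit.
        quotient = 0
        while (bits >> position) & 1:
            quotient += 1
            position += 1
        position += 1  # skip the terminating zero-bit
        # Remainder: the next k bits.
        remainder = (bits >> position) & ((1 << k) - 1)
        position += k
        values.append(values[-1] + ((quotient << k) | remainder))
    return values


decoded = rice_decode(489866504, 30, 2, b"t\x00\xd2\x97\x1b\xedIt\x00")
print([hex(v) for v in decoded])  # ['0x1d32c508', '0x291bc542', '0xf7a502e5']
```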
Decoding Removal Indices
Removal indices are encoded using exactly the same technique as above, applied to 32-bit integers.
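Update Frequency
The client should inspect the value the server returns in the minimum_wait_duration field and use it to schedule the next database update. This value may be zero, in which case the minimum_wait_duration field is absent entirely and the client should perform another update immediately.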
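Scheduling the next update could be sketched as follows. Several details here are assumptions rather than facts from this page: the response is taken to be a parsed JSON object, the field is assumed to appear under the key `minimumWaitDuration`, and its value is assumed to be a seconds string such as "1800s". The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_wait_duration(value: Optional[str]) -> timedelta:
    """Parse an assumed seconds string such as '1800s'; a missing field means zero."""
    if not value:
        return timedelta(0)
    if not value.endswith("s"):
        raise ValueError(f"unexpected duration format: {value!r}")
    return timedelta(seconds=float(value[:-1]))


def next_update_time(response: dict, now: datetime) -> datetime:
    """Return the earliest time at which the next update should be performed."""
    # The key name is an assumption about the JSON form of minimum_wait_duration.
    return now + parse_wait_duration(response.get("minimumWaitDuration"))


now = datetime(2025, 7, 25, 12, 0, tzinfo=timezone.utc)
print(next_update_time({"minimumWaitDuration": "1800s"}, now))  # 2025-07-25 12:30:00+00:00
print(next_update_time({}, now))                                # 2025-07-25 12:00:00+00:00
```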
Last updated (UTC): 2025-07-25.