Compression
This document applies to the following method: Update API (v4): threatListUpdates.fetch.
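About compression
Compression is a key feature of the Safe Browsing APIs (v4). It significantly reduces bandwidth requirements, which matters most for mobile devices but benefits other clients as well.
The Safe Browsing server currently supports Rice compression. Additional compression methods may be added in the future.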
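Compression is set using the supportedCompressions field and the CompressionType enum.
Clients should use the RICE and RAW compression types. If the compression type is not set, Safe Browsing uses the COMPRESSION_TYPE_UNSPECIFIED type, and RAW compression is substituted.
Regardless of the compression type selected, the Safe Browsing server also uses standard HTTP compression to further compress responses, as long as the client sets the correct HTTP compression header (see the Wikipedia article HTTP compression).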
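As a rough illustration of the two settings described above, the following sketch builds (but does not send) a threatListUpdates.fetch request that advertises RICE and RAW and also asks for HTTP compression. The endpoint URL, the API key placeholder, the client identifiers, and the chosen threat list are assumptions made for the example and do not come from this page.

```python
import json
import urllib.request

# Assumptions for illustration only: placeholder key, client id and threat list.
API_KEY = "YOUR_API_KEY"
URL = f"https://safebrowsing.googleapis.com/v4/threatListUpdates:fetch?key={API_KEY}"

body = {
    "client": {"clientId": "example-client", "clientVersion": "1.0"},
    "listUpdateRequests": [
        {
            "threatType": "MALWARE",
            "platformType": "ANY_PLATFORM",
            "threatEntryType": "URL",
            "state": "",  # empty state: no previous update on the client
            # Advertise the compression types this client can decode.
            "constraints": {"supportedCompressions": ["RICE", "RAW"]},
        }
    ],
}

# Accept-Encoding requests standard HTTP compression on top of RICE/RAW.
request = urllib.request.Request(
    URL,
    data=json.dumps(body).encode("utf-8"),
    headers={"Content-Type": "application/json", "Accept-Encoding": "gzip"},
    method="POST",
)

# The request is only constructed here; sending it needs a valid API key.
# Note that urllib does not transparently decompress a gzip response body.
print(json.dumps(body["listUpdateRequests"][0]["constraints"]))
```

A client that can only handle raw prefixes would list just "RAW" in the same field.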
Rice compression
As noted, the Safe Browsing server currently supports Rice compression (see the Wikipedia article Golomb coding for a full discussion of Golomb-Rice coding).
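Compression/decompression
The RiceDeltaEncoding object represents Rice-Golomb encoded data and is used to send compressed removal indices or compressed 4-byte hash prefixes. (Hash prefixes longer than 4 bytes are not compressed and are served in raw format instead.)
For removal indices, the list of indices is sorted in ascending order and then delta encoded using RICE encoding. For additions, the 4-byte hash prefixes are re-interpreted as little-endian uint32s, sorted in ascending order, and then delta encoded using RICE encoding.
Note the difference in hash format between RICE compression and RAW: raw hashes are lexicographically sorted bytes, whereas Rice hashes are uint32s sorted in ascending order (after decompression).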
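The ordering difference is easy to get wrong, so here is a minimal sketch of the re-interpretation step. The function names and the sample prefixes are hypothetical; only the little-endian uint32 interpretation and the two sort orders come from the text above.

```python
def prefixes_to_rice_order(prefixes):
    """Reinterpret 4-byte prefixes as little-endian uint32s, sorted ascending."""
    for prefix in prefixes:
        if len(prefix) != 4:
            raise ValueError("only 4-byte hash prefixes are Rice encoded")
    return sorted(int.from_bytes(prefix, "little") for prefix in prefixes)


def rice_values_to_raw_order(values):
    """Turn decoded uint32s back into 4-byte prefixes in lexicographic order."""
    return sorted(value.to_bytes(4, "little") for value in values)


# Hypothetical sample prefixes, already in lexicographic (RAW) order.
raw_prefixes = [
    bytes.fromhex("00000002"),
    bytes.fromhex("01000001"),
    bytes.fromhex("ff000000"),
]

rice_values = prefixes_to_rice_order(raw_prefixes)
print(rice_values)  # [255, 16777217, 33554432]
print(rice_values_to_raw_order(rice_values) == raw_prefixes)  # True
```

In this sample the uint32 order is the exact reverse of the byte order, which is why a client that stores prefixes lexicographically has to re-sort after decompression.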
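For example, the list of integers [1, 5, 7, 13] is encoded as 1 (the first value) and the deltas [4, 2, 6].
The first value is stored in the firstValue field, and the deltas are encoded using a Golomb-Rice encoder. The Rice parameter k (see below) is stored in riceParameter. The numEntries field contains the number of deltas encoded in the Rice encoder (3 in the example above, not 4). The encodedData field contains the actual encoded deltas.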
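The delta step on its own can be sketched as follows. The helper names are hypothetical; the input list and the expected first value and deltas are the ones from the example above.

```python
def delta_encode(values):
    """Sort values and return (first_value, deltas between neighbours)."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    deltas = [b - a for a, b in zip(ordered, ordered[1:])]
    return ordered[0], deltas


def delta_decode(first_value, deltas):
    """Rebuild the sorted values by accumulating the deltas."""
    values = [first_value]
    for delta in deltas:
        values.append(values[-1] + delta)
    return values


first_value, deltas = delta_encode([1, 5, 7, 13])
num_entries = len(deltas)  # counts deltas only, not the first value

print(first_value, deltas, num_entries)  # 1 [4, 2, 6] 3
print(delta_decode(first_value, deltas))  # [1, 5, 7, 13]
```

Because numEntries counts deltas only, a decoder that produces numEntries + 1 values has recovered the whole list.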
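Encoder/decoder
In the Rice encoder/decoder, every delta n is encoded as q and r, where n = (q<<k) + r (or n = q * (2**k) + r). k is a constant and a parameter of the Rice encoder/decoder. The values of q and r are encoded in the bit stream using different encoding schemes.
The quotient q is encoded in unary coding followed by a 0. That is, 3 is encoded as 1110, 4 as 11110, and 7 as 11111110. The quotient q is decoded first.
The remainder r is encoded using truncated binary encoding. Only the least significant k bits of r are written to (and therefore read from) the bit stream. The remainder r is decoded after q.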
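A minimal sketch of the per-value scheme, working on plain lists of bits rather than bytes. The function names are hypothetical, and one detail is an assumption: the text does not say in which order the k remainder bits are written, so this sketch emits them least significant bit first.

```python
def rice_encode_value(n, k):
    """Return the bits for n: unary quotient, a 0 terminator, then k remainder bits."""
    q = n >> k
    r = n & ((1 << k) - 1)
    bits = [1] * q + [0]
    # Assumption: remainder bits are written least significant bit first.
    bits += [(r >> i) & 1 for i in range(k)]
    return bits


def rice_decode_value(bit_iter, k):
    """Read one value from an iterator of bits: quotient first, then remainder."""
    q = 0
    while next(bit_iter) == 1:
        q += 1
    r = 0
    for i in range(k):
        r |= next(bit_iter) << i
    return (q << k) + r


k = 2
stream = []
for delta in [4, 2, 6]:
    stream += rice_encode_value(delta, k)

bit_iter = iter(stream)
decoded = [rice_decode_value(bit_iter, k) for _ in range(3)]
print(decoded)  # [4, 2, 6]
```

With k = 2, the delta 13 splits into q = 3 and r = 1, so its code starts with the unary pattern 1110 from the text.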
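Bit encoder/decoder
The Rice encoder relies on a bit encoder/decoder in which single bits can be appended to the bit encoder, for example to encode a quotient q that may be only two bits long.
The bit encoder is a list of (8-bit) bytes. Bits are set from the least significant bit of the first byte to the most significant bit of the first byte. Once all bits of a byte have been set, a new byte (initialized to zero) is appended to the end of the byte list. If the last byte is not fully used, its most significant bits are set to zero. Example: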
| Bits added | BitEncoder after adding bits |
|------------|------------------------------|
|            | []                           |
| 0          | [00000000]                   |
| 1          | [00000010]                   |
| 1          | [00000110]                   |
| 1,0,1      | [00101110]                   |
| 0,0,0      | [00101110, 00000000]         |
| 1,1,0      | [00101110, 00000110]         |
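The table can be reproduced with a small sketch like the one below. The class layout, the method names, and the read_bits helper are hypothetical; the bit placement follows the rules and the example rows given above.

```python
class BitEncoder:
    """Packs single bits into bytes, least significant bit first."""

    def __init__(self):
        self.data = bytearray()
        self.bit_count = 0

    def add_bits(self, *bits):
        for bit in bits:
            index = self.bit_count % 8
            if index == 0:
                # Previous byte is full (or list is empty): start a new zero byte.
                self.data.append(0)
            if bit:
                self.data[-1] |= 1 << index
            self.bit_count += 1

    def render(self):
        """Show the byte list in the same notation as the table."""
        return "[" + ", ".join(f"{byte:08b}" for byte in self.data) + "]"


def read_bits(data):
    """Yield bits in the order they were added, including zero padding."""
    for byte in data:
        for index in range(8):
            yield (byte >> index) & 1


encoder = BitEncoder()
rows = [encoder.render()]
for added in [(0,), (1,), (1,), (1, 0, 1), (0, 0, 0), (1, 1, 0)]:
    encoder.add_bits(*added)
    rows.append(encoder.render())

for row in rows:
    print(row)
```

The reader cannot tell padding zeros from real bits, which is one reason a decoder needs numEntries to know when to stop.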
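Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-07-25 UTC.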