Local Database
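Google Safe Browsing v5 expects the client to maintain a local database, except when the client chooses the No-Storage Real-Time Mode. The format and storage of this local database are up to the client. Conceptually, the contents of the database can be thought of as a folder containing various lists as files, where the contents of these files are SHA256 hashes or their corresponding prefixes. Four-byte hash prefixes are the most commonly used hash length.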
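Available Lists
Lists are identified by their distinct names, which follow a naming convention in which the name contains a suffix that signifies the length of the hashes to expect in the list. Hash lists with the same threat type but different hash lengths are separately named lists, each qualified with a suffix that indicates the hash length.
The following lists are available for use with the hash list methods.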
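| List Name | Corresponding v4 ThreatType Enum | Description |
|-----------|----------------------------------|-------------|
| gc-32b | None | This list is a Global Cache list. It is a special list only used in the Real-Time mode of operation. |
| se-4b | SOCIAL_ENGINEERING | This list contains threats of the SOCIAL_ENGINEERING threat type. |
| mw-4b | MALWARE | This list contains threats of the MALWARE threat type for desktop platforms. |
| uws-4b | UNWANTED_SOFTWARE | This list contains threats of the UNWANTED_SOFTWARE threat type for desktop platforms. |
| uwsa-4b | UNWANTED_SOFTWARE | This list contains threats of the UNWANTED_SOFTWARE threat type for Android platforms. |
| pha-4b | POTENTIALLY_HARMFUL_APPLICATION | This list contains threats of the POTENTIALLY_HARMFUL_APPLICATION threat type for Android platforms. |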
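The naming convention can be illustrated with a minimal sketch that reads the hash length from a list name's suffix. The helper name `hash_length_from_list_name` and the regular expression are assumptions made for illustration, inferred only from the names in the table; they are not part of the Safe Browsing API.

```python
import re

# Assumed pattern, inferred from the table: a lowercase tag, a hyphen,
# a decimal byte count, and a trailing "b" (for example "se-4b").
_LIST_NAME = re.compile(r"^(?P<tag>[a-z]+)-(?P<length>\d+)b$")


def hash_length_from_list_name(name):
    """Return the hash (or hash prefix) length in bytes implied by the suffix."""
    match = _LIST_NAME.match(name)
    if match is None:
        raise ValueError(f"unrecognized list name: {name!r}")
    return int(match.group("length"))


if __name__ == "__main__":
    for list_name in ("gc-32b", "se-4b", "uwsa-4b"):
        print(list_name, hash_length_from_list_name(list_name))
```

A client could use such a check to reject a list whose decoded entries do not match the length promised by its name.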
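Additional lists can become available at a later date, at which time the above table will be expanded, and the results from the hashList.list method will show a similar result with the most up-to-date lists.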
Database Updates
The client regularly calls the hashList.get method or the hashLists.batchGet method to update the database. Since the typical client will want to update multiple lists at a time, using the hashLists.batchGet method is recommended.
List names will never be renamed. Furthermore, once a list has appeared, it will never be removed (if the list is no longer useful, it will become empty but will continue to exist). Therefore, it is appropriate to hard code these names in the Google Safe Browsing client code.
Both the hashList.get method and the hashLists.batchGet method support incremental updates. Using incremental updates saves bandwidth and improves performance. Incremental updates work by delivering a delta between the client's version of the list and the latest version of the list. (If a client is newly deployed and does not have any versions available, a full update is available.) The incremental update contains removal indices and additions. The client is first expected to remove the entries at the specified indices from its local database, and then apply the additions.
Finally, to prevent corruption, the client should check the stored data against the checksum provided by the server. Whenever the checksum does not match, the client should perform a full update.
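Decoding the List Content
Decoding Hashes and Hash Prefixes
All lists are delivered using a special encoding to reduce size. This encoding works by recognizing that Google Safe Browsing lists conceptually contain a set of hashes or hash prefixes, which are statistically indistinguishable from random integers. If these integers are sorted and the differences between adjacent values are taken, those differences are expected to be "small" in a sense (for a list of many uniformly distributed values, the typical gap is far smaller than the values themselves). Golomb-Rice encoding then exploits this smallness.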
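Suppose that three host-suffix path-prefix expressions, namely a.example.com/, b.example.com/, and y.example.com/, are to be transmitted using 4-byte hash prefixes. Further suppose that the Rice parameter, denoted by k, is chosen to be 30. The server would start by calculating the full hash for these strings, which are, respectively: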
291bc5421f1cd54d99afcc55d166e2b9fe42447025895bf09dd41b2110a687dc a.example.com/
1d32c5084a360e58f1b87109637a6810acad97a861a7769e8f1841410d2a960c b.example.com/
f7a502e56e8b01c6dc242b35122683c9d25d07fb1f532d9853eb0ef3ff334f03 y.example.com/
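The server then forms a 4-byte hash prefix for each of the above, which is the first 4 bytes of the 32-byte full hash, interpreted as a big-endian 32-bit integer. Big-endian here means that the first byte of the full hash becomes the most significant byte of the 32-bit integer. This step results in the integers 0x291bc542, 0x1d32c508, and 0xf7a502e5.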
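The prefix step can be sketched in a few lines of Python. The function names are hypothetical, and encoding the expression as ASCII before hashing is an assumption that holds for the example strings here; real expressions should already be canonicalized by the client.

```python
import hashlib


def prefix_as_int(full_hash, prefix_len=4):
    """Interpret the first prefix_len bytes of a full hash as a big-endian integer."""
    return int.from_bytes(full_hash[:prefix_len], byteorder="big")


def hash_prefix(expression, prefix_len=4):
    """SHA256 the expression (assumed ASCII) and return its prefix as an integer."""
    full_hash = hashlib.sha256(expression.encode("ascii")).digest()
    return prefix_as_int(full_hash, prefix_len)


if __name__ == "__main__":
    # Full hash for a.example.com/ as listed in the text above.
    listed = bytes.fromhex(
        "291bc5421f1cd54d99afcc55d166e2b9fe42447025895bf09dd41b2110a687dc"
    )
    print(hex(prefix_as_int(listed)))
    print(hex(hash_prefix("a.example.com/")))
```

Using `byteorder="big"` is what makes numerical ordering of the integers agree with lexicographic ordering of the prefix bytes.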
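The server must sort these three hash prefixes lexicographically (equivalent to numerical sorting of the big-endian integers); the result of the sorting is 0x1d32c508, 0x291bc542, 0xf7a502e5. The first hash prefix is stored unchanged in the first_value field.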
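As a rough sketch of this step, the following Python sorts the integer prefixes, keeps the smallest one as the first value, and computes the gaps between neighbours that the Rice encoding then operates on. The function name `first_value_and_deltas` is hypothetical.

```python
def first_value_and_deltas(prefixes):
    """Sort integer hash prefixes and return (first_value, adjacent differences)."""
    if not prefixes:
        raise ValueError("at least one prefix is required")
    ordered = sorted(prefixes)
    # Each delta is the gap between a value and its predecessor in sorted order.
    deltas = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    return ordered[0], deltas


if __name__ == "__main__":
    first, deltas = first_value_and_deltas([0x291BC542, 0x1D32C508, 0xF7A502E5])
    print(hex(first), [hex(d) for d in deltas])
```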
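The server then calculates the two adjacent differences, which are 0xbe9003a and 0xce893da3 respectively. Given that k is chosen to be 30, the server splits each of these numbers into a quotient part and a remainder part that are 2 and 30 bits long respectively. For the first number, the quotient part is zero and the remainder is 0xbe9003a; for the second number, the quotient part is 3 because the most significant two bits are 11 in binary, and the remainder is 0xe893da3. A given quotient q is encoded as (1 << q) - 1 using exactly 1 + q bits; the remainder is encoded directly using k bits. The quotient part of the first number is encoded as 0, and its remainder part is 001011111010010000000000111010 in binary; the quotient part of the second number is encoded as 0111, and its remainder part is 001110100010010011110110100011.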
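The split into quotient and remainder, and the bit patterns quoted above, can be reproduced with a short sketch. The function names `rice_split` and `rice_bits` are hypothetical; bit strings are written most significant bit first, as in the text.

```python
def rice_split(value, k):
    """Split value into (quotient, remainder) for Rice parameter k."""
    return value >> k, value & ((1 << k) - 1)


def rice_bits(value, k):
    """Return (quotient_bits, remainder_bits) as binary strings, MSB first."""
    q, r = rice_split(value, k)
    # Quotient q is written as (1 << q) - 1 using exactly 1 + q bits.
    quotient_bits = format((1 << q) - 1, f"0{q + 1}b")
    # Remainder is written directly using exactly k bits.
    remainder_bits = format(r, f"0{k}b")
    return quotient_bits, remainder_bits


if __name__ == "__main__":
    for delta in (0xBE9003A, 0xCE893DA3):
        print(rice_split(delta, 30), rice_bits(delta, 30))
```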
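When these numbers are formed into a byte string, little endian is used. Conceptually, it may be easier to imagine a long bitstring being formed starting from the least significant bits: take the quotient part of the first number and prepend the remainder part of the first number; then prepend the quotient part of the second number, and finally prepend its remainder part. This results in the following large number (line breaks and comments added for clarity):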
001110100010010011110110100011 # Second number, remainder part
0111 # Second number, quotient part
001011111010010000000000111010 # First number, remainder part
0 # First number, quotient part
Written on a single line, this is
00111010001001001111011010001101110010111110100100000000001110100
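This number far exceeds the 8 bits available in a single byte. The little-endian encoding takes the least significant 8 bits of that number and outputs them as the first byte, which is 01110100. For clarity, the above bitstring can be grouped into groups of eight starting from the least significant bits: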
0 01110100 01001001 11101101 00011011 10010111 11010010 00000000 01110100
The little-endian encoding then takes each byte from the right and puts it into a byte string:
01110100
00000000
11010010
10010111
00011011
11101101
01001001
01110100
00000000
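Because new parts are conceptually prepended to the large number on the left (that is, as more significant bits) while encoding proceeds from the right (that is, from the least significant bits), encoding and decoding can be performed incrementally.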
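The packing described above can be sketched with Python's arbitrary-precision integers standing in for the conceptual "large number". This is a minimal illustration rather than a reference encoder: the function name `rice_encode` is hypothetical, and a production encoder would more likely stream bits than build one big integer.

```python
def rice_encode(deltas, k):
    """Pack Rice-coded deltas into a little-endian byte string."""
    acc = 0      # the conceptual large number
    nbits = 0    # number of bits already placed in acc
    for delta in deltas:
        q, r = delta >> k, delta & ((1 << k) - 1)
        # Quotient: q one-bits, then a zero bit, placed above existing bits.
        acc |= ((1 << q) - 1) << nbits
        nbits += q + 1
        # Remainder: exactly k bits, placed above the quotient.
        acc |= r << nbits
        nbits += k
    # Emit the least significant byte first, padding the last byte with zeros.
    return acc.to_bytes((nbits + 7) // 8, byteorder="little")


if __name__ == "__main__":
    encoded = rice_encode([0xBE9003A, 0xCE893DA3], 30)
    print(" ".join(format(byte, "08b") for byte in encoded))
```

For the two differences in this example, the bytes produced are the nine bytes listed above, starting with 01110100.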
This finally results in
additions_four_bytes {
first_value: 489866504
rice_parameter: 30
entries_count: 2
encoded_data: "t\000\322\227\033\355It\000"
}
The client follows the above steps in reverse to decode the hash prefixes.
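A minimal decoding sketch, assuming the four fields shown in the message above are passed in as plain Python values. The function name `rice_decode` is hypothetical, and the sketch omits validation of malformed or truncated input that a real client would want.

```python
def rice_decode(first_value, rice_parameter, entries_count, encoded_data):
    """Recover the sorted integers from a Rice-delta-encoded byte string."""
    k = rice_parameter
    # Treat the bytes as one large little-endian number and consume it
    # from the least significant end.
    acc = int.from_bytes(encoded_data, byteorder="little")
    values = [first_value]
    for _ in range(entries_count):
        q = 0
        while acc & 1:       # count the one-bits of the quotient
            q += 1
            acc >>= 1
        acc >>= 1            # skip the terminating zero bit
        r = acc & ((1 << k) - 1)
        acc >>= k
        values.append(values[-1] + ((q << k) | r))
    return values


if __name__ == "__main__":
    decoded = rice_decode(489866504, 30, 2, b"t\x00\xd2\x97\x1b\xedIt\x00")
    # Convert each integer back into its 4-byte big-endian hash prefix.
    print([value.to_bytes(4, byteorder="big").hex() for value in decoded])
```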
Decoding Removal Indices
Removal indices are encoded using the exact same technique as above, using 32-bit integers.
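Update Frequency
The client should inspect the value the server returns in the minimum_wait_duration field and use it to schedule the next update of the database. This value may be zero (the minimum_wait_duration field is completely missing), in which case the client SHOULD immediately perform another update.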
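One way a client might turn this rule into a schedule is sketched below. The function name `next_update_at` is hypothetical, and the sketch assumes the duration has already been extracted from the response and converted to seconds, with `None` standing for a missing field.

```python
import time


def next_update_at(minimum_wait_seconds, now=None):
    """Return the earliest Unix time at which the next update should run."""
    if now is None:
        now = time.time()
    # A missing field is treated the same as zero: update immediately.
    wait = minimum_wait_seconds if minimum_wait_seconds is not None else 0.0
    return now + max(wait, 0.0)


if __name__ == "__main__":
    print(next_update_at(None, now=1000.0))
    print(next_update_at(300.0, now=1000.0))
```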
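Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-07-25 UTC.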