AES-GCM-HKDF Streaming AEAD

This document formally defines the mathematical function represented by AES-GCM-HKDF Streaming keys, encoded in proto format as type.googleapis.com/google.crypto.tink.AesGcmHkdfStreamingKey.

This encryption is loosely based on HRRV15¹. For security analysis, we refer to HS20².

Key and parameters

Keys are described by the following parts (all sizes in this document are in bytes):

$\mathrm{KeyValue}$, a byte string.
$\mathrm{CiphertextSegmentSize} \in \{1, 2, \ldots, 2^{31}-1\}$.
$\mathrm{DerivedKeySize} \in \{16, 32\}$.
$\mathrm{HkdfHashType} \in \{\mathrm{SHA1}, \mathrm{SHA256}, \mathrm{SHA512}\}$.

Valid keys additionally satisfy the following properties:

$\mathrm{len}(\mathrm{KeyValue}) \geq \mathrm{DerivedKeySize}$.
$\mathrm{CiphertextSegmentSize} > \mathrm{DerivedKeySize} + 24$ (This equals $\mathrm{len}(\mathrm{Header}) + 16$ as explained later).

Keys which don't satisfy any of these properties are rejected by Tink, either when the key is parsed or when the corresponding primitive is created.

Encryption function

To encrypt a message $\mathrm{Msg}$ with associated data $\mathrm{AssociatedData}$, we create a header, split the message into segments, encrypt each segment, and concatenate the encrypted segments.

Create the header

We pick a uniform random string $\mathrm{Salt}$ of length $\mathrm{DerivedKeySize}$ and a uniform random string $\mathrm{NoncePrefix}$ of length 7.

We then set $\mathrm{Header} := \mathrm{len}(\mathrm{Header}) \| \mathrm{Salt} \| \mathrm{NoncePrefix}$, where the length of the header is encoded as a single byte. Note that $\mathrm{len}(\mathrm{Header}) \in \{24, 40\}$.

Next, we use HKDF³ with the hash function given by $\mathrm{HkdfHashType}$ and inputs $\mathrm{ikm} := \mathrm{KeyValue}$, $\mathrm{salt} := \mathrm{Salt}$, and $\mathrm{info} := \mathrm{AssociatedData}$, with output length $\mathrm{DerivedKeySize}$. We call the result $\mathrm{DerivedKey}$.

Split the message

The message $\mathrm{Msg}$ is next split into parts: $\mathrm{Msg} = M_0 \| M_1 \| \cdots \| M_{n-1}$.

Their lengths are chosen to satisfy:

$\mathrm{len}(M_0) \in \{0,\ldots, \mathrm{CiphertextSegmentSize} - \mathrm{len}(\mathrm{Header}) - \mathrm{16}\}$.
If $n>1$, then $\mathrm{len}(M_1), \ldots, \mathrm{len}(M_{n-1}) \in \{1,\ldots, \mathrm{CiphertextSegmentSize} - \mathrm{16}\}$.
If $n>1$, then $\mathrm{len}(M_{0}), \ldots, \mathrm{len}(M_{n-2})$ must have maximal length according to the above to constraints.

$n$ may be at most $2^{32}$. Otherwise, encryption fails.

Encrypt the blocks

To encrypt segment $M_i$, we compute $\mathrm{IV}_i := \mathrm{NoncePrefix} \| \mathrm{i} \| b$, where $\mathrm{i}$ is 4 bytes in big-endian encoding and byte $b$ is 0x00 if $i < n-1$ and 0x01 otherwise.

We then encrypt $M_i$ using AES-GCM⁴, where the key is $\mathrm{DerivedKey}$, the initialization vector is $\mathrm{IV}_i$, and the associated data is the empty string. $C_i$ is the result of this encryption (i.e. the concatenation of $C$ and $T$ in section 5.2.1.2 of the linked AES-GCM reference).

Concatenate the encrypted segments

Finally, all segments are concatenated as $\mathrm{Header} \| C_0 \| \cdots \| C_{n-1}$, which is the final ciphertext.

Decryption

Decryption inverts the encryption. We use the header to obtain $\mathrm{NoncePrefix}$, and decrypt each segment of ciphertext individually.

APIs may (and typically do) allow random access, or access to the beginning of a file without inspecting the end of the file. This is intentional, since it is possible to decrypt $M_i$ from $C_i$, without decrypting all previous and remaining ciphertext blocks.

However, APIs should be careful to not allow users to confuse end-of-file and decryption errors: in both cases the API probably has to return an error, and ignoring the difference can lead to an adversary being able to effectively truncate files.

Serialization and parsing of keys

To serialize a key in the "Tink Proto" format, we first map the parameters in the obvious way into the proto given at aes_gcm_hkdf_streaming.proto. The field version needs to be set to 0. We then serialize this using normal proto serialization, and embed the resulting string in the value of field of a KeyData proto. We set the type_url field to type.googleapis.com/google.crypto.tink.AesGcmHkdfStreamingKey. We then set key_material_type to SYMMETRIC, and embed this into a keyset. We usually set the output_prefix_type to RAW. The exception is that if the key was parsed with a different value set for output_prefix_type, Tink may either write RAW or the previous value.

To parse a key, we reverse the above process (in the usual way when parsing protos). The field key_material_type is ignored. The value of output_prefix_type can either be ignored, or keys which have output_prefix_type different from RAW can be rejected. Keys which have a version different from 0 must be rejected.

Known issues

Implementations of the above encryption function are not expected to be fork safe. See Fork Safety.

References

Hoang, Reyhanitabar, Rogaway, Vizar, 2015. Online authenticated-encryption and its nonce-reuse misuse-resistance. CRYPTO 2015. https://eprint.iacr.org/2015/189 ↩
Hoang, Shen, 2020. Security of Streaming Encryption in Google's Tink Library. https://eprint.iacr.org/2020/1019 ↩
RFC 5869. HMAC-based Extract-and-Expand Key Derivation Function (HKDF). https://www.rfc-editor.org/rfc/rfc5869 ↩
NIST SP 800-38D. Recommendation for Block Cipher Modes of Operation: Galois/Counter Mode (GCM) and GMAC. https://csrc.nist.gov/pubs/sp/800/38/d/final ↩