Optimal PRFs from Blockcipher Designs

Abstract. Cryptographic modes built on top of a blockcipher usually rely on the assumption that this primitive behaves like a pseudorandom permutation (PRP). For many of these modes, including counter mode and GCM, stronger security guarantees could be derived if they were based on a PRF design. We propose a heuristic method of transforming a dedicated blockcipher design into a dedicated PRF design. Intuitively, the method consists of evaluating the blockcipher once, with one or more intermediate state values fed forward. It strongly resembles the optimally secure EDMD construction by Mennink and Neves (CRYPTO 2017), but the use of internal state values makes their security analysis formally inapplicable. In support of its security, we give the rationale for relying on the EDMD function (as opposed to alternatives), and present an analysis of simplified versions of our conversion method applied to the AES. We conjecture that our main proposal AES-PRF, AES with a feed-forward of the middle state, achieves close to optimal security. We apply the design to GCM and GCM-SIV, and demonstrate how it entails significant security improvements. We furthermore demonstrate how the technique extends to tweakable blockciphers and allows for security improvements in, for instance, PMAC1.


Introduction
The conventional approach to cryptographic designs is to evaluate a blockcipher in a certain mode of operation, and undoubtedly the vast majority of MAC functions, encryption schemes, and authenticated encryption schemes follow this paradigm. It allows one to reduce the security of the (keyed) construction, in a standard model argument, to the security of the keyed underlying primitive. The approach is, to a certain extent, a natural one. Ample literature discusses the design [DR02, KR11, DR01, RDP+96, DKR97, Vau03, Mat96, HT96, DPU+16] and analysis [BS93, Mat93, Knu94, LH94, JK97, BBS99, Wag99, BDK01] of blockciphers, and we even have a widely deployed and well-understood standardized blockcipher, the AES [DR02]. For modes that evaluate the cryptographic primitive in the forward as well as the inverse direction, one in fact needs an invertible primitive, and a blockcipher is the most logical choice.
However, many cryptographic primitives in the literature evaluate the underlying blockcipher in the forward direction only. For example, counter mode encryption [BDJR97] (the observation applies equally well to the authenticated encryption mode GCM [MV04]) internally uses a blockcipher $E : \{0,1\}^\kappa \times \{0,1\}^n \to \{0,1\}^n$ to encrypt a message $m = m_1 \cdots m_l \in (\{0,1\}^n)^l$ as $c_i = E_k(\mathrm{ctr} + i) \oplus m_i$ for $i = 1, \ldots, l$. Counter mode can be distinguished from a random encryption scheme in about $2^{n/2}$ data blocks: an adversary can keep $m_i$ constant and observe that the $c_i$ never collide, whereas they likely collide for a random encryption scheme. This suggests that it makes more sense to use a dedicated pseudorandom function, rather than a blockcipher, inside counter mode (and GCM). This is no longer a theoretical purity concern. Birthday-style attacks on common blockcipher modes of operation have been shown to be feasible in real scenarios [BL16, McG12], and in the case of counter mode they could be entirely avoided by choosing a PRF instead of a PRP. Although [BL16] could be chalked up to legacy ciphers (e.g., DES or Blowfish) being used in modern protocols, this is not necessarily the case in general. Over the last decade, there has been a flurry of ciphers targeting low-end hardware, which overwhelmingly have small block sizes in common. Some, like SIMON [BSS+13], SPECK, SIMECK [YZS+15], KATAN, or KTANTAN [CDK09], go as far as having 32-bit block variants. A birthday bound here renders these ciphers nearly unusable in several relevant modes of operation.
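The distinguisher described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: the block size is a toy n = 16, an ideal permutation and an ideal random function stand in for the keyed primitives, and the helper names `keystream_collides` and `collision_rate` are hypothetical.

```python
import random

def keystream_collides(n_bits, q, use_permutation, rng):
    """Return True if q keystream blocks (constant message) contain a repeat."""
    size = 1 << n_bits
    if use_permutation:
        # An ideal PRP on q distinct counter values yields q distinct outputs.
        outputs = rng.sample(range(size), q)
    else:
        # An ideal PRF yields independent, uniformly random outputs.
        outputs = [rng.randrange(size) for _ in range(q)]
    return len(set(outputs)) < q

def collision_rate(n_bits, q, use_permutation, trials=200, seed=1):
    """Fraction of trials in which at least one keystream collision occurs."""
    rng = random.Random(seed)
    hits = sum(keystream_collides(n_bits, q, use_permutation, rng)
               for _ in range(trials))
    return hits / trials

if __name__ == "__main__":
    n, q = 16, 4 * 2 ** 8          # q is a small multiple of 2^(n/2)
    print("PRP-based counter mode:", collision_rate(n, q, True))
    print("PRF-based counter mode:", collision_rate(n, q, False))
```

With q a small multiple of 2^(n/2), the permutation-based keystream never repeats, while the function-based keystream repeats in almost every trial; the absence of collisions is exactly what the adversary observes.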
Another prominent example of a scheme that, to a lesser extent, benefits from using a pseudorandom function over a pseudorandom permutation is the Wegman-Carter MAC [WC81, Bra82]: $\mathrm{WC}(\nu, m) = F_k(\nu) \oplus h(m)$ for a nonce $\nu$ and message $m$, where $F_k$ is a PRF and $h$ is a universal hash function. Not only is WC typically used with a PRF, the concept of PRFs was developed (in part) to make Wegman-Carter work with short keys [Bra82, GGM86]. Nevertheless, given that blockciphers are better understood, Shoup suggested using a blockcipher instead, in what is now known as the Wegman-Carter-Shoup MAC [Sho96], $\mathrm{WCS}(\nu, m) = E_k(\nu) \oplus h(m)$, where $E_k$ is a PRP. But despite quantitative improvements from Shoup [Sho96] and Bernstein [Ber05], WCS based on a PRP remains stuck at the birthday bound, unlike WC based on a PRF. Further examples of schemes that would benefit from the use of a PRF abound. Unfortunately, unlike the case of blockciphers, dedicated fixed-input-length pseudorandom function designs are scarce: the only well-known candidate in the literature is SURF by Bernstein [Ber97]. In fact, this scarcity was one of the reasons for the introduction of WCS over WC.

Generic PRP-PRF Conversion Functions
Various methods of generically transforming a PRP into a PRF have appeared in the literature. First off, the well-known PRP-PRF switch [Fre77, IR88, BKR94, HWKS98, BR06, CN08] suggests simply viewing the PRP as a PRF, which can be done as long as $q \ll 2^{n/2}$. Mennink and Neves [MN17] summarized four main directions in achieving security beyond the birthday bound.
A first direction is in truncating permutations, as first suggested in a cryptographic context by Hall et al. [HWKS98]. Bellare and Impagliazzo [BI99] and later Gilboa and Gueron [GG16] proved that truncating an $n$-bit blockcipher by $m < n$ bits is secure up to about $2^{(m+n)/2}$ queries. On the downside, truncation decreases the rate at which randomness is generated and hence makes the mode less efficient.
Bellare et al. [BKR98] were the first to suggest the xor of permutations, $\mathrm{XoP}_{k_1,k_2}(x) = E_{k_1}(x) \oplus E_{k_2}(x)$. Following a sequence of analyses by Lucks [Luc00] and Bellare and Impagliazzo [BI99], Patarin achieved $2^n/67$ security [Pat08, Pat13b, Pat10]. The results generalize to more permutations [CLP14, MP15], as well as to the single-key variant with domain separation [Pat10]. Using XoP in counter mode or Wegman-Carter would yield optimal security, but the resulting scheme is twice as expensive.
Iwata [Iwa06] offered a compromise between the xor of permutations and traditional counter mode with the CENC mode of operation: $E_{k_1}$ can be used as a mask which is used for the encryption of $w \ge 1$ blocks using $E_{k_2}$. If $w = 1$, CENC constitutes counter mode based on XoP, but also for larger (but still reasonably small) values of $w$, CENC achieves essentially optimal security [IMV16] and shares most of the benefits of the xor of permutations construction without much of a performance hit. Nevertheless, CENC remains a mode of operation, not a PRF primitive, and thus is not usable as a general replacement for a PRF.
Two novel constructions are EDM by Cogliati and Seurin [CS16] and EDMD by Mennink and Neves [MN17]:
$\mathrm{EDM}_{k_1,k_2}(x) = E_{k_2}(E_{k_1}(x) \oplus x)$,
$\mathrm{EDMD}_{k_1,k_2}(x) = E_{k_2}(E_{k_1}(x)) \oplus E_{k_1}(x)$. (5)
Mennink and Neves proved security of EDM up to approximately $2^n/(67n)$ queries and EDMD up to approximately $2^n/67$ queries. However, just like the xor of permutations, these two generic modes are again twice as expensive.
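To make the three two-call constructions concrete, the following sketch instantiates them over toy 8-bit random permutations. It assumes the usual definitions XoP(x) = E_{k1}(x) xor E_{k2}(x), EDM(x) = E_{k2}(E_{k1}(x) xor x), and EDMD(x) = E_{k2}(E_{k1}(x)) xor E_{k1}(x); the seeded permutation tables and all function names are illustrative stand-ins, not part of any cited specification.

```python
import random

N_BITS = 8  # toy block size, far too small for any real use

def random_permutation(seed):
    """Stand-in for a keyed blockcipher E_k: a seeded random permutation table."""
    table = list(range(1 << N_BITS))
    random.Random(seed).shuffle(table)
    return table

p1 = random_permutation(1)  # plays the role of E_{k1}
p2 = random_permutation(2)  # plays the role of E_{k2}

def xop(x):
    """Xor of permutations: both permutations are evaluated on the input."""
    return p1[x] ^ p2[x]

def edm(x):
    """Encrypted Davies-Meyer: the input is fed forward before the second call."""
    return p2[p1[x] ^ x]

def edmd(x):
    """EDMD: the output of the first call is fed forward over the second call."""
    return p2[p1[x]] ^ p1[x]

def davies_meyer_p2(y):
    """Davies-Meyer on the second permutation alone."""
    return p2[y] ^ y

if __name__ == "__main__":
    for x in (0, 1, 2):
        print(x, xop(x), edm(x), edmd(x))
```

Each function costs two permutation calls per output block, which is the factor-two overhead mentioned above. Note that EDMD is Davies-Meyer on the second permutation applied to the output of the first.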

Towards a Dedicated PRF
None of the above generic methods seem particularly suitable for the design of an efficient PRF. In particular, it is difficult to argue for their practical usage when they entail such a noticeable slowdown. What if, instead, we could design a secure and efficient PRF from scratch? One could try to design a non-invertible round function, and design a PRF around its iteration. However, non-invertible round functions are hard to get right, as collision probabilities are amplified with each iteration, and the track record of this design approach is not very reassuring [PK14, Dae16].
Instead, our approach is to stick with tried-and-tested designs, namely those of blockciphers. Our key observation is that the EDMD structure of (5) is particularly suited to (heuristically) transform imperfect random permutations into a good PRF. We call this heuristic construction FastPRF. At a high level, the idea of FastPRF is as follows: if a blockcipher $E_k$ consists of $r$ rounds, the blockcipher is evaluated exactly once, with a predetermined selection of state values fed forward. Naturally, the strength of FastPRF highly depends on the blockcipher itself, as well as on the choice of states that are fed forward. For example, if $E$ is any blockcipher, and the 0th state is fed forward, this effectively corresponds to
$E_k(x) \oplus x$,
which can be distinguished from random in about $2^{n/2}$ evaluations (cf. Section 3.3). A more logical choice is to use the middle state. Let $E^1_k$ and $E^2_k$ be the first and second $r/2$ rounds of the cipher, for example. Then we can define FastPRF as
$\mathrm{FastPRF}_k(x) = E^2_k(E^1_k(x)) \oplus E^1_k(x)$. (7)
Closer inspection reveals that the high-level structure behind (7) matches that of SURF [Ber97], yet FastPRF is more general. We can observe that (7) resembles the structure of EDMD with $E^1_k$ and $E^2_k$ in the roles of the two permutations. As a matter of fact, the generalized FastPRF method in Section 2.2 is based on the generalized GEDMD construction which we introduce in Section 2.1 and which we prove to attain at least the same level of security as EDMD.
Unfortunately, the security guarantees of EDMD and GEDMD do not transfer to FastPRF. For one, the same key is used for both permutations. Additionally, the underlying permutations are neither ideal nor independently drawn. Thus, we cannot claim "provable" security of FastPRF designs. One could argue the security of FastPRF using the "prove-then-prune" approach, introduced by Hoang et al. [HKR15] to argue the security of AEZ, but invoking "prove-then-prune" still requires both a solid heuristic argument for the instantiation, as well as cryptanalytic results. We discuss the rationale of FastPRF in Section 2.3.
We see FastPRF as a potentially fruitful concrete object of study and analysis, but also as an opportunity for blockcipher designers (particularly in the lightweight space) to define a PRF along with their designs, in order to widen their applicability. As an initial concrete target, we propose AES-PRF: an instantiation of FastPRF based on the AES standard. We introduce and analyze this instantiation in Section 3.
We demonstrate the applicability of our scheme in Section 4, by instantiating GCM and GCM-SIV with it. In more detail, whereas the original GCM [MV04] achieves birthday bound security only, GCM instantiated with a dedicated PRF is optimally secure. Likewise, using a dedicated PRF inside GCM-SIV [GL15, GLL17, LLG17] yields significant security and efficiency improvements, both in the subkey derivation and in the internal evaluation of GCM. In Section 5, we briefly elaborate on the neat and almost immediate extension of our technique to tweakable blockciphers such as SKINNY [BJK+16]. Extending FastPRF to tweakable blockciphers effectively results in a compressing fixed-input-length PRF. The resulting construction can contribute to a removal of the length parameter in security bounds, as we exemplify for PMAC1. The extension to tweakable blockciphers finds further applications in MAC functions based on compressing fixed-input-length PRFs such as Yasuda's [Yas08] construction and NI+ [DNP16].

Optimal PRFs from Blockciphers
In this section we describe a general method of transforming an iterative blockcipher into a PRF. We begin with a generalization of Mennink and Neves's EDMD construction [MN17], and demonstrate that it achieves at least the same level of security (Section 2.1). Then, in Section 2.2 we show how to use this construction to design native PRFs. We elaborate on its rationale in Section 2.3.

Generalized EDMD
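For $d \ge 2$ and keys $k_1, \ldots, k_d$, the generalized construction feeds forward every intermediate state: writing $y_0 = x$ and $y_i = E_{k_i}(y_{i-1})$ for $i = 1, \ldots, d$, we define $\mathrm{GEDMD}_d(x) = y_1 \oplus \cdots \oplus y_d$. We will demonstrate that $\mathrm{GEDMD}_d$ has at least the same level of security as $\mathrm{EDMD} = \mathrm{GEDMD}_2$.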

Security Model
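Denote by $\mathrm{perm}(n)$ the set of all permutations on $\{0,1\}^n$, and by $\mathrm{func}(m, n)$ the set of all functions from $\{0,1\}^m$ to $\{0,1\}^n$. For a blockcipher $E : \{0,1\}^\kappa \times \{0,1\}^n \to \{0,1\}^n$, we denote its PRP security against distinguisher $D$ by
$\mathrm{Adv}^{\mathrm{prp}}_E(D) = \left| \Pr[D^{E_k} = 1] - \Pr[D^{\pi} = 1] \right|$,
where the probabilities are taken over uniform random drawings $k \xleftarrow{\$} \{0,1\}^\kappa$ and $\pi \xleftarrow{\$} \mathrm{perm}(n)$. Likewise, for a function $F : \{0,1\}^\kappa \times \{0,1\}^m \to \{0,1\}^n$, we denote its PRF security against distinguisher $D$ by
$\mathrm{Adv}^{\mathrm{prf}}_F(D) = \left| \Pr[D^{F_k} = 1] - \Pr[D^{f} = 1] \right|$,
where the probabilities are taken over uniform random drawings $k \xleftarrow{\$} \{0,1\}^\kappa$ and $f \xleftarrow{\$} \mathrm{func}(m, n)$.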

Security of GEDMD
Mennink and Neves [MN17] proved that EDMD is secure up to $2^n/67$ evaluations.
One can easily observe that, for any $d \ge 3$, $\mathrm{GEDMD}_d$ is at least as secure as $\mathrm{GEDMD}_{d-1}$. As such, the bound of Lemma 1 is inherited by $\mathrm{GEDMD}_d$ for any For any distinguisher $D$ with query complexity at most $q \le 2^n/67$, we have for some distinguishers $D'$ and $D''$ with the same query and time complexity as $D$.
Proof. Consider any distinguisher $D$ whose goal is to distinguish By a hybrid argument, we have However, as $E_{k_1}$ is a permutation, the distributions of $F_{k_1,g}$ and $f$ are identical. Inductive application yields, for any $d \ge 2$, where $\mathrm{GEDMD}_2 = \mathrm{EDMD}$. The proof is completed using Lemma 1.

FastPRF Design
To prevent some classes of attacks (e.g., slide attacks [BW99]), most blockciphers consist of distinct round functions. This might be accomplished by a complex key schedule, by adding round constants, or in some other way. In effect, the result is that most blockciphers resemble the following structure, in one way or another: $E_k$ is the composition of $r$ distinct keyed round functions. While this immediately suggests the use of GEDMD with each state value fed forward, this is not a good idea: round functions are often too weak and too simple for each of them to realistically resemble a random permutation.
Because round functions are individually weak, blockciphers tend to have a lot of them. Instead of looking at the cipher in terms of round functions, we can see it in terms of groups of round functions, as follows:
$E_k = E^d_k \circ \cdots \circ E^1_k$,
where each function $E^i_k$ comprises a number of rounds, i.e., $d < r$. We can now define a PRF out of this representation by applying GEDMD:
$\mathrm{FastPRF}_k(x) = y_1 \oplus \cdots \oplus y_d$, where $y_0 = x$ and $y_i = E^i_k(y_{i-1})$. (13)
Concrete blockciphers are not ideal permutations, and even though a good blockcipher can be considered a pseudorandom permutation, we cannot directly apply the results of Section 2.1 to FastPRF: one requires the individual groups of round functions $E^i_k$ to be mutually independent and sufficiently random. In general, however, this is the nature of concrete ciphers. There is a long history of concrete designs, such as Feistel networks or key-alternating ciphers, taking provably secure structures and instantiating them with weaker (but efficiently computable) round functions.
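The grouping idea can be sketched over a toy iterated cipher. Everything below is an illustrative assumption rather than a proposed design: the 16-bit round function, the seeded "key schedule", and the names `toy_round`, `encrypt_with_states`, and `fast_prf` are hypothetical. The set `taps` lists the round indices after which the state is fed forward, i.e., the boundaries between groups of rounds.

```python
import random

MASK = 0xFFFF  # toy 16-bit block

def toy_round_keys(key, rounds):
    """Hypothetical key schedule: round keys drawn from an RNG seeded by key."""
    rng = random.Random(key)
    return [rng.randrange(1 << 16) for _ in range(rounds)]

def toy_round(x, rk):
    """One invertible toy round: key xor, odd multiplication, 16-bit rotation."""
    x ^= rk
    x = (x * 0x9E37) & MASK                 # odd multiplier is a bijection mod 2^16
    return ((x << 5) | (x >> 11)) & MASK    # rotate left by 5

def encrypt_with_states(x, round_keys, taps):
    """Run all rounds once; also return the states after the rounds in taps."""
    saved = []
    for i, rk in enumerate(round_keys, start=1):
        x = toy_round(x, rk)
        if i in taps:
            saved.append(x)
    return x, saved

def fast_prf(x, round_keys, taps):
    """Single cipher evaluation with the tapped states xored into the output."""
    y, saved = encrypt_with_states(x, round_keys, taps)
    for state in saved:
        y ^= state
    return y

if __name__ == "__main__":
    rks = toy_round_keys(42, 8)
    print(hex(fast_prf(0x1234, rks, {4})))       # two groups of four rounds
    print(hex(fast_prf(0x1234, rks, {3, 6})))    # three groups of rounds
```

With `taps={4}` on an 8-round cipher this reduces to the middle-state feed-forward: the full encryption xored with the state after the first half. The cost is one cipher evaluation plus one xor per tapped state.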

Rationale
The goal behind FastPRF of (13) is to achieve an optimal or quasi-optimal PRF at the same cost as a regular blockcipher. We are convinced of the plausibility of this quest, given that there is no complexity-theoretic reason that a PRF should be significantly slower to compute than a PRP. To achieve our goal, we looked at the currently known PRP-to-PRF conversion methods (cf. Section 1.1), and investigated which of them suits our purposes best.
Truncation necessarily entails a significant slowdown compared to the original blockcipher. To salvage this slowdown, one could, hypothetically, split the blockcipher and output the concatenation of the truncation of both: where trunc truncates by $n/2$ bits. This construction has two strong drawbacks:
1. An attacker has direct access to the output of half the cipher, making the function significantly riskier to use than the original blockcipher;
2. Even assuming that both halves are ideal, one would still be far from obtaining optimal security: distinguishing this construction from random can be done in approximately $2^{3n/4}$ queries [GG16].
The xor of permutations can be used likewise with the same cost as a single $E_k$ evaluation as However, similar to the case of truncation, the attacker has more or less direct access to half the cipher; in particular, many useful properties may survive the xor of two weaker primitives. This, once again, makes this primitive riskier to use than the original cipher. This risk is technically eliminated using Cogliati and Seurin's EDM construction: Indeed, unlike truncation and xor of permutations, EDM does not expose the results of half the cipher directly to an adversary. However, there is still some risk involved with this construction: the attacker has control over the intermediate state.

Concrete Instantiation: AES-PRF
We present a concrete instantiation of FastPRF based on the AES [DR02]. We adopt the most straightforward choice:
• For 128-bit keys and 10 rounds, we define AES-PRF-128 to be AES xored with the internal state after 5 rounds (cf. Figure 1);
• For 192-bit keys and 12 rounds, we define AES-PRF-192 to be AES xored with the internal state after 6 rounds;
• For 256-bit keys and 14 rounds, we define AES-PRF-256 to be AES xored with the internal state after 7 rounds.
Alternative choices naturally exist. One could, for example, split the 192-bit key case into three 4-round permutations, resulting in an arguably stronger PRF. Note that, by design, each of the proposals consists of one full AES xored with an intermediate state.
Throughout, we assume that the full AES is a secure pseudorandom permutation. In addition, we denote the first $t \ge 0$ rounds of AES by $\mathrm{AES}_t$, where the instance of AES (128-, 192-, or 256-bit keys) is usually clear from the context. By convention, we define the special case $\mathrm{AES}_0(x)$ to be $x$ instead of $x \oplus k$.
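A minimal pure-Python sketch of AES-PRF-128 follows. It is meant only to show where the feed-forward is taken, not to be a usable implementation: it is slow and not constant-time. It assumes that "the internal state after 5 rounds" means the state after the round-5 key addition; the function names (`aes128_states`, `aes_prf_128`) and the `tap` parameter are hypothetical.

```python
def xtime(a):
    """Multiply by x in GF(2^8) modulo the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def _build_sbox():
    exp, log = [0] * 256, [0] * 256
    x = 1
    for i in range(255):                    # powers of the generator 0x03
        exp[i], log[x] = x, i
        x ^= xtime(x)
    sbox = []
    for a in range(256):
        b = exp[(255 - log[a]) % 255] if a else 0     # field inverse, 0 -> 0
        s = b
        for shift in range(1, 5):                     # affine transformation
            s ^= ((b << shift) | (b >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = _build_sbox()

def expand_key(key):
    """AES-128 key schedule: 11 round keys of 16 bytes each."""
    rks, rcon = [list(key)], 1
    for _ in range(10):
        prev = rks[-1]
        t = [SBOX[prev[13]] ^ rcon, SBOX[prev[14]], SBOX[prev[15]], SBOX[prev[12]]]
        nxt = []
        for i in range(16):
            nxt.append(prev[i] ^ (t[i] if i < 4 else nxt[i - 4]))
        rks.append(nxt)
        rcon = xtime(rcon)
    return rks

def aes_round(state, rk, last=False):
    s = [SBOX[b] for b in state]                                         # SubBytes
    s = [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]   # ShiftRows
    if not last:                                                         # MixColumns
        out = []
        for c in range(4):
            col = s[4 * c: 4 * c + 4]
            t = col[0] ^ col[1] ^ col[2] ^ col[3]
            out += [col[i] ^ t ^ xtime(col[i] ^ col[(i + 1) % 4]) for i in range(4)]
        s = out
    return [a ^ b for a, b in zip(s, rk)]                                # AddRoundKey

def aes128_states(key, block):
    """states[0] is the whitened input; states[i] is the state after round i."""
    rks = expand_key(key)
    state = [a ^ b for a, b in zip(block, rks[0])]
    states = [state]
    for i in range(1, 11):
        state = aes_round(state, rks[i], last=(i == 10))
        states.append(state)
    return states

def aes_prf_128(key, block, tap=5):
    """Full AES-128 output xored with the state after `tap` rounds (1 <= tap <= 9)."""
    states = aes128_states(key, block)
    return bytes(a ^ b for a, b in zip(states[10], states[tap]))

if __name__ == "__main__":
    key = bytes(range(16))
    pt = bytes.fromhex("00112233445566778899aabbccddeeff")
    print(aes_prf_128(key, pt).hex())
```

The only difference from a plain AES-128 encryption is that the state after round 5 is kept and xored into the ciphertext, so a hardware-accelerated implementation would need one extra register and one extra xor per block.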

Efficiency
It is easy to verify that AES-PRF is essentially as fast as the AES. The additional overhead is composed of
• An additional 128 bits to store the intermediate state(s) to xor the output with.
As far as performance goes, the xor is negligible compared to the cost of the full cipher.
The extra state might be more of a problem for implementers in heavily constrained environments, but we argue that in most cases, this is not a problem. For example, in counter mode, we can xor the intermediate state directly with the message block to encrypt, thus eliminating the need for another separate state.
We have implemented AES-PRF-128 in counter mode for some Intel and AMD x86_64 processors, using the AES-NI instruction set, and its performance is nearly indistinguishable from using the AES directly, as can be verified in Table 1. The only noteworthy case is Sandy Bridge, which is particularly sensitive to instruction scheduling. However, for Sandy Bridge, the overhead of incrementing the counter (0.72 cycles per byte against the optimal 0.63) is far more noticeable than the overhead incurred by using AES-PRF over AES. Of course, the same implementation precautions must be taken with AES-PRF as with AES. In particular, implementations that make use of table lookups are susceptible to cache-timing attacks [TOS10]. In most consumer hardware this is no longer an insurmountable problem, with AES-NI being present in the majority of Intel and AMD processors, and with ARMv8-A, SPARC T4, POWER8, and others also having dedicated constant-time AES instructions available.

Security Analysis
To assess the concrete security of the AES-PRF construction, it is necessary to dive into the details of the AES. In particular, the 5-round AES is the weakest link in the construction.
There are well-established bounds for the maximum expected differential and linear probabilities of 4-round AES [KMT01, PSLL03, KS07]. Therefore, we do not expect differential or linear attacks to have any meaningful success against AES-PRF.
While there are several attacks that do efficiently break AES reduced to 5 rounds, e.g., [DKR97, BK00, Bir04, Tun12], these attacks appear to be inapplicable here, as the 5-round output is masked by a full AES application. Moreover, most such attacks rely on 3- or 4-round distinguishers, followed by key recovery; direct distinguishers for 5 rounds have only recently been discovered, and have massive data and time requirements [SLG+16, GRR16]. The most promising attack known to date, by Grassi et al. [GRR17], belongs to the subspace trail family of attacks [GRR16], and expects the number of output differences of a certain kind of input difference to be a multiple of 8. In AES-PRF, it is not possible to directly access this information, so it seems unlikely that this kind of distinguisher is feasible in our setting.
More generally, attacks that rely on observing relations between tuples of outputs seem unlikely to be successful: the attacker is only able to observe these relations masked by a strong PRP. Suppose, for the sake of argument, that a high-probability differential $\Delta \xrightarrow{\mathrm{AES}_5} \Delta'$ existed. What the attacker would be able to see, in effect, would be
$\mathrm{AES}_5(x_i) \oplus \mathrm{AES}_5(x_i \oplus \Delta) \oplus \mathrm{AES}_{10}(x_i) \oplus \mathrm{AES}_{10}(x_i \oplus \Delta)$.
Even if $\mathrm{AES}_5(x_i) \oplus \mathrm{AES}_5(x_i \oplus \Delta)$ were effectively constant, $\mathrm{AES}_{10}(x_i) \oplus \mathrm{AES}_{10}(x_i \oplus \Delta)$ is not, and is in itself essentially indistinguishable from the distribution of differentials of a random function [DR07] (assuming, again, security of the full AES). This masking becomes even more pronounced for higher-order attacks, which are the most dangerous attack class for reduced-round AES.
AES-PRF is also, of course, vulnerable to any attack that does not rely on particular properties of $\mathrm{AES}_5$, but only on the high-level structure of EDMD. The very costly distinguishers enabled by this structure are, for the sake of completeness, described in Appendix A. We conjecture that AES-PRF cannot be distinguished from random significantly faster than by either brute-forcing the key or by the generic attacks of Appendix A.

AES-PRF_0
$\text{AES-PRF}_0$ is exactly the Davies-Meyer construction. It is known to be no less distinguishable than the underlying blockcipher. In more detail, distinguishing $\text{AES-PRF}_0$ from a random function $f$ is equally hard as distinguishing $\text{AES-PRF}_0 \oplus \mathrm{id}$ from random. This function, however, does not expose collisions, and can be distinguished from random with $2^{n/2}$ queries [CS16].

AES-PRF_1
$\text{AES-PRF}_1$ is the first nontrivial case, and the above attack no longer works. However, $\text{AES-PRF}_1$ is still not more secure than $\text{AES-PRF}_0$: one can perform a key-recovery attack with $2^{67}$ queries.
We may rewrite $\text{AES-PRF}_1(x)$ as
$\text{AES-PRF}_1(x) = \mathrm{AES}_{10}(x) \oplus P(x \oplus k) \oplus k_1$,
where $P$ is the non-keyed portion of the AES round, i.e., $\mathrm{MixColumns} \circ \mathrm{ShiftRows} \circ \mathrm{SubBytes}$.
We rely here on the following property of the AES round: In other words, 4 well-chosen bytes will only affect some other 4 bytes of the output. In particular, input bytes $s_0, s_5, s_{10}, s_{15}$ only affect output bytes $s_0, s_1, s_2, s_3$; input bytes $s_4, s_9, s_{14}, s_3$ only affect output bytes $s_4, s_5, s_6, s_7$; input bytes $s_8, s_{13}, s_2, s_7$ only affect output bytes $s_8, s_9, s_{10}, s_{11}$; and input bytes $s_{12}, s_1, s_6, s_{11}$ only affect output bytes $s_{12}, s_{13}, s_{14}, s_{15}$. Thus, if we correctly guess 4 bytes of the key on one of those input positions, this will cancel out the contribution of $P(x \oplus k)$ on the corresponding 4 output bytes, in which case we obtain $\mathrm{AES}_{10}(x) \oplus k_1$ verbatim, which can be distinguished from random by its lack of collisions.
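The byte dependencies can be derived mechanically from ShiftRows. The sketch below assumes the usual column-major numbering of the AES state (byte index 4·column + row); the helper name is hypothetical. SubBytes acts bytewise and MixColumns mixes only within a column, so the four bytes that ShiftRows moves into output column c are exactly the input bytes that influence that column.

```python
def input_bytes_for_output_column(c):
    """Input byte indices that influence output column c of one AES round.

    ShiftRows rotates row r left by r positions, so output column c
    receives, in row r, the input byte from column (c + r) mod 4.
    """
    return [4 * ((c + r) % 4) + r for r in range(4)]

if __name__ == "__main__":
    for c in range(4):
        outputs = [4 * c + r for r in range(4)]
        print(input_bytes_for_output_column(c), "->", outputs)
```

Each of the four (A → B) pairs used by the attack corresponds to one line of this output: a 32-bit key guess on the positions in A is enough to strip the round from the four output bytes in B.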

For each (A → B) pair:
(a) For all values $a$ of A:
   i. Compute $S = \{y_i \oplus P(x_i \oplus a)\}$ for all $q$ queries;
   ii. If there are no collisions in $S$, we either succeeded in finding the correct key or did not collect enough queries;
   iii. If there are collisions in $S$, but $S_i^B \neq S_j^B$ for all $i \neq j$, the current value $a$ is likely the correct choice for the value of $k_A$. Set $k_A = a$, and move on to the next (A → B) pair.
In total, we require approximately $2^{67}$ queries, $2^{101}$ computations, and $2^{67}$ memory. The number of queries is justified by the following criteria:
• The probability that there is no collision after $q$ queries is approximately $e^{-\binom{q}{2}/2^{128}}$. With $q = 2^{67}$, this probability is suitably small: approximately $2^{-46}$;
• The probability that every collision misses the currently active B is $2^{-32}$ per collision.
The key observation here is that, under the wrong-key randomization hypothesis, only the correct key guess $k$ results in the set $\{\text{AES-PRF}_1(x_i) \oplus P(x_i \oplus k)\}$ having no collisions with high probability. We take advantage of the fact that 1 round of AES does not have full diffusion, which allows us to guess 32 bits of the key at a time. The time complexity of this attack may be improved by noticing that once bytes of $k$ are available, so are the corresponding bytes of $k_1 = \mathrm{AES}_{10}(x) \oplus P(x \oplus k)$. Exploiting the key schedule may accelerate the filtering of incorrect key guesses.
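The first criterion is a short calculation that can be checked numerically. The sketch below evaluates the base-2 logarithm of e^{-C(q,2)/2^n} with exact integer arithmetic for the binomial; the function name is hypothetical.

```python
import math

def log2_no_collision_prob(log2_q, n_bits):
    """log2 of exp(-C(q, 2) / 2^n), the approximate no-collision probability."""
    q = 2 ** log2_q
    pairs = q * (q - 1) // 2              # number of query pairs, C(q, 2)
    exponent = pairs / 2 ** n_bits        # expected number of collisions
    return -exponent * math.log2(math.e)

if __name__ == "__main__":
    print(log2_no_collision_prob(67, 128))
```

For q = 2^67 and n = 128 there are about 2^133 pairs and hence about 32 expected collisions, so the probability of seeing none is roughly e^{-32}, in line with the 2^{-46} figure quoted above.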

AES-PRF_2
The same attack strategy used for $\text{AES-PRF}_1$ no longer works for two rounds. Any single byte affects every output byte, so detecting where collisions happen is no longer reliable as a means to verify correct keys. However, we do believe that $\text{AES-PRF}_2$ is still within reach of an efficient attack, and leave it as an open problem.

AES-PRF_9
We can also look at the other end of unbalanced variants of AES-PRF. The first thing to notice is that the last round does not include MixColumns. This means that $\text{AES-PRF}_9$ can be written, for the first row of the state, as By observing the distribution of $x \oplus S(x)$ (see Table 2), we see that only 3 outputs have maximal probability 1/64: Therefore, by observing the frequency of bytes 0, 4, 8, 12 of $\text{AES-PRF}_9$ for sufficiently many outputs ($q \gg 2^8$), we are able to derive each of $k_0, k_4, k_8, k_{12}$ as one of 3 possibilities with high confidence. For the other rows the same principle does not apply, since their bytes are of the form $x_i \oplus S(x_j)$, whose output is balanced. Nevertheless, recovering the first 32 bits of the key cheaply is still an attack, and the distribution of $x \oplus S(x)$ acts as a very efficient distinguisher here.
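The distribution of x ⊕ S(x) is easy to tabulate. The sketch below rebuilds the AES S-box from its algebraic definition (inversion in GF(2^8) followed by the affine map) and counts how often each value of x ⊕ S(x) occurs; the helper names are hypothetical, and the script prints the most frequent values rather than assuming them.

```python
from collections import Counter

def xtime(a):
    """Multiply by x in GF(2^8) modulo the AES polynomial 0x11B."""
    a <<= 1
    return (a ^ 0x11B) if a & 0x100 else a

def build_sbox():
    exp, log = [0] * 256, [0] * 256
    x = 1
    for i in range(255):                    # powers of the generator 0x03
        exp[i], log[x] = x, i
        x ^= xtime(x)
    sbox = []
    for a in range(256):
        b = exp[(255 - log[a]) % 255] if a else 0     # field inverse, 0 -> 0
        s = b
        for shift in range(1, 5):                     # affine transformation
            s ^= ((b << shift) | (b >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def xor_sbox_histogram():
    """Count, for each byte value v, the number of x with x xor S(x) = v."""
    return Counter(x ^ SBOX[x] for x in range(256))

if __name__ == "__main__":
    hist = xor_sbox_histogram()
    top = max(hist.values())
    print("maximal count:", top, "out of 256")
    print("values attaining it:", sorted(hex(v) for v, c in hist.items() if c == top))
```

Because the map x ↦ x ⊕ S(x) is not a permutation, some byte values are over-represented and others never occur; a fixed key byte only shifts this histogram, which is what allows the first-row key bytes to be narrowed down from output frequencies.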

Application to GCM and GCM-SIV
We discuss the security of GCM by McGrew and Viega [MV04] and GCM-SIV by Gueron and Lindell [GL15], in case they are instantiated using FastPRF. The reasoning below directly applies to counter mode encryption, as GCM uses this mode internally.

Security Model
Formally, an authenticated encryption scheme AE consists of two algorithms Enc, Dec. The encryption algorithm Enc gets as input a key $k$, a nonce $n$, associated data $ad$, a message $m$, and outputs a ciphertext $c$ and tag $t$. The decryption algorithm Dec gets as input a key $k$, a nonce $n$, associated data $ad$, a ciphertext $c$, and a tag $t$, and outputs either a message $m$ or a dedicated ⊥-sign, where $\mathrm{Dec}(k, n, ad, \mathrm{Enc}(k, n, ad, m)) = m$ is required to hold for any $(k, n, ad, m)$.
Figure 2: Overview of the GCM mode with a 96-bit nonce, a single block of associated data, and two blocks of plaintext (resp. ciphertext). $\otimes H$ is multiplication by $H$ in $\mathbb{F}_{2^{128}}$, whereas the increment by 1 is addition modulo $2^{32}$ of the least significant bytes of the state.
Security of AE = (Enc, Dec) is usually measured via its confidentiality and authenticity. We denote the confidentiality of AE against a distinguisher $D$ by where $\$(n, a, m)$ always returns a random $(c, t) \xleftarrow{\$} \{0,1\}^{|m|+\tau}$ (where $\tau$ is the tag size), and where the probabilities are taken over uniform random drawings $k \xleftarrow{\$} \{0,1\}^\kappa$ and $\$$. The authenticity of AE against a distinguisher $D$ is denoted by where ⊥ always returns the ⊥-sign, and where the probabilities are taken over the uniform random drawing $k \xleftarrow{\$} \{0,1\}^\kappa$. For authenticity, the distinguisher is not allowed to relay a response from its first oracle to its second oracle. Unless explicitly stated otherwise, we will consider the case where $D$ is required to be nonce-respecting: it is not allowed to repeat a nonce in an encryption query (it may reuse a nonce in a decryption query).

AES-PRF-GCM
GCM is an authenticated encryption scheme by McGrew and Viega [MV04]. It internally uses counter mode on top of a blockcipher (see Figure 2).
McGrew and Viega [MV04] and later Iwata et al. [IOM12] proved the following result for GCM with a 96-bit nonce. (We express the result in terms of AES as the underlying primitive for convenience.)
Theorem 2 (GCM [MV04, IOM12]). Let $\mathrm{AES} : \{0,1\}^\kappa \times \{0,1\}^n \to \{0,1\}^n$ be the AES blockcipher, and $\tau$ be the tag length. For any distinguisher $D$ with encryption query complexity at most $q$, decryption query complexity at most $q'$ ($= 0$ for confidentiality), per-query length at most $\ell$, and total complexity at most $\sigma$, we have for some distinguisher $D'$ with the same time complexity as $D$ and making at most $q + q' + \sigma + 1$ queries.
Proof (sketch). We only discuss the high-level structure. The proof consists of three steps: (i) replacing AES by a random permutation $\pi \xleftarrow{\$} \mathrm{perm}(n)$, (ii) subsequently replacing $\pi$ by a random function $f \xleftarrow{\$} \mathrm{func}(n, n)$, and (iii) analyzing (with slight abuse of notation) $\mathrm{GCM}[f, \tau]$. In more detail, the following bound for confidentiality/authenticity follows by a hybrid argument: for some distinguishers $D'$, $D''$ that make at most $q + q' + \sigma + 1$ queries. The core part in the analysis of GCM centers around the analysis of GCM based on a random function $f$, and McGrew and Viega [MV04] and later Iwata et al. [IOM12] proved that which completes the proof.
From high-level inspection of the security analysis of GCM, it becomes clear that FastPRF can be used to improve GCM's security significantly. In more detail, if we use AES-PRF instead of AES, steps (i) and (ii) in the proof merge and become "replacing AES-PRF by a random function $f \xleftarrow{\$} \mathrm{func}(n, n)$." In other words, we simply get instead of (18), and we obtain the following corollary.
Corollary 1. Let $\text{AES-PRF} : \{0,1\}^\kappa \times \{0,1\}^n \to \{0,1\}^n$ be the AES-PRF construction of Section 3, and $\tau$ be the tag length. For any distinguisher $D$ with encryption query complexity at most $q$, decryption query complexity at most $q'$ ($= 0$ for confidentiality), per-query length at most $\ell$, and total complexity at most $\sigma$, we have for some distinguisher $D'$ with the same time complexity as $D$ and making at most $q + q' + \sigma + 1$ queries.

AES-PRF-GCM-SIV
GCM is notoriously sensitive to nonce repeats, which lead to forgeries and even key recovery [Jou06, BZD+16]. GCM-SIV [GL15, GLL17, LLG17] is an authenticated encryption mode based on GCM that aims to be more robust to such usage failures. In particular, GCM-SIV aims for a slightly different security notion than GCM, namely misuse-resistant authenticated encryption, or mrAE. This notion comprises (14) and (15), with the exception that the requirement that nonces are unique is lifted.
Figure 3: A high-level overview of GCM-SIV with a 128-bit key, one block of associated data, and two blocks of plaintext (resp. ciphertext). Notation matches that of Figure 2. $\mathrm{fix}_0$ sets the most significant bit of a block to 0; $\mathrm{fix}_1$ sets it to 1. indicates truncation of the first 64 bits; denotes concatenation.
There are several variants of GCM-SIV [GL15, IM16, GLL17]. In this work, we consider the most recent one, [GLL17], which is also being considered as an IETF RFC [LLG17]. It is based on the SIV [RS06] mode, and reuses the individual components of GCM (see Figure 3). The basic GCM-SIV construction uses two keys, one of size $n$ bits, one of size at most $2n$ bits, and GCM-SIV of [GLL17] uses the DeriveKey mechanism to derive two subkeys from a single one in the following way: where trunc truncates by $n/2$ bits (recall the truncation construction of Section 1.1).
An earlier security analysis of GCM-SIV was performed by Gueron et al. [GL15, GLL17]. Iwata and Seurin [IS17] pointed out several shortcomings in the analysis, and performed an improved analysis.
Theorem 3 (GCM-SIV [IS17]).Let AES : {0, 1} κ × {0, 1} n → {0, 1} n be the AES blockcipher.For any distinguisher D that can make encryption queries for at most q u distinct nonces and at most r repeats per nonce, and that can make q D decryption queries with total complexity at most σ D , all with per-query associated data and message length at most a and m , we have for some distinguishers D making at most r( m + 1) + 1 + σ D queries and D making at most 6(q u + q D ) queries (both with the same time complexity as D).
Part (22) of the bound comes from how well the DeriveKey functionality behaves like random, part (23) reflects the security of AES used in GCM-SIV based on two uniformly randomly generated subkeys, and (24) reflects the security of GCM-SIV based on uniformly randomly generated primitives.We remark that the bound of Iwata and Seurin is slightly stronger, having the multi-user PRF security Adv mu-prf AES (D ) instead of (23), where now distinguisher D makes at most r( m + 1) queries for at most q u distinct users and an additional amount of q D + σ D queries freely distributed over all users.We have adopted the slightly simplified bound.
Suppose that, instead of (21), the key is derived using AES-PRF: then this subkey derivation function inherits the PRF security of AES-PRF against a distinguisher $D''$ making at most $3(q_u + q_D)$ queries. In other words, part (22) of Theorem 3 becomes $\mathrm{Adv}^{\mathrm{prf}}_{\text{AES-PRF}}(D'')$, for some distinguisher $D''$ making at most $3(q_u + q_D)$ queries. In addition, (23) has a term that measures the PRF security of AES, which is at best $(r(\ell_m + 1) + 1 + \sigma_D)^2/2^n$ due to the PRP-PRF switch. By directly using AES-PRF instead of AES, this implicit birthday term is eliminated. We thus obtain the following corollary.
Corollary 2. Let $\text{AES-PRF}\colon \{0,1\}^\kappa \times \{0,1\}^n \to \{0,1\}^n$ be the AES-PRF construction of Section 3. For any distinguisher $D$ that can make encryption queries for at most $q_u$ distinct nonces and at most $r$ repeats per nonce, and that can make $q_D$ decryption queries with total complexity at most $\sigma_D$, all with per-query associated data and message length at most $\ell_a$ and $\ell_m$, we have for some distinguishers $D'$ making at most $r(\ell_m + 1) + 1 + \sigma_D$ queries and $D''$ making at most $3(q_u + q_D)$ queries (both with the same time complexity as $D$).
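To get a feeling for the size of the birthday term that the switch to AES-PRF removes, the following Python sketch evaluates its base-2 logarithm, reading the term as $(r(\ell_m + 1) + 1 + \sigma_D)^2/2^n$. The function name and the parameter values are illustrative assumptions, not figures taken from the analysis.

```python
from math import log2


def prp_prf_switch_log2(r, ell_m, sigma_d, n=128):
    """Return log2 of (r * (ell_m + 1) + 1 + sigma_d)^2 / 2^n.

    r       : maximum number of repeats per nonce
    ell_m   : maximum message length per query, in blocks
    sigma_d : total complexity of the decryption queries, in blocks
    n       : block size in bits (128 for AES)
    """
    queries = r * (ell_m + 1) + 1 + sigma_d
    return 2 * log2(queries) - n


# Hypothetical workload: one long message per nonce plus a comparable
# amount of decryption traffic, chosen so that the query count is 2^33.
example = prp_prf_switch_log2(r=1, ell_m=2**32 - 1, sigma_d=2**32 - 1)
print(f"log2 of the birthday term: {example}")
```

Under these assumed parameters the term is far from negligible relative to $2^{-128}$, which is the kind of loss the corollary avoids.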

Extension to Tweakable Blockciphers
Tweakable blockciphers are a relatively recent invention, formalized by Liskov et al. [LRW02,LRW11]. A tweakable blockcipher, as the name implies, is a blockcipher that takes one additional input beyond the message and key: a tweak. The security of a tweakable blockcipher is then defined as the indistinguishability of the construction from a collection of random permutations, one for each key and tweak. The FastPRF construction can be generalized to such designs as well, though more care is necessary to ensure that both the tweak and the key contribute to each of the individual permutations. In more detail, if $E$ is a tweakable blockcipher which can be partitioned into groups of round functions as follows: we can define a compressing PRF out of this representation by applying GEDMD: Once again, this construction only works if the groups of rounds are sufficiently strong individually; in addition, we require that each of the rounds is sufficiently dependent on the tweak $t$.
Unlike the case of blockciphers, there do not exist many native designs of tweakable blockciphers to draw from. Indeed, most tweakable blockciphers in current use are in fact generic blockcipher-based constructions with far from optimal security (e.g., Rogaway's XEX construction [Rog04] is used in OCB2, a large number of CAESAR submissions, and XTS disk encryption). Some particular examples of dedicated tweakable blockcipher designs to which FastPRF could be applied are Threefish [FLS+10], SCREAM [GLS+15], and the more general TWEAKEY framework [JNP14], which is for instance adopted by the designers of SKINNY [BJK+16]. One may for example take SKINNY-128-256 as a tweakable blockcipher with a 128-bit state and a 256-bit tweakey: it consists of 48 rounds, and SKINNY-PRF-128 can be defined as SKINNY-128-256 xored with the internal state after 24 rounds. Like the AES, SKINNY has solid design principles; 6 rounds are already sufficient for full diffusion, and 24 rounds are already sufficient to withstand several classes of attacks. However, there has not yet been enough cryptanalytic research on SKINNY to confidently claim that the resulting PRF is heuristically secure.
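As a purely illustrative sketch of the middle-state feed-forward idea, the following Python code defines a toy 16-bit tweakable permutation and xors its output with the state reached after half of the rounds. The round function, the constants, the round count, and the names `toy_round`, `run_rounds`, `toy_tbc`, and `toy_tweakable_prf` are assumptions made for this example only; they are unrelated to SKINNY and offer no security.

```python
MASK = 0xFFFF  # toy 16-bit state, far too small for real use


def rotl16(x, r):
    """Rotate a 16-bit word left by r bits."""
    return ((x << r) | (x >> (16 - r))) & MASK


def toy_round(state, key, tweak, i):
    """One invertible toy round; key and tweak both enter every round."""
    state ^= (key ^ tweak ^ i) & MASK  # xor with a constant is a bijection
    state = (state * 0x2545) & MASK    # odd multiplier, bijective mod 2^16
    return rotl16(state, 7)            # rotation is a bijection


def run_rounds(key, tweak, state, first, last):
    """Apply rounds first, first + 1, ..., last - 1 to the state."""
    for i in range(first, last):
        state = toy_round(state, key, tweak, i)
    return state


def toy_tbc(key, tweak, x, rounds=8):
    """Toy tweakable blockcipher: all rounds, no feed-forward."""
    return run_rounds(key, tweak, x, 0, rounds)


def toy_tweakable_prf(key, tweak, x, rounds=8):
    """Full cipher output xored with the state after rounds // 2 rounds."""
    mid = run_rounds(key, tweak, x, 0, rounds // 2)
    out = run_rounds(key, tweak, mid, rounds // 2, rounds)
    return out ^ mid


print(hex(toy_tweakable_prf(0x1234, 0x00AB, 0x0F0F)))
```

The feed-forward costs one extra xor on top of a single cipher evaluation, and the resulting function no longer has to be invertible.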
Assuming the existence of FastPRF, one could use this construction instead of existing tweakable blockciphers in settings where the tweakable blockcipher is not evaluated in the inverse direction. For example, consider PMAC1 from Rogaway [Rog04], the tweakable-blockcipher-based variant of PMAC. Rogaway proved the following result on the PRF security of PMAC1 with tag length $n$ (the definition of Section 2.1.1 generalizes to variable input sizes).
Closer inspection of the security analysis reveals that $\sigma^2/2^n$ comes from viewing $E$ as a random function (one could call this a TPRP-TPRF switch, although a tweakable PRF is just a compressing fixed-input-length PRF). Following a similar reasoning as in Section 4, one can observe that directly using FastPRF in PMAC1 yields the following corollary.
Corollary 3. Let $\mathrm{FastPRF}\colon \{0,1\}^\kappa \times \{0,1\}^\tau \times \{0,1\}^n \to \{0,1\}^n$ be the construction of (27). For any distinguisher $D$ with encryption query complexity at most $q$ and total complexity at most $\sigma$, we have for some distinguisher $D'$ with the same time complexity as $D$ and making at most $\sigma$ queries.
In other words, unlike for the original PMAC1, the security bound of PMAC1 based on FastPRF does not suffer a quadratic security loss in $\sigma$, provided FastPRF is in turn built on a dedicated tweakable blockcipher.

A Security Against Generic Attacks
We consider how AES-PRF behaves when its underlying permutations are idealized. Patarin [Pat13a] found several attacks against $\mathrm{XoP}^d_{k_1,\ldots,k_d}$, depending on $q$, the number of oracle queries. Here we adapt the attacks to the two-permutation GEDMD construction used in AES-PRF.

A.1 $q = 2^n$
Given access to the full codebook, the following property occurs with probability 1: In a random function, this event has probability $2^{-n}$. This yields an attack with advantage $1 - 2^{-n}$ and a running time of $2^n$ xor operations.
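One property of this kind, consistent with the stated probability $2^{-n}$ for a random function and the cost of $2^n$ xor operations, is that the xor of all $2^n$ outputs vanishes: both $E_{k_1}(x)$ and $E_{k_2}(E_{k_1}(x))$ range over every $n$-bit value exactly once, and the xor of all $n$-bit values is 0 for $n \geq 2$. The Python sketch below checks this for idealized EDMD instances on $n = 8$ bits; modelling the two permutations as random shuffles and the helper names are assumptions of the example.

```python
import random
from functools import reduce
from operator import xor


def edmd_table(n, rng):
    """Full codebook of x -> P2(P1(x)) xor P1(x) for two random permutations."""
    size = 1 << n
    p1 = list(range(size))
    p2 = list(range(size))
    rng.shuffle(p1)
    rng.shuffle(p2)
    return [p2[p1[x]] ^ p1[x] for x in range(size)]


def xor_sum(values):
    """Xor of all entries, i.e. the 2^n-xor statistic of the attack."""
    return reduce(xor, values, 0)


rng = random.Random(2017)  # fixed seed only for reproducibility

# 100 independent idealized EDMD instances on 8-bit blocks.
edmd_sums = [xor_sum(edmd_table(8, rng)) for _ in range(100)]

# 100 random functions on 8-bit blocks for comparison; each sum is uniform,
# so it equals 0 only with probability 2^-8.
random_sums = [xor_sum([rng.randrange(256) for _ in range(256)])
               for _ in range(100)]

print("EDMD sums all zero:", all(s == 0 for s in edmd_sums))
print("random-function sums equal to zero:", random_sums.count(0))
```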

A.2 $q < 2^n$
In this setting, we can distinguish EDMD from a random function by counting the number of collisions. Let $n_{\mathrm{coll}}(q)$ be this quantity. In a random function, the expected number of collisions is $n_{\mathrm{coll}} = q^2/2^n$; in EDMD it is $q^2/(2^n - 1)$. This distinguisher stems from the fact that, given a collision $E_{k_2}(E_{k_1}(x)) \oplus E_{k_1}(x) = E_{k_2}(E_{k_1}(y)) \oplus E_{k_1}(y)$ with $x \neq y$, we equivalently have $E_{k_2}(E_{k_1}(x)) \oplus E_{k_2}(E_{k_1}(y)) = E_{k_1}(x) \oplus E_{k_1}(y)$, in which neither side can be 0.
When $q < 2^{n/2}$, the distinguisher simply outputs 1 when a collision exists, and 0 otherwise. The advantage is given by $\binom{q}{2}/(2^{2n} - 2^n) \approx q^2/2^{2n}$. When $q > 2^{n/2}$, the distinguisher is slightly different: output 1 when $n_{\mathrm{coll}} \geq q^2/2^n$, and 0 otherwise. The advantage here is more complex to calculate, but Patarin shows it to be $O(q/2^{3n/2})$. This attack strategy is likely to be optimal, as it matches the recent asymptotic bound on the sum of permutations by Eberhard [Ebe17, Theorem 1.5].
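The pairwise collision probability behind the small-$q$ advantage can be checked exactly for a small block size. The Python sketch below enumerates, for two fixed distinct inner values $u = E_{k_1}(x)$ and $v = E_{k_1}(y)$, all possible image pairs under an ideal outer permutation, and then multiplies the resulting gap to the random-function probability $2^{-n}$ by the number of query pairs. The idealization of $E_{k_2}$ as a uniform permutation and the function names are assumptions of the example.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def edmd_pair_collision_probability(n):
    """Exact Pr[P(u) xor u == P(v) xor v] for fixed u != v, P a uniform permutation."""
    size = 1 << n
    u, v = 0, 1  # any pair of distinct inner values gives the same count
    hits = 0
    total = 0
    # Only (P(u), P(v)) matters: a uniform ordered pair of distinct values.
    for a, b in permutations(range(size), 2):
        total += 1
        hits += (a ^ u) == (b ^ v)
    return Fraction(hits, total)


def collision_gap(n, q):
    """Difference in expected collision counts, EDMD minus random function."""
    per_pair = edmd_pair_collision_probability(n) - Fraction(1, 1 << n)
    return comb(q, 2) * per_pair


n, q = 4, 3
print("EDMD pair collision probability:", edmd_pair_collision_probability(n))
print("gap in expected collisions:", collision_gap(n, q))
print("binom(q,2) / (2^(2n) - 2^n):", Fraction(comb(q, 2), 2**(2 * n) - 2**n))
```

For $n = 4$ the enumeration gives $1/(2^n - 1)$ per pair, so the gap equals $\binom{q}{2}/(2^{2n} - 2^n)$, in line with the expression for $q < 2^{n/2}$.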
None of these attacks is particularly threatening to the PRF security of AES-PRF, as no amount of extra computation, short of brute-forcing the key, will be of any help. In effect, the advantage remains negligible even when the attacker obtains nearly the entire codebook.

Table 1: Observed speed, in cycles per byte, of AES-128 in counter mode versus AES-PRF-128 in counter mode.

Table 2: Distribution of output values of $x \oplus S(x)$ by number of preimages.