Keywords

1 Introduction

The impossible differential (ID) attack [5, 16] is one of the most important attacks for block ciphers. This attack exploits a pair of input and output differences \((\varDelta _i, \varDelta _o)\) of a (round-reduced) cipher \(E_K\) that cannot be connected for any K. Namely, two plaintexts \(p, p'\) satisfying \(p \oplus p' = \varDelta _i\) never satisfy \(E_K(p) \oplus E_K(p') = \varDelta _o\). Such difference pair \((\varDelta _i, \varDelta _o)\) is then called an ID. The ID attack has been one of the most powerful cryptographic attacks nowadays. For example, it is the first attack that could break 7 rounds of AES-128 [17] and remains the best attack on reduced SKINNY-64 under the single-tweakey setting [13].

In the early days, an ID \((\varDelta _i, \varDelta _o)\) was detected by the miss-in-the-middle approach manually [6] and the details of the Sboxes are usually ignored. The first automated search attempt appeared in [5] with so-called shrink technique. It shrinks the word size to 3 bits and find impossible differentials of the global structure of the cipher by exhaustively testing all possible differences and values. This method is only applicable to those ciphers consisting of a small number of words with a big word size, so it doesn’t work for many modern-day ciphers such as SKINNY [3], CRAFT [4] or GIFT [1], etc. In [15], Kim et al. presented a new automated tool called \(\mathcal {U}\)-method. To detect if \((\varDelta _i, \varDelta _o)\) is impossible, one first propagates \(\varDelta _i\) forwards by \(r_1\) rounds and checks the status of the difference of each output word (known active, active, inactive, or unknown). Then he/she propagates \(\varDelta _o\) backward by \(r_2\) rounds and checks the status again. Finally, if any contradiction occurs, \((\varDelta _i, \varDelta _o)\) is an \((r_1 + r_2)\)-round ID. Several extensions of the \(\mathcal {U}\)-method have been done such as the UID-method by Luo et al. [18] and the method proposed by Wu and Wang [26]. Recently, a constraint-programming-aided version of the \(\mathcal {U}\)-method called \(\mathcal {U}^\star \)-method has been developed by Sun et al.  [22], which allows exhausting all possible plaintext and ciphertext difference patterns automatically. All these methods above focus on truncated IDs, i.e., the contradictions inside the Differential Distribution Tables (DDT) of corresponding Sboxes are not considered. Consequently, we have no way of knowing if we would have gotten more if the information of the DDTs is taken into consideration.

Several attempts focus on the upper bound on the rounds of IDs. At EUROCRYPT 2016, Sun et al. [21] used the primitive index of the characteristic matrix of the linear layer to give upper bounds on the length of IDs for some special Substitution-Permutation-Networks (SPN) block ciphers with the detail of the Sbox omitted. They proved that under some special conditions, the existence of IDs relies on the existence of low-weight IDs [21, Theorem 1]. In [25], by using linear algebra the authors gave a practical method that could give upper bounds on the length of IDs for any SPN block cipher when omitting the differential property of the Sboxes. Currently, all systematical methods for bounding the length of IDs are all without considering the Sbox details.

Another line of detecting IDs starts independently from [10, 20]. The MILP model for searching for differential characteristics is simply modified by adding additional specific constraints on the plaintext and ciphertext differences. If the model is infeasible, the corresponding plaintext and ciphertext difference pair is an ID. Compared to the previous methods, this method can detect all kinds of contradictions (with the assumption that the round keys are uniformly random, which is a default assumption of this paper). However, since the constraints on the plaintext/ciphertext differences are fixed, the number of models we need to solve is equivalent to the number of difference pairs we want to check. Exhaustively checking all plaintext and ciphertext difference pairs is clearly computationally infeasible. Actually, for a cipher with block size n, the search space is as large as \(2^{2n}\). Based on the intuition that the longest IDs are usually caused by difference pairs with a small number of active bits or words in both plaintext and ciphertext ends, users of this model prefer to test only a small proportion of the difference pairs with one or two active bits or words for plaintext and ciphertext differences. Nowadays, the model has been very popular in measuring the security strength of newly designed ciphers against ID attacks. For example, the designers of GIFT [1] took it to prove that there does not exist any ID with one-active nibble against 7 rounds of GIFT-64. The designers of CRAFT searched for IDs with plaintext and ciphertext differences having at most two active nibbles and they found twelve 13-round IDs [4].

Apart from these works, it is worth mentioning that in [24] Wang and Jin proved that there is no ID for 5-round AES even with the information of the DDT based on some careful mathematical analyses. But unfortunately, this method is specifically designed for AES only. Generally speaking, the MILP method is much more convenient than other methods, since it only needs some slight modifications from the MILP models for searching for differential characteristics. However, as we mentioned, all current MILP models can check a small number of the difference pairs. How to tackle the huge search space has been a long-standing problem.

Contributions. In this paper, we propose a systematic method based on the MILP model to find all IDs in the whole search space. As mentioned above, to exhaust all input and output difference pairs requires a complexity of \(2^{2n}\) which is infeasible. Our method delicately partitions the whole search space and efficiently excludes those containing no IDs. The search space is then reduced to a reasonable size. Finally, the remaining IDs (if they exist) can be detected with the plain MILP models. If our method finds no IDs for the r-round cipher, we know that there exists no ID for this r-round cipher. The provable security is thus achieved.

Our method is efficient for SPN ciphers with a block size equal to 64. For SKINNY-64, we find all IDs for 11 and 12 rounds in 4 and 1.5 h, respectively. Interestingly, all these IDs can be assembled into 432 and 12 truncated IDs for 11 and 12 rounds. We also prove, for the first time, that the 13-round SKINNY-64 is free of ID distinguishers with considering the DDT information. Similarly, we find all 2,700 IDs for 13-round CRAFT which is equivalent to 12 truncated IDs and prove there is no ID for 14 rounds. For Substitution Bit-Permutation Network (SbPN) cipher GIFT-64, we prove that there is no ID for 8 rounds.

Our method is also useful to improve the current search strategies for ciphers with large blocks. We show its usage in applications to Rijndael-192 and GIFT-128 as examples. For Rijndael-192, we search for IDs under the arbitrary Sbox/MC mode, i.e., only the activeness of an Sbox and the branch number of the MixColumn operation would be considered (which is inspired by the arbitrary Sbox model [20]). In this scenario, we show that all 6,750 truncated IDs of 6-round Rijndael-192 can be identified in 1 s, and prove there is no truncated ID for 7 rounds. In previous methods, we need to solve approximately \(2^{48}\) plain MILP models to achieve this. For GIFT-128, we search for IDs that have one active superbox in both plaintext and ciphertext differences. In previous methods, we need to solve \(2^{38}\) plain MILP models, now with our new tool, we only need to handle 4,608 MILP models. We prove that there is no ID with one active superbox in both ends for 8-round GIFT-128. We list all our application results in Table 1 for readers’ quick reference.

Implications of Finding All IDs. On the one hand, if our new model finds no ID for a (round-reduced) cipher, we achieve a more thorough security proof for the cipher against ID distinguishers compared to [20]. On the other hand, it is also useful to list all IDs for a (round-reduced) cipher. Firstly, different IDs would affect the concrete attacking phase as well as the data and time complexity. In terms of the ID distinguishers, more active bits in the output mean less data/time complexities. In terms of the key recovery attacks, IDs with good input and output difference patterns may have a better performance, e.g., the ID attacks on AES were improved with alternative IDs [17]. Secondly, finding all IDs (with or without considering the DDT) is a long-standing challenge in cryptanalysis and cipher design. Finding out all IDs can help us understand better the structure of target ciphers and the ID attack itself.

We highlight that all IDs we discuss in this paper are under the assumption that the round keys are uniformly random. All source codes of this work are provided in the git repository https://github.com/hukaisdu/SearchForID.git to help readers understand our tool better.

Table 1. The application results of this paper. N in the #ID column means no IDs. #Space is the size of the whole search space with plain MILP models.

Organization of the Remaining Paper. In Sect. 2, we introduce the notations and some global settings used in this paper. In Sect. 3, we show how to partition the whole search space and quickly exclude those containing no IDs and identify all IDs in the remaining candidates. Applications to SKINNY-64, CRYFT and GIFT-64 are presented in Sects. 4 and 5. In Sect. 6, we discuss how to apply our idea to enhance some traditional search strategies based on MILP for large-size ciphers. In Sect. 7 we conclude our paper and highlight two future works.

2 Preliminaries

2.1 Notations and Definitions

In this paper, we are only interested in the differences, so the differences are directly represented by lowercase letters such as x rather than conventional \(\varDelta x\). Consider a (round-reduced) cipher E, if (xy) is an ID over E, we write it as \(x {\mathop {\nrightarrow }\limits ^{E}} y\). Conversely, \(x \xrightarrow {E} y\) means x can propagate to y over E, i.e., (xy) is a possible pattern. We use uppercase letters to represent the sets of differences such as X and Y. \(X \xrightarrow {E} Y\) means for all \((x, y) \in X \otimes Y\) we have \(x \xrightarrow {E} y\). \(x \xrightarrow {E} X\) is equivalent to \(\{x\} \xrightarrow {E} X\). Similarly, \(X \xrightarrow {E} y\) is equivalent to \(X \xrightarrow {E} \{y\}\). \(X \otimes Y \) (sometimes we use \(X \bigotimes Y\) for a better looking of a complicated equation) is defined as \(\{ (x, y) : x\in X, y \in Y \}\). If \(X \cap Y = \varnothing \), we would write \(X \cup Y\) as \(X + Y\) to highlight \(X \cap Y = \varnothing \). Generally, if \(\bigcap _i X_i = \varnothing \), the union set of all \(X_i\) is written as \(\sum _i X_i\). \(X - Y\) is defined as \(\{ x \in X: x \notin Y \}\).

Global Settings. A modern block cipher usually iterates a simple round function many times with different round keys. So we can always decompose a (round-reduced) cipher into three parts, say \(E = E_2 \circ E_1 \circ E_0\). We denote the input difference/set of \(E_0\), \(E_1\) and \(E_2\) by \(x_0/X_0, x_1/X_1\) and \(x_2/X_2\) respectively, and output difference/set of E by \(x_3/X_3\). See Fig. 1 for details of the settings. In the remaining paper, if we do not specify \(x_0, x_1, x_2, x_3\) and \(X_0, X_1, X_2, X_3\), they denote the difference or sets as defined here.

Fig. 1.
figure 1

The global settings of our theoretical model.

2.2 Current MILP Model for Detecting IDs

In [9, 10, 20], the MILP models for detecting IDs are independently proposed. This method is developed from the MILP models for searching for differential characteristics [19, 23] by adding some additional constraints on the plaintext and ciphertext differences. To construct the MILP model for checking if a given difference pair \((\varDelta _i, \varDelta _o)\) for a cipher E is impossible, we first declare a sequence of variables to represent input and output differences for all components of E such as Sboxes and linear layers. Next, we use inequalities to force these variables to be legal patterns that are compatible with the differential propagation rules of the corresponding components. Thus, any solutions satisfying these constraints are legal differential characteristics. Additionally, suppose the variables representing the differences of plaintext and ciphertext are \(u_0\) and \(u_r\), we add two more constraints as

$$ u_0 = \varDelta _i,\ u_r = \varDelta _o. $$

If the overall MILP model is feasible, there is one differential characteristic propagating from \(\varDelta _i\) to \(\varDelta _o\), i.e., \((\varDelta _i, \varDelta _o)\) is a possible differential. Otherwise, \((\varDelta _i, \varDelta _o)\) is an ID.

According to different methods in which we use inequalities to describe the differential propagations over an Sbox or linear layer, the capabilities to detect IDs of the corresponding MILP models are also different. For example, if details of the DDT and linear layers are all described, then all kinds of contradictions could be detected. This is the default mode we use in this paper. If only the information that an Sbox is active or not and the branch number of a linear layer is described in the MILP search model, truncated IDs could be detected. We refer to such a model as the arbitrary Sbox/MC model. In this paper, the application to Rijndael-192 is the only instance using this mode. We will assume that the readers of this paper have been familiar with the plain MILP models for detecting IDs. Or we refer the readers to [10, 20] for more details of this topic.

3 Finding All Impossible Differentials

Taking the MILP model in [10, 20], we need to solve \(2^{2n}\) models for a cipher with block size n bits to check all input and output difference pairs. The search space is obviously too large. So we partition the whole search space into many smaller sets and then process each set one by one to exclude those containing no IDs. For the remaining smaller sets that have the potential to contain IDs, we apply several methods to identify all IDs contained by them.

Main Idea. To determine if \((x_0, x_3)\) is possible or not, we try to find a pair of \((x_1, x_2)\) satisfying

$$\begin{aligned} x_0 \xrightarrow {E_0} x_1 \xrightarrow {E_1} x_2 \xrightarrow {E_2} x_3. \end{aligned}$$

Obviously, if such \((x_1, x_2)\) exists, \((x_0, x_3)\) is possible. Further, if we have known \(x_1 \xrightarrow {E_1} x_2\), then all \((x_0, x_3)\) satisfying (1) \(x_0 \xrightarrow {E_1} x_1\) (2) \(x_2 \xrightarrow {E_1} x_3\) are possible. We have the following proposition (also illustrated by Fig. 2a),

Fig. 2.
figure 2

The illustration of Proposition 1 and the implication.

Proposition 1

  Let \(E = E_2 \circ E_1 \circ E_0\), \(X_0 \subseteq \mathbb {F}_2^n\) be a set of differences satisfying \(X_0 \xrightarrow {E_0} x_1\) and \( X_3 \subseteq \mathbb {F}_2^n\) satisfying \(x_2 \xrightarrow {E_2} X_3\). If \(x_1 \xrightarrow {E_1} x_2\), then \(X_0 \xrightarrow {E} X_3\).

The proof is obvious from above analyses, so we omit it here. If \(x_1 {\mathop {\nrightarrow }\limits ^{E_1}} x_2\), we cannot predict anything so we say \(X_0 \otimes X_3\) has the potential to contain IDs, see Fig. 2b. We conclude it into a corollary of Proposition 1 for better understanding this fact.

Corollary 1

With the same notations as Proposition 1, if \((x_0, x_3) \in X_0 \otimes X_3\) is an ID, then all \(x_1\) and \(x_2\) satisfying \(X_0 \xrightarrow {E_0} x_1\) and \(x_2 \xrightarrow {E_2} X_3\) cannot be connected, i.e., \((x_1, x_2)\) must satisfy \(x_1 {\mathop {\nrightarrow }\limits ^{E_1}} x_2\).

To make our method more general, we assume we study a question how to find all IDs in the sets \(X_0 \otimes X_3\) over E, where \(X_0\) and \(X_3\) can be subsets of \(\mathbb {F}_2^n\). Thus, now we want to find all \((x_0, x_3) \in X_0 \otimes X_3\) satisfying \(x_0 {\mathop {\nrightarrow }\limits ^{E}} x_3\).

Our method consists of three steps:

  1. (1)

    Partition the whole set \(X_0 \otimes X_3\) into many non-overlapping smaller sets, i.e.,

    $$ X_0 \otimes X_3 = \sum _{i} \sum _{j} X^i_0 \otimes X^j_3, \text{ where } X_0 = \sum _{i} X^i_0, \ X_3 = \sum _{j} X^j_3 $$

    For each pair ij, we require that there always exist \(x^i_1\) and \(x^j_2\) satisfying \(X^i_0 \xrightarrow {E_0} x^i_1\) and \(x^j_2 \xrightarrow {E_2} X^j_3\), respectively;

  2. (2)

    Exhaustively check all possible \((x^i_1, x^j_2)\) pairs to see if \(x^i_1 \xrightarrow {E_1} x^j_2\) by MILP models introduced in [10, 20]. \(X_0^i \otimes X^j_3\) contains no IDs if \(x^i_1 \xrightarrow {E_1} x^j_2\), and otherwise has a potential to contain some IDs;

  3. (3)

    Process those \(X_0^i \otimes X^j_3\) that have the potential to contain IDs one by one to identify all IDs with specific strategies that we will introduce later.

3.1 Partition: A Theoretical Viewpoint

In this paper, we always assume that \(E_0\) and \(E_2\) are non-linear functions, so there exists an expansion property for the difference propagation over \(E_0\) and \(E_2\). Consequently, it is possible for us to find two smaller sets \(X_1\) and \(X_2\) satisfying

  1. (1)

    \(\forall x_0 \in X_0\), \(\exists x_1 \in X_1\) s.t. \( x_0 \xrightarrow {E_0} x_1\),

  2. (2)

    \(\forall x_3 \in X_3\), \(\exists x_2 \in X_2\) s.t. \(x_2 \xrightarrow {E_2} x_3\).

We call \(X_1\) a representative set of \(X_0\) over \(E_0\) while \(X_2\) a representative set of \(X_3\) over \(E^{-1}_2\). Suppose we have obtained one such representative set \(X_1\), we know the following

$$ \bigcup _{x_1 \in X_1} \left\{ x_0 \in X_0 : x_0 \xrightarrow {E_0} x_1 \right\} = X_0. $$

By removing the overlapping elements among \(\left\{ x_0 \in X_0 : x_0 \xrightarrow {E_0} x_1 \right\} \) for all \(x_1 \in X_1\), we get a partition of \(X_0\) which can be stored in a hash table with the elements in \(X_1\) as keys and sets after partitioning as values (similar to \(X_3\) and \(X_2\)). We call such a hash table a partition (hash) table of \(X_0\) over \(E_0\). An intuitive algorithm for determining one representative set as well as the corresponding partition table for a non-linear function and its input difference set is given in Algorithm 1.

Algorithm 1
figure a

Determine a representative set and partition table of X over f

The basic idea of Algorithm 1 is to select representative for X one by one and exclude corresponding elements from X until X is reduced to an empty set. The complexity of Algorithm 1 is roughly limited by \(\mathcal {O}(|X|)\) times of loops (line 3–10). The operations in line 5 and 8 determine the real time of this algorithm, whose complexity is at most \(2^{2\log |X|}\) (the complexity of computing the whole DDT of f). Thus the overall complexity of Algorithm 1 is bounded by \(\mathcal {O}(2^{3\log |X|})\). Note that the actual complexity should be much less than this upper bound for the number of loops usually small. In applications of this paper, f will be at most a 16-bit-input function, so this algorithm is practical.

We first apply Algorithm 1 to \(X_0\) to obtain its representative set \(X_1\) and a partition table \(H_1\) over \(E_0\), i.e.,

$$\begin{aligned} X_0 = \sum _{x_1 \in X_1} H_1[x_1] \end{aligned}$$
(1)

Similarly we get the representative set \(X_2\) and partition table \(H_2\) of \(X_3\) over \(E^{-1}_2\), i.e.,

$$\begin{aligned} X_3 = \sum _{x_2 \in X_2} H_2[x_2] \end{aligned}$$
(2)

Then the whole search space \(X_0 \otimes X_3\) has been partitioned into \(|X_1| \times |X_2|\) smaller sets through combining Eqs. (1) and (2), i.e.,

$$\begin{aligned} X_0 \otimes X_3 = \sum _{x_1 \in X_1} H_1[x_1] \bigotimes \sum _{x_2 \in X_2} H_2[x_2] = \sum _{x_1 \in X_1} \sum _{x_2 \in X_2} H_1[x_1] \otimes H_2[x_2] \end{aligned}$$
(3)

Figure 3 demonstrates the partition of \(X_0 \otimes X_3\). A partition of \(X_0 \otimes X_3\) can be uniquely determined by a quartet \((X_1, H_1, X_2, H_2)\). For simplicity of description, we define the partition of a (round-reduced) cipher.

Definition 1 (Partition)

[Partition] For a (round-reduced) cipher \(E = E_2 \circ E_1 \circ E_0\), a partition of its whole input and output difference spaces is a set of smaller sets as follows,

$$ \mathbb {P}(X_1, H_1, X_2, H_2) = \left\{ H[x_1] \otimes H[x_2] : x_1 \in X_1, x_2 \in X_2 \right\} $$

When there is no ambiguity, we just say \(\mathbb {P}\) is a partition of E.

Fig. 3.
figure 3

The partitions of the input difference set \(X_0\) and \(X_1\). Since \(x^0_1 {\mathop {\nrightarrow }\limits ^{E_1}} x^{\delta '}_2\), \(H_1[x^0_1] \otimes H_2[x^\delta _2]\) has potential to contain IDs. Since \(x^1_1 \xrightarrow {E_1} x^1_2\), \(H_1[x^1_1] \otimes H_2[x^1_2]\) contains no IDs.

3.2 Partition: A Practical Viewpoint

If we apply directly Algorithm 1 to \(\mathbb {F}_2^n\), the complexity is not affordable even for a 64-bit block cipher. However, an important observation is that SPN ciphers usually comprise several smaller parallel parts. The well-known examples include the superboxes used in AES-like ciphers such as SKINNY [3] and CRAFT [4]. Two continuous rounds can be represented by 4 parallel superboxes. Another example is GIFT [1] which follows a so-called Substitution bit-Permutation Network (SbPN) paradigm. All Sboxes of the i-th round of GIFT, denoted by \(Sb^i_0, Sb^i_1, \ldots , Sb^i_s\) where \(s = n/4\) and n is the block size, can be grouped in two different ways – the Quotient and Remainder groups, Qx and Rx, defined as

  • \(Qx = \left\{ Sb_{4x}, Sb_{4x+1}, Sb_{4x + 2}, Sb_{4x + 3} \right\} \),

  • \(Rx = \left\{ Sb_{x}, Sb_{x + q}, Sb_{x + 2q}, Sb_{x + 3q} \right\} \), where \(q = \frac{s}{4}, 0 \le x \le q - 1\).

Taking GIFT-64 as an instance, the 16-bit output of \(Qx^i = \{Sb^i_{4x}, Sb^i_{4x+1}, Sb^i_{4x+2}, Sb^i_{4x+3}\}\) map to input bits of \(Rx^{i+1} = \{ Sb_x^{i+1}, Sb_{x+4}^{i+1}, Sb_{x+8}^{i+1}, Sb_{x+12}^{i+1}\}\). Then the interfacing two rounds of GIFT-64 can be also represented by 4 parallel superboxes. An illustration for GIFT-64 is shown in Fig. 4.

Fig. 4.
figure 4

The superbox representation of GIFT-64 based on the two groups (Quotient and Remainder) of Sboxes.

In the remaining part of this paper, we focus on these SPN or SbPN ciphers with superboxes. Suppose \(E_0\) and \(E_2\) comprise respectively of m parallel superboxes, denoted by \(E_0 = E_{0,0} || E_{0,1} || \cdots || E_{0, m-1}\) and \(E_2 = E_{2,0} || E_{2,1} || \cdots || E_{2, m-1}\), where the size of input and output of \(E_{i,j}, i\in \{0,2\}, j\in \{0,1,\ldots , m-1\}\) is n/4 bits. Then we apply Algorithm 1 to each \(E_{i,j}\), which is a function with 16-bit block size.

For \(i \in \{0,1,\ldots , m-1\}\), let \(X_{1,i}\) and \(H_{1,i}\) be representative sets and partition tables for \(E_{0,i}\) of its input difference set \(X_{0,i}\) while \(X_{2,i}\) and \(H_{2,i}\) the representative sets and partition tables for \(E_{2,i}^{-1}\) of \(X_{3,i}\). The Eqs. (1) and (2) can be re-written as

$$\begin{aligned} \begin{aligned} X_0 & = \bigotimes _{0 \le j < m} X_{0, j} = \bigotimes _{0 \le j < m} \left( \sum _{x_{1,j} \in X_{1,j}} H_{1,j}[x_{1,j}]\right) \\ &= \sum _{x_{1,0} \in X_{1,0}} \cdots \sum _{x_{1,m-1} \in X_{1,m-1}} H_{1,0}[x_{1,0}] \otimes \cdots \otimes H_{1,m-1}[x_{1,m-1}]. \end{aligned} \end{aligned}$$
(4)

Similarly,

$$\begin{aligned} \begin{aligned} X_3 & = \bigotimes _{0 \le j < m} X_{3, j} = \bigotimes _{0 \le j < m} \left( \sum _{x_{2,j} \in X_{2,j}} H_{2,j}[x_{2,j}]\right) \\ &= \sum _{x_{2,0} \in X_{2,0}} \cdots \sum _{x_{2,m-1} \in X_{2,m-1}} H_{2,0}[x_{2,0}] \otimes \cdots \otimes H_{2,m-1}[x_{2,m-1}]. \end{aligned} \end{aligned}$$
(5)

See Fig. 5 for a better understanding to Eq. (4).

Fig. 5.
figure 5

The partition of the input difference set \(X_0\) (the case \(i=0\) in Eq. (4)) based on 2 superboxes \((m=2)\).

Then we can accordingly rewrite Eq. (3) as

$$\begin{aligned} \begin{aligned} X_0 \otimes X_3 &= \bigotimes _{i = 1,2} \left( \sum _{x_{i,0} \in X_{i,0}} \cdots \sum _{x_{i,m-1} \in X_{i,m-1}} H_{i,0}[x_{i,0}] \otimes \cdots \otimes H_{i,m-1}[x_{i,m-1}] \right) \\ &= \sum _{x_{1,0} \in X_{1,0}} \cdots \sum _{x_{2,m-1} \in X_{2,m-1}} H_{1,0}[x_{1,0}] \otimes \cdots \otimes H_{2,m-1}[x_{2,m-1}] \end{aligned} \end{aligned}$$
(6)

That is to say, considering the superbox effects, we can partition the whole difference space into \(\prod _{i = 1,2; j = 0,1,\ldots , m-1} |X_{i,j}|\) smaller sets. A partition of E is consequently updated to

$$ \begin{aligned} &\mathbb {P}(X_{1,0}, H_{1,0}, \ldots , X_{2,m-1}, H_{2,m-1}) \\ &~~= \left\{ H[x_{1,0}] \otimes \cdots \otimes H[x_{2,m-1}] : x_{i,j} \in X_{i,j} \text{ for } i = 1,2; 0 \le j < m \right\} \end{aligned} $$

which is related to a tuple with \(4\cdot m\) elements.

3.3 Solving MILP Models for \(E_1\)

Once we get a partition \(\mathbb {P}(X_{1,0}, H_{1,0}, \ldots , X_{2,m-1}, H_{2,m-1})\) of E, we construct

\(\prod _{i = 0,1; j = 0,\ldots , m-1} |X_{i,j}|\) MILPs for each elements \((x_{1,0}, \ldots , x_{2,m-1}) \in X_{1,0} \otimes \cdots \otimes X_{2,m-1} \) to see whether \((x_{1,0}, \ldots , x_{1,m-1}) \xrightarrow {E_1} (x_{2, 0}, \ldots , x_{2,m-1})\). If the MILP model for \((x_{1,0}, \ldots , x_{2,m-1})\) is feasible, we do not need to consider \(H_{1,0}[x_{1,0}] \otimes \cdots \otimes H_{2,m-1}[x_{2,m-1}]\) any more. Otherwise, \(H_{1,0}[x_{1,0}] \otimes \cdots \otimes H_{2,m-1}[x_{2,m-1}]\) has the potential to contain some IDs, we have to proceed with it in the next step.

3.4 Identify All IDs in Remaining \(H_{1,0}[x_{1,0}] \otimes \cdots \otimes H_{2,m-1}[x_{2,m-1}]\)

The final step is to handle the remaining \(H_{1,0}[x_{1,0}] \otimes \cdots \otimes H_{2,m-1}[x_{2,m-1}] \in \mathbb {P}\) that survive the second step one by one. We mainly introduce two methods to find all IDs in each \(H_{1,0}[x_{1,0}] \otimes \cdots \otimes H_{2,m-1}[x_{2,m-1}]\) in this subsection.

Direct Search. Considering the case when the size of \(H_{1,0}[x_{1,0}] \otimes \cdots \otimes H_{2,m-1}[x_{2,m-1}]\) is small. Let \(\sigma = \prod _{i = 1,2; 0 \le j < m } |H_{i,j}|\), for example, we say the size is small when \(\sigma \le 2^{28}\), we can just directly test every pattern with a MILP model as [10, 20]. All IDs contained in \(H_{1,0}[x_{1,0}] \otimes \cdots \otimes H_{2,m-1}[x_{2,m-1}]\) can be found naturally.

To enhance the efficiency of this step, we introduce the fast reducing technique. For a randomly chosen plaintext-ciphertext difference pair from \(H_{1,0}[x_{1,0}] \otimes \cdots \otimes H_{2,m-1}[x_{2,m-1}]\), we construct the MILP model to see if this pair is an ID. If this pair is truly an ID, we continue checking whether it belongs to a truncated ID. If so, we will record this truncated ID and remove all related IDs that belong to this truncated ID from the search pool. If this ID doesn’t belong to any truncated ID, we record it and proceed with another pair. If it is not an ID, then a difference characteristic will be returned by the MILP solver. We extract the values of \(x^\star _1\) and \(x^\star _2\) of this characteristic (this is easy with the interface of MILP solvers), and calculate two sets \(X_0^\star \) and \(X_3^\star \) satisfying \(X_0^\star \xrightarrow {E_0} x_1^\star \) and \(x^\star _2 \xrightarrow {E_1} X_3^\star \). Thus all patterns in \(X_0^\star \otimes X_3^\star \) are all possible, we only need to proceed with \(H_{1,0}[x_{1,0}] \otimes \cdots \otimes H_{2,m-1}[x_{2,m-1}] \backslash X_0^\star \otimes X_3^\star \) until all patterns are determined as possible or impossible.

Partition Further. If \(\sigma \) is larger (e.g., \(\sigma > 2^{28}\)), exhausting all patterns is not a good idea. We can apply Algorithm 1 to sets in every \(H_{i, j}, i \in \{1,2\}, j \in \{0,1,\ldots , m\}\) and repeatedly partition \(H_{1,0}[x_{1,0}] \otimes \cdots \otimes H_{2,m-1}[x_{2,m-1}]\) into several smaller sets. We handle each smaller set according to its size recursively until the size is below the threshold and can be handled by a direct search.

The whole procedure for identifying all IDs among \(X_0 \otimes X_3\) over E is demonstrated in Algorithm 2.

Algorithm 2
figure b

Find all IDs over a cipher E

4 Applications to AES-Like SPN Ciphers

One of the standard ways for designing a good round function from an Sbox and an MDS mapping is the one followed by the AES [12] and is known as the wide trail strategy [11]. Some newly proposed lightweight ciphers also follow the AES structure by replacing the MDS matrix with simple ones such as SKINNY [3] and CRAFT [4]. We say these ciphers are AES-like. In this section, we show how to apply our methods to SKINNY-64, the application to CRAFT is provided the full version.

The experiments are conducted by Gurobi Solver (version 9.1.1) on a workstation with 2\(\times \)AMD EPYC 7302 16-core (32 siblings) Processor 3.3 GHz, (a total 64 threads), 256G RAM, and Ubuntu 20.10.

Application to SKINNY-64. Our first application is to SKINNY-64. The block cipher family SKINNY was presented at CRYPTO 2016 [3] designed under the TWEAKEY framework [14], whose goal is to compete with the NSA design SIMON [2] in terms of hardware/software performance. According to the length of block and tweakey, the SKINNY family consists of 6 different members represented as SKINNY-n-t, where \(n = 64, 128\) and \(t = n, 2n, 3n\), which respectively represent the sizes of the block and tweakey. We are only interested in the security of the 64-bit version of SKINNY in this paper, i.e., SKINNY-64, under the single tweakey model. The round function of SKINNY-64 comprises five operations as SubCells (SC), AddConstants (AC), AddRoundTweakey (ART), ShiftRows (SR) and MixColumns (MC). Since we only consider the single-tweakey scenario, we can ignore the ART and AC operations and pay attention to the remaining three ones. Therefore, a round of SKINNY can be written as

$$ R = \textsf {MC} \circ \textsf {SR} \circ \textsf {SC} $$

When applying Algorithm 2 to r-round SKINNY-64, we rearrange the functions in the r rounds as

$$ R^r = \underbrace{\textsf {SC} \circ \textsf {MC} \circ \textsf {SC}}_{E_2} \circ \underbrace{ \textsf {SR} \circ R^{r-4} \circ \textsf {MC} \circ \textsf {SR} }_{E_1} \circ \underbrace{ \textsf {SC} \circ \textsf {MC} \circ \textsf {SC}}_{E_0} $$

As can be seen, the \(\textsf {SR}\) in the first round and \(\textsf {MC} \circ \textsf {SR}\) in the last round are omitted for they do not affect our analysis. \(E_0\) and \(E_2\) consist of four parallel superboxes \(E_{0,i}\) and \(E_{2,i}\) for \(i = 0,1,2,3\), respectively.

We apply Algorithm 1 to the four superboxes of \(E_0\) and the four inverse superboxes of \(E^{-1}_2\). The representative sets we calculated in the experiment are listed in Table 2. Note that the four superboxes are identical as well as their representative sets and partition tables.

Table 2. Representative sets of the superboxes \(E_{0,i}\) and \(E^{-2}_{2,i}\) for \(i = 0,1,2,3\), of SKINNY-64

As is seen, each representative set of the superbox of \(E_{0,i}\) and \(E^{-1}_{2,i}\) contains only 7 values, so the sizes of \(X_1\) and \(X_2\) are both \(7^4 -1 = 2,400\) non-zero values. Considering the rotational symmetry of SKINNY-64, we can remove the rotationally-symmetric elements in \(X_1\). After this treatment, only 615 elements remain in \(X_1\). Therefore, the total number of MILP models we need to solve is \(615 \times 2,400 = 147,600 \approx 2^{20.5}\). Since these MILP models are treated independently, we can solve them by a parallel strategy based on multi-threading programming.

11-Round. For 11-round SKINNY-64, We need to solve \(2^{20.5}\) MILP models for \(11 - 4 = 7\) rounds. These MILP models were solved in roughly 4 h. There are 618 infeasible patterns \((x_{1,0}, \ldots , x_{1,3}, x_{2,0}\ldots , x_{2,3}) \in X_{1,0} \otimes X_{1,1} \otimes X_{2,0}\cdots \otimes X_{2,3}\). We then try to identify all the IDs from the corresponding sets related to these 618 patterns. Finally, since every ID belongs to one certain truncated ID, we identified all 432 truncated IDs. All these IDs are provided in our git repository.

12-Round. For 12-round SKINNY-64, we need to solve \(2^{20.5}\) MILP models for 8 rounds. Solving these MILP models cost about 1.5 hours in our cluster. Only 15 patterns out of them \((x_{1,0}, x_{1,1}, x_{2,0}\ldots , x_{2,3}) \in X_{1,0} \otimes X_{1,1} \otimes X_{2,0}\cdots \otimes X_{2,3}\) are infeasible. Among them, we extracted 2,700 IDs, which are assembled into 12 truncated IDs. These IDs are identical to those reported in previous works [3, 22].

13-Round. We prove that there does not exist any ID for 13-round SKINNY-64 even with consideration of the details of Sboxes and linear layers.

5 Applications to SbPN Cipher GIFT-64

Substitution and bit-Permutation Network (SbPN) is a special SP network where the permutation layer takes a bit shuffle rather than a word-oriented diffusion. It was introduced by the first ultra-lightweight block cipher PRESENT at CHES 2007 [7], and recently refined by GIFT at CHES 2017 [1]. SbPN consists of a layer of Sboxes (denoted by S), a bit shuffle (denoted by P), and a layer of key/constant addition only, which can be very efficient, especially in hardware implementation. In this section, we show how to apply our methods to GIFT-64. This paper is only interested in the single-key scenario, we do not need to consider the key/constant addition. So a round function of GIFT-64 is viewed as \(R = \textsf {P} \circ \textsf {S}\). As is mentioned in Sect. 3.2 and also shown in Fig. 6, two interfacing rounds of GIFT-64 can be viewed as four parallel superboxes. Then we have

$$ \textsf {S} \circ \textsf {P} \circ \textsf {S} = \textsf {P}_2 \circ \textsf {S} \circ \textsf {P}_1 \circ \textsf {S}, $$

where \(\textsf {P}_1\) consists of four identical small bit shuffles and \(\textsf {P}_2\) is a word-oriented permutation. \(\textsf {P}_1\) and \(\textsf {P}_2\) in GIFT-64 are illustrated in Fig. 6. Let \(\textsf {P}_1 = \textsf {P}'_1 || \textsf {P}'_1 || \textsf {P}'_1 || \textsf {P}'_1\), then \(\textsf {P}'_1\) and \(\textsf {P}_2\) are as follows (inside each permutation, the 0-th is the leftmost unit),

$$ \begin{aligned} \textsf {P}'_1 = [12, 1, 6, 11, 8, 13, 2, 7, 4, 9, 14, 3, 0, 5, 10, 15], \\ \textsf {P}_2 = [0, 4, 8, 12, 1, 5, 9, 13, 2, 5, 10, 14, 3, 6, 11, 15]. \\ \end{aligned} $$
Fig. 6.
figure 6

An equivalent representation of the GIFT-64 round functions.

We thus rearrange the round functions of an r-round GIFT-64 cipher as

$$ R^r = \underbrace{\textsf {S} \circ P_1 \circ \textsf {S}}_{E_2} \circ \underbrace{R^{r-4} \circ \textsf {P} \circ \textsf {P}_2}_{E_1} \circ \underbrace{\textsf {S} \circ \textsf {P}_1 \circ \textsf {S}}_{E_0} $$

Again, we apply Algorithm 1 to superboxes of \(E_{0,i}\) and \(E^{-2}_{2,i}\) and get the representative sets as shown in Table 3. The number of elements in the representative sets of superboxes for \(E_{0,i}\) are 10, and for \(E^{-1}_{2,i}\) is 9, so the number of non-zero elements in \(X_1 \otimes X_2\) is \((10^4 - 1) \times (9^4 - 1) = 65,593,440 \approx 2^{26}\). We then need to construct \(2^{26}\) MILP models with these patterns for \(E_1\).

In the specification, the designers showed that there do not exist any impossible differentials with 1-active nibble against 7 rounds of GIFT-64. We then check the 7-round GIFT-64 first, unfortunately, after the first step of Algorithm 2 costing about 12 h, there are too many (363,510) impossible patterns out of the \(2^{26}\) patterns. We find that processing these bad inner patterns costs a significant amount of time, this may imply that 7 rounds is the borderline of whether IDs exist or not. Considering that GIFT-64 is a 28-round cipher, 7 round IDs (even if an ID exists) should not threaten its security, we do not pursue the exact security proof for 7 rounds.

For 8-round GIFT-64, these \(2^{26}\) MILP models can be processed within 17 h, where only 236 are impossible. With another 4 min, all the 236 patterns can be processed and none of them imply IDs. In other words, 8-round GIFT-64 is free of any IDs.

Table 3. The representative sets of superboxes \(E_{0,i}\) and \(E^{-1}_{2,i}\) for \(i = 0,1,2,3\), of GIFT-64

6 Towards Large-Size Ciphers

Our tool works very well on 64-bit SPN and SbPN ciphers. However, its efficiency for ciphers with larger blocks is not high. Note that in the second step of Algorithm 2 (also introduced in Sect. 3.3), we still need to solve a set of MILP models whose size can be larger than \(2^{20}\). For large-size ciphers, this step would take a lot of time. However, with some compromises between accuracy and efficiency, our method can be very useful to strengthen some existing search strategies for large-size ciphers. In this section, we show how to use our idea to enhance the search for the large-size cipher Rijndael-192. The application to GIFT-128 is provided in the full version.

Application to Rijndael-192. In [20], Sasaki and Todo also encountered similar problems with the efficiency when processing 8-bit Sbox ciphers, so they took a degenerated version of the MILP model called the arbitrary Sbox mode to boost the search efficiency. The arbitrary Sbox mode allows the model to ignore the details of the Sbox while mainly reflecting the property of the linear layers, thus the search time can be saved a lot. Despite this compromise, their MILP model still cannot work to search for all plaintext and ciphertext difference pairs. Let s be the number of Sboxes of a cipher, then the whole search space is \(2^{2s}\), e.g., for Rijndael-192 that has 24 Sboxes, the search space is approximately \(2^{48}\) which is also very costly.

Inspired by their work, we can also boost our search efficiency by ignoring the details of Sboxes and even the linear layers. Different from Sasaki and Todo’s tool, ours can exhaust all truncated difference pairs very fast. As is well known, Rijndael was designed by Daemen and Rijmen in 1998 and the 128 block size version was selected as the AES [12]. In this section, we take Rijndael-192 as an example and show how to identify all truncated IDs of 6-round Rijndael-192 within seconds. The state of Rijndael-192 is arranged as \(4 \times 6\) matrix of bytes. Its round function comprises four operations, AddRoundKey (AK), SubBytes (SB), ShiftRows(SR) and MixColumns(MC). Without considering AK, r-round Rijndael-192 can be written similar to SKINNY-64,

$$ R^r = \underbrace{\textsf {SB} \circ \textsf {MC} \circ \textsf {SB}}_{E_2} \circ \underbrace{ \textsf {SR} \circ R^{r-4} \circ \textsf {MC} \circ \textsf {SR} }_{E_1} \circ \underbrace{ \textsf {SB} \circ \textsf {MC} \circ \textsf {SB}}_{E_0}, $$

Note that we also omit the SR of the first round and MC \(\circ \) SR of the last round. Thus \(E_0\) and \(E_2\) of Rijndael-192 can be seen as 6 parallel superboxes, respectively.

In the arbitrary Sbox/MC mode, only being active or inactive for an Sbox would be considered instead of its detailed input and output differences. Thus, the difference of an Sbox can be labeled by one bit 0 or 1. Consequently, the differences of plaintexts, ciphertexts, and intermediate states are labeled by a binary vector \(x \in \mathbb {F}_2^{24}\). Consequently, to apply our new tool with the arbitrary Sbox/MC mode, we adapt Algorithm 1 to compute the representative sets and partition tables for all truncated differences into a superbox. In terms of Rijndael-192, there are only 16 kinds of truncated IDs as input of a superbox (from 0b0000 to 0b1111). Since we only consider the branch number of MC, its representative set contains only 2 elements as shown in Table 4.

We first call Algorithm 2 to test 6-round Rijndael-192 (note that the MILP models here are in the arbitrary Sbox mode and only the branch number of MC are considered.). Within 1 s, we can find out all 6,750 IDs, which are provided with our codes in our git repository. Next, we test 7 rounds, no ID can be found. Therefore, we prove that there is no truncated ID for 7-round Rijndael-192.

Table 4. The representative sets of superboxes \(E_{0,i}\) and \(E^{-1}_{2,i}\) for \(i = 0,1,2,3,4,5\), of Rijndael-192

7 Conclusion and Future Work

In this paper, we proposed a new method to detect all IDs based on MILP models with the DDT considered. The whole search space is partitioned into smaller ones and some of them can be quickly determined to contain no IDs. Thus the search space is significantly reduced, sometimes to a practical size. Then we could handle the remaining candidates to check if there are any IDs. With this novel strategy, we identified all IDs for 11-, and 12-round SKINNY-64 and prove there exists no ID for 13-round SKINNY-64. Similarly, we identified all IDs for 13-round CRAFT and prove there is no ID for 14 rounds. We also proved there is no ID for 8-round GIFT-64. The idea of our new method is also very useful to enhance the current MILP models for ciphers with large blocks. For example, we can partition the whole space of truncated differences for Rijndael-192 into smaller ones under the arbitrary Sbox mode. We quickly identified all truncated IDs for 6-round Rijndael-192 in 1 s and proved there is no truncated ID for 7 rounds. For GIFT-128, we searched in a smaller space where all differences have an active superbox in plaintext and ciphertext differences.

It is interesting to study if our method is also applicable to searching for all zero-correlation linear hull distinguishers [8] due to the dual property of the ID and zero-correlation linear hull. However, since the correlation of a linear hull is equivalent to the summation of the correlation of all its trails, in theory there might be a linear hull with the zero-correlation consisting some non-zero-correlation linear trails. Thus, our method cannot be directly used for zero-correlation linear hulls, or more assumptions might be necessary. This would be one of our future works. The representative sets and partition tables in our work are generated based on an intuitive algorithm (Algorithm 1) that is not very efficient. We guess there may be other methods to choose the representative sets and partition tables that could consider the property of \(E_1\) simultaneously, such that we could reduce the number of MILP models we need to solve in the second step of Algorithm 2 and fewer impossible patterns for \(E_1\) remains after it. This would be another interesting future work. Finally, our method currently works only for ciphers whose round functions are based on superboxes, it is also interesting to see how to generalize it to more types of ciphers.