Gimli: a cross-platform permutation

## Notation.

We denote by $\mathcal{W} = \{0, 1\}^{32}$ the set of bitstrings of length $32$. We will refer to the elements of this set as “words”. We use

• $a \oplus b$ to denote a bitwise exclusive or (XOR) of the values $a$ and $b$,
• $a \land b$ for a bitwise logical and of the values $a$ and $b$,
• $a \lor b$ for a bitwise logical or of the values $a$ and $b$,
• $a \lll k$ for a cyclic left shift of the value a by a shift distance of $k$, and
• $a \ll k$ for a non-cyclic shift (i.e, a shift that is filling up with zero bits) of the value a by a shift distance of $k$.

We index all vectors and matrices starting at zero. We encode words as bytes in little-endian form.

## The state.

Gimli applies a sequence of rounds to a $384$-bit state. The state is represented as a parallelepiped with dimensions $3 \times 4 \times 32$ or, equivalently, as a $3 \times 4$ matrix of $32$-bit words.

We name the following sets of bits:

• a column $j$ is a sequence of $96$ bits such that $\textbf{s}_j = \{s_{0,j};s_{1,j};s_{2,j}\} \in \mathcal{W}^{3}$
• a row $i$ is a sequence of $128$ bits such that $\textbf{s}_i = \{s_{i,0};s_{i,1};s_{i,2};s_{i,3}\} \in \mathcal{W}^{4}$

Each round is a sequence of three operations:

1. a non-linear layer, specifically a $96$-bit SP-box applied to each column;
2. in every second round, a linear mixing layer;
3. in every fourth round, a constant addition. ## The non-linear layer.

The SP-box consists of three sub-operations: rotations of the first and second words; a 3-input nonlinear T-function; and a swap of the first and third words. ## The linear layer.

The linear layer consists of two swap operations, namely Small-Swap and Big-Swap. Small-Swap occurs every 4 rounds starting from the 1st round. Big-Swap occurs every 4 rounds starting from the 3rd round. ## The round constants.

There are 24 rounds in Gimli, numbered $24,23,\dots,1$. When the round number $r$ is $24,20,16,12,8,4$ we XOR the round constant $\texttt{0x9e377900} \oplus r$ to the first state word $s_{0,0}$.

## Reference code in C.

#include <stdint.h>

uint32_t rotate(uint32_t x, int bits)
{
if (bits == 0) return x;
return (x << bits) | (x >> (32 - bits));
}

extern void gimli(uint32_t *state)
{
int round;
int column;
uint32_t x;
uint32_t y;
uint32_t z;

for (round = 24; round > 0; −−round)
{
for (column = 0; column < 4; ++column)
{
x = rotate(state[    column], 24);
y = rotate(state[4 + column],  9);
z =        state[8 + column];

state[8 + column] = x ^ (z << 1) ^ ((y&z) << 2);
state[4 + column] = y ^ x        ^ ((x|z) << 1);
state[column]     = z ^ y        ^ ((x&y) << 3);
}

if ((round & 3) == 0) { // small swap: pattern s...s...s... etc.
x = state;
state = state;
state = x;
x = state;
state = state;
state = x;
}
if ((round & 3) == 2) { // big swap: pattern ..S...S...S. etc.
x = state;
state = state;
state = x;
x = state;
state = state;
state = x;
}

if ((round & 3) == 0) { // add constant: pattern c...c...c... etc.
state ^= (0x9e377900 | round);
}
}
}


Version: This is version 2017.09.16 of the Spec web page.