Gimli: a cross-platform permutation

Cross-platform software performance comparison of various permutations.

b: no data
h: Sponge construction [10] with capacity $c$ = 256 bits, rate $r$ = 128 bits, and 256 bits of output.
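
To make the parameters in note h concrete, the following minimal sketch counts how many permutation calls a sponge with that rate would need for a given message length. It assumes a generic sponge in which padding always occupies the final (possibly partial) block, and in which the permutation is applied after each absorbed block and between output blocks. The function name `sponge_permutation_calls` and its defaults are illustrative and are not taken from the Gimli code.

```python
import math

CAPACITY_BITS = 256   # c, from note h
RATE_BITS = 128       # r, from note h
OUTPUT_BITS = 256     # digest length, from note h

STATE_BYTES = (CAPACITY_BITS + RATE_BITS) // 8   # sponge state is c + r bits
RATE_BYTES = RATE_BITS // 8                      # bytes absorbed per block


def sponge_permutation_calls(message_bytes, rate_bytes=RATE_BYTES,
                             output_bytes=OUTPUT_BITS // 8):
    """Count permutation calls for a generic sponge (assumed padding rule)."""
    # Assumption: padding always adds at least one bit, so the message
    # occupies floor(n / r) full blocks plus one final padded block.
    absorb_calls = message_bytes // rate_bytes + 1
    # The first rate-sized output chunk is available right after absorbing;
    # every further chunk costs one more permutation call.
    squeeze_calls = math.ceil(output_bytes / rate_bytes) - 1
    return absorb_calls + squeeze_calls


print(STATE_BYTES)                      # 48
print(sponge_permutation_calls(500))    # 33
```

Under these assumptions, a 500-byte message corresponds to 33 permutation calls. Measured cycle counts also include overhead that this count does not model.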

Hashing 500 bytes on AVR ATmega

Hash function             Cycles  ROM Bytes  RAM Bytes
Spongent [5]          25 464 000        364        101
Keccak-f[400] [5]      1 313 000        608         96
Gimli-Hash (h) small     805 110        778         44
Gimli-Hash (h) fast      362 712     19 218         45
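
The hashing table reports total cycles for a 500-byte message, while the remaining tables report cycles per byte. As a rough aid for relating the two, this sketch divides the totals above by the message length. The names `HASH_CYCLES` and `cycles_per_byte` are illustrative, and the per-byte figures are derived here from the table, not measured values from the source.

```python
MESSAGE_BYTES = 500  # message length used in the hashing benchmark

# Total cycle counts copied from the hashing table (AVR ATmega).
HASH_CYCLES = {
    "Spongent": 25_464_000,
    "Keccak-f[400]": 1_313_000,
    "Gimli-Hash small": 805_110,
    "Gimli-Hash fast": 362_712,
}


def cycles_per_byte(total_cycles, message_bytes=MESSAGE_BYTES):
    """Average number of cycles spent per hashed byte."""
    return total_cycles / message_bytes


for name, cycles in HASH_CYCLES.items():
    print(f"{name:<18} {cycles_per_byte(cycles):>10.1f} cycles/B")
```

These averages include padding and output generation, so they are not directly comparable to the raw permutation figures in the other tables.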

AVR ATmega

Permutation         Cycles/B  ROM Bytes  RAM Bytes
Gimli small              413        778         44
ChaCha20 [31]            238          b        132
Salsa20 [19]             216      1 750        266
Gimli fast               213     19 218         45
AES-128 small [22]       171      1 570          b
AES-128 fast [22]        155      3 098          b

ARM Cortex-M0

Permutation     Cycles/B  ROM Bytes  RAM Bytes
Gimli                 49      4 730         64
ChaCha20 [23]         40          b          b
Chaskey [21]          17        414          b

ARM Cortex-M3/M4

Permutation         Cycles/B  ROM Bytes  RAM Bytes
Ascon [15]               196          b          b
Keccak-f[400] [30]       106        540          b
AES-128 [25]              34      3 216         72
Gimli                     21      3 972         44
ChaCha20 [18]             13      2 868          8
Chaskey [21]               7        908          b

ARM Cortex-A8

Permutation                   Cycles/B  ROM Bytes  RAM Bytes
Keccak-f[400] (KetjeSR) [8]      37.52          b          b
Ascon [8]                        25.54          b          b
AES-128 [8] many blocks          19.25          b          b
Gimli single block                8.73        480          b
ChaCha20 [8] multiple blocks      6.25          b          b
Salsa20 [8] multiple blocks       5.48          b          b

Intel Haswell

Permutation                   Cycles/B  ROM Bytes  RAM Bytes
Gimli single block                4.46        252          b
NORX-32-4-1 [8] single block      2.84          b          b
Gimli two blocks                  2.33        724          b
Gimli four blocks                 1.77      1 227          b
Salsa20 [8] eight blocks          1.38          b          b
ChaCha20 [8] eight blocks         1.20          b          b
AES-128 [8] many blocks           0.85          b          b

Version: This is version 2017.06.26 of the Speed web page.