Correlation attack

From Wikipedia, the free encyclopedia

Cryptographic attack

Correlation attacks are a class of cryptographic known-plaintext attacks for breaking stream ciphers whose keystreams are generated by combining the output of several linear-feedback shift registers (LFSRs) using a Boolean function. Correlation attacks exploit a statistical weakness that arises from the specific Boolean function chosen for the keystream. While some Boolean functions are vulnerable to correlation attacks, stream ciphers generated using such functions are not inherently insecure.

Explanation

Correlation attacks become possible when a significant correlation exists between the output state of an individual LFSR in the keystream generator and the output of the Boolean function that combines the output states of all the LFSRs. These attacks are employed in combination with partial knowledge of the keystream, which is derived from partial knowledge of the plaintext. The two are then compared using an XOR logic gate. This vulnerability allows an attacker to brute-force the key for the individual LFSR and the rest of the system separately. For instance, in a keystream generator where four 8-bit LFSRs are combined to produce the keystream, if one of the registers is correlated with the Boolean function output, it becomes possible to brute force that register first, followed by the remaining three LFSRs. As a result, the total attack complexity becomes 2⁸ + 2²⁴.

Compared to the cost of launching a brute-force attack on the entire system, with complexity 2³², this represents an attack effort saving factor of just under 256. If a second register is correlated with the function, the process may be repeated, decreasing the attack complexity to 2⁸ + 2⁸ + 2¹⁶ for an effort saving factor of just under 65028.
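As a worked check of these two saving factors, under the same assumption of four 8-bit registers, the ratios of brute-force effort to correlation-attack effort are:

$$\frac{2^{32}}{2^{8}+2^{24}}=\frac{2^{24}}{2^{16}+1}\approx 255.996,\qquad \frac{2^{32}}{2^{8}+2^{8}+2^{16}}=\frac{2^{23}}{2^{7}+1}\approx 65027.97$$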

Example

Geffe generator

One example is the Geffe generator, which consists of three LFSRs: LFSR-1, LFSR-2, and LFSR-3. Let the outputs of these registers be denoted $x_{1}$, $x_{2}$, and $x_{3}$, respectively. Then the Boolean function combining the three registers to provide the generator output is given by $F(x_{1},x_{2},x_{3})=(x_{1}\wedge x_{2})\oplus (\neg x_{1}\wedge x_{3})$ (i.e. ($x_{1}$ AND $x_{2}$) XOR (NOT $x_{1}$ AND $x_{3}$)). There are 2³ = 8 possible values for the outputs of the three registers, and the value of this combining function for each of them is shown in the table below:

Boolean function output table
$x_{1}$	$x_{2}$	$x_{3}$	$F(x_{1},x_{2},x_{3})$
0	0	0	0
0	0	1	1
0	1	0	0
0	1	1	1
1	0	0	0
1	0	1	0
1	1	0	1
1	1	1	1
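
The table can be reproduced mechanically. The following is a minimal Python sketch; the names `geffe`, `rows` and `agreement` are illustrative and not part of any library. It enumerates the eight input combinations in the same order as the table and counts how often each input equals the output.

```python
from itertools import product

def geffe(x1, x2, x3):
    """Geffe combining function: (x1 AND x2) XOR (NOT x1 AND x3)."""
    return (x1 & x2) ^ ((1 - x1) & x3)

# One row per input combination, in the same order as the table above.
rows = [(x1, x2, x3, geffe(x1, x2, x3))
        for x1, x2, x3 in product((0, 1), repeat=3)]

# For each input column, count the rows where it equals the output column.
agreement = [sum(row[i] == row[3] for row in rows) for i in range(3)]

for row in rows:
    print(*row, sep="\t")
print(agreement)  # [4, 6, 6]: x1 agrees 4/8 times, x2 and x3 agree 6/8 times
```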

Consider the output of the third register, $x_{3}$. The table above shows that in 6 of the 8 possible input combinations, $x_{3}$ is equal to the corresponding value of the generator output, $F(x_{1},x_{2},x_{3})$. That is, in 75% of all possible cases, $x_{3}=F(x_{1},x_{2},x_{3})$. Thus LFSR-3 is 'correlated' with the generator. This is a weakness that may be exploited as follows:

Suppose we intercept the cipher text $c_{1},c_{2},c_{3},\ldots ,c_{n}$ of a plain text $p_{1},p_{2},p_{3},\ldots ,p_{n}$ that has been encrypted by a stream cipher using a Geffe generator as its keystream generator, i.e. $c_{i}=p_{i}\oplus F(x_{1i},x_{2i},x_{3i})$ for $i=1,2,3,\ldots ,n$, where $x_{1i}$ is the output of LFSR-1 at time $i$, etc. Suppose further that we know or can guess part of the plain text, e.g. $p_{1},p_{2},p_{3},\ldots ,p_{32}$, the first 32 bits of the plaintext (corresponding to 4 ASCII characters of text). This is not improbable: if the plain text is known to be an XML file that begins with an XML declaration, for instance, its first 4 ASCII characters must be "<?xm". Similarly, many file formats or network protocols have very standard headers or footers. Given the intercepted $c_{1},c_{2},c_{3},\ldots ,c_{32}$ and our known/guessed $p_{1},p_{2},p_{3},\ldots ,p_{32}$, we may easily find $F(x_{1i},x_{2i},x_{3i})$ for $i=1,2,3,\ldots ,32$ by XOR-ing the two together. This gives us 32 consecutive bits of the generator output.
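
The keystream-recovery step amounts to a bytewise XOR. A minimal Python sketch follows, with several assumptions that are not from the text: bits are taken most-significant first within each byte, the known prefix is the hypothetical 4-byte header `b"HTTP"`, and the keystream bytes are made up for the demonstration.

```python
def recover_keystream(ciphertext: bytes, known_plaintext: bytes) -> list[int]:
    """XOR the known plaintext with the matching ciphertext prefix.

    Returns the recovered keystream as a list of bits. Assumption: bits
    are ordered most-significant first within each byte.
    """
    bits = []
    for c, p in zip(ciphertext, known_plaintext):
        k = c ^ p  # since c = p XOR k, we get k = c XOR p
        bits.extend((k >> shift) & 1 for shift in range(7, -1, -1))
    return bits

# Hypothetical demonstration values (not from the article).
known = b"HTTP"
secret_keystream = bytes([0xA5, 0x3C, 0x0F, 0xF0])
ciphertext = bytes(p ^ k for p, k in zip(known, secret_keystream))

keystream_bits = recover_keystream(ciphertext, known)
print(len(keystream_bits))  # 32 recovered bits from 4 known characters
```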

This enables a brute-force search of the space of possible keys (initial values) for LFSR-3 (assuming we know the tapped bits of LFSR-3, an assumption which is in line with Kerckhoffs' principle). For any given key in the key space, we may quickly generate the first 32 bits of LFSR-3's output and compare these to our recovered 32 bits of the entire generator's output. Because we have established earlier that there is a 75% correlation between the output of LFSR-3 and the generator, we know we have correctly guessed the key for LFSR-3 if approximately 24 of the first 32 bits of LFSR-3 output match the corresponding bits of generator output. If we have guessed incorrectly, we should expect roughly half, or 16, of the first 32 bits of these two sequences to match. Thus we may recover the key for LFSR-3 independently of the keys of LFSR-1 and LFSR-2. At this stage, we have reduced the problem of brute forcing a system of 3 LFSRs to the problem of brute forcing a single LFSR and then a system of 2 LFSRs. The amount of effort saved depends on the length of the LFSRs. For realistic values, the saving is very substantial and can make brute-force attacks practical.
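
The search over LFSR-3 keys can be sketched in Python as below. Everything specific here is an assumption made for illustration: the register lengths (5, 6 and 7 bits), the tap positions, the secret initial states, and the Fibonacci-style `lfsr` helper. The sketch also uses 256 known keystream bits rather than 32, so that the gap between the correct key (about 75% agreement) and incorrect keys (about 50%) is easier to see.

```python
from itertools import islice

def lfsr(state, taps, nbits):
    """Yield the output bits of a Fibonacci LFSR.

    `taps` are the state bit positions XORed together to form the feedback.
    """
    while True:
        yield state & 1
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (nbits - 1))

def geffe(x1, x2, x3):
    """Geffe combining function: (x1 AND x2) XOR (NOT x1 AND x3)."""
    return (x1 & x2) ^ ((1 ^ x1) & x3)

# Hypothetical register parameters, assumed known to the attacker.
LFSR1 = dict(taps=(0, 2), nbits=5)
LFSR2 = dict(taps=(0, 1), nbits=6)
LFSR3 = dict(taps=(0, 1), nbits=7)

def keystream(k1, k2, k3, n):
    """First n bits of the Geffe generator for the given initial states."""
    regs = zip(lfsr(k1, **LFSR1), lfsr(k2, **LFSR2), lfsr(k3, **LFSR3))
    return [geffe(a, b, c) for a, b, c in islice(regs, n)]

def attack_lfsr3(observed):
    """Try every nonzero LFSR-3 state; keep the one agreeing most often."""
    scores = {}
    for guess in range(1, 2 ** LFSR3["nbits"]):
        candidate = islice(lfsr(guess, **LFSR3), len(observed))
        scores[guess] = sum(c == o for c, o in zip(candidate, observed))
    best = max(scores, key=scores.get)
    return best, scores[best]

SECRET_K3 = 0b1011001  # hypothetical secret, used only to build the demo
observed = keystream(k1=0b10011, k2=0b110101, k3=SECRET_K3, n=256)
best, matches = attack_lfsr3(observed)
print(best == SECRET_K3, matches / len(observed))
```

Only the 127 nonzero states of LFSR-3 are tried; the states of LFSR-1 and LFSR-2 never enter the search, which is the point of the attack.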

Observe in the table above that $x_{2}$ also agrees with the generator output 6 times out of 8, again a 75% correlation between $x_{2}$ and the generator output. We may therefore mount a brute-force attack against LFSR-2 independently of the keys of LFSR-1 and LFSR-3, leaving only LFSR-1 unbroken. Thus, we are able to break the Geffe generator with about as much effort as is required to brute force 3 entirely independent LFSRs. This means the Geffe generator is a very weak generator and should never be used to generate stream cipher keystreams.

Note from the table above that $x_{1}$ agrees with the generator output 4 times out of 8—a 50% correlation. We cannot use this to brute force LFSR-1 independently of the others: the correct key will yield output that agrees with the generator output 50% of the time, but on average so will an incorrect key. This represents the ideal situation from a security perspective—the combining function $F(x_{1},x_{2},x_{3})$ should be chosen so the correlation between each variable and the combining function's output is as close as possible to 50%. In practice, it may be difficult to find a function that achieves this without sacrificing other design criteria, e.g., period length, so a compromise may be necessary.

Clarifying the statistical nature of the attack

While the above example illustrates the relatively simple concepts behind correlation attacks, it perhaps oversimplifies precisely how the brute forcing of individual LFSRs proceeds. Incorrectly guessed keys will generate LFSR output that agrees with the generator output roughly 50% of the time because, given two random bit sequences of a given length, the probability of agreement between the sequences at any particular bit is 0.5. However, specific individual incorrect keys may well generate LFSR output that agrees with the generator output more or less often than exactly 50% of the time. This is particularly salient for LFSRs whose correlation with the generator is not especially strong; for small enough correlations, an incorrectly guessed key may well also produce LFSR output that agrees with the generator output in as many bits as a correct key would be expected to. Thus, it may not be possible to identify the unique key to that LFSR. It may be possible to identify a number of candidate keys, however, which is still a significant breach of the cipher's security. Moreover, given a megabyte of known plain text, the situation would be substantially different. With a 75% correlation, an incorrect key may generate LFSR output that agrees with more than 512 kilobytes of the generator output, but it is not likely to generate output that agrees with as much as 768 kilobytes of the generator output, as a correctly guessed key would. As a rule, the weaker the correlation between an individual register and the generator output, the more known plain text is required to find that register's key with a high degree of confidence. Estimates of the length of known plain text required for a given correlation can be calculated using the binomial distribution.
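
The binomial estimate can be sketched in Python as follows. This is a rough model, not a definitive analysis: it assumes each bit agrees independently, with probability 1/2 for an incorrect key and 3/4 for the correct key (the Geffe figure), and it places the decision threshold at 5/8 of the bits, midway between the two. The function names and the threshold choice are illustrative assumptions.

```python
from math import comb

def false_alarm(n, k):
    """P[an incorrect key agrees in at least k of n bits], Binomial(n, 1/2)."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n

def miss(n, k, num=3, den=4):
    """P[the correct key agrees in fewer than k of n bits], Binomial(n, num/den)."""
    total = sum(comb(n, i) * num ** i * (den - num) ** (n - i) for i in range(k))
    return total / den ** n

# Both error probabilities shrink as more known keystream becomes available.
for n in (32, 128, 512):
    k = 5 * n // 8  # threshold midway between 50% and 75% agreement
    print(n, k, false_alarm(n, k), miss(n, k))
```

Exact integer sums are used so that the result stays accurate for a few thousand bits; for inputs as large as a megabyte, an approximation to the binomial tail would be needed instead.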

Higher order correlations

Definition

The correlations which were exploited in the example attack on the Geffe generator are examples of what are called first order correlations: they are correlations between the value of the generator output and an individual LFSR. It is possible to define higher-order correlations in addition to these. For instance, while a given Boolean function may have no strong correlation with any of the individual registers it combines, a significant correlation may exist between its output and some Boolean function of two of the registers, e.g., $x_{1}\oplus x_{2}$. This would be an example of a second order correlation. Third order correlations and higher can be defined in the same way.

Higher-order correlation attacks can be more powerful than first-order correlation attacks; however, this effect is subject to a "law of limiting returns". The table below shows a measure of the computational cost for various attacks on a keystream generator consisting of eight 8-bit LFSRs combined by a single Boolean function. The calculation of cost is relatively straightforward: the leftmost term of the sum represents the size of the key space for the correlated registers, and the rightmost term represents the size of the key space for the remaining registers.

Generator attack effort
Attack	Effort (size of keyspace)
Brute force	$2^{8\times 8}=18446744073709551616$
Single 1st order correlation attack	$2^{8}+2^{7\times 8}=72057594037928192$
Single 2nd order correlation attack	$2^{2\times 8}+2^{6\times 8}=281474976776192$
Single 3rd order correlation attack	$2^{3\times 8}+2^{5\times 8}=1099528404992$
Single 4th order correlation attack	$2^{4\times 8}+2^{4\times 8}=8589934592$
Single 5th order correlation attack	$2^{5\times 8}+2^{3\times 8}=1099528404992$
Single 6th order correlation attack	$2^{6\times 8}+2^{2\times 8}=281474976776192$
Single 7th order correlation attack	$2^{7\times 8}+2^{8}=72057594037928192$
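
The figures in the table can be recomputed with a short Python sketch. The function name `attack_effort` and its parameters are illustrative; the defaults match the eight 8-bit registers assumed above.

```python
def attack_effort(order, registers=8, bits=8):
    """Key-space size for a single correlation attack of the given order.

    The first term covers the `order` correlated registers, the second
    term covers the remaining registers.
    """
    return 2 ** (order * bits) + 2 ** ((registers - order) * bits)

print("Brute force", 2 ** (8 * 8))
for order in range(1, 8):
    print(order, attack_effort(order))

# The effort is smallest when the registers are split evenly (order 4).
cheapest = min(range(1, 8), key=attack_effort)
```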

While higher-order correlations lead to more powerful attacks, they are also more difficult to find, because the space of Boolean functions that could be correlated against the generator output grows with the number of arguments to the function.

Terminology

A Boolean function $F(x_{1},\ldots ,x_{n})$ of $n$ variables is said to be "$m$-th order correlation immune", or to have "$m$-th order correlation immunity" for some integer $m$, if no significant correlation exists between the function's output and any Boolean function of $m$ of its inputs. For example, a Boolean function that has no first-order or second-order correlations, but which does have a third-order correlation, exhibits 2nd order correlation immunity. Higher correlation immunity makes a function more suitable for use in a keystream generator (although this is not the only thing that needs to be considered).

Siegenthaler showed that the correlation immunity $m$ of a Boolean function of algebraic degree $d$ of $n$ variables satisfies $m+d\leq n$; for a given set of input variables, this means that a high algebraic degree will restrict the maximum possible correlation immunity. Furthermore, if the function is balanced then $m\leq n-1$, since a balanced function is not constant and therefore has $d\geq 1$. [1]

It follows that it is impossible for a balanced function of $n$ variables to be $n$-th order correlation immune. This also follows from the fact that any such function can be written using a Reed-Muller basis as a combination of XORs of the input functions.
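
For small functions, the correlation immunity order can be found by brute force. The Python sketch below is one way to do it and is not taken from the cited sources. It treats "no correlation" as exact statistical independence: fixing any $m$ inputs to any values must leave the fraction of 1 outputs unchanged. The name `correlation_immunity` and the two example functions are illustrative.

```python
from itertools import combinations, product

def correlation_immunity(f, n):
    """Largest m such that the output of f is independent of every set of
    m inputs; returns 0 if even a single input is correlated."""
    inputs = list(product((0, 1), repeat=n))
    ones = sum(f(*x) for x in inputs)
    order = 0
    for m in range(1, n + 1):
        for subset in combinations(range(n), m):
            for values in product((0, 1), repeat=m):
                fixed_ones = sum(
                    f(*x) for x in inputs
                    if all(x[i] == v for i, v in zip(subset, values))
                )
                # Independence: fixed_ones / 2^(n-m) must equal ones / 2^n.
                if fixed_ones * 2 ** m != ones:
                    return order
        order = m
    return order

def geffe(x1, x2, x3):
    return (x1 & x2) ^ ((1 - x1) & x3)

def parity(x1, x2, x3):
    return x1 ^ x2 ^ x3

print(correlation_immunity(geffe, 3))   # 0: x2 and x3 are each correlated
print(correlation_immunity(parity, 3))  # 2: degree 1, so m + d = n = 3
```

The three-variable XOR meets the bound $m+d\leq n$ with equality, while the Geffe function has no correlation immunity at all.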

Cipher design implications

Given the potentially severe impact of a correlation attack on a stream cipher's security, it is essential to test a candidate Boolean combination function for correlation immunity before deciding to use it in a stream cipher. However, it is important to note that high correlation immunity is a necessary, but not sufficient, condition for a Boolean function to be appropriate for use in a keystream generator. There are other issues to consider, for example, whether or not the function is balanced, that is, whether it outputs as many or roughly as many 1's as it does 0's when all possible inputs are considered.

Research has been conducted into methods for easily generating Boolean functions of a given size which are guaranteed to have at least some particular order of correlation immunity. This research has uncovered links between correlation immune Boolean functions and error correcting codes. [2]

References

1. T. Siegenthaler (September 1984). "Correlation-Immunity of Nonlinear Combining Functions for Cryptographic Applications". IEEE Transactions on Information Theory. 30 (5): 776–780. doi:10.1109/TIT.1984.1056949.
2. Chuan-Kun Wu and Ed Dawson, Construction of Correlation Immune Boolean Functions. Archived 2006-09-07 at the Wayback Machine. ICICS97.

Bruce Schneier. Applied Cryptography: Protocols, Algorithms and Source Code in C, Second Edition. John Wiley & Sons, Inc. 1996. ISBN 0-471-12845-7. Page 382 of section 16.4: Stream Ciphers Using LFSRs.

External links

The Online Database of Boolean Functions allows visitors to search a database of Boolean factors in several ways, including by correlation immunity.
