Journals / Cryptography / Volume 2 / Issue 3 / 10.3390/cryptography2030025
Article

On the Performance and Security of Multiplication in GF(2^N)

1 LTCI, Télécom ParisTech, Université Paris-Saclay, 75013 Paris, France
2 Secure-IC S.A.S., 35510 Cesson-Sévigné, France
3 Département d’Informatique, École Normale Supérieure, CNRS, PSL Research University, 75005 Paris, France
4 Département Mathématique et Informatique, Université Cheikh Anta Diop, Dakar 5005, Senegal
5 Department of Mathematical Sciences, Florida Atlantic University, Boca Raton, FL 33431, USA
* Authors to whom correspondence should be addressed.
Submission received: 2 August 2018 / Revised: 4 September 2018 / Accepted: 13 September 2018 / Published: 18 September 2018
(This article belongs to the Special Issue Code-Based Cryptography)

Abstract:

Multiplications in GF(2^N) can be securely optimized for cryptographic applications when the integer N is small and does not match machine words (i.e., N < 32). In this paper, we present a set of optimizations applied to DAGS, a code-based post-quantum cryptographic algorithm and one of the submissions to the National Institute of Standards and Technology’s (NIST) Post-Quantum Cryptography (PQC) standardization call.

    1. Introduction

    Arithmetic in GF(2^N) is very attractive since addition is carry-less. This is why it is adopted in many cryptographic algorithms, which are thus efficient both in hardware (no carry means no long delays) and in software implementations.
    In this article, we focus on software multiplication in GF(2^N), and more specifically for small N. When N is smaller than a machine word size (that is, N < 32 or 64, on typical smartphones or desktops), all known window-based computational optimizations become irrelevant.
    Our goal is to compute fast multiplications (since sums are trivially executed with the native XOR operation of computer instruction sets) that are secure with respect to cache-timing attacks. Therefore, we look for regular multiplication algorithms, that is, algorithms whose control flow does not depend on the data. Our method is not to come up with novel algorithms for multiplication, but to organize the computations in such a way that the resources of the computer are utilized optimally. Our contribution is thus to explore how to load the machine in the most efficient way while remaining regular. We leverage the fact that regular algorithms can be executed SIMD (Single Instruction Multiple Data); hence, they are natural candidates for bitslicing or similar types of parallel processing of packed operations. We compare these operations with those which are insecure and those which resort to special instructions (such as Intel Carry-Less MULtiplication, PCLMULQDQ). We conclude that the most efficient implementations are SIMD, and hence benefit from both performance and security at no extra overhead. On top of that, we show that computations can be faster when mapping elements from tower fields GF((2^ℓ)^m) to isomorphic fields GF(2^N), where N = ℓm. Previously, Paar [1] proposed an exhaustive search method to map elements between fixed isomorphic representations, and Sunar et al. [2] suggested a towering construction, from GF(2^N) to GF((2^ℓ)^m). In this paper, we show a use-case where subsuming GF((2^ℓ)^m) into GF(2^N) is beneficial to computation speed.
    In the next section, we will show that computation in fields of characteristic two is a key building block for a wide array of cryptographic algorithms, and then cover some mathematical aspects related to computation in GF(2^N) (where all computations are checked with SAGE). Our contribution is presented in Section 3. We also provide an application of our techniques to the DAGS Key Encapsulation Mechanism. DAGS was submitted to NIST [3] as one of the candidates for Post-Quantum Standardization, and all the related documentation (including the reference code) is publicly available [4]. Finally, we conclude in Section 5. The algorithms tested in this paper are given in C language in Appendix A.

    2. The Field GF(2^N) in Cryptography: Arithmetic and Suitability

    2.1. Application to Block Ciphers

    Block ciphers are the most important and most used symmetric cryptographic algorithms. Since 2001, the Advanced Encryption Standard (AES) [5] has become the most popular block cipher. AES is based on computations over the finite field GF(2^N) with N = 8, which maps naturally to computer architectures. Representing bytes as elements of GF(2^8) allows for expressing the confusion operation (named SubBytes) through a multiplicative inverse in the finite field. This has the necessary properties to thwart differential and linear cryptanalyses, as well as attacks that use algebraic manipulations, such as interpolation attacks.

    2.2. Application to Classical Public-Key Cryptography

    Elliptic curve cryptography can be executed efficiently on fields of characteristic two; NIST standardizes the Koblitz curves K-163, K-233, K-283, K-409, and K-571.

    2.3. Application to Post-Quantum Public-Key Cryptography

    As quantum systems start surfacing on the horizon, the importance of Post-Quantum Cryptography (PQC) is being recognized globally, to the point that NIST has launched a call for Post-Quantum Standardization [3]. Due to its inherent resistance to attacks by quantum computers, code-based cryptography is one of the main candidates for the task, alongside multivariate and lattice-based schemes. Although the original McEliece cryptosystem [6] is almost as old as RSA [7] and Diffie–Hellman [8], it has never been largely deployed, mainly due to large key sizes. There is thus an opportunity to revive and refine the area by developing more memory-efficient primitives. However, multiple variants of the McEliece cryptosystem have been broken so far, as recalled by Bardet et al. in [9], and in practice the secure schemes are restricted to just a few families of codes, like Goppa and Generalized Srivastava codes. These schemes, by definition, need to handle elements in an extension of the base field GF(2), and are thus of interest to us.
    Among the candidates accepted for the first round [10] of the PQC competition, there are several schemes which perform operations in GF(2^N) and fail to do so in a side-channel-resistant way, at least in their reference implementation. The exhaustive list is given in Table 1.
    Note that most fields GF(2^N) have N smaller than the typical size of machine words.
    The vulnerability to side-channel attacks mainly stems from two aspects. For small extensions (up to N = 8 or slightly higher), multiplication of elements in GF(2^N) is implemented using log-antilog tables, using the fact that a × b = log⁻¹(log(a) + log(b)), for a, b ≠ 0. The logarithm and antilogarithm of all elements in GF(2^N) are tabulated, and a multiplication merely consists of three table accesses (two in the log table, one in the antilog table, or vice versa). However, this creates a data-dependent table access, and the operands could potentially be recovered using standard side-channel attacks such as FLUSH+RELOAD [11]. For big extensions, computing these tables is too costly and the implementation of operations in GF(2^N) is handled differently. However, the multiplications are executed by taking data-dependent branches; which branch is taken could be recovered via similar attack techniques, thereby revealing once again the operands of the multiplication.
    We checked for these kinds of potential leaks using a static analysis tool [12] specifically developed for tracking microarchitectural side-channels, including data-dependent table accesses and data-dependent branches. This tool requires the user to specify which variables are sensitive, such as the secret key or the randomness used during signature or encryption. It then performs a dependency analysis and determines whether any variable depending on sensitive values is used as an index for a table access or as the condition variable of a branching operation (If, While, Switch or the stop condition in a For loop).

    2.4. Arithmetic in Extensions of GF(2)

    Let GF(2^N) denote the degree-N extension field defined over GF(2). Let α be a primitive element of GF(2^N). The set
    B = {1, α, α^2, …, α^(N−1)}
    is a basis of GF(2^N) over GF(2), referred to as a polynomial basis. Thus, given an element A ∈ GF(2^N), we can write
    A = Σ_{i=0}^{N−1} a_i α^i,
    where the coefficients a_0, a_1, …, a_{N−1} ∈ GF(2) = {0, 1}. Then, the field arithmetic can be derived using the chosen basis.
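    As a concrete illustration (our own, not from the paper's appendix): the coefficient vector (a_0, …, a_{N−1}) maps naturally to an N-bit machine integer whose bit i holds a_i, and addition in GF(2^N) is then a single XOR, independently of the generator polynomial.

```c
#include <stdint.h>

/* Pack A = a_0 + a_1*x + ... + a_{N-1}*x^{N-1} into an integer:
 * bit i holds coefficient a_i. Addition in GF(2^N) is carry-less,
 * so it reduces to a plain XOR of the packed representations. */
static uint16_t gf_add(uint16_t a, uint16_t b)
{
    return a ^ b;
}
```

    For example, (x^3 + x + 1) + (x + 1) = x^3, i.e., gf_add(0xB, 0x3) yields 0x8; every element is its own additive inverse.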

    2.5. Tower Fields Representation

    Depending on the choice of the basis B, the elements of GF(2^N) can be defined differently. If N is the product of two integers ℓ and m, then GF(2^N) can be defined over GF(2^ℓ). In the rest of the paper, we call GF((2^ℓ)^m) a composite field and GF(2^ℓ) the ground field. Note that GF((2^ℓ)^m) and GF(2^N) refer to the same field, although their representation methods are different.
    Given two representations of the finite field GF(2^N), it is possible to map one to the other thanks to a conversion matrix. The first representation is GF(2^N) as an extension of GF(2) and the second representation is GF(2^N) as an extension of GF(2^ℓ), where N = ℓm for ℓ, m ∈ ℕ. Here, the elements of GF(2^N) are polynomials of degree at most N − 1 whose coefficients are in GF(2) = {0, 1}, and the elements of GF((2^ℓ)^m) are polynomials of degree at most m − 1 whose coefficients are in GF(2^ℓ). Hence, we write in Kronecker style
    GF(2^ℓ) = GF(2)[γ]/P, GF((2^ℓ)^m) = GF(2^ℓ)[β]/Q, GF(2^N) = GF(2)[α]/D,
    where:
    • P(x) = x^ℓ + … + p_1 x + 1 is an irreducible polynomial over GF(2) of degree ℓ,
    • Q(x) = x^m + … + q_1 x + 1 is an irreducible polynomial over GF(2^ℓ) of degree m,
    • D(x) = x^N + … + d_1 x + 1 is an irreducible polynomial over GF(2) of degree N,
    and where γ, β and α are their respective roots. Thus, the elements of GF(2^ℓ), GF((2^ℓ)^m) and GF(2^N) are the residue classes modulo their respective irreducible polynomials. Such polynomials always exist [13]. In general, the number of irreducible polynomials of degree N with coefficients in GF(q) is given by
    L_q(N) = (1/N) Σ_{d|N} q^d μ(N/d),
    where μ(k) is the Möbius function defined by
    μ(k) = 0 if k has one or more repeated prime factors, μ(k) = 1 if k = 1, and μ(k) = (−1)^n if k is a product of n distinct primes.
    For instance, the number of irreducible polynomials of degree 12 over GF(2) (GF(2^12) being the field used in DAGS) is
    L_2(12) = (1/12) Σ_{d|12} 2^d μ(12/d) = 335,
    and thus multiple representations can be derived for the same element in the field.
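    The count above can be checked mechanically. The following C sketch (our own illustration, not part of the DAGS reference code) evaluates L_q(N) directly from the Möbius function:

```c
/* Moebius function: 0 if k has a squared prime factor,
 * otherwise (-1)^(number of distinct prime factors). */
static int mobius(int k)
{
    int m = 1;
    for (int p = 2; p * p <= k; p++) {
        if (k % p == 0) {
            k /= p;
            if (k % p == 0)
                return 0;   /* repeated prime factor */
            m = -m;
        }
    }
    if (k > 1)
        m = -m;             /* one remaining prime factor */
    return m;
}

/* L_q(N) = (1/N) * sum over d dividing N of q^d * mobius(N/d) */
static long long num_irreducible(long long q, int n)
{
    long long s = 0;
    for (int d = 1; d <= n; d++) {
        if (n % d != 0)
            continue;
        long long qd = 1;
        for (int i = 0; i < d; i++)
            qd *= q;        /* q^d */
        s += mobius(n / d) * qd;
    }
    return s / n;
}
```

    num_irreducible(2, 12) returns 335, matching the count quoted above; as a sanity check, num_irreducible(2, 2) is 1 (only x^2 + x + 1 is irreducible).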

    2.6. Composite Fields and Fields Mapping

    Since GF((2^ℓ)^m) and GF(2^N) refer to the same field, they are isomorphic [13]. However, although the two fields’ representations are isomorphic, the algorithmic complexity of their field operations may differ, depending on the polynomials Q and D. A binary N × N matrix T can be derived to map elements of GF((2^ℓ)^m) to elements of GF(2^N). The inverse of T, denoted T^−1, will perform the mapping the other way around. The conversion problem was addressed by Paar in [1]. In this work, conversion matrices are derived from GF(2^N) and GF((2^ℓ)^m) that are already fixed by their generating polynomials. The construction is based on finding a relation between the primitive elements γ and α such that
    • α^r = γ for some known integer r, and
    • D(α^r) ≡ 0 (mod P, Q).
    Since there is no established mathematical connection between α and γ, an exhaustive search is needed. In a related work [2], Sunar et al. redefined the problem. Instead of finding a conversion matrix, the paper proposes to construct the composite field given the field of characteristic two. Here, we recall the results and examine the problem in a slightly different way: our aim is to find a suitable isomorphic representation to construct the field of characteristic two given the ground and extension fields.
    Theorem 1.
    For β ∈ GF(2^(ℓm)) and γ = β^r where r = (2^(ℓm) − 1)/(2^ℓ − 1):
    (1)
    β^r ∈ GF(2^ℓ),
    (2)
    if β is a primitive element, then γ is primitive in GF(2^ℓ).
    Proof.
    Chapter 2, [13]. ☐
    Let GF(2^ℓ) = GF(2)[γ]/P be the ground field. The extension field GF((2^ℓ)^m) can be constructed using the polynomial
    S_β(x) = (x + β)(x + β^(2^ℓ))(x + β^(2^(2ℓ)))⋯(x + β^(2^((m−1)ℓ))) = s_m x^m + … + s_1 x + s_0.
    Noting that s_m = 1 and using Vieta’s formulas, we obtain
    s_{m−1} = β + β^(2^ℓ) + β^(2^(2ℓ)) + ⋯ + β^(2^((m−1)ℓ)),
    s_{m−2} = (ββ^(2^ℓ) + ββ^(2^(2ℓ)) + ⋯ + ββ^(2^((m−1)ℓ))) + (β^(2^ℓ)β^(2^(2ℓ)) + ⋯ + β^(2^ℓ)β^(2^((m−1)ℓ))) + ⋯ + β^(2^((m−2)ℓ))β^(2^((m−1)ℓ)),
    ⋮
    s_0 = (−1)^m ββ^(2^ℓ)β^(2^(2ℓ))⋯β^(2^((m−1)ℓ)),
    or, equivalently, the (m−k)-th coefficient s_{m−k} is related to a signed sum of all possible subproducts of the roots, taken k at a time:
    s_{m−k} = (−1)^k Σ_{1 ≤ i_1 < i_2 < ⋯ < i_k ≤ m} β^(2^((i_1−1)ℓ)) β^(2^((i_2−1)ℓ)) ⋯ β^(2^((i_k−1)ℓ)).
    Thus, given a ground field GF(2^ℓ) and its extension field GF((2^ℓ)^m), we are looking for a field GF(2^N) with primitive element α such that:
    • P(α^r) = 0 where r = (2^(ℓm) − 1)/(2^ℓ − 1), hence α^r = γ,
    • Q(x) = S_α(x), hence q_{m−k} = (−1)^k Σ_{1 ≤ i_1 < ⋯ < i_k ≤ m} α^(2^((i_1−1)ℓ)) ⋯ α^(2^((i_k−1)ℓ)).
    Once we find the suitable representation of GF(2^N), we derive the conversion matrix as follows. Any element A in GF(2^N) has two representations:
    A = Σ_{i=0}^{N−1} a_i α^i = Σ_{j=0}^{m−1} a′_j β^j, with a_i ∈ GF(2) and a′_j ∈ GF(2^ℓ).
    We showed that our construction allows β^r to be primitive in GF(2^ℓ). Thus, {1, β^r, β^(2r), …, β^((ℓ−1)r)} is a basis of GF(2^ℓ) and we can write
    a′_j = Σ_{i=0}^{ℓ−1} a′_{ji} β^(ri).
    Thus, A = Σ_{j=0}^{m−1} a′_j β^j = Σ_{j=0}^{m−1} Σ_{i=0}^{ℓ−1} a′_{ji} β^(ri) β^j = Σ_{j=0}^{m−1} Σ_{i=0}^{ℓ−1} a′_{ji} β^(ri+j).
    Then, the terms β^(ri+j) are reduced using the generating polynomial of GF(2^N):
    β^(ri+j) = Σ_{f=0}^{N−1} t_f β^f, with t_f ∈ GF(2),
    where the t_f depend on i and j. These are the coefficients of the conversion matrix. In the end, we will have
    A = Σ_{j=0}^{m−1} Σ_{i=0}^{ℓ−1} Σ_{f=0}^{N−1} a′_{ji} t_f β^f.    (1)
    The N × N conversion matrix T with coefficients in GF(2) is obtained from Equation (1):
    (a_0, …, a_{N−1})ᵀ = T · (a′_{00}, …, a′_{(m−1)(ℓ−1)})ᵀ.
    The conversion matrix from the field to the composite field is then T^−1.
    Example 1.
    Let the ground and extension fields be
    GF(2^6) = GF(2)[γ]/(γ^6 + γ + 1), GF((2^6)^2) = GF(2^6)[β]/(β^2 + γ^34 β + γ).
    The field GF(2^12) = GF(2)[α]/P with (α^65)^6 + α^65 + 1 = 0 (i.e., α^65 = γ) and α^64 + α = γ^34 is GF(2)[α]/(α^12 + α^11 + α^8 + α^6 + 1). Now, we show the construction of the conversion matrices. An element A in GF((2^6)^2) is expressed as
    A = a′_0 + a′_1 α,
    where a′_j ∈ GF(2^6). We can express a′_j in GF(2^6) using γ = α^65 as the basis element:
    a′_j = a′_{j0} + a′_{j1}γ + a′_{j2}γ^2 + a′_{j3}γ^3 + a′_{j4}γ^4 + a′_{j5}γ^5 = a′_{j0} + a′_{j1}α^65 + a′_{j2}α^130 + a′_{j3}α^195 + a′_{j4}α^260 + a′_{j5}α^325,
    where a′_{ji} ∈ GF(2) for j ∈ {0, 1} and i ∈ {0, 1, 2, 3, 4, 5}. Thus, the representation of A in the composite field is
    A = a′_{00} + a′_{01}α^65 + a′_{02}α^130 + a′_{03}α^195 + a′_{04}α^260 + a′_{05}α^325 + a′_{10}α + a′_{11}α^66 + a′_{12}α^131 + a′_{13}α^196 + a′_{14}α^261 + a′_{15}α^326.    (2)
    The next step is to reduce the terms α^(65i+j) for j ∈ {0, 1} and i ∈ {0, 1, 2, 3, 4, 5} using the generating polynomial P(x). This will give the terms of Equation (2) with α exponents between 0 and 11. The reduction modulo P(x) is done by successively using the relation α^12 = α^11 + α^8 + α^6 + 1. We then obtain the representation of A in the field GF(2^12) using the basis {1, α, α^2, …, α^11} as
    A = b_0 + b_1 α + b_2 α^2 + ⋯ + b_{11} α^11.
    The entries of the 12 × 12 matrix T are determined by the relationship between the terms b_h for h = 0, 1, …, 11 and a′_{ji} for j ∈ {0, 1} and i ∈ {0, 1, 2, 3, 4, 5}. Gathering all the terms of {1, α, α^2, …, α^11}, we obtain
    T =
    100010001111
    001011100010
    010000001011
    001000010000
    011110001000
    010001011110
    010010011110
    000111010010
    010111001000
    010101010111
    011111010101
    001111010000
    T^−1 =
    100111110011
    001111001011
    000011111111
    000101100001
    000000010111
    000001110111
    010001110001
    000111111111
    001011000010
    000000101111
    000011101110
    001111100111
    The matrix T gives the representation of an element in the field GF(2^12) given its representation in the composite field GF((2^6)^2). The inverse transformation, i.e., the conversion from GF(2^12) to GF((2^6)^2), requires the computation of T^−1.
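    Applying T (or T^−1) amounts to one binary matrix–vector product over GF(2). A generic C sketch (our illustration; for the actual conversion, the rows of T above would be packed into the row words):

```c
#include <stdint.h>

/* Multiply a binary n x n matrix by a packed coefficient vector over
 * GF(2): output bit f is the parity of (row f AND input). Each matrix
 * row is packed LSB-first into one machine word. */
static uint16_t gf2_matvec(const uint16_t rows[], int n, uint16_t a)
{
    uint16_t out = 0;
    for (int f = 0; f < n; f++) {
        uint16_t x = rows[f] & a;
        uint16_t p = 0;
        while (x) {                     /* parity of the AND */
            p ^= 1;
            x &= (uint16_t)(x - 1);     /* clear lowest set bit */
        }
        out |= (uint16_t)(p << f);
    }
    return out;
}
```

    The POPCNT-based variant mentioned in Section 4 (Algorithm A9) replaces the parity loop with a single population-count instruction per row.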

    3. Results and Discussion

    We start by presenting state-of-the-art multiplication algorithms in GF(2^N) for small values of N, i.e., when N is smaller than the machine word. Those algorithms are insecure. Then, we present variants that are secure with respect to cache-timing attacks.

    3.1. Multiplication in GF(2^N)

    Fast implementation techniques for GF(2^N) arithmetic have been studied intensively by researchers. Among the various arithmetic operations, GF(2^N) multiplication has gained the most attention.
    For small N, the multiplication is usually carried out using look-up tables, called log-antilog tables. The table initialization algorithms and the derived tabulated multiplication are given in Algorithms 1 and 2, and in Algorithms A1 and A2 in Appendix A as C code. This method (Algorithm A3 in Appendix A) requires a special test for the zero operand: log(0) is undefined, so 0 is mapped to −1 in the log table and never used; furthermore, this method is in any case vulnerable to cache-timing attacks such as PRIME+PROBE [14] and FLUSH+RELOAD [11], which target the Last-Level Cache. In a cryptographic scheme where critical operations such as key generation or encryption use log-antilog multiplication over GF(2^N), the difference in memory access time to the lookup tables caused by the cache may leak information about the secret key. These attacks first completely fill the cache with the attacker’s data. The critical operation is run and, as it is running, the parts of the tables that it uses are loaded from main memory into the cache. Since the cache is full of the attacker’s data, some of it has to be evicted to make room. Once the operation is done, the attacker analyses which parts of his data have been evicted; this tells him which table indexes were used, leaking information about the secret key.
    Algorithm 1 Initialization of the antilog table
    Require: The finite field GF(2^n) and its generator polynomial P.
    Ensure: The antilog table.
    1: antilog[0] = 1
    2: for i in range(1, 2^n) do
    3:   antilog[i] = antilog[i−1] << 1                    ▹ Shift to the left
    4:   if antilog[i−1] ≥ 2^(n−1) then
    5:     antilog[i] = antilog[i] ⊕ P            ▹ XOR with the generator polynomial
    6: antilog[2^n − 1] = 1
    Algorithm 2 Initialization of the log table
    Require: the antilog table.
    Ensure: the log table.
    1: log[0] = −1                    ▹ log(0) is undefined
    2: log[1] = 0
    3: for i in range(1, 2^n) do
    4:   log[antilog[i]] = i
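    A self-contained C sketch of Algorithms 1 and 2 for GF(2^6) with the DAGS polynomial x^6 + x + 1 (our own rewrite, not the reference code of Appendix A):

```c
#include <stdint.h>

#define GF64_POLY 0x43  /* x^6 + x + 1, packed LSB-first */

static uint8_t gf_antilog[64];
static int8_t  gf_log[64];

static void gf64_init_tables(void)
{
    gf_antilog[0] = 1;                    /* alpha^0 = 1 */
    for (int i = 1; i < 63; i++) {
        int t = gf_antilog[i - 1] << 1;   /* multiply by alpha */
        if (t & 0x40)
            t ^= GF64_POLY;               /* reduce the overflow bit */
        gf_antilog[i] = (uint8_t)t;
    }
    gf_log[0] = -1;                       /* log(0) is undefined */
    for (int i = 0; i < 63; i++)
        gf_log[gf_antilog[i]] = (int8_t)i;
}

/* Tabulated multiplication: a*b = antilog(log a + log b) for a, b != 0.
 * Note the data-dependent table accesses that Section 3.2 warns about. */
static uint8_t gf64_mul_tab(uint8_t a, uint8_t b)
{
    if (a == 0 || b == 0)
        return 0;
    return gf_antilog[(gf_log[a] + gf_log[b]) % 63];
}
```

    For example, α · α^5 = α^6 = α + 1, i.e., gf64_mul_tab(0x02, 0x20) yields 0x03.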
    The tabulated method may sometimes be too costly for large N, and the multiplication is then handled differently. One may use tower field arithmetic and store lookup tables for GF(2^ℓ), where ℓ is taken small such that ℓ divides N (Algorithm A4 in Appendix A). Tower field arithmetic is slow, but we can perform conversions both ways with respect to the theory presented in Section 2 (Algorithm 3).
    Algorithm 3 Multiplication in the tower field GF((2^6)^2)
    Require: two polynomials x = {x_i}, y = {y_i} and an extension polynomial p of degree 2
    Ensure: polynomial r = x·y = {r_i}
    1: a1 = x >> 6
    2: a2 = y >> 6
    3: b1 = x & 63
    4: b2 = y & 63
    5: tmp1 = mult(a1, a2)
    6: a3 = mult(tmp1, p1) ⊕ mult(a1, b2) ⊕ mult(b1, a2)      ▹ p1 the coefficient of x in p
    7: b3 = mult(tmp1, p0) ⊕ mult(b1, b2)              ▹ p0 the constant term in p
    8: r = (a3 << 6) ⊕ b3
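    Algorithm 3 instantiated with the DAGS extension polynomial (p1 = γ^34, p0 = γ) can be sketched in C as follows. This is our own illustration: 36 (0b100100) is the packed representation of γ^34 for P(x) = x^6 + x + 1, and 2 represents γ; the ground-field multiplication is a plain shift-and-add rather than the tabulated Algorithm A4.

```c
#include <stdint.h>

/* Shift-and-add multiplication in the ground field GF(2^6),
 * generator x^6 + x + 1. */
static uint8_t gf64_mul(uint8_t a, uint8_t b)
{
    uint8_t r = 0;
    for (int i = 0; i < 6; i++) {
        if (b & 1)
            r ^= a;
        b >>= 1;
        a <<= 1;
        if (a & 0x40)
            a ^= 0x43;      /* x^6 = x + 1 */
    }
    return r;
}

/* Algorithm 3: multiplication in GF((2^6)^2) = GF(2^6)[beta] with
 * beta^2 = p1*beta + p0, where p1 = gamma^34 (packed 36) and
 * p0 = gamma (packed 2). Operands pack as (a << 6) | b. */
static uint16_t tower_mul(uint16_t x, uint16_t y)
{
    uint8_t a1 = x >> 6, b1 = x & 63;
    uint8_t a2 = y >> 6, b2 = y & 63;
    uint8_t tmp1 = gf64_mul(a1, a2);
    uint8_t a3 = gf64_mul(tmp1, 36) ^ gf64_mul(a1, b2) ^ gf64_mul(b1, a2);
    uint8_t b3 = gf64_mul(tmp1, 2) ^ gf64_mul(b1, b2);
    return (uint16_t)((uint16_t)a3 << 6) | b3;
}
```

    As a check, β · β = β^2 = γ^34 β + γ, i.e., tower_mul(64, 64) yields (36 << 6) | 2 = 2306.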
    A straightforward alternative method is iterative multiplication. This is done by performing polynomial multiplication and conditional reduction modulo the generator polynomial (Algorithm 4, and Algorithm A5 in Appendix A as C code). Iterative methods cannot be executed without taking data-dependent branches and are thus vulnerable to branch prediction analysis [15,16]. In fact, the information leakage is based on the timing differences produced by the branch prediction unit (BPU).
    Algorithm 4 Iterative multiplication with conditional reduction
    Require: Two polynomials X = {x_i}, Y = {y_i} of degree at most n and a reduction polynomial P of degree n
    Ensure: Polynomial R = X·Y = {r_i} of degree at most n
    1: R ← 0
    2: for i in range(n) do
    3:   if y_i = 1 then
    4:     R = R + X·x^i
    5:   if order(R) > n then
    6:     reduce R by P                      ▹ polynomial division
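    A C rendering of this conditional-reduction scheme for GF(2^12) with the DAGS pentanomial x^12 + x^11 + x^8 + x^6 + 1 (reduced form 0x941) might look as follows. This is our own sketch, and the two data-dependent branches are precisely the leak discussed above:

```c
#include <stdint.h>

/* Shift-and-add multiplication in GF(2^12) with CONDITIONAL reduction.
 * Both 'if' statements branch on operand-dependent bits and therefore
 * leak timing through the branch prediction unit. */
static uint16_t gf4096_mul_branchy(uint16_t a, uint16_t b)
{
    uint16_t r = 0;
    for (int i = 0; i < 12; i++) {
        if ((b >> i) & 1)       /* branch on a (possibly secret) bit of b */
            r ^= a;
        if (a & 0x800)          /* branch on the carry out of degree 11 */
            a = (uint16_t)((a << 1) & 0xFFF) ^ 0x941; /* x^12 = x^11+x^8+x^6+1 */
        else
            a <<= 1;
    }
    return r;
}
```

    For instance, α^11 · α = α^12, i.e., gf4096_mul_branchy(0x800, 0x2) yields 0x941.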

    3.2. Secure Computation in GF(2^N)

    There are many countermeasures to prevent cache-timing attacks, but they may affect computation performance. Here, we propose a trade-off between secure and fast GF(2^N) computation. One countermeasure is a constant-time implementation, where the execution time does not depend on the secret key or input data. In the case of tabulated multiplication, this cannot be achieved; in the case of iterative methods, however, we can replace the conditional reduction with an unconditional one. This means that the reduction modulo the generator polynomial is performed at each iteration and therefore no timing information is leaked (Algorithm 5, and Algorithm A6 in Appendix A as C code).
    Algorithm 5 Iterative multiplication with unconditional reduction
    Require: Two polynomials X = {x_i}, Y = {y_i} of degree at most n and a reduction polynomial P of degree n
    Ensure: Polynomial R = X·Y = {r_i} of degree at most n
    1: R ← 0
    2: for i in range(n) do
    3:   m ← −y_i                                ▹ the mask
    4:   R = R ⊕ (X ∧ m)
    5:   m ← −x_n                               ▹ the mask
    6:   reduce X by P using m                         ▹ masked polynomial reduction
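    A constant-time C rendering of this masking scheme, again for GF(2^12) with the DAGS pentanomial (our own sketch, mirroring the idea of Algorithm A6 rather than reproducing it):

```c
#include <stdint.h>

/* Shift-and-add multiplication in GF(2^12) with UNCONDITIONAL reduction:
 * both the accumulation and the reduction are selected by all-ones /
 * all-zeros masks, so the instruction stream never depends on operands. */
static uint16_t gf4096_mul_ct(uint16_t a, uint16_t b)
{
    uint16_t r = 0;
    for (int i = 0; i < 12; i++) {
        uint16_t m = (uint16_t)(-(uint16_t)((b >> i) & 1));  /* 0xFFFF or 0 */
        r ^= a & m;                                          /* masked add */
        uint16_t c = (uint16_t)(-(uint16_t)((a >> 11) & 1)); /* carry mask */
        a = (uint16_t)((a << 1) & 0xFFF) ^ (c & 0x941);      /* masked reduce */
    }
    return r;
}
```

    The masks rely on the two's complement trick described in the text: negating a bit yields either the all-zeros or the all-ones word.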
    Basically, this code adds the term X·x^i if y_i is one (where X and Y are the polynomials to multiply). Note that, because of the two’s complement representation, −0 is 0000000000000000 over 16 bits and −1 is 1111111111111111. We can, thus, use −y_i as a mask in the first branch. We follow the same idea in the second branch, shifting the condition by N. Another constant-time countermeasure is the bitsliced implementation [17,18], which uses a SIMD architecture to perform the same operation on multiple data points simultaneously. In fact, we convert 64 N-bit words into N 64-bit words and multiply the 64-bit words in a single pass (Algorithm 6, and Algorithm A7 in Appendix A as C code).
    Algorithm 6 Bitsliced multiplication
    Require: 2 × 64 n-bit words X_i and Y_i where i ∈ [1, 64]
    Ensure: 64 n-bit words R_i = X_i·Y_i
    1: Transpose X = X_1‖X_2‖…‖X_64 to X′ = X′_1‖X′_2‖…‖X′_n, where the X′_j are 64-bit words for j ∈ [1, n]
    2: Transpose Y = Y_1‖Y_2‖…‖Y_64 to Y′ = Y′_1‖Y′_2‖…‖Y′_n, where the Y′_j are 64-bit words for j ∈ [1, n]
    3: Get R′ where R′_j = X′_j·Y′_j for j ∈ [1, n]
    4: Transpose R′ = R′_1‖R′_2‖…‖R′_n to R = R_1‖R_2‖…‖R_64, where the R_i are n-bit words for i ∈ [1, 64]
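    The product in step 3 is itself a shift-and-add multiplication carried out on all 64 lanes at once. A C sketch of the bitsliced core for GF(2^6) (our illustration, not Algorithm A7 itself; bit k of lane word x[j] holds coefficient j of operand k):

```c
#include <stdint.h>
#include <string.h>

/* Bitsliced multiplication in GF(2^6), generator x^6 + x + 1:
 * 64 independent products computed simultaneously, one per bit
 * position, with a fully regular (branch-free) schedule. */
static void gf64_mul_bitsliced(const uint64_t x[6], const uint64_t y[6],
                               uint64_t r[6])
{
    uint64_t a[6];
    memcpy(a, x, sizeof a);
    for (int j = 0; j < 6; j++)
        r[j] = 0;
    for (int i = 0; i < 6; i++) {
        for (int j = 0; j < 6; j++)
            r[j] ^= a[j] & y[i];        /* add a into lanes where y_i = 1 */
        uint64_t carry = a[5];          /* multiply a by x mod x^6 + x + 1 */
        for (int j = 5; j > 0; j--)
            a[j] = a[j - 1];
        a[0] = carry;                   /* x^6 -> 1 ... */
        a[1] ^= carry;                  /* ... + x      */
    }
}
```

    For example, placing α in lane 0 of x and α^5 in lane 0 of y yields α^6 = 1 + α in lane 0 of r.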
    This countermeasure prevents information leakage and speeds up the implementation. To compensate for the loss of performance of the constant-time countermeasures, one can use Intel’s CLMUL (Carry-Less Multiplication) instruction set extension to improve the speed of multiplication over GF(2^N). PCLMULQDQ, available since the 2010 Intel Core processor family based on the 32 nm Intel microarchitecture codenamed Westmere, performs the carry-less multiplication of two 64-bit operands, i.e., a polynomial multiplication over GF(2) (without reduction). The optimizations take advantage of the processor datapath (64 bit) and of the available instructions. It is thus artificial and probably not very informative to write the multiplications as algorithms. Instead, we provide the extensive C code for the case N = 12 (ℓ = 6 and m = 2), written in a portable way (ANSI POSIX). The code is abundantly commented, hence the operations carried out should not leave room for ambiguity. Regarding the performance evaluation of the different functionally equivalent codes, again, a pure algorithmic description would be misleading. For instance, 64 XOR operations can be conducted in one clock cycle provided the operands are laid out as the 64 bits of a quad word, or otherwise in 64 clock cycles if the operands are located at different addresses. Therefore, performance reads better from the C code. We estimated the performance by averaging the execution time of each multiplication placed in a loop. This method allows us to filter out abnormal durations caused by improper pipeline or cache initializations.
    In the next section, we show a case study where we compare different implementations of GF(2^N) multiplication on the basis of security and performance. The comparison further covers computation performance over the tower field GF((2^ℓ)^m) and the field GF(2^N), where N = ℓm, using the conversion matrices derived in Section 2.

    4. Case Study: Optimization of DAGS

    DAGS is a code-based Key Encapsulation Mechanism (KEM). Like all code-based primitives, it relies on the Syndrome Decoding Problem [19] and shows no known vulnerabilities against quantum attacks. In particular, DAGS is based on the McEliece cryptosystem [6] and uses Quasi-Dyadic Generalized Srivastava (GS) codes to address the issue of the large public key size, which is inherent to code-based cryptography. We start by recalling some important definitions.
    Definition 1.
    For m, n, s, t ∈ ℕ and a prime power q, let α_1, …, α_n, w_1, …, w_s be n + s distinct elements of GF(q^m), and let z_1, …, z_n be nonzero elements of GF(q^m). The Generalized Srivastava (GS) code of order st and length n is defined by a parity-check matrix of the form:
    H = [H_1
         H_2
         ⋮
         H_s],
    where each block is defined as
    H_i = [ z_1/(α_1 − w_i)     ⋯  z_n/(α_n − w_i)
            z_1/(α_1 − w_i)^2   ⋯  z_n/(α_n − w_i)^2
            ⋮
            z_1/(α_1 − w_i)^t   ⋯  z_n/(α_n − w_i)^t ].
    The parameters are the length n ≤ q^m − s, dimension k ≥ n − mst and minimum distance d ≥ st + 1.
    Definition 2.
    Given a ring R (in our case F_{q^m}) and a vector h̄ = (h_0, …, h_{n−1}) ∈ R^n, the dyadic matrix Δ(h̄) ∈ R^{n×n} is the symmetric matrix with components Δ_{ij} = h_{i⊕j}, where ⊕ stands for bitwise exclusive-or on the binary representations of the indices. The sequence h̄ is called its signature. If n is a power of 2, then every 2^l × 2^l dyadic matrix can be described recursively as
    M = [A B
         B A],
    where each block is a 2^(l−1) × 2^(l−1) dyadic matrix (and where any 1 × 1 matrix is dyadic).
    A linear code is quasi-dyadic if it admits a parity-check matrix in quasi-dyadic form, i.e., a block matrix whose component blocks are dyadic matrices.
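    To make Definition 2 concrete, here is a small purely illustrative C sketch that builds Δ(h̄) from a signature and exhibits both the symmetry and the recursive A/B block structure:

```c
/* Build the n x n dyadic matrix from its signature: M[i][j] = h[i XOR j].
 * n must be a power of two for the recursive block structure to appear. */
#define DYADIC_N 4
static void dyadic(const int h[DYADIC_N], int M[DYADIC_N][DYADIC_N])
{
    for (int i = 0; i < DYADIC_N; i++)
        for (int j = 0; j < DYADIC_N; j++)
            M[i][j] = h[i ^ j];
}
```

    The matrix is symmetric since i ⊕ j = j ⊕ i, and for n = 4 the top-left 2 × 2 block A reappears as the bottom-right block, as in the M = (A B; B A) decomposition.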
    It has been shown by Misoczki and Barreto [20] that it is possible to build Goppa codes in quasi-dyadic form if the code is defined over a field of characteristic 2 and the dyadic signature satisfies the fundamental equation
    1/h_{i⊕j} = 1/h_i + 1/h_j + 1/h_0.
    Persichetti in [21] then showed how to adapt the Misoczki–Barreto algorithm to generate quasi-dyadic GS codes. Intuitively, using this larger class of codes (of which Goppa codes are a subclass) provides more flexibility in the design of cryptographic schemes. More importantly, thanks to their “layered” structure, GS codes make it easier to resist structural attacks aimed at recovering the private key. In particular, the parameter t plays a crucial role in defining the complexity of one such attack (and its successive variants), due to Faugère, Otmani, Perret and Tillich, and simply known as FOPT [22].
    The attack, succinctly, consists of generating a system of equations from the fundamental relationship between the generator and parity-check matrices, G·Hᵀ = 0. The system of equations is heavily simplified thanks to the particular relations stemming from the quasi-dyadic form (and the limitations inherent to alternant codes) and successively solved using Gröbner bases. This allows for recovering an equivalent matrix for decoding, i.e., a private key. Despite the lack of a definitive complexity analysis, it is possible to observe that the attack scales somewhat proportionally to the value that defines the solution space (i.e., the number of free variables) of the system of equations. In the case of quasi-dyadic Goppa codes, this is given simply by m − 1; thus, the key factor is that the value m be large enough to make the attack unfeasible. In the original proposal by Misoczki and Barreto, this value varies from 2 to 16, where the large extension field GF(2^N) is kept constant (i.e., N = 16) and the base field varies (i.e., ℓ = 1, 2, 4, 8). As a consequence, the dimension of the solution space is in most cases trivial or nearly so, and the only parameters that weren’t broken in practice were those corresponding to m = N = 16, although the attack authors recommend m − 1 to be at least 20.
    In the case of GS codes, it is possible to apply the attack, but the dimension of the solution space is given by mt − 1 instead. It is thus a lot easier to achieve large values for this while at the same time keeping the extension field small (but still large enough to define long codes) and consequently having more efficient arithmetic. It follows that schemes based on GS codes (and DAGS is no exception) are usually defined over a relatively large base field, with the goal of minimizing the value m; the “burden” of thwarting FOPT falls then on t, which is chosen relatively large. This has the additional advantage of a better error-correction capacity, since the number of correctable errors depends on t (it is in fact st/2). Better error-correction means that generic decoding attacks like Information-Set Decoding (ISD) [23,24] are harder, which implies the possibility of better parameters.

    4.1. Initial Choice of Parameters

    We report the DAGS parameters in Table 2. Note that in all cases the condition mt ≥ 21 is satisfied to prevent the FOPT attack, as suggested by the attack authors themselves.
    For DAGS 3 and DAGS 5, the finite field GF(2^6) is built using the polynomial x^6 + x + 1 and then extended to GF(2^12) using the quadratic irreducible polynomial x^2 + α^34 x + α, where α is a primitive element of GF(2^6).

    4.2. Improved Field Selection

    The protocol specification for DAGS 3 and DAGS 5 is the same; hence, we choose to focus on the optimization of DAGS 5 in this section. The overall process is
    • Key Generation,
    • Encapsulation,
    • Decapsulation.
    Multiplications over the finite fields GF(2^6) and GF((2^6)^2) are carried out all along the process and must be protected, especially in critical operations such as key generation and encapsulation. The key generation is performed over the tower field GF((2^6)^2) using the log-antilog tables of GF(2^6). Thus, it must be protected against cache-timing attacks on the one hand and optimized for fast implementation on the other. Encapsulation is a critical operation performed over GF(2^6) using the tabulated method and hence is vulnerable to cache leakage. In the following, we compare the performance of seven implementations of multiplication algorithms over GF(2^6), GF((2^6)^2) and GF(2^12). In fact, according to Section 2, we can convert elements from GF((2^6)^2) to GF(2^12) to perform the multiplications in the key generation process faster. In the example of Section 2, we used the DAGS polynomials x^6 + x + 1 for GF(2^6) and x^2 + α^34 x + α for GF((2^6)^2), with α primitive in GF(2^6), and hence we can use the derived matrix T for the isomorphic mapping. Note that we could change the tower field polynomial, while still being consistent with the DAGS design, in order to construct a field GF(2^12) with a sparse generator polynomial. That is to say, using the polynomial x^2 + α^27 x + α for the extension yields a mapping to GF(2^12) with the generator trinomial x^12 + x^7 + 1. However, the overall gain compared to the performance with the pentanomial from the example is negligible, not to mention the cost of changing the initial polynomial in the reference code. Thus, we chose to keep the initial parameters and compare the following seven implementations on the basis of security and performance:
    • Tabulated log/antilog (Algorithms A1–A3),
    • Iterative, conditional reduction (Algorithm A5),
    • Iterative, ASM with PCLMUL, conditional reduction (Algorithm A5),
    • Iterative, unconditional reduction (Algorithm A6),
    • Iterative, ASM with PCLMUL, unconditional reduction (Algorithm A6),
    • Iterative, unconditional reduction, 1-bit-sliced, 64 computations in parallel (Algorithm A7),
    • Iterative, ASM with PCLMUL, unconditional reduction, bitsliced, 2 computations in parallel (Algorithm A8).

    4.3. Implementation Performances

    We present the results of our experiments below.
    • (*): Conversion from GF((2^6)^2) to GF(2^12) using T from Example 1 is 112 cycles; using the POPCNT ASM instruction, it is 38 cycles (Algorithm A9).
    • (**): Time to initialize the tables (Algorithms A1 and A2):
      • 2360 cycles on GF(2^6),
      • 267,086 cycles on GF((2^6)^2) and
      • 7884 cycles on GF(2^12)
      (can be precomputed, hence cycles = 0).
    • (***): Transposition (Algorithm A7.1) time is
      • 780 cycles on GF(2^6) and
      • 1613 cycles on GF(2^12).
    • (****): Transposition (X, X′) → X·2^(2N) + X′ is 2 cycles on GF(2^N).
    In Table 3, we present the performance of different implementations of multiplication over the finite field GF(2^6). We further compare these implementations over the tower field GF((2^6)^2) and the isomorphic field GF(2^12). Note that the costs of the isomorphic mapping, the bitslice transposition and the log-antilog table initialization are excluded from Table 3, since they are incurred only a few times during the process rather than at each multiplication. Accordingly, we conclude the following:
    • The tabulated log-antilog version is the fastest amongst non-parallel algorithms.
    • It is faster to perform the tower-field computation directly in the isomorphic field GF(2^12) than in the tower representation GF((2^6)^2).
    • The modular multiplication with the dedicated Carry-Less MULtiplication (PCLMUL) Assembly (ASM) instruction does not improve the speed, since the function-call overhead dominates the computation for these small values of N. However, when only one serial operation is needed, PCLMUL should be used because it has the lowest latency.
    • Constant-time, cache-secure implementations take more time than insecure ones. Moreover, we noticed that the “conditional reduction” in C code is actually constant-time once compiled to assembly (when the optimization flag is set), owing to the compiler’s use of the CMOV (conditional move) instruction, which executes in a single clock cycle.
    • The bitsliced single multiplication takes only 55/64 ≈ 0.86 cycles over GF(2^6) and 335/64 ≈ 5.23 cycles over GF(2^12), and is invulnerable to cache-timing attacks. Thus, it is our champion implementation for fast and secure arithmetic over GF(2^N).
    • For the second version of the bitsliced implementation, we pack two words X, X′ ∈ GF(2^N) as X′·2^(2N) + X. The products X·Y and X′·Y′ can then be computed in one go by noticing that PCLMULQDQ(X′·2^(2N) + X, Y′·2^(2N) + Y) = PCLMULQDQ(X′, Y′)·2^(4N) ⊕ (PCLMULQDQ(X′, Y) ⊕ PCLMULQDQ(X, Y′))·2^(2N) ⊕ PCLMULQDQ(X, Y); hence, the two results are obtained at bit indices [6N, 4N] and [2N, 0].
    For DAGS key generation, we chose to map elements from GF((2^6)^2) to GF(2^12) and to perform a bitsliced multiplication for secure and fast computation. The encapsulation process is carried out over GF(2^6), and we chose the bitsliced multiplication there as well. Concerning decapsulation, we mapped elements to GF(2^12) and kept the tabulated method, since securing this process against cache leakage is not essential.

    5. Conclusions

    In this paper, we compared several implementations of multiplication over the finite field GF(2^N), for N < 32, in terms of security and performance. Our analysis showed that the log-antilog tabulated method and the conditional iterative method are vulnerable to cache-timing attacks. Moreover, for large values of N, the tabulated method becomes costly and tower fields are used to perform the arithmetic. We showed that this towering technique is slow, and we proposed mapping the elements to the isomorphic field GF(2^N) for better performance.
    To counter cache attacks, we presented two constant-time implementations: the iterative method with unconditional reduction, which removes branches and is thus somewhat slower, and the bitsliced implementation, which executes in SIMD fashion. We used DAGS, a code-based KEM submitted to NIST’s PQC call, as a case study to examine the different multiplications over GF((2^6)^2). The conclusion is that the bitsliced implementation is faster than the tower multiplication and secure with respect to cache attacks. It should be pointed out that our results also apply to secure and accelerated implementations of the other PQC algorithms listed, as well as to AES and other symmetric ciphers that operate over finite fields GF(2^N). Finally, note that all the algorithms tested in this paper are provided in C in Appendix A.

    Author Contributions

    Conceptualization, J.-L.D.; investigation and writing: Y.E.H.; review and editing, A.F.; formal analysis, C.T.G.; project administration, S.G.; data curation, S.H.; visualization, O.N.; investigation, E.P.; investigation, visualization, A.S.

    Funding

    We acknowledge the support of the FrenchProgramme d’Investissements d’Avenir under the national project RISQ.

    Conflicts of Interest

    The authors declare no conflict of interest.

    Appendix A. C Code for Various Algorithms

    #include <stdlib.h>
    #include <stdint.h>

    typedef uint16_t gf_t; /* Galois field elements */

    #define gf_extd 6
    #define gf_card (1 << gf_extd)
    #define gf_ord ((gf_card) - 1)
    #define poly_primitive_subfield 67 // 0x43 (0b01000011); the bits follow
                                       // the polynomial X^6 + X + 1
    /* Algorithm A1: Precomputation of the antilog table for F_2[x]/(x^6+x+1) */
    static gf_t *gf_antilog;
    void gf_init_antilog()
    {
      int i;
      int temp = 1 << (gf_extd - 1);
      gf_antilog = (gf_t *)malloc(gf_card * sizeof(gf_t));
      gf_antilog[0] = 1; // alpha^0 = 1
      for (i = 1; i < gf_ord; ++i)
      {
        gf_antilog[i] = gf_antilog[i - 1] << 1;
        if (gf_antilog[i - 1] & temp)
        {
          // XOR with 67: X^6 + X + 1
          gf_antilog[i] ^= poly_primitive_subfield;
        }
      }
      gf_antilog[gf_ord] = 1;
    }
    /* Algorithm A2: Precomputation of the log table for F_2[x]/(x^6+x+1) */
    static gf_t *gf_log;
    void gf_init_log()
    {
      int i;
      gf_log = (gf_t *)malloc(gf_card * sizeof(gf_t));
      gf_log[0] = (gf_t)-1; // dummy value (the log of 0 is undefined)
      gf_log[1] = 0;
      for (i = 1; i < gf_ord; ++i)
      {
        gf_log[gf_antilog[i]] = i;
      }
    }
    /* Algorithm A3: Tabulated multiplication over GF(2^6) */
    /* Uses the tables precomputed by Algorithms A1 and A2, which are built */
    /* just once at DAGS initialization. The zero test on both operands is  */
    /* needed because log(0) is undefined. This algorithm is not            */
    /* constant-time, so it is not protected. */
    #define gf_mult_tabulated(x, y) \
      (((x) && (y)) ? gf_antilog[(gf_log[x] + gf_log[y]) % gf_ord] : 0)
    /* Algorithm A4: Tabulated multiplication over GF((2^6)^2) */
    /* Uses Algorithm A3, so it is not protected either. */
    gf_t gf_mult_extension_tabulated(gf_t x, gf_t y)
    {
      gf_t a1, b1, a2, b2, a3, b3;
      a1 = x >> 6;
      b1 = x & 63;
      a2 = y >> 6;
      b2 = y & 63;
      // not constant-time
      a3 = gf_mult_tabulated(gf_mult_tabulated(a1, a2), 36)
         ^ gf_mult_tabulated(a1, b2) ^ gf_mult_tabulated(b1, a2);
      // 36 is p_1 in the extension polynomial
      b3 = gf_mult_tabulated(gf_mult_tabulated(a1, a2), 2)
         ^ gf_mult_tabulated(b1, b2); // 2 is p_0 in the extension polynomial
      return (a3 << 6) ^ b3;
    }
    /* Algorithm A5: Iterative multiplication over GF(2^6) with conditional reduction */
    /* The multiplication does not use the precomputed tables; the ASM PCLMUL */
    /* instruction can be used. It is not constant-time. */
    gf_t gf_mult_iterative_conditional(gf_t x, gf_t y)
    {
    #ifndef PCLMUL
      gf_t res = 0; // this variable will contain the result
      for (int i = 0; i < 6; ++i) // for each coefficient of the polynomial
      {
        if ((y & 1) == 1) // data-dependent branch: not constant-time
        {
          res ^= x; // addition
        }
        y >>= 1; // this shift exposes the next coefficient b_i for the next iteration
        x <<= 1;
        if ((x & 64) != 0)
        // x must be reduced modulo X^6+X+1; 64 is 0x40, 0b01000000
        {
          x ^= 67; // 0x43: X^6 + X + 1
        }
      }
      return res;
    #else
      // using the ASM PCLMUL instruction
      uint32_t a;
      // multiplication
      asm volatile ("movdqa %1, %%xmm0;\n\t"
                    "movdqa %2, %%xmm1;\n\t"
                    "pclmulqdq $0x00, %%xmm0, %%xmm1;\n\t"
                    "movdqa %%xmm1, %0;\n\t"
                    : "=x"(a)
                    : "x"((uint32_t)y), "x"((uint32_t)x)
                    : "%xmm0", "%xmm1");
      // conditional reduction by the polynomial
      for (int k = 0; k < 6; k++) { // for each coefficient of the polynomial
        if ((a >> (11 - k)) & 1) // not constant-time
        {
          a ^= (67 << (5 - k)); // 0x43: X^6 + X + 1
        }
      }
      return a & 0xFFFF;
    #endif
    }
    /* Algorithm A6: Iterative multiplication over GF(2^6) with unconditional reduction */
    /* The multiplication does not use the precomputed tables; the ASM PCLMUL */
    /* instruction can be used. It is constant-time. */
    gf_t gf_mult_iterative_unconditional(gf_t x, gf_t y)
    {
    #ifndef PCLMUL
      gf_t res = 0, m;
      for (int i = 0; i < 6; ++i)
      // for each coefficient of the polynomial; constant-time
      {
        m = -(y & 1); // m is either 0xFFFF or 0x0000
        res ^= x & m; // addition
        y >>= 1;
        x <<= 1;
        // x must be reduced modulo X^6+X+1
        m = -((x >> 6) & 1);
        x ^= m & 67; // 0x43: X^6 + X + 1
      }
      return res;
    #else
      // using the ASM PCLMUL instruction
      uint32_t a, m;
      // multiplication
      asm volatile ("movdqa %1, %%xmm0;\n\t"
                    "movdqa %2, %%xmm1;\n\t"
                    "pclmulqdq $0x00, %%xmm0, %%xmm1;\n\t"
                    "movdqa %%xmm1, %0;\n\t"
                    : "=x"(a)
                    : "x"((uint32_t)y), "x"((uint32_t)x)
                    : "%xmm0", "%xmm1");
      // unconditional (masked) reduction by the polynomial
      for (int k = 0; k < 6; k++) {
        m = -((a >> (11 - k)) & 1);
        a ^= (67 << (5 - k)) & m; // 0x43: X^6 + X + 1
      }
      return a & 0xFFFF;
    #endif
    }
    /* Algorithm A7: 1-bit-sliced multiplication over GF(2^6)
       (64 computations in parallel) */
    /* A7.1: Transpositions */
    void to_bitslice(gf_t *x, uint64_t *res) {
      int i;
      for (i = 0; i < 64; i++) {
        res[0] |= ((((uint64_t)x[i]) & 1) << i);
        res[1] |= (((((uint64_t)x[i]) >> 1) & 1) << i);
        res[2] |= (((((uint64_t)x[i]) >> 2) & 1) << i);
        res[3] |= (((((uint64_t)x[i]) >> 3) & 1) << i);
        res[4] |= (((((uint64_t)x[i]) >> 4) & 1) << i);
        res[5] |= (((((uint64_t)x[i]) >> 5) & 1) << i);
      }
    }
    void from_bitslice(uint64_t *res, gf_t *x) {
      int i;
      for (i = 0; i < 64; i++) {
        x[i]  = ((res[0] >> i) & 1); // plain assignment, so x need not be zeroed
        x[i] |= ((res[1] >> i) & 1) << 1;
        x[i] |= ((res[2] >> i) & 1) << 2;
        x[i] |= ((res[3] >> i) & 1) << 3;
        x[i] |= ((res[4] >> i) & 1) << 4;
        x[i] |= ((res[5] >> i) & 1) << 5;
      }
    }
    /* A7.2: 1-bit-sliced multiplication (SIMD code) */
    void gf_multsubTab(gf_t *x, gf_t *y, gf_t *z)
    {
      uint64_t xbin[6];
      uint64_t ybin[6];
      uint64_t res[6];
      xbin[0] = xbin[1] = xbin[2] = xbin[3] = xbin[4] = xbin[5] = 0;
      ybin[0] = ybin[1] = ybin[2] = ybin[3] = ybin[4] = ybin[5] = 0;
      // Transpose x and y
      to_bitslice(x, xbin);
      to_bitslice(y, ybin);
      // Multiplication and polynomial reduction,
      // with 64 computations in parallel for each coefficient of the polynomial;
      // constant-time
      uint64_t const xbin05 = xbin[0] ^ xbin[5];
      uint64_t const xbin54 = xbin[5] ^ xbin[4];
      uint64_t const xbin43 = xbin[4] ^ xbin[3];
      uint64_t const xbin32 = xbin[3] ^ xbin[2];
      uint64_t const xbin21 = xbin[2] ^ xbin[1];
      res[0] = (xbin[0] & ybin[0]);
      res[1] = (xbin[1] & ybin[0]);
      res[2] = (xbin[2] & ybin[0]);
      res[3] = (xbin[3] & ybin[0]);
      res[4] = (xbin[4] & ybin[0]);
      res[5] = (xbin[5] & ybin[0]);
      res[0] ^= (xbin[5] & ybin[1]);
      res[1] ^= (xbin05 & ybin[1]);
      res[2] ^= (xbin[1] & ybin[1]);
      res[3] ^= (xbin[2] & ybin[1]);
      res[4] ^= (xbin[3] & ybin[1]);
      res[5] ^= (xbin[4] & ybin[1]);
      res[0] ^= (xbin[4] & ybin[2]);
      res[1] ^= (xbin54 & ybin[2]);
      res[2] ^= (xbin05 & ybin[2]);
      res[3] ^= (xbin[1] & ybin[2]);
      res[4] ^= (xbin[2] & ybin[2]);
      res[5] ^= (xbin[3] & ybin[2]);
      res[0] ^= (xbin[3] & ybin[3]);
      res[1] ^= (xbin43 & ybin[3]);
      res[2] ^= (xbin54 & ybin[3]);
      res[3] ^= (xbin05 & ybin[3]);
      res[4] ^= (xbin[1] & ybin[3]);
      res[5] ^= (xbin[2] & ybin[3]);
      res[0] ^= (xbin[2] & ybin[4]);
      res[1] ^= (xbin32 & ybin[4]);
      res[2] ^= (xbin43 & ybin[4]);
      res[3] ^= (xbin54 & ybin[4]);
      res[4] ^= (xbin05 & ybin[4]);
      res[5] ^= (xbin[1] & ybin[4]);
      res[0] ^= (xbin[1] & ybin[5]);
      res[1] ^= (xbin21 & ybin[5]);
      res[2] ^= (xbin32 & ybin[5]);
      res[3] ^= (xbin43 & ybin[5]);
      res[4] ^= (xbin54 & ybin[5]);
      res[5] ^= (xbin05 & ybin[5]);
      // Transpose back
      from_bitslice(res, z);
    }
    /* Algorithm A8: Iterative, ASM with PCLMUL, unconditional reduction, bit-sliced
       (2 computations in parallel) */
    /* With PCLMUL, at most 2 computations are possible. It is constant-time. */
    void gf_mult_bitslice_2computations(gf_t *x, gf_t *y, gf_t *tab) {
      // Packing for computation in parallel
      uint64_t x2 = ((uint64_t)x[1] << 12) | x[0], y2 = ((uint64_t)y[1] << 12) | y[0];
      uint64_t a, m, m1, s, m0;
      // Multiplication (the output fits in 64 bits)
      asm volatile ("movdqa %1, %%xmm0;\n\t"
                    "movdqa %2, %%xmm1;\n\t"
                    "pclmulqdq $0x00, %%xmm0, %%xmm1;\n\t"
                    "movq %%xmm1, %0;\n\t"
                    : "=x"(a)
                    : "x"(y2), "x"(x2)
                    : "%xmm0", "%xmm1");
      // Polynomial reduction (masked, for both halves at once)
      for (int k = 0; k < 6; k++) {
        m0 = a >> (11 - k);
        m  = -(m0 & 1);
        m1 = -((m0 >> 24) & 1);
        s  = (uint64_t)67 << (5 - k); // 0x43: X^6 + X + 1
        a ^= ((s & m) | ((s << 24) & m1));
      }
      // Unpacking
      tab[0] = a & 0x3F;
      tab[1] = (a >> 24) & 0x3F;
    }
    /* Algorithm A9: Mapping between GF((2^6)^2) and GF(2^12) */
    // Conversion matrix from GF((2^6)^2) to GF(2^12)
    static const gf_t T[12] = {3857, 1140, 3330, 132, 286,
                               1954, 1938, 1208, 314, 3754, 2750, 188};
    // Conversion matrix from GF(2^12) to GF((2^6)^2)
    static const gf_t Ti[12] = {3321, 3388, 4080, 2152,
                                3712, 3808, 2274, 4088, 1076, 3904, 1904, 3708};
    // Hamming weight computation
    static inline gf_t hamming_weight(gf_t n) {
    #ifndef ASM_POPCNT
      n = ((n & 0x0AAA) >> 1) + (n & 0x0555);
      n = ((n & 0x0CCC) >> 2) + (n & 0x0333);
      n = ((n & 0x00F0) >> 4) + (n & 0x0F0F);
      n = ((n & 0x0F00) >> 8) + (n & 0x00FF);
    #else
      // using ASM
      asm (
        "POPCNT %1, %0 \n" // count of the number of bits set to 1
        : "=r"(n)
        : "mr"(n)
        : "cc");
    #endif
      return n;
    }
    /* A9.1: Conversion from GF(2^12) to GF((2^6)^2) */
    /* using the conversion matrix Ti from GF(2^12) to GF((2^6)^2) */
    gf_t iconv_bit(gf_t x)
    {
      gf_t res = 0;
      for (int i = 0; i < 12; i++) {
        res |= (hamming_weight(x & Ti[i]) & 1) << i; // Ti defined in (3.5)
      }
      return res;
    }
    /* A9.2: Conversion from GF((2^6)^2) to GF(2^12) */
    /* using the conversion matrix T from GF((2^6)^2) to GF(2^12) */
    gf_t conv_bit(gf_t x)
    {
      gf_t res = 0;
      for (int i = 0; i < 12; i++) {
        res |= (hamming_weight(x & T[i]) & 1) << i; // T defined in (3.5)
      }
      return res;
    }

    References

    1. Paar, C. Efficient VLSI architectures for Bit-Parallel Computation in Galois Fields. Ph.D. Thesis, Institute for Experimental Mathematics, University of Essen, Duisburg, Germany, 1994. Available online:https://tinyurl.com/yc7hmfmo (accessed on 18 September 2018).
    2. Sunar, B.; Savas, E.; Koç, Ç.K. Constructing composite field representations for efficient conversion.IEEE Trans. Comput.2003,52, 1391–1398. [Google Scholar] [CrossRef] [Green Version]
    3. Round 1 Submissions (30/11/2017)—Post-Quantum Cryptography. Available online:https://csrc.nist.gov/Projects/Post-Quantum-Cryptography/Round-1-Submissions (accessed on 18 September 2018).
    4. DAGS project. Available online:http://www.dags-project.org (accessed on 18 September 2018).
    5. NIST/ITL/CSD. Advanced Encryption Standard (AES). FIPS PUB 197, 11/26/2001. (Also ISO/IEC 18033-3:2010). Available online:http://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.197.pdf (accessed on 18 September 2018).
    6. McEliece, R.J. A public-key cryptosystem based on algebraic coding theory.JPL DSN Prog. Rep.1978,42–44, 114–116. [Google Scholar]
    7. Rivest, R.L.; Shamir, A.; Adleman, L. A method for obtaining digital signatures and public-key cryptosystems.Commun. ACM1978,21, 120–126. [Google Scholar] [CrossRef] [Green Version]
    8. Diffie, W.; Hellman, M. New directions in cryptography.IEEE Trans. Inf. Theory1976,22, 644–654. [Google Scholar] [CrossRef] [Green Version]
    9. Bardet, M.; Chaulet, J.; Dragoi, V.; Otmani, A.; Tillich, J.P. Cryptanalysis of the McEliece public key cryptosystem based on polar codes. In Proceedings of the 7th International Conference on Post-Quantum Cryptography (PQCrypto 2016), Fukuoka, Japan, 24–26 February 2016; Springer: Berlin, Germany, 2016; pp. 118–143. [Google Scholar]
    10. Post-Quantum Cryptography Challenge (ongoing). Available online:https://csrc.nist.gov/Projects/Post-Quantum-Cryptography/Round-1-Submissions (accessed on 18 September 2018).
    11. Yarom, Y.; Falkner, K. FLUSH+RELOAD: A High Resolution, Low Noise, L3 Cache Side-Channel Attack. In Proceedings of the 23rd USENIX Security Symposium (USENIX Security 14), San Diego, CA, USA, 20–22 August 2014; pp. 719–732. [Google Scholar]
    12. Facon, A.; Guilley, S.; Lec’hvien, M.; Schaub, A.; Souissi, Y. Detecting cache-timing vulnerabilities in post-quantum cryptography algorithms. In Proceedings of the 3rd IEEE International Verification and Security Workshop, Hotel Cap Roig, Platja d’Aro, Costa Brava, Spain, 2–4 July 2018. [Google Scholar]
    13. Lidl, R.; Niederreiter, H.Finite Fields; Cambridge University Press: Cambridge, UK, 1997. [Google Scholar]
    14. Tromer, E.; Osvik, D.A.; Shamir, A. Efficient Cache Attacks on AES, and Countermeasures.J. Cryptol.2010,23, 37–71. [Google Scholar] [CrossRef]
    15. Aciiçmez, O.; Koç, Ç.K.; Seifert, J.P. On the power of simple branch prediction analysis. In Proceedings of the 2nd ACM Symposium on Information, Computer and Communications Security, Singapore, 20–22 March 2007; pp. 312–320. [Google Scholar]
    16. Aciiçmez, O.; Koç, Ç.K.; Seifert, J. Predicting Secret Keys Via Branch Prediction. In Proceedings of the Cryptographers’ Track at the RSA Conference 2007, San Francisco, CA, USA, 5–9 February 2007; pp. 225–242. [Google Scholar]
    17. Biham, E. A Fast New DES Implementation in Software. In Proceedings of the Fourth International Workshop on Fast Software Encryption, Haifa, Israel, 20–22 January 1997; pp. 260–272. [Google Scholar]
    18. Matsui, M.; Nakajima, J. On the Power of Bitslice Implementation on Intel Core2 Processor. In Proceedings of the Cryptographic Hardware and Embedded Systems, Vienna, Austria, 10–13 September 2007; pp. 121–134. [Google Scholar]
    19. Berlekamp, E.; McEliece, R.; van Tilborg, H. On the Inherent Intractability of Certain Coding Problems.IEEE Trans. Inform. Theory1978,24, 384–386. [Google Scholar] [CrossRef]
    20. Misoczki, R.; Barreto, P.S.L.M.B. Compact McEliece Keys from Goppa Codes. In Proceedings of the 16th Workshop on Selected Areas in Cryptography (SAC 2009), Calgary, AB, Canada, 13–14 August 2009; pp. 376–392. [Google Scholar]
    21. Persichetti, E. Compact McEliece keys based on quasi-dyadic Srivastava codes.J. Math. Cryptol.2012,6, 149–169. [Google Scholar] [CrossRef]
    22. Faugère, J.C.; Otmani, A.; Perret, L.; Tillich, J.P. Algebraic Cryptanalysis of McEliece Variants with Compact Keys. In Proceedings of the 29th Annual International Conference on the Theory and Applications of Cryptographic Techniques, French Riviera, France, 30 May–3 June 2010; pp. 279–298. [Google Scholar]
    23. Prange, E. The use of information sets in decoding cyclic codes.IRE Trans. Inf. Theory1962,8, 5–9. [Google Scholar] [CrossRef]
    24. Peters, C. Information-Set Decoding for Linear Codes over Fq. In Proceedings of the Third International Workshop on Post-Quantum Cryptography, Darmstadt, Germany, 25–28 May 2010; pp. 81–94. [Google Scholar]
    Table 1. PQC submissions at NIST [3] using vulnerable GF(2^N) operations.

    Submission | Type          | Finite Field                              | Tower Fields Used
    BIG QUAKE  | Code-based    | GF(2^N), N = 12, 18                       | No
    DAGS       | Code-based    | GF((2^5)^2), GF((2^6)^2)                  | Yes
    EdonK      | Code-based    | GF(2^N), N = 128, 192                     | No
    Ramstake   | Code-based    | GF(2^8)                                   | No
    RLCE       | Code-based    | GF(2^N), N = 10, 11                       | No
    LAC        | Lattice-based | GF(2^N), N = 9, 10                        | No
    DME        | Multivariate  | GF(2^N), N = 24, 48                       | No
    HIMQ-3     | Multivariate  | GF(2^8)                                   | No
    LUOV       | Multivariate  | GF(2^8), GF((2^16)^ℓ), ℓ = 3, 4, 5        | Yes
    Table 2. DAGS [4] parameters.

    Name   | Security Level | q   | m | n    | k   | s   | t  | Public Key Size
    DAGS_1 | 128            | 2^5 | 2 | 832  | 416 | 2^4 | 13 | 6760
    DAGS_3 | 256            | 2^6 | 2 | 1216 | 512 | 2^5 | 11 | 8448
    DAGS_5 | 512            | 2^6 | 2 | 2112 | 704 | 2^6 | 11 | 11,616
    Table 3. Performance (in clock cycles) of several multiplication algorithms on an Intel(R) Core(TM) i7-4790 CPU @ 3.60 GHz with one processor.

    Multiplication Algorithm | Algorithm | GF(2^6) | GF(2^12) (*) | GF((2^6)^2) | Constant-Time
    Tabulated log/antilog (**) | 3, 4 | 8 | 11 | 20 | No
    Iterative, conditional reduction | 5 | 27 | 51 | 133 | No
    Iterative, ASM with PCLMUL, conditional reduction | 5 | 29 | 41 | 146 | No
    Iterative, unconditional reduction | 6 | 30 | 58 | 155 | Yes
    Iterative, ASM with PCLMUL, unconditional reduction | 6 | 35 | 65 | 225 | Yes
    Iterative, unconditional reduction, 1-bit-sliced (***), 64 computations in parallel | 7 | 55/64 | 335/64 | - | Yes
    Iterative, ASM with PCLMUL, unconditional reduction, bit-sliced (****), 2 computations in parallel | 8 | 55/2 | 95/2 | - | Yes

    © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

    Share and Cite

    MDPI and ACS Style

    Danger, J.-L.; El Housni, Y.; Facon, A.; Gueye, C.T.; Guilley, S.; Herbel, S.; Ndiaye, O.; Persichetti, E.; Schaub, A. On the Performance and Security of Multiplication inGF(2N).Cryptography2018,2, 25. https://doi.org/10.3390/cryptography2030025


    Cryptography, EISSN 2410-387X, Published by MDPI