2024-12-17
To defend against the potential advent of "Cryptographically Relevant Quantum Computers", there is a move toward "hybrid" key exchange algorithms. These glue together a widely-deployed classical algorithm (like X25519) and a new post-quantum-secure algorithm (like ML-KEM), and treat the result as one TLS-level key exchange algorithm (like X25519MLKEM768).
In this report, we'll first measure the additional cost of post-quantum-secure key exchange. Then we'll describe and measure an optimization we have implemented.
All these measurements are taken on our amd64 benchmarking machine, which has a Xeon E-2386G CPU. We'll compare three sets of measurements.
All three are taken on the same hardware, and the latter measurements are from our previous report, which also contains reproduction instructions and describes what the benchmarks measure.
One important thing to note is that post-quantum key exchange involves sending and receiving much larger messages than classical key exchange. Our benchmark design only covers CPU costs (it does not include networking), so real-world performance will be worse than these measurements suggest.
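To put "much larger" in perspective, here is a rough sketch of the raw key share sizes. It assumes the X25519MLKEM768 layout from the IETF hybrid ML-KEM drafts: the client sends an ML-KEM-768 encapsulation key followed by an X25519 public key, and the server replies with an ML-KEM-768 ciphertext followed by its own X25519 public key. It counts only key material, not the 4 bytes of group identifier and length that each TLS KeyShareEntry adds, nor any other handshake content. The function and constant names are hypothetical, not rustls APIs.

```rust
// Raw key material sizes in bytes, from the ML-KEM-768 (FIPS 203)
// and X25519 (RFC 7748) parameter sets.
const X25519_PUBLIC_KEY: usize = 32;
const MLKEM768_ENCAPSULATION_KEY: usize = 1184;
const MLKEM768_CIPHERTEXT: usize = 1088;

/// Bytes of key material in the client's key share (sent in the ClientHello).
/// Assumes the hybrid share is the encapsulation key followed by the X25519 public key.
fn client_key_share_len(hybrid: bool) -> usize {
    if hybrid {
        MLKEM768_ENCAPSULATION_KEY + X25519_PUBLIC_KEY
    } else {
        X25519_PUBLIC_KEY
    }
}

/// Bytes of key material in the server's key share (sent in the ServerHello).
/// Assumes the hybrid share is the ML-KEM ciphertext followed by the X25519 public key.
fn server_key_share_len(hybrid: bool) -> usize {
    if hybrid {
        MLKEM768_CIPHERTEXT + X25519_PUBLIC_KEY
    } else {
        X25519_PUBLIC_KEY
    }
}
```

For example, client_key_share_len(true) is 1216 bytes and server_key_share_len(true) is 1120 bytes, against 32 bytes each way for X25519 alone: roughly a 35-38x increase in key material per direction, which the CPU-only benchmarks do not capture.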
The cost of X25519MLKEM768 post-quantum key exchange is clearly visible for both clients and servers.
We can see that the performance headroom rustls has attained means we can almost completely absorb the extra cost of post-quantum key exchange, while still performing better than (post-quantum-insecure) OpenSSL, with the exception of client resumption.
We will do further comparative benchmarking in this area when OpenSSL gains post-quantum key exchange support.
In TLS1.3, the client starts the key exchange in its first message (the ClientHello). The ClientHello includes both a description of which algorithms the client supports, and zero or more presumptive "key shares".
The server then evaluates which algorithms it is willing to use, and either uses one of the presumptive key shares, or replies with a HelloRetryRequest, which instructs the client to send a new ClientHello with a specific, mutually-acceptable key share.
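The server's choice can be sketched as a small decision function. This is a hypothetical model, not rustls's API: the group enum and function names are illustrative. The policy shown, where the server follows its own preference order even when that forces a HelloRetryRequest, is only one option TLS1.3 permits; a server may instead favour any acceptable group for which the client already sent a key share.

```rust
/// Illustrative subset of TLS named groups (hypothetical type, not the rustls one).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NamedGroup {
    X25519,
    X25519MLKEM768,
    Secp256r1,
}

#[derive(Debug, PartialEq, Eq)]
enum ServerDecision {
    /// Use a presumptive key share the client already sent: no extra round trip.
    UseKeyShare(NamedGroup),
    /// Reply with a HelloRetryRequest asking for a key share in this group.
    HelloRetryRequest(NamedGroup),
    /// No mutually-supported group: the handshake fails.
    NoCommonGroup,
}

/// Pick the first group in the server's preference order that the client supports,
/// then check whether the client already sent a key share for it.
fn select_group(
    server_prefs: &[NamedGroup],
    client_supported: &[NamedGroup],
    client_key_shares: &[NamedGroup],
) -> ServerDecision {
    for &group in server_prefs {
        if client_supported.contains(&group) {
            return if client_key_shares.contains(&group) {
                ServerDecision::UseKeyShare(group)
            } else {
                ServerDecision::HelloRetryRequest(group)
            };
        }
    }
    ServerDecision::NoCommonGroup
}
```

Under this policy, a client that supports X25519MLKEM768 but only sends an X25519 key share to a server preferring X25519MLKEM768 pays for an extra round trip, and the X25519 share it computed is wasted.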
A HelloRetryRequest can be expensive, because it introduces an additional round trip into the handshake. It also means any work the client did for its presumptive key shares is wasted.
It's therefore advantageous for a client to avoid HelloRetryRequests, by:
Having prior knowledge of the server's preferences. draft-ietf-tls-key-share-prediction is an effort to standardize a mechanism for a client to learn this out-of-band.
Remembering a server's preferences from a previous connection. rustls has done this since adding support for TLS1.3 in 2017. This generally means a client making many connections to one server may avoid repeated HelloRetryRequests.
Sending many presumptive key shares, though there's an obvious trade-off in terms of wasted computation and message size.
Following ecosystem preferences. X25519 key exchange is overwhelmingly preferred in TLS1.3 implementations, due to its performance and implementation quality.
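The second strategy, remembering what a server chose last time, can be sketched as a per-server map from server name to the group it selected. This is a minimal illustration under stated assumptions: the KeyShareHints type and its methods are hypothetical, and it ignores concerns a real client must handle, such as bounding the cache size.

```rust
use std::collections::HashMap;

/// Illustrative subset of TLS named groups (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NamedGroup {
    X25519,
    X25519MLKEM768,
}

/// Hypothetical per-server memory of which group each server last selected.
struct KeyShareHints {
    by_server: HashMap<String, NamedGroup>,
}

impl KeyShareHints {
    fn new() -> Self {
        Self { by_server: HashMap::new() }
    }

    /// Groups to send presumptive key shares for in the next ClientHello:
    /// the remembered group if we have one, otherwise the client's default.
    fn initial_key_shares(&self, server: &str, default: NamedGroup) -> Vec<NamedGroup> {
        match self.by_server.get(server) {
            Some(&group) => vec![group],
            None => vec![default],
        }
    }

    /// Record the group the server actually used, e.g. after a HelloRetryRequest.
    fn remember(&mut self, server: &str, group: NamedGroup) {
        self.by_server.insert(server.to_owned(), group);
    }
}
```

After one HelloRetryRequest from a given server, later connections to that server start with the right key share, so only the first connection pays the extra round trip.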
With post-quantum key exchange enabled, the key shares in a ClientHello would start with an X25519MLKEM768 key share.
At least for a transitional period, we want to avoid a HelloRetryRequest round trip when connecting to a server that hasn't been upgraded to support X25519MLKEM768. That means also offering a separate X25519 key share alongside the X25519MLKEM768 one.
However, this arrangement is not optimal. While X25519 setup is very fast, we are doing it twice, and we are then guaranteed to throw away half of that work, because the server can only ever select one key share to use.
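Instead, we can generate a single X25519 key pair and use its public key twice: as the X25519 component of the X25519MLKEM768 key share, and as the standalone X25519 key share.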
This report measures the benefit of that optimization.
This optimization is described further in draft-ietf-tls-hybrid-design section 3.2.
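The difference between the two arrangements can be sketched by counting X25519 key generations. This is not real cryptography: PlaceholderKeygen, KeyShare and the key_shares_* functions are hypothetical stand-ins that return fake bytes, so the example is self-contained. It assumes the X25519MLKEM768 client share is the ML-KEM-768 encapsulation key followed by the X25519 public key. A real client would also keep the private keys, so it can finish whichever exchange the server picks.

```rust
const MLKEM768_EK_LEN: usize = 1184;
const X25519_PUB_LEN: usize = 32;

/// A (group name, key material) pair, standing in for a TLS KeyShareEntry.
#[derive(Debug, PartialEq, Eq)]
struct KeyShare {
    group: &'static str,
    payload: Vec<u8>,
}

/// Placeholder key generator: fake bytes, plus a count of X25519 key generations
/// as a stand-in for CPU cost. NOT cryptographically meaningful.
struct PlaceholderKeygen {
    x25519_keygens: usize,
    counter: u8,
}

impl PlaceholderKeygen {
    fn new() -> Self {
        Self { x25519_keygens: 0, counter: 0 }
    }

    /// Stand-in for X25519 key generation; each call yields a distinct fake public key.
    fn x25519_public_key(&mut self) -> [u8; X25519_PUB_LEN] {
        self.x25519_keygens += 1;
        self.counter = self.counter.wrapping_add(1);
        [self.counter; X25519_PUB_LEN]
    }

    /// Stand-in for ML-KEM-768 key generation; returns a fake encapsulation key.
    fn mlkem768_encapsulation_key(&mut self) -> Vec<u8> {
        vec![0xAA; MLKEM768_EK_LEN]
    }
}

/// Hybrid client share: ML-KEM-768 encapsulation key, then the X25519 public key.
fn hybrid_share(ek: Vec<u8>, x25519_pub: &[u8; X25519_PUB_LEN]) -> KeyShare {
    let mut payload = ek;
    payload.extend_from_slice(x25519_pub);
    KeyShare { group: "X25519MLKEM768", payload }
}

/// Naive: independent X25519 key pairs for the hybrid and the standalone share.
fn key_shares_naive(kg: &mut PlaceholderKeygen) -> Vec<KeyShare> {
    let ek = kg.mlkem768_encapsulation_key();
    let hybrid_x25519 = kg.x25519_public_key();
    let standalone_x25519 = kg.x25519_public_key();
    vec![
        hybrid_share(ek, &hybrid_x25519),
        KeyShare { group: "X25519", payload: standalone_x25519.to_vec() },
    ]
}

/// Optimized: one X25519 key pair whose public key appears in both shares.
fn key_shares_shared(kg: &mut PlaceholderKeygen) -> Vec<KeyShare> {
    let ek = kg.mlkem768_encapsulation_key();
    let x25519 = kg.x25519_public_key();
    vec![
        hybrid_share(ek, &x25519),
        KeyShare { group: "X25519", payload: x25519.to_vec() },
    ]
}
```

Both versions put the same amount of data on the wire. The shared version does one fewer X25519 key generation, and its standalone X25519 share is byte-for-byte the tail of the hybrid share. Because the server selects only one key share, the single X25519 key pair is used in at most one exchange.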
First, we can micro-benchmark the time to construct and serialize a ClientHello in a variety of situations.
We run this on two machines that cover both the amd64 (Xeon E-2386G) and aarch64 (Ampere Altra Q80-30) architectures.
From this we can see:
Next, let's measure the same scenarios in the context of whole client handshakes. The remaining measurements are only done on our amd64 benchmark machine.
The above optimization only affects the client's first message, so now we'll see whether its effect is meaningful when compared to the rest of the computation a client must do.
The difference is visible but small, as it is diluted by the other parts of the handshake: approximately 4.3% for resumptions, 2.8% for full RSA handshakes, and 2.6% for ECDSA handshakes.
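The dilution can be made explicit with a simple model. Suppose T is the client's CPU time per handshake, T_CH is the part spent constructing the ClientHello, and the optimization saves a fraction s of T_CH. Then the overall relative saving is the ClientHello saving scaled by the ClientHello's share of the handshake (this ignores the small difference between a percentage change in time and one in throughput):

$$
\frac{\Delta T}{T} = \frac{s \, T_{\mathrm{CH}}}{T} = s \cdot f, \qquad f = \frac{T_{\mathrm{CH}}}{T}
$$

Under this model, resumptions would plausibly show the largest improvement because they skip certificate verification, so T is smaller and the ClientHello makes up a larger fraction f of the total.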