Measuring and (slightly) Improving Post-Quantum Handshake Performance

2024-12-17

To defend against the potential advent of "Cryptographically Relevant Quantum Computers" there is a move to using "hybrid" key exchange algorithms. These glue together a widely-deployed classical algorithm (like X25519) and a new post-quantum-secure algorithm (like ML-KEM) and treat the result as one TLS-level key exchange algorithm (like X25519MLKEM768).

In this report, first we'll measure the additional cost of post-quantum-secure key exchange. Then we'll describe and measure an optimization we have implemented.

Headline measurements

All these measurements are taken on our amd64 benchmarking machine, which has a Xeon E-2386G CPU. We'll compare:

All three are taken on the same hardware, and the latter measurements are from our previous report -- which also contains reproduction instructions and describes what the benchmarks measure.

One important thing to note is that post-quantum key exchange involves sending and receiving much larger messages than classical ones. Our benchmark design only covers CPU costs -- and does not include networking -- so real-world performance will be worse than these measurements.
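To give a sense of scale for "much larger", here is a small sketch of the key share sizes involved. The byte counts are taken from the published specifications (RFC 7748 for X25519, FIPS 203 for ML-KEM-768) and the X25519MLKEM768 share layout, not from the benchmarks in this report; the function name `key_share_sizes` is made up for illustration.

```python
# Sizes in bytes. X25519 values come from RFC 7748, ML-KEM-768 values from FIPS 203.
X25519_PUBLIC_KEY = 32
MLKEM768_ENCAPSULATION_KEY = 1184
MLKEM768_CIPHERTEXT = 1088


def key_share_sizes(group: str) -> tuple[int, int]:
    """Return (client_share, server_share) sizes in bytes for a named group."""
    if group == "X25519":
        # Both sides send a single X25519 public key.
        return X25519_PUBLIC_KEY, X25519_PUBLIC_KEY
    if group == "X25519MLKEM768":
        # Client sends an ML-KEM encapsulation key plus an X25519 public key;
        # server replies with an ML-KEM ciphertext plus its X25519 public key.
        client = MLKEM768_ENCAPSULATION_KEY + X25519_PUBLIC_KEY
        server = MLKEM768_CIPHERTEXT + X25519_PUBLIC_KEY
        return client, server
    raise ValueError(f"unknown group: {group}")


for group in ("X25519", "X25519MLKEM768"):
    client, server = key_share_sizes(group)
    print(f"{group}: client share {client} bytes, server share {server} bytes")
```

These are key share payload sizes only; the surrounding TLS extension and record framing add a few more bytes in each direction.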

client handshake performance results on amd64 architecture

server handshake performance results on amd64 architecture

The cost of X25519MLKEM768 post-quantum key exchange is clearly visible for both clients and servers.

We can see that the performance headroom that rustls has attained means we can almost completely absorb the extra cost of post-quantum key exchange, while still performing better than (post-quantum-insecure) OpenSSL -- with the exception of client resumption.

We will do further comparative benchmarking in this area when OpenSSL gains post-quantum key exchange support.

Sharing X25519 setup costs

Background

In TLS1.3, the client starts the key exchange in its first message (the ClientHello). The ClientHello includes both a description of which algorithms the client supports, and zero or more presumptive "key shares".

The server then evaluates which algorithms it is willing to use, and either uses one of the presumptive key shares, or replies with a HelloRetryRequest which instructs the client to send a new ClientHello with a specific, mutually-acceptable key share.

A HelloRetryRequest can be expensive, because it introduces an additional round trip into the handshake. It also means any work the client did for its presumptive key shares is wasted.
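The server's decision described above can be sketched roughly as follows. This is a simplified illustration, not how rustls or any particular server implements it: the `ClientHello` class and `server_select` function are hypothetical, and the policy shown (accept any mutually supported group that already has a key share before falling back to a HelloRetryRequest) is only one possible choice.

```python
from dataclasses import dataclass


@dataclass
class ClientHello:
    supported_groups: list[str]   # every group the client can do
    key_shares: dict[str, bytes]  # presumptive shares, keyed by group name


def server_select(hello: ClientHello, server_groups: list[str]) -> tuple[str, str]:
    """Return ("use_share", group) or ("hello_retry_request", group)."""
    # Groups both sides support, in the server's preference order.
    mutual = [g for g in server_groups if g in hello.supported_groups]
    if not mutual:
        raise ValueError("no mutually acceptable group")
    # Prefer a group the client already sent a share for: no extra round trip.
    for group in mutual:
        if group in hello.key_shares:
            return ("use_share", group)
    # Otherwise ask the client to retry with a share for the preferred group.
    return ("hello_retry_request", mutual[0])


# A client that only sent a hybrid share, talking to an X25519-only server.
hello = ClientHello(
    supported_groups=["X25519MLKEM768", "X25519"],
    key_shares={"X25519MLKEM768": b"placeholder"},
)
print(server_select(hello, ["X25519"]))
```

In this example the server supports X25519 but received no X25519 share, so the sketch falls through to the HelloRetryRequest branch, and the client's hybrid key share work is discarded.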

It's therefore advantageous for a client to avoid HelloRetryRequests, by:

The key shares in a ClientHello would look like:

diagram of TLS1.3 client key exchange with X25519MLKEM768

At least for a transitional period, we want to avoid a HelloRetryRequest round trip when connecting to a server that hasn't been upgraded to support X25519MLKEM768. That means also offering a separate X25519 key share:

diagram of TLS1.3 client key exchange with X25519MLKEM768 and X25519

However, this arrangement is not optimal. While X25519 setup is very fast, we are doing it twice and then we are guaranteed to throw away half of that work, because the server can only ever select one key share to use.

Instead, we can do:

diagram of TLS1.3 optimized client key exchange with X25519MLKEM768 and X25519

This report measures the benefit of that optimization.

This optimization is described further in draft-ietf-tls-hybrid-design section 3.2.
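The difference between the two arrangements can be sketched as below. This is an illustration only and does not reflect the rustls implementation: `x25519_keygen` and `mlkem768_keygen` are placeholders that return random bytes of the correct lengths instead of doing real cryptography, and they discard the secret halves that a real client would keep in order to complete the handshake. The counter exists just to show how much X25519 setup work each approach does.

```python
import os

# Counts how often each (placeholder) key generation routine runs.
keygen_calls = {"x25519": 0, "mlkem768": 0}


def x25519_keygen() -> tuple[bytes, bytes]:
    """Placeholder: (secret, public) of the right lengths, NOT real X25519."""
    keygen_calls["x25519"] += 1
    return os.urandom(32), os.urandom(32)


def mlkem768_keygen() -> tuple[bytes, bytes]:
    """Placeholder: (decapsulation_key, encapsulation_key), NOT real ML-KEM."""
    keygen_calls["mlkem768"] += 1
    return os.urandom(2400), os.urandom(1184)


def naive_key_shares() -> dict[str, bytes]:
    _, mlkem_pub = mlkem768_keygen()
    _, hybrid_x_pub = x25519_keygen()  # X25519 key used only in the hybrid share
    _, x_pub = x25519_keygen()         # second, independent X25519 key
    return {"X25519MLKEM768": mlkem_pub + hybrid_x_pub, "X25519": x_pub}


def optimized_key_shares() -> dict[str, bytes]:
    _, mlkem_pub = mlkem768_keygen()
    _, x_pub = x25519_keygen()         # one X25519 key, reused in both shares
    return {"X25519MLKEM768": mlkem_pub + x_pub, "X25519": x_pub}
```

With the shared key, the last 32 bytes of the hybrid share are identical to the standalone X25519 share, and whichever share the server selects, none of the X25519 setup work is thrown away.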

Micro benchmarking

First, we can micro-benchmark the time to construct and serialize a ClientHello, in a variety of situations:

We run this on two machines that cover both amd64 (Xeon E-2386G) and aarch64 (Ampere Altra Q80-30) architectures.

micro benchmark results on amd64 architecture

micro benchmark results on arm64 architecture

From this we can see:

Whole handshakes

Next, let's measure the same scenarios in the context of whole client handshakes. The remaining measurements are only done on our amd64 benchmark machine.

The above optimization only affects the client's first message, so now we'll see whether the effect of the optimization is meaningful when compared to the rest of the computation a client must do.

client handshake performance results on amd64 architecture

The difference is visible but small, as it has been diluted by other parts of the handshake. It is approximately 4.3% for resumptions, 2.8% for full RSA handshakes, and 2.6% for ECDSA handshakes.

