Kernel TLS
Overview
Transport Layer Security (TLS) is an Upper Layer Protocol (ULP) that runs over TCP. TLS provides end-to-end data integrity and confidentiality.
User interface
Creating a TLS connection
First create a new TCP socket and, once the connection is established, set the TLS ULP.
    sock = socket(AF_INET, SOCK_STREAM, 0);
    connect(sock, addr, addrlen);
    setsockopt(sock, SOL_TCP, TCP_ULP, "tls", sizeof("tls"));
Setting the TLS ULP allows us to set/get TLS socket options. Currently, only the symmetric encryption is handled in the kernel. After the TLS handshake is complete, we have all the parameters required to move the data path to the kernel. There is a separate socket option for moving the transmit path and the receive path into the kernel.
    /* From linux/tls.h */
    struct tls_crypto_info {
        unsigned short version;
        unsigned short cipher_type;
    };

    struct tls12_crypto_info_aes_gcm_128 {
        struct tls_crypto_info info;
        unsigned char iv[TLS_CIPHER_AES_GCM_128_IV_SIZE];
        unsigned char key[TLS_CIPHER_AES_GCM_128_KEY_SIZE];
        unsigned char salt[TLS_CIPHER_AES_GCM_128_SALT_SIZE];
        unsigned char rec_seq[TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE];
    };

    struct tls12_crypto_info_aes_gcm_128 crypto_info;

    crypto_info.info.version = TLS_1_2_VERSION;
    crypto_info.info.cipher_type = TLS_CIPHER_AES_GCM_128;
    memcpy(crypto_info.iv, iv_write, TLS_CIPHER_AES_GCM_128_IV_SIZE);
    memcpy(crypto_info.rec_seq, seq_number_write,
           TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
    memcpy(crypto_info.key, cipher_key_write, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
    memcpy(crypto_info.salt, implicit_iv_write, TLS_CIPHER_AES_GCM_128_SALT_SIZE);

    setsockopt(sock, SOL_TLS, TLS_TX, &crypto_info, sizeof(crypto_info));
Transmit and receive are set separately, but the setup is the same, using either TLS_TX or TLS_RX.
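Both directions use the same structure, so the setup can be split into a fill step and an install step. This sketch assumes TLS 1.2 with AES-GCM-128 and the uapi <linux/tls.h> header. The helper names and key-material parameters are hypothetical stand-ins for whatever the userspace TLS library exports after the handshake. SOL_TLS is defined by hand only if the C library headers lack it.

```c
#include <linux/tls.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SOL_TLS
#define SOL_TLS 282 /* value from the kernel's linux/socket.h */
#endif

/* Hypothetical helper: fill an AES-GCM-128 crypto_info for TLS 1.2 from
 * one direction's key material (write keys for TLS_TX, read keys for TLS_RX). */
static void ktls_fill_gcm128(struct tls12_crypto_info_aes_gcm_128 *ci,
                             const unsigned char *key,
                             const unsigned char *iv,
                             const unsigned char *salt,
                             const unsigned char *rec_seq)
{
    memset(ci, 0, sizeof(*ci));
    ci->info.version = TLS_1_2_VERSION;
    ci->info.cipher_type = TLS_CIPHER_AES_GCM_128;
    memcpy(ci->key, key, TLS_CIPHER_AES_GCM_128_KEY_SIZE);
    memcpy(ci->iv, iv, TLS_CIPHER_AES_GCM_128_IV_SIZE);
    memcpy(ci->salt, salt, TLS_CIPHER_AES_GCM_128_SALT_SIZE);
    memcpy(ci->rec_seq, rec_seq, TLS_CIPHER_AES_GCM_128_REC_SEQ_SIZE);
}

/* Hypothetical helper: hand the keys for one direction to the kernel.
 * @dir is TLS_TX or TLS_RX. Returns 0, or -1 with errno set. */
static int ktls_install(int sock, int dir,
                        const struct tls12_crypto_info_aes_gcm_128 *ci)
{
    return setsockopt(sock, SOL_TLS, dir, ci, sizeof(*ci));
}
```

A caller would fill one structure from the write keys and call ktls_install(sock, TLS_TX, &tx). It would then fill a second structure from the read keys and call ktls_install(sock, TLS_RX, &rx).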
Sending TLS application data
After setting the TLS_TX socket option, all application data sent over this socket is encrypted using TLS and the parameters provided in the socket option. For example, we can send an encrypted hello world record as follows:
    const char *msg = "hello world\n";
    send(sock, msg, strlen(msg), 0);
If possible, send() data is encrypted directly from the provided userspace buffer into the encrypted kernel send buffer.
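send() may accept fewer bytes than requested, for example on a non-blocking socket, so callers usually loop until everything is queued. This is a minimal sketch with a hypothetical helper name. The kernel encrypts whatever each send() call accepts, and each of those calls follows the record rules described in this section.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: send all @len bytes of @buf, retrying on short
 * writes and EINTR. Returns 0, or -1 with errno set. */
static int ktls_send_all(int sock, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = send(sock, p, len, 0);

        if (n < 0) {
            if (errno == EINTR)
                continue;   /* interrupted before sending anything */
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```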
The sendfile() system call sends the file's data in TLS records of maximum length (2^14 bytes).
    file = open(filename, O_RDONLY);
    fstat(file, &stat);
    sendfile(sock, file, &offset, stat.st_size);
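The fragment above ignores short transfers. sendfile() may send fewer bytes than requested, so a complete version loops and lets sendfile() advance the offset. This sketch adds that loop and the error checks, with hypothetical helper names. When TLS_TX is set, the kernel (not this code) splits the data into records.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: send the whole file open on @fd over @sock,
 * starting at offset 0. Returns 0, or -1 with errno set. */
static int ktls_sendfile_all(int sock, int fd)
{
    struct stat st;
    off_t offset = 0;

    if (fstat(fd, &st) < 0)
        return -1;

    while (offset < st.st_size) {
        /* sendfile() advances offset by the number of bytes it sent. */
        ssize_t n = sendfile(sock, fd, &offset, (size_t)(st.st_size - offset));

        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)   /* the file was truncated while we were sending */
            break;
    }
    return 0;
}

/* The open()/fstat()/sendfile() sequence from the fragment above. */
static int ktls_send_path(int sock, const char *filename)
{
    int fd = open(filename, O_RDONLY);
    int ret, saved;

    if (fd < 0)
        return -1;
    ret = ktls_sendfile_all(sock, fd);
    saved = errno;    /* keep sendfile()'s errno across close() */
    close(fd);
    errno = saved;
    return ret;
}
```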
TLS records are created and sent after each send() call, unless MSG_MORE is passed. MSG_MORE will delay creation of a record until MSG_MORE is not passed, or the maximum record size is reached.
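As an illustration of MSG_MORE, this sketch sends a header and a body with two send() calls. With TLS_TX installed, these calls should produce a single record, provided the combined size stays below the maximum record size. The helper name is hypothetical. To keep the sketch small, short writes are simply treated as failures.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical helper: MSG_MORE on the first send() keeps the record
 * open; the second send(), without MSG_MORE, closes it. Returns 0 on
 * success, -1 on error or short write. */
static int ktls_send_two_parts(int sock, const void *hdr, size_t hdr_len,
                               const void *body, size_t body_len)
{
    if (send(sock, hdr, hdr_len, MSG_MORE) != (ssize_t)hdr_len)
        return -1;
    if (send(sock, body, body_len, 0) != (ssize_t)body_len)
        return -1;
    return 0;
}
```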
The kernel will need to allocate a buffer for the encrypted data. This buffer is allocated at the time send() is called, so either the entire send() call will fail with ENOMEM (or block waiting for memory), or the encryption will always succeed. If send() fails with ENOMEM and some data was left on the socket buffer from a previous call using MSG_MORE, the MSG_MORE data is left on the socket buffer.
Receiving TLS application data
After setting the TLS_RX socket option, all data read with recv-family socket calls is decrypted using the provided TLS parameters. A full TLS record must be received before decryption can happen.
    char buffer[16384];
    recv(sock, buffer, sizeof(buffer), 0);
Received data is decrypted directly into the user buffer if it is large enough, and no additional allocations occur. If the userspace buffer is too small, data is decrypted in the kernel and copied to userspace.
EINVAL is returned if the TLS version in the received message does not match the version passed in setsockopt().
EMSGSIZE is returned if the received message is too big.
EBADMSG is returned if decryption failed for any other reason.
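A caller can turn these errno values into diagnostics. This is a minimal sketch with a hypothetical function name. It also covers EKEYEXPIRED, which comes from the TLS 1.3 KeyUpdate handling. recv() reports errors as -1 with errno set, so the helper takes the errno value.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: explain an errno from recv() on a TLS_RX socket.
 * Returns NULL for errors that are not TLS-specific (use strerror()). */
static const char *ktls_rx_strerror(int err)
{
    switch (err) {
    case EINVAL:
        return "record TLS version does not match the one set via setsockopt()";
    case EMSGSIZE:
        return "received record is too big";
    case EBADMSG:
        return "record failed to decrypt";
    case EKEYEXPIRED:
        return "KeyUpdate received; provide the new key with TLS_RX";
    default:
        return NULL;
    }
}
```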
Sending TLS control messages
In addition to application data, TLS has control messages, such as alert messages (record type 21) and handshake messages (record type 22). These messages can be sent over the socket by providing the TLS record type via a CMSG. For example, the following function sends @data of @length bytes using a record of type @record_type.
    /* send TLS control message using record_type */
    static int klts_send_ctrl_message(int sock, unsigned char record_type,
                                      void *data, size_t length)
    {
        struct msghdr msg = {0};
        int cmsg_len = sizeof(record_type);
        struct cmsghdr *cmsg;
        char buf[CMSG_SPACE(cmsg_len)];
        struct iovec msg_iov;   /* Vector of data to send/receive into. */

        msg.msg_control = buf;
        msg.msg_controllen = sizeof(buf);
        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_TLS;
        cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
        cmsg->cmsg_len = CMSG_LEN(cmsg_len);
        *CMSG_DATA(cmsg) = record_type;
        msg.msg_controllen = cmsg->cmsg_len;

        msg_iov.iov_base = data;
        msg_iov.iov_len = length;
        msg.msg_iov = &msg_iov;
        msg.msg_iovlen = 1;

        return sendmsg(sock, &msg, 0);
    }
Control message data should be provided unencrypted, and will be encrypted by the kernel.
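The same mechanism can send a TLS alert. This sketch assumes the caller wants a close_notify alert, which the TLS specification encodes as two bytes: level warning (1) and description close_notify (0), carried in record type 21. It separates message preparation from sending so the msghdr can be inspected. It also uses a union to keep the control buffer aligned for struct cmsghdr. The struct, helper, and macro names are hypothetical.

```c
#include <linux/tls.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#ifndef SOL_TLS
#define SOL_TLS 282 /* value from the kernel's linux/socket.h */
#endif

#define KTLS_RECORD_TYPE_ALERT 21 /* hypothetical name; value from TLS */

/* Hypothetical storage for one outgoing control record. The caller owns
 * it, so the msghdr's internal pointers stay valid until sendmsg(). */
struct ktls_ctrl {
    struct msghdr msg;
    struct iovec iov;
    union {                     /* union keeps buf aligned for cmsghdr */
        char buf[CMSG_SPACE(sizeof(unsigned char))];
        struct cmsghdr align;
    } u;
};

static void ktls_prepare_ctrl(struct ktls_ctrl *c, unsigned char record_type,
                              void *data, size_t length)
{
    struct cmsghdr *cmsg;

    memset(c, 0, sizeof(*c));
    c->msg.msg_control = c->u.buf;
    c->msg.msg_controllen = sizeof(c->u.buf);
    cmsg = CMSG_FIRSTHDR(&c->msg);
    cmsg->cmsg_level = SOL_TLS;
    cmsg->cmsg_type = TLS_SET_RECORD_TYPE;
    cmsg->cmsg_len = CMSG_LEN(sizeof(record_type));
    *CMSG_DATA(cmsg) = record_type;
    c->msg.msg_controllen = cmsg->cmsg_len;

    c->iov.iov_base = data;
    c->iov.iov_len = length;
    c->msg.msg_iov = &c->iov;
    c->msg.msg_iovlen = 1;
}

/* Send a close_notify alert: level warning (1), description 0. */
static ssize_t ktls_send_close_notify(int sock)
{
    unsigned char alert[2] = { 1, 0 };
    struct ktls_ctrl c;

    ktls_prepare_ctrl(&c, KTLS_RECORD_TYPE_ALERT, alert, sizeof(alert));
    return sendmsg(sock, &c.msg, 0);
}
```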
Receiving TLS control messages
TLS control messages are passed in the userspace buffer, with the message type passed via a cmsg. If no cmsg buffer is provided, an error is returned if a control message is received. Data messages may be received without a cmsg buffer set.
    char buffer[16384];
    char cmsg_buf[CMSG_SPACE(sizeof(unsigned char))];
    struct msghdr msg = {0};
    msg.msg_control = cmsg_buf;
    msg.msg_controllen = sizeof(cmsg_buf);

    struct iovec msg_iov;
    msg_iov.iov_base = buffer;
    msg_iov.iov_len = sizeof(buffer);

    msg.msg_iov = &msg_iov;
    msg.msg_iovlen = 1;

    int ret = recvmsg(sock, &msg, 0 /* flags */);
    if (ret < 0) {
        // Handle the error (see errno).
    }

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_TLS &&
        cmsg->cmsg_type == TLS_GET_RECORD_TYPE) {
        int record_type = *((unsigned char *)CMSG_DATA(cmsg));
        // Do something with record_type, and control message data in
        // buffer.
        //
        // Note that record_type may be == to application data (23).
    } else {
        // Buffer contains application data.
    }
recv() never returns data from TLS records of different types in a single call.
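Dispatching on the record type can be factored into a helper that walks every control message instead of only the first. This sketch assumes, based on the description above, that a non-data record is never returned without its record-type cmsg. Under that assumption, a missing cmsg means application data. The function and macro names are hypothetical.

```c
#include <linux/tls.h>
#include <stddef.h>
#include <sys/socket.h>

#ifndef SOL_TLS
#define SOL_TLS 282 /* value from the kernel's linux/socket.h */
#endif

#define KTLS_RECORD_TYPE_APPLICATION_DATA 23 /* hypothetical name */

/* Return the TLS record type of the data that recvmsg() placed in @msg. */
static int ktls_record_type(struct msghdr *msg)
{
    struct cmsghdr *cmsg;

    for (cmsg = CMSG_FIRSTHDR(msg); cmsg; cmsg = CMSG_NXTHDR(msg, cmsg)) {
        if (cmsg->cmsg_level == SOL_TLS &&
            cmsg->cmsg_type == TLS_GET_RECORD_TYPE)
            return *(unsigned char *)CMSG_DATA(cmsg);
    }
    /* No record-type cmsg (e.g. no control buffer was supplied). */
    return KTLS_RECORD_TYPE_APPLICATION_DATA;
}
```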
TLS 1.3 Key Updates
In TLS 1.3, KeyUpdate handshake messages signal that the sender is updating its TX key. Any message sent after a KeyUpdate will be encrypted using the new key. The userspace library can pass the new key to the kernel using the TLS_TX and TLS_RX socket options, as for the initial keys. The TLS version and cipher cannot be changed.
To prevent attempting to decrypt incoming records using the wrong key, decryption will be paused when a KeyUpdate message is received by the kernel, until the new key has been provided using the TLS_RX socket option. Any read occurring after the KeyUpdate has been read and before the new key is provided will fail with EKEYEXPIRED. poll() will not report any read events from the socket until the new key is provided. There is no pausing on the transmit side.
Userspace should make sure that the crypto_info provided has been set properly. In particular, the kernel will not check for key/nonce reuse.
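A receive loop can absorb the EKEYEXPIRED window described above. This sketch assumes the caller has already passed the KeyUpdate handshake record to its TLS library, having read it with recvmsg() as a control message. A hypothetical callback then derives the next RX key and installs it with setsockopt(sock, SOL_TLS, TLS_RX, ...). Apart from recvmsg() itself, everything here is illustrative and not a library API.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

/* Hypothetical callback: derive the next RX key in the userspace TLS
 * library and install it with TLS_RX. Returns 0 on success. */
typedef int (*ktls_rx_rekey_fn)(int sock, void *ctx);

/* Hypothetical helper: recvmsg() that retries after a rekey when the
 * kernel reports EKEYEXPIRED. @rekey may be NULL (no recovery). */
static ssize_t ktls_recvmsg_rekey(int sock, struct msghdr *msg,
                                  ktls_rx_rekey_fn rekey, void *ctx)
{
    size_t controllen = msg->msg_controllen;

    for (;;) {
        ssize_t n;

        msg->msg_controllen = controllen; /* recvmsg() may shrink it */
        n = recvmsg(sock, msg, 0);
        if (n >= 0)
            return n;
        if (errno == EINTR)
            continue;
        /* EKEYEXPIRED: decryption is paused until the new key is set. */
        if (errno != EKEYEXPIRED || !rekey || rekey(sock, ctx) != 0)
            return -1;
    }
}
```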
The number of successful and failed key updates is tracked in the TlsTxRekeyOk, TlsRxRekeyOk, TlsTxRekeyError, and TlsRxRekeyError statistics. The TlsRxRekeyReceived statistic counts KeyUpdate handshake messages that have been received.
Integrating into a userspace TLS library
At a high level, the kernel TLS ULP is a replacement for the record layer of a userspace TLS library.
A patchset to OpenSSL to use ktls as the record layer is here.
There is also an example of calling send directly after a handshake using gnutls. Since it does not implement a full record layer, control messages are not supported.
Optional optimizations
The TLS ULP can make certain condition-specific optimizations, if requested. These optimizations are either not universally beneficial or may impact correctness, so they require an opt-in. All options are set per-socket using setsockopt(), and their state can be checked using getsockopt() and via socket diag (ss).
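The boolean options share a set-then-verify pattern, so a small helper can enable one and read it back. This sketch assumes the option value is an int used as a boolean. The helper names are hypothetical. TLS_TX_ZEROCOPY_RO and TLS_RX_EXPECT_NO_PAD exist only in sufficiently recent <linux/tls.h> headers, so the usage is guarded with #ifdef.

```c
#include <errno.h>
#include <linux/tls.h>
#include <sys/socket.h>

#ifndef SOL_TLS
#define SOL_TLS 282 /* value from the kernel's linux/socket.h */
#endif

/* Hypothetical helper: enable a boolean TLS option and return the value
 * reported back by getsockopt(), or -1 with errno set. */
static int ktls_enable_opt(int sock, int optname)
{
    int on = 1, val = 0;
    socklen_t len = sizeof(val);

    if (setsockopt(sock, SOL_TLS, optname, &on, sizeof(on)) < 0)
        return -1;
    if (getsockopt(sock, SOL_TLS, optname, &val, &len) < 0)
        return -1;
    return val;
}

/* Opt in to TLS_RX_EXPECT_NO_PAD if the installed headers know about it. */
static int ktls_try_expect_no_pad(int sock)
{
#ifdef TLS_RX_EXPECT_NO_PAD
    return ktls_enable_opt(sock, TLS_RX_EXPECT_NO_PAD);
#else
    (void)sock;
    errno = ENOPROTOOPT;
    return -1;
#endif
}
```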
TLS_TX_ZEROCOPY_RO
For device offload only. Allows sendfile() data to be transmitted directly to the NIC without making an in-kernel copy. This allows true zero-copy behavior when device offload is enabled.
The application must make sure that the data is not modified between being submitted and transmission completing. In other words, this option is mostly applicable when the data sent on a socket via sendfile() is read-only.
Modifying the data may result in different versions of the data being used for the original TCP transmission and for TCP retransmissions. To the receiver, this will look as if the TLS records had been tampered with, and will result in record authentication failures.
TLS_RX_EXPECT_NO_PAD
TLS 1.3 only. Expect the sender not to pad records. This allows the data to be decrypted directly into user space buffers with TLS 1.3.
This optimization is safe to enable only if the remote end is trusted; otherwise, it is an attack vector for doubling the TLS processing cost.
If a decrypted record turns out to have been padded, or is not a data record, it will be decrypted again into a kernel buffer without zero copy. Such events are counted in the TlsDecryptRetry statistic.
TLS_TX_MAX_PAYLOAD_LEN
Specifies the maximum size of the plaintext payload for transmitted TLS records.
When this option is set, the kernel enforces the specified limit on all outgoing TLS records. No plaintext fragment will exceed this size. This option can be used to implement the TLS Record Size Limit extension [1].
For TLS 1.2, the value corresponds directly to the record size limit.
For TLS 1.3, the value should be set to record_size_limit - 1, since the record size limit includes one additional byte for the ContentType field.
The valid range for this option is 64 to 16384 bytes for TLS 1.2, and 63 to 16384 bytes for TLS 1.3. The lower minimum for TLS 1.3 accounts for the extra byte used by the ContentType field.
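The arithmetic in the paragraphs above fits in a small helper that converts a record_size_limit value received from the peer into the option value. This is a sketch with a hypothetical name. Clamping limits above the protocol maximum (2^14 for TLS 1.2, 2^14 + 1 for TLS 1.3) is an assumption of this sketch. The result would then be passed to setsockopt() with TLS_TX_MAX_PAYLOAD_LEN.

```c
/* Hypothetical helper: map a peer's record_size_limit to a
 * TLS_TX_MAX_PAYLOAD_LEN value. Returns -1 if the limit is below 64. */
static int ktls_max_payload_from_rsl(int is_tls13, unsigned int rsl)
{
    unsigned int proto_max = is_tls13 ? 16385u : 16384u;

    if (rsl < 64)
        return -1;
    if (rsl > proto_max)
        rsl = proto_max;             /* assumption: clamp oversized limits */
    /* TLS 1.3 counts the ContentType byte inside the limit. */
    return (int)(is_tls13 ? rsl - 1 : rsl);
}
```

The results cover the ranges stated above: 64 to 16384 for TLS 1.2, and 63 to 16384 for TLS 1.3.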
Statistics
The TLS implementation exposes the following per-namespace statistics (/proc/net/tls_stat):
- TlsCurrTxSw, TlsCurrRxSw - number of TX and RX sessions currently installed where host handles cryptography
- TlsCurrTxDevice, TlsCurrRxDevice - number of TX and RX sessions currently installed where NIC handles cryptography
- TlsTxSw, TlsRxSw - number of TX and RX sessions opened with host cryptography
- TlsTxDevice, TlsRxDevice - number of TX and RX sessions opened with NIC cryptography
- TlsDecryptError - record decryption failed (e.g. due to incorrect authentication tag)
- TlsDeviceRxResync - number of RX resyncs sent to NICs handling cryptography
- TlsDecryptRetry - number of RX records which had to be re-decrypted due to TLS_RX_EXPECT_NO_PAD mis-prediction. Note that this counter will also increment for non-data records.
- TlsRxNoPadViolation - number of data RX records which had to be re-decrypted due to TLS_RX_EXPECT_NO_PAD mis-prediction.
- TlsTxRekeyOk, TlsRxRekeyOk - number of successful rekeys on existing sessions for TX and RX
- TlsTxRekeyError, TlsRxRekeyError - number of failed rekeys on existing sessions for TX and RX
- TlsRxRekeyReceived - number of received KeyUpdate handshake messages, requiring userspace to provide a new RX key
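These counters can also be read programmatically. This is a minimal sketch with a hypothetical helper name. It assumes each line of /proc/net/tls_stat holds a counter name, followed by whitespace, followed by a decimal value.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: look up counter @name (e.g. "TlsDecryptError") in
 * a tls_stat file. Returns 0 and stores the value in *out, or -1 if the
 * file or the counter is missing. */
static int tls_stat_get(const char *path, const char *name,
                        unsigned long long *out)
{
    char key[64];
    unsigned long long val;
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    /* Each iteration consumes one "Name value" pair. */
    while (fscanf(f, "%63s %llu", key, &val) == 2) {
        if (strcmp(key, name) == 0) {
            *out = val;
            fclose(f);
            return 0;
        }
    }
    fclose(f);
    return -1;
}
```

For example, tls_stat_get("/proc/net/tls_stat", "TlsDecryptError", &v) reads the decryption-failure counter for the current network namespace.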