RFC 8871 | Private Media Framework | January 2021 |
Jones, et al. | Standards Track | [Page] |
This document describes a solution framework for ensuring that media confidentiality and integrity are maintained end to end within the context of a switched conferencing environment where Media Distributors are not trusted with the end-to-end media encryption keys. The solution builds upon existing security mechanisms defined for the Real-time Transport Protocol (RTP).¶
This is an Internet Standards Track document.¶
This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 7841.¶
Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at https://www.rfc-editor.org/info/rfc8871.¶
Copyright (c) 2021 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.¶
Switched conferencing is an increasingly popular model for multimedia conferences with multiple participants using a combination of audio, video, text, and other media types. With this model, real-time media flows from conference participants are not mixed, transcoded, translated, recomposed, or otherwise manipulated by a Media Distributor, as might be the case with a traditional media server or Multipoint Control Unit (MCU). Instead, media flows transmitted by conference participants are simply forwarded by Media Distributors to each of the other participants. Media Distributors often forward only a subset of flows based on voice activity detection or other criteria. In some instances, Media Distributors may make limited modifications to RTP headers [RFC3550], for example, but the actual media content (e.g., voice or video data) is unaltered.¶
An advantage of switched conferencing is that Media Distributors can be more easily deployed on general-purpose computing hardware, including virtualized environments in private and public clouds. Virtualized public cloud environments have been viewed as less secure, since resources are not always physically controlled by those who use them. This document defines improved security so as to lower the barrier to taking advantage of those environments.¶
This document defines a solution framework wherein media privacy is ensured by making it impossible for a Media Distributor to gain access to keys needed to decrypt or authenticate the actual media content sent between conference participants. At the same time, the framework allows the Media Distributors to modify certain RTP headers; add, remove, encrypt, or decrypt RTP header extensions; and encrypt and decrypt RTP Control Protocol (RTCP) packets [RFC3550]. The framework also prevents replay attacks by authenticating each packet transmitted between a given participant and the Media Distributor using a unique key per endpoint that is independent from the key for media encryption and authentication.¶
This solution framework provides for enhanced privacy in RTP-based conferencing environments while utilizing existing security procedures defined for RTP with minimal enhancements.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14 [RFC2119] [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Additionally, this solution framework uses the following terms and abbreviations:¶
Figure 1 depicts the trust relationships, direct or indirect, between entities described in the subsequent subsections. Note that these entities may be co-located or further divided into multiple, separate physical devices.¶
Please note that some entities classified as untrusted in the simple, general deployment scenario used most commonly in this document might be considered trusted in other deployments. This document does not preclude such scenarios, but it keeps the definitions and examples focused by only using the simple, most general deployment scenario.¶
                       |
   +----------+        |        +-----------------+
   | Endpoint |        |        | Call Processing |
   +----------+        |        +-----------------+
                       |
                       |
+----------------+     |     +--------------------+
| Key Distributor|     |     | Media Distributor  |
+----------------+     |     +--------------------+
                       |
     Trusted           |     Untrusted
     Entities          |     Entities
                       |
The architecture described in this framework document enables conferencing infrastructure to be hosted in domains, such as in a cloud conferencing provider's facilities, where the trustworthiness is below the level needed to assume that the privacy of the participant's media is not compromised. The conferencing infrastructure in such a domain is still trusted with reliably connecting the participants together in a conference but is not trusted with keying material needed to decrypt any of the participant's media. Entities in such less-trustworthy domains are referred to as untrusted entities from this point forward.¶
It is important to understand that "untrusted" in this document does not mean that an entity is not expected to function properly. Rather, it means only that the entity does not have access to the E2E media encryption keys.¶
A Media Distributor forwards RTP flows between endpoints in the conference while performing per-hop authentication of each RTP packet. The Media Distributor may need access to one or more RTP headers or header extensions, potentially adding or modifying a certain subset. The Media Distributor also relays secured messaging between the endpoints and the Key Distributor and acquires per-hop key information from the Key Distributor. The actual media content must not be decryptable by a Media Distributor, as it is not trusted to have access to the E2E media encryption keys. The key exchange mechanisms specified in this framework prevent the Media Distributor from gaining access to the E2E media encryption keys.¶
An endpoint's ability to connect to a conference serviced by a Media Distributor does not imply that the endpoint is authorized to have access to the E2E media encryption keys, as the Media Distributor does not have the ability to determine whether an endpoint is authorized. Instead, the Key Distributor is responsible for authenticating the endpoint (e.g., using WebRTC identity assertions [RFC8827]) and determining its authorization to receive E2E and HBH media encryption keys.¶
A Media Distributor must perform its role in properly forwarding media packets while taking measures to mitigate the adverse effects of denial-of-service attacks (refer to Section 8) to a level equal to or better than traditional conferencing (non-PERC) deployments.¶
A Media Distributor or associated conferencing infrastructure may also initiate or terminate various messaging techniques related to conference control. This topic is outside the scope of this framework document.¶
Call processing is untrusted in the simple, general deployment scenario. When a physical subset of call processing resides in facilities outside the trusted domain, it should not be trusted to have access to E2E key information.¶
Call processing may include the processing of call signaling messages, as well as the signing of those messages. It may also authenticate the endpoints for the purpose of call signaling and of subsequently joining a conference hosted through one or more Media Distributors. Call processing may optionally ensure the privacy of call signaling messages between itself (call processing), the endpoint, and other entities.¶
From the PERC model system's perspective, entities considered trusted (refer to Figure 1) can be in possession of the E2E media encryption keys for one or more conferences.¶
An endpoint is considered trusted and has access to E2E key information. While it is possible for an endpoint to be compromised, subsequently performing in undesired ways, defining endpoint resistance to compromise is outside the scope of this document. Endpoints take measures to mitigate the adverse effects of denial-of-service attacks (refer to Section 8) from other entities, including from other endpoints, to a level equal to or better than traditional conference (non-PERC) deployments.¶
The Key Distributor, which may be co-located with an endpoint or exist standalone, is responsible for providing key information to endpoints for both E2E and HBH security and for providing key information to Media Distributors for HBH security.¶
Interaction between the Key Distributor and call processing is necessary for proper conference-to-endpoint mappings. This is described in Section 5.3.¶
The Key Distributor needs to be secured and managed in a way that prevents exploitation by an adversary, as any kind of compromise of the Key Distributor puts the security of the conference at risk.¶
The Key Distributor needs to know which endpoints and which Media Distributors are authorized to participate in the conference. How the Key Distributor obtains this information is outside the scope of this document. However, Key Distributors MUST reject DTLS associations with any unauthorized endpoint or Media Distributor.¶
The purpose of this framework is to define a means through which media privacy is ensured when communicating within a conferencing environment consisting of one or more Media Distributors that only switch, and hence do not terminate, media. It does not otherwise attempt to hide the fact that a conference between endpoints is taking place.¶
This framework reuses several specified RTP security technologies, including the Secure Real-time Transport Protocol (SRTP) [RFC3711], Encrypted Key Transport (EKT) [RFC8870], and DTLS-SRTP.¶
This solution framework focuses on the E2E privacy and integrity of the participant's media by limiting access to the E2E key used for authenticated E2E encryption to trusted entities only. However, this framework does give a Media Distributor access to RTP header fields and header extensions, as well as the ability to modify a certain subset of the header fields and to add or change header extensions. Packets received by a Media Distributor or an endpoint are authenticated hop by hop.¶
To enable all of the above, this framework defines the use of two security contexts and two associated encryption keys: an "inner" key (a distinct E2E key for each transmitted media flow) for authenticated encryption of RTP media between endpoints and an "outer" key (a HBH key) known only to a Media Distributor or the adjacent endpoint for the hop between an endpoint and a Media Distributor or peer endpoint. An endpoint will receive one or more E2E keys from every other endpoint in the conference that correspond to the media flows transmitted by those other endpoints, while HBH keys are derived from the DTLS-SRTP association with the Key Distributor. Two communicating Media Distributors use DTLS-SRTP associations directly with each other to obtain the HBH keys they will use. See Section 4.5 for more details on key exchange.¶
+-------------+                                +-------------+
|             |################################|             |
|    Media    |------------------------ *----->|    Media    |
| Distributor |<----------------------*-|------| Distributor |
|      X      |#####################*#|#|######|      Y      |
|             |                     | | |      |             |
+-------------+                     | | |      +-------------+
   #  ^ |  #          HBH Key (XY) -+ | |         #  ^ |  #
   #  | |  #           E2E Key (B) ---+ |         #  | |  #
   #  | |  #           E2E Key (A) -----+         #  | |  #
   #  | |  #                                      #  | |  #
   #  | |  #                                      #  | |  #
   #  | |  *---- HBH Key (AX)    HBH Key (YB) ----*  | |  #
   #  | |  #                                      #  | |  #
   #  *--------- E2E Key (A)      E2E Key (A) ---------*  #
   #  | *------- E2E Key (B)      E2E Key (B) -------* |  #
   #  | |  #                                      #  | |  #
   #  | v  #                                      #  | v  #
+-------------+                                +-------------+
| Endpoint A  |                                | Endpoint B  |
+-------------+                                +-------------+
The double transform [RFC8723] enables endpoints to perform encryption using both the E2E and HBH contexts while still preserving the same overall interface as other SRTP transforms. The Media Distributor simply uses the corresponding normal (single) AES-GCM transform, keyed with the appropriate HBH keys. See Section 6.1 for a description of the keys used in PERC and Section 7 for a diagram of how encrypted RTP packets appear on the wire.¶
RTCP is only encrypted hop by hop, not end to end. This framework does not provide an additional step for RTCP-authenticated encryption. Rather, implementations utilize the existing procedures specified in [RFC3711]; those procedures use the same outer, HBH cryptographic context chosen in the double transform operation described above. For this reason, endpoints MUST NOT send confidential information via RTCP.¶
To ensure the confidentiality of E2E keys shared between endpoints, endpoints use a common Key Encryption Key (KEK) that is known only by the trusted entities in a conference. That KEK, defined in the EKT specification [RFC8870] as the EKT Key, is used to subsequently encrypt the SRTP master key used for E2E-authenticated encryption of media sent by a given endpoint. Each endpoint in the conference creates an SRTP master key for E2E-authenticated encryption and keeps track of the E2E keys received via the Full EKT Tag for each distinct synchronization source (SSRC) in the conference so that it can properly decrypt received media. An endpoint may change its E2E key at any time and advertise that new key to the conference as specified in [RFC8870].¶
Any given RTP media flow is identified by its SSRC, and an endpoint might send more than one at a time and change the mix of media flows transmitted during the lifetime of a conference.¶
Thus, an endpoint MUST maintain a list of SSRCs from received RTP flows and each SSRC's associated E2E key information. An endpoint MUST discard old E2E keys no later than when it leaves the conference.¶
If the packet is to contain RTP header extensions, it should be noted that those extensions are only encrypted hop by hop per [RFC8723]. For this reason, endpoints MUST NOT transmit confidential information via RTP header extensions.¶
To ensure the integrity of transmitted media packets, it is REQUIRED that every packet be authenticated hop by hop between an endpoint and a Media Distributor, as well as between Media Distributors. The authentication key used for HBH authentication is derived from an SRTP master key shared only on the respective hop. Each HBH key is distinct per hop, and no two hops ever use the same SRTP master key.¶
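The per-SSRC bookkeeping required above can be sketched as a small table. This is a minimal Python illustration only: the class name `E2EKeyStore` and its methods are hypothetical, and a real endpoint would also track key lifetimes and the EKT SPI under which each key arrived.

```python
from typing import Dict, Optional


class E2EKeyStore:
    """Hypothetical table mapping a received RTP flow's SSRC to its E2E master key."""

    def __init__(self) -> None:
        self._keys: Dict[int, bytes] = {}

    def learn(self, ssrc: int, master_key: bytes) -> None:
        # Called after a Full EKT Tag for `ssrc` has been decrypted with the KEK.
        # A later tag for the same SSRC replaces the earlier key (the sender rekeyed).
        self._keys[ssrc] = master_key

    def lookup(self, ssrc: int) -> Optional[bytes]:
        # None means no key is known yet, so media from this SSRC cannot be decrypted.
        return self._keys.get(ssrc)

    def forget(self, ssrc: int) -> None:
        # Drop the key of a flow that has stopped (e.g., the sender changed its media mix).
        self._keys.pop(ssrc, None)

    def leave_conference(self) -> None:
        # Old E2E keys must be discarded no later than when the endpoint leaves.
        self._keys.clear()

    def __len__(self) -> int:
        return len(self._keys)
```

Keying the table by SSRC alone mirrors the requirement as stated; an implementation that must hold keys from before and after a rekey at the same time would likely extend the key to (SSRC, SPI).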
While endpoints also perform HBH authentication, the ability of the endpoints to reconstruct the original RTP header also enables the endpoints to authenticate RTP packets end to end. This design gives the Media Distributor the flexibility to change certain RTP header values as packets are forwarded. Values that the Media Distributor can change in the RTP header are defined in [RFC8723]. RTCP can only be encrypted hop by hop, giving the Media Distributor the flexibility to (1) forward RTCP content unchanged, (2) transmit compound RTCP packets, (3) initiate RTCP packets for reporting statistics, or (4) convey other information. Performing HBH authentication for all RTP and RTCP packets also helps provide replay protection (see Section 8). The use of the replay protection mechanism specified in Section 3.3.2 of [RFC3711] is REQUIRED at each hop.¶
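The per-hop replay check referenced above keeps a sliding window of recently accepted packet indices. The following Python sketch follows the general bitmask-window approach described in Section 3.3.2 of [RFC3711]; the class name is hypothetical, the default window size of 64 is an illustrative choice, and deriving the 48-bit packet index from the 16-bit sequence number and rollover counter is assumed to happen elsewhere.

```python
class ReplayWindow:
    """Sliding-window replay list for one hop (cf. RFC 3711, Section 3.3.2).

    `index` is the SRTP packet index (rollover counter * 2**16 + sequence number).
    """

    def __init__(self, size: int = 64) -> None:
        self.size = size    # window size in packets; 64 is a common choice
        self.highest = -1   # highest index accepted so far; -1 means none yet
        self.bitmap = 0     # bit d set <=> index (highest - d) was already accepted

    def check(self, index: int) -> bool:
        """Return True if `index` is new and not too old to be judged."""
        if index > self.highest:
            return True
        delta = self.highest - index
        if delta >= self.size:
            return False    # left of the window: treat as a replay
        return not (self.bitmap >> delta) & 1

    def update(self, index: int) -> None:
        """Mark `index` as seen; call only after the packet authenticated."""
        if index > self.highest:
            shift = index - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = index
        elif self.highest - index < self.size:
            self.bitmap |= 1 << (self.highest - index)
```

Calling `update` only after HBH authentication succeeds keeps forged packets from advancing the window and locking out genuine ones.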
If there is a need to encrypt one or more RTP header extensions hop by hop, the endpoint derives an encryption key from the HBH SRTP master key to encrypt header extensions as per [RFC6904]. This still gives the Media Distributor visibility into header extensions, such as the one used to determine the audio level [RFC6464] of conference participants. Note that when RTP header extensions are encrypted, all hops need to decrypt and re-encrypt these encrypted header extensions. Please refer to Sections 5.1, 5.2, and 5.3 of [RFC8723] for procedures to perform RTP header extension encryption and decryption.¶
In brief, the keys used by any given endpoint are determined as follows:¶
The Media Distributor maintains a tunnel with the Key Distributor (e.g., using the tunnel protocol defined in [PERC-DTLS]), making it possible for the Media Distributor to facilitate the establishment of a secure DTLS association between each endpoint and the Key Distributor as shown in Figure 3. The DTLS association between endpoints and the Key Distributor enables each endpoint to generate E2E and HBH keys and receive the KEK. At the same time, the Key Distributor securely provides the HBH key information to the Media Distributor. The key information summarized here may include the SRTP master key, the SRTP master salt, and the negotiated cryptographic transform.¶
                          +-----------+
                 KEK info |    Key    | HBH Key info to
             to Endpoints |Distributor| Endpoints & Media Distributor
                          +-----------+
                             # ^ ^ #
                             # | | #--- Tunnel
                             # | | #
+-----------+             +-----------+             +-----------+
| Endpoint  |    DTLS     |   Media   |    DTLS     | Endpoint  |
|    KEK    |<------------|Distributor|------------>|    KEK    |
|  HBH Key  | to Key Dist | HBH Keys  | to Key Dist |  HBH Key  |
+-----------+             +-----------+             +-----------+
In addition to the secure tunnel between the Media Distributor and the Key Distributor, there are two additional types of security associations utilized as a part of the key exchange, as discussed in the following paragraphs. One is a DTLS-SRTP association between an endpoint and the Key Distributor (with packets passing through the Media Distributor), and the other is a DTLS-SRTP association between peer Media Distributors.¶
Endpoints establish a DTLS-SRTP association over the RTP session with the Media Distributor and its media ports for the purposes of key information exchange with the Key Distributor. The Media Distributor does not terminate the DTLS signaling but instead forwards DTLS packets received from an endpoint on to the Key Distributor (and vice versa) via a tunnel established between the Media Distributor and the Key Distributor.¶
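Because DTLS and SRTP share the endpoint's media port, a Media Distributor has to tell them apart before deciding whether to relay a datagram to the Key Distributor or process it as media. One common way is to inspect the first byte, as in the multiplexing scheme of RFC 7983 (which updates RFC 5764). The Python sketch below assumes that scheme; `route`, `forward_to_key_distributor`, and `process_media` are hypothetical names, and the tunnel itself is not modeled.

```python
from enum import Enum
from typing import Callable


class Kind(Enum):
    STUN = "stun"
    DTLS = "dtls"
    SRTP = "srtp"    # SRTP or SRTCP
    OTHER = "other"  # ZRTP, TURN channel data, or unknown


def classify(datagram: bytes) -> Kind:
    """Classify a datagram by its first byte (ranges as in RFC 7983)."""
    if not datagram:
        return Kind.OTHER
    first = datagram[0]
    if first <= 3:
        return Kind.STUN
    if 20 <= first <= 63:
        return Kind.DTLS
    if 128 <= first <= 191:
        return Kind.SRTP
    return Kind.OTHER


def route(datagram: bytes,
          forward_to_key_distributor: Callable[[bytes], None],
          process_media: Callable[[bytes], None]) -> Kind:
    kind = classify(datagram)
    if kind is Kind.DTLS:
        # The Media Distributor does not terminate DTLS; it relays it through the tunnel.
        forward_to_key_distributor(datagram)
    elif kind is Kind.SRTP:
        # HBH authentication and forwarding happen on this path.
        process_media(datagram)
    return kind
```

Passing the two handlers in as callables keeps the classification logic independent of how the tunnel and the media pipeline are actually implemented.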
When establishing the DTLS association between endpoints and the Key Distributor, the endpoint MUST act as the DTLS client, and the Key Distributor MUST act as the DTLS server. The KEK is conveyed by the Key Distributor over the DTLS association to endpoints via procedures defined in EKT [RFC8870], using the EKTKey message.¶
The Key Distributor MUST NOT establish DTLS-SRTP associations with endpoints without first authenticating the Media Distributor tunneling the DTLS-SRTP packets from the endpoint.¶
Note that following DTLS-SRTP procedures for the cipher defined in [RFC8723], the endpoint generates both E2E and HBH encryption keys and salt values. Endpoints MUST either use the DTLS-SRTP-generated E2E key for transmission or generate a fresh E2E key. In either case, the generated SRTP master salt for E2E encryption MUST be replaced with the salt value provided by the Key Distributor via the EKTKey message. That is because every endpoint in the conference uses the same SRTP master salt. The endpoint only transmits the SRTP master key (not the salt) used for E2E encryption to other endpoints in RTP/RTCP packets per [RFC8870].¶
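The salt rule above is easy to get wrong, so here is a minimal Python sketch of how an endpoint might assemble the master key and salt for its E2E (inner) context. The function and parameter names are hypothetical; `dtls_e2e_salt` is accepted only to make explicit that it is never used.

```python
import os
from typing import NamedTuple


class MasterParams(NamedTuple):
    master_key: bytes
    master_salt: bytes


def e2e_master_params(dtls_e2e_key: bytes,
                      dtls_e2e_salt: bytes,
                      ekt_master_salt: bytes,
                      generate_fresh_key: bool = False) -> MasterParams:
    """Choose the SRTP master key and salt for this endpoint's E2E context."""
    if generate_fresh_key:
        # A locally generated key of the same length replaces the DTLS-SRTP one.
        key = os.urandom(len(dtls_e2e_key))
    else:
        key = dtls_e2e_key
    # dtls_e2e_salt is deliberately not used: every endpoint in the conference
    # must use the master salt delivered by the Key Distributor in the EKTKey message.
    return MasterParams(master_key=key, master_salt=ekt_master_salt)
```

Only `master_key` would later be advertised to other endpoints inside a Full EKT Tag; the salt never leaves the endpoint because every receiver already has it.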
Media Distributors use DTLS-SRTP directly with a peer Media Distributor to establish the HBH key for transmitting RTP and RTCP packets to that peer Media Distributor. The Key Distributor does not facilitate establishing a HBH key for use between Media Distributors.¶
Following the initial key information exchange with the Key Distributor, an endpoint is able to encrypt media end to end with an E2E key, sending that E2E key to other endpoints encrypted with the KEK, and is able to encrypt and authenticate RTP packets using a HBH key. This framework does not allow the Media Distributor to gain access to the KEK information, preventing it from gaining access to any endpoint's E2E key and subsequently decrypting media.¶
The KEK may need to change from time to time during the lifetime of a conference, such as when a participant joins or leaves a conference. Dictating if, when, or how often a conference is to be rekeyed is outside the scope of this document, but this framework does accommodate rekeying during the lifetime of a conference.¶
When a Key Distributor decides to rekey a conference, it transmits a new EKTKey message containing the new EKT Key to each of the conference participants. Upon receipt of the new EKT Key, the endpoint MUST create a new SRTP master key and prepare to send that key inside a FullEKTField using the new EKT Key as per Section 4.5 of [RFC8870]. In order to allow time for all endpoints in the conference to receive the new keys, the sender should follow the recommendations in Section 4.6 of [RFC8870]. On receiving a new EKT Key, endpoints MUST be prepared to decrypt EKT Tags using the new key. The EKT Security Parameter Index (SPI) field is used to differentiate between EKT Tags encrypted with the old and new keys.¶
After rekeying, an endpoint SHOULD retain prior SRTP master keys and EKT Keys for a period of time sufficient for the purpose of ensuring that it can decrypt late-arriving or out-of-order packets or packets sent by other endpoints that used the prior keys for a period of time after rekeying began. An endpoint MAY retain old keys until the end of the conference.¶
Endpoints MAY follow the procedures in Section 5.2 of [RFC5764] to renegotiate HBH keys as desired. If new HBH keys are generated, the new keys are also delivered to the Media Distributor following the procedures defined in [PERC-DTLS] as one possible method.¶
At any time, endpoints MAY change the E2E encryption key being used. An endpoint MUST generate a new E2E encryption key whenever it receives a new EKT Key. After switching to a new key, the new key is conveyed to other endpoints in the conference in RTP/RTCP packets per [RFC8870].¶
It is important that entities can validate the authenticity of other entities, especially the Key Distributor and endpoints. Details on this topic are outside the scope of this specification, but a few possibilities are discussed in the following sections. The critical requirements are that (1) an endpoint can verify that it is connected to the correct Key Distributor for the conference and (2) the Key Distributor can verify that the endpoint is the correct endpoint for the conference.¶
Two possible approaches to resolve this situation are identity assertions and certificate fingerprints.¶
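The rekeying behavior described above, where the SPI selects the EKT Key and old keys are kept for a while, can be sketched as follows. This Python illustration rests on stated assumptions: the class name, the 30-second default retention, and the 16-byte size of the new E2E master key (as for an AES-128 inner transform) are all choices made for the example, not values taken from this framework or from [RFC8870].

```python
import os
import time
from typing import Callable, Dict, Optional


class EktKeyring:
    """Hypothetical store of EKT Keys indexed by EKT SPI, with a retention period."""

    def __init__(self, retention_s: float = 30.0, master_key_len: int = 16,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.retention_s = retention_s
        self.master_key_len = master_key_len
        self.clock = clock
        self.keys: Dict[int, bytes] = {}
        self.retired_at: Dict[int, float] = {}
        self.current_spi: Optional[int] = None

    def install(self, spi: int, ekt_key: bytes) -> bytes:
        """Handle a new EKTKey message; return the new E2E master key to advertise."""
        if self.current_spi is not None and self.current_spi != spi:
            # Keep the previous EKT Key for late, reordered, or not-yet-rekeyed senders.
            self.retired_at[self.current_spi] = self.clock()
        self.keys[spi] = ekt_key
        self.retired_at.pop(spi, None)
        self.current_spi = spi
        # A new EKT Key requires a new SRTP master key for E2E encryption.
        return os.urandom(self.master_key_len)

    def key_for_tag(self, spi: int) -> Optional[bytes]:
        """Return the EKT Key selected by the SPI of a received EKT Tag, if still held."""
        self._expire()
        return self.keys.get(spi)

    def _expire(self) -> None:
        now = self.clock()
        for spi, retired in list(self.retired_at.items()):
            if now - retired >= self.retention_s:
                del self.keys[spi]
                del self.retired_at[spi]
```

The injectable `clock` exists so the retention logic can be exercised deterministically; an endpoint that chooses to keep old keys until the end of the conference could simply pass `float("inf")` as `retention_s`.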
A WebRTC identity assertion [RFC8827] is used to bind the identity of the user of the endpoint to the fingerprint of the DTLS-SRTP certificate used for the call. This certificate is unique for a given call and a conference, allowing the Key Distributor to ensure that only authorized users participate in the conference. Similarly, the Key Distributor can create a WebRTC identity assertion to bind the fingerprint of the unique certificate used by the Key Distributor for this conference so that the endpoint can verify that it is talking to the correct Key Distributor. Such a setup requires an Identity Provider (IdP) trusted by the endpoints and the Key Distributor.¶
Entities managing session signaling are generally assumed to be untrusted in the PERC framework. However, there are some deployment scenarios where parts of the session signaling may be assumed trustworthy for the purposes of exchanging, in a manner that can be authenticated, the fingerprint of an entity's certificate.¶
As a concrete example, SIP [RFC3261] and the Session Description Protocol (SDP) [RFC4566] can be used to convey the fingerprint information per [RFC5763]. An endpoint's SIP User Agent would send an INVITE message containing SDP for the media session along with the endpoint's certificate fingerprint, which can be signed using the procedures described in [RFC8224] for the benefit of forwarding the message to other entities by the focus [RFC4353]. Other entities can verify that the fingerprints match the certificates found in the DTLS-SRTP connections to find the identity of the far end of the DTLS-SRTP connection and verify that it is the authorized entity.¶
Ultimately, if using session signaling, an endpoint's certificate fingerprint would need to be securely mapped to a user and conveyed to the Key Distributor so that it can check that the user in question is authorized. Similarly, the Key Distributor's certificate fingerprint can be conveyed to an endpoint in a manner that can be authenticated as being an authorized Key Distributor for this conference.¶
The Key Distributor needs to know what endpoints are being added to a given conference. Thus, the Key Distributor and the Media Distributor need to know endpoint-to-conference mappings, which are enabled by exchanging a conference-specific unique identifier as described in [PERC-DTLS]. How this unique identifier is assigned is outside the scope of this document.¶
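On the Key Distributor side, the fingerprint check described above reduces to hashing the certificate presented in the DTLS handshake and looking the result up. The Python sketch below assumes that `der_cert` is the DER encoding of the peer certificate as exposed by the DTLS stack, that SHA-256 is the negotiated fingerprint hash, and that the two lookup tables (`fingerprint_to_user`, `authorized_users`) were populated from authenticated signaling; all of those names are hypothetical.

```python
import hashlib
from typing import Dict, Optional, Set


def cert_fingerprint(der_cert: bytes) -> str:
    """SHA-256 fingerprint as colon-separated uppercase hex, as in SDP a=fingerprint."""
    digest = hashlib.sha256(der_cert).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def authorize_peer(der_cert: bytes,
                   fingerprint_to_user: Dict[str, str],
                   authorized_users: Set[str]) -> Optional[str]:
    """Return the user bound to the peer certificate, or None to reject the association."""
    user = fingerprint_to_user.get(cert_fingerprint(der_cert))
    if user is None or user not in authorized_users:
        return None
    return user
```

Returning `None` corresponds to the Key Distributor refusing the DTLS association, which is the behavior required for any unauthorized endpoint or Media Distributor.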
This section describes the various keys employed by PERC.¶
This section summarizes the several different keys used in the PERC framework, how they are generated, and what purpose they serve.¶
The keys are described in the order in which they would typically be acquired.¶
The various keys used in PERC are shown in Table 1 below.¶
Key | Description |
---|---|
HBH Key | SRTP master key used to encrypt media hop by hop. |
KEK (EKT Key) | Key shared by all endpoints and used to encrypt each endpoint's E2E SRTP master key so receiving endpoints can decrypt media. |
E2E Key | SRTP master key used to encrypt media end to end. |
While the number of key types is very small, it should be understood that the actual number of distinct keys can be large as the conference grows in size.¶
As an example, with 1,000 participants in a conference, there would be at least 1,000 distinct SRTP master keys, all of which share the same master salt. Each of those keys is passed through the Key Derivation Function (KDF) as defined in [RFC3711] to produce the actual encryption and authentication keys.¶
Complicating key management is the fact that the KEK can change and, when it does, the endpoints generate new SRTP master keys that are associated with a new EKT SPI. Endpoints might retain old keys for a period of time to ensure that they can properly decrypt late-arriving or out-of-order packets, which means that the number of keys held during that period of time might be substantially higher.¶
A more detailed explanation of each of the keys follows.¶
The first set of keys acquired are for HBH encryption and decryption. Per the double transform procedures [RFC8723], the endpoint performs a DTLS-SRTP exchange with the Key Distributor and receives a key that is, in fact, "double" the size that is needed. The E2E part is the first half of the key, so the endpoint discards that information when generating its own key. The second half of the keying material is for HBH operations, so that half of the key (corresponding to the least significant bits) is assigned internally as the HBH key.¶
The Key Distributor informs the Media Distributor of the HBH key. Specifically, the Key Distributor sends the least significant bits corresponding to the half of the keying material determined through DTLS-SRTP with the endpoint to the Media Distributor. A salt value is generated along with the HBH key. The salt is also longer than needed for HBH operations; thus, only the least significant bits of the required length (half of the generated salt material) are sent to the Media Distributor. One way to transmit this key and salt information is via the tunnel protocol defined in [PERC-DTLS].¶
No two endpoints have the same HBH key; thus, the Media Distributor MUST keep track of each distinct HBH key (and the corresponding salt) and use it only for the specified hop.¶
The HBH key is also used for HBH encryption of RTCP. RTCP is not E2E-encrypted in PERC.¶
The Key Distributor sends the KEK (the EKT Key per [RFC8870]) to the endpoint via the aforementioned DTLS-SRTP association. This key is known only to the Key Distributor and endpoints; it is the most important entity to protect, since having knowledge of this key (and the SRTP master salt transmitted as a part of the same message) allows an entity to decrypt any media packet in the conference.¶
Note that the Key Distributor can send any number of EKT Keys to endpoints. This information is used to rekey the entire conference. Each key is identified by an SPI value. Endpoints MUST expect that a conference might be rekeyed when a new participant joins a conference or when a participant leaves a conference, in order to protect the confidentiality of the conversation before and after such events.¶
The SRTP master salt to be used by the endpoint is transmitted along with the EKT Key. All endpoints in the conference utilize the same SRTP master salt that corresponds with a given EKT Key.¶
The Full EKT Tag in media packets is encrypted using a cipher specified via the EKTKey message (e.g., AES Key Wrap with a 128-bit key). This cipher is different than the cipher used to protect media and is only used to encrypt the endpoint's SRTP master key (and other EKT Tag data as per [RFC8870]).¶
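The halving rule above can be sketched in a few lines of Python. The function name is hypothetical. As an assumption for the example, the AES-128-GCM double transform of [RFC8723] is taken to export a 32-byte master key and a 24-byte master salt through DTLS-SRTP; check these lengths against the negotiated profile before relying on them.

```python
from typing import NamedTuple, Tuple


class MasterParams(NamedTuple):
    master_key: bytes
    master_salt: bytes


def split_double(master_key: bytes, master_salt: bytes) -> Tuple[MasterParams, MasterParams]:
    """Split double-transform keying material into (inner/E2E, outer/HBH) halves."""
    if len(master_key) % 2 or len(master_salt) % 2:
        raise ValueError("double-transform key and salt must have even lengths")
    k, s = len(master_key) // 2, len(master_salt) // 2
    # First half (most significant bits): inner, E2E. The endpoint may use this key
    # or replace it with a fresh one; its salt is always replaced by the EKTKey salt.
    inner = MasterParams(master_key[:k], master_salt[:s])
    # Second half (least significant bits): outer, HBH. This is what the Key
    # Distributor passes to the Media Distributor for this hop.
    outer = MasterParams(master_key[k:], master_salt[s:])
    return inner, outer
```

With the assumed lengths, each half carries a 16-byte key and a 12-byte salt, which matches the sizes a single AES-128-GCM context uses.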
The KEK is not given to the Media Distributor.¶
As stated earlier, the E2E key determined via DTLS-SRTP MAY be discarded in favor of a locally generated E2E SRTP master key. While the DTLS-SRTP-derived SRTP master key can be used initially, the endpoint might choose to change the SRTP master key periodically and MUST change the SRTP master key as a result of the EKT Key changing.¶
A locally generated SRTP master key is used along with the master salt transmitted to the endpoint from the Key Distributor via the EKTKey message to encrypt media end to end.¶
Since the Media Distributor is not involved in E2E functions, it does not create this key, nor does it have access to any endpoint's E2E key. Note, too, that even the Key Distributor is unaware of the locally generated E2E keys used by each endpoint.¶
The endpoint transmits its E2E key to other endpoints in the conference by periodically including it in SRTP packets in a Full EKT Tag. When placed in the Full EKT Tag, it is encrypted using the EKT Key provided by the Key Distributor. The master salt is not transmitted, though, since all endpoints receive the same master salt via the EKTKey message from the Key Distributor. The recommended frequency with which an endpoint transmits its SRTP master key is specified in [RFC8870].¶
All endpoints have knowledge of the KEK.¶
Every HBH key is distinct for a given endpoint; thus, Endpoint A and Endpoint B do not have knowledge of the other's HBH key. Since HBH keys are derived from a DTLS-SRTP association, there is at most one HBH key per endpoint. (The only exception is where the DTLS-SRTP association might be rekeyed per Section 5.2 of [RFC5764] and a new key is created to replace the former key.)¶
Each endpoint generates its own E2E key (SRTP master key); thus, there is a distinct E2E key per endpoint. This key is transmitted (encrypted) via the Full EKT Tag to other endpoints. Endpoints that receive media from a given transmitting endpoint gain knowledge of the transmitter's E2E key via the Full EKT Tag.¶
Table 2 summarizes the various keys and which entity is in possession of a given key.¶
Key/Entity | Endpoint A | MD X | MD Y | Endpoint B |
---|---|---|---|---|
KEK (EKT Key) | Yes | No | No | Yes |
E2E Key (A and B) | Yes | No | No | Yes |
HBH Key (A<=>MD X) | Yes | Yes | No | No |
HBH Key (B<=>MD Y) | No | No | Yes | Yes |
HBH Key (MD X<=>MD Y) | No | Yes | Yes | No |
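Table 2 follows mechanically from two rules: the KEK and the E2E keys go only to endpoints, and each HBH key is shared only by the two entities on its hop. The Python sketch below regenerates the table's rows from those rules for the example topology (Endpoint A, MD X, MD Y, Endpoint B). The topology constants and function names are illustrative; the Key Distributor, which also holds the KEK and the endpoint-facing HBH keys, is omitted just as it is in the table.

```python
from typing import Dict, List, Set, Tuple

COLUMNS: List[str] = ["Endpoint A", "MD X", "MD Y", "Endpoint B"]
ENDPOINTS: Set[str] = {"Endpoint A", "Endpoint B"}
# Each hop in the example path Endpoint A -> MD X -> MD Y -> Endpoint B.
HOPS: List[Tuple[str, str]] = [("Endpoint A", "MD X"), ("MD X", "MD Y"), ("MD Y", "Endpoint B")]


def key_holders() -> Dict[str, Set[str]]:
    holders: Dict[str, Set[str]] = {
        # Trusted endpoints receive the KEK and learn each other's E2E keys.
        "KEK (EKT Key)": set(ENDPOINTS),
        "E2E Key (A and B)": set(ENDPOINTS),
    }
    for a, b in HOPS:
        # A HBH key is known only to the two entities on its hop.
        holders[f"HBH Key ({a} <=> {b})"] = {a, b}
    return holders


def render() -> str:
    rows = []
    for key, who in key_holders().items():
        cells = ["Yes" if entity in who else "No" for entity in COLUMNS]
        rows.append(" | ".join([key] + cells))
    return "\n".join(rows)


print(render())
```

The printed rows carry the same Yes/No pattern as Table 2, listed in hop order along the path rather than in the table's row order.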
Figure 4 presents a complete picture of what an encrypted media packet per this framework looks like when transmitted over the wire. The packet format shown in the figure is encrypted using the double cryptographic transform with an EKT Tag appended to the end.¶
     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<++
    |V=2|P|X|  CC   |M|     PT      |       sequence number         | IO
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ IO
    |                           timestamp                           | IO
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ IO
    |           synchronization source (SSRC) identifier            | IO
    +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+ IO
    |            contributing source (CSRC) identifiers             | IO
    |                             ....                              | IO
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+O
    |                  RTP extension (OPTIONAL) ...                 | |O
+>+>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+O
O I |                          payload  ...                         | IO
O I |                               +-------------------------------+ IO
O I |                               | RTP padding   | RTP pad count | IO
O +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+<+O
O | |                    E2E authentication tag                     | |O
O | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |O
O | |                            OHB ...                            | |O
+>| +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |+
| | |                    HBH authentication tag                     | ||
| | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ ||
| | |                         EKT Tag ...                         | R ||
| | +-+-+-+-+-+-+-+-+-+                                           |   ||
| |                   +- Neither encrypted nor authenticated;         ||
| |                      appended after the double transform          ||
| |                      is performed                                 ||
| |                                                                   ||
| +- E2E-Encrypted Portion               E2E-Authenticated Portion ---+|
|                                                                      |
+--- HBH-Encrypted Portion               HBH-Authenticated Portion ----+

I = Inner (E2E) encryption/authentication
O = Outer (HBH) encryption/authentication
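The fixed header and CSRC list at the top of Figure 4 are authenticated but not encrypted, which is why a Media Distributor can read them. As an illustration, the Python sketch below parses those fields as defined in [RFC3550] and deliberately stops before the header extension, payload, tags, and EKT Tag. The names `RtpHeader` and `parse_rtp_header` are hypothetical.

```python
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class RtpHeader:
    version: int
    padding: bool
    extension: bool
    marker: bool
    payload_type: int
    sequence_number: int
    timestamp: int
    ssrc: int
    csrcs: List[int]
    length: int  # bytes consumed: 12-byte fixed header plus 4 bytes per CSRC


def parse_rtp_header(packet: bytes) -> RtpHeader:
    """Parse the fixed RTP header and CSRC list (RFC 3550, Section 5.1)."""
    if len(packet) < 12:
        raise ValueError("packet shorter than the 12-byte fixed RTP header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F
    end = 12 + 4 * csrc_count
    if len(packet) < end:
        raise ValueError("packet truncated inside the CSRC list")
    csrcs = list(struct.unpack(f"!{csrc_count}I", packet[12:end]))
    return RtpHeader(
        version=b0 >> 6,
        padding=bool(b0 & 0x20),
        extension=bool(b0 & 0x10),
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence_number=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
        length=end,
    )
```

A forwarding decision (for example, selecting flows by SSRC) needs only these fields, so a Media Distributor can make it without any E2E key material.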
Third-party attacks are attacks attempted by an adversary that is not supposed to have access to keying material or is otherwise not an authorized participant in the conference.¶
On-path attacks are mitigated by HBH integrity protection and encryption. The integrity protection mitigates packet modification. Encryption makes selective blocking of packets harder, but not impossible.¶
Off-path attackers could try connecting to different PERC entities to send specifically crafted packets with the aim of forcing the receiver to forward or render bogus media packets. Endpoints and Media Distributors mitigate such an attack by performing HBH authentication and discarding packets that fail authentication.¶
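A minimal sketch of the discard rule, assuming each hop shares a symmetric HBH key. HMAC-SHA256 truncated to 16 bytes stands in for the actual HBH authentication of the double transform, and the tag is simply appended to the packet; `hbh_protect`, `hbh_accept`, and the tag length are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 16  # hypothetical truncated tag length, in bytes


def hbh_protect(hop_key: bytes, packet: bytes) -> bytes:
    """Append a hop-by-hop tag (HMAC-SHA256 stand-in, truncated to TAG_LEN)."""
    tag = hmac.new(hop_key, packet, hashlib.sha256).digest()[:TAG_LEN]
    return packet + tag


def hbh_accept(hop_key: bytes, wire: bytes):
    """Return the packet without its tag if the HBH tag verifies; otherwise None (discard)."""
    if len(wire) <= TAG_LEN:
        return None
    body, tag = wire[:-TAG_LEN], wire[-TAG_LEN:]
    expected = hmac.new(hop_key, body, hashlib.sha256).digest()[:TAG_LEN]
    # compare_digest avoids leaking, through timing, how many tag bytes matched.
    return body if hmac.compare_digest(tag, expected) else None
```

An off-path attacker that does not know the hop key cannot produce a tag that verifies, so its crafted packets are dropped before they are forwarded or rendered.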
Another attack vector is a third party claiming to be a Media Distributor, fooling endpoints into sending packets to the false Media Distributor instead of the correct one. The deceived sending endpoints could incorrectly assume that their packets have been delivered to endpoints when they in fact have not. While this attack is possible, the result is a simple denial of service with no leakage of confidential information, since the false Media Distributor would not have access to either HBH or E2E encryption keys.¶
A third party could cause a denial of service by transmitting many bogus or replayed packets toward receiving devices, ultimately degrading conference or device performance. Therefore, implementations might wish to devise mechanisms to safeguard against such illegitimate packets, such as rate-limiting or basic sanity checks on packets (e.g., looking at packet length or expected sequence number ranges), applied before the more expensive decryption operations are performed.¶
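One way to order such checks is sketched below, assuming the receiver tracks the highest sequence number seen on each stream. The length bound, the sequence-jump tolerance, and the token-bucket parameters are hypothetical tuning values, not values from this framework; only the 12-byte fixed RTP header and the RTP version number 2 come from RFC 3550.

```python
import time

RTP_FIXED_HEADER = 12  # bytes in the fixed RTP header (RFC 3550)
MAX_PACKET = 1500      # hypothetical upper bound on packet size
MAX_SEQ_JUMP = 1000    # hypothetical tolerated distance from the highest sequence number seen


class TokenBucket:
    """Allows on average `rate` packets per second, with bursts of up to `burst`."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate, self.burst, self.clock = rate, burst, clock
        self.tokens = burst
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


def seq_distance(seq: int, highest: int) -> int:
    """Signed distance between two 16-bit RTP sequence numbers, handling wraparound."""
    return ((seq - highest + 0x8000) & 0xFFFF) - 0x8000


def worth_decrypting(packet: bytes, highest_seq, bucket: TokenBucket) -> bool:
    """Cheap checks run before any decryption; False means drop the packet."""
    if not RTP_FIXED_HEADER <= len(packet) <= MAX_PACKET:
        return False
    if packet[0] >> 6 != 2:  # RTP version field must be 2
        return False
    seq = int.from_bytes(packet[2:4], "big")
    if highest_seq is not None and abs(seq_distance(seq, highest_seq)) > MAX_SEQ_JUMP:
        return False
    # Rate-limit last, so packets rejected above do not consume tokens.
    return bucket.allow()
```

Placing the rate limiter after the structural checks is a design choice: obviously malformed packets are dropped without touching the budget that legitimate traffic relies on.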
The use of mutual DTLS authentication (as required by DTLS-SRTP) also helps to prevent a denial-of-service attack by preventing a false endpoint or false Media Distributor from successfully participating as a perceived valid media sender that could otherwise carry out an on-path attack. When mutual authentication fails, a receiving endpoint would know that it could safely discard media packets received from the endpoint without inspection.¶
A malicious or compromised Media Distributor can attack the session in a number of possible ways, as discussed below.¶
A simple form of attack is discarding received packets that should be forwarded. This solution framework does not provide any mitigation mechanisms for Media Distributors that fail to forward media packets.¶
Another form of attack is modifying received packets before forwarding. With this solution framework, any modification of the E2E-authenticated data results in an integrity failure when the receiving endpoint performs authentication on the received packet.¶
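The effect of the two layers can be illustrated with a toy model in which HMAC-SHA256 stands in for both the E2E and the HBH authentication. There is no encryption and no OHB, the framing is simplified, and all function names are hypothetical. A Media Distributor that holds only HBH keys can re-authenticate a modified packet for the next hop, but it cannot recompute the inner tag, so the receiving endpoint detects the change.

```python
import hashlib
import hmac

TAG = 16  # hypothetical tag length, in bytes


def _tag(key: bytes, data: bytes) -> bytes:
    return hmac.new(key, data, hashlib.sha256).digest()[:TAG]


def sender_protect(e2e_key: bytes, hbh_key: bytes, payload: bytes) -> bytes:
    inner = payload + _tag(e2e_key, payload)  # inner (E2E) layer
    return inner + _tag(hbh_key, inner)       # outer (HBH) layer


def distributor_forward(hbh_in: bytes, hbh_out: bytes, wire: bytes, tamper=None) -> bytes:
    """Check the incoming HBH tag, optionally tamper, and re-tag for the next hop."""
    inner, tag = wire[:-TAG], wire[-TAG:]
    if not hmac.compare_digest(tag, _tag(hbh_in, inner)):
        raise ValueError("HBH authentication failed")
    if tamper is not None:  # a malicious distributor rewrites E2E-protected bytes
        inner = tamper(inner)
    return inner + _tag(hbh_out, inner)


def receiver_accept(e2e_key: bytes, hbh_key: bytes, wire: bytes):
    """Return (payload, "ok") or (None, reason)."""
    inner, tag = wire[:-TAG], wire[-TAG:]
    if not hmac.compare_digest(tag, _tag(hbh_key, inner)):
        return None, "HBH failure"
    payload, e2e_tag = inner[:-TAG], inner[-TAG:]
    if not hmac.compare_digest(e2e_tag, _tag(e2e_key, payload)):
        return None, "E2E failure"
    return payload, "ok"
```

If a distributor replaces a payload of b"no" with b"yes" and re-tags the result with its outgoing hop key, the receiver's HBH check passes, but the E2E check fails.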
The Media Distributor can also attempt to perform resource consumption attacks on the receiving endpoint. One such attack would be to insert random SSRC/CSRC values in any RTP packet along with a Full EKT Tag. Since such a message would trigger the receiver to form a new cryptographic context, the Media Distributor can attempt to consume the receiving endpoint's resources. While E2E authentication would fail and the cryptographic context would be destroyed, the key derivation operation would nonetheless consume some computational resources. While resource consumption attacks cannot be mitigated entirely, rate-limiting packets might help reduce the impact of such attacks.¶
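A receiver could bound that cost by limiting how often it derives a new cryptographic context for a previously unseen SSRC. The sketch uses a fixed-window counter; the limiter, its parameters, and the `derive` callback (standing in for the key derivation triggered by a Full EKT Tag) are hypothetical.

```python
import time


class ContextCreationLimiter:
    """Caps how many new cryptographic contexts may be derived per time window."""

    def __init__(self, max_new: int, window_s: float, clock=time.monotonic):
        self.max_new, self.window_s, self.clock = max_new, window_s, clock
        self.window_start = clock()
        self.created = 0

    def may_create(self) -> bool:
        now = self.clock()
        if now - self.window_start >= self.window_s:
            self.window_start, self.created = now, 0  # start a new window
        if self.created < self.max_new:
            self.created += 1
            return True
        return False


def handle_full_ekt(contexts: dict, ssrc: int, limiter: ContextCreationLimiter, derive):
    """Return the context for ssrc, running the (costly) derivation only if the limiter allows."""
    if ssrc in contexts:
        return contexts[ssrc]
    if not limiter.may_create():
        return None  # drop the packet without performing key derivation
    contexts[ssrc] = derive(ssrc)
    return contexts[ssrc]
```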
A replay attack is an attack where an already-received packet from a previous point in the RTP stream is replayed as a new packet. This could, for example, allow a Media Distributor to transmit a sequence of packets identified as a user saying "yes", instead of the "no" the user actually said.¶
A replay attack is mitigated by the requirement to implement replay protection as described in Section 3.3.2 of [RFC3711]. E2E replay protection MUST be provided for the duration of the conference.¶
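For reference, the sliding-window replay check of Section 3.3.2 of [RFC3711] can be sketched as follows. The window is indexed by the SRTP packet index (2^16 * ROC + SEQ, per Section 3.3.1 of [RFC3711]) and is updated only after a packet has been authenticated. The class and method names are hypothetical, and the default window of 64 packets is the minimum RFC 3711 allows.

```python
class ReplayWindow:
    """Sliding-window replay check over the 48-bit SRTP packet index."""

    def __init__(self, size: int = 64):  # RFC 3711 requires a window of at least 64
        self.size = size
        self.highest = None  # highest index accepted so far
        self.bitmap = 0      # bit i set => index (highest - i) already received

    def check(self, index: int) -> bool:
        """True if index is neither a replay nor too old. Does not change state."""
        if self.highest is None or index > self.highest:
            return True
        delta = self.highest - index
        if delta >= self.size:
            return False  # left of the window: treat as a replay
        return not (self.bitmap >> delta) & 1

    def update(self, index: int) -> None:
        """Record index; call only after the packet has been authenticated."""
        if self.highest is None:
            self.highest, self.bitmap = index, 1
        elif index > self.highest:
            shift = index - self.highest
            self.bitmap = ((self.bitmap << shift) | 1) & ((1 << self.size) - 1)
            self.highest = index
        else:
            self.bitmap |= 1 << (self.highest - index)
```

Splitting `check` from `update` mirrors the processing order in RFC 3711: a forged packet that fails authentication must not be able to advance the window.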
A delayed playout attack is an attack where media is received and held by a Media Distributor and then forwarded to endpoints at a later point in time.¶
This attack is possible even if E2E replay protection is in place. Because the Media Distributor is allowed to select a subset of streams and not forward the rest to a receiver, such as in forwarding only the most active speakers, the receiver has to accept gaps in the E2E packet sequence. The problem here is that a Media Distributor can choose to not deliver a particular stream for a while.¶
While the Media Distributor can purposely stop forwarding media flows, it can also select an arbitrary starting point to resume forwarding those media flows, perhaps forwarding old packets rather than current packets. As a consequence, what the media source sent can be substantially delayed at the receiver, with the receiver believing that newly arriving packets are delayed only by transport delay when the packets may actually be minutes or hours old.¶
While this attack cannot be eliminated entirely, its effectiveness can be reduced by rekeying the conference periodically, since significantly delayed media encrypted with expired keys would not be decrypted by endpoints.¶
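A receiver-side retention policy that enforces this is sketched below. It assumes each packet can be associated with the key generation that protected it. The class, the generation numbering, and the default of retaining two generations (to tolerate packets in flight across a rekey) are hypothetical.

```python
class RekeyingKeyStore:
    """Retains only the newest `retain` key generations; older keys are destroyed."""

    def __init__(self, retain: int = 2):
        if retain < 1:
            raise ValueError("retain must be at least 1")
        self.retain = retain
        self.keys = {}  # generation number -> key material

    def install(self, generation: int, key: bytes) -> None:
        self.keys[generation] = key
        for old in sorted(self.keys)[:-self.retain]:
            del self.keys[old]  # media protected under this generation can no longer be decrypted

    def key_for(self, generation: int):
        """Return the key for a generation, or None if it has expired (drop the packet)."""
        return self.keys.get(generation)
```

With a rekey every T minutes and two retained generations, a held-back packet becomes undecryptable roughly one to two rekey intervals after it was sent. The rekey interval therefore bounds how stale delayed media can be.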
A splicing attack is an attack where a Media Distributor receiving multiple media sources splices one media stream into the other. If the Media Distributor were able to change the SSRC without the receiver having any method for verifying the original source ID, then the Media Distributor could first deliver stream A and then later forward stream B under the same SSRC that stream A was previously using. By including the SSRC in the integrity check for each packet -- both HBH and E2E -- PERC prevents splicing attacks.¶
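The following toy illustration shows why covering the SSRC is sufficient. HMAC-SHA256 stands in for the E2E authentication, and the authenticated header is reduced to the SSRC and sequence number; both simplifications and the function names are hypothetical. A single shared key is used deliberately, so any verification failure can only come from the SSRC being part of the authenticated data.

```python
import hashlib
import hmac
import struct


def e2e_tag(key: bytes, ssrc: int, seq: int, payload: bytes) -> bytes:
    # The SSRC and sequence number are part of the authenticated data.
    header = struct.pack("!IH", ssrc, seq)  # 32-bit SSRC, 16-bit sequence number
    return hmac.new(key, header + payload, hashlib.sha256).digest()


def e2e_verify(key: bytes, ssrc: int, seq: int, payload: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(tag, e2e_tag(key, ssrc, seq, payload))


key = b"k" * 16
tag_b = e2e_tag(key, 0xBBBB, 7, b"stream B media")
print(e2e_verify(key, 0xBBBB, 7, b"stream B media", tag_b))  # True
print(e2e_verify(key, 0xAAAA, 7, b"stream B media", tag_b))  # False: spliced under A's SSRC
```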
PERC does not provide E2E protection of RTCP messages. This allows a compromised Media Distributor to impact any message that might be transmitted via RTCP, including media statistics, picture requests, or loss indication. It is also possible for a compromised Media Distributor to forge requests, such as requests to the endpoint to send a new picture. Such requests can consume significant bandwidth and impair conference performance.¶
As stated in Section 3.2.2, the Key Distributor needs to be secured, since exploiting the Key Distributor can allow an adversary to gain access to the keying material for one or more conferences. Having access to that keying material would then allow the adversary to decrypt media sent from any endpoint in the conference.¶
As a first line of defense, the Key Distributor authenticates every security association -- associations with both endpoints and Media Distributors. The Key Distributor knows which entities are authorized to have access to which keys, and inspection of certificates will substantially reduce the risk of providing keys to an adversary.¶
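The authorization step might look like the following sketch, which assumes the Key Distributor maps each authenticated certificate fingerprint to a role. The `Role` enum, the policy table, and the fingerprint strings are hypothetical. The key assignments follow this framework's split: only endpoints receive the KEK, and Media Distributors receive HBH keys only.

```python
from enum import Enum


class Role(Enum):
    ENDPOINT = "endpoint"
    MEDIA_DISTRIBUTOR = "media distributor"


def keys_to_release(policy: dict, fingerprint: str, mutually_authenticated: bool) -> set:
    """Decide which key types the Key Distributor may hand to a peer (hypothetical policy)."""
    if not mutually_authenticated:
        return set()
    role = policy.get(fingerprint)
    if role is Role.ENDPOINT:
        return {"KEK", "HBH"}
    if role is Role.MEDIA_DISTRIBUTOR:
        return {"HBH"}  # never the KEK: Media Distributors must not obtain E2E keys
    return set()  # unknown certificate: release nothing


POLICY = {"fp-endpoint-a": Role.ENDPOINT, "fp-md-x": Role.MEDIA_DISTRIBUTOR}
print(keys_to_release(POLICY, "fp-md-x", True))  # {'HBH'}
```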
Both physical and network access to the Key Distributor should be severely restricted. This may be more difficult to achieve when the Key Distributor is embedded within, for example, an endpoint. Nonetheless, consideration should be given to shielding the Key Distributor from unauthorized access or any access that is not strictly necessary for the support of an ongoing conference.¶
Consideration should be given to whether access to the keying material will be needed beyond the conclusion of a conference. If not needed, the Key Distributor's policy should be to destroy the keying material once the conference concludes or when keying material changes during the course of the conference. If keying material is needed beyond the lifetime of the conference, further consideration should be given to protecting keying material from future exposure. While it might seem obvious, it is worth stating explicitly: if an adversary were to record the media packets transmitted during a conference and then, even years later, gain unauthorized access to keying material left unsecured on the Key Distributor, the adversary could decrypt the content of every packet transmitted during the conference.¶
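A key store that applies the destroy-on-change and destroy-at-end policy could look like the following sketch. The class and method names are hypothetical. Overwriting a `bytearray` in Python clears only that one buffer; copies made elsewhere (for example, the bytes returned by `get_key`) are not erased. A production Key Distributor would therefore need a runtime that supports reliable zeroization.

```python
class ConferenceKeys:
    """Holds per-conference keys and wipes them on rekey and at conference end."""

    def __init__(self):
        self._keys = {}  # key name -> bytearray

    def set_key(self, name: str, material: bytes) -> None:
        old = self._keys.get(name)
        if old is not None:
            _wipe(old)  # keying material changed: destroy the previous value
        self._keys[name] = bytearray(material)

    def get_key(self, name: str):
        k = self._keys.get(name)
        return bytes(k) if k is not None else None

    def end_conference(self) -> None:
        for k in self._keys.values():
            _wipe(k)
        self._keys.clear()


def _wipe(buf: bytearray) -> None:
    # Overwrite in place; this clears only this buffer, not copies made elsewhere.
    for i in range(len(buf)):
        buf[i] = 0
```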
A Trusted Endpoint is so named because conference confidentiality relies heavily on the security and integrity of the endpoint. If an adversary successfully exploits a vulnerability in an endpoint, it might be possible for the adversary to obtain all of the keying material used in the conference. With that keying material, an adversary could decrypt any of the media flows received from any other endpoint, either in real time or at a later point in time (assuming that the adversary makes a copy of the media packets).¶
Additionally, if an adversary successfully exploits an endpoint, the adversary could inject media into the conference. For example, an adversary could manipulate the RTP or SRTP software to transmit whatever media the adversary wishes to send. This could involve the reuse of the compromised endpoint's SSRC or, since all conference participants share the same KEK, the use of a new SSRC or the SSRC value of another endpoint. Only one defined SRTP cipher suite, Timed Efficient Stream Loss-Tolerant Authentication (TESLA) [RFC4383], provides source authentication properties that allow an endpoint to cryptographically assert that it sent a particular E2E-protected packet, and its usage is presently not defined for PERC. The suite defined in PERC only allows an endpoint to determine that whoever sent a packet had received the KEK.¶
However, attacks on the endpoint are not limited to the PERC-specific software within the endpoint. An attacker could inject media or record media by manipulating the software that sits between the PERC-enabled application and the hardware microphone or video camera, for example. Likewise, an attacker could potentially access confidential media by accessing memory, cache, disk storage, etc., if the endpoint is not secured.¶
This document has no IANA actions.¶
The authors would like to thank Mo Zanaty, Christian Oien, and Richard Barnes for invaluable input on this document. Also, we would like to acknowledge Nermeen Ismail for serving on the initial draft versions of this document as a coauthor. We would also like to acknowledge John Mattsson, Mats Naslund, and Magnus Westerlund for providing some of the text in the document, including much of the original text in the Security Considerations section (Section 8).¶