Using WebRTC Encoded Transforms
Limited availability
This feature is not Baseline because it does not work in some of the most widely-used browsers.
WebRTC Encoded Transforms provide a mechanism to inject a high-performance Stream API for modifying encoded video and audio frames into the incoming and outgoing WebRTC pipelines. This enables use cases such as end-to-end encryption of encoded frames by third-party code.
The API defines both main-thread and worker-side objects. The main-thread interface is an RTCRtpScriptTransform instance, which on construction specifies the Worker that is to implement the transformer code. The transform running in the worker is inserted into the incoming or outgoing WebRTC pipeline by adding the RTCRtpScriptTransform to RTCRtpReceiver.transform or RTCRtpSender.transform, respectively.
A counterpart RTCRtpScriptTransformer object is created in the worker thread. It has a readable property (a ReadableStream), a writable property (a WritableStream), and an options object passed from the associated RTCRtpScriptTransform constructor. Encoded video frames (RTCEncodedVideoFrame) or audio frames (RTCEncodedAudioFrame) from the WebRTC pipeline are enqueued on readable for processing.
The RTCRtpScriptTransformer is made available to code as the transformer property of the rtctransform event, which is fired at the worker global scope whenever an encoded frame is enqueued for processing (and initially on construction of the corresponding RTCRtpScriptTransform). The worker code must implement a handler for the event that reads encoded frames from transformer.readable, modifies them as needed, and writes them to transformer.writable in the same order and without any duplication.
While the interface doesn't place any other restrictions on the implementation, a natural way to transform the frames is to create a pipe chain that sends frames enqueued on the event.transformer.readable stream through a TransformStream to the event.transformer.writable stream. We can use the event.transformer.options property to configure any transform code that depends on whether the transform is handling incoming frames from the depacketizer or outgoing frames from a codec.
The RTCRtpScriptTransformer interface also provides methods that can be used when sending encoded video to get the codec to generate a "key" frame, and when receiving video to request that a new key frame be sent. These may be useful to allow a recipient to start viewing the video more quickly if, for example, they join a conference call when delta frames are being sent.
The following sections show in more detail how to use the framework with a TransformStream-based implementation.
Test if encoded transforms are supported
Test whether encoded transforms are supported by checking for the existence of RTCRtpSender.transform (or RTCRtpReceiver.transform):
const supportsEncodedTransforms = window.RTCRtpSender && "transform" in RTCRtpSender.prototype;
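The one-line check above only looks at the sender. As a minimal sketch, the function below reports the sender, receiver, and constructor separately; the function name and the shape of the returned object are assumptions made for illustration and are not part of the API. It reads from globalThis rather than window so that it does not throw where window is undefined.

```javascript
// Sketch: report which parts of the API appear to be available.
// `scope` defaults to globalThis so the check also runs where `window` is undefined.
function detectEncodedTransformSupport(scope = globalThis) {
  // True if `ctor` is a constructor whose prototype exposes a `transform` property.
  const hasTransform = (ctor) =>
    typeof ctor === "function" &&
    ctor.prototype != null &&
    "transform" in ctor.prototype;

  return {
    sender: hasTransform(scope.RTCRtpSender),
    receiver: hasTransform(scope.RTCRtpReceiver),
    scriptTransform: typeof scope.RTCRtpScriptTransform === "function",
  };
}

const support = detectEncodedTransformSupport();
const canUseEncodedTransforms =
  support.sender && support.receiver && support.scriptTransform;
```

An application could use a result like canUseEncodedTransforms to decide whether to offer a feature such as end-to-end encryption at all.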
Adding a transform for outgoing frames
A transform running in a worker is inserted into the outgoing WebRTC pipeline by assigning its corresponding RTCRtpScriptTransform to the RTCRtpSender.transform for an outgoing track.
This example shows how you might stream video from a user's webcam over WebRTC, adding a WebRTC encoded transform to modify the outgoing streams. The code assumes that there is an RTCPeerConnection called peerConnection that is already connected to a remote peer.
First we get a MediaStreamTrack, using getUserMedia() to get a video MediaStream from a media device, and then the MediaStream.getTracks() method to get the first MediaStreamTrack in the stream.
The track is added to the peer connection using addTrack(), which starts streaming it to the remote peer. The addTrack() method returns the RTCRtpSender that is being used to send the track.
// Get video stream and MediaStreamTrack
const stream = await navigator.mediaDevices.getUserMedia({ video: true });
const [track] = stream.getTracks();
const videoSender = peerConnection.addTrack(track, stream);
An RTCRtpScriptTransform is then constructed, taking a worker script, which defines the transform, and an optional object that can be used to pass arbitrary messages to the worker (in this case we've used a name property with value "senderTransform" to tell the worker that this transform will be added to the outbound stream). We add the transform to the outgoing pipeline by assigning it to the RTCRtpSender.transform property.
// Create a worker containing a TransformStream
const worker = new Worker("worker.js");
videoSender.transform = new RTCRtpScriptTransform(worker, {
  name: "senderTransform",
});
The "Using separate sender and receiver transforms" section below shows how the name might be used in a worker.
Note that you can add the transform at any time, but by adding it immediately after calling addTrack() the transform will get the first encoded frame that is sent.
Adding a transform for incoming frames
A transform running in a worker is inserted into the incoming WebRTC pipeline by assigning its corresponding RTCRtpScriptTransform to the RTCRtpReceiver.transform for an incoming track.
This example shows how you add a transform to modify an incoming stream. The code assumes that there is an RTCPeerConnection called peerConnection that is already connected to a remote peer.
First we add an RTCPeerConnection track event handler to catch the event when the peer starts receiving a new track. Within the handler we construct an RTCRtpScriptTransform and add it to event.receiver.transform (event.receiver is an RTCRtpReceiver). As in the previous section, the constructor takes an object with a name property, but here we use receiverTransform as the value to tell the worker that frames are incoming.
peerConnection.ontrack = (event) => {
  const worker = new Worker("worker.js");
  event.receiver.transform = new RTCRtpScriptTransform(worker, {
    name: "receiverTransform",
  });
  received_video.srcObject = event.streams[0];
};
Note again that you can add the transform at any time. However, adding it in the track event handler ensures that the transform will get the first encoded frame for the track.
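The sender and receiver cases differ only in the object that receives the transform and in the name passed to the worker. As a sketch of how the two could share one code path, the hypothetical helper below (attachScriptTransform is not part of the API) attaches a transform when the RTCRtpScriptTransform constructor exists and otherwise leaves the pipeline untouched. The scope parameter is an assumption added so that the helper can be exercised outside a browser.

```javascript
// Hypothetical helper: attach a script transform to a sender or receiver.
// Returns true if a transform was attached, false if the API is unavailable.
function attachScriptTransform(target, worker, name, scope = globalThis) {
  if (typeof scope.RTCRtpScriptTransform !== "function") {
    return false; // Not supported here: frames flow through unmodified.
  }
  target.transform = new scope.RTCRtpScriptTransform(worker, { name });
  return true;
}

// Possible usage, with the variables from the sections above:
//   attachScriptTransform(videoSender, worker, "senderTransform");
//   attachScriptTransform(event.receiver, worker, "receiverTransform");
```

Whether silently skipping the transform is acceptable depends on the application; for encryption, for example, it would usually be safer to refuse to send at all.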
Worker implementation
The worker script must implement a handler for the rtctransform event, creating a pipe chain that pipes the event.transformer.readable stream (a ReadableStream) through a TransformStream to the event.transformer.writable stream (a WritableStream).
A worker might support transforming incoming or outgoing encoded frames, or both, and the transform might be hard coded, or configured at run-time using information passed from the web application.
Basic WebRTC Encoded Transform
The example below shows a basic WebRTC Encoded Transform, which negates all bits in queued frames. It does not use or need options passed in from the main thread because the same algorithm can be used in the sender pipeline to negate the bits and in the receiver pipeline to restore them.
The code implements an event handler for the rtctransform event. This constructs a TransformStream, then pipes through it using ReadableStream.pipeThrough(), and finally pipes to event.transformer.writable using ReadableStream.pipeTo().
addEventListener("rtctransform", (event) => {
  const transform = new TransformStream({
    start() {}, // Called on startup.
    flush() {}, // Called when the stream is about to be closed.
    async transform(encodedFrame, controller) {
      // Reconstruct the original frame.
      const view = new DataView(encodedFrame.data);

      // Construct a new buffer
      const newData = new ArrayBuffer(encodedFrame.data.byteLength);
      const newView = new DataView(newData);

      // Negate all bits in the incoming frame
      for (let i = 0; i < encodedFrame.data.byteLength; ++i) {
        newView.setInt8(i, ~view.getInt8(i));
      }

      encodedFrame.data = newData;
      controller.enqueue(encodedFrame);
    },
  });
  event.transformer.readable
    .pipeThrough(transform)
    .pipeTo(event.transformer.writable);
});
The implementation of the WebRTC encoded transform is similar to a "generic" TransformStream, but with some important differences. Like the generic stream, its constructor takes an object that defines an optional start() method, which is called on construction, a flush() method, which is called as the stream is about to be closed, and a transform() method, which is called every time there is a chunk to be processed. Unlike the generic constructor, any writableStrategy or readableStrategy properties that are passed in the constructor object are ignored, and the queuing strategy is entirely managed by the user agent.
The transform() method also differs in that it is passed either an RTCEncodedVideoFrame or an RTCEncodedAudioFrame rather than a generic "chunk". The actual code shown here for the method isn't notable, other than that it demonstrates how to convert the frame to a form where you can modify it and enqueue it afterwards on the stream.
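The claim that one algorithm serves both directions rests on bit negation being its own inverse. The sketch below checks that property on a stand-in frame, using a Uint8Array view instead of a DataView. The fakeFrame object is an assumption that models only the data property of a real encoded frame, and the function names are illustrative.

```javascript
// Return a new ArrayBuffer in which every bit of `buffer` is inverted.
function negateBits(buffer) {
  const input = new Uint8Array(buffer);
  const output = new Uint8Array(input.length);
  for (let i = 0; i < input.length; i++) {
    output[i] = ~input[i] & 0xff; // Keep only the low 8 bits of the result.
  }
  return output.buffer;
}

// Build a TransformStream that negates the data of each frame it is given.
function createNegatingTransform() {
  return new TransformStream({
    transform(encodedFrame, controller) {
      encodedFrame.data = negateBits(encodedFrame.data);
      controller.enqueue(encodedFrame);
    },
  });
}

// Stand-in for an encoded frame: only the `data` property is modelled.
const original = new Uint8Array([0x00, 0x0f, 0xff]);
const fakeFrame = { data: original.slice().buffer };

// Applying the operation twice should give back the original bytes.
const negatedOnce = new Uint8Array(negateBits(fakeFrame.data));
const negatedTwice = new Uint8Array(negateBits(negatedOnce.buffer));
```

In a worker, createNegatingTransform() could take the place of the inline TransformStream in the handler above for both the sender and the receiver.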
Using separate sender and receiver transforms
The previous example works if the transform function is the same when sending and receiving, but in many cases the algorithms will be different. You could use separate worker scripts for the sender and receiver, or handle both cases in one worker as shown below.
If the worker is used for both sender and receiver, it needs to know whether the current encoded frame is outgoing from a codec, or incoming from the depacketizer. This information can be specified using the second argument of the RTCRtpScriptTransform constructor. For example, we can define a separate RTCRtpScriptTransform for the sender and receiver, passing the same worker, and an options object with a name property that indicates whether the transform is used in the sender or receiver (as shown in the previous sections). The information is then available in the worker in event.transformer.options.
In this example we implement the onrtctransform event handler on the global dedicated worker scope object. The value of the name property is used to determine which TransformStream to construct (the actual constructor methods are not shown).
// Code to instantiate transforms and attach them to sender/receiver pipelines.
onrtctransform = (event) => {
  let transform;
  if (event.transformer.options.name === "senderTransform")
    transform = createSenderTransform(); // returns a TransformStream
  else if (event.transformer.options.name === "receiverTransform")
    transform = createReceiverTransform(); // returns a TransformStream
  else return;
  event.transformer.readable
    .pipeThrough(transform)
    .pipeTo(event.transformer.writable);
};
Note that the code to create the pipe chain is the same as in the previous example.
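The constructor functions are not shown in the original example. As one hedged illustration of an asymmetric pair, the sketch below has the sender append a one-byte trailer to each frame and the receiver strip it again. The trailer value, the helper names, and the bodies of createSenderTransform() and createReceiverTransform() are all assumptions made for this sketch. A real scheme would need unambiguous framing, because an unmodified frame could happen to end in the same byte.

```javascript
const TRAILER_BYTE = 0xa5; // Arbitrary marker value chosen for this sketch.

// Sender side: copy the payload and append a one-byte trailer.
function addTrailer(buffer) {
  const input = new Uint8Array(buffer);
  const output = new Uint8Array(input.length + 1);
  output.set(input);
  output[input.length] = TRAILER_BYTE;
  return output.buffer;
}

// Receiver side: strip the trailer if present, otherwise pass the data through.
function removeTrailer(buffer) {
  const input = new Uint8Array(buffer);
  const last = input.length - 1;
  if (last < 0 || input[last] !== TRAILER_BYTE) return buffer;
  return input.slice(0, last).buffer;
}

// Wrap a buffer-to-buffer function in a TransformStream for encoded frames.
function createFrameTransform(mapData) {
  return new TransformStream({
    transform(encodedFrame, controller) {
      encodedFrame.data = mapData(encodedFrame.data);
      controller.enqueue(encodedFrame);
    },
  });
}

const createSenderTransform = () => createFrameTransform(addTrailer);
const createReceiverTransform = () => createFrameTransform(removeTrailer);
```

Keeping the byte manipulation in plain functions, separate from the stream wrapper, makes each direction straightforward to test without a peer connection.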
Runtime communication with the transform
The RTCRtpScriptTransform constructor allows you to pass options and transfer objects to the worker. In the previous example we passed static information, but sometimes you might want to modify the transform algorithm in the worker at runtime, or get information back from the worker. For example, a WebRTC conference call that supports encryption might need to add a new key to the algorithm used by the transform.
While it is possible to share information between the worker running the transform code and the main thread using Worker.postMessage(), it is generally easier to share a MessageChannel as an RTCRtpScriptTransform constructor option, because then the channel context is directly available in event.transformer.options when you are handling a new encoded frame.
The code below creates a MessageChannel and transfers its second port to the worker. The main thread and transform can subsequently communicate using the first and second ports.
// Create a worker containing a TransformStream
const worker = new Worker("worker.js");

// Create a channel
// Pass channel.port2 to the transform as a constructor option
// and also transfer it to the worker
const channel = new MessageChannel();
const transform = new RTCRtpScriptTransform(
  worker,
  { purpose: "encrypt", port: channel.port2 },
  [channel.port2],
);

// Use port1 to send a string.
// (we can send and transfer basic types/objects).
channel.port1.postMessage("A message for the worker");
channel.port1.start();
In the worker the port is available as event.transformer.options.port. The code below shows how you might listen on the port's message event to get messages from the main thread. You can also use the port to send messages back to the main thread.
event.transformer.options.port.onmessage = (event) => {
  // The message payload is in 'event.data';
  console.log(event.data);
};
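The text notes that the port can also carry messages back to the main thread. A minimal sketch of that two-way use follows: the worker stores the most recent key in shared state that its transform() method could read, then acknowledges the update on the same port. The keyState object, the listenForKeys() helper, and the shapes of both messages are assumptions made for illustration.

```javascript
// Hypothetical mutable state that the transform() method would read per frame.
const keyState = { currentKey: null, updates: 0 };

// Attach a message handler to `port` that updates `state` and acknowledges.
function listenForKeys(port, state) {
  port.onmessage = (messageEvent) => {
    const { key } = messageEvent.data ?? {};
    if (typeof key !== "string") return; // Ignore unrelated messages.
    state.currentKey = key;
    state.updates += 1;
    // Reply on the same port so the main thread knows the key is in use.
    port.postMessage({ type: "key-updated", updates: state.updates });
  };
}

// Possible usage inside the rtctransform handler:
//   listenForKeys(event.transformer.options.port, keyState);
```

On the main thread, the acknowledgement would arrive through a message handler on channel.port1.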
Triggering a key frame
Raw video is rarely sent or stored because it consumes a lot of space and bandwidth to represent each frame as a complete image. Instead, codecs periodically generate a "key frame" that contains enough information to construct a full image, and between key frames send "delta frames" that just include the changes since the previous frame. While this is far more efficient than sending raw video, it means that in order to display the image associated with a particular delta frame, you need the last key frame and all subsequent delta frames.
This can cause a delay for new users joining a WebRTC conference application, because they can't display video until they have received their first key frame. Similarly, if an encoded transform was used to encrypt frames, the recipient would not be able to display video until they get the first key frame encrypted with their key.
To ensure that a new key frame can be sent as early as possible when needed, the RTCRtpScriptTransformer object in event.transformer has two methods: RTCRtpScriptTransformer.generateKeyFrame(), which causes the codec to generate a key frame, and RTCRtpScriptTransformer.sendKeyFrameRequest(), which a receiver can use to request a key frame from the sender.
The example below shows how the main thread might pass an encryption key to a sender transform, and trigger the codec to generate a key frame. Note that the main thread doesn't have direct access to the RTCRtpScriptTransformer object, so it needs to pass the key and restriction identifier ("rid") to the worker (the "rid" is a stream id, which indicates the encoder that must generate the key frame). Here we do that with a MessageChannel, using the same pattern as in the previous section. The code assumes there is already a peer connection, and that videoSender is an RTCRtpSender.
const worker = new Worker("worker.js");
const channel = new MessageChannel();

videoSender.transform = new RTCRtpScriptTransform(
  worker,
  { name: "senderTransform", port: channel.port2 },
  [channel.port2],
);

// Post rid and new key to the sender
channel.port1.start();
channel.port1.postMessage({
  rid: "1",
  key: "93ae0927a4f8e527f1gce6d10bc6ab6c",
});
The rtctransform event handler in the worker gets the port and uses it to listen for message events from the main thread. If an event is received it gets the rid and key, and then calls generateKeyFrame().
event.transformer.options.port.onmessage = (event) => {
  const { rid, key } = event.data;
  // key is used by the transformer to encrypt frames (not shown)

  // Get codec to generate a new key frame using the rid
  // Here 'rcEvent' is the rtctransform event.
  rcEvent.transformer.generateKeyFrame(rid);
};
The code for a receiver to request a new key frame would be almost identical, except that "rid" isn't specified. Here is the code for just the port message handler:
event.transformer.options.port.onmessage = (event) => {
  const { key } = event.data;
  // key is used by the transformer to decrypt frames (not shown)

  // Request sender to emit a key frame.
  // Here 'rcEvent' is the rtctransform event.
  rcEvent.transformer.sendKeyFrameRequest();
};
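Both generateKeyFrame() and sendKeyFrameRequest() return a promise, which can reject, for example if the transformer is not attached to a video pipeline. The handlers above ignore that result. The sketch below is one way to cover both directions and report failure; the requestKeyFrame() helper is hypothetical, and its direction argument is assumed to be the name option value used earlier in this article.

```javascript
// Ask for a key frame in whichever direction this transformer runs.
// Resolves to true on success and false if the request was rejected.
async function requestKeyFrame(transformer, direction, rid) {
  try {
    if (direction === "senderTransform") {
      // Asks the codec to produce a key frame for the given rid.
      await transformer.generateKeyFrame(rid);
    } else {
      // Asks the remote sender to emit a key frame.
      await transformer.sendKeyFrameRequest();
    }
    return true;
  } catch (error) {
    console.warn("Key frame request failed:", error);
    return false;
  }
}

// Possible usage inside a port message handler, where 'rcEvent' is the
// rtctransform event:
//   requestKeyFrame(rcEvent.transformer, rcEvent.transformer.options.name, rid);
```

Returning a boolean instead of rethrowing is a design choice made for this sketch; an application that cannot proceed without a fresh key frame might prefer to let the error propagate.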