Signaling and video calling

WebRTC allows real-time, peer-to-peer media exchange between two devices. A connection is established through a discovery and negotiation process called signaling. This tutorial will guide you through building a two-way video call.

WebRTC is a fully peer-to-peer technology for the real-time exchange of audio, video, and data, with one central caveat. A form of discovery and media format negotiation must take place, as discussed elsewhere, in order for two devices on different networks to locate one another. This process is called signaling and involves both devices connecting to a third, mutually agreed-upon server. Through this third server, the two devices can locate one another and exchange negotiation messages.

In this article, we will further enhance the WebSocket chat server to support opening a two-way video call between users. You can try out this example on Render to experiment with it as well. You can also look at the full project on GitHub.

The signaling server

Establishing a WebRTC connection between two devices requires the use of a signaling server to resolve how to connect them over the internet. A signaling server's job is to serve as an intermediary that lets two peers find and establish a connection while exposing as little potentially private information as possible. How do we create this server, and how does the signaling process actually work?

First we need the signaling server itself. WebRTC doesn't specify a transport mechanism for the signaling information. You can use anything you like, from WebSocket to fetch() to carrier pigeons, to exchange the signaling information between the two peers.

It's important to note that the server doesn't need to understand or interpret the signaling data content. Although it's SDP, even this doesn't matter much: the content of the message going through the signaling server is, in effect, a black box. What does matter is that when the ICE subsystem instructs you to send signaling data to the other peer, you do so, and the other peer knows how to receive this information and deliver it to its own ICE subsystem. All the signaling server has to do is channel the information back and forth.

Readying the chat server for signaling

Our chat server uses the WebSocket API to send information as JSON strings between each client and the server. The server supports several message types to handle tasks, such as registering new users, setting usernames, and sending public chat messages.

To allow the server to support signaling and ICE negotiation, we need to update the code. We'll have to allow directing messages to one specific user instead of broadcasting to all connected users, and ensure unrecognized message types are passed through and delivered, without the server needing to know what they are. This lets us send signaling messages using this same server, instead of needing a separate server.

Let's take a look at the changes we need to make to the chat server to support WebRTC signaling. These are in the file chatserver.js.

First up is the addition of the function sendToOneUser(). As the name suggests, this sends a stringified JSON message to a particular username.

function sendToOneUser(target, msgString) {
  const conn = connectionArray.find((conn) => conn.username === target);
  // find() returns undefined if that user has disconnected.
  if (conn) {
    conn.send(msgString);
  }
}

This function iterates over the list of connected users until it finds one matching the specified username, then sends the message to that user. The parameter msgString is a stringified JSON object. We could have had it accept our original message object instead, but passing the string is more efficient here: since the message has already been stringified, we can send it with no further processing. Each entry in connectionArray is a WebSocket object, so we can just call its send() method directly.

Our original chat demo didn't support sending messages to a specific user. The next task is to update the main WebSocket message handler to support doing so. This involves a change near the end of the "connection" message handler:

if (sendToClients) {
  const msgString = JSON.stringify(msg);

  if (msg.target && msg.target.length !== 0) {
    sendToOneUser(msg.target, msgString);
  } else {
    for (const connection of connectionArray) {
      connection.send(msgString);
    }
  }
}

This code now looks at the pending message to see if it has a target property. If that property is present, it specifies the username of the client to which the message is to be sent, and we call sendToOneUser() to send the message to them. Otherwise, the message is broadcast to all users by iterating over the connection list, sending the message to each user.

As the existing code allows the sending of arbitrary message types, no additional changes are required. Our clients can now send messages of unknown types to any specific user, letting them send signaling messages back and forth as desired.
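To make the routing rule concrete, here is a minimal sketch that pulls the decision into a standalone function so it can run outside the server. The name routeMessage() and the plain-object stand-ins for WebSocket connections (anything with a username property and a send() method) are assumptions for illustration, not part of chatserver.js.

```javascript
// Hypothetical helper mirroring the chat server's routing decision.
// Each "connection" is any object with a `username` and a `send(string)`
// method: a WebSocket in the real server, a plain stub here.
function routeMessage(msg, connections) {
  // The message type is never inspected, so signaling messages such as
  // "video-offer" pass through untouched.
  const msgString = JSON.stringify(msg);

  if (msg.target && msg.target.length !== 0) {
    // Targeted delivery: only the named user receives the message.
    const conn = connections.find((c) => c.username === msg.target);
    if (conn) {
      conn.send(msgString);
      return 1;
    }
    return 0; // Target isn't connected; the message is dropped.
  }

  // No target: broadcast to everyone.
  for (const conn of connections) {
    conn.send(msgString);
  }
  return connections.length;
}

// Stub connections that record what they are sent.
function makeStub(username) {
  return {
    username,
    inbox: [],
    send(s) {
      this.inbox.push(s);
    },
  };
}

const naomi = makeStub("Naomi");
const priya = makeStub("Priya");
const conns = [naomi, priya];

routeMessage({ type: "video-offer", name: "Naomi", target: "Priya", sdp: {} }, conns);
routeMessage({ type: "message", text: "hi all" }, conns);

console.log(naomi.inbox.length); // 1 (only the broadcast)
console.log(priya.inbox.length); // 2 (the offer plus the broadcast)
```

Because the function never looks at msg.type, any new signaling message your clients invent is delivered the same way.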

That's all we need to change on the server side of the equation. Now let's consider the signaling protocol we will implement.

Designing the signaling protocol

Now that we've built a mechanism for exchanging messages, we need a protocol defining how those messages will look. This can be done in a number of ways; what's demonstrated here is just one possible way to structure signaling messages.

This example's server uses stringified JSON objects to communicate with its clients. This means our signaling messages will be in JSON format, with contents which specify what kind of messages they are as well as any additional information needed in order to handle the messages properly.

Exchanging session descriptions

When starting the signaling process, an offer is created by the user initiating the call. This offer includes a session description, in SDP format, and needs to be delivered to the receiving user, whom we'll call the callee. The callee responds to the offer with an answer message, also containing an SDP description. Our signaling server will use WebSocket to transmit offer messages with the type "video-offer", and answer messages with the type "video-answer". These messages have the following fields:

type: The message type; either "video-offer" or "video-answer".
name: The sender's username.
target: The username of the person to receive the description (if the caller is sending the message, this specifies the callee, and vice versa).
sdp: The SDP (Session Description Protocol) string describing the local end of the connection from the perspective of the sender (or the remote end of the connection from the receiver's point of view).
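A small sketch of how a client might build and sanity-check these messages before sending or after receiving them. The helper names makeDescriptionMessage() and isDescriptionMessage() are hypothetical; only the field names come from the protocol described above. The sdp value in the example is a placeholder object shaped like the { type, sdp } form a session description takes when serialized.

```javascript
// Hypothetical helpers for "video-offer" / "video-answer" messages.
const DESCRIPTION_TYPES = new Set(["video-offer", "video-answer"]);

function makeDescriptionMessage(type, name, target, sdp) {
  if (!DESCRIPTION_TYPES.has(type)) {
    throw new TypeError(`Unexpected description message type: ${type}`);
  }
  return { type, name, target, sdp };
}

// True if an incoming object carries every field the protocol needs.
function isDescriptionMessage(msg) {
  return (
    msg !== null &&
    typeof msg === "object" &&
    DESCRIPTION_TYPES.has(msg.type) &&
    typeof msg.name === "string" &&
    typeof msg.target === "string" &&
    msg.sdp !== undefined
  );
}

// The caller (Naomi) addresses an offer to the callee (Priya).
const offerMsg = makeDescriptionMessage("video-offer", "Naomi", "Priya", {
  type: "offer",
  sdp: "(placeholder SDP text)", // Not a usable SDP body.
});

// Round-trip through JSON, as the signaling server would.
const wire = JSON.stringify(offerMsg);
console.log(isDescriptionMessage(JSON.parse(wire))); // true
```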

At this point, the two participants know which codecs and codec parameters are to be used for this call. They still don't know how to transmit the media data itself, though. This is where Interactive Connectivity Establishment (ICE) comes in.

Exchanging ICE candidates

Two peers need to exchange ICE candidates to negotiate the actual connection between them. Every ICE candidate describes a method that the sending peer is able to use to communicate. Each peer sends candidates in the order they're discovered, and keeps sending candidates until it runs out of suggestions, even if media has already started streaming.

Once a local description has been set using pc.setLocalDescription(offer), the ICE agent begins gathering candidates, and an icecandidate event is fired at the RTCPeerConnection for each one it discovers.

Once the two peers agree upon a mutually-compatible candidate, that candidate's SDP is used by each peer to construct and open a connection, through which media then begins to flow. If they later agree on a better (usually higher-performance) candidate, the stream may change formats as needed.

Though not currently supported, a candidate received after media is already flowing could theoretically also be used to downgrade to a lower-bandwidth connection if needed.

Each ICE candidate is sent to the other peer as a JSON message of type "new-ice-candidate", which the signaling server relays to the remote peer. Each candidate message includes these fields:

type: The message type: "new-ice-candidate".
target: The username of the person with whom negotiation is underway; the server will direct the message to this user only.
candidate: The SDP candidate string, describing the proposed connection method. You typically don't need to look at the contents of this string. All your code needs to do is route it through to the remote peer using the signaling server.

Each ICE message suggests a communication protocol (TCP or UDP), IP address, port number, connection type (for example, whether the specified IP is the peer itself or a relay server), along with other information needed to link the two computers together, even when NAT or other networking complexity lies between them.
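Purely to make those fields concrete (your signaling code never needs to do this), the sketch below splits a candidate string, the value of RTCIceCandidate.candidate, into its parts following the candidate-attribute grammar of RFC 8839. The function name parseCandidate() is hypothetical, and the sample candidate uses documentation-only addresses rather than data captured from a real session.

```javascript
// Illustration only: split an ICE candidate string into its fields.
// Layout: candidate:<foundation> <component> <transport> <priority>
//         <address> <port> typ <type> [name value]...
function parseCandidate(candidateString) {
  const parts = candidateString.trim().split(/\s+/);
  if (!parts[0].startsWith("candidate:") || parts.length < 8 || parts[6] !== "typ") {
    throw new SyntaxError("Not a candidate string: " + candidateString);
  }
  const result = {
    foundation: parts[0].slice("candidate:".length),
    component: Number(parts[1]), // 1 = RTP
    protocol: parts[2].toLowerCase(), // "udp" or "tcp"
    priority: Number(parts[3]),
    address: parts[4],
    port: Number(parts[5]),
    type: parts[7], // "host", "srflx", "prflx", or "relay"
    extensions: {},
  };
  // Remaining tokens are name/value pairs such as raddr, rport, generation.
  for (let i = 8; i + 1 < parts.length; i += 2) {
    result.extensions[parts[i]] = parts[i + 1];
  }
  return result;
}

// A server-reflexive candidate: the public address a STUN server observed,
// with raddr/rport giving the private address behind the NAT.
const c = parseCandidate(
  "candidate:842163049 1 udp 1677729535 203.0.113.5 46154 typ srflx " +
    "raddr 192.168.1.10 rport 46154 generation 0",
);
console.log(c.protocol, c.address, c.port, c.type); // udp 203.0.113.5 46154 srflx
console.log(c.extensions.raddr); // 192.168.1.10
```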

Note: The important thing to note is this: during ICE negotiation, your code is responsible for only two things. First, when your onicecandidate handler is executed, accept the outgoing candidate from the ICE layer and send it across the signaling connection to the other peer. Second, when a "new-ice-candidate" message arrives from the signaling server, deliver the candidate it contains to your ICE layer by calling RTCPeerConnection.addIceCandidate(). That's it.

The contents of the SDP are irrelevant to you in essentially all cases. Avoid the temptation to try to make it more complicated than that until you really know what you're doing. That way lies madness.

All your signaling server now needs to do is send the messages it's asked to. Your workflow may also demand login/authentication functionality, but such details will vary.

Note: The icecandidate event and the createAnswer() promise are both asynchronous and are handled separately. Be sure that your signaling does not change the order of operations! For example, addIceCandidate() with the remote peer's ICE candidates must be called after setting the answer with setRemoteDescription().
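One common way to respect that ordering is to buffer remote candidates that arrive before the remote description has been applied, then hand them over once it has. The sketch below assumes a class named RemoteCandidateQueue (hypothetical) wrapping anything with a remoteDescription property and an async addIceCandidate() method, such as an RTCPeerConnection; a stub stands in for the real connection so the example can run without a browser.

```javascript
// Hypothetical helper: hold remote ICE candidates until a remote description
// has been set, then pass them to the connection in the order they arrived.
class RemoteCandidateQueue {
  constructor(pc) {
    this.pc = pc;
    this.pending = [];
  }

  // Call for each "new-ice-candidate" message.
  async add(candidate) {
    // Deliver directly only when nothing is still waiting, so earlier
    // candidates are never overtaken by later ones.
    if (this.pc.remoteDescription && this.pending.length === 0) {
      await this.pc.addIceCandidate(candidate);
    } else {
      this.pending.push(candidate);
    }
  }

  // Call right after setRemoteDescription() has resolved.
  async flush() {
    while (this.pending.length > 0) {
      await this.pc.addIceCandidate(this.pending.shift());
    }
  }
}

// A stand-in peer connection that records what it receives.
const fakePc = {
  remoteDescription: null,
  added: [],
  async addIceCandidate(c) {
    this.added.push(c);
  },
};

const queue = new RemoteCandidateQueue(fakePc);

async function demo() {
  await queue.add("cand-1"); // Arrives before the answer: buffered.
  fakePc.remoteDescription = { type: "answer", sdp: "" }; // Answer applied.
  await queue.flush(); // "cand-1" is delivered now.
  await queue.add("cand-2"); // Delivered immediately.
  console.log(fakePc.added); // [ 'cand-1', 'cand-2' ]
}

const demoDone = demo();
```

In the article's code, handleNewICECandidateMsg() would call queue.add() instead of addIceCandidate() directly, and the code that applies the remote description would call queue.flush() afterward.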

Signaling transaction flow

The signaling process involves this exchange of messages between two peers using an intermediary, the signaling server. The exact process will vary, of course, but in general there are a few key points at which signaling messages get handled:

Each user's client running within a web browser
Each user's web browser
The signaling server
The web server hosting the chat service

Imagine that Naomi and Priya are engaged in a discussion using the chat software, and Naomi decides to open a video call between the two. Here's the expected sequence of events:

Diagram of the signaling process

We'll examine each of these steps in more detail over the course of this article.

ICE candidate exchange process

When each peer's ICE layer begins to send candidates, it enters into an exchange among the various points in the chain that looks like this:

Diagram of ICE candidate exchange process

Each side sends candidates to the other as it receives them from its local ICE layer; there is no taking turns or batching of candidates. As soon as the two peers agree upon one candidate that they can both use to exchange the media, media begins to flow. Each peer continues to send candidates until it runs out of options, even after the media has already begun to flow. This is done in hopes of identifying even better options than the one initially selected.

If conditions change (for example, the network connection deteriorates), one or both peers might suggest switching to a lower-bandwidth media resolution, or to an alternative codec. That triggers a new exchange of candidates, after which another media format and/or codec change may take place. In the guide Codecs used by WebRTC you can learn more about the codecs which WebRTC requires browsers to support, which additional codecs are supported by which browsers, and how to choose the best codecs to use.

Optionally, see RFC 8445: Interactive Connectivity Establishment, section 2.3 ("Negotiating Candidate Pairs and Concluding ICE") if you want a greater understanding of how this process is completed inside the ICE layer. You should note that candidates are exchanged and media starts to flow as soon as the ICE layer is satisfied. This is all taken care of behind the scenes. Our role is to send the candidates back and forth through the signaling server.

The client application

The core to any signaling process is its message handling. It's not necessary to use WebSockets for signaling, but it is a common solution. You should, of course, select a mechanism for exchanging signaling information that is appropriate for your application.

Let's update the chat client to support video calling.

Updating the HTML

The HTML for our client needs a location for video to be presented. This requires <video> elements and a button to hang up the call:


<div>
  <div>
    <video id="received_video" autoplay></video>
    <video id="local_video" autoplay muted></video>
    <button id="hangup-button" disabled>Hang Up</button>
  </div>
</div>

document.getElementById("hangup-button").addEventListener("click", hangUpCall);

The page structure defined here uses <div> elements, giving us full control over the page layout through CSS. We'll skip layout details in this guide, but take a look at the CSS on GitHub to see how we handled it. Take note of the two <video> elements, one for your self-view and one for the remote peer's video, and the <button> element.

The <video> element with the id received_video presents video received from the connected user. We specify the autoplay attribute, ensuring that once the video starts arriving, it plays immediately. This removes any need to explicitly handle playback in our code. The local_video <video> element presents a preview of the user's camera; it specifies the muted attribute, since we don't need to hear local audio in this preview panel.

Finally, the hangup-button <button>, used to disconnect from a call, is defined and configured to start out disabled (our default for when no call is connected), and the script attaches hangUpCall() as its click handler. This function's role is to close the call and send a notification through the signaling server asking the other peer to close it as well.

The JavaScript code

We'll divide this code into functional areas to more easily describe how it works. The main body of this code is found in the connect() function: it opens a WebSocket connection to the server on port 6503 and establishes a handler to receive messages in JSON object format. This code generally handles text chat messages as it did previously.

Sending messages to the signaling server

Throughout our code, we call sendToServer() in order to send messages to the signaling server. This function uses the WebSocket connection to do its work:

function sendToServer(msg) {
  const msgJSON = JSON.stringify(msg);

  connection.send(msgJSON);
}

The message object passed into this function is converted into a JSON string by calling JSON.stringify(), then we call the WebSocket connection's send() method to transmit the message to the server.
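sendToServer() assumes the socket is already open. A hedged variant might check readyState first: in browsers, send() on a socket that is still connecting throws an InvalidStateError, and data sent on a closing or closed socket is silently discarded. The name sendToServerSafely() and passing the socket as a parameter (rather than using the global connection) are assumptions made so the sketch runs standalone.

```javascript
// WebSocket.OPEN is 1; the number is used directly so this runs outside a browser.
const WS_OPEN = 1;

// Hypothetical variant of sendToServer() that only sends on an open socket.
// Returns true if the message was handed to the socket.
function sendToServerSafely(socket, msg) {
  if (socket.readyState !== WS_OPEN) {
    return false; // Caller can retry later or report the problem.
  }
  socket.send(JSON.stringify(msg));
  return true;
}

// Stub sockets in two different states.
const sent = [];
const openSocket = { readyState: WS_OPEN, send: (s) => sent.push(s) };
const connectingSocket = { readyState: 0, send: (s) => sent.push(s) };

console.log(sendToServerSafely(connectingSocket, { type: "video-offer" })); // false
console.log(sendToServerSafely(openSocket, { type: "video-offer" })); // true
console.log(sent.length); // 1
```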

UI to start a call

The code which handles the "user-list" message calls handleUserListMsg(). Here we set up the handler for each connected user in the user list displayed to the left of the chat panel. This function receives a message object whose users property is an array of strings specifying the user names of every connected user.

function handleUserListMsg(msg) {
  const listElem = document.querySelector(".user-list-box");

  while (listElem.firstChild) {
    listElem.removeChild(listElem.firstChild);
  }

  msg.users.forEach((username) => {
    const item = document.createElement("li");
    item.appendChild(document.createTextNode(username));
    item.addEventListener("click", invite, false);

    listElem.appendChild(item);
  });
}

After storing a reference to the <ul> that contains the list of user names in the variable listElem, we empty the list by removing each of its child elements.

Note: Obviously, it would be more efficient to update the list by adding and removing individual users instead of rebuilding the whole list every time it changes, but this is good enough for the purposes of this example.

Then we iterate over the array of user names using forEach(). For each name, we create a new <li> element, then create a new text node containing the user name using createTextNode(). That text node is added as a child of the <li> element. Next, we set a handler for the click event on the list item, so that clicking on a user name calls our invite() function, which we'll look at in the next section.

Finally, we append the new item to the <ul> that contains all of the user names.
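The note above mentions that updating individual entries would be more efficient than rebuilding the list. The first step of that approach is working out what changed; a minimal sketch, using a hypothetical diffUserLists() helper that is not part of the example project, might look like this. The DOM work (removing the <li> for each removed name, appending one for each added name) would then follow.

```javascript
// Hypothetical helper: compare the previous and new user lists so that only
// the changed <li> items need to be added or removed.
function diffUserLists(previous, current) {
  const before = new Set(previous);
  const after = new Set(current);
  return {
    added: current.filter((name) => !before.has(name)),
    removed: previous.filter((name) => !after.has(name)),
  };
}

const change = diffUserLists(["Naomi", "Priya", "Sam"], ["Naomi", "Sam", "Lee"]);
console.log(change.added); // [ 'Lee' ]
console.log(change.removed); // [ 'Priya' ]
```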

Starting a call

When the user clicks on a username they want to call, the invite() function is invoked as the event handler for that click event:

const mediaConstraints = {
  audio: true, // We want an audio track
  video: true, // And we want a video track
};

function invite(evt) {
  if (myPeerConnection) {
    alert("You can't start a call because you already have one open!");
  } else {
    const clickedUsername = evt.target.textContent;

    if (clickedUsername === myUsername) {
      alert(
        "I'm afraid I can't let you talk to yourself. That would be weird.",
      );
      return;
    }

    targetUsername = clickedUsername;
    createPeerConnection();

    navigator.mediaDevices
      .getUserMedia(mediaConstraints)
      .then((localStream) => {
        document.getElementById("local_video").srcObject = localStream;
        localStream
          .getTracks()
          .forEach((track) => myPeerConnection.addTrack(track, localStream));
      })
      .catch(handleGetUserMediaError);
  }
}

This begins with a basic sanity check: is the user already in a call? If there's already an RTCPeerConnection, they obviously can't make another call. Then the name of the user that was clicked on is obtained from the event target's textContent property, and we check to be sure that it's not the same user that's trying to start the call.

Then we copy the name of the user we're calling into the variable targetUsername and call createPeerConnection(), a function which will create and do basic configuration of the RTCPeerConnection.

Once the RTCPeerConnection has been created, we request access to the user's camera and microphone by calling MediaDevices.getUserMedia(), which is exposed to us through the navigator.mediaDevices property. When this succeeds, fulfilling the returned promise, our then handler is executed. It receives, as input, a MediaStream object representing the stream with audio from the user's microphone and video from their webcam.

Note: We could restrict the set of permitted media inputs to a specific device or set of devices by calling navigator.mediaDevices.enumerateDevices() to get a list of devices, filtering the resulting list based on our desired criteria, then using the selected devices' deviceId values as deviceId constraints on the audio and video entries of the mediaConstraints object passed into getUserMedia(). In practice, this is rarely if ever necessary, since most of that work is done for you by getUserMedia().
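If you do need to pin specific devices, the filtering step might look like the sketch below. The helper buildConstraints() and its "label contains a string" rule are assumptions for illustration; the input mirrors the objects that enumerateDevices() resolves to (kind, deviceId, label). Keep in mind that browsers return empty labels until the user has granted media permission, which is why the sketch falls back to the first device of each kind.

```javascript
// Hypothetical helper: build getUserMedia() constraints for chosen devices.
function buildConstraints(devices, labelMatch) {
  // Prefer a device whose label contains labelMatch; otherwise take the
  // first device of that kind.
  const pick = (kind) =>
    devices.find((d) => d.kind === kind && d.label.includes(labelMatch)) ??
    devices.find((d) => d.kind === kind);

  const mic = pick("audioinput");
  const cam = pick("videoinput");
  return {
    // `exact` makes getUserMedia() fail instead of substituting another device.
    audio: mic ? { deviceId: { exact: mic.deviceId } } : false,
    video: cam ? { deviceId: { exact: cam.deviceId } } : false,
  };
}

// Sample data shaped like enumerateDevices() output (made up for the example).
const sampleDevices = [
  { kind: "audioinput", deviceId: "mic-a", label: "Built-in Microphone" },
  { kind: "videoinput", deviceId: "cam-a", label: "Built-in Camera" },
  { kind: "videoinput", deviceId: "cam-b", label: "USB Camera" },
];

const constraints = buildConstraints(sampleDevices, "USB");
console.log(JSON.stringify(constraints));
// {"audio":{"deviceId":{"exact":"mic-a"}},"video":{"deviceId":{"exact":"cam-b"}}}
```

In the browser, the result would be passed along as navigator.mediaDevices.getUserMedia(buildConstraints(await navigator.mediaDevices.enumerateDevices(), "USB")).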

We attach the incoming stream to the local preview <video> element by setting the element's srcObject property. Since the element is configured to automatically play incoming video, the stream begins playing in our local preview box.

We then iterate over the tracks in the stream, calling addTrack() to add each track to the RTCPeerConnection. Even though the connection is not fully established yet, you can begin sending data when you feel it's appropriate to do so. Media received before the ICE negotiation is completed may be used to help ICE decide upon the best connectivity approach to take, thus aiding in the negotiation process.

Note that for native apps, such as a phone application, you should not begin sending until the connection has been accepted at both ends, at a minimum, to avoid inadvertently sending video and/or audio data when the user isn't prepared for it.

As soon as media is attached to the RTCPeerConnection, a negotiationneeded event is triggered at the connection, so that session negotiation can be started.

If an error occurs while trying to get the local media stream, our catch clause calls handleGetUserMediaError(), which displays an appropriate error to the user as required.

Handling getUserMedia() errors

If the promise returned by getUserMedia() is rejected, our handleGetUserMediaError() function runs.

function handleGetUserMediaError(e) {
  switch (e.name) {
    case "NotFoundError":
      alert(
        "Unable to open your call because no camera and/or microphone " +
          "were found.",
      );
      break;
    case "SecurityError":
    case "NotAllowedError": // Standard name for a denied permission request
    case "PermissionDeniedError": // Older, non-standard name for the same case
      // Do nothing; this is the same as the user canceling the call.
      break;
    default:
      alert(`Error opening your camera and/or microphone: ${e.message}`);
      break;
  }

  closeVideoCall();
}

An error message is displayed in all cases but one. In this example, we ignore "SecurityError" and "PermissionDeniedError" results, treating refusal to grant permission to use the media hardware the same as the user canceling the call.

Regardless of why an attempt to get the stream fails, we call our closeVideoCall() function to shut down the RTCPeerConnection and release any resources already allocated by the process of attempting the call. This code is designed to safely handle partially-started calls.

Creating the peer connection

The createPeerConnection() function is used by both the caller and the callee to construct their RTCPeerConnection objects, their respective ends of the WebRTC connection. It's invoked by invite() when the caller tries to start a call, and by handleVideoOfferMsg() when the callee receives an offer message from the caller.

function createPeerConnection() {
  myPeerConnection = new RTCPeerConnection({
    iceServers: [
      // Information about ICE servers - Use your own!
      {
        urls: "stun:stun.stunprotocol.org",
      },
    ],
  });

  myPeerConnection.onicecandidate = handleICECandidateEvent;
  myPeerConnection.ontrack = handleTrackEvent;
  myPeerConnection.onnegotiationneeded = handleNegotiationNeededEvent;
  myPeerConnection.onremovetrack = handleRemoveTrackEvent;
  myPeerConnection.oniceconnectionstatechange =
    handleICEConnectionStateChangeEvent;
  myPeerConnection.onicegatheringstatechange =
    handleICEGatheringStateChangeEvent;
  myPeerConnection.onsignalingstatechange = handleSignalingStateChangeEvent;
}

When using the RTCPeerConnection() constructor, we specify an object providing configuration parameters for the connection. We use only one of these in this example: iceServers. This is an array of objects describing STUN and/or TURN servers for the ICE layer to use when attempting to establish a route between the caller and the callee. These servers are used to determine the best route and protocols to use when communicating between the peers, even if they're behind a firewall or using NAT.

Note: You should always use STUN/TURN servers which you own, or which you have specific authorization to use. This example is using a known public STUN server, but abusing these is bad form.

Each object in iceServers contains at least a urls field providing URLs at which the specified server can be reached. It may also provide username and credential values to allow authentication to take place, if needed.
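As an illustration, a configuration with one STUN server and one authenticated TURN server might look like the following. The hostnames and credentials are placeholders (assumptions); substitute servers you run or are authorized to use. Browsers reject a configuration in which a turn: or turns: URL lacks a username or credential (the RTCPeerConnection() constructor throws), so the sketch includes a hypothetical missingTurnCredentials() check for that case.

```javascript
// Illustrative RTCPeerConnection configuration; all servers are placeholders.
const rtcConfig = {
  iceServers: [
    { urls: "stun:stun.example.org:3478" },
    {
      // `urls` may be a single string or an array of URLs for the same server.
      urls: [
        "turn:turn.example.org:3478?transport=udp",
        "turns:turn.example.org:5349", // TURN over TLS
      ],
      username: "demo-user",
      credential: "demo-secret",
    },
  ],
};

// Hypothetical sanity check: list TURN entries that lack credentials.
function missingTurnCredentials(config) {
  return config.iceServers.filter((server) => {
    const urls = [].concat(server.urls);
    const isTurn = urls.some((u) => u.startsWith("turn:") || u.startsWith("turns:"));
    return isTurn && !(server.username && server.credential);
  });
}

console.log(missingTurnCredentials(rtcConfig).length); // 0

// In the browser: myPeerConnection = new RTCPeerConnection(rtcConfig);
```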

After creating theRTCPeerConnection, we set up handlers for the events that matter to us.

The first three of these event handlers are required; you have to handle them to do anything involving streamed media with WebRTC. The rest aren't strictly required but can be useful, and we'll explore them. There are a few other events available that we're not using in this example, as well. Here's a summary of each of the event handlers we will be implementing:

onicecandidate: The local ICE layer calls your icecandidate event handler when it needs you to transmit an ICE candidate to the other peer through your signaling server. See Sending ICE candidates for more information and to see the code for this example.
ontrack: This handler for the track event is called by the local WebRTC layer when a track is added to the connection. This lets you connect the incoming media to an element to display it, for example. See Receiving new streams for details.
onnegotiationneeded: This function is called whenever the WebRTC infrastructure needs you to start the session negotiation process anew. Its job is to create and send an offer to the callee, asking it to connect with us. See Starting negotiation to see how we handle this.
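onremovetrack: This counterpart to ontrack is meant to handle the removetrack event, which signals that the remote peer has removed a track from the media being sent. See Handling the removal of tracks. Be aware that in the current specification removetrack is fired at the remote MediaStream (the stream delivered by the track event) rather than at the RTCPeerConnection, so listening on that stream is the more reliable approach.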
oniceconnectionstatechange: The iceconnectionstatechange event is sent by the ICE layer to let you know about changes to the state of the ICE connection. This can help you know when the connection has failed or been lost. We'll look at the code for this example in ICE connection state below.
onicegatheringstatechange: The ICE layer sends you the icegatheringstatechange event when the ICE agent's process of collecting candidates shifts from one state to another (such as starting to gather candidates or finishing gathering). See ICE gathering state below.
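onsignalingstatechange: The WebRTC infrastructure sends you the signalingstatechange event when the state of the offer/answer signaling process changes, for example after a local or remote offer or answer has been applied. (The RTCPeerConnection knows nothing about your signaling server, so this event doesn't report on that connection.) See Signaling state to see our code.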

Starting negotiation

Once the caller has created its RTCPeerConnection, created a media stream, and added its tracks to the connection as shown in Starting a call, the browser will deliver a negotiationneeded event to the RTCPeerConnection to indicate that it's ready to begin negotiation with the other peer. Here's our code for handling the negotiationneeded event:

function handleNegotiationNeededEvent() {
  myPeerConnection
    .createOffer()
    .then((offer) => myPeerConnection.setLocalDescription(offer))
    .then(() => {
      sendToServer({
        name: myUsername,
        target: targetUsername,
        type: "video-offer",
        sdp: myPeerConnection.localDescription,
      });
    })
    .catch(window.reportError);
}

To start the negotiation process, we need to create and send an SDP offer to the peer we want to connect to. This offer includes a list of supported configurations for the connection, including information about the media stream we've added to the connection locally (that is, the video we want to send to the other end of the call), and any ICE candidates gathered by the ICE layer already. We create this offer by calling myPeerConnection.createOffer().

When createOffer() succeeds (fulfilling the promise), we pass the created offer information into myPeerConnection.setLocalDescription(), which configures the connection and media configuration state for the caller's end of the connection.

Note: Technically speaking, the SDP produced by createOffer() (the sdp property of the object its promise resolves to) is an RFC 3264 offer.

We know the description is valid, and has been set, when the promise returned by setLocalDescription() is fulfilled. This is when we send our offer to the other peer by creating a new "video-offer" message containing the local description (now the same as the offer), then sending it through our signaling server to the callee. The offer has the following members:

type: The message type: "video-offer".
name: The caller's username.
target: The name of the user we wish to call.
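sdp: The session description describing the offer: the connection's localDescription, which JSON.stringify() serializes as an object with type and sdp properties.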

If an error occurs, either in the initial createOffer() or in any of the fulfillment handlers that follow, the error is reported by passing it to the global window.reportError() function.
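The same steps can also be written with async/await, which some find easier to follow. In the sketch below, the function name negotiate() and passing the peer connection, send function, and usernames as parameters are assumptions made so it can run against a stub outside the browser; the message fields match the "video-offer" message above. Errors propagate as a rejected promise, so in the browser you could attach .catch(window.reportError) as the promise-chain version does.

```javascript
// Hypothetical async/await version of handleNegotiationNeededEvent().
async function negotiate(pc, send, myUsername, targetUsername) {
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  send({
    name: myUsername,
    target: targetUsername,
    type: "video-offer",
    sdp: pc.localDescription,
  });
}

// A stub peer connection standing in for RTCPeerConnection.
const stubPc = {
  localDescription: null,
  async createOffer() {
    return { type: "offer", sdp: "stub-sdp" };
  },
  async setLocalDescription(desc) {
    this.localDescription = desc;
  },
};

const outbox = [];
const negotiated = negotiate(stubPc, (m) => outbox.push(m), "Naomi", "Priya");
negotiated.then(() => console.log(outbox[0].type, outbox[0].target)); // video-offer Priya
```

In the browser, the handler would be wired up as myPeerConnection.onnegotiationneeded = () => negotiate(myPeerConnection, sendToServer, myUsername, targetUsername).catch(window.reportError).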

Once setLocalDescription()'s fulfillment handler has run, the ICE agent begins sending icecandidate events to the RTCPeerConnection, one for each potential configuration it discovers. Our handler for the icecandidate event is responsible for transmitting the candidates to the other peer.

Session negotiation

Now that we've started negotiation with the other peer and have transmitted an offer, let's look at what happens on the callee's side of the connection. The callee receives the offer and calls its handleVideoOfferMsg() function to process it. Let's see how the callee handles the "video-offer" message.

Handling the invitation

When the offer arrives, the callee's handleVideoOfferMsg() function is called with the "video-offer" message that was received. This function needs to do two things. First, it needs to create its own RTCPeerConnection and add the tracks containing the audio and video from its microphone and webcam to that. Second, it needs to process the received offer, constructing and sending its answer.

function handleVideoOfferMsg(msg) {
  let localStream = null;

  targetUsername = msg.name;
  createPeerConnection();

  const desc = new RTCSessionDescription(msg.sdp);

  myPeerConnection
    .setRemoteDescription(desc)
    .then(() => navigator.mediaDevices.getUserMedia(mediaConstraints))
    .then((stream) => {
      localStream = stream;
      document.getElementById("local_video").srcObject = localStream;

      localStream
        .getTracks()
        .forEach((track) => myPeerConnection.addTrack(track, localStream));
    })
    .then(() => myPeerConnection.createAnswer())
    .then((answer) => myPeerConnection.setLocalDescription(answer))
    .then(() => {
      // Named answerMsg to avoid shadowing the incoming msg parameter.
      const answerMsg = {
        name: myUsername,
        target: targetUsername,
        type: "video-answer",
        sdp: myPeerConnection.localDescription,
      };

      sendToServer(answerMsg);
    })
    .catch(handleGetUserMediaError);
}

This code is very similar to what we did in the invite() function back in Starting a call. It starts by creating and configuring an RTCPeerConnection using our createPeerConnection() function. Then it takes the SDP offer from the received "video-offer" message and uses it to create a new RTCSessionDescription object representing the caller's session description.

That session description is then passed into myPeerConnection.setRemoteDescription(). This establishes the received offer as the description of the remote (caller's) end of the connection. If this is successful, the promise fulfillment handler (in the then() clause) starts the process of getting access to the callee's camera and microphone using getUserMedia(), adding the tracks to the connection, and so forth, as we saw previously in invite().

Once the answer has been created using myPeerConnection.createAnswer(), the description of the local end of the connection is set to the answer's SDP by calling myPeerConnection.setLocalDescription(), then the answer is transmitted through the signaling server to the caller to let them know what the answer is.

Any errors are caught and passed to handleGetUserMediaError(), described in Handling getUserMedia() errors.

Note: As is the case with the caller, once the setLocalDescription() fulfillment handler has run, the browser begins firing icecandidate events that the callee must handle, one for each candidate that needs to be transmitted to the remote peer.

Finally, the caller handles the answer message it received by creating a new RTCSessionDescription object representing the callee's session description and passing it into myPeerConnection.setRemoteDescription().

function handleVideoAnswerMsg(msg) {
  const desc = new RTCSessionDescription(msg.sdp);

  myPeerConnection.setRemoteDescription(desc).catch(window.reportError);
}

Sending ICE candidates

The ICE negotiation process involves each peer sending candidates to the other, repeatedly, until it runs out of potential ways it can support the RTCPeerConnection's media transport needs. Since ICE doesn't know about your signaling server, your code handles transmission of each candidate in your handler for the icecandidate event.

Your onicecandidate handler receives an event whose candidate property is an RTCIceCandidate describing the candidate (or is null to indicate that the ICE layer has run out of potential configurations to suggest). The contents of candidate are what you need to transmit using your signaling server. Here's our example's implementation:

function handleICECandidateEvent(event) {
  if (event.candidate) {
    sendToServer({
      type: "new-ice-candidate",
      target: targetUsername,
      candidate: event.candidate,
    });
  }
}

This builds an object containing the candidate, then sends it to the other peer using the sendToServer() function previously described in Sending messages to the signaling server. The message's properties are:

type: The message type: "new-ice-candidate".
target: The username the ICE candidate needs to be delivered to. This lets the signaling server route the message.
candidate: The SDP representing the candidate the ICE layer wants to transmit to the other peer.

The format of this message (as is the case with everything you do when handling signaling) is entirely up to you, depending on your needs; you can provide other information as required.
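Because this example's messages travel as JSON, it can help to see what actually goes over the wire. RTCIceCandidate has a toJSON() method that returns its candidate string, sdpMid, sdpMLineIndex, and usernameFragment, so JSON.stringify() can serialize it directly. The sketch below uses a hypothetical helper name, buildCandidateMessage(), and, like the original handler, sends nothing for the final null candidate.

```javascript
// Hypothetical helper: produce the JSON text for a "new-ice-candidate" message.
// In a browser, `candidate` is an RTCIceCandidate; JSON.stringify() calls its
// toJSON() method. A plain { candidate, sdpMid, sdpMLineIndex } object also works.
// Returns null when there is nothing to send (the end-of-candidates null).
function buildCandidateMessage(targetUsername, candidate) {
  if (!candidate) {
    return null; // ICE has finished gathering; the original example sends nothing
  }
  return JSON.stringify({
    type: "new-ice-candidate",
    target: targetUsername,
    candidate,
  });
}
```

On the receiving side, JSON.parse() yields a plain object that can be passed to the RTCIceCandidate constructor or straight to addIceCandidate().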

Note: It's important to keep in mind that the icecandidate event is not sent when ICE candidates arrive from the other end of the call. Instead, it's fired by your own end of the call so that you can take on the job of transmitting the data over whatever channel you choose. This can be confusing when you're new to WebRTC.

Receiving ICE candidates

The signaling server delivers each ICE candidate to the destination peer using whatever method it chooses; in our example this is as JSON objects, with a type property containing the string "new-ice-candidate". Our handleNewICECandidateMsg() function is called by our main WebSocket incoming message code to handle these messages:

function handleNewICECandidateMsg(msg) {
  const candidate = new RTCIceCandidate(msg.candidate);

  myPeerConnection.addIceCandidate(candidate).catch(window.reportError);
}

This function constructs an RTCIceCandidate object by passing the received candidate data into its constructor, then hands it to the local ICE layer by passing it into myPeerConnection.addIceCandidate(). With that, our role in handling this candidate is complete.
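One detail the example doesn't handle: addIceCandidate() rejects with an InvalidStateError if the connection has no remote description yet, which can happen in some signaling designs if a candidate is processed before the offer or answer it belongs to. A common remedy, sketched here as a hypothetical CandidateQueue class that is not part of the original example, is to hold such candidates until setRemoteDescription() has completed and then flush them in order. addIceCandidate() accepts a plain candidate object, so no RTCIceCandidate construction is needed.

```javascript
// Hypothetical helper: buffer remote ICE candidates until the remote
// description is in place, then hand them to the peer connection in order.
class CandidateQueue {
  constructor(pc) {
    this.pc = pc;      // an RTCPeerConnection (or compatible stand-in)
    this.pending = []; // candidates received too early
  }

  // Call for every received candidate, e.g. from handleNewICECandidateMsg().
  async add(candidateInit) {
    if (!this.pc.remoteDescription) {
      this.pending.push(candidateInit); // too early: keep it for later
      return;
    }
    await this.pc.addIceCandidate(candidateInit);
  }

  // Call once setRemoteDescription() has resolved.
  async flush() {
    const queued = this.pending;
    this.pending = [];
    for (const candidateInit of queued) {
      await this.pc.addIceCandidate(candidateInit);
    }
  }
}
```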

Each peer sends to the other peer a candidate for each possible transport configuration that it believes might be viable for the media being exchanged. At some point, the two peers agree that a given candidate is a good choice and they open the connection and begin to share media. It's important to note, however, that ICE negotiation does not stop once media is flowing. Instead, candidates may continue to be exchanged after the conversation has begun, either while trying to find a better connection method, or because they were already in transit when the peers successfully established their connection.

In addition, if something happens to change the streaming scenario, a negotiationneeded event is sent to the RTCPeerConnection and the entire negotiation process described above runs again. This can happen in a variety of situations, including:

Changes in the network status, such as a bandwidth change, transitioning from Wi-Fi to cellular connectivity, or the like.
Switching between the front and rear cameras on a phone.
A change to the configuration of the stream, such as its resolution or frame rate.
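When negotiationneeded fires mid-call, the handler has to produce and send a fresh offer. A minimal sketch of a renegotiation step, not the article's own handler: it uses the argument-free form of setLocalDescription(), which creates an offer automatically when the connection is in the "stable" state, and skips the work if the signaling state has moved on before the handler runs. The function name handleRenegotiation() and the "video-offer" message type are assumptions.

```javascript
// Hypothetical renegotiation step for a negotiationneeded handler.
// pc:     an RTCPeerConnection (or compatible stand-in)
// send:   delivers a message object to the signaling server
// target: the username of the other peer
async function handleRenegotiation(pc, send, target) {
  // The state may have changed between the event firing and this code running.
  if (pc.signalingState !== "stable") {
    return false;
  }
  // With no argument, setLocalDescription() creates and applies an offer.
  await pc.setLocalDescription();
  send({
    type: "video-offer", // assumed message type name
    target,
    sdp: pc.localDescription,
  });
  return true;
}

// Browser wiring (names from this article):
// myPeerConnection.onnegotiationneeded = () =>
//   handleRenegotiation(myPeerConnection, sendToServer, targetUsername)
//     .catch(window.reportError);
```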

Receiving new streams

When new tracks arrive on the RTCPeerConnection (because the remote peer added them with its addTrack() method, or because of renegotiation of the stream's format), a track event is sent to the RTCPeerConnection for each track added to the connection. Making use of newly added media requires implementing a handler for the track event. A common need is to attach the incoming media to an appropriate HTML element. In our example, we add the track's stream to the <video> element that displays the incoming video:

function handleTrackEvent(event) {
  document.getElementById("received_video").srcObject = event.streams[0];
  document.getElementById("hangup-button").disabled = false;
}

The incoming stream is attached to the "received_video" <video> element, and the "Hang Up" <button> element is enabled so the user can hang up the call.

Once this code has run, the video being sent by the other peer is displayed in the local browser window!
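event.streams is empty when the remote peer added a track without associating it with any stream, and the handler above would then set srcObject to undefined. A minimal sketch of a more defensive handler, assuming you want to fall back to a stream of your own; the helper name attachRemoteTrack() and the createStream parameter are hypothetical (in a browser, createStream would be () => new MediaStream()). It also avoids reassigning srcObject when a second track (audio after video, say) arrives with the same stream.

```javascript
// Hypothetical defensive version of handleTrackEvent().
// event:        the track event (has .streams and .track)
// videoEl:      the <video> element for incoming media
// createStream: factory for an empty stream, e.g. () => new MediaStream()
function attachRemoteTrack(event, videoEl, createStream) {
  if (event.streams && event.streams.length > 0) {
    // Normal case: the sender associated the track with a stream.
    if (videoEl.srcObject !== event.streams[0]) {
      videoEl.srcObject = event.streams[0];
    }
    return videoEl.srcObject;
  }
  // Fallback: collect stream-less tracks into a stream we create ourselves.
  if (!videoEl.srcObject) {
    videoEl.srcObject = createStream();
  }
  videoEl.srcObject.addTrack(event.track);
  return videoEl.srcObject;
}
```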

Handling the removal of tracks

Your code receives a removetrack event when the remote peer removes a track from the connection by calling RTCPeerConnection.removeTrack(). Our handler for "removetrack" is:

function handleRemoveTrackEvent(event) {
  const stream = document.getElementById("received_video").srcObject;
  const trackList = stream.getTracks();

  if (trackList.length === 0) {
    closeVideoCall();
  }
}

This code fetches the incoming video MediaStream from the "received_video" <video> element's srcObject property, then calls the stream's getTracks() method to get an array of the stream's tracks.

If the array's length is zero, meaning there are no tracks left in the stream, we end the call by calling closeVideoCall(). This cleanly restores our app to a state in which it's ready to start or receive another call. See Ending the call to learn how closeVideoCall() works.
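The removetrack event is defined on MediaStream: it is fired at the remote stream when one of its tracks is removed. A minimal sketch of listening for it on the stream itself, assuming the stream received in the track event is the one being displayed; the helper name watchRemoteStream() and its onEmpty callback are hypothetical. The listener removes itself once the stream is empty, so the call is only ended once.

```javascript
// Hypothetical helper: end the call when the remote stream runs out of tracks.
// stream:  the MediaStream received in the track event (an EventTarget)
// onEmpty: called once no tracks remain, e.g. closeVideoCall
function watchRemoteStream(stream, onEmpty) {
  const onRemoveTrack = () => {
    if (stream.getTracks().length === 0) {
      stream.removeEventListener("removetrack", onRemoveTrack);
      onEmpty();
    }
  };
  stream.addEventListener("removetrack", onRemoveTrack);
  return onRemoveTrack;
}

// Browser usage (inside a track handler):
// watchRemoteStream(event.streams[0], closeVideoCall);
```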

Ending the call

There are many reasons why calls may end. A call might have completed, with one or both sides having hung up. Perhaps a network failure has occurred, or one user might have quit their browser, or had a system crash. In any case, all good things must come to an end.

Hanging up

When the user clicks the "Hang Up" button to end the call, the hangUpCall() function is called:

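function hangUpCall() {
  // Notify the other peer first: closeVideoCall() resets targetUsername to null
  sendToServer({
    name: myUsername,
    target: targetUsername,
    type: "hang-up",
  });

  closeVideoCall();
}

hangUpCall() first builds a "hang-up" message and sends it to the other end of the call to tell the other peer to shut itself down cleanly. It then executes closeVideoCall() to shut down and reset the connection and release resources. The message must be sent first, because closeVideoCall() sets targetUsername to null.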
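On the other end, the "hang-up" message needs a matching handler, typically reached from the code that processes incoming WebSocket messages. A minimal sketch of such a dispatcher, assuming messages arrive as JSON text; the function name dispatchSignalingMessage(), the handler table, and the socket variable are hypothetical, and the "video-offer" and "video-answer" type names are assumptions (only "new-ice-candidate" and "hang-up" appear in this section).

```javascript
// Hypothetical dispatcher for incoming signaling messages.
// text:     the raw JSON text received from the signaling server
// handlers: maps a message type to a function taking the parsed message
// Returns true if a handler ran, false if the type was unknown.
function dispatchSignalingMessage(text, handlers) {
  const msg = JSON.parse(text);
  // Only look at the table's own keys, so a type such as "toString"
  // can't reach inherited Object methods.
  if (!Object.prototype.hasOwnProperty.call(handlers, msg.type)) {
    console.warn(`Unhandled signaling message type: ${msg.type}`);
    return false;
  }
  handlers[msg.type](msg);
  return true;
}

// Browser wiring (handler names from this article; "video-*" types assumed):
// const handlers = {
//   "video-offer": handleVideoOfferMsg,
//   "video-answer": handleVideoAnswerMsg,
//   "new-ice-candidate": handleNewICECandidateMsg,
//   "hang-up": () => closeVideoCall(), // the other peer hung up
// };
// socket.onmessage = (evt) => dispatchSignalingMessage(evt.data, handlers);
```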

Ending the call

The closeVideoCall() function, shown below, is responsible for stopping the streams, cleaning up, and disposing of the RTCPeerConnection object:

function closeVideoCall() {
  const remoteVideo = document.getElementById("received_video");
  const localVideo = document.getElementById("local_video");

  if (myPeerConnection) {
    // Detach all handlers so nothing fires while the connection is closing
    myPeerConnection.ontrack = null;
    myPeerConnection.onremovetrack = null;
    myPeerConnection.onremovestream = null;
    myPeerConnection.onicecandidate = null;
    myPeerConnection.oniceconnectionstatechange = null;
    myPeerConnection.onsignalingstatechange = null;
    myPeerConnection.onicegatheringstatechange = null;
    myPeerConnection.onnegotiationneeded = null;

    // Stop every remote and local track
    if (remoteVideo.srcObject) {
      remoteVideo.srcObject.getTracks().forEach((track) => track.stop());
    }
    if (localVideo.srcObject) {
      localVideo.srcObject.getTracks().forEach((track) => track.stop());
    }

    myPeerConnection.close();
    myPeerConnection = null;
  }

  // srcObject is a property, not an attribute, so it is cleared directly
  remoteVideo.removeAttribute("src");
  remoteVideo.srcObject = null;
  localVideo.removeAttribute("src");
  localVideo.srcObject = null;

  document.getElementById("hangup-button").disabled = true;
  targetUsername = null;
}

After pulling references to the two <video> elements, we check whether a WebRTC connection exists; if it does, we proceed to disconnect and close the call:

Remove all of the event handlers. This prevents stray event handlers from being triggered while the connection is in the process of closing, potentially causing errors.
For both the remote and local video streams, iterate over each track, calling the MediaStreamTrack.stop() method to close each one.
Close the RTCPeerConnection by calling myPeerConnection.close().
Set myPeerConnection to null, ensuring our code learns there's no ongoing call; this is useful when the user clicks a name in the user list.

Then, for both the incoming and outgoing <video> elements, we remove the src attribute with removeAttribute() and set the srcObject property to null. This completes the disassociation of the streams from the video elements.

Finally, we set the disabled property to true on the "Hang Up" button, making it unclickable while there is no call underway; then we set targetUsername to null since we're no longer talking to anyone. This allows the user to call another user, or to receive an incoming call.
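Clearing each on... property by hand, as closeVideoCall() does, works but is easy to get out of sync with the handlers you actually install. An alternative, not used by the original example, is to register the listeners with addEventListener() and a shared AbortSignal, so a single abort() removes them all. A minimal sketch; the helper name addCallListeners() is hypothetical.

```javascript
// Hypothetical helper: register listeners that can all be removed at once.
// target:    any EventTarget, such as an RTCPeerConnection
// listeners: maps an event type to its listener function
// Returns a function that removes every listener added here.
function addCallListeners(target, listeners) {
  const controller = new AbortController();
  for (const [type, listener] of Object.entries(listeners)) {
    target.addEventListener(type, listener, { signal: controller.signal });
  }
  return () => controller.abort();
}

// Browser usage (handler names from this article):
// const removeListeners = addCallListeners(myPeerConnection, {
//   icecandidate: handleICECandidateEvent,
//   track: handleTrackEvent,
//   iceconnectionstatechange: handleICEConnectionStateChangeEvent,
//   signalingstatechange: handleSignalingStateChangeEvent,
//   icegatheringstatechange: handleICEGatheringStateChangeEvent,
// });
// ...later, in closeVideoCall(): removeListeners();
```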

Dealing with state changes

There are a number of additional events for which you can set listeners to notify your code of a variety of state changes. We use three of them: iceconnectionstatechange, icegatheringstatechange, and signalingstatechange.

ICE connection state

iceconnectionstatechange events are sent to the RTCPeerConnection by the ICE layer when the connection state changes (such as when the call is terminated from the other end).

function handleICEConnectionStateChangeEvent(event) {
  switch (myPeerConnection.iceConnectionState) {
    case "closed":
    case "failed":
      closeVideoCall();
      break;
  }
}

Here, we apply our closeVideoCall() function when the ICE connection state changes to "closed" or "failed". This handles shutting down our end of the connection so that we're ready to start or accept a call once again.

Note: We don't watch the "disconnected" ICE connection state here, as it can indicate a temporary issue and may return to "connected" after some time. Watching it would close the video call on any temporary network problem.
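The policy described above (close on "closed" or "failed", tolerate "disconnected") can be expressed as a small pure function, which is easy to test and to extend. A minimal sketch; iceStateAction() and its return values are hypothetical, and the grace period shown in the comments is one possible way to act on "disconnected", not part of the original example.

```javascript
// Hypothetical policy for iceconnectionstatechange events.
// Returns "close" to end the call, "wait" to give a possibly temporary
// outage time to recover, or "none" to do nothing.
function iceStateAction(state) {
  switch (state) {
    case "closed":
    case "failed":
      return "close";
    case "disconnected":
      // Often temporary; it may return to "connected" on its own.
      return "wait";
    default:
      // "new", "checking", "connected", "completed"
      return "none";
  }
}

// Possible use with a grace period (the 5-second figure is arbitrary):
// let disconnectTimer = null;
// myPeerConnection.oniceconnectionstatechange = () => {
//   clearTimeout(disconnectTimer);
//   const action = iceStateAction(myPeerConnection.iceConnectionState);
//   if (action === "close") closeVideoCall();
//   if (action === "wait") disconnectTimer = setTimeout(closeVideoCall, 5000);
// };
```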

ICE signaling state

Similarly, we watch for signalingstatechange events. If the signaling state changes to closed, we likewise close the call out.

function handleSignalingStateChangeEvent(event) {
  switch (myPeerConnection.signalingState) {
    case "closed":
      closeVideoCall();
      break;
  }
}

Note: The closed signaling state has been deprecated in favor of the closed iceConnectionState. We are watching for it here to add a bit of backward compatibility.

ICE gathering state

icegatheringstatechange events are used to let you know when the ICE candidate gathering process state changes. Our example doesn't use this for anything, but it can be useful to watch these events for debugging purposes, as well as to detect when candidate collection has finished.

function handleICEGatheringStateChangeEvent(event) {
  // Our sample just logs information to console here,
  // but you can do whatever you need.
}
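The empty handler above can be filled in for debugging. A minimal sketch, assuming you want to log each transition and note when gathering reaches "complete"; the factory name makeGatheringLogger() and its log parameter are hypothetical. iceGatheringState takes the values "new", "gathering", and "complete".

```javascript
// Hypothetical factory for a debugging icegatheringstatechange handler.
// pc:  the RTCPeerConnection whose gathering state is reported
// log: where messages go (defaults to console.log)
function makeGatheringLogger(pc, log = console.log) {
  return function handleICEGatheringStateChangeEvent(event) {
    const state = pc.iceGatheringState; // "new", "gathering", or "complete"
    log(`ICE gathering state changed to: ${state}`);
    if (state === "complete") {
      log("All local ICE candidates have been gathered");
    }
  };
}

// Browser usage:
// myPeerConnection.onicegatheringstatechange = makeGatheringLogger(myPeerConnection);
```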

Next steps

You can now try out this example to see it in action. Open the Web console on both devices and look at the logged output. Although you don't see it in the code as shown above, the code on the server (and on GitHub) has a lot of console output so you can see the signaling and connection processes at work.

Another obvious improvement would be to add a "ringing" feature, so that instead of just asking the user for permission to use the camera and microphone, a "User X is calling. Would you like to answer?" prompt appears first.


This page was last modified on Jul 7, 2025 by MDN contributors.
