Hi there 🙂
We are two friends who are going to share our journey into the world of WebRTC and WebSockets. We had a lot of struggles, disappointment, and also fun. We hope this post will be helpful, or at least informative, for whoever is reading.
We are both frontend engineers, hence to build the app we went with the well-known NextJS as our core framework.
For styling we chose TailwindCSS; we hadn't had any experience with it and wanted to play around with it.
After reading a couple of articles and watching some tutorial videos, we realised that dealing with WebRTC natively is quite cumbersome. PeerJS came in handy here, abstracting away some of the configuration around WebRTC. When implementing peer-to-peer communication, there must also be a signalling server to keep the state (muted, camera off, etc.) in sync between peers, so SocketIO played the role of the signalling server. Finally, a basic authentication layer was integrated via Auth0 to support certain features.
In this article we are going to develop the following features:
- lobby page to set up your initial stream settings
- creation of a room to join
- meeting host abilities
- sharing screen with others
- turning off your device's light indicator when the video stream is off
- visual indication of active speaker in the room
- messaging
- list of participants and their statuses
Excited? Let's dive into it!
Initial setup
- installation and configuration of TailwindCSS - Pull Request, documentation
- installation and configuration of Auth0 - Pull Request, documentation
- integration of SocketIO with NextJS - Pull Request
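Speaking of the SocketIO integration: the exact code lives in the Pull Request linked above, but for reference, here is a minimal sketch of the common way Socket.IO is wired into NextJS. The Socket.IO server is attached once to the Node HTTP server sitting behind an API route (this is an illustration of the pattern, not our exact code):
// pages/api/socket.ts (sketch of the common pattern)
import { Server } from 'socket.io';
import type { NextApiRequest, NextApiResponse } from 'next';

export default function handler(req: NextApiRequest, res: NextApiResponse) {
  // the underlying Node HTTP server is reachable through res.socket
  const httpServer = (res as any).socket.server;
  if (!httpServer.io) {
    // attach Socket.IO only once and cache it on the server object
    const io = new Server(httpServer);
    httpServer.io = io;
    io.on('connection', (socket) => {
      // room and chat event handlers are registered here
    });
  }
  res.end();
}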
Before diving into the implementation details of each feature, we would like to give you a visual representation of the folder structure. It will help you navigate while reading.
|- app
|----| index.tsx
|- components
|----| lobby.tsx
|----| control-panel.tsx
|----| chat.tsx
|----| status.tsx
|- contexts
|----| users-connection.tsx
|----| users-settings.tsx
|- hooks
|----| use-is-audio-active.ts
|----| use-media-stream.ts
|----| use-peer.ts
|----| use-screen.ts
|- pages
|----| index.tsx
|----| room
|--------| [roomId].tsx
Lobby Page
Once the user has landed on the home page, they have two options: create a new room or join an existing one. Either way, they go through the lobby page to set up their initial stream settings, such as muting their mic or turning their video off, before entering.
// pages/room/[roomId].tsx
const RoomPage: NextPage = () => {
  const [isLobby, setIsLobby] = useState(true);
  const { stream } = useMediaStream();

  return isLobby
    ? <Lobby stream={stream} onJoinRoom={() => setIsLobby(false)} />
    : <Room stream={stream} />; // the actual room UI, shown later
};

export default RoomPage;
As you noticed, we now have a stream. A stream can contain audio and/or video; it is a flow of media data over time. The Media Capture and Streams API allows us to create and manipulate it. The stream itself consists of multiple tracks, mainly audio and video.
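Under the hood, useMediaStream boils down to a getUserMedia call. A simplified sketch (the real hook also exposes the toggle helpers used below):
// hooks/use-media-stream.ts (simplified sketch)
import { useEffect, useState } from 'react';

export function useMediaStream() {
  const [stream, setStream] = useState<MediaStream | null>(null);

  useEffect(() => {
    // ask the browser for a stream with one audio and one video track
    navigator.mediaDevices
      .getUserMedia({ video: true, audio: true })
      .then(setStream)
      .catch(console.error);
  }, []);

  return { stream };
}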
Here is the Lobby component:
// components/lobby.tsx
// pseudocode
const Lobby = ({
  stream,
  onJoinRoom,
}: {
  stream: MediaStream;
  onJoinRoom: () => void;
}) => {
  const { toggleAudio, toggleVideo } = useMediaStream(stream);

  return (
    <>
      {/* in real code srcObject must be set via a ref, not as a JSX attribute */}
      <video srcObject={stream} autoPlay muted />
      <button onClick={toggleVideo}>Toggle video</button>
      <button onClick={toggleAudio}>Toggle audio</button>
      <button onClick={onJoinRoom}>Join</button>
    </>
  );
};
Note (from MDN):
The enabled property on the [MediaStreamTrack](https://developer.mozilla.org/en-US/docs/Web/API/MediaStreamTrack) interface is a Boolean value which is true if the track is allowed to render the source stream or false if it is not. This can be used to intentionally mute a track. When enabled, a track's data is output from the source to the destination; otherwise, empty frames are output.
With that said, toggling the enabled property does not require any syncing process between peers; it happens automatically.
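For example, muting the mic is essentially a one-line flip of that property. A simplified sketch of what toggleAudio does:
// hooks/use-media-stream.ts (simplified sketch)
function toggleAudio() {
  const audioTrack = stream.getAudioTracks()[0];
  // flipping enabled mutes/unmutes: peers receive empty frames,
  // no renegotiation or signalling is needed
  if (audioTrack) audioTrack.enabled = !audioTrack.enabled;
}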
Going inside the room
To make our lives easier, let's imagine the user entered with both audio and video on. Right after entering the room, Peer and Socket entities are created: the Peer to connect and share a stream with other users, and the Socket to transport the state of the stream.
To create a peer, we are going to need roomId (from useRouter) and user (from useUser).
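Instantiating the peer itself is a one-liner with PeerJS. A minimal sketch: with no arguments the constructor connects to the public PeerServer and assigns us a random id, delivered via the 'open' event below.
// hooks/use-peer.ts (sketch)
import Peer from 'peerjs';

// the assigned id arrives through the 'open' event once
// the connection to the PeerServer is established
const peer = new Peer();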
// hooks/use-peer.ts
// core part of the code
// fired once the connection is open and an id is assigned
peer.on('open', (id: PeerId) => {
  // tell others a new user joined the room
  socket.emit('room:join', {
    roomId, // which room to connect to
    user: { id, name: user.name, muted, visible }, // joining user's data
  });
});
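On the other side of the wire, the signalling server simply relays this to everyone already in the room. A hypothetical sketch (event names taken from the client code above; the actual server may differ):
// signalling server (hypothetical sketch)
io.on('connection', (socket) => {
  socket.on('room:join', ({ roomId, user }) => {
    socket.join(roomId);
    // let everyone already in the room know about the newcomer
    socket.to(roomId).emit('user:joined', user);
  });
});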
Below you can find a pseudocode implementation of the Room page/component:
// app/index.tsx
// pseudocode
// stream comes from the Lobby page
export default function App({ stream }: { stream: MediaStream }) {
  const socket = useContext(SocketContext);
  const peer = usePeer(stream);

  return (
    <UsersStateContext.Provider>
      <UsersConnectionContext.Provider value={{ peer, stream }}>
        <MyStream />
        <OthersStream />
        <ControlPanel />
      </UsersConnectionContext.Provider>
    </UsersStateContext.Provider>
  );
}
UsersStateContext is responsible for changing and broadcasting each user's state. UsersConnectionContext is all about communication: entering a room, setting up connections between peers, leaving a room and screen sharing. Yep, screen sharing is part of communication, because a new stream is created for sharing the user's screen. We will talk about it in more detail a bit later.
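To make the split concrete, the two context values could be typed roughly like this (a hypothetical sketch; the actual field names may differ):
// contexts (approximate value shapes, illustrative only)
import type Peer, { MediaConnection } from 'peerjs';

interface UsersStateContextValue {
  names: Record<PeerId, string>;
  avatars: Record<PeerId, string>;
  muted: Record<PeerId, boolean>;
  visible: Record<PeerId, boolean>;
}

interface UsersConnectionContextValue {
  peer: Peer;                             // our PeerJS instance
  stream: MediaStream;                    // our local stream
  users: Record<PeerId, MediaConnection>; // open calls to other peers
}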
So, we are inside the room. Now all the other users who were already here have to greet the newcomer and hand over their stream and name to display.
// contexts/users-connection.tsx
// this event is received by users who are already in the room
socket.on('user:joined', ({ id, name }: UserConfig) => {
  // call the newly joined user's id with my stream and my name
  const call = peer.call(id, stream, {
    metadata: {
      username: user.name,
    },
  });
});
Here, our socket basically says "Yoyo, we have a new guest in da house" via the user:joined event. Once it is triggered, every user already in the room calls the new guest to welcome them with their stream and name, and in response they receive the guest's name, stream and id.
// contexts/users-connection.tsx
// this happens on the newly joined user's device
peer.on('call', (call) => {
  const { peer: callerId, metadata } = call;
  const { username } = metadata;
  // answer the incoming call with our own stream
  call.answer(stream);
  // stream, name and id of the user who was already in the room
  call.on('stream', (stream) => appendVideoStream({ id: callerId, name: username })(stream));
});
Success! We have established a connection between peers, and now they can see and hear each other 🙂
Control panel buttons
All good, but at this point no one can manipulate their stream. So it is time to dig into what can be changed for a given stream:
- toggle audio
- toggle video
- shut down
- share display
// app/index.tsx
<ControlPanel
  visible={visible}
  muted={muted}
  onLeave={() => router.push('/')}
  onToggle={onToggle}
/>
There is nothing special about "leaving the room": we simply redirect to the home page, and the return function inside useEffect takes care of cleaning up by destroying the connection.
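In essence, that cleanup looks roughly like this (a simplified sketch):
// app/index.tsx (simplified sketch)
useEffect(() => {
  return () => {
    // destroy() closes every open call and the PeerServer connection
    peer.destroy();
    socket.disconnect();
  };
}, [peer, socket]);
The more interesting bit, however, is the onToggle method.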
// app/index.tsx
// only the related part of the code
const { toggleAudio, toggleVideo } = useMediaStream(stream);
const { myId } = usePeer(stream);
const { startShare, stopShare, screenTrack } = useScreen(stream);

async function onToggle(kind: Kind, users?: MediaConnection[]) {
  switch (kind) {
    case 'audio': {
      toggleAudio();
      socket.emit('user:toggle-audio', myId);
      return;
    }
    case 'video': {
      toggleVideo((newVideoTrack: MediaStreamTrack) => {
        // swap the outgoing video track for every connected user
        users?.forEach((user) => replaceTrack(newVideoTrack)(user));
      });
      socket.emit('user:toggle-video', myId);
      return;
    }
    case 'screen': {
      if (screenTrack) {
        stopShare(screenTrack);
        socket.emit('user:stop-share-screen');
      } else {
        await startShare(
          () => socket.emit('user:shared-screen'),
          () => socket.emit('user:stop-share-screen')
        );
      }
      return;
    }
    default:
      break;
  }
}
The toggleAudio and toggleVideo functions act in a similar way, with one tiny difference in toggleVideo that will be described further below.
Screen sharing
// hooks/use-screen.ts
async function startShare(onstarted: () => void, onended: () => void) {
  const screenStream = await navigator.mediaDevices.getDisplayMedia({
    video: true,
    audio: false,
  });
  const [screenTrack] = screenStream.getTracks();
  setScreenTrack(screenTrack);
  stream.addTrack(screenTrack);
  onstarted();

  // once the screen is shared, the browser shows its own tiny popup
  // with two buttons: "Stop sharing" and "Hide"; they are NOT custom
  // .onended is triggered when the user clicks "Stop sharing"
  screenTrack.onended = () => {
    stopShare(screenTrack);
    onended();
  };
}
To start sharing the screen, we need to create a new stream via getDisplayMedia, take out its video track and extend our current stream with it. Eventually we end up with three tracks: audio, the webcam track and the screen track. Next, we notify other users with the user:shared-screen event so they can reset their peer connection to receive the additional video track.
// contexts/users-connection.tsx
socket.on('user:shared-screen', () => {
  // peer connection reset
  peer.disconnect();
  peer.reconnect();
});
To stop sharing, we would need to stop the video track and remove it.
// hooks/use-screen.ts
function stopShare(screenTrack: MediaStreamTrack) {
  screenTrack.stop();
  stream.removeTrack(screenTrack);
}
Control actions of the host user
The host user has permission to mute and disconnect other users. These actions become visible when hovering over a participant's video stream.
// components/video-container/index.tsx
// pseudocode
// the wrapper around the stream is responsible for rendering the
// corresponding component or icon depending on the state of the stream
function VideoContainer({
  children,
  id,
  onMutePeer,
  onRemovePeer,
}: {
  children: React.ReactNode;
  id: PeerId;
  onMutePeer: (id: PeerId) => void;
  onRemovePeer: (id: PeerId) => void;
}) {
  // isHost, myId and muted come from context (omitted here)
  return (
    <>
      <div>
        {/* here goes the video stream component */}
        {children}
      </div>
      {/* show the host control panel if I created the room */}
      {isHost && myId !== id && (
        <HostControlPanel
          onMutePeer={() => onMutePeer && onMutePeer(id)}
          onRemovePeer={() => onRemovePeer && onRemovePeer(id)}
          isMuted={muted}
        />
      )}
    </>
  );
}
Muting another user is trivial, since the MediaStreamTrack API handles that for us; but in order to visually represent that the host has muted someone, we trigger a socket event with the muted user's id as the payload.
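A hypothetical sketch of that handler (we reuse the user:toggle-audio event name from onToggle above; the actual implementation may differ):
// contexts/users-connection.tsx (hypothetical sketch)
function mutePeer(id: PeerId) {
  // same toggle event, this time carrying someone else's id;
  // the muted user's client flips its own audio track in response
  socket.emit('user:toggle-audio', id);
}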
However, multiple actions are performed once onRemovePeer is executed:
- send the removed peer's id to the others, so they can show a respective icon or toaster
- remove the user from “my” room and update the state of streams
- close peer connection
// contexts/users-connection.tsx
function leaveRoom(id: PeerId) {
  // notify everyone
  socket.emit('user:leave', id);
  // close the peer connection
  users[id].close();
  // remove the user from the UI
  setStreams((streams) => {
    const copy = { ...streams };
    delete copy[id];
    return copy;
  });
}
Turning off the webcam light indicator
Here comes the tiny difference between toggleAudio and toggleVideo. While turning the video stream off, we have to make sure that the indicator light goes off too; that is what guarantees the web camera is actually switched off.
// hooks/use-media-stream.ts
// @param onTurnVideoOn - optional callback that receives the newly created video track
async function toggleVideo(onTurnVideoOn?: (track: MediaStreamTrack) => void) {
  const videoTrack = stream.getVideoTracks()[0];

  if (videoTrack.readyState === 'live') {
    videoTrack.enabled = false;
    videoTrack.stop(); // turns off the webcam light indicator
  } else {
    // a stopped track cannot be restarted, so request a fresh one
    const newStream = await navigator.mediaDevices.getUserMedia({
      video: true,
      audio: false,
    });
    const newVideoTrack = newStream.getVideoTracks()[0];
    if (typeof onTurnVideoOn === 'function') onTurnVideoOn(newVideoTrack);
    stream.removeTrack(videoTrack);
    stream.addTrack(newVideoTrack);
    setStream(stream);
  }
}
Setting the enabled property to false does not know anything about the indicator light, hence it does not turn it off. However, the MediaStreamTrack interface has a stop method, which tells your browser that the track is no longer needed and changes its readyState to ended. Unfortunately, MediaStreamTrack does not have a start or restart method, as you may have expected. Therefore, to turn the camera back on, we create a new stream, take the video track from it and insert it into the old stream.
Wait, aren't we swapping the video track back and forth without notifying other users in the room? Relax, replaceTrack has got your back.
// app/index.tsx
// @param track - the new track that replaces the old one
function replaceTrack(track: MediaStreamTrack) {
  return (peer: MediaConnection) => {
    const sender = peer.peerConnection
      .getSenders()
      .find((s) => s.track?.kind === track.kind);
    sender?.replaceTrack(track);
  };
}
Consider our webcam is off; what happens when we turn it back on? An optional callback is passed to toggleVideo, which takes a single parameter: the new video track. In the body of the callback, we replace our old track with the new one for each user in the room. To achieve this, we use the getSenders method of the RTCPeerConnection interface, which returns a list of RTCRtpSender objects. An RTCRtpSender gives you the opportunity to manipulate the media track that is being sent to other users.
Indicator of active speaker
Maybe you have noticed in Google Meet that when a user is speaking, a small icon appears in the corner of their video container, indicating that the person is currently talking. All this logic is encapsulated inside the custom useIsAudioActive hook.
Since we are dealing with a stream of media data that is hard to inspect directly, we run it through a graph of nodes using the Web Audio API's AudioContext.
// hooks/use-is-audio-active.ts
const audioContext = new AudioContext();
const analyser = new AnalyserNode(audioContext, { fftSize });
// source is a stream (MediaStream)
const audioSource = audioContext.createMediaStreamSource(source);
// connect the audio source to the analyser; we never connect it to
// audioContext.destination, since we only analyse, we don't play it back
audioSource.connect(analyser);
Depending on the passed FFT (Fast Fourier Transform) size and using requestAnimationFrame, we determine whether a person is speaking, returning a boolean value on each frame. More detailed explanations of FFT and AnalyserNode.
// hooks/use-is-audio-active.ts
// frequencyBinCount gives us how many different frequencies we are going to measure
const bufferLength = analyser.frequencyBinCount;
// an array of bufferLength (half of fftSize) entries, filled with 0-s
const dataArray = new Uint8Array(bufferLength);

update();

function update() {
  // fills dataArray with time-domain samples centred around 128 (silence)
  analyser.getByteTimeDomainData(dataArray);
  const sum = dataArray.reduce((a, b) => a + b, 0);
  // an average strictly above the 128 midpoint means the waveform is active
  if (sum / dataArray.length / 128.0 > 1) {
    setIsSpeaking(true);
    setTimeout(() => setIsSpeaking(false), 1000);
  }
  requestAnimationFrame(update);
}
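Consuming the hook is then trivial. A usage sketch (the component, hook signature and class names here are illustrative):
// usage sketch (hypothetical)
function SpeakerTile({ stream }: { stream: MediaStream }) {
  const isSpeaking = useIsAudioActive(stream);
  return (
    <div className={isSpeaking ? 'ring-2 ring-indigo-500' : ''}>
      {/* video element goes here */}
    </div>
  );
}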
Chat
There is no magic behind the chat feature; we just leverage socket events: chat:post with the message payload to send, and chat:get to receive a new message and append it to the list.
// components/chat/index.tsx
function Chat() {
  const [text, setText] = useState('');
  const [messages, setMessages] = useState<UserMessage[]>([]);

  useEffect(() => {
    socket.on('chat:get', (message: UserMessage) =>
      setMessages(append(message))
    );
    // unsubscribe on unmount to avoid duplicate listeners
    return () => {
      socket.off('chat:get');
    };
  }, []);

  return (
    <>
      <MessagesContainer messages={messages} />
      <Input
        value={text}
        onChange={(e) => setText(e.target.value)}
        onKeyDown={sendMessage}
      />
    </>
  );
}
// components/chat/index.tsx
function sendMessage(e: React.KeyboardEvent<HTMLInputElement>) {
  if (e.key === 'Enter' && text) {
    const message = {
      user: username,
      text,
      time: formatTimeHHMM(Date.now()),
    };
    socket.emit('chat:post', message);
    setMessages(append(message));
    setText('');
  }
}
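For completeness, the server side of the chat is just a relay. A hypothetical sketch, assuming the same Socket.IO signalling server as before:
// signalling server (hypothetical sketch)
socket.on('chat:post', (message) => {
  // the sender already appended the message locally,
  // so relay it only to the rest of the room
  // (roomId captured earlier in the room:join handler)
  socket.to(roomId).emit('chat:get', message);
});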
List of users with their statuses
As a bonus feature, we implemented a sidebar component that shows each user's status in real time.
// components/status/index.tsx
// muted, visible, names and avatars all come from context
const Status = () => {
  const { avatars, muted, visible, names } = useContext(UsersStateContext);
  const usersIds = Object.keys(names);

  return (
    <>
      {usersIds.map((id) => (
        <div key={id}>
          <img src={avatars[id]} alt="User image" />
          <span>{names[id]}</span>
          <Icon variant={muted[id] ? 'muted' : 'not-muted'} />
          <Icon variant={visible[id] ? 'visible' : 'not-visible'} />
        </div>
      ))}
    </>
  );
};
That is it! We covered the core features of a standard video chat application. We hope you enjoyed it and gained some knowledge along the way.
Conclusion
After finishing the app, we came to realise that we have merely scratched the surface of WebRTC and WebSockets. Nevertheless, the core features are done, and now we have our own playground to experiment further. The source code is here
Thank you
P.S. The app is a little laggy and has some bugs that we are aware of. We are going to fix them :)