DEV Community

Avicus Delacroix

DIY Clubhouse: for iOS, Android, and even Unity

The pandemic changed us all. Working and studying remotely, we craved live communications and seeing our friends and colleagues in person. Maybe that's why a new social network, Clubhouse, has gathered more than 6 million people in less than a year.

Clubhouse is an all-new type of social network. There is no texting at all: people communicate only by voice. All communication happens in so-called "rooms", which can be open to everyone or private. The room's owner (moderator) decides who can talk, and listeners can "raise a hand" to indicate that they want to ask a question.

However, Clubhouse has its limitations. For now, it's only available for iOS users, and to join, you need to receive an invitation from an existing user. But what if you are an Android user?

You can wait until they release an Android or desktop version, or you can make your own Clubhouse, with better security, voice recording, AI, and support for any platform. And Voximplant will help you.

In this article, you'll see how to create a Clubhouse-like audio room with Voximplant's Web SDK, so users can join from an ordinary browser.

Please note that Voximplant supports multiple SDKs, from Web, iOS, and Android to React Native, Flutter, and even Unity. Imagine embedding an audio (or video) chat into your game.

Set up VoxEngine

VoxEngine is a cloud platform to enable serverless communications. Let's create a room. Log into your account (or create one) and create a new application. Then create a user for authorization and a new scenario.

First, import the conference module. Then handle a user disconnecting:

require(Modules.Conference);

let owner = null;
const onEndpointDisconnected = (event)=>{
   const members = conference.getList();
   if(members.length === 1){
       // the body of this branch is missing from the source
   }
};
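The handler above is cut off, so what it does when a single member remains is not shown. As a hedged illustration of one thing a disconnect handler might do, the sketch below picks a new room owner when the current one leaves. pickOwner and its arguments are hypothetical names, not part of VoxEngine; in a scenario, the ids could be collected by calling id() on each endpoint returned by conference.getList().

```javascript
// Hypothetical helper, not part of the VoxEngine API.
// Given the current owner id and the ids of the members still in the room,
// return who should own the room now (or null if the room is empty).
function pickOwner(currentOwner, memberIds) {
  if (memberIds.length === 0) return null;                   // nobody left
  if (memberIds.includes(currentOwner)) return currentOwner; // owner is still here
  return memberIds[0];                                       // hand over to the first member in the list
}
```

Keeping this decision in a pure function makes it easy to check outside the cloud scenario before wiring it into the event handler.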

Then implement the permission check:

const checkPermissions = ({call,headers}) =>{
   return new Promise((resolve)=>{
       // the checks themselves are missing from the source
   });
};
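The body of the permission check is not shown, so the rules below are purely an assumption made for illustration: a caller must send a non-empty X-Name header (the header the scenario later uses as the display name) and must not be on a ban list. isAllowed, checkPermissionsSketch, and bannedNames are hypothetical names.

```javascript
// Hypothetical rule set; the real checkPermissions body is not shown in the article.
// Decide from the incoming call's headers whether a user may join.
function isAllowed(headers, bannedNames) {
  const name = (headers['X-Name'] || '').trim();
  if (name === '') return false;       // require a display name
  return !bannedNames.includes(name);  // reject names on the ban list
}

// Same shape as the article's function: takes {call, headers}
// and returns a promise that resolves to a boolean.
const checkPermissionsSketch = ({ call, headers }, bannedNames = []) =>
  new Promise((resolve) => resolve(isAllowed(headers, bannedNames)));
```

Returning a promise leaves room for a real check that is asynchronous, for example a request to your own backend.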

Implement room creation and adding a new user:

let logURL = ''; // kept for debugging
let conference = null;
VoxEngine.addEventListener(AppEvents.Started, event => {
   logURL = event.logURL;
   conference = VoxEngine.createConference({hd_audio:true});
});
VoxEngine.addEventListener(AppEvents.CallAlerting, async event => {
   const permissions = await checkPermissions(event);
   if(permissions) {
       // garbled in the source: a listener registration ending in
       // ", onEndpointDisconnected);" appears at this point
       conference.add({
           call: event.call,
           displayName: event.headers['X-Name'],
           mode: "FORWARD",
           direction: "BOTH",
           scheme: event.scheme
       });
       if(conference.getList().length === 1){
           // the first member to join becomes the room owner
           owner = conference.getList()[0].id();
       }
   } else {
       // garbled in the source: the call is turned down here
       // with the headers {'X-Reason':'DENIED'}
   }
});

The first part is done. Now let's create a client.

Set up the client

Since I decided to build the example with the Web SDK, my client will be an HTML page. I'll hide the control buttons until the SDK is initialized.

<!DOCTYPE html>
<html lang="en">
<head>
   <meta charset="UTF-8">
   <title>The demo</title>
   <style>
       .hidden {
           display: none !important;
       }
   </style>
</head>
<body>
<div id="btns" class="hidden">
   <p id="myname">Avi</p>
   <button id="viewer">Join as listener</button>
   <button id="speaker">Join as speaker</button>
   <button id="leave" disabled>Leave</button>
</div>
<div id="audio"></div>
<h3>Current speakers <span id="countSpeakers">0</span></h3>
<div id="endpoints"></div>
<script src="*****"></script>
</body>
</html>

Now for the interesting part. Let's initialize the SDK in a <script> tag and show the control buttons if the initialization succeeds.

// sdk init
const sdk = VoxImplant.getInstance();
let user = 'user*****';
const init = async () => {
   await sdk.init({ showDebugInfo: true, serverIp: 'url*****' });
   await sdk.connect();
   await sdk.login(`${user}@app**.acc**`, 'pass*****');
};
init().then(() => {
   // the SDK is ready: reveal the control buttons
   document.getElementById('btns').classList.remove('hidden');
});

Create some necessary constants. I will get the user name from the HTML element with id="myname". Then let's define when each button is enabled:

let currentCall;
let currentRole;
let countSpeakers = 0;
const confNumber = 'Test room';
const speakerBtn = document.getElementById('speaker');
const viewerBtn = document.getElementById('viewer');
const leaveBtn = document.getElementById('leave');
// myName is never defined in the source; per the text above,
// it is read from the #myname element here
const myName = document.getElementById('myname').innerText;
let setRole = (role) => {
   currentRole = role;
   if(role === 'speaker') {
       speakerBtn.disabled = true;
       leaveBtn.disabled = false;
       viewerBtn.disabled = false;
   }
   if(role === 'viewer') {
       speakerBtn.disabled = false;
       leaveBtn.disabled = false;
       viewerBtn.disabled = true;
   }
   if(role === 'start') {
       speakerBtn.disabled = false;
       viewerBtn.disabled = false;
       leaveBtn.disabled = true;
   }
};
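The three branches of setRole differ only in which buttons are disabled, so the same rules can also be written as a lookup table. The sketch below is one possible alternative, not the article's code: BUTTON_STATE and applyRole are hypothetical names, and the buttons argument is any object holding the three button elements.

```javascript
// Disabled flags per role, mirroring the three branches of setRole.
const BUTTON_STATE = {
  speaker: { speaker: true,  viewer: false, leave: false },
  viewer:  { speaker: false, viewer: true,  leave: false },
  start:   { speaker: false, viewer: false, leave: true  },
};

// Hypothetical helper: apply the flags to objects that have a `disabled`
// property (real <button> elements in the browser, plain objects in a test).
function applyRole(role, buttons) {
  const state = BUTTON_STATE[role];
  if (!state) throw new Error(`Unknown role: ${role}`);
  buttons.speaker.disabled = state.speaker;
  buttons.viewer.disabled = state.viewer;
  buttons.leave.disabled = state.leave;
  return role;
}
```

In the page it could be called as applyRole('start', { speaker: speakerBtn, viewer: viewerBtn, leave: leaveBtn }). An unknown role throws instead of silently leaving the buttons in their previous state.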

Handle leaving the conversation:

let endCall = () => {
   if(currentCall && currentCall.state() !== 'ENDED') {
       currentCall.hangup(); // assumed: the hang-up line is missing from the source
       document.getElementById('endpoints').innerText = '';
   }
};

Then handle adding a user to and removing a user from the conversation:

let onEndpointAdded = (e) => {
   console.warn('Endpoint added', e.endpoint.id);
   const nameTable = document.getElementById('endpoints');
   let p = document.createElement('p');
   p.id = e.endpoint.id;
   p.innerText = `Name: ${e.endpoint.displayName}, id: ${e.endpoint.id}`;
   nameTable.appendChild(p);
   countSpeakers += 1;
   document.getElementById('countSpeakers').innerText = countSpeakers;
   e.endpoint.addEventListener(VoxImplant.EndpointEvents.RemoteMediaAdded, (ev)=> {
       console.warn('RemoteMediaAdded', ev.mediaRenderer);
       const nodeCall = document.getElementById('audio');
       ev.mediaRenderer.render(nodeCall);
   });
   e.endpoint.addEventListener(VoxImplant.EndpointEvents.RemoteMediaRemoved, (ev)=>{
       console.warn(`Endpoint ${e.endpoint.id} media removed ${ev.mediaRenderer}`);
   });
   e.endpoint.addEventListener(VoxImplant.EndpointEvents.Removed, (ev)=>{
       console.warn(`Endpoint ${e.endpoint.id} removed`);
       let removeP = document.getElementById(e.endpoint.id);
       if(removeP) {
           nameTable.removeChild(removeP);
           countSpeakers -= 1;
           document.getElementById('countSpeakers').innerText = countSpeakers;
       }
   });
};
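A counter that is incremented and decremented by hand can drift if an event arrives twice or a removal comes for an unknown endpoint. One way to avoid that, sketched here under the assumption that endpoint ids are unique strings, is to keep the endpoints in a Map and derive the count from its size. SpeakerRoster is a hypothetical name, not part of the Voximplant SDK.

```javascript
// Hypothetical roster: tracks endpoints by id so the speaker count
// is always derived from the actual set of known endpoints.
class SpeakerRoster {
  constructor() {
    this.byId = new Map();
  }
  // Register (or re-register) an endpoint; returns the current count.
  add(id, displayName) {
    this.byId.set(id, displayName);
    return this.byId.size;
  }
  // Forget an endpoint; removing an unknown id is a no-op.
  remove(id) {
    this.byId.delete(id);
    return this.byId.size;
  }
  get count() {
    return this.byId.size;
  }
}
```

The event handlers would then call add and remove with the endpoint id and write the returned count into the countSpeakers element.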

Handle the Leave button and some necessary events:

const setCall = () => {
   leaveBtn.onclick = endCall;
   currentCall.addEventListener(VoxImplant.CallEvents.EndpointAdded, onEndpointAdded);
   currentCall.addEventListener(VoxImplant.CallEvents.MessageReceived, (e) => {
       console.warn('MessageReceived', e.text);
   });
   //handle connection
   currentCall.addEventListener(VoxImplant.CallEvents.Connected, () => {
       console.warn(`Call connected successfully`);
   });
   //other call event listeners
   currentCall.addEventListener(VoxImplant.CallEvents.Disconnected, () => {
       console.warn(`Call disconnected`);
   });
   currentCall.addEventListener(VoxImplant.CallEvents.Failed, (e) => {
       console.warn(`Call failed`);
   });
};

And handle the role buttons (Speaker or Listener):

speakerBtn.onclick = async () => {
   document.getElementById('endpoints').innerText = '';
   if(currentCall) {
       await currentCall.hangup();
   }
   // give the previous call a moment to end before joining again
   setTimeout(() => {
       currentCall = sdk.callConference({
           number: confNumber,
           extraHeaders: {'X-Name': myName}
       });
       // assumed: the source is cut off here; attach listeners and update the buttons
       setCall();
       setRole('speaker');
   }, 300)
};
viewerBtn.onclick = async () => {
   if(currentCall) {
       document.getElementById('endpoints').innerText = '';
       await currentCall.hangup();
   }
   setTimeout(() => {
       currentCall = sdk.joinAsViewer(confNumber);
       // assumed: the source is cut off here; attach listeners and update the buttons
       setCall();
       setRole('viewer');
   }, 300)
};
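The handlers above wait a fixed 300 ms between hanging up and joining again. A possible alternative, sketched here and not taken from the article, is to start the next call only after the old one reports that it has disconnected. hangupThen is a hypothetical helper; it assumes the call object offers addEventListener, removeEventListener, hangup, and state, and that the event name passed in (VoxImplant.CallEvents.Disconnected in the browser) fires once the call has ended.

```javascript
// Hypothetical helper: hang up `call`, then run `next` once the call
// reports that it has disconnected. If there is no call, or it has
// already ended, `next` runs immediately.
function hangupThen(call, disconnectedEvent, next) {
  if (!call || call.state() === 'ENDED') {
    next();
    return;
  }
  const onDisconnected = () => {
    // one-shot listener: detach before continuing
    call.removeEventListener(disconnectedEvent, onDisconnected);
    next();
  };
  call.addEventListener(disconnectedEvent, onDisconnected);
  call.hangup();
}
```

A role button could then pass the code that starts the new call as the next callback, so the delay matches however long the hang-up actually takes.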

Done. Now we have a working room with two roles (Speaker and Listener), simple button controls, and a list of active speakers. All that's left to turn it into a complete app is to build a decent interface and add user and room management with a database and search.

And do not forget that the functionality of the Voximplant engine goes far beyond this: you can schedule a voice room to be recorded if you're unable to join due to a scheduling conflict, you can add text messaging if you need it, and you can even create interactive voice menus with speech synthesis and recognition, which can be lots of fun.
