Creating A Fixed Place Spatial Environment for Video Conferencing

#javascript #spatialaudio #tutorial #webdev

Being in a virtual meeting where everyone's audio comes at you at once can be a jarring and awkward experience: audio tracks overlap and speakers become hard to tell apart. Spatial audio, also known as 3D audio, is a more natural solution to this problem. Traditional non-spatial audio is subject to the walkie-talkie effect, where sound is flattened and delivered over one or two channels, so every voice seems to arrive from the same place. Spatial audio corrects this limitation by placing sound around you, creating an environment where conversations blend as naturally as they would in the physical world.

In this blog post, we'll show how you can set up a 2D virtual video conference where participants speak from fixed spatial positions using Dolby.io Spatial Audio and the Dolby.io Web SDK.

Account Setup

To get started creating a static spatial audio scene, you first need to sign up for a free Dolby.io account. The free tier of Dolby.io gives you trial credit and doesn't require a credit card.

Once you have signed up and logged in, scroll down to the "applications" section and create a new app titled "Static_spatial". After you name the application, the page opens a list of API keys; take note of the Communications API Consumer Key and Consumer Secret.

Dolby.io Web SDK

With your account set up and your API keys on hand, we can get started with creating a static spatial video conference using the Dolby.io Web SDK.

If you haven't worked with our Web SDK before, I recommend first building a basic web application by following our Web SDK Getting Started guide. If you are already familiar with the guide, we can start by cloning the Fixed Place Spatial Demo repository. This project builds on the original Web SDK Getting Started guide, adding user interface adjustments for a spatial experience. Whilst this blog mainly focuses on implementing a fixed spatial audio experience, all of the code is included in the repository in case you want to review the interface changes as well.

Getting Started with a Spatial Environment

Once you have cloned the Fixed Place Spatial Demo project, the next step to adding spatial audio to your app is enabling it when you create your conference. To do this, we assign an alias, which can be user-defined or static depending on the scope of your web app, and enable Dolby Voice, a feature that optimizes bandwidth utilization and suppresses background noise and audio defects in real time.

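let conferenceOptions = {
    alias: "spatialTestConf", // Can be user defined
    params: { dolbyVoice: true }, // Required for spatial audio
};

// create() returns a promise that resolves to the Conference object (run inside an async function)
const conference = await VoxeetSDK.conference.create(conferenceOptions);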

With a Dolby Voice-enabled conference created, we can now activate spatial audio by passing spatialAudio: true when joining. Note that these API calls are asynchronous: each one returns a promise, so you need to either await it or chain a .then() before relying on its result.

// Join the conference with audio and video turned off and spatial audio enabled
await VoxeetSDK.conference.join(conference, {
    constraints: { audio: false, video: false },
    spatialAudio: true,
});
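Because both calls return promises, one way to keep them in the right order is to wrap the create and join steps in a single async function. The sketch below assumes it receives the SDK's conference service (VoxeetSDK.conference in the browser) as an argument; the name startSpatialConference is hypothetical and not part of the SDK.

```javascript
// Hypothetical helper: creates a Dolby Voice conference and joins it with spatial audio.
// `conferenceService` is assumed to be VoxeetSDK.conference in the browser.
async function startSpatialConference(conferenceService, alias = "spatialTestConf") {
  // Dolby Voice must be enabled on the conference for spatial audio
  const conference = await conferenceService.create({
    alias,
    params: { dolbyVoice: true },
  });

  // Only join once create() has resolved; spatial audio is switched on at join time
  await conferenceService.join(conference, {
    constraints: { audio: false, video: false },
    spatialAudio: true,
  });

  return conference;
}
```

In the browser, after initializing the SDK and opening a session, you could call `await startSpatialConference(VoxeetSDK.conference)`.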

Setting a Spatial Scene

Part of integrating spatial audio into a web application is defining the spatial "scene": how the audio renderer interprets which direction is "forward" and which is "right".

We define these directions along x, y, and z axes. The larger a participant's "x" value, the further to the right they are heard; the smaller their "y" value (that is, the closer they are to the top of the screen), the further in front they are heard. The third direction, "z", is irrelevant here because our conference is laid out on a two-dimensional plane and we never give anyone a "z" position. A three-dimensional project, however, would also have to place participants along the "up" axis, "z".

// Negative Y axis is heard in the forward direction, so tell the SDK forward is -1 Y
const forward = { x: 0, y: -1, z: 0 };

// The up axis is unimportant in this case: we never provide a Z position,
// so we can set it to either Z = +1 or Z = -1
const up = { x: 0, y: 0, z: 1 };

// Positive X axis is heard in the right-hand direction, so tell the SDK right is +1 X
const right = { x: 1, y: 0, z: 0 };
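To make these axis conventions concrete, the sketch below projects the offset between a listener and a speaker onto the forward and right vectors defined above. It only illustrates what the vectors mean; it is not how the Dolby.io renderer works internally, and the helper name describeDirection is hypothetical.

```javascript
// Direction vectors repeated from the scene above so this sketch runs on its own
const forward = { x: 0, y: -1, z: 0 };
const right = { x: 1, y: 0, z: 0 };

// Dot product of two 3D vectors
const dot = (a, b) => a.x * b.x + a.y * b.y + a.z * b.z;

// Hypothetical helper: describes where `speaker` sits relative to `listener`
// (positions in app units). A positive right component means "to the right",
// a positive forward component means "in front".
function describeDirection(listener, speaker) {
  const offset = {
    x: speaker.x - listener.x,
    y: speaker.y - listener.y,
    z: speaker.z - listener.z,
  };
  const r = dot(offset, right);
  const f = dot(offset, forward);
  return {
    side: r > 0 ? "right" : r < 0 ? "left" : "center",
    depth: f > 0 ? "front" : f < 0 ? "behind" : "level",
  };
}

// Listener in the center cell, speaker in the top-right cell
console.log(describeDirection({ x: 2, y: 2, z: 0 }, { x: 3, y: 1, z: 0 }));
// → { side: 'right', depth: 'front' }
```

Because forward points along negative Y, someone placed higher on the screen than you ends up with a positive forward component and is heard in front.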

In addition to directions, we also define a scale that mimics the physical world. In real life, the hearing limit is the furthest distance at which you can still hear someone. Whilst a variety of factors influence this limit in the real world, in the Dolby.io virtual world it is capped at 100 meters, so a person more than 100 meters away isn't heard. This raises a question, though: what is a meter in the virtual space? That is what the "scale" parameter defines: how your application's own units convert into meters.

// Scale for the Z axis doesn't matter as we never provide a Z position, so set it to 1.
// We set the X and Y scale to 0.1 (a 1:10 ratio), so moving one unit in the virtual world
// changes what we hear as if we had moved 10 meters in the physical world.
const scale = {
    x: 0.1,
    y: 0.1,
    z: 1,
};

For the purposes of our virtual conference, we want the scale to follow a 1:10 ratio, meaning that a guest assigned an "x" position 5 units greater than yours would sound 50 meters further away in the "x" direction.
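Defining the vectors and the scale is only part of the job: the SDK still has to receive them. The Web SDK exposes VoxeetSDK.conference.setSpatialEnvironment(scale, forward, up, right) for this; here we assume it is called once, after joining the conference with spatial audio enabled. The sketch wraps that call in a hypothetical applySpatialScene helper that takes the conference service as an argument, which also makes it easy to check with a stand-in object.

```javascript
// Hypothetical helper: applies the fixed 2D scene to the SDK.
// `conferenceService` is assumed to be VoxeetSDK.conference in the browser.
function applySpatialScene(conferenceService) {
  // 1 app unit = 10 meters on X and Y; Z is unused
  const scale = { x: 0.1, y: 0.1, z: 1 };
  // Smaller Y (top of the screen) is heard in front
  const forward = { x: 0, y: -1, z: 0 };
  // Never used for positioning in 2D, but the method still expects an up vector
  const up = { x: 0, y: 0, z: 1 };
  // Larger X is heard to the right
  const right = { x: 1, y: 0, z: 0 };

  conferenceService.setSpatialEnvironment(scale, forward, up, right);
  return { scale, forward, up, right };
}
```

In the browser this would be `applySpatialScene(VoxeetSDK.conference);`. Returning the values is simply a convenience for debugging or for displaying them in the UI.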

Setting Spatial Position

With the scale set and spatial audio enabled, we now need to make sure everyone is assigned a spatial location as they join.

How each person's spatial location is chosen depends on the layout of your web app. In our example code, we use a 3x3 square grid, allowing a maximum of 9 participants to join the web conference.

To assign spatial locations appropriately, we must track who joined and when. One way to do this is to define a posList array made up of 9 entries, each holding a participant ID (initially undefined) followed by that cell's "x" and "y" position. With this list in place, we next assign spatial positions to attendees as they arrive, filling the grid left to right, top to bottom.

// Each entry is [participantId, x, y], ordered left to right, top to bottom
let posList = [
    [undefined, 1, 1],
    [undefined, 2, 1],
    [undefined, 3, 1],
    [undefined, 1, 2],
    [undefined, 2, 2],
    [undefined, 3, 2],
    [undefined, 1, 3],
    [undefined, 2, 3],
    [undefined, 3, 3],
];
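Typing the nine entries by hand makes it easy to slip out of left-to-right, top-to-bottom order. As an alternative, a small helper can generate the list for any square grid. The name buildPosList is hypothetical; the [participantId, x, y] layout matches the list above.

```javascript
// Hypothetical helper: builds [participantId, x, y] entries for a size x size grid.
// The inner loop walks left to right (x), the outer loop walks rows top to bottom (y).
function buildPosList(size) {
  const list = [];
  for (let y = 1; y <= size; y++) {
    for (let x = 1; x <= size; x++) {
      list.push([undefined, x, y]);
    }
  }
  return list;
}

const grid = buildPosList(3);
console.log(grid.map(([, x, y]) => `(${x},${y})`).join(" "));
// → (1,1) (2,1) (3,1) (1,2) (2,2) (3,2) (1,3) (2,3) (3,3)
```

`let posList = buildPosList(3);` would then produce the same nine cells, and changing the argument resizes the grid.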

There are various ways to associate a particular participant with a spatial location. For example, our documentation maps each audio location to the center of the participant's video. Here we take a different approach: a for loop iterates over posList and gives each newly joined participant the first cell that doesn't have a participant ID yet. For example, the first person would be assigned [personOneID, 1, 1], the first square along the top row, and would sound 10 meters away in the x direction to someone assigned the second spatial position, [personTwoID, 2, 1].
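The distances in this example follow directly from the scale: dividing an offset in app units by the scale of 0.1 gives meters. The sketch below, using hypothetical helper names, computes how far apart two grid cells sound and checks the result against the 100-meter hearing limit described earlier. Like the scene, it ignores Z.

```javascript
// Scale from the scene: 0.1 app units per meter, i.e. 1 unit = 10 meters
const scale = { x: 0.1, y: 0.1 };
const HEARING_LIMIT_METERS = 100;

// Hypothetical helper: distance in meters between two [id, x, y] grid entries.
// Each axis offset is divided by that axis's scale to convert units to meters.
function distanceInMeters(a, b) {
  const dxMeters = (b[1] - a[1]) / scale.x;
  const dyMeters = (b[2] - a[2]) / scale.y;
  return Math.hypot(dxMeters, dyMeters);
}

const first = ["personOneID", 1, 1];
const second = ["personTwoID", 2, 1];
const corner = ["personNineID", 3, 3];

console.log(distanceInMeters(first, second)); // ≈ 10
console.log(distanceInMeters(first, corner)); // ≈ 28.28, the widest gap on the 3x3 grid
console.log(distanceInMeters(first, corner) < HEARING_LIMIT_METERS); // true
```

Since even opposite corners are only about 28 meters apart, every participant on the 3x3 grid stays well inside the 100-meter limit.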

To put this into practice, we define a setSpatialPosition helper. It takes in a newly joined participant, assigns them to the next available cell, and passes that cell's position to the SDK's VoxeetSDK.conference.setSpatialPosition method. This means the first person is assigned the top-left square, the second person the top-center square, the third person the top-right square, and so on.


// Assigns a grid cell to a participant as they join and passes it to the SDK
const setSpatialPosition = (participant) => {
    // Default position, used only if all 9 cells are already taken
    let spatialPosition = { x: 0, y: 0, z: 0 };

    // Loop over posList in join order
    for (let i = 0; i < posList.length; i++) {
        // Take the cell if it is free, or reuse it if it already belongs to this participant
        if (!posList[i][0] || participant.id === posList[i][0]) {
            posList[i][0] = participant.id;

            // Spatial position based on join order
            spatialPosition = {
                x: posList[i][1],
                y: posList[i][2],
                z: 0, // Only 2D, so "z" is never changed
            };

            break;
        }
    }
    VoxeetSDK.conference.setSpatialPosition(participant, spatialPosition);
};
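The helper still needs to run whenever someone joins. One option, sketched below as an assumption rather than necessarily how the demo repository wires it, is to call it from the conference service's streamAdded event, whose handler receives the participant and their stream. The name placeParticipantsOnJoin is hypothetical; the function takes the conference service and the placement function as arguments so it can be exercised with stand-ins.

```javascript
// Hypothetical wiring: place each participant on the grid when their stream arrives.
// `conferenceService` is assumed to be VoxeetSDK.conference and `place` the
// setSpatialPosition helper above.
function placeParticipantsOnJoin(conferenceService, place) {
  // The handler is called with (participant, stream); only the participant is needed here
  conferenceService.on("streamAdded", (participant) => {
    // The helper reuses a participant's existing cell, so a repeated event
    // for the same person does not move them to a new square
    place(participant);
  });
}
```

In the browser this would be registered once, before joining: `placeParticipantsOnJoin(VoxeetSDK.conference, setSpatialPosition);`.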

In the sample code provided, we include a banner that displays the user's spatial position in terms of "x, y, and z".

Try it out yourself

Now that we have the theory out of the way, we can boot up the sample app and try it out. The first step to getting the spatial app up and running is to update the last two lines of the "scripts/client.js" file with your Communications API Consumer Key and Consumer Secret.

//Update scripts/client.js with your API Keys
main(
    "Insert your Communications APIs Consumer Key here",
    "Insert your Communications APIs Consumer Secret here"
);

Now, simply open the file "index.html" in your web browser and start playing with the application.

It is important to note that hard-coding your API keys into the client.js file is for testing only and should not be used in production, as the keys are not secure and could be stolen. Instead, we recommend using a token to initialize the SDK. For more information, see Initializing or learn about security best practices.
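A minimal sketch of the token-based approach follows. It assumes you run your own backend endpoint that exchanges the Consumer Key and Secret for an access token server-side, so the secret never reaches the browser; the /api/token path and its { access_token } response shape are hypothetical. The SDK's initializeToken method takes that token plus a callback it can use to obtain a fresh one when the token needs refreshing.

```javascript
// Hypothetical endpoint on your own server; it keeps the Consumer Secret server-side
// and is assumed to respond with JSON shaped like { "access_token": "..." }
const TOKEN_URL = "/api/token";

// Fetches a fresh client access token from the backend
async function fetchAccessToken(fetchFn) {
  const response = await fetchFn(TOKEN_URL);
  if (!response.ok) {
    throw new Error(`Token request failed with status ${response.status}`);
  }
  const body = await response.json();
  return body.access_token;
}

// `sdk` is assumed to be the global VoxeetSDK object; `fetchFn` defaults to the browser's fetch
async function initializeWithToken(sdk, fetchFn = fetch) {
  const accessToken = await fetchAccessToken(fetchFn);
  // The callback lets the SDK request a new token when the current one needs refreshing
  sdk.initializeToken(accessToken, () => fetchAccessToken(fetchFn));
}
```

In the browser, `await initializeWithToken(VoxeetSDK)` would take the place of initializing the SDK with the raw Consumer Key and Secret.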

Next Steps

Spatial audio opens the door to a range of possibilities when building your web conferencing app, such as virtual events, meeting spaces, and collaboration tools. For this blog we kept it simple with a fixed-place example; however, the same tools work just as well for building a dynamically updating web app that adjusts spatial audio as users move around a 2D or 3D environment.

Whatever your next spatial project is, the Dolby.io team is here to help. Connect with us here or check out a few of our helpful resources to dive deeper into the awesome world of communication and spatial audio: