
Rehema for Zoom


Build with Me! - Getting Started with the Video SDK

Intro

At Zoom, we're known for the consistently high-quality media we provide to our users. As a platform for customers and developers alike, not only can you rely on that quality for your meetings, but you can also build on our infrastructure to bring that same quality into your own apps.

How, you ask? Enter the Zoom Video SDK. With the Video SDK, you can pull high-quality audio and video into your application while customizing it to fit your brand and voice. Want to live stream your at-home yoga class? You got it. Build out a virtual classroom? You can do that, too. With such an array of possibilities, the goal of this article is simply to get you started.

In this post, we'll go over how I built a simplified version of the published Video SDK Web sample app, using React. We'll cover:

  • Tech stack used for the project
  • Generating our JWT & Storing our Variables
  • Building Out Core Functionality in index.js & App.js
  • Video, Audio, & Screen-Share Functionality

Again, the goal of this demo is to provide an easy-to-follow framework as a starting point for working with the Video SDK. For a more complex build of this application, feel free to clone and review the published Video SDK Web sample application, on which this project is based.

Tech Stack Overview

For this project, we’ll be using React to develop our frontend, taking advantage of Context API to pass and share data relevant to our created Zoom client. In addition, our project will utilize react-router-dom for dynamic routing throughout.

This project makes use of Ant Design for styling, but vanilla CSS or any UI library (Material UI, Bootstrap, etc.) can be used. Additionally, our project requires encryption for JWT generation; for this, we'll use jsrsasign as our cryptography library.

To build out my API on the backend, I used Node.js with Express. I also (of course) used the @zoom/videosdk package to create, manage, and join sessions.

Lastly, this project was built using Create React App. If preferred, manually configuring your own webpack build or using Parcel to create your project are effective options as well.

Now that we’ve reviewed the packages being used in this project, let’s get to building the application! The first thing we want to do is safely create and store variables.

Generating our JWT on the Server-Side

When building with the Video SDK, a JWT signature is always required to initialize or join a session. To generate and use your token within your application, it's best practice to write this functionality on the server side to keep your credentials safe. I did this using Express on top of Node.js, making use of middleware to perform the logic for creating my token.

I first created my server.js file, where I added a variable, router, set equal to an instance of express.Router() to manage my routes. After properly configuring my server.js file (pulling in requirements, setting up my port, error handling, etc.), I created a POST route to send my generated token to the frontend.

server/server.js

router.post('/generate', middleware.generateToken, (req, res) => {
    res.status(200).send(res.locals.token)
})
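For context, here's a minimal sketch of how the rest of server.js might be configured around that route. The CORS setup, mount path, and error handler here are my assumptions rather than this project's exact code:

const express = require('express');
const cors = require('cors');
const middleware = require('./middleware');

const app = express();
const router = express.Router();

app.use(cors());          // let the React dev server (a different origin) call this API
app.use(express.json());  // parse JSON request bodies

router.post('/generate', middleware.generateToken, (req, res) => {
    res.status(200).send(res.locals.token)
})

app.use('/', router); // exposes the endpoint at POST /generate

// catch-all handler for errors passed through next()
app.use((err, req, res, next) => {
    console.error(err);
    res.status(500).send('Internal Server Error');
});

app.listen(3001, () => console.log('Token server listening on port 3001'));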

You'll see an added function in my POST route, 'middleware.generateToken', which was created in a separate file and imported in. That's where I handled token generation. I destructured my request body to access my meeting arguments for use in my JWT payload. After creating my signature, I saved it to my res.locals object, which is sent to the frontend by my POST route. The finished middleware function can be seen below.

server/middleware.js

const KJUR = require('jsrsasign')
require('dotenv').config();

const middleware = {};

middleware.generateToken = async(req, res, next) => {
    console.log(req.body)
    try {
        let signature = '';
        const iat = Math.round(new Date().getTime() / 1000);
        const exp = iat + 60 * 60 * 2;

        const oHeader = { alg: 'HS256', typ: 'JWT' };
        // note: these keys match the meetingArgs object sent from the frontend
        const { topic, password, userIdentity, sessionKey, role } = req.body;
        const sdkKey = process.env.SDK_KEY;
        const sdkSecret = process.env.SDK_SECRET;
        const oPayload = {
            app_key: sdkKey,
            iat,
            exp,
            tpc: topic,
            pwd: password,
            user_identity: userIdentity,
            session_key: sessionKey,
            role_type: role,
        };
        const sHeader = JSON.stringify(oHeader);
        const sPayload = JSON.stringify(oPayload);
        signature = KJUR.jws.JWS.sign('HS256', sHeader, sPayload, sdkSecret);
        res.locals.token = signature;
        return next();
    }
    catch(err) {
        return next({ err });
    }
}

module.exports = middleware;

(The logic used for token generation can also be found in our documentation, here).

Safely Storing Your Variables

You may have noticed in the code snippet above that the values for the sdkKey and sdkSecret are read in using process.env. This keeps my private data (SDK key and secret) out of the codebase. I created a '.env' file to store both credentials, and used 'dotenv' to read them into my middleware function.
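As a sketch, the .env file itself is just key-value pairs; the values below are placeholders, not real credentials:

server/.env

SDK_KEY=your_video_sdk_key
SDK_SECRET=your_video_sdk_secret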

Building out index.js

After building out my backend functionality for token generation, I jumped into my frontend and started with my entry-point file, index.js. Here, the first thing I did was pull in all necessary imports, including local files for styling/component rendering and all my React necessities (react, react-dom, the hooks used, etc.). I've also imported the ZoomVideo object from the @zoom/videosdk package, which will be used to create our client.

I also imported my devConfig file. This is where I created an object to house all the necessary parameters (outside of my SDK key and secret) for initializing and joining a Video SDK session. It can be seen below:

client/src/dev.js

export const devConfig = {
  topic: 'test topic',
  name: 'Chrome',
  password: 'pass', 
  role: 1, 
};

Lastly, I imported a file called ZoomContext to make use of the Context API. In this file, I created a context to easily pass values to nested components.

Imports in client/src/index.js

import React from 'react';
import ReactDOM from 'react-dom/client';
import './index.css';
import App from './App';
import { devConfig } from './config/dev';
import { generateVideoToken } from './tools/tools.jsx'
import ZoomVideo from '@zoom/videosdk';
import ZoomContext from './context/zoom-context';

client/src/context/zoom-context.js

import { VideoClient } from '@zoom/videosdk';
import React from 'react';

export default React.createContext(VideoClient);

After my imports, the next thing I did was build out my meeting arguments object, which is used to initialize and join sessions. To do this, I first declared the variable 'meetingArgs' and assigned it to a copy of my imported ‘devConfig’ object.

From there, I needed to generate and add my JWT signature to the meeting arguments object, contingent on it already containing the topic. To get this done, I first created the function 'getToken', which makes a fetch call to my backend API (described in the previous section).

client/src/index.js

let meetingArgs = { ...devConfig };

const getToken = async (options) => {
  // hit the '/generate' route defined on the backend; the token comes
  // back as a raw string, so we read it with .text() rather than .json()
  let result;
  result = await fetch('http://localhost:3001/generate', options);
  result = await result.text();
  return result;
}

Next, I called my new function inside a conditional that checks my meeting arguments object for the necessary elements. Inside the conditional, before calling the function, I created my requestOptions object to send to my backend route (this is necessary when making a POST request).

As seen, the body of the request is my meeting arguments object (converted to a JSON string). After this, while still in my conditional, I called my 'getToken' function, passing in my 'requestOptions' object as the argument. Since this function receives and returns my generated token from the backend, I set its output equal to my meeting arguments' signature value.

client/src/index.js

// only generate a signature if we don't already have one and a topic is set
// (the SDK secret lives on the server, so it isn't checked here)
if (!meetingArgs.signature && meetingArgs.topic) {
  const requestOptions = {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(meetingArgs)
  };
  getToken(requestOptions).then((res) => {
    meetingArgs.signature = res;
  });
}

My next step was creating my Video SDK client, which is used to manage my sessions. This is done by simply calling the createClient method on the imported ZoomVideo object.

client/src/index.js

const client = ZoomVideo.createClient();

Now that I have my meeting arguments object properly configured and my client created and stored, I used React DOM to render my application. Inside my render function, I passed in my imported ‘App’ component, adding a context wrapper around it. Using the provider method (ZoomContext.Provider), I passed in my created client as the provider’s value so that it is now accessible in my App Component. Lastly, I prop-drilled down my meetingArgs object.

client/src/index.js

const root = ReactDOM.createRoot(document.getElementById('root'));
root.render(
  <React.StrictMode>
    <ZoomContext.Provider value={client}>
      <App meetingArgs={meetingArgs} />
    </ZoomContext.Provider>
  </React.StrictMode>
);


Building out App.js

In App.js, I built out the necessary functionality for when the application first launches. This includes pulling in my client, and joining and starting a session. To keep things efficient, I made use of a few different React hooks:

  • useState to manage different pieces of simple state throughout the application, such as loading text shown on the screen
  • useEffect to mimic componentDidMount and perform different side effects, such as initializing a session whenever there’s a change to meeting arguments
  • useContext to access values passed down in provider wrappers. Here, it was used to create and store my Video SDK client

After making my necessary imports (react hooks, react-router-dom, styling components, context files, etc.) and moving into my functional component, I took some preliminary steps to give myself a good starting framework. The first one was some object destructuring to access the values I prop-drilled down in index.js. This gives me easy access to those meeting arguments for future use.

client/src/app.js

const App = (props) => {
  const {
    meetingArgs: { sdkKey, topic, signature, name, password }
  } = props;

Next, I made use of the useState hook, as mentioned before, to set up pieces of state to appropriately render different parts of my application.

client/src/app.js

  const [loading, setIsLoading] = useState(true);
  const [loadingText, setLoadingText] = useState(' ');
  const [mediaStream, setMediaStream] = useState();
  const [status, setStatus] = useState(false);

The last preliminary step was using the useContext hook to grab that value I passed through in index.js, and create my client here in my App Component.

client/src/app.js

 const client = useContext(ZoomContext);

With those things out of the way, I dove into my first function, init, which is an asynchronous function that makes use of the ‘try…catch’ block for error handling. Inside init, I called client.init(), which is a Zoom Video SDK built-in method that initializes a session.

I passed in the two required parameters to this method: the language being used (English, in this case) and the dependent asset deployment path ('CDN', in this case; a more detailed explanation can be found in our reference guide).
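In code, that initialization call is a single line (it appears again inside the full useEffect shown later):

await client.init('en-US', 'CDN');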

Inside my 'try' block, I wrote the logic for joining the session (after a successful session initialization). The join method takes in four necessary parameters: the user's meeting topic, signature, name, and password.

After joining, I needed to set my mediaStream state. I created a variable 'stream' and set it equal to the output of calling the getMediaStream method on my client. Now I'm able to use 'stream' to manipulate any incoming media, such as audio and video. This is the variable I used to set my mediaStream state.

The last thing I did inside my 'try' block was set my loading state to false, since we're finished initializing and joining.

client/src/app.js

    try {
      setLoadingText('Joining Session...')
      await client.join(topic, signature, name, password)
      const stream = client.getMediaStream();
      setMediaStream(stream);
      setIsLoading(false);
    }

I wrapped all of this logic for initializing and joining a session inside a useEffect hook. The side effect performed is the execution of the init function, and its clean-up destroys the client, effectively ending any sessions. To do the destroying, I used the destroyClient method on the imported ZoomVideo object.

My dependency array for this useEffect includes the following elements: sdkKey, signature, client, topic, name, password. Put together, this means the init() function runs as a side effect whenever any of those elements change, and before the effect re-runs (or the component unmounts), the client is destroyed. The complete functionality is shown below.

client/src/app.js

useEffect(() => {
  const init = async () => {
    await client.init('en-US', 'CDN');

    try {
      setLoadingText('Joining Session...');
      await client.join(topic, signature, name, password);
      const stream = client.getMediaStream();
      setMediaStream(stream);
      setIsLoading(false);
    }
    catch (err) {
      console.log('Error Joining Meeting', err);
      setIsLoading(false);
      message.error(err.reason);
    }
  };
  init();
  return () => {
    ZoomVideo.destroyClient();
  };
}, [sdkKey, signature, client, topic, name, password]);

The next piece of my App component was adding in my UI and component navigation (making use of react-router-dom for the latter). Really, the only UI actually written within this component is a conditional 'loading' statement. Otherwise, the UI displayed after loading comes from the rendered 'Home' component, since its path is listed as '/' (this makes it the default page for the loaded application; you can set whatever component you've created as your default page).

A key aspect of App.js's return statement is the use of the Context API to pass down a value to the nested components. I made use of the MediaContext I created and imported, passing in my mediaStream variable as the provider's value. This is what gives me access to incoming media in the nested component used for video.
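(The MediaContext file itself isn't shown in this walkthrough; it mirrors the zoom-context.js pattern from earlier. Here's a minimal sketch, where the exact file path and default value are my assumptions:)

client/src/context/media-context.js

import React from 'react';

// context for sharing the media stream with nested components;
// App.js supplies the real stream via MediaContext.Provider
export default React.createContext(null);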

client/src/app.js

return (
  <div className="App">
    {loading && <LoadingLayout content={loadingText} />}
    {!loading && (
      <MediaContext.Provider value={mediaStream}>
        <Router>
          <Routes>
            <Route path="/" element={<Home props={props} status={status} onLeaveOrJoin={onLeaveOrJoin} />} />
            <Route path="/video" element={<VideoContainer />} />
          </Routes>
        </Router>
      </MediaContext.Provider>
    )}
  </div>
);

With that, let’s go ahead and move on to the Video Component!

(Note: While the home page is a component, it's not essential to understanding the Video SDK, so we're skipping over it.)
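That said, if you'd like a placeholder so the '/' route has something to render, a minimal, hypothetical Home component could look like the sketch below. This is an assumption for illustration, not the actual component from this project:

client/src/feature/home.js

import React from 'react';
import { Link } from 'react-router-dom';

// a hypothetical minimal Home component: not this project's actual code
const Home = () => (
  <div className="home">
    <h1>Video SDK Demo</h1>
    {/* navigate to the video feature rendered at the /video route */}
    <Link to="/video">Join Session</Link>
  </div>
);

export default Home;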

Video Component

Accessing Video, Audio, and Screen Sharing

Alright, now it’s time to really use the magic of the Video SDK. Inside of my video component, after pulling in my imports, the first thing I did was create some pieces of state that helped me make appropriate use of my features:

  • videoStarted, to know whether or not the camera is on
  • audioStarted, to know whether or not the microphone is on
  • isMuted, to know whether or not the user is muted
  • isShareScreen, to know whether or not there’s an active screen share happening
  • isSAB, to know whether or not SharedArrayBuffer is enabled (discussed in further detail later)

Next, I created my client and mediaStream variables through the use of the useContext hook, accessing those passed-down values ('client' passed down in index.js, and 'mediaStream' passed down in my App component).
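Put together, the top of the component might look like this sketch; the import paths are my assumptions, based on the context files created earlier:

client/src/feature/video.js

import React, { useState, useContext, useCallback } from 'react';
import ZoomContext from '../context/zoom-context';
import MediaContext from '../context/media-context';

const VideoContainer = () => {
  // feature state, matching the list above
  const [videoStarted, setVideoStarted] = useState(false);
  const [audioStarted, setAudioStarted] = useState(false);
  const [isMuted, setIsMuted] = useState(false);
  const [isShareScreen, setIsShareScreen] = useState(false);
  const [isSAB, setIsSAB] = useState(false);

  // the client passed down in index.js and the stream passed down in App.js
  const client = useContext(ZoomContext);
  const mediaStream = useContext(MediaContext);

  // ...feature handlers and the returned UI follow in the sections below
};

export default VideoContainer;

With these preliminary steps done, let's look at what I did for my video functionality.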

Video Functionality

To get started here, I created an asynchronous function to handle turning the camera on and off, checking my 'videoStarted' state before anything else.

To turn on and access my video as self-view, I used the startVideo method on my mediaStream variable (remember, mediaStream was created in App.js by calling the getMediaStream method, then passed down through the MediaContext provider). The parameter I pass to startVideo(), though, depends on whether or not the browser has SharedArrayBuffer enabled.

Let’s look at the difference in how we start our video based on this factor:

  • When working to achieve self-view video with the Video SDK without SharedArrayBuffer, you must pass your HTML element, a video tag, through to startVideo().

client/src/feature/video.js

 if (!videoStarted) {
   if (!!window.chrome && !(typeof SharedArrayBuffer === 'function')) {
     setIsSAB(false);
     await mediaStream.startVideo({ videoElement: document.querySelector('#self-view-video') })
  • When working to do the same with SharedArrayBuffer enabled, the 'startVideo' method won't take any parameters. Instead, they'll be passed into an additional method, renderVideo, which is called on your mediaStream variable immediately following the invocation of startVideo().

  • Further, the HTML element passed through to renderVideo() will be a canvas tag, in addition to several other parameters that determine dimensions: the current user ID, width, height, x coordinate, y coordinate, and video quality.

client/src/feature/video.js

} else {
  setIsSAB(true);
  await mediaStream.startVideo();
  mediaStream.renderVideo(document.querySelector('#self-view-canvas'), client.getCurrentUserInfo().userId, 1920, 1080, 0, 0, 3)
}

To stop my video, I simply invoked the stopVideo method, followed by the stopRenderVideo method if necessary (based on SharedArrayBuffer). If we do have SharedArrayBuffer enabled and are using stopRenderVideo, note that it takes just two parameters: the canvas element and the current user ID.

My video functionality is wrapped in a useCallback hook so that the functionality is memoized, and its dependency array includes the following: mediaStream, videoStarted, isSAB, client.

The full functionality is shown below, for starting and stopping video, with and without SharedArrayBuffer enabled.

client/src/feature/video.js

const startVideoButton = useCallback(async () => {
  if (!videoStarted) {
    if (!!window.chrome && !(typeof SharedArrayBuffer === 'function')) {
      setIsSAB(false);
      await mediaStream.startVideo({ videoElement: document.querySelector('#self-view-video') });
    } else {
      setIsSAB(true);
      await mediaStream.startVideo();
      mediaStream.renderVideo(
        document.querySelector('#self-view-canvas'),
        client.getCurrentUserInfo().userId,
        1920, 1080, 0, 0, 3
      );
    }
    setVideoStarted(true);
  } else {
    await mediaStream.stopVideo();
    if (isSAB) {
      mediaStream.stopRenderVideo(
        document.querySelector('#self-view-canvas'),
        client.getCurrentUserInfo().userId
      );
    }
    setVideoStarted(false);
  }
}, [mediaStream, videoStarted, client, isSAB]);

Audio Functionality

Starting our audio is a bit simpler than working with video. In this app, I wrote functionality to start audio and to mute and unmute it, all controlled by one button.

The methods for starting, muting, and unmuting audio require no parameters, making this fairly simple functionality to write. I used the 'unmuteAudio' method to unmute, 'muteAudio' to mute, and 'startAudio' to start audio and prompt the browser to ask for microphone permission. A code snippet is shown below.

client/src/feature/video.js

const startAudioButton = useCallback(async () => {
  if (audioStarted) {
    if (isMuted) {
      await mediaStream.unmuteAudio();
      setIsMuted(false);
    } else {
      await mediaStream.muteAudio();
      setIsMuted(true);
    }
  } else {
    await mediaStream.startAudio();
    setAudioStarted(true);
  }
}, [mediaStream, audioStarted, isMuted]);

Also wrapped in a useCallback hook, my audio functionality is only re-created when there's a change to my audio state, muted state, or mediaStream variable, as shown in the dependency array above.

(Note: While not shown here, stopping audio is done with the 'stopAudio' method. You can incorporate this into the same button used for muting/unmuting/starting audio using a menu option, or separate the stop/start audio and mute/unmute audio buttons. For a more in-depth example, please refer to the published [Video SDK web sample application](https://github.com/zoom/videosdk-web-sample), as previously mentioned.)
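As an illustration, a standalone stop-audio handler might look like the sketch below; the handler name and button wiring are assumptions:

// a sketch of a standalone stop-audio handler; not part of this demo's single-button flow
const stopAudioButton = useCallback(async () => {
  if (audioStarted) {
    await mediaStream.stopAudio(); // leaves computer audio entirely
    setAudioStarted(false);
    setIsMuted(false);
  }
}, [mediaStream, audioStarted]);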

Screen Share Functionality

The last feature I added was screen sharing. The first piece of logic I wrote stops screen sharing after checking my 'isShareScreen' state. I did this with the 'stopShareScreen' method, called on my mediaStream variable.

client/src/feature/video.js

const shareScreen = useCallback(async () => {
  if (isShareScreen) {
    await mediaStream.stopShareScreen();
    setIsShareScreen(false)
  }

Next, to start screen sharing, I first needed to check whether my browser supports the WebCodecs API. Similar to working with SharedArrayBuffer for the video functionality, the HTML element used for displaying the shared screen depends on whether this browser feature is supported. If WebCodecs is supported, we'll display on a video tag; otherwise, we'll display on a canvas tag. The code snippet below shows the full functionality.

client/src/feature/video.js

const shareScreen = useCallback(async () => {
  if (isShareScreen) {
    await mediaStream.stopShareScreen();
    setIsShareScreen(false);
  } else {
    if (isSupportWebCodecs()) {
      await mediaStream.startShareScreen(document.querySelector('#share-video'));
    } else {
      await mediaStream.startShareScreen(document.querySelector('#share-canvas'));
    }
    setIsShareScreen(true);
  }
}, [isShareScreen, mediaStream]);

You'll see the dependency array for my screen-share functionality includes my isShareScreen state and my mediaStream variable.
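One note: the isSupportWebCodecs helper used above isn't defined in these snippets. In the published sample app it lives in a utility file; a minimal version of the check might look like this sketch (my assumed implementation, which may differ from the sample's):

// a sketch of a WebCodecs support check; the published sample's version may differ
export function isSupportWebCodecs() {
  return typeof window.MediaStreamTrackProcessor === 'function';
}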

Looking at Our UI

Alright, we're almost to the finish line (woo-hoo!). Our last stop is a quick look at our UI. As mentioned at the beginning of this article, I used Ant Design as my styling library, so you'll see some tags specific to it throughout the code.

The first thing I did was create both my canvas element and my video element, rendering the appropriate one based on my ternary operator (notice that the dimensions given in my video tag match the dimensions passed through in my .renderVideo() method earlier).

I did the same with another ternary operator for my screen-share element. For this demo, my screen-share is displayed on an HTML element directly next to my self-view video element.

For a more complex display, similar to that of an actual Zoom meeting, take a look at the published [sample app](https://github.com/zoom/videosdk-web-sample).

client/src/feature/video.js

return (
  <div>
    {isSAB ?
      <canvas id="self-view-canvas" width="1920" height="1080"></canvas> :
      <video id="self-view-video" width="1920" height="1080"></video>
    }
    {!isSupportWebCodecs() ?
      <canvas id="share-canvas" width="1920" height="1080"></canvas> :
      <video id="share-video" width="1920" height="1080"></video>
    }

The rest of the logic here is where I created my buttons. I utilized ternary operators to ensure my buttons show the appropriate text based on the state of the feature (e.g., 'Unmute' if audio is muted). Remember, this app was built with Ant Design as its UI library, so certain styling elements are specific to that.

    <div className="video-footer">
      <Tooltip title={`${videoStarted ? 'Stop Camera' : 'Start Camera'}`}>
        <Button
          className="camera-button"
          icon={videoStarted ? <VideoCameraOutlined /> : <VideoCameraAddOutlined />}
          shape="circle"
          size="large"
          onClick={startVideoButton}
        />
      </Tooltip>

      <Tooltip title={`${!isShareScreen ? 'Share Screen' : 'Stop Sharing Screen'}`}>
        <Button
          className="camera-button"
          icon={isShareScreen ? <FullscreenOutlined /> : <FullscreenExitOutlined />}
          shape="circle"
          size="large"
          onClick={shareScreen}
        />
      </Tooltip>

      <Tooltip title={`${audioStarted ? (isMuted ? 'Unmute' : 'Mute') : 'Start Audio'}`}>
        <Button
          className="camera-button"
          icon={audioStarted ? (isMuted ? <AudioMutedOutlined /> : <AudioOutlined />) : <IconFont type="icon-headset" />}
          shape="circle"
          size="large"
          onClick={startAudioButton}
        />
      </Tooltip>
    </div>
  </div>
);

Summary

That’s it! All the functionality was achieved through some simple browser-feature checks, state management, UI manipulation, and the built-in methods from the Video SDK package. Using a convenient styling library and/or some basic CSS, you can easily render the different pieces (buttons, canvases, etc.). While I created a simple video-chat application, the possibilities of what you can build are nearly endless (check out how one of our engineers used it to create a claw machine)!

Thanks for building with me! For a video walk-through of this build, check it out on our YouTube channel!

Top comments (4)

firas

Thank you for your explanation, but in my case I am at the end of my studies, and my project is to create custom applications with a powerful collaboration suite. That includes designing and developing a web application that integrates features to optimize meetings (instant meetings, scheduled meetings, one-to-one calls, joining a Zoom meeting, and many more…). Each user can access these features by creating an account on this site.

I have read the Zoom documentation, but I’m confused about which app to use to start with. Should I use the Meeting SDK or Video SDK? Also, how can my users generate Zoom meetings without having a Zoom account?

By the way, the company I have an internship with is a Zoom partner. Please let me know if you have any advice to help me.

Rehema

Hi @firas_zoom ,

Thanks for reading! In regards to which app to use for your project, I'd suggest the Meeting SDK. It gives you access to both meetings and webinars, while the Video SDK only lets you create sessions. Further, you can integrate our APIs and webhooks with the SDK to create a majority of the features you mentioned. Please see this chart for a side-by-side comparison of the two SDKs.

Thanks,
Rehema

firas

Hi @rehema ,
Thank you for your explanations, but I would like to know how my users can generate Zoom meetings without having a Zoom account, simply by having an account in my application?

NamrataWalchale

Hello @rehema ,
Thank you for your explanation, but I'm in a bit of trouble. I have a MERN stack project in which my requirement is to create a Zoom online meeting link, then send that link to participants through WhatsApp or email. For that I purchased a Video SDK account and cloned the git repo into my existing project, but I'm having some issues: the repo's language is TypeScript, and I am unable to convert it to JavaScript (e.g., the context/MediaContext file, which is TypeScript). I have read the Zoom documentation, but I'm confused about which app to use to start with.
So can you please guide me on how to achieve my requirement?
Waiting for reply.