Posted on Aug 13, 2023 • Edited on Aug 14, 2023

Exploring Web APIs: Webcam, Screen Capture and Download with JavaScript

#webdev #javascript #beginners #tutorial

By 2025, it’s estimated that 463 exabytes of data will be created each day globally – that’s the equivalent of 212,765,957 DVDs per day!

Media capture (video, audio, and so on) is one of the contributors.

The more ways you are able to collect data from users as a developer, the more interesting and creative applications you can build:

video analysis software, video authenticators, sound analysis, editors, media capture extensions, vision applications (object detection, object tracking), and more.

In this article we will play with the idea of media on the web and develop a taste for it,

before moving on to advanced applications like video analysis and object detection.

Media Capture Fundamentals

source code: git

The usual approach is to start at the basic level: capture an image, then video, then the screen.

We will re-arrange the order to video, image, then screen. The reason will make sense soon!

Create a basic HTML project:

  src/
    app.js
  index.html

Copy and paste the following HTML starter into your index.html:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Media Fundamentals</title>
    <style>
        *, *::after, *::before{
            box-sizing: border-box;
        }

        body{
            margin: 0;
            padding: 0;
            background-color: white;
        }

      .media_display{
           width: 100%;
           height: 100vh;
           display: flex;
           justify-content: center;
           align-items: center;
           gap: 5px;
      }
      .media_snap,  .media_webcam {

        flex: 1;
        display: grid;
        place-content: center;
        gap:10px;


      }


    </style>
</head>
<body>

    <div class="media_display">
       <div class="media_snap">
          <img id="img_preview" width="400" height="400" />
          <button>snap📷</button>

       </div>
       <div class="media_webcam">
        <video id="vid_preview" width="400" height="400" controls></video>
        <button>record📹</button>
        </div>
    </div>


    <script src="./src/app.js"></script>
</body>
</html>

Navigate to app.js.

Listening for the DOMContentLoaded event is a good practice: it makes sure the document has been parsed before we query it.

In the code below we get a reference to all the necessary elements, both the buttons and the destination elements:

the img to hold the final snap, and the video element to play the webcam or display capture.

document.addEventListener("DOMContentLoaded", ()=> {
       /**
        * @type {HTMLDivElement}
        */
       const snap = document.querySelector(".media_snap")

       /**
         * @type {[HTMLImageElement, HTMLButtonElement]}
         */
        let [img, snapBtn] = snap.children

          snapBtn.onclick = (e) => {
            console.log("take img")
          }
        /**
        * @type {HTMLDivElement}
        */

        const camcoder = document.querySelector(".media_webcam")
        /**
         * @type {[HTMLVideoElement, HTMLButtonElement]}
         */
        let [vid, cambtn] = camcoder.children

          cambtn.onclick = (e) => {
             console.log("webcam")
          }

})

The way we get the elements may be unconventional, but it works and it is less code:

we get the child elements of the div container and destructure them into [img, snapBtn].

// children maintains the document order of the elements
 let [img, snapBtn] = snap.children

          snapBtn.onclick = () => {
            console.log("take img")
    }
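Destructuring works here because children is an HTMLCollection, which is iterable, and values are assigned purely by position. A minimal sketch, with a plain iterable standing in for snap.children (fakeChildren is a made-up placeholder), shows why the order of the elements in the HTML matters:

```javascript
// Stand-in for snap.children: any iterable works with array destructuring.
// (fakeChildren is a hypothetical placeholder, not a real HTMLCollection.)
const fakeChildren = new Set(["<img>", "<button>"]);

// Position decides which variable receives which element.
const [first, second] = fakeChildren;

// If the markup order changes, the same destructuring silently swaps the roles.
const [swappedFirst] = ["<button>", "<img>"];
```

If the button ever moves above the img in index.html, the variables would swap, so this shortcut is best kept for small, stable markup.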

We are going to start with the video, as stated earlier.

Capturing video

We are going to re-use most of the code from this project in future projects and articles.

Functions are the best way to encapsulate code so we can re-use it. Declare the following function at the top of app.js:

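/**
 * @param {HTMLVideoElement} el - video element that previews the stream
 * @param {MediaStreamConstraints} config - constraints passed to getUserMedia
 * @param {Function} onPlaying - called once the stream is attached
 */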
function WebcamVideo(el,config, onPlaying){

}

// will be useful later ignore for now
let streaming = false;

document.addEventListener("DOMContentLoaded", ()=> {

})

The WebcamVideo function takes three parameters:

el - the HTML video element to stream the captured video into and preview it on the page.
config - officially called constraints, an object with settings that control the capture, for example whether to capture video, audio, or both.
onPlaying - a callback function that alerts us when the stream starts playing, so we can "react".
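Constraints can say more than true or false. As a rough sketch, the helper below (buildConstraints and its option names are hypothetical, made up for this example) builds a config object using the standard width, height and facingMode constraint keys:

```javascript
// Hypothetical helper: builds a constraints object for getUserMedia.
// The option names (width, height, facingMode, audio) are this sketch's own;
// the keys inside the returned object are standard media track constraints.
function buildConstraints({ width, height, facingMode, audio = false } = {}) {
  const video = {};
  if (width) video.width = { ideal: width };
  if (height) video.height = { ideal: height };
  if (facingMode) video.facingMode = facingMode; // "user" or "environment"

  // With no video preferences, fall back to the plain boolean form.
  const hasVideoPrefs = Object.keys(video).length > 0;
  return { video: hasVideoPrefs ? video : true, audio };
}

const simple = buildConstraints();
const hd = buildConstraints({ width: 1280, height: 720, facingMode: "user" });
```

Using ideal rather than exact values lets the browser fall back to the closest resolution the camera supports instead of rejecting the request.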

To use devices on the web, we use a web API.

An API is a piece of code someone has written and exposed via endpoints, functions, interfaces, and so on.

Of course this is a generalization; APIs can encompass a broader scope.

The web provides the same through web APIs, which allow us to communicate with the browser and invoke certain behaviors, like asking for a camera or handling permissions.

Without going into all the details,

to access and use media devices we go through navigator.mediaDevices.

For example, getting the webcam:

navigator.mediaDevices.getUserMedia(config);

We are ready to implement the entire WebcamVideo :

function WebcamVideo(el,config, onPlaying){
  const hasGetUserMedia = () => !!navigator.mediaDevices?.getUserMedia;

    if(hasGetUserMedia()){
        // runs once and passes the stream to the video element
        navigator.mediaDevices.getUserMedia(config)
        .then(function(stream) {

           if(!el)
               return
           el.srcObject = stream;
           el.play();
           onPlaying()
        })
        .catch(function(err) {
           console.log("An error occurred! " + err);
        });
    }else{
        console.log("getUserMedia is not available (unsupported browser or insecure page)")
    }

}

Let's go through the code top to bottom. We first check whether the browser exposes getUserMedia at all,

using the following code, which returns true or false:

  const hasGetUserMedia = () => !!navigator.mediaDevices?.getUserMedia;

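The double exclamation marks (!!) cast whatever is returned to a boolean, and ?. avoids an error when navigator.mediaDevices is undefined, which is the case on pages that are not served over HTTPS or localhost.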
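To see the check in isolation, here is a small sketch that runs outside the browser. supportsMediaMethod is a hypothetical helper, and the nav argument is a plain object standing in for navigator:

```javascript
// supportsMediaMethod is a hypothetical helper; `nav` stands in for the
// browser's navigator so the logic can be tried outside a browser.
function supportsMediaMethod(nav, method) {
  // ?. yields undefined instead of throwing when mediaDevices is missing,
  // and !! turns undefined or a function into false or true.
  return !!nav.mediaDevices?.[method];
}

const insecurePage = {}; // no mediaDevices at all
const modernBrowser = { mediaDevices: { getUserMedia() {} } };

const onInsecurePage = supportsMediaMethod(insecurePage, "getUserMedia");
const onModernBrowser = supportsMediaMethod(modernBrowser, "getUserMedia");
const displaySupport = supportsMediaMethod(modernBrowser, "getDisplayMedia");
```

Note that this only tells us the API exists, not that a camera is plugged in; a missing device shows up later as a rejected promise.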

If the API is available, we proceed to ask for the webcam:

 if(hasGetUserMedia()){


        navigator.mediaDevices.getUserMedia(config)
        .then(function(stream) {

           if(!el)
               return
           el.srcObject = stream;
           el.play();
           onPlaying()
        })
        .catch(function(err) {
           console.log("An error occurred! " + err);
        });
    }

getUserMedia returns a promise that resolves to a MediaStream, which the video element can conveniently play,

and that is what we do below, after checking that the element exists:

         if(!el)
            return

           el.srcObject = stream;
           el.play();

On success, we call the onPlaying callback:

 onPlaying()
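The catch branch in WebcamVideo logs every failure the same way. As a sketch of something friendlier, the hypothetical helper below maps the error names getUserMedia can reject with to readable messages (the message texts are my own):

```javascript
// describeMediaError is a hypothetical helper. The cases are DOMException
// names that getUserMedia can reject with; the messages are illustrative.
function describeMediaError(err) {
  switch (err?.name) {
    case "NotAllowedError":
      return "Permission to use the camera was denied.";
    case "NotFoundError":
      return "No camera matching the constraints was found.";
    case "NotReadableError":
      return "The camera is already in use or cannot be read.";
    case "OverconstrainedError":
      return "The requested constraints cannot be satisfied.";
    default:
      return "An error occurred! " + err;
  }
}

const denied = describeMediaError({ name: "NotAllowedError" });
const unknown = describeMediaError(new Error("boom"));
```

Inside the catch you could then write console.log(describeMediaError(err)) instead of concatenating the raw error.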

Let's plug this into the cambtn click handler:

    cambtn.onclick = () => {
              WebcamVideo(vid, { video: true, audio: false }, () => {
                 streaming = true;
                 console.log("webcam live")
              })
     }

Our constraints are simple: we ask for video only with { video: true, audio: false }.

Open the page in your browser. If everything went well, you should see the webcam stream playing in the video element.

The starter HTML already gives the video element playback controls through the controls attribute:

 <video id="vid_preview" width="400" height="400" controls></video>

A video is a sequence of pictures shown in quick succession, typically 30 or ideally 60 pictures (frames) per second.

This is important because there is no widely supported browser API for capturing a single image directly (the ImageCapture API is still experimental),

unlike phones or other devices that offer both video and picture functionality.

What we do is "hijack" the webcam video stream at the exact moment the user clicks snapshot, and extract that frame as an image.

It's easier than it sounds, with the help of the canvas API.

Capturing an Image

We already have access to the stream from the "Capturing video" section, where we passed it to the video element.

The canvas can extract a frame from a video element directly with the drawImage function.

We are going to hijack it. Here is the process in pseudocode:

on snapshot click:
    extract frame from video element && pass it to the canvas context
    convert it to an image
    display the image

Alongside the WebcamVideo function, declare captureFrame:

/**
 * @param {HTMLVideoElement} videoElement - video currently playing the stream
 * @param {HTMLImageElement} imageElement - img that will display the snapshot
 */
function captureFrame(videoElement, imageElement) {
  const canvas = document.createElement('canvas');
  canvas.width = videoElement.videoWidth;
  canvas.height = videoElement.videoHeight;
  canvas.getContext('2d').drawImage(videoElement, 0, 0, canvas.width, canvas.height);
  const capturedImage = new Image();
  capturedImage.src = canvas.toDataURL('image/jpeg');
  imageElement.src = capturedImage.src;
}

The first three lines are setup: we create a canvas element in memory, the same size as the video:

 const canvas = document.createElement('canvas');
  canvas.width = videoElement.videoWidth;
  canvas.height = videoElement.videoHeight;

The remaining lines handle the rest, first drawing the frame onto the canvas and then converting it to an image with canvas.toDataURL:

  canvas.getContext('2d').drawImage(videoElement, 0, 0, canvas.width, canvas.height);
  const capturedImage = new Image();
  capturedImage.src = canvas.toDataURL('image/jpeg');

and finally showing in the DOM:

imageElement.src = capturedImage.src;
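What toDataURL hands back is just a string of the form data:image/jpeg;base64,... that carries the whole picture. As a rough illustration, the hypothetical helper below pulls such a string apart and estimates how many bytes the image occupies:

```javascript
// describeDataUrl is a hypothetical helper for inspecting toDataURL output.
function describeDataUrl(dataUrl) {
  const match = /^data:([^;,]+)?(;base64)?,(.*)$/s.exec(dataUrl);
  if (!match) return null;
  const [, mimeType = "text/plain", base64Flag, payload] = match;
  const isBase64 = Boolean(base64Flag);

  // Every 4 base64 characters encode 3 bytes; '=' padding encodes none.
  const padding = (payload.match(/=+$/) || [""])[0].length;
  const approxBytes = isBase64
    ? (payload.length * 3) / 4 - padding
    : payload.length;
  return { mimeType, isBase64, approxBytes };
}

// "aGVsbG8=" is the base64 encoding of the 5-byte string "hello".
const info = describeDataUrl("data:image/jpeg;base64,aGVsbG8=");
```

Base64 makes the string roughly a third larger than the underlying image, which is fine for a preview but worth keeping in mind for large snapshots.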

On snapshot click we call captureFrame, which needs a reference to the video element:

let streaming = false
// hold reference to the video element
let globalVideoRef;
document.addEventListener("DOMContentLoaded", ()=> {
        ...
         let [img, snapBtn] = snap.children

          snapBtn.onclick = () => {
          // call capture frame
          if(globalVideoRef)
               captureFrame(globalVideoRef, img)
          }

        let [vid, cambtn] = camcoder.children

          cambtn.onclick = () => {
              WebcamVideo(vid, { video: true, audio: false }, () => {
                 streaming = true;
                 // point global ref to the video element
                 globalVideoRef = vid;
                 console.log("webcam recording")
              })
          }

})
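The snap handler above only checks that globalVideoRef is set. Until the video has a decoded frame, videoWidth and videoHeight are 0, and the canvas would produce an empty image. A small guard, sketched here as a hypothetical helper that reads only standard video element properties, could cover that case:

```javascript
// canCaptureFrame is a hypothetical guard. It only reads standard
// HTMLVideoElement properties, so plain objects can stand in for the element.
const HAVE_CURRENT_DATA = 2; // same value as HTMLMediaElement.HAVE_CURRENT_DATA

function canCaptureFrame(video) {
  return Boolean(
    video &&
      video.readyState >= HAVE_CURRENT_DATA &&
      video.videoWidth > 0 &&
      video.videoHeight > 0
  );
}

const notStarted = canCaptureFrame({ readyState: 0, videoWidth: 0, videoHeight: 0 });
const playing = canCaptureFrame({ readyState: 4, videoWidth: 640, videoHeight: 480 });
```

In the click handler this would read if (canCaptureFrame(globalVideoRef)) captureFrame(globalVideoRef, img).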

First, let me address the questions you might have: why are we not using vid directly instead of assigning it to globalVideoRef, and is that efficient?

To the first: we could use vid from camcoder.children, it makes no difference.

The reason we declare a global variable is to allow re-use of the stream, via the video element, outside of the DOMContentLoaded callback.

And it is efficient: globalVideoRef does not copy the element but points to it.

In JavaScript, assigning an object to a variable copies the reference, not the object.
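A tiny sketch makes this concrete. A plain object stands in for the video element here, since the behavior is the same for any object:

```javascript
// A plain object stands in for the video element.
const vid = { id: "vid_preview", paused: true };

let globalVideoRef = vid; // copies the reference, not the object

// A change made through one name is visible through the other.
globalVideoRef.paused = false;

const sameObject = globalVideoRef === vid;
const seenThroughVid = vid.paused;
```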

Either way will work.

Now how about capturing the device itself (screen capture), rather than the real world?

Capturing the device(screen capture)

The idea is the same: instead of capturing the world, we capture the screen,

and the result is the same, a media stream.

We still use navigator.mediaDevices, but instead of user media we ask for display (screen) media:

navigator.mediaDevices.getDisplayMedia(config)

Unlike getUserMedia, getDisplayMedia presents the user with a picker to select what they want to capture,

which is convenient for us. Remember the definition of APIs: everything is handled for us.

Once we successfully get the stream, we can download it, send it over the wire (e.g. a Zoom call), and so on.

In terms of API and code there is not much difference between user and display media. Here is the implementation:

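/**
 * @param {HTMLVideoElement} el - video element that previews the stream
 * @param {*} config - options passed to getDisplayMedia
 * @param {Function} onPlaying - called once the stream is attached
 */
function getDisplay(el,config, onPlaying){

  const hasGetDisplayMedia = () => !!navigator.mediaDevices?.getDisplayMedia;

  // call the check: the bare function reference would always be truthy
  if(hasGetDisplayMedia()){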

    navigator.mediaDevices.getDisplayMedia(config)
      .then((stream)=> {
        if(!el)
           return

        el.srcObject = stream;
        el.play();
        onPlaying()

      }).catch((err)=> {
         console.log(err)
      })
  }
}
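Since WebcamVideo and getDisplay differ only in which method they call, one possible refactor is a helper that picks the method by name. This is only a sketch: pickCaptureMethod is hypothetical, and the mediaDevices argument stands in for navigator.mediaDevices so the logic can be tried with a fake object:

```javascript
// pickCaptureMethod is a hypothetical helper. `mediaDevices` stands in for
// navigator.mediaDevices so the sketch can also run outside a browser.
function pickCaptureMethod(mediaDevices, source) {
  const name = source === "display" ? "getDisplayMedia" : "getUserMedia";
  const method = mediaDevices?.[name];
  // bind keeps `this` pointing at mediaDevices when the method is called later
  return typeof method === "function" ? method.bind(mediaDevices) : null;
}

// Fake object for demonstration; in the browser pass navigator.mediaDevices.
const fakeDevices = {
  getUserMedia: (config) => Promise.resolve({ kind: "webcam", config }),
};

const getWebcam = pickCaptureMethod(fakeDevices, "user");
const getScreen = pickCaptureMethod(fakeDevices, "display");
```

A null result plays the role of the else branch in WebcamVideo: the API is missing, so there is nothing to call.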

To call this function, let's use a keyboard key for a change and avoid adding another button.

We listen for the keydown event and start the display capture when the space bar is pressed:

          onkeydown = (e) => {
            if(e.key == " "){
              getDisplay(vid,{ video: true, audio: true }, ()=> {
               globalVideoRef = vid;
              })
            }
          }
})
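If more shortcut keys get added, an if/else chain on e.key grows quickly. One alternative, sketched below with a hypothetical handleShortcut function and placeholder actions, is to look the key up in a Map:

```javascript
// handleShortcut is a hypothetical dispatcher: it looks the pressed key up in
// a Map of actions and reports whether one ran.
function handleShortcut(event, shortcuts) {
  // event.repeat is true while a key is held down; skip the repeats
  if (event.repeat) return false;
  const action = shortcuts.get(event.key);
  if (typeof action !== "function") return false;
  action(event);
  return true;
}

// Example table; the logged strings are placeholders for real actions.
const log = [];
const shortcuts = new Map([
  [" ", () => log.push("start display capture")],
  ["r", () => log.push("start recording")],
]);

const handled = handleShortcut({ key: " ", repeat: false }, shortcuts);
const ignored = handleShortcut({ key: "x", repeat: false }, shortcuts);
const heldDown = handleShortcut({ key: "r", repeat: true }, shortcuts);
```

In the page this could be wired up as onkeydown = (e) => handleShortcut(e, shortcuts). A Map is used rather than a plain object so that keys like "constructor" never match an inherited property.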

A popup with capture choices should be presented, along with the option to capture system audio (bottom left),

because we set audio to true.

How do we download this capture to a file?

Remember the video element is only the medium: we use it to show or play the stream.
What we download is not the video element but the media stream itself,

meaning we can download the webcam (user media) capture the same way, as it's all media streams.

There's a lot I can say about the media stream data type and the functionality it exposes; to avoid a long article I'll leave references to more resources in the comments section.

Downloading Media Stream

The browser provides the MediaRecorder object to record, well, media.

The following is how we instantiate it and give it a stream to record:

const mediaRecorder = new MediaRecorder(stream);

mediaRecorder provides functions and events to handle the incoming stream data.
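Browsers differ in which recording formats they support, and MediaRecorder.isTypeSupported lets us ask before recording. The sketch below uses a hypothetical pickMimeType helper with the support check passed in as a function, so it can be tried with a fake; the candidate list is only an example:

```javascript
// pickMimeType is a hypothetical helper. `isSupported` is injected so the
// sketch runs anywhere; in the browser pass
// (type) => MediaRecorder.isTypeSupported(type).
function pickMimeType(candidates, isSupported) {
  return candidates.find((type) => isSupported(type)) ?? "";
}

// Ordered from most to least preferred (an example list, adjust as needed).
const candidates = [
  "video/webm;codecs=vp9,opus",
  "video/webm;codecs=vp8,opus",
  "video/webm",
];

// Pretend browser that does not understand VP9.
const fakeSupport = (type) => !type.includes("vp9");
const chosen = pickMimeType(candidates, fakeSupport);
const none = pickMimeType(candidates, () => false);
```

The result can be handed to the recorder through its options, for example new MediaRecorder(stream, { mimeType: chosen }) when chosen is not empty.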

Let's implement a function to handle starting a media recorder:

/**
 *
 * @param {MediaStream} stream
 * @param {Function} ondata
 * @returns {MediaRecorder}
 */

function startRecording(stream, ondata) {

  let mediaRecorder = new MediaRecorder(stream);

  console.log(mediaRecorder, stream)

  mediaRecorder.ondataavailable = (event) => {

    if (event.data.size > 0) {
      ondata(event.data);
    }
  };

  return mediaRecorder

}

The ondata callback will receive the chunks from the media recorder, which we will later combine into a blob.

A blob is downloadable.
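Each chunk is itself a small Blob, and the Blob constructor can glue several of them together. A minimal sketch, with two text blobs standing in for recorded chunks:

```javascript
// Two small blobs stand in for the chunks handed to ondata.
// (Blob is a global in browsers and in recent Node.js versions.)
const chunks = [new Blob(["abc"]), new Blob(["defgh"])];

// The Blob constructor accepts an array of parts, including other blobs,
// and concatenates them in order.
const recording = new Blob(chunks, { type: "video/webm" });

const totalSize = chunks.reduce((sum, chunk) => sum + chunk.size, 0);
```

The combined blob is as large as all its chunks together, which is why long recordings are usually streamed or saved in parts rather than held in memory.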

Because we are efficient (not true, more like lazy) developers, instead of adding buttons let's add more shortcut keys to handle recording.

Please note that in a production application buttons are desirable. That's good UX, and not everyone enjoys shortcuts like developers do.

Update onkeydown with the following:

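          // declared outside the handler so the "r" and "s" branches share it
          let mediaRecorder;

          onkeydown = (e) => {
            if(e.key == " "){
              getDisplay(vid,{ video: true, audio: true }, ()=> {
                globalVideoRef = vid;
              })
            }else if(e.key == "r"){
              // nothing to record until a stream is attached
              if(!vid.srcObject) return

              const stream = []
              mediaRecorder =  startRecording(vid.srcObject, (streamChunks)=> {
                   stream.push(streamChunks)
               })

               mediaRecorder.onstop = () => {
                  console.log("recording ended")
                  saveStream(stream)
               }

               mediaRecorder.onstart = () => {
                console.log("recording started")
               }

               mediaRecorder.start()
            }else if(e.key == "s"){
               // stop only when a recording is actually in progress
               if(mediaRecorder && mediaRecorder.state == "recording"){
                 mediaRecorder.stop()
               }
            }
          }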

On key r we call the startRecording function, passing a callback that receives the recorded chunks, which we store in an array:

              const stream = []
               // vid.srcObject is the actual stream
              mediaRecorder =  startRecording(vid.srcObject, (streamChunks)=> {

                   stream.push(streamChunks)
               })

We start the recorder with the following:

   mediaRecorder.start()

On key s we stop the recording:

else if(e.key == "s"){

               mediaRecorder.stop()

     }

When the media recorder stops, we save the stream to a WebM video file:

 mediaRecorder.onstop = () => {

                  console.log("recording ended")
                  // implemented below
                  saveStream(stream)

        }

Implementation of saveStream:


function saveStream(streamChunks){

  // combine the recorded chunks into a single WebM blob
  const blob = new Blob(streamChunks, { type: 'video/webm' });

  const url = URL.createObjectURL(blob);

  // Create a link element to download the video
  const a = document.createElement('a');
  a.href = url;
  a.download = 'recorded-video.webm';
  a.click();

}

Voilà, our video is downloaded. This works for both user and display media.

You can test both, the display and the webcam.

So far we haven't looked at stopping the stream. Even if you hit pause in the controls of the video element,

the stream keeps going. You can test this by pausing, waiting a moment, and pressing play: it will jump to what the webcam is currently capturing.

Stopping the stream

To stop the stream, we stop its media tracks directly. We have already seen how to get the media stream from the video element:

vid.srcObject

Add a last shortcut key, c for clear, which will stop the stream:

      else if(e.key == "c"){
              // stop the recorder first, while its tracks are still live
              if(mediaRecorder && mediaRecorder.state == "recording"){
                mediaRecorder.stop()
              }
              // ?. guards against pressing c before any stream is attached
              vid.srcObject?.getTracks().forEach(track => track.stop());
              // the stream was attached via srcObject, so that is what we clear
              vid.srcObject = null
              e.preventDefault()

           }

We check whether we are still recording and stop the recorder, then get all the tracks (video, audio), stop them, and clear the video's srcObject.
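Because we plan to re-use this code, the track-stopping part could live in its own function. A minimal sketch follows; stopStream is a hypothetical helper, and the fake stream only imitates the getTracks and stop methods of the real objects:

```javascript
// stopStream is a hypothetical helper. It relies only on getTracks() and
// track.stop(), so fake objects can stand in for a real MediaStream.
function stopStream(stream) {
  if (!stream) return 0; // nothing attached yet
  const tracks = stream.getTracks();
  tracks.forEach((track) => track.stop());
  return tracks.length;
}

// Fake tracks that flip readyState the way real ones do when stopped.
const makeTrack = (kind) => ({
  kind,
  readyState: "live",
  stop() { this.readyState = "ended"; },
});
const fakeStream = {
  tracks: [makeTrack("video"), makeTrack("audio")],
  getTracks() { return this.tracks; },
};

const stopped = stopStream(fakeStream);
```

In the page you would call stopStream(vid.srcObject) and then set vid.srcObject to null.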

Honestly we could keep going, but the article is already too long, and this is not the last one on streams and video; more are coming.

In this article we covered the fundamentals of media capture, from video to capturing a screen, as well as downloading that media.

As it stands, you have the tools to build a video- or media-driven web application.

Thanks for reading, I hope you enjoyed this article, please let me know your thoughts.

Oh and don't forget to give this article a ❤ and a 🦄, it really does help and is appreciated!

I will be posting articles related to machine learning in the browser both here on dev.to and mostly on ko-fi, where I can write long (blog-unfriendly) articles, as it is my passion and line of work.

If you are interested, you can follow me there, it's free!

Or if you want to support the blog, which is much appreciated: