DEV Community

Abinash S

Visualizing Deep Learning Annotations - Interactive Video Player

In video analytics, visualization is as essential as the core analysis itself. It's a realm where pixels and algorithms weave a visual narrative, and every annotation is a pivotal character. The standard HTML video tag handles basic playback well, but it has no built-in way to draw annotations over the video, which is exactly what object detection and tracking results require.

The Why Behind the What: Annotations

Before we delve into the realm of visualizations, let's explore the concept of annotations. Picture a busy street scene captured on camera – each person and vehicle, a distinct element in a larger visual story. Systems skilled in Person or Object detection and tracking take the lead in this visual waltz.

Models like YOLO step in. They take a video and turn it into a series of bounding boxes, each framing an object or a person. These boxes are typically delivered as a data frame that captures the frame number, object class, confidence score, and coordinates of every detection. The coordinates mark the boundaries of each bounding box, neatly framing every key element in our scene.

[Image: sample annotation]

Imagine a scenario where you want precise control over your video annotations. Sometimes, you might need to highlight specific objects, display all annotations, or perhaps none at all, especially when integrating these features into a website or mobile app. Constantly relying on a Python backend to generate and retrieve a new video for each user action is far from ideal. This kind of agility needs to be at the client's fingertips. The twist? The humble HTML <video> tag is not equipped to handle these dynamic, on-the-fly annotations. This is the cue for our custom solution to take center stage – a harmonious blend of a canvas element and a bespoke video player.

Custom Video Player

For this task, Vue was my go-to for its reactive capabilities, complemented by Vuetify's sleek UI components. The result? A video player that not only plays the video but also seamlessly overlays machine learning annotations, synchronized with the video frames and equipped with custom video controls.

Workflow

The heart of our solution lies in integrating the <canvas> element with video playback, allowing us to paint annotations directly over the video. Here is an overview:

  1. Video Tag Creation: Start by creating a video tag as the source for your video.
  2. Setting Up the Canvas: Position a canvas with identical dimensions as the video.
  3. Rendering Video on Canvas: Keep the video tag hidden, and instead, render the video's frames on the canvas, refreshing in sync with the browser's repaint.
  4. Developing Custom Video Controls: Craft a set of video controls using icons and CSS to programmatically manage video playback.

Note: This post covers building the canvas-based player, not Vue 3 setup and installation. There are other great blogs out there for that.

Here's a breakdown of the key components:

Vue Template

Let’s create the template part of our video player component. It contains the hidden video element, the canvas where we are going to display the video and annotations, the video controls, and a button to toggle the annotations.



<template>
  <v-container class="mx-auto pa-0" id="canvas-container" style="max-width: 1280px; max-height: 720px">
    <div>
      <!-- Hidden video element: used only as the frame source for the canvas -->
      <video ref="video" style="display: none;">
        <source :src="videoData.source" :type="videoData.type">
      </video>
      <canvas ref="canvas" id="canvas" width="1280" height="720"></canvas>
      <VideoControls
        @previousFrame="previousFrame" @playVideo="playVideo" @pauseVideo="pauseVideo"
        @nextFrame="nextFrame" @playFrame="playFrame"
        :isPlaying="isPlaying" :currentFrame="videoData.currentFrame"
        :currentDuration="videoData.currentDuration" :duration="videoData.duration"/>
      <!-- Toggles the bounding-box overlay -->
      <button class="mx-auto text-center ma-3" @click="showBoundingBoxes = !showBoundingBoxes">
        BBOX
      </button>
    </div>
  </v-container>
</template>



The template refs (ref="video" and ref="canvas") are important: they are the only way for the script to reach these elements, read their data, and control them in Vue.

Annotations

We need a backend that sends annotations in a format JavaScript can handle, preferably JSON. When loading the video, we should fetch its annotations along with it. A sample annotation looks like this:



const annotations = {
  // Key: frame number; value: all detections in that frame
  0: [
    {
      "frameNo": 0,
      "bbox": [295, 40, 750, 50], // [x, y, width, height]
      "class": "scoreboard",
      "fillColor": [152, 255, 255], // RGB
      "confidence": 0.89
    },
    {
      "frameNo": 0,
      "bbox": [595, 151, 227, 569],
      "class": "player",
      "fillColor": [255, 255, 255],
      "confidence": 0.60
    },
  ],
  // Other frames and their detections
}


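
Detection pipelines often produce one row per detection rather than a map keyed by frame. As a minimal sketch, assuming the backend sends a flat array of rows with the same fields as the sample above, the rows can be grouped by frame number on the client. The helper name groupByFrame and the third example row are hypothetical and not part of the original code.

```javascript
// Hypothetical helper: group a flat list of detections by frame number
// so the drawing code can look up a frame's boxes directly.
function groupByFrame(detections) {
  const byFrame = {};
  for (const detection of detections) {
    // Create the bucket for this frame on first use
    if (!byFrame[detection.frameNo]) byFrame[detection.frameNo] = [];
    byFrame[detection.frameNo].push(detection);
  }
  return byFrame;
}

// Example rows (illustrative values in the same shape as the sample)
const rows = [
  { frameNo: 0, bbox: [295, 40, 750, 50], class: "scoreboard", fillColor: [152, 255, 255], confidence: 0.89 },
  { frameNo: 0, bbox: [595, 151, 227, 569], class: "player", fillColor: [255, 255, 255], confidence: 0.6 },
  { frameNo: 1, bbox: [600, 150, 227, 569], class: "player", fillColor: [255, 255, 255], confidence: 0.62 },
];
const grouped = groupByFrame(rows);
console.log(Object.keys(grouped)); // the frame numbers that have detections
```

Grouping once at load time keeps the per-frame lookup during playback cheap.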

Getting started with States

Vue 3's Composition API introduces a more flexible way to manage state in your components. We start by declaring our reactive states using ref:



import { ref, onMounted } from 'vue';

const annotations = ref({
  // ... the annotations constant, or data loaded from an HTTP request
});
const showBoundingBoxes = ref(false);
const video = ref(null);
const canvas = ref(null);
const isPlaying = ref(false);
const requestedFrame = ref(0);
const frameRequestId = ref(null);
const ctx = ref(null);
const videoData = ref({
  source: "samplevideo.mp4",
  type: "video/mp4",
  fps: 60,
  duration: 0,
  currentDuration: 0,
  currentFrame: 0
});



Frame Calculation and Rendering

An essential aspect of our player is calculating the current frame from the current playback time, as the HTML5 video tag doesn’t expose a frame number directly.



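const estimateCurrentFrame = (currentTime) => {
  // Math.floor always returns a number, so no nullish fallback is needed;
  // frame numbers start at 0, matching the annotation keys
  return Math.floor(currentTime * videoData.value.fps);
}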


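
Seeking to a specific frame needs the inverse mapping, from frame number back to time. The following is a sketch under the assumption that the frame rate is constant; timeToFrame and frameToTime are hypothetical names. Aiming for the middle of the frame avoids a floating-point pitfall: frameNo / fps multiplied back by fps can come out slightly below frameNo, which would floor to the previous frame.

```javascript
// Hypothetical helpers; fps is assumed constant and supplied by the server.
function timeToFrame(currentTime, fps) {
  return Math.floor(currentTime * fps);
}

function frameToTime(frameNo, fps) {
  // Target the middle of the frame so rounding error cannot
  // push the result back into the previous frame.
  return (frameNo + 0.5) / fps;
}

// Round trip: converting a frame to a time and back yields the same frame
const fps = 30;
const target = frameToTime(120, fps);
console.log(timeToFrame(target, fps) === 120);
```

With the state from this post, jumping to a frame would then be video.value.currentTime = frameToTime(frameNo, videoData.value.fps).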

To play the video in canvas, we have to take every frame in the video and draw that in the canvas and check if the frame has any annotations associated. If so, we have to draw boxes in the canvas with the help of bbox coordinates. Let’s create the functions to do these.



const drawFrame = () => {
  const frameNo = estimateCurrentFrame(video.value.currentTime);
  videoData.value.currentFrame = frameNo;
  videoData.value.currentDuration = video.value.currentTime;
  // Draw the current frame from the video onto the canvas
  ctx.value.drawImage(video.value, 0, 0, canvas.value.width, canvas.value.height);
  // showBoundingBoxes is a ref, so read .value; the ref object itself is always truthy
  if (showBoundingBoxes.value) drawBoundingBoxes(frameNo);
  // Cancel any pending request so repeated calls never start parallel loops
  cancelAnimationFrame(frameRequestId.value);
  // Continue drawing the next frame
  frameRequestId.value = requestAnimationFrame(drawFrame);
}




const drawBoundingBox = (frameNo, label, x, y, width, height, color) => {
  // Set the color and border width for the bounding box
  ctx.value.strokeStyle = color;
  ctx.value.lineWidth = 2;
  // Draw the bounding box
  ctx.value.beginPath();
  ctx.value.rect(x, y, width, height);
  ctx.value.stroke();
  // Add the label text above the bounding box
  ctx.value.fillStyle = color;
  ctx.value.font = '12px Arial';
  ctx.value.fillText(`${label}`, x, y - 5);
}




const drawBoundingBoxes = (frameNo) => {
  // Frames without detections have no entry, so fall back to an empty list
  const boundingBoxes = annotations.value[frameNo] ?? [];
  // Iterate annotations as there can be multiple detections in a single frame
  for (const item of boundingBoxes) {
    const label = item.class;
    const color = `rgb(${item.fillColor.join(',')})`;
    const [x, y, width, height] = item.bbox;
    drawBoundingBox(frameNo, label, x, y, width, height, color);
  }
}


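
Because the annotations already live on the client, highlighting only specific objects comes down to filtering a frame's detections before they are drawn. A minimal sketch, assuming the annotation shape shown earlier; filterDetections and its classes and minConfidence options are hypothetical names, not part of the original code.

```javascript
// Hypothetical filter: keep detections whose class is allowed and whose
// confidence meets a threshold. Both options are illustrative.
function filterDetections(detections, { classes = null, minConfidence = 0 } = {}) {
  return detections.filter((item) => {
    // classes === null means "no class restriction"
    const classOk = classes === null || classes.includes(item.class);
    // Treat a missing confidence as 1 so unscored boxes are kept
    const confidence = item.confidence ?? 1;
    return classOk && confidence >= minConfidence;
  });
}

const frame0 = [
  { class: "scoreboard", confidence: 0.89 },
  { class: "player", confidence: 0.6 },
];
const playersOnly = filterDetections(frame0, { classes: ["player"] });
const confidentOnly = filterDetections(frame0, { minConfidence: 0.8 });
console.log(playersOnly.length, confidentOnly.length);
```

Driving the options from reactive state would let the user change what is highlighted without any round trip to the backend.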

The drawFrame method is pivotal, rendering each frame onto the canvas and overlaying it with annotations. requestAnimationFrame is a JavaScript method used for creating smooth, high-performance animations in web browsers. It tells the browser that you wish to perform an animation and requests that the browser calls a specified function to update an animation before the next repaint.
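
requestAnimationFrame fires at the display's refresh rate, which does not have to match the video's frame rate. Browsers that implement HTMLVideoElement.requestVideoFrameCallback can instead call back once per presented video frame and pass that frame's media time. The following is only a sketch of how the two could be combined, with requestAnimationFrame as the fallback; startFrameLoop and onFrame are hypothetical names and this is not how the original player is written.

```javascript
// Hypothetical helper: call onFrame(timeInSeconds) for each new frame and
// return a function that stops the loop.
function startFrameLoop(video, onFrame) {
  let handle = null;
  let stopped = false;
  // Feature-detect, since not every browser implements the video callback
  const useVideoCallback = typeof video.requestVideoFrameCallback === "function";

  const schedule = () => {
    if (stopped) return;
    handle = useVideoCallback
      ? video.requestVideoFrameCallback((now, metadata) => {
          onFrame(metadata.mediaTime); // media time of the presented frame, in seconds
          schedule();
        })
      : requestAnimationFrame(() => {
          onFrame(video.currentTime); // fallback: current playback position
          schedule();
        });
  };

  schedule();
  return () => {
    stopped = true;
    if (useVideoCallback) video.cancelVideoFrameCallback(handle);
    else cancelAnimationFrame(handle);
  };
}
```

One trade-off to keep in mind: the video frame callback is tied to new frames being presented, so a paused video would likely need an explicit redraw when, for example, the bounding-box toggle changes.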

Key Considerations for Optimal Video Playback

  1. Video Dimensions: Knowing the video's height and width is crucial. These dimensions are essential for correctly sizing the canvas and accurately positioning the bounding boxes. A mismatch in dimensions could lead to improper object tagging.
  2. Frame Rate: Understanding the video's exact frame rate is vital for accurately calculating the current frame during playback. This is essential for synchronizing the bounding boxes with the video. Unfortunately, the HTML5 <video> tag doesn't provide this information directly. However, we can circumvent this by having the server send the frame rate as part of the metadata along with the annotations.
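
The dimension point can be made concrete. The video element exposes its native resolution as videoWidth and videoHeight once metadata has loaded, and annotations are usually expressed in that resolution. If the canvas has a different size, each box has to be scaled before drawing. A minimal sketch, assuming [x, y, width, height] boxes; scaleBBox is a hypothetical helper and the sizes below are illustrative.

```javascript
// Hypothetical helper: map a [x, y, width, height] box from the video's
// native resolution to the canvas resolution.
function scaleBBox(bbox, videoSize, canvasSize) {
  const scaleX = canvasSize.width / videoSize.width;
  const scaleY = canvasSize.height / videoSize.height;
  const [x, y, width, height] = bbox;
  return [x * scaleX, y * scaleY, width * scaleX, height * scaleY];
}

// A box annotated on a 2560x1440 video, drawn on a 1280x720 canvas
const scaled = scaleBBox(
  [600, 150, 240, 570],
  { width: 2560, height: 1440 },
  { width: 1280, height: 720 }
);
console.log(scaled);
```

In the player, videoSize would come from video.value.videoWidth and video.value.videoHeight, and canvasSize from canvas.value.width and canvas.value.height.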

Video Control Actions

To implement the playback controls, we can use the video element’s built-in methods and properties.



// Function to play the video
const playVideo = () => {
  video.value.play();
  // Draw the current frame
  drawFrame();
}

// Function to move to the next frame
const nextFrame = () => {
  // Move one frame forward
  video.value.currentTime += 1 / videoData.value.fps;
  drawFrame();
}

// Other required functions



These are the functions the video controls component will call. Setting video.value.currentTime covers many features: jumping to a particular frame, or skipping forward or backward by 30 seconds.
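
As an illustration of the skip functionality, here is a small sketch of a relative seek that clamps the target to the valid range, so the value shown in the UI always matches a real position in the video. seekBy is a hypothetical helper; it only relies on the standard currentTime and duration properties of the video element.

```javascript
// Hypothetical helper: move playback by deltaSeconds, clamped to [0, duration].
function seekBy(video, deltaSeconds) {
  // duration is NaN until metadata has loaded, so fall back to 0
  const duration = Number.isFinite(video.duration) ? video.duration : 0;
  const target = Math.min(Math.max(video.currentTime + deltaSeconds, 0), duration);
  video.currentTime = target;
  return target;
}
```

With the refs from this post, skipping forward would be seekBy(video.value, 30) and skipping backward seekBy(video.value, -30).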

Events

To react to video events such as play, pause, and load, we register listeners when the component mounts. The loadedmetadata event tells us that the video's metadata is available, which is useful for reading the duration and other metadata, as well as for showing a loader until then. We also call drawFrame() once on mount, as it is always good to see the first frame of the video before playing it ;)



onMounted(() => {
  ctx.value = canvas.value.getContext('2d');

  video.value.addEventListener('loadedmetadata', () => {
    // The 'loadedmetadata' event is fired when the video's metadata, including the duration, is loaded.
    videoData.value.duration = video.value.duration;
  });

  video.value.addEventListener('play', () => {
    videoData.value.currentDuration = video.value.currentTime;
    videoData.value.currentFrame = estimateCurrentFrame(video.value.currentTime);
    isPlaying.value = true;
    drawFrame();
  });

  video.value.addEventListener('pause', () => {
    isPlaying.value = false;
  });

  video.value.addEventListener('ended', () => {
    isPlaying.value = false;
  });

  drawFrame();
});


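
Listeners registered on mount are usually removed again when the component goes away. As a sketch of one way to do that, the helper below registers a set of handlers and returns a single function that removes them all. bindVideoEvents is a hypothetical name; it is not part of the original code.

```javascript
// Hypothetical helper: attach several event handlers at once and
// return a function that detaches exactly those handlers.
function bindVideoEvents(video, handlers) {
  const entries = Object.entries(handlers);
  for (const [name, handler] of entries) {
    video.addEventListener(name, handler);
  }
  return () => {
    for (const [name, handler] of entries) {
      video.removeEventListener(name, handler);
    }
  };
}
```

In a Vue component, the returned function could be called from onBeforeUnmount, together with cancelAnimationFrame(frameRequestId.value) to stop the drawing loop.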

Video Playback Controls

Because the video element is hidden and we draw on the canvas, the browser's native controls are not available, so we have to design our own. Fortunately, the HTML video element exposes programmable methods, which let us craft controls tailored to our needs. We have already implemented these in the parent component; here we only need to call them.

Let's create a component for the video controls, starting with the slider for video progress and the other playback controls:



<template>
  <!-- Custom Slider for Video Progress -->
  <v-row class="pa-2" justify="center">
    <!-- v-model already binds the value, so a separate :model-value is not needed -->
    <v-slider
        v-model="thumb"
        @update:modelValue="(value) => emit('playFrame', value)"
        :max="duration"
        :step="1"
        color="white"
        thumb-color="red"
        hide-details
        rounded
        class="ma-1"
    ></v-slider>
  </v-row>
  <!-- Playback Controls -->
  <v-row class="pa-2 mb-2" justify="center">
    <!-- Play/Pause Buttons -->
    <v-col cols="1" v-if="!isPlaying">
      <v-btn icon="mdi-play-outline" @click="emit('playVideo')"></v-btn>
    </v-col>
    <!-- .... Other Required controls -->
  </v-row>
</template>



Now that the template is ready, let's write the script part:



<script setup>
import { ref, watch } from 'vue';

// Define props
const props = defineProps({
  isPlaying: Boolean,
  currentFrame: Number,
  currentDuration: Number,
  duration: Number
});

// Define emits
const emit = defineEmits(["previousFrame", "playVideo", "pauseVideo", "nextFrame", "playFrame"]);

const thumb = ref(0);

// Watchers
watch(() => props.currentDuration, (newVal) => {
  thumb.value = newVal;
});

</script>



Here, we show the current playback time, the total duration, and the play/pause, next frame, and previous frame controls. There is also a slider; dragging it emits the selected position so that the parent can move playback to that point.
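
The current and total durations arrive as seconds, so the controls need a small formatter to display them. A minimal sketch, assuming a minutes:seconds display is enough; formatTime is a hypothetical helper and not part of the original component.

```javascript
// Hypothetical helper: format a duration in seconds as "m:ss".
function formatTime(seconds) {
  // duration is NaN before metadata loads; show a neutral value instead
  if (!Number.isFinite(seconds) || seconds < 0) return "0:00";
  const total = Math.floor(seconds);
  const minutes = Math.floor(total / 60);
  const secs = String(total % 60).padStart(2, "0");
  return `${minutes}:${secs}`;
}

console.log(formatTime(65.9)); // prints "1:05"
```

In the controls template this could be used as formatTime(currentDuration) and formatTime(duration).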

Our Output

With all of this in place, the result looks like this:

[Image: Vue video player]

This UI isn't just a static entity; it's a canvas of possibilities, offering vast flexibility. One of the hidden gems (hint: check out the GitHub Gist) is the ability to display video captions, adding another layer of engagement.

And remember, this approach isn't confined to Vue. The principles and processes we've explored can be adapted to other web frameworks.

While this post has highlighted the key aspects and challenges of the project, the full depth and scope are best experienced through the complete code. For the curious minds and avid coders, the scripts await on GitHub (link provided below). Dive in, explore, adapt it to your projects, and maybe even enhance it with your unique touch. I eagerly anticipate seeing how this project grows and transforms with your contributions and insights.

GitHub Gist: https://gist.github.com/s-abinash/4a3c7afaba94ab9dd74c551f0fe898fc
