Keyur Paralkar

Posted on Jan 31, 2024

How I build a YouTube Video Player with ReactJS: Building FrameTooltip component

#javascript #react #opensource #webdev

Introduction

I am pretty sure you have used OTT/streaming platforms like Netflix, Hotstar, or YouTube. If so, you have probably also seen the UI that appears when you hover over the seek bar of a video on any of these platforms.

This user experience is excellent. It lets you quickly preview earlier or later parts of the video. I refer to this component as the FrameTooltip component because it works like a tooltip that displays the video frame at a specific point in time.

In this blog post, we will build this component. So, without any further ado, let's get started.

Prerequisites

Basic to intermediate understanding of the React ecosystem and the Context API
Version control: Git
JavaScript and TypeScript
Basic knowledge of running Python scripts

The What?

We are trying to build a tooltip component that displays the video frame at a specific point in time during the following interactions:

When hovering over the seek bar of the video.
While dragging the thumb of the seek bar.

Video Frame Preview: a gist of the tooltip we are building

The Why?

We are doing this because:

To explore real-world scenarios and their complexities.
To strengthen our system design mindset.
It’s fun 😀

Some backstory

This blog post is part of a series, which I recommend you read. To give you a quick heads-up on the project architecture, here is a brief overview:

In this project, I am implementing a clone of YouTube's video player with React. The project architecture is explained in the following GIF:

Different controls are available on the video, such as the seek bar, the play button, etc. I call these control components. Whenever a control component changes, a global state is updated, which in turn updates the video's actual state. You can read more about this here.
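To make the flow concrete, here is a minimal sketch of such a global state written as a pure reducer. This is the kind of function you would pass to React's useReducer and expose through a context provider. All names here are hypothetical and not taken from the project: PlayerState, PlayerAction, and the action types. A separate effect would then apply this state to the <video> element.

```typescript
// Hypothetical state shape and action names; the real project may differ.
interface PlayerState {
  isPlaying: boolean;
  currentTime: number; // seconds
  hoveredDuration: number; // seconds under the cursor on the seek bar
  hoveredThumbnailUrl: string; // VTT cue text for the hovered second
}

type PlayerAction =
  | { type: "PLAY" }
  | { type: "PAUSE" }
  | { type: "SEEK"; time: number }
  | { type: "HOVER_SEEKBAR"; duration: number }
  | { type: "SET_HOVERED_THUMBNAIL"; url: string };

const initialState: PlayerState = {
  isPlaying: false,
  currentTime: 0,
  hoveredDuration: 0,
  hoveredThumbnailUrl: "",
};

// Control components only dispatch actions; the reducer returns a new state
// object and never mutates the previous one.
function playerReducer(state: PlayerState, action: PlayerAction): PlayerState {
  switch (action.type) {
    case "PLAY":
      return { ...state, isPlaying: true };
    case "PAUSE":
      return { ...state, isPlaying: false };
    case "SEEK":
      return { ...state, currentTime: action.time };
    case "HOVER_SEEKBAR":
      return { ...state, hoveredDuration: action.duration };
    case "SET_HOVERED_THUMBNAIL":
      return { ...state, hoveredThumbnailUrl: action.url };
    default:
      return state;
  }
}
```

Keeping the reducer pure means the control components never touch the <video> element directly. Only the effect that reads the state does.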

The How?

Now let us dive into what it takes to build up this component:

Below are the steps that are required:

Video preprocessing: whichever sample video you choose needs to go through the following steps:
- Generating an image sprite of the video
- Generating a VTT file of the video based on the image sprite
- Adding image coordinates to the VTT
Loading the VTT file into the video
Extracting the data from the VTT into the project

I know there is a lot of jargon here, but please bear with me; I will explain all of it in the coming sections.

Video Preprocessing Step: Generating Image Sprite

First, we need to understand what exactly image sprites are.

It’s a single image that contains multiple images.
Fetching one image that contains many images is more efficient than fetching every image separately, because the browser makes one request instead of many.

Consider this GIF for an example:

Image Sprite Animation — Images to Image Sprite Conversion

In the GIF, we first have multiple separate images, which are then placed in a grid. That grid is the image sprite.

Based on this concept, our first step is to generate an image sprite of the video:

We first capture frames of the video at a fixed interval and save each one as an image.
Then we put all of them into a single JPEG file.

Image Sprite Generation: video frames get converted into an image sprite
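Finding a frame inside a sprite is simple arithmetic, under two assumptions: every thumbnail has the same size, and frames are laid out left to right, top to bottom, in a fixed number of columns. The sketch below is illustrative only. The function name and the columns parameter are hypothetical, and the script we use later picks its own grid layout.

```typescript
interface FrameRect {
  x: number; // left offset inside the sprite, in px
  y: number; // top offset inside the sprite, in px
  w: number;
  h: number;
}

// Position of the frame at `index` (0-based) in a row-major sprite grid.
function frameRectInSprite(
  index: number,
  columns: number,
  thumbWidth: number,
  thumbHeight: number,
): FrameRect {
  const col = index % columns; // position within the row
  const row = Math.floor(index / columns); // which row of the grid
  return { x: col * thumbWidth, y: row * thumbHeight, w: thumbWidth, h: thumbHeight };
}
```

Take 200x83 thumbnails in 10 columns. Index 4 (the fifth second) lands at x=800, y=0, which matches the coordinates the sample VTT uses for Img 5. Index 12 wraps to the second row at x=400, y=83.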

Video Preprocessing Step: Generating VTT file for the Video

So what is a VTT file?

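VTT, or WebVTT, stands for Web Video Text Tracks. It is a file format for displaying timed text tracks using the <track> element.
Its most common purpose is to overlay text on videos, such as subtitles or captions. A track can also be of kind "metadata": its cues are not displayed but can be read by scripts, which is what makes VTT usable for our sprite data.
You can read more about VTT on MDN.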

Any VTT file would have the following format:

```
WEBVTT

00:01.000 --> 00:04.000
- [subtitle 1]: Lorem Ipsum has been the industry's standard dummy text

00:02.000 --> 00:03.000
- [subtitle 2]: when an unknown printer took it to make a type specimen book.
```

The first cue above means: from the 1st second until the 4th second, display the subtitle "[subtitle 1]: Lorem Ipsum has been the industry's standard dummy text".

As you can see, each entry (cue) in this file contains text that is displayed on the video during the specified time range. Have a look at this in action below:
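To make the timing line concrete, here is a minimal sketch that converts WebVTT timestamps into seconds. As in the example above, a timestamp may omit the hours part. The function names are hypothetical. In the browser, the <track> element does this parsing for you and exposes startTime and endTime on each cue.

```typescript
// Converts "mm:ss.ttt" or "hh:mm:ss.ttt" to seconds.
function parseVttTimestamp(ts: string): number {
  const match = /^(?:(\d{2,}):)?(\d{2}):(\d{2})\.(\d{3})$/.exec(ts.trim());
  if (!match) {
    throw new Error(`Invalid WebVTT timestamp: ${ts}`);
  }
  const [, hh, mm, ss, ms] = match;
  const hours = hh ? Number(hh) : 0; // the hours group is optional
  return hours * 3600 + Number(mm) * 60 + Number(ss) + Number(ms) / 1000;
}

// Parses a cue timing line such as "00:01.000 --> 00:04.000".
// Anything after the end timestamp (cue settings like "align:start") is ignored.
function parseCueTiming(line: string): { start: number; end: number } {
  const [startPart, endPart] = line.split("-->");
  if (startPart === undefined || endPart === undefined) {
    throw new Error(`Invalid cue timing: ${line}`);
  }
  const endToken = endPart.trim().split(/\s+/)[0] ?? "";
  return { start: parseVttTimestamp(startPart), end: parseVttTimestamp(endToken) };
}
```

For example, parseCueTiming("00:01.000 --> 00:04.000") gives { start: 1, end: 4 }, which is the 1st-to-4th-second window described above.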

In the above GIF, you can also see that we load this VTT file with the help of the track element.

For our example, we will do the following:

Our .vtt file will contain, for each second of the video, the name of the image sprite file along with the coordinates of that second's video frame inside the sprite.
We store this data in the cues so that, for any point in the video, we can look up the matching region of the sprite file.
Later, we prepend the cue text with the URL of the host that stores the sprite file, e.g. Dropbox.

Our VTT file will look like this:
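Prepending the host can be done with the standard URL constructor, which resolves the cue text against a base URL and keeps the #xywh fragment intact. This is a sketch. SPRITE_BASE_URL and resolveCueUrl are hypothetical placeholders, and a real host such as Dropbox may need its own URL format.

```typescript
// Hypothetical host where the sprite file is uploaded.
const SPRITE_BASE_URL = "https://example.com/thumbs/";

// Turns "sprite.jpg#xywh=0,0,200,83" into an absolute URL on the host,
// preserving the "#xywh=..." fragment.
function resolveCueUrl(cueText: string, baseUrl: string = SPRITE_BASE_URL): string {
  return new URL(cueText.trim(), baseUrl).toString();
}
```

Note the trailing slash on the base URL. Without it, relative resolution replaces the last path segment, so "https://example.com/thumbs" would resolve the sprite to "https://example.com/<sprite>".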

```
WEBVTT

Img 1
00:00:00.000 --> 00:00:01.000
tears-of-steel-battle-clip-medium_sprite.jpg#xywh=0,0,200,83

Img 2
00:00:01.000 --> 00:00:02.000
tears-of-steel-battle-clip-medium_sprite.jpg#xywh=200,0,200,83

Img 3
00:00:02.000 --> 00:00:03.000
tears-of-steel-battle-clip-medium_sprite.jpg#xywh=400,0,200,83

Img 4
00:00:03.000 --> 00:00:04.000
tears-of-steel-battle-clip-medium_sprite.jpg#xywh=600,0,200,83

Img 5
00:00:04.000 --> 00:00:05.000
tears-of-steel-battle-clip-medium_sprite.jpg#xywh=800,0,200,83
```

Each cue here contains the following:

- An identifier, e.g. Img 1
- The time range the cue covers, e.g. 00:00:00.000 --> 00:00:01.000
- The name of the image sprite file followed by #xywh=x,y,width,height, i.e. the position and size of that second's frame inside the sprite

All these steps can be done manually with online tools, but while implementing this component I found this cool Python script: https://github.com/vlanard/videoscripts. It combines all these steps and generates the .vtt file along with the image sprite .jpg file.

Each entry in the generated .vtt file follows the format shown above: the name of the image sprite file along with the coordinates of the video frame in the sprite.

Now that we know what we need, let us start with the implementation:
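Once the .vtt file is loaded through a <track> element, the browser parses it for us. Still, a small parser makes the cue structure described above explicit. This is a minimal sketch that only understands the subset of WebVTT this file uses: a header, an optional identifier, a timing line, and one payload line. The names are hypothetical, and it is not a replacement for a real WebVTT parser (no NOTE/STYLE blocks, no cue settings).

```typescript
interface SpriteCue {
  id?: string; // e.g. "Img 1"
  start: number; // seconds
  end: number; // seconds
  text: string; // e.g. "sprite.jpg#xywh=0,0,200,83"
}

// "hh:mm:ss.ttt" or "mm:ss.ttt" -> seconds.
function toSeconds(ts: string): number {
  return ts
    .trim()
    .split(":")
    .map(Number)
    .reduce((acc, part) => acc * 60 + part, 0);
}

function parseSpriteVtt(source: string): SpriteCue[] {
  // Normalize line endings, then split into blank-line-separated blocks.
  const blocks = source.replace(/\r\n?/g, "\n").trim().split(/\n\s*\n/);
  if (!blocks[0]?.startsWith("WEBVTT")) {
    throw new Error("Missing WEBVTT header");
  }
  const cues: SpriteCue[] = [];
  for (const block of blocks.slice(1)) {
    const lines = block.split("\n").map((l) => l.trim());
    const timingIndex = lines.findIndex((l) => l.includes("-->"));
    if (timingIndex === -1) continue; // not a cue block
    const [startRaw, endRaw] = lines[timingIndex].split("-->");
    cues.push({
      id: timingIndex > 0 ? lines[0] : undefined, // identifier line, if present
      start: toSeconds(startRaw),
      end: toSeconds(endRaw.trim().split(/\s+/)[0]),
      text: lines.slice(timingIndex + 1).join("\n"),
    });
  }
  return cues;
}
```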

First, choose the sample video file that you want to work with. I chose the open-source Tears of Steel video for this project.
Next, clone the videoscripts repository.
The script comes with certain defaults that need to be tuned to our requirements. These defaults are as follows:
```python
THUMB_RATE_SECONDS=45
THUMB_WIDTH=100
```
Where,
- THUMB_RATE_SECONDS: tells the script to take a screenshot every Nth second.
- THUMB_WIDTH: tells the script to generate each screenshot with a width of N pixels.
We need to tune these defaults with the following values:
```python
THUMB_RATE_SECONDS=1
THUMB_WIDTH=200
```
Here we tell the script to take a screenshot of the video every second and to generate each image with a width of 200px.
Now save the script and run the following command:
```shell
python makesprites.py <path-to-.mp4-video-file>
```
Make sure that you run this command from inside the cloned project.
Once the command finishes, it generates a thumbs output folder that contains:
- Screenshots of each frame
- Image sprite
- .vtt file
Our image sprite and .vtt file will look like this:

```
WEBVTT

Img 1
00:00:00.000 --> 00:00:01.000
tears-of-steel-battle-clip-medium_sprite.jpg#xywh=0,0,200,83

Img 2
00:00:01.000 --> 00:00:02.000
tears-of-steel-battle-clip-medium_sprite.jpg#xywh=200,0,200,83

Img 3
00:00:02.000 --> 00:00:03.000
tears-of-steel-battle-clip-medium_sprite.jpg#xywh=400,0,200,83

Img 4
00:00:03.000 --> 00:00:04.000
tears-of-steel-battle-clip-medium_sprite.jpg#xywh=600,0,200,83

Img 5
00:00:04.000 --> 00:00:05.000
tears-of-steel-battle-clip-medium_sprite.jpg#xywh=800,0,200,83
```
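For intuition about what the script writes, here is a sketch of the VTT-writing half of such a pipeline in TypeScript. It emits one cue per captured thumbnail, each pointing at that thumbnail's rectangle in the sprite. This is not the script's code. The function names and the columns parameter are hypothetical, and the real script chooses its own grid layout. The 83px height is taken from the sample output above. Called with a rate of 1 second and a width of 200, it reproduces the first cues shown.

```typescript
// Seconds -> "hh:mm:ss.ttt".
function formatTimestamp(totalSeconds: number): string {
  const ms = Math.round(totalSeconds * 1000);
  const h = Math.floor(ms / 3600000);
  const m = Math.floor((ms % 3600000) / 60000);
  const s = Math.floor((ms % 60000) / 1000);
  const rest = ms % 1000;
  const pad = (n: number, width = 2) => String(n).padStart(width, "0");
  return `${pad(h)}:${pad(m)}:${pad(s)}.${pad(rest, 3)}`;
}

function generateSpriteVtt(
  spriteName: string,
  count: number, // number of thumbnails captured
  rateSeconds: number, // plays the role of THUMB_RATE_SECONDS
  thumbWidth: number, // plays the role of THUMB_WIDTH
  thumbHeight: number,
  columns: number, // hypothetical grid width, in thumbnails
): string {
  const lines = ["WEBVTT", ""];
  for (let i = 0; i < count; i++) {
    // Row-major position of thumbnail i inside the sprite.
    const x = (i % columns) * thumbWidth;
    const y = Math.floor(i / columns) * thumbHeight;
    lines.push(
      `Img ${i + 1}`,
      `${formatTimestamp(i * rateSeconds)} --> ${formatTimestamp((i + 1) * rateSeconds)}`,
      `${spriteName}#xywh=${x},${y},${thumbWidth},${thumbHeight}`,
      "",
    );
  }
  return lines.join("\n");
}
```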

We now have everything we need. Next, let us load this data into the project and extract it.

Loading and Extracting VTT in the project

To load the above VTT, follow these steps:

Add a track element inside the video element of the Video.tsx file like below:

Once the track element is added, we use a ref, trackMetaDataRef, to access this track element in the useEffect below:

Since we want the video frame to update every second while the user hovers over the seek bar, we run the above effect whenever the global state hoveredDuration changes.

💡 NOTE: hoveredDuration and hoveredThumbnailUrl are states that get updated whenever we hover over the seek bar. You can learn about them in the following sequence of changes:
Changes in context, reducer, actions, triggering action on mousemove to update hoveredDuration

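In this effect, we access all the available cues and index the cue array with the truncated hoveredDuration value. Because the VTT file has exactly one cue per second starting at 0, the truncated value is also the index of the matching cue (e.g. 12.7 seconds maps to index 12). We then update the hoveredThumbnailUrl global state with this current cue. Here is how both of these states update when we hover over the seek bar: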
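The lookup inside the effect can be isolated as a pure function. Below is a sketch under a few assumptions. CueLike is a hypothetical stand-in for VTTCue, which exposes startTime, endTime, and text. In the browser, assuming trackMetaDataRef points at the <track> element, the cues would come from trackMetaDataRef.current.track.cues. That is a TextTrackCueList: array-like rather than an Array (so Array.from is needed), and null while the track's mode is "disabled". The first function mirrors the truncation approach described above. The second is an alternative that searches by time range, so it keeps working if THUMB_RATE_SECONDS is set above 1.

```typescript
// Minimal cue shape; VTTCue provides these properties in the browser.
interface CueLike {
  startTime: number; // seconds
  endTime: number; // seconds
  text: string;
}

// Index-based lookup: valid only with exactly one cue per second,
// starting at 0 and in order.
function cueTextByIndex(cues: readonly CueLike[], hoveredDuration: number): string | undefined {
  return cues[Math.trunc(hoveredDuration)]?.text;
}

// Range-based lookup: checks each cue's [startTime, endTime) interval,
// so it does not depend on the spacing between cues.
function cueTextByTime(cues: readonly CueLike[], time: number): string | undefined {
  return cues.find((cue) => time >= cue.startTime && time < cue.endTime)?.text;
}
```

With one cue per second, both functions agree. With 2-second cues, only the range-based lookup still finds the right frame. The index-based version would read past the end of the list or pick the wrong cue.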

Loading the image sprite into the project

Now that hoveredThumbnailUrl holds the cue for the hovered second, we can read it from the context wherever we need it. Since we want to show this frame in a tooltip, we create a FrameTooltip component.

Add the code below to the file src/components/Seekbar/FrameTooltip.tsx. If this file doesn’t exist, create it at that location:

The code is pretty simple,

The component accepts three props: duration, the time of the current frame; thumbnailUrl, the current text cue from the VTT file; and dims, the coordinates and dimensions of the frame.
We call this component in the [Seekbar](https://github.com/keyurparalkar/react-youtube-player-clone/blob/1bb248b86d3771d1551d71b2ef30090fb38f8e69/src/components/Seekbar/index.tsx#L114) component in the following way:

We split hoveredThumbnailUrl to get the sprite name and the coordinates, like below:

```
hoveredThumbnailUrl = tears-of-steel-battle-clip-medium_sprite.jpg#xywh=0,0,200,83
/* After splitting we get: */
spriteName = tears-of-steel-battle-clip-medium_sprite.jpg
coords = 0,0,200,83
```

Next, we pass this component to the custom tooltip component that I implemented here, as the tooltip's content prop:

With everything set, let us see what our changes look like:
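The splitting step and the frame display can be sketched as two small functions. This is an illustration, not the project's FrameTooltip code. parseThumbnailCue and frameStyle are hypothetical names, and spriteBaseUrl stands in for wherever the sprite is hosted. The display technique shown is the classic CSS sprite approach: a fixed-size element whose background is the whole sprite, shifted by negative x/y offsets. It may differ from how the project actually renders the frame.

```typescript
interface SpriteFrame {
  spriteName: string;
  x: number;
  y: number;
  w: number;
  h: number;
}

// "sprite.jpg#xywh=200,0,200,83" -> { spriteName: "sprite.jpg", x: 200, y: 0, w: 200, h: 83 }
function parseThumbnailCue(cueText: string): SpriteFrame {
  const [spriteName, fragment] = cueText.trim().split("#xywh=");
  const coords = (fragment ?? "").split(",").map(Number);
  if (!spriteName || coords.length !== 4 || coords.some((n) => Number.isNaN(n))) {
    throw new Error(`Unexpected cue format: ${cueText}`);
  }
  const [x, y, w, h] = coords;
  return { spriteName, x, y, w, h };
}

// A style object of the kind you could pass to a React element's `style` prop:
// the element is the size of one frame, and the sprite is shifted so that
// only the requested frame is visible.
function frameStyle(frame: SpriteFrame, spriteBaseUrl: string) {
  return {
    width: `${frame.w}px`,
    height: `${frame.h}px`,
    backgroundImage: `url("${spriteBaseUrl}${frame.spriteName}")`,
    backgroundPosition: `${-frame.x}px ${-frame.y}px`,
    backgroundRepeat: "no-repeat",
  };
}
```

For the Img 2 cue, this yields a 200px by 83px box with backgroundPosition "-200px 0px", i.e. the second thumbnail in the first row of the sprite.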

Summary

Well, folks, we learned a lot in this blog post. To summarize:

We learned what image sprites and VTT files are.
We learned how to generate an image sprite from a video.
We created a custom VTT file that maps each second of the video to a frame in the sprite.
We also saw how to load the VTT into our existing project and extract its data.
We saw how to use the track element.

That’s all, folks! In the next blog post, we are going to implement the chapters functionality. So stay tuned!

The entire code for this tutorial can be found here.

Thank you for reading!

Follow me on Twitter, GitHub, and LinkedIn.
