
Kevin Lewis for Deepgram


Creating Contextual Video Overlays with TomScottPlus

The team behind TomScottPlus used Deepgram to analyze YouTube videos in real time and provide contextual overlays with Wikipedia links for further reading. I sat down with Gwendolen Sellers, Harry Langford, Maxwell Pettett, and Tim McGilly to ask them about their project.

Tom Scott is an English YouTuber who mostly makes videos about geography, history, science, technology, and linguistics. His style is 'talk to camera' as he explains various nerdy topics, sometimes with cutaways to other experts explaining a concept.

The team took their inspiration from Tom's YouTube experience, where he shares interesting facts that inspire watchers to learn more. As they talked about learning through YouTube videos, they all agreed that it was cumbersome to dig deeper into topics mentioned in the videos. They found themselves often pausing a video, opening a browser tab, and searching for a mentioned topic for further reading. That's how the idea for TomScottPlus was born. TomScottPlus is a Chrome extension that aims to make this as seamless as possible by overlaying clickable Wikipedia article links on the video as topics are mentioned.

A frame from a playing video with Tom speaking. On the left side is a purple pane with a link to Wikipedia article "Coins of the pound sterling" with a short page summary underneath.

When a viewer opens a YouTube video, the Chrome extension sends a request to a Python application, which downloads the audio and gets a high-quality transcript using the Deepgram Python SDK with the utterances feature.
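The utterances feature splits the transcript into timestamped segments, which is what lets the extension know *when* each topic is mentioned. The project's exact parsing code isn't shown in the article, but a minimal sketch of pulling segments out of a response might look like this (the `results.utterances` shape follows Deepgram's documented utterances output; the sample response below is illustrative, not real API output):

```python
def extract_utterances(response: dict) -> list[dict]:
    """Return start/end timestamps and text for each utterance in a
    Deepgram pre-recorded transcription response."""
    utterances = response.get("results", {}).get("utterances", [])
    return [
        {"start": u["start"], "end": u["end"], "text": u["transcript"]}
        for u in utterances
    ]

# Illustrative response fragment, trimmed to the fields used above.
sample = {
    "results": {
        "utterances": [
            {"start": 0.5, "end": 3.2, "transcript": "This is a fifty pence coin."},
            {"start": 3.4, "end": 6.1, "transcript": "It was minted in 1969."},
        ]
    }
}

segments = extract_utterances(sample)
```

Each segment's `start`/`end` pair can then be matched against the video's playback position so the overlay appears in sync with the speech.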

The Python application then performs basic Natural Language Processing to find contextually relevant words and looks for matching articles on Wikipedia. This takes several API requests per video, making it quite computationally expensive even with batching. The results are filtered by relevance and returned to the Chrome extension, which displays them as an overlay on the video.
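The article doesn't show the team's NLP pipeline, so here is a hypothetical stand-in for the two steps it describes: a naive heuristic that picks candidate topics out of a transcript, and a helper that batches them into a single MediaWiki API query (the `action=query` / `prop=extracts` parameters are real MediaWiki API features; the stopword list and heuristic are my own assumptions):

```python
import re
from urllib.parse import urlencode

# Leading words to strip from capitalized runs; purely illustrative.
STOPWORDS = {"The", "A", "An", "This", "That", "It"}

def candidate_topics(transcript: str) -> list[str]:
    """Very naive NLP stand-in: treat runs of two or more capitalized
    words as candidate Wikipedia topics, stripping leading stopwords
    and de-duplicating by order of first mention."""
    matches = re.findall(r"[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+", transcript)
    seen, topics = set(), []
    for match in matches:
        words = match.split()
        while words and words[0] in STOPWORDS:
            words = words[1:]
        topic = " ".join(words)
        if len(words) >= 2 and topic not in seen:
            seen.add(topic)
            topics.append(topic)
    return topics

def batch_lookup_url(topics: list[str]) -> str:
    """Build one MediaWiki API request covering up to 50 titles,
    keeping the number of HTTP round-trips down."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",     # page summaries via the TextExtracts extension
        "exintro": 1,           # intro section only
        "explaintext": 1,       # plain text instead of HTML
        "titles": "|".join(topics[:50]),
    }
    return "https://en.wikipedia.org/w/api.php?" + urlencode(params)

topics = candidate_topics(
    "The Royal Mint produces the coins of the Pound Sterling."
)
url = batch_lookup_url(topics)
```

A real pipeline would likely use proper noun-phrase extraction and score candidates for relevance before querying, but even this sketch shows why the step is expensive: every video yields many candidates, and each batch is still a network round-trip.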

You can check out the code for this project on GitHub.
