Final Report - Video Editor Application, GSoC'22 @ OWASP

Basic overview of the project

The idea is to build a Video Editor: a web-based application that takes Markdown files, PDFs, or other asset files as input, separates the images from the text, and converts them into a video or presentation. In technical terms, the editor is built in React (or a similar framework) and accepts a file such as writeup.pdf. On the UI side, the user arranges the content into blocks. Each block contains text, a time interval in the video, and an image to show during that interval. After the user has created the blocks and submitted the data, the frontend calls backend REST APIs written in Django or Flask. The backend receives the blocks with their time intervals, images, and text, converts the text into speech, and combines everything into an MP4 or other video file that the user can download once the request is complete.
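To make the idea of a block concrete, here is a minimal sketch of how one could be modelled in Python. The class name, field names, and file names are illustrative assumptions, not the project's actual data model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    """One unit of the video: narration text shown with an image for a time span."""
    text: str
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    image_path: Optional[str] = None  # hypothetical field name

    @property
    def duration(self) -> float:
        return self.end - self.start


def validate_timeline(blocks: List[Block]) -> None:
    """Raise ValueError if a block has an invalid interval or overlaps another block."""
    ordered = sorted(blocks, key=lambda b: b.start)
    for block in ordered:
        if block.start < 0 or block.end <= block.start:
            raise ValueError(f"invalid interval: {block.start}-{block.end}")
    for previous, current in zip(ordered, ordered[1:]):
        if current.start < previous.end:
            raise ValueError("blocks overlap")


# Two example blocks that sit back to back on the timeline.
blocks = [
    Block("Introduction", 0.0, 4.0, "intro.png"),
    Block("How the exploit works", 4.0, 10.5, "diagram.png"),
]
validate_timeline(blocks)
total_duration = max(b.end for b in blocks)
```

Checking the intervals before any audio or video is rendered means a bad timeline is rejected cheaply, before the expensive processing starts.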

Work

During the Community Bonding period, I began working on a prototype of the project that described how it would look when completed. I had several meetings with my mentor during this period, in which we discussed the prototype and the tech stack required to complete the task. Django was new to me, but because I was already familiar with Python and with other backend technologies such as Node.js, it did not take long to get comfortable with it.

Following the Community Bonding period, I began coding the front end of the application, which we built with React. We divided the user interface into three parts: the side drawer, the bottom drawer, and the main editor.
The side drawer is the application's main navigator and action centre. The bottom drawer contains the timeline with the different video frames, and the main editor includes the preview of the generated video.

I spent the first few weeks entirely on the user interface, making it handle all user requests and making it responsive on all platforms. After completing the UI, I moved on to the central part of the application: the backend that drives it. I started by analysing the PDF and extracting its text and images, using PyPDF, a Python package, to extract the text. Later, after a few days of research, we came across a tool named gentlemen API, which solved our problem of creating the video file and took care of the video handling that had been a headache for us.
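The text-extraction step could look roughly like the sketch below. It assumes the third-party pypdf package (the current name of the PyPDF family) and its PdfReader class; the helper names and the paragraph-splitting rule are my own illustration, not the project's code.

```python
import re
from typing import List


def extract_pdf_text(path: str) -> str:
    """Return the text of every page in the PDF, joined by blank lines.

    Assumes the third-party `pypdf` package is installed; it is imported
    inside the function so the rest of the sketch runs without it.
    """
    from pypdf import PdfReader

    reader = PdfReader(path)
    return "\n\n".join(page.extract_text() or "" for page in reader.pages)


def split_into_paragraphs(text: str) -> List[str]:
    """Split on blank lines and collapse whitespace inside each paragraph."""
    chunks = re.split(r"\n\s*\n", text)
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]


# Stand-in for text that extract_pdf_text("writeup.pdf") might return.
sample = "First slide\nof the writeup.\n\n\nSecond slide.\n"
paragraphs = split_into_paragraphs(sample)
print(paragraphs)
```

Each resulting paragraph is a natural candidate for the text of one block in the editor.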

Once the backend functionality was complete, we needed a way for the frontend and backend to communicate, so we used the Django REST framework to build all of our APIs. Through these APIs, the frontend sends the data uploaded by the user to the backend for processing; once processing is complete, i.e. the video file has been generated, the backend sends it back so the user can download it in multiple formats.
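As a rough illustration of this exchange, the sketch below decodes a JSON request body and builds a reply. It is framework-agnostic and uses only the standard library rather than the Django REST framework; the payload shape, field names, and response keys are assumptions, not the project's actual API.

```python
import json
from typing import Any, Dict, List

REQUIRED_FIELDS = ("text", "start", "end")  # hypothetical field names


def parse_blocks_payload(body: str) -> List[Dict[str, Any]]:
    """Decode a JSON request body and check that every block has the required fields."""
    payload = json.loads(body)
    blocks = payload.get("blocks") if isinstance(payload, dict) else None
    if not isinstance(blocks, list) or not blocks:
        raise ValueError("payload must contain a non-empty 'blocks' list")
    for index, block in enumerate(blocks):
        missing = [name for name in REQUIRED_FIELDS if name not in block]
        if missing:
            raise ValueError(f"block {index} is missing {missing}")
    return blocks


# What the frontend might send after the user submits the blocks.
request_body = json.dumps({
    "blocks": [
        {"text": "Introduction", "start": 0, "end": 4, "image": "intro.png"},
        {"text": "Summary", "start": 4, "end": 9},
    ]
})
parsed = parse_blocks_payload(request_body)

# What the backend might reply with before the video is ready.
response = {"status": "accepted", "block_count": len(parsed)}
```

In a Django REST framework project this validation would more likely live in a serializer, with the view returning the response.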

Pull Requests

References

Text To Video: https://github.com/iJohnMaged/Text-To-Video-Py
Pyttsx3, for converting voice rate: https://github.com/nateshmbhat/pyttsx3
Tacotron by Yuxuan Wang: https://arxiv.org/abs/1703.10135
Pytesseract: https://github.com/madmaze/pytesserac
Data Extract: https://towardsdatascience.com/pdf-text-extraction-in-python-5b6ab9e92dd
Video Generation from Text by Yitong Li: https://arxiv.org/abs/1710.00421
VidGear, for video processing: https://pypi.org/project/vidgear/
Smart Video Generation: https://www.datatobiz.com/blog/smart-video-generation-from-text/
