moses omondi

Data factory for LLM video models

Hadithi is an open-source, Bash-based command-line tool that lets AI and ML developers convert YouTube, torrent, and enterprise videos into high-quality datasets for fine-tuning large language models (LLMs).
access source code

Top comments (1)

moses omondi

Hadithi automates video processing: it organizes and renames videos with timestamps, segments them into clips, detects scenes, removes audio if needed, filters out short videos, rescales and extracts frames, batches videos, validates image counts in folders, and creates videos from images at the correct frame rate.
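
Several of the steps listed above map onto standard FFmpeg invocations. The following is a minimal sketch of what such helpers might look like, not Hadithi's actual code: the function names, the `frame_%06d.png` and `clip_%03d.mp4` file patterns, and the `DRY_RUN` switch are all assumptions made for illustration, and `libx264` is assumed to be available in the local FFmpeg build.

```shell
#!/usr/bin/env bash
# Illustrative helpers only: function names, file patterns, and the DRY_RUN
# switch are assumptions for this sketch, not Hadithi's actual code.

# Print the command instead of running it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Split INPUT into clips of roughly SECONDS each without re-encoding
# (stream copy cuts on keyframes, so clip lengths are approximate).
# OUT_DIR must already exist.
segment_video() {  # usage: segment_video INPUT SECONDS OUT_DIR
  run ffmpeg -i "$1" -c copy -f segment -segment_time "$2" \
    -reset_timestamps 1 "$3/clip_%03d.mp4"
}

# Drop the audio track and copy the video stream unchanged.
strip_audio() {  # usage: strip_audio INPUT OUTPUT
  run ffmpeg -i "$1" -an -c:v copy "$2"
}

# Rescale to WIDTH pixels wide (aspect ratio kept, even height) and
# write FPS frames per second as numbered PNG files in OUT_DIR.
extract_frames() {  # usage: extract_frames INPUT WIDTH FPS OUT_DIR
  run ffmpeg -i "$1" -vf "scale=$2:-2,fps=$3" "$4/frame_%06d.png"
}

# Count the extracted frames directly inside DIR (non-recursive).
count_frames() {  # usage: count_frames DIR
  find "$1" -maxdepth 1 -type f -name 'frame_*.png' | wc -l | tr -d ' '
}

# Rebuild a video from numbered frames at FPS frames per second.
frames_to_video() {  # usage: frames_to_video FRAME_DIR FPS OUTPUT
  run ffmpeg -framerate "$2" -i "$1/frame_%06d.png" \
    -c:v libx264 -pix_fmt yuv420p "$3"
}
```

Setting `DRY_RUN=1` before calling a helper prints the FFmpeg command it would run, which makes it easy to check the arguments before processing a large folder. Scene detection and timestamp-based renaming are left out of this sketch.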

It is easy to use, open-source, and runs entirely on a CPU with minimal setup:

Developers point the tool at their dataset folder and, with a single command, start extracting structured datasets, a task that is usually time-consuming, expensive, and dependent on expert skill.
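
As a rough idea of what "pointing at a dataset folder" can involve, here is a small sketch that validates the folder and lists the videos in it. The function name `list_videos` and the set of file extensions are hypothetical choices for this example and are not taken from Hadithi.

```shell
#!/usr/bin/env bash
# Hypothetical helper; the name and the extension list are assumptions.

# Print every video file directly inside DIR, one path per line, sorted.
# Returns non-zero if DIR is not a directory.
list_videos() {  # usage: list_videos DIR
  local dir=$1
  if [ ! -d "$dir" ]; then
    echo "error: '$dir' is not a directory" >&2
    return 1
  fi
  find "$dir" -maxdepth 1 -type f \
    \( -iname '*.mp4' -o -iname '*.mkv' -o -iname '*.webm' \) | sort
}
```

A processing script could then loop over the result with `list_videos "$DATASET_DIR" | while IFS= read -r video; do ...; done`, where `DATASET_DIR` is whatever variable holds the folder path.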

The source code is written in Bash, which is lightweight and easy to understand. Developers can modify the source code to suit their needs. They can even use it to set up their own data foundry!

Unlike most video processing tools, it doesn't require a GPU. Anyone with a moderate CPU and sufficient storage can create thousands of videos.

Only Bash, FFmpeg, and ExifTool are required to set up the system. Sorry, Windows and macOS users: I developed the system on Ubuntu 18.04, but you can test it on your own operating system.
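
Before running anything, it can help to confirm that the required tools are on the `PATH`. This is a minimal sketch; the function name `missing_tools` is made up for this example, and it assumes the binaries are installed under their usual names `ffmpeg` and `exiftool`.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check; not part of Hadithi itself.

# Print the name of each listed tool that is not found on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {  # usage: missing_tools TOOL...
  local rc=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "$tool"
      rc=1
    fi
  done
  return "$rc"
}
```

For example, `missing_tools bash ffmpeg exiftool || echo "install the tools listed above first" >&2` prints any missing tool names followed by a hint.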
