I just open-sourced a tool I’ve been working on called viddesc.
It’s a video analyzer that uses local extraction, then sends the data to Groq for the reasoning. It reads on-screen text, transcribes audio, and describes what's happening.
bossprogrammer275 / local-ai-video-analyzer
Analyze Videos locally and get an AI output
🎥 Local AI Video Analyzer (viddesc)
Local AI Video Analyzer is an open-source, lightweight, and quota-optimized engine for deep video understanding. It bridges the gap between local privacy/efficiency and powerful cloud-based reasoning.
By combining local OCR and audio extraction with Groq's high-speed Llama models, this tool provides a comprehensive analysis of any video—capturing visual scenes, on-screen text, and spoken dialogue—without breaking the bank.
🌟 Why This Project?
Most video AI tools are either purely cloud-based (expensive) or purely local (slow). viddesc uses a hybrid approach:
- Local Heavy Lifting: OCR and frame extraction happen on your machine.
- Turbocharged Reasoning: Groq handles the complex narrative merging in milliseconds.
- Privacy First: Temporary files are purged immediately after processing.
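The "Privacy First" point can be illustrated with a minimal sketch. This is not the project's actual code: `with_scratch_dir` and `fake_extract` are hypothetical names, and the extraction step is a stand-in that just writes a text file. It assumes all intermediate files live in one scratch directory, which Python's `tempfile.TemporaryDirectory` deletes as soon as the block exits, even if processing raises.

```python
import tempfile
from pathlib import Path
from typing import Callable, TypeVar

T = TypeVar("T")


def with_scratch_dir(work: Callable[[Path], T]) -> T:
    """Run `work` inside a temporary directory that is removed afterwards.

    Hypothetical helper: the directory and everything in it is deleted when
    the `with` block exits, whether `work` returns or raises.
    """
    with tempfile.TemporaryDirectory(prefix="viddesc_") as tmp:
        return work(Path(tmp))


def fake_extract(scratch: Path) -> dict:
    # Stand-in for frame/audio extraction: write an intermediate file,
    # then return only the derived text (plus the path, for the demo).
    frame = scratch / "frame_0001.txt"
    frame.write_text("hello from frame 1")
    return {"scratch": str(scratch), "ocr_text": frame.read_text()}


result = with_scratch_dir(fake_extract)
print(result["ocr_text"])
print(Path(result["scratch"]).exists())
```

Only the extracted text leaves the function; the scratch directory no longer exists by the time the result is used.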
✨ Key Features
- Hybrid Vision & OCR
  - Vision: Uses meta-llama/llama-4-scout-17b-16e-instruct on 5 keyframes for visual context.
  - Local OCR: High-frequency Tesseract OCR (1 fps) captures on-screen text with zero API…
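To make the two sampling rates concrete, here is a rough sketch of how the timestamps could be chosen. The README excerpt does not say how the 5 keyframes are picked, so the even spacing below is an assumption, and both function names are hypothetical. The code only computes timestamps; it does not read a video or call Tesseract.

```python
import math
from typing import List


def ocr_timestamps(duration_s: float, fps: float = 1.0) -> List[float]:
    """Timestamps (seconds) to sample for OCR at `fps` frames per second."""
    if duration_s <= 0 or fps <= 0:
        return []
    step = 1.0 / fps
    # Number of sample points that fall strictly before the end of the video.
    count = math.ceil(duration_s * fps)
    return [round(i * step, 3) for i in range(count)]


def keyframe_timestamps(duration_s: float, count: int = 5) -> List[float]:
    """`count` timestamps at the midpoints of equal-length segments.

    Assumption: evenly spaced keyframes; the real tool may choose differently.
    """
    if duration_s <= 0 or count <= 0:
        return []
    segment = duration_s / count
    return [round((i + 0.5) * segment, 3) for i in range(count)]


print(ocr_timestamps(3))        # [0.0, 1.0, 2.0]
print(keyframe_timestamps(10))  # [1.0, 3.0, 5.0, 7.0, 9.0]
```

For a 10-second clip this gives 10 local OCR samples but only 5 frames sent to the vision model, which is where the API savings would come from.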