VoxEdit AI – A Conversational Video Editing Agent with Gemini and Google Cloud

#googlecloud #gemini #ai #showdev

VoxEdit AI is a conversational video editing agent: users edit videos by giving natural language commands instead of working through complex editing tools. The goal of the project is to simplify video editing by letting creators describe what they want to an AI assistant, which interprets their intent and performs the edits automatically.

This project was built using Google’s Gemini multimodal AI models combined with Google Cloud infrastructure to create a scalable AI-powered editing pipeline.

How VoxEdit AI Works

The system allows users to upload a video clip and give editing commands such as trimming, adding sound effects, or generating audio responses. Instead of manually editing timelines, the user simply tells the AI what they want to change.

The workflow of VoxEdit AI is:

1. The user uploads a video through the frontend interface.
2. The backend processes the video and stores it temporarily for analysis.
3. Frames and contextual information from the video are analyzed using Gemini AI.
4. Gemini interprets the user’s natural language instruction and generates an editing plan.
5. The backend executes the plan using FFmpeg video processing tools.
6. The processed video is returned to the user.
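The frame-analysis step in this workflow needs a small set of still frames that the model can inspect. The post does not describe how frames are chosen, so the following is only a minimal sketch under an assumed strategy: sample evenly spaced timestamps and extract one JPEG per timestamp with FFmpeg. The helper names `sample_timestamps` and `frame_extract_cmds`, and the `frame_NNN.jpg` naming, are hypothetical; `-ss`, `-i`, `-frames:v`, and `-q:v` are standard FFmpeg options. Sending the frames to Gemini together with the user's instruction is not shown.

```python
import shlex
from pathlib import Path


def sample_timestamps(duration_s: float, count: int) -> list[float]:
    """Return `count` timestamps (seconds) spread evenly across the clip.

    Hypothetical helper, not from the VoxEdit AI source. Each timestamp sits
    at the centre of one of `count` equal slices, so the samples avoid the
    very first and very last frame of the video.
    """
    if duration_s <= 0 or count <= 0:
        raise ValueError("duration_s and count must be positive")
    step = duration_s / count
    return [round(step * (i + 0.5), 3) for i in range(count)]


def frame_extract_cmds(src: str, out_dir: str, duration_s: float,
                       count: int) -> list[list[str]]:
    """Build one FFmpeg argument list per sampled frame (hypothetical helper).

    Placing -ss before -i seeks the input quickly; -frames:v 1 writes a
    single image and -q:v 2 requests high JPEG quality.
    """
    cmds = []
    for i, t in enumerate(sample_timestamps(duration_s, count)):
        out = str(Path(out_dir) / f"frame_{i:03d}.jpg")
        cmds.append(["ffmpeg", "-y", "-ss", f"{t}", "-i", src,
                     "-frames:v", "1", "-q:v", "2", out])
    return cmds


if __name__ == "__main__":
    for cmd in frame_extract_cmds("upload.mp4", "frames", duration_s=12.0, count=4):
        # Printed for inspection; a backend would run subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Evenly spaced sampling is just one option; scene-change detection would catch cuts that fixed intervals can miss, at the cost of extra processing.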

Technology Stack

The system was built using the following technologies:
Gemini AI for multimodal reasoning and command interpretation
FastAPI for the backend API
FFmpeg for video editing operations
React for the frontend interface
Google Cloud Run for scalable backend deployment
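Google Cloud Run runs the backend as a container and automatically scales the number of instances with incoming request volume, down to zero when the service is idle, so no server has to be provisioned in advance for peak load.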

Architecture Overview

The architecture of VoxEdit AI includes:

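User Interface (React) → FastAPI Backend → Gemini AI Agent → Video Processing Engine (FFmpeg)

The backend, including the Gemini agent logic and the FFmpeg processing, is deployed on Google Cloud Run, so Cloud Run is the hosting layer for these stages rather than a final step in the request flow.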

This architecture enables the AI agent to understand user instructions and convert them into executable editing operations.
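One way a backend could turn the agent's reply into executable editing operations is sketched below. It is an illustration under assumptions, not VoxEdit AI's actual code: the post does not publish the plan format, so the sketch assumes Gemini is prompted to reply with JSON such as `{"operations": [{"type": "trim", "start": 2.0, "end": 8.5}]}`. The schema, the operation names `trim` and `add_sfx`, the intermediate file names, and `plan_to_commands` are all hypothetical; the FFmpeg options and the `adelay` and `amix` audio filters are standard.

```python
import json

# Hypothetical plan format (not from the VoxEdit AI source):
# {"operations": [{"type": "trim", "start": 2.0, "end": 8.5},
#                 {"type": "add_sfx", "file": "whoosh.wav", "at": 1.0}]}


def _trim(op: dict, src: str, dst: str) -> list[str]:
    """Keep only the [start, end] range, in seconds."""
    start, end = float(op["start"]), float(op["end"])
    if not 0 <= start < end:
        raise ValueError(f"invalid trim range: {start}..{end}")
    # -ss/-to placed after -i act as output options: accurate cut, re-encoded.
    return ["ffmpeg", "-y", "-i", src, "-ss", str(start), "-to", str(end), dst]


def _add_sfx(op: dict, src: str, dst: str) -> list[str]:
    """Overlay a sound-effect file starting `at` seconds into the clip."""
    at_ms = int(float(op["at"]) * 1000)
    if at_ms < 0:
        raise ValueError("sound effect offset must be non-negative")
    # adelay takes per-channel delays in milliseconds; amix mixes the delayed
    # effect into the original audio, and duration=first keeps the clip length.
    graph = (f"[1:a]adelay={at_ms}|{at_ms}[sfx];"
             "[0:a][sfx]amix=inputs=2:duration=first[aout]")
    # Video is copied untouched; only the mixed audio track is re-encoded.
    return ["ffmpeg", "-y", "-i", src, "-i", str(op["file"]),
            "-filter_complex", graph, "-map", "0:v", "-map", "[aout]",
            "-c:v", "copy", dst]


BUILDERS = {"trim": _trim, "add_sfx": _add_sfx}


def plan_to_commands(reply_text: str, src: str, dst: str) -> list[list[str]]:
    """Validate a JSON plan and return one FFmpeg argv per operation, in order."""
    plan = json.loads(reply_text)  # malformed JSON raises a ValueError subclass
    ops = plan.get("operations") if isinstance(plan, dict) else None
    if not isinstance(ops, list) or not ops:
        raise ValueError("plan must contain a non-empty 'operations' list")
    cmds, current = [], src
    for i, op in enumerate(ops):
        builder = BUILDERS.get(op.get("type")) if isinstance(op, dict) else None
        if builder is None:
            raise ValueError(f"unsupported operation: {op!r}")
        # Chain steps through intermediate files; the last step writes dst.
        out = dst if i == len(ops) - 1 else f"step_{i}.mp4"
        cmds.append(builder(op, current, out))
        current = out
    return cmds


if __name__ == "__main__":
    reply = ('{"operations": [{"type": "trim", "start": 2, "end": 8.5},'
             ' {"type": "add_sfx", "file": "whoosh.wav", "at": 1.0}]}')
    for cmd in plan_to_commands(reply, "upload.mp4", "edited.mp4"):
        print(cmd)  # in a backend: subprocess.run(cmd, check=True)
```

Building argument lists rather than shell strings, and rejecting any operation type not in `BUILDERS`, keeps an unexpected model reply from turning into an arbitrary command. The `add_sfx` step assumes the source video has an audio track, since the `[0:a]` input label fails otherwise.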

Conclusion

VoxEdit AI demonstrates how multimodal AI agents can transform traditional creative workflows. By combining natural language interaction with video processing and cloud infrastructure, the project shows how AI can simplify complex tasks like video editing.

This project was created for the #GeminiLiveAgentChallenge hackathon to explore the capabilities of Google’s Gemini models and Google Cloud in building next-generation AI agents.

DEV Community
