Summary:
This project presents an end-to-end, fully automated Video Summarization and Multilingual Narration System, built using Java, integrated with Camunda Workflow Platform, and powered by a suite of Google Cloud Platform (GCP) AI Services. It is designed to showcase advanced cloud-native capabilities for intelligent media processing and is part of a broader academic and professional contribution aligned with NIW and IEEE publication objectives.
GitHub:
https://github.com/lalamanil/VideoSummarizedToAudioMultiLingual.git
Technologies Used:
Java (JDK 17)
Google Cloud Platform APIs:
Cloud Storage
Video Intelligence API
Vertex AI (Gemini)
Cloud Translation API
Text-to-Speech API
Service Usage API
Camunda Workflow Engine
System Overview
✅
Service Account Verification:
The solution starts by programmatically validating whether the following GCP services are enabled in the project that the given service account belongs to:
storage.googleapis.com (Cloud Storage),
videointelligence.googleapis.com (Video Intelligence API),
aiplatform.googleapis.com (Vertex AI),
translate.googleapis.com (Cloud Translation API),
texttospeech.googleapis.com (Text-to-Speech API).
This verification is performed with the Service Usage API. A user-friendly message is displayed if any API is not enabled or if the service account lacks the required IAM permissions (such as roles/serviceusage.serviceUsageViewer).
Once validation passes, the application follows a Camunda-driven service task workflow that starts the main logic. The system prompts the user for a local video file, which is uploaded to a preconfigured Cloud Storage bucket. The available target languages are then listed, and the user selects a preferred language for summarization and narration.
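The verification step described above can be sketched as a simple set comparison. This is a minimal illustration, not the project's code: the class name `RequiredServiceCheck` and the method `missingServices` are hypothetical, and the set of enabled services is assumed to have already been fetched from the Service Usage API (that call is left out so the example runs without cloud credentials).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: reports which required GCP services are not enabled. */
class RequiredServiceCheck {

    // Service name -> display name, in the order the pipeline uses them.
    static final Map<String, String> REQUIRED = new LinkedHashMap<>();
    static {
        REQUIRED.put("storage.googleapis.com", "Cloud Storage");
        REQUIRED.put("videointelligence.googleapis.com", "Video Intelligence API");
        REQUIRED.put("aiplatform.googleapis.com", "Vertex AI");
        REQUIRED.put("translate.googleapis.com", "Cloud Translation API");
        // Text-to-Speech is served from texttospeech.googleapis.com.
        REQUIRED.put("texttospeech.googleapis.com", "Text-to-Speech API");
    }

    /**
     * Builds one user-facing message per required service that is absent
     * from the given set of enabled service names.
     */
    static List<String> missingServices(Set<String> enabled) {
        List<String> messages = new ArrayList<>();
        for (Map.Entry<String, String> entry : REQUIRED.entrySet()) {
            if (!enabled.contains(entry.getKey())) {
                messages.add(entry.getValue() + " (" + entry.getKey()
                        + ") is not enabled for this project.");
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        // Assumption: in the real application this set would come from the Service Usage API.
        Set<String> enabled = Set.of("storage.googleapis.com", "aiplatform.googleapis.com");
        missingServices(enabled).forEach(System.out::println);
    }
}
```

An empty result means every required API is enabled and the workflow can move on to the upload step.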
☁️
Cloud-Based Media Pipeline
The processing pipeline includes:
Cloud Storage API - uploads the video file to the GCS bucket.
Video Intelligence API - extracts the full transcript from the video.
Vertex AI (Gemini model) - performs intelligent summarization of the transcript.
Cloud Translation API - translates the summary into the user-selected language.
Text-to-Speech API - generates high-quality audio narration in the target language.
The final narrated .mp3 file is stored both in the GCS bucket and on the local machine for convenient access.
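The order of the stages above can be sketched as a plain chain of calls, where each stage consumes the previous stage's output. This is an illustrative sketch only: the `CloudStages` interface, its method names, and the `StubStages` class are hypothetical and stand in for the real GCP client calls, which are not shown here.

```java
import java.nio.charset.StandardCharsets;

class NarrationPipelineSketch {

    /** Hypothetical seam for the five cloud calls; real implementations would wrap the GCP client libraries. */
    interface CloudStages {
        String upload(String localPath);                      // returns a gs:// URI
        String transcribe(String gcsUri);                     // Video Intelligence
        String summarize(String transcript);                  // Vertex AI (Gemini)
        String translate(String summary, String languageCode); // Cloud Translation
        byte[] synthesize(String text, String languageCode);  // Text-to-Speech
    }

    /** Runs the stages in pipeline order and returns the narrated audio bytes. */
    static byte[] run(CloudStages stages, String localVideoPath, String languageCode) {
        String gcsUri = stages.upload(localVideoPath);
        String transcript = stages.transcribe(gcsUri);
        String summary = stages.summarize(transcript);
        String translated = stages.translate(summary, languageCode);
        return stages.synthesize(translated, languageCode);
    }

    /** Stub that only tags its input, so the data flow can be checked without cloud access. */
    static class StubStages implements CloudStages {
        public String upload(String localPath) { return "gs://example-bucket/" + localPath; }
        public String transcribe(String gcsUri) { return "transcript(" + gcsUri + ")"; }
        public String summarize(String transcript) { return "summary(" + transcript + ")"; }
        public String translate(String summary, String languageCode) { return languageCode + ":" + summary; }
        public byte[] synthesize(String text, String languageCode) {
            return ("audio[" + languageCode + "] " + text).getBytes(StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) {
        byte[] audio = run(new StubStages(), "demo.mp4", "es");
        System.out.println(new String(audio, StandardCharsets.UTF_8));
    }
}
```

Keeping the cloud calls behind one interface makes it possible to dry-run the flow locally before any API is billed.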
This Java-based architecture enforces IAM-driven role validation (e.g., roles/storage.admin, roles/aiplatform.user, roles/cloudtranslate.user) and guides the user in configuring essential properties, such as the service account file, project ID, and bucket name, via constants in the ApplicationConstants interface.
🔄
Camunda Workflow:
Manages execution flow using Camunda BPM.
Triggers service tasks based on the validation and input stages.
The application embodies the fusion of cloud AI, workflow automation, and language accessibility - a powerful combination demonstrating how cloud-native design can be harnessed to solve real-world multimedia challenges at scale.
Architecture:
[Application architecture flow]
Camunda Workflow:
[Java based camunda workflow for automated pipeline]
Setup Instructions:
Create a service account and grant it the roles below.
[Provide roles for service account]
Place your service account key file in src/main/resources, then copy its file name and configure it in ApplicationConstants.java.
Step 1:
Step 2:
Update the fields below in ApplicationConstants.java.
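The actual fields live in the repository's ApplicationConstants.java. As a rough illustration of the idea, the sketch below uses a stand-in interface with three placeholder constants and a small check that reports which ones have not been filled in yet. Every name here (`ApplicationConstantsSketch`, `SERVICE_ACCOUNT_FILE`, `PROJECT_ID`, `BUCKET_NAME`, `ConfigSanityCheck`) is an assumption made for the example and may differ from the project's own.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the project's ApplicationConstants; real field names may differ. */
interface ApplicationConstantsSketch {
    String SERVICE_ACCOUNT_FILE = "your-service-account.json"; // file placed in src/main/resources
    String PROJECT_ID = "your-project-id";
    String BUCKET_NAME = "your-bucket-name";
}

class ConfigSanityCheck {

    /** Returns the names of fields that are null, blank, or still hold a "your-" placeholder. */
    static List<String> unsetFields(Map<String, String> fields) {
        List<String> unset = new ArrayList<>();
        for (Map.Entry<String, String> entry : fields.entrySet()) {
            String value = entry.getValue();
            if (value == null || value.isBlank() || value.startsWith("your-")) {
                unset.add(entry.getKey());
            }
        }
        return unset;
    }

    /** Collects the sketch constants into a name -> value map. */
    static Map<String, String> currentConfig() {
        Map<String, String> config = new LinkedHashMap<>();
        config.put("SERVICE_ACCOUNT_FILE", ApplicationConstantsSketch.SERVICE_ACCOUNT_FILE);
        config.put("PROJECT_ID", ApplicationConstantsSketch.PROJECT_ID);
        config.put("BUCKET_NAME", ApplicationConstantsSketch.BUCKET_NAME);
        return config;
    }

    public static void main(String[] args) {
        List<String> unset = unsetFields(currentConfig());
        if (!unset.isEmpty()) {
            System.out.println("Please set these fields before running: " + unset);
        }
    }
}
```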
How to Run APP:
Either run App.java from the Eclipse IDE, or open a command prompt and run mvn package, which generates the JAR file under the target folder.
Then run java -jar
Output Sample:
Author: Anil Lalam