ANIL LALAM
AI-Powered Video Summarization and Multilingual Narration Using Google Cloud Platform APIs and Camunda Workflow Automation

Summary:

This project presents an end-to-end, fully automated Video Summarization and Multilingual Narration System, built using Java, integrated with Camunda Workflow Platform, and powered by a suite of Google Cloud Platform (GCP) AI Services. It is designed to showcase advanced cloud-native capabilities for intelligent media processing and is part of a broader academic and professional contribution aligned with NIW and IEEE publication objectives.

GitHub:

https://github.com/lalamanil/VideoSummarizedToAudioMultiLingual.git

Technologies Used:

Java (JDK 17)

Google Cloud Platform APIs:

  1. Cloud Storage

  2. Video Intelligence API

  3. Vertex AI (Gemini)

  4. Cloud Translation API

  5. Text-to-Speech API

  6. Service Usage API

Camunda Workflow Engine

System Overview

Service Account Verification:

The solution starts by programmatically checking whether the following GCP services are enabled in the project associated with a given service account:

storage.googleapis.com (Cloud Storage),

videointelligence.googleapis.com (Video Intelligence API),

aiplatform.googleapis.com (Vertex AI),

translate.googleapis.com (Cloud Translation API),

texttospeech.googleapis.com (Text-to-Speech API).

This verification uses the Service Usage API. User-friendly messages are displayed if any API is not enabled or if the service account lacks the required IAM permissions (such as roles/serviceusage.serviceUsageViewer).

Once validation passes, a Camunda-driven service task workflow starts the main logic. The system prompts the user for a local video file, which is uploaded to a preconfigured Cloud Storage bucket. The available target languages are then listed, and the user selects a preferred language for summarization and narration.
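The service check described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name RequiredServiceCheck and its method are hypothetical, and the set of enabled services is assumed to have been fetched from the Service Usage API beforehand. The Text-to-Speech API appears under its service name, texttospeech.googleapis.com.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: reports which required GCP services are not enabled. */
final class RequiredServiceCheck {

    // Service name -> display name, in the order the services are listed above.
    static final Map<String, String> REQUIRED = new LinkedHashMap<>();
    static {
        REQUIRED.put("storage.googleapis.com", "Cloud Storage");
        REQUIRED.put("videointelligence.googleapis.com", "Video Intelligence API");
        REQUIRED.put("aiplatform.googleapis.com", "Vertex AI");
        REQUIRED.put("translate.googleapis.com", "Cloud Translation API");
        REQUIRED.put("texttospeech.googleapis.com", "Text-to-Speech API");
    }

    /** Returns one user-friendly message per required service missing from enabledServices. */
    static List<String> missingServiceMessages(Set<String> enabledServices) {
        List<String> messages = new ArrayList<>();
        for (Map.Entry<String, String> entry : REQUIRED.entrySet()) {
            if (!enabledServices.contains(entry.getKey())) {
                messages.add(entry.getValue() + " (" + entry.getKey()
                        + ") is not enabled for this project.");
            }
        }
        return messages;
    }

    public static void main(String[] args) {
        // Assumption: this set was already retrieved via the Service Usage API.
        Set<String> enabled = Set.of("storage.googleapis.com", "aiplatform.googleapis.com");
        missingServiceMessages(enabled).forEach(System.out::println);
    }
}
```

Keeping the comparison separate from the API call means the messaging logic can be exercised without credentials; an empty result would be the signal to let the workflow proceed.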

☁️ Cloud-Based Media Pipeline

The processing pipeline includes:

Cloud Storage API - uploads the video file to the GCS bucket.

Video Intelligence API - extracts full transcripts from the video.

Vertex AI (Gemini model) - performs intelligent summarization of the transcripts.

Cloud Translation API - translates the summary into the user-selected language.

Text-to-Speech API - generates high-quality audio narration in the target language.

The final narrated .mp3 file is stored both in the GCS bucket and on the local machine for convenient access.
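To make the data flow between these steps concrete, here is a small sketch of how the stages might be chained once the video is in the bucket. Everything named here (MediaServices, NarrationPipeline, EchoServices) is hypothetical and not taken from the repository; the stub only tags each step so the order is visible, whereas a real implementation would call the corresponding GCP client in each method.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical seams for the four AI steps; real code would call the GCP clients. */
interface MediaServices {
    String transcribe(String gcsUri);                     // Video Intelligence API
    String summarize(String transcript);                  // Vertex AI (Gemini)
    String translate(String text, String languageCode);   // Cloud Translation API
    byte[] synthesize(String text, String languageCode);  // Text-to-Speech API
}

final class NarrationPipeline {
    private final MediaServices services;

    NarrationPipeline(MediaServices services) {
        this.services = services;
    }

    /** Builds the gs:// URI for an object stored in a bucket. */
    static String gcsUri(String bucket, String objectName) {
        return "gs://" + bucket + "/" + objectName;
    }

    /** Runs transcript -> summary -> translation -> audio for an uploaded video. */
    byte[] narrate(String bucket, String objectName, String languageCode) {
        String transcript = services.transcribe(gcsUri(bucket, objectName));
        String summary = services.summarize(transcript);
        String translated = services.translate(summary, languageCode);
        return services.synthesize(translated, languageCode);
    }
}

/** Stub that tags each step so the flow can be inspected without calling GCP. */
final class EchoServices implements MediaServices {
    public String transcribe(String gcsUri) { return "transcript(" + gcsUri + ")"; }
    public String summarize(String transcript) { return "summary(" + transcript + ")"; }
    public String translate(String text, String languageCode) { return languageCode + ":" + text; }
    public byte[] synthesize(String text, String languageCode) {
        return text.getBytes(StandardCharsets.UTF_8); // stand-in for MP3 bytes
    }
}
```

In this sketch the same language code is passed to both translation and speech synthesis for simplicity; that is an assumption, and real code may need to map between the code formats each API expects.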

This Java-based architecture enforces IAM-driven role validation (e.g., roles/storage.admin, roles/aiplatform.user, roles/cloudtranslate.user) and guides the user in configuring essential properties such as the service account file, project ID, and bucket name via constants in the ApplicationConstants interface.

🔄 Camunda Workflow:

  • Manages execution flow using Camunda BPM.

  • Triggers service tasks based on the validation and input stages.

The application combines cloud AI, workflow automation, and language accessibility, showing how a cloud-native design can automate multimedia processing from upload to narrated summary.

Architecture:
[Application architecture flow]

Camunda Workflow:
[Java based camunda workflow for automated pipeline]

Setup Instructions:

Create a service account and grant it the roles below.

[Provide roles for service account]

Place your service account key file in src/main/resources, then copy its file name and configure it in ApplicationConstants.java.

[Step 1]

[Step 2]

Update the fields below in ApplicationConstants.java.
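As a rough sketch of what such a constants interface might look like, assuming only the three properties named earlier (service account file, project ID, bucket name): the interface name ApplicationConstantsSketch, the constant names, and the placeholder values are all hypothetical, so check the repository for the real field names. The small ConfigCheck helper is likewise an illustration of how unfilled values could be caught before the workflow starts.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shape of the constants; the real field names live in the repository. */
interface ApplicationConstantsSketch {
    String SERVICE_ACCOUNT_FILE = "your-service-account.json"; // file in src/main/resources
    String PROJECT_ID = "your-project-id";
    String BUCKET_NAME = "your-bucket-name";
}

final class ConfigCheck {

    /** Returns the settings that are blank or still hold a "your-" placeholder. */
    static List<String> unconfigured(String serviceAccountFile, String projectId,
                                     String bucketName) {
        List<String> problems = new ArrayList<>();
        if (isPlaceholder(serviceAccountFile)) problems.add("service account file");
        if (isPlaceholder(projectId)) problems.add("project ID");
        if (isPlaceholder(bucketName)) problems.add("bucket name");
        return problems;
    }

    private static boolean isPlaceholder(String value) {
        return value == null || value.isBlank() || value.startsWith("your-");
    }

    public static void main(String[] args) {
        List<String> problems = unconfigured(
                ApplicationConstantsSketch.SERVICE_ACCOUNT_FILE,
                ApplicationConstantsSketch.PROJECT_ID,
                ApplicationConstantsSketch.BUCKET_NAME);
        System.out.println(problems.isEmpty()
                ? "Configuration looks complete."
                : "Please set: " + problems);
    }
}
```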

How to Run APP:

Either run App.java from the Eclipse IDE, or open a command prompt and run mvn package, which generates the jar file under the target folder.

Then run the generated jar file with java -jar.

Output Sample:

Author: Anil Lalam
