<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sam Ben</title>
    <description>The latest articles on DEV Community by Sam Ben (@sam_ben_786ddbe69e5992835).</description>
    <link>https://dev.to/sam_ben_786ddbe69e5992835</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2022696%2F42ee375c-3d17-43c3-af2d-3854bdecc371.jpg</url>
      <title>DEV Community: Sam Ben</title>
      <link>https://dev.to/sam_ben_786ddbe69e5992835</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sam_ben_786ddbe69e5992835"/>
    <language>en</language>
    <item>
      <title>Auto-Vid: Serverless Video Processing Platform built for the AWS Lambda Hackathon</title>
      <dc:creator>Sam Ben</dc:creator>
      <pubDate>Wed, 02 Jul 2025 18:50:52 +0000</pubDate>
      <link>https://dev.to/sam_ben_786ddbe69e5992835/auto-vid-serverless-video-processing-platform-built-for-the-aws-lambda-hackathon-3c0g</link>
      <guid>https://dev.to/sam_ben_786ddbe69e5992835/auto-vid-serverless-video-processing-platform-built-for-the-aws-lambda-hackathon-3c0g</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a copy of my original participation in the DevPost AWS Lambda Hackathon: &lt;a href="https://devpost.com/software/auto-vid-serverless-video-processing" rel="noopener noreferrer"&gt;https://devpost.com/software/auto-vid-serverless-video-processing&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;  &lt;iframe src="https://www.youtube.com/embed/f2SbedesG5Y"&gt;
  &lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  🎬 The Inspiration
&lt;/h2&gt;

&lt;p&gt;As a developer who's spent countless hours manually editing videos for side projects, I was frustrated by the repetitive nature of adding voiceovers, background music, and sound effects. Every marketing team I knew was struggling with the same "content treadmill" - needing to produce 5-10 videos per week but lacking the time or budget for professional editing.&lt;/p&gt;

&lt;p&gt;The breakthrough moment came when I realized that most video editing follows predictable patterns: add a voiceover at specific timestamps, duck the background music during speech, insert sound effects at key moments. This seemed perfect for automation, but existing solutions were either too expensive or required complex video editing skills.&lt;/p&gt;

&lt;p&gt;I wanted to create something that could transform a simple JSON specification into a professionally edited video - making video production as easy as writing a configuration file.&lt;/p&gt;

&lt;h2&gt;
  🎯 What it does
&lt;/h2&gt;

&lt;p&gt;Auto-Vid transforms video creation from a manual, time-consuming process into an automated workflow. Users submit a simple JSON specification describing their video requirements - the base video file, background music, voiceover text, and sound effects with precise timing. The platform then automatically generates a professionally edited video with AI-powered text-to-speech, intelligent audio mixing (including automatic ducking of background music during speech), crossfading between music tracks, and synchronized sound effects. The entire process happens serverlessly on AWS, scaling from zero to hundreds of concurrent video processing jobs, with results delivered via secure download URLs and optional webhook notifications.&lt;/p&gt;
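&lt;p&gt;To make that concrete, a submission might look like the following. The field names (&lt;code&gt;assets&lt;/code&gt;, &lt;code&gt;timeline&lt;/code&gt;, &lt;code&gt;duckingLevel&lt;/code&gt;) mirror those used in the processing code later in this post, but the exact schema shown here is an illustration, not Auto-Vid's published format:&lt;/p&gt;

```python
import json

# Hypothetical job spec; field names follow the processing pipeline code
# (assets, timeline, duckingLevel), but the full schema is illustrative.
job_spec = {
    "assets": {
        "video": {"source": "s3://my-bucket/inputs/demo.mp4"},
        "audio": [{"id": "bgm", "source": "s3://my-bucket/music/theme.mp3"}],
    },
    "backgroundMusic": {"playlist": ["bgm"], "volume": 0.8},
    "timeline": [
        {
            "type": "tts",
            "start": 2.0,
            "data": {
                "text": "Welcome to the product demo.",
                "voice": "Joanna",
                # Duck background music to 25% while this clip plays,
                # with a half-second fade on each side.
                "duckingLevel": 0.25,
                "duckingFadeDuration": 0.5,
            },
        },
    ],
}

print(json.dumps(job_spec, indent=2))
```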

&lt;h2&gt;
  🛠️ How I Built It
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vyyzainuhpny7ba2wo4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0vyyzainuhpny7ba2wo4.png" alt="auto-vid aws architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Architecture Decision&lt;/strong&gt;: I chose a fully serverless approach to handle unpredictable workloads - from zero videos per day to hundreds during peak times. The architecture uses three main components:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;API Layer&lt;/strong&gt; (Lambda + API Gateway): Lightweight functions for job submission and status checking&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Processing Engine&lt;/strong&gt; (Lambda Container): Heavy-duty video processing with MoviePy and AWS Polly&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storage &amp;amp; Orchestration&lt;/strong&gt; (S3 + SQS + DynamoDB): Managed storage with reliable job queuing&lt;/li&gt;
&lt;/ol&gt;
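&lt;p&gt;The API layer's job-submission function can be sketched roughly as below. The resource names and response shape are my assumptions - only the validate, record, enqueue pattern comes from the architecture above:&lt;/p&gt;

```python
import json
import time
import uuid

# Sketch of the job-submission Lambda (API layer). The clients are passed
# in (e.g. boto3.client("sqs") and a DynamoDB Table resource) so the
# function stays easy to test; names here are illustrative.
def submit_job(event, sqs, table, queue_url):
    """Validate the request, record the job, and enqueue it for processing."""
    spec = json.loads(event["body"])
    if "assets" not in spec or "timeline" not in spec:
        return {"statusCode": 400, "body": json.dumps({"error": "invalid spec"})}

    job_id = str(uuid.uuid4())
    # The DynamoDB row lets the status endpoint answer without touching the queue.
    table.put_item(Item={"jobId": job_id, "status": "QUEUED",
                         "submittedAt": int(time.time())})
    # SQS decouples the lightweight API from the heavy container worker.
    sqs.send_message(QueueUrl=queue_url,
                     MessageBody=json.dumps({"jobId": job_id, "spec": spec}))
    return {"statusCode": 202, "body": json.dumps({"jobId": job_id})}
```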

&lt;p&gt;&lt;strong&gt;Development Workflow&lt;/strong&gt;: Local development was tricky since video processing requires the full AWS environment. I created a hybrid approach:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Individual components (TTS generation, S3 upload, webhooks) can be tested locally&lt;/li&gt;
&lt;li&gt;Full integration testing requires AWS deployment&lt;/li&gt;
&lt;li&gt;SAM handles the complex container build and ECR management automatically&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Key Technical Implementation&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Core video processing pipeline
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_process_video_internal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job_spec&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job_temp_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;start_time&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 1. Download assets from S3
&lt;/span&gt;    &lt;span class="n"&gt;audio_assets&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_download_audio_assets&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;audio&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job_temp_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;video_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asset_manager&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;download_asset&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;assets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;source&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job_temp_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 2. Load video and get duration
&lt;/span&gt;    &lt;span class="n"&gt;video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;VideoFileClip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;video_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;video_duration&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;

    &lt;span class="c1"&gt;# 3. Generate TTS and audio clips for timeline events
&lt;/span&gt;    &lt;span class="n"&gt;audio_clips&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;ducking_ranges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;enumerate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;job_spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;timeline&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;type&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;tts&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;clip&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_create_tts_clip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;index&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;job_temp_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;audio_clips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duckingLevel&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;ducking_ranges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;start&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;clip&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ducking_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duckingLevel&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fade_duration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;duckingFadeDuration&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;})&lt;/span&gt;

    &lt;span class="c1"&gt;# 4. Create background music with crossfading
&lt;/span&gt;    &lt;span class="n"&gt;background_music&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_create_background_music&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;job_spec&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backgroundMusic&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;audio_assets&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;video_duration&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 5. Apply audio ducking during speech
&lt;/span&gt;    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;background_music&lt;/span&gt; &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="n"&gt;ducking_ranges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;background_music&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_apply_ducking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;background_music&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ducking_ranges&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# 6. Composite final video
&lt;/span&gt;    &lt;span class="n"&gt;all_audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;background_music&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;background_music&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="n"&gt;all_audio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;extend&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audio_clips&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;final_audio&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CompositeAudioClip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;all_audio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;final_video&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;video&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_audio&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_audio&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;final_video&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Infrastructure as Code&lt;/strong&gt;: Everything is defined in a single SAM template that creates:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Lambda functions with proper IAM roles&lt;/li&gt;
&lt;li&gt;S3 bucket with organized folder structure&lt;/li&gt;
&lt;li&gt;SQS queue for reliable job processing&lt;/li&gt;
&lt;li&gt;DynamoDB table for status tracking&lt;/li&gt;
&lt;li&gt;API Gateway endpoints with CORS support&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  🚧 Challenges Faced
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Lambda Memory Limits&lt;/strong&gt;: The biggest surprise was discovering that many AWS accounts are capped at roughly 3GB (3,008 MB) of Lambda memory by default. Video processing needs significantly more - I configured 10GB, Lambda's maximum, for optimal performance. This required users to request quota increases through AWS Support, which I documented thoroughly in the deployment guide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Container Size Optimization&lt;/strong&gt;: My initial Docker image was 800MB, which caused slow cold starts. I implemented multi-stage builds, removed unnecessary dependencies, and optimized the Python environment to get the image down to 360MB while maintaining full functionality.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Audio Synchronization&lt;/strong&gt;: Getting perfect audio ducking was surprisingly complex. Background music needs to fade down smoothly when speech starts, hold the lower volume for the entire speech clip, then fade back up. I developed a custom algorithm that merges overlapping ducking ranges and then applies the volume changes, with or without fades:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_apply_ducking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;background_music&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ducking_ranges&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Apply ducking to background music based on ranges&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="c1"&gt;# Sort ranges by start time
&lt;/span&gt;    &lt;span class="n"&gt;ducking_ranges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;sort&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

    &lt;span class="c1"&gt;# Merge overlapping ranges
&lt;/span&gt;    &lt;span class="n"&gt;merged_ranges&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;ducking_ranges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;current_range&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ducking_ranges&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;next_range&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;ducking_ranges&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:]:&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;next_range&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="n"&gt;current_range&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="c1"&gt;# Ranges overlap, merge them
&lt;/span&gt;                &lt;span class="n"&gt;current_range&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_range&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;next_range&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
                &lt;span class="c1"&gt;# Use more aggressive ducking level (lower value)
&lt;/span&gt;                &lt;span class="n"&gt;current_range&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ducking_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;min&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;current_range&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ducking_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;next_range&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ducking_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="c1"&gt;# Use longer fade duration
&lt;/span&gt;                &lt;span class="n"&gt;current_range&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fade_duration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;max&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;current_range&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fade_duration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;next_range&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fade_duration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# No overlap, add current range and start new one
&lt;/span&gt;                &lt;span class="n"&gt;merged_ranges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_range&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="n"&gt;current_range&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;next_range&lt;/span&gt;
        &lt;span class="n"&gt;merged_ranges&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;current_range&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="c1"&gt;# Apply ducking for each merged range
&lt;/span&gt;    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;range_info&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;merged_ranges&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;range_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fade_duration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Apply fade effects when fade duration is specified
&lt;/span&gt;            &lt;span class="n"&gt;background_music&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;background_music&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_effects&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="n"&gt;afx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AudioFadeIn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;range_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fade_duration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
                    &lt;span class="n"&gt;afx&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AudioFadeOut&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;range_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;fade_duration&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]),&lt;/span&gt;
                &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;with_volume_scaled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;range_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ducking_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;range_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;range_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Apply instant volume change when fade duration is 0
&lt;/span&gt;            &lt;span class="n"&gt;background_music&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;background_music&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;with_volume_scaled&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;range_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ducking_level&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;range_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;start&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;range_info&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;end&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;background_music&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Error Handling Across Distributed Components&lt;/strong&gt;: With multiple Lambda functions, S3 operations, and external webhook calls, failure scenarios were complex. I implemented comprehensive retry logic, dead letter queues for failed jobs, and detailed error reporting that helps users understand what went wrong and how to fix it.&lt;/p&gt;
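&lt;p&gt;The retry side of this can be sketched as a small helper (illustrative - the post does not show Auto-Vid's actual implementation). Letting the final failure propagate is what allows SQS to make the message visible again and, once the maximum receive count is exhausted, route it to the dead letter queue:&lt;/p&gt;

```python
import time

def with_retries(fn, attempts=3, base_delay=1.0):
    """Run fn, retrying transient failures with exponential backoff.

    Illustrative sketch: after the last attempt the exception propagates,
    so the SQS message is redelivered and eventually lands in the dead
    letter queue once its receive count runs out.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            # Back off 1s, 2s, 4s, ... before the next try.
            time.sleep(base_delay * (2 ** attempt))
```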

&lt;p&gt;&lt;strong&gt;Empty Timeline Support&lt;/strong&gt;: A late addition was supporting videos with just background music (empty timeline). This seemed simple but required refactoring the entire processing pipeline to handle the edge case gracefully while maintaining all the audio mixing capabilities.&lt;/p&gt;

&lt;h2&gt;
  🏆 Accomplishments that I am proud of
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Solving Real Business Problems&lt;/strong&gt;: Auto-Vid addresses genuine pain points in content creation - the "content treadmill" that marketing teams face, the high cost of video production, and the lack of scalable solutions for repetitive editing tasks.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Excellence in Serverless Architecture&lt;/strong&gt;: Successfully implemented complex video processing in a fully serverless environment, handling memory optimization, container builds, and distributed error handling across multiple Lambda functions while maintaining production-ready reliability.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Declarative Video Editing&lt;/strong&gt;: Created an intuitive JSON-based specification format that makes professional video editing accessible to non-technical users, transforming complex MoviePy operations into simple configuration files.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advanced Audio Processing&lt;/strong&gt;: Developed sophisticated audio ducking algorithms that automatically lower background music during speech with smooth fade transitions, plus crossfading between music tracks - features typically found only in professional editing software.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Production-Ready Infrastructure&lt;/strong&gt;: Built comprehensive error handling, retry logic, webhook notifications, and automatic resource cleanup - demonstrating that hackathon projects can achieve enterprise-grade quality and reliability.&lt;/p&gt;

&lt;h2&gt;
  📚 What I Learned
&lt;/h2&gt;

&lt;p&gt;Building Auto-Vid taught me several crucial lessons about serverless video processing:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Lambda Container Optimization&lt;/strong&gt;: Video processing requires significant memory and storage. I learned to optimize Docker containers for Lambda, reducing the image size from 800MB to 360MB through multi-stage builds and careful dependency management. The biggest challenge was working within Lambda's memory limits - many AWS accounts default to 3GB, requiring quota increase requests for the full 10GB needed for complex video processing.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Advanced MoviePy Techniques&lt;/strong&gt;: Processing video in a serverless environment requires different approaches than traditional desktop editing. I developed techniques for precise audio ducking (automatically lowering background music during speech), crossfading between music tracks, and synchronizing multiple audio layers without memory overflow.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;AWS Polly's Evolution&lt;/strong&gt;: I discovered the differences between Polly's engines - standard voices for basic needs, neural for natural speech, long-form for extended content, and the new generative engine for ultra-realistic voices. Each has different latency and cost characteristics that affect the overall user experience.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Serverless Architecture Patterns&lt;/strong&gt;: Managing a complex workflow across multiple Lambda functions taught me about event-driven architecture, proper error handling with SQS dead letter queues, and designing for eventual consistency with DynamoDB.&lt;/p&gt;

&lt;h2&gt;
  🚀 What's Next
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Real-World Applications&lt;/strong&gt;: Auto-Vid solves genuine business problems. I've identified use cases ranging from automated social media content creation to e-commerce product demos at scale. The declarative JSON approach means it can integrate with existing content management systems and marketing workflows.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Technical Improvements&lt;/strong&gt;: Future enhancements include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI-powered video spec generation from natural language prompts using AWS Bedrock&lt;/li&gt;
&lt;li&gt;Support for multiple video inputs (picture-in-picture, transitions)&lt;/li&gt;
&lt;li&gt;Visual effects and text overlays&lt;/li&gt;
&lt;li&gt;Integration with more TTS providers&lt;/li&gt;
&lt;li&gt;Batch processing for multiple videos&lt;/li&gt;
&lt;li&gt;Cost optimization through spot instances for non-urgent jobs&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Business Potential&lt;/strong&gt;: The serverless architecture means zero infrastructure costs when idle, making it viable for both small businesses and enterprise customers. The pay-per-use model aligns costs directly with value delivered.&lt;/p&gt;

&lt;p&gt;Auto-Vid demonstrates that complex, traditionally expensive workflows can be democratized through thoughtful serverless architecture. By combining AWS Lambda's scalability with modern video processing libraries, it transforms video editing from a specialized skill into a simple API call.&lt;/p&gt;

&lt;h2&gt;
  
  
  🛠️ Built With
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;AWS Lambda&lt;/strong&gt; - Serverless compute for video processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon Polly&lt;/strong&gt; - Text-to-speech generation with multiple voice engines&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon S3&lt;/strong&gt; - Storage for video assets, audio files, and processed outputs&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon SQS&lt;/strong&gt; - Message queuing for reliable job processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon DynamoDB&lt;/strong&gt; - Status tracking and job metadata storage&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Amazon API Gateway&lt;/strong&gt; - RESTful API endpoints with CORS support&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;AWS SAM&lt;/strong&gt; - Infrastructure as Code deployment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;MoviePy&lt;/strong&gt; - Python library for video editing and processing&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docker&lt;/strong&gt; - Container packaging for Lambda deployment&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Python 3.12&lt;/strong&gt; - Core programming language&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Pydantic&lt;/strong&gt; - Data validation and JSON schema management&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  🚀 Try It
&lt;/h2&gt;

&lt;p&gt;Ready to experience serverless video processing?&lt;/p&gt;

&lt;p&gt;DevPost Submission: &lt;a href="https://devpost.com/software/auto-vid-serverless-video-processing" rel="noopener noreferrer"&gt;https://devpost.com/software/auto-vid-serverless-video-processing&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub Repository: &lt;a href="https://github.com/ossamaweb/auto-vid" rel="noopener noreferrer"&gt;https://github.com/ossamaweb/auto-vid&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>hackathon</category>
      <category>ai</category>
      <category>awschallenge</category>
    </item>
    <item>
      <title>Creating Engaging, Image-Based LinkedIn Carousels with Agent.ai Automation</title>
      <dc:creator>Sam Ben</dc:creator>
      <pubDate>Fri, 24 Jan 2025 18:20:17 +0000</pubDate>
      <link>https://dev.to/sam_ben_786ddbe69e5992835/creating-engaging-image-based-linkedin-carousels-with-agentai-automation-19f2</link>
      <guid>https://dev.to/sam_ben_786ddbe69e5992835/creating-engaging-image-based-linkedin-carousels-with-agentai-automation-19f2</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://srv.buysellads.com/ads/long/x/T6EK3TDFTTTTTT6WWB6C5TTTTTTGBRAPKATTTTTTWTFVT7YTTTTTTKPPKJFH4LJNPYYNNSZL2QLCE2DPPQVCEI45GHBT" rel="noopener noreferrer"&gt;Agent.ai&lt;/a&gt; Challenge: Productivity-Pro Agent (&lt;a href="https://dev.to/challenges/agentai"&gt;See Details&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built a &lt;strong&gt;"Article Visual Carousel Pro Agent"&lt;/strong&gt;. This agent is designed to take raw webpage data (specifically, the output of a web crawler) and intelligently transform it into engaging, multi-slide LinkedIn carousels. I built this agent because I saw a need to streamline the process of creating social media content from existing web content. It automates the extraction of key information, structures it into a visually appealing format, and adds a call to action, saving marketers valuable time and effort. I envision it being used by marketing teams, content creators, and social media managers to quickly generate engaging content for social media platforms, specifically LinkedIn. This can help them repurpose existing content, reach new audiences, and drive engagement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's how the agent works:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hmyvm5haqsfqulhe7j1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hmyvm5haqsfqulhe7j1.png" alt="Article Visual Carousel Pro Agent Actions" width="800" height="582"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User Input:&lt;/strong&gt; The agent starts by collecting user input:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;(1/5) Blog Post URL:&lt;/strong&gt; The URL of the webpage to be converted into a carousel.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;(2/5) Format:&lt;/strong&gt; The desired output format (defaults to 'Carousel - PDF - 1080 x 1080').&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;(3/5) Theme:&lt;/strong&gt; The visual theme of the carousel (defaults to 'Light 1').&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;(4/5) Brand Name / Handle:&lt;/strong&gt; The brand's name or social media handle.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;(5/5) Email Address:&lt;/strong&gt; An email address to contact for more information.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Webpage Scraping:&lt;/strong&gt; The agent then scrapes the content of the provided &lt;code&gt;page_url&lt;/code&gt; and saves it to &lt;code&gt;crawled_page&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI-Powered Content Structuring:&lt;/strong&gt; Using the &lt;code&gt;crawled_page&lt;/code&gt;, the agent invokes a Gemini AI model (&lt;code&gt;gemini-2.0-flash-exp&lt;/code&gt;) to analyze the scraped content. This AI acts as a marketing expert, extracting key information and structuring it into a JSON format (&lt;code&gt;llm_json_output&lt;/code&gt;) suitable for carousel generation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data Validation, Mapping, and Typevis API Integration (Lambda Function):&lt;/strong&gt; The &lt;code&gt;llm_json_output&lt;/code&gt; is then passed to an AWS Lambda function (step 8, Language: node). This is where the core logic for preparing the data for the Typevis API is handled. Specifically, the Lambda function:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Validates the AI Output:&lt;/strong&gt; Checks the &lt;code&gt;llm_json_output&lt;/code&gt; for data integrity and required fields.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Maps to Typevis API:&lt;/strong&gt; Transforms the &lt;code&gt;llm_json_output&lt;/code&gt; into the specific JSON structure required by the Typevis API, ensuring compatibility.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Sends POST Request:&lt;/strong&gt; Makes a POST request to the Typevis API using the mapped data to generate the carousel. This step also handles any necessary API authentication.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conditional Output (HTML):&lt;/strong&gt; The agent uses conditional logic (If/Else) to determine the output styling. Based on the conditions, it outputs the data in HTML format with specific background colors.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
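&lt;p&gt;The shape of that validate-and-map step, sketched in Python for brevity (the agent's actual step runs in node, and every field name below is hypothetical, not the real Typevis payload):&lt;/p&gt;

```python
REQUIRED = ("title", "slides", "cta")

def map_to_typevis(llm_json_output):
    """Validate the AI output, then reshape it for the carousel API."""
    missing = [k for k in REQUIRED if k not in llm_json_output]
    if missing:
        raise ValueError("missing fields: " + ", ".join(missing))
    return {
        "document": {
            "heading": llm_json_output["title"],
            "pages": [{"text": s} for s in llm_json_output["slides"]],
            "footer": llm_json_output["cta"],
        }
    }
```

&lt;p&gt;Validating before the POST request keeps malformed LLM output from ever reaching the external API.&lt;/p&gt;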

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;You can interact with my Article Visual Carousel Pro Agent here: &lt;a href="https://agent.ai/profile/10br0wobj8t9ns14" rel="noopener noreferrer"&gt;https://agent.ai/profile/10br0wobj8t9ns14&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YouTube Video&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/lrrRi2mbBr8"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent.ai Experience
&lt;/h2&gt;

&lt;p&gt;My experience with the Agent.ai Builder was mostly positive. The Builder's interface was intuitive and allowed me to quickly prototype and iterate on my agent's logic. The ability to define custom functions and integrate with external APIs was a significant highlight, enabling me to implement the complex logic required for content extraction and structuring.&lt;/p&gt;

&lt;p&gt;One of the most delightful moments was seeing the agent successfully parse complex webpage data and generate a structured JSON output, ready to be used with &lt;strong&gt;the Typevis API, which is part of an app I'm currently developing&lt;/strong&gt;. It was rewarding to see the agent transform raw text into a well-formatted, engaging carousel structure. &lt;strong&gt;Typevis is still in its alpha stage, and this agent is designed to integrate with it to fully automate the carousel creation process. This integration is a key part of my vision for Typevis.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;However, there were also some challenging moments. Debugging complex logic and ensuring the agent handled various edge cases (e.g., missing data, overly long text) required careful planning and iterative refinement. It would be very helpful to have more robust debugging tools and real-time feedback within the builder.&lt;/p&gt;

&lt;p&gt;Despite the challenges, the Agent.ai platform is a powerful tool for building intelligent agents, and I'm excited to see its future development and potential applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I believe this Article Visual Carousel Pro Agent has the potential to significantly streamline content creation workflows. I encourage you to explore its capabilities and share your thoughts in the comments below. Let me know how you think this agent could be improved or used in new and innovative ways!&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>agentaichallenge</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Building Dynamic, Multi-Slide LinkedIn Carousels with Agent.ai</title>
      <dc:creator>Sam Ben</dc:creator>
      <pubDate>Fri, 24 Jan 2025 18:15:58 +0000</pubDate>
      <link>https://dev.to/sam_ben_786ddbe69e5992835/building-dynamic-multi-slide-linkedin-carousels-with-agentai-5hnb</link>
      <guid>https://dev.to/sam_ben_786ddbe69e5992835/building-dynamic-multi-slide-linkedin-carousels-with-agentai-5hnb</guid>
      <description>&lt;p&gt;&lt;em&gt;This is a submission for the &lt;a href="https://srv.buysellads.com/ads/long/x/T6EK3TDFTTTTTT6WWB6C5TTTTTTGBRAPKATTTTTTWTFVT7YTTTTTTKPPKJFH4LJNPYYNNSZL2QLCE2DPPQVCEI45GHBT" rel="noopener noreferrer"&gt;Agent.ai&lt;/a&gt; Challenge: Full-Stack Agent (&lt;a href="https://dev.to/challenges/agentai"&gt;See Details&lt;/a&gt;)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;I built a &lt;strong&gt;"Article Visual Carousel Pro Agent"&lt;/strong&gt;. This agent is designed to take raw webpage data (specifically, the output of a web crawler) and intelligently transform it into engaging, multi-slide LinkedIn carousels. I built this agent because I saw a need to streamline the process of creating social media content from existing web content. It automates the extraction of key information, structures it into a visually appealing format, and adds a call to action, saving marketers valuable time and effort. I envision it being used by marketing teams, content creators, and social media managers to quickly generate engaging content for social media platforms, specifically LinkedIn. This can help them repurpose existing content, reach new audiences, and drive engagement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Here's how the agent works:&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hmyvm5haqsfqulhe7j1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hmyvm5haqsfqulhe7j1.png" alt="Article Visual Carousel Pro Agent Actions" width="800" height="582"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;User Input:&lt;/strong&gt; The agent starts by collecting user input:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;(1/5) Blog Post URL:&lt;/strong&gt; The URL of the webpage to be converted into a carousel.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;(2/5) Format:&lt;/strong&gt; The desired output format (defaults to 'Carousel - PDF - 1080 x 1080').&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;(3/5) Theme:&lt;/strong&gt; The visual theme of the carousel (defaults to 'Light 1').&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;(4/5) Brand Name / Handle:&lt;/strong&gt; The brand's name or social media handle.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;(5/5) Email Address:&lt;/strong&gt; An email address to contact for more information.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Webpage Scraping:&lt;/strong&gt; The agent then scrapes the content of the provided &lt;code&gt;page_url&lt;/code&gt; and saves it to &lt;code&gt;crawled_page&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AI-Powered Content Structuring:&lt;/strong&gt; Using the &lt;code&gt;crawled_page&lt;/code&gt;, the agent invokes a Gemini AI model (&lt;code&gt;gemini-2.0-flash-exp&lt;/code&gt;) to analyze the scraped content. This AI acts as a marketing expert, extracting key information and structuring it into a JSON format (&lt;code&gt;llm_json_output&lt;/code&gt;) suitable for carousel generation.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Data Validation, Mapping, and Typevis API Integration (Lambda Function):&lt;/strong&gt; The &lt;code&gt;llm_json_output&lt;/code&gt; is then passed to an AWS Lambda function (step 8, Language: node). This is where the core logic for preparing the data for the Typevis API is handled. Specifically, the Lambda function:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Validates the AI Output:&lt;/strong&gt; Checks the &lt;code&gt;llm_json_output&lt;/code&gt; for data integrity and required fields.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Maps to Typevis API:&lt;/strong&gt; Transforms the &lt;code&gt;llm_json_output&lt;/code&gt; into the specific JSON structure required by the Typevis API, ensuring compatibility.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Sends POST Request:&lt;/strong&gt; Makes a POST request to the Typevis API using the mapped data to generate the carousel. This step also handles any necessary API authentication.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Conditional Output (HTML):&lt;/strong&gt; The agent uses conditional logic (If/Else) to determine the output styling. Based on the conditions, it outputs the data in HTML format with specific background colors.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;

&lt;p&gt;You can interact with my Article Visual Carousel Pro Agent here: &lt;a href="https://agent.ai/profile/10br0wobj8t9ns14" rel="noopener noreferrer"&gt;https://agent.ai/profile/10br0wobj8t9ns14&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;YouTube Video&lt;/strong&gt;:&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/lrrRi2mbBr8"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Agent.ai Experience
&lt;/h2&gt;

&lt;p&gt;My experience with the Agent.ai Builder was mostly positive. The Builder's interface was intuitive and allowed me to quickly prototype and iterate on my agent's logic. The ability to define custom functions and integrate with external APIs was a significant highlight, enabling me to implement the complex logic required for content extraction and structuring.&lt;/p&gt;

&lt;p&gt;One of the most delightful moments was seeing the agent successfully parse complex webpage data and generate a structured JSON output, ready to be used with &lt;strong&gt;the Typevis API, which is part of an app I'm currently developing&lt;/strong&gt;. It was rewarding to see the agent transform raw text into a well-formatted, engaging carousel structure. &lt;strong&gt;Typevis is still in its alpha stage, and this agent is designed to integrate with it to fully automate the carousel creation process. This integration is a key part of my vision for Typevis.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;However, there were also some challenging moments. Debugging complex logic and ensuring the agent handled various edge cases (e.g., missing data, overly long text) required careful planning and iterative refinement. It would be very helpful to have more robust debugging tools and real-time feedback within the builder.&lt;/p&gt;

&lt;p&gt;Despite the challenges, the Agent.ai platform is a powerful tool for building intelligent agents, and I'm excited to see its future development and potential applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;I believe this Article Visual Carousel Pro Agent has the potential to significantly streamline content creation workflows. I encourage you to explore its capabilities and share your thoughts in the comments below. Let me know how you think this agent could be improved or used in new and innovative ways!&lt;/p&gt;

</description>
      <category>devchallenge</category>
      <category>agentaichallenge</category>
      <category>ai</category>
      <category>machinelearning</category>
    </item>
    <item>
      <title>Building an AWS Gamified Learning Platform with Amazon Q and Gemini: An AI-Powered Journey (Public Repo)</title>
      <dc:creator>Sam Ben</dc:creator>
      <pubDate>Fri, 17 Jan 2025 00:43:49 +0000</pubDate>
      <link>https://dev.to/sam_ben_786ddbe69e5992835/building-an-aws-gamified-learning-platform-with-amazon-q-and-gemini-an-ai-powered-journey-public-379h</link>
      <guid>https://dev.to/sam_ben_786ddbe69e5992835/building-an-aws-gamified-learning-platform-with-amazon-q-and-gemini-an-ai-powered-journey-public-379h</guid>
      <description>&lt;p&gt;I'm excited to share my project, &lt;strong&gt;CloudQuest&lt;/strong&gt;, built for the &lt;a href="https://awsdevchallenge.devpost.com/" rel="noopener noreferrer"&gt;AWS Game Builder Challenge&lt;/a&gt;. Inspired by the engaging learning style of Duolingo, CloudQuest transforms the often-daunting world of AWS into a fun, interactive, and rewarding game.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://main.d15m5mz0uevgdr.amplifyapp.com/" rel="noopener noreferrer"&gt;Demo&lt;/a&gt; | &lt;a href="https://github.com/ossamaweb/cloud-quest" rel="noopener noreferrer"&gt;Github&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What is CloudQuest?
&lt;/h2&gt;

&lt;p&gt;CloudQuest is a gamified learning platform that helps you master Amazon Web Services (AWS) through interactive quizzes and a game-like progression system. Whether you're a complete beginner or have some cloud experience, CloudQuest is designed to make learning about AWS accessible, enjoyable, and effective.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Disclaimer: Just to clarify, CloudQuest is a project I built independently for the AWS Game Builder Challenge and is not the same as the official Amazon game, Cloud Quest. I hope you enjoy it!&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How Does It Work?
&lt;/h2&gt;

&lt;p&gt;The core gameplay revolves around modules and lessons. You'll start with the basic concepts and progress to more advanced topics. Each lesson presents you with 12 interactive quiz questions, designed to test and reinforce your understanding. These questions come in various formats, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Multiple Choice:&lt;/strong&gt; Select the correct answer from a list of options.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;True/False:&lt;/strong&gt; Determine if a statement is true or false.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Fill-in-the-Blank:&lt;/strong&gt; Complete the sentence with the correct words.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Short Answer:&lt;/strong&gt; Type in a brief answer to a question.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Drag and Drop:&lt;/strong&gt; Match items by dragging them into the correct categories.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Matching:&lt;/strong&gt; Pair terms with their definitions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Ordering:&lt;/strong&gt; Arrange steps or items in the correct sequence.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Image Identification:&lt;/strong&gt; Select the correct AWS service based on the given image.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can play all of these question types using just your keyboard, making the game easily accessible. As you advance, you'll level up your knowledge, earn points, and track your progress.&lt;/p&gt;

&lt;h2&gt;
  
  
  AWS Services Used
&lt;/h2&gt;

&lt;p&gt;CloudQuest leverages the following AWS services:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;AWS Amplify:&lt;/strong&gt; The backbone of the app, handling frontend hosting, the backend, and CI/CD. It also manages user authentication and authorization through Amazon Cognito.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Amazon DynamoDB:&lt;/strong&gt; The database storing all game data and user progress.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;AWS AppSync:&lt;/strong&gt; Provides the GraphQL API connecting the frontend to the DynamoDB database.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Amazon Q:&lt;/strong&gt; I used Amazon Q Developer as a co-developer to assist in code generation, debugging, and research.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Gemini 2.0 Flash&lt;/strong&gt;: Used to generate the questions for each lesson using function calling.&lt;/li&gt;
&lt;/ul&gt;
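&lt;p&gt;Function calling is what keeps the generated questions machine-readable: instead of free text, the model is asked to "call" a declared function whose parameters define the question schema. A sketch of such a declaration (names and fields are hypothetical, not CloudQuest's actual schema):&lt;/p&gt;

```python
# Hypothetical function declaration passed to the model as a tool;
# the model responds with arguments matching this JSON schema.
QUESTION_TOOL = {
    "name": "create_lesson_questions",
    "description": "Generate quiz questions for one AWS lesson.",
    "parameters": {
        "type": "object",
        "properties": {
            "questions": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "kind": {
                            "type": "string",
                            "enum": ["multiple_choice", "true_false", "ordering"],
                        },
                        "prompt": {"type": "string"},
                        "answer": {"type": "string"},
                    },
                    "required": ["kind", "prompt", "answer"],
                },
            },
        },
        "required": ["questions"],
    },
}
```

&lt;p&gt;Constraining the output this way means each generated question can be stored and rendered directly, with no brittle text parsing in between.&lt;/p&gt;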

&lt;h2&gt;
  
  
  My Development Journey
&lt;/h2&gt;

&lt;p&gt;This project was an exciting and challenging experience. Here are some of the things I learned:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;AWS Amplify and Cognito:&lt;/strong&gt; Learning and configuring these services took time and effort, but it was rewarding to discover how much they handle for you.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Rapid Development:&lt;/strong&gt; I challenged myself to build a project in a short amount of time, pushing the limits of what I could achieve in 15 days.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Just Start&lt;/strong&gt;: I learned to just start the project and iterate as the needs became clear.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What I'm Proud Of
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Amazon Q Collaboration:&lt;/strong&gt; I'm very proud to have worked with Amazon Q as my coding partner. It sped up my development process and helped me code, debug, and research far more efficiently.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Functional Prototype:&lt;/strong&gt; I'm happy to have built and launched a functional project in a short amount of time.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Community Engagement:&lt;/strong&gt; I enjoyed participating in the AWS Game Builder Challenge and sharing my work with the community.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  What's Next for CloudQuest?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;Beta Testing:&lt;/strong&gt; I am looking forward to getting feedback from beta users and improving the gameplay.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Content Expansion:&lt;/strong&gt; I am planning to expand the content and cover more AWS topics.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Personalized Learning:&lt;/strong&gt; I'm planning to integrate Amazon Bedrock to create personalized lessons based on user performance and learning patterns.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  How I Used Amazon Q Developer
&lt;/h2&gt;

&lt;p&gt;Here are some ways I leveraged Amazon Q during development:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;@workspace&lt;/code&gt; for Context:&lt;/strong&gt; I used the &lt;code&gt;@workspace&lt;/code&gt; command to provide Amazon Q with context from my codebase, which helped it generate relevant code and suggestions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;/dev&lt;/code&gt; for UI Components:&lt;/strong&gt; I used the &lt;code&gt;/dev&lt;/code&gt; command to rapidly generate UI components and pages.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;&lt;code&gt;Command+I&lt;/code&gt; for Code Edits:&lt;/strong&gt; I used &lt;code&gt;Command+I&lt;/code&gt; (or equivalent) in my IDE to edit and generate code snippets.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Contextual Actions:&lt;/strong&gt; I leveraged the context menu options for refactoring, explanation and initiating longer discussions.&lt;/li&gt;
&lt;li&gt;  &lt;strong&gt;Multi-Tab Chat:&lt;/strong&gt; I used the multi-tab chat feature to work on different tasks in parallel.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Gameplay
&lt;/h2&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/BesIGoe7zSE"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

&lt;h2&gt;
  
  
  Try CloudQuest!
&lt;/h2&gt;

&lt;p&gt;I invite you to check out CloudQuest and give it a try. Any feedback or comments are much appreciated!&lt;/p&gt;

&lt;p&gt;Demo: &lt;a href="https://main.d15m5mz0uevgdr.amplifyapp.com/" rel="noopener noreferrer"&gt;https://main.d15m5mz0uevgdr.amplifyapp.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;GitHub Repo: &lt;a href="https://github.com/ossamaweb/cloud-quest" rel="noopener noreferrer"&gt;https://github.com/ossamaweb/cloud-quest&lt;/a&gt;&lt;/p&gt;

</description>
      <category>aws</category>
      <category>hackathon</category>
      <category>react</category>
      <category>ai</category>
    </item>
  </channel>
</rss>
