WonderLab
One Open Source Project Per Day (Day 67): Open-Generative-AI - Open Source Center for AI Video & Image Creation

Introduction

"Creative freedom belongs to everyone, unfiltered and unconstrained."

This is the 67th article in the "One Open Source Project Per Day" series. Today we explore Open-Generative-AI.

In AI video and image generation, powerful platforms like Kling, Sora, and Midjourney have emerged, but their closed-source ecosystems, subscription fees, and strict content filters (guardrails) often constrain creators. Open-Generative-AI serves as an open-source alternative to these platforms, providing an unfiltered, customizable, and self-hostable creation environment that integrates over 200 state-of-the-art models.

What You Will Learn

  • Core Concepts: How to build a unified AI creation hub for multiple models.
  • Key Features: Comprehensive capabilities covering Text-to-Image, Image-to-Image, Text-to-Video, Image-to-Video, and audio-driven lip-sync.
  • Technical Highlights: Support for local inference in Electron desktop apps (via sd.cpp and Wan2GP) and remote GPU offloading.
  • Application Scenarios: From personal artistic creation to building automated media pipelines.
  • Comparative Advantages: No content filters, zero subscription fees, and full private deployment.

Prerequisites

  • Basic understanding of Generative AI (Diffusion Models, Video Generation).
  • Familiarity with JavaScript/TypeScript development environments.
  • Fundamental knowledge of Docker/Node.js deployment.

Project Background

Project Introduction

Open-Generative-AI is a free and open-source studio for AI images, videos, cinema, and lip-syncing. Its core value lies in the "Infinite Budget" cinematic workflow philosophy, allowing creators to escape expensive subscription services and create using top-tier models like Flux, Kling, and Wan 2.2 on local machines or self-hosted servers. It provides not only a Web interface but also a powerful desktop client and can even serve as a backend skill library for AI coding agents like Claude Code.

Author/Team Introduction

  • Author: Anil-matcha
  • Background: An active open-source developer focused on AI toolchains and media processing.
  • Creation Date: 2024 (Under rapid development)

Project Data


Main Features

Core Utility

Open-Generative-AI provides a highly integrated UI that allows users to call various AI generation models through simple configurations (such as API keys or local model paths), enabling a complete flow from ideation to final rendering.
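The "call any model through simple configuration" idea can be sketched as a small routing function. Note that `GenerationConfig` and `resolveBackend` are hypothetical names for illustration, not the project's actual API:

```typescript
// Illustrative sketch of a unified call surface: the studio picks a backend
// from simple configuration. These names are assumptions, not the real API.
interface GenerationConfig {
  model: string;            // e.g. "flux-dev" or "kling-v1"
  apiKey?: string;          // hosted models authenticate with an API key
  localModelPath?: string;  // local models point at a weights file
}

function resolveBackend(cfg: GenerationConfig): "local" | "hosted" {
  // A local model path takes priority; otherwise fall back to a hosted API.
  return cfg.localModelPath ? "local" : "hosted";
}

console.log(resolveBackend({ model: "flux-dev", apiKey: "sk-..." })); // "hosted"
console.log(resolveBackend({ model: "sd15", localModelPath: "/models/sd15.gguf" })); // "local"
```

The point of such a design is that the UI stays identical whether a generation runs on a paid API or a local engine.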

Usage Scenarios

  1. Short Video/Film Creation
    • Use Cinema Studio's professional camera controls (lens, focal length, aperture) to generate high-quality shots.
  2. Podcast/Marketing Video Production
    • Leverage Lip Sync Studio to make static portraits speak based on audio, creating "talking head" videos.
  3. Private/Unfiltered Creation
    • Bypass the safety concerns of commercial platforms and run unfiltered models on your local machine.
  4. Automated AI Media Pipelines
    • Integrate with skill libraries to let AI agents automatically perform tasks like "Prompt Generation -> Generate -> Edit -> Stitch."
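The "Prompt Generation -> Generate -> Edit -> Stitch" flow in scenario 4 can be modeled as a chain of async stages. The stage functions below are stubs standing in for real model calls; the actual project drives such steps through its skill library:

```typescript
// Hedged sketch of an automated media pipeline: each stage is an async
// function from one artifact to the next. Stages here are illustrative stubs.
type Stage = (input: string) => Promise<string>;

async function runPipeline(seed: string, stages: Stage[]): Promise<string> {
  let artifact = seed;
  for (const stage of stages) {
    artifact = await stage(artifact); // each stage consumes the previous output
  }
  return artifact;
}

// Stub stages standing in for real model calls.
const generatePrompt: Stage = async (idea) => `cinematic shot of ${idea}`;
const generateClip: Stage = async (prompt) => `clip(${prompt})`;

runPipeline("a rainy street", [generatePrompt, generateClip]).then(console.log);
// prints "clip(cinematic shot of a rainy street)"
```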

Quick Start

You can try it quickly in either of two ways:

1. Online Browser Use
Visit muapi.ai to experience the four studio modes directly.

2. Local Deployment (From Source)

```shell
# Clone the repository
git clone https://github.com/Anil-matcha/Open-Generative-AI.git
cd Open-Generative-AI

# Install dependencies
pnpm install

# Start development server
pnpm dev

# Build desktop app (Electron)
npm run electron:build
```

Key Characteristics

  1. Image Studio: Supports 50+ t2i and 55+ i2i models.
  2. Video Studio: Covers 40+ t2v and 60+ i2v models with intelligent mode switching.
  3. Lip Sync Studio: 9 dedicated models for animating portraits or existing videos from audio.
  4. Cinema Studio: Interface for photorealistic cinematic shots with pro camera controls.
  5. Local Inference: Built-in sd.cpp support for Apple Silicon (Metal) and CUDA/ROCm; plus Wan2GP for remote GPU servers.
  6. Multi-Image Input: Allows uploading up to 14 reference images for specific models.
  7. Workflow Studio: Node-based editor for building and running multi-step AI pipelines visually.

Project Advantages

| Feature | Open-Generative-AI | Commercial Platforms (Sora/Midjourney) | Traditional Open Source UIs (A1111) |
| --- | --- | --- | --- |
| Model Count | 200+ (cross-vendor) | Single vendor only | Mostly Stable Diffusion |
| Content Filtering | None (user controlled) | Extremely strict | None |
| Deployment | Web/Desktop/Self-host | Cloud only | Complex local install |
| Integration | Very strong (API/SDK/CLI) | Closed | Plugin driven |

Detailed Technical Insights

Architecture: Dual Local Inference Engines

The flexibility of the Open-Generative-AI desktop app lies in how it handles local compute.

1. Bundled sd.cpp

A C++ engine based on stable-diffusion.cpp packaged within the app.

  • Advantage: Ready to use, supports Metal acceleration on Mac M-series. Supports not only SD 1.5/SDXL but also newer models like Z-Image.
  • Detail: Driven via sd-cli without needing a complex Python environment.
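Since the engine is driven via `sd-cli` rather than a Python environment, the host app only needs to assemble a command line and spawn the binary. The flag names below are assumptions for illustration; check the bundled binary's help output for the real interface:

```typescript
// Assumption-laden sketch: the desktop app shells out to a bundled sd-cli
// binary. Flag names are illustrative, not the documented interface.
function buildSdCliArgs(opts: { modelPath: string; prompt: string; outFile: string }): string[] {
  return [
    "-m", opts.modelPath, // path to local weights (no Python env needed)
    "-p", opts.prompt,    // text prompt
    "-o", opts.outFile,   // output image path
  ];
}

// A host app could then spawn it, e.g. with Node's child_process:
//   spawn("sd-cli", buildSdCliArgs({ ... }));
console.log(buildSdCliArgs({ modelPath: "sd15.gguf", prompt: "a cat", outFile: "cat.png" }).join(" "));
// prints "-m sd15.gguf -p a cat -o cat.png"
```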

2. Wan2GP (Remote Engine)

Some models, such as Wan 2.2 and Hunyuan Video, require high-performance NVIDIA GPUs; they are CUDA-based and cannot run efficiently on a Mac natively.

  • Solution: Users run a Wan2GP server on a GPU-enabled Linux box, and Open-Generative-AI connects as a client.
  • Impact: Enables cross-platform compute offloading so Mac users can drive top-tier video models.
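From the client's perspective, the offloading pattern amounts to packaging a generation job and sending it to the remote server. The endpoint path and payload shape below are assumptions for illustration, not Wan2GP's documented API:

```typescript
// Hypothetical client-side sketch of offloading a video job to a Wan2GP
// server on a GPU box. Endpoint and payload are illustrative assumptions.
interface VideoJob {
  model: string; // e.g. "wan-2.2"
  prompt: string;
  frames: number;
}

interface JobRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildJobRequest(serverUrl: string, job: VideoJob): JobRequest {
  return {
    url: `${serverUrl.replace(/\/$/, "")}/generate`, // strip trailing slash
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job),
  };
}

const req = buildJobRequest("http://gpu-box:7860/", { model: "wan-2.2", prompt: "ocean waves", frames: 48 });
console.log(req.url); // "http://gpu-box:7860/generate"
```

The desktop app stays a thin client: all heavy CUDA work happens on the Linux box, and only prompts and results cross the network.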

Key Implementation: Intelligent Workflow Switching

The project features deep UI optimizations. When you enter the Image or Video Studio, the system checks whether a reference image has been uploaded.

  • No Upload: Automatically switches to Text-to-Image/Video sets.
  • With Upload: Instantly switches to Image-to-Image/Video sets (e.g., Kling i2v, LTX Video i2v).

This state-based routing significantly reduces operational complexity for users.
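The routing logic described above is simple to express: the model set shown to the user follows directly from whether a reference image is present. A minimal sketch, with illustrative names:

```typescript
// State-based routing sketch: reference image present => image-to-video
// model set; otherwise the text-to-video set. Names are illustrative.
type ModelSet = "text-to-video" | "image-to-video";

function selectModelSet(referenceImage: string | null): ModelSet {
  return referenceImage ? "image-to-video" : "text-to-video";
}

console.log(selectModelSet(null));           // "text-to-video"
console.log(selectModelSet("portrait.png")); // "image-to-video"
```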


Resources and Links

Official Resources

Related Resources

Target Audience

  • Digital Artists & Filmmakers: Seeking low-cost, unrestricted creation tools.
  • AI Developers: Needing to quickly integrate multi-model capabilities.
  • Open Source Enthusiasts: Preferring private deployment and self-hosted apps.

Find more useful knowledge and interesting products at my Homepage
