WonderLab
One Open Source Project Per Day (Day 67): Open-Generative-AI - Open Source Center for AI Video & Image Creation

Introduction

"Creative freedom belongs to everyone, unfiltered and unconstrained."

This is the 67th article in the "One Open Source Project Per Day" series. Today we explore Open-Generative-AI.

In AI video and image generation, powerful platforms like Kling, Sora, and Midjourney have emerged, but their closed-source ecosystems, subscription fees, and strict content filters (guardrails) often constrain creators. Open-Generative-AI serves as an open-source alternative to these platforms, providing an unfiltered, customizable, and self-hostable creation environment that integrates over 200 state-of-the-art models.

What You Will Learn

  • Core Concepts: How to build a unified AI creation hub for multiple models.
  • Key Features: Comprehensive capabilities covering Text-to-Image, Image-to-Image, Text-to-Video, Image-to-Video, and audio-driven lip-sync.
  • Technical Highlights: Support for local inference in Electron desktop apps (via sd.cpp and Wan2GP) and remote GPU offloading.
  • Application Scenarios: From personal artistic creation to building automated media pipelines.
  • Comparative Advantages: No content filters, zero subscription fees, and full private deployment.

Prerequisites

  • Basic understanding of Generative AI (Diffusion Models, Video Generation).
  • Familiarity with JavaScript/TypeScript development environments.
  • Fundamental knowledge of Docker/Node.js deployment.

Project Background

Project Introduction

Open-Generative-AI is a free and open-source studio for AI images, videos, cinema, and lip-syncing. Its core value lies in the "Infinite Budget" cinematic workflow philosophy, allowing creators to escape expensive subscription services and create using top-tier models like Flux, Kling, and Wan 2.2 on local machines or self-hosted servers. It provides not only a Web interface but also a powerful desktop client and can even serve as a backend skill library for AI coding agents like Claude Code.

Author/Team Introduction

  • Author: Anil-matcha
  • Background: An active open-source developer focused on AI toolchains and media processing.
  • Creation Date: 2024 (Under rapid development)

Project Data


Main Features

Core Utility

Open-Generative-AI provides a highly integrated UI that allows users to call various AI generation models through simple configurations (such as API keys or local model paths), enabling a complete flow from ideation to final rendering.
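The "call any model through simple configuration" idea can be sketched as a small routing function. Note that `GenerationConfig` and `resolveBackend` are hypothetical names for illustration, not the project's actual API:

```typescript
// Illustrative sketch of a unified call surface: the studio picks a backend
// from simple configuration. These names are assumptions, not the real API.
interface GenerationConfig {
  model: string;            // e.g. "flux-dev" or "kling-v1"
  apiKey?: string;          // hosted models authenticate with an API key
  localModelPath?: string;  // local models point at a weights file
}

function resolveBackend(cfg: GenerationConfig): "local" | "hosted" {
  // A local model path takes priority; otherwise fall back to a hosted API.
  return cfg.localModelPath ? "local" : "hosted";
}

console.log(resolveBackend({ model: "flux-dev", apiKey: "sk-..." })); // "hosted"
console.log(resolveBackend({ model: "sd15", localModelPath: "/models/sd15.gguf" })); // "local"
```

The point of such a design is that the UI stays identical whether a generation runs on a paid API or a local engine.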

Usage Scenarios

  1. Short Video/Film Creation
    • Use Cinema Studio's professional camera controls (lens, focal length, aperture) to generate high-quality shots.
  2. Podcast/Marketing Video Production
    • Leverage Lip Sync Studio to make static portraits speak based on audio, creating "talking head" videos.
  3. Private/Unfiltered Creation
    • Bypass the safety concerns of commercial platforms and run unfiltered models on your local machine.
  4. Automated AI Media Pipelines
    • Integrate with skill libraries to let AI agents automatically perform tasks like "Prompt Generation -> Generate -> Edit -> Stitch."
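The "Prompt Generation -> Generate -> Edit -> Stitch" flow in scenario 4 can be modeled as a chain of async stages. The stage functions below are stubs standing in for real model calls; the actual project drives such steps through its skill library:

```typescript
// Hedged sketch of an automated media pipeline: each stage is an async
// function from one artifact to the next. Stages here are illustrative stubs.
type Stage = (input: string) => Promise<string>;

async function runPipeline(seed: string, stages: Stage[]): Promise<string> {
  let artifact = seed;
  for (const stage of stages) {
    artifact = await stage(artifact); // each stage consumes the previous output
  }
  return artifact;
}

// Stub stages standing in for real model calls.
const generatePrompt: Stage = async (idea) => `cinematic shot of ${idea}`;
const generateClip: Stage = async (prompt) => `clip(${prompt})`;

runPipeline("a rainy street", [generatePrompt, generateClip]).then(console.log);
// prints "clip(cinematic shot of a rainy street)"
```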

Quick Start

You can try it quickly in either of two ways:

1. Online Browser Use
Visit muapi.ai to experience the four studio modes directly.

2. Local Deployment (From Source)

```shell
# Clone the repository
git clone https://github.com/Anil-matcha/Open-Generative-AI.git
cd Open-Generative-AI

# Install dependencies
pnpm install

# Start development server
pnpm dev

# Build desktop app (Electron)
npm run electron:build
```

Key Characteristics

  1. Image Studio: Supports 50+ t2i and 55+ i2i models.
  2. Video Studio: Covers 40+ t2v and 60+ i2v models with intelligent mode switching.
  3. Lip Sync Studio: 9 dedicated models for animating portraits or existing videos from audio.
  4. Cinema Studio: Interface for photorealistic cinematic shots with pro camera controls.
  5. Local Inference: Built-in sd.cpp support for Apple Silicon (Metal) and CUDA/ROCm; plus Wan2GP for remote GPU servers.
  6. Multi-Image Input: Allows uploading up to 14 reference images for specific models.
  7. Workflow Studio: Node-based editor for building and running multi-step AI pipelines visually.

Project Advantages

| Feature | Open-Generative-AI | Commercial Platforms (Sora/Midjourney) | Traditional Open Source UIs (A1111) |
| --- | --- | --- | --- |
| Model Count | 200+ (cross-vendor) | Single vendor only | Mostly Stable Diffusion |
| Content Filtering | None (user controlled) | Extremely strict | None |
| Deployment | Web/Desktop/Self-host | Cloud only | Complex local install |
| Integration | Very strong (API/SDK/CLI) | Closed | Plugin driven |

Detailed Technical Insights

Architecture: Dual Local Inference Engines

The flexibility of the Open-Generative-AI desktop app lies in how it handles local compute.

1. Bundled sd.cpp

A C++ engine based on stable-diffusion.cpp packaged within the app.

  • Advantage: Ready to use, supports Metal acceleration on Mac M-series. Supports not only SD 1.5/SDXL but also newer models like Z-Image.
  • Detail: Driven via sd-cli without needing a complex Python environment.
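Since the engine is driven via `sd-cli` rather than a Python environment, the host app only needs to assemble a command line and spawn the binary. The flag names below are assumptions for illustration; check the bundled binary's help output for the real interface:

```typescript
// Assumption-laden sketch: the desktop app shells out to a bundled sd-cli
// binary. Flag names are illustrative, not the documented interface.
function buildSdCliArgs(opts: { modelPath: string; prompt: string; outFile: string }): string[] {
  return [
    "-m", opts.modelPath, // path to local weights (no Python env needed)
    "-p", opts.prompt,    // text prompt
    "-o", opts.outFile,   // output image path
  ];
}

// A host app could then spawn it, e.g. with Node's child_process:
//   spawn("sd-cli", buildSdCliArgs({ ... }));
console.log(buildSdCliArgs({ modelPath: "sd15.gguf", prompt: "a cat", outFile: "cat.png" }).join(" "));
// prints "-m sd15.gguf -p a cat -o cat.png"
```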

2. Wan2GP (Remote Engine)

Some models, such as Wan 2.2 and Hunyuan Video, require high-performance NVIDIA GPUs; they are CUDA-based and cannot run efficiently on a Mac natively.

  • Solution: Users run a Wan2GP server on a GPU-enabled Linux box, and Open-Generative-AI connects as a client.
  • Impact: Enables cross-platform compute offloading so Mac users can drive top-tier video models.
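From the client's perspective, the offloading pattern amounts to packaging a generation job and sending it to the remote server. The endpoint path and payload shape below are assumptions for illustration, not Wan2GP's documented API:

```typescript
// Hypothetical client-side sketch of offloading a video job to a Wan2GP
// server on a GPU box. Endpoint and payload are illustrative assumptions.
interface VideoJob {
  model: string; // e.g. "wan-2.2"
  prompt: string;
  frames: number;
}

interface JobRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

function buildJobRequest(serverUrl: string, job: VideoJob): JobRequest {
  return {
    url: `${serverUrl.replace(/\/$/, "")}/generate`, // strip trailing slash
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(job),
  };
}

const req = buildJobRequest("http://gpu-box:7860/", { model: "wan-2.2", prompt: "ocean waves", frames: 48 });
console.log(req.url); // "http://gpu-box:7860/generate"
```

The desktop app stays a thin client: all heavy CUDA work happens on the Linux box, and only prompts and results cross the network.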

Key Implementation: Intelligent Workflow Switching

The project features deep UI optimizations. When you enter the Image or Video Studio, the system checks whether a reference image has been uploaded.

  • No Upload: Automatically switches to Text-to-Image/Video sets.
  • With Upload: Instantly switches to Image-to-Image/Video sets (e.g., Kling i2v, LTX Video i2v).

This state-based routing significantly reduces operational complexity for users.
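The routing logic described above is simple to express: the model set shown to the user follows directly from whether a reference image is present. A minimal sketch, with illustrative names:

```typescript
// State-based routing sketch: reference image present => image-to-video
// model set; otherwise the text-to-video set. Names are illustrative.
type ModelSet = "text-to-video" | "image-to-video";

function selectModelSet(referenceImage: string | null): ModelSet {
  return referenceImage ? "image-to-video" : "text-to-video";
}

console.log(selectModelSet(null));           // "text-to-video"
console.log(selectModelSet("portrait.png")); // "image-to-video"
```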


Resources and Links

Official Resources

Related Resources

Target Audience

  • Digital Artists & Filmmakers: Seeking low-cost, unrestricted creation tools.
  • AI Developers: Needing to quickly integrate multi-model capabilities.
  • Open Source Enthusiasts: Preferring private deployment and self-hosted apps.

Find more useful knowledge and interesting products at my Homepage
