Genra

Posted on • Originally published at genra.ai

Genra Video Creator: Open-Source AI Skills for Autonomous Video Production

What Is Genra Video Creator?

Genra Video Creator is an open-source collection of AI skills that allow autonomous agents to control the Genra video creation platform. It includes reusable, best-practice workflows and templates for generating different types of videos — from multi-shot narratives to e-commerce product showcases to brand atmosphere films.

Think of it this way: Genra is the video editor. These skills are the instruction manuals that teach AI agents how to use it. They encode the production knowledge — art direction, character consistency, pacing, audio mixing — so the agent can produce professional-quality videos autonomously.

The repo is live at github.com/genra-ai/video-creator.

Why Open-Source AI Video Skills?

Genra already ships with an AI agent built directly into the editor. You open the chat panel, describe what you want, and the agent creates the video. No setup needed.

But developers want more. They want to integrate video creation into their own AI workflows — trigger video generation from a CI/CD pipeline, have Claude Code produce a demo video alongside code changes, or build custom applications that generate videos on demand.

That's what these skills enable. They're markdown instruction files that any autonomous agent can read and follow. The agent learns the Genra API, understands video production best practices, and executes complex multi-step workflows — all from a single file reference.

How It Works

The architecture is intentionally simple:

  1. Skill files — Markdown documents that contain step-by-step instructions, API endpoints, and production rules
  2. HTTP API — A straightforward REST API at https://action.genra.ai/. No SDK required. Just POST requests with JSON payloads
  3. Any agent — Claude Code, Codex, Gemini CLI, OpenClaw, or any system that can read markdown and make HTTP calls

The API surface is minimal:

```bash
curl -s -X POST https://action.genra.ai/ \
  -H "Content-Type: application/json" \
  -d '{"session_key":"SK","action":"get_state"}'
```

Three core actions: get_state to read the current project state, click to interact with UI elements, and edit to modify text content. Quick operations return synchronously. Long operations (like generating video clips) return a job ID that you poll until completion.
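That polling loop is easy to reproduce in any language. Here is a minimal Python sketch using only the standard library; note that the `job_id` and `status` field names are assumptions for illustration — the exact response shape is documented in the skill files.

```python
import json
import time
import urllib.request

API_URL = "https://action.genra.ai/"

def genra_action(session_key, action, **params):
    """POST a single action to the Genra API and return the parsed JSON reply."""
    payload = json.dumps({"session_key": session_key, "action": action, **params}).encode()
    req = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def wait_for_job(session_key, job_id,
                 poll=lambda sk, jid: genra_action(sk, "get_state", job_id=jid),
                 interval=5.0, timeout=600.0):
    """Poll a long-running job until it reports completion.

    The terminal "done"/"failed" status values are hypothetical stand-ins;
    the poll callable is injectable so the loop itself is transport-agnostic.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = poll(session_key, job_id)
        if state.get("status") in ("done", "failed"):
            return state
        time.sleep(interval)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
```

The injectable `poll` parameter is a design convenience: it lets you unit-test the loop without touching the network.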

File uploads use a simple multipart endpoint:

```bash
curl -s -X POST https://action.genra.ai/upload \
  -F "session_key=SK" -F "file=@product-photo.jpg"
```

The returned asset_id can be referenced in subsequent commands with the $ prefix.
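To sketch how that reference fits into a follow-up request — the `target` and `value` field names below are hypothetical, used only to show the `$` prefix in context:

```python
def asset_ref(asset_id):
    """Build the $-prefixed reference for a previously uploaded asset."""
    return f"${asset_id}"

def build_edit_payload(session_key, asset_id, target="shot-1-image"):
    """Hypothetical follow-up payload: the real parameter names for click/edit
    actions live in the skill files; these fields are illustrative."""
    return {
        "session_key": session_key,
        "action": "edit",
        "target": target,
        "value": asset_ref(asset_id),
    }
```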

7 Built-In Skills

The repo ships with seven production-ready skills, each encoding a complete video creation workflow:

1. Script to Video

The flagship skill. Converts a multi-shot screenplay into a finished video with consistent characters, scene lighting, and voiceover.

How it works: The agent analyzes your script, supplements each shot with visual descriptions (art style, lighting, camera angle), generates images with character consistency checks, produces voiceover and background music, then assembles and exports the final video.

Key feature: Automatic character consistency verification. The agent groups shots by character and compares visual features — hair color, clothing, signature accessories — across all appearances, regenerating any inconsistent shots before proceeding.

Supports: Chinese dialogue scripts, English screenplays (INT./EXT. format), and narrative text in other languages.
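The consistency check can be pictured as a first-appearance-wins comparison. This is a toy Python sketch, not the skill's actual implementation — the real check compares visual features extracted from generated images:

```python
def find_inconsistent_shots(shots):
    """Flag shots whose character features drift from the first appearance.

    shots: list of dicts with 'id', 'character', and 'features'
    (e.g. hair color, clothing, accessories). The first appearance of each
    character becomes the reference; later mismatches are flagged for
    regeneration.
    """
    reference = {}
    flagged = []
    for shot in shots:
        ref = reference.setdefault(shot["character"], shot["features"])
        if shot["features"] != ref:
            flagged.append(shot["id"])
    return flagged
```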

2. Talking Head

Generates single-speaker videos with fixed framing and seamless continuity between shots — ideal for social media content, educational explainers, and product introductions.

Key feature: Tail-frame chaining. Each shot's closing frame becomes the next shot's opening frame, creating natural continuity without jump cuts. The agent verifies depth-of-field alignment, character size, and vertical positioning across every transition.
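The chaining pattern itself is simple to sketch. In this toy Python illustration the frame identifiers are invented stand-ins; the actual skill passes real generated frames between shots:

```python
def chain_shots(shot_prompts):
    """Toy sketch of tail-frame chaining: each shot's closing frame becomes
    the next shot's opening frame, so transitions have no jump cuts."""
    opening_frame = None  # the first shot starts from scratch
    timeline = []
    for prompt in shot_prompts:
        clip = {
            "prompt": prompt,
            "opening_frame": opening_frame,
            "closing_frame": f"tail:{prompt}",  # stand-in for a real frame id
        }
        timeline.append(clip)
        opening_frame = clip["closing_frame"]  # hand off to the next shot
    return timeline
```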

3. Product Showcase

Converts product images and selling points into e-commerce videos optimized for platforms like Taobao, JD, Amazon, and Shopify.

Key feature: Auto-storyboarding. The agent researches competitor product videos, identifies core value propositions, prioritizes 3–5 selling points, and generates an 8–12 shot storyboard (30–60 seconds) with each shot serving exactly one selling point.

Default format: 9:16 vertical (720x1280) for product listing pages.
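The allocation rule — each shot serves exactly one selling point — can be sketched as a round-robin assignment. A toy illustration only; the skill's real storyboarding also researches competitors and writes visual descriptions per shot:

```python
def build_storyboard(selling_points, total_shots=10):
    """Fan 3-5 selling points out into an 8-12 shot storyboard,
    each shot serving exactly one selling point (round-robin)."""
    points = selling_points[:5]  # cap at 5 prioritized selling points
    return [
        {"shot": i + 1, "selling_point": points[i % len(points)]}
        for i in range(total_shots)
    ]
```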

4. Brand Atmosphere Film

Creates high-end emotional brand films inspired by Nike, Apple, and Red Bull — emphasizing mood, visual aesthetics, and dynamic pacing over feature-driven messaging.

Key feature: Pacing rhythm control. The agent alternates between quick cuts (1–1.5s) and lingering shots (2.5–4s) to create cinematic tension. Voiceover is intentionally minimal (5–10 words max), letting the visuals and music carry the emotional narrative.
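The alternating rhythm can be pictured with a short Python sketch. The agent's actual pacing logic lives in the skill file; this only illustrates the stated timing bands:

```python
import random

def pacing_plan(n_shots, seed=0):
    """Alternate quick cuts (1-1.5s) with lingering shots (2.5-4s)
    to sketch the quick/linger rhythm described above."""
    rng = random.Random(seed)  # seeded for reproducibility
    plan = []
    for i in range(n_shots):
        if i % 2 == 0:
            plan.append(round(rng.uniform(1.0, 1.5), 2))  # quick cut
        else:
            plan.append(round(rng.uniform(2.5, 4.0), 2))  # lingering shot
    return plan
```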

5. Photo Vlog

Transforms 3–10 real photos into a 30–60 second narrative vlog with camera movements and background music.

Key feature: Emotional ordering. Instead of arranging photos chronologically, the agent analyzes each photo's emotional intensity and narrative potential, then sequences them along an emotional arc for maximum impact. Camera movements (push-in, pan, pull-back) automatically alternate between shots.
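The sequencing idea can be sketched as a sort-plus-rotation: order shots so the emotional climax lands last, then cycle through camera moves. The intensity scores and move names below are hypothetical; the skill derives them from photo analysis:

```python
CAMERA_MOVES = ["push-in", "pan", "pull-back"]

def sequence_vlog(photos):
    """photos: list of (filename, emotional_intensity) pairs, where intensity
    is a hypothetical 0-10 score. Orders shots along a rising emotional arc
    (climax last) and alternates camera movements between shots."""
    ordered = sorted(photos, key=lambda p: p[1])
    return [
        (name, CAMERA_MOVES[i % len(CAMERA_MOVES)])
        for i, (name, _) in enumerate(ordered)
    ]
```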

6. Video Edit

Manages post-production modifications and quality assurance through three modes: single edits, systematic quality checks, or batch modifications.

Key feature: Multi-category QA. The agent downloads and inspects every frame, checking for A-category defects (numbers, borders, watermarks), B-category continuity issues (adjacent shots), and C-category conflicts (description vs. visual content). Issues are fixed via targeted I2I editing or full regeneration.

7. Video Continuation

Extends existing projects by adding new shots while preserving the original style, characters, music, and narrative flow.

Key feature: Selective regeneration. The agent never touches existing content — it only generates audio and video for new shots. Character definitions and scene descriptions are preserved verbatim to prevent character drift across the seam between old and new segments.

Getting Started

Three ways to use Genra Video Creator, from simplest to most flexible:

Option 1: Built-In Agent (No Setup)

Go to genra.ai, open a project, and use the chat panel. The built-in agent already has all skills loaded. Just describe what you want.

Option 2: Claude Code Plugin

Install the plugin in two commands:

```
/plugin marketplace add genra-ai/video-creator
/plugin install genra@genra-ai
```

Then connect:

```
/genra:start
```

All skills become available as /genra:<skill-name> commands with automatic discovery. For example, /genra:script-to-video converts your screenplay into a finished video, and /genra:product-showcase turns product images into e-commerce videos.

Option 3: Any AI Agent

Point your agent to the skill file URL:

```
https://github.com/genra-ai/video-creator/blob/main/plugins/genra/commands/start.md
```

The agent reads the instructions, authenticates via the API, and gains full control of the Genra editor. This works with Codex, Gemini CLI, OpenClaw, or any agent that can read markdown and make HTTP requests.

Why Markdown Skills Instead of an SDK?

Traditional API integrations require SDKs, version management, dependency installation, and language-specific implementations. Genra Video Creator takes a different approach: the "SDK" is a markdown file that any AI agent can read.

This design choice has several advantages:

  • Zero dependencies. No packages to install, no version conflicts, no build step. The agent reads a URL and starts working.
  • Agent-native. AI agents are better at following natural language instructions than parsing API documentation. Markdown skills speak the agent's language.
  • Self-updating. Point to the GitHub URL and the agent always gets the latest version. No SDK updates to ship.
  • Cross-platform. Works with any agent on any platform. Claude Code, Codex, Gemini CLI — if it can read text and make HTTP calls, it works.
  • Embeds domain knowledge. Skills don't just describe API endpoints — they encode production best practices. The script-to-video skill knows how to check character consistency. The brand film skill knows how to pace cuts for emotional impact. This knowledge transfers directly to the agent.

Multilingual Support

All skills are available in both English and Chinese. Chinese versions are located in the plugins/genra/commands/cn/ directory with the _cn suffix:

  • script-to-video_cn.md
  • talking-head_cn.md
  • product-showcase_cn.md
  • brand-story_cn.md
  • photo-vlog_cn.md
  • video-edit_cn.md
  • video-continuation_cn.md

What Can You Build With This?

Some ideas:

  • Automated product video pipeline: Feed your e-commerce catalog into the product-showcase skill. Generate listing videos for every SKU automatically.
  • Content factory: Script 20 videos, queue them up, and let the agent produce them overnight. Wake up to 20 finished videos.
  • Custom video generation app: Build a web app where users input a brief, your backend agent calls Genra, and the user gets a finished video in minutes.
  • CI/CD integration: Auto-generate demo videos when a new feature ships. The agent reads the changelog, writes a script, and produces the video.
  • Multi-language video localization: Take one video, extend it with the video-continuation skill in 10 languages, each with native voiceover.

Frequently Asked Questions

What is Genra Video Creator?

Genra Video Creator is an open-source collection of AI skill files that teach autonomous agents how to control the Genra video editor via API. It includes pre-built workflows for script-to-video, talking-head videos, product showcases, brand films, photo vlogs, and more.

Which AI agents work with Genra Video Creator?

It works with Claude Code (as a native plugin), OpenClaw, Codex, Gemini CLI, and any autonomous agent that can read markdown instructions and make HTTP requests. No SDK is required.

Do I need to install an SDK?

No. Genra's API is plain HTTP. You send POST requests to https://action.genra.ai/ with JSON payloads. Any language or tool that can make HTTP calls can control Genra.

Is it free?

The skill files and plugin are open source on GitHub. You need a Genra account to use the video creation platform.

Can I contribute new skills?

Yes. The repo is open source. Fork it, create a new skill markdown file following the existing patterns, and submit a pull request. Community skills that meet quality standards will be merged and become available to all users.

Get Started

The repo: github.com/genra-ai/video-creator

The fastest path: open Genra, use the built-in agent, and start creating. If you want programmatic control, install the Claude Code plugin or point any AI agent to the skill file URL. The skills handle the production complexity — you just tell the agent what video you want.
