DEV Community

SINGHO

Posted on • Originally published at github.com

Build AI Videos with Code: Seedance 2.0 API Integration Guide

Everyone's talking about Seedance 2.0 — ByteDance's new AI video model that generates cinema-grade video with native audio sync. But most articles stop at "look what it can do." This one shows you how to actually build with it — real API calls, real code, real results.

What you'll walk away with:

  • Generate AI videos from text, images, video references, and audio — all via API
  • Poll async tasks and retrieve your output video URL
  • Set up MCP integration so your AI editor can call the API for you
  • Use Cursor Skills to automate entire storyboard-to-video workflows

Prerequisites

  1. Register at SuTui AI (the API platform for Seedance 2.0)
  2. Create an API Key on the API Key page
  3. Grab some credits — a 5-second Fast video costs 50 credits (~$0.50)

Base URL: https://api.xskill.ai

Auth header:

Authorization: Bearer sk-your-api-key

Core Concepts: Two Modes, One Model

Seedance 2.0 (st-ai/super-seed2) supports two function modes:

  • omni_reference: multi-modal mixing, combine images, videos, and audio freely (assets: image_files, video_files, audio_files)
  • first_last_frames: control start/end frames for smooth transitions (assets: filePaths)

The @ Reference System

Upload assets and reference them by position in your prompt:

@image_file_1 character performs the dance from @video_file_1, cinematic lighting
  • @image_file_N → N-th element in the image_files array
  • @video_file_N → N-th element in the video_files array
  • @audio_file_N → N-th element in the audio_files array

Up to 9 images + 3 videos + 3 audio clips in a single request.
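
If you build these payloads programmatically, it's easy for a prompt reference to drift out of sync with its asset arrays. Here's a minimal client-side check you can run before submitting; the `build_params` helper, its name, and its validation rules are my own illustration, not part of the API:

```python
import re

# Per-request asset caps from the docs: 9 images, 3 videos, 3 audio clips
CAPS = {"image_files": 9, "video_files": 3, "audio_files": 3}

def build_params(prompt, **assets):
    """Build a params dict, checking that every @..._file_N reference
    in the prompt points at an element that actually exists."""
    for kind, cap in CAPS.items():
        files = assets.get(kind, [])
        if len(files) > cap:
            raise ValueError(f"{kind}: at most {cap} allowed, got {len(files)}")
        prefix = kind[:-1]  # "image_files" -> "image_file"
        # e.g. @image_file_2 must have a second entry in image_files
        for n in re.findall(rf"@{prefix}_(\d+)", prompt):
            if int(n) > len(files):
                raise ValueError(f"@{prefix}_{n} has no matching entry in {kind}")
    return {"prompt": prompt, **assets}

params = build_params(
    "@image_file_1 character performs the dance from @video_file_1",
    image_files=["https://character.png"],
    video_files=["https://dance-reference.mp4"],
)
```

Merge the returned dict into the `params` object of your create request.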


Step 1: Generate Your First Video (cURL)

Two endpoints, two steps — create a task, then poll for results.

Create a task

curl -X POST "https://api.xskill.ai/api/v3/tasks/create" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "st-ai/super-seed2",
    "params": {
      "model": "seedance_2.0_fast",
      "prompt": "A golden sunset over the ocean, waves gently crashing on shore, cinematic drone shot",
      "functionMode": "first_last_frames",
      "ratio": "16:9",
      "duration": 5
    }
  }'

Response:

{
  "code": 200,
  "data": {
    "task_id": "task_abc123",
    "price": 50
  }
}

Poll for results

curl -X POST "https://api.xskill.ai/api/v3/tasks/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{"task_id": "task_abc123"}'

When complete:

{
  "code": 200,
  "data": {
    "status": "completed",
    "result": {
      "output": {
        "images": ["https://your-video-output.mp4"]
      }
    }
  }
}

Status values: pending → processing → completed (or failed)


Step 2: Python — Full Create + Poll Loop

Here's a reusable create-and-poll pattern you can drop into any Python project:

import requests
import time

API_KEY = "sk-your-api-key"
BASE = "https://api.xskill.ai"
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}

def create_video(prompt, **kwargs):
    """Create a Seedance 2.0 video task."""
    payload = {
        "model": "st-ai/super-seed2",
        "params": {
            "model": kwargs.get("speed", "seedance_2.0_fast"),
            "prompt": prompt,
            "functionMode": kwargs.get("mode", "first_last_frames"),
            "ratio": kwargs.get("ratio", "16:9"),
            "duration": kwargs.get("duration", 5),
        }
    }
    # Add optional asset arrays
    for key in ("image_files", "video_files", "audio_files", "filePaths"):
        if key in kwargs:
            payload["params"][key] = kwargs[key]

    resp = requests.post(f"{BASE}/api/v3/tasks/create", json=payload, headers=HEADERS)
    resp.raise_for_status()  # fail fast on HTTP errors
    return resp.json()["data"]["task_id"]

def poll_result(task_id, interval=5, timeout=300):
    """Poll until task completes or timeout."""
    elapsed = 0
    while elapsed < timeout:
        resp = requests.post(
            f"{BASE}/api/v3/tasks/query",
            json={"task_id": task_id},
            headers=HEADERS
        )
        data = resp.json()["data"]
        status = data["status"]
        print(f"[{elapsed}s] Status: {status}")

        if status == "completed":
            return data["result"]["output"]["images"][0]
        if status == "failed":
            raise RuntimeError("Task failed")

        time.sleep(interval)
        elapsed += interval
    raise TimeoutError("Task timed out")

# --- Example: Text-to-Video ---
task_id = create_video("A cat wearing sunglasses walks down a neon-lit Tokyo street at night")
video_url = poll_result(task_id)
print(f"Video ready: {video_url}")
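
`poll_result` hands back a URL, not a file. If you want to persist the MP4 locally, here's a small follow-on sketch; the helper names and the fallback filename are my own choices:

```python
import os
import requests
from urllib.parse import urlparse

def filename_from_url(url, default="output.mp4"):
    """Derive a local filename from the video URL, falling back to a default."""
    name = os.path.basename(urlparse(url).path)
    return name if name.endswith(".mp4") else default

def download_video(url, dest=None):
    """Stream the finished video to disk and return the local path."""
    dest = dest or filename_from_url(url)
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 16):
                f.write(chunk)
    return dest
```

Call `download_video(video_url)` right after the poll loop returns.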

Step 3: JavaScript — Async/Await Pattern

const API_KEY = "sk-your-api-key";
const BASE = "https://api.xskill.ai";
const headers = {
  "Content-Type": "application/json",
  Authorization: `Bearer ${API_KEY}`,
};

async function createVideo(prompt, opts = {}) {
  const resp = await fetch(`${BASE}/api/v3/tasks/create`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      model: "st-ai/super-seed2",
      params: {
        model: opts.speed || "seedance_2.0_fast",
        prompt,
        functionMode: opts.mode || "first_last_frames",
        ratio: opts.ratio || "16:9",
        duration: opts.duration || 5,
        ...opts.assets,
      },
    }),
  });
  if (!resp.ok) throw new Error(`Create failed: HTTP ${resp.status}`);
  const { data } = await resp.json();
  return data.task_id;
}

async function pollResult(taskId, interval = 5000, timeout = 300000) {
  const start = Date.now();
  while (Date.now() - start < timeout) {
    const resp = await fetch(`${BASE}/api/v3/tasks/query`, {
      method: "POST",
      headers,
      body: JSON.stringify({ task_id: taskId }),
    });
    const { data } = await resp.json();
    console.log(`Status: ${data.status}`);

    if (data.status === "completed")
      return data.result.output.images[0];
    if (data.status === "failed")
      throw new Error("Task failed");

    await new Promise((r) => setTimeout(r, interval));
  }
  throw new Error("Timeout");
}

// --- Example: Image-to-Video (Omni Reference) ---
const taskId = await createVideo(
  "@image_file_1 character slowly turns and smiles, breeze moves hair, golden hour lighting",
  {
    mode: "omni_reference",
    assets: {
      image_files: ["https://your-character-image.png"],
    },
  }
);
const videoUrl = await pollResult(taskId);
console.log("Video ready:", videoUrl);

Real-World Use Cases

Here are the patterns that unlock Seedance 2.0's full power:

Motion Transfer (Image + Video)

Make a character from a photo perform actions from a reference video:

{
  "prompt": "@image_file_1 character performs following @video_file_1 motion and camera style, cinematic lighting",
  "functionMode": "omni_reference",
  "image_files": ["https://character.png"],
  "video_files": ["https://dance-reference.mp4"]
}

Audio-Driven Lip Sync (Image + Audio)

A character speaks with phoneme-level lip sync in 8+ languages:

{
  "prompt": "@image_file_1 character speaks naturally, matching @audio_file_1 with expressive lip sync",
  "functionMode": "omni_reference",
  "image_files": ["https://character.png"],
  "audio_files": ["https://voiceover.mp3"]
}

Multi-Character Scene

Two characters interacting in one shot:

{
  "prompt": "@image_file_1 and @image_file_2 face each other in conversation, warm indoor lighting",
  "functionMode": "omni_reference",
  "image_files": ["https://person-a.png", "https://person-b.png"]
}

First/Last Frame Transition

Control exactly where your video starts and ends:

{
  "prompt": "Smooth cinematic transition, elegant camera movement, natural lighting shift",
  "functionMode": "first_last_frames",
  "filePaths": ["https://sunrise.png", "https://sunset.png"],
  "duration": 10
}
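
All four fragments above are just the `params` portion of a create request. Here's a small sketch of wrapping one into the full request body from Step 1; the `wrap_params` helper is my own convenience, not an SDK function:

```python
def wrap_params(fragment, speed="seedance_2.0_fast", ratio="16:9", duration=5):
    """Merge a use-case params fragment into a complete create-task body.
    Keys in the fragment (e.g. its own "duration") win over the defaults."""
    params = {"model": speed, "ratio": ratio, "duration": duration, **fragment}
    return {"model": "st-ai/super-seed2", "params": params}

# First/last frame transition, as a full body ready to POST
# to https://api.xskill.ai/api/v3/tasks/create
body = wrap_params({
    "prompt": "Smooth cinematic transition, elegant camera movement, natural lighting shift",
    "functionMode": "first_last_frames",
    "filePaths": ["https://sunrise.png", "https://sunset.png"],
    "duration": 10,
})
```

Because the fragment is merged last, its `duration: 10` overrides the 5-second default.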

Level Up: MCP Integration

Don't want to write API calls manually? Use MCP (Model Context Protocol) to let your AI editor handle everything.

One-Click Install

Mac / Linux:

curl -fsSL https://api.xskill.ai/install-mcp.sh | bash -s -- YOUR_API_KEY

Windows (PowerShell):

irm https://api.xskill.ai/install-mcp.ps1 | iex

Manual Setup (Cursor)

Create .cursor/mcp.json in your project root:

{
  "mcpServers": {
    "sutui-ai": {
      "command": "npx",
      "args": [
        "-y",
        "@anthropic/mcp-client",
        "https://api.xskill.ai/api/v3/mcp-http"
      ],
      "env": {
        "SUTUI_API_KEY": "YOUR_API_KEY"
      }
    }
  }
}

Now you can just chat:

You: Generate a 10-second video of an astronaut walking on Mars, 16:9, cinematic style

Agent: Sure! Submitting to Seedance 2.0... your video is ready: [link]

Works with Cursor, Claude Desktop, and any MCP-compatible editor.


Bonus: AI Storyboarding with Cursor Skills

This is where it gets wild. Install the Seedance Storyboard Skill and the AI Agent automates the entire creative pipeline:

Your Idea → Info Gathering → Reference Image Generation → Storyboard → Video → Done

Install

git clone https://github.com/siliconflow/seedance2-api.git
cp -r seedance2-api/.cursor/skills/seedance-storyboard/ \
      your-project/.cursor/skills/seedance-storyboard/

What Happens

Just describe what you want in plain English:

You: Make a 15-second coffee brand commercial for "Lucky Coffee"

Agent:
1. Gathers info (duration, ratio, style)
2. Generates reference images with Seedream 4.5
3. Builds a professional shot list:
   0-3s: Macro — coffee pouring, steam rising
   3-6s: Medium orbit — hand holding cup, sunlight
   6-10s: Push into coffee beans falling
   10-12s: Black transition
   12-15s: Brand text "Lucky Coffee" fades in
4. Submits to Seedance 2.0
5. Returns your finished video

The Skill includes built-in storyboard templates (narrative, product, action, scenic) and a camera movement glossary — the Agent picks the best fit automatically.


Pricing Quick Reference

  • Fast 5s (text-to-video): 50 credits
  • Fast 5s (with video input): 100 credits
  • Standard 5s (with video input): 200 credits
  • Fast, per-second (no video input): 10 credits/sec
  • Fast, per-second (with video input): 20 credits/sec

Duration range: 4-15 seconds. Speed modes: seedance_2.0_fast or seedance_2.0 (standard).
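
The per-second rows above make rough budgeting easy. Here's a small estimator covering only the Fast rates the table states (Standard per-second pricing isn't listed, so it's omitted; the function name is my own):

```python
# Credit cost per second for Fast mode, from the pricing table:
# 10 credits/sec without video input, 20 credits/sec with it.
FAST_RATES = {False: 10, True: 20}

def estimate_fast_credits(duration, with_video=False):
    """Estimate credits for a seedance_2.0_fast task (duration 4-15s)."""
    if not 4 <= duration <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")
    return duration * FAST_RATES[with_video]

print(estimate_fast_credits(5))                   # 50 credits, matching the table
print(estimate_fast_credits(5, with_video=True))  # 100 credits
```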


Wrapping Up

Seedance 2.0 is not just another AI video model — the multi-modal input system, native audio sync, and @ reference syntax make it genuinely programmable. Whether you're building a video generation feature into your SaaS, automating content pipelines, or just experimenting — the API is straightforward and the MCP/Skills layer makes it even easier.

Have questions or built something cool with Seedance 2.0? Drop a comment below — I'd love to see what you're making.
