DEV Community

SINGHO

Posted on • Originally published at github.com

Build AI Videos with Code: Seedance 2.0 API Integration Guide

Everyone's talking about Seedance 2.0 — ByteDance's new AI video model that generates cinema-grade video with native audio sync. But most articles stop at "look what it can do." This one shows you how to actually build with it — real API calls, real code, real results.

What you'll walk away with:

  • Generate AI videos from text, images, video references, and audio — all via API
  • Poll async tasks and retrieve your output video URL
  • Set up MCP integration so your AI editor can call the API for you
  • Use Cursor Skills to automate entire storyboard-to-video workflows

Prerequisites

  1. Register at SuTui AI (the API platform for Seedance 2.0)
  2. Create an API Key on the API Key page
  3. Grab some credits — a 5-second Fast video costs 50 credits (~$0.50)

Base URL: https://api.xskill.ai

Auth header:

Authorization: Bearer sk-your-api-key

Core Concepts: Two Modes, One Model

Seedance 2.0 (st-ai/super-seed2) supports two function modes:

  • omni_reference: multi-modal mixing, combine images, videos, and audio freely (assets: image_files, video_files, audio_files)
  • first_last_frames: control start/end frames for smooth transitions (assets: filePaths)

The @ Reference System

Upload assets and reference them by position in your prompt:

@image_file_1 character performs the dance from @video_file_1, cinematic lighting
  • @image_file_N → N-th element in the image_files array
  • @video_file_N → N-th element in the video_files array
  • @audio_file_N → N-th element in the audio_files array

Up to 9 images + 3 videos + 3 audio clips in a single request.
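
If you build these payloads programmatically, it's easy for a prompt reference to drift out of sync with its asset arrays. Here's a minimal client-side check you can run before submitting; the `build_params` helper, its name, and its validation rules are my own illustration, not part of the API:

```python
import re

# Per-request asset caps from the docs: 9 images, 3 videos, 3 audio clips
CAPS = {"image_files": 9, "video_files": 3, "audio_files": 3}

def build_params(prompt, **assets):
    """Build a params dict, checking that every @..._file_N reference
    in the prompt points at an element that actually exists."""
    for kind, cap in CAPS.items():
        files = assets.get(kind, [])
        if len(files) > cap:
            raise ValueError(f"{kind}: at most {cap} allowed, got {len(files)}")
        prefix = kind[:-1]  # "image_files" -> "image_file"
        # e.g. @image_file_2 must have a second entry in image_files
        for n in re.findall(rf"@{prefix}_(\d+)", prompt):
            if int(n) > len(files):
                raise ValueError(f"@{prefix}_{n} has no matching entry in {kind}")
    return {"prompt": prompt, **assets}

params = build_params(
    "@image_file_1 character performs the dance from @video_file_1",
    image_files=["https://character.png"],
    video_files=["https://dance-reference.mp4"],
)
```

Merge the returned dict into the `params` object of your create request.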


Step 1: Generate Your First Video (cURL)

Two endpoints, two steps — create a task, then poll for results.

Create a task

curl -X POST "https://api.xskill.ai/api/v3/tasks/create" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{
    "model": "st-ai/super-seed2",
    "params": {
      "model": "seedance_2.0_fast",
      "prompt": "A golden sunset over the ocean, waves gently crashing on shore, cinematic drone shot",
      "functionMode": "first_last_frames",
      "ratio": "16:9",
      "duration": 5
    }
  }'

Response:

{
  "code": 200,
  "data": {
    "task_id": "task_abc123",
    "price": 50
  }
}

Poll for results

curl -X POST "https://api.xskill.ai/api/v3/tasks/query" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer sk-your-api-key" \
  -d '{"task_id": "task_abc123"}'

When complete:

{
  "code": 200,
  "data": {
    "status": "completed",
    "result": {
      "output": {
        "images": ["https://your-video-output.mp4"]
      }
    }
  }
}

Status values: pending → processing → completed (or failed)


Step 2: Python — Full Create + Poll Loop

Here's a reusable create-and-poll pattern you can drop into any Python project:

import requests
import time

API_KEY = "sk-your-api-key"
BASE = "https://api.xskill.ai"
HEADERS = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {API_KEY}"
}

def create_video(prompt, **kwargs):
    """Create a Seedance 2.0 video task."""
    payload = {
        "model": "st-ai/super-seed2",
        "params": {
            "model": kwargs.get("speed", "seedance_2.0_fast"),
            "prompt": prompt,
            "functionMode": kwargs.get("mode", "first_last_frames"),
            "ratio": kwargs.get("ratio", "16:9"),
            "duration": kwargs.get("duration", 5),
        }
    }
    # Add optional asset arrays
    for key in ("image_files", "video_files", "audio_files", "filePaths"):
        if key in kwargs:
            payload["params"][key] = kwargs[key]

    resp = requests.post(f"{BASE}/api/v3/tasks/create", json=payload, headers=HEADERS)
    resp.raise_for_status()  # fail fast on HTTP errors
    return resp.json()["data"]["task_id"]

def poll_result(task_id, interval=5, timeout=300):
    """Poll until task completes or timeout."""
    elapsed = 0
    while elapsed < timeout:
        resp = requests.post(
            f"{BASE}/api/v3/tasks/query",
            json={"task_id": task_id},
            headers=HEADERS
        )
        data = resp.json()["data"]
        status = data["status"]
        print(f"[{elapsed}s] Status: {status}")

        if status == "completed":
            return data["result"]["output"]["images"][0]
        if status == "failed":
            raise RuntimeError("Task failed")

        time.sleep(interval)
        elapsed += interval
    raise TimeoutError("Task timed out")

# --- Example: Text-to-Video ---
task_id = create_video("A cat wearing sunglasses walks down a neon-lit Tokyo street at night")
video_url = poll_result(task_id)
print(f"Video ready: {video_url}")
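
`poll_result` hands back a URL, not a file. If you want to persist the MP4 locally, here's a small follow-on sketch; the helper names and the fallback filename are my own choices:

```python
import os
import requests
from urllib.parse import urlparse

def filename_from_url(url, default="output.mp4"):
    """Derive a local filename from the video URL, falling back to a default."""
    name = os.path.basename(urlparse(url).path)
    return name if name.endswith(".mp4") else default

def download_video(url, dest=None):
    """Stream the finished video to disk and return the local path."""
    dest = dest or filename_from_url(url)
    with requests.get(url, stream=True, timeout=60) as resp:
        resp.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in resp.iter_content(chunk_size=1 << 16):
                f.write(chunk)
    return dest
```

Call `download_video(video_url)` right after the poll loop returns.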

Step 3: JavaScript — Async/Await Pattern

const API_KEY = "sk-your-api-key";
const BASE = "https://api.xskill.ai";
const headers = {
  "Content-Type": "application/json",
  Authorization: `Bearer ${API_KEY}`,
};

async function createVideo(prompt, opts = {}) {
  const resp = await fetch(`${BASE}/api/v3/tasks/create`, {
    method: "POST",
    headers,
    body: JSON.stringify({
      model: "st-ai/super-seed2",
      params: {
        model: opts.speed || "seedance_2.0_fast",
        prompt,
        functionMode: opts.mode || "first_last_frames",
        ratio: opts.ratio || "16:9",
        duration: opts.duration || 5,
        ...opts.assets,
      },
    }),
  });
  if (!resp.ok) throw new Error(`Create failed: HTTP ${resp.status}`);
  const { data } = await resp.json();
  return data.task_id;
}

async function pollResult(taskId, interval = 5000, timeout = 300000) {
  const start = Date.now();
  while (Date.now() - start < timeout) {
    const resp = await fetch(`${BASE}/api/v3/tasks/query`, {
      method: "POST",
      headers,
      body: JSON.stringify({ task_id: taskId }),
    });
    const { data } = await resp.json();
    console.log(`Status: ${data.status}`);

    if (data.status === "completed")
      return data.result.output.images[0];
    if (data.status === "failed")
      throw new Error("Task failed");

    await new Promise((r) => setTimeout(r, interval));
  }
  throw new Error("Timeout");
}

// --- Example: Image-to-Video (Omni Reference) ---
const taskId = await createVideo(
  "@image_file_1 character slowly turns and smiles, breeze moves hair, golden hour lighting",
  {
    mode: "omni_reference",
    assets: {
      image_files: ["https://your-character-image.png"],
    },
  }
);
const videoUrl = await pollResult(taskId);
console.log("Video ready:", videoUrl);

Real-World Use Cases

Here are the patterns that unlock Seedance 2.0's full power:

Motion Transfer (Image + Video)

Make a character from a photo perform actions from a reference video:

{
  "prompt": "@image_file_1 character performs following @video_file_1 motion and camera style, cinematic lighting",
  "functionMode": "omni_reference",
  "image_files": ["https://character.png"],
  "video_files": ["https://dance-reference.mp4"]
}

Audio-Driven Lip Sync (Image + Audio)

A character speaks with phoneme-level lip sync in 8+ languages:

{
  "prompt": "@image_file_1 character speaks naturally, matching @audio_file_1 with expressive lip sync",
  "functionMode": "omni_reference",
  "image_files": ["https://character.png"],
  "audio_files": ["https://voiceover.mp3"]
}

Multi-Character Scene

Two characters interacting in one shot:

{
  "prompt": "@image_file_1 and @image_file_2 face each other in conversation, warm indoor lighting",
  "functionMode": "omni_reference",
  "image_files": ["https://person-a.png", "https://person-b.png"]
}

First/Last Frame Transition

Control exactly where your video starts and ends:

{
  "prompt": "Smooth cinematic transition, elegant camera movement, natural lighting shift",
  "functionMode": "first_last_frames",
  "filePaths": ["https://sunrise.png", "https://sunset.png"],
  "duration": 10
}
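
All four fragments above are just the `params` portion of a create request. Here's a small sketch of wrapping one into the full request body from Step 1; the `wrap_params` helper is my own convenience, not an SDK function:

```python
def wrap_params(fragment, speed="seedance_2.0_fast", ratio="16:9", duration=5):
    """Merge a use-case params fragment into a complete create-task body.
    Keys in the fragment (e.g. its own "duration") win over the defaults."""
    params = {"model": speed, "ratio": ratio, "duration": duration, **fragment}
    return {"model": "st-ai/super-seed2", "params": params}

# First/last frame transition, as a full body ready to POST
# to https://api.xskill.ai/api/v3/tasks/create
body = wrap_params({
    "prompt": "Smooth cinematic transition, elegant camera movement, natural lighting shift",
    "functionMode": "first_last_frames",
    "filePaths": ["https://sunrise.png", "https://sunset.png"],
    "duration": 10,
})
```

Because the fragment is merged last, its `duration: 10` overrides the 5-second default.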

Level Up: MCP Integration

Don't want to write API calls manually? Use MCP (Model Context Protocol) to let your AI editor handle everything.

One-Click Install

Mac / Linux:

curl -fsSL https://api.xskill.ai/install-mcp.sh | bash -s -- YOUR_API_KEY

Windows (PowerShell):

irm https://api.xskill.ai/install-mcp.ps1 | iex

Manual Setup (Cursor)

Create .cursor/mcp.json in your project root:

{
  "mcpServers": {
    "sutui-ai": {
      "command": "npx",
      "args": [
        "-y",
        "@anthropic/mcp-client",
        "https://api.xskill.ai/api/v3/mcp-http"
      ],
      "env": {
        "SUTUI_API_KEY": "YOUR_API_KEY"
      }
    }
  }
}

Now you can just chat:

You: Generate a 10-second video of an astronaut walking on Mars, 16:9, cinematic style

Agent: Sure! Submitting to Seedance 2.0... your video is ready: [link]

Works with Cursor, Claude Desktop, and any MCP-compatible editor.


Bonus: AI Storyboarding with Cursor Skills

This is where it gets wild. Install the Seedance Storyboard Skill and the AI Agent automates the entire creative pipeline:

Your Idea → Info Gathering → Reference Image Generation → Storyboard → Video → Done

Install

git clone https://github.com/siliconflow/seedance2-api.git
cp -r seedance2-api/.cursor/skills/seedance-storyboard/ \
      your-project/.cursor/skills/seedance-storyboard/

What Happens

Just describe what you want in plain English:

You: Make a 15-second coffee brand commercial for "Lucky Coffee"

Agent:
1. Gathers info (duration, ratio, style)
2. Generates reference images with Seedream 4.5
3. Builds a professional shot list:
   0-3s: Macro — coffee pouring, steam rising
   3-6s: Medium orbit — hand holding cup, sunlight
   6-10s: Push into coffee beans falling
   10-12s: Black transition
   12-15s: Brand text "Lucky Coffee" fades in
4. Submits to Seedance 2.0
5. Returns your finished video

The Skill includes built-in storyboard templates (narrative, product, action, scenic) and a camera movement glossary — the Agent picks the best fit automatically.


Pricing Quick Reference

  • Fast 5s (text-to-video): 50 credits
  • Fast 5s (with video input): 100 credits
  • Standard 5s (with video input): 200 credits
  • Fast, per-second (no video input): 10 credits/sec
  • Fast, per-second (with video input): 20 credits/sec

Duration range: 4-15 seconds. Speed modes: seedance_2.0_fast or seedance_2.0 (standard).
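
The per-second rows above make rough budgeting easy. Here's a small estimator covering only the Fast rates the table states (Standard per-second pricing isn't listed, so it's omitted; the function name is my own):

```python
# Credit cost per second for Fast mode, from the pricing table:
# 10 credits/sec without video input, 20 credits/sec with it.
FAST_RATES = {False: 10, True: 20}

def estimate_fast_credits(duration, with_video=False):
    """Estimate credits for a seedance_2.0_fast task (duration 4-15s)."""
    if not 4 <= duration <= 15:
        raise ValueError("duration must be between 4 and 15 seconds")
    return duration * FAST_RATES[with_video]

print(estimate_fast_credits(5))                   # 50 credits, matching the table
print(estimate_fast_credits(5, with_video=True))  # 100 credits
```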


Wrapping Up

Seedance 2.0 is not just another AI video model — the multi-modal input system, native audio sync, and @ reference syntax make it genuinely programmable. Whether you're building a video generation feature into your SaaS, automating content pipelines, or just experimenting — the API is straightforward and the MCP/Skills layer makes it even easier.

Have questions or built something cool with Seedance 2.0? Drop a comment below — I'd love to see what you're making.
