TL;DR
Reference video in Seedance 2.0 lets you anchor motion—camera moves, character choreography, timing—to an existing clip instead of describing everything in text. Use clean, single-shot, 3–8 second H.264 reference clips (no jump cuts). Keep text prompts short (≤3 adjectives for style). Text covers what the reference can’t show; the reference drives motion. If results ignore the reference, follow the troubleshooting steps in this guide.
Introduction
Text-only video generation is effective for open-ended concepts: atmospheric scenes, exploratory directions, varied visual styles. But when you need precise motion—specific gesture timing, a camera push-in, or a walk cycle—text falls short.
Reference video bridges this gap. Provide a sample clip, and Seedance 2.0 reinterprets the motion for your new scene.
This guide explains when to use reference video, how to create effective reference clips, and how to troubleshoot common issues.
When to use reference video
Reference video is best for:
- Micro-gestures: Precise actions like “a thumb tap” or “a nod that lands on beat three.”
- Choreography: Consistent motion patterns, such as a specific walk or repeated routine.
- Camera moves: Subtle operations (slow push-ins, orbits, unique framing changes).
- Beat-matching: Syncing actions to audio cues; the model reads timing directly from the clip.
Text-only is better for:
- Loose concepts or atmospheric pieces where variety is desirable
- Exploring visual directions for the same content
- Cases where the motion is simple enough to describe in text, or where no reference clip is available
Preparing reference clips
To get reliable results, follow these best practices:
- Length: 3–8 seconds. Shorter clips carry too little motion information; longer clips produce inconsistent output.
- Continuity: No edits, cuts, or jump cuts. Use a single continuous shot.
- Compression: Clean H.264 (no macro-blocking or artifacts).
- Subject clarity: Plain backgrounds, steady lighting. Minimize distractions.
Reference clip checklist:
- [ ] Under 8 seconds
- [ ] Single continuous shot (no cuts)
- [ ] Clean compression (no visible blocking)
- [ ] Subject clearly visible
- [ ] Steady lighting throughout
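The first few checklist items can be screened automatically from clip metadata. Below is a minimal sketch: the function name and the metadata dict shape (`duration_s`, `width`, `height`, `cut_count`) are assumptions, not part of any Seedance tooling; in practice you would populate the dict from something like ffprobe. Single-shot continuity and lighting still need a human eye.

```python
def check_reference_clip(meta: dict) -> list[str]:
    """Return a list of checklist violations (empty means the automated checks pass)."""
    issues = []
    # 3-8 seconds, per the length guidance above.
    if not 3 <= meta.get("duration_s", 0) <= 8:
        issues.append("duration should be 3-8 seconds")
    # 720p or higher, per the resolution guidance in the FAQ.
    if min(meta.get("width", 0), meta.get("height", 0)) < 720:
        issues.append("resolution should be 720p or higher")
    # A single continuous shot: any detected cut disqualifies the clip.
    if meta.get("cut_count", 0) > 0:
        issues.append("clip must be a single continuous shot (no cuts)")
    return issues

# A 5-second 1080p single-shot clip passes the automated checks.
print(check_reference_clip({"duration_s": 5, "width": 1920, "height": 1080, "cut_count": 0}))
```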
Prompting with a reference clip
Combine reference video with concise text. The text prompt should clarify what the reference can’t show.
Use text for:
- Style descriptors (lighting, color palette, tone)
- Subject identity (who or what appears)
- Camera context (if not clear from reference)
- One or two specific constraints
Optimal prompt structure:
Style: [2-3 descriptors for lighting and palette]
Subject: [identity description with clear features]
Camera: [if different from reference]
Reference intent: "Respect motion from reference: reinterpret texture and color."
Must not: [one specific constraint, if needed]
Example:
Reference: person walking with a specific pace
Prompt:
Style: warm afternoon light, golden tones
Subject: a man in a gray suit, early 40s, confident posture
Respect motion from reference: reinterpret texture and color.
Must not: change walking pace
Tip: Limit to three style adjectives. More can create conflicting instructions and reduce output quality.
API usage via WaveSpeedAI
Use the WaveSpeedAI API to generate video with reference clips:
POST https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json
{
"prompt": "Warm afternoon light, golden tones. A man in a gray suit walks forward. Respect motion from reference.",
"image_url": "https://example.com/subject-reference.jpg",
"reference_video_url": "https://example.com/motion-reference.mp4",
"duration": 5,
"aspect_ratio": "16:9"
}
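The same request can be issued from Python with the standard library. The endpoint and field names are taken from the example above; reading the key from `WAVESPEED_API_KEY` is an assumption about your environment setup.

```python
import json
import os
import urllib.request

API_URL = "https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video"

def build_request(prompt: str, image_url: str, reference_video_url: str,
                  duration: int = 5, aspect_ratio: str = "16:9") -> urllib.request.Request:
    """Build the POST request shown above; send it with urllib.request.urlopen()."""
    body = {
        "prompt": prompt,
        "image_url": image_url,
        "reference_video_url": reference_video_url,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('WAVESPEED_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )
```

Sending the built request (`urllib.request.urlopen(build_request(...))`) returns the job record whose `id` you poll, as shown in the next section.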
Testing with Apidog
Set up an Apidog test collection before integrating.
1. Environment setup:
Create an environment with WAVESPEED_API_KEY as a Secret variable.
2. Two-request flow:
- Request 1: Start generation.
- Request 2: Poll for completion.
Request 1:
POST https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json
{
"prompt": "{{motion_prompt}}",
"image_url": "{{subject_image}}",
"reference_video_url": "{{reference_clip}}",
"duration": {{duration}},
"aspect_ratio": "16:9"
}
In the Tests tab:
pm.environment.set("job_id", pm.response.json().id);
Request 2:
GET https://api.wavespeed.ai/api/v2/predictions/{{job_id}}
Authorization: Bearer {{WAVESPEED_API_KEY}}
Assert:
Check response body: status equals "completed".
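Outside Apidog, the same two-request flow reduces to a polling loop. This sketch injects the `fetch` callable (which should GET `/api/v2/predictions/{job_id}` and return its JSON) so the loop itself needs no network access; the `failed` status value is an assumption about the API's error reporting.

```python
import time

def poll_until_complete(fetch, job_id: str, interval_s: float = 2.0,
                        timeout_s: float = 300.0) -> dict:
    """Poll fetch(job_id) until its status is 'completed', with a hard timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        record = fetch(job_id)
        if record.get("status") == "completed":
            return record
        if record.get("status") == "failed":
            raise RuntimeError(f"generation failed: {record.get('error')}")
        time.sleep(interval_s)  # wait before the next status check
    raise TimeoutError(f"job {job_id} did not complete in {timeout_s}s")
```

Video generation can take a while, so keep the interval at a few seconds and the timeout generous.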
Troubleshooting guide
Motion jitter
- Trim the clip to remove edge micro-adjustments.
- Reduce noise in the original footage.
- Stabilize during capture, not in post.
- Use a 3–5 second reference.
- Simplify the text prompt (remove conflicting descriptors).
Reference ignored
- Exaggerate the move; center the subject.
- Use only one motion type per clip.
- Explicitly state in text: “copy camera movement from reference.”
- Select the cleanest 2–3 second span.
- Use reference marks (e.g., tape) for clear parallax in camera moves.
Style drift
- Limit style descriptors to 2–3.
- Add a static reference frame alongside the video reference.
- Reduce busy patterns and details.
- Keep settings consistent across tests.
- Lock in motion first, then iterate on appearance.
Rights and consent
If your reference video includes identifiable people:
- Get written consent from all featured individuals.
- Secure guardian signatures for minors.
- Confirm filming locations allow commercial use.
- Exclude prominent logos or third-party marks.
- Keep records (dates, consent, clip versions).
These requirements apply to both the reference clip and any generated output with identifiable subjects.
FAQ
Does the reference video replace the image reference?
No—image reference sets subject appearance. Video reference sets motion. Use both to control appearance and motion separately.
How long should the reference clip be?
3–8 seconds. Shorter clips carry too little motion information; longer clips produce inconsistent results.
Can I use a reference clip from another genre?
Yes. For example, use a walking person in one context to generate a robot with the same gait. Motion transfers; appearance is set by your input.
What resolution should the reference clip be?
720p or higher. Low-resolution clips reduce motion quality in output.
Can I generate multiple clips from the same reference?
Yes—you can use the same reference for multiple generations with different prompts, keeping motion consistent across scenes.