Preecha

Posted on May 18

How to use reference video in Seedance 2.0: copy motion and camera moves

TL;DR

Reference video in Seedance 2.0 lets you anchor motion (camera moves, character choreography, timing) to an existing clip rather than describing everything in text. Use 3-8 second reference clips: a single shot, no jump cuts, clean H.264 compression. Keep text prompts short, with no more than three style descriptors. The text describes what the reference cannot show; the reference handles the motion. If your output drifts or ignores the reference, use the troubleshooting ladder in this guide.

Introduction

Text-only video generation works well for loose concepts: atmospheric scenes, exploratory directions, and varied visual approaches. But when the motion is already decided — the timing of a gesture, a camera push-in, or a walk cycle — text prompts are often too imprecise.

Reference video solves that by giving Seedance 2.0 an example clip to follow. The model can reinterpret the motion from that clip into the new scene described by your prompt.

This guide covers:

When to use reference video instead of text-only prompting
How to prepare reference clips that transfer motion cleanly
How to structure prompts around a reference clip
How to test the API flow with Apidog
How to debug common failure cases

When to use reference video

Use reference video when the motion matters more than creative variation.

Reference video works best for:

Micro-gestures: Precise timing like a thumb tap, a blink, or a nod that lands on a specific beat.
Choreography: Repeated physical routines, walk cycles, or body movement with a specific cadence.
Camera moves: Slow push-ins, controlled orbits, handheld motion, or framing changes that are hard to describe in words.
Beat-matching: Actions that need to align with audio cues or timed events.

Use text-only prompting when:

The concept is loose or atmospheric
You want to explore multiple visual directions
The motion is simple enough to describe
You do not have a clean reference clip

A practical rule: if you can sketch the scene in words, use text. If you need frame-level timing or movement fidelity, use a reference video.

Preparing reference clips

The reference clip should make the motion easy for the model to read.

Recommended reference clip settings

Property	Recommendation
Length	3-8 seconds
Shot type	Single continuous shot
Cuts	None
Compression	Clean H.264
Background	Plain or low-distraction
Lighting	Stable throughout
Resolution	720p or higher

Why clip quality matters

A good reference clip gives the model clear motion information. A poor clip forces it to infer too much.

Avoid:

Jump cuts
Re-encoded footage with macro-blocking
Busy backgrounds
Rapid lighting changes
Multiple subjects moving independently
Mixed motion types in one clip, such as character movement plus complex camera movement

Pre-upload checklist

Before uploading a reference clip, verify:

[ ] Clip is 3-8 seconds long
[ ] Clip is one continuous shot
[ ] No jump cuts or edits
[ ] H.264 compression is clean
[ ] No visible blocking artifacts
[ ] Subject is clearly visible against the background
[ ] Lighting is steady
[ ] Only one main motion is being demonstrated
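
The measurable items in this checklist (length, codec, resolution) can be checked automatically before upload. The following is a minimal sketch, assuming ffprobe is installed and on your PATH. The function names and thresholds are illustrative, and the visual items (cuts, blocking artifacts, lighting, background) still need a manual review.

```python
import json
import subprocess

# Thresholds taken from the recommendations in this guide; adjust as needed.
MIN_SECONDS, MAX_SECONDS = 3.0, 8.0
MIN_HEIGHT = 720


def probe_clip(path):
    """Read codec, height, and duration of the first video stream with ffprobe."""
    result = subprocess.run(
        [
            "ffprobe", "-v", "error",
            "-select_streams", "v:0",
            "-show_entries", "stream=codec_name,height:format=duration",
            "-of", "json", path,
        ],
        capture_output=True, text=True, check=True,
    )
    data = json.loads(result.stdout)
    stream = data["streams"][0]
    return {
        "codec": stream["codec_name"],
        "height": int(stream["height"]),
        "duration": float(data["format"]["duration"]),
    }


def check_clip(meta):
    """Return a list of checklist problems; an empty list means the clip passes."""
    problems = []
    if not MIN_SECONDS <= meta["duration"] <= MAX_SECONDS:
        problems.append(f"duration {meta['duration']:.1f}s is outside 3-8 seconds")
    if meta["codec"] != "h264":
        problems.append(f"codec is {meta['codec']}, expected h264")
    if meta["height"] < MIN_HEIGHT:
        problems.append(f"height {meta['height']}px is below 720p")
    return problems
```

A typical call would be check_clip(probe_clip("reference.mp4")), where the file name is a placeholder for your own clip.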

Prompting with a reference clip

When using a reference clip, the text prompt should not restate every movement. Let the video handle motion and timing.

Use the text prompt for what the reference cannot show:

Visual style
Lighting
Color palette
Subject identity
Scene context
Constraints

Prompt structure

Use a compact structure like this:

Style: [2-3 descriptors for lighting and palette]
Subject: [identity description using stable visible features]
Camera: [only if different from the reference]
Reference intent: Respect motion from reference; reinterpret texture and color.
Must not: [one specific constraint if needed]

Example

Reference clip: a person walking forward with a measured pace.

Prompt:

Style: warm afternoon light, golden tones
Subject: a man in a gray suit, early 40s, confident posture
Reference intent: Respect motion from reference; reinterpret texture and color.
Must not: change walking pace

Keep style descriptors short

Limit style descriptors to two or three.

For example, use:

warm afternoon light, golden tones

Avoid stacking too many style terms:

cinematic, hyperrealistic, dreamy, futuristic, soft, moody, dramatic, glossy

Too many descriptors can conflict. The model may try to satisfy all of them and end up satisfying none well.
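
If you assemble prompts in code, the structure and the descriptor limit can be enforced in one place. This is a minimal sketch; build_prompt and its parameters are hypothetical names, and the newline-separated layout is only one way to serialize the fields.

```python
MAX_STYLE_DESCRIPTORS = 3  # upper limit suggested in this guide

REFERENCE_INTENT = "Respect motion from reference; reinterpret texture and color."


def build_prompt(style, subject, camera=None, must_not=None):
    """Assemble the compact prompt structure as newline-separated fields."""
    if len(style) > MAX_STYLE_DESCRIPTORS:
        raise ValueError(
            f"use at most {MAX_STYLE_DESCRIPTORS} style descriptors, got {len(style)}"
        )
    lines = [f"Style: {', '.join(style)}", f"Subject: {subject}"]
    if camera:  # only include when the camera should differ from the reference
        lines.append(f"Camera: {camera}")
    lines.append(f"Reference intent: {REFERENCE_INTENT}")
    if must_not:  # one specific constraint, if needed
        lines.append(f"Must not: {must_not}")
    return "\n".join(lines)
```

Rejecting a long descriptor list outright, rather than silently truncating it, keeps the choice of which terms to drop with the person writing the prompt.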

API usage via WaveSpeedAI

Seedance 2.0 is accessible via WaveSpeedAI’s API.

Reference video endpoint:

POST https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json

Example request body:

{
  "prompt": "Warm afternoon light, golden tones. A man in a gray suit walks forward. Respect motion from reference.",
  "image_url": "https://example.com/subject-reference.jpg",
  "reference_video_url": "https://example.com/motion-reference.mp4",
  "duration": 5,
  "aspect_ratio": "16:9"
}

Use:

prompt for style, subject, and constraints
image_url for the subject appearance reference
reference_video_url for motion and timing
duration to match the desired output length
aspect_ratio for output framing
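
The request above could be sent from a script roughly as follows. This is a sketch using only the Python standard library. The endpoint and field names are copied from this article and should be confirmed against the current WaveSpeedAI documentation; build_request and start_generation are hypothetical helper names.

```python
import json
import urllib.request

# Endpoint as given in this article; verify it before relying on it.
ENDPOINT = "https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video"


def build_request(api_key, prompt, image_url, reference_video_url,
                  duration=5, aspect_ratio="16:9"):
    """Build the POST request without sending it."""
    payload = {
        "prompt": prompt,                            # style, subject, constraints
        "image_url": image_url,                      # subject appearance reference
        "reference_video_url": reference_video_url,  # motion and timing
        "duration": duration,
        "aspect_ratio": aspect_ratio,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def start_generation(request, timeout=30):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Separating request construction from sending makes it possible to inspect the payload and headers before spending credits on a render. Read the API key from an environment variable rather than hard-coding it.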

Testing with Apidog

Set up a small test collection before building the integration into your app.

The workflow has two requests:

Start the generation job
Poll the prediction result until completion

1. Configure the environment

Create an Apidog environment and add:

WAVESPEED_API_KEY

Store it as a Secret variable.

You can also add reusable variables:

motion_prompt
subject_image
reference_clip
duration
job_id

2. Request 1: start generation

POST https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json

Request body:

{
  "prompt": "{{motion_prompt}}",
  "image_url": "{{subject_image}}",
  "reference_video_url": "{{reference_clip}}",
  "duration": {{duration}},
  "aspect_ratio": "16:9"
}

In the Tests tab, extract the job ID for polling:

// Save the job ID from the response so the polling request can use {{job_id}}
pm.environment.set("job_id", pm.response.json().id);

3. Request 2: poll for completion

GET https://api.wavespeed.ai/api/v2/predictions/{{job_id}}
Authorization: Bearer {{WAVESPEED_API_KEY}}

Add an assertion that checks the response body field:

status == "completed"

The assertion fails while the job is still running, so re-send the request until the job reaches a terminal state.
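
Outside Apidog, the same polling step can be written as a small loop. This is a sketch under assumptions: "completed" comes from this article, while "failed" as a second terminal status, the two-second interval, and the five-minute timeout are illustrative guesses. fetch_status is a hypothetical callable that performs the GET request and returns the decoded JSON body.

```python
import time


def poll_until_done(fetch_status, interval=2.0, timeout=300.0,
                    terminal=("completed", "failed"), sleep=time.sleep):
    """Call fetch_status() until the returned dict has a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch_status()
        if result.get("status") in terminal:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"job still {result.get('status')!r} after {timeout} seconds"
            )
        sleep(interval)  # wait before asking again
```

Passing the fetch function and the sleep function as parameters keeps the loop testable without network access or real delays.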

Troubleshooting guide

Use this ladder when the output does not follow the reference as expected.

Motion jitter

If the generated clip jitters or introduces unwanted micro-movements:

Trim the reference clip to remove accidental movement at the beginning or end.
Reduce visual noise in the source footage.
Capture the footage more steadily instead of relying on post-production stabilization.
Shorten the reference to 3-5 seconds.
Simplify the text prompt by removing descriptors that may conflict with the reference.
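
Trimming and shortening the reference can be done with ffmpeg. The sketch below only builds the command, so you can inspect it before running it. It assumes an ffmpeg build that includes libx264. The quality setting (-crf 18) and dropping the audio track (-an) are illustrative choices, not requirements of Seedance 2.0, and trim_command is a hypothetical helper name.

```python
def trim_command(src, dst, start, duration):
    """Build an ffmpeg argument list that cuts one span and re-encodes it as H.264."""
    return [
        "ffmpeg", "-y",
        "-ss", str(start),     # seek to the start of the clean span
        "-i", src,
        "-t", str(duration),   # keep this many seconds
        "-c:v", "libx264",     # re-encode the video stream as H.264
        "-crf", "18",          # low CRF to limit new compression artifacts
        "-an",                 # drop audio (assumed not needed for motion)
        dst,
    ]
```

To run it, pass the list to subprocess.run with check=True, for example subprocess.run(trim_command("raw.mp4", "reference.mp4", 1.5, 4), check=True), where both file names are placeholders.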

Reference ignored

If the model does not follow the reference motion:

Exaggerate the move slightly in the reference clip.
Center the subject in the frame.
Include only one type of motion per clip.
Do not mix complex camera movement with complex character movement.
Add an explicit instruction to the prompt, such as "Copy camera movement from reference."
Extract the cleanest 2-3 second span from the reference clip.
For camera move references, use visible reference marks such as tape on a surface to clarify parallax.

Style drift

If the output follows the motion but not the intended aesthetic:

Reduce style descriptors to two or three.
Add a single static reference frame alongside the video reference.
Simplify patterns and busy details in the reference clip.
Keep settings consistent across renders.
Lock the motion first, then iterate on appearance.

Do not try to fix motion and style at the same time. First make the motion reliable. Then refine the visual direction.

Rights and consent

A reference video that shows identifiable people requires their consent.

Practical requirements:

Get written consent from anyone whose motion or likeness appears in the reference clip.
Get guardian signatures for minors.
Verify that filming locations permit commercial use.
Exclude prominent logos or third-party marks from the reference.
Keep records of dates, consent notes, and clip versions.

These requirements apply to both:

The reference clip
Any identifiable subjects in the generated output

FAQ

Does the reference video replace the image reference?

No. They serve different purposes.

The image reference anchors subject appearance: who or what appears in the scene.

The video reference anchors motion: how subjects and camera move.

Use both when you want to control appearance and motion independently.

How long should the reference clip be?

Use 3-8 seconds.

If the clip is too short, the model may not get enough motion information. If it is too long, the model may follow the motion less reliably and the output can become inconsistent.

Can I use a reference clip from a different genre?

Yes. For example, you can use a reference clip of a person walking and generate a robot character with the same gait.

The motion transfers. The visual content is replaced by your prompt and subject reference.

What resolution should the reference clip be?

Use 720p or higher.

Very low-resolution clips provide less motion information and usually produce weaker motion transfer.

Can I generate multiple clips from the same reference?

Yes. You can reuse the same reference clip with different prompts.

This is useful when you need multiple scene variations with consistent motion.
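
As a rough illustration, a batch of variations can hold the reference clip constant and change only the prompt. The field names follow the request body shown earlier in this article; build_variations is a hypothetical helper, and each payload would still need to be submitted as its own job.

```python
def build_variations(reference_video_url, image_url, prompts,
                     duration=5, aspect_ratio="16:9"):
    """Return one request payload per prompt, all sharing the same motion reference."""
    return [
        {
            "prompt": prompt,
            "image_url": image_url,
            "reference_video_url": reference_video_url,  # identical in every payload
            "duration": duration,
            "aspect_ratio": aspect_ratio,
        }
        for prompt in prompts
    ]
```

Keeping duration and aspect ratio fixed across the batch follows the troubleshooting advice to keep settings consistent across renders.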

DEV Community