TL;DR
Reference video in Seedance 2.0 lets you anchor motion — camera moves, character choreography, timing — to an existing clip rather than describing everything in text. Use 3-8 second reference clips: single shot, no jump cuts, clean H.264 compression. Keep text prompts short: three adjectives or fewer for style. Text describes what the reference can’t show; the reference handles the motion. If your output drifts or ignores the reference, use the troubleshooting ladder in this guide.
Introduction
Text-only video generation works well for loose concepts: atmospheric scenes, exploratory directions, and varied visual approaches. But when the motion is already decided — the timing of a gesture, a camera push-in, or a walk cycle — text prompts are often too imprecise.
Reference video solves that by giving Seedance 2.0 an example clip to follow. The model can reinterpret the motion from that clip into the new scene described by your prompt.
This guide covers:
- When to use reference video instead of text-only prompting
- How to prepare reference clips that transfer motion cleanly
- How to structure prompts around a reference clip
- How to test the API flow with Apidog
- How to debug common failure cases
When to use reference video
Use reference video when the motion matters more than creative variation.
Reference video works best for:
- Micro-gestures: Precise timing like a thumb tap, a blink, or a nod that lands on a specific beat.
- Choreography: Repeated physical routines, walk cycles, or body movement with a specific cadence.
- Camera moves: Slow push-ins, controlled orbits, handheld motion, or framing changes that are hard to describe in words.
- Beat-matching: Actions that need to align with audio cues or timed events.
Use text-only prompting when:
- The concept is loose or atmospheric
- You want to explore multiple visual directions
- The motion is simple enough to describe
- You do not have a clean reference clip
A practical rule: if you can sketch the scene in words, use text. If you need frame-level timing or movement fidelity, use a reference video.
Preparing reference clips
The reference clip should make the motion easy for the model to read.
Recommended reference clip settings
| Property | Recommendation |
|---|---|
| Length | 3-8 seconds |
| Shot type | Single continuous shot |
| Cuts | None |
| Compression | Clean H.264 |
| Background | Plain or low-distraction |
| Lighting | Stable throughout |
| Resolution | 720p or higher |
Why clip quality matters
A good reference clip gives the model clear motion information. A poor clip forces it to infer too much.
Avoid:
- Jump cuts
- Re-encoded footage with macro-blocking
- Busy backgrounds
- Rapid lighting changes
- Multiple subjects moving independently
- Mixed motion types in one clip, such as character movement plus complex camera movement
Pre-upload checklist
Before uploading a reference clip, verify:
- [ ] Clip is 3-8 seconds long
- [ ] Clip is one continuous shot
- [ ] No jump cuts or edits
- [ ] H.264 compression is clean
- [ ] No visible blocking artifacts
- [ ] Subject is clearly visible against the background
- [ ] Lighting is steady
- [ ] Only one main motion is being demonstrated
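The measurable parts of this checklist can be automated. The sketch below is one possible pre-flight check, assuming you have already read the clip's duration, codec, height, and cut count with a probe tool of your choice. `ClipInfo` and `preflight` are hypothetical names used for illustration, not part of any Seedance or WaveSpeedAI tooling. Background, lighting, and "one main motion" still need a human eye.

```python
from dataclasses import dataclass


@dataclass
class ClipInfo:
    """Basic facts about a reference clip (hypothetical container; fill it from your probe tool)."""
    duration_s: float
    codec: str
    height_px: int
    cut_count: int


def preflight(clip: ClipInfo) -> list[str]:
    """Return the checklist items the clip fails; an empty list means it passes."""
    problems = []
    if not 3.0 <= clip.duration_s <= 8.0:
        problems.append("length should be 3-8 seconds")
    if clip.codec.lower() != "h264":
        problems.append("codec should be H.264")
    if clip.height_px < 720:
        problems.append("resolution should be 720p or higher")
    if clip.cut_count > 0:
        problems.append("clip should be one continuous shot")
    return problems


# Example: a 12-second 480p clip fails the length and resolution checks.
issues = preflight(ClipInfo(duration_s=12.0, codec="h264", height_px=480, cut_count=0))
```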
Prompting with a reference clip
When using a reference clip, the text prompt should not restate every movement. Let the video handle motion and timing.
Use the text prompt for what the reference cannot show:
- Visual style
- Lighting
- Color palette
- Subject identity
- Scene context
- Constraints
Prompt structure
Use a compact structure like this:
Style: [2-3 descriptors for lighting and palette]
Subject: [identity description using stable visible features]
Camera: [only if different from the reference]
Reference intent: Respect motion from reference; reinterpret texture and color.
Must not: [one specific constraint if needed]
Example
Reference clip: a person walking forward with a measured pace.
Prompt:
Style: warm afternoon light, golden tones
Subject: a man in a gray suit, early 40s, confident posture
Reference intent: Respect motion from reference; reinterpret texture and color.
Must not: change walking pace
Keep style descriptors short
Limit style descriptors to two or three.
For example, use:
warm afternoon light, golden tones
Avoid stacking too many style terms:
cinematic, hyperrealistic, dreamy, futuristic, soft, moody, dramatic, glossy
Too many descriptors can conflict. The model may try to satisfy all of them and end up satisfying none well.
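If you build prompts in code, a small helper can enforce the structure and the descriptor limit. This is a minimal sketch: `build_prompt` and its parameters are hypothetical names, and the three-descriptor cap simply mirrors the guideline above.

```python
def build_prompt(style, subject, camera=None, must_not=None, max_style=3):
    """Assemble the compact prompt; refuse more than max_style style descriptors."""
    if len(style) > max_style:
        raise ValueError(f"use at most {max_style} style descriptors, got {len(style)}")
    lines = [
        "Style: " + ", ".join(style),
        "Subject: " + subject,
    ]
    if camera:  # include only when the camera should differ from the reference
        lines.append("Camera: " + camera)
    lines.append("Reference intent: Respect motion from reference; reinterpret texture and color.")
    if must_not:
        lines.append("Must not: " + must_not)
    return "\n".join(lines)


prompt = build_prompt(
    style=["warm afternoon light", "golden tones"],
    subject="a man in a gray suit, early 40s, confident posture",
    must_not="change walking pace",
)
```

Raising an error on a fourth descriptor is a deliberate choice: it is easier to notice than a prompt that silently drops terms.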
API usage via WaveSpeedAI
Seedance 2.0 is accessible via WaveSpeedAI’s API.
Reference video endpoint:
POST https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json
Example request body:
{
  "prompt": "Warm afternoon light, golden tones. A man in a gray suit walks forward. Respect motion from reference.",
  "image_url": "https://example.com/subject-reference.jpg",
  "reference_video_url": "https://example.com/motion-reference.mp4",
  "duration": 5,
  "aspect_ratio": "16:9"
}
Use:
- `prompt` for style, subject, and constraints
- `image_url` for the subject appearance reference
- `reference_video_url` for motion and timing
- `duration` to match the desired output length
- `aspect_ratio` for output framing
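The same request can be assembled in Python using only the standard library. This is a sketch, not an official client: the endpoint and field names are copied from the example above, while `build_generation_request` is a hypothetical helper. The request is built but not sent, so it can be inspected first.

```python
import json
import urllib.request

# Endpoint as given in this guide.
ENDPOINT = "https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video"


def build_generation_request(api_key, prompt, image_url, reference_video_url,
                             duration=5, aspect_ratio="16:9"):
    """Build (but do not send) the POST request for a reference-video job."""
    body = {
        "prompt": prompt,
        "image_url": image_url,
        "reference_video_url": reference_video_url,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_generation_request(
    api_key="YOUR_API_KEY",  # placeholder; load the real key from a secret store
    prompt="Warm afternoon light, golden tones. Respect motion from reference.",
    image_url="https://example.com/subject-reference.jpg",
    reference_video_url="https://example.com/motion-reference.mp4",
)
```

To send it, pass `request` to `urllib.request.urlopen` and parse the response body with `json.load`.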
Testing with Apidog
Set up a small test collection before building the integration into your app.
The workflow has two requests:
- Start the generation job
- Poll the prediction result until completion
1. Configure the environment
Create an Apidog environment and add:
WAVESPEED_API_KEY
Store it as a Secret variable.
You can also add reusable variables:
motion_prompt
subject_image
reference_clip
duration
job_id
2. Request 1: start generation
POST https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json
Request body:
{
  "prompt": "{{motion_prompt}}",
  "image_url": "{{subject_image}}",
  "reference_video_url": "{{reference_clip}}",
  "duration": {{duration}},
  "aspect_ratio": "16:9"
}
In the Tests tab, extract the job ID for polling:
pm.environment.set("job_id", pm.response.json().id);
3. Request 2: poll for completion
GET https://api.wavespeed.ai/api/v2/predictions/{{job_id}}
Authorization: Bearer {{WAVESPEED_API_KEY}}
Add an assertion that checks the response body field:
status == "completed"
You can keep polling until the job reaches a terminal state.
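Outside Apidog, the polling step might look like the sketch below. It assumes the response body carries a `status` field as described above. The `"failed"` terminal status is an assumption, so check the actual API responses for the exact values. `wait_for_job` and `fetch_status` are hypothetical names. The caller supplies `fetch_status`, which performs the GET request and returns the parsed JSON.

```python
import time


def wait_for_job(fetch_status, job_id, interval_s=2.0, max_attempts=60,
                 terminal=("completed", "failed"), sleep=time.sleep):
    """Poll until the job reports a terminal status or attempts run out.

    fetch_status(job_id) must return the parsed response body as a dict.
    The "failed" status name is an assumption, not confirmed by this guide.
    """
    for _ in range(max_attempts):
        body = fetch_status(job_id)
        if body.get("status") in terminal:
            return body
        sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"job {job_id} did not finish after {max_attempts} polls")
```

Capping the number of attempts keeps a stuck job from polling forever. Passing `sleep` as a parameter makes the loop easy to test without real delays.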
Troubleshooting guide
Use this ladder when the output does not follow the reference as expected.
Motion jitter
If the generated clip jitters or introduces unwanted micro-movements:
- Trim the reference clip to remove accidental movement at the beginning or end.
- Reduce visual noise in the source footage.
- Capture the footage more steadily instead of relying on post-production stabilization.
- Shorten the reference to 3-5 seconds.
- Simplify the text prompt by removing descriptors that may conflict with the reference.
Reference ignored
If the model does not follow the reference motion:
- Exaggerate the move slightly in the reference clip.
- Center the subject in the frame.
- Include only one type of motion per clip.
- Do not mix complex camera movement with complex character movement.
- Add an explicit instruction:
Copy camera movement from reference.
- Extract the cleanest 2-3 second span from the reference clip.
- For camera move references, use visible reference marks such as tape on a surface to clarify parallax.
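Extracting a short, clean span is commonly done with ffmpeg. The sketch below only builds the command. It assumes ffmpeg is installed, and the CRF value of 18 is an assumed quality setting rather than a Seedance requirement. `trim_command` is a hypothetical helper name.

```python
def trim_command(src, dst, start_s, length_s):
    """Build an ffmpeg command that cuts one span and re-encodes it as H.264."""
    if not 2.0 <= length_s <= 8.0:
        raise ValueError("keep the extracted span between 2 and 8 seconds")
    return [
        "ffmpeg",
        "-ss", str(start_s),   # seek to the start of the clean span
        "-i", src,
        "-t", str(length_s),   # keep only this many seconds
        "-an",                 # drop audio; only the motion matters here
        "-c:v", "libx264",
        "-crf", "18",          # assumed setting to limit blocking artifacts
        dst,
    ]


cmd = trim_command("take-03.mp4", "reference.mp4", start_s=1.5, length_s=3.0)
```

Run it with `subprocess.run(cmd, check=True)` once the arguments look right.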
Style drift
If the output follows the motion but not the intended aesthetic:
- Reduce style descriptors to two or three.
- Add a single static reference frame alongside the video reference.
- Simplify patterns and busy details in the reference clip.
- Keep settings consistent across renders.
- Lock the motion first, then iterate on appearance.
Do not try to fix motion and style at the same time. First make the motion reliable. Then refine the visual direction.
Rights and consent
Reference video with identifiable people requires consent.
Practical requirements:
- Get written consent from anyone whose motion or likeness appears in the reference clip.
- Get guardian signatures for minors.
- Verify that filming locations permit commercial use.
- Exclude prominent logos or third-party marks from the reference.
- Keep records of dates, consent notes, and clip versions.
These requirements apply to both:
- The reference clip
- Any identifiable subjects in the generated output
FAQ
Does the reference video replace the image reference?
No. They serve different purposes.
The image reference anchors subject appearance: who or what appears in the scene.
The video reference anchors motion: how subjects and camera move.
Use both when you want to control appearance and motion independently.
How long should the reference clip be?
Use 3-8 seconds.
If the clip is too short, the model may not get enough motion information. If it is too long, the output may become inconsistent.
Can I use a reference clip from a different genre?
Yes. For example, you can use a reference clip of a person walking and generate a robot character with the same gait.
The motion transfers. The visual content is replaced by your prompt and subject reference.
What resolution should the reference clip be?
Use 720p or higher.
Very low-resolution clips provide less motion information and usually produce weaker motion transfer.
Can I generate multiple clips from the same reference?
Yes. You can reuse the same reference clip with different prompts.
This is useful when you need multiple scene variations with consistent motion.
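As a rough sketch, a batch of variations can share one motion reference by changing only the prompt. The field names follow the request body shown earlier in this guide, and `variation_payloads` is a hypothetical helper name.

```python
def variation_payloads(prompts, image_url, reference_video_url,
                       duration=5, aspect_ratio="16:9"):
    """Return one request body per prompt, all sharing the same motion reference."""
    return [
        {
            "prompt": prompt,
            "image_url": image_url,
            "reference_video_url": reference_video_url,
            "duration": duration,
            "aspect_ratio": aspect_ratio,
        }
        for prompt in prompts
    ]


payloads = variation_payloads(
    ["Warm afternoon light, golden tones.", "Cold blue dusk, soft fog."],
    image_url="https://example.com/subject-reference.jpg",
    reference_video_url="https://example.com/motion-reference.mp4",
)
```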