TL;DR
Reference video in Seedance 2.0 lets you anchor motion—camera moves, character choreography, timing—to an existing clip instead of describing everything in text. Use clean, single-shot, 3–8 second H.264 reference clips (no jump cuts). Keep text prompts short (≤3 adjectives for style). Text covers what the reference can’t show; the reference drives motion. If results ignore the reference, follow the troubleshooting steps in this guide.
Introduction
Text-only video generation is effective for open-ended concepts: atmospheric scenes, exploratory directions, varied visual styles. But when you need precise motion—specific gesture timing, a camera push-in, or a walk cycle—text falls short.
Reference video bridges this gap. Provide a sample clip, and Seedance 2.0 reinterprets the motion for your new scene.
This guide explains when to use reference video, how to create effective reference clips, and how to troubleshoot common issues.
When to use reference video
Reference video is best for:
- Micro-gestures: Precise actions like “a thumb tap” or “a nod that lands on beat three.”
- Choreography: Consistent motion patterns, such as a specific walk or repeated routine.
- Camera moves: Subtle operations (slow push-ins, orbits, unique framing changes).
- Beat-matching: Syncing actions to audio cues; the model reads timing directly from the clip.
Text-only is better for:
- Loose concepts or atmospheric pieces where variety is desirable
- Exploring visual directions for the same content
- Cases where the motion is simple enough to describe in text, or where no reference clip is available
Preparing reference clips
To get reliable results, follow these best practices:
- Length: 3–8 seconds. Shorter clips carry too little motion information; longer clips produce inconsistent output.
- Continuity: No edits, cuts, or jump cuts. Use a single continuous shot.
- Compression: Clean H.264 (no macro-blocking or artifacts).
- Subject clarity: Plain backgrounds, steady lighting. Minimize distractions.
Reference clip checklist:
- [ ] Under 8 seconds
- [ ] Single continuous shot (no cuts)
- [ ] Clean compression (no visible blocking)
- [ ] Subject clearly visible
- [ ] Steady lighting throughout
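The first few checklist items can be screened automatically from clip metadata. Below is a minimal sketch: the function name and the metadata dict shape (`duration_s`, `width`, `height`, `cut_count`) are assumptions, not part of any Seedance tooling; in practice you would populate the dict from something like ffprobe. Single-shot continuity and lighting still need a human eye.

```python
def check_reference_clip(meta: dict) -> list[str]:
    """Return a list of checklist violations (empty means the automated checks pass)."""
    issues = []
    # 3-8 seconds, per the length guidance above.
    if not 3 <= meta.get("duration_s", 0) <= 8:
        issues.append("duration should be 3-8 seconds")
    # 720p or higher, per the resolution guidance in the FAQ.
    if min(meta.get("width", 0), meta.get("height", 0)) < 720:
        issues.append("resolution should be 720p or higher")
    # A single continuous shot: any detected cut disqualifies the clip.
    if meta.get("cut_count", 0) > 0:
        issues.append("clip must be a single continuous shot (no cuts)")
    return issues

# A 5-second 1080p single-shot clip passes the automated checks.
print(check_reference_clip({"duration_s": 5, "width": 1920, "height": 1080, "cut_count": 0}))
```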
Prompting with a reference clip
Combine reference video with concise text. The text prompt should clarify what the reference can’t show.
Use text for:
- Style descriptors (lighting, color palette, tone)
- Subject identity (who or what appears)
- Camera context (if not clear from reference)
- One or two specific constraints
Optimal prompt structure:
Style: [2-3 descriptors for lighting and palette]
Subject: [identity description with clear features]
Camera: [if different from reference]
Reference intent: "Respect motion from reference: reinterpret texture and color."
Must not: [one specific constraint, if needed]
Example:
Reference: person walking with a specific pace
Prompt:
Style: warm afternoon light, golden tones
Subject: a man in a gray suit, early 40s, confident posture
Respect motion from reference: reinterpret texture and color.
Must not: change walking pace
Tip: Limit to three style adjectives. More can create conflicting instructions and reduce output quality.
API usage via WaveSpeedAI
Use the WaveSpeedAI API to generate video with reference clips:
POST https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json
{
"prompt": "Warm afternoon light, golden tones. A man in a gray suit walks forward. Respect motion from reference.",
"image_url": "https://example.com/subject-reference.jpg",
"reference_video_url": "https://example.com/motion-reference.mp4",
"duration": 5,
"aspect_ratio": "16:9"
}
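The same request can be issued from Python with the standard library. The endpoint and field names are taken from the example above; reading the key from `WAVESPEED_API_KEY` is an assumption about your environment setup.

```python
import json
import os
import urllib.request

API_URL = "https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video"

def build_request(prompt: str, image_url: str, reference_video_url: str,
                  duration: int = 5, aspect_ratio: str = "16:9") -> urllib.request.Request:
    """Build the POST request shown above; send it with urllib.request.urlopen()."""
    body = {
        "prompt": prompt,
        "image_url": image_url,
        "reference_video_url": reference_video_url,
        "duration": duration,
        "aspect_ratio": aspect_ratio,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('WAVESPEED_API_KEY', '')}",
            "Content-Type": "application/json",
        },
    )
```

Sending the built request (`urllib.request.urlopen(build_request(...))`) returns the job record whose `id` you poll, as shown in the next section.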
Testing with Apidog
Set up an Apidog test collection before integrating.
1. Environment setup:
Create an environment with WAVESPEED_API_KEY as a Secret variable.
2. Two-request flow:
- Request 1: Start generation.
- Request 2: Poll for completion.
Request 1:
POST https://api.wavespeed.ai/api/v2/seedance/v2/image-to-video
Authorization: Bearer {{WAVESPEED_API_KEY}}
Content-Type: application/json
{
"prompt": "{{motion_prompt}}",
"image_url": "{{subject_image}}",
"reference_video_url": "{{reference_clip}}",
"duration": {{duration}},
"aspect_ratio": "16:9"
}
In the Tests tab:
pm.environment.set("job_id", pm.response.json().id);
Request 2:
GET https://api.wavespeed.ai/api/v2/predictions/{{job_id}}
Authorization: Bearer {{WAVESPEED_API_KEY}}
Assert:
Check response body: status equals "completed".
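Outside Apidog, the same two-request flow reduces to a polling loop. This sketch injects the `fetch` callable (which should GET `/api/v2/predictions/{job_id}` and return its JSON) so the loop itself needs no network access; the `failed` status value is an assumption about the API's error reporting.

```python
import time

def poll_until_complete(fetch, job_id: str, interval_s: float = 2.0,
                        timeout_s: float = 300.0) -> dict:
    """Poll fetch(job_id) until its status is 'completed', with a hard timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        record = fetch(job_id)
        if record.get("status") == "completed":
            return record
        if record.get("status") == "failed":
            raise RuntimeError(f"generation failed: {record.get('error')}")
        time.sleep(interval_s)  # wait before the next status check
    raise TimeoutError(f"job {job_id} did not complete in {timeout_s}s")
```

Video generation can take a while, so keep the interval at a few seconds and the timeout generous.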
Troubleshooting guide
Motion jitter
- Trim the clip to remove edge micro-adjustments.
- Reduce noise in the original footage.
- Stabilize during capture, not in post.
- Use a 3–5 second reference.
- Simplify the text prompt (remove conflicting descriptors).
Reference ignored
- Exaggerate the move; center the subject.
- Use only one motion type per clip.
- Explicitly state in text: “copy camera movement from reference.”
- Select the cleanest 2–3 second span.
- Use reference marks (e.g., tape) for clear parallax in camera moves.
Style drift
- Limit style descriptors to 2–3.
- Add a static reference frame alongside the video reference.
- Reduce busy patterns and details.
- Keep settings consistent across tests.
- Lock in motion first, then iterate on appearance.
Rights and consent
If your reference video includes identifiable people:
- Get written consent from all featured individuals.
- Secure guardian signatures for minors.
- Confirm filming locations allow commercial use.
- Exclude prominent logos or third-party marks.
- Keep records (dates, consent, clip versions).
These requirements apply to both the reference clip and any generated output with identifiable subjects.
FAQ
Does the reference video replace the image reference?
No—image reference sets subject appearance. Video reference sets motion. Use both to control appearance and motion separately.
How long should the reference clip be?
3–8 seconds. Shorter clips carry too little motion information; longer clips produce inconsistent results.
Can I use a reference clip from another genre?
Yes. For example, use a walking person in one context to generate a robot with the same gait. Motion transfers; appearance is set by your input.
What resolution should the reference clip be?
720p or higher. Low-resolution clips reduce motion quality in output.
Can I generate multiple clips from the same reference?
Yes—you can use the same reference for multiple generations with different prompts, keeping motion consistent across scenes.