AI video demos well. Production is where it gets messy.
The failure mode I keep seeing:
A team generates 50 short clips, 7 are usable, nobody tracks why the other 43 failed, and the next batch starts from scratch.
That is not just a model problem. It is a workflow and measurement problem.
If you are building an AI video pipeline for ads, ecommerce, social, product marketing, or creative ops, do not start with:
cost per generation
Start with:
cost per usable clip
That metric forces you to include retries, review, editing, failed generations, and brand/compliance overhead.
Cost per generation is the wrong production metric
A typical estimate looks like this:
duration_seconds × credits_per_second × price_per_credit
That is useful for API spend. It is not production cost.
A better metric:
cost per usable clip
= generation_cost_per_attempt × attempts_per_usable_clip
+ human_review_cost
+ editing_cost
+ compliance_or_brand_review_cost
+ storage / orchestration / tooling cost
Track these variables:
- usable rate: what percentage of clips are publishable or close?
- attempts per usable clip: how many generations produce one usable asset?
- human minutes per usable clip: how much review/editing does each approved clip need?
- rejection reasons: why are clips failing?
If you do not track those, you are guessing.
A simple 50-generation pilot
Assume a team tests 5–8 second AI B-roll clips for social.
| Metric | Value |
|---|---|
| Total generations | 50 |
| Usable clips | 8 |
| Published clips | 5 |
| Total model/API cost | $30 |
| Total human review time | 180 min |
| Total editing time | 120 min |
| Internal hourly cost | $60/hr |
Calculations:
usable rate = 8 / 50 = 16%
attempts per usable clip = 50 / 8 = 6.25
review + editing = 300 min = 5 hours
human cost = 5 × $60 = $300
total pilot cost = $30 + $300 = $330
cost per usable clip = $330 / 8 = $41.25
cost per published clip = $330 / 5 = $66
That might be great if the alternative is a shoot, agency edit, or stock-footage workflow. It might be bad if your current process is faster and more reliable.
The point is not whether $66 is good or bad. The point is that you now have a number you can compare.
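To make the arithmetic repeatable across pilots, here is a minimal sketch that recomputes the numbers above from raw totals. The function name `summarizePilot` and its field names are illustrative assumptions, not a fixed schema.

```javascript
// Sketch: recompute the pilot metrics from raw totals.
// All field names here are assumptions chosen for this example.
function summarizePilot({
  totalGenerations,
  usableClips,
  publishedClips,
  modelCostUsd,
  reviewMinutes,
  editingMinutes,
  hourlyRateUsd,
}) {
  const humanHours = (reviewMinutes + editingMinutes) / 60;
  const humanCostUsd = humanHours * hourlyRateUsd;
  const totalCostUsd = modelCostUsd + humanCostUsd;
  // Return null instead of dividing by zero when nothing was usable or published.
  const perClip = (count) => (count > 0 ? totalCostUsd / count : null);
  return {
    usableRate: totalGenerations > 0 ? usableClips / totalGenerations : null,
    attemptsPerUsableClip: usableClips > 0 ? totalGenerations / usableClips : null,
    humanCostUsd,
    totalCostUsd,
    costPerUsableClipUsd: perClip(usableClips),
    costPerPublishedClipUsd: perClip(publishedClips),
  };
}

// The pilot from the table above.
const pilot = summarizePilot({
  totalGenerations: 50,
  usableClips: 8,
  publishedClips: 5,
  modelCostUsd: 30,
  reviewMinutes: 180,
  editingMinutes: 120,
  hourlyRateUsd: 60,
});
console.log(pilot.costPerUsableClipUsd, pilot.costPerPublishedClipUsd); // 41.25 66
```

Returning `null` for an empty denominator is a design choice for this sketch: a pilot with zero usable clips has no meaningful cost per usable clip, and a visible `null` is harder to misread than `Infinity`.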
Log every attempt, not just the wins
You do not need a complex system at first. A spreadsheet, Airtable, Notion database, Postgres table, or JSONL file is enough.
Minimum fields:
| Field | Why it matters |
|---|---|
| brief_id | Groups attempts by campaign/request |
| prompt_id / prompt_version | Compares prompt iterations |
| model | Compares vendors/models |
| duration_seconds | Helps calculate cost |
| credits_used / generation_cost_usd | Tracks API spend |
| asset_url | Links output to metadata |
| status | Drives workflow |
| rejection_reason | Shows where quality fails |
| review_minutes | Captures human cost |
| editing_minutes | Captures post-production cost |
| published | Separates usable from shipped |
Example record:
{
  "id": "gen_00042",
  "brief_id": "bf_2025_001",
  "prompt_id": "pr_003",
  "prompt_version": "v2",
  "model": "video-model-a",
  "duration_seconds": 6,
  "credits_used": 42,
  "generation_cost_usd": 0.84,
  "asset_url": "s3://ai-video-pilots/bf_2025_001/gen_00042.mp4",
  "status": "rejected",
  "rejection_reason": "product_detail_wrong",
  "review_minutes": 3,
  "editing_minutes": 0,
  "published": false,
  "created_at": "2026-05-21T12:00:00Z"
}
Start with fields that answer:
How much did this cost?
How much human time did it require?
Why did outputs fail?
Which prompts/models are improving?
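If you go with the JSONL option, each attempt becomes one JSON object per line. The sketch below shows one way to serialize and parse such a log. The helper names `toJsonlLine` and `parseJsonl` are hypothetical, and the set of fields treated as required is an assumption for this example.

```javascript
// Fields every attempt must carry before it is logged.
// Which fields count as required is an assumption of this sketch.
const REQUIRED_FIELDS = ["brief_id", "prompt_id", "prompt_version", "model", "status"];

// Serialize one generation record as a single JSONL line.
function toJsonlLine(record) {
  const missing = REQUIRED_FIELDS.filter(
    (field) => record[field] === undefined || record[field] === null
  );
  if (missing.length > 0) {
    throw new Error(`Missing required fields: ${missing.join(", ")}`);
  }
  return JSON.stringify(record) + "\n";
}

// Parse a whole JSONL log back into an array of records, skipping blank lines.
function parseJsonl(text) {
  return text
    .split("\n")
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line));
}

// Two made-up sample attempts: one still in review, one rejected.
const base = { brief_id: "bf_2025_001", prompt_id: "pr_003", prompt_version: "v2", model: "video-model-a" };
const log =
  toJsonlLine({ ...base, id: "gen_00041", status: "review_pending" }) +
  toJsonlLine({ ...base, id: "gen_00042", status: "rejected", rejection_reason: "product_detail_wrong" });

const records = parseJsonl(log);
console.log(records.length); // 2
```

In Node.js, each line can be appended to a file with `fs.appendFileSync(path, line)`, which keeps the log append-only so failed attempts are never overwritten.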
Use explicit review states
Do not let generated media go directly from model output to scheduled post.
Use states like:
draft_brief
→ prompt_ready
→ generated
→ review_pending
→ approved_for_edit
→ edited
→ brand_review
→ approved_to_publish
→ scheduled
→ published
Rejected paths should be explicit too:
review_pending → rejected_quality
review_pending → rejected_accuracy
review_pending → rejected_rights_risk
brand_review → rejected_brand_fit
brand_review → needs_revision
This matters because rejection reasons are one of the most valuable outputs of the pilot.
If most clips fail because of prompt ambiguity, fix the prompt template.
If most fail because of product accuracy, use AI video for background visuals or pre-production instead of exact product shots.
If most fail during compliance review, model cost is probably irrelevant. Your bottleneck is risk.
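The states and rejection paths above can be enforced with a small transition table, so a clip cannot jump from `generated` straight to `published`. This is a minimal sketch: the helper names are hypothetical, and treating `needs_revision` and the `rejected_*` states as having no outgoing transitions is an assumption, since where they lead next depends on your process.

```javascript
// Allowed transitions, taken from the state lists above.
// States missing from the map (rejected_*, needs_revision, published)
// are treated as terminal here, which is an assumption of this sketch.
const TRANSITIONS = new Map([
  ["draft_brief", ["prompt_ready"]],
  ["prompt_ready", ["generated"]],
  ["generated", ["review_pending"]],
  ["review_pending", ["approved_for_edit", "rejected_quality", "rejected_accuracy", "rejected_rights_risk"]],
  ["approved_for_edit", ["edited"]],
  ["edited", ["brand_review"]],
  ["brand_review", ["approved_to_publish", "rejected_brand_fit", "needs_revision"]],
  ["approved_to_publish", ["scheduled"]],
  ["scheduled", ["published"]],
]);

// True if moving from one state to another is allowed.
function canTransition(from, to) {
  return (TRANSITIONS.get(from) || []).includes(to);
}

// Return a copy of the record in the new state, or throw on an illegal move.
function transition(record, to) {
  if (!canTransition(record.status, to)) {
    throw new Error(`Illegal transition: ${record.status} -> ${to}`);
  }
  return { ...record, status: to };
}

const reviewed = transition({ id: "gen_00042", status: "review_pending" }, "rejected_accuracy");
console.log(reviewed.status); // rejected_accuracy
console.log(canTransition("generated", "published")); // false
```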
A copyable pilot workflow
brief template
→ prompt template
→ generation job
→ asset storage
→ metadata logging
→ human review UI
→ edit/caption step
→ approval state
→ scheduler/manual publish
→ performance notes
→ cost dashboard
Brief template
Keep briefs structured. Free-text briefs make runs hard to compare.
{
  "brief_id": "bf_2025_001",
  "channel": "instagram_reel",
  "format": "social_broll",
  "duration_seconds": 6,
  "goal": "support a post about summer product launch",
  "must_include": ["bright kitchen", "morning light", "refreshing mood"],
  "must_avoid": ["visible logos", "people drinking alcohol", "incorrect product packaging"],
  "risk_level": "low",
  "consistency_requirement": "low"
}
Prompt template
Version your prompts. They are part of the production system, not throwaway inputs.
Create a {{duration_seconds}} second {{format}} clip for {{channel}}.
Scene: {{scene}}.
Mood: {{mood}}.
Camera: {{camera_direction}}.
Must include: {{must_include}}.
Must avoid: {{must_avoid}}.
No text overlays. No logos. No recognizable public figures.
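Filling the `{{placeholder}}` slots can be a few lines of code. The sketch below is one possible renderer, not a specific templating library. `renderPrompt` is a hypothetical helper, and failing loudly on a missing value is a design choice so a half-filled prompt never reaches the model.

```javascript
// Replace every {{name}} placeholder with the matching value.
// Arrays are joined with ", "; a missing value throws instead of
// silently leaving a gap in the prompt.
function renderPrompt(template, values) {
  return template.replace(/\{\{\s*(\w+)\s*\}\}/g, (match, key) => {
    if (!Object.prototype.hasOwnProperty.call(values, key)) {
      throw new Error(`Missing template value: ${key}`);
    }
    const value = values[key];
    return Array.isArray(value) ? value.join(", ") : String(value);
  });
}

// A shortened version of the template above, filled from brief fields.
const template =
  "Create a {{duration_seconds}} second {{format}} clip for {{channel}}. Must avoid: {{must_avoid}}.";

const promptText = renderPrompt(template, {
  duration_seconds: 6,
  format: "social_broll",
  channel: "instagram_reel",
  must_avoid: ["visible logos", "incorrect product packaging"],
});

console.log(promptText);
// Create a 6 second social_broll clip for instagram_reel. Must avoid: visible logos, incorrect product packaging.
```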
Generation job
Create a record before generation and update it after the asset exists.
async function runGenerationJob({ brief, prompt, model }) {
  // Create the record before calling the provider so failed attempts are logged too.
  const record = await db.generations.insert({
    brief_id: brief.brief_id, // matches the brief template's "brief_id" field
    prompt_id: prompt.id,
    prompt_version: prompt.version,
    model,
    status: "generation_started",
    created_at: new Date().toISOString()
  })

  try {
    const result = await videoProvider.generate({
      model,
      prompt: prompt.text,
      duration_seconds: brief.duration_seconds
    })

    const assetUrl = await storage.save(result.video)

    // The asset exists: attach cost data and hand the clip to reviewers.
    await db.generations.update(record.id, {
      status: "review_pending",
      asset_url: assetUrl,
      duration_seconds: result.duration_seconds,
      credits_used: result.credits_used,
      generation_cost_usd: result.cost_usd
    })
  } catch (err) {
    // Keep the failed attempt in the log instead of losing it.
    await db.generations.update(record.id, {
      status: "generation_failed",
      error_message: err.message
    })
  }
}
The provider does not matter for the pilot. The logging does.
Human review
Reviewers should not just click approve/reject. Make them choose a reason.
Useful rejection reasons:
artifact_or_distortion
product_detail_wrong
brand_mismatch
too_generic
prompt_not_followed
rights_or_likeness_risk
unsafe_or_policy_risk
needs_editing
other
This turns subjective review into data.
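Once every rejection carries a reason, a simple tally shows where to look first. This is a minimal sketch over in-memory records. `tallyRejectionReasons` is a hypothetical helper and the sample records are made up for illustration.

```javascript
// Count rejection reasons and return them sorted by frequency
// (ties broken alphabetically so the output is stable).
function tallyRejectionReasons(records) {
  const counts = new Map();
  for (const record of records) {
    if (!record.rejection_reason) continue; // approved or not yet reviewed
    counts.set(record.rejection_reason, (counts.get(record.rejection_reason) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([reason, count]) => ({ reason, count }))
    .sort((a, b) => b.count - a.count || a.reason.localeCompare(b.reason));
}

// Made-up sample data: three rejections and one approved clip.
const sampleRecords = [
  { id: "gen_1", rejection_reason: "product_detail_wrong" },
  { id: "gen_2", rejection_reason: "artifact_or_distortion" },
  { id: "gen_3", rejection_reason: "product_detail_wrong" },
  { id: "gen_4", rejection_reason: null },
];

const tally = tallyRejectionReasons(sampleRecords);
console.log(tally[0]); // { reason: 'product_detail_wrong', count: 2 }
```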
Cost dashboard
At the end of the pilot, calculate:
select
  count(*) as total_generations,
  -- 'scheduled' clips have already been approved, so they count as usable too
  sum(case when status in ('approved_to_publish', 'scheduled', 'published') then 1 else 0 end) as usable_clips,
  sum(generation_cost_usd) as model_cost,
  sum(review_minutes) as review_minutes,
  sum(editing_minutes) as editing_minutes
from generations
where brief_id = 'bf_2025_001';
Then compute:
usable_rate = usable_clips / total_generations
attempts_per_usable_clip = total_generations / usable_clips
human_cost = ((review_minutes + editing_minutes) / 60) × hourly_rate
cost_per_usable_clip = (model_cost + human_cost) / usable_clips
That is the number to compare with your existing workflow.
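The same calculation can be split by model to see which vendor is actually cheaper per usable clip. Below is a sketch over in-memory records rather than SQL. `costPerUsableClipByModel` is a hypothetical helper, the sample records are made up, and counting `approved_to_publish`, `scheduled`, and `published` as usable is an assumption you should align with your own state list.

```javascript
// Statuses counted as "usable". This set is an assumption of the sketch.
const USABLE_STATUSES = new Set(["approved_to_publish", "scheduled", "published"]);

// Group records by model and compute cost per usable clip for each group.
function costPerUsableClipByModel(records, hourlyRateUsd) {
  const groups = new Map();
  for (const r of records) {
    const g = groups.get(r.model) || { total: 0, usable: 0, modelCostUsd: 0, humanMinutes: 0 };
    g.total += 1;
    if (USABLE_STATUSES.has(r.status)) g.usable += 1;
    g.modelCostUsd += r.generation_cost_usd || 0;
    g.humanMinutes += (r.review_minutes || 0) + (r.editing_minutes || 0);
    groups.set(r.model, g);
  }

  const summary = {};
  for (const [model, g] of groups) {
    const humanCostUsd = (g.humanMinutes / 60) * hourlyRateUsd;
    summary[model] = {
      totalGenerations: g.total,
      usableClips: g.usable,
      usableRate: g.usable / g.total,
      // null when a model produced nothing usable, to avoid dividing by zero
      costPerUsableClipUsd: g.usable > 0 ? (g.modelCostUsd + humanCostUsd) / g.usable : null,
    };
  }
  return summary;
}

// Made-up sample records for two models.
const byModel = costPerUsableClipByModel(
  [
    { model: "video-model-a", status: "published", generation_cost_usd: 1, review_minutes: 3, editing_minutes: 27 },
    { model: "video-model-a", status: "rejected_quality", generation_cost_usd: 1, review_minutes: 30, editing_minutes: 0 },
    { model: "video-model-b", status: "rejected_accuracy", generation_cost_usd: 2, review_minutes: 5, editing_minutes: 0 },
  ],
  60
);

console.log(byModel["video-model-a"].costPerUsableClipUsd); // 62
console.log(byModel["video-model-b"].costPerUsableClipUsd); // null
```

Note that rejected attempts still contribute their model cost and review minutes to the group total, which is the whole point of measuring per usable clip rather than per generation.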
Where humans should stay in the loop
Automate:
- structured brief creation
- prompt generation from approved templates
- generation job creation
- file naming and storage
- metadata logging
- review queue creation
- caption/post copy drafts
- reporting
Keep human approval for:
- brand fit
- product accuracy
- claims and disclaimers
- likeness rights
- copyright/music concerns
- trademarks/logos
- platform ad policy risk
- sensitive categories like health, finance, children, politics, or legal topics
- final approval for paid campaigns
A good system increases throughput without turning publishing into an unreviewed media firehose.
Pick the right first use case
Evaluate AI video with two dimensions:
risk level
consistency requirement
| Risk | Consistency needed | Suggested use |
|---|---|---|
| Low | Low | Good production test |
| Low | High | Drafts, variants, partial shots |
| High | Low | Strict human review only |
| High | High | Keep traditional production primary |
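If briefs carry `risk_level` and `consistency_requirement` fields, the table can be encoded as a lookup so each brief gets routed the same way every time. This is a small sketch. `suggestedUse` is a hypothetical helper, and restricting both inputs to "low" and "high" is an assumption.

```javascript
// The risk/consistency table as a lookup keyed by "risk:consistency".
const SUGGESTED_USE = new Map([
  ["low:low", "Good production test"],
  ["low:high", "Drafts, variants, partial shots"],
  ["high:low", "Strict human review only"],
  ["high:high", "Keep traditional production primary"],
]);

// Look up the suggested use for a brief; unknown levels throw
// rather than defaulting to a permissive answer.
function suggestedUse(riskLevel, consistencyRequirement) {
  const key = `${riskLevel}:${consistencyRequirement}`.toLowerCase();
  const use = SUGGESTED_USE.get(key);
  if (use === undefined) {
    throw new Error(`Unknown risk/consistency combination: ${key}`);
  }
  return use;
}

console.log(suggestedUse("low", "low")); // Good production test
console.log(suggestedUse("High", "High")); // Keep traditional production primary
```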
Good early candidates:
- social B-roll
- ad hook variants
- background visuals
- storyboard previews
- internal concept exploration
- rough product scenario tests before a shoot
Use caution with:
- exact product demos
- regulated paid ads
- real customer likenesses
- recurring character stories
- complex multi-shot narratives
- brand hero films
- anything where a small visual error creates legal or trust risk
A clip can look impressive and still be wrong for production.
The two-week pilot I would run
Keep it narrow:
format: social B-roll clips
clip length: 5–8 seconds
models: 1–2
prompt templates: 2–3
target: 50 generations
success metric: cost per usable clip vs current workflow
Rules:
- Log every generation.
- Force reviewers to choose rejection reasons.
- Track review and editing minutes.
- Separate “usable” from “published.”
- Compare against a real current benchmark.
At the end, the answer should not be:
AI video is ready.
It should be:
For this format, on this channel, with this review process,
AI video costs $X per usable clip and meets / does not meet our quality bar.
That is a decision you can build on.
Final takeaway
AI video is production-ready when three things are true:
- Cost per usable clip beats your current benchmark.
- Quality clears the bar for the specific channel and risk level.
- The workflow is repeatable without heroic manual effort.
Until then, treat AI video like an experiment with instrumentation.
The model output is only one part of the system. The production system is the logging, review states, human gates, and feedback loop around it.