Most tutorials show you how to call Replicate. Few show you how to use it well inside a real production app. This article covers the mistakes I made and the patterns that actually work — using Goodbye Watermark as a real-world case study.
What Is Replicate, Really?
Replicate is a cloud API that lets you run AI models — image generation, video, audio, vision — without owning a single GPU. You send an HTTP request, a model runs on their infrastructure, and you get the result back.
The business model is pay-per-prediction: you're charged for the time the model actually runs, not idle time. That means cold boots don't affect your cost — only your latency.
1. Understand the Prediction Lifecycle Before Writing Any Code
Every Replicate call creates a prediction — an object with a lifecycle:
starting → processing → succeeded (or failed / canceled)
- `starting`: model is booting (cold start happens here)
- `processing`: `predict()` is actively running
- `succeeded`: output is ready — but files are deleted after 1 hour
That last point is critical. If you're not saving outputs immediately, you'll lose them. More on that below.
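To keep those status checks in one place, a small helper can encode the terminal states explicitly. This is a sketch; the type and helper names are mine, not part of the Replicate SDK:

```typescript
// Replicate prediction statuses, per the lifecycle above
type PredictionStatus =
  | "starting"
  | "processing"
  | "succeeded"
  | "failed"
  | "canceled";

// A prediction is finished once it reaches any of these states
const TERMINAL_STATES: PredictionStatus[] = ["succeeded", "failed", "canceled"];

function isTerminal(status: PredictionStatus): boolean {
  return TERMINAL_STATES.includes(status);
}
```

Checking for all three terminal states matters: a polling loop that only tests for `succeeded` and `failed` will spin forever on a canceled prediction.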
2. Polling vs. Webhooks: Choose the Right Strategy
Replicate gives you three ways to handle async predictions:
Polling (simplest, fine for most apps)
```javascript
// Create the prediction
const prediction = await replicate.predictions.create({
  model: "owner/model-name",
  input: { image: imageUrl },
});

// Poll until the prediction reaches a terminal state
// (include "canceled", or this loop can spin forever)
let result = prediction;
while (!["succeeded", "failed", "canceled"].includes(result.status)) {
  await new Promise((r) => setTimeout(r, 1000));
  result = await replicate.predictions.get(result.id);
}
```
Works well for short-lived predictions (under ~15s). Simple to implement. The tradeoff: you're making repeated requests even when nothing has changed.
Webhooks (better for longer or background tasks)
```javascript
const prediction = await replicate.predictions.create({
  model: "owner/model-name",
  input: { image: imageUrl },
  // VERCEL_URL doesn't include the protocol, so prepend it
  webhook: `https://${process.env.VERCEL_URL}/api/webhooks`,
  webhook_events_filter: ["completed"], // only fire when done
});
```
Replicate POSTs to your URL when the prediction finishes. No polling loop. If there are network issues, they retry automatically.
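The receiving side can be as small as this. A sketch of a Next.js App Router handler (the persistence step is left as a comment; the route path matches the examples in this section):

```typescript
// app/api/webhooks/route.ts: minimal Replicate webhook receiver (sketch)
export async function POST(request: Request): Promise<Response> {
  // Replicate POSTs the full prediction object as JSON
  const prediction = await request.json();

  if (prediction.status === "succeeded") {
    // Persist prediction.output to durable storage here,
    // since Replicate's copy expires after an hour
  }

  // Respond 2xx quickly; non-2xx responses trigger retries
  return new Response("ok", { status: 200 });
}
```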
Use webhooks when:
- Predictions take more than ~10-15 seconds
- You want to persist results to a database
- You're building background processing flows
Tip: Add query params to your webhook URL to carry context:
https://yourapp.com/api/webhooks?userId=abc123&predictionType=watermark
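On the receiving end, pulling that context back out is straightforward with the standard URL API. A sketch, using the param names from the example URL above:

```typescript
// Extract the context we attached to the webhook URL.
// searchParams.get() returns null for any param that wasn't provided.
function parseWebhookContext(requestUrl: string): {
  userId: string | null;
  predictionType: string | null;
} {
  const url = new URL(requestUrl);
  return {
    userId: url.searchParams.get("userId"),
    predictionType: url.searchParams.get("predictionType"),
  };
}
```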
When to use each
| Scenario | Use |
|---|---|
| Fast model, UX waits for result | Polling |
| Slow model, fire and notify | Webhooks |
| Background job, store to DB | Webhooks |
| Quick prototype | Polling |
3. Cold Starts Are Real — Here's How to Handle Them
When a model hasn't been used recently, it needs to "boot up." This can add several seconds of latency on the first request after idle time.
For casual traffic: Cold boots are fine. You only pay for actual compute, not boot time.
For production apps with consistent traffic: Use a Deployment with minInstances: 1:
```javascript
// Via the Replicate dashboard or API:
// Create a deployment for your model with min_instances = 1
// This keeps the model warm 24/7
```
This costs more (you're paying to keep the instance warm) but eliminates cold start latency entirely.
For Goodbye Watermark, I don't use a deployment because the traffic is spread across the day and a few seconds of latency on first boot is acceptable. But if you're building something with strict SLA requirements — use deployments.
4. Save Outputs Immediately — They Expire in 1 Hour
This is the gotcha that trips up everyone:
Input and output files are automatically deleted after 1 hour for any predictions created through the API.
If your app doesn't save the result right after succeeded, it's gone. Your options:
Option A: Stream back to the client immediately
```typescript
// Next.js API route
export async function GET(request: Request) {
  const output = await replicate.run("owner/model", { input });
  return new Response(output); // stream back to client
}
```
Option B: Save to your own storage (Supabase Storage, S3, etc.)
```javascript
const output = await replicate.run("owner/model", { input });
const response = await fetch(output[0]); // download from Replicate
const buffer = await response.arrayBuffer();
await supabase.storage.from("outputs").upload(`${userId}/${id}.png`, buffer);
```
For Goodbye Watermark, I stream the result directly back to the client. The user downloads it immediately. No storage needed, no expiry problem.
5. Next.js Config: Don't Forget This
If you're displaying output images from Replicate in a Next.js <Image> component, add this to your config or you'll get a domain error:
```typescript
// next.config.ts
const nextConfig = {
  images: {
    remotePatterns: [
      {
        protocol: "https",
        hostname: "replicate.delivery",
      },
      {
        protocol: "https",
        hostname: "*.replicate.delivery",
      },
    ],
  },
};

export default nextConfig;
```
Small thing, but it will bite you in production.
6. Error Handling That Doesn't Suck
Real-world Replicate usage needs to handle:
- Network timeouts
- Model errors (bad input format, unsupported file type)
- Rate limits (429)
- Prediction timeouts (30 min hard cap)
```typescript
try {
  const prediction = await replicate.predictions.create({ ... });

  if (prediction?.error) {
    return NextResponse.json({ error: prediction.error }, { status: 500 });
  }

  // poll with timeout safety
  let result = prediction;
  const deadline = Date.now() + 60_000; // 60s max wait

  // "canceled" is also terminal; without it this loop can hang
  while (!["succeeded", "failed", "canceled"].includes(result.status)) {
    if (Date.now() > deadline) {
      return NextResponse.json({ error: "Prediction timed out" }, { status: 504 });
    }
    await new Promise((r) => setTimeout(r, 1500));
    result = await replicate.predictions.get(result.id);
  }

  if (result.status !== "succeeded") {
    return NextResponse.json({ error: "Model failed" }, { status: 500 });
  }

  return NextResponse.json({ output: result.output });
} catch (err) {
  return NextResponse.json({ error: "Unexpected error" }, { status: 500 });
}
```
Set your own deadline. Replicate's hard limit is 30 minutes, but your users don't want to wait more than ~60 seconds for most tasks.
7. Rate Limits to Know
From Replicate's docs:
- Create prediction: 600 requests/minute
- All other endpoints: 3000 requests/minute
For most indie apps, you won't hit these. If you do, they return a 429 — build retry logic with exponential backoff.
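A minimal backoff sketch for those 429s. The base delay, cap, and attempt count here are my own choices, not values from Replicate's docs:

```typescript
// Delay before retry attempt N: 1s, 2s, 4s, ... capped at 30s
function backoffDelayMs(attempt: number, baseMs = 1_000, capMs = 30_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Usage sketch: retry a request up to 5 times on HTTP 429
async function fetchWithRetry(url: string, maxAttempts = 5): Promise<Response> {
  for (let attempt = 0; ; attempt++) {
    const res = await fetch(url);
    if (res.status !== 429 || attempt >= maxAttempts - 1) return res;
    await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
  }
}
```

In production you'd usually add jitter to the delay so many clients don't retry in lockstep.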
8. Choosing the Right Model
Replicate hosts thousands of models. Two categories matter:
Official models — maintained by Replicate, always warm, stable API, predictable per-output pricing. Best for production use.
Community models — more variety, charged by compute time, may have cold starts, API can change between versions.
For Goodbye Watermark, I use the Qwen model for watermark removal. The choice came down to output quality and how well it handled semi-transparent watermarks — which are significantly harder than solid text watermarks. Testing a few models on realistic samples before committing to one is worth the extra hour.
Real-World Case Study: Goodbye Watermark
Goodbye Watermark is an AI watermark removal tool built with Next.js + Replicate + Vercel. The full stack is:
- Frontend: Next.js + Tailwind CSS
- AI: Replicate (Qwen model)
- Hosting: Vercel
- Payments: Stripe (two credit tiers)
The entire MVP was built in ~1 hour. The hardest part wasn't the UI — it was getting consistent output quality from the model across different watermark types.
Current results:
- ~150 weekly organic users
- $0 paid acquisition
- Zero infrastructure management
Replicate made the difference. Running my own GPU inference would have added weeks of setup and ongoing ops overhead. Instead, I spent that time on the UX and monetization.
TL;DR — The Patterns That Matter
- Understand the prediction lifecycle — especially the 1-hour file expiry
- Use polling for short tasks, webhooks for long/background ones
- Use Deployments if cold start latency is a problem for your UX
- Save or stream outputs immediately after `succeeded`
- Add `replicate.delivery` to your Next.js image domains
- Set your own deadline — don't wait 30 minutes for a user-facing request
- Test multiple models before committing — quality varies significantly
Replicate is genuinely one of the best tools for indie developers shipping AI products fast. Use it well and you can build something real in a weekend.
Built something with Replicate? Drop it in the comments — always curious to see what people are shipping.