A few months ago, I almost killed a feature.
Not because it didn’t work,
but because improving it felt… impossible.
We had an AI system in production.
Users were interacting with it daily.
And they were doing something incredibly valuable:
👎 Clicking “thumbs down”
At first, we treated it like a metric.
Then it hit me:
That is the dataset.
🧠 The Moment Everything Clicked
Every time a user said:
- “this is wrong”
- “this isn’t helpful”
- “this makes no sense”
They were giving us:
real-world training data
Not synthetic.
Not curated.
Not delayed.
Raw. Messy. Honest.
And we were… ignoring it.
Because like most teams, we thought:
“Fine-tuning is expensive. We’ll deal with it later.”
⚠️ The Lie Most Founders Believe
Fine-tuning has a reputation problem.
You hear it and think:
- GPU clusters
- ML engineers
- weeks of experimentation
That’s true for large-scale research.
But for a product?
It’s overkill.
🔁 The Shift: From Pipelines to Loops
Instead of building a “training pipeline,”
we built a feedback loop.
Small difference. Massive impact.
⚙️ What We Actually Built
Nothing fancy.
Just:
- SQS → store feedback
- Lambda → decide when to train
- Batch + Spot GPU → run training
- S3 → store model versions
That’s it.
No always-on infrastructure.
No ML team.
No pipeline monster.
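To make the loop concrete, here's a minimal sketch of the Lambda "decide when to train" step. The threshold, field names, and `handler` shape are illustrative assumptions, not our exact code; in production the running count lives in a table and the `if` branch calls `boto3.client("batch").submit_job(...)`.

```python
import json

FEEDBACK_THRESHOLD = 100  # assumption: train once enough corrections pile up


def should_train(pending_count: int, threshold: int = FEEDBACK_THRESHOLD) -> bool:
    """Pure decision: only kick off a Batch job past the threshold."""
    return pending_count >= threshold


def handler(event, context):
    # Each SQS record carries one piece of user feedback as JSON.
    records = event.get("Records", [])
    stored = [json.loads(r["body"]) for r in records]

    # Placeholder: the real Lambda persists `stored` and reads the
    # accumulated count back from storage.
    pending = len(stored)

    if should_train(pending):
        # boto3.client("batch").submit_job(...) would go here
        return {"action": "train", "examples": pending}
    return {"action": "wait", "examples": pending}
```

The point isn't the code, it's the shape: nothing runs until feedback arrives, and the decision logic is trivially testable without touching AWS.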
💡 The Part Nobody Tells You
This only works if you fix one thing:
❌ “thumbs down” is not enough
A negative signal tells you:
something is wrong
But not:
what is right
So we added one tiny UX change:
👉 “What should it have said instead?”
That single input:
- improved training quality dramatically
- reduced noise
- made the model actually improve
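A rough sketch of what one feedback event looked like after that change. The field names here are assumptions for illustration; the key idea is that only a downvote *with* a correction becomes a supervised training pair.

```python
from dataclasses import dataclass


@dataclass
class FeedbackEvent:
    request_id: str
    prompt: str
    model_output: str
    rating: str           # "up" or "down"
    correction: str = ""  # "What should it have said instead?"

    def is_trainable(self) -> bool:
        # Bare downvotes are noise; corrected downvotes are data.
        return self.rating == "down" and bool(self.correction.strip())


def to_training_pair(ev: FeedbackEvent) -> dict:
    """Turn a corrected downvote into a (prompt, target) pair."""
    return {"prompt": ev.prompt, "target": ev.correction}
```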
⚠️ Where We Almost Broke Everything
This is where most blog posts lie to you.
1. We shipped a worse model
The first time we automated training:
- accuracy dropped
- responses got inconsistent
Why?
Because we skipped evaluation.
Now:
- every model is tested before deployment
- bad versions never go live
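The gate itself can be tiny. This is a sketch, not our exact deployment code: `new_score` and `baseline_score` stand in for whatever held-out metric you trust, and `promote`/`rollback` stand in for your deploy hooks.

```python
def should_promote(new_score: float, baseline_score: float,
                   min_margin: float = 0.0) -> bool:
    """Promote only if the candidate beats the live model."""
    return new_score >= baseline_score + min_margin


def deploy_if_better(new_score, baseline_score, promote, rollback):
    # The whole "bad versions never go live" rule is this one branch.
    if should_promote(new_score, baseline_score):
        promote()
        return "promoted"
    rollback()
    return "rejected"
```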
2. Spot instances killed our jobs
We loved the cost savings…
until training jobs randomly died.
Turns out:
Spot instances can terminate anytime
Fix:
- checkpoint training to S3
- retry automatically
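Here's the shape of that fix as a sketch. `save_fn` and `load_fn` stand in for the S3 upload/download calls (`put_object` / `get_object` via boto3), and `step_fn` for one training step; the state dict is a toy stand-in for real model weights. A Spot termination then just becomes a retry that resumes from the last checkpoint instead of step zero.

```python
def train_with_checkpoints(total_steps, step_fn, save_fn, load_fn,
                           checkpoint_every=10):
    """Spot-safe training loop: resume from the last checkpoint if one exists."""
    state = load_fn() or {"step": 0, "weights": 0.0}
    while state["step"] < total_steps:
        state = step_fn(state)
        if state["step"] % checkpoint_every == 0:
            save_fn(state)  # in production: upload checkpoint to S3
    save_fn(state)
    return state
```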
3. Costs weren’t zero (but close)
We expected “almost free.”
Reality:
- small but real costs from SQS, logs, storage
- occasional spikes from training
Nothing scary — but not $0 either.
💰 What This Actually Costs
Here’s what we see at early-stage scale:
| Component | What you pay for | Monthly cost |
|---|---|---|
| SQS | requests (1M free tier) | $1–3 |
| Lambda | executions + duration | $1–10 |
| S3 | storage + requests | $1–5 |
| Batch | orchestration | $0 |
| GPU (Spot) | training time | $5–30 |
| Logs + misc | CloudWatch etc. | $1–10 |
Total:
👉 ~$10 to $60/month
The reason it’s cheap is simple:
Nothing runs unless users give feedback
🧠 The Real Insight
This isn’t about infrastructure.
It’s about mindset.
Most teams think:
“We’ll improve the model later”
The better approach:
Let users improve it continuously
🏆 What Changed After We Shipped This
- The model improved every week
- Edge cases started disappearing
- Users noticed
But more importantly:
We stopped guessing what users wanted
⚠️ What I Would Do Differently
If I had to rebuild this:
1. Start collecting feedback on day 1
Not after launch
2. Force correction input early
Not optional
3. Add evaluation before automation
Not after breaking production
🧾 Final Thought
You don’t need:
- a research team
- expensive infrastructure
- complex pipelines
You need:
- a feedback loop
- a trigger
- and a way to not make things worse
🔥 One Line That Changed How I Think About AI Systems
Your model doesn’t get better when you train it.
It gets better when users correct it.
Curious how others are doing this:
👉 Are you collecting feedback but not using it?
👉 Or already closing the loop?
Let’s talk 👇