Dhananjay Lakkawar
I Thought Fine-Tuning Needed an ML Team. I Was Wrong.

A few months ago, I almost killed a feature.

Not because it didn’t work,
but because improving it felt… impossible.

We had an AI system in production.
Users were interacting with it daily.

And they were doing something incredibly valuable:

👎 Clicking “thumbs down”

At first, we treated it like a metric.

Then it hit me:

That is the dataset.


🧠 The Moment Everything Clicked

Every time a user said:

  • “this is wrong”
  • “this isn’t helpful”
  • “this makes no sense”

They were giving us:

real-world training data

Not synthetic.
Not curated.
Not delayed.

Raw. Messy. Honest.

And we were… ignoring it.

Because like most teams, we thought:

“Fine-tuning is expensive. We’ll deal with it later.”


⚠️ The Lie Most Founders Believe

Fine-tuning has a reputation problem.

You hear it and think:

  • GPU clusters
  • ML engineers
  • weeks of experimentation

That’s true for large-scale research.

But for a product?

It’s overkill.


🔁 The Shift: From Pipelines to Loops

Instead of building a “training pipeline,”
we built a feedback loop.

Small difference. Massive impact.


⚙️ What We Actually Built

Nothing fancy.

Just:

  • SQS → store feedback
  • Lambda → decide when to train
  • Batch + Spot GPU → run training
  • S3 → store model versions

That’s it.

No always-on infrastructure.
No ML team.
No pipeline monster.
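To make the "decide when to train" step concrete, here's a minimal sketch of the policy our Lambda-style trigger could implement. The names and thresholds (`MIN_EXAMPLES`, `MAX_DAYS_BETWEEN`) are illustrative, not production values:

```python
MIN_EXAMPLES = 200        # don't retrain on a handful of corrections
MAX_DAYS_BETWEEN = 14     # but don't let the model go stale either

def should_train(pending_examples: int, days_since_last_run: int) -> bool:
    """Trigger a training job when enough feedback has accumulated,
    or when there's some feedback and it's been too long."""
    if pending_examples >= MIN_EXAMPLES:
        return True
    return pending_examples > 0 and days_since_last_run >= MAX_DAYS_BETWEEN

# Inside the handler, this gates the (hypothetical) Batch submission:
# if should_train(queue_depth, days_idle):
#     batch.submit_job(...)  # Spot GPU job, checkpoints to S3
```

The point of the two thresholds: batching feedback keeps GPU spend near zero during quiet weeks, while the staleness cap guarantees the loop never fully stalls.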


💡 The Part Nobody Tells You

This only works if you fix one thing:

❌ “thumbs down” is not enough

A negative signal tells you:

something is wrong

But not:

what is right

So we added one tiny UX change:

👉 “What should it have said instead?”

That single input:

  • improved training quality dramatically
  • reduced noise
  • made the model actually improve
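Here's a rough sketch of what that correction field buys you. Assuming a hypothetical event schema (field names are illustrative, not our actual payload), a thumbs-down without a correction gets dropped as noise, while one with a correction becomes a clean training pair:

```python
from typing import Optional

# One feedback event, as it might land in the queue.
feedback = {
    "prompt": "Summarize this invoice",
    "model_output": "The invoice totals $500.",            # what the model said
    "rating": "thumbs_down",
    "correction": "The invoice totals $540, due March 1.", # the new UX field
}

def to_training_pair(event: dict) -> Optional[dict]:
    """Keep only events that carry a usable correction; a bare
    negative signal tells us something is wrong but not what is right."""
    if event["rating"] == "thumbs_down" and event.get("correction"):
        return {"input": event["prompt"], "target": event["correction"]}
    return None
```

Everything `to_training_pair` returns `None` for is still worth logging, but it never reaches the training set.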

⚠️ Where We Almost Broke Everything

This is where most blog posts lie to you.

1. We shipped a worse model

The first time we automated training:

  • accuracy dropped
  • responses got inconsistent

Why?

Because we skipped evaluation.

Now:

  • every model is tested before deployment
  • bad versions never go live
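A minimal version of that deployment gate might look like the sketch below. The scores and margin are assumptions; the essential idea is that a candidate only goes live if it beats the current model on a held-out eval set:

```python
def passes_gate(candidate_score: float, live_score: float,
                margin: float = 0.01) -> bool:
    """Promote the candidate only if it beats the live model by a
    small margin; ties and regressions keep the old version serving."""
    return candidate_score >= live_score + margin
```

The margin matters: without it, eval noise alone can flip-flop you between two roughly equal models on every run.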

2. Spot instances killed our jobs

We loved the cost savings…
until training jobs randomly died.

Turns out:

Spot instances can terminate anytime

Fix:

  • checkpoint training to S3
  • retry automatically
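A sketch of that fix, with local JSON files standing in for S3 keys (in production, each checkpoint would be uploaded to something like an `s3://bucket/checkpoints/` prefix). On restart, the job resumes from the newest checkpoint, so an interruption only loses work since the last save:

```python
import json
import os
from typing import Optional

def save_checkpoint(dirpath: str, step: int, state: dict) -> None:
    """Persist training state; in production this write goes to S3."""
    with open(os.path.join(dirpath, f"step_{step}.json"), "w") as f:
        json.dump({"step": step, "state": state}, f)

def latest_checkpoint(dirpath: str) -> Optional[dict]:
    """Find the newest checkpoint, if any survive a prior run."""
    ckpts = [f for f in os.listdir(dirpath) if f.startswith("step_")]
    if not ckpts:
        return None
    newest = max(ckpts, key=lambda f: int(f.split("_")[1].split(".")[0]))
    with open(os.path.join(dirpath, newest)) as f:
        return json.load(f)

def train(dirpath: str, total_steps: int, every: int = 10) -> int:
    """Run (or resume) training, checkpointing every `every` steps.
    Returns the step this run actually started from."""
    resume = latest_checkpoint(dirpath)
    start = resume["step"] + 1 if resume else 0
    for step in range(start, total_steps):
        # ... one real optimizer step would go here ...
        if step % every == 0:
            save_checkpoint(dirpath, step, {"loss": 0.0})
    return start
```

Pair this with automatic retries on the job definition and a Spot termination stops being a failure; it's just a pause.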

3. Costs weren’t zero (but close)

We expected “almost free.”

Reality:

  • small but real costs from SQS, logs, storage
  • occasional spikes from training

Nothing scary — but not $0 either.


💰 What This Actually Costs

Here’s what we see at early-stage scale:

| Component | What you pay for | Monthly cost |
| --- | --- | --- |
| SQS | requests (1M free tier) | $1–3 |
| Lambda | executions + duration | $1–10 |
| S3 | storage + requests | $1–5 |
| Batch | orchestration | $0 |
| GPU (Spot) | training time | $5–30 |
| Logs + misc | CloudWatch etc. | $1–10 |

Total:

👉 ~$10 to $60/month

The reason it’s cheap is simple:

Nothing runs unless users give feedback


🧠 The Real Insight

This isn’t about infrastructure.

It’s about mindset.

Most teams think:

“We’ll improve the model later”

The better approach:

Let users improve it continuously


🏆 What Changed After We Shipped This

  • The model improved every week
  • Edge cases started disappearing
  • Users noticed

But more importantly:

We stopped guessing what users wanted


⚠️ What I Would Do Differently

If I had to rebuild this:

1. Start collecting feedback on day 1

Not after launch

2. Force correction input early

Not optional

3. Add evaluation before automation

Not after breaking production


🧾 Final Thought

You don’t need:

  • a research team
  • expensive infrastructure
  • complex pipelines

You need:

  • a feedback loop
  • a trigger
  • and a way to not make things worse

🔥 One Line That Changed How I Think About AI Systems

Your model doesn’t get better when you train it.
It gets better when users correct it.


Curious how others are doing this:

👉 Are you collecting feedback but not using it?
👉 Or already closing the loop?

Let’s talk 👇
