What Building an AI Detector Taught Me About Machine Learning

#ai #llm #machinelearning #nlp

When I started building Naturalmelo, I thought the difficult part would be training a machine learning model to distinguish AI-generated text from human writing.

I quickly realized that wasn't the hardest problem.

The harder question was what users actually expected the detector to do.

The First Mistake I Made

Initially, I treated AI detection like a traditional classification task.

Input text
      ↓
ML Model
      ↓
Human or AI

Simple enough.

But after testing different LLMs and talking with users, it became obvious that this binary framing didn't match reality.

Most documents today are neither purely human-written nor purely AI-generated.

A common workflow looks more like this:

Human creates an outline
AI generates a draft
Human rewrites sections
AI improves grammar
Human performs the final review

Trying to classify that document with a single label loses a lot of useful information.
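
To make that loss concrete, here is a minimal sketch. The per-section scores are invented numbers, and `single_label`, `per_section_labels`, and the 0.5 threshold are hypothetical helpers for illustration, not Naturalmelo's actual code.

```python
from statistics import mean

# Hypothetical per-section scores: the estimated probability that each
# section is AI-generated. These numbers are invented for illustration.
section_scores = {
    "outline": 0.05,
    "draft body": 0.92,
    "rewritten intro": 0.20,
    "grammar-polished conclusion": 0.60,
}


def single_label(scores, threshold=0.5):
    """Collapse all section scores into one document-level label."""
    return "AI" if mean(scores.values()) >= threshold else "Human"


def per_section_labels(scores, threshold=0.5):
    """Keep one label per section instead of one for the whole document."""
    return {
        name: ("AI" if score >= threshold else "Human")
        for name, score in scores.items()
    }


print(single_label(section_scores))
print(per_section_labels(section_scores))
```

With these made-up scores the document-level label comes out as "Human", even though two of the four sections cross the threshold on their own. The per-section view keeps that detail.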

Accuracy Isn't the Entire Product

As developers, we naturally optimize for metrics.

Higher accuracy.

Lower latency.

Better precision and recall.

While those metrics still matter, they aren't necessarily what users care about most.

Most users didn't ask me:

"How accurate is your detector?"

Instead, they asked:

Can I trust this result?
Which parts of my document look suspicious?
What should I review before publishing?

That shifted my thinking from building a classifier to building a decision-support tool.
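
One way to picture the difference is a report that answers those three questions directly. The sketch below is an assumption about what such output could look like, not Naturalmelo's real design: `SegmentFinding`, `verdict_for`, `build_report`, and the 0.35/0.65 cutoffs are all hypothetical, and the scores are assumed to come from whatever model sits underneath.

```python
from dataclasses import dataclass


@dataclass
class SegmentFinding:
    index: int     # position of the segment in the document
    text: str
    score: float   # hypothetical AI-likelihood in [0, 1]
    verdict: str   # "likely human", "uncertain", or "likely AI"


def verdict_for(score, low=0.35, high=0.65):
    """Map a score to a verdict, with an explicit 'uncertain' band."""
    if score >= high:
        return "likely AI"
    if score <= low:
        return "likely human"
    return "uncertain"


def build_report(segments, scores):
    """Return all findings plus a review queue, most suspicious first."""
    findings = [
        SegmentFinding(i, text, score, verdict_for(score))
        for i, (text, score) in enumerate(zip(segments, scores))
    ]
    to_review = sorted(
        (f for f in findings if f.verdict != "likely human"),
        key=lambda f: f.score,
        reverse=True,
    )
    return findings, to_review


# Invented example input: three paragraphs and their assumed scores.
paragraphs = ["Intro paragraph", "Generated-looking body", "Edited conclusion"]
findings, to_review = build_report(paragraphs, [0.10, 0.90, 0.50])
for finding in to_review:
    print(finding.index, finding.verdict, finding.score)
```

The "uncertain" band speaks to the trust question by admitting when the score isn't decisive, and the sorted review queue tells the user where to look first.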

The Engineering Challenge

One interesting challenge is that modern language models improve constantly.

Detection signals that worked well on text from older models don't necessarily generalize to newer ones.

That means an AI detector can't be treated as a "train once and forget" system.

It has to evolve alongside the models it's trying to analyze.

For me, this changed the project from a machine learning problem into a continuous engineering problem involving evaluation, iteration, and monitoring.
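
As a rough illustration of the monitoring part, the sketch below tracks recall separately for each generator on a labeled evaluation set of known AI-written samples. The function names, the generator names, the 0.7 threshold, and the evaluation results are all invented for this example.

```python
from collections import defaultdict


def recall_by_generator(samples):
    """Compute recall per generator.

    `samples` is an iterable of (generator_name, detected_as_ai) pairs,
    where every sample is known to be AI-generated.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for generator, detected in samples:
        totals[generator] += 1
        hits[generator] += int(detected)
    return {g: hits[g] / totals[g] for g in totals}


def needs_attention(recalls, min_recall=0.7):
    """List generators whose recall has dropped below the threshold."""
    return sorted(g for g, r in recalls.items() if r < min_recall)


# Invented evaluation results for two hypothetical generators.
eval_results = [
    ("older-model", True), ("older-model", True),
    ("older-model", True), ("older-model", False),
    ("newer-model", True), ("newer-model", False),
    ("newer-model", False), ("newer-model", False),
]

recalls = recall_by_generator(eval_results)
print(recalls)
print(needs_attention(recalls))
```

Breaking the metric down by generator matters because a single overall number can look healthy while detection on the newest model quietly degrades. Rerunning a check like this whenever a new model appears is one simple form of the monitoring described above.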

The Bigger Lesson

The biggest takeaway from building Naturalmelo wasn't about machine learning.

It was about product design.

Developers often optimize for model performance because it's measurable.

Users optimize for confidence because that's what helps them make decisions.

Those aren't always the same thing.

Building software that bridges that gap turned out to be much more interesting than simply chasing another percentage point of accuracy.

If you're building AI products, I'd recommend spending just as much time understanding how people use the output as you do improving the model itself.

In the end, output that people can trust and act on might be the feature users value most.

I'd love to hear from other developers building AI products.

Have you found that the hardest problem wasn't the model itself, but how users actually interact with it?