When people first hear about an AI tool for reviewing work contracts, the reaction is usually something like:
“That sounds straightforward. Upload a contract, extract the text, ask the model to explain it.”
In practice, it’s not straightforward at all.
Building WorkContractReview.com has taught me that contract review is one of those product categories that looks simple from the outside, but gets complicated the moment you try to make it reliable for real users.
A contract is not just text. It is risk, context, ambiguity, and user anxiety packed into a PDF.
And that changes everything.
The first lesson: start with one contract type
One of the biggest mistakes you can make when building an AI product is assuming that similar-looking tasks are actually the same task.
At first glance, employment contracts, freelance agreements, NDAs, consulting agreements, and offer letters all feel close enough that a single generalized workflow should handle them.
That assumption breaks fast.
Each document type has different structures, different key clauses, different user expectations, and different levels of risk. The same model prompt that sounds useful on one contract can sound vague, overly cautious, or even misleading on another.
That’s why I think starting narrow matters so much.
Instead of trying to analyze every possible legal document from day one, it makes far more sense to focus on one contract type and get very good at it. Not because expansion is impossible, but because trust is built through consistency. A tool that works well for one use case is more valuable than a tool that kind of works for ten.
In AI products, premature generalization is often just another form of product fragility.
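One way to make that narrow scope concrete is to encode, per document type, which clauses the analysis is expected to cover, and to refuse types you have not built for rather than guessing. This is a hypothetical sketch; the type names and clause lists are illustrative, not the product's actual schema.

```python
# Illustrative per-document-type clause checklists (not a real schema).
KEY_CLAUSES = {
    "employment": ["non_compete", "probation", "termination", "compensation"],
    "nda": ["definition_of_confidential_info", "term", "permitted_disclosures"],
    "freelance": ["scope_of_work", "payment_terms", "ip_assignment"],
}

def expected_clauses(doc_type: str) -> list[str]:
    """Return the clauses an analysis must cover for this document type."""
    try:
        return KEY_CLAUSES[doc_type]
    except KeyError:
        # Refuse unsupported types instead of running a generic workflow.
        raise ValueError(f"Unsupported document type: {doc_type}")
```

The explicit refusal branch is the point: a tool that declines a lease agreement is more trustworthy than one that analyzes it with an employment-contract checklist.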
The second lesson: tone is part of the product
Another thing I underestimated early on was how much the tone of the output matters.
With contract review, users do not just want information. They want information delivered in a way that feels clear, grounded, and useful.
Too neutral, and the output feels generic.
Too aggressive, and everything sounds like a legal emergency.
Too much hedging, and users stop trusting the tool.
Too much confidence, and the tool becomes dangerous.
The sweet spot is surprisingly hard to hit.
The best experience is usually not “formal legal robot” and not “casual AI assistant.” It is closer to a knowledgeable friend who helps you understand what deserves attention.
That means the model has to do more than summarize clauses. It has to communicate risk with the right level of weight. A non-compete clause should not sound identical to a probation period clause. A standard confidentiality clause should not trigger the same tone as a one-sided termination condition.
This is where prompt design alone is not enough.
You need iteration, feedback, and a quality loop. You need to read real outputs and ask:
Does this feel trustworthy?
Does this over-warn?
Does this bury the important part?
Would a normal user actually understand what to do next?
In other words, UX writing and model behavior are tightly connected. Tone is not decoration. Tone is product design.
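The quality loop above can be made operational by scoring real outputs against those four questions and tracking the pass rate over time. This is a minimal sketch, assuming a manual review process; the field names are mine, not from the product.

```python
from dataclasses import dataclass

# Hypothetical review record: one per real output read by a human.
@dataclass
class OutputReview:
    output_id: str
    feels_trustworthy: bool
    over_warns: bool
    buries_key_point: bool
    next_step_is_clear: bool

    def passes(self) -> bool:
        # An output passes only if all four questions come out right.
        return (self.feels_trustworthy
                and not self.over_warns
                and not self.buries_key_point
                and self.next_step_is_clear)

reviews = [
    OutputReview("r1", True, False, False, True),
    OutputReview("r2", True, True, False, True),  # over-warned
]
pass_rate = sum(r.passes() for r in reviews) / len(reviews)
print(f"pass rate: {pass_rate:.0%}")
```

Even a crude scorecard like this turns "does the tone feel right?" into something you can trend across prompt revisions.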
The third lesson: PDF extraction is not a minor detail
A lot of AI product demos begin after the text is already clean.
Real users do not start there.
They upload scanned PDFs, low-quality exports, phone camera captures, partially corrupted files, and documents with inconsistent formatting. And once that happens, your system is only as good as the text extraction layer underneath it.
This is one of the hardest parts of the stack.
The most dangerous failure mode is not a visible crash. It is silent failure.
The OCR or extraction pipeline mangles a clause.
A heading disappears.
A date is misread.
A salary figure is broken.
A paragraph is split incorrectly.
Then the LLM does exactly what LLMs do: it confidently analyzes the text it received.
That means the model can produce something polished, structured, and persuasive while being grounded in incomplete or distorted input.
This is why I think extraction quality should be treated as a core product problem, not a preprocessing detail.
If the source text is weak, the system should be able to detect uncertainty early, flag it clearly, and avoid pretending the analysis is operating on perfect input.
For users, confidence without reliability is worse than a limitation message.
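Detecting that uncertainty early does not require anything exotic; a few cheap heuristics on the extracted text can catch many silent failures before the model ever sees them. This is a sketch with illustrative thresholds and keywords, not tuned values from the product.

```python
import re

# Hypothetical heuristics for flagging suspect extractions before analysis.
# Thresholds and keyword list are illustrative, not tuned production values.
def extraction_warnings(text: str) -> list[str]:
    warnings = []
    if len(text) < 500:
        warnings.append("very little text extracted; possible scan/OCR failure")
    letters = sum(c.isalpha() for c in text)
    if text and letters / len(text) < 0.5:
        warnings.append("high ratio of non-letter characters; text may be garbled")
    # A work contract with none of the usual vocabulary is suspicious.
    if not re.search(r"(?i)\b(termination|confidential|compensation|salary)\b", text):
        warnings.append("no expected contract keywords found")
    return warnings
```

The key design choice is what happens next: surface these warnings to the user and soften or withhold the analysis, rather than letting the model confidently analyze mangled input.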
The real product is trust
What I’ve learned from building WorkContractReview.com is that AI contract review is not mainly about “making GPT read a contract.”
It is about building a system that users can trust when the stakes feel personal.
And work contracts are personal.
People are not uploading them for curiosity. They are uploading them because they are about to sign something that affects their income, obligations, flexibility, and future options. They want speed, yes. But more than that, they want clarity.
That changes how you build.
You start caring less about showing off model intelligence and more about reducing ambiguity.
You start caring less about covering every document type and more about reliability in the ones you support.
You start caring less about sounding impressive and more about sounding useful.
That shift has probably been the biggest lesson for me.