A lot of AI-assisted coding tools fail for a simple reason:
they ask the model to think before the system has done enough work.
That usually leads to one of two outcomes:
- vague summaries that sound smart but don’t help much
- confident mistakes wrapped in beautiful prose
While building my legacy analysis tool, I kept arriving at the same conclusion:
if you want useful AI output, you need structured context first.
That rule changed the whole design of the project.
The temptation
When working with a legacy codebase, it is very tempting to do this:
- grab a file
- send it to an LLM
- ask: "what does this do?"
Sometimes that works.
But legacy systems are rarely understandable from a single file.
A method may look innocent until you discover:
- it depends on includes
- it calls another class indirectly
- it builds SQL dynamically
- it is triggered by a UI flow you haven't seen yet
- the real business meaning lives in the database schema, not the method body
If you ask AI too early, it will often fill the gaps with plausible fiction.
That is not intelligence. That is autocomplete wearing a tie.
What I changed
Instead of starting with AI, I designed the workflow to start with extraction.
The tool tries to collect structure before asking for commentary.
Depending on the analysis path, that may include:
- file path and class/method identity
- dependency chains
- include resolution
- SQL blocks
- tables touched
- UI actions and endpoints
- schema metadata
- procedure summaries
- domain map hints
Only after that does AI step in.
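To make "structure first" concrete, here is a minimal sketch of what that extracted context might look like as a data structure. This is illustrative Python, not my tool's actual schema; every field name here is an assumption.

```python
# Illustrative sketch only: field names are assumptions, not a real schema.
from dataclasses import dataclass, field

@dataclass
class MethodContext:
    file_path: str
    class_name: str
    method_name: str
    includes: list[str] = field(default_factory=list)       # resolved include files
    dependencies: list[str] = field(default_factory=list)   # external classes called
    sql_fragments: list[str] = field(default_factory=list)  # raw SQL found in the body
    tables: list[str] = field(default_factory=list)         # tables the SQL touches
    ui_actions: list[str] = field(default_factory=list)     # UI flows / endpoints that trigger it
```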
So the prompt is no longer:
"please explain this mysterious code"
It becomes closer to:
"here is the method, here are its dependencies, here are the SQL fragments, here are the tables, here is the path it came from, now explain what is actually happening"
That difference is enormous.
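In code, the shift might look like this: a prompt builder that lays out the extracted evidence before asking the model anything. Again, a sketch over the hypothetical MethodContext above, not a real API.

```python
def build_prompt(ctx: MethodContext, source: str) -> str:
    """Lay out the extracted evidence before asking for an explanation."""
    lines = [
        f"Method: {ctx.class_name}::{ctx.method_name} (from {ctx.file_path})",
        f"Resolved includes: {', '.join(ctx.includes) or 'none found'}",
        f"External dependencies: {', '.join(ctx.dependencies) or 'none found'}",
        f"Tables touched: {', '.join(ctx.tables) or 'none found'}",
        "SQL fragments:",
        *(ctx.sql_fragments or ["none found"]),
        "",
        "Source:",
        source,
        "",
        "Given the evidence above, explain what is actually happening.",
    ]
    return "\n".join(lines)
```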
Why this works better
Structured context improves AI output in at least four ways.
1. It reduces hallucination pressure
When the model has actual extracted signals, it does not need to invent as much missing context.
That does not eliminate mistakes, but it lowers the odds of polished nonsense.
2. It makes answers more specific
Instead of generic observations like:
- "this appears to handle business logic"
- "it may interact with the database"
you start getting analysis grounded in real technical evidence:
- which tables are involved
- which dependencies were resolved
- which flow the UI suggests
- where the risk points actually are
3. It makes reports reusable
If the context is structured, it can be cached, compared, regenerated, and reused.
That matters a lot in real engineering work, where you often revisit the same area multiple times.
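For example, you can key a cache on a hash of the extracted structure itself, so a report is regenerated only when extraction actually finds something new. A sketch, assuming the MethodContext from earlier and a made-up cache directory:

```python
import hashlib
import json
from dataclasses import asdict
from pathlib import Path

CACHE_DIR = Path(".analysis-cache")  # hypothetical location

def cache_key(ctx: MethodContext) -> str:
    # Hash the structure, not a timestamp: same evidence, same report.
    blob = json.dumps(asdict(ctx), sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()

def load_cached_report(ctx: MethodContext) -> str | None:
    path = CACHE_DIR / f"{cache_key(ctx)}.md"
    return path.read_text() if path.exists() else None
```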
4. It makes AI part of a system, not a gimmick
This is the most important one.
AI becomes one stage in a pipeline.
Not the whole pipeline.
That means the tool still has value even before the model speaks.
And that is a good test of whether your design is serious.
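Concretely: if the pipeline is shaped so that extraction always runs and the model call is just the last, optional stage, the tool produces a usable structural report even when no model is attached. A sketch, reusing the earlier hypothetical pieces and stubbing out the extraction itself:

```python
from dataclasses import asdict

def extract_context(path: str) -> MethodContext:
    # Stub: the real work (include resolution, SQL scanning, ...) goes here.
    return MethodContext(file_path=path, class_name="Unknown", method_name="unknown")

def analyze(path: str, source: str = "", llm=None) -> dict:
    ctx = extract_context(path)
    report = {"context": asdict(ctx)}  # already useful with no model at all
    if llm is not None:
        report["commentary"] = llm(build_prompt(ctx, source))  # AI as one stage
    return report
```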
A practical example
Let’s say you are trying to understand a legacy backend action.
A weak workflow is:
- open one PHP file
- paste it into an LLM
- ask for a summary
A stronger workflow is:
- identify the real method or action
- resolve related includes
- extract SQL fragments
- detect external class calls
- identify likely tables involved
- inspect schema when needed
- only then generate AI commentary
The first workflow gives you a guess.
The second gives you a technical briefing.
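To show how unglamorous one of those steps is, here is a deliberately naive sketch of the "extract SQL fragments" stage: scan a PHP source string for SQL-looking literals and guess which tables they touch. A real version has to handle string concatenation, interpolation, and query builders.

```python
import re

# Naive on purpose: real legacy code concatenates and interpolates its SQL.
SQL_LITERAL = re.compile(r'"((?:SELECT|INSERT|UPDATE|DELETE)[^"]*)"', re.IGNORECASE)
TABLE_REF = re.compile(r'\b(?:FROM|JOIN|INTO|UPDATE)\s+([A-Za-z_][A-Za-z0-9_]*)', re.IGNORECASE)

def extract_sql(php_source: str) -> tuple[list[str], set[str]]:
    fragments = SQL_LITERAL.findall(php_source)
    tables = {t.lower() for frag in fragments for t in TABLE_REF.findall(frag)}
    return fragments, tables

snippet = '$q = "SELECT * FROM orders o JOIN order_items i ON i.order_id = o.id";'
print(extract_sql(snippet))
# -> one fragment, and the tables {'orders', 'order_items'} (set order may vary)
```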
The hidden lesson
This idea goes beyond legacy analysis.
I think it applies to almost every serious AI engineering workflow.
If you want reliable AI assistance, you usually need some combination of:
- preprocessing
- structure extraction
- constrained prompting
- typed outputs (see the sketch after this list)
- caching
- human-readable artifacts
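Typed outputs in particular are easy to underrate. The idea, sketched with a plain dataclass and assuming you ask the model to answer in a fixed JSON shape: a reply that does not parse is a retry, not a report.

```python
import json
from dataclasses import dataclass

@dataclass
class Finding:
    summary: str
    tables: list[str]
    risk_points: list[str]

def parse_finding(raw: str) -> Finding:
    """Parse the model's JSON reply into a fixed shape; bad replies raise."""
    data = json.loads(raw)
    return Finding(
        summary=str(data["summary"]),
        tables=[str(t) for t in data["tables"]],
        risk_points=[str(r) for r in data["risk_points"]],
    )
```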
In other words:
the quality of the AI output often depends more on the surrounding system than on the model itself.
That is not as flashy as saying "I built an AI tool."
But it is much closer to the truth.
What I believe now
I no longer think the most valuable AI tools are the ones that "do everything."
I think the valuable ones are those that:
- collect the right context
- shape the right questions
- constrain the right outputs
- fit into real workflows engineers already need
Models matter.
But scaffolding matters more than people want to admit.
Final thought
One of the most useful engineering habits I’m developing is this:
never ask AI to infer what your system could have extracted first.
That rule made my tool better.
It also made my trust in the output more rational.
And in applied AI, rational trust is worth a lot more than impressive demos.
If you’re building AI-assisted engineering tools, I’d love to know:
how much of your result comes from the model, and how much comes from the structure around it?