I Accidentally Found a Better Way to Get Effective Results from AI Agents

#programming #softwareengineering #productivity #ai

The Problem I Kept Running Into

Recently, I was experimenting with different ways of using AI agents on larger projects, and I found a workflow that gives noticeably better results than normal prompting.

Usually when people use tools like Claude or ChatGPT for coding, they paste a few snippets or files and ask questions directly. That works fine for smaller problems, but once the project becomes bigger, the outputs start getting inconsistent. Sometimes the AI misses obvious architectural problems, sometimes it suggests things that don’t fit the system at all, and sometimes it completely loses track of how different parts of the codebase interact.

After testing this for a while, I realized the issue is usually not the model itself.

The issue is context.

Most current AI coding workflows rely on fragmented context. The model only sees disconnected chunks of the system, so it never develops a real understanding of execution flow, runtime behavior, dependency direction, or how services interact.

The Workflow I Started Using

So I tried something different.

I wrote a small script that scans my repository and bundles important files into one large file. Not literally every file, because that creates too much noise, but the parts that actually explain how the system works. Things like schemas, orchestration logic, parsers, graph systems, runtime handling, caching, workers, dependency management and other core logic.
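Here is a minimal sketch of what such a bundling script could look like. It is not the exact script described above: the include patterns, the size limit, and the `===== FILE: ... =====` header format are all assumptions chosen for illustration.

```python
from pathlib import Path

# Hypothetical include patterns; point these at wherever the core logic lives.
INCLUDE_GLOBS = ["src/**/*.py", "schemas/**/*.sql"]
# Assumed threshold: very large files are usually data or generated output.
MAX_FILE_BYTES = 200_000


def bundle_repo(root: Path, include_globs: list[str]) -> str:
    """Concatenate matching files into one string, with a path header per file."""
    root = root.resolve()
    seen: set[Path] = set()
    parts: list[str] = []
    for pattern in include_globs:
        for path in sorted(root.glob(pattern)):
            # Skip directories and files already matched by an earlier pattern.
            if path in seen or not path.is_file():
                continue
            if path.stat().st_size > MAX_FILE_BYTES:
                continue
            seen.add(path)
            rel = path.relative_to(root).as_posix()
            text = path.read_text(encoding="utf-8", errors="replace")
            parts.append(f"===== FILE: {rel} =====\n{text.rstrip()}\n")
    return "\n".join(parts)
```

Calling `bundle_repo(Path("."), INCLUDE_GLOBS)` and writing the result to a text file gives one document to paste into the model. The per-file headers are there so the model can refer back to specific paths in its answer.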

Then, instead of asking Claude to analyze the codebase directly, I first use ChatGPT to generate a much more detailed investigation prompt.

Not generic prompts like:

“find bugs”

But prompts focused on things like:

architectural analysis
performance bottlenecks
scaling risks
hidden coupling
unnecessary abstractions
memory pressure
concurrency issues
runtime inefficiencies

Then I paste both the bundled code file and the generated investigation prompt into Claude.
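The two-step flow above can be sketched as plain string assembly. This is only an illustration: the function names and the template wording are my own assumptions, and the actual calls to ChatGPT and Claude are left out because they depend on how you access those tools.

```python
# Focus areas taken from the list above.
FOCUS_AREAS = [
    "architectural analysis",
    "performance bottlenecks",
    "scaling risks",
    "hidden coupling",
    "unnecessary abstractions",
    "memory pressure",
    "concurrency issues",
    "runtime inefficiencies",
]


def build_meta_prompt(project_summary: str, focus_areas: list[str]) -> str:
    """Text for the first model, asking it to write the investigation prompt."""
    bullets = "\n".join(f"- {area}" for area in focus_areas)
    return (
        "Write a detailed investigation prompt for a code review of the "
        "project described below. The reviewer will receive the core source "
        "files bundled into a single file.\n\n"
        f"Project summary:\n{project_summary}\n\n"
        f"The prompt must direct the reviewer to examine:\n{bullets}\n"
    )


def build_review_input(investigation_prompt: str, bundle: str) -> str:
    """Final text for the second model: instructions first, then the code."""
    return f"{investigation_prompt.strip()}\n\n--- BUNDLED CODE ---\n{bundle}"
```

The output of `build_meta_prompt` goes to the first model, and whatever prompt it writes is passed to `build_review_input` together with the bundled file.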

The difference in output quality is honestly huge.

Why I Think This Works

The model starts reasoning about the system much more coherently because it can actually see relationships between different layers of the application.

It starts identifying patterns that are difficult to notice when looking at isolated files — duplicated logic, circular dependencies, architectural inconsistencies, bad retry handling, unnecessary complexity and hidden performance issues.

What became interesting to me is that this doesn’t really feel like “better prompting”.

It feels more like giving the model enough architectural visibility to reason properly.

I think a lot of current AI tools behave like they’re trying to understand a large system while only seeing tiny windows into it. When the model can see enough of the architecture in a single context, the quality of reasoning changes significantly.

Where This Approach Breaks

I also noticed there’s a limit to this approach.

If you dump too much irrelevant information into context — logs, generated files, lockfiles, vendor code, build artifacts and other noisy files — the quality drops again. The model starts focusing on the wrong things or missing important details completely.
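One way to keep that noise out is a small filter applied to each path before it goes into the context. The sketch below is an assumption about what such a filter might contain; the directory names, lockfile names, and suffixes are common examples and should be adjusted for your stack.

```python
from pathlib import PurePosixPath

# Example noise categories (assumed; extend for your own project).
NOISY_DIRS = {"node_modules", "vendor", "dist", "build", ".git", "__pycache__"}
NOISY_NAMES = {"package-lock.json", "yarn.lock", "poetry.lock", "Cargo.lock"}
NOISY_SUFFIXES = (".log", ".min.js", ".map", ".pyc")


def is_noise(rel_path: str) -> bool:
    """Return True if a repo-relative POSIX path looks like noise."""
    path = PurePosixPath(rel_path)
    # Only directory components are checked, so "src/build.py" is kept.
    if any(part in NOISY_DIRS for part in path.parts[:-1]):
        return True
    if path.name in NOISY_NAMES:
        return True
    return path.name.endswith(NOISY_SUFFIXES)
```

For example, `is_noise("vendor/lib/client.go")` and `is_noise("logs/app.log")` return `True`, while `is_noise("src/app.py")` returns `False`.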

So now I’ve been experimenting more with what I’d call context engineering instead of just prompt engineering.

Another thing that improved results a lot was feeding structural information before the raw code itself. Things like dependency graphs, import relationships, architecture overviews and execution flow maps help the model build a mental representation of the system before reading implementation details.

That alone improved the consistency of outputs quite a bit.
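As an illustration of the structural information described above, here is a rough sketch that builds a plain-text import map using Python’s standard `ast` module. It assumes a Python codebase and a hypothetical `sources` dictionary mapping file paths to file contents; other languages would need their own parser.

```python
import ast


def imports_of(source: str) -> list[str]:
    """Return the sorted module names imported by one Python source string."""
    modules: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports keep their leading dots, e.g. ".cache".
            modules.add("." * node.level + (node.module or ""))
    return sorted(modules)


def import_map(sources: dict[str, str]) -> str:
    """Render 'file -> imported modules' lines to place ahead of the raw code."""
    lines = []
    for path in sorted(sources):
        deps = ", ".join(imports_of(sources[path])) or "(none)"
        lines.append(f"{path} -> {deps}")
    return "\n".join(lines)
```

The resulting text can be pasted at the top of the context, before the bundled code, so the model reads the import relationships first.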

Context Engineering > Prompt Engineering

Honestly, I think this is where AI-assisted development tools are heading.

Not just bigger models, but better context systems.

Right now most coding agents still behave like they’re reading code through a keyhole. They only see tiny sections at a time, which makes deeper reasoning difficult on larger projects.

Once the model can take in more of the system in a single context, the analysis becomes much more useful for:

debugging
architecture reviews
performance analysis
scaling research
understanding unfamiliar codebases

This workflow still has limitations and it definitely doesn’t magically solve hallucinations or reasoning errors.

But compared to normal snippet-based prompting, the results have been noticeably better.

Final Thoughts

I’m curious if other people are experimenting with similar workflows too, because it feels like we’re still very early in figuring out how AI should actually interact with large codebases.

A lot of people are focused on better models, but I think context handling is becoming just as important.

And honestly, whoever solves context engineering properly is probably going to build extremely powerful developer tools.