Over the past few months, I’ve been investing a lot of time building agentic development workflows for real production environments.
Not just prompts.
Actual operational environments around agents.
Things like skills, execution tooling, validation layers, testing flows, memory handling, Git integrations, and constrained execution paths.
One thing became very clear very quickly.
Using agents in legacy or mission-critical systems without a proper harness can become dangerous surprisingly fast.
Especially in financial systems.
Even with specification-driven development (SDD), detailed tasks, and explicit instructions, I noticed a recurring problem.
The agent would correctly implement the requested functionality, but at the same time introduce large unintended changes across the codebase.
Not because the model was “bad”.
But because the environment still gave it too much freedom.
A small business change could suddenly trigger a massive refactor in tightly coupled parts of the application.
The functionality worked.
But reviewing the pull request became painful.
Risk analysis became harder.
The blast radius became unpredictable.
And in highly sensitive systems, this matters a lot.
The Shift That Changed Everything
To address this, I started combining a few ideas:
- TDD
- Harness Engineering
- The Seam Model from Michael Feathers
- Constrained execution environments for agents
This changed the workflow completely.
Instead of letting the agent freely reshape large parts of the codebase, I started designing the environment to naturally constrain behavior.
The agent now operates through a harness I built around it.
This harness provides structured skills and controlled capabilities such as:
- Reading specific files
- Analyzing code diffs
- Running tests incrementally
- Validating architectural constraints
- Checking impacted dependencies
- Generating isolated implementations
- Blocking risky operations
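As a rough illustration of what "controlled capabilities" can look like, here is a minimal sketch of a harness where the agent can only invoke registered skills, and file-touching skills are checked against an approved path scope. All names here (`Harness`, `skill`, the scope paths) are hypothetical, not a real framework:

```python
from pathlib import PurePosixPath

class Harness:
    """Hypothetical harness: the agent may only invoke registered skills,
    and file-touching skills are checked against an approved path scope."""

    def __init__(self, approved_scope):
        self.approved_scope = [PurePosixPath(p) for p in approved_scope]
        self.skills = {}

    def skill(self, name):
        def register(fn):
            self.skills[name] = fn
            return fn
        return register

    def in_scope(self, path):
        p = PurePosixPath(path)
        return any(p == root or root in p.parents for root in self.approved_scope)

    def invoke(self, name, *args, **kwargs):
        # Anything not explicitly registered is blocked by default.
        if name not in self.skills:
            raise PermissionError(f"skill '{name}' is not registered")
        return self.skills[name](*args, **kwargs)

harness = Harness(approved_scope=["services/settlement"])

@harness.skill("read_file")
def read_file(path):
    if not harness.in_scope(path):
        raise PermissionError(f"{path} is outside the approved scope")
    return f"<contents of {path}>"  # placeholder instead of real I/O
```

With this shape, `harness.invoke("read_file", "services/settlement/rules.py")` succeeds, while a read outside the approved scope, or a call to an unregistered skill, fails loudly instead of silently expanding the blast radius.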
One of the biggest improvements came from applying the Seam Model mindset.
> “A seam is a place where you can alter behavior in your program without editing in that place.”
> — Michael Feathers, Working Effectively with Legacy Code
Instead of modifying deeply coupled code directly, the agent identifies stable seams where behavior can be isolated safely.
Then new functionality gets introduced incrementally behind those seams.
This dramatically reduces unintended side effects.
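To make the seam idea concrete, here is a toy sketch (the function names and the 1% fee are hypothetical, invented for illustration): the legacy calculation exposes an `adjustments` hook, and the new business rule is introduced behind that seam instead of by editing the coupled function body.

```python
from typing import Callable, Iterable

# Hypothetical legacy function. The seam is the optional `adjustments`
# hook: callers can alter behavior without editing this function's body.
def settle(amounts: Iterable[float],
           adjustments: Iterable[Callable[[float], float]] = ()) -> float:
    total = sum(amounts)
    for adjust in adjustments:  # seam: behavior injected from outside
        total = adjust(total)
    return round(total, 2)

# The new business rule lives *behind* the seam, not inside settle():
def late_fee(total: float) -> float:
    return total * 1.01  # hypothetical 1% fee

assert settle([100.0, 50.0]) == 150.0              # legacy behavior intact
assert settle([100.0, 50.0], [late_fee]) == 151.5  # new rule isolated at the seam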
Critique and Validation Skills
Another important part of the harness is the critique and validation layer.
The agent is not only responsible for generating code.
It also needs to review its own changes against explicit acceptance criteria and architectural constraints.
I created specialized skills focused on critique workflows, where the agent analyzes the generated diff and verifies things like:
- Did the implementation fully satisfy the acceptance criteria?
- Did the agent modify unrelated modules?
- Did it introduce unnecessary refactors?
- Did it violate architectural boundaries?
- Did it expand the blast radius beyond the intended scope?
This changes the workflow significantly.
Instead of treating code generation as the final step, generation becomes only one phase inside a larger controlled execution pipeline.
In practice, this dramatically improves reviewability and reduces the risk of unintended modifications in legacy or mission-critical systems.
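One of these critique checks, scope verification, is simple to sketch. Assuming the harness can list the files touched by a diff, a minimal check (all names hypothetical) might compare them against the globs approved for the task:

```python
import fnmatch

def critique_diff(changed_files, approved_globs):
    """Hypothetical critique skill: flag any file in the diff that falls
    outside the approved scope for this task."""
    out_of_scope = [f for f in changed_files
                    if not any(fnmatch.fnmatch(f, g) for g in approved_globs)]
    return {"ok": not out_of_scope, "out_of_scope": out_of_scope}

report = critique_diff(
    changed_files=["services/settlement/rules.py",
                   "services/settlement/tests/test_rules.py",
                   "core/shared/abstractions.py"],  # unrelated refactor slipped in
    approved_globs=["services/settlement/*"],
)
# The out-of-scope file is surfaced for review instead of sliding
# silently into the pull request.
```

The other critique questions (acceptance criteria, unnecessary refactors, architectural boundaries) follow the same pattern: turn an implicit review concern into an explicit, automated gate.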
Practical Example
Imagine a legacy financial reconciliation service.
A new business rule needs to be introduced into the settlement calculation flow.
Without constraints, the agent might attempt to “improve” the architecture while implementing the feature.
Suddenly:
- Shared abstractions get rewritten
- Core flows get reorganized
- Multiple services are refactored together
- Dozens of unrelated files change
Technically impressive.
Operationally dangerous.
With the harnessed approach, the flow becomes very different.
The agent:
- Identifies stable seams in the codebase
- Creates isolated extension points
- Implements behavior incrementally
- Runs targeted tests after every step
- Validates architectural boundaries
- Restricts modifications outside approved scopes
- Critiques its own generated diff against acceptance criteria
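The steps above can be sketched as a gated loop: each increment produces one small diff, and that diff must pass every check before the next increment is allowed. The check names and the diff shape here are stand-ins I invented for illustration, not a real pipeline API:

```python
def run_step(step, checks):
    """Hypothetical pipeline gate: a step's diff must pass every check
    (tests, scope, critique) before the next increment may proceed."""
    diff = step()  # the agent produces one small, isolated diff
    failures = [name for name, check in checks if not check(diff)]
    if failures:
        raise RuntimeError("step rejected by: " + ", ".join(failures))
    return diff

# Stand-in checks; real ones would run targeted tests, dependency
# analysis, and a critique pass against the acceptance criteria.
checks = [
    ("targeted_tests", lambda d: d["tests_pass"]),
    ("scope", lambda d: not d["files_outside_scope"]),
    ("critique", lambda d: d["meets_acceptance_criteria"]),
]

ok_step = lambda: {"tests_pass": True, "files_outside_scope": [],
                   "meets_acceptance_criteria": True}
risky_step = lambda: {"tests_pass": True,
                      "files_outside_scope": ["core/shared/abstractions.py"],
                      "meets_acceptance_criteria": True}
```

Here `ok_step` passes the gate, while `risky_step`, whose tests pass but whose diff leaks outside the approved scope, is rejected before it ever reaches a pull request.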
The final result is much smaller, easier to review, safer to deploy, and significantly more predictable.
The Most Interesting Part
What surprised me most is that the value was not only personal productivity.
The biggest impact came after I shared these agents, skills, and harness environments with the engineering teams I lead.
Now the entire team benefits from the same operational guardrails.
Developers can leverage the toolkit to:
- Reduce risky refactors
- Improve reviewability
- Increase delivery confidence
- Work more safely in legacy systems
- Move faster without increasing instability
This starts creating organizational leverage, not just individual acceleration.
And honestly, this is where I believe a huge part of software engineering is heading.
The conversation is moving far beyond prompt engineering.
The real challenge is designing reliable operational environments where agents can safely participate in software delivery pipelines.
Especially in systems where reliability matters more than raw speed.
Final Thoughts
I don’t think agents replace engineering discipline.
Actually, I think they amplify the importance of it.
The better the engineering foundations, the more powerful these systems become.
TDD becomes more important.
Architectural boundaries become more important.
Observability becomes more important.
Validation becomes more important.
Harness design becomes more important.
The model is only one part of the system.
The environment around it is what determines whether the outcome is production ready or operational chaos.