DEV Community: Muthu C

Spec2PR: The Design-First Pipeline

Muthu C — Tue, 19 May 2026 17:58:53 +0000

On a large delivery program, one of the most expensive things you can do is start a LARB (Architecture Review Board) review with an incomplete document. I know because I spent months watching it happen.

The architecture team met twice a week, three hours combined, to review LLDs (Low Level Design documents) before handing them to engineering squads for development. Almost none passed the first time. The same security architect asked the same questions in every session: error handling, data classification, alerting, unhappy paths. And the architects presenting the next LLD still had not incorporated the comments from the week before.

When the LLD did eventually make it to the engineers, the handshake had its own problem. Junior engineers did not ask the clarifying questions that a senior engineer would. They took the document at face value and started building, gaps and all.

I don't fully blame the presenters. Business teams often don't share the unhappy path. But as an architect, my job is to build a foolproof design, and that means those gaps have to be caught somewhere.

Adding AI to the Review Process

Around the same time, I was already building a VS Code extension to explore how AI could improve developer productivity. I had added a command for code reviews and thought, why not add LLD review too?

The idea was simple. The tool reads the LLD document and checks it against a checklist: error handling scenarios, security requirements, data classification, alerting and notifications. These were exactly the questions that kept coming up in every LARB session. It gives the document a percentage score for development readiness and lists what is missing.
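A minimal sketch of the scoring idea, under loud assumptions: the real extension uses an AI model to read the document, whereas this stand-in uses naive keyword matching, and the `CHECKLIST` keywords, `ReviewResult`, and `review_lld` are hypothetical names invented for illustration. It only shows how a checklist can turn into a percentage score and a list of gaps.

```python
from dataclasses import dataclass

# Hypothetical checklist: the four areas named above, each with a few
# keywords whose presence is taken as a sign that the area is covered.
CHECKLIST = {
    "error handling": ["error handling", "retry", "timeout", "unhappy path"],
    "security requirements": ["authentication", "authorization", "encryption"],
    "data classification": ["data classification", "pii", "confidential"],
    "alerting and notifications": ["alert", "notification", "on-call"],
}


@dataclass
class ReviewResult:
    score_percent: int
    missing: list[str]


def review_lld(text: str) -> ReviewResult:
    """Score an LLD by the share of checklist areas it mentions."""
    lowered = text.lower()
    missing = [
        area
        for area, keywords in CHECKLIST.items()
        if not any(keyword in lowered for keyword in keywords)
    ]
    covered = len(CHECKLIST) - len(missing)
    return ReviewResult(round(100 * covered / len(CHECKLIST)), missing)


sample = "The service retries on timeout and raises an alert to the on-call team."
result = review_lld(sample)
print(result.score_percent, result.missing)
```

Keyword matching will miss a section that covers a topic in different words, which is one reason a model-based review is a better fit for the real tool.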

The architecture team's reaction was positive. They acknowledged the gaps the tool surfaced and appreciated how thorough it was. But more importantly, those conversations could now happen before the formal review, not during it.

Giving Architects a Head Start

As the project grew, leadership had limited visibility into what was happening across squads. With multiple teams running in parallel, the handshake points between business, architecture, and engineering were hard to track. Jira was already the source of truth for the project, so I had to build Jira integration anyway for the code development workflow.

Once that pipeline was in place, I added two more commands. Summarize Jira reads the story and gives the architect a concise picture of what the business is asking for. Generate LLD takes that further. It prompts the architect with specific questions, builds a full document across 10 sections, and covers all the NFRs (non-functional requirements) including error handling, security, data classification, and alerting. The architect still has to review it and fill in company-specific details and standards. But the structure is there, the questions are answered, and the document is ready to take into a review.

From Approved LLD to Closed Story

Once the LLD is approved, the next command is Generate Jira Story from LLD. It reads the approved design, asks clarifying questions, generates an OpenAPI spec for any RESTful API stories, and creates the development stories in Jira. Not just one story. It breaks the work down into a development story, a database migration story, and any other stories needed. Each one has enough detail for an engineer to actually start work.

From there the developer takes over. They run Implement Jira Story and pick the story they are working on. The tool asks a few clarifying questions: which programming language, which LLD, which OpenAPI spec to use. These choices are already available from the earlier steps in the pipeline, so it is not starting from scratch. Once confirmed, it generates the code, creates the GitHub branch named after the Jira story, commits the changes with full context, and updates the Jira story with the right comments automatically.
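The naming step can be sketched as follows. This is an illustration only: the `feature/` prefix, the story key format, the length limit, and the commit message layout are assumptions, not the extension's actual conventions, and `branch_name` and `commit_message` are hypothetical helpers.

```python
import re


def branch_name(story_key: str, summary: str, max_len: int = 60) -> str:
    """Build a branch name that starts with the Jira story key."""
    # Collapse anything that is not a lowercase letter or digit into one dash.
    slug = re.sub(r"[^a-z0-9]+", "-", summary.lower()).strip("-")
    name = f"feature/{story_key.upper()}-{slug}"
    # Truncate long summaries and avoid ending the name on a dash.
    return name[:max_len].rstrip("-")


def commit_message(story_key: str, summary: str, lld: str, spec: str) -> str:
    """Put the story key first and record which design inputs were used."""
    return (
        f"{story_key.upper()}: {summary}\n\n"
        f"LLD: {lld}\n"
        f"OpenAPI spec: {spec}\n"
    )


print(branch_name("pay-123", "Add refund endpoint (v2)!"))
print(commit_message("pay-123", "Add refund endpoint", "docs/lld.md", "api.yaml"))
```

Keeping the story key in both the branch name and the commit subject is what lets someone trace a change back to its story without opening the tool.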

When the developer is satisfied the code is working, one more command submits the PR and closes the Jira story. The whole thread from approved LLD to closed story is traceable.

Guardrails: Making AI Consistent

Early on I ran into a problem with code generation. One run would produce a Maven project. The next run would produce a Gradle project. Spring Boot versions were inconsistent across generated services. The AI was making its own choices every time, and there was no guarantee two developers on the same project were getting the same foundation.

The fix was a templates folder. It contains sample Spring Boot and .NET projects with the correct version dependencies, the database frameworks the project uses, and the security principles to follow. When the extension builds the prompt for code generation, it pulls from these templates. The AI is no longer making those choices. They are already made. Every generated service comes out on the same stack, with the same versions, following the same standards.
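One way to picture this, as a sketch rather than the extension's actual code: the stack decisions live in a fixed structure and are prepended to every code-generation prompt. `StackTemplate`, `build_prompt`, the placeholder version strings, the security rules, and the example task are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StackTemplate:
    """Choices the team makes once, so the model does not make them per run."""
    language: str
    framework: str
    framework_version: str
    build_tool: str
    database_framework: str
    security_rules: tuple[str, ...]


# Hypothetical values; a real template would carry the project's own pins.
SPRING_BOOT = StackTemplate(
    language="Java",
    framework="Spring Boot",
    framework_version="<pinned-version>",
    build_tool="Maven",
    database_framework="<project-database-framework>",
    security_rules=(
        "Never log secrets or tokens.",
        "Validate all request input.",
    ),
)


def build_prompt(template: StackTemplate, task: str) -> str:
    """Prepend the fixed stack decisions to the task, in a stable order."""
    rules = "\n".join(f"- {rule}" for rule in template.security_rules)
    return (
        f"Language: {template.language}\n"
        f"Framework: {template.framework} {template.framework_version}\n"
        f"Build tool: {template.build_tool} (do not substitute another)\n"
        f"Database framework: {template.database_framework}\n"
        f"Security rules:\n{rules}\n\n"
        f"Task: {task}\n"
    )


prompt = build_prompt(SPRING_BOOT, "Implement the refund endpoint from the LLD.")
print(prompt)
```

Because the template is frozen and the prompt is assembled in a fixed order, two developers with the same template and task send the same stack instructions.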

For a principal engineer who is accountable for what goes into production, that consistency matters. You are not reviewing each PR wondering what decisions the AI made this time. You already know.

What This Makes Possible

I did not build this to replace engineers or architects. I built it because the gap between a business requirement and a production-ready service is full of manual steps that slow everyone down and introduce inconsistency at every handoff.

What I want people to take away from this is not the tool itself. It is the idea that AI can do more than generate code. When you set the right guardrails, give it the right context, and connect it to the right systems, it can carry the thread across the entire delivery lifecycle. And the principal engineer who is accountable for the output can feel confident because the standards were baked in from the start, not reviewed in at the end.

This article is part of the Spec2PR series on Intelligent Software Delivery.
DevEx AI Assistant — AI-powered SDLC acceleration for engineering teams.

The Agent That Created 107 PRs (And Why That Was the Problem)

Muthu C — Mon, 18 May 2026 14:33:02 +0000

One of our leaders has a way of framing AI initiatives that I find genuinely useful. Three buckets: Vibe Coding, Professional AI Assistant, and Autonomous Agents. I won't unpack it further, but if you work in a large engineering org right now, you probably recognise all three.

The push has been toward that third one. And the metrics look good. Story points closed. Alerts resolved. PRs raised. Numbers that are easy to put on a slide for the CIO.

This is a story about what those numbers don't show.

107 Pull Requests on a Monday Morning

We had a backlog of code scanning alerts. The kind that quietly builds up over months because no sprint ever picks it up and no one owns it. Security debt, sitting in a dashboard.

Someone decided to assign an autonomous agent to clear it.

By Monday morning, the agent had reviewed every alert, produced a fix for every one, and opened 107 pull requests for engineers to review and approve.

On a leadership slide, this is a win. Backlog cleared overnight.

What the Metrics Don't Capture

Someone still had to read those 107 PRs. Not skim them. Actually understand them.

Each one required an engineer to figure out what the original alert was, read what the agent changed, decide whether the fix was correct for this specific codebase, and check whether it introduced new risk.

The agent didn't know the context. It was not asked to reason about it. It was asked to act.

The story points looked great. The review queue told a different story.
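To put a rough shape on that review queue, here is a back-of-the-envelope sketch. The count of 107 comes from the story above; the per-PR review times are assumptions chosen for illustration, not measurements, and `review_hours` is a hypothetical helper.

```python
def review_hours(pr_count: int, minutes_per_pr: float) -> float:
    """Total engineer time, in hours, needed to review every PR once."""
    return pr_count * minutes_per_pr / 60


PR_COUNT = 107  # from the story above

# Assumed per-PR review times in minutes; illustrative, not measured.
for minutes in (15, 30, 60):
    hours = review_hours(PR_COUNT, minutes)
    print(f"{minutes} min per PR -> {hours:.1f} hours of review")
```

Even the most optimistic of these assumed figures is time that does not appear anywhere in the story point total.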

The Fix That Wasn't Really a Fix

Here is the part I keep thinking about.

One of the alerts was a Java trust boundary violation in an old codebase. Code from the 1990s that, during build time, pulls in source from another project. The agent saw the violation, could not change the original source, and wrote a Python script to handle it at build time.

Technically, the alert was resolved.

But any engineer who has worked on legacy systems knows what that means. You now have a silent dependency. If anyone changes the original code, for any reason, the Python script breaks. There is no warning. There is no test. The fix works until it doesn't, and when it stops working, the person debugging it has no idea why.
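One possible way to make a dependency like this less silent is to record a fingerprint of the upstream source and fail the build when it changes. The sketch below is not what the agent did and not a recommendation for this specific codebase; `fingerprint`, `assert_unchanged`, and the file name are hypothetical, and a temporary file stands in for the real upstream source.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(path: Path) -> str:
    """SHA-256 of the file contents, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def assert_unchanged(path: Path, expected: str) -> None:
    """Fail loudly if the upstream source no longer matches the recorded hash."""
    if fingerprint(path) != expected:
        raise RuntimeError(
            f"{path.name} changed since the build-time patch was written; "
            "re-check the patch before building."
        )


# Demo with a temporary file standing in for the upstream source.
with tempfile.TemporaryDirectory() as tmp:
    upstream = Path(tmp) / "Upstream.java"
    upstream.write_text("class Upstream {}\n")
    recorded = fingerprint(upstream)

    assert_unchanged(upstream, recorded)  # passes: nothing has changed yet

    upstream.write_text("class Upstream { int x; }\n")
    try:
        assert_unchanged(upstream, recorded)
        drift_detected = False
    except RuntimeError:
        drift_detected = True

print("drift detected:", drift_detected)
```

A check like this does not remove the fragility, but it turns a silent break into an explicit failure with a message that points at the cause.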

Was that the only way to address it? Maybe. This is old code and the constraints were real.

But that is exactly the conversation that should have happened before anyone wrote a single line. Not "here is the fix." Instead, "here is what I found, here is why it's complicated, here is one option and here is what it risks."

The agent skipped that conversation entirely.

Two Different Dashboards

A CIO sees 107 security issues resolved by an autonomous agent overnight. That is a real number and genuinely impressive work.

A Principal Engineer sees 107 changes that need to be validated before trusting any of them, plus at least one fix that trades a known vulnerability for an invisible fragility.

Neither view is wrong. They are measuring different things.

The CIO is measuring output. The Principal Engineer is measuring trust. The cost of validating an agent's work at scale does not show up in story points. And when leadership optimises for the number without understanding what the number measures, the gap between those two dashboards gets wider.

What I Actually Want to Know

I don't have a tidy answer here. But I am curious whether others have seen this same dynamic, where the metrics say one thing and the engineers feel something different.

A few honest questions:

Have your AI agent metrics told a different story than what your engineers experienced on the ground?
How do you measure the review burden an agent creates, not just the output it produces?
Is your organisation pushing toward autonomous agents right now, and if so, what does human oversight look like in practice?
And when an agent finds something genuinely complex, something with history and context and tradeoffs, what should it do? Act, and let a human review it? Or stop and ask first?

I want to hear the real stories in the comments. The ones that didn't make it onto the slide.

Spec2PR: Reimagining the SDLC for Intelligent Software Delivery

Muthu C — Mon, 18 May 2026 03:24:42 +0000

The future of software delivery is not faster coding. It is intelligent orchestration.

I didn't set out to rethink the software delivery lifecycle.

I set out to help engineers ship faster. What I discovered along the way changed how I think about AI, engineering systems, and what the real bottlenecks in software delivery actually are.

This is the story of that journey — and the thinking behind Spec2PR.

The original goal: AI as a coding accelerator for teams

Like most engineering leaders who started exploring AI tools in the last few years, my initial hypothesis was straightforward: give engineers AI-assisted coding capabilities and they will move faster.

The goals were pragmatic:

Accelerate delivery velocity
Help junior engineers ramp up faster
Reduce onboarding friction
Improve implementation consistency
Give teams access to AI accelerators without requiring each engineer to become a prompt expert

This seemed reasonable. AI models were getting better rapidly. Code generation quality was improving. The value proposition appeared obvious.

So we built Spec2PR as an AI-assisted platform — an internal accelerator that would help engineering teams generate code faster and get more done with less effort.

For a while, it worked. And then the problems started.

The first realization: AI output drifts without engineering structure

The first cracks appeared gradually, then suddenly.

Without clear guidance, AI-generated code started drifting away from our company standards. Not in obvious, breaking ways — but in subtle, cumulative ways that created real engineering debt.

We observed:

Inconsistent implementations of similar patterns across teams
Architectural drift away from established platform standards
Governance gaps — AI-generated code that worked locally but violated organizational constraints
A dangerous dependency on individual prompt quality
Increased rework as teams discovered the drift during reviews

The frustrating part was that the models were not getting worse. The code was often syntactically correct and locally functional. The problem was not the AI's intelligence.

The problem was the missing engineering structure around it.

"Prompt engineering was becoming accidental architecture."

Each engineer was making micro-decisions about how to prompt the AI. Those micro-decisions accumulated into macro-inconsistencies. The platform was fast, but it was generating inconsistency at scale.

This was the first real lesson: AI tools amplify whatever context they are given. Without engineering structure, they amplify inconsistency.

Structuring engineering conversations with the RTCFR framework

The response to this problem was not to constrain the AI. It was to structure the conversation.

We introduced the RTCFR framework — Role, Task, Context, Format, Report — a structured approach to encoding the right engineering information into every implementation workflow before a single line of code is generated.

The difference is easier to see than describe. An unstructured prompt looks like:

"Build a REST API endpoint for user authentication."

An RTCFR-structured workflow looks like:

"You are a senior backend engineer on a Java Spring Boot platform (Role). Implement a JWT authentication endpoint (Task). The service must meet our internal security standards, integrate with our existing OAuth provider, handle 10k RPS, and emit structured logs to our observability stack (Context). Output production-ready code with unit tests following our naming conventions (Format). Flag any assumptions about the security model (Report)."

Same request. Completely different output quality.
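The five parts can also be captured as a small structure, so that a prompt cannot be sent with a part left blank. This is a sketch under assumptions: `RtcfrPrompt` and its `render` method are hypothetical names, and the field values below are taken from the example prompt above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RtcfrPrompt:
    """One field per RTCFR part: Role, Task, Context, Format, Report."""
    role: str
    task: str
    context: str
    format: str
    report: str

    def render(self) -> str:
        """Return the prompt text, refusing to render if any part is empty."""
        missing = [f.name for f in fields(self) if not getattr(self, f.name).strip()]
        if missing:
            raise ValueError(f"RTCFR prompt is missing: {', '.join(missing)}")
        return "\n".join(
            f"{f.name.capitalize()}: {getattr(self, f.name)}" for f in fields(self)
        )


auth_prompt = RtcfrPrompt(
    role="You are a senior backend engineer on a Java Spring Boot platform.",
    task="Implement a JWT authentication endpoint.",
    context=(
        "The service must meet our internal security standards, integrate with "
        "our existing OAuth provider, handle 10k RPS, and emit structured logs "
        "to our observability stack."
    ),
    format="Output production-ready code with unit tests following our naming conventions.",
    report="Flag any assumptions about the security model.",
)
rendered = auth_prompt.render()
print(rendered)
```

Making every part a required field moves the framework from a convention engineers have to remember into something the workflow enforces.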

The goals were clear:

Structure engineering conversations before code generation begins
Embed engineering rigor into the workflow itself, not as a manual checklist after the fact
Abstract complexity away from junior engineers so they benefit from senior engineering thinking by default

The results were meaningful. Consistency improved. Drift reduced. Junior engineers were producing outputs that reflected organizational standards they had not yet internalized on their own.

But something more important had shifted conceptually.

The platform was no longer just generating code. It was operationalizing engineering thinking.

The AI had become a delivery layer for structured engineering intent — encoding how senior engineers think about problems and making that thinking available at every implementation step.

"Locally correct code can still create globally inconsistent systems."

Fixing prompt inconsistency was valuable. But it had revealed something deeper.

The next bottleneck: upstream context quality determines downstream implementation quality

With implementation workflows now structured, I expected the quality problems to largely disappear.

They did not.

A new pattern emerged. Even when engineers followed the RTCFR framework correctly, the quality of the output was still constrained by the quality of the inputs coming from upstream.

Low-level design documents produced by architects frequently arrived without:

Non-functional requirements (NFRs)
Scalability considerations
Observability and monitoring requirements
Reliability and fault tolerance expectations
Operational runbook context
Security considerations
The broader "-ilities" that experienced engineers know to ask about

The AI faithfully implemented what the LLD described. And the LLD was incomplete.

"The further upstream I moved in the SDLC, the more I realized code generation was never the real problem."

This was a significant realization. We had been optimizing a downstream symptom. The root cause was upstream context quality.

If the engineering intent captured at the design stage was incomplete, no amount of implementation optimization would fully compensate. The system was propagating incomplete intent with high efficiency.

Context degradation across the SDLC — how intent gets lost at every handoff

Stepping back and looking at the full delivery lifecycle, a clear pattern emerged.

Engineering intent degrades at every handoff across the SDLC:

Product intent is captured with clarity in strategy sessions, then diluted into vague user stories
User stories are handed to architects, who produce LLDs that may capture the functional requirement but lose the operational and NFR context
LLDs reach implementation teams, stripped of the architectural reasoning and tradeoff decisions that informed them
Implementations reach operations, where the teams responsible for running the system encounter its operational realities for the first time

By the time code reaches production, much of the original engineering intent has been lost. What remains is a functional implementation that may technically meet the specification while failing to reflect the full engineering thinking that went into the design.

"Software delivery problems are coordination problems, not coding problems."

This is the systems-thinking insight at the core of Spec2PR's evolution: the bottleneck was never code generation speed. The bottleneck was intent fidelity — the ability to preserve engineering thinking accurately as it moves through the delivery system.

AI coding tools address a real problem. But they address it at the wrong layer.

Intelligent SDLC orchestration: the core thesis and what comes next

This is where Spec2PR evolved from an AI coding accelerator into something more fundamental.

The platform began evolving toward a different goal: preserving and propagating engineering intent across the entire SDLC, from requirements through architecture, implementation, operations, and feedback.

The thesis crystallized into what I now call Intelligent Software Delivery:

Intelligent Software Delivery treats software engineering as a continuous, context-aware orchestration problem — not a sequence of isolated tasks.

Where today's AI tools focus on local optimization, Intelligent SDLC Orchestration operates at the system level:

Today's AI Tools	Intelligent SDLC Orchestration
IDE-centric	SDLC-centric
Stateless prompts	Persistent engineering context
Local code optimization	System-wide delivery optimization
Reactive assistance	Proactive orchestration
Code generation	Intent preservation
Developer productivity	Delivery intelligence

Code generation is still part of this system. But it becomes one capability within a much larger engineering coordination layer — not the goal itself.

AI becomes transformative when it understands engineering systems, not just source files.

Closing: what I learned by going upstream

I started this journey trying to make engineers faster. What I found was a different problem entirely.

The teams that struggled most with AI-assisted development were not struggling because the models were insufficient. They were struggling because their delivery systems lacked the structure to give AI the context it needed to be useful at scale.

AI models are context amplifiers. The quality of what comes out is bounded by the quality of what goes in. And the quality of what goes in is an organizational problem, not a tooling problem.

"I started by trying to accelerate coding. I ended up realizing the real bottleneck was coordination."

The future of software delivery is not faster coding. It is intelligent orchestration.

That is the idea behind Spec2PR — and the thread I will continue pulling on in this series.
