DEV Community: Mark Ward

Applying a Systems Engineering Framework to Agentic Coding: Why Prompts Fail and Structure Wins

Mark Ward — Fri, 29 May 2026 06:50:45 +0000

Agentic AI coding tools are transforming how we build software. But they share a fundamental constraint: context windows are finite, and as chat sessions grow, AI performance degrades, a phenomenon Anthropic calls context rot. The model loses its grip on early instructions, leading to a frustrating "fix-it loop" where the agent fixes one thing but breaks another.

Most of us prompt an agent, let it write code, review it, and repeat. This works beautifully for prototypes. But when you need to build a stable, full-featured product with hundreds of mission-critical acceptance criteria (AC), "vibe-coding" breaks down.

The reality is that you get better behavior from agents the same way you get it from humans, by explicitly capturing what good and bad look like, and checking against it.

Coming from a systems engineering background in regulated industries, I knew we needed to stop treating agents like conversational chat buddies and start treating them like engineering assets. That's why I built DevCortex: a purpose-built structured intelligence layer that brings systems engineering discipline to agentic workflows.

What is DevCortex?

DevCortex is an agentic development platform built on one core idea: AI agents work best when they have structured, queryable access to a database of requirements they can interrogate on demand, not a wall of text in a prompt.
It sits between the human specification and AI execution using three components:
1. An Agentic-V Model Database: A structured hierarchy mapping your high-level vision (ConOps) to system specs (Specs), individual requirements (Reqs), linked defects (Issues), and an auto-generated Traceability Matrix.
2. An MCP Server: Delivers just-in-time, high-signal context to tools like Claude Code or OpenCode. Instead of dumping requirements upfront, the agent queries exactly what it needs, when it needs it.
3. Human Control Planes (Web UI & CLI): A multi-user Web UI with real-time WebSocket feeds to watch your agent work, plus a powerful dcx CLI for power users and CI pipelines.

Putting it to the Test: The KiroPyUnitConverter

In a recent test I compared using DevCortex and AWS Kiro to build a Python CLI unit converter. The project had 8 requirements and 31 acceptance criteria.

Step 1: Import the Spec
For this test, I used the dcx CLI to import the Kiro requirements.md file directly into DevCortex (alternatively, I could have loaded the Spec and Reqs via the DevCortex Web UI, or had the integrated AI Assistant create them for me):

dcx init
dcx import kiro ./requirements.md
(Result: 1 Spec, 8 reqs, 31 ACs populated and ready for the agent).

Step 2: Feed the Agent the Workflow
I then gave Claude Code a simple workflow via our Model Context Protocol (MCP) server:
1. Call dc_get_backlog to retrieve requirements.
2. For each requirement, fetch full AC details using dc_get_requirement.
3. Implement the code, write the tests, and verify.
4. The "Good vs. Bad" Rule: Only call dc_update_verification to mark an AC as PASSED if a test explicitly proves it.
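
The four steps above can be sketched as a loop. This is a minimal illustration, not DevCortex's API: `FakeDevCortex` is a hypothetical in-memory stand-in whose method names mirror the MCP tool names in the workflow, and the argument shapes and the `run_test` callback are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class FakeDevCortex:
    """Hypothetical in-memory stand-in for the MCP tools named in the workflow."""
    # requirement id -> list of acceptance-criterion ids
    requirements: Dict[str, List[str]]
    # acceptance-criterion id -> recorded status string
    status: Dict[str, str] = field(default_factory=dict)

    def dc_get_backlog(self) -> List[str]:
        return list(self.requirements)

    def dc_get_requirement(self, req_id: str) -> List[str]:
        return self.requirements[req_id]

    def dc_update_verification(self, ac_id: str, status: str, evidence: str) -> None:
        self.status[ac_id] = f"{status} ({evidence})"


def verify_backlog(dc: FakeDevCortex, run_test: Callable[[str], bool]) -> List[str]:
    """Mark an AC as PASSED only when a test for it actually passes.

    Returns the ACs that could not be proven, so nothing is silently skipped.
    """
    unproven = []
    for req_id in dc.dc_get_backlog():
        for ac_id in dc.dc_get_requirement(req_id):
            if run_test(ac_id):
                dc.dc_update_verification(ac_id, "PASSED", f"test for {ac_id}")
            else:
                unproven.append(ac_id)
    return unproven
```

The point of the gate is that an unproven AC is returned to the caller rather than recorded, so "PASSED" can only ever mean "a test ran and succeeded".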

Step 3: Watch it Work

Once executed, Claude Code didn't just blind-code. It systematically checked project health, pulled the requirement backlog, built an architectural map, and rigorously tracked its own verification.
Here is what the terminal trace looked like as it interacted with the DevCortex MCP server:

● Running dc_health and fetching the full backlog...
↳ Called devcortex MCP server
● 8 requirements found. Fetching full AC details in parallel...
↳ Called devcortex 8 times
● All 8 requirements read. Generated implementation plan:

kiropy/units.py -> REQ-002 (categories & factors)

kiropy/converter.py -> REQ-001, REQ-003, REQ-007 (logic & accuracy)

kiropy/printer.py -> REQ-004 (formatting & precision)

kiropy/cli.py -> REQ-003, REQ-005, REQ-006, REQ-008 (arguments & exit codes)

● Key interpretation decisions recorded:

"6 significant decimal places" (REQ-004 AC-2) interpreted as 6 digits after decimal point.

Absolute zero check (REQ-003 AC-5) to block invalid physical conversions (e.g., -300°C).

Round-trip accuracy (REQ-007 AC-2) via base-unit intermediary with zero intermediate rounding.

● Implementing project structure, test suites, and source files...
● All files written. Running test suites: 48/48 tests pass.
● Smoke tests pass. Recording verification results back to DevCortex...
↳ Called devcortex 32 times (Updating 29 distinct Acceptance Criteria to PASS)
↳ Called devcortex 8 times (Marking all 8 REQs as VERIFIED)
● Committing and closing out session.
✔ Project structure configured
✔ Core logic implemented & verified against trace matrix.
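
To make the three interpretation decisions in the trace concrete, here is a small sketch. It is not the generated kiropy code: the function names, the unit table, and the error type are hypothetical. Only the three rules come from the trace: six digits after the decimal point, an absolute-zero guard, and conversion through a base unit with no intermediate rounding.

```python
# Hypothetical table: factor that converts one unit into the base unit (metres).
LENGTH_TO_METRES = {"m": 1.0, "km": 1000.0, "mi": 1609.344, "ft": 0.3048}

ABSOLUTE_ZERO_C = -273.15


def convert_length(value: float, src: str, dst: str) -> float:
    """Convert via the base unit, keeping full float precision in between."""
    metres = value * LENGTH_TO_METRES[src]
    return metres / LENGTH_TO_METRES[dst]


def celsius_to_kelvin(value: float) -> float:
    """Reject temperatures below absolute zero instead of converting them."""
    if value < ABSOLUTE_ZERO_C:
        raise ValueError(f"{value} °C is below absolute zero")
    return value - ABSOLUTE_ZERO_C


def format_result(value: float) -> str:
    """Round only at print time: six digits after the decimal point."""
    return f"{value:.6f}"
```

Keeping rounding in `format_result` alone is what lets a round trip such as miles to kilometres and back land on the starting value to within floating-point error.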

The Result

The application was built flawlessly with all 31 acceptance criteria verified and marked PASSED.
More importantly, the traceability matrix completely bridges the trust gap. Every single AC links directly to a named test and explicit evidence. If you review this codebase in six months, you will know exactly why and how every requirement was fulfilled.
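
One way to picture a traceability matrix is as a table of rows, each linking an AC to the test that proves it and the evidence recorded. The sketch below uses hypothetical field names; it is not DevCortex's schema, only an illustration of how missing links become easy to find once the data is structured.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TraceRow:
    requirement: str
    criterion: str
    test_name: Optional[str]  # None means no test is linked yet
    evidence: Optional[str]   # None means no evidence is recorded yet


def untraced(rows: List[TraceRow]) -> List[str]:
    """Return 'REQ/AC' labels for criteria lacking a linked test or evidence."""
    return [
        f"{row.requirement}/{row.criterion}"
        for row in rows
        if not row.test_name or not row.evidence
    ]
```

An empty result from `untraced` is the machine-checkable version of "every AC links to a named test and explicit evidence".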

What this taught me about Agentic Development

Structured requirements reduce drift: When an agent is bound to a structured backlog contract, it is far less likely to hallucinate features or skip "trivial" requirements.
Evidence-based verification reduces errors: Requiring the agent to provide test proof caught instances where the AI's initial code passed a shallow test but missed the spirit of the AC. The agent caught its own gaps and fixed them before claiming completion.
Effective use of agent context increases determinism: Because LLMs are constrained by a finite attention budget, letting the coding agent fetch specific requirements and ACs as it needs them helps it stay focused on the job at hand.

Try it Yourself

DevCortex is now available at devcortexai.com with a free tier.
You can also install the CLI right now via npm:
npm install -g @devcortex/cli
Check out the getting started guide to connect Claude Code or OpenCode via MCP and run your first verified build, or read our case study about building a full-stack Career Journal App with DevCortex.
If you're working on agentic systems engineering or requirement-driven development, I'd love to compare notes in the comments below!

An Olympic Windfoil (iQFOiL) race and training session performance analysis toolset

Mark Ward — Wed, 04 Mar 2026 03:34:57 +0000

1. What I Built: The SailMetrics Ecosystem

For this project, I tackled the "invisible" challenge of iQFOiL windfoiling: the high-speed, 6-degree-of-freedom (6-DOF) physics that occur beneath the water's surface. To help athletes analyze and improve their performance, I built a three-tier telemetry ecosystem:

The SailMetrics Kotlin App: Running on a Google Pixel 3a (flashed with LineageOS), this edge-device app serves as the "black box." It polls the IMU (accelerometer/gyroscope), GPS, and barometer at high frequencies to capture the motion of the windfoil board in multiple dimensions, and it displays basic speed and heading feedback to the rider.

Windfoil Visualizer: A Python/Matplotlib dashboard that renders a 3D "Digital Twin" of the board’s orientation using a model of the board in 3D space, alongside graphs that fuse the IMU sensor data to recreate the board's pitch and heel. Users can replay and analyse each second of their session, and compare multiple sessions with each other.

SailPerfView (web viewer): An online session analysis tool that gives sailors and coaches views of session data recorded by either the SailMetrics Android App or a Garmin device (FIT file). It renders the session track on a map and displays insights into sailing performance with a VMG Polar Chart, Performance Statistics, and an Interactive Metrics chart with synchronised hover markers linked to the track map.
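
For readers unfamiliar with VMG (velocity made good), the quantity behind a VMG polar chart is the component of boat speed along the wind axis. The sketch below shows that calculation under stated assumptions: the function names are hypothetical, and the wind direction is taken as a given input, since the post does not say how SailPerfView obtains it.

```python
import math


def true_wind_angle(course_deg: float, wind_from_deg: float) -> float:
    """Smallest angle between the course and the direction the wind blows from (0..180)."""
    diff = (course_deg - wind_from_deg) % 360.0
    return diff if diff <= 180.0 else 360.0 - diff


def vmg(speed_knots: float, course_deg: float, wind_from_deg: float) -> float:
    """Velocity made good to windward: positive upwind, negative downwind."""
    twa = true_wind_angle(course_deg, wind_from_deg)
    return speed_knots * math.cos(math.radians(twa))
```

A polar chart then plots speed (or VMG) against the true wind angle, which makes the best upwind and downwind angles visible at a glance.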

The Role of Google Gemini:
Gemini acted as my technical adviser, not only writing code snippets but also helping bridge the gap between abstract fluid dynamics (based on research papers like Urbański’s "Theoretical investigation of pitch control") and tangible Python/Kotlin code. It was instrumental in implementing the Pitch Phase Portrait and other graphs visualising the 3D data.
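
In general, a phase portrait plots a state variable against its rate of change. The sketch below builds the (pitch, pitch rate) pairs for such a plot using central differences. It is an assumption-laden illustration: the post does not say whether the pitch rate came from the gyroscope directly or from differentiating pitch, and the function name is hypothetical.

```python
from typing import List, Tuple


def phase_portrait(times: List[float], pitch_deg: List[float]) -> List[Tuple[float, float]]:
    """Pair each interior pitch sample with its central-difference rate (deg/s)."""
    if len(times) != len(pitch_deg):
        raise ValueError("times and pitch_deg must have the same length")
    points = []
    for i in range(1, len(times) - 1):
        dt = times[i + 1] - times[i - 1]
        rate = (pitch_deg[i + 1] - pitch_deg[i - 1]) / dt
        points.append((pitch_deg[i], rate))
    return points
```

Plotting the first element of each pair on the x-axis and the second on the y-axis gives the portrait; a stable flight tends to trace a tight loop, while an oscillation that is growing spirals outward.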

2. What I Learned: From Theory to Telemetry

The biggest technical takeaway was Sensor Fusion and Noise Mitigation.

Filtering the "Invisible": I learned how to extract "Foil Height" from a barometric pressure signal. Gemini helped me design a three-stage filtering pipeline: noise smoothing, rolling baseline subtraction, and height derivation.
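
The three stages can be sketched as follows. This is not the author's pipeline: the moving-average filters, the window sizes, and the sea-level air density are assumptions, and the code uses only the standard library so it runs anywhere. The height step uses the hydrostatic approximation, under which pressure near sea level falls by roughly 0.12 hPa per metre of altitude gained.

```python
from statistics import mean
from typing import List

AIR_DENSITY = 1.225  # kg/m^3, standard sea-level air (assumed)
GRAVITY = 9.80665    # m/s^2


def rolling_mean(values: List[float], window: int) -> List[float]:
    """Centred rolling mean; the window shrinks at the edges."""
    half = window // 2
    return [
        mean(values[max(0, i - half): i + half + 1])
        for i in range(len(values))
    ]


def foil_height(pressure_hpa: List[float], smooth_window: int = 5,
                baseline_window: int = 301) -> List[float]:
    """Three stages: smooth, subtract a slow baseline, convert hPa to metres."""
    # Stage 1: short window to suppress sample-to-sample sensor noise.
    smoothed = rolling_mean(pressure_hpa, smooth_window)
    # Stage 2: long window approximates slow drift (weather, temperature).
    baseline = rolling_mean(smoothed, baseline_window)
    # Stage 3: rising lowers pressure, so the sign flips; 1 hPa = 100 Pa.
    return [
        -(s - b) * 100.0 / (AIR_DENSITY * GRAVITY)
        for s, b in zip(smoothed, baseline)
    ]
```

Because the baseline is itself an average that includes the foiling periods, this sketch slightly underestimates height; a real pipeline would need to tune the window lengths against the sample rate and typical flight duration.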

Performance Optimization: Animating 3D geometry in Matplotlib can be sluggish. Gemini suggested methods to improve UI responsiveness when replaying the data timeline animation.

Soft Skills: I learned the value of "Hardware Recycling." Using Gemini to troubleshoot LineageOS-specific sensor API calls on a Pixel 3a proved that high-end sports tech doesn't require high-end modern hardware—just smart software.

3. Google Gemini Feedback: The Good, The Bad, and The Ugly

The Good (The "Aha!" Moments):
Gemini’s ability to "read" and interpret academic papers (like the Urbański paper) and suggest how to translate formulas into Python pandas logic.

Google AI support in both Android Studio for developing the Kotlin App and VSCode for developing the Python backend code provided consistency and convenience.

The Google AI-enabled development toolset and workflow allowed very rapid development of a Proof of Concept: within a weekend I had a basic visualisation app and was able to start collecting data. While it took a little longer to build more sophisticated visualisation dashboards incorporating 3D data fusion, it is possible to experiment with different ideas without much effort.

The Friction (The "Candid" Feedback):
To be honest, this project went very smoothly, as it was approached in discrete stages without complete interdependencies, and code changes were managed with git.

4. Looking Forward: The Next Reach

More testing and data gathering are required to validate the current functionality. I am also interested in incorporating AI into the visualisation dashboards so that it can access the session data directly and provide insights into sailor performance and into windfoil board and foil tuning, such as rake and stabiliser settings.

The project was inspired by the amazing data analytics occurring in the America's Cup (AC75) and SailGP (GP) foiling classes, and how that has helped the crews. The goal of this project is to give windfoil sailors access to similar data insights at a much lower cost by leveraging older smartphones as data recorders and the power of Google AI.

This project proved that with Google Gemini, an amateur builder can take complex academic theory and turn it into a tool that helps athletes fly higher and faster.

References

Urbański, O. (2023). Theoretical investigation of pitch control and stability for hydrofoiling windsurfing. Adam Mickiewicz University, Poznań.
Politecnico di Milano. (2023). Study of iQfoil’s settings and performance investigation through 6 DOF dynamic model. Master’s Degree Thesis. Supervisor: Prof. Giuliana Mattiazzo.