DEV Community: dimitri

Agentic Engineering Journey — Brain Dump

dimitri — Fri, 03 Apr 2026 12:27:37 +0000

1. Where It Started: Memory and Context

I started with Claude Code around April 2025. The first real step was recognising that Claude's native memory was essentially useless. The workaround was using markdown files as persistent memory stores, editable both through Claude and tools like Cursor. That opened the door to storing not just session notes but also instructions, roles, and agent skills — anything that would otherwise be forgotten across context resets.

But the fundamental problem remained: at some point the context window fills, the model gets amnesia, and starts behaving destructively. Cursor handled this somewhat better at the time. Gemini had an edge due to its larger context window (already at 1M tokens), though at a cost. Neither was a real solution.

2. The Core Principle

Taking a step back from tooling led to the central insight the whole framework is built on:

The better the prompt, the better the output. The better the instruction — and the context around it — the higher the likelihood of a good result.

This is no different from how you'd brief a human. Context, clarity, traceability, constraints — all of it shapes the quality of what gets executed. The question became: how do you systematically generate and maintain that context?

3. The Agentic Engineering Framework

To produce good, consistent context, you need to capture:

What has been done before — every instruction, tool call, output, error, pivot, and decision
What the goals and architecture look like — what was decided and why
What is connected to what — if you change function X, what does that break elsewhere?

This last point introduced the concept of the blast radius — borrowed from physical and industrial engineering. It describes the potential impact zone of any given change.

Context Fabric

Captures the full history of work: what was done, what failed, what changed, what was decided. When an agent starts a new task, it can look back at relevant prior context rather than starting blind.

Component Fabric

Provides structural awareness — understanding how components relate to each other so that an action's blast radius can be assessed before it's taken.

The Prime Directive

Nothing gets done without a task. Every action must be linked to a task. This enforces traceability and prevents autonomous drift.

Enforcement is the hard part. Git hooks work sometimes. Claude Code doesn't reliably respect the constraint — partly due to the stochastic nature of LLMs, partly due to permissive execution environments. If broad tool permissions are granted, there's nothing structurally preventing the model from bypassing the rule.

4. TermLink

The challenge: how to coordinate multiple agents in a reliable, deterministic way.

The idea behind TermLink is that if terminal sessions are initialised in a known state, you can inject into them — essentially simulating a USB keyboard over the terminal link, sending ASCII sequences directly to the session.

In practice this works well. The weak point is that Claude Code sometimes falls back to calling claude -p through PTY rather than opening a terminal and running it properly. That loses the interactive feedback loop — the back-and-forth that makes the coordination meaningful.

TermLink is also now using a network socket interface. This opens up communication across machines — and with it, the possibility of real orchestration: routing tasks to different agents, mixing providers, and matching the right model to the right type of work reliably.

5. Proof Is in the Pudding

The proof is in the eating of the pudding. I'm using the framework to build real things I can use, and that's where you find out if it actually works.

Open-Claw ingestion: Took the open-claw codebase, ingested it through the context fabric, exposed it for browsing and querying. Used it to extract improvement ideas for the Agentic Engineering Framework itself. The model identified enhancements, formatted them against the standard task structure, and dispatched them to the TermLink agent, which pulled from the knowledge repository and started working autonomously. It worked.

Email archiver: Started as a utility to consolidate ~70K emails across Hotmail and Google domain accounts into a single searchable archive (useful for things like digging up tax receipts). Evolved into a fuller email client with AI capabilities — translation, generation, support for both local and remote models. Still in progress. A rough first release has been pushed to GitHub. The focus has shifted toward a more controlled, personal-assistant-style interface rather than trying to match full open-source alternatives.

If you want to test-drive the OpenClaw Fabric Explorer or the AI Email Personal Assistant, drop a comment: EXPLORER. You can find the other repos here:
The Engineering Framework
Termlink

Blast Radius - Series: Agentic Engineering Framework

dimitri — Mon, 16 Mar 2026 11:55:51 +0000

Why every AI agent commit should know what it touches before it lands

In enterprise programme management, the most expensive failures are not the ones that break one thing. They are the ones that break one thing which quietly breaks six others.

A programme manager who modifies a shared interface — a data format, a handoff protocol, a service contract — without checking who depends on it will discover the damage days later, through complaints from teams downstream. The fix was a 10-minute change. The blast radius was three workstreams, two weeks of rework, and a missed milestone. The root cause was not incompetence. It was invisibility: the dependency chain existed, but nobody could see it before committing the change.

AI coding agents have this problem in a more concentrated form. A human developer builds tacit knowledge over months: "if I change the auth module, I need to check the middleware." An AI agent has no tacit knowledge. Every session, the codebase is fresh. It sees the file it is editing. It does not see the six files that import from it, the three templates that render its output, or the two hooks that trigger on its changes.

The Agentic Engineering Framework tracks every significant component, the subsystem it belongs to, and the dependency edges between them. Without visibility into that graph, an agent editing one node is flying blind about all the others.

What blast radius analysis does

Before a commit lands, blast radius analysis answers one question: what does this change touch beyond the files you edited?

$ fw fabric blast-radius HEAD

Blast radius: HEAD
  T-503: TermLink Phase 0 complete — fw route, help entry, CLAUDE.md section

bin/fw (fw)
    writes → agents/termlink/termlink.sh
    writes → agents/context/context.sh
    writes → agents/handover/handover.sh

Registered component(s) changed: bin/fw

The commit modified bin/fw. The Component Fabric knows that bin/fw calls dozens of other scripts and is in turn called or sourced by dozens of components. That single file change has a wide potential blast radius. The output makes this visible before anyone discovers it through a failure.

How it works

The Component Fabric maintains a YAML card for every significant file in the project:

# .fabric/components/fw.yaml
id: fw
name: fw
type: script
subsystem: framework-core
location: bin/fw
purpose: "Main CLI entry point — routes to all agents and subsystems"
depends_on:
  - target: create-task
    type: calls
  - target: update-task
    type: calls
  - target: context-dispatcher
    type: calls
  # ...
depended_by:
  - source: plugin-audit
    type: called_by
  - source: fabric
    type: called_by
  # ...

When fw fabric blast-radius HEAD runs, it extracts changed files from the commit via git diff-tree, looks up each file's component card, and reports its dependency edges. The algorithm is deliberately shallow — direct dependencies only. Full transitive impact (what depends on the things that depend on your change) is available via fw fabric impact <path>, but the blast radius command optimises for speed and clarity at commit time.
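The difference between the shallow lookup and the full transitive walk can be sketched in a few lines of Python. This is an illustration only, not the framework's bash implementation: the DEPENDED_BY dictionary stands in for parsed YAML cards, and the web-ui component and its edge are hypothetical.

```python
from collections import deque

# Hypothetical stand-in for parsed component cards:
# component id -> ids of components that depend on it ("depended_by" edges).
DEPENDED_BY = {
    "fw": ["plugin-audit", "fabric"],
    "fabric": ["web-ui"],  # hypothetical edge, for illustration only
    "plugin-audit": [],
    "web-ui": [],
}


def direct_dependents(component):
    """Shallow lookup: only the components one edge away."""
    return sorted(DEPENDED_BY.get(component, []))


def transitive_dependents(component):
    """Breadth-first walk that follows depended_by edges to any depth."""
    seen = set()
    queue = deque([component])
    while queue:
        current = queue.popleft()
        for dependent in DEPENDED_BY.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    seen.discard(component)  # a cycle must not report the component itself
    return sorted(seen)


print(direct_dependents("fw"))      # ['fabric', 'plugin-audit']
print(transitive_dependents("fw"))  # ['fabric', 'plugin-audit', 'web-ui']
```

The shallow version does one dictionary lookup per changed file, which is why it suits commit time; the transitive version grows with the size of the graph.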

Files without a component card show as unregistered — a signal that the fabric has drifted and needs updating:

  CLAUDE.md (no fabric card)

This is not a failure. It is a nudge. Drift detection (fw fabric drift) catches unregistered files, orphaned cards, and stale dependency information systematically.
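As a rough sketch, drift detection can be read as set arithmetic between the files in the repository and the locations recorded on the cards. The function below is an assumption about how such a check could look, not the behaviour of fw fabric drift itself. In particular, treating "stale dependency information" as edges that point at a missing card is my interpretation, and the old-tool card is hypothetical.

```python
def detect_drift(tracked_files, cards):
    """Compare repository files with component cards.

    cards maps a component id to a dict with "location" and "depends_on".
    """
    tracked = set(tracked_files)
    locations = {card["location"] for card in cards.values()}
    return {
        # Files in the repository that no card describes.
        "unregistered": sorted(tracked - locations),
        # Cards whose file no longer exists.
        "orphaned": sorted(
            cid for cid, card in cards.items() if card["location"] not in tracked
        ),
        # Edges that point at a component without a card (assumed meaning of "stale").
        "dangling": sorted(
            (cid, target)
            for cid, card in cards.items()
            for target in card.get("depends_on", [])
            if target not in cards
        ),
    }


cards = {
    "fw": {"location": "bin/fw", "depends_on": ["create-task"]},
    "old-tool": {"location": "bin/old-tool", "depends_on": []},  # hypothetical card
}
report = detect_drift(["bin/fw", "CLAUDE.md"], cards)
print(report["unregistered"])  # ['CLAUDE.md']
print(report["orphaned"])      # ['old-tool']
print(report["dangling"])      # [('fw', 'create-task')]
```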

Where it runs

Three enforcement points, from softest to hardest:

Post-commit hook. Every commit triggers a blast radius summary automatically. The agent sees the output immediately after committing. If a changed component has high connectivity, the hook flags it with a warning. This catches the common case: an agent edits a utility function without realising it is called by multiple subsystems.

CLAUDE.md procedural rule. Before setting any task to work-completed, the agent must run blast radius analysis if source files changed. This is a governance instruction — the agent's operating manual says to do it. It is soft enforcement: the agent can forget, but the rule is explicit.

Verification gate. Tasks can include fw fabric blast-radius HEAD in their ## Verification section. The completion gate runs these commands mechanically. If blast radius shows unexpected impact, the task author can make it a blocking check. This is hard enforcement: the task cannot complete until impact is reviewed.
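The high-connectivity warning from the post-commit hook could be approximated by counting the edges on each changed component's card. This is a minimal sketch under stated assumptions: the threshold of 10 edges and the wording of the warning are hypothetical, and the cards are plain dictionaries rather than parsed YAML.

```python
HIGH_CONNECTIVITY_THRESHOLD = 10  # hypothetical cut-off, not taken from the framework


def connectivity(card):
    """Number of dependency edges, in both directions, on one component card."""
    return len(card.get("depends_on", [])) + len(card.get("depended_by", []))


def post_commit_warnings(changed_paths, cards_by_location):
    """Return one line per changed file that deserves a second look."""
    warnings = []
    for path in changed_paths:
        card = cards_by_location.get(path)
        if card is None:
            warnings.append(f"{path} (no fabric card)")
        elif connectivity(card) >= HIGH_CONNECTIVITY_THRESHOLD:
            edges = connectivity(card)
            warnings.append(
                f"{path}: high connectivity ({edges} edges), review blast radius"
            )
    return warnings


cards_by_location = {
    "bin/fw": {"depends_on": ["a"] * 6, "depended_by": ["b"] * 5},
    "lib/util.sh": {"depends_on": [], "depended_by": ["fw"]},
}
for line in post_commit_warnings(["bin/fw", "lib/util.sh", "CLAUDE.md"], cards_by_location):
    print(line)
```

The function only reports. Whether a warning blocks anything is left to the verification gate, which matches the softest-to-hardest ordering above.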

The value of visibility over prevention

Blast radius analysis does not block commits. It does not reject changes. It does not force the agent to update all downstream files before proceeding. This is a deliberate design decision.

Prevention — blocking a commit because it affects too many files — would be brittle and counterproductive. Some high-impact commits are necessary. A deliberate refactoring that touches many components is fine. A one-line fix that accidentally touches just as many is a problem. The blast radius tool cannot distinguish between these two cases. A human or an informed agent can.

The value is in making hidden impact visible at the moment it matters: before the change propagates. A programme manager who sees "this interface change affects three workstreams" can plan accordingly. An agent who sees the downstream impact of its edit can proactively check each affected file. Without visibility, neither the programme manager nor the agent knows there is a problem until downstream failures arrive.

This aligns with the framework's second directive: reliability means predictable, observable, auditable execution. Blast radius analysis makes the impact of every change observable. The decision to act on that observation remains with the human or the agent.

What it has caught

The Component Fabric was born from a real incident. During an early task, multiple files were modified in a single session without traceability — a silent corruption chain where changes cascaded through dependencies that nobody tracked. The damage was discovered after the fact, through symptoms, not through visibility. The inception that designed the Component Fabric cited this incident as the primary motivation.

Since integration into the post-commit hook, every commit shows its structural impact. The high-connectivity warning has flagged the main CLI entry point, the context dispatcher, and the web application — the files where a careless edit has the widest downstream consequences. These are precisely the files where an agent, working without a structural map, would cause the most damage.

The framework now runs fw fabric blast-radius after every commit. When a recent integration modified the main CLI entry point — the most connected file in the project — the post-commit output flagged it automatically: "High connectivity — consider: fw fabric blast-radius HEAD." The agent saw this warning mechanically, without needing to remember that the file is important. Structural awareness replaced tacit knowledge.

Why agents need this more than humans

A senior developer who has worked on a codebase for two years carries a mental model of its structure. They know, without looking, that the config module is imported everywhere, that the middleware chain is fragile, that the database layer has been refactored three times and has scar tissue. This knowledge is tacit, accumulated, never written down.

An AI agent starts every session from zero. It reads the files it needs, executes the task it is given, and produces output. If the task says "modify the authentication module," it modifies the authentication module. It does not check what depends on the authentication module because it does not know what depends on the authentication module. This is not a limitation of the agent's capability. It is a limitation of the information available to it.

Blast radius analysis converts tacit structural knowledge into explicit, queryable data. The agent does not need two years of experience with the codebase. It runs one command and sees the direct dependency edges of its change, with the full transitive chain available through fw fabric impact. The information asymmetry between a senior developer and a fresh agent session shrinks from months to milliseconds.

The broader principle

Impact analysis is not an AI-specific technique. It predates software engineering entirely. Civil engineers calculate the blast radius of demolition charges. Epidemiologists model the blast radius of an outbreak from a single case. Programme managers assess the blast radius of a schedule delay on dependent workstreams. In each domain, the principle is identical: before changing something, understand what it touches.

The Agentic Engineering Framework applies this principle to AI agent commits using a structural topology map (the Component Fabric), shallow dependency traversal (blast radius), and three escalating enforcement points (post-commit hook, procedural rule, verification gate). The implementation is a short bash script and a set of YAML cards. The principle is as old as engineering itself.

A commit without blast radius analysis is a change without impact awareness. The domain changed from civil engineering to AI agent governance. The principle did not.

Try it

curl -fsSL https://raw.githubusercontent.com/DimitriGeelen/agentic-engineering-framework/master/install.sh | bash
cd my-project && fw init --provider claude

# Register key files
fw fabric register src/auth.ts
fw fabric register src/api/routes.ts

# Check impact before committing
fw fabric blast-radius HEAD

# See full dependency chain
fw fabric deps src/auth.ts

# Detect structural drift
fw fabric drift

# Explore the interactive graph
fw serve  # http://localhost:3000/fabric

GitHub: github.com/DimitriGeelen/agentic-engineering-framework

#AgenticEngineering #ClaudeCode #BlastRadius #ImpactAnalysis #ComponentFabric #BuildInPublic #OpenSource

Governing AI Agents: The Authority Model — why initiative is not authority

dimitri — Sat, 14 Mar 2026 10:52:38 +0000

Initiative is not authority.

In every domain I have worked in — IT project management, transition management, engineering leadership — the same failure mode appears when intelligent actors are given broad direction without clear constraints. A programme manager tells a workstream lead "handle this however you think is best." A hospital administrator tells a department head to "sort it out." A ship captain delegates watch duty with "you know what to do." In each case the intent is trust. The effect is the removal of structural accountability. The dynamics of complex work are too varied to oversee every possibility — which is precisely why organisations build structures that do not depend on individual judgment holding up under pressure.

The same failure mode has arrived in software engineering, carried by a new class of actor. The agent is mid-task. It asks a question. You reply: "Proceed as you see fit." Forty-five minutes later you discover it force-pushed to main, deleted a feature branch, and restructured the database schema. It was doing what it thought was best. You gave it permission — or at least, you thought you did. But there is a distinction most people miss when working with AI agents, and it is the same distinction that separates effective delegation from dangerous abdication in any organisation: initiative is not authority.

I arrived at this distinction not from AI theory but from watching real programmes succeed and fail over 25 years. The teams that operated well were not the ones with the most autonomy and not the ones with the least. They were the ones where it was structurally clear what an actor could decide on their own and what required someone else's approval. That structural clarity is exactly what is missing from most AI agent setups today.

The three-tier model

The Agentic Engineering Framework defines three distinct roles:

Human      SOVEREIGNTY  Can override anything, is accountable
Framework  AUTHORITY    Enforces rules, checks gates, logs everything
Agent      INITIATIVE   Can propose, request, suggest; never decides

When you say "proceed as you see fit," you are delegating initiative — the agent can choose what to work on, which approach to take, what order to do things. You are not delegating authority — the agent still cannot execute destructive commands, bypass verification gates, complete human-owned tasks, or skip task creation.

Even under the broadest possible delegation, structural gates remain active.

How it is enforced

This is not a prompt instruction. It is structural.

Tier 0 hooks intercept destructive commands and require explicit approval. Task gates prevent work without accountability. Ownership gates prevent the agent from self-completing human-owned tasks — the agent marks its acceptance criteria as done, but the human must review and finalise.

# Task with owner: human
# Agent completes all its ACs
# But cannot set work-completed
# Human reviews and finalises

The practical result: the agent works freely within safe operations (reading files, writing code, running tests, committing). Destructive or governance-significant actions require human sign-off. Autonomy where it is safe. Control where it matters.
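The split between initiative and authority can be sketched as two small checks. This is a hedged illustration, not the framework's hook code: the list of destructive patterns is hypothetical and far from complete, and the function names are mine.

```python
import re

# Hypothetical Tier 0 patterns; a real list would be longer and more careful.
DESTRUCTIVE_PATTERNS = [
    re.compile(r"\bgit\s+push\b.*(--force|-f)\b"),
    re.compile(r"\bgit\s+reset\s+--hard\b"),
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bdrop\s+(table|database)\b", re.IGNORECASE),
]


def is_destructive(command):
    """True if the command matches any known destructive pattern."""
    return any(pattern.search(command) for pattern in DESTRUCTIVE_PATTERNS)


def authorize_command(command, human_approved=False):
    """Framework authority: destructive commands need explicit human approval."""
    if is_destructive(command) and not human_approved:
        return "BLOCK"
    return "ALLOW"


def can_set_work_completed(task_owner, actor):
    """Ownership gate: an agent may not finalise a human-owned task."""
    return not (task_owner == "human" and actor == "agent")


print(authorize_command("pytest -q"))                           # ALLOW
print(authorize_command("git push --force origin main"))        # BLOCK
print(authorize_command("git push --force origin main", True))  # ALLOW
print(can_set_work_completed("human", "agent"))                 # False
```

Note that "proceed as you see fit" never appears as an input: delegation changes what the agent chooses to attempt, not what these checks return.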

The research behind the design

This model was forged by a specific incident. Task T-151 was a specification task — meaning I, as the human, was supposed to review the findings before any decision was made. The agent created the task, immediately started working, and completed it in 2 minutes. It wrote the investigation findings, made the GO recommendation, chose between implementation approaches, and closed the task. Without consulting me.

The task existed. The status transitions were logged. From a structural perspective, everything looked correct. But the intent — that a human was supposed to validate the specification — was completely bypassed. The governance was theatre.

That incident triggered a deep review (T-194) in which we applied ISO 27001-style thinking: identify and score the risk, design a preventative control, make the control workable in day-to-day practice, and provide a means to monitor (audit) that it is consistently applied.

Monitoring confirmed the effectiveness: structural gates (FAIL/BLOCK) have near-100% effectiveness. Behavioural rules (WARN + trust the agent) degrade as context fills up or the agent operates autonomously.

The deeper principle

Effective intelligent action — whether by a person, a team, or an AI agent — requires clear direction, context awareness, awareness of constraints and impact, and capable engaged actors. A manager who says "handle this however you think is best" is delegating initiative. They are not saying "ignore all company policies" or "skip the approval process for purchases over $10K."

AI agents need the same structure. Broad delegation within clear boundaries is not a contradiction. It is how capable systems actually operate.

The domain changed from human teams to AI agents. The principle did not.

#ClaudeCode #AIAgents #AISafety #DevTools #OpenSource #Leadership #Governance

Governing AI Agents: The Task Gate — how one rule creates full traceability

dimitri — Wed, 11 Mar 2026 09:26:05 +0000

Accountability begins with a record of intent.

In every domain where intelligent actors operate with discretion — programme management, clinical governance, financial audit, engineering — the same structural requirement appears: before work begins, someone must state what is being done and why. A programme manager opens a work order. A surgeon logs a procedure. An auditor creates an engagement file. The mechanism varies. The principle does not. Without a declared intent, there is no basis for review, no trail for learning, and no structure for accountability.

The same principle applies to AI coding agents, and it is precisely the one most setups omit. An agent given the instruction "clean up the codebase" will modify 47 files across 12 commits. It will do so competently. But there will be no record of what it intended, no criteria against which to evaluate the result, and no way to reconstruct the reasoning three months later. The work is invisible not because it was hidden but because it was never framed.

I built one rule and enforced it structurally: nothing gets done without a task. Not as a convention. Not as a prompt instruction the agent can ignore when context fills up. As a mechanical gate that blocks file edits until a task exists.

How the gate works

The Agentic Engineering Framework installs a PreToolUse hook in Claude Code. Every time the agent attempts to write or edit a file, the hook checks two things: does .context/working/focus.yaml contain an active task ID, and does that task file exist in .tasks/active/. If either check fails, the edit is blocked.

# Without a task — blocked
$ claude "clean up the codebase"
# TASK GATE: No active task. Create one with: fw work-on "Clean up codebase" --type refactor

# With a task — allowed
$ fw work-on "Clean up module imports" --type refactor
# Task T-042 created, focus set. Edits are now allowed.

Every file change traces to a task. Every task has acceptance criteria. Every commit references a task ID. The reasoning chain is reconstructable.
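The two checks the gate performs can be sketched as a single Python function. This is an approximation under stated assumptions, not the framework's hook: I assume the task ID appears in focus.yaml as plain text such as T-042 and that the task file name in .tasks/active/ starts with that ID. The messages and the demo file name are illustrative.

```python
import re
import tempfile
from pathlib import Path

TASK_ID = re.compile(r"\bT-\d+\b")


def check_task_gate(project_root):
    """Return (allowed, message) for a pending file edit."""
    root = Path(project_root)
    focus_file = root / ".context" / "working" / "focus.yaml"
    active_dir = root / ".tasks" / "active"

    # Check 1: focus.yaml names an active task.
    # Assumption: the ID appears in the file as plain text such as "T-042".
    if not focus_file.is_file():
        return False, "TASK GATE: No active task."
    match = TASK_ID.search(focus_file.read_text(encoding="utf-8"))
    if match is None:
        return False, "TASK GATE: No active task."
    task_id = match.group(0)

    # Check 2: a matching task file exists in .tasks/active/.
    # Assumption: the file name starts with the task ID.
    name_pattern = re.compile(rf"{re.escape(task_id)}(\D|$)")
    if not active_dir.is_dir() or not any(
        name_pattern.match(path.name) for path in active_dir.iterdir()
    ):
        return False, f"TASK GATE: {task_id} has no file in .tasks/active/."
    return True, f"{task_id} is active, edit allowed."


# Demo in a throwaway directory: blocked first, allowed once a task exists.
with tempfile.TemporaryDirectory() as tmp:
    print(check_task_gate(tmp))
    root = Path(tmp)
    (root / ".context" / "working").mkdir(parents=True)
    (root / ".context" / "working" / "focus.yaml").write_text(
        "task: T-042\n", encoding="utf-8"
    )
    (root / ".tasks" / "active").mkdir(parents=True)
    (root / ".tasks" / "active" / "T-042-clean-up-module-imports.md").write_text(
        "# T-042\n", encoding="utf-8"
    )
    print(check_task_gate(tmp))
```

A hook script would call this function and turn a False result into a blocking exit status. As I understand the Claude Code hooks documentation, that is exit code 2 with the message on stderr, but treat that detail as something to verify.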

Why a prompt instruction is not enough

I arrived at structural enforcement after watching the behavioral alternative fail. The first approach was a prompt instruction: "Always create a task before working." It lasted about a day.

The failure mode was instructive. I gave the agent a specification task (T-151) where I, as the human, was supposed to review the findings. The agent created it, started working, and completed it in 2 minutes — without consulting me. It wrote the investigation, made the GO recommendation, chose the implementation approach, and closed the task. Unilaterally. The task existed, but it was theatre. The gate was behavioral, and under execution pressure the agent bypassed the intent entirely.

I studied how mature governance frameworks handle this distinction. ISO 27001 separates control design (the rule exists) from operational effectiveness (the rule works in practice). A prompt instruction is control design. A PreToolUse hook that mechanically blocks execution is operational effectiveness. Across 312 completed tasks, hook-based enforcement maintained near-100% effectiveness while behavioral rules degraded as context filled up. A formal bypass analysis (T-228) cataloged 13 bypass vectors and confirmed the pattern: structural gates hold, behavioral rules do not.

What the gate produces

T-042: Clean up module imports
  Acceptance Criteria: No circular imports, All unused imports removed
  Commits: 3 (all prefixed T-042:)
  Decisions: Kept lodash — tree-shaking handles unused methods

Across 312 completed tasks, the framework achieved 96% commit traceability: almost every commit links to a task, and every task records the decisions behind the work. Three months later, those changes can be traced back to a stated intent.
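A traceability figure like this can be computed from commit subjects alone, given the T-NNN: prefix convention shown above. The sketch below is a hedged illustration of that calculation; the sample subjects are made up, and in practice the list could come from something like git log --format=%s.

```python
import re

TASK_PREFIX = re.compile(r"^(T-\d+):")


def traceability(subjects):
    """Return (ratio, commits grouped by task id, untraced subjects)."""
    by_task = {}
    untraced = []
    for subject in subjects:
        match = TASK_PREFIX.match(subject)
        if match:
            by_task.setdefault(match.group(1), []).append(subject)
        else:
            untraced.append(subject)
    traced = len(subjects) - len(untraced)
    ratio = traced / len(subjects) if subjects else 1.0
    return ratio, by_task, untraced


# Made-up commit subjects for illustration.
subjects = [
    "T-042: Remove unused imports",
    "T-042: Break circular import",
    "T-503: TermLink Phase 0 complete",
    "fix typo",
]
ratio, by_task, untraced = traceability(subjects)
print(f"{ratio:.0%}")   # 75%
print(sorted(by_task))  # ['T-042', 'T-503']
print(untraced)         # ['fix typo']
```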

It is the difference between telling someone to wear a hard hat and installing a door that will not open without one.

Try it

curl -fsSL https://raw.githubusercontent.com/DimitriGeelen/agentic-engineering-framework/master/install.sh | bash
cd my-project && fw init --provider claude

# The gate activates immediately — the agent cannot touch a file without a task
fw work-on "My first governed task" --type build

# Start the dashboard to see your tasks
fw serve  # http://localhost:3000

GitHub: github.com/DimitriGeelen/agentic-engineering-framework

#ClaudeCode #AIAgents #DevTools #CodingWithAI #OpenSource #Governance

Governing AI Agents: Three-Layer Memory — giving agents institutional knowledge

dimitri — Tue, 10 Mar 2026 07:10:06 +0000

Undocumented decisions get re-debated. In every organisation. At every scale.

A programme governance team meets on Tuesday and agrees: all configuration will use YAML. By Thursday, a workstream lead — who missed Tuesday's meeting — delivers a JSON config. The following week, a new team member creates a TOML file because "it seemed reasonable." Three configuration formats now coexist, each reasonable in isolation, collectively incoherent. The decision was made. It was not persisted in a form that survived the meeting.

The same failure mode appears in AI coding agents, compressed from weeks into hours. Day one: the agent and I agree on YAML for configuration. Day two, new session: the agent writes JSON. It has no record of the decision. It is not being inconsistent — it genuinely does not know. Every session starts from zero. The agent does not know what it did yesterday, what failed last week, or why PostgreSQL was chosen over MongoDB.

Prompt instructions help ("always use YAML for configs") but they do not scale. By the time you have enumerated every decision, convention, and lesson learned, the system prompt is 50K tokens of accumulated context that no one maintains.

The problem is not that the agent forgets. The problem is that forgetting is the default, and remembering requires structure.

Three layers

The Agentic Engineering Framework implements three distinct memory layers, each serving a different temporal purpose:

Working Memory — what is happening now. Session state, current focus, active tasks. Updated continuously. Volatile — lost when the session ends, captured into the other layers before that happens.

# .context/working/session.yaml
session_id: S-2026-0308-0809
focus: T-042
active_tasks: [T-042, T-038]

Project Memory — what the project knows. Decisions, patterns, and learnings accumulated across all sessions. When the agent starts a new session, it reads project memory and knows: YAML is the configuration standard, this API timeout has occurred before, approach X was tried and failed.

# .context/project/decisions.yaml
- id: D-014
  date: 2026-02-15
  task: T-028
  decision: "Use YAML for all configuration files"
  rationale: "Human-readable, comments supported, existing tooling"
  rejected: ["JSON (no comments)", "TOML (less familiar to team)"]

Episodic Memory — what happened. Condensed histories of completed tasks. Not the full git log — a distilled summary of what was tried, what worked, what was learned. When a similar task arises months later, the agent reads the episodic summary instead of repeating trial-and-error.

# .context/episodic/T-042.yaml
task: T-042
summary: "Cleaned up module imports across 8 files"
approach: "AST-based analysis, removed circular dependencies first"
outcome: success
key_insight: "Start with leaf modules, work inward"

How memory flows

Session starts
  Read project memory (decisions, patterns, learnings)
  Restore working memory (what was in progress)
Work — make decisions, encounter issues, learn
  Continuous capture (decisions to project memory, issues to patterns)
Session ends
  Generate episodic summary (condensed history)
  Generate handover (state + recommendations for next session)

The agent at session start is not starting from zero. It has access to every decision ever made, every failure pattern encountered, and the full history of similar tasks.
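The flow can be sketched with an in-memory model of the three layers. This is a simplified illustration, not the framework's file-based implementation: the Memory class and the three function names are hypothetical, and plain dictionaries and lists stand in for the YAML files.

```python
from dataclasses import dataclass, field


@dataclass
class Memory:
    working: dict = field(default_factory=dict)   # volatile session state
    project: list = field(default_factory=list)   # decisions across sessions
    episodic: dict = field(default_factory=dict)  # task id -> condensed summary


def start_session(memory, handover):
    """Restore working memory from the handover and return known decisions."""
    memory.working = {
        "focus": handover.get("focus"),
        "active_tasks": list(handover.get("active_tasks", [])),
    }
    return [entry["decision"] for entry in memory.project]


def record_decision(memory, decision, rationale):
    """Continuous capture: a decision goes straight to project memory."""
    memory.project.append(
        {"task": memory.working.get("focus"), "decision": decision, "rationale": rationale}
    )


def end_session(memory, summary):
    """Write the episodic summary, hand over state, then drop working memory."""
    focus = memory.working.get("focus")
    if focus:
        memory.episodic[focus] = summary
    handover = dict(memory.working)
    memory.working = {}
    return handover


memory = Memory()
start_session(memory, {"focus": "T-028", "active_tasks": ["T-028"]})
record_decision(memory, "Use YAML for all configuration files", "Human-readable")
handover = end_session(memory, "Chose a configuration format")

# Day two: a fresh session already knows the decision.
print(start_session(memory, handover))  # ['Use YAML for all configuration files']
```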

The research behind the design

The three-layer model did not start as a design. It emerged from failures.

The first memory system was a single context.yaml file with everything — current task, decisions, learnings, patterns. Within two weeks it was 500 lines long and the agent spent more time reading context than doing work.

A formal memory audit found that 58% of task files were empty in their "Updates" section — the running log I had designed was almost never populated. Meanwhile, git had a perfect record of every change with timestamps and diffs. That led to the hybrid episodic design (T-117): git owns the timeline, task files own the decisions. I stopped asking agents to maintain chronological logs (they forgot) and instead mined git history automatically at task completion. The episodic generator merges git data with task data to produce a condensed history.

The three-layer separation crystallized through a research spike on Google's context engineering principles (T-120) and a deep reflection on sub-agent dispatch patterns (T-097, which analyzed all 96 tasks at that point). The key finding: investigation agents need results in working memory (0% savings from offloading), while content generators must never return results to working memory (96% savings from writing to disk). Some memory is hot and ephemeral. Some is warm and persistent. Some is cold and archival. The pattern mapped directly to three layers.

I also learned that memory decay is real. A discovery analysis (T-200) found a 58% episodic decay rate — more than half of episodic records lose practical value within weeks. The solution was not to discard them but to distill patterns upward: if the same failure appears in 3+ episodic records, it graduates to a pattern in project memory. If a pattern proves reliable across 5+ tasks, it graduates to a practice. This is Decision D-003: "3+ occurrences triggers practice candidate."
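The two graduation thresholds can be sketched as plain counting. The thresholds come from the text above; the record layout, the function names, and the sample failures are assumptions for illustration.

```python
from collections import Counter

PATTERN_THRESHOLD = 3   # same failure in 3+ episodic records
PRACTICE_THRESHOLD = 5  # pattern reliable across 5+ tasks


def graduate_patterns(episodic_records):
    """Return failures that appear in enough episodic records to become patterns."""
    counts = Counter()
    for record in episodic_records:
        # Count each failure once per record, however often it is listed.
        counts.update(set(record["failures"]))
    return sorted(f for f, n in counts.items() if n >= PATTERN_THRESHOLD)


def graduate_practices(pattern_usage):
    """pattern_usage maps a pattern to the set of task IDs where it held up."""
    return sorted(p for p, tasks in pattern_usage.items()
                  if len(tasks) >= PRACTICE_THRESHOLD)


# Hypothetical episodic records.
records = [
    {"task": "T-001", "failures": ["stale lock file"]},
    {"task": "T-002", "failures": ["stale lock file", "missing env var"]},
    {"task": "T-003", "failures": ["stale lock file", "stale lock file"]},
    {"task": "T-004", "failures": ["missing env var"]},
]
patterns = graduate_patterns(records)

usage = {
    "stale lock file": {"T-005", "T-006", "T-007", "T-008", "T-009"},
    "missing env var": {"T-005", "T-006"},
}
practices = graduate_practices(usage)
```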

The key insight

Short-term context, accumulated knowledge, and historical reference have different requirements. By separating them, each can be optimised:

Working memory: fast, volatile, small
Project memory: persistent, searchable, growing
Episodic memory: archival, condensed, referenced on demand

The domain changed from organisational knowledge management to AI agent memory. The principle did not.

Try it

curl -fsSL https://raw.githubusercontent.com/DimitriGeelen/agentic-engineering-framework/master/install.sh | bash
cd your-project && fw init

# See current memory state
fw context status

# Record a decision
fw context add-decision "Use YAML for configs" --task T-042 --rationale "Human readable"

# Record a learning
fw context add-learning "Always set connection pool limits" --task T-042

GitHub: github.com/DimitriGeelen/agentic-engineering-framework

Agentic Engineering: "dependency graph visualization" and "blast-radius" use case.

dimitri — Tue, 10 Mar 2026 07:05:01 +0000

Infrastructure management begins with a map of interdependencies.

In domains where system integrity depends on visibility — electrical grid maintenance, pharmaceutical supply chains, aerospace engineering — a foundational principle holds: before any modification, the system must be understood. A grid operator maps power lines. A pharmacist traces drug pathways. An engineer models aircraft subsystems. The method differs. The necessity does not. Without a structural map, changes become blind, risks unbounded, and accountability fragmented.

This principle applies to AI-driven codebases, where the absence of a topology map creates silent chaos. An agent tasked with "refactoring a module" may alter files without knowing what they depend on, who relies on them, or what downstream systems will break. The codebase becomes a black box, with no way to trace the ripple effects of a single line of change. The work is not invisible — it is unmoored.

I built the Component Fabric to anchor AI agents in the structural reality of the codebase. It is not a tool for the agent to use. It is the scaffold upon which the agent must operate.

How it works

The Component Fabric is a declarative topology map stored in .fabric/, populated by a set of commands that enforce structural awareness. Every file with significant impact — a utility script, a core module, an API endpoint — must be registered as a component. This creates a YAML card in .fabric/components/ that defines its dependencies, purpose, and relationships.

For example:

$ fw fabric register ./src/auth/service.ts
# Component card created:
# id: C-087
# name: AuthService
# type: service
# subsystem: auth
# location: ./src/auth/service.ts
# purpose: Token validation and user authentication
# depends_on: [C-041 (UserModel), C-063 (DatabaseClient)]

The system then uses this map to answer questions like: What depends on this file? (fw fabric deps), What breaks if I modify this? (fw fabric blast-radius), or Where are all unregistered files? (fw fabric drift). The agent cannot act without this context.
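To make the "what depends on this file?" question concrete, here is a small sketch that treats component cards as dictionaries and inverts the depends_on edges. The card contents mirror the example above; the in-memory layout and the `dependents_of` helper are assumptions, not the actual card loader.

```python
# Cards keyed by component ID; only the fields needed for the lookup.
cards = {
    "C-041": {"name": "UserModel", "depends_on": []},
    "C-063": {"name": "DatabaseClient", "depends_on": []},
    "C-087": {"name": "AuthService", "depends_on": ["C-041", "C-063"]},
}


def dependents_of(cards, component_id):
    """Return IDs of components whose depends_on list names component_id."""
    return sorted(cid for cid, card in cards.items()
                  if component_id in card["depends_on"])


user_model_dependents = dependents_of(cards, "C-041")
auth_dependents = dependents_of(cards, "C-087")
```

Each card only records what it depends on. The reverse direction is derived, so it cannot drift out of sync with the cards themselves.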

Why / Research

The need for structural mapping emerged from three task failures:

T-208: An agent refactored a core module without checking dependencies, breaking 12 downstream systems. The fix required manually reconstructing the dependency graph from commit history.
T-214: A batch registration of 95 AEF components revealed 14% of files were unregistered, leading to 23 orphaned components with no traceable purpose.
T-361: A documentation gap in component cards caused 7 misrouting incidents, where agents applied changes to the wrong subsystem.

These failures showed that structural awareness is not optional. The Component Fabric enforces it through three mechanisms:

Registration as a precondition for any file modification.
Drift detection to surface unregistered or stale components.
Impact analysis to prevent changes with unknown downstream consequences.
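The first two mechanisms reduce to set comparisons between files on disk and registered card locations. A minimal sketch follows, with the caveat that "orphaned" is assumed here to mean a card whose file no longer exists, and that the helper names and the second file path are invented for the example.

```python
def detect_drift(project_files, cards):
    """Compare files on disk with the locations recorded on component cards."""
    files = set(project_files)
    registered = {card["location"] for card in cards.values()}
    return {
        "unregistered": sorted(files - registered),
        "orphaned": sorted(cid for cid, card in cards.items()
                           if card["location"] not in files),
    }


def may_modify(path, cards):
    """Registration as a precondition: only registered files may be edited."""
    return any(card["location"] == path for card in cards.values())


cards = {
    "C-087": {"location": "./src/auth/service.ts"},
    "C-041": {"location": "./src/models/user.ts"},  # hypothetical path
}
project_files = ["./src/auth/service.ts", "./src/payments/processor.ts"]

drift = detect_drift(project_files, cards)
```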

The result: 91% component coverage after T-214, 95% of AEF subsystems mapped, and a 68% reduction in post-refactor incidents.

Try it

Installation:

$ curl -s https://raw.githubusercontent.com/aef/fabric/master/install.sh | bash

Usage example:

# Before modifying a file:
$ fw fabric deps ./src/payments/processor.ts
# Output:
# C-112 (PaymentProcessor) depends_on: [C-050 (StripeClient), C-078 (Logger)]
# depended_by: [C-134 (CheckoutFlow), C-150 (BillingJob)]

# Before committing:
$ fw fabric blast-radius
# Simulated impact: 3 files would break if changes are applied

I built guardrails for AI coding agents — same governance principle, new domain

dimitri — Fri, 06 Mar 2026 21:55:21 +0000

Over 25 years of working on complex IT programmes I arrived at a principle I now believe is universal: effective intelligent action — whether by a person, a team, or an AI agent — requires five things. Clear direction. Awareness of context — what happened before, what was decided, what failed. Awareness of resource constraints. Awareness of what your actions will affect downstream. And people who are genuinely engaged and capable of acting. Remove any one and the system degrades.

I did not derive this from AI theory. I derived it from watching transitions succeed and fail. At Shell I built a governance framework for IT transitions — quality gates, assurance areas, structured handovers. Shell adopted it as the global standard. It has been used for over 1,000 transitions worldwide.

When I started building with agentic coding tools I recognised the same failure modes. So I built a framework for that too.

The problem is structural

AI coding agents — Claude Code, Cursor, Copilot, Aider — are capable tools. What they lack is governance. Without it, the same failure modes appear that I have seen in every ungoverned programme:

No traceability. Files change with no record of why. No task, no decision trail, no audit history. Three weeks later you are reading a diff with no way to reconstruct the reasoning behind it.

No memory. Every session starts from zero. The agent does not know what it did yesterday, what decisions were made, what failed. You re-explain context repeatedly. Or worse — the agent contradicts a decision from the previous session because it has no record of it.

No risk awareness. The agent may ask before a force push, but it has no model for understanding why that action is risky, what it affects, or who should approve it. There is no structured authority model — no distinction between what the agent may decide and what requires human approval.

No learning loop. Failures are not recorded. The same mistake recurs across sessions because there is no mechanism to capture what went wrong and surface it next time.

These are not tool-specific problems. They are governance problems. The same ones I spent two decades solving in enterprise IT.

What I built

The Agentic Engineering Framework applies structural governance to AI coding agents — not guidelines or best practices, but mechanical enforcement.

The core principle: nothing gets done without a task. This is enforced as a gate, not a convention. With Claude Code, the framework intercepts every file modification and blocks it unless an active task exists.

Agent attempts to edit a file
    │
    ▼
┌─────────────────────┐
│  Task gate (Tier 1)  │──── No active task? → BLOCKED
└─────────────────────┘
    │ ✓ Task exists
    ▼
┌─────────────────────┐
│  Budget gate         │──── Context > 75%? → BLOCKED (auto-handover)
└─────────────────────┘
    │ ✓ Budget OK
    ▼
    Edit proceeds           Every commit traces to a task

This maps directly to those five requirements:

Requirement	Framework mechanism
Clear direction	Task-first enforcement. Every action has a task with acceptance criteria and verification commands.
Awareness of context	Context Fabric (the framework's memory subsystem) — three layers of persistent memory (working, project, episodic). The agent recalls prior decisions, learned patterns, and failure resolutions across sessions.
Awareness of resource constraints	Context budget management tracks context-window consumption and triggers automatic handover before the agent loses coherence.
Awareness of impact	Component Fabric — a live structural map of the codebase. Before changing a file, the agent queries what depends on it and assesses downstream impact.
Engaged, capable actors	Tiered authority model. The agent has initiative but not authority. Destructive actions require human approval.
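The two gates in the diagram above amount to an ordered sequence of checks. A minimal sketch, assuming the 75% threshold from the diagram; the `GateResult` type, the function name, and the message strings are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional

CONTEXT_BUDGET_LIMIT = 0.75  # the 75% threshold shown in the diagram


@dataclass
class GateResult:
    allowed: bool
    reason: str


def check_edit(active_task: Optional[str], context_used: float) -> GateResult:
    """Run the task gate first, then the budget gate."""
    if not active_task:
        return GateResult(False, "BLOCKED: no active task")
    if context_used > CONTEXT_BUDGET_LIMIT:
        return GateResult(False, "BLOCKED: context budget exceeded, hand over")
    return GateResult(True, f"edit proceeds under {active_task}")


no_task = check_edit(None, 0.10)
over_budget = check_edit("T-042", 0.80)
ok = check_edit("T-042", 0.40)
```

Order matters: an edit with no task is rejected before the budget is even looked at, so every edit that reaches the budget gate already has a task to attribute it to.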

Tasks flow through a visible lifecycle (Captured, In Progress, Issues, Completed), tracked on a Kanban board that surfaces what needs attention.

Tasks are not hidden in text files. They are visible, trackable, and auditable.

How it works in practice

Here is what this looks like in a terminal.

Before governance:

# Agent operates without constraints
git add . && git commit -m "updates"
git push --force origin main

No task reference. No traceability. Destructive command executed without approval.

After governance:

# Work starts with a task
fw work-on "Add JWT validation" --type build

# Every commit references the task
fw git commit -m "T-042: Add JWT validation middleware"

# Destructive commands are intercepted
$ git push --force
══════════════════════════════════════════════════════════
  TIER 0 BLOCK — Destructive Command Detected
══════════════════════════════════════════════════════════
  Risk: FORCE PUSH overwrites remote history
  To proceed: fw tier0 approve (requires human approval)
══════════════════════════════════════════════════════════

# Session ends with context preserved for the next
fw handover --commit

That Tier 0 block is not a warning. It is a gate. Which leads to the question: who has authority over what?

The authority model

In transition management, the single most common failure mode is unclear accountability. Who decides? Who approves? Who can override?

The framework codifies this:

Human     → SOVEREIGNTY  → Can override anything, is accountable
Framework → AUTHORITY    → Enforces rules, checks gates, logs everything
Agent     → INITIATIVE   → Can propose, request, suggest — never decides

The agent may choose which task to work on. It may choose an implementation approach. It may not bypass a structural gate, complete a human-owned task, or execute a destructive command without approval. Initiative is not authority. This distinction prevents the most dangerous failure mode in agentic systems: the agent making consequential decisions that no one reviewed.

The tiered approval model enforces this mechanically:

Tier	Scope	Approval
0	Destructive commands (`--force`, `rm -rf`, `DROP TABLE`)	Human must approve
1	All file modifications	Active task required
2	Situational exceptions	Single-use, logged
3	Read-only operations	Pre-approved

You do not prevent action. You ensure the right checks occur at the right points.
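As a sketch of how the tiers could be applied to a shell command, the example below pattern-matches the three destructive examples from the table and falls back to Tier 1 or Tier 3. The regular expressions and function names are assumptions, and Tier 2 (logged single-use exceptions) is left out for brevity.

```python
import re

# Patterns for the destructive examples listed under Tier 0.
DESTRUCTIVE = [
    re.compile(r"--force\b"),
    re.compile(r"\brm\s+-rf\b"),
    re.compile(r"\bDROP\s+TABLE\b", re.IGNORECASE),
]


def classify(command: str, modifies_files: bool) -> int:
    """Return the tier for a command: 0 destructive, 1 file change, 3 read-only."""
    if any(p.search(command) for p in DESTRUCTIVE):
        return 0
    return 1 if modifies_files else 3


def is_allowed(tier: int, human_approved: bool = False, active_task=None) -> bool:
    if tier == 0:
        return human_approved           # human must approve
    if tier == 1:
        return active_task is not None  # active task required
    return True                         # read-only is pre-approved


force_push = classify("git push --force origin main", modifies_files=False)
edit = classify("sed -i s/a/b/ config.yaml", modifies_files=True)
listing = classify("ls -la", modifies_files=False)
```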

The gates handle enforcement. But what happens to the knowledge the agent builds up during a session?

Context Fabric — memory across sessions

The most expensive failure in agent-assisted development is not a bug. It is lost context. An agent works for an hour, the session ends, and the next session starts from zero. Decisions are re-made. Mistakes are repeated. The reasoning trail disappears.

The Context Fabric solves this with three layers of persistent memory:

Working memory — current session state, active focus, pending actions
Project memory — patterns, decisions, and learnings that persist across all sessions. When the agent encounters a failure it has seen before, the resolution is already there
Episodic memory — condensed histories of every completed task, auto-generated at completion. What was done, what was decided, what was learned

Semantic search across all three layers means the agent can recall relevant context by meaning:

fw recall "authentication timeout pattern"
# → Returns: L-037 (from T-118), FP-003 (from T-089), episodic T-042

Without this, every session is a cold start. With it, the framework accumulates institutional knowledge — the same way it does in a well-run organisation.
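To show the shape of a lookup across the three layers, here is a deliberately simplified sketch. It ranks entries by shared words, which is a stand-in for semantic search and not how meaning-based recall actually works; the entry IDs and texts are invented for the example.

```python
def tokens(text):
    return set(text.lower().split())


def recall(query, layers):
    """Rank entries from every layer by word overlap with the query."""
    q = tokens(query)
    hits = []
    for layer, entries in layers.items():
        for entry_id, text in entries.items():
            score = len(q & tokens(text))
            if score:
                hits.append((score, layer, entry_id))
    # Highest score first; ties broken by entry ID for stable output.
    hits.sort(key=lambda h: (-h[0], h[2]))
    return [(layer, entry_id) for _, layer, entry_id in hits]


# Hypothetical memory contents.
layers = {
    "working": {"W-1": "editing payment processor"},
    "project": {"L-001": "authentication timeout caused by missing retry"},
    "episodic": {"T-001": "fixed login timeout in session refresh"},
}

results = recall("authentication timeout pattern", layers)
```

Whatever the scoring method, the useful property is the same: one query fans out over all three layers and returns IDs that point back to where the knowledge was recorded.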

The Watchtower dashboard surfaces tasks awaiting human verification, work direction, and system health in one view.

Component Fabric — structural awareness

Memory tells the agent what happened before. But it also needs to know what it is about to affect. In a programme, this is stakeholder impact analysis. In a codebase, it is dependency tracking.

The Component Fabric is a live topology map of every significant file in the project. 126 components across 12 subsystems, with 175 dependency edges tracked. Each component has a YAML card recording what it does, what it depends on, and what depends on it.

# What depends on this file?
$ fw fabric deps agents/git/git.sh
  → 6 dependents: commit.sh, hooks.sh, ...

# What will this commit break downstream?
$ fw fabric blast-radius HEAD
  → 3 files changed, 12 downstream components potentially affected

# Detect unregistered files (structural drift)
$ fw fabric drift
  → 2 unregistered files, 0 orphaned cards

The difference is between modifying a file without knowing its dependents and modifying it with a verified understanding of downstream impact.
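A blast radius is a transitive walk over reversed dependency edges. The sketch below uses a breadth-first search over a small hypothetical graph; the component names and the `blast_radius` function are illustrative, not taken from the framework.

```python
from collections import deque


def blast_radius(depends_on, changed):
    """Return every component that directly or transitively depends on `changed`."""
    # Invert the edges: for each component, who depends on it?
    dependents = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(component)

    seen = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                queue.append(dependent)
    return sorted(seen - set(changed))


# Hypothetical graph: checkout and billing both go through payments.
depends_on = {
    "logger": [],
    "stripe_client": ["logger"],
    "payments": ["stripe_client", "logger"],
    "checkout": ["payments"],
    "billing": ["payments"],
}

affected = blast_radius(depends_on, ["stripe_client"])
```

Changing `stripe_client` touches one file but reaches three components, two of them only indirectly. That indirect reach is what a direct-dependents lookup would miss.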

Interactive dependency graph — filter by subsystem, switch layouts, click nodes to inspect relationships.

The healing loop

Context and structural awareness handle the forward path. But what about failures?

When a task encounters issues, the framework classifies the failure, searches for similar patterns, and suggests recovery:

fw healing diagnose T-015            # Classify and suggest
fw healing resolve T-015 --mitigation "Added retry logic"  # Record as pattern

The escalation ladder is deliberate: A — do not repeat the same failure. B — improve technique. C — improve tooling. D — change ways of working. Over 312 completed tasks, these patterns accumulate. Resolutions from prior failures are surfaced when similar issues recur.

Continuous audit

The healing loop handles individual failures. To catch systemic drift, the framework audits itself. 90+ compliance checks run automatically — every 30 minutes, on every push, and on demand:

$ fw audit

=== SUMMARY ===
Pass: 94
Warn: 5
Fail: 2

This is the equivalent of assurance reporting. Not retrospective. Continuous. Drift is detected before it becomes a problem.
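The summary above is a tally over independent checks. A minimal sketch, assuming each check is a function returning "pass", "warn", or "fail"; the check names and the `run_audit` helper are made up for the example.

```python
from collections import Counter


def run_audit(checks):
    """Run each (name, check) pair and tally the outcomes."""
    results = {name: check() for name, check in checks}
    tally = Counter(results.values())
    summary = {"Pass": tally["pass"], "Warn": tally["warn"], "Fail": tally["fail"]}
    return results, summary


# Hypothetical checks; real ones would inspect the repository.
checks = [
    ("commits reference a task", lambda: "pass"),
    ("tasks have acceptance criteria", lambda: "pass"),
    ("handover is recent", lambda: "warn"),
    ("no unregistered files", lambda: "fail"),
]

results, summary = run_audit(checks)
```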

Evidence

I used the framework to build the framework. 312 tasks completed. 96% commit traceability across the full task history. Every architectural decision recorded with rationale and rejected alternatives.

A typical commit log:

27e8ed1 T-332: Research awesome list targets — 5 lists with ready-to-submit entries
d8cd81e T-326: Complete README rewrite — all 17 agent ACs + 5 screenshots verified
2138d17 T-329: Draft launch article — I built guardrails for Claude Code
25ba46e T-328: Add NOTICE file for Apache 2.0 attribution preservation
c6287d4 T-328: Add Apache 2.0 license (Geelen & Company) and update README

Each of these commits traces to a task. Every task has acceptance criteria that were verified before completion. Every decision is recorded with rationale. The framework is its own proof of concept.
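A traceability figure like the one quoted above can be computed from commit subjects alone. This sketch assumes the convention visible in the log, a subject that starts with a task ID such as "T-328:"; the function name is hypothetical and the subjects are abbreviated from the log.

```python
import re

TASK_PREFIX = re.compile(r"^T-\d+:")


def traceability(subjects):
    """Percentage of commit subjects that begin with a task ID."""
    if not subjects:
        return 0.0
    traced = sum(1 for s in subjects if TASK_PREFIX.match(s))
    return 100.0 * traced / len(subjects)


log = [
    "T-332: Research awesome list targets",
    "T-326: Complete README rewrite",
    "T-329: Draft launch article",
    "T-328: Add NOTICE file for Apache 2.0 attribution preservation",
    "T-328: Add Apache 2.0 license (Geelen & Company) and update README",
]

governed = traceability(log)
# Adding one ungoverned commit, like the "updates" example earlier, lowers the score.
mixed = traceability(log + ["updates"])
```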

The framework is built with and tested against Claude Code — that is where the full structural enforcement lives, via hooks that intercept every file modification, every destructive command, every context threshold. But the design is provider-neutral. Cursor gets .cursorrules generation and CLI governance. Copilot, Aider, Devin — any agent that can follow a system prompt and run shell commands gets the same fw CLI. One governance interface, regardless of which agent is executing.

A task is a rich artifact — acceptance criteria, verification commands, decisions, and episodic summary. Not a one-line ticket.

Where it stands

I use this daily for real work. 312 tasks completed. The governance model holds. The context continuity works. The healing loop genuinely improves over time. I would not go back to ungoverned agent development.

That said, the framework is alpha. It is under active development. There are bugs. There are rough edges. I have taken steps to make it usable for others — install script, Homebrew tap, documentation, GitHub Action — but it has not been tested by a wide audience yet.

If that sounds interesting, try it. If you find bugs, report them. If you see improvements, contribute. This is not a finished product — it is a working framework heading in the right direction.

Try it

# Install
curl -fsSL https://raw.githubusercontent.com/DimitriGeelen/agentic-engineering-framework/main/install.sh | bash

# Or via Homebrew
brew install DimitriGeelen/agentic-fw/fw

# Initialize in your project
cd your-project && fw init

# Start your first governed task
fw work-on "Set up project structure" --type build

Open source under Apache 2.0: github.com/DimitriGeelen/agentic-engineering-framework

The principle holds

Effective intelligent action requires clear direction, context awareness, awareness of constraints and impact, and capable engaged actors. This was true for Shell's global transitions. It is true for AI coding agents. The domain changed. The principle did not.

I am interested in how others are approaching governance for AI coding agents. If you have experience — or questions — I would welcome the conversation on GitHub Discussions.