DEV Community

Andrew Park for Edensoft Labs

Here are the Agentic Coding Standards I built for my team

I developed this standard for my team at Edensoft Labs for agentic coding in defense software, where systems stay deployed for decades and weak code becomes operational risk. Agentic coding tools are powerful, but not magic: they don't write maintainable code on their own, and code generation is just one part of software delivery. The rest is requirements, specification, verification against real system behavior, and refactoring. Agents generate code, but humans must own every line shipped. Here’s an outline of the standards:

Agentic Coding Standards

Tier 1: Requirements

Before any specification is written, the engineer must understand what needs to be built and whether an agent is the right tool to build it. These disciplines apply before any agent session begins. Skipping them is where the most expensive mistakes in agentic development are made.

01 Requirements interrogation

Before specifying a task for an agent, you need to know what you actually need to build. That sounds obvious. It isn’t. The most expensive mistakes in agentic coding happen not because the agent misunderstood the specification, but because the engineer didn’t fully understand the requirement before writing it.

Requirements interrogation is the discipline of surfacing what you don’t know before generation starts. It has 2 stages that happen in order. The first is stakeholder interrogation: before any technical specification is written, the engineer interviews the relevant stakeholders, including clients, product owners, domain experts, or other engineers, to surface assumptions, constraints, edge cases, and competing priorities that haven’t been explicitly stated. The second is agent interrogation: once the engineer understands the requirement, they give the agent a description of the task and ask it to interview them about every assumption, constraint, and edge case it needs to understand before proceeding. Both stages produce written artifacts that are archived after implementation so future engineers and agents can understand not just what was built but why.

Ask yourself these questions before starting any significant agent session. If you answer yes to 2 or more, don’t start an agent session yet. Complete requirements interrogation first:

  • Am I building something I haven’t built before?
  • Is there domain knowledge involved that I’m not fully confident in?
  • Do I have assumptions about what the stakeholder wants that I haven’t verified?
  • Am I unclear about the quality attribute mix that defines excellence for this product, or about how this task affects it?
  • Are there constraints I haven’t explicitly identified: security, compliance, integration, platform, or things the system must never do?
  • Am I uncertain about what “done” looks like for this task?
  • Could a misunderstanding here cost significant rework?
  • Could a misunderstanding here cause damage to customer mission outcomes?
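The gate above is a simple count. The following Python is a minimal sketch, not part of the standard itself: the checklist keys and the `needs_interrogation` helper are hypothetical names chosen for illustration, each answer is recorded as `True` when it signals a gap, and treating an unanswered question as a gap is an assumption of this sketch.

```python
# Hypothetical keys, one per checklist question; True means the answer signals a gap.
CHECKLIST = (
    "new_kind_of_work",
    "uncertain_domain_knowledge",
    "unverified_stakeholder_assumptions",
    "quality_attributes_unclear",
    "unidentified_constraints",
    "unclear_definition_of_done",
    "rework_risk",
    "mission_risk",
)

THRESHOLD = 2  # "2 or more" from the rule above


def needs_interrogation(answers: dict[str, bool]) -> bool:
    """Return True when enough gaps are present to delay the agent session."""
    unknown = set(answers) - set(CHECKLIST)
    if unknown:
        raise KeyError(f"unknown checklist keys: {sorted(unknown)}")
    # Assumption: an unanswered question counts as a gap, since not knowing is a risk signal.
    gaps = sum(1 for key in CHECKLIST if answers.get(key, True))
    return gaps >= THRESHOLD
```

Calling `needs_interrogation` with every key set to `False` except one returns `False`; a second `True` flips the result.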

[RI1] Conduct stakeholder interrogation before writing any specification. Document what was agreed, what was ruled out, and what remains uncertain: The document is the shared understanding between the stakeholders and the engineer. It’s not a summary of what you already knew. It’s a record of what was surfaced, tested, and agreed.

[RI2] After stakeholder interrogation, conduct agent interrogation. Ask the agent to interview you about every assumption, constraint, and edge case before any code is generated: The agent’s questions often surface gaps in the engineer’s own thinking that stakeholder interviews didn’t catch. The output is a refined written specification capturing the shared understanding between the engineer and the agent.

Return to stakeholder interrogation whenever agent interrogation reveals unanswered requirements questions. Engineers’ bias toward building over requirements discovery makes skipping this step one of the most common mistakes in software development.

[RI3] Both interrogation outputs are written artifacts stored in a durable location, not in memory or a chat window: Artifacts held only in memory or a chat session are lost. A written document that lives in a durable location gives every future engineer and agent access to the reasoning behind the decision.

Tier 2: Planning

These disciplines govern how the engineer translates requirements into a clear agent session setup. The agent is not yet generating code. The engineer is defining what the agent will do, confirming the task is suitable for agent execution, and establishing the constraints and verification mechanisms that will govern the session.

02 Task division and agent scope

Before directing an agent, the engineer needs to answer 2 questions: is the task fully specified, and can the output be independently verified? Getting both right is what separates agents that produce genuine leverage from agents that produce impressive-looking output that doesn’t hold up. Getting either wrong wastes time, burns token budget, and produces code that has to be discarded.

The first condition is specification clarity: can the task be fully specified, including constraints, edge cases, and what must not happen? The second is independent verifiability: can the output be checked without relying on the agent’s own explanation of what it did? These 2 conditions are the gate for autonomous and multi-agent workflows.

Agents fill underspecified gaps with defaults from their training data, not from your system’s requirements. An agent that doesn’t know the constraint won’t ask. It will make a plausible choice and move on. An agent whose output can only be evaluated by asking the agent whether it worked has not been verified. It has been trusted. Those are not the same thing.

A single agent producing a draft for immediate human review can operate under a lighter standard because the human is the verification mechanism.

[TD1] Before any significant agent session, confirm the task is fully specified and the output is independently verifiable: Both conditions must be satisfied before generation begins. These are the conditions under which agent output is trustworthy enough to enter a production codebase.

[TD2] If either condition is not yet met, complete the preparation work before starting the agent session: Specification gaps require more requirements work, design work, or stakeholder interrogation. Verification gaps require building a test harness, defining acceptance criteria, or identifying the system behavior the output must match. These are engineering tasks, not agent tasks. Starting the agent session before this work is done wastes tokens and produces output that can’t be safely accepted.

[TD3] If either condition can’t be satisfied, keep the agent in exploratory mode. Exploratory output should not be committed to the mainline until both conditions are met: Exploratory agent work surfaces questions, generates options, and exposes constraints the engineer hadn’t considered. That output informs the specification. It doesn’t become production code until both conditions are met.
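TD1 through TD3 amount to a two-condition gate. A minimal sketch of that gate follows; `TaskReadiness`, `SessionMode`, and `choose_mode` are hypothetical names introduced for illustration, not part of any tool.

```python
from dataclasses import dataclass
from enum import Enum


class SessionMode(Enum):
    PRODUCTION = "production"    # output may be proposed for the mainline
    EXPLORATORY = "exploratory"  # output informs the spec and is not committed


@dataclass(frozen=True)
class TaskReadiness:
    fully_specified: bool           # constraints, edge cases, and "must not happen" are written down
    independently_verifiable: bool  # checkable without the agent's own explanation


def choose_mode(readiness: TaskReadiness) -> SessionMode:
    """Apply the gate: both conditions must hold before production generation."""
    if readiness.fully_specified and readiness.independently_verifiable:
        return SessionMode.PRODUCTION
    return SessionMode.EXPLORATORY
```

The point of writing it down is that neither condition can compensate for the other: a perfectly specified task with no independent check still lands in exploratory mode.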

Tier 3: Foundational engineering habits

These are the habits every engineer on this team builds and maintains regardless of experience level. They govern what happens from the moment the agent begins generating output through the moment that output is accepted into the codebase.

03 Specification before generation

Requirements interrogation surfaces what to build. Task division confirms the work is ready for an agent. This discipline covers the third step: translating that understanding into a precise technical specification the agent can execute from, before any code is generated. The generation is the fast part. The specification is the work. Frame the session as a planning conversation, asking the agent to walk through options, explain what it would need to know, or describe what it understands before building. The built-in plan modes in most AI coding tools are insufficient on their own: they produce shallow plans that miss constraints, edge cases, and architectural implications. Treat them as a first pass, not a complete specification.

Before starting a specification conversation, re-read the relevant documentation, constraints, and architectural decisions in the codebase. The specification you give the agent is only as good as your current understanding of the system, and that understanding needs to be grounded in what is actually documented.

Before telling the agent to build anything, ask it to describe what it understands about the task, what it’s uncertain about, and what information it would need to proceed confidently. A 2-minute planning conversation that surfaces a misunderstanding costs almost nothing. A PR review that catches the same misunderstanding after the agent has generated 500 lines costs significantly more in both review time and token usage.

[SP1] Complete the specification conversation before any agent generates code: The generation is the fast part. Starting without a complete specification produces vague code quickly.

[SP2] Document constraints, invariants, and edge cases before generation starts: The agent needs to know what to build, what it must preserve, what it must not break, and what conditions it hasn’t been told about yet.

[SP3] Be explicit about what must not be sacrificed in pursuit of the objective: An agent given a single goal will optimize for that goal and ignore everything else. The spec isn’t just a description of success. It’s a list of constraints that define the boundaries of acceptable solutions.

04 Empiricism before action

Agents assume. You verify. An agent allowed to diagnose and fix based on assumptions will be wrong in ways that are hard to detect, because the fix will be internally consistent with its assumptions even when those assumptions are false. Show the agent the real data, read what it sees, and confirm its understanding before it touches anything.

[EM1] Before diagnosing or changing code, the agent must inspect actual data, logs, errors, tests, and system state: Don’t let an agent act on an assumption about system state. Require it to show you what it found before it touches anything.

05 Verification before trust

AI output looks polished, and polished is easy to mistake for correct. When agent-generated output contains confident but wrong rationale and that rationale gets documented and merged, it becomes the foundation future agents read and build on. Wrong documentation propagates. The agent that reads it next will treat it as authoritative and extend it.

An agent writing tests for its own code will write tests designed to pass against what it just built, not tests that probe whether the code does what the system actually needs. Independent verification defined before the agent generates anything is the only reliable check. If the agent can read your tests while it works, it will optimize for passing them.

The size of a change is not a reliable indicator of how much verification it needs. A large mechanical refactor is easier for an agent to execute correctly than a small change where the agent needs to understand why, not just what. For the mechanical change, the agent pattern-matches. For the other, it has to reason about constraints it may not fully understand, and that’s where errors happen.

Code that meets our Code Maintainability Best Practices standard gives the agent significantly more to work with. When constraints, invariants, and the reasoning behind decisions are clearly documented in the code, the agent can read them directly instead of guessing. That reduces the risk substantially. Code that doesn’t meet that standard leaves the agent filling those gaps with its defaults, which are shaped by its training data, not your system. The better your documentation, the safer the agent’s work will be on changes where it needs to understand why. This is one of the most concrete reasons our documentation standards exist.

Define your verification criteria before the agent generates the code. Criteria written after reviewing the output are shaped by what the agent produced, not by what the system actually needs.
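One way to fix criteria before generation is to commit them as data the agent's output is later run against. The sketch below assumes a hypothetical task (parse strings like "90s" or "2m" into seconds); the cases, the `verify` helper, and the convention that rejection means raising `ValueError` are all illustrative assumptions.

```python
from typing import Callable, Optional

# Written by the engineer BEFORE generation, for a hypothetical duration parser.
# Each case is (input, expected seconds); None means the input must raise ValueError.
ACCEPTANCE_CASES: list[tuple[str, Optional[int]]] = [
    ("90s", 90),
    ("2m", 120),
    ("0s", 0),
    ("-5s", None),  # negative durations must never be accepted
    ("", None),     # empty input must be rejected, not treated as zero
    ("5h", None),   # unsupported unit
]


def verify(candidate: Callable[[str], int]) -> list[str]:
    """Run the candidate against the pre-defined cases and describe every failure."""
    failures = []
    for text, expected in ACCEPTANCE_CASES:
        try:
            actual, rejected = candidate(text), False
        except ValueError:
            actual, rejected = None, True
        except Exception as exc:  # any other exception is a failure in its own right
            failures.append(f"{text!r}: unexpected {type(exc).__name__}")
            continue
        if expected is None:
            if not rejected:
                failures.append(f"{text!r}: should have been rejected, got {actual!r}")
        elif rejected or actual != expected:
            got = "rejection" if rejected else repr(actual)
            failures.append(f"{text!r}: expected {expected!r}, got {got}")
    return failures
```

An empty list from `verify` means the candidate met the criteria the engineer chose in advance. The rejection cases matter most here, because they encode what the system must not do.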

[VT1] Every piece of agent-generated output is verified against actual system behavior before being accepted, regardless of how confident the agent appeared: Confidence is not correctness. Verify against the system, not against the agent’s explanation of the system.

06 Document the why

Agent-generated code can be functionally correct, pass all tests, and still become dark code, meaning code nobody can explain, the moment it enters the codebase without documented intent. The code shows what was built. Only the engineer knows why it was built that way, what constraints shaped the solution, and what must not be changed without understanding those constraints. If that reasoning stays in the engineer’s head or the chat session, it’s lost the moment the session ends. The next engineer or agent to touch that code starts from zero.

Documenting the why is not a cleanup task. It happens as the agent generates, not after. The engineer directs the agent to document intent, rationale, and constraints inline as it produces code. A codebase where every significant decision is explained at the point of decision gives future engineers and agents the context they need to work safely. A codebase where that reasoning is absent is one that accumulates dark code with every session.

[DW1] Direct the agent to document intent, rationale, and constraints inline as it generates code: Like human developers, agents have a bias toward producing working code over producing maintainable code. Even when directed to document intent inline, an agent will often skip it in favor of completing the coding task. Don’t rely on the instruction alone. Direct the agent to document intent as it generates, then at the end of each run, before closing the session, direct it to review its output and fill any documentation gaps. The reasoning is still fresh in the session context; that window closes the moment the session ends.

[DW2] Every significant decision made during an agent session must be traceable in the code: What was built, why it was built that way, what constraints shaped the solution, and what must not be changed without understanding those constraints. If a future engineer or agent can’t recover this from reading the code alone, the documentation is incomplete.

[DW3] Don’t accept agent-generated code that lacks documented rationale for significant decisions: Treat missing documentation as a defect, not a style preference. An agent that generates without documenting is producing code that’s functionally correct today and unmaintainable tomorrow.

07 Refactor before you merge

Agent-generated code is a starting point, not a deliverable. Agents can produce code that works and passes tests while still lacking architectural fit with the existing codebase. They introduce duplicate logic, organize code in ways that conflict with existing structure, and mix approaches within the same module. These problems are visible to anyone with strong maintainability instincts. This is what technical debt at machine speed looks like: not a single catastrophic decision, but hundreds of functionally correct ones that nobody with the right instincts reviewed.

[RF1] Refactor for architectural fit before review: Code that works but doesn’t fit the architecture creates debt on merge. Fix it before it becomes someone else’s problem.

[RF2] Refactor for naming consistency before review: Agents default to their own naming conventions, not yours. Inconsistent naming makes a codebase unreadable fast.

[RF3] Refactor for duplication before review: Agents frequently reimplement logic that already exists. Duplicate logic means 2 places to change when requirements shift.

[RF4] Refactor for simplicity before review: Agents bias toward complexity. If a simpler solution exists, the agent probably didn’t find it. You have to.

If you get this order backward and ask first whether the feature works, a clean diff and passing tests will trick you into approving design damage every time. RF1 through RF4 are actions the submitting engineer takes before review. The following 5 questions are what the reviewer applies before approving:

[RF5] Does it fit the existing architecture?: If the code introduces a pattern that conflicts with how the rest of the system is organized, it doesn’t belong here yet.

[RF6] Does it preserve conceptual integrity?: The system should feel like it was designed by one mind. Every addition either reinforces that or erodes it.

[RF7] Does it introduce complexity that wasn’t worth the cost?: New complexity should solve a real problem. If the simpler version would have worked, use it.

[RF8] Does it damage quality attributes that matter more than the feature itself?: Know your product’s quality attribute priorities before you approve. A feature that ships faster at the cost of maintainability or security is a bad trade.

[RF9] Is this the simplest good solution, or just the fastest plausible one?: Fast and plausible is how dark code enters the codebase. Simple and correct is what we’re building toward.

Tier 4: Agent supervision skills

These disciplines start at the individual engineer level and become critical as team size grows. A single engineer directing agents needs task boundaries, active monitoring, and accountability discipline just as much as a team of engineers does. When multiple engineers are running agent sessions simultaneously, the coordination demands multiply: individual output velocity can explode while system coherence degrades. If you’re seeing 2 or more of these symptoms, your agents may be producing code faster than your team can coordinate on what’s going into the system:

  1. Duplicate logic appearing independently across modules
  2. Inconsistent patterns for solving the same problem
  3. Abstractions that conflict with each other across the codebase
  4. Dependencies that nobody on the team intended to introduce
  5. PRs that are locally coherent but don’t fit the system’s overall design

The response is tighter workflow discipline, more frequent architectural alignment across engineers, and explicit checkpoints where the team reconciles what each agent has produced against the system’s overall design. Velocity that destroys coherence is not productivity.

08 Task boundaries and structured workflows

Agents drift as work gets dense: making assumptions they shouldn’t, skipping steps, crossing responsibility boundaries quietly. The only control is breaking work into small, well-defined tasks with enough checkpoints that drift doesn’t go undetected for long.

[TB1] Break agent work into bounded tasks with clear checkpoints: Each task should have a defined start state, a defined end state, and a way to verify the agent reached the right end state before moving on.
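A bounded task can be represented as a small record with an engineer-owned check attached. This is a minimal sketch under stated assumptions: `BoundedTask` and `run_with_checkpoints` are hypothetical names, and `execute` stands in for however your tool hands a task to the agent.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class BoundedTask:
    name: str
    start_state: str            # what must be true before the agent begins
    end_state: str              # what must be true when the agent is done
    verify: Callable[[], bool]  # engineer-owned check, independent of the agent


def run_with_checkpoints(
    tasks: list[BoundedTask],
    execute: Callable[[BoundedTask], None],
) -> list[str]:
    """Execute tasks in order and stop at the first one that fails verification."""
    completed = []
    for task in tasks:
        execute(task)          # hand the task to the agent (supplied by the caller)
        if not task.verify():  # checkpoint: do not build on unverified work
            break
        completed.append(task.name)
    return completed
```

Stopping at the first failed checkpoint is the design choice that matters: later tasks never run on top of an end state nobody confirmed.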

Even for seemingly well-defined tasks, before the agent executes anything, ask it to map its intermediate steps and review them before giving the go-ahead. A task that looks like a clean A to B may involve decisions at each intermediate step that the agent would otherwise make silently during execution. Having the agent articulate those steps first surfaces specification gaps and gives it explicit anchors to return to if it starts to drift. The planning step costs almost nothing relative to discovering a specification gap mid-run (see CU3).

[TB2] Build a validation loop into the task wherever possible: An agent that can close its own validation loop catches its own errors faster and requires less corrective supervision. Build the verification mechanism before you build the feature.

[TB3] When agent output gets shaky, reduce the scope. Tighter boundaries produce more dependable agents: If a task is too large, the agent will drift, forget constraints, and produce output that doesn’t match the task definition. Reduce the scope of the task. A well-bounded task is more dependable than a large one, regardless of how long the session has been running.

No single agent pass can simultaneously optimize for functional correctness, architectural fit, naming consistency, maintainability, and your product’s quality attribute priorities. The model won’t reliably hold all of those constraints at once.

[TB4] Specification first. Generation second. Verification third. Refactoring fourth. Each pass has one job: Breaking it into passes gives each constraint a chance to be applied properly. That’s how you get to code that meets all the standards, not just the ones the agent happened to prioritize.

Keep the work plan for any multi-session task in the codebase as a simple document tracking what needs to be done, what’s done, and what comes next. Agents have no persistent memory between sessions. Each session starts completely fresh, with no knowledge of prior decisions, completed work, or the reasoning behind either. A durable work plan in the codebase is the only reliable way to maintain continuity across sessions.
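The work plan does not need tooling, but a predictable format lets a fresh session find its place quickly. The sketch below assumes the plan is a text file using checkbox lines (`- [x]` for done, `- [ ]` for pending); that format and the helper names are assumptions for illustration, not a requirement of the standard.

```python
from typing import Optional


def parse_plan(plan_text: str) -> list[tuple[bool, str]]:
    """Return (done, description) for every checkbox line in the plan."""
    items = []
    for line in plan_text.splitlines():
        stripped = line.strip()
        if stripped.startswith(("- [x] ", "- [X] ")):
            items.append((True, stripped[6:].strip()))
        elif stripped.startswith("- [ ] "):
            items.append((False, stripped[6:].strip()))
    return items


def next_step(plan_text: str) -> Optional[str]:
    """Return the first unfinished item, or None when the plan is complete."""
    for done, description in parse_plan(plan_text):
        if not done:
            return description
    return None
```

Lines that are not checkbox items, such as headings or notes on why a decision was made, are ignored by the parser and remain available to human readers.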

Treat agent skills, system prompts, and orchestration configurations as production code. Version control them, document their purpose and constraints, and test them before relying on them in real work. A prompt stored only in a chat window or someone’s memory has no history, no review, and no way to recover when it stops working.

[TB5] For significant multi-step work, structure the workflow as an orchestrator session directing subagent sessions: The main session acts as a lightweight orchestrator: it manages task state, coordinates handoffs, and incorporates human feedback. Each significant subtask is delegated to a separate subagent session with its own clean context window focused on a single task. This pattern prevents context pollution proactively. Each subagent starts fresh with only the context it needs, produces more reliable output, and costs fewer tokens per task than a single long session attempting everything in sequence. Use a task list in the codebase to track what each subagent has completed and what comes next. When handing off between sessions, frame the handoff as a prompt for the next agent rather than a narrative summary. The next session can use it directly as its opening context. Sanity check the handoff before closing the session to confirm it is coherent and complete.

[TB6] When working in a single session, manage context rot by ending the session early and handing off deliberately: Agent reliability degrades as sessions grow longer and context fills up. As more tokens accumulate, reasoning quality declines and information from earlier in the session gets lost. In our experience, degradation becomes noticeable when the context window reaches 25% to 40% of capacity. When output gets unreliable, ask the agent to write a handoff prompt for the next session covering what was decided, what was completed, what the next step is, and what context the fresh session needs to proceed. The same degradation that caused you to end the session also affects the agent’s ability to accurately summarize what it completed, so review the handoff carefully before closing the session. An incorrect handoff will cause the fresh session to build on wrong assumptions. Once verified, that handoff prompt becomes the opening context for the fresh session, along with the current architecture diagram and the relevant prior code.
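If your tool reports token usage, the handoff trigger can be checked mechanically. This is a sketch under assumptions: the threshold uses the lower bound of the range reported above, the function and constant names are hypothetical, and how you obtain `tokens_used` depends on your tool.

```python
# Assumption: lower bound of the range quoted above; tune per tool and model.
HANDOFF_THRESHOLD = 0.25

# The four items the handoff must cover, phrased as a request to the agent.
HANDOFF_REQUEST = (
    "Write a handoff prompt for a fresh session. Cover:\n"
    "1. What was decided\n"
    "2. What was completed\n"
    "3. What the next step is\n"
    "4. What context the fresh session needs to proceed\n"
    "Address it to the next agent as instructions, not as a narrative summary."
)


def should_hand_off(
    tokens_used: int,
    context_window: int,
    threshold: float = HANDOFF_THRESHOLD,
) -> bool:
    """Return True once context usage reaches the chosen fraction of the window."""
    if context_window <= 0:
        raise ValueError("context_window must be positive")
    return tokens_used / context_window >= threshold
```

The check only tells you when to ask for the handoff. Reviewing what the agent writes in response is still the engineer's job.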

09 Active agent monitoring

Structured workflows define the plan. Active monitoring is what keeps execution on track. Agents drift, cross responsibility boundaries quietly, and interpret ambiguous instructions in ways that are locally coherent but globally wrong. Catch drift early, while it’s cheap to correct.

Different agent tools require different supervision approaches. Some agents work interactively in a conversation, where you can redirect at any point throughout the session. Others run autonomously on a defined task and hand back results when done, so supervision happens at the handoff. Know which mode your agent is operating in and adjust how you check its work accordingly.

Agents can drift quietly while all visible indicators stay green. Tests pass, diffs look reasonable, and problems accumulate invisibly until they’re expensive to fix. The only reliable check is inspecting what the agent produces during the session, not just at the end.

Build constraints into the system for any action where a mistake would have serious consequences. An instruction to an agent carries no guarantee because agents are by nature probabilistic. They can drift, be inconsistent, or simply get it wrong, and even your clearest instruction won’t stop that from happening. A system-level constraint will be your guardrail against this risk.

[AM1] Inspect intermediate output during the session, not just the final diff: By the time you see the final diff, the agent may have made a dozen architectural decisions you didn’t intend.

[AM2] For actions that can’t be undone, remove the capability structurally: If the action can’t be undone and an agent performs it incorrectly, no amount of engineering discipline after the fact can recover the situation. Remove the capability at the system level before the session begins.

[AM3] Don’t rely on agent instructions to prevent destructive actions: File system permissions, database roles, and branch protection rules make destructive actions structurally impossible. Instructing an agent not to destroy things reduces the risk. It doesn’t eliminate it. Instructions are probabilistic. Architecture is not.

[AM4] When you review agent-generated code, it’s your responsibility to ensure it’s built on correct assumptions about the system, not just whether it works: Agents increasingly catch and fix syntax errors before returning output, but they frequently produce code built on wrong assumptions. These failures don’t show up in tests because tests only catch what someone thought to test for.

Engineer-defined tests encode judgment the agent doesn’t have: what the system must do, what it must not do, and what edge cases matter in this domain. They’re only as good as the engineer’s understanding of the system, which is why the disciplines in Tier 1 and Tier 5 matter.

Automated tests that the engineer defines before reviewing the output are one of the most effective mechanisms for validating assumptions. The human's domain knowledge determines what to test for, and the tests encode that judgment independently of the agent.

[AM5] Implement hooks to enforce structural constraints deterministically in the agent's execution pipeline: Hooks are deterministic code that executes at defined points in the workflow regardless of agent instructions or behavior. Use them to enforce invariants the agent must not violate: file path restrictions, external call rate limits, audit logging, database access controls, and any other constraint where instruction-based enforcement is insufficient. A hook that prevents the agent from writing to a protected path is categorically more reliable than an instruction telling the agent not to. Claude Code supports ‘.claude/settings.json’ configuration files at several scopes, from organization-managed policy down to the individual repo, where you can define explicit allow, ask, and deny permissions for specific actions. This is a direct structural implementation of AM5 and AM3. Other major agentic coding tools don’t currently offer an equivalent. This capability is likely to become standard across the industry. As soon as your tool supports it, use it. Ask the agent to help generate the initial permission configuration; it will often identify permission boundaries you hadn’t thought to specify.
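To make the protected-path example concrete, here is a tool-agnostic sketch of the check such a hook might run before a write is allowed. It is not Claude Code's hook API or any vendor's: the protected directories and function names are hypothetical, and wiring the check into a specific tool is left out.

```python
from pathlib import PurePosixPath

# Hypothetical protected locations, relative to the repository root.
PROTECTED_ROOTS = (
    PurePosixPath("deploy"),
    PurePosixPath("secrets"),
    PurePosixPath(".github/workflows"),
)


def _normalize(relative_path: str) -> PurePosixPath:
    """Resolve '..' lexically and reject paths that leave the repository root."""
    raw = PurePosixPath(relative_path)
    if raw.is_absolute():
        raise ValueError("absolute paths are not allowed")
    parts: list[str] = []
    for part in raw.parts:
        if part == "..":
            if not parts:
                raise ValueError("path escapes the repository root")
            parts.pop()
        else:
            parts.append(part)
    return PurePosixPath(*parts)


def is_write_allowed(relative_path: str) -> bool:
    """Deny writes to protected roots; fail closed on anything suspicious."""
    try:
        path = _normalize(relative_path)
    except ValueError:
        return False
    # Lexical check only: symlinks are not resolved in this sketch.
    return not any(path == root or root in path.parents for root in PROTECTED_ROOTS)
```

The check runs the same way every time, whatever the agent was told, which is the property AM5 asks for. A real deployment would pair it with file system permissions, since a lexical check does not see symlinks.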

[AM6] For large or architecturally significant agent runs, ask the agent at the end of the session to list any decisions it made that weren't explicitly covered in the specification: Apply this when the diff is too large to trace line by line, or the task touched architectural boundaries: shared interfaces, data models, external integrations, invariants, or anything other components depend on. On long runs, agents will commonly fill specification gaps with their own judgment without telling you they have done so. Triage the list: rework decisions that conflict with the architecture, and document those that are acceptable so they become part of the codebase's traceable rationale.

10 Review discipline that scales with output

If our agents produce 5x the code we used to, our review process has to handle that volume without degrading. A review process that rubber-stamps agent-generated PRs because the tests passed isn’t a review process. It’s a vulnerability pipeline.

As your use of agentic tools increases, review capacity has to scale with it. More agent output means more review load. If you don’t plan for that explicitly, reviews get rushed and the quality bar drops. Plan for how much you need to review, not just how fast you can produce.

[RD1] Reviewers must evaluate agent-generated PRs for pattern consistency, architectural fit, and maintainability, not just test results: Test results tell you whether the code works. They don’t tell you whether the code belongs in this system.

[RD2] Apply RF5 through RF9 before approving any agent-generated PR: These are the minimum questions every reviewer asks. If you can’t answer all 5, the PR is not ready to merge.

11 Accountability for agent output

Although generating code is now much cheaper, owning it in production costs exactly what it always did. Every engineer who has led a team knows this feeling: although you didn’t write that code, your name is on it because you let it ship. That’s exactly the accountability standard that applies here. Every line an agent produces carries the same long-term maintenance burden as a line you wrote by hand. The volume goes up. The ownership cost doesn’t go down.

Accountability doesn’t move to the tool. When nobody is really responsible, technical debt grows because nobody feels empowered to push back. I expect this team to push back.

Agent workflows are production code. Document them to the same standard we hold all production code to in our Code Maintainability Best Practices training. A workflow that only works because one engineer understands how it was built is a liability. Any engineer, current or future, should be able to read the documentation and take full ownership of it.

[AC1] If you check in agent-generated code, your name is on it. Review it accordingly: The agent generated it. You’re responsible for it. That accountability doesn’t transfer to the tool, the model, or the person who asked you to ship faster.

[AC2] Ownership means comprehension. If you merged it, you can explain what it does, why it was built that way, and what it touches in the system: Code you can’t explain is dark code, and dark code is a liability you’ve signed your name to.

Ownership also includes documentation. If the rationale for significant decisions is not in the code, the next engineer or agent starts without the context that shaped your solution. See discipline 06 Document the why.

12 Cost-effective agent usage

Agentic coding gives one engineer the output leverage of several manual coders at a fraction of the labor cost. The goal is not to minimize token usage. The goal is to maximize engineering leverage. A cheap session that produces shallow reasoning, weak verification, or fragile code is not cost-effective. A more expensive session that saves hours of engineering time, reduces review burden, or prevents rework is a good trade.

The most expensive agent sessions usually start with weak specification. A vague prompt that produces the wrong code and requires 3 correction cycles costs several times more than a well-specified session that gets to the right design before generation begins. Those correction cycles also produce larger diffs, confused architecture, and more review burden. Cost discipline starts with SP1 through SP3.

Agents are not a substitute for simpler deterministic tools. Search, grep, test runners, linters, type checkers, compiler errors, and log analysis often answer narrow questions faster, cheaper, and more reliably than asking an agent to reason its way to the same answer. Use the agent where reasoning, synthesis, planning, or refactoring judgment is actually needed.

[CU1] Optimize for engineering leverage, not raw token minimization: The cost of agentic coding should be evaluated against engineer time, review time, defects, rework, and technical debt. Saving tokens while wasting human judgment is a bad trade.

[CU2] Before any significant agent session, define the question being answered, the files in scope, the expected output, and when to stop.
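One way to make CU2 hard to skip is to write the four items down in a fixed shape before the session starts. The sketch below is illustrative only: `SessionBrief` and its field names are hypothetical, not part of any agent tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SessionBrief:
    """Hypothetical container for the four items CU2 asks for."""

    question: str                    # the question the session must answer
    files_in_scope: tuple[str, ...]  # paths the agent may read or change
    expected_output: str             # what a finished session produces
    stop_condition: str              # when the agent should stop

    def missing_fields(self) -> list[str]:
        """Return the names of any items that were left empty."""
        missing = []
        if not self.question.strip():
            missing.append("question")
        if not self.files_in_scope:
            missing.append("files_in_scope")
        if not self.expected_output.strip():
            missing.append("expected_output")
        if not self.stop_condition.strip():
            missing.append("stop_condition")
        return missing

    def to_prompt(self) -> str:
        """Render the brief as the opening text of a session."""
        files = "\n".join(f"- {path}" for path in self.files_in_scope)
        return (
            f"Question: {self.question}\n"
            f"Files in scope:\n{files}\n"
            f"Expected output: {self.expected_output}\n"
            f"Stop when: {self.stop_condition}"
        )
```

If `missing_fields()` returns anything, the session isn't ready to start.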

[CU3] Before the agent executes a task, ask it to map its intermediate steps and review them before giving the go-ahead: A planning prompt costs a negligible number of tokens. A task that looks like a clean A to B may involve several intermediate decisions the agent would otherwise make silently during execution. Reviewing the plan before execution surfaces specification gaps early and gives the agent explicit anchors that reduce drift during the run.

Open-ended exploration through the codebase is one of the fastest ways to burn cost without producing value.

[CU4] Match the model to the task. Use lighter models for mechanical work and reserve frontier models for tasks that require complex reasoning: Using a frontier model for mechanical work, such as formatting, renaming, boilerplate generation, or simple refactors, wastes token budget without improving output quality. Reserve more capable and expensive models for tasks that require architectural reasoning, complex domain judgment, or synthesis across large amounts of context. Most agentic coding tools support model selection at the session or task level. Use it deliberately.

For high-stakes or expensive sessions, run a small evaluation pass across model options before committing to the full session. A few test prompts against candidate models cost a fraction of a failed long session.
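The CU4 split between mechanical and reasoning-heavy work can be made explicit rather than decided ad hoc. This is a minimal sketch under assumptions: the task categories come from the examples in CU4, while the tier labels `"light"` and `"frontier"` are placeholders you would map to whatever models your tool offers.

```python
from enum import Enum


class TaskKind(Enum):
    # Mechanical work named in CU4
    FORMATTING = "formatting"
    RENAMING = "renaming"
    BOILERPLATE = "boilerplate"
    SIMPLE_REFACTOR = "simple_refactor"
    # Reasoning-heavy work named in CU4
    ARCHITECTURE = "architecture"
    DOMAIN_JUDGMENT = "domain_judgment"
    LARGE_CONTEXT_SYNTHESIS = "large_context_synthesis"


MECHANICAL = {
    TaskKind.FORMATTING,
    TaskKind.RENAMING,
    TaskKind.BOILERPLATE,
    TaskKind.SIMPLE_REFACTOR,
}


def choose_model_tier(kind: TaskKind) -> str:
    """Return a placeholder tier label for the given kind of task."""
    return "light" if kind in MECHANICAL else "frontier"
```

The value is less in the function than in forcing the engineer to classify the task before the session starts.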

[CU5] Provide the context that matters for the task, organized so the agent can reason from the right facts: Unfocused context dumps waste tokens and can dilute the agent’s attention. Give the agent architectural context, constraints, invariants, prior decisions, and relevant examples for this specific task.

[CU6] Don’t make the agent rediscover context you already have: If you know the relevant files, design decisions, error logs, or prior attempts, provide them up front. Making the agent search for information you already have wastes tokens and increases the chance it builds on the wrong assumptions.

[CU7] Stop and rescope when the agent starts thrashing: Repeated failed attempts, circular explanations, expanding diffs, and increasingly complex fixes are signs that the task boundary is wrong or the context is insufficient. Stop, clarify the problem, reduce scope, and restart with better boundaries. If rescoping doesn’t resolve the thrashing, the cause may be context rot rather than task boundary. Apply TB6.
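Two of the CU7 signals, repeated failed attempts and expanding diffs, are easy to track mechanically. The sketch below is a rough heuristic, not a definition of thrashing: the `Attempt` record and the threshold of 3 consecutive failures are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    passed: bool     # did verification pass after this attempt?
    diff_lines: int  # size of the diff the attempt produced


def is_thrashing(attempts: list[Attempt], window: int = 3) -> bool:
    """True when the last `window` attempts all failed and each diff grew.

    The window size is an assumed threshold; tune it to your workflow.
    """
    if len(attempts) < window:
        return False
    recent = attempts[-window:]
    all_failed = all(not attempt.passed for attempt in recent)
    diffs_growing = all(
        later.diff_lines > earlier.diff_lines
        for earlier, later in zip(recent, recent[1:])
    )
    return all_failed and diffs_growing
```

A `True` result is a prompt to stop and rescope, not a verdict. Circular explanations and increasingly complex fixes still need a human to notice.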

[CU8] Capture useful understanding from agent sessions in durable artifacts: Architecture notes, task plans, decision records, and verified examples in the codebase prevent repeated token burn across sessions. If the agent produced useful understanding, write it down so the next session doesn’t rediscover it.

[CU9] Treat excessive token usage as a workflow problem, not a budgeting problem: It means the engineer didn’t bound the task, supply the right context, use the right tools, or stop when the agent drifted. The fix is better workflow discipline.

Tier 5: Systems engineering discipline

These disciplines require the systems engineering mindset: the ability to hold a system’s conceptual integrity at scale, across time, and across thousands of micro-decisions made by agents that have no understanding of the system’s organizing principles. They develop through years of continuous learning and deliberate practice.

13 Architectural ownership

Agents read files. They don’t see systems. An agent can understand what a function does, what a module contains, and what a class is responsible for. What it can’t reliably reconstruct from reading alone is the structural map of the system: which components call which, how data flows across layers, what the actual impact boundary of a change is. That understanding doesn’t live in any single file. It emerges from the relationships between files, and current agents can only approximate it by reading, which becomes less reliable the larger and more interconnected the system gets. This may improve as the technology evolves, but today the structural gap is real and the human has to fill it.

The engineers who get the most from agentic coding at scale are also the ones doing the most continuous architectural work:

  • Re-orienting agents at session boundaries.
  • Pushing back on output that conflicts with established patterns.
  • Watching for agent decisions that could change the system’s structure, and making sure those decisions aren’t made by default.

One of the most effective techniques for maintaining architectural continuity across sessions is pointing agents explicitly at prior solved problems in the codebase before generating new code. Every piece of code you write with care today gives future agents a clear example to follow when they work on related problems.

Encode your architectural rules, naming conventions, and structural constraints clearly enough that an agent can check compliance on every PR. Leverage agents to apply well-defined rules consistently and at scale. Don’t trust your agents to make important judgment calls that require understanding the system’s history, trade-offs, and consequences. Those are yours to make.

[AO1] Before every session, re-orient the agent to the architectural patterns and invariants of the system. Don’t assume it remembers: Context resets between sessions. The agent that helped you yesterday has no memory of what you built or why. Reconstruct that context deliberately, or the agent will make architectural decisions from scratch. The most effective implementation of AO1 is a context file that every session reads automatically. AGENTS.md is the emerging cross-tool standard, supported by Claude Code, GitHub Copilot, Cursor, Windsurf, and others, for encoding architectural patterns, invariants, and coding standards at the repo level. Claude Code additionally reads CLAUDE.md. Maintaining this file is itself an engineering discipline: keep it current, keep it precise, and treat it as the architectural memory that persists across every session.

Planning documents, architectural notes, and specification artifacts that no longer reflect the current state of the system are worse than no context at all: they give the agent false confidence about constraints and decisions that may have changed. Before providing prior artifacts as session context, verify they’re current. Outdated artifacts should be marked as historical or removed.

[AO2] Before directing an agent to make a change, trace the full impact boundary yourself. The agent can’t do this reliably: Identify what else calls this code, what invariants could be affected, what tests cover the affected behavior, and what architectural assumptions the change touches. If you don’t do it before the session starts, nobody will.
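A deterministic first pass can help with the "what else calls this code" part of the trace. The sketch below uses Python's standard `ast` module to list functions whose bodies call a given name. It assumes a Python codebase and matches by name only, so it misses dynamic dispatch and can't tell apart two functions that share a name. Treat the result as a starting list for your own trace, not as the impact boundary itself.

```python
import ast


def find_callers(source: str, target: str) -> list[str]:
    """Return the names of functions in `source` that call `target`.

    Matching is by name only: both `target(...)` and `obj.target(...)` count.
    Calls inside nested functions are also attributed to the outer function.
    """
    callers = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for inner in ast.walk(node):
            if not isinstance(inner, ast.Call):
                continue
            func = inner.func
            if isinstance(func, ast.Name):
                called = func.id
            elif isinstance(func, ast.Attribute):
                called = func.attr
            else:
                called = None
            if called == target:
                callers.add(node.name)
                break
    return sorted(callers)


# Hypothetical module used only to demonstrate the function.
SAMPLE = """
def parse(x):
    return x

def load(path):
    return parse(path)

def report(obj):
    return obj.parse("a")

def unrelated():
    return 1
"""
```

Here `find_callers(SAMPLE, "parse")` reports `load` and `report`. Invariants, test coverage, and architectural assumptions still have to be traced by the engineer.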

[AO3] Keep system architecture diagrams current. Update them as soon as the architecture changes, not later: These are not planning documents. They describe the current state of the system. An agent given an accurate architecture diagram before a session has a structural map of the system that it can’t reconstruct from reading files alone. An outdated diagram is worse than no diagram: it will shape the agent’s reasoning in the wrong direction. Every session where an agent touches architectural concerns should begin with the current diagram in context.

Store architecture diagrams as Mermaid files in the repository. Mermaid diagrams are text-based, can be version-controlled, and are readable by both engineers and agents. This makes the architecture a first-class artifact of the codebase rather than a document that lives elsewhere and drifts. When a session touches architectural concerns, ask the agent to propose Mermaid diagram updates at the end of the session. Review and approve them before closing the session. The approved diagram becomes the opening context for the next session.
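Because Mermaid is plain text, a diagram can be compared against the dependencies the code actually has. The sketch below is an assumption-heavy illustration: it only understands simple `A --> B` edges (no edge labels, node labels, or subgraphs), the component names are hypothetical, and it expects you to supply the actual edges from whatever dependency analysis you already run.

```python
import re

# Matches only the simplest Mermaid edge form, e.g. "    Ingest --> Parser".
EDGE = re.compile(r"^\s*(\w+)\s*-->\s*(\w+)\s*$")


def mermaid_edges(diagram: str) -> set[tuple[str, str]]:
    """Extract (source, target) pairs from simple Mermaid flowchart edges."""
    edges = set()
    for line in diagram.splitlines():
        match = EDGE.match(line)
        if match:
            edges.add((match.group(1), match.group(2)))
    return edges


def diagram_drift(
    diagram: str, actual: set[tuple[str, str]]
) -> dict[str, set[tuple[str, str]]]:
    """Compare documented edges with the edges the code really has."""
    documented = mermaid_edges(diagram)
    return {
        "missing_from_diagram": actual - documented,
        "stale_in_diagram": documented - actual,
    }


# Hypothetical diagram and dependency set, for demonstration only.
DIAGRAM = """flowchart TD
    Ingest --> Parser
    Parser --> Store
"""
ACTUAL = {("Ingest", "Parser"), ("Parser", "Cache")}
```

With these inputs, `diagram_drift(DIAGRAM, ACTUAL)` flags `Parser --> Cache` as undocumented and `Parser --> Store` as stale. Either result means the diagram should not go into the next session's context until it has been corrected.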

[AO4] Run an agent-assisted architectural drift scan at minimum monthly, and more frequently on active codebases: Agentic coding sessions each make local decisions that appear reasonable in isolation. Across many sessions those decisions compound. Duplicated implementations of the same concept appear because agents don’t see what was already solved elsewhere. Abstractions become inconsistent because each session invents its own. Dependencies accumulate because agents default to adding rather than reusing. Structural changes happen as side effects of feature work and go undetected until they have calcified. No individual PR review catches this. It only becomes visible when you look at the codebase as a whole.

Configure an AI coding agent to scan for: duplicated implementations of the same concept, abstractions that no longer match the system’s actual structure, dependency growth that wasn’t deliberately chosen, and structural changes that bypassed the architectural review process. The output is a report reviewed by the engineer responsible for architectural ownership. The scan surfaces candidates. The engineer judges what they mean and what action to take. Treating scan findings as authoritative without that judgment is as dangerous as not scanning at all.
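Some of the scan can be backed by deterministic checks that feed the agent's report. As a minimal sketch of the first item, duplicated implementations, the code below groups Python functions whose bodies are structurally identical, using the standard `ast` module. It assumes a Python codebase and only catches exact structural copies: a duplicate with renamed variables or reordered statements will not match, which is where the agent and the engineer come in.

```python
import ast
from collections import defaultdict


def duplicate_functions(sources: dict[str, str]) -> list[list[str]]:
    """Group functions with structurally identical bodies.

    `sources` maps a file path to its source text. Each returned group
    lists "path:function" entries that share one body fingerprint.
    """
    groups = defaultdict(list)
    for path, source in sources.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                # Fingerprint the body only, so the function name is ignored.
                fingerprint = "".join(ast.dump(stmt) for stmt in node.body)
                groups[fingerprint].append(f"{path}:{node.name}")
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


# Hypothetical files used only to demonstrate the function.
SOURCES = {
    "a.py": "def clamp(v, lo, hi):\n    return max(lo, min(v, hi))\n",
    "b.py": (
        "def bound(v, lo, hi):\n    return max(lo, min(v, hi))\n\n"
        "def other(v):\n    return v + 1\n"
    ),
}
```

Here `duplicate_functions(SOURCES)` pairs `a.py:clamp` with `b.py:bound`. As with the rest of the scan, these are candidates for the owning engineer to judge, not findings to act on automatically.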

14 Simplicity as a forcing function

Agents bias toward complexity because complicated solutions abound in their training data. They frequently default to Gang of Four design patterns and distributed system architectures where simpler solutions would be superior for maintainability and performance. These defaults are consistent enough that every engineer on the team should be checking for them on every PR, not just the ones with enough experience to recognize them instinctively.

When directing an agent to decompose a problem, orient it toward deeper modules with clean interfaces. Don’t let the agent default to service decomposition unless the operational and organizational case for it is explicit. Agents reach for microservices because they’re well-represented in training data, not because the architecture calls for it.

[SI1] Challenge every new abstraction, layer, dependency, and generalized solution before accepting it: The burden of proof is on complexity, not simplicity. If you can’t articulate why the abstraction is necessary now, it isn’t.

[SI2] Design modules with a small, simple interface and substantial internal functionality: A module built this way is easier for agents to work with for 3 concrete reasons. First, the agent can understand what the module does by reading its interface without having to trace through all its internal dependencies. Second, tests can wrap the module at its boundary and verify behavior without needing to mock a dozen internal collaborators. Third, when the agent makes a change inside the module, the impact is contained and can’t accidentally break something outside the module boundary. Shallow modules force agents to trace dependency chains across many small files to understand what’s happening, and agents that can’t see the full dependency graph make wrong assumptions. This is the same structural blindness problem described in discipline 10, compounded at the design level. The goal is a module with a focused responsibility and a clean boundary, not a god module that absorbs unrelated concerns because they happened to be convenient to add. If a module is growing in ways that make it harder to understand or test, it has crossed the line.
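As a toy illustration of the shape SI2 describes, the module below exposes one public function and keeps normalization, tokenizing, and validation behind it. The duration-parsing domain and all names are hypothetical, chosen only because they fit in a few lines.

```python
import re

# Internal details: callers never need to know about these.
_SECONDS_PER_UNIT = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([hms])")


def parse_duration(text: str) -> int:
    """Public interface: convert text like '1h 30m' to seconds.

    Raises ValueError if the text is not a valid duration.
    """
    cleaned = _normalize(text)
    return sum(
        int(amount) * _SECONDS_PER_UNIT[unit]
        for amount, unit in _tokenize(cleaned)
    )


def _normalize(text: str) -> str:
    """Strip whitespace and case differences before tokenizing."""
    return text.strip().lower().replace(" ", "")


def _tokenize(text: str) -> list[tuple[str, str]]:
    """Split into (amount, unit) pairs, rejecting any leftover characters."""
    tokens = _TOKEN.findall(text)
    if not tokens or "".join(amount + unit for amount, unit in tokens) != text:
        raise ValueError(f"not a valid duration: {text!r}")
    return tokens
```

A test only needs to call `parse_duration` at the boundary, with nothing to mock, and a change to `_tokenize` can't leak past that one function.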

[SI3] Design for Service → Namespace → Class → Method navigability. Any engineer or agent should find the relevant code in seconds and safely ignore 99% of the rest: This is the practical test for whether your simplicity discipline is actually working. If a maintenance engineer has to trace through multiple layers of indirection, hunt across files with no clear organizational logic, or read large amounts of unrelated code to understand what needs to change, the system has failed this test. The same navigability that serves human maintainers serves agents: an agent that can locate the relevant code quickly and ignore everything else makes better decisions, burns fewer tokens, and is far less likely to produce changes that affect code it was never supposed to touch.

15 Domain and product knowledge

Agents make architectural tradeoffs by default, and those defaults reflect the average of their training data, not your product’s actual needs. A naval sensor system needs availability, accuracy, and adaptability. A defense analytics platform needs scalability, performance, and compliance. An architecture that serves one can actively damage another. An engineer who doesn’t understand the product deeply can’t recognize when an agent is optimizing for the wrong quality attributes.

[DK1] Before directing an agent on a feature, understand the user needs, product constraints, and quality attributes that shape the solution: If you don’t, the agent fills the gap with training-data defaults, producing code shaped for the wrong product entirely. The debt you create is yours to pay down.

Code that works, passes tests, and looks clean, but that nobody can explain, is risky “dark code”. Strong discipline is needed to leverage agents to achieve 2x-10x productivity gains. Weak discipline will result in accumulating technical debt at machine speed.

You can get the full 29 page document from my LinkedIn page: https://www.linkedin.com/pulse/engineering-standards-agentic-software-development-edensoft-park-ki1se/
