DEV Community

Lucas

Stop Treating AI Coding Assistants Like One Big Chat Box

Most teams start using AI coding tools in the same way: one big chat box, one broad instruction, one increasingly long context window.

That works for small tasks. It breaks down when the hard part is not generating code, but preserving engineering judgment.

The more useful pattern I have seen is to stop treating the assistant as one generalist and start treating it as a small team of specialists:

  • an architecture reviewer
  • a repository-pattern reviewer
  • a workflow reviewer
  • a test writer
  • a component reviewer
  • a type auditor
  • a form assistant
  • an infrastructure reviewer

The important part is not the agent names. The important part is that each agent has a narrow job, a known checklist, and a shared source of truth.

That changes the developer experience from:

"Ask the AI to help and hope it remembers the rules."

to:

"Route this change through the same standards every time."

DevEx Is Mostly Remembering the Same Decisions

A lot of engineering quality is not glamorous. It is remembering decisions the team already made:

  • Do not call services across bounded contexts.
  • Do not hand-edit generated API clients.
  • Do not use offset pagination for list endpoints.
  • Do not pass connection-bound objects into durable workflow steps.
  • Do not put business logic inside shared UI primitives.
  • Do not make up citations when source material is missing.
  • Do not run infrastructure plans for the whole world when only one Terragrunt unit changed.

Humans can remember these rules, but humans get tired. Pull requests get large. Reviewers rotate. Project context gets stale.

Specialist agents help because they turn recurring review comments into repeatable workflows.

An architecture reviewer does not need to know everything. It needs to know dependency direction, service boundaries, ports and adapters, circular imports, and where orchestration belongs.

A repository reviewer does not need to critique React components. It needs to know access envelopes, cursor pagination, protocol stubs, session patterns, and data-access boundaries.

A component reviewer does not need to understand DB workflows. It needs to know design tokens, generated API hooks, form state, server state, and component boundaries.

That narrowness is the feature.

The Pattern: Agents Plus Shared Knowledge

The strongest setup I found had three layers:

  1. Shared knowledge files
  2. Specialist agents that read those files
  3. Hooks, skills, or commands that invoke the agents at useful moments

The shared knowledge files were not magic. They were plain Markdown documents covering things like:

  • architectural fitness
  • repository patterns
  • workflow patterns
  • testing principles
  • frontend standards
  • infrastructure conventions
  • documentation rules

That sounds boring. It is exactly why it works.

If the architecture rule changes, you update the rule once. The architecture reviewer, scaffolding skill, test generator, and human documentation all point back to the same source.
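As a rough sketch of that idea, an agent definition can hold only paths to the shared Markdown files and read them at invocation time, so an edited rule reaches every agent on its next run. The `Agent` class, the registry, the glob patterns, and the file paths below are all hypothetical and stand in for whatever agent tooling a team actually uses.

```python
from dataclasses import dataclass
from fnmatch import fnmatch
from pathlib import Path


@dataclass(frozen=True)
class Agent:
    name: str
    knowledge_files: tuple  # paths to shared Markdown standards, never copies of them
    path_globs: tuple       # which changed files this agent cares about


# Hypothetical registry; names, paths, and globs are illustrative only.
AGENTS = (
    Agent("architecture-reviewer", ("docs/architectural-fitness.md",), ("src/*.py",)),
    Agent("component-reviewer", ("docs/frontend-standards.md",), ("web/*.tsx",)),
)


def agents_for(changed_files):
    """Return the agents whose globs match at least one changed file."""
    return [
        agent
        for agent in AGENTS
        if any(fnmatch(f, g) for f in changed_files for g in agent.path_globs)
    ]


def build_prompt(agent, repo_root: Path) -> str:
    """Read the standards at call time so a rule edit reaches every agent at once."""
    sections = [
        (repo_root / rel).read_text(encoding="utf-8") for rel in agent.knowledge_files
    ]
    header = f"You are {agent.name}. Enforce only these standards:"
    return header + "\n\n" + "\n\n".join(sections)
```

Note that `fnmatch` lets `*` match across `/`, so `src/*.py` also covers nested files. A hook or command would call `agents_for` with the changed paths and hand each resulting prompt to the assistant.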

That gives you a useful DevEx loop:

  1. Scaffold with the standard.
  2. Review with the same standard.
  3. Fix drift when the standard and code disagree.
  4. Promote new lessons back into the shared knowledge base.

This is much better than a prompt library full of duplicated, slightly stale instructions.

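Here is the shape: shared knowledge files feed the specialist agents; hooks, skills, and commands invoke those agents; and lessons from review flow back into the knowledge files.

The shape is simple, but the feedback loop matters. The agent is not the source of truth. The source of truth is the standard the team can read, review, and change.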

Backend Agents: Architecture and Security as Reviewable Work

On the backend side, the most useful agents were not "write me a feature" agents. They were reviewers for engineering constraints.

The architecture reviewer checked things like:

  • circular dependencies
  • domain dependency direction
  • direct service-to-service calls
  • port and adapter boundary violations
  • folder structure drift
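Some of these checks are deterministic enough that the reviewer can lean on a small helper instead of reading every import by eye. The following is a minimal sketch of circular-dependency detection over an import graph. The graph shape (module name mapped to the set of modules it imports) and the module names in the usage note are assumptions; how the graph gets extracted from a real codebase is out of scope here.

```python
from typing import Optional


def find_cycle(imports: dict) -> Optional[list]:
    """Return one import cycle as a list of module names, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, fully explored
    colour = {}
    stack = []

    def visit(module):
        colour[module] = GREY
        stack.append(module)
        for dep in sorted(imports.get(module, ())):
            state = colour.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path, so the path closes a cycle
                return stack[stack.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        stack.pop()
        colour[module] = BLACK
        return None

    for module in sorted(imports):
        if colour.get(module, WHITE) == WHITE:
            cycle = visit(module)
            if cycle:
                return cycle
    return None
```

For example, a graph where `orders` imports `billing`, `billing` imports `notifications`, and `notifications` imports `orders` is reported as one cycle, which gives the agent a concrete finding to explain rather than a vague warning.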

The repository reviewer focused on data access:

  • access envelopes for authorization
  • cursor-based pagination
  • repository protocol conventions
  • in-memory adapter session patterns
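To make the pagination rule concrete, here is a small in-memory sketch of cursor-based paging. The cursor format (base64-encoded JSON holding the last seen `id`) and the function names are assumptions for illustration, not the project's actual repository protocol.

```python
import base64
import json
from typing import Optional


def encode_cursor(last_id: int) -> str:
    """Opaque cursor: callers pass it back unchanged and never parse it."""
    return base64.urlsafe_b64encode(json.dumps({"after_id": last_id}).encode()).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after_id"]


def list_page(rows: list, limit: int, cursor: Optional[str] = None):
    """Return up to `limit` rows after the cursor, plus the next cursor or None."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    after_id = decode_cursor(cursor) if cursor is not None else None
    # In SQL this is roughly: WHERE id > :after_id ORDER BY id LIMIT :limit + 1
    matching = [r for r in rows if after_id is None or r["id"] > after_id]
    candidates = sorted(matching, key=lambda r: r["id"])[: limit + 1]
    page = candidates[:limit]
    has_more = len(candidates) > limit  # the extra row only signals another page
    return page, (encode_cursor(page[-1]["id"]) if has_more else None)
```

Unlike offset pagination, the next page is defined by the last row the caller saw, so rows inserted or deleted earlier in the ordering do not shift later pages.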

The durable-workflow reviewer checked framework-specific footguns:

  • workflow step parameters must be serializable
  • guards should run first
  • queues should be configured intentionally
  • workflows should not create unit-of-work state in the wrong place
  • orchestration should be separated from execution for testability
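The first rule in that list can be checked mechanically. This sketch assumes step parameters must round-trip through JSON; a real durable-workflow framework may use a different serializer, so treat `assert_serializable` as a hypothetical helper rather than any framework's API.

```python
import json


def assert_serializable(step_name: str, **params) -> None:
    """Raise TypeError if any step parameter cannot be encoded as JSON."""
    for key, value in params.items():
        try:
            json.dumps(value)
        except (TypeError, ValueError) as exc:
            raise TypeError(
                f"{step_name}: parameter {key!r} is not serializable ({exc})"
            ) from exc


class FakeConnection:
    """Stands in for a connection-bound object such as a DB session."""


# Plain identifiers and numbers pass.
assert_serializable("charge_order", order_id="o-1", amount_cents=1200)

# A connection-bound object is rejected before the step is ever scheduled.
try:
    assert_serializable("charge_order", order_id="o-1", session=FakeConnection())
except TypeError as error:
    print(error)
```

The practical fix is usually to pass an identifier into the step and let the step open its own connection.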

The testing agent encoded another important stance: avoid mocks when a real in-memory adapter or real library object gives more trustworthy feedback.
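A minimal sketch of that stance, assuming a hypothetical `OrderRepository` port: the test exercises a real in-memory adapter that satisfies the same protocol as the production one, so it checks stored state instead of asserting on mock calls. All names here are illustrative.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass(frozen=True)
class Order:
    id: str
    total_cents: int


class OrderRepository(Protocol):
    """The port. Production and in-memory adapters both satisfy it."""

    def get(self, order_id: str) -> Optional[Order]: ...
    def save(self, order: Order) -> None: ...


class InMemoryOrderRepository:
    def __init__(self) -> None:
        self._orders = {}

    def get(self, order_id: str) -> Optional[Order]:
        return self._orders.get(order_id)

    def save(self, order: Order) -> None:
        self._orders[order.id] = order


def apply_discount(repo: OrderRepository, order_id: str, percent: int) -> Order:
    order = repo.get(order_id)
    if order is None:
        raise LookupError(order_id)
    discounted = Order(order.id, order.total_cents * (100 - percent) // 100)
    repo.save(discounted)
    return discounted


# The test asserts on what ended up in the repository, not on call counts.
repo = InMemoryOrderRepository()
repo.save(Order("o-1", 1000))
apply_discount(repo, "o-1", 10)
assert repo.get("o-1") == Order("o-1", 900)
```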

This is a good use of AI because these reviews are specific. The agent is not being asked to "make the code better." It is being asked to check a bounded set of architectural contracts.

That makes the output easier to trust, easier to dispute, and easier to improve.

Frontend Agents: Make the Happy Path the Standard Path

The frontend setup had a similar shape, but different specialists:

  • a component reviewer
  • a type auditor
  • a form assistant
  • a test generator

The component reviewer checked design-system and React patterns:

  • use design tokens
  • avoid hardcoded colors and arbitrary style values
  • keep shared UI components free of business logic
  • use generated API hooks for server state
  • do not use useEffect for derived state
  • extract complex logic into hooks
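A few of these checks can be backed by a cheap deterministic pre-pass that the agent then explains. As one hedged example, this scans component source for hex color literals. The regex is a heuristic and an assumption on my part: it will not catch `rgb()` values or named colors, and it can flag hex-like strings that are not colors.

```python
import re

# Matches #rgb, #rgba, #rrggbb, and #rrggbbaa literals.
HEX_COLOR = re.compile(r"#(?:[0-9a-fA-F]{3,4}|[0-9a-fA-F]{6}|[0-9a-fA-F]{8})\b")


def find_hardcoded_colors(source: str) -> list:
    """Return (line_number, literal) pairs for hex color literals in the source."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for match in HEX_COLOR.finditer(line):
            findings.append((lineno, match.group(0)))
    return findings
```

Feeding the findings to the component reviewer keeps its output anchored to specific lines instead of general advice about tokens.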

The type auditor enforced the use of generated API types, a ban on the any type, discriminated unions, safe narrowing, and constant patterns.

The form assistant encoded TanStack Form plus Zod conventions so every form did not become a one-off design exercise.

The test generator reduced the blank-page cost of writing tests by giving the project a known Vitest and React Testing Library shape: provider wrappers, API hook mocks, context setup, and behavior-first assertions.

This is where AI-driven DevEx gets practical. A developer should not have to rediscover the local pattern every time they add a form or route. The agent can carry that boring context.

Infrastructure Agents: Fewer Global Plans, More Targeted Feedback

Infrastructure is where "helpful AI" can become risky fast.

The better pattern is not to let an agent improvise Terraform. The better pattern is to give agents very explicit repository rules:

  • which guidance file is canonical
  • which directories map to AWS, Azure, shared modules, and bootstrap code
  • which generated files must not be edited
  • how Terragrunt layering works
  • which environment names are real
  • when native terraform test is preferred
  • how Checkov skips must be justified
  • how stale docs should be treated

The DevEx win is focus.

Instead of every pull request triggering broad, noisy infrastructure feedback, a targeted pipeline can map changed files to the affected Terragrunt units, run plans per unit, avoid state-lock collisions with per-target concurrency, and reserve production changes for manual promotion.
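The file-to-unit mapping at the start of that pipeline can be sketched in a few lines. This assumes each unit is a directory containing a `terragrunt.hcl` and that the set of unit directories has already been collected. The function name and the paths are hypothetical, and the sketch deliberately ignores changes to shared modules, which would need a dependency lookup.

```python
from pathlib import PurePosixPath


def affected_units(changed_files: list, unit_dirs: set) -> list:
    """Map each changed file to its nearest enclosing Terragrunt unit directory."""
    units = set()
    for changed in changed_files:
        # parents walks from the closest directory up to the repository root
        for parent in PurePosixPath(changed).parents:
            if str(parent) in unit_dirs:
                units.add(str(parent))
                break  # the nearest unit wins
    return sorted(units)


units = affected_units(
    ["aws/dev/vpc/terragrunt.hcl", "aws/dev/vpc/variables.tf", "README.md"],
    {"aws/dev/vpc", "aws/dev/eks"},
)
print(units)
```

Each returned unit can then get its own plan job, with concurrency keyed on the unit path so two runs never contend for the same state lock.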

That is not only safer. It is also kinder to developers.

Fast feedback is not just about speed. It is about making the feedback small enough that a human can reason about it.

Runtime Agents Need Guardrails Too

There is another side of this pattern: product runtime agents.

In AI-powered products, the agent architecture itself becomes part of DevEx. If the product is one giant prompt, debugging is miserable. If the product emits typed intermediate artifacts, debugging becomes possible.

The strongest runtime patterns I saw had the model produce structured plans and proposals, while deterministic code owned execution.

Examples:

  • A data analyst agent chooses between direct reasoning and a typed query plan.
  • Local code validates and compiles the plan into deterministic SQL.
  • A validator agent runs separate correctness and completeness checks.
  • A presentation agent emits slide events rather than mutating a deck directly.
  • The system builds a ghost preview before a user accepts the change.

In practice, the flow runs from a model proposal, through local validation, to a preview, and only then to an explicit commit.

The ghost preview is especially important.

AI should not silently mutate important state when review matters. It should propose candidate events, show the resulting state, and let deterministic code commit the accepted proposal.
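Here is a minimal sketch of that propose, preview, commit split, assuming a toy deck model. The event types and function names are hypothetical. The point is only that the reducer is pure and that committed state changes solely on acceptance.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Deck:
    slides: tuple = ()


@dataclass(frozen=True)
class AddSlide:
    title: str


@dataclass(frozen=True)
class RemoveSlide:
    index: int


def apply_event(deck: Deck, event) -> Deck:
    """Pure reducer: returns a new deck and never mutates its input."""
    if isinstance(event, AddSlide):
        return replace(deck, slides=deck.slides + (event.title,))
    if isinstance(event, RemoveSlide):
        if not 0 <= event.index < len(deck.slides):
            raise ValueError(f"no slide at index {event.index}")
        kept = deck.slides[: event.index] + deck.slides[event.index + 1 :]
        return replace(deck, slides=kept)
    raise TypeError(f"unknown event: {event!r}")


def preview(deck: Deck, proposal: list) -> Deck:
    """Ghost preview: fold the proposed events over the current state."""
    for event in proposal:
        deck = apply_event(deck, event)
    return deck


def commit(deck: Deck, proposal: list, accepted: bool) -> Deck:
    """Only an accepted proposal changes the committed state."""
    return preview(deck, proposal) if accepted else deck
```

Because `preview` and `commit` share the same reducer, what the user sees in the preview is what gets committed, and a rejected proposal leaves the deck untouched.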

This gives developers better test surfaces:

  • test the model adapter contract
  • test the event reducer
  • test proposal acceptance
  • test rejection and refinement
  • test confidence thresholds
  • test replay and audit behavior

It also gives users a better mental model: "Here is what will happen if you accept."

The Best Agents Are Boring

The best agents in this pattern are not flashy.

They are boring in the way good linters are boring. They remember the team standard. They apply it consistently. They complain in predictable ways.

That does not replace human review. It changes what human review is for.

Instead of spending senior-engineer time repeating:

  • "Use the generated type."
  • "This should be cursor pagination."
  • "This shared component has business logic."
  • "This workflow step cannot serialize that dependency."
  • "This Terraform example uses the wrong environment name."

the team can spend review time on product behavior, tradeoffs, naming, risk, and whether the standard itself should change.

A Practical Way to Start

If you want to try this pattern, do not start with ten agents.

Start with one recurring pain:

  • frontend component drift
  • test setup inconsistency
  • repository access mistakes
  • architecture boundary violations
  • infrastructure plan noise

Then write down the rules as a small knowledge file.

After that, create one specialist agent with:

  • a narrow purpose
  • files it should read
  • rules it must enforce
  • examples of violations
  • an output format that is easy to scan

For example, a first pass might be as small as this:

```yaml
name: component-reviewer
purpose: Review React components for local project standards.
reads:
  - docs/frontend-standards.md
  - docs/design-tokens.md
checks:
  - Shared UI components do not call APIs.
  - Server state uses generated query hooks.
  - Forms use the project form library and schema validation.
  - Colors use design tokens, not hardcoded values.
  - Derived state is computed directly, not mirrored with effects.
output:
  - file
  - issue
  - why it matters
  - suggested fix
```

That is enough to be useful. It is narrow, testable, and easy for a reviewer to disagree with.

Once that works, add one generator or scaffolder that uses the same knowledge.

That pairing matters. Generation and review should not teach different styles.

What to Watch For

This pattern has failure modes.

The big one is drift. If the code and the knowledge files disagree, agents will confidently enforce stale rules.

The fix is to treat agent guidance like code:

  • review it
  • test it through real tasks
  • keep it close to the codebase
  • delete outdated rules
  • document intentional exceptions
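One cheap way to treat guidance like code is a CI check over the agent definitions themselves. This sketch only verifies that every knowledge file an agent claims to read still exists. The input shape (agent name mapped to a list of relative paths) is an assumption, and it says nothing about whether the rules inside those files are still correct.

```python
from pathlib import Path


def missing_knowledge_files(agent_reads: dict, repo_root: Path) -> dict:
    """Return, per agent, the listed knowledge files that no longer exist."""
    problems = {}
    for agent, paths in agent_reads.items():
        missing = [p for p in paths if not (repo_root / p).is_file()]
        if missing:
            problems[agent] = missing
    return problems
```

A non-empty result can fail the build, which turns a renamed or deleted standard into a visible error instead of an agent that silently reviews with less context.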

Another failure mode is over-broad agents. If an agent owns "quality," it owns nothing. Give each agent a job small enough that its findings are specific.

The final failure mode is letting prompts replace deterministic systems. For runtime AI, use models to produce structured intent, not to bypass validation, authorization, tests, or audit trails.

The Takeaway

The DevEx breakthrough is not "AI writes more code."

The better breakthrough is:

AI can make engineering standards easier to apply.

Specialist agents work because they encode team memory. Shared knowledge files work because they reduce drift. Typed proposals and deterministic execution work because they make AI behavior debuggable.

That is the version of AI-assisted development I want more of: less magic, more leverage.
