Arseni Kavalchuk
Coding Agents for Software Engineers

Architecture, Context, and Efficient Usage


1️⃣ What Is a Coding Agent?

A coding agent is not just an LLM.

It is a system:

```
IDE / CLI
    ↓
Agent Runtime
    ↓
Context Builder
    ↓
LLM Inference
    ↓
Tool Execution (fs, git, tests, shell)
    ↓
Loop
```

The model is only the reasoning engine.
The runtime handles orchestration.
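That orchestration can be sketched as a loop. This is a minimal illustration, not any specific product's runtime; the context builder, LLM call, and tool runner are stubbed-out parameters:

```python
# Minimal sketch of an agent runtime loop (all names are illustrative).
# The shape is: build context -> infer -> execute tools -> check -> iterate.

def run_agent(task, build_context, call_llm, run_tools, max_turns=5):
    history = []
    for _ in range(max_turns):
        context = build_context(task, history)   # Context Builder
        action = call_llm(context)               # LLM Inference (reasoning only)
        result = run_tools(action)               # Tool Execution (fs, git, tests)
        history.append((action, result))
        if result.get("done"):                   # Loop Controller decides to stop
            break
    return history
```

The model never drives this loop; the runtime does, and the model only sees whatever `build_context` hands it each turn.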


2️⃣ General Architecture of a Coding Agent

A production-grade coding agent includes:

1. Indexing Layer

  • Repo scanning
  • Symbol extraction
  • Dependency graph
  • Optional embeddings

2. Context Builder

  • Select relevant files
  • Inject instructions
  • Add plan/scratchpad
  • Add recent edits

3. LLM Inference Layer

  • Tokenized prompt
  • Context window constraints
  • Streaming output

4. Tool Layer

  • File read/write
  • Test execution
  • Git diff/patch
  • Lint/build commands

5. Loop Controller

  • Plan
  • Execute
  • Validate
  • Iterate

The model does not “see the repo.”
The agent chooses what to send.
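That selection step can be sketched as a ranking problem. Real agents use symbol graphs and embeddings; this hypothetical version uses naive keyword overlap just to show the shape:

```python
# Illustrative sketch: the context builder ranks candidate files and sends
# only the top few, instead of dumping the whole repo into the prompt.

def select_files(task_keywords, files, budget=2):
    # files: {path: source text}; score = crude keyword-overlap relevance
    def score(path):
        text = files[path].lower()
        return sum(text.count(k) for k in task_keywords)

    ranked = sorted(files, key=score, reverse=True)
    return ranked[:budget]  # only the top-scoring files enter the context
```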


3️⃣ What Is the Context Window?

The context window is:

The maximum number of tokens the model can attend to in a single inference call.

It includes:

```
System instructions
+ AGENTS.md / policies
+ Scratchpad / plan files
+ Relevant source files
+ Recent conversation
+ Tool outputs
+ Your current request
+ Model output
```

Everything must fit inside the window.

Large window ≠ send everything.
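Fitting those parts into the window is a packing problem. A hedged sketch, assuming a priority-ordered list of parts and a rough tokens-per-character estimate:

```python
# Sketch: pack context parts into a fixed token window, most important first.
# The len(s) // 4 estimator is a crude heuristic, not a real tokenizer.

def pack_context(parts, window=8000, estimate=lambda s: len(s) // 4):
    # parts: list of (priority, text); lower number = more important
    packed, used = [], 0
    for _, text in sorted(parts):
        cost = estimate(text)
        if used + cost > window:
            break  # budget exhausted; everything must fit inside the window
        packed.append(text)
        used += cost
    return "\n".join(packed)
```

System instructions get priority 0; a giant log file gets a high number and is the first thing dropped.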


4️⃣ Where Does Tokenization Happen?

Typically:

  • The agent runtime tokenizes locally (client-side).
  • It estimates token usage before calling the model.
  • The server still processes tokens during inference.

Why client-side tokenization matters:

  • Avoid exceeding context
  • Control cost
  • Control chunking
  • Optimize file selection
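A minimal sketch of that client-side check, assuming the common rough heuristic of ~4 characters per token for English text and code (production runtimes use the model's actual tokenizer):

```python
# Hedged sketch of client-side token estimation. chars // 4 is a crude proxy,
# good enough to decide whether a file set can possibly fit before paying
# for an inference call.

def estimate_tokens(text):
    return max(1, len(text) // 4)

def fits_in_window(files, window=128_000, reserve_for_output=4_000):
    # files: {path: source text}; reserve headroom for the model's output
    total = sum(estimate_tokens(t) for t in files.values())
    return total <= window - reserve_for_output
```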

5️⃣ What Actually Consumes Tokens?

In coding workflows, token cost usually comes from:

  • Large source files
  • Test files
  • Logs
  • Replayed conversation history
  • Repeated system instructions
  • Scratchpad growth

Your instruction verbosity is rarely the main cost.
File selection is.
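Replayed history is the silent token sink on that list. One common mitigation is to keep only the last few turns and replace the rest with a summary slot; a hypothetical sketch:

```python
# Illustrative sketch: cap replayed conversation history at the last N turns.
# The summary line is a placeholder for wherever the agent stores its recap.

def prune_history(turns, keep_last=4):
    if len(turns) <= keep_last:
        return turns
    dropped = len(turns) - keep_last
    summary = f"[{dropped} earlier turns summarized elsewhere]"
    return [summary] + turns[-keep_last:]
```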


6️⃣ What Makes “Good Quality” Context?

Good context is:

✅ Relevant

Only include files that matter.

✅ Structured

Clear task → constraints → deliverable.

✅ Deterministic

Explicit scope boundaries.

✅ Minimal but sufficient

No narrative fluff.
No repeated architecture explanation.

Bad context is:

  • Entire repo dump
  • Long emotional explanations
  • Old irrelevant chat history
  • Ambiguous instructions

7️⃣ What Actually Improves Coding Responses?

Not politeness.
Not verbosity.

What improves results:

1️⃣ Clear Scope

Bad:

Improve authentication system.

Good:

```
Scope:
- src/auth/*
- src/middleware/auth.ts
Do not touch:
- public API
- schema definitions
```

2️⃣ Explicit Constraints

Examples:

  • Do not change public interfaces.
  • Preserve test behavior.
  • No new dependencies.
  • Keep diff minimal.

Constraints reduce hallucinated refactors.
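Constraints like these can also be enforced mechanically after the model responds. A simplified sketch of a post-edit guard (real agents diff ASTs and run the test suite; the symbol check here is plain substring matching):

```python
# Sketch: reject a patch that violates the constraints above -- too large a
# diff, or a removal line touching a public symbol. Deliberately simplified.

def validate_patch(diff_lines, public_symbols, max_changed=50):
    changed = [l for l in diff_lines
               if l.startswith(("+", "-")) and not l.startswith(("+++", "---"))]
    if len(changed) > max_changed:
        return "diff too large"           # violates "keep diff minimal"
    for line in changed:
        if line.startswith("-") and any(s in line for s in public_symbols):
            return "public interface touched"
    return "ok"
```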


3️⃣ Defined Output Format

Example:

```
Deliverable:
- Unified diff only
- Brief explanation (<150 words)
```

This reduces drift.


4️⃣ Plan-First Workflow

Instead of:

Implement feature X.

Use:

```
Step 1: Generate PLAN.md
Step 2: Review plan
Step 3: Implement Step 1 only
Step 4: Run tests
```

Planning reduces chaotic edits.
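The same workflow, sketched as a loop (the step format and callbacks are assumptions, not a real API): implement one plan step, run the tests, and stop the moment something fails.

```python
# Illustrative plan-first loop: one focused edit per step, verified as you go.

def execute_plan(steps, implement, run_tests):
    completed = []
    for step in steps:
        implement(step)              # one step, not the whole feature
        if not run_tests():
            return completed, step   # stop and surface the failing step
        completed.append(step)
    return completed, None
```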


8️⃣ Scratchpads and Plan Files

Scratchpad = external reasoning state.

Can be:

  • PLAN.md
  • TODO.md
  • In-memory state
  • Conversation buffer

Benefits:

  • Multi-step tracking
  • Reduced re-reasoning
  • Human-agent alignment
  • Safer iteration

But:
The model does not remember it.
The agent injects it into context each time.
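That re-injection is mechanical. A minimal sketch, assuming the scratchpad lives in a `PLAN.md` file and the prompt shape is illustrative:

```python
# Sketch: the model is stateless, so the agent re-reads the scratchpad from
# disk and prepends it on every turn. File name and layout are assumptions.

def build_turn_prompt(read_file, user_request):
    plan = read_file("PLAN.md")  # external memory, re-read each call
    return f"Current plan:\n{plan}\n\nTask:\n{user_request}"
```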


9️⃣ Efficient Project Structure for Coding Agents

Recommended:

```
/AGENTS.md        # Global behavior rules (minimal)
/PLAN.md          # Task plan (editable)
/src/...
/tests/...
```

AGENTS.md should contain:

  • Coding standards
  • Test commands
  • “Plan first” rule
  • Guardrails

Keep it short.
It is injected often.


🔟 Efficient Coding Agent Usage Patterns

Pattern A — Constrained Patch

```
Task:
Optimize middleware performance.

Scope:
src/auth/middleware.ts

Constraints:
- Preserve API
- No new deps

Output:
Unified diff only.
```

Pattern B — Incremental Execution

```
Implement only Step 1 from PLAN.md.
Run tests.
Update PLAN.md.
Stop.
```

Pattern C — Scope Locking

Explicitly limit directories:

```
Touch only:
src/auth/*
Do not modify:
src/db/*
```

This prevents token waste and unintended edits.
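Scope locking works best when the tool layer enforces it, not just the prompt. A sketch using glob patterns (the patterns mirror the example above; the function name is illustrative):

```python
# Sketch: enforce scope locking in the tool layer -- reject file writes
# outside the allowed globs before they happen.
from fnmatch import fnmatch

def write_allowed(path, allow, deny):
    if any(fnmatch(path, pat) for pat in deny):
        return False                # explicit deny wins
    return any(fnmatch(path, pat) for pat in allow)
```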


1️⃣1️⃣ What NOT to Do

❌ Send the whole repo
❌ Re-explain system architecture every turn
❌ Let scratchpads grow unbounded
❌ Leave scope ambiguous
❌ Ask for “improve everything”


1️⃣2️⃣ Big Context Myth

A 1M-token context window does not mean:

  • You should send 1M tokens.
  • Responses will be faster.
  • Responses will be more accurate.

Longer context:

  • Increases latency
  • Increases cost
  • Increases noise risk

Smart context selection beats large context.


1️⃣3️⃣ Mental Model for Engineers

Treat coding agents like this:

```
LLM        = Stateless reasoning engine
Context    = Input data packet
Agent      = Orchestrator
Scratchpad = External memory
```

Your job:
Optimize the data packet.


1️⃣4️⃣ Core Optimization Principles

  1. Structure > verbosity
  2. Relevance > completeness
  3. Constraints > freedom
  4. Iteration > giant prompts
  5. Plan → execute → verify

Final Takeaway

Coding agents perform best when:

  • The task is clearly scoped
  • Constraints are explicit
  • Context is curated
  • Plans are externalized
  • History is pruned
  • Output format is constrained
