Architecture, Context, and Efficient Usage
1️⃣ What Is a Coding Agent?
A coding agent is not just an LLM.
It is a system:
```
IDE / CLI
   ↓
Agent Runtime
   ↓
Context Builder
   ↓
LLM Inference
   ↓
Tool Execution (fs, git, tests, shell)
   ↓
Loop
```
The model is only the reasoning engine.
The runtime handles orchestration.
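That split can be sketched as a minimal loop, where the model only proposes the next action and the runtime executes tools and decides when to stop. All names here are illustrative stand-ins, not a real agent framework API:

```python
# Minimal sketch of an agent runtime loop. The model proposes actions;
# the runtime executes tools and feeds results back until done.
# All function names are hypothetical, for illustration only.

def run_agent(task, model, tools, max_steps=10):
    context = [f"Task: {task}"]
    for _ in range(max_steps):
        action = model(context)                  # LLM inference: propose next step
        if action["type"] == "done":
            return action["result"]
        tool = tools[action["tool"]]             # tool execution (fs, git, tests, shell)
        output = tool(*action["args"])
        context.append(f"{action['tool']} -> {output}")  # loop: feed result back
    raise RuntimeError("step budget exhausted")

# Stubbed model and tool show the control flow without a real LLM.
def fake_model(context):
    if any("read_file ->" in line for line in context):
        return {"type": "done", "result": "patched"}
    return {"type": "tool", "tool": "read_file", "args": ["src/auth.py"]}

result = run_agent("fix auth bug", fake_model,
                   {"read_file": lambda p: f"contents of {p}"})
print(result)  # patched
```

The important detail: `context` grows on every turn, which is exactly why the sections below focus on controlling what goes into it.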
2️⃣ General Architecture of a Coding Agent
A production-grade coding agent includes:
1. Indexing Layer
- Repo scanning
- Symbol extraction
- Dependency graph
- Optional embeddings
2. Context Builder
- Select relevant files
- Inject instructions
- Add plan/scratchpad
- Add recent edits
3. LLM Inference Layer
- Tokenized prompt
- Context window constraints
- Streaming output
4. Tool Layer
- File read/write
- Test execution
- Git diff/patch
- Lint/build commands
5. Loop Controller
- Plan
- Execute
- Validate
- Iterate
The model does not “see the repo.”
The agent chooses what to send.
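The context builder's file selection can be sketched like this. A production agent would score candidates with symbol graphs or embeddings; simple keyword overlap stands in here for illustration:

```python
# Sketch of a context builder: rank repo files by relevance to the
# task and keep only the top few. Keyword overlap is a deliberately
# crude stand-in for real symbol/embedding-based retrieval.

def select_files(task, files, top_k=2):
    task_words = set(task.lower().split())
    def score(item):
        _, text = item
        return len(task_words & set(text.lower().split()))
    ranked = sorted(files.items(), key=score, reverse=True)
    return [path for path, _ in ranked[:top_k]]

repo = {
    "src/auth/middleware.py": "validate auth token before request",
    "src/db/models.py": "user table schema email",
    "tests/test_auth.py": "test auth middleware",
}
picked = select_files("fix auth token validation", repo)
print(picked)  # ['src/auth/middleware.py', 'tests/test_auth.py']
```

Whatever the scoring method, the principle holds: the model receives `picked`, never `repo`.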
3️⃣ What Is the Context Window?
The context window is:
The maximum number of tokens the model can attend to in a single inference call.
It includes:
```
System instructions
+ AGENTS.md / policies
+ Scratchpad / plan files
+ Relevant source files
+ Recent conversation
+ Tool outputs
+ Your current request
+ Model output
```
Everything must fit inside the window.
Large window ≠ send everything.
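Fitting those pieces into the window is a packing problem: include the most essential sections first, and drop whatever no longer fits. A rough sketch (token counts approximated at ~4 characters each, not real tokenizer output):

```python
# Sketch: pack context sections into a fixed token budget, most
# essential first. Token cost is estimated at ~4 chars per token.

def estimate_tokens(text):
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer

def pack_context(sections, budget):
    """sections: list of (priority, label, text); lower priority = more essential."""
    packed, used = [], 0
    for _, label, text in sorted(sections, key=lambda s: s[0]):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            packed.append(label)
            used += cost
    return packed

sections = [
    (0, "system", "You are a coding agent." * 2),
    (1, "request", "Fix the auth bug in middleware."),
    (2, "plan", "PLAN.md contents " * 10),
    (3, "files", "very long source file " * 500),
]
print(pack_context(sections, budget=200))  # the huge file dump gets dropped
```

Real runtimes are smarter (they truncate or summarize instead of dropping outright), but the constraint is the same: something always yields when the window is full.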
4️⃣ Where Does Tokenization Happen?
Typically:
- The agent runtime tokenizes locally (client-side).
- It estimates token usage before calling the model.
- The server performs the authoritative tokenization during inference.
Why client-side tokenization matters:
- Avoid exceeding context
- Control cost
- Control chunking
- Optimize file selection
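A client-side check might look like the following. It uses the real `tiktoken` library if installed (the `cl100k_base` encoding choice is an assumption), and otherwise falls back to a rough characters-per-token heuristic:

```python
# Client-side token estimation before calling the model. Prefer a real
# tokenizer (tiktoken) when available; fall back to a rough heuristic.

def count_tokens(text):
    try:
        import tiktoken  # optional dependency; encoding choice is an assumption
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except ImportError:
        return max(1, len(text) // 4)  # rough fallback: ~4 chars per token

def fits_context(prompt, window=8000, reserve_for_output=1000):
    """Check the prompt leaves room in the window for the model's reply."""
    return count_tokens(prompt) + reserve_for_output <= window

prompt = "Fix the auth middleware. " * 100
print(count_tokens(prompt), fits_context(prompt))
```

Reserving room for the output is the easily forgotten part: a prompt that exactly fills the window leaves the model nothing to answer with.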
5️⃣ What Actually Consumes Tokens?
In coding workflows, token cost usually comes from:
- Large source files
- Test files
- Logs
- Replayed conversation history
- Repeated system instructions
- Scratchpad growth
Your instruction verbosity is rarely the main cost.
File selection is.
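That breakdown is easy to measure. A quick profile of a context payload shows where the budget actually goes (token counts again approximated at ~4 characters each):

```python
# Sketch: profile a context payload to see which parts dominate the
# token budget. Counts are approximated at ~4 characters per token.

def token_profile(parts):
    counts = {label: max(1, len(text) // 4) for label, text in parts.items()}
    total = sum(counts.values())
    return sorted(((label, n, round(100 * n / total)) for label, n in counts.items()),
                  key=lambda row: row[1], reverse=True)

parts = {
    "instructions": "Keep the diff minimal. Run tests.",
    "history": "earlier conversation turn " * 40,
    "source_files": "def handler(request): ..." * 300,
}
for label, tokens, pct in token_profile(parts):
    print(f"{label:>12}: {tokens:>5} tokens ({pct}%)")
```

Profiles like this almost always show the same shape: source files and replayed history dwarf the instructions.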
6️⃣ What Makes “Good Quality” Context?
Good context is:
✅ Relevant
Only include files that matter.
✅ Structured
Clear task → constraints → deliverable.
✅ Deterministic
Explicit scope boundaries.
✅ Minimal but sufficient
No narrative fluff.
No repeated architecture explanation.
Bad context is:
- Entire repo dump
- Long emotional explanations
- Old irrelevant chat history
- Ambiguous instructions
7️⃣ What Actually Improves Coding Responses?
Not politeness.
Not verbosity.
What improves results:
1️⃣ Clear Scope
Bad:
Improve authentication system.
Good:
Scope:
- src/auth/*
- src/middleware/auth.ts
Do not touch:
- public API
- schema definitions
2️⃣ Explicit Constraints
Examples:
- Do not change public interfaces.
- Preserve test behavior.
- No new dependencies.
- Keep diff minimal.
Constraints reduce hallucinated refactors.
3️⃣ Defined Output Format
Example:
Deliverable:
- Unified diff only
- Brief explanation (<150 words)
This reduces drift.
4️⃣ Plan-First Workflow
Instead of:
Implement feature X.
Use:
Step 1: Generate PLAN.md
Step 2: Review plan
Step 3: Implement Step 1 only
Step 4: Run tests
Planning reduces chaotic edits.
8️⃣ Scratchpads and Plan Files
Scratchpad = external reasoning state.
Can be:
- PLAN.md
- TODO.md
- In-memory state
- Conversation buffer
Benefits:
- Multi-step tracking
- Reduced re-reasoning
- Human-agent alignment
- Safer iteration
But:
The model does not remember it.
The agent injects it into context each time.
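That re-injection step can be sketched directly: on every turn the runtime re-reads the scratchpad file from disk and rebuilds the prompt around it. The file layout and section headers here are illustrative:

```python
# Sketch: the agent re-injects the scratchpad into the prompt on every
# turn, because the model itself retains nothing between calls.
import pathlib
import tempfile

def build_prompt(plan_path, request):
    plan = pathlib.Path(plan_path).read_text()
    return f"## Current plan\n{plan}\n## Request\n{request}"

# Simulated workspace with a PLAN.md scratchpad on disk.
workspace = pathlib.Path(tempfile.mkdtemp())
plan_file = workspace / "PLAN.md"
plan_file.write_text("1. [x] Add token check\n2. [ ] Update tests\n")

# Every turn rebuilds the prompt from the current file contents:
prompt = build_prompt(plan_file, "Implement step 2 only.")
print(prompt)
```

Because the plan lives in a file, editing PLAN.md between turns changes what the model sees on the next call, with no hidden state involved.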
9️⃣ Efficient Project Structure for Coding Agents
Recommended:
```
/AGENTS.md   # Global behavior rules (minimal)
/PLAN.md     # Task plan (editable)
/src/...
/tests/...
```
AGENTS.md should contain:
- Coding standards
- Test commands
- “Plan first” rule
- Guardrails
Keep it short.
It is injected often.
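A minimal AGENTS.md along these lines might look like this (every rule below is illustrative, not a standard):

```markdown
# AGENTS.md

## Standards
- TypeScript strict mode; no new dependencies without approval.

## Commands
- Tests: `npm test`
- Lint: `npm run lint`

## Workflow
- Write PLAN.md and wait for review before editing code.

## Guardrails
- Never modify `src/db/*` or public API signatures.
```

A dozen lines like these cost a few hundred tokens per turn; a sprawling policy document costs thousands.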
🔟 Efficient Coding Agent Usage Patterns
Pattern A — Constrained Patch
Task:
Optimize middleware performance.
Scope:
src/auth/middleware.ts
Constraints:
- Preserve API
- No new deps
Output:
Unified diff only.
Pattern B — Incremental Execution
Implement only Step 1 from PLAN.md.
Run tests.
Update PLAN.md.
Stop.
Pattern C — Scope Locking
Explicitly limit directories:
Touch only:
src/auth/*
Do not modify:
src/db/*
This prevents token waste and unintended edits.
1️⃣1️⃣ What NOT to Do
❌ Send the whole repo
❌ Re-explain system architecture every turn
❌ Let scratchpads grow unbounded
❌ Leave scope ambiguous
❌ Ask for “improve everything”
1️⃣2️⃣ Big Context Myth
A 1M-token context window does not mean:
- you should send 1M tokens
- responses will be faster
- answers will be more accurate
Longer context:
- Increases latency
- Increases cost
- Increases noise risk
Smart context selection beats large context.
1️⃣3️⃣ Mental Model for Engineers
Treat coding agents like this:
LLM = Stateless reasoning engine
Context = Input data packet
Agent = Orchestrator
Scratchpad = External memory
Your job:
Optimize the data packet.
1️⃣4️⃣ Core Optimization Principles
- Structure > verbosity
- Relevance > completeness
- Constraints > freedom
- Iteration > giant prompts
- Plan → execute → verify
Final Takeaway
Coding agents perform best when:
- The task is clearly scoped
- Constraints are explicit
- Context is curated
- Plans are externalized
- History is pruned
- Output format is constrained