Architecture, Context, and Efficient Usage
1️⃣ What Is a Coding Agent?
A coding agent is not just an LLM.
It is a system:
```
IDE / CLI
   ↓
Agent Runtime
   ↓
Context Builder
   ↓
LLM Inference
   ↓
Tool Execution (fs, git, tests, shell)
   ↓
Loop
```
The model is only the reasoning engine.
The runtime handles orchestration.
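That split can be sketched as a minimal loop, where the model only proposes the next action and the runtime executes tools and decides when to stop. All names here are illustrative stand-ins, not a real agent framework API:

```python
# Minimal sketch of an agent runtime loop. The model proposes actions;
# the runtime executes tools and feeds results back until done.
# All function names are hypothetical, for illustration only.

def run_agent(task, model, tools, max_steps=10):
    context = [f"Task: {task}"]
    for _ in range(max_steps):
        action = model(context)                  # LLM inference: propose next step
        if action["type"] == "done":
            return action["result"]
        tool = tools[action["tool"]]             # tool execution (fs, git, tests, shell)
        output = tool(*action["args"])
        context.append(f"{action['tool']} -> {output}")  # loop: feed result back
    raise RuntimeError("step budget exhausted")

# Stubbed model and tool show the control flow without a real LLM.
def fake_model(context):
    if any("read_file ->" in line for line in context):
        return {"type": "done", "result": "patched"}
    return {"type": "tool", "tool": "read_file", "args": ["src/auth.py"]}

result = run_agent("fix auth bug", fake_model,
                   {"read_file": lambda p: f"contents of {p}"})
print(result)  # patched
```

The important detail: `context` grows on every turn, which is exactly why the sections below focus on controlling what goes into it.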
2️⃣ General Architecture of a Coding Agent
A production-grade coding agent includes:
1. Indexing Layer
- Repo scanning
- Symbol extraction
- Dependency graph
- Optional embeddings
2. Context Builder
- Select relevant files
- Inject instructions
- Add plan/scratchpad
- Add recent edits
3. LLM Inference Layer
- Tokenized prompt
- Context window constraints
- Streaming output
4. Tool Layer
- File read/write
- Test execution
- Git diff/patch
- Lint/build commands
5. Loop Controller
- Plan
- Execute
- Validate
- Iterate
The model does not “see the repo.”
The agent chooses what to send.
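The context builder's file selection can be sketched like this. A production agent would score candidates with symbol graphs or embeddings; simple keyword overlap stands in here for illustration:

```python
# Sketch of a context builder: rank repo files by relevance to the
# task and keep only the top few. Keyword overlap is a deliberately
# crude stand-in for real symbol/embedding-based retrieval.

def select_files(task, files, top_k=2):
    task_words = set(task.lower().split())
    def score(item):
        _, text = item
        return len(task_words & set(text.lower().split()))
    ranked = sorted(files.items(), key=score, reverse=True)
    return [path for path, _ in ranked[:top_k]]

repo = {
    "src/auth/middleware.py": "validate auth token before request",
    "src/db/models.py": "user table schema email",
    "tests/test_auth.py": "test auth middleware",
}
picked = select_files("fix auth token validation", repo)
print(picked)  # ['src/auth/middleware.py', 'tests/test_auth.py']
```

Whatever the scoring method, the principle holds: the model receives `picked`, never `repo`.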
3️⃣ What Is the Context Window?
The context window is:
The maximum number of tokens the model can attend to in a single inference call.
It includes:
```
System instructions
+ AGENTS.md / policies
+ Scratchpad / plan files
+ Relevant source files
+ Recent conversation
+ Tool outputs
+ Your current request
+ Model output
```
Everything must fit inside the window.
Large window ≠ send everything.
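Fitting those pieces into the window is a packing problem: include the most essential sections first, and drop whatever no longer fits. A rough sketch (token counts approximated at ~4 characters each, not real tokenizer output):

```python
# Sketch: pack context sections into a fixed token budget, most
# essential first. Token cost is estimated at ~4 chars per token.

def estimate_tokens(text):
    return max(1, len(text) // 4)  # crude heuristic, not a real tokenizer

def pack_context(sections, budget):
    """sections: list of (priority, label, text); lower priority = more essential."""
    packed, used = [], 0
    for _, label, text in sorted(sections, key=lambda s: s[0]):
        cost = estimate_tokens(text)
        if used + cost <= budget:
            packed.append(label)
            used += cost
    return packed

sections = [
    (0, "system", "You are a coding agent." * 2),
    (1, "request", "Fix the auth bug in middleware."),
    (2, "plan", "PLAN.md contents " * 10),
    (3, "files", "very long source file " * 500),
]
print(pack_context(sections, budget=200))  # the huge file dump gets dropped
```

Real runtimes are smarter (they truncate or summarize instead of dropping outright), but the constraint is the same: something always yields when the window is full.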
4️⃣ Where Does Tokenization Happen?
Typically:
- The agent runtime tokenizes locally (client-side).
- It estimates token usage before calling the model.
- The server performs the authoritative tokenization during inference.
Why client-side tokenization matters:
- Avoid exceeding context
- Control cost
- Control chunking
- Optimize file selection
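A client-side check might look like the following. It uses the real `tiktoken` library if installed (the `cl100k_base` encoding choice is an assumption), and otherwise falls back to a rough characters-per-token heuristic:

```python
# Client-side token estimation before calling the model. Prefer a real
# tokenizer (tiktoken) when available; fall back to a rough heuristic.

def count_tokens(text):
    try:
        import tiktoken  # optional dependency; encoding choice is an assumption
        return len(tiktoken.get_encoding("cl100k_base").encode(text))
    except ImportError:
        return max(1, len(text) // 4)  # rough fallback: ~4 chars per token

def fits_context(prompt, window=8000, reserve_for_output=1000):
    """Check the prompt leaves room in the window for the model's reply."""
    return count_tokens(prompt) + reserve_for_output <= window

prompt = "Fix the auth middleware. " * 100
print(count_tokens(prompt), fits_context(prompt))
```

Reserving room for the output is the easily forgotten part: a prompt that exactly fills the window leaves the model nothing to answer with.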
5️⃣ What Actually Consumes Tokens?
In coding workflows, token cost usually comes from:
- Large source files
- Test files
- Logs
- Replayed conversation history
- Repeated system instructions
- Scratchpad growth
Your instruction verbosity is rarely the main cost.
File selection is.
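That breakdown is easy to measure. A quick profile of a context payload shows where the budget actually goes (token counts again approximated at ~4 characters each):

```python
# Sketch: profile a context payload to see which parts dominate the
# token budget. Counts are approximated at ~4 characters per token.

def token_profile(parts):
    counts = {label: max(1, len(text) // 4) for label, text in parts.items()}
    total = sum(counts.values())
    return sorted(((label, n, round(100 * n / total)) for label, n in counts.items()),
                  key=lambda row: row[1], reverse=True)

parts = {
    "instructions": "Keep the diff minimal. Run tests.",
    "history": "earlier conversation turn " * 40,
    "source_files": "def handler(request): ..." * 300,
}
for label, tokens, pct in token_profile(parts):
    print(f"{label:>12}: {tokens:>5} tokens ({pct}%)")
```

Profiles like this almost always show the same shape: source files and replayed history dwarf the instructions.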
6️⃣ What Makes “Good Quality” Context?
Good context is:
✅ Relevant
Only include files that matter.
✅ Structured
Clear task → constraints → deliverable.
✅ Deterministic
Explicit scope boundaries.
✅ Minimal but sufficient
No narrative fluff.
No repeated architecture explanation.
Bad context is:
- Entire repo dump
- Long emotional explanations
- Old irrelevant chat history
- Ambiguous instructions
7️⃣ What Actually Improves Coding Responses?
Not politeness.
Not verbosity.
What improves results:
1️⃣ Clear Scope
Bad:
Improve authentication system.
Good:
Scope:
- src/auth/*
- src/middleware/auth.ts
Do not touch:
- public API
- schema definitions
2️⃣ Explicit Constraints
Examples:
- Do not change public interfaces.
- Preserve test behavior.
- No new dependencies.
- Keep diff minimal.
Constraints reduce hallucinated refactors.
3️⃣ Defined Output Format
Example:
Deliverable:
- Unified diff only
- Brief explanation (<150 words)
This reduces drift.
4️⃣ Plan-First Workflow
Instead of:
Implement feature X.
Use:
Step 1: Generate PLAN.md
Step 2: Review plan
Step 3: Implement Step 1 only
Step 4: Run tests
Planning reduces chaotic edits.
8️⃣ Scratchpads and Plan Files
Scratchpad = external reasoning state.
Can be:
- PLAN.md
- TODO.md
- In-memory state
- Conversation buffer
Benefits:
- Multi-step tracking
- Reduced re-reasoning
- Human-agent alignment
- Safer iteration
But:
The model does not remember it.
The agent injects it into context each time.
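That re-injection step can be sketched directly: on every turn the runtime re-reads the scratchpad file from disk and rebuilds the prompt around it. The file layout and section headers here are illustrative:

```python
# Sketch: the agent re-injects the scratchpad into the prompt on every
# turn, because the model itself retains nothing between calls.
import pathlib
import tempfile

def build_prompt(plan_path, request):
    plan = pathlib.Path(plan_path).read_text()
    return f"## Current plan\n{plan}\n## Request\n{request}"

# Simulated workspace with a PLAN.md scratchpad on disk.
workspace = pathlib.Path(tempfile.mkdtemp())
plan_file = workspace / "PLAN.md"
plan_file.write_text("1. [x] Add token check\n2. [ ] Update tests\n")

# Every turn rebuilds the prompt from the current file contents:
prompt = build_prompt(plan_file, "Implement step 2 only.")
print(prompt)
```

Because the plan lives in a file, editing PLAN.md between turns changes what the model sees on the next call, with no hidden state involved.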
9️⃣ Efficient Project Structure for Coding Agents
Recommended:
```
/AGENTS.md   # Global behavior rules (minimal)
/PLAN.md     # Task plan (editable)
/src/...
/tests/...
```
AGENTS.md should contain:
- Coding standards
- Test commands
- “Plan first” rule
- Guardrails
Keep it short.
It is injected often.
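A minimal AGENTS.md along these lines might look like this (every rule below is illustrative, not a standard):

```markdown
# AGENTS.md

## Standards
- TypeScript strict mode; no new dependencies without approval.

## Commands
- Tests: `npm test`
- Lint: `npm run lint`

## Workflow
- Write PLAN.md and wait for review before editing code.

## Guardrails
- Never modify `src/db/*` or public API signatures.
```

A dozen lines like these cost a few hundred tokens per turn; a sprawling policy document costs thousands.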
🔟 Efficient Coding Agent Usage Patterns
Pattern A — Constrained Patch
Task:
Optimize middleware performance.
Scope:
src/auth/middleware.ts
Constraints:
- Preserve API
- No new deps
Output:
Unified diff only.
Pattern B — Incremental Execution
Implement only Step 1 from PLAN.md.
Run tests.
Update PLAN.md.
Stop.
Pattern C — Scope Locking
Explicitly limit directories:
Touch only:
src/auth/*
Do not modify:
src/db/*
This prevents token waste and unintended edits.
1️⃣1️⃣ What NOT to Do
❌ Send the whole repo
❌ Re-explain system architecture every turn
❌ Let scratchpads grow unbounded
❌ Leave scope ambiguous
❌ Ask for “improve everything”
1️⃣2️⃣ Big Context Myth
A 1M-token context window does not mean:
- you should send 1M tokens
- responses will be faster
- answers will be more accurate
Longer context:
- Increases latency
- Increases cost
- Increases noise risk
Smart context selection beats large context.
1️⃣3️⃣ Mental Model for Engineers
Treat coding agents like this:
LLM = Stateless reasoning engine
Context = Input data packet
Agent = Orchestrator
Scratchpad = External memory
Your job:
Optimize the data packet.
1️⃣4️⃣ Core Optimization Principles
- Structure > verbosity
- Relevance > completeness
- Constraints > freedom
- Iteration > giant prompts
- Plan → execute → verify
Final Takeaway
Coding agents perform best when:
- The task is clearly scoped
- Constraints are explicit
- Context is curated
- Plans are externalized
- History is pruned
- Output format is constrained