Alex Metelli
Advanced Context Engineering for Coding Agents

Full reference:

📺 Advanced Context Engineering for Coding Agents

🎤 Dex Horthy

đź”— https://youtu.be/rmvDxxNubIg?si=GtPAqK-lnY58dlIO


Introduction

AI coding agents have dramatically increased developer throughput. However, in real-world usage—especially in large, long-lived (“brownfield”) codebases—many teams observe a mismatch between output and progress.

This post is a faithful technical distillation of Dex Horthy’s talk on advanced context engineering: practical techniques for making today’s LLMs effective, reliable, and scalable for serious software engineering.


The Problem: Productivity ≠ Progress

Large-scale surveys of developers show a consistent pattern:

  • AI increases code shipped
  • Code churn increases even more
  • Teams repeatedly rework AI-generated output
  • Brownfield codebases suffer the worst outcomes

AI performs well for greenfield projects, prototypes, and dashboards. But in complex systems with legacy constraints, naive agent usage becomes a tech-debt factory.

This aligns with the lived experience of many senior engineers and founders.


Why This Happens: Context Is the Only Control Surface

Large language models are:

  • Stateless (no memory between sessions)
  • Non-deterministic
  • Entirely governed by the current context window

Every decision—tool usage, file edits, hallucinations—is determined by the tokens currently in context.

Better tokens in → better tokens out.

More tokens do not mean better outcomes.


The Dumb Zone

As context usage grows, model quality degrades. Empirically, this often begins around ~40% of the context window, depending on task complexity.

This region is referred to as the dumb zone.

Common causes

  • Large tool outputs (JSON, UUIDs, logs)
  • Unfiltered file dumps
  • Repeated correction loops
  • MCP servers dumping irrelevant data
  • Long chat histories full of noise

Once in the dumb zone, agents become unreliable regardless of model quality.
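
As a rough illustration of staying out of the dumb zone, the sketch below budgets context against the model's window and trims a noisy tool output before it enters the conversation. The ~4-characters-per-token heuristic, the 40% threshold, and the function names are illustrative assumptions, not prescriptions from the talk.

```python
# Minimal sketch: keep an agent out of the "dumb zone" by budgeting context
# and trimming noisy tool output before it enters the window.
# The 4-chars-per-token heuristic and 40% threshold are rough assumptions.

CONTEXT_WINDOW_TOKENS = 200_000
DUMB_ZONE_THRESHOLD = 0.40          # quality often degrades past ~40% usage


def estimate_tokens(text: str) -> int:
    """Crude token estimate; a real system would use the model's tokenizer."""
    return max(1, len(text) // 4)


def should_compact(context: list[str]) -> bool:
    """True once the accumulated context approaches the dumb zone."""
    used = sum(estimate_tokens(chunk) for chunk in context)
    return used / CONTEXT_WINDOW_TOKENS >= DUMB_ZONE_THRESHOLD


def trim_tool_output(raw: str, max_lines: int = 40) -> str:
    """Keep only the head of a large tool result (logs, JSON dumps, traces)."""
    lines = raw.splitlines()
    if len(lines) <= max_lines:
        return raw
    return "\n".join(lines[:max_lines]) + f"\n... ({len(lines) - max_lines} lines omitted)"
```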


Trajectory Matters

LLMs learn patterns within a conversation.

If the conversation looks like:

  1. Model makes a mistake
  2. Human scolds the model
  3. Model makes another mistake
  4. Human scolds again

The most likely continuation is… another mistake.

Bad trajectories reinforce failure modes.

This is why restarting sessions or compressing context is often more effective than continued correction.


Intentional Compaction

Intentional compaction is the deliberate compression of context into a minimal, high-signal representation.

Instead of dragging an ever-growing conversation forward, you:

  1. Summarize the current state into a markdown artifact
  2. Review and validate it as a human
  3. Start a fresh context seeded with that artifact

What to compact

  • Relevant files and line ranges
  • Verified architectural behavior
  • Decisions already made
  • Explicit constraints and non-goals

What not to compact

  • Raw logs
  • Tool traces
  • Full file contents
  • Repetitive error explanations

Compaction converts exploration into a one-time cost instead of a recurring tax.
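
A minimal sketch of that loop, assuming a generic `llm(prompt)` call that returns text; the function names, prompt wording, and file name are illustrative, not from the talk. The long history is summarized into a markdown artifact, a human reviews it, and a fresh context is seeded from it.

```python
# Sketch of intentional compaction, assuming a generic llm(prompt) -> str call.
# The prompt wording and the "research.md" file name are illustrative only.
from pathlib import Path

COMPACTION_PROMPT = """Summarize the work so far into a short markdown artifact with:
- Relevant files and line ranges
- Verified architectural behavior
- Decisions already made
- Explicit constraints and non-goals
Do NOT include raw logs, tool traces, or full file contents."""


def compact(history: list[str], llm) -> Path:
    """Collapse a long conversation into a reviewed, high-signal artifact."""
    summary = llm("\n".join(history) + "\n\n" + COMPACTION_PROMPT)
    artifact = Path("research.md")
    artifact.write_text(summary)        # a human reviews and edits this file
    return artifact


def fresh_session(artifact: Path, task: str) -> list[str]:
    """Start a new context seeded only with the validated artifact."""
    return [artifact.read_text(), task]
```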


Sub-Agents Are About Context, Not Roles

Sub-agents are frequently misunderstood.

They are not about mirroring human roles like “frontend agent” or “QA agent”.

They exist to:

  • Fork a clean context window
  • Perform large exploratory reads
  • Return a succinct factual summary to a parent agent

Example:

  • Sub-agent scans a large repo
  • Returns: “Relevant logic is in foo/bar.ts:120–340, entrypoint is BazHandler”

The parent agent then reads only what matters.

This is how you scale context without entering the dumb zone.
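
One way this pattern can look in code: the parent forks a sub-agent with an empty context, hands it the exploratory question, and keeps only the short summary it returns. `run_agent` is a stand-in for whatever agent runtime you use; it and the prompt text are assumptions for illustration.

```python
# Sketch of a context-forking sub-agent. run_agent(system, task) -> str is a
# stand-in for your agent runtime; it burns its own context window, not the parent's.

def explore_repo(question: str, run_agent) -> str:
    """Fork a clean context to do the large exploratory read."""
    system = (
        "Scan the repository to answer the question. "
        "Return ONLY file paths, line ranges, and a one-paragraph summary."
    )
    return run_agent(system, question)


def parent_step(run_agent) -> str:
    # The parent never sees the thousands of tokens the sub-agent read.
    summary = explore_repo("Where is the request routing logic?", run_agent)
    # e.g. "Relevant logic is in foo/bar.ts:120-340, entrypoint is BazHandler"
    return summary
```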


The Research–Plan–Implement Workflow

This workflow is not “spec-driven development”. That term has been diluted to the point where it no longer means anything precise.

RPI (Research, Plan, Implement) is about systematic compaction at every stage.


Research: Compressing Truth

Goal:

  • Understand how the system actually works
  • Identify authoritative files and flows
  • Eliminate assumptions

Characteristics:

  • Read code, not docs
  • Produce a short research artifact
  • Validate findings manually

If agents are not onboarded with accurate context, they will fabricate.

This mirrors the film *Memento*: without memory, agents invent narratives.


Plan: Compressing Intent

Planning is the highest-leverage activity.

A good plan:

  • Lists exact steps
  • References concrete files and snippets
  • Specifies validation after each change
  • Makes failure modes obvious

A solid plan dramatically constrains agent behavior.

Bad plans produce dozens of bad lines of code.

Bad research produces hundreds.
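
A plan with these properties can be captured in a small structured form. The sketch below is one possible shape; the field names and the example steps are assumptions, not a format prescribed by the talk.

```python
# Sketch of a plan artifact: exact steps, concrete file references,
# and a validation command after every change. Field names are illustrative.
from dataclasses import dataclass


@dataclass
class PlanStep:
    description: str      # what to change
    files: list[str]      # concrete files (and line ranges) to touch
    validation: str       # command that must pass before moving on


plan = [
    PlanStep(
        description="Add a timeout parameter to the HTTP client wrapper",
        files=["src/http/client.ts:40-85"],
        validation="npm test -- http-client",
    ),
    PlanStep(
        description="Thread the timeout through BazHandler",
        files=["foo/bar.ts:120-340"],
        validation="npm test -- baz-handler",
    ),
]
```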


Implement: Mechanical Execution

Once the plan is correct:

  • Execution becomes mechanical
  • Context remains small
  • Reliability increases

This is where token spend actually pays off.
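
Given a validated plan, execution can be a tight loop that applies one step at a time and stops the moment validation fails, keeping the working context small. The sketch below assumes plan steps shaped like the previous example; `apply_step` stands in for the agent call that makes the edit.

```python
# Sketch of mechanical execution: one plan step at a time, validate, stop on failure.
# apply_step is a stand-in for the agent call that performs the edit.
import subprocess


def execute(plan: list[dict], apply_step) -> bool:
    for step in plan:
        # The agent only ever sees the current step, not the whole history.
        apply_step(step["description"], step["files"])
        result = subprocess.run(step["validation"], shell=True)
        if result.returncode != 0:
            print(f"Validation failed after: {step['description']}")
            return False     # stop and fix the plan instead of pushing forward
    return True
```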


Mental Alignment and Code Review

Code review is primarily about shared understanding, not syntax.

As AI output scales, reviewing thousands of lines becomes unsustainable.

High-performing teams:

  • Review research and plans
  • Attach agent transcripts or AMP threads to PRs
  • Show exact steps and test results

Reviewing plans preserves architectural coherence as throughput increases.


Limits: AI Does Not Replace Thinking

AI amplifies the quality of thinking already done.

In cases like deep architectural refactors or legacy systems with hidden invariants, teams must return to human design first.

There is no perfect prompt.

There is no silver bullet.

Thinking cannot be outsourced.


Choosing the Right Level of Context Engineering

| Task Type | Recommended Approach |
| --- | --- |
| UI tweak | Direct instruction |
| Small feature | Light plan |
| Cross-repo change | Research + plan |
| Deep refactor | Full RPI + human design |

The ceiling of problem difficulty rises with context discipline.


What Comes Next

Coding agents will be commoditized.

The real challenge is adapting:

  • Team workflows
  • SDLC processes
  • Cultural norms

Without this, teams risk:

  • Juniors shipping slop
  • Seniors cleaning it up
  • Technical debt scaling with AI usage

This is a workflow and leadership problem, not a tooling one.


Key Takeaways

  • Context is the only lever that matters
  • More tokens often reduce correctness
  • Intentional compaction is mandatory
  • Research and planning are the highest ROI activities
  • AI amplifies thinking—it does not replace it

Source & Attribution

This article is a faithful technical adaptation of:

Dex Horthy — *Advanced Context Engineering for Coding Agents*

📺 https://youtu.be/rmvDxxNubIg?si=GtPAqK-lnY58dlIO

All ideas, terminology, and frameworks originate from the referenced talk.
