Shinsuke KAGAWA

Originally published at norsica.jp

Planning Is the Real Superpower of Agentic Coding

I see this pattern constantly: someone gives an LLM a task, it starts executing immediately, and halfway through you realize it's building the wrong thing. Or it gets stuck in a loop. Or it produces something that technically works but doesn't fit the existing codebase at all.

The instinct is to write better prompts. More detail. More constraints. More examples.

The actual fix is simpler: make it plan before it executes.

Research shows that separating planning from execution dramatically improves task success rates—by as much as 33% in complex scenarios.

In earlier articles, I wrote about why LLMs struggle with first attempts and why overloading AGENTS.md is often a symptom of that misunderstanding. This article focuses on what actually fixes that.


Why "Just Execute" Fails

This took me longer to figure out than I'd like to admit. When you ask an LLM to directly implement something, you're asking it to:

  1. Understand the requirements
  2. Analyze the existing codebase
  3. Design an approach
  4. Evaluate trade-offs
  5. Decompose into steps
  6. Execute each step
  7. Verify results

All in one shot. With one context. Using the same cognitive load throughout.

Even powerful LLMs struggle with this. Not because they lack capability, but because long-horizon planning is fundamentally hard in a step-by-step mode.


The Plan-Execute Architecture

Research on LLM agents has consistently shown that separating planning and execution yields better results.

The reasons:

| Benefit | Explanation |
| --- | --- |
| Explicit long-term planning | Even strong LLMs struggle with multi-step reasoning when taking actions one at a time. Explicit planning forces consideration of the full path. |
| Model flexibility | You can use a powerful model for planning and a lighter model for execution, or even different specialized models per phase. |
| Efficiency | Each execution step doesn't need to reason through the entire conversation history. It just needs to execute against the plan. |

What matters here: the plan becomes an artifact, and the execution becomes verification against that artifact.

If you've read about why LLMs are better at verification than first-shot generation, this should sound familiar. Creating a plan first converts the execution task from "generate good code" to "implement according to this plan"—a much clearer, more verifiable objective.
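To make the architecture concrete, here's a minimal sketch of a plan-then-execute loop. Nothing in it is a real framework API: `callModel` stands in for whatever LLM client you use, and the `PlanStep` shape and prompts are invented for illustration.

```typescript
// Minimal plan-then-execute sketch. `callModel` is a stand-in for
// whatever LLM client you actually use.
type CallModel = (prompt: string) => Promise<string>;

interface PlanStep {
  description: string; // what to do
  done: string;        // completion criterion to verify against
}

// Phase 1: a (possibly stronger) model produces the plan as an artifact.
async function makePlan(planner: CallModel, task: string): Promise<PlanStep[]> {
  const raw = await planner(
    `Break this task into ordered steps. Return JSON: [{"description", "done"}].\nTask: ${task}`
  );
  return JSON.parse(raw) as PlanStep[];
}

// Phase 2: a (possibly lighter) model executes one step at a time.
// Each call sees the plan and the current step, not the whole chat history.
async function executePlan(executor: CallModel, plan: PlanStep[]): Promise<string[]> {
  const results: string[] = [];
  for (const step of plan) {
    results.push(
      await executor(
        `Plan:\n${JSON.stringify(plan, null, 2)}\n\nDo this step, then verify it against its completion criterion:\n${JSON.stringify(step)}`
      )
    );
  }
  return results;
}
```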


The Full Workflow

The complete picture:

Step 1: Preparation
    │
    ▼
Step 2: Design (Agree on Direction)
    │
    ▼
Step 3: Work Planning  ← The Most Important Step
    │
    ▼
Step 4: Execution
    │
    ▼
Step 5: Verification & Feedback

I'll walk through each step, but Step 3 is where the magic happens.


Step 1: Preparation

Goal: Clarify what you want to achieve, not how.

  • Create a ticket, issue, or todo document stating the goal in plain language
  • Point the LLM to AGENTS.md (or CLAUDE.md, depending on your tool) and relevant context files
  • Don't jump into implementation details yet

This is about setting the stage, not solving the problem.
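As a sketch, a preparation ticket can be this short (the feature and path are hypothetical):

```markdown
## Goal
Users can filter the order list by status.

## Context
- See AGENTS.md for project conventions
- Relevant code: src/orders/ (hypothetical path)

## Out of scope
- How to implement it (that comes out of Steps 2 and 3)
```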


Step 2: Design (Agree on Direction)

Goal: Align on the approach before any code gets written.

Don't Let It Start Coding Immediately

Instead of "implement this feature," say:

"Before implementing, present a step-by-step plan for how you would approach this."

Review the Plan

Look for:

  • Contradictions with existing architecture
  • Simpler alternatives the LLM missed
  • Misunderstandings of the requirements

At this stage, you're agreeing on what to build and why this approach. The how and in what order come in Step 3.


Step 3: Work Planning (The Most Important Step)

This section is dense. But the payoff is proportional—the more carefully you plan, the smoother execution becomes.

For small tasks, you don't need all of this. See "Scaling to Task Size" at the end.

Goal: Convert the design into executable work units with clear completion criteria.

Why This Step Matters Most

Research shows that decomposing complex tasks into subtasks significantly improves LLM success rates. Step-by-step decomposition produces more accurate results than direct generation.

But there's another reason: the work plan is an artifact.

When the plan exists, the execution task transforms:

  • Before: "Build this feature" (generation)
  • After: "Implement according to this plan" (verification)

This is the same principle from Article 1. Creating a plan first means execution becomes verification—and LLMs are better at verification.

What Work Planning Includes

  1. Task decomposition: Break the design into executable units
  2. Dependency mapping: Define order and dependencies between tasks
  3. Completion criteria: What does "done" mean for each task?
  4. Checkpoint design: When do we get external feedback?
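Put together, a work plan for a medium-sized task might look like this sketch. The tasks, dependencies, and levels are invented for illustration (completion levels are defined under Perspective 5 below):

```markdown
## Work Plan: Order status filter

1. Define OrderStatus type and filter interface
   - Depends on: nothing
   - Done when: build passes (L3)
2. Implement filter logic with tests
   - Depends on: task 1
   - Done when: new tests pass (L2)
3. Wire the filter into the order list UI
   - Depends on: task 2
   - Done when: filtering works in the browser (L1)

Checkpoint: demo after task 3; get feedback before polishing.
```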

Perspectives to Consider

I'll be honest: I learned most of these the hard way. Plans would fall apart mid-implementation, and only later did I realize I'd skipped something obvious in hindsight.

These aren't meant to be followed rigidly for every task; think of them as a mental checklist. If even one of these perspectives changes your plan, it's doing its job.


Perspective 1: Current State Analysis

Understand what exists before planning changes.

  • What is this code's actual responsibility?
  • Which parts are essential business logic vs. technical constraints?
  • What benefits and limitations does the current design provide?
  • What implicit dependencies or assumptions aren't obvious from the code?

Skipping this leads to plans that don't fit the existing codebase.


Perspective 2: Strategy Selection

Consider how to approach the transition from current to desired state.

Research options:

  • Look for similar patterns in your tech stack
  • Check how comparable projects solved this
  • Review OSS implementations, articles, documentation

Common strategy patterns:

  • Strangler Pattern: Gradual replacement, incremental migration
  • Facade Pattern: Hide complexity behind unified interface
  • Feature-Driven: Vertical slices, user-value first
  • Foundation-Driven: Build stable base first, then features on top

The key isn't applying patterns dogmatically—it's consciously choosing an approach instead of stumbling into one.
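As a tiny illustration of what a conscious choice looks like in code, here is a strangler-style switch, a sketch with `legacySearch` and `newSearch` as hypothetical stand-ins:

```typescript
// Strangler pattern in miniature: callers go through one entry point,
// and traffic moves from the legacy path to the new one incrementally.
declare function legacySearch(query: string): Promise<string[]>;
declare function newSearch(query: string): Promise<string[]>;

async function search(query: string, useNewPath: boolean): Promise<string[]> {
  // The flag (or a percentage rollout) controls the cutover pace.
  return useNewPath ? newSearch(query) : legacySearch(query);
}
```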


Perspective 3: Risk Assessment

Evaluate what could go wrong with your chosen strategy.

| Risk type | Considerations |
| --- | --- |
| Technical | Impact on existing systems, data integrity, performance degradation |
| Operational | Service availability, deployment downtime, rollback procedures |
| Project | Schedule delays, learning curve, team coordination |

Skipping risk assessment leads to expensive surprises mid-implementation.


Perspective 4: Constraints

Identify hard limits before committing to a strategy.

  • Technical: Library compatibility, resource capacity, performance requirements
  • Timeline: Deadlines, milestones, external dependencies
  • Resources: Team availability, skill gaps, budget
  • Business: Time-to-market, customer impact, regulations

A strategy that ignores constraints isn't executable.


Perspective 5: Completion Levels

Define what "done" means for each task—this is critical.

| Level | Definition | Example |
| --- | --- | --- |
| L1: Functional verification | Works as a user-facing feature | Search actually returns results |
| L2: Test verification | New tests added and passing | Type definition tests pass |
| L3: Build verification | No compilation errors | Interface definition complete |

Priority: L1 > L2 > L3. Whenever possible, verify at L1 (actually works in practice).

This directly maps to "external feedback" from the previous articles. Defining completion levels upfront ensures you get external verification at each checkpoint.
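In practice, each level maps to a different kind of check. A sketch, assuming a TypeScript project that uses Vitest; substitute whatever your stack uses:

```
L3: npx tsc --noEmit     # does it compile?
L2: npx vitest run       # do the new tests pass?
L1: click through the feature in a browser
```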


Perspective 6: Integration Points

Define when to verify things work together.

| Strategy | Integration point |
| --- | --- |
| Feature-driven | When users can actually use the feature |
| Foundation-driven | When all layers are complete and E2E tests pass |
| Strangler pattern | At each old-to-new system cutover |

Without defined integration points, you end up with "it all works individually but doesn't work together."


Task Decomposition Principles

After considering the perspectives, break down into concrete tasks:

Executable granularity:

  • Each task = one meaningful commit
  • Clear completion criteria
  • Explicit dependencies

Minimize dependencies:

  • Maximum 2 levels deep (A→B→C is okay, A→B→C→D needs redesign)
  • Tasks with 3+ chained dependencies should be split
  • Each task should ideally provide independent value

Build quality in:

  • Don't make "write tests" a separate task—include testing in the implementation task
  • Tag each task with its completion level (L1/L2/L3, though in practice L1 is almost always what you want)

Work Planning Anti-Patterns

| Anti-pattern | Consequence |
| --- | --- |
| Skip current-state analysis | Plan doesn't fit the codebase |
| Ignore risks | Expensive surprises mid-implementation |
| Ignore constraints | Plan isn't executable |
| Over-detail | Lost flexibility, wasted planning time |
| Undefined completion criteria | "Done" is ambiguous; verification impossible |

Scaling to Task Size

Not every task needs full work planning.

| Scale | Planning depth |
| --- | --- |
| Small (1-2 hours) | Verbal/mental notes or a simple TODO list |
| Medium (1 day to 1 week) | Written work plan, but abbreviated |
| Large (1+ weeks) | Full work plan covering all perspectives |

For a typo fix, you don't need a work plan. For a multi-week refactor, you absolutely do.


Step 4: Execution

Goal: Implement according to the work plan.

Work in Small Steps

Follow the plan. One task at a time. One file, one function at a time where appropriate.

Types-First

When adding new functionality, define interfaces and types before implementing logic. Type definitions become guardrails that help both you and the LLM stay on track.
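For example, in a TypeScript codebase you might pin down the contract before any logic exists (the names here are hypothetical):

```typescript
// 1. Types first: the contract is fixed before any logic is written.
interface OrderFilter {
  status: "open" | "shipped" | "cancelled";
}

interface Order {
  id: string;
  status: OrderFilter["status"];
}

// 2. Implementation second: the signature is already decided, so both you
// and the LLM verify against it instead of inventing it mid-flight.
function filterOrders(orders: Order[], filter: OrderFilter): Order[] {
  return orders.filter((order) => order.status === filter.status);
}
```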

Why This Changes Everything

With a work plan in place, execution becomes verification. The LLM isn't guessing what to build—it's checking whether the implementation matches the plan.

If you need to deviate from the plan, update the plan first, then continue implementation. Don't let plan and implementation drift apart.


Step 5: Verification & Feedback

Goal: Verify results and externalize learnings.

Feedback Format

When something goes wrong, don't just paste an error. Include the intent:

❌ Just the error
[error log]

✅ Intent + error
Goal: Redirect to dashboard after authentication
Issue: Following error occurs
[error log]

Without intent, the LLM optimizes for "remove the error." With intent, it optimizes for "achieve the goal."

Externalize Learnings

If you find yourself explaining the same thing twice, it's time to write it down.

I covered this in detail in the previous article—where to put rules, what to write, and how to verify they work. The short version: write root causes, not specific incidents, and put them where they'll actually be read.


Referencing Skills and Rules

One common failure mode: you reference a skill or rule file, but the LLM just reads it and moves on without actually applying it.

The Problem

| Pattern | Issue |
| --- | --- |
| Write "see AGENTS.md" | It's already loaded; the redundant reference adds noise |
| @file.md only | The LLM reads it, then continues. Reading ≠ applying |
| "Please reference X" | References it minimally, doesn't apply the content |

The Solution: Blocking References

Make the reference a task with verification:

## Required Rules [MANDATORY - MUST BE ACTIVE]

**LOADING PROTOCOL:**
- STEP 1: CHECK if `.agents/skills/coding-rules/SKILL.md` is active
- STEP 2: If NOT active → Execute BLOCKING READ
- STEP 3: CONFIRM skill active before proceeding

Why This Works

| Element | Effect |
| --- | --- |
| Action verbs | "CHECK", "READ", "CONFIRM", not just "reference" |
| STEP numbers | Forces a sequence; steps can't be skipped |
| "Before proceeding" | Blocking: must complete before continuing |
| "If NOT active" | Conditional: skips if already loaded (efficiency) |

This maps to the task clarity principle: "check if loaded → load if needed → confirm → proceed" is far clearer than "please reference this file."


How This Connects to the Theory

| Step | Connection to LLM characteristics |
| --- | --- |
| Step 1: Preparation | Task clarification |
| Step 2: Design | Artifact-first (the design doc is an artifact) |
| Step 3: Work Planning | Artifact-first (the plan is an artifact) + external feedback design |
| Step 4: Execution | Transforms "generation" into "verification against the plan" |
| Step 5: Verification | Obtain external feedback + externalize learnings |

The work plan created in Step 3 converts Step 4 from "generate from scratch" to "verify against specification." This is the key mechanism for improving accuracy.


The Research

The practices in this article aren't just workflow opinions—they're backed by research on how LLM agents perform.

ADaPT (Prasad et al., NAACL 2024): Separating planning and execution, with dynamic subtask decomposition when needed, achieved up to 33% higher success rates than baselines (gains of 28.3% on ALFWorld, 27% on WebShop, and 33% on TextCraft).

Plan-and-Execute (LangChain): Explicit long-term planning enables handling complex tasks that even powerful LLMs struggle with in step-by-step mode.

Multi-Layer Task Decomposition (PMC, 2024): Step-by-step models generate more accurate results than direct generation—task decomposition directly improves output quality.

Task Decomposition (Amazon Science, 2025): With proper task decomposition, smaller specialized models can match the performance of larger general models.


Key Takeaways

  1. Don't let it execute immediately. Ask for a plan first. Even just "present your approach step-by-step before implementing" makes a significant difference.
  2. Work Planning is the superpower. A plan is an artifact. Having it converts execution from generation to verification—and LLMs are better at verification.
  3. Define completion criteria. L1 (works as feature) > L2 (tests pass) > L3 (builds). Know what "done" means before starting.
  4. Scale to task size. Small task = mental note. Large task = full work plan. Don't over-plan trivial work, don't under-plan complex work.
  5. Update plan before deviating. If implementation needs to differ from the plan, update the plan first. Drift kills the verification benefit.
  6. Include intent with errors. "Goal + error" beats "just error." The LLM should know what you're trying to achieve, not just what went wrong.

References

  • Prasad, A., et al. (2024). "ADaPT: As-Needed Decomposition and Planning with Language Models." NAACL 2024 Findings. arXiv:2311.05772
  • Wang, L., et al. (2023). "Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models." ACL 2023.
  • LangChain. "Plan-and-Execute Agents." https://blog.langchain.com/planning-agents/
