The Anatomy of a Perfect AI Agent Task

John Young — Mon, 27 Apr 2026 20:30:03 +0000

A well-crafted task for an AI coding agent is essentially context engineering — you're deliberately curating the minimum set of information the agent needs to produce the right output on the first try. Rather than pre-loading everything up front, the best approach combines focused instructions with enough pointers that the agent can pull in additional context just-in-time as it works (Anthropic — Effective Context Engineering). Below is a breakdown of every element that matters, why it matters, and a full example at the end that ties it all together.

When to Use This

The seven elements below describe the upper-bound shape of a non-trivial task spec, not a baseline checklist. For trivial work (fixing a typo, renaming a variable, anything where the agent has no real risk of getting it wrong), skip the elaborate spec. The companion sizing post uses "describable in one sentence" as a sizing test, not a triviality test: well-sized tasks often fit in one sentence yet still warrant a full spec when there are constraints, edge cases, or pitfalls to communicate. The worked example below is one such task. Even for non-trivial tasks, treat these elements as a maximum rather than a minimum: frontier LLMs are reported to follow only about 150–200 instructions reliably before performance degrades, and every irrelevant detail dilutes the signal of the rest (HumanLayer: Writing a Good CLAUDE.md).

1. State the Goal, Not the Steps

Lead with the outcome you want, not a micro-managed sequence of instructions. Agents perform better when they understand the "why" and can plan their own approach.

Bad: "Open user.go, find the CreateUser function, add a field called PhoneNumber..."
Good: "Add phone number support to user registration, including validation, storage, and API response."

"The best task descriptions share three properties: they state the goal, provide constraints, and define done."
— Claude Directory: Context Engineering for Claude Code

2. Provide Architectural Context the Agent Can't Infer

The agent can read your code. What it can't read is the reasoning behind your architectural decisions, team conventions, or the "why" behind structural choices. Include only what's not derivable from the codebase itself.

Include things like:

Why the architecture is shaped a certain way (e.g., "We use the repository pattern to keep DB logic out of handlers")
Relevant files and entry points (saves the agent from searching blindly and burning context window)
Technology choices and versions (e.g., "Go 1.22, sqlc for query generation, chi router")
Domain-specific terminology the agent might misinterpret

"Claude already knows what your project is after reading a few files. What it needs is information it can't derive from reading code."
— Claude Directory: Context Engineering

That said, there's a discipline to this: more context is not always better. Frontier LLMs are reported to follow roughly 150–200 instructions reliably before performance degrades (HumanLayer: Writing a Good CLAUDE.md), and broader context-rot studies show models attend to context less reliably as input grows (Chroma: Context Rot, Hong et al., 2025). Every irrelevant detail you add dilutes the signal of the details that actually matter.

"Your CLAUDE.md file should contain as few instructions as possible — ideally only ones which are universally applicable. An LLM will perform better when its context window is full of focused, relevant context compared to when it has a lot of irrelevant context."
— HumanLayer: Writing a Good CLAUDE.md

3. Define Explicit Constraints and Non-Goals

This is where most tasks fall apart. Without boundaries, agents will happily refactor your auth layer while you asked them to add a field to a struct.

Constraints: What rules must be followed (e.g., "Do not change the public API contract," "Use the existing validate package, do not introduce a new dependency")
Non-goals: What is explicitly out of scope (e.g., "Do not modify the frontend," "Do not refactor existing tests")

"Without constraints, AI might miss pagination for list APIs, use field injection instead of constructor injection, or not adhere to your project's package structure."
— JetBrains: Coding Guidelines for AI Agents

4. Provide Concrete Examples and Reference Implementations

One of the highest-leverage things you can do. Point the agent at an existing implementation in your codebase that follows the pattern you want replicated.

"Follow the same pattern as internal/order/handler.go for the new endpoint."
"See migrations/003_add_email.sql for the migration format we use."

"Include helpful examples for reference. ❌ 'Implement tests for class ImageProcessor' → ✅ 'Implement tests for class ImageProcessor. Check text_processor.py for test organization examples.'"
— Augment Code: Best Practices for AI Coding Agents

5. Define "Done" with Acceptance Criteria

If you don't define what "done" looks like, the agent will decide for you — and you probably won't agree.

Acceptance criteria should be:

Observable (can be verified by running something)
Specific (not "should work correctly")
Testable (ideally map to test cases)

"Create a set of tests that will determine if the generated code works based on your requirements."
— Google Cloud: Five Best Practices for AI Coding Assistants

6. Include Verification Commands

Tell the agent exactly how to confirm its own work. This is the difference between "I think it works" and "it passes the build."

    go test ./internal/user/...
    go vet ./...
    golangci-lint run
    curl -s -X POST localhost:8080/api/v1/users -H 'Content-Type: application/json' -d '{"phone": "+1234567890"}' | jq .

"Claude Code's best practices emphasize including Bash commands for verification. This gives Claude persistent context it can't infer from code alone."
— Claude Code Docs: Best Practices

7. Call Out Edge Cases and Known Pitfalls

You know things about your system the agent doesn't. If there's a footgun, flag it. If there's a non-obvious coupling between modules, say so.

"The user_id column has a unique constraint — the migration must handle existing duplicates."
"The Validate() method is called both at the handler level and inside the repository. Don't double-validate."

The Full Example

A non-trivial feature decomposes into a handful of well-sized tasks. Take adding an optional phone number to user registration — accepted on signup, persisted on the user record, and returned by the user API. That feature splits into four tasks, one per architectural layer:

Migration — Add a nullable phone_number column with reversible up/down SQL.
Model + sqlc — Update the User struct and regenerate sqlc queries.
Service + validation — Add ValidatePhone to UserService using validate.PhoneE164, with unit tests.
Handler + integration — Wire the field through POST and GET /api/v1/users and add integration tests.

The third is spec'd out in full below as the worked example. It's the strongest illustration of the seven elements at the right scope: the task can be described in one sentence, it stays inside a single layer, the agent reads ~5 files, the diff lands well under the 200 LOC ceiling, and it can be verified independently, so it passes every gate of the companion sizing post's decision flowchart.

## Task Spec: Add E.164 phone validation to UserService

### Goal
Phone numbers submitted to user registration must be rejected at the service layer when they aren't valid E.164. This task delivers that check; handler wiring and DB persistence are separate tasks.

### Architectural Context
- Semantic validation belongs in the service, not the handler. Handler does null/shape; service owns format and bounds.
- `UserService.ValidateEmail` is the canonical example of this split — match its shape.

### Relevant Files
- `internal/user/service.go` — add `ValidatePhone` here.
- `internal/user/service_test.go` — add tests here.
- `internal/pkg/validate/phone.go` — read-only reference for `PhoneE164` and `validate.Error`.

### Reference Implementation
Mirror `UserService.ValidateEmail` in `service.go`:
- Signature: `func (s *UserService) ValidatePhone(phone *string) error`.
- Nil pointer → return nil. Empty string → return error.
- Return the `*validate.Error` from `PhoneE164` unwrapped — no `fmt.Errorf`.
- Copy the table-driven layout from `TestUserService_ValidateEmail`.

### Constraints
- Use `validate.PhoneE164`. No regex, no new dependencies.
- Don't touch `UserRepository` or its mock — validation is pure.
- Don't wrap the error; the handler relies on `errors.As` with a `*validate.Error` target to map it to HTTP 422.

### Non-Goals
No handler, migration, sqlc, or integration-test changes. No edits to `ValidateEmail` or other unrelated methods.

### Edge Cases
- `phone == nil` → return nil (field not provided).
- `*phone == ""` → return `*validate.Error` (malformed input).
- Strict E.164: `1234567890` (no leading `+`) must fail.
- The handler already checks the JSON field is present and is a string — don't re-check those concerns here.

### Acceptance Criteria
1. `ValidatePhone(phone *string) error` on `UserService`.
2. `nil` phone → returns nil.
3. Empty or non-E.164 → returns `*validate.Error` (verifiable via `errors.As`).
4. Valid E.164 (e.g., `+14155552671`) → returns nil.
5. At least four test cases: valid, invalid, nil, empty.
6. Only `service.go` and `service_test.go` change.

### Verification
    go test ./internal/user/... -v -run ValidatePhone
    go vet ./...
    golangci-lint run ./internal/user/...

Why This Works

Element	Purpose
Goal	Anchors the agent on what and why, not how
Architectural context	Provides knowledge the agent can't infer from code
Relevant files	Eliminates unnecessary exploration and context burn
Reference implementation	"Do it like this" is worth 1,000 words of description
Constraints + non-goals	Prevents scope creep and unsolicited refactors
Edge cases	Surfaces domain knowledge only you have
Acceptance criteria	Defines "done" in observable, testable terms
Verification commands	Lets the agent self-check before declaring victory

References

Anthropic — Effective Context Engineering for AI Agents — Why just-in-time context retrieval and focused instructions outperform pre-loading everything into the prompt.
Claude Code Docs — Best Practices — Including verification commands and CLAUDE.md conventions so the agent can self-check its work.
Claude Directory — Context Engineering for Claude Code — The task trifecta: state the goal, provide constraints, define done.
Augment Code — Best Practices for Using AI Coding Agents — Pointing agents at reference implementations and reviewing changes after each sub-task.
JetBrains — Coding Guidelines for Your AI Agents — How missing constraints lead agents to skip pagination, misuse injection patterns, and ignore project conventions.
Google Cloud — Five Best Practices for AI Coding Assistants — Planning-first workflow and using tests as acceptance criteria for generated code.
HumanLayer — Writing a Good CLAUDE.md — Why fewer, focused instructions outperform instruction overload, and the ~150–200 instruction ceiling for frontier models.
Chroma — Context Rot (Hong et al., 2025) — Empirical study across 18 LLMs showing that attention to context degrades non-uniformly as input length grows.

DEV Community: John Young