<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: John Young</title>
    <description>The latest articles on DEV Community by John Young (@johnayoung).</description>
    <link>https://dev.to/johnayoung</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F982173%2Fbae3202f-5bd5-4bb9-a29f-dce02da5c2f0.png</url>
      <title>DEV Community: John Young</title>
      <link>https://dev.to/johnayoung</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/johnayoung"/>
    <language>en</language>
    <item>
      <title>The Anatomy of a Perfect AI Agent Task</title>
      <dc:creator>John Young</dc:creator>
      <pubDate>Mon, 27 Apr 2026 20:30:03 +0000</pubDate>
      <link>https://dev.to/johnayoung/the-anatomy-of-a-perfect-ai-agent-task-4a2m</link>
      <guid>https://dev.to/johnayoung/the-anatomy-of-a-perfect-ai-agent-task-4a2m</guid>
      <description>&lt;p&gt;A well-crafted task for an AI coding agent is essentially context engineering — you're deliberately curating the minimum set of information the agent needs to produce the right output on the first try. Rather than pre-loading everything up front, the best approach combines focused instructions with enough pointers that the agent can pull in additional context just-in-time as it works (&lt;a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="noopener noreferrer"&gt;Anthropic — Effective Context Engineering&lt;/a&gt;). Below is a breakdown of every element that matters, why it matters, and a full example at the end that ties it all together.&lt;/p&gt;




&lt;h2&gt;
  
  
  When to Use This
&lt;/h2&gt;

&lt;p&gt;The seven elements below describe the upper-bound shape of a non-trivial task spec, not a baseline checklist. For trivial work (fixing a typo, renaming a variable, anything where the agent has no real risk of getting it wrong), skip the elaborate spec. The &lt;a href="https://dev.to/blog/how-to-size-tasks-for-ai-coding-agents/#heuristic-2-the-one-sentence-diff-test"&gt;companion sizing post&lt;/a&gt; uses "describable in one sentence" as a &lt;em&gt;sizing&lt;/em&gt; test, not a triviality test: well-sized tasks often fit in one sentence yet still warrant a full spec when there are constraints, edge cases, or pitfalls to communicate. The worked example below is one such task. Even for non-trivial tasks, treat these elements as a maximum rather than a minimum: HumanLayer reports that frontier LLMs reliably follow only about 150–200 instructions before performance degrades, and every irrelevant detail dilutes the signal of the rest (&lt;a href="https://www.humanlayer.dev/blog/writing-a-good-claude-md" rel="noopener noreferrer"&gt;HumanLayer: Writing a Good CLAUDE.md&lt;/a&gt;).&lt;/p&gt;




&lt;h2&gt;
  
  
  1. State the Goal, Not the Steps
&lt;/h2&gt;

&lt;p&gt;Lead with the &lt;em&gt;outcome&lt;/em&gt; you want, not a micro-managed sequence of instructions. Agents perform better when they understand the "why" and can plan their own approach.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Bad:&lt;/strong&gt; "Open &lt;code&gt;user.go&lt;/code&gt;, find the &lt;code&gt;CreateUser&lt;/code&gt; function, add a field called &lt;code&gt;PhoneNumber&lt;/code&gt;..."&lt;br&gt;
&lt;strong&gt;Good:&lt;/strong&gt; "Add phone number support to user registration, including validation, storage, and API response."&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"The best task descriptions share three properties: they state the goal, provide constraints, and define done."&lt;/em&gt;&lt;br&gt;
— &lt;a href="https://www.claudedirectory.org/blog/context-engineering-claude-code" rel="noopener noreferrer"&gt;Claude Directory: Context Engineering for Claude Code&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  2. Provide Architectural Context the Agent Can't Infer
&lt;/h2&gt;

&lt;p&gt;The agent can read your code. What it &lt;em&gt;can't&lt;/em&gt; read is the reasoning behind your architectural decisions, team conventions, or the "why" behind structural choices. Include only what's not derivable from the codebase itself.&lt;/p&gt;

&lt;p&gt;Include things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Why&lt;/strong&gt; the architecture is shaped a certain way (e.g., "We use the repository pattern to keep DB logic out of handlers")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Relevant files and entry points&lt;/strong&gt; (saves the agent from searching blindly and burning context window)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Technology choices and versions&lt;/strong&gt; (e.g., "Go 1.22, sqlc for query generation, chi router")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Domain-specific terminology&lt;/strong&gt; the agent might misinterpret&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Claude already knows what your project is after reading a few files. What it needs is information it can't derive from reading code."&lt;/em&gt;&lt;br&gt;
— &lt;a href="https://www.claudedirectory.org/blog/context-engineering-claude-code" rel="noopener noreferrer"&gt;Claude Directory: Context Engineering&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That said, there's a discipline to this — more context is not always better. Research suggests frontier LLMs can reliably follow roughly 150–200 instructions before performance degrades, and broader context-rot studies show models attend to context less reliably as input grows (&lt;a href="https://research.trychroma.com/context-rot" rel="noopener noreferrer"&gt;Chroma: Context Rot — Hong et al., 2025&lt;/a&gt;). Every irrelevant detail you add dilutes the signal of the details that actually matter.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Your CLAUDE.md file should contain as few instructions as possible — ideally only ones which are universally applicable. An LLM will perform better when its context window is full of focused, relevant context compared to when it has a lot of irrelevant context."&lt;/em&gt;&lt;br&gt;
— &lt;a href="https://www.humanlayer.dev/blog/writing-a-good-claude-md" rel="noopener noreferrer"&gt;HumanLayer: Writing a Good CLAUDE.md&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  3. Define Explicit Constraints and Non-Goals
&lt;/h2&gt;

&lt;p&gt;This is where most tasks fall apart. Without boundaries, agents will happily refactor your auth layer while you asked them to add a field to a struct.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Constraints:&lt;/strong&gt; What rules must be followed (e.g., "Do not change the public API contract," "Use the existing &lt;code&gt;validate&lt;/code&gt; package, do not introduce a new dependency")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Non-goals:&lt;/strong&gt; What is explicitly out of scope (e.g., "Do not modify the frontend," "Do not refactor existing tests")&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Without constraints, AI might miss pagination for list APIs, use field injection instead of constructor injection, or not adhere to your project's package structure."&lt;/em&gt;&lt;br&gt;
— &lt;a href="https://blog.jetbrains.com/idea/2025/05/coding-guidelines-for-your-ai-agents/" rel="noopener noreferrer"&gt;JetBrains: Coding Guidelines for AI Agents&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  4. Provide Concrete Examples and Reference Implementations
&lt;/h2&gt;

&lt;p&gt;This is one of the highest-leverage things you can do: point the agent at an existing implementation in your codebase that follows the pattern you want replicated.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Follow the same pattern as &lt;code&gt;internal/order/handler.go&lt;/code&gt; for the new endpoint."&lt;/li&gt;
&lt;li&gt;"See &lt;code&gt;migrations/003_add_email.sql&lt;/code&gt; for the migration format we use."&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Include helpful examples for reference. ❌ 'Implement tests for class ImageProcessor' → ✅ 'Implement tests for class ImageProcessor. Check text_processor.py for test organization examples.'"&lt;/em&gt;&lt;br&gt;
— &lt;a href="https://www.augmentcode.com/blog/best-practices-for-using-ai-coding-agents" rel="noopener noreferrer"&gt;Augment Code: Best Practices for AI Coding Agents&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  5. Define "Done" with Acceptance Criteria
&lt;/h2&gt;

&lt;p&gt;If you don't define what "done" looks like, the agent will decide for you — and you probably won't agree.&lt;/p&gt;

&lt;p&gt;Acceptance criteria should be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Observable&lt;/strong&gt; (can be verified by running something)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Specific&lt;/strong&gt; (not "should work correctly")&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Testable&lt;/strong&gt; (ideally map to test cases)&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Create a set of tests that will determine if the generated code works based on your requirements."&lt;/em&gt;&lt;br&gt;
— &lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/five-best-practices-for-using-ai-coding-assistants" rel="noopener noreferrer"&gt;Google Cloud: Five Best Practices for AI Coding Assistants&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  6. Include Verification Commands
&lt;/h2&gt;

&lt;p&gt;Tell the agent exactly how to confirm its own work. This is the difference between "I think it works" and "it passes the build."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;go test ./internal/user/...&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;go vet ./...&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;golangci-lint run&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;curl -s -X POST localhost:8080/api/v1/users -H 'Content-Type: application/json' -d '{"phone": "+1234567890"}' | jq .&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Claude Code's best practices emphasize including Bash commands for verification. This gives Claude persistent context it can't infer from code alone."&lt;/em&gt;&lt;br&gt;
— &lt;a href="https://code.claude.com/docs/en/best-practices" rel="noopener noreferrer"&gt;Claude Code Docs: Best Practices&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;


&lt;h2&gt;
  
  
  7. Call Out Edge Cases and Known Pitfalls
&lt;/h2&gt;

&lt;p&gt;You know things about your system the agent doesn't. If there's a footgun, flag it. If there's a non-obvious coupling between modules, say so.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"The migration adds a unique constraint on &lt;code&gt;user_id&lt;/code&gt;, so it must deal with existing duplicates first."&lt;/li&gt;
&lt;li&gt;"The &lt;code&gt;Validate()&lt;/code&gt; method is called both at the handler level and inside the repository. Don't double-validate."&lt;/li&gt;
&lt;/ul&gt;


&lt;h2&gt;
  
  
  The Full Example
&lt;/h2&gt;

&lt;p&gt;A non-trivial feature decomposes into a handful of well-sized tasks. Take adding an optional phone number to user registration — accepted on signup, persisted on the user record, and returned by the user API. That feature splits into four tasks, one per architectural layer:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Migration&lt;/strong&gt; — Add a nullable &lt;code&gt;phone_number&lt;/code&gt; column with reversible up/down SQL.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Model + sqlc&lt;/strong&gt; — Update the &lt;code&gt;User&lt;/code&gt; struct and regenerate sqlc queries.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Service + validation&lt;/strong&gt; — Add &lt;code&gt;ValidatePhone&lt;/code&gt; to &lt;code&gt;UserService&lt;/code&gt; using &lt;code&gt;validate.PhoneE164&lt;/code&gt;, with unit tests.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Handler + integration&lt;/strong&gt; — Wire the field through &lt;code&gt;POST&lt;/code&gt; and &lt;code&gt;GET /api/v1/users&lt;/code&gt; and add integration tests.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The third is spec'd out in full below as the worked example. It's the strongest illustration of the seven elements at the right scope: the diff fits in one sentence, it stays inside a single layer, the agent reads ~5 files, the change lands well under the 200 LOC ceiling, and it can be verified independently, passing every gate of the &lt;a href="https://dev.to/blog/how-to-size-tasks-for-ai-coding-agents/#sizing-decision-flowchart"&gt;companion sizing post's decision flowchart&lt;/a&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;## Task Spec: Add E.164 phone validation to UserService&lt;/span&gt;

&lt;span class="gu"&gt;### Goal&lt;/span&gt;
Phone numbers submitted to user registration must be rejected at the service layer when they aren't valid E.164. This task delivers that check; handler wiring and DB persistence are separate tasks.

&lt;span class="gu"&gt;### Architectural Context&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Semantic validation belongs in the service, not the handler. Handler does null/shape; service owns format and bounds.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`UserService.ValidateEmail`&lt;/span&gt; is the canonical example of this split — match its shape.

&lt;span class="gu"&gt;### Relevant Files&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`internal/user/service.go`&lt;/span&gt; — add &lt;span class="sb"&gt;`ValidatePhone`&lt;/span&gt; here.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`internal/user/service_test.go`&lt;/span&gt; — add tests here.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`internal/pkg/validate/phone.go`&lt;/span&gt; — read-only reference for &lt;span class="sb"&gt;`PhoneE164`&lt;/span&gt; and &lt;span class="sb"&gt;`validate.Error`&lt;/span&gt;.

&lt;span class="gu"&gt;### Reference Implementation&lt;/span&gt;
Mirror &lt;span class="sb"&gt;`UserService.ValidateEmail`&lt;/span&gt; in &lt;span class="sb"&gt;`service.go`&lt;/span&gt;:
&lt;span class="p"&gt;-&lt;/span&gt; Signature: &lt;span class="sb"&gt;`func (s *UserService) ValidatePhone(phone *string) error`&lt;/span&gt;.
&lt;span class="p"&gt;-&lt;/span&gt; Nil pointer → return nil. Empty string → return error.
&lt;span class="p"&gt;-&lt;/span&gt; Return the &lt;span class="sb"&gt;`*validate.Error`&lt;/span&gt; from &lt;span class="sb"&gt;`PhoneE164`&lt;/span&gt; unwrapped — no &lt;span class="sb"&gt;`fmt.Errorf`&lt;/span&gt;.
&lt;span class="p"&gt;-&lt;/span&gt; Copy the table-driven layout from &lt;span class="sb"&gt;`TestUserService_ValidateEmail`&lt;/span&gt;.

&lt;span class="gu"&gt;### Constraints&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; Use &lt;span class="sb"&gt;`validate.PhoneE164`&lt;/span&gt;. No regex, no new dependencies.
&lt;span class="p"&gt;-&lt;/span&gt; Don't touch &lt;span class="sb"&gt;`UserRepository`&lt;/span&gt; or its mock — validation is pure.
&lt;span class="p"&gt;-&lt;/span&gt; Don't wrap the error; the handler relies on &lt;span class="sb"&gt;`errors.As`&lt;/span&gt; with a &lt;span class="sb"&gt;`*validate.Error`&lt;/span&gt; target to map it to HTTP 422.

&lt;span class="gu"&gt;### Non-Goals&lt;/span&gt;
No handler, migration, sqlc, or integration-test changes. No edits to &lt;span class="sb"&gt;`ValidateEmail`&lt;/span&gt; or other unrelated methods.

&lt;span class="gu"&gt;### Edge Cases&lt;/span&gt;
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`phone == nil`&lt;/span&gt; → return nil (field not provided).
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`*phone == ""`&lt;/span&gt; → return &lt;span class="sb"&gt;`*validate.Error`&lt;/span&gt; (malformed input).
&lt;span class="p"&gt;-&lt;/span&gt; Strict E.164: &lt;span class="sb"&gt;`1234567890`&lt;/span&gt; (no leading &lt;span class="sb"&gt;`+`&lt;/span&gt;) must fail.
&lt;span class="p"&gt;-&lt;/span&gt; The handler already checks the JSON field is present and is a string — don't re-check those concerns here.

&lt;span class="gu"&gt;### Acceptance Criteria&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; &lt;span class="sb"&gt;`ValidatePhone(phone *string) error`&lt;/span&gt; on &lt;span class="sb"&gt;`UserService`&lt;/span&gt;.
&lt;span class="p"&gt;2.&lt;/span&gt; &lt;span class="sb"&gt;`nil`&lt;/span&gt; phone → returns nil.
&lt;span class="p"&gt;3.&lt;/span&gt; Empty or non-E.164 → returns &lt;span class="sb"&gt;`*validate.Error`&lt;/span&gt; (verifiable via &lt;span class="sb"&gt;`errors.As`&lt;/span&gt;).
&lt;span class="p"&gt;4.&lt;/span&gt; Valid E.164 (e.g., &lt;span class="sb"&gt;`+14155552671`&lt;/span&gt;) → returns nil.
&lt;span class="p"&gt;5.&lt;/span&gt; At least four test cases: valid, invalid, nil, empty.
&lt;span class="p"&gt;6.&lt;/span&gt; Only &lt;span class="sb"&gt;`service.go`&lt;/span&gt; and &lt;span class="sb"&gt;`service_test.go`&lt;/span&gt; change.

&lt;span class="gu"&gt;### Verification&lt;/span&gt;
    go test ./internal/user/... -v -run ValidatePhone
    go vet ./...
    golangci-lint run ./internal/user/...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Element&lt;/th&gt;
&lt;th&gt;Purpose&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Goal&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Anchors the agent on &lt;em&gt;what&lt;/em&gt; and &lt;em&gt;why&lt;/em&gt;, not &lt;em&gt;how&lt;/em&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Architectural context&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Provides knowledge the agent can't infer from code&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Relevant files&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Eliminates unnecessary exploration and context burn&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Reference implementation&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"Do it like this" is worth 1,000 words of description&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Constraints + non-goals&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Prevents scope creep and unsolicited refactors&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Edge cases&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Surfaces domain knowledge only you have&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Acceptance criteria&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Defines "done" in observable, testable terms&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Verification commands&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Lets the agent self-check before declaring victory&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;



&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;&lt;a href="https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents" rel="noopener noreferrer"&gt;Anthropic: Effective Context Engineering for AI Agents&lt;/a&gt;. Why just-in-time context retrieval and focused instructions outperform pre-loading everything into the prompt.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://code.claude.com/docs/en/best-practices" rel="noopener noreferrer"&gt;Claude Code Docs: Best Practices&lt;/a&gt;. Including verification commands and CLAUDE.md conventions so the agent can self-check its work.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.claudedirectory.org/blog/context-engineering-claude-code" rel="noopener noreferrer"&gt;Claude Directory: Context Engineering for Claude Code&lt;/a&gt;. The task trifecta: state the goal, provide constraints, define done.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.augmentcode.com/blog/best-practices-for-using-ai-coding-agents" rel="noopener noreferrer"&gt;Augment Code: Best Practices for Using AI Coding Agents&lt;/a&gt;. Pointing agents at reference implementations and reviewing changes after each sub-task.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://blog.jetbrains.com/idea/2025/05/coding-guidelines-for-your-ai-agents/" rel="noopener noreferrer"&gt;JetBrains: Coding Guidelines for Your AI Agents&lt;/a&gt;. How missing constraints lead agents to skip pagination, misuse injection patterns, and ignore project conventions.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cloud.google.com/blog/topics/developers-practitioners/five-best-practices-for-using-ai-coding-assistants" rel="noopener noreferrer"&gt;Google Cloud: Five Best Practices for AI Coding Assistants&lt;/a&gt;. Planning-first workflow and using tests as acceptance criteria for generated code.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.humanlayer.dev/blog/writing-a-good-claude-md" rel="noopener noreferrer"&gt;HumanLayer: Writing a Good CLAUDE.md&lt;/a&gt;. Why fewer, focused instructions outperform instruction overload, and the ~150–200 instruction ceiling for frontier models.&lt;/li&gt;
&lt;li&gt;&lt;a href="https://research.trychroma.com/context-rot" rel="noopener noreferrer"&gt;Chroma: Context Rot (Hong et al., 2025)&lt;/a&gt;. Empirical study across 18 LLMs showing that attention to context degrades non-uniformly as input length grows.&lt;/li&gt;
&lt;/ol&gt;
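&lt;p&gt;As a rough illustration of where the worked example's task spec leads, here is a minimal, self-contained Go sketch of the service-layer check. It is not the author's implementation: &lt;code&gt;ValidationError&lt;/code&gt; and &lt;code&gt;PhoneE164&lt;/code&gt; are hypothetical stand-ins for the spec's &lt;code&gt;validate.Error&lt;/code&gt; and &lt;code&gt;validate.PhoneE164&lt;/code&gt; (their real behavior isn't shown in this post), the E.164 check is deliberately simplified, and &lt;code&gt;strPtr&lt;/code&gt; and &lt;code&gt;isValidationError&lt;/code&gt; are helpers invented for the demo. In a real repository the case table would live in &lt;code&gt;service_test.go&lt;/code&gt; rather than in &lt;code&gt;main&lt;/code&gt;.&lt;/p&gt;

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// ValidationError is a hypothetical stand-in for the spec's validate.Error.
type ValidationError struct {
	Field  string
	Reason string
}

func (e *ValidationError) Error() string { return e.Field + ": " + e.Reason }

// newValidationError allocates and fills a ValidationError, returned as error
// so that a "no error" result is a true nil interface value.
func newValidationError(field, reason string) error {
	e := new(ValidationError)
	e.Field = field
	e.Reason = reason
	return e
}

// PhoneE164 is a hypothetical, simplified stand-in for validate.PhoneE164:
// a leading "+", then 1 to 15 ASCII digits, the first of which is not 0.
func PhoneE164(s string) error {
	digits, ok := strings.CutPrefix(s, "+")
	if !ok {
		return newValidationError("phone", "must start with +")
	}
	if len(digits) == 0 || len(digits) > 15 || digits[0] == '0' {
		return newValidationError("phone", "must be 1 to 15 digits, not starting with 0")
	}
	for _, r := range digits {
		if !strings.ContainsRune("0123456789", r) {
			return newValidationError("phone", "must contain only digits after +")
		}
	}
	return nil
}

// UserService is an assumed, dependency-free shape; validation is pure.
type UserService struct{}

// ValidatePhone treats nil as "field not provided" and otherwise returns
// the error from PhoneE164 unchanged (no fmt.Errorf wrapping).
func (s *UserService) ValidatePhone(phone *string) error {
	if phone == nil {
		return nil
	}
	return PhoneE164(*phone)
}

// strPtr returns a pointer to a copy of s (demo helper).
func strPtr(s string) *string {
	p := new(string)
	*p = s
	return p
}

// isValidationError reports whether err carries a *ValidationError, using
// errors.As as the spec says the handler does. new(*ValidationError) yields
// a pointer to a nil *ValidationError, which is a valid errors.As target.
func isValidationError(err error) bool {
	target := new(*ValidationError)
	return errors.As(err, target)
}

func main() {
	svc := new(UserService)
	// Each row names the acceptance criterion it exercises.
	cases := []struct {
		name    string
		phone   *string
		wantErr bool
	}{
		{"criterion 2: nil phone", nil, false},
		{"criterion 3: empty string", strPtr(""), true},
		{"criterion 3: missing leading +", strPtr("1234567890"), true},
		{"criterion 4: valid E.164", strPtr("+14155552671"), false},
	}
	for _, tc := range cases {
		got := isValidationError(svc.ValidatePhone(tc.phone))
		fmt.Printf("%-32s wantErr=%v gotErr=%v\n", tc.name, tc.wantErr, got)
	}
}
```

&lt;p&gt;Each row in the table maps to one numbered acceptance criterion, which is what makes the spec's definition of "done" checkable rather than a matter of opinion.&lt;/p&gt;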

</description>
      <category>ai</category>
      <category>programming</category>
      <category>devops</category>
      <category>productivity</category>
    </item>
  </channel>
</rss>
