DEV Community

Aman Bhandari

How I turned 10 practitioners into a single .claude/ pedagogy

Every rule in my .claude/ directory cites the practitioner whose working method it leans on. This is not reading-list decoration. It is a traceability requirement: if a teaching exchange or an engineering decision cannot be pinned to a named 2025-2026 practitioner doing that specific thing in public, the rule is ungrounded and gets removed.

Ten practitioners form the spine. The five-node concentric loop pins each node to one or two of them. The five agentic-engineering habits pin each habit to one. Together they define what the framework inherits from the applied community instead of inventing in a vacuum.

Framework repo: claude-code-agent-skills-framework.

The ten, by node and habit

Chip Huyen — Code / I-O framing (concentric-loop Node 2). Every code node opens with explicit input/output specification: what goes in, what comes out, what data is available at what latency tier (online / nearline / offline). The latency-tier framing originates with the Netflix recommendation system (Amatriain and Basilico, 2013). Huyen mainstreamed it in Designing Machine Learning Systems, Chapter 2. Source: chiphuyen.com.
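
Concretely, an I-O opening for a code node might be sketched like this; the dataclass shapes, feature names, and tier comments below are illustrative assumptions, not Huyen's notation:

```python
from dataclasses import dataclass
from enum import Enum

class LatencyTier(Enum):
    ONLINE = "online"      # computed per-request, tens-of-ms budget
    NEARLINE = "nearline"  # computed on events, seconds to minutes behind
    OFFLINE = "offline"    # batch-computed, hours to a day behind

@dataclass
class FeatureSpec:
    name: str
    tier: LatencyTier

@dataclass
class NodeIOSpec:
    inputs: list[FeatureSpec]
    output: str

# Hypothetical spec for a ranking node: what goes in, at what freshness,
# and what comes out.
spec = NodeIOSpec(
    inputs=[
        FeatureSpec("session_clicks", LatencyTier.ONLINE),
        FeatureSpec("trending_items", LatencyTier.NEARLINE),
        FeatureSpec("user_embedding", LatencyTier.OFFLINE),
    ],
    output="ranked_item_ids",
)
```

Writing the tiers down up front forces the question Huyen's framing is for: which inputs can you actually have at serving time?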

Eugene Yan — Start with the problem, not the technology (Node 3 baseline, agentic Habit 4 prerequisite). Before any ML or agent component is introduced, ask: what regex, SQL, or rule-based filter already gets 50-70%? The Four Questions (what is the problem, who has it, would a non-AI solution work, what does success look like measurably) come from Yan's applied-LLM writing. Source: eugeneyan.com/start-here.
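
A minimal sketch of that baseline discipline, with a hypothetical refund-detection task and a hand-labeled handful of examples (none of this is Yan's code); the point is that a plain regex gets measured before any model is introduced:

```python
import re

# Hypothetical task: flag refund-related support messages with a regex
# before reaching for a classifier.
REFUND_PATTERN = re.compile(r"\b(refund|money back|chargeback|reimburs\w*)\b", re.I)

def is_refund_request(text: str) -> bool:
    return bool(REFUND_PATTERN.search(text))

# A tiny hand-labeled set; in practice this would be 20-100 real messages.
labeled = [
    ("I want my money back for this order", True),
    ("Please issue a refund", True),
    ("How do I reset my password?", False),
    ("The chargeback went through, thanks", True),
    ("Where is my package?", False),
]

correct = sum(is_refund_request(t) == y for t, y in labeled)
print(f"baseline accuracy: {correct}/{len(labeled)}")
```

If the regex already clears the bar on real data, the Four Questions get answered without introducing a model at all.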

Hamel Husain — Manual trace labeling (Node 3, agentic Habit 4). Before trusting an LLM or agent at scale, label 20-100 real traces by hand. The trace becomes the eval harness. Husain's 90%+ human-judge agreement in his LLM-judge field guide is a workflow outcome, not a universal KPI — he explicitly warns raw agreement misleads on imbalanced data. Source: hamel.dev/blog/posts/field-guide.

Jeremy Howard — Top-down learning (Node 4). fast.ai Part 1. Get a working artifact end-to-end first, then spiral into mechanism. Whole game, then atoms, then whole game with new eyes. Source: course.fast.ai.

Sebastian Raschka — Bottom-up from scratch (Node 4, paired with Howard). Build a Large Language Model from Scratch. Raw tensors, manual attention, instruction-finetuning implemented by hand. The bottom-up complement to Howard's top-down. Source: sebastianraschka.com.

Andrej Karpathy — Atomic derivation (Node 5, agentic Habit 5). Shrink the concept until it fits in your head. micrograd is 100 lines of autograd; nanoGPT is 300 lines of training. The 40-line version is what you review the 40,000-line version against. Source: karpathy.ai/zero-to-hero.
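
The discipline can be made concrete with a condensed scalar autograd node in the spirit of Karpathy's micrograd; this is a minimal sketch, not his exact code:

```python
# A scalar autograd node: the version small enough to hold in your head
# when reviewing a real framework's backward pass.
class Value:
    def __init__(self, data, _children=()):
        self.data = data
        self.grad = 0.0
        self._backward = lambda: None
        self._prev = set(_children)

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def _backward():
            self.grad += out.grad
            other.grad += out.grad
        out._backward = _backward
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out._backward = _backward
        return out

    def backward(self):
        topo, seen = [], set()
        def build(v):
            if v not in seen:
                seen.add(v)
                for child in v._prev:
                    build(child)
                topo.append(v)
        build(self)
        self.grad = 1.0
        for v in reversed(topo):
            v._backward()

a, b = Value(2.0), Value(3.0)
c = a * b + a   # dc/da = b + 1 = 4, dc/db = a = 2
c.backward()
print(a.grad, b.grad)
```

The point is reviewability: the whole chain rule fits on one screen, and that is the standard the 40,000-line version gets reviewed against.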

Julia Evans — OS descent safety net (Node 5 paired with Karpathy). When an abstraction leaks, drop to strace, tcpdump, perf, /proc. Evans's zines and blog are the field guide for the moment an explanation stops working at the application layer and the real answer is two layers below. Source: jvns.ca.

Harper Reed — Spec first (agentic Habit 1). Every agent-led task starts with idea.md (brainstorm) and plan.md (plan). The agent executes against the plan; the operator reviews the plan, not every line of code. The compounding artifact is the spec plus plan, not the code. Source: harper.blog/2025/02/16/my-llm-codegen-workflow-atm.
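
As an illustration of the spec-first shape, a plan.md might look like the fragment below; the headings and task are invented for this example, not Reed's exact template:

```markdown
<!-- plan.md — illustrative shape only, not Reed's template -->
# Plan: add retry logic to the export job

## Steps
1. Wrap the upload call in an exponential-backoff retry (max 5 attempts).
2. Log each retry with attempt number and wait time.
3. Add a unit test that injects two transient failures.

## Out of scope
- Changing the export file format.

## Done when
- Tests pass; a forced transient failure recovers without operator action.
```

The operator reviews this document, not the diff line by line; the agent executes against it.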

Geoffrey Litt — Primary/secondary split (agentic Habit 2). Tight-loop design stays human-primary. The agent is a pair-programmer at most. Well-defined execution goes async to agents and is reviewed in batch. Two parallel streams, rotated consciously. Source: geoffreylitt.com.

Shrivu Shankar — Agent primitive vocabulary (agentic Habit 3). Three reusable patterns for multi-agent work: assembly-line (sequential pipeline), call-center (router + specialists), manager-worker (decompose + aggregate). Pick the one that matches the job shape; do not default to the most complex. Source: blog.sshh.io.
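
The manager-worker primitive, for instance, can be sketched with plain functions standing in for agents; everything below is a hypothetical illustration, with no agent framework assumed:

```python
def manager_decompose(task: str) -> list[str]:
    # A real manager agent would plan; here we just split on separators.
    return [part.strip() for part in task.split(";")]

def worker(subtask: str) -> str:
    # A real worker would be an LLM call; here we echo a result.
    return f"done: {subtask}"

def manager_aggregate(results: list[str]) -> str:
    # The manager's second job: combine worker output into one artifact.
    return "\n".join(results)

task = "summarize module A; summarize module B; summarize module C"
report = manager_aggregate([worker(s) for s in manager_decompose(task)])
print(report)
```

Assembly-line and call-center have the same function-level skeleton with a pipeline or a router in place of the decompose/aggregate pair; the job shape picks the skeleton.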

(The list counts ten names because Howard/Raschka and Karpathy/Evans each pair on a node. Eight distinct nodes + habits, ten distinct practitioners.)

The extraction method — what I actually read for

These practitioners did not write rules for me. They wrote blog posts, books, lectures, tweets. I extracted the rules by reading for a specific thing.

Not what they argue for. Their theses are often context-dependent and date fast. Yan's position on when to reach for ML versus a regex is a working stance, not a universal claim.

Not their specific tools. The tools they reach for (Aider for Reed, fast.ai's library for Howard, a specific Jupyter setup for Raschka) will rotate. Reading for the tool produces a rule that retires in 18 months.

What I read for: the workflow and the failure mode. Reed writes down his codegen workflow explicitly. Husain writes down the trace-labeling routine explicitly. Shankar names the three agent primitives explicitly. The workflow is transferable. The failure mode each workflow prevents is the part that compounds across domains.

Applied practitioners publish the workflow, the failure mode, and the eval loop. Researchers publish the result. The result is often non-transferable; the workflow almost always is. The ten practitioners above are applied practitioners specifically because of this property — they write about how they work, not only about what they produced.

Why this pinning matters

Without practitioner pinning, rules drift. A rule that says "always derive before deploying" sounds authoritative until it has been in the file for six months and nobody can remember why it was written or what it corrects. Then somebody else edits it because different-sounding advice from a different blog post feels more recent, and the original intent gets quietly overwritten.

With pinning, the rule is anchored: "This rule is the Karpathy atomic-derivation discipline applied to the learning pipeline. If the Karpathy constraint stops being load-bearing for this work, the rule retires." The retirement condition is falsifiable. The rule's origin is traceable. Edits that drift from the original practitioner's position get flagged on next audit.
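
A pinned rule file might look like the following; the filename, tag names (WHY, PIN, RETIRE-WHEN), and wording are an illustrative rendering of the idea, not the framework's verbatim format:

```markdown
<!-- .claude/rules/derive-before-deploy.md — illustrative format only -->
## Rule: derive the atomic version before deploying the full one
WHY: Karpathy atomic-derivation discipline (karpathy.ai/zero-to-hero)
applied to the learning pipeline; the 40-line version is the review
standard for the 40,000-line version.
PIN: Andrej Karpathy — micrograd / nanoGPT
RETIRE-WHEN: atomic derivation stops being load-bearing for review,
e.g. the pipeline no longer contains hand-reviewed model code.
```

Every field is auditable: the pin names a person, the WHY names their discipline, and the RETIRE-WHEN names the condition under which the rule leaves the file.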

Pinning also blocks a specific failure mode: manufacturing a rule from thin air, calling it a best practice, and committing it to the canon. A rule that cannot be pinned to a practitioner doing that specific thing in public is probably either obvious (and does not need a rule) or invented (and should not be canonized).

The one-rule-per-practitioner shape is on purpose

Each practitioner occupies one node in the loop or one habit in the agentic-engineering rule. They do not appear everywhere. Pinning a practitioner to multiple roles is how you lose the specificity that made the pinning worth doing in the first place.

Husain is on Node 3 (manual trace labeling) and Habit 4 (eval on agent output) because those are the same discipline at two surfaces. He is not on Node 5 or Habit 2, because his public writing is not where I go for atomic derivation or for primary/secondary split. Respecting what each practitioner is specifically good at is what keeps the rules tight.

The capstone effect

When the framework reaches the point where every rule has a WHY tag, a retire-when clause, and a practitioner pin, the result is a system that decays cleanly rather than accumulating silently. New model? Audit the WHY tags against the retire-when clauses. Shifted stack? Audit the practitioner pins and check whether the cited 2025-2026 methods still apply.
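
The audit itself can be mechanical. The sketch below assumes hypothetical WHY/PIN/RETIRE-WHEN tag names and flags any rule text missing one; it is an illustration of the audit loop, not the framework's actual tooling:

```python
REQUIRED_TAGS = ("WHY:", "PIN:", "RETIRE-WHEN:")

def audit_rule(text: str) -> list[str]:
    """Return the required tags missing from one rule's text."""
    return [tag for tag in REQUIRED_TAGS if tag not in text]

# Hypothetical rule corpus; in practice these would be read from .claude/.
rules = {
    "derive-before-deploy": "WHY: ...\nPIN: Karpathy\nRETIRE-WHEN: ...",
    "always-use-types": "Always use type hints.",  # unpinned — gets flagged
}

for name, text in rules.items():
    missing = audit_rule(text)
    if missing:
        print(f"{name}: missing {', '.join(missing)}")
```

Run on a model or stack shift, the same loop answers the two audit questions above: which WHY tags have tripped their retire-when clause, and which rules were never pinned at all.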

This is the opposite of the usual trajectory for .claude/ directories, which grow organically to 50 rules, become furniture, and start fighting improved model defaults without anybody noticing.

What to do with this pattern

Pick one practitioner you already read. Name the specific workflow you inherit from them. Turn it into one rule. Tag it with the practitioner. Add the WHY and retire-when clauses. Commit.

One rule, one practitioner, one audit condition. Do it for three practitioners you respect. The result is a framework you can actually defend in a year — because every rule in it points at somebody doing the work in public, and somebody's public work is a standard you can audit against when the model shifts underneath you.


Aman Bhandari. Operator of an AI-engineering research lab running Claude Opus as the coaching partner, plus a QA-automation surface shipping against a real sprint workload. Public artifacts: claude-code-agent-skills-framework and claude-code-mcp-qa-automation. github.com/aman-bhandari.
