A curated set of resources for understanding LLM agent architecture, the control plane, and how to build effective agents, with direct links to every resource.
1. Recommended path
If you only have a few hours, do these in order:
- Anthropic: Building effective agents (~1 hour) The single best practical overview from people who ship them.
- Lilian Weng: LLM Powered Autonomous Agents (~1 hour) The canonical academic-flavored overview: planning, memory, tool use.
- Model Context Protocol intro + Claude Code documentation (1–2 hours) The control-plane mental model clicks fast once you've read both.
- Skim one framework's "concepts" page; the LangGraph overview is the densest (30 min).
- Dip into papers (ReAct, Reflexion, …) only when a specific pattern catches your interest.
2. Foundational essays: read these first
Building effective agents
Erik Schluntz & Barry Zhang, Anthropic, December 2024. The best practical overview. Covers workflows vs agents, common patterns (prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer), and (crucially) when not to use an agent. The companion code lives in the Claude Cookbooks agent patterns folder.
LLM Powered Autonomous Agents
Lilian Weng (OpenAI), June 2023. The canonical academic-flavored overview: planning, memory, tool use. Still the most-cited single piece in the field. Lives on her blog Lil'Log.
AI Engineering (chapter on Agents)
Chip Huyen, O'Reilly, 2024. Excellent on the engineering side: evaluation, failure modes, planning loops. The whole book is worth owning. See also Chip Huyen's books page and the supporting GitHub repository.
3. Patterns & techniques: the original papers
| Paper | Authors, year | Key idea |
|---|---|---|
| ReAct | Yao et al., 2022 | Interleave Thought → Action → Observation |
| Reflexion | Shinn et al., 2023 | Self-critique to improve over iterations |
| Toolformer | Schick et al., 2023 | Tool use as a learned skill |
| Tree of Thoughts | Yao et al., 2023 | Explicit search over reasoning branches |
| Plan-and-Solve | Wang et al., 2023 | Decompose first, then execute step by step |
| Voyager | Wang et al., 2023 | Skill libraries / procedural memory in the wild (project site) |
| Self-Refine | Madaan et al., 2023 | Iterative improvement via self-feedback (project site) |
| Chain-of-Thought | Wei et al., 2022 | Step-by-step reasoning prompts |
| Generative Agents | Park et al., 2023 | The famous Smallville simulation |
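The Thought → Action → Observation loop from the ReAct row can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `scripted_model`, the `TOOLS` registry, and the `Action: tool[input]` text format are assumptions made for the example, and the "model" is a hard-coded stub so the loop runs without any API.

```python
import re
from typing import Callable

# Hypothetical tool registry: names and behaviour are illustrative only.
TOOLS: dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split(","))),
}

# Assumed action syntax: "Action: tool_name[tool_input]".
ACTION_RE = re.compile(r"Action: (\w+)\[(.*)\]")


def scripted_model(transcript: str) -> str:
    """Stand-in for an LLM call: answers once an observation is present."""
    if "Observation:" in transcript:
        result = transcript.rsplit("Observation: ", 1)[1].strip()
        return f"Thought: I have the result.\nFinal Answer: {result}"
    return "Thought: I should use the calculator.\nAction: add[2,3]"


def react_loop(question: str, model: Callable[[str], str], max_steps: int = 5) -> str:
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        reply = model(transcript)  # Thought, plus an Action or a Final Answer
        transcript += reply + "\n"
        if "Final Answer:" in reply:
            return reply.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(reply)
        if match is None:
            raise ValueError("model produced neither an action nor an answer")
        tool_name, tool_input = match.groups()
        observation = TOOLS[tool_name](tool_input)  # run the Action
        transcript += f"Observation: {observation}\n"  # feed the result back
    raise RuntimeError("step budget exhausted")


answer = react_loop("What is 2 + 3?", scripted_model)
```

The step budget matters in practice: without `max_steps`, a model that never emits a final answer would loop forever.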
4. Protocols & specs (the control-plane stuff)
Model Context Protocol (MCP)
Anthropic's open spec for plugging tool servers into any agent. The de-facto standard for tool interoperability. Start with the introduction and the main GitHub org.
AGENTS.md
Cross-vendor spec for "instructions to coding agents" files. Originated by OpenAI Codex, Amp, Jules (Google), Cursor, and Factory; now stewarded by the Agentic AI Foundation under the Linux Foundation. Implemented across most coding agents. Source on GitHub.
Agent Skills
Anthropic's open SKILL.md standard for lazy-loaded capability bundles: folders of instructions, scripts, and resources that an agent discovers via metadata and loads on demand. Originally a Claude Code feature, now adopted by Cursor, GitHub Copilot, VS Code, Gemini CLI, OpenAI Codex, OpenHands, Goose, Letta, JetBrains Junie, Factory, Amp, and ~20 other tools. Start with the overview, then the specification. Source on GitHub; Anthropic's example skills at anthropics/skills.
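The "discover via metadata, load on demand" behaviour can be illustrated with a small sketch. It assumes each skill folder holds a SKILL.md that starts with `---`-delimited frontmatter containing `name` and `description`. The `SkillIndex` class, the folder layout, and the sample skill are hypothetical, and the frontmatter reader handles only flat `key: value` lines rather than full YAML.

```python
import tempfile
from pathlib import Path


def read_frontmatter(skill_file: Path) -> dict[str, str]:
    """Read only the `---`-delimited header (simplified: flat `key: value` lines)."""
    meta: dict[str, str] = {}
    with skill_file.open(encoding="utf-8") as fh:
        if fh.readline().strip() != "---":
            return meta
        for line in fh:
            if line.strip() == "---":
                break  # stop before the body, so the body is never read here
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip()
    return meta


class SkillIndex:
    """Hypothetical index: metadata is read eagerly, bodies only on request."""

    def __init__(self, root: Path) -> None:
        self.files: dict[str, Path] = {}
        self.catalog: dict[str, str] = {}
        for skill_file in sorted(root.glob("*/SKILL.md")):
            meta = read_frontmatter(skill_file)
            name = meta.get("name", skill_file.parent.name)
            self.files[name] = skill_file
            self.catalog[name] = meta.get("description", "")

    def load(self, name: str) -> str:
        """Return the instruction body (everything after the frontmatter)."""
        text = self.files[name].read_text(encoding="utf-8")
        return text.split("---", 2)[2].strip() if text.startswith("---") else text


# Demo with a throwaway skill folder.
root = Path(tempfile.mkdtemp())
(root / "pdf-forms").mkdir()
(root / "pdf-forms" / "SKILL.md").write_text(
    "---\nname: pdf-forms\ndescription: Fill in PDF forms.\n---\n"
    "# Steps\n1. Inspect the form fields.\n",
    encoding="utf-8",
)
index = SkillIndex(root)
catalog = index.catalog         # cheap: what the agent sees up front
body = index.load("pdf-forms")  # read only once the skill is selected
```

Only the short catalog needs to sit in the agent's context; the full body costs tokens only when a skill is actually chosen.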
OpenAPI → tool schemas
Tool schemas can be auto-generated from OpenAPI specs. Most frameworks support this directly.
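As a rough sketch of what that conversion involves, the snippet below maps one OpenAPI operation to a tool definition. The `SPEC` fragment is hand-written for illustration, `operation_to_tool` and `spec_to_tools` are hypothetical helpers, and the output shape (`name`, `description`, `input_schema`) is just one common layout; frameworks name these fields differently. It handles only inline parameters, not `$ref` resolution or request bodies.

```python
from typing import Any

# Hand-written OpenAPI fragment, for illustration only.
SPEC: dict[str, Any] = {
    "paths": {
        "/weather/{city}": {
            "get": {
                "operationId": "getWeather",
                "summary": "Current weather for a city",
                "parameters": [
                    {"name": "city", "in": "path", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "units", "in": "query", "required": False,
                     "schema": {"type": "string", "enum": ["metric", "imperial"]}},
                ],
            }
        }
    }
}

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def operation_to_tool(operation: dict[str, Any]) -> dict[str, Any]:
    """Turn one operation's parameters into a JSON Schema object for tool input."""
    properties: dict[str, Any] = {}
    required: list[str] = []
    for param in operation.get("parameters", []):
        properties[param["name"]] = dict(param.get("schema", {}))
        if param.get("required", False):
            required.append(param["name"])
    return {
        "name": operation["operationId"],
        "description": operation.get("summary", ""),
        "input_schema": {
            "type": "object",
            "properties": properties,
            "required": required,
        },
    }


def spec_to_tools(spec: dict[str, Any]) -> list[dict[str, Any]]:
    """Collect one tool per operation that carries an operationId."""
    return [
        operation_to_tool(op)
        for path_item in spec.get("paths", {}).values()
        for method, op in path_item.items()
        if method in HTTP_METHODS and "operationId" in op
    ]


tools = spec_to_tools(SPEC)
```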
5. Claude Code & Anthropic ecosystem
Claude Code documentation
The official source of truth; it is updated frequently. Sections on hooks, skills, subagents, MCP, settings, slash commands, plugins, output styles, status lines. The mirror at docs.anthropic.com/en/docs/claude-code serves the same content. Source on GitHub.
Claude Agent SDK
Same docs site. The SDK exposes the same primitives (tools, hooks, permissions) that Claude Code uses, so reading the SDK docs is one of the fastest ways to understand the harness model.
Claude Cookbooks
Practical agent recipes on GitHub (formerly Anthropic Cookbook). The patterns/agents/ folder contains the reference implementations for Building Effective Agents (orchestrator-workers, evaluator-optimizer, etc.).
Anthropic Engineering blog
Periodic deep dives on agent design, tool use, and prompt engineering. Published under anthropic.com/engineering and anthropic.com/research.
6. Frameworks (good for "show me code")
Each framework's documentation is essentially an opinionated essay on agent architecture. Read the concepts pages, not the API reference.
| Framework | Strength | Links |
|---|---|---|
| LangGraph (LangChain) | Stateful loops, multi-agent, human-in-the-loop | docs · product · GitHub |
| LlamaIndex Workflows / Agents | Retrieval and memory | agents docs · Workflows 1.0 announcement |
| Pydantic AI | Typed tool calls, clean mental model | docs · GitHub |
| smolagents (Hugging Face) | Minimal, code-as-action | docs · GitHub · intro blog |
| CrewAI | Multi-agent role-based | docs · GitHub |
| AutoGen (Microsoft) | Conversational multi-agent (now in maintenance, see Microsoft Agent Framework below) | docs · GitHub |
| Microsoft Agent Framework | The successor to AutoGen, enterprise-ready | docs |
| OpenAI Agents SDK | Lightweight handoff-based (production successor to Swarm) | docs · GitHub · original Swarm |
| DSPy (Stanford) | Programmatic prompts, optimization | site · GitHub |
7. Memory & retrieval
- GraphRAG (Microsoft Research, 2024): graph-augmented retrieval over a corpus. GitHub · project page · paper.
- MemGPT / Letta: tiered memory inspired by OS virtual memory. The original MemGPT paper (Packer et al., 2023) is the canonical reference; the modern Letta framework is the production successor.
- Vector DB docs: Qdrant, Weaviate, pgvector, Chroma: each has good intro material.
- For RAG patterns, the LlamaIndex agents docs are the canonical reference.
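The tiered-memory idea in the MemGPT / Letta entry above can be illustrated with a toy two-tier store. This is neither Letta's API nor the paper's algorithm: `TieredMemory`, the fixed message budget, and keyword-match recall are simplifying assumptions (a real system would count tokens and would typically use embedding search over the archive).

```python
from collections import deque


class TieredMemory:
    """Toy two-tier memory: a bounded in-prompt context plus an unbounded archive."""

    def __init__(self, context_budget: int = 3) -> None:
        self.context_budget = context_budget
        self.context: deque[str] = deque()  # "main memory": sent with every prompt
        self.archive: list[str] = []        # "disk": kept outside the prompt

    def add(self, message: str) -> None:
        self.context.append(message)
        # Page the oldest messages out once the context exceeds its budget.
        while len(self.context) > self.context_budget:
            self.archive.append(self.context.popleft())

    def recall(self, keyword: str) -> list[str]:
        """Search the archive (case-insensitive substring match)."""
        return [m for m in self.archive if keyword.lower() in m.lower()]


memory = TieredMemory(context_budget=2)
for msg in ["My name is Ada.", "I like chess.", "What's the weather?", "Book a flight."]:
    memory.add(msg)

in_context = list(memory.context)  # the two most recent messages
recalled = memory.recall("name")   # older facts are still retrievable
```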
8. Observability & evaluation
Tracing platforms
Each has docs that double as a tutorial on what to instrument:
- Langfuse: open source, self-hostable. Docs · GitHub
- LangSmith: hosted, by LangChain. Docs
- Arize Phoenix: open source, very conceptual docs. Docs · GitHub
- Helicone: proxy-based. Docs · GitHub
- Braintrust: eval-focused. Docs
Standards
- OpenTelemetry GenAI semantic conventions: the emerging standard for tracing LLM/agent calls. See also the agent-spans page.
Evaluation frameworks & benchmarks
- lm-evaluation-harness (EleutherAI): base-model benchmarks; backend for the Hugging Face Open LLM Leaderboard.
- HELM (Stanford CRFM): holistic evaluation framework. GitHub · paper
- AgentBench (Tsinghua): multi-environment LLM-as-agent benchmark. Paper
- SWE-bench: solving real GitHub issues. GitHub
- τ-bench (Sierra): tool-agent-user interaction in real-world domains. Blog post · τ²-bench
9. Safety, security, and guardrails
- OWASP Top 10 for LLM Applications: the standard threat list. Project page · 2025 PDF
- Prompt injection: Simon Willison's prompt injection series is the most comprehensive ongoing coverage. He coined the term and continues to write about new variants on his main blog and his substack.
- NIST AI Risk Management Framework: for governance angles. AI RMF 1.0 PDF · Resource Center
- Anthropic's Responsible Scaling Policy: model-level safety thinking, published on anthropic.com.
10. Multi-agent & emerging directions
- AutoGen paper (Wu et al., 2023): multi-agent conversation framework
- MetaGPT: assembly-line multi-agent. Paper
- ChatDev: software-company-as-multi-agent. Paper
- Generative Agents (Park et al., 2023): the famous Smallville simulation. Paper
11. Going deeper: books
- Chip Huyen: AI Engineering (O'Reilly, 2024): production AI systems. Author page · GitHub
- Jay Alammar & Maarten Grootendorst: Hands-On Large Language Models (O'Reilly, 2024): visual, accessible. O'Reilly · GitHub
- Sebastian Raschka: Build a Large Language Model (From Scratch) (Manning, 2024): for understanding what's inside the LLM. GitHub · author's books page
12. Communities & ongoing reading
- Anthropic, OpenAI, DeepMind engineering blogs: best practical writing
- Simon Willison's blog: daily LLM news and analysis (the best single feed in the field)
- Latent Space podcast: interviews with builders, hosted by swyx and Alessio. Newsletter
- Hacker News ai tag: high-signal discussions
- LangChain blog, LlamaIndex blog: framework-level pattern writeups
- arXiv cs.CL and cs.AI: primary research
13. By topic: quick reference
| If you want to understand… | Start with |
|---|---|
| What an agent is | Anthropic Building effective agents |
| Planning patterns | ReAct, Plan-and-Solve papers |
| Memory architectures | Lilian Weng's post, MemGPT/Letta |
| Tool integration | MCP docs |
| Configuration / control plane | Claude Code docs (hooks, skills, subagents) |
| Multi-agent systems | LangGraph, AutoGen, MetaGPT |
| Production tracing | Arize Phoenix or Langfuse |
| Agent evaluation | SWE-bench, τ-bench, AgentBench |
| Prompt injection / safety | Simon Willison's series, OWASP LLM Top 10 |
| RAG | LlamaIndex agents, GraphRAG |
| LLMs from the inside | Sebastian Raschka's book |
14. A note on freshness
This field moves fast. Patterns from 2023 may be obsolete; protocols from 2024 may be standard by next quarter. Treat any specific tool or framework recommendation as a snapshot, not gospel. The concepts (loop, memory, tools, control plane, three knobs) are stable. The implementations churn.
When in doubt: read the official docs of whatever tool you're actually using, then triangulate with one or two of the foundational essays above.