Jocer Franquiz

Posted on Apr 9

A Serious (and hype-less) Study Guide on Agents and LLMs

#ai #agentskills #learning #llm

A curated set of resources for understanding LLM agent architecture, the control plane, and how to build effective agents, with direct links to every resource.

1. Recommended path

If you only have a few hours, do these in order:

Anthropic: Building effective agents (~1 hour). The single best practical overview from people who ship them.
Lilian Weng: LLM Powered Autonomous Agents (~1 hour). The canonical academic-flavored overview: planning, memory, tool use.
Model Context Protocol intro + Claude Code documentation (1–2 hours). The control-plane mental model clicks fast once you've read both.
Skim one framework's "concepts" page; the LangGraph overview is the densest (30 min).
Dip into papers (ReAct, Reflexion, …) only when a specific pattern catches your interest.

2. Foundational essays: read these first

Building effective agents

Erik Schluntz & Barry Zhang, Anthropic, December 2024. The best practical overview. Covers workflows vs agents, common patterns (prompt chaining, routing, parallelization, orchestrator-workers, evaluator-optimizer), and (crucially) when not to use an agent. The companion code lives in the Claude Cookbooks agent patterns folder.
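To make the first of those patterns concrete, here is a minimal sketch of prompt chaining: a fixed sequence of prompts where each output feeds the next step, with a programmatic gate between steps. This is not the essay's reference code. `fake_llm`, `run_chain`, and the `{input}` template convention are hypothetical names invented for illustration, and the model call is stubbed so the example runs offline.

```python
from typing import Callable, List


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for a real model call; swap in your provider's client."""
    return f"[model output for: {prompt}]"


def non_empty(text: str) -> bool:
    """A trivial gate: reject blank outputs before they reach the next step."""
    return bool(text.strip())


def run_chain(
    steps: List[str],
    user_input: str,
    llm: Callable[[str], str],
    gate: Callable[[str], bool] = non_empty,
) -> str:
    """Prompt chaining: each step's output becomes the next step's input."""
    current = user_input
    for template in steps:
        current = llm(template.format(input=current))
        if not gate(current):
            raise ValueError(f"gate rejected the output of step {template!r}")
    return current


steps = ["Outline: {input}", "Draft from outline: {input}"]
result = run_chain(steps, "agents vs workflows", fake_llm)
print(result)
```

The control flow lives in ordinary code rather than in the model, which is what makes this a workflow and not an agent in the essay's terminology.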

LLM Powered Autonomous Agents

Lilian Weng (OpenAI), June 2023. The canonical academic-flavored overview: planning, memory, tool use. Still the most-cited single piece in the field. Lives on her blog Lil'Log.

AI Engineering (chapter on Agents)

Chip Huyen, O'Reilly, 2024. Excellent on the engineering side: evaluation, failure modes, planning loops. The whole book is worth owning. See also Chip Huyen's books page and the supporting GitHub repository.

3. Patterns & techniques: the original papers

Paper	Authors, year	Key idea
ReAct	Yao et al., 2022	Interleave Thought → Action → Observation
Reflexion	Shinn et al., 2023	Self-critique to improve over iterations
Toolformer	Schick et al., 2023	Tool use as a learned skill
Tree of Thoughts	Yao et al., 2023	Explicit search over reasoning branches
Plan-and-Solve	Wang et al., 2023	Decompose first, then execute step by step
Voyager	Wang et al., 2023	Skill libraries / procedural memory in the wild (project site)
Self-Refine	Madaan et al., 2023	Iterative improvement via self-feedback (project site)
Chain-of-Thought	Wei et al., 2022	Step-by-step reasoning prompts
Generative Agents	Park et al., 2023	The famous Smallville simulation
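The ReAct row is the one most agent loops descend from, so a small sketch may help before reading the paper. The code below is an illustration under stated assumptions, not the paper's implementation: the model is replaced by a scripted `scripted_policy` function, and the `TOOLS` registry, the `add` tool, and the `finish` action name are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical tool registry; names and behavior are illustrative only.
TOOLS: Dict[str, Callable[[str], str]] = {
    "add": lambda arg: str(sum(int(x) for x in arg.split("+"))),
}

OBS_PREFIX = "Observation: "


def scripted_policy(transcript: List[str]) -> Tuple[str, str, str]:
    """Stand-in for the model: returns (thought, action, argument)."""
    observations = [line for line in transcript if line.startswith(OBS_PREFIX)]
    if not observations:
        return ("I need to compute the sum.", "add", "2+3")
    return ("I have the result.", "finish", observations[-1][len(OBS_PREFIX):])


def react_loop(
    question: str,
    policy: Callable[[List[str]], Tuple[str, str, str]],
    max_steps: int = 5,
) -> Tuple[str, List[str]]:
    """Interleave Thought -> Action -> Observation until the policy finishes."""
    transcript = [f"Question: {question}"]
    for _ in range(max_steps):
        thought, action, arg = policy(transcript)
        transcript.append(f"Thought: {thought}")
        if action == "finish":
            return arg, transcript
        transcript.append(f"Action: {action}[{arg}]")
        tool = TOOLS.get(action)
        observation = tool(arg) if tool else f"unknown tool {action!r}"
        transcript.append(f"{OBS_PREFIX}{observation}")
    raise RuntimeError("step budget exhausted without a final answer")


answer, transcript = react_loop("What is 2 + 3?", scripted_policy)
print("\n".join(transcript))
```

In a real agent the policy is a model call that sees the whole transcript, and the step budget is what keeps a confused model from looping forever.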

4. Protocols & specs (the control-plane stuff)

Model Context Protocol (MCP)

Anthropic's open spec for plugging tool servers into any agent. The de-facto standard for tool interoperability. Start with the introduction and the main GitHub org.

AGENTS.md

Cross-vendor spec for "instructions to coding agents" files. Originated by OpenAI Codex, Amp, Jules (Google), Cursor, and Factory; now stewarded by the Agentic AI Foundation under the Linux Foundation. Implemented across most coding agents. Source on GitHub.

Agent Skills

Anthropic's open SKILL.md standard for lazy-loaded capability bundles: folders of instructions, scripts, and resources that an agent discovers via metadata and loads on demand. Originally a Claude Code feature, now adopted by Cursor, GitHub Copilot, VS Code, Gemini CLI, OpenAI Codex, OpenHands, Goose, Letta, JetBrains Junie, Factory, Amp, and ~20 other tools. Start with the overview, then the specification. Source on GitHub; Anthropic's example skills at anthropics/skills.
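The "discover via metadata, load on demand" idea can be sketched in a few lines. This is a simplified illustration, not the specification's reference loader: it assumes each skill is a folder containing a SKILL.md whose frontmatter carries `name` and `description` fields, parses that frontmatter with a naive `key: value` reader instead of a real YAML parser, and uses a hypothetical `SkillIndex` class and a made-up `pdf-forms` skill.

```python
import tempfile
from pathlib import Path
from typing import Dict


def parse_frontmatter(text: str) -> Dict[str, str]:
    """Read simple `key: value` pairs between the leading '---' fences.
    A real loader would use a YAML parser; this is a deliberate simplification."""
    lines = text.splitlines()
    meta: Dict[str, str] = {}
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


class SkillIndex:
    """Keeps only name + description in memory; full bodies are read on demand."""

    def __init__(self, root: Path) -> None:
        self.paths: Dict[str, Path] = {}
        self.descriptions: Dict[str, str] = {}
        for skill_file in sorted(root.glob("*/SKILL.md")):
            meta = parse_frontmatter(skill_file.read_text(encoding="utf-8"))
            name = meta.get("name", skill_file.parent.name)
            self.paths[name] = skill_file
            self.descriptions[name] = meta.get("description", "")

    def catalog(self) -> str:
        """The short listing that would go into the agent's context up front."""
        return "\n".join(f"- {n}: {d}" for n, d in self.descriptions.items())

    def load(self, name: str) -> str:
        """The full skill text, fetched only when the agent decides it is relevant."""
        return self.paths[name].read_text(encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    skill_dir = Path(tmp) / "pdf-forms"
    skill_dir.mkdir()
    (skill_dir / "SKILL.md").write_text(
        "---\nname: pdf-forms\ndescription: Fill in PDF forms\n---\n# Steps\n1. Open the file.\n",
        encoding="utf-8",
    )
    index = SkillIndex(Path(tmp))
    catalog = index.catalog()
    body = index.load("pdf-forms")

print(catalog)
```

"Lazy" here refers to the context window, not disk I/O: the catalog costs a line per skill, and the body is spent only when a skill is actually used.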

OpenAPI → tool schemas

Tool schemas can be auto-generated from OpenAPI specs. Most frameworks support this directly.
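As a rough sketch of what that generation involves, the function below walks an OpenAPI document (already parsed into a dict) and emits one tool definition per operation. The output shape (`name`, `description`, `input_schema`) is an assumed generic form; real providers and frameworks use their own field names. The sketch also ignores request bodies, `$ref` resolution, and path-level parameters, and the `get_weather` spec is a made-up example.

```python
from typing import Any, Dict, List

HTTP_METHODS = {"get", "post", "put", "patch", "delete"}


def openapi_to_tools(spec: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Turn each OpenAPI operation into a JSON-Schema-style tool definition."""
    tools: List[Dict[str, Any]] = []
    for path, item in spec.get("paths", {}).items():
        for method, op in item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys such as "parameters" or "summary"
            properties: Dict[str, Any] = {}
            required: List[str] = []
            for param in op.get("parameters", []):
                schema = dict(param.get("schema", {}))
                if param.get("description"):
                    schema["description"] = param["description"]
                properties[param["name"]] = schema
                if param.get("required"):
                    required.append(param["name"])
            tools.append({
                # Fallback name is crude (it keeps slashes); fine for a sketch.
                "name": op.get("operationId", f"{method}_{path}"),
                "description": op.get("summary", ""),
                "input_schema": {
                    "type": "object",
                    "properties": properties,
                    "required": required,
                },
            })
    return tools


# Made-up spec fragment for illustration.
spec = {
    "paths": {
        "/weather/{city}": {
            "get": {
                "operationId": "get_weather",
                "summary": "Current weather for a city",
                "parameters": [
                    {"name": "city", "in": "path", "required": True,
                     "schema": {"type": "string"}},
                    {"name": "units", "in": "query",
                     "schema": {"type": "string", "enum": ["metric", "imperial"]}},
                ],
            }
        }
    }
}

tools = openapi_to_tools(spec)
print(tools[0]["name"], tools[0]["input_schema"]["required"])
```

The mapping works because OpenAPI parameter schemas are already JSON Schema fragments, so most of the job is regrouping them under one object per operation.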

5. Claude Code & Anthropic ecosystem

Claude Code documentation

The official source of truth, updated frequently. It has sections on hooks, skills, subagents, MCP, settings, slash commands, plugins, output styles, and status lines. The mirror at docs.anthropic.com/en/docs/claude-code serves the same content. Source on GitHub.

Claude Agent SDK

Same docs site. The SDK exposes the same primitives (tools, hooks, permissions) that Claude Code uses, so reading the SDK docs is one of the fastest ways to understand the harness model.

Claude Cookbooks

Practical agent recipes on GitHub (formerly Anthropic Cookbook). The patterns/agents/ folder contains the reference implementations for Building Effective Agents (orchestrator-workers, evaluator-optimizer, etc.).

Anthropic Engineering blog

Periodic deep dives on agent design, tool use, and prompt engineering. Published under anthropic.com/engineering and anthropic.com/research.

6. Frameworks (good for "show me code")

Each framework's documentation is essentially an opinionated essay on agent architecture. Read the concepts pages, not the API reference.

Framework	Strength	Links
LangGraph (LangChain)	Stateful loops, multi-agent, human-in-the-loop	docs · product · GitHub
LlamaIndex Workflows / Agents	Retrieval and memory	agents docs · Workflows 1.0 announcement
Pydantic AI	Typed tool calls, clean mental model	docs · GitHub
smolagents (Hugging Face)	Minimal, code-as-action	docs · GitHub · intro blog
CrewAI	Multi-agent role-based	docs · GitHub
AutoGen (Microsoft)	Conversational multi-agent (now in maintenance mode; see Microsoft Agent Framework below)	docs · GitHub
Microsoft Agent Framework	The successor to AutoGen, enterprise-ready	docs
OpenAI Agents SDK	Lightweight handoff-based (production successor to Swarm)	docs · GitHub · original Swarm
DSPy (Stanford)	Programmatic prompts, optimization	site · GitHub

7. Memory & retrieval

GraphRAG (Microsoft Research, 2024): graph-augmented retrieval over a corpus. GitHub · project page · paper.
MemGPT / Letta: tiered memory inspired by OS virtual memory. The original MemGPT paper (Packer et al., 2023) is the canonical reference; the modern Letta framework is the production successor.
Vector DB docs: Qdrant, Weaviate, pgvector, Chroma: each has good intro material.
For RAG patterns, the LlamaIndex agents docs are the canonical reference.
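The tiered-memory idea from the MemGPT entry above can be sketched with two tiers: a small main context that stands in for the prompt window, and an archive that old messages are paged out to. This is a toy under loud assumptions, not MemGPT's or Letta's design: eviction here is plain FIFO by message count and recall is keyword matching, whereas the real system lets the model itself issue memory operations. `TieredMemory` and its methods are hypothetical names.

```python
from collections import deque
from typing import Deque, List


class TieredMemory:
    """Two tiers: a bounded main context plus an unbounded archive."""

    def __init__(self, context_limit: int) -> None:
        self.context_limit = context_limit
        self.main_context: Deque[str] = deque()
        self.archive: List[str] = []

    def add(self, message: str) -> None:
        """Append to main context, paging the oldest messages out when full."""
        self.main_context.append(message)
        while len(self.main_context) > self.context_limit:
            self.archive.append(self.main_context.popleft())

    def recall(self, keyword: str, limit: int = 3) -> List[str]:
        """Page archived messages back in by naive keyword search (most recent last)."""
        hits = [m for m in self.archive if keyword.lower() in m.lower()]
        return hits[-limit:]


memory = TieredMemory(context_limit=2)
for message in ["My name is Ada", "I like graphs", "What is RAG?"]:
    memory.add(message)

print(list(memory.main_context))
print(memory.recall("name"))
```

Swapping the keyword search for a vector-store query is the usual next step, which is where the vector DB docs listed above come in.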

8. Observability & evaluation

Tracing platforms

Each has docs that double as a tutorial on what to instrument:

Langfuse: open source, self-hostable. Docs · GitHub
LangSmith: hosted, by LangChain. Docs
Arize Phoenix: open source, very conceptual docs. Docs · GitHub
Helicone: proxy-based. Docs · GitHub
Braintrust: eval-focused. Docs

Standards

OpenTelemetry GenAI semantic conventions: the emerging standard for tracing LLM/agent calls. See also the agent-spans page.

Evaluation frameworks & benchmarks

lm-evaluation-harness (EleutherAI): base-model benchmarks; backend for the Hugging Face Open LLM Leaderboard.
HELM (Stanford CRFM): holistic evaluation framework. GitHub · paper
AgentBench (Tsinghua): multi-environment LLM-as-agent benchmark. Paper
SWE-bench: solving real GitHub issues. GitHub
τ-bench (Sierra): tool-agent-user interaction in real-world domains. Blog post · τ²-bench

9. Safety, security, and guardrails

OWASP Top 10 for LLM Applications: the standard threat list. Project page · 2025 PDF
Prompt injection: Simon Willison's prompt injection series is the most comprehensive ongoing coverage. He coined the term and continues to write about new variants on his main blog and his substack.
NIST AI Risk Management Framework: for governance angles. AI RMF 1.0 PDF · Resource Center
Anthropic's Responsible Scaling Policy: model-level safety thinking, published on anthropic.com.

10. Multi-agent & emerging directions

AutoGen paper (Wu et al., 2023): multi-agent conversation framework
MetaGPT: assembly-line multi-agent. Paper
ChatDev: software-company-as-multi-agent. Paper
Generative Agents (Park et al., 2023): the famous Smallville simulation. Paper

11. Going deeper: books

Chip Huyen: AI Engineering (O'Reilly, 2024): production AI systems. Author page · GitHub
Jay Alammar & Maarten Grootendorst: Hands-On Large Language Models (O'Reilly, 2024): visual, accessible. O'Reilly · GitHub
Sebastian Raschka: Build a Large Language Model (From Scratch) (Manning, 2024): for understanding what's inside the LLM. GitHub · author's books page

12. Communities & ongoing reading

Anthropic, OpenAI, DeepMind engineering blogs: best practical writing
- Anthropic Engineering · Anthropic Research
Simon Willison's blog: daily LLM news and analysis (the best single feed in the field)
Latent Space podcast: interviews with builders, hosted by swyx and Alessio. Newsletter
Hacker News ai tag: high-signal discussions
LangChain blog, LlamaIndex blog: framework-level pattern writeups
arXiv cs.CL and cs.AI: primary research

13. By topic: quick reference

If you want to understand…	Start with
What an agent is	Anthropic Building effective agents
Planning patterns	ReAct, Plan-and-Solve papers
Memory architectures	Lilian Weng's post, MemGPT/Letta
Tool integration	MCP docs
Configuration / control plane	Claude Code docs (hooks, skills, subagents)
Multi-agent systems	LangGraph, AutoGen, MetaGPT
Production tracing	Arize Phoenix or Langfuse
Agent evaluation	SWE-bench, τ-bench, AgentBench
Prompt injection / safety	Simon Willison's series, OWASP LLM Top 10
RAG	LlamaIndex agents, GraphRAG
LLMs from the inside	Sebastian Raschka's book

14. A note on freshness

This field moves fast. Patterns from 2023 may be obsolete; protocols from 2024 may be standard by next quarter. Treat any specific tool or framework recommendation as a snapshot, not gospel. The concepts (loop, memory, tools, control plane, three knobs) are stable. The implementations churn.

When in doubt: read the official docs of whatever tool you're actually using, then triangulate with one or two of the foundational essays above.