FOUNDATION: What You Already Have + What's Missing
You know deep learning and LLM token prediction. Good. But you're likely missing:
Mathematical gaps to close first:
Information theory: entropy, KL divergence, mutual information. These underpin how LLMs are trained and evaluated.
Optimization theory: beyond SGD. Understand Adam, second-order methods, loss landscape geometry.
Probability theory (advanced): Bayesian inference, variational inference, stochastic processes. Agents make decisions under uncertainty; you need this language.
Graph theory: agents are graphs of computation. LangGraph isn't magic; it's directed graphs with state.
Control theory: feedback loops, stability, convergence. Agentic systems are control systems at heart.
Game theory: multi-agent systems have competitive and cooperative dynamics. Nash equilibria, mechanism design.
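To make the first of these gaps concrete: entropy and KL divergence for discrete distributions take only a few lines of pure Python (base-2 logs, so results are in bits; the KL sketch assumes q puts mass wherever p does):

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p_i * log2(p_i), in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum p_i * log2(p_i / q_i); asymmetric and >= 0.
    Assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]  # maximum-entropy distribution over 4 outcomes
peaked = [0.7, 0.1, 0.1, 0.1]       # a skewed next-token distribution

print(entropy(uniform))             # 2.0
print(kl_divergence(peaked, uniform))
```

The second print is the "surprise" of pretending a peaked model is uniform; cross-entropy loss in LLM training is exactly entropy plus this KL term.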
LAYER 1: The Full GenAI Landscape
Before agents, you need to understand every major paradigm of generative modeling:
Language Models (deepen what you know):
Transformer architecture internals: attention heads, positional encoding, the KV cache
Scaling laws (Chinchilla, Kaplan et al.): how compute, data, and model size interact
Emergent abilities: why capabilities appear suddenly at scale; still an open research question
In-context learning: why does it work? Bayesian meta-learning interpretations
Chain-of-Thought reasoning: why does asking a model to "think step by step" actually help? The mechanistic reasons
Beyond language:
Diffusion models: the math of score matching, denoising score matching, and stochastic differential equations (SDEs). These are not just image models; they're a general generative framework.
Flow matching: a newer framework with faster sampling than standard diffusion; increasingly used in production (Meta's Voicebox, Stable Diffusion 3)
VAEs and VQ-VAEs: the latent-space compression underpinning most multimodal systems
Autoregressive image models: VQGAN, LlamaGen; images as token sequences
Video generation: Sora-style architectures, spatiotemporal transformers
Alignment & Training:
RLHF: the full pipeline of reward model training and PPO, and why it's unstable
DPO (Direct Preference Optimization): why it replaced PPO in many labs; the math behind it
Constitutional AI: Anthropic's approach; RLAIF (RL from AI feedback)
RLOO, GRPO, REINFORCE variants: the new wave of alignment algorithms
Mechanistic interpretability: understanding what circuits inside transformers implement; superposition, features, induction heads
LAYER 2: The Core of Agentic AI (Research Level)
This is where your focus needs to be deep, not broad.
2.1 Agent Architectures
The core loop every agent runs:
Perceive → Reason → Plan → Act → Observe → Repeat
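As a minimal sketch of that loop, with a hypothetical toy environment and a stubbed-out reasoner standing in for an LLM call:

```python
from dataclasses import dataclass

# Hypothetical toy environment: the agent must count a number down to zero.
@dataclass
class Env:
    value: int

    def observe(self):  # Perceive
        return self.value

    def act(self, action):  # Act
        if action == "decrement":
            self.value -= 1

def reason(observation):  # Reason + Plan (a stub for an LLM call)
    return "decrement" if observation > 0 else "stop"

def run_agent(env, max_steps=10):
    """The Perceive -> Reason -> Plan -> Act -> Observe loop."""
    for _ in range(max_steps):
        obs = env.observe()          # Perceive / Observe
        action = reason(obs)         # Reason + Plan
        if action == "stop":
            break
        env.act(action)              # Act, then Repeat
    return env.observe()

print(run_agent(Env(value=3)))  # 0
```

Real agents replace `reason` with a model call and `Env` with tools, a browser, or an API, but the control flow stays this shape.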
Architectures to master:
ReAct (Reason + Act): the foundational paper. Read Yao et al., 2022. Agents interleave reasoning traces with actions.
Reflexion: agents that reflect on past failures and self-improve without gradient updates. Read Shinn et al., 2023.
Tree of Thoughts (ToT): search over reasoning paths instead of greedy decoding. Read Yao et al., 2023.
Graph of Thoughts (GoT): reasoning as an arbitrary graph, not just trees or chains
MCTS + LLMs: Monte Carlo Tree Search applied to LLM reasoning (explored in systems like AlphaCode 2 and o1-style models)
Self-Consistency: sample multiple reasoning paths and take a majority vote. Simple but powerful.
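Self-Consistency is simple enough to sketch in a few lines; the sampler below is a stub standing in for repeated LLM calls at temperature > 0:

```python
from collections import Counter

def self_consistency(sample_fn, n=5):
    """Sample n reasoning paths and return the majority-vote final answer."""
    answers = [sample_fn() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Stub sampler: the "model" gets the right answer 3 times out of 5.
fake_samples = iter(["42", "41", "42", "37", "42"])
print(self_consistency(lambda: next(fake_samples)))  # 42
```

In practice each sample is a full chain-of-thought generation, and only the extracted final answers are voted on.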
2.2 Planning
This is a core research area. Agents fail most often because of bad planning:
Classical planning: STRIPS, PDDL. Know the history; it informs modern hybrid approaches.
LLM-based planning: how do you get an LLM to plan reliably? Task decomposition, subgoal generation.
Hierarchical planning: plans at multiple levels of abstraction (macro-tasks → micro-actions)
Plan verification: how does an agent know its plan is valid before executing it?
Replanning: what happens when the environment changes mid-execution?
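The decompose-verify-execute-replan cycle from this list can be sketched as follows; the decomposition is a stub (an LLM would produce it), and the hypothetical `blocked` set stands in for an environment that invalidates part of the plan:

```python
def decompose(task):
    """Stub task decomposition (an LLM call in a real system)."""
    return [f"{task}: step {i}" for i in range(1, 4)]

def verify(step, world):
    """Plan verification: check a precondition before executing a step."""
    return step not in world["blocked"]

def execute_with_replanning(task, world):
    """Execute a plan step by step, replanning whenever verification fails."""
    done, plan = [], decompose(task)
    while plan:
        step = plan.pop(0)
        if not verify(step, world):
            world["blocked"].discard(step)          # pretend replanning found a workaround
            plan.insert(0, step + " (replanned)")   # retry with a revised step
            continue
        done.append(step)                           # "execute" the verified step
    return done

world = {"blocked": {"ship feature: step 2"}}
print(execute_with_replanning("ship feature", world))
```

The point of the sketch is structural: verification happens before each action, and failure feeds back into the plan rather than aborting the task.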
2.3 Memory (The Most Underrated Topic)
Memory is what separates toy agents from real ones. Four types you must understand:
In-context memory: everything in the context window. Fast but limited and ephemeral.
External/episodic memory: vector databases (Pinecone, Weaviate, Chroma), retrieval by embedding similarity
Semantic/procedural memory: knowledge graphs, structured databases
Parametric memory: knowledge baked into model weights via fine-tuning
Research questions here: How do you decide what to store? When to retrieve? How to handle memory conflicts? These are open problems.
2.4 Tool Use
Function calling: how LLMs interface with external tools via structured JSON schemas
Tool selection: given 50 tools, how does an agent pick the right one?
Tool composition: chaining tools in sequence or in parallel
Computer use / GUI agents: agents that operate browsers, desktops, and terminals. This is a hot research area (Anthropic's computer use, OpenAI's Operator).
Code execution as a tool: the agent writes and runs code as part of reasoning (the code-interpreter pattern)
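A minimal sketch of the function-calling pattern: a JSON-schema-style tool definition plus a dispatcher that routes the model's structured call to a Python function. The schema follows the general shape used by major LLM APIs, but `get_weather` and the registry here are hypothetical examples:

```python
import json

# A hypothetical tool schema in the JSON-schema style LLM APIs expect.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def dispatch(tool_call_json, registry):
    """Parse the model's structured tool call and invoke the matching function."""
    call = json.loads(tool_call_json)
    return registry[call["name"]](**call["arguments"])

# Registry mapping tool names to real implementations (stubbed here).
registry = {"get_weather": lambda city, unit="celsius": f"18 degrees {unit} in {city}"}

print(dispatch('{"name": "get_weather", "arguments": {"city": "Berlin"}}', registry))
# 18 degrees celsius in Berlin
```

The schema is what the model sees; the dispatcher is what your runtime does with the model's output. Tool selection and composition are layered on top of exactly this mechanism.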
2.5 Reasoning Models (The Newest Frontier)
This is the current bleeding edge:
Chain-of-Thought with search (o1/o3 style): internal extended reasoning before answering; models that "think longer" on hard problems
Process Reward Models (PRMs): instead of rewarding only final answers, reward each reasoning step. Critical for reliable reasoning.
Outcome Reward Models (ORMs): train on final-answer correctness
Test-time compute scaling: trading inference compute for accuracy. This is a paradigm shift away from "bigger training = better".
Self-play and self-improvement: agents that generate their own training data
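Test-time compute scaling in its simplest form is best-of-n sampling: generate several candidates and keep the one a reward model scores highest. A sketch, with stub candidates and a stand-in scoring function playing the reward-model role:

```python
def best_of_n(sample_fn, score_fn, n=8):
    """Trade inference compute for accuracy: sample n candidates and
    return the one the (process/outcome) reward model scores highest."""
    candidates = [sample_fn(i) for i in range(n)]
    return max(candidates, key=score_fn)

# Stubs: candidates are numbers; the "reward model" prefers values near 10.
candidates = [3, 7, 12, 9, 15, 10, 4, 8]
answer = best_of_n(lambda i: candidates[i], lambda c: -abs(c - 10))
print(answer)  # 10
```

Raising `n` is literally spending more inference compute for a better chance of a high-scoring answer, which is the core trade-off behind this paradigm.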
LAYER 3: Multi-Agent Systems (Research Level)
This is an entire subfield:
3.1 Coordination Patterns
Hierarchical: an orchestrator delegates to subagents
Flat/peer-to-peer: agents communicate as equals
Market-based: agents bid for tasks (auction mechanisms)
Blackboard systems: a shared memory space all agents read from and write to
3.2 Communication
How do agents communicate? Natural language? Structured JSON? Embeddings?
Agent protocols: Anthropic's MCP (Model Context Protocol) is a real standard for agent-tool communication; study it deeply
A2A (Agent-to-Agent) protocol: Google's emerging standard for inter-agent communication
3.3 Emergent Behavior
What happens when many agents interact? Do they cooperate or compete?
Social simulation research: the "Generative Agents" paper (Park et al., 2023) placed 25 agents in a simulated town and observed emergent social behaviors
Collective intelligence: can multi-agent systems exceed the capability of any individual agent?
3.4 Trust & Verification in Multi-Agent
How does an orchestrator know if a subagent's output is correct?
Agent debate: multiple agents argue opposing positions; a judge agent decides
Ensemble and critic patterns: one agent generates, another critiques
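The critic pattern can be sketched as a loop between two stubbed agents (both would be LLM calls in practice; the revision logic here is purely illustrative):

```python
def generate(prompt, feedback=None):
    """Stub generator agent (an LLM call in a real system)."""
    draft = f"answer to {prompt!r}"
    return draft + " [revised]" if feedback else draft

def critique(draft):
    """Stub critic agent: returns feedback, or None to accept the draft."""
    return None if draft.endswith("[revised]") else "needs revision"

def generator_critic_loop(prompt, max_rounds=3):
    """One agent generates, another critiques, until the critic accepts."""
    feedback = None
    for _ in range(max_rounds):
        draft = generate(prompt, feedback)
        feedback = critique(draft)
        if feedback is None:       # critic accepted
            return draft
    return draft                   # give up after max_rounds

print(generator_critic_loop("What is RAG?"))
```

The orchestrator's trust question from above maps onto `critique`: the verification step is a separate agent, not the generator grading itself.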
LAYER 4: RAG (Retrieval-Augmented Generation), In Depth
RAG is the primary way agents access external knowledge:
Basic RAG:
- Chunk documents → embed → store in a vector DB → retrieve top-k → inject into context
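A toy end-to-end version of that pipeline, using a bag-of-characters embedding in place of a neural embedder and an in-memory list in place of a vector DB (the chunks and query are made-up examples):

```python
import math

def embed(text):
    """Toy bag-of-characters embedding (a real system uses a neural embedder)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_top_k(query, chunks, k=1):
    """Retrieve the k chunks most similar to the query embedding."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = ["the cat sat on the mat", "transformers use attention", "paris is in france"]
print(retrieve_top_k("what is attention in transformers?", chunks))
# ['transformers use attention']
```

Everything in the advanced variants below changes *when* and *what* you retrieve, but this embed-score-rank core stays the same.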
Advanced RAG (what researchers work on):
HyDE (Hypothetical Document Embeddings): generate a hypothetical answer, embed it, and use that embedding to retrieve real documents
FLARE: generate incrementally, retrieving only when confidence is low
Self-RAG: the model decides when to retrieve and produces reflection tokens
Corrective RAG (CRAG): assess retrieval quality; fall back to web search if the retrieved documents are irrelevant
GraphRAG (Microsoft): builds a knowledge graph from documents and queries the graph, not just vector similarity
Agentic RAG: the agent actively formulates queries, checks sufficiency, and re-retrieves if needed
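The agentic RAG loop from the last item above, sketched with a stub search function and sufficiency check (query reformulation would be an LLM call in a real system):

```python
def agentic_rag(question, search, enough, max_rounds=3):
    """Agentic RAG sketch: formulate a query, check whether the retrieved
    context is sufficient, and reformulate + re-retrieve if it is not."""
    query, context = question, []
    for round_no in range(max_rounds):
        context += search(query)                 # retrieve
        if enough(context):                      # sufficiency check
            return context
        # Stub for LLM-driven query rewriting:
        query = f"{question} (reformulated {round_no + 1})"
    return context                               # best effort after max_rounds

# Stubs: the first query misses; the reformulated one hits.
def fake_search(q):
    return ["relevant doc"] if "reformulated" in q else []

result = agentic_rag("obscure fact?", fake_search, enough=lambda ctx: len(ctx) > 0)
print(result)  # ['relevant doc']
```

Contrast with basic RAG: retrieval is no longer a single fixed step but a decision the agent revisits until the context is judged sufficient.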
Evaluation of RAG:
- RAGAS framework: faithfulness, answer relevance, context precision, context recall
LAYER 5: Orchestration Frameworks (Know Them Deeply, Not Just Superficially)
| Framework | What It's For |
|---|---|
| LangChain | General LLM pipeline construction; chains, memory, tools |
| LangGraph | Stateful, cyclical agent workflows as directed graphs |
| LlamaIndex | Data ingestion, RAG pipelines, knowledge agents |
| CrewAI | Role-based multi-agent teams with defined tasks |
| AutoGen (Microsoft) | Conversational multi-agent systems |
| DSPy | Compiling prompts automatically instead of writing them manually; a major research direction |
| Haystack | Production-grade RAG and agent pipelines |
For a researcher, DSPy deserves special attention. It treats prompt engineering as an optimization problem and automatically tunes prompts and few-shot examples. Many see it as a significant future direction for the field.
LAYER 6: Safety, Alignment & Trust
For a researcher, this is not optional:
Prompt injection: adversarial inputs that hijack agent behavior; a major attack surface
Goal hijacking: the agent is manipulated into pursuing a different goal
Reward hacking: the agent finds shortcuts that maximize reward without achieving the real goal
Corrigibility: designing agents that accept human correction without resistance
Constitutional AI and RLAIF: scalable oversight mechanisms
Minimal-footprint principle: agents should request only the permissions they need and prefer reversible actions
Human-in-the-loop design patterns: when should an agent pause and ask? This is a UX + systems problem.
Agent red-teaming: adversarially probing agent systems for failures
LAYER 7: Evaluation (A Research Specialty Unto Itself)
How do you know if your agent is actually good?
Benchmarks to know:
GAIA: a general AI-assistant benchmark of real-world tasks requiring tools and multi-step reasoning
SWE-bench: agents solving real GitHub issues; the gold standard for coding agents
AgentBench: multi-environment agent evaluation
WebArena / WorkArena: web-browsing agent benchmarks
HotpotQA, MultiHop-RAG: multi-hop reasoning benchmarks
Evaluation methodologies:
LLM-as-Judge: use a strong LLM to evaluate outputs; fast, but it has biases
Process reward models: evaluate reasoning steps, not just final answers
Human-in-the-loop evaluation: the gold standard, but expensive
Trajectory evaluation: did the agent take reasonable steps, not just reach the right answer?
LAYER 8: Core Research Papers to Read (In Order)
Foundational:
Attention Is All You Need (Vaswani et al., 2017)
Language Models are Few-Shot Learners (GPT-3; Brown et al., 2020)
Chain-of-Thought Prompting Elicits Reasoning in LLMs (Wei et al., 2022)
Training Language Models to Follow Instructions with Human Feedback (InstructGPT; Ouyang et al., 2022)
Agentic AI Core:
ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)
Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al., 2023)
HuggingGPT / JARVIS: orchestrating multiple models as tools
Reflexion: Language Agents with Verbal Reinforcement Learning (Shinn et al., 2023)
Tree of Thoughts (Yao et al., 2023)
Generative Agents: Interactive Simulacra of Human Behavior (Park et al., 2023)
Reasoning Models:
Let's Verify Step by Step, introducing Process Reward Models (Lightman et al., 2023, OpenAI)
Self-Play Fine-Tuning (SPIN)
DeepSeek-R1: an open-source reasoning model; read the technical report
RAG:
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020)
Self-RAG (Asai et al., 2023)
GraphRAG (Edge et al., 2024, Microsoft)
Safety:
Constitutional AI (Anthropic, 2022)
Direct Preference Optimization (Rafailov et al., 2023)
LAYER 9: Engineering Skills (For a Researcher, These Enable Your Experiments)
Python mastery: async/await matters a lot for agents running parallel tool calls
Docker + cloud deployment: your agents need to run somewhere persistent
Vector databases: understand HNSW indexing and approximate nearest-neighbor search
Distributed-systems basics: multi-agent systems are distributed systems
LLM APIs: OpenAI, Anthropic, Google; function calling, streaming, token management
Weights & Biases / MLflow: experiment tracking; non-negotiable for research
Hugging Face ecosystem: the transformers, datasets, PEFT, and TRL libraries
LAYER 10: Active Research Frontiers (Where Papers Are Being Written Now)
These are the open questions. Choose one as your niche:
Long-horizon planning: agents fail at tasks requiring 50+ steps. Why? How do we fix it?
Memory consolidation: how should agents decide what to remember and what to forget?
Efficient tool use: reducing the number of tool calls needed to complete a task
Agent communication protocols: formalizing how agents talk to each other
Trustworthy delegation: how does a human safely give an agent significant autonomy?
Emergent multi-agent behavior: cooperation, deception, coalition formation
Test-time compute scaling laws: what is the relationship between thinking time and performance?
Embodied agents: connecting language agents to physical robots (the bridge between agentic AI and robotics)
Neurosymbolic agents: combining LLM flexibility with formal-logic guarantees
Self-modifying agents: agents that improve their own prompts and tools over time
Suggested Study Sequence
Month 1-2: Fill math gaps (probability, graph theory, information theory)
Month 2-3: Deep GenAI (diffusion, alignment, RLHF, DPO, interpretability)
Month 3-4: Core agent papers (ReAct → Reflexion → ToT → Generative Agents)
Month 4-5: RAG (Basic → Advanced → GraphRAG → Agentic RAG)
Month 5-6: Build one end-to-end agentic system in LangGraph + CrewAI
Month 6-7: Multi-agent systems + safety research
Month 7-8: Deep dive on one research frontier; start reading Arxiv daily
Month 8+: Start contributing β replicate a paper, extend it, publish
The single most important mindset shift: stop consuming, start building experiments. Every paper you read should have an associated experiment you run. That's what separates a researcher from a student.