FOUNDATION: What You Already Have + What's Missing
You know deep learning and LLM token prediction. Good. But you're likely missing:
Mathematical gaps to close first:
Information theory: entropy, KL divergence, mutual information. These underpin how LLMs are trained and evaluated.
Optimization theory: beyond SGD. Understand Adam, second-order methods, loss landscape geometry.
Probability theory (advanced): Bayesian inference, variational inference, stochastic processes. Agents make decisions under uncertainty; you need this language.
Graph theory: agents are graphs of computation. LangGraph isn't magic; it's directed graphs with state.
Control theory: feedback loops, stability, convergence. Agentic systems are control systems at heart.
Game theory: multi-agent systems have competitive and cooperative dynamics. Nash equilibria, mechanism design.
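To make the first of these gaps concrete: entropy and KL divergence for discrete distributions take only a few lines of pure Python (base-2 logs, so results are in bits; the KL sketch assumes q puts mass wherever p does):

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p_i * log2(p_i), in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum p_i * log2(p_i / q_i); asymmetric and >= 0.
    Assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

uniform = [0.25, 0.25, 0.25, 0.25]  # maximum-entropy distribution over 4 outcomes
peaked = [0.7, 0.1, 0.1, 0.1]       # a skewed next-token distribution

print(entropy(uniform))             # 2.0
print(kl_divergence(peaked, uniform))
```

The second print is the "surprise" of pretending a peaked model is uniform; cross-entropy loss in LLM training is exactly entropy plus this KL term.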
LAYER 1: The Full GenAI Landscape
Before agents, you need to understand every major paradigm of generative modeling:
Language Models (deepen what you know):
Transformer architecture internals: attention heads, positional encoding, the KV cache
Scaling laws (Chinchilla, Kaplan et al.): how compute, data, and model size interact
Emergent abilities: why capabilities appear suddenly at scale; still an open research question
In-context learning: why does it work? Bayesian meta-learning interpretations
Chain-of-Thought reasoning: why does asking a model to "think step by step" actually help? The mechanistic reasons
Beyond language:
Diffusion models: the math of score matching, denoising score matching, and stochastic differential equations (SDEs). These are not just image models; they're a general generative framework.
Flow matching: a newer framework with faster sampling than standard diffusion; increasingly used in production (Meta's Voicebox, Stable Diffusion 3)
VAEs and VQ-VAEs: the latent-space compression underpinning most multimodal systems
Autoregressive image models: VQGAN, LlamaGen; images as token sequences
Video generation: Sora-style architectures, spatiotemporal transformers
Alignment & Training:
RLHF: the full pipeline of reward model training and PPO, and why it's unstable
DPO (Direct Preference Optimization): why it replaced PPO in many labs; the math behind it
Constitutional AI: Anthropic's approach; RLAIF (RL from AI feedback)
RLOO, GRPO, REINFORCE variants: the new wave of alignment algorithms
Mechanistic interpretability: understanding what circuits inside transformers implement; superposition, features, induction heads
LAYER 2: The Core of Agentic AI (Research Level)
This is where your focus needs to be deep, not broad.
2.1 Agent Architectures
The core loop every agent runs:
Perceive → Reason → Plan → Act → Observe → Repeat
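As a minimal sketch of that loop, with a hypothetical toy environment and a stubbed-out reasoner standing in for an LLM call:

```python
from dataclasses import dataclass

# Hypothetical toy environment: the agent must count a number down to zero.
@dataclass
class Env:
    value: int

    def observe(self):  # Perceive
        return self.value

    def act(self, action):  # Act
        if action == "decrement":
            self.value -= 1

def reason(observation):  # Reason + Plan (a stub for an LLM call)
    return "decrement" if observation > 0 else "stop"

def run_agent(env, max_steps=10):
    """The Perceive -> Reason -> Plan -> Act -> Observe loop."""
    for _ in range(max_steps):
        obs = env.observe()          # Perceive / Observe
        action = reason(obs)         # Reason + Plan
        if action == "stop":
            break
        env.act(action)              # Act, then Repeat
    return env.observe()

print(run_agent(Env(value=3)))  # 0
```

Real agents replace `reason` with a model call and `Env` with tools, a browser, or an API, but the control flow stays this shape.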
Architectures to master:
ReAct (Reason + Act): the foundational paper. Read Yao et al., 2022. Agents interleave reasoning traces with actions.
Reflexion: agents that reflect on past failures and self-improve without gradient updates. Read Shinn et al., 2023.
Tree of Thoughts (ToT): search over reasoning paths instead of greedy decoding. Read Yao et al., 2023.
Graph of Thoughts (GoT): reasoning as an arbitrary graph, not just trees or chains
MCTS + LLMs: Monte Carlo Tree Search applied to LLM reasoning (explored in systems like AlphaCode 2 and o1-style models)
Self-Consistency: sample multiple reasoning paths and take a majority vote. Simple but powerful.
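Self-Consistency is simple enough to sketch in a few lines; the sampler below is a stub standing in for repeated LLM calls at temperature > 0:

```python
from collections import Counter

def self_consistency(sample_fn, n=5):
    """Sample n reasoning paths and return the majority-vote final answer."""
    answers = [sample_fn() for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Stub sampler: the "model" gets the right answer 3 times out of 5.
fake_samples = iter(["42", "41", "42", "37", "42"])
print(self_consistency(lambda: next(fake_samples)))  # 42
```

In practice each sample is a full chain-of-thought generation, and only the extracted final answers are voted on.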
2.2 Planning
This is a core research area. Agents fail most often because of bad planning:
Classical planning: STRIPS, PDDL. Know the history; it informs modern hybrid approaches.
LLM-based planning: how do you get an LLM to plan reliably? Task decomposition, subgoal generation.
Hierarchical planning: plans at multiple levels of abstraction (macro-tasks → micro-actions)
Plan verification: how does an agent know its plan is valid before executing it?
Replanning: what happens when the environment changes mid-execution?
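The decompose-verify-execute-replan cycle from this list can be sketched as follows; the decomposition is a stub (an LLM would produce it), and the hypothetical `blocked` set stands in for an environment that invalidates part of the plan:

```python
def decompose(task):
    """Stub task decomposition (an LLM call in a real system)."""
    return [f"{task}: step {i}" for i in range(1, 4)]

def verify(step, world):
    """Plan verification: check a precondition before executing a step."""
    return step not in world["blocked"]

def execute_with_replanning(task, world):
    """Execute a plan step by step, replanning whenever verification fails."""
    done, plan = [], decompose(task)
    while plan:
        step = plan.pop(0)
        if not verify(step, world):
            world["blocked"].discard(step)          # pretend replanning found a workaround
            plan.insert(0, step + " (replanned)")   # retry with a revised step
            continue
        done.append(step)                           # "execute" the verified step
    return done

world = {"blocked": {"ship feature: step 2"}}
print(execute_with_replanning("ship feature", world))
```

The point of the sketch is structural: verification happens before each action, and failure feeds back into the plan rather than aborting the task.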
2.3 Memory (The Most Underrated Topic)
Memory is what separates toy agents from real ones. Four types you must understand:
In-context memory: everything in the context window. Fast but limited and ephemeral.
External/episodic memory: vector databases (Pinecone, Weaviate, Chroma), retrieval by embedding similarity
Semantic/procedural memory: knowledge graphs, structured databases
Parametric memory: knowledge baked into model weights via fine-tuning
Research questions here: How do you decide what to store? When to retrieve? How to handle memory conflicts? These are open problems.
2.4 Tool Use
Function calling: how LLMs interface with external tools via structured JSON schemas
Tool selection: given 50 tools, how does an agent pick the right one?
Tool composition: chaining tools in sequence or in parallel
Computer use / GUI agents: agents that operate browsers, desktops, and terminals. This is a hot research area (Anthropic's computer use, OpenAI's Operator).
Code execution as a tool: the agent writes and runs code as part of reasoning (the code-interpreter pattern)
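A minimal sketch of the function-calling pattern: a JSON-schema-style tool definition plus a dispatcher that routes the model's structured call to a Python function. The schema follows the general shape used by major LLM APIs, but `get_weather` and the registry here are hypothetical examples:

```python
import json

# A hypothetical tool schema in the JSON-schema style LLM APIs expect.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

def dispatch(tool_call_json, registry):
    """Parse the model's structured tool call and invoke the matching function."""
    call = json.loads(tool_call_json)
    return registry[call["name"]](**call["arguments"])

# Registry mapping tool names to real implementations (stubbed here).
registry = {"get_weather": lambda city, unit="celsius": f"18 degrees {unit} in {city}"}

print(dispatch('{"name": "get_weather", "arguments": {"city": "Berlin"}}', registry))
# 18 degrees celsius in Berlin
```

The schema is what the model sees; the dispatcher is what your runtime does with the model's output. Tool selection and composition are layered on top of exactly this mechanism.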
2.5 Reasoning Models (The Newest Frontier)
This is the current bleeding edge:
Chain-of-Thought with search (o1/o3 style): internal extended reasoning before answering; models that "think longer" on hard problems
Process Reward Models (PRMs): instead of rewarding only final answers, reward each reasoning step. Critical for reliable reasoning.
Outcome Reward Models (ORMs): train on final-answer correctness
Test-time compute scaling: trading inference compute for accuracy. This is a paradigm shift away from "bigger training = better".
Self-play and self-improvement: agents that generate their own training data
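Test-time compute scaling in its simplest form is best-of-n sampling: generate several candidates and keep the one a reward model scores highest. A sketch, with stub candidates and a stand-in scoring function playing the reward-model role:

```python
def best_of_n(sample_fn, score_fn, n=8):
    """Trade inference compute for accuracy: sample n candidates and
    return the one the (process/outcome) reward model scores highest."""
    candidates = [sample_fn(i) for i in range(n)]
    return max(candidates, key=score_fn)

# Stubs: candidates are numbers; the "reward model" prefers values near 10.
candidates = [3, 7, 12, 9, 15, 10, 4, 8]
answer = best_of_n(lambda i: candidates[i], lambda c: -abs(c - 10))
print(answer)  # 10
```

Raising `n` is literally spending more inference compute for a better chance of a high-scoring answer, which is the core trade-off behind this paradigm.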
LAYER 3: Multi-Agent Systems (Research Level)
This is an entire subfield:
3.1 Coordination Patterns
Hierarchical: an orchestrator delegates to subagents
Flat/peer-to-peer: agents communicate as equals
Market-based: agents bid for tasks (auction mechanisms)
Blackboard systems: a shared memory space all agents read from and write to
3.2 Communication
How do agents communicate? Natural language? Structured JSON? Embeddings?
Agent protocols: Anthropic's MCP (Model Context Protocol) is a real standard for agent-tool communication; study it deeply
A2A (Agent-to-Agent) protocol: Google's emerging standard for inter-agent communication
3.3 Emergent Behavior
What happens when many agents interact? Do they cooperate or compete?
Social simulation research: the "Generative Agents" paper (Park et al., 2023) placed 25 agents in a simulated town and observed emergent social behaviors
Collective intelligence: can multi-agent systems exceed the capability of any individual agent?
3.4 Trust & Verification in Multi-Agent
How does an orchestrator know if a subagent's output is correct?
Agent debate: multiple agents argue opposing positions; a judge agent decides
Ensemble and critic patterns: one agent generates, another critiques
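The critic pattern can be sketched as a loop between two stubbed agents (both would be LLM calls in practice; the revision logic here is purely illustrative):

```python
def generate(prompt, feedback=None):
    """Stub generator agent (an LLM call in a real system)."""
    draft = f"answer to {prompt!r}"
    return draft + " [revised]" if feedback else draft

def critique(draft):
    """Stub critic agent: returns feedback, or None to accept the draft."""
    return None if draft.endswith("[revised]") else "needs revision"

def generator_critic_loop(prompt, max_rounds=3):
    """One agent generates, another critiques, until the critic accepts."""
    feedback = None
    for _ in range(max_rounds):
        draft = generate(prompt, feedback)
        feedback = critique(draft)
        if feedback is None:       # critic accepted
            return draft
    return draft                   # give up after max_rounds

print(generator_critic_loop("What is RAG?"))
```

The orchestrator's trust question from above maps onto `critique`: the verification step is a separate agent, not the generator grading itself.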
LAYER 4: RAG (Retrieval-Augmented Generation), In Depth
RAG is the primary way agents access external knowledge:
Basic RAG:
- Chunk documents → embed → store in a vector DB → retrieve top-k → inject into context
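A toy end-to-end version of that pipeline, using a bag-of-characters embedding in place of a neural embedder and an in-memory list in place of a vector DB (the chunks and query are made-up examples):

```python
import math

def embed(text):
    """Toy bag-of-characters embedding (a real system uses a neural embedder)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def retrieve_top_k(query, chunks, k=1):
    """Retrieve the k chunks most similar to the query embedding."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

chunks = ["the cat sat on the mat", "transformers use attention", "paris is in france"]
print(retrieve_top_k("what is attention in transformers?", chunks))
# ['transformers use attention']
```

Everything in the advanced variants below changes *when* and *what* you retrieve, but this embed-score-rank core stays the same.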
Advanced RAG (what researchers work on):
HyDE (Hypothetical Document Embeddings): generate a hypothetical answer, embed it, and use that embedding to retrieve real documents
FLARE: generate incrementally, retrieving only when confidence is low
Self-RAG: the model decides when to retrieve and produces reflection tokens
Corrective RAG (CRAG): assess retrieval quality; fall back to web search if the retrieved documents are irrelevant
GraphRAG (Microsoft): builds a knowledge graph from documents and queries the graph, not just vector similarity
Agentic RAG: the agent actively formulates queries, checks sufficiency, and re-retrieves if needed
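The agentic RAG loop from the last item above, sketched with a stub search function and sufficiency check (query reformulation would be an LLM call in a real system):

```python
def agentic_rag(question, search, enough, max_rounds=3):
    """Agentic RAG sketch: formulate a query, check whether the retrieved
    context is sufficient, and reformulate + re-retrieve if it is not."""
    query, context = question, []
    for round_no in range(max_rounds):
        context += search(query)                 # retrieve
        if enough(context):                      # sufficiency check
            return context
        # Stub for LLM-driven query rewriting:
        query = f"{question} (reformulated {round_no + 1})"
    return context                               # best effort after max_rounds

# Stubs: the first query misses; the reformulated one hits.
def fake_search(q):
    return ["relevant doc"] if "reformulated" in q else []

result = agentic_rag("obscure fact?", fake_search, enough=lambda ctx: len(ctx) > 0)
print(result)  # ['relevant doc']
```

Contrast with basic RAG: retrieval is no longer a single fixed step but a decision the agent revisits until the context is judged sufficient.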
Evaluation of RAG:
- RAGAS framework: faithfulness, answer relevance, context precision, context recall
LAYER 5: Orchestration Frameworks (Know Them Deeply, Not Just Superficially)
| Framework | What It's For |
|---|---|
| LangChain | General LLM pipeline construction; chains, memory, tools |
| LangGraph | Stateful, cyclical agent workflows as directed graphs |
| LlamaIndex | Data ingestion, RAG pipelines, knowledge agents |
| CrewAI | Role-based multi-agent teams with defined tasks |
| AutoGen (Microsoft) | Conversational multi-agent systems |
| DSPy | Compiling prompts automatically instead of writing them manually; a major research direction |
| Haystack | Production-grade RAG and agent pipelines |
For a researcher, DSPy deserves special attention. It treats prompt engineering as an optimization problem and automatically tunes prompts and few-shot examples. Many see it as a significant future direction for the field.
LAYER 6: Safety, Alignment & Trust
For a researcher, this is not optional:
Prompt injection: adversarial inputs that hijack agent behavior; a major attack surface
Goal hijacking: the agent is manipulated into pursuing a different goal
Reward hacking: the agent finds shortcuts that maximize reward without achieving the real goal
Corrigibility: designing agents that accept human correction without resistance
Constitutional AI and RLAIF: scalable oversight mechanisms
Minimal-footprint principle: agents should request only the permissions they need and prefer reversible actions
Human-in-the-loop design patterns: when should an agent pause and ask? This is a UX + systems problem.
Agent red-teaming: adversarially probing agent systems for failures
LAYER 7: Evaluation (A Research Specialty Unto Itself)
How do you know if your agent is actually good?
Benchmarks to know:
GAIA: a general AI-assistant benchmark of real-world tasks requiring tools and multi-step reasoning
SWE-bench: agents solving real GitHub issues; the gold standard for coding agents
AgentBench: multi-environment agent evaluation
WebArena / WorkArena: web-browsing agent benchmarks
HotpotQA, MultiHop-RAG: multi-hop reasoning benchmarks
Evaluation methodologies:
LLM-as-Judge: use a strong LLM to evaluate outputs; fast, but it has biases
Process reward models: evaluate reasoning steps, not just final answers
Human-in-the-loop evaluation: the gold standard, but expensive
Trajectory evaluation: did the agent take reasonable steps, not just reach the right answer?
LAYER 8: Core Research Papers to Read (In Order)
Foundational:
Attention Is All You Need (Vaswani et al., 2017)
Language Models are Few-Shot Learners (GPT-3; Brown et al., 2020)
Chain-of-Thought Prompting Elicits Reasoning in LLMs (Wei et al., 2022)
Training Language Models to Follow Instructions with Human Feedback (InstructGPT; Ouyang et al., 2022)
Agentic AI Core:
ReAct: Synergizing Reasoning and Acting in Language Models (Yao et al., 2022)
Toolformer: Language Models Can Teach Themselves to Use Tools (Schick et al., 2023)
HuggingGPT / JARVIS: orchestrating multiple models as tools
Reflexion: Language Agents with Verbal Reinforcement Learning (Shinn et al., 2023)
Tree of Thoughts (Yao et al., 2023)
Generative Agents: Interactive Simulacra of Human Behavior (Park et al., 2023)
Reasoning Models:
Let's Verify Step by Step, introducing Process Reward Models (Lightman et al., 2023, OpenAI)
Self-Play Fine-Tuning (SPIN)
DeepSeek-R1: an open-source reasoning model; read the technical report
RAG:
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks (Lewis et al., 2020)
Self-RAG (Asai et al., 2023)
GraphRAG (Edge et al., 2024, Microsoft)
Safety:
Constitutional AI (Anthropic, 2022)
Direct Preference Optimization (Rafailov et al., 2023)
LAYER 9: Engineering Skills (For a Researcher, These Enable Your Experiments)
Python mastery: async/await matters a lot for agents running parallel tool calls
Docker + cloud deployment: your agents need to run somewhere persistent
Vector databases: understand HNSW indexing and approximate nearest-neighbor search
Distributed-systems basics: multi-agent systems are distributed systems
LLM APIs: OpenAI, Anthropic, Google; function calling, streaming, token management
Weights & Biases / MLflow: experiment tracking; non-negotiable for research
Hugging Face ecosystem: the transformers, datasets, PEFT, and TRL libraries
LAYER 10: Active Research Frontiers (Where Papers Are Being Written Now)
These are the open questions. Choose one as your niche:
Long-horizon planning: agents fail at tasks requiring 50+ steps. Why? How do we fix it?
Memory consolidation: how should agents decide what to remember and what to forget?
Efficient tool use: reducing the number of tool calls needed to complete a task
Agent communication protocols: formalizing how agents talk to each other
Trustworthy delegation: how does a human safely give an agent significant autonomy?
Emergent multi-agent behavior: cooperation, deception, coalition formation
Test-time compute scaling laws: what is the relationship between thinking time and performance?
Embodied agents: connecting language agents to physical robots (the bridge between agentic AI and robotics)
Neurosymbolic agents: combining LLM flexibility with formal-logic guarantees
Self-modifying agents: agents that improve their own prompts and tools over time
Suggested Study Sequence
Month 1-2: Fill math gaps (probability, graph theory, information theory)
Month 2-3: Deep GenAI (diffusion, alignment, RLHF, DPO, interpretability)
Month 3-4: Core agent papers (ReAct → Reflexion → ToT → Generative Agents)
Month 4-5: RAG (Basic → Advanced → GraphRAG → Agentic RAG)
Month 5-6: Build one end-to-end agentic system in LangGraph + CrewAI
Month 6-7: Multi-agent systems + safety research
Month 7-8: Deep dive on one research frontier; start reading Arxiv daily
Month 8+: Start contributing β replicate a paper, extend it, publish
The single most important mindset shift: stop consuming, start building experiments. Every paper you read should have an associated experiment you run. That's what separates a researcher from a student.