Jenil Sheth

Shifting Deep Learning to Agentic AI

🧱 FOUNDATION β€” What You Already Have + What's Missing

You know deep learning and LLM token prediction. Good. But you're likely missing:

Mathematical gaps to close first:

  • Information theory β€” entropy, KL divergence, mutual information. These underpin how LLMs are trained and evaluated.

  • Optimization theory β€” beyond SGD. Understand Adam, second-order methods, loss landscape geometry.

  • Probability theory (advanced) β€” Bayesian inference, variational inference, stochastic processes. Agents make decisions under uncertainty; you need this language.

  • Graph theory β€” agents are graphs of computation. LangGraph isn't magic; it's directed graphs with state.

  • Control theory β€” feedback loops, stability, convergence. Agentic systems are control systems at heart.

  • Game theory β€” multi-agent systems have competitive and cooperative dynamics. Nash equilibria, mechanism design.
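To make the information-theory bullet concrete: here is a minimal sketch of entropy and KL divergence over toy next-token distributions (pure Python; the distributions are illustrative numbers, not real model outputs).

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p_i * log2(p_i), in bits."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def kl_divergence(p, q):
    """KL(p || q) = sum p_i * log2(p_i / q_i); assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Toy next-token distributions over a 4-word vocabulary (illustrative)
p = [0.7, 0.2, 0.05, 0.05]   # a peaked "true" distribution
q = [0.25, 0.25, 0.25, 0.25] # a uniform model

print(f"H(p)       = {entropy(p):.3f} bits")
print(f"KL(p || q) = {kl_divergence(p, q):.3f} bits")
```

Cross-entropy loss in LLM training is exactly this machinery: minimizing it is equivalent to minimizing the KL divergence between the data distribution and the model.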


πŸ€– LAYER 1 β€” The Full GenAI Landscape

Before agents, you need to understand every major paradigm of generative modeling:

Language Models (deepen what you know):

  • Transformer architecture internals β€” attention heads, positional encoding, KV cache

  • Scaling laws (Kaplan et al., 2020; Chinchilla, Hoffmann et al., 2022) β€” how compute, data, and model size interact

  • Emergent abilities β€” why capabilities appear suddenly at scale; still an open research question

  • In-context learning β€” why does it work? Bayesian meta-learning interpretations

  • Chain-of-Thought reasoning β€” why does asking a model to "think step by step" actually help? The mechanistic reasons are still being worked out
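The scaling-laws bullet has a concrete functional form. A sketch of the Chinchilla parametric loss fit, using the approximate coefficients reported in Hoffmann et al. (2022) β€” treat the exact numbers as their empirical fit, not universal constants:

```python
def chinchilla_loss(n_params, n_tokens,
                    E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Predicted pretraining loss L(N, D) = E + A/N^alpha + B/D^beta.

    Coefficients are the (approximate) fit values from Hoffmann et al., 2022:
    E is irreducible loss; the other terms shrink with more params/data.
    """
    return E + A / n_params**alpha + B / n_tokens**beta

# A 70B-parameter model at two data budgets (illustrative)
undertrained = chinchilla_loss(70e9, 300e9)
compute_optimal = chinchilla_loss(70e9, 1.4e12)  # ~20 tokens per parameter
print(undertrained, compute_optimal)
```

The punchline of the paper falls out of the formula: at a fixed compute budget, a smaller model trained on more tokens can beat a bigger undertrained one.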

Beyond language:

  • Diffusion models β€” the math: score matching, denoising score matching, stochastic differential equations (SDEs). These are not just image models; they're a general generative framework

  • Flow matching β€” newer and faster than diffusion; increasingly used in production (Meta's Voicebox, Stable Diffusion 3)

  • VAEs and VQ-VAEs β€” the latent space compression underpinning most multimodal systems

  • Autoregressive image models β€” VQGAN, LlamaGen; images as token sequences

  • Video generation β€” Sora-style architectures, spatiotemporal transformers

Alignment & Training:

  • RLHF β€” the full pipeline: reward model training, PPO, why it's unstable

  • DPO (Direct Preference Optimization) β€” why it replaced PPO in many labs; the math behind it

  • Constitutional AI β€” Anthropic's approach; RLAIF (RL from AI feedback)

  • RLOO, GRPO, REINFORCE variants β€” the new wave of alignment algorithms

  • Mechanistic interpretability β€” understanding what circuits inside transformers implement; superposition, features, induction heads


🧠 LAYER 2 β€” The Core of Agentic AI (Research Level)

This is where your focus needs to be deep, not broad.

2.1 Agent Architectures

The core loop every agent runs:


Perceive β†’ Reason β†’ Plan β†’ Act β†’ Observe β†’ Repeat

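The loop above can be sketched in a few lines. Everything here is a stand-in β€” `ToyEnv`, `ToyLLM`, and the action strings are hypothetical placeholders for a real environment and model:

```python
class ToyEnv:
    """Stand-in environment: the goal is reached after 3 actions."""
    def reset(self, goal):
        self.steps = 0
        return f"starting work on: {goal}"
    def step(self, action):
        self.steps += 1
        return f"result of {action}", self.steps >= 3  # (observation, done)

class ToyLLM:
    """Stand-in model: returns canned reasoning and actions."""
    def reason(self, goal, observation):
        return f"thinking about {observation}"
    def plan(self, thought):
        return "next_action"

def run_agent(goal, env, llm, max_steps=10):
    observation = env.reset(goal)                 # Perceive
    for _ in range(max_steps):
        thought = llm.reason(goal, observation)   # Reason
        action = llm.plan(thought)                # Plan
        observation, done = env.step(action)      # Act + Observe
        if done:                                  # Repeat until done
            break
    return observation

print(run_agent("summarize a report", ToyEnv(), ToyLLM()))
```

Every framework in Layer 5 is, at its core, an elaboration of this loop with state management, tool routing, and error handling bolted on.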

Architectures to master:

  • ReAct (Reason + Act) β€” the foundational paper. Read Yao et al., 2022. Agents interleave reasoning traces with actions.

  • Reflexion β€” agents that reflect on past failures and self-improve without gradient updates. Read Shinn et al., 2023.

  • Tree of Thoughts (ToT) β€” search over reasoning paths instead of greedy decoding. Read Yao et al., 2023.

  • Graph of Thoughts (GoT) β€” reasoning as an arbitrary graph, not just trees or chains

  • MCTS + LLMs β€” Monte Carlo Tree Search applied to LLM reasoning; a key ingredient in search-augmented coding systems and widely hypothesized to underlie o1-style models

  • Self-Consistency β€” sample multiple reasoning paths, take majority vote. Simple but powerful.
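Self-Consistency is simple enough to sketch in full. The sampler below is a stand-in for calling a model several times at nonzero temperature; the answer strings are made up for illustration:

```python
from collections import Counter
import itertools

def self_consistency(sample_fn, n_samples=5):
    """Sample several reasoning paths, keep each final answer,
    and return the majority-vote answer (Wang et al.-style self-consistency)."""
    answers = [sample_fn() for _ in range(n_samples)]
    return Counter(answers).most_common(1)[0][0]

# Stand-in sampler: a stochastic model that is right 3 times out of 5
fake_answers = itertools.cycle(["42", "41", "42", "42", "17"])
result = self_consistency(lambda: next(fake_answers), n_samples=5)
print(result)  # the majority answer wins even though 2/5 samples were wrong
```

The design insight: wrong reasoning paths tend to disagree with each other, while correct ones converge, so a majority vote filters noise without any extra training.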

2.2 Planning

This is a core research area. Agents fail most often because of bad planning:

  • Classical planning β€” STRIPS, PDDL. Know the history; it informs modern hybrid approaches.

  • LLM-based planning β€” how do you get an LLM to plan reliably? Task decomposition, subgoal generation.

  • Hierarchical planning β€” plans at multiple levels of abstraction (macro-tasks β†’ micro-actions)

  • Plan verification β€” how does an agent know its plan is valid before executing it?

  • Replanning β€” what happens when the environment changes mid-execution?
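The verification and replanning bullets combine into one control loop, sketched below. All three callables (`propose_plan`, `verify`, `execute`) are hypothetical stand-ins for LLM- or rule-based components:

```python
def plan_and_execute(goal, propose_plan, verify, execute, max_replans=3):
    """Sketch of the plan -> verify -> execute -> replan cycle."""
    for attempt in range(max_replans):
        plan = propose_plan(goal, attempt)
        if not verify(plan):                      # verify BEFORE acting
            continue                              # replan on an invalid plan
        return [execute(step) for step in plan]
    raise RuntimeError("no valid plan found within budget")

# Toy components: the first proposed plan is missing a required step
def propose_plan(goal, attempt):
    if attempt == 0:
        return ["fetch data", "write summary"]
    return ["fetch data", "validate data", "write summary"]

verify = lambda plan: "validate data" in plan   # crude validity check
execute = lambda step: f"done: {step}"

print(plan_and_execute("report", propose_plan, verify, execute))
```

The open research problems live inside `verify`: checking a plan against a formal model (PDDL-style) is tractable, but verifying a free-form LLM plan reliably is not solved.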

2.3 Memory (The Most Underrated Topic)

Memory is what separates toy agents from real ones. Four types you must understand:

  • In-context memory β€” everything in the context window. Fast but limited and ephemeral.

  • External/episodic memory β€” vector databases (Pinecone, Weaviate, Chroma), retrieval by embedding similarity

  • Semantic/procedural memory β€” knowledge graphs, structured databases

  • Parametric memory β€” knowledge baked into model weights via fine-tuning

Research questions here: How do you decide what to store? When to retrieve? How to handle memory conflicts? These are open problems.
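A minimal sketch of external/episodic memory β€” store-and-retrieve by embedding similarity. The character-frequency "embedder" is a deliberately crude stand-in for a real embedding model:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

class EpisodicMemory:
    """Toy external memory: store (embedding, text) pairs, retrieve by similarity."""
    def __init__(self, embed):
        self.embed = embed
        self.items = []
    def store(self, text):
        self.items.append((self.embed(text), text))
    def retrieve(self, query, k=1):
        q = self.embed(query)
        ranked = sorted(self.items, key=lambda it: cosine(q, it[0]), reverse=True)
        return [text for _, text in ranked[:k]]

# Fake embedder: character-frequency vectors (illustration only)
def embed(text):
    return [text.lower().count(c) for c in "abcdefghijklmnopqrstuvwxyz"]

mem = EpisodicMemory(embed)
mem.store("the user prefers dark mode")
mem.store("the deploy failed on tuesday")
print(mem.retrieve("what are the user's preferences?"))
```

Production systems swap in a real embedding model and an ANN index (see Layer 9 on HNSW), but the store/retrieve contract is the same β€” and notice the open questions are absent from this sketch: it never decides what *not* to store or how to resolve conflicting memories.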

2.4 Tool Use

  • Function calling β€” how LLMs interface with external tools via structured JSON schemas

  • Tool selection β€” given 50 tools, how does an agent pick the right one?

  • Tool composition β€” chaining tools in sequence or parallel

  • Computer use / GUI agents β€” agents that operate browsers, desktops, terminals. This is a hot research area (Anthropic's computer use, OpenAI's Operator)

  • Code execution as a tool β€” the agent writes and runs code as part of reasoning (code interpreter pattern)
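The function-calling bullet in practice: a tool is advertised to the model as a JSON schema, the model emits a structured call, and a dispatcher routes it to real code. The schema fields below follow the common OpenAI-style convention; the tool itself is a hypothetical stub:

```python
import json

# A tool described the way function-calling APIs expect: a JSON schema
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Get the current weather for a city",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city):
    return {"city": city, "temp_c": 21}   # stub implementation

TOOL_REGISTRY = {"get_weather": get_weather}

def dispatch(tool_call_json):
    """Parse a model-emitted tool call and route it to real code."""
    call = json.loads(tool_call_json)
    fn = TOOL_REGISTRY[call["name"]]
    return fn(**call["arguments"])

# What a model might emit after seeing the schema (hypothetical output)
result = dispatch('{"name": "get_weather", "arguments": {"city": "Pune"}}')
print(result)
```

Tool *selection* is the hard part the sketch dodges: with 50 entries in `TOOL_REGISTRY`, the schemas themselves compete for context space and the model's attention.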

2.5 Reasoning Models (The Newest Frontier)

This is the current bleeding edge:

  • Chain-of-Thought with search (o1/o3 style) β€” internal extended reasoning before answering; models that "think longer" on hard problems

  • Process Reward Models (PRMs) β€” instead of rewarding final answers, reward each reasoning step. Critical for reliable reasoning.

  • Outcome Reward Models (ORMs) β€” train on final answer correctness

  • Test-time compute scaling β€” the idea that you can trade inference compute for accuracy. This is a paradigm shift away from "bigger training = better"

  • Self-play and self-improvement β€” agents that generate their own training data


πŸ•ΈοΈ LAYER 3 β€” Multi-Agent Systems (Research Level)

This is an entire subfield:

3.1 Coordination Patterns

  • Hierarchical β€” orchestrator delegates to subagents

  • Flat/peer-to-peer β€” agents communicate as equals

  • Market-based β€” agents bid for tasks (auction mechanisms)

  • Blackboard systems β€” shared memory space all agents read/write to
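The hierarchical pattern is the easiest to sketch: an orchestrator decomposes a task and delegates each subtask to the subagent registered for that capability. The decomposition and subagents here are hypothetical stand-ins:

```python
def orchestrate(task, subagents):
    """Hierarchical coordination: decompose, then delegate.

    `subagents` maps a capability name to a callable; real systems would
    have the orchestrator LLM produce the decomposition dynamically.
    """
    subtasks = [("research", task), ("write", task)]  # fixed decomposition for the sketch
    results = {}
    for capability, payload in subtasks:
        results[capability] = subagents[capability](payload)
    return results

subagents = {
    "research": lambda t: f"notes on {t}",
    "write":    lambda t: f"draft about {t}",
}
print(orchestrate("agent memory", subagents))
```

The other patterns change who calls whom: peer-to-peer removes the central `orchestrate`, market-based replaces the registry lookup with an auction, and blackboard replaces direct calls with reads/writes to shared state.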

3.2 Communication

  • How do agents communicate? Natural language? Structured JSON? Embeddings?

  • Agent protocols β€” Anthropic's MCP (Model Context Protocol) is a real standard for agent-tool communication; study it deeply

  • A2A (Agent-to-Agent) protocol β€” Google's emerging standard for inter-agent communication

3.3 Emergent Behavior

  • What happens when many agents interact? Do they cooperate or compete?

  • Social simulation research β€” "Generative Agents" paper (Park et al., 2023) β€” 25 agents in a simulated town, emergent social behaviors

  • Collective intelligence β€” can multi-agent systems exceed the capability of any individual agent?

3.4 Trust & Verification in Multi-Agent

  • How does an orchestrator know if a subagent's output is correct?

  • Agent debate β€” multiple agents argue opposing positions; a judge agent decides

  • Ensemble and critic patterns β€” one agent generates, another critiques


πŸ“š LAYER 4 β€” RAG (Retrieval-Augmented Generation) β€” Deep

RAG is the primary way agents access external knowledge:

Basic RAG:

  • Chunk documents β†’ embed β†’ store in vector DB β†’ retrieve top-k β†’ inject into context
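That pipeline, end to end, in miniature β€” with naive sentence chunking and word-overlap scoring standing in for a real embedder and vector DB:

```python
def chunk(text):
    """Split into sentence chunks (real systems use smarter, overlapping chunking)."""
    return [s.strip() + "." for s in text.split(".") if s.strip()]

def retrieve(query, chunks, k=2):
    """Score chunks by word overlap with the query (a stand-in for
    embedding similarity) and return the top-k."""
    q_words = set(query.lower().split())
    scored = sorted(chunks,
                    key=lambda c: len(q_words & set(c.lower().split())),
                    reverse=True)
    return scored[:k]

doc = ("Agents use external memory. Vector databases store embeddings. "
       "Retrieval injects the top matching chunks into the model context.")
chunks = chunk(doc)
context = retrieve("how does retrieval work with the context", chunks)
prompt = "Answer using this context:\n" + "\n".join(context)
print(prompt)
```

Every advanced variant below modifies one stage of this pipeline: what gets embedded, when retrieval fires, or how the retrieved set is judged.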

Advanced RAG (what researchers work on):

  • HyDE (Hypothetical Document Embeddings) β€” generate a fake answer, embed it, use it to retrieve real docs

  • FLARE β€” generate incrementally, retrieve only when confidence is low

  • Self-RAG β€” model decides when to retrieve; produces reflection tokens

  • Corrective RAG (CRAG) β€” retrieval quality assessment; fallback to web search if docs are irrelevant

  • GraphRAG (Microsoft) β€” builds a knowledge graph from documents; queries the graph, not just vector similarity

  • Agentic RAG β€” the agent actively formulates queries, checks sufficiency, re-retrieves if needed
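HyDE is the simplest of these to sketch: embed a *hypothetical answer* rather than the question, because the answer lives closer to the target documents in embedding space. Every component below (`generate`, `embed`, `ToyIndex`) is a stand-in:

```python
def overlap(a, b):
    """Stand-in similarity: shared-word count instead of vector cosine."""
    return len(a & b)

class ToyIndex:
    def __init__(self, docs, embed):
        self.docs = [(embed(d), d) for d in docs]
    def search(self, qvec, k):
        return [d for _, d in sorted(self.docs,
                                     key=lambda it: -overlap(qvec, it[0]))[:k]]

embed = lambda text: set(text.lower().split())
# Stand-in LLM: returns a canned hypothetical answer
generate = lambda prompt: "flow matching trains by regressing a velocity field"

def hyde_retrieve(question, generate, embed, index, k=1):
    """HyDE: generate a fake answer, embed IT, and search with that embedding."""
    hypothetical = generate(f"Write a plausible answer to: {question}")
    return index.search(embed(hypothetical), k)

docs = ["flow matching regresses a velocity field between distributions",
        "jenil likes coffee"]
index = ToyIndex(docs, embed)
print(hyde_retrieve("how is flow matching trained?", generate, embed, index))
```

Note the failure mode this inherits: if the generated hypothetical is wrong in a way that shifts its embedding, retrieval is steered toward the wrong documents β€” which is exactly the gap Self-RAG and CRAG try to close.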

Evaluation of RAG:

  • RAGAS framework β€” faithfulness, answer relevance, context precision, context recall

βš™οΈ LAYER 5 β€” Orchestration Frameworks (Know Deeply, Not Just Superficially)

| Framework | What It's For |
|---|---|
| LangChain | General LLM pipeline construction; chains, memory, tools |
| LangGraph | Stateful, cyclical agent workflows as directed graphs |
| LlamaIndex | Data ingestion, RAG pipelines, knowledge agents |
| CrewAI | Role-based multi-agent teams with defined tasks |
| AutoGen (Microsoft) | Conversational multi-agent systems |
| DSPy | Compiling prompts automatically instead of writing them manually; a major research direction |
| Haystack | Production-grade RAG and agent pipelines |

For a researcher, DSPy deserves special attention. It treats prompt engineering as an optimization problem and automatically tunes prompts and few-shot examples against a metric. It points toward where the field is heading: prompts as optimizable programs rather than hand-written strings.
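The core idea can be shown in a toy form β€” this is an illustration of "prompt engineering as optimization", NOT the real DSPy API (which compiles whole programs against a metric and a dev set):

```python
def compile_prompt(candidates, score_fn):
    """Toy prompt compiler: treat the prompt as a search space and keep
    whichever candidate scores best on a (stand-in) dev-set metric."""
    return max(candidates, key=score_fn)

candidates = [
    "Answer the question.",
    "Answer the question. Think step by step.",
    "Answer in one word.",
]
# Stand-in scorer: pretend dev-set accuracy correlates with instruction length
score = lambda p: len(p.split())

print(compile_prompt(candidates, score))
```

Real systems search a far richer space (instructions, few-shot demonstrations, module composition) and score candidates by actually running the pipeline on held-out examples, but the contract is the same: prompts in, metric-maximizing prompts out.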


πŸ”’ LAYER 6 β€” Safety, Alignment & Trust

For a researcher, this is not optional:

  • Prompt injection β€” adversarial inputs that hijack agent behavior; major attack surface

  • Goal hijacking β€” agent is manipulated into pursuing a different goal

  • Reward hacking β€” agent finds shortcuts to maximize reward without achieving the real goal

  • Corrigibility β€” designing agents that accept human correction without resistance

  • Constitutional AI and RLAIF β€” scalable oversight mechanisms

  • Minimal footprint principle β€” agents should request only necessary permissions, prefer reversible actions

  • Human-in-the-loop design patterns β€” when should an agent pause and ask? This is a UX + systems problem.

  • Agent red-teaming β€” adversarially probing agent systems for failures
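The minimal-footprint and human-in-the-loop bullets combine naturally into a permission gate: reversible actions run freely, irreversible ones block on explicit approval. Action names and the `approve` callback are hypothetical:

```python
REVERSIBLE = {"read_file", "search_web"}   # actions the agent may take freely

def gated_execute(action, tool, args, approve):
    """Minimal-footprint gate: irreversible actions require human approval.

    `approve` is the human-in-the-loop callback; returning False blocks
    the action instead of executing it.
    """
    if action not in REVERSIBLE and not approve(action, args):
        return {"status": "blocked", "action": action}
    return {"status": "ok", "result": tool(**args)}

delete = lambda path: f"deleted {path}"
always_deny = lambda action, args: False   # stand-in for a human who says no

print(gated_execute("delete_file", delete, {"path": "/tmp/x"}, always_deny))
```

The research question hides in the classification itself: deciding which actions are "reversible" is easy for file reads and hard for anything that touches the outside world.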


πŸ“ LAYER 7 β€” Evaluation (A Research Specialty Unto Itself)

How do you know if your agent is actually good?

Benchmarks to know:

  • GAIA β€” general AI assistants benchmark; real-world tasks requiring tools and multi-step reasoning

  • SWE-bench β€” agents solving real GitHub issues; the gold standard for coding agents

  • AgentBench β€” multi-environment agent evaluation

  • WebArena / WorkArena β€” web browsing agent benchmarks

  • HotpotQA, MultiHop RAG β€” multi-hop reasoning benchmarks

Evaluation methodologies:

  • LLM-as-Judge β€” use a strong LLM to evaluate outputs; fast but has biases

  • Process reward models β€” evaluate reasoning steps, not just final answers

  • Human-in-the-loop evaluation β€” gold standard but expensive

  • Trajectory evaluation β€” did the agent take reasonable steps, not just reach the right answer?
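LLM-as-Judge in miniature, including the one mitigation every serious setup needs β€” shuffling answer order, since judges show position bias. The judge model here is a stand-in that simply prefers the longer answer:

```python
import random

def llm_as_judge(question, answer_a, answer_b, judge_model):
    """Ask a (stand-in) strong model which of two answers is better.

    Answer order is shuffled before judging because real LLM judges
    systematically favor whichever answer they see first.
    """
    pair = [("A", answer_a), ("B", answer_b)]
    random.shuffle(pair)
    winner = judge_model(question, pair[0][1], pair[1][1])  # "first" or "second"
    return pair[0][0] if winner == "first" else pair[1][0]

# Stand-in judge: prefers the longer answer (a known judge bias, used
# here only to make the sketch deterministic)
judge_model = lambda q, first, second: "first" if len(first) > len(second) else "second"

print(llm_as_judge(
    "What is RAG?",
    "Retrieval-Augmented Generation: retrieve documents, then generate.",
    "A model.",
    judge_model))
```

The stub also demonstrates why judge evaluation needs auditing: length preference is a real, documented bias, and a judge with it will reward verbosity over correctness.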


πŸ”¬ LAYER 8 β€” Core Research Papers to Read (In Order)

Foundational:

  1. Attention Is All You Need β€” Vaswani et al., 2017

  2. Language Models are Few-Shot Learners (GPT-3) β€” Brown et al., 2020

  3. Chain-of-Thought Prompting Elicits Reasoning in LLMs β€” Wei et al., 2022

  4. Training Language Models to Follow Instructions with Human Feedback (InstructGPT) β€” Ouyang et al., 2022

Agentic AI Core:

  1. ReAct: Synergizing Reasoning and Acting in Language Models β€” Yao et al., 2022

  2. Toolformer: Language Models Can Teach Themselves to Use Tools β€” Schick et al., 2023

  3. HuggingGPT / JARVIS β€” Shen et al., 2023; orchestrating multiple models as tools

  4. Reflexion: Language Agents with Verbal Reinforcement Learning β€” Shinn et al., 2023

  5. Tree of Thoughts β€” Yao et al., 2023

  6. Generative Agents: Interactive Simulacra of Human Behavior β€” Park et al., 2023

Reasoning Models:

  1. Let's Verify Step by Step (Process Reward Models) β€” Lightman et al., OpenAI, 2023

  2. Self-Play Fine-Tuning (SPIN) β€” Chen et al., 2024

  3. DeepSeek-R1 β€” open-source reasoning model; read the technical report

RAG:

  1. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks β€” Lewis et al., 2020

  2. Self-RAG β€” Asai et al., 2023

  3. GraphRAG β€” Edge et al., Microsoft 2024

Safety:

  1. Constitutional AI β€” Bai et al., Anthropic, 2022

  2. Direct Preference Optimization β€” Rafailov et al., 2023


πŸ› οΈ LAYER 9 β€” Engineering Skills (For a Researcher, These Enable Your Experiments)

  • Python mastery β€” async/await matters a lot for agents running parallel tool calls

  • Docker + cloud deployment β€” your agents need to run somewhere persistent

  • Vector databases β€” understand HNSW indexing, approximate nearest neighbor search

  • Distributed systems basics β€” multi-agent systems are distributed systems

  • LLM APIs β€” OpenAI, Anthropic, Google; function calling, streaming, token management

  • Weights & Biases / MLflow β€” experiment tracking; non-negotiable for research

  • Hugging Face ecosystem β€” transformers, datasets, PEFT, TRL libraries
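The async/await bullet, demonstrated: independent tool calls should launch concurrently with `asyncio.gather` rather than awaiting one by one. The tools here are stand-in coroutines with fake delays:

```python
import asyncio

async def call_tool(name, delay):
    """Stand-in tool call; a real one would hit an API or run code."""
    await asyncio.sleep(delay)
    return f"{name}: done"

async def run_parallel():
    # Launch independent tool calls concurrently; total wall time is
    # roughly the slowest call, not the sum of all three.
    return await asyncio.gather(
        call_tool("web_search", 0.01),
        call_tool("code_exec", 0.02),
        call_tool("db_query", 0.01),
    )

print(asyncio.run(run_parallel()))
```

`asyncio.gather` preserves argument order in its results, which matters when you need to match each result back to the tool call that produced it.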


πŸ”­ LAYER 10 β€” Active Research Frontiers (Where Papers Are Being Written Now)

These are the open questions. Choose one as your niche:

  1. Long-horizon planning β€” agents fail at tasks requiring 50+ steps. Why? How to fix?

  2. Memory consolidation β€” how should agents decide what to remember and forget?

  3. Efficient tool use β€” reducing the number of tool calls needed to complete a task

  4. Agent communication protocols β€” formalizing how agents talk to each other

  5. Trustworthy delegation β€” how does a human safely give an agent significant autonomy?

  6. Emergent multi-agent behavior β€” cooperation, deception, coalition formation

  7. Test-time compute scaling laws β€” what's the relationship between thinking time and performance?

  8. Embodied agents β€” connecting language agents to physical robots (the bridge between Agentic AI and robotics)

  9. Neurosymbolic agents β€” combining LLM flexibility with formal logic guarantees

  10. Self-modifying agents β€” agents that improve their own prompts/tools over time


πŸ“ Suggested Study Sequence


Month 1-2:   Fill math gaps (probability, graph theory, information theory)

Month 2-3:   Deep GenAI (diffusion, alignment, RLHF, DPO, interpretability)

Month 3-4:   Core agent papers (ReAct β†’ Reflexion β†’ ToT β†’ Generative Agents)

Month 4-5:   RAG (Basic β†’ Advanced β†’ GraphRAG β†’ Agentic RAG)

Month 5-6:   Build: one end-to-end agentic system in LangGraph + CrewAI

Month 6-7:   Multi-agent systems + safety research

Month 7-8:   Deep dive on one research frontier; start reading Arxiv daily

Month 8+:    Start contributing β€” replicate a paper, extend it, publish


The single most important mindset shift: stop consuming, start building experiments. Every paper you read should have an associated experiment you run. That's what separates a researcher from a student.
