<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Micol Altomare</title>
    <description>The latest articles on DEV Community by Micol Altomare (@micolaltomare).</description>
    <link>https://dev.to/micolaltomare</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2594178%2F474d0190-c508-4f2d-bc01-f52c1c8a657b.jpg</url>
      <title>DEV Community: Micol Altomare</title>
      <link>https://dev.to/micolaltomare</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/micolaltomare"/>
    <language>en</language>
    <item>
      <title>Learning from the Future: 7 Insights from UC Berkeley's LLM Agents Course</title>
      <dc:creator>Micol Altomare</dc:creator>
      <pubDate>Sun, 01 Jun 2025 06:22:05 +0000</pubDate>
      <link>https://dev.to/micolaltomare/learning-from-the-future-7-insights-from-uc-berkeleys-llm-agents-course-2nbd</link>
      <guid>https://dev.to/micolaltomare/learning-from-the-future-7-insights-from-uc-berkeleys-llm-agents-course-2nbd</guid>
      <description>&lt;p&gt;I’m a firm believer in lifelong learning, especially in a field as dynamic as Artificial Intelligence. So, when I heard about the Advanced LLM Agents MOOC (Massive Open Online Course) offered by UC Berkeley, featuring lectures from leading researchers, I knew I had to dive in. The promise of learning from the architects of tomorrow's AI, like Prof. Dawn Song and a host of industry pioneers, was too good to pass up. You can explore the course yourself – the content is publicly available: &lt;a href="https://llmagents-learning.org/sp25" rel="noopener noreferrer"&gt;https://llmagents-learning.org/sp25&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One lecture that particularly resonated with me was "Learning to Self-Improve &amp;amp; Reason with LLMs" by Jason Weston of Meta &amp;amp; NYU. His presentation was a deep dive into the cutting edge of making LLMs smarter and more capable. Here are my top 7 takeaways from his insightful session:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;The Grand Goal: Self-Training AI: Weston kicked off by outlining a tantalizing research ambition: an AI that "trains" itself as much as possible. This involves the AI creating new tasks, evaluating its own performance ("self-rewarding"), and updating itself. The big question? Can this iterative self-improvement lead to superhuman capabilities? This set an inspiring tone for the entire lecture.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;System 1 vs. System 2 Thinking for LLMs: A core concept discussed was the distinction between two types of reasoning. System 1 is reactive and associative, which is how current LLMs often operate, leading to issues like hallucinations and spurious correlations. System 2 is more deliberate, involving multiple "calls" to System 1, planning, and verification. The lecture emphasized that improving reasoning in LLM agents means developing their System 2 capabilities.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Shoulders of Giants – Building on Past Breakthroughs: It was fascinating to see the "pre-history" of modern LLMs. Weston highlighted work like the 2008 paper "A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning" by Collobert and Weston. This paper, which initially wasn't mainstream, eventually won a "Test of Time Award", underscoring that today's rapid advancements are built upon years of foundational research.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Beyond Scaling – The Need for Smarter, Not Just Bigger: The "scaling hypothesis" – the idea that bigger models and datasets automatically lead to better performance – was discussed, referencing Ilya Sutskever's influential insight. However, Weston pointed out that language modeling alone (and by extension, scaling alone) isn't enough to create truly intelligent agents; his answer to "Is just language modeling enough?" was a clear "no". This realization motivates the push towards self-improvement and enhanced reasoning.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Self-Rewarding LLMs in Action: A significant portion of the talk focused on Self-Rewarding LLMs, where models learn to assign rewards to their own outputs and optimize themselves. The iterative two-step process is key: (1) Self-instruction creation (generating prompts, responses, and self-rewards with the LM) and (2) Instruction-following training (using techniques like Direct Preference Optimization (DPO) on the selected preference pairs). This loop allows models to get better at both following instructions and evaluating their responses over iterations.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Making Reasoning Verifiable: Iterative Reasoning Preference Optimization: Weston detailed approaches to improve reasoning on more complex tasks, such as Iterative Reasoning Preference Optimization. This method involves the LLM generating multiple chains of thought (CoTs) and answers for a given problem. Preference pairs are then built based on whether the answer is correct vs. not, and the model is trained using these preferences (DPO + NLL term for correct answers). This shows a concrete path to refining the reasoning process itself through self-generated data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The Next Frontier: Meta-Rewarding LLMs: Taking self-improvement a step further, the lecture touched upon Meta-Rewarding LLMs. Here, the LLM acts as an actor, a judge (evaluating responses), and even a "meta-judge" that evaluates the quality of its own judgments. This "meta-judge" step adds another layer of training signal, aiming to improve judgment capabilities which can saturate in simpler self-rewarding schemes. The meme-worthy image of "WE NEED TO GO DEEPER" perfectly captured this recursive improvement idea.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
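
&lt;p&gt;To make the two-step loop from takeaways 5 and 6 concrete, here is a minimal, runnable sketch of one self-rewarding iteration. The &lt;code&gt;generate&lt;/code&gt; and &lt;code&gt;judge&lt;/code&gt; functions are hypothetical stand-ins for real LLM calls, and the output is just the preference-pair dataset a DPO trainer would consume; this illustrates the idea, not Weston's actual implementation.&lt;/p&gt;

```python
import random

# Toy sketch of one self-rewarding iteration. generate() and judge()
# are placeholders for real LLM calls; the "update" step is reduced to
# collecting chosen/rejected preference pairs for a DPO-style trainer.

def generate(model, prompt):
    # Stand-in for sampling a candidate response from the LLM.
    return f"{model['name']} answer #{random.randint(0, 9)} to {prompt!r}"

def judge(model, prompt, response):
    # Stand-in for LLM-as-a-judge scoring its own output on a 0-5 scale.
    return random.uniform(0, 5)

def self_rewarding_iteration(model, prompts, samples_per_prompt=4):
    """Step 1: self-instruction creation (responses + self-rewards).
    Step 2 (prepared here): preference pairs for instruction-following
    training with DPO."""
    preference_pairs = []
    for prompt in prompts:
        candidates = [generate(model, prompt) for _ in range(samples_per_prompt)]
        scored = sorted(candidates, key=lambda r: judge(model, prompt, r))
        # Lowest self-reward becomes "rejected", highest becomes "chosen".
        preference_pairs.append(
            {"prompt": prompt, "chosen": scored[-1], "rejected": scored[0]}
        )
    return preference_pairs  # would be fed to a DPO trainer, then repeat

pairs = self_rewarding_iteration({"name": "M0"}, ["What is DPO?"])
print(pairs[0]["prompt"], len(pairs))
```

&lt;p&gt;Iterating this loop – train on the pairs, then regenerate and rejudge with the improved model – is what lets both the responses and the judgments get better over time.&lt;/p&gt;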

&lt;p&gt;Jason Weston's lecture was a truly eye-opening session, offering a deep dive into the quest to build LLMs that can genuinely reason and essentially lift themselves by their own bootstraps. But his insights into self-improvement, like the fascinating concepts of Self-Rewarding and Meta-Rewarding LLMs, are just one facet of the incredible landscape this UC Berkeley LLM Agents course explores. The syllabus is a veritable feast for AI enthusiasts, covering a wide array of critical areas. You'll find sessions on inference-time techniques for LLM reasoning, how agents manage search and planning, and the intricacies of agentic workflow, tool use, and function calling. The course also delves into specialized applications like LLMs for code generation and verification, their growing role in mathematics through data curation and theorem proving, and the exciting developments in multimodal autonomous AI agents. And, crucially, it addresses the paramount challenge of building safe and secure agentic AI, a topic Prof. Dawn Song herself champions. &lt;/p&gt;

</description>
    </item>
    <item>
      <title>Learning Agentic AI with UC Berkeley’s LLM Agents Course</title>
      <dc:creator>Micol Altomare</dc:creator>
      <pubDate>Fri, 20 Dec 2024 07:12:37 +0000</pubDate>
      <link>https://dev.to/micolaltomare/learning-agentic-ai-with-uc-berkeleys-llm-agents-course-4g25</link>
      <guid>https://dev.to/micolaltomare/learning-agentic-ai-with-uc-berkeleys-llm-agents-course-4g25</guid>
<description>&lt;p&gt;As someone working in product in the fintech industry and on machine learning at the University of Toronto, I’ve spent a lot of time thinking about the intersection of AI research and its real-world applications. So, when I signed up for &lt;a href="https://llmagents-learning.org/f24" rel="noopener noreferrer"&gt;UC Berkeley’s LLM Agents MOOC&lt;/a&gt;, I was excited to dig into how large language models (LLMs) are evolving into agents that can reason, act, and collaborate in increasingly complex ways.&lt;/p&gt;

&lt;p&gt;This course turned out to be one of the most engaging learning experiences I’ve had recently. The lectures were packed with insights into the theory and practice of LLMs, and the course Discord server gave the whole thing a collaborative, human touch. Talking through lecture concepts, asking questions, and seeing other students’ perspectives made the learning experience way more dynamic and social than I expected for an online course. &lt;/p&gt;

&lt;p&gt;Shunyu Yao’s lecture on the ReAct framework was definitely a favourite. It broke down how agents can unify reasoning and acting across tasks like question answering, symbolic reasoning, and tool use. It struck me that while these systems are powerful, they’re still far from perfect and require a lot of manual design for domain-specific applications. As someone working on user-facing products, it made me think about how important it is to design systems that balance flexibility with reliability, especially for non-technical end users.&lt;/p&gt;
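
&lt;p&gt;For readers new to ReAct, the core loop is easy to sketch: the model alternates free-form "thoughts" with tool "actions" and feeds each tool's "observation" back into its context until it can answer. The &lt;code&gt;llm&lt;/code&gt; stub and &lt;code&gt;calculator&lt;/code&gt; tool below are toy placeholders of my own, not code from the lecture.&lt;/p&gt;

```python
# Toy ReAct-style loop: interleave Thought / Action / Observation until
# the model emits a final answer. llm() is a scripted stand-in so the
# sketch runs without a real model; the tool registry is hypothetical.

def llm(transcript):
    # Pretend LLM: once it has seen the calculator's result, it answers.
    if "Observation: 4" in transcript:
        return "Thought: I have the result.\nFinal Answer: 4"
    return "Thought: I should compute it.\nAction: calculator[2 + 2]"

# eval() is fine for this toy arithmetic tool; never do this in production.
TOOLS = {"calculator": lambda expr: str(eval(expr))}

def react(question, max_steps=5):
    transcript = f"Question: {question}"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += "\n" + step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[1].strip()
        # Parse "Action: tool[input]", run the tool, append the observation.
        action = step.split("Action:")[1].strip()
        tool, arg = action.split("[", 1)
        transcript += f"\nObservation: {TOOLS[tool](arg.rstrip(']'))}"
    return None

print(react("What is 2 + 2?"))  # prints 4
```

&lt;p&gt;Even this tiny version shows where the manual design effort goes: the prompt format, the action parser, and the tool registry all have to be built per domain, which is exactly the gap the lecture highlighted.&lt;/p&gt;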

&lt;p&gt;Another standout lecture came from Burak Gokturk, who discussed trends in enterprise AI. One of the big takeaways was how AI is shifting from dense, single-task models toward sparse, multi-modal ones that can handle everything from text to images to video. This resonates with what I see in the tech world—companies are racing to build generalist systems that can do it all, but the real challenge lies in making them scalable, safe, and cost-effective.&lt;/p&gt;

&lt;p&gt;The course didn’t just stick to theory—it dove into real challenges like debugging monolithic models, building modular AI systems, and even designing agents for software development. One thing that stuck with me from Lecture 5 on compound AI systems was how modularity can make these systems more transparent and controllable. That’s something I think we need more of in the real world, especially as these models become more integrated into workflows that affect actual people.&lt;/p&gt;

&lt;p&gt;But honestly, what made the course special wasn’t just the lectures—it was the format. The asynchronous structure meant I could fit it around my schedule, and the Discord server made it feel like I wasn’t learning in isolation. I appreciated how accessible everything was, from the well-organized slides to the recorded sessions. &lt;/p&gt;

&lt;p&gt;Looking back, the biggest thing I’ve taken away is the importance of bridging research and application. Whether you’re debugging a compound AI system or designing a user-friendly product, it’s all about balancing ambition with responsibility. For now, I’m excited to apply what I’ve learned—both in my studies and in building better tools for the future.&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
