<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Syed Mehrab</title>
    <description>The latest articles on DEV Community by Syed Mehrab (@syed_mehrab_08fb0419feedf).</description>
    <link>https://dev.to/syed_mehrab_08fb0419feedf</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3733218%2F29235aaf-3322-4c3d-b0c0-26018ca61736.jpg</url>
      <title>DEV Community: Syed Mehrab</title>
      <link>https://dev.to/syed_mehrab_08fb0419feedf</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/syed_mehrab_08fb0419feedf"/>
    <language>en</language>
    <item>
      <title>Stop Building Static Chatbots: A Beginner’s Guide to LangGraph with Persistence 🚀</title>
      <dc:creator>Syed Mehrab</dc:creator>
      <pubDate>Thu, 09 Apr 2026 08:07:46 +0000</pubDate>
      <link>https://dev.to/syed_mehrab_08fb0419feedf/stop-building-static-chatbots-a-beginners-guide-to-langgraph-with-persistence-34lo</link>
      <guid>https://dev.to/syed_mehrab_08fb0419feedf/stop-building-static-chatbots-a-beginners-guide-to-langgraph-with-persistence-34lo</guid>
      <description>&lt;p&gt;Building a chatbot that just responds to prompts is easy. Building an Agent that can think, use tools, and &lt;strong&gt;remember&lt;/strong&gt; conversations across restarts? That’s where it gets tricky.&lt;/p&gt;

&lt;p&gt;Enter &lt;strong&gt;LangGraph&lt;/strong&gt;. It’s the evolution of LangChain, designed to give you total control over the flow of your AI.&lt;/p&gt;

&lt;p&gt;In this post, I’ll break down the core concepts and show you how to implement a persistent agent using MongoDB as your "brain."&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Core Concepts (The "Mental Model")
&lt;/h2&gt;

&lt;p&gt;Before we code, you need to understand the four pillars of LangGraph:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;State&lt;/strong&gt;: The "&lt;em&gt;&lt;strong&gt;Source of Truth&lt;/strong&gt;&lt;/em&gt;." It's a shared dictionary or object that every part of your graph can see and update.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Nodes&lt;/strong&gt;: These are just regular Python functions. They take the State, do some work (like calling Claude or GPT), and return an update.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Edges&lt;/strong&gt;: These are the "&lt;strong&gt;&lt;em&gt;traffic lights&lt;/em&gt;&lt;/strong&gt;." They tell the graph where to go next. Conditional Edges are the most powerful: they decide whether the agent should call a tool or talk to the human.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Checkpointer&lt;/strong&gt;: The "Save Game" button. It stores your state in a database (like MongoDB) so your agent doesn't suffer from amnesia if the server restarts.&lt;/p&gt;
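&lt;p&gt;A conditional edge is just a function from the State to the name of the next node. Here is a plain-Python sketch of the idea (the message shape here is illustrative, not LangGraph's real message type):&lt;/p&gt;

```python
END = "__end__"   # LangGraph exposes an END sentinel; a plain string stands in here

def route_after_llm(state):
    """Conditional edge: inspect the State, return the next node's name."""
    last = state["messages"][-1]
    if last.get("tool_calls"):   # the LLM asked to use a tool
        return "tools"
    return END                   # nothing left to do, stop the graph

state = {"messages": [{"role": "assistant", "tool_calls": [{"name": "search"}]}]}
print(route_after_llm(state))  # -> tools
```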

&lt;h2&gt;
  
  
  2. Setting Up Your State
&lt;/h2&gt;

&lt;p&gt;In LangGraph, you define what your agent needs to remember. Usually, this is a list of messages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TypedDict&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph.message&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;add_messages&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TypedDict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# 'add_messages' is a Reducer. 
&lt;/span&gt;    &lt;span class="c1"&gt;# It tells LangGraph: "Don't overwrite the list, just append new messages!"
&lt;/span&gt;    &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Annotated&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
    &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
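&lt;p&gt;To see why the reducer matters, here is a simplified stand-in for what add_messages does (the real reducer also matches messages by ID, which this sketch skips):&lt;/p&gt;

```python
def add_messages(existing, new):
    """Simplified reducer: append rather than overwrite.
    (LangGraph's real add_messages also deduplicates by message ID.)"""
    return list(existing) + list(new)

state = {"messages": []}
state["messages"] = add_messages(state["messages"], [("user", "Hi")])
state["messages"] = add_messages(state["messages"], [("assistant", "Hello!")])
print(state["messages"])  # both turns survive: appended, not replaced
```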



&lt;h2&gt;
  
  
  3. Creating Your First Node
&lt;/h2&gt;

&lt;p&gt;A node is where the logic happens. Here is a simple node that calls an LLM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain_anthropic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatAnthropic&lt;/span&gt;

&lt;span class="n"&gt;model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatAnthropic&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;claude-3-5-sonnet-20240620&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;chatbot_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# The node receives the state, calls the LLM
&lt;/span&gt;    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;invoke&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;state&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="c1"&gt;# It returns a dictionary updating the 'messages' list
&lt;/span&gt;    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Adding "Memory" with MongoDB Checkpointers
&lt;/h2&gt;

&lt;p&gt;If you want your agent to remember "Ali" from yesterday's conversation, you need a Checkpointer. Since many of us use MongoDB, the MongoDBSaver is a perfect choice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why use a Checkpointer?&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fault Tolerance:&lt;/strong&gt; If a node fails, the run resumes from the last save.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Multi-User:&lt;/strong&gt; Use a thread_id to keep 1,000 different user conversations separate.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Audit Trails:&lt;/strong&gt; You can literally see the "thinking" process saved in your collections.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
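&lt;p&gt;Conceptually, a checkpointer is just a map from thread_id to saved state snapshots. This toy in-memory version shows the idea that MongoDBSaver implements against a real database:&lt;/p&gt;

```python
class ToyCheckpointer:
    """In-memory stand-in for a real saver: one snapshot list per thread."""
    def __init__(self):
        self._store = {}   # thread_id -> list of state snapshots

    def put(self, thread_id, state):
        self._store.setdefault(thread_id, []).append(dict(state))

    def get(self, thread_id):
        snapshots = self._store.get(thread_id)
        return snapshots[-1] if snapshots else None

cp = ToyCheckpointer()
cp.put("user_42", {"messages": ["Hi, my name is Ali!"]})
cp.put("user_99", {"messages": ["Hello from another user"]})
print(cp.get("user_42"))  # each thread_id keeps its own history
```

&lt;p&gt;Swap the dictionary for a MongoDB collection and you have the essence of the MongoDBSaver used below.&lt;/p&gt;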
&lt;h2&gt;
  
  
  5. Putting it all together: The Implementation
&lt;/h2&gt;

&lt;p&gt;Here is how you compile your graph with persistence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.graph&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langgraph.checkpoint.mongodb&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MongoDBSaver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pymongo&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;MongoClient&lt;/span&gt;

&lt;span class="c1"&gt;# 1. Setup Database
&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MongoClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mongodb://localhost:27017&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;checkpointer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;MongoDBSaver&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;db_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;agent_memory&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 2. Build the Graph
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;StateGraph&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;AgentState&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_node&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chatbot_node&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 3. Define the Flow
&lt;/span&gt;&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;START&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_edge&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chatbot&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 4. Compile with Checkpointer!
&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;workflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;compile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;checkpointer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# 5. Run it with a Thread ID
&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;configurable&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;thread_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_42&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}}&lt;/span&gt;
&lt;span class="n"&gt;inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;messages&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Hi, my name is Ali!&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ali_01&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  6. Pro-Tip: Debugging in MongoDB Compass
&lt;/h2&gt;

&lt;p&gt;When you run this code, open MongoDB Compass. You will see a collection named checkpoints.&lt;/p&gt;

&lt;p&gt;Each document is a "Snapshot" of your agent's brain at a specific moment. If your agent starts hallucinating, you can look at the channel_values in the database to see exactly what "State" the agent was in when it made the mistake.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Summary&lt;/strong&gt;&lt;br&gt;
LangGraph turns AI development from "prompt engineering" into "system engineering."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Nodes do the work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Edges manage the flow.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Checkpointers give it a soul (memory).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmny7cilu28vznf4amae5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmny7cilu28vznf4amae5.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>langchain</category>
      <category>langgraph</category>
      <category>mongodb</category>
      <category>ai</category>
    </item>
    <item>
      <title>Building the "Boss" of AI: The Agentic Supervisor Architecture</title>
      <dc:creator>Syed Mehrab</dc:creator>
      <pubDate>Thu, 26 Mar 2026 08:53:39 +0000</pubDate>
      <link>https://dev.to/syed_mehrab_08fb0419feedf/building-the-boss-of-ai-the-agentic-supervisor-architecture-33b0</link>
      <guid>https://dev.to/syed_mehrab_08fb0419feedf/building-the-boss-of-ai-the-agentic-supervisor-architecture-33b0</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/..." class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/..." alt="Uploading image" width="800" height="400"&gt;&lt;/a&gt;In the world of AI Agents, we are moving past single-task bots toward multi-agent systems. But how do you prevent a team of specialized agents from descending into chaos? The answer is the Supervisor Architecture.&lt;/p&gt;

&lt;p&gt;Think of it as the "Project Manager" for your LLMs. Instead of every agent trying to talk to everyone else (which is computationally expensive and confusing), you introduce a central orchestrator.&lt;/p&gt;


&lt;h2&gt;
  
  
  How the Architecture Works
&lt;/h2&gt;

&lt;p&gt;The Supervisor pattern follows a Hub-and-Spoke model. Here is the typical lifecycle of a request:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Intake:&lt;/strong&gt; The user provides a complex goal (e.g., "Research this company and write a 5-page investment report").&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Planner (Supervisor):&lt;/strong&gt; An LLM with a specialized prompt acts as the Supervisor. It breaks the goal into sub-tasks.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Delegation:&lt;/strong&gt; The Supervisor looks at its "team" (e.g., a Researcher Agent, a Coder Agent, and a Writer Agent) and hands off the first task.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Review:&lt;/strong&gt; When an agent finishes, it sends the result back to the Supervisor, not to the next agent. The Supervisor decides if the work is good enough or needs a revision.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The Hand-off:&lt;/strong&gt; Once Task A is perfect, the Supervisor passes that context to the agent responsible for Task B.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
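&lt;p&gt;That lifecycle can be sketched in a few lines of plain Python. The worker names and the fixed plan are illustrative; in a real system an LLM would choose the next step and review each result before the hand-off:&lt;/p&gt;

```python
def supervisor(goal, workers, plan):
    """Hub-and-spoke loop: delegate each sub-task in order and collect
    every result back at the hub before the next hand-off."""
    context = {"goal": goal}
    for task in plan:
        result = workers[task](context)   # Delegation: worker sees the shared context
        context[task] = result            # Review/Hand-off: result returns to the hub
    return context

# Illustrative stand-ins for real LLM agents:
workers = {
    "research": lambda ctx: "notes on " + ctx["goal"],
    "write": lambda ctx: "report using " + ctx["research"],
}
report = supervisor("Acme Corp", workers, plan=["research", "write"])
print(report["write"])  # -> report using notes on Acme Corp
```

&lt;p&gt;Notice that the Writer never talks to the Researcher directly: everything flows through the hub.&lt;/p&gt;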

&lt;h2&gt;
  
  
  Why Use a Supervisor?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;State Management:&lt;/strong&gt; The Supervisor keeps the "Source of Truth." Individual agents don't need to remember the entire conversation; they only need to know their current task.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Error Correction:&lt;/strong&gt; If a specialized agent hallucinates, the Supervisor (using a different model or prompt) can catch the error before the final output.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Scalability:&lt;/strong&gt; You can easily add a "Legal Agent" or an "SEO Agent" to the spoke without rewriting the logic for the other agents.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Top Tools for Building Supervisor Architectures
&lt;/h2&gt;

&lt;p&gt;If you are looking to build this today, these frameworks have built-in support for supervisor patterns:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;LangGraph&lt;/strong&gt; (by LangChain): This is the current gold standard. It allows you to create "cycles" and state machines where a supervisor node manages the flow between other nodes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;CrewAI:&lt;/strong&gt; Uses a "Manager" role that can be assigned to an LLM to automatically coordinate "Tasks" among a "Crew."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;AutoGen&lt;/strong&gt; (by Microsoft): Uses a GroupChatManager that acts as the moderator for a conversation between multiple agents.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  The Future: Hierarchical Supervision
&lt;/h2&gt;

&lt;p&gt;In very complex systems, we are now seeing Hierarchical Supervision. A "Lead Supervisor" manages several "Sub-Supervisors," who each manage their own team of functional agents. This mimics a real-world corporate structure and allows AI to handle massive, multi-week projects.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Takeaway:&lt;/strong&gt; Don't just build agents; build teams. And every team needs a leader.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Stop Leaking Your Database Logic: The Repository Pattern 🏗️</title>
      <dc:creator>Syed Mehrab</dc:creator>
      <pubDate>Sat, 14 Mar 2026 12:58:12 +0000</pubDate>
      <link>https://dev.to/syed_mehrab_08fb0419feedf/stop-leaking-your-database-logic-the-repository-pattern-37c9</link>
      <guid>https://dev.to/syed_mehrab_08fb0419feedf/stop-leaking-your-database-logic-the-repository-pattern-37c9</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshze4mva8nch8w8acdzq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fshze4mva8nch8w8acdzq.png" alt=" "&gt;&lt;/a&gt;&lt;br&gt;
Ever found yourself writing the same SQL query or ORM call in three different controllers? Or worse—trying to unit test business logic but getting stuck because it’s hard-coded to a live database?&lt;/p&gt;

&lt;p&gt;That’s where the Repository Pattern saves the day. It acts as a mediator between your domain/business logic and the data mapping layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Architecture at a Glance&lt;/strong&gt;&lt;br&gt;
Instead of your service talking directly to the database, it talks to an Interface.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Client/Service:&lt;/strong&gt; "I need user #42."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Repository:&lt;/strong&gt; "I'll go get that for you (I don't care if it's from SQL, NoSQL, or a Cache)."&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Data Source:&lt;/strong&gt; Returns the raw data.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💻 &lt;strong&gt;The "Before vs. After"&lt;/strong&gt;&lt;br&gt;
The "Spaghetti" Way (Logic + DB Mixed):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Controller
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user_dashboard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;query&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SELECT * FROM users WHERE id = ?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Leaking DB logic
&lt;/span&gt;    &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;stats_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;render_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;The Repository Way (Clean &amp;amp; Decoupled):&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Service
&lt;/span&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_user_dashboard&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;user&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_repo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_by_id&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# Abstracted
&lt;/span&gt;    &lt;span class="n"&gt;stats&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;user_repo&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_activity_stats&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_id&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;render_template&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;stats&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Why bother?
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Testability:&lt;/strong&gt; You can easily "mock" the repository during unit tests without needing a real database.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Don't Repeat Yourself (DRY):&lt;/strong&gt; Centralize your data access logic. If a query needs to change, you change it in one place.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Flexibility:&lt;/strong&gt; Want to switch from Postgres to MongoDB? You only update the Repository implementation; your business logic stays exactly the same.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Separation of Concerns:&lt;/strong&gt; Your API/Controller stays "thin" and focused only on handling requests.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
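&lt;p&gt;Testability is the easiest win to demonstrate. The names below are illustrative, but the shape is the point: the service depends on an interface, so a unit test can hand it an in-memory fake instead of a live database:&lt;/p&gt;

```python
from abc import ABC, abstractmethod

class UserRepository(ABC):
    """The interface the business logic depends on."""
    @abstractmethod
    def get_by_id(self, user_id):
        ...

class InMemoryUserRepository(UserRepository):
    """A fake for unit tests: no database required."""
    def __init__(self, users):
        self._users = users

    def get_by_id(self, user_id):
        return self._users.get(user_id)

repo = InMemoryUserRepository({42: {"id": 42, "name": "Ali"}})
print(repo.get_by_id(42)["name"])  # -> Ali
```

&lt;p&gt;In production you would register a PostgresUserRepository (or Mongo equivalent) behind the same interface, and the service code never changes.&lt;/p&gt;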

&lt;p&gt;&lt;strong&gt;The "Golden Rule"&lt;/strong&gt;&lt;br&gt;
The Repository Pattern is powerful, but don't over-engineer! If you are building a very simple CRUD app with only 2-3 tables, adding a repository layer might just be unnecessary boilerplate.&lt;/p&gt;

&lt;p&gt;Use it when your business logic starts getting complex or when you plan to support multiple data sources.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl6bp2hufcy04vobgj1vr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl6bp2hufcy04vobgj1vr.png" alt=" "&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>architecture</category>
      <category>backend</category>
      <category>database</category>
      <category>softwareengineering</category>
    </item>
    <item>
      <title>Stop Staring at JSON: A Developer's Guide to MongoDB Compass 🧭</title>
      <dc:creator>Syed Mehrab</dc:creator>
      <pubDate>Tue, 10 Mar 2026 07:26:48 +0000</pubDate>
      <link>https://dev.to/syed_mehrab_08fb0419feedf/stop-staring-at-json-a-developers-guide-to-mongodb-compass-4885</link>
      <guid>https://dev.to/syed_mehrab_08fb0419feedf/stop-staring-at-json-a-developers-guide-to-mongodb-compass-4885</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0pjja41i0tjji5kdx1r0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0pjja41i0tjji5kdx1r0.png" alt=" " width="800" height="446"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you are building an Auth Service or a CRUD app, looking at raw BSON in a terminal is a recipe for a headache. &lt;strong&gt;MongoDB Compass&lt;/strong&gt; is the official GUI that lets you visualize your data, analyze your schemas, and manage your indexes without writing a single line of MQL (MongoDB Query Language).&lt;/p&gt;

&lt;p&gt;Here is how to get it running on your machine and the essential "day one" commands you need.&lt;/p&gt;

&lt;h2&gt;
  
  
  🚀 Installation Guide
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;🐧 Ubuntu (24.04+)&lt;/strong&gt;&lt;br&gt;
For Linux users, the .deb package is the most stable way to ensure all dependencies are met.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Terminal Way:&lt;/strong&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# 1. Download the latest package&lt;/span&gt;
wget https://downloads.mongodb.com/compass/mongodb-compass_1.45.0_amd64.deb

&lt;span class="c"&gt;# 2. Install it&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt &lt;span class="nb"&gt;install&lt;/span&gt; ./mongodb-compass_1.45.0_amd64.deb &lt;span class="nt"&gt;-y&lt;/span&gt;

&lt;span class="c"&gt;# 3. Launch&lt;/span&gt;
mongodb-compass
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;🪟 Windows&lt;/strong&gt;&lt;br&gt;
Download the .exe or .msi installer from the Official Download Page.&lt;/p&gt;

&lt;p&gt;Run the installer and follow the wizard.&lt;/p&gt;

&lt;p&gt;Once installed, it will be available in your Start Menu.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🍎 macOS&lt;/strong&gt;&lt;br&gt;
Download the .dmg file.&lt;/p&gt;

&lt;p&gt;Open the .dmg and drag the MongoDB Compass icon into your Applications folder.&lt;/p&gt;

&lt;p&gt;If you get a "Developer cannot be verified" warning, go to System Settings &amp;gt; Privacy &amp;amp; Security and click "Open Anyway."&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;🔌 Connecting for the First Time&lt;/strong&gt;&lt;br&gt;
Most local development setups use the default port. Paste this into the connection string box:&lt;/p&gt;

&lt;p&gt;Standard Local URI:&lt;br&gt;
mongodb://localhost:27017&lt;/p&gt;

&lt;p&gt;If you are using Docker:&lt;br&gt;
mongodb://admin:password@localhost:27017&lt;/p&gt;
&lt;h2&gt;
  
  
  🛠 Top 3 Features for Every Developer
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. The "Filter" Bar (Finding your Data)&lt;/strong&gt;&lt;br&gt;
Instead of writing db.users.find({"email": "test@example.com"}), just type this into the Filter field:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nl"&gt;"email"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"test@example.com"&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hit Find, and Compass will instantly isolate that document.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Schema Analysis&lt;/strong&gt;&lt;br&gt;
Ever wonder why your Python UserInDB model is crashing? Use the Schema tab. Compass will scan your collection and show you if some documents are missing fields (like phone) or have the wrong data type (like a string where an int should be).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Visual Explain Plan&lt;/strong&gt;&lt;br&gt;
Is your login query slow? Click the Explain Plan tab. It shows you exactly how MongoDB is searching. If you see "COLLSCAN" (Collection Scan), it’s time to add an index!&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pro Tip for Auth Devs&lt;/strong&gt;&lt;br&gt;
When testing JWT Refresh Tokens, keep Compass open on your users collection. Watch the token_creation_at and last_login fields update in real-time as you hit your FastAPI endpoints. It’s the fastest way to debug your Repository logic!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>16-bit AI Quality at 11-bit Size? How DFloat11 achieves Lossless LLM Compression</title>
      <dc:creator>Syed Mehrab</dc:creator>
      <pubDate>Fri, 06 Mar 2026 21:03:33 +0000</pubDate>
      <link>https://dev.to/syed_mehrab_08fb0419feedf/16-bit-ai-quality-at-11-bit-size-how-dfloat11-achieves-lossless-llm-compression-3ahj</link>
      <guid>https://dev.to/syed_mehrab_08fb0419feedf/16-bit-ai-quality-at-11-bit-size-how-dfloat11-achieves-lossless-llm-compression-3ahj</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlih82pnrhn87rd0l8z3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjlih82pnrhn87rd0l8z3.png" alt=" " width="800" height="372"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The AI world has a massive "obesity" problem. Models like Llama 3.1 405B are brilliant, but they are also digital giants. To run them, you usually have two choices:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Buy more GPUs: (Extremely expensive)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Quantize the model: (Shrink it to 4-bit or 8-bit, but lose accuracy/logic)&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;But what if I told you there is a third way? A way to shrink a model by &lt;strong&gt;30%&lt;/strong&gt; without losing a &lt;strong&gt;single bit&lt;/strong&gt; of information?&lt;/p&gt;

&lt;p&gt;Enter &lt;strong&gt;DFloat11&lt;/strong&gt; (Dynamic-Length Float), a new lossless compression framework that is changing the game for LLM inference.&lt;/p&gt;

&lt;p&gt;🧠 &lt;strong&gt;The Core Insight: BFloat16 is Inefficient&lt;/strong&gt;&lt;br&gt;
Most modern LLMs are stored in BFloat16 format. Each number uses 16 bits: 1 for sign, 8 for exponent, and 7 for mantissa.&lt;/p&gt;

&lt;p&gt;Researchers found something shocking: while the sign and mantissa are fully utilized, the &lt;strong&gt;exponent bits&lt;/strong&gt; are mostly "empty air." Out of 256 possible exponent values, only about 40 actually show up in real models. This is a massive waste of memory.&lt;/p&gt;
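&lt;p&gt;You can inspect the layout yourself by truncating a float32 to its top 16 bits, which is exactly how BFloat16 is derived (a minimal sketch using only the standard library):&lt;/p&gt;

```python
import struct

def bfloat16_fields(x: float):
    # Reinterpret the float32 bit pattern and keep the top 16 bits (BFloat16)
    bits32 = struct.unpack(">I", struct.pack(">f", x))[0]
    bf16 = bits32 >> 16
    sign = bf16 >> 15
    exponent = (bf16 >> 7) & 0xFF   # 8 exponent bits (bias 127)
    mantissa = bf16 & 0x7F          # 7 mantissa bits
    return sign, exponent, mantissa

# Trained weights cluster near zero, so their exponents land in a
# narrow band below the bias of 127 -- the redundancy DFloat11 exploits
for w in (0.0123, -0.87, 0.0004):
    print(w, bfloat16_fields(w))
```

&lt;p&gt;Run it on a handful of real weight values and you will see the exponents bunch together, which is why only a few dozen of the 256 possible values ever appear.&lt;/p&gt;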

&lt;p&gt;🛠️ &lt;strong&gt;How DFloat11 Works&lt;/strong&gt;&lt;br&gt;
Instead of cutting off bits (like quantization), DFloat11 uses Entropy Coding (Huffman Coding):&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Common Exponents get very short codes (2-3 bits).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Rare Exponents get longer codes.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Sign &amp;amp; Mantissa stay exactly the same.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The result? The model's "weight" drops from 16 bits to roughly &lt;strong&gt;10.8 - 11.1 bits&lt;/strong&gt;. It’s like a ZIP file for your LLM, but one that stays "zipped" even while the model is running!&lt;/p&gt;
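&lt;p&gt;The entropy-coding idea can be sketched with a standard Huffman construction over a made-up exponent histogram (the frequencies below are illustrative, not measured from a real model):&lt;/p&gt;

```python
import heapq

# Toy stand-in for a real exponent histogram: a few common exponent
# values dominate, the rest are rare.
freqs = {120: 40, 121: 25, 122: 15, 119: 10, 123: 5, 118: 3, 110: 1, 130: 1}

def huffman_code_lengths(freqs):
    # Heap entries are (weight, tiebreak_id, member_symbols); every merge
    # adds one bit to the code length of each member symbol.
    heap = [(w, i, (sym,)) for i, (sym, w) in enumerate(freqs.items())]
    heapq.heapify(heap)
    depth = {sym: 0 for sym in freqs}
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, t1 = heapq.heappop(heap)
        w2, _, t2 = heapq.heappop(heap)
        for sym in t1 + t2:
            depth[sym] += 1
        heapq.heappush(heap, (w1 + w2, next_id, t1 + t2))
        next_id += 1
    return depth

lengths = huffman_code_lengths(freqs)
total = sum(freqs.values())
avg_exp_bits = sum(freqs[s] * lengths[s] for s in freqs) / total
# 1 sign bit + variable-length exponent + 7 mantissa bits: roughly 10-11,
# not 16, with no information discarded
print(f"avg bits per weight: {1 + avg_exp_bits + 7:.2f}")
```

&lt;p&gt;The most common exponent gets the shortest code, so the weighted average lands well under 16 bits even though every value remains exactly recoverable.&lt;/p&gt;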

&lt;p&gt;🚀 &lt;strong&gt;The "Magic" of Lossless&lt;/strong&gt;&lt;br&gt;
The biggest headache with 4-bit or 8-bit quantization is the "Accuracy Drop." In reasoning-heavy models like DeepSeek-R1, quantizing can lead to a 9% drop in accuracy.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;DFloat11 is bit-for-bit identical&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;MMLU scores? Identical.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;WikiText perplexity? Identical.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Logic &amp;amp; reasoning? Zero change.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;💻 &lt;strong&gt;GPU Magic: Making Huffman Coding Fast&lt;/strong&gt;&lt;br&gt;
Huffman decoding is usually slow on GPUs because it's sequential. DFloat11 solves this with three brilliant engineering tricks:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Hierarchical LUTs: Compact lookup tables that fit in the GPU’s lightning-fast SRAM.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Two-Phase Kernels: A smart way for GPU threads to coordinate where to read and write variable-length data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Transformer-Block Batching: Decompressing entire blocks at once to keep the GPU cores busy.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;📊 &lt;strong&gt;The Real-World Impact&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Llama 3.1 405B on one node&lt;/strong&gt;: You can now run the 810 GB Llama 405B on a single 8x80 GB GPU server instead of two.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;5.7x - 14.9x longer context&lt;/strong&gt;: Because weights take up less room, more VRAM is left for the KV cache (the model's memory of your conversation).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Faster than offloading&lt;/strong&gt;: It is 2.3x to 46x faster than offloading parts of the model to system RAM (CPU).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Read the full paper: &lt;a href="https://arxiv.org/abs/2504.11651" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2504.11651&lt;/a&gt;&lt;br&gt;
Github : &lt;a href="https://github.com/LeanModels/DFloat11" rel="noopener noreferrer"&gt;https://github.com/LeanModels/DFloat11&lt;/a&gt;&lt;br&gt;
Connect on LinkedIn: &lt;a href="https://www.linkedin.com/in/syed-mehrab-18934220a/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/syed-mehrab-18934220a/&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>llm</category>
      <category>gpu</category>
    </item>
    <item>
      <title>Giving LLMs a Long-Term Memory: An Introduction to Mem0 🧠</title>
      <dc:creator>Syed Mehrab</dc:creator>
      <pubDate>Wed, 04 Mar 2026 16:31:44 +0000</pubDate>
      <link>https://dev.to/syed_mehrab_08fb0419feedf/giving-llms-a-long-term-memory-an-introduction-to-mem0-3jhp</link>
      <guid>https://dev.to/syed_mehrab_08fb0419feedf/giving-llms-a-long-term-memory-an-introduction-to-mem0-3jhp</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2zi9ex6bedwmqmbp5dpx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2zi9ex6bedwmqmbp5dpx.png" alt=" " width="800" height="436"&gt;&lt;/a&gt;We’ve all been there: You build a sophisticated AI agent, have a great conversation, and then, the moment you start a new session, it treats you like a complete stranger.&lt;/p&gt;

&lt;p&gt;Most LLMs are essentially goldfish. While &lt;strong&gt;RAG (Retrieval-Augmented Generation)&lt;/strong&gt; helps them "read" documents, it doesn't really help them "remember" you. That’s where Mem0 comes in.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is Mem0?
&lt;/h2&gt;

&lt;p&gt;Mem0 (pronounced "mem-zero") is a self-improving memory layer for AI assistants and agents. It allows your LLM applications to retain information across different sessions, learning from user interactions to provide a truly personalized experience.&lt;/p&gt;

&lt;p&gt;Think of it as the "Personalized Intelligence" layer. Instead of just searching through a static PDF, the AI learns that you prefer Python over JavaScript, or that you’re currently working on a specific microservices architecture.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Key Features:&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Adaptive Learning&lt;/strong&gt;: It doesn't just store data; it improves based on user interactions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;User-Centric&lt;/strong&gt;: It organizes memory by user, session, and even AI agent.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Platform Agnostic&lt;/strong&gt;: It works with OpenAI, Anthropic, Llama, and more.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Developer Friendly&lt;/strong&gt;: The API is designed to be integrated into existing stacks in minutes.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  How It Works
&lt;/h2&gt;

&lt;p&gt;Standard RAG pulls snippets of text based on a query. Mem0, however, acts more like a continuously updated diary. When a user says something important, Mem0 extracts the "fact," stores it, and makes it available for the next prompt.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Quick Start&lt;/strong&gt;&lt;br&gt;
Getting started is surprisingly simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from mem0 import Memory

# Initialize Mem0
m = Memory()

# Store a memory
m.add("I'm allergic to peanuts and prefer coding in Rust.", user_id="dev_user_123")

# Retrieve all memories stored for this user later
all_memories = m.get_all(user_id="dev_user_123")
print(all_memories)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;strong&gt;Why use Mem0 over standard Vector DBs?&lt;/strong&gt;&lt;br&gt;
While you could build this yourself using Pinecone or Milvus, Mem0 handles the heavy lifting of memory management:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Conflict Resolution: If you tell the AI "I live in New York" today and "I moved to Tokyo" tomorrow, Mem0 understands the update.&lt;/li&gt;
&lt;li&gt;Contextual Ranking: It prioritizes the most relevant memories for the current conversation.&lt;/li&gt;
&lt;li&gt;No Manual Cleanup: You don't have to write complex logic to delete or update old embeddings.&lt;/li&gt;
&lt;/ul&gt;
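&lt;p&gt;The conflict-resolution behavior is easy to appreciate with a naive sketch (my illustration of the idea, not Mem0's internals): keep one value per attribute, so a new fact replaces a stale one instead of piling up next to it:&lt;/p&gt;

```python
# Naive one-value-per-attribute store: an update overwrites the stale fact.
# A plain vector DB would instead keep both embeddings side by side.
memories: dict = {}

def remember(user_id: str, attribute: str, value: str) -> None:
    memories.setdefault(user_id, {})[attribute] = value

remember("u1", "location", "New York")
remember("u1", "location", "Tokyo")   # the move wins; no manual cleanup
print(memories["u1"]["location"])     # Tokyo
```

&lt;p&gt;Mem0's real pipeline uses an LLM to extract and reconcile facts, but the end result for your application is the same: the latest truth, not a pile of contradictions.&lt;/p&gt;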

&lt;p&gt;&lt;strong&gt;Alternatives to Mem0&lt;/strong&gt;&lt;br&gt;
&lt;em&gt;If you're exploring different ways to handle AI memory, here are the top contenders and how they differ:&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Zep&lt;/strong&gt;: A high-performance, production-grade long-term memory store. Unlike Mem0, Zep excels at automatically enriching and summarizing chat history, making it great for high-scale applications that need to stay fast.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Letta&lt;/strong&gt; (formerly MemGPT): If you want your agents to manage their own memory like an OS manages RAM, this is it. It allows LLMs to "page" information in and out of their context window dynamically.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;LangChain Memory Modules&lt;/strong&gt;: The "classic" choice. It’s perfect for quick prototyping (using ConversationBufferMemory), though it can be harder to scale for long-term, multi-session persistence compared to a dedicated memory layer.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Redis&lt;/strong&gt; (with Vector Search): The speed king. If you already use Redis for caching, you can use its vector capabilities to store user sessions. However, you’ll have to build the "memory extraction" logic yourself.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Pinecone / Weaviate&lt;/strong&gt;: These are pure Vector Databases. They are industry standards for storing massive amounts of data, but they don't "manage" the human-like memory logic (like updating old facts) out of the box like Mem0 does.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>rag</category>
    </item>
    <item>
      <title>Beyond Chatbots: Can We Give AI Agents an "Undo" Button? Exploring Gorilla GoEx 🦍</title>
      <dc:creator>Syed Mehrab</dc:creator>
      <pubDate>Sat, 28 Feb 2026 18:01:33 +0000</pubDate>
      <link>https://dev.to/syed_mehrab_08fb0419feedf/beyond-chatbots-can-we-give-ai-agents-an-undo-button-exploring-gorilla-goex-2npn</link>
      <guid>https://dev.to/syed_mehrab_08fb0419feedf/beyond-chatbots-can-we-give-ai-agents-an-undo-button-exploring-gorilla-goex-2npn</guid>
      <description>&lt;p&gt;The world of Large Language Models (LLMs) is shifting. We are moving from simple chatbots that just "talk" to &lt;strong&gt;Autonomous Agents&lt;/strong&gt; that can actually "do" things: like sending Slack messages, managing files, or calling APIs.&lt;/p&gt;

&lt;p&gt;But there’s a massive problem: &lt;strong&gt;Trust&lt;/strong&gt;. How do we stop an LLM from sending a wrong email or deleting a critical database entry?&lt;/p&gt;

&lt;p&gt;I’ve been diving into the research from the UC Berkeley Gorilla LLM team, specifically their latest tool: &lt;strong&gt;GoEx (Gorilla Execution Engine)&lt;/strong&gt;. Here’s what I’ve learned and where I think the next big research challenge lies.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What is GoEx? (The Post-Facto Paradigm)&lt;/strong&gt;&lt;br&gt;
Traditionally, we try to verify LLM code before it runs (Pre-facto). But code is hard to read! GoEx introduces &lt;strong&gt;Post-Facto Validation&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Instead of over-analyzing the code, GoEx lets the LLM execute the action and gives the human two powerful safety nets:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Undo Feature&lt;/strong&gt;: If the LLM sends a Slack message or creates a file you don't like, you can simply "revert" the state.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Damage Confinement&lt;/strong&gt;: It restricts the "blast radius" by limiting permissions (e.g., the LLM can read emails but can’t send them without extra clearance).&lt;/p&gt;
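&lt;p&gt;The "undo" idea can be sketched in a few lines (my illustration of the pattern, not the GoEx API): every action registers an inverse before it runs, so a human can revert it post-facto:&lt;/p&gt;

```python
# Post-facto validation in miniature: each action is paired with an
# inverse ("undo") that is kept on a stack for later human review.
class ReversibleExecutor:
    def __init__(self):
        self._undo_stack = []

    def execute(self, action, undo):
        result = action()
        self._undo_stack.append(undo)   # remember how to revert this action
        return result

    def undo_last(self):
        if self._undo_stack:
            self._undo_stack.pop()()

# Hypothetical action: "send" a message into a store; undo removes it
sent = []
ex = ReversibleExecutor()
ex.execute(lambda: sent.append("hi team"), undo=lambda: sent.pop())
print(sent)        # ['hi team']
ex.undo_last()     # the human rejects the action after the fact
print(sent)        # []
```

&lt;p&gt;The hard engineering in GoEx is defining reliable inverses for real-world actions (API calls, file writes, messages), but the control flow is exactly this stack of undos.&lt;/p&gt;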

&lt;p&gt;&lt;strong&gt;The Missing Piece: The "Social Damage" Gap&lt;/strong&gt;&lt;br&gt;
While GoEx is a huge step forward, my deep dive into the paper [arXiv:2404.06921] led me to an interesting research gap.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The Problem&lt;/strong&gt;: Technical reversibility ≠ social reversibility. If an LLM sends a sensitive Slack message and the recipient reads it within 2 seconds, deleting it doesn't solve the problem. The "Information Leak" has already happened.&lt;br&gt;
&lt;strong&gt;My Take&lt;/strong&gt;: We need a "Semantic Damage Confinement" layer. This would involve:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Risk-based Buffering: Delaying high-risk messages based on sentiment analysis.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Context-Aware Throttling: Switching back to "Pre-facto" validation automatically if the action is deemed socially irreversible.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
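&lt;p&gt;Risk-based buffering could look something like this (a hypothetical sketch of the proposal, not an existing GoEx feature): score each outgoing action and hold the risky ones for human review instead of sending immediately:&lt;/p&gt;

```python
outbox = []

def dispatch(message: str, risk_score: float, delay_s: int = 30) -> str:
    """Low-risk actions fire at once; high-risk ones are buffered for review."""
    if risk_score < 0.5:            # threshold is arbitrary for this sketch
        outbox.append(message)
        return "sent"
    # Socially irreversible content falls back to pre-facto validation
    return f"buffered for {delay_s}s pending human review"

assert dispatch("standup moved to 10am", risk_score=0.1) == "sent"
print(dispatch("confidential: reorg details", risk_score=0.9))
```

&lt;p&gt;In practice the risk score would come from a classifier or sentiment model; the point is that the execution engine, not the LLM, decides which actions get the delay.&lt;/p&gt;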

&lt;p&gt;Check out the project:&lt;/p&gt;

&lt;p&gt;📄 Paper: &lt;a href="https://arxiv.org/abs/2404.06921" rel="noopener noreferrer"&gt;https://arxiv.org/abs/2404.06921&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;💻 GitHub: gorilla/goex&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9kmdsxtiv6ey5j1bd6b0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9kmdsxtiv6ey5j1bd6b0.png" alt=" " width="678" height="204"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>llm</category>
      <category>ai</category>
      <category>gorillallm</category>
    </item>
  </channel>
</rss>
