<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Peng Qian</title>
    <description>The latest articles on DEV Community by Peng Qian (@qtalen).</description>
    <link>https://dev.to/qtalen</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg</url>
      <title>DEV Community: Peng Qian</title>
      <link>https://dev.to/qtalen</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/qtalen"/>
    <language>en</language>
    <item>
      <title>The hardest part of enterprise AI agents is wiring into real workflows.

I built a setup where:
✅Agent Skills load in real time from a database
✅Scripts run safely in containers
✅A “skills agent” acts as a tool to keep the main agent’s context clean</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Thu, 19 Mar 2026 10:34:25 +0000</pubDate>
      <link>https://dev.to/qtalen/the-hardest-part-of-enterprise-ai-agents-is-wiring-into-real-workflows-i-built-a-setup-where-2492</link>
      <guid>https://dev.to/qtalen/the-hardest-part-of-enterprise-ai-agents-is-wiring-into-real-workflows-i-built-a-setup-where-2492</guid>
      <description>&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2" class="crayons-story__hidden-navigation-link"&gt;How to Use Agent Skills in Enterprise LLM Agent Systems&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/qtalen" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" alt="qtalen profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/qtalen" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Peng Qian
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Peng Qian
                
              
              &lt;div id="story-author-preview-content-3367444" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/qtalen" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Peng Qian&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Mar 19&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2" id="article-link-3367444"&gt;
          How to Use Agent Skills in Enterprise LLM Agent Systems
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/programming"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;programming&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/datascience"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;datascience&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/tutorial"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;tutorial&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;3&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              3&lt;span class="hidden s:inline"&gt; comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            11 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;




</description>
      <category>ai</category>
      <category>programming</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Use Agent Skills in Enterprise LLM Agent Systems</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Thu, 19 Mar 2026 10:32:10 +0000</pubDate>
      <link>https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2</link>
      <guid>https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Enterprise-grade agentic systems have fallen way behind the desktop agent apps that everyone's been buzzing about lately.&lt;/p&gt;

&lt;p&gt;After spending the better part of a year building enterprise agent applications, I came to one conclusion: &lt;strong&gt;if your agent system can't plug into your company's existing business processes, it won't bring real value to your organization.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Desktop systems like OpenClaw and Claude Cowork solved this problem. They don't change their agent setup at all. Instead, they use Agent Skills to capture human business processes, then share those skills between desktop agent systems through the file system. That's how they tackle one business problem after another.&lt;/p&gt;

&lt;p&gt;But enterprise users write their skills through a web interface and save them to a database. There's a good chance the process involves complex approval and security audit steps, too. So how does your agent load these skills in real time without any downtime?&lt;/p&gt;

&lt;p&gt;The latest version of Microsoft Agent Framework finally makes this possible with its Agent Skills feature.&lt;/p&gt;

&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;p&gt;With Agent Skills in Microsoft Agent Framework, enterprise agent systems can load user-defined business process skills from a database in real time, and run the scripts and generated code that come with those skills safely inside containers.&lt;/p&gt;

&lt;p&gt;Your agent system stays secure and stable, while gaining the same flexible business process orchestration that desktop agents enjoy.&lt;/p&gt;

&lt;p&gt;All the source code in this tutorial is available at the end of the article.&lt;/p&gt;




&lt;h2&gt;
  
  
  Before We Start
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Install the latest Microsoft Agent Framework
&lt;/h3&gt;

&lt;p&gt;To use Agent Skills, install the latest version of Microsoft Agent Framework:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;agent-framework &lt;span class="nt"&gt;--pre&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, like me, you can pin the version of &lt;code&gt;agent-framework&lt;/code&gt; in your &lt;code&gt;pyproject.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tex"&gt;&lt;code&gt;dependencies = [
    "agent-framework&amp;gt;=1.0.0rc4",
    "agent-framework-ag-ui&amp;gt;=1.0.0b260311",
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then tell &lt;code&gt;uv&lt;/code&gt; to allow prerelease versions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--prerelease&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;allow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Install Tavily Agent Skills
&lt;/h3&gt;

&lt;p&gt;My end goal is to show you how to share and load Agent Skills between agents deployed across distributed nodes. But I think we should start simple. First, let me show you how to load and use skills from the community.&lt;/p&gt;

&lt;p&gt;Let's start with Tavily Agent Skills. We'll only load the &lt;code&gt;tavily-best-practices&lt;/code&gt; skill. It guides my agent to generate Tavily search code tailored to the task at hand, instead of calling a hardcoded function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add tavily-ai/skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don't worry. After the initial demo, I'll walk you through how to load skills from a database in real time.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Load Agent Skills from Disk
&lt;/h2&gt;

&lt;p&gt;Let's start with the most basic approach.&lt;/p&gt;

&lt;p&gt;In Microsoft Agent Framework, context operations are handled by a base class called &lt;code&gt;ContextProvider&lt;/code&gt;. The latest version of MAF ships a &lt;code&gt;SkillsProvider&lt;/code&gt; class. Use it directly, pass the location of your skills through the &lt;code&gt;skill_paths&lt;/code&gt; attribute, and you're done. Unlike the fixed &lt;code&gt;.claude/skills&lt;/code&gt; convention, &lt;code&gt;skill_paths&lt;/code&gt; isn't tied to a default directory, and you can pass in multiple paths.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;skills_provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SkillsProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;skill_paths&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;get_current_directory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.agents/skills&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create your agent and pass the &lt;code&gt;skills_provider&lt;/code&gt; instance through &lt;code&gt;context_providers&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;skills_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SkillsAssistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a helpful assistant, and you&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll respond to user requests according to your skills.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context_providers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;skills_provider&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;code_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run the Python code the agent writes from the Tavily skill instructions, you need to pass a code-interpreter tool (the &lt;code&gt;code_tool&lt;/code&gt; above) to the agent and let the code run inside a container environment. I'll cover that in detail later.&lt;/p&gt;

&lt;p&gt;Write a &lt;code&gt;main&lt;/code&gt; method to test the agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;code_executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;skills_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Check how gold ETFs performed in February 2026 and give some investment advice.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Microsoft Agent Framework provides an OpenTelemetry-based telemetry tool. I hooked it up to MLflow. Let's run the agent once and see what happens:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitwpu0eb19fqkya3s0wb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitwpu0eb19fqkya3s0wb.png" alt="Through MLflow, you can see that the agent successfully loaded and executed the Agent Skills. Image by Author" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see that once the agent decided it needed Tavily to search, it loaded the full &lt;code&gt;SKILL.md&lt;/code&gt; document, wrote Tavily search code following the instructions, then sent it to the code interpreter for execution. Exactly what we expected.&lt;/p&gt;

&lt;p&gt;You can learn how to use MLflow in this article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/monitoring-qwen-3-agents-with-mlflow-3-x-end-to-end-tracking-tutorial/" rel="noopener noreferrer"&gt;Monitoring Qwen 3 Agents with MLflow 3.x: End-to-End Tracing Tutorial&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How Agent Skills Work
&lt;/h2&gt;

&lt;p&gt;Now let's talk about how to get the most out of Agent Skills in enterprise systems. That means loading external skills in real time, containerizing the code interpreter, and managing context more carefully.&lt;/p&gt;

&lt;p&gt;But before we go there, let's dig into how Agent Skills actually work inside MAF, so the rest of this tutorial makes more sense.&lt;/p&gt;

&lt;p&gt;As I mentioned, &lt;code&gt;SkillsProvider&lt;/code&gt; extends &lt;code&gt;BaseContextProvider&lt;/code&gt;, which means it works by operating on the agent's context.&lt;/p&gt;

&lt;p&gt;When you initialize &lt;code&gt;SkillsProvider&lt;/code&gt;, you pass one or more search paths to the &lt;code&gt;skill_paths&lt;/code&gt; attribute. Take the &lt;code&gt;.agents/skills&lt;/code&gt; directory as an example. On startup, &lt;code&gt;SkillsProvider&lt;/code&gt; recursively searches this directory and finds every subdirectory that contains a &lt;code&gt;SKILL.md&lt;/code&gt; file. Then it extracts the &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt; fields from each &lt;code&gt;SKILL.md&lt;/code&gt; file, along with the file content, and stores everything in a &lt;code&gt;Skill&lt;/code&gt; object.&lt;/p&gt;
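
&lt;p&gt;To make this concrete, here's a minimal sketch of what that discovery pass might look like. The frontmatter parsing is deliberately naive and illustrative, not MAF's actual implementation; it only assumes the same &lt;code&gt;Skill&lt;/code&gt; class used later in this article:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import Path

from agent_framework import Skill


def discover_skills(root: Path) -&amp;gt; list[Skill]:
    """Recursively collect SKILL.md files and turn them into Skill objects (illustrative)."""
    skills = []
    for skill_md in sorted(root.rglob("SKILL.md")):
        text = skill_md.read_text(encoding="utf-8")
        meta = {}
        # Naive YAML-frontmatter parsing: read "name:" and "description:"
        # from between the leading "---" markers.
        if text.startswith("---"):
            frontmatter, _, _ = text[3:].partition("---")
            for line in frontmatter.splitlines():
                key, sep, value = line.partition(":")
                if sep:
                    meta[key.strip()] = value.strip()
        skills.append(Skill(
            name=meta.get("name", skill_md.parent.name),
            description=meta.get("description", ""),
            content=text,
        ))
    return skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;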

&lt;p&gt;&lt;code&gt;SkillsProvider&lt;/code&gt; loops through these &lt;code&gt;Skill&lt;/code&gt; objects, formats the &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt; fields like this, and merges them into the agent's system prompt. This keeps the agent aware of available skills without loading their full content upfront.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &amp;lt;skill&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    &amp;lt;name&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;xml_escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    &amp;lt;description&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;xml_escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;/description&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &amp;lt;/skill&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;SkillsProvider&lt;/code&gt; also adds two methods to the agent through context: &lt;code&gt;load_skill&lt;/code&gt; and &lt;code&gt;read_skill_resource&lt;/code&gt;. When the agent decides which skill it needs based on the user's request, it calls &lt;code&gt;load_skill&lt;/code&gt; to look up the matching &lt;code&gt;Skill&lt;/code&gt; object by name and loads its full content into the context.&lt;/p&gt;

&lt;p&gt;If a skill's content references extra resource files like &lt;code&gt;references/search.md&lt;/code&gt;, the agent can call &lt;code&gt;read_skill_resource&lt;/code&gt; to load those files.&lt;/p&gt;
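
&lt;p&gt;Conceptually, the two tools boil down to something like this. The registry shape below is an assumption for illustration; in MAF this state lives inside &lt;code&gt;SkillsProvider&lt;/code&gt; itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import Path

# Illustrative sketch, not MAF's real code: map skill name to (content, skill directory).
SKILL_REGISTRY: dict[str, tuple[str, Path]] = {}


def load_skill(name: str) -&amp;gt; str:
    """Return the full SKILL.md content once the agent has picked a skill by name."""
    content, _ = SKILL_REGISTRY[name]
    return content


def read_skill_resource(name: str, relative_path: str) -&amp;gt; str:
    """Return an extra file a skill refers to, e.g. references/search.md."""
    _, directory = SKILL_REGISTRY[name]
    return (directory / relative_path).read_text(encoding="utf-8")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;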

&lt;p&gt;Here's the full workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fev66hdgmrajgccddrswy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fev66hdgmrajgccddrswy.png" alt="A diagram illustrating the workflow of SkillsProvider. Image by Author" width="771" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This design follows the progressive disclosure principle defined by &lt;a href="https://agentskills.io/home?ref=dataleadsfuture.com" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt;. Skill content loads into the agent's context gradually, only when needed. No context explosion, no wasted tokens.&lt;/p&gt;




&lt;h2&gt;
  
  
  Agent Skills for Enterprise Systems
&lt;/h2&gt;

&lt;p&gt;Alright, enough theory. Let's get into today's main topic: &lt;strong&gt;how to use Agent Skills in enterprise-grade agentic systems.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Load skills from external systems in real time
&lt;/h3&gt;

&lt;p&gt;What if business users write their skills through a cloud-based web page and save them to a database? How do you handle that?&lt;/p&gt;

&lt;p&gt;We need a new approach to sync and apply Agent Skills in real time.&lt;/p&gt;

&lt;p&gt;As I covered earlier, when &lt;code&gt;SkillsProvider&lt;/code&gt; initializes, it loads all &lt;code&gt;SKILL.md&lt;/code&gt; files from the input paths into an in-memory list of &lt;code&gt;Skill&lt;/code&gt; objects.&lt;/p&gt;

&lt;p&gt;Besides the file system approach, &lt;code&gt;SkillsProvider&lt;/code&gt; also supports Code Defined Skills, where you write skill content directly in code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_framework&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SkillsProvider&lt;/span&gt;

&lt;span class="n"&gt;my_skill&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-code-skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A code-defined skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Instructions for the skill.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then pass it to &lt;code&gt;SkillsProvider&lt;/code&gt; through the &lt;code&gt;skills&lt;/code&gt; attribute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;skills_provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SkillsProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;skill_paths&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skills&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;skills&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;my_skill&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens the door to managing and loading skills from a database. But the original &lt;code&gt;SkillsProvider&lt;/code&gt; class only accepts skills at initialization time. We want to load skills dynamically while the agent system is running, so we need to extend &lt;code&gt;SkillsProvider&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After reading the source code, I found that every class extending &lt;code&gt;BaseContextProvider&lt;/code&gt; has a &lt;code&gt;before_run&lt;/code&gt; method that gets called when the agent calls &lt;code&gt;run&lt;/code&gt;. We can load the latest skills from the database at the start of &lt;code&gt;before_run&lt;/code&gt;, then update &lt;code&gt;SkillsProvider&lt;/code&gt;'s &lt;code&gt;self._skills&lt;/code&gt; list and refresh the skills description in &lt;code&gt;instructions&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What I need is a hook method that fetches the latest skills each time &lt;code&gt;before_run&lt;/code&gt; runs. All I need to do is put the database-fetching logic inside this hook.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq33zflqmrchfw1awav4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq33zflqmrchfw1awav4.png" alt="The workflow for loading skills from the database in real time. Image by Author" width="698" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The simplest way to give &lt;code&gt;SkillsProvider&lt;/code&gt; this hook is to build an &lt;code&gt;UpdatableSkillsProvider&lt;/code&gt; subclass. This subclass accepts a &lt;code&gt;skills_updater&lt;/code&gt; parameter at initialization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;UpdatableSkillsProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SkillsProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;skill_paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;skills_updater&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;]]]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;skill_paths&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;skill_paths&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills_updater&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skills_updater&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;UpdatableSkillsProvider&lt;/code&gt; calls the hook through a private &lt;code&gt;_update&lt;/code&gt; method, which also updates &lt;code&gt;self._skills&lt;/code&gt; and the agent's system prompt. Then &lt;code&gt;before_run&lt;/code&gt; calls &lt;code&gt;_update&lt;/code&gt; to keep skills fresh in real time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;UpdatableSkillsProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SkillsProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills_updater&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;new_skills&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_skills_updater&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;new_skills&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt;

            &lt;span class="n"&gt;has_scripts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scripts&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_instructions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_create_instructions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;prompt_template&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_instruction_template&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;skills&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;include_script_runner_instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;has_scripts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_create_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;include_script_runner_tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;has_scripts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;require_script_approval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_require_script_approval&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to update skills: %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@override&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;before_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_update&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;before_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's write a &lt;code&gt;get_latest_skills&lt;/code&gt; hook to simulate loading the latest skills from a database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@lru_cache&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_latest_skills&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Pseudocode. In this hook method, you can read the skills text from the database 
    and dynamically build Skill objects.
    :return: 
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;code_style_skill&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code-style&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Coding style guidelines and conventions for the team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;dedent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;            Use this skill when answering questions about coding style,
            conventions, or best practices for the team.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;code_style_skill&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
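
&lt;p&gt;In a real deployment, the hook would query your skills table instead. Here's one way that could look with &lt;code&gt;asyncpg&lt;/code&gt;; the connection string, table, and column names are assumptions for illustration:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncpg

from agent_framework import Skill

DATABASE_URL = "postgresql://agent:secret@db-host/agents"  # illustrative


async def get_latest_skills() -&amp;gt; list[Skill]:
    """Fetch approved skills from a (hypothetical) skills table and build Skill objects."""
    conn = await asyncpg.connect(DATABASE_URL)
    try:
        rows = await conn.fetch(
            "SELECT name, description, content FROM agent_skills WHERE approved = TRUE"
        )
    finally:
        await conn.close()

    return [
        Skill(name=row["name"], description=row["description"], content=row["content"])
        for row in rows
    ]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;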



&lt;p&gt;Call the agent's &lt;code&gt;run&lt;/code&gt; method, then check in MLflow whether the skills loaded by &lt;code&gt;get_latest_skills&lt;/code&gt; show up in the agent's system prompt:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68h00cw37l6aqouzechh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68h00cw37l6aqouzechh.png" alt="The skills loaded from the database have been updated into the agent's system prompt. Image by Author" width="800" height="509"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The hook method works. We can now load skills from a database in real time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run scripts from skills safely inside containers
&lt;/h3&gt;

&lt;p&gt;As of the latest version, Microsoft Agent Framework can't run Python scripts locally or inside containers. But most skills guide the agent through business logic using scripts, so we need to give the agent the ability to run those scripts in a code interpreter.&lt;/p&gt;

&lt;p&gt;As the predecessor to MAF, Autogen provided a way to run Python scripts inside Docker containers. You can learn about that in this article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/exclusive-reveal-code-sandbox-tech-behind-manus-and-claude-agent-skills/" rel="noopener noreferrer"&gt;Exclusive Reveal: Code Sandbox Tech Behind Manus and Claude Agent Skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need something like Autogen's &lt;code&gt;DockerCommandLineCodeExecutor&lt;/code&gt; for Agent Framework. With the help of AI coding tools, building a code executor for Agent Framework isn't hard. (You can find it in the source code repo at the end of the article.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;code_executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DockerCommandLineCodeExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python-code-sandbox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;work_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;work_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;delete_tmp_files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TAVILY_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TAVILY_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
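
&lt;p&gt;For reference, here's a stripped-down sketch of how such an executor could be put together with the &lt;code&gt;docker&lt;/code&gt; SDK. The names mirror the snippet above, but the body is illustrative only (the real executor also returns a result object and honors the cancellation token); see the source code repo for the full version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
import uuid
from dataclasses import dataclass
from pathlib import Path

import docker


@dataclass
class CodeBlock:
    code: str
    language: str = "python"


class DockerCommandLineCodeExecutor:
    """Illustrative sketch: runs code blocks in throwaway containers via the docker SDK."""

    def __init__(self, image: str, work_dir: Path, delete_tmp_files: bool = True,
                 environment: dict | None = None) -&amp;gt; None:
        self._image = image
        self._work_dir = Path(work_dir)
        self._delete_tmp_files = delete_tmp_files
        self._environment = environment or {}
        self._client = docker.from_env()

    async def __aenter__(self):
        return self

    async def __aexit__(self, *exc):
        return None

    async def execute_code_blocks(self, blocks: list[CodeBlock], cancellation_token=None) -&amp;gt; str:
        outputs = []
        for block in blocks:
            suffix = "py" if block.language == "python" else "sh"
            file_name = f"tmp_{uuid.uuid4().hex}.{suffix}"
            (self._work_dir / file_name).write_text(block.code, encoding="utf-8")
            command = ["python", file_name] if suffix == "py" else ["sh", file_name]

            # Run the script inside a disposable container with the work dir mounted.
            logs = await asyncio.to_thread(
                self._client.containers.run,
                self._image,
                command,
                working_dir="/workspace",
                volumes={str(self._work_dir.resolve()): {"bind": "/workspace", "mode": "rw"}},
                environment=self._environment,
                remove=True,
            )
            outputs.append(logs.decode("utf-8", errors="replace"))

            if self._delete_tmp_files:
                (self._work_dir / file_name).unlink(missing_ok=True)
        return "\n".join(outputs)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;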



&lt;p&gt;To keep the tool call exposed to the LLM simple, we also wrap the executor in a &lt;code&gt;CodeExecutionTool&lt;/code&gt; class:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CodeExecutionTool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Tool for executing code using a CodeExecutor.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CodeExecutor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_code_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;CodeBlock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
            &lt;span class="nc"&gt;CancellationToken&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, initialize an &lt;code&gt;execute_code&lt;/code&gt; tool and wire it up to the agent at initialization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;code_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodeExecutionTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code_executor&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;execute_code&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In MLflow, you can see that when the agent needs to search the web, it generates Python code based on the skill's instructions and sends it to the container for execution:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa34kkyrhxca4f14aqvwe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa34kkyrhxca4f14aqvwe.png" alt="The agent generated code based on the skill's instructions and executed it in a container. Image by Author" width="800" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This approach not only lets the agent run code defined in skills, but also keeps that execution safe inside a container.&lt;/p&gt;

&lt;p&gt;Of course, in a server-side deployment, you'd send code to a centralized Jupyter kernel environment for execution. But that's a whole other story. You can dig into that in my other articles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/how-i-crushed-advent-of-code-and-solved-hard-problems-using-autogen-jupyter-executor-and-qwen3/" rel="noopener noreferrer"&gt;How I Crushed Advent of Code And Solved Hard Problems Using Autogen Jupyter Executor and Qwen3&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Reduce context length even further
&lt;/h3&gt;

&lt;p&gt;Agent Skills uses progressive disclosure to keep irrelevant skill content from eating up your context window. But as the conversation or task moves forward, skill content that was loaded into earlier messages will still pile up in the context over time.&lt;/p&gt;

&lt;p&gt;Agent systems today have several context pruning techniques available. Context trimming and context compression, both common in desktop agents, work really well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr62ga60f8gi0rzcupi5b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr62ga60f8gi0rzcupi5b.png" alt="The difference between context pruning and context compression. Image by Author" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Beyond those two, today I want to share a context engineering technique I discovered at work that fits Agent Skills even better.&lt;/p&gt;

&lt;p&gt;As you know, in enterprise scenarios, loading a skill usually means running one atomic workflow: researching a topic through web search? Sure. Running a SWOT analysis on a company and writing a report? No problem.&lt;/p&gt;

&lt;p&gt;These workflows all share one thing in common. You give the agent the right input, then wait for it to return an output. Which skill the agent loaded, and how it worked through the task — I honestly don't care. I wouldn't even mind if the agent unloaded the skill after finishing to save tokens.&lt;/p&gt;

&lt;p&gt;That sounds a lot like how a function works. So, can we use an agent with skills loaded as a tool for another agent? Absolutely. That's exactly what I do.&lt;/p&gt;

&lt;p&gt;Microsoft Agent Framework has a method on Agent called &lt;code&gt;as_tool&lt;/code&gt;. It turns an agent into a function-callable tool.&lt;/p&gt;

&lt;p&gt;So I designed a main agent. The main agent takes user requests and generates the right response to return. The agent with Agent Skills loading capability turns itself into a tool for the main agent using &lt;code&gt;as_tool&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;dedent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a smart little helper who, for each user request, 
    picks the right task description to call a tool, gets the answer, 
    and then delivers the final result.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;skills_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_tool&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The skills agent's workflow stays the same. It loads the right skill based on the task description, generates and runs code, then returns the result.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuco2dlm4p81kbyrhe91b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuco2dlm4p81kbyrhe91b.png" alt="The skill agent is provided as a tool for the main agent to call. Image by Author" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But the main agent is different. Its context only holds user messages, the tool call to the skills agent, and the final response. No skill-related content at all. The main agent's context stays clean, so even in long-running conversations it won't crowd the LLM's context window.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyczyc9ercjrderiz25de.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyczyc9ercjrderiz25de.png" alt="Keep the main agent's context clean by loading skills into the sub-agent. Image by Author" width="402" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There's a nice bonus too. An LLM is often better than a human at spelling out exactly what it needs, so before the main agent calls the skills agent, it rewrites the user's request into a more precise task description. This helps the skills agent execute more accurately.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;That's everything I have for you today on Agent Skills for enterprise agent systems.&lt;/p&gt;

&lt;p&gt;Unlike desktop agents, enterprise agent systems run on cloud servers. There's no way to update an agent's skills through the file system in real time without downtime.&lt;/p&gt;

&lt;p&gt;So I went with a targeted approach. This approach lets users write skill content through a web interface and save it to a database, while agents read the latest skills in real time and sync them across server nodes.&lt;/p&gt;

&lt;p&gt;I used the latest version of Microsoft Agent Framework to build this, but you can use any other framework. The principles are the same.&lt;/p&gt;

&lt;p&gt;I also covered how to run scripts the agent generates from skills inside containers, which is much safer than running scripts directly on a desktop system.&lt;/p&gt;

&lt;p&gt;And I shared a context management approach from my day-to-day work that suits skills-based agents especially well.&lt;/p&gt;

&lt;p&gt;The Microsoft Agent Framework API is still a bit unstable. If anything is unclear, feel free to leave a comment, and I'll get back to you as soon as I can.&lt;/p&gt;

&lt;p&gt;Thanks for reading! Share this with your friends if you think it might help someone else.&lt;/p&gt;




&lt;p&gt;Enjoyed this read? &lt;a href="https://www.dataleadsfuture.com/#/portal/signup" rel="noopener noreferrer"&gt;Subscribe now to get more cutting-edge data science tips straight to your inbox!&lt;/a&gt; Your feedback and questions are welcome — let’s discuss in the comments below!&lt;/p&gt;

&lt;p&gt;This article was originally published on &lt;a href="https://www.dataleadsfuture.com/how-to-use-agent-skills-in-enterprise-llm-agent-systems/" rel="noopener noreferrer"&gt;Data Leads Future&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Most “agent memory” setups just save everything and hope semantic search will fix it.

I share a pattern that works better:
✅ Use an LLM to decide what is worth remembering
✅ Store memories in RedisVL
✅ Run memory extraction in parallel with asyncio</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Tue, 10 Feb 2026 11:49:54 +0000</pubDate>
      <link>https://dev.to/qtalen/most-agent-memory-setups-just-save-everything-and-hope-semantic-search-will-fix-it-i-share-a-38m5</link>
      <guid>https://dev.to/qtalen/most-agent-memory-setups-just-save-everything-and-hope-semantic-search-will-fix-it-i-share-a-38m5</guid>
      <description>&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc" class="crayons-story__hidden-navigation-link"&gt;Advanced RedisVL Long-term Memory Tutorial: Using an LLM to Extract Memories&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/qtalen" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" alt="qtalen profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/qtalen" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Peng Qian
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Peng Qian
                
              
              &lt;div id="story-author-preview-content-3234678" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/qtalen" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Peng Qian&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Feb 10&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc" id="article-link-3234678"&gt;
          Advanced RedisVL Long-term Memory Tutorial: Using an LLM to Extract Memories
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/agents"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;agents&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/programming"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;programming&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/redis"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;redis&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;2&lt;span class="hidden s:inline"&gt; reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              1&lt;span class="hidden s:inline"&gt; comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            10 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;




</description>
      <category>ai</category>
      <category>agents</category>
      <category>programming</category>
      <category>redis</category>
    </item>
    <item>
      <title>Advanced RedisVL Long-term Memory Tutorial: Using an LLM to Extract Memories</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Tue, 10 Feb 2026 11:49:27 +0000</pubDate>
      <link>https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc</link>
      <guid>https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this weekend note, we continue exploring how to build long-term memory for an agent with RedisVL.&lt;/p&gt;

&lt;p&gt;When we build a long-term memory module for an agent, two questions matter most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After running for a long time, will the saved memories grow too large and cause context explosion?&lt;/li&gt;
&lt;li&gt;How do we recall the memories that matter most to the current context?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will solve these two problems today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLDR:&lt;/strong&gt; In this hands-on tutorial, we first use an LLM to extract information from user messages that has value for later chats. Then we store that as long-term memory in RedisVL. When needed, we search related memories with semantic search. With this setup, the agent understands the past context of the user and gives more accurate answers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfkidrk7fzb6y54ql7x8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfkidrk7fzb6y54ql7x8.png" alt="After using an LLM to extract memories, how my long‑term memory module runs. Image by Author" width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this kind of long-term memory, we no longer worry about memory explosion after long runs, or about unrelated memories hurting the LLM's responses.&lt;/p&gt;

&lt;p&gt;You can get all the source code at the end of this post.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why do we do this?
&lt;/h3&gt;

&lt;p&gt;In the last hands-on post, I shared how to build short-term and long-term memory for an agent with RedisVL:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/build-long-term-and-short-term-memory-for-agents-using-redisvl/" rel="noopener noreferrer"&gt;Build Long-Term and Short-Term Memory for Agents Using RedisVL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The short-term part works very well. The RedisVL API feels much simpler than the raw Redis API, and I can write that code with little effort.&lt;/p&gt;

&lt;p&gt;The long-term part does not work as well. Following the official example, we store user queries and LLM responses in RedisVL. As the chat continues, semantic search keeps pulling back repeated queries or unrelated answers, which confuses the LLM and stalls the conversation.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can we avoid RedisVL?
&lt;/h3&gt;

&lt;p&gt;Your boss will not agree. Redis is already in your stack, so why install mem0 or another open-source tool and take on the extra cost? This is real life.&lt;/p&gt;

&lt;p&gt;So we still need to make RedisVL work. But I do not want to flail around blindly, so before diving in, let's look at how humans handle memory.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Humans Handle Memory
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What deserves memory
&lt;/h3&gt;

&lt;p&gt;First, we need to know one thing: only information that centers on me and is closely tied to me deserves a sticky note.&lt;/p&gt;

&lt;p&gt;So what information about me do I want to write down?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Preference settings such as tools I like, languages I use, my schedule, and my tone when I talk&lt;/li&gt;
&lt;li&gt;Stable personal info such as my role, my time zone, and my daily habits&lt;/li&gt;
&lt;li&gt;Goals and decisions, such as chosen options, plans, and sentences that start with “I decide to...”&lt;/li&gt;
&lt;li&gt;Key milestones such as job change, moving, deadlines, and product launches&lt;/li&gt;
&lt;li&gt;Work and project context, such as project names, stakeholders, requirements, and status like the “done/next step”&lt;/li&gt;
&lt;li&gt;Repeated pain points or strong views that will change LLM advice later&lt;/li&gt;
&lt;li&gt;Things I say with “remember this ...” or “do not forget ...”&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What does not deserve memory
&lt;/h3&gt;

&lt;p&gt;I do not plan to store any LLM answers. An LLM's answer to the same question changes with context, so keeping LLM answers in long-term memory does not help much.&lt;/p&gt;

&lt;p&gt;Besides LLM answers, I also do not want to keep these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One-time small things that likely will not matter later&lt;/li&gt;
&lt;li&gt;Very sensitive personal data such as health diagnoses, exact addresses, government IDs, passwords, and bank account numbers&lt;/li&gt;
&lt;li&gt;Things I clearly ask not to remember&lt;/li&gt;
&lt;li&gt;Things I already wrote down on the sticky note&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Design a Prompt for LLM Memory Extraction
&lt;/h2&gt;

&lt;p&gt;Now we know how humans handle memory. Next, I want to build an agent that follows the same rules and extracts memories from my daily chats.&lt;/p&gt;

&lt;p&gt;The key lies in the &lt;code&gt;system prompt&lt;/code&gt;. I describe all the rules there, then ask the agent to follow them with very high consistency.&lt;/p&gt;

&lt;p&gt;In the past, I might have taken on a “write a 1000-line prompt” challenge. Now I do not need to. I just open any LLM client, paste these rules, and ask the LLM to help me write a &lt;code&gt;system prompt&lt;/code&gt;. This takes less than a minute.&lt;/p&gt;

&lt;p&gt;After a few tries, I pick one I like. Here is that &lt;code&gt;system prompt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tex"&gt;&lt;code&gt;Your job:
Based only on the user’s current input and the existing related memories, decide whether you need to add a new “long-term memory,” and if needed, **extract just one fact**. You do not talk to the user. You only handle memory extraction and deduplication.

---

### 1. Core principles

1. Only save information that **will likely be useful in the future**.
2. **At most one fact per turn**, and it must clearly appear in the current input.
3. **Never invent or infer anything**. You can only restate or lightly rephrase what the user has explicitly said.
4. If the current input has nothing worth keeping, or the information is already in the related memories, then do not add a new memory.

---

### 2. What counts as “long-term memory”

Only consider the categories below, and decide whether the information has long-term value:

...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Due to space, I only show part of the prompt here. You can get the full prompt from the source code at the end.&lt;/p&gt;




&lt;h2&gt;
  
  
  Build a ContextProvider for Long-term Memory
&lt;/h2&gt;

&lt;p&gt;With the memory extraction rules in place, we can start building the long-term memory module for the agent.&lt;/p&gt;

&lt;p&gt;As in the last post, I pick the Microsoft Agent Framework (MAF). It provides a &lt;code&gt;ContextProvider&lt;/code&gt; feature that lets us plug long-term memory into the agent in a simple way.&lt;/p&gt;

&lt;p&gt;Of course, the principle of long-term memory stays the same. You can use any agent framework you like and build your own memory module, or skip frameworks entirely, build memory storage and retrieval first, and expose them through function calls (see the sketch below). That is fine too.&lt;/p&gt;
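
&lt;p&gt;For illustration only, here is a minimal, framework-free sketch of that idea: two hypothetical functions, &lt;code&gt;save_memory&lt;/code&gt; and &lt;code&gt;search_memory&lt;/code&gt;, wrapping whatever store you use, which an agent could then register as ordinary function-call tools. The &lt;code&gt;MemoryStore&lt;/code&gt; interface is an assumption made for the sketch, not code from this article.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Hypothetical, framework-free sketch: expose memory storage and retrieval
# as plain functions that any agent framework can register as tools.
from typing import Protocol


class MemoryStore(Protocol):
    """Assumed interface of whatever vector store you use (e.g. RedisVL)."""
    def add(self, text: str, session_tag: str) -&gt; None: ...
    def search(self, query: str, session_tag: str, top_k: int = 5) -&gt; list[str]: ...


def save_memory(store: MemoryStore, fact: str, session_tag: str) -&gt; str:
    """Tool: persist one extracted fact as a long-term memory."""
    store.add(fact, session_tag)
    return "saved"


def search_memory(store: MemoryStore, query: str, session_tag: str) -&gt; str:
    """Tool: return related memories as a newline-separated bullet list."""
    memories = store.search(query, session_tag)
    return "\n".join(f"* {m}" for m in memories)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;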

&lt;h3&gt;
  
  
  Run memory extraction in sequence
&lt;/h3&gt;

&lt;p&gt;In the last post, I already built a long-term memory module with &lt;code&gt;ContextProvider&lt;/code&gt;. The new version looks similar. But this time, we use an LLM to extract memories. So after we set up &lt;code&gt;ContextProvider&lt;/code&gt;, we first use the &lt;code&gt;system prompt&lt;/code&gt; to build a memory extraction agent.&lt;/p&gt;

&lt;p&gt;If you do not know how to use &lt;code&gt;ContextProvider&lt;/code&gt; yet, I suggest reading my last post, which explains &lt;code&gt;ChatMessageStore&lt;/code&gt; and &lt;code&gt;ContextProvider&lt;/code&gt; in the Microsoft Agent Framework in detail:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/build-long-term-and-short-term-memory-for-agents-using-redisvl/" rel="noopener noreferrer"&gt;Build Long-Term and Short-Term Memory for Agents Using RedisVL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To avoid pulling too much unrelated data from Redis, I set the &lt;code&gt;distance_threshold&lt;/code&gt; value fairly small, but not so small that retrieval loses its meaning. You can pick a value that suits you.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;distance_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;context_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DEFAULT_CONTEXT_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis://localhost:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;embedding_model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BAAI/bge-m3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm_model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Qwen3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm_api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm_base_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_extractor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_init_extractor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_extractor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAILikeChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_llm_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;as_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extractor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;default_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ExtractResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extra_body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enable_thinking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
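
&lt;p&gt;The &lt;code&gt;ExtractResult&lt;/code&gt; model passed as &lt;code&gt;response_format&lt;/code&gt; above is not shown in this excerpt. Judging from how the result is consumed later (&lt;code&gt;should_write_memory&lt;/code&gt; and &lt;code&gt;memory_to_write&lt;/code&gt;), a minimal Pydantic sketch might look like this; treat the field descriptions as my assumptions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# Sketch only: a structured-output model matching how ExtractResult is used.
from pydantic import BaseModel, Field


class ExtractResult(BaseModel):
    """Structured output of the memory extraction agent."""
    should_write_memory: bool = Field(
        description="Whether the current input contains a fact worth saving."
    )
    memory_to_write: str = Field(
        default="",
        description="The single extracted fact; empty when nothing should be saved."
    )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;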



&lt;p&gt;Next we implement the &lt;code&gt;invoking&lt;/code&gt; method, which runs before the chat agent calls the LLM. In this method, we extract and store long-term memory.&lt;/p&gt;

&lt;p&gt;To keep the logic clear, I first implement the &lt;code&gt;invoking&lt;/code&gt; method sequentially, as in this diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5nwk96ayu5xi64e3138n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5nwk96ayu5xi64e3138n.png" alt="Run the memory extraction logic step by step in order. Image by Author" width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a new user request comes into &lt;code&gt;ContextProvider&lt;/code&gt;, we first search RedisVL with semantic search for the most similar memories.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="n"&gt;line_sep_memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get_line_sep_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_line_sep_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_semantic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_relevant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;line_sep_memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;* &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;line_sep_memories&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we send these existing memories plus the user request to the memory extraction agent. That agent first checks whether anything is worth saving according to the rules, then extracts a new, useful memory from the user request and saves it into RedisVL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_save_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_sep_memories&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_save_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;relevant_memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;detect_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;USER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Existing related memories：&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;relevant_memory&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;relevant_memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_extractor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;detect_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;extract_result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ExtractResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ExtractResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;extract_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;should_write_memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_semantic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;extract_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_to_write&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, we put the memories retrieved from RedisVL into &lt;code&gt;Context&lt;/code&gt; as extra context. These memory messages get merged into the chat agent's history and give it extra background for producing answers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_context_prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;line_sep_memories&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line_sep_memories&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we build a simple chat agent to test the new long-term memory module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAILikeChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Qwen3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MAX&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;as_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context_provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;LongTermMemory&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_new_thread&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Assistant: &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we chat with the agent and see how it works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcnctscvwqhjh6w2zn5n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcnctscvwqhjh6w2zn5n.png" alt="The yellow parts are memories fetched from RedisVL, and the green parts are memories extracted by the LLM. Image by Author" width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From MLflow we can see that the retrieved memories go into the chat history as a separate message:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9h8qiasa2ggguho0xm7b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9h8qiasa2ggguho0xm7b.png" alt="The retrieved memory will be added to the chat history as a separate message. Image by Author" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then we check Redis and see what memories we saved:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrir4vlqejr2vekvt8eg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrir4vlqejr2vekvt8eg.png" alt="The info that the LLM pulls out from the user’s past conversations will be saved into Redis as valuable memories. Image by Author" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the chat goes on, the new long-term memory module no longer stores and retrieves the entire chat history indiscriminately. It keeps and retrieves only the memories that matter to the user, and those memories provide real help in later chats.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use concurrency to speed up
&lt;/h3&gt;

&lt;p&gt;Everything looks fine except for the part where we use an LLM to extract memories that deserve saving.&lt;/p&gt;

&lt;p&gt;The largest delays in an agent usually come from LLM calls, and we have just added one more. Worse, we have to wait for the LLM to decide whether to save a memory before the real chat can continue.&lt;/p&gt;

&lt;p&gt;We can add some logs and see how much delay we add:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgopspxtr21l7dmgrpqrp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgopspxtr21l7dmgrpqrp.png" alt="Every time we use the LLM to pull out memories, it takes a bit more than one second. Image by Author" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We add more than one second per chat turn.&lt;/p&gt;

&lt;p&gt;One way to optimize is to use a smaller model like &lt;code&gt;qwen3-8b&lt;/code&gt;, but the gain is limited: we save little time, and the smaller model hurts memory quality.&lt;/p&gt;

&lt;p&gt;Today I take a different route: concurrent programming, so that the LLM call for memory extraction and the LLM call for the user's reply run at the same time.&lt;/p&gt;

&lt;p&gt;Let us see the result after that change:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72jeq2os0151pg3sta11.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72jeq2os0151pg3sta11.png" alt="The memory retrieval step basically doesn’t take any time at all, so how did I pull that off? Image by Author" width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The time cost of extracting and storing memories drops to almost nothing while the effect stays the same. How did we get there?&lt;/p&gt;

&lt;p&gt;If you have built multi-agent workflows with LangGraph or LlamaIndex, you have likely seen the fan-out pattern: many nodes run at the same time, and you collect the final result at the end.&lt;/p&gt;

&lt;p&gt;The underlying mechanism is Python's &lt;code&gt;asyncio&lt;/code&gt; module; you see &lt;code&gt;async&lt;/code&gt; and &lt;code&gt;await&lt;/code&gt; all over agent code. I have written several posts about &lt;code&gt;asyncio&lt;/code&gt; and concurrency:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/use-these-methods-to-make-your-python-concurrent-tasks-perform-better/" rel="noopener noreferrer"&gt;Use These Methods to Make Your Python Concurrent Tasks Perform Better&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In short, when delays come from long IO calls, concurrent programming is the tool to reach for. The sketch below illustrates the idea.&lt;/p&gt;
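
&lt;p&gt;To make the idea concrete, here is a tiny, self-contained illustration (not code from this project) of how &lt;code&gt;asyncio&lt;/code&gt; lets two slow IO-bound calls overlap instead of running back to back:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio
import time


async def slow_io_call(name: str, seconds: float) -&gt; str:
    # Stand-in for an LLM or network call: sleeping yields control to the event loop.
    await asyncio.sleep(seconds)
    return f"{name} done"


async def main() -&gt; None:
    start = time.perf_counter()
    # Both calls run concurrently; total time is roughly max(1.0, 1.2), not the sum.
    results = await asyncio.gather(
        slow_io_call("extract_memory", 1.0),
        slow_io_call("chat_reply", 1.2),
    )
    print(results, f"elapsed: {time.perf_counter() - start:.1f}s")


if __name__ == "__main__":
    asyncio.run(main())
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;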

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If you use &lt;code&gt;mlflow.openai.autolog()&lt;/code&gt; to trace LLM calls, you may find that concurrent execution stops working. I still do not know why, so I suggest commenting out the MLflow parts before moving on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57jdd3pzzalnd29zmmc2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57jdd3pzzalnd29zmmc2.png" alt="By running the memory retrieval process with concurrent programming. Image by Author" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back to our case: &lt;code&gt;_save_memory&lt;/code&gt; is an &lt;code&gt;async&lt;/code&gt; method, which means we can run it concurrently.&lt;/p&gt;

&lt;p&gt;How do we do that? Very simple: instead of awaiting &lt;code&gt;_save_memory&lt;/code&gt;, we wrap the call in &lt;code&gt;asyncio.create_task&lt;/code&gt; so it runs as a separate concurrent task. That is all.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
        &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_save_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_sep_memories&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since a real user reply usually takes longer than memory extraction, we do not need to await that task in code. We only need to create it (though it is worth holding a reference to the task, as sketched below).&lt;/p&gt;
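
&lt;p&gt;One caveat worth knowing: &lt;code&gt;asyncio&lt;/code&gt; keeps only a weak reference to tasks created with &lt;code&gt;create_task&lt;/code&gt;, so a pure fire-and-forget task can in principle be garbage-collected before it finishes. A common precaution (a sketch, not from this article's code) is to hold the task in a set until it completes:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import asyncio

# Keep strong references to fire-and-forget tasks so they cannot be
# garbage-collected before they finish.
_background_tasks: set[asyncio.Task] = set()


def fire_and_forget(coro) -&gt; None:
    # Must be called from inside a running event loop.
    task = asyncio.create_task(coro)
    _background_tasks.add(task)
    # Drop the reference once the task is done.
    task.add_done_callback(_background_tasks.discard)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;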

&lt;p&gt;With that, we add a memory extraction module that does not bring much extra delay to the agent system.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Redis is already standard infrastructure at many companies. With RedisVL, it can cache and search information semantically, which makes it easier to build short-term and long-term memory on top of Redis.&lt;/p&gt;

&lt;p&gt;But if you build long-term memory with the RedisVL API directly, you may not see good results: the system has no “brain” to judge which information deserves long-term storage and keeps its value over time.&lt;/p&gt;

&lt;p&gt;So in this tutorial, I first used an agent to extract useful memories and only then wrote them into RedisVL. This improves the value of the saved information. Long-term memory now works much better and fills the gap left in my last post.&lt;/p&gt;

&lt;p&gt;I also shared a short guide on using concurrent programming so that multiple LLM calls run at the same time, which cuts system latency significantly. If you are interested in concurrency, my earlier posts cover it in more depth.&lt;/p&gt;

&lt;p&gt;Thanks for reading. If you have any questions or ideas, leave me a note. I will reply as soon as I can.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories/#/portal/signup/" rel="noopener noreferrer"&gt;&lt;strong&gt;Do not forget to subscribe to my blog&lt;/strong&gt;&lt;/a&gt; and follow my new work in AI applications.&lt;/p&gt;

&lt;p&gt;Also share this post with your friends. It may help more people.&lt;/p&gt;




&lt;p&gt;Enjoyed this read? &lt;a href="https://www.dataleadsfuture.com/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;Subscribe now to get more cutting-edge data science tips straight to your inbox!&lt;/strong&gt;&lt;/a&gt; Your feedback and questions are welcome — let’s discuss in the comments below!&lt;/p&gt;

&lt;p&gt;This article was originally published on &lt;a href="https://www.dataleadsfuture.com/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories/" rel="noopener noreferrer"&gt;Data Leads Future&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>programming</category>
      <category>redis</category>
    </item>
    <item>
      <title>Tried RedisVL as a memory layer for AI agents:

✅ Short-term memory with MessageHistory: works great
😒 Long-term semantic memory with SemanticMessageHistory: not so much

If you are thinking about RedisVL for agent memory, this will save you time:</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Thu, 29 Jan 2026 02:49:49 +0000</pubDate>
      <link>https://dev.to/qtalen/tried-redisvl-as-a-memory-layer-for-ai-agents-short-term-memory-with-messagehistory-works-hd3</link>
      <guid>https://dev.to/qtalen/tried-redisvl-as-a-memory-layer-for-ai-agents-short-term-memory-with-messagehistory-works-hd3</guid>
      <description>&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m" class="crayons-story__hidden-navigation-link"&gt;Build Long-Term and Short-Term Memory for Agents Using RedisVL&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/qtalen" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" alt="qtalen profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/qtalen" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Peng Qian
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Peng Qian
                
              
              &lt;div id="story-author-preview-content-3193063" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/qtalen" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Peng Qian&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jan 29&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m" id="article-link-3193063"&gt;
          Build Long-Term and Short-Term Memory for Agents Using RedisVL
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/programming"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;programming&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/datascience"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;datascience&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/redis"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;redis&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt; reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              Comments


              &lt;span class="hidden s:inline"&gt;Add Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            9 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;




</description>
      <category>programming</category>
      <category>ai</category>
      <category>datascience</category>
      <category>redis</category>
    </item>
    <item>
      <title>Build Long-Term and Short-Term Memory for Agents Using RedisVL</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Thu, 29 Jan 2026 02:48:18 +0000</pubDate>
      <link>https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m</link>
      <guid>https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;For this weekend note, I want to share some experiments I ran using RedisVL to add short-term and long-term memory to my agent system.&lt;/p&gt;

&lt;p&gt;TLDR: RedisVL works pretty well for short-term memory. It feels a bit simpler than using the traditional Redis API. For long-term memory with semantic search, the experience is not good. I do not recommend it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why RedisVL?
&lt;/h3&gt;

&lt;p&gt;Big companies like to use mature infrastructure to build new features.&lt;/p&gt;

&lt;p&gt;We know mem0 and Graphiti are good open-source options for long-term agent memory. But companies want to stay safe: building new infrastructure costs money, can be unstable, and needs people who know how to run it.&lt;/p&gt;

&lt;p&gt;So when Redis launched RedisVL with vector search, we naturally wanted to try it first. You can connect it to an existing Redis cluster and start using it right away. That sounds nice, but is it really? We need to try it for real.&lt;/p&gt;

&lt;p&gt;Today I will cover how to use &lt;code&gt;MessageHistory&lt;/code&gt; and &lt;code&gt;SemanticMessageHistory&lt;/code&gt; from RedisVL to add short-term and long-term memory to agents built on the Microsoft Agent Framework.&lt;/p&gt;

&lt;p&gt;You can find the source code at the end of this article.&lt;/p&gt;




&lt;p&gt;📫 &lt;a href="https://www.dataleadsfuture.com/build-long-term-and-short-term-memory-for-agents-using-redisvl/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;Don’t forget to follow my blog to stay updated on my latest progress in AI application practices.&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Preparation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Install Redis
&lt;/h3&gt;

&lt;p&gt;If you want to try it locally, you can install a Redis instance with Docker.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; redis &lt;span class="nt"&gt;-p&lt;/span&gt; 6379:6379 &lt;span class="nt"&gt;-p&lt;/span&gt; 8001:8001 redis/redis-stack:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cannot use Docker Desktop? See my other article.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/a-quick-guide-to-containerizing-agent-applications-with-podman/?source=post_page-----b6919f293d16---------------------------------------" rel="noopener noreferrer"&gt;A Quick Guide to Containerizing Agent Applications with Podman&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Redis instance will listen on ports 6379 and 8001. Your RedisVL client should connect to &lt;code&gt;redis://localhost:6379&lt;/code&gt;. You can visit &lt;code&gt;http://localhost:8001&lt;/code&gt; in the browser to open the Redis console.&lt;/p&gt;
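
&lt;p&gt;If you want to confirm the container is reachable before wiring it into the agent, a quick ping from Python is enough. This sketch assumes the &lt;code&gt;redis&lt;/code&gt; Python package (a RedisVL dependency) is installed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import redis

# Connect to the container started above and send a PING.
client = redis.Redis.from_url("redis://localhost:6379")
print(client.ping())  # True if the instance is up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;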

&lt;h3&gt;
  
  
  Install RedisVL
&lt;/h3&gt;

&lt;p&gt;Install RedisVL with pip.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;redisvl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installation, you can use the RedisVL CLI to manage your indexes and keep your testing neat.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;rvl index listall
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Implement Short-Term Memory Using MessageHistory
&lt;/h2&gt;

&lt;p&gt;There are lots of “How to” RedisVL articles online, so let’s start straight from Microsoft Agent Framework and see how to use &lt;code&gt;MessageHistory&lt;/code&gt; for short-term memory.&lt;/p&gt;

&lt;p&gt;As in the official tutorial, you should implement a &lt;code&gt;RedisVLMessageStore&lt;/code&gt; based on &lt;code&gt;ChatMessageStoreProtocol&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLMessageStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ChatMessageStoreProtocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;common_thread&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis://localhost:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_thread_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_top_k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_tag&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_redis_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis_url&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_message_history&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In &lt;code&gt;__init__&lt;/code&gt;, you should pay attention to two parameters.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;thread_id&lt;/code&gt; is used for the &lt;code&gt;name&lt;/code&gt; parameter when creating &lt;code&gt;MessageHistory&lt;/code&gt;. I like to bind it to the agent. Each agent gets a unique &lt;code&gt;thread_id&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;session_tag&lt;/code&gt; lets you set a tag for each user so different sessions do not mix.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The protocol asks us to implement two methods &lt;code&gt;list_messages&lt;/code&gt; and &lt;code&gt;add_messages&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;list_messages&lt;/code&gt; runs before the agent calls the LLM. It gets all available chat messages from the message store. It takes no parameters, so it cannot support long-term memory. More on that later.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;add_messages&lt;/code&gt; runs after the agent gets the LLM’s reply. It stores new messages into the message store.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is how the message store works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgtw2bb5qyugnkn5hrm0k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgtw2bb5qyugnkn5hrm0k.png" alt="The calling order of message store in the agent. Image by Author" width="686" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So in &lt;code&gt;list_messages&lt;/code&gt; and &lt;code&gt;add_messages&lt;/code&gt;, we just use RedisVL’s &lt;code&gt;MessageHistory&lt;/code&gt; to do the job.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;list_messages&lt;/code&gt; below uses &lt;code&gt;get_recent&lt;/code&gt; to get &lt;code&gt;top_k&lt;/code&gt; recent messages and turns them into &lt;code&gt;ChatMessage&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLMessageStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ChatMessageStoreProtocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_message_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_recent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_top_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_back_to_chat_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our &lt;code&gt;add_messages&lt;/code&gt; turns each &lt;code&gt;ChatMessage&lt;/code&gt; into a Redis message and calls &lt;code&gt;MessageHistory&lt;/code&gt;'s &lt;code&gt;add_messages&lt;/code&gt; to store them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLMessageStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ChatMessageStoreProtocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_to_redis_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_message_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is short-term memory done with RedisVL. You may also implement &lt;code&gt;deserialize&lt;/code&gt;, &lt;code&gt;serialize&lt;/code&gt; and &lt;code&gt;update_from_state&lt;/code&gt; for saving and loading the memory, but it is not important now. See the full code at the end.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test RedisVLMessageStore
&lt;/h3&gt;

&lt;p&gt;Let’s build an agent and test the message store.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAILikeChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Qwen3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NEXT&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a little helper who answers my questions in one sentence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chat_message_store_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;RedisVLMessageStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_abc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now a console loop for multi-turn dialog. Remember, Microsoft Agent Framework does not support short-term memory unless you use an &lt;code&gt;AgentThread&lt;/code&gt; and pass it to &lt;code&gt;run&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_new_thread&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Assistant: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When the &lt;code&gt;AgentThread&lt;/code&gt; is created, it calls the factory method to build the &lt;code&gt;RedisVLMessageStore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To check if the store works, we can use &lt;code&gt;mlflow.openai.autolog()&lt;/code&gt; to see if messages sent to the LLM contain historical messages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;
&lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_tracking_uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MLFLOW_TRACKING_URI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;autolog&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wyl7jsqr7yjpa0dsesu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wyl7jsqr7yjpa0dsesu.png" alt="You can see that the conversation comes with a complete history of messages. Image by Author" width="720" height="648"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See my other article on using MLflow to trace LLM calls.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/monitoring-qwen-3-agents-with-mlflow-3-x-end-to-end-tracking-tutorial/" rel="noopener noreferrer"&gt;Monitoring Qwen 3 Agents with MLflow 3.x: End-to-End Tracing Tutorial&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s open the Redis console to see the cache.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhxotsxih8tzwnl4mj0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhxotsxih8tzwnl4mj0w.png" alt="How the cache is stored in Redis. Image by Author" width="720" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, after using &lt;code&gt;MessageHistory&lt;/code&gt; as MAF's message store, we can implement multi-turn conversations with historical messages.&lt;/p&gt;

&lt;p&gt;With the &lt;code&gt;thread_id&lt;/code&gt; and &lt;code&gt;session_tag&lt;/code&gt; parameters, we can also let users switch between multiple conversation sessions, as in popular LLM chat applications.&lt;/p&gt;
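
&lt;p&gt;For example, here is a minimal sketch of handing each user their own store so histories never mix; &lt;code&gt;make_store_factory&lt;/code&gt; is just an illustrative helper of mine, not part of the framework.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def make_store_factory(user_id: str):
    # Each user gets a distinct session_tag so conversations never mix;
    # the agent itself keeps a single thread_id.
    return lambda: RedisVLMessageStore(
        thread_id="assistant_thread",
        session_tag=f"session_{user_id}",
    )

# Pass it to create_agent exactly as before:
# chat_message_store_factory=make_store_factory("user_abc")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;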

&lt;p&gt;Feels simpler than the official &lt;code&gt;RedisMessageStore&lt;/code&gt; solution, right?&lt;/p&gt;




&lt;h2&gt;
  
  
  Implement Long-Term Memory Using SemanticMessageHistory
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;SemanticMessageHistory&lt;/code&gt; is a subclass of &lt;code&gt;MessageHistory&lt;/code&gt;. It adds a &lt;code&gt;get_relevant&lt;/code&gt; method for vector search.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;what have I learned about the size of England?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;semantic_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_distance_threshold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.35&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;semantic_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_relevant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tex"&gt;&lt;code&gt;Batches: 100&lt;span class="c"&gt;%|██████████| 1/1 [00:00&amp;lt;00:00, 56.30it/s]&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;'role': 'user', 'content': 'what is the size of England compared to Portugal?'&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compared to &lt;code&gt;MessageHistory&lt;/code&gt;, the big difference here is that we can retrieve the most relevant historical messages based on the user's request.&lt;/p&gt;

&lt;p&gt;You might think that if &lt;code&gt;MessageStore&lt;/code&gt; short-term memory is nice, then &lt;code&gt;SemanticMessageHistory&lt;/code&gt; with semantic search must be even better.&lt;/p&gt;

&lt;p&gt;From my experience and test results, that is not the case. Let’s now build a long-term memory adapter for Microsoft Agent Framework using &lt;code&gt;SemanticMessageHistory&lt;/code&gt; and see the result.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use SemanticMessageHistory in Microsoft Agent Framework
&lt;/h3&gt;

&lt;p&gt;Earlier I said &lt;code&gt;list_messages&lt;/code&gt; in &lt;code&gt;ChatMessageStoreProtocol&lt;/code&gt; has no parameters, so we cannot search history. Thus, we cannot use &lt;code&gt;MessageStore&lt;/code&gt; for long-term memory.&lt;/p&gt;

&lt;p&gt;Microsoft Agent Framework has a &lt;code&gt;ContextProvider&lt;/code&gt; class. As the name suggests, it is meant for context engineering.&lt;/p&gt;

&lt;p&gt;So we should build long-term memory on this class.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLSemanticMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;distance_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis://localhost:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;embedding_model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BAAI/bge-m3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;embedding_api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;embedding_endpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_thread_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic_thread&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_tag&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_distance_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;distance_threshold&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_redis_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis_url&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_model&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_api_key&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EMBEDDING_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_endpoint&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EMBEDDING_ENDPOINT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_semantic_store&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ContextProvider&lt;/code&gt; has two methods &lt;code&gt;invoked&lt;/code&gt; and &lt;code&gt;invoking&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;invoked&lt;/code&gt; runs after the LLM call. It stores the latest messages in RedisVL. It receives both &lt;code&gt;request_messages&lt;/code&gt; and &lt;code&gt;response_messages&lt;/code&gt; parameters, and each message is stored as a separate entry.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;invoking&lt;/code&gt; runs before the LLM call. It uses the user’s current input to search for relevant history in RedisVL and returns a &lt;code&gt;Context&lt;/code&gt; object.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;Context&lt;/code&gt; object has three fields, shown in the list below and in the small sketch that follows it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;instructions&lt;/code&gt; string. The agent adds this to the system prompt.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;messages&lt;/code&gt; list. Put history messages found in long-term memory here.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tools&lt;/code&gt; list for functions. The agent adds these tools to its &lt;code&gt;ChatOptions&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;
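
&lt;p&gt;To make the three fields concrete, here is a small sketch of building such a &lt;code&gt;Context&lt;/code&gt;. The field names come from the list above; treating all three as constructor keyword arguments is my assumption (the implementation later in this article only passes &lt;code&gt;messages&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# A hedged sketch: relevant_messages stands for the ChatMessage list
# retrieved from long-term memory (see the invoking implementation later).
relevant_messages = []  # placeholder for the retrieved history

context = Context(
    instructions="Prefer the user's stored preferences when answering.",  # appended to the system prompt
    messages=relevant_messages,  # history found in long-term memory
    tools=[],                    # extra tools added to the agent's ChatOptions
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;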

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanoep9ppsgwxkvqrpf5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanoep9ppsgwxkvqrpf5h.png" alt="The purpose of the three types of messages retrieved. Image by Author" width="720" height="139"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since we want to use vector search to get relevant history, we put those messages in &lt;code&gt;messages&lt;/code&gt;. The ordering between &lt;code&gt;MessageStore&lt;/code&gt; messages and &lt;code&gt;ContextProvider&lt;/code&gt; messages matters; here is the order in which they are called.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbiqyfu1xp0kqbrrikkv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbiqyfu1xp0kqbrrikkv.png" alt="The calling order of long-term and short-term memory in the agent. Image by Author" width="720" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up a TextVectorizer
&lt;/h3&gt;

&lt;p&gt;Semantic vector search needs embeddings. We must set up a vectorizer.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;__init__&lt;/code&gt;, besides &lt;code&gt;thread_id&lt;/code&gt; and &lt;code&gt;session_tag&lt;/code&gt;, we set the embedding model info.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLSemanticMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_init_semantic_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;vectorizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HFTextVectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;vectorizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAITextVectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;api_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_api_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;base_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_endpoint&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_semantic_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SemanticMessageHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;distance_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_distance_threshold&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_redis_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;vectorizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vectorizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I can choose a server-hosted embedding model with OpenAI API or a local HuggingFace model, depending on whether &lt;code&gt;embedding_api_key&lt;/code&gt; is set.&lt;/p&gt;
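
&lt;p&gt;To make that concrete, here is a minimal sketch of both configurations using the constructor defined above; the API key and endpoint values are placeholders.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# No API key set: falls back to a local HuggingFace embedding model.
local_memory = RedisVLSemanticMemory(
    session_tag="user_abc",
    embedding_model="BAAI/bge-m3",
)

# API key set: uses a server-hosted, OpenAI-compatible embedding endpoint.
hosted_memory = RedisVLSemanticMemory(
    session_tag="user_abc",
    embedding_model="BAAI/bge-m3",
    embedding_api_key="sk-...",                        # placeholder
    embedding_endpoint="https://api.example.com/v1",   # placeholder
)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;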

&lt;h3&gt;
  
  
  Implement invoked and invoking methods
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;invoked&lt;/code&gt; is easy. As mentioned, the request and response messages arrive as separate parameters, so I merge them into one list and then call &lt;code&gt;add_messages&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLSemanticMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;request_messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;response_messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;invoke_exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;request_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;request_messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;response_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;response_messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;chat_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request_messages&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response_messages&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_to_redis_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chat_messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_semantic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is &lt;code&gt;invoking&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLSemanticMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# 1
&lt;/span&gt;            &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
                            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_semantic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_relevant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="c1"&gt;# 2
&lt;/span&gt;        &lt;span class="n"&gt;relevant_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_back_to_chat_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                             &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;relevant_messages&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;relevant_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# 3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Points to note:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;messages&lt;/code&gt; parameter may be a list for multi-modal input, so I merge all the text into one prompt.&lt;/li&gt;
&lt;li&gt;Since messages are stored separately, I sort them by timestamp to keep the original order.&lt;/li&gt;
&lt;li&gt;The retrieved messages go into &lt;code&gt;Context.messages&lt;/code&gt;, so they are appended after the current chat messages.&lt;/li&gt;
&lt;/ul&gt;
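
&lt;p&gt;The &lt;code&gt;_back_to_chat_message&lt;/code&gt; helper simply converts a raw entry from the semantic store back into a &lt;code&gt;ChatMessage&lt;/code&gt;. Here is a minimal sketch (the &lt;code&gt;role&lt;/code&gt;/&lt;code&gt;content&lt;/code&gt; key names are assumptions about what RedisVL returns):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    def _back_to_chat_message(self, message: dict) -&amp;gt; ChatMessage:
        # Rough sketch: rebuild a ChatMessage from the raw dict the semantic
        # store returns. The "role" and "content" keys are assumptions.
        role = Role.USER if message.get("role") == "user" else Role.ASSISTANT
        return ChatMessage(role=role, text=message.get("content", ""))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;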

&lt;h3&gt;
  
  
  Test semantic memory
&lt;/h3&gt;

&lt;p&gt;Unlike the message store, we can set the &lt;code&gt;ContextProvider&lt;/code&gt; directly on the agent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memory_provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RedisVLSemanticMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_abc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;distance_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAILikeChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Qwen3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NEXT&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a little helper who answers my questions in one sentence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context_providers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now let's write a &lt;code&gt;main&lt;/code&gt; that uses a &lt;code&gt;thread&lt;/code&gt; instance to keep short-term memory while testing a multi-turn dialog.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_new_thread&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;memory_provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dl4vlwzwl7cxkypuya8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dl4vlwzwl7cxkypuya8.png" alt="The  raw `distance_threshold` endraw  is too high, causing irrelevant messages to be retrieved. Image by Author" width="622" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It seems the default &lt;code&gt;distance_threshold&lt;/code&gt; of 0.3 is too high. Let's lower it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memory_provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RedisVLSemanticMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_abc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;distance_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test again:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfbxjjkc9a0eplhdu1kn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfbxjjkc9a0eplhdu1kn.png" alt="Only request messages were retrieved, not response messages. Image by Author" width="720" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The lower threshold keeps unrelated messages out. But because requests and responses are stored separately, only the requests are retrieved. The ContextProvider appends the retrieved messages to the end of the message list, so the LLM may think the user asked two similar questions. The MLflow trace confirms it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhznpnvj5gky9o7nunjj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhznpnvj5gky9o7nunjj.png" alt="Two similar questions were both added to the message list, but without attaching the already provided answers. Image by Author" width="720" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is bad. We care more about the LLM’s answers than the requests, but vector search usually matches the questions, not the answers. So this only adds redundant questions and does not help the LLM answer.&lt;/p&gt;

&lt;p&gt;It’s hard to say whether the fault lies with Microsoft Agent Framework or with RedisVL.&lt;/p&gt;

&lt;p&gt;When the &lt;code&gt;ContextProvider&lt;/code&gt; finds related long-term chat messages, they are placed after the ones from the message store. If long-term and short-term messages overlap, they can confuse the LLM.&lt;/p&gt;

&lt;p&gt;Also, the fact that RedisVL does not store requests and responses together is a design choice I don’t like. LLM responses are the expensive part: in production, a response may involve web search, RAG retrieval, or running code. Yet vector search finds only the request, not the answer. That is a waste.&lt;/p&gt;
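
&lt;p&gt;One possible workaround (not something RedisVL provides out of the box) is to write each request and its response into the semantic store as a single combined entry, so a semantic hit on the question also brings back the answer. A rough sketch of such a helper on the provider, assuming RedisVL's &lt;code&gt;add_message(dict)&lt;/code&gt; API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    def _store_exchange(self, request: ChatMessage, response: ChatMessage) -&amp;gt; None:
        # Hypothetical workaround: save the question and its answer as one entry,
        # so retrieving the question also surfaces the answer it already got.
        combined = f"User: {request.text}\nAssistant: {response.text}"
        self._semantic_store.add_message(
            {"role": "assistant", "content": combined},
            session_tag=self._session_tag,
        )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;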




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Today, we tried using RedisVL for short-term and long-term memory in Microsoft Agent Framework and checked the results.&lt;/p&gt;

&lt;p&gt;RedisVL is very handy for short-term agent memory. It is simpler than using the Redis API.&lt;/p&gt;

&lt;p&gt;But &lt;code&gt;SemanticMessageHistory&lt;/code&gt;, used for semantic search over the user's chat history, did not perform well, for the reasons explained above.&lt;/p&gt;

&lt;p&gt;Thanks to the solid Redis infrastructure, building a semantic cache with RedisVL is simpler than with other vector solutions.&lt;/p&gt;

&lt;p&gt;Next time, I will show you how to build a semantic cache with RedisVL that can save your company significant costs.&lt;/p&gt;

&lt;p&gt;Share your thoughts in the comments.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://www.dataleadsfuture.com/build-long-term-and-short-term-memory-for-agents-using-redisvl/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;Subscribe to my blog&lt;/strong&gt;&lt;/a&gt; to follow my latest agent app work.&lt;/p&gt;

&lt;p&gt;And share this article with friends. Maybe it will help more people.😁&lt;/p&gt;




&lt;p&gt;Enjoyed this read? &lt;a href="https://www.dataleadsfuture.com/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;Subscribe now to get more cutting-edge data science tips straight to your inbox!&lt;/strong&gt;&lt;/a&gt; Your feedback and questions are welcome — let’s discuss in the comments below!&lt;/p&gt;

&lt;p&gt;This article was originally published on &lt;a href="https://www.dataleadsfuture.com/build-long-term-and-short-term-memory-for-agents-using-redisvl/" rel="noopener noreferrer"&gt;Data Leads Future&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>datascience</category>
      <category>redis</category>
    </item>
    <item>
      <title>🎯As companies keep rolling out agent systems internally, issues like prompt injection attacks and tricking agents into breaking the rules are starting to pop up. 

It’s becoming urgent to build compliance safeguards for these agents.</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Fri, 09 Jan 2026 11:12:51 +0000</pubDate>
      <link>https://dev.to/qtalen/as-companies-keep-rolling-out-agent-systems-internally-issues-like-prompt-injection-attacks-and-24j</link>
      <guid>https://dev.to/qtalen/as-companies-keep-rolling-out-agent-systems-internally-issues-like-prompt-injection-attacks-and-24j</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/qtalen" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" alt="qtalen"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/qtalen/microsoft-agent-framework-maf-middleware-basics-add-compliance-fences-to-your-agent-37o0" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Microsoft Agent Framework (MAF) Middleware Basics: Add Compliance Fences to Your Agent&lt;/h2&gt;
      &lt;h3&gt;Peng Qian ・ Jan 9&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#programming&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#tutorial&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#agentaichallenge&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>programming</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>agentaichallenge</category>
    </item>
    <item>
      <title>Microsoft Agent Framework (MAF) Middleware Basics: Add Compliance Fences to Your Agent</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Fri, 09 Jan 2026 10:22:10 +0000</pubDate>
      <link>https://dev.to/qtalen/microsoft-agent-framework-maf-middleware-basics-add-compliance-fences-to-your-agent-37o0</link>
      <guid>https://dev.to/qtalen/microsoft-agent-framework-maf-middleware-basics-add-compliance-fences-to-your-agent-37o0</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Microsoft added a middleware feature to their Agent Framework (MAF). This means you can use the chain-of-responsibility design pattern to add extra logic before or after the agent runs, a function tool is called, or the LLM is invoked, without changing the original business logic.&lt;/p&gt;

&lt;p&gt;This feature matters a lot.&lt;/p&gt;

&lt;p&gt;When building enterprise-level agent applications, teams typically collaborate across departments. Besides your own part, you might need to dynamically include permissions, logs, finance checks, and compliance reviews from other teams.&lt;/p&gt;

&lt;p&gt;These parts shouldn’t affect the agent’s core ability, but should be easy to install or remove. Like middleware in FastAPI or other web frameworks, MAF middleware enables this capability for agents as well.&lt;/p&gt;

&lt;p&gt;In today’s guide, I’ll show how I use MAF’s middleware and AG-UI to add a compliance review that checks user input before sending it to the agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjks1u0jkkizzwsnitsy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjks1u0jkkizzwsnitsy.png" alt="Inducing an agent will be blocked by compliance rules specific to certain business scenarios. Image by Author" width="800" height="548"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This will teach you how to use middleware in enterprise agent applications and give you a first look at using AG-UI for microservice distributed agent development. Let’s start.&lt;/p&gt;

&lt;p&gt;You can get all the source code at the end. 👇&lt;/p&gt;




&lt;p&gt;📫 &lt;a href="https://www.dataleadsfuture.com/microsoft-agent-framework-maf-middleware-basics-add-compliance-fences-to-your-agent/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;Don’t forget to follow my blog to stay updated on my latest progress in AI application practices.&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  System Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Install the latest Microsoft Agent Framework
&lt;/h3&gt;

&lt;p&gt;MAF is still evolving quickly and its APIs change often, so this guide uses the newest version. It’s best to install the prerelease version:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;agent-framework &lt;span class="nt"&gt;--pre&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or add the dependency in your &lt;code&gt;pyproject.toml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tex"&gt;&lt;code&gt;"agent-framework-ag-ui&amp;gt;=1.0.0b251223"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Install Microsoft Agent Framework AG-UI
&lt;/h3&gt;

&lt;p&gt;MAF works with AG-UI to support distributed agent development. You’ll need this capability today, so install the latest version of ag-ui; otherwise, APIs won’t match up.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tex"&gt;&lt;code&gt;"agent-framework-ag-ui&amp;gt;=1.0.0b251223"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installing the needed Python packages, we can move on. First, let's get a quick background on what middleware is and what it can do.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Intro to MAF Middleware
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is middleware
&lt;/h3&gt;

&lt;p&gt;According to the MAF documentation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Middleware in the Agent Framework intercepts, changes, and enhances agent behavior at different execution points. You can use it for logging, security checks, error handling, and result transformation without changing the agent’s or function’s core logic.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s what we’ll learn today.&lt;/p&gt;

&lt;h3&gt;
  
  
  How middleware works
&lt;/h3&gt;

&lt;p&gt;As I said before, MAF middleware uses the chain-of-responsibility pattern. Each piece of logic lives in its own node. Every node knows the next one. When a node finishes running, it passes control to the next node.&lt;/p&gt;

&lt;p&gt;Here’s a simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logging_agent_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentRunContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;AgentRunContext&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Agent] Starting execution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Agent] Execution completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The next parameter points to the next node. You can run code before or after calling it.&lt;/p&gt;

&lt;p&gt;The actual agent logic acts as the last node. After all middleware nodes finish, the agent runs.&lt;/p&gt;

&lt;p&gt;In MAF, middleware can run in three stages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before or after run or run_stream.&lt;/li&gt;
&lt;li&gt;Before or after a function call.&lt;/li&gt;
&lt;li&gt;Before or after calling the LLM.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0ahstals97y7enjrmbj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0ahstals97y7enjrmbj.png" alt="The Microsoft Agent Framework middleware works at different stages of agent execution. Image by Author" width="735" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let’s look at how different middleware types work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Function-Based middleware
&lt;/h3&gt;

&lt;p&gt;If your middleware is simple, like just logging agent runs, use a function-based middleware.&lt;/p&gt;

&lt;p&gt;You only need a function with two parameters: context and next. The context keeps your runtime info, and next calls the next node.&lt;/p&gt;

&lt;p&gt;MAF uses the type annotation of context to tell which stage this code belongs to. For example, if it runs at the agent stage, the type should be AgentRunContext:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logging_agent_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentRunContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;AgentRunContext&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Agent] Starting execution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Agent] Execution completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a function call stage, use FunctionInvocationContext:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logging_function_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;FunctionInvocationContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;FunctionInvocationContext&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Function] Calling &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Function] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And for the chat stage, use ChatContext:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logging_chat_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;ChatContext&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Chat] Sending &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; messages to AI.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Chat] AI response received.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you dislike type annotations, you can use decorators.&lt;/p&gt;

&lt;p&gt;@agent_middleware runs at the agent stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@agent_middleware&lt;/span&gt;    
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logging_agent_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Agent] Starting execution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Agent] Execution completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you don’t need to add type annotations anymore.&lt;/p&gt;

&lt;p&gt;There are also @function_middleware and @chat_middleware for function calls and chat calls.&lt;/p&gt;
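
&lt;p&gt;For example, the earlier function-logging middleware could be written with the decorator instead of the type annotation, something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;@function_middleware
async def logging_function_middleware(context, next) -&amp;gt; None:
    # Same logic as the annotated version above; the stage is chosen by the decorator.
    print(f"[Function] Calling {context.function.name}")
    await next(context)
    print(f"[Function] {context.function.name} completed")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;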

&lt;p&gt;If your middleware needs to save state or handle more complex logic, function-based won’t be enough. Use class-based middleware.&lt;/p&gt;

&lt;h3&gt;
  
  
  Class-Based middleware
&lt;/h3&gt;

&lt;p&gt;Class-based middleware organizes code with object-oriented methods. That lets middleware remember state and handle tricky logic.&lt;/p&gt;

&lt;p&gt;A class-based middleware must meet two rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Inherit from the right base class: AgentMiddleware, FunctionMiddleware, or ChatMiddleware.&lt;/li&gt;
&lt;li&gt;Have a process method with the same parameters as the function-based ones. They use the same contexts.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s an example for a middleware class that runs at the function call stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LoggingFunctionMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FunctionMiddleware&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;FunctionInvocationContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;FunctionInvocationContext&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Function Class] Calling &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Function Class] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; completed.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Just make sure to pair the right base class with the right context type. The others follow the same rule.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to use middleware
&lt;/h3&gt;

&lt;p&gt;There are three stages for middleware and three ways to build it. Let’s put that in one grid chart to see how they connect.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g7tn6xumfcl4srx01ej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g7tn6xumfcl4srx01ej.png" alt="Use a grid chart to describe the implementations of different middleware. Image by Author" width="771" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For now, the framework only supports passing middleware when creating the agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;middleware&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;logging_agent_middleware&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;LoggingFunctionMiddleware&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;  &lt;span class="n"&gt;logging_chat_middleware&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;blocking_middleware&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logging_function_middleware&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
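
&lt;p&gt;The &lt;code&gt;blocking_middleware&lt;/code&gt; in the list above is simply a middleware that short-circuits the chain: it skips &lt;code&gt;next&lt;/code&gt; and sets a result directly. A minimal sketch, assuming the agent-stage context exposes &lt;code&gt;messages&lt;/code&gt; and &lt;code&gt;result&lt;/code&gt; the same way the chat-stage context does (the trigger word and reply text are just placeholders):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;async def blocking_middleware(
    context: AgentRunContext,
    next: Callable[[AgentRunContext], Awaitable[None]],
) -&amp;gt; None:
    # Sketch: if the request should be blocked, never call next() and set a
    # canned result so the caller still receives a response.
    last_text = context.messages[-1].text if context.messages else ""
    if "guarantee" in last_text.lower():
        context.result = AgentRunResponse(
            messages=[ChatMessage(role=Role.ASSISTANT, text="Sorry, I can't help with that.")]
        )
        return
    await next(context)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;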



&lt;p&gt;You can mix all nine types freely.&lt;/p&gt;

&lt;p&gt;But note that &lt;strong&gt;only the last function middleware you add actually works right now. I’m not sure if that’s a bug, but we’ll find out later.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Practice: Add Compliance Check to Your Agent
&lt;/h2&gt;

&lt;p&gt;Now let’s get hands-on. I’ll show how to use MAF middleware to add compliance checking to an agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why add compliance checks
&lt;/h3&gt;

&lt;p&gt;Every LLM already ships with basic compliance safeguards based on local laws. When companies self-host LLMs, they also add custom checks in serving frameworks like vLLM. But those only watch the model’s raw input or output.&lt;/p&gt;

&lt;p&gt;Now that agents are everywhere, we also need checks at the agent level: preventing prompt injection, checking MCP permissions, and so on. Middleware makes this possible.&lt;/p&gt;

&lt;p&gt;In today’s demo, we’ll review every user message to make sure no one tries to make our finance assistant promise investment returns.&lt;/p&gt;

&lt;p&gt;In the end, the agent will refuse to answer questions like “Will I lose money?” or “Can you guarantee profit?”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2uk14uy0lkxmavhew86.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2uk14uy0lkxmavhew86.png" alt="Inducing an agent will be blocked by compliance rules specific to certain business scenarios. Image by Author" width="800" height="548"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How will you do it
&lt;/h3&gt;

&lt;p&gt;Why use compliance checks as an example? Because in real web apps, product teams don’t manage compliance themselves. The compliance department creates the rules and sends them as microservices to each product.&lt;/p&gt;

&lt;p&gt;That way, teams don’t touch those rules. They just plug them in using framework middleware. It’s common in normal web apps.&lt;/p&gt;

&lt;p&gt;We’ll do the same with MAF agents, using middleware to insert compliance logic.&lt;/p&gt;

&lt;p&gt;To simulate real setups, this project has two parts: one server and one client.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kyxv1p1e9x322w4yq8i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kyxv1p1e9x322w4yq8i.png" alt="The compliance check middleware will include both server and client modules. Image by Author" width="671" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the compliance department side, we’ll deploy a separate agent that reviews messages. It uses an LLM to check user inputs for prompt injections or non-compliant content.&lt;/p&gt;

&lt;p&gt;On the business side, we’ll have a middleware that intercepts user requests and sends them to that server. It decides whether the agent should respond.&lt;/p&gt;

&lt;p&gt;The two parts communicate using the AG-UI protocol.&lt;/p&gt;

&lt;h3&gt;
  
  
  Server implementation
&lt;/h3&gt;

&lt;p&gt;Let’s build the compliance-checking agent server.&lt;/p&gt;

&lt;p&gt;Since it only checks user requests, I’ll use the Qwen3-30b-a3b-instruct-2507 model for speed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAILikeChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Qwen3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Q30B_A3B&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;dedent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are a compliance review officer. You will review user requests or system-generated text for compliance.
    Your main task is to check user requests and determine whether they are trying to induce the system to produce content that guarantees investment returns or similar topics.

    You should output a JSON text, like {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;is_compliance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 1, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="s"&gt;}

    Here, is_compliance being 1 means compliant, and 0 means non-compliant.

    reason should state the reason for compliance or non-compliance.

    Only output the JSON text without any markdown formatting, and do not add any introduction or explanation.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll make the output structured as JSON for clarity and speed.&lt;/p&gt;

&lt;p&gt;Although MAF supports structured output when using Qwen models:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/make-microsoft-agent-frameworks-structured-output-work-with-qwen-and-deepseek-models/" rel="noopener noreferrer"&gt;Make Microsoft Agent Framework’s Structured Output Work With Qwen and DeepSeek Models&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For some reason, it doesn’t work when used as an AG-UI server.&lt;/p&gt;

&lt;p&gt;So we have to specify the format in the prompt.&lt;/p&gt;
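
&lt;p&gt;On the client side, the middleware later parses that JSON into a &lt;code&gt;ReviewResults&lt;/code&gt; Pydantic model. A minimal definition matching the schema in the prompt could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pydantic import BaseModel


class ReviewResults(BaseModel):
    # Mirrors the JSON the compliance agent is instructed to return.
    is_compliance: int  # 1 = compliant, 0 = non-compliant
    reason: str
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;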

&lt;p&gt;Next, use add_agent_framework_fastapi_endpoint from agent_framework_ag_ui to register it with FastAPI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AG-UI Server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;add_agent_framework_fastapi_endpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/compliance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, run it with uvicorn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uvicorn&lt;/span&gt;
    &lt;span class="n"&gt;uvicorn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8888&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Middleware implementation
&lt;/h3&gt;

&lt;p&gt;This middleware is more complex, so we’ll use class-based middleware.&lt;/p&gt;

&lt;p&gt;Here’s the full code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ComplianceCheckMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ChatMiddleware&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_compliant_agent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;ChatContext&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;check_result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ReviewResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get_compliance_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;check_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_compliance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_output_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;😒We can’t keep providing the service because:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;check_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_output_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_streaming&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;#4
&lt;/span&gt;            &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;output_stream&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AsyncIterable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;AgentRunResponseUpdate&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nc"&gt;AgentRunResponseUpdate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;output_stream&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentRunResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ASSISTANT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_compliance_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ReviewResults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#2
&lt;/span&gt;
        &lt;span class="n"&gt;check_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ReviewResults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#3
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;check_result&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_init_compliant_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AGUIChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;  &lt;span class="c1"&gt;#1
&lt;/span&gt;            &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:8888/compliance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compliance_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You’re a compliance officer, and you review user requests.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few details to watch:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;_init_compliant_agent creates the AG-UI client, which you then use just like a normal chat client.&lt;/li&gt;
&lt;li&gt;I send the last five user messages to improve review accuracy. &lt;strong&gt;But the AgentMiddleware context only holds the latest message; to get the full message history, you must use ChatMiddleware.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Since AG-UI doesn’t support response_format, I parse JSON manually.&lt;/li&gt;
&lt;li&gt;_output_result sends a text refusal when a check fails, switching between streaming and non-streaming output based on context.is_streaming (see the sketch after this list).&lt;/li&gt;
&lt;/ol&gt;
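
&lt;p&gt;For reference, here’s a rough sketch of the _output_result helper. How a middleware hands the refusal back to the caller (I write to context.result below) and the streaming update type are assumptions for illustration, not the framework’s exact API:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;# A minimal sketch; context.result and AgentRunResponseUpdate are assumptions
# and may not match the middleware API exactly. It also assumes
# AgentRunResponse, ChatMessage and TextContent are imported from agent_framework.
async def _output_result(self, context, reason: str) -&amp;gt; None:
    text = f"Sorry, I can't help with that request: {reason}"
    if context.is_streaming:
        # streaming callers expect incremental chunks, so wrap the text in an async generator
        async def _stream():
            yield AgentRunResponseUpdate(contents=[TextContent(text=text)])
        context.result = _stream()
    else:
        # non-streaming callers expect one complete response object
        context.result = AgentRunResponse(
            messages=[ChatMessage(role="assistant", text=text)]
        )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;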

&lt;p&gt;Now we can make a business agent. Use a bigger model and a normal system prompt; just remember to load the ComplianceCheckMiddleware.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;chat_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAILikeChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Qwen3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NEXT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant. Answer the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s question in short and simple words.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;middleware&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;ComplianceCheckMiddleware&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s test it with a multi-turn chat client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_new_thread&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Assistant: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll see the agent chats normally most of the time.&lt;/p&gt;

&lt;p&gt;If you ask about guaranteed returns, it refuses to answer but continues working fine afterward.&lt;/p&gt;

&lt;p&gt;Task done.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this guide, we explored how middleware works in Microsoft Agent Framework.&lt;/p&gt;

&lt;p&gt;Middleware lets us add new logic for logging, permissions, or compliance without touching the main agent code or prompt text.&lt;/p&gt;

&lt;p&gt;In the project section, I used class-based middleware to show how to review user inputs for compliance.&lt;/p&gt;

&lt;p&gt;We also took a quick look at AG-UI for building agent microservices. This helps when multiple teams need their agents to collaborate, and I’ll cover AG-UI and A2A in detail later.&lt;/p&gt;

&lt;p&gt;If you have questions or want to learn more, leave a comment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.dataleadsfuture.com/microsoft-agent-framework-maf-middleware-basics-add-compliance-fences-to-your-agent/#/portal/signup" rel="noopener noreferrer"&gt;Don’t forget to subscribe to my blog&lt;/a&gt; and share this article with your friends—maybe it’ll help someone build smarter agents 😁.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Enjoyed this read? &lt;a href="https://www.dataleadsfuture.com/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;Subscribe now to get more cutting-edge data science tips straight to your inbox!&lt;/strong&gt;&lt;/a&gt; Your feedback and questions are welcome — let’s discuss in the comments below!&lt;/p&gt;

&lt;p&gt;This article was originally published on &lt;a href="https://www.dataleadsfuture.com/microsoft-agent-framework-maf-middleware-basics-add-compliance-fences-to-your-agent/" rel="noopener noreferrer"&gt;Data Leads Future&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>agentaichallenge</category>
    </item>
    <item>
      <title>My Agent System Looks Powerful but Is Just Industrial Trash</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Tue, 30 Dec 2025 06:42:22 +0000</pubDate>
      <link>https://dev.to/qtalen/my-agent-system-looks-powerful-but-is-just-industrial-trash-d10</link>
      <guid>https://dev.to/qtalen/my-agent-system-looks-powerful-but-is-just-industrial-trash-d10</guid>
      <description>&lt;p&gt;This weekend note is a bit late because Phase One of my Deep Data Analyst project failed for now. That means I can’t continue the promised Data Analyst Agent tutorial.&lt;/p&gt;




&lt;h2&gt;
  
  
  What Happened?
&lt;/h2&gt;

&lt;p&gt;I actually built a single-agent data analysis assistant based on the ReAct pattern.&lt;/p&gt;

&lt;p&gt;This assistant could take a user’s analysis request, come up with a reasonable hypothesis, run EDA and modeling on the uploaded dataset, give professional business insights and actionable suggestions, and even create charts to back up its points.&lt;/p&gt;

&lt;p&gt;If you’re curious about how it worked, here’s a screenshot that shows how cool it looked:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5l13qs3ahwrox8hbb9m.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa5l13qs3ahwrox8hbb9m.gif" alt="The cool effects of my data analysis agent. Image by Author" width="600" height="522"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After all, this was just a single-agent app. It wasn’t that hard to build. If you remember, I explained how I used a ReAct agent to solve the Advent of Code challenges. Here’s that tutorial:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/how-i-crushed-advent-of-code-and-solved-hard-problems-using-autogen-jupyter-executor-and-qwen3/" rel="noopener noreferrer"&gt;How I Crushed Advent of Code And Solved Hard Problems Using Autogen Jupyter Executor and Qwen3&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you tweak that agent’s prompt a bit, you can get the same kind of data analysis ability I’m talking about.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Do I Call It a Failure?
&lt;/h2&gt;

&lt;p&gt;Because my agent, like most that AI hobbyists build, is just one of those:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Perfect for impressing your boss with a beautiful, powerful prototype, but once real users try it, it suddenly breaks down and becomes industrial trash.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Why Do I Say That?
&lt;/h2&gt;

&lt;p&gt;My agent has two serious problems.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Very poor robustness
&lt;/h3&gt;

&lt;p&gt;This is the top feedback I got after giving it to analyst users.&lt;/p&gt;

&lt;p&gt;If you try it once, it looks amazing. It uses methods and technical skills beyond a regular analyst to give you a very professional argument. You’d think replacing humans with AI was the smartest move you've ever made.&lt;/p&gt;

&lt;p&gt;But data analysis is about testing cause and effect over time. You must run the same analysis daily or weekly to see if the assistant’s advice actually works.&lt;/p&gt;

&lt;p&gt;Even with the same question, the agent changes its hypotheses and analysis methods each run. It then gives different advice each time.&lt;/p&gt;

&lt;p&gt;That’s what I mean by poor stability and consistency.&lt;/p&gt;

&lt;p&gt;Imagine you ask it to use an RFM model to segment your users and give marketing suggestions. Before a campaign, it uses features A, B, C and makes five levels for each. After the campaign, it suddenly adds a derived metric D and now segments on A, B, C, D.&lt;/p&gt;

&lt;p&gt;You couldn’t even run an A/B test properly.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. It suffers from context position bias
&lt;/h3&gt;

&lt;p&gt;If you’ve read my earlier posts, you know my Data Analyst agent runs code through a stateful Jupyter Kernel-based interpreter.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/exclusive-reveal-code-sandbox-tech-behind-manus-and-claude-agent-skills/" rel="noopener noreferrer"&gt;Exclusive Reveal: Code Sandbox Tech Behind Manus and Claude Agent Skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This lets the agent act like a human analyst, first making a hypothesis, running code in a Jupyter notebook to test it, and then coming up with a new hypothesis based on results — iterating over and over.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81pwwirqivgr8yt7itwg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F81pwwirqivgr8yt7itwg.png" alt="Agents based on the ReAct mode will perform EDA like human analysts. Image by Author" width="800" height="411"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gives the agent strong autonomous exploration and error-recovery skills.&lt;/p&gt;

&lt;p&gt;But here’s the problem. In a past post, I mentioned that LLMs have position bias when dealing with long conversation histories:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/fixing-the-agent-handoff-problem-in-llamaindexs-agentworkflow-system/" rel="noopener noreferrer"&gt;Fixing the Agent Handoff Problem in LlamaIndex's AgentWorkflow System&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In short, LLMs don’t treat each message fairly. They don’t weight importance by recency like you think they would.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fviuoux88uoerrz4wg2ph.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fviuoux88uoerrz4wg2ph.png" alt="LLMs do not assign weights to message history as people might think. Arxiv 2404.01430" width="575" height="295"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we keep making and testing hypotheses, the history grows, and every message in it matters: the first describes the data structure, a later one proves a hypothesis wrong so we know to skip it next time, and so on.&lt;/p&gt;

&lt;p&gt;The LLM doesn’t see it this way. As the process goes on, it starts focusing on the wrong messages and ignores corrections that were already made, so it repeats earlier mistakes.&lt;/p&gt;

&lt;p&gt;This either wastes tokens and time or sends the analysis off-track into another topic. Neither is good.&lt;/p&gt;

&lt;p&gt;So Phase One of my data analysis agent ends here.&lt;/p&gt;




&lt;h2&gt;
  
  
  Any Ways to Fix It?
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Build a multi-agent system with atomic skills
&lt;/h3&gt;

&lt;p&gt;For robustness, you’d probably think of using a Context Engineer to lock in the plan and metric definitions before analysis starts.&lt;/p&gt;

&lt;p&gt;Also, when an analysis works well, we should save the plan and prior assumptions in long-term memory.&lt;/p&gt;

&lt;p&gt;Both mean giving the agent new skills.&lt;/p&gt;
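
&lt;p&gt;To make this concrete, the artifact I want to lock in and later persist looks roughly like this (a sketch with made-up field names, not a final design):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pydantic import BaseModel

# A sketch of the "locked plan" a context-engineering step would produce and
# long-term memory would store. All field names are illustrative.
class AnalysisPlan(BaseModel):
    question: str                        # the clarified business question
    metric_definitions: dict[str, str]   # metric name -&amp;gt; fixed calculation rule
    segmentation_features: list[str]     # e.g. ["recency", "frequency", "monetary"]
    prior_hypotheses: list[str]          # hypotheses to test, in order
    methods: list[str]                   # analysis methods locked for every run

# Reusing the same validated plan on every run keeps hypotheses, features,
# and methods stable between the pre- and post-campaign analyses.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;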

&lt;p&gt;But remember, my agent is based on ReAct, which means its prompt is already huge — over a thousand lines now.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F481wcyxlly81c4mmdvdg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F481wcyxlly81c4mmdvdg.png" alt="Agents based on the ReAct pattern are often too complex to debug. Image by Author" width="800" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Adding anything risks breaking this fragile system and disrupting prompt-following.&lt;/p&gt;

&lt;p&gt;So a single agent won’t cut it. We should split the system into multiple agents with atomic skills, then use some orchestration to bring them together.&lt;/p&gt;

&lt;p&gt;We can imagine this multi-agent app as a coordinated system with at least these agents:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Issue Clarification Agent&lt;/strong&gt; — asks the user questions to clarify the problem, confirm metrics, and scope.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Retrieval Agent&lt;/strong&gt; — pulls metric definitions and calculation methods from a knowledge base, plus analysis methods written by real data scientists.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Planner Agent&lt;/strong&gt; — proposes prior hypotheses, sets an analysis approach, and makes a full plan to keep later agents on track.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Analyst Agent&lt;/strong&gt; — breaks the plan into steps, uses Python to execute them, and tests the prior hypotheses.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Storyteller Agent&lt;/strong&gt; — turns complex technical results into engaging business stories and actionable advice for decision-makers.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Validator Agent&lt;/strong&gt; — ensures the whole process is correct, reliable, and business-compliant.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Orchestrator Agent&lt;/strong&gt; — manages all the agents and assigns tasks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F989qofuuums3592yvdsm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F989qofuuums3592yvdsm.png" alt="My new design for the multi-agent data analyst. Image by Author" width="800" height="344"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Choose the right agent framework
&lt;/h3&gt;

&lt;p&gt;We need an agent framework that supports message passing. When a new task comes up or an agent finishes, a message should go to the orchestrator. The orchestrator should also send tasks by message.&lt;/p&gt;

&lt;p&gt;The framework should also support context state saving. Agents’ intermediate results should go into a shared context, not all into the LLM’s prompt, so position bias doesn’t get in the way.&lt;/p&gt;
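
&lt;p&gt;Here’s a tiny framework-agnostic sketch of what I mean: agents exchange short messages through the orchestrator, while bulky intermediate results live in a shared context store instead of the prompt (all names are illustrative):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass, field
from typing import Any

@dataclass
class TaskMessage:
    sender: str
    recipient: str
    task: str
    context_keys: list[str] = field(default_factory=list)  # pointers to results, not payloads

@dataclass
class SharedContext:
    store: dict[str, Any] = field(default_factory=dict)

    def put(self, key: str, value: Any) -&amp;gt; None:
        self.store[key] = value   # e.g. a full EDA result table

    def get(self, key: str) -&amp;gt; Any:
        return self.store[key]

# An agent drops its big result into the shared context and sends the
# orchestrator only a short message that points at it. The next agent's
# prompt then carries the pointer plus a summary, not the whole history.
def report_done(ctx: SharedContext, result: Any) -&amp;gt; TaskMessage:
    ctx.put("eda_result", result)
    return TaskMessage(sender="analyst", recipient="orchestrator",
                       task="eda_finished", context_keys=["eda_result"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;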

&lt;p&gt;If you ask GPT, it will recommend LangGraph and Autogen.&lt;/p&gt;

&lt;p&gt;I’d skip LangGraph. Even though its workflow is fine, its agents still run on LangChain, which I just don’t like.&lt;/p&gt;

&lt;p&gt;When people compare Autogen with others, they say Autogen is better for research-heavy tasks like data analysis that need more autonomy.&lt;/p&gt;

&lt;p&gt;But Autogen’s Selector Group Chat, while good for orchestrators, can’t manage message history well. You can’t control what goes to the LLM, and orchestration is a black box.&lt;/p&gt;

&lt;p&gt;Autogen’s GraphFlow is also half-baked: its workflow supports only agent nodes and has no context state management.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/i-used-autogen-graphflow-and-qwen3-coder-to-solve-math-problems-and-it-worked/" rel="noopener noreferrer"&gt;I Used Autogen GraphFlow and Qwen3 Coder to Solve Math Problems — And It Worked&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The bigger risk: Autogen has stopped development. For a 50k-star agent framework, that’s a shame.&lt;/p&gt;

&lt;h3&gt;
  
  
  What about Microsoft Agent Framework (MAF)?
&lt;/h3&gt;

&lt;p&gt;I like it. Easy to use, takes good ideas from earlier frameworks, and avoids their mistakes.&lt;/p&gt;

&lt;p&gt;I’m ready to use it with Qwen3 and DeepSeek:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/make-microsoft-agent-frameworks-structured-output-work-with-qwen-and-deepseek-models/" rel="noopener noreferrer"&gt;Make Microsoft Agent Framework’s Structured Output Work With Qwen and DeepSeek Models&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I’m studying MAF’s Workflow feature now. It’s nice: multiple node types, context state management, OpenTelemetry observability, and orchestration modes like Switch-Case and Multi-Selection. It has almost everything I want.&lt;/p&gt;

&lt;p&gt;It also feels ambitious. With new abilities like MCP, A2A, AG-UI, and Microsoft backing it, MAF should have a better long-term future than Autogen.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Next Steps
&lt;/h2&gt;

&lt;p&gt;I’m reading MAF’s user guide and source now. I’ll start using it in my agent system.&lt;/p&gt;

&lt;p&gt;I’m still working on Deep Data Analyst. After switching frameworks, I’ll need to adapt things for a while.&lt;/p&gt;

&lt;p&gt;The good news: a multi-agent system lets me add skills step by step, so I can share and show progress anytime instead of waiting until the whole project is done. 😂&lt;/p&gt;

&lt;p&gt;I also want to explore Workflow’s potential in MAF. I’ll see if it can handle different AI agent design patterns. That will help us understand how to use this promising framework.&lt;/p&gt;

&lt;p&gt;What are you interested in? Leave me a comment.&lt;/p&gt;

&lt;p&gt;Don’t forget to subscribe to my newsletter &lt;a href="https://www.dataleadsfuture.com/my-agent-system-looks-powerful-but-is-just-industrial-trash/#/portal/signup/" rel="noopener noreferrer"&gt;&lt;strong&gt;Mr.Q’s Weekend Notes&lt;/strong&gt;&lt;/a&gt; to get my latest agent research in your inbox without waiting.&lt;/p&gt;

&lt;p&gt;And share my blog with your friends — maybe it can help more people.&lt;/p&gt;




&lt;p&gt;Enjoyed this read? &lt;a href="https://www.dataleadsfuture.com/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;Subscribe now to get more cutting-edge data science tips straight to your inbox!&lt;/strong&gt;&lt;/a&gt; Your feedback and questions are welcome — let’s discuss in the comments below!&lt;/p&gt;

&lt;p&gt;This article was originally published on &lt;a href="https://www.dataleadsfuture.com/my-agent-system-looks-powerful-but-is-just-industrial-trash/" rel="noopener noreferrer"&gt;Data Leads Future&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>datascience</category>
      <category>agents</category>
    </item>
    <item>
      <title>Make Microsoft Agent Framework’s Structured Output Work With Qwen and DeepSeek Models</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Mon, 15 Dec 2025 02:33:06 +0000</pubDate>
      <link>https://dev.to/qtalen/make-microsoft-agent-frameworks-structured-output-work-with-qwen-and-deepseek-models-4egj</link>
      <guid>https://dev.to/qtalen/make-microsoft-agent-frameworks-structured-output-work-with-qwen-and-deepseek-models-4egj</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Today, we’ll add some extra features to the Microsoft Agent Framework so that Qwen and DeepSeek can also utilize structured output.&lt;/p&gt;

&lt;p&gt;The main reason is that Autogen has stayed on v0.75 for a long time, which makes it necessary to switch to Microsoft Agent Framework soon.&lt;/p&gt;

&lt;p&gt;Every time we switch the agent framework, we have to make it work with some common LLMs. This time is no exception. Luckily, Microsoft Agent Framework is pretty easy to use. We just need to adapt the structured output feature, and we can use it right away.&lt;/p&gt;

&lt;p&gt;As usual, I’ll put the source code at the end of the article for you to use.&lt;/p&gt;




&lt;h2&gt;
  
  
  Background On Structured Output
&lt;/h2&gt;

&lt;h3&gt;
  
  
  How does Agent Framework do structured output?
&lt;/h3&gt;

&lt;p&gt;In Microsoft Agent Framework, we set the &lt;code&gt;response_format&lt;/code&gt; parameter to a Pydantic &lt;code&gt;BaseModel&lt;/code&gt; data class to tell the LLM to produce structured output, like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PersonInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Information about a person.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;occupation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;

&lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please provide information about John Smith, who is a 35-year-old software engineer.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PersonInfo&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are two places to set the &lt;code&gt;response_format&lt;/code&gt; parameter:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Set it during the &lt;code&gt;ChatAgent&lt;/code&gt; initialization. This becomes a global parameter for the agent, and all later communications with OpenAI-compatible models use it.&lt;/li&gt;
&lt;li&gt;Set it when calling &lt;code&gt;run&lt;/code&gt; or &lt;code&gt;run_stream&lt;/code&gt;. This works only for that single API call.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code&gt;response_format&lt;/code&gt; set in &lt;code&gt;run&lt;/code&gt; or &lt;code&gt;run_stream&lt;/code&gt; takes priority over the one set at &lt;code&gt;ChatAgent&lt;/code&gt; creation. In other words, the per-call &lt;code&gt;response_format&lt;/code&gt; overrides the agent-level default.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1qvy28862asdwqarwck.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu1qvy28862asdwqarwck.png" alt="The conversion process of the response_format parameter in Microsoft Agent Framework. Image by Author" width="800" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;By default, we use &lt;code&gt;OpenAIChatClient&lt;/code&gt; to call OpenAI’s API. Before the API call, a &lt;code&gt;_prepare_options&lt;/code&gt; method converts the &lt;code&gt;BaseModel&lt;/code&gt; into &lt;code&gt;{"type": "json_schema", "json_schema": &amp;lt;base model schema&amp;gt;}&lt;/code&gt; and passes it to the LLM.&lt;/p&gt;
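
&lt;p&gt;Roughly, that conversion produces something like the following. This is a simplified sketch built with Pydantic’s own &lt;code&gt;model_json_schema&lt;/code&gt;, not the framework’s exact code; the real payload may wrap the schema with a few extra fields:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pydantic import BaseModel

class PersonInfo(BaseModel):   # same data class as above
    """Information about a person."""
    name: str | None = None
    age: int | None = None
    occupation: str | None = None

# A simplified view of what _prepare_options derives from the BaseModel
response_format = {
    "type": "json_schema",
    "json_schema": {"name": "PersonInfo", "schema": PersonInfo.model_json_schema()},
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;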

&lt;p&gt;So that’s how Agent Framework makes the LLM do structured output. Our extension will go into the &lt;code&gt;_prepare_options&lt;/code&gt; method of &lt;code&gt;OpenAIChatClient&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Do Qwen and DeepSeek support json_schema settings?
&lt;/h3&gt;

&lt;p&gt;According to the official docs, both Qwen and DeepSeek support structured output. But they only support setting the OpenAI client’s &lt;code&gt;response_format&lt;/code&gt; to &lt;code&gt;{"type": "json_object"}&lt;/code&gt; and require the keyword &lt;code&gt;json&lt;/code&gt; in the prompt to enable structured output. They do not support OpenAI’s API way of setting &lt;code&gt;response_format&lt;/code&gt; to &lt;code&gt;json_schema&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If we don’t extend the Microsoft Agent Framework and force &lt;code&gt;response_format&lt;/code&gt; to be a &lt;code&gt;BaseModel&lt;/code&gt; class, we’ll see errors like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Error code: 400 - {'error': {'message': "&amp;lt;400&amp;gt; InternalError.Algo.InvalidParameter: 'messages' must contain the word 'json' in some form, to use 'response_format' of type 'json_object'.", 'type': 'invalid_request_error', 'param': None, 'code': 'invalid_parameter_error'}}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So for Qwen and DeepSeek, without modifying the Microsoft Agent Framework, we can’t use the structured output feature.&lt;/p&gt;
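
&lt;p&gt;For reference, what these APIs do accept is the plain &lt;code&gt;json_object&lt;/code&gt; mode on an OpenAI-compatible call, with the word &lt;code&gt;json&lt;/code&gt; somewhere in the messages. The base URL and model name below are placeholders, not exact values:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from openai import OpenAI

# Placeholders: point base_url at your provider's OpenAI-compatible endpoint
# and use a real model name; both values are examples only.
client = OpenAI(base_url="https://your-provider/compatible-mode/v1", api_key="sk-...")

completion = client.chat.completions.create(
    model="qwen-plus",   # example model name
    messages=[
        {"role": "system", "content": "Reply in json with keys name, age, occupation."},
        {"role": "user", "content": "John Smith is a 35-year-old software engineer."},
    ],
    response_format={"type": "json_object"},   # json_schema is not accepted here
)
print(completion.choices[0].message.content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;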

&lt;h3&gt;
  
  
  How to make Qwen and DeepSeek output using json_schema
&lt;/h3&gt;

&lt;p&gt;Even though Qwen and DeepSeek don’t support &lt;code&gt;{"type": "json_schema"}&lt;/code&gt;, we can still inject &lt;code&gt;json_schema&lt;/code&gt; into the system prompt so the LLM outputs according to our data class.&lt;/p&gt;

&lt;p&gt;The trick is: before calling the OpenAI API, convert the &lt;code&gt;BaseModel&lt;/code&gt; to its &lt;code&gt;json_schema&lt;/code&gt;, attach it to the system prompt, and send it along.&lt;/p&gt;

&lt;p&gt;If you want to know exactly how I made Qwen output according to a Pydantic BaseModel’s rules, read my popular article where I explain multiple methods for this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/build-autogen-agents-with-qwen3-structured-output-thinking-mode/" rel="noopener noreferrer"&gt;Build AutoGen Agents with Qwen3: Structured Output &amp;amp; Thinking Mode&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How I Extended It
&lt;/h2&gt;

&lt;p&gt;Now, let’s see exactly how to extend Microsoft Agent Framework so Qwen and DeepSeek can do structured output.&lt;/p&gt;

&lt;p&gt;I know you want the answer fast, so here’s the modified code you can use right now:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;typing&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;override&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;textwrap&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;dedent&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;copy&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;deepcopy&lt;/span&gt;

&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pydantic&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_framework.openai&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OpenAIChatClient&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_framework&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatOptions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;TextContent&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OpenAILikeChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;OpenAIChatClient&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="nd"&gt;@override&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_prepare_options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
            &lt;span class="n"&gt;chat_options&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatOptions&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;chat_options_copy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;deepcopy&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chat_options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# 1
&lt;/span&gt;        &lt;span class="nf"&gt;if &lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;chat_options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_format&lt;/span&gt;
            &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chat_options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="ow"&gt;and&lt;/span&gt; &lt;span class="nf"&gt;issubclass&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chat_options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;structured_output_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_build_structured_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;chat_options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="c1"&gt;# 2
&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# 3
&lt;/span&gt;                &lt;span class="n"&gt;first_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;first_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;# 4
&lt;/span&gt;                    &lt;span class="n"&gt;new_system_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                        &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;first_message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;structured_output_prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;new_system_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:]]&lt;/span&gt;
                &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                    &lt;span class="n"&gt;new_system_message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt; &lt;span class="c1"&gt;# 5
&lt;/span&gt;                        &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;system&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;structured_output_prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                    &lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;new_system_message&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

            &lt;span class="n"&gt;chat_options_copy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;response_format&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;type&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;json_object&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;_prepare_options&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;chat_options_copy&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_build_structured_prompt&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;json_schema&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_json_schema&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;structured_output_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;dedent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        &lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s"&gt;
        &amp;lt;output-format&amp;gt;&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;
        Your output must adhere to the following JSON schema format,
        without any Markdown syntax, and without any preface or explanation:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;
        &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;json_schema&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;
        &amp;lt;/output-format&amp;gt;
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;structured_output_prompt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As I said before, both &lt;code&gt;run&lt;/code&gt; and &lt;code&gt;run_stream&lt;/code&gt; call &lt;code&gt;OpenAIChatClient&lt;/code&gt;’s &lt;code&gt;_prepare_options&lt;/code&gt; method, so it’s the best place to extend.&lt;/p&gt;

&lt;p&gt;I marked each part of the code with numbers in the comments so I can explain in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The &lt;code&gt;chat_options&lt;/code&gt; object is the parameters you pass to the method. We need to &lt;code&gt;deepcopy&lt;/code&gt; it to a new object because we’re going to change &lt;code&gt;response_format&lt;/code&gt; to &lt;code&gt;{"type": "json_object"}&lt;/code&gt; to work with DeepSeek. Agent Framework still needs the original &lt;code&gt;BaseModel&lt;/code&gt; to convert the returned JSON string back to a data class.&lt;/li&gt;
&lt;li&gt;Then we take the &lt;code&gt;json_schema&lt;/code&gt; from the &lt;code&gt;BaseModel&lt;/code&gt;, turn it into part of the system prompt, and wrap it with &lt;code&gt;xml&lt;/code&gt; tags.&lt;/li&gt;
&lt;li&gt;The original &lt;code&gt;_prepare_options&lt;/code&gt; checks whether &lt;code&gt;messages&lt;/code&gt; is empty. We only handle the case where &lt;code&gt;messages&lt;/code&gt; is not empty, i.e. the caller sends at least one user message.&lt;/li&gt;
&lt;li&gt;If the first message in &lt;code&gt;messages&lt;/code&gt; is a system message, we attach the structured output prompt to the system message, replacing the old system message.&lt;/li&gt;
&lt;li&gt;If the first message is a user message, we create a new system message with just the structured output prompt and put it at the front of the &lt;code&gt;messages&lt;/code&gt; list.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;With this change, Microsoft Agent Framework now supports structured output for Qwen and DeepSeek. Next, let’s test some common cases to make sure it works.&lt;/p&gt;




&lt;h2&gt;
  
  
  Testing the Extension
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prepare an MLflow server to observe
&lt;/h3&gt;

&lt;p&gt;Before testing, we need a monitoring tool to check the messages Agent Framework sends to the LLM API.&lt;/p&gt;

&lt;p&gt;Agent Framework supports logging platforms based on &lt;code&gt;opentelemetry&lt;/code&gt;, but it doesn’t log system messages by default, so that won’t work for our case today.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopfygvdqwf51gtatlbju.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fopfygvdqwf51gtatlbju.png" alt="Agent Framework's OpenTelemetry output doesn't log the system message used when calling the LLM. Image by Author" width="800" height="408"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In a previous article, I showed how I use MLflow to see the messages sent to OpenAI’s API:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/monitoring-qwen-3-agents-with-mlflow-3-x-end-to-end-tracking-tutorial/" rel="noopener noreferrer"&gt;Monitoring Qwen 3 Agents with MLflow 3.x: End-to-End Tracing Tutorial&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So today we’ll still use MLflow’s &lt;code&gt;openai.autolog&lt;/code&gt; API, because it can record system messages sent to the LLM.&lt;/p&gt;

&lt;p&gt;You just need to start a &lt;code&gt;server&lt;/code&gt; like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mlflow server &lt;span class="nt"&gt;--host&lt;/span&gt; 0.0.0.0 &lt;span class="nt"&gt;--port&lt;/span&gt; 5000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then in the test code, add a call to &lt;code&gt;openai.autolog&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_tracking_uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MLFLOW_TRACKING_URI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;autolog&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Test single-turn conversation
&lt;/h3&gt;

&lt;p&gt;First, let’s follow the official docs to test normal structured output.&lt;/p&gt;

&lt;p&gt;Set up a data class, then set it in the &lt;code&gt;run&lt;/code&gt; method:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;PersonInfo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Information about a person.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;age&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
    &lt;span class="n"&gt;occupation&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;


&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Please provide information about John Smith, who is a 35-year-old software engineer.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;PersonInfo&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check on MLflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flk1jadc81nko1ne9jvte.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flk1jadc81nko1ne9jvte.png" alt="The json_schema prompt has already been appended to the system prompt. Image by Author" width="768" height="510"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see the data class has been turned into a &lt;code&gt;json_schema&lt;/code&gt; prompt, attached to the system prompt. Also, we can get the structured object directly through &lt;code&gt;response.value&lt;/code&gt;.&lt;/p&gt;
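
&lt;p&gt;In code, that looks like this (a small usage sketch that reuses the &lt;code&gt;PersonInfo&lt;/code&gt; class from earlier):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    # inside main(), right after the run() call above
    person = response.value          # already parsed into a PersonInfo instance
    if person is not None:
        print(person.name, person.age, person.occupation)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;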

&lt;h3&gt;
  
  
  Test multi-turn conversation
&lt;/h3&gt;

&lt;p&gt;Now let’s test Microsoft Agent Framework’s multi-turn example.&lt;/p&gt;

&lt;p&gt;First, set a &lt;code&gt;response_format&lt;/code&gt; at &lt;code&gt;create_agent&lt;/code&gt;, without setting it in &lt;code&gt;run&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;OutText&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;

&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a good assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;OutText&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;result1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How many kilometers is the highway from Wuhan to Beijing?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then use &lt;code&gt;run_stream&lt;/code&gt; for the second turn and set another &lt;code&gt;response_format&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ETA&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;BaseModel&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;hours&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;

&lt;span class="n"&gt;final_response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;AgentRunResponse&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;from_agent_response_generator&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;How long would it take to drive there at 120 km/h?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;response_format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ETA&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;output_format_type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;ETA&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;final_response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check on MLflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpbnxxcc1mkc7cqes38d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmpbnxxcc1mkc7cqes38d.png" alt="The first round of conversation used the default response_format parameter. Image by Author" width="794" height="461"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpl9izfn2brpup7kx404.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffpl9izfn2brpup7kx404.png" alt="The second round of conversation switched to the response_format parameter passed into the run_stream method. Image by Author" width="800" height="577"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;No problems at all. The &lt;code&gt;response_format&lt;/code&gt; in &lt;code&gt;run_stream&lt;/code&gt; overrides the one set in &lt;code&gt;create_agent&lt;/code&gt; as expected.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;With AutoGen no longer being actively maintained, we’ve started migrating to the Microsoft Agent Framework.&lt;/p&gt;

&lt;p&gt;During this migration, we extended the Microsoft Agent Framework so Qwen and DeepSeek can use structured output.&lt;/p&gt;

&lt;p&gt;I hope Qwen and DeepSeek’s APIs will one day support setting &lt;code&gt;response_format&lt;/code&gt; to &lt;code&gt;{"type": "json_schema"}&lt;/code&gt; directly, so that we no longer have to adapt the framework every time we switch models.&lt;/p&gt;

&lt;p&gt;Under the hood, structured output is just a matter of adding a &lt;code&gt;json_schema&lt;/code&gt; description to the system prompt so the LLM produces content in the shape we define. So even if you’re not using the Microsoft Agent Framework, you can adapt your own stack in a similar way.&lt;/p&gt;
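&lt;p&gt;If you want to reproduce this outside the framework, here’s a minimal sketch of the idea. It isn’t the framework’s actual code; it only uses Pydantic’s standard &lt;code&gt;model_json_schema()&lt;/code&gt; and &lt;code&gt;model_validate_json()&lt;/code&gt; to build the prompt and parse the reply, and the prompt wording is just an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json

from pydantic import BaseModel


class ETA(BaseModel):
    hours: int


# Append the JSON schema to whatever system prompt you already use.
schema = json.dumps(ETA.model_json_schema(), ensure_ascii=False)
system_prompt = (
    "You are a helpful assistant.\n"
    "Respond ONLY with a JSON object that conforms to this JSON Schema:\n"
    f"{schema}"
)

# After calling your chat API with `system_prompt`, validate the raw reply.
def parse_reply(reply_text: str) -&gt; ETA:
    return ETA.model_validate_json(reply_text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;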

&lt;p&gt;That’s it for today’s journey. If you find this tutorial useful, please share it with your friends.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;And feel free to follow my blog so you can keep up with my latest progress in AI Agents.&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;Enjoyed this read? &lt;a href="https://www.dataleadsfuture.com/#/portal/signup" rel="noopener noreferrer"&gt;Subscribe now to get more cutting-edge data science tips straight to your inbox!&lt;/a&gt; Your feedback and questions are welcome — let’s discuss in the comments below!&lt;/p&gt;

&lt;p&gt;This article was originally published on &lt;a href="https://www.dataleadsfuture.com/make-microsoft-agent-frameworks-structured-output-work-with-qwen-and-deepseek-models/" rel="noopener noreferrer"&gt;Data Leads Future&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>programming</category>
      <category>qwen</category>
    </item>
    <item>
      <title>A Quick Guide to Containerizing Agent Applications with Podman</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Mon, 08 Dec 2025 02:13:24 +0000</pubDate>
      <link>https://dev.to/qtalen/a-quick-guide-to-containerizing-agent-applications-with-podman-344p</link>
      <guid>https://dev.to/qtalen/a-quick-guide-to-containerizing-agent-applications-with-podman-344p</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;For enterprise-level agent applications, the best way to safely run code generated by agents is to use containerization. This isolates the code execution environment from your server’s operating system.&lt;/p&gt;

&lt;p&gt;In a previous article, we built a code interpreter sandbox based on a Jupyter container. We proved that once an agent has access to a stateful code runtime, it gains the ability to solve complex problems and perform data analysis:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/exclusive-reveal-code-sandbox-tech-behind-manus-and-claude-agent-skills/" rel="noopener noreferrer"&gt;Exclusive Reveal: Code Sandbox Tech Behind Manus and Claude Agent Skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, Docker Desktop is off-limits in most enterprises due to its commercial license restrictions.&lt;/p&gt;

&lt;p&gt;Yet our multi-agent development work absolutely depends on a containerized environment. So we must find a suitable alternative. Ideally, one fully compatible with Docker, so it works seamlessly with existing agent frameworks that rely on the Docker client.&lt;/p&gt;

&lt;p&gt;Podman is exactly what we need. Developed by Red Hat, it’s an open-source container management tool that runs on both Mac and Windows. It can fully replace Docker Desktop in your development setup.&lt;/p&gt;

&lt;p&gt;This short post will help you quickly set up Podman so you can start building agent applications with code interpreters right away. If you’d like deeper background knowledge and a more thorough introduction to Podman, I highly recommend the &lt;a href="https://trk.udemy.com/c/4744570/3193861/39854?couponCode=KEEPLEARNING&amp;amp;ref=dataleadsfuture.com" rel="noopener noreferrer"&gt;Podman for the Absolute Beginners - Hands-On DevOps course on Udemy&lt;/a&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Start
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prerequisites
&lt;/h3&gt;

&lt;p&gt;This guide assumes you’re developing on Windows 11. From what I know, setup on Mac is even simpler.&lt;/p&gt;

&lt;p&gt;On Windows, Podman runs its container host inside a Linux system on WSL2. So before proceeding, make sure WSL2 is already installed on your machine.&lt;/p&gt;

&lt;p&gt;If your company requires a VPN to access the internet, you’ll also need to configure WSL2. To do this, create a &lt;code&gt;.wslconfig&lt;/code&gt; file in your Windows &lt;code&gt;%USERPROFILE%&lt;/code&gt; directory (your user folder) with the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[experimental]
autoMemoryReclaim=gradual  
networkingMode=mirrored
dnsTunneling=true
firewall=true
autoProxy=true
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Install Podman
&lt;/h3&gt;

&lt;p&gt;As mentioned, Podman’s host runs in a WSL2-based Linux environment called &lt;code&gt;podman-machine-default&lt;/code&gt;. So when installing Podman, you actually need to set up this &lt;code&gt;podman-machine-default&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;One approach is to install Podman Desktop first, then create the &lt;code&gt;podman-machine-default&lt;/code&gt; through its interface. But for some reason, this always failed for me—the installer would just hang.&lt;/p&gt;

&lt;p&gt;So I went the other way: I installed standalone Podman (which provides the &lt;code&gt;podman machine&lt;/code&gt; command) first, then installed Podman Desktop. Go to Podman’s GitHub Releases page, download the latest &lt;code&gt;.msi&lt;/code&gt; installer, and run it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2mdqjrrg6l8o7ca38ldw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2mdqjrrg6l8o7ca38ldw.png" alt="Find and download the .msi installer under the latest release. Image by Author" width="800" height="536"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After installation, create your &lt;code&gt;podman-machine-default&lt;/code&gt; host:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;podman machine init &lt;span class="nt"&gt;--rootful&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;true&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Important note: if your agent framework uses the Docker SDK to launch containers, you must set &lt;code&gt;--rootful=true&lt;/code&gt;. I’ll explain why later.&lt;/p&gt;

&lt;p&gt;Next, install Podman Desktop from &lt;a href="https://podman-desktop.io/?ref=dataleadsfuture.com" rel="noopener noreferrer"&gt;here&lt;/a&gt;. Just click “Next” all the way through.&lt;/p&gt;

&lt;p&gt;Once done, open Podman Desktop from your taskbar, go to the &lt;strong&gt;Settings&lt;/strong&gt; tab, and check &lt;strong&gt;Resources&lt;/strong&gt;. You should see the &lt;code&gt;podman-machine&lt;/code&gt; you just created. That means Podman Desktop is ready.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1oc5ozal6qcxw3e9nii.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb1oc5ozal6qcxw3e9nii.png" alt="If you see this screen, it means your Podman Desktop is already installed. Image by Author" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But since your agents will use the Docker SDK to manage containers, you need to enable Docker compatibility in Podman Desktop, as shown below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ih2q9p3o83m2y1frboa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7ih2q9p3o83m2y1frboa.png" alt="We use the Docker SDK to manage containers in our agent framework, so make sure Docker Compatibility is turned on. Image by Author" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This setting also enables Podman Compose, which lets you route &lt;code&gt;docker compose&lt;/code&gt; commands to &lt;code&gt;podman compose&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Set up proxy configuration
&lt;/h3&gt;

&lt;p&gt;If your system uses a system-wide proxy, Podman Desktop will automatically pick it up. But here’s the catch: Podman Desktop applies these settings inside the &lt;code&gt;podman-machine&lt;/code&gt; Linux system. Unlike Windows, where bypass domains are separated by semicolons, Linux uses commas. You’ll need to manually adjust this, or your &lt;code&gt;podman-machine&lt;/code&gt; might lose external network access.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvx1s82ad3m14vl58cjvh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvx1s82ad3m14vl58cjvh.png" alt="You should use the Linux path separator here. Image by Author" width="800" height="533"&gt;&lt;/a&gt;&lt;/p&gt;
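&lt;p&gt;For example (the hostnames here are made up), a bypass list copied from Windows like the first one below needs to be rewritten with commas before the &lt;code&gt;podman-machine&lt;/code&gt; can use it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Windows-style bypass list (semicolons):
localhost;127.0.0.1;internal.example.com

# Linux-style bypass list (commas) for the podman-machine:
localhost,127.0.0.1,internal.example.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;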

&lt;h3&gt;
  
  
  Create a soft link for the storage path
&lt;/h3&gt;

&lt;p&gt;By default, Podman stores data in &lt;code&gt;%USERPROFILE%\.local\share\containers\podman&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I really dislike keeping data files on the C drive. I prefer moving them to D drive—for example, &lt;code&gt;D:\Documents\AppData\podman&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here’s how: copy everything from the C-drive &lt;code&gt;podman&lt;/code&gt; folder to your new D-drive location, delete the original C-drive folder, then create a junction link (replace the paths in the example below with your own):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;mklink /j C:&lt;span class="se"&gt;\U&lt;/span&gt;sers&lt;span class="se"&gt;\q&lt;/span&gt;ianpeng&lt;span class="se"&gt;\.&lt;/span&gt;&lt;span class="nb"&gt;local&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;hare&lt;span class="se"&gt;\c&lt;/span&gt;ontainers&lt;span class="se"&gt;\p&lt;/span&gt;odman D:&lt;span class="se"&gt;\D&lt;/span&gt;ocuments&lt;span class="se"&gt;\A&lt;/span&gt;ppData&lt;span class="se"&gt;\p&lt;/span&gt;odman
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now everything works normally, but your data lives safely on D drive—no risk of losing it during a system reinstall.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test your Podman installation
&lt;/h3&gt;

&lt;p&gt;If you followed my &lt;a href="https://www.dataleadsfuture.com/exclusive-reveal-code-sandbox-tech-behind-manus-and-claude-agent-skills/" rel="noopener noreferrer"&gt;earlier article&lt;/a&gt; and created a &lt;code&gt;Dockerfile&lt;/code&gt; for a &lt;code&gt;jupyter-server&lt;/code&gt;, you can now test whether Podman builds images correctly. Navigate to your &lt;code&gt;Dockerfile&lt;/code&gt; directory and run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;podman build &lt;span class="nt"&gt;--network&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host &lt;span class="nt"&gt;-t&lt;/span&gt; jupyter-server &lt;span class="nb"&gt;.&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Remember, the &lt;code&gt;podman-machine&lt;/code&gt; host is network-isolated from your Windows system. The &lt;code&gt;--network=host&lt;/code&gt; flag lets the build process access your Windows network, making it easier to pull base images and install Python packages from PyPI.&lt;/p&gt;

&lt;p&gt;Note: this flag only affects image building—it has no impact when running containers.&lt;/p&gt;

&lt;h3&gt;
  
  
  Make Docker CLI commands work
&lt;/h3&gt;

&lt;p&gt;If you’re used to typing &lt;code&gt;docker&lt;/code&gt; commands, here’s a neat trick: create a &lt;code&gt;docker.bat&lt;/code&gt; file in &lt;code&gt;%USERPROFILE%\.local\bin&lt;/code&gt; (make sure this directory is in your system &lt;code&gt;PATH&lt;/code&gt;).&lt;/p&gt;

&lt;p&gt;Put this inside &lt;code&gt;docker.bat&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;@echo off
setlocal EnableDelayedExpansion
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="s2"&gt;"%~1"&lt;/span&gt;&lt;span class="o"&gt;==&lt;/span&gt;&lt;span class="s2"&gt;"build"&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;
    &lt;span class="nb"&gt;shift
    &lt;/span&gt;rem Default to &lt;span class="nt"&gt;--network&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host
    podman build &lt;span class="nt"&gt;--network&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;host %2 %3 %4 %5 %6 %7 %8 %9
&lt;span class="o"&gt;)&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="o"&gt;(&lt;/span&gt;
    podman %&lt;span class="k"&gt;*&lt;/span&gt;
&lt;span class="o"&gt;)&lt;/span&gt;
endlocal
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you can keep using &lt;code&gt;docker&lt;/code&gt; commands as usual. Since Podman’s CLI is mostly compatible with Docker’s, everything feels the same—except when you run &lt;code&gt;docker build&lt;/code&gt;, it automatically adds &lt;code&gt;--network=host&lt;/code&gt; to the underlying &lt;code&gt;podman build&lt;/code&gt; command.&lt;/p&gt;

&lt;h3&gt;
  
  
  Call Docker SDK from inside a containerized app
&lt;/h3&gt;

&lt;p&gt;If your agent app runs inside a container and needs to use the Docker SDK to spin up Python code interpreter containers (a common pattern across agent frameworks), you’ll need to let the SDK talk to Podman.&lt;/p&gt;

&lt;p&gt;Do this by mounting the &lt;code&gt;podman.sock&lt;/code&gt; socket into your app container so the Docker SDK can reach it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-v&lt;/span&gt; /run/podman/podman.sock:/var/run/docker.sock &lt;span class="nt"&gt;--rm&lt;/span&gt; app-test
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Again, your &lt;code&gt;podman-machine-default&lt;/code&gt; must be initialized with &lt;code&gt;--rootful=true&lt;/code&gt;.&lt;/p&gt;
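&lt;p&gt;For reference, here’s a minimal sketch of what the app inside the container can do once the socket is mounted. It uses the standard Docker SDK for Python (the &lt;code&gt;docker&lt;/code&gt; package); the &lt;code&gt;jupyter-server&lt;/code&gt; image is the one built earlier, and the port mapping is just an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import docker

# The mounted podman.sock shows up at Docker's default location,
# so the SDK talks to Podman exactly as if it were Docker.
client = docker.DockerClient(base_url="unix:///var/run/docker.sock")
print(client.ping())  # True if the Podman engine is reachable

# Spin up a code-interpreter container the same way you would with Docker.
container = client.containers.run(
    "jupyter-server",          # image built earlier in this guide
    detach=True,
    ports={"8888/tcp": 8888},  # example port mapping
)
print(container.short_id)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;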

&lt;h3&gt;
  
  
  Fix Docker SDK timeout issues
&lt;/h3&gt;

&lt;p&gt;When your agent generates code and sends it via the Docker SDK to a code interpreter container, there’s often a long idle gap between calls while the LLM produces tokens. On the second SDK call, you might then hit an error like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;requests.exceptions.ConnectionError: &lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Connection aborted.'&lt;/span&gt;, RemoteDisconnected&lt;span class="o"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Remote end closed connection without response'&lt;/span&gt;&lt;span class="o"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This happens because &lt;code&gt;podman-machine&lt;/code&gt; sets a very short &lt;code&gt;service_timeout&lt;/code&gt; for its engine by default.&lt;/p&gt;

&lt;p&gt;Fix it by SSHing into your &lt;code&gt;podman-machine&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;podman machine ssh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now you’re in the Linux VM. Edit &lt;code&gt;/etc/containers/containers.conf&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;vi /etc/containers/containers.conf
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Set &lt;code&gt;service_timeout&lt;/code&gt; to 0 (meaning “never time out”) under the [engine] section:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;engine]
cgroup_manager &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"cgroupfs"&lt;/span&gt;
service_timeout &lt;span class="o"&gt;=&lt;/span&gt; 0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since this is a dev environment, security isn’t a concern here.&lt;/p&gt;

&lt;h3&gt;
  
  
  Adjust Podman container stop timeout
&lt;/h3&gt;

&lt;p&gt;During development, you often start and stop your agent program from the command line. But you might notice it takes forever to exit after the program finishes.&lt;/p&gt;

&lt;p&gt;That’s because Podman waits by default (10 seconds!) for containers to gracefully shut down after receiving a stop signal.&lt;/p&gt;

&lt;p&gt;Shorten this by editing the same config file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;engine]
stop_timeout &lt;span class="o"&gt;=&lt;/span&gt; 2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now containers stop in 2 seconds—much snappier!&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Ever since Gemini 3.0 accidentally deleted 800GB of user data, running agent-generated code inside containers has become essential. And yes—stateful code interpreters truly boost an agent’s ability to tackle complex problems.&lt;/p&gt;

&lt;p&gt;Most agent frameworks rely on the Docker SDK to manage containers. But Docker Desktop’s licensing blocks its use in enterprise dev environments.&lt;/p&gt;

&lt;p&gt;This guide shows you how to use Podman Desktop as a drop-in replacement. Following these steps, you can quickly build a Docker-compatible container dev environment. I’ve used this exact setup for over six months with zero issues.&lt;/p&gt;

&lt;p&gt;This article covers only the minimal setup needed for agent development. If you want to dive deeper into Podman’s internals and advanced usage, check out the official tutorials.&lt;/p&gt;

&lt;p&gt;Our journey toward &lt;strong&gt;Deep Data Analyst&lt;/strong&gt; Agents continues—stay tuned for the next episode!&lt;/p&gt;

</description>
      <category>ai</category>
      <category>containers</category>
      <category>devops</category>
      <category>chatgpt</category>
    </item>
    <item>
      <title>Share My LLM Prompts and Tips That Make Work and Learning Super Efficient</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Fri, 28 Nov 2025 08:17:28 +0000</pubDate>
      <link>https://dev.to/qtalen/share-my-llm-prompts-and-tips-that-make-work-and-learning-super-efficient-2oj3</link>
      <guid>https://dev.to/qtalen/share-my-llm-prompts-and-tips-that-make-work-and-learning-super-efficient-2oj3</guid>
      <description>&lt;p&gt;A lot of friends ask me how I manage to stay busy with work every day, yet still find time to learn and write blog posts. The answer is simple: I use AI to help me learn about AI.&lt;/p&gt;

&lt;p&gt;Today I’m sharing all the AI tools, tricks, and prompts I’ve used over the past two years at work. No fluff—just straight-up useful stuff.&lt;/p&gt;




&lt;h2&gt;
  
  
  Find a Good AI Client
&lt;/h2&gt;

&lt;p&gt;If you use LLMs to boost your daily productivity, chatting with the model is still the main way most people interact with it. That means having a solid AI client app is essential.&lt;/p&gt;

&lt;p&gt;My favorite AI client right now is &lt;a href="https://www.cherry-ai.com/?ref=dataleadsfuture.com" rel="noopener noreferrer"&gt;Cherry Studio Community Edition&lt;/a&gt;. It supports multiple languages, is completely open source and free, lets you connect to all kinds of model services, and even lets you add your own System Prompt and MCP tools. These features form the foundation for all the tips I’ll share next.&lt;/p&gt;

&lt;p&gt;You can pick any model service you like. I recommend &lt;a href="https://openrouter.ai/models?ref=dataleadsfuture.com" rel="noopener noreferrer"&gt;OpenRouter&lt;/a&gt;—with one API key, you get access to Gemini, GPT, Claude, Qwen, and many other commercial or open-source models.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1jo2u8srx67hmcc4nje.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1jo2u8srx67hmcc4nje.png" alt="Configure the model service in Cherry Studio's settings interface. Image by Author" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then there’s the agent interface. You can fill in different System Prompts based on your use case. The prompts I use daily go right here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk40ispj2t0v49x1l1u5t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk40ispj2t0v49x1l1u5t.png" alt="Enter the prompt you want to share today in the agent interface. Image by Author" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, in the chat interface, you can add the agent you just set up as your assistant, tweak your model settings, and start chatting with the LLM.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftpsqdoszarfwr4ccsaho.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftpsqdoszarfwr4ccsaho.png" alt="Just add the agent you just set up as an assistant, and you're good to go! Image by Author" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Best of all, using an AI desktop client lets you keep your knowledge base local and connect to self-hosted LLM services, which gives you and your company much tighter control over data security.&lt;/p&gt;




&lt;h2&gt;
  
  
  Some Suggestions on Models and Settings
&lt;/h2&gt;

&lt;p&gt;Next up are the models and parameters I use daily. These aren’t “correct” answers—just my personal experience.&lt;/p&gt;

&lt;h3&gt;
  
  
  Model choices
&lt;/h3&gt;

&lt;p&gt;I follow a simple rule: if I’m using it myself, I go with the best. So I stick to commercial models unless I’m building agents, where I might pick an open-source model based on need. Here’s what I use regularly:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Gemini 3.0&lt;/strong&gt; Pro is my top pick for vibe coding. Right now, code generated by Gemini 3.0 has the highest accuracy, which saves me tons of debugging time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;GPT-5&lt;/strong&gt; is the classic reliable choice—a great balance between cost and expertise. I use it for everything except coding.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Qwen3 Max&lt;/strong&gt;… well, I really dislike its overly encouraging tone. It always tells me I’m doing great, no matter what, and that drives me nuts. But I have to admit—Qwen3 shines in localization and language handling. I use it whenever I need translation or proofreading.&lt;/p&gt;

&lt;h3&gt;
  
  
  Parameter settings
&lt;/h3&gt;

&lt;p&gt;If you’ve read my articles before, you probably already know what each LLM parameter does. Here’s how I personally set them when using models myself.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Temperature.&lt;/strong&gt; Lower values make the model more predictable; higher values make it more creative. I adjust based on context. For coding, I set temperature to 0.01 to keep responses consistent across chats. For everything else, I stick with the default 0.7—it feels like talking to a real person. Later, I’ll show you how to make the LLM write creative motivational text—I crank temperature up to 0.8–0.9 for that.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Context length.&lt;/strong&gt; Few people pay attention to this setting, but besides saving token costs, it can offer unexpected benefits. For my translation agent, I set context length to 2—meaning no chat history is kept. That way, the LLM only translates what I input right now, without interference from past translations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Max tokens.&lt;/strong&gt; This controls the maximum number of tokens per response. I always adjust it. For reasoning tasks, I keep it low—otherwise wait times and token costs explode. For writing, I set it high to avoid cutting off long articles due to default limits.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Try using MCP
&lt;/h3&gt;

&lt;p&gt;I’ve always felt that MCP was built exactly for making LLM clients more powerful. With MCP, your personal chat interface can unlock all kinds of agent capabilities. Here’s an example:&lt;/p&gt;

&lt;p&gt;We all know that due to design choices and legal restrictions, the built-in web search in LLMs keeps getting worse. Default search rarely gives useful info, especially over multi-turn conversations.&lt;/p&gt;

&lt;p&gt;But you can install a &lt;code&gt;tavily-mcp&lt;/code&gt; service for better web search. Just sign up on their site, get an API key, and add the tavily-mcp JSON config to your client.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5n9uxob2ftui0qquaiy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff5n9uxob2ftui0qquaiy.png" alt="You can find the JSON for configuring MCP on the official websites of the various tools. Image by Author" width="800" height="514"&gt;&lt;/a&gt;&lt;/p&gt;
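&lt;p&gt;As a rough illustration, an MCP entry for tavily usually looks something like the snippet below. Treat the command, arguments, and key name here as assumptions and copy the exact JSON from tavily’s official page:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{
  "mcpServers": {
    "tavily-mcp": {
      "command": "npx",
      "args": ["-y", "tavily-mcp"],
      "env": {
        "TAVILY_API_KEY": "your-api-key-here"
      }
    }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;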

&lt;p&gt;Traditional search works like this: take your keywords, search the web, then build an answer from the results.&lt;/p&gt;

&lt;p&gt;MCP-based search is different.&lt;/p&gt;

&lt;p&gt;During the conversation, the LLM dynamically decides if it needs to search—and generates its own search keywords based on context. This leads to much more accurate results.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fssjcqecq7eu8rwgwvq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1fssjcqecq7eu8rwgwvq.png" alt="When using tavily-mcp, the agent will try searching with multiple keywords until it finds the answer. Image by Author" width="800" height="645"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Other MCP tools are great too. &lt;code&gt;fetch&lt;/code&gt; grabs webpage content from any URL you give it. &lt;code&gt;memory&lt;/code&gt; uses a knowledge graph to remember key info from your chats, helping you build your own custom AI agent.&lt;/p&gt;

&lt;p&gt;Now that we’ve covered LLM setup tips, let’s move on to my prompt-writing tricks.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Prompt-Writing Tips
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Is structured JSON prompting really necessary?
&lt;/h3&gt;

&lt;p&gt;After Gemini 3.0 launched, people noticed that using JSON-formatted prompts seemed to help LLMs follow instructions better. That sparked debate: should we always use JSON for clearer, more precise structure?&lt;/p&gt;

&lt;p&gt;I’ve discussed this several times with the brilliant engineers at Qwen. Their answer was clear:&lt;/p&gt;

&lt;p&gt;It depends on the format of the data the model was trained on. LLMs essentially memorize knowledge, including input formats, through their parameters. If most of the training text was in Markdown, then Markdown is naturally the best fit.&lt;/p&gt;

&lt;p&gt;That’s why LLMs output in Markdown by default—they were trained on Markdown-heavy data. So their most familiar format is still Markdown.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2qjrhz6wnhyv9opwwbcu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2qjrhz6wnhyv9opwwbcu.png" alt="What you think is a clearer format doesn't mean the LLM sees it that way too. Image by Author" width="660" height="213"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It’s like someone who eats bread every day telling a rice-eating kid that bread is the real staple food—completely forgetting that for the kid, rice is the staple.&lt;/p&gt;

&lt;p&gt;So my conclusion? Just stick with Markdown—it’s plenty structured. And with modern LLMs, even plain-text instructions work fine as long as you’re clear.&lt;/p&gt;

&lt;h3&gt;
  
  
  A universal template for system prompts
&lt;/h3&gt;

&lt;p&gt;Just like I usually write articles in a three-part structure (introduction, body, conclusion), having a prompt template saves you from staring at a blank screen.&lt;/p&gt;

&lt;p&gt;What template works for system prompts? I use the “Who, Can, Do” framework. Every good prompt includes: role, what to (not) do, and how to do it. I define each part with a subheading. Here’s an example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Role
You are a data analyst skilled at breaking complex tasks into Python-solvable subtasks.

## Tasks
1. **Task breakdown**: Split the user request into substeps, each solvable with Python.
2. **Code generation**: Turn the current substep into Python code.
3. **Code execution**: Run the code using a tool and get the result.
4. **Iterate**: Use the result to decide the next step. Repeat steps 1–3 until you have a final answer.
5. **Insight &amp;amp; advice**: Add thoughtful, practical insights based on the analysis.

## Requirements
- Execute one step at a time. No skipping or combining.

## Output
- Use Markdown with a clear structure.
- Keep tone friendly but authoritative.
- Add emojis for warmth.
- Format numbers with commas (e.g., 1,000).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here’s what each section means:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Role&lt;/strong&gt; tells the LLM “who I am and what I can do.” “Who I am” shapes output style—serious or playful—based on the role you assign. “What I can do” sets initial boundaries. For MoE models, it can even influence which expert module activates.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Tasks&lt;/strong&gt; are your specific instructions. Since I recommend Markdown, use ordered lists if steps must run in sequence; otherwise, use unordered lists. Lists make your intent crystal clear.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Requirements&lt;/strong&gt; remind the LLM of its limits. Model makers train their LLMs to answer everything—but reality isn’t like that. Explicitly stating what it can’t do reduces hallucinations.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Output&lt;/strong&gt; guides output format and tone. Use this section when you care about style, structure, or voice.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can add extra sections based on your needs.&lt;/p&gt;

&lt;p&gt;For a coding agent, add a “Code Style” section:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Code Style
- Code runs in Jupyter. Reuse existing variables.
- Write incrementally and leverage kernel state to avoid repetition.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you use RAG or want to show the LLM how to think, add an “Examples” section.&lt;/p&gt;
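&lt;p&gt;Here’s a made-up illustration of what an “Examples” section could look like for the data analyst prompt above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Examples
User: What were the total sales per month in 2024?
Assistant (step 1): Load the orders table and parse the order dates.
Assistant (step 2): Group by month and sum the sales amount.
Assistant (step 3): Return a Markdown table, one row per month, with numbers formatted with commas.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;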

&lt;h3&gt;
  
  
  Let the LLM help you debug your prompt
&lt;/h3&gt;

&lt;p&gt;Tweaking prompts is expensive—especially when a perfectly tuned prompt stops working after switching models. What can you do besides starting over?&lt;/p&gt;

&lt;p&gt;Use the LLM itself!&lt;/p&gt;

&lt;p&gt;After setting your System Prompt, just ask: “&lt;strong&gt;Please repeat in detail what you understand my instructions to be.&lt;/strong&gt;”&lt;/p&gt;

&lt;p&gt;It’s like asking your friends to repeat back a task before they start—to confirm understanding.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffja7fddysudkcnhon2rv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffja7fddysudkcnhon2rv.png" alt="Ask the LLM to repeat the instructions I gave it. Image by Author" width="800" height="716"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This makes the LLM instantly return its interpretation of your system prompt. Compare it to your original intent and spot gaps.&lt;/p&gt;

&lt;p&gt;Or go further: ask “&lt;strong&gt;Please use Markdown to repeat in detail what you understand my instructions to be.&lt;/strong&gt;”&lt;/p&gt;

&lt;p&gt;The LLM will then write a clean Markdown version of its understanding. You can copy the good parts straight back into your own prompt. Trust me—it will follow what it writes. I’ve tested this countless times.&lt;/p&gt;

&lt;h3&gt;
  
  
  Learn LLM fundamentals systematically
&lt;/h3&gt;

&lt;p&gt;Knowing what works (and what doesn’t) with LLMs requires understanding how they work. This guide can’t cover everything—you need proper training.&lt;/p&gt;

&lt;p&gt;I highly recommend Coursera’s &lt;a href="https://imp.i384100.net/e1VAx6?ttd=c_1&amp;amp;ref=dataleadsfuture.com" rel="noopener noreferrer"&gt;&lt;strong&gt;Google AI Essentials Specialization&lt;/strong&gt;&lt;/a&gt;. &lt;strong&gt;It teaches you how to use LLM tools, write effective prompts, handle hallucinations, and more.&lt;/strong&gt; In just 4 hours, you’ll go from beginner to AI power user.&lt;/p&gt;

&lt;p&gt;I took this course when I started. It provided me with a rock-solid foundation for agent development—and many of the tricks in this post came straight from it.&lt;/p&gt;

&lt;p&gt;Beyond theory, I’ll now share some of my go-to prompt examples. You can use them directly at work or as inspiration for your own prompts. Let’s dive in.&lt;/p&gt;




&lt;h2&gt;
  
  
  My Work Prompt Examples
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Prompt for blog cover images
&lt;/h3&gt;

&lt;p&gt;Let’s start with the prompt I use to generate blog cover art.&lt;/p&gt;

&lt;p&gt;If you’ve read my posts before, you’ll notice a consistent style: a cute little rabbit busy doing various things. I had DeepSeek generate the image prompt, then used DALL·E 3 to create the picture.&lt;/p&gt;

&lt;p&gt;Even though OpenAI says DALL·E 3 boosts prompts automatically, I still get better results by first using an LLM to write a full image prompt. Here’s what I give DeepSeek:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Role
You are a visual artist skilled at writing DALL·E 3-friendly prompts.

## Task
Rewrite my [scene description] into a detailed English prompt perfect for DALL·E 3.

## Length
Describe in 5 bullet points. Only the prompt—no intro or explanation.

## Style
Colorful illustration on slightly yellowed parchment paper, filled with tech elements.

## Visuals
Include impressive details like camera angle and lighting.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I used this exact prompt in a previous post about generating ink-wash style illustrations:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/use-llamaindex-workflow-to-create-an-ink-painting-style-image-generation-workflow/" rel="noopener noreferrer"&gt;Use LLamaIndex Workflow to Create an Ink Painting Style Image Generation Workflow&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Even though that post automated the whole workflow, you can still manually generate the image prompt first, then create the picture.&lt;/p&gt;

&lt;h3&gt;
  
  
  Daily translation assistant
&lt;/h3&gt;

&lt;p&gt;Since I started blogging, I have often chatted with readers from around the world and answered their questions.&lt;/p&gt;

&lt;p&gt;I want my English to sound natural and conversational—not stiff like machine translation. So I use an LLM with this prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Role
You are an expert Chinese-English translator in computer science and programming.

## Task
- Detect the language of the user’s message.
- If it’s not Chinese, translate it into Chinese.
- If it’s Chinese, translate it to English.

## Style
- Keep translations simple and clear. Avoid complex words.
- Use vocabulary a middle schooler would understand.
- Sound conversational—like a good friend chatting with you.

## Rules
1. Only translate—never do anything else.
2. Output only the translation—no intro or notes.

## Special Terms
Translate these terms as follows:
[Chinese phrase]: [English translation]
大模型: LLM
大语言模型: LLM
私有化部署: self-hosted

-----------------------------------------------
Now translate this:
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This prompt automatically detects the input language and translates the text into the other language:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fusxb3ljrehhq9hapmcbw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fusxb3ljrehhq9hapmcbw.png" alt="Automatically detect the language and translate it. Image by Author" width="800" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A few notes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;I set context length to 2—only the current text and translation are kept. This prevents interference from past chats.&lt;/li&gt;
&lt;li&gt;I lock down translations for special terms because models often disagree on them.&lt;/li&gt;
&lt;li&gt;I end with “Now translate this:”. LLMs predict the next token from the tokens that came before, so if the input is a reader’s question and I don’t add this line, the LLM answers the question instead of translating it. This phrase makes the task unambiguous.&lt;/li&gt;
&lt;/ol&gt;

&lt;h3&gt;
  
  
  Article translation prompt
&lt;/h3&gt;

&lt;p&gt;English isn’t my first language, so I always need to translate my new articles.&lt;/p&gt;

&lt;p&gt;At first, I used DeepL plus Grammarly for polishing. Paying for two subscriptions every month got expensive—and let’s be honest, machine translation isn’t great.&lt;/p&gt;

&lt;p&gt;So as soon as GPT-3.5 came out, I switched to using LLMs for translation. Even the article you’re reading now was translated by an LLM:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Role
You are a senior data science expert and blog editor.

## Task
I wrote a Chinese data science article. Translate it into English.

## Rules
- Keep my original paragraph breaks.
- Never add code that wasn’t there.
- Avoid adverbs, prepositions, and passive voice.
- **Only translate—don’t rewrite or change my content.**

## Style
- Conversational, light, and cheerful.
- Don’t bold the first word or phrase in list items.
- Use Title Case for headings and Sentence case for subheadings.

## Tone
- Keep it simple and easy to understand.
- Use words any US 12th grader would know.
- Sound like you’re chatting with a good friend.

## Audience
Beginners in data science and people curious about the field.

## Special Terms
Translate these terms as follows:
[Chinese phrase]: [English translation]
大模型: LLM
大语言模型: LLM
私有化部署: self-hosted
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Data science research assistant
&lt;/h3&gt;

&lt;p&gt;My day job is data science, so I set up a specialized assistant rather than a general-purpose one. I want the code it writes to match my habits, and over time those preferences settled into the “Requirements” section:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Role
You are a data scientist acting as my programming assistant. Help me improve my data science coding and algorithm skills.

## Requirements
1. Be truthful and precise. Never make things up.
2. Ensure all code runs correctly.
3. Use Python 3.12+ features, syntax, APIs, and best practices.
4. Always use the latest versions and APIs of third-party libraries.
5. Write clean, efficient, readable code.
6. Comment only when necessary.
7. Use vertical bar | for type unions.
8. Prefer `with` statements.
9. Use `pathlib` for file and directory operations.

## Tool Use
* tavily-search: Always use `"search_depth": 'advanced'`.

## Response
Be truthful, thorough, well-organized, and accurate.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Remember the MCP tip earlier? Here I tell the LLM to use tavily’s advanced mode for deep web searches when it lacks info.&lt;/p&gt;

&lt;p&gt;Honestly, with this setup plus Gemini 3.0 Pro, the LLM has massively boosted my data science learning—high efficiency and minimal hallucinations.&lt;/p&gt;

&lt;h3&gt;
  
  
  General-purpose daily assistant
&lt;/h3&gt;

&lt;p&gt;This one’s simple. I just want honest, helpful answers, without the fake positivity Qwen3-Max usually pours on. With this prompt it handles everyday questions well, and Qwen3-Max is cheaper too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Role
You are my personal assistant. Give me sincere, useful advice for life and work.

## Requirements
* All knowledge, sources, and text must be real and accurate. No fabrications.
* Never lie. If you don’t know something, say so.
* Don’t just agree with me—point out problems or suggest improvements.

## Output
* Never use em dashes in your replies.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Wrap-Up
&lt;/h2&gt;

&lt;p&gt;That’s all the LLM tips and prompt examples I’ve gathered over the past two years. They’re not perfect, but they’ve seriously boosted my daily productivity. I hope they help you, too.&lt;/p&gt;

&lt;p&gt;I’ll keep updating this post with more practical tricks. If there’s something you’d like to see, leave me a comment!&lt;/p&gt;

&lt;p&gt;One last plug: take the &lt;a href="https://imp.i384100.net/e1VAx6?ttd=c_2&amp;amp;ref=dataleadsfuture.com" rel="noopener noreferrer"&gt;&lt;strong&gt;Google AI Essentials Specialization&lt;/strong&gt;&lt;/a&gt; on Coursera. Its glowing reviews prove it’s worth your time. For a small fee, you’ll quickly master LLM usage—and earn a Google AI certificate to boot.&lt;/p&gt;




&lt;p&gt;Enjoyed this read? &lt;a href="https://www.dataleadsfuture.com/#/portal/signup" rel="noopener noreferrer"&gt;Subscribe now to get more cutting-edge data science tips straight to your inbox!&lt;/a&gt; Your feedback and questions are welcome — let’s discuss in the comments below!&lt;/p&gt;

&lt;p&gt;This article was originally published on &lt;a href="https://www.dataleadsfuture.com/share-my-llm-prompts-and-tips-that-make-work-and-learning-super-efficient/" rel="noopener noreferrer"&gt;Data Leads Future&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>promptengineering</category>
      <category>chatgpt</category>
      <category>llm</category>
    </item>
  </channel>
</rss>
