<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Peng Qian</title>
    <description>The latest articles on DEV Community by Peng Qian (@qtalen).</description>
    <link>https://dev.to/qtalen</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg</url>
      <title>DEV Community: Peng Qian</title>
      <link>https://dev.to/qtalen</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/qtalen"/>
    <language>en</language>
    <item>
      <title>Your AI always writes messy code? That's because your spec files have never been properly reviewed. 

By adding a reflection mechanism to the proposal stage of OpenSpec, I got DeepSeek-v4-pro to match Opus 4.6 in coding ability:</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Mon, 01 Jun 2026 04:05:05 +0000</pubDate>
      <link>https://dev.to/qtalen/your-ai-always-writes-messy-code-thats-because-your-spec-files-have-never-been-properly-5gnc</link>
      <guid>https://dev.to/qtalen/your-ai-always-writes-messy-code-thats-because-your-spec-files-have-never-been-properly-5gnc</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/qtalen/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow-15l7" class="crayons-story__hidden-navigation-link"&gt;Reflection SDD: Use a Reflection Harness to Level Up Your OpenSpec Workflow&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/qtalen" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" alt="qtalen profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/qtalen" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Peng Qian
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Peng Qian
                
              
              &lt;div id="story-author-preview-content-3792447" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/qtalen" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Peng Qian&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/qtalen/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow-15l7" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jun 1&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/qtalen/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow-15l7" id="article-link-3792447"&gt;
          Reflection SDD: Use a Reflection Harness to Level Up Your OpenSpec Workflow
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/programming"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;programming&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/coding"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;coding&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/vibecoding"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;vibecoding&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/qtalen/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow-15l7" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt;&amp;nbsp;reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/qtalen/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow-15l7#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              1&lt;span class="hidden s:inline"&gt;&amp;nbsp;comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            9 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>Reflection SDD: Use a Reflection Harness to Level Up Your OpenSpec Workflow</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Mon, 01 Jun 2026 04:00:07 +0000</pubDate>
      <link>https://dev.to/qtalen/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow-15l7</link>
      <guid>https://dev.to/qtalen/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow-15l7</guid>
      <description>&lt;p&gt;Stop letting bad spec files tank your code quality&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this article, I want to walk you through how I introduced a reflection mechanism into the OpenSpec workflow inside OpenCode, and how that dramatically improved the quality of AI-generated code.&lt;/p&gt;

&lt;p&gt;After nearly a month of testing, this reflection workflow has gotten DeepSeek-V4-pro in OpenCode to perform at roughly the same level as Claude Opus 4.6. The only cost is some extra review time and a few more tokens. Trust me, it's worth it.&lt;/p&gt;

&lt;p&gt;Can't wait to find out how? Let's get into it.&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Works
&lt;/h2&gt;

&lt;p&gt;I've been using AI coding for a while now. Compared to Claude Code, I prefer building my own SDD-based coding workflow with OpenCode and OpenSpec. I wrote a well-received article specifically about this OpenCode workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/how-i-use-opencode-oh-my-opencode-slim-and-openspec-to-build-my-own-ai-coding-environment/" rel="noopener noreferrer"&gt;How I Use OpenCode, Oh-My-OpenCode-Slim, and OpenSpec to Build My Own AI Coding Environment&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With OpenSpec, and the explore → propose → apply → verify → archive workflow loop, we can finally get LLMs to handle complex project development.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fth233gs7ef7nlx1uc67f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fth233gs7ef7nlx1uc67f.png" alt="With the OpenSpec workflow loop, LLMs can finally write usable code. Image by Author" width="361" height="341"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But just like you've probably run into, even with the SDD workflow, no matter which model I use, GPT 5.5 or the latest DeepSeek-V4-pro, the AI still inevitably produces hidden bugs or piles up messy code. Code I'd never feel comfortable putting in production.&lt;/p&gt;

&lt;p&gt;My first fix was to call a &lt;code&gt;@reviewer&lt;/code&gt; sub-agent after the &lt;code&gt;/opsx-apply&lt;/code&gt; phase to do a code review on the changes. Sometimes that worked and caught architectural or implementation issues. But the impact was limited.&lt;/p&gt;

&lt;p&gt;Often I'd only discover something was wrong after using the project for a while: a scenario wasn't covered, edge cases were missed, or one part of the code got updated but a related module didn't.&lt;/p&gt;

&lt;p&gt;Later I stepped back and looked at the whole AI coding workflow again, and that's when I spotted the real problem.&lt;/p&gt;

&lt;p&gt;As programmers, we always focus on whether our code is good, so we naturally look at things from the code level.&lt;/p&gt;

&lt;p&gt;But we completely overlooked the quality of the proposal files that OpenSpec generates. We'd finish discussing requirements, generate the proposal file, and then just let the AI start implementing. That's how input-level bugs get introduced. When things go wrong, we blame the model.&lt;/p&gt;

&lt;p&gt;Think about it, back in the traditional coding era, when a product manager handed over a requirements doc, there was a critical step before we started coding: requirements review. We wouldn't touch the keyboard until every issue in the requirements doc or design doc was sorted out.&lt;/p&gt;

&lt;p&gt;So why did we forget this step in the AI coding era? That's exactly what we're going to fix today: add the requirements review step back into the OpenSpec workflow and see if it makes AI-generated code better.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Do It
&lt;/h2&gt;

&lt;p&gt;The SDD workflow isn't anything exotic. If you're familiar with multi-agent system design patterns, you'll recognize that SDD is basically the &lt;code&gt;plan-execute&lt;/code&gt; pattern.&lt;/p&gt;

&lt;p&gt;One agent breaks the user's task into a step-by-step plan file, then another agent follows that plan to execute the task. This lets the agent system handle complex work.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ckmakdln92c90sana65.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9ckmakdln92c90sana65.png" alt="The Plan agent makes the plan, the Executor agent gets it done. Image by Author" width="561" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But how do you guarantee the quality of what the agents produce in a &lt;code&gt;plan-execute&lt;/code&gt; setup? That's where a pattern called &lt;code&gt;reflection&lt;/code&gt; comes in.&lt;/p&gt;

&lt;p&gt;The reflection pattern adds a reflection agent to the multi-agent workflow. This agent typically runs on a completely different LLM and reviews the output of the &lt;code&gt;plan&lt;/code&gt; or &lt;code&gt;executor&lt;/code&gt; agent from a different angle, which raises the overall performance of the multi-agent system.&lt;/p&gt;

&lt;p&gt;The reflection pattern sees wide use in content creation and deep research scenarios, which proves it works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kdinm7axsenigi4o7b7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kdinm7axsenigi4o7b7.png" alt="Just adding a reflection step can dramatically improve task quality. Image by Author" width="681" height="431"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since the pattern is proven, we can bring the same reflection step into the OpenSpec workflow, targeting the proposal files.&lt;/p&gt;

&lt;p&gt;I'll introduce a reflection agent that runs on a different LLM from the primary agent, reviewing the proposal files from a different angle. We'll also adjust the OpenSpec workflow so this reflection agent plays an active role in it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Introducing the reflection agent
&lt;/h3&gt;

&lt;p&gt;Adding a new agent in OpenCode is simple. Just drop a new Markdown file into &lt;code&gt;~/.config/opencode/agents/&lt;/code&gt; with the agent's prompt inside.&lt;/p&gt;

&lt;p&gt;The core job of this agent is straightforward: review the artifact files that OpenSpec generates from multiple angles, making sure the quality of the requirements input is solid from the start. This agent sits between the &lt;code&gt;/opsx-propose&lt;/code&gt; and &lt;code&gt;/opsx-apply&lt;/code&gt; phases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrc98kmccmdg6z9g2swm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqrc98kmccmdg6z9g2swm.png" alt="Only proposals that pass review can move on to the apply phase. Image by Author" width="651" height="171"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can have deepseek-v4-pro generate the first draft of this agent's prompt:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;You are an &lt;span class="gs"&gt;**OpenSpec Change Reviewer**&lt;/span&gt; — a critical thinker and auditor focused on substance.

Your job is to review every artifact in an OpenSpec change before it moves to implementation, and find the issues that would actually cause implementation failure or rework.

&lt;span class="gu"&gt;## Core Principle: Distinguish Substantive Defects from Formatting Issues&lt;/span&gt;

&lt;span class="gs"&gt;**Substantive defects = issues that cause the implementation to go in the wrong direction, miss critical scenarios, create contradictions, or make acceptance impossible.**&lt;/span&gt;
&lt;span class="gs"&gt;**Formatting issues = style or wording differences that don't affect implementation quality.**&lt;/span&gt;

Your primary job is to find the former. You can mention the latter, but mark them as optional suggestions and put them at the end.&lt;span class="sb"&gt;


&lt;/span&gt;&lt;span class="gu"&gt;## Your Position&lt;/span&gt;

You work in the &lt;span class="gs"&gt;**phase between `/opsx-propose` and `/opsx-apply`**&lt;/span&gt;:

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;explore → /opsx-propose → ⬅ you are here (possibly multiple rounds) → /opsx-apply → verify → archive&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;
The spec is not yet frozen. Implementation has not started. Your mission: &lt;span class="gs"&gt;**find the defects that would actually cause rework or incidents before any code gets written**&lt;/span&gt;. Catching a spec error takes minutes. Fixing wrong code takes hours.

&lt;span class="gu"&gt;## Principles&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**Constructive and strict.**&lt;/span&gt; For every issue, explain not just "what" but "why it would cause rework or an incident."
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Specific, not vague.**&lt;/span&gt; Point to exact file locations, requirement names, and task numbers.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Severity levels.**&lt;/span&gt; 🔴 Blocking vs 🟡 Should Fix vs 💡 Suggestion — don't mix them up.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Context-aware.**&lt;/span&gt; Evaluate against the existing system (&lt;span class="sb"&gt;`openspec/specs/`&lt;/span&gt;) rather than in a vacuum.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Read-only.**&lt;/span&gt; Never modify files. You surface problems; OpenSpec executes the fixes.

&lt;span class="gu"&gt;## Anti-Patterns to Avoid&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; Rubber-stamping: saying "looks good!" without deep review.
&lt;span class="p"&gt;-&lt;/span&gt; Nitpicking: focusing on formatting while missing architectural flaws.
&lt;span class="p"&gt;-&lt;/span&gt; Jumping to solutions: proposing fixes before the user acknowledges the problem exists.
&lt;span class="p"&gt;-&lt;/span&gt; Ignoring existing specs: reviewing incremental changes without understanding the baseline.
&lt;span class="p"&gt;-&lt;/span&gt; Vague feedback: "this could be better" — say exactly what and why.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To review proposals from a different angle and improve the reflection quality, I recommend using a different LLM for the reflection agent than the one the main agent uses. For example, my primary coding agent uses deepseek-v4-pro, so the reflection agent uses kimi k2.6.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;OpenSpec Change Reviewer — after propose and before apply, critically reviews all artifact files under the change (proposal/design/specs/tasks)&lt;/span&gt;
&lt;span class="na"&gt;mode&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;subagent&lt;/span&gt;
&lt;span class="na"&gt;model&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;kimi-for-coding/k2p6&lt;/span&gt;
&lt;span class="na"&gt;tools&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;write&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="na"&gt;edit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
    &lt;span class="na"&gt;bash&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Locking down the openSpec workflow
&lt;/h3&gt;

&lt;p&gt;OpenSpec doesn't actually have a fixed workflow design, users follow a default SDD best practice. So the first step is to lock this workflow down, making OpenCode write an OpenSpec proposal before writing any code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="nn"&gt;---&lt;/span&gt;
&lt;span class="na"&gt;name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;openspec-workflow&lt;/span&gt;
&lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Mandatory prerequisite for ALL OpenSpec operations — load this BEFORE any openspec-* skill. Use when running /opsx-apply, /opsx-propose, /opsx-verify, /opsx-archive, /opsx-explore, /opsx-sync; running `openspec status`, `openspec list`, `openspec instructions`; reading files under `openspec/changes/`; or doing any OpenSpec stage (propose, apply, verify, archive, explore, sync).&lt;/span&gt;
&lt;span class="na"&gt;license&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;MIT&lt;/span&gt;
&lt;span class="na"&gt;compatibility&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Requires openspec CLI.&lt;/span&gt;
&lt;span class="na"&gt;metadata&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;author&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;Peng Qian&lt;/span&gt;
    &lt;span class="na"&gt;version&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;1.0"&lt;/span&gt;
&lt;span class="nn"&gt;---&lt;/span&gt;

&lt;span class="gu"&gt;## OpenSpec Workflow (Mandatory)&lt;/span&gt;

&lt;span class="gs"&gt;**All code changes must have a proposal before any code gets written.**&lt;/span&gt;

&lt;span class="gu"&gt;### Process&lt;/span&gt;
&lt;span class="p"&gt;1.&lt;/span&gt; &lt;span class="gs"&gt;**Explore**&lt;/span&gt; - When the user says "think about it," "discuss," or "explore," discuss only — no coding.
&lt;span class="p"&gt;2.&lt;/span&gt; &lt;span class="gs"&gt;**Propose**&lt;/span&gt; - Create proposal files under &lt;span class="sb"&gt;`openspec/changes/&amp;lt;change-name&amp;gt;/`&lt;/span&gt;.
&lt;span class="p"&gt;3.&lt;/span&gt; &lt;span class="gs"&gt;**Apply**&lt;/span&gt; - Implement according to the proposal tasks. &lt;span class="gs"&gt;**No file modifications without a proposal.**&lt;/span&gt;
&lt;span class="p"&gt;4.&lt;/span&gt; &lt;span class="gs"&gt;**Verify**&lt;/span&gt; - Verify after implementation is complete.
&lt;span class="p"&gt;5.&lt;/span&gt; &lt;span class="gs"&gt;**Archive**&lt;/span&gt; - Archive the change.

&lt;span class="gu"&gt;### Hard Rules&lt;/span&gt;
&lt;span class="p"&gt;
-&lt;/span&gt; &lt;span class="gs"&gt;**Bug fixes don't require editing or creating a proposal.**&lt;/span&gt; This hard rule only applies to feature changes or new features.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**No proposal, no change:**&lt;/span&gt; If the user asks to modify code, confirm that a matching proposal exists first, or create one.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**No proposal, no edits:**&lt;/span&gt; Before editing a file, check that a matching change directory exists under &lt;span class="sb"&gt;`openspec/changes/`&lt;/span&gt;.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**No coding in Explore mode:**&lt;/span&gt; When the user is in explore mode, &lt;span class="gs"&gt;**do not**&lt;/span&gt; create proposals, &lt;span class="gs"&gt;**do not**&lt;/span&gt; edit files, &lt;span class="gs"&gt;**do not**&lt;/span&gt; write tests.
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**After a change is complete:**&lt;/span&gt; Run the verify process to check that the implementation matches the proposal.

&lt;span class="gu"&gt;### Proposal Creation Requirements&lt;/span&gt;

Every change must include:
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`proposal.md`&lt;/span&gt; - reason and scope of the change
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`design.md`&lt;/span&gt; - design plan
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`tasks.md`&lt;/span&gt; - specific task list
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="sb"&gt;`.openspec.yaml`&lt;/span&gt; - change metadata

Additional requirements:
&lt;span class="p"&gt;-&lt;/span&gt; &lt;span class="gs"&gt;**Task granularity:**&lt;/span&gt; Each task in &lt;span class="sb"&gt;`tasks.md`&lt;/span&gt; should take no more than 2 hours.

&lt;span class="gu"&gt;### Violation Handling&lt;/span&gt;

Stop immediately and alert the user if any of the following are detected:
&lt;span class="p"&gt;-&lt;/span&gt; Code modification starts without a proposal.
&lt;span class="p"&gt;-&lt;/span&gt; Files are edited in explore mode.
&lt;span class="p"&gt;-&lt;/span&gt; Files outside the current proposal's scope are modified.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can put this workflow into your project's &lt;code&gt;AGENTS.md&lt;/code&gt; file so OpenCode follows it consistently. Or put it in the global &lt;code&gt;~/.config/opencode/AGENTS.md&lt;/code&gt; file so you don't have to configure it for every project.&lt;/p&gt;

&lt;p&gt;A better option is to turn the workflow into a skill, so OpenCode only loads this workflow definition when using OpenSpec. I've packaged the full workflow as the &lt;code&gt;openspec-workflow&lt;/code&gt; skill, you can grab the source file at the end of the article.&lt;/p&gt;

&lt;h3&gt;
  
  
  Inserting the reflection agent into the workflow
&lt;/h3&gt;

&lt;p&gt;Once the OpenSpec workflow is locked down as a skill, every SDD coding session will follow the spec-first, code-second process, which means the &lt;code&gt;/opsx-propose&lt;/code&gt; and &lt;code&gt;/opsx-apply&lt;/code&gt; phases.&lt;/p&gt;

&lt;p&gt;As mentioned earlier, the reflection agent sits between these two phases, reviewing the quality of the proposal files before any implementation starts. Following the reflection pattern, this review-and-fix cycle can run for multiple rounds until the proposal artifacts have no serious issues.&lt;/p&gt;

&lt;p&gt;To prevent an infinite loop, we need a hard cap on the number of review rounds. Let's update the &lt;code&gt;openspec-workflow&lt;/code&gt; &lt;code&gt;SKILL.md&lt;/code&gt; file to add the reflection process:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight markdown"&gt;&lt;code&gt;&lt;span class="gu"&gt;### OpenSpec Reflection Process&lt;/span&gt;
&lt;span class="p"&gt;
1.&lt;/span&gt; &lt;span class="gs"&gt;**After each batch of artifacts is created**&lt;/span&gt;, the &lt;span class="sb"&gt;`@openspec-reviewer`&lt;/span&gt; agent &lt;span class="gs"&gt;**must**&lt;/span&gt; be called to review the &lt;span class="gs"&gt;**artifact files in that batch**&lt;/span&gt;.
&lt;span class="p"&gt;2.&lt;/span&gt; The main agent fixes the &lt;span class="gs"&gt;**current batch of artifact files**&lt;/span&gt; based on the feedback from &lt;span class="sb"&gt;`@openspec-reviewer`&lt;/span&gt;.
&lt;span class="p"&gt;3.&lt;/span&gt; Call &lt;span class="sb"&gt;`@openspec-reviewer`&lt;/span&gt; again to review the &lt;span class="gs"&gt;**current batch of artifact files**&lt;/span&gt;.
&lt;span class="p"&gt;4.&lt;/span&gt; &lt;span class="gs"&gt;**Review pass criteria:**&lt;/span&gt;
   4a. &lt;span class="gs"&gt;**Single-round pass:**&lt;/span&gt; After the current review round, if "### 🔴 Remaining Issues" does not exist or is empty, move to the next batch.
   4b. &lt;span class="gs"&gt;**Fix loop:**&lt;/span&gt; If 🔴 issues remain → main agent fixes → next review round → back to 4a. Repeat until passing or 4c triggers.
   4c. &lt;span class="gs"&gt;**Hard cap (MAX_ROUNDS = 5):**&lt;/span&gt; If the same batch has gone through 5 review rounds without passing 4a → stop the loop and hand off to a human for a decision.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;At this point, a typical reflection process for OpenSpec proposals is in place. All you need to do is run &lt;code&gt;/opsx-propose&lt;/code&gt;, go grab a coffee, and wait for the reflection agent to gradually refine your proposal.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F99xb4z1qaiu6pj928k67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F99xb4z1qaiu6pj928k67.png" alt="After the OpenSpec proposal is created, the openspec-reviewer agent will be automatically invoked to begin reviewing and fixing it. Image by Author" width="579" height="334"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;After going through a few rounds of "review and fix," you'll notice this reflection process doesn't work as perfectly as you'd expect.&lt;/p&gt;

&lt;p&gt;It's common to fix issue A only to have issue B pop up. Sometimes all 5 review rounds finish, and not every issue is fully resolved. Especially with powerful but not top-tier models like deepseek-v4. It's fine. We can optimize the process to fix this.&lt;/p&gt;

&lt;h3&gt;
  
  
  So what's next:
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;I'll require that discussions in the explore phase be saved as files, so the proposal creation and review process has a checklist baseline.&lt;/li&gt;
&lt;li&gt;Review one file at a time, rather than waiting for all files to be generated before starting the review.&lt;/li&gt;
&lt;li&gt;Save the logs from each round of review, so they can be used in the next round or when someone needs to step in manually.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I've also included the full source files for the reflection agent and the OpenSpec Workflow skill.&lt;/p&gt;

&lt;p&gt;Interested? &lt;a href="https://www.dataleadsfuture.com/reflection-sdd-use-a-reflection-harness-to-level-up-your-openspec-workflow/" rel="noopener noreferrer"&gt;&lt;strong&gt;Click here to keep reading.&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>coding</category>
      <category>vibecoding</category>
    </item>
    <item>
      <title>Ride the wave of AI coding, don't get swept away by it.

In my latest article, I dive into the practical details of building your own personal AI coding setup using OpenCode Oh-My-OpenCode-Slim OpenSpec. 

It will help you get a better handle on AI coding.</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Fri, 17 Apr 2026 12:14:25 +0000</pubDate>
      <link>https://dev.to/qtalen/ride-the-wave-of-ai-coding-dont-get-swept-away-by-it-in-my-latest-article-i-dive-into-the-2e57</link>
      <guid>https://dev.to/qtalen/ride-the-wave-of-ai-coding-dont-get-swept-away-by-it-in-my-latest-article-i-dive-into-the-2e57</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/qtalen/how-i-use-opencode-oh-my-opencode-slim-and-openspec-to-build-my-own-ai-coding-environment-2clf" class="crayons-story__hidden-navigation-link"&gt;How I Use OpenCode, Oh-My-OpenCode-Slim, and OpenSpec to Build My Own AI Coding Environment&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/qtalen" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" alt="qtalen profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/qtalen" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Peng Qian
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Peng Qian
                
              
              &lt;div id="story-author-preview-content-3509709" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/qtalen" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Peng Qian&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/qtalen/how-i-use-opencode-oh-my-opencode-slim-and-openspec-to-build-my-own-ai-coding-environment-2clf" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Apr 17&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/qtalen/how-i-use-opencode-oh-my-opencode-slim-and-openspec-to-build-my-own-ai-coding-environment-2clf" id="article-link-3509709"&gt;
          How I Use OpenCode, Oh-My-OpenCode-Slim, and OpenSpec to Build My Own AI Coding Environment
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/vibecoding"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;vibecoding&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/coding"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;coding&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/python"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;python&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/qtalen/how-i-use-opencode-oh-my-opencode-slim-and-openspec-to-build-my-own-ai-coding-environment-2clf" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;2&lt;span class="hidden s:inline"&gt;&amp;nbsp;reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/qtalen/how-i-use-opencode-oh-my-opencode-slim-and-openspec-to-build-my-own-ai-coding-environment-2clf#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              &lt;span class="hidden s:inline"&gt;Add&amp;nbsp;Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            8 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
    </item>
    <item>
      <title>How I Use OpenCode, Oh-My-OpenCode-Slim, and OpenSpec to Build My Own AI Coding Environment</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Fri, 17 Apr 2026 12:11:48 +0000</pubDate>
      <link>https://dev.to/qtalen/how-i-use-opencode-oh-my-opencode-slim-and-openspec-to-build-my-own-ai-coding-environment-2clf</link>
      <guid>https://dev.to/qtalen/how-i-use-opencode-oh-my-opencode-slim-and-openspec-to-build-my-own-ai-coding-environment-2clf</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;I have never used Claude Code. The reason is simple. Claude Code is too expensive. Even with a subscription, the cost-to-value ratio does not work for my research.&lt;/p&gt;

&lt;p&gt;So I have been building my own AI coding environment using OpenCode as the foundation, combined with &lt;code&gt;Oh-My-OpenCode-Slim (multi-agent orchestration)&lt;/code&gt; and &lt;code&gt;OpenSpec (SDD)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;My take is this: if you understand what you want to build, and you know how to use coding tools properly, especially with well-written Spec files as constraints, frontier open-source models like &lt;code&gt;Qwen3.6-Plus&lt;/code&gt;, &lt;code&gt;Kimi-k2.5&lt;/code&gt;, and &lt;code&gt;GLM-5&lt;/code&gt; can handle your daily coding tasks just fine.&lt;/p&gt;

&lt;p&gt;There is another huge advantage to open-source software: community power. The community can tune system prompts and model parameters to fit different models well, and get the most out of open-source models.&lt;/p&gt;

&lt;p&gt;In this article, I want to share what I have learned from using OpenCode and its surrounding tools. &lt;strong&gt;I will skip the generic tutorials you find everywhere online and focus only on the details I think actually matter.&lt;/strong&gt; I hope this helps you make better choices.&lt;/p&gt;




&lt;h2&gt;
  
  
  Tool Installation and Environment Setup
&lt;/h2&gt;

&lt;p&gt;I will cover my experience in two parts: installing and configuring OpenCode and its related plugins, and my AI coding workflow.&lt;/p&gt;

&lt;p&gt;Let's start with tool installation and environment setup, beginning with OpenCode itself.&lt;/p&gt;

&lt;h3&gt;
  
  
  Installing OpenCode
&lt;/h3&gt;

&lt;p&gt;Unlike most coding agents that only offer a TUI-based command-line tool, OpenCode also comes with a desktop app with a graphical interface. I use the desktop app for my daily coding work. It is clearly much more efficient than the TUI version.&lt;/p&gt;

&lt;p&gt;That said, you still need to install the command-line program first. From my testing, some plugins need a command-line environment to check whether OpenCode is installed on your machine during project initialization.&lt;/p&gt;

&lt;p&gt;First, make sure you have Node.js installed. Then run this npm command to install the OpenCode CLI:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npm i &lt;span class="nt"&gt;-g&lt;/span&gt; opencode-ai
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After that, go to the official website, download the OpenCode Desktop installer, and double-click to install.&lt;/p&gt;

&lt;h3&gt;
  
  
  Configuring OpenCode
&lt;/h3&gt;

&lt;p&gt;After installing OpenCode Desktop, open the app. Once you select your project directory, you will land on the main OpenCode interface. The features are fairly intuitive, so I will not walk through each one. But before you type your first &lt;code&gt;Hello World&lt;/code&gt;, you should check your terminal configuration first.&lt;/p&gt;

&lt;h4&gt;
  
  
  Configuring the terminal
&lt;/h4&gt;

&lt;p&gt;I use Windows 11. On Windows, OpenCode Desktop defaults to PowerShell as its terminal. Many companies, though, do not allow PowerShell. If you are in a non-English locale, OpenCode may run into character encoding issues when running shell commands through PowerShell, causing those commands to fail.&lt;/p&gt;

&lt;p&gt;In that case, you need to change your default terminal.&lt;/p&gt;

&lt;p&gt;OpenCode uses the &lt;code&gt;SHELL&lt;/code&gt; environment variable to determine which terminal to use. You can configure Windows Command Prompt (&lt;code&gt;cmd.exe&lt;/code&gt;), &lt;code&gt;WSL&lt;/code&gt;, or &lt;code&gt;git bash&lt;/code&gt;. Personally, I prefer &lt;code&gt;cmd.exe&lt;/code&gt; because I had already installed a lot of CLI tools before setting up OpenCode. Using &lt;code&gt;cmd.exe&lt;/code&gt; directly saves me from reinstalling everything.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;SET &lt;span class="nv"&gt;SHELL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"%windir%&lt;/span&gt;&lt;span class="se"&gt;\s&lt;/span&gt;&lt;span class="s2"&gt;ystem32&lt;/span&gt;&lt;span class="se"&gt;\c&lt;/span&gt;&lt;span class="s2"&gt;md.exe"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Configuring model providers
&lt;/h4&gt;

&lt;p&gt;Next, let's talk about how to configure model providers.&lt;/p&gt;

&lt;p&gt;Open the settings window and select "Providers". You will see a list of the most popular model providers and API relays. If you want to use open-source models, though, they probably will not be on that list.&lt;/p&gt;

&lt;p&gt;At that point, you might click "Custom provider" and manually fill in the &lt;code&gt;model id&lt;/code&gt;, &lt;code&gt;base url&lt;/code&gt;, &lt;code&gt;api key&lt;/code&gt;, and so on. The problem is that OpenCode then has no idea about your model's context window size or pricing, which causes features like automatic context compression to stop working correctly.&lt;/p&gt;

&lt;p&gt;The right approach is to click the "Show more providers" link at the bottom, find the provider you want to add, and enter that provider's &lt;code&gt;api key&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6771y7fmdh2hnndnpnwp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6771y7fmdh2hnndnpnwp.png" alt="Click the show more providers link and pick a provider for your open-source model. Image by Author" width="800" height="505"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once configured, all models from that provider will appear in the model list. These models come with metadata like context size and pricing, so context management plugins can work correctly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiekeexv0rbtmlf9xg44z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiekeexv0rbtmlf9xg44z.png" alt="By setting up your provider correctly, you'll get all kinds of metadata about your models. Image by Author" width="800" height="587"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The downside is that you cannot directly see your provider ID this way, which makes it tricky to configure &lt;code&gt;Oh-My-OpenCode-Slim&lt;/code&gt; later.&lt;/p&gt;

&lt;p&gt;No worries. OpenCode already saved your provider configuration when you selected your provider. You can find it at &lt;code&gt;~/.local/share/opencode/auth.json&lt;/code&gt;. Your provider ID and API key are both there.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foji5lja0iyuverihwlf9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foji5lja0iyuverihwlf9.png" alt="You can find your provider ID in the auth.json file. Image by Author" width="745" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Enable workspaces
&lt;/h4&gt;

&lt;p&gt;The biggest difference between AI coding and traditional coding is that while you wait for the AI to work, you can actually work on another requirement at the same time. If you use git for version control, you would normally need to create a separate directory and check out a new branch.&lt;/p&gt;

&lt;p&gt;Or you can use git's worktree feature to create a new worktree on top of your current branch. When you are doing parallel development, using worktrees is much more convenient than creating new branches.&lt;/p&gt;

&lt;p&gt;Compared to OpenCode CLI, the desktop app has a clear advantage here: it natively supports the worktree feature. In OpenCode Desktop, this feature is called "workspace".&lt;/p&gt;

&lt;p&gt;The way to open a workspace is a bit hidden. Right-click the project icon in the top-left corner of the window, then select "Enable Workspace" from the menu. From there, you can create multiple workspaces in the conversation list and work on them simultaneously. The corresponding branches and code directories will be created automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbx8sybzg7l7meajwnmch.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbx8sybzg7l7meajwnmch.png" alt="Right-click on your project to enable the workspace. Image by Author" width="259" height="260"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When the coding work in a workspace is done, you can ask the AI to submit a PR for the current code, then close the workspace. The branch and code directory that were created for it will be cleaned up automatically. Very convenient.&lt;/p&gt;

&lt;h4&gt;
  
  
  Choosing the right agent
&lt;/h4&gt;

&lt;p&gt;If you want to use OpenCode without any extra plugins, pay attention to how you use agents.&lt;/p&gt;

&lt;p&gt;OpenCode has two types of agents: primary agents, which you choose yourself, and sub-agents, which the primary agent calls on its own when needed.&lt;/p&gt;

&lt;p&gt;Without any plugins installed, OpenCode only provides two primary agents: &lt;code&gt;Build&lt;/code&gt; and &lt;code&gt;Plan&lt;/code&gt;: The &lt;code&gt;Build&lt;/code&gt; agent has full tool access and is the default choice. The &lt;code&gt;Plan&lt;/code&gt; agent has no editing permissions. Its job is to ask you clarifying questions when you describe a requirement, and eventually produce an execution plan.&lt;/p&gt;

&lt;p&gt;When you first try this, you might go straight to the Build agent. But for complex tasks, &lt;code&gt;Build&lt;/code&gt; tends to just start coding based on its own interpretation. That is like looking through a straw. It fixes things locally without thinking through the overall architecture and design patterns.&lt;/p&gt;

&lt;p&gt;The right approach is to start every new requirement with the &lt;code&gt;Plan&lt;/code&gt; agent for requirement clarification, and get a solid execution plan first. Only then should you hand things off to &lt;code&gt;Build&lt;/code&gt; to start development.&lt;/p&gt;

&lt;p&gt;But even that is not enough. Model context is limited. As coding progresses, the execution plan from earlier in the conversation can get pushed out of the context window.&lt;/p&gt;

&lt;p&gt;A better approach is to ask &lt;code&gt;Build&lt;/code&gt; to save the execution plan as a Markdown file before starting to code. Review that file, confirm everything looks good, then start a fresh session and have &lt;code&gt;Build&lt;/code&gt; load the execution plan document back in before executing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpecr0cd2cplwvs92stgh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpecr0cd2cplwvs92stgh.png" alt="The Plan agent generates a tasks file, and the Build agent executes them. Image by Author" width="682" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you start working this way, you will feel how much value comes from planning before executing. It also sets you up well for SDD coding later, which I will cover when we get to OpenSpec.&lt;/p&gt;

&lt;h4&gt;
  
  
  Always start a new session
&lt;/h4&gt;

&lt;p&gt;I just mentioned that after forming a development plan, you should create a new session before continuing with coding. Why?&lt;/p&gt;

&lt;p&gt;Because anyone familiar with LLMs knows that even though context windows are long today, and OpenCode does offer context compression, context rot is still a real problem. LLMs pay more attention to the beginning and end of the context window, and less to the middle. I call this positional bias.&lt;/p&gt;

&lt;p&gt;So to make sure the LLM follows instructions accurately based on the conversation context, especially after forming an execution plan where you need precise execution, start a new session after each major milestone. Do not keep working in the same session forever.&lt;/p&gt;

&lt;h4&gt;
  
  
  Do not forget to create AGENTS.md
&lt;/h4&gt;

&lt;p&gt;The &lt;code&gt;AGENTS.md&lt;/code&gt; file is called "rules" in OpenCode. You can create it automatically with the &lt;code&gt;/init&lt;/code&gt; command. It tells the LLM what rules to follow during coding.&lt;/p&gt;

&lt;p&gt;You may ask: if this file is created by the LLM, does that not mean the LLM already knows all these rules internally? Is saving them to a file redundant?&lt;/p&gt;

&lt;p&gt;Not at all. &lt;code&gt;AGENTS.md&lt;/code&gt; is a file written specifically for the LLM to read. In my view, it serves three important purposes:&lt;/p&gt;

&lt;p&gt;First, &lt;code&gt;AGENTS.md&lt;/code&gt; acts as long-term memory for the project. It locks in facts and choices. For example, after asking the LLM to set up the project structure or create a new module, run &lt;code&gt;/init&lt;/code&gt; once. The project architecture gets locked into &lt;code&gt;AGENTS.md&lt;/code&gt;. Without this, the LLM will scan the entire project from scratch every time you start a new session, wasting a huge number of tokens.&lt;/p&gt;

&lt;p&gt;Another example: if you use &lt;code&gt;uv&lt;/code&gt; to manage your project and use uv sync &lt;code&gt;--prerelease=allow&lt;/code&gt; to sync prerelease dependencies, write that clearly in &lt;code&gt;AGENTS.md&lt;/code&gt;. This prevents the LLM from making errors with dependency management.&lt;/p&gt;

&lt;p&gt;Second, &lt;code&gt;AGENTS.md&lt;/code&gt; narrows the probability distribution and reduces hallucinations. LLMs are probability models. When facing a question, an LLM generates several possible answers with associated probabilities, then picks one based on the &lt;code&gt;temperature&lt;/code&gt; parameter.&lt;/p&gt;

&lt;p&gt;For example, when a method parameter can accept multiple types, the LLM might consider these options:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use &lt;code&gt;Optional[int]&lt;/code&gt; (40% chance)&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;int | None&lt;/code&gt; (40% chance)&lt;/li&gt;
&lt;li&gt;Use no type annotation at all (20% chance)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At that point, the LLM will randomly pick &lt;code&gt;Optional[int]&lt;/code&gt; or &lt;code&gt;int | None&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;But once you explicitly require &lt;code&gt;str | None&lt;/code&gt; syntax in &lt;code&gt;AGENTS.md&lt;/code&gt;, the probability distribution shifts to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Use &lt;code&gt;int | None&lt;/code&gt; (100% chance)&lt;/li&gt;
&lt;li&gt;Use &lt;code&gt;Optional[int]&lt;/code&gt; (0% chance)&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;At this point, the LLM will just go ahead and pick &lt;code&gt;int | None&lt;/code&gt; as the final answer.&lt;/p&gt;

&lt;p&gt;Third, &lt;code&gt;AGENTS.md&lt;/code&gt; serves a harness engineering purpose. You can give the LLM direct instructions through &lt;code&gt;AGENTS.md&lt;/code&gt; that it must follow. For example, you can tell the LLM to communicate with you in Chinese, or require that during the spec-driven process, it cannot create new proposals without your approval.&lt;/p&gt;

&lt;h4&gt;
  
  
  Improve the success rate of Skills loading
&lt;/h4&gt;

&lt;p&gt;By now, most people in the AI coding space have heard of Skills. But many find that Skills do not load reliably under normal conditions.&lt;/p&gt;

&lt;p&gt;There are two reasons for this. On one hand, the &lt;code&gt;description&lt;/code&gt; in a Skill's front matter is often unclear. We need to write the front matter carefully, especially the &lt;code&gt;description&lt;/code&gt; part. It should clearly explain when the Skill applies and what it provides, so the LLM can load the right Skill for the situation.&lt;/p&gt;

&lt;p&gt;On the other hand, for common coding scenarios, LLMs have learned so much during pretraining that they do not feel the need to load a Skill for extra guidance.&lt;/p&gt;

&lt;p&gt;For this, there is a simple and proven fix. Just add this line to &lt;code&gt;AGENTS.md&lt;/code&gt;:&lt;/p&gt;




&lt;h2&gt;
  
  
  Here's what's coming up next:
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;ol&gt;
&lt;li&gt;How to improve the Skills loading success rate.&lt;/li&gt;
&lt;li&gt;How to properly configure Oh-My-OpenCode-Slim.&lt;/li&gt;
&lt;li&gt;The complete OpenSpec and SDD development workflow.&lt;/li&gt;
&lt;/ol&gt;
&lt;/blockquote&gt;

&lt;p&gt;Visit &lt;a href="https://www.dataleadsfuture.com/how-i-use-opencode-oh-my-opencode-slim-and-openspec-to-build-my-own-ai-coding-environment/" rel="noopener noreferrer"&gt;&lt;strong&gt;Data Leads Future&lt;/strong&gt;&lt;/a&gt; to read the full content.&lt;/p&gt;

</description>
      <category>vibecoding</category>
      <category>coding</category>
      <category>ai</category>
      <category>python</category>
    </item>
    <item>
      <title>The hardest part of enterprise AI agents is wiring into real workflows.

I built a setup where:
✅Agent Skills load in real time from a database
✅Scripts run safely in containers
✅A “skills agent” acts as a tool to keep the main agent’s context clean</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Thu, 19 Mar 2026 10:34:25 +0000</pubDate>
      <link>https://dev.to/qtalen/the-hardest-part-of-enterprise-ai-agents-is-wiring-into-real-workflows-i-built-a-setup-where-2492</link>
      <guid>https://dev.to/qtalen/the-hardest-part-of-enterprise-ai-agents-is-wiring-into-real-workflows-i-built-a-setup-where-2492</guid>
      <description>&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2" class="crayons-story__hidden-navigation-link"&gt;How to Use Agent Skills in Enterprise LLM Agent Systems&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/qtalen" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" alt="qtalen profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/qtalen" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Peng Qian
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Peng Qian
                
              
              &lt;div id="story-author-preview-content-3367444" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/qtalen" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Peng Qian&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Mar 19&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2" id="article-link-3367444"&gt;
          How to Use Agent Skills in Enterprise LLM Agent Systems
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/programming"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;programming&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/datascience"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;datascience&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/tutorial"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;tutorial&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;6&lt;span class="hidden s:inline"&gt;&amp;nbsp;reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              3&lt;span class="hidden s:inline"&gt;&amp;nbsp;comments&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            11 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;


</description>
      <category>ai</category>
      <category>programming</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>How to Use Agent Skills in Enterprise LLM Agent Systems</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Thu, 19 Mar 2026 10:32:10 +0000</pubDate>
      <link>https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2</link>
      <guid>https://dev.to/qtalen/how-to-use-agent-skills-in-enterprise-llm-agent-systems-15h2</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Enterprise-grade agentic systems have fallen way behind the desktop agent apps that everyone's been buzzing about lately.&lt;/p&gt;

&lt;p&gt;After spending the better part of a year building enterprise agent applications, I came to one conclusion: &lt;strong&gt;if your agent system can't plug into your company's existing business processes, it won't bring real value to your organization.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Desktop systems like OpenClaw and Claude Cowork solved this problem. They don't change their agent setup at all. Instead, they use Agent Skills to capture human business processes, then share those skills between desktop agent systems through the file system. That's how they tackle one business problem after another.&lt;/p&gt;

&lt;p&gt;But enterprise users write their skills through a web interface and save them to a database. There's a good chance the process involves complex approval and security audit steps, too. So how does your agent load these skills in real time without any downtime?&lt;/p&gt;

&lt;p&gt;The latest version of Microsoft Agent Framework finally makes this possible with its Agent Skills feature.&lt;/p&gt;

&lt;h3&gt;
  
  
  TL;DR
&lt;/h3&gt;

&lt;p&gt;With Agent Skills in Microsoft Agent Framework, enterprise agent systems can load user-defined business process skills from a database in real time, and run the scripts and generated code that come with those skills safely inside containers.&lt;/p&gt;

&lt;p&gt;Your agent system stays secure and stable, while gaining the same flexible business process orchestration that desktop agents enjoy.&lt;/p&gt;

&lt;p&gt;All the source code in this tutorial is available at the end of the article.&lt;/p&gt;




&lt;h2&gt;
  
  
  Before We Start
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Install the latest Microsoft Agent Framework
&lt;/h3&gt;

&lt;p&gt;To use Agent Skills, install the latest version of Microsoft Agent Framework:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;agent-framework &lt;span class="nt"&gt;--pre&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, like me, you can pin the version of &lt;code&gt;agent-framework&lt;/code&gt; in your &lt;code&gt;pyproject.toml&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tex"&gt;&lt;code&gt;dependencies = [
    "agent-framework&amp;gt;=1.0.0rc4",
    "agent-framework-ag-ui&amp;gt;=1.0.0b260311",
]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then tell &lt;code&gt;uv&lt;/code&gt; to allow prerelease versions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;uv &lt;span class="nb"&gt;sync&lt;/span&gt; &lt;span class="nt"&gt;--prerelease&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;allow
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Install Tavily Agent Skills
&lt;/h3&gt;

&lt;p&gt;My end goal is to show you how to share and load Agent Skills between agents deployed across distributed nodes. But I think we should start simple. First, let me show you how to load and use skills from the community.&lt;/p&gt;

&lt;p&gt;Let's start with Tavily Agent Skills. We'll only load the &lt;code&gt;tavily-best-practices&lt;/code&gt; skill. It guides my agent on how to generate Tavily-based search code based on the task at hand, instead of calling a hardcoded function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;npx skills add tavily-ai/skills
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Don't worry. After the initial demo, I'll walk you through how to load skills from a database in real time.&lt;/p&gt;




&lt;h2&gt;
  
  
  How to Load Agent Skills from Disk
&lt;/h2&gt;

&lt;p&gt;Let's start with the most basic approach.&lt;/p&gt;

&lt;p&gt;In Microsoft Agent Framework, context operations are handled by a base class called &lt;code&gt;ContextProvider&lt;/code&gt;. The latest version of MAF ships a &lt;code&gt;SkillsProvider&lt;/code&gt; class. Use it directly and pass the location of your skills through the &lt;code&gt;skill_paths&lt;/code&gt; attribute, and you're done. skill_paths doesn't require a default directory like &lt;code&gt;.claude/skills&lt;/code&gt;, and you can pass in multiple paths.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;skills_provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SkillsProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;skill_paths&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;get_current_directory&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.agents/skills&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, create your agent and pass the &lt;code&gt;skills_provider&lt;/code&gt; instance through &lt;code&gt;context_providers&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;skills_agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;SkillsAssistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a helpful assistant, and you&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ll respond to user requests according to your skills.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context_providers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;skills_provider&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;code_tool&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run the Python code the agent writes based on the Tavily skill instructions, you need to pass a &lt;code&gt;code_interpreter&lt;/code&gt; tool to the agent. Let the code run inside a container environment. I'll cover that in detail later.&lt;/p&gt;

&lt;p&gt;Write a &lt;code&gt;main&lt;/code&gt; method to test the agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;code_executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;skills_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Check how gold ETFs performed in February 2026 and give some investment advice.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Microsoft Agent Framework provides an OpenTelemetry-based telemetry tool. I hooked it up to MLflow. Let's run the agent once and see what happens:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitwpu0eb19fqkya3s0wb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fitwpu0eb19fqkya3s0wb.png" alt="Through MLflow, you can see that the agent successfully loaded and executed the Agent Skills. Image by Author" width="800" height="561"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can see that once the agent decided it needed Tavily to search, it loaded the full &lt;code&gt;SKILL.md&lt;/code&gt; document, wrote Tavily search code following the instructions, then sent it to the code interpreter for execution. Exactly what we expected.&lt;/p&gt;

&lt;p&gt;You can learn how to use MLFlow in this article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/monitoring-qwen-3-agents-with-mlflow-3-x-end-to-end-tracking-tutorial/" rel="noopener noreferrer"&gt;Monitoring Qwen 3 Agents with MLflow 3.x: End-to-End Tracing Tutorial&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  How Agent Skills Work
&lt;/h2&gt;

&lt;p&gt;Now let's talk about how to get the most out of Agent Skills in enterprise systems. That means loading external skills in real time, containerizing the code interpreter, and managing context more carefully.&lt;/p&gt;

&lt;p&gt;But before we go there, let's dig into how Agent Skills actually work inside MAF, so the rest of this tutorial makes more sense.&lt;/p&gt;

&lt;p&gt;As I mentioned, &lt;code&gt;SkillsProvider&lt;/code&gt; extends &lt;code&gt;BaseContextProvider&lt;/code&gt;, which means it works by operating on the agent's context.&lt;/p&gt;

&lt;p&gt;When you initialize &lt;code&gt;SkillsProvider&lt;/code&gt;, you pass one or more search paths to the &lt;code&gt;skill_paths&lt;/code&gt; attribute. Take the &lt;code&gt;.agents/skills&lt;/code&gt; directory as an example. On startup, &lt;code&gt;SkillsProvider&lt;/code&gt; recursively searches this directory and finds every subdirectory that contains a &lt;code&gt;SKILL.md&lt;/code&gt; file. Then it extracts the &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt; fields from each &lt;code&gt;SKILL.md&lt;/code&gt; file, along with the file content, and stores everything in a &lt;code&gt;Skill&lt;/code&gt; object.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;SkillsProvider&lt;/code&gt; loops through these &lt;code&gt;Skill&lt;/code&gt; objects, formats the &lt;code&gt;name&lt;/code&gt; and &lt;code&gt;description&lt;/code&gt; fields like this, and merges them into the agent's system prompt. This keeps the agent aware of available skills without loading their full content upfront.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &amp;lt;skill&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    &amp;lt;name&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;xml_escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;/name&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;    &amp;lt;description&amp;gt;&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;xml_escape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;/description&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;lines&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;  &amp;lt;/skill&amp;gt;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;SkillsProvider&lt;/code&gt; also adds two methods to the agent through context: &lt;code&gt;load_skill&lt;/code&gt; and &lt;code&gt;read_skill_resource&lt;/code&gt;. When the agent decides which skill it needs based on the user's request, it calls &lt;code&gt;load_skill&lt;/code&gt; to look up the matching &lt;code&gt;Skill&lt;/code&gt; object by name and loads its full content into the context.&lt;/p&gt;

&lt;p&gt;If a skill's content references extra resource files like &lt;code&gt;references/search.md&lt;/code&gt;, the agent can call &lt;code&gt;read_skill_resource&lt;/code&gt; to load those files.&lt;/p&gt;

&lt;p&gt;Here's the full workflow:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fev66hdgmrajgccddrswy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fev66hdgmrajgccddrswy.png" alt="A diagram illustrating the workflow of SkillsProvider. Image by Author" width="771" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This design follows the progressive disclosure principle defined by &lt;a href="https://agentskills.io/home?ref=dataleadsfuture.com" rel="noopener noreferrer"&gt;agentskills.io&lt;/a&gt;. Skill content loads into the agent's context gradually, only when needed. No context explosion, no wasted tokens.&lt;/p&gt;




&lt;h2&gt;
  
  
  Agent Skills for Enterprise Systems
&lt;/h2&gt;

&lt;p&gt;Alright, enough theory. Let's get into today's main topic: &lt;strong&gt;how to use Agent Skills in enterprise-grade agentic systems.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Load skills from external systems in real time
&lt;/h3&gt;

&lt;p&gt;What if business users write their skills through a cloud-based web page and save them to a database? How do you handle that?&lt;/p&gt;

&lt;p&gt;We need a new approach to sync and apply Agent Skills in real time.&lt;/p&gt;

&lt;p&gt;As I covered earlier, when &lt;code&gt;SkillsProvider&lt;/code&gt; initializes, it loads all &lt;code&gt;SKILL.md&lt;/code&gt; files from the input paths into an in-memory list of &lt;code&gt;Skill&lt;/code&gt; objects.&lt;/p&gt;

&lt;p&gt;Besides the file system approach, &lt;code&gt;SkillsProvider&lt;/code&gt; also supports Code Defined Skills, where you write skill content directly in code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pathlib&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;agent_framework&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;SkillsProvider&lt;/span&gt;

&lt;span class="n"&gt;my_skill&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;my-code-skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;A code-defined skill&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Instructions for the skill.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then pass it to &lt;code&gt;SkillsProvider&lt;/code&gt; through the &lt;code&gt;skills&lt;/code&gt; attribute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;skills_provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SkillsProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;skill_paths&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;__file__&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;parent&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;skills&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;skills&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;my_skill&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This opens the door to managing and loading skills from a database. But the original &lt;code&gt;SkillsProvider&lt;/code&gt; class only accepts skills at initialization time. We want to load skills dynamically while the agent system is running, so we need to extend &lt;code&gt;SkillsProvider&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After reading the source code, I found that every class extending &lt;code&gt;BaseContextProvider&lt;/code&gt; has a &lt;code&gt;before_run&lt;/code&gt; method that gets called when the agent calls &lt;code&gt;run&lt;/code&gt;. We can load the latest skills from the database before &lt;code&gt;before_run&lt;/code&gt; executes, then update &lt;code&gt;SkillsProvider&lt;/code&gt;'s &lt;code&gt;self._skills&lt;/code&gt; list and refresh the skills description in &lt;code&gt;instructions&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;What I need is a hook method. Every time before &lt;code&gt;before_run&lt;/code&gt; runs, this hook fetches the latest skills. All I need to do is put the database fetching logic inside this hook.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq33zflqmrchfw1awav4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faq33zflqmrchfw1awav4.png" alt="The workflow for loading skills from the database in real time. Image by Author" width="698" height="527"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The simplest way to give &lt;code&gt;SkillsProvider&lt;/code&gt; this hook is to build an &lt;code&gt;UpdatableSkillsProvider&lt;/code&gt; subclass. This subclass accepts a &lt;code&gt;skills_updater&lt;/code&gt; parameter at initialization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;UpdatableSkillsProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SkillsProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;skill_paths&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Path&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;skills_updater&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;]]]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;skill_paths&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;skill_paths&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills_updater&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skills_updater&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;UpdatableSkillsProvider&lt;/code&gt; calls the hook through a private &lt;code&gt;_update&lt;/code&gt; method, which also updates &lt;code&gt;self._skills&lt;/code&gt; and the agent's system prompt. Then &lt;code&gt;before_run&lt;/code&gt; calls &lt;code&gt;_update&lt;/code&gt; to keep skills fresh in real time:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;UpdatableSkillsProvider&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SkillsProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_update&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills_updater&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="k"&gt;try&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;new_skills&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_skills_updater&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;new_skills&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;skill&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;skill&lt;/span&gt;

            &lt;span class="n"&gt;has_scripts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;any&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;scripts&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;values&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;

            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_instructions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;_create_instructions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;prompt_template&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_instruction_template&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;skills&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_skills&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;include_script_runner_instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;has_scripts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_create_tools&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;include_script_runner_tool&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;has_scripts&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;require_script_approval&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_require_script_approval&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;except&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;logger&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exception&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Failed to update skills: %s&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;exc&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@override&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;before_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_update&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;before_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's write a &lt;code&gt;get_latest_skills&lt;/code&gt; hook to simulate loading the latest skills from a database:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@lru_cache&lt;/span&gt;
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_latest_skills&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    Pseudocode. In this hook method, you can read the skills text from the database 
    and dynamically build Skill objects.
    :return: 
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
    &lt;span class="n"&gt;code_style_skill&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Skill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;code-style&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Coding style guidelines and conventions for the team&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;dedent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="se"&gt;\
&lt;/span&gt;&lt;span class="s"&gt;            Use this skill when answering questions about coding style,
            conventions, or best practices for the team.
        &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;code_style_skill&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Call the agent's &lt;code&gt;run&lt;/code&gt; method, then check in MLFlow whether the skills loaded by &lt;code&gt;get_latest_skills&lt;/code&gt; show up in the agent's system prompt:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68h00cw37l6aqouzechh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F68h00cw37l6aqouzechh.png" alt="The skills loaded from the database have been updated into the agent's system prompt. Image by Author" width="800" height="509"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The hook method works. We can now load skills from a database in real time.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run scripts from skills safely inside containers
&lt;/h3&gt;

&lt;p&gt;As of the latest version, Microsoft Agent Framework can't run Python scripts locally or inside containers. But most skills guide the agent through business logic using scripts, so we need to give the agent the ability to run those scripts in a code interpreter.&lt;/p&gt;

&lt;p&gt;As the predecessor to MAF, Autogen provided a way to run Python scripts inside Docker containers. You can learn about that in this article:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/exclusive-reveal-code-sandbox-tech-behind-manus-and-claude-agent-skills/" rel="noopener noreferrer"&gt;Exclusive Reveal: Code Sandbox Tech Behind Manus and Claude Agent Skills&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need something like Autogen's &lt;code&gt;DockerCommandLineCodeExecutor&lt;/code&gt; for Agent Framework. With the help of AI coding tools, building a code executor for Agent Framework isn't hard. (You can find it in the source code repo at the end of the article.)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;code_executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;DockerCommandLineCodeExecutor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python-code-sandbox&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;work_dir&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;work_dir&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;delete_tmp_files&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;environment&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TAVILY_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;TAVILY_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To keep LLM calls simple, we also need an object-oriented &lt;code&gt;CodeExecutionTool&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;CodeExecutionTool&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;Tool for executing code using a CodeExecutor.&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;CodeExecutor&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_executor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;executor&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;execute_code&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;sh&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;python&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_executor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_code_blocks&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;CodeBlock&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;code&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;language&lt;/span&gt;&lt;span class="p"&gt;)],&lt;/span&gt;
            &lt;span class="nc"&gt;CancellationToken&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, initialize an &lt;code&gt;execute_code&lt;/code&gt; tool and wire it up to the agent at initialization:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;code_tool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;CodeExecutionTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;code_executor&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;execute_code&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In MLflow, you can see that when the agent needs to search the web, it generates Python code based on the skill's instructions and sends it to the container for execution:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa34kkyrhxca4f14aqvwe.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa34kkyrhxca4f14aqvwe.png" alt="The agent generated code based on the skill's instructions and executed it in a container. Image by Author" width="800" height="470"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This approach not only lets the agent run code defined in skills, but also keeps that execution safe inside a container.&lt;/p&gt;

&lt;p&gt;Of course, in a server-side deployment, you'd send code to a centralized Jupyter kernel environment for execution. But that's a whole other story. You can dig into that in my other articles.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/how-i-crushed-advent-of-code-and-solved-hard-problems-using-autogen-jupyter-executor-and-qwen3/" rel="noopener noreferrer"&gt;How I Crushed Advent of Code And Solved Hard Problems Using Autogen Jupyter Executor and Qwen3&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Reduce context length even further
&lt;/h3&gt;

&lt;p&gt;Agent Skills uses progressive disclosure to keep irrelevant skill content from eating up your context window. But as the conversation or task moves forward, skill content that was loaded into earlier messages will still pile up in the context over time.&lt;/p&gt;

&lt;p&gt;Agent systems today have several context pruning techniques available. Context trimming and context compression, both common in desktop agents, work really well.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr62ga60f8gi0rzcupi5b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr62ga60f8gi0rzcupi5b.png" alt="The difference between context pruning and context compression. Image by Author" width="800" height="434"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Beyond those two, today I want to share a context engineering technique I discovered at work that fits Agent Skills even better.&lt;/p&gt;

&lt;p&gt;As you know, in enterprise scenarios, loading a skill usually means running one atomic workflow: researching a topic through web search? Sure. Running a SWOT analysis on a company and writing a report? No problem.&lt;/p&gt;

&lt;p&gt;These workflows all share one thing in common. You give the agent the right input, then wait for it to return an output. Which skill the agent loaded, and how it worked through the task — I honestly don't care. I wouldn't even mind if the agent unloaded the skill after finishing to save tokens.&lt;/p&gt;

&lt;p&gt;That sounds a lot like how a function works. So, can we use an agent with skills loaded as a tool for another agent? Absolutely. That's exactly what I do.&lt;/p&gt;

&lt;p&gt;Microsoft Agent Framework has a method on Agent called &lt;code&gt;as_tool&lt;/code&gt;. It turns an agent into a function-callable tool.&lt;/p&gt;

&lt;p&gt;So I designed a main agent. The main agent takes user requests and generates the right response to return. The agent with Agent Skills loading capability turns itself into a tool for the main agent using &lt;code&gt;as_tool&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;dedent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a smart little helper who, for each user request, 
    picks the right task description to call a tool, gets the answer, 
    and then delivers the final result.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;skills_agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;as_tool&lt;/span&gt;&lt;span class="p"&gt;()],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The skills agent's workflow stays the same. It loads the right skill based on the task description, generates and runs code, then returns the result.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuco2dlm4p81kbyrhe91b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuco2dlm4p81kbyrhe91b.png" alt="The skill agent is provided as a tool for the main agent to call. Image by Author" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But the main agent is different. Its context only holds user messages, the message calling the skills agent tool, and the final response. No skill-related content at all. The main agent's context stays clean, and even after running for a long time, it won't interfere with the LLM.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyczyc9ercjrderiz25de.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyczyc9ercjrderiz25de.png" alt="Keep the main agent's context clean by loading skills into the sub-agent. Image by Author" width="402" height="551"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There's a nice bonus too. LLMs know what they want better than humans do, so before the main agent calls the skills agent, it rewrites the user's task into something more precise. This helps the skills agent execute more accurately.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;That's everything I have for you today on Agent Skills for enterprise agent systems.&lt;/p&gt;

&lt;p&gt;Unlike desktop agents, enterprise agent systems run on cloud servers. There's no way to update an agent's skills through the file system in real time without downtime.&lt;/p&gt;

&lt;p&gt;So I went with a targeted approach. This approach lets users write skill content through a web interface and save it to a database, while agents read the latest skills in real time and sync them across server nodes.&lt;/p&gt;

&lt;p&gt;I used the latest version of Microsoft Agent Framework to build this, but you can use any other framework. The principles are the same.&lt;/p&gt;

&lt;p&gt;I also covered how to run scripts the agent generates from skills inside containers, which is much safer than running scripts directly on a desktop system.&lt;/p&gt;

&lt;p&gt;I shared a context management approach I found at work that works especially well for skills-based agents.&lt;/p&gt;

&lt;p&gt;The Microsoft Agent Framework API is still a bit unstable. If anything is unclear, feel free to leave a comment, and I'll get back to you as soon as I can.&lt;/p&gt;

&lt;p&gt;Thanks for reading! Share this with your friends if you think it might help someone else.&lt;/p&gt;




&lt;p&gt;Enjoyed this read? &lt;a href="https://www.dataleadsfuture.com/#/portal/signup" rel="noopener noreferrer"&gt;Subscribe now to get more cutting-edge data science tips straight to your inbox!&lt;/a&gt; Your feedback and questions are welcome — let’s discuss in the comments below!&lt;/p&gt;

&lt;p&gt;This article was originally published on &lt;a href="https://www.dataleadsfuture.com/how-to-use-agent-skills-in-enterprise-llm-agent-systems/" rel="noopener noreferrer"&gt;Data Leads Future&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>datascience</category>
      <category>tutorial</category>
    </item>
    <item>
      <title>Most “agent memory” setups just save everything and hope semantic search will fix it.

I share a pattern that works better:
✅ Use an LLM to decide what is worth remembering
✅ Store memories in RedisVL
✅ Run memory extraction in parallel with asyncio</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Tue, 10 Feb 2026 11:49:54 +0000</pubDate>
      <link>https://dev.to/qtalen/most-agent-memory-setups-just-save-everything-and-hope-semantic-search-will-fix-it-i-share-a-38m5</link>
      <guid>https://dev.to/qtalen/most-agent-memory-setups-just-save-everything-and-hope-semantic-search-will-fix-it-i-share-a-38m5</guid>
      <description>&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc" class="crayons-story__hidden-navigation-link"&gt;Advanced RedisVL Long-term Memory Tutorial: Using an LLM to Extract Memories&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/qtalen" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" alt="qtalen profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/qtalen" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Peng Qian
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Peng Qian
                
              
              &lt;div id="story-author-preview-content-3234678" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/qtalen" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Peng Qian&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Feb 10&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc" id="article-link-3234678"&gt;
          Advanced RedisVL Long-term Memory Tutorial: Using an LLM to Extract Memories
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/agents"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;agents&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/programming"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;programming&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/redis"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;redis&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;2&lt;span class="hidden s:inline"&gt;&amp;nbsp;reactions&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              1&lt;span class="hidden s:inline"&gt;&amp;nbsp;comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            10 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;




</description>
      <category>ai</category>
      <category>agents</category>
      <category>programming</category>
      <category>redis</category>
    </item>
    <item>
      <title>Advanced RedisVL Long-term Memory Tutorial: Using an LLM to Extract Memories</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Tue, 10 Feb 2026 11:49:27 +0000</pubDate>
      <link>https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc</link>
      <guid>https://dev.to/qtalen/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories-35kc</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;In this weekend note, we keep talking about how to build long-term memory for an agent with RedisVL.&lt;/p&gt;

&lt;p&gt;When we build a long-term memory module for an agent, we need to care about two points most:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;After long running, will the saved memories grow too large and cause context explosion?&lt;/li&gt;
&lt;li&gt;How do we recall the memories that matter most to the current context?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will solve these two problems today.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;TLDR:&lt;/strong&gt; In this hands-on tutorial, we first use an LLM to extract information from user messages that has value for later chats. Then we store that as long-term memory in RedisVL. When needed, we search related memories with semantic search. With this setup, the agent understands the past context of the user and gives more accurate answers.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfkidrk7fzb6y54ql7x8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjfkidrk7fzb6y54ql7x8.png" alt="After using an LLM to extract memories, how my long‑term memory module runs. Image by Author" width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;With this kind of long-term memory, we do not worry about memory explosion after long running. We also do not worry that unrelated memories will hurt LLM responses.&lt;/p&gt;

&lt;p&gt;You can get all the source code at the end of this post.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why do we do this?
&lt;/h3&gt;

&lt;p&gt;In the last hands-on post, I shared how to build short-term and long-term memory for an agent with RedisVL:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/build-long-term-and-short-term-memory-for-agents-using-redisvl/" rel="noopener noreferrer"&gt;Build Long-Term and Short-Term Memory for Agents Using RedisVL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The short-term part works very well. RedisVL API feels much simpler than the raw Redis API. I can write that code with little effort.&lt;/p&gt;

&lt;p&gt;The long-term part does not work. We follow the official example and store user queries and LLM responses in RedisVL. When chat continues, RedisVL keeps pulling repeated queries or unrelated answers via semantic search. This troubles the LLM a lot and blocks the chat from going on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Can we avoid RedisVL?
&lt;/h3&gt;

&lt;p&gt;Your boss will not agree. There is already Redis in your stack. Why do you still want to install mem0 or other open source tools? How about extra cost? This is real life.&lt;/p&gt;

&lt;p&gt;So we still need to make RedisVL work. But I do not want to run around like a headless fly. Before that I want to see how humans handle memory.&lt;/p&gt;




&lt;h2&gt;
  
  
  How Humans Handle Memory
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What deserves memory
&lt;/h3&gt;

&lt;p&gt;First, we need to know one thing. Only information that centers on me and links to me tightly deserves a sticky note.&lt;/p&gt;

&lt;p&gt;So what information about me do I want to write down?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Preference settings such as tools I like, languages I use, my schedule, and my tone when I talk&lt;/li&gt;
&lt;li&gt;Stable personal info such as my role, my time zone, and my daily habits&lt;/li&gt;
&lt;li&gt;Goals and decisions, such as chosen options, plans, and sentences that start with “I decide to...”&lt;/li&gt;
&lt;li&gt;Key milestones such as job change, moving, deadlines, and product launches&lt;/li&gt;
&lt;li&gt;Work and project context, such as project names, stakeholders, needs, and status like “done/next” step.&lt;/li&gt;
&lt;li&gt;Repeated pain points or strong views that will change LLM advice later&lt;/li&gt;
&lt;li&gt;Things I say with “remember this ...” or “do not forget ...”&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  What does not deserve memory
&lt;/h3&gt;

&lt;p&gt;I do not plan to store any LLM answer. LLM answers to the same question will change with context. So LLM answers in long-term memory do not help much.&lt;/p&gt;

&lt;p&gt;Besides LLM answers, I also do not want to keep these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;One-time small things that likely will not matter later&lt;/li&gt;
&lt;li&gt;Very sensitive personal data such as health diagnosis, exact address, government IDs, passwords, bank accounts&lt;/li&gt;
&lt;li&gt;Things I clearly ask not to remember&lt;/li&gt;
&lt;li&gt;Things I already wrote down on the sticky note&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  Design a Prompt for LLM Memory Extraction
&lt;/h2&gt;

&lt;p&gt;Now we know how humans handle memory. Next, I want to build an agent that follows the same rules and extracts memories from my daily chats.&lt;/p&gt;

&lt;p&gt;The key lives in the &lt;code&gt;system prompt&lt;/code&gt;. I need to describe all rules in the system prompt. Then I ask the agent to follow these rules with very high consistency.&lt;/p&gt;

&lt;p&gt;In the past, I might have tried some “write 1000-line prompt” challenge. Now I do not need that. I just open any LLM client, paste these rules, then ask the LLM to help me write a &lt;code&gt;system prompt&lt;/code&gt;. This takes less than one minute.&lt;/p&gt;

&lt;p&gt;After a few tries, I pick one I like. Here is that &lt;code&gt;system prompt&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tex"&gt;&lt;code&gt;Your job:
Based only on the user’s current input and the existing related memories, decide whether you need to add a new “long-term memory,” and if needed, **extract just one fact**. You do not talk to the user. You only handle memory extraction and deduplication.

---

### 1. Core principles

1. Only save information that **will likely be useful in the future**.
2. **At most one fact per turn**, and it must clearly appear in the current input.
3. **Never invent or infer anything**. You can only restate or lightly rephrase what the user has explicitly said.
4. If the current input has nothing worth keeping, or the information is already in the related memories, then do not add a new memory.

---

### 2. What counts as “long-term memory”

Only consider the categories below, and decide whether the information has long-term value:

...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Due to space, I only show part of the prompt here. You can get the full prompt from the source code at the end.&lt;/p&gt;




&lt;h2&gt;
  
  
  Build a ContextProvider for Long-term Memory
&lt;/h2&gt;

&lt;p&gt;After we finish the memory extraction rules, we start to build the long-term memory module for the agent.&lt;/p&gt;

&lt;p&gt;For future use, I still pick Microsoft Agent Framework MAF. It gives a &lt;code&gt;ContextProvider&lt;/code&gt; feature that lets us plug long-term memory into the agent in a simple way.&lt;/p&gt;

&lt;p&gt;Of course the principle of long-term memory stays the same. You can use any agent framework you like and build your own memory module. Or you can ignore frameworks and first build storage and retrieval of memories, and then call them through function calls. That is fine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Run memory extraction in sequence
&lt;/h3&gt;

&lt;p&gt;In the last post, I already built a long-term memory module with &lt;code&gt;ContextProvider&lt;/code&gt;. The new version looks similar. But this time, we use an LLM to extract memories. So after we set up &lt;code&gt;ContextProvider&lt;/code&gt;, we first use the &lt;code&gt;system prompt&lt;/code&gt; to build a memory extraction agent.&lt;/p&gt;

&lt;p&gt;If you do not know how to use &lt;code&gt;ContextProvider&lt;/code&gt; yet, I suggest you read my last post again. That post explains &lt;code&gt;ChatMessageStore&lt;/code&gt; and &lt;code&gt;ContextProvider&lt;/code&gt; in Microsoft Agent Framework in detail:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/build-long-term-and-short-term-memory-for-agents-using-redisvl/" rel="noopener noreferrer"&gt;Build Long-Term and Short-Term Memory for Agents Using RedisVL&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To avoid too much unrelated data from Redis, I set the &lt;code&gt;distance_threshold&lt;/code&gt; value pretty small. But not too small. If too small, then it loses meaning. You can pick the value you like.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;distance_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;context_prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;DEFAULT_CONTEXT_PROMPT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis://localhost:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;embedding_model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BAAI/bge-m3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm_model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Qwen3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NEXT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm_api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;llm_base_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_extractor&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_init_extractor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;prompt.md&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;encoding&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;utf-8&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;system_prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_extractor&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAILikeChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_llm_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;as_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extractor&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;system_prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;default_options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;response_format&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ExtractResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;extra_body&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;enable_thinking&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="bp"&gt;False&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;},&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next we implement the &lt;code&gt;invoking&lt;/code&gt; method. This method runs before the user agent calls the LLM. In this method, we extract and store long-term memory.&lt;/p&gt;

&lt;p&gt;To make the logic clear, I first implement the &lt;code&gt;invoking&lt;/code&gt; method in order, as in this diagram:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5nwk96ayu5xi64e3138n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5nwk96ayu5xi64e3138n.png" alt="Run the memory extraction logic step by step in order. Image by Author" width="800" height="236"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When a new user request comes into &lt;code&gt;ContextProvider&lt;/code&gt;, we first search RedisVL with semantic search for the most similar memories.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="n"&gt;line_sep_memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get_line_sep_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_line_sep_memories&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_semantic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_relevant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;line_sep_memories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;* &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;str&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;line_sep_memories&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, we send these existing memories plus the user request to the memory extraction agent. That agent first checks if anything is worth saving according to the rules. Then it extracts a new helpful memory from the user request and saves it into RedisVL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_save_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_sep_memories&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_save_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;relevant_memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;detect_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;USER&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Existing related memories：&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;relevant_memory&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;relevant_memory&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nf"&gt;list&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_extractor&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;detect_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;extract_result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ExtractResult&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;cast&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ExtractResult&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;extract_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;should_write_memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_semantic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
                    &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;role&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;content&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;extract_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;memory_to_write&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;],&lt;/span&gt;
                &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Last, we put the memories from RedisVL into &lt;code&gt;Context&lt;/code&gt; as extra context. These memory messages get merged into the history of the real chat agent. They give the chat agent extra background to produce answers.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_context_prompt&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;line_sep_memories&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;line_sep_memories&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we build a simple chat agent to test the new long-term memory module:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAILikeChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Qwen3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;MAX&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;as_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context_provider&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nc"&gt;LongTermMemory&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_new_thread&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Assistant: &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we chat with the agent and see how it works:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcnctscvwqhjh6w2zn5n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqcnctscvwqhjh6w2zn5n.png" alt="The yellow parts are memories fetched from RedisVL, and the green parts are memories extracted by the LLM. Image by Author" width="800" height="331"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;From MLFlow we see that the retrieved memories go in as a separate message in the chat history:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9h8qiasa2ggguho0xm7b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9h8qiasa2ggguho0xm7b.png" alt="The retrieved memory will be added to the chat history as a separate message. Image by Author" width="800" height="495"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then we check Redis and see what memories we saved:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrir4vlqejr2vekvt8eg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrir4vlqejr2vekvt8eg.png" alt="The info that the LLM pulls out from the user’s past conversations will be saved into Redis as valuable memories. Image by Author" width="800" height="457"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see that as the chat goes on, the new long-term memory module no longer stores and retrieves all chat history without filter. It keeps and retrieves only memories that matter to the user, and these memories give strong help in later chats.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use concurrency to speed up
&lt;/h3&gt;

&lt;p&gt;Everything looks fine except for the part where we use an LLM to extract memories that deserve saving.&lt;/p&gt;

&lt;p&gt;The largest delay in an agent often comes from LLM calls. Now we add one more LLM call. We also need to wait for the LLM to decide whether to save memory before we go on with the real chat.&lt;/p&gt;

&lt;p&gt;We can add some logs and see how much delay we add:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgopspxtr21l7dmgrpqrp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgopspxtr21l7dmgrpqrp.png" alt="Every time we use the LLM to pull out memories, it takes a bit more than one second. Image by Author" width="800" height="333"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We add more than one second per chat turn.&lt;/p&gt;

&lt;p&gt;One way to optimize is to use a smaller model like &lt;code&gt;qwen3-8b&lt;/code&gt;. But the gain stays small. We save little time and hurt memory quality due to the smaller model.&lt;/p&gt;

&lt;p&gt;Today I use a different way. I use concurrent programming so that the LLM call for memory extraction and the LLM call for user reply run at the same time.&lt;/p&gt;

&lt;p&gt;Let us see the result after that change:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72jeq2os0151pg3sta11.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F72jeq2os0151pg3sta11.png" alt="The memory retrieval step basically doesn’t take any time at all, so how did I pull that off? Image by Author" width="800" height="384"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The time cost for extraction and storage of memory becomes almost nothing while the effect stays the same. How do we reach that?&lt;/p&gt;

&lt;p&gt;If you built multi-agent workflows with LangGraph or LlamaIndex, you have likely seen the fan-out idea. It lets many nodes run at the same time, and then you take the final result.&lt;/p&gt;

&lt;p&gt;The base idea uses the &lt;code&gt;asyncio&lt;/code&gt; module in Python. You often see &lt;code&gt;async&lt;/code&gt; and &lt;code&gt;await&lt;/code&gt; when you write agent code. I wrote many posts in the past about &lt;code&gt;asyncio&lt;/code&gt; and concurrency:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/use-these-methods-to-make-your-python-concurrent-tasks-perform-better/" rel="noopener noreferrer"&gt;Use These Methods to Make Your Python Concurrent Tasks Perform Better&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In short, when you face delays because of long IO calls, you can use concurrent programming.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; If you use &lt;code&gt;mlflow.openai.autolog()&lt;/code&gt; to trace LLM calls, you may see that concurrent runs stop working. I still do not know why. I suggest you comment out MLFlow parts before you go on.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57jdd3pzzalnd29zmmc2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F57jdd3pzzalnd29zmmc2.png" alt="By running the memory retrieval process with concurrent programming. Image by Author" width="800" height="460"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Back to our current case. We see that &lt;code&gt;_save_memory&lt;/code&gt; is an &lt;code&gt;async&lt;/code&gt; method. That means we can run it with concurrency.&lt;/p&gt;

&lt;p&gt;How do we do that? Very simple. When we call &lt;code&gt;_save_memory&lt;/code&gt;, we use &lt;code&gt;asyncio.create_task&lt;/code&gt; to build a new concurrent task. That is all.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LongTermMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
        &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_task&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_save_memory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;line_sep_memories&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="bp"&gt;...&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since real user chat often takes more time than memory extraction, we do not need to wait for that task in code. We only need to create the task.&lt;/p&gt;

&lt;p&gt;With that, we add a memory extraction module that does not bring much extra delay to the agent system.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conslusion
&lt;/h2&gt;

&lt;p&gt;Redis now serves as standard infra for many companies. With RedisVL, it can cache and search information by semantics. This makes it easier to build short-term and long-term memory on top of Redis.&lt;/p&gt;

&lt;p&gt;But if you build long-term memory with RedisVL API directly, you may not see good results. The system has no “brain” to judge which information deserves long-term storage and keeps value over time.&lt;/p&gt;

&lt;p&gt;So in this tutorial, I first use an agent to extract useful memories and then write them into RedisVL. This improves the value of saved information. Long-term memory works much better now and fills the gap in my last post.&lt;/p&gt;

&lt;p&gt;I also share a short guide on how to use concurrent programming so that many LLM calls run at the same time. This cuts system delay by a lot. If you like concurrency, you can read my old posts.&lt;/p&gt;

&lt;p&gt;Thanks for reading. If you have any questions or ideas, leave me a note. I will reply as soon as I can.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories/#/portal/signup/" rel="noopener noreferrer"&gt;&lt;strong&gt;Do not forget to subscribe to my blog&lt;/strong&gt;&lt;/a&gt; and follow my new work in AI applications.&lt;/p&gt;

&lt;p&gt;Also share this post with your friends. It may help more people.&lt;/p&gt;




&lt;p&gt;Enjoyed this read? &lt;a href="https://www.dataleadsfuture.com/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;Subscribe now to get more cutting-edge data science tips straight to your inbox!&lt;/strong&gt;&lt;/a&gt; Your feedback and questions are welcome — let’s discuss in the comments below!&lt;/p&gt;

&lt;p&gt;This article was originally published on &lt;a href="https://www.dataleadsfuture.com/advanced-redisvl-long-term-memory-tutorial-using-an-llm-to-extract-memories/" rel="noopener noreferrer"&gt;Data Leads Future&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>programming</category>
      <category>redis</category>
    </item>
    <item>
      <title>Tried RedisVL as a memory layer for AI agents:

✅ Short-term memory with MessageHistory: works great
😒 Long-term semantic memory with SemanticMessageHistory: not so much

If you are thinking about RedisVL for agent memory, this will save you time:</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Thu, 29 Jan 2026 02:49:49 +0000</pubDate>
      <link>https://dev.to/qtalen/tried-redisvl-as-a-memory-layer-for-ai-agents-short-term-memory-with-messagehistory-works-hd3</link>
      <guid>https://dev.to/qtalen/tried-redisvl-as-a-memory-layer-for-ai-agents-short-term-memory-with-messagehistory-works-hd3</guid>
      <description>&lt;p&gt;

&lt;/p&gt;
&lt;div class="ltag__link--embedded"&gt;
  &lt;div class="crayons-story "&gt;
  &lt;a href="https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m" class="crayons-story__hidden-navigation-link"&gt;Build Long-Term and Short-Term Memory for Agents Using RedisVL&lt;/a&gt;


  &lt;div class="crayons-story__body crayons-story__body-full_post"&gt;
    &lt;div class="crayons-story__top"&gt;
      &lt;div class="crayons-story__meta"&gt;
        &lt;div class="crayons-story__author-pic"&gt;

          &lt;a href="/qtalen" class="crayons-avatar  crayons-avatar--l  "&gt;
            &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" alt="qtalen profile" class="crayons-avatar__image"&gt;
          &lt;/a&gt;
        &lt;/div&gt;
        &lt;div&gt;
          &lt;div&gt;
            &lt;a href="/qtalen" class="crayons-story__secondary fw-medium m:hidden"&gt;
              Peng Qian
            &lt;/a&gt;
            &lt;div class="profile-preview-card relative mb-4 s:mb-0 fw-medium hidden m:inline-block"&gt;
              
                Peng Qian
                
              
              &lt;div id="story-author-preview-content-3193063" class="profile-preview-card__content crayons-dropdown branded-7 p-4 pt-0"&gt;
                &lt;div class="gap-4 grid"&gt;
                  &lt;div class="-mt-4"&gt;
                    &lt;a href="/qtalen" class="flex"&gt;
                      &lt;span class="crayons-avatar crayons-avatar--xl mr-2 shrink-0"&gt;
                        &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" class="crayons-avatar__image" alt=""&gt;
                      &lt;/span&gt;
                      &lt;span class="crayons-link crayons-subtitle-2 mt-5"&gt;Peng Qian&lt;/span&gt;
                    &lt;/a&gt;
                  &lt;/div&gt;
                  &lt;div class="print-hidden"&gt;
                    
                      Follow
                    
                  &lt;/div&gt;
                  &lt;div class="author-preview-metadata-container"&gt;&lt;/div&gt;
                &lt;/div&gt;
              &lt;/div&gt;
            &lt;/div&gt;

          &lt;/div&gt;
          &lt;a href="https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m" class="crayons-story__tertiary fs-xs"&gt;&lt;time&gt;Jan 29&lt;/time&gt;&lt;span class="time-ago-indicator-initial-placeholder"&gt;&lt;/span&gt;&lt;/a&gt;
        &lt;/div&gt;
      &lt;/div&gt;

    &lt;/div&gt;

    &lt;div class="crayons-story__indention"&gt;
      &lt;h2 class="crayons-story__title crayons-story__title-full_post"&gt;
        &lt;a href="https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m" id="article-link-3193063"&gt;
          Build Long-Term and Short-Term Memory for Agents Using RedisVL
        &lt;/a&gt;
      &lt;/h2&gt;
        &lt;div class="crayons-story__tags"&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/programming"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;programming&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/ai"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;ai&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/datascience"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;datascience&lt;/a&gt;
            &lt;a class="crayons-tag  crayons-tag--monochrome " href="/t/redis"&gt;&lt;span class="crayons-tag__prefix"&gt;#&lt;/span&gt;redis&lt;/a&gt;
        &lt;/div&gt;
      &lt;div class="crayons-story__bottom"&gt;
        &lt;div class="crayons-story__details"&gt;
          &lt;a href="https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left"&gt;
            &lt;div class="multiple_reactions_aggregate"&gt;
              &lt;span class="multiple_reactions_icons_container"&gt;
                  &lt;span class="crayons_icon_container"&gt;
                    &lt;img src="https://assets.dev.to/assets/sparkle-heart-5f9bee3767e18deb1bb725290cb151c25234768a0e9a2bd39370c382d02920cf.svg" width="18" height="18"&gt;
                  &lt;/span&gt;
              &lt;/span&gt;
              &lt;span class="aggregate_reactions_counter"&gt;1&lt;span class="hidden s:inline"&gt;&amp;nbsp;reaction&lt;/span&gt;&lt;/span&gt;
            &lt;/div&gt;
          &lt;/a&gt;
            &lt;a href="https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m#comments" class="crayons-btn crayons-btn--s crayons-btn--ghost crayons-btn--icon-left flex items-center"&gt;
              

              &lt;span class="hidden s:inline"&gt;Add&amp;nbsp;Comment&lt;/span&gt;
            &lt;/a&gt;
        &lt;/div&gt;
        &lt;div class="crayons-story__save"&gt;
          &lt;small class="crayons-story__tertiary fs-xs mr-2"&gt;
            9 min read
          &lt;/small&gt;
            
              &lt;span class="bm-initial"&gt;
                

              &lt;/span&gt;
              &lt;span class="bm-success"&gt;
                

              &lt;/span&gt;
            
        &lt;/div&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/div&gt;
&lt;/div&gt;

&lt;/div&gt;




</description>
      <category>programming</category>
      <category>ai</category>
      <category>datascience</category>
      <category>redis</category>
    </item>
    <item>
      <title>Build Long-Term and Short-Term Memory for Agents Using RedisVL</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Thu, 29 Jan 2026 02:48:18 +0000</pubDate>
      <link>https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m</link>
      <guid>https://dev.to/qtalen/build-long-term-and-short-term-memory-for-agents-using-redisvl-4h8m</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;For this weekend note, I want to share some tries I made using RedisVL to add short-term and long-term memory to my agent system.&lt;/p&gt;

&lt;p&gt;TLDR: RedisVL works pretty well for short-term memory. It feels a bit simpler than using the traditional Redis API. For long-term memory with semantic search, the experience is not good. I do not recommend it.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why RedisVL?
&lt;/h3&gt;

&lt;p&gt;Big companies like to use mature infrastructure to build new features.&lt;/p&gt;

&lt;p&gt;We know mem0 and Graphiti are good open source software for long-term agent memory. But companies want to stay safe. Building new infrastructure costs money. It is unstable. It needs people who know how to run it.&lt;/p&gt;

&lt;p&gt;So when Redis launched RedisVL with vector search, we naturally wanted to try it first. You can connect it to existing Redis clusters and start using it. That sounds nice. But is it really nice? We need to try it for real.&lt;/p&gt;

&lt;p&gt;Today I will cover how to use &lt;code&gt;MessageHistory&lt;/code&gt; and &lt;code&gt;SemanticMessageHistory&lt;/code&gt; from RedisVL to add short-term and long-term memory to agents built on the Microsoft Agent Framework.&lt;/p&gt;

&lt;p&gt;You can find the source code at the end of this article.&lt;/p&gt;




&lt;p&gt;📫 &lt;a href="https://www.dataleadsfuture.com/build-long-term-and-short-term-memory-for-agents-using-redisvl/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;Don’t forget to follow my blog to stay updated on my latest progress in AI application practices.&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Preparation
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Install Redis
&lt;/h3&gt;

&lt;p&gt;If you want to try it locally, you can install a Redis instance with Docker.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-d&lt;/span&gt; &lt;span class="nt"&gt;--name&lt;/span&gt; redis &lt;span class="nt"&gt;-p&lt;/span&gt; 6379:6379 &lt;span class="nt"&gt;-p&lt;/span&gt; 8001:8001 redis/redis-stack:latest
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cannot use Docker Desktop? See my other article.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/a-quick-guide-to-containerizing-agent-applications-with-podman/?source=post_page-----b6919f293d16---------------------------------------" rel="noopener noreferrer"&gt;A Quick Guide to Containerizing Agent Applications with Podman&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The Redis instance will listen on ports 6379 and 8001. Your RedisVL client should connect to &lt;code&gt;redis://localhost:6379&lt;/code&gt;. You can visit &lt;code&gt;http://localhost:8001&lt;/code&gt; in the browser to open the Redis console.&lt;/p&gt;

&lt;h3&gt;
  
  
  Install RedisVL
&lt;/h3&gt;

&lt;p&gt;Install RedisVL with pip.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;redisvl
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installation, you can use the RedisVL CLI to manage your indexes and keep your testing neat.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;rvl index listall
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;






&lt;h2&gt;
  
  
  Implement Short-Term Memory Using MessageHistory
&lt;/h2&gt;

&lt;p&gt;There are lots of “How to” RedisVL articles online, so let’s start straight from Microsoft Agent Framework and see how to use &lt;code&gt;MessageHistory&lt;/code&gt; for short-term memory.&lt;/p&gt;

&lt;p&gt;As in the official tutorial, you should implement a &lt;code&gt;RedisVLMessageStore&lt;/code&gt; based on &lt;code&gt;ChatMessageStoreProtocol&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLMessageStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ChatMessageStoreProtocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;common_thread&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis://localhost:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_thread_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_top_k&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;top_k&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_tag&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_redis_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis_url&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_message_history&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In &lt;code&gt;__init__&lt;/code&gt; you should note two parameters.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;thread_id&lt;/code&gt; is used for the &lt;code&gt;name&lt;/code&gt; parameter when creating &lt;code&gt;MessageHistory&lt;/code&gt;. I like to bind it to the agent. Each agent gets a unique &lt;code&gt;thread_id&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;session_tag&lt;/code&gt; lets you set a tag for each user so different sessions do not mix.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The protocol asks us to implement two methods &lt;code&gt;list_messages&lt;/code&gt; and &lt;code&gt;add_messages&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;list_messages&lt;/code&gt; runs before the agent calls the LLM. It gets all available chat messages from the message store. It takes no parameters, so it cannot support long-term memory. More on that later.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;add_messages&lt;/code&gt; runs after the agent gets the LLM’s reply. It stores new messages into the message store.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is how the message store works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgtw2bb5qyugnkn5hrm0k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgtw2bb5qyugnkn5hrm0k.png" alt="The calling order of message store in the agent. Image by Author" width="686" height="473"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So in &lt;code&gt;list_messages&lt;/code&gt; and &lt;code&gt;add_messages&lt;/code&gt;, we just use RedisVL’s &lt;code&gt;MessageHistory&lt;/code&gt; to do the job.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;list_messages&lt;/code&gt; below uses &lt;code&gt;get_recent&lt;/code&gt; to get &lt;code&gt;top_k&lt;/code&gt; recent messages and turns them into &lt;code&gt;ChatMessage&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLMessageStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ChatMessageStoreProtocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;list_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_message_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_recent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;top_k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_top_k&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_back_to_chat_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;add_messages&lt;/code&gt; turns the &lt;code&gt;ChatMessage&lt;/code&gt; into Redis messages and calls &lt;code&gt;add_messages&lt;/code&gt; to store them.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLMessageStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ChatMessageStoreProtocol&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_to_redis_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_message_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;That is short-term memory done with RedisVL. You may also implement &lt;code&gt;deserialize&lt;/code&gt;, &lt;code&gt;serialize&lt;/code&gt; and &lt;code&gt;update_from_state&lt;/code&gt; for saving and loading the memory, but it is not important now. See the full code at the end.&lt;/p&gt;

&lt;h3&gt;
  
  
  Test RedisVLMessageStore
&lt;/h3&gt;

&lt;p&gt;Let’s build an agent and test the message store.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAILikeChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Qwen3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NEXT&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a little helper who answers my questions in one sentence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;chat_message_store_factory&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nc"&gt;RedisVLMessageStore&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_abc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now a console loop for multi-turn dialog. Remember, Microsoft Agent Framework does not support short-term memory unless you use an &lt;code&gt;AgentThread&lt;/code&gt; and pass it to &lt;code&gt;run&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_new_thread&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Assistant: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;message_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;AgentThread&lt;/code&gt; when created calls the factory method to build the &lt;code&gt;RedisVLMessageStore&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;To check if the store works, we can use &lt;code&gt;mlflow.openai.autolog()&lt;/code&gt; to see if messages sent to the LLM contain historical messages.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;
&lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_tracking_uri&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MLFLOW_TRACKING_URI&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_experiment&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Default&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;mlflow&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;openai&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;autolog&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wyl7jsqr7yjpa0dsesu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7wyl7jsqr7yjpa0dsesu.png" alt="You can see that the conversation comes with a complete history of messages. Image by Author" width="720" height="648"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;See my other article for using MLFlow to track LLM calls.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/monitoring-qwen-3-agents-with-mlflow-3-x-end-to-end-tracking-tutorial/" rel="noopener noreferrer"&gt;Monitoring Qwen 3 Agents with MLflow 3.x: End-to-End Tracing Tutorial&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let’s open the Redis console to see the cache.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhxotsxih8tzwnl4mj0w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqhxotsxih8tzwnl4mj0w.png" alt="How the cache is stored in Redis. Image by Author" width="720" height="248"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you can see, after using &lt;code&gt;MessageHistory&lt;/code&gt; as MAF's message store, we can implement multi-turn conversations with historical messages.&lt;/p&gt;

&lt;p&gt;With &lt;code&gt;thread_id&lt;/code&gt; and &lt;code&gt;session_tag&lt;/code&gt; parameters, we can also implement the feature that lets users switch between multiple conversation sessions, like in popular LLM chat applications.&lt;/p&gt;

&lt;p&gt;Feels simpler than the official &lt;code&gt;RedisMessageStore&lt;/code&gt; solution right?&lt;/p&gt;




&lt;h2&gt;
  
  
  Implement Long-Term Memory Using SemanticMessageHistory
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;SemanticMessageHistory&lt;/code&gt; is a subclass of &lt;code&gt;MessageHistory&lt;/code&gt;. It adds a &lt;code&gt;get_relevant&lt;/code&gt; method for vector search.&lt;/p&gt;

&lt;p&gt;Example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;what have I learned about the size of England?&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;semantic_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_distance_threshold&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.35&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;semantic_history&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_relevant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tex"&gt;&lt;code&gt;Batches: 100&lt;span class="c"&gt;%|██████████| 1/1 [00:00&amp;lt;00:00, 56.30it/s]&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;'role': 'user', 'content': 'what is the size of England compared to Portugal?'&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Compared to &lt;code&gt;MessageHistory&lt;/code&gt; the big thing here is that we can get the most relevant historical messages based on the user request.&lt;/p&gt;

&lt;p&gt;You might think that if &lt;code&gt;MessageStore&lt;/code&gt; short-term memory is nice, then &lt;code&gt;SemanticMessageHistory&lt;/code&gt; with semantic search must be even better.&lt;/p&gt;

&lt;p&gt;From my experience, this is not the case.&lt;/p&gt;

&lt;p&gt;From my test results, it is not like that. Let’s now make a long-term memory adapter for Microsoft Agent Framework using &lt;code&gt;SemanticMessageHistory&lt;/code&gt; and see the result.&lt;/p&gt;

&lt;h3&gt;
  
  
  Use SemanticMessageHistory in Microsoft Agent Framework
&lt;/h3&gt;

&lt;p&gt;Earlier I said &lt;code&gt;list_messages&lt;/code&gt; in &lt;code&gt;ChatMessageStoreProtocol&lt;/code&gt; has no parameters, so we cannot search history. Thus, we cannot use &lt;code&gt;MessageStore&lt;/code&gt; for long-term memory.&lt;/p&gt;

&lt;p&gt;Microsoft Agent Framework has a &lt;code&gt;ContextProvider&lt;/code&gt; class. From its name, it is for context engineering.&lt;/p&gt;

&lt;p&gt;So we should build long-term memory on this class.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLSemanticMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;thread_id&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;distance_threshold&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;redis://localhost:6379&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;embedding_model&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;BAAI/bge-m3&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;embedding_api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;embedding_endpoint&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_thread_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;thread_id&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;semantic_thread&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session_tag&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;uuid4&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_distance_threshold&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;distance_threshold&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_redis_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;redis_url&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_model&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_model&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_api_key&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_api_key&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EMBEDDING_API_KEY&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_endpoint&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;embedding_endpoint&lt;/span&gt; &lt;span class="ow"&gt;or&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;getenv&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;EMBEDDING_ENDPOINT&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_semantic_store&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;ContextProvider&lt;/code&gt; has two methods &lt;code&gt;invoked&lt;/code&gt; and &lt;code&gt;invoking&lt;/code&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;invoked&lt;/code&gt; runs after LLM call. It stores the latest messages in RedisVL. It has both &lt;code&gt;request_message&lt;/code&gt; and &lt;code&gt;response_messages&lt;/code&gt; parameters but stores them separately.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;invoking&lt;/code&gt; runs before LLM call. It uses the user’s current input to search for relevant history in RedisVL and returns a &lt;code&gt;Context&lt;/code&gt; object.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The &lt;code&gt;Context&lt;/code&gt; object has three variables.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;instructions&lt;/code&gt; string. The agent adds this to the system prompt.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;messages&lt;/code&gt; list. Put history messages found in long-term memory here.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tools&lt;/code&gt; list for functions. The agent adds these tools to its &lt;code&gt;ChatOptions&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanoep9ppsgwxkvqrpf5h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fanoep9ppsgwxkvqrpf5h.png" alt="The purpose of the three types of messages retrieved. Image by Author" width="720" height="139"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Since we want to use vector search to get relevant history, we put those messages in &lt;code&gt;messages&lt;/code&gt;. The order between &lt;code&gt;MessageStore&lt;/code&gt; messages and &lt;code&gt;ContextProvider&lt;/code&gt; messages matters. Here is the order of their calls.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbiqyfu1xp0kqbrrikkv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftbiqyfu1xp0kqbrrikkv.png" alt="The calling order of long-term and short-term memory in the agent. Image by Author" width="720" height="259"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Setting up a TextVectorizer
&lt;/h3&gt;

&lt;p&gt;Semantic vector search needs embeddings. We must set up a vectorizer.&lt;/p&gt;

&lt;p&gt;In &lt;code&gt;__init__&lt;/code&gt; besides &lt;code&gt;thread_id&lt;/code&gt; and &lt;code&gt;session_tag&lt;/code&gt; we set the embedding model info.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLSemanticMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_init_semantic_store&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_api_key&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;vectorizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;HFTextVectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;vectorizer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAITextVectorizer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;model&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_model&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;api_config&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;api_key&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_api_key&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;base_url&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_embedding_endpoint&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_semantic_store&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;SemanticMessageHistory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_thread_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;distance_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_distance_threshold&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;redis_url&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_redis_url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;vectorizer&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;vectorizer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;I can choose a server-hosted embedding model with OpenAI API or a local HuggingFace model, depending on whether &lt;code&gt;embedding_api_key&lt;/code&gt; is set.&lt;/p&gt;

&lt;h3&gt;
  
  
  Implement invoked and invoking methods
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;invoked&lt;/code&gt; is easy. As said &lt;code&gt;SemanticMessageHistory&lt;/code&gt; stores request and response separately. I merge them into one list, then call &lt;code&gt;add_messages&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLSemanticMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoked&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;request_messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;response_messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;Sequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;invoke_exception&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;Exception&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;request_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;request_messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response_messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="n"&gt;response_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;response_messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;chat_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request_messages&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;response_messages&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_to_redis_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;chat_messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_semantic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_messages&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;invoking&lt;/code&gt; below:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;RedisVLSemanticMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ContextProvider&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="bp"&gt;...&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;invoking&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="n"&gt;MutableSequence&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Any&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nf"&gt;isinstance&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="c1"&gt;# 1
&lt;/span&gt;            &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;prompt&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;
                            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_semantic_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_relevant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;prompt&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;raw&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;_session_tag&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;timestamp&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="c1"&gt;# 2
&lt;/span&gt;        &lt;span class="n"&gt;relevant_messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_back_to_chat_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                             &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;relevant_messages&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nc"&gt;Context&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;relevant_messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# 3
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Points to note.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The &lt;code&gt;messages&lt;/code&gt; parameter may be a list for multi-modal input. Merge all text.&lt;/li&gt;
&lt;li&gt;Since messages are stored separately, I need to sort them by timestamp to keep order.&lt;/li&gt;
&lt;li&gt;Put the retrieved messages into &lt;code&gt;Context.messages&lt;/code&gt; so they go to the end of the current chat messages.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Test semantic memory
&lt;/h3&gt;

&lt;p&gt;Unlike message store, we can set &lt;code&gt;ContextProvider&lt;/code&gt; directly in the agent.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memory_provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RedisVLSemanticMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_abc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;distance_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.3&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAILikeChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Qwen3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NEXT&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;re a little helper who answers my questions in one sentence.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;context_providers&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;memory_provider&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now a &lt;code&gt;main&lt;/code&gt; with a &lt;code&gt;thread&lt;/code&gt; instance to keep short-term memory while testing multi-turn dialog.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_new_thread&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;memory_provider&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;clear&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dl4vlwzwl7cxkypuya8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0dl4vlwzwl7cxkypuya8.png" alt="The  raw `distance_threshold` endraw  is too high, causing irrelevant messages to be retrieved. Image by Author" width="622" height="216"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It seems the default value of &lt;code&gt;distance_threshold&lt;/code&gt; 0.3 is too high. Let's set it lower:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;memory_provider&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RedisVLSemanticMemory&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;session_tag&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user_abc&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;distance_threshold&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mf"&gt;0.2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Test again:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfbxjjkc9a0eplhdu1kn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdfbxjjkc9a0eplhdu1kn.png" alt="Only request messages were retrieved, not response messages. Image by Author" width="720" height="234"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Lower threshold stops unrelated messages. But since requests and responses are stored separately, only requests are found. ContextProvider puts retrieved messages at the end of the message list. The LLM may think the user asked two questions. MLFlow shows it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhznpnvj5gky9o7nunjj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmhznpnvj5gky9o7nunjj.png" alt="Two similar questions were both added to the message list, but without attaching the already provided answers. Image by Author" width="720" height="373"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This is bad. We care more about the LLM’s answers than the requests. But vector search often finds the questions, not the answers. This just adds useless questions and does not help the LLM answer.&lt;/p&gt;

&lt;p&gt;Hard to say if the fault is Microsoft Agent Framework or RedisVL.&lt;/p&gt;

&lt;p&gt;When &lt;code&gt;ContextProviderlong-term&lt;/code&gt; finds related chat messages, they go after the ones from message store. If long-term and short-term messages repeat, they can confuse the LLM.&lt;/p&gt;

&lt;p&gt;Also, RedisVL not storing requests and responses together is a choice I do not like. LLM responses cost more. In production, a response may involve web search, RAG retrieval, or running code. But vector search finds just the request, not the answer. That is a waste.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Today, we tried using RedisVL for short-term and long-term memory in Microsoft Agent Framework and checked the results.&lt;/p&gt;

&lt;p&gt;RedisVL is very handy for short-term agent memory. It is simpler than using the Redis API.&lt;/p&gt;

&lt;p&gt;But &lt;code&gt;SemanticMessageHistory&lt;/code&gt; for semantic search of the user history did not perform well. I explained why.&lt;/p&gt;

&lt;p&gt;Thanks to the solid Redis infrastructure, semantic caches with RedisVL are simpler than other vector solutions.&lt;/p&gt;

&lt;p&gt;Next time, I will show you a semantic cache with RedisVL to save big costs for your company.&lt;/p&gt;

&lt;p&gt;Share your thoughts in the comments.&lt;/p&gt;

&lt;p&gt;👉 &lt;a href="https://www.dataleadsfuture.com/build-long-term-and-short-term-memory-for-agents-using-redisvl/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;Subscribe to my blog&lt;/strong&gt;&lt;/a&gt; to follow my latest agent app work.&lt;/p&gt;

&lt;p&gt;And share this article with friends. Maybe it will help more people.😁&lt;/p&gt;




&lt;p&gt;Enjoyed this read? &lt;a href="https://www.dataleadsfuture.com/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;Subscribe now to get more cutting-edge data science tips straight to your inbox!&lt;/strong&gt;&lt;/a&gt; Your feedback and questions are welcome — let’s discuss in the comments below!&lt;/p&gt;

&lt;p&gt;This article was originally published on &lt;a href="https://www.dataleadsfuture.com/build-long-term-and-short-term-memory-for-agents-using-redisvl/" rel="noopener noreferrer"&gt;Data Leads Future&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>datascience</category>
      <category>redis</category>
    </item>
    <item>
      <title>🎯As companies keep rolling out agent systems internally, issues like prompt injection attacks and tricking agents into breaking the rules are starting to pop up. 

It’s becoming urgent to build compliance safeguards for these agents.</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Fri, 09 Jan 2026 11:12:51 +0000</pubDate>
      <link>https://dev.to/qtalen/as-companies-keep-rolling-out-agent-systems-internally-issues-like-prompt-injection-attacks-and-24j</link>
      <guid>https://dev.to/qtalen/as-companies-keep-rolling-out-agent-systems-internally-issues-like-prompt-injection-attacks-and-24j</guid>
      <description>&lt;div class="ltag__link"&gt;
  &lt;a href="/qtalen" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__pic"&gt;
      &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2927908%2F838177c1-066e-4986-a3ba-d764afa88632.jpg" alt="qtalen"&gt;
    &lt;/div&gt;
  &lt;/a&gt;
  &lt;a href="https://dev.to/qtalen/microsoft-agent-framework-maf-middleware-basics-add-compliance-fences-to-your-agent-37o0" class="ltag__link__link"&gt;
    &lt;div class="ltag__link__content"&gt;
      &lt;h2&gt;Microsoft Agent Framework (MAF) Middleware Basics: Add Compliance Fences to Your Agent&lt;/h2&gt;
      &lt;h3&gt;Peng Qian ・ Jan 9&lt;/h3&gt;
      &lt;div class="ltag__link__taglist"&gt;
        &lt;span class="ltag__link__tag"&gt;#programming&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#ai&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#tutorial&lt;/span&gt;
        &lt;span class="ltag__link__tag"&gt;#agentaichallenge&lt;/span&gt;
      &lt;/div&gt;
    &lt;/div&gt;
  &lt;/a&gt;
&lt;/div&gt;


</description>
      <category>programming</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>agentaichallenge</category>
    </item>
    <item>
      <title>Microsoft Agent Framework (MAF) Middleware Basics: Add Compliance Fences to Your Agent</title>
      <dc:creator>Peng Qian</dc:creator>
      <pubDate>Fri, 09 Jan 2026 10:22:10 +0000</pubDate>
      <link>https://dev.to/qtalen/microsoft-agent-framework-maf-middleware-basics-add-compliance-fences-to-your-agent-37o0</link>
      <guid>https://dev.to/qtalen/microsoft-agent-framework-maf-middleware-basics-add-compliance-fences-to-your-agent-37o0</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Microsoft added a middleware feature to their Agent Framework (MAF). This means you can use the chain-of-responsibility design pattern to add extra logic before or after the agent runs, function tools are called, or when the LLM is invoked, without changing the original business logic.&lt;/p&gt;

&lt;p&gt;This feature matters a lot.&lt;/p&gt;

&lt;p&gt;When building enterprise-level agent applications, teams typically collaborate across departments. Besides your own part, you might need to dynamically include permissions, logs, finance checks, and compliance reviews from other teams.&lt;/p&gt;

&lt;p&gt;These parts shouldn’t affect the agent’s core ability, but should be easy to install or remove. Like middleware in FastAPI or other web frameworks, MAF middleware enables this capability for agents as well.&lt;/p&gt;

&lt;p&gt;In today’s guide, I’ll show how I use MAF’s middleware and AG-UI to add a compliance review that checks user input before sending it to the agent.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjks1u0jkkizzwsnitsy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvjks1u0jkkizzwsnitsy.png" alt="Inducing an agent will be blocked by compliance rules specific to certain business scenarios. Image by Author" width="800" height="548"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This will teach you how to use middleware in enterprise agent applications and give you a first look at using AG-UI for microservice distributed agent development. Let’s start.&lt;/p&gt;

&lt;p&gt;You can get all the source code at the end. 👇&lt;/p&gt;




&lt;p&gt;📫 &lt;a href="https://www.dataleadsfuture.com/microsoft-agent-framework-maf-middleware-basics-add-compliance-fences-to-your-agent/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;Don’t forget to follow my blog to stay updated on my latest progress in AI application practices.&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  System Setup
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Install the latest Microsoft Agent Framework
&lt;/h3&gt;

&lt;p&gt;MAF is still updating quickly. Since APIs change a lot, this guide uses the newest version. It’s better to install the prerelease version.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;agent-framework &lt;span class="nt"&gt;--pre&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or add the dependency in your &lt;code&gt;pyproject.toml&lt;/code&gt; file:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tex"&gt;&lt;code&gt;"agent-framework-ag-ui&amp;gt;=1.0.0b251223"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Install Microsoft Agent Framework AG-UI
&lt;/h3&gt;

&lt;p&gt;MAF works with AG-UI to support distributed agent development. You’ll need this capability today, so install the latest version of ag-ui; otherwise, APIs won’t match up.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight tex"&gt;&lt;code&gt;"agent-framework-ag-ui&amp;gt;=1.0.0b251223"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After installing the needed Python packages, we can move on. First, let's get a quick background on what middleware is and what it can do.&lt;/p&gt;




&lt;h2&gt;
  
  
  Quick Intro to MAF Middleware
&lt;/h2&gt;

&lt;h3&gt;
  
  
  What is middleware
&lt;/h3&gt;

&lt;p&gt;According to the MAF documentation:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;Middleware in the Agent Framework intercepts, changes, and enhances &amp;gt;agent behavior at different execution points. You can use it for logging, &amp;gt;security checks, error handling, and result transformation without &amp;gt;changing the agent’s or function’s core logic.&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That’s what we’ll learn today.&lt;/p&gt;

&lt;h3&gt;
  
  
  How middleware works
&lt;/h3&gt;

&lt;p&gt;As I said before, MAF middleware uses the chain-of-responsibility pattern. Each piece of logic lives in its own node. Every node knows the next one. When a node finishes running, it passes control to the next node.&lt;/p&gt;

&lt;p&gt;Here’s a simple example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logging_agent_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentRunContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;AgentRunContext&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Agent] Starting execution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Agent] Execution completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The next parameter points to the next node. You can run code before or after calling it.&lt;/p&gt;

&lt;p&gt;The actual agent logic acts as the last node. After all middleware nodes finish, the agent runs.&lt;/p&gt;

&lt;p&gt;In MAF, middleware can run in three stages:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before or after run or run_stream.&lt;/li&gt;
&lt;li&gt;Before or after a function call.&lt;/li&gt;
&lt;li&gt;Before or after calling the LLM.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0ahstals97y7enjrmbj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn0ahstals97y7enjrmbj.png" alt="The Microsoft Agent Framework middleware works at different stages of agent execution. Image by Author" width="735" height="360"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let’s look at how different middleware types work.&lt;/p&gt;

&lt;h3&gt;
  
  
  Function-Based middleware
&lt;/h3&gt;

&lt;p&gt;If your middleware is simple, like just logging agent runs, use a function-based middleware.&lt;/p&gt;

&lt;p&gt;You only need a function with two parameters: context and next. The context keeps your runtime info, and next calls the next node.&lt;/p&gt;

&lt;p&gt;MAF uses the type annotation of context to tell which stage this code belongs to. For example, if it runs at the agent stage, the type should be AgentRunContext:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logging_agent_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;AgentRunContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;AgentRunContext&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Agent] Starting execution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Agent] Execution completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;For a function call stage, use FunctionInvocationContext:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logging_function_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;FunctionInvocationContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;FunctionInvocationContext&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Function] Calling &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Function] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And for the chat stage, use ChatContext:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logging_chat_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;ChatContext&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Chat] Sending &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; messages to AI.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Chat] AI response received.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you dislike type annotations, you can use decorators.&lt;/p&gt;

&lt;p&gt;@agent_middleware runs at the agent stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@agent_middleware&lt;/span&gt;    
&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;logging_agent_middleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Agent] Starting execution&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Agent] Execution completed&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you don’t need to add type annotations anymore.&lt;/p&gt;

&lt;p&gt;There are also @function_middleware and @chat_middleware for function calls and chat calls.&lt;/p&gt;

&lt;p&gt;If your middleware needs to save state or handle more complex logic, function-based won’t be enough. Use class-based middleware.&lt;/p&gt;

&lt;h3&gt;
  
  
  Class-Based middleware
&lt;/h3&gt;

&lt;p&gt;Class-based middleware organizes code with object-oriented methods. That lets middleware remember state and handle tricky logic.&lt;/p&gt;

&lt;p&gt;A class-based middleware must meet two rules:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Inherit from the right base class: AgentMiddleware, FunctionMiddleware, or ChatMiddleware.&lt;/li&gt;
&lt;li&gt;Have a process method with the same parameters as the function-based ones. They use the same contexts.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Here’s an example for a middleware class that runs at the function call stage:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;LoggingFunctionMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FunctionMiddleware&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;FunctionInvocationContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;FunctionInvocationContext&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]]&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Function Class] Calling &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[Function Class] &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;function&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; completed.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Just make sure to pair the right base class with the right context type. The others follow the same rule.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to use middleware
&lt;/h3&gt;

&lt;p&gt;There are three stages for middleware and three ways to build it. Let’s put that in one grid chart to see how they connect.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g7tn6xumfcl4srx01ej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3g7tn6xumfcl4srx01ej.png" alt="Use a grid chart to describe the implementations of different middleware. Image by Author" width="771" height="391"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The framework now only supports passing middleware when creating the agent:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;get_weather&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
    &lt;span class="n"&gt;middleware&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="n"&gt;logging_agent_middleware&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nc"&gt;LoggingFunctionMiddleware&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt;  &lt;span class="n"&gt;logging_chat_middleware&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;blocking_middleware&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;logging_function_middleware&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can mix all nine types freely.&lt;/p&gt;

&lt;p&gt;But note that &lt;strong&gt;only the last function middleware you add actually works right now. I’m not sure if that’s a bug, but we’ll find out later.&lt;/strong&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Project Practice: Add Compliance Check to Your Agent
&lt;/h2&gt;

&lt;p&gt;Now let’s get hands-on. I’ll show how to use MAF middleware to add compliance checking to an agent.&lt;/p&gt;

&lt;h3&gt;
  
  
  Why add compliance checks
&lt;/h3&gt;

&lt;p&gt;Every LLM already has basic compliance setups built in based on local laws. When companies self-host LLMs, they also add custom checks in frameworks like vLLM. But those only watch the model’s input or output.&lt;/p&gt;

&lt;p&gt;Now that agents are everywhere, we also need checks at the agent level: preventing prompt injection, checking MCP permissions, and so on. Middleware makes this possible.&lt;/p&gt;

&lt;p&gt;In today’s demo, we’ll review every user message to make sure no one tries to make our finance assistant promise investment returns.&lt;/p&gt;

&lt;p&gt;In the end, the agent will refuse to answer questions like “Will I lose money?” or “Can you guarantee profit?”&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2uk14uy0lkxmavhew86.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fc2uk14uy0lkxmavhew86.png" alt="Inducing an agent will be blocked by compliance rules specific to certain business scenarios. Image by Author" width="800" height="548"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  How will you do it
&lt;/h3&gt;

&lt;p&gt;Why use compliance checks as an example? Because in real web apps, product teams don’t manage compliance themselves. The compliance department creates the rules and sends them as microservices to each product.&lt;/p&gt;

&lt;p&gt;That way, teams don’t touch those rules. They just plug them in using framework middleware. It’s common in normal web apps.&lt;/p&gt;

&lt;p&gt;We’ll do the same with MAF agents, using middleware to insert compliance logic.&lt;/p&gt;

&lt;p&gt;To simulate real setups, this project has two parts: one server and one client.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kyxv1p1e9x322w4yq8i.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5kyxv1p1e9x322w4yq8i.png" alt="The compliance check middleware will include both server and client modules. Image by Author" width="671" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the compliance department side, we’ll deploy a separate agent that reviews messages. It uses an LLM to check user inputs for prompt injections or non-compliant content.&lt;/p&gt;

&lt;p&gt;On the business side, we’ll have a middleware that intercepts user requests and sends them to that server. It decides whether the agent should respond.&lt;/p&gt;

&lt;p&gt;The two parts communicate using the AG-UI protocol.&lt;/p&gt;

&lt;h3&gt;
  
  
  Server implementation
&lt;/h3&gt;

&lt;p&gt;Let’s build the compliance-checking agent server.&lt;/p&gt;

&lt;p&gt;Since it only checks user requests, I’ll use the Qwen3-30b-a3b-instruct-2507 model for speed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAILikeChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Qwen3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Q30B_A3B&lt;/span&gt;
&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;dedent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
    You are a compliance review officer. You will review user requests or system-generated text for compliance.
    Your main task is to check user requests and determine whether they are trying to induce the system to produce content that guarantees investment returns or similar topics.

    You should output a JSON text, like {&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;is_compliance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: 1, &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;reason&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;: &lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="s"&gt;}

    Here, is_compliance being 1 means compliant, and 0 means non-compliant.

    reason should state the reason for compliance or non-compliance.

    Only output the JSON text without any markdown formatting, and do not add any introduction or explanation.
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We’ll make the output structured as JSON for clarity and speed.&lt;/p&gt;

&lt;p&gt;Although MAF supports structured output when using Qwen models:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://www.dataleadsfuture.com/make-microsoft-agent-frameworks-structured-output-work-with-qwen-and-deepseek-models/" rel="noopener noreferrer"&gt;Make Microsoft Agent Framework’s Structured Output Work With Qwen and DeepSeek Models&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For some reason, it doesn’t work when used as an AG-UI server.&lt;/p&gt;

&lt;p&gt;So we have to tell the format in the prompt.&lt;/p&gt;

&lt;p&gt;Next, use add_agent_framework_fastapi_endpoint from agent_framework_ag_ui to register it with FastAPI.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;app&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastAPI&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;title&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;AG-UI Server&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;add_agent_framework_fastapi_endpoint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/compliance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, run it with uvicorn:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;__name__&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;__main__&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;uvicorn&lt;/span&gt;
    &lt;span class="n"&gt;uvicorn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;app&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;host&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;0.0.0.0&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;port&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;8888&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Middleware implementation
&lt;/h3&gt;

&lt;p&gt;This middleware is more complex, so we’ll use class-based middleware.&lt;/p&gt;

&lt;p&gt;Here’s the full code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;ComplianceCheckMiddleware&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ChatMiddleware&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_init_compliant_agent&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;super&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="n"&gt;kwargs&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;process&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="nb"&gt;next&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Callable&lt;/span&gt;&lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="n"&gt;ChatContext&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;Awaitable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;check_result&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ReviewResults&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_get_compliance_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="ow"&gt;not&lt;/span&gt; &lt;span class="n"&gt;check_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_compliance&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;_output_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;😒We can’t keep providing the service because:&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;fill&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;check_result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;reason&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;return&lt;/span&gt;

        &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;next&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nd"&gt;@staticmethod&lt;/span&gt;
    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_output_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatContext&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_streaming&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="c1"&gt;#4
&lt;/span&gt;            &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;output_stream&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;AsyncIterable&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;AgentRunResponseUpdate&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="k"&gt;yield&lt;/span&gt; &lt;span class="nc"&gt;AgentRunResponseUpdate&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;contents&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;TextContent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;output_stream&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AgentRunResponse&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;ChatMessage&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ASSISTANT&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;)]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_get_compliance_result&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;ChatContext&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;ReviewResults&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;role&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;value&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;user&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;:]&lt;/span&gt;
        &lt;span class="n"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;messages&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#2
&lt;/span&gt;
        &lt;span class="n"&gt;check_result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ReviewResults&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;model_validate_json&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;#3
&lt;/span&gt;        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;check_result&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;_init_compliant_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AGUIChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;  &lt;span class="c1"&gt;#1
&lt;/span&gt;            &lt;span class="n"&gt;endpoint&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http://127.0.0.1:8888/compliance&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;compliance_agent&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You’re a compliance officer, and you review user requests.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A few details to watch:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;_init_compliant_agent creates the AG-UI client but works just like a normal chat client.&lt;/li&gt;
&lt;li&gt;I sent recent user messages for better review accuracy. &lt;strong&gt;But the AgentMiddleware context only holds the latest message. To get the message history, you must use ChatMiddleware.&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Since AG-UI doesn’t support response_format, I parse JSON manually.&lt;/li&gt;
&lt;li&gt;_output_result sends text output if a check fails. It switches based on context.is_streaming.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now we can make a business agent. Use a bigger model and a normal system prompt; just remember to load the ComplianceCheckMiddleware.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;chat_client&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OpenAILikeChatClient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;model_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;Qwen3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;NEXT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;chat_client&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;create_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_assistant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;instructions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;You are a helpful assistant. Answer the user&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s question in short and simple words.&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;middleware&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nc"&gt;ComplianceCheckMiddleware&lt;/span&gt;&lt;span class="p"&gt;()]&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let’s test it with a multi-turn chat client:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;thread&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_new_thread&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;user_input&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;input&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;User: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;startswith&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
            &lt;span class="k"&gt;break&lt;/span&gt;
        &lt;span class="n"&gt;stream&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_stream&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;user_input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;thread&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;Assistant: &lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;stream&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;text&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;""&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;flush&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You’ll see the agent chats normally most of the time.&lt;/p&gt;

&lt;p&gt;If you ask about guaranteed returns, it refuses to answer but continues working fine afterward.&lt;/p&gt;

&lt;p&gt;Task done.&lt;/p&gt;




&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this guide, we explored how middleware works in Microsoft Agent Framework.&lt;/p&gt;

&lt;p&gt;Middleware lets us add new logic for logging, permissions, or compliance without touching the main agent code or prompt text.&lt;/p&gt;

&lt;p&gt;In the project section, I used class-based middleware to show how to review user inputs for compliance.&lt;/p&gt;

&lt;p&gt;We also took a quick look at AG-UI for building agent microservices. This helps when many teams need to make agents collaborate, and I’ll cover AG-UI and A2A in detail later.&lt;/p&gt;

&lt;p&gt;If you have questions or want to learn more, leave a comment.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;&lt;a href="https://www.dataleadsfuture.com/microsoft-agent-framework-maf-middleware-basics-add-compliance-fences-to-your-agent/#/portal/signup" rel="noopener noreferrer"&gt;Don’t forget to subscribe to my blog&lt;/a&gt; and share this article with your friends—maybe it’ll help someone build smarter agents 😁.&lt;/strong&gt;&lt;/p&gt;




&lt;p&gt;Enjoyed this read? &lt;a href="https://www.dataleadsfuture.com/#/portal/signup" rel="noopener noreferrer"&gt;&lt;strong&gt;Subscribe now to get more cutting-edge data science tips straight to your inbox!&lt;/strong&gt;&lt;/a&gt; Your feedback and questions are welcome — let’s discuss in the comments below!&lt;/p&gt;

&lt;p&gt;This article was originally published on &lt;a href="https://www.dataleadsfuture.com/microsoft-agent-framework-maf-middleware-basics-add-compliance-fences-to-your-agent/" rel="noopener noreferrer"&gt;Data Leads Future&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>programming</category>
      <category>ai</category>
      <category>tutorial</category>
      <category>agentaichallenge</category>
    </item>
  </channel>
</rss>
