<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Imtiaz Ahmad</title>
    <description>The latest articles on DEV Community by Imtiaz Ahmad (@imtiaz_ahmad004).</description>
    <link>https://dev.to/imtiaz_ahmad004</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3966573%2F7d0fedf3-26bc-4c41-9137-ed854a6ee78a.jpeg</url>
      <title>DEV Community: Imtiaz Ahmad</title>
      <link>https://dev.to/imtiaz_ahmad004</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/imtiaz_ahmad004"/>
    <language>en</language>
    <item>
      <title>Building a persistent AI business assistant with LangChain, FastAPI, and Redis</title>
      <dc:creator>Imtiaz Ahmad</dc:creator>
      <pubDate>Wed, 03 Jun 2026 12:50:23 +0000</pubDate>
      <link>https://dev.to/imtiaz_ahmad004/building-a-persistent-ai-business-assistant-with-langchain-fastapi-and-redis-181l</link>
      <guid>https://dev.to/imtiaz_ahmad004/building-a-persistent-ai-business-assistant-with-langchain-fastapi-and-redis-181l</guid>
      <description>&lt;p&gt;TL;DR: I built a personal AI assistant that actually knows my business — using a LangChain agent, dual-layer memory (Redis + pgvector), and a model router that switches between GPT-4o and Claude 3.5 by task type. Here's the full architecture.&lt;/p&gt;




&lt;p&gt;The architecture&lt;/p&gt;

&lt;p&gt;The system has three layers:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Frontend — Next.js 14, WebSocket streaming for real-time responses&lt;/li&gt;
&lt;li&gt;Agent layer — FastAPI + LangChain AgentExecutor with four tools (email, CRM, tasks, calendar)&lt;/li&gt;
&lt;li&gt;Memory layer — Redis for session state, Supabase pgvector for long-term RAG&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The memory problem&lt;/p&gt;

&lt;p&gt;Most LLM demos are stateless: each request hits the API cold. Jarvis, the assistant, addresses this with a hybrid retriever that combines BM25 keyword search for exact names and dates with cosine-similarity search over embeddings for concepts. A cross-encoder re-ranker then trims the merged results to the top 5 chunks before they are injected into the prompt.&lt;/p&gt;
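
&lt;p&gt;A minimal sketch of the hybrid retrieval step described above, assuming pure-Python BM25 scoring, caller-supplied embedding vectors, and reciprocal rank fusion to merge the two rankings. The post does not say how the keyword and semantic results are combined, so the fusion method, the function names, and the parameters (k1, b, rrf_k) are illustrative assumptions. The cross-encoder re-ranking pass is left out.&lt;/p&gt;

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; a real system would use something sturdier.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of every document for the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def ranks(scores):
    """Map document index to its 1-based rank, best score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return {doc_id: rank for rank, doc_id in enumerate(order, start=1)}


def hybrid_search(query, query_vec, docs, doc_vecs, top_k=5, rrf_k=60):
    """Fuse keyword and semantic rankings (assumed: reciprocal rank fusion)."""
    kw = ranks(bm25_scores(query, docs))
    sem = ranks([cosine(query_vec, v) for v in doc_vecs])
    fused = {i: 1 / (rrf_k + kw[i]) + 1 / (rrf_k + sem[i]) for i in range(len(docs))}
    best = sorted(fused, key=fused.get, reverse=True)[:top_k]
    return [docs[i] for i in best]
```

&lt;p&gt;In a real pipeline the vectors would come from an embedding model, and the fused candidates would then be passed to the re-ranker.&lt;/p&gt;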

&lt;p&gt;The model router&lt;/p&gt;

&lt;p&gt;Not all tasks need the same model. I route tool-use tasks (CRM lookups, scheduling, email sends) to GPT-4o with function calling, and writing and reasoning tasks to Claude 3.5 Sonnet. In my setup, this cuts costs and improves output quality compared with using one model for everything.&lt;/p&gt;
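
&lt;p&gt;The routing rule can be sketched as a plain lookup. The task-type labels, the fallback choice, and the model identifier strings below are placeholders chosen for illustration, not the project's actual values.&lt;/p&gt;

```python
# Hypothetical task-type labels; the post does not name them.
TOOL_TASKS = {"crm_lookup", "scheduling", "email_send"}
WRITING_TASKS = {"writing", "reasoning"}

# Placeholder model identifiers; substitute whatever your providers expect.
TOOL_MODEL = "gpt-4o"
WRITING_MODEL = "claude-3-5-sonnet"


def route_model(task_type):
    """Pick a model name from the task type."""
    if task_type in TOOL_TASKS:
        return TOOL_MODEL
    if task_type in WRITING_TASKS:
        return WRITING_MODEL
    # Assumption: unknown task types fall back to the tool-calling model.
    return TOOL_MODEL
```

&lt;p&gt;Keeping the rule in one small function makes it easy to log which model served each request and to compare costs later.&lt;/p&gt;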

&lt;p&gt;Code snippet: tool registration in LangChain&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
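&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;langchain.agents&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;AgentType&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;initialize_agent&lt;/span&gt;

&lt;span class="c1"&gt;# The tool classes, the clients (supabase, sendgrid, redis), and llm are assumed to be defined elsewhere.&lt;/span&gt;
&lt;span class="n"&gt;tools&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;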
    &lt;span class="nc"&gt;CRMQueryTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;supabase&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;EmailDraftTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;client&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;sendgrid&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;TaskManagerTool&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;db&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;redis&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nc"&gt;CalendarReaderTool&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;agent&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;initialize_agent&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;llm&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;agent&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;AgentType&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;OPENAI_FUNCTIONS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Key learnings&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Context injection strategy matters more than model choice&lt;/li&gt;
&lt;li&gt;Redis TTL for session memory should match your average session length (I use 2h)&lt;/li&gt;
&lt;li&gt;Always stream responses — users abandon non-streaming AI UIs within 3 seconds&lt;/li&gt;
&lt;/ul&gt;
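
&lt;p&gt;For the session TTL point, a rough sketch of how a 2-hour expiry might be applied. It assumes a client exposing set(name, value, ex=...), get(name), and expire(name, time), which is the shape of the redis-py Redis client. The key prefix, the function names, and the sliding-expiry behaviour are assumptions rather than details from the project.&lt;/p&gt;

```python
import json

SESSION_TTL_SECONDS = 2 * 60 * 60  # 2 hours, the value quoted in the list above


def save_session(client, session_id, messages, ttl=SESSION_TTL_SECONDS):
    """Store the message list as JSON and start (or restart) the expiry clock."""
    client.set(f"session:{session_id}", json.dumps(messages), ex=ttl)


def load_session(client, session_id, ttl=SESSION_TTL_SECONDS):
    """Return the stored messages, or an empty list if the session has expired."""
    key = f"session:{session_id}"
    raw = client.get(key)
    if raw is None:
        return []
    # Assumed design choice: each read extends the session (sliding expiry).
    client.expire(key, ttl)
    return json.loads(raw)
```

&lt;p&gt;With redis-py installed, passing redis.Redis() as the client should work under these assumptions. Whether reads should extend the TTL depends on how you define an active session.&lt;/p&gt;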

&lt;p&gt;Full repo coming soon. Follow for updates.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>architecture</category>
      <category>rag</category>
    </item>
  </channel>
</rss>
