<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sowmya</title>
    <description>The latest articles on DEV Community by Sowmya (@sowmya_5ab5a8078a6f2ad464).</description>
    <link>https://dev.to/sowmya_5ab5a8078a6f2ad464</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F4007770%2F26dd78bf-d44e-42dd-a569-6a6c5217bf1d.png</url>
      <title>DEV Community: Sowmya</title>
      <link>https://dev.to/sowmya_5ab5a8078a6f2ad464</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sowmya_5ab5a8078a6f2ad464"/>
    <language>en</language>
    <item>
      <title>How Three People Built a Working AI Agent in One Night: The Architecture, the Bugs, and What We'd Do Differently</title>
      <dc:creator>Sowmya</dc:creator>
      <pubDate>Mon, 29 Jun 2026 09:21:24 +0000</pubDate>
      <link>https://dev.to/sowmya_5ab5a8078a6f2ad464/how-three-people-built-a-working-ai-agent-in-one-night-the-architecture-the-bugs-and-what-wed-on</link>
      <guid>https://dev.to/sowmya_5ab5a8078a6f2ad464/how-three-people-built-a-working-ai-agent-in-one-night-the-architecture-the-bugs-and-what-wed-on</guid>
      <description>&lt;p&gt;&lt;em&gt;Three people. One shared codebase. A working multi-agent system with persistent memory and cost-aware routing by the end of the night.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;This is the honest account of how we built it, what broke, and what we'd do differently if we started over.&lt;/p&gt;

&lt;p&gt;The Split That Made It Work&lt;/p&gt;

&lt;p&gt;Before writing a single line of code, we made one important decision: divide ownership instead of sharing responsibility.&lt;/p&gt;

&lt;p&gt;The wrong approach would have been three people building "the agent" together, constantly editing the same files. That's how you end up with merge conflicts and chaos late at night.&lt;/p&gt;

&lt;p&gt;Instead, we divided the project into three independent layers.&lt;/p&gt;

&lt;p&gt;Person A – Memory Layer&lt;/p&gt;

&lt;p&gt;Owned memory.py.&lt;/p&gt;

&lt;p&gt;Implemented:&lt;/p&gt;

&lt;p&gt;recall_memory(customer) -&amp;gt; str&lt;br&gt;
save_memory(customer, notes) -&amp;gt; None&lt;/p&gt;

&lt;p&gt;This module handled persistent memory and was developed completely independently.&lt;/p&gt;

&lt;p&gt;Person B – Runtime Layer&lt;/p&gt;

&lt;p&gt;Owned runtime.py.&lt;/p&gt;

&lt;p&gt;Implemented:&lt;/p&gt;

&lt;p&gt;ask_ai(prompt) -&amp;gt; str&lt;br&gt;
get_decisions() -&amp;gt; list&lt;/p&gt;

&lt;p&gt;This module handled model execution and routing while remaining isolated from the rest of the system.&lt;/p&gt;

&lt;p&gt;Person C – Agents and UI&lt;/p&gt;

&lt;p&gt;Owned agents.py and app.py.&lt;/p&gt;

&lt;p&gt;Instead of waiting for the other two modules, Person C wrote placeholder functions that returned hardcoded responses. That allowed the UI and orchestration logic to be built in parallel.&lt;/p&gt;

&lt;p&gt;The most important decision we made wasn't technical.&lt;/p&gt;

&lt;p&gt;We agreed on the function signatures before anyone started coding.&lt;/p&gt;

&lt;p&gt;recall_memory(customer: str) -&amp;gt; str&lt;br&gt;
save_memory(customer: str, notes: str) -&amp;gt; None&lt;br&gt;
ask_ai(prompt: str) -&amp;gt; str&lt;br&gt;
get_decisions() -&amp;gt; list&lt;/p&gt;

&lt;p&gt;Those four functions became the contract between every part of the project.&lt;/p&gt;

&lt;p&gt;During development, the application imported fake implementations.&lt;/p&gt;

&lt;p&gt;from fakes import recall_memory, save_memory, ask_ai, get_decisions&lt;/p&gt;

&lt;p&gt;Once the other modules were finished, only the imports changed.&lt;/p&gt;

&lt;p&gt;from memory import recall_memory, save_memory&lt;br&gt;
from runtime import ask_ai, get_decisions&lt;/p&gt;

&lt;p&gt;That was the entire integration.&lt;/p&gt;

&lt;p&gt;No rewrites.&lt;/p&gt;

&lt;p&gt;No merge headaches.&lt;/p&gt;
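
&lt;p&gt;As a rough illustration of that contract, a fakes.py along these lines would satisfy all four signatures. The hardcoded return values and the shape of the decision entries are assumptions made for this sketch, not the contents of the actual file.&lt;/p&gt;

```python
# fakes.py: placeholder implementations matching the agreed signatures.
# All return values below are made up for illustration.

_FAKE_NOTES: dict[str, str] = {}


def recall_memory(customer: str) -> str:
    """Return canned notes, or whatever save_memory stored this session."""
    return _FAKE_NOTES.get(customer, "No previous conversations on record.")


def save_memory(customer: str, notes: str) -> None:
    """Keep notes in a plain dict; nothing is persisted."""
    _FAKE_NOTES[customer] = notes


def ask_ai(prompt: str) -> str:
    """Return a fixed reply so the UI can be built without API calls."""
    return f"[fake reply] {prompt[:40]}"


def get_decisions() -> list:
    """Return a hardcoded routing log (entry shape is hypothetical)."""
    return [{"model": "fake-small", "reason": "placeholder"}]
```

&lt;p&gt;Because the names and types match the real modules, swapping the import line is the only change the caller sees.&lt;/p&gt;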

&lt;p&gt;Our Git Strategy&lt;/p&gt;

&lt;p&gt;The repository started with a clean main branch containing only the project setup.&lt;/p&gt;

&lt;p&gt;Each developer created their own branch.&lt;/p&gt;

&lt;p&gt;main&lt;br&gt;
├── person-a (memory.py + data/)&lt;br&gt;
├── person-b (runtime.py)&lt;br&gt;
└── person-c (agents.py, app.py, fakes.py)&lt;/p&gt;

&lt;p&gt;Since everyone worked on separate files, merging became straightforward.&lt;/p&gt;

&lt;p&gt;At the end:&lt;/p&gt;

&lt;p&gt;git fetch origin&lt;br&gt;
git merge origin/person-a&lt;br&gt;
git merge origin/person-b&lt;/p&gt;

&lt;p&gt;The only real merge conflict happened inside requirements.txt, where multiple people added dependencies.&lt;/p&gt;

&lt;p&gt;The fix was simply combining all required packages.&lt;/p&gt;

&lt;p&gt;The final Git history looked like this:&lt;/p&gt;

&lt;p&gt;ca5a67b final working sales follow-up agent&lt;br&gt;
5c939a3 Merge remote-tracking branch 'origin/person-b'&lt;br&gt;
97f4546 runtime layer with cascadeflow&lt;br&gt;
e6bea7d memory layer with Hindsight + transcripts&lt;br&gt;
060685b initial repo setup&lt;/p&gt;

&lt;p&gt;Readable.&lt;/p&gt;

&lt;p&gt;Simple.&lt;/p&gt;

&lt;p&gt;Easy to trace.&lt;/p&gt;

&lt;p&gt;The Bugs That Actually Happened&lt;/p&gt;

&lt;p&gt;Every demo makes things look smooth.&lt;/p&gt;

&lt;p&gt;Reality wasn't.&lt;/p&gt;

&lt;p&gt;Here are the issues we actually ran into.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A Dependency Version That Didn't Exist&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Our requirements.txt specified:&lt;/p&gt;

&lt;p&gt;cascadeflow==0.2.0&lt;/p&gt;

&lt;p&gt;The problem?&lt;/p&gt;

&lt;p&gt;That version had been removed from PyPI.&lt;/p&gt;

&lt;p&gt;The installation either failed or installed an incompatible version.&lt;/p&gt;

&lt;p&gt;The quality= parameter we needed only became available starting with version 0.7.0.&lt;/p&gt;

&lt;p&gt;The fix was:&lt;/p&gt;

&lt;p&gt;cascadeflow==0.7.1&lt;/p&gt;

&lt;p&gt;Lesson: Always verify that pinned dependency versions actually exist before committing them.&lt;/p&gt;
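
&lt;p&gt;One way to automate that check is sketched below. The function name and the hand-written version list are hypothetical; in a real check the published versions might be fetched from the package index instead.&lt;/p&gt;

```python
def find_missing_pins(requirements_text: str, published: dict[str, set[str]]) -> list[str]:
    """Return every 'name==version' pin whose version is not in `published`."""
    missing = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" not in line:
            continue  # only exact pins are checked in this sketch
        name, version = (part.strip() for part in line.split("==", 1))
        if version not in published.get(name.lower(), set()):
            missing.append(f"{name}=={version}")
    return missing


# Hypothetical version list, written by hand for the example.
published = {"cascadeflow": {"0.7.0", "0.7.1"}}
print(find_missing_pins("cascadeflow==0.2.0\nstreamlit\n", published))
```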

&lt;ol start="2"&gt;
&lt;li&gt;The Reasoning Model Returned Nothing&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The large reasoning model (gpt-oss-120b) generates internal reasoning before producing an answer.&lt;/p&gt;

&lt;p&gt;With:&lt;/p&gt;

&lt;p&gt;max_tokens = 512&lt;/p&gt;

&lt;p&gt;the model spent the entire token budget thinking.&lt;/p&gt;

&lt;p&gt;There were no tokens left for the response itself.&lt;/p&gt;

&lt;p&gt;Because the returned string was empty, it looked as though routing had failed.&lt;/p&gt;

&lt;p&gt;The actual fix was increasing the token limit.&lt;/p&gt;

&lt;p&gt;max_tokens = 2048&lt;/p&gt;

&lt;p&gt;Reasoning models need room to both think and answer.&lt;/p&gt;
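
&lt;p&gt;A guard like the following could make this failure obvious instead of silent. It is only a sketch: ask_with_headroom, the call_model callable, and the stand-in model are hypothetical and not part of runtime.py.&lt;/p&gt;

```python
from typing import Callable


def ask_with_headroom(
    call_model: Callable[[str, int], str],
    prompt: str,
    budgets: tuple[int, ...] = (512, 2048),
) -> str:
    """Try each token budget in turn until the model returns a non-empty answer."""
    for max_tokens in budgets:
        answer = call_model(prompt, max_tokens).strip()
        if answer:
            return answer
    raise RuntimeError(f"empty response even with max_tokens={budgets[-1]}")


def fake_reasoning_model(prompt: str, max_tokens: int) -> str:
    """Stand-in model: 'thinks' for 600 tokens, so small budgets yield nothing."""
    return "Follow up on the security review." if max_tokens > 600 else ""
```

&lt;p&gt;Raising an error when every budget comes back empty keeps an exhausted token limit from being mistaken for a routing problem.&lt;/p&gt;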

&lt;ol start="3"&gt;
&lt;li&gt;Async Broke Inside Streamlit&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Hindsight uses aiohttp.&lt;/p&gt;

&lt;p&gt;Running it inside Streamlit caused:&lt;/p&gt;

&lt;p&gt;RuntimeError:&lt;br&gt;
Timeout context manager should be used inside a task&lt;/p&gt;

&lt;p&gt;The existing event loop conflicted with Hindsight's own async handling.&lt;/p&gt;

&lt;p&gt;The solution was wrapping every Hindsight call inside its own thread with a fresh event loop.&lt;/p&gt;

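&lt;p&gt;import concurrent.futures&lt;br&gt;
&lt;br&gt;
def _run_in_thread(fn, *args, **kwargs):&lt;br&gt;
    # Run the blocking call in a one-off worker thread, away from the existing event loop.&lt;br&gt;
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:&lt;br&gt;
        return executor.submit(fn, *args, **kwargs).result(timeout=30)&lt;/p&gt;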

&lt;p&gt;The callers never had to know threading existed.&lt;/p&gt;
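
&lt;p&gt;When the wrapped call is a coroutine, one way to give the worker thread a fresh event loop is to call asyncio.run inside it. The sketch below assumes that approach; run_coro_in_thread and fake_recall are hypothetical names, and fake_recall merely stands in for a Hindsight call.&lt;/p&gt;

```python
import asyncio
import concurrent.futures


def run_coro_in_thread(coro_fn, *args, timeout=30, **kwargs):
    """Run an async function to completion on a new loop in a worker thread."""

    def runner():
        # asyncio.run creates a new event loop, runs the coroutine, then closes the loop.
        return asyncio.run(coro_fn(*args, **kwargs))

    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
        return executor.submit(runner).result(timeout=timeout)


async def fake_recall(customer):
    """Stand-in for an async memory lookup."""
    await asyncio.sleep(0)
    return f"notes for {customer}"


print(run_coro_in_thread(fake_recall, "acme"))
```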

&lt;ol start="4"&gt;
&lt;li&gt;&amp;lt;think&amp;gt; Blocks Polluted Agent Responses&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The smaller reasoning model (qwen3-32b) occasionally returned internal reasoning wrapped inside:&lt;/p&gt;

&lt;p&gt;&amp;lt;think&amp;gt;...&amp;lt;/think&amp;gt;&lt;/p&gt;

&lt;p&gt;Passing that directly into another agent confused downstream processing.&lt;/p&gt;

&lt;p&gt;The fix was removing those sections before handing responses to the next agent.&lt;/p&gt;

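&lt;p&gt;import re&lt;br&gt;
def strip_think_blocks(text):&lt;br&gt;
    # Drop every &amp;lt;think&amp;gt;...&amp;lt;/think&amp;gt; span; DOTALL lets the match cross newlines.&lt;br&gt;
    return re.sub(r'&amp;lt;think&amp;gt;.*?&amp;lt;/think&amp;gt;', '', text, flags=re.DOTALL).strip()&lt;/p&gt;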

&lt;ol start="5"&gt;
&lt;li&gt;Our Memory Bank Was Full of Test Data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;While testing memory persistence, we saved lots of fake customer information.&lt;/p&gt;

&lt;p&gt;Unfortunately, those records stayed in the memory bank.&lt;/p&gt;

&lt;p&gt;During the live demo, the agent started recalling old testing conversations.&lt;/p&gt;

&lt;p&gt;The fix was creating a brand-new memory bank specifically for the demo.&lt;/p&gt;

&lt;p&gt;BANK_ID = "acme-demo-live"&lt;/p&gt;

&lt;p&gt;Fresh bank.&lt;/p&gt;

&lt;p&gt;Fresh memory.&lt;/p&gt;

&lt;p&gt;Predictable behavior.&lt;/p&gt;

&lt;ol start="6"&gt;
&lt;li&gt;The Wrong Demo Data&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The transcript files inside data/ weren't the ones we'd built the demo around.&lt;/p&gt;

&lt;p&gt;Our intended story relied on:&lt;/p&gt;

&lt;p&gt;Price appearing to be the customer's objection&lt;br&gt;
Security being the real blocker&lt;br&gt;
Sarah being the actual decision-maker&lt;/p&gt;

&lt;p&gt;The wrong transcript completely broke that narrative.&lt;/p&gt;

&lt;p&gt;We replaced the development data with the correct transcript files before the final presentation.&lt;/p&gt;

&lt;p&gt;Lesson: Version control your demo data as carefully as your source code.&lt;/p&gt;

&lt;p&gt;What We'd Do Differently&lt;/p&gt;

&lt;p&gt;Start with Interfaces&lt;/p&gt;

&lt;p&gt;Agreeing on function signatures before coding was the best decision we made.&lt;/p&gt;

&lt;p&gt;On a larger team, we'd formalize them even further using shared interfaces, type definitions, or a dedicated types.py.&lt;/p&gt;
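
&lt;p&gt;A minimal sketch of what that formalization might look like follows. AgentContract and build_fake_contract are hypothetical names, and the module could just as well be called contracts.py, since a top-level types.py may clash with Python's standard-library types module.&lt;/p&gt;

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class AgentContract:
    """The four functions every layer agrees on, bundled in one typed object."""

    recall_memory: Callable[[str], str]
    save_memory: Callable[[str, str], None]
    ask_ai: Callable[[str], str]
    get_decisions: Callable[[], list]


def build_fake_contract() -> AgentContract:
    """Wire up in-memory placeholders (return values are illustrative)."""
    notes: dict[str, str] = {}
    return AgentContract(
        recall_memory=lambda customer: notes.get(customer, ""),
        save_memory=lambda customer, text: notes.__setitem__(customer, text),
        ask_ai=lambda prompt: "fake reply",
        get_decisions=lambda: [],
    )
```

&lt;p&gt;With the signatures in one place, a type checker can flag a layer that drifts from the agreed shape.&lt;/p&gt;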

&lt;p&gt;Test Persistence First&lt;/p&gt;

&lt;p&gt;The first successful test should always be:&lt;/p&gt;

&lt;p&gt;save_memory()&lt;br&gt;
↓&lt;br&gt;
restart application&lt;br&gt;
↓&lt;br&gt;
recall_memory()&lt;/p&gt;

&lt;p&gt;If persistence doesn't work, everything built on top of it becomes unreliable.&lt;/p&gt;
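
&lt;p&gt;The save, restart, recall sequence can be sketched as a small check. FileMemory here is a toy JSON-file store assumed purely for illustration; the real memory layer uses Hindsight, whose API is not shown.&lt;/p&gt;

```python
import json
import tempfile
from pathlib import Path


class FileMemory:
    """Toy file-backed store standing in for the real memory layer."""

    def __init__(self, path: Path) -> None:
        self.path = path

    def save_memory(self, customer: str, notes: str) -> None:
        data = self._load()
        data[customer] = notes
        self.path.write_text(json.dumps(data), encoding="utf-8")

    def recall_memory(self, customer: str) -> str:
        return self._load().get(customer, "")

    def _load(self) -> dict:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))


def persistence_survives_restart() -> bool:
    """Save with one instance, recall with a brand-new one."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "memory.json"
        FileMemory(path).save_memory("acme", "security is the real blocker")
        # A new object simulates the application starting again.
        return FileMemory(path).recall_memory("acme") == "security is the real blocker"
```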

&lt;p&gt;Separate Development and Demo Memory&lt;/p&gt;

&lt;p&gt;Never test using the same memory bank you'll showcase during a presentation.&lt;/p&gt;

&lt;p&gt;Maintain separate environments for development and demonstrations.&lt;/p&gt;
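
&lt;p&gt;One simple way to enforce that separation is to pick the bank from an environment setting. In this sketch only "acme-demo-live" comes from the project; the APP_ENV variable name and the development bank ID are assumptions.&lt;/p&gt;

```python
import os

# "acme-demo-live" is the bank ID from the write-up; the other is hypothetical.
BANK_IDS = {
    "dev": "acme-dev-scratch",
    "demo": "acme-demo-live",
}


def bank_id_for(env: str) -> str:
    """Map an environment name to its memory bank, rejecting unknown names."""
    try:
        return BANK_IDS[env]
    except KeyError:
        raise ValueError(f"unknown environment: {env!r}") from None


def current_bank_id() -> str:
    """Read the assumed APP_ENV variable, defaulting to the scratch bank."""
    return bank_id_for(os.environ.get("APP_ENV", "dev"))
```

&lt;p&gt;Defaulting to the scratch bank means test data lands in the demo bank only if someone asks for it explicitly.&lt;/p&gt;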

&lt;p&gt;Keep the Fake Layer&lt;/p&gt;

&lt;p&gt;The placeholder implementations (fakes.py) turned out to be incredibly useful.&lt;/p&gt;

&lt;p&gt;They allowed us to:&lt;/p&gt;

&lt;p&gt;Test the UI without API costs&lt;br&gt;
Debug layouts independently&lt;br&gt;
Simulate edge cases quickly&lt;/p&gt;

&lt;p&gt;Instead of deleting the fake layer after integration, we'd keep it permanently.&lt;/p&gt;

&lt;p&gt;The Result&lt;/p&gt;

&lt;p&gt;By the end of the project, we had built:&lt;/p&gt;

&lt;p&gt;Two collaborating AI agents&lt;br&gt;
Persistent memory across sessions&lt;br&gt;
Cost-aware model routing&lt;br&gt;
A complete routing audit trail&lt;br&gt;
A Streamlit interface with three live panels&lt;/p&gt;

&lt;p&gt;The technology certainly mattered.&lt;/p&gt;

&lt;p&gt;But the biggest reason the project came together smoothly wasn't the framework, the models, or the libraries.&lt;/p&gt;

&lt;p&gt;It was four function signatures that everyone agreed on before writing any code.&lt;/p&gt;

&lt;p&gt;Sometimes good software architecture starts with something as simple as a shared contract.&lt;/p&gt;

&lt;p&gt;GitHub Repository: &lt;a href="https://github.com/Adithya-1987/Sales_agent" rel="noopener noreferrer"&gt;https://github.com/Adithya-1987/Sales_agent&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Demo Video: &lt;a href="https://www.loom.com/share/8ce49cbf4e6b4955917251133dd916b2" rel="noopener noreferrer"&gt;https://www.loom.com/share/8ce49cbf4e6b4955917251133dd916b2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Original Technical Write-up: &lt;a href="https://dev.to/guguloth_adithyajadhav_9a/a-sales-agent-that-remembers-why-the-deal-is-stuck-80c"&gt;https://dev.to/guguloth_adithyajadhav_9a/a-sales-agent-that-remembers-why-the-deal-is-stuck-80c&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>programming</category>
      <category>webdev</category>
      <category>architecture</category>
    </item>
  </channel>
</rss>
