Sowmya

Posted on Jun 29

How Three People Built a Working AI Agent in One Night: The Architecture, the Bugs, and What We'd Do Differently

#ai #architecture #webdev #programming

Three people. One shared codebase. A working multi-agent system with persistent memory and cost-aware routing by the end of the night.

This is the honest account of how we built it, what broke, and what we'd do differently if we started over.

The Split That Made It Work

Before writing a single line of code, we made one important decision: divide ownership instead of sharing responsibility.

The wrong approach would have been three people building "the agent" together, constantly editing the same files. That's how you end up with merge conflicts and chaos late at night.

Instead, we divided the project into three independent layers.

Person A – Memory Layer

Owned memory.py.

Implemented:

recall_memory(customer) -> str
save_memory(customer, notes) -> None

This module handled persistent memory and was developed completely independently.

Person B – Runtime Layer

Owned runtime.py.

Implemented:

ask_ai(prompt) -> str
get_decisions() -> list

This module handled model execution and routing while remaining isolated from the rest of the system.

Person C – Agents and UI

Owned agents.py and app.py.

Instead of waiting for the other two modules, placeholder functions were created that returned hardcoded responses. That allowed the UI and orchestration logic to be built in parallel.

The most important decision we made wasn't technical.

We agreed on the function signatures before anyone started coding.

recall_memory(customer: str) -> str
save_memory(customer: str, notes: str) -> None
ask_ai(prompt: str) -> str
get_decisions() -> list

Those four functions became the contract between every part of the project.

During development, the application imported fake implementations.

from fakes import recall_memory, save_memory, ask_ai, get_decisions

Once the other modules were finished, only the imports changed.

from memory import recall_memory, save_memory
from runtime import ask_ai, get_decisions

That was the entire integration.

No rewrites.

No merge headaches.

Our Git Strategy

The repository started with a clean main branch containing only the project setup.

Each developer created their own branch.

main
├── person-a (memory.py + data/)
├── person-b (runtime.py)
└── person-c (agents.py, app.py, fakes.py)

Since everyone worked on separate files, merging became straightforward.

At the end:

git fetch origin
git merge origin/person-a
git merge origin/person-b

The only real merge conflict happened inside requirements.txt, where multiple people added dependencies.

The fix was simply combining all required packages.

The final Git history looked like this:

ca5a67b final working sales follow-up agent
5c939a3 Merge remote-tracking branch 'origin/person-b'
97f4546 runtime layer with cascadeflow
e6bea7d memory layer with Hindsight + transcripts
060685b initial repo setup

Readable.

Simple.

Easy to trace.

The Bugs That Actually Happened

Every demo makes things look smooth.

Reality wasn't.

Here are the issues we actually ran into.

A Dependency Version That Didn't Exist

Our requirements.txt specified:

cascadeflow==0.2.0

The problem?

That version had been removed from PyPI.

The installation either failed or installed an incompatible version.

The quality= parameter we needed only became available starting with version 0.7.0.

The fix was:

cascadeflow==0.7.1

Lesson: Always verify that pinned dependency versions actually exist before committing them.

The Reasoning Model Returned Nothing

The large reasoning model (gpt-oss-120b) first generates internal reasoning before producing an answer.

With:

max_tokens = 512

the model spent the entire token budget thinking.

There were no tokens left for the response itself.

The result looked like routing had failed because the returned string was empty.

The actual fix was increasing the token limit.

max_tokens = 2048

Reasoning models need room to both think and answer.

Async Broke Inside Streamlit

Hindsight uses aiohttp.

Running it inside Streamlit caused:

RuntimeError:
Timeout context manager should be used inside a task

The existing event loop conflicted with Hindsight's own async handling.

The solution was wrapping every Hindsight call inside its own thread with a fresh event loop.

def _run_in_thread(fn, *args, **kwargs):
with concurrent.futures.ThreadPoolExecutor(max_workers=1) as executor:
return executor.submit(fn, *args, **kwargs).result(timeout=30)

The callers never had to know threading existed.

Blocks Polluted Agent Responses

The smaller reasoning model (qwen3-32b) occasionally returned internal reasoning wrapped inside:

...

Passing that directly into another agent confused downstream processing.

The fix was removing those sections before handing responses to the next agent.

def strip_think_blocks(text):
return re.sub(r'.*?', '', text, flags=re.DOTALL).strip()

Our Memory Bank Was Full of Test Data

While testing memory persistence, we saved lots of fake customer information.

Unfortunately, those records stayed in the memory bank.

During the live demo, the agent started recalling old testing conversations.

The fix was creating a brand-new memory bank specifically for the demo.

BANK_ID = "acme-demo-live"

Fresh bank.

Fresh memory.

Predictable behavior.

The Wrong Demo Data

The transcript files inside data/ weren't the ones we'd built the demo around.

Our intended story relied on:

Price appearing to be the customer's objection
Security being the real blocker
Sarah being the actual decision-maker

The wrong transcript completely broke that narrative.

We replaced the development data with the correct transcript files before the final presentation.

Lesson: Version control your demo data as carefully as your source code.

What We'd Do Differently
Start with Interfaces

Agreeing on function signatures before coding was the best decision we made.

On a larger team, we'd formalize them even further using shared interfaces, type definitions, or a dedicated types.py.

Test Persistence First

The first successful test should always be:

save_memory()
↓
restart application
↓
recall_memory()

If persistence doesn't work, everything built on top of it becomes unreliable.

Separate Development and Demo Memory

Never test using the same memory bank you'll showcase during a presentation.

Maintain separate environments for development and demonstrations.

Keep the Fake Layer

The placeholder implementations (fakes.py) turned out to be incredibly useful.

They allowed us to:

Test the UI without API costs
Debug layouts independently
Simulate edge cases quickly

Instead of deleting the fake layer after integration, we'd keep it permanently.

The Result

By the end of the project, we had built:

Two collaborating AI agents
Persistent memory across sessions
Cost-aware model routing
A complete routing audit trail
A Streamlit interface with three live panels

The technology certainly mattered.

But the biggest reason the project came together smoothly wasn't the framework, the models, or the libraries.

It was four function signatures that everyone agreed on before writing any code.

Sometimes good software architecture starts with something as simple as a shared contract.

GitHub Repository: https://github.com/Adithya-1987/Sales_agent

Demo Video: https://www.loom.com/share/8ce49cbf4e6b4955917251133dd916b2

Original Technical Write-up: https://dev.to/guguloth_adithyajadhav_9a/a-sales-agent-that-remembers-why-the-deal-is-stuck-80c

Top comments (2)

UnitBuilds • Jun 29

Oh cool, I did that but for coding agents in an evening, then hit bottlenecks, 4 days later, it was a full standalone OS, even UEFI got ripped out 😅

Mateo Ruiz • Jun 29

The interface-first approach is probably the biggest takeaway here. When teams agree on contracts before implementation, parallel development becomes much smoother especially for multi-agent systems where memory, orchestration, and runtime evolve independently. We've found the same pattern on AI engineering projects at IT Path Solutions: clear boundaries between components reduce integration issues far more than changing models or frameworks. Architecture decisions like these tend to have a much bigger impact on delivery speed than the AI stack itself.