Replacing a CRM's Memory With Hindsight: A FastAPI + React Build Log

#ai #llm #agentmemory #hindsight

I've shipped three SaaS tools in the last four years. Each one had some version of the same problem: the database knew what happened, but nothing could reason about it. You could query "show me all deals in negotiation stage" but not "which deals are about to slip and why." The gap between stored data and actionable intelligence required a human to fill it.

The Deal Intelligence Agent was my attempt to close that gap with a persistent AI memory layer. This is the build log — what the architecture looks like, where it got complicated, and what Hindsight by Vectorize made possible that I couldn't have built myself in reasonable time.

The full stack

Backend: FastAPI with async throughout. Python 3.11. Services are split by responsibility: MemoryService owns the Hindsight integration, LLMService owns Groq completions and prompt construction, and DealService owns business logic. AutopilotService and RoleplayService are higher-level features built on top of those primitives.

Frontend: React with Vite, Tailwind, Framer Motion. Pages: Dashboard (risk heatmap + pipeline forecast), Chat (deal-scoped agent), Deals (CRUD + detail view), Intelligence (competitor radar + pattern analysis), Analytics (revenue forecasting), Autopilot (autonomous scan console), Roleplay (simulation interface).

Inference: Groq running Llama 3.3 70B. Fast enough that streaming feels instant.

Memory: Hindsight's persistent memory layer scoped per deal, with a local defaultdict fallback for development.

Deploy: Docker Compose locally, Render for production. The backend and frontend are separate services; nginx proxies the React build.
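The local fallback mentioned under Memory can be as small as a dictionary of lists keyed by deal. Here is a minimal sketch, assuming a naive keyword match stands in for semantic search. The class name `LocalMemoryStore` and its method signatures are hypothetical, chosen to mirror the store and search calls used later in this post:

```python
from collections import defaultdict
from typing import Any, Dict, List, Optional


class LocalMemoryStore:
    """In-process stand-in for the memory layer (hypothetical, development only)."""

    def __init__(self) -> None:
        # One list of entries per deal; defaultdict creates it on first write.
        self._entries: Dict[str, List[Dict[str, Any]]] = defaultdict(list)

    def store(self, user_id: str, text: str,
              metadata: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
        entry = {"text": text, "metadata": metadata or {}}
        self._entries[user_id].append(entry)
        return entry

    def search(self, user_id: str, query: str, limit: int = 5) -> Dict[str, List[Dict[str, Any]]]:
        # Naive relevance: count query words that appear in the entry text.
        # This is NOT semantic search; it only keeps the dev loop working offline.
        words = set(query.lower().split())
        scored = []
        for entry in self._entries.get(user_id, []):
            hits = sum(1 for w in words if w in entry["text"].lower())
            if hits:
                scored.append((hits, entry))
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return {"memories": [entry for _, entry in scored[:limit]]}
```

Returning a dict with a `memories` key keeps the fallback interchangeable with the read path shown later, which calls `results.get("memories", [])`.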

What Hindsight replaced

Before Hindsight, I had a PostgreSQL table called deal_events with columns deal_id, event_type, content, created_at. Simple. It stored everything. And it was useless for the agent because I couldn't do semantic search against it without building a separate embedding pipeline, a vector store, an indexing job, a retrieval API.

That's a non-trivial amount of infrastructure for what is, conceptually, a simple feature: "given this question, find the relevant history."

Hindsight replaces all of it. The SDK gives me client.memory.store for writes and client.memory.search for semantic retrieval. The embedding, indexing, and similarity search are handled. I define the structure of what I store; Hindsight handles the retrieval machinery.

python

# Write
result = await asyncio.to_thread(
    self.client.memory.store,
    user_id=deal_id,
    text=f"[{entry_type.upper()}] Deal {deal_id}: {content}",
    metadata={"deal_id": deal_id, "type": entry_type, "content": content},
)

# Read
results = await asyncio.to_thread(
    self.client.memory.search,
    user_id=deal_id,
    query=query,
    limit=limit,
)
return results.get("memories", [])

That's the entire retrieval pipeline. What would have taken a week of infra work — Pinecone setup, embedding model selection, chunking strategy, index management — is a two-call SDK interface.
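The bracketed type prefix in the stored text puts the entry kind into what gets embedded. Below is a small sketch of how such an entry might be assembled before the write. `build_entry` and `ALLOWED_TYPES` are hypothetical names, and the list of types is an assumption rather than the project's actual taxonomy:

```python
from typing import Any, Dict

# Assumed entry types for illustration; the real project may use different ones.
ALLOWED_TYPES = {"note", "objection", "stage_change", "outcome"}


def build_entry(deal_id: str, entry_type: str, content: str) -> Dict[str, Any]:
    """Return the text to embed plus the metadata stored alongside it."""
    entry_type = entry_type.strip().lower()
    if entry_type not in ALLOWED_TYPES:
        raise ValueError(f"unknown entry type: {entry_type!r}")
    return {
        # The prefix puts the entry type into the embedded text itself.
        "embedding_text": f"[{entry_type.upper()}] Deal {deal_id}: {content}",
        "metadata": {"deal_id": deal_id, "type": entry_type, "content": content},
    }
```

Rejecting unknown types at write time is a design choice in this sketch: it keeps the set of prefixes small enough that queries like "objections on this deal" stay predictable.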

The async threading issue that surprised me

Hindsight's Python SDK is synchronous. FastAPI is async. Calling a synchronous SDK method directly inside an async def handler blocks the event loop, which in practice means every other request on that worker waits until the Hindsight call returns. Under any real load, throughput degrades to one request at a time.

The fix is asyncio.to_thread, which runs the synchronous call in a thread pool and returns an awaitable:

python
result = await asyncio.to_thread(
    self.client.memory.store,
    user_id=deal_id,
    text=entry["embedding_text"],
    metadata=entry_metadata,
)

This is the same pattern you'd use for any blocking I/O (database calls with a synchronous driver, file operations, etc.) in an async context. It's not complicated, but it's easy to miss and the failure mode is subtle — the app "works" under low load and quietly breaks under real usage.
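The difference is easy to reproduce without any SDK at all. This is a self-contained sketch in which `blocking_call` is a hypothetical stand-in for a synchronous client method, using `time.sleep` to simulate network latency:

```python
import asyncio
import time


def blocking_call(label: str) -> str:
    """Stand-in for a synchronous SDK call (hypothetical); sleeps to simulate I/O."""
    time.sleep(0.2)
    return label


async def run_blocking_inline() -> float:
    # Calling the sync function directly blocks the event loop,
    # so the three calls run back to back.
    start = time.perf_counter()
    for label in ("a", "b", "c"):
        blocking_call(label)
    return time.perf_counter() - start


async def run_in_threads() -> float:
    # asyncio.to_thread hands each call to a worker thread;
    # gather awaits all three concurrently.
    start = time.perf_counter()
    await asyncio.gather(
        *(asyncio.to_thread(blocking_call, label) for label in ("a", "b", "c"))
    )
    return time.perf_counter() - start
```

Running both with `asyncio.run` should show the inline version taking roughly three times as long as the threaded one, since the sleeps overlap only when they run in worker threads.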

The Autopilot background task pattern

The Autopilot feature — autonomous cross-deal scan, objection resolution, playbook generation — runs as a FastAPI BackgroundTask:

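python
from fastapi import BackgroundTasks

# app (the FastAPI instance) and autopilot_svc are created elsewhere in the backend.
@app.post("/api/autopilot/run")
async def run_autopilot(background_tasks: BackgroundTasks):
    # Refuse to start a second scan while one is in progress.
    if autopilot_svc.is_running():
        return {"status": "already_running"}
    background_tasks.add_task(autopilot_svc.run_autopilot_loop)
    return {"status": "started"}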

FastAPI starts the task after the response has been sent, so the request returns immediately and the loop never holds up the HTTP response. The frontend polls /api/autopilot/logs for live log updates, giving the impression of streaming execution without a WebSocket. The logs are stored in a module-level list with timestamps and structured levels (INFO, PROCESS, RECALL, MATCH, SUCCESS) that the frontend uses to color-code the console output.
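The run guard and the pollable log buffer fit in a few dozen lines. The sketch below is an assumption about the shape, not the project's actual service: the buffer lives on the instance rather than at module level to keep the example self-contained, and the `since` offset on `get_logs` is a hypothetical addition that lets a polling client fetch only new entries.

```python
import asyncio
from datetime import datetime, timezone
from typing import Dict, List

LEVELS = {"INFO", "PROCESS", "RECALL", "MATCH", "SUCCESS"}


class AutopilotService:
    """Hypothetical sketch of the run guard and pollable log buffer."""

    def __init__(self) -> None:
        self._running = False
        self._logs: List[Dict[str, str]] = []

    def is_running(self) -> bool:
        return self._running

    def log(self, level: str, message: str) -> None:
        if level not in LEVELS:
            raise ValueError(f"unknown log level: {level}")
        self._logs.append({
            "ts": datetime.now(timezone.utc).isoformat(),
            "level": level,
            "message": message,
        })

    def get_logs(self, since: int = 0) -> List[Dict[str, str]]:
        # The client passes how many entries it already has
        # and receives only the newer ones.
        return self._logs[since:]

    async def run_autopilot_loop(self) -> None:
        if self._running:
            return
        self._running = True
        try:
            self.log("INFO", "scan started")
            await asyncio.sleep(0)  # placeholder for the real cross-deal scan
            self.log("SUCCESS", "scan finished")
        finally:
            # Always clear the flag so a failed run does not block later runs.
            self._running = False
```

An in-memory buffer like this only works while the backend runs as a single process; with several workers, each would hold its own copy of the logs.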

Revenue forecasting from typed memory

One feature that came almost for free from the write-everything approach: the revenue forecast is calculated from typed memory entries rather than from a separate analytics model.

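python
async def get_pipeline_forecast(self) -> Dict:
    # Assumed close probability for each pipeline stage.
    stage_weights = {
        "prospecting": 0.1, "qualification": 0.2,
        "proposal": 0.4, "negotiation": 0.7, "closing": 0.9
    }
    # active_deals is loaded elsewhere in the service; that part is omitted from this excerpt.
    # Unknown stages fall back to a weight of 0.1.
    weighted_pipeline = sum(
        deal["deal_value"] * stage_weights.get(deal["stage"], 0.1)
        for deal in active_deals
    )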

Multiplying each deal's value by its stage weight and summing the results gives a probability-weighted forecast. The deal values and stages are updated from memory events (stage changes, outcome signals), which means the forecast updates automatically as memory is written. No separate ETL. No scheduled job. The memory layer is the source of truth for analytics.
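To make the arithmetic concrete, here is the same weighting applied to three made-up deals. The function name `weighted_pipeline` and the sample values are illustrative only; the weights are the ones from the snippet above:

```python
from typing import Dict, List

STAGE_WEIGHTS: Dict[str, float] = {
    "prospecting": 0.1, "qualification": 0.2,
    "proposal": 0.4, "negotiation": 0.7, "closing": 0.9,
}


def weighted_pipeline(deals: List[Dict]) -> float:
    """Sum each deal's value scaled by its stage weight (unknown stages use 0.1)."""
    total = sum(d["deal_value"] * STAGE_WEIGHTS.get(d["stage"], 0.1) for d in deals)
    return round(total, 2)


# Made-up deals for illustration only.
sample_deals = [
    {"deal_value": 10_000, "stage": "proposal"},     # 10,000 * 0.4 = 4,000
    {"deal_value": 50_000, "stage": "negotiation"},  # 50,000 * 0.7 = 35,000
    {"deal_value": 20_000, "stage": "prospecting"},  # 20,000 * 0.1 = 2,000
]
forecast = weighted_pipeline(sample_deals)  # about 41,000 against 80,000 of raw pipeline
```

The late-stage deal dominates the forecast, which is the intended behavior: one negotiation-stage deal contributes more than the two early-stage deals combined.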

What I'd build differently

The one architectural decision I'd revisit: I used deal_id as Hindsight's user_id from day one, which is the right call for per-deal memory isolation. But the agent memory query path for cross-deal reasoning — used by the Autopilot — requires iterating across all deal stores manually rather than issuing a single cross-pipeline query. For a system with hundreds of deals, that iteration becomes expensive. I'd design a separate "pattern" pipeline in Hindsight for cross-deal memories from day one, and write closed-deal summaries there at outcome time.
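The shape of that redesign can be sketched with a plain in-memory store standing in for the memory layer. Everything here is hypothetical: `PATTERN_SCOPE`, the function names, and the dictionary store are illustrations of the dual-write idea, not Hindsight API calls.

```python
from collections import defaultdict
from typing import Dict, List

PATTERN_SCOPE = "patterns"  # hypothetical shared scope for cross-deal summaries

# In-memory stand-in for the memory layer: scope id -> list of stored texts.
memory: Dict[str, List[str]] = defaultdict(list)


def record_event(deal_id: str, text: str) -> None:
    """Per-deal write, unchanged from the existing design."""
    memory[deal_id].append(text)


def close_deal(deal_id: str, outcome: str, summary: str) -> None:
    """At outcome time, write to the deal's own scope and to the shared pattern scope."""
    record_event(deal_id, f"[OUTCOME] Deal {deal_id}: {outcome}")
    memory[PATTERN_SCOPE].append(f"[PATTERN] Deal {deal_id} ({outcome}): {summary}")


def cross_deal_patterns() -> List[str]:
    """One read against the shared scope instead of a loop over every deal."""
    return list(memory[PATTERN_SCOPE])
```

The cost moves from read time to write time: each closed deal pays for one extra write, and the Autopilot scan no longer grows with the number of deals it has to visit.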

The rest I'd keep. The write-everything discipline, the typed embedding prefixes, the in-process fallback, the background task pattern — all of those decisions held up under real usage.

GitHub: github.com/chaitanya07-ai/deal-intelligence-agent | Live: deal-intelligence-agent-1.onrender.com