🚩 The Problem
Every LLM I’ve worked with has the same fundamental flaw: it forgets everything the moment a request ends.
For a chatbot answering trivia, that’s fine.
For a sales intelligence agent that needs to recall objections from three calls ago, competitor mentions from last quarter, and the fact that the CTO is the real decision‑maker — it’s a non‑starter.
That’s the gap we set out to close when building the Deal Intelligence Agent: a FastAPI + React system that gives sales teams a persistent, queryable memory layer across their pipeline.
⚙️ Architecture Overview
We designed the agent as a layered system where memory is a first‑class citizen:
FastAPI backend → REST + streaming endpoints
MemoryService → wraps Hindsight SDK for semantic retrieval
LLMService → Groq Llama 3.3 70B completions with injected context
DealService → orchestrates memory + LLM logic
React frontend → chat, deal detail, competitor radar, risk heatmap, revenue forecasting
Twilio + SMTP → SMS, voice calls, personalized follow‑up emails
Every outbound action — SMS, email, briefing — writes back to Hindsight as a memory event. The agent’s context grows with every interaction.
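To make the write-back idea concrete, here is a minimal sketch of an outbound action that records itself as a memory event. Everything in it is illustrative: `InMemoryLog`, `send_and_remember`, and `fake_sms` are hypothetical names, not code from the repo, and the log is a plain list standing in for MemoryService.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Dict, List


@dataclass
class InMemoryLog:
    """Hypothetical stand-in for MemoryService; it only appends to a list."""
    entries: List[Dict] = field(default_factory=list)

    async def store_memory(self, deal_id: str, entry_type: str, content: str) -> Dict:
        entry = {"deal_id": deal_id, "type": entry_type, "content": content}
        self.entries.append(entry)
        return entry


async def send_and_remember(
    memory: InMemoryLog,
    deal_id: str,
    channel: str,
    body: str,
    send: Callable[[str], Awaitable[None]],
) -> Dict:
    """Perform the outbound action first, then record it as a memory event."""
    await send(body)  # hypothetical sender, e.g. an SMS or email client wrapper
    return await memory.store_memory(deal_id, channel, f"Sent {channel}: {body}")


async def fake_sms(body: str) -> None:
    """Placeholder sender that does nothing (assumption for the demo)."""


async def demo() -> List[Dict]:
    memory = InMemoryLog()
    await send_and_remember(memory, "deal-42", "sms", "Pricing sheet attached", fake_sms)
    return memory.entries
```

Routing every channel through one helper like this means a new channel cannot forget to write its memory event.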
📝 How Memory Gets Written
Memory writes are treated as domain events, not side effects.
python
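import asyncio
from typing import Dict, Optional

async def store_memory(
    self,
    deal_id: str,
    entry_type: str,
    content: str,
    metadata: Optional[Dict] = None
) -> Dict:
    # Tag the text with the entry type so the type is part of what gets embedded.
    entry = {
        "id": self._generate_id(deal_id, content),
        "deal_id": deal_id,
        "type": entry_type,
        "content": content,
        "embedding_text": f"[{entry_type.upper()}] Deal {deal_id}: {content}"
    }
    if self.use_hindsight:
        # Run the SDK call in a worker thread so the event loop stays free.
        result = await asyncio.to_thread(
            self.client.memory.store,
            user_id=deal_id,
            text=entry["embedding_text"],
            # metadata defaults to None, and unpacking None raises TypeError.
            metadata={"deal_id": deal_id, "type": entry_type, "content": content, **(metadata or {})}
        )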
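The embedding text is prefixed with a type tag such as [OBJECTION], [COMPETITOR], or [STAKEHOLDER], so the entry type is part of what semantic search matches on.
Wrapping the SDK call in asyncio.to_thread runs it in a worker thread, so it does not block FastAPI’s event loop.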
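A small, self-contained way to see the effect of `asyncio.to_thread`: run a blocking function in a worker thread while a heartbeat coroutine keeps ticking on the event loop. `blocking_store` is a hypothetical stand-in for a synchronous SDK call, not the Hindsight API, and the sleep only simulates latency.

```python
import asyncio
import threading
import time


def blocking_store(text: str) -> dict:
    """Stand-in for a synchronous SDK call (hypothetical)."""
    time.sleep(0.2)  # simulate network latency
    return {"text": text, "thread": threading.get_ident()}


async def heartbeat(stop: asyncio.Event) -> int:
    """Count how often the event loop runs this task before `stop` is set."""
    ticks = 0
    while not stop.is_set():
        ticks += 1
        await asyncio.sleep(0.01)
    return ticks


async def main() -> tuple:
    stop = asyncio.Event()
    beat = asyncio.create_task(heartbeat(stop))
    # The blocking call runs in a worker thread; the loop keeps serving `beat`.
    result = await asyncio.to_thread(blocking_store, "[OBJECTION] Deal 42: price too high")
    stop.set()
    ticks = await beat
    return result, ticks
```

If `blocking_store` were called directly inside `main`, the heartbeat would not get a single tick in before `stop` is set. With `to_thread`, it keeps running for the full duration of the call.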
🔍 How Memory Gets Read
Retrieval is scoped by deal ID, so a query only searches memories that belong to the deal being discussed.
python
@app.post("/api/chat")
async def chat(msg: ChatMessage):
    memories = []
    # Only look up memories when the message is tied to a deal.
    if msg.deal_id:
        memories = await memory_svc.get_relevant_memories(
            deal_id=msg.deal_id,
            query=msg.message,
            limit=10
        )
    # Pass the retrieved memories to the LLM alongside the user's message.
    response = await llm_svc.chat_with_context(
        user_message=msg.message,
        memories=memories,
        deal_id=msg.deal_id,
        extra_context=msg.context
    )
The agent runs a semantic search over that deal’s memories, takes up to 10 of the most relevant entries (limit=10), and injects them into the prompt.
Responses shift from generic advice to deal‑aware strategy.
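The scoping step is easy to show in isolation. The sketch below filters to one deal first and ranks second. Note the loud assumption: the ranking here is a toy word-overlap score standing in for real semantic search, and the `store` list and its field names are made up for the example.

```python
from typing import Dict, List


def get_relevant_memories(
    store: List[Dict], deal_id: str, query: str, limit: int = 10
) -> List[Dict]:
    """Filter to one deal, then rank by a toy word-overlap score."""
    query_words = set(query.lower().split())

    def score(entry: Dict) -> int:
        # Number of query words that also appear in the memory's content.
        return len(query_words & set(entry["content"].lower().split()))

    scoped = [e for e in store if e["deal_id"] == deal_id]  # scope first
    ranked = sorted(scoped, key=score, reverse=True)        # then rank
    return [e for e in ranked if score(e) > 0][:limit]


store = [
    {"deal_id": "A", "content": "Price is 40% above current vendor"},
    {"deal_id": "A", "content": "CTO wants API docs"},
    {"deal_id": "B", "content": "Price objection raised by procurement"},
]
hits = get_relevant_memories(store, deal_id="A", query="what was the price objection")
```

Deal B’s entry matches the query better than anything in deal A, but it never appears in `hits` because the filter runs before the ranking.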
📊 Example Context Injection
text
[MEMORY CONTEXT]
1. [OBJECTION][2024-11-03] Price is 40% above current vendor
2. [COMPETITOR][2024-11-03] Salesforce mentioned as incumbent
3. [STAKEHOLDER][2024-10-28] David Kim (CTO) — wants API docs
4. [PRICING][2024-11-10] Offered 15% discount; they want 25%
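A block like the one above can be produced by a small formatter. This is a sketch under assumptions: `format_memory_context` and the `type`, `date`, and `content` keys are hypothetical and may not match how the repo stores or renders memories.

```python
from typing import Dict, List


def format_memory_context(memories: List[Dict]) -> str:
    """Render retrieved memories as a numbered [MEMORY CONTEXT] block."""
    lines = ["[MEMORY CONTEXT]"]
    for i, m in enumerate(memories, start=1):
        lines.append(f"{i}. [{m['type'].upper()}][{m['date']}] {m['content']}")
    return "\n".join(lines)


context = format_memory_context([
    {"type": "objection", "date": "2024-11-03", "content": "Price is 40% above current vendor"},
    {"type": "competitor", "date": "2024-11-03", "content": "Salesforce mentioned as incumbent"},
])
```

The resulting string can be prepended to the system or user prompt before the completion call.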
💡 Lessons Learned
Treat memory writes as domain events, not logs.
Scope memory by the right identity (deal_id worked best).
Semantic retrieval beats brute‑force context stuffing.
Wrapping synchronous SDK calls in asyncio.to_thread keeps the event loop from blocking.
Graceful degradation (gating Hindsight calls behind a flag like use_hindsight) makes onboarding smoother.
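One way graceful degradation could look, as a sketch rather than the repo’s actual implementation: if no client is configured, or the remote call fails, the service falls back to a local in-process store. The `client.store` method and the `backend` field are assumptions made up for this example.

```python
import asyncio
import logging
from collections import defaultdict
from typing import Any, Dict, List, Optional


class MemoryService:
    """Sketch of a memory service that degrades to a local store."""

    def __init__(self, client: Optional[Any] = None) -> None:
        self.client = client
        self.use_hindsight = client is not None
        self._local: Dict[str, List[Dict]] = defaultdict(list)

    async def store_memory(self, deal_id: str, entry_type: str, content: str) -> Dict:
        entry = {"deal_id": deal_id, "type": entry_type, "content": content}
        if self.use_hindsight:
            try:
                # `client.store` is a hypothetical synchronous method.
                await asyncio.to_thread(self.client.store, entry)
                return {**entry, "backend": "remote"}
            except Exception:
                logging.warning("Remote memory store failed; using local fallback")
        # No client configured, or the remote call failed: keep the entry locally.
        self._local[deal_id].append(entry)
        return {**entry, "backend": "local"}
```

With this shape, a new developer can run the app without credentials and still exercise the full chat flow, with memory that lasts only as long as the process.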
🎯 Conclusion
LLMs are stateless — but agents don’t have to be.
With Hindsight, the gap between “generic chatbot” and “deal‑aware agent” closes with a handful of well‑placed store_memory and get_relevant_memories calls.
The model stays dumb.
The architecture makes it look smart.
👉 Repo: https://github.com/chaitanya07-ai/deal-intelligence-agent
👉 Live Demo: https://deal-intelligence-agent-1.onrender.com