Sadhuram Agarwal

Posted on May 17

How I Stopped My AI Sales Agent From Forgetting Everything Using Hindsight

#webdev #ai #javascript #python

Sales reps forget things. That's not an insult — it's arithmetic.
Fifty active deals, three calls a week each, one brain.
Something gets dropped. An objection from Call 2 goes unaddressed in Call 5.
A competitor mention slips through. The CFO's name is forgotten.
The deal dies not because the product was wrong, but because the rep
sounded like they'd never met this person before.

I built DealMind AI to fix this. Here's exactly how it works and
what I learned building it.

## What DealMind AI Does

DealMind AI is a sales intelligence agent with persistent memory.
It sits between a sales rep and their prospects. Every call gets logged.
Every objection, competitor mention, budget detail, and commitment
gets stored in a persistent memory bank. When the rep comes back
a month later, the agent recalls everything relevant and tells
them exactly what to say.

The result: the agent knew that Ananya Singh from HealthPlus India
had raised board approval concerns in 4 of her 5 calls, that her
CFO needed compliance docs before Q3, and that she'd asked for a
pilot program in Call 3 — without the rep having to remember any of it.

The stack:

Memory: Hindsight by Vectorize — persistent semantic memory
Runtime Intelligence: cascadeflow — cost-intelligent model routing
LLM: Groq (llama-3.3-70b-versatile)
Backend: FastAPI (Python)
Frontend: React + Tailwind CSS

## The Problem With Stateless Agents

Every AI agent I'd built before this had the same flaw:
it started every conversation from zero. You could give it
a system prompt, stuff context into the window, pass in
a CRM summary — but it had no memory of its own.
It couldn't learn. It couldn't notice patterns.
It couldn't say "you mentioned this exact concern three weeks ago."

The moment I realized this was the wrong architecture:
I asked an agent to prep me for a call with a prospect
I'd spoken to four times. It gave me generic discovery questions.
It had no idea we were past discovery. That's not an agent.
That's autocomplete with a chat interface.

## Why Hindsight Changes Everything

Hindsight is a persistent memory engine for AI agents built by Vectorize.
The full documentation is at hindsight.vectorize.io.
For a deeper understanding of what agent memory means architecturally,
read Vectorize's agent memory overview.

The core idea: instead of stuffing context into a prompt,
you give your agent a persistent memory bank it can write to and
read from across sessions. Three operations:

retain — write a memory
recall — semantic search across all stored memories
reflect — generate a synthesized, reasoned response grounded in memory
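
To make the three operations concrete, here is a toy in-memory stand-in. It is not Hindsight's API: `ToyMemoryBank` is a hypothetical class, recall uses plain keyword overlap in place of semantic search, and reflect hands the recalled memories to a caller-supplied function where a real system would call an LLM.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def _tokens(text: str) -> set:
    """Lowercase word set used for the toy overlap score."""
    return {w.strip(".,:;!?\"'()").lower() for w in text.split()} - {""}


@dataclass
class ToyMemoryBank:
    """In-memory stand-in for a per-prospect bank (hypothetical, not Hindsight)."""
    mission: str
    memories: List[str] = field(default_factory=list)

    def retain(self, content: str) -> None:
        # Write one memory.
        self.memories.append(content)

    def recall(self, query: str, top_k: int = 3) -> List[str]:
        # Rank memories by words shared with the query. Keyword overlap is
        # a crude substitute for semantic search.
        q = _tokens(query)
        scored = [(len(q & _tokens(m)), i, m) for i, m in enumerate(self.memories)]
        hits = [t for t in scored if t[0] > 0]
        hits.sort(key=lambda t: (-t[0], t[1]))  # best score first, then oldest
        return [m for _, _, m in hits[:top_k]]

    def reflect(self, query: str, synthesize: Callable[[str, List[str]], str]) -> str:
        # Pass the recalled memories to a synthesizer supplied by the caller.
        return synthesize(query, self.recall(query))


bank = ToyMemoryBank(mission="Track objections, budget, competitors, commitments.")
bank.retain("Call 2: CFO Rajesh must approve budget before Q3 ends.")
bank.retain("Call 2: Evaluating Salesforce India as an alternative.")
bank.retain("Call 3: Asked for a pilot program.")

top = bank.recall("budget approval", top_k=1)
```

The point of the sketch is the shape of the interface: writes are cheap and append-only, while reads are query-driven, so the prompt only ever sees the memories relevant to the question at hand.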

In DealMind, every prospect gets their own Hindsight memory bank
with a mission statement that tells the agent what to care about:

def hindsight_ensure_bank(bank_id: str, prospect_name: str, company: str):
    http_requests.put(
        f"{HINDSIGHT_BASE}/banks/{bank_id}",
        json={
            "mission": f"Sales intelligence for {prospect_name} from {company}. "
                       f"Track all objections, budget details, competitor mentions, "
                       f"and commitments across every interaction."
        },
        headers=get_hindsight_headers()
    )
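
The snippets in this section lean on a few names they never define: `http_requests`, `HINDSIGHT_BASE`, `get_hindsight_headers()`, and later `get_bank_id()`. A minimal sketch of that setup follows. The environment variable names, the placeholder base URL, the Bearer auth scheme, and the bank ID format are all assumptions for illustration, so check the Hindsight docs for the real values.

```python
import os

# Assumption: the snippets use the `requests` library under this alias:
#   import requests as http_requests

# Hypothetical variable name. The default is a placeholder, not a real endpoint.
HINDSIGHT_BASE = os.environ.get(
    "HINDSIGHT_BASE_URL", "https://hindsight.example.invalid"
).rstrip("/")


def get_hindsight_headers() -> dict:
    """Build request headers. Bearer auth is an assumption, not confirmed."""
    api_key = os.environ.get("HINDSIGHT_API_KEY", "")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


def get_bank_id(prospect_id: str) -> str:
    """Map a prospect to its memory bank ID (hypothetical naming scheme)."""
    return f"dealmind-{prospect_id}"
```

Reading the key at call time rather than import time means a rotated key is picked up without restarting the process.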

Storing a call note is a single POST:

def hindsight_store(bank_id: str, content: str):
    r = http_requests.post(
        f"{HINDSIGHT_BASE}/banks/{bank_id}/memories",
        json={"items": [{"content": content}]},
        headers=get_hindsight_headers()
    )
    return r.status_code == 200

Recalling everything relevant before a call:

def hindsight_recall(bank_id: str, query: str) -> str:
    r = http_requests.post(
        f"{HINDSIGHT_BASE}/banks/{bank_id}/memories/recall",
        json={"query": query, "top_k": 10},
        headers=get_hindsight_headers()
    )
    if r.status_code == 200:
        results = r.json().get("results", [])
        return "\n".join([item.get("text", "") for item in results])
    return ""

And the most powerful operation — reflect — generates a
synthesized analysis grounded in everything stored:

def hindsight_reflect(bank_id: str, query: str = "") -> str:
    # Use the caller's query; fall back to the general pre-call question.
    default_query = ("What are the most critical things to know "
                     "before the next call? Focus on objections, "
                     "budget, competitors, and commitments.")
    r = http_requests.post(
        f"{HINDSIGHT_BASE}/banks/{bank_id}/reflect",
        json={"query": query or default_query},
        headers=get_hindsight_headers()
    )
    if r.status_code == 200:
        return r.json().get("text", "")
    return ""

The agent uses all three in combination. Before any call,
it recalls raw memories, reflects on them, and feeds both
into the prompt that generates the call prep brief.
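
Roughly, the combination step looks like the sketch below. `build_prep_prompt` is a hypothetical helper and the prompt wording is illustrative, not the exact prompt DealMind uses. The section names are the ones that appear in the brief.

```python
def build_prep_prompt(prospect: str, recalled: str, reflection: str) -> str:
    """Combine raw recalled memories and the reflection into one LLM prompt."""
    sections = [
        f"You are preparing a sales rep for their next call with {prospect}.",
        "Use ONLY the memory below. Do not invent details.",
        "",
        "RAW MEMORIES:",
        recalled.strip() or "(none)",
        "",
        "SYNTHESIZED ANALYSIS:",
        reflection.strip() or "(none)",
        "",
        "Write a brief with these sections:",
        "TOP 3 THINGS TO REMEMBER",
        "BIGGEST OBJECTION TO HANDLE TODAY",
        "COMPETITOR WATCH",
        "SINGLE BEST NEXT STEP",
    ]
    return "\n".join(sections)


prompt = build_prep_prompt(
    "Ananya Singh",
    recalled="Call 2: CFO approval needed before Q3 ends.",
    reflection="",
)
```

Keeping the raw memories and the synthesized analysis in separate labeled sections lets the model quote specifics from the first while taking priorities from the second.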

## The Call Prep Brief in Action

This is what the agent actually produces when a rep clicks
"Prep for Call" on Ananya Singh's ₹50L deal after 5 calls:

TOP 3 THINGS TO REMEMBER:

Board approval required for deals above ₹10L — raised in 4 of 5 calls
CFO approval needed before Q3 ends — confirmed in Call 2
She requested a pilot program in Call 3 — never followed up on

BIGGEST OBJECTION TO HANDLE TODAY:
Objection: Budget needs CFO approval before Q3 ends
Script: "Ananya, I know the CFO needs to sign off before Q3.
I can have the full compliance package to you by Thursday —
that gives Rajesh two weeks to review before the deadline."

COMPETITOR WATCH:
Salesforce India — mentioned in Call 2 as alternative being evaluated

SINGLE BEST NEXT STEP:
Send the security whitepaper and compliance docs to Rajesh directly.
She asked for this in Call 3. It hasn't been sent yet.

That's not generated from a template. Every line references
something real from memory. The agent knew about Rajesh
(the CFO) because a rep mentioned him in Call 2 and
Hindsight stored it.

The backend exposes 9 endpoints. The main ones:

POST /log-call          # stores call in Hindsight memory
GET  /recall/{id}       # semantic search across past calls  
POST /prepare-for-call  # AI call prep brief from memory
POST /draft-followup    # personalized follow-up email
GET  /deal-risk/{id}    # AI risk score 1-10
GET  /audit-trail       # full cost + model audit log
GET  /prospects         # all prospects with memory profiles

Every time a call is logged, it goes to two places:
the Hindsight Cloud bank for that prospect,
and a local in-process store as a fallback.
This dual-write architecture means the agent is never
blind — even if the cloud connection drops.

def get_best_memory(prospect_id: str, query: str) -> str:
    bank_id = get_bank_id(prospect_id)

    try:
        # Try Hindsight Cloud semantic recall first
        hs_memory = hindsight_recall(bank_id, query)
        if hs_memory and len(hs_memory) > 50:
            return f"[Hindsight Cloud Memory]\n{hs_memory}"

        # Try Hindsight reflect for synthesized intelligence
        hs_reflect = hindsight_reflect(bank_id, query)
        if hs_reflect and len(hs_reflect) > 50:
            return f"[Hindsight Cloud Reflection]\n{hs_reflect}"
    except Exception:
        # A dropped connection typically raises instead of returning "".
        # Swallow it here so the local fallback below still runs.
        pass

    # Fall back to local memory
    local = local_recall(prospect_id)
    if local:
        return f"[Local Memory]\n{local}"

    return "No previous interactions found."
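
The read path above has a matching write path. Here is a minimal sketch of it, assuming a plain dict as the local in-process store. `local_store`, `log_call`, and the `cloud_store` callback are hypothetical names, and `local_recall` is one possible implementation of the helper used above.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Local in-process fallback store: prospect_id -> list of call notes.
_local_memory: Dict[str, List[str]] = defaultdict(list)


def local_store(prospect_id: str, content: str) -> None:
    """Append one call note to the local store."""
    _local_memory[prospect_id].append(content)


def local_recall(prospect_id: str) -> str:
    """Return all locally stored notes for a prospect, oldest first."""
    return "\n".join(_local_memory.get(prospect_id, []))


def log_call(prospect_id: str, content: str,
             cloud_store: Callable[[str, str], bool]) -> dict:
    """Dual-write: local first, then cloud. A cloud failure is reported, not raised."""
    local_store(prospect_id, content)
    try:
        cloud_ok = bool(cloud_store(prospect_id, content))
    except Exception:
        # For example a network error. The local copy is already saved.
        cloud_ok = False
    return {"local": True, "cloud": cloud_ok}


def failing_cloud(prospect_id: str, content: str) -> bool:
    """Stub that simulates an unreachable cloud."""
    raise ConnectionError("cloud unreachable")


status = log_call("p1", "Call 2: mentioned Salesforce India.", failing_cloud)
```

In the real app the callback would wrap the cloud write, for example `lambda pid, c: hindsight_store(get_bank_id(pid), c)`. Writing locally first means a cloud failure never costs you the note, though an in-process store is lost on restart, so it is a fallback and not a second source of truth.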

## How cascadeflow Cut Inference Cost by 90%

The other technology in this stack is cascadeflow,
a runtime intelligence layer for AI agents.
Full docs at docs.cascadeflow.ai.

The problem it solves: not every query needs GPT-4.
Most of them don't. But if you default everything to your
most capable model, costs spiral fast.
cascadeflow routes queries to the cheapest model that
can handle them and only escalates when quality requires it.
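
The routing idea fits in a few lines. This is a generic cascade sketch, not cascadeflow's actual API: the `cascade` function, the stub models, and the quality check are all made up for illustration.

```python
from typing import Callable, List, Tuple

# (name, model function) pairs, ordered cheapest first.
Model = Tuple[str, Callable[[str], str]]


def cascade(query: str, models: List[Model],
            good_enough: Callable[[str], bool]) -> Tuple[str, str]:
    """Try models cheapest-first and return (model_name, answer) for the
    first answer that passes the quality check. If none pass, return the
    last (most capable) model's answer."""
    if not models:
        raise ValueError("at least one model is required")
    name, answer = "", ""
    for name, run in models:
        answer = run(query)
        if good_enough(answer):
            break
    return name, answer


# Stub models standing in for real LLM calls.
def small_model(q: str) -> str:
    return "" if "risk" in q else "Short answer."


def large_model(q: str) -> str:
    return "Detailed answer."


def non_empty(answer: str) -> bool:
    return bool(answer.strip())


models = [("small", small_model), ("large", large_model)]
easy = cascade("Summarize call 2", models, non_empty)
hard = cascade("Assess deal risk", models, non_empty)
```

Here the easy query stops at the small model and the hard one escalates. The savings come from how often the cheap model's answer passes the check, so the quality check is the part worth tuning.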

The result in DealMind:

90% cost reduction vs GPT-4 baseline
Average latency: 631ms per query
Total cost per full session: $0.000386

## The Before/After That Convinced Me

Without Hindsight:

Rep opens the agent before Call 5 with Ananya Singh.

Agent: "This appears to be the first interaction with this prospect.
Start with discovery questions."

The agent had no idea they were 5 calls deep into a ₹50L deal.

With Hindsight:

Same scenario, same agent, Hindsight memory active.

Agent: "Board approval required for deals above ₹10L —
raised in 4 of 5 calls. CFO Rajesh needs compliance docs
before Q3. She asked for a pilot program in Call 3 that
was never followed up on. She mentioned Salesforce as
an alternative in Call 2."

The agent remembered. The rep walked into the call prepared.

## The Deal Risk Scorer

One feature that surprised me with how well it worked:
the deal risk scorer. It pulls the full memory for a prospect,
feeds it to the LLM, and asks for a structured JSON risk assessment:

prompt = f"""
Analyze this sales deal. Return ONLY raw JSON.

Deal memory:
{memory}

Return exactly:
{{
  "risk_score": <1-10, 10=highest risk>,
  "risk_reason": "<one specific sentence with real details>",
  "recommended_action": "<one concrete next step>",
  "deal_stage": "<discovery|evaluation|negotiation|closing>"
}}
"""

The agent doesn't just return a number. It returns a reason
grounded in the actual conversation history —
"risk score 6 because board approval has been pending for
3 calls and Q3 deadline is in 2 weeks."
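
Asking for "ONLY raw JSON" does not guarantee you get it, so the reply is worth validating before it reaches the UI. A minimal sketch, assuming the four fields from the prompt above. `parse_risk_assessment` is a hypothetical helper, and the sample reply is made up for the example.

```python
import json

VALID_STAGES = {"discovery", "evaluation", "negotiation", "closing"}
FENCE = "`" * 3  # Markdown code fence, built this way to keep the listing clean


def parse_risk_assessment(raw: str) -> dict:
    """Parse and validate the model's JSON reply. Raise ValueError if malformed."""
    text = raw.strip()
    # Models sometimes wrap JSON in a Markdown fence despite instructions.
    if text.startswith(FENCE):
        text = text.strip("`").strip()
        if text.lower().startswith("json"):
            text = text[4:]
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")

    score = data.get("risk_score")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, int) or not 1 <= score <= 10:
        raise ValueError("risk_score must be an integer from 1 to 10")
    if data.get("deal_stage") not in VALID_STAGES:
        raise ValueError("deal_stage must be one of " + ", ".join(sorted(VALID_STAGES)))
    for key in ("risk_reason", "recommended_action"):
        value = data.get(key)
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{key} must be a non-empty string")
    return data


# Made-up sample reply, fenced the way models often return it.
sample_reply = FENCE + "json\n" + json.dumps({
    "risk_score": 6,
    "risk_reason": "Board approval has been pending for 3 calls.",
    "recommended_action": "Send compliance docs to the CFO.",
    "deal_stage": "negotiation",
}) + "\n" + FENCE

assessment = parse_risk_assessment(sample_reply)
```

On a `ValueError`, a reasonable policy is to retry the LLM call once and otherwise show the deal as unscored instead of displaying a guessed number.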

## What I Learned

1. Memory changes the category of what you're building.
Without memory, you have a chatbot. With memory, you have an agent.
The difference is that an agent can have intent across time —
it can notice patterns, track commitments, and change behavior
based on history. Hindsight is what makes that possible.

2. Dual memory architecture is production-grade thinking.
Cloud + local fallback isn't over-engineering.
It's the difference between a demo that works in a conference room
and a product that works when your cloud provider has an outage.

3. The mission statement matters.
Giving each Hindsight bank a custom mission statement
changed the quality of recall dramatically.
"Track objections, budget details, competitor mentions,
and commitments" tells the memory engine what to prioritize.
Generic banks give generic results.

4. cascadeflow is infrastructure, not a feature.
Routing queries intelligently from the start means
you never have to retrofit cost controls later.
A 90% cost reduction isn't a benchmark trick; it's what happens when
you don't default every query to your most expensive model.

5. The demo moment is everything.
The moment where the agent references something from
Call 2 while prepping for Call 5 — without being told to —
is the moment that makes people stop and pay attention.
Build toward that moment.