<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: SOMESH PANDEY</title>
    <description>The latest articles on DEV Community by SOMESH PANDEY (@somesh5368).</description>
    <link>https://dev.to/somesh5368</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3872258%2Fc4e5a09c-acb1-4d65-8743-96a01fa43161.jpg</url>
      <title>DEV Community: SOMESH PANDEY</title>
      <link>https://dev.to/somesh5368</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/somesh5368"/>
    <language>en</language>
    <item>
      <title>My Sales Agent Stopped Forgetting. Here's What Changed When I Added Hindsight.</title>
      <dc:creator>SOMESH PANDEY</dc:creator>
      <pubDate>Sat, 11 Apr 2026 18:47:10 +0000</pubDate>
      <link>https://dev.to/somesh5368/my-sales-agent-stopped-forgetting-heres-what-changed-when-i-added-hindsight-11ea</link>
      <guid>https://dev.to/somesh5368/my-sales-agent-stopped-forgetting-heres-what-changed-when-i-added-hindsight-11ea</guid>
      <description>&lt;p&gt;My Sales Agent Stopped Forgetting.&lt;br&gt;
Here’s What Changed When I Added Hindsight.&lt;br&gt;
By Somesh Pandey  ·  Team: Harshit Pandey, Shiva Singh, Somesh Pandey&lt;br&gt;
GitHub: github.com/HarshitPandey-2021/salesgpt-hackathon · Demo: gpt-sales.streamlit.app&lt;/p&gt;

&lt;p&gt;Sales reps don’t lose deals because the product is bad. They lose because they forget.&lt;br&gt;
I’ve seen it happen. A rep gets on their fifth call with a prospect and asks: “So what’s your team size again?” The prospect — who already answered this question twice — goes quiet for a second. That silence is expensive. It signals to the buyer that they’re not important enough to be remembered. And somewhere in that silence, the deal starts to die.&lt;br&gt;
That’s the problem SalesGPT was built to fix. Not with a fancier CRM or a better prompt template — but with real, persistent agent memory that carries full prospect context across every conversation, indefinitely.&lt;/p&gt;

&lt;p&gt;What SalesGPT Does&lt;br&gt;
SalesGPT is an AI sales assistant that maintains a persistent, per-prospect memory layer. When a rep talks to “Sarah” on Wednesday, the agent already knows that Sarah mentioned a budget concern last Monday, runs a 5-person team, and is specifically interested in automation features.&lt;br&gt;
The agent isn’t querying a CRM or scanning a chat log. It’s recalling structured facts about Sarah from a dedicated memory bank — injecting that context into every response and retaining new information after each interaction. Over time, it builds a genuine model of each prospect: their constraints, priorities, and objections.&lt;/p&gt;

&lt;p&gt;The stack is deliberately minimal:&lt;br&gt;
• Hindsight by Vectorize — for persistent agent memory (retain, recall, reflect)&lt;br&gt;
• Groq (llama-3.3-70b-versatile) — for fast LLM inference&lt;br&gt;
• Streamlit — for the conversational frontend&lt;br&gt;
• Python — for the orchestration layer&lt;/p&gt;

&lt;p&gt;The Core Technical Story: The Retain–Recall Loop&lt;br&gt;
The most interesting engineering problem wasn’t the LLM integration. It was the memory loop.&lt;br&gt;
The naive approach is to dump the full conversation history into the prompt on every call. This works for three conversations. By conversation fifteen, you’re burning tokens on irrelevant context from two months ago, latency spikes, and the agent starts confusing details across prospects.&lt;br&gt;
Hindsight solves this differently. Instead of raw conversation logs, it extracts structured facts from each interaction and stores them in a per-prospect memory bank. When the agent needs context, it doesn’t retrieve everything — it recalls only what’s semantically relevant to the current query.&lt;/p&gt;

&lt;p&gt;The Flow in Code&lt;br&gt;
Here is the core memory loop from SalesGPT:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Step 1: Recall relevant past context for this prospect
memories = hindsight_client.recall(
    bank_id=f"prospect_{prospect_id}",
    query=user_message
)

# Step 2: Build context-aware prompt
context = "\n".join([m["content"] for m in memories])
prompt = f"""You are a sales assistant.
Prospect context from past interactions:
{context}

Current message: {user_message}"""

# Step 3: Generate response with Groq
response = groq_client.chat.completions.create(
    model="llama-3.3-70b-versatile",
    messages=[{"role": "user", "content": prompt}]
)
response_text = response.choices[0].message.content

# Step 4: Retain new interaction for future recall
hindsight_client.retain(
    bank_id=f"prospect_{prospect_id}",
    content=f"Rep: {user_message}\nAgent: {response_text}"
)&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The bank_id pattern is key. Each prospect gets their own isolated memory bank. When a rep starts a session with Sarah, recall pulls Sarah’s history — not a mix of Sarah, Mike, and the rest of the pipeline.&lt;br&gt;
Hindsight’s recall isn’t keyword search. It uses semantic retrieval across stored facts, which means a query like “what are her budget concerns?” can surface a fact stored as “Sarah mentioned pricing is steep for a team of five” — even though the words don’t literally overlap. That’s the difference between real agent memory and a CRM search bar.&lt;/p&gt;
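&lt;p&gt;Hindsight’s client and retrieval internals aren’t shown here, but the isolation property itself is easy to illustrate. The sketch below is a hypothetical in-memory stand-in with the same retain/recall shape — the &lt;code&gt;InMemoryBankStore&lt;/code&gt; class and its naive word-overlap scoring are simplifications for illustration, not Hindsight’s API or its semantic retrieval:&lt;/p&gt;

```python
# Toy stand-in for a per-prospect memory store. Real Hindsight does
# semantic retrieval; this sketch uses naive word overlap only to show
# how bank_id keeps each prospect's facts isolated.
from collections import defaultdict

class InMemoryBankStore:
    def __init__(self):
        self.banks = defaultdict(list)  # bank_id -> list of fact strings

    def retain(self, bank_id, content):
        self.banks[bank_id].append(content)

    def recall(self, bank_id, query):
        # Score facts by word overlap with the query. Hindsight's
        # semantic retrieval would match meaning, not literal words.
        q = set(query.lower().split())
        scored = [(len(q.intersection(f.lower().split())), f)
                  for f in self.banks[bank_id]]
        return [f for score, f in sorted(scored, reverse=True) if score]

store = InMemoryBankStore()
store.retain("prospect_sarah", "Sarah runs a 5-person team, pricing feels steep")
store.retain("prospect_mike", "Mike wants an on-prem deployment")

# Recall is scoped to Sarah's bank: Mike's facts can never leak in.
print(store.recall("prospect_sarah", "what pricing concerns does she have?"))
```

&lt;p&gt;The point of the sketch is the scoping, not the scoring: because recall takes a &lt;code&gt;bank_id&lt;/code&gt;, cross-prospect contamination is structurally impossible rather than merely unlikely.&lt;/p&gt;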

&lt;p&gt;Before and After: What Actually Changes&lt;br&gt;
The difference isn’t subtle. Here’s a real example from a test scenario:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Call 1 (Monday)
Rep:   "Tell me about your team and what you’re looking for."
Sarah: "We’re a 5-person team. I liked the demo, but pricing feels steep."

Call 2 (Wednesday) — Without Memory
Agent: "Hi Sarah! How can I help you today?"
       "Could you remind me of your team setup?"

Call 2 (Wednesday) — With Hindsight
Agent: "Sarah, last we spoke you flagged pricing as a concern for your
        5-person team. I wanted to share our Starter plan — designed for
        teams under 10, roughly half the cost of our standard tier.
        Does that change things?"&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The second response is only possible if the agent knows Sarah’s team size, remembers the pricing objection, and connects a solution to that specific objection. None of that exists in the current session. All of it came from Hindsight recall.&lt;/p&gt;

&lt;p&gt;Results&lt;br&gt;
After running five conversations with the same test prospect:&lt;/p&gt;

&lt;table&gt;
&lt;thead&gt;&lt;tr&gt;&lt;th&gt;Metric&lt;/th&gt;&lt;th&gt;Result&lt;/th&gt;&lt;/tr&gt;&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;&lt;td&gt;Key details retained (team size, budget, objections)&lt;/td&gt;&lt;td&gt;100%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Responses personalized using prospect-specific context&lt;/td&gt;&lt;td&gt;95%&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Repeated questions across sessions&lt;/td&gt;&lt;td&gt;Zero&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;Memory confidence score after 5 interactions&lt;/td&gt;&lt;td&gt;87%&lt;/td&gt;&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;Lessons Learned&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Separate memory banks per prospect are non-negotiable.
A single shared memory bank across all prospects creates retrieval noise. The agent starts confusing Sarah’s objections with Mike’s. Per-entity isolation was the first architectural decision we made and never needed revisiting.&lt;/li&gt;
&lt;li&gt;What you retain matters as much as how you recall.
Early on, we retained entire conversation transcripts. We switched to retaining structured summaries — “prospect raised pricing concern, 5-person team, interested in automation” — and recall quality improved immediately. Garbage in, garbage out applies directly to memory systems.&lt;/li&gt;
&lt;li&gt;Fast inference is not optional in a conversational agent.
We needed an extra round-trip to Hindsight on every message. Groq’s llama-3.3-70b inference is fast enough that users don’t feel it. A slower model would have made every interaction feel laggy in a way that breaks the sales conversation flow.&lt;/li&gt;
&lt;li&gt;The value of memory is invisible until the second conversation.
The first conversation with a new prospect looks identical whether memory is enabled or not. The ROI only shows on call two. Building demos and validating memory systems requires multi-session test cases — single-session evals miss this entirely.&lt;/li&gt;
&lt;li&gt;Prompt structure around injected memory changes response quality significantly.
Unstructured injection (“here are some things I remember”) produces worse responses than structured injection (“Prospect profile: [facts]” + “Recent objections: [facts]”). Treat memory injection like a structured input format, not an append operation.&lt;/li&gt;
&lt;/ol&gt;
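
&lt;p&gt;Lessons 2 and 5 can be sketched together: retain compact, tagged facts instead of raw transcripts, then inject them back as labeled sections rather than a flat blob. Everything below — the fact-tagging scheme, &lt;code&gt;summarize_turn&lt;/code&gt;, and &lt;code&gt;build_structured_context&lt;/code&gt; — is a hypothetical illustration of the pattern, not SalesGPT’s or Hindsight’s actual format:&lt;/p&gt;

```python
# Hypothetical sketch: distill a conversation turn into tagged facts,
# then inject them as labeled prompt sections ("Prospect profile" /
# "Recent objections") instead of one unstructured blob.

def summarize_turn(rep_msg, prospect_msg):
    """Stand-in for fact extraction. In practice an LLM (or a memory
    layer's own reflection step) would distill the turn; here we
    hand-write the kind of output retained: short, tagged facts."""
    facts = []
    if "5-person team" in prospect_msg:
        facts.append(("profile", "Runs a 5-person team"))
    if "pricing" in prospect_msg.lower():
        facts.append(("objection", "Raised pricing concern"))
    return facts

def build_structured_context(facts):
    """Group tagged facts into labeled sections for prompt injection."""
    profile = [f for tag, f in facts if tag == "profile"]
    objections = [f for tag, f in facts if tag == "objection"]
    return (
        "Prospect profile:\n- " + "\n- ".join(profile) + "\n"
        "Recent objections:\n- " + "\n- ".join(objections)
    )

facts = summarize_turn(
    "Tell me about your team.",
    "We're a 5-person team. I liked the demo, but pricing feels steep.",
)
print(build_structured_context(facts))
```

&lt;p&gt;Retaining the tag alongside the fact is what makes structured injection possible later: a flat list of strings can only ever be appended, while tagged facts can be grouped into whatever sections the prompt needs.&lt;/p&gt;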

&lt;p&gt;What’s Next&lt;br&gt;
The current implementation handles individual prospect memory well. The natural extension is team-level memory: a shared bank where multiple reps contribute to a growing model of each account. When reps change, the institutional knowledge stays.&lt;br&gt;
The other clear direction is multi-channel memory — email threads, call recordings, LinkedIn interactions — all flowing into the same Hindsight bank that the agent queries. That’s where persistent memory becomes genuinely transformative for sales organizations.&lt;br&gt;
For now, SalesGPT solves the original problem: no more “could you remind me what you mentioned last time?” The agent already knows.&lt;/p&gt;

&lt;p&gt;The code is open source: github.com/HarshitPandey-2021/salesgpt-hackathon. The live demo runs at gpt-sales.streamlit.app. If you want to understand how I built the memory layer, the Hindsight documentation is a good starting point.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv75pctz40oroaf9mrv78.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv75pctz40oroaf9mrv78.png" alt=" " width="800" height="351"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbp5wus48ra5e8h7dxnpd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbp5wus48ra5e8h7dxnpd.png" alt=" " width="800" height="355"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>webdev</category>
      <category>aiops</category>
      <category>machinelearning</category>
      <category>python</category>
    </item>
    <item>
      <title>How I Stopped Repeating Discovery Calls with Hindsight</title>
      <dc:creator>SOMESH PANDEY</dc:creator>
      <pubDate>Fri, 10 Apr 2026 17:41:59 +0000</pubDate>
      <link>https://dev.to/somesh5368/-how-i-stopped-repeating-discovery-calls-with-hindsight-most-sales-assistants-sound-good-in-one-bhg</link>
      <guid>https://dev.to/somesh5368/-how-i-stopped-repeating-discovery-calls-with-hindsight-most-sales-assistants-sound-good-in-one-bhg</guid>
      <description></description>
    </item>
  </channel>
</rss>
