DEV Community

Shankar S Belavi
Shankar S Belavi

Posted on

Our sales agent gave the same pitch twice to the same prospect. Here's how we gave it a memory.

A prospect stopped the call halfway through and said:

You already told me this last week.

Not a great moment. But it wasn't the rep's fault.
Over several weeks, that prospect had mentioned budget constraints, onboarding concerns, internal approvals, and competitor pricing — across multiple calls, Slack threads, and CRM notes that nobody went back to read. By the next call, almost all of it was gone. The rep walked in blind and repeated themselves.
That moment is what SalesMemory is built to prevent.

What We Built

CRMs store data. This remembers context.
SalesMemory is a web application that gives sales reps persistent agent memory across every prospect interaction. Before a call, the rep types a prospect name. Within seconds, the system generates a complete pre-call intelligence brief: previous objections, budget signals, emotional tone, competitor mentions, momentum shifts, deal risks, a recommended action, and a deal health score.
After the call, the rep writes 2–5 sentences about what happened. That interaction becomes permanent memory. Every future brief gets smarter because of it.
The technology making this possible is Hindsight, an agent memory framework built by Vectorize that enables persistent memory across sessions. Without it, every conversation starts from zero. With it, intelligence compounds over time.

The Architecture Decision Most People Disagreed With

We didn't use a database.
No PostgreSQL. No MongoDB. No ORM. No CRM schema. Most developers would immediately reach for tables — prospects, calls, deal_stage, next_follow_up. We deliberately didn't.
Every interaction is stored as a memory. Every recall is semantic. Hindsight is the only persistence layer in the system.
Here's what storing an interaction looks like:

pythondef retain_interaction(prospect_name: str, summary: str, outcome: str, timestamp: str):
    content = f"""
Prospect: {prospect_name}
Date: {timestamp}
Outcome: {outcome}
Summary: {summary}
"""
    client.retain(
        pipeline_id=PIPELINE_ID,
        content=content,
        metadata={
            "prospect": prospect_name,
            "outcome": outcome,
            "timestamp": timestamp,
            "type": "call_log"
        }
    )
Enter fullscreen mode Exit fullscreen mode

And recalling it before a call:

pythondef recall_prospect(prospect_name: str) -> str:
    results = client.recall(
        pipeline_id=PIPELINE_ID,
        query=f"prospect interactions with {prospect_name}",
        top_k=10
    )
    return results
Enter fullscreen mode Exit fullscreen mode

That's the entire data layer. No migrations. No connection pooling. No ALTER TABLE when the product evolves.
The tradeoff is real — you can't run SQL aggregations across your pipeline. What you gain is something SQL can never give you: semantic retrieval. A rep who logged "she pushed back on timing" and another who wrote "onboarding timeline is a concern" both surface for the same Hindsight query. The system retrieves by meaning, not by string match. That changes everything about how sales context works.

The Moment the Product Stopped Feeling Like a Chatbot

A new prospect with zero interaction history gets a generic brief: "Understand their priorities and introduce the product." Not impressive.
But after 4–5 interactions, something shifts. The system starts behaving like a sales rep who has been inside the deal for months. It remembers unresolved objections, previous frustrations, legal blockers, pricing sensitivity, onboarding concerns, emotional hesitation, momentum changes. And then it reasons about them.
That transition — from retrieval to reasoning — is the real product.

The Deal Health Score: Computed, Not Stored

Every prospect gets a score from 0–100. But here's the part that surprises people: we never store it anywhere.
There's no database field for it. No cron job recalculating metrics overnight. Every time a rep opens a brief, the backend recalls all memory for that prospect, feeds it into the LLM, and asks the model to reason about the deal from scratch:

python
BRIEF_SYSTEM_PROMPT = """
You are a sales intelligence assistant. You read raw memory from past prospect
interactions and return a structured JSON pre-call brief for a sales rep.

Deal health scoring guide:
- 0-20: Cold. No engagement, no signals, or long silence.
- 21-40: Warming up. Early interest but objections unresolved.
- 41-60: Engaged. Active conversations, some positive signals.
- 61-80: Hot. Strong signals, near decision stage.
- 81-100: Closing. Verbal commitment or trial agreed.

Return ONLY valid JSON. No explanation. No markdown. No code fences.
"""
Enter fullscreen mode Exit fullscreen mode

The score is generated fresh every time. That means it automatically improves as more memory accumulates. No migrations. No recalculation pipelines. No stale data. Just evolving context.
The output is a structured object: score, label, momentum, risk, recommended_action, and confidence — calibrated to how much memory actually exists. A first call gets confidence: low. After four or five interactions, the score is grounded in something real.

Real Example: Priya Sharma
One prospect made the value of memory undeniable.
Priya Sharma, VP Sales at Rentokil, across four logged interactions: budget frozen until Q3 → onboarding concerns raised → ROI calculator received positively, pilot requested → pilot agreed, migration concerns emerged.
The pre-call brief after interaction 4:

Score: 70/100
Label: Engaged
Momentum: ↑ Improving
Risk: "Data migration concerns may stall the deal"
Recommended action: "Provide a detailed migration timeline and onboarding plan"
Confidence: Medium
Enter fullscreen mode Exit fullscreen mode

That recommendation was never manually programmed. The model inferred it from accumulated memory. That's the difference between storing notes and understanding a relationship.

The Weekly Digest: One Reasoning Pass, All Prospects

We originally planned to generate one LLM summary per prospect and merge the results. Then we tried something different.
Instead of separate calls, the backend recalls memory for every known prospect, combines everything into one prompt, and asks the model to prioritise all deals simultaneously:

pythonasync def generate_digest() -> dict:
    all_prospects = get_all_prospects()
    prospect_contexts = []

    for name in all_prospects:
        recalled = recall_prospect(name)
        if recalled:
            prospect_contexts.append({"name": name, "context": recalled})

    prompt = build_digest_prompt(prospect_contexts)
    response = groq_client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": DIGEST_SYSTEM_PROMPT},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3,
        max_tokens=1200
    )
    return json.loads(response.choices[0].message.content.strip())
Enter fullscreen mode Exit fullscreen mode

The output buckets every prospect into: 🔴 Needs attention now / 🟡 Follow up this week / 🟢 On track.
That single architectural decision dramatically improved prioritisation quality, because the model can compare opportunities it can see simultaneously. James Okafor at Paysend — API reliability resolved, security report sent, no response for 7 days, competitor pricing pressure unresolved — surfaces as 🔴 Needs attention now with a specific recommended action. Five separate calls wouldn't produce that cross-prospect reasoning. It also cut latency and API cost compared to the prototype.

What We Learned

Semantic memory is not search. "Implementation timeline concerns" and "onboarding delays" get recalled together even with completely different wording. That changes what the system can actually do with informal, unstructured rep notes.
The prompt is the product. Weak scoring instructions produced vague, inconsistent scores. Detailed rubrics with explicit deduction and addition logic produced reasoning that matched experienced sales intuition. We spent more time on the scoring prompt than on any other single piece of code.
Metadata is what makes retrieval trustworthy. Storing {"prospect": name, "outcome": outcome, "timestamp": timestamp} alongside every interaction means the timeline view filters by prospect reliably. Without it, semantic search at scale becomes a guess.
One reasoning pass beats many. The digest improved dramatically when the model could compare all prospects simultaneously — better output quality, lower latency, simpler code.
Reps don't want admin work. Nobody updates 15 structured CRM fields after a call. But they will write: "She seemed worried about onboarding timelines." That's enough for semantic memory to work. The bar for logging has to be low enough that people actually use it.

The Most Interesting Part

We didn't build a better CRM. We built a system that remembers conversations the way humans do — imperfectly, contextually, semantically, and cumulatively over time.
Traditional CRMs answer: When was the last activity? What stage is the deal in?
This system answers: What concern keeps resurfacing? Is momentum improving or declining? What should the rep focus on right now?
That difference sounds subtle. But once you see a rep walk into a call already aware of months of context, objections, and momentum shifts — it stops feeling like note retrieval. It starts feeling like memory.
If you want to build something similar, the Hindsight documentation covers the retain/recall/reflect pattern in depth. The Python SDK is straightforward and maps cleanly to most agent use cases.

Top comments (0)