DEV Community

Suhail Tamboli
Suhail Tamboli

Posted on

5 prospects, 20 past interactions, one agent that knew all of it before the call started.

**

I Threw Away the Database and Let Hindsight Remember Everything

**

The first thing most developers do when they start a new project is spin up a database. Postgres, SQLite, whatever — you create a schema, define your tables, and build your data model before you write a single line of business logic. I've done it a hundred times.

For SalesMemory, we didn't. Not because we were being clever. Because once we understood what we were actually building, a relational database was the wrong tool for the job.

Here's what we built: a persistent, agent memory layer for sales reps. Before a call, the rep types a prospect name and gets a structured brief — objections raised, budget signals, competitor mentions, deal health score — synthesised from every past interaction. After the call, they log 2–3 sentences. The agent stores it permanently. Every future brief gets smarter because of it.

The technology making this possible is Hindsight, an agent memory system built by Vectorize that handles retain, recall, and reflect across sessions. Without it, the agent forgets everything the moment a session ends. With it, the agent compounds intelligence over time.

But the more interesting story isn't "we used a memory layer." It's what happens when you commit to that memory layer as your only persistence layer — and what you give up and gain when you do.

**

The Problem We Were Actually Solving

**

Sales reps talk to the same prospects across weeks and months. The context from those conversations — objections raised, budget signals, competitor comparisons, personal details — lives in CRM notes nobody reads, or only in the rep's head.

Every call without memory is a wasted opportunity. The rep re-introduces themselves. They miss that the prospect already said budget is frozen until Q3. They forget that onboarding speed was the deal-breaker last time.

Our product fixes that with one lookup before the call and one note after.

The thing is: this problem doesn't fit neatly into a relational schema. Prospect data isn't structured. A rep logs "she seemed unsure about the onboarding" — not {objection_type: "onboarding", sentiment: "negative", confidence: 0.7}. You can't schema-design your way to capturing what a sales conversation actually contains.

**

The Architecture: One Persistence Layer

**

Most developers would immediately reach for Postgres. We deliberately didn't.

Hindsight is the only persistence layer in SalesMemory. There is no database. Every call log goes into Hindsight via client.retain(). Every pre-call brief is generated by feeding recalled Hindsight memory into the LLM. The entire interaction history lives in semantic memory, not rows.

Here's what storing an interaction looks like:

python
def retain_interaction(prospect_name: str, summary: str, outcome: str, timestamp: str):
    content = f"""
Prospect: {prospect_name}
Date: {timestamp}
Outcome: {outcome}
Summary: {summary}
"""
    client.retain(
        pipeline_id=PIPELINE_ID,
        content=content,
        metadata={
            "prospect": prospect_name,
            "outcome": outcome,
            "timestamp": timestamp,
            "type": "call_log"
        }
    )

Enter fullscreen mode Exit fullscreen mode

And recalling it before a call:

python
def recall_prospect(prospect_name: str) -> str:
    results = client.recall(
        pipeline_id=PIPELINE_ID,
        query=f"prospect interactions with {prospect_name}",
        top_k=10
    )
    return results
Enter fullscreen mode Exit fullscreen mode

That's the entire data layer. No ORM. No migrations. No schema versioning. No connection pooling. No ALTER TABLE when the product changes.

The tradeoff is real: you can't do SELECT COUNT(*) FROM deals WHERE stage = 'closing'. You can't run SQL aggregations across your prospect base. If you need that kind of reporting, you need a database.

What you gain is something SQL can never give you: semantic retrieval. When we query Hindsight with "prospect interactions with Priya Sharma", it returns contextually relevant memories — not just exact-match rows. A rep who logged "she pushed back on timing" and another who wrote "onboarding timeline is a concern" both surface for the same query. Hindsight understands meaning, not keywords.

**

The Deal Health Score: Computed, Not Stored

**

Here's the design decision I find most interesting.

Every pre-call brief includes a Deal Health Score from 0–100. It's one of the features users notice immediately — they want to know if a deal is improving or stalling before they pick up the phone.

We don't store this score anywhere.

There is no deals table with a health_score column. No score recalculation job. No stale data problem. The score is computed fresh on every brief request by feeding all recalled memory into the LLM with a specific scoring rubric:

python
BRIEF_SYSTEM_PROMPT = """
You are a sales intelligence assistant. You read raw memory from past prospect
interactions and return a structured JSON pre-call brief for a sales rep.

Deal health scoring guide:
- 0-20: Cold. No engagement, no signals, or long silence.
- 21-40: Warming up. Early interest but objections unresolved.
- 41-60: Engaged. Active conversations, some positive signals.
- 61-80: Hot. Strong signals, near decision stage.
- 81-100: Closing. Verbal commitment or trial agreed.

Return ONLY valid JSON. No explanation. No markdown. No code fences.

Enter fullscreen mode Exit fullscreen mode

The LLM returns a structured object: score, label (Cold / Warming up / Engaged / Hot / At risk), momentum (improving / stalling / declining), risk (single biggest risk in plain English), recommended_action, and confidence (based on how much memory exists).

Critically: the score automatically improves as more memory accumulates. After one interaction, confidence is low and the score reflects limited data. After four interactions, the LLM has seen objections, resolutions, competitor mentions, and deal progression — and the score reflects that. No developer had to build a recalculation system. It just works because the memory compounds.

Here's a real example. Priya Sharma, VP Sales at Rentokil, after four logged interactions:

Score: 70/100
Label: Engaged
Momentum: ↑ Improving
Risk: "Data migration concerns may stall the deal"
Recommended action: "Provide a detailed data migration plan and timeline"
Confidence: Medium

Enter fullscreen mode Exit fullscreen mode

Compare that to what the brief looks like for a prospect with zero interactions: "No prior context found. Approach as a first contact." Same system, same code — the intelligence difference is entirely a function of accumulated memory.

The Weekly Digest: One Prompt, Five Prospects

The digest endpoint does something non-obvious.

Instead of making five separate recall calls and then trying to merge and compare the results, we pull memory for all prospects in a single pass, build one large prompt containing all contexts, and ask the LLM to categorise and prioritise everything simultaneously:

python
async def generate_digest() -> dict:
    all_prospects = get_all_prospects()
    prospect_contexts = []

    for name in all_prospects:
        recalled = recall_prospect(name)
        if recalled:
            prospect_contexts.append({"name": name, "context": recalled})

    prompt = build_digest_prompt(prospect_contexts)
    response = groq_client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": DIGEST_SYSTEM_PROMPT},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3,
        max_tokens=1200
    )
    return json.loads(response.choices[0].message.content.strip())

Enter fullscreen mode Exit fullscreen mode

The LLM returns a prioritised action plan sorted into three buckets: 🔴 Needs attention now, 🟡 Follow up this week, 🟢 On track.

action plan

Why one call instead of five? The LLM reasons better when it can compare prospects it can see simultaneously. James Okafor hasn't responded in seven days after we sent him a security report — that surfaces as "needs attention now" because the LLM can see both that there's been recent positive momentum and that the silence is abnormal relative to his prior cadence. Five separate calls wouldn't give it that context.

It also cuts latency by roughly 4x and eliminates a merge step that would require its own logic to get right.

What This Taught Us

Semantic memory is not keyword search. The difference matters when reps log notes informally. "She pushed back on timing" and "onboarding timeline is a concern" both surface for the same recall query because Hindsight retrieves by meaning, not string match. This makes the system usable in the way humans actually communicate.

The prompt is the product. The quality of the deal health score is entirely determined by the scoring rubric in the system prompt. Vague instructions ("rate this deal") produce vague, inconsistent scores. Specific rubrics with deduction/addition logic produce scores that match human intuition. We spent more time on the scoring prompt than on any other single piece of code.

Metadata is what makes retrieval trustworthy. Storing {"prospect": name, "outcome": outcome, "timestamp": timestamp} alongside the content means the memory timeline filters by prospect reliably. Without metadata, you're doing full semantic search on everything and hoping the right things surface. The structure you add at storage time is what you get back at retrieval time.

One LLM call beats five. We initially prototyped the digest as five separate calls with a merge step. The output was coherent per-prospect but missed cross-prospect comparisons the LLM could make when it saw everything at once. Collapsing to one call improved output quality, cut latency, and simplified the code.

The "no CRM update required" framing is what makes it sell. Every salesperson has been told to keep the CRM updated. Nobody does. Our system works with informal, unstructured notes because Hindsight stores and recalls semantic meaning, not structured fields. "She seemed unsure about the onboarding" is valid input. The bar for logging is low enough that reps actually use it.

**

What We'd Do Differently

**

The no-database choice works for the workflows we built. It would break immediately if we needed aggregate reporting — total deal value by stage, close rate by rep, pipeline velocity. For a production system serving a real sales team, you'd probably want Hindsight as the semantic memory layer alongside a lightweight analytics store for structured metrics. They're not mutually exclusive.

The scoring prompt also needs human calibration at scale. Our rubric produces sensible scores for the five prospects we tested it against. Whether it holds across fifty different industries, deal sizes, and sales styles is an open question.

The thing that surprised me most building this: the memory layer isn't a feature. It's the entire product. Without persistent memory across sessions, every brief is generic and every digest is meaningless. With it, the agent genuinely gets smarter over time in the way a good sales manager does — by remembering what happened and reasoning about what it means.


That's the gap SalesMemory fills. Not smarter AI. Just AI that remembers.

Check out the Hindsight documentation if you want to build something similar — the Python SDK is straightforward and the retain/recall pattern maps cleanly to most agent use cases.

Top comments (0)