How I Built a Sales Agent That Scores Deal Health from Memory — No CRM Update Required

SHRISHANT KASHID — Mon, 18 May 2026 17:53:16 +0000

Sales reps talk to the same people for months. The context — budget concerns raised in week two, the competitor mentioned offhand in week five, the onboarding objection that nearly killed the deal — lives in CRM fields nobody reads, or only in the rep's head. Every call starts with re-establishing what both sides already know.
I wanted to fix that with a simple constraint: one lookup before the call, one note after. No structured fields, no mandatory forms, no CRM discipline required.
The result is SalesMemory — a web app that gives reps a persistent agent memory layer across every prospect interaction. Before a call, the rep gets a structured brief: objections raised, budget signals, competitor mentions, deal health score. After the call, they write 2–3 sentences. That's it. The agent does the rest.

The Core Architecture

Frontend: React + Vite + Tailwind CSS (Vercel)
Backend: Python + FastAPI (Render)
Memory layer: Hindsight via the hindsight-client Python SDK
LLM: Groq API (llama-3.3-70b-versatile)
Database: None

That last point is intentional. Hindsight is the only persistence layer in the system. Every interaction is stored as a memory with metadata, recalled via semantic query, and fed directly to the LLM. There's no PostgreSQL, no schema, no ORM. The tradeoff (you can't write SELECT SUM(deals) WHERE stage = 'closing') is one we made consciously, because the gain is something SQL can't give you: semantic retrieval.
Querying Hindsight for "Priya's budget concerns" returns everything contextually relevant to that topic, even if the rep logged it as "she mentioned the rollout budget might be an issue until Q3." Keyword search doesn't do that. SQL doesn't do that.
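To make the keyword-search limitation concrete, here is a minimal sketch of a naive keyword matcher run against two notes. The `keyword_search` helper and the second note are hypothetical and written only for illustration; this is not how Hindsight works internally, it only shows what a literal match misses.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase a string and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def keyword_search(notes: list[str], query: str) -> list[str]:
    """Return only the notes that contain every word of the query."""
    terms = tokenize(query)
    return [note for note in notes if all(t in tokenize(note) for t in terms)]


notes = [
    "She mentioned the rollout budget might be an issue until Q3.",
    "Priya has budget concerns about the pilot.",  # hypothetical note
]

# The first note is about a budget concern but never uses the word
# "concerns", so a literal keyword match skips it.
hits = keyword_search(notes, "budget concerns")
print(hits)
```

Only the second note comes back, even though the first one is the note the rep actually needs. That gap is what semantic recall is meant to close.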

How Memory Is Stored and Recalled

After a call, the rep types the prospect name, writes a few sentences about what happened, and picks an outcome tag (First contact / Objection logged / Positive signal / Deal progressed). The backend formats that into a structured memory entry and stores it permanently via Hindsight:

def retain_interaction(prospect_name: str, summary: str, outcome: str, timestamp: str):
    content = f"""
Prospect: {prospect_name}
Date: {timestamp}
Outcome: {outcome}
Summary: {summary}
"""
    client.retain(
        pipeline_id=PIPELINE_ID,
        content=content,
        metadata={
            "prospect": prospect_name,
            "outcome": outcome,
            "timestamp": timestamp,
            "type": "call_log"
        }
    )

The metadata is what makes this trustworthy. Without it, you're doing full semantic search across everything and hoping the right memories surface. With {"prospect": name, "type": "call_log"}, the timeline view for any individual prospect filters reliably: you get their history, not someone else's.
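As a rough sketch of the timeline filtering described above, the function below filters a list of memories by metadata and sorts them by time. It assumes each recalled memory is a dict with `content` and `metadata` keys and that timestamps are ISO 8601 strings; the real shape returned by the Hindsight client may differ, and `prospect_timeline` plus the sample dates are hypothetical.

```python
from datetime import datetime


def prospect_timeline(memories: list[dict], prospect_name: str) -> list[dict]:
    """Keep only call logs for one prospect, ordered oldest first."""
    matches = [
        m for m in memories
        if m["metadata"].get("prospect") == prospect_name
        and m["metadata"].get("type") == "call_log"
    ]
    # Assumes ISO 8601 timestamps, which datetime.fromisoformat can parse.
    return sorted(
        matches,
        key=lambda m: datetime.fromisoformat(m["metadata"]["timestamp"]),
    )


# Illustrative data only; the dates are made up for the example.
memories = [
    {"content": "Pilot requested.",
     "metadata": {"prospect": "Priya Sharma", "type": "call_log",
                  "timestamp": "2026-05-10T09:00:00"}},
    {"content": "API reliability concern.",
     "metadata": {"prospect": "James Okafor", "type": "call_log",
                  "timestamp": "2026-05-02T09:00:00"}},
    {"content": "Budget freeze flagged.",
     "metadata": {"prospect": "Priya Sharma", "type": "call_log",
                  "timestamp": "2026-04-20T09:00:00"}},
]

timeline = prospect_timeline(memories, "Priya Sharma")
for entry in timeline:
    print(entry["metadata"]["timestamp"], entry["content"])
```

The exact-match filter is what keeps James's history out of Priya's timeline, regardless of how similar the note text is.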
Before the next call, the system recalls up to ten recent interactions for that prospect:

def recall_prospect(prospect_name: str) -> str:
    results = client.recall(
        pipeline_id=PIPELINE_ID,
        query=f"prospect interactions with {prospect_name}",
        top_k=10
    )
    return results

That recalled context — the full interaction history, in plain language — goes directly into the LLM prompt for the pre-call brief. No transformation step, no structured parsing of past data. The model reads what the rep wrote and extracts signal.
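A minimal sketch of that hand-off might look like the following. It assumes the recalled memory has already been reduced to a list of plain-text strings; `build_brief_messages` and the wording of the user prompt are hypothetical, not the app's actual code.

```python
def build_brief_messages(
    prospect_name: str,
    recalled: list[str],
    system_prompt: str,
) -> list[dict]:
    """Build chat messages for the pre-call brief from raw recalled notes."""
    if recalled:
        # Pass the rep's own words through untouched, separated by a rule.
        history = "\n---\n".join(text.strip() for text in recalled)
    else:
        history = "(no prior interactions on record)"

    user_prompt = (
        f"Prospect: {prospect_name}\n\n"
        f"Recalled memory:\n{history}\n\n"
        "Return the pre-call brief as JSON."
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_brief_messages(
    "Priya Sharma",
    ["Budget freeze flagged.", "Pilot agreed. Data migration concern surfaced."],
    "You are a sales intelligence assistant.",
)
print(messages[1]["content"])
```

The explicit placeholder for an empty history gives the model something to reason about on a first call, rather than an empty string it might fill with guesses.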

The Deal Health Score: Compute It Fresh Every Time

Most products that surface a "health score" store it somewhere. We don't. The score doesn't exist until a brief is requested, at which point the LLM reads all recalled memory for that prospect and reasons about momentum, risk, and confidence.
This architecture has one property that surprised me: the score automatically improves as memory accumulates. There are no recalculation jobs, no schema migrations when the scoring logic changes. Update the prompt, and the next brief for every prospect reflects the updated reasoning.
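The "nothing stored" property can be sketched as a pure function of recalled memory and the current prompt. Everything here is an assumption made for illustration: `compute_brief` is a hypothetical name, and the recall and LLM calls are injected as plain callables so the example runs without any external service.

```python
from typing import Callable


def compute_brief(
    prospect_name: str,
    recall: Callable[[str], str],
    ask_llm: Callable[[str, str], dict],
    system_prompt: str,
) -> dict:
    """Recompute the brief from memory on every request; nothing is cached."""
    memory = recall(prospect_name)
    return ask_llm(system_prompt, memory)


# Stand-ins for the real recall and LLM calls, for demonstration only.
def fake_recall(name: str) -> str:
    return f"Prospect: {name}\nOutcome: Positive signal"


def fake_llm(system_prompt: str, memory: str) -> dict:
    # Pretend the scoring guidance changed between prompt versions.
    return {"score": 50 if "v1" in system_prompt else 65}


before = compute_brief("Priya Sharma", fake_recall, fake_llm, "scoring guide v1")
after = compute_brief("Priya Sharma", fake_recall, fake_llm, "scoring guide v2")
print(before, after)
```

Because no score is persisted, swapping the prompt is the whole migration: the next call simply returns a brief computed under the new guidance.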
The system prompt gives the LLM explicit scoring guidance:

BRIEF_SYSTEM_PROMPT = """
You are a sales intelligence assistant. You read raw memory from past prospect
interactions and return a structured JSON pre-call brief for a sales rep.

Deal health scoring guide:
- 0-20: Cold. No engagement, no signals, or long silence.
- 21-40: Warming up. Early interest but objections unresolved.
- 41-60: Engaged. Active conversations, some positive signals.
- 61-80: Hot. Strong signals, near decision stage.
- 81-100: Closing. Verbal commitment or trial agreed.

Return ONLY valid JSON. No explanation. No markdown. No code fences.
"""

The scoring logic penalizes unresolved objections, budget uncertainty, long gaps since last contact, and competitor mentions without resolution. It rewards pilot agreements, confirmed budgets, multiple positive signals, and clear next steps.
The output is a structured object with six fields: score, label, momentum, risk, recommended_action, and confidence. That last field — confidence — is calibrated to how much memory actually exists for the prospect. A rep walking into their first call gets confidence: low and a frank note that there's no history to draw on. After four or five interactions, confidence: medium or high means the score is actually grounded in something.
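Since the model is asked for raw JSON, it is worth checking the response before showing it to a rep. The sketch below validates the six fields named above; `parse_brief` is hypothetical, and the allowed confidence values (low, medium, high) are assumed from the description rather than taken from the app's code.

```python
import json

REQUIRED_FIELDS = {
    "score", "label", "momentum", "risk", "recommended_action", "confidence",
}
CONFIDENCE_LEVELS = {"low", "medium", "high"}


def parse_brief(raw: str) -> dict:
    """Parse the model's reply and check it has the expected six fields."""
    brief = json.loads(raw.strip())
    if not isinstance(brief, dict):
        raise ValueError("brief must be a JSON object")

    missing = REQUIRED_FIELDS - brief.keys()
    if missing:
        raise ValueError(f"brief is missing fields: {sorted(missing)}")

    score = brief["score"]
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(score, bool) or not isinstance(score, (int, float)):
        raise ValueError("score must be a number")
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")

    if str(brief["confidence"]).lower() not in CONFIDENCE_LEVELS:
        raise ValueError("confidence must be low, medium, or high")
    return brief


raw_reply = json.dumps({
    "score": 70,
    "label": "Engaged",
    "momentum": "Improving",
    "risk": "Data migration concerns may stall the deal",
    "recommended_action": "Provide a detailed data migration plan and timeline",
    "confidence": "Medium",
})
brief = parse_brief(raw_reply)
print(brief["score"], brief["confidence"])
```

A failed check is a reasonable trigger for a single retry, since a malformed brief is worse for the rep than a slightly slower one.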

What the Memory Compounding Looks Like in Practice

Here's Priya Sharma, VP Sales at Rentokil, after four logged interactions:
Interaction 1: Budget freeze flagged.
Interaction 2: Onboarding timeline concern raised.
Interaction 3: ROI calculator landed well. Pilot requested.
Interaction 4: Pilot agreed. Data migration concern surfaced.

Pre-call brief after interaction 4:

Score: 70/100
Label: Engaged
Momentum: ↑ Improving
Risk: "Data migration concerns may stall the deal"
Recommended action: "Provide a detailed data migration plan and timeline to alleviate Priya's concerns"
Confidence: Medium

The brief for Priya after interaction 1 was generic — "budget freeze mentioned, no clear next step, proceed cautiously." The brief after interaction 4 is specific, scored, and tells the rep exactly what to do on this call. That delta is the entire product value made visible.

Compare that to James Okafor, Head of Revenue at Paysend. Three interactions: API reliability concern, then a technical deep dive that went well, then a security report sent — after which he went silent for seven days. His brief: Score: ~61, Engaged but stalling. Risk: competitor discount pressure. Status: Needs attention now.

The agent flagged James as "needs attention now" in the weekly digest. Without memory, he's just a name in a pipeline.

The Weekly Digest: One Prompt, All Prospects

The most interesting architectural decision in the system is the digest endpoint. Instead of making one LLM call per prospect and merging the results, it recalls memory for every known prospect in a single pass, builds one large prompt with all contexts, and asks the model to categorize and prioritize all of them simultaneously:

async def generate_digest() -> dict:
    all_prospects = get_all_prospects()
    prospect_contexts = []

    for name in all_prospects:
        recalled = recall_prospect(name)
        if recalled:
            prospect_contexts.append({"name": name, "context": recalled})

    prompt = build_digest_prompt(prospect_contexts)
    response = groq_client.chat.completions.create(
        model="llama-3.3-70b-versatile",
        messages=[
            {"role": "system", "content": DIGEST_SYSTEM_PROMPT},
            {"role": "user", "content": prompt}
        ],
        temperature=0.3,
        max_tokens=1200
    )
    return json.loads(response.choices[0].message.content.strip())
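
The `build_digest_prompt` helper is not shown above. One plausible shape, assuming each entry in `prospect_contexts` is a dict with `name` and `context` keys as built in the loop, is a single prompt with one labeled section per prospect. The section format and instruction wording below are guesses, not the original implementation.

```python
def build_digest_prompt(prospect_contexts: list[dict]) -> str:
    """Combine every prospect's recalled memory into one prompt."""
    sections = []
    for index, item in enumerate(prospect_contexts, start=1):
        context = str(item["context"]).strip()
        sections.append(f"### Prospect {index}: {item['name']}\n{context}")

    header = (
        f"You are given memory for {len(prospect_contexts)} prospects.\n"
        "Compare them against each other and prioritize.\n\n"
    )
    return header + "\n\n".join(sections)


prompt = build_digest_prompt([
    {"name": "James Okafor", "context": "Security report sent. No reply since."},
    {"name": "Marcus Webb", "context": "Budget confirmed. Lunch-and-learn requested."},
])
print(prompt)
```

Labeling each section by name lets the model refer back to specific prospects when it explains its ranking.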

The output buckets every prospect into one of three categories:

🔴 Needs attention now — no response in 5+ days, stalled deal, at-risk signal
🟡 Follow up this week — active deal, clear next step needed
🟢 On track — waiting on prospect, no rep action needed

Each item includes a specific reason and a one-sentence action. The LLM reasons better when it can see all five prospects simultaneously — it can compare James Okafor going silent for seven days against Marcus Webb where budget is confirmed but the rep hasn't followed up on the lunch-and-learn request. Five separate calls can't produce that relative prioritization.
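On the receiving side, the digest items can be checked and ordered by urgency before rendering. This is a sketch under assumptions: the bucket keys, the item fields (`name`, `bucket`, `reason`, `action`), and the sample values are hypothetical stand-ins for whatever the digest prompt actually asks the model to return.

```python
BUCKET_ORDER = ["needs_attention_now", "follow_up_this_week", "on_track"]


def order_digest(items: list[dict]) -> list[dict]:
    """Validate each item's bucket and sort the most urgent items first."""
    for item in items:
        if item["bucket"] not in BUCKET_ORDER:
            raise ValueError(f"unknown bucket: {item['bucket']!r}")
    # sorted() is stable, so items keep the model's order within a bucket.
    return sorted(items, key=lambda item: BUCKET_ORDER.index(item["bucket"]))


# Illustrative items only.
digest_items = [
    {"name": "Marcus Webb", "bucket": "follow_up_this_week",
     "reason": "Budget confirmed, lunch-and-learn request pending",
     "action": "Schedule the lunch-and-learn."},
    {"name": "James Okafor", "bucket": "needs_attention_now",
     "reason": "Silent for seven days after the security report",
     "action": "Call to check on the security review."},
]

ordered = order_digest(digest_items)
print([item["name"] for item in ordered])
```

Rejecting unknown buckets up front keeps a stray label from the model from silently dropping a prospect out of the digest view.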

What I Learned

Semantic recall is not keyword search. "She pushed back on timing" and "onboarding timeline is a concern" both surface for the same Hindsight query. This matters in practice because reps don't log notes in consistent language, and you can't force them to.

The prompt is the product. The deal health score quality is entirely determined by the system prompt. "Rate the deal from 0 to 100" produces meaningless scores. Explicit deduction logic — "unresolved objections drop the score; pilot agreements raise it" — produces scores that match what an experienced rep would say about the deal.

Metadata is what makes retrieval trustworthy. Storing every interaction with {"prospect": name, "outcome": outcome, "timestamp": timestamp} means the timeline view filters accurately by prospect. Without it, semantic search is a guess.

One LLM call beats five. The digest endpoint cuts latency, cuts API cost, and produces more coherent prioritization because the model can compare prospects it can see simultaneously.

"No CRM update required" is the actual value prop. Every sales tool has told reps to keep the CRM updated. Reps don't. A system that works with informal, unstructured notes, because Hindsight stores and recalls semantic meaning rather than structured fields, removes the compliance burden entirely.

Closing Thought

Hindsight is built on a simple premise: agents that can't remember aren't really agents. Between sessions, they lose everything. With persistent memory across sessions, an agent can genuinely compound intelligence over time — each interaction making every future interaction more useful.
For sales, that property maps directly to something reps care about: walking into every call knowing exactly what happened before, what to address, and what's at risk. One lookup before, one note after. The agent handles everything in between.
If you want to build something similar, the Hindsight documentation covers the retain/recall/reflect architecture in depth.