DEV Community

Cover image for I Stopped Repeating Context to My AI
Radhika
Radhika

Posted on

I Stopped Repeating Context to My AI

Before every client call, every interview, every networking conversation, I was doing the same fifteen-minute ritual: searching through old notes, scrolling back through Slack threads, re-reading documents I'd already read once. The information existed. It just wasn't organized around the person I was about to meet. That friction — not missing data, but scattered data — is what eventually became MeetMind.

Why I Built This

The problem wasn't that I had no notes. It was that no tool organized those notes by contact. I wanted to type a name and instantly get back:

  • Relevant context from past conversations
  • Commitments I'd made or they'd made
  • Follow-up items I'd logged
  • A few natural conversation starters

That's a small ask. But nothing I tried actually did it. CRMs were too heavy. Notes apps had no retrieval layer. Plain text files required me to do the work the tool should be doing. So I built MeetMind — a Flask-backed web application that does exactly this: enter a name, get a structured briefing drawn from your own memory of that person, generated in seconds using Groq's API with LLaMA 3.

The interesting part isn't that it works. It's that almost every design decision in the system was shaped by committing early to Hindsight as the memory layer.

What MeetMind Looks Like in Practice

The landing page makes the pitch in one line: Never walk into a meeting unprepared. That's the whole product.

The UI is a single-page Flask app with two modes: Briefing and Save Notes. Before a meeting, you type a contact's name and hit generate. After a meeting, you write a short note about what happened and save it. The loop is deliberately that tight — nothing about calendar integrations, nothing about recording, no ambient capture. You decide what's worth remembering.

That floating memory card in the hero section isn't decorative — it's a literal example of what the system surfaces. When you've stored notes for a contact, MeetMind shows you a quick preview before you even generate the full briefing. The has_memory boolean the API returns drives this: contacts with stored history get a different visual treatment than first-time lookups.

The note-taking flow is just as minimal. A contact name field, a free-text area with the placeholder "Budget discussed, follow-ups, preferences, key decisions...", and a "Save to memory" button. That's the entire input surface.

How It Hangs Together

Under the hood there are three files that matter: app.py (Flask routing and orchestration), memory_vault.py (persistence and retrieval), and ai_brain.py (LLM integration via Groq). The entire orchestration layer is two lines:

# app.py — the orchestration layer
history = hindsight_db.recall(contact)
briefing_text = generate_meeting_briefing(contact, history)
Enter fullscreen mode Exit fullscreen mode

One call retrieves a list of strings. The other turns that list into a formatted briefing. Neither function knows the other exists. Every complexity is encapsulated inside one of them.

The Decision That Shaped Everything: Designing Around Hindsight's Interface

The most consequential choice in this project wasn't the model, the prompt, or the storage format. It was deciding early to model the local memory layer around Hindsight's retain and recall API.

Hindsight is a memory library built for exactly this problem: giving LLM-backed applications a way to store and retrieve contextual information across sessions. The core primitive is clean — you retain text, and later you recall relevant text given a query. It doesn't expose vector indices or similarity thresholds. It just gives you back what's relevant.

I built a local implementation that mirrors that interface exactly:

class MockHindsightClient:
    def retain(self, text):
        # Parse and persist a note about a contact
        ...

    def recall(self, query_name):
        # Return all stored notes for a contact
        contact_name = query_name.strip().lower()
        return self.storage.get(contact_name, [])

hindsight_db = MockHindsightClient()
Enter fullscreen mode Exit fullscreen mode

The instance is named hindsight_db — not memory_store, not notes_manager. That naming was intentional. I wanted the rest of the codebase to treat it as Hindsight, so swapping the local implementation for the real client later is a one-file change. The interface is the contract. The implementation behind it is a detail.

Thinking through what agent memory actually needs to do clarified something useful: the retrieval model here is intentionally simple. Exact name-keyed lookup. "Rahul" maps to "rahul" after a .lower() call, and you get his notes. No fuzzy matching, no embeddings, no semantic retrieval at this layer. That simplicity is a feature. The complexity lives in what you do with those notes, which is the LLM's job.

Prompt Structure Was the Real Work

The LLM integration is 45 lines. Most of that is the prompt. I used llama-3.3-70b-versatile via Groq at temperature=0.6 — warm enough to sound natural, cool enough not to improvise details that aren't in the context. The actual API call is four lines. What took real iteration was the output schema.

system_instruction = (
    "You are an elite executive assistant AI. Your goal is to prepare a brief, "
    "high-impact, bulleted meeting preparation guide for the user."
)

user_prompt = f"""
Please prepare a briefing for my upcoming meeting with: {contact_name}

Here is the historical context of my past interactions with this person:
{formatted_history}

Provide your output exactly in this structure:
1. SUMMARY OF PAST INTERACTION (Keep it short)
2. KEY REMINDERS (Promises made or things to look out for)
3. SUGGESTED CONVERSATION OPENERS
"""
Enter fullscreen mode Exit fullscreen mode

Every time I gave the model free-form latitude, it decided the most important thing to tell me was to "approach the meeting with confidence." That kind of filler is useless at 8:45am before a client call. Pinning output to three labeled sections eliminated the drift almost entirely. The model follows structure when you give it structure. This is not a new insight, but it's one you have to rediscover every time you write a new prompt.

The formatted_history handling is also doing real work. Rather than passing raw strings into the prompt, I format them as bullet points — and the null case gets explicit treatment:

if past_history_list:
    formatted_history = "\n".join([f"- {note}" for note in past_history_list])
else:
    formatted_history = "No previous history found. This is your very first meeting with them."
Enter fullscreen mode Exit fullscreen mode

The else branch isn't defensive programming. It's functional. Without it, the model generates vague summaries that sound like they came from real context when they didn't. Telling it "this is a first meeting" shifts output to introductory openers — which is exactly right.

What a Real Session Looks Like

Given a contact with two stored notes — say, "needs a React website, budget ₹50,000, follow up Monday, prefers morning meetings" and "confirmed Tailwind CSS, three-week timeline, 50% upfront payment" — the briefing comes back with a tight interaction summary, a reminders section that flags the follow-up and payment structure, and conversation openers that reference the project by name rather than being generic. The output is deterministic in structure, varied in language, and grounded in your actual stored history.

The /get_briefing endpoint returns a has_memory flag alongside the briefing:

return jsonify({
    "briefing": briefing_text,
    "has_memory": len(history) > 0
})
Enter fullscreen mode Exit fullscreen mode

The frontend uses this to render the briefing differently depending on whether prior notes exist. A first-contact briefing looks and reads differently from one backed by two months of notes. That's a small thing, but it's what makes the tool feel honest rather than performative.

The Builder's Experience

Pushing every line of this to GitHub — the Flask routes, the Groq API wiring, the memory interface, the prompt iteration — felt genuinely good. Not in a "shipped a feature" way. In a "made a thing that works and is mine" way.

That feeling is specific. It's what happens when you close the gap between a problem you actually have and a system that solves it. The backend talked to the API. The memory layer held state. The briefing came back structured and grounded. That's the part of building no tutorial gives you — the moment a system you wrote does something real for you.

I've stopped re-reading old notes before calls. I open MeetMind, type a name, and the context is already there.

What I Learned

The memory interface shapes everything downstream. Committing to Hindsight's retain/recall abstraction early made the rest of the code straightforward. The LLM layer doesn't know where notes came from. The Flask layer doesn't know how retrieval works. Those boundaries are cheap at the start and expensive to add later.

Prompt structure is your output schema. Numbered sections and explicit labels aren't stylistic choices — they're functional constraints. Unstructured prompts produce unstructured outputs, and unstructured outputs are hard to parse and hard to trust when you're about to walk into a room.

Explicit null handling in prompts is not optional. LLMs default toward plausible-sounding output. If you don't tell them there's no prior context, they'll invent some. The empty-state case is a separate codepath in the prompt, not an edge case.

Name internal objects after the interface they implement. hindsight_db rather than notes_manager made the intended architecture visible in every file that imports it. Comments rot. Names don't.

Temperature is a decision, not a default. 0.6 was a deliberate choice for a tool where hallucinated specifics have real costs — you might repeat something back to a client that you never actually discussed. Slightly flat language is a much cheaper failure mode than invented facts.


The system is under 150 lines of Python, a few HTML templates, and a persistent memory layer backed by Hindsight. The design decisions — how memory is abstracted, how the prompt is structured, how the null case is handled — all scale directly to a production implementation. Swapping the local memory client for the real Hindsight integration is a one-file change. The rest of the system doesn't need to know it happened.

That's the kind of architecture worth building toward. And honestly, the kind of tool worth using.


Links

Top comments (0)