Prachi Arora

Posted on Jun 6

Designing a Meeting Assistant People Actually Want to Use

#python #flask #ai #webdev

Most meeting tools help during a meeting, but the real challenge often starts before it. Users spend time searching for context, reviewing past interactions, and preparing discussion points.

While building MeetMind, our goal was to make meeting preparation and follow-up simpler and more intuitive. As a frontend developer, I focused on designing user-friendly interfaces, building responsive components, and creating a smooth workflow from meeting preparation to post-meeting insights.

In this article, I'll share the design decisions, frontend challenges, and lessons I learned while building the user experience behind MeetMind.

How We Used Hindsight Memory to Make Our AI Meeting Assistant Actually Remember Things

Hook

I've been in too many meetings where I blanked on something a client told me weeks ago. You're sitting there, nodding, and somewhere in the back of your head you know they mentioned a budget number or a deadline — but you can't pull it up. That feeling is expensive. It erodes trust, slows decisions, and makes you look unprepared.

That's the problem MeetMind was built to solve. And the hardest part of building it wasn't the AI — it was making the AI remember.

What Is MeetMind — And How Does It Actually Work?

MeetMind is a web application that functions as your AI-powered pre-meeting assistant. Here's the full user flow:

Before a meeting: Type a contact's name, click "Get Briefing." The app retrieves everything stored about that person — notes, promises, project details — passes it to the LLM, and returns a structured briefing: a summary of past interactions, key reminders, and conversation openers grounded in your actual history with them.

After a meeting: Type your notes and click "Save." The system stores them under that contact's name for next time.

Under the hood: Python + Flask backend, Llama 3.3 70B on Groq's inference API, and a JSON-backed memory layer modeled on the Hindsight architecture.

The interface is intentionally minimal. Two panels, two actions — "Generate my briefing" before the meeting, "Save to memory" after. The complexity lives in the backend, not the UI.

My Role

I built memory_vault.py — the module that stores and retrieves contact history — and wired it into the Flask routes in app.py. I also owned the prompt engineering in ai_brain.py: turning raw stored notes into something an LLM could synthesize into a reliable, structured briefing.

The question I kept running into: how do you give a stateless LLM meaningful past context without just dumping a wall of text at it?

The Core Technical Problem

Every call to Groq's API starts from zero. The model has no session state, no memory of past calls, no knowledge of your contacts. When you ask MeetMind for a briefing on "Rahul," the LLM literally does not know who Rahul is unless you tell it — right now, in this prompt.

This is the fundamental constraint of building stateful applications on stateless LLM APIs. You have to externalize memory, retrieve the right pieces at query time, and inject them into context in a form the model can actually use. If you get that pipeline wrong, your "AI assistant" is just generating plausible-sounding text with no grounding in reality — which is worse than useless for a meeting tool.

What We Tried First

The naive approach: save every interaction as a single text blob and pass the whole thing into the prompt every time.

It worked briefly. As history grew, token counts climbed and output quality dropped. The model fixated on old irrelevant details — a coffee preference from six months ago got as much attention as the current project deadline. Briefings became noisy and untrustworthy. We needed structure around what gets stored, how it's retrieved, and how it's formatted before the model sees it.

Why Hindsight Changed How We Thought About This

We studied Hindsight as a reference for how agent memory should be designed. The core concept, explained well at vectorize.io/what-is-agent-memory, is that memory shouldn't be a passive storage dump — it should be an active system with two clean primitives: retain (write a memory) and recall (retrieve memories relevant to a query).

This reframe matters. Instead of "where do I store this?" you ask "what interface do I expose for writing and reading memory?" Once that interface is defined, the backend, retrieval strategy, and prompt formatting all become swappable implementation details.

We built MockHindsightClient on exactly this interface — with the plan to replace the JSON backend with the full Hindsight SDK and semantic vector search when ready.

Code Walkthrough

Snippet 1 — Writing a memory: retain()

# memory_vault.py
def retain(self, text):
    if "Meeting with " in text:
        content_split = text.split("Meeting with ")[1]
        parts = content_split.split(": ")
        if len(parts) >= 2:
            contact_name = parts[0].strip().lower()
            notes = ": ".join(parts[1:]).strip()
            if contact_name in self.storage:
                self.storage[contact_name].append(notes)
            else:
                self.storage[contact_name] = [notes]
            self._save()
            return True

This method receives a plain-English string like "Meeting with Rahul: He confirmed Tailwind CSS and wants the project in 3 weeks." It parses out the contact name (normalized to lowercase for consistent lookup later) and the note body, then appends the note to that contact's list in the JSON store.

The line ": ".join(parts[1:]) is the subtle one. If you split on ": " and take only parts[1], any note containing a colon — a timestamp like "follow up at 3:00 PM", a URL, a ratio — gets silently truncated. Joining from index 1 onward reassembles the full note string regardless of how many colons it contains. We hit this bug in testing and lost notes before catching it.

_save() is called on every write, not batched. Slower? Yes. But losing a memory because the app crashed between a write and a flush is a worse outcome than a slightly slower save. Durability is the right tradeoff here.

Snippet 2 — Reading a memory: recall()

# memory_vault.py
def recall(self, query_name):
    contact_name = query_name.strip().lower()
    return self.storage.get(contact_name, [])

Four lines, but every decision here is intentional. The same .strip().lower() normalization applied during retain() is applied during recall() — so "Rahul", "rahul", and " rahul " all resolve to the same key. Without this symmetry, you'd save under one key and look up under another and get nothing.

Returning [] instead of None when a contact isn't found is a small but important API choice. The caller does len(history) > 0 to set the has_memory flag, and iterates over the list when formatting the prompt. Both operations work correctly on an empty list without any null checks. The interface is always safe to use.

Snippet 3 — The Flask route that connects memory to the LLM

# app.py
@app.route('/get_briefing', methods=['POST'])
def get_briefing():
    contact = request.json.get('contact', '').strip()
    if not contact:
        return jsonify({"error": "Please enter a valid name."}), 400
    history = hindsight_db.recall(contact)
    briefing_text = generate_meeting_briefing(contact, history)
    return jsonify({
        "briefing": briefing_text,
        "has_memory": len(history) > 0
    })

This route is the architectural seam of the whole system. recall() runs before generate_meeting_briefing() — memory is retrieved first, then handed to the LLM. The LLM never reaches back into storage itself; it only sees what the route explicitly passes it. That separation makes the system easier to test, debug, and upgrade.

The has_memory boolean isn't just a debugging convenience — it's a trust signal for the frontend. A briefing grounded in real stored history is different from a first-meeting cold-start response, and users should know which one they're looking at.

Snippet 4 — Prompt construction and the Hindsight memory injection

# ai_brain.py
if past_history_list:
    formatted_history = "\n".join([f"- {note}" for note in past_history_list])
else:
    formatted_history = "No previous history found. This is your very first meeting with them."

Bullet formatting helps the model treat each note as a discrete fact rather than reading the whole history as a paragraph. The cold-start fallback injects a specific string so the model knows it's a first meeting, rather than hallucinating fake history. And explicit output structure locks in predictable, renderable output — structure in the prompt doesn't constrain quality; it channels it.

Before and After Memory

Without memory (first meeting):

Summary: First interaction — no prior context. Reminders: Introduce yourself. Note their goals. Opener: "Thanks for the time — what are you working on?"

With Hindsight memory:

Summary: Rahul — React + Tailwind website client. Budget ₹50,000, 3 weeks, 50% upfront agreed. Reminders: Follow-up promised for Monday. Confirm payment. Prefers mornings. Opener: "Good morning Rahul — checking on the payment and confirming we're on track for 3-week delivery."

The second briefing is the model synthesizing real stored facts. Not guessing. That's the difference memory makes.

Challenges We Faced

Name normalization isn't fully solved. .strip().lower() handles case and whitespace, but "Rahul" and "Rahul Singh" still resolve as different contacts. Fuzzy matching is in the backlog.

Over-constraining the prompt backfired. One iteration made briefings sound robotic. Three section labels with short hints is the right balance — consistent structure without killing natural language quality.

JSON concurrency. Fine for single-user local use; multi-user production needs a proper database or the full Hindsight vector store.

Lessons Learned

Define your memory interface first. We treated memory as an implementation detail and rewrote it twice. The retain/recall contract from Hindsight gave us the right abstraction from the start.

Prompt structure is engineering, not art. Specifying the exact output format — three numbered sections — was a technical decision that made the system reliable and the frontend renderable. Treat it like an API contract.

Surface the memory state explicitly. The has_memory flag isn't just for debugging. Users interpret AI output differently when they know if it's grounded in real history.

Cold-start is a real use case. The fallback path for a first meeting took real iteration. Don't treat it as an edge case.

Conclusion

MeetMind taught us that memory isn't a feature you bolt on after the LLM integration works — it's a first-class architectural concern that shapes every layer of the system. The retain/recall model from Hindsight gave us the right frame for that from the beginning, and even our lightweight JSON-backed implementation of that pattern produced a meaningfully better product than the memoryless approach we started with.

If you're building anything where an AI needs to carry context across sessions, read the Hindsight docs before you write your first line of memory logic. The pattern matters more than the implementation.