Cheetu AI

Posted on May 26

Designing a Conversation Memory Layer for Real-Time Meetings

#ai #productivity #webdev #workplace

Most AI meeting tools focus on one simple promise:

> Join the meeting, record the conversation, and generate notes afterward.

That workflow can be useful.

But it also raises a question:

> What if the most valuable part of a meeting assistant is not the note after the meeting, but the memory it creates during and after the conversation?

At **Cheetu AI**, we have been exploring this idea through four product layers:

* Real-time transcription
* Live translation
* AI summaries
* Searchable conversation memory

This post breaks down how these layers can work together to turn conversations into reusable knowledge.

---

## Why Meeting Notes Are Not Enough

A meeting is not just a temporary event.

It often contains:

* Customer feedback
* Product decisions
* Sales objections
* Project risks
* Technical discussions
* Hiring signals
* Action items
* Follow-up commitments

The problem is that most of this information is easy to lose.

It may exist in:

* Someone's memory
* A recording
* A transcript
* A chat message
* A manual note
* A follow-up email

But when you need the exact context later, it is often hard to find.

For example:

```text
What did the customer say about onboarding?

Who owned the follow-up after the product review?

Did we already discuss the Q3 launch risk?

Where did we mention Spanish caption support?
```

A good meeting system should help answer these questions without forcing users to manually search through recordings or notes.

## Layer 1: Real-Time Transcription

The first layer is transcription.

But timing matters.

A transcript generated after the meeting is useful as a record.

A transcript generated during the meeting becomes part of the meeting interface.

Real-time transcription helps users:

* Follow fast conversations
* Recover missed details
* Stay focused instead of taking manual notes
* Review who said what
* Capture timestamps automatically

A simple transcript segment might look like this:

```json
{
  "session_id": "meeting_001",
  "speaker": "Speaker A",
  "start_time": "00:04:12",
  "end_time": "00:04:18",
  "text": "Let's prioritize onboarding improvements for Q3.",
  "language": "en"
}
```

This structure may look basic, but it is already useful.

It gives the system:

* Speaker context
* Time context
* Language context
* Searchable text
* A foundation for summaries and retrieval
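
As a rough illustration, the segment above can be modeled as a small typed record. This is a minimal Python sketch, not Cheetu AI's actual schema: the `TranscriptSegment` class and the `to_seconds` helper are hypothetical names, and the only assumption is that timestamps use the `HH:MM:SS` format shown in the example.

```python
from dataclasses import dataclass


def to_seconds(timestamp: str) -> int:
    """Convert an "HH:MM:SS" timestamp into whole seconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass(frozen=True)
class TranscriptSegment:
    """Hypothetical record mirroring the JSON fields of one transcript segment."""
    session_id: str
    speaker: str
    start_time: str  # "HH:MM:SS", relative to the start of the session
    end_time: str
    text: str
    language: str

    @property
    def duration_seconds(self) -> int:
        return to_seconds(self.end_time) - to_seconds(self.start_time)

    @classmethod
    def from_dict(cls, data: dict) -> "TranscriptSegment":
        # Raises TypeError if a field is missing or unexpected.
        return cls(**data)


segment = TranscriptSegment.from_dict({
    "session_id": "meeting_001",
    "speaker": "Speaker A",
    "start_time": "00:04:12",
    "end_time": "00:04:18",
    "text": "Let's prioritize onboarding improvements for Q3.",
    "language": "en",
})
print(segment.speaker, segment.duration_seconds)
```

Converting timestamps to second offsets makes it easy to sort segments or jump to a specific moment later.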

## Layer 2: Live Translation

Global teams often share one working language.

But a shared language does not mean everyone participates equally.

Some people may understand most of the conversation but miss nuance.

Some may need more time to process what was said.

Some may hesitate to ask questions because they are still translating mentally.

Live translation can reduce this gap.

A useful translation experience should support:

* Original captions
* Translated captions
* Speaker labels
* Low latency
* One-click language switching
* A clean reading experience

A translated segment could be represented like this:

```json
{
  "speaker": "Speaker A",
  "timestamp": "00:04:12",
  "original": {
    "language": "en",
    "text": "Let's prioritize onboarding improvements for Q3."
  },
  "translation": {
    "language": "es",
    "text": "Prioricemos las mejoras de incorporación para el tercer trimestre."
  }
}
```
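
One way to produce a record like this is to wrap each transcript segment with its translation while carrying over speaker and time. The sketch below is only an assumption about how that step might look: `build_translated_segment` and the `Translator` signature are hypothetical, and `fake_translate` is a stand-in that tags the text rather than calling any real translation service.

```python
from typing import Callable

# Hypothetical translator signature: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def build_translated_segment(segment: dict, target_language: str, translate: Translator) -> dict:
    """Pair the original caption with its translation, keeping speaker and time."""
    source_language = segment["language"]
    if source_language == target_language:
        translated_text = segment["text"]  # nothing to translate
    else:
        translated_text = translate(segment["text"], source_language, target_language)
    return {
        "speaker": segment["speaker"],
        "timestamp": segment["start_time"],
        "original": {"language": source_language, "text": segment["text"]},
        "translation": {"language": target_language, "text": translated_text},
    }


def fake_translate(text: str, source: str, target: str) -> str:
    # Stand-in for a real translation service; it only tags the text.
    return f"[{source}->{target}] {text}"


segment = {
    "speaker": "Speaker A",
    "start_time": "00:04:12",
    "language": "en",
    "text": "Let's prioritize onboarding improvements for Q3.",
}
caption = build_translated_segment(segment, "es", fake_translate)
print(caption["translation"]["text"])
```

Keeping the original text next to the translation lets a reader switch languages without losing the source wording.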

The goal is not just translation.

The goal is participation.

Translation after the meeting helps people review.

Translation during the meeting helps people join the conversation.

## Layer 3: AI Summaries

Many AI summaries feel too generic.

They compress the conversation, but they do not always make the outcome clearer.

A weak summary might say:

> The team discussed onboarding improvements and next steps.

That is readable, but not very actionable.

A better summary should separate the meeting into useful sections:

* Key points
* Decisions
* Risks
* Open questions
* Action items
* Owners
* Due dates

For example:

```markdown
## Decisions

* Prioritize onboarding improvements for Q3.
* Keep analytics improvements in scope, but make onboarding the first priority.

## Risks

* The current setup flow may reduce activation.
* Documentation may not be clear enough for new users.

## Action Items

* Maya: Audit onboarding steps by Friday.
* Alex: Review activation metrics from the last cohort.
* Sam: Draft updated setup docs before next Tuesday.

## Open Questions

* Should enterprise users get a separate onboarding path?
* Do we need in-product guidance during setup?
```

This format is more useful because it connects the conversation to execution.

A strong AI summary should answer:

* What happened?
* What matters?
* What changed?
* What is still unresolved?
* Who needs to do what next?
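
A structured summary is easier to enforce when it has an explicit shape. The following is a minimal sketch under that assumption: `MeetingSummary`, `ActionItem`, and `render_markdown` are hypothetical names, and the rendering simply mirrors the section layout of the example summary above, skipping sections that are empty.

```python
from dataclasses import dataclass, field


@dataclass
class ActionItem:
    owner: str
    task: str
    due: str = ""  # free-text due date, e.g. "Friday"; empty if none was stated


@dataclass
class MeetingSummary:
    key_points: list = field(default_factory=list)
    decisions: list = field(default_factory=list)
    risks: list = field(default_factory=list)
    action_items: list = field(default_factory=list)  # list of ActionItem
    open_questions: list = field(default_factory=list)


def render_markdown(summary: MeetingSummary) -> str:
    """Render only the sections that have content, in a fixed order."""
    sections = [
        ("Key Points", summary.key_points),
        ("Decisions", summary.decisions),
        ("Risks", summary.risks),
        ("Action Items", [
            f"{item.owner}: {item.task}" + (f" (due {item.due})" if item.due else "")
            for item in summary.action_items
        ]),
        ("Open Questions", summary.open_questions),
    ]
    blocks = []
    for title, entries in sections:
        if entries:
            lines = [f"## {title}", ""] + [f"* {entry}" for entry in entries]
            blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


summary = MeetingSummary(
    decisions=["Prioritize onboarding improvements for Q3."],
    action_items=[ActionItem("Maya", "Audit onboarding steps", "Friday")],
)
print(render_markdown(summary))
```

Storing owners and due dates as separate fields, rather than inside free text, is what makes action items trackable afterward.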

## Layer 4: Searchable Conversation Memory

Transcription captures the conversation.

Translation makes it easier to understand.

Summaries make it easier to review.

But search makes it reusable.

Once conversations are structured, they can become a searchable memory layer.

Instead of asking:

> Where is the recording?

A user can ask:

> What did the customer say about onboarding friction?

Instead of asking:

> Did we already talk about this?

A user can ask:

> What decisions did we make about the Q3 roadmap?

Instead of searching manually through notes, a user can ask:

> Summarize all open risks mentioned in meetings this week.

A simplified search request might look like this:

```json
{
  "query": "What did customers say about onboarding friction?",
  "filters": {
    "date_range": "last_90_days",
    "conversation_type": "customer_call"
  },
  "top_k": 5
}
```

A useful response should include source context:

```json
{
  "answer": "Customers mentioned that onboarding felt too manual, especially during workspace setup and team invitation.",
  "sources": [
    {
      "session": "Customer Call - Acme",
      "timestamp": "00:12:44",
      "speaker": "Customer"
    },
    {
      "session": "Customer Call - Northstar",
      "timestamp": "00:27:10",
      "speaker": "Customer"
    }
  ]
}
```

Source context is important.

Without it, users cannot tell whether an AI answer reflects what was actually said.

With it, users can verify where the answer came from.
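
To make the request and response shapes concrete, here is a deliberately simple sketch of filtered, source-returning retrieval. It ranks segments by keyword overlap, which is a stand-in for whatever ranking a real system would use (for example, embedding similarity). The `search` and `tokenize` functions are hypothetical, the sample segments are made up for illustration, and the date filter is left out for brevity.

```python
import re
from typing import Optional


def tokenize(text: str) -> set:
    """Lowercase the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(segments: list, query: str, conversation_type: Optional[str] = None, top_k: int = 5) -> list:
    """Return up to top_k matching segments, best keyword overlap first."""
    query_terms = tokenize(query)
    scored = []
    for segment in segments:
        if conversation_type and segment["conversation_type"] != conversation_type:
            continue  # apply the metadata filter before scoring
        overlap = len(query_terms & tokenize(segment["text"]))
        if overlap:
            scored.append((overlap, segment))
    # list.sort is stable, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [
        {"session": s["session"], "timestamp": s["timestamp"],
         "speaker": s["speaker"], "text": s["text"]}
        for _, s in scored[:top_k]
    ]


# Made-up sample data for illustration only.
segments = [
    {"session": "Customer Call - Acme", "timestamp": "00:12:44", "speaker": "Customer",
     "conversation_type": "customer_call",
     "text": "The setup process takes too long when inviting multiple teammates."},
    {"session": "Customer Call - Northstar", "timestamp": "00:27:10", "speaker": "Customer",
     "conversation_type": "customer_call",
     "text": "Onboarding felt too manual during workspace setup."},
    {"session": "Weekly Sync", "timestamp": "00:03:05", "speaker": "Speaker A",
     "conversation_type": "internal",
     "text": "Onboarding friction came up again in setup."},
]

results = search(segments, "onboarding setup friction", conversation_type="customer_call")
for hit in results:
    print(hit["session"], hit["timestamp"])
```

Each hit carries its session, timestamp, and speaker, so whatever generates the final answer can cite where it came from.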

## Why Source Context Matters

Conversation memory should not feel like a black box.

If an AI system gives an answer based on past meetings, users should be able to inspect the source.

A good source reference may include:

* Meeting title
* Speaker
* Timestamp
* Transcript segment
* Original language
* Translated text
* Summary section

For example:

```text
Source:
Customer Call - Acme
00:12:44
Speaker: Customer

"The setup process takes too long when inviting multiple teammates."
```

This makes the answer more trustworthy.

It also helps users return to the original moment instead of relying only on the generated response.
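
A source reference like the one above could be carried as a small record and rendered on demand. This is a sketch under assumptions: `SourceReference` and its field names are hypothetical, chosen to match the list of fields in this section, and the output format simply copies the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceReference:
    """Hypothetical record pointing an answer back to one transcript moment."""
    meeting_title: str
    speaker: str
    timestamp: str  # "HH:MM:SS" offset into the session
    segment_text: str
    original_language: str = "en"
    translated_text: Optional[str] = None  # set only when a translation was shown

    def render(self) -> str:
        lines = [
            "Source:",
            self.meeting_title,
            self.timestamp,
            f"Speaker: {self.speaker}",
            "",
            f'"{self.segment_text}"',
        ]
        if self.translated_text:
            lines.append(f'Translation: "{self.translated_text}"')
        return "\n".join(lines)


reference = SourceReference(
    meeting_title="Customer Call - Acme",
    speaker="Customer",
    timestamp="00:12:44",
    segment_text="The setup process takes too long when inviting multiple teammates.",
)
print(reference.render())
```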

## Why a No-Bot Approach Can Improve the Experience

Many AI meeting assistants work by sending a bot into the call.

That approach is common, but it can create friction.

A visible meeting bot may cause issues when:

* External guests do not recognize it
* Participants feel monitored
* Teams have strict recording policies
* Sensitive conversations require more privacy
* The bot changes the natural meeting dynamic

A no-bot approach can feel lighter.

The assistant supports the user without becoming another participant in the room.

For Cheetu AI, this is an important design direction:

> Help users capture, understand, summarize, and search conversations without requiring an AI bot to join the meeting.

## A Simple Architecture View

At a high level, a conversation memory layer could look like this:

```text
Audio Stream
    ↓
Real-Time Transcription
    ↓
Speaker Labels + Timestamps
    ↓
Live Translation
    ↓
Structured AI Summary
    ↓
Chunks + Metadata
    ↓
Search Index
    ↓
Conversation Memory
```

Each layer adds a different kind of value.

| Layer | Main Purpose |
| --- | --- |
| Transcription | Turn speech into text |
| Translation | Make conversations understandable across languages |
| Summary | Turn long conversations into structured outcomes |
| Metadata | Preserve speaker, time, topic, and language context |
| Search | Retrieve useful information later |
| Source context | Make AI answers verifiable |

The product challenge is making this pipeline feel simple.

Users should not need to think about the system.

They should simply feel that their conversations are easier to follow, review, and search.
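
The "Chunks + Metadata" step is where transcript segments become units that a search index can store. The sketch below shows one possible approach, assuming a simple character budget per chunk. `chunk_segments` and `to_chunk` are hypothetical names, and the sample segments after the first are invented for illustration.

```python
def to_chunk(segments: list) -> dict:
    """Merge consecutive segments into one chunk, preserving time and speaker metadata."""
    return {
        "session_id": segments[0]["session_id"],
        "start_time": segments[0]["start_time"],
        "end_time": segments[-1]["end_time"],
        "speakers": sorted({s["speaker"] for s in segments}),
        "text": " ".join(s["text"] for s in segments),
    }


def chunk_segments(segments: list, max_chars: int = 100) -> list:
    """Group consecutive segments so each chunk holds about max_chars of segment text.

    The budget counts segment text only (not the joining spaces), and a single
    segment longer than max_chars still becomes its own chunk.
    """
    chunks, current, current_len = [], [], 0
    for segment in segments:
        text_len = len(segment["text"])
        if current and current_len + text_len > max_chars:
            chunks.append(to_chunk(current))
            current, current_len = [], 0
        current.append(segment)
        current_len += text_len
    if current:
        chunks.append(to_chunk(current))
    return chunks


# Sample data; only the first segment comes from the example earlier in this post.
segments = [
    {"session_id": "meeting_001", "speaker": "Speaker A", "start_time": "00:04:12",
     "end_time": "00:04:18", "text": "Let's prioritize onboarding improvements for Q3."},
    {"session_id": "meeting_001", "speaker": "Speaker B", "start_time": "00:04:19",
     "end_time": "00:04:25", "text": "Agreed. Analytics stays in scope."},
    {"session_id": "meeting_001", "speaker": "Speaker A", "start_time": "00:04:26",
     "end_time": "00:04:31", "text": "Then onboarding is the first priority."},
]

chunks = chunk_segments(segments, max_chars=100)
for chunk in chunks:
    print(chunk["start_time"], chunk["end_time"], chunk["speakers"])
```

Because every chunk keeps its session, time range, and speakers, a search result can always point back to the original moment.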

## Example Use Cases

### Product Teams

Product teams can use conversation memory to:

* Search across user interviews
* Find repeated customer pain points
* Review product decisions
* Track open questions from roadmap discussions

### Sales Teams

Sales teams can use it to:

* Find customer objections
* Review commitments made during calls
* Generate follow-up notes
* Track account risks

### Students and Researchers

Students and researchers can use it to:

* Search lecture notes
* Ask questions across past sessions
* Summarize long discussions
* Return to exact source moments

### Global Teams

Global teams can use it to:

* Follow meetings in a preferred language
* Review original and translated captions
* Reduce misunderstandings
* Make participation more equal

## Design Principles

When designing a conversation memory system, these principles matter.

### 1. Keep the Meeting Natural

The assistant should support the conversation without changing the room dynamic.

### 2. Make Real-Time Output Useful

Live transcription and translation should be readable, fast, and easy to scan.

### 3. Structure Summaries Around Action

Summaries should highlight decisions, risks, open questions, and action items.

### 4. Make Search Source-Grounded

Answers should include where the information came from.

### 5. Respect User Control

Conversation data should feel personal, private, and manageable.

## Final Thought

The next generation of meeting tools should not only generate better notes.

They should help people understand conversations as they happen and reuse that knowledge afterward.

That means combining:

* Real-time transcription
* Live translation
* Structured AI summaries
* Searchable conversation memory
* Source-grounded answers
* A low-friction meeting experience

That is the direction we are exploring with Cheetu AI.

The interesting question is not only:

> How do we record more meetings?

It is:

> How do we help people remember the conversations that already matter?

Learn more: Cheetu AI