Cheetu AI

Designing a Conversation Memory Layer for Real-Time Meetings

Most AI meeting tools focus on one simple promise:

> Join the meeting, record the conversation, and generate notes afterward.

That workflow can be useful.

But it also creates a question:

> What if the most valuable part of a meeting assistant is not the note after the meeting, but the memory it creates during and after the conversation?

At **Cheetu AI**, we have been exploring this idea through four product layers:

* Real-time transcription
* Live translation
* AI summaries
* Searchable conversation memory

This post breaks down how these layers can work together to turn conversations into reusable knowledge.

---

## Why Meeting Notes Are Not Enough

A meeting is not just a temporary event.

It often contains:

* Customer feedback
* Product decisions
* Sales objections
* Project risks
* Technical discussions
* Hiring signals
* Action items
* Follow-up commitments

The problem is that most of this information is easy to lose.

It may exist in:

* Someone's memory
* A recording
* A transcript
* A chat message
* A manual note
* A follow-up email

But when you need the exact context later, it is often hard to find.

For example:

```text
What did the customer say about onboarding?
Who owned the follow-up after the product review?
Did we already discuss the Q3 launch risk?
Where did we mention Spanish caption support?
```

A good meeting system should help answer these questions without forcing users to manually search through recordings or notes.


## Layer 1: Real-Time Transcription

The first layer is transcription.

But timing matters.

A transcript generated after the meeting is useful as a record.

A transcript generated during the meeting becomes part of the meeting interface.

Real-time transcription helps users:

* Follow fast conversations
* Recover missed details
* Stay focused instead of taking manual notes
* Review who said what
* Capture timestamps automatically

A simple transcript segment might look like this:

```json
{
  "session_id": "meeting_001",
  "speaker": "Speaker A",
  "start_time": "00:04:12",
  "end_time": "00:04:18",
  "text": "Let's prioritize onboarding improvements for Q3.",
  "language": "en"
}
```

This structure may look basic, but it is already useful.

It gives the system:

* Speaker context
* Time context
* Language context
* Searchable text
* A foundation for summaries and retrieval
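
To make the segment structure concrete, here is a minimal Python sketch of how it could be modeled in code. The `TranscriptSegment` class and the `to_seconds` helper are hypothetical names chosen for illustration, not part of any Cheetu AI API; the field names simply mirror the JSON example above.

```python
from dataclasses import dataclass


def to_seconds(timestamp: str) -> int:
    """Convert an "HH:MM:SS" timestamp into whole seconds."""
    hours, minutes, seconds = (int(part) for part in timestamp.split(":"))
    return hours * 3600 + minutes * 60 + seconds


@dataclass(frozen=True)
class TranscriptSegment:
    # Hypothetical model; fields mirror the JSON segment shown above.
    session_id: str
    speaker: str
    start_time: str  # "HH:MM:SS"
    end_time: str    # "HH:MM:SS"
    text: str
    language: str

    @property
    def duration_seconds(self) -> int:
        """Length of the segment, derived from its two timestamps."""
        return to_seconds(self.end_time) - to_seconds(self.start_time)


segment = TranscriptSegment(
    session_id="meeting_001",
    speaker="Speaker A",
    start_time="00:04:12",
    end_time="00:04:18",
    text="Let's prioritize onboarding improvements for Q3.",
    language="en",
)
print(segment.duration_seconds)  # 6
```

Keeping timestamps as strings matches the example payload; converting them to seconds only when needed is one possible design choice, and storing numeric offsets from the start would work just as well.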

## Layer 2: Live Translation

Global teams often share one working language.

But a shared language does not mean everyone participates equally.

Some people may understand most of the conversation but miss nuance.

Some may need more time to process what was said.

Some may hesitate to ask questions because they are still translating mentally.

Live translation can reduce this gap.

A useful translation experience should support:

* Original captions
* Translated captions
* Speaker labels
* Low latency
* One-click language switching
* A clean reading experience

A translated segment could be represented like this:

```json
{
  "speaker": "Speaker A",
  "timestamp": "00:04:12",
  "original": {
    "language": "en",
    "text": "Let's prioritize onboarding improvements for Q3."
  },
  "translation": {
    "language": "es",
    "text": "Prioricemos las mejoras de incorporación para el tercer trimestre."
  }
}
```

The goal is not just translation.

The goal is participation.

Translation after the meeting helps people review.

Translation during the meeting helps people join the conversation.
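
One way to think about language switching is as a choice of which side of the translated segment to render. The sketch below assumes the segment shape from the example in this section; `caption_for` is a hypothetical helper, and falling back to the original text when no matching translation exists is an assumption rather than a documented behavior.

```python
def caption_for(segment: dict, preferred_language: str) -> str:
    """Return one caption line in the viewer's language, else the original."""
    original = segment["original"]
    translation = segment.get("translation")
    # Use the translation only when it matches the requested language.
    if translation and translation["language"] == preferred_language:
        chosen = translation
    else:
        chosen = original
    return f'[{segment["timestamp"]}] {segment["speaker"]}: {chosen["text"]}'


translated_segment = {
    "speaker": "Speaker A",
    "timestamp": "00:04:12",
    "original": {
        "language": "en",
        "text": "Let's prioritize onboarding improvements for Q3.",
    },
    "translation": {
        "language": "es",
        "text": "Prioricemos las mejoras de incorporación para el tercer trimestre.",
    },
}

print(caption_for(translated_segment, "es"))
print(caption_for(translated_segment, "fr"))  # no French translation: falls back to English
```

Because both texts travel in the same segment, switching languages does not require a new request; the client only changes which field it displays.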


## Layer 3: AI Summaries

Many AI summaries feel too generic.

They compress the conversation, but they do not always make the outcome clearer.

A weak summary might say:

> The team discussed onboarding improvements and next steps.

That is readable, but not very actionable.

A better summary should separate the meeting into useful sections:

* Key points
* Decisions
* Risks
* Open questions
* Action items
* Owners
* Due dates

For example:

```markdown
## Decisions

* Prioritize onboarding improvements for Q3.
* Keep analytics improvements in scope, but make onboarding the first priority.

## Risks

* The current setup flow may reduce activation.
* Documentation may not be clear enough for new users.

## Action Items

* Maya: Audit onboarding steps by Friday.
* Alex: Review activation metrics from the last cohort.
* Sam: Draft updated setup docs before next Tuesday.

## Open Questions

* Should enterprise users get a separate onboarding path?
* Do we need in-product guidance during setup?
```

This format is more useful because it connects the conversation to execution.

A strong AI summary should answer:

  1. What happened?
  2. What matters?
  3. What changed?
  4. What is still unresolved?
  5. Who needs to do what next?
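
A structured summary is easier to render, filter, and search if it is stored as data rather than as free text. The following is a minimal sketch under that assumption; `MeetingSummary`, `ActionItem`, and `render` are hypothetical names, and the section titles follow the example summary above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ActionItem:
    owner: str
    task: str
    due: Optional[str] = None  # free-text due date, e.g. "Friday"


@dataclass
class MeetingSummary:
    decisions: list[str] = field(default_factory=list)
    risks: list[str] = field(default_factory=list)
    action_items: list[ActionItem] = field(default_factory=list)
    open_questions: list[str] = field(default_factory=list)


def format_action(item: ActionItem) -> str:
    """Render one action item as "Owner: task (due ...)"."""
    suffix = f" (due {item.due})" if item.due else ""
    return f"{item.owner}: {item.task}{suffix}"


def render(summary: MeetingSummary) -> str:
    """Render the summary as markdown, skipping sections with no entries."""
    sections = [
        ("Decisions", summary.decisions),
        ("Risks", summary.risks),
        ("Action Items", [format_action(a) for a in summary.action_items]),
        ("Open Questions", summary.open_questions),
    ]
    lines: list[str] = []
    for title, items in sections:
        if not items:
            continue
        lines.append(f"## {title}")
        lines.append("")
        lines.extend(f"* {item}" for item in items)
        lines.append("")
    return "\n".join(lines).rstrip()


summary = MeetingSummary(
    decisions=["Prioritize onboarding improvements for Q3."],
    action_items=[ActionItem("Maya", "Audit onboarding steps", due="Friday")],
)
print(render(summary))
```

Keeping owners and due dates as separate fields means the same record could later answer questions such as "who owns what", without re-parsing generated prose.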

## Layer 4: Searchable Conversation Memory

Transcription captures the conversation.

Translation makes it easier to understand.

Summaries make it easier to review.

But search makes it reusable.

Once conversations are structured, they can become a searchable memory layer.

Instead of asking:

> Where is the recording?

A user can ask:

```text
What did the customer say about onboarding friction?
```

Instead of asking:

> Did we already talk about this?

A user can ask:

```text
What decisions did we make about the Q3 roadmap?
```

Instead of searching manually through notes, a user can ask:

```text
Summarize all open risks mentioned in meetings this week.
```

A simplified search request might look like this:

```json
{
  "query": "What did customers say about onboarding friction?",
  "filters": {
    "date_range": "last_90_days",
    "conversation_type": "customer_call"
  },
  "top_k": 5
}
```

A useful response should include source context:

```json
{
  "answer": "Customers mentioned that onboarding felt too manual, especially during workspace setup and team invitation.",
  "sources": [
    {
      "session": "Customer Call - Acme",
      "timestamp": "00:12:44",
      "speaker": "Customer"
    },
    {
      "session": "Customer Call - Northstar",
      "timestamp": "00:27:10",
      "speaker": "Customer"
    }
  ]
}
```

Source context is important.

Without it, users cannot tell whether an AI answer reflects what was actually said.

With it, users can verify where the answer came from.
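
The request and response shapes above can be illustrated with a deliberately simple retrieval sketch. It ranks segments by keyword overlap, which is a stand-in for whatever retrieval method a real system uses (for example, a full-text or embedding index), and it supports only the `conversation_type` filter and `top_k`. The `search` and `tokenize` functions and the sample segments are hypothetical.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase the text and split it into a set of alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(segments, query, conversation_type=None, top_k=5):
    """Return up to top_k source references, ranked by shared query terms."""
    query_terms = tokenize(query)
    scored = []
    for segment in segments:
        if conversation_type and segment["conversation_type"] != conversation_type:
            continue  # apply the metadata filter before scoring
        overlap = len(query_terms & tokenize(segment["text"]))
        if overlap:
            scored.append((overlap, segment))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Every hit carries its source context so the answer can be verified.
    return [
        {
            "session": s["session"],
            "timestamp": s["timestamp"],
            "speaker": s["speaker"],
            "text": s["text"],
        }
        for _, s in scored[:top_k]
    ]


segments = [
    {
        "session": "Customer Call - Acme",
        "timestamp": "00:12:44",
        "speaker": "Customer",
        "conversation_type": "customer_call",
        "text": "The setup process takes too long when inviting multiple teammates.",
    },
    {
        "session": "Weekly Sync",  # hypothetical internal meeting
        "timestamp": "00:04:12",
        "speaker": "Speaker A",
        "conversation_type": "internal",
        "text": "Let's prioritize onboarding improvements for Q3.",
    },
]

sources = search(segments, "setup process onboarding", conversation_type="customer_call")
print(sources[0]["session"], sources[0]["timestamp"])
```

The point of the sketch is the return value: each result is a pointer back to a session, speaker, and timestamp, so a generated answer built on top of it can always cite where it came from.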


## Why Source Context Matters

Conversation memory should not feel like a black box.

If an AI system gives an answer based on past meetings, users should be able to inspect the source.

A good source reference may include:

* Meeting title
* Speaker
* Timestamp
* Transcript segment
* Original language
* Translated text
* Summary section

For example:

```text
Source:
Customer Call - Acme
00:12:44
Speaker: Customer

"The setup process takes too long when inviting multiple teammates."
```

This makes the answer more trustworthy.

It also helps users return to the original moment instead of relying only on the generated response.


## Why a No-Bot Approach Can Improve the Experience

Many AI meeting assistants work by sending a bot into the call.

That approach is common, but it can create friction.

A visible meeting bot may cause issues when:

* External guests do not recognize it
* Participants feel monitored
* Teams have strict recording policies
* Sensitive conversations require more privacy
* The bot changes the natural meeting dynamic

A no-bot approach can feel lighter.

The assistant supports the user without becoming another participant in the room.

For Cheetu AI, this is an important design direction:

> Help users capture, understand, summarize, and search conversations without requiring an AI bot to join the meeting.


## A Simple Architecture View

At a high level, a conversation memory layer could look like this:

```text
Audio Stream
    ↓
Real-Time Transcription
    ↓
Speaker Labels + Timestamps
    ↓
Live Translation
    ↓
Structured AI Summary
    ↓
Chunks + Metadata
    ↓
Search Index
    ↓
Conversation Memory
```

Each layer adds a different kind of value.

| Layer | Main Purpose |
| --- | --- |
| Transcription | Turn speech into text |
| Translation | Make conversations understandable across languages |
| Summary | Turn long conversations into structured outcomes |
| Metadata | Preserve speaker, time, topic, and language context |
| Search | Retrieve useful information later |
| Source context | Make AI answers verifiable |
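
The "Chunks + Metadata" step is where transcript segments become units that a search index can store. As a rough sketch, consecutive segments could be grouped up to a character budget while keeping their speaker, time, and language context. The `chunk_segments` function, the `max_chars` budget, and the sample segments are assumptions made for illustration.

```python
def build_chunk(segments: list[dict]) -> dict:
    """Merge consecutive segments into one chunk, preserving their metadata."""
    return {
        "session_id": segments[0]["session_id"],
        "start_time": segments[0]["start_time"],
        "end_time": segments[-1]["end_time"],
        "speakers": sorted({s["speaker"] for s in segments}),
        "languages": sorted({s["language"] for s in segments}),
        "text": " ".join(s["text"] for s in segments),
    }


def chunk_segments(segments: list[dict], max_chars: int = 200) -> list[dict]:
    """Group consecutive segments into chunks of at most roughly max_chars."""
    chunks: list[dict] = []
    current: list[dict] = []
    size = 0
    for segment in segments:
        # Close the current chunk when this segment would exceed the budget.
        if current and size + len(segment["text"]) > max_chars:
            chunks.append(build_chunk(current))
            current, size = [], 0
        current.append(segment)
        size += len(segment["text"])
    if current:
        chunks.append(build_chunk(current))
    return chunks


transcript = [
    {"session_id": "meeting_001", "speaker": "Speaker A", "start_time": "00:04:12",
     "end_time": "00:04:18", "language": "en",
     "text": "Let's prioritize onboarding improvements for Q3."},
    {"session_id": "meeting_001", "speaker": "Speaker B", "start_time": "00:04:19",
     "end_time": "00:04:20", "language": "en", "text": "Agreed."},
    {"session_id": "meeting_001", "speaker": "Speaker B", "start_time": "00:04:21",
     "end_time": "00:04:27", "language": "en",
     "text": "Documentation may not be clear enough for new users."},
]

for chunk in chunk_segments(transcript, max_chars=80):
    print(chunk["start_time"], chunk["end_time"], chunk["speakers"])
```

Carrying the start and end timestamps on every chunk is what later allows a search result to point back to the exact moment in the conversation.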

The product challenge is making this pipeline feel simple.

Users should not need to think about the system.

They should simply feel that their conversations are easier to follow, review, and search.


## Example Use Cases

### Product Teams

Product teams can use conversation memory to:

* Search across user interviews
* Find repeated customer pain points
* Review product decisions
* Track open questions from roadmap discussions

### Sales Teams

Sales teams can use it to:

* Find customer objections
* Review commitments made during calls
* Generate follow-up notes
* Track account risks

### Students and Researchers

Students and researchers can use it to:

* Search lecture notes
* Ask questions across past sessions
* Summarize long discussions
* Return to exact source moments

### Global Teams

Global teams can use it to:

* Follow meetings in a preferred language
* Review original and translated captions
* Reduce misunderstandings
* Make participation more equal

## Design Principles

When designing a conversation memory system, these principles matter.

### 1. Keep the Meeting Natural

The assistant should support the conversation without changing the room dynamic.

### 2. Make Real-Time Output Useful

Live transcription and translation should be readable, fast, and easy to scan.

### 3. Structure Summaries Around Action

Summaries should highlight decisions, risks, open questions, and action items.

### 4. Make Search Source-Grounded

Answers should include where the information came from.

### 5. Respect User Control

Conversation data should feel personal, private, and manageable.


## Final Thought

The next generation of meeting tools should not only generate better notes.

They should help people understand conversations as they happen and reuse that knowledge afterward.

That means combining:

* Real-time transcription
* Live translation
* Structured AI summaries
* Searchable conversation memory
* Source-grounded answers
* A low-friction meeting experience

That is the direction we are exploring with Cheetu AI.

The interesting question is not only:

> How do we record more meetings?

It is:

> How do we help people remember the conversations that already matter?

Learn more: Cheetu AI

