Bala Madhusoodhanan
How Conversation Memory Improves the Conversation Experience

Intro:
Well-formed search queries built from carefully chosen keywords are essential when building reliable Retrieval-Augmented Generation (RAG) agents in Copilot Studio. LLMs are powerful, but their outputs depend heavily on the information they retrieve at runtime: if retrieval goes wrong because the query is vague or lacks domain context, the LLM can hallucinate, produce irrelevant results, or miss critical legal or regulatory nuance. Well-formed queries minimize hallucinations, improve retrieval precision, reduce wasted compute, and maintain context across multi-turn conversations. This post describes a small experiment comparing retrieval query quality with conversation memory enabled versus disabled, and distils practical takeaways for Copilot Studio users.

Experimental setup
Goal: Test how including conversation history (memory) affects the search queries that the query-generation component produces and therefore the relevance of retrieved RAG documents.

Source / domain used in evaluation: legal/regulatory text (example: EU AI Act) — representative of domains that rely on precise terminology and structured references.

RAG platform: Copilot Studio (using the "Create search query" component that builds queries from user input + optional conversation history).

Two configurations:
Memory ON: include 3 past messages from the current conversation in the query-generation step.
Memory OFF: do not include conversation history (fresh query from each user message).
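To make the two configurations concrete, here is a minimal sketch of what the query-generation step sees in each case. This is an illustrative model, not the actual Copilot Studio API: `build_query_prompt`, its parameters, and the prompt wording are all assumptions; the only behaviour it mirrors is that Memory ON passes the last N turns of history alongside the user message, while Memory OFF passes the message alone.

```python
def build_query_prompt(user_message, history, memory_on, window=3):
    """Assemble the input handed to the query-generation model (sketch)."""
    parts = ["Rewrite the user's message as a focused search query."]
    if memory_on and history:
        # Memory ON: carry up to `window` past messages for context.
        parts.append("Conversation so far:")
        parts.extend("- " + turn for turn in history[-window:])
    parts.append("User message: " + user_message)
    return "\n".join(parts)

history = [
    "We're building an AI tool that screens job applicants' CVs. "
    "Does the EU AI Act apply to us?",
    "Yes - CV screening systems fall under the high-risk category.",
]
follow_up = "You mentioned it's high-risk. What does that mean in practice?"

with_memory = build_query_prompt(follow_up, history, memory_on=True)
without_memory = build_query_prompt(follow_up, history, memory_on=False)
```

With memory, the prompt carries the earlier "EU AI Act" and "CV screening" context, so the model can ground the vague follow-up; without it, the model only sees an unanchored pronoun ("it's high-risk").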

Interaction thread used (Startup Journey) — five turns:

  • We're building an AI tool that screens job applicants' CVs. Does the EU AI Act apply to us?
  • You mentioned it's high-risk. What exactly does that mean for us in practice?
  • What documents do we need to prepare before we can launch?
  • Who needs to sign off on that documentation — us or a third party?
  • And what happens if we launch without doing any of this?

Observations:

| Scenario | Question | Keyword Search | Search Query |
| --- | --- | --- | --- |
| Memory ON (3 past messages included) | "We're building an AI tool that screens job applicants' CVs. Does the EU AI Act apply to us?" | EU AI Act applicability AI tool screening job applicants CVs | We're building an AI tool that screens job applicants' CVs. Does the EU AI Act apply to our tool? |
| Memory ON (3 past messages included) | "You mentioned it's high-risk. What exactly does that mean for us in practice?" | EU AI Act high-risk requirements CV screening practical implications | You mentioned that our AI tool for screening job applicants' CVs would be considered high-risk under the EU AI Act. What does being classified as high-risk mean for us in practical terms? |
| Memory ON (3 past messages included) | "What documents do we need to prepare before we can launch?" | Required documents high-risk AI tool EU AI Act launch | What documents do we need to prepare before launching our high-risk AI tool for screening job applicants' CVs under the EU AI Act? |
| Memory ON (3 past messages included) | "Who needs to sign off on that documentation — us or a third party?" | EU AI Act high-risk AI documentation sign off provider third party | Who is responsible for signing off on the required documentation for launching a high-risk AI tool under the EU AI Act—do we, as the provider, need to approve it ourselves, or does a third party need to be involved? |
| Memory ON (3 past messages included) | "And what happens if we launch without doing any of this?" | EU AI Act consequences launch high-risk AI tool without documentation compliance | What are the consequences if we launch our high-risk AI tool for CV screening without preparing the required documentation and compliance steps under the EU AI Act? |
| Memory OFF | "We're building an AI tool that screens job applicants' CVs. Does the EU AI Act apply to us?" | EU AI Act applicability AI tool screening job applicants CVs | We are developing an AI tool that screens job applicants' CVs. Does the EU AI Act apply to our project? |
| Memory OFF | "You mentioned it's high-risk. What exactly does that mean for us in practice?" | high-risk meaning practical implications | You mentioned it's high-risk. What exactly does "high-risk" mean for us in practice? |
| Memory OFF | "What documents do we need to prepare before we can launch?" | documents needed prepare before launch | What documents do we need to prepare before we can launch? |
| Memory OFF | "Who needs to sign off on that documentation — us or a third party?" | documentation sign off responsible us third party approval | Who is responsible for signing off on that documentation—do we need to sign it ourselves, or does a third party need to provide approval? |
| Memory OFF | "And what happens if we launch without doing any of this?" | consequences launch without preparation risks | What are the potential consequences if we proceed with the launch without completing any of the recommended steps or preparations? |

Key analysis & characteristics:

  • Domain anchoring: When conversation context (e.g., topic, dataset, or domain) is preserved, subsequent queries stay focused on the same subject. Without that anchor, later turns drift toward generic wording and can retrieve off-topic material.
  • Pronoun & reference resolution: Memory enables the system to resolve implicit references (“it”, “that document”, “the tool”) into concrete entities tied to the ongoing conversation. Without memory, follow-ups become ambiguous and require the user to restate context.
  • Terminology carry-over: Repeating or preserving precise domain vocabulary (technical terms, roles, artifacts) across turns improves the chance that retrieval will match indexed content. If history isn’t included, these specific terms tend to drop out and be replaced by vague synonyms.
  • Query specificity and evolution: With memory, queries often become richer and more targeted as the dialog proceeds (the model can refine intent). Without memory, query specificity typically decays over turns, reducing retrieval precision.
  • Retrieval precision and downstream quality: Because retrieval is the upstream signal for generation, better-focused queries produce higher-quality retrieved items; that in turn reduces hallucinations and improves final answer accuracy.
  • Degradation pattern: Conversations that rely on assumed context show a consistent decay in query usefulness when history is excluded — the further from the initial turn, the weaker the query signal.
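One rough way to quantify the terminology carry-over and specificity decay described above is to count how many domain-specific terms survive in each generated query. The two turn-3 queries below are taken verbatim from the observations table; the term list and the counting heuristic are my own illustration, not a standard IR metric.

```python
# Illustrative heuristic: how many domain terms appear in a generated query?
DOMAIN_TERMS = ("eu ai act", "high-risk", "cv", "documentation")

def domain_term_count(query):
    """Count domain-specific terms present in the query (case-insensitive)."""
    q = query.lower()
    return sum(term in q for term in DOMAIN_TERMS)

# Turn 3 of the thread, as generated with and without memory (from the table).
memory_on_q3 = ("What documents do we need to prepare before launching our "
                "high-risk AI tool for screening job applicants' CVs under "
                "the EU AI Act?")
memory_off_q3 = "What documents do we need to prepare before we can launch?"
```

The memory-enabled query keeps three of the four anchor terms; the memory-free query keeps none, which is exactly the degradation pattern the retrieval index then has to cope with.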

| Characteristic | Memory Included | No Memory |
| --- | --- | --- |
| Domain anchoring | ✅ Persistent across turns | ❌ Lost after initial message |
| Pronoun & reference resolution | ✅ Resolved to concrete entities | ❌ Remains ambiguous |
| Terminology preservation | ✅ Specific terms persist | ❌ Specific terms drop out |
| Query specificity | ✅ Becomes more focused / refined | 📉 Becomes generic over turns |
| Retrieval precision (expected) | ✅ Higher, more relevant results | ⚠️ Lower, more irrelevant results |
| Downstream generation quality | ✅ More accurate, fewer hallucinations | ⚠️ Prone to errors and omissions |

Closing Remarks:
Memory (or otherwise preserving context) is a practical lever for improving RAG systems in any domain that relies on precise language or multi‑turn interaction. If your application requires follow-ups, conditional reasoning, or references to previously introduced entities, include a small, targeted history window (or re-anchor the context manually) so the query generator can produce focused searches. For robust evaluation, instrument metrics such as precision, relevance ratings, and hallucination frequency; vary the history window to observe the tradeoff between context usefulness and prompt/token cost.
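The evaluation loop suggested above can be sketched as a sweep over history window sizes, scoring retrieval precision at each setting. Everything here is a scaffold under stated assumptions: `generate_query` and `retrieve` are placeholders for your agent's query-generation step and search index, and the relevant-document labels are assumed to be hand-curated per turn.

```python
def precision_at_k(retrieved, relevant):
    """Fraction of retrieved documents that appear in the labelled relevant set."""
    if not retrieved:
        return 0.0
    return sum(doc in relevant for doc in retrieved) / len(retrieved)

def sweep_history_window(turns, relevant_per_turn, generate_query, retrieve,
                         windows=(0, 1, 3, 5), k=5):
    """Average precision@k across turns, for each candidate history window."""
    results = {}
    for w in windows:
        scores = []
        for i, turn in enumerate(turns):
            history = turns[max(0, i - w):i]   # w == 0 means Memory OFF
            query = generate_query(turn, history)
            docs = retrieve(query)[:k]
            scores.append(precision_at_k(docs, relevant_per_turn[i]))
        results[w] = sum(scores) / len(scores)
    return results
```

Plotting the resulting precision against window size (alongside token cost per query) makes the context-usefulness versus prompt-cost tradeoff directly visible for your own corpus.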
