Persistent Agent Memory with Azure AI Foundry: A Complete Developer Guide
Table of Contents
- Introduction
- What Is Azure AI Foundry Memory?
- Memory Types Deep Dive
- Memory Architecture: How It Really Works
- Access Patterns: Tool vs. Low-Level API
- Understanding Scope
- Hands-On: Provisioning a Memory Store
- Hands-On: Building the Foundry Hosted Memory Agent
- Running & Deploying the Agent
- Security Best Practices
- Quotas, Limits & Regional Availability
- Conclusion + Next Steps
Introduction
Imagine you're a developer who has just shipped a polished AI assistant for your SaaS product. Users log in, ask questions, and get sharp, helpful responses. The launch goes well. Then the complaints start rolling in.
"Why does it keep asking me for my name every single session?"
"I told it last week that I'm vegetarian — why is it recommending steak again?"
"It feels like talking to someone with amnesia."
This is the stateless agent problem, and it is one of the most frustrating gaps between the promise of conversational AI and the lived reality of production deployments. Every conversation starts from a blank slate. The agent has no idea who it is talking to, what that person prefers, or what was discussed yesterday, last week, or a month ago. The result is a user experience that feels hollow and repetitive — the opposite of the intelligent, personalized assistant your users were promised.
The solution is persistent memory, and Azure AI Foundry Memory is Microsoft's production-grade answer to this exact problem. Introduced as part of the Azure AI Foundry platform, the Memory Service gives agents the ability to remember facts across sessions, distill long conversation histories into concise summaries, and retrieve the right context at the right moment — all without you having to build and maintain a custom memory system from scratch.
In this guide, we'll go deep. You'll learn exactly what Azure AI Foundry Memory is, how its three-phase pipeline works under the hood, how to choose the right memory type for your use case, how to provision a Memory Store, and how to build a fully functional hosted memory agent using the Foundry Agent Framework in Python. By the end, you'll have everything you need to ship an AI assistant that actually remembers.
[IMAGE: Side-by-side comparison diagram: Left side shows "Without Memory" — user repeats preferences every session; Right side shows "With Azure AI Foundry Memory" — agent greets user by name, recalls preferences, picks up mid-conversation]
What Is Azure AI Foundry Memory?
At its core, Azure AI Foundry Memory is a managed service within the Azure AI Foundry platform that provides AI agents with long-term, persistent memory across conversational sessions. Rather than relying solely on the context window of an LLM — which is inherently short-lived and discarded at the end of a session — the Memory Service stores extracted facts and conversation summaries in a durable Memory Store that persists indefinitely and can be retrieved semantically on future turns.
The distinction between short-term and long-term memory is fundamental here. Short-term memory is what the model currently "sees" in its context window: the ongoing conversation, the system prompt, any retrieved documents. This is what most production agents rely on exclusively. Long-term memory, by contrast, is what persists beyond the conversation — the facts, preferences, and summaries that accumulate over time and make an agent feel genuinely knowledgeable about its users.
Azure AI Foundry Memory provides the long-term layer. It doesn't replace the context window; it feeds it. On each new session, the Memory Service surfaces relevant stored information and injects it into the model's context, seamlessly bridging what was learned in past conversations with what the agent needs to know right now.
The Three-Phase Memory Pipeline
The entire lifecycle of a memory — from raw conversation to retrieved context — flows through three distinct phases:
[IMAGE: Architecture diagram showing the three-phase memory pipeline — Extraction (conversation → facts), Consolidation (merge/dedup/resolve conflicts), Retrieval (semantic search → context injection) — with arrows flowing left to right across a horizontal timeline]
Phase 1: Extraction. As the conversation progresses, the Memory Service analyzes the dialogue and identifies facts worth remembering. This might be a user's name, a dietary restriction they mentioned in passing, a product preference, or any other detail that could be useful in a future session. The extraction is LLM-powered, meaning it understands context and nuance rather than performing naive keyword matching.
Phase 2: Consolidation. Raw extracted facts are rarely clean. A user might say "I'm vegetarian" in one session and "I don't eat meat or fish" in another. Consolidation is the process of merging new extractions with existing memories, deduplicating redundant entries, and resolving conflicts. If a user previously said they live in Seattle and now mentions moving to Austin, consolidation ensures the old fact is replaced rather than both being stored as true simultaneously. This phase is what separates a memory system that grows intelligently over time from one that simply accumulates noise.
Phase 3: Retrieval. When a new session begins (or mid-session when relevant), the Memory Service performs a semantic search against the stored memories for that user's scope. It returns the most contextually relevant facts and injects them into the agent's system context. This means the model doesn't receive a dump of every stored memory — it receives precisely the subset that matters for the current conversation.
This pipeline runs largely automatically when you use the Memory Search Tool access pattern, making it straightforward to add persistent memory to an existing agent with minimal code changes.
Memory Types Deep Dive
Azure AI Foundry Memory ships with two distinct memory types, each designed for a different category of information. Choosing the right type — or combining both — is key to building a memory system that is both effective and well-governed.
User Profile Memory
User Profile Memory is designed for static, durable facts about a person: their name, dietary restrictions, language preferences, accessibility needs, known product configurations, and similar attributes that are unlikely to change frequently and are broadly applicable across any conversation topic.
The defining characteristic of User Profile Memory is that it is fetched once, at session start. Rather than performing a semantic search every time a potentially relevant memory might exist, the service retrieves the entire user profile upfront and injects it into the system context as a stable foundation. This is efficient and appropriate because user profile facts are almost always relevant — an agent that knows a user is allergic to gluten should factor that in regardless of what the conversation is about.
You configure User Profile Memory using the user_profile_details parameter when creating your Memory Store. This is a natural-language instruction to the extraction model describing what kinds of facts should be captured and, importantly, what should be excluded. In the provisioning script we'll walk through later, you'll see this line:
user_profile_details=(
    "Avoid irrelevant or sensitive data, such as age, financials, precise location, and credentials"
),
This governance instruction is critical. Without explicit guidance, the extraction model might capture personally sensitive information you neither want nor are permitted to store. Always define clear exclusion criteria in user_profile_details.
Chat Summary Memory
Chat Summary Memory takes a different approach. Instead of extracting discrete atomic facts, it produces distilled, per-topic summaries of conversation content. Think of it as an intelligent compression layer: instead of storing a 40-message exchange verbatim, Chat Summary Memory produces a compact narrative that captures the essential arc — what was discussed, what decisions were made, what questions remain open.
Unlike User Profile Memory, Chat Summary Memory is retrieved contextually via semantic search. Not every conversation summary is relevant to every future session — a summary of a technical support conversation is probably not relevant when the user is asking about billing. The semantic retrieval layer ensures that only topically relevant summaries surface, keeping the context window focused and the model's reasoning clean.
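To make the idea of contextual retrieval concrete, here is a minimal sketch of similarity-based ranking over stored summaries. It is not the service's implementation: the three-dimensional vectors are hand-made stand-ins for real embeddings, and `cosine`, `top_k`, and the `min_score` threshold are hypothetical names chosen for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec, summaries, k=2, min_score=0.5):
    """Return up to k summary texts whose similarity clears min_score."""
    scored = [(cosine(query_vec, vec), text) for text, vec in summaries]
    relevant = [pair for pair in scored if pair[0] >= min_score]
    relevant.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in relevant[:k]]


# Toy vectors; the axes loosely mean (support, billing, cooking).
SUMMARIES = [
    ("Troubleshot a failed login on the mobile app", [0.9, 0.1, 0.0]),
    ("Disputed a duplicate charge on the March invoice", [0.1, 0.9, 0.0]),
    ("Planned a vegetarian dinner menu", [0.0, 0.0, 1.0]),
]

billing_query = [0.2, 0.8, 0.0]
print(top_k(billing_query, SUMMARIES))
```

With a billing-flavored query, only the invoice summary clears the threshold, which mirrors the behavior described above: the support summary exists in the store but stays out of the context window.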
You enable Chat Summary Memory by setting chat_summary_enabled=True in your MemoryStoreDefaultOptions. In the sample we're working with, it's disabled (chat_summary_enabled=False) to keep the demo focused on user profile memory, but for production agents handling diverse conversation topics, enabling both types together is usually the right call.
| Memory Type | What It Stores | When Retrieved | Config Parameter |
|---|---|---|---|
| User Profile Memory | Static facts: name, preferences, restrictions | Once at session start | user_profile_details (string instruction) |
| Chat Summary Memory | Per-topic conversation summaries | Contextually via semantic search | chat_summary_enabled=True |
The two memory types are complementary. User Profile Memory gives the agent a stable, always-available picture of who it's talking to. Chat Summary Memory gives it the ability to recall the arc of past conversations in relevant contexts. Together, they provide a comprehensive long-term memory foundation.
Memory Architecture: How It Really Works
Understanding the architectural components of Azure AI Foundry Memory helps you reason about isolation, scalability, and the operational characteristics of the system you're building.
The Memory Store
The Memory Store is the top-level resource — the durable storage container that holds all memories. You create one Memory Store per application or use case. The store is associated with your Azure AI Foundry project and requires two model deployments to function: a chat model (used for extraction and consolidation, e.g., gpt-4.1-mini) and an embedding model (used for semantic retrieval, e.g., text-embedding-3-small). Both must be available within your Foundry project.
The Memory Store is not a simple database. It is an intelligent service layer that orchestrates LLM-powered extraction, runs consolidation logic, maintains vector embeddings for semantic search, and handles scope-based isolation — all managed for you.
Scopes and Isolation
Within a single Memory Store, memories are organized by scope. A scope is a logical namespace that isolates one set of memories from another. In almost all multi-user scenarios, each user gets their own scope, ensuring that one user's memories are completely invisible to another user's sessions.
[IMAGE: Diagram showing memory scope isolation — a single Memory Store containing three separate scope "buckets" (User A, User B, User C), each with their own memory items, illustrating per-user isolation]
The scope is a string identifier, and the platform supports dynamic resolution of this string at runtime (more on this in the Understanding Scope section). Within a scope, the Memory Store maintains all extracted memories and summaries for that logical entity. You can have up to 100 scopes per store and up to 10,000 memories per scope. Note what that implies for sizing: with one scope per user, a single store covers at most 100 users under these limits, so factor the scope ceiling into your capacity planning early.
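The isolation model can be pictured as a dictionary of per-scope lists. The sketch below is an in-memory stand-in, not the Memory Store itself; `ScopedStore` and its methods are hypothetical, and the two constants simply mirror the limits quoted in this guide.

```python
MAX_SCOPES = 100                 # scopes per store, as quoted in this guide
MAX_MEMORIES_PER_SCOPE = 10_000  # memories per scope, as quoted in this guide


class ScopedStore:
    """Toy store that keeps each scope's memories in a separate list."""

    def __init__(self):
        self._by_scope = {}

    def add(self, scope, memory):
        """Append a memory to one scope, enforcing both limits."""
        if scope not in self._by_scope:
            if len(self._by_scope) >= MAX_SCOPES:
                raise RuntimeError("scope limit reached for this store")
            self._by_scope[scope] = []
        items = self._by_scope[scope]
        if len(items) >= MAX_MEMORIES_PER_SCOPE:
            raise RuntimeError(f"memory limit reached for scope {scope!r}")
        items.append(memory)

    def memories(self, scope):
        """Return a copy of one scope's memories; other scopes are never read."""
        return list(self._by_scope.get(scope, []))


store = ScopedStore()
store.add("user-a", "allergic to dairy")
print(store.memories("user-a"))
print(store.memories("user-b"))
```

Reads take a scope and touch nothing else, so there is no code path by which user B's session could see user A's memories. The real service enforces this server-side.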
LLM-Powered Consolidation
The consolidation phase deserves special attention because it is what distinguishes Azure AI Foundry Memory from a naive "append facts to a database" approach. After each conversation turn (or at session end, depending on configuration), the extraction model identifies new facts from the recent exchange and the consolidation model compares them against existing stored memories for that scope.
Consolidation performs three operations: merge (combining related facts into a single, richer record), deduplication (discarding new facts that are already represented), and conflict resolution (updating or replacing facts that contradict existing memories). This is what allows the memory store to remain clean and authoritative over time, rather than becoming an ever-growing pile of contradictory statements.
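The three operations can be illustrated with a small deterministic stand-in. The real service delegates these decisions to an LLM; the sketch below assumes memories are simple (attribute, value) pairs and that the caller knows which attributes hold a single value. `SINGLE_VALUED` and `consolidate` are hypothetical names.

```python
# Attributes that can only hold one value at a time (an assumption of this sketch).
SINGLE_VALUED = {"name", "city"}


def consolidate(existing, extracted):
    """Fold newly extracted (attribute, value) pairs into existing memories.

    existing maps attribute -> set of values; a new dict is returned and the
    input is left untouched.
    """
    merged = {key: set(values) for key, values in existing.items()}
    for key, value in extracted:
        current = merged.get(key, set())
        if value in current:
            continue                          # deduplication: already known
        if key in SINGLE_VALUED:
            merged[key] = {value}             # conflict resolution: newest wins
        else:
            merged[key] = current | {value}   # merge: enrich the record
    return merged


before = {"city": {"Seattle"}, "diet": {"vegetarian"}}
new_facts = [("city", "Austin"), ("diet", "vegetarian"), ("diet", "no fish")]
after = consolidate(before, new_facts)
print(after["city"])
print(sorted(after["diet"]))
```

The move from Seattle to Austin replaces the old value, the repeated "vegetarian" is dropped, and "no fish" is merged into the existing dietary record.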
The billing model reflects this LLM-powered approach: you are billed based on underlying model usage (chat model tokens for extraction/consolidation, embedding model calls for retrieval), not on a per-memory or per-store flat fee. Keep this in mind when designing high-volume deployments.
Access Patterns: Tool vs. Low-Level API
Azure AI Foundry Memory exposes two distinct integration patterns, and choosing the right one shapes how much control you have versus how much complexity you manage.
Memory Search Tool
The Memory Search Tool is the high-level, agent-native access pattern. You attach it to your agent as a tool, and the framework handles the read/write lifecycle automatically. At the start of each session, user profile memories are injected into the system context. During the session, the tool can perform contextual semantic searches when the agent's reasoning determines that retrieving additional memories is warranted. At the end of the turn, newly extracted facts are written back to the store.
This is the pattern used in the Foundry Agent Framework sample we'll build in this guide, implemented via FoundryMemoryProvider. It is the right choice for the vast majority of production agents because it requires minimal code, works consistently across session types, and handles the extraction/consolidation lifecycle without custom orchestration.
Low-Level Memory Store APIs
For scenarios requiring precise control — custom extraction logic, selective memory writes, cross-scope queries, or integration with non-agent systems — the Memory Store APIs provide direct programmatic access. You can call memory_stores.search() to perform arbitrary semantic searches, write memories explicitly, delete specific records, and enumerate the contents of a scope.
The low-level APIs are accessed via the project.beta.memory_stores client in the Azure AI Projects SDK (the allow_preview=True flag is required, as the API is currently in public preview). These APIs give you the full power of the memory service but require you to implement your own retrieval and injection logic.
For most teams shipping their first memory-enabled agent, starting with the Memory Search Tool is strongly recommended. Reach for the low-level APIs when you have a specific requirement that the tool abstraction cannot satisfy.
Understanding Scope
Scope is one of the most important concepts in Azure AI Foundry Memory, and it is worth spending time to understand how it resolves at runtime.
The {{$userId}} Template
In the agent code, you'll specify the scope as the string literal "{{$userId}}". This is a template placeholder that the Foundry hosting infrastructure replaces with the actual, authenticated user identity at runtime. You never need to hard-code or dynamically construct the scope string in your application code; the platform resolves it for you.
The resolution logic uses two potential sources, in order of precedence:
- x-memory-user-id header: If the HTTP request to the hosted agent includes this header, its value is used as the user identity for scope resolution. This is useful when you have your own user identity system and want to pass a stable internal user ID (e.g., a database primary key or a GUID from your own identity provider) to the memory service.
- Entra ID TID+OID fallback: If no x-memory-user-id header is present, the platform falls back to the combination of the authenticated user's Microsoft Entra tenant ID and object ID. This works automatically in scenarios where your users authenticate via Entra, ensuring that each Entra user naturally gets their own memory scope without any additional instrumentation.
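The precedence order above can be sketched as a small function. This only models the described behavior; the platform performs the real resolution. The function name is hypothetical, and the way the tenant ID and object ID are joined (an underscore here) is an assumption, since this guide does not specify the combined format.

```python
from typing import Mapping, Optional


def resolve_scope(
    headers: Mapping[str, str],
    tenant_id: Optional[str],
    object_id: Optional[str],
) -> str:
    """Pick the user identity for the memory scope, header first."""
    # HTTP header names are case-insensitive, so normalize before looking up.
    lowered = {key.lower(): value for key, value in headers.items()}
    explicit = lowered.get("x-memory-user-id", "").strip()
    if explicit:
        return explicit
    if tenant_id and object_id:
        # Assumed join format; the actual combination is platform-defined.
        return f"{tenant_id}_{object_id}"
    raise ValueError("no user identity available for scope resolution")


print(resolve_scope({"X-Memory-User-Id": "customer-42"}, "tid-1", "oid-9"))
print(resolve_scope({}, "tid-1", "oid-9"))
```

The header always wins when present, which is what lets you keep memory keyed to your own stable user IDs even if the caller is also authenticated through Entra.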
Static Scopes
For non-user-specific memory — shared knowledge that applies to all users of an agent, or a single-user personal assistant — you can use a static string as the scope instead of the {{$userId}} template. For example, scope="shared" or scope="assistant-owner". Static scopes are a good fit for personal productivity agents where a single person owns the deployment, or for storing organization-wide facts that should be available to all users.
RBAC Requirements
Accessing the Memory Store requires the Azure AI User role assigned on your Foundry project scope in Azure RBAC. This applies to both the agent's runtime identity (typically a managed identity or a service principal) and to developers running scripts locally (typically via az login and DefaultAzureCredential).
[IMAGE: Screenshot or mockup of Azure portal showing the Azure AI Foundry project with RBAC role assignment — specifically the "Azure AI User" role assignment screen]
Make sure this role assignment is in place before running either the provisioning script or the agent itself; you'll encounter authorization errors at both the memory_stores.get() and memory_stores.create() calls otherwise.
Hands-On: Provisioning a Memory Store
Before your agent can use persistent memory, you need to create a Memory Store within your Foundry project. The following script, provision_memory_store.py, handles idempotent provisioning — it creates the store if it doesn't exist, and safely no-ops if it already does.
Environment Setup
Start by setting the required environment variables. These values come from your Azure AI Foundry project settings and your deployed model names:
export FOUNDRY_PROJECT_ENDPOINT="https://<account>.services.ai.azure.com/api/projects/<project>"
export AZURE_AI_MODEL_DEPLOYMENT_NAME="gpt-4.1-mini"
export AZURE_AI_EMBEDDING_MODEL_DEPLOYMENT_NAME="text-embedding-3-small"
export MEMORY_STORE_NAME="agent_framework_memory"
The endpoint follows the standard Foundry project endpoint format. The model deployment names must correspond to models that are actually deployed within your project — the memory service will call these models directly for extraction, consolidation, and embedding generation.
The Provisioning Script
# Copyright (c) Microsoft. All rights reserved.
"""Provision the Azure AI Foundry Memory Store used by this sample."""

import asyncio
import os

from azure.ai.projects.aio import AIProjectClient
from azure.ai.projects.models import (
    MemoryStoreDefaultDefinition,
    MemoryStoreDefaultOptions,
)
from azure.core.exceptions import ResourceNotFoundError
from azure.identity.aio import DefaultAzureCredential
from dotenv import load_dotenv

load_dotenv()


async def main() -> None:
    endpoint = os.environ["FOUNDRY_PROJECT_ENDPOINT"]
    memory_store_name = os.environ["MEMORY_STORE_NAME"]
    chat_model = os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"]
    embedding_model = os.environ["AZURE_AI_EMBEDDING_MODEL_DEPLOYMENT_NAME"]

    async with (
        DefaultAzureCredential() as credential,
        AIProjectClient(endpoint=endpoint, credential=credential, allow_preview=True) as project,
    ):
        # Idempotency check: leave an existing store untouched.
        try:
            existing = await project.beta.memory_stores.get(name=memory_store_name)
            print(f"Memory store '{existing.name}' already exists (id={existing.id}); leaving as-is.")
            return
        except ResourceNotFoundError:
            pass

        print(f"Creating memory store '{memory_store_name}'...")
        definition = MemoryStoreDefaultDefinition(
            chat_model=chat_model,
            embedding_model=embedding_model,
            options=MemoryStoreDefaultOptions(
                chat_summary_enabled=False,
                user_profile_enabled=True,
                user_profile_details=(
                    "Avoid irrelevant or sensitive data, such as age, financials, precise location, and credentials"
                ),
            ),
        )
        created = await project.beta.memory_stores.create(
            name=memory_store_name,
            description="Memory store for the Agent Framework foundry-hosted memory sample",
            definition=definition,
        )
        print(f"Created memory store '{created.name}' (id={created.id}).")

        # Post-creation verification: confirm the store is visible on the service.
        try:
            verified = await project.beta.memory_stores.get(name=memory_store_name)
        except ResourceNotFoundError as exc:
            raise RuntimeError(
                f"Memory store '{memory_store_name}' was not found after creation; "
                "the service may not have persisted it."
            ) from exc
        print(f"Verified memory store '{verified.name}' is available on the service (id={verified.id}).")


if __name__ == "__main__":
    asyncio.run(main())
Code Walkthrough
A few implementation details are worth calling out explicitly.
allow_preview=True: This flag on the AIProjectClient constructor is required to access the project.beta namespace, which hosts the memory store APIs. Because the Memory Service is currently in public preview, the beta client is where the APIs live. This flag is a deliberate opt-in to acknowledge that the API surface may evolve.
Idempotency via ResourceNotFoundError: The script first attempts a GET on the named store. If it succeeds, the store already exists and the script exits cleanly. Only if a ResourceNotFoundError is raised does the script proceed to create. This makes the provisioning script safe to re-run in CI/CD pipelines or developer setup scripts without risk of accidentally creating duplicate stores or raising errors on subsequent runs.
Post-creation verification: After create() returns, the script performs a second GET to confirm the store is actually visible on the service. This guards against an edge case where the creation call returns success but the resource isn't yet available — a realistic concern with eventually consistent distributed services.
MemoryStoreDefaultOptions: This is where you configure which memory types are active. Here, user_profile_enabled=True activates profile memory with the specified exclusion instruction, while chat_summary_enabled=False keeps summaries off for this sample. Adjust these settings based on your application's memory requirements.
DefaultAzureCredential: Authentication uses the standard Azure credential chain, which automatically picks up az login tokens during local development, managed identity in Azure-hosted environments, and environment-variable credentials in CI. You don't need to manage credentials explicitly.
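The provisioning script verifies with a single GET. If you find that one check is too strict for an eventually consistent service, one possible variation is to poll a few times with a growing delay. The sketch below shows that pattern against a fake service so it runs without Azure; `NotFound`, `wait_until_visible`, and `FlakyService` are hypothetical stand-ins, and in real code you would catch `ResourceNotFoundError` and pass the SDK's `get` call instead.

```python
import asyncio


class NotFound(Exception):
    """Stand-in for the SDK's not-found error in this sketch."""


async def wait_until_visible(get, name, attempts=5, delay=0.5):
    """Call get(name) until it succeeds, waiting a little longer each time."""
    for attempt in range(1, attempts + 1):
        try:
            return await get(name)
        except NotFound:
            if attempt == attempts:
                raise  # give up and surface the last error
            await asyncio.sleep(delay * attempt)


class FlakyService:
    """Fake service whose resource becomes visible after a few lookups."""

    def __init__(self, failures):
        self.failures = failures
        self.calls = 0

    async def get(self, name):
        self.calls += 1
        if self.calls <= self.failures:
            raise NotFound(name)
        return {"name": name}


service = FlakyService(failures=2)
store = asyncio.run(wait_until_visible(service.get, "agent_framework_memory", delay=0))
print(store, "after", service.calls, "calls")
```

Keeping the attempt count small and bounded matters: a store that never appears should still fail the pipeline rather than hang it.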
Hands-On: Building the Foundry Hosted Memory Agent
With the Memory Store provisioned, it's time to build the agent. The following main.py implements a fully functional hosted memory agent using the Foundry Agent Framework's FoundryChatClient, FoundryMemoryProvider, and ResponsesHostServer.
[IMAGE: Diagram showing the FoundryMemoryProvider context provider flow — on each agent turn: (1) retrieve user-profile memories, (2) semantic search for contextual memories, (3) inject into model context, (4) post-turn update store with new facts]
# Copyright (c) Microsoft. All rights reserved.
"""Foundry Memory hosted agent sample."""

import asyncio
import logging
import os

from agent_framework import Agent
from agent_framework.foundry import FoundryChatClient, FoundryMemoryProvider
from agent_framework_foundry_hosting import ResponsesHostServer
from azure.identity.aio import DefaultAzureCredential
from dotenv import load_dotenv

load_dotenv()

logger = logging.getLogger(__name__)


def _resolved_env(name: str) -> str:
    # Treat unresolved template placeholders (${VAR} or {{VAR}}) as unset.
    value = os.environ.get(name, "").strip()
    if (value.startswith("${") and value.endswith("}")) or (
        value.startswith("{{") and value.endswith("}}")
    ):
        return ""
    return value


async def main() -> None:
    client = FoundryChatClient(
        project_endpoint=os.environ["FOUNDRY_PROJECT_ENDPOINT"],
        model=os.environ["AZURE_AI_MODEL_DEPLOYMENT_NAME"],
        credential=DefaultAzureCredential(),
        allow_preview=True,
    )

    memory_store_name = _resolved_env("MEMORY_STORE_NAME")
    context_providers = []
    if not memory_store_name:
        logger.warning("MEMORY_STORE_NAME is not set; memory will not be available to the agent.")
    else:
        memory_provider = FoundryMemoryProvider(
            project_client=client.project_client,
            memory_store_name=memory_store_name,
            scope="{{$userId}}",
        )
        context_providers.append(memory_provider)

    agent = Agent(
        client=client,
        instructions=(
            "You are a helpful assistant that remembers facts the user has shared "
            "across conversations. Relevant memories from previous interactions are "
            "automatically provided to you in the system context. Use them when "
            "answering, and acknowledge when you are relying on remembered facts."
        ),
        context_providers=context_providers,
        default_options={"store": False},
    )

    server = ResponsesHostServer(agent)
    await server.run_async()


if __name__ == "__main__":
    asyncio.run(main())
Code Walkthrough
FoundryChatClient: This is the Foundry-native chat client from the Agent Framework. It wraps the AIProjectClient and handles authentication, model routing, and API versioning. Crucially, client.project_client exposes the underlying AIProjectClient instance, which is then reused by the FoundryMemoryProvider. This single-auth-context pattern means your agent uses one credential and one client for both chat completions and memory operations — no separate credential configuration required.
_resolved_env: This helper function detects unresolved template placeholders in environment variables — strings that look like ${VARIABLE} or {{VARIABLE}} — and returns an empty string in those cases. This handles the scenario where a deployment template has been applied but the MEMORY_STORE_NAME variable wasn't injected correctly, preventing the agent from crashing at startup with a cryptic error. Instead, it degrades gracefully by skipping memory initialization and logging a warning.
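A quick way to see what the helper does is to feed it a few representative values. The snippet below repeats the function from main.py so it runs on its own; `demo` and the `DEMO_MEMORY_STORE_NAME` variable are hypothetical additions used only to drive it.

```python
import os


def _resolved_env(name: str) -> str:
    """Same logic as the helper in main.py, repeated so this snippet runs alone."""
    value = os.environ.get(name, "").strip()
    if (value.startswith("${") and value.endswith("}")) or (
        value.startswith("{{") and value.endswith("}}")
    ):
        return ""
    return value


def demo(raw):
    """Set a throwaway variable to raw (or unset it) and return what the helper sees."""
    name = "DEMO_MEMORY_STORE_NAME"  # hypothetical variable, used only here
    if raw is None:
        os.environ.pop(name, None)
    else:
        os.environ[name] = raw
    return _resolved_env(name)


samples = (
    "agent_framework_memory",
    "  padded_name  ",
    "${MEMORY_STORE_NAME}",
    "{{MEMORY_STORE_NAME}}",
    None,
)
for raw in samples:
    print(repr(raw), "->", repr(demo(raw)))
```

Real names pass through (with surrounding whitespace trimmed), while both placeholder styles and an unset variable collapse to an empty string, which is the signal main.py uses to skip memory initialization.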
FoundryMemoryProvider: This is the core memory integration component. It takes the shared project_client, the memory store name, and the scope template string. When instantiated with scope="{{$userId}}", the provider passes this template to the hosting infrastructure, which resolves it to the actual user identity at request time. The FoundryMemoryProvider acts as a context provider — on each agent turn, it retrieves relevant memories and injects them into the system context before the model processes the user's message.
default_options={"store": False}: This tells the Agent Framework not to maintain its own conversation history store. Conversation history management is delegated to the hosting infrastructure — the ResponsesHostServer and the Foundry platform — rather than being handled at the application layer. This is the correct approach for agents hosted on Foundry, as the platform manages session state natively.
context_providers: The agent accepts a list of context providers, each of which can inject additional context into the system prompt before each model call. By appending the FoundryMemoryProvider to this list, you wire memory retrieval directly into the agent's reasoning loop. Adding other context providers (e.g., RAG document retrievers, user preference loaders) follows the same pattern.
ResponsesHostServer: This class wraps the agent in a server that implements the OpenAI Responses API protocol, making it compatible with any client that speaks that protocol. It handles the HTTP serving, session routing, and lifecycle management, letting you focus on agent logic rather than networking boilerplate.
Running & Deploying the Agent
With the code in place, you have three paths to run and deploy your memory agent: the azd CLI, the VS Code Foundry Toolkit extension, and a direct Python execution for quick local testing.
Using azd (Recommended)
The Azure Developer CLI with the AI agents extension provides the most streamlined workflow from local development to production deployment:
# Install azd AI agent extension
azd ext install azure.ai.agents
# Initialize agent from manifest
mkdir my-memory-agent && cd my-memory-agent
azd ai agent init -m https://github.com/microsoft-foundry/foundry-samples/blob/main/samples/python/hosted-agents/agent-framework/responses/13-foundry-memory/agent.manifest.yaml
# Run locally
azd ai agent run
# Invoke locally
azd ai agent invoke --local "Hi, I'm allergic to dairy"
# Set memory store name in azd env
azd env set MEMORY_STORE_NAME "agent_framework_memory"
# Deploy to Foundry
azd deploy
# Invoke deployed agent
azd ai agent invoke "What are my allergies?"
The azd ai agent init command bootstraps your local project from the agent manifest, pulling down the correct main.py, requirements.txt, and configuration files. The azd ai agent run command starts the agent locally, while azd ai agent invoke --local sends a test message directly. Notice the workflow in the example: first you tell the agent you're allergic to dairy, then — in a separate invocation — you ask what your allergies are. If memory is working correctly, the second invocation should recall the dairy allergy from the first, even though it was a different "session."
After azd deploy, your agent runs on Azure AI Foundry infrastructure with managed scaling, identity, and observability. The azd ai agent invoke command (without --local) sends the request to the deployed endpoint.
VS Code Foundry Toolkit
If you prefer a GUI-driven workflow, the VS Code Foundry Toolkit extension provides a point-and-click interface for initializing agents from manifests, running them locally with integrated debugging, and deploying to Foundry. Install the extension from the VS Code Marketplace and look for the "Azure AI Foundry" activity bar icon.
Before Running: Pre-Flight Checklist
Before attempting any run, ensure the following are in place:
- FOUNDRY_PROJECT_ENDPOINT is set to your project's endpoint URL
- AZURE_AI_MODEL_DEPLOYMENT_NAME points to a deployed chat model (e.g., gpt-4.1-mini)
- AZURE_AI_EMBEDDING_MODEL_DEPLOYMENT_NAME points to a deployed embedding model (e.g., text-embedding-3-small)
- MEMORY_STORE_NAME is set and provision_memory_store.py has been run successfully
- Your identity (local) or the agent's managed identity (deployed) has the Azure AI User role on the Foundry project
- The azure-ai-projects, azure-identity, and agent-framework packages are installed in your Python environment
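The environment-variable items on this checklist are easy to automate. Here is a minimal sketch of such a check; `preflight_problems` is a hypothetical helper, and the two endpoint tests (an https:// prefix and no leftover angle-bracket placeholders) are assumptions based on the example endpoint format shown earlier. It does not verify RBAC or installed packages.

```python
import os
from typing import Mapping

REQUIRED_VARS = (
    "FOUNDRY_PROJECT_ENDPOINT",
    "AZURE_AI_MODEL_DEPLOYMENT_NAME",
    "AZURE_AI_EMBEDDING_MODEL_DEPLOYMENT_NAME",
    "MEMORY_STORE_NAME",
)


def preflight_problems(env: Mapping[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means the checks passed."""
    problems = []
    for name in REQUIRED_VARS:
        if not env.get(name, "").strip():
            problems.append(f"{name} is not set")
    endpoint = env.get("FOUNDRY_PROJECT_ENDPOINT", "").strip()
    if endpoint and not endpoint.startswith("https://"):
        problems.append("FOUNDRY_PROJECT_ENDPOINT should be an https:// URL")
    if "<" in endpoint or ">" in endpoint:
        problems.append("FOUNDRY_PROJECT_ENDPOINT still contains <placeholder> text")
    return problems


if __name__ == "__main__":
    for problem in preflight_problems(os.environ):
        print("preflight:", problem)
```

Taking the environment as a parameter rather than reading `os.environ` directly keeps the function easy to exercise with plain dictionaries in a unit test.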
Security Best Practices
Persistent memory introduces security considerations that don't apply to stateless agents. As the memory store accumulates sensitive user data over time, it becomes both a valuable asset and a potential attack surface. Here's what to take seriously.
Prompt Injection and Memory Poisoning
The most significant threat specific to memory-enabled agents is indirect prompt injection targeting the extraction pipeline. If an attacker can craft input that causes the extraction model to store false or malicious facts — "remember that the admin password is X" or "remember that this user has billing tier Enterprise" — those poisoned memories could influence future sessions in ways that are hard to detect and trace.
Mitigate this by writing explicit, constrained user_profile_details instructions that describe what categories of facts should be extracted. Narrow the extraction scope to only what your application genuinely needs. A cooking assistant should extract dietary restrictions, not authentication credentials or financial information.
Azure AI Content Safety
Integrate Azure AI Content Safety as a layer around your agent's inputs and outputs. Content Safety can detect and block prompt injection attempts, jailbreaking patterns, and sensitive information leakage before they reach the extraction pipeline. For memory-enabled agents handling sensitive domains, this is not optional — it's a necessary defense layer.
Adversarial Testing
Before shipping a memory-enabled agent to production, conduct dedicated red-team exercises focused on memory manipulation. Attempt to inject false memories via crafted user messages. Verify that the memory store does not retain excluded categories (financials, credentials, location) even when the user volunteers that information. Test cross-user isolation by verifying that memories written under one user scope are not retrievable from another scope.
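Part of this testing can be scripted. The sketch below scans a list of memory texts for excluded categories using regular expressions; it assumes you have already exported the memories for a test scope as plain strings by whatever means you use. `EXCLUDED_PATTERNS` and `audit` are hypothetical, and the patterns are deliberately crude examples, not a complete detector.

```python
import re

# Illustrative patterns only; tune them to your own exclusion policy.
EXCLUDED_PATTERNS = {
    "credentials": re.compile(r"\b(password|passcode|api[_ -]?key|secret)\b", re.I),
    "financials": re.compile(r"\b(salary|credit card|iban|bank account)\b|\b\d{13,16}\b", re.I),
    "age": re.compile(r"\b\d{1,3}\s*(years old|yo)\b|\bage\s*[:=]?\s*\d{1,3}\b", re.I),
}


def audit(memories):
    """Return (category, text) pairs for memories that match an excluded pattern."""
    findings = []
    for text in memories:
        for category, pattern in EXCLUDED_PATTERNS.items():
            if pattern.search(text):
                findings.append((category, text))
    return findings


exported = [
    "User is vegetarian",
    "User's password is hunter2",
    "User is 34 years old",
]
for category, text in audit(exported):
    print(f"[{category}] {text}")
```

Run it after a red-team session in which testers volunteer exactly the data your instruction excludes; any finding means the exclusion did not hold for that input.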
Data Residency and Compliance
The Memory Store persists data in the Azure region where your Foundry project is deployed. Ensure that region is compliant with any data residency requirements your organization or customers have. Users in regulated industries (healthcare, finance, government) may have specific requirements about where personal data can be stored. Review your organization's data classification policies and determine whether any memory types or user categories require opt-out mechanisms.
Minimal Scope of Extraction
Always apply the principle of least privilege to memory extraction. The user_profile_details instruction is your primary governance tool — use it aggressively to exclude anything your agent doesn't need. If your agent doesn't make product recommendations, it doesn't need to store purchasing history. If it's a technical support bot, it doesn't need the user's demographic information. Store less, and you expose less.
Quotas, Limits & Regional Availability
As with all Azure services, Azure AI Foundry Memory has quotas and regional constraints to plan around. Here is a summary of the key limits:
| Resource | Limit |
|---|---|
| Max scopes per Memory Store | 100 |
| Max memories per scope | 10,000 |
| Memory search requests per minute | 1,000 |
| Memory update requests per minute | 1,000 |
| Billing basis | Underlying model usage (tokens + embeddings) |
| Preview status | Public preview |
The per-scope memory limit of 10,000 is generous for most use cases, but if your agent is designed for very long-lived users who interact frequently, you should implement a memory lifecycle policy — periodically consolidating or pruning older, less relevant memories to stay well within the limit.
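A lifecycle policy can start as simply as "remove the oldest memories once a scope crosses a headroom threshold". The sketch below shows only the selection logic. MemoryRecord, its fields, and the 80% headroom default are assumptions for illustration; listing and deleting memories would go through whatever API your SDK version exposes, which is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class MemoryRecord:
    """Hypothetical view of a stored memory: an ID plus its last update time."""
    memory_id: str
    last_updated: datetime


def select_for_pruning(records, limit=10_000, headroom=0.8):
    """Return IDs of the oldest records to remove so that the scope holds
    at most limit * headroom memories."""
    target = int(limit * headroom)
    excess = len(records) - target
    if excess <= 0:
        return []
    oldest_first = sorted(records, key=lambda r: r.last_updated)
    return [r.memory_id for r in oldest_first[:excess]]


# Small demonstration with a limit of 10 and 50% headroom (target: 5 records).
records = [
    MemoryRecord(f"m{i}", datetime(2025, 1, i + 1, tzinfo=timezone.utc))
    for i in range(7)
]
print(select_for_pruning(records, limit=10, headroom=0.5))  # ['m0', 'm1']
```

Age is a crude proxy for relevance. A production policy might instead consolidate related memories or weigh how often each one is retrieved.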
Regional Availability
Azure AI Foundry Memory is available in the following regions as of the time of writing. Always check the official Azure documentation for the latest regional coverage, as new regions are added regularly:
- Australia East
- Brazil South
- Canada East
- East US 2
- France Central
- Italy North
- Japan East
- Korea Central
- North Central US
- Norway East
- South Africa North
- South India
- Sweden Central
- Switzerland North
- UAE North
- UK South
- West US
- West US 2
- West US 3
If your target deployment region is not on this list, you may need to deploy your Foundry project to a supported region. For global deployments, consider the latency implications of cross-region memory lookups and whether a region close to your primary user base is available.
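A deployment script can fail fast when the target region is not on the list. The sketch below hard-codes the regions listed above as they stood at the time of writing, so it will go stale as coverage changes. The normalized names assume Azure's usual short-name convention (display name lowercased with spaces removed), and is_memory_region is a hypothetical helper.

```python
# Regions copied from the list above; update this set as coverage changes.
SUPPORTED_REGIONS = frozenset({
    "australiaeast", "brazilsouth", "canadaeast", "eastus2", "francecentral",
    "italynorth", "japaneast", "koreacentral", "northcentralus", "norwayeast",
    "southafricanorth", "southindia", "swedencentral", "switzerlandnorth",
    "uaenorth", "uksouth", "westus", "westus2", "westus3",
})


def normalize_region(name):
    """Turn 'East US 2' or 'eastus2' into the short form 'eastus2'."""
    return name.replace(" ", "").lower()


def is_memory_region(name):
    """Return True if the region appears in the supported set above."""
    return normalize_region(name) in SUPPORTED_REGIONS


print(is_memory_region("East US 2"))    # True
print(is_memory_region("West Europe"))  # False: not in the list above
```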
Conclusion + Next Steps
Stateless agents are a frustrating limitation that erodes user trust and undermines the value of conversational AI. Azure AI Foundry Memory solves this with a production-grade, managed memory service that handles extraction, consolidation, and contextual retrieval automatically — letting you focus on building the agent experience rather than the memory infrastructure.
In this guide, we've covered the complete picture: the three-phase memory pipeline that transforms raw conversation into durable, organized knowledge; the two memory types (User Profile Memory and Chat Summary Memory) and when to use each; the scoping model that enforces per-user isolation; the RBAC requirements; and a full end-to-end Python implementation using FoundryChatClient, FoundryMemoryProvider, and ResponsesHostServer from the Foundry Agent Framework.
The key things to take away:
- Run provision_memory_store.py once to create your Memory Store before starting the agent; provisioning is idempotent and safe to re-run.
- Use scope="{{$userId}}" and let the platform resolve the actual user identity; don't try to manage scope strings manually in application code.
- Define explicit extraction constraints in user_profile_details to govern what gets stored and protect user privacy.
- Test memory persistence end-to-end using azd ai agent invoke --local before deploying; the round-trip from storage to retrieval is the behavior that matters most.
- Take prompt injection and memory poisoning seriously; integrate Azure AI Content Safety and conduct adversarial testing before going to production.
The sample code in this guide is drawn from the official Microsoft Foundry Samples repository. To get started immediately, explore the following resources:
- 📚 Azure AI Foundry Memory Documentation
- 💻 GitHub Sample #13 — Foundry Hosted Memory Agent
- 🔧 Azure AI Projects SDK Reference
- 🚀 Azure Developer CLI (azd) Documentation
The era of agents that actually remember is here. Start building.