AI agents are slow by nature. When a user sends a message to an AI-powered chatbot, what looks like a simple interaction often triggers a chain of events under the hood. The agent reasons about the request, identifies which tools it needs, executes those tools against backend services, waits for responses, and then synthesizes a final reply. In a microservice architecture, this chain can span multiple services, LLM calls, and external API round-trips.
The chatbot architecture I describe in this post is built around the Model Context Protocol (MCP), a standard that allows AI agents to discover and execute backend tools dynamically. The backend is composed of four microservices, each owning a distinct domain: conversation persistence, RAG-powered recommendations, domain-specific operations, and the MCP gateway itself, which brokers all tool discovery and execution.
The problem we kept running into was straightforward: the backend was asynchronous and slow, but the user experience needed to feel fast and responsive. A user asking the agent to pull records, fill out a form, or query a knowledge base could be waiting anywhere from 5 to 30 seconds. A blank screen for that duration is unacceptable in a production product.
Redis solved three distinct problems for us, and each solution used a fundamentally different Redis pattern. This post walks through all three: a Pub/Sub channel for real-time progress streaming, an execution result cache to protect the upstream data layer, and a permission preflight cache to make per-user authorization checks fast enough to run on every request.
Why the Workflow Orchestrator Creates a Frontend Problem
In an MCP-based chatbot, the microservices are intentionally stateless. Each service owns its domain and none of them manage the conversation loop itself. That responsibility belongs to the Workflow Orchestrator. When a user sends a message, the Orchestrator receives it, invokes the AI agent, interprets the agent’s tool call requests, routes them to the MCP gateway, collects the results, and feeds them back to the agent for the next reasoning step. It is the conductor of the entire multi-turn interaction.
This design is clean and scales well, but it creates a specific problem for the frontend. The Orchestrator is excellent at managing backend flow, but it is essentially opaque to the user interface. While it is busy triggering tool calls, waiting on LLM responses, and routing results between services, the frontend has no visibility into what is happening. For multi-turn form-filling workflows or deep knowledge-base queries that take 20 to 30 seconds or more, a blank screen is not acceptable. This is the problem the first Redis pattern solves.
Pattern 1: Redis Pub/Sub for Real-Time Progress Streaming
Rather than having the frontend poll an endpoint repeatedly asking “are you done yet?”, we flipped the model: the backend publishes progress updates as they happen, and the frontend listens.
We run a dedicated Redis instance for progress messages, and two services publish to it: the MCP gateway and the domain service responsible for form-filling workflows. This is an important architectural detail: the updates do not come from a single source. As the Orchestrator triggers tool discovery, tool execution, and domain-specific operations across multiple services, each of those services independently publishes its own granular status updates to the shared Redis channel.
Each channel is keyed by a combination of the session identifier and the message identifier, ensuring that progress updates for one user’s active request are completely isolated from another user’s concurrent session. Messages carry a TTL of 10 minutes. The updates range from high-level orchestration signals to granular step-level messages such as “Thinking…”, “Verifying answer…”, and “Submitting…”, so the user knows exactly where in the process the agent is.
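A minimal sketch of the publishing side, assuming a channel name of the hypothetical form `progress:<session_id>:<message_id>` and a JSON payload with hypothetical `source`, `step`, and `expires_at` fields. Redis Pub/Sub delivers each message only to clients subscribed at that moment and keeps no copy, so a key-level TTL cannot apply to it; this sketch assumes the 10-minute TTL travels as an expiry timestamp that subscribers can check.

```python
import json
import time
from typing import Optional

# 10-minute TTL from the text, expressed as an expiry timestamp in the payload
# (an assumption: Redis Pub/Sub itself stores no copy of published messages).
PROGRESS_TTL_SECONDS = 10 * 60


def progress_channel(session_id: str, message_id: str) -> str:
    """Per-request channel name, so concurrent sessions never share a channel."""
    # Hypothetical naming scheme; any stable session + message combination works.
    return f"progress:{session_id}:{message_id}"


def progress_event(source: str, step: str, now: Optional[float] = None) -> str:
    """Serialize one progress update as a JSON string ready for PUBLISH."""
    issued_at = time.time() if now is None else now
    return json.dumps({
        "source": source,  # which service published it, e.g. "mcp-gateway" (hypothetical label)
        "step": step,      # user-facing text such as "Thinking…" or "Submitting…"
        "expires_at": issued_at + PROGRESS_TTL_SECONDS,
    })


# With redis-py, any publishing service would call something like:
#   r.publish(progress_channel(session_id, message_id),
#             progress_event("mcp-gateway", "Verifying answer…"))
```

Because every service builds the channel name the same way, the MCP gateway and the form-filling service can publish to the same per-request channel without coordinating with each other.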
On the frontend side, we implemented a Server-Sent Events (SSE) endpoint on the MCP gateway rather than exposing the Redis channel directly or relying on WebSockets. The frontend opens a single long-lived HTTP connection to this endpoint, keyed by the conversation identifier. The SSE endpoint internally subscribes to the relevant Redis channel and streams any published messages directly to the browser as they arrive. Redis handles the inter-service messaging, SSE handles the last-mile delivery to the browser, and the frontend gets a real-time stream over a standard HTTP connection with no polling and no WebSocket infrastructure to manage.
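The relay step can be sketched as a generator that turns Redis Pub/Sub messages into `text/event-stream` frames. It assumes messages arrive as dicts shaped like those yielded by redis-py's `PubSub.listen()` (with `type` and `data` keys); the `progress` event name is hypothetical, and web-framework wiring is left out.

```python
from typing import Any, Iterable, Iterator, Mapping, Optional


def to_sse_frame(data: str, event: Optional[str] = None) -> str:
    """Encode one payload in the text/event-stream wire format."""
    lines = [f"event: {event}"] if event else []
    # Every payload line needs its own "data:" field; a blank line ends the event.
    lines.extend(f"data: {part}" for part in data.split("\n"))
    return "\n".join(lines) + "\n\n"


def relay(messages: Iterable[Mapping[str, Any]]) -> Iterator[str]:
    """Turn Redis Pub/Sub messages into SSE frames for the browser."""
    for msg in messages:
        # redis-py also yields "subscribe" confirmations; only forward real messages.
        if msg.get("type") != "message":
            continue
        data = msg["data"]
        if isinstance(data, bytes):  # redis-py returns bytes unless decode_responses=True
            data = data.decode("utf-8")
        yield to_sse_frame(data, event="progress")  # "progress" is a hypothetical event name
```

In a server, `relay(pubsub.listen())`, with `pubsub = r.pubsub()` and `pubsub.subscribe(channel)` from redis-py, would back a streaming response served with `Content-Type: text/event-stream`. `listen()` blocks, so an async server would use the `redis.asyncio` client instead.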
Pattern 2: Redis Key-Value Cache for Tool Execution Results
One of the first lessons you learn when building agentic systems is that AI agents make highly redundant tool calls. When an agent reasons through a multi-turn conversation, it frequently requests the exact same data multiple times. If every one of these tool calls hits your upstream GraphQL API, that API quickly becomes a bottleneck.
We deployed a dedicated Redis instance to act as an execution cache for MCP tool calls. The naive approach to caching API calls is to hash the entire JSON payload of the request, but for AI tools this is a trap. AI agents often inject dynamic metadata like session identifiers, timestamps, or UI display preferences into their tool calls. If you hash the whole payload, every request looks unique and your cache hit rate drops significantly.
Instead, we designed a deterministic cache key strategy. We generate the key by combining the tool’s stable identifier with a strictly filtered subset of the input parameters, stripping out irrelevant metadata and keying only on the core query arguments. This ensures that requests differing only in incidental metadata still hit the cache. We assigned these cached responses a TTL of 45 minutes, long enough to cover an extended user session but short enough to prevent the AI from reasoning over stale data.
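A minimal sketch of the key strategy plus a cache-aside lookup. The per-tool allowlist (`CACHE_KEY_FIELDS`), the `records.search` tool, the `exec:` key prefix, and the SHA-256 digest are all illustrative assumptions, and `DictStore` is an in-memory stand-in for a redis-py client's `get`/`set(..., ex=...)` calls.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Mapping, Optional, Tuple

EXECUTION_TTL_SECONDS = 45 * 60  # 45-minute TTL from the text

# Hypothetical allowlist: the input fields that actually determine each tool's result.
# Anything else (session IDs, timestamps, display hints) is ignored for keying.
CACHE_KEY_FIELDS: Dict[str, Tuple[str, ...]] = {
    "records.search": ("query", "filters", "limit"),
}


def execution_cache_key(tool_id: str, params: Mapping[str, Any]) -> Optional[str]:
    """Deterministic key from the tool ID plus allowlisted params; None means don't cache."""
    fields = CACHE_KEY_FIELDS.get(tool_id)
    if fields is None:
        return None  # unknown tools bypass the cache rather than all sharing one key
    core = {name: params[name] for name in fields if name in params}
    # sort_keys + fixed separators give the same string regardless of argument order
    canonical = json.dumps(core, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"exec:{tool_id}:{digest}"


def cached_execute(store: Any, tool_id: str, params: Mapping[str, Any],
                   execute: Callable[[str, Mapping[str, Any]], Any]) -> Any:
    """Cache-aside: return a cached result if present, else execute and store it."""
    key = execution_cache_key(tool_id, params)
    if key is None:
        return execute(tool_id, params)
    hit = store.get(key)  # redis-py returns bytes, or None on a miss
    if hit is not None:
        return json.loads(hit)
    result = execute(tool_id, params)
    store.set(key, json.dumps(result), ex=EXECUTION_TTL_SECONDS)
    return result


class DictStore:
    """In-memory stand-in with redis-py's get/set(ex=...) shape; records but ignores TTLs."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}
        self.ttls: Dict[str, Optional[int]] = {}

    def get(self, key: str) -> Optional[bytes]:
        return self.data.get(key)

    def set(self, key: str, value: str, ex: Optional[int] = None) -> bool:
        self.data[key] = value.encode("utf-8")
        self.ttls[key] = ex
        return True
```

Returning `None` for tools without an allowlist is a deliberate choice in this sketch: an empty filtered subset would otherwise hash to the same key for every call to that tool and serve one request's result for all of them.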
Pattern 3: Redis Memoization for Permission Preflighting
Every tool call in an MCP-based architecture carries a security question: does this user have permission to execute this tool against this data? In a multi-tenant platform with varying roles and access levels, that question cannot be skipped. But answering it naively, with a fresh check on every request, creates a serious performance problem. In a single session, this might mean dozens of preflight checks, each one adding a synchronous HTTP round-trip to the critical path of the agent’s reasoning loop.
The solution is to cache the result of each authorization check. After the MCP gateway performs a preflight check for a given user against a given tool, the result is stored with a TTL of 30 minutes. For the duration of that window, any subsequent request from the same user for the same tool is resolved directly from the cache. The cache key is a composite of the user identifier and the tool identifier, making it a per-user, per-tool lookup. A blanket cache keyed only on the tool would incorrectly serve one user’s permissions to another.
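The memoized check can be sketched as follows. The `perm:<user_id>:<tool_id>` key format, the `"1"`/`"0"` encoding, and caching denials as well as grants are assumptions for illustration; `preflight` stands in for the gateway's real authorization call, and `DictStore` is an in-memory stand-in for a redis-py client.

```python
from typing import Any, Callable, Dict, Optional

PERMISSION_TTL_SECONDS = 30 * 60  # 30-minute TTL from the text


def permission_cache_key(user_id: str, tool_id: str) -> str:
    # Per-user, per-tool: a tool-only key would serve one user's result to another.
    return f"perm:{user_id}:{tool_id}"


def is_allowed(store: Any, user_id: str, tool_id: str,
               preflight: Callable[[str, str], bool]) -> bool:
    """Memoized authorization check; preflight is the slow round-trip being avoided."""
    key = permission_cache_key(user_id, tool_id)
    cached = store.get(key)
    if cached is not None:
        # redis-py returns bytes by default, str when decode_responses=True
        return cached in (b"1", "1")
    allowed = preflight(user_id, tool_id)
    # Assumption: denials are cached too, so repeated denied calls also skip the round-trip.
    store.set(key, "1" if allowed else "0", ex=PERMISSION_TTL_SECONDS)
    return allowed


class DictStore:
    """In-memory stand-in with redis-py's get/set(ex=...) shape; ignores expiry."""

    def __init__(self) -> None:
        self.data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self.data.get(key)

    def set(self, key: str, value: str, ex: Optional[int] = None) -> bool:
        self.data[key] = value
        return True
```

Only the first check for each user and tool pair reaches `preflight`; later checks inside the TTL window are a single key lookup.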
The tradeoff is worth being explicit about. A 30-minute TTL means that if a user’s permissions are revoked, the MCP gateway may continue to honor the cached result for up to 30 minutes. For most enterprise use cases this is an acceptable window, since permission changes are administrative actions that do not require immediate propagation. But it is a deliberate architectural decision that should be documented and understood by anyone operating the system.
Conclusion
Redis is not one tool. It is a runtime with multiple distinct operating modes, and the teams that get the most out of it are the ones that treat each mode as a first-class architectural decision. In this system, Pub/Sub solved the frontend visibility gap, key-value caching protected the upstream data layer, and memoization made per-request security enforcement viable at scale.
The decision to keep these three usages on separate instances was deliberate. Each has different TTL requirements, different eviction policy needs, and different failure tolerance. The Pub/Sub instance going down is a UX problem. The execution cache going down is a performance problem. The permission cache going down is a security problem. Conflating them onto a single instance means a misconfigured eviction policy in one domain can silently corrupt the behavior of another.
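One way to keep the separation honest is to describe each role explicitly and check that no two roles point at the same server. This is a sketch under stated assumptions: the hostnames and eviction policies are hypothetical choices, not the author's settings, while the TTLs come from the text. The check compares host and port rather than full URLs because Redis's `maxmemory` and `maxmemory-policy` are server-wide settings, so two logical databases (`/0`, `/1`) on one server still share an eviction policy.

```python
from dataclasses import dataclass
from typing import Dict, List
from urllib.parse import urlparse


@dataclass(frozen=True)
class RedisRole:
    name: str
    url: str               # hypothetical connection URL for this role's instance
    ttl_seconds: int       # TTLs from the text: 10, 45, and 30 minutes
    maxmemory_policy: str  # illustrative eviction choice, not taken from the original design


ROLES: List[RedisRole] = [
    RedisRole("progress-pubsub", "redis://progress-redis:6379/0", 10 * 60, "noeviction"),
    RedisRole("execution-cache", "redis://exec-cache-redis:6379/0", 45 * 60, "allkeys-lru"),
    RedisRole("permission-cache", "redis://perm-cache-redis:6379/0", 30 * 60, "volatile-ttl"),
]


def shared_servers(roles: List[RedisRole]) -> Dict[str, List[str]]:
    """Map host:port to role names for every server that more than one role points at.

    Logical databases on one server share its maxmemory and maxmemory-policy,
    so only distinct host:port pairs count as separate instances.
    """
    by_server: Dict[str, List[str]] = {}
    for role in roles:
        parsed = urlparse(role.url)
        server = f"{parsed.hostname}:{parsed.port or 6379}"  # 6379 is Redis's default port
        by_server.setdefault(server, []).append(role.name)
    return {server: names for server, names in by_server.items() if len(names) > 1}
```

Run at startup, a non-empty result from `shared_servers(ROLES)` flags exactly the configuration this section warns against.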
If you are building agentic systems on top of MCP or similar tool-calling architectures, model each problem independently, pick the right Redis pattern for it, and keep the instances separate.