DEV Community

AHMED HASAN AKHTAR OVIEDO

Beyond the Context Window: Building a Stateful 'Memory' MCP Server on Cloudflare Workers

Large Language Models (LLMs) like Claude possess incredible reasoning capabilities, but they suffer from a critical flaw: They have no object permanence.

Once you close a chat session, the "mind" is wiped. While features like "Projects" or huge context windows (200k+ tokens) help, they are temporary buffers, not true memory. They are expensive, slow to re-process, and don't persist across different interfaces (e.g., moving from VS Code to the web interface).

To move from Chatbots to true Agents, we need to solve the state problem.

The Theory: The "Sidecar" Memory Pattern

The Model Context Protocol (MCP) is often described as a way to "connect AI to tools." But theoretically, it allows us to decouple the reasoning engine (the LLM) from the state (the data).

Instead of trying to cram everything into the prompt (Context Stuffing), we can use an MCP Server as a Sidecar Attachment.

  • The Brain: The LLM (Stateless, reasoning-only).
  • The Hippocampus: Our MCP Server (Stateful, long-term storage).

By building a server that allows the AI to write to a database, not just read from it, we give the model the ability to "learn" user preferences permanently without fine-tuning the model itself.
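The write path is the key difference. As a minimal sketch of the contract we want the server to expose (the `MemoryStore` name is illustrative, not part of the protocol; the real server backs this with Cloudflare KV instead of an in-process Map):

```typescript
// Illustrative sketch: the remember/recall/list contract the MCP server
// will expose as tools. "MemoryStore" is a hypothetical name.
class MemoryStore {
  private facts = new Map<string, string>();

  // remember_fact: the AI writes a preference or snippet.
  remember(key: string, value: string): void {
    this.facts.set(key, value);
  }

  // recall_fact: the AI reads it back in a later session.
  recall(key: string): string | undefined {
    return this.facts.get(key);
  }

  // list_memories: audit what has been saved.
  list(): string[] {
    return [...this.facts.keys()];
  }
}

const store = new MemoryStore();
store.remember("editor", "VS Code with Vim keybindings");
console.log(store.recall("editor")); // "VS Code with Vim keybindings"
console.log(store.list()); // ["editor"]
```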

The Architecture: Why Edge Computing?

For this architecture to work, latency is the enemy. If Claude decides to call a tool to "recall" a memory, and that tool takes 2 seconds to wake up (a cold start), the chat experience feels sluggish.

This is why we are choosing Cloudflare Workers over traditional containers (like Docker/EC2) or standard serverless functions (like AWS Lambda):

  1. 0ms Cold Starts: The memory is available instantly.
  2. Global KV Store: Cloudflare KV (Key-Value) replicates data to the edge. If you travel from New York to Tokyo, your AI's memory travels with you.
  3. Native SSE (Server-Sent Events): MCP relies on a persistent event stream. Workers handle these long-lived streaming connections natively, something a conventional request/response REST API cannot do gracefully.
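To make point 3 concrete, here is a minimal sketch of an SSE response built on the Web Streams API, the primitive Workers expose for streaming (the `sseResponse` helper and its events are my own illustration, not part of the MCP SDK):

```typescript
// Minimal sketch: emit a sequence of Server-Sent Events over a
// streaming Response, as a Worker would. Each SSE frame is
// "data: <payload>\n\n".
function sseResponse(events: string[]): Response {
  const encoder = new TextEncoder();
  const stream = new ReadableStream({
    start(controller) {
      for (const data of events) {
        controller.enqueue(encoder.encode(`data: ${data}\n\n`));
      }
      controller.close();
    },
  });
  return new Response(stream, {
    headers: {
      "Content-Type": "text/event-stream",
      "Cache-Control": "no-cache",
    },
  });
}
```

A real MCP transport keeps this stream open for the life of the session instead of closing it after a fixed list of events.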

Today, we will build exactly that: a read/write MCP server that gives your AI a persistent memory bank.

The Tech Stack

  • Cloudflare Workers (the edge runtime)
  • Cloudflare KV (persistent key-value storage)
  • Hono (lightweight routing)
  • @modelcontextprotocol/sdk (MCP server and SSE transport)
  • TypeScript

Step 1: Initialize the Project

First, let's create a new Worker using Hono. It's much cleaner than raw worker syntax and handles routing beautifully.

npm create hono@latest mcp-memory-vault
# Select "Cloudflare Workers" template
cd mcp-memory-vault
npm install @modelcontextprotocol/sdk eventsource

Step 2: Configure The KV Namespace

We need a persistent place to store our memories. In your wrangler.toml file, add a KV namespace binding. This tells Cloudflare to expose a database to your code.

name = "mcp-memory-vault"
main = "src/index.ts"
compatibility_date = "2024-11-20"

# Add this block
[[kv_namespaces]]
binding = "MEMORY_KV"
id = "YOUR_KV_NAMESPACE_ID" 
# Run 'npx wrangler kv:namespace create MEMORY_KV' to get this ID
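Once the binding exists, `c.env.MEMORY_KV` implements the Workers `KVNamespace` interface, of which we only need three calls: `put`, `get`, and `list`. A rough sketch with an in-memory stand-in (the `KVLite` and `MockKV` names are mine, for local experimentation only):

```typescript
// The subset of the Workers KVNamespace API this server relies on.
interface KVLite {
  put(key: string, value: string): Promise<void>;
  get(key: string): Promise<string | null>;
  list(): Promise<{ keys: { name: string }[] }>;
}

// In-memory stand-in ("MockKV" is a hypothetical name) so the tool
// handlers can be exercised without deploying to Cloudflare.
class MockKV implements KVLite {
  private data = new Map<string, string>();
  async put(key: string, value: string) { this.data.set(key, value); }
  async get(key: string) { return this.data.get(key) ?? null; }
  async list() {
    return { keys: [...this.data.keys()].map((name) => ({ name })) };
  }
}

(async () => {
  const kv: KVLite = new MockKV();
  await kv.put("deploy_script", "npm run build && wrangler deploy");
  console.log(await kv.get("deploy_script"));
  console.log((await kv.list()).keys.map((k) => k.name)); // ["deploy_script"]
})();
```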

Step 3: The Code

Here is the secret sauce. We aren't just creating a server; we are mapping MCP "Tools" to Cloudflare KV operations.

Create a file src/index.ts. Note how we handle the SSE stream—this is the trickiest part of building a remote MCP server.

import { Hono } from 'hono'
import { Server } from '@modelcontextprotocol/sdk/server/index.js'
import { SSEServerTransport } from '@modelcontextprotocol/sdk/server/sse.js'
import {
  ListToolsRequestSchema,
  CallToolRequestSchema,
} from '@modelcontextprotocol/sdk/types.js'

// Define the shape of our environment
type Bindings = {
  MEMORY_KV: KVNamespace
}

const app = new Hono<{ Bindings: Bindings }>()

// 1. Define the Tools
// We give the AI three abilities: Remember (Write), Recall (Read), and List (Audit).
const TOOLS = [
  {
    name: "remember_fact",
    description: "Stores a specific fact, preference, or snippet for later retrieval.",
    inputSchema: {
      type: "object",
      properties: {
        key: { type: "string", description: "The category or label (e.g., 'project_db_schema')" },
        value: { type: "string", description: "The content to remember" }
      },
      required: ["key", "value"]
    }
  },
  {
    name: "recall_fact",
    description: "Retrieves a stored fact by its key.",
    inputSchema: {
      type: "object",
      properties: {
        key: { type: "string" }
      },
      required: ["key"]
    }
  },
  {
    name: "list_memories",
    description: "Lists all stored memory keys to see what is saved.",
    inputSchema: {
      type: "object",
      properties: {}
    }
  }
];

// The active transport, shared between the two endpoints.
// (A single-session sketch; production code would key transports by
// session ID, e.g. inside a Durable Object.)
let transport: SSEServerTransport | undefined;

// 2. The SSE Endpoint (Handshake)
// Caveat: the SDK's SSEServerTransport was written against Node's
// http.ServerResponse, so on Workers it needs a small response adapter
// (hence the `as any` casts). The protocol flow, however, is exactly this:
app.get('/sse', async (c) => {
  transport = new SSEServerTransport('/message', c.res as any);

  // Initialize MCP Server within the request context
  const server = new Server({
    name: "cloud-memory",
    version: "1.0.0"
  }, {
    capabilities: {
      tools: {}
    }
  });

  // Register Tools
  server.setRequestHandler(ListToolsRequestSchema, async () => ({
    tools: TOOLS
  }));

  // Handle Tool Execution
  server.setRequestHandler(CallToolRequestSchema, async (request) => {
    const { name, arguments: args } = request.params;

    // Connect to Cloudflare KV
    const kv = c.env.MEMORY_KV;

    if (name === 'remember_fact') {
      const { key, value } = args as { key: string, value: string };
      await kv.put(key, value);
      // Respond in Spanish as a little easter egg ("Saved successfully: <key>")
      return { content: [{ type: "text", text: `Guardado exitosamente: ${key}` }] };
    }

    if (name === 'recall_fact') {
      const { key } = args as { key: string };
      const value = await kv.get(key);
      return { 
        content: [{ 
          type: "text", 
          text: value ? `Memory found: ${value}` : "No memory found for that key." 
        }] 
      };
    }

    if (name === 'list_memories') {
      const list = await kv.list();
      const keys = list.keys.map(k => k.name).join(", ");
      return { content: [{ type: "text", text: `Stored Keys: ${keys}` }] };
    }

    throw new Error(`Tool not found: ${name}`);
  });

  // Connect the transport; this starts writing the SSE stream to the client
  await server.connect(transport);
  return c.body(null);
});

// 3. The Message Endpoint (For subsequent communication)
// Client-to-server JSON-RPC messages arrive here and are routed back
// to the Server instance through the transport.
app.post('/message', async (c) => {
  if (!transport) return c.text("No active SSE session", 400);
  await transport.handlePostMessage(c.req.raw as any, c.res as any);
  return c.text("Accepted", 202);
});

export default app
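The dispatch logic above becomes easy to unit-test if you pull it out of the request handler. A sketch (the `handleToolCall` helper and `KVLite` type are my own refactor, not part of the SDK):

```typescript
// Hypothetical refactor: the tool dispatch from the CallTool handler,
// extracted into a pure async function over any KV-like store.
type KVLite = {
  put(key: string, value: string): Promise<void>;
  get(key: string): Promise<string | null>;
  list(): Promise<{ keys: { name: string }[] }>;
};

async function handleToolCall(
  kv: KVLite,
  name: string,
  args: Record<string, string>
): Promise<string> {
  switch (name) {
    case "remember_fact":
      await kv.put(args.key, args.value);
      return `Guardado exitosamente: ${args.key}`;
    case "recall_fact": {
      const value = await kv.get(args.key);
      return value ? `Memory found: ${value}` : "No memory found for that key.";
    }
    case "list_memories": {
      const { keys } = await kv.list();
      return `Stored Keys: ${keys.map((k) => k.name).join(", ")}`;
    }
    default:
      throw new Error(`Tool not found: ${name}`);
  }
}

// Quick check against an in-memory stand-in:
const mem = new Map<string, string>();
const fakeKV: KVLite = {
  async put(k, v) { mem.set(k, v); },
  async get(k) { return mem.get(k) ?? null; },
  async list() { return { keys: [...mem.keys()].map((name) => ({ name })) }; },
};

handleToolCall(fakeKV, "remember_fact", { key: "editor", value: "VS Code" })
  .then(() => handleToolCall(fakeKV, "recall_fact", { key: "editor" }))
  .then(console.log); // "Memory found: VS Code"
```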

Step 4: Deploying

Deploying to the edge is one command:

npx wrangler deploy

You will get a URL like: https://mcp-memory-vault.yourname.workers.dev.

Step 5: Connecting to Claude Desktop

Locate your Claude Desktop configuration file, claude_desktop_config.json (on macOS under ~/Library/Application Support/Claude/; on Windows under %APPDATA%\Claude\).

Add your new remote server. Claude Desktop talks to local processes over stdio, so we use npx -y to run a local bridge (here the community mcp-remote package) that tunnels stdio traffic to the remote SSE URL:

{
  "mcpServers": {
    "cloudflare-memory": {
      "command": "npx",
      "args": [
        "-y",
        "mcp-remote",
        "https://mcp-memory-vault.yourname.workers.dev/sse"
      ]
    }
  }
}

The Result

Now, open Claude. You can treat it like a teammate with a notebook.

User:

"Please remember_fact with key 'deploy_script' and value 'npm run build && wrangler deploy --env production'"

Claude:

Guardado exitosamente: deploy_script

...Two days later...

User:

"How do I deploy? Check my memories."

Claude (calling recall_fact):

Memory found: npm run build && wrangler deploy --env production

Conclusion

We just built a server that transforms an LLM from a temporary chat interface into a persistent operational partner. By leveraging Cloudflare KV, we ensured it's fast, cheap, and available globally.

The Model Context Protocol isn't just about giving AI access to Google; it's about giving AI access to your world, your context, and your specific way of doing things.

Reference Repository

You can find the underlying SDK and examples at the official repository below:

GitHub: Official Model Context Protocol SDK


Happy Coding!
