Ayomide olofinsawe
Your AI has no memory. Here's How to Add One with Node.js and Mem0

Every chat interface you've ever used feels like the AI remembers you. It doesn't. There's no memory, no session, no awareness of previous conversations. What you're seeing is your application feeding the entire conversation history back into every request. The model just processes whatever is in front of it and forgets everything the moment it responds.

That approach works, but it has real ceilings. The message array grows with every exchange, so longer conversations mean larger payloads and higher token costs. You'll hit context window limits eventually. And across sessions, everything is lost: start a new conversation, and the user has to repeat themselves from scratch. There's also no intelligence to it: a throwaway message like "ok thanks" carries the same weight as "I'm building a fintech app in Node.js, and I hate ORMs."

What you actually want is a system that extracts the facts that matter, stores them, and retrieves only the relevant ones when needed without stuffing the entire conversation history into every request.

That's what Mem0 does. It sits between your application and your LLM as a dedicated memory layer: automatically pulling meaningful facts out of conversations, persisting them across sessions, and injecting the right context into future requests.

In this tutorial, we'll build a memory-aware REST API from scratch using:

  • Express and TypeScript for the API layer
  • Groq as the LLM provider (fast inference, generous free tier)
  • Mem0 as the memory layer that ties it all together

By the end, you'll have a working API where a user can send a message, have relevant facts automatically extracted and stored, and receive responses informed by everything the system has learned about them across previous conversations.

Let's get into it.

What is Mem0 and How Does It Work?

Mem0 is a memory layer for AI applications. The core idea is simple: instead of your application managing conversation history and replaying it on every request, Mem0 handles the memory side of things for you: extracting what matters, storing it, and making it available when it's relevant.

How it differs from storing chat history

When you store chat history, you're storing everything: every message, in order, as it happened. Nothing is filtered, nothing is prioritized, and the whole thing gets sent back to the model on the next request regardless of whether any of it is actually useful.
Mem0 works differently. After each exchange, it processes the conversation and pulls out meaningful facts: things like user preferences, stated goals, technical context, or any detail worth remembering long term. Those facts are stored as discrete memory objects, not as raw messages. When a new request comes in, Mem0 retrieves only the memories that are relevant to it and surfaces them to your LLM as context.
The practical difference is significant. Instead of a growing message array that the model has to wade through, the model gets a compact, relevant summary of what it actually needs to know about the user.
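To make the contrast concrete, here's a toy sketch in plain TypeScript. This is not Mem0's implementation (its extraction and retrieval are LLM-driven, not keyword-based); it only shows the shape of the difference between replaying raw history and retrieving distilled facts.

```typescript
// Toy illustration only: Mem0's real extraction and retrieval are semantic,
// not keyword-based. This just contrasts the two storage shapes.
type Memory = { fact: string; keywords: string[] };

const rawHistory: string[] = []; // traditional approach: everything, in order
const factStore: Memory[] = [];  // memory-layer approach: distilled facts

function recordExchange(userMessage: string): void {
  rawHistory.push(userMessage); // raw history grows with every message
}

function rememberFact(fact: string, keywords: string[]): void {
  factStore.push({ fact, keywords }); // only facts worth keeping are stored
}

// Retrieval: raw history returns everything; the fact store returns
// only entries relevant to the incoming message.
function relevantFacts(message: string): string[] {
  const words = message.toLowerCase().split(/\s+/);
  return factStore
    .filter((m) => m.keywords.some((k) => words.includes(k)))
    .map((m) => m.fact);
}

recordExchange("ok thanks");
recordExchange("I'm building a fintech app in Node.js");
rememberFact("User is building a fintech app in Node.js", ["fintech", "node.js", "app"]);

console.log(rawHistory.length); // 2 (everything kept, including "ok thanks")
console.log(relevantFacts("any advice for my fintech backend?")); // only the relevant fact
```

The throwaway "ok thanks" bloats the history array but never becomes a fact, which is exactly the filtering a memory layer buys you.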

Traditional LLM approach vs. Mem0 approach

Async processing and automatic fact extraction

Memory extraction in Mem0 happens asynchronously; it doesn't block your API response. After your LLM replies, Mem0 processes the conversation in the background, extracts facts, and updates the memory store. It also handles deduplication: if a user mentions the same detail across multiple conversations, Mem0 won't create redundant memory entries; it updates the existing one instead.
This means your API stays fast, and the memory store stays clean over time without any extra work on your end.
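If you want the HTTP response to go out before the memory write is even queued, one option is to fire-and-forget the add call in Node. This is a variation, not what this tutorial's route does (it awaits the call), and saveMemory below is a hypothetical stand-in for a mem0.add() call:

```typescript
// Hypothetical stand-in for a network call to a memory service like Mem0.
async function saveMemory(messages: { role: string; content: string }[]): Promise<void> {
  // persist `messages` here; stubbed out for illustration
}

// Send the reply immediately; persist memory in the background.
function respondThenRemember(
  messages: { role: string; content: string }[],
  send: (reply: string) => void
): void {
  send("reply goes out immediately");
  // Deliberately not awaited: a failed write is logged, not surfaced to the user.
  saveMemory(messages).catch((err) => console.error("memory write failed:", err));
}
```

The trade-off is that a failed memory write is silent from the user's perspective, so log it somewhere you'll actually see.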

Project Setup

In this section, we'll get the project initialized, install everything we need, and configure our environment variables. By the end, you should have a working foundation to build on.

Prerequisites

Before getting started, make sure you have the following:

  • Node.js v18 or higher
  • A Groq account for the LLM API key
  • A Mem0 account for the memory layer API key

Installing dependencies

First, create your project folder and initialize it:

mkdir mem0-express-api && cd mem0-express-api
npm init -y

Then install the runtime dependencies:

npm install express mem0ai openai dotenv

And the dev dependencies:

npm install -D typescript ts-node nodemon @types/express @types/node

Finally, initialize TypeScript:

npx tsc --init

A quick note on the openai package: even though we're using Groq as our LLM provider, Groq's API is OpenAI-compatible, so the openai SDK works against it directly; you just point it at Groq's base URL.

Add your scripts

Update your package.json scripts section to look like this:

"scripts": {
  "dev": "nodemon --exec ts-node src/index.ts",
  "build": "tsc",
  "start": "node dist/index.js"
}

Environment variables

Create a .env file at the root of your project:

GROQ_API_KEY=your_groq_api_key
MEM0_API_KEY=your_mem0_api_key
PORT=3000

You can grab your Groq API key from the Groq console and your Mem0 API key from the Mem0 dashboard. Never commit this file; add it to your .gitignore.
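Optionally, you can fail fast at startup when a key is missing instead of hitting a confusing auth error on the first request. A small helper along these lines (not part of the tutorial's required code) does the job:

```typescript
// Read a required environment variable, or throw immediately at startup.
function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// Usage at boot, so a missing key fails loudly and early:
// const groqApiKey = requireEnv("GROQ_API_KEY");
// const mem0ApiKey = requireEnv("MEM0_API_KEY");
```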

Building the API

With the project set up, let's walk through the actual code. Here's how everything is structured:

mem0-express-api/
├── src/
│   ├── lib/
│   │   └── mem0.ts
│   ├── routes/
│   │   └── chat.ts
│   └── index.ts
├── .env
├── .gitignore
├── package.json
└── tsconfig.json

Go ahead and create the folders and files to match this structure before we start filling them in.

Setting up the entry point

src/index.ts is where the Express app is initialized. It loads environment variables, registers the chat router under the /chat path, and starts the server:

import "dotenv/config";
import express from "express";
import chatRouter from "./routes/chat";

const app = express();
const PORT = process.env.PORT || 3000;

app.use(express.json());
app.use("/chat", chatRouter);

app.listen(PORT, () => {
  console.log(`Server running on http://localhost:${PORT}`);
});

Initializing the Mem0 client

src/lib/mem0.ts initializes the Mem0 client once and exports it for use across the app:

import { MemoryClient } from "mem0ai";

const client = new MemoryClient({
  apiKey: process.env.MEM0_API_KEY!,
});

export default client;

Keeping this in its own file means you initialize the client once and import it wherever you need it, rather than recreating it on every request.

The chat routes

All the route logic lives in src/routes/chat.ts. Here's the full file:

import { Router, Request, Response } from "express";
import OpenAI from "openai";
import mem0 from "../lib/mem0";

const router = Router();

const openai = new OpenAI({
  apiKey: process.env.GROQ_API_KEY!,
  baseURL: "https://api.groq.com/openai/v1",
});

// POST /chat/:userId
router.post("/:userId", async (req: Request, res: Response) => {
  const { userId } = req.params;
  const { message, sessionId } = req.body;

  if (!message) {
    res.status(400).json({ error: "Message is required" });
    return;
  }

  try {
    const memories = await mem0.search(message, { user_id: userId } as any);
    const memoryContext = memories.map((m: any) => m.memory).join("\n");

    const systemPrompt = `You are a helpful assistant with memory of past interactions.${
      memoryContext ? `\nWhat you remember about this user:\n${memoryContext}` : ""
    }`;

    const completion = await openai.chat.completions.create({
      model: "llama-3.3-70b-versatile",
      messages: [
        { role: "system", content: systemPrompt },
        { role: "user", content: message },
      ],
    });

    const reply = completion.choices[0].message.content;

    const addOptions: any = { user_id: userId };
    if (sessionId) addOptions.run_id = sessionId;

    await mem0.add(
      [
        { role: "user", content: message },
        { role: "assistant", content: reply! },
      ],
      addOptions
    );

    res.json({ reply, userId, sessionId: sessionId || null });
  } catch (error: any) {
    res.status(500).json({ error: error.message });
  }
});

// GET /chat/:userId/memories
router.get("/:userId/memories", async (req: Request, res: Response) => {
  const { userId } = req.params;

  try {
    const memories = await mem0.getAll({ user_id: userId } as any);
    res.json({ userId, memories });
  } catch (error: any) {
    res.status(500).json({ error: error.message });
  }
});

// DELETE /chat/:userId/memories
router.delete("/:userId/memories", async (req: Request, res: Response) => {
  const { userId } = req.params;

  try {
    await mem0.deleteAll({ user_id: userId } as any);
    res.json({ message: `All memories cleared for user ${userId}` });
  } catch (error: any) {
    res.status(500).json({ error: error.message });
  }
});

// POST /chat/:userId/no-memory
router.post("/:userId/no-memory", async (req: Request, res: Response) => {
  const { userId } = req.params;
  const { message } = req.body;

  if (!message) {
    res.status(400).json({ error: "Message is required" });
    return;
  }

  try {
    const completion = await openai.chat.completions.create({
      model: "llama-3.3-70b-versatile",
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: message },
      ],
    });

    const reply = completion.choices[0].message.content;
    res.json({ reply, userId, memoryUsed: false });
  } catch (error: any) {
    res.status(500).json({ error: error.message });
  }
});

export default router;

Now let's break down what each route does.

POST /chat/:userId is the core route. When a message comes in, it first searches Mem0 for any memories relevant to that message and injects them into the system prompt. The LLM then responds with that context already in place. After the response is generated, the full exchange is passed to mem0.add(), which processes it asynchronously and extracts facts worth remembering. The optional sessionId in the request body scopes the memory to a specific conversation thread via run_id, which is useful if one user has multiple distinct conversation contexts.
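For example, a request scoped to a conversation thread carries the sessionId in the body. The value is anything stable your app chooses; the one below is made up:

```json
{
  "message": "Let's pick up where we left off on the launch plan.",
  "sessionId": "launch-planning-thread"
}
```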

GET /chat/:userId/memories returns everything Mem0 has stored for a given user. We'll use this in the next section to inspect what was actually extracted after a conversation.

DELETE /chat/:userId/memories wipes all stored memories for a user. Useful for resetting state during testing or giving users a clean slate.

POST /chat/:userId/no-memory is a control route that calls the same LLM with the same model but with no memory context at all, just a plain system prompt. We'll use this in a later section to show the difference between a memory-aware response and a stateless one side by side.

How Mem0 Stores Memories

When you call mem0.add() after an exchange, Mem0 doesn't just save the raw messages. It processes the conversation, pulls out the facts that matter, and stores them as structured memory objects. Here's what that actually looks like.

After a few conversations as user123, calling GET /chat/user123/memories returns something like this:

{
  "userId": "user123",
  "memories": [
    {
      "id": "a9a14f57-e0ae-4533-82a1-c5e93364fb7b",
      "memory": "Ayomide loves table tennis and enjoys trying different foods",
      "user_id": "user123",
      "categories": ["sports", "food"],
      "created_at": "2026-03-19T05:38:37-07:00",
      "updated_at": "2026-03-19T05:38:41.614883-07:00",
      "structured_attributes": {
        "day": 19,
        "hour": 12,
        "year": 2026,
        "month": 3,
        "day_of_week": "thursday"
      }
    },
    {
      "id": "f0c175d5-1b9a-462a-bf17-88d287271df0",
      "memory": "Ayomide is building an API platform called APIblok",
      "user_id": "user123",
      "categories": ["professional_details", "technology"],
      "created_at": "2026-03-18T11:19:19-07:00",
      "updated_at": "2026-03-18T11:19:17.557120-07:00",
      "structured_attributes": {
        "day": 18,
        "hour": 18,
        "year": 2026,
        "month": 3,
        "day_of_week": "wednesday"
      }
    },
    {
      "id": "ea262dae-72af-48d9-a8e8-4b18cdefa9a5",
      "memory": "Ayomide has 3 years of experience with Node.js and prefers using TypeScript",
      "user_id": "user123",
      "categories": ["professional_details", "technology"],
      "created_at": "2026-03-18T10:55:20-07:00",
      "updated_at": "2026-03-18T10:56:45.972173-07:00",
      "structured_attributes": {
        "day": 18,
        "hour": 17,
        "year": 2026,
        "month": 3,
        "day_of_week": "wednesday"
      }
    }
  ]
}

A few things worth noticing here.

The memory field is a clean, distilled fact, not a raw message. Mem0 didn't store "hey I love table tennis and I also like trying new foods"; it stored "Ayomide loves table tennis and enjoys trying different foods". The extraction is deliberate and concise.

Categories are automatically assigned. Each memory object comes with a categories array: ["sports", "food"], ["professional_details", "technology"], and so on. Mem0 figures these out on its own, without any configuration from you. This is what makes retrieval intelligent: when a new message comes in, Mem0 knows which memories are actually relevant to surface.

Deduplication happens automatically. If the user mentions the same detail across multiple conversations, Mem0 won't create a new memory entry every time; it updates the existing one instead. You can see this in the updated_at field, which can differ from created_at when a memory has been refined over time.
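Conceptually, the upsert behaves like this toy sketch (illustrative only; Mem0 matches facts semantically, not by exact string):

```typescript
type Mem = { memory: string; created_at: number; updated_at: number };

// Toy upsert: refresh the timestamp when an equivalent fact exists, else insert.
function upsertMemory(store: Mem[], fact: string, now: number): void {
  const existing = store.find((m) => m.memory === fact);
  if (existing) {
    existing.updated_at = now; // refreshed in place, not duplicated
  } else {
    store.push({ memory: fact, created_at: now, updated_at: now });
  }
}

const store: Mem[] = [];
upsertMemory(store, "Ayomide prefers TypeScript", 1);
upsertMemory(store, "Ayomide prefers TypeScript", 2); // same fact again
console.log(store.length); // 1
```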

structured_attributes adds temporal context — recording when the memory was created down to the hour and day of week. This gives Mem0 the ability to reason about recency when deciding what to retrieve.

You can also view and manage all of this visually from the Mem0 dashboard:

Mem0 dashboard

The dashboard gives you a real-time view of what's been extracted for each user, which is particularly useful when you're debugging or want to verify that the right facts are being stored.

Testing the API

With the project set up, you'll need two terminal windows open simultaneously for this section. In the first terminal, start the dev server:

npm run dev

You should see Server running on http://localhost:3000. Leave this terminal running — the server needs to stay alive to handle requests. Open a second terminal window to send the curl commands. If you're on Windows, use Command Prompt (cmd) for both terminals rather than PowerShell. PowerShell maps curl to its own Invoke-WebRequest command, which won't work with these examples. If you prefer Git Bash, that works too — it ships with Git for Windows and handles curl correctly.

Windows cmd users: The multiline curl commands below use \ for line continuation, which is a macOS/Linux convention. Windows cmd doesn't support it; if you run them as-is, you'll get curl: (3) URL rejected: Bad hostname errors after each line. The request itself will still go through, but to avoid the noise, collapse each command onto a single line. For example:

  curl -X POST http://localhost:3000/chat/user123 -H "Content-Type: application/json" -d "{\"message\": \"Your message here\"}"

Sending your first message

Start with a message that gives the model something worth remembering — some personal or technical context:

curl -X POST http://localhost:3000/chat/user123 \
  -H "Content-Type: application/json" \
  -d "{\"message\": \"Hey, I'm Ayomide. I have 3 years of experience with Node.js and I prefer TypeScript over plain JavaScript.\"}"

You should get back something like:

{
  "reply": "Hi Ayomide! That's a solid background — 3 years with Node.js and a preference for TypeScript is a great combo. TypeScript's type safety really pays off in larger codebases. What are you working on?",
  "userId": "user123",
  "sessionId": null
}

The response itself is unremarkable; the model is working with just what you sent it. What matters is what's happening in the background: Mem0 is processing this exchange and extracting facts from it asynchronously.

Wait a few seconds to give Mem0 time to finish processing, then send another message:

curl -X POST http://localhost:3000/chat/user123 \
  -H "Content-Type: application/json" \
  -d "{\"message\": \"I'm currently building an API platform called APIblok.\"}"

Now check what Mem0 has stored:

curl http://localhost:3000/chat/user123/memories

You should see something like:

{
  "userId": "user123",
  "memories": [
    {
      "id": "ec30b7df-38f5-411d-af13-4d42befe4a3e",
      "memory": "Ayomide is currently building an API platform called APIblok.",
      "user_id": "user123",
      "categories": null,
      "created_at": "2026-03-19T14:08:07-07:00",
      "updated_at": "2026-03-19T14:08:07-07:00",
      "structured_attributes": {
        "day": 19,
        "hour": 21,
        "year": 2026,
        "month": 3,
        "day_of_week": "thursday"
      }
    },
    {
      "id": "b0217d62-f54d-4b7e-bca1-e6cf528ee09a",
      "memory": "Ayomide has 3 years of experience with Node.js and prefers TypeScript over plain JavaScript.",
      "user_id": "user123",
      "categories": ["professional_details", "technology"],
      "created_at": "2026-03-19T14:07:18-07:00",
      "updated_at": "2026-03-19T14:08:26.222857-07:00",
      "structured_attributes": {
        "day": 19,
        "hour": 21,
        "year": 2026,
        "month": 3,
        "day_of_week": "thursday"
      }
    }
  ]
}

Mem0 didn't store your raw messages. It extracted the facts that actually matter (your experience level, your tech preferences, what you're building) and stored them as clean, discrete memory objects. That's the extraction step working exactly as intended.

The before/after comparison

To make the difference concrete, the API has a /no-memory route that hits the same LLM with the same model but with zero memory context. Let's use it.

First, send a follow-up question through the memory-aware route:

curl -X POST http://localhost:3000/chat/user123 \
  -H "Content-Type: application/json" \
  -d "{\"message\": \"What framework would you recommend for my project?\"}"

Sample Response:

{
  "reply": "Given that you're building APIblok, an API platform, and you're comfortable with TypeScript, I'd recommend sticking with Express if you want something minimal and flexible, or looking at Fastify if performance is a priority. Both have strong TypeScript support. Hono is also worth a look if you're targeting edge runtimes.",
  "userId": "user123"
}

The response is specific. It knows what you're building, knows you're a TypeScript person, and gives a recommendation that fits your actual context.

Now send the exact same message through the stateless route:

curl -X POST http://localhost:3000/chat/user123/no-memory \
  -H "Content-Type: application/json" \
  -d "{\"message\": \"What framework would you recommend for my project?\"}"

Sample Response:

{
  "reply": "To provide a framework recommendation that fits your needs, I'd love to know more about your project. Here are a few questions to get started:\n\n1. What type of project is it?\n2. What programming languages are you planning to use?\n3. What are the main features and functionalities?\n4. Do you have any specific requirements or constraints?\n5. Are there any particular technologies you're interested in integrating with?",
  "userId": "user123",
  "memoryUsed": false
}

Same model, same question, and instead of an answer you get an interrogation. The stateless route has no idea who you are, what you're building, or what language you use, so it fires back five clarifying questions asking you to repeat everything you've already told it. The memory-aware route answered directly because it already had the context it needed.
This is the ceiling that plain conversation history hits across sessions. Memory context closes that gap without any extra work from the user.

What the memory objects look like

If you've been following along, the memory objects from GET /chat/:userId/memories should look familiar — we walked through the full structure in the previous section. But now that you've seen them generated live from real conversations, the structure makes more sense in context.
A few things worth reinforcing:
The memory field is a distilled fact, not a raw message. Mem0 didn't store "hey I have 3 years with Node and I prefer TypeScript"; it stored "Ayomide has 3 years of experience with Node.js and prefers using TypeScript". The extraction is deliberate and concise.

Categories are assigned automatically. Mem0 decided ["professional_details", "technology"] without any configuration from you. This is what makes retrieval intelligent: when a new message comes in, Mem0 knows which memories are actually relevant to surface rather than dumping everything into the context.

Deduplication happens in the background. Try sending the same kind of detail twice across two separate requests:

curl -X POST http://localhost:3000/chat/user123 \
  -H "Content-Type: application/json" \
  -d "{\"message\": \"Just a reminder — I work primarily in TypeScript.\"}"
Sample Response:

{
  "reply": "I recall that you prefer working with TypeScript over plain JavaScript, and that you have around 3 years of experience with Node.js. You're also currently building an API platform called APIblok. How can I assist you with your TypeScript or APIblok project today?",
  "userId": "user123",
  "sessionId": null
}

The model already knows. Now check the memories endpoint: you won't find a new entry for TypeScript. Mem0 recognized it as information it already had and updated the updated_at timestamp on the existing memory rather than creating a redundant one. The store stays clean automatically, with no deduplication logic on your end.

To reset your state between test runs, hit the DELETE endpoint:

curl -X DELETE http://localhost:3000/chat/user123/memories

This wipes all stored memories for user123 and gives you a clean slate to test from scratch.

Conclusion

Most AI applications treat memory as an afterthought: they replay conversation history on every request and call it context. That works until it doesn't. Context windows fill up, costs climb, and the moment a user starts a new session, everything they've told the system is gone. The problem isn't the LLM. It's the architecture around it.

What we built here takes a different approach. By adding Mem0 as a dedicated memory layer, the API extracts what matters from each conversation, stores it as structured facts, and retrieves only what's relevant on the next request. The result is a system that gets more useful over time — not one that forgets everything the moment a session ends.

To recap what we covered:

  • Setting up an Express and TypeScript API with Groq as the LLM provider
  • Initializing Mem0 and wiring it into the request lifecycle
  • Building routes for memory-aware chat, memory retrieval, memory deletion, and a stateless control route for comparison
  • Testing the whole thing end to end to see extraction, retrieval, and deduplication working in practice

From here, a few natural directions to explore: scope memories to specific conversation threads using run_id, swap the Mem0 cloud client for a self-hosted instance if you need full data control, or wire this API into a frontend to build a chat interface that actually remembers its users.
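As a starting point for that last idea, here's a minimal sketch of calling the memory-aware route from TypeScript. It assumes the server from this tutorial is running on localhost:3000; the injectable fetchFn parameter is only there to make the function easy to test without a live server.

```typescript
// Minimal response shape we rely on, so any fetch-compatible function works.
type FetchLike = (url: string, init?: object) => Promise<{
  ok: boolean;
  status: number;
  json: () => Promise<any>;
}>;

// Send a message to POST /chat/:userId and return the assistant's reply.
async function sendChat(
  userId: string,
  message: string,
  fetchFn: FetchLike = fetch as unknown as FetchLike
): Promise<string> {
  const res = await fetchFn(`http://localhost:3000/chat/${encodeURIComponent(userId)}`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ message }),
  });
  if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
  const data = await res.json();
  return data.reply as string;
}
```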
The full source code for this tutorial is available on GitHub: https://github.com/techsplot/mem0-express-api
