RAG helped LLMs search text. It did not give them memory.
If you've built anything serious with LLMs, you've seen the failure mode. A user tells your system they are allergic to nuts, live in Manchester, and have a budget of five grand. Later, the model either forgets, retrieves conflicting fragments, or guesses.
That is not memory. It is retrieval.
Most teams start with RAG. That makes sense. It is useful, familiar, and easy to bolt on. But RAG is fundamentally a document retrieval pattern. It chunks text, embeds it, and returns similar passages at query time. It can surface relevant information, but it does not actually model entities, relationships, or state changes.
If a user says their budget is €5,000 and later updates it to €7,000, a RAG system will often retrieve both passages and leave the LLM to reconcile them. Sometimes it gets that right. Sometimes it does not.
The next step beyond RAG was agentic memory.
Instead of storing raw text and hoping retrieval is enough, agentic memory systems extract structured knowledge that persists across sessions. They track people, facts, relationships, preferences, and constraints as first-class objects. The system does not just find a sentence mentioning Tom. It knows Tom is Sarah’s husband, Lily has a nut allergy, and the family is flying from Manchester.
But even that is not the full story.
Agentic cognition is what happens when the system does more than store and retrieve structured knowledge. It reasons over it. It combines multiple memory structures, resolves entities, tracks state transitions, and produces answers that are grounded in context rather than copied from stored text.
That is where MINNS sits.
You feed it conversations. It builds a context graph. When you query it with nlq(), it does not just retrieve text. It resolves the entities involved, reasons across the graph, and returns a direct answer from the current state of that context.
In this tutorial, we will build a ReAct-style agent with MINNS.
Not an API wrapper. An actual agent.
One that decides what it needs to know, queries a context graph, observes the result, and loops until it has enough information to answer properly.
What we are building
We are going to build a small TypeScript agent that:
- ingests a conversation into MINNS
- gives an LLM access to MINNS as a tool
- lets the LLM query memory during reasoning
- loops until it has enough information to answer
The result is a simple but real agent loop. The model does not just generate a response from a prompt. It thinks, acts, observes, and only then answers.
The ReAct pattern
If you have built LLM agents before, you have probably seen ReAct. It is the standard loop for tool-using agents:
- Think: decide what information is needed
- Act: call a tool
- Observe: inspect the result
- Repeat or respond: either continue reasoning or answer
This is what separates an agent from a basic chatbot. The model is not just generating text. It is deciding what it needs, fetching that information, evaluating the result, and then deciding what to do next.
In a RAG-based system, the tool usually returns text chunks.
With MINNS, the tool returns answers grounded in a structured, evolving context.
That is the key difference. The system is not forcing the LLM to reconstruct truth from loosely related passages. It is querying a context graph that already models who, what, how, and when.
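The four steps above can be sketched as a generic loop. This is only an illustrative shape (the `Step` type, `decide`, and `callTool` names are invented for this sketch, not part of any SDK); the concrete implementation follows in Step 5.

```typescript
// A minimal sketch of the ReAct loop shape.
// `decide` stands in for the LLM, `callTool` for the tool layer.
type Step =
  | { kind: 'act'; tool: string; args: unknown }     // model wants a tool call
  | { kind: 'respond'; answer: string };             // model is ready to answer

async function react(
  decide: (observations: string[]) => Promise<Step>,
  callTool: (tool: string, args: unknown) => Promise<string>,
  maxSteps = 5,
): Promise<string> {
  const observations: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = await decide(observations);          // Think
    if (step.kind === 'respond') return step.answer;  // Respond
    const result = await callTool(step.tool, step.args); // Act
    observations.push(result);                        // Observe, then repeat
  }
  return 'Ran out of reasoning steps.';
}
```

The cap on `maxSteps` matters in practice: without it, a model that keeps calling tools can loop forever.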
Prerequisites
You will need:
- Node.js 18+
- a MINNS API key
- an OpenAI API key
- basic TypeScript familiarity
Step 1: Project setup
Create a new project and install the dependencies:
mkdir my-agent && cd my-agent
npm init -y
npm install minns-sdk openai dotenv
npm install -D typescript tsx @types/node
Note that readline does not need installing: it is a Node.js builtin, and the npm package of the same name is an unrelated stub.
Create tsconfig.json:
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "bundler",
"esModuleInterop": true,
"strict": true,
"outDir": "dist"
},
"include": ["src"]
}
Create .env:
MINNS_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
Step 2: Set up the clients
Create src/agent.ts:
import { MinnsClient } from 'minns-sdk';
import OpenAI from 'openai';
import dotenv from 'dotenv';
import * as readline from 'readline';
dotenv.config();
const minns = new MinnsClient({
apiKey: process.env.MINNS_API_KEY!,
});
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY!,
});
We are using two clients:
- MINNS for memory and context reasoning
- OpenAI for the agent loop itself
The LLM decides what to ask. MINNS provides the structured context.
Step 3: Ingest a conversation
Before the agent can reason over anything, we need memory to exist.
Here we ingest a short travel-planning conversation into MINNS:
async function ingest() {
await minns.ingestConversations({
case_id: 'travel-booking-2024',
sessions: [{
session_id: 'session-1',
topic: 'holiday-planning',
messages: [
{ role: 'user', content: "Hi, I'm planning a trip to the Amalfi Coast for my family." },
{ role: 'assistant', content: "Lovely! When are you thinking of traveling?" },
{ role: 'user', content: "Late June. Budget is about 5000 euros." },
{ role: 'assistant', content: "How many people?" },
{ role: 'user', content: "Me, my husband Tom, and our kids Lily who is 8 and Max who is 5." },
{ role: 'assistant', content: "Any dietary requirements?" },
{ role: 'user', content: "Yes — Lily is allergic to nuts." },
{ role: 'user', content: "Oh and we live in Manchester, so flights from Manchester Airport please." },
],
}],
});
console.log('Conversation ingested.');
}
This is where MINNS starts to differ from plain retrieval.
It is not just storing messages. It is extracting structured facts and linking them into a context graph. That means the system can represent things like:
- Lily is a child in the travelling group
- Lily has a nut allergy
- Tom is the user’s husband
- the family departs from Manchester
- the current trip budget is €5,000
Those are not just words in a chunk. They are connected pieces of state.
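To make "connected pieces of state" concrete, here is a toy illustration of what linked claims could look like as data. The field names and shape are assumptions for this sketch, not the actual MINNS schema.

```typescript
// Hypothetical claim triples, invented for illustration only.
interface Claim {
  subject: string;
  predicate: string;
  object: string;
}

const claims: Claim[] = [
  { subject: 'Lily', predicate: 'has_allergy', object: 'nuts' },
  { subject: 'Tom', predicate: 'spouse_of', object: 'user' },
  { subject: 'family', predicate: 'departs_from', object: 'Manchester' },
  { subject: 'trip', predicate: 'budget_eur', object: '5000' },
];

// Because the facts are structured, a query can filter and join them
// instead of hoping the right sentence lands in a retrieved chunk:
const allergies = claims
  .filter(c => c.predicate === 'has_allergy')
  .map(c => `${c.subject} is allergic to ${c.object}`);

console.log(allergies); // e.g. a list of allergy statements
```

The point of the triple form is that "Lily" in one claim and "Lily" in another are the same resolved entity, so reasoning can traverse between them.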
Step 4: Define the tools
Now we expose MINNS to the LLM through tools.
We will give the model two capabilities:
- `nlq` to ask MINNS a natural-language question
- `search_claims` to inspect extracted facts directly
const TOOLS = [
{
type: 'function' as const,
function: {
name: 'nlq',
description: 'Ask MINNS a question. It reasons over the context graph and returns a direct answer.',
parameters: {
type: 'object',
properties: {
question: { type: 'string' }
},
required: ['question'],
},
},
},
{
type: 'function' as const,
function: {
name: 'search_claims',
description: 'Search extracted claims and facts from the conversation.',
parameters: {
type: 'object',
properties: {
query: { type: 'string' }
},
required: ['query'],
},
},
},
];
Next, we implement the bridge between the LLM and MINNS:
async function executeTool(name: string, args: Record<string, string>): Promise<string> {
if (name === 'nlq') {
const result = await minns.nlq({ question: args.question });
return JSON.stringify({
answer: result.answer,
confidence: result.confidence,
entities_resolved: result.entities_resolved,
});
}
if (name === 'search_claims') {
const results = await minns.searchClaims({
query_text: args.query,
top_k: 5,
});
return JSON.stringify(results.map(r => ({
fact: r.claim_text,
confidence: r.confidence,
entity: r.subject_entity,
})));
}
return '{}';
}
A few things are happening here:
- `nlq()` gives the model a direct way to query the context graph
- `searchClaims()` gives it a lower-level fact lookup tool
- both tool results are returned as JSON so the LLM can reason over them in the next step
This is the important architectural point: the LLM is not scraping meaning from raw message history. It is interacting with a structured memory system.
Step 5: Build the ReAct loop
Now we implement the agent loop itself.
This is where the model gets to think, decide which tool to use, observe the results, and either continue or answer.
const MAX_ITERATIONS = 5;
async function agentLoop() {
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
const history: OpenAI.ChatCompletionMessageParam[] = [];
console.log('Agent ready. Ask anything about the conversation.\n');
for await (const question of rl) {
history.push({ role: 'user', content: question });
for (let i = 0; i < MAX_ITERATIONS; i++) {
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: `You are a helpful travel agent with access to a structured memory system.
Use the nlq tool to ask MINNS questions.
Use search_claims when you need specific extracted facts.
Only answer the user when you have enough information.
If the answer is incomplete, query again.`,
},
...history,
],
tools: TOOLS,
});
const message = response.choices[0].message;
history.push(message);
if (!message.tool_calls?.length) {
console.log(`\nAgent: ${message.content}\n`);
break;
}
for (const toolCall of message.tool_calls) {
const args = JSON.parse(toolCall.function.arguments);
console.log(` [${toolCall.function.name}]: ${args.question || args.query}`);
const result = await executeTool(toolCall.function.name, args);
history.push({
role: 'tool',
tool_call_id: toolCall.id,
content: result,
});
}
}
}
}
This loop is the core of the agent.
On each iteration:
- the model sees the user’s question and the previous tool outputs
- it decides whether it needs to call a tool
- if it does, the tool result is added back into the history
- the model thinks again using the new information
- once it has enough context, it responds
That is ReAct in practice.
The model might answer in one step for a simple question.
For a harder question, it might:
- query dietary constraints
- inspect family structure
- check ages of the children
- then combine all of that into a final answer
The agent decides how many steps it needs.
Step 6: Run the agent
Now wire it together:
async function main() {
await ingest();
await agentLoop();
}
main();
Start the script:
npx tsx src/agent.ts
Example session:
Conversation ingested.
Agent ready. Ask anything about the conversation.
> What should I know before booking restaurants?
[nlq]: Are there any dietary requirements for this trip?
[search_claims]: children ages
Agent: Lily has a nut allergy, so restaurant choices need to be allergy-safe and staff should be able to confirm ingredients clearly. The family is also travelling with two young children, ages 8 and 5, so family-friendly options would make sense. You should also keep total dining spend in line with the overall trip budget of around €5,000.
> Where are they flying from?
[nlq]: Where is the family departing from?
Agent: They are flying from Manchester Airport.
Notice what happened in the first example.
The model did not answer immediately. It realised it needed more context, used two tools, inspected the results, and then produced a joined-up response. That is the difference between tool use and simple prompting.
Full file
Here is the full working example in one file:
import { MinnsClient } from 'minns-sdk';
import OpenAI from 'openai';
import dotenv from 'dotenv';
import * as readline from 'readline';
dotenv.config();
const minns = new MinnsClient({
apiKey: process.env.MINNS_API_KEY!,
});
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY!,
});
const TOOLS = [
{
type: 'function' as const,
function: {
name: 'nlq',
description: 'Ask MINNS a question. It reasons over the context graph and returns a direct answer.',
parameters: {
type: 'object',
properties: {
question: { type: 'string' },
},
required: ['question'],
},
},
},
{
type: 'function' as const,
function: {
name: 'search_claims',
description: 'Search extracted claims and facts from the conversation.',
parameters: {
type: 'object',
properties: {
query: { type: 'string' },
},
required: ['query'],
},
},
},
];
async function executeTool(name: string, args: Record<string, string>): Promise<string> {
if (name === 'nlq') {
const result = await minns.nlq({ question: args.question });
return JSON.stringify({
answer: result.answer,
confidence: result.confidence,
entities_resolved: result.entities_resolved,
});
}
if (name === 'search_claims') {
const results = await minns.searchClaims({
query_text: args.query,
top_k: 5,
});
return JSON.stringify(results.map(r => ({
fact: r.claim_text,
confidence: r.confidence,
entity: r.subject_entity,
})));
}
return '{}';
}
async function ingest() {
await minns.ingestConversations({
case_id: 'travel-booking-2024',
sessions: [{
session_id: 'session-1',
topic: 'holiday-planning',
messages: [
{ role: 'user', content: "Hi, I'm planning a trip to the Amalfi Coast for my family." },
{ role: 'assistant', content: "Lovely! When are you thinking of traveling?" },
{ role: 'user', content: "Late June. Budget is about 5000 euros." },
{ role: 'assistant', content: "How many people?" },
{ role: 'user', content: "Me, my husband Tom, and our kids Lily who is 8 and Max who is 5." },
{ role: 'assistant', content: "Any dietary requirements?" },
{ role: 'user', content: "Yes — Lily is allergic to nuts." },
{ role: 'user', content: "Oh and we live in Manchester, so flights from Manchester Airport please." },
],
}],
});
console.log('Conversation ingested.');
}
const MAX_ITERATIONS = 5;
async function agentLoop() {
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
const history: OpenAI.ChatCompletionMessageParam[] = [];
console.log('Agent ready. Ask anything about the conversation.\n');
for await (const question of rl) {
history.push({ role: 'user', content: question });
for (let i = 0; i < MAX_ITERATIONS; i++) {
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: `You are a helpful travel agent with access to a structured memory system.
Use the nlq tool to ask MINNS questions.
Use search_claims when you need specific extracted facts.
Only answer the user when you have enough information.
If the answer is incomplete, query again.`,
},
...history,
],
tools: TOOLS,
});
const message = response.choices[0].message;
history.push(message);
if (!message.tool_calls?.length) {
console.log(`\nAgent: ${message.content}\n`);
break;
}
for (const toolCall of message.tool_calls) {
const args = JSON.parse(toolCall.function.arguments);
console.log(` [${toolCall.function.name}]: ${args.question || args.query}`);
const result = await executeTool(toolCall.function.name, args);
history.push({
role: 'tool',
tool_call_id: toolCall.id,
content: result,
});
}
}
}
}
async function main() {
await ingest();
await agentLoop();
}
main();
Why this is different from RAG
A RAG-based agent usually works like this:
- embed the query
- retrieve top-k chunks
- put those chunks into the prompt
- ask the LLM to infer the answer
That works well for knowledge-base search. It works less well for evolving user context.
MINNS changes the shape of the tool call.
Instead of returning text passages, it returns answers and facts grounded in a structured memory model. That means the LLM is doing less reconstruction from raw text and more decision-making over stateful context.
That matters when the memory is dynamic.
If the user updates their budget from €5,000 to €7,000, that is not just another sentence in a document. It is a state transition. The system needs to understand that the current value supersedes the old one.
That is the difference between retrieval and cognition.
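As a toy illustration of that supersession, consider budget claims carrying a timestamp. The data shape here is invented for the sketch, not the MINNS data model.

```typescript
// Two budget claims over time: the later one supersedes the earlier one.
interface BudgetClaim {
  value: number; // budget in euros
  at: number;    // logical timestamp of when the claim was made
}

const budgetHistory: BudgetClaim[] = [
  { value: 5000, at: 1 }, // original budget
  { value: 7000, at: 2 }, // user updated it later
];

// A structured memory resolves the current state deterministically:
// the latest claim wins.
const currentBudget = budgetHistory
  .reduce((a, b) => (b.at > a.at ? b : a))
  .value;

// Plain retrieval, by contrast, would return both passages and leave
// the LLM to guess which value is current.
```

The reconciliation logic here is trivial on purpose; the point is that it happens in the memory layer, not in the prompt.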
Where to take this next
This example is intentionally small, but the pattern scales.
From here, you can extend the agent with:
- reflection steps between tool calls
- goal tracking
- multi-turn planning
- frontend visibility into reasoning steps
- temporal queries over evolving state
- structured memory types such as ledgers, preferences, and state machines
At that point, you are no longer building a chatbot with retrieval attached.
You are building an agent that can think over memory.