RAG helped LLMs search text. It did not give them memory.
If you've built anything serious with LLMs, you've seen the failure mode. A user tells your system they are allergic to nuts, live in Manchester, and have a budget of five grand. Later, the model either forgets, retrieves conflicting fragments, or guesses.
That is not memory. It is retrieval.
Most teams start with RAG. That makes sense. It is useful, familiar, and easy to bolt on. But RAG is fundamentally a document retrieval pattern. It chunks text, embeds it, and returns similar passages at query time. It can surface relevant information, but it does not actually model entities, relationships, or state changes.
If a user says their budget is €5,000 and later updates it to €7,000, a RAG system will often retrieve both passages and leave the LLM to reconcile them. Sometimes it gets that right. Sometimes it does not.
The next step beyond RAG was agentic memory.
Instead of storing raw text and hoping retrieval is enough, agentic memory systems extract structured knowledge that persists across sessions. They track people, facts, relationships, preferences, and constraints as first-class objects. The system does not just find a sentence mentioning Tom. It knows Tom is Sarah’s husband, Lily has a nut allergy, and the family is flying from Manchester.
But even that is not the full story.
Agentic cognition is what happens when the system does more than store and retrieve structured knowledge. It reasons over it. It combines multiple memory structures, resolves entities, tracks state transitions, and produces answers that are grounded in context rather than copied from stored text.
That is where MINNS sits.
You feed it conversations. It builds a context graph. When you query it with nlq(), it does not just retrieve text. It resolves the entities involved, reasons across the graph, and returns a direct answer from the current state of that context.
In this tutorial, we will build a ReAct-style agent with MINNS.
Not an API wrapper. An actual agent.
One that decides what it needs to know, queries a context graph, observes the result, and loops until it has enough information to answer properly.
What we are building
We are going to build a small TypeScript agent that:
- ingests a conversation into MINNS
- gives an LLM access to MINNS as a tool
- lets the LLM query memory during reasoning
- loops until it has enough information to answer
The result is a simple but real agent loop. The model does not just generate a response from a prompt. It thinks, acts, observes, and only then answers.
The ReAct pattern
If you have built LLM agents before, you have probably seen ReAct. It is the standard loop for tool-using agents:
- Think: decide what information is needed
- Act: call a tool
- Observe: inspect the result
- Repeat or respond: either continue reasoning or answer
This is what separates an agent from a basic chatbot. The model is not just generating text. It is deciding what it needs, fetching that information, evaluating the result, and then deciding what to do next.
In a RAG-based system, the tool usually returns text chunks.
With MINNS, the tool returns answers grounded in a structured, evolving context.
That is the key difference. The system is not forcing the LLM to reconstruct truth from loosely related passages. It is querying a context graph that already models who, what, how, and when.
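The four steps above can be sketched as a generic loop. This is only an illustrative shape (the `Step` type, `decide`, and `callTool` names are invented for this sketch, not part of any SDK); the concrete implementation follows in Step 5.

```typescript
// A minimal sketch of the ReAct loop shape.
// `decide` stands in for the LLM, `callTool` for the tool layer.
type Step =
  | { kind: 'act'; tool: string; args: unknown }     // model wants a tool call
  | { kind: 'respond'; answer: string };             // model is ready to answer

async function react(
  decide: (observations: string[]) => Promise<Step>,
  callTool: (tool: string, args: unknown) => Promise<string>,
  maxSteps = 5,
): Promise<string> {
  const observations: string[] = [];
  for (let i = 0; i < maxSteps; i++) {
    const step = await decide(observations);          // Think
    if (step.kind === 'respond') return step.answer;  // Respond
    const result = await callTool(step.tool, step.args); // Act
    observations.push(result);                        // Observe, then repeat
  }
  return 'Ran out of reasoning steps.';
}
```

The cap on `maxSteps` matters in practice: without it, a model that keeps calling tools can loop forever.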
Prerequisites
You will need:
- Node.js 18+
- a MINNS API key
- an OpenAI API key
- basic TypeScript familiarity
Step 1: Project setup
Create a new project and install the dependencies:
mkdir my-agent && cd my-agent
npm init -y
npm install minns-sdk openai dotenv
npm install -D typescript tsx @types/node
Note that readline does not need installing: it is a Node.js builtin, and the npm package of the same name is an unrelated stub.
Create tsconfig.json:
{
"compilerOptions": {
"target": "ES2022",
"module": "ESNext",
"moduleResolution": "bundler",
"esModuleInterop": true,
"strict": true,
"outDir": "dist"
},
"include": ["src"]
}
Create .env:
MINNS_API_KEY=your_key_here
OPENAI_API_KEY=your_key_here
Step 2: Set up the clients
Create src/agent.ts:
import { MinnsClient } from 'minns-sdk';
import OpenAI from 'openai';
import dotenv from 'dotenv';
import * as readline from 'readline';
dotenv.config();
const minns = new MinnsClient({
apiKey: process.env.MINNS_API_KEY!,
});
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY!,
});
We are using two clients:
- MINNS for memory and context reasoning
- OpenAI for the agent loop itself
The LLM decides what to ask. MINNS provides the structured context.
Step 3: Ingest a conversation
Before the agent can reason over anything, we need memory to exist.
Here we ingest a short travel-planning conversation into MINNS:
async function ingest() {
await minns.ingestConversations({
case_id: 'travel-booking-2024',
sessions: [{
session_id: 'session-1',
topic: 'holiday-planning',
messages: [
{ role: 'user', content: "Hi, I'm planning a trip to the Amalfi Coast for my family." },
{ role: 'assistant', content: "Lovely! When are you thinking of traveling?" },
{ role: 'user', content: "Late June. Budget is about 5000 euros." },
{ role: 'assistant', content: "How many people?" },
{ role: 'user', content: "Me, my husband Tom, and our kids Lily who is 8 and Max who is 5." },
{ role: 'assistant', content: "Any dietary requirements?" },
{ role: 'user', content: "Yes — Lily is allergic to nuts." },
{ role: 'user', content: "Oh and we live in Manchester, so flights from Manchester Airport please." },
],
}],
});
console.log('Conversation ingested.');
}
This is where MINNS starts to differ from plain retrieval.
It is not just storing messages. It is extracting structured facts and linking them into a context graph. That means the system can represent things like:
- Lily is a child in the travelling group
- Lily has a nut allergy
- Tom is the user’s husband
- the family departs from Manchester
- the current trip budget is €5,000
Those are not just words in a chunk. They are connected pieces of state.
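To make "connected pieces of state" concrete, here is a toy illustration of what linked claims could look like as data. The field names and shape are assumptions for this sketch, not the actual MINNS schema.

```typescript
// Hypothetical claim triples, invented for illustration only.
interface Claim {
  subject: string;
  predicate: string;
  object: string;
}

const claims: Claim[] = [
  { subject: 'Lily', predicate: 'has_allergy', object: 'nuts' },
  { subject: 'Tom', predicate: 'spouse_of', object: 'user' },
  { subject: 'family', predicate: 'departs_from', object: 'Manchester' },
  { subject: 'trip', predicate: 'budget_eur', object: '5000' },
];

// Because the facts are structured, a query can filter and join them
// instead of hoping the right sentence lands in a retrieved chunk:
const allergies = claims
  .filter(c => c.predicate === 'has_allergy')
  .map(c => `${c.subject} is allergic to ${c.object}`);

console.log(allergies); // e.g. a list of allergy statements
```

The point of the triple form is that "Lily" in one claim and "Lily" in another are the same resolved entity, so reasoning can traverse between them.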
Step 4: Define the tools
Now we expose MINNS to the LLM through tools.
We will give the model two capabilities:
- `nlq` to ask MINNS a natural-language question
- `search_claims` to inspect extracted facts directly
const TOOLS = [
{
type: 'function' as const,
function: {
name: 'nlq',
description: 'Ask MINNS a question. It reasons over the context graph and returns a direct answer.',
parameters: {
type: 'object',
properties: {
question: { type: 'string' }
},
required: ['question'],
},
},
},
{
type: 'function' as const,
function: {
name: 'search_claims',
description: 'Search extracted claims and facts from the conversation.',
parameters: {
type: 'object',
properties: {
query: { type: 'string' }
},
required: ['query'],
},
},
},
];
Next, we implement the bridge between the LLM and MINNS:
async function executeTool(name: string, args: Record<string, string>): Promise<string> {
if (name === 'nlq') {
const result = await minns.nlq({ question: args.question });
return JSON.stringify({
answer: result.answer,
confidence: result.confidence,
entities_resolved: result.entities_resolved,
});
}
if (name === 'search_claims') {
const results = await minns.searchClaims({
query_text: args.query,
top_k: 5,
});
return JSON.stringify(results.map(r => ({
fact: r.claim_text,
confidence: r.confidence,
entity: r.subject_entity,
})));
}
return '{}';
}
A few things are happening here:
- `nlq()` gives the model a direct way to query the context graph
- `searchClaims()` gives it a lower-level fact lookup tool
- both tool results are returned as JSON so the LLM can reason over them in the next step
This is the important architectural point: the LLM is not scraping meaning from raw message history. It is interacting with a structured memory system.
Step 5: Build the ReAct loop
Now we implement the agent loop itself.
This is where the model gets to think, decide which tool to use, observe the results, and either continue or answer.
const MAX_ITERATIONS = 5;
async function agentLoop() {
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
const history: OpenAI.ChatCompletionMessageParam[] = [];
console.log('Agent ready. Ask anything about the conversation.\n');
for await (const question of rl) {
history.push({ role: 'user', content: question });
for (let i = 0; i < MAX_ITERATIONS; i++) {
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: `You are a helpful travel agent with access to a structured memory system.
Use the nlq tool to ask MINNS questions.
Use search_claims when you need specific extracted facts.
Only answer the user when you have enough information.
If the answer is incomplete, query again.`,
},
...history,
],
tools: TOOLS,
});
const message = response.choices[0].message;
history.push(message);
if (!message.tool_calls?.length) {
console.log(`\nAgent: ${message.content}\n`);
break;
}
for (const toolCall of message.tool_calls) {
const args = JSON.parse(toolCall.function.arguments);
console.log(` [${toolCall.function.name}]: ${args.question || args.query}`);
const result = await executeTool(toolCall.function.name, args);
history.push({
role: 'tool',
tool_call_id: toolCall.id,
content: result,
});
}
}
}
}
This loop is the core of the agent.
On each iteration:
- the model sees the user’s question and the previous tool outputs
- it decides whether it needs to call a tool
- if it does, the tool result is added back into the history
- the model thinks again using the new information
- once it has enough context, it responds
That is ReAct in practice.
The model might answer in one step for a simple question.
For a harder question, it might:
- query dietary constraints
- inspect family structure
- check ages of the children
- then combine all of that into a final answer
The agent decides how many steps it needs.
Step 6: Run the agent
Now wire it together:
async function main() {
await ingest();
await agentLoop();
}
main();
Start the script:
npx tsx src/agent.ts
Example session:
Conversation ingested.
Agent ready. Ask anything about the conversation.
> What should I know before booking restaurants?
[nlq]: Are there any dietary requirements for this trip?
[search_claims]: children ages
Agent: Lily has a nut allergy, so restaurant choices need to be allergy-safe and staff should be able to confirm ingredients clearly. The family is also travelling with two young children, ages 8 and 5, so family-friendly options would make sense. You should also keep total dining spend in line with the overall trip budget of around €5,000.
> Where are they flying from?
[nlq]: Where is the family departing from?
Agent: They are flying from Manchester Airport.
Notice what happened in the first example.
The model did not answer immediately. It realised it needed more context, used two tools, inspected the results, and then produced a joined-up response. That is the difference between tool use and simple prompting.
Full file
Here is the full working example in one file:
import { MinnsClient } from 'minns-sdk';
import OpenAI from 'openai';
import dotenv from 'dotenv';
import * as readline from 'readline';
dotenv.config();
const minns = new MinnsClient({
apiKey: process.env.MINNS_API_KEY!,
});
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY!,
});
const TOOLS = [
{
type: 'function' as const,
function: {
name: 'nlq',
description: 'Ask MINNS a question. It reasons over the context graph and returns a direct answer.',
parameters: {
type: 'object',
properties: {
question: { type: 'string' },
},
required: ['question'],
},
},
},
{
type: 'function' as const,
function: {
name: 'search_claims',
description: 'Search extracted claims and facts from the conversation.',
parameters: {
type: 'object',
properties: {
query: { type: 'string' },
},
required: ['query'],
},
},
},
];
async function executeTool(name: string, args: Record<string, string>): Promise<string> {
if (name === 'nlq') {
const result = await minns.nlq({ question: args.question });
return JSON.stringify({
answer: result.answer,
confidence: result.confidence,
entities_resolved: result.entities_resolved,
});
}
if (name === 'search_claims') {
const results = await minns.searchClaims({
query_text: args.query,
top_k: 5,
});
return JSON.stringify(results.map(r => ({
fact: r.claim_text,
confidence: r.confidence,
entity: r.subject_entity,
})));
}
return '{}';
}
async function ingest() {
await minns.ingestConversations({
case_id: 'travel-booking-2024',
sessions: [{
session_id: 'session-1',
topic: 'holiday-planning',
messages: [
{ role: 'user', content: "Hi, I'm planning a trip to the Amalfi Coast for my family." },
{ role: 'assistant', content: "Lovely! When are you thinking of traveling?" },
{ role: 'user', content: "Late June. Budget is about 5000 euros." },
{ role: 'assistant', content: "How many people?" },
{ role: 'user', content: "Me, my husband Tom, and our kids Lily who is 8 and Max who is 5." },
{ role: 'assistant', content: "Any dietary requirements?" },
{ role: 'user', content: "Yes — Lily is allergic to nuts." },
{ role: 'user', content: "Oh and we live in Manchester, so flights from Manchester Airport please." },
],
}],
});
console.log('Conversation ingested.');
}
const MAX_ITERATIONS = 5;
async function agentLoop() {
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
const history: OpenAI.ChatCompletionMessageParam[] = [];
console.log('Agent ready. Ask anything about the conversation.\n');
for await (const question of rl) {
history.push({ role: 'user', content: question });
for (let i = 0; i < MAX_ITERATIONS; i++) {
const response = await openai.chat.completions.create({
model: 'gpt-4o-mini',
messages: [
{
role: 'system',
content: `You are a helpful travel agent with access to a structured memory system.
Use the nlq tool to ask MINNS questions.
Use search_claims when you need specific extracted facts.
Only answer the user when you have enough information.
If the answer is incomplete, query again.`,
},
...history,
],
tools: TOOLS,
});
const message = response.choices[0].message;
history.push(message);
if (!message.tool_calls?.length) {
console.log(`\nAgent: ${message.content}\n`);
break;
}
for (const toolCall of message.tool_calls) {
const args = JSON.parse(toolCall.function.arguments);
console.log(` [${toolCall.function.name}]: ${args.question || args.query}`);
const result = await executeTool(toolCall.function.name, args);
history.push({
role: 'tool',
tool_call_id: toolCall.id,
content: result,
});
}
}
}
}
async function main() {
await ingest();
await agentLoop();
}
main();
Why this is different from RAG
A RAG-based agent usually works like this:
- embed the query
- retrieve top-k chunks
- put those chunks into the prompt
- ask the LLM to infer the answer
That works well for knowledge-base search. It works less well for evolving user context.
MINNS changes the shape of the tool call.
Instead of returning text passages, it returns answers and facts grounded in a structured memory model. That means the LLM is doing less reconstruction from raw text and more decision-making over stateful context.
That matters when the memory is dynamic.
If the user updates their budget from €5,000 to €7,000, that is not just another sentence in a document. It is a state transition. The system needs to understand that the current value supersedes the old one.
That is the difference between retrieval and cognition.
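As a toy illustration of that supersession, consider budget claims carrying a timestamp. The data shape here is invented for the sketch, not the MINNS data model.

```typescript
// Two budget claims over time: the later one supersedes the earlier one.
interface BudgetClaim {
  value: number; // budget in euros
  at: number;    // logical timestamp of when the claim was made
}

const budgetHistory: BudgetClaim[] = [
  { value: 5000, at: 1 }, // original budget
  { value: 7000, at: 2 }, // user updated it later
];

// A structured memory resolves the current state deterministically:
// the latest claim wins.
const currentBudget = budgetHistory
  .reduce((a, b) => (b.at > a.at ? b : a))
  .value;

// Plain retrieval, by contrast, would return both passages and leave
// the LLM to guess which value is current.
```

The reconciliation logic here is trivial on purpose; the point is that it happens in the memory layer, not in the prompt.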
Where to take this next
This example is intentionally small, but the pattern scales.
From here, you can extend the agent with:
- reflection steps between tool calls
- goal tracking
- multi-turn planning
- frontend visibility into reasoning steps
- temporal queries over evolving state
- structured memory types such as ledgers, preferences, and state machines
At that point, you are no longer building a chatbot with retrieval attached.
You are building an agent that can think over memory.