Most AI memory systems are still doing the same thing:
- chunk text
- embed it
- retrieve something vaguely similar later
- hope the model sorts it out
That works for document search.
It breaks when memory needs to track people, relationships, constraints, and state changes over time.
If a user says:
- my budget is €5,000
- actually now it’s €7,000
- my daughter has a nut allergy
- we’re flying from Manchester
you do not want “similar chunks”.
You want the current state, the connected facts, and the ability to reason across them.
That is the gap between retrieval and memory.
And it is why we built MINNS.
MINNS is a memory system for agents that builds a context graph from conversations and lets you query that graph directly. Not raw text retrieval. Not chunk stuffing. Structured memory that can handle multi-hop reasoning and state changes properly.
The best part is how little setup it takes.
Three lines to get started
```typescript
import { MinnsClient } from 'minns-sdk';

const client = new MinnsClient({ apiKey: 'your-api-key' });
const answer = await client.query("What's the current trip budget?");
```
That is the core idea.
Three lines, and you are talking to a memory system designed for actual agent context, not just vector retrieval with better branding.
Why this matters
Most so-called memory layers still behave like search.
They can find a sentence that mentions “budget”.
They can find another sentence that mentions “allergy”.
But they do not naturally understand that:
- Lily is the daughter
- Lily has the nut allergy
- the budget changed
- the newer budget supersedes the old one
- Manchester is the departure location for this specific trip context
That is where a lot of agent systems quietly fall apart.
They look good in demos.
They become brittle the second memory has to evolve.
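The failure mode is easy to reproduce with a toy scorer. Here is a minimal sketch, using word overlap as a stand-in for embedding similarity (real systems use vectors, but the problem is the same): the stale €5,000 fact outranks the updated one, and nothing in the scores records that it was superseded.

```typescript
// Toy stand-in for embedding similarity: the fraction of query words
// that also appear in the chunk. Illustrative only.
const chunks = [
  'Budget is about 5000 euros.',
  'Actually, make that 7000 euros.',
  'My daughter Lily is allergic to nuts.',
];

function overlapScore(query: string, chunk: string): number {
  const queryWords = query.toLowerCase().split(/\W+/).filter(Boolean);
  const chunkWords = new Set(chunk.toLowerCase().split(/\W+/));
  return queryWords.filter(w => chunkWords.has(w)).length / queryWords.length;
}

const ranked = chunks
  .map(chunk => ({ chunk, score: overlapScore('budget euros', chunk) }))
  .sort((a, b) => b.score - a.score);

// The outdated budget wins on pure similarity: it mentions "budget"
// literally, and no score encodes that 7000 replaced 5000.
console.log(ranked);
```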
MINNS was built to solve that properly.
You ingest conversations; MINNS extracts structured facts, resolves entities, tracks state transitions, and builds a context graph that your agent can query.
So instead of asking an LLM to reconstruct truth from a pile of text, you query memory as memory.
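To make "structured facts" concrete, here is a hypothetical sketch of the kind of shape extracted claims might take. The field names and `Claim` type are my illustration, not the actual MINNS schema:

```typescript
// Illustrative only: a plausible shape for structured memory claims.
// These field names are assumptions, not the real MINNS data model.
type Claim = {
  subject: string;      // resolved entity, e.g. "Lily", not "my daughter"
  predicate: string;    // relation or attribute, e.g. "has_allergy"
  object: string;       // value, e.g. "nuts"
  timestamp: number;    // when the fact was asserted
};

const claims: Claim[] = [
  { subject: 'trip', predicate: 'budget', object: '5000 EUR', timestamp: 1 },
  { subject: 'trip', predicate: 'budget', object: '7000 EUR', timestamp: 2 },
  { subject: 'Lily', predicate: 'has_allergy', object: 'nuts', timestamp: 3 },
  { subject: 'family', predicate: 'departs_from', object: 'Manchester Airport', timestamp: 4 },
];

// A multi-hop question like "does the daughter have dietary restrictions?"
// walks family -> Lily -> has_allergy instead of matching raw text.
console.log(claims.filter(c => c.subject === 'Lily'));
```

Once facts have this shape, supersession and multi-hop questions become graph operations rather than prompt engineering.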
A minimal example
Let’s say we ingest a short travel-planning conversation.
```typescript
import { MinnsClient } from 'minns-sdk';

const client = new MinnsClient({
  apiKey: process.env.MINNS_API_KEY!,
});

await client.ingestConversations({
  case_id: 'holiday-planning',
  sessions: [
    {
      session_id: 's1',
      topic: 'trip',
      messages: [
        { role: 'user', content: "I'm planning a trip to the Amalfi Coast for my family." },
        { role: 'user', content: 'Budget is about 5000 euros.' },
        { role: 'user', content: 'Actually, make that 7000 euros.' },
        { role: 'user', content: 'My daughter Lily is allergic to nuts.' },
        { role: 'user', content: 'We live in Manchester, so flights from Manchester Airport.' },
      ],
    },
  ],
});
```
Now query it in natural language:
```typescript
const budget = await client.query("What's the current budget?");
console.log(budget);

const allergy = await client.query('Does anyone have dietary restrictions?');
console.log(allergy);

const flights = await client.query('Where should they fly from?');
console.log(flights);
```
That is already a very different developer experience from bolting together embeddings, retrieval, filtering, and prompt gymnastics.
Search specific claims too
Sometimes you want direct fact search rather than a composed answer.
You can do that too:
```typescript
const claims = await client.searchClaims({
  queryText: 'allergies',
});
console.log(claims);
```
So you have both levels:
- `query()` for memory-aware answers across the graph
- `searchClaims()` for lower-level fact inspection
That is a much better shape for agent systems.
Your LLM can use memory as a tool instead of pretending a prompt full of retrieved chunks is a memory architecture.
Why we built it this way
We built MINNS because too much of the “AI memory” ecosystem still treats memory as a retrieval problem.
It is not.
Real memory has to deal with:
- entity resolution
- connected facts
- evolving user state
- supersession
- multi-hop reasoning
If a system cannot handle those reliably, it is not really memory. It is search.
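Supersession in particular is easy to state and easy for retrieval to miss. A minimal sketch of the rule, as my own illustration rather than MINNS internals: among claims about the same subject and attribute, the latest assertion wins.

```typescript
// Illustrative supersession rule, not MINNS internals:
// the newest assertion about a (subject, attribute) pair is the current state.
type Fact = { subject: string; attribute: string; value: string; timestamp: number };

function currentValue(facts: Fact[], subject: string, attribute: string): string | undefined {
  return facts
    .filter(f => f.subject === subject && f.attribute === attribute)
    .sort((a, b) => b.timestamp - a.timestamp)[0]?.value;
}

const facts: Fact[] = [
  { subject: 'trip', attribute: 'budget', value: '5000 EUR', timestamp: 1 },
  { subject: 'trip', attribute: 'budget', value: '7000 EUR', timestamp: 2 },
];

console.log(currentValue(facts, 'trip', 'budget')); // prints: 7000 EUR
```

Plain retrieval has no equivalent of this rule: it can surface both budget sentences, but nothing tells the model which one still holds.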
That distinction shows up fast when you benchmark.
MINNS performs strongly on multi-hop and state-change benchmarks, which are exactly the cases where naive retrieval starts to crack. That matters because these are not edge cases. They are the normal shape of user context in any serious agent: a user changes plans, updates constraints, refers to family members indirectly, or asks something that requires multiple connected facts to answer.
That is not exotic. That is Tuesday.
Why this is useful for agents
Once memory is queryable like this, your agent loop gets much simpler.
Instead of:
- retrieve chunks
- stuff them into the prompt
- ask the model to work out what matters
- hope it prefers the newer fact to the older one
you can do:
- ask memory directly
- inspect the returned answer or claims
- continue reasoning only if needed
That is a much cleaner architecture.
Your LLM becomes the orchestrator.
MINNS becomes the memory layer.
Each part does its own job.
A simple agent pattern
Here is the shape:
```typescript
const userQuestion = 'What should I know before booking restaurants?';
const memoryAnswer = await client.query(userQuestion);
console.log(memoryAnswer);
```
If you are building a ReAct-style agent, `query()` becomes one of the most useful tools in the loop.
The model sees the question, decides it needs memory, calls query(), gets back context-aware output, then either answers or keeps going.
That is far better than making the model reconstruct reality from raw retrieval results.
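As a sketch of that loop, here is one way to expose memory as a tool. The tool-definition shape and the `MemoryClient` interface are illustrative stand-ins, since the real format depends on your LLM framework, and the stub lets the example run without an API key:

```typescript
// Illustrative tool wrapper. `MemoryClient` is a stand-in interface,
// not the real MinnsClient type; the tool shape depends on your framework.
interface MemoryClient {
  query(question: string): Promise<string>;
}

function makeMemoryTool(client: MemoryClient) {
  return {
    name: 'query_memory',
    description: 'Look up current user context: state, constraints, relationships.',
    run: (question: string) => client.query(question),
  };
}

// Stubbed client so the sketch is runnable as-is.
const stub: MemoryClient = {
  query: async q => `stubbed answer for: ${q}`,
};

const tool = makeMemoryTool(stub);
tool.run('What is the current budget?').then(console.log);
// prints: stubbed answer for: What is the current budget?
```

In a real loop you would pass `tool.name` and `tool.description` to the model's tool-calling interface and dispatch to `tool.run()` when the model asks for memory.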
The setup is the point
A lot of memory tooling asks developers to accept complexity up front:
- ingestion pipelines
- retrieval tuning
- schema work
- orchestration glue
- ranking hacks
- prompt layering to paper over retrieval flaws
MINNS is opinionated in the opposite direction.
The setup should be simple.
The hard part should happen inside the memory system.
That is why this is such a useful starting point:
```typescript
import { MinnsClient } from 'minns-sdk';

const client = new MinnsClient({ apiKey: 'your-api-key' });
const answer = await client.query("What's the budget?");
```
Three lines.
And behind those three lines is a memory system built for the exact problems most agent stacks still struggle with.
Final thought
RAG helped AI systems search text.
It did not solve memory.
If you want agents that can handle real user context, especially multi-hop questions and state changes, you need something better than chunk retrieval.
That is what MINNS is for.
Full quickstart
```typescript
import { MinnsClient } from 'minns-sdk';

const client = new MinnsClient({
  apiKey: process.env.MINNS_API_KEY!,
});

await client.ingestConversations({
  case_id: 'holiday-planning',
  sessions: [
    {
      session_id: 's1',
      topic: 'trip',
      messages: [
        { role: 'user', content: "I'm planning a trip to the Amalfi Coast for my family." },
        { role: 'user', content: 'Budget is about 5000 euros.' },
        { role: 'user', content: 'Actually, make that 7000 euros.' },
        { role: 'user', content: 'My daughter Lily is allergic to nuts.' },
        { role: 'user', content: 'We live in Manchester, so flights from Manchester Airport.' },
      ],
    },
  ],
});

const budget = await client.query("What's the current budget?");
console.log('Budget:', budget);

const claims = await client.searchClaims({
  queryText: 'allergies',
});
console.log('Claims:', claims);
```