About a year and a half ago, we were building a proactive AI assistant.
Not just a chatbot, but something that could actually act on your behalf.
It could reply to emails in your tone, move calendar events, organize your inbox, and surface information based on what you actually care about.
The goal was simple:
build something that feels like an extension of how you think.
The part we didn’t expect
To make that work, we started with what most people use today: RAG.
And to be fair, RAG works.
You can go pretty far with chunking, embeddings, and retrieval.
You can build systems that feel smart.
But as the assistant got more complex, something started to break.
Not in an obvious way.
It was more subtle.
The system could retrieve relevant information,
but it didn’t really understand how things were connected.
Everything was based on similarity.
And similarity is not structure.
Building a "brain"
To move forward, we needed something else.
We started building what we internally called a "brain".
A layer responsible for:
- extracting meaning from data
- connecting concepts together
- maintaining a consistent structure over time
At the beginning, it was just a supporting component for the assistant.
But the deeper we went, the more it became clear:
this was the real problem.
About 7 months ago, we made a decision:
we stopped focusing on the assistant itself
and went all-in on this layer.
That became BrainAPI.
From retrieval to structure
The shift can be summarized like this.
Typical RAG pipeline:
chunk -> embed -> retrieve -> generate
What we moved toward:
ingest -> extract -> connect -> graph -> query
Instead of treating data as independent chunks,
we process it into a structured representation of entities and relationships.
In practice:
- documents are parsed into concepts
- relationships are extracted and normalized
- everything is stored in a graph + vector layer
Vectors are still useful,
but they are no longer the primary abstraction.
The graph is.
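As a toy illustration of that ingest -> extract -> connect flow (not BrainAPI's actual implementation — the extraction stub, relation names, and adjacency-list storage below are all invented for the sketch; a real system would use an LLM or NER model for extraction):

```python
# Toy sketch of an extract -> connect -> graph pipeline.
# The extractor is a hard-coded stub so the example runs standalone.

from collections import defaultdict


def extract_triples(document: str) -> list[tuple[str, str, str]]:
    # Stand-in for model-based entity/relation extraction.
    if "Ada Lovelace" in document:
        return [
            ("Ada Lovelace", "worked_with", "Charles Babbage"),
            ("Charles Babbage", "designed", "Analytical Engine"),
            ("Ada Lovelace", "wrote_notes_on", "Analytical Engine"),
        ]
    return []


def build_graph(documents: list[str]) -> dict[str, list[tuple[str, str]]]:
    # Adjacency list: entity -> [(relation, neighbor), ...]
    # Inverse edges are added so the graph can be walked in both directions.
    graph: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for doc in documents:
        for subj, rel, obj in extract_triples(doc):
            graph[subj].append((rel, obj))
            graph[obj].append((f"inverse_{rel}", subj))
    return dict(graph)


docs = ["Ada Lovelace worked with Charles Babbage on the Analytical Engine."]
graph = build_graph(docs)
print(graph["Ada Lovelace"])
```

The point of the sketch: the unit of storage is an entity with typed edges, not a text chunk with an embedding — the vector index can sit alongside this, but it's no longer the thing you query first.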
What changes in practice
This changes how you interact with data.
Instead of asking:
"what text is similar to this query?"
You can ask:
- what entities are involved?
- how are they connected?
- what paths exist between concepts?
- what else is related in this context?
Retrieval becomes navigation.
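A minimal sketch of what "retrieval becomes navigation" can mean in practice: a breadth-first search for a path between two concepts in a small hand-built graph (the entities and edges here are invented for the example):

```python
from collections import deque

# Tiny hand-built concept graph: entity -> list of connected entities.
graph = {
    "invoice": ["customer", "payment"],
    "customer": ["invoice", "account"],
    "payment": ["invoice", "account"],
    "account": ["customer", "payment", "subscription"],
    "subscription": ["account"],
}


def find_path(graph, start, goal):
    """Return one shortest path of entities from start to goal, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(find_path(graph, "invoice", "subscription"))
# → ['invoice', 'customer', 'account', 'subscription']
```

A similarity search can tell you that "invoice" and "subscription" texts look alike; a path query tells you *how* they're connected, which is a different kind of answer.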
Where this approach helps
We found this particularly useful when:
- context spans multiple sources and periods of time
- relationships matter more than keywords
- consistency is important (not just relevance)
Some practical use cases:
- recommendation systems (ecommerce, social)
- search systems that go beyond keyword matching
- persistent memory for agents and chatbots
- more reliable RAG setups in complex domains
Exploring "polarities"
One interesting direction we’ve been exploring is something we call polarities.
Instead of returning a single "best" answer,
the system can surface a range of possible solutions around a problem,
based on how concepts relate in the graph.
It’s less about ranking results,
and more about exploring a solution space.
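The intuition can be sketched (loudly hedged: the graph, the traversal, and the grouping below are an invented toy, not how BrainAPI implements polarities): instead of returning the single top-ranked node, return one cluster of answers per distinct "direction" branching off the query concept.

```python
# Toy "polarities" sketch: group candidates by which branch of the graph
# they sit on relative to the query node, then surface one cluster per
# branch instead of a single top-ranked result.

graph = {
    "caching": ["redis", "cdn", "memoization"],
    "redis": ["in_memory_store"],
    "cdn": ["edge_delivery"],
    "memoization": ["function_level_cache"],
}


def polarities(graph, query):
    # Each direct neighbor of the query opens a distinct direction;
    # collect everything reachable down that branch as one solution cluster.
    clusters = {}
    for branch in graph.get(query, []):
        members = [branch]
        stack = list(graph.get(branch, []))
        while stack:
            node = stack.pop()
            members.append(node)
            stack.extend(graph.get(node, []))
        clusters[branch] = members
    return clusters


for branch, members in polarities(graph, "caching").items():
    print(branch, "->", members)
# redis -> ['redis', 'in_memory_store']
# cdn -> ['cdn', 'edge_delivery']
# memoization -> ['memoization', 'function_level_cache']
```

The output is a map of the solution space around "caching" rather than a ranked list — which is the shift the post is describing.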
Why this matters
At Lumen Labs (our startup), this direction came from a broader observation.
AI systems today are powerful,
but they are also fragile in how they represent knowledge.
They retrieve well.
They generate well.
But they don’t really ground information in a consistent structure.
And that’s where a lot of issues come from,
especially when accuracy actually matters.
If we want systems that people can rely on,
we need something closer to a structured memory layer.
Open sourcing it
We’ve been using this approach in production for a few B2B use cases,
but never exposed it publicly.
Now we’re opening it up.
- the core is open source
- it can run fully locally (we’ve tested it with Ollama + offline setups)
- or be deployed as managed instances in the cloud
- it’s extensible via a plugin system
Closing thoughts
We don’t think this replaces RAG.
But it feels like RAG is one component of a bigger system,
not the system itself.
After spending the last year and a half building on top of AI systems,
this "memory layer" is the piece that felt missing.
Curious to hear how others are approaching this,
especially if you’ve hit similar limitations with chunk-based retrieval.
Links
- Repo: https://github.com/Lumen-Labs/brainapi2
- Website / Video: https://brain-api.dev