Building a Local RAG + Memory System with an Embedded Database
When building RAG or agent-style AI applications, we often end up with the same stack:
- SQL for structured data
- a vector database for embeddings
- custom glue code to assemble context
- extra logic to track memory
This works — but it quickly becomes hard to reason about, especially in local-first setups.
In this post, I’ll walk through a real, minimal example of building a local RAG + memory system using a single embedded database, based on patterns we’ve been using in SochDB.
No cloud services. No external vector DBs.
What we’re building
A simple local AI assistant backend that can:
- Store documents with metadata
- Retrieve relevant context by meaning (RAG)
- Persist memory across interactions
- Run entirely locally
This pattern applies to:
- internal assistants
- developer copilots
- knowledge-base chat
- offline or privacy-sensitive AI apps
Setup
Install the database:
```bash
npm install sochdb
```
Create a database file locally:
```js
import { SochDB } from "sochdb";

const db = new SochDB("assistant.db");
```
That’s it — no server, no config.
Step 1: Ingest documents (structured data + vectors)
Each record stores:
- raw text
- embedding
- structured metadata
All in one place.
```js
await db.insert({
  id: "doc-1",
  source: "internal-docs",
  text: "SochDB combines SQL, vector search, and AI context",
  embedding: embed("SochDB combines SQL, vector search, and AI context"),
  tags: ["architecture", "database"]
});
```
There’s no separate ingestion pipeline.
No sync between SQL rows and vector IDs.
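The `embed()` helper above is deliberately left abstract. As a minimal sketch of one fully local way to implement it, assuming transformers.js (`@xenova/transformers`) and the all-MiniLM-L6-v2 model (both my assumptions, not anything SochDB requires), it could look like this:

```js
// Sketch of a local embed() helper using transformers.js -- an assumption,
// not part of SochDB. Any function that returns a plain number[] works.
import { pipeline } from "@xenova/transformers";

// Load the embedding model once and reuse it across calls.
const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

async function embed(text) {
  // Mean-pool token embeddings and normalize into one fixed-length vector.
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data);
}
```

With an async helper like this, the call sites above become `embedding: await embed(text)`.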
Step 2: Retrieve context for a query (RAG)
When a user asks a question:
```js
const context = await db.query({
  query: "How does SochDB manage AI memory?",
  topK: 5
});
```
The result already contains:
- relevant text chunks
- structured metadata
- a consistent ordering
This can be passed directly into your LLM prompt.
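As a rough sketch of that hand-off, assuming the query returns an array of records carrying the `text` and `source` fields we inserted (the exact result shape is an assumption here):

```js
// Turn retrieved records into a context block for the LLM prompt.
// Assumes each result carries the text and source fields we inserted.
function buildPrompt(question, results) {
  const contextBlock = results
    .map((r, i) => `[${i + 1}] (${r.source ?? "unknown"}) ${r.text}`)
    .join("\n");

  return [
    "Answer the question using only the context below.",
    "",
    "Context:",
    contextBlock,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const prompt = buildPrompt("How does SochDB manage AI memory?", context);
// Send `prompt` to whatever LLM client you already use.
```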
Step 3: Store memory (agent-style behavior)
To support memory or state, we store interactions the same way:
```js
await db.insert({
  id: "memory-1",
  type: "memory",
  scope: "session",
  text: "User prefers local-first AI tools",
  embedding: embed("User prefers local-first AI tools")
});
```
Because memory lives in the same database:
- it can be retrieved with documents
- it stays consistent
- it’s easy to debug
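In practice we wrap this in a small helper so every interaction is stored the same way. This is just a sketch of a convention, not a SochDB API; the id scheme and the `embed()` helper are assumptions:

```js
// Hypothetical helper: persist one memory entry per observation.
// crypto.randomUUID() is available in recent Node versions and modern browsers.
async function remember(db, text, scope = "session") {
  await db.insert({
    id: `memory-${crypto.randomUUID()}`,
    type: "memory",
    scope,
    text,
    embedding: await embed(text),
  });
}

// e.g. after each user turn:
await remember(db, "User prefers local-first AI tools");
```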
Step 4: Combining documents + memory
A single query can now return:
- documents
- prior context
- memory entries
```js
const results = await db.query({
  query: "What kind of tools does the user like?",
  topK: 5
});
```
No cross-database joins.
No fragile context assembly logic.
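If you do want memory and documents treated differently in the prompt, the `type` field we stored is enough to split a single result set. A sketch, again assuming the query returns an array of the records we inserted:

```js
// Partition one result set using the stored type field.
const memories = results.filter((r) => r.type === "memory");
const documents = results.filter((r) => r.type !== "memory");

const prompt = [
  "What we know about the user:",
  ...memories.map((m) => `- ${m.text}`),
  "",
  "Relevant documents:",
  ...documents.map((d) => `- ${d.text}`),
  "",
  "Question: What kind of tools does the user like?",
].join("\n");
```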
Why this approach works well locally
Keeping everything embedded and local meant:
- fewer moving parts
- predictable performance
- easier debugging
- simpler mental model
We didn’t remove complexity entirely —
we centralized it into one engine.
That trade-off has been worth it for:
- local-first tools
- early-stage products
- agent experiments
Where this approach breaks down
This isn’t a silver bullet.
It’s not ideal for:
- massive multi-tenant SaaS systems
- workloads needing independent scaling of every component
- heavy distributed writes
Those systems benefit from separation.
This approach optimizes for simplicity and control, not maximum scale.
Closing thoughts
If you’re building AI systems where:
- state matters
- memory matters
- context matters
- and local execution matters
collapsing SQL, vectors, and memory into a single embedded system can simplify things more than expected.
This post is based on experiments we’ve been running in SochDB, an embedded, local-first database for AI apps.
Docs: https://sochdb.dev/docs
Code: https://github.com/sochdb/sochdb
Happy to hear how others are handling RAG and memory in their own systems.