Building a Local RAG + Memory System with an Embedded Database
When building RAG or agent-style AI applications, we often end up with the same stack:
- SQL for structured data
- a vector database for embeddings
- custom glue code to assemble context
- extra logic to track memory
This works — but it quickly becomes hard to reason about, especially in local-first setups.
In this post, I’ll walk through a real, minimal example of building a local RAG + memory system using a single embedded database, based on patterns we’ve been using in SochDB.
No cloud services. No external vector DBs.
What we’re building
A simple local AI assistant backend that can:
- Store documents with metadata
- Retrieve relevant context by meaning (RAG)
- Persist memory across interactions
- Run entirely locally
This pattern applies to:
- internal assistants
- developer copilots
- knowledge-base chat
- offline or privacy-sensitive AI apps
Setup
Install the database:
```bash
npm install sochdb
```
Create a database file locally:
```js
import { SochDB } from "sochdb";

const db = new SochDB("assistant.db");
```
That’s it — no server, no config.
Step 1: Ingest documents (structured data + vectors)
Each record stores:
- raw text
- embedding
- structured metadata
All in one place.
```js
await db.insert({
  id: "doc-1",
  source: "internal-docs",
  text: "SochDB combines SQL, vector search, and AI context",
  embedding: embed("SochDB combines SQL, vector search, and AI context"),
  tags: ["architecture", "database"]
});
```
There’s no separate ingestion pipeline.
No sync between SQL rows and vector IDs.
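The `embed()` helper above is deliberately left abstract. As a minimal sketch of one fully local way to implement it, assuming transformers.js (`@xenova/transformers`) and the all-MiniLM-L6-v2 model (both my assumptions, not anything SochDB requires), it could look like this:

```js
// Sketch of a local embed() helper using transformers.js -- an assumption,
// not part of SochDB. Any function that returns a plain number[] works.
import { pipeline } from "@xenova/transformers";

// Load the embedding model once and reuse it across calls.
const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

async function embed(text) {
  // Mean-pool token embeddings and normalize into one fixed-length vector.
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data);
}
```

With an async helper like this, the call sites above become `embedding: await embed(text)`.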
Step 2: Retrieve context for a query (RAG)
When a user asks a question:
```js
const context = await db.query({
  query: "How does SochDB manage AI memory?",
  topK: 5
});
```
The result already contains:
- relevant text chunks
- structured metadata
- a consistent ordering
This can be passed directly into your LLM prompt.
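As a rough sketch of that hand-off, assuming the query returns an array of records carrying the `text` and `source` fields we inserted (the exact result shape is an assumption here):

```js
// Turn retrieved records into a context block for the LLM prompt.
// Assumes each result carries the text and source fields we inserted.
function buildPrompt(question, results) {
  const contextBlock = results
    .map((r, i) => `[${i + 1}] (${r.source ?? "unknown"}) ${r.text}`)
    .join("\n");

  return [
    "Answer the question using only the context below.",
    "",
    "Context:",
    contextBlock,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const prompt = buildPrompt("How does SochDB manage AI memory?", context);
// Send `prompt` to whatever LLM client you already use.
```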
Step 3: Store memory (agent-style behavior)
To support memory or state, we store interactions the same way:
```js
await db.insert({
  id: "memory-1",
  type: "memory",
  scope: "session",
  text: "User prefers local-first AI tools",
  embedding: embed("User prefers local-first AI tools")
});
```
Because memory lives in the same database:
- it can be retrieved with documents
- it stays consistent
- it’s easy to debug
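In practice we wrap this in a small helper so every interaction is stored the same way. This is just a sketch of a convention, not a SochDB API; the id scheme and the `embed()` helper are assumptions:

```js
// Hypothetical helper: persist one memory entry per observation.
// crypto.randomUUID() is available in recent Node versions and modern browsers.
async function remember(db, text, scope = "session") {
  await db.insert({
    id: `memory-${crypto.randomUUID()}`,
    type: "memory",
    scope,
    text,
    embedding: await embed(text),
  });
}

// e.g. after each user turn:
await remember(db, "User prefers local-first AI tools");
```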
Step 4: Combining documents + memory
A single query can now return:
- documents
- prior context
- memory entries
```js
const results = await db.query({
  query: "What kind of tools does the user like?",
  topK: 5
});
```
No cross-database joins.
No fragile context assembly logic.
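If you do want memory and documents treated differently in the prompt, the `type` field we stored is enough to split a single result set. A sketch, again assuming the query returns an array of the records we inserted:

```js
// Partition one result set using the stored type field.
const memories = results.filter((r) => r.type === "memory");
const documents = results.filter((r) => r.type !== "memory");

const prompt = [
  "What we know about the user:",
  ...memories.map((m) => `- ${m.text}`),
  "",
  "Relevant documents:",
  ...documents.map((d) => `- ${d.text}`),
  "",
  "Question: What kind of tools does the user like?",
].join("\n");
```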
Why this approach works well locally
Keeping everything embedded and local meant:
- fewer moving parts
- predictable performance
- easier debugging
- simpler mental model
We didn’t remove complexity entirely —
we centralized it into one engine.
That trade-off has been worth it for:
- local-first tools
- early-stage products
- agent experiments
Where this approach breaks down
This isn’t a silver bullet.
It’s not ideal for:
- massive multi-tenant SaaS systems
- workloads needing independent scaling of every component
- heavy distributed writes
Those systems benefit from separation.
This approach optimizes for simplicity and control, not maximum scale.
Closing thoughts
If you’re building AI systems where:
- state matters
- memory matters
- context matters
- and local execution matters
collapsing SQL, vectors, and memory into a single embedded system can simplify things more than expected.
This post is based on experiments we’ve been running in SochDB, an embedded, local-first database for AI apps.
Docs: https://sochdb.dev/docs
Code: https://github.com/sochdb/sochdb
Happy to hear how others are handling RAG and memory in their own systems.