bhaskararao arani

Why we stopped stitching SQL + vector databases for AI apps: the answer is SochDB

Building a Local RAG + Memory System with an Embedded Database

When building RAG or agent-style AI applications, we often end up with the same stack:

  • SQL for structured data
  • a vector database for embeddings
  • custom glue code to assemble context
  • extra logic to track memory

This works — but it quickly becomes hard to reason about, especially in local-first setups.

In this post, I’ll walk through a real, minimal example of building a local RAG + memory system using a single embedded database, based on patterns we’ve been using in SochDB.

No cloud services. No external vector DBs.

What we’re building

A simple local AI assistant backend that can:

  • Store documents with metadata
  • Retrieve relevant context by meaning (RAG)
  • Persist memory across interactions
  • Run entirely locally

This pattern applies to:

  • internal assistants
  • developer copilots
  • knowledge-base chat
  • offline or privacy-sensitive AI apps

Setup

Install the database:

npm install sochdb

Create a database file locally:

import { SochDB } from "sochdb";

const db = new SochDB("assistant.db");

That’s it — no server, no config.
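
The snippets below call an embed() helper to turn text into a vector. That helper is not part of SochDB's API as shown in this post; here's a minimal sketch of one way to implement it locally, assuming Transformers.js and the all-MiniLM-L6-v2 model (swap in whatever embedding function you already use):

// embed.js — a hypothetical local embedding helper, not provided by SochDB
import { pipeline } from "@xenova/transformers";

// Load a small sentence-embedding model once and reuse it for every call.
const extractor = await pipeline("feature-extraction", "Xenova/all-MiniLM-L6-v2");

export async function embed(text) {
  // Mean-pool the token embeddings and normalize to get one vector per input.
  const output = await extractor(text, { pooling: "mean", normalize: true });
  return Array.from(output.data);
}

If your helper is async like this one, remember to await it when building records in the insert calls below.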

Step 1: Ingest documents (structured data + vectors)

Each record stores:

  • raw text
  • embedding
  • structured metadata

All in one place.

await db.insert({
  id: "doc-1",
  source: "internal-docs",
  text: "SochDB combines SQL, vector search, and AI context",
  embedding: embed("SochDB combines SQL, vector search, and AI context"),
  tags: ["architecture", "database"]
});

There’s no separate ingestion pipeline.
No sync between SQL rows and vector IDs.
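
Ingesting more than one document is just a loop over the same insert call. A quick sketch, assuming the embed() helper above and some illustrative sample records:

const docs = [
  { id: "doc-2", source: "internal-docs", text: "SochDB runs as an embedded library, no server process", tags: ["deployment"] },
  { id: "doc-3", source: "handbook", text: "All data lives in a single local database file", tags: ["storage"] }
];

for (const doc of docs) {
  // Each record carries its text, metadata, and embedding in one insert.
  await db.insert({
    ...doc,
    embedding: await embed(doc.text)
  });
}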

Step 2: Retrieve context for a query (RAG)

When a user asks a question:

const context = await db.query({
  query: "How does SochDB manage AI memory?",
  topK: 5
});

The result already contains:

  • relevant text chunks
  • structured metadata
  • a consistent ordering

This can be passed directly into your LLM prompt.
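
For example, here's a sketch of assembling that prompt, assuming the query returns an array of records exposing the fields we inserted (an assumption about the return shape, not something this post guarantees):

// Turn each hit into a labeled context line.
const contextBlock = context
  .map((hit) => `[${hit.source ?? hit.type ?? "unknown"}] ${hit.text}`)
  .join("\n");

const prompt =
  "Answer using only the context below.\n\n" +
  "Context:\n" + contextBlock + "\n\n" +
  "Question: How does SochDB manage AI memory?";

// `prompt` can now be sent to whatever local or hosted LLM you use.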

Step 3: Store memory (agent-style behavior)

To support memory or state, we store interactions the same way:

await db.insert({
  id: "memory-1",
  type: "memory",
  scope: "session",
  text: "User prefers local-first AI tools",
  embedding: embed("User prefers local-first AI tools")
});

Because memory lives in the same database:

  • it can be retrieved with documents
  • it stays consistent
  • it’s easy to debug

Step 4: Combining documents + memory

A single query can now return:

  • documents
  • prior context
  • memory entries

const results = await db.query({
  query: "What kind of tools does the user like?",
  topK: 5
});

No cross-database joins.
No fragile context assembly logic.
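
If you want to keep documents and memory visually separate in the prompt, one option is to group the hits after the query. A sketch, assuming each result carries the type field we set at insert time:

// Records without a `type` of "memory" are treated as documents.
const memories = results.filter((hit) => hit.type === "memory");
const documents = results.filter((hit) => hit.type !== "memory");

console.log(`Found ${documents.length} document hits and ${memories.length} memory hits`);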

Why this approach works well locally

Keeping everything embedded and local meant:

  • fewer moving parts
  • predictable performance
  • easier debugging
  • simpler mental model

We didn’t remove complexity entirely —
we centralized it into one engine.

That trade-off has been worth it for:

  • local-first tools
  • early-stage products
  • agent experiments

Where this approach breaks down

This isn’t a silver bullet.

It’s not ideal for:

  • massive multi-tenant SaaS systems
  • workloads needing independent scaling of every component
  • heavy distributed writes

Those systems benefit from separation.

This approach optimizes for simplicity and control, not maximum scale.

Closing thoughts

If you’re building AI systems where:

  • state matters
  • memory matters
  • context matters
  • and local execution matters

then collapsing SQL, vectors, and memory into a single embedded system can simplify things more than you might expect.

This post is based on experiments we’ve been running in SochDB, an embedded, local-first database for AI apps.

Docs: https://sochdb.dev/docs

Code: https://github.com/sochdb/sochdb

Happy to hear how others are handling RAG and memory in their own systems.
