DEV Community: bhaskararao arani

Local-First Observability & AI Memory for Agents — Powered by SochDB

bhaskararao arani — Mon, 09 Feb 2026 18:39:47 +0000

When we talk about AI agents, we often focus on reasoning, tools, and prompts.

But there’s a quieter problem most systems ignore:

Where does an agent’s memory actually live?

Most agent frameworks today:

Push logs to the cloud
Store embeddings in external vector DBs
Lose context between runs
Treat “memory” as an afterthought

That’s where AgentReplay takes a different path.

🔁 What AgentReplay Does

AgentReplay is a local-first observability layer for AI agents and coding tools.

It lets you:

Record agent runs
Replay decisions step-by-step
Inspect tool calls, thoughts, and outcomes
Debug agents the same way we debug code

But observability alone isn’t enough.

To truly understand agents, you need persistent memory — fast, queryable, and local.

🧠 Why SochDB Fits Perfectly

AgentReplay uses SochDB as its memory backbone.

SochDB is an embedded, AI-native database that unifies three things in a single local engine:

SQL data
Vector embeddings
Context memory

No cloud dependency. No stitched infrastructure.

⚙️ What This Enables

With SochDB underneath, AgentReplay can:

✅ Store agent runs as structured SQL data
✅ Index embeddings for semantic recall
✅ Preserve long-term context across sessions
✅ Query why an agent behaved a certain way
✅ Replay agent state deterministically

All of this happens on your machine.
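
To make "record" and "replay deterministically" a little more concrete, here is a minimal in-memory sketch. It does not use AgentReplay's or SochDB's actual API; `Step`, `RunRecorder`, and `replay` are hypothetical names chosen for illustration. The idea it shows is that replay reads stored tool outputs back instead of calling the tools again, so the same recorded run always produces the same sequence.

```typescript
// Hypothetical shapes for illustration; not AgentReplay's real schema.
interface Step {
  index: number;
  tool: string;
  input: string;
  output: string;
}

class RunRecorder {
  private steps: Step[] = [];

  // Call the tool once and keep its input and output as a structured row.
  record(tool: string, input: string, call: (input: string) => string): string {
    const output = call(input);
    this.steps.push({ index: this.steps.length, tool, input, output });
    return output;
  }

  // Return copies so callers cannot mutate the recorded history.
  history(): Step[] {
    return this.steps.map((s) => ({ ...s }));
  }
}

// Replay walks the recorded steps in order and never calls a tool again,
// so the result depends only on what was stored.
function replay(steps: Step[]): string[] {
  return [...steps]
    .sort((a, b) => a.index - b.index)
    .map((s) => `${s.index}: ${s.tool}(${s.input}) -> ${s.output}`);
}

const recorder = new RunRecorder();
recorder.record("search", "sochdb", (q) => `3 results for ${q}`);
recorder.record("summarize", "3 results", (t) => `summary of ${t}`);
const replayed = replay(recorder.history());
```

In a real setup the `steps` array would be rows in a local table rather than an in-memory list, which is what lets a run be inspected after the process exits.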

🧩 A Real-World Flow

Agent runs locally
↓
Actions + reasoning stored in SochDB
↓
Embeddings indexed alongside structured logs
↓
AgentReplay visualizes & replays the run
↓
Developer debugs, improves, and re-runs

No re-indexing.
No external vector DB.
No cloud lock-in.

🌱 Why This Matters

This pattern unlocks something powerful:

**Agent observability ≠ logging**
**Agent memory ≠ vector search**
**Local-first ≠ toy setups**

It’s how serious agent systems should be built:

Auditable
Explainable
Deterministic
Private

🔗 Explore the Project

👉 AgentReplay on GitHub:
https://github.com/agentreplay/agentreplay

If you’re building AI agents, copilots, or coding tools — this is one of the cleanest examples of local-first AI memory done right.

We’d love to hear from you
👉 sochdb

Stop Reindexing: How We Built Real-Time Search Directly Into the Database Using SochDB

bhaskararao arani — Sat, 07 Feb 2026 10:58:40 +0000

Every time we needed “real-time search,” we were told the same thing:
set up a search engine, build an ingestion pipeline, reindex constantly, and hope it stays fresh.

It worked — until it didn’t.

In this post, I’ll explain why reindexing is fundamentally broken for real-time systems, and how we built SochDB to make search a native database capability instead of a separate infrastructure problem.

🔍 Use-case

Search across:
Live APIs (news, social, pricing, telemetry)
Fresh scraped data
Streaming updates

Requirement: answers must change as the internet changes.

🧱 SochDB Mapping

| Layer | SochDB Role |
| --- | --- |
| Ingestion | App pulls live data (HTTP, WebSocket, Kafka, cron) |
| Storage | SQL tables for raw data + metadata |
| Vectors | Embeddings stored alongside rows |
| Context Memory | Tracks what was already seen, freshness, relevance |
| Query | Hybrid: `SQL filter → vector similarity → context re-rank` |

Example

SELECT *
FROM web_events
WHERE source = 'news'
AND published_at > now() - interval '2 hours'
ORDER BY vector_similarity(embedding, :query_vec) DESC
LIMIT 10;
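
The `SQL filter → vector similarity → context re-rank` pipeline can also be sketched outside the database, which may help when reasoning about what each stage contributes. This is an in-memory illustration only: `WebEvent`, `cosine`, `hybridSearch`, and the fixed 0.5 penalty for already-seen rows are assumptions made for the example, and SochDB's `vector_similarity` may use a different metric.

```typescript
// Hypothetical row shape mirroring the web_events query above.
interface WebEvent {
  id: string;
  source: string;
  publishedAt: number; // epoch milliseconds
  embedding: number[];
}

// Cosine similarity; returns 0 for all-zero vectors instead of NaN.
function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < Math.min(a.length, b.length); i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return normA === 0 || normB === 0 ? 0 : dot / Math.sqrt(normA * normB);
}

function hybridSearch(
  rows: WebEvent[],
  queryVec: number[],
  now: number,
  seen: Set<string>,
  limit = 10
): WebEvent[] {
  const twoHours = 2 * 60 * 60 * 1000;
  return rows
    // 1. Structured filter (the WHERE clause).
    .filter((r) => r.source === "news" && r.publishedAt > now - twoHours)
    // 2. Vector similarity, then 3. context re-rank: rows the caller has
    //    already seen lose a fixed 0.5 from their score.
    .map((r) => {
      const sim = cosine(r.embedding, queryVec);
      return { row: r, score: seen.has(r.id) ? sim - 0.5 : sim };
    })
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((x) => x.row);
}

const nowMs = 10000000;
const events: WebEvent[] = [
  { id: "a", source: "news", publishedAt: nowMs - 1000, embedding: [1, 0] },
  { id: "b", source: "news", publishedAt: nowMs - 2000, embedding: [0.8, 0.6] },
  { id: "c", source: "blog", publishedAt: nowMs - 1000, embedding: [1, 0] },
  { id: "d", source: "news", publishedAt: nowMs - 3 * 60 * 60 * 1000, embedding: [1, 0] },
];
// "a" is the closest match but was already seen, so "b" ranks first.
const ranked = hybridSearch(events, [1, 0], nowMs, new Set(["a"]));
```

Rows "c" and "d" never reach the similarity step because the structured filter removes them first, which is the main reason to filter before ranking.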

💡 Why SochDB wins

No re-index pipeline
No search cluster
Freshness is natural, not bolted on

2️⃣ Real-Time RAG for AI Agents (Agent Memory > Search)

🤖 Use-case

LLM agents that:
Browse the web
Call tools
Remember what they already learned
Avoid repeating themselves

🧱 SochDB Mapping

| Component | SochDB Responsibility |
| --- | --- |
| Tool outputs | Stored as structured SQL rows |
| Agent memory | Vector + context memory tables |
| Deduplication | Context hashes prevent repeat fetches |
| Grounding | SQL facts + embeddings = verifiable answers |

Agent loop

User Query
 → Search tool
 → Store result in SochDB
 → Check memory overlap
 → Answer with citations

💡 This is agent memory, not just RAG.
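
The "context hashes prevent repeat fetches" row can be illustrated with a small sketch. The post does not say how SochDB computes these hashes, so everything here is an assumption: `fnv1a`, `contextHash`, and `FetchMemory` are hypothetical names, and the hash is a simple non-cryptographic one picked to keep the example dependency-free.

```typescript
// 32-bit FNV-1a over UTF-16 code units: small and dependency-free, but not
// collision-resistant. A stronger hash such as SHA-256 would be the safer
// choice outside a demo.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16).padStart(8, "0");
}

// Normalize so trivial differences (case, spacing) map to the same hash.
function contextHash(toolName: string, query: string): string {
  const normalized = query.trim().toLowerCase().replace(/\s+/g, " ");
  return fnv1a(`${toolName}:${normalized}`);
}

class FetchMemory {
  private results = new Map<string, string>();
  fetchCount = 0;

  // Run the tool only when this (tool, query) pair has not been seen before.
  fetchOnce(tool: string, query: string, run: (q: string) => string): string {
    const key = contextHash(tool, query);
    const cached = this.results.get(key);
    if (cached !== undefined) return cached;
    this.fetchCount++;
    const result = run(query);
    this.results.set(key, result);
    return result;
  }
}

const fetchMemory = new FetchMemory();
const firstResult = fetchMemory.fetchOnce("search", "SochDB  memory", (q) => `results for ${q}`);
// Same query after normalization, so the tool is not called a second time.
const secondResult = fetchMemory.fetchOnce("search", "sochdb memory", (q) => `results for ${q}`);
```

Exact-match hashing only catches repeated queries; near-duplicates with different wording would need the vector side of the memory.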

3️⃣ Real-Time Personalization (Search That Changes Per User)

🧍 Use-case

E-commerce
Content feeds
Internal developer portals

Search results differ **per user, per moment**.

🧱 SochDB Mapping

| Table | Purpose |
| --- | --- |
| `users` | Profile & preferences |
| `events` | Clicks, views, actions |
| `items` | Searchable entities |
| `user_context` | Rolling session memory |

Query flow

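SELECT i.*
FROM items i
-- Match items to the user's current interest in the join condition
JOIN user_context uc ON i.category = uc.current_interest
-- Restrict to the requesting user's session context
WHERE uc.user_id = :uid
ORDER BY vector_similarity(i.embedding, uc.session_embedding) DESC;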

💡 Personalization without Redis + Elastic + Feature Store chaos.
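
The query relies on `user_context.session_embedding` acting as rolling session memory, but the post does not say how that vector is maintained. One plausible approach, shown here purely as an assumption, is an exponential moving average over the embeddings of items the user interacts with. `updateSessionEmbedding` and the `alpha` parameter are hypothetical.

```typescript
// Blend the previous session vector with the embedding of the item the user
// just interacted with. A larger alpha makes older interest fade faster.
function updateSessionEmbedding(
  session: number[] | null,
  itemEmbedding: number[],
  alpha = 0.3
): number[] {
  // First interaction: the session vector starts as a copy of the item vector.
  if (session === null) return [...itemEmbedding];
  if (session.length !== itemEmbedding.length) {
    throw new Error("embedding dimensions must match");
  }
  return session.map((v, i) => (1 - alpha) * v + alpha * itemEmbedding[i]);
}

const afterFirstClick = updateSessionEmbedding(null, [1, 0]);
const afterSecondClick = updateSessionEmbedding(afterFirstClick, [0, 1], 0.5);
```

With `alpha = 0.5` the second click pulls the session vector halfway toward the new item, so `afterSecondClick` is `[0.5, 0.5]`.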

4️⃣ Real-Time Observability & Log Search (Dev-Focused)

🧪 Use-case

Search logs by meaning, not keywords
Debug incidents faster
Local-first debugging

🧱 SochDB Mapping

| Aspect | Implementation |
| --- | --- |
| Logs | SQL rows (structured) |
| Meaning | Vector embeddings per log |
| Context | Incident timeline memory |
| Search | Semantic + time-windowed SQL |

SELECT *
FROM logs
WHERE service = 'payments'
AND ts > now() - interval '15 minutes'
ORDER BY vector_similarity(embedding, :error_description) DESC;

💡 This replaces grep + Elastic + hope.

5️⃣ IoT / Edge Real-Time Search (Offline-First)

🌐 Use-case

Sensors
Edge gateways
Smart infra

Must work without cloud.

🧱 SochDB Mapping

| Constraint | SochDB Advantage |
| --- | --- |
| Offline | Embedded DB |
| Latency | No network hop |
| Streaming | Append-only SQL tables |
| Reasoning | Local vector search |

SELECT *
FROM sensor_events
WHERE device_id = :edge_id
ORDER BY ts DESC
LIMIT 100;
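
For readers who want to see the append-only pattern without a database, here is a small in-memory equivalent of the query above. `SensorEvent` and `SensorLog` are hypothetical names for illustration and say nothing about how SochDB stores rows internally.

```typescript
// Hypothetical row shape mirroring the sensor_events query above.
interface SensorEvent {
  deviceId: string;
  ts: number; // epoch milliseconds
  value: number;
}

class SensorLog {
  private rows: SensorEvent[] = [];

  // Append-only: rows are added, never updated or deleted.
  append(event: SensorEvent): void {
    this.rows.push(event);
  }

  // Equivalent of: WHERE device_id = ? ORDER BY ts DESC LIMIT ?
  latest(deviceId: string, limit = 100): SensorEvent[] {
    return this.rows
      .filter((r) => r.deviceId === deviceId) // filter() copies, so sort() leaves rows untouched
      .sort((a, b) => b.ts - a.ts)
      .slice(0, limit);
  }
}

const sensorLog = new SensorLog();
sensorLog.append({ deviceId: "edge-1", ts: 1000, value: 20.5 });
sensorLog.append({ deviceId: "edge-2", ts: 1500, value: 18.0 });
sensorLog.append({ deviceId: "edge-1", ts: 2000, value: 21.0 });
const recentReadings = sensorLog.latest("edge-1", 1);
```

Because nothing leaves the device, the lookup keeps working when the gateway has no network connection.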

💡 This is where cloud-first DBs fail completely.

6️⃣ Real-Time Knowledge Base Search (Docs, Code, Tickets)

📚 Use-case

Internal docs
GitHub issues
RFCs
Slack exports

🧱 SochDB Mapping

| Data | Stored As |
| --- | --- |
| Docs | SQL rows |
| Code | Chunked embeddings |
| Tickets | Context-linked memory |
| Updates | Immediate availability |

SELECT *
FROM knowledge_chunks
WHERE project = 'sochdb'
ORDER BY vector_similarity(embedding, :question_vec) DESC;

💡 No re-index, no search infra tax.
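
A table like `knowledge_chunks` implies the documents were split into chunks before embedding. The post does not describe a chunking strategy, so the following is only one common option: fixed-size word windows with overlap. `chunkText` and its default sizes are assumptions.

```typescript
// Split text into windows of `size` words. Consecutive windows share
// `overlap` words, so text near a boundary appears in both neighbours.
function chunkText(text: string, size = 200, overlap = 20): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const words = text.split(/\s+/).filter((w) => w.length > 0);
  const chunks: string[] = [];
  for (let start = 0; start < words.length; start += size - overlap) {
    chunks.push(words.slice(start, start + size).join(" "));
    if (start + size >= words.length) break; // this window reached the end
  }
  return chunks;
}

// Yields two chunks: "a b c d" and "d e f g" (the word "d" is shared).
const sampleChunks = chunkText("a b c d e f g", 4, 1);
```

Each chunk would then be embedded and stored as its own row, along with metadata such as `project`, so it can be filtered and ranked in a single query.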

🧠 Why This Mapping Is Powerful

Traditional Stack

App
 → Kafka
 → ETL
 → Search Engine
 → Cache
 → Feature Store
 → Hope

SochDB Stack

App
 → SochDB

We’d love to hear from you — whether it’s feedback, questions, or hard problems you’re trying to solve.

👉 SochDB on GitHub

Why we stopped stitching SQL + vector databases for AI apps - Answer is SochDB

bhaskararao arani — Wed, 04 Feb 2026 11:53:23 +0000

Building a Local RAG + Memory System with an Embedded Database

When building RAG or agent-style AI applications, we often end up with the same stack:

SQL for structured data
a vector database for embeddings
custom glue code to assemble context
extra logic to track memory

This works — but it quickly becomes hard to reason about, especially in local-first setups.

In this post, I’ll walk through a real, minimal example of building a local RAG + memory system using a single embedded database, based on patterns we’ve been using in SochDB.

No cloud services. No external vector DBs.

What we’re building

A simple local AI assistant backend that can:

Store documents with metadata
Retrieve relevant context by meaning (RAG)
Persist memory across interactions
Run entirely locally

This pattern applies to:

- internal assistants
- developer copilots
- knowledge-base chat
- offline or privacy-sensitive AI apps

Setup

Install the database:

npm install sochdb

Create a database file locally:

import { SochDB } from "sochdb";

const db = new SochDB("assistant.db");

That’s it — no server, no config.

Step 1: Ingest documents (structured data + vectors)

Each record stores:

raw text
embedding
structured metadata

All in one place.

// embed() is not defined in this post; it is assumed to be your own helper
// that turns text into an embedding vector.
const text = "SochDB combines SQL, vector search, and AI context";

await db.insert({
  id: "doc-1",
  source: "internal-docs",
  text,
  embedding: embed(text),
  tags: ["architecture", "database"]
});

There’s no separate ingestion pipeline.
No sync between SQL rows and vector IDs.
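
The insert call uses an `embed()` function that the post never defines. For local experiments without a model, a deterministic stand-in can be enough to exercise the rest of the flow. The version below is a toy: it is a hashed bag-of-words vector, not a semantic embedding, and the function body and the 64-dimension default are assumptions made for this sketch. A real setup would call an actual embedding model instead.

```typescript
// Toy stand-in for a real embedding model: hashes each word into one of
// `dims` buckets and L2-normalizes the counts. It captures word overlap
// only, not meaning.
function embed(text: string, dims = 64): number[] {
  const vec = new Array<number>(dims).fill(0);
  const words = text.toLowerCase().match(/[a-z0-9]+/g) ?? [];
  for (const word of words) {
    let h = 0;
    for (let i = 0; i < word.length; i++) {
      h = (h * 31 + word.charCodeAt(i)) >>> 0; // simple 32-bit string hash
    }
    vec[h % dims] += 1;
  }
  const norm = Math.sqrt(vec.reduce((sum, v) => sum + v * v, 0));
  // Text with no words stays an all-zero vector.
  return norm === 0 ? vec : vec.map((v) => v / norm);
}

// Case differences do not change the result because the text is lowercased.
const vecA = embed("SochDB combines SQL and vectors");
const vecB = embed("sochdb combines sql and vectors");
```

Because the output is deterministic, the same text always maps to the same vector, which makes test runs repeatable.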

Step 2: Retrieve context for a query (RAG)

When a user asks a question:

const context = await db.query({
  query: "How does SochDB manage AI memory?",
  topK: 5
});

The result already contains:

relevant text chunks
structured metadata
a consistent ordering

This can be passed directly into your LLM prompt.
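
As a rough sketch of that last step, the retrieved chunks can be folded into a prompt with numbered citations. The post does not show the shape of the query result, so `RetrievedChunk` (with `text` and `source` fields) and `buildPrompt` are assumptions; adjust the field names to whatever `db.query` actually returns.

```typescript
// Assumed result shape; the actual fields returned by db.query may differ.
interface RetrievedChunk {
  text: string;
  source: string;
}

// Number each chunk so the model can cite it as [1], [2], ...
function buildPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (${c.source}) ${c.text}`)
    .join("\n");
  return [
    "Answer using only the context below. Cite sources by number.",
    "",
    "Context:",
    context.length > 0 ? context : "(no context found)",
    "",
    `Question: ${question}`,
  ].join("\n");
}

const ragPrompt = buildPrompt("How does SochDB manage AI memory?", [
  { text: "SochDB combines SQL, vector search, and AI context", source: "internal-docs" },
]);
```

Keeping the source next to each chunk is what makes the "answer with citations" step possible later.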

Step 3: Store memory (agent-style behavior)

To support memory or state, we store interactions the same way:

await db.insert({
  id: "memory-1",
  type: "memory",
  scope: "session",
  text: "User prefers local-first AI tools",
  embedding: embed("User prefers local-first AI tools")
});

Because memory lives in the same database:

it can be retrieved with documents
it stays consistent
it’s easy to debug

Step 4: Combining documents + memory

A single query can now return:

documents
prior context
memory entries

const results = await db.query({
  query: "What kind of tools does the user like?",
  topK: 5
});

No cross-database joins.
No fragile context assembly logic.

Why this approach works well locally

Keeping everything embedded and local meant:

fewer moving parts
predictable performance
easier debugging
simpler mental model

We didn’t remove complexity entirely —
we centralized it into one engine.

That trade-off has been worth it for:

local-first tools
early-stage products
agent experiments

Where this approach breaks down

This isn’t a silver bullet.

It’s not ideal for:

massive multi-tenant SaaS systems
workloads needing independent scaling of every component
heavy distributed writes

Those systems benefit from separation.

This approach optimizes for simplicity and control, not maximum scale.

Closing thoughts

If you’re building AI systems where:

state matters
memory matters
context matters
and local execution matters

collapsing SQL, vectors, and memory into a single embedded system can simplify things more than expected.

This post is based on experiments we’ve been running in SochDB, an embedded, local-first database for AI apps.

Docs: https://sochdb.dev/docs

Code: https://github.com/sochdb/sochdb

Happy to hear how others are handling RAG and memory in their own systems.