Paul Holland

From Personal Tool to Enterprise Platform: Where Eidetic OS Is Heading Next

Six months ago I built a Python script that saved my Claude conversations to markdown files. Today it's a full personal AI operating system — 160+ skills, hybrid RAG search, pluggable vector backends, a web dashboard, fact extraction, memory decay scoring, and an Obsidian plugin. Here's the technical story of how each decision was made, what we tried, what failed, and where it's going next.

Why I Built This

I'm an IT operations manager. I use AI every day — architecture planning, debugging, code review, research. The problem: every conversation disappeared when I closed the tab. I'd spend an hour explaining my infrastructure to Claude, get great advice, then next session start from scratch.

I tried the obvious solutions first. Conversation history? Too noisy — full of corrections, tangents, tool output. Copy-pasting notes? Doesn't scale past a week. Third-party tools? Either cloud-dependent (privacy concern) or too basic.

So I built my own. The core thesis: your AI should remember everything and work while you sleep. Everything that followed was in service of that.

The Architecture Decisions (And Why)

┌──────────────────────────────────────────────────────────┐
│                      eidetic CLI                          │
│   init · doctor · search · embed · dashboard · skills     │
├──────────┬──────────┬───────────┬────────────────────────┤
│  Vault   │   RAG    │  Skills   │   LLM Backends         │
│  Markdown│  Hybrid  │  160+     │   LM Studio / Ollama   │
│  Git Sync│  BM25+Vec│  MCP      │   llama.cpp / OpenAI   │
├──────────┼──────────┼───────────┼────────────────────────┤
│  Fact    │ Security │  Vector   │   Dashboard            │
│  Extract │ AST Scan │  Backends │   Flask + D3.js        │
│  Memory  │ Sandbox  │  SQLite   │   7 Panels             │
│  Decay   │ Audit    │  LanceDB  │   Knowledge Graph      │
├──────────┴──────────┴───────────┴────────────────────────┤
│               Obsidian Markdown Vault                     │
│        (your files, your machine, git-versioned)          │
└──────────────────────────────────────────────────────────┘

Why Obsidian Markdown (Not a Database)

The first decision was storage format. I could have used SQLite from day one, or Postgres, or a purpose-built knowledge graph database. I chose plain markdown files in an Obsidian vault. Here's why:

  • Human-readable — I can open any file in a text editor and read it. No special tooling needed to inspect my memory.
  • Git-versioned — Every change is a commit. Free versioning, branching, diffing, conflict detection.
  • Portable — Move the folder to another machine and everything works. No database migrations, no server processes.
  • Obsidian-native — I already lived in Obsidian for personal notes. The vault is my knowledge base and my AI's memory. One source of truth.

The trade-off: markdown isn't searchable by meaning. That's where the RAG pipeline comes in.

Why Hybrid Search (BM25 + Vector + RRF + TF-IDF)

Pure vector search sounds elegant. Embed everything, find similar chunks, done. In practice it misses exact matches constantly. If I search for "sqlite-vec", embedding search returns results about "vector databases in general" but misses the specific chunk where I chose that library.

Pure keyword search (BM25) catches exact terms but misses conceptual connections. "What did we decide about the authentication approach?" won't find a chunk that talks about "JWT token validation strategy."

So we fuse both:

  1. BM25 scores by term frequency (catches exact matches)
  2. Vector cosine scores by semantic similarity (catches conceptual matches)
  3. Reciprocal Rank Fusion merges both ranked lists with k=60 (prevents either method from dominating)
  4. TF-IDF reranking refines the top results (boosts rare, query-specific terms)
Query: "What vector database did we choose?"
         │
         ├──► BM25 Keyword Search ──► chunks with "vector", "database"
         │
         ├──► Vector Cosine Search ──► chunks about "embedding storage"
         │
         ├──► Reciprocal Rank Fusion (k=60) ──► merged ranking
         │
         └──► TF-IDF Reranking ──► final top-N results

This was the single biggest quality improvement in the entire project. The jump from pure vector search to hybrid was immediately noticeable — suddenly the system could find both the exact library name and the reasoning behind choosing it.
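
The fusion step is small enough to show in full. Below is a minimal sketch of Reciprocal Rank Fusion with k=60, assuming each retriever returns chunk IDs ordered best-first. The function name and chunk IDs are illustrative, not taken from the Eidetic OS codebase.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several best-first rankings of chunk IDs into one ranking."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); a large k flattens the
            # gap between rank 1 and rank 2 so no single list dominates.
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical results from the two retrievers.
bm25_hits = ["chunk_sqlite_vec", "chunk_pinecone", "chunk_backup"]
vector_hits = ["chunk_embeddings", "chunk_sqlite_vec", "chunk_pinecone"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

A chunk that appears in both lists outranks a chunk that tops only one of them, which is the behavior described above: the exact library name and the reasoning around it both surface.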

Why SQLite-vec (Not Pinecone/Weaviate/Qdrant)

For a personal knowledge base with ~10K chunks, you don't need a vector database service. You need something that:

  • Requires zero configuration
  • Embeds in the Python process (no running server)
  • Has no cloud dependency
  • Is fast enough for KNN over 10K vectors

sqlite-vec does all of this. It's a SQLite extension. Your vectors live in the same database as your metadata. pip install and you're done.

We added LanceDB and ChromaDB as pluggable alternatives (swap with one config line) for anyone who outgrows SQLite. But for 99% of personal use, SQLite is the right answer.
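
To make "swap with one config line" concrete, here is a rough sketch of what a pluggable backend interface can look like. Everything in it is an assumption for illustration: the `VectorBackend` protocol, the `vector_backend` config key, and the in-memory backend standing in for the real SQLite, LanceDB, and ChromaDB adapters.

```python
import math
from typing import Protocol


class VectorBackend(Protocol):
    """Hypothetical interface every backend would implement."""

    def add(self, chunk_id: str, vector: list[float]) -> None: ...
    def knn(self, query: list[float], k: int) -> list[tuple[str, float]]: ...


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryBackend:
    """Brute-force stand-in; a real adapter would wrap an actual store."""

    def __init__(self):
        self._vectors = {}

    def add(self, chunk_id, vector):
        self._vectors[chunk_id] = vector

    def knn(self, query, k):
        scored = [(cid, cosine(query, vec)) for cid, vec in self._vectors.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


# Registry keyed by the (hypothetical) config value.
BACKENDS = {"memory": InMemoryBackend}


def make_backend(config):
    """Pick the backend class named in the config, defaulting to 'memory'."""
    return BACKENDS[config.get("vector_backend", "memory")]()


backend = make_backend({"vector_backend": "memory"})
backend.add("jwt-decision", [0.9, 0.1, 0.0])
backend.add("dark-mode", [0.0, 0.2, 0.9])
print(backend.knn([1.0, 0.0, 0.0], k=1))
```

The calling code only ever sees `add` and `knn`, so switching stores is a matter of registering another class and changing one config value.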

Why Local LLMs (Not Just Cloud APIs)

Every embedding, every search, every analysis can run on your hardware. We auto-detect LM Studio, Ollama, and llama.cpp at startup. No API keys required.

This isn't ideological — it's practical. I work on planes. I work in environments where sending data to cloud APIs isn't an option. And for a system that stores your complete professional context, keeping embeddings local isn't a luxury, it's a requirement.

We also support OpenAI-compatible APIs for people who want cloud speed. But the system works fully offline by default.
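
One simple way to auto-detect local servers is to probe the ports they usually listen on. The sketch below assumes the common defaults (1234 for LM Studio, 11434 for Ollama, 8080 for the llama.cpp server); the actual detection logic in Eidetic OS may work differently, and the function names are made up for this example.

```python
import socket

# Commonly used default ports. Treat these as assumptions: any of them
# can be changed in the respective tool's settings.
DEFAULT_PORTS = {"LM Studio": 1234, "Ollama": 11434, "llama.cpp": 8080}


def port_open(host, port, timeout=0.2):
    """Return True if something accepts a TCP connection on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def detect_backends(probe=port_open, host="127.0.0.1"):
    """List the backends whose default port answers the probe."""
    return [name for name, port in DEFAULT_PORTS.items() if probe(host, port)]


if __name__ == "__main__":
    print(detect_backends())
```

An open port only shows that something is listening, so a real implementation would follow up with a request to the server's API before trusting the result. The `probe` parameter exists so the logic can be tested without a network.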

Why Extensions (Not a Monolithic CLI)

By v2.0, the CLI was a 1,500-line monster. Trading commands, voice synthesis, job tracking — all in one file. Every user got every feature. Adding a module meant editing the core.

v3.0 split it into a pluggable extension system built on setuptools entry points, with optional features installed as extras:

pip install eidetic-os                # Core only
pip install "eidetic-os[trading]"     # + trading module
pip install "eidetic-os[voice]"       # + voice synthesis
pip install "eidetic-os[vector]"      # + LanceDB/ChromaDB

Each extension is an EideticExtension subclass that registers its own commands, skills, and schedules. If one extension fails to load, the rest keep working. This was essential for community contributions — nobody should have to understand the trading module to add a documentation tool.
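
The failure isolation is the interesting part, so here is a minimal sketch of it. The entry-point group name `eidetic.extensions` is a guess for illustration, and `load_extensions` and `discover` are hypothetical helpers; only `importlib.metadata.entry_points` is the real standard-library call (the `group=` keyword needs Python 3.10 or newer).

```python
from importlib.metadata import entry_points


def load_extensions(candidates):
    """Load each entry point, keeping going when one of them fails.

    `candidates` is any iterable of objects with `.name` and `.load()`,
    which is what importlib.metadata entry points provide.
    """
    loaded, failed = {}, {}
    for ep in candidates:
        try:
            loaded[ep.name] = ep.load()
        except Exception as exc:  # one broken extension must not stop the rest
            failed[ep.name] = repr(exc)
    return loaded, failed


def discover(group="eidetic.extensions"):
    """Find and load installed extensions (group name is an assumption)."""
    return load_extensions(entry_points(group=group))
```

Collecting failures instead of raising means the CLI can still start and report which extension broke, for example from a diagnostic command.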

Why AST Security Scanning (Not Just Sandboxing)

With 160+ community skills, someone will eventually write code that does something dangerous — intentionally or not. Sandboxing catches runtime problems (infinite loops, memory bombs, file access). But it doesn't catch intent.

The AST scanner reads the code's structure before execution:

  • BLOCK: os.system(), subprocess.call(), network access, file deletion. Hard-stopped, never executes.
  • WARN: Dynamic imports, eval/exec, broad file access. Logged, requires approval.
  • INFO: External library imports, vault file reads. Noted but allowed.

This is defense in depth. The scanner catches dangerous patterns before the sandbox even starts. Two independent layers, each with different detection strengths.
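
A stripped-down version of this kind of scanner fits in a few lines with Python's built-in `ast` module. This is a sketch under simplifying assumptions, not the Eidetic OS scanner: the rule sets are tiny, and names like `scan` and `BLOCKED_CALLS` are invented for the example.

```python
import ast

# Illustrative rule sets; a real scanner would cover far more patterns.
BLOCKED_CALLS = {"os.system", "subprocess.call"}
WARN_CALLS = {"eval", "exec", "__import__"}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None for anything else."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def scan(source):
    """Return (level, what, line) findings without executing the code."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in BLOCKED_CALLS:
                findings.append(("BLOCK", name, node.lineno))
            elif name in WARN_CALLS:
                findings.append(("WARN", name, node.lineno))
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            findings.append(("INFO", "import", node.lineno))
    # ast.walk is breadth-first, so sort to report in source order.
    return sorted(findings, key=lambda finding: finding[2])


skill_source = "import os\nos.system('echo hi')\nvalue = eval('1 + 1')\n"
for finding in scan(skill_source):
    print(finding)
```

Matching on literal names is easy to evade (`from os import system` slips past this sketch), which is exactly why the sandbox stays in place as the second layer.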

Why Fact Extraction (Not Raw Transcript Storage)

v1 through v3 stored raw session transcripts. Every conversation, verbatim. This caused two problems:

  1. Context bloat — Conversations are full of noise. False starts, corrections, "actually wait, let me rethink that." Embedding all of this pollutes search results.
  2. Redundancy — The same decision gets discussed across multiple sessions. You end up with five chunks saying the same thing in slightly different words.

v4.0 introduced Mem0-style fact extraction. Instead of storing "we discussed authentication and decided to use JWT tokens because...", the system extracts: "Paul chose JWT tokens for authentication (decided 2026-05-15, reason: stateless, no session DB needed)"

Each fact gets:

  • Cosine similarity comparison against existing facts
  • Duplicate detection — if the fact already exists, bump its access count
  • Contradiction handling — if the new fact contradicts an old one, supersede it (mark old as inactive)
  • Merge logic — if the new fact extends an old one, combine them

This reduced context bloat dramatically. The system stores decisions, not discussions.
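
The duplicate-detection branch can be sketched as below. The `Fact` fields, the `upsert_fact` name, and the 0.95 threshold are all assumptions for illustration. Contradiction handling and merging are left out, since cosine similarity alone cannot tell "extends" from "contradicts" and those steps need a separate judgment.

```python
import math
from dataclasses import dataclass


@dataclass
class Fact:
    text: str
    embedding: list[float]
    access_count: int = 1
    active: bool = True


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def upsert_fact(store, new, duplicate_threshold=0.95):
    """Bump the access count of a near-identical active fact, else append."""
    for existing in store:
        if not existing.active:
            continue  # superseded facts are never matched
        if cosine(existing.embedding, new.embedding) >= duplicate_threshold:
            existing.access_count += 1
            return existing
    store.append(new)
    return new


# Toy 2-D embeddings; real ones come from the embedding model.
store = []
upsert_fact(store, Fact("Paul chose JWT tokens for authentication", [1.0, 0.0]))
upsert_fact(store, Fact("Authentication uses JWT tokens", [0.99, 0.01]))
upsert_fact(store, Fact("Paul prefers dark mode", [0.0, 1.0]))
print(len(store), store[0].access_count)
```

The second fact lands on the first one instead of creating a new row, which is how five restatements of one decision collapse into a single entry with a higher access count.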

Why Memory Decay (Not Permanent Storage)

Not all facts are equally relevant forever. "Paul prefers dark mode" is permanent. "The deploy target is staging-3" is temporary. Without decay, stale facts accumulate and pollute active reasoning.

The retention model: P(M) = e^(-λt) · (1 + βf)

  • λ = temporal decay rate
  • t = time since last access
  • f = access frequency
  • β = reinforcement coefficient

Frequently accessed facts stay hot. Old, unreinforced facts decay toward deactivation. The sleeptime daemon runs this scoring while you're offline, pruning stale context automatically.
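
Here is the formula as code, with made-up values for λ and β (0.05 per day and 0.2) since the tuned values are not given here. The function and parameter names are illustrative.

```python
import math


def retention(days_since_access, access_count, decay_rate=0.05, reinforcement=0.2):
    """P(M) = e^(-λt) · (1 + βf), with t in days and f as an access count."""
    return math.exp(-decay_rate * days_since_access) * (1 + reinforcement * access_count)


# Same age, different reinforcement: the frequently used fact scores higher.
stale = retention(days_since_access=60, access_count=0)
reinforced = retention(days_since_access=60, access_count=10)
print(f"stale={stale:.3f} reinforced={reinforced:.3f}")
```

Note that the result can exceed 1 (a fact accessed today with f > 0, for example), so it behaves as a ranking score rather than a strict probability. Deactivation then comes down to comparing the score against a cutoff.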

Why Channel Adapters (Not Terminal-Only)

A personal AI OS that only works when you're at your desk is half a solution. The channel adapter framework lets you query your knowledge base from Slack or Telegram. The system runs as a local daemon, receives messages, routes them through RAG search, and sends back answers.

This was directly inspired by Letta's custom channels architecture. The key insight: decouple the intelligence from the interface. The same RAG pipeline serves the CLI, the dashboard, the Obsidian plugin, and messaging apps.
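
In code, that decoupling can be as small as an adapter interface plus one shared handler. The sketch below is hypothetical throughout: the adapter classes, the `format_reply` method, and the stand-in search function are invented to show the shape, and no real Slack or Telegram API calls are made.

```python
from typing import Callable, Protocol


class ChannelAdapter(Protocol):
    """Hypothetical interface: each channel only knows how to present text."""

    name: str

    def format_reply(self, answer: str) -> str: ...


class SlackAdapter:
    name = "slack"

    def format_reply(self, answer):
        return f"*Eidetic*: {answer}"  # Slack-style bold marker


class TelegramAdapter:
    name = "telegram"

    def format_reply(self, answer):
        return f"Eidetic: {answer}"


def handle_message(adapter: ChannelAdapter, text: str, search: Callable[[str], str]) -> str:
    """Route any channel's message through the same search function."""
    return adapter.format_reply(search(text))


def fake_search(query):
    """Stand-in for the RAG pipeline."""
    return f"top result for '{query}'"


for adapter in (SlackAdapter(), TelegramAdapter()):
    print(handle_message(adapter, "auth decision", fake_search))
```

Adding a new channel means writing one more adapter class; the search path stays untouched.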

Where It's Going: v5.0

A competitive analysis against Letta ($10M funded), Mem0 ($24M Series A), and Nucleus MCP revealed two gaps between "personal tool" and "enterprise platform": verification and provenance.

The Trust Problem

When an AI agent executes code autonomously, the question isn't "can it do the task?" It's "can you prove it did the task correctly?"

In regulated industries — finance, healthcare, government — every autonomous action needs a tamper-evident audit trail. Our JSONL log captures everything, but anyone with filesystem access can modify it after the fact. And for code execution: we check for danger (AST scanning), but we don't verify correctness.

Structured Verification Gates (#29)

A 5-tier pipeline that runs before any autonomous execution:

  1. SYNTAX — AST parse, catch errors before anything runs
  2. IMPORTS — Verify dependencies resolve, cross-reference security block list
  3. TESTS — If tests exist for the module, run them
  4. RUNTIME — Execute in sandbox, capture output and resource usage
  5. DIFF — Show what changed, flag unexpected modifications
Code/Skill ──► SYNTAX ──► IMPORTS ──► TESTS ──► RUNTIME ──► DIFF ──► ✅ Execute
                 │           │          │          │          │
                 ▼           ▼          ▼          ▼          ▼
              BLOCK?      BLOCK?     FAIL?     CRASH?    UNEXPECTED?
                 │           │          │          │          │
                 └───────────┴──────────┴──────────┴──────────┘
                                    │
                                 🛑 STOP

Each tier produces a typed result. Execution stops on the first BLOCK-level failure. Every result feeds into the audit trail.
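
Since this feature is still being built, the following is only a sketch of the control flow: typed results, gates run in order, stop at the first BLOCK. It implements just the first two tiers, and every name in it (`GateResult`, `run_gates`, the blocked-import set) is an assumption rather than the planned API.

```python
import ast
from dataclasses import dataclass
from enum import Enum
from typing import Callable


class Status(Enum):
    PASS = "pass"
    BLOCK = "block"


@dataclass
class GateResult:
    gate: str
    status: Status
    detail: str = ""


def syntax_gate(source):
    """Tier 1: the code must at least parse."""
    try:
        ast.parse(source)
    except SyntaxError as exc:
        return GateResult("SYNTAX", Status.BLOCK, str(exc))
    return GateResult("SYNTAX", Status.PASS)


def imports_gate(source, blocked=frozenset({"subprocess"})):
    """Tier 2: reject imports whose top-level module is on the block list."""
    for node in ast.walk(ast.parse(source)):
        names = []
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        for name in names:
            if name.split(".")[0] in blocked:
                return GateResult("IMPORTS", Status.BLOCK, f"blocked import: {name}")
    return GateResult("IMPORTS", Status.PASS)


def run_gates(source: str, gates: list[Callable[[str], GateResult]]) -> list[GateResult]:
    """Run gates in order and stop after the first BLOCK result."""
    results = []
    for gate in gates:
        result = gate(source)
        results.append(result)
        if result.status is Status.BLOCK:
            break
    return results


for result in run_gates("import subprocess\n", [syntax_gate, imports_gate]):
    print(result.gate, result.status.value, result.detail)
```

Returning the full list of results, including the passing ones, gives the audit trail a record of what was checked and not only of what failed.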

Cryptographic Audit Signatures (#30)

Every audit trail entry gets an Ed25519 signature, and a SHA-256 hash chain links each entry to the previous one. If any entry is modified after creation, the chain breaks and verification fails.

Entry 1          Entry 2          Entry 3
┌──────────┐    ┌──────────┐    ┌──────────┐
│ action   │    │ action   │    │ action   │
│ timestamp│    │ timestamp│    │ timestamp│
│ prev: ∅  │───►│ prev: h1 │───►│ prev: h2 │
│ sig: Ed25│    │ sig: Ed25│    │ sig: Ed25│
└──────────┘    └──────────┘    └──────────┘
   hash=h1         hash=h2         hash=h3

This turns the audit trail from "a log file" into "a cryptographic proof of execution history." It is meant to support the audit-trail requirements of frameworks such as SOC 2, EU DORA, and MAS TRM without any cloud service.
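
The hash-chain half of this is easy to demonstrate with the standard library. The sketch below deliberately leaves out the Ed25519 signatures, because Python's standard library has no Ed25519 support and signing would need a third-party package. The field names follow the diagram above, while the function names are invented for the example.

```python
import hashlib
import json


def entry_hash(body):
    """SHA-256 over a canonical JSON encoding of the entry body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_entry(chain, action, timestamp):
    """Append an entry whose 'prev' field is the hash of the previous entry."""
    prev = chain[-1]["hash"] if chain else None
    body = {"action": action, "timestamp": timestamp, "prev": prev}
    chain.append({**body, "hash": entry_hash(body)})


def verify_chain(chain):
    """Recompute every hash and check each link; False on any mismatch."""
    prev = None
    for entry in chain:
        body = {key: entry[key] for key in ("action", "timestamp", "prev")}
        if entry["prev"] != prev or entry["hash"] != entry_hash(body):
            return False
        prev = entry["hash"]
    return True


log = []
append_entry(log, "skill.run:backup", "2026-05-15T09:00:00Z")
append_entry(log, "vault.write:notes.md", "2026-05-15T09:01:00Z")
append_entry(log, "skill.run:report", "2026-05-15T09:02:00Z")
print(verify_chain(log))

log[1]["action"] = "vault.write:other.md"  # simulate after-the-fact tampering
print(verify_chain(log))
```

The chain alone detects edits, but someone who can rewrite the whole file can also recompute every hash. That gap is what the per-entry signatures are for, as long as the signing key is kept out of reach of whoever can edit the log.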

Tiered Memory (#31)

Memory moves from a flat vector store to three tiers: Core (hot context), Recall (recent cache), and Archival (cold storage). The agent decides what stays active and what gets archived, using memory decay scoring to make informed decisions.

Valkey Search (#32)

A high-performance search backend for multi-user deployments where retrieval latency matters. sqlite-vec stays the zero-config default for personal use.

The Philosophy

Every decision in Eidetic OS comes back to three principles:

  1. Local-first — Your data never leaves your machine unless you explicitly send it somewhere
  2. Human-readable — Every piece of state is inspectable in a text editor
  3. Progressive complexity — Works with zero config out of the box, scales to enterprise with opt-in features

The AI agent space is moving fast. Letta and Mem0 have venture funding and full teams. What we have is a different philosophy: your AI's memory belongs to you, runs on your hardware, and produces verifiable proof of what it did.

pip install eidetic-os
eidetic init
eidetic doctor

GitHub: paulholland511/eidetic-os
PyPI: eidetic-os

v5.0 features are being built right now. Star the repo if you want to follow along.


This is the second post in my series on building Eidetic OS. The first post covers the full feature set and comparison tables.