<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: prabhdeep</title>
    <description>The latest articles on DEV Community by prabhdeep (@prbhhhhh).</description>
    <link>https://dev.to/prbhhhhh</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3902623%2F33c06e43-9f94-4f5b-9c8b-f061b7ab532b.png</url>
      <title>DEV Community: prabhdeep</title>
      <link>https://dev.to/prbhhhhh</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/prbhhhhh"/>
    <language>en</language>
    <item>
      <title>OpenKairos: Open Implementation of the Leaked KAIROS Architecture</title>
      <dc:creator>prabhdeep</dc:creator>
      <pubDate>Tue, 28 Apr 2026 14:55:42 +0000</pubDate>
      <link>https://dev.to/prbhhhhh/openkairos-open-implementation-of-the-leaked-kairos-architecture-22ld</link>
      <guid>https://dev.to/prbhhhhh/openkairos-open-implementation-of-the-leaked-kairos-architecture-22ld</guid>
      <description>&lt;h2&gt;
  
  
  &lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmp3tgimlub9g1gc93wtl.gif" alt=" " width="720" height="457"&gt;
&lt;/h2&gt;

&lt;h2&gt;
  
  
  The Context
&lt;/h2&gt;

&lt;p&gt;The most interesting part of the leak wasn’t model weights or APIs—it was architecture.&lt;br&gt;
Specifically, the idea of a persistent daemon: a system that observes, reacts, and schedules actions without explicit user prompts. Think less “chatbot,” more “background intelligence layer.”&lt;br&gt;
That concept stuck with me.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Build Timeline
&lt;/h2&gt;

&lt;p&gt;I started building on April 19 with a simple constraint:&lt;br&gt;
No massive infra. No hidden magic. Just reproducible components.&lt;br&gt;
The goal wasn’t to copy anything—it was to see if the pattern could be rebuilt from scratch.&lt;br&gt;
Nine days later, I had a working prototype.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Stack (What actually matters)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Python + asyncio → event loop for continuous execution&lt;/li&gt;
&lt;li&gt;Watchdog → filesystem + environment triggers&lt;/li&gt;
&lt;li&gt;Ollama → local model inference (no external API dependency)&lt;/li&gt;
&lt;li&gt;Task scheduler layer → priority + interrupt handling&lt;/li&gt;
&lt;li&gt;Three-layer memory system → short-term (context window), mid-term (session logs), long-term (vector store)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Everything runs as a daemon process—not a request/response server.&lt;/p&gt;
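&lt;p&gt;A minimal sketch of the three memory layers, assuming a rolling deque for the context window and a plain dict standing in for the vector store (the class and method names are illustrative, not the project's actual API):&lt;/p&gt;

```python
from collections import deque

class ThreeLayerMemory:
    """Illustrative sketch of the short-/mid-/long-term split."""

    def __init__(self, context_size=8):
        self.short_term = deque(maxlen=context_size)  # rolling context window
        self.mid_term = []                            # append-only session log
        self.long_term = {}                           # stand-in for a vector store

    def record(self, event):
        # Every observation enters the context window and the session log;
        # the deque silently evicts the oldest entry when full.
        self.short_term.append(event)
        self.mid_term.append(event)

    def promote(self, key, summary):
        # Promotion into long-term memory; a real system would embed the
        # summary and upsert it into a vector store instead of a dict.
        self.long_term[key] = summary

mem = ThreeLayerMemory(context_size=2)
mem.record("user opened project")
mem.record("config.yaml changed")
mem.record("tests failed")
mem.promote("builds", "tests have been failing today")
```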

&lt;h2&gt;
  
  
  Core Design Idea
&lt;/h2&gt;

&lt;p&gt;Instead of:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;User → Prompt → Response&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;It works like:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;System Loop → Observe → Decide → Act → Store → Repeat&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;That shift changes everything:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;latency expectations&lt;/li&gt;
&lt;li&gt;memory handling&lt;/li&gt;
&lt;li&gt;failure modes&lt;/li&gt;
&lt;li&gt;resource management&lt;/li&gt;
&lt;/ol&gt;
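&lt;p&gt;The loop can be sketched in a few lines of asyncio; the observe/decide/act/store steps here are placeholders, and the run is bounded so the sketch terminates:&lt;/p&gt;

```python
import asyncio

async def agent_loop(memory, ticks):
    # A real daemon would loop forever; this sketch runs a fixed
    # number of ticks so it terminates.
    for tick in range(ticks):
        event = f"event-{tick}"            # observe: poll triggers, files, queues
        should_act = tick % 2 == 0         # decide: a policy or model call goes here
        if should_act:
            result = f"acted-on-{event}"   # act: run the chosen task
            memory.append(result)          # store: persist the outcome
        await asyncio.sleep(0)             # yield so other tasks can run

memory = []
asyncio.run(agent_loop(memory, ticks=4))
```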

&lt;h2&gt;
  
  
  The Weird Part: “AutoDream”
&lt;/h2&gt;

&lt;p&gt;The hardest problem wasn’t inference—it was memory.&lt;/p&gt;

&lt;p&gt;I ended up building something I call AutoDream:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Runs periodically (or during idle windows)&lt;/li&gt;
&lt;li&gt;Compresses recent interactions&lt;/li&gt;
&lt;li&gt;Promotes useful patterns into long-term memory&lt;/li&gt;
&lt;li&gt;Drops noise&lt;/li&gt;
&lt;/ul&gt;
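&lt;p&gt;A toy version of such a consolidation pass, with simple frequency counting standing in for whatever pattern detection the real system uses:&lt;/p&gt;

```python
from collections import Counter

def auto_dream(session_log, long_term, min_count=2):
    # Compress: count recurring events in the recent session log.
    counts = Counter(session_log)
    for event, n in counts.items():
        if n >= min_count:
            # Promote: recurring patterns survive into long-term memory.
            long_term[event] = long_term.get(event, 0) + n
        # Drop: one-off events below the threshold are discarded.
    return [event for event, n in counts.items() if n >= min_count]

long_term = {}
log = ["build failed", "build failed", "cat walked on keyboard"]
compressed = auto_dream(log, long_term)
```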

&lt;p&gt;The constraint: each pass must complete within ~15 seconds or the scheduler kills it.&lt;/p&gt;
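&lt;p&gt;One way to enforce a budget like that, assuming an asyncio scheduler, is &lt;code&gt;asyncio.wait_for&lt;/code&gt;; the timeout is shortened here so the sketch finishes quickly:&lt;/p&gt;

```python
import asyncio

async def consolidate():
    await asyncio.sleep(1.0)   # pretend this is a slow summarization pass
    return "consolidated"

async def run_with_budget(budget):
    try:
        return await asyncio.wait_for(consolidate(), timeout=budget)
    except asyncio.TimeoutError:
        return "killed"        # the scheduler drops the pass and moves on

# Budget shortened from ~15 s so the example runs fast.
outcome = asyncio.run(run_with_budget(budget=0.05))
```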

&lt;p&gt;This forced aggressive tradeoffs:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;summarization vs fidelity&lt;/li&gt;
&lt;li&gt;frequency vs cost&lt;/li&gt;
&lt;li&gt;stability vs adaptability&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Still not fully solved.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Broke (and why it matters)
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Long-running loops drift without strong constraints&lt;/li&gt;
&lt;li&gt;Memory systems become garbage collectors if unmanaged&lt;/li&gt;
&lt;li&gt;Background agents need interruptibility, not just intelligence&lt;/li&gt;
&lt;/ul&gt;
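&lt;p&gt;In asyncio, interruptibility mostly means tolerating cancellation at any &lt;code&gt;await&lt;/code&gt; point and leaving state consistent on the way out. A sketch (names illustrative):&lt;/p&gt;

```python
import asyncio

async def background_agent(state):
    try:
        while True:
            state["ticks"] = state.get("ticks", 0) + 1
            await asyncio.sleep(0.01)   # every await is a cancellation point
    except asyncio.CancelledError:
        state["clean_shutdown"] = True  # flush / commit before exiting
        raise                           # re-raise so cancellation completes

async def main():
    state = {}
    task = asyncio.create_task(background_agent(state))
    await asyncio.sleep(0.05)
    task.cancel()                       # interrupt, don't kill -9
    try:
        await task
    except asyncio.CancelledError:
        pass
    return state

state = asyncio.run(main())
```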

&lt;p&gt;This isn’t just “LLM engineering”—it’s closer to OS design.&lt;/p&gt;

&lt;h2&gt;
  
  
  Call to Action
&lt;/h2&gt;

&lt;p&gt;The full implementation is open source:&lt;br&gt;
&lt;a href="https://github.com/prabhkesar123/openkairos" rel="noopener noreferrer"&gt;github.com/prabhkesar123/openkairos&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you’re exploring persistent agents, daemonized LLMs, or memory systems—I’d be interested in what approaches you’re taking.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>python</category>
      <category>opensource</category>
      <category>agents</category>
    </item>
  </channel>
</rss>
