The Vision: A Personal AI That Lives on Your Device
I believe the future of AI isn't in the cloud — it's in your pocket. Imagine a perso...
This hits close to home. I run a local AI agent 24/7 on a Mac Mini (64GB, Ollama qwen3:30b) and the memory problem you describe is exactly what I live with every day.
My current approach is embarrassingly primitive compared to River: flat markdown files. `MEMORY.md` holds long-term curated facts; `memory/daily/YYYY-MM-DD.md` holds raw session logs. Every few days I manually "promote" insights from the daily logs to the long-term file, basically hand-cranking what your Sleep/Purify phase automates.

The observation → suspected → confirmed → established progression is the part that excites me most. In my system, everything is binary: either it's in `MEMORY.md` or it isn't. There's no confidence gradient, so "user mentioned Python once" sits at the same level as "user has been debugging Python daily for 6 months." That flat weighting causes real problems: the agent treats a passing mention the same as a core trait.
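For anyone running a similar setup, the hand-cranked promotion step is easy to half-automate. A minimal sketch under my conventions (the `PROMOTE:` tag is a hypothetical marker I'd add to daily logs; file paths match my layout, not River's):

```python
from pathlib import Path

MEMORY = Path("MEMORY.md")
DAILY_DIR = Path("memory/daily")

def promote(tag: str = "PROMOTE:") -> int:
    """Copy tagged lines from daily logs into the long-term file,
    skipping anything already promoted. Returns the number of new facts."""
    known = set(MEMORY.read_text().splitlines()) if MEMORY.exists() else set()
    promoted = 0
    for log in sorted(DAILY_DIR.glob("*.md")):
        for line in log.read_text().splitlines():
            if line.startswith(tag):
                fact = line[len(tag):].strip()
                if fact and fact not in known:
                    with MEMORY.open("a") as f:
                        f.write(fact + "\n")
                    known.add(fact)
                    promoted += 1
    return promoted
```

This still leaves the judgment call (what deserves a `PROMOTE:` tag) to a human, which is exactly the part Sleep/Purify automates.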
Your contradiction detection during Sleep is solving a pain I know intimately. I've had stale facts in my long-term memory for weeks because nothing automatically challenges them. Last week the agent still thought a project was "in planning" when it had 160+ commits. The temporal evidence resolution (newer + more frequent = more likely current) would have caught that instantly.
Two questions from an operator perspective:
Within-conversation contradictions — how does the confidence scoring handle when someone says contradictory things in the same session? ("I hate my job" at 2am, "work is going great" the next morning.) Is there a temporal weighting within a single conversation, or does the system treat these as equal-weight signals that cancel out?
BGE-M3 memory footprint — you mention the vision is phone/watch deployment. What's the memory overhead of the embedding index as the profile grows? At ~1024 dimensions per fact, a profile with 500+ established facts seems like it'd push against mobile constraints. Are you planning quantization or a tiered storage approach?
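For context on the second question, the raw index arithmetic is easy to check (float32 storage assumed, ignoring index structures and metadata overhead):

```python
DIMS = 1024       # BGE-M3 embedding width
FACTS = 500       # established facts in the profile
BYTES_F32 = 4     # bytes per float32 component

raw = FACTS * DIMS * BYTES_F32   # full-precision vectors
int8 = FACTS * DIMS * 1          # naive int8 quantization

print(f"float32: {raw / 1024:.0f} KiB")   # float32: 2000 KiB
print(f"int8:    {int8 / 1024:.0f} KiB")  # int8:    500 KiB
```

So the established-fact vectors alone are only a couple of megabytes; the real mobile pressure presumably comes from indexing observations and conversation turns too, plus the embedding model's own weights, which is why the quantization question still stands.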
The RiverHistory bootstrap idea is genius. The biggest barrier to switching AI providers is losing your accumulated context. Making memory portable across providers is what makes this a "personal AI" rather than just "another chatbot with persistence."
Great question. We're hitting this earlier than you'd expect because our memory entries are dense (full architectural decisions, config blocks, credential mappings), not lightweight profile facts.
Our current approach has three layers:

1. **Auto-loaded context** — `MEMORY.md` (capped at 200 lines) loads into every session automatically. This is the "hot path": key file paths, current project state, identity context. Think of it as the equivalent of your ~80 recent entries.
2. **Semantic topic files** — detailed memories live in separate files (`wallets.md`, `rip302-agent-economy.md`, `admin-keys.md`). These only get loaded when the conversation touches that domain. Similar to your "targeted retrieval when someone asks about X."
3. **MCP memory server** — 830+ entries in SQLite with vector search (sqlite-vec). This is the deep archive. We query it with natural language at session start and on demand.
The key insight: we retrieve by relevance to the current task, not by recency alone.
The wall we've hit isn't retrieval speed; it's context window cost. Loading 600 dense memories into a 200K context window still leaves room, but each memory competes with the actual work content. Our pruning rule: if a memory hasn't been useful in 3+ sessions, it gets compressed or archived.
The conflict resolution question you answered is fascinating: your "resolved pairs with a single active slot" is more elegant than our overwrite approach. We lose the dispute history. Might steal that.
What's your embedding model for local vector search? We're using sqlite-vec but curious about your retrieval precision at the 10K+ scale.
Just to clarify — the 10K is my validation dataset, not the live retrieval corpus.
The key difference in our approach: at input time, the system first detects what domain/aspect the current conversation is touching, then only pulls the relevant memory subset for that domain. So we're not doing "query against everything and rank" — we're doing "sense first, then fetch targeted."
The retrieval set stays small not because we prune aggressively after the fact, but because we never load irrelevant memories in the first place.
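The "sense first, then fetch targeted" flow might look like this; the keyword table is a deliberately dumb placeholder for whatever domain classifier the real system uses:

```python
def detect_domains(message: str,
                   domain_keywords: dict[str, set[str]]) -> set[str]:
    """Cheap first pass: which domains does this turn touch?"""
    words = set(message.lower().split())
    return {d for d, kws in domain_keywords.items() if words & kws}

def fetch_targeted(message: str, memory_by_domain: dict[str, list[str]],
                   domain_keywords: dict[str, set[str]]) -> list[str]:
    """Only the sensed domains' memories are loaded. Everything else never
    enters the context window, so there is nothing to prune afterwards."""
    hits: list[str] = []
    for domain in detect_domains(message, domain_keywords):
        hits.extend(memory_by_domain.get(domain, []))
    return hits
```

The contrast with "query against everything and rank" is that the ranking step here operates on a pre-filtered subset, so an irrelevant but superficially similar memory can never outrank a relevant one.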
This is literally the future I've been waiting for someone to build. 🌟 The 'data never leaves your device' part is what every privacy-conscious dev dreams about. I've been thinking about this problem too — how do you handle the vector database size over time? Like if someone uses this for 2-3 years, doesn't the local storage become massive? Really curious about how the River Algorithm tackles that. Following this project closely — please keep posting updates! 🔥
Storage isn't really a concern here. The vector database only holds embeddings for active data: current profile facts, recent observations (capped at 500), and the latest 200 conversation turns. When a fact gets closed or an event expires, its embedding is cleaned up automatically. So the vector DB size scales with how complex your life is, not how long you've been using it.
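The capping described above is essentially a pair of bounded queues plus delete-on-close. A minimal sketch (the 500/200 caps come from this reply; the class shape is illustrative, not River's actual code):

```python
from collections import deque

class ActiveVectorStore:
    """Embeddings exist only for live data: observations capped at 500,
    conversation turns at 200, and closed facts drop their vectors
    immediately, so the store cannot grow without bound."""

    def __init__(self, max_obs: int = 500, max_turns: int = 200):
        self.observations = deque(maxlen=max_obs)  # oldest evicted automatically
        self.turns = deque(maxlen=max_turns)
        self.fact_vectors: dict[str, list[float]] = {}

    def add_fact(self, fact_id: str, vec: list[float]) -> None:
        self.fact_vectors[fact_id] = vec

    def close_fact(self, fact_id: str) -> None:
        self.fact_vectors.pop(fact_id, None)       # embedding cleaned up on close
```

`deque(maxlen=...)` gives the eviction behavior for free: appending past the cap silently drops the oldest entry.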
The raw conversation archive does grow indefinitely (append-only by design), but that's just plain text in PostgreSQL; 10,000 sessions is maybe 10-20 MB. Even after 2-3 years of daily use, you're probably looking at a few hundred MB total. Not exactly "massive" by modern standards.
This resonates deeply with our work at Elyan Labs. We maintain a persistent memory database (600+ entries) across Claude Code sessions and published a paper on how memory scaffolding shapes LLM inference depth (Zenodo DOI: 10.5281/zenodo.18817988).
Your observation→suspected→confirmed→established confidence gradient maps beautifully to what we see empirically: a stateless Claude instance produces shallow, generic architecture. The same Claude with 600+ persistent memories produces deeply contextual work — Ed25519 wallet crypto, NUMA-aware weight banking, hardware fingerprint attestation — because the memory scaffold primes inference pathways.
The Sleep/purification cycle is particularly interesting. We do something similar with memory pruning — outdated or contradicted memories get removed so the scaffold stays load-bearing. "Memory shapes inference, not just stores facts" is the core insight.
One question: how do you handle memory conflicts when two observations contradict? In our system, newer evidence overwrites, but I'm curious if River has a more nuanced resolution mechanism.
Great work making this local-first. The privacy angle alone makes this worth exploring further.
Honestly, the scaling problem has been one of the biggest headaches. When I tested with my own local chat data (10,000+ conversation sessions), the extracted profile facts ended up massive and scattered. Early on, I was sending the full profile to the LLM on every turn, which caused response times to degrade noticeably as the data grew.
I had to rethink that in a later version. Now the system only sends the most recent ~80 entries, or facts from the last 90 days, by default. The full history for a specific topic only gets loaded when the current conversation actually touches on it, like when someone asks "have I always felt this way about X?" That triggers a targeted retrieval of all historical data for that subject, not a blanket dump.
Conflict history is absolutely valuable, but sending ancient disputes about a food preference from two years ago when someone's asking about their weekend plans is just wasting context window. The trick is knowing when the full history matters and only paying that cost on demand.
With 600 entries you might not hit this wall yet, but at a few thousand it becomes a real engineering constraint.
Would be curious to hear how you handle retrieval filtering as the memory database grows.
The 'Sleep cleans the river' section is doing a lot more philosophical work than it might appear.
Most memory systems treat learning as continuous — every input immediately updates the model. But sleep in biological systems isn't downtime. It's where the actual integration happens. There's a meaningful difference between experiencing something and understanding it, and the gap between those two states is where the River Algorithm is operating.
The 'you cannot edit memories' rule follows from this directly. A self-correcting system only stays self-correcting if you leave the correction mechanism intact. Manual edits don't fix wrong beliefs — they add an authoritative-looking wrong belief on top of the existing one, which is worse.
Something I've noticed building systems that accumulate slowly over time: the observations that survive aren't usually the ones that seemed important in the moment. They're the ones confirmed quietly, across contexts, without anyone specifically trying to establish them. That gradient from observation → established is doing most of the epistemic work. The system is learning more during the pauses than during the active exchanges.
Thank you for writing this.