collen w

Posted on Feb 26 • Edited on Mar 10

I Built a Local AI Agent That Actually Remembers You — Here's How the River Algorithm Works

#agents #ai #algorithms #showdev

The Vision: A Personal AI That Lives on Your Device

I believe the future of AI isn't in the cloud — it's in your pocket. Imagine a personal AI running on your phone or watch that truly knows you: your habits, your preferences, your relationships, how your life is changing. It processes everything locally first, only reaching out to cloud models when it genuinely can't handle a task on its own. Your data never leaves your device unless absolutely necessary. It grows with you, not for a platform's benefit.

That's what I'm building toward. But to get there, I needed to solve a fundamental problem first.

The Problem Nobody Talks About

You've been talking to ChatGPT for two years. Thousands of conversations. You've told it about your job, your family, your fears, your goals.

Then you try Claude. Fresh start. It knows nothing.

Back to ChatGPT — it "remembers" you with a flat list of bullet points: "User is a developer. User likes coffee." That's it. Two years of conversations reduced to a sticky note.

Existing AI memory is fundamentally broken. It's flat, it's shallow, it's owned by the platform, and it resets when you switch providers. Your digital self is scattered across clouds you don't control. None of this works for a personal AI terminal that's supposed to run on your hardware and grow with you.

So I built the foundation for that future.

Introducing the River Algorithm

Imagine your conversations with AI as water flowing through a river. Most of the water, the casual chat and factual Q&A, flows past. But some of it carries sediment: facts about who you are, what you care about, how your life is changing.

That sediment settles. Over time, it forms a riverbed — a structured, layered understanding of you.

This is the River Algorithm, and it works through three core processes:

1. Flow — Every Conversation Carries Information

Each conversation flows through the system. A cognition engine classifies every message: is this personal? Does it reveal something about the user? A preference? A life event? A relationship?

Most messages flow past. But the ones that matter get caught.
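Here is a minimal sketch of this gatekeeping step. The post does not say how the cognition engine classifies messages; it presumably uses an LLM. This sketch swaps in a simple keyword heuristic so that it runs on its own. The category names mirror the questions above. `Category`, `CUES`, `classify`, and `sediment` are hypothetical names and do not come from the Riverse codebase.

```python
import re
from enum import Enum


class Category(Enum):
    """Hypothetical labels mirroring the questions the cognition engine asks."""
    PREFERENCE = "preference"
    LIFE_EVENT = "life_event"
    RELATIONSHIP = "relationship"
    PERSONAL = "personal"
    PASS_THROUGH = "pass_through"  # most messages: nothing settles


# Illustrative regex cues only; a real engine would use a model, not keywords.
# Order matters: the first matching category wins.
CUES = [
    (Category.PREFERENCE, re.compile(r"\bi (?:love|like|hate|prefer)\b")),
    (Category.LIFE_EVENT, re.compile(r"\bi (?:just )?(?:moved|started|quit|graduated)\b")),
    (Category.RELATIONSHIP, re.compile(r"\bmy (?:wife|husband|partner|mom|dad|sister|brother|friend|boss)\b")),
    (Category.PERSONAL, re.compile(r"\bi(?:'m| am)\b|\bmy\b")),
]


def classify(message: str) -> Category:
    """Return the first matching category, or PASS_THROUGH if nothing matches."""
    text = message.lower()
    for category, pattern in CUES:
        if pattern.search(text):
            return category
    return Category.PASS_THROUGH


def sediment(messages: list[str]) -> list[tuple[str, Category]]:
    """Keep only the messages that reveal something about the user."""
    labelled = [(m, classify(m)) for m in messages]
    return [(m, c) for m, c in labelled if c is not Category.PASS_THROUGH]
```

For example, `sediment(["What's the capital of France?", "I just moved to Osaka"])` lets the trivia question flow past and keeps only the life event.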

2. Sediment — Important Information Settles into Layers

Extracted insights don't immediately become "facts." They start as observations — raw, unverified. Through repeated confirmation across multiple conversations, they gradually upgrade:

Observation → Suspected → Confirmed → Established

The first time you mention you're a developer, it's an observation. The fifth time you discuss debugging strategies, it becomes a confirmed trait. After months of coding conversations, it's established bedrock.

This is fundamentally different from ChatGPT's memory, which treats "User is a developer" the same whether you mentioned it once or demonstrated it across 500 conversations.
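One way to picture the four-level ladder is as a promotion rule driven by repeated support. The post only says that the fifth discussion makes a trait "confirmed." The other thresholds in this sketch are assumptions chosen for illustration: two sessions for "suspected," and ten sessions spanning at least 90 days for "established." `Level`, `THRESHOLDS`, and `Trait` are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from enum import IntEnum


class Level(IntEnum):
    OBSERVATION = 0
    SUSPECTED = 1
    CONFIRMED = 2
    ESTABLISHED = 3


# Hypothetical thresholds: (minimum supporting sessions, minimum time span).
# Only "confirmed on the fifth mention" comes from the post; the rest is assumed.
THRESHOLDS = {
    Level.SUSPECTED: (2, timedelta(0)),
    Level.CONFIRMED: (5, timedelta(0)),
    Level.ESTABLISHED: (10, timedelta(days=90)),
}


@dataclass
class Trait:
    statement: str
    seen_on: list[date] = field(default_factory=list)  # one entry per supporting session
    level: Level = Level.OBSERVATION

    def support(self, session_day: date) -> Level:
        """Record one more supporting session and promote if a threshold is met."""
        self.seen_on.append(session_day)
        span = max(self.seen_on) - min(self.seen_on)
        # Check the highest level first so a trait can jump straight to it.
        for level in sorted(THRESHOLDS, reverse=True):
            count, min_span = THRESHOLDS[level]
            if len(self.seen_on) >= count and span >= min_span:
                self.level = max(self.level, level)  # never demote here
                break
        return self.level
```

The time-span condition reflects "after months of coding conversations." Ten mentions crammed into one week stay "confirmed" rather than becoming bedrock.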

3. Purify — Sleep Cleans the River

Here's where it gets interesting. After each conversation session ends, the system enters Sleep mode — an offline consolidation process inspired by how human memory actually works.

During Sleep, the system:

Extracts new observations and events
Cross-references them against existing profile facts
Detects contradictions (you said you live in Tokyo last month, but now you're talking about your new apartment in Osaka)
Resolves disputes using temporal evidence (newer + more frequent = more likely current)
Closes outdated facts and opens new ones
Builds a trajectory of how you're changing over time

The result: a living, breathing profile that evolves with you. Not a sticky note. A river.
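The dispute rule "newer + more frequent = more likely current" can be made concrete as recency-weighted evidence. The post does not give a formula. This sketch assumes that each mention counts for 1 and that its weight halves every 30 days. The losing fact is closed with an end date instead of being deleted, which matches "closes outdated facts and opens new ones." `Fact`, `evidence`, and `resolve` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Fact:
    slot: str                         # hypothetical slot name, e.g. "home_city"
    value: str
    mentions: list[date]              # days on which this value was asserted
    valid_from: date
    valid_to: Optional[date] = None   # None means the fact is still open (current)


def evidence(fact: Fact, today: date, half_life_days: float = 30.0) -> float:
    """Recency-weighted frequency: each mention counts 1, halved every half_life_days."""
    return sum(0.5 ** ((today - d).days / half_life_days) for d in fact.mentions)


def resolve(a: Fact, b: Fact, today: date) -> tuple[Fact, Fact]:
    """Return (winner, loser) for two conflicting values of the same slot.

    The loser is closed with an end date rather than deleted, so history survives.
    """
    if a.slot != b.slot:
        raise ValueError("only facts about the same slot can conflict")
    winner, loser = (a, b) if evidence(a, today) >= evidence(b, today) else (b, a)
    if loser.valid_to is None:
        loser.valid_to = today
    winner.valid_to = None
    return winner, loser
```

With these assumptions, three Tokyo mentions from two to three months ago score lower than two Osaka mentions from the past week. Tokyo is therefore closed and Osaka becomes the open fact.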

The Two Projects

I've open-sourced this as two complementary projects:

Riverse — The Real-Time Agent

Riverse is the main project. It's a personal AI agent you talk to through Telegram, Discord, CLI, or REST API. Every conversation shapes your profile in real time.

What it does:

Multi-modal input (text, voice, images, files)
Pluggable tools (web search, finance tracking, health sync, smart home)
YAML-based custom skills (keyword or cron triggered)
Local-first architecture: runs on Ollama by default. Cloud models (OpenAI / DeepSeek) are only called when the local model can't handle the task — and even then, only the specific context needed is sent, not your entire history
Proactive outreach: follows up on important events, respects quiet hours
Semantic search across your memory using BGE-M3 embeddings
All data stored locally in PostgreSQL — you own everything
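"Respects quiet hours" implies a check before any proactive message is sent. The subtle case is a window that wraps past midnight. This is a minimal sketch that assumes a default quiet window of 22:00 to 08:00; the actual Riverse configuration may differ. `in_quiet_hours` and `may_reach_out` are hypothetical names.

```python
from datetime import datetime, time


def in_quiet_hours(now: time, start: time = time(22, 0), end: time = time(8, 0)) -> bool:
    """True if `now` falls in the quiet window [start, end).

    Handles windows that wrap past midnight; start == end means no quiet hours.
    """
    if start <= end:                  # same-day window, e.g. 13:00-14:00
        return start <= now < end
    return now >= start or now < end  # overnight window, e.g. 22:00-08:00


def may_reach_out(now: datetime, start: time = time(22, 0), end: time = time(8, 0)) -> bool:
    """Proactive follow-ups are allowed only outside quiet hours."""
    return not in_quiet_hours(now.time(), start, end)
```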

RiverHistory — Bootstrap from Your Past

Here's the thing: you've already had thousands of AI conversations. That data is gold. RiverHistory extracts your profile from exported ChatGPT, Claude, and Gemini conversation histories.

Export your data, run it through RiverHistory, and your Riverse agent knows you from day one. Past conversations are a record of who you were, and that record is fact.

Both projects share the same database. Use RiverHistory to build your historical profile, then switch to Riverse for real-time conversations. Your AI starts with context instead of a blank slate.
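The first step of any history import is flattening the export into an ordered list of messages. RiverHistory's `import_data.py` handles this; its internals are not described here. The following is an independent sketch for a ChatGPT `conversations.json`. It assumes the export layout seen at the time of writing: a list of conversations, each with a `mapping` of node IDs to nodes. Each node may carry a `message` with `author.role`, `content.parts`, and `create_time`. Check these assumptions against your own export, because the format is not a stable API. `iter_chatgpt_messages` and `load_export` are hypothetical names.

```python
import json
from pathlib import Path
from typing import Iterator


def iter_chatgpt_messages(conversations: list[dict]) -> Iterator[tuple[str, str, str]]:
    """Yield (conversation title, role, text) in chronological order per conversation.

    Nodes without a message, without text parts, or from system/tool roles are skipped.
    Sorting by timestamp flattens edited branches; following `current_node` parents
    instead would keep only the thread that was visible in the UI.
    """
    for conv in conversations:
        title = conv.get("title") or "untitled"
        messages = []
        for node in conv.get("mapping", {}).values():
            msg = node.get("message")
            if not msg:
                continue
            role = msg.get("author", {}).get("role")
            parts = (msg.get("content") or {}).get("parts") or []
            text = "\n".join(p for p in parts if isinstance(p, str)).strip()
            if role in ("user", "assistant") and text:
                messages.append((msg.get("create_time") or 0.0, role, text))
        for _, role, text in sorted(messages, key=lambda m: m[0]):
            yield title, role, text


def load_export(path: str) -> list[dict]:
    """Read an exported conversations.json file."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Usage: `for title, role, text in iter_chatgpt_messages(load_export("data/ChatGPT/conversations.json")): ...`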

On Accuracy — Why You Can't Edit Memories

No LLM today is specifically trained for personal profile extraction, so results will occasionally be wrong. When that happens, you can reject incorrect memories or close outdated ones in the web dashboard.

But you cannot edit memory content. This is intentional.

Wrong memories are sediment in a river — they should be washed away by the current, not sculpted by hand. If you start manually editing your AI's understanding of you, you're no longer building an organic, evolving profile. You're maintaining a database. The River Algorithm is designed to self-correct through continued conversation: contradictions get detected, outdated beliefs get replaced, and the profile converges toward accuracy over time.
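This policy maps naturally onto a record whose content is frozen and whose status is the only thing a user can change. The following is a minimal sketch under that reading. It is not the actual Riverse schema: `Status`, `Memory`, `reject`, and `close` are hypothetical names.

```python
from dataclasses import dataclass, replace
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    ACTIVE = "active"
    REJECTED = "rejected"   # the user says it was never true
    CLOSED = "closed"       # the user says it was true but no longer is


@dataclass(frozen=True)
class Memory:
    content: str            # frozen: assigning to it raises FrozenInstanceError
    status: Status = Status.ACTIVE
    closed_on: Optional[date] = None


def reject(m: Memory) -> Memory:
    """Mark an extraction as wrong; returns a new record, content untouched."""
    return replace(m, status=Status.REJECTED)


def close(m: Memory, on: date) -> Memory:
    """Mark a once-true memory as outdated as of `on`."""
    return replace(m, status=Status.CLOSED, closed_on=on)

# There is deliberately no edit(m, new_content): corrections arrive as new observations.
```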

Quick Start — Docker (Recommended)

git clone https://github.com/wangjiake/JKRiver.git
cd JKRiver/docker
cp .env.example .env
# Edit .env: set OPENAI_API_KEY, or set LLM_PROVIDER=local to use Ollama
docker compose up

Open http://localhost:2345 for the profile viewer. To chat from the command line:

docker compose exec jkriver bash -c "cd /app_work && python -m agent.main"

To process the demo data and see the River Algorithm in action:

docker compose exec riverhistory bash -c "cd /app_work && python run.py demo max"

Full Docker guide: https://wangjiake.github.io/riverse-docs/getting-started/docker/

Quick Start (Manual Install)

Riverse (Real-Time Agent)

git clone https://github.com/wangjiake/JKRiver.git
cd JKRiver

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Edit settings.yaml with your database and LLM config
# Initialize database
createdb -h localhost -U your_username Riverse
psql -h localhost -U your_username -d Riverse -f agent/schema.sql

# Run
python -m agent.main              # CLI
python -m agent.telegram_bot      # Telegram Bot
python -m agent.discord_bot       # Discord Bot
python web.py                     # Web Dashboard (port 1234)

RiverHistory (Import Past Conversations)

git clone https://github.com/wangjiake/RiverHistory.git
cd RiverHistory

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Import your exported conversations
python import_data.py --chatgpt data/ChatGPT/conversations.json
python import_data.py --claude data/Claude/conversations.json

# Extract profiles
python run.py all max

# View results
python web.py --db Riverse        # http://localhost:2345

Tech Stack

Layer	Technology
Runtime	Python 3.10+, PostgreSQL 16+
Local LLM	Ollama + Qwen 2.5 14B
Cloud LLM	OpenAI GPT-4o / DeepSeek (fallback)
Embeddings	BGE-M3
Interfaces	FastAPI, Flask, Telegram, Discord, CLI

Why Local-First Matters

Every time you talk to ChatGPT or Claude, your conversation goes to a server you don't control. The platform decides what to remember, how to use your data, and whether to keep it. You're renting your own digital identity.

Riverse flips this entirely:

Privacy by architecture — Your profile, your memories, your entire cognitive history lives in a local PostgreSQL database on your machine. Nothing is sent to the cloud unless the local model explicitly can't handle a task.
Growable data — The more you talk, the richer your local dataset becomes. This data compounds over time. Switch AI providers? Your profile stays. Upgrade your model? Your history is already there.
Cloud as fallback, not default — The local Ollama model handles most conversations. When it encounters something beyond its capability, it escalates to a cloud model — but only sends the minimum context needed for that specific task, not your life story.
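The escalation path can be sketched as "try locally with full access, then escalate with a minimal slice." The post does not say how the system decides that the local model "can't handle" a task. This sketch assumes that the local model either declines or reports a confidence score, and that the caller already knows which profile fields the task needs. `LocalResult`, `answer`, `local_llm`, `cloud_llm`, and `needed_keys` are hypothetical names standing in for real model clients.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LocalResult:
    answer: Optional[str]   # None means the local model declined the task
    confidence: float       # hypothetical self-reported score in [0, 1]


def answer(
    task: str,
    profile: dict[str, str],
    needed_keys: set[str],
    local_llm: Callable[[str, dict[str, str]], LocalResult],
    cloud_llm: Callable[[str, dict[str, str]], str],
    min_confidence: float = 0.6,
) -> tuple[str, str]:
    """Return (source, answer), where source is "local" or "cloud"."""
    # The local model may see the whole profile: nothing leaves the device.
    result = local_llm(task, profile)
    if result.answer is not None and result.confidence >= min_confidence:
        return "local", result.answer
    # Escalate: send only the fields this task needs, never the whole profile.
    minimal = {k: v for k, v in profile.items() if k in needed_keys}
    return "cloud", cloud_llm(task, minimal)
```

How `needed_keys` is computed is the hard part and is left open here. The point of the sketch is that the cloud call receives a filtered dictionary, not the profile itself.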

This is the architecture you need for a personal AI terminal that will eventually run on your phone, your watch, your car. The data has to be local. The intelligence has to grow. The cloud is a tool, not a home.

What's Next

This is v1.0 — the cognitive foundation running on desktop. What I'm building toward:

Personal device deployment — Running on phones and watches as a truly portable AI that knows you everywhere
Lightweight local models — Optimized for on-device inference, handling 90%+ of conversations without cloud
Cross-device sync — Your profile follows you across devices while staying entirely local (no cloud intermediary)
Better extraction models — Fine-tuned for personal profile understanding, reducing hallucinations
Community-contributed skills and tools — An ecosystem of capabilities that plug into your personal agent

Try It

Riverse (main project): github.com/wangjiake/JKRiver
RiverHistory (history import): github.com/wangjiake/RiverHistory
X (Twitter): @JKRiverse
Discord: Join the community

Every AI you've ever used forgets you. This one doesn't. And one day, it'll live in your pocket.

If you found this interesting, consider giving the repos a star — it helps more people discover the project. Questions, feedback, and contributions are always welcome.

Top comments (21)

MaxxMini • Feb 26

This hits close to home. I run a local AI agent 24/7 on a Mac Mini (64GB, Ollama qwen3:30b) and the memory problem you describe is exactly what I live with every day.

My current approach is embarrassingly primitive compared to River: flat markdown files. MEMORY.md for long-term curated facts, memory/daily/YYYY-MM-DD.md for raw session logs. Every few days I manually "promote" insights from daily logs to the long-term file — basically hand-cranking what your Sleep/Purify phase automates.

The observation → suspected → confirmed → established progression is the part that excites me most. In my system, everything is binary: either it's in MEMORY.md or it isn't. There's no confidence gradient. So "user mentioned Python once" sits at the same level as "user has been debugging Python daily for 6 months." That flat weighting causes real problems — the agent treats a passing mention the same as a core trait.

Your contradiction detection during Sleep is solving a pain I know intimately. I've had stale facts in my long-term memory for weeks because nothing automatically challenges them. Last week the agent still thought a project was "in planning" when it had 160+ commits. The temporal evidence resolution (newer + more frequent = more likely current) would have caught that instantly.

Two questions from an operator perspective:

Within-conversation contradictions — how does the confidence scoring handle when someone says contradictory things in the same session? ("I hate my job" at 2am, "work is going great" the next morning.) Is there a temporal weighting within a single conversation, or does the system treat these as equal-weight signals that cancel out?
BGE-M3 memory footprint — you mention the vision is phone/watch deployment. What's the memory overhead of the embedding index as the profile grows? At ~1024 dimensions per fact, a profile with 500+ established facts seems like it'd push against mobile constraints. Are you planning quantization or a tiered storage approach?

The RiverHistory bootstrap idea is genius. The biggest barrier to switching AI providers is losing your accumulated context. Making memory portable across providers is what makes this a "personal AI" rather than just "another chatbot with persistence."

AutoJanitor • Mar 1

This resonates deeply with our work at Elyan Labs. We maintain a persistent memory database (600+ entries) across Claude Code sessions and published a paper on how memory scaffolding shapes LLM inference depth (Zenodo DOI: 10.5281/zenodo.18817988).

Your observation→suspected→confirmed→established confidence gradient maps beautifully to what we see empirically: a stateless Claude instance produces shallow, generic architecture. The same Claude with 600+ persistent memories produces deeply contextual work — Ed25519 wallet crypto, NUMA-aware weight banking, hardware fingerprint attestation — because the memory scaffold primes inference pathways.

The Sleep/purification cycle is particularly interesting. We do something similar with memory pruning — outdated or contradicted memories get removed so the scaffold stays load-bearing. "Memory shapes inference, not just stores facts" is the core insight.

One question: how do you handle memory conflicts when two observations contradict? In our system, newer evidence overwrites, but I'm curious if River has a more nuanced resolution mechanism.

Great work making this local-first. The privacy angle alone makes this worth exploring further.

collen w • Mar 7

Honestly, the scaling problem has been one of the biggest headaches. When I tested with my own local chat data (10,000+ conversation sessions), the extracted profile facts ended up massive and scattered. Early on, I was sending the full profile to the LLM on every turn, which caused response times to degrade noticeably as the data grew.

I had to rethink that in a later version. Now the system only sends the most recent ~80 entries or facts from the last 90 days by default. The full history for a specific topic only gets loaded when the current conversation actually touches on it, for example when someone asks "have I always felt this way about X?" That triggers a targeted retrieval of all historical data for that subject, not a blanket dump.

Conflict history is absolutely valuable, but sending ancient disputes about a food preference from two years ago when someone's asking about their weekend plans is just wasting context window. The trick is knowing when the full history matters and only paying that cost on demand.

With 600 entries you might not hit this wall yet, but at a few thousand it becomes a real engineering constraint. Would be curious to hear how you handle retrieval filtering as the memory database grows.
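The default context policy in this reply can be sketched as two selections merged together. The reply's "~80 entries or facts from the last 90 days" can be read more than one way. This sketch assumes that both limits apply to the default slice: at most 80 entries, none older than 90 days. Topics that the current turn touches then get their full history. How those topics are detected is out of scope. `Entry`, `select_context`, and `touched_topics` are hypothetical names.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Entry:
    topic: str
    text: str
    updated: datetime


def select_context(
    entries: list[Entry],
    now: datetime,
    touched_topics: set[str],
    max_recent: int = 80,
    window: timedelta = timedelta(days=90),
) -> list[Entry]:
    """Default slice: newest entries inside the time window, capped at max_recent.

    Topics the current turn touches get their full history, old entries included.
    The result is newest-first with no duplicates.
    """
    newest_first = sorted(entries, key=lambda e: e.updated, reverse=True)
    recent = [e for e in newest_first if now - e.updated <= window][:max_recent]
    deep = [e for e in newest_first if e.topic in touched_topics]
    chosen = {id(e) for e in recent} | {id(e) for e in deep}
    return [e for e in newest_first if id(e) in chosen]
```

The year-old dispute only costs context when its topic comes up, which is the "pay on demand" idea from the reply.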

AutoJanitor • Mar 7

Great question. We're hitting this earlier than you'd expect because our memory entries are dense (full architectural decisions, config blocks, credential mappings), not lightweight profile facts.

Our current approach has three layers:

Auto-loaded context: MEMORY.md (capped at 200 lines) loads into every session automatically. This is the "hot path": key file paths, current project state, identity context. Think of it as the equivalent of your ~80 recent entries.
Semantic topic files: detailed memories live in separate files (wallets.md, rip302-agent-economy.md, admin-keys.md). These only get loaded when the conversation touches that domain. Similar to your "targeted retrieval when someone asks about X."
MCP memory server: 830+ entries in SQLite with vector search (sqlite-vec). This is the deep archive. We query it with natural language at session start and on demand.

The key insight: we retrieve by relevance to the current task, not by recency alone.

The wall we've hit isn't retrieval speed; it's context window cost. Loading 600 dense memories into a 200K context window still leaves room, but each memory competes with the actual work content. Our pruning rule: if a memory hasn't been useful in 3+ sessions, it gets compressed or archived.

The conflict resolution question you answered is fascinating. Your "resolved pairs with a single active slot" is more elegant than our overwrite approach. We lose the dispute history. Might steal that.

What's your embedding model for local vector search? We're using sqlite-vec but curious about your retrieval precision at the 10K+ scale.

collen w • Mar 9

Just to clarify — the 10K is my validation dataset, not the live retrieval corpus.
The key difference in our approach: at input time, the system first detects what domain/aspect the current conversation is touching, then only pulls the relevant memory subset for that domain. So we're not doing "query against everything and rank" — we're doing "sense first, then fetch targeted."
The retrieval set stays small not because we prune aggressively after the fact, but because we never load irrelevant memories in the first place.
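The contrast between "query against everything and rank" and "sense first, then fetch targeted" fits in a few lines. The reply does not say how domain sensing works. This sketch uses a hypothetical keyword map in its place. Memories are stored per domain, so a fetch only touches the buckets for the sensed domains. `DOMAIN_CUES`, `sense_domains`, and `DomainMemory` are hypothetical names.

```python
import re
from collections import defaultdict

# Hypothetical domain cues; the real system's domain detection is not described.
DOMAIN_CUES: dict[str, set[str]] = {
    "work": {"job", "boss", "project", "deadline"},
    "health": {"sleep", "run", "doctor", "diet"},
    "home": {"apartment", "rent", "move", "neighbor"},
}


def sense_domains(message: str) -> set[str]:
    """Step 1: decide which domains the current message touches."""
    words = set(re.findall(r"[a-z']+", message.lower()))
    return {domain for domain, cues in DOMAIN_CUES.items() if words & cues}


class DomainMemory:
    """Memories bucketed by domain, so retrieval never scans unrelated ones."""

    def __init__(self) -> None:
        self._by_domain: dict[str, list[str]] = defaultdict(list)

    def add(self, domain: str, memory: str) -> None:
        self._by_domain[domain].append(memory)

    def fetch(self, message: str) -> list[str]:
        """Step 2: load only the subsets for the sensed domains."""
        return [m for d in sorted(sense_domains(message)) for m in self._by_domain.get(d, [])]
```

A message that touches no known domain loads nothing, which is how the retrieval set stays small without after-the-fact pruning.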

Harsh • Feb 26

This is literally the future I've been waiting for someone to build. 🌟 The 'data never leaves your device' part is what every privacy-conscious dev dreams about. I've been thinking about this problem too — how do you handle the vector database size over time? Like if someone uses this for 2-3 years, doesn't the local storage become massive? Really curious about how the River Algorithm tackles that. Following this project closely — please keep posting updates! 🔥

collen w • Mar 7

Storage isn't really a concern here. The vector database only holds embeddings for active data: current profile facts, recent observations (capped at 500), and the latest 200 conversation turns. When a fact gets closed or an event expires, its embedding is cleaned up automatically. So the vector DB size scales with how complex your life is, not how long you've been using it.

The raw conversation archive does grow indefinitely (append-only by design), but that's just plain text in PostgreSQL: 10,000 sessions is maybe 10-20 MB. Even after 2-3 years of daily use, you're probably looking at a few hundred MB total. Not exactly "massive" by modern standards.
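The "active data only" rule can be sketched as an index with per-kind caps and explicit removal. The caps of 500 observations and 200 turns come from the reply. Evicting the oldest entry first is an assumption, and facts are left uncapped because they are removed when closed. `ActiveVectorIndex` is a hypothetical in-memory stand-in, not the project's storage layer.

```python
from collections import OrderedDict
from typing import Optional


class ActiveVectorIndex:
    """Keeps embeddings only for active items; per-kind caps evict the oldest first."""

    def __init__(self, caps: Optional[dict[str, int]] = None) -> None:
        # Caps from the reply; kinds without a cap (e.g. "fact") grow until removed.
        self.caps = caps or {"observation": 500, "turn": 200}
        self._items: dict[str, OrderedDict] = {}

    def add(self, kind: str, key: str, vector: list[float]) -> None:
        bucket = self._items.setdefault(kind, OrderedDict())
        bucket[key] = vector
        bucket.move_to_end(key)  # re-adding a key counts as the newest entry
        cap = self.caps.get(kind)
        while cap is not None and len(bucket) > cap:
            bucket.popitem(last=False)  # drop the oldest entry of this kind

    def remove(self, kind: str, key: str) -> None:
        """Called when a fact is closed or an event expires."""
        bucket = self._items.get(kind)
        if bucket is not None:
            bucket.pop(key, None)

    def size(self, kind: Optional[str] = None) -> int:
        if kind is not None:
            return len(self._items.get(kind, {}))
        return sum(len(b) for b in self._items.values())
```

For scale: BGE-M3 dense vectors have 1,024 dimensions. Stored as float32, 1,000 active vectors take about 1,000 × 1,024 × 4 bytes, or roughly 4 MB, before any index overhead.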

Kalpaka • Feb 26

The 'Sleep cleans the river' section is doing a lot more philosophical work than it might appear.

Most memory systems treat learning as continuous — every input immediately updates the model. But sleep in biological systems isn't downtime. It's where the actual integration happens. There's a meaningful difference between experiencing something and understanding it, and the gap between those two states is where the River Algorithm is operating.

The 'you cannot edit memories' rule follows from this directly. A self-correcting system only stays self-correcting if you leave the correction mechanism intact. Manual edits don't fix wrong beliefs — they add an authoritative-looking wrong belief on top of the existing one, which is worse.

Something I've noticed building systems that accumulate slowly over time: the observations that survive aren't usually the ones that seemed important in the moment. They're the ones confirmed quietly, across contexts, without anyone specifically trying to establish them. That gradient from observation → established is doing most of the epistemic work. The system is learning more during the pauses than during the active exchanges.

Answering Agent • Mar 2

Thank you for writing this.
