OpenClaw hit 216,000 GitHub stars in six weeks. It proved that millions of people want a personal AI assistant they can run themselves. Then came CVE-2026-25253. Then the ClawHavoc supply chain attack — 341 malicious skills, 9,000 compromised installations. Cisco and Palo Alto flagged it for a "lethal trifecta" of security risks: unrestricted tool access, no prompt injection detection, and plaintext credential storage.
I'd been building my own self-hosted AI assistant for months. When OpenClaw blew up — and then blew up differently — I decided to open-source it.
It's called Nova.
What makes Nova different
Every AI assistant answers questions. Nova is the only one that learns from getting them wrong.
```
You:  "What's the capital of Australia?"
Nova: "Sydney"
You:  "That's wrong, it's Canberra"

[Nova detects the correction, extracts a lesson, generates a
DPO training pair, and updates its knowledge graph.]

--- 3 months later, different conversation ---

You:  "What's the capital of Australia?"
Nova: "Canberra"
```
That's not retrieval-augmented generation. That's not prompt engineering. The model itself got smarter.
The learning loop — how it actually works
Nova has a 7-stage self-improvement pipeline. No other open-source project has anything close to this.
Stage 1: Correction Detection
When you say "actually, it's X" or "that's wrong," Nova's 2-stage detector fires:
- Regex pre-filter — 12 compiled patterns catch correction language ("actually," "that's wrong," "it should be," "remember that"). Fast, zero LLM cost.
- LLM confirmation — if the regex matches, Nova sends the exchange to the LLM with a structured extraction prompt. It pulls out: what was wrong, what's correct, and a one-sentence lesson.
Why two stages? Because "actually, I was thinking about pasta tonight" isn't a correction. The regex catches candidates cheaply; the LLM filters false positives.
Stage 2: Lesson Storage
Every confirmed correction becomes a lesson with four fields: topic, wrong_answer, correct_answer, and lesson_text. Lessons are stored in SQLite and indexed in ChromaDB for semantic search.
On future queries, Nova retrieves relevant lessons using hybrid search (vector similarity + BM25 keyword matching + Reciprocal Rank Fusion) and injects them into the system prompt: "You got this wrong before. The capital of Australia is Canberra, not Sydney."
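The fusion step itself is small enough to sketch. This is a dependency-free illustration of Reciprocal Rank Fusion, not Nova's implementation; `k=60` is the conventional constant from the original RRF paper, and the lesson IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Each document earns 1/(k + rank) per list it appears in; scores are
    summed across lists and the fused ranking is sorted by total score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Made-up lesson IDs: the vector ranking and the BM25 ranking disagree,
# and fusion rewards the lesson both retrievers liked.
vector_hits = ["lesson_42", "lesson_7", "lesson_13"]
bm25_hits = ["lesson_7", "lesson_99", "lesson_42"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

`lesson_7` wins the fused ranking because it placed well in both lists, even though neither retriever ranked it first.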
Stage 3: DPO Training Data
Every correction also generates a DPO (Direct Preference Optimization) training pair:
```json
{
  "query": "What's the capital of Australia?",
  "chosen": "The capital of Australia is Canberra.",
  "rejected": "The capital of Australia is Sydney.",
  "timestamp": "2026-03-15T14:23:01"
}
```
These accumulate in a JSONL file. When enough pairs exist, Nova can fine-tune its own base model.
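Appending a pair is a one-liner per correction. The `append_dpo_pair` helper below is a hypothetical name for illustration; only the four fields come from the format above.

```python
import datetime
import json
import tempfile
from pathlib import Path

def append_dpo_pair(path: Path, query: str, chosen: str, rejected: str) -> dict:
    """Serialize one preference pair and append it to the JSONL file.
    Hypothetical helper -- the field layout matches the example above."""
    pair = {
        "query": query,
        "chosen": chosen,
        "rejected": rejected,
        "timestamp": datetime.datetime.now().isoformat(timespec="seconds"),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(pair) + "\n")
    return pair

# Demo against a throwaway file.
demo_path = Path(tempfile.mkdtemp()) / "dpo_pairs.jsonl"
append_dpo_pair(demo_path, "What's the capital of Australia?",
                "The capital of Australia is Canberra.",
                "The capital of Australia is Sydney.")
last = json.loads(demo_path.read_text().splitlines()[-1])
```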
Stage 4: Automated Fine-Tuning
Nova includes an 8-step automated pipeline (scripts/finetune_auto.py):
- Check readiness (minimum 50 new DPO pairs)
- Load training data
- Stop Ollama (free GPU VRAM)
- Run DPO training via Unsloth
- Export to GGUF
- Restart Ollama
- A/B evaluation — run holdout queries through both base and fine-tuned models, LLM-as-judge with randomized ordering to prevent position bias
- Deploy only if the fine-tuned model wins >50% with positive average preference
The model literally gets smarter. Not through bigger context windows or better prompts — through actual weight updates from your corrections.
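The deploy gate (steps 7-8) is worth spelling out, since the randomized ordering is easy to get wrong. A sketch under assumptions: `judge_ab` is my name, and `judge(first, second)` is any callable returning the judged preference for the first answer shown, in [-1, 1].

```python
import random

def judge_ab(base_answers, tuned_answers, judge):
    """A/B gate sketch. We randomize which model's answer is shown first,
    then flip the sign back so positive always means 'fine-tuned preferred'
    -- that randomization is what prevents position bias."""
    prefs = []
    for base, tuned in zip(base_answers, tuned_answers):
        if random.random() < 0.5:
            prefs.append(judge(tuned, base))   # tuned shown first
        else:
            prefs.append(-judge(base, tuned))  # base shown first; flip sign
    win_rate = sum(p > 0 for p in prefs) / len(prefs)
    avg_pref = sum(prefs) / len(prefs)
    # Deploy only on a majority win rate AND positive mean preference.
    return win_rate > 0.5 and avg_pref > 0
```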
Stage 5: Reflexion
Not every failure is an explicit correction. Sometimes Nova gives a bad answer and you just move on. Reflexion catches these silent failures:
- Empty or very short responses to complex queries
- Tool loop exhaustion (used all 5 rounds without a clean answer)
- Error phrases in the response ("I couldn't," "failed to")
- Hallucination indicators
Failed responses are stored as reflexions. On future similar queries, Nova retrieves them as warnings: "You failed on a similar query before. Here's what went wrong."
Stage 6: Curiosity Engine
When Nova hedges ("I'm not sure"), admits ignorance, or a tool search returns nothing useful, the curiosity engine detects the knowledge gap and queues it for background research. A scheduled monitor (runs every hour) picks up the queue and researches the topics autonomously — results become knowledge graph triples.
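A minimal sketch of the detect-and-queue half, assuming a simple pattern match on hedging language (the class and pattern list are mine, not the project's):

```python
import re
from collections import deque

# Hedge / ignorance markers -- an illustrative subset.
GAP_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"\bi'?m not sure\b",
    r"\bi don'?t know\b",
    r"\bno (useful |relevant )?results\b",
)]

class CuriosityQueue:
    """Knowledge gaps wait here until the hourly monitor drains them."""

    def __init__(self) -> None:
        self._pending: deque[str] = deque()

    def maybe_enqueue(self, topic: str, response: str) -> bool:
        """Queue the topic if the response signals a knowledge gap."""
        if any(p.search(response) for p in GAP_PATTERNS):
            self._pending.append(topic)
            return True
        return False

    def drain(self) -> list[str]:
        """Called by the background monitor; empties the queue."""
        topics, self._pending = list(self._pending), deque()
        return topics
```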
Stage 7: Success Patterns
High-quality responses (score >= 0.8) are stored as positive reinforcement. On similar future queries, Nova retrieves what worked: "This approach worked well last time."
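The gate is just a threshold check; a tiny sketch, with the helper name and list-backed store standing in for the real ChromaDB index:

```python
def record_success(store: list, query: str, response: str, score: float,
                   threshold: float = 0.8) -> bool:
    """Keep only high-scoring exchanges as retrievable positive examples.
    `store` stands in for the vector-indexed pattern store."""
    if score >= threshold:
        store.append({"query": query, "worked": response})
        return True
    return False
```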
What's under the hood
Nova replaces a 9-node LangGraph pipeline with a single async generator function: brain.think(). About 1,400 lines of Python that orchestrate 5 stages:
- Gather context — load user facts, lessons, knowledge graph, reflexions, retrieved documents, skills
- Build messages — assemble system prompt from 8 prioritized blocks with truncation budget
- Generate + tool loop — up to 5 rounds of LLM generation + tool execution (21 built-in tools)
- Refine — multi-round self-critique, plan coverage check, reflexion quality assessment
- Post-process — correction detection, fact extraction, KG updates, curiosity gap detection
No LangChain. No LangGraph. No agent frameworks. Just `async for event in think(query)`.
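The five stages above reduce to the shape of one async generator. Every stage body below is a stub of my own invention; only the stage order and the `async for` consumption pattern come from the description.

```python
import asyncio

async def think(query: str):
    """Skeleton of the five-stage pipeline as one async generator.
    Each stage is stubbed -- the real brain.think() is ~1,400 lines."""
    context = {"query": query, "facts": [], "lessons": []}   # 1. gather context
    yield {"event": "stage", "data": "context_ready"}
    messages = [{"role": "system", "content": "(prioritized blocks)"},
                {"role": "user", "content": query}]          # 2. build messages
    answer = ""
    for _round in range(5):                                  # 3. generate + tool loop
        answer = "(generated answer)"
        break  # a real loop continues only while the LLM requests tools
    yield {"event": "token", "data": answer}
    answer = answer.strip()                                  # 4. refine (self-critique stub)
    yield {"event": "done", "data": answer}                  # 5. post-process hooks run after

async def collect():
    # Consumers drive the pipeline with plain `async for` -- no framework.
    return [event async for event in think("What's the capital of Australia?")]

events = asyncio.run(collect())
```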
Security — built in, not bolted on
After watching OpenClaw's security meltdown, I built Nova with the OWASP Agentic Security Top 10 in mind:
| Risk | OpenClaw | Nova |
|---|---|---|
| Unrestricted tool access | All tools always available | 4-tier access control (sandboxed/standard/full/none) |
| Prompt injection | No detection | 4-category heuristic detection on all external content |
| Credential exposure | Plaintext storage flagged | No hardcoded secrets, .env gitignored, HMAC skill signing |
| Training data poisoning | N/A (no learning) | Channel gating + confidence threshold for DPO pairs |
| Container security | Basic Docker | Read-only root, no-new-privileges, all capabilities dropped |
| Auth | Partial | Bearer token + per-IP brute-force lockout (10 failures = 5min ban) |
The prompt injection detector runs on every piece of external content — web search results, fetched pages, browser output, MCP tool results, imported skills. It checks 4 categories (role override, instruction injection, delimiter abuse, encoding tricks) with Unicode normalization and homoglyph detection. Suspicious content gets flagged, not stripped — the LLM sees it but is warned.
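The scanning shape can be sketched like this. The patterns are illustrative stand-ins, and NFKC normalization only folds compatibility characters (fullwidth letters, ligatures); a full homoglyph defense also needs a confusables table, which this sketch omits.

```python
import re
import unicodedata

# Illustrative patterns only -- the real detector is more thorough.
CATEGORY_PATTERNS = {
    "role_override": re.compile(r"\byou are now\b|\bforget your role\b", re.I),
    "instruction_injection": re.compile(r"\bignore (all )?previous instructions\b", re.I),
    "delimiter_abuse": re.compile(r"<\|?(system|im_start|im_end)\|?>", re.I),
    "encoding_tricks": re.compile(r"\bbase64\b|&#x?[0-9a-f]+;", re.I),
}

def scan_external_content(text: str) -> list[str]:
    """Return the matched category names. Flag, don't strip: the caller
    passes the content through with a warning attached for the LLM."""
    # NFKC folds fullwidth letters and other compatibility characters.
    normalized = unicodedata.normalize("NFKC", text)
    return [name for name, pattern in CATEGORY_PATTERNS.items()
            if pattern.search(normalized)]
```

The fullwidth trick is why normalization comes first: "ｉｇｎｏｒｅ previous instructions" matches nothing until NFKC folds it back to ASCII.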
The stack
- Backend: Python 3.11+, FastAPI, httpx, SQLite (WAL mode), ChromaDB
- LLM: Ollama (default: Qwen3.5:27b) or OpenAI/Anthropic/Google
- Frontend: React + TypeScript + Vite
- Search: SearXNG (privacy-respecting, self-hosted)
- Deployment: Docker Compose (4 services)
- Tests: 1,443 across 57 files (including security offensive, stress, and behavioral tests)
No GPU? Use docker-compose.cloud.yml — cloud handles inference, all data stays on your machine.
What else it does
- Temporal knowledge graph — facts track when they were valid, with supersession chains and provenance. Query what was true at any point in time.
- 14 proactive monitors — scheduled domain research, self-reflection, lesson quizzes, skill validation, system maintenance. Nova works even when you're not talking to it.
- 4 messaging channels — Discord, Telegram, WhatsApp, Signal. All with phone-number allowlisting.
- MCP dual-mode — consumes external tools (client) AND exposes its intelligence to Claude Code, Cursor, etc. (server). No other personal AI does both.
- 21 built-in tools — web search, calculator, code execution, browser, email, calendar, webhooks, file ops, shell, and more.
- Voice — local Whisper speech-to-text.
- Desktop automation — PyAutoGUI-based GUI control.
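To make the temporal knowledge graph concrete, here is one way "what was true at a point in time" can work over validity windows. The schema and data are entirely my own illustration, not Nova's actual tables: superseding a fact closes the old row's `valid_to` and opens a new row.

```python
import sqlite3

# Illustrative schema: each triple carries a validity window.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE facts (
    subject TEXT, predicate TEXT, object TEXT,
    valid_from TEXT, valid_to TEXT)""")
# A supersession chain: the move to Lisbon closes the Berlin row.
conn.execute("INSERT INTO facts VALUES "
             "('user','lives_in','Berlin','2024-01-01','2026-02-01')")
conn.execute("INSERT INTO facts VALUES "
             "('user','lives_in','Lisbon','2026-02-01',NULL)")

def fact_at(conn, subject, predicate, when):
    """What was true at a given point in time? NULL valid_to = still true."""
    row = conn.execute(
        """SELECT object FROM facts
           WHERE subject = ? AND predicate = ?
             AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)""",
        (subject, predicate, when, when)).fetchone()
    return row[0] if row else None
```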
Try it
```bash
git clone https://github.com/HeliosNova/nova.git
cd nova && cp .env.example .env
docker compose up -d
```
Or as a one-liner:

```bash
curl -fsSL https://raw.githubusercontent.com/HeliosNova/nova/main/install.sh | bash
```
AGPL-3.0. Issues and PRs welcome.