OpenClaw hit 216,000 GitHub stars in six weeks. It proved that millions of people want a personal AI assistant they can run themselves. Then came CVE-2026-25253. Then the ClawHavoc supply chain attack — 341 malicious skills, 9,000 compromised installations. Cisco and Palo Alto flagged it for a "lethal trifecta" of security risks: unrestricted tool access, no prompt injection detection, and plaintext credential storage.
I'd been building my own self-hosted AI assistant for months. When OpenClaw blew up — and then blew up differently — I decided to open-source it.
It's called Nova.
What makes Nova different
Every AI assistant answers questions. Nova is the only one that learns from getting them wrong.
```
You:  "What's the capital of Australia?"
Nova: "Sydney"
You:  "That's wrong, it's Canberra"

[Nova detects the correction, extracts a lesson, generates a
DPO training pair, and updates its knowledge graph.]

--- 3 months later, different conversation ---

You:  "What's the capital of Australia?"
Nova: "Canberra"
```
That's not retrieval-augmented generation. That's not prompt engineering. The model itself got smarter.
The learning loop — how it actually works
Nova has a 7-stage self-improvement pipeline. No other open-source project has anything close to this.
Stage 1: Correction Detection
When you say "actually, it's X" or "that's wrong," Nova's 2-stage detector fires:
- Regex pre-filter — 12 compiled patterns catch correction language ("actually," "that's wrong," "it should be," "remember that"). Fast, zero LLM cost.
- LLM confirmation — if the regex matches, Nova sends the exchange to the LLM with a structured extraction prompt. It pulls out: what was wrong, what's correct, and a one-sentence lesson.
Why two stages? Because "actually, I was thinking about pasta tonight" isn't a correction. The regex catches candidates cheaply; the LLM filters false positives.
Stage 2: Lesson Storage
Every confirmed correction becomes a lesson with four fields: topic, wrong_answer, correct_answer, and lesson_text. Lessons are stored in SQLite and indexed in ChromaDB for semantic search.
On future queries, Nova retrieves relevant lessons using hybrid search (vector similarity + BM25 keyword matching + Reciprocal Rank Fusion) and injects them into the system prompt: "You got this wrong before. The capital of Australia is Canberra, not Sydney."
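The fusion step itself is small enough to sketch. This is a dependency-free illustration of Reciprocal Rank Fusion, not Nova's implementation; `k=60` is the conventional constant from the original RRF paper, and the lesson IDs are made up.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Each document earns 1/(k + rank) per list it appears in; scores are
    summed across lists and the fused ranking is sorted by total score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Made-up lesson IDs: the vector ranking and the BM25 ranking disagree,
# and fusion rewards the lesson both retrievers liked.
vector_hits = ["lesson_42", "lesson_7", "lesson_13"]
bm25_hits = ["lesson_7", "lesson_99", "lesson_42"]
fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
```

`lesson_7` wins the fused ranking because it placed well in both lists, even though neither retriever ranked it first.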
Stage 3: DPO Training Data
Every correction also generates a DPO (Direct Preference Optimization) training pair:
```json
{
  "query": "What's the capital of Australia?",
  "chosen": "The capital of Australia is Canberra.",
  "rejected": "The capital of Australia is Sydney.",
  "timestamp": "2026-03-15T14:23:01"
}
```
These accumulate in a JSONL file. When enough pairs exist, Nova can fine-tune its own base model.
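Appending a pair is a one-liner per correction. The `append_dpo_pair` helper below is a hypothetical name for illustration; only the four fields come from the format above.

```python
import datetime
import json
import tempfile
from pathlib import Path

def append_dpo_pair(path: Path, query: str, chosen: str, rejected: str) -> dict:
    """Serialize one preference pair and append it to the JSONL file.
    Hypothetical helper -- the field layout matches the example above."""
    pair = {
        "query": query,
        "chosen": chosen,
        "rejected": rejected,
        "timestamp": datetime.datetime.now().isoformat(timespec="seconds"),
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(pair) + "\n")
    return pair

# Demo against a throwaway file.
demo_path = Path(tempfile.mkdtemp()) / "dpo_pairs.jsonl"
append_dpo_pair(demo_path, "What's the capital of Australia?",
                "The capital of Australia is Canberra.",
                "The capital of Australia is Sydney.")
last = json.loads(demo_path.read_text().splitlines()[-1])
```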
Stage 4: Automated Fine-Tuning
Nova includes an 8-step automated pipeline (scripts/finetune_auto.py):
- Check readiness (minimum 50 new DPO pairs)
- Load training data
- Stop Ollama (free GPU VRAM)
- Run DPO training via Unsloth
- Export to GGUF
- Restart Ollama
- A/B evaluation — run holdout queries through both base and fine-tuned models, LLM-as-judge with randomized ordering to prevent position bias
- Deploy only if the fine-tuned model wins >50% with positive average preference
The model literally gets smarter. Not through bigger context windows or better prompts — through actual weight updates from your corrections.
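The deploy gate (steps 7-8) is worth spelling out, since the randomized ordering is easy to get wrong. A sketch under assumptions: `judge_ab` is my name, and `judge(first, second)` is any callable returning the judged preference for the first answer shown, in [-1, 1].

```python
import random

def judge_ab(base_answers, tuned_answers, judge):
    """A/B gate sketch. We randomize which model's answer is shown first,
    then flip the sign back so positive always means 'fine-tuned preferred'
    -- that randomization is what prevents position bias."""
    prefs = []
    for base, tuned in zip(base_answers, tuned_answers):
        if random.random() < 0.5:
            prefs.append(judge(tuned, base))   # tuned shown first
        else:
            prefs.append(-judge(base, tuned))  # base shown first; flip sign
    win_rate = sum(p > 0 for p in prefs) / len(prefs)
    avg_pref = sum(prefs) / len(prefs)
    # Deploy only on a majority win rate AND positive mean preference.
    return win_rate > 0.5 and avg_pref > 0
```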
Stage 5: Reflexion
Not every failure is an explicit correction. Sometimes Nova gives a bad answer and you just move on. Reflexion catches these silent failures:
- Empty or very short responses to complex queries
- Tool loop exhaustion (used all 5 rounds without a clean answer)
- Error phrases in the response ("I couldn't," "failed to")
- Hallucination indicators
Failed responses are stored as reflexions. On future similar queries, Nova retrieves them as warnings: "You failed on a similar query before. Here's what went wrong."
Stage 6: Curiosity Engine
When Nova hedges ("I'm not sure"), admits ignorance, or a tool search returns nothing useful, the curiosity engine detects the knowledge gap and queues it for background research. A scheduled monitor (runs every hour) picks up the queue and researches the topics autonomously — results become knowledge graph triples.
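A minimal sketch of the detect-and-queue half, assuming a simple pattern match on hedging language (the class and pattern list are mine, not the project's):

```python
import re
from collections import deque

# Hedge / ignorance markers -- an illustrative subset.
GAP_PATTERNS = [re.compile(p, re.IGNORECASE) for p in (
    r"\bi'?m not sure\b",
    r"\bi don'?t know\b",
    r"\bno (useful |relevant )?results\b",
)]

class CuriosityQueue:
    """Knowledge gaps wait here until the hourly monitor drains them."""

    def __init__(self) -> None:
        self._pending: deque[str] = deque()

    def maybe_enqueue(self, topic: str, response: str) -> bool:
        """Queue the topic if the response signals a knowledge gap."""
        if any(p.search(response) for p in GAP_PATTERNS):
            self._pending.append(topic)
            return True
        return False

    def drain(self) -> list[str]:
        """Called by the background monitor; empties the queue."""
        topics, self._pending = list(self._pending), deque()
        return topics
```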
Stage 7: Success Patterns
High-quality responses (score >= 0.8) are stored as positive reinforcement. On similar future queries, Nova retrieves what worked: "This approach worked well last time."
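The gate is just a threshold check; a tiny sketch, with the helper name and list-backed store standing in for the real ChromaDB index:

```python
def record_success(store: list, query: str, response: str, score: float,
                   threshold: float = 0.8) -> bool:
    """Keep only high-scoring exchanges as retrievable positive examples.
    `store` stands in for the vector-indexed pattern store."""
    if score >= threshold:
        store.append({"query": query, "worked": response})
        return True
    return False
```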
What's under the hood
Nova replaces a 9-node LangGraph pipeline with a single async generator function: brain.think(). About 1,400 lines of Python that orchestrate 5 stages:
- Gather context — load user facts, lessons, knowledge graph, reflexions, retrieved documents, skills
- Build messages — assemble system prompt from 8 prioritized blocks with truncation budget
- Generate + tool loop — up to 5 rounds of LLM generation + tool execution (21 built-in tools)
- Refine — multi-round self-critique, plan coverage check, reflexion quality assessment
- Post-process — correction detection, fact extraction, KG updates, curiosity gap detection
No LangChain. No LangGraph. No agent frameworks. Just `async for event in think(query)`.
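The five stages above reduce to the shape of one async generator. Every stage body below is a stub of my own invention; only the stage order and the `async for` consumption pattern come from the description.

```python
import asyncio

async def think(query: str):
    """Skeleton of the five-stage pipeline as one async generator.
    Each stage is stubbed -- the real brain.think() is ~1,400 lines."""
    context = {"query": query, "facts": [], "lessons": []}   # 1. gather context
    yield {"event": "stage", "data": "context_ready"}
    messages = [{"role": "system", "content": "(prioritized blocks)"},
                {"role": "user", "content": query}]          # 2. build messages
    answer = ""
    for _round in range(5):                                  # 3. generate + tool loop
        answer = "(generated answer)"
        break  # a real loop continues only while the LLM requests tools
    yield {"event": "token", "data": answer}
    answer = answer.strip()                                  # 4. refine (self-critique stub)
    yield {"event": "done", "data": answer}                  # 5. post-process hooks run after

async def collect():
    # Consumers drive the pipeline with plain `async for` -- no framework.
    return [event async for event in think("What's the capital of Australia?")]

events = asyncio.run(collect())
```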
Security — built in, not bolted on
After watching OpenClaw's security meltdown, I built Nova with the OWASP Agentic Security Top 10 in mind:
| Risk | OpenClaw | Nova |
|---|---|---|
| Unrestricted tool access | All tools always available | 4-tier access control (sandboxed/standard/full/none) |
| Prompt injection | No detection | 4-category heuristic detection on all external content |
| Credential exposure | Plaintext storage flagged | No hardcoded secrets, .env gitignored, HMAC skill signing |
| Training data poisoning | N/A (no learning) | Channel gating + confidence threshold for DPO pairs |
| Container security | Basic Docker | Read-only root, no-new-privileges, all capabilities dropped |
| Auth | Partial | Bearer token + per-IP brute-force lockout (10 failures = 5min ban) |
The prompt injection detector runs on every piece of external content — web search results, fetched pages, browser output, MCP tool results, imported skills. It checks 4 categories (role override, instruction injection, delimiter abuse, encoding tricks) with Unicode normalization and homoglyph detection. Suspicious content gets flagged, not stripped — the LLM sees it but is warned.
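The scanning shape can be sketched like this. The patterns are illustrative stand-ins, and NFKC normalization only folds compatibility characters (fullwidth letters, ligatures); a full homoglyph defense also needs a confusables table, which this sketch omits.

```python
import re
import unicodedata

# Illustrative patterns only -- the real detector is more thorough.
CATEGORY_PATTERNS = {
    "role_override": re.compile(r"\byou are now\b|\bforget your role\b", re.I),
    "instruction_injection": re.compile(r"\bignore (all )?previous instructions\b", re.I),
    "delimiter_abuse": re.compile(r"<\|?(system|im_start|im_end)\|?>", re.I),
    "encoding_tricks": re.compile(r"\bbase64\b|&#x?[0-9a-f]+;", re.I),
}

def scan_external_content(text: str) -> list[str]:
    """Return the matched category names. Flag, don't strip: the caller
    passes the content through with a warning attached for the LLM."""
    # NFKC folds fullwidth letters and other compatibility characters.
    normalized = unicodedata.normalize("NFKC", text)
    return [name for name, pattern in CATEGORY_PATTERNS.items()
            if pattern.search(normalized)]
```

The fullwidth trick is why normalization comes first: "ｉｇｎｏｒｅ previous instructions" matches nothing until NFKC folds it back to ASCII.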
The stack
- Backend: Python 3.11+, FastAPI, httpx, SQLite (WAL mode), ChromaDB
- LLM: Ollama (default: Qwen3.5:27b) or OpenAI/Anthropic/Google
- Frontend: React + TypeScript + Vite
- Search: SearXNG (privacy-respecting, self-hosted)
- Deployment: Docker Compose (4 services)
- Tests: 1,443 across 57 files (including security offensive, stress, and behavioral tests)
No GPU? Use docker-compose.cloud.yml — cloud handles inference, all data stays on your machine.
What else it does
- Temporal knowledge graph — facts track when they were valid, with supersession chains and provenance. Query what was true at any point in time.
- 14 proactive monitors — scheduled domain research, self-reflection, lesson quizzes, skill validation, system maintenance. Nova works even when you're not talking to it.
- 4 messaging channels — Discord, Telegram, WhatsApp, Signal. All with phone-number allowlisting.
- MCP dual-mode — consumes external tools (client) AND exposes its intelligence to Claude Code, Cursor, etc. (server). No other personal AI does both.
- 21 built-in tools — web search, calculator, code execution, browser, email, calendar, webhooks, file ops, shell, and more.
- Voice — local Whisper speech-to-text.
- Desktop automation — PyAutoGUI-based GUI control.
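To make the temporal knowledge graph concrete, here is one way "what was true at a point in time" can work over validity windows. The schema and data are entirely my own illustration, not Nova's actual tables: superseding a fact closes the old row's `valid_to` and opens a new row.

```python
import sqlite3

# Illustrative schema: each triple carries a validity window.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE facts (
    subject TEXT, predicate TEXT, object TEXT,
    valid_from TEXT, valid_to TEXT)""")
# A supersession chain: the move to Lisbon closes the Berlin row.
conn.execute("INSERT INTO facts VALUES "
             "('user','lives_in','Berlin','2024-01-01','2026-02-01')")
conn.execute("INSERT INTO facts VALUES "
             "('user','lives_in','Lisbon','2026-02-01',NULL)")

def fact_at(conn, subject, predicate, when):
    """What was true at a given point in time? NULL valid_to = still true."""
    row = conn.execute(
        """SELECT object FROM facts
           WHERE subject = ? AND predicate = ?
             AND valid_from <= ? AND (valid_to IS NULL OR valid_to > ?)""",
        (subject, predicate, when, when)).fetchone()
    return row[0] if row else None
```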
Try it
```bash
git clone https://github.com/HeliosNova/nova.git
cd nova && cp .env.example .env
docker compose up -d
```
Or as a one-liner:

```bash
curl -fsSL https://raw.githubusercontent.com/HeliosNova/nova/main/install.sh | bash
```
AGPL-3.0. Issues and PRs welcome.