I built a self-hosted AI agent with a 30-min self-improvement loop — here's what I learned

Smith Falcao — Wed, 03 Jun 2026 17:12:09 +0000

Six months ago I started building an AI agent I actually wanted to use:
not another LangChain wrapper, but a single, self-hosted system meant to
get measurably better the more I work with it.

This week I cut the v0.1.0 release.

What it is

Hyper Nexus is a self-hosted AI agent with:

A 30-minute self-improvement heartbeat that mines your agent's own logs for successful patterns, clusters failures, and writes learned heuristics back into the system prompt.
An "ADHD" cross-domain reasoning module — for non-trivial tasks, the agent fires the problem across 8 knowledge domains (biology, physics, music, economics, architecture, game theory, neuroscience, military) in parallel, then synthesises analogies back into the prompt.
165 tools, 25 skill packs, ~100 integration connectors in one pip install.
Dual-layer memory with Ebbinghaus-style forgetting.
100% local. Vision (Florence-2) and embeddings (MiniLM) run on-device. MIT licensed.
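
To make the heartbeat item concrete, here is a minimal sketch of what one pass could look like. It is an illustration, not the code in the repo: the log record shape (`tool`, `ok`, `error`), the function names `mine_heuristics` and `apply_heuristics`, and the "Learned heuristics" marker are all assumptions, and grouping failures by tool and error type is only a crude stand-in for real clustering.

```python
from collections import Counter

# Hypothetical log record shape: {"tool": str, "ok": bool, "error": str or None}


def mine_heuristics(logs, min_count=2):
    """Turn repeated outcomes in the logs into short textual heuristics."""
    successes = Counter(r["tool"] for r in logs if r["ok"])
    # Crude stand-in for clustering: group failures by (tool, error type).
    failures = Counter((r["tool"], r["error"]) for r in logs if not r["ok"])

    heuristics = []
    for tool, n in successes.most_common():
        if n >= min_count:
            heuristics.append(f"Tool '{tool}' succeeded {n} times; prefer it when applicable.")
    for (tool, error), n in failures.most_common():
        if n >= min_count:
            heuristics.append(f"Tool '{tool}' failed {n} times with {error}; validate inputs first.")
    return heuristics


def apply_heuristics(system_prompt, heuristics):
    """Append heuristics under a marker, replacing the block from any earlier pass."""
    marker = "\n\n## Learned heuristics\n"
    base = system_prompt.split(marker)[0]
    if not heuristics:
        return base
    return base + marker + "\n".join(f"- {h}" for h in heuristics)


logs = [
    {"tool": "search", "ok": True, "error": None},
    {"tool": "search", "ok": True, "error": None},
    {"tool": "shell", "ok": False, "error": "TimeoutError"},
    {"tool": "shell", "ok": False, "error": "TimeoutError"},
    {"tool": "shell", "ok": False, "error": "KeyError"},
]
heuristics = mine_heuristics(logs)
prompt = apply_heuristics("You are a helpful agent.", heuristics)
print(prompt)
```

Replacing the previous block instead of appending to it keeps the prompt from growing on every pass. A scheduler would call this every 30 minutes; that part is left out here.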

Stack: FastAPI, SQLite, PyTorch, vanilla JS WebUI. ~60K LoC of Python.
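
For readers wondering what "Ebbinghaus-style forgetting" in the feature list might mean in code, the sketch below uses the common exponential form of the forgetting curve, R = exp(-t / S), where t is time since last access and S is a stability value. Everything else is an assumption for illustration: the `Memory` class, the one-hour default stability, the doubling on recall, and the pruning threshold. It covers only the decay part, not the dual-layer design.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    text: str
    last_access: float          # timestamp in seconds
    stability: float = 3600.0   # seconds; larger means slower forgetting


def retention(m, now):
    """Ebbinghaus-style retention R = exp(-t / S), in (0, 1]."""
    elapsed = max(0.0, now - m.last_access)
    return math.exp(-elapsed / m.stability)


def recall(m, now, boost=2.0):
    """Accessing a memory resets its clock and makes it decay more slowly."""
    m.last_access = now
    m.stability *= boost


def prune(memories, now, threshold=0.05):
    """Keep only memories whose retention is still at or above the threshold."""
    return [m for m in memories if retention(m, now) >= threshold]


now = 10_000.0
fresh = Memory("user prefers dark mode", last_access=now - 600)
stale = Memory("one-off debugging note", last_access=now - 6 * 3600)
kept = prune([fresh, stale], now)
print([m.text for m in kept])
```

With these numbers, the ten-minute-old memory survives and the six-hour-old one is dropped. Calling `recall` on a memory before pruning would keep it alive and slow its future decay.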

Why I built it

When I started this, I thought: why not try to model something close
to how humans actually think? The result isn't fully polished, and
there are real shortcomings, but I'd love feedback so I can keep
improving it. The project is open source, and I want it to grow with
the people who use it.

What I learned building it

Lesson 1: The hard part is not the LLM call. It's everything around
it — tool execution, error recovery, state management, the agent's
"short-term memory" of what it's already tried, the user's long-term
context. The actual prompt is maybe 5% of the code.
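
As a rough illustration of the "everything around it" part, here is a sketch of a tool-execution loop with error recovery and a short-term memory of what has already failed. The function `run_with_recovery` and the two toy tools are hypothetical, not Hyper Nexus APIs, and the sketch assumes tool arguments are hashable tuples.

```python
def run_with_recovery(candidates, max_attempts=3):
    """Try (tool, args) actions in order, skipping any action that already failed.

    `candidates` is a list of (callable, args_tuple) pairs.
    Returns (result, attempts), where attempts records what was tried.
    """
    tried = set()    # short-term memory of failed (tool name, args) pairs
    attempts = []
    for tool, args in candidates:
        if len(attempts) >= max_attempts:
            break
        key = (tool.__name__, args)
        if key in tried:
            continue  # do not repeat an action that already failed
        try:
            result = tool(*args)
        except Exception as exc:
            tried.add(key)
            attempts.append((key, f"{type(exc).__name__}: {exc}"))
            continue
        attempts.append((key, "ok"))
        return result, attempts
    return None, attempts


# Toy tools for the example only.
def read_file(path):
    raise FileNotFoundError(path)


def fallback_text(default):
    return default


result, attempts = run_with_recovery([
    (read_file, ("notes.txt",)),
    (read_file, ("notes.txt",)),          # duplicate of a failed action, skipped
    (fallback_text, ("no notes found",)),
])
print(result)
print(attempts)
```

Even this toy version is mostly bookkeeping: what was tried, what failed and why, and when to stop. None of it involves the prompt.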

Lesson 2: Tests matter even for solo projects. I shipped v0.1.0
with zero automated tests. I regret this. If you're reading this and
considering the same — don't.

Lesson 3: Don't promise self-improvement you can't measure. I have
a 30-min heartbeat that does something. Whether it actually makes
the agent better at your task is unmeasured. I'm working on an eval
harness to find out.
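
One possible shape for such a harness, sketched under assumptions: a fixed task set with a pass/fail check per task, run against the agent with and without the learned heuristics, then compared by pass rate. The `Task` class, `pass_rate`, `compare`, and the stub agents are hypothetical names for illustration; a real agent would replace the stubs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Task:
    prompt: str
    check: Callable[[str], bool]   # True if the agent's answer is acceptable


def pass_rate(agent, tasks):
    """Fraction of tasks whose output passes its check; crashes count as failures."""
    passed = 0
    for task in tasks:
        try:
            passed += bool(task.check(agent(task.prompt)))
        except Exception:
            pass
    return passed / len(tasks) if tasks else 0.0


def compare(baseline, candidate, tasks):
    """Score both agents on the same tasks and report the difference."""
    base = pass_rate(baseline, tasks)
    cand = pass_rate(candidate, tasks)
    return {"baseline": base, "candidate": cand, "delta": cand - base}


# Stub agents standing in for "without heuristics" and "with heuristics".
def baseline_agent(prompt):
    return "4" if prompt == "2+2" else "unknown"


def candidate_agent(prompt):
    return {"2+2": "4", "capital of France": "Paris"}[prompt]


tasks = [
    Task("2+2", lambda out: out.strip() == "4"),
    Task("capital of France", lambda out: "paris" in out.lower()),
]
report = compare(baseline_agent, candidate_agent, tasks)
print(report)
```

With a real LLM behind the agent, outputs vary between runs, so each configuration would need several runs per task before a delta means anything.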

What's next

Build an eval harness (the biggest gap)
Add a few demo tasks the agent does well, recorded as GIFs
Get more contributors

If you try it, please open an issue — that's the only way I can
prioritise what actually breaks vs what I think breaks.

Let's make something meaningful.

GitHub: https://github.com/Hsosn/HYPER_NEXUS

MIT licensed. PRs welcome.

DEV Community: Smith Falcao