DEV Community

Smith Falcao

I built a self-hosted AI agent with a 30-min self-improvement loop — here's what I learned

Six months ago I started building an AI agent I actually wanted to use.
Not another LangChain wrapper — a single, self-hosted system that gets
measurably better the more I work with it.

This week I cut the v0.1.0 release.

What it is

Hyper Nexus is a self-hosted AI
agent with:

  • A 30-minute self-improvement heartbeat that mines your agent's own logs for successful patterns, clusters failures, and writes learned heuristics back into the system prompt.
  • An "ADHD" cross-domain reasoning module — for non-trivial tasks, the agent fires the problem across 8 knowledge domains (biology, physics, music, economics, architecture, game theory, neuroscience, military) in parallel, then synthesises analogies back into the prompt.
  • 165 tools, 25 skill packs, ~100 integration connectors in one pip install.
  • Dual-layer memory with Ebbinghaus-style forgetting.
  • 100% local. Vision (Florence-2) and embeddings (MiniLM) run on-device. MIT licensed.
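The post doesn't spell out how the forgetting works. One common reading of "Ebbinghaus-style" is exponential decay of retention, R = exp(-t / S), where t is the time since a memory was last used and S is a stability value that grows each time the memory is recalled. The sketch below assumes exactly that; `MemoryItem`, `recall`, `prune`, the 0.2 threshold and the 2x boost are hypothetical names and values for illustration, not Hyper Nexus's actual implementation.

```python
import math
from dataclasses import dataclass


@dataclass
class MemoryItem:
    """Hypothetical memory record (not taken from the Hyper Nexus codebase)."""
    text: str
    last_access: float        # timestamp in seconds
    strength: float = 3600.0  # S: seconds until retention falls to 1/e

    def retention(self, now: float) -> float:
        # Ebbinghaus-style exponential decay: R = exp(-t / S)
        elapsed = max(0.0, now - self.last_access)
        return math.exp(-elapsed / self.strength)

    def recall(self, now: float, boost: float = 2.0) -> None:
        # Each recall resets the clock and makes the memory decay more slowly.
        self.last_access = now
        self.strength *= boost


def prune(items: list[MemoryItem], now: float, threshold: float = 0.2) -> list[MemoryItem]:
    """Keep only items whose current retention is at or above the threshold."""
    return [m for m in items if m.retention(now) >= threshold]
```

With these defaults, an item recalled once half an hour in survives a prune two hours in (retention about 0.47), while an item that was never touched again does not (about 0.14). Multiplying S on recall is what gives the spaced-repetition effect: memories the agent keeps using fade more slowly.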

Stack: FastAPI, SQLite, PyTorch, vanilla JS WebUI. ~60K LoC of Python.

Why I built it

When I started this, I asked myself: why not try to model something close
to how humans actually think? The result isn't fully polished, and
there are real shortcomings, but I'd love feedback so I can keep
improving it. This is an open-source project, and I want
it to grow with the people who use it.

What I learned building it

Lesson 1: The hard part is not the LLM call. It's everything around
it: tool execution, error recovery, state management, the agent's
"short-term memory" of what it's already tried, and the user's long-term
context. The actual prompt is maybe 5% of the code.
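To make "everything around it" concrete, here is a minimal sketch of the loop that sits around the model: execute a proposed tool call, retry transient failures, fall back to the next candidate, and keep a short-term record of what has already been tried so the agent doesn't repeat itself. The `candidates` list stands in for what an LLM would propose, and `run_with_scratchpad` and its log format are hypothetical, not Hyper Nexus's API.

```python
import json
from typing import Any, Callable

ToolFn = Callable[..., Any]


def run_with_scratchpad(
    candidates: list[tuple[str, dict[str, Any]]],
    tools: dict[str, ToolFn],
    max_retries: int = 2,
) -> tuple[Any, list[dict[str, Any]]]:
    """Try candidate tool calls in order until one succeeds.

    `tried` is the agent's short-term memory: exact repeats of an earlier
    (tool, args) pair are skipped instead of being executed again.
    """
    tried: set[tuple[str, str]] = set()
    log: list[dict[str, Any]] = []
    for name, args in candidates:
        key = (name, json.dumps(args, sort_keys=True))
        if key in tried:
            log.append({"tool": name, "args": args, "status": "skipped_duplicate"})
            continue
        tried.add(key)
        if name not in tools:
            log.append({"tool": name, "args": args, "status": "unknown_tool"})
            continue
        for attempt in range(1, max_retries + 1):
            try:
                result = tools[name](**args)
            except Exception as exc:  # record the failure instead of crashing the loop
                log.append({"tool": name, "args": args, "status": "error",
                            "attempt": attempt, "error": repr(exc)})
            else:
                log.append({"tool": name, "args": args, "status": "ok", "attempt": attempt})
                return result, log
    return None, log
```

Even this toy version is mostly bookkeeping rather than model calls, and the structured `log` it returns is the kind of record a self-improvement pass could later mine for failure patterns.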

Lesson 2: Tests matter even for solo projects. I shipped v0.1.0
with zero automated tests, and I regret it. If you're reading this and
considering the same, don't.

Lesson 3: Don't promise self-improvement you can't measure. I have
a 30-min heartbeat that writes learned heuristics back into the system
prompt. Whether that actually makes the agent better at your task is
unmeasured. I'm working on an eval harness to find out.

What's next

  • Build an eval harness (the biggest gap)
  • Add a few demo tasks the agent does well, recorded as GIFs
  • Get more contributors

If you try it, please open an issue — that's the only way I can
prioritise what actually breaks vs what I think breaks.

Let's make something meaningful.

GitHub: https://github.com/Hsosn/HYPER_NEXUS

MIT licensed. PRs welcome.