
Michael Piscitelli


I Built an Open-Source LLM Harness — AI Agents Interview, Plan, Build & Deploy Entire Apps From One Prompt | Any LLM | CLI

We hit submit on the Amazon Nova Hackathon with 2 minutes to spare.

12 days. 30,000 lines of Python. 9 playable games — all built by AI agents, zero human code. That's right, agent-inception!

See what it built →

## One Prompt In, Deployed App Out

Build a tower defense with 6 tower types and chain lightning

Nova Forge interviews you, decomposes the work into parallel tasks, assigns AI agents, runs adversarial quality review, and deploys. That prompt produced an 802-line playable game in 341 seconds.
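
To make "decomposes the work into parallel tasks" concrete, here is a minimal sketch of one way to run dependent tasks in parallel waves. It is not Nova Forge's actual scheduler: `Task`, `run_agent`, `run_in_waves`, and `build` are hypothetical names, and the agent call is a stub that only echoes the task name.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Task:
    name: str
    depends_on: list[str] = field(default_factory=list)


async def run_agent(task: Task) -> str:
    # Hypothetical stand-in for a real LLM agent call; it only echoes the name.
    await asyncio.sleep(0)
    return f"built:{task.name}"


async def run_in_waves(tasks: list[Task]) -> dict[str, str]:
    """Run every task whose dependencies are finished, one parallel wave at a time."""
    results: dict[str, str] = {}
    pending = {t.name: t for t in tasks}
    while pending:
        ready = [t for t in pending.values()
                 if all(dep in results for dep in t.depends_on)]
        if not ready:
            raise ValueError("dependency cycle or missing task")
        # All tasks in a wave run concurrently.
        outputs = await asyncio.gather(*(run_agent(t) for t in ready))
        for task, output in zip(ready, outputs):
            results[task.name] = output
            del pending[task.name]
    return results


def build(tasks: list[Task]) -> dict[str, str]:
    return asyncio.run(run_in_waves(tasks))
```

With tasks `render` and `towers` both depending on `engine`, the first wave runs `engine` alone and the second runs the other two together.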

https://forge.herakles.dev/demos/

## What Blew Our Minds

We wanted Forge to produce a working app, but our test game shipped with 5 bugs. So we pointed the framework at its own output. Nova Pro found every bug and fixed them all in 26 seconds, including a structural refactor using replace_lines, a tool we had 'invented' that morning.
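
For readers wondering what a tool like replace_lines might look like, here is a rough sketch. Only the tool name comes from this post; the signature, the 1-indexed inclusive range, and the string return values are assumptions, not the code in the repo.

```python
from pathlib import Path


def replace_lines(path: str, start: int, end: int, new_text: str) -> str:
    """Replace lines start..end (1-indexed, inclusive) of a file with new_text."""
    file = Path(path)
    lines = file.read_text(encoding="utf-8").splitlines(keepends=True)
    if not (1 <= start <= end <= len(lines)):
        # Return the error as text so the calling model can read it and retry.
        return f"error: range {start}-{end} is outside 1-{len(lines)}"
    replacement = new_text.splitlines(keepends=True)
    if replacement and not replacement[-1].endswith("\n"):
        replacement[-1] += "\n"
    lines[start - 1:end] = replacement
    file.write_text("".join(lines), encoding="utf-8")
    return f"ok: replaced lines {start}-{end} with {len(replacement)} line(s)"
```

Returning a message instead of raising is a deliberate choice in this sketch: a tool result the model can read gives it a chance to correct the range on its next turn.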

That debug session is now proof-of-work alongside the demo. The model fixing its own mistakes became our best feature: zero manual debug loops.

## The 6 Bugs Every Agent Framework Will Hit

We found these the hard way. Saving you the pain:

  1. Agents describe code instead of writing it — your prompt must say "call write_file," not "complete the task"
  2. Specs get summarized to death — "pseudo-3D racer with ACCEL=0.9" becomes "build a game" → wrong output
  3. Building agents never see the spec — only task summaries reach them. Inject the full spec.
  4. Verify phase kills multi-file builds — agent writes 80 lines, enters verify mode, wastes all turns
  5. Missing tools are silent failures — without think, models dump reasoning as text
  6. No path guidance = wrong directory — models love creating src/ when you want project root
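
Several of these fixes live in the prompt and the tool list. The sketch below shows one way to address bugs 1, 2, 3, 5, and 6 together. `build_agent_prompt` and its wording are hypothetical, and `think` is shown as a simple no-op, which is an assumption about how such a tool could work rather than Nova Forge's implementation.

```python
def build_agent_prompt(task: str, full_spec: str, project_root: str) -> str:
    """Assemble a build-agent prompt with the full spec and explicit tool rules."""
    return "\n".join([
        f"TASK: {task}",
        "",
        # Bugs 2 and 3: pass the spec through verbatim instead of a summary.
        "FULL SPEC (verbatim, do not summarize):",
        full_spec,
        "",
        "RULES:",
        # Bug 1: name the tool call explicitly.
        "- Call write_file to create each file. Do not describe code in prose.",
        # Bug 6: say where files belong.
        f"- Write every file relative to the project root: {project_root}",
        "- Do not create a src/ directory unless the spec asks for one.",
    ])


def think(thought: str) -> str:
    """No-op tool (bug 5): gives the model a place to reason without side effects."""
    return "ok"
```

A call such as `build_agent_prompt("Build the road renderer", "pseudo-3D racer with ACCEL=0.9", "/work/racer")` keeps the `ACCEL=0.9` detail in front of the building agent instead of losing it to a task summary.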

Each one took hours to trace. We built fast and broke things, but we finished the race! https://github.com/herakles-dev/nova-forge

## The Stack

  • Pure Python — 35 modules, 1,670 tests, no JS dependencies
  • 14 agent tools (including read, write, edit, replace_lines, bash, grep, think)
  • 11 team formations — solo builds to 5-agent architecture reviews
  • 3 model tiers — works with Nova Lite (32K), Pro (300K), Premier (1M)
  • LLM-agnostic — Bedrock, OpenAI, Anthropic adapters in the router
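
If you are curious what "adapters in the router" could mean in practice, here is a minimal sketch of the pattern. The `LLMAdapter` protocol, the `Router` class, and the `provider/model-name` id format are all assumptions for illustration. `EchoAdapter` is an offline stand-in, so nothing here calls Bedrock, OpenAI, or Anthropic.

```python
from typing import Protocol


class LLMAdapter(Protocol):
    def complete(self, prompt: str, max_tokens: int) -> str: ...


class EchoAdapter:
    """Offline stand-in for a provider adapter; returns a canned reply."""

    def __init__(self, label: str) -> None:
        self.label = label

    def complete(self, prompt: str, max_tokens: int) -> str:
        return f"[{self.label}] {prompt[:max_tokens]}"


class Router:
    def __init__(self) -> None:
        self._adapters: dict[str, LLMAdapter] = {}

    def register(self, prefix: str, adapter: LLMAdapter) -> None:
        self._adapters[prefix] = adapter

    def complete(self, model: str, prompt: str, max_tokens: int = 256) -> str:
        # Assumption: model ids look like "provider/model-name".
        prefix, _, _name = model.partition("/")
        if prefix not in self._adapters:
            raise KeyError(f"no adapter registered for {prefix!r}")
        return self._adapters[prefix].complete(prompt, max_tokens)
```

Under this shape, adding a new provider means writing one class with a `complete` method and registering it under a prefix. The rest of the framework only ever talks to the router.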

## Come Build With Us

The framework works. The bugs are documented. The hard problems are interesting.

What we need:

  • 🔌 LLM adapters — Ollama, Groq, Mistral, Together
  • 🧪 Benchmark scenarios — what breaks when you try to build X?
  • 🛠️ Agent tools — test runners, git integration, package managers
  • 🐛 Bug reports — try building something weird and tell us what happens
  • More consistent builds: help us make the system smart enough to get it right the first time

git clone https://github.com/herakles-dev/nova-forge.git
cd nova-forge && ./setup.sh
python3 forge_cli.py

GitHub: https://github.com/herakles-dev/nova-forge | 🎮 Demos: https://forge.herakles.dev/demos/ | Guide: https://forge.herakles.dev/guide.html | 🐛 Issues: https://github.com/herakles-dev/nova-forge/issues
