Harish Kotra (he/him)
Building an Autonomous LLM Society with NanoClaw & Ollama

How do you build a digital society where every inhabitant possesses genuine, distinct cognition, yet the underlying simulation engine runs cleanly at 60 frames per second?

In this tutorial, we'll dive deep into the architecture of NanoClaw Society: an ambitious project integrating a deterministic Node.js physics engine with entirely non-deterministic, open-source Large Language Models (LLMs) running locally.

The Challenge of Integrating AI into Realtime Environments

The traditional way to govern NPCs (Non-Player Characters) in simulations or games is through hard-coded state machines. If an enemy sees you, they enter the ATTACK state. If their health is low, they transition to FLEE. It's rigid, deterministic, and instant.

When you replace those state machines with an LLM, things get messy quickly. A typical LLM query:

  1. Is incredibly slow compared to a 16-millisecond frame loop.
  2. Can output wildly unpredictable or hallucinated responses.
  3. Consumes vast system resources.

I wanted an environment where agents use real reasoning (evaluating who to trust, where to move, when to trade based on nuanced language prompts) without blocking the entire simulation while compiling their thoughts.

Decoupling the Mind from the Body

The core breakthrough in our architecture was cleanly severing the "Cognitive Layer" from the "Simulation Engine," splitting the system into three distinct, asynchronous domains.

1. The Engine βš™οΈ

A pure Node.js loop running exactly 60 times a second. It maintains the absolute source of truth. It knows where Agent 7 is, how many resources are left, and how to apply physics vectors.
It exposes a websocket pipe (via socket.io) to stream its universe state outwards, and a REST endpoint to accept incoming "Intents."
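The tick-plus-intent-queue pattern can be sketched in a few lines of TypeScript. All names here are illustrative, not the project's actual code: the REST layer would push `Intent` objects into a queue, and the fixed 60 Hz tick drains the queue before integrating physics, so incoming requests never block the loop.

```typescript
// Minimal sketch of a fixed-rate engine with an intent queue (hypothetical
// shapes; the real Engine also streams state over socket.io).
type Vec2 = { x: number; y: number };

interface Agent {
  id: number;
  pos: Vec2;
  vel: Vec2;
}

interface Intent {
  agentId: number;
  vel: Vec2; // desired velocity, applied on the next tick
}

class Engine {
  readonly agents = new Map<number, Agent>();
  private intentQueue: Intent[] = [];

  addAgent(id: number): void {
    this.agents.set(id, { id, pos: { x: 0, y: 0 }, vel: { x: 0, y: 0 } });
  }

  // Called by the REST layer; only enqueues, never blocks the tick.
  queueIntent(intent: Intent): void {
    this.intentQueue.push(intent);
  }

  // One fixed step (~16.6 ms): drain intents, then integrate physics.
  tick(dt: number): void {
    for (const intent of this.intentQueue.splice(0)) {
      const agent = this.agents.get(intent.agentId);
      if (agent) agent.vel = intent.vel;
    }
    for (const agent of this.agents.values()) {
      agent.pos.x += agent.vel.x * dt;
      agent.pos.y += agent.vel.y * dt;
    }
  }
}

// In production this would run on a 1000/60 ms interval.
const engine = new Engine();
engine.addAgent(7);
engine.queueIntent({ agentId: 7, vel: { x: 1, y: 0 } });
engine.tick(1 / 60);
```

Because `tick` is pure bookkeeping with no awaits, the engine remains the single, deterministic source of truth no matter how slowly the cognition layer responds.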

2. The Cognitive Orchestrator 🧠

This is an entirely separate background runner (AgentRunner.ts).
Instead of blocking the engine tick loop, the Runner independently polls the active world state. It then loops through the agents, looks at their assigned Model Profile (like llama3.2 vs gemma3), and constructs a massive context payload in the background:
"You are Agent 4. Your energy is low. Nearby agents are Red Faction. The universe is currently in Survival Mode."

The Orchestrator fires these prompts asynchronously against the local Ollama endpoint at localhost:11434, enforcing strict JSON outputs.
If an inference takes 3 seconds? That's fine. The physical avatar in the simulation merely stays in an "Idle" state while that specific agent finishes its "daydreaming."
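A hedged sketch of what such a call might look like. The `AgentView` shape and helper names are assumptions, not the project's real `AgentRunner.ts`; `format: "json"` is Ollama's built-in option for constraining the model's output to valid JSON.

```typescript
// Hypothetical view of the world state the orchestrator polls per agent.
interface AgentView {
  id: number;
  energy: number;
  nearbyFactions: string[];
  worldMode: string;
}

// Build the context payload described above.
function buildPrompt(view: AgentView): string {
  return [
    `You are Agent ${view.id}. Your energy is ${view.energy}.`,
    `Nearby agents are ${view.nearbyFactions.join(", ")}.`,
    `The universe is currently in ${view.worldMode}.`,
    `Respond ONLY with JSON: {"action": "...", "thought": "..."}`,
  ].join(" ");
}

// Fire one inference against the local Ollama endpoint.
async function decide(model: string, view: AgentView): Promise<unknown> {
  const res = await fetch("http://localhost:11434/api/generate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      model, // e.g. "llama3.2" or "gemma3", per the agent's Model Profile
      prompt: buildPrompt(view),
      format: "json", // ask Ollama for strict JSON output
      stream: false,
    }),
  });
  const data = await res.json();
  return JSON.parse(data.response);
}
```

Since `decide` is awaited in the background runner rather than the tick loop, a slow model only delays that one agent's next action, never the simulation.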

3. The Visualization Canvas 🎨

Built in React + Vite, the UI connects to the Engine's websocket. I rendered a dark-mode <canvas> that smoothly interpolates agent position updates between incoming state frames.
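A common way to smooth the streamed positions on the render side is linear interpolation between the last two engine snapshots. A sketch with hypothetical types (the project's actual renderer may differ):

```typescript
// Interpolate agent positions between two engine snapshots so the canvas
// can draw at display rate while state arrives at 60 Hz.
type Vec2 = { x: number; y: number };

interface Snapshot {
  time: number; // engine timestamp in ms
  positions: Map<number, Vec2>;
}

function lerp(a: number, b: number, t: number): number {
  return a + (b - a) * t;
}

// Position of one agent at renderTime, clamped between the two snapshots.
function interpolate(
  prev: Snapshot,
  next: Snapshot,
  agentId: number,
  renderTime: number
): Vec2 {
  const a = prev.positions.get(agentId)!;
  const b = next.positions.get(agentId)!;
  const span = next.time - prev.time;
  const raw = span > 0 ? (renderTime - prev.time) / span : 1;
  const t = Math.min(Math.max(raw, 0), 1);
  return { x: lerp(a.x, b.x, t), y: lerp(a.y, b.y, t) };
}
```

Calling this inside a `requestAnimationFrame` loop keeps motion fluid even when a websocket frame arrives late or out of cadence.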

Because I wanted true observability into why an agent made a decision, I passed the LLM's thought string backward through the engine to render as dynamic, interactive chat bubbles right alongside the actors!

Taming Output Hallucinations

A massive hurdle when letting open-source LLMs play in a sandbox is handling bad outputs. Even with strict prompts, smaller quantized models (like 4B-parameter models) occasionally return conversational fluff surrounding their payload:
"Here is your action! { "action": "idle" } hope this helps!"

If we just call JSON.parse(), the orchestrator crashes.

I implemented robust regular expression (regex) fallback wrappers that strip out conversational text by searching strictly for the bounds of { ... }. Coupled with try/catch guards in TypeScript, if an agent genuinely hallucinates itself into a corner, I gently slide it into a safe "fallback logic" state so its avatar doesn't freeze or crash the whole universe.
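A minimal version of such a wrapper might look like this (hypothetical helper names; the real implementation may differ): grab the span from the first `{` to the last `}`, try to parse it, and fall back to a safe idle action on any failure.

```typescript
// Expected shape of a model's decision payload.
interface Action {
  action: string;
  thought?: string;
}

// Safe default used whenever the output can't be salvaged.
const FALLBACK: Action = { action: "idle", thought: "fallback" };

function parseAction(raw: string): Action {
  // Greedy match from the first "{" to the last "}" strips any
  // conversational fluff surrounding the JSON payload.
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return FALLBACK;
  try {
    const parsed = JSON.parse(match[0]);
    // Validate the one field the engine actually needs.
    return typeof parsed.action === "string" ? parsed : FALLBACK;
  } catch {
    return FALLBACK;
  }
}
```

For example, `parseAction('Here is your action! { "action": "idle" } hope this helps!')` recovers the embedded JSON, while pure gibberish quietly resolves to the fallback state.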

Explore the NanoClaw Society

The beauty of this framework is extreme modularity. By cleanly separating the engine from the cognition, you can instantly hot-swap llama3.2 for GPT-4o, Anthropic's Claude, or even custom fine-tuned local models depending on your machine limits.


I believe this decoupled architecture is the definitive blueprint for any future autonomous agent ecosystems!

Complete code is available here.
