Harish Kotra (he/him)

Posted on Jun 22

Building InfiniteLoop: Real-Time Visualization of AI Loop Engineering

#ai #programming #tutorial #dailybuild2026

How I built an interactive platform that lets you watch AI agents learn through escape rooms, one loop at a time.

Most AI demos are black boxes. You type a prompt, the model generates text, and the thinking process disappears into the void. You never see why it chose a particular action, how it recovered from failure, or what it learned along the way.

InfiniteLoop inverts this. It's an open-source, interactive web application that visualizes the Observe → Think → Act → Reflect → Repeat cycle of AI agents in real time through AI-powered escape rooms. Think of it as a debugger for emergent intelligence.

You can watch the agent form hypotheses, take actions, fail, reflect on its mistakes, and adapt — all rendered in a three-panel UI with expandable per-loop details.

The Architecture at 10,000 Feet

The system has four layers:

Room Engine — A deterministic, LLM-independent state machine that manages game state
Agent Runtime — Makes decisions by calling an LLM through OpenRouter, then feeds results back to the engine
Memory System — Three-tier memory (short-term, working, long-term) that persists lessons between loops
UI Layer — Three-panel React app showing room state, agent brain state, and the full loop timeline

The key design decision: the Room Engine never talks to an LLM. It only knows rooms, objects, items, connections, locks, codes, and action handlers. The agent interacts with it through a typed action interface.

UI Layer (React / Zustand / Framer Motion)
    ↓ fetch /api/game
API Route (Next.js App Router)
    ↓
Agent Runtime (runSingleAgentLoop)
    ├── 1. Build state prompt
    ├── 2. Call LLM (fetch → OpenRouter)
    ├── 3. Parse structured JSON (Zod schema)
    ├── 4. Execute action on RoomEngine
    ├── 5. Rule-based reflection (no LLM)
    └── 6. Update MemorySystem

The Room Engine: A Deterministic State Machine

The Room Engine is pure logic. It validates every action before executing it, updates state deterministically, and checks win/lose conditions after every loop.

export class RoomEngine {
  private room: RoomDefinition;
  public state: GameState;

  execute(action: ValidAction): ActionResult {
    const validation = this.validate(action);
    if (!validation.valid) {
      return { success: false, message: validation.error, stateChanges: {} };
    }
    const result = executeAction(this.state, this.room,
      action.type, action.target, action.payload);
    this.applyStateChanges(result.stateChanges);
    this.state.loopCount++;
    this.state.isWon = checkWinCondition(this.state, this.room);
    this.state.isLost = checkLoseCondition(this.state, this.room);
    return result;
  }
}

This separation is deliberate. By keeping the engine LLM-free, we get:

Deterministic behavior — same inputs always produce the same outputs
Testable actions — every handler is a pure function
Model-agnostic gameplay — the engine doesn't care which model drives the agent
Offline-capable room design — room JSON can be authored and validated without any LLM
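Here's a minimal sketch of what "validate, then execute deterministically" can look like with no model involved. It is not the project's engine: `MiniState`, `MiniAction`, `validateAction`, and `runAction` are hypothetical names, only two actions are modelled, and the sketch returns a new state object where the real engine updates `this.state`.

```typescript
// Hypothetical, trimmed-down types; the real GameState and ValidAction are richer.
interface MiniState {
  readonly inventory: readonly string[];
  readonly visibleItems: readonly string[];
}

interface MiniAction {
  type: "pickup_item" | "wait";
  target: string;
}

interface MiniResult {
  success: boolean;
  message: string;
  next: MiniState; // the state after the action; unchanged on failure
}

// Validation is a pure check: it reads the state and never mutates it.
function validateAction(state: MiniState, action: MiniAction): string | null {
  if (action.type === "pickup_item" && !state.visibleItems.includes(action.target)) {
    return `There is no ${action.target} here.`;
  }
  return null; // null means the action is allowed
}

// Execution builds a new state instead of editing the old one,
// so calling it twice with the same inputs gives the same output.
function runAction(state: MiniState, action: MiniAction): MiniResult {
  const error = validateAction(state, action);
  if (error !== null) {
    return { success: false, message: error, next: state };
  }
  if (action.type === "wait") {
    return { success: true, message: "Nothing happens.", next: state };
  }
  return {
    success: true,
    message: `Picked up ${action.target}.`,
    next: {
      inventory: [...state.inventory, action.target],
      visibleItems: state.visibleItems.filter((id) => id !== action.target),
    },
  };
}

const startState: MiniState = { inventory: [], visibleItems: ["brass_key"] };
const firstRun = runAction(startState, { type: "pickup_item", target: "brass_key" });
const secondRun = runAction(startState, { type: "pickup_item", target: "brass_key" });

console.log(JSON.stringify(firstRun) === JSON.stringify(secondRun)); // true
console.log(runAction(startState, { type: "pickup_item", target: "rusty_key" }).success); // false
```

Because nothing here touches a network or a clock, a unit test can replay any sequence of actions and compare the final state exactly.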

The Action System

There are 12 action handlers, each a pure function with typed inputs and outputs:

export const actionHandlers: Record<string, ActionHandler> = {
  search_object: (state, room, target) => { /* finds items/clues in objects */ },
  inspect_object: (state, room, target) => { /* reveals detailed description */ },
  pickup_item: (state, room, target) => { /* adds item to inventory */ },
  use_item: (state, room, target, payload) => { /* uses item on target */ },
  enter_code: (state, room, target, payload) => { /* checks code against target */ },
  unlock_object: (state, room, target, payload) => { /* unlocks with key */ },
  move_to_room: (state, room, target) => { /* traverses connections */ },
  combine_items: (state, room, target, payload) => { /* merges two items */ },
  examine_room: (state, room) => { /* re-describes the current room */ },
  open_container: (state, room, target) => { /* opens then searches container */ },
  close_container: (state, room, target) => { /* closes container */ },
  wait: (state, room, target) => { /* does nothing */ },
};

Each handler returns an ActionResult with success, message, stateChanges, and optional discovered items. The engine applies stateChanges to update inventory, discovered objects, unlocked items, visited rooms, and more.
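To make the `stateChanges` idea concrete, here's a small sketch of how a result could be folded into the game state. The field names inside `StateChangesSketch` and the helper `mergeUnique` are assumptions for illustration; the article doesn't show the project's actual `stateChanges` format.

```typescript
// Assumed shape: each field lists IDs to add. The real format may differ.
interface StateChangesSketch {
  addToInventory?: string[];
  discoveredObjects?: string[];
  unlockedObjects?: string[];
  visitedRooms?: string[];
}

interface ActionResultSketch {
  success: boolean;
  message: string;
  stateChanges: StateChangesSketch;
  discovered?: string[]; // optional list of newly found item IDs
}

interface EngineStateSketch {
  inventory: string[];
  discoveredObjects: string[];
  unlockedObjects: string[];
  visitedRooms: string[];
}

// Append new IDs and drop duplicates, keeping first-seen order.
function mergeUnique(existing: string[], added: string[] = []): string[] {
  return Array.from(new Set(existing.concat(added)));
}

// Returns a new state object; the input state is left untouched.
function applyStateChanges(
  state: EngineStateSketch,
  changes: StateChangesSketch,
): EngineStateSketch {
  return {
    inventory: mergeUnique(state.inventory, changes.addToInventory),
    discoveredObjects: mergeUnique(state.discoveredObjects, changes.discoveredObjects),
    unlockedObjects: mergeUnique(state.unlockedObjects, changes.unlockedObjects),
    visitedRooms: mergeUnique(state.visitedRooms, changes.visitedRooms),
  };
}

// A hypothetical result, as a pickup-style handler might return it.
const pickupResult: ActionResultSketch = {
  success: true,
  message: "You take the brass key from the desk.",
  stateChanges: { addToInventory: ["brass_key"], discoveredObjects: ["wooden_desk"] },
  discovered: ["brass_key"],
};

const stateBefore: EngineStateSketch = {
  inventory: [],
  discoveredObjects: ["wooden_desk"],
  unlockedObjects: [],
  visitedRooms: ["chamber"],
};
const stateAfter = applyStateChanges(stateBefore, pickupResult.stateChanges);

console.log(stateAfter.inventory.length);         // 1
console.log(stateAfter.discoveredObjects.length); // 1, the desk is not recorded twice
```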

The Agent Runtime: One Loop, One LLM Call

The most important design decision in the agent runtime was reducing LLM calls from three per loop to exactly one.

The original architecture had separate agents for:

Observer — "What do I see?" (1 LLM call)
Planner — "What should I do?" (1 LLM call)
Actor — "Execute the action" (0 LLM calls — deterministic)
Reflector — "What did I learn?" (1 LLM call)

That's 3 LLM calls per loop. At 6-7 seconds per call on a free-tier model, a single loop takes roughly 18-21 seconds, so a 20-loop game runs six to seven minutes. For a live visualization, that is unusably slow.

The fix: collapse observation, planning, and action selection into a single combined prompt. The LLM returns one structured JSON object covering all three in one call, and the engine still executes the chosen action deterministically. The Reflector is rule-based and uses no LLM at all.

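async function callModel(config, systemPrompt, userPrompt, schema) {
  const body = {
    model: config.model,
    messages: [
      // Serialize the schema; interpolating the object directly would print "[object Object]".
      { role: "system", content: `${systemPrompt}\n\nRespond with JSON: ${JSON.stringify(schema)}` },
      { role: "user", content: userPrompt },
    ],
    temperature: config.temperature,
    max_tokens: 2000,
  };

  // Abort the request if it runs too long. The 10-second limit is an assumed value.
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), 10_000);

  try {
    const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
      method: "POST",
      // Stands in for the project's headers() helper.
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
      },
      body: JSON.stringify(body),
      signal: controller.signal,
    });
    if (!res.ok) return null;

    const data = await res.json();
    const content = data.choices?.[0]?.message?.content;
    // Empty content, invalid JSON, and aborted requests all end up as null.
    return content ? JSON.parse(content) : null;
  } catch {
    return null;
  } finally {
    clearTimeout(timer);
  }
}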

The combined schema the model must fill:

const STEP_SCHEMA = {
  type: "object",
  properties: {
    observedFacts: { type: "array", items: { type: "string" } },
    potentialLeads: { type: "array", items: { type: "string" } },
    goal: { type: "string" },
    hypothesis: { type: "string" },
    actionType: { type: "string" },
    actionTarget: { type: "string" },
    actionPayload: { type: "object" },
    confidence: { type: "number" },
    reasoning: { type: "string" },
  },
  required: ["observedFacts", "goal", "hypothesis", "actionType", "actionTarget", "confidence", "reasoning"],
};
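Asking for JSON doesn't guarantee the model sends back every required field, so the response is worth checking before it reaches the engine. The sketch below is a hand-written stand-in for that check, not the project's parser: `AgentStepSketch`, `isStringArray`, and `parseAgentStep` are hypothetical names, and only the fields listed under `required` in `STEP_SCHEMA` are verified.

```typescript
// Field names follow STEP_SCHEMA's required list; the checking code is an illustration.
interface AgentStepSketch {
  observedFacts: string[];
  goal: string;
  hypothesis: string;
  actionType: string;
  actionTarget: string;
  confidence: number;
  reasoning: string;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every((entry) => typeof entry === "string");
}

// Returns the parsed step, or null when the text is empty, is not JSON,
// or is missing any required field.
function parseAgentStep(raw: string): AgentStepSketch | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;

  const { observedFacts, goal, hypothesis, actionType, actionTarget, confidence, reasoning } =
    data as Record<string, unknown>;

  if (
    !isStringArray(observedFacts) ||
    typeof goal !== "string" ||
    typeof hypothesis !== "string" ||
    typeof actionType !== "string" ||
    typeof actionTarget !== "string" ||
    typeof confidence !== "number" ||
    typeof reasoning !== "string"
  ) {
    return null;
  }
  return { observedFacts, goal, hypothesis, actionType, actionTarget, confidence, reasoning };
}

const validReply = JSON.stringify({
  observedFacts: ["There is a wooden desk."],
  goal: "Find a key",
  hypothesis: "The desk holds the key",
  actionType: "search_object",
  actionTarget: "wooden_desk",
  confidence: 60,
  reasoning: "Desks often contain items.",
});

console.log(parseAgentStep(validReply)?.actionType);     // search_object
console.log(parseAgentStep(""));                         // null
console.log(parseAgentStep('{"goal":"Find a key"}'));    // null
```

A null result gives the caller one clear signal to act on, whether the model returned nothing, returned prose, or returned partial JSON.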

The Reflection Engine

Reflection is handled by a deterministic function — no LLM call needed:

function generateReflection(action: string, result: string, confidence: number): Reflection {
  const failWords = ["fail", "can't", "cannot", "not found", "don't have" /* , ... */];
  const lowered = result.toLowerCase();
  const succeeded = !failWords.some(w => lowered.includes(w));
  const lessons: string[] = [];
  const adjustments: string[] = [];

  if (succeeded) {
    lessons.push(`${action} was effective.`);
    adjustments.push("Continue along this path.");
  } else {
    lessons.push(`${action} failed: ${result}`);
    // Match on the lowercased text so "Locked" and "locked" are treated alike.
    if (lowered.includes("lock")) {
      adjustments.push("Search for keys before attempting locked things.");
    } else if (lowered.includes("don't have")) {
      adjustments.push("Search containers and furniture first.");
    }
  }

  return {
    lessonsLearned: lessons,
    strategyAdjustments: adjustments,
    // Confidence stays within 0..100: +5 on success, -10 on failure.
    updatedConfidence: succeeded ? Math.min(100, confidence + 5) : Math.max(0, confidence - 10),
  };
}

This approach means:

Zero extra token cost for reflection
Instant feedback — no waiting 6-7 seconds for the LLM to reflect
Deterministic behavior: the same result text always produces the same lesson, although keyword matching can still misread an unusual message
Transparent logic — every reflection rule is visible in code
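A keyword rule like this is easy to probe in isolation. The sketch below uses only the five keywords shown in the article (the elided remainder of the list is left out), and the engine messages are invented for the example. `looksLikeFailure` and `nextConfidence` are hypothetical helper names.

```typescript
// The five keywords visible in the article; the rest of the list is not reproduced.
const FAIL_WORDS = ["fail", "can't", "cannot", "not found", "don't have"];

// True when the message contains any failure keyword, ignoring case.
function looksLikeFailure(resultMessage: string): boolean {
  const lowered = resultMessage.toLowerCase();
  return FAIL_WORDS.some((word) => lowered.includes(word));
}

// Same update rule as generateReflection: +5 on success, -10 on failure, kept within 0..100.
function nextConfidence(current: number, resultMessage: string): number {
  return looksLikeFailure(resultMessage)
    ? Math.max(0, current - 10)
    : Math.min(100, current + 5);
}

// Hypothetical engine messages, invented for this example.
console.log(looksLikeFailure("You pick up the brass key."));       // false
console.log(looksLikeFailure("You don't have the brass key."));    // true
console.log(looksLikeFailure("The lock opens without fail."));     // true, although the action succeeded
console.log(nextConfidence(98, "You pick up the brass key."));     // 100
console.log(nextConfidence(5, "You cannot open the north door.")); // 0
```

The third message shows the trade-off: substring matching is free and instant, but it depends on the engine's messages never using a failure word in a success message. Since the engine is deterministic and its messages are authored by hand, that constraint is practical to keep.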

The Memory System: Three-Tier Architecture

The agent has three memory tiers, inspired by cognitive science:

interface ShortTermMemory {
  currentRoom: string;
  inventory: string[];
  recentObservations: string[];   // Last 5 observations
  currentGoal: string | null;
}

interface WorkingMemory {
  recentLoops: LoopStep[];         // Last 10 loops
  activeHypotheses: string[];      // Currently being tested
  pendingActions: string[];
  mistakes: number;                 // Track failure rate
}

interface LongTermMemory {
  lessonsLearned: string[];         // Deduplicated lessons
  strategyAdjustments: string[];    // What to do differently
  environmentalMap: Record<string, string[]>;  // Known room graph
  failurePatterns: string[];
  successPatterns: string[];
}

Each loop feeds observations into short-term memory, hypotheses into working memory, and lessons into long-term memory. The getContextForPrompt() method serializes all three tiers into the state prompt so the LLM has access to accumulated knowledge.
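As a rough sketch of that flow, the code below keeps a capped observation list, a deduplicated lesson list, and a serializer for the prompt. It is a simplified stand-in: `MemorySketch` merges the three tiers into one object, `recordLoop` is a hypothetical helper, and this `getContextForPrompt` is a free function with an assumed output format, not the project's method.

```typescript
interface MemorySketch {
  recentObservations: string[]; // short-term, capped at the last 5
  activeHypotheses: string[];   // working memory
  lessonsLearned: string[];     // long-term, deduplicated
}

const MAX_OBSERVATIONS = 5;

// Fold one loop's output into memory and return the updated copy.
function recordLoop(
  memory: MemorySketch,
  observation: string,
  hypothesis: string,
  lessons: string[],
): MemorySketch {
  return {
    recentObservations: [...memory.recentObservations, observation].slice(-MAX_OBSERVATIONS),
    activeHypotheses: [hypothesis],
    lessonsLearned: Array.from(new Set(memory.lessonsLearned.concat(lessons))),
  };
}

// Serialize memory into plain text that can be appended to the state prompt.
function getContextForPrompt(memory: MemorySketch): string {
  const section = (title: string, lines: string[]): string =>
    `${title}:\n${lines.length > 0 ? lines.map((line) => `- ${line}`).join("\n") : "- (none)"}`;
  return [
    section("Recent observations", memory.recentObservations),
    section("Active hypotheses", memory.activeHypotheses),
    section("Lessons learned", memory.lessonsLearned),
  ].join("\n\n");
}

let agentMemory: MemorySketch = { recentObservations: [], activeHypotheses: [], lessonsLearned: [] };
for (let loop = 1; loop <= 7; loop++) {
  agentMemory = recordLoop(
    agentMemory,
    `observation ${loop}`,
    `hypothesis ${loop}`,
    ["Search containers and furniture first."],
  );
}

console.log(agentMemory.recentObservations.length); // 5
console.log(agentMemory.recentObservations[0]);     // observation 3
console.log(agentMemory.lessonsLearned.length);     // 1
console.log(getContextForPrompt(agentMemory));
```

Capping and deduplicating matter because everything in memory is sent to the model on every loop, so unbounded lists would grow the prompt and the token bill with each step.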

Zustand v5: State Without Complexity

Zustand was chosen over React Context or Redux for several reasons:

No providers — the store is a module-level singleton, imported directly
Flat selectors — useGameStore(s => s.activeSessionId) re-renders only when that value changes
getState() pattern — critical for event handlers that need the latest state without stale closures

The store manages sessions, loops, commentaries, game state, and room definitions. Each game gets a unique session ID with its own config, loops array, and state.

export const useGameStore = create<GameStore>((set, get) => ({
  createSession: (roomId, config) => {
    const sessionId = `session_${Date.now()}_${Math.random().toString(36).slice(2, 7)}`;
    set(state => ({
      sessions: { ...state.sessions, [sessionId]: { /* ... */ } },
      activeSessionId: sessionId,
    }));
    return sessionId;
  },

  addLoop: (sessionId, loop) => set(state => ({
    sessions: {
      ...state.sessions,
      [sessionId]: { ...state.sessions[sessionId], loops: [...state.sessions[sessionId].loops, loop] },
    },
  })),
}));
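The nested spread in `addLoop` is easier to test when pulled out as a pure function. The sketch below is one way to do that, with trimmed-down hypothetical types (`LoopRecord`, `SessionRecord`, `SessionsState`) and one deliberate difference from the store snippet: an unknown session ID returns the state unchanged, where the snippet above would throw.

```typescript
// Hypothetical, trimmed-down types for illustration.
interface LoopRecord {
  loopNumber: number;
  action: string;
}

interface SessionRecord {
  loops: LoopRecord[];
}

interface SessionsState {
  sessions: Record<string, SessionRecord>;
}

// Pure version of the addLoop update: copies each level it changes and
// leaves everything else shared with the previous state.
function addLoopToSession(
  state: SessionsState,
  sessionId: string,
  loop: LoopRecord,
): SessionsState {
  const session = state.sessions[sessionId];
  if (session === undefined) return state; // unknown session: nothing to update
  return {
    sessions: {
      ...state.sessions,
      [sessionId]: { ...session, loops: [...session.loops, loop] },
    },
  };
}

const initialSessions: SessionsState = { sessions: { session_a: { loops: [] } } };
const updatedSessions = addLoopToSession(initialSessions, "session_a", {
  loopNumber: 1,
  action: "examine_room",
});

console.log(updatedSessions.sessions["session_a"].loops.length); // 1
console.log(initialSessions.sessions["session_a"].loops.length); // 0, the original is untouched
console.log(
  addLoopToSession(initialSessions, "missing", { loopNumber: 1, action: "wait" }) === initialSessions,
); // true
```

Inside a Zustand store this could be wired in as `set(state => addLoopToSession(state, sessionId, loop))`, since `set` accepts a function that returns a partial state.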

Room Definitions: JSON-Driven Game Design

All five rooms are pure JSON, validated against Zod schemas at build time. This means new rooms can be added without touching any TypeScript code:

{
  "id": "tutorial-room",
  "name": "The Tutorial Chamber",
  "rooms": [
    {
      "id": "chamber",
      "objects": [
        { "id": "wooden_desk", "isContainer": true, "contains": ["brass_key"] },
        { "id": "north_door", "isLocked": true, "requiresItem": "brass_key" }
      ]
    },
    { "id": "exit", "objects": [] }
  ],
  "items": [
    { "id": "brass_key", "usableOn": ["north_door"] },
    { "id": "rusty_key", "usableOn": [] }
  ],
  "objectives": [
    { "type": "collect", "target": "brass_key" },
    { "type": "escape", "target": "exit" }
  ]
}

The Zod schema system ensures:

Every room, object, item, clue, and connection has the right shape
Objectives reference valid targets
Items reference valid objects in usableOn
Containers reference valid item IDs in contains

This catches configuration errors when a room definition is imported and validated, before any game starts, rather than partway through a run.
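Shape checks are what a schema library gives you directly; the cross-reference rules need custom logic. Here's a dependency-free sketch of those checks, using the fields from the tutorial room above. `RoomFileSketch` and `findReferenceErrors` are hypothetical names, and the rule that an objective may target an item, room, or object ID is my assumption. In a Zod schema, checks like these would typically live in a `superRefine` callback.

```typescript
interface RoomObjectSketch {
  id: string;
  contains?: string[];
  requiresItem?: string;
}

interface RoomFileSketch {
  rooms: { id: string; objects: RoomObjectSketch[] }[];
  items: { id: string; usableOn: string[] }[];
  objectives: { type: string; target: string }[];
}

// Collect every dangling reference instead of stopping at the first one.
function findReferenceErrors(def: RoomFileSketch): string[] {
  const errors: string[] = [];
  const itemIds = new Set(def.items.map((item) => item.id));
  const roomIds = new Set(def.rooms.map((room) => room.id));
  const objectIds = new Set<string>();
  for (const room of def.rooms) {
    for (const object of room.objects) objectIds.add(object.id);
  }

  for (const room of def.rooms) {
    for (const object of room.objects) {
      for (const itemId of object.contains ?? []) {
        if (!itemIds.has(itemId)) errors.push(`${object.id} contains unknown item ${itemId}`);
      }
      if (object.requiresItem !== undefined && !itemIds.has(object.requiresItem)) {
        errors.push(`${object.id} requires unknown item ${object.requiresItem}`);
      }
    }
  }
  for (const item of def.items) {
    for (const objectId of item.usableOn) {
      if (!objectIds.has(objectId)) errors.push(`${item.id} is usable on unknown object ${objectId}`);
    }
  }
  for (const objective of def.objectives) {
    const known =
      itemIds.has(objective.target) || roomIds.has(objective.target) || objectIds.has(objective.target);
    if (!known) errors.push(`objective ${objective.type} targets unknown id ${objective.target}`);
  }
  return errors;
}

const tutorialRoom: RoomFileSketch = {
  rooms: [
    {
      id: "chamber",
      objects: [
        { id: "wooden_desk", contains: ["brass_key"] },
        { id: "north_door", requiresItem: "brass_key" },
      ],
    },
    { id: "exit", objects: [] },
  ],
  items: [
    { id: "brass_key", usableOn: ["north_door"] },
    { id: "rusty_key", usableOn: [] },
  ],
  objectives: [
    { type: "collect", target: "brass_key" },
    { type: "escape", target: "exit" },
  ],
};

// Same room with one objective pointing at an item that does not exist.
const brokenRoom: RoomFileSketch = {
  ...tutorialRoom,
  objectives: [{ type: "collect", target: "golden_key" }],
};

console.log(findReferenceErrors(tutorialRoom).length); // 0
console.log(findReferenceErrors(brokenRoom)[0]);       // objective collect targets unknown id golden_key
```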

Key Technical Challenges

Challenge 1: OpenRouter Free Tier Reliability

The default model (openai/gpt-oss-20b:free on OpenInference) is free but inconsistent. In direct curl tests with the exact same prompt, it returns valid JSON in ~6.7 seconds. But from the Next.js app, it occasionally returns empty content.

Solution: No retries. If the model returns empty, the agent falls back to examine_room immediately rather than retrying 3 times and wasting 45+ seconds. Each loop completes in under 10 seconds regardless of the model state.

const modelResult = await callModel(config, SYSTEM_PROMPT, buildStatePrompt(engine, memory), STEP_SCHEMA);
const agentStep = modelResult ?? {
  actionType: "examine_room",
  actionTarget: "",
  confidence: 50,
  reasoning: "Model did not return a valid response; falling back.",
  // ... defaults
};
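The "one attempt, then fall back" policy can be sketched independently of the HTTP call. In the code below, `orNullAfter` and `decideStep` are hypothetical helpers, the model call is injected as a function so the example runs without a network, and the 10-second default is an assumed budget chosen to match the per-loop target above.

```typescript
interface StepSketch {
  actionType: string;
  actionTarget: string;
  confidence: number;
  reasoning: string;
}

const FALLBACK_STEP: StepSketch = {
  actionType: "examine_room",
  actionTarget: "",
  confidence: 50,
  reasoning: "Model did not return a valid response; falling back.",
};

// Resolve to null if the promise rejects or takes longer than timeoutMs.
function orNullAfter<T>(work: Promise<T>, timeoutMs: number): Promise<T | null> {
  return new Promise<T | null>((resolve) => {
    const timer = setTimeout(() => resolve(null), timeoutMs);
    work.then(
      (value) => {
        clearTimeout(timer);
        resolve(value);
      },
      () => {
        clearTimeout(timer);
        resolve(null);
      },
    );
  });
}

// One attempt, no retries: a null result becomes the fallback step.
async function decideStep(
  callModelOnce: () => Promise<StepSketch | null>,
  timeoutMs = 10_000, // assumed budget, not a value from the project
): Promise<StepSketch> {
  const modelResult = await orNullAfter(callModelOnce(), timeoutMs);
  return modelResult ?? FALLBACK_STEP;
}

// A stubbed model call that returns nothing, as the free-tier model sometimes does.
decideStep(async () => null).then((step) => console.log(step.actionType)); // examine_room
```

Note that `orNullAfter` only stops waiting. To cancel the underlying request as well, the real call would also need an `AbortController` signal passed to `fetch`.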

Challenge 2: Stale Closures in Zustand v5

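Event handlers that captured a store value at render time, for example inside a useCallback with an empty dependency array, kept operating on that old value after the store had changed. This is the general React stale-closure problem rather than something specific to Zustand v5's internals.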

Solution: Always use useGameStore.getState() inside event handlers. Only use useGameStore(s => s.prop) for reactive re-renders.

const runLoop = useCallback(async () => {
  const sid = useGameStore.getState().activeSessionId;
  // ... use sid
}, []);  // No dependencies needed since we get state directly
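The difference between a captured value and a call-time read shows up even without React. The sketch below uses a tiny hand-rolled store (`createMiniStore`, a hypothetical stand-in for Zustand) purely to illustrate the closure behavior.

```typescript
// A tiny hand-rolled store, standing in for Zustand purely for illustration.
function createMiniStore<T extends object>(initial: T) {
  let current = initial;
  return {
    getState: (): T => current,
    setState: (patch: Partial<T>): void => {
      current = { ...current, ...patch };
    },
  };
}

const sessionStore = createMiniStore({ activeSessionId: "session_a" });

// Captures the value once, the way a handler memoized at render time would.
const capturedId = sessionStore.getState().activeSessionId;
const staleHandler = (): string => capturedId;

// Reads the store at call time, like useGameStore.getState() inside the handler.
const freshHandler = (): string => sessionStore.getState().activeSessionId;

sessionStore.setState({ activeSessionId: "session_b" });

console.log(staleHandler()); // session_a
console.log(freshHandler()); // session_b
```

The stale handler isn't broken in any way React or Zustand could detect; it simply closed over a string. Reading through `getState()` moves the lookup to the moment the handler runs.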

Challenge 3: Bypassing LangChain's Structured Output

LangChain's withStructuredOutput() was hanging on OpenRouter's free-tier models due to its retry/fallback logic. The model would return a valid response in 6-7 seconds, but LangChain's wrapper would keep retrying.

Solution: Bypass LangChain entirely for the main call. Use direct fetch() to OpenRouter that matches the verified curl behavior exactly. LangChain is still available for tool definitions but the core loop uses raw HTTP.

The Three-Panel UI

The interface is split into three panels:

Room Visualization (left) — Shows the current room's description, visible objects (with container/lock badges), exits (with lock status), and the agent's inventory
Agent Brain (center) — Shows the agent's current goal, hypothesis (with confidence percentage), last action and its result, accumulated lessons, and recent observations. Animated narrator commentary appears at the top with framer-motion transitions.
Loop Timeline (right): Expandable cards for each loop, sorted newest-first. Each card shows the complete loop cycle: Observation → Plan → Action → Result → Reflection → Narrator Commentary. Successful loops have a green left border; failed loops have a red one.

On mobile, these collapse into a tabbed view with Room / Brain / Timeline tabs.

The Developer Dashboard

A separate /dashboard page provides:

Benchmark Runner — Select a room and run multiple models against it. Results show loops used, mistakes made, duration, tokens consumed, and estimated cost. Side-by-side comparison.
Agent Traces (planned) — Full tool-call traces and state transitions
Cost Analytics (planned) — Token usage per loop, per room, per model
Memory Inspector (planned) — Live view of the agent's short-term, working, and long-term memory

What's Next

Multi-agent mode is the next big feature — running 2-3 agents in parallel on the same room with different models, comparing their strategies side by side. Other planned features include a custom room builder UI, agent memory persistence across sessions, chess-style replay, and additional room themes.

Try It Yourself

git clone https://github.com/harishkotra/infinite-loop
cd infinite-loop
npm install
# Add OPENROUTER_API_KEY to .env.local
npm run dev

Open http://localhost:3000, select the Tutorial Chamber, and watch an AI agent learn to escape. No GPU required — just a browser and a free OpenRouter API key.