We lost a 4-hour agent run because a worker restarted mid-step. No logs. No recovery. The agent had called six tools and was halfway through a document pipeline. When the worker came back up, it started from zero. That’s when we stopped debugging the LLM and started debugging the runtime.
The Real Problem
Most frameworks, LangChain-style orchestrators, and prompt chaining libraries stop at the LLM call. They solve the conversation, not the execution loop. In production, agents fail silently: queue errors, worker restarts, malformed tool payloads, runs that leave no trace.
Retries, logs, cron checks – none of that fixes the root cause. The model is fine. The runtime is where things die.
Production-Ready Requirements
- State persistence – every step and tool invocation written to durable storage. No memory caches. No stdout logs.
- Decoupled execution – agent thinking and tool execution are separated, queue-based, and non-blocking.
- Typed, validated tooling – catch malformed payloads at the boundary instead of letting them become runtime bombs.
- Horizontal scalability – add workers without touching agent logic.
- Observability – structured telemetry for every step, tool call, duration, and output.

How Runloop Solves It
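To make the persistence and recovery requirements concrete, here is a minimal sketch. An in-memory array stands in for the durable PostgreSQL step table, and none of the names reflect Runloop's actual internals — the point is only that every step is recorded before and after execution, so a restarted worker can find its resume point instead of starting from zero:

```typescript
type StepRecord = {
  runId: string;
  step: number;
  status: "started" | "completed" | "failed";
  output?: unknown;
};

// Stand-in for INSERTs into a durable `steps` table (hypothetical schema).
const stepLog: StepRecord[] = [];

// Persist intent first, then the result — shown synchronously for brevity.
function executeStep(runId: string, step: number, work: () => unknown): void {
  stepLog.push({ runId, step, status: "started" });
  try {
    const output = work();
    stepLog.push({ runId, step, status: "completed", output });
  } catch (err) {
    stepLog.push({ runId, step, status: "failed", output: String(err) });
    throw err;
  }
}

// A restarted worker asks: what was the last completed step for this run?
function lastCompletedStep(runId: string): number {
  const done = stepLog.filter(
    (r) => r.runId === runId && r.status === "completed",
  );
  return done.length ? Math.max(...done.map((r) => r.step)) : 0;
}

executeStep("run-1", 1, () => "parsed document");
executeStep("run-1", 2, () => "extracted entities");
console.log(lastCompletedStep("run-1")); // → 2
```

With the real table in PostgreSQL, the same query gives a restarted worker its resume point — exactly what was missing in the 4-hour run described above.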
Stack: Bun, PostgreSQL, Redis + BullMQ, Zod, OpenTelemetry.
- Bun: high-throughput I/O for agent workloads, low memory per worker.
- PostgreSQL: source of truth. Persisted runs, replayable and auditable.
- BullMQ + Redis: stateless workers, queue-based execution, retry policies, deduplication.
- Zod: tool schemas validated at runtime, TypeScript autocomplete, serializable manifest.
- OpenTelemetry: tracing at run and step level, easy integration with Grafana, Jaeger, Datadog.

Architecture

- Core Runtime – manages state, transitions, recovery.
- Tool Registry – centralized repository: register once, available globally.
- Worker System – executes steps, persists results, stateless.

Getting Started
```shell
docker-compose up -d
cp .env.example .env
bun install
```
Define a tool, launch an agent, and get a fully traced, persisted run in minutes.
⚡ Runloop v1
The Production-Ready AI Agent Runtime.
Stop building experimental scripts. Start building resilient, scalable, and persistent AI agents that actually survive production workloads.
🚀 Why Runloop?
Most AI frameworks focus on the LLM call. Runloop focuses on the Execution Loop. It provides a robust runtime for AI agents, built on a fast, modern stack.
- 🏎️ Bun-Native Speed: Leverages the high-performance Bun runtime for blazing-fast execution and low overhead.
- 🛡️ Production-Grade Persistence: Every run, step, and tool result is backed by PostgreSQL. Never lose an agent's state or history again.
- 📦 Distributed Task Orchestration: Powered by BullMQ and Redis. Scale your agent workers vertically or horizontally with ease.
- 🛠️ Type-Safe Tooling: Define your tools using Zod schemas. Get automatic validation and perfect TypeScript autocompletion.
- 📊 Built-in Telemetry: Integrated tracing and monitoring to understand exactly what your agents are doing at every…