Alessio Micali

Originally published at polpo.sh

Why We Built Polpo: The Runtime for AI Agents

We kept solving the same infrastructure problems every time we shipped an agent. Streaming, sandboxing, memory, tool execution, evaluation — the same backend plumbing, over and over. So we built a runtime that handles all of it.

This post explains the gap we saw, why existing tools didn't fill it, and what Polpo does about it.

Agents got good fast. Infrastructure didn't keep up.

A year ago, AI agents could barely handle a multi-turn conversation. Today, they write code, research topics, manage files, ask clarifying questions, spawn sub-agents, and orchestrate complex workflows.

The capabilities evolved at breakneck speed. The infrastructure to run them? Not so much.

Building a production-ready agent means stitching together a surprising amount of backend plumbing — streaming, tool execution, sandboxed file access, persistent memory, session management, scheduling. Every team building agents hits the same wall: the agent works on your laptop. Now what?

The demo-to-production gap

With today's coding tools and models, you can build a beautiful agent demo in a weekend. A slick UI, impressive tool use, real-time streaming. It looks production-ready.

Then you try to ship it.

Behind every capability your agent needs, there's a piece of backend someone has to build, test, and maintain:

  • Streaming — SSE or WebSocket infrastructure, connection handling, backpressure
  • Tool execution — Where do tools run? How do you handle timeouts, retries, failures?
  • Filesystem access — What filesystem? What permissions? Isolated from other users?
  • Persistent memory — Context across sessions. Who manages storage? How does it scale?
  • Attachments — File parsing, storage, cleanup
  • Sub-agents — Orchestration, dependency resolution, result collection
  • Evaluation — Systematic assessment, not guesswork

Each of these is a week of work. In the best case.

You end up with a great demo on Friday and weeks of infra work before your agent does what it's supposed to do in production.

What Polpo is

Polpo is a Backend-as-a-Service for AI agents: an open-source runtime born from building agents in production.

Define your agent in JSON. Deploy. Get a live API endpoint.

[{
  "name": "coder",
  "role": "Senior Engineer",
  "model": "anthropic:claude-sonnet-4-5",
  "systemPrompt": "Write clean, tested TypeScript...",
  "allowedTools": ["bash", "read", "write", "edit"],
  "skills": ["frontend-design", "testing"],
  "reasoning": "medium",
  "maxConcurrency": 3
}]

That JSON becomes a production agent with an OpenAI-compatible API, streaming, 60+ built-in tools, sandboxed execution, persistent memory, and evaluation — no Dockerfile, no Kubernetes, no infra team.

Here's what you get out of the box:

  • OpenAI-compatible API endpoint — any HTTP client can talk to it
  • Streaming — SSE, token by token
  • 60+ built-in tools — file I/O, shell, HTTP, email, browser, search
  • Sandboxed execution — isolated environment per agent
  • Persistent memory — context across sessions
  • Skills — modular knowledge packages from skills.sh
  • Multi-agent orchestration — teams with dependency resolution
  • Scheduling — cron-based triggers
  • LLM-as-a-Judge evaluation — G-Eval scoring with custom rubrics
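Because the streaming surface is plain SSE in the OpenAI chat-completions chunk format, consuming it client-side is a few lines in any language. A minimal Python sketch — the sample chunks below are illustrative, not captured from a real Polpo deployment:

```python
import json

def tokens_from_sse(lines):
    """Yield content tokens from OpenAI-style SSE 'data:' lines."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip comments, event names, blank keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel in the OpenAI convention
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Illustrative chunks in the shape the chat-completions stream uses.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(tokens_from_sse(sample)))  # Hello
```

In a real client you would iterate over the HTTP response body line by line instead of a list, but the parsing logic is the same.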

How it's different

Frameworks give you libraries to build with. Polpo gives you a runtime to deploy on.

|                  | Frameworks (CrewAI, LangGraph, etc.) | Polpo                        |
| ---------------- | ------------------------------------ | ---------------------------- |
| What it is       | Libraries you code against           | Runtime you deploy to        |
| Agent definition | Python/TS code                       | JSON config                  |
| Execution        | Your infrastructure                  | Managed sandboxes            |
| API endpoint     | You build it                         | OpenAI-compatible, included  |
| Memory           | Varies by framework                  | Built-in, persistent         |
| Sandboxing       | DIY (Docker, etc.)                   | Isolated per agent           |
| Skills ecosystem | N/A                                  | skills.sh                    |
| Evaluation       | Via integrations                     | LLM-as-a-Judge built-in      |

Some frameworks now offer managed platforms. Those are great options. Polpo's bet is different: open source, framework-agnostic, config-first.

Polpo sits below your framework of choice. Or replaces the need for one entirely.

Design principles

CLI and API first. Your coding agent — Claude Code, Cursor, Windsurf — can create, configure, and deploy Polpo agents in seconds. If your infrastructure isn't agent-friendly, you're building for the wrong era.

Framework agnostic. OpenAI-compatible API. Any language, any framework.
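"Any language" holds because the surface is the standard chat-completions protocol: most languages' standard libraries are enough, no SDK required. A hedged Python sketch — the base URL, key, and agent name are placeholders for illustration, not Polpo's actual values:

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, agent, prompt):
    """Build a POST request for an OpenAI-compatible chat endpoint."""
    body = json.dumps({
        "model": agent,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder URL and key; substitute what your deployment gives you.
req = build_chat_request("https://api.polpo.sh", "sk-...", "coder",
                         "Add a unit test for parse()")
# urllib.request.urlopen(req) would send it; omitted here.
```

The same request shape works from curl, Go's `net/http`, or any OpenAI client library pointed at a different base URL.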

Works with every model. Anthropic, OpenAI, Google, open source. Swap models without changing infrastructure.

Why open source

The core runtime is MIT-licensed. Self-host on your laptop, a VPS, or your company's servers. The cloud adds managed sandboxes, multi-tenancy, and auto-scaling — but the engine is the same.

We believe the runtime for AI agents should be a commodity, not a moat.

Get started

npm install -g polpo-ai
polpo skills add lumea-labs/polpo-skills
polpo-cloud deploy

Your agents are live with API endpoints, memory, tools, sandboxing — everything.

Or prompt your coding agent:

"Create a customer support agent with email and HTTP tools and deploy it to Polpo."

One prompt. Done.

Polpo is in public beta. Free tier. No credit card.

Get started on GitHub →


Docs · Discord · X
