Alessio Micali

Originally published at polpo.sh

Why We Built Polpo: The Runtime for AI Agents

We kept solving the same infrastructure problems every time we shipped an agent. Streaming, sandboxing, memory, tool execution, evaluation — the same backend plumbing, over and over. So we built a runtime that handles all of it.

This post explains the gap we saw, why existing tools didn't fill it, and what Polpo does about it.

Agents got good fast. Infrastructure didn't keep up.

A year ago, AI agents could barely handle a multi-turn conversation. Today, they write code, research topics, manage files, ask clarifying questions, spawn sub-agents, and orchestrate complex workflows.

The capabilities evolved at breakneck speed. The infrastructure to run them? Not so much.

Building a production-ready agent means stitching together a surprising amount of backend plumbing — streaming, tool execution, sandboxed file access, persistent memory, session management, scheduling. Every team building agents hits the same wall: the agent works on your laptop. Now what?

The demo-to-production gap

With today's coding tools and models, you can build a beautiful agent demo in a weekend. A slick UI, impressive tool use, real-time streaming. It looks production-ready.

Then you try to ship it.

Behind every capability your agent needs, there's a piece of backend someone has to build, test, and maintain:

  • Streaming — SSE or WebSocket infrastructure, connection handling, backpressure
  • Tool execution — Where do tools run? How do you handle timeouts, retries, failures?
  • Filesystem access — What filesystem? What permissions? Isolated from other users?
  • Persistent memory — Context across sessions. Who manages storage? How does it scale?
  • Attachments — File parsing, storage, cleanup
  • Sub-agents — Orchestration, dependency resolution, result collection
  • Evaluation — Systematic assessment, not guesswork

Each of these is a week of work. In the best case.

You end up with a great demo on Friday and weeks of infra work before your agent does what it's supposed to do in production.

What Polpo is

Polpo is a Backend-as-a-Service for AI agents: an open-source runtime born from building agents in production.

Define your agent in JSON. Deploy. Get a live API endpoint.

[{
  "name": "coder",
  "role": "Senior Engineer",
  "model": "anthropic:claude-sonnet-4-5",
  "systemPrompt": "Write clean, tested TypeScript...",
  "allowedTools": ["bash", "read", "write", "edit"],
  "skills": ["frontend-design", "testing"],
  "reasoning": "medium",
  "maxConcurrency": 3
}]

That JSON becomes a production agent with an OpenAI-compatible API, streaming, 60+ built-in tools, sandboxed execution, persistent memory, and evaluation — no Dockerfile, no Kubernetes, no infra team.

Here's what you get out of the box:

  • OpenAI-compatible API endpoint — any HTTP client can talk to it
  • Streaming — SSE, token by token
  • 60+ built-in tools — file I/O, shell, HTTP, email, browser, search
  • Sandboxed execution — isolated environment per agent
  • Persistent memory — context across sessions
  • Skills — modular knowledge packages from skills.sh
  • Multi-agent orchestration — teams with dependency resolution
  • Scheduling — cron-based triggers
  • LLM-as-a-Judge evaluation — G-Eval scoring with custom rubrics
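Because the streaming surface is plain SSE in the OpenAI chat-completions chunk format, consuming it client-side is a few lines in any language. A minimal Python sketch — the sample chunks below are illustrative, not captured from a real Polpo deployment:

```python
import json

def tokens_from_sse(lines):
    """Yield content tokens from OpenAI-style SSE 'data:' lines."""
    for line in lines:
        if not line.startswith("data:"):
            continue  # skip comments, event names, blank keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return  # end-of-stream sentinel in the OpenAI convention
        delta = json.loads(payload)["choices"][0]["delta"]
        if "content" in delta:
            yield delta["content"]

# Illustrative chunks in the shape the chat-completions stream uses.
sample = [
    'data: {"choices":[{"delta":{"content":"Hel"}}]}',
    'data: {"choices":[{"delta":{"content":"lo"}}]}',
    "data: [DONE]",
]
print("".join(tokens_from_sse(sample)))  # Hello
```

In a real client you would iterate over the HTTP response body line by line instead of a list, but the parsing logic is the same.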

How it's different

Frameworks give you libraries to build with. Polpo gives you a runtime to deploy on.

|                  | Frameworks (CrewAI, LangGraph, etc.) | Polpo                        |
| ---------------- | ------------------------------------ | ---------------------------- |
| What it is       | Libraries you code against           | Runtime you deploy to        |
| Agent definition | Python/TS code                       | JSON config                  |
| Execution        | Your infrastructure                  | Managed sandboxes            |
| API endpoint     | You build it                         | OpenAI-compatible, included  |
| Memory           | Varies by framework                  | Built-in, persistent         |
| Sandboxing       | DIY (Docker, etc.)                   | Isolated per agent           |
| Skills ecosystem | N/A                                  | skills.sh                    |
| Evaluation       | Via integrations                     | LLM-as-a-Judge built-in      |

Some frameworks now offer managed platforms. Those are great options. Polpo's bet is different: open source, framework-agnostic, config-first.

Polpo sits below your framework of choice. Or replaces the need for one entirely.

Design principles

CLI and API first. Your coding agent — Claude Code, Cursor, Windsurf — can create, configure, and deploy Polpo agents in seconds. If your infrastructure isn't agent-friendly, you're building for the wrong era.

Framework agnostic. OpenAI-compatible API. Any language, any framework.
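"Any language" holds because the surface is the standard chat-completions protocol: most languages' standard libraries are enough, no SDK required. A hedged Python sketch — the base URL, key, and agent name are placeholders for illustration, not Polpo's actual values:

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, agent, prompt):
    """Build a POST request for an OpenAI-compatible chat endpoint."""
    body = json.dumps({
        "model": agent,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Placeholder URL and key; substitute what your deployment gives you.
req = build_chat_request("https://api.polpo.sh", "sk-...", "coder",
                         "Add a unit test for parse()")
# urllib.request.urlopen(req) would send it; omitted here.
```

The same request shape works from curl, Go's `net/http`, or any OpenAI client library pointed at a different base URL.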

Works with every model. Anthropic, OpenAI, Google, open source. Swap models without changing infrastructure.

Why open source

The core runtime is MIT-licensed. Self-host on your laptop, a VPS, or your company's servers. The cloud adds managed sandboxes, multi-tenancy, and auto-scaling — but the engine is the same.

We believe the runtime for AI agents should be a commodity, not a moat.

Get started

npm install -g polpo-ai
polpo skills add lumea-labs/polpo-skills
polpo-cloud deploy

Your agents are live with API endpoints, memory, tools, sandboxing — everything.

Or prompt your coding agent:

"Create a customer support agent with email and HTTP tools and deploy it to Polpo."

One prompt. Done.

Polpo is in public beta. Free tier. No credit card.

Get started on GitHub →


Docs · Discord · X
