By Victor M, Co-Founder at Fleeks
Most AI agents stay in development because production deployment is too slow. At Fleeks, we built infrastructure where agents deploy autonomously in 31 seconds—from code generation to production URL to shareable embed. Zero human intervention.
Table of Contents
- Core Infrastructure: Sub-200ms Stateful Execution
- Orchestration: The MCP Standard
- The Structural Foundation: Production Lifecycle
- Resource Management: CRIU-Based Hibernation
- Real-World Applications
- System Architecture
- Resources
1. Core Infrastructure: Sub-200ms Stateful Execution
The Problem: Standard serverless cold starts: 3-8 seconds. For an agent doing 50 iterations, that's 150-400 seconds of waiting. Agents give up early because iteration is expensive.
Our Solution: Pre-warmed container pool.
We maintain 1,000+ initialized containers. Agent needs one? Grab from pool in sub-200ms.
```python
for iteration in range(50):
    ws = await client.workspaces.create(f"test-{iteration}")
    await ws.terminal.execute("python test.py")
    result = await ws.files.read("output.json")
```
Technical implementation:
| Metric | Value |
|---|---|
| Pool size | 1,000+ containers per region |
| Isolation | gVisor for multi-tenant security |
| Hit rate | >95% under production load |
| Latency | Sub-200ms (P95) |
Tradeoff: Higher baseline cost vs predictable speed. Worth it for agent workloads where iteration speed determines solution quality.
Why custom orchestration instead of Kubernetes? K8s pod startup: 10-30s. Too slow for agent iteration needing sub-200ms. We built a custom scheduler for container pool management. Still use K8s for stateless services.
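The acquire path of such a pool is simple; the hard part is keeping it full under load. A minimal sketch of the idea, with a hypothetical `WarmPool` class and millisecond-scale stand-ins for real provisioning times (an illustration of the pattern, not our scheduler):

```python
import asyncio

class WarmPool:
    """Illustrative warm-pool scheduler: hand out pre-initialized
    containers instantly, fall back to a slow cold provision on a miss."""

    def __init__(self, size: int):
        self.pool: asyncio.Queue = asyncio.Queue()
        for i in range(size):
            self.pool.put_nowait(f"container-{i}")  # pre-warmed stand-ins
        self.hits = 0
        self.misses = 0

    async def acquire(self) -> str:
        try:
            c = self.pool.get_nowait()   # warm hit: effectively instant
            self.hits += 1
            return c
        except asyncio.QueueEmpty:
            self.misses += 1
            await asyncio.sleep(0.01)    # stand-in for a 4-5s cold provision
            return "cold-container"

async def demo():
    pool = WarmPool(size=3)
    for _ in range(5):
        await pool.acquire()
    return pool.hits, pool.misses

hits, misses = asyncio.run(demo())
print(hits, misses)  # → 3 2
```

With 3 warm containers and 5 requests, the first 3 acquisitions are instant and the last 2 pay the cold-provision penalty, which is why pool sizing against peak demand drives the >95% hit rate.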
Performance benchmark:
| Operation | Latency |
|---|---|
| Container acquisition | Sub-200ms |
| Cold provision fallback | 4-5s |
| Pool hit rate | >95% |
2. Orchestration: The MCP Standard for Autonomous Tool Integration
Agents need external systems (GitHub, databases, Slack). We use Model Context Protocol for standardized integration:
```json
{
  "servers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {"GITHUB_PERSONAL_ACCESS_TOKEN": "..."}
    }
  }
}
```
How it works: Agent asks "list repositories" → MCP translates to GitHub API → Agent gets data.
Integration scope:
- 270+ community MCP servers available
- Protocol: Standardized JSON-RPC over stdio
- Configuration: Declarative, not programmatic
Why this scales: Adding tools is configuration, not custom code. Same interface for all external systems.
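On the wire, each tool invocation is one JSON-RPC 2.0 object written to the server's stdin. A sketch of what a "list repositories" call might look like (the tool name and arguments here are illustrative, not the exact schema of the GitHub server):

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request as sent to an MCP server over stdio
    (one JSON object per message)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = make_tool_call(1, "search_repositories", {"query": "user:fleeks-ai"})
print(msg)
```

Because every server speaks this same envelope, the agent-side client code never changes when a new tool is added.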
3. The Structural Foundation: The Production Lifecycle
Traditional deployment takes 20+ minutes with manual steps. For autonomous agents, this breaks the core premise.
A. Polyglot Runtime Execution
```python
# Agent switches languages per task, same workspace
await ws.files.create("analyze.py", ml_code)
await ws.terminal.execute("python analyze.py &")

await ws.files.create("api.js", server_code)
await ws.terminal.execute("node api.js &")

preview = await ws.get_preview_url()
# One URL, multiple services
```
Tech: 11+ runtime templates (Python, Node.js, React, Go, Rust, Java, Vue, Svelte). Pre-configured dependency management. Single workspace, multi-process execution.
Why this matters: Agent selects optimal language per task. Python for ML, Node for APIs, React for UI—orchestrated autonomously without manual environment switching.
B. Instant Preview URLs
```python
await workspace.terminal.execute("python app.py &")
preview = await workspace.get_preview_url()
# https://workspace-abc.fleeks.run (~30ms)
```
Tech: Wildcard SSL, Envoy proxy, Cloudflare CDN. Agent validates against real production infrastructure.
Performance: Preview URL generation ~30ms (measured average).
C. Embeds for Distribution
```python
embed = await client.embeds.create(
    name="Demo",
    template=EmbedTemplate.REACT,
    files={"src/App.js": code},
    layout_preset="side-by-side",
)
```
What you get:
- Code editor + live preview
- Working runtime (not a screenshot)
- 100+ concurrent users per embed
- Shareable URL or iframe
Use cases: Portfolio sites with runnable demos. Documentation with editable examples. Twitter demos that actually work.
D. Persistent State Architecture
Serverless wipes disk on shutdown. Agents need memory that survives restarts.
Container (ephemeral) → /workspace (persistent)
```python
# Agent writes learned patterns
await workspace.files.create(
    "/workspace/memory.json",
    json.dumps(learned_patterns),
)

# Container restarts, state persists
# Agent reads accumulated knowledge
memory = json.loads(
    await workspace.files.read("/workspace/memory.json")
)
```
Tech: Distributed filesystem, <10ms writes, replicated across 3 zones.
Impact: Agents solve problems requiring 100+ iterations of accumulated learning.
Why persistent volumes instead of S3? Agents expect normal filesystem operations. S3 has no atomic operations, higher latency, non-POSIX semantics.
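The atomicity point matters for agent memory files: on a real filesystem the standard pattern is write-to-temp-then-rename, which guarantees readers never see a half-written file. A sketch of an atomic JSON update using plain Python (independent of the Fleeks SDK; `os.replace` is atomic on POSIX filesystems, and has no S3 equivalent):

```python
import json
import os
import tempfile

def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename it
    over the target. Readers see either the old file or the new one."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # atomic rename over the destination
    except BaseException:
        os.remove(tmp)         # clean up the temp file on failure
        raise

path = os.path.join(tempfile.mkdtemp(), "memory.json")
atomic_write_json(path, {"patterns": [], "iteration": 1})
print(json.load(open(path))["iteration"])  # → 1
```

An agent that crashes mid-write resumes from the last complete snapshot instead of a corrupted one.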
4. Resource Management: CRIU-Based Hibernation
Some agents run for hours. Keeping containers up 24/7 is expensive. Stopping them loses process state.
Our solution: CRIU hibernation.
```python
await workspace.terminal.start_background_job("python monitor.py")
await workspace.containers.hibernate()  # ~2s, then $0
await workspace.containers.wake()       # ~2s, exact state
```
What CRIU preserves:
- Process memory (exact state)
- Open file descriptors
- Network connections
- Process IDs
Performance:
| Operation | Value |
|---|---|
| Checkpoint creation | ~2 seconds |
| Restore time | ~2 seconds |
| Success rate | >99% for CPU workloads |
Constraint: GPU state not supported (CRIU limitation). CPU workloads fully supported.
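The cost math is straightforward. Assuming a hypothetical $0.05/hour container rate and an agent that is actively working 4 hours a day:

```python
RATE_PER_HOUR = 0.05               # hypothetical container cost, $/hour
active_hours, idle_hours = 4, 20   # agent busy 4h/day, idle the rest

always_on = (active_hours + idle_hours) * RATE_PER_HOUR
hibernated = active_hours * RATE_PER_HOUR  # ~$0 while checkpointed

print(f"always-on: ${always_on:.2f}/day, hibernated: ${hibernated:.2f}/day")
# → always-on: $1.20/day, hibernated: $0.20/day
```

For long-lived monitoring agents that are idle most of the day, hibernation cuts the compute bill roughly in proportion to idle time.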
5. Real-World Application: Solving Engineering Friction
Self-Healing Infrastructure
Agent that monitors Kubernetes and auto-fixes issues:
```python
from fleeks_sdk import create_client

async def autonomous_remediation():
    async with create_client() as client:
        agent = await client.workspaces.create("monitor", "python")
        await agent.files.create("monitor.py", """
import json
memory = json.load(open('/workspace/fixes.json'))
for pod in failing_pods:
    issue = analyze(pod)
    if issue in memory:
        apply_fix(memory[issue])        # known issue: ~10 seconds
    else:
        fix = investigate_and_fix(pod)  # new issue: 3-5 minutes
        memory[issue] = fix
json.dump(memory, open('/workspace/fixes.json', 'w'))
""")
        await agent.terminal.start_background_job("python monitor.py")
```
Outcome:
| Occurrence | Resolution Time |
|---|---|
| First occurrence | 3-5 minutes |
| Second occurrence | 30 seconds |
| After 50 occurrences | 10 seconds |
Agent learns and gets faster over time. Persistent state enables learning. Fast provisioning enables validation environments. Production URLs enable fix testing before deployment.
Complete System Architecture
```
┌─────────────────────────────────────────┐
│ Agent Layer (Customer Code)             │
│ • Reasoning and decision-making         │
│ • Code generation and validation        │
│ • MCP tool integration                  │
│ • State management in /workspace        │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│ Fleeks Container Engine                 │
│ • Pre-warmed pool (sub-200ms)           │
│ • gVisor isolation                      │
│ • CRIU hibernation                      │
│ • Multi-template support                │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│ Fleeks Production Layer                 │
│ • Dynamic HTTPS (*.fleeks.run)          │
│ • Instant preview URLs (~30ms)          │
│ • Embeddable workspaces                 │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│ Fleeks Storage Layer                    │
│ • Persistent /workspace                 │
│ • Distributed filesystem                │
│ • Multi-AZ replication                  │
└─────────────────────────────────────────┘
```
Each layer enables the one above: Fast provisioning → rapid iteration. Instant URLs → production validation. Embeds → distribution. Persistent state → learning.
Performance Benchmarks
| Operation | Latency | Impact |
|---|---|---|
| Container acquisition | Sub-200ms | Maintains reasoning flow |
| Preview URL | ~30ms | Instant validation |
| File write | <10ms | Fast state updates |
| Embed creation | ~1s | Immediate distribution |
| Hibernation | ~2s | Cost-efficient |
Infrastructure Comparison
| Feature | Lambda | K8s | Fleeks |
|---|---|---|---|
| Cold start | 1-8s | 10-30s | Sub-200ms |
| Persistent state | ❌ | Manual | ✅ |
| Preview URLs | ❌ | Manual | ✅ |
| Embeds | ❌ | ❌ | ✅ |
| Hibernation | ❌ | ❌ | ✅ |
Use Fleeks when: AI agents, rapid iteration (50+ cycles), need persistent memory, autonomous deployment.
Use Lambda when: Stateless APIs, infrequent traffic.
Use K8s when: Long-running services, have DevOps team.
Current Technical Constraints
- Storage I/O: ~100MB/s per workspace. Sufficient for code/logs/state. Data-intensive workloads may hit limits.
- GPU hibernation: Not supported (CRIU limitation). CPU workloads work fine.
- Cross-region state: Can't checkpoint in US-East and restore in EU-West yet.
- Embed sessions: ~100 concurrent per embed. Higher traffic needs different pooling.
Working on all of these.
Resources
Get Started
Install:
```shell
pip install fleeks-sdk
```
Quick example:
```python
from fleeks_sdk import create_client

async with create_client(api_key="your_key") as client:
    ws = await client.workspaces.create("demo", "python")
    await ws.files.create("app.py", "print('Hello')")
    await ws.terminal.execute("python app.py")
    preview = await ws.get_preview_url()
    print(f"Live: {preview.preview_url}")
```
Self-improving agent:
```python
import json
from fleeks_sdk import create_client

async def learning_agent():
    async with create_client() as client:
        ws = await client.workspaces.create("learning")
        # Load accumulated memory, or start fresh on the first run
        try:
            memory = json.loads(await ws.files.read("/workspace/memory.json"))
        except FileNotFoundError:
            memory = {"patterns": [], "iteration": 0}
        for i in range(50):
            memory["iteration"] += 1
            result = await ws.terminal.execute("python task.py")
            if result.exit_code == 0:
                memory["patterns"].append(extract(result.stdout))
            await ws.files.create("/workspace/memory.json", json.dumps(memory))
        return await ws.get_preview_url()
```
Benchmark It Yourself
```python
import time
from fleeks_sdk import create_client

async def benchmark():
    timings = []
    async with create_client() as client:
        for i in range(10):
            start = time.time()
            ws = await client.workspaces.create(f"bench-{i}")
            elapsed = (time.time() - start) * 1000
            timings.append(elapsed)
            await ws.delete()
    print(f"Avg: {sum(timings)/len(timings):.0f}ms")
```
Links
- Sign up: fleeks.ai/signup (Free: 100 hours/month)
- SDK: github.com/fleeks-ai/fleeks-sdk-python
- Docs: docs.fleeks.ai
- Discord: discord.gg/fleeks
Key Takeaways
Infrastructure shapes agent behavior. Fast provisioning (sub-200ms) enables deep exploration. Slow provisioning (3-8s per cold start) forces simple solutions.
State persistence enables learning. Agents accumulate knowledge over 100+ iterations instead of resetting to zero.
Production lifecycle is the substrate. Agents that can't deploy autonomously are experimental scripts, not operational systems.
MCP standardizes tools. 270+ integrations via configuration, not custom code.