By Victor M, Co-Founder at Fleeks
Most AI agents stay in development because production deployment is too slow. At Fleeks, we built infrastructure where agents deploy autonomously in 31 seconds—from code generation to production URL to shareable embed. Zero human intervention.
Table of Contents
- Core Infrastructure: Sub-200ms Stateful Execution
- Orchestration: The MCP Standard
- The Structural Foundation: Production Lifecycle
- Resource Management: CRIU-Based Hibernation
- Real-World Applications
- System Architecture
- Resources
1. Core Infrastructure: Sub-200ms Stateful Execution
The Problem: Standard serverless cold starts: 3-8 seconds. For an agent doing 50 iterations, that's 150-400 seconds of waiting. Agents give up early because iteration is expensive.
Our Solution: Pre-warmed container pool.
We maintain 1,000+ initialized containers. Agent needs one? Grab from pool in sub-200ms.
```python
for iteration in range(50):
    ws = await client.workspaces.create(f"test-{iteration}")
    await ws.terminal.execute("python test.py")
    result = await ws.files.read("output.json")
```
Technical implementation:
| Metric | Value |
|---|---|
| Pool size | 1,000+ containers per region |
| Isolation | gVisor for multi-tenant security |
| Hit rate | >95% under production load |
| Latency | Sub-200ms (P95) |
Tradeoff: Higher baseline cost vs predictable speed. Worth it for agent workloads where iteration speed determines solution quality.
Why custom orchestration instead of Kubernetes? K8s pod startup: 10-30s. Too slow for agent iteration needing sub-200ms. We built a custom scheduler for container pool management. Still use K8s for stateless services.
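The acquire path of such a pool is simple; the hard part is keeping it full under load. A minimal sketch of the idea, with a hypothetical `WarmPool` class and millisecond-scale stand-ins for real provisioning times (an illustration of the pattern, not our scheduler):

```python
import asyncio

class WarmPool:
    """Illustrative warm-pool scheduler: hand out pre-initialized
    containers instantly, fall back to a slow cold provision on a miss."""

    def __init__(self, size: int):
        self.pool: asyncio.Queue = asyncio.Queue()
        for i in range(size):
            self.pool.put_nowait(f"container-{i}")  # pre-warmed stand-ins
        self.hits = 0
        self.misses = 0

    async def acquire(self) -> str:
        try:
            c = self.pool.get_nowait()   # warm hit: effectively instant
            self.hits += 1
            return c
        except asyncio.QueueEmpty:
            self.misses += 1
            await asyncio.sleep(0.01)    # stand-in for a 4-5s cold provision
            return "cold-container"

async def demo():
    pool = WarmPool(size=3)
    for _ in range(5):
        await pool.acquire()
    return pool.hits, pool.misses

hits, misses = asyncio.run(demo())
print(hits, misses)  # → 3 2
```

With 3 warm containers and 5 requests, the first 3 acquisitions are instant and the last 2 pay the cold-provision penalty, which is why pool sizing against peak demand drives the >95% hit rate.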
Performance benchmark:
| Operation | Latency |
|---|---|
| Container acquisition | Sub-200ms |
| Cold provision fallback | 4-5s |
| Pool hit rate | >95% |
2. Orchestration: The MCP Standard for Autonomous Tool Integration
Agents need external systems (GitHub, databases, Slack). We use Model Context Protocol for standardized integration:
```json
{
  "servers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {"GITHUB_PERSONAL_ACCESS_TOKEN": "..."}
    }
  }
}
```
How it works: Agent asks "list repositories" → MCP translates to GitHub API → Agent gets data.
Integration scope:
- 270+ community MCP servers available
- Protocol: Standardized JSON-RPC over stdio
- Configuration: Declarative, not programmatic
Why this scales: Adding tools is configuration, not custom code. Same interface for all external systems.
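On the wire, each tool invocation is one JSON-RPC 2.0 object written to the server's stdin. A sketch of what a "list repositories" call might look like (the tool name and arguments here are illustrative, not the exact schema of the GitHub server):

```python
import json

def make_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 request as sent to an MCP server over stdio
    (one JSON object per message)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })

msg = make_tool_call(1, "search_repositories", {"query": "user:fleeks-ai"})
print(msg)
```

Because every server speaks this same envelope, the agent-side client code never changes when a new tool is added.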
3. The Structural Foundation: The Production Lifecycle
Traditional deployment takes 20+ minutes with manual steps. For autonomous agents, this breaks the core premise.
A. Polyglot Runtime Execution
```python
# Agent switches languages per task, same workspace
await ws.files.create("analyze.py", ml_code)
await ws.terminal.execute("python analyze.py &")

await ws.files.create("api.js", server_code)
await ws.terminal.execute("node api.js &")

preview = await ws.get_preview_url()
# One URL, multiple services
```
Tech: 11+ runtime templates (Python, Node.js, React, Go, Rust, Java, Vue, Svelte). Pre-configured dependency management. Single workspace, multi-process execution.
Why this matters: Agent selects optimal language per task. Python for ML, Node for APIs, React for UI—orchestrated autonomously without manual environment switching.
B. Instant Preview URLs
```python
await workspace.terminal.execute("python app.py &")
preview = await workspace.get_preview_url()
# https://workspace-abc.fleeks.run (~30ms)
```
Tech: Wildcard SSL, Envoy proxy, Cloudflare CDN. Agent validates against real production infrastructure.
Performance: Preview URL generation ~30ms (measured average).
C. Embeds for Distribution
```python
embed = await client.embeds.create(
    name="Demo",
    template=EmbedTemplate.REACT,
    files={"src/App.js": code},
    layout_preset="side-by-side",
)
```
What you get:
- Code editor + live preview
- Working runtime (not a screenshot)
- 100+ concurrent users per embed
- Shareable URL or iframe
Use cases: Portfolio sites with runnable demos. Documentation with editable examples. Twitter demos that actually work.
D. Persistent State Architecture
Serverless wipes disk on shutdown. Agents need memory that survives restarts.
Container (ephemeral) → /workspace (persistent)
```python
# Agent writes learned patterns
await workspace.files.create(
    "/workspace/memory.json",
    json.dumps(learned_patterns),
)

# Container restarts, state persists
# Agent reads accumulated knowledge
memory = json.loads(
    await workspace.files.read("/workspace/memory.json")
)
```
Tech: Distributed filesystem, <10ms writes, replicated across 3 zones.
Impact: Agents solve problems requiring 100+ iterations of accumulated learning.
Why persistent volumes instead of S3? Agents expect normal filesystem operations. S3 has no atomic operations, higher latency, non-POSIX semantics.
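The atomicity point matters for agent memory files: on a real filesystem the standard pattern is write-to-temp-then-rename, which guarantees readers never see a half-written file. A sketch of an atomic JSON update using plain Python (independent of the Fleeks SDK; `os.replace` is atomic on POSIX filesystems, and has no S3 equivalent):

```python
import json
import os
import tempfile

def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON to a temp file in the same directory, then rename it
    over the target. Readers see either the old file or the new one."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
        os.replace(tmp, path)  # atomic rename over the destination
    except BaseException:
        os.remove(tmp)         # clean up the temp file on failure
        raise

path = os.path.join(tempfile.mkdtemp(), "memory.json")
atomic_write_json(path, {"patterns": [], "iteration": 1})
print(json.load(open(path))["iteration"])  # → 1
```

An agent that crashes mid-write resumes from the last complete snapshot instead of a corrupted one.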
4. Resource Management: CRIU-Based Hibernation
Some agents run for hours. Keeping containers up 24/7 is expensive. Stopping them loses process state.
Our solution: CRIU hibernation.
```python
await workspace.terminal.start_background_job("python monitor.py")
await workspace.containers.hibernate()  # ~2s, then $0
await workspace.containers.wake()       # ~2s, exact state
```
What CRIU preserves:
- Process memory (exact state)
- Open file descriptors
- Network connections
- Process IDs
Performance:
| Operation | Value |
|---|---|
| Checkpoint creation | ~2 seconds |
| Restore time | ~2 seconds |
| Success rate | >99% for CPU workloads |
Constraint: GPU state not supported (CRIU limitation). CPU workloads fully supported.
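The cost math is straightforward. Assuming a hypothetical $0.05/hour container rate and an agent that is actively working 4 hours a day:

```python
RATE_PER_HOUR = 0.05               # hypothetical container cost, $/hour
active_hours, idle_hours = 4, 20   # agent busy 4h/day, idle the rest

always_on = (active_hours + idle_hours) * RATE_PER_HOUR
hibernated = active_hours * RATE_PER_HOUR  # ~$0 while checkpointed

print(f"always-on: ${always_on:.2f}/day, hibernated: ${hibernated:.2f}/day")
# → always-on: $1.20/day, hibernated: $0.20/day
```

For long-lived monitoring agents that are idle most of the day, hibernation cuts the compute bill roughly in proportion to idle time.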
5. Real-World Application: Solving Engineering Friction
Self-Healing Infrastructure
Agent that monitors Kubernetes and auto-fixes issues:
```python
from fleeks_sdk import create_client

async def autonomous_remediation():
    async with create_client() as client:
        agent = await client.workspaces.create("monitor", "python")
        await agent.files.create("monitor.py", """
import json
memory = json.load(open('/workspace/fixes.json'))
for pod in failing_pods:
    issue = analyze(pod)
    if issue in memory:
        apply_fix(memory[issue])        # known issue: ~10 seconds
    else:
        fix = investigate_and_fix(pod)  # new issue: 3-5 minutes
        memory[issue] = fix
json.dump(memory, open('/workspace/fixes.json', 'w'))
""")
        await agent.terminal.start_background_job("python monitor.py")
```
Outcome:
| Occurrence | Resolution Time |
|---|---|
| First occurrence | 3-5 minutes |
| Second occurrence | 30 seconds |
| After 50 occurrences | 10 seconds |
Agent learns and gets faster over time. Persistent state enables learning. Fast provisioning enables validation environments. Production URLs enable fix testing before deployment.
Complete System Architecture
```
┌─────────────────────────────────────────┐
│ Agent Layer (Customer Code)             │
│ • Reasoning and decision-making         │
│ • Code generation and validation        │
│ • MCP tool integration                  │
│ • State management in /workspace        │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│ Fleeks Container Engine                 │
│ • Pre-warmed pool (sub-200ms)           │
│ • gVisor isolation                      │
│ • CRIU hibernation                      │
│ • Multi-template support                │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│ Fleeks Production Layer                 │
│ • Dynamic HTTPS (*.fleeks.run)          │
│ • Instant preview URLs (~30ms)          │
│ • Embeddable workspaces                 │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│ Fleeks Storage Layer                    │
│ • Persistent /workspace                 │
│ • Distributed filesystem                │
│ • Multi-AZ replication                  │
└─────────────────────────────────────────┘
```
Each layer enables the one above: Fast provisioning → rapid iteration. Instant URLs → production validation. Embeds → distribution. Persistent state → learning.
Performance Benchmarks
| Operation | Latency | Impact |
|---|---|---|
| Container acquisition | Sub-200ms | Maintains reasoning flow |
| Preview URL | ~30ms | Instant validation |
| File write | <10ms | Fast state updates |
| Embed creation | ~1s | Immediate distribution |
| Hibernation | ~2s | Cost-efficient |
Infrastructure Comparison
| Feature | Lambda | K8s | Fleeks |
|---|---|---|---|
| Cold start | 1-8s | 10-30s | Sub-200ms |
| Persistent state | ❌ | Manual | ✅ |
| Preview URLs | ❌ | Manual | ✅ |
| Embeds | ❌ | ❌ | ✅ |
| Hibernation | ❌ | ❌ | ✅ |
Use Fleeks when: AI agents, rapid iteration (50+ cycles), need persistent memory, autonomous deployment.
Use Lambda when: Stateless APIs, infrequent traffic.
Use K8s when: Long-running services, have DevOps team.
Current Technical Constraints
- Storage I/O: ~100MB/s per workspace. Sufficient for code/logs/state. Data-intensive workloads may hit limits.
- GPU hibernation: Not supported (CRIU limitation). CPU workloads work fine.
- Cross-region state: Can't checkpoint in US-East and restore in EU-West yet.
- Embed sessions: ~100 concurrent per embed. Higher traffic needs different pooling.
Working on all of these.
Resources
Get Started
Install:
```shell
pip install fleeks-sdk
```
Quick example:
```python
from fleeks_sdk import create_client

async with create_client(api_key="your_key") as client:
    ws = await client.workspaces.create("demo", "python")
    await ws.files.create("app.py", "print('Hello')")
    await ws.terminal.execute("python app.py")
    preview = await ws.get_preview_url()
    print(f"Live: {preview.preview_url}")
```
Self-improving agent:
```python
import json
from fleeks_sdk import create_client

async def learning_agent():
    async with create_client() as client:
        ws = await client.workspaces.create("learning")
        # Load accumulated memory, or start fresh on the first run
        try:
            memory = json.loads(await ws.files.read("/workspace/memory.json"))
        except FileNotFoundError:
            memory = {"patterns": [], "iteration": 0}
        for i in range(50):
            memory["iteration"] += 1
            result = await ws.terminal.execute("python task.py")
            if result.exit_code == 0:
                memory["patterns"].append(extract(result.stdout))
            await ws.files.create("/workspace/memory.json", json.dumps(memory))
        return await ws.get_preview_url()
```
Benchmark It Yourself
```python
import time
from fleeks_sdk import create_client

async def benchmark():
    timings = []
    async with create_client() as client:
        for i in range(10):
            start = time.time()
            ws = await client.workspaces.create(f"bench-{i}")
            elapsed = (time.time() - start) * 1000
            timings.append(elapsed)
            await ws.delete()
    print(f"Avg: {sum(timings)/len(timings):.0f}ms")
```
Links
- Sign up: fleeks.ai/signup (Free: 100 hours/month)
- SDK: github.com/fleeks-ai/fleeks-sdk-python
- Docs: docs.fleeks.ai
- Discord: discord.gg/fleeks
Key Takeaways
Infrastructure shapes agent behavior. Fast provisioning (sub-200ms) enables deep exploration. Slow provisioning (3-8s per cold start) forces simple solutions.
State persistence enables learning. Agents accumulate knowledge over 100+ iterations instead of resetting to zero.
Production lifecycle is the substrate. Agents that can't deploy autonomously are experimental scripts, not operational systems.
MCP standardizes tools. 270+ integrations via configuration, not custom code.