The Agentic Substrate: Why the Production Lifecycle Matters for Autonomous Systems

By Victor M, Co-Founder at Fleeks

Most AI agents stay in development because production deployment is too slow. At Fleeks, we built infrastructure where agents deploy autonomously in 31 seconds—from code generation to production URL to shareable embed. Zero human intervention.


Table of Contents

  1. Core Infrastructure: Sub-200ms Stateful Execution
  2. Orchestration: The MCP Standard
  3. The Structural Foundation: Production Lifecycle
  4. Resource Management: CRIU-Based Hibernation
  5. Real-World Applications
  6. Complete System Architecture
  7. Resources

1. Core Infrastructure: Sub-200ms Stateful Execution

The Problem: Standard serverless cold starts: 3-8 seconds. For an agent doing 50 iterations, that's 150-400 seconds of waiting. Agents give up early because iteration is expensive.

Our Solution: Pre-warmed container pool.

We maintain 1,000+ initialized containers. Agent needs one? Grab from pool in sub-200ms.

for iteration in range(50):
    ws = await client.workspaces.create(f"test-{iteration}")
    await ws.terminal.execute("python test.py")
    result = await ws.files.read("output.json")
    await ws.delete()  # release the workspace after each iteration

Technical implementation:

| Metric | Value |
| --- | --- |
| Pool size | 1,000+ containers per region |
| Isolation | gVisor for multi-tenant security |
| Hit rate | >95% under production load |
| Latency | Sub-200ms (P95) |

Tradeoff: Higher baseline cost vs predictable speed. Worth it for agent workloads where iteration speed determines solution quality.

Why custom orchestration instead of Kubernetes? K8s pod startup takes 10-30s, far too slow for agent iteration that needs sub-200ms acquisition. We built a custom scheduler for container pool management; K8s still runs our stateless services.
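The acquisition path can be sketched as a warm queue with a cold-provision fallback. This is an illustrative model, not the actual Fleeks scheduler:

```python
import asyncio

class ContainerPool:
    """Pre-warmed pool: grab a ready container, fall back to cold provisioning."""

    def __init__(self, warm_size: int):
        self._warm = asyncio.Queue()
        for i in range(warm_size):
            self._warm.put_nowait({"id": f"warm-{i}", "cold": False})

    async def acquire(self) -> dict:
        try:
            # Pool hit: effectively instant here; sub-200ms in production.
            return self._warm.get_nowait()
        except asyncio.QueueEmpty:
            # Pool miss: stand-in for the 4-5s cold provision path.
            await asyncio.sleep(0)
            return {"id": "cold", "cold": True}

async def demo():
    pool = ContainerPool(warm_size=2)
    a = await pool.acquire()
    b = await pool.acquire()
    c = await pool.acquire()  # pool exhausted, falls back to cold provisioning
    return a["cold"], b["cold"], c["cold"]
```

A background task refilling the queue (provisioning a fresh container per miss) is what keeps the hit rate above 95% under steady load.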

Performance benchmark:

| Metric | Value |
| --- | --- |
| Container acquisition | Sub-200ms |
| Cold provision fallback | 4-5s |
| Pool hit rate | >95% |

2. Orchestration: The MCP Standard for Autonomous Tool Integration

Agents need access to external systems (GitHub, databases, Slack). We use the Model Context Protocol (MCP) for standardized integration:

{
  "servers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": {"GITHUB_PERSONAL_ACCESS_TOKEN": "..."}
    }
  }
}

How it works: Agent asks "list repositories" → MCP translates to GitHub API → Agent gets data.
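Under the hood, each tool call is a JSON-RPC 2.0 request written to the server's stdin as newline-delimited JSON. A sketch of building such a request (the tool name and arguments here are hypothetical; a given server defines its own tool schema):

```python
import json
from itertools import count

_ids = count(1)

def mcp_tool_call(tool: str, arguments: dict) -> str:
    """Build a JSON-RPC 2.0 tools/call request line, ready for the stdio transport."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_ids),           # each request gets a unique id for matching replies
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # The MCP stdio transport frames messages as newline-delimited JSON.
    return json.dumps(request) + "\n"

line = mcp_tool_call("search_repositories", {"query": "fleeks"})
```

The server's reply arrives on stdout as a JSON-RPC response carrying the tool result, keyed by the same `id`.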

Integration scope:

  • 270+ community MCP servers available
  • Protocol: Standardized JSON-RPC over stdio
  • Configuration: Declarative, not programmatic

Why this scales: Adding tools is configuration, not custom code. Same interface for all external systems.


3. The Structural Foundation: The Production Lifecycle

Traditional deployment takes 20+ minutes with manual steps. For autonomous agents, that breaks the core premise: an agent that needs a human to deploy is not autonomous.

A. Polyglot Runtime Execution

# Agent switches languages per task, same workspace
await ws.files.create("analyze.py", ml_code)
await ws.terminal.execute("python analyze.py &")

await ws.files.create("api.js", server_code)
await ws.terminal.execute("node api.js &")

preview = await ws.get_preview_url()
# One URL, multiple services

Tech: 11+ runtime templates (Python, Node.js, React, Go, Rust, Java, Vue, Svelte). Pre-configured dependency management. Single workspace, multi-process execution.

Why this matters: Agent selects optimal language per task. Python for ML, Node for APIs, React for UI—orchestrated autonomously without manual environment switching.

B. Instant Preview URLs

await workspace.terminal.execute("python app.py &")
preview = await workspace.get_preview_url()
# https://workspace-abc.fleeks.run (~30ms)

Tech: Wildcard SSL, Envoy proxy, Cloudflare CDN. Agent validates against real production infrastructure.

Performance: Preview URL generation ~30ms (measured average).

C. Embeds for Distribution

embed = await client.embeds.create(
    name="Demo",
    template=EmbedTemplate.REACT,
    files={"src/App.js": code},
    layout_preset="side-by-side"
)

What you get:

  • Code editor + live preview
  • Working runtime (not a screenshot)
  • 100+ concurrent users per embed
  • Shareable URL or iframe

Use cases: Portfolio sites with runnable demos. Documentation with editable examples. Twitter demos that actually work.

D. Persistent State Architecture

Serverless wipes disk on shutdown. Agents need memory that survives restarts.

Container (ephemeral) → /workspace (persistent)
# Agent writes learned patterns
await workspace.files.create(
    "/workspace/memory.json",
    json.dumps(learned_patterns)
)

# Container restarts, state persists

# Agent reads accumulated knowledge
memory = json.loads(
    await workspace.files.read("/workspace/memory.json")
)

Tech: Distributed filesystem, <10ms writes, replicated across 3 zones.

Impact: Agents solve problems requiring 100+ iterations of accumulated learning.

Why persistent volumes instead of S3? Agents expect normal filesystem operations. S3 has no atomic operations, higher latency, non-POSIX semantics.
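As a concrete illustration of what "atomic operations" buys an agent: with POSIX `rename`, a memory file can be replaced in one step, so a crash mid-write never leaves half-written JSON behind. A sketch of the pattern (the helper name and path are illustrative):

```python
import json
import os
import tempfile

def atomic_write_json(path: str, data: dict) -> None:
    """Write JSON to a temp file, then rename it over the target in one atomic step."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(data, f)
            f.flush()
            os.fsync(f.fileno())  # ensure bytes hit disk before the rename
        # Atomic on POSIX filesystems: readers see the old file or the new one,
        # never a partial write. S3 offers no equivalent single-step replace.
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)
        raise
```

This is the kind of filesystem semantics agents silently depend on when they checkpoint `/workspace/memory.json` between iterations.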


4. Resource Management: CRIU-Based Hibernation

Some agents run for hours. Keeping containers up 24/7 is expensive. Stopping them loses process state.

Our solution: CRIU hibernation.

await workspace.terminal.start_background_job("python monitor.py")

await workspace.containers.hibernate()  # ~2s, then $0
await workspace.containers.wake()       # ~2s, exact state

What CRIU preserves:

  • Process memory (exact state)
  • Open file descriptors
  • Network connections
  • Process IDs

Performance:

| Operation | Value |
| --- | --- |
| Checkpoint creation | ~2 seconds |
| Restore time | ~2 seconds |
| Success rate | >99% for CPU workloads |

Constraint: GPU state not supported (CRIU limitation). CPU workloads fully supported.
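Hibernation only saves money if something decides when to trigger it. A minimal idle-watchdog sketch; `is_busy` and `hibernate` are placeholders for real checks and SDK calls such as `workspace.containers.hibernate()`:

```python
import asyncio
import time

async def idle_watchdog(is_busy, hibernate, idle_timeout=300.0, poll=5.0):
    """Poll workspace activity; hibernate once it has been idle past the timeout."""
    idle_since = None
    while True:
        if is_busy():
            idle_since = None  # activity resets the idle clock
        else:
            if idle_since is None:
                idle_since = time.monotonic()
            if time.monotonic() - idle_since >= idle_timeout:
                await hibernate()  # e.g. await workspace.containers.hibernate()
                return
        await asyncio.sleep(poll)
```

Run it as a background task alongside the agent; on the next request, wake the workspace and restart the watchdog.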


5. Real-World Application: Solving Engineering Friction

Self-Healing Infrastructure

Agent that monitors Kubernetes and auto-fixes issues:

async def autonomous_remediation():
    async with create_client() as client:
        agent = await client.workspaces.create("monitor", "python")

        await agent.files.create("monitor.py", """
import json

# Helpers (failing_pods, analyze, apply_fix, investigate_and_fix) elided for brevity.
try:
    memory = json.load(open('/workspace/fixes.json'))
except FileNotFoundError:
    memory = {}

for pod in failing_pods:
    issue = analyze(pod)

    if issue in memory:
        apply_fix(memory[issue])  # 10 seconds
    else:
        fix = investigate_and_fix(pod)  # 3-5 minutes
        memory[issue] = fix
        json.dump(memory, open('/workspace/fixes.json', 'w'))
""")

        await agent.terminal.start_background_job("python monitor.py")

Outcome:

| Occurrence | Resolution Time |
| --- | --- |
| First occurrence | 3-5 minutes |
| Second occurrence | 30 seconds |
| After 50 occurrences | 10 seconds |

Agent learns and gets faster over time. Persistent state enables learning. Fast provisioning enables validation environments. Production URLs enable fix testing before deployment.


6. Complete System Architecture

┌─────────────────────────────────────────┐
│ Agent Layer (Customer Code)             │
│ • Reasoning and decision-making         │
│ • Code generation and validation        │
│ • MCP tool integration                  │
│ • State management in /workspace        │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│ Fleeks Container Engine                 │
│ • Pre-warmed pool (sub-200ms)           │
│ • gVisor isolation                      │
│ • CRIU hibernation                      │
│ • Multi-template support                │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│ Fleeks Production Layer                 │
│ • Dynamic HTTPS (*.fleeks.run)          │
│ • Instant preview URLs (~30ms)          │
│ • Embeddable workspaces                 │
└──────────────┬──────────────────────────┘
               │
┌──────────────▼──────────────────────────┐
│ Fleeks Storage Layer                    │
│ • Persistent /workspace                 │
│ • Distributed filesystem                │
│ • Multi-AZ replication                  │
└─────────────────────────────────────────┘

Each layer enables the one above: Fast provisioning → rapid iteration. Instant URLs → production validation. Embeds → distribution. Persistent state → learning.

Performance Benchmarks

| Operation | Latency | Impact |
| --- | --- | --- |
| Container acquisition | Sub-200ms | Maintains reasoning flow |
| Preview URL | ~30ms | Instant validation |
| File write | <10ms | Fast state updates |
| Embed creation | ~1s | Immediate distribution |
| Hibernation | ~2s | Cost-efficient |

Infrastructure Comparison

| Feature | Lambda | K8s | Fleeks |
| --- | --- | --- | --- |
| Cold start | 1-8s | 10-30s | Sub-200ms |
| Persistent state | Manual | Manual | Built-in |
| Preview URLs | Manual | Manual | Built-in |
| Embeds | Not available | Not available | Built-in |
| Hibernation | Not available | Not available | Built-in |

Use Fleeks when: AI agents, rapid iteration (50+ cycles), need persistent memory, autonomous deployment.

Use Lambda when: Stateless APIs, infrequent traffic.

Use K8s when: Long-running services, have DevOps team.

Current Technical Constraints

  • Storage I/O: ~100MB/s per workspace. Sufficient for code/logs/state. Data-intensive workloads may hit limits.
  • GPU hibernation: Not supported (CRIU limitation). CPU workloads work fine.
  • Cross-region state: Can't checkpoint in US-East and restore in EU-West yet.
  • Embed sessions: ~100 concurrent per embed. Higher traffic needs different pooling.

Working on all of these.


7. Resources

Get Started

Install:

pip install fleeks-sdk

Quick example:

import asyncio

from fleeks_sdk import create_client

async def main():
    async with create_client(api_key="your_key") as client:
        ws = await client.workspaces.create("demo", "python")
        await ws.files.create("app.py", "print('Hello')")
        await ws.terminal.execute("python app.py")

        preview = await ws.get_preview_url()
        print(f"Live: {preview.preview_url}")

asyncio.run(main())

Self-improving agent:

import json

from fleeks_sdk import create_client

async def learning_agent():
    async with create_client() as client:
        ws = await client.workspaces.create("learning")

        # Load accumulated memory, or start fresh on the first run
        try:
            memory = json.loads(await ws.files.read("/workspace/memory.json"))
        except FileNotFoundError:
            memory = {"patterns": [], "iteration": 0}

        for i in range(50):
            memory["iteration"] += 1
            result = await ws.terminal.execute("python task.py")

            if result.exit_code == 0:
                # extract() is user-defined: parse learned patterns from stdout
                memory["patterns"].append(extract(result.stdout))

            await ws.files.create("/workspace/memory.json", json.dumps(memory))

        return await ws.get_preview_url()

Benchmark It Yourself

import asyncio
import time

from fleeks_sdk import create_client

async def benchmark():
    timings = []
    async with create_client() as client:
        for i in range(10):
            start = time.time()
            ws = await client.workspaces.create(f"bench-{i}")
            elapsed = (time.time() - start) * 1000
            timings.append(elapsed)
            await ws.delete()

    timings.sort()
    print(f"Avg: {sum(timings)/len(timings):.0f}ms")
    print(f"P95: {timings[int(len(timings) * 0.95)]:.0f}ms")

asyncio.run(benchmark())



Key Takeaways

Infrastructure shapes agent behavior. Fast provisioning (200ms) enables deep exploration. Slow provisioning (5s) forces simple solutions.

State persistence enables learning. Agents accumulate knowledge over 100+ iterations instead of resetting to zero.

Production lifecycle is the substrate. Agents that can't deploy autonomously are experimental scripts, not operational systems.

MCP standardizes tools. 270+ integrations via configuration, not custom code.
