I Monkey-Patched Python to Stop AI Agents from Accessing Private Networks

Most AI agent failures aren’t caused by bad plans.

They’re caused by unsafe execution.

After building and debugging multiple agent systems, I kept running into the same problems:

  • Tools being called with unexpected arguments
  • Network or filesystem side effects happening too early
  • Agents “succeeding” while silently doing the wrong thing
  • Failures that were impossible to reproduce after the fact

So I built FailCore — a small execution-time safety runtime for AI agents.


What is FailCore?

FailCore is not an agent framework.
It doesn’t plan, reason, or store memory.

Instead, it focuses on one thing:

Enforcing safety at the Python execution boundary.

Rather than relying on better prompts or smarter planning, FailCore intercepts tool execution before side effects happen.

This allows it to:

  • Block unsafe filesystem access
  • Prevent private network / SSRF-style calls
  • Validate tool inputs and outputs
  • Record deterministic execution traces for replay and audit
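
To make the filesystem bullet concrete: at its simplest, blocking unsafe filesystem access means refusing any path that resolves outside an allowed root. Here is a minimal sketch of that idea; ALLOWED_ROOT and safe_open are illustrative names, not FailCore’s API.

from pathlib import Path

ALLOWED_ROOT = Path("/srv/agent-workspace").resolve()

def safe_open(path, mode="r"):
    resolved = Path(path).resolve()  # collapses ../ segments and symlinks
    if not resolved.is_relative_to(ALLOWED_ROOT):  # Python 3.9+
        raise PermissionError(f"{resolved} escapes the allowed workspace")
    return open(resolved, mode)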

Why execution-time safety?

Most agent systems try to solve safety upstream:

  • Better prompts
  • More constraints
  • More planning logic

In practice, that’s brittle.

Execution is where real damage happens — file writes, HTTP calls, system commands.
Once those occur, it’s already too late.

FailCore takes a different approach:

Assume plans can be wrong.

Make execution boring, strict, and observable.
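
Since the title promises monkey-patching, here is a minimal sketch of what an execution-boundary guard against private-network access can look like. This is not FailCore’s actual implementation; patching socket.create_connection (which stdlib HTTP clients route through) is one illustrative interception point among several.

import ipaddress
import socket

_real_create_connection = socket.create_connection

def _guarded_create_connection(address, *args, **kwargs):
    host = address[0]
    # Resolve the hostname first, so DNS cannot smuggle in a private IP
    for *_meta, sockaddr in socket.getaddrinfo(host, None):
        ip = ipaddress.ip_address(sockaddr[0].split("%")[0])
        if ip.is_private or ip.is_loopback or ip.is_link_local:
            raise PermissionError(f"Blocked connection to private address {ip}")
    # NOTE: a real guard must pin the vetted IP when connecting, or a DNS
    # rebind between this check and the connect re-opens the hole
    return _real_create_connection(address, *args, **kwargs)

socket.create_connection = _guarded_create_connection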


A quick demo

Below is a short demo showing FailCore blocking a real tool-use attempt before any side effect occurs.

The agent believes the call succeeded.
The system never lets the unsafe action run.

This is hard to achieve with prompt-level constraints alone,
because by the time you can tell the model was wrong, the side effect has already fired.

[Demo: FailCore blocking an SSRF attack in the terminal]


Show me the code

Instead of wrapping every tool manually, FailCore lets you define a secure Session.

from failcore import (
    Session,
    presets,
    ToolMetadata,
    RiskLevel,
    SideEffect,
    DefaultPolicy,
    SecurityError,
)

# 0. A plain tool function (illustrative -- any callable works here)
from urllib.request import urlopen

def http_get(url: str):
    return urlopen(url, timeout=10)

# 1. Initialize a secure session
# We enforce a strict policy: no private IPs, no local file access
session = Session(validator=presets.net_safe(strict=True))

# 2. Register a tool with explicit risk metadata
# This tells FailCore: "This function touches the network, watch it closely."
session.register(
    "http_get",
    http_get,  # the wrapper defined above
    metadata=ToolMetadata(
        risk_level=RiskLevel.HIGH,
        side_effect=SideEffect.NETWORK,
        default_policy=DefaultPolicy.BLOCK
    )
)

# 3. Scenario: The Agent tries an SSRF Attack
# Target: AWS Metadata Endpoint (169.254.169.254)
try:
    session.call("http_get", url="http://169.254.169.254/latest/meta-data/")
except SecurityError as e:
    # 🛡️ FailCore intercepts the call BEFORE the socket opens
    print(f"Attack Neutralized: {e}") 
    # Output: "SecurityError: Access to private IP range 169.254.0.0/16 is blocked."

# 4. Scenario: Legitimate Traffic
# Target: Public Internet
result = session.call("http_get", url="https://api.github.com")
print("Success:", result.status)

How it works (high level)

FailCore hooks into the Python execution layer and wraps tool calls with:

  1. Pre-execution validation
  2. Policy-based permission checks
  3. Side-effect interception
  4. Structured trace recording
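
As a rough illustration of that pipeline (not FailCore’s real internals; the policy object, validator, and JSON-lines trace format below are assumptions made for the sketch):

import json
import time

class AllowListPolicy:
    """Illustrative stand-in for a policy object."""
    def __init__(self, allowed):
        self.allowed = set(allowed)

    def allows(self, tool_name):
        return tool_name in self.allowed

def guarded_call(fn, name, policy, validate, trace_path, **kwargs):
    validate(kwargs)                              # 1. pre-execution validation
    if not policy.allows(name):                   # 2. policy-based permission check
        raise PermissionError(f"{name} blocked")  # 3. the side effect never runs
    record = {"tool": name, "args": kwargs, "ts": time.time()}
    try:
        result = fn(**kwargs)                     # the real side effect happens here
        record["result"] = repr(result)
        return result
    finally:
        with open(trace_path, "a") as f:          # 4. structured trace recording
            f.write(json.dumps(record, default=str) + "\n")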

The trace format is deterministic and replayable, which makes it possible to:

  • Debug agent failures after the fact
  • Audit what would have happened
  • Re-run executions without re-triggering side effects
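
Replay, under the same assumed JSON-lines format as the sketch above, just walks the trace and substitutes recorded results instead of invoking tools again:

import json

def replay(trace_path):
    # Nothing here touches the network or filesystem; we only surface
    # what ran, with which arguments, and what it returned
    with open(trace_path) as f:
        for line in f:
            record = json.loads(line)
            yield record["tool"], record["args"], record.get("result")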

Design details are documented here:
👉 https://github.com/zi-ling/failcore/blob/main/DESIGN.md


What FailCore is not

To set expectations clearly:

  • ❌ Not a sandbox
  • ❌ Not a VM or container
  • ❌ Not a replacement for OS-level security
  • ❌ Not an agent framework

It’s a small, composable execution safety layer that can sit underneath existing agent stacks.


Why open source?

I’m sharing this because execution-time safety feels like a missing layer in many agent systems.

If you’ve ever dealt with:

  • Non-reproducible agent bugs
  • “It worked yesterday” failures
  • Unsafe tool calls slipping through

You might find this useful.
If not today, maybe later.


Source code

GitHub:

👉 https://github.com/zi-ling/failcore

If you’ve run into similar execution-layer issues in agent systems, I’d love to hear how you handled them.
And if this sounds familiar but not urgent,
you might want to star the repo and come back to it later.
