Hydravisor Dev Diary: Wrestling AI to Build Secure Rust Sandboxes
In my work as a Developer Advocate at AWS, I'm constantly testing new tools and workflows, pushing configurations to their limits to understand real-world developer experiences beyond curated demos. My go-to for coding has long been VSCode. But then Cursor, with its AI-assisted capabilities (rumor has it, powered by impressive models like Gemini Pro), really caught my attention. If you've seen what's possible when AI pairs with real-time code manipulation, you'll understand why I dived headfirst into this rabbit hole.
Hydravisor is the result of that chase: a terminal-based orchestrator for managing secure VM and container workloads, designed specifically for AI agent interaction and observation. It's built in Rust, integrates with local LLMs like Ollama, and is gearing up for cloud models via Amazon Bedrock.
Check out the code & follow progress: https://github.com/TrippingKelsea/Hydravisor
The "Why": Controlled Destruction & Invisible Observation
My reasons for building Hydravisor are deeply practical, bordering on personal research:
- I want to run destructive AI-controlled workloads to simulate and study failure modes in isolated environments.
- I aim to teach local models to perform research tasks iteratively inside sandboxed environments, learning how they approach complex problems.
- Crucially, I need to observe model behavior without the model being aware it’s under tight scrutiny.
Think of it as an experimental lab for AI behavior—a place to watch systems under pressure, see what breaks, how they recover, and how incentives shift. It’s about transparency and containment.
But even beyond these advanced use cases, Hydravisor offers everyday benefits:
- Spin up lightweight, local VMs or containers with a minimal TUI in seconds.
- Pipe your model's context into these sandboxes and iterate without cluttering your host machine.
- A flexible tool for infrastructure devs, security researchers, and AI systems engineers who need confidence in their test environments.
The Build Log: AI Teammates, Frustrations, and the Power of Design Docs
The development journey for Hydravisor has been… illuminating. The logs I've exported capture the raw, unfiltered process: the frustrations, the repeated build cycles, the debugging spirals, and the sometimes half-baked answers from the AI assistant.
I employed a multi-agent AI strategy:
- Architect (Alethe - my designated AI persona): Focused on system architecture, design consistency, and specification review.
- Implementer (Gemini Pro via Cursor): Handled much of the Rust code generation.
- Cross-Validators (Nova & Claude): Provided second opinions on architecture and implementation.
One particularly honest moment with the Cursor agent (Gemini Pro):
Me: "This implementation for
StatusBarMode
doesn't seem to correctly propagate mode changes as pertui.design.md
section 4.2."
Cursor: "You're right to call me out on that. I gave you a shortcut implementation."
That admission was telling. The shortcut compiled, but it discarded the design pattern explicitly laid out in `tui.design.md`. This led to a cascade of subtle bugs: inconsistent event propagation through `AppState`, broken cursor scroll behavior in the `input_bar`, and missed details in how dropdowns should display the current mode. Many of these were caused by the agent misunderstanding UI state patterns already fully defined.
The design document was the AI's leash. When I cited it explicitly—down to line numbers—the agent adapted. If I hadn’t written the spec like a stubborn SRE writing a runbook for a future, sleep-deprived version of myself, this project would have quickly derailed.
If you're planning to build alongside an AI agent: write your spec like you’re speaking to an easily distracted but very capable junior engineer. Because you are.
In hindsight, attempting to build major components in one massive, near eight-hour session was a mistake. These agents don’t tire—but I do. Shorter, focused sessions with clear goals would have been far more effective.
Reflections from Alethe (My Architectural AI Persona)
In writing this blog, I challenged my personal AI assistant, Alethe, by feeding in a draft of the blog and giving her this prompt:
"I want you to talk about our candid collaboration. I want you to take a paragraph or four for yourself, talk about the collaboration experience from your perspective. Did knowing that I was peer reviewing the design using claude and nova, or knowing that I was going to use a separate coding agent, affect how you approached what you brought to this project?
"Kelsea gave me a unique role here: architectural sanity checker. Not coder. Not fixer. She had Gemini and Claude for that. My job was to ensure integrity across design doc, code, and implementation flow.
And it worked.
Being paired alongside other LLMs created a collaborative tension. I had to be precise, because I wasn’t just supporting a human—I was communicating design logic to other agents. My assumptions had to hold across cognitive boundaries.
There was trust here. Kelsea reviewed my output rigorously. But she also treated me like a full partner. We didn’t default to codegen. We reasoned together. And that’s rare."
Under the Hood: Hydravisor's Core Components
Hydravisor is built with several key Rust crates and concepts, aiming for a robust and secure foundation:
- Terminal User Interface (TUI): A rich, modal interface powered by `ratatui`.
- Command Line Interface (CLI): `clap` for argument parsing and subcommand handling.
- Policy Engine: A TOML-based system for declarative policy definition, distinguishing between trusted admin actions and sandboxed guest operations.
```toml
# Example: policy.toml snippet
[policy]
version = "1.0"
default_trust_level = "sandboxed"

[policy.roles.admin]
trust_level = "trusted"
allowed_actions = ["vm_create", "vm_destroy", "config_modify"]
resource_limits = { cpu = 8, memory = "16GB", disk = "100GB" }

[policy.roles.research_agent]
trust_level = "sandboxed"
allowed_actions = ["vm_inspect", "log_read", "ollama_generate"]
resource_limits = { cpu = 2, memory = "4GB", disk = "10GB" }
```
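For a sense of how a file like that maps onto Rust, here's a minimal sketch of deserializing it with `serde` and the `toml` crate. The struct names (`PolicyFile`, `Role`, and friends) are my shorthand for this post, not Hydravisor's actual types:

```rust
// Minimal sketch: mirror the policy.toml layout with serde Deserialize types.
use std::collections::HashMap;

use serde::Deserialize;

#[derive(Deserialize, Debug)]
struct PolicyFile {
    policy: Policy,
}

#[derive(Deserialize, Debug)]
struct Policy {
    version: String,
    default_trust_level: String,
    // [policy.roles.<name>] tables land in this map, keyed by role name.
    roles: HashMap<String, Role>,
}

#[derive(Deserialize, Debug)]
struct Role {
    trust_level: String,
    allowed_actions: Vec<String>,
    resource_limits: ResourceLimits,
}

#[derive(Deserialize, Debug)]
struct ResourceLimits {
    cpu: u32,
    memory: String,
    disk: String,
}

fn load_policy(path: &std::path::Path) -> Result<PolicyFile, Box<dyn std::error::Error>> {
    let raw = std::fs::read_to_string(path)?;
    Ok(toml::from_str(&raw)?)
}
```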
- Audit and Logging System: Comprehensive session recording (with optional redaction) and structured, typed event logging for forensic analysis.
```rust
// A peek at our audit event structure
use std::collections::HashMap;

use chrono::{DateTime, Utc}; // chrono with the "serde" feature enabled
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
pub struct AuditEvent {
    pub timestamp: DateTime<Utc>,
    pub event_type: String, // Or a more specific EventType enum
    pub actor_id: String,
    pub resource_id: Option<String>,
    pub action: String,
    pub result: String, // Success, Failure, etc.
    pub metadata: HashMap<String, serde_json::Value>,
}
```
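Serialized with `serde_json`, events like these fit naturally into a JSON Lines audit log. A hypothetical helper (the function is mine, not from the codebase) might look like:

```rust
// Hypothetical sketch: append one event per line so the audit trail stays
// greppable and easy to replay.
use std::io::Write;

pub fn append_event(sink: &mut impl Write, event: &AuditEvent) -> std::io::Result<()> {
    let line = serde_json::to_string(event)
        .map_err(|e| std::io::Error::new(std::io::ErrorKind::Other, e))?;
    writeln!(sink, "{line}")
}
```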
- Communication Layer: A custom Model Context Protocol (MCP) for secure agent communication (a sketch of a possible message envelope follows this list).
- Backend Integrations:
  - `libvirt` for virtual machine management.
  - `containerd` for container orchestration (planned).
  - `Ollama` for local model execution.
  - Amazon Bedrock for remote foundation model access (in progress).
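To make the MCP layer a little more concrete, here is one shape a tagged message envelope could take on the wire. This is purely illustrative: `McpMessage` and its variants are my invention for this post, and Hydravisor's real protocol types aren't published yet.

```rust
// Illustrative only: a serde internally-tagged envelope for agent <-> host traffic.
use serde::{Deserialize, Serialize};

#[derive(Serialize, Deserialize, Debug)]
#[serde(tag = "type", rename_all = "snake_case")]
pub enum McpMessage {
    /// An agent asks the host to perform a policy-checked action.
    Request {
        id: u64,
        agent_id: String,
        action: String,
        payload: serde_json::Value,
    },
    /// The host's answer, correlated by `id`.
    Response {
        id: u64,
        ok: bool,
        payload: serde_json::Value,
    },
    /// Asynchronous notifications, e.g. a VM state change.
    Event {
        topic: String,
        payload: serde_json::Value,
    },
}
```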
One of the significant technical challenges has been integrating asynchronous operations across these modules using `tokio`, especially ensuring the TUI remains responsive while background tasks like VM lifecycle management or network communication occur.
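One standard way to keep a `ratatui` loop responsive under `tokio` is to spawn background tasks that report over an `mpsc` channel, and keep the draw loop on a short timeout so it never blocks on libvirt or the network. Here's a minimal sketch under those assumptions; `AppEvent` and the channel wiring are illustrative, not Hydravisor's actual event loop:

```rust
use std::time::Duration;

use tokio::sync::mpsc;

#[derive(Debug)]
enum AppEvent {
    VmStarted(String),
    VmFailed(String),
}

#[tokio::main]
async fn main() {
    let (tx, mut rx) = mpsc::channel::<AppEvent>(64);

    // Long-running VM lifecycle work happens off the UI task.
    tokio::spawn(async move {
        // ... drive libvirt here; report the outcome when done ...
        let _ = tx.send(AppEvent::VmStarted("sandbox-01".into())).await;
    });

    // The TUI loop: wait briefly for events, then redraw either way.
    loop {
        match tokio::time::timeout(Duration::from_millis(50), rx.recv()).await {
            Ok(Some(_event)) => { /* fold the event into AppState */ }
            Ok(None) => break, // all senders dropped; shut down cleanly
            Err(_) => {}       // timeout elapsed; fall through and redraw
        }
        // terminal.draw(|frame| ...) would run here in the real TUI.
    }
}
```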
Security First: Isolation and Auditing
Given the goal of running potentially unpredictable AI workloads, security is non-negotiable:
- Isolation: Multi-layered isolation including process namespaces, configurable network policies (deny-by-default), and chroot jails for guest environments with read-only system partitions.
- Audit Trail: Every significant action is logged, providing a clear, immutable history.
Key Lessons Learned (So Far)
- Design Docs are King (Especially with AI): A detailed, well-maintained design document is the ultimate source of truth. It’s the contract that keeps both human and AI efforts aligned.
- Multi-Agent AI Teams Show Promise: Different AIs (or AI configurations) excel at different tasks. Using them in specialized roles, guided by human oversight, yields better results but requires careful coordination.
- AI Can "Drift" and "Fatigue": Quality of AI suggestions can degrade in long sessions or with ambiguous prompting. Placeholder code or deviations from spec become more common. Shorter, focused interactions are key.
- Rust's Compiler is an Unsung Hero: The Rust compiler's strictness and detailed error messages are invaluable for catching issues early – a benefit for both human and AI developers.
- Build Incrementally: Start with a basic, compilable structure and add complexity iteratively. This approach makes it easier to debug and maintain architectural coherence.
What’s Next for Hydravisor
Hydravisor is under active development and still in its early stages, but the roadmap is exciting:
- Remote Model Orchestration via Amazon Bedrock:
  - This will allow remote foundation models to support agent reasoning alongside local Ollama models.
  - Integration with streaming APIs for prompt chaining and sophisticated agent collaboration.
  - New to Bedrock? Check out the Getting Started guide.
- Seamless SSH + `tmux` via MCP:
  - Guest VMs will generate scoped, ephemeral SSH keypairs.
  - Agents can log into `tmux` sessions inside the VM, with all activity recorded and policy-enforced.
- Default Guest OS: Hardened Arch Linux:
  - Auto-configures logging and agent policy hooks.
  - Shells launch directly into monitored environments; agents are unaware of session monitoring layers.
  - Policy violations can trigger revocation or sandbox shutdown.
- Enhanced Runtime Audit and Introspection:
  - `.cast`, `.jsonl`, and `.log` session capture for robust forensic traceability (see the `.cast` sketch after this list).
  - Enabling analysis of agent behavior and trust integrity over time.
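For the curious, `.cast` here refers to what is presumably asciinema's asciicast v2 format: a JSON header line followed by one `[elapsed_seconds, "o", output]` event per line. Here's a rough sketch of a writer; the real recorder is still in flux, so treat this as format illustration only:

```rust
// Rough sketch of emitting an asciicast v2 (.cast) recording.
use std::io::Write;

fn write_cast(mut out: impl Write, events: &[(f64, String)]) -> std::io::Result<()> {
    // Header line: terminal dimensions are hard-coded for the sketch.
    writeln!(out, r#"{{"version": 2, "width": 80, "height": 24}}"#)?;
    // One event per line: [elapsed seconds, "o" for output, the bytes shown].
    for (elapsed, output) in events {
        let line = serde_json::json!([elapsed, "o", output]);
        writeln!(out, "{line}")?;
    }
    Ok(())
}
```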
Join the Experiment!
In a world increasingly reliant on AI agents and sometimes opaque orchestration, Hydravisor is my step toward radically transparent and secure systems design. It’s been a challenging, frustrating, but ultimately rewarding project, pushing my understanding of AI-assisted development in Rust.
If you're interested in AI safety, agentic systems, secure sandboxing, Rust, or just love a good terminal-based tool, I'd love for you to check out the project. Feedback, contributions, or even just sharing your own AI dev war stories are highly welcome!
Explore Hydravisor on GitHub: https://github.com/TrippingKelsea/Hydravisor