Andrew Wiggins

Posted on • Originally published at irexta.com

Stop building insecure "Private" AI assistants. Use this Hardened DevSecOps Stack.

The Problem: "Private" ≠ "Secure"

We’re all moving toward self-hosted AI platforms like Ollama and other local LLM runtimes to protect proprietary code and internal workflows. But here’s the uncomfortable reality:

Most local AI deployments are nothing more than security theater.

If your stack is running:

  • An unauthenticated Redis instance
  • Containers without syscall isolation
  • AI-generated code executing directly on the host kernel

…then your infrastructure is still exposed.

A single SSRF (Server-Side Request Forgery) vulnerability can provide attackers lateral access to internal services, secrets, and execution environments.


What Exactly Is "Hardening"?

In modern DevSecOps, Hardening is the process of minimizing a system’s attack surface by removing insecure defaults and enforcing strict isolation policies.

Instead of deploying a "default install," we harden every layer of the AI stack.

Hardening Principles

Security Layer     | Hardened Approach
-------------------|------------------------------------------------
Authentication     | Require credentials for every internal service
Isolation          | Sandbox untrusted workloads using gVisor
Failure Handling   | Ensure graceful degradation with Lua/OpenResty
Execution Control  | Prevent direct host-kernel interaction
Network Security   | Restrict unnecessary outbound communication
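
As a concrete sketch of that last row (assuming ufw on Ubuntu; the ports are illustrative, adapt them to your environment), a default-deny posture for an AI host might look like this:

# Default-deny network posture (illustrative; assumes ufw)
sudo ufw default deny incoming
sudo ufw default deny outgoing
sudo ufw allow 22/tcp           # SSH management access
sudo ufw allow out 53           # DNS, needed for model pulls
sudo ufw allow out 443/tcp      # HTTPS only, e.g. pulling models
sudo ufw enable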

The iRexta Hardened Architecture

On our iRexta Bare Metal infrastructure, we move away from marketing buzzwords and implement a true Zero-Trust AI blueprint.


1. Authenticated Redis (Stopping SSRF Attacks)

One of the biggest misconceptions in infrastructure security is:

"It’s localhost, so it’s safe."

It isn’t.

Internal services exposed without authentication become high-value SSRF targets. We enforce strict Redis password authentication to prevent lateral movement.

# Install Redis securely
sudo apt install redis-server -y

# Enable authentication
sudo sed -i \
's/# requirepass foobared/requirepass YOUR_COMPLEX_PASSWORD/' \
/etc/redis/redis.conf

# Restart Redis
sudo systemctl restart redis-server

Why This Matters

Without authentication:

  • Internal APIs can query Redis directly
  • SSRF vulnerabilities become infrastructure breaches
  • Session tokens and cached secrets become exposed

With authentication enabled, Redis becomes significantly harder to abuse internally.
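
A quick sanity check that the policy is actually enforced (this is default redis-cli behavior):

# Unauthenticated commands should now be rejected
redis-cli ping
# (error) NOAUTH Authentication required.

# With the credential, access works as before
redis-cli -a 'YOUR_COMPLEX_PASSWORD' ping
# PONG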


2. Resilient Lua Access Control

We use OpenResty + LuaJIT for high-performance request handling and secure gateway enforcement.

Instead of allowing backend failures to crash workers, Lua-based logic ensures graceful failure handling.

-- High-speed, error-aware Redis connection (lua-resty-redis)
local redis = require "resty.redis"
local red = redis:new()
red:set_timeout(1000) -- fail fast instead of hanging the worker

local ok, err = red:connect("127.0.0.1", 6379)
if not ok then
    ngx.log(ngx.ERR, "failed to connect to Redis: ", err)
    return ngx.exit(ngx.HTTP_INTERNAL_SERVER_ERROR)
end

Benefits of Lua-Based Access Logic

  • Extremely low latency execution
  • Graceful failure handling
  • Better resilience during backend outages
  • Reduced worker instability under load

This architecture keeps the AI gateway stable even during partial infrastructure failures.
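
Here is a minimal sketch of where that logic lives in practice. The location, timeout values, and backend address below are illustrative, not part of the original setup:

# nginx.conf fragment for the OpenResty gateway (illustrative)
location /v1/ {
    access_by_lua_block {
        local redis = require "resty.redis"
        local red = redis:new()
        red:set_timeout(500) -- milliseconds

        local ok, err = red:connect("127.0.0.1", 6379)
        if not ok then
            -- degrade gracefully: reject this request, keep the worker alive
            ngx.log(ngx.ERR, "redis unavailable: ", err)
            return ngx.exit(ngx.HTTP_SERVICE_UNAVAILABLE)
        end

        -- matches the requirepass set in step 1
        local res, err = red:auth("YOUR_COMPLEX_PASSWORD")
        if not res then
            ngx.log(ngx.ERR, "redis auth failed: ", err)
            return ngx.exit(ngx.HTTP_SERVICE_UNAVAILABLE)
        end

        red:set_keepalive(10000, 100) -- return the connection to the pool
    }
    proxy_pass http://127.0.0.1:11434; # e.g. a local Ollama backend
}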


3. gVisor: The Ultimate Sandbox

Traditional Docker containers still share the host kernel.

That becomes dangerous when executing:

  • AI-generated scripts
  • Untrusted automation
  • Dynamically produced code

To solve this, we deploy gVisor using the runsc runtime.

gVisor intercepts system calls and places workloads behind a dedicated user-space kernel boundary.
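
Before the run example below will work, runsc has to be registered as a Docker runtime. A minimal sketch, assuming runsc is installed at /usr/local/bin/runsc (adjust the path to your install, and merge into any existing daemon.json rather than overwriting it):

# Register gVisor's runsc runtime with Docker
sudo tee /etc/docker/daemon.json > /dev/null <<'EOF'
{
  "runtimes": {
    "runsc": {
      "path": "/usr/local/bin/runsc"
    }
  }
}
EOF
sudo systemctl restart docker

# Sanity check: dmesg inside a sandboxed container shows gVisor booting
docker run --rm --runtime=runsc ubuntu dmesg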

# Execute untrusted AI code inside gVisor
docker run --rm \
  --runtime=runsc \
  --network=none \
  -v /tmp/ai_eval:/workspace \
  node:20 \
  node /workspace/script.js

Why gVisor Matters

Standard Docker            | gVisor Sandbox
---------------------------|------------------------------
Shares host kernel         | User-space kernel isolation
Larger attack surface      | Reduced syscall exposure
Higher breakout risk       | Hardened execution boundary
Minimal runtime filtering  | Deep syscall interception

For AI-generated code execution, this isolation layer is critical.


Dual-Model Performance Strategy

Security should not come at the expense of performance.

Instead of relying on a single overloaded model, we separate workloads across specialized models.

Model              | Responsibility
-------------------|-----------------------------------------------------
Qwen 2.5 Coder     | Ultra-fast autocomplete and inline suggestions
DeepSeek Coder V2  | Complex reasoning, architecture, and chat workflows

This dual-model approach improves:

  • Latency
  • Resource allocation
  • Context quality
  • Interactive coding performance
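
With Ollama, wiring this up is just a matter of pulling both models. The tags below are assumptions; check the Ollama library for the exact names and pick the sizes that fit your hardware:

# Pull both models into the local Ollama instance
# (tags are illustrative -- verify with the Ollama model library)
ollama pull qwen2.5-coder:7b        # low-latency autocomplete
ollama pull deepseek-coder-v2:16b   # heavier reasoning and chat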

Final Thoughts

A self-hosted AI stack is only as secure as its weakest internal service.

Running AI locally without:

  • Authenticated internal services
  • Sandboxed execution
  • Failure-aware gateways
  • Network isolation

…does not create private infrastructure.

It simply creates a larger attack surface.

By integrating authenticated Redis, resilient Lua access control, and gVisor sandboxing on iRexta Bare Metal, you move from hobby-grade deployments to a true DevSecOps-grade AI platform.

Stop deploying "security theater."

Build infrastructure that is actually hardened.
