DEV Community: Alex72-py

I Audited an AI Chatbot's Sandbox Like a Black-Box Linux Machine

Alex72-py — Sun, 07 Jun 2026 15:33:07 +0000

I spent 6 hours doing something that probably says worrying things about my hobbies.

Instead of using Kimi 2.6 Instant as a chatbot, I treated it like an unfamiliar Linux machine I'd just SSH'd into. No jailbreaks, no prompt injection, nothing sketchy. Just passive observation and measurement from inside the provided environment.

What I found was more interesting than expected.

Background: Why Ignore the Model's Self-Descriptions?

Early on I noticed something: what the model says it can do and what the runtime actually allows aren't always the same thing.

You'll hear:

"I don't have internet access."
"I can't access system information."

Both can be true at the product layer while sitting on top of something much more capable underneath.

So I stopped asking and started measuring instead.

The Infrastructure

First surprise: this doesn't feel like a tiny chat runtime.

Host: Alibaba Cloud, LifseaOS
Kernel: Linux 5.10.134-18.0.10.lifsea8.x86_64
CPU: Intel Xeon Platinum, 2 logical cores (cgroup throttled — 61 throttle events logged under stress)
RAM: Hard OOM kill at exactly 3,221,225,472 bytes (3 GiB). No swap.
Execution: Kubernetes Pod, Burstable QoS class

This is real cloud infrastructure, not a toy backend.
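Numbers like the throttle count and the memory ceiling are typically exposed through the cgroup filesystem. Here is a minimal sketch of how they could be read, assuming a cgroup v2 layout (`/sys/fs/cgroup/cpu.stat` and `/sys/fs/cgroup/memory.max`). The post does not say which files or cgroup version were actually inspected, and the helper names are illustrative.

```python
from pathlib import Path
from typing import Dict, Optional


def parse_cpu_stat(text: str) -> Dict[str, int]:
    """Parse 'key value' lines from a cgroup cpu.stat file into a dict."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats


def parse_memory_max(text: str) -> Optional[int]:
    """Return the memory limit in bytes, or None when the file says 'max'."""
    value = text.strip()
    return None if value == "max" else int(value)


def read_text(path: str) -> Optional[str]:
    """Read a file, returning None if it is missing or unreadable."""
    try:
        return Path(path).read_text()
    except OSError:
        return None


if __name__ == "__main__":
    # Assumed cgroup v2 paths; cgroup v1 keeps these files elsewhere.
    cpu = read_text("/sys/fs/cgroup/cpu.stat")
    mem = read_text("/sys/fs/cgroup/memory.max")
    if cpu is not None:
        print("nr_throttled:", parse_cpu_stat(cpu).get("nr_throttled"))
    if mem is not None:
        print("memory limit (bytes):", parse_memory_max(mem))
```

For reference, 3,221,225,472 is exactly 3 × 1024³, which is what a 3 GiB memory limit looks like when read back in bytes.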

The Credential Finding

Most straightforward discovery:

cat /proc/self/environ | tr '\0' '\n' | grep -i pass
# SSH_PASSWORD=sshpassword

A hardcoded SSH credential sitting in the process environment, visible to any code in the container that reads its own /proc/self/environ.

Not exploitable in any meaningful way given the network restrictions. But it's a classic container misconfiguration worth documenting.
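The same check can be scripted. This is a rough sketch, not the exact procedure used above: it splits the NUL-separated environment and reports only the names of variables that look credential-like. The keyword list and function names are assumptions chosen for illustration.

```python
import re
from typing import Dict, List

# Illustrative keyword list; extend it to match your own naming conventions.
SECRET_NAME_PATTERN = re.compile(r"pass|secret|token|key", re.IGNORECASE)


def parse_environ(raw: bytes) -> Dict[str, str]:
    """Split a NUL-separated environ blob into a name -> value mapping."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            name, _, value = entry.partition(b"=")
            env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


def secret_like_names(env: Dict[str, str]) -> List[str]:
    """Return the names of variables that look like they hold credentials."""
    return sorted(name for name in env if SECRET_NAME_PATTERN.search(name))


if __name__ == "__main__":
    try:
        with open("/proc/self/environ", "rb") as fh:
            names = secret_like_names(parse_environ(fh.read()))
    except OSError:
        names = []
    # Print names only, so the values do not end up in logs or transcripts.
    print(names)
```

Reporting names rather than values keeps the audit itself from leaking the secret a second time.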

Disk Layout

vda (40GB)
├─ vda1    1MB     BIOS boot
├─ vda2    127MB   EFI/boot
├─ vda3    384MB   Boot partition
├─ vda4    9.5GB   Host root
└─ vda5    30GB    /mnt — ext4, shared with host ← persistent
vdb         1GB    Unmounted
vdc        13GB    Unmounted

Key finding: /mnt survives pod restarts. It's a real ext4 partition shared with the host. The OverlayFS root is ephemeral. /mnt/agents is a FUSE mount (kimi-portal), which appears to be the bridge between the container and the AI platform layer.
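One way to tell ephemeral layers from persistent ones is to look at filesystem types in `/proc/mounts`. The sketch below assumes the standard `/proc/mounts` field order (device, mount point, filesystem type). The `classify` labels are rough heuristics of my own, not something the platform documents.

```python
from typing import List, NamedTuple


class Mount(NamedTuple):
    device: str
    mount_point: str
    fs_type: str


def parse_mounts(text: str) -> List[Mount]:
    """Parse /proc/mounts-style text: device, mount point, fs type, ..."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            mounts.append(Mount(fields[0], fields[1], fields[2]))
    return mounts


def classify(mount: Mount) -> str:
    """Heuristically label a mount by how its data is likely to persist."""
    if mount.fs_type == "overlay":
        return "overlay (container layer, usually ephemeral)"
    if mount.fs_type.startswith("fuse"):
        return "fuse (served by a userspace process)"
    if mount.device.startswith("/dev/"):
        return "block device (may outlive the container)"
    return "other"


if __name__ == "__main__":
    try:
        with open("/proc/mounts") as fh:
            for m in parse_mounts(fh.read()):
                print(f"{m.mount_point:<30} {m.fs_type:<14} {classify(m)}")
    except OSError:
        print("/proc/mounts is not available on this system")
```

A mount type alone does not prove persistence. Writing a marker file and checking for it after a restart is still the real test.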

Network Architecture

The code execution container is effectively air-gapped from the public internet:

curl to external hosts: silent failure
Chromium: can't reach public internet
Raw TCP/UDP egress: firewall blocked

But the built-in web tools do reach the internet — through a rotating residential proxy pool. Probing egress IPs revealed Colombia-based ISPs (Bogotá, Pitalito) via Evomi and NetNut proxy providers.

Code Container    →    egress DENIED
Web Tool Layer    →    Residential Proxy Pool    →    Internet

Internal network visible:

Container: 10.162.57.123
CoreDNS: 192.168.0.10
K8s API: 192.168.0.1

You can bind and serve on ports inside the internal network. Only outbound traffic to the public internet is restricted.
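A small TCP probe makes the split between "can listen locally" and "cannot reach out" easy to reproduce. This is a sketch under assumptions: `can_connect` is a hypothetical helper, and the demo only exercises a loopback listener so it runs anywhere.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Listening on a local port works even when public egress is blocked.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server.listen(1)
        port = server.getsockname()[1]
        print("local listener reachable:", can_connect("127.0.0.1", port))
```

Pointing the same function at a public host and port of your choosing would show the egress side. In an environment like the one described here, that call would be expected to return `False`.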

The Virtual Display

The environment exposed DISPLAY=:99, a virtual graphical display.

Testing confirmed Xvfb running at 1920×1080. I rendered a GUI window using Tkinter, painted content to the screen, and captured a screenshot — all from inside a standard chat interface.
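For anyone wanting to repeat the display test, here is a hedged sketch of the render-and-capture step. It assumes Tkinter is available and that Pillow's `ImageGrab` was built with X11 grab support. The post does not name the screenshot tool actually used, so treat the capture call as a stand-in.

```python
def display_number(display: str) -> int:
    """Extract the display number from an X11 DISPLAY value like ':99' or 'host:10.0'."""
    after_colon = display.rsplit(":", 1)[1]
    return int(after_colon.split(".", 1)[0])


def render_and_capture(path: str, text: str = "hello from the sandbox") -> None:
    """Open a small Tk window on the current display and save a screenshot of it."""
    import tkinter as tk        # imported lazily: needs a reachable X display
    from PIL import ImageGrab   # assumption: Pillow with X11 grab support

    root = tk.Tk()
    root.geometry("400x120+0+0")
    tk.Label(root, text=text, font=("TkDefaultFont", 16)).pack(expand=True)
    root.update_idletasks()
    root.update()               # process pending events so the window is drawn
    x, y = root.winfo_rootx(), root.winfo_rooty()
    box = (x, y, x + root.winfo_width(), y + root.winfo_height())
    ImageGrab.grab(bbox=box).save(path)
    root.destroy()
```

With `DISPLAY=:99` set, calling `render_and_capture("tk_capture.png")` should write a PNG of the window. Without a reachable X server, `tk.Tk()` raises an error instead.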

Software Surface Area

Notable installed packages beyond standard utilities:

Automation: Playwright, Selenium, PyAutoGUI, python3-xlib, screenshot tooling

ML: PyTorch 2.8, TensorFlow, scikit-learn (CUDA/NVIDIA packages are present, but no GPU is accessible: a programmatic availability check returns false)

Vision/OCR: OpenCV, EasyOCR, Tesseract, Pillow

Backend: FastAPI, Uvicorn, websockets — enough to run a web server from inside the sandbox

Office: python-docx, python-pptx, openpyxl, reportlab
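An inventory like this can be gathered without importing every package. The sketch below uses `importlib.metadata` for version lookups and `torch.cuda.is_available()` as one plausible GPU check. The post does not say which call produced the "returns false" result, so that part is an assumption.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def cuda_available() -> Optional[bool]:
    """Return torch.cuda.is_available(), or None when PyTorch is not importable."""
    try:
        import torch
    except ImportError:
        return None
    return torch.cuda.is_available()


if __name__ == "__main__":
    wanted = ["torch", "tensorflow", "scikit-learn", "fastapi", "uvicorn"]
    for name, version in installed_versions(wanted).items():
        print(f"{name:<14} {version or 'not installed'}")
```

Calling `cuda_available()` separately keeps the heavy PyTorch import out of the quick inventory pass.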

Security Summary

| Finding | Notes |
| --- | --- |
| SSH credential in `/proc/self/environ` | Config hygiene issue, not exploitable given air-gap |
| No container audit logging | `/var/log/*` empty; no local forensic trail |
| Persistent storage at `/mnt` | Survives pod resets |
| Web egress via residential proxy | Rotating IPs, Colombia-based |
| PID namespace unlimited | No fork-bomb protection |
| Virtual display active | Xvfb + full automation stack |
| Air-gapped code execution | Working as intended |
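The fork-bomb row is the kind of finding that can be checked from the pids cgroup controller. This is a sketch of one way to do it, assuming the limit lives in a `pids.max` file at the usual cgroup v2 or v1 location. The post does not state how the missing limit was determined.

```python
from pathlib import Path
from typing import Optional

# Candidate locations: cgroup v2 first, then the cgroup v1 pids controller.
PIDS_MAX_PATHS = ("/sys/fs/cgroup/pids.max", "/sys/fs/cgroup/pids/pids.max")


def parse_pids_max(text: str) -> Optional[int]:
    """Return the task limit, or None when the file says 'max' (no limit)."""
    value = text.strip()
    return None if value == "max" else int(value)


def describe_pids_limit() -> str:
    """Describe the first readable pids.max file, if there is one."""
    for path in PIDS_MAX_PATHS:
        try:
            limit = parse_pids_max(Path(path).read_text())
        except (OSError, ValueError):
            continue
        return f"{path}: " + ("no limit" if limit is None else f"limit {limit}")
    return "no pids.max file found"


if __name__ == "__main__":
    print(describe_pids_limit())
```

A value of `max` means the kernel will not cap the number of tasks in that cgroup, which is what leaves a fork bomb unchecked.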

Takeaway

Under a standard AI chat interface: a Kubernetes pod on Alibaba Cloud, OverlayFS container root, persistent ext4 partition, FUSE-mounted agent bridge, full automation stack, and web access through a residential proxy pool.

The air-gap works. The credential in the environment and the absence of audit logging are the hygiene findings worth noting.

Curious whether others have profiled GPT, Gemini, or Claude's sandbox environments — the infrastructure patterns would be interesting to compare.

Methodology: passive inspection only, standard chat UI, no exploitation attempted.

I Built ARIA — An AI Terminal Co-Pilot for Termux Using Gemma 4

Alex72-py — Mon, 11 May 2026 12:06:15 +0000

This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

ARIA is a terminal-native AI assistant designed specifically for Termux and Android development workflows.

It combines Google's Gemma 4 models with a Termux-focused knowledge base, shell tooling, model discovery, and safety systems to create a mobile-first AI development experience that feels native to the terminal.

Unlike desktop-focused coding assistants, ARIA is built around the realities of Android terminal development:

Clang instead of GCC
Android filesystem limitations
Proot/container environments
Mobile-only workflows
Package and permission quirks
Real-world Termux debugging patterns

The goal was straightforward: build an AI assistant that actually understands the Termux ecosystem instead of treating Android like a normal Linux desktop.

ARIA includes:

Slash-command interface (/ask, /fix, /watch, /models)
Dynamic Gemma model discovery and switching
Shell error analysis and troubleshooting
Offline Termux knowledge base
Clipboard integration for command workflows
Guardian safety layer for risky shell operations
Rich terminal UI with animations and formatted output
Experimental watch mode for shell monitoring
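To give a feel for the slash-command interface, here is a minimal parsing and dispatch sketch. It is not ARIA's actual implementation: the function names and handler signatures are hypothetical, and the real project may well route commands through Click instead.

```python
from typing import Callable, Dict, Tuple


def parse_slash_command(line: str) -> Tuple[str, str]:
    """Split '/ask how do I ...' into ('ask', 'how do I ...')."""
    stripped = line.strip()
    if not stripped.startswith("/"):
        raise ValueError("expected a command starting with '/'")
    name, _, rest = stripped[1:].partition(" ")
    if not name:
        raise ValueError("missing command name after '/'")
    return name.lower(), rest.strip()


def dispatch(line: str, handlers: Dict[str, Callable[[str], str]]) -> str:
    """Look up the handler for a slash command and call it with its argument."""
    name, argument = parse_slash_command(line)
    handler = handlers.get(name)
    if handler is None:
        return f"unknown command: /{name}"
    return handler(argument)


if __name__ == "__main__":
    # Placeholder handlers; real ones would call the model or inspect the shell.
    handlers = {
        "ask": lambda question: f"(would ask the model) {question}",
        "models": lambda _: "(would list available models)",
    }
    print(dispatch("/ask why does pkg install fail?", handlers))
    print(dispatch("/models", handlers))
```

Keeping parsing separate from dispatch makes it straightforward to add commands such as /fix or /watch as plain dictionary entries.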

Demo

GitHub Repository: https://github.com/Alex72-py/aria-termux

A short terminal demo video is included in the repository README.

Screenshots

[Screenshot: Startup Interface]

[Screenshot: Model Selection]

[Screenshot: Error Analysis]

How I Used Gemma 4

ARIA uses Gemma 4 through Google AI Studio as its primary reasoning and assistance engine.

Gemma powers:

Shell error analysis
Interactive /ask workflows
Command explanation and troubleshooting
Termux-specific debugging assistance
Knowledge-assisted recommendations
Real-time command reasoning

I primarily used gemma-4-26b-a4b-it, which offered the best balance of reasoning quality, response consistency, mobile-friendly latency, and free-tier accessibility — all critical for practical usability inside Termux.

One of the core goals was making advanced AI tooling accessible directly from Android without requiring expensive infrastructure or a desktop environment. To support that, ARIA also implements dynamic model discovery, graceful fallback handling, retry logic, configurable model switching, and offline fallback workflows.
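Retry plus fallback is easier to picture in code. The sketch below shows the general pattern only: `call_model` stands in for whatever client function talks to the API, and all names and defaults are hypothetical, not taken from ARIA's source.

```python
import time
from typing import Callable, Optional, Sequence


def call_with_fallback(
    models: Sequence[str],
    call_model: Callable[[str, str], str],
    prompt: str,
    retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try each model in order, retrying with exponential backoff before falling back."""
    last_error: Optional[Exception] = None
    for model in models:
        for attempt in range(retries + 1):
            try:
                return call_model(model, prompt)
            except Exception as exc:  # a real client would catch narrower error types
                last_error = exc
                if attempt < retries:
                    sleep(base_delay * (2 ** attempt))
    raise RuntimeError("all models failed") from last_error
```

Passing `sleep` as a parameter is a small design choice that lets tests skip the real delays. In normal use the default `time.sleep` applies.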

Challenges I Faced

The biggest technical hurdles included:

Handling invalid or unavailable model endpoints gracefully
Building a reliable terminal UX inside Termux
Streaming long responses cleanly without breaking layout
Designing safe shell interaction workflows
Working around Android-specific development limitations
Balancing transparent reasoning output against cleaner response presentation
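On the safe shell interaction point, a pattern-based pre-check is one simple approach. The following is an illustrative sketch only: the rules and names are assumptions, not ARIA's Guardian implementation, and a handful of regexes is nowhere near a complete safety layer.

```python
import re
from typing import List, Tuple

# Illustrative rules only; a real safety layer needs a far broader rule set.
RISKY_PATTERNS: List[Tuple[str, str]] = [
    (r"\brm\b[^|;&]*\s-\w*[rR]", "recursive delete"),
    (r"\bchmod\s+(-\w+\s+)*777\b", "world-writable permissions"),
    (r"\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba|z)?sh\b", "download piped into a shell"),
    (r"\bdd\b.*\bof=/dev/", "raw write to a device"),
]


def assess(command: str) -> List[str]:
    """Return a description for every risky pattern the command matches."""
    return [reason for pattern, reason in RISKY_PATTERNS if re.search(pattern, command)]


def needs_confirmation(command: str) -> bool:
    """True when the command should be shown to the user before running."""
    return bool(assess(command))


if __name__ == "__main__":
    for cmd in ("ls -la", "rm -rf $HOME/project", "chmod -R 777 /sdcard"):
        print(f"{cmd!r}: {assess(cmd) or 'no rule matched'}")
```

A check like this is best treated as a prompt for human confirmation, not as a guarantee, since shell syntax offers many ways around simple patterns.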

The experimental watch mode was especially tricky — it required monitoring shell behavior in real time while keeping the experience responsive and safe.

Transparent Reasoning Output

ARIA currently exposes portions of its intermediate reasoning and response planning during some operations. This is intentional during the current development phase: it helps with debugging, transparency, and prompt iteration.

Future versions will add:

Optional hidden reasoning mode
Configurable verbosity settings
Cleaner streaming output
Dedicated developer/debug modes

Why I Built This

Most AI coding assistants assume you're sitting at a powerful desktop. But a lot of developers use Termux as a genuine mobile development environment — and almost nothing is built with that workflow in mind.

I wanted to change that. ARIA is lightweight, terminal-native, Android-aware, and designed for real usage on a phone. It's meant to feel like a natural extension of the mobile terminal experience, not a desktop tool awkwardly shoehorned onto Android.

Future Plans

Planned improvements include:

More reliable watch mode
Better streaming and response rendering
Expanded offline capabilities
Plugin/tool architecture
Improved shell integration
Local model support
Stronger command execution safety controls

Tech Stack

Python
Google Gemma 4
Google AI Studio API
Rich
Click
Pydantic
Termux

Closing Thoughts

This project started as an experiment in improving AI-assisted development on Android. It grew into a full terminal-native assistant built specifically around the realities of Termux development.

Building ARIA sharpened my thinking on terminal UX, AI reliability, mobile-first workflows, shell safety, and what practical AI tooling actually looks like in constrained environments. It also reinforced just how powerful mobile development has become.

Thanks for reading.