DEV Community

Varad J

How I turned my AI CLI into an autonomous agent with Playwright and Sub-agents 🚀

When I first built Codey, it was a simple CLI wrapper around an LLM with a few basic tools. It was great for small tasks, but as I started throwing harder problems at it, the limitations became obvious.

It couldn't run dev servers without blocking the thread, it couldn't browse documentation, and honestly, raw eval() calls were keeping me up at night.

So, I tore down the foundation and did a massive platform rewrite. Today, I'm excited to share how Codey evolved from a simple script into a secure, persistent agent runtime.

Here’s a deep dive into the technical upgrades.
🌐 1. Human-Like Browsing (Playwright + Vision)
I wanted Codey to be able to read documentation, check GitHub issues, and visually debug UIs. I integrated a full Playwright-backed web tool.

The Vision Bottleneck: Initially, to pass visual context to the model, the pipeline looked like this: Screenshot -> Write PNG to disk -> Read PNG -> Base64 encode. This disk I/O was noticeably slow. I optimized it by capturing the screenshot directly into memory as bytes and encoding it on the fly. We completely removed the .codey_screenshots/ temp directory.
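As a rough illustration of the in-memory path, here is a minimal sketch. It assumes a Playwright `page` object from the sync API; the helper name `screenshot_to_base64` is hypothetical and not taken from Codey's source.

```python
import base64


def screenshot_to_base64(page) -> str:
    """Capture a PNG screenshot in memory and return it base64-encoded.

    `page` is assumed to be a Playwright Page (sync API). When no `path`
    argument is given, `page.screenshot()` returns the image as bytes
    instead of writing a file, so nothing touches the disk.
    """
    png_bytes = page.screenshot(type="png")
    return base64.b64encode(png_bytes).decode("ascii")
```

The returned string can then go straight into whatever image field the model API expects, with no temp directory to clean up.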

Self-Healing Dependencies: There's nothing worse than a tool failing because a user doesn't have Chromium installed. Now, if the browser launch fails, Codey catches the error, automatically runs playwright install chromium, and retries the launch in the background.
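The catch, install, retry flow could look roughly like the sketch below. The function names are hypothetical, and the `launch` callable stands in for whatever actually starts the browser (for example a lambda wrapping Playwright's `chromium.launch()`); Codey's real implementation may differ.

```python
import subprocess
import sys
from typing import Callable, TypeVar

T = TypeVar("T")


def install_chromium() -> None:
    """Run `playwright install chromium` with the current interpreter."""
    subprocess.run(
        [sys.executable, "-m", "playwright", "install", "chromium"],
        check=True,  # raise if the install itself fails
    )


def launch_with_self_heal(
    launch: Callable[[], T],
    install: Callable[[], None] = install_chromium,
) -> T:
    """Try to launch once; on failure, install the browser and retry once."""
    try:
        return launch()
    except Exception:
        # Assumption: any launch failure is treated as "browser missing".
        # A stricter version would inspect the error message first.
        install()
        return launch()
```

Retrying only once keeps a genuinely broken environment from looping forever: if the second launch fails, the error reaches the caller.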

Smart Prompting: If you drop a link like https://... into the terminal, the system dynamically injects the web tool into the prompt and immediately triggers web.navigate() instead of asking you to paste the content.
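One way to picture the link detection is the sketch below. The regex, the helper names, and the shape of the returned action dict are all assumptions made for illustration; only the `web` tool and its `navigate` action come from the post.

```python
import re
from typing import Optional

# Deliberately simple pattern: http(s) scheme followed by non-space characters.
URL_PATTERN = re.compile(r"https?://[^\s<>\"']+")


def extract_urls(user_input: str) -> list:
    """Return URLs found in the input, minus trailing punctuation."""
    return [match.rstrip(".,;:!?)") for match in URL_PATTERN.findall(user_input)]


def plan_first_action(user_input: str) -> Optional[dict]:
    """If the input contains a link, suggest navigating to the first one."""
    urls = extract_urls(user_input)
    if urls:
        # Hypothetical action format; Codey's real tool-call schema may differ.
        return {"tool": "web", "action": "navigate", "url": urls[0]}
    return None
```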

🤖 2. Sub-Agents and Persistent Terminals
This is where the architecture really shifted from "chatbot" to "agent runtime".

The delegate Tool: Codey can now launch a completely autonomous sub-agent. This second agent gets its own tool loop, its own history, and its own context. It goes off to solve a sub-task and returns a summary to the main agent.
Persistent Sessions (terminal): Previously, if Codey ran a command, it would lose the process. I added start, send, peek, and stop actions. Now, Codey can start a Next.js dev server, leave it running in the background, peek at the logs, and continue writing code.
Human-in-the-Loop (ask): Sometimes the AI shouldn't guess. If Codey isn't sure which file to edit, it pauses execution and renders an interactive multiple-choice prompt in your terminal.
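To make the persistent terminal idea concrete, here is a minimal sketch of a session object with the four actions named above. The class name and internals are assumptions, not Codey's actual code: it keeps a child process alive, pumps its output into a bounded buffer on a background thread, and lets the caller read recent lines without blocking.

```python
import subprocess
import threading
from collections import deque


class TerminalSession:
    """Hypothetical background process wrapper with start/send/peek/stop."""

    def __init__(self, max_lines: int = 500):
        self._lines = deque(maxlen=max_lines)  # keep only recent output
        self._lock = threading.Lock()
        self._proc = None
        self._reader = None

    def start(self, argv: list) -> None:
        self._proc = subprocess.Popen(
            argv,
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # merge stderr into the same log
            text=True,
            bufsize=1,  # line-buffered pipes on our side
        )
        self._reader = threading.Thread(target=self._pump, daemon=True)
        self._reader.start()

    def _pump(self) -> None:
        # Runs in the background so the agent loop is never blocked.
        for line in self._proc.stdout:
            with self._lock:
                self._lines.append(line.rstrip("\n"))

    def send(self, text: str) -> None:
        self._proc.stdin.write(text + "\n")
        self._proc.stdin.flush()

    def peek(self, last_n: int = 20) -> list:
        with self._lock:
            return list(self._lines)[-last_n:]

    def is_running(self) -> bool:
        return self._proc is not None and self._proc.poll() is None

    def stop(self, timeout: float = 5.0) -> None:
        self._proc.terminate()
        try:
            self._proc.wait(timeout=timeout)
        except subprocess.TimeoutExpired:
            self._proc.kill()  # escalate if the process ignores SIGTERM
            self._proc.wait()
        self._reader.join(timeout=timeout)
```

With something like this, starting a dev server is `session.start(["npm", "run", "dev"])`, and the agent can call `session.peek()` between edits to check the logs.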

🛡️ 3. Security Hardening
As Codey got smarter, it got more dangerous. I had to lock it down.

Killing eval(): Arbitrary code execution is a massive vulnerability. I stripped out raw eval() for the calculator tool and replaced it with strict ast.parse() validation. We now use a strict whitelist of safe operators, functions, and constants.
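For readers who haven't seen this pattern, here is a minimal sketch of an AST-based calculator. The exact whitelist below is an assumption chosen for illustration (it leaves out `**`, for instance, since huge exponents can hang the process); Codey's real list may differ.

```python
import ast
import math
import operator

# Assumed whitelists: anything not listed here is rejected.
_BIN_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Mod: operator.mod,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_FUNCTIONS = {"sqrt": math.sqrt, "abs": abs, "round": round}
_CONSTANTS = {"pi": math.pi, "e": math.e}


def safe_eval(expression: str):
    """Evaluate an arithmetic expression by walking its AST; never calls eval()."""
    tree = ast.parse(expression, mode="eval")
    return _eval_node(tree.body)


def _eval_node(node):
    if isinstance(node, ast.Constant):
        # Only plain numbers; bool is excluded even though it subclasses int.
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BIN_OPS:
        return _BIN_OPS[type(node.op)](_eval_node(node.left), _eval_node(node.right))
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_eval_node(node.operand))
    elif isinstance(node, ast.Name) and node.id in _CONSTANTS:
        return _CONSTANTS[node.id]
    elif (
        isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)  # no attribute access like os.system
        and node.func.id in _FUNCTIONS
        and not node.keywords
    ):
        return _FUNCTIONS[node.func.id](*[_eval_node(arg) for arg in node.args])
    raise ValueError(f"Disallowed expression element: {type(node).__name__}")
```

Because every node type must be explicitly allowed, an input such as `__import__('os').system('id')` fails with `ValueError` before anything runs.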

Fixing Shell Injections: I moved away from raw shell execution and string concatenation. Before: a git diff command string was passed directly to the shell. After: subprocess.run([...]) with an argument list (so no shell ever interprets the input), combined with shlex.split() for safe argument parsing.
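A small sketch of the "after" shape, using hypothetical helper names. The key point is that the command is a list and `shell=True` is never used, so characters like `;` are never interpreted by a shell.

```python
import shlex
import subprocess


def build_git_diff_argv(extra_args: str = "") -> list:
    """Turn a free-form argument string into an argv list for `git diff`."""
    # shlex.split honours quotes, so 'my file.py' stays one argument.
    return ["git", "diff", *shlex.split(extra_args)]


def run_git_diff(extra_args: str = "") -> subprocess.CompletedProcess:
    """Run git diff without a shell; each list item is passed as-is to git."""
    return subprocess.run(
        build_git_diff_argv(extra_args),
        capture_output=True,
        text=True,
        check=False,  # let the caller inspect returncode and stderr
    )
```

An input like `HEAD; rm -rf /` becomes the literal arguments `HEAD;`, `rm`, `-rf`, `/` handed to git, which rejects them instead of executing anything. Note this does not stop a caller from passing unwanted git options, so a list of allowed flags may still be worth adding.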

Path Traversal & Approval Gates: Added a strict assert_within_project() check to create_file, edit_file, and read_files so the agent can't randomly decide to read ../../../etc/passwd. I also added a CONFIRM_SHELL=true environment flag that forces Codey to ask for human permission before running potentially destructive commands.
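Here is one way such checks might be written. The name `assert_within_project` and the `CONFIRM_SHELL` flag come from the post; the signatures, the `PermissionError`, and the `shell_confirmation_required` helper are assumptions for illustration.

```python
import os
from pathlib import Path


def assert_within_project(path, project_root) -> Path:
    """Resolve `path` against the project root and reject anything outside it."""
    root = Path(project_root).resolve()
    # resolve() collapses ".." segments and follows symlinks before the check.
    target = (root / path).resolve()
    try:
        target.relative_to(root)
    except ValueError:
        raise PermissionError(f"{path!r} is outside the project root") from None
    return target


def shell_confirmation_required(env=None) -> bool:
    """True when CONFIRM_SHELL=true is set (assumed case-insensitive)."""
    env = os.environ if env is None else env
    return env.get("CONFIRM_SHELL", "").strip().lower() == "true"
```

Resolving before comparing matters: a naive string-prefix check would let `project/../../etc/passwd` through.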

🧠 4. State Management & Developer Experience
Finally, I overhauled how Codey remembers things.

Multi-Session Workflow: Codey used to dump everything into one history.jsonl per project. Now, it generates separate session files and greets you with an interactive startup picker (showing message counts and previews) so you can resume yesterday's work or start fresh.
Streaming & Context: Switched to token-by-token streaming for a snappy, ChatGPT-like feel. Added trim_history() and MAX_TOOL_ROUNDS to prevent infinite loops and runaway API costs.
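The two safeguards could be sketched as follows. `trim_history` and `MAX_TOOL_ROUNDS` are names from the post, but the message format, the default values, and the `run_tool_loop` helper are assumptions; the real implementation may trim by tokens rather than by message count.

```python
from typing import Callable, Optional

MAX_TOOL_ROUNDS = 8  # assumed value; the post does not state the real limit


def trim_history(messages: list, max_messages: int = 40) -> list:
    """Keep all system messages plus the most recent `max_messages` others."""
    if max_messages <= 0:
        raise ValueError("max_messages must be positive")
    system = [m for m in messages if m.get("role") == "system"]
    rest = [m for m in messages if m.get("role") != "system"]
    return system + rest[-max_messages:]


def run_tool_loop(
    next_step: Callable[[int], Optional[str]],
    max_rounds: int = MAX_TOOL_ROUNDS,
) -> str:
    """Call `next_step` until it returns a final answer or the cap is hit.

    `next_step` is a hypothetical callable that performs one model call plus
    any tool execution, returning None while more rounds are needed.
    """
    for round_number in range(max_rounds):
        answer = next_step(round_number)
        if answer is not None:
            return answer
    raise RuntimeError(f"Stopped after {max_rounds} tool rounds without a final answer")
```

The hard cap is what bounds cost: even if the model keeps requesting tools, the loop ends after a fixed number of API calls.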
Wrapping up
These changes transformed Codey from "CLI + LLM + tools" into "persistent agent runtime + browser automation + sub-agents + project memory".

Building this has been an incredible lesson in agent orchestration and Python CLI development.

If you're interested in AI coding assistants, want to build your own, or just want to poke around the source code, check out the repo! I'd love your feedback, bug reports, or pull requests (we always need more tools).

👉 Check out Codey on GitHub: github.com/varad-13/codey

Let me know what you think in the comments! What tools should I add next?
