Background
Most AI agents I've worked with share two big problems.
Problem 1: Token waste during waiting
A lot of agents I've seen handle "waiting" poorly. If you want the agent to monitor a webpage until something changes — like a price update, a status change, or a specific piece of text appearing — the usual approach is to:
- keep calling the LLM in a loop ("is it there yet? is it there yet?")
That burns through tokens and API requests even though the agent isn't really doing much of anything.
Problem 2: Fixed toolsets
Many agents ship with a fixed set of tools. If you give them a task that doesn't fit those tools well, they either fail, hallucinate a solution, or you have to go add a new tool yourself.
I wanted an agent that could notice when it's missing a capability and just... build it.
What I built
The project is called GrimmBot. It's an open-source AI agent that runs inside a Docker container with a full Debian-based desktop environment, a browser, and a set of built-in tools.
But the two features I think make it different are:
- Zero-token monitoring mode
- Runtime tool generation
Let me explain how each one works.
Zero-token monitoring
When GrimmBot needs to wait for something — like a webpage to update, a piece of text to appear, or a visual condition to be met — it doesn't keep polling the LLM.
Instead, it hands off the waiting to a local Python watcher loop.
This loop can monitor for:
- specific text or regex patterns in the DOM
- visual conditions using pixel/color bounding boxes
- any other condition you can express in Python
While this loop is running, no LLM calls are made. The model is essentially asleep.
The moment the trigger condition is met, the watcher exits and wakes the agent back up. Only then does it make another API call to continue the task.
So if you ask GrimmBot to "watch this page until this text appears," it will:
- set up a local watcher for that condition using its built-in watcher tools
- suspend LLM usage
- wait locally
- wake up and resume once the condition is true
This makes long monitoring tasks much more efficient.
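To make the idea concrete, here is a minimal sketch of such a watcher loop. The `fetch_dom` helper is an assumption standing in for whatever local browser call the agent actually uses; GrimmBot's real tool names and signatures may differ.

```python
import re
import time

def watch_for_pattern(fetch_dom, pattern, poll_interval=5.0, timeout=3600):
    """Poll a page locally until `pattern` appears in the DOM.

    No LLM calls happen inside this loop; the model is only
    resumed after this function returns.
    """
    regex = re.compile(pattern)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        html = fetch_dom()          # local browser call, zero tokens
        match = regex.search(html)
        if match:
            return match.group(0)   # wake the agent with the matched text
        time.sleep(poll_interval)
    return None                     # timed out; the agent decides what's next
```

The whole loop is ordinary Python, so the "any other condition you can express in Python" case is just a different predicate in place of the regex check.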
Runtime tool generation
GrimmBot ships with 60+ built-in tools for things like:
- browser control (clicking, navigation, DOM extraction)
- file operations (read, write, patch)
- shell commands
- screenshots and visual grids
- scheduling and memory
But sometimes that's not enough.
If GrimmBot hits a task where none of its existing tools are a good fit, it can write a new one.
So if you ask it to do something weird and specific — like parse a proprietary log format, or interact with some niche API — it can try to build the tool itself in Python instead of just failing.
These custom tools are persistent. If it builds a tool on Monday, it still has access to it on Tuesday.
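One way to get that persistence is to write each generated tool to a directory that survives restarts and import it dynamically on startup. The sketch below assumes that layout plus a `run` entry point per tool; both are my assumptions, not GrimmBot's actual internals.

```python
import importlib.util
from pathlib import Path

# Assumed to sit on a persistent volume inside the container.
TOOLS_DIR = Path("custom_tools")

def save_tool(name, source):
    """Persist generated tool code so it survives restarts."""
    TOOLS_DIR.mkdir(exist_ok=True)
    (TOOLS_DIR / f"{name}.py").write_text(source)

def load_tools():
    """Import every saved tool module and collect its `run` callable."""
    registry = {}
    for path in TOOLS_DIR.glob("*.py"):
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)
        registry[path.stem] = module.run
    return registry
```

A tool saved on Monday is just a file on disk, so Tuesday's `load_tools()` picks it up again with no extra work.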
The environment
GrimmBot runs fully containerized in Docker.
The container includes:
- Debian Bookworm Slim
- a headless X11 display using xvfb
- VNC access via x11vnc (so you can watch what it's doing)
- Chromium browser
- Python with common libraries
- Java 17 and build tools for code tasks
- LiteLLM for model-agnostic API support (works with OpenAI, Anthropic, Gemini, OpenRouter, or local models)
You interact with the agent through an attached terminal, and you can view its desktop over VNC.
There's also a "wormhole" folder — a shared directory between your host machine and the container for passing files in and out.
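Assuming the wormhole is an ordinary Docker bind mount, the setup looks roughly like this. The image name, paths, and port are placeholders, not the project's actual values.

```shell
# Bind-mount a host directory into the container as the "wormhole",
# and expose the VNC port so you can watch the desktop.
# Image name, mount point, and port here are placeholders.
docker run -it \
  -v "$(pwd)/wormhole:/wormhole" \
  -p 5900:5900 \
  grimmbot:latest
```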
Human approval for certain actions
I added an approval system for sensitive actions.
Certain actions — like running arbitrary shell commands, creating a custom tool, or navigating outside a list of allowed domains — will pause and ask for approval in the terminal before proceeding.
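A simple version of that gate is a wrapper that pauses only on flagged action types. The category names below are illustrative, not GrimmBot's actual ones.

```python
# Illustrative action categories that require a human sign-off.
RISKY_ACTIONS = {"shell", "create_tool", "navigate_offlist"}

def with_approval(action_type, description, execute, ask=input):
    """Run `execute`, but pause for terminal approval on risky actions."""
    if action_type in RISKY_ACTIONS:
        answer = ask(f"Approve {action_type}: {description}? [y/N] ")
        if answer.strip().lower() != "y":
            return None  # denied; the agent has to plan around it
    return execute()
```

Everything outside the risky set runs straight through, so routine browsing and file reads stay uninterrupted.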