Every time I wanted an AI agent to do something real (search the web, read a file, send an email, remember something from last week), I ended up writing glue code. Lots of it. I wasn't building AI workflows; I was babysitting plumbing.
That frustration became Captain Claw.
The Problem I Kept Running Into
I tried the popular frameworks. LangChain is powerful but verbose: you feel like you're writing boilerplate to write boilerplate. CrewAI is elegant for multi-agent orchestration but opinionated about how agents relate to each other. AutoGen is great for research but heavy to set up for everyday use. OpenClaw needed heavy configuration before it ran smoothly for me.
None of them felt like a runtime — something you just start, talk to, and get work done with. They all felt like libraries you assemble into a runtime yourself.
I wanted something closer to how I actually work: multiple parallel workstreams, different tools depending on the task, ability to pick up where I left off, and the freedom to use whatever model makes sense for each job.
What I Actually Built
Captain Claw is a local AI agent that you install with pip install captain-claw and just run. No framework to configure, no agent graph to design. You open a session and start working.
The architecture centers on one idea I hadn't seen done well elsewhere: sessions as first-class citizens.
Most agent runtimes give you one conversation with one model. Captain Claw gives you named, persistent sessions — each with its own model, context, and history — running simultaneously. Session #1 can be Claude Sonnet analyzing your codebase. Session #2 can be GPT-4o drafting the release notes from what session #1 found. Session #3 can be a local Ollama model handling a background task. All at the same time. All resumable next week.
```
/new analysis
/session model claude-sonnet
> Analyze the performance bottlenecks in ./src

/new writeup
/session model chatgpt-fast
> Based on session #1 findings, write an engineering post-mortem

/session run #1 summarize the top three issues   # run a prompt in another session
```
It sounds simple. It turns out to be surprisingly useful.
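The "sessions as first-class citizens" idea boils down to a registry that keeps every named session addressable, so any session can be resumed or written to from another one. Here is a toy sketch of that shape; the class and method names are mine, not Captain Claw's actual internals:

```python
from dataclasses import dataclass, field

@dataclass
class Session:
    """A named, persistent workstream with its own model and history."""
    name: str
    model: str
    history: list = field(default_factory=list)

class SessionRegistry:
    """Keeps every session addressable by name, so any of them can be
    resumed, or prompted from another session, at any time."""
    def __init__(self):
        self._sessions = {}

    def new(self, name, model):
        self._sessions[name] = Session(name, model)
        return self._sessions[name]

    def run(self, name, prompt):
        # In a real runtime this would call the session's model;
        # here we just record the prompt to show the routing idea.
        session = self._sessions[name]
        session.history.append(prompt)
        return f"[{session.model}] handled: {prompt}"

registry = SessionRegistry()
registry.new("analysis", "claude-sonnet")
registry.new("writeup", "chatgpt-fast")
registry.run("analysis", "Analyze ./src")
```

Because the registry outlives any single conversation, "run a prompt in another session" is just a dictionary lookup plus a model call.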
29 Tools, No Plugins
The other thing I was tired of was installing extensions to do basic things. Captain Claw ships with 29 built-in tools that the agent picks automatically based on the task:
- System: shell, read, write, glob
- Web: web_search, web_fetch, web_get (raw HTML)
- Documents: pdf_extract, docx_extract, xlsx_extract, pptx_extract
- Images: image_gen (DALL-E 3, Gemini), image_ocr, image_vision
- Google: google_drive, google_calendar, google_mail (read)
- Communication: send_mail (SMTP/Mailgun/SendGrid), pocket_tts
- Memory: todo, contacts, scripts, apis, playbooks — all persistent cross-session
- Data: datastore (SQLite-backed relational tables with a web dashboard)
- Deep memory: typesense (hybrid keyword + vector search archive)
- Network: botport (agent-to-agent routing across Captain Claw instances)
- Android: termux (camera, GPS, torch, battery via Termux API)
The Termux one is my personal favourite. I can ask the agent to take a photo with my phone, get my current location, or toggle the flashlight. It's ridiculous and I love it.
The Orchestrator
Once sessions clicked, the next natural step was orchestration. The /orchestrate command takes a complex task, decomposes it into a DAG, and executes subtasks in parallel across separate sessions with real-time progress monitoring.
```
/orchestrate Research the top 5 vector databases, benchmark their Python APIs,
and write a comparison report to ./reports/vectordb-comparison.md
```
This spins up multiple sessions, assigns tasks, and assembles results — without you managing any of it. It also ships as a headless CLI (captain-claw-orchestrate) for automation pipelines.
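The execution side of this can be sketched in a few lines: walk the dependency graph one ready level at a time, running independent tasks in parallel. The task names and the `run_task` stub below are illustrative (in the real orchestrator a model does the decomposition and each task runs in its own session):

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical task graph: each task maps to the tasks it depends on.
dag = {
    "research": [],
    "benchmark": ["research"],
    "report": ["research", "benchmark"],
}

def run_task(name):
    # Stand-in for dispatching the subtask to a session.
    return f"done:{name}"

def execute(dag):
    """Run tasks whose dependencies are satisfied, in parallel, until done."""
    results, remaining = {}, dict(dag)
    with ThreadPoolExecutor() as pool:
        while remaining:
            ready = [t for t, deps in remaining.items()
                     if all(d in results for d in deps)]
            if not ready:
                raise ValueError("dependency cycle detected")
            for t, r in zip(ready, pool.map(run_task, ready)):
                results[t] = r
            for t in ready:
                del remaining[t]
    return results

results = execute(dag)
```

Here "research" runs first, "benchmark" waits on it, and "report" waits on both; anything at the same level would run concurrently.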
What I Learned Building It
Persistence changes everything. The moment sessions became resumable, the whole product felt different. You stop thinking in single conversations and start thinking in ongoing projects.
The agent needs to know what it did before. Cross-session memory for todos, contacts, scripts, and APIs sounds like a nice-to-have until you realize the agent can now say "last time you asked me to call this API, here's the endpoint and auth pattern I used." That's when it starts feeling like a collaborator rather than a stateless chatbot.
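The mechanics behind that "here's the endpoint and auth pattern I used" moment are mundane: a small persistent store keyed by name. A toy version, with a schema that is purely illustrative (not Captain Claw's actual tables):

```python
import sqlite3

# In-memory for the sketch; a real store would use a file so memory
# survives across sessions.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE apis (name TEXT PRIMARY KEY, endpoint TEXT, auth TEXT)")

def remember_api(name, endpoint, auth):
    """Record how an API was called so a later session can reuse it."""
    db.execute("INSERT OR REPLACE INTO apis VALUES (?, ?, ?)",
               (name, endpoint, auth))

def recall_api(name):
    """Return (endpoint, auth) if this API was used before, else None."""
    return db.execute("SELECT endpoint, auth FROM apis WHERE name = ?",
                      (name,)).fetchone()

remember_api("weather", "https://api.example.com/v1", "bearer token")
```

The point is less the storage than the lookup: before calling an API, the agent checks whether it has seen it before.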
Guards matter more than I expected. Input, output, and script/tool guards with configurable approval levels turn out to be essential when you're giving an agent shell access. Off by default, but easy to enable.
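The core of a tool guard is small: match the pending action against rules, and either allow it, block it, or escalate to the user. The levels and patterns below are assumptions for illustration, not Captain Claw's actual guard configuration:

```python
ALLOW, ASK, DENY = "allow", "ask", "deny"

# Illustrative rules, checked in order; first match wins.
RULES = [
    ("rm -rf", DENY),   # destructive commands are blocked outright
    ("curl",   ASK),    # network access needs explicit approval
]

def guard(command, approve=lambda cmd: False):
    """Return True if the command may run.

    `approve` stands in for the interactive approval prompt; by
    default nothing is approved.
    """
    for pattern, level in RULES:
        if pattern in command:
            if level == DENY:
                return False
            if level == ASK:
                return approve(command)
    return True  # unmatched commands fall through to the default: allow

guard("ls -la")                                     # allowed
guard("rm -rf /")                                   # denied
guard("curl example.com", approve=lambda c: True)   # approved interactively
```

The interesting design decision is the default for unmatched commands: allow-by-default keeps the agent useful, deny-by-default keeps it safe, and a configurable approval level sits between the two.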
Small-context models deserve better. The chunked processing pipeline came from a practical need: I wanted to run cheaper local models on large documents. Rather than failing, the pipeline splits content into chunks, processes each with full instructions, and synthesizes results. It works transparently.
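The shape of that pipeline is split, process each piece independently, then synthesize. A minimal sketch, with a word count standing in for the per-chunk model call:

```python
def chunk(text, size):
    """Split text into pieces that fit a small context window."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def process(piece):
    # Stand-in for a model call with full instructions on one chunk;
    # here we just count words.
    return len(piece.split())

def chunked_pipeline(text, size=100):
    """Process each chunk independently, then synthesize the results."""
    partials = [process(p) for p in chunk(text, size)]
    return sum(partials)  # synthesis step: combine per-chunk answers
```

In the real pipeline the synthesis step is another model call over the partial results, but the structure is the same: the caller never sees the chunking.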
How It Compares to OpenClaw
OpenClaw is impressive. It has amassed over 265k GitHub stars and deserves its popularity. It connects to WhatsApp, Telegram, Slack, Discord, iMessage, and over 20 other messaging platforms, and treats your phone as the primary interface. It's built around the idea of a personal AI assistant that lives in your chat apps.
Captain Claw is built around a different idea: your terminal and browser are the interface, and your work environment is the context.
Here's how they compare honestly:
| | Captain Claw | OpenClaw |
|---|---|---|
| Primary interface | Web UI + terminal | Messaging apps (WhatsApp, Telegram etc.) |
| Language | Python | TypeScript/Node.js |
| Sessions | Multi-session, named, persistent, mixed-model | Single session per channel |
| Models simultaneously | Yes — different model per session | No |
| Built-in tools | 29 (batteries included) | Extensible via skills marketplace |
| Orchestration | DAG mode, parallel sessions | Single agent |
| Datastore | SQLite-backed relational tables + web dashboard | No |
| Deep memory | Typesense hybrid search | No |
| Playbook distillation | Yes — rate sessions, auto-distill patterns | No |
| Agent-to-agent | BotPort network | No |
| Android | Termux (camera, GPS, torch) | iOS/Android voice nodes |
| Guards | Built-in, 3 levels, no plugins needed | Requires third-party (ClawBands) |
| Install | pip install captain-claw | npm install -g openclaw |
| Fully local | Yes (Ollama, no API key) | Yes |
The clearest way to put it: OpenClaw is a personal assistant you talk to through your phone. Captain Claw is a work agent you run in your development environment.
OpenClaw excels if you want to text your agent from WhatsApp while commuting. Captain Claw excels if you want to run parallel research sessions, orchestrate multi-step workflows across different models, query a relational datastore the agent maintains, or automate your dev environment with shell access and document extraction.
There's also a security angle worth noting. OpenClaw's public skills marketplace (ClawHub) was found to have roughly 12% malicious skills, including keyloggers and malware distributed under innocuous names. Captain Claw's skills are installed explicitly — no public registry, no auto-install from untrusted sources — and guards are built into the core runtime rather than requiring a separate plugin.
They solve adjacent problems. If you want a phone-first personal assistant, OpenClaw is excellent. If you want a developer-focused local agent runtime with persistent multi-session workflows and a full toolset out of the box, that's what Captain Claw is built for.
Try It
```
python -m venv venv && source venv/bin/activate
pip install captain-claw
captain-claw-web
```
The web UI starts at http://127.0.0.1:23080. Or use captain-claw --tui for the terminal UI. Docker image available too: kstevica/captain-claw.
Terminal demo: asciinema.org/a/814073
Video walkthrough: youtube.com/watch?v=4g_aA_WnEaw
GitHub: github.com/kstevica/captain-claw
It's MIT licensed, fully open source, and works completely locally with Ollama if you don't want to touch any cloud APIs.
I'd genuinely love to hear what people think — what's missing, what's confusing, what you'd actually use it for. The issues tab is open.