Most automation tools for Android follow the same pattern: you write a script, schedule it, and watch it run. The interface is the script itself.
We wanted something different. What if you could just tell your cloud phone what to do?
That's the core idea behind BashClaw — a conversational control layer built on top of QCCBot's cloud Android infrastructure. This post walks through how it works under the hood.
The Problem With Script-First Automation
Script-based automation is powerful, but it has a steep learning curve. You need to know the API, understand the device state, chain the right commands in the right order, and handle failures yourself.
For operators managing dozens of cloud phone instances — running TikTok warming cycles, Instagram engagement tasks, Telegram workflows — the overhead of scripting every action becomes the bottleneck. What they actually want is to express intent: "run a follow cycle on these 10 accounts" or "switch proxy and restart TikTok on device group A."
That's a natural language problem, not a scripting problem.
The Architecture: Four Layers
BashClaw sits between the user and the device. Here's how the layers connect:
1. LLM Layer — Intent Parsing
User input arrives as natural language. BashClaw routes it through an LLM to parse intent and determine which action to take. We designed this layer to be model-agnostic — users can connect their own model, with current support for ChatGPT, Claude, MiniMax, GLM, and Kimi. The LLM doesn't control the device directly; it's purely responsible for understanding what the user wants.
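Treating the model as a plain callable is one way to keep this layer model-agnostic. Here is a minimal sketch of that idea; the function names, prompt, and stub model are illustrative assumptions, not BashClaw's actual interface.

```python
import json
from typing import Callable

# Any provider (ChatGPT, Claude, MiniMax, GLM, Kimi) can sit behind
# the same signature: (system_prompt, user_text) -> reply text.
ChatFn = Callable[[str, str], str]

SYSTEM_PROMPT = (
    "Parse the user's request into JSON with keys "
    "'action' and 'params'. Reply with JSON only."
)

def parse_intent(chat: ChatFn, user_text: str) -> dict:
    """Ask the configured model to turn free text into a structured intent."""
    reply = chat(SYSTEM_PROMPT, user_text)
    return json.loads(reply)

# A stub stands in for a real provider in this demo.
def stub_model(system_prompt: str, user_text: str) -> str:
    if "restart TikTok" in user_text:
        return json.dumps({"action": "restart_app",
                           "params": {"app": "tiktok", "group": "A"}})
    return json.dumps({"action": "unknown", "params": {}})

intent = parse_intent(stub_model, "switch proxy and restart TikTok on group A")
```

Because the LLM only emits structured intent, nothing downstream cares which model produced it.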
2. Skills Layer — Capability Mapping
Once intent is parsed, the LLM loads the relevant cloud phone management Skills. Skills are structured capability definitions that map high-level intents to concrete device operations — think of them as the bridge between "what the user said" and "what the system knows how to do." This is where domain knowledge lives: how to launch an app, how to run a script, how to manage device groups.
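A Skill can be modeled as a named capability plus a function that turns parameters into a concrete device operation. The sketch below assumes a simple in-process registry; the skill names and the Android `am start` command are illustrative, not QCCBot's real API.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Skill:
    name: str
    description: str
    run: Callable[[dict], str]  # params -> concrete device command

SKILLS: Dict[str, Skill] = {}

def register(skill: Skill) -> None:
    SKILLS[skill.name] = skill

# Example skill: launching an app maps to an Android activity-manager call.
register(Skill(
    name="launch_app",
    description="Start an app on a cloud phone instance",
    run=lambda p: f"am start -n {p['package']}/.MainActivity",
))

def resolve(intent: dict) -> str:
    """Bridge 'what the user said' to 'what the system knows how to do'."""
    skill = SKILLS[intent["action"]]
    return skill.run(intent["params"])

cmd = resolve({"action": "launch_app",
               "params": {"package": "com.zhiliaoapp.musically"}})
```

The registry is where domain knowledge accumulates: adding a capability means registering a Skill, not retraining or re-prompting the model.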
3. Task System — Execution Queuing
Resolved actions are passed to the task system, which handles scheduling, prioritization, and batching across multiple devices. This layer decouples instruction from execution — the LLM doesn't wait for the device; it hands off to the queue and returns immediately. This matters at scale, when you're dispatching operations across many instances simultaneously.
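The "hand off and return immediately" behavior can be sketched as a priority queue that fans one resolved action out to many devices. This is a toy model under assumed semantics (lower number = higher priority), not the production task system.

```python
import heapq
import itertools

class TaskQueue:
    def __init__(self):
        self._heap = []
        self._counter = itertools.count()  # tiebreaker keeps FIFO order

    def dispatch(self, action: str, devices: list, priority: int = 5):
        """Enqueue one task per device and return without waiting."""
        for device in devices:
            heapq.heappush(self._heap,
                           (priority, next(self._counter), device, action))

    def pop(self):
        """Hand the next task to an executor; highest priority first."""
        priority, _, device, action = heapq.heappop(self._heap)
        return device, action

q = TaskQueue()
q.dispatch("restart_app tiktok", ["phone-01", "phone-02"], priority=1)
q.dispatch("screenshot", ["phone-03"], priority=5)
device, action = q.pop()
```

One `dispatch` call covering a whole device group is what makes batch operations cheap at the instruction layer.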
4. On-Device Executor — Action Runtime
Each cloud phone instance runs a built-in executor that receives tasks from the queue and carries them out locally. Scripts from QCCBot's Script Store — TikTok automation, YouTube engagement workflows, app lifecycle management — are executed at this layer. The executor reports status back up the chain, closing the loop.
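One iteration of such an executor loop might look like this: pull a task, run it locally, report status back up the chain. The transport and script runner below are stand-ins, not QCCBot's real interfaces.

```python
# Stand-in for invoking a Script Store script on the device.
def run_script(action: str) -> bool:
    return not action.startswith("fail")

def executor_step(pull, report) -> None:
    """Pull one task, execute it, and close the loop with a status report."""
    task_id, action = pull()
    ok = run_script(action)
    report(task_id, "done" if ok else "error")

statuses = {}
executor_step(
    pull=lambda: ("t-42", "tiktok_warming_cycle"),
    report=lambda tid, status: statuses.update({tid: status}),
)
```

Passing `pull` and `report` as callables keeps the executor testable without a live queue, which is also roughly why the real layers can evolve independently.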
Why Model-Agnostic Matters
Locking users into a single LLM creates dependency risk. Models update, pricing changes, regional availability varies. By treating the LLM as a replaceable component — interfaced through a consistent Skills layer — BashClaw stays functional regardless of which model is underneath.
It also means operators in different regions can use models they already have access to and trust. GLM and Kimi, for instance, are widely used in contexts where OpenAI access is restricted.
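The replaceable-component idea reduces to a common adapter interface: each provider implements one method, and which model to use becomes configuration. A minimal sketch with assumed class and method names:

```python
class ModelAdapter:
    """Common interface every provider adapter implements."""
    def call(self, prompt: str) -> str:
        raise NotImplementedError

class EchoAdapter(ModelAdapter):
    """Stand-in for a real provider (ChatGPT, Claude, GLM, Kimi...)."""
    def call(self, prompt: str) -> str:
        return f"parsed:{prompt}"

ADAPTERS = {"echo": EchoAdapter}

def get_model(name: str) -> ModelAdapter:
    # The name comes from config, so region or pricing constraints
    # are a one-line change rather than a code change.
    return ADAPTERS[name]()

reply = get_model("echo").call("run a follow cycle")
```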
One-Click Deployment
The entire BashClaw stack deploys to the cloud environment in a single step. No local setup, no dependency management. Once deployed, the conversational interface is live and the executor is active on connected devices.
The goal was to make the gap between "I have a cloud phone" and "I can automate it conversationally" as small as possible.
What's Next
BashClaw is actively in development. The Skills library is expanding, and we're working on deeper integration between the task system and QCCBot's proxy and device management layers.
If you're building automation workflows on cloud Android — or just curious about LLM-to-device control architectures — we'd be glad to hear your thoughts.