TL;DR: If you have a Google AI Ultra subscription, you are sitting on a practically unlimited pool of background AI agents. I built an open-source tool, Agent-Pool-MCP, that lets your main IDE agent delegate routine tasks to background Gemini CLI workers. The best part -- you can use Claude as the main agent and Gemini as the worker pool, getting real cross-model consensus on a single subscription.
Hey everyone. Many of us are paying for premium AI subscriptions. They are not cheap -- usually around $20/month or more. If you use AI for coding every day, you know the pain of hitting message limits in tools like Cursor right when you need them most.
But what if you could offload routine work to a practically unlimited background pool of agents, all covered by your existing Google AI Ultra subscription?
I tried most of the alternatives -- maxed out Cursor, then moved to Claude. Then I found Antigravity IDE, and that is what got me to subscribe to Google AI Ultra. The reason is simple: it is the only IDE where one subscription gives you both top-tier Claude Opus as the main agent and Gemini with limits that are nearly impossible to exhaust. In Cursor, you burn through limits fast even on the highest plan, and then you wait or pay more.
If you ever hit the daily Claude limit -- just switch the main agent to Gemini and keep going. Best Anthropic model as the orchestrator, practically unlimited Gemini as the background worker pool, all on one payment.
On top of that, you get the Google ecosystem as a bonus: Deep Research, NotebookLM, video generators, and the lightweight Nano models. Plus Stitch MCP -- Google's own MCP server for UI generation. Combined with the agent pool, you get the effect of a full product team: one agent builds UI through Stitch, while others work on the backend or write business logic.
For those new to the term, MCP (Model Context Protocol) is an open standard that allows AI models to securely connect with local tools, files, and external services. It essentially gives your AI a standardized API to interact with your computer.
The Single Agent Bottleneck
Modern IDEs can run some sub-agents, but they lack true flexibility. You usually cannot customize their workflows or have them split into specialized teams.
I wanted fractal orchestration. I wanted my main agent to break down a large refactoring task, spin up a team of background workers, and have them execute in parallel. This is especially useful for isolating tasks in a secure environment.
The Solution: Agent-Pool-MCP
To fix this, we wrote a custom MCP server that acts as a worker pool. It dispatches tasks to background Gemini CLI agents.
This operates on a PULL model. Background tasks do not block your main IDE agent. You tell your main agent what you want, and it decides whether to delegate the task to a worker via delegate_task, consult another model via consult_peer, or both. It gets a task_id and moves on to other things.
Here is an example of a Research -> Consult -> Refactor pipeline:
# 1. IDE agent kicks off a background analysis of legacy components
delegate_task_readonly("Analyze src/components/ for outdated React hooks...") -> task_1
# 2. While the worker analyzes, the IDE agent continues with other tasks
# 3. Before refactoring -- consult a different model
consult_peer("I propose rewriting components from React to Symbiote.js. Here's the plan...") -> verdict
# 4. Gets AGREE or SUGGEST_CHANGES, then delegates the refactoring
delegate_task("Rewrite UserProfile.jsx from React to vanilla Symbiote.js...") -> task_2
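When the main agent actually needs the analysis, it collects it by id. In the same pseudocode style (get_task_result is the pool's status/result call, mentioned again below):

```
# 5. Later, fetch the background result by id
get_task_result(task_1) -> findings   # the worker's report, or a "still running" status
```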
We have a strict rule to prevent file conflicts: no two agents touch the same file. They communicate through a sync directory at .agents/delegation/.
.agents/delegation/
├── findings-react-legacy.md - Research worker writes here
├── architecture-symbiote.md - Migration proposal from main agent
└── review-symbiote-patterns.md - Pattern audit by a third agent
One agent writes its findings there, and others read them.
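The no-shared-files rule is easy to enforce mechanically. Here is a minimal sketch of the idea (a hypothetical helper, not agent-pool-mcp's actual API): before dispatching, check that a new task's file set does not overlap any in-flight task's files.

```python
# Hypothetical conflict check illustrating the "no two agents touch the
# same file" rule; the tool's real implementation may differ.

def has_conflict(in_flight, new_files):
    """Return the id of a conflicting in-flight task, or None if disjoint."""
    for task_id, files in in_flight.items():
        if files & new_files:
            return task_id
    return None

in_flight = {
    "task_1": {"src/components/UserProfile.jsx"},
    "task_2": {"src/components/Avatar.jsx"},
}

# A refactor touching Avatar.jsx collides with task_2
print(has_conflict(in_flight, {"src/components/Avatar.jsx"}))  # task_2
# The sync directory is always safe: each agent owns its own file there
print(has_conflict(in_flight, {".agents/delegation/findings-react-legacy.md"}))  # None
```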
The real killer feature here is cross-model consensus using consult_peer. Since the background pool runs on Gemini CLI, the workers are always Gemini. To get real cross-model consensus -- not Gemini talking to itself -- the main IDE agent should be a different model. That is why we use Claude through Antigravity: when it faces a hard architectural decision, it writes a proposal and sends it to a background Gemini. Two fundamentally different architectures validate the idea BEFORE any code changes are made. Claude proposes, Gemini looks for blind spots, returning either SUGGEST_CHANGES or AGREE. And all of this runs on a single subscription.
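In practice the consensus step is a loop: propose, collect a verdict, revise, repeat until the peer agrees or you escalate. A sketch with a stubbed-out peer (the stub stands in for the real cross-model call; only the AGREE / SUGGEST_CHANGES verdict strings come from the tool):

```python
# Sketch of the propose/review loop behind consult_peer.
# consult_peer_stub is a toy reviewer, not the real background Gemini call.

def consult_peer_stub(proposal):
    """Pretend peer: objects until the proposal includes a migration plan."""
    if "migration plan" in proposal:
        return "AGREE", ""
    return "SUGGEST_CHANGES", "Add a step-by-step migration plan first."

def reach_consensus(proposal, max_rounds=3):
    for _ in range(max_rounds):
        verdict, feedback = consult_peer_stub(proposal)
        if verdict == "AGREE":
            return proposal          # only now is code delegated for execution
        proposal += f"\nRevision: {feedback} -> added migration plan."
    raise RuntimeError("No consensus; escalate to the human.")

final = reach_consensus("Rewrite components from React to Symbiote.js.")
print("migration plan" in final)  # True
```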
Fractal Orchestration
You can customize how agents interact. We call one of our favorite setups "fractal orchestration."
The structure repeats like a pyramid. You create a main orchestrator, and it spawns teams. Each team can have its own orchestrator, which spawns more workers. You decide the depth.
It looks exactly like a standard development company:
IDE agent (Claude)
└─ Project orchestrator (Gemini CLI)
   ├─ Backend team
   │  ├─ Backend orchestrator (Gemini CLI)
   │  ├─ Worker: API logic
   │  └─ Worker: tests
   └─ Frontend team
      ├─ Frontend orchestrator (Gemini CLI)
      ├─ Worker: components
      └─ Worker: styles
Each orchestrator uses an orchestrator.md skill to break down tasks and call delegate_task. The workers do their jobs in isolation, and the results flow back up.
Technical detail: if you configure agent-pool-mcp in the Gemini CLI settings just once, your background agents can recursively spawn new agents to any depth.
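The recursion itself is simple. Here is a toy in-process model of the fan-out (plain functions stand in for real Gemini CLI workers, and the fixed backend/frontend split is illustrative):

```python
# Toy model of fractal orchestration: each orchestrator splits its task
# and delegates one level down, bottoming out in leaf workers. In the
# real setup each delegation is a delegate_task call to a background agent.

def orchestrate(task, depth):
    if depth == 0:
        return [f"done: {task}"]            # leaf worker executes the task
    results = []
    for part in ("backend", "frontend"):    # orchestrator splits the work
        results += orchestrate(f"{task}/{part}", depth - 1)
    return results                          # results flow back up the pyramid

results = orchestrate("ship-feature", depth=2)
print(len(results))  # 4 leaf workers at the bottom of a depth-2 pyramid
```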
Managing Agent Attention
Asynchronous tasks have a catch. IDE system prompts usually tell the agent to "be proactive" and "finish your tasks." This conflicts with parallel background work. Without tweaks, you run into two issues:
- The agent waits instead of working. It spams get_task_result in a loop instead of picking up a new task.
- The agent gets impatient. It thinks the background task is taking too long and just does the work itself, duplicating the effort.
We solved this with an on_wait_hint parameter in delegate_task. You pass an instruction that gets fed back to the agent every time it checks the status.
- "The worker is still writing code. Do not wait -- go analyze style-guide.md" -> The agent switches context.
- "The worker is still processing data. Wait for the result, do not try to do it yourself" -> The agent waits patiently.
It is a simple way to override the IDE's default behavior and control the model's focus.
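Putting it together, a delegation with a hint looks like this (same pseudocode style as the pipeline above):

```
# IDE agent delegates and pins its own follow-up behavior
delegate_task(
    prompt: "Rewrite UserProfile.jsx from React to vanilla Symbiote.js...",
    on_wait_hint: "The worker is still writing code. Do not wait -- go analyze style-guide.md"
)
```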
Skills and Workflows
The pool has a built-in customization system using standard .md files. No complex YAML configs.
Skills (.gemini/skills/*.md) define an agent's role and rules. Write a code-reviewer.md with your checklist, and any worker using that skill will review code exactly to your standards.
Workflows (.agents/workflows/*.md) are step-by-step pipelines. You describe the process once, and any agent can follow it.
You can attach a skill to a task with a single parameter:
# IDE agent activates the "code-reviewer" skill for a delegated task
delegate_task(
prompt: "Review src/auth/ against project standards",
skill: "code-reviewer"
)
You are basically defining your team's expertise in text files.
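For example, a minimal code-reviewer.md might look like this. The headings and verdict keywords here are illustrative, not required by the tool; a skill is just plain markdown:

```markdown
# Skill: code-reviewer

## Role
You review code against project standards. You never edit files yourself.

## Checklist
- No new dependencies without justification
- Public functions have doc comments
- Errors are handled, not swallowed

## Output
Write your review to a file in .agents/delegation/ and end it with
APPROVE or REQUEST_CHANGES.
```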
Quick Start
You can set this up in a couple of commands:
# 1. Install Gemini CLI globally and log in (requires Google AI Ultra subscription)
npm install -g @google/gemini-cli
gemini --login
# 2. Run diagnostics to check Node.js, CLI, and access:
npx agent-pool-mcp --check
Then add the pool to your IDE config:
{
  "mcpServers": {
    "agent-pool": {
      "command": "npx",
      "args": ["-y", "agent-pool-mcp"]
    }
  }
}
Security
Treat any MCP server as a potential risk -- they can be vectors for prompt injections. Pin your versions and only update after auditing the changes. Use approval modes and restrict agent file system access. Never pass secrets in prompts, and always review the results before merging.
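Pinning is a one-line change to the config above. The version number here is a placeholder -- use whichever release you actually audited:

```json
{
  "mcpServers": {
    "agent-pool": {
      "command": "npx",
      "args": ["-y", "agent-pool-mcp@1.2.3"]
    }
  }
}
```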
P.S. If you want to see how everything works under the hood, check out the code on GitHub or install it from npm.