GPT-4, Claude 3.5, Gemini... Are you still keeping 10 AI tabs open in your browser, endlessly copy-pasting code between your IDE and chat UIs?
It's time to end the "Tab Hell."
Let me give you a slightly hot take: Just changing a system prompt to say "You are a reviewer" while hitting the exact same expensive model backend is not a true "Multi-Agent" system. That’s why I built multi-ai-cli. It’s a lightweight, Python-powered orchestrator designed to turn your terminal into a true multi-model battlefield.
The v0.13.0 Highlight: Agent / Engine Separation
Instead of just swapping prompts, we’ve completely separated the physical AI providers (Engines) from their logical roles (Agents).
Just edit the ~/.multi-ai/config.ini file that's auto-generated on your first run:
# Example: Mapping Engines to Agents
ENGINE.openai_main = gpt-4o
ENGINE.local_coder = qwen2.5-coder:14b # Yes, Ollama works perfectly!
AGENT.gpt.architect = openai_main
AGENT.local.code = local_coder
Now you can route heavy architectural tasks to the smartest cloud models, and offload simple coding tasks to local models. Spin up Ollama’s qwen2.5-coder:14b locally, and you get a fully offline, API-key-free multi-AI experience! No vendor lock-in. You can freely switch to the optimal backend for each specific role.
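With that mapping in place, routing is just a matter of addressing the right agent. Here's a hypothetical session sketched from the config above (the agent names `gpt.architect` and `local.code` are the ones we just defined, not built-ins):

```
# Heavy design work goes to the smartest cloud model...
@gpt.architect Design a plugin system for the CLI adapters

# ...while routine boilerplate goes to the free local model
@local.code Write a dataclass for the config entries above
```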
Core Concept: "Blackboard" Memory & Avoiding API Bankruptcy
When orchestrating multiple AIs, if you keep feeding a bloated conversation history to the APIs, you will face API bankruptcy in no time. To solve this, I split the memory into two layers:
- 🧠 Short-Term Memory (@scrub)
My most-used command. It flushes the messy conversation history instantly while keeping the Persona (system prompt) intact. It stops the AI from hallucinating on old context and saves your wallet.
% @sequence -e
[*] Editor prompt captured (182 chars, 13 lines):
--- Preview ---
[
@gpt Remember exactly this token: ZEBRA-9182
||
@gemini Remember exactly this token: ZEBRA-9182
]
-> @scrub gpt ->
[
@gpt What is the token?
||
@gemini What is the token?
]
--- End Preview ---
[*] Sequence Execution: 3 steps detected.
==================================================
[*] Executing Step 1/3 [PARALLEL: 2 tasks]...
Task 1: @gpt Remember exactly this token: ZEBRA-9182
Task 2: @gemini Remember exactly this token: ZEBRA-9182
--- GPT ---
Okay — I’ll remember this token exactly:
ZEBRA-9182
--- Gemini ---
I have memorized the token exactly: **ZEBRA-9182**.
Just let me know whenever you need me to recall it!
[✓] Step 1/3 completed (all parallel tasks done).
--------------------------------------------------
[*] Executing Step 2/3...
Command: @scrub gpt
[*] GPT memory scrubbed.
[✓] Step 2/3 completed successfully.
--------------------------------------------------
[*] Executing Step 3/3 [PARALLEL: 2 tasks]...
Task 1: @gpt What is the token?
Task 2: @gemini What is the token?
--- GPT ---
Which token do you mean?
If you mean:
- API token: I can’t see your secrets or account tokens.
- A “token” in text/LLMs: it’s a chunk of text a model processes, often a word or part of a word.
- Auth/session token in an app: it’s a credential used to prove identity.
Tell me the context and I’ll answer precisely.
--- Gemini ---
The token is ZEBRA-9182.
[✓] Step 3/3 completed (all parallel tasks done).
==================================================
[✓] Sequence Execution complete. All 3 steps succeeded.
%
- 💾 Long-Term Blackboard Memory (-r/-w)
Save the AI's output to local files (-w), and feed them into different models later (-r). The real magic here is State Recovery. If an automated pipeline fails halfway through, you don't have to start over. You just read the last saved file and design a new flow to recover from that exact point.
% @sequence -e
[*] Editor prompt captured (225 chars, 7 lines):
--- Preview ---
@sh "echo '<p>Hello World</p>'" -w raw.html
->
@gpt "Extract the text from this HTML" -r raw.html -w text.txt
->
@claude "Translate this text into Japanese" -r text.txt
->
@gemini "Translate this text into French" -r text.txt
--- End Preview ---
[*] Sequence Execution: 4 steps detected.
==================================================
[*] Executing Step 1/4...
Command: @sh 'echo '"'"'<p>Hello World</p>'"'"'' -w raw.html
[*] @sh: Executing: echo '<p>Hello World</p>'
[✓] @sh: SUCCESS (exit code: 0, 12.0ms)
--- stdout ---
<p>Hello World</p>
--- end stdout ---
[*] @sh: Artifact saved to 'raw.html' (format: text).
[✓] Step 1/4 completed successfully.
--------------------------------------------------
[*] Executing Step 2/4...
Command: @gpt 'Extract the text from this HTML' -r raw.html -w text.txt
[*] Result saved to 'text.txt' (mode: raw).
[✓] Step 2/4 completed successfully.
--------------------------------------------------
[*] Executing Step 3/4...
Command: @claude 'Translate this text into Japanese' -r text.txt
--- Claude ---
--- [File: text.txt] ---
こんにちは世界
--- [End of File: text.txt] ---
[✓] Step 3/4 completed successfully.
--------------------------------------------------
[*] Executing Step 4/4...
Command: @gemini 'Translate this text into French' -r text.txt
--- Gemini ---
Bonjour le monde
[✓] Step 4/4 completed successfully.
==================================================
[✓] Sequence Execution complete. All 4 steps succeeded.
%
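To make the State Recovery point concrete: suppose the Claude step above had failed (rate limit, network hiccup, whatever). Since text.txt is already on disk, a recovery run only needs the remaining steps. This is a sketch using the same syntax as above, not output from a real session:

```
% @sequence -e
@claude "Translate this text into Japanese" -r text.txt
->
@gemini "Translate this text into French" -r text.txt
```

No re-fetching, no re-extracting: you resume from the last saved artifact.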
The Workflows: Terminal Superiority
1. Parallel Orchestration with HAN Syntax (@sequence)
Write human-AI hybrid workflows like code. Use -> for sequential steps, and [ A || B ] for parallel execution.
# Fetch code from GitHub, run parallel reviews, and merge the results
@sequence
-> @github.file --repo "myproj/repo" --path "app.py" -w code.md
-> [ @claude.review "Find bugs" -r code.md -w claude_review.md
|| @gemini.plan "Optimization ideas" -r code.md -w gemini_opt.md
|| @gpt.code "Add test cases" -r code.md -w tests.py ]
-> @gpt "Merge the 3 reviews above and create the final version" \
-r claude_review.md -r gemini_opt.md -r tests.py -w final.py
(💡 Pro Tip: Open another terminal and run tail -f logs/chat.log. You get a real-time HUD monitoring all AI conversations as they happen! Debugging is an absolute breeze.)
2. Feed External Commands to AI (@sh)
# Example 1: Preprocess text before passing it to an AI agent
@sh "cat raw.html | sed 's/<[^>]*>//g'" -w text.txt
# Example 2: Inspect local project files or command output
@sh "ls -la src"
@sh "git diff --stat"
# Example 3: Run local scripts or test suites seamlessly within your workflow
@sh "python scripts/build_index.py"
@sh "pytest tests/"
This is the ultimate terminal advantage that browsers can't touch. Pipe the output of any CLI tool directly into the AI.
# Example 4: Run linters and let Claude fix the errors
@sh "flake8 app.py" -w lint.md
->
@claude "Fix all these lint errors" -r lint.md -r app.py -w fixed.py
The "Read-Only" Philosophy: Preventing Repo Disasters
Even with all this power, our adapters (like GitHub and Figma) are strictly Read-Only. This is a deliberate safety-by-design choice.
We all dread the nightmare of an autonomous agent going rogue and git push-ing broken code while you're getting coffee.
Analysis and generation are the AI's job. The final Write (commit) is your responsibility.
By keeping the human in the loop, you maintain absolute control over your codebase. This makes it a tool you can actually trust in a real-world workflow.
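In practice, the final human "write" step is just ordinary git hygiene: review the AI's artifact, then commit it under your own name. A minimal sketch (plain git, nothing tool-specific; it builds a throwaway repo so you can run it safely, and the file names match the lint-fix example above):

```shell
#!/usr/bin/env bash
# Demo of the human-in-the-loop "write": review, accept, commit yourself.
set -e
workdir=$(mktemp -d)
cd "$workdir"
git init -q .

# Pretend app.py is your original and fixed.py came from the AI (-w fixed.py)
printf 'print( "hello" )\n' > app.py
git add app.py
git -c user.name=me -c user.email=me@example.com commit -qm "initial"
printf 'print("hello")\n' > fixed.py

# 1. Review: eyeball the diff before accepting anything
diff -u app.py fixed.py || true

# 2. Accept: the human performs the actual write
cp fixed.py app.py

# 3. Commit: your name goes on it, not the agent's
git add app.py
git -c user.name=me -c user.email=me@example.com \
    commit -qm "Apply AI-suggested lint fixes (reviewed)"
git log --oneline
```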
Try It Out! 🚀
multi-ai-cli is currently at v0.13.0. To avoid registry bloat and keep things blazingly fast, you have two hacker-friendly ways to get started:
Option 1: The Lightning-Fast Source Way (Recommended for Python users)
git clone https://github.com/ashiras/multi-ai-cli.git
cd multi-ai-cli
uv sync # Install dependencies
uv run multi-ai --version # Verify the installation
# Run it like this from now on!
# uv run multi-ai "@gpt Hello world"
Option 2: The Clean Binary Way (No Python Required)
Don't want to mess with environments at all? You can download the latest pre-built binary directly from our GitHub Releases (macOS / Linux / Windows supported). It's a zero-dependency single file!
# Example for macOS / Linux
curl -L -o multi-ai https://github.com/ashiras/multi-ai-cli/releases/download/v0.13.0/multi-ai
chmod +x multi-ai
sudo mv multi-ai /usr/local/bin/ # Or to ~/bin/ etc.
multi-ai --version
(Note: Please check the Releases page for the exact URL for your specific OS!)
I can never go back to juggling 10 AI browser tabs. Upgrade your terminal into the ultimate multi-AI battlefield today!
- GitHub: ashiras/multi-ai-cli (Stars and issues are highly appreciated!)
- Feedback: What adapters do you want to see next? Jira? Notion? Let me know in the comments below! 👇