ryoto miyake

Posted on Jul 7 • Originally published at qiita.com

I Built CGMB: An MCP That Unifies Claude Code, Gemini CLI, and Gemini API

#ai #claude #gemini #mcp

Introduction

As English is not my first language, this post has been carefully translated and refined from the original Japanese version I wrote, which is available on Qiita.

While exploring tool development with Gemini CLI, I noticed that Google offers generous free usage quotas. I thought that combining these with Claude Code could produce an MCP server that fills in capabilities Claude Code lacks (image generation, audio synthesis, etc.), broadening what AI can be used for while keeping costs low. This led to the development of CGMB.

CGMB (Claude-Gemini Multimodal Bridge)

What is CGMB? An MCP Server for Optimal Claude-Gemini Integration

🎯 Overview

CGMB interprets the intent behind each request and automatically routes the task to the most suitable AI layer.

Automatic switching between 3 AI layers

Layer Name	Functionality	Application Scenarios
Claude Code	Logic processing, long-text summarization, code analysis	Advanced reasoning, long-form responses
Gemini CLI	Current information, URL analysis	Latest news retrieval, web search
Gemini API	Image/audio/file generation	Multimodal generation

Note: For convenience, the Gemini API is referred to as AI Studio in the code.

Feature List

Feature	Description
🧠 Auto Routing	Analyzes prompts for PDFs, URLs, image-generation instructions, etc., and automatically routes each request to the optimal AI.
🖼️ Multimodal	Supports text-to-image generation and audio synthesis via the Gemini API.
🔄 Stabilization Technology	Speeds up requests by caching credentials, and retries requests that fail with errors.
💬 Claude Code Integration	Can be invoked directly from within Claude Code with natural language prompts like `CGMB <your request>`.
🔐 Secure Authentication	Supports secure API key management through `.env` files and OAuth integration.
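The stabilization row mentions credential caching and retries. A minimal TypeScript sketch of both ideas follows, assuming credentials are loaded by an async function and that failed calls are safe to retry; `CredentialCache`, `withRetry`, the TTL, and the backoff schedule are all hypothetical, not CGMB's actual code.

```typescript
// Hypothetical helpers illustrating credential caching and retry with backoff.

/** Caches a credential so the (potentially slow) load runs at most once per TTL. */
class CredentialCache<T> {
  private cached: { value: T; loadedAt: number } | undefined;
  private readonly load: () => Promise<T>;
  private readonly ttlMs: number;

  constructor(load: () => Promise<T>, ttlMs: number) {
    this.load = load;
    this.ttlMs = ttlMs;
  }

  async get(): Promise<T> {
    const now = Date.now();
    if (this.cached && now - this.cached.loadedAt < this.ttlMs) {
      return this.cached.value; // cache hit: skip the credential load
    }
    const value = await this.load();
    this.cached = { value, loadedAt: now };
    return value;
  }
}

/** Retries an async operation, doubling the delay after each failure. */
async function withRetry<T>(
  op: () => Promise<T>,
  maxAttempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await op();
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts - 1) {
        // Waits baseDelayMs, then 2x, 4x, ... between attempts
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise<void>((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError; // every attempt failed: surface the last error
}
```

Exponential backoff gives a briefly rate-limited or flaky backend time to recover instead of hammering it; a real implementation would likely retry only errors known to be transient.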

Unified AI Experience

Everything happens within Claude Code:
"CGMB Search for the latest AI papers, summarize them, and generate related concept diagrams"

→ Automatic coordination of 3 AIs:
  1. Gemini CLI searches for latest papers
  2. Claude Code analyzes and summarizes content
  3. AI Studio (Gemini API) generates concept diagrams

Everything is completed in one interface (time required: about one-third of the traditional approach).
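The coordination in that example is essentially a sequential pipeline in which each layer's output becomes the next layer's input. The sketch below shows only that chaining pattern; `runPipeline`, the `Step` shape, and the stubbed backend calls are hypothetical stand-ins, not CGMB's real orchestration code.

```typescript
// Hypothetical orchestration sketch: each backend call is a stub, not a real API call.
type Backend = "gemini-cli" | "claude-code" | "ai-studio";

interface Step {
  backend: Backend;
  run: (input: string) => Promise<string>;
}

interface StepOutput {
  backend: Backend;
  output: string;
}

/** Runs steps in order, feeding each step's output into the next one. */
async function runPipeline(steps: Step[], initialInput: string): Promise<StepOutput[]> {
  const trace: StepOutput[] = [];
  let current = initialInput;
  for (const step of steps) {
    current = await step.run(current);
    trace.push({ backend: step.backend, output: current });
  }
  return trace;
}

// Stubs standing in for the three layers in the example request above
const paperWorkflow: Step[] = [
  { backend: "gemini-cli", run: async (query) => `search results for "${query}"` },
  { backend: "claude-code", run: async (results) => `summary of ${results}` },
  { backend: "ai-studio", run: async (summary) => `diagram based on ${summary}` },
];
```

Keeping a trace of every step's output makes it easy to show intermediate results to the user or to pinpoint which layer failed.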

Installation and Initial Setup

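CGMB (Claude-Gemini Multimodal Bridge): see the GitHub repository goodaymmm/claude-gemini-multimodal-bridge for installation and initial setup instructions.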

Technical Deep Dive: Behind the Scenes

Intelligent Routing

The most complex implementation in CGMB was the request routing logic and its associated error handling.
For natural language prompts from users (e.g., "Summarize /path/to/report.pdf"), CGMB internally makes the following decisions:

Input Analysis: Analyzes strings, file paths, and URLs contained in prompts using regular expressions and pattern matching.
Resource Type Identification: Identifies whether the input is a local PDF, a web URL, or plain text.
Optimal Backend Selection: Routes each request to the best-suited backend service (PDFs to Claude, web pages to Gemini CLI, image generation instructions to Gemini API, etc.).
State Management and Fallback: If a specified backend (e.g., Gemini CLI) is in an authentication error state, processing is halted and a clear error message is returned to the user. This prevents unexpected behavior or incomplete processing results.
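The four steps above can be sketched in a few lines of TypeScript. This is a minimal illustration under stated assumptions: the regular expressions, the backend identifiers, and the choice to send plain text to Claude Code are hypothetical, not CGMB's actual rules.

```typescript
// Hypothetical routing sketch: patterns, backend IDs, and the text -> Claude Code
// default are illustrative assumptions, not CGMB's actual rules.
type Backend = "claude-code" | "gemini-cli" | "ai-studio";
type ResourceType = "url" | "pdf" | "image-generation" | "text";
type AuthState = Record<Backend, "ok" | "auth-error">;

type RouteResult =
  | { ok: true; backend: Backend; resource: ResourceType }
  | { ok: false; error: string };

const URL_PATTERN = /https?:\/\/\S+/i;
const PDF_PATTERN = /\S+\.pdf\b/i;
const IMAGE_GEN_PATTERN =
  /\b(?:generate|create|draw)\b.*\b(?:image|diagram|illustration|picture)s?\b/i;

// Steps 1-2: inspect the prompt and identify the resource type
function identifyResource(prompt: string): ResourceType {
  if (URL_PATTERN.test(prompt)) return "url";
  if (PDF_PATTERN.test(prompt)) return "pdf";
  if (IMAGE_GEN_PATTERN.test(prompt)) return "image-generation";
  return "text";
}

// Step 3: map each resource type to a backend
const ROUTES: Record<ResourceType, Backend> = {
  url: "gemini-cli",
  pdf: "claude-code",
  "image-generation": "ai-studio",
  text: "claude-code",
};

// Step 4: halt with a clear message if the chosen backend is not authenticated
function routeRequest(prompt: string, auth: AuthState): RouteResult {
  const resource = identifyResource(prompt);
  const backend = ROUTES[resource];
  if (auth[backend] !== "ok") {
    return {
      ok: false,
      error: `${backend} is in an authentication error state; re-authenticate and try again.`,
    };
  }
  return { ok: true, backend, resource };
}
```

Returning an explicit error result rather than silently switching backends matches the halt-and-report behavior described in step 4.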

By implementing this series of processes robustly, users can focus on their tasks without being aware of backend complexity.

Why an MCP Server?

By adopting MCP, CGMB achieved the following:

Future-proofing: As the MCP ecosystem grows, integration with other tools becomes possible
Standards compliance: Adherence to industry standards rather than proprietary protocols

Because MCP is a standard protocol, CGMB functions not as an isolated tool but as part of a growing ecosystem, with the potential to work alongside tools that may emerge in the future.

Unifying Different AI Services

The biggest challenge faced during development was unifying the response formats of different AI services.

Claude Code, Gemini CLI, and Gemini API each have different:

Response formats: JSON, plain text, binary data
Error handling: HTTP status codes, exceptions, custom errors
Asynchronous processing: Promises, callbacks, streaming
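Of these, the asynchronous differences are the most mechanical to smooth over: every style can be wrapped so that it returns a Promise. The sketch assumes callbacks follow the Node-style `(err, result)` convention and that streams arrive as async iterables of text chunks; `fromCallback` and `fromStream` are hypothetical helpers, not part of CGMB.

```typescript
// Hypothetical adapters: give callback- and stream-style APIs the same Promise shape.
type NodeStyleCallback<T> = (err: Error | null, result?: T) => void;

/** Wraps a callback-style function in a Promise. */
function fromCallback<T>(fn: (cb: NodeStyleCallback<T>) => void): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    fn((err, result) => {
      if (err) reject(err);
      else resolve(result as T);
    });
  });
}

/** Collects a streamed response (async iterable of text chunks) into one string. */
async function fromStream(chunks: AsyncIterable<string>): Promise<string> {
  let text = "";
  for await (const chunk of chunks) {
    text += chunk;
  }
  return text;
}
```

Buffering a stream into a single string gives up incremental output, which is a reasonable trade when the result is handed back as one tool response.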

To absorb these differences, we implemented a unified interface layer.
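One way such a layer can look is a single result type that every backend's raw output is converted into, covering both successful responses and errors. In this minimal sketch, the `RawResponse` variants, the `UnifiedResult` fields, and the assumption that JSON payloads carry a `text` field are all hypothetical, not CGMB's real interface.

```typescript
// Hypothetical unified result type; CGMB's real interface may differ.
type Backend = "claude-code" | "gemini-cli" | "ai-studio";

type RawResponse =
  | { kind: "json"; body: string } // e.g. an HTTP API returning JSON
  | { kind: "text"; body: string } // e.g. CLI stdout
  | { kind: "binary"; body: Uint8Array; mimeType: string }; // e.g. generated image/audio

interface UnifiedResult {
  backend: Backend;
  success: boolean;
  text?: string;
  data?: { bytes: Uint8Array; mimeType: string };
  error?: string;
}

/** Converts any raw response into the unified shape (JSON.parse may throw; see normalizeError). */
function normalize(backend: Backend, raw: RawResponse): UnifiedResult {
  switch (raw.kind) {
    case "json": {
      const parsed: unknown = JSON.parse(raw.body);
      // Assumes the payload carries its text in a `text` field; otherwise keep the raw JSON
      const text =
        typeof parsed === "object" && parsed !== null && "text" in parsed
          ? String((parsed as { text: unknown }).text)
          : raw.body;
      return { backend, success: true, text };
    }
    case "text":
      return { backend, success: true, text: raw.body.trim() };
    case "binary":
      return { backend, success: true, data: { bytes: raw.body, mimeType: raw.mimeType } };
  }
}

/** Converts exceptions, HTTP-status objects, or anything else thrown into the unified shape. */
function normalizeError(backend: Backend, err: unknown): UnifiedResult {
  let message: string;
  if (err instanceof Error) message = err.message;
  else if (typeof err === "object" && err !== null && "status" in err)
    message = `HTTP ${(err as { status: unknown }).status}`;
  else message = String(err);
  return { backend, success: false, error: message };
}
```

Because callers only ever see `UnifiedResult`, code further up (routing, MCP responses) never needs to know which backend produced a given result.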

Additionally, to improve image generation quality, we implemented automatic English translation of non-English prompts. Since the Gemini API's image generation produces the best results with English prompts, CGMB internally translates prompts written in Japanese or other languages into English. This lets users make requests naturally in their native language while still getting high-quality images.
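A minimal sketch of that translation step, assuming a simple heuristic (any letter outside ASCII marks the prompt as non-English) and an injected `translate` function standing in for whatever model call performs the translation. Both the heuristic and `toEnglishPrompt` are hypothetical, not CGMB's implementation.

```typescript
// Hypothetical sketch: `Translator` stands in for the model call that does the translation.
type Translator = (text: string, targetLang: "en") => Promise<string>;

// Heuristic: any letter that is not A-Z/a-z (kana, kanji, accented Latin, Cyrillic, ...)
const NON_ENGLISH_LETTER = /(?![A-Za-z])\p{L}/u;

/** Returns an English prompt, translating only when non-ASCII letters are present. */
async function toEnglishPrompt(prompt: string, translate: Translator): Promise<string> {
  if (!NON_ENGLISH_LETTER.test(prompt)) {
    return prompt; // already English: send as-is and skip the extra model call
  }
  return translate(prompt, "en");
}
```

Skipping translation for prompts that are already English avoids an extra round trip; a production version might use a proper language-detection step instead of this character-based check.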

By abstracting the functionality of each AI service to match the structured tool definitions required by the MCP protocol, users can now have a consistent experience without being aware of the implementation details of different AI services.
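For reference, the MCP specification describes each tool to the client with a `name`, a `description`, and a JSON Schema `inputSchema`, and a `tools/call` response carries a `content` array plus an optional `isError` flag. The sketch writes those shapes as plain TypeScript types so it runs without the MCP SDK; the `generate_image` tool and the `toToolResult` helper are hypothetical and not taken from CGMB.

```typescript
// Shapes follow the MCP tools specification (tools/list and tools/call);
// the tool name and parameters below are hypothetical.
interface ToolDefinition {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required?: string[];
  };
}

type ContentItem =
  | { type: "text"; text: string }
  | { type: "image"; data: string; mimeType: string }; // data is base64-encoded

interface ToolCallResult {
  content: ContentItem[];
  isError?: boolean;
}

const generateImageTool: ToolDefinition = {
  name: "generate_image", // hypothetical tool name
  description: "Generate an image from a text prompt via the Gemini API",
  inputSchema: {
    type: "object",
    properties: {
      prompt: { type: "string", description: "What the image should depict" },
    },
    required: ["prompt"],
  },
};

type ImageOutcome =
  | { ok: true; imageBase64: string; mimeType: string }
  | { ok: false; message: string };

/** Maps an internal backend outcome onto an MCP tool-call result. */
function toToolResult(outcome: ImageOutcome): ToolCallResult {
  if (!outcome.ok) {
    // Tool failures go into the result with isError so the calling model can see them
    return { content: [{ type: "text", text: outcome.message }], isError: true };
  }
  return { content: [{ type: "image", data: outcome.imageBase64, mimeType: outcome.mimeType }] };
}
```

Reporting backend failures as `isError` results rather than protocol-level errors lets Claude read the message and decide how to proceed.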

Summary: A New Paradigm for AI Integration

CGMB presents a new approach that makes multiple AIs "collaborate" rather than "compete."

Insights Gained Through Development

1. The Importance of the Right Tool for the Right Job

Rather than trying to solve everything with one AI, combining several AIs to leverage each one's strengths proved most efficient. By properly combining Claude Code's reasoning capabilities, Gemini CLI's information retrieval (which also spreads token usage across services), and the Gemini API's generation capabilities, we can create value that no single AI can deliver on its own.

2. UX-First Design Philosophy

By making all functions accessible through the single keyword CGMB, users can focus on what they want to achieve without being aware of implementation details. This keeps the learning curve low while maximizing productivity.

Expected Use Cases

1. Technical Blog Writing Support

"CGMB Research the new features in Rust 1.70 and generate illustrated sample code"
→ Gemini CLI gathers the latest information → Claude Code writes technical explanations → AI Studio (Gemini API) generates illustrations

2. Automatic Presentation Material Generation

"CGMB Generate images explaining our company's tech stack and create audio narration"
→ Claude Code creates structure → AI Studio (Gemini API) generates charts → Audio synthesis creates narration

3. Multilingual Document Expansion

"CGMB Analyze README.md and generate explanatory images for main features in both Japanese and English"
→ Claude Code analyzes documents → AI Studio (Gemini API) generates multilingual images

Community Contribution

CGMB is released under the MIT license and is available at:

GitHub: goodaymmm/claude-gemini-multimodal-bridge
NPM: claude-gemini-multimodal-bridge

Future Expansion Plans

We are planning the following feature expansions:

Planned Implementations

OCR PDF Analysis: Text extraction from scanned PDFs
Video Generation: Video content generation using Gemini API

If reading this article made you think "I might want to try this out," that alone makes developing it worthwhile. If you find bugs, please let me know. Ideas like "It would be nice to have this feature" are also very welcome.
