Introduction
As English is not my first language, this post has been carefully translated and refined from the original Japanese version I wrote, which you can find on Qiita here.
While exploring tool development with Gemini CLI, I noticed that Google provides generous free usage quotas. I thought that by combining these with Claude Code, we could build an MCP that fills in capabilities Claude Code lacks (image generation, audio synthesis, etc.), expanding what AI can be used for while keeping costs low. This led to the development of CGMB.
What is CGMB? An MCP for Optimal Claude-Gemini Integration
🎯 Overview
CGMB interprets what the user is asking for and automatically routes each task to the most suitable AI layer.
Automatic switching between 3 AI layers
Layer Name | Functionality | Application Scenarios |
---|---|---|
Claude Code | Logic processing, long-text summarization, code analysis | Advanced reasoning, long-form responses |
Gemini CLI | Current information, URL analysis | Latest news retrieval, web search |
Gemini API | Image/audio/file generation | Multimodal generation |
Note: For convenience, the code refers to the Gemini API as AI Studio.
Feature List
Feature | Description |
---|---|
🧠 Auto Routing | Detects PDFs, URLs, image-generation instructions, and similar cues in the prompt, then routes the request to the most suitable AI. |
🖼️ Multimodal | Supports image generation from text and audio synthesis via the Gemini API. |
Stabilization Technology | Speeds up repeated calls with credential caching and recovers from errors with retries. |
💬 Claude Code Integration | Can be run directly from within Claude Code with natural-language prompts: the keyword CGMB followed by the request. |
Secure Authentication | Supports secure API key management through .env files and OAuth integration. |
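To give a feel for what "credential caching and retries" can look like, here is a minimal Python sketch. It is not CGMB's actual code: `CredentialCache`, `with_retry`, the 300-second TTL, and the backoff values are all names and numbers I made up for illustration.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CredentialCache:
    """Remembers the result of a slow credential check for `ttl_seconds` (hypothetical helper)."""

    def __init__(self, check: Callable[[], bool], ttl_seconds: float = 300.0) -> None:
        self._check = check
        self._ttl = ttl_seconds
        self._value: Optional[bool] = None
        self._expires_at = 0.0

    def is_valid(self) -> bool:
        now = time.monotonic()
        if self._value is None or now >= self._expires_at:
            # Cache miss or expired entry: run the slow check again.
            self._value = self._check()
            self._expires_at = now + self._ttl
        return self._value


def with_retry(
    call: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Runs `call`, retrying on any exception with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return call()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

Passing `sleep` as a parameter keeps the backoff testable without real waiting. A real implementation would likely retry only on errors known to be temporary rather than on every exception.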
Unified AI Experience
Complete within Claude Code:
"CGMB Search for the latest AI papers, summarize them, and generate related concept diagrams"
→ Automatic coordination of 3 AIs:
1. Gemini CLI searches for latest papers
2. Claude Code analyzes and summarizes content
3. AI Studio (Gemini API) generates concept diagrams
Everything completed in one interface (Time required: 1/3 of traditional approach)
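Conceptually, this coordination is a chain in which each layer's output becomes the next layer's input. The Python sketch below shows only that idea. The lambdas are stand-ins for the real backends, and `run_pipeline` and the layer labels are hypothetical names, not CGMB's API.

```python
from typing import Callable, List, Tuple

# A stage pairs a layer name with a function that transforms text.
Stage = Tuple[str, Callable[[str], str]]


def run_pipeline(request: str, stages: List[Stage]) -> Tuple[str, List[str]]:
    """Feeds each stage's output into the next and records which layers ran."""
    data = request
    trace: List[str] = []
    for layer, step in stages:
        data = step(data)
        trace.append(layer)
    return data, trace


# Stub stages standing in for the real backends (assumptions, not CGMB code).
stages: List[Stage] = [
    ("gemini-cli", lambda query: f"search results for: {query}"),
    ("claude-code", lambda text: f"summary of ({text})"),
    ("ai-studio", lambda summary: f"diagram based on [{summary}]"),
]

result, trace = run_pipeline("latest AI papers", stages)
print(trace)   # ['gemini-cli', 'claude-code', 'ai-studio']
print(result)  # diagram based on [summary of (search results for: latest AI papers)]
```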
Installation and Initial Setup
CGMB (Claude-Gemini Multimodal Bridge)
Technical Deep Dive: Behind the Scenes
Intelligent Routing
The most complex implementation in CGMB was the request routing logic and its associated error handling.
For natural language prompts from users (e.g., "Summarize /path/to/report.pdf"), CGMB internally performs the following decisions:
- Input Analysis: Analyzes strings, file paths, and URLs contained in prompts using regular expressions and pattern matching.
- Resource Type Identification: Identifies whether the input is a local PDF, a web URL, or plain text.
- Optimal Backend Selection: Routes to the optimal backend service - PDFs to Claude, web pages to Gemini CLI, image generation instructions to Gemini API, etc.
- State Management and Fallback: If a specified backend (e.g., Gemini CLI) is in an authentication error state, processing is halted and a clear error message is returned to the user. This prevents unexpected behavior or incomplete processing results.
By implementing this series of processes robustly, users can focus on their tasks without being aware of backend complexity.
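The decision flow above can be sketched in a few lines of Python. This is a simplified illustration under my own assumptions, not the real implementation: the regular expressions, the order in which they are checked, and names such as `classify`, `route`, and `BackendUnavailableError` are all hypothetical.

```python
import re
from enum import Enum
from typing import Dict


class Resource(Enum):
    GENERATION = "generation request"
    URL = "web URL"
    PDF = "local PDF"
    TEXT = "plain text"


class Backend(Enum):
    CLAUDE_CODE = "Claude Code"
    GEMINI_CLI = "Gemini CLI"
    GEMINI_API = "Gemini API"


class BackendUnavailableError(RuntimeError):
    """Raised instead of silently continuing with an unusable backend."""


# Illustrative patterns only; real prompts need much broader coverage.
GENERATE_RE = re.compile(
    r"\b(generate|create|draw)\b.*\b(image|diagram|illustration|audio)s?\b",
    re.IGNORECASE,
)
URL_RE = re.compile(r"https?://\S+", re.IGNORECASE)
PDF_RE = re.compile(r"\S+\.pdf\b", re.IGNORECASE)

ROUTES: Dict[Resource, Backend] = {
    Resource.GENERATION: Backend.GEMINI_API,
    Resource.URL: Backend.GEMINI_CLI,
    Resource.PDF: Backend.CLAUDE_CODE,
    Resource.TEXT: Backend.CLAUDE_CODE,
}


def classify(prompt: str) -> Resource:
    """Input analysis and resource type identification."""
    if GENERATE_RE.search(prompt):
        return Resource.GENERATION
    if URL_RE.search(prompt):
        return Resource.URL
    if PDF_RE.search(prompt):
        return Resource.PDF
    return Resource.TEXT


def route(prompt: str, authenticated: Dict[Backend, bool]) -> Backend:
    """Backend selection, halting with a clear error if the backend is unusable."""
    backend = ROUTES[classify(prompt)]
    if not authenticated.get(backend, False):
        raise BackendUnavailableError(
            f"{backend.value} is not authenticated; please log in and retry."
        )
    return backend


all_ok = {backend: True for backend in Backend}
print(route("Summarize /path/to/report.pdf", all_ok).value)  # Claude Code
```

Raising an exception when the selected backend is not authenticated mirrors the "halt and return a clear error" behavior: the request never falls through to a different backend by accident.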
Choosing MCP Server
By adopting MCP (Model Context Protocol), CGMB gains the following:
- Future-proofing: As the MCP ecosystem grows, integration with other tools is also envisioned
- Standards compliance: Adherence to industry standards rather than proprietary protocols
Because MCP is becoming a standard, CGMB works not as an isolated tool but as part of a growing ecosystem, and it can be extended alongside other tools as they emerge.
Unifying Different AI Services
The biggest challenge faced during development was unifying the response formats of different AI services.
Claude Code, Gemini CLI, and Gemini API each have different:
- Response formats: JSON, plain text, binary data
- Error handling: HTTP status codes, exceptions, custom errors
- Asynchronous processing: Promises, callbacks, streaming
To absorb these differences, we implemented a unified interface layer.
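As a rough picture of such a layer, the Python sketch below converts three differently shaped responses into one common structure. The `UnifiedResponse` fields, the adapter names, and the `"text"` / `"error"` JSON keys are assumptions for illustration and do not reflect the real payloads of any of the three services.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class UnifiedResponse:
    """One shape for every backend, whatever it originally returned."""
    backend: str
    ok: bool
    text: Optional[str] = None
    data: Optional[bytes] = None
    error: Optional[str] = None


def from_json(backend: str, raw: str) -> UnifiedResponse:
    """Adapter for a backend that answers with a JSON object."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError as exc:
        return UnifiedResponse(backend, ok=False, error=f"invalid JSON: {exc.msg}")
    if not isinstance(payload, dict):
        return UnifiedResponse(backend, ok=False, error="expected a JSON object")
    if "error" in payload:  # hypothetical error key
        return UnifiedResponse(backend, ok=False, error=str(payload["error"]))
    return UnifiedResponse(backend, ok=True, text=str(payload.get("text", "")))


def from_text(backend: str, raw: str) -> UnifiedResponse:
    """Adapter for a backend that prints plain text."""
    return UnifiedResponse(backend, ok=True, text=raw.strip())


def from_binary(backend: str, raw: bytes) -> UnifiedResponse:
    """Adapter for a backend that returns binary data such as an image."""
    if not raw:
        return UnifiedResponse(backend, ok=False, error="empty payload")
    return UnifiedResponse(backend, ok=True, data=raw)
```

Callers then only check `ok` and read `text`, `data`, or `error`, regardless of which service produced the result.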
Additionally, to improve image generation quality, we also implemented automatic English translation for multilingual prompts. Since Gemini API's image generation achieves optimal results with English prompts, we designed a mechanism that automatically translates prompts input in Japanese or other languages to English internally. This allows users to make requests naturally in their native language while obtaining high-quality image generation results.
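The idea can be sketched as follows in Python. Note the assumptions: the non-ASCII check is a deliberately crude stand-in for real language detection, and `translate` is a hypothetical callable you would back with whatever translation step you use. Neither is taken from CGMB's code.

```python
from typing import Callable


def needs_translation(prompt: str) -> bool:
    """Crude heuristic: any non-ASCII character suggests a non-English prompt."""
    return not prompt.isascii()


def prepare_image_prompt(prompt: str, translate: Callable[[str], str]) -> str:
    """Returns an English prompt, translating only when the heuristic fires."""
    if needs_translation(prompt):
        return translate(prompt)
    return prompt


# Stub translator for demonstration; a real one would call a model or service.
def fake_translate(text: str) -> str:
    return {"猫の絵": "a drawing of a cat"}.get(text, text)


print(prepare_image_prompt("a red bicycle", fake_translate))  # a red bicycle
print(prepare_image_prompt("猫の絵", fake_translate))          # a drawing of a cat
```

This heuristic misfires on English text with accented characters and misses non-English text written in plain ASCII, so a production version would need proper language detection.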
By abstracting the functionality of each AI service to match the structured tool definitions required by the MCP protocol, users can now have a consistent experience without being aware of the implementation details of different AI services.
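For readers unfamiliar with MCP, a tool is described by a name, a description, and a JSON Schema for its input, and a tool result carries a list of content items. The Python sketch below shows that shape. The tool name `generate_image` and its `prompt` parameter are hypothetical examples, not CGMB's published tool list.

```python
from typing import Any, Dict, List

# Hypothetical tool definition; the name and parameters are illustrative only.
GENERATE_IMAGE_TOOL: Dict[str, Any] = {
    "name": "generate_image",
    "description": "Generate an image from a text prompt.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "prompt": {"type": "string", "description": "What to draw"},
        },
        "required": ["prompt"],
    },
}


def missing_arguments(tool: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """Lists required parameters that the caller did not supply."""
    required = tool["inputSchema"].get("required", [])
    return [name for name in required if name not in arguments]


def text_result(text: str, is_error: bool = False) -> Dict[str, Any]:
    """Wraps text in the content-list shape used for MCP tool results."""
    return {"content": [{"type": "text", "text": text}], "isError": is_error}


missing = missing_arguments(GENERATE_IMAGE_TOOL, {})
if missing:
    print(text_result(f"missing arguments: {', '.join(missing)}", is_error=True))
```

Because every backend is exposed through the same kind of definition, the client sees a uniform set of tools regardless of which AI service does the work behind each one.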
Summary: A New Paradigm for AI Integration
CGMB presents a new approach that makes multiple AIs "collaborate" rather than "compete."
Insights Gained Through Development
1. The Importance of the Right Tool for the Right Job
Rather than trying to solve everything with one AI, combining several so that each contributes its strengths proved most efficient. By properly combining Claude Code's reasoning capabilities, Gemini CLI's distributed token information retrieval, and Gemini API's generation capabilities, we can create value that none of them can achieve individually.
2. UX-First Design Philosophy
By making all functions accessible through the unified keyword CGMB, users can focus on what they want to achieve without being aware of implementation details. This minimized learning costs while maximizing productivity.
Expected Use Cases
1. Technical Blog Writing Support
"CGMB Research the new features in Rust 1.70 and generate illustrated sample code"
→ Gemini CLI gathers latest information → Claude Code creates technical explanations → AI Studio (Gemini API) generates illustrations
2. Automatic Presentation Material Generation
"CGMB Generate images explaining our company's tech stack and create audio narration"
→ Claude Code creates structure → AI Studio (Gemini API) generates charts → Audio synthesis creates narration
3. Multilingual Document Expansion
"CGMB Analyze README.md and generate explanatory images for main features in both Japanese and English"
→ Claude Code analyzes documents → AI Studio (Gemini API) generates multilingual images
Community Contribution
CGMB is released under the MIT license and is available at:
Future Expansion Plans
We are planning the following feature expansions:
Planned Implementations
- OCR PDF Analysis: Text extraction from scanned PDFs
- Video Generation: Video content generation using Gemini API
If reading this article made you think "I might want to try this out," that alone makes developing it worthwhile. If you find bugs, please let me know. Ideas like "It would be nice to have this feature" are also very welcome.