somacoffeekyoto

Posted on Mar 6

Adding Image Generation to Any AI Coding Environment — imgx-mcp

#mcp #ai #showdev #tooling

Series: imgx-mcp — AI Image Generation for Coding Environments

I run a coffee shop in Kyoto.
SOMA COFFEE KYOTO has been doing event pop-ups since 2010 — no fixed storefront, just coffee served at events across the city.

A few years ago, I started building apps to solve a structural problem in coffee: definitions that don't agree with each other. "Medium grind" means different things depending on who you ask. Roast levels, water temperature, dose — every source says something different. I wanted to build tools that organize this, so I taught myself software development and started shipping web apps. Six coffee apps and a developer toolkit later, I'm still at it. All free betas, all open for anyone to use.

This article is about a tool that came out of that process — not a coffee app, but something that solves a problem I kept hitting while building them.

The Problem: Image Generation Breaks Your Flow

If you work in an AI-assisted coding environment — Claude Code, Cursor, Gemini CLI, or similar — you've probably run into this: you need an image.

A cover image for a blog post. A diagram for documentation. A thumbnail. An icon for a landing page. Something visual that your current tool can't generate on its own.

So you stop what you're doing. Open a separate image generation service. Re-explain the context — what the project is about, what style you need, what dimensions work for the target platform. Write a prompt from scratch. Download the result. Rename it. Move it to the right directory. If it doesn't look right, go back and forth between the generator, your editor, and your file manager.

Each step is small. But they add up. Every time you leave your working environment to deal with images, you lose the context you were holding in your head. The real cost isn't the minutes — it's the interrupted flow.

What imgx-mcp Does

imgx-mcp is an MCP server that adds image generation and editing capabilities to AI coding environments.

MCP (Model Context Protocol) is a standard protocol for connecting tools to AI environments. If your environment supports MCP — Claude Code, Cursor, Windsurf, Gemini CLI, Claude Desktop, and others — imgx-mcp works there. One install, same interface everywhere.

What changes is the entire workflow around images.

Say you're writing a blog post and your AI suggests adding an illustration to clarify a section. Without image generation built in, you'd have to leave, open another service, and start from zero. With imgx-mcp, the AI already has the full context — your article's content, the target audience, the tone. You say "make that image" and it builds a prompt, selects parameters, generates the image, and places it in your project. One conversation, no context switch.

The same applies to slide decks, landing pages, documentation diagrams — anything where images are part of a larger project the AI is already working on.

Context in, image out. No tab-switching. No re-explaining.

Why MCP Over Standalone Services

Standalone image generators (Midjourney, DALL-E web UI) work in isolation. You copy context manually. API wrappers require you to handle prompts, parameters, and file management yourself. IDE-specific plugins lock you into one environment.

MCP sits in a different category. The AI environment connects to the MCP server using a standard protocol. The server exposes tools that the AI can call directly, with full access to project context. It works across any MCP-compatible environment without modification.

imgx-mcp provides 10 MCP tools, backed by 84 tests:

generate_image — create images from prompts
edit_image / edit_last — modify existing images or the most recent output
undo_edit / redo_edit — step back and forward through edit history
edit_history — view the full session timeline
switch_session — manage multiple image projects independently
clear_history — reset when needed
set_output_dir — control where files land
list_providers — check available generation backends
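For the curious: MCP messages are JSON-RPC 2.0. When the AI decides to use one of these tools, the client sends a `tools/call` request carrying the tool name and its arguments, and the server replies with the result. Here is a minimal sketch of what that request looks like. The `prompt` argument is an assumption for illustration; the real parameter names are whatever imgx-mcp advertises through `tools/list`.

```typescript
// Shape of a JSON-RPC 2.0 request as used by MCP's tools/call method.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: {
    name: string;
    arguments: Record<string, unknown>;
  };
}

// Each request needs a unique id so the client can match the server's response.
let nextId = 1;

// Builds a tools/call request. Your MCP client (Claude Code, Cursor, ...) does
// this for you; the function only illustrates what crosses the wire.
function buildToolCall(name: string, args: Record<string, unknown>): ToolCallRequest {
  return {
    jsonrpc: "2.0",
    id: nextId++,
    method: "tools/call",
    params: { name, arguments: args },
  };
}

// Hypothetical arguments: check the schema returned by tools/list for the real ones.
const request = buildToolCall("generate_image", {
  prompt: "Minimal cover illustration of a pour-over coffee setup",
});

console.log(JSON.stringify(request));
```

The AI never has to see this JSON. It just picks a tool and fills in arguments from the conversation, which is why the project context carries over for free.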

Dual Provider: Gemini + OpenAI

imgx-mcp supports two image generation providers:

Gemini — NanoBanana (high quality, free tier available), Pro models up to 4K resolution
OpenAI — gpt-image-1 (multi-output, transparent backgrounds), gpt-image-1.5 (roughly 4x faster, better text rendering), gpt-image-1-mini (lower cost)

Switch between them with a single parameter. Use whichever model fits the task. When new providers are added in the future, the interface stays the same.

Setup requires one API key. That's it.
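Switching providers with one parameter mostly comes down to every backend implementing the same interface. A minimal sketch of that design, with hypothetical names (`ImageProvider`, `getProvider`, `generateImage`) and stub providers that return fake paths instead of calling the Gemini or OpenAI APIs. It is not imgx-mcp's actual source.

```typescript
// Common interface every backend implements; callers never see provider details.
interface ImageProvider {
  readonly name: string;
  generate(prompt: string): Promise<string>; // resolves to an output path
}

// Stub providers: real ones would call the Gemini or OpenAI image APIs.
const gemini: ImageProvider = {
  name: "gemini",
  async generate(prompt) {
    return `out/gemini-${prompt.length}.png`;
  },
};

const openai: ImageProvider = {
  name: "openai",
  async generate(prompt) {
    return `out/openai-${prompt.length}.png`;
  },
};

// Registry keyed by the provider parameter.
const providers = new Map<string, ImageProvider>([
  [gemini.name, gemini],
  [openai.name, openai],
]);

function getProvider(name: string): ImageProvider {
  const provider = providers.get(name);
  if (!provider) {
    const available = Array.from(providers.keys()).join(", ");
    throw new Error(`Unknown provider "${name}". Available: ${available}`);
  }
  return provider;
}

// Switching backends changes one argument; the call site stays identical.
async function generateImage(prompt: string, provider = "gemini"): Promise<string> {
  return getProvider(provider).generate(prompt);
}
```

In this shape, adding a third backend means one new object and one registry entry; nothing at the call site changes.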

Iterative Editing with `edit_last`

This is the feature I use most.

1. Generate an image
2. "Make the background darker"
3. "Increase the text size"
4. "Shift the layout to the left"
5. Keep going until it's right

You never specify a file path after the first generation. edit_last automatically picks up the most recent output. The conversation flows naturally — you talk about what to change, not where the file is.

If you're building a landing page and need to composite a product image onto a new background, then add a logo in the corner, then adjust the color temperature — each step builds on the last. It works the way you'd expect a conversation to work.
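Behind the scenes, "no file path" only requires the session to remember its latest output. A minimal sketch, assuming hypothetical `Session` and `resolveEditInput` names; the actual image-editing call is left out:

```typescript
class Session {
  private outputs: string[] = [];

  // Called after every successful generate or edit.
  record(path: string): void {
    this.outputs.push(path);
  }

  // The most recent output, or undefined if nothing has been generated yet.
  get lastOutput(): string | undefined {
    return this.outputs[this.outputs.length - 1];
  }
}

// Decides which image an edit applies to: an explicit path (edit_image style)
// wins; otherwise fall back to the session's latest output (edit_last style).
function resolveEditInput(session: Session, explicitPath?: string): string {
  const input = explicitPath ?? session.lastOutput;
  if (input === undefined) {
    throw new Error("Nothing to edit yet: generate an image first.");
  }
  return input;
}

const session = new Session();
session.record("session-a/001.png");
console.log(resolveEditInput(session)); // prints session-a/001.png
```

Each edit's output gets recorded in turn, so the next "make it darker" automatically targets the newest version.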

Undo, Redo, and Branching

After several rounds of edits, you realize the version from three steps ago was better. Or you want to try a different direction from an earlier state.

imgx-mcp tracks every generation as part of a session. Each session maintains its own history with metadata. Undo steps backward. Redo steps forward. You can branch from any point — undo twice, then edit in a new direction, and the history forks cleanly.
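One way to get undo, redo, and clean forks is to store history as a tree rather than a flat list: undo moves to the parent, redo moves to the most recently created child, and a new edit after an undo starts a sibling branch instead of overwriting anything. A minimal sketch of that idea with hypothetical names; it is not necessarily how imgx-mcp stores its history.

```typescript
interface HistoryNode {
  image: string; // output file for this step
  prompt: string; // instruction that produced it
  parent?: HistoryNode;
  children: HistoryNode[];
}

class EditHistory {
  private current: HistoryNode;

  constructor(image: string, prompt: string) {
    this.current = { image, prompt, children: [] };
  }

  get image(): string {
    return this.current.image;
  }

  // A new edit always hangs off the current node, so editing after an undo
  // creates a branch while the old branch stays in the tree.
  edit(image: string, prompt: string): void {
    const node: HistoryNode = { image, prompt, parent: this.current, children: [] };
    this.current.children.push(node);
    this.current = node;
  }

  // Returns false when already at the first image.
  undo(): boolean {
    if (!this.current.parent) return false;
    this.current = this.current.parent;
    return true;
  }

  // Redo follows the most recently created child; false if there is none.
  redo(): boolean {
    const children = this.current.children;
    if (children.length === 0) return false;
    this.current = children[children.length - 1];
    return true;
  }
}

const history = new EditHistory("001.png", "cover image");
history.edit("002.png", "darker background");
history.edit("003.png", "larger text");
history.undo();
history.undo(); // back at 001.png
history.edit("004.png", "warmer colors"); // branches from 001.png; 002 and 003 are kept
```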

Generated images are saved in session folders with sequential numbering. No guessing which file is which, no losing track of prompts.

Being able to undo without consequence changes how you work. You try things you wouldn't bother trying if each attempt were permanent.
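Sequential numbering stays reliable if the next number comes from the files already in the session folder rather than an in-memory counter, so a server restart never overwrites earlier images. A sketch, assuming a hypothetical zero-padded `NNN.png` naming scheme (the real imgx-mcp filenames may differ):

```typescript
// Returns the next filename in a session folder, given the names already there.
// Only names like "007.png" count; metadata files and anything else are ignored.
function nextSequentialName(existing: string[], ext = "png", width = 3): string {
  let max = 0;
  for (const name of existing) {
    const match = /^(\d+)\.[a-z0-9]+$/i.exec(name);
    if (match) {
      max = Math.max(max, parseInt(match[1], 10));
    }
  }
  return `${String(max + 1).padStart(width, "0")}.${ext}`;
}
```

Zero-padding keeps an alphabetical directory listing in the same order as the edit timeline.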

Skill: Image Expertise for Your AI

imgx-mcp ships with a Skill — a structured knowledge file that teaches the AI how to use image generation effectively.

The MCP server gives the AI the ability to generate and edit images. The Skill gives it the expertise: how to build prompts, which model to choose for which task, what dimensions to use for different platforms (blog covers, OGP images, social media), and how to apply current visual styles.

Tell the AI "make a cover image" and it reads your article, determines the composition, selects style parameters, picks the right model and dimensions — all without you needing to know anything about image generation prompts or provider-specific settings.
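The Skill itself is a knowledge file the AI reads, not code, but the decisions it guides boil down to lookups like the ones below. This sketch encodes the model trade-offs listed in the provider section plus a few platform sizes. The 1200×630 Open Graph size is the commonly recommended one; the other dimensions, the Gemini model ids, and all function names are illustrative assumptions, not the contents of the imgx-mcp Skill.

```typescript
type Platform = "ogp" | "blog-cover" | "square-social";
type Need = "free-tier" | "4k" | "transparent-background" | "text-rendering" | "low-cost";

// OGP 1200x630 is the widely recommended size; the others are assumptions.
const platformSizes: Record<Platform, { width: number; height: number }> = {
  ogp: { width: 1200, height: 630 },
  "blog-cover": { width: 1000, height: 420 },
  "square-social": { width: 1080, height: 1080 },
};

// Model hints based on the trade-offs described in the provider section.
// The Gemini entries use placeholder ids, not real API model names.
const modelForNeed: Record<Need, { provider: "gemini" | "openai"; model: string }> = {
  "free-tier": { provider: "gemini", model: "nanobanana-placeholder" },
  "4k": { provider: "gemini", model: "gemini-pro-placeholder" },
  "transparent-background": { provider: "openai", model: "gpt-image-1" },
  "text-rendering": { provider: "openai", model: "gpt-image-1.5" },
  "low-cost": { provider: "openai", model: "gpt-image-1-mini" },
};

// Combines the target platform's size with the model that fits the task.
function planImage(platform: Platform, need: Need) {
  return { ...platformSizes[platform], ...modelForNeed[need] };
}
```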

The next article in this series covers this architecture in detail.

Getting Started

Install with npx — no global install required:

npx imgx-mcp@latest init

One command adds imgx-mcp to your MCP configuration. One API key (Gemini or OpenAI) and you're generating images from your AI environment.
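In practice, "adding imgx-mcp to your MCP configuration" means inserting one entry under `mcpServers` (the same key the Quick start config uses) without disturbing servers you already have. A minimal sketch of such a merge; `addServer` is a hypothetical name and the `args` value is a placeholder, not what `init` actually writes.

```typescript
interface ServerEntry {
  command: string;
  args?: string[];
}

interface McpConfig {
  mcpServers?: Record<string, ServerEntry>;
  [key: string]: unknown; // other tool-specific settings are passed through
}

// Returns a new config with the named server added or replaced.
// Existing servers and unrelated top-level keys are preserved; the input is not mutated.
function addServer(config: McpConfig, name: string, entry: ServerEntry): McpConfig {
  return {
    ...config,
    mcpServers: { ...(config.mcpServers ?? {}), [name]: entry },
  };
}

const before: McpConfig = { mcpServers: { other: { command: "node", args: ["server.js"] } } };
// Placeholder args: use whatever the imgx-mcp README specifies.
const after = addServer(before, "imgx", { command: "npx", args: ["<args from the README>"] });
```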

Landing page: Check out the imgx-mcp Landing Page
Usage guide: somacoffee.net/imgx-mcp-usage
Development story: somacoffee.net/imgx-mcp-development-story
GitHub: somacoffeekyoto/imgx-mcp
npm: imgx-mcp

MIT license. v1.6.0. 10 MCP tools. 84 tests.

What's Next

This is the first article in the imgx-mcp series. Next, I'll cover the MCP + Skill architecture — how a standard protocol and a structured knowledge file turn any AI coding environment into an image-capable one.

If you're working in CLI-based AI environments and images have been a friction point, give imgx-mcp a try. I'd like to hear how it works in your setup — what's useful, what's missing, what you'd change.

This is Part 1 of the imgx-mcp series. Next: How One Tool Works Across 10+ AI Environments — MCP + Skill Design

SOMA COFFEE KYOTO — A coffee shop in Kyoto that builds software tools.

somacoffeekyoto / imgx-mcp

AI image generation and editing MCP server. Text-based editing, multi-provider, CLI included.
imgx-mcp

AI image generation and editing MCP server. Works with Claude Code, Gemini CLI, Cursor, Windsurf, and any MCP-compatible tool.

Generate images from text, edit existing images with text instructions, iterate on results — all from your AI coding environment.

What sets imgx-mcp apart

No prompt engineering — Your AI agent keeps conversation context and auto-constructs optimized prompts. Say what you need; the agent handles prompt structure, model selection, and platform-specific sizing

24 editing techniques built in — Atmosphere, composition, style transfer, element manipulation, and trending styles — bundled as a Skill your agent applies on demand

Session management with undo/redo — Edit iteratively, step back to any point, branch off, or switch between parallel sessions — version control for images

Quick start

Add to your tool's MCP config (.mcp.json, settings.json, etc.):

{ "mcpServers": { "imgx": { "command": "npx", "args":
…
Our Coffee Apps (all free betas)
somacoffee.net