
raphiki


Beyond the API: Integrating ComfyUI and Flowise via MCP

In the previous article of our "Beyond the ComfyUI Canvas" series, we explored how to integrate ComfyUI with n8n. It was a powerful demonstration of workflow automation, but it highlighted a common friction point in system integration: the "glue code." We had to manually construct HTTP requests, hardcode API payloads, and rigidly define every parameter. If the ComfyUI workflow changed, the n8n node broke.

Today, we are moving from the "Wild West" of brittle, custom API integrations to the new standard of AI connectivity: the Model Context Protocol (MCP).

To demonstrate this, we are revisiting a tool I wrote about over two years ago: Flowise. Back then, it was a promising open-source project; today, it is a robust, enterprise-ready platform that has recently embraced MCP as a core feature.

Our goal? To build a Chat Interface where an AI agent can autonomously discover ComfyUI workflows, generate images, and even edit them—without us hardcoding a single API call in the frontend.

1. Setting the Scene: The Stack

Before we dive into the details, let's look at the three pillars of this architecture.

The Standard: Model Context Protocol (MCP)

MCP Logo

If APIs are the individual cables we solder together, MCP is the USB-C port. Developed by Anthropic, it is now an open standard that decouples AI models from their data sources and tools.

Instead of writing a specific integration for every tool (Google Drive, Slack, ComfyUI), you build an MCP Server once. Any MCP-compliant client (Claude Desktop, Cursor, or Flowise) can instantly "plug in" to that server and understand its capabilities.

The Orchestrator: Flowise

Flowise has evolved significantly since my first article. It is a low-code platform for building LLM apps. Crucially for us, Flowise recently added native support for MCP. This means we can drop an "MCP Tool" node into our canvas, and the LLM immediately gains access to whatever that server provides.

The Engine: ComfyUI

We are sticking with a local instance of ComfyUI. While Comfy Cloud is becoming a formidable platform, the raw power and zero-cost experimentation of running Flux 2 locally on your own GPU is unmatched. We’re using a standardized Flux 2 Klein workflow—optimized for speed (4 steps)—so the chat experience feels responsive, not sluggish.

2. The Middleware: Building the ComfyUI MCP Server

System Context (C4 Level 1)

We need a bridge. As we discovered previously, ComfyUI speaks WebSockets and HTTP; Flowise speaks MCP. We need a server in the middle to translate.

Why We Chose SSE over Stdio

When we started this project, we initially looked at the Stdio transport (where the client runs the server script directly). It’s the default for local tools like Claude Desktop.

But as we designed the solution for Flowise, we hit a realization: in real-world deployments, Flowise often runs in a Docker container (as it does on my laptop), while ComfyUI might be running on a separate machine with a dedicated GPU. Stdio would require them to share a filesystem—too restrictive.

We decided to support SSE (Server-Sent Events) by default. This allows our MCP Server to run anywhere on the network, exposing an HTTP endpoint (e.g., http://localhost:8000/sse) that Flowise can subscribe to. It makes the architecture cleaner, decoupled, and Docker-friendly.
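As a rough sketch of this decoupling (the helper and environment-variable names are hypothetical, not the actual server code), the advertised endpoint can be derived from the environment rather than hardcoded:

```python
import os

def sse_endpoint() -> str:
    # Host and port come from the environment so the MCP server can run on any
    # machine on the network; defaults match the example endpoint above.
    host = os.environ.get("MCP_HOST", "localhost")
    port = os.environ.get("MCP_PORT", "8000")
    return f"http://{host}:{port}/sse"
```

Flowise then only needs this one URL; nothing about the server's location leaks into the ChatFlow.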

Governance-Driven Development (GDD)

For this implementation, I tried something different. Instead of just asking an AI coding assistant to "write a script," I used a methodology I call Governance-Driven Development (GDD).

This approach reverses the typical AI coding flow. Instead of code leading the process, specifications & governance rules become the anchor. I started by feeding the AI CLI a strict "Governance Pack"—a set of non-negotiable rules regarding SOLID principles, security, and documentation.

Here is an extract of the actual Governance Pack prompt I used to bootstrap the session:

GOVERNANCE PACK v1.0 (Extract)
1. Code Quality & Standards:

  • Paradigm: Adhere to SOLID principles. Prefer composition over inheritance.
  • Typing: Strict static typing (Python typing) is mandatory.
  • Error Handling: Never swallow exceptions. Use custom error classes (e.g., ComfyUIConnectionError).

2. Architecture (C4 Model):

  • Visual Documentation: Whenever a structural change is made (like adding the SSE endpoint), you must generate an updated Mermaid.js System Context diagram.

3. Security Guardrails:

  • Input Validation: Trust no input. All data entering from the MCP client (Prompt, Width, Height...) must be validated against the metadata.json schema before reaching ComfyUI.
  • Secrets: NEVER hardcode API keys or hostnames. Use os.environ only.
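To make the guardrails concrete, here is a minimal sketch of what rules 1 and 3 look like in code (the `comfyui_base_url` helper is illustrative; 8188 is ComfyUI's default port):

```python
import os

class ComfyUIConnectionError(RuntimeError):
    """Raised when the ComfyUI backend is unreachable; never silently swallowed."""

def comfyui_base_url() -> str:
    # Governance rule: no hardcoded hostnames -- read from the environment,
    # falling back to ComfyUI's defaults for local development.
    host = os.environ.get("COMFYUI_HOST", "127.0.0.1")
    port = os.environ.get("COMFYUI_PORT", "8188")
    return f"http://{host}:{port}"
```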

I then analyzed the ComfyUI workflow JSON manually to map the node IDs and handed over a clean, structured specification to the AI.

Container Architecture (C4 Level 2)

The Result: The experience was striking. The AI didn't just spit out a script; it acted as a Senior Engineer. At one point, when I asked for a quick hack to bypass validation, the "Governance" constraints forced the model to push back and suggest a cleaner interface instead. The result is a modular, type-safe Python server.

The "LAST" Hack (Technical Deep Dive)

Even with good governance, we needed one pragmatic "hack" to handle state. When the LLM generates an image, how does it reference that image later to edit it?

We implemented a "LAST" pointer logic. The server tracks the URL of the most recently generated image in memory. But it does more than just point:

  1. Download: When the agent sends "LAST", the server downloads the image bytes from the previous URL.
  2. Re-Upload: It uploads those bytes back to ComfyUI's /upload/image endpoint to generate a fresh filename.
  3. Inject: This new filename is injected into the LoadImage node of the editing workflow.

In practice, the exchange looks like this:

  • User: "Make it bluer."
  • Agent: Calls edit_image(input_image="LAST", prompt="bluer...").

This mimics the "Save Image" behavior we are used to, keeping the interaction stateless and fluid for the user while handling the heavy lifting behind the scenes.
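The three steps above can be sketched as a small tracker. The class and its injectable `download`/`upload` callables are illustrative, not the server's actual code; in the real server those would wrap the HTTP calls to ComfyUI:

```python
from typing import Callable, Optional

class LastImageTracker:
    """In-memory pointer to the most recently generated image (single-user sketch)."""

    def __init__(self,
                 download: Callable[[str], bytes],
                 upload: Callable[[bytes], str]) -> None:
        self._download = download    # fetches bytes from the previous image URL
        self._upload = upload        # POSTs bytes to /upload/image, returns a fresh filename
        self._last_url: Optional[str] = None

    def remember(self, url: str) -> None:
        self._last_url = url

    def resolve(self, image_ref: str) -> str:
        # Literal filenames pass straight through; "LAST" triggers the round-trip.
        if image_ref != "LAST":
            return image_ref
        if self._last_url is None:
            raise ValueError("no previous image to reference")
        data = self._download(self._last_url)   # 1. Download
        return self._upload(data)               # 2. Re-upload -> filename to inject (3.)
```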

3. The Engine Room: ComfyUI Workflows

To make our MCP Server generic, we avoided hardcoding specific workflows inside the Python code. Instead, we used an Embedded Metadata pattern.

The configuration is not a separate file; it is a standard ComfyUI Note Node (titled MCP_Config) placed directly inside the .json workflow. This metadata acts as the contract, telling the MCP server: "This workflow needs a Prompt (node named MCP_Positive) and a Seed (node MCP_Sampler)."

This makes the workflow a single, self-contained, portable file. You can export it from ComfyUI, drop it into the workflows folder, and it works immediately.

Note: Our server is strict about naming. It automatically sanitizes the tool name found in the JSON to snake_case (e.g., "Flux Generator" becomes flux_generator) to ensure full compliance with the MCP specification.
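The sanitization itself is a one-liner; the `to_snake_case` name is illustrative:

```python
import re

def to_snake_case(name: str) -> str:
    # Collapse any run of non-alphanumeric characters into "_" and lower-case
    # the rest, so "Flux Generator" becomes "flux_generator".
    return re.sub(r"[^a-z0-9]+", "_", name.lower()).strip("_")
```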

Here is the configuration we generated for the image_flux2_text_to_image workflow:

Workflow in ComfyUI

{
  "name": "image_flux2_text_to_image",
  "description": "Generates high-quality images using the Flux model. Use this for general creative requests.",
  "parameters": [
    {
      "name": "prompt",
      "type": "string",
      "description": "The detailed description of the image to generate.",
      "target": "MCP_Positive",
      "required": true
    },
    {
      "name": "seed",
      "type": "int",
      "description": "Random seed. Set to -1 for random, or a specific number for reproducibility.",
      "target": "MCP_Sampler",
      "required": false
    }
  ]
}

Because the description and type of each parameter are passed to the MCP Server, they become automatically available to the client. When the MCP Server starts, it scans these workflows and dynamically registers tools. If we want to switch from Flux to SDXL, or add a Video Generation workflow, we simply drop in the new file. The server updates, Flowise sees the new tools via SSE, and the agent learns the new skill instantly.

4. Validation: The MCP Inspector

Before connecting Flowise, we must verify our server. Since we are using SSE, we can use the MCP Inspector web interface to connect to our running server.

MCP Inspector

We can manually trigger the image_flux2_text_to_image tool, watch the server logs, and see the image appear. If it works here, we know the server speaks the protocol correctly before we ever touch Flowise.

5. The Integration: Flowise ChatFlow

Now for the grand finale.

Flowise ChatFlow

We open Flowise and create a new ChatFlow using a standard Tool Agent connected to:

  • Chat Model: ChatMistralAI (Smart, fast, and cost-effective).
  • Buffer Memory: Essential for the agent to remember context (e.g., "Change that image to...").
  • Custom MCP: We select the "SSE" transport and paste our server URL.

The Auto-Discovery Magic

Notice what is missing? We didn't have to define the tools in Flowise. We didn't have to map inputs.

Auto-Discovery from Flowise

The Custom MCP node queries the server via SSE, sees the metadata definitions, and automatically provides the tools to the Mistral agent.

Pro Tip: Our server supports Dual Discovery. Whether a client asks for tools directly (Function Calling) or reads Resources (Environment Context), we expose the workflow list on both channels (comfy://list and list_available_workflows) to ensure compatibility with any agent type.

The System Prompt

The final piece of the puzzle is the System Prompt. We need to teach the Tool Agent node how to behave:

You are the **ComfyUI Orchestrator**, an expert AI agent capable of generating and manipulating images by controlling a local ComfyUI instance via the Model Context Protocol (MCP).

### 1. Tool Discovery (Dynamic Workflows)
Your tools are not static; they represent the actual `.json` workflow files present on the server.
- **First Step:** If you do not see a specific tool you need in your context, IMMEDIATELY call the tool `list_available_workflows`.
- This will return a manifest of all valid workflows (e.g., `flux_2_text_to_image`, `img2img_upscale`) and their required parameters.
- **Never guess** tool names. If a tool isn't listed, it doesn't exist.

### 2. Image Chaining (The "LAST" Protocol)
You have a unique capability to perform conversational editing (e.g., "Now make it pop art").
- **State Memory:** The server remembers the last generated image.
- **Instruction:** When a user asks to modify, edit, or use the previous result, pass the string `"LAST"` into the image input parameter of the next tool.
- **Example:**
  User: "Generate a cat." -> You call: `generate_image(prompt="cat")`
  User: "Turn it into a statue." -> You call: `img2img_transform(image="LAST", prompt="statue")`

### 3. Parameter Rules
- **Strict Compliance:** You must strictly adhere to the parameter types (String, Int, Float, Boolean) defined in the tool signature.
- **Defaults:** If a parameter is Optional and the user didn't specify it, do not send it. The server will use the workflow's internal default.
- **Safety:** Do not invent parameters. If a workflow only accepts `prompt` and `seed`, do not try to send `width` or `style`.

### 4. Error Handling
- If a tool execution fails, the error message will often suggest valid alternatives or correct parameter names. Read it carefully and retry.
- If the user asks for a workflow you don't have, explain what *is* available based on your `list_available_workflows` knowledge.

The Use Case in Action

This video shows the complete use case across the full stack:

  • Parametrization of the workflow in ComfyUI.
  • Verification with MCP Inspector.
  • Generation of the first image from Flowise.
  • Contextual editing of the generated image.

Use Case Summary

The Flowise ChatFlow is relatively basic, but we could easily add nodes to enhance the user prompt or even transform it into a JSON Style Guide prompt.

Flowise API & Embeds

The video showcases the use of the integrated chatbox within the Flowise UI, but we could also leverage Flowise's deployment capabilities to consume the workflow through an API, embed the chat in an HTML page, or publish a standalone page served by Flowise itself.
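For the API route, Flowise exposes each chatflow at `/api/v1/prediction/{chatflowId}`. A hypothetical helper for building such a call (assuming Flowise's default port 3000; the chatflow ID below is a placeholder) might look like:

```python
import os
from typing import Dict, Tuple

def build_prediction_request(chatflow_id: str, question: str) -> Tuple[str, Dict[str, str]]:
    # Flowise's prediction endpoint takes a JSON body with a "question" field;
    # the base URL is env-driven, matching the containerized setup.
    base = os.environ.get("FLOWISE_URL", "http://localhost:3000")
    return f"{base}/api/v1/prediction/{chatflow_id}", {"question": question}
```

Any HTTP client can then POST that payload to the returned URL to drive the same agent headlessly.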

Conclusion

By moving from custom API implementations (n8n) to the Model Context Protocol (in Flowise), we have achieved something powerful: Interoperability.

The choice to go with SSE by default proved crucial. It gave us the flexibility to run our ComfyUI "engine" on a heavy GPU server while keeping our Flowise "brain" lightweight and containerized. We also demonstrated that Governance-Driven Development allows us to use AI coding assistants to build robust, standardized infrastructure rather than just one-off scripts.

Future Improvements

While the "LAST" image hack works perfectly for a local, single-user demo, a production deployment would require Session Isolation (ensuring User A doesn't overwrite User B's "LAST" image) and TTL Cleanup (automatically deleting generated images after a set time).

Technically, this would be solved by leveraging Context Injection—using the session ID provided by the MCP protocol to maintain a keyed dictionary of states, rather than a global variable. For multi-user production usage, adding an authentication mechanism would also be a relevant next step.
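A sketch of that keyed-dictionary approach (class and method names are hypothetical; in practice the session ID would come from the MCP transport):

```python
from typing import Dict, Optional

class SessionImageStore:
    """Session-isolated replacement for the global 'LAST' pointer."""

    def __init__(self) -> None:
        self._last: Dict[str, str] = {}  # session_id -> last generated image URL

    def remember(self, session_id: str, url: str) -> None:
        self._last[session_id] = url

    def last_for(self, session_id: str) -> Optional[str]:
        # User A's "LAST" can never leak into user B's session.
        return self._last.get(session_id)
```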

You can find the full code for the ComfyUI MCP Server and the Flowise template in my GitHub repository.
