Discussion on: How I Built an Image Generation MCP Server with Gemini 2.5 Flash Image (aka nano-banana)

Guy

Well done on the MCP framing. Now move beyond “it basically works” to “it will survive scaling.”

Here's what I'd do to make it shippable:

Add validation: after image generation, run lightweight consistency checks (e.g., perceptual hash comparisons for character features). If a check fails, fall back to an alternative prompt strategy (e.g., adjusting the seed or injecting an explicit example).
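A minimal sketch of that check in Python, assuming Pillow and the imagehash library; the threshold and the `call_gemini_image` / `add_explicit_example` helpers are hypothetical stand-ins for the server's own code:

```python
# Post-generation consistency check via perceptual hashing.
# Assumes `pip install Pillow imagehash`; the threshold is illustrative
# and should be tuned against real outputs.
from PIL import Image
import imagehash

HAMMING_THRESHOLD = 10  # max bit distance before the image counts as "off"

def is_consistent(reference_path: str, generated_path: str) -> bool:
    """True if the generated image's pHash stays close to the reference."""
    ref = imagehash.phash(Image.open(reference_path))
    gen = imagehash.phash(Image.open(generated_path))
    return (ref - gen) <= HAMMING_THRESHOLD  # ImageHash subtraction = Hamming distance

def generate_with_validation(prompt: str, reference_path: str, retries: int = 2) -> str:
    for _ in range(retries + 1):
        generated_path = call_gemini_image(prompt)   # hypothetical API wrapper
        if is_consistent(reference_path, generated_path):
            return generated_path
        prompt = add_explicit_example(prompt)        # hypothetical fallback strategy
    raise RuntimeError("generated image drifted too far from the reference")
```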

Lock down test behaviour: pin random seeds in integration tests, record golden outputs, and flag drift as a CI failure rather than dismissing it as “flaky.”
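Concretely, a golden-output test could pin the seed and compare perceptual hashes against a hash recorded once from a known-good run; the `generate` wrapper and the hash value below are placeholders, not the project's real API:

```python
# Deterministic "golden" test: a fixed seed plus a committed reference hash.
# Record the real hash once from a known-good run and commit it with the test.
from PIL import Image
import imagehash

GOLDEN_HASH = imagehash.hex_to_hash("e3c1b09f8d2a4c67")  # 64-bit pHash, recorded once

def test_character_render_matches_golden():
    image_path = generate("knight portrait", seed=42)  # hypothetical wrapper, seed pinned
    drift = imagehash.phash(Image.open(image_path)) - GOLDEN_HASH
    # Drift is a regression to investigate, not flakiness to retry.
    assert drift <= 4, f"output drifted {drift} bits from the golden render"
```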

Raise the MCP abstraction: don’t just append text instructions; embed an instruction template or schema so that phrasing drift is minimized.
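One sketch of that idea: render tool flags into fixed slots so the same intent always produces the same phrasing (all field names below are invented for illustration):

```python
# A structured prompt template: tool flags fill fixed slots instead of being
# appended as free-form text, so phrasing stays stable across calls.
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class ImagePromptSpec:
    subject: str
    style: str = "photorealistic"
    character_reference: Optional[str] = None  # canonical character description
    avoid: str = "text, watermark, extra limbs"

    def render(self) -> str:
        parts = [f"Subject: {self.subject}", f"Style: {self.style}"]
        if self.character_reference:
            parts.append(f"Keep the character consistent with: {self.character_reference}")
        parts.append(f"Avoid: {self.avoid}")
        return "\n".join(parts)  # fixed slot order keeps phrasing deterministic

# Usage: ImagePromptSpec(subject="a knight at dawn", style="watercolor").render()
```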

Next steps / TL;DR

Ship the orchestration layer; that’s your real product. It maps intent → flags → prompts → results.

Harden the edge cases: validation + deterministic testing.

Then scale: test STT, streaming, and full-stack agent loops next.

Shinsuke KAGAWA

Thanks a lot for the detailed feedback — particularly the suggestion to use perceptual hashing for validation; that's a great idea I hadn't considered. I'm planning to work on improvements along these lines:

  • For testing: calling the external API directly from CI isn't practical, so I'll explore manual or E2E-style tests to cover that gap (and use mocked/recorded responses for CI where useful).
  • For validation: I'll dig into perceptual hashing and similarity checks and see how far they can help.
  • For the orchestration layer: I want to experiment with MCP's sampling (server-initiated LLM sampling to normalize prompts) before sending them to Gemini; a rough sketch of the idea is below.
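A rough sketch of that sampling step using the Python MCP SDK's FastMCP; treat the exact `create_message` call and the `call_gemini_image` wrapper as assumptions to verify against your SDK version, not a definitive implementation:

```python
# Sketch: use MCP sampling to ask the client's LLM to normalize a raw user
# prompt into a stable template before it reaches Gemini.
from mcp.server.fastmcp import Context, FastMCP
from mcp.types import SamplingMessage, TextContent

mcp = FastMCP("image-gen")

@mcp.tool()
async def generate_image(prompt: str, ctx: Context) -> str:
    result = await ctx.session.create_message(
        messages=[
            SamplingMessage(
                role="user",
                content=TextContent(
                    type="text",
                    text=f"Rewrite this as a structured image prompt "
                         f"(Subject/Style/Avoid lines only): {prompt}",
                ),
            )
        ],
        max_tokens=200,
    )
    # Fall back to the raw prompt if the client returns non-text content.
    normalized = result.content.text if result.content.type == "text" else prompt
    return call_gemini_image(normalized)  # hypothetical wrapper around Gemini
```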

I'll be tackling these step by step. Really appreciate the push to think beyond the prototype phase!

Guy

Glad I could help spark some new thinking. Good luck!