DEV Community: Michael Egberts

Our first offline app just shipped — and no one wrote a line of code

Michael Egberts — Fri, 05 Jun 2026 15:31:43 +0000

This week, the first offline-first PWA went live on WebsitePublisher.ai.

A travel blog that works without internet. Write posts on a plane, attach photos, and everything syncs the moment you reconnect. Service Worker, IndexedDB, sync queue — the full stack.
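The write-offline, sync-on-reconnect behaviour can be pictured with a small model. This is only a sketch under stated assumptions: the `OfflineQueue` class and its fields are hypothetical, plain Python lists stand in for IndexedDB and the remote store, and conflict handling is left out entirely. It is not the platform's Service Worker code.

```python
from dataclasses import dataclass, field


@dataclass
class OfflineQueue:
    """Holds writes made while offline and replays them in order on reconnect."""
    online: bool = False
    pending: list = field(default_factory=list)  # stands in for IndexedDB
    server: list = field(default_factory=list)   # stands in for the remote store

    def write(self, post: dict) -> None:
        # Online writes go straight through; offline writes are queued locally.
        if self.online:
            self.server.append(post)
        else:
            self.pending.append(post)

    def reconnect(self) -> int:
        """Flush queued writes in the order they were made; return how many synced."""
        self.online = True
        synced = len(self.pending)
        self.server.extend(self.pending)
        self.pending.clear()
        return synced
```

Posts written on the plane sit in `pending` until `reconnect()` is called, at which point they reach the server in their original order.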

The twist: it was built entirely through conversation with an AI assistant. No IDE, no terminal, no deploy pipeline.

How that works

WebsitePublisher.ai exposes 92 integrations as building blocks via MCP (Model Context Protocol). Any AI assistant — ChatGPT, Claude, Cursor, Windsurf, Copilot, Gemini, Grok, Mistral — connects to the same runtime and assembles these blocks into working applications.

The offline-first PWA is one of those blocks. The AI doesn't generate a Service Worker from scratch. It activates a proven, tested building block and configures it for the use case.

We call this wave coding — one deliberate wave of proven pieces, instead of 15 fragile vibe-coding attempts.

What shipped recently

Offline-first PWA building block — push/pull sync, conflict handling, IndexedDB storage, works on iOS
92 integrations (up from 78) — 45 built-in, 47 bring-your-own-key
Integration stacks — pre-composed combinations: e-commerce (13 integrations), lead generation, B2B prospecting, booking, content/blog
9 AI platforms supported — all via MCP, no vendor lock-in
416 API endpoints across the platform

The architecture in short

AI assistant (any) → MCP → WebsitePublisher runtime
                              ├── PAPI (pages + assets)
                              ├── MAPI (structured data)
                              ├── SAPI (forms + auth + sessions)
                              ├── IAPI (integration proxy)
                              ├── VAPI (encrypted vault)
                              └── AAPI (scheduled AI agents)

Credentials never touch the AI. They're stored AES-256-GCM encrypted in the vault and injected server-side during execution.
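One way to picture server-side injection: the model only ever handles a placeholder, and the runtime swaps in the real secret just before the outbound call. The sketch below makes several assumptions. The `{{vault:name}}` placeholder syntax and the `inject_secrets` helper are hypothetical, and a plain dict stands in for the already-decrypted vault, so the AES-256-GCM step itself is not shown.

```python
import re

# Hypothetical placeholder syntax; the platform's real reference format is not documented here.
PLACEHOLDER = re.compile(r"\{\{vault:([a-z0-9_]+)\}\}")


def inject_secrets(arguments: dict, vault: dict) -> dict:
    """Return a copy of arguments with vault placeholders replaced by real secrets.

    Intended to run server-side only, so the model sees the placeholder
    and never the secret. A missing vault entry raises KeyError.
    """
    def resolve(value):
        if isinstance(value, str):
            return PLACEHOLDER.sub(lambda m: vault[m.group(1)], value)
        if isinstance(value, dict):
            return {k: resolve(v) for k, v in value.items()}
        if isinstance(value, list):
            return [resolve(v) for v in value]
        return value

    return resolve(arguments)


# What the AI sends (placeholder only) versus what the server actually executes.
call = {"headers": {"Authorization": "Bearer {{vault:stripe_key}}"}, "amount": 1200}
executed = inject_secrets(call, {"stripe_key": "sk_test_123"})
```

The original `call` is left untouched, so anything logged or echoed back to the assistant still contains only the placeholder.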

The positioning

We're not competing with Lovable or Bolt on the chat interface. We're the Supabase + Vercel + n8n underneath — reachable via whichever AI you already use.

The platform your AI builds on.

websitepublisher.ai

I Connected Hermes Agent to a Live MCP Server with 59 Tools and Here's What It Actually Built

Michael Egberts — Thu, 28 May 2026 10:29:23 +0000

This is a submission for the Hermes Agent Challenge: Write About Hermes Agent

The Problem Nobody Talks About

Every AI can generate HTML. Give Claude, ChatGPT, or Gemini a prompt and they'll produce a beautiful landing page in seconds. But here's the thing nobody mentions in the demos:

The HTML has nowhere to go.

You copy it. You find hosting. You configure DNS. You set up SSL. You build a form backend. You connect a payment provider. You do this every single time, for every single client. The AI is fast. Everything around it is slow.

That's why we built WebsitePublisher.ai — an AI web platform where the AI doesn't just generate HTML, it publishes a live website through MCP tools. Pages, forms, data, payments, visual editing — 59 tools that turn any AI conversation into a working website.

But we hit a different problem.

The Skill Problem

We wrote a SKILL.md — 1,800 lines of documentation teaching AI assistants how to use our tools correctly. Patching rules, fragment conventions, design context, form integration patterns. Everything an AI needs to know.

And every AI interpreted it differently.

Claude sometimes forgot to re-fetch pages after patching. ChatGPT would skip fragments and hardcode headers into every page. Gemini ignored design context entirely. We rewrote the skill dozens of times. Added more examples. Simplified. Restructured. Still, each platform had its own blind spots.

Then we discovered Hermes Agent.

What Hermes Agent Actually Is

I'll be honest — I initially thought Hermes was an AI model. It's not.

Hermes Agent is an agent framework. It's the orchestration layer — tools, memory, and self-improving skills — that sits between you and any LLM. You plug in Claude, GPT-4o, Gemini, or a local model as the "brain." Hermes handles the rest.

Think of it like this:

Framework	Brain
Claude Code	Claude
Cursor	Claude / GPT
Hermes Agent	Your choice

The key differentiator: self-improving skills. Hermes learns from its own sessions. It builds reusable knowledge documents that persist across conversations. The more it works, the better it gets.

For us, this was the missing piece. Instead of rewriting our SKILL.md for every AI platform, what if we put Hermes in front — as an enforcement layer that learns our tool patterns once and applies them correctly, regardless of which LLM is doing the thinking?

We had to test it.

Setting Up the Connection

The setup was surprisingly smooth. On a Mac:

1. Install Hermes:

curl -fsSL https://raw.githubusercontent.com/NousResearch/hermes-agent/main/scripts/install.sh | bash

2. Install MCP support:

cd ~/.hermes/hermes-agent
uv pip install -e ".[mcp]" --python ~/.hermes/hermes-agent/venv/bin/python

3. Configure Claude as the brain (in ~/.hermes/config.yaml):

provider: anthropic
model: claude-sonnet-4-20250514

4. Add WebsitePublisher as MCP server:

mcp_servers:
  websitepublisher:
    url: "https://mcp.websitepublisher.ai/mcp"
    auth: oauth
    timeout: 60

5. Authenticate:

hermes mcp login websitepublisher

The browser opened, OAuth 2.1 + PKCE completed in seconds, and:

✓ Authenticated — 59 tool(s) available

59 tools discovered. No session ID issues. No configuration headaches. Total setup time: under 10 minutes.

The Tests

We ran six structured tests to see how Hermes handles a real-world MCP integration. No cherry-picking — these are the actual results.

Test 1: Skill Loading + Page Creation

Prompt: "Call get_skill to load the WebsitePublisher skill, then build a simple landing page for an AI-powered website builder called 'Hermes Web'. Use a modern dark theme with a hero section, 3 feature cards, and a CTA button."

Hermes did something interesting. Without being told the specific workflow, it:

Called get_skill — twice (main + design guidelines)
Called get_project_status to check for existing design context
Set a design context via execute_integration to persist the color palette
Created the page with create_page

That workflow (skill → context → build) is exactly what our SKILL.md recommends. Hermes absorbed it and followed it.

Result: A professional dark-themed landing page, built in 1 minute 27 seconds. Gradient text, glass-morphism cards, responsive design, meta descriptions set, SEO enabled.

👉 See it live: hermes-mcp.websitepublisher.ai

Test 2: Patching

Prompt: "Change the CTA button text to 'Launch Your Site Now' and update the hero subtitle."

Hermes correctly chose patch_page over update_page — a targeted edit instead of replacing the entire page. It found both CTA buttons (hero and footer) and updated them in a single call.

Time: 17 seconds. No mismatches, no broken HTML.
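The difference between a targeted patch and a full replacement can be sketched in a few lines. This is an illustration only: `patch_html` is a hypothetical helper, not the actual behaviour of patch_page, and it assumes a patch is a plain find-and-replace that must fail loudly when the target text is missing.

```python
def patch_html(html: str, old: str, new: str) -> tuple[str, int]:
    """Replace every occurrence of old with new and report how many were changed."""
    count = html.count(old)
    if count == 0:
        # A silent no-op would hide a mismatch, so raise instead.
        raise ValueError(f"patch target not found: {old!r}")
    return html.replace(old, new), count


page = (
    '<section><a class="cta">Get Started</a></section>'
    '<footer><a class="cta">Get Started</a></footer>'
)
patched, changed = patch_html(page, "Get Started", "Launch Your Site Now")
```

Here both CTA buttons change in one call while the rest of the markup is never re-sent, which is the property that makes patching safer than regenerating the page.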

Test 3: Fragments

This is where most AI assistants struggle. Fragments are reusable components (headers, footers) shared across pages. They use a different tool (create_fragment) than pages, and they can't be patched, only fully replaced.

Prompt: "Create a reusable header fragment with navigation and the same dark theme."

Hermes correctly used create_fragment (not create_page), named it site-header, maintained the exact same color palette and fonts, added a mobile hamburger menu, and documented the include tag.

Time: 44 seconds.

Test 4: Data Entities

Prompt: "Create a 'features' entity with title, description, icon, and sort_order fields. Set public read. Add 3 records."

Hermes created the entity, enabled public read, and added 3 records in separate calls with correct sort ordering. Then, without being asked, it documented both the SSR template syntax (with mustache tags) and the JavaScript fetch endpoint.

That SSR syntax comes directly from our SKILL.md. Hermes learned it and applied it correctly.

Time: 32 seconds.

Test 5: Form Integration

The SAPI form system has a specific pattern: configure the form server-side first, then add the HTML. The honeypot spam protection is automatic — if an AI manually adds a honeypot field, it breaks the form.

Prompt: "Add a contact form. Do NOT manually add a honeypot field."

Hermes called configure_form first (correct order), then patch_page to add the HTML. No manual honeypot. Dark theme styling maintained. Rate limiting configured.

Time: 1 minute 6 seconds.

Test 6: Self-Reflection

Prompt: "What have you learned? What would you do differently?"

This is where it got interesting. Hermes identified its own mistakes:

"I created the features entity but then didn't actually use it in the page! Should have replaced the static feature cards with SSR."

"Should have immediately updated the landing page to use the fragment instead of the inline header."

And a quote that captures exactly why this matters:

"The platform remembers so the AI doesn't have to."

The Results

Test	Status	Time	Key Observation
Skill + Page Creation	✅	1m 27s	Followed SKILL.md workflow without being told
Patching	✅	17s	Correct method, found both CTAs
Fragments	✅	44s	Right tool, consistent design
Data Entities	✅	32s	SSR syntax from skill applied
Forms	✅	1m 6s	No manual honeypot, correct order
Self-Reflection	✅	41s	Identified own mistakes

6 tests. 0 failures. 9% context window used. ~5 minutes total tool execution time.

That context efficiency deserves emphasis. After loading a 1,800-line skill document, creating a full page, patching it, building a fragment, setting up a data entity with 3 records, configuring a form, and reflecting on the session, Hermes had used just 9% of its context window. If context use grows roughly linearly, that leaves room for about ten times this workload in a single session before hitting any limits.
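The totals can be checked directly from the results table. The only assumption in this sketch is that context use grows linearly with workload, which the tests did not measure.

```python
# Per-test durations from the results table, in seconds.
durations = {
    "Skill + Page Creation": 87,  # 1m 27s
    "Patching": 17,
    "Fragments": 44,
    "Data Entities": 32,
    "Forms": 66,                  # 1m 6s
    "Self-Reflection": 41,
}

total_seconds = sum(durations.values())
minutes, seconds = divmod(total_seconds, 60)

context_used = 0.09          # 9% of the context window
headroom = 1 / context_used  # workload multiples that fit, assuming linear growth

print(f"total: {minutes}m {seconds}s")           # total: 4m 47s
print(f"workload multiples: {headroom:.1f}")     # workload multiples: 11.1
```

So the six tests add up to just under five minutes, and 9% usage corresponds to roughly eleven multiples of the same workload under the linear assumption.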

There's another thing we didn't expect: Sonnet performed like Opus. Through Hermes, Claude Sonnet 4 produced output quality we normally associate with Opus — structured reasoning, correct workflow ordering, self-criticism. It's as if the agent layer acts as a pre-processor that elevates the underlying model's performance by providing the right context at the right time. The skill system doesn't just teach the LLM what to do — it makes a mid-tier model punch above its weight.

The Bigger Insight

The test results are nice. But here's what actually matters:

Hermes Agent can be a skill enforcement layer.

Right now, when we support a new AI platform, we have to test whether it interprets our SKILL.md correctly. Does it follow the patching rules? Does it use fragments? Does it remember to set design context? Every platform has different blind spots, and we end up rewriting the skill over and over.

With Hermes in the middle, the equation changes:

Before: Every AI must understand the skill → different interpretations → inconsistent results

After: Hermes understands the skill → any LLM can be the brain → consistent results

The self-improving skill system means Hermes gets better at using our tools over time. It builds up patterns, learns from mistakes ("I should have used the entity SSR instead of static HTML"), and applies those lessons in future sessions. The underlying LLM doesn't matter — the agent layer enforces quality.
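To make "enforcement layer" concrete, here is a minimal sketch of a checker that inspects a planned tool-call sequence before anything is executed. It is not how Hermes works internally. The `check_plan` function, the `tool`/`args`/`html` field names, and the string-matching heuristics are all assumptions, and the three rules are simply the ones mentioned in the tests above: load the skill first, configure the form before adding form HTML, and never add a manual honeypot.

```python
def check_plan(calls: list[dict]) -> list[str]:
    """Return a list of rule violations for a planned sequence of tool calls."""
    problems = []
    if calls and calls[0]["tool"] != "get_skill":
        problems.append("get_skill must be the first call")

    form_configured = False
    for i, call in enumerate(calls):
        html = call.get("args", {}).get("html", "")
        if call["tool"] == "configure_form":
            form_configured = True
        if call["tool"] in ("create_page", "patch_page"):
            if "<form" in html and not form_configured:
                problems.append(f"call {i}: form HTML added before configure_form")
            if "honeypot" in html.lower():
                problems.append(f"call {i}: manual honeypot field")
    return problems


good = [
    {"tool": "get_skill"},
    {"tool": "configure_form", "args": {"name": "contact"}},
    {"tool": "patch_page", "args": {"html": "<form></form>"}},
]
bad = [
    {"tool": "patch_page", "args": {"html": '<form><input name="honeypot"></form>'}},
]
```

`check_plan(good)` returns an empty list, while `check_plan(bad)` reports all three violations. The point of the design is that the rules live in one place instead of being re-interpreted by each LLM.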

This isn't theoretical. We ran it with Claude Sonnet 4 as the brain against a live MCP server, with real OAuth authentication, and the result is a published website you can visit right now.

What's Next

We're exploring Hermes Agent as a permanent part of our platform architecture — not just as another supported AI, but as the orchestration layer that sits in front of all of them. One skill to learn, one agent to enforce it, any brain to power it.

The code and configuration are open source: github.com/megberts/mcp-hermes-integration

The live test result: hermes-mcp.websitepublisher.ai

WebsitePublisher.ai is an AI web platform where AI assistants build and publish complete websites via MCP. 59 tools, OAuth 2.1, works with Claude, ChatGPT, Gemini, Cursor, Copilot, Grok, Mistral, and now — Hermes Agent.

Google I/O Just Made MCP Inevitable

Michael Egberts — Wed, 20 May 2026 19:01:30 +0000

This is a submission for the Google I/O 2026 Writing Challenge

Yesterday, Sundar Pichai stood on stage and described Gemini Spark — a 24/7 personal AI agent that runs in the cloud, works while you sleep, and integrates with third-party tools through MCP.

I watched that announcement from a beach chair, on my phone. And I smiled. Because I run one of those third-party MCP servers.

WebsitePublisher.ai exposes 55+ tools through the Model Context Protocol. Nine AI platforms already connect to it — Claude, ChatGPT, Cursor, GitHub Copilot, Windsurf, Gemini, Grok, Mistral, and others. When Google announced that Spark will use MCP for third-party integrations, it wasn't a surprise. It was confirmation.

MCP just went from "promising open standard" to "the protocol Google built its flagship agent on."

Here's what that means from the perspective of someone who's been building on MCP for the past year.

Three I/O Announcements That Matter for MCP

1. Gemini Spark Runs on MCP

Spark is Google's most ambitious agent product yet. It runs on dedicated cloud VMs, powered by Gemini 3.5 and the Antigravity framework. It handles long-horizon tasks in the background — tracking RSVPs, managing workflows, sending reminders — without you keeping a browser tab open.

The critical detail: Spark will connect to third-party tools through MCP. Not a proprietary Google protocol. Not a plugin marketplace with approval gates. MCP — the same open, JSON-RPC based standard that Anthropic published and that dozens of platforms already support.

For MCP server operators like us, this means our existing infrastructure is about to gain access to Google's most powerful agent. We don't need to build a new integration. We don't need to apply to a directory. When Spark's MCP support ships, our 55+ tools will be available to it immediately.

2. Antigravity 2.0 Goes Agent-First

Antigravity is Google's developer platform, and version 2.0 leans hard into agents. The new CLI supports subagent orchestration, terminal sandboxing, credential masking, and Git-aware policies.

What caught my attention: the architecture assumes agents will call external tools as a core workflow, not an afterthought. The sandboxing, the credential management, the ability to spin up specialized subagents — all of this assumes a world where AI agents routinely reach out to external services via standardized protocols.

That's the MCP model. Build once, connect everywhere.

3. AI Edge Gallery Gets MCP Support

This one flew under the radar, but it might be the most interesting for the open-source community. Google AI Edge Gallery now supports MCP, with Gemma 4 handling reasoning locally while only the API calls leave the device.

Think about what that means: an open-weight model, running on your phone or edge device, calling MCP tools on remote servers. The reasoning stays private. Only the structured tool calls travel over the network. That's a privacy-first agent architecture built entirely on open standards.

What "MCP Everywhere" Actually Looks Like in Production

When people hear "MCP support," they think about the protocol spec. I think about what happens at 2 AM when a model sends a malformed tool call.

Running an MCP server in production across 9 platforms has taught me things that don't show up in protocol documentation. Here's what Google's MCP bet actually means for the ecosystem:

Every platform implements MCP slightly differently. Claude sends tool calls one way. ChatGPT structures them another. Cursor batches things. Copilot has its own patterns. The protocol is standardized, but the behavior isn't. When Gemini Spark joins this ecosystem, it will bring its own quirks. MCP server builders need to be resilient to all of them.
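Being resilient to those quirks usually starts with normalizing whatever arrives into one internal shape. The sketch below is illustrative only: `normalize_call` is a hypothetical helper, and the specific quirks it tolerates (missing arguments, arguments sent as a JSON-encoded string) are assumed examples, not documented behaviour of any named platform.

```python
import json


def normalize_call(raw: dict) -> dict:
    """Coerce a loosely shaped tool call into {'name': str, 'arguments': dict}."""
    name = raw.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("tool call has no usable name")

    args = raw.get("arguments")
    if args is None:
        # Some clients may omit arguments for zero-parameter tools.
        args = {}
    elif isinstance(args, str):
        # Others may send the arguments object as a JSON-encoded string.
        try:
            args = json.loads(args)
        except json.JSONDecodeError as exc:
            raise ValueError(f"arguments are not valid JSON: {exc}") from exc

    if not isinstance(args, dict):
        raise ValueError("arguments must be an object")
    return {"name": name, "arguments": args}
```

Anything that cannot be normalized is rejected with a clear error rather than passed on, which is what keeps a malformed 2 AM tool call from turning into a half-applied change.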

Model size determines orchestration depth, not tool-call success. I wrote about this in detail in my Gemma 4 article — simple tool calls succeed regardless of model size. What varies is how many sequential, context-dependent calls a model can chain before losing coherence. With Spark running on Gemini 3.5 and persistent cloud VMs, Google is betting on deep orchestration. That changes what MCP servers need to support.

Authentication is the real battleground. MCP specifies OAuth 2.1 for auth, but every platform handles it differently. Some use session tokens. Some use project-scoped keys. Some do dynamic client registration. When we tested our server on MCP Playground last week, it connected and authenticated — but tool discovery failed because our server was too restrictive about token types. Multiply that by every new platform adopting MCP, and you see the challenge: the protocol is open, but making it work everywhere requires constant adaptation.
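The discovery failure described above suggests separating two questions: which token kinds the server accepts, and what each one is allowed to do. A rough sketch, with everything in it assumed for illustration: the `Principal` type, the token values, and the `tools:list` / `tools:call` scope names are hypothetical, and the lookup table stands in for real token validation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Principal:
    kind: str          # "oauth", "session", or "project_key"
    scopes: frozenset


# Hypothetical token table standing in for real validation (signatures, introspection).
TOKENS = {
    "oauth-abc": Principal("oauth", frozenset({"tools:list", "tools:call"})),
    "sess-123": Principal("session", frozenset({"tools:list", "tools:call"})),
    "proj-xyz": Principal("project_key", frozenset({"tools:list"})),
}


def authenticate(header: str) -> Principal:
    """Accept any known token kind, with or without a 'Bearer ' prefix."""
    token = header.removeprefix("Bearer ").strip()
    principal = TOKENS.get(token)
    if principal is None:
        raise PermissionError("unknown token")
    return principal


def authorize(principal: Principal, scope: str) -> bool:
    """Discovery and execution are checked separately against the granted scopes."""
    return scope in principal.scopes
```

With this split, a project-scoped key can still list tools even if it is not allowed to call them, so discovery does not fail just because the token is of an unexpected kind.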

From Vibe Coding to Wave Coding

There's a bigger shift happening underneath these announcements, and Google I/O crystallized it for me.

The current hype is "vibe coding" — you prompt an AI, it generates code, you hope it works. It's fun for demos. It's terrifying for production.

What MCP enables is something we've started calling wave coding: instead of generating code from scratch, the AI assembles proven, production-tested software building blocks through structured tool calls. Each wave of assembly builds on the last. The AI doesn't write your payment integration from a prompt — it calls execute_integration with your Stripe credentials and configures a tested, deployed payment flow.

Google's I/O announcements accelerate this shift. When Spark can call MCP tools in the background, 24/7, on dedicated VMs — that's not vibe coding anymore. That's an agent riding waves of pre-built, battle-tested components to deliver real results while you sleep.

Our MCP server already supports 13 e-commerce integrations: product catalogs, shopping carts, checkout flows, payment processing, invoice generation, inventory tracking. An agent like Spark could orchestrate an entire webshop build through sequential MCP calls — not by generating code, but by assembling proven pieces.

That's the trajectory Google just endorsed.

What MCP Server Builders Should Do Right Now

If you're building or running an MCP server, here's what I'd prioritize based on the I/O announcements:

Support deep orchestration. Spark runs on dedicated VMs with Gemini 3.5. It will attempt longer tool-call chains than any current platform. Your server needs to handle 10-15+ sequential calls within a single session without state confusion.

Harden your auth. Accept multiple token types (session tokens, project-scoped keys, OAuth flows). Every new platform that adopts MCP will try to authenticate differently. Be permissive in what you accept, strict in what you authorize.

Make tool schemas discoverable. Your tools/list endpoint is your storefront. When Spark connects and asks what you can do, the response needs to be clear, well-structured, and complete. Poor schemas mean poor tool selection by the agent.

Test across platforms. We test against 9 platforms. When Spark launches its MCP support, it'll be 10. Each one surfaces different edge cases. What works perfectly with Claude might fail silently with Gemini.

The Bigger Picture

A year ago, MCP was a specification from Anthropic. Today, Google is building its flagship consumer AI agent on it, and Cursor, Copilot, Windsurf, Mistral, and Grok all support it too.

We're watching MCP become the HTTP of AI agents: an open protocol that lets any model talk to any tool, regardless of who built either one.

Google I/O 2026 didn't invent this future. But it made it inevitable. When the company that runs Search, Gmail, Android, and Chrome tells the world "our AI agent uses MCP for third-party tools," the debate is over. MCP is the standard.

For those of us who've been building on it, that's not a surprise. It's a validation.

And for everyone else: the doors are open. The protocol is documented. The models are ready. The only question is what you'll build.

Written and published from a phone, during Google I/O, while running the MCP server that just got a whole lot more relevant.

Resources:

MCP Specification — the protocol powering all of this
WebsitePublisher.ai — our MCP server, free tier with 55+ tools
Google I/O 2026 Keynote — Sundar Pichai's full recap
My Gemma 4 x MCP article — testing Gemma 4 as an MCP agent from a phone

Which Gemma 4 Variant Should Power Your MCP Agent?

Michael Egberts — Mon, 18 May 2026 05:48:02 +0000

I’m writing this from a phone, on vacation. That’s not a flex — it’s the point.

I run an MCP server in production: WebsitePublisher.ai, 55+ tools, 9 AI platforms connected. This afternoon I opened Google AI Studio on my phone, selected Gemma 4 26B, gave it our tool schemas, and asked it to build a bakery website. It returned six structured tool calls. I executed them. The site went live.

No laptop. No terminal. No IDE. Just a phone, a model, and a protocol.

That experience crystallized something I’ve been thinking about: Gemma 4 doesn’t ship one model — it ships four, each sized for a different deployment reality. The question isn’t whether open-weight models can power MCP agents. It’s which Gemma 4 variant fits which part of your agent stack?

What is MCP?

Model Context Protocol is an open standard (JSON-RPC based) that lets AI models call external tools through a universal interface. Instead of each AI platform building proprietary integrations, you build one MCP server and every compatible AI can use it.

Our MCP server exposes tools like create_page, upload_asset, create_record, configure_form, and execute_integration. When a model connects, it can create web pages, manage structured data, handle form submissions, and trigger third-party services — all through standardized tool calls.

Any model that can produce structured output can be an MCP client. The question is: how well does it handle the work once connected?
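On the wire, that universal interface is a pair of JSON-RPC 2.0 methods: tools/list for discovery and tools/call for execution. A minimal sketch of the two request bodies follows. The `rpc` helper is hypothetical and the argument names passed to create_page are assumptions, since the real tool schemas are not shown here.

```python
import json
from itertools import count
from typing import Optional

_ids = count(1)  # JSON-RPC requests carry an id so responses can be matched


def rpc(method: str, params: Optional[dict] = None) -> str:
    """Build a JSON-RPC 2.0 request body as a JSON string."""
    message = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discovery: ask the server which tools it offers.
list_request = rpc("tools/list")

# Execution: call one tool by name with a JSON object of arguments.
call_request = rpc("tools/call", {
    "name": "create_page",
    "arguments": {"slug": "index", "title": "Golden Crust"},  # hypothetical argument names
})
```

Any model that can emit the `name` and `arguments` pair as structured output can drive the second request, which is why structured output is the only hard requirement for an MCP client.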

The Gemma 4 Lineup

	E2B	E4B	26B A4B	31B Dense
Active params	~2B	~4B	3.8B (26B total, MoE)	31B
Context	128K	128K	256K	256K
Audio input	Yes	Yes	No	No
Min RAM	~4 GB	~8 GB	~16 GB	~24 GB
Runs on	Phone, RPi	Laptop	Dev workstation	GPU server

All Apache 2.0. No usage caps, no MAU thresholds.

The Test: Gemma 4 26B Builds a Website from a Phone

Here’s exactly what happened.

I opened Google AI Studio on my iPhone, selected Gemma 4 26B A4B IT, and pasted a prompt containing five of our MCP tool schemas (create_page, create_entity, create_record, configure_form, list_pages) along with this instruction:

“A user says: Build me a simple landing page for my bakery called ‘Golden Crust’. Include a short intro, three signature breads, and a contact form. Respond with the exact sequence of MCP tool calls.”

Gemma 4 returned six tool calls in valid JSON:

create_entity — defined a “bread” data model with name and description fields
create_record x3 — added Sourdough, French Baguette, and Honey Whole Wheat
create_page — generated a full HTML landing page with inline CSS, product listings, and a contact form
configure_form — set up the contact form with name, email, and message fields

Every tool call used the correct parameter structure. The ordering was logical: data model first, then records, then the page that references them, then the form configuration. The HTML included sensible styling, warm bakery colors, and properly structured sections.
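The ordering point is easier to see with the plan written out as data. This is a reconstruction for illustration, not the model's actual output: the argument names are assumptions, and `run` is a hypothetical in-memory stub rather than our server. The stub only enforces one dependency, namely that records need their entity to exist first.

```python
plan = [
    {"tool": "create_entity", "args": {"name": "bread", "fields": ["name", "description"]}},
    {"tool": "create_record", "args": {"entity": "bread", "name": "Sourdough"}},
    {"tool": "create_record", "args": {"entity": "bread", "name": "French Baguette"}},
    {"tool": "create_record", "args": {"entity": "bread", "name": "Honey Whole Wheat"}},
    {"tool": "create_page", "args": {"slug": "index", "form": "contact"}},
    {"tool": "configure_form", "args": {"name": "contact", "fields": ["name", "email", "message"]}},
]


def run(calls: list) -> dict:
    """Execute calls in order against an in-memory stub; reject records for unknown entities."""
    state = {"entities": set(), "records": [], "pages": [], "forms": []}
    for call in calls:
        tool, args = call["tool"], call["args"]
        if tool == "create_entity":
            state["entities"].add(args["name"])
        elif tool == "create_record":
            if args["entity"] not in state["entities"]:
                raise ValueError(f"entity {args['entity']!r} does not exist yet")
            state["records"].append(args["name"])
        elif tool == "create_page":
            state["pages"].append(args["slug"])
        elif tool == "configure_form":
            state["forms"].append(args["name"])
        else:
            raise ValueError(f"unknown tool {tool!r}")
    return state


state = run(plan)
```

Running the same six calls in reverse order raises an error at the first create_record, which is the kind of failure a model avoids by sequencing the data model before the content that depends on it.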

I copied the tool calls into Claude (the AI assistant I use for MCP execution), ran them against our server, and the site went live at gemma-test.websitepublisher.ai.

Total time from prompt to live website: under 10 minutes. From a phone. Over 5G.

What This Test Actually Proves

Let me be precise about what this demonstrates and what it doesn’t.

It proves: Gemma 4 26B can parse MCP tool schemas, reason about task decomposition, produce correctly structured tool calls, and sequence them in a logical order — all without any fine-tuning on our specific tools. This is zero-shot tool use on a real production API.

It doesn’t prove: That Gemma 4 can handle a live MCP connection autonomously. In this test, I manually copied the tool calls and executed them. The model generated the plan; I was the middleware.

That distinction matters, and it’s where the variant comparison gets interesting.

Mapping Variants to MCP Agent Roles

Based on running MCP across 9 AI platforms and watching models of every size class interact with our tools, here’s how I’d think about placing each Gemma 4 variant:

E2B: The Front Door

With ~2B active parameters, E2B fits as the trigger: the component that understands intent and dispatches a single tool call. A voice command on a phone — “publish my latest blog post” — parsed and routed to the right MCP tool. One intent, one call, one response.

The native audio input is the differentiator. For voice-triggered MCP agents on battery-constrained devices, this is the size class that makes sense.

Likely sweet spot: Single-tool dispatcher. Voice-triggered agent entry point.
Likely limitation: Multi-step chains where context from earlier calls matters.

E4B: The Local Workhorse

This is where local MCP agents become genuinely useful. Running on any modern laptop, handling single-step tool calls with good reliability.

Based on what I’ve seen at this parameter range: straightforward create-and-deploy loops work well. Where models this size show limits is context-dependent sequences — “build a five-page site with consistent navigation” requires maintaining consistency across multiple creation steps.

Likely sweet spot: Local development agent. Content creation. Moderate single-step tool calls.
Likely limitation: Multi-page orchestration requiring consistency across 4+ sequential calls.

26B A4B: The Efficiency Sweet Spot

This is the variant I tested. And it delivered.

Six sequential tool calls, all correctly structured, logically ordered, with a coherent HTML output that referenced the data model it had just created. That’s not trivial — it requires the model to hold its own plan in context and execute against it consistently.

The MoE architecture (activating only 3.8B parameters per token while drawing on 26B total) and the 256K context window make this variant particularly suited for MCP work. Tool schemas are large — our 55+ tools consume significant context before the model even starts reasoning. The 256K window gives comfortable headroom.

But the bakery test was deliberately simple. Our MCP server also exposes 13 e-commerce integrations — product catalogs, shopping carts, checkout flows, payment processing via Stripe or Mollie, invoice generation, inventory tracking, and more. Building a full webshop means orchestrating these proven software building blocks in sequence: the AI picks the right pieces and combines them into a working application. We call this wave coding — not prompting and praying like vibe coding, but riding deliberate waves of AI-assembled, production-tested components. Each wave builds on the last. That’s where a model like the 26B earns its place: enough reasoning depth to orchestrate 6-8 integration calls reliably, enough context to hold the full picture.

Proven sweet spot: Multi-step tool orchestration. Production agent server. The “right answer” for most self-hosted MCP deployments.
Likely limitation: Highly creative or ambiguous tasks where raw reasoning power matters more than efficiency.

31B Dense: The Precision Architect

Every token touches all 31B parameters — no routing, no sparsity. Slower, heavier, but the strongest reasoner in the family.

For MCP agent work, this class earns its compute in two scenarios: architecture-level planning where the sequence of tool calls matters as much as individual calls, and fine-tuning for domain-specific tool patterns. The dense architecture makes fine-tuning more predictable than MoE.

Where 31B pulls ahead of 26B is full wave coding sessions — building an entire webshop from brief to live, orchestrating 15+ sequential integration calls while maintaining consistency across product data, payment configuration, email templates, and frontend pages. That’s the kind of sustained, multi-layer orchestration where every additional parameter matters.

Likely sweet spot: Complex project planning. Full wave coding orchestration. Fine-tuned domain agents.
Likely limitation: Cost and latency. For tasks where 26B delivers equivalent results, you’re burning compute you don’t need.

What I Learned About Model Size and Tool Calling

Running MCP across 9 platforms, one pattern stands out: for simple tool calls, model size barely matters. A “create this page” request succeeds with roughly the same reliability across model classes.

Where model size becomes decisive is orchestration depth — the number of sequential, context-dependent tool calls a model can chain before losing coherence. At two to three calls, almost anything works. Past six calls, only the stronger reasoners maintain consistency.

Open-weight models give you something closed APIs never will: the ability to match model weight to task weight. Route simple status checks to E4B and complex builds to 31B. Your agent gets smarter and cheaper at the same time.

That’s the real unlock of open-weight + MCP: you own both the brain and the hands.

The Decision Framework

Need voice or audio input?
Then E2B (phone/IoT) or E4B (laptop)

How many sequential tool calls per task?
1-3: E4B — fast, light, capable
4-8: 26B A4B — tested and proven
8+: 31B Dense — when orchestration quality justifies compute

Fine-tuning for a specific domain?
31B Dense — dense fine-tunes more predictably than MoE

Budget-constrained?
26B A4B. Almost always the answer.
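The framework above can be written as a small routing function. Treat it as a sketch: `choose_variant` and its parameter names are hypothetical, the precedence between the questions (audio first, then fine-tuning, then budget, then call count) is an assumption, and a task with exactly 8 sequential calls is assigned to 26B A4B because the two ranges overlap at that value.

```python
def choose_variant(
    sequential_calls: int,
    needs_audio: bool = False,
    on_phone: bool = False,
    fine_tuning: bool = False,
    budget_constrained: bool = False,
) -> str:
    """Pick a Gemma 4 variant following the decision framework."""
    if needs_audio:
        # Only the two small variants accept audio input.
        return "E2B" if on_phone else "E4B"
    if fine_tuning:
        return "31B Dense"
    if budget_constrained:
        return "26B A4B"
    if sequential_calls <= 3:
        return "E4B"
    if sequential_calls <= 8:
        return "26B A4B"
    return "31B Dense"
```

For example, `choose_variant(2)` returns "E4B", `choose_variant(6)` returns "26B A4B", and `choose_variant(15)` returns "31B Dense". A router like this is what lets an agent send simple status checks to a small model and reserve the dense model for long builds.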

What’s Next

While testing, I discovered that MCP Playground — an online tool for testing MCP servers — lists both Gemma 4 26B and 31B as available models. Our server connects and authenticates successfully. Once we resolve a token compatibility issue on our end, this will enable fully automated testing: type a prompt, Gemma 4 calls our MCP tools directly, website appears. No copy-paste middleware needed.

That’s the trajectory: from “model generates a plan I execute manually” to “model executes the plan autonomously through MCP.” Gemma 4’s native function calling support, combined with MCP’s standardized tool protocol, makes this path viable on fully open-source infrastructure.

If you want to start experimenting:

Gemma 4 models — Google AI Studio, Ollama, Hugging Face
MCP specification — modelcontextprotocol.io
An MCP server to test against — WebsitePublisher.ai has a free tier with 55+ tools

Pick the variant that fits your hardware. Connect it to a real MCP server. The benchmarks start mattering a lot less once you’re watching a model build something real.

Built and tested entirely from a phone. On vacation. Because that’s what open protocols and open models make possible.

Which Gemma 4 Variant Should Power Your MCP Agent?

Michael Egberts — Sat, 16 May 2026 14:35:48 +0000

📌 This article is now an official Gemma 4 Challenge submission on DEV. Read the latest version there!

This is a submission for the Gemma 4 Challenge: Write About Gemma 4

I’m writing this from a phone, on vacation. That’s not a flex — it’s the point.

No laptop. No terminal. No IDE. Just a phone, a model, and a protocol.

What is MCP?

Any model that can produce structured output can be an MCP client. The question is: how well does it handle the work once connected?

The Gemma 4 Lineup

	E2B	E4B	26B A4B	31B Dense
Active params	~2B	~4B	3.8B (26B total, MoE)	31B
Context	128K	128K	256K	256K
Audio input	Yes	Yes	No	No
Min RAM	~4 GB	~8 GB	~16 GB	~24 GB
Runs on	Phone, RPi	Laptop	Dev workstation	GPU server

All Apache 2.0. No usage caps, no MAU thresholds.

The Test: Gemma 4 26B Builds a Website from a Phone

Here’s exactly what happened.

“A user says: Build me a simple landing page for my bakery called ‘Golden Crust’. Include a short intro, three signature breads, and a contact form. Respond with the exact sequence of MCP tool calls.”

Gemma 4 returned six tool calls in valid JSON:

create_entity — defined a “bread” data model with name and description fields
create_record x3 — added Sourdough, French Baguette, and Honey Whole Wheat
create_page — generated a full HTML landing page with inline CSS, product listings, and a contact form
configure_form — set up the contact form with name, email, and message fields
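For illustration, a plan like the one above could be written as a JSON array of tool calls and checked before anything is executed. The tool names come from the list above. The argument names (`name`, `fields`, `entity`, `data`, `slug`, `html`) and the `validate_plan` helper are assumptions made for this sketch, not the actual WebsitePublisher.ai schema or Gemma 4's literal output.

```python
import json

# Tool names taken from the article; everything else here is hypothetical.
KNOWN_TOOLS = {"create_entity", "create_record", "create_page", "configure_form"}

PLAN_JSON = """
[
  {"tool": "create_entity", "arguments": {"name": "bread", "fields": ["name", "description"]}},
  {"tool": "create_record", "arguments": {"entity": "bread", "data": {"name": "Sourdough"}}},
  {"tool": "create_record", "arguments": {"entity": "bread", "data": {"name": "French Baguette"}}},
  {"tool": "create_record", "arguments": {"entity": "bread", "data": {"name": "Honey Whole Wheat"}}},
  {"tool": "create_page", "arguments": {"slug": "index", "html": "<h1>Golden Crust</h1>"}},
  {"tool": "configure_form", "arguments": {"fields": ["name", "email", "message"]}}
]
"""


def validate_plan(raw: str) -> list:
    """Parse a JSON plan and reject unknown tools or malformed steps."""
    plan = json.loads(raw)
    if not isinstance(plan, list):
        raise ValueError("plan must be a JSON array")
    for step in plan:
        if not isinstance(step, dict) or step.get("tool") not in KNOWN_TOOLS:
            raise ValueError(f"unknown or malformed step: {step!r}")
        if not isinstance(step.get("arguments"), dict):
            raise ValueError(f"arguments must be an object: {step!r}")
    return plan


plan = validate_plan(PLAN_JSON)
print(len(plan), "tool calls:", [step["tool"] for step in plan])
```

A check like this is cheap insurance when one model writes the plan and another system executes it.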

I copied the tool calls into Claude (the AI assistant I use for MCP execution), ran them against our server, and the site went live at gemma-test.websitepublisher.ai.

Total time from prompt to live website: under 10 minutes. From a phone. Over 5G.

What This Test Actually Proves

Let me be precise about what this demonstrates and what it doesn’t.

That distinction matters, and it’s where the variant comparison gets interesting.

Mapping Variants to MCP Agent Roles

Based on running MCP across 9 AI platforms and watching models of every size class interact with our tools, here’s how I’d think about placing each Gemma 4 variant:

E2B: The Front Door

The native audio input is the differentiator. For voice-triggered MCP agents on battery-constrained devices, this is the size class that makes sense.

Likely sweet spot: Single-tool dispatcher. Voice-triggered agent entry point.
Likely limitation: Multi-step chains where context from earlier calls matters.

E4B: The Local Workhorse

This is where local MCP agents become genuinely useful. Running on any modern laptop, handling single-step tool calls with good reliability.

Likely sweet spot: Local development agent. Content creation. Moderate single-step tool calls.
Likely limitation: Multi-page orchestration requiring consistency across 4+ sequential calls.

26B A4B: The Efficiency Sweet Spot

This is the variant I tested. And it delivered.

31B Dense: The Precision Architect

Every token touches all 31B parameters — no routing, no sparsity. Slower, heavier, but the strongest reasoner in the family.

What I Learned About Model Size and Tool Calling

That’s the real unlock of open-weight + MCP: you own both the brain and the hands.

The Decision Framework

Need voice or audio input?
Then E2B (phone/IoT) or E4B (laptop)

How many sequential tool calls per task?
1-3: E4B — fast, light, capable
4-8: 26B A4B — tested and proven
9+: 31B Dense (when orchestration quality justifies compute)

Fine-tuning for a specific domain?
31B Dense — dense fine-tunes more predictably than MoE

Budget-constrained?
26B A4B. Almost always the answer.
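The framework above can be sketched as a small function. The precedence between the questions (audio first, then fine-tuning, then budget, then call count) is an assumption of this sketch, as are the function and parameter names. A task with exactly 8 sequential calls is treated as the top of the middle band.

```python
def pick_variant(sequential_calls: int,
                 needs_audio: bool = False,
                 on_phone: bool = False,
                 fine_tuning: bool = False,
                 budget_constrained: bool = False) -> str:
    """Return a Gemma 4 variant name following the decision framework."""
    if needs_audio:
        # Only the two small variants accept audio input.
        return "E2B" if on_phone else "E4B"
    if fine_tuning:
        return "31B Dense"
    if budget_constrained:
        return "26B A4B"
    if sequential_calls <= 3:
        return "E4B"
    if sequential_calls <= 8:
        return "26B A4B"
    return "31B Dense"


# The bakery test used six sequential tool calls.
print(pick_variant(sequential_calls=6))
```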

What’s Next

If you want to start experimenting:

Gemma 4 models — Google AI Studio, Ollama, Hugging Face
MCP specification — modelcontextprotocol.io
An MCP server to test against — WebsitePublisher.ai has a free tier with 55+ tools

Pick the variant that fits your hardware. Connect it to a real MCP server. The benchmarks start mattering a lot less once you’re watching a model build something real.

Built and tested entirely from a phone. On vacation. Because that’s what open protocols and open models make possible.

WAVE Coding: Why we built 78 integrations for AI instead of letting AI build them

Michael Egberts — Thu, 14 May 2026 19:38:45 +0000

Every week I see another "I built a SaaS in 4 hours with AI" post. And every week, the comments are the same: "Cool, but does the Stripe integration actually work?"

Usually it doesn't.

That's vibe coding. You prompt, you hope, and you pray that the AI correctly implements a payment flow it's never actually tested. It hallucinates webhook handlers. It guesses at email configs. It builds checkout flows that break on the first real transaction.

We took the opposite approach.

The puzzle piece pattern
We're building WebsitePublisher.ai, a platform where AI assistants build and publish websites and web applications through conversation. Available on 9 AI platforms (Claude, ChatGPT, Gemini, Cursor, Windsurf, GitHub Copilot, Grok, Mistral, n8n) via MCP.
In the last 16 days, we shipped 78 integrations. Each one is a self-contained puzzle piece — proven software running on our web servers. AI doesn't generate the integration code. AI calls the integration.
Here's what that looks like in practice:

User: "I need a webshop with Stripe payments and order confirmation emails"

AI selects:
→ product-catalog (MAPI entity + helpers)
→ shopping-cart (session-based)
→ checkout-flow (orchestration engine)
→ stripe (payment processing)
→ invoice-generator (PDF + accounting)
→ email-templates (Resend rendering)

Result: 6 tested puzzle pieces combined into a working application.

Zero hallucinated Stripe webhooks. Zero guessed SMTP configs. The heavy lifting happens in proven software on the server.

Why this matters
The fundamental problem with vibe coding is that AI is asked to do two things at once:

1. Understand what you want (AI is great at this)
2. Implement reliable infrastructure (AI is terrible at this)

WAVE coding separates the two. AI handles #1 — understanding your intent and selecting the right puzzle pieces. The proven software handles #2 — the actual Stripe calls, the email delivery, the database queries.

What's in the 78 puzzle pieces
Some highlights from what shipped in 16 days:
E-commerce stack: Product catalog, shopping cart, checkout flow, order management, inventory tracking, shipping (MyParcel), invoice generation, discount codes
Communication: SMTP email, email templates, contact forms, multi-layer spam protection
Data layer: Server-side rendering for SEO, batch update/delete endpoints, data grids with validation
AI layer: Coach (guided website creation), concept generation, streaming chat
Platform hardening: DNSSEC, request tracing, error envelope standardization, security hardening

Each piece follows the same pattern: a handler receives the endpoint, input, and project ID. Dependencies are explicit. No magic.
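As a rough illustration of that pattern, here is a minimal Python sketch. The platform's real handlers are not shown in this post, so the class name, the `validate` endpoint, and the `lookup_code` dependency are all hypothetical. The only parts taken from the text are the three handler inputs (endpoint, input, project ID) and the idea that dependencies are passed in explicitly.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class DiscountCodeHandler:
    """Hypothetical puzzle piece: validates a discount code for a project."""

    # Explicit dependency: injected by the caller, never looked up globally.
    lookup_code: Callable[[str, str], Optional[int]]

    def handle(self, endpoint: str, payload: Dict[str, Any], project_id: str) -> Dict[str, Any]:
        if endpoint != "validate":
            return {"ok": False, "error": f"unknown endpoint: {endpoint}"}
        percent = self.lookup_code(project_id, payload.get("code", ""))
        if percent is None:
            return {"ok": False, "error": "invalid code"}
        return {"ok": True, "percent_off": percent}


# Usage with an in-memory stand-in for the data layer.
codes = {("proj_1", "SPRING10"): 10}
handler = DiscountCodeHandler(lookup_code=lambda pid, code: codes.get((pid, code)))
print(handler.handle("validate", {"code": "SPRING10"}, "proj_1"))
```

Because the dependency is a plain callable, the same handler can be exercised in a test without a database.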

The results
In the same 16 days:

7 new customers onboarded
World Cup prediction game deployed for PSV Supporters (30,000 members)
Visual editor upgrade shipped
Coach AI guidance system improved

We're calling it WAVE coding

Not because it's a clever acronym. Because each application you build is a wave — one deliberate push that combines existing puzzle pieces into something new. Each wave builds on what came before.
Vibe coding is random energy hoping to land somewhere useful.
WAVE coding is deliberate momentum. 🌊

Curious what you think. Are you building infrastructure for AI to use, or letting AI build infrastructure from scratch?
websitepublisher.ai

We're now in Mistral's connector directory — here's what that means for AI-powered web publishing

Michael Egberts — Tue, 05 May 2026 11:54:06 +0000

Hook: WebsitePublisher.ai is a pre-configured Directory Connector in Mistral's curated MCP connector directory. That's a mouthful, so let me break it down: Le Chat users can now find us in their connector settings, click Add, complete OAuth, and immediately start building websites through conversation.

Key points to cover:
What a Directory Connector is vs Custom Connector
OAuth 2.1 + DCR auto-discovery
55+ MCP tools available
How it compares to our ChatGPT integration (Custom GPT = zero setup)
The 10-platform story
Link to setup guide: websitepublisher.ai/docs/mcp#mistral-setup

Building an AI-native web platform: 69 features in 11 days (solo dev log)

Michael Egberts — Tue, 28 Apr 2026 10:35:51 +0000

I'm building WebsitePublisher.ai — a platform where AI assistants build and publish complete websites through MCP (Model Context Protocol) tools. Here's what the last 11 days looked like.

The Visual Editor Problem
Users could build entire websites through conversation, but changing a single typo required asking the AI to patch the page. Terrible UX for small edits.
Solution: WPE v2 — a visual editor that loads the published page in an iframe, detects editable elements, and lets users click-to-edit text, drag-and-drop images, right-click for context menus, and save back through the same PAPI that AI assistants use.

Key decisions: httpOnly cookies instead of URL tokens, change count badge in dashboard, overlay detection for CSS-covered images.

The Integration Cookbook
Our integration engine (IAPI) supports http (external proxy) and internal (PHP handler) drivers. After shipping LinkedIn posting and SMTP email in the same week, I wrote an Internal Integration Cookbook:

Create manifest JSON with endpoint definitions
Implement handler extending InternalIntegrationHandler
Define body_transform and response_transform hooks
Register in engine
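The four cookbook steps can be sketched as follows. The real engine is PHP and its code is not shown here, so this Python version only mirrors the names mentioned above (`InternalIntegrationHandler`, `body_transform`, `response_transform`). The manifest fields, the `echo` integration, and the registry functions are assumptions for illustration.

```python
# Step 1: a manifest with endpoint definitions (field names are hypothetical).
MANIFEST = {
    "name": "echo",
    "driver": "internal",
    "endpoints": {"say": {"method": "POST"}},
}


class InternalIntegrationHandler:
    """Stand-in base class; the real one lives in the platform's PHP code."""

    def body_transform(self, endpoint, body):
        return body

    def response_transform(self, endpoint, response):
        return response

    def call(self, endpoint, body):
        raise NotImplementedError

    def execute(self, endpoint, body):
        prepared = self.body_transform(endpoint, body)
        return self.response_transform(endpoint, self.call(endpoint, prepared))


# Steps 2 and 3: a handler that extends the base class and defines both hooks.
class EchoHandler(InternalIntegrationHandler):
    def body_transform(self, endpoint, body):
        return {"text": str(body.get("text", "")).strip()}

    def call(self, endpoint, body):
        return {"echo": body["text"]}

    def response_transform(self, endpoint, response):
        return {"ok": True, "data": response}


# Step 4: register in the engine (a plain dict in this sketch).
REGISTRY = {}


def register(manifest, handler):
    REGISTRY[manifest["name"]] = (manifest, handler)


def execute(integration, endpoint, body):
    manifest, handler = REGISTRY[integration]
    if endpoint not in manifest["endpoints"]:
        raise KeyError(f"{integration} has no endpoint {endpoint!r}")
    return handler.execute(endpoint, body)


register(MANIFEST, EchoHandler())
print(execute("echo", "say", {"text": "  hello  "}))
```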

Result: the API Proxy integration took 2 hours from start to production.

MCP Session Stability
OAuth 2.1 + PKCE authentication. Claude's connector occasionally lost session state → 401 errors mid-conversation. Fixed with a 6-layer approach: token refresh, graceful 401 handling, session persistence, stale token detection, project selector refresh, frontend error boundaries.
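Two of those layers, token refresh and graceful 401 handling, can be sketched together as a retry-once wrapper. This is a generic pattern under stated assumptions, not the platform's code: `TokenExpired`, `call_with_refresh`, and the fake transport below are all hypothetical.

```python
from typing import Callable, Tuple


class TokenExpired(Exception):
    """Raised by the transport when the server answers 401."""


def call_with_refresh(request: Callable[[str], dict],
                      refresh: Callable[[], str],
                      token: str) -> Tuple[dict, str]:
    """Try once; on a 401, refresh the token and retry exactly once."""
    try:
        return request(token), token
    except TokenExpired:
        new_token = refresh()
        # A second 401 propagates to the caller instead of looping forever.
        return request(new_token), new_token


# Fake transport: only the token "fresh" is accepted.
def fake_request(token: str) -> dict:
    if token != "fresh":
        raise TokenExpired()
    return {"status": 200}


result, token_in_use = call_with_refresh(fake_request, lambda: "fresh", "stale")
print(result, token_in_use)
```

Returning the token alongside the result lets the caller persist the refreshed value for the next request.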

Custom Domains with DNS Pre-validation
SSL certificate requests would fail when DNS hadn't propagated. Now we validate CNAME resolution before calling certbot. Clear error messages instead of cryptic failures.
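A pre-validation step along these lines might look like the sketch below. The resolver is passed in as a function so the check stays testable offline; in production it would wrap a real DNS lookup. The function name, the messages, and the target hostname `sites.example.net` are assumptions, not the platform's actual values.

```python
from typing import Callable, Optional, Tuple


def cname_ready(domain: str,
                expected_target: str,
                resolve_cname: Callable[[str], Optional[str]]) -> Tuple[bool, str]:
    """Check that `domain` has a CNAME pointing at `expected_target`."""
    target = resolve_cname(domain)
    if target is None:
        return False, f"No CNAME record found for {domain} yet. DNS may still be propagating."
    # DNS answers often carry a trailing dot and mixed case; normalize both sides.
    if target.rstrip(".").lower() != expected_target.rstrip(".").lower():
        return False, f"{domain} points to {target}, expected {expected_target}."
    return True, "DNS looks good; safe to request the certificate."


# Fake resolver standing in for a real DNS query.
records = {"www.goldencrust.example": "Sites.Example.Net."}
ok, message = cname_ready("www.goldencrust.example", "sites.example.net", records.get)
print(ok, message)
```

Only when the check passes would the certificate request be made, which is what turns a cryptic failure into a clear message.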
Numbers
Tasks completed (11 days): 69
Total MCP tools: 55
AI platforms supported: 9
API layers: 10
Directory listings: 15+

→ websitepublisher.ai

Week 16 Dev Log: From First Customer to 56 MCP Tools — Building an AI-Native Website Publisher

Michael Egberts — Mon, 13 Apr 2026 14:05:54 +0000

I'm building WebsitePublisher.ai — a platform where AI assistants build and publish websites through MCP (Model Context Protocol) tools. This week: first paying customer, team collaboration, and some interesting architecture decisions.

The Stack

Quick context: Laravel/PHP backend, dual-server DigitalOcean cluster, Redis Sentinel, MySQL, S3 for assets. The AI layer is a multi-API stack:

PAPI — pages and assets
MAPI — entities and structured data
VAPI — vault/secrets management
IAPI — third-party integrations (Stripe, Resend, Twilio, etc.)
SAPI — sessions, forms, visitor auth
TAPI — task tracking across AI sessions
AAPI — scheduled automations

All exposed as MCP tools. Currently 56 tools, accessible from Claude, ChatGPT, Cursor, Windsurf, GitHub Copilot, Gemini, Grok, Mistral, and n8n.

What shipped this week

Team Collaboration

Our first paying customer — an agency — asked for multi-user support on day one. We shipped it within 48 hours:

Owner invites team members via email
Team members get full access to all projects
Magic link authentication (no passwords)
Max 5 members per Agency account

Architecture decision: no per-project granularity in v1. Simpler model, ship fast, iterate based on real usage.

Dashboard Vault — Secrets Without AI Exposure

A security feature I'm particularly proud of: the Vault tab lets users manage API keys (Stripe, Resend, etc.) through the browser UI. The key insight: if you share an API key in an AI chat, it ends up in transcripts, logs, and context windows. The Vault bypasses AI entirely — write-once, never displayed, rotate or delete only.

Backend uses AES-256-GCM encryption keyed per project, so even if someone gains database access, secrets from other projects are unreadable.
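The post does not say how the per-project keys are produced. One common option, shown here purely as an assumption, is to derive them from a single master key with HKDF (RFC 5869), using the project ID as the context. The sketch covers only key derivation; the AES-256-GCM encryption step itself would come from a cryptography library and is left out. The salt label and function names are hypothetical.

```python
import hashlib
import hmac


def hkdf_sha256(master_key: bytes, salt: bytes, info: bytes, length: int = 32) -> bytes:
    """HKDF with SHA-256: extract a pseudorandom key, then expand it."""
    prk = hmac.new(salt, master_key, hashlib.sha256).digest()
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


def project_key(master_key: bytes, project_id: str) -> bytes:
    """Derive a 32-byte key (the size AES-256 needs) bound to one project."""
    return hkdf_sha256(master_key, salt=b"vault-v1", info=project_id.encode("utf-8"))


master = b"\x00" * 32  # placeholder only; a real master key must be random and secret
key_a = project_key(master, "project-a")
key_b = project_key(master, "project-b")
print(len(key_a), key_a != key_b)
```

With distinct keys per project, ciphertext copied from one project's rows cannot be decrypted with another project's key.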

Language Refactor — From 4 Hardcoded Lists to Zero

Our AI Coach (a conversational website builder) needed proper i18n. The old code had 4 separate hardcoded language→string mappings scattered across the codebase. We refactored to a single CapiLanguage value object with a Redis → DB → Haiku → fallback chain:

Check Redis cache
Check papi_language_meta table (27 seeded languages)
Ask Claude Haiku for language detection (costs ~$0.001)
Fall back to English

Result: any valid ISO 639-1 code works automatically. Adding a new language = 1 database row.
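The fallback chain can be sketched like this. A plain dict stands in for Redis, and the database and model lookups are injected as functions. None of these names come from the actual `CapiLanguage` implementation, and writing resolved values back to the cache is an assumption of the sketch.

```python
from typing import Callable, Dict, Optional

Lookup = Callable[[str], Optional[str]]


def resolve_language(code: str,
                     cache: Dict[str, str],
                     db_lookup: Lookup,
                     model_lookup: Lookup) -> str:
    """Resolve a language code: cache, then database, then model, then English."""
    code = code.strip().lower()
    if code in cache:
        return cache[code]
    # The model is only asked when the database has no row for this code.
    name = db_lookup(code) or model_lookup(code)
    if name is None:
        return "English"
    cache[code] = name  # later requests for this code skip both lookups
    return name


seeded = {"nl": "Dutch", "de": "German"}
cache: Dict[str, str] = {}
model_calls = []


def fake_model(code: str) -> Optional[str]:
    model_calls.append(code)
    return {"fy": "Frisian"}.get(code)


print(resolve_language("NL", cache, seeded.get, fake_model))
print(resolve_language("fy", cache, seeded.get, fake_model))
print(resolve_language("zz", cache, seeded.get, fake_model))
```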

get_asset MCP Tool — Closing the Read Gap

We had tools to list, upload, patch, and delete assets — but no tool to read them. AI agents were doing blind find/replace via patch_asset without knowing the current file state. get_asset closes that gap: text content returned directly, binary assets as base64.
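A minimal sketch of that text-versus-binary split, assuming the decision is made by attempting a UTF-8 decode. The post does not say how the platform decides, and the response field names here are hypothetical.

```python
import base64


def asset_payload(filename: str, data: bytes) -> dict:
    """Return text assets as-is and anything that is not valid UTF-8 as base64."""
    try:
        return {"filename": filename, "encoding": "text", "content": data.decode("utf-8")}
    except UnicodeDecodeError:
        encoded = base64.b64encode(data).decode("ascii")
        return {"filename": filename, "encoding": "base64", "content": encoded}


css = asset_payload("style.css", b"body { margin: 0; }")
png = asset_payload("logo.png", b"\x89PNG\r\n\x1a\n")  # first 8 bytes of any PNG file
print(css["encoding"], png["encoding"])
```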

The Activation Challenge

67 signups, one paying customer. The gap is real. Our research this week showed that the friction is in the "how do I start?" moment — users sign up, see a dashboard, but don't know which AI platform to connect or how to begin.

Our answer: an embedded AI coach right inside the dashboard. Uses Sonnet (not Opus — cost control), generates one concept, writes directly to the user's first project. From "I just signed up" to "I have a website" in under 2 minutes.

What's next

Friday release agent (AAPI-powered automated changelog + test plans)
patch_asset optimistic concurrency (base version hash to prevent conflicts)
CAPI language refactor retest (waiting on external tester confirmation)
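For the planned patch_asset change, optimistic concurrency with a base version hash could look roughly like the sketch below. This is a guess at the shape of a feature that has not shipped: the in-memory store, the SHA-256 choice, and every name here are assumptions.

```python
import hashlib


class VersionConflict(Exception):
    """The asset changed since the caller last read it."""


def content_hash(text: str) -> str:
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def patch_asset(store: dict, path: str, base_hash: str, find: str, replace: str) -> str:
    """Apply a find/replace only if the asset still matches `base_hash`."""
    current = store[path]
    if content_hash(current) != base_hash:
        raise VersionConflict(f"{path} was modified by another session")
    if find not in current:
        raise ValueError(f"{find!r} not found in {path}")
    store[path] = current.replace(find, replace, 1)
    return content_hash(store[path])  # the caller uses this as its next base hash


store = {"style.css": "body { color: red; }"}
base = content_hash(store["style.css"])
new_base = patch_asset(store, "style.css", base, "red", "blue")
print(store["style.css"])
```

A second session still holding the old hash gets a `VersionConflict` instead of silently overwriting the first session's change.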

If you're building with MCP or interested in AI-native development tools, I'd love to connect. The MCP ecosystem is moving fast.

GitHub: megberts/websitepublisher-mcp
MCP Server: mcp.websitepublisher.ai

How we built an MCP server that lets AI assistants publish complete websites

Michael Egberts — Wed, 25 Mar 2026 07:53:24 +0000

Building a website with an AI assistant usually ends the same way: you get a wall of HTML in a code block, you paste it somewhere, and then you're on your own for hosting, deployment, and every update after that.
We wanted to fix that. So we built WebsitePublisher.ai — a platform where AI assistants don't just describe websites, they actually build and publish them.
Here's how it works under the hood.

The core idea: AI as a first-class developer
The premise is simple. Instead of AI being a code generator that hands off to a human, we wanted AI to be the developer — with access to a real API it can call directly.
That API needed to cover the full stack of what a website actually needs:

Pages and assets (HTML, CSS, images)
Structured data (entities, records)
Forms and visitor sessions
Integrations (email, SMS, payments)
Scheduled tasks
Vault (credentials management)

So we built it. Eight API layers, all exposed through a Model Context Protocol (MCP) server.

What MCP gives us
MCP is an open protocol that lets AI assistants call tools — similar to function calling, but standardized across clients. Claude, ChatGPT, Cursor, Windsurf, GitHub Copilot, and others all support it.
Our MCP server exposes ~55 tools. An AI can call create_page with HTML content and it's live. It can call configure_form and a contact form appears. It can call create_scheduled_task and a nightly content refresh starts running.
The AI doesn't need to know about hosting, DNS, or deployment. It just calls the tools.
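On the wire, an MCP tool invocation is a JSON-RPC 2.0 request with the method `tools/call`, carrying the tool name and its arguments. The sketch below builds such a request for `create_page`. The envelope follows the MCP specification; the argument names (`slug`, `html`) are assumptions, since the tool's real input schema is not shown here.

```python
import json
from itertools import count

_request_ids = count(1)


def tools_call_request(tool: str, arguments: dict) -> str:
    """Build the JSON-RPC 2.0 message an MCP client sends to invoke a tool."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    })


message = tools_call_request("create_page", {"slug": "index", "html": "<h1>Hello</h1>"})
print(message)
```

In practice the client library handles transport and session setup; the point is that every client sends the same message shape, which is what makes the tools portable across assistants.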

The API layers
We ended up with eight layers, each with a clear responsibility:
PAPI (Pages & Assets) — Create, update, and version HTML pages and static assets. Includes diff-patch for surgical updates, URL fetching, and a content quality warning system.
MAPI (Entities & Data) — A schema-less data layer. The AI defines entities (think: database tables) and creates records. Powers everything from contact lists to leaderboards to inventory.
SAPI (Sessions & Forms) — Anonymous visitor sessions, form submissions, visitor authentication, and analytics. No cookies to configure — it just works.
VAPI (Vault) — Encrypted credential storage. The AI stores API keys that are then used by integrations — never exposed back to the client.
IAPI (Integrations) — A proxy engine that routes calls through stored credentials to external services. Resend, Mailgun, Stripe, Mollie, Twilio — the AI picks the integration, the vault provides the credentials.
AAPI (Agent API) — Scheduled tasks. The AI creates cron jobs that run PHP handlers on a schedule. Daily content refresh, nightly cleanup, automated data sync.
CAPI (Coach API) — A conversational intake system. Ask four questions, generate a complete website. The AI handles the conversation; the platform handles the generation.

A real example
Here's what a Claude session looks like when building a site from scratch:

User: Build me a landing page for my consulting business. Focus on lead generation.

Claude: [calls get_skill to load WebsitePublisher context]
[calls create_page with full HTML/CSS]
[calls configure_form with name, email, message fields]
[calls setup_integration with Resend credentials]
[calls execute_integration to test email delivery]

Done — your page is live at yourproject.websitepublisher.ai.
The contact form sends leads to your inbox via Resend.
Want me to add a thank-you page or set up SMS notifications too?

No copy-paste. No deployment step. The AI did it.

The interesting engineering problems
A few things that weren't obvious until we built them:
Multi-session coordination. When multiple AI sessions work on the same project in parallel, they can overwrite each other's progress. We built TAPI — an append-only task tracking system — specifically to solve this. Each session logs progress via INSERTs only. MAX(completion_pct) from history records prevents any session from accidentally rolling back another's progress.
Tool count limits. Our MCP server returns 55 tools in tools/list. Some clients have limits on how many they load. Our workaround: the get_skill tool loads a SKILL.md document that gives the AI a map of the full API — so even with five tools loaded, it can use the REST API directly for everything else.
Content quality detection. AIs occasionally send a file path instead of HTML content to create_page. We added a WarningCollector that catches this pattern and returns a structured warning before anything gets saved.
Authentication across API layers. Each layer needed a different auth model. Project tokens (wpa_) for AI access. Dashboard sessions (wps_) for humans. Admin tokens (wsa_) for visitor-facing login flows. Getting these to coexist cleanly took a few iterations.
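The append-only idea behind the multi-session coordination point can be shown with an in-memory SQLite table. The table and column names other than `completion_pct` are assumptions, and the real system runs on MySQL, but the mechanism is the same: sessions only INSERT, and readers take the maximum.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE task_history (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        task_id TEXT NOT NULL,
        session_id TEXT NOT NULL,
        completion_pct INTEGER NOT NULL
    )
    """
)


def log_progress(task_id: str, session_id: str, pct: int) -> None:
    """Sessions never UPDATE; each report is a new history row."""
    conn.execute(
        "INSERT INTO task_history (task_id, session_id, completion_pct) VALUES (?, ?, ?)",
        (task_id, session_id, pct),
    )


def current_progress(task_id: str) -> int:
    """Progress is the highest value any session has ever reported."""
    row = conn.execute(
        "SELECT MAX(completion_pct) FROM task_history WHERE task_id = ?",
        (task_id,),
    ).fetchone()
    return row[0] if row[0] is not None else 0


log_progress("homepage", "session-a", 60)
log_progress("homepage", "session-b", 40)  # a stale session reports less
print(current_progress("homepage"))  # prints 60
```

Because no row is ever overwritten, a stale session cannot roll progress back, and the full history remains available for debugging.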

What's next
We're currently in the Mistral sprint — working on SSE streaming for the conversational intake (so responses feel instant instead of batched), parallel concept generation, and getting listed in Mistral's connector directory.
If you're building with MCP, or thinking about what "AI-native" infrastructure actually means in practice — we'd love to hear what you think.

The MCP server is at mcp.websitepublisher.ai
Full docs at websitepublisher.ai/docs