DEV Community: Moses Man

Self-Hosted Hermes Agent on iOS: Cloudflare Tunnel + Access Service Tokens + Hermex

Moses Man — Wed, 24 Jun 2026 19:24:59 +0000

Get your Hermes Agent on your iPhone - without paying for a relay, without a VPN, with full Cloudflare edge protection.

The Problem

You're running Hermes Agent self-hosted on a VPS. You want to chat with it from your iPhone. There are a few paths:

HermesPilot P2P relay - works great until it goes paid
Tailscale VPN - works but you need the VPN connected every time
Cloudflare Tunnel + CF Access - great for the browser, but iOS apps can't do OAuth redirects

The last option is the most interesting because it gives you Cloudflare's edge protection, your own domain, and no per-device VPN. The problem is that Cloudflare Access normally redirects to a browser login (Google, GitHub, etc.) - which doesn't work for a native app.

The fix: Cloudflare Access Service Tokens + custom headers.

The Architecture

There are two common paths - both work with Service Tokens:

Path A: Cloudflare Tunnel → Nginx Proxy Manager (NPM)

Hermex iOS App
│ Custom headers: CF-Access-Client-Id, CF-Access-Client-Secret
▼ HTTPS (orange cloud)
Cloudflare Edge - validates Service Token
▼
Cloudflare Tunnel (cloudflared) - connects to NPM
▼
Nginx Proxy Manager (origin certificate) - routes to backend
▼
Hermes API Server (:8642) or Hermes WebUI (:8787)

NPM handles SSL termination with origin certs and gives you a nice UI for managing proxy hosts.

Path B: Cloudflare Tunnel → Direct to Hermes

Hermex iOS App
│ Custom headers
▼ HTTPS
Cloudflare Edge - validates Service Token
▼
Cloudflare Tunnel (cloudflared) - pointed at localhost:8642
▼
Hermes API Server

Simpler, no reverse proxy. Cloudflare handles SSL at the edge.

Prerequisites

A domain on Cloudflare (orange-cloud proxied)
Hermes Agent with the API Server enabled
An iOS device and the Hermex app from the App Store

Step 1: Enable the Hermes API Server

In your ~/.hermes/.env:

API_SERVER_ENABLED=true
API_SERVER_KEY=generate-a-strong-random-key
API_SERVER_HOST=127.0.0.1
API_SERVER_PORT=8642
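
The API_SERVER_KEY value above is only a placeholder. One way to produce a strong random key is Python's standard secrets module. This is a minimal sketch; the function name and the 32-byte length are my own choices, not a Hermes requirement.

```python
import secrets


def generate_api_key(num_bytes: int = 32) -> str:
    """Return a random hex string from the OS's cryptographic random source."""
    # token_hex emits two hex characters per byte, so 32 bytes -> 64 characters.
    return secrets.token_hex(num_bytes)


if __name__ == "__main__":
    # Prints a line you can paste into ~/.hermes/.env.
    print(f"API_SERVER_KEY={generate_api_key()}")
```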

Restart and verify:

hermes gateway restart
curl -s -o /dev/null -w "%{http_code}" http://127.0.0.1:8642/health

→ 200

Step 2: Install and Configure Cloudflare Tunnel

2a. Install cloudflared

sudo curl -L https://github.com/cloudflare/cloudflared/releases/latest/download/cloudflared-linux-amd64 -o /usr/local/bin/cloudflared
sudo chmod +x /usr/local/bin/cloudflared

Or on Debian/Ubuntu, after adding Cloudflare's apt repository:

sudo apt install cloudflared

2b. Authenticate and Create a Tunnel

cloudflared tunnel login
cloudflared tunnel create hermes-tunnel

2c. Route DNS

cloudflared tunnel route dns hermes-tunnel hermes-api.yourdomain.com

2d. Configure the Tunnel

Create ~/.cloudflared/config.yml:

tunnel:
credentials-file: /home/ubuntu/.cloudflared/.json

ingress:
  - hostname: hermes-api.yourdomain.com
    service: http://localhost:8642
  - hostname: hermes-webui.yourdomain.com
    service: http://localhost:8787
  - service: http_status:404
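
cloudflared evaluates ingress rules from top to bottom and uses the first one that matches, which is why the catch-all http_status:404 rule goes last. The sketch below is a simplified model of that behaviour, not cloudflared's actual code: it ignores wildcards and path matching, and the names INGRESS_RULES and resolve_service are hypothetical.

```python
# Mirrors the ingress list above as (hostname, service) pairs.
# A hostname of None stands for the catch-all rule.
INGRESS_RULES = [
    ("hermes-api.yourdomain.com", "http://localhost:8642"),
    ("hermes-webui.yourdomain.com", "http://localhost:8787"),
    (None, "http_status:404"),
]


def resolve_service(hostname: str, rules=INGRESS_RULES) -> str:
    """Return the service of the first rule whose hostname matches."""
    for rule_host, service in rules:
        if rule_host is None or rule_host == hostname:
            return service
    # Only reachable if the list has no catch-all rule at the end.
    raise ValueError("ingress list has no catch-all rule")
```

Any hostname that isn't listed falls through to the last rule and gets a 404 instead of reaching a backend.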

Run it:

cloudflared tunnel run hermes-tunnel

Or install as a systemd service:

sudo cloudflared service install

2e. Tunnel → Nginx Proxy Manager

If using NPM, point the tunnel at localhost:80 instead:

ingress:
  - hostname: hermes-api.yourdomain.com
    service: http://localhost:80
  - service: http_status:404

Then in NPM, add a proxy host:

Domain: hermes-api.yourdomain.com
Forward to: http://127.0.0.1:8642
Enable Websockets
SSL tab → Cloudflare Origin Certificate
Cloudflare SSL/TLS → Full (strict)

Step 3: Cloudflare Access - Service Token

Policy action	How it works	Use for
Allow	Redirects to OAuth (Google, GitHub, etc.)	Browser users
Service Auth	Validates static headers	Apps, APIs, scripts

3a. Create the Service Token

Cloudflare Zero Trust Dashboard → Access → Service Auth → Create Service Token
Name it hermex-ios.
Copy the Client ID and Client Secret immediately. The secret is shown only once.

3b. Create the Access Application

Access → Applications → Add an application → Self-hosted → set domain
Add a policy with:

Action: Service Auth ← NOT "Allow"
Select the hermex-ios service token

Save. Cloudflare now accepts requests with the correct headers.

Step 4: Configure Hermex

On your iPhone:

Instance URL: https://hermes-api.yourdomain.com
Custom Header 1: CF-Access-Client-Id:
Custom Header 2: CF-Access-Client-Secret:

Press connect.

Why This Works

Service Tokens are designed for machine-to-machine auth. Cloudflare's edge reads the CF-Access-Client-Id and CF-Access-Client-Secret headers on every request and validates them before anything reaches your tunnel. The app never sees a login page. It's the same pattern CI/CD pipelines and Terraform use, and it works for iOS too.
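
To make the header mechanism concrete, here is a minimal sketch of what any client does: attach the two headers to an ordinary HTTPS request. The helper name build_request and the example token values are placeholders of mine. The sketch only builds the request and does not send it.

```python
import urllib.request


def build_request(url: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the two service token headers."""
    # urllib stores header names capitalized (e.g. "Cf-access-client-id").
    # That is harmless: HTTP header names are case-insensitive.
    return urllib.request.Request(
        url,
        headers={
            "CF-Access-Client-Id": client_id,
            "CF-Access-Client-Secret": client_secret,
        },
    )


if __name__ == "__main__":
    # Placeholder values: substitute your own hostname and token.
    req = build_request(
        "https://hermes-api.yourdomain.com/health",
        client_id="example-id",
        client_secret="example-secret",
    )
    print(req.get_method(), req.full_url)
```

Passing the result to urllib.request.urlopen(req) from any machine is a quick way to check the Access policy before you touch the phone.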

What About mTLS?

mTLS would be ideal, but iOS support is painful: URLSession needs custom delegate code to present a client certificate, certificate distribution is a UX nightmare, and renewal and revocation need custom code.

Service Tokens give the same "pre-shared credential at the edge" model over HTTP headers instead of TLS handshakes.

Troubleshooting

Cloudflare login page → Policy set to Allow instead of Service Auth
401 Unauthorized → Header name misspelled or token value mistyped (header names are case-insensitive, but the Client ID and Client Secret values are not)
502 Bad Gateway → tunnel/NPM can't reach the backend
Connection timeout → Tunnel not running or DNS not proxied

Quick health check:
systemctl status cloudflared
curl localhost:8642/health
dig hermes-api.yourdomain.com +short

Alternative: Tailscale

curl -fsSL https://tailscale.com/install.sh | sh
sudo tailscale up
hermeslink config set lanHost "100.x.x.x"

Works over WireGuard. Downside: VPN needed every time.

Conclusion

Cloudflare Access Service Tokens are the missing piece for authenticating native apps behind Cloudflare. With Hermex's custom header support, this takes about 10 minutes if you already have the tunnel running.

I built multi-model orchestration as a Hermes skill - PolyBrain

Moses Man — Sat, 30 May 2026 00:26:56 +0000

Most AI pipelines use one model for everything.

One model plans the task, does the research, writes the answer, and checks its own work. That's like hiring one person to be your strategist, researcher, analyst, and auditor simultaneously. It doesn't scale - and more importantly, the model has no one to disagree with it.

PolyBrain is my answer to that. It's an open-source multi-agent, multi-model orchestration skill for Hermes Agent. You give it an objective, it breaks the work into roles, runs them in parallel where it can, synthesizes the outputs, and verifies every claim against its cited sources before it reaches you.

Here's what that looks like end to end.

The pipeline

Objective -> Orchestrator -> [Researcher 1 + Researcher 2 + Builder] -> Synthesizer -> Verifier -> Final Answer

Five distinct roles. Each one does exactly one thing:

Orchestrator - reads the objective, decomposes it into a JSON task plan, assigns roles
Researcher - web search and citations - no uncited claims allowed
Builder - code, terminal, and file operations
Synthesizer - merges all outputs into a single coherent deliverable
Verifier - checks every claim against its source, returns PASS/FAIL per claim

The parallel phase is where the time savings come from. Researcher 1 and Researcher 2 run at the same time. The Synthesizer only fires once both are done. The Verifier runs last.
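
The shape of that flow can be sketched with Python's standard thread pool. This is an illustration of the parallel-then-sequential pattern under my own assumptions, not PolyBrain's code. The role functions are stubs with hypothetical names, where the real ones would call a model.

```python
from concurrent.futures import ThreadPoolExecutor


# Stub roles. Names and signatures are hypothetical.
def researcher(topic: str) -> str:
    return f"findings on {topic}"


def synthesizer(findings: list[str]) -> str:
    return " | ".join(findings)


def verifier(draft: str) -> str:
    return f"VERIFIED: {draft}"


def run_pipeline(topics: list[str], max_parallel: int = 3) -> str:
    # Researchers run concurrently. map() yields results in input order,
    # and list() only completes once every researcher has finished.
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        findings = list(pool.map(researcher, topics))
    # Synthesizer and Verifier run one after the other, after the parallel phase.
    return verifier(synthesizer(findings))
```

Calling run_pipeline(["revenue trends", "competitors"]) runs both researchers at once and hands their combined output down the line.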

What makes it different

Different models per role

This is the part most orchestration frameworks don't do.

In config.yaml you assign a different model and provider to each role. Researcher gets a cheaper, faster model. Verifier gets a stronger one. Orchestrator gets whatever you trust most for structured JSON output.

models:
  orchestrator: "your-model"
  researcher: "your-model"
  builder: "your-model"
  synthesizer: "your-model"
  verifier: "your-model"
  fallback: "your-model"

settings:
  max_parallel: 3
  timeout_sec: 300

You decide what goes where. PolyBrain doesn't prescribe it - because the right answer depends on what you're running, what you're paying, and what you actually trust.
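
A per-role lookup over that config could look like the sketch below. The model names are placeholders, and the fallback behaviour (use it when a role has no model set) is my assumption for illustration. PolyBrain may apply the fallback differently, for example when a model call fails.

```python
# Mirrors the models section of config.yaml. Model names are placeholders.
CONFIG = {
    "models": {
        "researcher": "fast-cheap-model",
        "verifier": "strong-model",
        "fallback": "default-model",
    }
}


def model_for(role: str, config: dict = CONFIG) -> str:
    """Return the model assigned to a role, or the fallback if none is set."""
    models = config["models"]
    # Assumed semantics: a missing or empty entry falls back to models["fallback"].
    return models.get(role) or models["fallback"]
```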

Citation enforcement

Researchers are required to include URLs. Uncited claims don't make it to the Synthesizer - they're dropped at the source. This isn't a soft suggestion in the prompt. It's structural: the Verifier checks each surviving claim against its cited source and returns a verdict.

In the example run below, the Verifier caught a real data discrepancy in Azure revenue figures. That's the point - you want something that pushes back.
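
The "dropped at the source" rule boils down to a filter. Here is a minimal sketch, assuming a claim counts as cited when it contains an http(s) URL. The function name and the regex are mine, not PolyBrain's actual check.

```python
import re

# Assumption: any http(s) URL in the claim text counts as a citation.
URL_PATTERN = re.compile(r"https?://\S+")


def drop_uncited(claims: list[str]) -> list[str]:
    """Keep only claims that contain at least one http(s) URL."""
    return [claim for claim in claims if URL_PATTERN.search(claim)]
```

A filter like this only proves a URL is present. Checking that the source actually supports the claim is the Verifier's job.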

Artifact logging

Every run saves a timestamped folder:

.hermes/plans/polybrain/20260528_191548/
├── orchestrator.json
├── task_t1.md
├── task_t2.md
├── synthesis.md
└── verification.md

You can audit exactly what each role produced and why the final answer looks the way it does.
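
The folder name in that listing follows a YYYYMMDD_HHMMSS pattern. A sketch of how such a run folder could be created and filled, with hypothetical helper names:

```python
from datetime import datetime
from pathlib import Path


def run_dir(base: Path, now: datetime) -> Path:
    """Return the artifact folder path for a run that started at `now`."""
    # %Y%m%d_%H%M%S produces names like 20260528_191548.
    return base / now.strftime("%Y%m%d_%H%M%S")


def save_artifact(folder: Path, name: str, text: str) -> Path:
    """Write one role's output into the run folder, creating it if needed."""
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / name
    path.write_text(text, encoding="utf-8")
    return path
```

Because the timestamp sorts lexically, listing the parent folder shows runs in chronological order.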

A real example

Objective: "Create a market brief on Apple with three bullets on revenue trends and two competitors."

The Orchestrator decomposes this into four tasks - two parallel researchers, a synthesizer, a verifier.

t1 (Researcher) - revenue trends - pulls from SEC filings and Apple Newsroom. Returns three years of top-line figures ($383.3B -> $391.0B -> $416.2B) with a source URL for each.

t2 (Researcher) - competitor profiles - Samsung (hardware/smartphones) and Microsoft (cloud/AI). Revenue context and competitive positioning, all cited.

Both run at the same time.

t3 (Synthesizer) - merges both outputs into a clean brief. Preserves inline citations. Drops anything uncited.

t4 (Verifier) - checks every claim against its source. Flags a mismatch in a competitor cloud revenue figure, provides the corrected bullet with evidence.

Total runtime: ~4 minutes. Parallel research phase: ~2 minutes.
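
The four-task decomposition above can be modeled as a small dependency plan. The plan shape here (role plus depends_on) is a hypothetical illustration, not the actual schema of orchestrator.json. The function groups tasks into waves, where everything in one wave can run in parallel.

```python
# Hypothetical plan shape: task id -> role and the tasks it waits for.
PLAN = {
    "t1": {"role": "researcher", "depends_on": []},
    "t2": {"role": "researcher", "depends_on": []},
    "t3": {"role": "synthesizer", "depends_on": ["t1", "t2"]},
    "t4": {"role": "verifier", "depends_on": ["t3"]},
}


def execution_waves(plan: dict) -> list[list[str]]:
    """Group tasks into waves; every task in a wave can run in parallel."""
    done: set[str] = set()
    waves: list[list[str]] = []
    while len(done) < len(plan):
        # A task is ready once all of its dependencies have completed.
        ready = sorted(
            task for task, spec in plan.items()
            if task not in done and all(dep in done for dep in spec["depends_on"])
        )
        if not ready:
            raise ValueError("plan has a dependency cycle or a missing task")
        waves.append(ready)
        done.update(ready)
    return waves
```

For this plan the waves come out as t1 and t2 together, then t3, then t4, which matches the run described above.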

Getting started

# Clone into your Hermes skills folder
git clone --depth=1 https://github.com/mosesman831/PolyBrain.git /tmp/polybrain
cp -r /tmp/polybrain ~/.hermes/skills/research/polybrain

# Edit config.yaml with your model aliases
# Then validate it
python ~/.hermes/skills/research/polybrain/scripts/validate_config.py

Then just tell Hermes what you want:

Use PolyBrain to research Apple's latest earnings and competitors

Hermes loads the skill. PolyBrain handles the rest.

What it doesn't do (yet)

No persistent state across runs - if it crashes mid-run, you restart from scratch. Hermes Kanban handles durable state natively but PolyBrain doesn't plug into it yet.
Some models hang in subagent calls - test with hermes chat -q "ping" -m your-model before committing a model to a role.
Verifier can occasionally truncate numbers - PASS/FAIL verdicts are structurally correct but some models strip leading digits from dollar amounts in the report text.

Why I built it this way

Single-model pipelines have a ceiling. The model can't critique itself meaningfully. There's no disagreement, no verification layer, no separation between "the thing that did the research" and "the thing that checks the research."

PolyBrain is built around the idea that different roles benefit from different models - and that the value of an orchestration layer is precisely that it enforces structure the models themselves wouldn't maintain.

It's a Hermes skill, it's config-driven, it's open source. If you're running Hermes Agent and want to try it:

GitHub: github.com/mosesman831/PolyBrain

Feedback welcome - especially if you find models that work well for specific roles.

This article was written with the help of AI, based on my own docs, config, and terminal output.

Made with ❤️ by LatticeAG