In 2026, "just use Tailscale" is the default answer to anything WireGuard-shaped, and for most people it's the right one. This post is for the cases where it isn't — regulated egress, fixed-IP allowlists, agentless contractor access, MSP per-client isolation — and where you end up running a small fleet of classic publicly addressable WG servers on purpose.
Setting up one of those servers in 2026 is trivial. AI writes the wg0.conf, docker compose up and wg-easy is running in a minute, generating peers takes one click. The hard part hasn't been setup for years.
The hard part is what happens after the second server.
The moment you have two, the friction shifts. It's no longer about commands or configs. The questions change. Who has access to what? When was this peer issued? Whose pubkey lives where? How do I revoke a contractor in one place? Setup is solved. Operations are not.
By the time my "one server" had quietly turned into several, I was managing them through separate dashboards with credentials in separate password manager entries. The fleet just... happened. Not because anything was hard. Each addition was a 30-minute task that made local sense, and the tools around WireGuard kept treating "one server" as the unit of operation.
Multiple servers, multiple teams, no shared state. No central place to answer the basic question: who has access to what.
Below: when hub-and-spoke is still the right shape in 2026, where every WireGuard admin tool falls off the cliff at server 2, and what I ended up building when nothing existing scaled.
TL;DR
- Mesh overlays (Tailscale, NetBird, Defguard) eat most VPN use cases in 2026 — but not all. Fixed-IP egress, agentless access, MSP per-tenant isolation, and compliance-driven network paths still want classic hub-and-spoke WG.
- Most WG admin UIs (wg-easy, wg-portal, WireGuard-UI) work great for one server. Past that, you have N independent dashboards and no shared state.
- The platforms that do scale (Tailscale, NetBird, Firezone post-1.0) solve a different problem (mesh/ZTNA), not classic publicly addressable WG fleets.
- Nobody's solving the gap I was sitting in: many hub-and-spoke WG servers, central operator UI, REST API on each box.
- I split it into a Console (central source of truth) and a Node (per-server REST agent). Two-node walkthrough, one technical deep-dive, and an honest list of what's missing — below.
When hub-and-spoke is the right shape in 2026
Before the operations story, the architecture question. Mesh overlays have eaten most VPN use cases in 2026, and rightly so. The cases where they haven't, and where classic publicly addressable WG servers stay the right tool:
- Fixed-IP egress. Third-party APIs, banks, partners, geo-locked services that allowlist your public IPs. A mesh's egress is wherever the exit node happens to be; an allowlist needs a stable, declared IP in a known AS.
- Agentless access. Contractors, auditors, customer support staff who get a `.conf` file and a standard WireGuard client. No agent install, no SSO enrollment, no device posture dance, no asking the contractor's IT department for permission.
- Per-tenant isolation for MSPs and agencies. One WG server per client, hard network boundary, separate key material, separate billing. A shared mesh is the wrong blast radius and the wrong invoice.
- Streaming, residential, latency-sensitive egress. Country-locked APIs, game servers, IPTV, regional CDNs. You want a specific public IP in a specific AS, not "wherever DERP routes you."
- Compliance with declared network paths. Auditors who want to see a diagram with arrows pointing at IPs, not a hand-wave about NAT traversal and relay servers.

If none of those apply, stop reading and use Tailscale. If one of them does, you end up with multiple WG servers, and the rest of this post is for you.
How a bastion becomes a fleet
I've heard versions of this story from enough people now that I'm pretty sure the path is universal. It starts the way mine did: one VPN server, because it beats exposing SSH or internal APIs to the public internet. Then something forces a second, and the trigger is almost never architectural. It's stuff like:
- Staging is reachable by all engineers; prod-write only by SRE. Two networks, two policies, two servers.
- Compliance carved off a data-VPC, and that one needs its own access controls because the auditor will ask.
- An acquisition or partnership brings somebody else's networks into the picture, and now you're keeping their VPN running too.
- A new region gets allowlisted by a partner API. The egress IP is now load-bearing — it can't change, and it can't be shared with the staging cluster.

None of these felt like a project. Each one was a 30-minute task. That's the trap: my unit of operation was "one server," and every tool around WireGuard was designed around that same unit. New container in two commands. Peer generated through the UI in a click. Config sent to the user via Slack DM or 1Password share, whichever's lying around. Done. Move on.
The realization wasn't about the count. It was the first contractor offboarding. Their pubkey lived across three wg0.conf files, two Ansible inventories, and one wg-easy database. There was no single place to enumerate where it existed, no log of what got removed, and no way to prove to anyone — including myself — that revocation was complete. After that, "one server at a time" stopped looking like a strategy and started looking like a compliance problem I'd built on purpose.
That's where the existing tools start cracking.
What I tried, and why each one cracked
Writing my own thing was the last option, not the first. I went through everything else I could find. Table for skimmers, longer breakdown after.
| Tool | Where it broke for me |
|---|---|
| wg-easy, wg-portal, WireGuard-UI | Single-server by design. Past server #2 you have N independent SQLite databases and no way to enumerate "all peers across the fleet." |
| Firezone (1.0+) | Pivoted from WG admin to ZTNA: requires an IdP, gateway model, policy engine. Right product, wrong shape for "I just want to edit peers on existing WG boxes." |
| Tailscale, Headscale | Mesh + DERP relays. Solves NAT traversal, which I don't have. Doesn't give me a fixed public egress IP, which I do need. |
| NetBird, Defguard | Defguard does support classic WG, but bundles its own IdP/SSO stack. Overkill if you don't already run Keycloak. NetBird is mesh-first, same answer as Tailscale. |
| pfSense / OPNsense | Edge-firewall model. Putting a full firewall appliance on every cloud VM to get a UI is the wrong unit of deployment. |
| Ansible / Terraform + AWX/Tower | The closest real alternative. Works if your team already lives in IaC. The downside isn't "PRs are slow" — it's that git log is a change log, not an audit trail of who currently has access. |
wg-easy. It's great for what it is. One server, one Docker container, one web UI. I ran it for months on a single box and was happy. Adding a second node gave me two completely independent dashboards with no shared state. Issuing a peer now meant deciding which dashboard, remembering its admin password, and tracking the assignment somewhere outside the tool. Three nodes in, my "somewhere outside the tool" was a Notion page that lied to me twice a week.
wg-portal, WireGuard-UI, and friends. Same architectural shape. Single-server UIs, some prettier than others. None of them are control planes. They're admin panels.
Firezone. I was excited about Firezone for a stretch. Somewhere around their 1.0, though, it pivoted from "nice WireGuard admin panel" to a full ZTNA platform: identity providers, policy engine, gateway model, deeper Kubernetes integration. Probably the right direction for them as a company. But my mental model is still "I have these WG boxes, please let me edit their peers from one screen," and Firezone got too heavy for that.
Tailscale and Headscale. Completely different model. Mesh, not hub-and-spoke. Every node knows every node, NAT traversal happens through DERP relays, and the "exit node" abstraction replaces the idea of a regional egress server. I use Tailscale for my own devices and it's lovely for "connect my laptop to my home server." But once the requirement becomes "user traffic has to egress from a known public IP in country X," or "give a contractor access to a specific VPC without installing a Tailscale agent on their machine," the mesh model is solving a problem I don't have. My nodes are already publicly addressable. That isn't a constraint I'm trying to work around.
NetBird, Defguard. Defguard is closer than the others — it does support classic hub-and-spoke alongside its mesh story. The catch is that it expects to own identity too: Keycloak or its own IdP, SSO config, the full enrollment flow. For a small fleet where I just want to issue and revoke peers, that's a lot of platform to install for a UI. NetBird is mesh-first, same family as Tailscale.
pfSense / OPNsense. WireGuard is a feature of the firewall. Lovely if your edge is already pfSense. If it isn't, putting pfSense on every cloud VM just to get a UI is not the move.
Ansible / Terraform with AWX or Semaphore on top. This is the most honest alternative on the list, and I'll say it plainly: if your team already lives in IaC and you have AWX or Tower running, you can absolutely manage a WG fleet this way and skip everything below. The reason I didn't is narrower than "PRs are slow." It's that git log answers "what changed" but not "who currently has access" — those are different questions, and the second one is what audit and offboarding actually need. A Git history is a change log. An access list is a state.
What I noticed after going through all of them: each tool either solves a smaller version of my problem (one server) or a structurally different one (mesh/ZTNA), or asks me to commit to a much larger platform (IdP, IaC pipeline, full ZTNA stack) than the problem warrants. The gap I was sitting in (classic hub-and-spoke WireGuard servers, several of them, with one place to manage peers and one place to read the audit log) just wasn't anybody's target.
The pattern that worked
After the third time I almost wrote a wg-easy fork that supported multiple backends, I split the system properly. Two services:
- Console. Central control plane. Web UI, database, and the only thing I touch as an operator. Holds the inventory of nodes, peers, and who got issued what and when.
- Node. Runs on each WG server. Owns the local `wg` interface and the address pool. Exposes a REST API that the Console calls over an API key.
A few things this split bought me that a single binary couldn't:
- Console outage doesn't kill the VPN. Nodes keep serving existing peers. They can optionally persist their peer store on disk, so they survive a reboot even with the Console offline.
- One pane of glass without tight coupling. Adding a node is running another container and pasting its API endpoint and key into the UI. No shared database, no orchestrator dependency, no service mesh.
- The API is the API. Anything the UI does, a script can do (see the sketch just below). No part of operations requires SSH-ing into a box to run `wg set`.
- Node upgrades don't lose peer state. Peer store is decoupled from the binary version.

It's the same control-plane / data-plane pattern Kubernetes and friends have used for years. The reason it doesn't show up in small infra tools is simple: nobody's first server needs it, and by the time the second one shows up, you've already started building the wrong way.
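Because every operation is an HTTP call, "who has access across the fleet" stops being a dashboard crawl and becomes a loop. A sketch only: the /peers path and the X-API-Key header here are stand-ins for illustration, not the documented API, so check the API reference for the real endpoint shape.

```bash
# Sketch: enumerate every peer across the fleet straight from the Node APIs.
# Endpoint path and header name are illustrative assumptions.
for node in de-1.example.com nl-1.example.com; do
  echo "== $node =="
  curl -s "http://$node:51821/peers" -H "X-API-Key: $NODE_API_KEY"
done
```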
What this split doesn't buy you. Worth being upfront: the Console is still a single point of management. If its database goes, you lose the inventory and audit history — Nodes keep running and existing peers keep working, but you're back to per-box state for new operations. HA Console isn't there yet (see Limitations below). Backups are on you. This is a trade I made knowingly: a simpler, single-binary Console that I can actually maintain in spare time, instead of a Raft-based clustered control plane I can't.
One technical wrinkle worth mentioning: IP allocation under concurrency
Most of the system is straightforward CRUD over a REST API. There's one bit that wasn't, and it's the kind of thing that bites you the first time two operators issue peers on the same node within the same second.
The naive flow is: Console asks Node "what IPs are in use?", Console picks the next free one, Console tells Node "create peer with IP X." If two Console requests interleave between the read and the write, both can pick the same IP. You get two peers, same address, last-write-wins on the WG side, and one user mysteriously can't connect.
The fix isn't clever, just disciplined: the Node owns the address pool, not the Console. Console says "give me a peer," Node atomically allocates from its local pool, persists the assignment to its peer store before returning, and only then reports the chosen IP back. Allocation and persistence happen under a single lock per interface. Console treats the returned IP as authoritative — it doesn't pick, it receives.
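In code, the shape is small. A minimal sketch in Go, assuming a per-interface pool guarded by one mutex; this is the idea, not WGKeeper's actual implementation.

```go
// Illustrative allocator sketch, not WGKeeper's real code.
package pool

import (
	"errors"
	"fmt"
	"net/netip"
	"sync"
)

// Pool hands out addresses for one WG interface. A single mutex guards both
// allocation and persistence, so two concurrent "create peer" requests can
// never be handed the same IP.
type Pool struct {
	mu      sync.Mutex
	subnet  netip.Prefix
	inUse   map[netip.Addr]bool
	persist func(netip.Addr) error // write-through to the on-disk peer store
}

func NewPool(subnet netip.Prefix, persist func(netip.Addr) error) *Pool {
	return &Pool{subnet: subnet, inUse: map[netip.Addr]bool{}, persist: persist}
}

// Allocate picks the next free address, persists the assignment, and only
// then returns it. The Console receives the IP; it never picks one.
func (p *Pool) Allocate() (netip.Addr, error) {
	p.mu.Lock()
	defer p.mu.Unlock()
	// Start at .2: .0 is the network address, .1 is the server itself.
	for a := p.subnet.Addr().Next().Next(); p.subnet.Contains(a); a = a.Next() {
		if p.inUse[a] {
			continue
		}
		// Persist before returning: if the write fails, nothing was handed out.
		if err := p.persist(a); err != nil {
			return netip.Addr{}, fmt.Errorf("persist allocation: %w", err)
		}
		p.inUse[a] = true
		return a, nil
	}
	return netip.Addr{}, errors.New("address pool exhausted")
}
```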
This sounds obvious written down. It is not obvious when you're three weeks in and the natural design is "Console is smart, Node is dumb." Making the Node authoritative for state it physically owns (the WG interface, the address space, the live peer set) and the Console authoritative for state nobody else has (which operator did what, when) is the boring version of the CAP theorem you actually run into in practice.
The same principle is why peer_store_file on the Node is the Node's, not a Console concern: when the Node reboots, it must be able to reconstruct the WG interface state without phoning home. The Console's database is the audit log. The Node's peer store is the truth.
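The nice thing about that property is that it's testable: take the Console down, reboot a Node, and the interface should come back fully populated from the local store. wg itself is the verifier:

```bash
# With the Console offline, reboot the Node box, then confirm the interface
# was rebuilt from the local peer store rather than from the control plane.
docker compose exec wireguard wg show wg0 | grep -c '^peer:'
# The peer count should match what it was before the reboot.
```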
A two-node minimal setup
Here's what that looks like end-to-end. Console on one machine, two Nodes on two others. This is WGKeeper, the thing I ended up building: AGPL-3.0, Docker images on GHCR, full walkthrough in the Quick start docs. What's below is the compressed version, just enough to convey the shape.
Console — anywhere reachable from your browser; your laptop works for a first try.
First, generate a secret key for session signing (keep it stable across restarts — rotating it invalidates all sessions):
```bash
openssl rand -hex 32
```
Then drop the output into SECRET_KEY below:
```yaml
# docker-compose.yaml
services:
  wgkeeper-console:
    image: ghcr.io/wgkeeper/console:latest
    container_name: wgkeeper-console
    ports:
      - "8000:8000"
    environment:
      PORT: 8000
      DATABASE_URL: file:/app/data/wgkeeper-console.db
      SECRET_KEY: REPLACE_WITH_OUTPUT_OF_openssl_rand_hex_32
      BOOTSTRAP_ADMIN_PASSWORD: change-me-now
      COOKIE_SECURE: "false"
    volumes:
      - wgkeeper-console-data:/app/data
    restart: unless-stopped

volumes:
  wgkeeper-console-data:
```
Node runs on each WireGuard server. Let's call them de-1 and nl-1. Each one needs its own config — and its own API key. Generate one per node with the same command:
```bash
openssl rand -hex 32
```
Then put it into config.yaml:
```yaml
# config.yaml
server:
  port: 51821

auth:
  api_key: "REPLACE_WITH_OUTPUT_OF_openssl_rand_hex_32"

wireguard:
  interface: "wg0"
  subnet: "10.10.0.0/24"   # different /24 per node
  server_ip: "10.10.0.1"
  listen_port: 51820

routing:
  wan_interface: "eth0"    # check with `ip route` — varies by cloud
```
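One footnote on wan_interface, since it's the field that varies most between providers: the right value is whatever device carries the default route.

```bash
# The device on the default route is the WAN interface.
ip route show default
# e.g. "default via 172.31.0.1 dev eth0 ..."  →  wan_interface: "eth0"
```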
```yaml
# docker-compose.yaml
services:
  wireguard:
    image: ghcr.io/wgkeeper/node:latest
    cap_add: [NET_ADMIN, SYS_MODULE]
    volumes:
      - ./config.yaml:/app/config.yaml:ro
      - ./wireguard:/etc/wireguard
    ports:
      - "51820:51820/udp"
      - "51821:51821"
    sysctls:
      - net.ipv4.conf.all.src_valid_mark=1
      - net.ipv4.ip_forward=1
    restart: always
```
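Bring up the Console first, then both Nodes. The bring-up itself is plain compose on each machine:

```bash
# On the Console machine, then on de-1 and nl-1:
docker compose up -d
docker compose ps   # the container should show as running
```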
In the Console UI, click Nodes → Add node twice. Paste each Node's API endpoint and its API key, click Check node to verify connectivity, then Save node. Both should flip to online within seconds.
The Nodes list is the main operator view. Every connected node shows its status, running version, and whether an update is available. Same UI handles two nodes or twenty. You see what's running, what's stale, what's unreachable, in a single screen.
Click any node to drill into it. The Peers tab is where most operational work lives: issuing access, revoking it, checking who's actually connected versus who just has a config sitting around. Every peer shows its allowed IPs, public key, status, last handshake, creation date, and expiry. Revocation is one click.
To create a peer on de-1: the Console allocates an IP from de-1's 10.10.0.0/24 (via the Node, per the wrinkle described above), generates the keypair, and hands you a wg-quick-compatible config to deliver to the user. Same flow on nl-1 produces a config in a different subnet. One operator UI, two regions, no SSH involved anywhere.
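The same flow works headlessly. A hedged sketch, since the endpoint path and JSON fields below are my guesses rather than the documented API; the point is the shape, an authenticated POST to the Console with allocation delegated to the Node:

```bash
# Illustrative only: path and fields are assumptions, not the documented
# WGKeeper API. See the API reference for the real shape.
curl -s -X POST "http://localhost:8000/api/nodes/de-1/peers" \
  -H "Authorization: Bearer $CONSOLE_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"name": "contractor-anna"}'
# Response carries the Node-allocated IP and a wg-quick-compatible config.
```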
The audit view is the thing that justified the rewrite. Per peer: created by which operator, at what time, on which node, current status, and (if revoked) when and by whom. Per operator: every action they've taken across the fleet. This is the answer to "did we actually offboard that contractor everywhere," and it's the question wg-easy literally cannot answer because it doesn't know other wg-easy instances exist. It's also the answer I was missing the night I went hunting for a pubkey across three config files.
For anything past a tinker setup (TLS in front of the Console, HTTPS for the Node API, wireguard.peer_store_file so peers survive reboots, the per-node Prometheus endpoint feeding the bundled Grafana dashboard), see the docs.
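For the monitoring piece specifically, wiring the per-node endpoint into an existing Prometheus is one scrape job. A sketch, assuming the conventional /metrics path on the Node's API port and illustrative hostnames; the docs have the real endpoint:

```yaml
# prometheus.yml fragment. Sketch only: assumes the Node serves the
# conventional /metrics path on its API port; confirm against the docs.
scrape_configs:
  - job_name: "wgkeeper-nodes"
    static_configs:
      - targets: ["de-1.example.com:51821", "nl-1.example.com:51821"]
```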
What WGKeeper is not
A few things to be direct about, because it saves time:
- No SaaS. You self-host both Console and Nodes.
- Not a Tailscale clone. No mesh, no NAT traversal, no DERP relays. Nodes need to be publicly reachable on UDP. Clients are plain hub-and-spoke peers using the standard WireGuard client.
- Not a ZTNA platform. No identity providers, no per-app policies, no device posture. If those are hard requirements, Firezone or Defguard fit better.
- Not a firewall. It manages WireGuard. The `nftables` rules around it are still on you.
- No magic. The Node uses `wireguard-tools` under the hood. You still need to understand WireGuard.
If any of those are dealbreakers, good. You saved an evening.
Known limitations
Being equally direct, because pretending these don't exist insults the reader:
- No SSO/OIDC. Login is local users + password. If your org requires SAML/OIDC, this is a blocker today.
- No RBAC. Operators are operators; there's no "read-only" or "node-scoped" role yet.
- No HA Console. One instance, one database. Backups are on you.
- API key rotation is manual. Generate, paste into Console, restart Node. Not painful at fleet sizes I run, painful at fleet sizes I don't.
- No built-in secret delivery. The generated `.conf` is shown once in the UI; getting it to the user is still your problem (1Password, Bitwarden, signed link, whatever).

All of these are tracked. Some are "soon," some are "when someone needs it badly enough to PR." If one of them is a hard blocker for you, open an issue saying so — that's how priority gets set.
Where it fits
WGKeeper is for people whose problem reduces to: "I run multiple publicly addressable WireGuard servers and I want one place to manage their peers and read their audit log."
The shapes I see most often:
- Cloud bastions across regions. Separate VPCs in `eu-central-1`, `us-east-1`, and friends, each with its own access list.
- Per-environment isolation. Separate WG networks for prod, staging, dev, or for compliance-isolated data VPCs that have their own audit requirements.
- MSPs and agencies running per-client isolated WireGuard environments across providers.
- Homelab fleets with regional egress nodes for streaming, latency-sensitive games, or country-locked APIs.
- Self-hosters sharing access with family or friends across multiple boxes, plus anyone who outgrew per-server wg-easy installs as the fleet grew past one or two.

Where it doesn't fit: single-server setups, where wg-easy is straight-up better since it's less to run. Anything mesh-shaped (use Tailscale or NetBird). And anywhere identity-aware policy is the hard requirement (Firezone, Defguard).
State of the project
WGKeeper lives at wgkeeper/wgkeeper on GitHub. That's the umbrella repo, with links to the Console and Node component repos. Docs are at wgkeeper.github.io: install guides, configuration reference, the Grafana dashboard, the API reference. Docker images on GHCR.
Honestly: this is a side project. Built and maintained in spare time. No startup behind it, no roadmap deck, no investor narrative. The GitHub org isn't bigger than it looks. I'm posting this because for a project like this to find the people it could actually help, those people have to randomly read about it, and that's not happening on its own.
If the problem matches yours, three concrete ways to help, in order of usefulness:
- Run it for a week and open an issue titled "expected X, got Y." Even one of these is more valuable than a star.
- Tell me which Known Limitation above is the blocker for you. SSO? RBAC? HA? I'll prioritize what people actually hit, not what looks good on a roadmap.
- PRs welcome. There are `good-first-issue` labels on both repos. Review can take a few days; the bar isn't high and the codebase isn't scary.
Project links
- Documentation — install, configure, operate
- Quick start — one Console, one Node, working VPN
- Umbrella repo on GitHub
- Component repos: Console, Node