(The Connection) WhatsApp Bridge

#agents #architecture #security #tutorial

The Catalyst: The Interface Is the Attack Surface

WhatsApp is close to a universal messaging interface: it is on almost every phone, and it is end-to-end encrypted. The brain can be perfect; the connection is where pairing, allowlists, and gateway auth decide who gets to talk to the bot at all.

Phase 4 of the Practical Guide series is The Connection: gateway, plugin, channel policy, DM scoping, and groups. I run the gateway on the host (no Docker in my day-to-day path).

Overview

Parts 1 to 3 already gave you a model, Silas / policy, and media scopes. This article is where those meet the real wire: who may send messages, how the OpenClaw gateway sees them, and how session keys line up with tools.media keyPrefix rules from the Senses article. Expect more channel wiring here than in part 1, not a second lecture on the same openclaw.json fields from scratch.

My configuration enables the WhatsApp plugin, constrains the gateway, and makes session isolation explicit at the session layer in addition to skills:

| Area | My settings (concept) | What it does |
| --- | --- | --- |
| `plugins.entries.whatsapp` | `enabled: true` | Turns on the channel integration. |
| `channels.whatsapp` | `enabled`, `dmPolicy`, `selfChatMode`, `allowFrom`, `groupPolicy`, `debounceMs`, `mediaMaxMb` | Who may DM, how groups are gated, and transport limits. |
| `session.dmScope` | `per-channel-peer` | DMs are not one global blob; pair identity with the channel+peer. |
| `session.reset` / `session.maintenance` | idle reset + `pruneAfter` | Stops sessions from living forever in RAM/disk. |
| `gateway` | `port`, `mode`, `bind`, `auth` | Where the local gateway listens and how clients authenticate. |
| Host process | e.g. `gateway.cmd` + Node on Windows | I start OpenClaw’s gateway as a normal process on the machine that owns `.openclaw/`. No container in my run. |

No live secrets in documentation. Use ${OPENCLAW_GATEWAY_TOKEN} in examples; generate a long random token and never paste it into chat or public repos.
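
As a minimal sketch of "generate a long random token", Python's standard `secrets` module is one way to do it. The helper names `new_gateway_token` and `token_from_env` are mine, not part of OpenClaw; only the variable name OPENCLAW_GATEWAY_TOKEN comes from the examples in this article.

```python
import os
import secrets


def new_gateway_token(n_bytes: int = 32) -> str:
    """Return a URL-safe random token built from n_bytes of OS randomness."""
    return secrets.token_urlsafe(n_bytes)


def token_from_env(name: str = "OPENCLAW_GATEWAY_TOKEN") -> str:
    """Read the token from the environment and fail loudly if it is missing."""
    value = os.environ.get(name, "")
    if not value:
        raise RuntimeError(f"{name} is not set")
    return value


if __name__ == "__main__":
    # Print once, store it in your secret manager or shell profile, never in a repo.
    print(new_gateway_token())
```

Run it once on the host, export the result into the environment the gateway starts from, and keep the printed value out of chat logs and screenshots.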

In this section:

1. Gateway: Port, Loopback, Token Auth
2. How I run the gateway (no Docker)
3. Channel: DMs, Pairing, and Allowlists
4. Group Mentions: When the Bot Wakes Up
5. Webhooks, Real Time, and Where the Phone Meets the Gateway
6. session.dmScope and Heartbeat of Trust

1. Gateway: Port, Loopback, Token Auth

A typical gateway block (shape only):

"gateway": {
  "port": 18789,
  "mode": "local",
  "bind": "loopback",
  "auth": { "mode": "token", "token": "${OPENCLAW_GATEWAY_TOKEN}" },
  "tailscale": { "mode": "off", "resetOnExit": false }
}

bind: "loopback": the gateway is not a wide-open LAN service by default. If you need remote access, that is a deliberate Connection project (see Article-05: Tailscale vs public IP).
auth.mode: "token": every client that hits the gateway must know the token. Rotate if a token ever leaks; treat it like a password.
gateway.nodes.denyCommands (if present): I deny a set of high-impact device- and calendar-style commands at the node layer. Adjust the list to match what you are willing to expose from a phone bridge.
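
The points above can be turned into a small pre-flight lint. This is a sketch of my own, not an OpenClaw feature: `lint_gateway` and `resolve` are hypothetical helpers, the `${VAR}` expansion only mimics the placeholder style used in this article, and the 32-character minimum is an arbitrary threshold I picked for illustration.

```python
import os
import re

# Matches placeholders written like ${OPENCLAW_GATEWAY_TOKEN}.
PLACEHOLDER = re.compile(r"\$\{([A-Z0-9_]+)\}")


def resolve(value: str, env=os.environ) -> str:
    """Expand ${VAR} placeholders from env; missing variables become empty."""
    return PLACEHOLDER.sub(lambda m: env.get(m.group(1), ""), value)


def lint_gateway(gateway: dict, env=os.environ) -> list[str]:
    """Return human-readable problems found in a gateway block."""
    problems = []
    if gateway.get("bind") != "loopback":
        problems.append("bind is not loopback: gateway may be reachable from the LAN")
    auth = gateway.get("auth", {})
    if auth.get("mode") != "token":
        problems.append("auth.mode is not token")
    raw = auth.get("token", "")
    if raw and not PLACEHOLDER.search(raw):
        problems.append("token looks hard-coded; prefer an env placeholder")
    if len(resolve(raw, env)) < 32:  # assumed minimum length, adjust to taste
        problems.append("resolved token is missing or shorter than 32 characters")
    return problems
```

An empty list means the block matches the posture described here (loopback bind, token auth, token injected from the environment); anything else is worth fixing before the bridge goes live.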

On Windows, you may have a gateway.cmd that starts Node with the OpenClaw package. Do not commit this file to public repos; it often contains a resolved token. Prefer env-based injection for docs.

A plain text reply from the “brain” can still take tens of seconds on a slow model or a long context. On WhatsApp that feels like a hang. I already use debounceMs so one tap does not double-fire; if your bridge exposes typing or a read signal, a short “thinking” state helps more than a faster logo. The fix is UX, not more tokens in the system prompt.

2. How I run the gateway (no Docker)

I do not run the OpenClaw gateway in Docker. The gateway is a local process, same machine as my .openclaw tree, with env vars (including ${OPENCLAW_GATEWAY_TOKEN}) set the way a normal app expects. Some OpenClaw trees ship a sample docker-compose.yml for people who want a container; that is optional for other people’s deployments, not my path. If you adopt containers later, the shape of openclaw.json does not change: only how you start the process does.

3. Channel: DMs, Pairing, and Allowlists

Key fields in channels.whatsapp:

dmPolicy: "pairing": unknown numbers should not get full access until a pairing/approval path completes (your OpenClaw version defines the exact UX).
allowFrom: E.164 allowlist. Yours goes here; in shared docs, describe the pattern (“owner + a trusted test number only”).
selfChatMode: true: useful when you are your own “first user” in the same app session.
groupPolicy: "allowlist": groups are not open season; only listed groups (per product docs) should get bot participation.
debounceMs: I use ~1500 ms to absorb double-tap sends and flappy connectivity before the agent does expensive work.
mediaMaxMb: cap attachments so the connection cannot be used as a free CDN stress test.
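
To make debounceMs and mediaMaxMb concrete, here is a sketch of the behaviour I expect from them, written as pure functions. It is not OpenClaw's implementation: `coalesce` and `within_media_cap` are hypothetical names, and treating one MB as 1024 * 1024 bytes is my assumption.

```python
def coalesce(events, debounce_ms=1500):
    """Group (timestamp_ms, text) events from one peer into batches.

    A new batch starts only after a quiet gap of at least debounce_ms
    since the previous event, so rapid double sends land in one batch.
    """
    batches = []
    last_ts = None
    for ts, text in sorted(events):
        if last_ts is None or ts - last_ts >= debounce_ms:
            batches.append([text])
        else:
            batches[-1].append(text)
        last_ts = ts
    return batches


def within_media_cap(size_bytes: int, media_max_mb: int) -> bool:
    """True when an attachment fits under the cap (1 MB assumed = 1024 * 1024 bytes)."""
    return size_bytes <= media_max_mb * 1024 * 1024
```

With a 1500 ms window, three messages sent within a second become one unit of agent work, and a message five seconds later starts a new one.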

Mental model: pairing + allowlist = identity-based firewall in front of the LLM.
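
That mental model can be sketched as a three-way decision. This is an illustration under my own assumptions, not the gateway's actual code path: `admit_dm` is a hypothetical function, the `paired` set stands in for whatever your OpenClaw version stores after a pairing completes, and the phone numbers in the usage note are fictional.

```python
import re

# E.164: a plus sign, then up to 15 digits, the first of which is not zero.
E164 = re.compile(r"\+[1-9]\d{1,14}")


def admit_dm(sender: str, cfg: dict, paired=frozenset()) -> str:
    """Return 'allow', 'pair', or 'deny' for a direct message from sender."""
    if not cfg.get("enabled", False):
        return "deny"
    if not E164.fullmatch(sender):
        return "deny"
    if sender in cfg.get("allowFrom", []) or sender in paired:
        return "allow"
    if cfg.get("dmPolicy") == "pairing":
        return "pair"  # unknown number: start the approval flow, no agent access yet
    return "deny"
```

For example, with `{"enabled": True, "dmPolicy": "pairing", "allowFrom": ["+15555550100"]}`, the listed number is allowed, any other well-formed number is sent to pairing, and malformed senders are dropped before the LLM sees anything.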

allowFrom and the “owner” in parts 2 and 3: channels.whatsapp.allowFrom is the E.164 allowlist of who may talk to the bot on this channel. The “operator” or “owner” wording in SKILL.md, and the owner-only media rules in part 3, should all point at the same trusted thread: the session identity the bridge assigns to your own DM. That is also the identity you target with a tools.media keyPrefix such as whatsapp:direct:+1XXXXXXXXXX (a placeholder; use your real prefix in your private config, not a copy from a blog). Strangers in group chats who are outside your trust model do not get a casual path into that session.
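
To show how a session key and a keyPrefix rule line up, here is a small sketch. The key shape copies the whatsapp:direct:+1XXXXXXXXXX example; `session_key` and `media_allowed` are hypothetical helpers, the "group" kind in the usage note is my own placeholder, and the real matching logic lives inside OpenClaw.

```python
def session_key(channel: str, kind: str, peer: str) -> str:
    """Build a key shaped like whatsapp:direct:+1XXXXXXXXXX."""
    return f"{channel}:{kind}:{peer}"


def media_allowed(key: str, owner_prefixes: list[str]) -> bool:
    """True only when the session key starts with a configured owner prefix."""
    return any(key.startswith(prefix) for prefix in owner_prefixes)


# Fictional number used purely for illustration.
OWNER_PREFIXES = ["whatsapp:direct:+15555550100"]
```

`media_allowed(session_key("whatsapp", "direct", "+15555550100"), OWNER_PREFIXES)` is true, while a key for another number or for a group thread is not. Because this is a prefix match, write the full number into the prefix; a truncated prefix would also match every number that begins the same way.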

Twilio and Business WhatsApp (my stack)

I use Twilio with a WhatsApp Business number, provisioned the way Twilio’s WhatsApp documentation describes (this is not the same step list as a raw Meta Cloud API hand-build, and it is not a headless browser / Puppeteer bridge). Pick one guide and follow it end to end, or you will mix credentials.

For beginners: people on the allowlist are still talking to your WhatsApp identity. They will read the bot as you, not a new contact card, so treat the allowlist as “who is allowed to make my number say agent output.”

Groups: the bot only speaks when the patterns match a mention of the assistant (see Group Mentions below and your groupPolicy). It does not narrate the whole group by default.

4. Group Mentions: When the Bot Wakes Up

The messages.groupChat.mentionPatterns list includes variants of the assistant’s call name and casual “hey” forms, so the agent does not spam a whole group on every off-topic line.

ackReactionScope: "group-mentions" (if supported in your build) keeps acknowledgement behaviour scoped.

Match patterns to the name you set under ui.assistant (I use Clawd in the UI; patterns reference @clawd and similar). Keep patterns short enough to be memorable, not so broad that every line triggers the model.
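
A quick way to sanity-check patterns before they reach a real group is to run them against sample lines. The sketch below assumes the patterns are treated as case-insensitive regular expressions, which you should confirm for your build; the two patterns shown are illustrative, not my exact list.

```python
import re

# Illustrative patterns only; word boundaries stop "clawdia" from triggering.
MENTION_PATTERNS = [r"@clawd\b", r"\bhey clawd\b"]


def compile_patterns(patterns):
    """Compile each pattern once, ignoring case."""
    return [re.compile(p, re.IGNORECASE) for p in patterns]


def is_mentioned(text: str, compiled) -> bool:
    """True when any pattern matches somewhere in the message."""
    return any(p.search(text) for p in compiled)


if __name__ == "__main__":
    compiled = compile_patterns(MENTION_PATTERNS)
    for line in ["@Clawd what's the weather?", "did anyone see the game?"]:
        print(line, "->", is_mentioned(line, compiled))
```

Feeding a day of real group chatter through `is_mentioned` is a cheap test of whether a pattern is too broad before it starts costing model calls.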

5. Webhooks, Real Time, and Where the Phone Meets the Gateway

Exact webhook URLs differ by host release. The contract in any setup is:

Provider or bridge (WhatsApp / Meta / Baileys / etc.) posts events to your gateway.
Gateway authenticates, normalises, and hands messages to the agent with a stable session key (this is the same family of keys you used in tools.media keyPrefix rules in Article-03).
Session + skills apply: Shield, log redaction, allow/deny tools.
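
The three-step contract can be sketched end to end. Everything here is illustrative: the event shape with "from" and "body" keys is a hypothetical payload, not the exact format of any provider, and `handle` is not an OpenClaw function. The one real technique on show is the constant-time token comparison from Python's `hmac` module.

```python
import hmac


def authenticate(presented: str, expected: str) -> bool:
    """Compare tokens in constant time so timing does not leak their contents."""
    return hmac.compare_digest(presented.encode(), expected.encode())


def normalise(event: dict) -> dict:
    """Map a provider payload (hypothetical shape) to a small internal message."""
    peer = event["from"].removeprefix("whatsapp:")
    return {"channel": "whatsapp", "peer": peer, "text": event.get("body", "").strip()}


def handle(event: dict, presented_token: str, expected_token: str):
    """Authenticate, normalise, then attach a stable session key; None if rejected."""
    if not authenticate(presented_token, expected_token):
        return None
    message = normalise(event)
    message["session_key"] = f"{message['channel']}:direct:{message['peer']}"
    return message
```

The point of the ordering is that nothing is parsed or handed to the agent until the token check has passed, and the session key is derived from the normalised sender rather than from raw provider fields.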

Troubleshooting without panic: if messages stop flowing, check port → token → channel enabled → allowFrom → plugin enabled, in that order. Nine times out of ten it is a token problem or a gateway that was restarted without the same env as before.
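
That checklist is easy to script so you run it the same way every time. The sketch below walks the five checks in order and reports the first one that fails. It assumes the config has already been loaded into a dict with the key layout used in this article; `first_failure` and `port_open` are hypothetical helpers of mine.

```python
import socket


def port_open(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """True when something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def first_failure(config: dict, env: dict, port_check=port_open):
    """Run the checks in order; return the first failing name, or None."""
    gateway = config.get("gateway", {})
    whatsapp = config.get("channels", {}).get("whatsapp", {})
    plugin = config.get("plugins", {}).get("entries", {}).get("whatsapp", {})
    checks = [
        ("port", lambda: port_check(gateway.get("port", 0))),
        ("token", lambda: bool(env.get("OPENCLAW_GATEWAY_TOKEN"))),
        ("channel enabled", lambda: whatsapp.get("enabled") is True),
        ("allowFrom", lambda: len(whatsapp.get("allowFrom", [])) > 0),
        ("plugin enabled", lambda: plugin.get("enabled") is True),
    ]
    for name, check in checks:
        if not check():
            return name
    return None
```

Call it with `os.environ` as `env` from the same shell that starts the gateway; a result of "token" right after a reboot usually means the variable was never exported in that session.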

6. session.dmScope and Heartbeat of Trust

per-channel-peer means: “this DM thread is not that DM thread”. This is important when you later have more than one human talking to the same bot account.
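
A toy history store shows why the scope matters. This is a sketch of the idea only: `dm_session_id` and `remember` are hypothetical, and the "shared-dm" fallback exists purely to show the contrast; it is not a claim about which other scope values OpenClaw supports.

```python
from collections import defaultdict


def dm_session_id(channel: str, peer: str, dm_scope: str) -> str:
    """Derive the session a DM belongs to under the given scope."""
    if dm_scope == "per-channel-peer":
        return f"{channel}:direct:{peer}"
    # Hypothetical fallback for contrast: every DM lands in one shared session.
    return "shared-dm"


histories = defaultdict(list)


def remember(channel: str, peer: str, text: str, dm_scope: str = "per-channel-peer"):
    """Append a message to the history of the session it maps to."""
    histories[dm_session_id(channel, peer, dm_scope)].append(text)
```

With per-channel-peer, two people messaging the same bot account build two separate histories, so one person's context never appears in the other's replies. Under the shared fallback, both would write into the same list.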

Pair with maintenance:

session.reset.idleMinutes: so long-idle DMs can reset context predictably
session.maintenance: prune after a defined window (e.g. 7d) if you do not need infinite retention
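
The two maintenance rules reduce to simple time arithmetic. The sketch below is my own model of them, assuming window strings like "7d", "12h", or "30m"; check your OpenClaw version for the formats it actually accepts. `parse_window`, `should_reset`, and `prune` are hypothetical names.

```python
from datetime import datetime, timedelta

# Assumed suffixes for this sketch: minutes, hours, days.
UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_window(spec: str) -> timedelta:
    """Parse a window such as '7d' into a timedelta."""
    return timedelta(**{UNITS[spec[-1]]: int(spec[:-1])})


def should_reset(last_activity: datetime, now: datetime, idle_minutes: int) -> bool:
    """True when a DM has been idle for at least idle_minutes."""
    return now - last_activity >= timedelta(minutes=idle_minutes)


def prune(sessions: dict, now: datetime, prune_after: str) -> dict:
    """Keep only sessions whose last activity falls inside the retention window."""
    cutoff = now - parse_window(prune_after)
    return {key: seen for key, seen in sessions.items() if seen >= cutoff}
```

Reset and prune answer different questions: reset decides when a returning user starts with fresh context, while prune decides when a session record stops being stored at all.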

The Connection article is not the Voice (Shield) article. Both are needed: wiring decides who may connect; policy decides what they may do once connected.

Conclusion (Phase 4): treat WhatsApp as a public API to your home lab. Pair, allowlist, token-auth the gateway, debounce, and cap media.
