Muhammad Usman

Posted on Apr 25

How to Automate Telegram, WhatsApp and Email With One OpenClaw Agent

#devchallenge #openclawchallenge #discuss #ai

OpenClaw Challenge Submission 🦞

This is a submission for the OpenClaw Writing Challenge

Every single day, I was opening Telegram to message my team, then WhatsApp for a client, then Gmail for something formal, then Slack for a quick update. Four apps. Same information. Typed again and again. It felt stupid. I thought — there has to be a smarter way.

So I built one. And this post is the complete guide to how I did it.

The Real Problem

The problem isn't that we have too many messaging apps. The problem is that none of them understand why you're sending a message. They only care that you pressed Send.

Think about this simple situation:

"Tell the team I'll be 20 minutes late"

Even with every app installed on your phone, you still have to:

Decide which app to open
Find the right group or person
Type the message yourself
Hit send

Multiply that by every daily coordination you do — team updates, client check-ins, deployment alerts, meeting reminders — and you're burning real time just routing words through platforms.

I wanted one AI that understands what I mean once, then handles the rest. That's exactly what OpenClaw made possible.

What OpenClaw Actually Is

Before I get into the build, let me explain OpenClaw quickly because people often confuse it with a simple chatbot library. It's not.

OpenClaw is a programmable AI framework built around three things:

Skills — small, isolated units of work the AI can call on. Think: "send a Telegram message", "query a database", "call a webhook"
Agents — the brain of the system. A persistent entity that decides which Skills to use based on what you actually said
Memory & Routing — this is the part that makes everything feel intelligent. It remembers past conversations, your preferences, your contacts, and uses that to make better decisions each time

The important thing to understand is this: OpenClaw separates understanding your intent from executing the action. You don't write "if user says Telegram, do X." You describe a Skill, and the Agent figures out when and how to use it. That's a completely different way of thinking about automation.
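To make that separation concrete, here's a minimal sketch in plain Python. It does not use OpenClaw's real API: the `SKILLS` registry, the `skill` decorator, and `execute` are hypothetical names I made up for illustration. The point is only that the code which runs an action never inspects the user's wording. It receives a structured action and looks up the matching Skill.

```python
from typing import Callable, Dict

# Hypothetical registry: maps a skill name to the function that performs it.
SKILLS: Dict[str, Callable[[str, str], str]] = {}


def skill(name: str):
    """Register a function under a name (illustrative, not OpenClaw's decorator)."""
    def decorator(func: Callable[[str, str], str]) -> Callable[[str, str], str]:
        SKILLS[name] = func
        return func
    return decorator


@skill("telegram")
def send_telegram(recipient: str, message: str) -> str:
    # A real Skill would call the Telegram API here.
    return f"[telegram] to {recipient}: {message}"


@skill("email")
def send_email(recipient: str, message: str) -> str:
    # A real Skill would hand the message to a mail provider here.
    return f"[email] to {recipient}: {message}"


def execute(action: dict) -> str:
    """Run the Skill named in a structured action produced by the understanding step."""
    handler = SKILLS.get(action["platform"])
    if handler is None:
        raise ValueError(f"No skill registered for {action['platform']!r}")
    return handler(action["recipient"], action["message"])


result = execute(
    {"platform": "telegram", "recipient": "dev_team", "message": "Running 20 minutes late"}
)
print(result)
```

Adding a new platform means registering one more function. Nothing in `execute` changes.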

The System I Built

I designed what I now call a Communication OS — one entry point, multiple platforms, zero manual switching.

Here's the architecture:

┌─────────────────────────────────────────────┐
│         You (Telegram / WhatsApp)           │
└──────────────────┬──────────────────────────┘
                   │ natural language
                   ▼
┌─────────────────────────────────────────────┐
│         OpenClaw Agent + Memory             │
│  • Understands your intent                  │
│  • Checks your contact preferences          │
│  • Decides which platform to use            │
│  • Formats the message properly             │
└──────────────┬──────────────┬───────────────┘
               │              │
   ┌───────────▼───┐    ┌─────▼──────────┐
   │ Telegram Skill │    │ WhatsApp Skill │  … Slack, Email
   └───────────────┘    └────────────────┘

Step 0 — Install and Set Up OpenClaw

pip install openclaw
openclaw init comm-os
cd comm-os

Once you run this, you'll see a folder structure like:

comm-os/
├── skills/
├── agent.yaml
└── main.py

The skills/ folder is where each platform integration lives. The agent.yaml is where you define the brain. main.py wires everything together. Simple and clean.

Step 1 — Building the Telegram Skill

I chose Telegram as my main entry point because it has the cleanest developer API and I use it every day. Here's the Skill that listens to my messages and passes them to the Agent:

Create skills/telegram_io.py:

from openclaw import Skill, register_skill
from telegram import Update
from telegram.ext import Application, MessageHandler, filters

@register_skill
class TelegramIO(Skill):
    """Listens to Telegram messages and sends them to the OpenClaw agent."""

    def __init__(self, bot_token: str):
        self.token = bot_token
        self.app = Application.builder().token(bot_token).build()
        self.app.add_handler(
            MessageHandler(filters.TEXT & ~filters.COMMAND, self.handle_message)
        )

    async def handle_message(self, update: Update, context):
        user_text = update.message.text
        agent = self.get_agent()        # OpenClaw injects the active agent here
        response = await agent.process(user_text)
        await update.message.reply_text(response)

    async def on_start(self):
        await self.app.initialize()
        await self.app.start()
        await self.app.updater.start_polling()   # start_polling is a coroutine in python-telegram-bot v20+

Three things to understand about this code:

@register_skill makes this Skill visible to the OpenClaw runtime automatically
self.get_agent() gives you access to the full agent — memory, other Skills, everything
This Skill has zero knowledge of WhatsApp, Slack, or email. It just passes the raw message to the Agent and delivers whatever the Agent decides. All the platform logic is somewhere else entirely. That decoupling is the most important architectural decision in this whole system.

Here's what it looks like in action — I told my bot to set up a daily morning email briefing:

I told it once: "Every day at 7:30 AM, pull my unread emails, categorize them, send me a summary on Telegram." Done.

And this is what it delivered the very next morning — without me touching my laptop:

Caught a Vercel deployment failure and a GitHub token update — delivered directly to my Telegram. I didn't open my email once.

That's when it clicked for me. This isn't a bot. This is something closer to an employee that works while you sleep.

Step 2 — Teaching the Agent to Understand Intent

Most automation tools are keyword-matching machines. If you say "Telegram", they fire the Telegram action. If you say "email", they fire the email action. That works great — until you say "message the team" and the system has no idea what you mean.

OpenClaw's Agent is different. You configure it with a system prompt that teaches it to think in terms of structured intent, not keywords. Here's my agent.yaml:

name: CommOrchestrator
model: gpt-4o
temperature: 0.2

system_prompt: |
  You are a communication routing agent. Your job is to read the user's
  natural language instruction and produce a structured JSON action.

  Rules:
  - If no platform is mentioned, use the "default_platform" from memory
  - Always confirm before sending to WhatsApp (Business API policy)
  - Never ask for clarification — make your best inference and act

  Output format:
  {
    "action": "send_message",
    "platform": "<telegram|whatsapp|slack|email>",
    "recipient": "<group or person>",
    "message": "<final polished message text>",
    "confirm_needed": true/false
  }

Then in code, the Agent parses this and routes to the right Skill:

import json
from openclaw.agent import Agent

class CommOrchestratorAgent(Agent):
    async def process(self, user_input: str) -> str:
        raw = await self.llm.generate(self.system_prompt, user_input)
        action = json.loads(raw)

        if action.get("confirm_needed"):
            return await self.ask_confirmation(action)

        return await self.execute_action(action)

Notice temperature: 0.2 — I kept it low on purpose. For routing decisions, I want the agent to be consistent and predictable, not creative. Save the creativity for message formatting.

The rule "never ask for clarification — make your best inference and act" is counterintuitive but critical. Every time a bot asks "Did you mean Telegram or WhatsApp?" it loses a user. Act confidently. Let the user correct you if needed.
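One caveat: calling json.loads on raw model output will crash the first time the model returns something that isn't clean JSON. Here's a minimal sketch of a validation step that could sit between the model and the routing code. It is my own illustration, not part of OpenClaw: `parse_action`, `ALLOWED_PLATFORMS`, and `REQUIRED_KEYS` are hypothetical names, and the key list simply mirrors the output format in the system prompt.

```python
import json

ALLOWED_PLATFORMS = {"telegram", "whatsapp", "slack", "email"}
REQUIRED_KEYS = {"action", "platform", "recipient", "message", "confirm_needed"}


def parse_action(raw: str, default_platform: str = "telegram") -> dict:
    """Parse the model's reply into an action dict, raising ValueError on bad output."""
    try:
        action = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Model did not return valid JSON: {exc}") from exc
    if not isinstance(action, dict):
        raise ValueError("Expected a JSON object")

    # If the model left the platform out, fall back to the user's default.
    if not action.get("platform"):
        action["platform"] = default_platform

    missing = REQUIRED_KEYS - action.keys()
    if missing:
        raise ValueError(f"Missing keys: {sorted(missing)}")
    if action["platform"] not in ALLOWED_PLATFORMS:
        raise ValueError(f"Unknown platform: {action['platform']!r}")

    # Rule from the system prompt: WhatsApp sends always need confirmation.
    if action["platform"] == "whatsapp":
        action["confirm_needed"] = True
    return action


sample = (
    '{"action": "send_message", "platform": "whatsapp", '
    '"recipient": "Asad", "message": "Running late", "confirm_needed": false}'
)
parsed = parse_action(sample)
print(parsed["platform"], parsed["confirm_needed"])
```

Enforcing the WhatsApp rule in code as well as in the prompt means a single bad generation can't skip the confirmation step.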

Step 3 — WhatsApp Integration

WhatsApp is famously locked down. You cannot just automate it freely — the official path requires the Business API, and the easiest way to access that is through Twilio.

Here's how I built the WhatsApp Skill:

import asyncio
import os
from twilio.rest import Client
from openclaw import Skill, register_skill

@register_skill
class WhatsAppSkill(Skill):
    def __init__(self):
        # Credentials come from the environment, never from source control
        self.client = Client(
            os.getenv("TWILIO_ACCOUNT_SID"),
            os.getenv("TWILIO_AUTH_TOKEN")
        )

    async def send_message(self, to_number: str, body: str) -> str:
        # The Twilio client is blocking, so run it in a worker thread
        # to keep the event loop free for the other Skills
        message = await asyncio.to_thread(
            self.client.messages.create,
            from_='whatsapp:+1111111111',   # Twilio sandbox number
            body=body,
            to=f'whatsapp:{to_number}'
        )
        return message.sid

The Agent calls self.use_skill('WhatsAppSkill').send_message(...) whenever the platform in the JSON action is "whatsapp". The exact same pattern works for Slack's Web API, SendGrid for email, or anything else.
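Since the title promises email too, here's a rough sketch of what the sending half of an email Skill could look like. To keep it dependency-free I'm using Python's standard library over plain SMTP instead of SendGrid, so treat it as a stand-in. `build_email` and `send_email` are hypothetical helper names, and the addresses, host, and port are placeholders.

```python
import smtplib
import ssl
from email.message import EmailMessage


def build_email(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Assemble a plain-text email. No network access happens here."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


def send_email(msg: EmailMessage, host: str, port: int, username: str, password: str) -> None:
    """Deliver the message over SMTP with STARTTLS. Host and credentials are placeholders."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.starttls(context=ssl.create_default_context())
        smtp.login(username, password)
        smtp.send_message(msg)


# Building the message is safe to run anywhere; sending needs a real SMTP server.
draft = build_email(
    sender="me@example.com",
    recipient="stakeholders@company.com",
    subject="Deploy window",
    body="The deploy window opens tomorrow at 9 AM.",
)
print(draft["To"], "|", draft["Subject"])
```

Splitting build from send also makes the confirmation step easy: the agent can show the finished draft before anything leaves the machine.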

Here's the architecture of how OpenClaw connects to WhatsApp specifically — this diagram shows it clearly:

OpenClaw sits in the middle — your personal WhatsApp number on one side, your contacts on the other. It operates using your linked account.

Step 4 — Memory: The Feature That Makes Everything Feel Smart

Without memory, every conversation starts from zero. The agent doesn't know who "the team" is. It doesn't know your preferred platform. It doesn't know Alice's WhatsApp number. It's useless.

With memory, the system gets smarter every time you use it. Here's what I store:

{
  "user_profile": {
    "name": "DevMasterMind",
    "timezone": "Asia/Karachi",
    "default_platform": "telegram",
    "communication_style": "casual"
  },
  "contacts": [
    { "name": "M.Usman", "role": "Developer", "telegram": "@devmastermind.official@gmail.com", "whatsapp": "+920000000000" },
    { "name": "Asad", "role": "manager", "telegram": "@asad" }
  ],
  "teams": {
    "dev_team": { "telegram_channel": "#dev-team", "members": ["Ali", "Omar"] },
    "stakeholders": { "email_list": "stakeholders@company.com" }
  }
}

Save a contact once with /remember DevMasterMind +920000000000 WhatsApp and the Agent will use it forever. The next time you say "ping DevMasterMind", it knows exactly what to do.
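To show how a lookup against that memory might work, here's a small sketch. It is my own illustration rather than OpenClaw's memory API: `resolve_contact` is a hypothetical helper, and `MEMORY` is a trimmed copy of the structure above with one channel per contact.

```python
from typing import Optional, Tuple

# Trimmed version of the memory structure shown above (placeholder values).
MEMORY = {
    "user_profile": {"default_platform": "telegram"},
    "contacts": [
        {"name": "Asad", "role": "manager", "telegram": "@asad"},
        {"name": "M.Usman", "role": "Developer", "whatsapp": "+920000000000"},
    ],
}

CHANNELS = ("telegram", "whatsapp", "slack", "email")


def resolve_contact(memory: dict, name: str, platform: Optional[str] = None) -> Tuple[str, str]:
    """Return (platform, address) for a contact, preferring the requested or default platform."""
    contact = next(
        (c for c in memory["contacts"] if c["name"].lower() == name.lower()), None
    )
    if contact is None:
        raise KeyError(f"No contact named {name!r}")

    preferred = platform or memory["user_profile"]["default_platform"]
    if preferred in contact:
        return preferred, contact[preferred]

    # Fall back to whichever channel is stored for this contact.
    for channel in CHANNELS:
        if channel in contact:
            return channel, contact[channel]
    raise KeyError(f"No address stored for {name!r}")


print(resolve_contact(MEMORY, "asad"))
print(resolve_contact(MEMORY, "M.Usman"))
```

The second call asks for nothing specific, finds no Telegram handle, and falls back to the stored WhatsApp number. That fallback is what lets "ping someone" work without naming a platform.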

Here's a diagram of how the memory layer works across sessions:

Every conversation feeds into a persistent memory store — preferences, past decisions, ongoing projects, writing style. Future conversations start smarter.

There's a catch. Context grows over time, latency goes up, and if you don't manage it, auto-compaction kicks in and the system decides what to forget. You want manual control, not auto-compaction.

I solved this with OpenClaw's MemoryFilter — it automatically summarizes older conversations and only injects the most relevant recent context. Response times dropped back to normal. Lesson learned: curate your context window the same way you curate your inbox.
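I can't reproduce MemoryFilter's internals here, but the general idea of keeping recent turns verbatim and collapsing older ones can be sketched in a few lines. Everything below is an assumption for illustration: `curate_context` is a hypothetical function, and the truncation stands in for a real summarizer.

```python
from typing import List


def curate_context(messages: List[str], keep_recent: int = 3, summary_chars: int = 40) -> List[str]:
    """Keep the newest messages verbatim and collapse older ones into one summary line."""
    if keep_recent <= 0:
        raise ValueError("keep_recent must be positive")
    if len(messages) <= keep_recent:
        return list(messages)

    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    # Stand-in for a real summarizer: keep only the start of each older message.
    snippets = [m[:summary_chars] for m in older]
    summary = f"Summary of {len(older)} earlier messages: " + " | ".join(snippets)
    return [summary] + recent


history = [
    "Asad prefers Telegram",
    "Deploy window is Friday",
    "Client call moved to 3 PM",
    "Remind team about standup",
    "Send invoice to stakeholders",
]
curated = curate_context(history, keep_recent=2)
for line in curated:
    print(line)
```

Five messages go in and three lines come out: one summary plus the two most recent turns. The context size stays bounded no matter how long the history gets.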

A Real Conversation — What This Actually Looks Like

Here's a real example from my own setup. I asked the agent to build and deploy something for me, straight from WhatsApp:

"Hey dev, build a landing page for my agentic dev course and deploy it to Vercel." It spun up a sub-agent, built a Next.js + Tailwind page, and deployed it. I didn't open my laptop.

And this is the research task — I asked for a comparison of the top 5 AI agent frameworks. It didn't just answer from memory. It spun up a sub-agent to pull real GitHub data:

"Research the top 5 AI agent frameworks. Compare features, GitHub stars, community activity, and pricing. Send me a report." Sub-agent started immediately.

5 Things I Actually Learned Building This

These aren't polished lessons I thought of later. These are things that surprised me in the middle of the build.

1. Decouple your Skills from your routing logic.
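Each Skill should know how to do one thing: send a Telegram message, send a WhatsApp message. Deciding which Skill to call is the Agent's job, so you can add or replace a platform without touching the others.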

2. Idempotency is non-negotiable.
Message delivery can fail halfway. I built every Skill with an acknowledgment — the Agent tracks whether a message was actually delivered before marking it done. Otherwise you get double-sends, and there's nothing more embarrassing than your bot sending the same message to your client three times.

3. Context windows need curation, not accumulation.
More memory isn't always better. After a few weeks, my agent's context had grown so large that responses were slow and sometimes confused. OpenClaw's MemoryFilter let me summarize old context and inject only what's relevant. Keep it lean.

4. Prompt structure beats model size.
With a well-designed JSON schema in the system prompt and a few clear rules, a small model outperformed much larger generic models. I was getting better routing decisions from a smaller model with a tight prompt than from a large model with a vague one. Write better prompts before reaching for a bigger model.

5. Permission gating should be in every high-stakes skill.
WhatsApp's rules forced me to add a confirmation step before sending. I'm genuinely grateful for that constraint — I've since applied the same pattern to email sends, invoice automations, and anything else that's hard to undo. If an action is hard to reverse, the agent should ask first.
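Lessons 2 and 5 fit together naturally, so here's one sketch covering both. It is not OpenClaw code: `DeliveryLedger`, `send_once`, and the `HIGH_STAKES` set are hypothetical names, and the in-memory set stands in for whatever persistent store you'd use in production.

```python
import hashlib
from typing import Callable, Set

# Assumption: platforms where a send is hard to undo and needs a "yes" first.
HIGH_STAKES = {"whatsapp", "email"}


class DeliveryLedger:
    """Remembers which actions were delivered so retries do not double-send."""

    def __init__(self) -> None:
        self._delivered: Set[str] = set()

    @staticmethod
    def key(action: dict) -> str:
        # The same platform, recipient, and text always produce the same key.
        raw = "|".join([action["platform"], action["recipient"], action["message"]])
        return hashlib.sha256(raw.encode("utf-8")).hexdigest()

    def send_once(self, action: dict, send: Callable[[dict], None], confirmed: bool = False) -> str:
        if action["platform"] in HIGH_STAKES and not confirmed:
            return "needs_confirmation"
        action_key = self.key(action)
        if action_key in self._delivered:
            return "duplicate_skipped"
        send(action)  # may raise; the key is recorded only after success
        self._delivered.add(action_key)
        return "sent"


outbox = []  # stands in for a real platform call
ledger = DeliveryLedger()
update = {"platform": "telegram", "recipient": "dev_team", "message": "Deploy done"}
invoice = {"platform": "whatsapp", "recipient": "Asad", "message": "Invoice attached"}

print(ledger.send_once(update, outbox.append))
print(ledger.send_once(update, outbox.append))
print(ledger.send_once(invoice, outbox.append))
print(ledger.send_once(invoice, outbox.append, confirmed=True))
```

One limitation of hashing the content: it also blocks a deliberate repeat of the same message. A real system would more likely attach a unique ID to each request and key the ledger on that.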

Where This Goes Next

The system I described is a foundation. Once you have intent routing working across platforms, you can build on top of it:

Scheduled messages — "Remind the team about the deploy window tomorrow at 9 AM" becomes a scheduled job in OpenClaw's queue. No calendar app needed.

Cross-channel context bridging — A question comes in on Slack, the answer lives with someone on WhatsApp. The agent fetches it and relays it back. The platforms stop mattering.

Inbox digests — "What did I miss in the last two hours?" — the agent pulls from all connected platforms, de-duplicates, and gives you one clean summary.
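The de-duplication step of that digest is easy to sketch. This is a hypothetical illustration, not an OpenClaw feature: `dedupe_messages` and the message shape (`platform`, `sender`, `text`) are assumptions, and "duplicate" here simply means the same sender wrote the same text, ignoring case and extra whitespace.

```python
from typing import Dict, List


def dedupe_messages(messages: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop messages whose sender and normalized text were already seen, keeping the first."""
    seen = set()
    unique = []
    for msg in messages:
        normalized_text = " ".join(msg["text"].lower().split())
        key = (msg["sender"].lower(), normalized_text)
        if key in seen:
            continue
        seen.add(key)
        unique.append(msg)
    return unique


inbox = [
    {"platform": "slack", "sender": "Ali", "text": "Build is green"},
    {"platform": "telegram", "sender": "ali", "text": "build  is green"},
    {"platform": "email", "sender": "Omar", "text": "Build is green"},
]
digest = dedupe_messages(inbox)
for item in digest:
    print(item["platform"], item["sender"], item["text"])
```

Ali's cross-posted update is collapsed into one entry, while Omar's identical text survives because it came from a different person.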

OpenClaw makes all of this possible because every external service is just another pluggable Skill. You're not building a bot. You're building a kernel that sits underneath all your communication.
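For the scheduled-messages idea, the one piece of logic you need regardless of which queue runs the job is "when is the next 9 AM?". Here's a minimal sketch. `next_run` is a hypothetical helper, and I'm assuming a fixed UTC+5 offset for Pakistan time to avoid depending on a timezone database.

```python
from datetime import datetime, time, timedelta, timezone

# Assumption: a fixed UTC+5 offset (Pakistan Standard Time).
PKT = timezone(timedelta(hours=5))


def next_run(now: datetime, at: time) -> datetime:
    """Return the first moment strictly after `now` that falls on the wall-clock time `at`."""
    candidate = now.replace(hour=at.hour, minute=at.minute, second=0, microsecond=0)
    if candidate <= now:
        # Today's slot has already passed, so schedule for tomorrow.
        candidate += timedelta(days=1)
    return candidate


evening = datetime(2025, 4, 25, 22, 0, tzinfo=PKT)
print(next_run(evening, time(9, 0)).isoformat())
```

Asked at 10 PM, the function returns 9 AM the following day. Asked at 8 AM, it returns 9 AM the same day.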

ClawCon Michigan

I did not attend ClawCon Michigan, but I’m excited to see how the OpenClaw ecosystem continues to grow and inspire developers worldwide.

Start Building

Here's your starting point. Clone the structure I described, build the Telegram Skill first, get one successful message through the Agent, then add WhatsApp.

The moment it sends your first automatically-formatted message to the right person on the right platform — without you telling it where to go — you'll understand what this is actually capable of.

The age of manually routing words between apps is ending. You now have the blueprint to build the thing that replaces it.

This is a submission for the OpenClaw Writing Challenge: Wealth of Knowledge
