This is a submission for the OpenClaw Challenge.
What I Built
I'm a second-year computer engineering student. I often joke that my degree runs on WhatsApp: most students don't attend lectures, they just work from whatever someone has uploaded to the group chat. I'm going to be honest with you about something embarrassing: I missed a lab submission last semester not because I didn't do the work, but because the deadline came through a WhatsApp group at 11 PM and I just didn't see it in time.
The professor sent a reminder email three days before. Someone in the group chat forwarded it with "guys don't forget!!". Someone else sent a voice note I never opened. By the time I remembered, the portal was closed.
This is not a unique experience. If you're a student in India — or honestly anyone whose professional life runs partially through WhatsApp — you know that deadlines don't live in one place. They're fragmented across group chats, Gmail threads, DMs, and the occasional Instagram message from a friend who remembered you hadn't registered for something yet.
The "fix" people suggest is: "just use Google Calendar." Sure. Do you manually create an event every time someone texts you about something? I don't. Nobody does.
So, enter the hero of this story: OpenClaw.
How I Used OpenClaw
I've played with n8n, with custom Python cron scripts, with Zapier. None of them had native WhatsApp access without either paying for a business API (which requires a separate phone number and a whole approval process) or running a fragile Selenium scraper.
OpenClaw has WhatsApp built into its Gateway via Baileys. One command, one QR scan. Done. The agent immediately has access to every message I receive, treated as a first-class channel.
The other thing that made OpenClaw the right fit was Standing Orders. This is not a feature you'd expect to care about until you actually use it. The idea is simple: you write a file called AGENTS.md in your workspace and the agent reads it every session. You define programs — "here's what you're authorized to do, here's when to do it, here's when to stop and ask me."
For OpenClaw, the Standing Order looks roughly like this:
```markdown
## Program: WhatsApp Deadline Monitor

**Authority:** Read all inbound WhatsApp messages. Extract and store deadline/event data.
Send review messages back to the user for ambiguous extractions.
**Trigger:** Every inbound WhatsApp message
**Approval gate:** Auto-write calendar for confidence ≥ 0.80. Request approval below that.
**Escalation:** If extraction fails 3 times in a row, alert me and pause.
```
This single file is why it is an agent and not a script. It has defined scope, defined escalation rules, defined approval gates. It knows when to act and when to ask. I wrote those rules once and now they govern every message that comes through, forever, without me thinking about it again.
Demo
So I built ChronoAgent.
ChronoAgent runs in the background on my laptop as an OpenClaw Gateway daemon. It has a WhatsApp channel connected (OpenClaw supports this natively through Baileys — no external service, no webhook nonsense, just scan a QR code). Every message that comes in gets silently processed. If it contains a deadline, a due date, a meeting, an exam — it gets extracted, deduplicated, and written to Google Calendar automatically.
I don't prompt it. I don't open a chat window. It just runs.
When it adds something to my calendar, it sends me a message back: "📅 Added: Assignment 3 submission on April 28, 11:59 PM". I can reply NO to undo it. That's the entire user interaction for 80% of cases.
For things it's less sure about — "sometime next week we should meet bro" — it holds the event in a pending queue and asks me to confirm with a YES or NO reply. No calendar write until I say so.
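The routing behavior boils down to three bands. Here's a minimal sketch — the thresholds are the ones from my Standing Order, but the function name and return values are illustrative, not lifted from the repo:

```python
def route_event(event: dict) -> str:
    """Decide what happens to one extracted event based on its confidence score."""
    c = event["confidence"]
    if c >= 0.80:
        return "write"    # auto-write to calendar, send a confirmation message
    if c >= 0.50:
        return "pending"  # hold in the pending queue, ask for a YES/NO reply
    return "discard"      # too vague to act on, drop silently
```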
Source code and setup instructions: github.com/Labreo/openclaw-calendar-agent
What I Learned
Here's the mistake I almost made: trying to have the LLM do everything.
My first draft had the LLM reading messages, extracting dates, checking the calendar for duplicates, deciding whether to write, writing the event, formatting the confirmation message — all in one big chain of reasoning.
It was slow. It was expensive. And it hallucinated duplicates constantly.
The version that actually works is much dumber-looking on the surface, but it's solid:
```
Incoming Message
      │
      ▼
[Python: normalize to standard envelope]
  { source, sender, timestamp, raw_text }
      │
      ▼
[LLM: ONLY job is translation]
  Input:  raw text + today's date
  Output: JSON array of extracted events with confidence scores
      │
      ▼
[Python: resolve relative dates]
  "next Friday" → 2026-04-25T00:00:00
      │
      ▼
[Python: deduplicate]
  Fuzzy title match + date proximity + semantic similarity
      │
      ▼
[Confidence router]
  ≥ 0.80    → write to calendar, notify me
  0.50–0.79 → pending queue, ask me
  < 0.50    → silent discard
```
The LLM's job is exactly one thing: read messy human text and output clean structured JSON. That's it. Every other decision in the pipeline is deterministic Python.
The extraction prompt that actually worked
Getting the LLM to reliably output JSON (and only JSON) took more iteration than I expected. The trick was treating the model like a data parser in the system prompt, not like a conversationalist:
```python
EXTRACTION_SYSTEM_PROMPT = """You are a deadline and event extraction engine.
Your ONLY output is a valid JSON array. No prose. No markdown. No explanation.

Given a message, extract every actionable deadline, due date, meeting, or event.
For each item output:
{
  "title": string,
  "date_raw": string,      // verbatim from the message
  "date_iso": string|null, // resolved ISO 8601 if possible, else null
  "confidence": float,     // 0.0 to 1.0
  "source_quote": string,  // the exact fragment that contains the deadline
  "event_type": string     // deadline | meeting | exam | submission | event | other
}

Confidence guide:
1.0  — explicit date + time + clear action
0.85 — explicit date, no time
0.70 — relative date that can be resolved
0.55 — vague but likely actionable
0.30 — might be an event, very unclear
0.10 — references a past deadline

If NO events found, return: []

Today's date is injected in the user message."""
```
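Even with that prompt, the pipeline still parses the reply defensively. A sketch of the validation layer I'd put between the model and everything downstream — the helper name and fence-stripping are my assumptions, not verbatim from the repo:

```python
import json

# Every event must carry the full schema before it enters the pipeline.
REQUIRED = {"title", "date_raw", "date_iso", "confidence",
            "source_quote", "event_type"}

def parse_extraction(raw: str) -> list[dict]:
    """Parse the model's reply into a list of event dicts.

    Drops items missing required fields; raises on non-JSON output.
    """
    text = raw.strip()
    # Models occasionally wrap JSON in a markdown fence despite instructions.
    if text.startswith("```"):
        text = text.strip("`").removeprefix("json").strip()
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array")
    return [e for e in data
            if isinstance(e, dict) and REQUIRED <= e.keys()]
```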
The source_quote field was an afterthought that turned out to be the most useful field in the whole schema. Every Google Calendar event created by ChronoAgent has the original message fragment in its description. When I look at a calendar event three weeks later and don't remember what it's about, I can see "Original: 'bro the quiz is moved to Thursday 10am right?' — Source: WhatsApp, Sender: Abdullah". That's enough context.
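Carrying that context into the calendar is just a matter of building the event body with the fragment in the description. A sketch under the Google Calendar API's event shape (`summary`/`description`/`start`/`end`); the one-hour block and the `sender`/`source` metadata fields are my assumptions:

```python
from datetime import datetime, timedelta

def build_gcal_body(event: dict) -> dict:
    """Build a Google Calendar event body that keeps the source fragment."""
    start = datetime.fromisoformat(event["date_iso"])
    end = start + timedelta(hours=1)  # block an hour; deadlines are point-in-time
    description = (
        f"Original: '{event['source_quote']}'\n"
        f"Source: {event['source']}, Sender: {event['sender']}"
    )
    return {
        "summary": event["title"],
        "description": description,
        "start": {"dateTime": start.isoformat()},
        "end": {"dateTime": end.isoformat()},
    }
```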
The deduplication problem
This was the hardest part of the whole project by a significant margin.
The naive approach: ask the LLM "is this event already in my calendar?" Tried it. Terrible. Slow, expensive, and the model would confidently say "no duplicate found" when there obviously was one.
The approach that works is a 2-out-of-3 vote between three deterministic checks:
- Fuzzy title match (Levenshtein ratio ≥ 0.85)
- Date proximity (within ±24 hours)
- Semantic similarity (cosine similarity using a local `sentence-transformers` model, threshold 0.80)
```python
from datetime import datetime

# lev_ratio, get_or_compute_embedding, and cosine_similarity are helpers
# defined elsewhere in the project.

def is_duplicate(candidate, existing, embeddings_cache):
    """2-out-of-3 vote across title, date, and semantic similarity."""
    votes = 0

    # Check 1: title similarity
    if lev_ratio(candidate["title"].lower(), existing["title"].lower()) >= 0.85:
        votes += 1

    # Check 2: date proximity
    if candidate.get("date_iso") and existing.get("date_iso"):
        a = datetime.fromisoformat(candidate["date_iso"])
        b = datetime.fromisoformat(existing["date_iso"])
        if abs((a - b).total_seconds()) <= 86400:  # 24 hours
            votes += 1

    # Check 3: semantic similarity
    vec_a = get_or_compute_embedding(candidate["title"], embeddings_cache)
    vec_b = get_or_compute_embedding(existing["title"], embeddings_cache)
    if cosine_similarity(vec_a, vec_b) >= 0.80:
        votes += 1

    return votes >= 2
```
"Assignment 3 due Friday" and "A3 submission end of week" — different titles, so Levenshtein fails. But they're semantically similar and the dates are within 24 hours, so they correctly merge as duplicates.
The semantic model I used (all-MiniLM-L6-v2) is 80MB and runs locally. No API call, no cost, ~10ms inference time. The LLM is completely out of the loop for dedup.
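For completeness, here's roughly what the two helpers behind check 3 look like. The cosine works on plain Python lists, and the cache takes the encoder as a parameter — that injection is my assumption (so the cache logic can be exercised without downloading the model); in ChronoAgent it would be `SentenceTransformer("all-MiniLM-L6-v2").encode`:

```python
import math

def cosine_similarity(a, b) -> float:
    """Cosine similarity over plain Python sequences, no numpy required."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def get_or_compute_embedding(text: str, cache: dict, encode=None):
    """Memoized embedding lookup: compute on a cache miss, reuse after."""
    key = text.strip().lower()  # normalize so 'Quiz' and 'quiz ' share a slot
    if key not in cache:
        cache[key] = encode(key)
    return cache[key]
```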
The 3 AM quota problem (a real thing that happened)
Midway through building this, I burnt through my initial API credits testing the extraction pipeline on WhatsApp group traffic. A college group chat is... a lot of messages. Most of them "ok", "haha", "send notes pls" — but the extraction call fires on all of them before it knows they're non-events.
I had two options: add pre-filtering, or switch to a cheaper model.
I did both. Added a simple length and keyword pre-filter that drops messages under 15 words with no date-adjacent terms before they even hit the LLM. Dropped API usage by about 70% immediately.
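The pre-filter is a few lines of plain Python. A sketch — the keyword list here is illustrative, not the exact one from the repo:

```python
# Date-adjacent vocabulary: if a short message contains none of these,
# it never reaches the LLM.
DATE_TERMS = {
    "deadline", "due", "submit", "submission", "exam", "quiz", "meeting",
    "tomorrow", "today", "tonight", "monday", "tuesday", "wednesday",
    "thursday", "friday", "saturday", "sunday", "am", "pm",
}

def worth_extracting(text: str, min_words: int = 15) -> bool:
    """True if the message is long enough or mentions a date-adjacent term."""
    words = text.lower().split()
    if len(words) >= min_words:
        return True
    return any(w.strip(".,!?") in DATE_TERMS for w in words)
```

This is why "ok" and "haha" cost nothing, while "quiz moved to Thursday 10am" still gets through.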
Then switched from Claude Sonnet to Gemini 2.5 Flash for the extraction step. Flash is significantly cheaper for this kind of structured output task and the JSON reliability was equivalent in my testing. I still use Claude for anything that requires more complex reasoning — the confidence routing logic and the digest generation — but for the high-volume extraction pass, Flash works well.
The lesson: the best agentic systems aren't the ones throwing the strongest model at every task. They're the ones with smart orchestration about when to call what.
What "passive operation" actually means in practice
The phrase "runs in the background" is easy to say and actually kind of hard to build well.
OpenClaw's Gateway is a daemon — you run openclaw onboard --install-daemon and it sets up a launchd/systemd service that starts automatically on login and stays running. The WhatsApp channel maintains a persistent Baileys session. Cron jobs handle the Gmail polling every 4 hours.
The Standing Order hook fires on every inbound WhatsApp message without any scheduler. The agent just... responds to events. It's not polling. It's not a loop. It's closer to how a web server handles requests — always listening, processes when something arrives, goes quiet otherwise.
The Gmail part is a Python script (ingest_email.py) that the OpenClaw cron calls every 4 hours:
```bash
openclaw cron add \
  --name email-ingestion \
  --cron "0 */4 * * *" \
  --timeout-seconds 120 \
  --message "Execute email ingestion per standing orders. Run ingest_email.py, process queue, run extraction and dedup on each envelope, route by confidence, write calendar entries, report summary."
```
The cron message references the Standing Order rather than duplicating the logic. The agent reads its AGENTS.md, knows the full procedure, and executes it. I don't need to write a long prompt into the cron command because the authority is already defined.
What I'd build next if I had more time
The biggest gap right now is thread context. When someone says "bro don't forget about the thing on Friday" — that's a 0.3 confidence extraction at best. But if I had the last 5 messages of that WhatsApp thread, I'd probably know what "the thing" is.
OpenClaw's session history tools make this possible but I didn't have time to implement it cleanly before the deadline. It's the next feature.
The other thing is cross-source entity resolution. Right now if the same event appears in email and WhatsApp, the dedup engine usually catches it. But it runs independently per message — there's no weekly "let me look at everything I've collected and find clusters" pass. I have a consolidation script written but not wired into the cron yet.
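That consolidation pass is essentially greedy clustering over everything collected so far, reusing the pairwise 2-of-3 vote. A sketch of what it could look like — this is the unwired future feature, so everything here is an assumption, with the predicate injected for testability:

```python
def cluster_events(events: list, same_event) -> list[list]:
    """Group events into clusters of likely-duplicates.

    `same_event(a, b)` is the pairwise check; in practice the
    2-out-of-3 is_duplicate vote from the dedup engine.
    """
    clusters: list[list] = []
    for ev in events:
        for cluster in clusters:
            # Join the first cluster containing any likely-duplicate.
            if any(same_event(ev, member) for member in cluster):
                cluster.append(ev)
                break
        else:
            clusters.append([ev])  # no match anywhere: start a new cluster
    return clusters
```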
ClawCon Michigan
I didn't attend ClawCon Michigan this time around, but building ChronoAgent has made me want to.
The honest summary
ChronoAgent is not a product. It's a personal tool I built because I was genuinely failing at calendar management and the existing solutions didn't work for how I actually communicate (mostly WhatsApp, some email, basically no structured calendar input).
OpenClaw made the WhatsApp part possible without fighting through Business API approvals. The Standing Orders made the "passive, always-on, doesn't need prompting" part possible without writing a custom daemon. The exec tool made it easy to keep the heavy lifting in plain Python scripts that I can test independently.
The thing I keep coming back to from this project: the right job for the LLM in an agentic system is usually much smaller than you initially think. Translation, intent recognition, natural language output — yes. Calendar math, deduplication logic, date resolution — no. The moment I moved those out of the LLM and into deterministic code, the whole system became faster, cheaper, and more reliable.
Since I set it up, I've had zero missed deadline incidents. That's the only metric that matters.








