Lars Winstand

Originally published at standardcompute.com

I found the dumbest way to burn 500 LLM calls a day: polling an inbox every 5 minutes

If your OpenClaw agent checks an email inbox every 5 minutes, you’re probably paying for idle paranoia.

That’s not a theoretical complaint. In an r/openclaw thread about triggering jobs from email, one user described an MS365 setup like this:

"At the moment, I have Openclaw job where agent checks its ms365 mailbox every 5 minutes... Wasted calls to LLM (nearly 500 calls to LLM per day)"

That is such a painfully real failure mode.

The demo works. The cron job looks harmless. Then a month later your agent is re-checking old mail, occasionally double-processing messages, and quietly spending model calls on nothing.

If you’re building always-on agents, this is exactly the kind of bug that turns “cool automation” into “why is this thing flaky and expensive?”

The pattern everyone starts with

Usually it looks like this:

  1. Connect OpenClaw to a mailbox
  2. Poll every 5 minutes with IMAP or Microsoft Graph
  3. If there’s a new message, send it to GPT-5.4, Claude Opus 4.6, or whatever model you’re using
  4. Try not to process the same email twice

For a proof of concept, that’s fine.

If it’s one internal mailbox, low volume, and you have a tiny dedupe store in SQLite, polling can be good enough.
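
For reference, the proof-of-concept shape is something like the sketch below. fetchUnseenMessages and runOpenClawAgent are hypothetical helpers standing in for your IMAP client and your agent call; the SQLite dedupe is the only part doing real work:

import Database from "better-sqlite3";
import { fetchUnseenMessages } from "./imap.js"; // hypothetical IMAP helper
import { runOpenClawAgent } from "./agent.js";   // hypothetical agent call

const db = new Database("dedupe.sqlite");
db.exec("create table if not exists seen (message_id text primary key)");

// The naive loop: wake up every 5 minutes, ask the mailbox, dedupe locally.
setInterval(async () => {
  for (const msg of await fetchUnseenMessages()) {
    // "insert or ignore" makes the primary key do the dedupe for us.
    const fresh = db
      .prepare("insert or ignore into seen (message_id) values (?)")
      .run(msg.messageId).changes === 1;
    if (fresh) await runOpenClawAgent({ email: msg });
  }
}, 5 * 60 * 1000);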

But once the workflow matters, polling starts failing in boring and expensive ways:

  • you keep checking when nothing changed
  • you burn LLM calls on already-seen messages
  • you introduce delays by design
  • you get duplicate processing when scans overlap
  • you miss messages when state gets out of sync

Another user in that same r/openclaw discussion put it even more bluntly:

"I abandoned the interval based scanning... if the scan got out of sync I had repeated responses (more wasted calls) or ignored mails. I failed to get it to be reliable."

That’s the actual problem.

Polling doesn’t just waste money. It makes the agent feel unreliable.

And unreliable is worse than expensive.

Microsoft and Google are both telling you to stop polling

This part is worth emphasizing: the anti-polling advice is not just random architecture purism.

Microsoft Graph supports change notifications so apps can react to mailbox changes instead of hammering the API on a timer.

Gmail push notifications exist for the same reason. Google says push eliminates the extra network and compute cost of polling resources to see if they changed.

If both mailbox providers are nudging you toward push, that’s a clue.

What production intake should look like

There are a few sane ways to do inbound email for agents:

  • Gmail API watch + Google Cloud Pub/Sub
  • Microsoft Graph change notifications
  • Twilio SendGrid Inbound Parse Webhook
  • an email-native service like AgentMail

The common idea is simple:

The provider tells your system that mail arrived.

Your system does not keep asking if anything changed.

Gmail: watch the inbox instead of polling it

For Gmail, the production path is Gmail API watch on the inbox, with Pub/Sub delivering notifications to your webhook.

Example request:

POST https://gmail.googleapis.com/gmail/v1/users/me/watch
Content-Type: application/json
Authorization: Bearer <access_token>

{
  "topicName": "projects/myproject/topics/mytopic",
  "labelIds": ["INBOX"],
  "labelFilterBehavior": "INCLUDE"
}

Google returns a history ID and an expiration time.

That means two things:

  1. you need to process changes based on history
  2. you need to renew the watch before it expires

This is cleaner than polling, but it is not zero-maintenance.

You still need:

  • a Pub/Sub topic
  • a subscription
  • IAM configured correctly
  • watch renewal logic

If you skip the lifecycle work, your “event-driven” setup becomes a very fancy outage.
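
To make the receiving side concrete, here is a minimal sketch of the Pub/Sub push handler, assuming the official googleapis Node client. The store helpers (getLastHistoryId, setLastHistoryId, enqueueMessage) are hypothetical placeholders for your own persistence layer, not library calls:

import express from "express";
import { google } from "googleapis";
import { store } from "./store.js"; // hypothetical persistence helpers

const app = express();
app.use(express.json());

// Assumes OAuth is already set up for the mailbox you're watching.
const auth = new google.auth.OAuth2(
  process.env.GOOGLE_CLIENT_ID,
  process.env.GOOGLE_CLIENT_SECRET
);
auth.setCredentials({ refresh_token: process.env.GOOGLE_REFRESH_TOKEN });

const gmail = google.gmail({ version: "v1", auth });

app.post("/pubsub/gmail", async (req, res) => {
  // Pub/Sub wraps the Gmail notification in a base64-encoded envelope.
  const note = JSON.parse(
    Buffer.from(req.body.message.data, "base64").toString("utf8")
  ); // { emailAddress, historyId }

  // Fetch changes since the LAST historyId you processed, not the one in
  // this notification -- notifications can arrive out of order.
  const startHistoryId = await store.getLastHistoryId(note.emailAddress);
  const { data } = await gmail.users.history.list({
    userId: "me",
    startHistoryId,
    historyTypes: ["messageAdded"],
  });
  // Pagination via data.nextPageToken omitted for brevity.

  for (const h of data.history ?? []) {
    for (const added of h.messagesAdded ?? []) {
      await store.enqueueMessage(added.message.id); // dedupe downstream
    }
  }

  await store.setLastHistoryId(note.emailAddress, note.historyId);
  res.sendStatus(204); // ack fast; Pub/Sub retries on non-2xx
});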

Microsoft 365: use Graph change notifications

For Microsoft 365, use Microsoft Graph subscriptions for Outlook messages.

Example subscription:

POST https://graph.microsoft.com/v1.0/subscriptions
Content-Type: application/json
Authorization: Bearer <access_token>

{
  "changeType": "created",
  "notificationUrl": "https://your-app.example.com/webhooks/graph",
  "resource": "/me/mailFolders('Inbox')/messages",
  "expirationDateTime": "2026-05-03T00:00:00Z",
  "clientState": "openclaw-mailbox-prod"
}

You need to handle:

  • webhook validation
  • subscription renewal
  • clientState verification
  • dedupe after notification delivery

Again: more setup than polling, much better behavior in production.
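
The validation handshake is the part that trips people up, so here is a minimal sketch of the notification endpoint. enqueue is a hypothetical job-queue helper; the rest follows Graph's documented flow, echoing validationToken as text/plain during setup and checking clientState on every delivery:

import express from "express";
import { enqueue } from "./queue.js"; // hypothetical job-queue helper

const app = express();
app.use(express.json());

const EXPECTED_CLIENT_STATE = "openclaw-mailbox-prod"; // must match the subscription

app.post("/webhooks/graph", (req, res) => {
  // Handshake: on subscription creation, Graph sends ?validationToken=...
  // and expects it echoed back as text/plain within 10 seconds.
  if (req.query.validationToken) {
    return res.status(200).type("text/plain").send(req.query.validationToken);
  }

  // Real notifications: verify clientState before trusting anything.
  for (const note of req.body.value ?? []) {
    if (note.clientState !== EXPECTED_CLIENT_STATE) continue; // drop forgeries
    enqueue(note.resource); // points at the new message; process it async
  }

  res.sendStatus(202); // ack quickly so Graph doesn't retry or disable you
});

app.listen(3000);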

SendGrid is the cleanest mental model

If you want the simplest model for inbound email to HTTP, SendGrid Inbound Parse is hard to beat.

Email arrives.

SendGrid parses it.

SendGrid POSTs the content to your endpoint.

Minimal example in Node:

import express from "express";
import multer from "multer";

const app = express();

// Inbound Parse delivers multipart/form-data, not JSON or urlencoded,
// so a multipart parser is required. multer().any() also tolerates
// attachment fields arriving as files.
const upload = multer();

app.post("/inbound-email", upload.any(), async (req, res) => {
  // Inbound Parse has no message_id field; pull Message-ID out of the
  // raw headers blob it sends instead.
  const messageId = req.body.headers
    ?.match(/^Message-ID:\s*(.+)$/im)?.[1]
    ?.trim();
  const from = req.body.from;
  const subject = req.body.subject;
  const text = req.body.text;

  // 1. dedupe check
  // 2. persist event
  // 3. enqueue background processing

  console.log({ messageId, from, subject, text });

  res.status(200).send("ok");
});

app.listen(3000, () => {
  console.log("Listening on :3000");
});

The nice part is the delivery contract.

If your endpoint returns 5XX, SendGrid retries.
If your endpoint returns 2XX, retries stop.

That is a much sharper failure model than “cron ran, maybe.”

There are constraints:

  • total message size limit
  • dedicated receiving subdomain setup
  • MX record configuration

Still better than burning cycles forever because polling was easier on day one.

n8n helps, but it does not magically fix polling

This comes up a lot: “Can’t I just use n8n?”

You can absolutely use n8n to improve the workflow.

But if you use the n8n Email Trigger over IMAP, you are still doing mailbox-checking infrastructure. It’s just nicer mailbox-checking infrastructure.

That matters.

n8n gives you useful features like:

  • mailbox selection
  • mark as read
  • attachment handling
  • custom search rules
  • reconnect controls

That is a lot better than a hand-rolled cron script.

But it does not change the trigger model.

If the source of truth is still “go ask the mailbox if anything happened,” you still have polling-shaped failure modes.

Polling vs push

Here’s the tradeoff in plain English:

  • Poll mailbox with IMAP or cron: easy setup, delayed reactions, duplicate checks, wasted model calls, awkward dedupe logic
  • n8n Email Trigger (IMAP): better operational ergonomics, but still polling underneath
  • Gmail watch / Graph notifications / SendGrid webhook: more setup, much lower idle waste, faster reactions, better delivery semantics

This is not really “simple vs advanced.”

It’s demo-friendly vs production-friendly.

What your OpenClaw email pipeline should actually do

If I were building this today, I’d split it into two layers.

Layer 1: intake

Pick one:

  • SendGrid Inbound Parse if you want email -> HTTP
  • Gmail watch + Pub/Sub if you’re on Google Workspace
  • Microsoft Graph notifications if you’re on Microsoft 365
  • n8n IMAP only for a fast proof of concept

Layer 2: idempotent processing

No matter how the event arrives, your OpenClaw job should:

  1. extract a stable message ID
  2. check a dedupe store before calling any model
  3. persist processing state
  4. acknowledge receipt quickly
  5. do the expensive work asynchronously

That last point is where people get into trouble.

Do not do all processing inside the webhook request.

Accept the event.
Store it.
Deduplicate it.
Then hand it off.

That’s how you survive retries without duplicate replies.

A minimal queue-based pattern

Here’s a practical shape for the service:

email-webhook -> postgres(inbox_events) -> job queue -> OpenClaw worker -> reply/send action

Pseudo-schema:

create table inbox_events (
  id bigserial primary key,
  provider text not null,
  external_message_id text not null,
  received_at timestamptz not null default now(),
  payload jsonb not null,
  processing_status text not null default 'pending',
  unique(provider, external_message_id)
);
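
That unique constraint is what makes the webhook side safe. A minimal intake function, assuming node-postgres (pg): insert the event, and treat a conflict as "already seen":

import pg from "pg";

const pool = new pg.Pool(); // reads connection settings from PG* env vars

// Returns the new event id, or null if this (provider, external_message_id)
// pair was already recorded -- in which case: ack the webhook, skip the queue.
async function recordInboxEvent(provider, externalMessageId, payload) {
  const { rows } = await pool.query(
    `insert into inbox_events (provider, external_message_id, payload)
     values ($1, $2, $3)
     on conflict (provider, external_message_id) do nothing
     returning id`,
    [provider, externalMessageId, payload]
  );
  return rows[0]?.id ?? null;
}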

Worker logic:

async function processInboxEvent(event) {
  // Look up the event the webhook persisted earlier.
  const existing = await db.findByProviderAndMessageId(
    event.provider,
    event.external_message_id
  );

  if (!existing) {
    throw new Error("missing event");
  }

  // Idempotency guard: a redelivered or requeued event is a no-op.
  if (existing.processing_status === "done") {
    return;
  }

  // Claim the event; bail if another worker got there first (see below).
  const claimed = await db.markProcessing(existing.id);
  if (!claimed) {
    return;
  }

  // The expensive part: exactly one model call per message.
  const result = await runOpenClawAgent({
    email: existing.payload
  });

  await db.saveResult(existing.id, result);
  await db.markDone(existing.id);
}
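
One sharp edge: if two workers grab the same event, both can pass the status check before either marks it. Making markProcessing a compare-and-set closes that window. A sketch, again assuming node-postgres:

import pg from "pg";

const pool = new pg.Pool();

// Only one worker can flip pending -> processing; everyone else backs off.
async function markProcessing(id) {
  const { rowCount } = await pool.query(
    `update inbox_events
     set processing_status = 'processing'
     where id = $1 and processing_status = 'pending'`,
    [id]
  );
  return rowCount === 1; // false: another worker already claimed it
}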

That is much less exciting than prompt tricks.

It is also the difference between a system that feels solid and one that occasionally replies twice at 3 AM.

The cost side gets ugly fast

If your agent is always on, wasted checks become real money or real usage pressure.

This is where pricing model matters.

Per-token billing makes polling bugs feel worse because every pointless re-check and duplicate pass looks like another tiny leak. You start optimizing prompts and reducing context not because it improves quality, but because you’re trying to contain operational sloppiness.

That’s backwards.

If you’re running OpenClaw agents continuously, predictable flat-rate compute is a much better fit than watching token spend all day. Standard Compute is built for exactly that: OpenAI-compatible API access for OpenClaw agents, flat monthly pricing, and dynamic routing across models like GPT-5.4, Claude Opus 4.6, and Grok 4.20.

So yes, fix the architecture first.

But also: if your agents run 24/7, stop pairing always-on automation with pricing that punishes every extra call.

When polling is still okay

Polling is not always wrong.

Use it when:

  • you have one internal mailbox
  • volume is low
  • a few minutes of delay is fine
  • you have dedupe in SQLite or Postgres
  • nobody will care if you rebuild it later

That is a proof of concept.

Just be honest that it is a proof of concept.

The mistake is pretending that a polling loop is production architecture for a customer-facing or always-on agent.

It isn’t.

The actual line between toy and production

The interesting distinction is not whether OpenClaw can read email.

Of course it can.

The distinction is:

  • how the email arrives
  • whether processing is idempotent after it arrives

A toy automation asks the mailbox every few minutes if anything happened.

A production agent gets an event, validates it, records it once, and processes it once.

That sounds boring.

It’s also the difference between “works in a demo” and “still works three months later.”

If your OpenClaw workflow still polls an inbox every 5 minutes, I wouldn’t call it broken.

I’d call it unfinished.

And once you’ve seen nearly 500 LLM calls per day wasted on mailbox checks, it’s hard to unsee.
