I just open-sourced an Apify Actor for Gmail inbox workflow analytics: apify-gmail-inbox-intel. It is not a scraper and not a bulk sender; it is an inbox analytics tool that runs on the gmail.readonly scope. This post is a design tour, not a tutorial.
If you have ever asked "which client thread did I forget to reply to?" or "what is my average reply turnaround?", this is the kind of workflow it covers.
Why an Apify Actor
I needed three things at once: serverless runtime, pay-per-result billing, and a real input schema. Apify gives me all of them without writing a backend. I get a hosted endpoint, dataset storage, a key-value store for state, and a developer audience that is already paying for actors.
The actor exposes four features through a single entrypoint:
- `thread_search`: query Gmail threads by `q`, paginate, return metadata + message counts
- `reply_metrics`: for each thread, compute reply-from-me, reply-from-others, last-reply age, SLA breach flag
- `summarizer`: optional OpenAI LLM thread summary (BYO API key)
- `unread_digest`: list unread threads in the last N hours, grouped by label
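To make the single-entrypoint shape concrete, here is a rough sketch of triggering a run with the Apify Python client. The actor ID and every input field except `feature` are illustrative assumptions, not the published schema:

# Illustrative client-side call - actor ID and fields other than "feature" are assumptions
from apify_client import ApifyClient

client = ApifyClient("YOUR_APIFY_TOKEN")

run = client.actor("foxck016077/apify-gmail-inbox-intel").call(
    run_input={
        "feature": "thread_search",
        "q": "from:client@example.com is:unread",  # hypothetical field
        "oauthToken": {                            # hypothetical field
            "refresh_token": "...",
            "client_id": "...",
            "client_secret": "...",
        },
    }
)

# Results land in the run's default dataset
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)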
Design decision 1: refresh-token-only OAuth
The hardest call early on was OAuth. Two paths:
- 3-legged OAuth on the Actor side: the Actor hosts a callback URL, exchanges the code, stores tokens.
- Refresh-token-only: the user does the OAuth dance once on their own and hands me `{refresh_token, client_id, client_secret}` as Actor input.
I picked option 2. Reasons:
- Apify Actors do not have a stable HTTPS callback URL per user. Each run is a job, not a server.
- "We never store your Gmail tokens" is a far easier privacy story to defend.
- I do not want to be the holder-of-secrets for someone else's mailbox.
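For reference, the one-time dance on the user's side can be a ten-line local script. This is a sketch, not part of the Actor; it uses google-auth-oauthlib and assumes you have downloaded an OAuth client file (credentials.json) from Google Cloud Console:

# One-time, on the user's machine - sketch using google-auth-oauthlib
from google_auth_oauthlib.flow import InstalledAppFlow

SCOPES = ["https://www.googleapis.com/auth/gmail.readonly"]

flow = InstalledAppFlow.from_client_secrets_file("credentials.json", SCOPES)
creds = flow.run_local_server(port=0)  # opens a browser for consent

# The refresh token (plus your client_id / client_secret) becomes the Actor input
print(creds.refresh_token)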
In the Actor, the flow is:
# src/gmail_client.py - sketch
import httpx

httpx_client = httpx.AsyncClient()

async def get_access_token(oauth_token: dict) -> str:
    # Exchange the long-lived refresh token for a short-lived access token
    resp = await httpx_client.post(
        "https://oauth2.googleapis.com/token",
        data={
            "grant_type": "refresh_token",
            "refresh_token": oauth_token["refresh_token"],
            "client_id": oauth_token["client_id"],
            "client_secret": oauth_token["client_secret"],
        },
    )
    resp.raise_for_status()
    return resp.json()["access_token"]
The access token lives in memory only. Job ends → process tears down → token gone. Best effort, but at least nothing persists in Apify storage with my code path.
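With an access token in hand, thread_search is plain REST against the Gmail API. A minimal sketch of the paginated threads.list call, assuming httpx and the standard Gmail endpoint (the function itself is illustrative, not the exact code in the repo):

# Sketch of paginated threads.list, as thread_search might call it
import httpx

GMAIL_THREADS_URL = "https://gmail.googleapis.com/gmail/v1/users/me/threads"

async def list_threads(access_token: str, q: str, max_pages: int = 5) -> list[dict]:
    threads, page_token = [], None
    async with httpx.AsyncClient() as client:
        for _ in range(max_pages):
            params = {"q": q, "maxResults": 100}
            if page_token:
                params["pageToken"] = page_token
            resp = await client.get(
                GMAIL_THREADS_URL,
                params=params,
                headers={"Authorization": f"Bearer {access_token}"},
            )
            resp.raise_for_status()
            data = resp.json()
            threads.extend(data.get("threads", []))
            page_token = data.get("nextPageToken")
            if not page_token:
                break
    return threads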
Design decision 2: one async router, not four actors
Tempting to split into four actors. I did not, for two reasons:
- Marketing surface area. One actor with four `feature` enum values gets one Store page, one rating, one review pile. Four actors split everything four ways.
- Shared OAuth + shared quota. The token exchange, error handling, mask helpers, and KVS quota logic are all reusable.
src/main.py is just a router:
from apify import Actor

# feature modules - import path is a sketch of the repo layout
from . import thread_search, reply_metrics, summarizer, digest

FEATURES = {
    "thread_search": thread_search.run,
    "reply_metrics": reply_metrics.run,
    "summarizer": summarizer.run,
    "unread_digest": digest.run,
}

async def main():
    actor_input = await Actor.get_input() or {}
    feature = actor_input.get("feature")
    if feature not in FEATURES:
        raise ValueError(f"Unknown feature: {feature}")
    # dispatch to the selected feature handler with the full input
    await FEATURES[feature](actor_input)
Each feature module owns its own INPUT_SCHEMA.json semantics through the same shared file: the feature enum drives validation downstream in each handler.
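For reference, the feature enum in that shared file is roughly this shape (an abbreviated sketch in Apify's input schema format; the title and the per-feature fields here are illustrative, not the real file):

{
  "title": "Gmail Inbox Intel",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "feature": {
      "title": "Feature",
      "type": "string",
      "enum": ["thread_search", "reply_metrics", "summarizer", "unread_digest"],
      "editor": "select"
    }
  },
  "required": ["feature"]
}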
Design decision 3: quota lives in Apify KVS
Free tier is 100 threads / month. That counter has to survive across runs. Apify KeyValueStore is the obvious home: no extra DB, persistent, scoped to the Actor.
# src/quota.py - sketch
from apify import Actor

FREE_LIMIT = 100  # free-tier threads per month

class QuotaExceeded(Exception):
    pass

async def check_and_increment(user_id: str, feature: str, n: int):
    # Usage counters live in the Actor's key-value store, so they survive across runs
    kvs = await Actor.open_key_value_store()
    key = f"quota/{user_id}/{month_key()}/{feature}"
    used = (await kvs.get_value(key)) or 0
    if used + n > FREE_LIMIT:
        raise QuotaExceeded(feature, used, FREE_LIMIT)
    await kvs.set_value(key, used + n)
Month roll-over is a string key by year-month: no cron, no migration, no drift. Pro tier flips a flag and skips the check entirely.
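The month_key() helper referenced above is the entire roll-over mechanism. A minimal sketch, assuming UTC year-month buckets:

# src/quota.py - month_key sketch; a new month just starts writing to a fresh key
from datetime import datetime, timezone

def month_key() -> str:
    return datetime.now(timezone.utc).strftime("%Y-%m")  # e.g. "2025-07"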
Tests
Six pytest tests, asyncio_mode = auto in pytest.ini. Coverage:
- Router rejects unknown feature
- Each of the 4 features short-circuits cleanly in `dry_run=True`
- Quota raises after the limit, allows under it
[pytest]
asyncio_mode = auto
That tiny config line is the difference between "6 tests pass" and "6 tests error: missing event loop". Learned it the hard way.
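For illustration, the unknown-feature test can stay this small; the import path and patch target are assumptions about the repo layout:

# tests/test_router.py - sketch; import path and patch target are assumptions
from unittest.mock import AsyncMock, patch

import pytest

from src.main import main

async def test_unknown_feature_rejected():
    fake_input = AsyncMock(return_value={"feature": "nope"})
    with patch("src.main.Actor.get_input", new=fake_input):
        with pytest.raises(ValueError, match="Unknown feature"):
            await main()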
Pricing model
- Free: 100 threads / month
- Pro: $19 / month (5000 threads metadata + 100 LLM summaries)
- Pay-per-result add-on: $0.50 / 1,000 thread metadata, $0.005 / summary
Apify handles billing. I handle code.
What I would do differently
- Webhook trigger: right now `unread_digest` runs on demand. A scheduled trigger + Slack/Discord delivery is the obvious next product.
- Label-level rules: `reply_metrics` is global. A per-label SLA matrix would be more useful for sales teams.
- Multi-account fan-out: one run, multiple OAuth tokens, one combined dataset.
Code
- Repo: https://github.com/foxck016077/apify-gmail-inbox-intel
- License: MIT
- Actor manifest: `.actor/actor.json` + `INPUT_SCHEMA.json` if you want to fork
If you build automation workflows alongside this kind of inbox tooling, I keep a small Gumroad with practical n8n templates (lead auto-responder, content pipeline, competitor monitor): https://foxck.gumroad.com. Not required, just adjacent.
Happy to take feedback on the OAuth-only design â was there a reason to go full 3-legged that I am missing?