DEV Community

foxck016077
An Apify Actor for Gmail inbox analytics: refresh-token-only OAuth, async router, per-feature quota

I just open-sourced an Apify Actor for Gmail inbox workflow analytics: apify-gmail-inbox-intel. It is not a scraper and not a bulk sender; it is an inbox analytics tool built on the gmail.readonly scope. This post is a design tour, not a tutorial.

If you have ever asked "which client thread did I forget to reply to?" or "what is my average reply turnaround?", this is the kind of workflow it covers.

Why an Apify Actor

I needed three things at once: serverless runtime, pay-per-result billing, and a real input schema. Apify gives me all of them without writing a backend. I get a hosted endpoint, dataset storage, a key-value store for state, and a developer audience that is already paying for actors.

The actor exposes four features through a single entrypoint:

  • thread_search — query Gmail threads by q, paginate, return metadata + message counts
  • reply_metrics — for each thread, compute reply-from-me, reply-from-others, last-reply age, SLA breach flag
  • summarizer — optional OpenAI LLM thread summary (BYO API key)
  • unread_digest — list unread threads in the last N hours, grouped by label
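Concretely, a run input might look like this. A sketch only: the field names other than feature are my assumptions about the schema, not copied from the Actor:

```json
{
  "feature": "reply_metrics",
  "oauth_token": {
    "refresh_token": "…",
    "client_id": "…",
    "client_secret": "…"
  },
  "q": "from:acme.com newer_than:30d",
  "sla_hours": 24
}
```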

Design decision 1: refresh-token-only OAuth

The hardest call early on was OAuth. Two paths:

  1. 3-legged OAuth on the Actor side — Actor hosts callback URL, exchanges code, stores tokens.
  2. Refresh-token-only — user does the OAuth dance once on their own, hands me {refresh_token, client_id, client_secret} as Actor input.

I picked option 2. Reasons:

  • Apify Actors do not have a stable HTTPS callback URL per user. Each run is a job, not a server.
  • "We never store your Gmail tokens" is a far easier privacy story to defend.
  • I do not want to be the holder-of-secrets for someone else's mailbox.

In the Actor, the flow is:

# src/gmail_client.py — sketch
import httpx

httpx_client = httpx.AsyncClient()

async def get_access_token(oauth_token: dict) -> str:
    # exchange the long-lived refresh token for a short-lived access token
    resp = await httpx_client.post(
        "https://oauth2.googleapis.com/token",
        data={
            "grant_type": "refresh_token",
            "refresh_token": oauth_token["refresh_token"],
            "client_id": oauth_token["client_id"],
            "client_secret": oauth_token["client_secret"],
        },
    )
    resp.raise_for_status()  # a revoked refresh token fails loudly here
    return resp.json()["access_token"]

The access token lives in memory only. Job end → process tears down → token gone. Best effort, but nothing in my code path ever writes it to Apify storage.

Design decision 2: one async router, not four actors

Tempting to split into four actors. I did not, for two reasons:

  • Marketing surface area. One actor with four feature enum values gets one Store page, one rating, one review pile. Four actors split everything four ways.
  • Shared OAuth + shared quota. The token exchange, error handling, mask helpers, KVS quota — all reusable.
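Those mask helpers, for instance, can be a few shared lines. The name and shape here are my assumption of what such a helper looks like, not the repo's code:

```python
def mask(secret: str, keep: int = 4) -> str:
    """Redact a credential for logs, keeping only the last few characters."""
    if len(secret) <= keep:
        return "…"
    return "…" + secret[-keep:]
```

So a client secret shows up in run logs as something like "…x9Qz" instead of the full value.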

src/main.py is just a router:

from apify import Actor

from . import digest, reply_metrics, summarizer, thread_search

FEATURES = {
    "thread_search": thread_search.run,
    "reply_metrics": reply_metrics.run,
    "summarizer": summarizer.run,
    "unread_digest": digest.run,
}

async def main() -> None:
    actor_input = await Actor.get_input() or {}
    feature = actor_input.get("feature")
    if feature not in FEATURES:
        raise ValueError(f"Unknown feature: {feature!r}")
    # dispatch to the selected feature module
    await FEATURES[feature](actor_input)

All four features share a single INPUT_SCHEMA.json; the feature enum in that schema decides which handler runs, and each handler validates its own feature-specific fields downstream.
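For reference, the feature part of such a shared schema, sketched in Apify's INPUT_SCHEMA.json format (trimmed to the enum; titles are illustrative, not the Actor's exact file):

```json
{
  "title": "Gmail inbox intel input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "feature": {
      "title": "Feature",
      "type": "string",
      "enum": ["thread_search", "reply_metrics", "summarizer", "unread_digest"],
      "editor": "select"
    }
  },
  "required": ["feature"]
}
```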

Design decision 3: quota lives in Apify KVS

Free tier is 100 threads / month. That counter has to survive across runs. Apify KeyValueStore is the obvious home — no extra DB, persistent, scoped to the Actor.

# src/quota.py — sketch
FREE_LIMIT = 100  # free tier: 100 threads / month

async def check_and_increment(user_id: str, feature: str, n: int) -> None:
    kvs = await Actor.open_key_value_store()
    # month_key() -> "YYYY-MM" string, so the counter resets with the month
    key = f"quota/{user_id}/{month_key()}/{feature}"
    used = (await kvs.get_value(key)) or 0
    if used + n > FREE_LIMIT:
        raise QuotaExceeded(feature, used, FREE_LIMIT)
    await kvs.set_value(key, used + n)

Month roll-over is a string key by year-month — no cron, no migration, no drift. Pro tier flips a flag and skips the check entirely.

Tests

Six pytest tests, asyncio_mode = auto in pytest.ini. Coverage:

  • Router rejects unknown feature
  • Each of the four features short-circuits cleanly with dry_run=True
  • Quota raises after limit, allows under
[pytest]
asyncio_mode = auto

That tiny config line is the difference between "6 tests pass" and "6 tests error: missing event loop". Learned it the hard way.

Pricing model

  • Free: 100 threads / month
  • Pro: $19 / month (5000 threads metadata + 100 LLM summaries)
  • Pay-per-result add-on: $0.50 / 1,000 thread metadata, $0.005 / summary
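As a sanity check on the add-on math (constants copied from the list above; the function is just illustrative arithmetic, not the billing code):

```python
PRO_INCLUDED_THREADS = 5000
ADDON_PER_1000_THREADS = 0.50  # dollars
ADDON_PER_SUMMARY = 0.005      # dollars

def monthly_overage(threads: int, extra_summaries: int = 0) -> float:
    # threads beyond the Pro allowance are billed per 1,000
    extra_threads = max(0, threads - PRO_INCLUDED_THREADS)
    return (extra_threads / 1000 * ADDON_PER_1000_THREADS
            + extra_summaries * ADDON_PER_SUMMARY)
```

For example, 7,000 threads plus 40 extra summaries in a month would add about $1.20 on top of the flat $19.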

Apify handles billing. I handle code.

What I would do differently

  • Webhook trigger — right now unread_digest runs on demand. A scheduled trigger + Slack/Discord delivery is the obvious next product.
  • Label-level rules — reply_metrics is global. A per-label SLA matrix would be more useful for sales teams.
  • Multi-account fan-out — one run, multiple OAuth tokens, one combined dataset.

Code

If you build automation workflows alongside this kind of inbox tooling, I keep a small Gumroad with practical n8n templates (lead auto-responder, content pipeline, competitor monitor): https://foxck.gumroad.com. Not required, just adjacent.

Happy to take feedback on the OAuth-only design — was there a reason to go full 3-legged that I am missing?
