
Ensky

Posted on • Originally published at note-bridge.co

Microsoft Graph API — What the Docs Don't Tell You About OneNote Rate Limiting

I've been building Note Bridge for a while now — a tool that migrates OneNote notebooks to Notion. Going in, I assumed the hardest part would be content conversion — correctly translating OneNote's complex HTML into Notion. Turns out, dealing with Microsoft's rate limiting ended up taking even more time.

This isn't a "here is a backoff snippet, paste it in" post. The Graph API has enough edge cases that a generic retry loop won't save you — you eventually need to model the limit system, not just react to 429s. Here's what I learned.

1. Rate Limits Across 3 × 2 Dimensions

OneNote's rate limiting isn't a simple cap. It has three dimensions — per-minute, per-hour, and concurrent requests — and each of those is enforced at two scopes: per-user and per-app [1]. Multiply them out and you get six separate limits to respect, or 429s come knocking immediately.

This means the usual exponential backoff strategy isn't enough: you might not be hitting the per-minute limit, but you're tripping the per-hour ceiling. Or a single user is fine, but the total across all users hits the app-wide cap. Each dimension needs its own accounting.

That said, real-world experience shows the actual limits seem more generous than what the docs specify. But we code to the spec anyway — if Microsoft ever decides to enforce strictly, we don't want things to break.
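
To make that concrete, here's roughly how we model the matrix. This is a minimal sketch in TypeScript; the numbers are placeholders rather than Microsoft's documented values, so substitute whatever the current service limits page specifies.

```typescript
// The 3 × 2 matrix: three dimensions, each enforced at two scopes.
type Scope = "user" | "app";

interface LimitWindow {
  max: number;      // allowed requests (or concurrent slots)
  windowMs: number; // sliding-window size; 0 marks the concurrency dimension
}

// Placeholder numbers; swap in the documented values for your workload.
const LIMITS: Record<Scope, Record<"perMinute" | "perHour" | "concurrent", LimitWindow>> = {
  user: {
    perMinute:  { max: 120,   windowMs: 60_000 },
    perHour:    { max: 1_200, windowMs: 3_600_000 },
    concurrent: { max: 5,     windowMs: 0 },
  },
  app: {
    perMinute:  { max: 600,   windowMs: 60_000 },
    perHour:    { max: 6_000, windowMs: 3_600_000 },
    concurrent: { max: 20,    windowMs: 0 },
  },
};
```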

2. No Retry-After Header

Most well-designed APIs include a Retry-After header with their 429 responses so you know exactly when it's safe to retry. I'm not sure if it's because the limit mechanism is too complex to boil down to a single number, but the OneNote Graph API doesn't send one.

Without Retry-After, simple backoff strategies don't cut it. You can increase wait time after each 429, but there's no guarantee the next attempt won't get throttled too. For a production application, this is a real problem. The only robust solution is to implement a rate limiter that follows Microsoft's spec — we built ours using Cloudflare Durable Objects, tracking usage across every dimension and implementing our own Retry-After for both frontend and backend to consume.
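
The core of that limiter isn't exotic. Below is a minimal sketch of the sliding-window piece with the Durable Object plumbing stripped out; the wait it returns is the synthetic Retry-After we hand back to callers. The class name and defaults are mine, not a library API.

```typescript
// Sliding-window counter for one (scope, dimension) pair. In our setup this
// state lives inside a Cloudflare Durable Object so every worker sees the
// same counts; here it's reduced to the scope-agnostic core.
class SlidingWindow {
  private timestamps: number[] = [];

  constructor(private max: number, private windowMs: number) {}

  // Returns 0 if the request may proceed, otherwise the milliseconds to
  // wait: our stand-in for the Retry-After header Graph never sends.
  tryAcquire(now = Date.now()): number {
    // Drop entries that have aged out of the window.
    this.timestamps = this.timestamps.filter(t => now - t < this.windowMs);
    if (this.timestamps.length < this.max) {
      this.timestamps.push(now);
      return 0;
    }
    // Window is full: it's safe again once the oldest entry ages out.
    return this.timestamps[0] + this.windowMs - now;
  }
}
```

Each windowed (scope, dimension) pair gets its own instance; the concurrency dimension is a semaphore rather than a window, so it's tracked separately. A request only goes upstream when every check returns 0.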

3. The Two Headers Almost Nobody Uses

After we built the limiter, we discovered Graph actually does ship two response headers that help — they're just easy to miss because they're not on the 429 path.

  • x-ms-throttle-limit-percentage — present on successful responses. It tells you how close you are to a limit (0.0–1.0). Above ~0.8 is a yellow zone; at 1.0 the next request will probably 429.
  • x-ms-throttle-scope — present on 429 responses. Values are User, Application, or both. This tells you which of the two scopes you tripped.

These changed the design. The first lets us trip a soft circuit breaker before getting a 429, instead of waiting to be told. The second lets us route the 429 to the right breaker — if it's an application-scope trip, slowing one user down doesn't help; we have to slow every user down. If it's user-scope, the other tenants are fine to keep going.

If you only react to 429 status codes, you're flying half-blind on a spec-compliant rate limiter.
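
Wired together, the routing looks roughly like this. It's a sketch, not our production code: noteUtilization, tripUserBreaker, and tripAppBreaker are hypothetical hooks for whatever breaker mechanism you use.

```typescript
// Hypothetical hooks; wire these to your own breaker implementation.
declare function noteUtilization(userId: string, pct: number): void;
declare function tripUserBreaker(userId: string): void;
declare function tripAppBreaker(): void;

function handleGraphResponse(res: Response, userId: string) {
  // Successful responses carry the soft signal: start slowing down
  // proactively once utilization enters the yellow zone.
  const pct = res.headers.get("x-ms-throttle-limit-percentage");
  if (pct !== null && parseFloat(pct) >= 0.8) {
    noteUtilization(userId, parseFloat(pct));
  }

  // 429s carry the hard signal: route the trip to the scope that fired.
  if (res.status === 429) {
    const scope = res.headers.get("x-ms-throttle-scope") ?? "";
    if (scope.includes("Application")) tripAppBreaker(); // slow everyone down
    if (scope.includes("User")) tripUserBreaker(userId); // only this user waits
    if (scope === "") tripAppBreaker(); // header missing: be conservative
  }
}
```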

4. Batch API Doesn't Really Help

Microsoft Graph offers a batch mechanism to bundle multiple requests into one HTTP call. Intuitively this should count as a single request, but it doesn't. Each inner request in the batch is counted individually against rate limits [2].

This makes $batch largely pointless for our use case — at most it saves a bit of round-trip time.

There's also a subtler trap. A $batch POST can succeed with HTTP 200 at the outer level, while individual sub-responses inside it return 429, 502, or 503. We watched a production migration silently drop two sections because our retry loop only retried the outer batch — sub-response 5xx errors fell on the floor. If you use $batch, the retry strategy has to operate on each sub-response, not the envelope.
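
The shape of the fix is simple once you know to look for it. Here's a minimal sketch, assuming the standard JSON batching envelope (a responses array of { id, status, body } objects), that pulls out just the sub-requests worth retrying:

```typescript
interface BatchRequest  { id: string; method: string; url: string; body?: unknown }
interface BatchResponse { id: string; status: number; body?: unknown }

// Transient statuses worth retrying at the sub-request level.
const RETRYABLE = new Set([429, 502, 503, 504]);

// The outer $batch call can be HTTP 200 while sub-responses fail, so the
// retry decision has to be made per sub-response, never on the envelope.
function failedSubRequests(
  sent: BatchRequest[],
  envelope: { responses: BatchResponse[] }
): BatchRequest[] {
  const failedIds = new Set(
    envelope.responses.filter(r => RETRYABLE.has(r.status)).map(r => r.id)
  );
  // Re-enqueue exactly these; the rest already succeeded.
  return sent.filter(req => failedIds.has(req.id));
}
```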

5. The Death Spiral

The most painful incident we ran into wasn't 429s — it was what happened after the 429s. A user kicked off a 1,300-page migration. The first 200 pages went through fine. Then a 429. Our consumer caught it, asked Cloudflare Queues for a 30-second retry. So far so reasonable.

But because all 1,100 remaining messages were already enqueued, every one of them woke up around the same time after that 30-second wait, slammed back into the rate limit, and got 429'd again. Each new wave of 429s extended the throttle window further. The graph showed a beautiful self-reinforcing loop: rate limit → retries → rate limit → more retries. No page made forward progress for half an hour.

Two fixes turned out to matter:

  1. A real circuit breaker at the limiter, not at each call site. When the upstream returns sustained 429s, the limiter trips and every caller gets 429'd locally for a cooldown window. This collapses N independent retry loops into one shared wait, instead of N callers each doing 30 retries × 5 minutes (see the sketch after this list).
  2. Foreground vs. background separation. Note Bridge has two workloads hitting Graph: interactive notebook scans (a user is waiting) and background migration jobs (no one is staring at the screen). If a 1,300-page migration drains the rate budget, the user trying to scan their notebook waits forever. Our fix is two rate limiter "lanes" with the background lane capped at ~50% of the budget. Foreground always has headroom.
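
Here's a minimal sketch of that shared breaker. The names and thresholds are mine; in our setup this state lives next to the limiter in a Durable Object, so every queue consumer sees the same trip.

```typescript
// One breaker shared by all callers. Sustained upstream 429s trip it, and
// until the cooldown passes every caller is rejected locally, so a
// thousand queued messages share one wait instead of forming a 429 storm.
class SharedBreaker {
  private consecutive429s = 0;
  private openUntil = 0; // epoch ms; 0 means the breaker is closed

  constructor(private threshold = 3, private cooldownMs = 60_000) {}

  // Check before calling Graph: non-zero means wait locally, don't send.
  retryAfterMs(now = Date.now()): number {
    return Math.max(0, this.openUntil - now);
  }

  record429(now = Date.now()) {
    this.consecutive429s++;
    if (this.consecutive429s >= this.threshold) {
      this.openUntil = now + this.cooldownMs; // trip: one shared cooldown
    }
  }

  recordSuccess() {
    this.consecutive429s = 0;
    this.openUntil = 0;
  }
}
```

The lane separation from fix 2 sits in front of this: the background lane checks an extra budget counter, capped at roughly half of each window, before it's allowed to consume from the shared budget at all.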

6. The Subtle retryAfterMs Bug

This one is worth calling out because it took us a while to see. When the limiter checks two layers (per-user and per-app) and both are saturated, what retryAfterMs should it return to the caller?

We had Math.min(userRetry, appRetry) — return the shorter wait, sounds friendly. It was wrong. If user-scope says "wait 2 seconds" but app-scope says "wait 30 seconds", retrying after 2 seconds gets you another instant 429. The right answer is Math.max(userRetry, appRetry) — the request is only safe to send when both layers have capacity. A one-character fix that ended a category of phantom retries.
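
In code the whole fix is one function, with illustrative names:

```typescript
// A request needs capacity in *every* layer, so the combined wait is the
// longest individual wait, not the shortest.
function combinedRetryAfterMs(userRetryMs: number, appRetryMs: number): number {
  return Math.max(userRetryMs, appRetryMs); // was Math.min: the one-character bug
}
```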

Bottom Line

Most rate-limiting tutorials end at "retry with exponential backoff and jitter." For Microsoft Graph that's not enough. You need to:

  • Model the 3 × 2 limit matrix instead of guessing.
  • Read x-ms-throttle-limit-percentage and trip a soft breaker before you 429.
  • Route 429s using x-ms-throttle-scope so user trips don't punish the whole app (and vice versa).
  • Retry inside $batch sub-responses, not just on the envelope.
  • Use a shared circuit breaker, or your retry storms will outlast the actual throttle window.
  • Separate foreground from background so interactive UX doesn't starve.

Microsoft Graph's lack of Retry-After is the headline annoyance, but the deeper lesson is that rate limiting is a system, not a header. Build it like one.

Note Bridge migrates OneNote notebooks to Notion.
