<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ensky</title>
    <description>The latest articles on DEV Community by Ensky (@ensky).</description>
    <link>https://dev.to/ensky</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F425965%2Ffb4560d9-4acd-4b6a-aeaf-9694501b3f97.jpg</url>
      <title>DEV Community: Ensky</title>
      <link>https://dev.to/ensky</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ensky"/>
    <language>en</language>
    <item>
      <title>Microsoft Graph API — What the Docs Don't Tell You About OneNote Rate Limiting</title>
      <dc:creator>Ensky</dc:creator>
      <pubDate>Tue, 12 May 2026 13:00:00 +0000</pubDate>
      <link>https://dev.to/ensky/microsoft-graph-api-what-the-docs-dont-tell-you-about-onenote-rate-limiting-2lgl</link>
      <guid>https://dev.to/ensky/microsoft-graph-api-what-the-docs-dont-tell-you-about-onenote-rate-limiting-2lgl</guid>
      <description>&lt;p&gt;I've been building &lt;a href="https://note-bridge.co/" rel="noopener noreferrer"&gt;Note Bridge&lt;/a&gt; for a while now — a tool that migrates OneNote notebooks to Notion. Going in, I assumed the hardest part would be content conversion — correctly translating OneNote's complex HTML into Notion. Turns out, dealing with Microsoft's rate limiting ended up taking even more time.&lt;/p&gt;

&lt;p&gt;This isn't a "here is a backoff snippet, paste it in" post. The Graph API has enough edge cases that a generic retry loop won't save you — you eventually need to model the limit system, not just react to 429s. Here's what I learned.&lt;/p&gt;

&lt;h2&gt;1. Rate Limits Across 3 × 2 Dimensions&lt;/h2&gt;

&lt;p&gt;OneNote's rate limiting isn't a simple cap. It has three dimensions — per-minute, per-hour, and concurrent requests — and each of those is enforced at two scopes: per-user and per-app [1]. Multiply them out and you get six separate limits to respect, or 429s come knocking immediately.&lt;/p&gt;

&lt;p&gt;This means the common exponential backoff strategy isn't enough: you might be clear of the per-minute limit but still tripping the per-hour ceiling. Or a single user is fine, but the total across all users hits the app-wide cap. Each dimension needs its own accounting.&lt;/p&gt;

&lt;p&gt;That said, in practice the enforced limits seem more generous than the documented ones. But we code to the spec anyway — if Microsoft ever starts enforcing strictly, we don't want things to break.&lt;/p&gt;
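&lt;p&gt;A minimal sketch of what per-dimension accounting can look like: one sliding-window counter per (scope, window) pair, with a request admitted only when every dimension has room. The limit values here are placeholders, not Microsoft's documented numbers.&lt;/p&gt;

```typescript
// One sliding-window counter per dimension; a request is only recorded
// when every dimension has capacity. Limit values are placeholders.
type RateWindow = { limit: number; windowMs: number; hits: number[] };

function makeWindow(limit: number, windowMs: number): RateWindow {
  return { limit, windowMs, hits: [] };
}

function tryAcquire(windows: RateWindow[], now: number): boolean {
  for (const w of windows) {
    // Drop hits that have aged out of this dimension's window.
    w.hits = w.hits.filter((t) => now - t < w.windowMs);
    if (w.hits.length >= w.limit) return false; // this dimension is full
  }
  for (const w of windows) w.hits.push(now); // record against all dimensions
  return true;
}

// The per-user lane would hold per-minute and per-hour counters;
// a second pair of counters covers the app-wide scope.
const perUser = [makeWindow(120, 60_000), makeWindow(800, 3_600_000)];
```

The same `tryAcquire` shape extends to a concurrency counter by releasing the slot when the request completes instead of letting it age out.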

&lt;h2&gt;2. No Retry-After Header&lt;/h2&gt;

&lt;p&gt;Most well-designed APIs include a &lt;code&gt;Retry-After&lt;/code&gt; header with their 429 responses so you know exactly when it's safe to retry. Perhaps because the limit system is this complex, the OneNote endpoints of the Graph API don't send one.&lt;/p&gt;

&lt;p&gt;Without &lt;code&gt;Retry-After&lt;/code&gt;, simple backoff strategies don't cut it. You can increase wait time after each 429, but there's no guarantee the next attempt won't get throttled too. For a production application, this is a real problem. The only robust solution is to implement a rate limiter that follows Microsoft's spec — we built ours using Cloudflare Durable Objects, tracking usage across every dimension and implementing our own &lt;code&gt;Retry-After&lt;/code&gt; for both frontend and backend to consume.&lt;/p&gt;
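&lt;p&gt;Here's one way to synthesize a retry-after value from a sliding-window counter. This is a sketch of our own limiter's logic, not anything Graph provides; the window shape is a simplifying assumption.&lt;/p&gt;

```typescript
// Hypothetical sliding-window shape; the limiter records request timestamps.
type RateWindow = { limit: number; windowMs: number; hits: number[] };

// If the window is full, the earliest safe retry is when the oldest
// live hit ages out; otherwise there is capacity right now.
function retryAfterMs(w: RateWindow, now: number): number {
  const live = w.hits.filter((t) => now - t < w.windowMs).sort((a, b) => a - b);
  if (live.length < w.limit) return 0;
  return live[0] + w.windowMs - now;
}
```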

&lt;h2&gt;3. The Two Headers Almost Nobody Uses&lt;/h2&gt;

&lt;p&gt;After we built the limiter, we discovered Graph actually does ship two response headers that help — they're just easy to miss because they're not on the 429 path.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;x-ms-throttle-limit-percentage&lt;/code&gt;&lt;/strong&gt; — present on &lt;em&gt;successful&lt;/em&gt; responses. It tells you how close you are to a limit (0.0–1.0). Above ~0.8 is a yellow zone; at 1.0 the next request will probably 429.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;x-ms-throttle-scope&lt;/code&gt;&lt;/strong&gt; — present on 429 responses. Values are &lt;code&gt;User&lt;/code&gt;, &lt;code&gt;Application&lt;/code&gt;, or both. This tells you &lt;em&gt;which&lt;/em&gt; of the two scopes you tripped.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These changed the design. The first lets us trip a soft circuit breaker &lt;em&gt;before&lt;/em&gt; getting a 429, instead of waiting to be told. The second lets us route the 429 to the right breaker — if it's an application-scope trip, slowing one user down doesn't help; we have to slow every user down. If it's user-scope, the other tenants are fine to keep going.&lt;/p&gt;

&lt;p&gt;If you only react to 429 status codes, you're flying half-blind, even with a spec-compliant rate limiter.&lt;/p&gt;
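&lt;p&gt;The routing logic itself is small. The header names below come from Microsoft's docs; the breaker objects, the 0.8 threshold, and the cooldowns are our own illustrative choices.&lt;/p&gt;

```typescript
// The two headers drive two different breakers. Header names are real;
// Breaker, the 0.8 threshold, and the cooldown values are illustrative.
type Breaker = { trip: (cooldownMs: number) => void };
type HeaderMap = { [name: string]: string };

function routeThrottleSignal(
  status: number,
  headers: HeaderMap,
  userBreaker: Breaker,
  appBreaker: Breaker,
): void {
  // Soft signal on success: back off before a 429 ever arrives.
  const pct = parseFloat(headers["x-ms-throttle-limit-percentage"] ?? "0");
  if (status < 400 && pct >= 0.8) userBreaker.trip(5_000);

  // Hard signal on 429: slow down only the scope that actually tripped.
  if (status === 429) {
    const scope = headers["x-ms-throttle-scope"] ?? "User";
    if (scope.includes("Application")) appBreaker.trip(30_000);
    if (scope.includes("User")) userBreaker.trip(30_000);
  }
}
```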

&lt;h2&gt;4. Batch API Doesn't Really Help&lt;/h2&gt;

&lt;p&gt;Microsoft Graph offers a batch mechanism to bundle multiple requests into one HTTP call. Intuitively this should count as a single request, but it doesn't. Each inner request in the batch is counted individually against rate limits [2].&lt;/p&gt;

&lt;p&gt;This makes &lt;code&gt;$batch&lt;/code&gt; largely pointless for our use case — at most it saves a bit of round-trip time.&lt;/p&gt;

&lt;p&gt;There's also a subtler trap. A &lt;code&gt;$batch&lt;/code&gt; POST can succeed with HTTP 200 at the outer level, while individual sub-responses inside it return 429, 502, or 503. We watched a production migration silently drop two sections because our retry loop only retried the outer batch — sub-response 5xx errors fell on the floor. If you use &lt;code&gt;$batch&lt;/code&gt;, the retry strategy has to operate on each sub-response, not the envelope.&lt;/p&gt;
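&lt;p&gt;A sketch of the sub-response check, using the response shapes from Graph's JSON batching format. The retry queue itself is whatever your system already uses; only the extraction is shown here.&lt;/p&gt;

```typescript
// Even when the outer $batch POST returns 200, individual sub-responses
// can carry 429/502/503 and must be retried on their own.
type SubResponse = { id: string; status: number };
type BatchEnvelope = { responses: SubResponse[] };

const RETRYABLE = new Set([429, 502, 503]);

// Returns the ids of sub-requests that need to be re-enqueued.
function failedSubRequests(envelope: BatchEnvelope): string[] {
  return envelope.responses
    .filter((r) => RETRYABLE.has(r.status))
    .map((r) => r.id);
}
```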

&lt;h2&gt;5. The Death Spiral&lt;/h2&gt;

&lt;p&gt;The most painful incident we ran into wasn't 429s — it was what happened &lt;em&gt;after&lt;/em&gt; the 429s. A user kicked off a 1,300-page migration. The first 200 pages went through fine. Then a 429. Our consumer caught it, asked Cloudflare Queues for a 30-second retry. So far so reasonable.&lt;/p&gt;

&lt;p&gt;But because all 1,100 remaining messages were already enqueued, every one of them woke up at roughly the same time after that 30-second wait, slammed back into the rate limit, and got 429'd again. Each new wave of 429s extended the throttle window further. Our monitoring graphs showed a beautiful self-reinforcing loop: rate limit → retries → rate limit → more retries. No page made forward progress for half an hour.&lt;/p&gt;

&lt;p&gt;Two fixes turned out to matter:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A real circuit breaker at the limiter, not at each call site.&lt;/strong&gt; When the upstream returns sustained 429s, the limiter trips and &lt;em&gt;every&lt;/em&gt; caller gets 429'd locally for a cooldown window. This collapses N independent retry loops into one shared wait, instead of N callers each doing 30 retries × 5 minutes.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Foreground vs. background separation.&lt;/strong&gt; Note Bridge has two workloads hitting Graph: interactive notebook scans (a user is waiting) and background migration jobs (no one is staring at the screen). If a 1,300-page migration drains the rate budget, the user trying to scan their notebook waits forever. Our fix is two rate limiter "lanes" with the background lane capped at ~50% of the budget. Foreground always has headroom.&lt;/li&gt;
&lt;/ol&gt;
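&lt;p&gt;Both fixes are simpler in code than they sound. A hedged sketch: one breaker instance shared by every caller, plus a lane check that caps background work at half the budget. The 50% figure matches our setup; everything else is illustrative.&lt;/p&gt;

```typescript
// One breaker instance shared by all callers: a single trip pauses everyone,
// collapsing N independent retry loops into one shared wait.
class SharedBreaker {
  private openUntil = 0;

  trip(now: number, cooldownMs: number): void {
    this.openUntil = Math.max(this.openUntil, now + cooldownMs);
  }

  allowed(now: number): boolean {
    return now >= this.openUntil;
  }
}

// Two lanes over one budget: background work may use at most half of it,
// so interactive scans always have headroom.
function laneAllows(lane: "fg" | "bg", used: number, budget: number): boolean {
  const cap = lane === "bg" ? budget * 0.5 : budget;
  return used < cap;
}
```

In a Durable Objects setup, the breaker state naturally lives in the same object that does the rate accounting, so every consumer sees one consistent view of it.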

&lt;h2&gt;6. The Subtle &lt;code&gt;retryAfterMs&lt;/code&gt; Bug&lt;/h2&gt;

&lt;p&gt;This one is worth calling out because it took us a while to spot. When the limiter checks two layers (per-user and per-app) and both are saturated, what &lt;code&gt;retryAfterMs&lt;/code&gt; should it return to the caller?&lt;/p&gt;

&lt;p&gt;We had &lt;code&gt;Math.min(userRetry, appRetry)&lt;/code&gt; — return the shorter wait, sounds friendly. It was wrong. If user-scope says "wait 2 seconds" but app-scope says "wait 30 seconds", retrying after 2 seconds gets you another instant 429. The right answer is &lt;code&gt;Math.max(userRetry, appRetry)&lt;/code&gt; — the request is only safe to send when &lt;em&gt;both&lt;/em&gt; layers have capacity. A one-character fix that ended a category of phantom retries.&lt;/p&gt;
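&lt;p&gt;Isolated, the fix looks trivial, which is exactly why it hid for so long:&lt;/p&gt;

```typescript
// A request is only safe when BOTH scopes have capacity again,
// so the combined wait is the maximum of the two, not the minimum.
function combinedRetryAfterMs(userRetryMs: number, appRetryMs: number): number {
  return Math.max(userRetryMs, appRetryMs);
}
```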

&lt;h2&gt;Bottom Line&lt;/h2&gt;

&lt;p&gt;Most rate-limiting tutorials end at "retry with exponential backoff and jitter." For Microsoft Graph that's not enough. You need to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model the 3 × 2 limit matrix instead of guessing.&lt;/li&gt;
&lt;li&gt;Read &lt;code&gt;x-ms-throttle-limit-percentage&lt;/code&gt; and trip a soft breaker before you 429.&lt;/li&gt;
&lt;li&gt;Route 429s using &lt;code&gt;x-ms-throttle-scope&lt;/code&gt; so user trips don't punish the whole app (and vice versa).&lt;/li&gt;
&lt;li&gt;Retry inside &lt;code&gt;$batch&lt;/code&gt; sub-responses, not just on the envelope.&lt;/li&gt;
&lt;li&gt;Use a &lt;em&gt;shared&lt;/em&gt; circuit breaker, or your retry storms will outlast the actual throttle window.&lt;/li&gt;
&lt;li&gt;Separate foreground from background so interactive UX doesn't starve.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Microsoft Graph's lack of &lt;code&gt;Retry-After&lt;/code&gt; is the headline annoyance, but the deeper lesson is that rate limiting is a system, not a header. Build it like one.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;[1] &lt;a href="https://learn.microsoft.com/en-us/graph/throttling-limits#onenote-service-limits" rel="noopener noreferrer"&gt;Microsoft Graph throttling limits (OneNote service limits)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;[2] &lt;a href="https://learn.microsoft.com/en-us/graph/json-batching?tabs=http#batch-size-limitations" rel="noopener noreferrer"&gt;Microsoft Graph JSON batching (batch size limitations)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://note-bridge.co/" rel="noopener noreferrer"&gt;&lt;em&gt;Note Bridge&lt;/em&gt;&lt;/a&gt; &lt;em&gt;migrates OneNote notebooks to Notion.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>microsoftgraph</category>
      <category>webdev</category>
      <category>cloudflarechallenge</category>
      <category>api</category>
    </item>
  </channel>
</rss>
