Posted on Apr 27 • Originally published at beefed.ai

Smart Throttling: ISP & Carrier-Aware Rate Limiting

Mapping ISP & Carrier Policies to Real‑World Limits
Designing a Distributed, ISP‑Aware Throttling Service
Algorithms That Actually Work: token bucket, leaky bucket, and Adaptive Backoff
Handling Warmup and Peaks: IP Warmup, Peak Events, and Smoke‑Testing
Practical Playbook: Checklists, Metrics, and Runbook

ISPs and carriers will throttle before your monitoring notices a problem; the infrastructure that looks fast on paper can become a reputation sink in production. The right approach treats throughput optimization and reputation protection as the same engineering problem: maximize sends within the limits those networks will accept without penalizing your IPs, domains, or 10DLC campaigns.

The problem you see in production is consistent: large sends succeed at first, then slow, then fail or get rejected, and you lose reputation. Bounce and complaint rates spike, shared‑IP neighbors suffer, IPs get blacklisted, or carriers downgrade your 10DLC campaign. Symptoms include persistent 421/4xx SMTP deferrals, abrupt 5xx rejections, a surge in SMS ACK failures and carrier-reported throttles, or steady growth in complaints visible in Postmaster tooling. These symptoms are rarely fixed by "send less"; you need a control plane that maps ISP/carrier rules to live send behavior.

Mapping ISP & Carrier Policies to Real‑World Limits

What networks actually enforce varies by destination type:

Email ISPs (Gmail, Microsoft, Yahoo, etc.) enforce per‑sender and per‑IP reputational checks, dynamic temporary rate-limiting, and content-based filtering. Microsoft’s Exchange Online documentation gives concrete submission limits for SMTP AUTH client submission: up to three concurrent connections, 30 messages per minute, and a recipient rate limit of 10,000 recipients per day. Exceeding any of them produces throttling responses from the service.
Mobile carriers (A2P SMS via 10DLC, toll‑free, or short codes) attach throughput to registration, branding and campaign vetting. Throughput is assigned per brand and per campaign and varies by carrier—registered campaigns get materially higher throughput than unregistered traffic. Registration and trust score determine per‑carrier quotas and penalties for overflow.
Aggregate behavior: carriers and ISPs often prefer queuing/deferring over outright dropping; repeated policy violations lead to permanent drops or blacklisting. M3AAWG and industry best‑practice documents codify operational expectations for senders.

Important: The fastest route to higher throughput is compliance and staged growth. Built-in throttles that respect ISP/carrier policies preserve lifetime capacity; ad‑hoc high-volume blasts burn reputation and reduce future throughput.

Concrete implications for your system:

Treat per‑recipient destination (ISP / carrier / carrier_id) as a first‑class routing key. Maintain counters and policies keyed by that identifier.
Expect both hard limits (explicit 5xx rejections for exceeding a quota) and soft limits (rising 4xx/deferrals) that require different handling.
Record every MX/TCP/HTTP/Provider response and map failures to actions (reduce, pause, re-route). Use FBLs / provider webhooks to feed back into the policy engine.
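The response-to-action mapping above can be sketched as a small classifier. This is a minimal illustration, not a complete policy: the `Action` names and the blanket treatment of all 4xx as "reduce" and all 5xx as "pause" are assumptions, and a real system would also inspect enhanced status codes and response text before pausing a destination.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CONTINUE = "continue"  # accepted: keep sending at the current rate
    REDUCE = "reduce"      # soft limit: slow down this destination
    PAUSE = "pause"        # hard limit: stop and investigate
    REROUTE = "reroute"    # no reply at all: try another path or pool


def classify_smtp(code: Optional[int]) -> Action:
    """Map an SMTP reply code to a throttle action (illustrative mapping)."""
    if code is None:           # connection failed before any reply
        return Action.REROUTE
    if 400 <= code < 500:      # deferral, treated as a soft limit
        return Action.REDUCE
    if 500 <= code < 600:      # rejection, treated as a hard limit
        return Action.PAUSE
    return Action.CONTINUE     # 2xx/3xx: nothing to do
```

Feeding each classified response into the per-destination counters keeps the policy engine's view of a destination current without parsing logs after the fact.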

Designing a Distributed, ISP‑Aware Throttling Service

Build the throttle as a service separate from your templating and queuing layers. The core responsibilities of the service are: maintain per‑destination rate state, enforce burst & sustained limits, react to feedback from providers, and surface metrics.

Architecture (minimal, resilient):

Ingress API -> Router (annotates carrier_id/isp/region) -> Throttle service -> Per‑destination queues (priority + retry budgets) -> Workers -> MTA/CPaaS (Postfix, SES, Twilio).
A central configuration store (throttle_policies) drives per‑destination rate and burst values, editable during incidents. A fast state store (Redis, RocksDB, or local in‑memory + periodic persistence) stores the live bucket_state.

Data model (example):

throttle_policy:{destination_type}:{id} = { rate (msg/s), burst (tokens), window (s), priority, source }
bucket_state:{destination_key} = { tokens, last_refill_ts }
reputation_metrics:{ip|domain|brand} = rolling counters (1m/5m/15m) for accepted, deferred, bounce, complaint, 4xx, 5xx.
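One way to pin the data model down in code is a pair of dataclasses plus key builders. The field names follow the keys listed above; the class names, the helper functions, and the example values are assumptions made for illustration.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ThrottlePolicy:
    rate: float     # sustained messages per second
    burst: int      # bucket capacity in tokens
    window: int     # evaluation window in seconds
    priority: int
    source: str     # who set the policy, e.g. "default" or "incident"


@dataclass
class BucketState:
    tokens: float
    last_refill_ts: float


def policy_key(destination_type: str, dest_id: str) -> str:
    # Mirrors throttle_policy:{destination_type}:{id}
    return f"throttle_policy:{destination_type}:{dest_id}"


def bucket_key(destination_key: str) -> str:
    # Mirrors bucket_state:{destination_key}
    return f"bucket_state:{destination_key}"


# Example: a policy for one carrier, serialized for the config store
policy = ThrottlePolicy(rate=40, burst=120, window=60, priority=1, source="default")
record = {policy_key("carrier", "ATT"): asdict(policy)}
```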

Key engineering patterns:

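Use atomic ops (a Redis Lua script or a strongly consistent DB transaction) to check-and-decrement tokens. This prevents race conditions when many workers drain the same bucket. CRDT counters are an option only where approximate limits are acceptable, because they converge eventually rather than enforcing a strict check-and-decrement. Store the tokens as a float and refill on access. token_rate and bucket_size are policy parameters.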
Keep a per‑destination priority queue and admission control at queue head: if acquire() fails, requeue with exponential retry + jitter (see algorithm below). Track a retry budget to avoid amplification (global retry budget per campaign).
Separate traffic shaping from business prioritization: route high‑value transactional messages (OTP, auth) into a high‑priority queue and reserve a portion of throughput for them; treat bulk promotional sends as best-effort. Implement quotas per message_class to avoid pollution of transactional capacity.
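The retry budget mentioned above can be sketched as a counter that allows retries only in proportion to first-attempt sends. The class name, the 10% ratio, and the floor of 10 retries are hypothetical defaults, not values from any provider.

```python
class RetryBudget:
    """Per-campaign retry budget: retries may not exceed a share of sends."""

    def __init__(self, ratio: float = 0.1, min_retries: int = 10):
        self.ratio = ratio              # retries allowed per first-attempt send
        self.min_retries = min_retries  # floor so small campaigns can still retry
        self.sends = 0
        self.retries = 0

    def record_send(self) -> None:
        self.sends += 1

    def try_retry(self) -> bool:
        """Consume one retry if the budget allows it."""
        allowed = self.min_retries + self.ratio * self.sends
        if self.retries < allowed:
            self.retries += 1
            return True
        return False  # budget exhausted: park or drop instead of retrying
```

When `try_retry()` returns False, the message goes to a dead-letter or deferred queue instead of back to the head of the destination queue, which is what stops a throttled destination from amplifying its own load.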

Example: atomic token check (conceptual)

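# Sketch only: this read-modify-write is NOT atomic across workers.
import time
import redis

rdb = redis.Redis(decode_responses=True)

def try_acquire(destination_key, policy, tokens_needed=1):
    """policy needs .rate (tokens/s) and .burst (bucket capacity)."""
    key = f"bucket_state:{destination_key}"
    state = rdb.hgetall(key)
    now = time.time()  # wall clock: monotonic clocks are not comparable across hosts
    # a missing bucket starts full
    tokens = float(state.get("tokens", policy.burst))
    last_refill_ts = float(state.get("last_refill_ts", now))
    # refill tokens for the elapsed time, capped at the burst size
    elapsed = max(0.0, now - last_refill_ts)
    tokens = min(tokens + elapsed * policy.rate, policy.burst)
    if tokens >= tokens_needed:
        tokens -= tokens_needed
        # write state (must happen atomically with the read in production)
        rdb.hset(key, mapping={"tokens": tokens, "last_refill_ts": now})
        return True
    # don't mutate state
    return False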

Use a single EVAL script in Redis for true atomicity in production.
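A minimal sketch of that single-script approach follows. It assumes `client` is a redis-py `redis.Redis` instance and uses its `register_script` helper; the script text, the `make_acquirer` wrapper, and the choice to pass the timestamp in from the caller are illustrative assumptions. The variadic `HSET` needs Redis 4.0 or later.

```python
import time

# KEYS[1] = bucket hash; ARGV = rate, burst, now, cost
TOKEN_BUCKET_LUA = """
local state = redis.call('HMGET', KEYS[1], 'tokens', 'last_refill_ts')
local rate  = tonumber(ARGV[1])
local burst = tonumber(ARGV[2])
local now   = tonumber(ARGV[3])
local cost  = tonumber(ARGV[4])
local tokens = tonumber(state[1]) or burst   -- missing bucket starts full
local last   = tonumber(state[2]) or now
tokens = math.min(burst, tokens + math.max(0, now - last) * rate)
if tokens < cost then
  return 0                                   -- denied: leave state untouched
end
redis.call('HSET', KEYS[1], 'tokens', tostring(tokens - cost),
           'last_refill_ts', tostring(now))
return 1
"""


def make_acquirer(client):
    """Return a try_acquire function bound to a Redis client (assumed redis-py)."""
    script = client.register_script(TOKEN_BUCKET_LUA)

    def try_acquire(destination_key, rate, burst, cost=1, now=None):
        now = time.time() if now is None else now
        result = script(
            keys=[f"bucket_state:{destination_key}"],
            args=[rate, burst, now, cost],
        )
        return result == 1

    return try_acquire
```

Because the whole refill, check, and decrement runs inside one script, two workers can never both spend the last token. Passing `now` from the client keeps the script deterministic, at the cost of trusting worker clocks to be roughly in sync.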

Operational choices that matter:

Persist policy changes and gracefully reduce rate on sustained failures rather than killing the stream. A pragmatic default: reduce rate by a multiplicative factor when a sustained > X% 4xx/5xx window is observed, and restore via slow positive increments when back to healthy. Store a cooldown_until timestamp to prevent flip‑flopping.
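That multiplicative-decrease, additive-increase behavior can be sketched as a small controller. The 5% error threshold, 0.5 decrease factor, 5% restore step, and 30-minute cooldown are placeholder values standing in for the "X%" above, not recommendations.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveRate:
    base_rate: float                 # healthy target rate (msg/s)
    rate: float                      # current effective rate (msg/s)
    cooldown_until: float = 0.0      # no increases before this timestamp
    error_threshold: float = 0.05    # sustained 4xx/5xx share that triggers a cut
    decrease_factor: float = 0.5     # multiplicative cut
    increase_step: float = 0.05      # additive restore, as a fraction of base_rate
    cooldown_s: float = 1800.0
    min_rate: float = 1.0

    def update(self, error_rate: float, now: float) -> float:
        """Call once per evaluation window with that window's error rate."""
        if error_rate > self.error_threshold:
            # Unhealthy window: cut the rate and restart the cooldown
            self.rate = max(self.min_rate, self.rate * self.decrease_factor)
            self.cooldown_until = now + self.cooldown_s
        elif now >= self.cooldown_until:
            # Healthy and past cooldown: restore in small additive steps
            step = self.increase_step * self.base_rate
            self.rate = min(self.base_rate, self.rate + step)
        return self.rate
```

Cuts apply immediately while increases wait for the cooldown, so a destination that briefly looks healthy cannot bounce the rate straight back up.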

Algorithms That Actually Work: token bucket, leaky bucket, and Adaptive Backoff

Pick the right tool for the right layer.

Token bucket — metering with burst allowance. Add r tokens per second, bucket size b, remove tokens to send. Good for preserving an average rate and allowing bursts up to b. Use for per‑ISP/campaign throttles where you want controlled burstiness.
Leaky bucket — shaping to a steady rate. Implemented as a queue serviced at a fixed rate. Use when you must smooth traffic to a fixed pattern (e.g., to match a carrier that forbids bursts). Leaky bucket as a queue is equivalent to a strict shaper and is useful at egress.
Adaptive Backoff — react to network/provider signals. On 429, 4xx soft errors or elevated deferrals, back off with exponential backoff + jitter to prevent retry storms and thundering-herd effects. AWS’s guidance on backoff + jitter is the operational standard for decorrelated retries.
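The leaky bucket as a queue can be sketched as a bounded queue released at fixed intervals. The class and method names are illustrative, and the injectable `clock` exists only to make the pacing testable.

```python
import time
from collections import deque


class LeakyBucketShaper:
    """Bounded queue drained at a fixed rate; no bursts on the output side."""

    def __init__(self, rate_per_s: float, capacity: int, clock=time.monotonic):
        self.interval = 1.0 / rate_per_s   # minimum spacing between releases
        self.capacity = capacity
        self.clock = clock
        self.queue = deque()
        self.next_release = clock()

    def offer(self, msg) -> bool:
        """Enqueue a message; False means the bucket is full (overflow)."""
        if len(self.queue) >= self.capacity:
            return False
        self.queue.append(msg)
        return True

    def poll(self):
        """Return the next message if its release time has arrived, else None."""
        now = self.clock()
        if not self.queue or now < self.next_release:
            return None
        # Schedule from 'now' so an idle period never banks up a burst
        self.next_release = now + self.interval
        return self.queue.popleft()
```

A worker loop calls `poll()` and sleeps briefly when it returns None. Unlike the token bucket, idle time earns no credit, which is the property a burst-intolerant carrier needs.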

Comparison table

Algorithm	Best place to use	Behavior	Tradeoffs
Token bucket	Per‑ISP / per‑campaign admission	Allows bursts up to `b`, enforces average `r`	Flexible burst, needs atomic state; good for maximizing capacity.
Leaky bucket	Egress shaping to carrier	Smooth, fixed output rate	Low jitter; can increase latency during bursts.
Adaptive backoff	Retry & incident handling	Spread retries, reduce retry amplification	Must tune jitter; wrong tuning delays recovery.

Token bucket implementation (Python, compact)

# token_bucket.py (conceptual)
import time
import redis

rdb = redis.Redis()

WARM = 0.05  # safety margin, in tokens, that must remain beyond the cost

def allow_send(key, rate, burst, cost=1):
    # Not atomic as written: use an EVAL script in production
    now = time.time()
    state = rdb.hgetall(key)  # redis-py returns bytes keys/values by default
    # a missing bucket starts full
    tokens = float(state.get(b'tokens', burst))
    last = float(state.get(b'last', now))
    tokens = min(burst, tokens + max(0.0, now - last) * rate)
    if tokens >= cost + WARM:
        tokens -= cost
        rdb.hset(key, mapping={'tokens': tokens, 'last': now})
        return True
    # don't store to avoid stampeding refills
    return False

Make this atomic with Redis EVAL or a compare-and-set transaction.

Adaptive backoff with full jitter (recommended pattern):

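# backoff_jitter.py (conceptual)
import random
import time

def full_jitter(attempt, base=0.1, cap=30.0):
    # exponential ceiling, capped, then a uniform draw over [0, ceiling]
    exp = base * (2 ** attempt)
    return random.uniform(0, min(exp, cap))

# usage
def send_with_retries(send_message, max_attempts=5):
    """send_message is any callable that returns True on success."""
    for attempt in range(max_attempts):
        if send_message():
            return True
        if attempt < max_attempts - 1:  # no point sleeping after the last try
            time.sleep(full_jitter(attempt))
    return False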

Use decorrelated jitter or full jitter depending on your retry amplification profile; AWS advocates jitter to spread retries and avoid synchronized spikes.
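For comparison, the decorrelated variant from the same AWS write-up derives each sleep from the previous one rather than from the attempt count. A minimal sketch, with the function names and the defaults chosen here for illustration:

```python
import random


def decorrelated_jitter(previous_sleep: float, base: float = 0.1, cap: float = 30.0) -> float:
    """Next sleep is uniform between base and 3x the previous sleep, capped."""
    return min(cap, random.uniform(base, previous_sleep * 3))


def sleep_sequence(n: int, base: float = 0.1, cap: float = 30.0):
    """Yield n successive sleeps, seeding the first draw with base."""
    sleep = base
    for _ in range(n):
        sleep = decorrelated_jitter(sleep, base, cap)
        yield sleep
```

Full jitter tends to produce shorter total wait, while decorrelated jitter never sleeps less than `base`, which can suit providers that penalize immediate retries.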

Combining algorithms in a smart throttle:

Use a token bucket to admit to the outbound queue.
Use a leaky bucket at the worker egress to smooth to provider expectations where necessary.
On provider 429/4xx response codes, immediately scale down that destination’s token rate by a mitigation factor (e.g., 0.5) and start a controlled rebuild with small additive increases when errors subside. Persist the factor and the reason for auditability.

Handling Warmup and Peaks: IP Warmup, Peak Events, and Smoke‑Testing

IP warmup and pre‑planning are non‑negotiable if you run dedicated IPs or large SMS programs.

IP warmup (email):

Managed providers such as AWS SES and SendGrid provide automated warmup and documented schedules; SES outlines an automatic warmup that ramps over ~45 days and recommends sending to your most active users during warmup, while SendGrid offers an automated warmup feature and manual schedules for dedicated IPs. Plan to warm each IP to each major ISP, because reputation is ISP‑specific.
Practice: map target ISP mixes and, during warmup, send primarily to high‑engagement recipients (low complaint rates) to avoid early reputation damage.

SMS peak planning (10DLC & carriers):

Register Brands and Campaigns with The Campaign Registry / your messaging provider to unlock throughput tiers and avoid punitive filtering; carriers allocate throughput differently (AT&T by message class/campaign, T‑Mobile with brand/day caps, Verizon with its own implicit caps). Partition sends across multiple numbers/campaigns where allowed and legal.
For high‑traffic events (product launches, flash sales), prepare: reserve short code or toll‑free capacity when necessary, pre‑warm multiple 10DLC numbers under separate campaigns, and stagger sends across time slices to match per‑carrier quotas.
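Staggering sends across time slices can be sketched as a simple bucketing pass. The function name, the tuple shape of `recipients`, and the idea of expressing each carrier quota as "messages per slice" are assumptions for illustration; real quotas come from your provider and registration tier.

```python
from collections import defaultdict


def stagger(recipients, quotas, default_quota=1):
    """Split (recipient, carrier_id) pairs into time slices.

    quotas maps carrier_id -> max messages per slice. Returns a list of
    slices; slice i holds the recipients to send during time slice i.
    """
    slices = defaultdict(list)
    sent_per_carrier = defaultdict(int)
    for recipient, carrier in recipients:
        quota = max(1, quotas.get(carrier, default_quota))
        slot = sent_per_carrier[carrier] // quota  # first slice with room
        slices[slot].append(recipient)
        sent_per_carrier[carrier] += 1
    return [slices[i] for i in range(len(slices))]


plan = stagger(
    [("a", "att"), ("b", "att"), ("c", "att"), ("d", "tmo")],
    quotas={"att": 2, "tmo": 1},
)
```

Here the third AT&T recipient spills into the second slice while the T‑Mobile recipient still goes out in the first, so no carrier sees more than its quota in any one slice.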

Testing & smoke runs:

Implement canary sends: small seeded lists across major ISPs/carriers; run canaries 24–72 hours before a major event and watch delivery/deferral/complaint signals. Use feedback loops to adjust rate per destination in real time. M3AAWG provides guidance on managing high‑risk mandated sends and handling complaint flows; follow these practices for safety.

Practical Playbook: Checklists, Metrics, and Runbook

Concrete, implementable items you can act on now.

Operational checklist (pre‑send)

Validate SPF, DKIM, DMARC, reverse DNS and TLS for email domains.
Ensure 10DLC Brand & Campaign registration is in place for US SMS and that number linking is complete.
Confirm IP warmup status (SES/SendGrid consoles or API) and keep a warmup plan for new IPs.
Seed a canary list for each major ISP/carrier and verify deliverability for 48–72 hours.

Monitoring & metrics (must be real‑time)

Per‑destination throughput: msgs_sent/s and tokens_consumed/s.
Error windowed rates: 4xx_rate_1m, 5xx_rate_1m, 429_rate_1m. Alert if these cross thresholds.
Engagement signals: open_rate, click_rate, spam_complaint_rate (Gmail Postmaster guidance emphasizes keeping spam rates very low; industry reporting suggests targets ~0.10% for compliance with stricter inbox criteria).
Reputation SLOs: inbox_placement (where measurable), bounce_rate < 2%, spam_complaint_rate < 0.1% (target), avg_latency for transactional messages (seconds).
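A windowed error rate such as 4xx_rate_1m can be computed with a timestamped deque. This is a single-process sketch with illustrative names; in a distributed deployment the same idea is usually implemented with per-bucket counters in the shared state store.

```python
import time
from collections import deque


class RollingRate:
    """Share of error events among all events seen in the last window_s seconds."""

    def __init__(self, window_s: float = 60.0, clock=time.monotonic):
        self.window_s = window_s
        self.clock = clock
        self.events = deque()   # (timestamp, is_error)
        self.errors = 0

    def record(self, is_error: bool) -> None:
        self.events.append((self.clock(), is_error))
        self.errors += int(is_error)

    def rate(self) -> float:
        # Evict events that have aged out of the window
        cutoff = self.clock() - self.window_s
        while self.events and self.events[0][0] <= cutoff:
            _, was_error = self.events.popleft()
            self.errors -= int(was_error)
        return self.errors / len(self.events) if self.events else 0.0
```

One instance per destination key and error class (4xx, 5xx, 429) gives the alerting layer a cheap, always-current number to compare against thresholds.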

Alert thresholds (example triggers)

Immediate action: spam_complaint_rate > 0.3% or sustained 429_rate > 1% for 15 minutes.
Triage: 4xx_rate spike > 5% (15m window) → scale down rate by 50% and escalate to deliverability team.
Pre‑emptive: sudden drop in open_rate across major ISPs → pause promotions and run a hygiene check.
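The example triggers above can be encoded as a single evaluation function. The metric key names, the action labels, and the 30% relative open-rate drop used to define "sudden" are assumptions; the 0.3%, 1%, and 5% thresholds are the ones listed above.

```python
def evaluate_alerts(metrics: dict, open_rate_drop_threshold: float = 0.30) -> list:
    """Return the list of triggered actions for one destination's metrics."""
    actions = []
    # Immediate action: complaints above 0.3% or sustained 429s above 1%
    if (metrics.get("spam_complaint_rate", 0.0) > 0.003
            or metrics.get("429_rate_15m", 0.0) > 0.01):
        actions.append("immediate_action")
    # Triage: 4xx spike above 5% over a 15-minute window
    if metrics.get("4xx_rate_15m", 0.0) > 0.05:
        actions.append("scale_down_50pct_and_escalate")
    # Pre-emptive: open rate fell sharply relative to its baseline
    baseline = metrics.get("open_rate_baseline", 0.0)
    if baseline > 0:
        drop = (baseline - metrics.get("open_rate", baseline)) / baseline
        if drop > open_rate_drop_threshold:
            actions.append("pause_promotions_and_check_hygiene")
    return actions
```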

Incident runbook (429/deferrals)

Pause non‑essential sends to the affected destination key(s). Mark campaign paused.
Reduce policy.rate for the affected destination by 0.5x and set cooldown_until = now + 30m. Persist change to throttle_policies.
Switch a fraction (e.g., 10%) of high‑priority transactional traffic to alternate IP pools or provider if available.
Start diagnostic telemetry: collect SMTP logs, provider webhooks, bounce reasons, and Postmaster/feedback loop reports.
Once errors drop below triage thresholds for 30m, begin a slow, incremental ramp (e.g., +10% every 10 minutes) while monitoring error windows. Use canaries before full resume.
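The recovery ramp in the last step can be precomputed so on-call engineers know how long recovery will take. This sketch assumes "+10%" compounds on the current rate; the function name and signature are illustrative.

```python
def ramp_schedule(reduced_rate, target_rate, step_fraction=0.10, interval_min=10):
    """Return [(minutes_from_start, rate), ...] until target_rate is reached."""
    if reduced_rate <= 0 or step_fraction <= 0:
        raise ValueError("reduced_rate and step_fraction must be positive")
    steps = []
    rate, minute = reduced_rate, 0
    while rate < target_rate:
        minute += interval_min
        rate = min(target_rate, rate * (1 + step_fraction))
        steps.append((minute, round(rate, 2)))
    return steps


# Recovering from a 0.5x cut (20 msg/s) back to 40 msg/s
schedule = ramp_schedule(20, 40)
```

With these defaults, recovering from a 0.5x cut takes eight steps, or 80 minutes, which is worth knowing before deciding whether to reroute transactional traffic in the meantime.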

Quick config update (example curl to policy API)

# rate is in messages/sec (JSON does not allow inline comments)
curl -X PATCH "https://internal.throttle/api/v1/policies/isp/ATT" \
  -H "Authorization: Bearer $ADMIN_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "rate": 40,
    "burst": 120,
    "mitigation_reason": "Exceeded 429 threshold",
    "cooldown_until": "2025-12-20T15:30:00Z"
  }'

A short checklist for post‑mortem

Timestamped list of policy changes and their effects.
Correlate the first deferral/rejection with the send pattern and recent policy changes (new domain, new campaign, large promotional audience).
Record remediation steps, time to recovery, and follow‑up items (list hygiene, consent checks, template changes).

Closing

Build your throttle to be measurement-driven and ISP-aware: treat each carrier or mailbox provider as a separate service with its own budget, and automate policy changes via a control plane that respects feedback and maintains conservative defaults during recovery. Smart throttling is not a restriction; it’s the mechanism that preserves and compounds your ability to send at scale.

Sources:
RFC 2697: A Single Rate Three Color Marker - Definition of metering and policing primitives used as background for token/leaky bucket reasoning.

Token bucket — Wikipedia - Clear description of token bucket behavior and properties used for implementation patterns.

Message storage and concurrent connection throttling for SMTP Authenticated Submission — Microsoft Learn - Microsoft’s documented SMTP submission limits and concrete throttling behavior (concurrency, per-minute and per-day limits).

Programmable Messaging and A2P 10DLC — Twilio Docs - Carrier/10DLC registration and throughput guidance; used to explain per‑campaign throughput and registration impact.

Warming up dedicated IP addresses — Amazon SES Documentation - SES-managed IP warmup behavior and recommended practices cited for warmup schedules and ISP-specific warmup.

IP Warmup | Twilio SendGrid Docs - SendGrid’s automated/manual IP warmup API and guidance cited for practical warmup tooling and schedules.

IP Warmup: Warming Up an IP Address | Twilio SendGrid Docs (UI guidance) - Additional SendGrid guidance for operational warmup and strategy.

Leaky bucket — Wikipedia - Explanation of the leaky bucket variants and use as a shaping queue.

Exponential Backoff And Jitter — AWS Architecture Blog - Canonical guidance on backoff strategies and jitter to prevent retry storms.

Google bulk sender / enforcement reporting — Forbes coverage & industry reporting - Industry reporting summarizing Gmail/Postmaster changes and operational thresholds referenced for spam/complaint guidance.
