- Mapping ISP & Carrier Policies to Real‑World Limits
- Designing a Distributed, ISP‑Aware Throttling Service
- Algorithms That Actually Work:
token bucket,leaky bucket, and Adaptive Backoff - Handling Warmup and Peaks: IP Warmup, Peak Events, and Smoke‑Testing
- Practical Playbook: Checklists, Metrics, and Runbook
ISPs and carriers will throttle before your monitoring notices a problem; the infrastructure that looks fast on paper can become a reputation sink in production. The right approach treats throughput optimization and reputation protection as the same engineering problem: maximize sends within the limits those networks will accept without penalizing your IPs, domains, or 10DLC campaigns.
The problem you see in production is consistent: large sends succeed at first, then slow, then fail or get rejected and you lose reputation—bounce and complaint rates spike, shared‑IP neighbors suffer, IPs get blacklisted, or carriers downgrade your 10DLC campaign. Symptoms include persistent 421/4xx SMTP deferrals, abrupt 5xx rejections, surge in SMS ACK failures and carrier-reported throttles, or steady growth in complaints visible in Postmaster tooling. These symptoms are rarely fixed by "send less"—you need a control plane that maps ISP/carrier rules to live send behavior.
Mapping ISP & Carrier Policies to Real‑World Limits
What networks actually enforce varies by destination type:
-
Email ISPs (Gmail, Microsoft, Yahoo, etc.) enforce per‑sender and per‑IP reputational checks, dynamic temporary rate-limiting, and content-based filtering. Microsoft’s Exchange Online documents show concrete submission limits such as connection concurrency and per‑minute/per‑day thresholds that cause measured throttling responses (for example, up to three concurrent SMTP connections for
SMTP AUTH,30messages per minute and a10,000recipients/day recipient rate can be enforced by the service). - Mobile carriers (A2P SMS via 10DLC, toll‑free, or short codes) attach throughput to registration, branding and campaign vetting. Throughput is assigned per brand and per campaign and varies by carrier—registered campaigns get materially higher throughput than unregistered traffic. Registration and trust score determine per‑carrier quotas and penalties for overflow.
- Aggregate behavior: carriers and ISPs often prefer queuing/deferring over outright dropping; repeated policy violations lead to permanent drops or blacklistings. M3AAWG and industry best‑practice documents codify operational expectations for senders.
Important: The fastest route to higher throughput is compliance and staged growth. Built-in throttles that respect ISP/carrier policies preserve lifetime capacity; ad‑hoc high-volume blasts burn reputation and reduce future throughput.
Concrete implications for your system:
- Treat per‑recipient destination (ISP / carrier /
carrier_id) as a first‑class routing key. Maintain counters and policies keyed by that identifier. - Expect both hard limits (explicit 5xx rejections for exceeding a quota) and soft limits (rising 4xx/deferrals) that require different handling.
- Record every
MX/TCP/HTTP/Providerresponse and map failures to actions (reduce, pause, re-route). Use FBLs / provider webhooks to feed back into the policy engine.
Designing a Distributed, ISP‑Aware Throttling Service
Build the throttle as a service separate from your templating and queuing layers. The core responsibilities of the service are: maintain per‑destination rate state, enforce burst & sustained limits, react to feedback from providers, and surface metrics.
Architecture (minimal, resilient):
- Ingress API -> Router (annotates
carrier_id/isp/region) ->Throttleservice -> Per‑destination queues (priority + retry budgets) -> Workers -> MTA/CPaaS (Postfix, SES, Twilio). - A central configuration store (
throttle_policies) drives per‑destinationrateandburstvalues, editable during incidents. A fast state store (Redis, RocksDB, or local in‑memory + periodic persistence) stores the livebucket_state.
Data model (example):
-
throttle_policy:{destination_type}:{id}= {rate(msg/s),burst(tokens),window(s),priority,source} -
bucket_state:{destination_key}= {tokens,last_refill_ts} -
reputation_metrics:{ip|domain|brand}= rolling counters (1m/5m/15m) for accepted, deferred, bounce, complaint, 4xx, 5xx.
Key engineering patterns:
- Use atomic ops (Redis Lua, CRDT, or strongly consistent DB transaction) to check-and-decrement tokens. This prevents race conditions when many workers drain the same bucket. Store the
tokensas a float and refill on access.token_rateandbucket_sizeare policy parameters. - Keep a per‑destination priority queue and admission control at queue head: if
acquire()fails, requeue with exponential retry + jitter (see algorithm below). Track a retry budget to avoid amplification (global retry budget per campaign). - Separate traffic shaping from business prioritization: route high‑value transactional messages (OTP, auth) into a high‑priority queue and reserve a portion of throughput for them; treat bulk promotional sends as best-effort. Implement quotas per
message_classto avoid pollution of transactional capacity.
Example: atomic token check (conceptual)
# Pseudocode (atomic via Redis Lua or DB transaction)
def try_acquire(destination_key, tokens_needed=1):
state = redis.hgetall(f"bucket_state:{destination_key}")
now = time_monotonic()
elapsed = now - state['last_refill_ts']
# refill tokens
refill = elapsed * policy[destination_key].rate
tokens = min(state['tokens'] + refill, policy[destination_key].burst)
if tokens >= tokens_needed:
tokens -= tokens_needed
# write state atomically
redis.hmset(f"bucket_state:{destination_key}", tokens=tokens, last_refill_ts=now)
return True
else:
# don't mutate state
return False
Use a single EVAL script in Redis for true atomicity in production.
Operational choices that matter:
- Persist policy changes and gracefully reduce
rateon sustained failures rather than killing the stream. A pragmatic default: reducerateby a multiplicative factor when a sustained> X%4xx/5xx window is observed, and restore via slow positive increments when back to healthy. Store acooldown_untiltimestamp to prevent flip‑flopping.
Algorithms That Actually Work: token bucket, leaky bucket, and Adaptive Backoff
Pick the right tool for the right layer.
-
Token bucket — metering with burst allowance. Add
rtokens per second, bucket sizeb, remove tokens to send. Good for preserving an average rate and allowing bursts up tob. Use for per‑ISP/campaign throttles where you want controlled burstiness. - Leaky bucket — shaping to a steady rate. Implemented as a queue serviced at a fixed rate. Use when you must smooth traffic to a fixed pattern (e.g., to match a carrier that forbids bursts). Leaky bucket as a queue is equivalent to a strict shaper and is useful at egress.
-
Adaptive Backoff — react to network/provider signals. On
429,4xxsoft errors or elevated deferrals, back off with exponential backoff + jitter to prevent retry storms and thundering-herd effects. AWS’s guidance on backoff + jitter is the operational standard for decorrelated retries.
Comparison table
| Algorithm | Best place to use | Behavior | Tradeoffs |
|---|---|---|---|
| Token bucket | Per‑ISP / per‑campaign admission | Allows bursts up to b, enforces average r
|
Flexible burst, needs atomic state; good for maximizing capacity. |
| Leaky bucket | Egress shaping to carrier | Smooth, fixed output rate | Low jitter; can increase latency during bursts. |
| Adaptive backoff | Retry & incident handling | Spread retries, reduce retry amplification | Must tune jitter; wrong tuning delays recovery. |
Token bucket implementation (Python, compact)
# token_bucket.py (conceptual)
import time, redis
rdb = redis.Redis()
WARM = 0.05 # safety fraction
def allow_send(key, rate, burst, cost=1):
# EVAL script in production for atomic update
now = time.time()
state = rdb.hgetall(key) or {b'tokens': b'0', b'last': b'0'}
tokens = float(state[b'tokens'])
last = float(state[b'last'])
tokens = min(burst, tokens + (now - last) * rate)
if tokens >= cost + WARM:
tokens -= cost
rdb.hmset(key, {'tokens': tokens, 'last': now})
return True
# don't store to avoid stampeding refills
return False
Make this atomic with Redis EVAL or a compare-and-set transaction.
Adaptive backoff with full jitter (recommended pattern):
# backoff_jitter.py (conceptual)
import random, time, math
def full_jitter(attempt, base=0.1, cap=30.0):
exp = base * (2 ** attempt)
return random.uniform(0, min(exp, cap))
# usage
attempt = 0
while attempt < max_attempts:
ok = send_message()
if ok: break
sleep = full_jitter(attempt)
time.sleep(sleep)
attempt += 1
Use decorrelated jitter or full jitter depending on your retry amplification profile; AWS advocates jitter to spread retries and avoid synchronized spikes.
Combining algorithms in a smart throttle:
- Use a
token bucketto admit to the outbound queue. - Use a
leaky bucketat the worker egress to smooth to provider expectations where necessary. - On provider
429/4xxecho codes, immediately scale down that destination’s tokenrateby a mitigation factor (e.g., 0.5) and start a controlled rebuild with small additive increases when errors subside. Persist the factor and the reason for auditability.
Handling Warmup and Peaks: IP Warmup, Peak Events, and Smoke‑Testing
IP warmup and pre‑planning are non‑negotiable if you run dedicated IPs or large SMS programs.
IP warmup (email):
- Managed providers such as AWS SES and SendGrid provide automated warmup and documented schedules; SES outlines an automatic warmup that ramps over ~45 days and recommends sending to your most active users during warmup, while SendGrid offers an automated warmup feature and manual schedules for dedicated IPs. Plan to warm each IP to each major ISP, because reputation is ISP‑specific.
- Practice: map target ISP mixes and, during warmup, send primarily to high‑engagement recipients (low complaint rates) to avoid early reputation damage.
SMS peak planning (10DLC & carriers):
- Register Brands and Campaigns with The Campaign Registry / your messaging provider to unlock throughput tiers and avoid punitive filtering; carriers allocate throughput differently (AT&T by message class/campaign, T‑Mobile with brand/day caps, Verizon with its own implicit caps). Partition sends across multiple numbers/campaigns where allowed and legal.
- For high‑traffic events (product launches, flash sales), prepare: reserve short code or toll‑free capacity when necessary, pre‑warm multiple 10DLC numbers under separate campaigns, and stagger sends across time slices to match per‑carrier quotas.
Testing & smoke runs:
- Implement canary sends: small seeded lists across major ISPs/carriers; run canaries 24–72 hours before a major event and watch delivery/deferral/compliant signals. Use feedback loops to adjust
rateper destination in real time. M3AAWG provides guidance on managing high‑risk mandated sends and handling complaint flows; follow these practices for safety.
Practical Playbook: Checklists, Metrics, and Runbook
Concrete, implementable items you can act on now.
Operational checklist (pre‑send)
- Validate
SPF,DKIM,DMARC, reverse DNS and TLS for email domains. - Ensure 10DLC Brand & Campaign registration is in place for US SMS and that number linking is complete.
- Confirm IP warmup status (SES/SendGrid consoles or API) and keep a warmup plan for new IPs.
- Seed a canary list for each major ISP/carrier and verify deliverability for 48–72 hours.
Monitoring & metrics (must be real‑time)
- Per‑destination throughput:
msgs_sent/sandtokens_consumed/s. - Error windowed rates:
4xx_rate_1m,5xx_rate_1m,429_rate_1m. Alert if these cross thresholds. - Engagement signals:
open_rate,click_rate,spam_complaint_rate(Gmail Postmaster guidance emphasizes keeping spam rates very low; industry reporting suggests targets ~0.10% for compliance with stricter inbox criteria). - Reputation SLOs:
inbox_placement(where measurable),bounce_rate < 2%,spam_complaint_rate < 0.1%(target),avg_latencyfor transactional messages (seconds).
Alert thresholds (example triggers)
- Immediate action:
spam_complaint_rate > 0.3%or sustained429_rate > 1%for 15 minutes. - Triage:
4xx_ratespike > 5% (15m window) → scale downrateby 50% and escalate to deliverability team. - Pre‑emptive: sudden drop in
open_rateacross major ISPs → pause promotions and run a hygiene check.
Incident runbook (429/deferrals)
- Pause non‑essential sends to the affected destination key(s). Mark campaign
paused. - Reduce
policy.ratefor the affected destination by0.5xand setcooldown_until = now + 30m. Persist change tothrottle_policies. - Switch a fraction (e.g., 10%) of high‑priority transactional traffic to alternate IP pools or provider if available.
- Start diagnostic telemetry: collect SMTP logs, provider webhooks, bounce reasons, and Postmaster/feedback loop reports.
- Once errors drop below triage thresholds for 30m, rehearse a slow, incremental ramp (e.g., +10% every 10 minutes) while monitoring error windows. Use canaries before full resume.
Quick config update (example curl to policy API)
curl -X PATCH "https://internal.throttle/api/v1/policies/isp/ATT" \
-H "Authorization: Bearer $ADMIN_KEY" \
-H "Content-Type: application/json" \
-d '{
"rate": 40, # messages/sec
"burst": 120,
"mitigation_reason": "Exceeded 429 threshold",
"cooldown_until": "2025-12-20T15:30:00Z"
}'
A short checklist for post‑mortem
- Timestamped list of policy changes and their effects.
- Correlate the first deferral/rejection with the send pattern and recent policy changes (new domain, new campaign, large promotional audience).
- Record remediation steps, time to recovery, and follow‑up items (list hygiene, consent checks, template changes).
Closing
Build your throttle to be measurement-driven and ISP-aware: treat each carrier or mailbox provider as a separate service with its own budget, and automate policy changes via a control plane that respects feedback and maintains conservative defaults during recovery. Smart throttling is not a restriction; it’s the mechanism that preserves and compounds your ability to send at scale.
Sources:
RFC 2697: A Single Rate Three Color Marker - Definition of metering and policing primitives used as background for token/leaky bucket reasoning.
Token bucket — Wikipedia - Clear description of token bucket behavior and properties used for implementation patterns.
Message storage and concurrent connection throttling for SMTP Authenticated Submission — Microsoft Learn - Microsoft’s documented SMTP submission limits and concrete throttling behavior (concurrency, per-minute and per-day limits).
Programmable Messaging and A2P 10DLC — Twilio Docs - Carrier/10DLC registration and throughput guidance; used to explain per‑campaign throughput and registration impact.
Warming up dedicated IP addresses — Amazon SES Documentation - SES-managed IP warmup behavior and recommended practices cited for warmup schedules and ISP-specific warmup.
IP Warmup | Twilio SendGrid Docs - SendGrid’s automated/manual IP warmup API and guidance cited for practical warmup tooling and schedules.
IP Warmup: Warming Up an IP Address | Twilio SendGrid Docs (UI guidance) - Additional SendGrid guidance for operational warmup and strategy.
Leaky bucket — Wikipedia - Explanation of the leaky bucket variants and use as a shaping queue.
Exponential Backoff And Jitter — AWS Architecture Blog - Canonical guidance on backoff strategies and jitter to prevent retry storms.
Google bulk sender / enforcement reporting — Forbes coverage & industry reporting - Industry reporting summarizing Gmail/Postmaster changes and operational thresholds referenced for spam/complaint guidance.
Top comments (0)