If you run a public API without rate limiting, it's only a matter of time before a runaway client, a misconfigured retry loop, or a well-intentioned load test brings your service to its knees. .NET 7 shipped a first-class rate-limiting API — no third-party middleware required. This post walks through every knob you can turn.
Prerequisite: the built-in rate limiter lives in System.Threading.RateLimiting and the ASP.NET Core middleware in Microsoft.AspNetCore.RateLimiting. Both ship in the box from .NET 7 onwards.
Why rate limiting matters
Rate limiting protects three things simultaneously: your infrastructure from overload, your downstream dependencies from fan-out abuse, and your legitimate users from a noisy neighbour hogging capacity. It also plugs a class of denial-of-service vectors that auth alone can't stop.
The four built-in algorithms
1. Fixed window
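Permits N requests per fixed time window (e.g. 100 requests per minute). The counter resets to zero each time a window elapses. It is simple and cheap on memory, but it can allow a 2× burst at window boundaries: a client can spend its full quota at the end of one window and again at the start of the next, i.e. up to 2N requests within a few seconds.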
```csharp
using System.Threading.RateLimiting;

var limiter = new FixedWindowRateLimiter(
    new FixedWindowRateLimiterOptions
    {
        PermitLimit = 100,
        Window = TimeSpan.FromMinutes(1),
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 0 // reject immediately when full
    });
```
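The boundary burst is easiest to see in a toy model rather than in the library itself, because the real FixedWindowRateLimiter is timer-driven and awkward to replay deterministically. This hypothetical simulation assumes windows that start at t = 0 and replays 100 requests at t = 59 s followed by 100 at t = 61 s against a limit of 100 per minute. CountAccepted is an illustrative helper, not a library API.

```csharp
using System;
using System.Collections.Generic;

// Toy fixed-window counter (hypothetical, for illustration only).
// Windows are [0, 60), [60, 120), ... seconds; the count resets at each boundary.
const int PermitLimit = 100;
const double WindowSeconds = 60;

// Each request is represented only by its arrival time in seconds.
var arrivals = new List<double>();
for (int i = 0; i < 100; i++) arrivals.Add(59.0); // burst at the end of window 0
for (int i = 0; i < 100; i++) arrivals.Add(61.0); // burst at the start of window 1

int accepted = CountAccepted(arrivals);
Console.WriteLine($"Accepted {accepted} of {arrivals.Count} requests within 2 seconds");

int CountAccepted(List<double> times)
{
    long currentWindow = -1;
    int usedInWindow = 0;
    int total = 0;
    foreach (double t in times)
    {
        long window = (long)Math.Floor(t / WindowSeconds);
        if (window != currentWindow)
        {
            // New window: the counter starts again from zero.
            currentWindow = window;
            usedInWindow = 0;
        }
        if (usedInWindow < PermitLimit)
        {
            usedInWindow++;
            total++;
        }
    }
    return total;
}
```

All 200 requests get through within two seconds, even though the configured limit is 100 per minute.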
2. Sliding window
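Divides the window into segments and tracks usage per segment; as the oldest segment slides out of the window, its permits become available again. This is smoother than a fixed window and greatly reduces the boundary burst (more segments track a true sliding window more closely), at the cost of one counter per segment.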
```csharp
var limiter = new SlidingWindowRateLimiter(
    new SlidingWindowRateLimiterOptions
    {
        PermitLimit = 100,
        Window = TimeSpan.FromMinutes(1),
        SegmentsPerWindow = 6, // 10-second granularity
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 0
    });
```
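To see the difference from a fixed window, consider 100 requests at t = 59 s followed by 100 at t = 61 s against a limit of 100 per minute. A fixed window starting at t = 0 admits all 200 because they fall into different windows. The toy segmented sliding window below is a hypothetical simulation, not the library's internals. It uses 6 segments of 10 s, so the permits spent at t = 59 s are still inside the window at t = 61 s.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Toy segmented sliding window (hypothetical, for illustration only).
const int PermitLimit = 100;
const double WindowSeconds = 60;
const int SegmentsPerWindow = 6;
const double SegmentSeconds = WindowSeconds / SegmentsPerWindow; // 10 s

// 100 requests at the end of the first minute, 100 just after it.
var arrivals = new List<double>();
for (int i = 0; i < 100; i++) arrivals.Add(59.0);
for (int i = 0; i < 100; i++) arrivals.Add(61.0);

int accepted = CountAccepted(arrivals);
Console.WriteLine($"Accepted {accepted} of {arrivals.Count} requests");

int CountAccepted(List<double> times)
{
    // Segment index -> requests accepted in that segment.
    var perSegment = new Dictionary<long, int>();
    int total = 0;
    foreach (double t in times)
    {
        long segment = (long)Math.Floor(t / SegmentSeconds);

        // Forget segments that have slid out of the current window.
        foreach (long old in perSegment.Keys.Where(s => s <= segment - SegmentsPerWindow).ToList())
            perSegment.Remove(old);

        // Admit the request only if the whole window still has room.
        int inWindow = perSegment.Values.Sum();
        if (inWindow < PermitLimit)
        {
            perSegment[segment] = perSegment.GetValueOrDefault(segment) + 1;
            total++;
        }
    }
    return total;
}
```

Only 100 of the 200 requests are accepted. The price is one counter per segment (and per partition), which is where the extra memory goes.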
3. Token bucket
A bucket fills with tokens at a steady rate up to a maximum. Each request consumes one token. Allows short bursts up to the bucket capacity while enforcing a long-run average. Ideal for APIs where short spikes are acceptable.
```csharp
var limiter = new TokenBucketRateLimiter(
    new TokenBucketRateLimiterOptions
    {
        TokenLimit = 50,            // max burst
        ReplenishmentPeriod = TimeSpan.FromSeconds(10),
        TokensPerPeriod = 10,       // ~1/s average
        AutoReplenishment = true,
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 0
    });
```
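The burst-then-average behaviour falls out of a few lines of arithmetic. This toy loop reuses the numbers from the options above. It makes three hypothetical assumptions: a client sends 5 requests per second for 60 seconds, the bucket starts full, and 10 tokens are added at each 10-second boundary. It is an illustration, not the library's implementation.

```csharp
using System;

// Toy token bucket (hypothetical, for illustration only):
// capacity 50, 10 tokens added every 10 seconds.
const int TokenLimit = 50;
const int TokensPerPeriod = 10;
const int ReplenishmentPeriodSeconds = 10;
const int RequestsPerSecond = 5;  // assumed client demand
const int DurationSeconds = 60;

int tokens = TokenLimit; // the bucket starts full
int accepted = 0;
int acceptedInFirst10s = 0;

for (int second = 0; second < DurationSeconds; second++)
{
    // Replenish at each period boundary, never exceeding the bucket capacity.
    if (second > 0 && second % ReplenishmentPeriodSeconds == 0)
        tokens = Math.Min(TokenLimit, tokens + TokensPerPeriod);

    for (int r = 0; r < RequestsPerSecond; r++)
    {
        if (tokens > 0)
        {
            tokens--; // each request consumes one token
            accepted++;
            if (second < ReplenishmentPeriodSeconds) acceptedInFirst10s++;
        }
    }
}

Console.WriteLine($"First 10 s: {acceptedInFirst10s} accepted (the burst)");
Console.WriteLine($"Whole run: {accepted} of {RequestsPerSecond * DurationSeconds} accepted");
```

The first 10 seconds absorb a 50-request burst. After that, only the 10 replenished tokens per period (1 request/s) get through, for 100 accepted requests out of 300.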
4. Concurrency limiter
Limits simultaneous in-flight requests rather than request rate. Useful for protecting expensive operations like report generation or ML inference where time-in-system matters more than throughput.
```csharp
var limiter = new ConcurrencyLimiter(
    new ConcurrencyLimiterOptions
    {
        PermitLimit = 20,
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit = 5
    });
```
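A concurrency permit is held for as long as its lease lives, so this limiter's behaviour can be observed synchronously with the real API. This sketch calls ConcurrencyLimiter.AttemptAcquire with a deliberately tiny PermitLimit of 2 and no queue. Disposing a lease models a request finishing. In a plain console app (outside ASP.NET Core), you may need to add the System.Threading.RateLimiting NuGet package.

```csharp
using System;
using System.Threading.RateLimiting;

// Deliberately tiny limits: 2 concurrent permits, no queue.
using var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 2,
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    QueueLimit = 0
});

// Each lease stands for one in-flight request.
RateLimitLease first = limiter.AttemptAcquire();
RateLimitLease second = limiter.AttemptAcquire();
RateLimitLease third = limiter.AttemptAcquire(); // both permits are already taken

bool firstOk = first.IsAcquired;
bool secondOk = second.IsAcquired;
bool thirdOk = third.IsAcquired;
Console.WriteLine($"first={firstOk}, second={secondOk}, third={thirdOk}");

// Disposing a lease models a request completing and returns its permit.
first.Dispose();
third.Dispose(); // a failed lease holds no permit, but disposing it is still good hygiene

RateLimitLease fourth = limiter.AttemptAcquire();
bool fourthOk = fourth.IsAcquired;
Console.WriteLine($"after one request finished: fourth={fourthOk}");

second.Dispose();
fourth.Dispose();
```

Unlike the rate-based limiters, nothing here depends on time: the third caller is turned away only because two leases are still open, and a slot frees up as soon as one is disposed.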
Wiring it up in ASP.NET Core
Register policies in Program.cs, then apply them with the [EnableRateLimiting] attribute or inline via RequireRateLimiting().
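```csharp
using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting; // AddFixedWindowLimiter, AddTokenBucketLimiter, [EnableRateLimiting]

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter(policyName: "fixed", opt =>
    {
        opt.PermitLimit = 100;
        opt.Window = TimeSpan.FromMinutes(1);
        opt.QueueLimit = 0;
    });

    options.AddTokenBucketLimiter(policyName: "burst", opt =>
    {
        opt.TokenLimit = 50;
        opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        opt.TokensPerPeriod = 10;
        opt.AutoReplenishment = true;
    });
});

var app = builder.Build();

// If you call app.UseRouting() explicitly, UseRateLimiter() must come after it
// so that endpoint-specific policies can be resolved.
app.UseRateLimiter();
```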
Apply to a minimal API endpoint or controller action:
```csharp
// Minimal API
app.MapGet("/products", GetProducts)
   .RequireRateLimiting("fixed");

// Controller
[EnableRateLimiting("burst")]
[HttpGet("search")]
public IActionResult Search(string query) { ... }
```
Per-user and per-endpoint policies
A single global policy rarely fits real-world needs. Use AddPolicy with a partition key derived from the request context:
```csharp
// Inside builder.Services.AddRateLimiter(options => { ... })
options.AddPolicy("per-user", httpContext =>
    RateLimitPartition.GetTokenBucketLimiter(
        partitionKey: httpContext.User.Identity?.Name
            ?? httpContext.Connection.RemoteIpAddress?.ToString()
            ?? "anonymous",
        factory: _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit = 200,
            ReplenishmentPeriod = TimeSpan.FromMinutes(1),
            TokensPerPeriod = 200,
            AutoReplenishment = true
        }));
```
Tip: prefer authenticated user ID over IP address as the partition key — NAT and proxies can share a single IP across hundreds of users, leading to false positives at scale.
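Conceptually, a partitioned policy keeps one limiter per partition key. The same mechanism is available outside ASP.NET Core through PartitionedRateLimiter.Create. This sketch keys on a plain user-name string and shrinks the bucket to 2 tokens with a one-hour replenishment period so the effect shows up immediately. The user names and the TryRequest helper are illustrative only. In a plain console app you may need the System.Threading.RateLimiting NuGet package.

```csharp
using System;
using System.Threading.RateLimiting;

// One token bucket per user name. Tiny numbers so the effect shows up at once:
// 2 tokens, refilled only after an hour.
using PartitionedRateLimiter<string> limiter = PartitionedRateLimiter.Create<string, string>(
    userName => RateLimitPartition.GetTokenBucketLimiter(
        partitionKey: userName,
        factory: _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit = 2,
            ReplenishmentPeriod = TimeSpan.FromHours(1),
            TokensPerPeriod = 2,
            AutoReplenishment = true,
            QueueLimit = 0
        }));

// Hypothetical helper: true if a request from this user would be admitted.
bool TryRequest(string userName)
{
    using RateLimitLease lease = limiter.AttemptAcquire(userName);
    return lease.IsAcquired;
}

bool alice1 = TryRequest("alice");
bool alice2 = TryRequest("alice");
bool alice3 = TryRequest("alice"); // alice's bucket is now empty
bool bob1 = TryRequest("bob");     // bob has a separate, still-full bucket
Console.WriteLine($"alice: {alice1}, {alice2}, {alice3}; bob: {bob1}");
```

Alice's third request is rejected while Bob's first succeeds, because each key gets its own bucket. That is exactly why a shared key, such as one NAT'd IP address, lumps unrelated users together.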
Custom rejection responses
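By default, the middleware rejects requests with 503 Service Unavailable. The status code defined for rate limiting (RFC 6585) is 429 Too Many Requests. Setting options.RejectionStatusCode = StatusCodes.Status429TooManyRequests is enough to change the status; to also send a Retry-After header, handle OnRejected: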
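```csharp
options.OnRejected = async (context, token) =>
{
    context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

    // Not every limiter attaches RetryAfter metadata (the concurrency limiter has
    // no meaningful value), so only set the header when it is present.
    if (context.Lease.TryGetMetadata(MetadataName.RetryAfter, out var retryAfter))
    {
        // Round up: truncating 0.5 s to "0" would tell clients to retry immediately.
        context.HttpContext.Response.Headers.Append(
            "Retry-After",
            ((int)Math.Ceiling(retryAfter.TotalSeconds)).ToString(
                System.Globalization.CultureInfo.InvariantCulture));
    }

    await context.HttpContext.Response.WriteAsync(
        "Rate limit exceeded. Please slow down.", token);
};
```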
Distributed scenarios & Redis
The built-in limiters are in-process only — each pod maintains its own counters. In a horizontally scaled deployment, use a Redis-backed limiter via the RedisRateLimiting community library, which wraps the same RateLimiter abstraction:
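```bash
dotnet add package RedisRateLimiting
```

```csharp
using RedisRateLimiting;
using StackExchange.Redis;

// AddStackExchangeRedisCache registers only IDistributedCache, so register the multiplexer itself.
builder.Services.AddSingleton<IConnectionMultiplexer>(_ =>
    ConnectionMultiplexer.Connect(builder.Configuration["Redis:Connection"]!));

// Inside builder.Services.AddRateLimiter(options => { ... })
options.AddPolicy("distributed", httpContext =>
{
    // Resolve the singleton here rather than capturing the per-request service scope.
    var redis = httpContext.RequestServices.GetRequiredService<IConnectionMultiplexer>();

    return RedisRateLimitPartition.GetSlidingWindowRateLimiter(
        partitionKey: httpContext.User.Identity?.Name ?? "anon",
        factory: _ => new RedisSlidingWindowRateLimiterOptions
        {
            ConnectionMultiplexerFactory = () => redis,
            PermitLimit = 500,
            Window = TimeSpan.FromMinutes(1)
        });
});
```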
Client-side resilience with Polly
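If your code consumes a rate-limited API, combine a client-side rate limiter with a retry strategy to handle 429s gracefully. AddResilienceHandler and HttpRetryStrategyOptions come from Microsoft.Extensions.Http.Resilience (built on Polly v8), not from the older Polly.Extensions.Http package:

```bash
dotnet add package Microsoft.Extensions.Http.Resilience
```

```csharp
using System.Net;
using System.Threading.RateLimiting;
using Microsoft.Extensions.Http.Resilience;
using Polly;

services.AddHttpClient<IProductsClient, ProductsClient>()
    .AddResilienceHandler("products-pipeline", pipeline =>
    {
        // Strategies run in the order they are added (first added = outermost).
        // Retry goes first so every attempt, including retries, passes through the limiter.
        pipeline.AddRetry(new HttpRetryStrategyOptions
        {
            MaxRetryAttempts = 3,
            Delay = TimeSpan.FromSeconds(2),
            BackoffType = DelayBackoffType.Exponential,
            ShouldHandle = args => ValueTask.FromResult(
                args.Outcome.Result?.StatusCode == HttpStatusCode.TooManyRequests)
        });

        // Client-side budget: at most 50 requests per 10 seconds.
        pipeline.AddRateLimiter(new SlidingWindowRateLimiter(
            new SlidingWindowRateLimiterOptions
            {
                PermitLimit = 50,
                Window = TimeSpan.FromSeconds(10),
                SegmentsPerWindow = 5
            }));
    });
```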
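As a rough guide to what the retry options above mean in wall-clock terms, this hypothetical helper computes an exponential schedule of Delay × 2^attempt for 3 attempts. It deliberately ignores jitter and any server-supplied Retry-After, both of which the real strategy can apply, so treat the numbers as a baseline rather than Polly's exact timings.

```csharp
using System;

// Hypothetical helper: baseDelay * 2^attempt for attempt = 0, 1, 2, ...
// (no jitter, no Retry-After handling).
TimeSpan[] ExponentialSchedule(TimeSpan baseDelay, int maxRetryAttempts)
{
    var result = new TimeSpan[maxRetryAttempts];
    for (int attempt = 0; attempt < maxRetryAttempts; attempt++)
    {
        // Doubling each time: 1x, 2x, 4x, ... the base delay.
        result[attempt] = TimeSpan.FromTicks(baseDelay.Ticks << attempt);
    }
    return result;
}

TimeSpan[] delays = ExponentialSchedule(TimeSpan.FromSeconds(2), maxRetryAttempts: 3);

TimeSpan total = TimeSpan.Zero;
foreach (TimeSpan delay in delays) total += delay;

string formatted = string.Join(", ", Array.ConvertAll(delays, d => $"{d.TotalSeconds} s"));
Console.WriteLine($"Delays: {formatted}; total backoff: {total.TotalSeconds} s");
```

The waits come out as 2 s, 4 s and 8 s, so a request that exhausts all its retries spends roughly 14 s backing off before the caller sees the final 429.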
Choosing the right algorithm
| Algorithm | Best for | Watch out for | Memory cost |
|---|---|---|---|
| Fixed window | Simple quotas, billing tiers | Boundary burst (2× spike) | Very low |
| Sliding window | Smooth public APIs | Segment count × partitions | Low–medium |
| Token bucket | Burst-tolerant consumer APIs | Tuning burst vs average | Low |
| Concurrency | Expensive ops (ML, reports) | Doesn't bound throughput | Very low |
Distributed gotcha: in-process limiters keep separate counters in each pod, so a cluster of 4 replicas behind an evenly balanced load balancer can admit up to 4× your configured limit. For multi-replica deployments where correctness matters, use a Redis-backed partitioned limiter.
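The multiplication is simple arithmetic, but a toy simulation makes it tangible. It assumes, hypothetically, 4 replicas with round-robin routing and 1,000 requests from one client inside a single window. Per-pod counters admit 4 × 100 requests, while a single shared counter (the role Redis plays) admits 100.

```csharp
using System;

const int Replicas = 4;
const int Limit = 100;              // configured limit per window
const int IncomingRequests = 1_000; // one client, all inside one window

// Per-pod counters: each replica only sees its own share of the traffic.
var usedPerPod = new int[Replicas];
int acceptedPerPod = 0;
for (int i = 0; i < IncomingRequests; i++)
{
    int pod = i % Replicas; // round-robin routing
    if (usedPerPod[pod] < Limit)
    {
        usedPerPod[pod]++;
        acceptedPerPod++;
    }
}

// Shared counter: every replica checks and increments the same value.
int sharedUsed = 0;
int acceptedShared = 0;
for (int i = 0; i < IncomingRequests; i++)
{
    if (sharedUsed < Limit)
    {
        sharedUsed++;
        acceptedShared++;
    }
}

Console.WriteLine($"Per-pod counters: {acceptedPerPod} accepted; shared counter: {acceptedShared} accepted");
```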
Wrapping up
.NET 7+ gives you production-grade rate limiting with zero external dependencies for single-node scenarios. The four algorithms cover the full spectrum from simple quotas to burst-tolerant consumer clients. Add Redis for distributed enforcement, Polly for client-side resilience, and always return 429 with a Retry-After header — your API consumers will thank you.
Questions or patterns I missed? Drop them in the comments.