DEV Community

Printo Tom

Rate Limiting in C# — Don't Let Your API Get Hammered

If you run a public API without rate limiting, it's only a matter of time before a runaway client, a misconfigured retry loop, or a well-intentioned load test brings your service to its knees. .NET 7 shipped a first-class rate-limiting API — no third-party middleware required. This post walks through every knob you can turn.

Prerequisite: the built-in rate limiter lives in System.Threading.RateLimiting and the ASP.NET Core middleware in Microsoft.AspNetCore.RateLimiting. Both ship in the box from .NET 7 onwards.


Why rate limiting matters

Rate limiting protects three things simultaneously: your infrastructure from overload, your downstream dependencies from fan-out abuse, and your legitimate users from a noisy neighbour hogging capacity. It also mitigates a class of denial-of-service vectors that authentication alone can't stop: a perfectly valid, authenticated client can still send far more requests than your service can handle.


The four built-in algorithms

1. Fixed window

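Permits N requests per fixed time window (e.g. 100 requests per minute); the count resets when each window ends. Simple and low on memory, but it can let a 2× burst through at a window boundary: N requests at the very end of one window, then N more right at the start of the next.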

using System.Threading.RateLimiting;

var limiter = new FixedWindowRateLimiter(
    new FixedWindowRateLimiterOptions
    {
        PermitLimit          = 100,
        Window               = TimeSpan.FromMinutes(1),
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit           = 0   // reject immediately when full
    });
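To make the boundary burst concrete, here is a small console sketch (assuming .NET 7+, no extra packages). It turns AutoReplenishment off so the program decides when the window rolls over: all permits are spent at the tail of one window, TryReplenish() starts the next, and a second burst follows immediately. The 3-permit, 100 ms window and the Burst helper are illustrative only.

```csharp
using System;
using System.Threading;
using System.Threading.RateLimiting;

// Illustrative numbers: 3 permits per 100 ms window. AutoReplenishment = false
// means the window only rolls over when we call TryReplenish().
using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 3,
    Window = TimeSpan.FromMilliseconds(100),
    QueueLimit = 0,
    AutoReplenishment = false
});

// Hypothetical helper: fire `attempts` back-to-back requests, count how many got a permit.
int Burst(int attempts)
{
    int granted = 0;
    for (int i = 0; i < attempts; i++)
    {
        using RateLimitLease lease = limiter.AttemptAcquire();
        if (lease.IsAcquired) granted++;
    }
    return granted;
}

Thread.Sleep(150);                 // move past the 100 ms window so the manual replenish below is honoured
int endOfWindow = Burst(5);        // tail end of window 1: spends all 3 permits
limiter.TryReplenish();            // window 2 starts and the counter resets
int startOfNextWindow = Burst(5);  // head of window 2, microseconds later

Console.WriteLine($"tail of window 1: {endOfWindow}, head of window 2: {startOfNextWindow}");
```

Both bursts get 3 permits, so 6 requests pass within microseconds even though the limit is 3 per window. With AutoReplenishment on, a timer performs the same rollover for you.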

2. Sliding window

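Divides the window into segments and tracks usage per segment; permits used in a segment only become available again once that segment slides out of the window. Smoother than fixed window: it sharply reduces the boundary burst (more segments give a tighter bound) at the cost of slightly more memory.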

var limiter = new SlidingWindowRateLimiter(
    new SlidingWindowRateLimiterOptions
    {
        PermitLimit          = 100,
        Window               = TimeSpan.FromMinutes(1),
        SegmentsPerWindow    = 6,     // 10-second granularity
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit           = 0
    });
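A similar sketch for the sliding window (again .NET 7+, illustrative numbers): a 400 ms window split into four 100 ms segments, with manual replenishment. It assumes, as the current implementation does, that each TryReplenish() call made after a segment's worth of time has passed slides the window by one segment. The program spends every permit in the first segment and then watches when they come back.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.RateLimiting;

// Illustrative numbers: 4 permits per 400 ms window, split into four 100 ms segments.
// AutoReplenishment = false: we slide the window ourselves with TryReplenish().
using var limiter = new SlidingWindowRateLimiter(new SlidingWindowRateLimiterOptions
{
    PermitLimit = 4,
    Window = TimeSpan.FromMilliseconds(400),
    SegmentsPerWindow = 4,
    QueueLimit = 0,
    AutoReplenishment = false
});

// Hypothetical helper: fire `attempts` back-to-back requests, count how many got a permit.
int Burst(int attempts)
{
    int granted = 0;
    for (int i = 0; i < attempts; i++)
    {
        using RateLimitLease lease = limiter.AttemptAcquire();
        if (lease.IsAcquired) granted++;
    }
    return granted;
}

// Segment 0: one burst spends every permit in the window.
var grantedPerSegment = new List<int> { Burst(10) };

for (int segment = 1; segment <= 4; segment++)
{
    Thread.Sleep(150);          // longer than one 100 ms segment
    limiter.TryReplenish();     // slide the window forward by one segment
    grantedPerSegment.Add(Burst(10));
}

Console.WriteLine(string.Join(", ", grantedPerSegment));
```

Segments 1 to 3 admit nothing: the permits spent in segment 0 only return when that segment slides out, a full window later. A fixed window would have handed out a fresh batch at the very next boundary.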

3. Token bucket

A bucket fills with tokens at a steady rate up to a maximum. Each request consumes one token. Allows short bursts up to the bucket capacity while enforcing a long-run average. Ideal for APIs where short spikes are acceptable.

var limiter = new TokenBucketRateLimiter(
    new TokenBucketRateLimiterOptions
    {
        TokenLimit               = 50,   // max burst
        ReplenishmentPeriod      = TimeSpan.FromSeconds(10),
        TokensPerPeriod          = 10,   // ~1/s average
        AutoReplenishment        = true,
        QueueProcessingOrder     = QueueProcessingOrder.OldestFirst,
        QueueLimit               = 0
    });
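The burst-versus-average trade-off can be sketched with the same numbers as the example above. This is a hedged illustration: MaxAdmitted is a back-of-envelope bound written for this post, not a library API, and AutoReplenishment is switched off so no tokens arrive mid-demo. The bucket starts full, so a burst of 50 is admitted instantly; over longer spans the 10-tokens-per-10-seconds refill dominates.

```csharp
using System;
using System.Threading.RateLimiting;

// Same shape as the token-bucket example above, but with no automatic refill
// so the demo is deterministic.
var options = new TokenBucketRateLimiterOptions
{
    TokenLimit = 50,
    ReplenishmentPeriod = TimeSpan.FromSeconds(10),
    TokensPerPeriod = 10,
    AutoReplenishment = false,
    QueueLimit = 0
};
using var limiter = new TokenBucketRateLimiter(options);

// 60 back-to-back requests: the first 50 drain the full bucket, the rest are rejected.
int granted = 0;
for (int i = 0; i < 60; i++)
{
    using RateLimitLease lease = limiter.AttemptAcquire();
    if (lease.IsAcquired) granted++;
}

// Hypothetical helper, rough upper bound over a span `t`: the initial full bucket
// plus the tokens refilled in the whole replenishment periods that fit in `t`.
long MaxAdmitted(TimeSpan t) =>
    options.TokenLimit + (t.Ticks / options.ReplenishmentPeriod.Ticks) * options.TokensPerPeriod;

Console.WriteLine($"instant burst: {granted}");
Console.WriteLine($"~max over 1 minute: {MaxAdmitted(TimeSpan.FromMinutes(1))}");
Console.WriteLine($"~max over 1 hour:   {MaxAdmitted(TimeSpan.FromHours(1))}");
```

Over a minute that works out to roughly 50 + 6 × 10 = 110 requests, and over an hour roughly 3,650, which is close to the advertised 1 request per second once the initial burst is amortised.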

4. Concurrency limiter

Limits simultaneous in-flight requests rather than request rate. Useful for protecting expensive operations like report generation or ML inference where time-in-system matters more than throughput.

var limiter = new ConcurrencyLimiter(
    new ConcurrencyLimiterOptions
    {
        PermitLimit          = 20,
        QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
        QueueLimit           = 5
    });
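Unlike the window-based limiters, a concurrency permit goes back to the pool when its lease is disposed. This sketch (.NET 7+, illustrative limits of 2 permits and a queue of 1) shows leases being held, the difference between AttemptAcquire (never waits) and AcquireAsync (may queue), and a queued caller being promoted when an operation finishes.

```csharp
using System;
using System.Threading.Tasks;
using System.Threading.RateLimiting;

using var limiter = new ConcurrencyLimiter(new ConcurrencyLimiterOptions
{
    PermitLimit = 2,                                   // two expensive operations at once
    QueueProcessingOrder = QueueProcessingOrder.OldestFirst,
    QueueLimit = 1                                     // one more caller may wait
});

// Two in-flight operations take both permits and hold them until their leases are disposed.
RateLimitLease first = limiter.AttemptAcquire();
using RateLimitLease second = limiter.AttemptAcquire();

// AttemptAcquire never waits: with no free permit it fails immediately.
using RateLimitLease third = limiter.AttemptAcquire();

// AcquireAsync can wait: this caller takes the single queue slot...
ValueTask<RateLimitLease> queued = limiter.AcquireAsync();
// ...and the next one finds the queue full, so with OldestFirst it is rejected right away.
using RateLimitLease overflow = await limiter.AcquireAsync();

bool firstAcquired = first.IsAcquired;
bool queuedBeforeRelease = queued.IsCompleted;

// Finishing one operation returns its permit, which is handed to the queued caller.
first.Dispose();
using RateLimitLease promoted = await queued;

Console.WriteLine($"first={firstAcquired} second={second.IsAcquired} third={third.IsAcquired}");
Console.WriteLine($"overflow={overflow.IsAcquired} queuedBeforeRelease={queuedBeforeRelease} promoted={promoted.IsAcquired}");
```

In a real handler you would scope the lease with `using` around the expensive work, so the permit is released even if the work throws.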

Wiring it up in ASP.NET Core

Register policies in Program.cs, then apply them with the [EnableRateLimiting] attribute or inline via RequireRateLimiting().

using System.Threading.RateLimiting;
using Microsoft.AspNetCore.RateLimiting;   // AddFixedWindowLimiter, AddTokenBucketLimiter, [EnableRateLimiting]

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddRateLimiter(options =>
{
    options.AddFixedWindowLimiter(policyName: "fixed", opt =>
    {
        opt.PermitLimit = 100;
        opt.Window      = TimeSpan.FromMinutes(1);
        opt.QueueLimit  = 0;
    });

    options.AddTokenBucketLimiter(policyName: "burst", opt =>
    {
        opt.TokenLimit          = 50;
        opt.ReplenishmentPeriod = TimeSpan.FromSeconds(10);
        opt.TokensPerPeriod     = 10;
        opt.AutoReplenishment   = true;
    });
});

var app = builder.Build();
app.UseRateLimiter();   // if you call app.UseRouting() yourself, UseRateLimiter must come after it

Apply to a minimal API endpoint or controller action:

// Minimal API
app.MapGet("/products", GetProducts)
   .RequireRateLimiting("fixed");

// Controller
[EnableRateLimiting("burst")]
[HttpGet("search")]
public IActionResult Search(string query) { ... }

Per-user and per-endpoint policies

A single global policy rarely fits real-world needs. Use AddPolicy with a partition key derived from the request context:

options.AddPolicy("per-user", httpContext =>
    RateLimitPartition.GetTokenBucketLimiter(
        partitionKey: httpContext.User.Identity?.Name
                      ?? httpContext.Connection.RemoteIpAddress?.ToString()
                      ?? "anonymous",
        factory: _ => new TokenBucketRateLimiterOptions
        {
            TokenLimit          = 200,
            ReplenishmentPeriod = TimeSpan.FromMinutes(1),
            TokensPerPeriod     = 200,
            AutoReplenishment   = true
        }));
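The partitioning idea doesn't depend on ASP.NET Core: the same RateLimitPartition factories plug into PartitionedRateLimiter.Create in a plain console app. In this sketch the resource is a user ID string instead of an HttpContext, "alice" and "bob" are hypothetical keys, and Send is a made-up helper; each key lazily gets its own fixed-window limiter.

```csharp
using System;
using System.Threading.RateLimiting;

// In ASP.NET Core the resource is the HttpContext; here it is just a user ID string.
using PartitionedRateLimiter<string> limiter = PartitionedRateLimiter.Create<string, string>(
    userId => RateLimitPartition.GetFixedWindowLimiter(
        partitionKey: userId,
        factory: _ => new FixedWindowRateLimiterOptions
        {
            PermitLimit = 3,                    // illustrative per-user quota
            Window = TimeSpan.FromMinutes(1),
            QueueLimit = 0
        }));

// Hypothetical helper: send `attempts` requests for one user, count how many were admitted.
int Send(string userId, int attempts)
{
    int granted = 0;
    for (int i = 0; i < attempts; i++)
    {
        using RateLimitLease lease = limiter.AttemptAcquire(userId);
        if (lease.IsAcquired) granted++;
    }
    return granted;
}

int alice = Send("alice", 10); // alice burns through her own partition...
int bob = Send("bob", 10);     // ...but bob's partition is untouched

Console.WriteLine($"alice: {alice}, bob: {bob}");
```

Each user gets 3 of their 10 requests through; exhausting alice's partition has no effect on bob's.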

Tip: prefer authenticated user ID over IP address as the partition key — NAT and proxies can share a single IP across hundreds of users, leading to false positives at scale.


Custom rejection responses

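By default, the middleware rejects requests with 503 Service Unavailable. The RFC-correct status for rate limiting is 429 Too Many Requests (RFC 6585). Setting options.RejectionStatusCode = StatusCodes.Status429TooManyRequests is enough to change the status; to also send a Retry-After header, hook OnRejected: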

options.OnRejected = async (context, token) =>
{
    context.HttpContext.Response.StatusCode = StatusCodes.Status429TooManyRequests;

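    // Only some limiters attach RetryAfter metadata (ConcurrencyLimiter doesn't).
    if (context.Lease.TryGetMetadata(
            MetadataName.RetryAfter, out var retryAfter))
    {
        // Round up: Retry-After is in whole seconds, and truncating 0.4 s to "0"
        // would tell the client to retry immediately.
        context.HttpContext.Response.Headers.Append(
            "Retry-After",
            ((int)Math.Ceiling(retryAfter.TotalSeconds)).ToString(
                System.Globalization.CultureInfo.InvariantCulture));
    }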

    await context.HttpContext.Response.WriteAsync(
        "Rate limit exceeded. Please slow down.", token);
};
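To see where the Retry-After value comes from outside the middleware, this console sketch (.NET 7+, an illustrative 1-permit, 30-second window) exhausts a fixed-window limiter and reads the RetryAfter metadata off the rejected lease. ToRetryAfterHeader is a hypothetical helper; it rounds up because the header carries whole seconds.

```csharp
using System;
using System.Globalization;
using System.Threading.RateLimiting;

using var limiter = new FixedWindowRateLimiter(new FixedWindowRateLimiterOptions
{
    PermitLimit = 1,
    Window = TimeSpan.FromSeconds(30),
    QueueLimit = 0
});

using RateLimitLease accepted = limiter.AttemptAcquire();
using RateLimitLease rejected = limiter.AttemptAcquire(); // the window's only permit is gone

// Hypothetical helper: Retry-After takes whole seconds, so round up instead of truncating.
string ToRetryAfterHeader(TimeSpan wait) =>
    ((int)Math.Ceiling(wait.TotalSeconds)).ToString(CultureInfo.InvariantCulture);

string? header = rejected.TryGetMetadata(MetadataName.RetryAfter, out TimeSpan retryAfter)
    ? ToRetryAfterHeader(retryAfter)
    : null; // not every limiter attaches RetryAfter

Console.WriteLine($"accepted={accepted.IsAcquired} rejected={rejected.IsAcquired} Retry-After: {header ?? "(none)"}");
```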

Distributed scenarios & Redis

The built-in limiters are in-process only — each pod maintains its own counters. In a horizontally scaled deployment, use a Redis-backed limiter via the RedisRateLimiting community library, which wraps the same RateLimiter abstraction:

dotnet add package RedisRateLimiting
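using StackExchange.Redis;

// The limiter resolves IConnectionMultiplexer, which AddStackExchangeRedisCache does
// not register (it only adds IDistributedCache). Register one shared connection directly.
builder.Services.AddSingleton<IConnectionMultiplexer>(_ =>
    ConnectionMultiplexer.Connect(builder.Configuration["Redis:Connection"]!));

options.AddPolicy("distributed", httpContext =>
{
    // A singleton, so it is safe to keep using after this request's scope ends.
    var redis = httpContext.RequestServices.GetRequiredService<IConnectionMultiplexer>();

    return RedisRateLimitPartition.GetSlidingWindowRateLimiter(
        partitionKey: httpContext.User.Identity?.Name ?? "anon",
        factory: _ => new RedisSlidingWindowRateLimiterOptions
        {
            ConnectionMultiplexerFactory = () => redis,
            PermitLimit = 500,
            Window      = TimeSpan.FromMinutes(1)
        });
});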

Client-side resilience with Polly

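If your code consumes a rate-limited API, combine Polly's rate limiter and retry strategies to throttle your own calls and handle 429s gracefully. AddResilienceHandler and HttpRetryStrategyOptions come from Microsoft.Extensions.Http.Resilience (built on Polly v8), not the older Polly.Extensions.Http package:

dotnet add package Microsoft.Extensions.Http.Resilience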
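using System.Net;
using System.Threading.RateLimiting;
using Microsoft.Extensions.Http.Resilience;
using Polly;

builder.Services.AddHttpClient<IProductsClient, ProductsClient>()
        .AddResilienceHandler("products-pipeline", pipeline =>
        {
            // Strategies added first sit outermost, so retry wraps the limiter
            // and every retry attempt must also acquire a permit.
            pipeline.AddRetry(new HttpRetryStrategyOptions
            {
                MaxRetryAttempts = 3,
                Delay            = TimeSpan.FromSeconds(2),
                BackoffType      = DelayBackoffType.Exponential,
                // Narrows the default predicate (transient errors incl. 5xx, 408, 429) to 429 only
                ShouldHandle     = args => ValueTask.FromResult(
                    args.Outcome.Result?.StatusCode ==
                        HttpStatusCode.TooManyRequests)
            });

            pipeline.AddRateLimiter(new SlidingWindowRateLimiter(
                new SlidingWindowRateLimiterOptions
                {
                    PermitLimit       = 50,
                    Window            = TimeSpan.FromSeconds(10),
                    SegmentsPerWindow = 5
                }));
        });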

Choosing the right algorithm

| Algorithm | Best for | Watch out for | Memory cost |
|---|---|---|---|
| Fixed window | Simple quotas, billing tiers | Boundary burst (2× spike) | Very low |
| Sliding window | Smooth public APIs | Segment count × partitions | Low–medium |
| Token bucket | Burst-tolerant consumer APIs | Tuning burst vs average | Low |
| Concurrency | Expensive ops (ML, reports) | Doesn't bound throughput | Very low |

Distributed gotcha: in-process limiters keep separate counters in each pod, so behind a load balancer a cluster of 4 replicas can admit up to 4× your configured limit. Always use a Redis-backed partitioned limiter for multi-replica deployments where correctness matters.
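The replica gotcha is easy to reproduce in-process. This sketch (illustrative numbers: 4 simulated pods, 100 permits per minute, 1,000 requests) gives each pod its own limiter behind a round-robin "load balancer", then compares against one shared limiter standing in for a centralised, Redis-backed store.

```csharp
using System;
using System.Linq;
using System.Threading.RateLimiting;

const int Replicas = 4;
const int Requests = 1_000;

// The limit you *meant* to enforce cluster-wide.
FixedWindowRateLimiter NewLimiter() => new(new FixedWindowRateLimiterOptions
{
    PermitLimit = 100,
    Window = TimeSpan.FromMinutes(1),
    QueueLimit = 0
});

// Each simulated pod has its own in-process limiter; round-robin spreads traffic evenly.
FixedWindowRateLimiter[] perPod = Enumerable.Range(0, Replicas).Select(_ => NewLimiter()).ToArray();
int admittedPerPod = 0;
for (int i = 0; i < Requests; i++)
{
    using RateLimitLease lease = perPod[i % Replicas].AttemptAcquire();
    if (lease.IsAcquired) admittedPerPod++;
}

// One shared limiter: a stand-in for a centralised counter such as Redis.
using FixedWindowRateLimiter shared = NewLimiter();
int admittedShared = 0;
for (int i = 0; i < Requests; i++)
{
    using RateLimitLease lease = shared.AttemptAcquire();
    if (lease.IsAcquired) admittedShared++;
}

foreach (var limiter in perPod) limiter.Dispose();

Console.WriteLine($"per-pod limiters: {admittedPerPod} admitted; shared limiter: {admittedShared} admitted");
```

The per-pod setup admits 400 requests against an intended limit of 100, while the shared limiter admits exactly 100.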


Wrapping up

.NET 7+ gives you production-grade rate limiting with zero external dependencies for single-node scenarios. The four algorithms cover everything from simple quotas to burst-tolerant APIs and concurrency caps on expensive operations. Add Redis for distributed enforcement, Polly for client-side resilience, and always return 429 with a Retry-After header. Your API consumers will thank you.

Questions or patterns I missed? Drop them in the comments.
