Deep technical dive: 55% traffic reduction, 40% latency drop, ~₹80L annual savings
TL;DR
We re-architected and optimized a high-throughput Flight Search & Sell microservice that integrates with multiple third-party aviation data providers and internal pricing systems.
Impact:
- 📉 Network traffic ↓ 55%
- ⚡ P95 latency ↓ 40%
- 💰 ~₹80 lakhs/year cloud savings
- 🧠 Reduced GC pressure (~30% alloc drop)
- 🛡️ Improved resiliency & rollout safety
This post walks through production-grade improvements across the HTTP stack, JSON, memory, async patterns, resiliency, caching, configuration, compression, structured logging, deployment strategy, and the .NET 8 upgrade, with concrete code.
Target audience: senior backend engineers & platform architects.
1️⃣ Context: High-Throughput Aggregation at Scale
The system:
Clients → API Gateway → Flight Search Service
├── Provider A (GDS)
├── Provider B (NDC)
├── Provider C (LCC)
├── Internal pricing engine
└── Static metadata store
Traffic profile:
- Bursty (promo campaigns)
- Heavy parallel provider calls
- Strict SLA (sub-1s P95 target)
- Large JSON payloads (100–500 KB per provider)
- High outbound bandwidth cost
Observed problems:
- Socket exhaustion
- Retry storms during provider latency spikes
- High Gen2 GC pauses
- Large LOH allocations
- Excessive string normalization
- Bandwidth-heavy responses
- Config duplication across environments
2️⃣ Architecture Evolution (Textual Diagram)
Before
Request
↓
Controller
↓
Aggregator Service
├── new HttpClient() per provider call
├── Newtonsoft.Json
├── dynamic parsing
├── Manual retry logic
├── List<T>.Contains in hot paths
├── ToLower() comparisons
├── Static configuration helpers
└── No compression
After (.NET 8 optimized)
Request
↓
Controller (CancellationToken bound to HttpContext)
↓
Orchestrator
├── Parallel fan-out (Task.WhenAll)
├── IHttpClientFactory (Named clients)
├── Polly retry + jitter
├── Typed provider clients
├── System.Text.Json source-gen ready
├── HashSet/Dictionary lookups
├── MemoryCache for static data
├── Structured logging
├── Brotli/Gzip compression
├── Config via IOptions<T>
└── Canary-aware deployment pipeline
3️⃣ HTTP Stack: From Anti-Pattern to Production-Grade
❌ The Problem: Per-call HttpClient
public async Task<string> CallProviderAsync(string url)
{
    using var client = new HttpClient();
    return await client.GetStringAsync(url);
}
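Issues:
- Socket exhaustion: every disposed client leaves its connection behind in TIME_WAIT
- No connection pool reuse, so each call pays for a new TCP/TLS handshake
- The usual workaround, a single static HttpClient, reuses connections but by default never picks up DNS changes
- No central resiliency
✅ IHttpClientFactory + Named Clients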
Registration
builder.Services.AddHttpClient("ProviderA", client =>
{
    client.BaseAddress = new Uri("https://api.providerA.com/");
    // Overall budget for the call: this timeout also covers every retry attempt.
    client.Timeout = TimeSpan.FromSeconds(8);
    client.DefaultRequestHeaders.AcceptEncoding.ParseAdd("br");
    client.DefaultRequestHeaders.AcceptEncoding.ParseAdd("gzip");
})
.ConfigurePrimaryHttpMessageHandler(() =>
{
    return new SocketsHttpHandler
    {
        // Recycle pooled connections periodically so DNS changes are picked up.
        PooledConnectionLifetime = TimeSpan.FromMinutes(5),
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Brotli
    };
})
.AddPolicyHandler(GetRetryPolicy());
Best practices:
- Use SocketsHttpHandler
- Control pooled connection lifetime
- Enable decompression at handler level
- Centralize retry policy
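The "Large LOH allocations" problem listed earlier is often tied to reading a 100–500 KB body into a single string, as `GetStringAsync` does. One possible mitigation, sketched below, is to stream the response straight into the deserializer. Everything here is illustrative: `ProviderReader`, `StubProviderHandler`, the `provider.example` URL, and the payload shape are assumptions, and the stub handler only exists so the sketch runs without a network.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Demo wiring: the stub handler stands in for a real provider.
using var client = new HttpClient(new StubProviderHandler())
{
    BaseAddress = new Uri("https://provider.example/")
};

PriceDto? price = await ProviderReader.GetPriceAsync(client, "price", CancellationToken.None);
Console.WriteLine($"{price?.Amount} {price?.Currency}");

public sealed class PriceDto
{
    public decimal Amount { get; init; }
    public string? Currency { get; init; }
}

public static class ProviderReader
{
    // Web defaults: camelCase names, case-insensitive matching. Created once and reused.
    private static readonly JsonSerializerOptions Options = new(JsonSerializerDefaults.Web);

    public static async Task<PriceDto?> GetPriceAsync(
        HttpClient client, string path, CancellationToken ct)
    {
        // ResponseHeadersRead completes once headers arrive, so the body is
        // not buffered into one large string or byte array up front.
        using var response = await client.GetAsync(
            path, HttpCompletionOption.ResponseHeadersRead, ct);
        response.EnsureSuccessStatusCode();

        await using var body = await response.Content.ReadAsStreamAsync(ct);
        return await JsonSerializer.DeserializeAsync<PriceDto>(body, Options, ct);
    }
}

// Hypothetical stand-in for a provider endpoint; returns a canned JSON body.
public sealed class StubProviderHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var response = new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent(
                "{\"amount\": 4999.5, \"currency\": \"INR\"}",
                Encoding.UTF8, "application/json")
        };
        return Task.FromResult(response);
    }
}
```

In the service itself the `HttpClient` would come from `IHttpClientFactory.CreateClient("ProviderA")` rather than being constructed by hand.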
4️⃣ Resilience Engineering with Polly (Retry + Jitter)
Without jitter, retries synchronize across instances and can push an already struggling provider into a full outage.
static IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    return Policy
        .Handle<HttpRequestException>()
        .OrResult<HttpResponseMessage>(r =>
            (int)r.StatusCode >= 500)
        .WaitAndRetryAsync(
            retryCount: 3,
            // Exponential backoff (200, 400, 800 ms) plus up to 100 ms of jitter.
            // Random.Shared is thread-safe; a shared `new Random()` instance is not.
            sleepDurationProvider: retryAttempt =>
                TimeSpan.FromMilliseconds(
                    Math.Pow(2, retryAttempt) * 100 +
                    Random.Shared.Next(0, 100)));
}
Advanced considerations:
- Separate policies for internal vs third-party calls
- Avoid retrying 4xx
- Combine with circuit breaker in extreme cases
- Keep retry count low (don't amplify latency)
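To make the backoff arithmetic concrete without depending on Polly, here is a small sketch of a delay calculator and a retry classifier. It uses the "full jitter" variant (a random delay between zero and the capped exponential ceiling), which is a different scheme from the additive jitter in the policy above. The `Backoff` class, the 2-second cap, and the choice to treat 408 and 429 as retryable are assumptions made for illustration.

```csharp
using System;
using System.Net;

for (int attempt = 1; attempt <= 3; attempt++)
{
    TimeSpan delay = Backoff.NextDelay(
        attempt, TimeSpan.FromMilliseconds(100), TimeSpan.FromSeconds(2));
    Console.WriteLine($"attempt {attempt}: wait {delay.TotalMilliseconds:F0} ms");
}

Console.WriteLine(Backoff.IsTransient(HttpStatusCode.NotFound));           // False
Console.WriteLine(Backoff.IsTransient(HttpStatusCode.ServiceUnavailable)); // True

public static class Backoff
{
    // Full jitter: uniform random delay in [0, min(cap, base * 2^attempt)).
    // Spreading retries over the whole window keeps instances from retrying in lockstep.
    public static TimeSpan NextDelay(int attempt, TimeSpan baseDelay, TimeSpan cap)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));

        double ceilingMs = Math.Min(
            cap.TotalMilliseconds,
            baseDelay.TotalMilliseconds * Math.Pow(2, attempt));

        // Random.Shared is thread-safe (.NET 6+).
        return TimeSpan.FromMilliseconds(Random.Shared.NextDouble() * ceilingMs);
    }

    // Assumption: retry 5xx plus 408/429; every other 4xx is a caller error.
    public static bool IsTransient(HttpStatusCode status) =>
        (int)status >= 500
        || status == HttpStatusCode.RequestTimeout
        || status == HttpStatusCode.TooManyRequests;
}
```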
5️⃣ JSON: Newtonsoft → System.Text.Json
Why switch?
- Lower allocations
- Faster serialization
- Native support in .NET runtime
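- Reduced dependency surface
❌ Before
dynamic response = JsonConvert.DeserializeObject(json);
return response.data.price;
Reflection-heavy, late-bound, and slow: every member access is resolved at runtime.
✅ After (Strongly Typed)
public sealed class ProviderResponse<T>
{
    public T? Data { get; init; }
}
public sealed class PriceDto
{
    public decimal Amount { get; init; }
    public string? Currency { get; init; }
}
// System.Text.Json matches property names case-sensitively by default, so
// "data"/"amount" would not bind to Data/Amount without these options.
// Create the options once and reuse them; they cache type metadata.
private static readonly JsonSerializerOptions JsonOptions = new(JsonSerializerDefaults.Web);
var typed = JsonSerializer.Deserialize<ProviderResponse<PriceDto>>(json, JsonOptions);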
Benefits:
- Compile-time safety
- Faster member access
- Lower memory overhead
- Better AOT compatibility in .NET 8
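The AOT point above depends on System.Text.Json source generation, which emits serialization metadata at compile time instead of reflecting at runtime. A minimal sketch, assuming a camelCase payload; `PriceEnvelope` and `ProviderJsonContext` are hypothetical names, and a non-generic envelope is used only to keep the generated property name simple.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

const string json = "{\"data\":{\"amount\":4999.5,\"currency\":\"INR\"}}";

// The generated JsonTypeInfo replaces runtime reflection over PriceEnvelope.
PriceEnvelope? envelope = JsonSerializer.Deserialize(
    json, ProviderJsonContext.Default.PriceEnvelope);

Console.WriteLine(envelope?.Data?.Currency);

public sealed class PriceDto
{
    public decimal Amount { get; set; }
    public string? Currency { get; set; }
}

public sealed class PriceEnvelope
{
    public PriceDto? Data { get; set; }
}

// The source generator fills in this partial class at build time.
[JsonSourceGenerationOptions(PropertyNamingPolicy = JsonKnownNamingPolicy.CamelCase)]
[JsonSerializable(typeof(PriceEnvelope))]
internal partial class ProviderJsonContext : JsonSerializerContext
{
}
```

The trade-off is that every root type to be serialized needs its own `[JsonSerializable]` attribute on the context.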
6️⃣ String Allocation & Case Handling
Hot path mistake:
if (currency.ToLower() == "inr")
Allocates a new string per call and applies culture-sensitive casing rules.
Correct approach
if (string.Equals(currency, "INR", StringComparison.OrdinalIgnoreCase))
Zero allocation, null-safe, and independent of the current culture.
Observed improvement: noticeable drop in Gen0 collections under load.
7️⃣ Data Structures: O(n) → O(1)
❌ List Contains
if (blockedAirlines.Contains(flight.AirlineCode))
O(n) in hot path.
✅ HashSet
// Build once (at startup or on config reload), not per request:
// constructing the set is itself O(n).
var blockedSet = new HashSet<string>(blockedAirlines);
if (blockedSet.Contains(flight.AirlineCode))
O(1) average lookup.
Fast lookup via Dictionary
var airportMap = airports.ToDictionary(a => a.Code);
if (airportMap.TryGetValue("DEL", out var airport))
{
    // O(1) average lookup, no list scan
}
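Since this lookup data is built once and read many times, the frozen collections added in .NET 8 are one option worth benchmarking, and passing `StringComparer.OrdinalIgnoreCase` folds the case handling from section 6 into the lookup itself. A sketch under assumptions: the airline codes and the `Airport` record are hypothetical placeholders, not the service's real types.

```csharp
using System;
using System.Collections.Frozen;

string[] blockedAirlines = { "XX", "YY" };   // hypothetical codes
var airports = new[]
{
    new Airport("DEL", "Delhi"),
    new Airport("BOM", "Mumbai"),
};

// Built once at startup: frozen collections trade slower construction
// for faster reads and cannot be modified afterwards.
FrozenSet<string> blocked =
    blockedAirlines.ToFrozenSet(StringComparer.OrdinalIgnoreCase);
FrozenDictionary<string, Airport> airportMap =
    airports.ToFrozenDictionary(a => a.Code, StringComparer.OrdinalIgnoreCase);

// The comparer handles casing, so no ToLower()/ToUpper() on the hot path.
Console.WriteLine(blocked.Contains("xx"));   // True
Console.WriteLine(airportMap.TryGetValue("del", out var airport)
    ? airport.City
    : "not found");                          // Delhi

public sealed record Airport(string Code, string City);
```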
8️⃣ Async Discipline & Cancellation Propagation
Golden Rule:
Every async boundary must propagate CancellationToken.
public async Task<SearchResponse> SearchAsync(
    SearchRequest request,
    CancellationToken cancellationToken)
{
    var tasks = _providers
        .Select(p => p.SearchAsync(request, cancellationToken));
    var results = await Task.WhenAll(tasks);
    return Aggregate(results);
}
Best practices:
- No .Result or .Wait()
- Avoid Task.Run in ASP.NET
- Avoid unbounded parallelism
- Consider Parallel.ForEachAsync with degree limits
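Propagating the token pays off most when it also carries a deadline. The sketch below links a request token to a time budget with `CancellationTokenSource.CreateLinkedTokenSource`, so a slow provider is cancelled rather than holding the whole fan-out open. The fake providers, their latencies, and the 500 ms budget are assumptions; in a controller the request token would come from the action's `CancellationToken` parameter.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Simulated providers with different latencies (hypothetical stand-ins).
Func<CancellationToken, Task<string>>[] providers =
{
    ct => FakeProvider("A", 50, ct),
    ct => FakeProvider("B", 5000, ct),   // slower than the deadline below
};

// requestCts stands in for the token ASP.NET Core binds to the request.
using var requestCts = new CancellationTokenSource();
using var deadlineCts = CancellationTokenSource.CreateLinkedTokenSource(requestCts.Token);
deadlineCts.CancelAfter(TimeSpan.FromMilliseconds(500));

try
{
    // Every provider receives the linked token, so either a client
    // disconnect or the deadline stops all outstanding calls.
    string[] results = await Task.WhenAll(providers.Select(p => p(deadlineCts.Token)));
    Console.WriteLine(string.Join(", ", results));
}
catch (OperationCanceledException)
{
    Console.WriteLine("Search cancelled: deadline hit or client disconnected.");
}

static async Task<string> FakeProvider(string name, int latencyMs, CancellationToken ct)
{
    await Task.Delay(latencyMs, ct);   // Task.Delay observes the token
    return name;
}
```

Note that `Task.WhenAll` fails the whole search if any provider is cancelled; returning partial results would need per-task handling instead.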
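9️⃣ Thread & Task Management
Avoid:
Task.Run(() => ProviderCall());
ASP.NET Core already runs request code on the ThreadPool, so wrapping I/O-bound calls in Task.Run only adds scheduling overhead.
If limiting concurrency:
using var semaphore = new SemaphoreSlim(5);
var tasks = providers.Select(async provider =>
{
    await semaphore.WaitAsync(cancellationToken);
    // Assumes CallAsync accepts a CancellationToken.
    try { await provider.CallAsync(cancellationToken); }
    finally { semaphore.Release(); }
});
// Await every call so exceptions surface and the semaphore
// is not disposed while work is still in flight.
await Task.WhenAll(tasks);
Better: use bounded parallelism patterns.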
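One such bounded pattern is `Parallel.ForEachAsync` (available since .NET 6), which caps the number of concurrent operations and passes a cancellation token to each one. A minimal sketch: the provider names, the limit of 3, and the 50 ms simulated call are assumptions for illustration.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

string[] providers = { "A", "B", "C", "D", "E", "F" };   // hypothetical provider names
var results = new ConcurrentBag<string>();

var options = new ParallelOptions
{
    MaxDegreeOfParallelism = 3,                  // at most 3 calls in flight
    CancellationToken = CancellationToken.None   // pass the request token in a real handler
};

await Parallel.ForEachAsync(providers, options, async (provider, ct) =>
{
    // ct is cancelled when options.CancellationToken fires or another iteration throws.
    await Task.Delay(50, ct);                    // stands in for the provider call
    results.Add($"{provider}: ok");
});

Console.WriteLine($"{results.Count} providers completed");
```

Unlike the semaphore version, there is no lock to release and no task list to await; the call completes only when every iteration has finished.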
🔟 Caching Static Metadata
Airport lists, airline lists, and fare rules rarely change.
// Registration
builder.Services.AddMemoryCache();
// Lookup: on a miss, GetOrCreate loads the data and caches it in one call.
var airports = _cache.GetOrCreate("Airports", entry =>
{
    entry.AbsoluteExpirationRelativeToNow = TimeSpan.FromHours(24);
    return LoadAirportsFromDb();
});
Impact:
- Reduced DB calls
- Reduced response time variance
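1️⃣1️⃣ Response Compression = Direct Cost Savings
builder.Services.AddResponseCompression(options =>
{
    // Off by default for HTTPS because of CRIME/BREACH-style attacks; enable it
    // only for responses that do not mix secrets with attacker-controlled input.
    options.EnableForHttps = true;
    options.Providers.Add<BrotliCompressionProvider>();
    options.Providers.Add<GzipCompressionProvider>();
});
builder.Services.Configure<BrotliCompressionProviderOptions>(o =>
{
    o.Level = CompressionLevel.Fastest;
});
// Registration alone does nothing: the middleware must also be added to the
// pipeline, ahead of any middleware that writes the response.
app.UseResponseCompression();
Results:
- 📉 55% bandwidth reduction
- 💰 ~₹80 lakhs/year egress savings
- Faster client TTFB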
Compression alone paid for the optimization effort.
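Before enabling compression, it can help to estimate the saving on a representative payload. The sketch below compresses a synthetic flight list with `BrotliStream` at `CompressionLevel.Fastest` and reports the size difference. The payload shape is invented and highly repetitive, so its ratio says nothing about the 55% measured in production; swap in captured responses for a meaningful number.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Linq;
using System.Text.Json;

// Synthetic payload: 500 similar flight entries (hypothetical shape).
var flights = Enumerable.Range(0, 500).Select(i => new
{
    airline = "AI",
    flightNumber = 100 + i,
    origin = "DEL",
    destination = "BOM",
    fare = 4999.5m + i,
    currency = "INR"
});
byte[] raw = JsonSerializer.SerializeToUtf8Bytes(flights);

byte[] compressed = Compress(raw, CompressionLevel.Fastest);
double saved = 1 - (double)compressed.Length / raw.Length;
Console.WriteLine($"raw={raw.Length} B, brotli={compressed.Length} B, saved={saved:P0}");

static byte[] Compress(byte[] data, CompressionLevel level)
{
    using var output = new MemoryStream();
    using (var brotli = new BrotliStream(output, level, leaveOpen: true))
    {
        brotli.Write(data, 0, data.Length);
    }   // disposing the BrotliStream flushes the final block
    return output.ToArray();
}
```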
1️⃣2️⃣ IOptions & Config Redesign
Structure
appsettings.json
appsettings.Staging.json
appsettings.Production.json
Typed Config
builder.Services.Configure<ProviderSettings>(
    builder.Configuration.GetSection("ProviderSettings"));
Matching section in appsettings.json:
{
  "ProviderSettings": {
    "TimeoutSeconds": 8,
    "RetryCount": 3,
    "EnableCompression": true
  }
}
Centralized. Environment-aware. Testable.
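The post does not show the `ProviderSettings` class itself, so the following is a guess at its shape based on the JSON keys, with data-annotation ranges added to show how a bad environment override could be caught early. The ranges are assumptions. The sketch validates with the BCL `Validator` so it runs standalone; in the service, the same attributes could be enforced at startup through options validation (`ValidateDataAnnotations` and `ValidateOnStart`).

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

var settings = new ProviderSettings { TimeoutSeconds = 8, RetryCount = 3, EnableCompression = true };
Console.WriteLine(SettingsCheck.Describe(settings));

var broken = new ProviderSettings { TimeoutSeconds = 0, RetryCount = 50 };
Console.WriteLine(SettingsCheck.Describe(broken));

// Assumed shape: property names mirror the keys in the ProviderSettings section.
public sealed class ProviderSettings
{
    [Range(1, 30)]
    public int TimeoutSeconds { get; set; } = 8;

    [Range(0, 5)]
    public int RetryCount { get; set; } = 3;

    public bool EnableCompression { get; set; } = true;
}

public static class SettingsCheck
{
    public static bool TryValidate(ProviderSettings settings, out List<ValidationResult> errors)
    {
        errors = new List<ValidationResult>();
        return Validator.TryValidateObject(
            settings, new ValidationContext(settings), errors, validateAllProperties: true);
    }

    public static string Describe(ProviderSettings settings) =>
        TryValidate(settings, out var errors)
            ? "settings valid"
            : "invalid: " + string.Join("; ", errors);
}
```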
1️⃣3️⃣ Structured Logging (No String Concatenation)
_logger.LogInformation(
    "Search completed {SearchId} {LatencyMs} {Provider}",
    searchId,
    latencyMs,
    providerName);
Each placeholder is captured as a named field, so entries are queryable in ELK/Datadog/AppInsights.
1️⃣4️⃣ .NET 6 → .NET 8 Upgrade Gains
Observed:
- 8–12% throughput boost
- Improved ThreadPool heuristics
- Faster JSON
- Reduced memory footprint
These gains required no code changes.
1️⃣5️⃣ Before vs After Metrics
| Metric | Before | After | Change |
|---|---|---|---|
| P95 Latency | 850ms | 510ms | ↓ 40% |
| Network Traffic | 100% | 45% | ↓ 55% |
| Annual Infra Cost | Baseline | -₹80L | Savings |
| Alloc Rate | High | Reduced | ↓ ~30% |
| Socket Errors | Frequent | None | Stable |
1️⃣6️⃣ Canary Deployment Strategy
- Deploy to 5% traffic
- Monitor:
  - P95 / P99 latency
  - Retry rate
  - Provider error %
  - GC pause time
- Gradual ramp-up:
  - 5% → 25% → 50% → 100%
- Auto rollback on SLA breach
Reduced risk during optimization rollout.
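The ramp-and-rollback rule can be written down as a small decision function, which makes the thresholds explicit and testable. This is a sketch of the idea, not the pipeline used for this rollout: `CanaryGate`, the metric names, and the retry and error thresholds are assumptions. Only the 5/25/50/100 ramp and the sub-1s P95 target come from the post.

```csharp
using System;

int[] rampPercent = { 5, 25, 50, 100 };
var sla = new Thresholds(P95Ms: 1000, MaxRetryRate: 0.05, MaxProviderErrorRate: 0.02);

// Hypothetical metrics sampled while the canary serves 5% of traffic.
var healthy = new CanaryMetrics(P95Ms: 510, RetryRate: 0.01, ProviderErrorRate: 0.005);
var degraded = new CanaryMetrics(P95Ms: 1400, RetryRate: 0.01, ProviderErrorRate: 0.005);

Console.WriteLine(CanaryGate.Decide(healthy, sla, currentStage: 0, rampPercent));   // 25
Console.WriteLine(CanaryGate.Decide(degraded, sla, currentStage: 0, rampPercent));  // 0

public sealed record CanaryMetrics(double P95Ms, double RetryRate, double ProviderErrorRate);
public sealed record Thresholds(double P95Ms, double MaxRetryRate, double MaxProviderErrorRate);

public static class CanaryGate
{
    // Returns the next traffic percentage, or 0 to signal a rollback.
    public static int Decide(CanaryMetrics m, Thresholds t, int currentStage, int[] ramp)
    {
        bool breach = m.P95Ms > t.P95Ms
            || m.RetryRate > t.MaxRetryRate
            || m.ProviderErrorRate > t.MaxProviderErrorRate;
        if (breach) return 0;

        // Advance one stage, staying at the final stage once fully rolled out.
        return ramp[Math.Min(currentStage + 1, ramp.Length - 1)];
    }
}
```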
1️⃣7️⃣ Production Rollout Checklist
- [ ] Load test with realistic payload sizes
- [ ] Validate retry behavior under provider failures
- [ ] Verify compression headers
- [ ] Confirm cancellation propagation
- [ ] Enable structured logging
- [ ] Warm static cache
- [ ] Canary release
- [ ] Monitor cost dashboard
- [ ] Validate GC metrics
🎯 Key Takeaways
- HttpClient misuse kills performance
- Compression = cost optimization
- Avoid string allocations in hot paths
- Use HashSet/Dictionary for lookups
- Never ignore CancellationToken
- Replace dynamic with generics
- .NET 8 gives free performance
- Canary deployments reduce blast radius
- Measure everything before & after