Jaydeep Kumar Sahu

Posted on Feb 21

🚀 Hardening & Optimizing a .NET Flight Search & Sell Microservice

#dotnet #microservices #backend #performance

Deep technical dive: 55% traffic reduction, 40% latency drop, ~₹80L annual savings

TL;DR

We re-architected and optimized a high-throughput Flight Search & Sell microservice that integrates with multiple third-party aviation data providers and internal pricing systems.

Impact:

📉 Network traffic ↓ 55%
⚡ P95 latency ↓ 40%
💰 ~₹80 lakhs/year cloud savings
🧠 Reduced GC pressure (~30% alloc drop)
🛡️ Improved resiliency & rollout safety

This post walks through production-grade improvements across the HTTP stack, JSON, memory, async patterns, resiliency, caching, configuration, compression, structured logging, deployment strategy, and the .NET 8 upgrade, with concrete code.

Target audience: senior backend engineers & platform architects.

1️⃣ Context: High-Throughput Aggregation at Scale

The system:

Clients → API Gateway → Flight Search Service
                                ├── Provider A (GDS)
                                ├── Provider B (NDC)
                                ├── Provider C (LCC)
                                ├── Internal pricing engine
                                └── Static metadata store

Traffic profile:

Bursty (promo campaigns)
Heavy parallel provider calls
Strict SLA (sub-1s P95 target)
Large JSON payloads (100–500KB per provider)
High outbound bandwidth cost

Observed problems:

Socket exhaustion
Retry storms during provider latency spikes
High Gen2 GC pauses
Large LOH allocations
Excessive string normalization
Bandwidth-heavy responses
Config duplication across environments

2️⃣ Architecture Evolution (Textual Diagram)

Before

Request
  ↓
Controller
  ↓
Aggregator Service
  ├── new HttpClient() per provider call
  ├── Newtonsoft.Json
  ├── dynamic parsing
  ├── Manual retry logic
  ├── List<T>.Contains in hot paths
  ├── ToLower() comparisons
  ├── Static configuration helpers
  └── No compression

After (.NET 8 optimized)

Request
  ↓
Controller (CancellationToken bound to HttpContext)
  ↓
Orchestrator
  ├── Parallel fan-out (Task.WhenAll)
  ├── IHttpClientFactory (Named clients)
  ├── Polly retry + jitter
  ├── Typed provider clients
  ├── System.Text.Json source-gen ready
  ├── HashSet/Dictionary lookups
  ├── MemoryCache for static data
  ├── Structured logging
  ├── Brotli/Gzip compression
  ├── Config via IOptions<T>
  └── Canary-aware deployment pipeline

3️⃣ HTTP Stack: From Anti-Pattern to Production-Grade

❌ The Problem: Per-call HttpClient

public async Task<string> CallProviderAsync(string url)
{
    using var client = new HttpClient();
    return await client.GetStringAsync(url);
}

Issues:

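Socket exhaustion: every disposed client leaves its connection in TIME_WAIT
No connection reuse: every call pays a fresh TCP + TLS handshake
No central place for timeouts, headers, or resiliency policies
The naive fix (one static HttpClient) solves the sockets but pins stale DNS unless pooled connections are recycled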

✅ IHttpClientFactory + Named Clients

Registration

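// Requires: using System.Net; (DecompressionMethods) and the Microsoft.Extensions.Http.Polly package (AddPolicyHandler)
builder.Services.AddHttpClient("ProviderA", client =>
{
    client.BaseAddress = new Uri("https://api.providerA.com/");
    // HttpClient.Timeout bounds the whole call, including every retry added by the policy below.
    client.Timeout = TimeSpan.FromSeconds(8);
})
.ConfigurePrimaryHttpMessageHandler(() => new SocketsHttpHandler
{
    // Recycle pooled connections so DNS changes are picked up.
    PooledConnectionLifetime = TimeSpan.FromMinutes(5),
    // The handler adds "Accept-Encoding: gzip, br" itself and decompresses responses,
    // so no manual AcceptEncoding headers are needed on the client.
    AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Brotli
})
.AddPolicyHandler(GetRetryPolicy());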

Best practices:

Use SocketsHttpHandler
Control pooled connection lifetime
Enable decompression at handler level
Centralize retry policy
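Outside the DI container (for example a console tool or background worker that does not use IHttpClientFactory), the same guidance can be followed with a single long-lived HttpClient. A minimal sketch, assuming a hypothetical `ProviderHttp` holder class; PooledConnectionLifetime is what keeps the shared client from pinning stale DNS:

```csharp
using System;
using System.Net;
using System.Net.Http;

// Hypothetical holder: one HttpClient for the process lifetime instead of one per call.
public static class ProviderHttp
{
    private static readonly SocketsHttpHandler Handler = new()
    {
        // Recycle pooled connections periodically so DNS changes are eventually honored.
        PooledConnectionLifetime = TimeSpan.FromMinutes(5),
        // Sends Accept-Encoding for gzip/br and transparently decompresses responses.
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Brotli
    };

    // Safe to share across threads: HttpClient's send methods are thread-safe.
    public static HttpClient Client { get; } = new(Handler, disposeHandler: false)
    {
        Timeout = TimeSpan.FromSeconds(8)
    };
}
```

Call sites then use `ProviderHttp.Client.GetStringAsync(url, cancellationToken)` and never dispose the client.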

4️⃣ Resilience Engineering with Polly (Retry + Jitter)

Without jitter, retries synchronize across instances → provider meltdown.

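// Requires the Polly package (pulled in by Microsoft.Extensions.Http.Polly).
static IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    return Policy
        .Handle<HttpRequestException>()
        .OrResult<HttpResponseMessage>(r =>
            (int)r.StatusCode >= 500) // retry 5xx only, never 4xx
        .WaitAndRetryAsync(
            retryCount: 3,
            // Exponential backoff (200, 400, 800 ms) plus 0-99 ms of random jitter.
            // Random.Shared is thread-safe; a single shared `new Random()` is not,
            // and this delegate runs concurrently for many in-flight requests.
            sleepDurationProvider: retryAttempt =>
                TimeSpan.FromMilliseconds(
                    Math.Pow(2, retryAttempt) * 100 +
                    Random.Shared.Next(0, 100)));
}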

Advanced considerations:

Separate policies for internal vs third-party calls
Avoid retrying 4xx
Combine with circuit breaker in extreme cases
Keep retry count low (don’t amplify latency)
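Keeping the delay formula in a pure function makes the backoff schedule easy to unit-test without Polly. A minimal sketch, assuming a hypothetical `Backoff.Delay` helper that mirrors the policy above (100 ms base, doubling per attempt, 0-99 ms additive jitter) with an injectable Random:

```csharp
using System;

// Hypothetical helper mirroring the retry policy's sleepDurationProvider.
public static class Backoff
{
    // attempt is 1-based, as Polly passes it: 1 -> ~200 ms, 2 -> ~400 ms, 3 -> ~800 ms.
    public static TimeSpan Delay(int attempt, Random random, int baseMs = 100, int maxJitterMs = 100)
    {
        if (attempt < 1) throw new ArgumentOutOfRangeException(nameof(attempt));

        double exponential = Math.Pow(2, attempt) * baseMs;
        int jitter = random.Next(0, maxJitterMs); // 0 .. maxJitterMs - 1
        return TimeSpan.FromMilliseconds(exponential + jitter);
    }
}
```

A common alternative is "full jitter", which picks a random delay between zero and the exponential value; it spreads retries across instances more widely at the cost of a less predictable minimum wait.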

5️⃣ JSON: Newtonsoft → System.Text.Json

Why switch?

Lower allocations
Faster serialization
Native support in .NET runtime
Reduced dependency surface

❌ Before

dynamic response = JsonConvert.DeserializeObject(json);
return response.data.price;

Reflection-heavy. Late-bound. Slow.

✅ After (Strongly Typed)

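public sealed class ProviderResponse<T>
{
    public T? Data { get; init; }
}

public sealed class PriceDto
{
    public decimal Amount { get; init; }
    public string Currency { get; init; } = "";
}

// Unlike Newtonsoft, System.Text.Json matches property names case-sensitively by default,
// so "data" in the payload would silently not bind to Data. Web defaults = camelCase + case-insensitive.
// Create the options once and reuse them; a new instance per call discards its cached metadata.
private static readonly JsonSerializerOptions JsonOptions = new(JsonSerializerDefaults.Web);

var typed = JsonSerializer.Deserialize<ProviderResponse<PriceDto>>(json, JsonOptions);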

Benefits:

Compile-time safety
Faster member access
Lower memory overhead
Better AOT compatibility in .NET 8
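The architecture diagram calls the new JSON layer "source-gen ready". As a sketch of what that next step could look like: a JsonSerializerContext lets the compiler generate the serialization metadata that would otherwise be built by reflection at runtime, which is what makes trimming and Native AOT workable. The context name `ProviderJsonContext`, the `PriceResponse` property name, and the `ProviderJson.ParsePrice` helper are illustrative assumptions; the DTOs use `set` rather than `init` to stay within what every source-generator version supports.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

public sealed class ProviderResponse<T>
{
    public T? Data { get; set; }
}

public sealed class PriceDto
{
    public decimal Amount { get; set; }
    public string Currency { get; set; } = "";
}

// Hypothetical context: metadata for the listed types is generated at compile time.
// CamelCase naming maps "data"/"amount"/"currency" in provider JSON to the C# properties.
[JsonSourceGenerationOptions(PropertyNamingPolicy = JsonKnownNamingPolicy.CamelCase)]
[JsonSerializable(typeof(ProviderResponse<PriceDto>), TypeInfoPropertyName = "PriceResponse")]
internal partial class ProviderJsonContext : JsonSerializerContext
{
}

public static class ProviderJson
{
    // The JsonTypeInfo overload uses the generated metadata instead of runtime reflection.
    public static ProviderResponse<PriceDto>? ParsePrice(string json) =>
        JsonSerializer.Deserialize(json, ProviderJsonContext.Default.PriceResponse);
}
```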

6️⃣ String Allocation & Case Handling

Hot path mistake:

if (currency.ToLower() == "inr")

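ToLower() allocates a new string on every call, and it is culture-sensitive: under a Turkish culture "INR".ToLower() returns "ınr" (dotless ı), so the comparison silently fails.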

Correct approach

if (string.Equals(currency, "INR", StringComparison.OrdinalIgnoreCase))

Zero allocation. Culture-safe.

Observed improvement: noticeable drop in Gen0 collections under load.
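The same idea covers the "excessive string normalization" listed among the observed problems: provider fields that arrive padded or in mixed case can be compared without Trim() or ToUpper() copies by working on a ReadOnlySpan<char>. A minimal sketch; the `CurrencyCodes.IsInr` helper is hypothetical:

```csharp
using System;

public static class CurrencyCodes
{
    // Compares without allocating: AsSpan().Trim() returns a slice of the original string,
    // and the OrdinalIgnoreCase comparison never builds a lowercased copy.
    public static bool IsInr(string? raw) =>
        raw is not null &&
        raw.AsSpan().Trim().Equals("INR", StringComparison.OrdinalIgnoreCase);
}
```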

7️⃣ Data Structures: O(n) → O(1)

❌ List Contains

if (blockedAirlines.Contains(flight.AirlineCode))

O(n) in hot path.

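✅ HashSet

// Build once (at startup or on cache refresh), not per request
var blockedSet = new HashSet<string>(blockedAirlines, StringComparer.OrdinalIgnoreCase);

if (blockedSet.Contains(flight.AirlineCode))

O(1) average lookup. Building the set inside each request would reintroduce an O(n) cost per call.

Fast lookup via Dictionary

var airportMap = airports.ToDictionary(a => a.Code, StringComparer.OrdinalIgnoreCase);

if (airportMap.TryGetValue("DEL", out var airport))
{
    // O(1) average lookup, no linear scan over the airport list
}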
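Putting both lookups together: the set and the map are built once and then only read per request. A minimal sketch under assumptions: the `Flight` and `Airport` records and the `FlightFilter` class are hypothetical stand-ins for the service's own types. On .NET 8, `FrozenSet<T>` and `FrozenDictionary<TKey, TValue>` (System.Collections.Frozen) are a further option for data that never changes after construction.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record Airport(string Code, string City);
public sealed record Flight(string AirlineCode, string Origin);

// Hypothetical filter: all O(n) work happens in the constructor, never per request.
public sealed class FlightFilter
{
    private readonly HashSet<string> _blockedAirlines;
    private readonly Dictionary<string, Airport> _airports;

    public FlightFilter(IEnumerable<string> blockedAirlines, IEnumerable<Airport> airports)
    {
        // Case-insensitive comparers replace ToUpper()/ToLower() normalization on lookup.
        _blockedAirlines = new HashSet<string>(blockedAirlines, StringComparer.OrdinalIgnoreCase);
        _airports = airports.ToDictionary(a => a.Code, StringComparer.OrdinalIgnoreCase);
    }

    // O(1) average per flight instead of a List<T>.Contains scan.
    public IEnumerable<Flight> Allowed(IEnumerable<Flight> flights) =>
        flights.Where(f => !_blockedAirlines.Contains(f.AirlineCode));

    public string? CityFor(string airportCode) =>
        _airports.TryGetValue(airportCode, out var airport) ? airport.City : null;
}
```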

8️⃣ Async Discipline & Cancellation Propagation

Golden Rule:

Every async boundary must propagate CancellationToken.

public async Task<SearchResponse> SearchAsync(
    SearchRequest request,
    CancellationToken cancellationToken)
{
    var tasks = _providers
        .Select(p => p.SearchAsync(request, cancellationToken));

    var results = await Task.WhenAll(tasks);

    return Aggregate(results);
}
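Propagating the request token covers client disconnects, but one slow provider can still hold the whole fan-out past the SLA. One way to bound that is to link a per-provider timeout to the request token. A sketch under assumptions: representing each provider call as a delegate, the `FanOut` name, and returning partial results when a single provider times out are illustrative choices, not necessarily what the service does.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class FanOut
{
    // Each element is one provider's search call; it must honor the token it is given.
    public static async Task<IReadOnlyList<T>> SearchAllAsync<T>(
        IEnumerable<Func<CancellationToken, Task<IReadOnlyList<T>>>> providerCalls,
        TimeSpan perProviderBudget,
        CancellationToken requestToken)
    {
        var tasks = providerCalls.Select(call => CallWithBudgetAsync(call, perProviderBudget, requestToken));
        var perProvider = await Task.WhenAll(tasks);
        return perProvider.SelectMany(r => r).ToList();
    }

    private static async Task<IReadOnlyList<T>> CallWithBudgetAsync<T>(
        Func<CancellationToken, Task<IReadOnlyList<T>>> call,
        TimeSpan budget,
        CancellationToken requestToken)
    {
        // Cancelled when either the client disconnects or this provider's budget runs out.
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(requestToken);
        cts.CancelAfter(budget);
        try
        {
            return await call(cts.Token);
        }
        catch (OperationCanceledException) when (!requestToken.IsCancellationRequested)
        {
            // Only this provider timed out: drop its results and keep the others.
            return Array.Empty<T>();
        }
    }
}
```

If the request token itself is cancelled, the exception is not caught and the whole search is abandoned, which is the desired behavior for a disconnected client.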

Best practices:

No .Result or .Wait()
Avoid Task.Run in ASP.NET
Avoid unbounded parallelism
Consider Parallel.ForEachAsync with degree limits
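For the bounded-parallelism point, Parallel.ForEachAsync (.NET 6+) caps concurrency and flows a token into each iteration. A minimal sketch, assuming a hypothetical `BoundedFanOut.RunAsync` wrapper and a caller-supplied `fetch` delegate; results go into a ConcurrentBag because ForEachAsync itself returns no values:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class BoundedFanOut
{
    public static async Task<IReadOnlyCollection<TResult>> RunAsync<TSource, TResult>(
        IEnumerable<TSource> items,
        Func<TSource, CancellationToken, ValueTask<TResult>> fetch,
        int maxDegreeOfParallelism,
        CancellationToken cancellationToken)
    {
        var results = new ConcurrentBag<TResult>();
        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = maxDegreeOfParallelism, // at most N calls in flight
            CancellationToken = cancellationToken            // stops scheduling new items when cancelled
        };

        await Parallel.ForEachAsync(items, options, async (item, ct) =>
        {
            // ct is cancelled with options.CancellationToken and also if another iteration throws.
            results.Add(await fetch(item, ct));
        });

        return results;
    }
}
```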

9️⃣ Thread & Task Management

Avoid:

Task.Run(() => ProviderCall());

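ASP.NET Core request handlers already run on thread-pool threads, so wrapping an I/O-bound call in Task.Run only adds a thread hop and extra scheduling.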

If limiting concurrency:

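using var semaphore = new SemaphoreSlim(5); // at most 5 provider calls in flight

var tasks = providers.Select(async provider =>
{
    await semaphore.WaitAsync(cancellationToken);
    try { await provider.CallAsync(cancellationToken); }
    finally { semaphore.Release(); }
});

// Await every call before `semaphore` is disposed; failures surface here
// instead of being lost in fire-and-forget Task.Run calls.
await Task.WhenAll(tasks);

Better still: Parallel.ForEachAsync with MaxDegreeOfParallelism (.NET 6+) expresses the same bound without a hand-managed semaphore.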

🔟 Caching Static Metadata

Airport lists, airline lists, and fare rules rarely change.

builder.Services.AddMemoryCache();

if (!_cache.TryGetValue("Airports", out Dictionary<string, Airport> airports))
{
    airports = LoadAirportsFromDb();
    _cache.Set("Airports", airports, TimeSpan.FromHours(24));
}

Impact:

Reduced DB calls
Reduced response time variance
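One caveat with the check-then-set pattern above: when the entry expires under load, every concurrent request that misses calls LoadAirportsFromDb at once (IMemoryCache.GetOrCreate behaves the same way). A minimal single-flight alternative, sketched with BCL types only; the `RefreshingCache<T>` name and the choice to retry immediately after a failed load are assumptions:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical helper: holds one value and reloads it at most once per TTL,
// no matter how many callers miss at the same moment.
public sealed class RefreshingCache<T>
{
    private readonly Func<Task<T>> _loader;
    private readonly TimeSpan _ttl;
    private readonly object _gate = new();
    private Lazy<Task<T>>? _current;
    private DateTime _expiresUtc;

    public RefreshingCache(Func<Task<T>> loader, TimeSpan ttl)
    {
        _loader = loader;
        _ttl = ttl;
    }

    public Task<T> GetAsync()
    {
        Lazy<Task<T>> entry;
        lock (_gate)
        {
            // Reload when empty, expired, or when the previous load faulted.
            if (_current is null || DateTime.UtcNow >= _expiresUtc ||
                (_current.IsValueCreated && _current.Value.IsFaulted))
            {
                _current = new Lazy<Task<T>>(_loader); // thread-safe: the loader runs once
                _expiresUtc = DateTime.UtcNow + _ttl;
            }
            entry = _current;
        }
        // Started outside the lock; every caller sharing this entry awaits the same Task.
        return entry.Value;
    }
}
```

Registered as a singleton, one instance per static dataset gives the 24-hour behavior above without the refresh stampede.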

1️⃣1️⃣ Response Compression = Direct Cost Savings

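// Requires: using Microsoft.AspNetCore.ResponseCompression; using System.IO.Compression;
builder.Services.AddResponseCompression(options =>
{
    // Compressing HTTPS responses can enable BREACH-style attacks when a response mixes
    // secrets with attacker-controlled input; review the payloads before enabling.
    options.EnableForHttps = true;
    options.Providers.Add<BrotliCompressionProvider>();
    options.Providers.Add<GzipCompressionProvider>();
});

builder.Services.Configure<BrotliCompressionProviderOptions>(o =>
{
    o.Level = CompressionLevel.Fastest; // trade some ratio for lower CPU per response
});

var app = builder.Build();
// Registering the services alone compresses nothing: the middleware must be in the
// pipeline, ahead of the endpoints whose responses it should compress.
app.UseResponseCompression();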

Results:

📉 55% bandwidth reduction
💰 ~₹80 lakhs/year egress savings
Faster client TTFB

Compression alone paid for the optimization effort.
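Before turning compression on elsewhere, it helps to estimate the ratio on a captured provider or API response, since savings depend heavily on how repetitive the JSON is. A minimal sketch using only System.IO.Compression; the `CompressionProbe` helper is hypothetical, and the 55% figure above came from production traffic, not from this code:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class CompressionProbe
{
    // Returns compressed size / original size for the payload (lower is better).
    public static double Ratio(string payload, Func<Stream, Stream> wrap)
    {
        byte[] raw = Encoding.UTF8.GetBytes(payload);
        using var output = new MemoryStream();
        using (Stream compressor = wrap(output))
        {
            compressor.Write(raw, 0, raw.Length);
        } // disposing the compressor flushes the final block into `output`
        return (double)output.Length / raw.Length;
    }

    // Same level as the Brotli provider configuration above.
    public static double Brotli(string payload) =>
        Ratio(payload, s => new BrotliStream(s, CompressionLevel.Fastest, leaveOpen: true));

    public static double Gzip(string payload) =>
        Ratio(payload, s => new GZipStream(s, CompressionLevel.Fastest, leaveOpen: true));
}
```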

1️⃣2️⃣ IOptions & Config Redesign

Structure

appsettings.json
appsettings.Staging.json
appsettings.Production.json

Typed Config

builder.Services.Configure<ProviderSettings>(
    builder.Configuration.GetSection("ProviderSettings"));

{
  "ProviderSettings": {
    "TimeoutSeconds": 8,
    "RetryCount": 3,
    "EnableCompression": true
  }
}

Centralized. Environment-aware. Testable.
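The Configure call above binds a `ProviderSettings` class whose shape follows from the JSON section. A minimal sketch of that class; the DataAnnotations ranges are assumptions. In the ASP.NET Core host the same rules can be enforced at startup with `builder.Services.AddOptions<ProviderSettings>().Bind(builder.Configuration.GetSection("ProviderSettings")).ValidateDataAnnotations().ValidateOnStart();`, so a bad appsettings.Production.json fails the deployment instead of the first request. The `TryValidate` helper (hypothetical) calls `Validator.TryValidateObject` directly so the rules can be checked without a host:

```csharp
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;

// Hypothetical shape matching the "ProviderSettings" section in appsettings.json.
public sealed class ProviderSettings
{
    [Range(1, 30)] // seconds; range chosen for illustration
    public int TimeoutSeconds { get; set; } = 8;

    [Range(0, 5)] // low ceiling: each retry adds to worst-case latency
    public int RetryCount { get; set; } = 3;

    public bool EnableCompression { get; set; } = true;

    // The same attribute checks that ValidateDataAnnotations() runs.
    public bool TryValidate(out List<ValidationResult> errors)
    {
        errors = new List<ValidationResult>();
        return Validator.TryValidateObject(this, new ValidationContext(this), errors, validateAllProperties: true);
    }
}
```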

1️⃣3️⃣ Structured Logging (No String Concatenation)

_logger.LogInformation(
    "Search completed {SearchId} {LatencyMs} {Provider}",
    searchId,
    latencyMs,
    providerName);

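Each placeholder becomes a named field, so entries can be filtered by SearchId, LatencyMs, or Provider in ELK, Datadog, or Application Insights.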
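On hot paths, .NET 6+ can also generate the logging code at compile time with the [LoggerMessage] attribute, which checks IsEnabled before doing any work and avoids the per-call params object[] allocation (and value-type boxing) of LogInformation(...). A minimal sketch, assuming the Microsoft.Extensions.Logging.Abstractions package (already referenced by ASP.NET Core apps); `SearchLog`, event id 1001, and the in-memory `CapturingLogger` used to show the named properties are hypothetical:

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Extensions.Logging;

public static partial class SearchLog
{
    // The source generator emits the body; the template matches the LogInformation call above.
    [LoggerMessage(EventId = 1001, Level = LogLevel.Information,
        Message = "Search completed {SearchId} {LatencyMs} {Provider}")]
    public static partial void SearchCompleted(ILogger logger, string searchId, long latencyMs, string provider);
}

// Hypothetical test logger that records the structured properties of each entry.
public sealed class CapturingLogger : ILogger
{
    public List<IReadOnlyList<KeyValuePair<string, object?>>> Entries { get; } = new();

    public IDisposable? BeginScope<TState>(TState state) where TState : notnull => null;

    public bool IsEnabled(LogLevel logLevel) => true;

    public void Log<TState>(LogLevel logLevel, EventId eventId, TState state,
        Exception? exception, Func<TState, Exception?, string> formatter)
    {
        // Structured state arrives as key/value pairs, one per template placeholder.
        if (state is IReadOnlyList<KeyValuePair<string, object?>> properties)
            Entries.Add(properties);
    }
}
```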

1️⃣4️⃣ .NET 6 → .NET 8 Upgrade Gains

Observed:

8–12% throughput boost
Improved ThreadPool heuristics
Faster JSON
Reduced memory footprint

These gains came from retargeting alone, with no code changes.

1️⃣5️⃣ Before vs After Metrics

Metric	Before	After	Change
P95 Latency	850ms	510ms	↓ 40%
Network Traffic	100%	45%	↓ 55%
Annual Infra Cost	Baseline	-₹80L	Savings
Alloc Rate	High	Reduced	↓ ~30%
Socket Errors	Frequent	None	Stable

1️⃣6️⃣ Canary Deployment Strategy

1. Deploy to 5% of traffic
2. Monitor:
   P95 / P99 latency
   Retry rate
   Provider error %
   GC pause time
3. Ramp up gradually: 5% → 25% → 50% → 100%
4. Roll back automatically on SLA breach
Reduced risk during optimization rollout.
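The auto-rollback step reduces to comparing canary metrics against SLA thresholds and the current fleet on every evaluation window. A minimal sketch; the `CanaryMetrics` record, the `CanaryGate` name, the 2 s P99 limit, the absolute margin on retry and error rates, and the 1.5x GC allowance are assumptions (only the sub-1s P95 target comes from this post):

```csharp
using System;
using System.Collections.Generic;

// Hypothetical snapshot of one evaluation window for the canary or the baseline fleet.
public sealed record CanaryMetrics(double P95Ms, double P99Ms, double RetryRate, double ProviderErrorRate, double GcPauseMsP95);

public static class CanaryGate
{
    // Returns the reasons to roll back; an empty list means the ramp can continue.
    public static IReadOnlyList<string> Evaluate(CanaryMetrics canary, CanaryMetrics baseline,
        double p95SlaMs = 1000, double p99SlaMs = 2000, double rateMargin = 0.02)
    {
        var reasons = new List<string>();

        // Absolute checks against the SLA.
        if (canary.P95Ms > p95SlaMs) reasons.Add($"P95 {canary.P95Ms} ms exceeds SLA {p95SlaMs} ms");
        if (canary.P99Ms > p99SlaMs) reasons.Add($"P99 {canary.P99Ms} ms exceeds {p99SlaMs} ms");

        // Relative checks: the canary may not be meaningfully worse than the current version.
        if (canary.RetryRate > baseline.RetryRate + rateMargin) reasons.Add("retry rate regressed");
        if (canary.ProviderErrorRate > baseline.ProviderErrorRate + rateMargin) reasons.Add("provider error rate regressed");
        if (canary.GcPauseMsP95 > baseline.GcPauseMsP95 * 1.5) reasons.Add("GC pause time regressed");

        return reasons;
    }
}
```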

1️⃣7️⃣ Production Rollout Checklist

[ ] Load test with realistic payload sizes
[ ] Validate retry behavior under provider failures
[ ] Verify compression headers
[ ] Confirm cancellation propagation
[ ] Enable structured logging
[ ] Warm static cache
[ ] Canary release
[ ] Monitor cost dashboard
[ ] Validate GC metrics

🎯 Key Takeaways

HttpClient misuse kills performance
Compression = cost optimization
Avoid string allocations in hot paths
Use HashSet/Dictionary for lookups
Never ignore CancellationToken
Replace dynamic with strongly typed DTOs
.NET 8 gives free performance
Canary deployments reduce blast radius
Measure everything before & after

DEV Community