DEV Community

Cover image for πŸš€ Hardening & Optimizing a .NET Flight Search & Sell Microservice
Jaydeep Kumar Sahu
Jaydeep Kumar Sahu

Posted on

πŸš€ Hardening & Optimizing a .NET Flight Search & Sell Microservice

Deep technical dive: 55% traffic reduction, 40% latency drop, ~β‚Ή80L annual savings


TL;DR

We re-architected and optimized a high-throughput Flight Search & Sell microservice that integrates with multiple third-party aviation data providers and internal pricing systems.

Impact:

  • πŸ“‰ Network traffic ↓ 55%
  • ⚑ P95 latency ↓ 40%
  • πŸ’° ~β‚Ή80 lakhs/year cloud savings
  • 🧠 Reduced GC pressure (~30% alloc drop)
  • πŸ›‘οΈ Improved resiliency & rollout safety

This post walks through production-grade improvements across HTTP stack, JSON, memory, async patterns, resiliency, caching, configuration, compression, structured logging, deployment strategy, and .NET 8 upgrades β€” with concrete code.

Target audience: senior backend engineers & platform architects.


1️⃣ Context: High-Throughput Aggregation at Scale

The system:

Clients β†’ API Gateway β†’ Flight Search Service
                                β”œβ”€β”€ Provider A (GDS)
                                β”œβ”€β”€ Provider B (NDC)
                                β”œβ”€β”€ Provider C (LCC)
                                β”œβ”€β”€ Internal pricing engine
                                └── Static metadata store
Enter fullscreen mode Exit fullscreen mode

Traffic profile:

  • Bursty (promo campaigns)
  • Heavy parallel provider calls
  • Strict SLA (sub-1s P95 target)
  • Large JSON payloads (100–500KB per provider)
  • High outbound bandwidth cost

Observed problems:

  • Socket exhaustion
  • Retry storms during provider latency spikes
  • High Gen2 GC pauses
  • Large LOH allocations
  • Excessive string normalization
  • Bandwidth-heavy responses
  • Config duplication across environments

2️⃣ Architecture Evolution (Textual Diagram)

Before

Request
  ↓
Controller
  ↓
Aggregator Service
  β”œβ”€β”€ new HttpClient() per provider call
  β”œβ”€β”€ Newtonsoft.Json
  β”œβ”€β”€ dynamic parsing
  β”œβ”€β”€ Manual retry logic
  β”œβ”€β”€ List<T>.Contains in hot paths
  β”œβ”€β”€ ToLower() comparisons
  β”œβ”€β”€ Static configuration helpers
  └── No compression
Enter fullscreen mode Exit fullscreen mode

After (.NET 8 optimized)

Request
  ↓
Controller (CancellationToken bound to HttpContext)
  ↓
Orchestrator
  β”œβ”€β”€ Parallel fan-out (Task.WhenAll)
  β”œβ”€β”€ IHttpClientFactory (Named clients)
  β”œβ”€β”€ Polly retry + jitter
  β”œβ”€β”€ Typed provider clients
  β”œβ”€β”€ System.Text.Json source-gen ready
  β”œβ”€β”€ HashSet/Dictionary lookups
  β”œβ”€β”€ MemoryCache for static data
  β”œβ”€β”€ Structured logging
  β”œβ”€β”€ Brotli/Gzip compression
  β”œβ”€β”€ Config via IOptions<T>
  └── Canary-aware deployment pipeline
Enter fullscreen mode Exit fullscreen mode

3️⃣ HTTP Stack: From Anti-Pattern to Production-Grade

❌ The Problem: Per-call HttpClient

public async Task<string> CallProviderAsync(string url)
{
    using var client = new HttpClient();
    return await client.GetStringAsync(url);
}
Enter fullscreen mode Exit fullscreen mode

Issues:

  • Socket exhaustion
  • No connection pooling reuse
  • No DNS refresh handling
  • No central resiliency

βœ… IHttpClientFactory + Named Clients

Registration

builder.Services.AddHttpClient("ProviderA", client =>
{
    client.BaseAddress = new Uri("https://api.providerA.com/");
    client.Timeout = TimeSpan.FromSeconds(8);
    client.DefaultRequestHeaders.AcceptEncoding.ParseAdd("br");
    client.DefaultRequestHeaders.AcceptEncoding.ParseAdd("gzip");
})
.ConfigurePrimaryHttpMessageHandler(() =>
{
    return new SocketsHttpHandler
    {
        PooledConnectionLifetime = TimeSpan.FromMinutes(5),
        AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Brotli
    };
})
.AddPolicyHandler(GetRetryPolicy());
Enter fullscreen mode Exit fullscreen mode

Best practices:

  • Use SocketsHttpHandler
  • Control pooled connection lifetime
  • Enable decompression at handler level
  • Centralize retry policy

4️⃣ Resilience Engineering with Polly (Retry + Jitter)

Without jitter, retries synchronize across instances β†’ provider meltdown.

static IAsyncPolicy<HttpResponseMessage> GetRetryPolicy()
{
    var jitterer = new Random();

    return Policy
        .Handle<HttpRequestException>()
        .OrResult<HttpResponseMessage>(r => 
            (int)r.StatusCode >= 500)
        .WaitAndRetryAsync(
            retryCount: 3,
            sleepDurationProvider: retryAttempt =>
                TimeSpan.FromMilliseconds(
                    Math.Pow(2, retryAttempt) * 100 +
                    jitterer.Next(0, 100)));
}
Enter fullscreen mode Exit fullscreen mode

Advanced considerations:

  • Separate policies for internal vs third-party calls
  • Avoid retrying 4xx
  • Combine with circuit breaker in extreme cases
  • Keep retry count low (don’t amplify latency)

5️⃣ JSON: Newtonsoft β†’ System.Text.Json

Why switch?

  • Lower allocations
  • Faster serialization
  • Native support in .NET runtime
  • Reduced dependency surface

❌ Before

dynamic response = JsonConvert.DeserializeObject(json);
return response.data.price;
Enter fullscreen mode Exit fullscreen mode

Reflection-heavy. Late-bound. Slow.


βœ… After (Strongly Typed)

public sealed class ProviderResponse<T>
{
    public T Data { get; init; }
}

public sealed class PriceDto
{
    public decimal Amount { get; init; }
    public string Currency { get; init; }
}

var typed = JsonSerializer.Deserialize<ProviderResponse<PriceDto>>(json);
Enter fullscreen mode Exit fullscreen mode

Benefits:

  • Compile-time safety
  • Faster member access
  • Lower memory overhead
  • Better AOT compatibility in .NET 8

6️⃣ String Allocation & Case Handling

Hot path mistake:

if (currency.ToLower() == "inr")
Enter fullscreen mode Exit fullscreen mode

Allocates new string per call.

Correct approach

if (string.Equals(currency, "INR", StringComparison.OrdinalIgnoreCase))
Enter fullscreen mode Exit fullscreen mode

Zero allocation. Culture-safe.

Observed improvement: noticeable drop in Gen0 collections under load.


7️⃣ Data Structures: O(n) β†’ O(1)

❌ List Contains

if (blockedAirlines.Contains(flight.AirlineCode))
Enter fullscreen mode Exit fullscreen mode

O(n) in hot path.

βœ… HashSet

var blockedSet = new HashSet<string>(blockedAirlines);

if (blockedSet.Contains(flight.AirlineCode))
Enter fullscreen mode Exit fullscreen mode

O(1) average lookup.


Fast lookup via Dictionary

var airportMap = airports.ToDictionary(a => a.Code);

if (airportMap.TryGetValue("DEL", out var airport))
{
    // instant lookup
}
Enter fullscreen mode Exit fullscreen mode

8️⃣ Async Discipline & Cancellation Propagation

Golden Rule:

Every async boundary must propagate CancellationToken.

public async Task<SearchResponse> SearchAsync(
    SearchRequest request,
    CancellationToken cancellationToken)
{
    var tasks = _providers
        .Select(p => p.SearchAsync(request, cancellationToken));

    var results = await Task.WhenAll(tasks);

    return Aggregate(results);
}
Enter fullscreen mode Exit fullscreen mode

Best practices:

  • No .Result or .Wait()
  • Avoid Task.Run in ASP.NET
  • Avoid unbounded parallelism
  • Consider Parallel.ForEachAsync with degree limits

9️⃣ Thread & Task Management

Avoid:

Task.Run(() => ProviderCall());
Enter fullscreen mode Exit fullscreen mode

ASP.NET Core already runs on ThreadPool.

If limiting concurrency:

using var semaphore = new SemaphoreSlim(5);

foreach (var provider in providers)
{
    await semaphore.WaitAsync(cancellationToken);
    _ = Task.Run(async () =>
    {
        try { await provider.CallAsync(); }
        finally { semaphore.Release(); }
    });
}
Enter fullscreen mode Exit fullscreen mode

Better: use bounded parallelism patterns.


πŸ”Ÿ Caching Static Metadata

Airport list, airline list, fare rules rarely change.

builder.Services.AddMemoryCache();
Enter fullscreen mode Exit fullscreen mode
if (!_cache.TryGetValue("Airports", out Dictionary<string, Airport> airports))
{
    airports = LoadAirportsFromDb();
    _cache.Set("Airports", airports, TimeSpan.FromHours(24));
}
Enter fullscreen mode Exit fullscreen mode

Impact:

  • Reduced DB calls
  • Reduced response time variance

1️⃣1️⃣ Response Compression = Direct Cost Savings

builder.Services.AddResponseCompression(options =>
{
    options.EnableForHttps = true;
    options.Providers.Add<BrotliCompressionProvider>();
    options.Providers.Add<GzipCompressionProvider>();
});

builder.Services.Configure<BrotliCompressionProviderOptions>(o =>
{
    o.Level = CompressionLevel.Fastest;
});
Enter fullscreen mode Exit fullscreen mode

Results:

  • πŸ“‰ 55% bandwidth reduction
  • πŸ’° ~β‚Ή80 lakhs/year egress savings
  • Faster client TTFB

Compression alone paid for the optimization effort.


1️⃣2️⃣ IOptions & Config Redesign

Structure

appsettings.json
appsettings.Staging.json
appsettings.Production.json
Enter fullscreen mode Exit fullscreen mode

Typed Config

builder.Services.Configure<ProviderSettings>(
    builder.Configuration.GetSection("ProviderSettings"));
Enter fullscreen mode Exit fullscreen mode
{
  "ProviderSettings": {
    "TimeoutSeconds": 8,
    "RetryCount": 3,
    "EnableCompression": true
  }
}
Enter fullscreen mode Exit fullscreen mode

Centralized. Environment-aware. Testable.


1️⃣3️⃣ Structured Logging (No String Concatenation)

_logger.LogInformation(
    "Search completed {SearchId} {LatencyMs} {Provider}",
    searchId,
    latencyMs,
    providerName);
Enter fullscreen mode Exit fullscreen mode

Queryable in ELK/Datadog/AppInsights.


1️⃣4️⃣ .NET 6 β†’ .NET 8 Upgrade Gains

Observed:

  • 8–12% throughput boost
  • Improved ThreadPool heuristics
  • Faster JSON
  • Reduced memory footprint

Zero code change benefits.


1️⃣5️⃣ Before vs After Metrics

Metric Before After Change
P95 Latency 850ms 510ms ↓ 40%
Network Traffic 100% 45% ↓ 55%
Annual Infra Cost Baseline -β‚Ή80L Savings
Alloc Rate High Reduced ~30%
Socket Errors Frequent None Stable

1️⃣6️⃣ Canary Deployment Strategy

  1. Deploy to 5% traffic
  2. Monitor:
  • P95 / P99 latency
  • Retry rate
  • Provider error %
  • GC pause time

    1. Gradual ramp-up:
  • 5% β†’ 25% β†’ 50% β†’ 100%

    1. Auto rollback on SLA breach

Reduced risk during optimization rollout.


1️⃣7️⃣ Production Rollout Checklist

  • [ ] Load test with realistic payload sizes
  • [ ] Validate retry behavior under provider failures
  • [ ] Verify compression headers
  • [ ] Confirm cancellation propagation
  • [ ] Enable structured logging
  • [ ] Warm static cache
  • [ ] Canary release
  • [ ] Monitor cost dashboard
  • [ ] Validate GC metrics

🎯 Key Takeaways

  • HttpClient misuse kills performance
  • Compression = cost optimization
  • Avoid string allocations in hot paths
  • Use HashSet/Dictionary for lookups
  • Never ignore CancellationToken
  • Replace dynamic with generics
  • .NET 8 gives free performance
  • Canary deployments reduce blast radius
  • Measure everything before & after

Top comments (0)