Mashrul Haque

.NET Performance Optimization: Fixing a 15-Second E-Commerce Page Load

A real-world case study of rescuing an enterprise e-commerce platform from performance hell, complete with war room panic, 63 SQL queries per page load, and the joy of watching response times drop from 15 seconds to under 700 milliseconds.


TL;DR - What Saved Us

The quick wins that took us from "users are leaving" to "users are buying":

  • Killed N+1 queries - 63 database calls per page became 1
  • Added composite indexes - The right indexes, in the right order
  • Introduced caching layers - Redis for sessions and hot data
  • Implemented async/await properly - Stopped blocking threads
  • Broke the monolith strategically - Started with the checkout path
  • Moved to read replicas - Separated reads from writes

Total impact: 15-second page loads → under 700ms. Cart abandonment dropped 34 points (78% → 44%).


The Call That Changed Everything

It was 6 PM on a Friday. My phone rang.

"The site is dying. Black Friday is in three weeks. We need you tomorrow."

The company? Let's call them MegaRetail. They had an e-commerce platform serving 2 million customers. It was built in 2009, survived a decade of "quick fixes," and was now buckling under its own weight. Page loads had crept up to 15 seconds. Cart abandonment was at 78%. Their biggest sales event of the year was approaching, and the system couldn't handle normal traffic, let alone Black Friday volumes.

I said yes. Because apparently I hate weekends.

What followed was four months of the most intense performance work I've done. This is that story.

What I Walked Into

Next morning, I opened the solution.

One project. 2.3 million lines of code. A single Web.config file that was 4,000 lines long. The App_Code folder (yes, that App_Code folder) contained 892 files.

The architecture diagram? There wasn't one. The closest thing was a whiteboard photo from 2012 showing boxes connected by arrows pointing in every direction. Someone had written "HERE BE DRAGONS" in red marker near the checkout flow.

They weren't wrong.

MegaRetail/
├── App_Code/                    # 892 files of "shared" code
├── Classes/                     # 312 more "helper" classes
├── Controls/                    # 156 user controls
├── Pages/                       # 489 .aspx pages
├── Services/                    # 78 WCF services calling each other
├── DataAccess/                  # 94 classes, each with 50+ methods
└── Utilities/                   # Where hope goes to die

The database was worse. SQL Server 2012 (support ended years ago), over 1,200 tables, thousands of stored procedures, and a dbo.EverythingTable with 312 columns. I wish I was joking about that name.

The team was defensive at first. Nobody likes an outsider coming in and pointing out problems. The lead developer had been there since 2011 and took every critique personally. I learned to frame everything as "the system has issues" rather than "someone made bad decisions." Even if someone definitely made bad decisions.

The Investigation: Finding Where It Hurt

Before fixing anything, I needed data. Not opinions. Not "I think the problem is..." statements. Actual measurements.

Here's the diagnostic approach I used:

Step 1: SQL Server Profiler and Extended Events

-- Find the slowest queries
SELECT TOP 50
    (qs.total_elapsed_time / 1000) / qs.execution_count AS avg_duration_ms,
    qs.execution_count,
    qs.total_logical_reads / qs.execution_count AS avg_reads,
    SUBSTRING(qt.text, qs.statement_start_offset/2 + 1,
        (CASE WHEN qs.statement_end_offset = -1
            THEN LEN(CONVERT(NVARCHAR(MAX), qt.text)) * 2
            ELSE qs.statement_end_offset END
        - qs.statement_start_offset)/2 + 1) AS query_text
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt
ORDER BY avg_duration_ms DESC

Step 2: Application Performance Monitoring

I set up Application Insights (took 30 minutes) and immediately saw the horror:

Page               Avg Load Time   DB Calls   Top Issue
Product Detail     8.2s            63         N+1 queries
Category Listing   12.4s           156        Missing index
Checkout           15.1s           89         Lock contention
Search Results     9.7s            34         Full table scans

63 database queries to load a single product page. Sixty-three.
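
Getting that table took almost no code. Here is a sketch of the wiring for the ASP.NET Core pieces, assuming the Microsoft.ApplicationInsights.AspNetCore package (the legacy WebForms app goes through the classic SDK and ApplicationInsights.config instead):

// Program.cs -- minimal Application Insights wiring (sketch)
builder.Services.AddApplicationInsightsTelemetry(options =>
{
    // Connection string lives in appsettings.json / environment, not in code
    options.ConnectionString = builder.Configuration["ApplicationInsights:ConnectionString"];
});

// Dependency tracking records every outgoing SQL call by default, which is
// exactly how the queries-per-page numbers above surfaced.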

Step 3: Database Wait Statistics

SELECT
    wait_type,
    wait_time_ms / 1000.0 AS wait_time_seconds,
    waiting_tasks_count,
    CASE WHEN waiting_tasks_count > 0
         THEN wait_time_ms / waiting_tasks_count
         ELSE 0 END AS avg_wait_ms
FROM sys.dm_os_wait_stats
WHERE wait_type NOT LIKE '%SLEEP%'
    AND wait_type NOT LIKE '%IDLE%'
    AND wait_type NOT LIKE '%QUEUE%'
    AND waiting_tasks_count > 0
ORDER BY wait_time_ms DESC

Top waits:

  1. PAGEIOLATCH_SH - Disk I/O (not enough RAM, bad queries)
  2. LCK_M_X - Exclusive locks (long transactions)
  3. CXPACKET - Parallelism waits (queries going parallel badly)

I had my hit list.

Problem #1: Fixing N+1 Query Problems in Entity Framework

The N+1 query problem was everywhere. Here's actual code I found:

// ProductService.cs
public ProductViewModel GetProduct(int productId)
{
    var product = _db.Products.Find(productId);
    var viewModel = new ProductViewModel
    {
        Name = product.Name,
        Price = product.Price,
        Category = _db.Categories.Find(product.CategoryId).Name,
        Brand = _db.Brands.Find(product.BrandId).Name,
        Images = _db.ProductImages.Where(i => i.ProductId == productId).ToList(),
        Reviews = _db.Reviews.Where(r => r.ProductId == productId).ToList(),
        RelatedProducts = GetRelatedProducts(productId),
        Specifications = _db.Specifications.Where(s => s.ProductId == productId).ToList(),
        Inventory = _db.Inventory.FirstOrDefault(i => i.ProductId == productId),
        // ... 15 more properties
    };
    return viewModel;
}

The GetRelatedProducts method? It loaded 10 related products, and for each one, it called GetProduct recursively. That's how you get 63 queries for one page.

I actually spent an embarrassing amount of time trying to figure out why the query count kept changing. Turns out there was a GetProduct call hidden inside a property getter. A property getter. I didn't even know you could do that in C#. (You can. You shouldn't.)

The Fix: Eager Loading and Projection

// After: One query, explicit projection
public ProductViewModel GetProduct(int productId)
{
    return _db.Products
        .Where(p => p.Id == productId)
        .Select(p => new ProductViewModel
        {
            Name = p.Name,
            Price = p.Price,
            Category = p.Category.Name,
            Brand = p.Brand.Name,
            Images = p.Images.Select(i => new ImageDto
            {
                Url = i.Url,
                Alt = i.AltText
            }).ToList(),
            Reviews = p.Reviews
                .OrderByDescending(r => r.CreatedAt)
                .Take(10)
                .Select(r => new ReviewDto
                {
                    Rating = r.Rating,
                    Text = r.Text
                }).ToList(),
            RelatedProducts = _db.Products
                .Where(rp => rp.CategoryId == p.CategoryId && rp.Id != productId)
                .Take(10)
                .Select(rp => new RelatedProductDto
                {
                    Id = rp.Id,
                    Name = rp.Name,
                    Price = rp.Price,
                    ImageUrl = rp.Images.FirstOrDefault().Url
                }).ToList(),
            Specifications = p.Specifications.Select(s => new SpecDto
            {
                Name = s.Name,
                Value = s.Value
            }).ToList(),
            InStock = p.Inventory.Quantity > 0
        })
        .FirstOrDefault();
}

One query. All the data. Execution time dropped from 2.3 seconds to 45 milliseconds.

But Entity Framework wasn't the only culprit. The stored procedures were worse.

-- Original: Called in a loop from C#
CREATE PROCEDURE GetProductAttribute
    @ProductId INT,
    @AttributeName VARCHAR(50)
AS
BEGIN
    SELECT Value
    FROM ProductAttributes
    WHERE ProductId = @ProductId AND Name = @AttributeName
END

// Called like this from the C# side (I found this in production):
foreach (var attr in attributeNames) // 20+ attributes
{
    var value = _db.ExecuteScalar("GetProductAttribute", productId, attr);
    // ...
}

Twenty round trips to get twenty attributes. Each round trip was ~3ms of network latency alone.

-- Fixed: One call, all attributes
CREATE PROCEDURE GetProductAttributes
    @ProductId INT
AS
BEGIN
    SELECT Name, Value
    FROM ProductAttributes
    WHERE ProductId = @ProductId
END

Then pivot in C# or use FOR JSON if you need it structured:

SELECT Name, Value
FROM ProductAttributes
WHERE ProductId = @ProductId
FOR JSON PATH
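
The C# side then collapses to a single round trip. A minimal sketch using Dapper (the ProductAttributeRow DTO and method name are illustrative, not from the original codebase):

// Requires Dapper and Microsoft.Data.SqlClient (or System.Data.SqlClient)
public class ProductAttributeRow
{
    public string Name { get; set; }
    public string Value { get; set; }
}

public async Task<Dictionary<string, string>> GetProductAttributesAsync(int productId)
{
    using var connection = new SqlConnection(_connectionString);

    // One round trip for all attributes...
    var rows = await connection.QueryAsync<ProductAttributeRow>(
        "GetProductAttributes",
        new { ProductId = productId },
        commandType: CommandType.StoredProcedure);

    // ...then pivot in memory instead of in the database
    return rows.ToDictionary(r => r.Name, r => r.Value);
}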

Problem #2: SQL Server Index Optimization - Removing What Hurts

The database had 2,891 indexes. Want to know how many were actually being used?

-- Find unused indexes
SELECT
    OBJECT_NAME(i.object_id) AS TableName,
    i.name AS IndexName,
    i.type_desc,
    s.user_seeks,
    s.user_scans,
    s.user_lookups,
    s.user_updates
FROM sys.indexes i
LEFT JOIN sys.dm_db_index_usage_stats s
    ON i.object_id = s.object_id
    AND i.index_id = s.index_id
    AND s.database_id = DB_ID()   -- the usage DMV covers all databases
WHERE OBJECTPROPERTY(i.object_id, 'IsUserTable') = 1
    AND i.type_desc = 'NONCLUSTERED'
    -- NULL usage rows mean the index was never touched since the last restart; treat as unused
    AND ISNULL(s.user_seeks, 0) + ISNULL(s.user_scans, 0) + ISNULL(s.user_lookups, 0) = 0
ORDER BY ISNULL(s.user_updates, 0) DESC

Over two thousand indexes with zero seeks, zero scans, zero lookups. But thousands of updates. Every INSERT and UPDATE was maintaining indexes nobody used.

Meanwhile, the queries that mattered had no useful indexes:

-- This query ran 50,000 times per hour
SELECT ProductId, Name, Price, ImageUrl
FROM Products
WHERE CategoryId = @CategoryId
    AND IsActive = 1
    AND Price BETWEEN @MinPrice AND @MaxPrice
ORDER BY SalesRank DESC

-- Available indexes:
-- PK_Products (ProductId) - useless for this query
-- IX_Products_Name - useless for this query
-- IX_Products_CreatedDate - useless for this query

The Fix: Right Indexes, Right Order

CREATE NONCLUSTERED INDEX IX_Products_Category_Active_Price
ON Products (CategoryId, IsActive, Price)
INCLUDE (Name, ImageUrl, SalesRank)
WHERE IsActive = 1

We dropped the unused indexes (after a week of monitoring to make sure nothing broke) and added about 20 targeted ones like the above.

Column order matters. Equality predicates first (CategoryId = @CategoryId), then range predicates (Price BETWEEN).

The filtered index (WHERE IsActive = 1) was a bonus. 87% of queries only wanted active products, so why index the inactive ones?

Results:

Metric            Before    After
Logical reads     45,847    234
Execution time    3.2s      12ms
CPU time          890ms     8ms

Problem #3: Database Schema Anti-Patterns - The "EverythingTable"

Remember dbo.EverythingTable? Here's its partial structure:

CREATE TABLE EverythingTable (
    Id INT IDENTITY PRIMARY KEY,
    Type VARCHAR(50),           -- 'Product', 'Order', 'Customer', 'Log', etc.
    Name NVARCHAR(500),
    Description NVARCHAR(MAX),
    Value1 VARCHAR(500),        -- Could be anything
    Value2 VARCHAR(500),        -- Really, anything
    Value3 VARCHAR(500),        -- We stopped caring
    -- ... 298 more columns
    CreatedDate DATETIME,
    ModifiedDate DATETIME,
    CreatedBy INT,
    ModifiedBy INT,
    IsDeleted BIT,
    DeletedDate DATETIME,
    Metadata XML                -- When columns weren't enough
)

This table had 53 million rows. Products, orders, customers, logs, audit trails... all in one table, differentiated by a Type column.

Queries looked like this:

SELECT * FROM EverythingTable
WHERE Type = 'Product' AND Value7 = @CategoryId

Nobody knew what Value7 meant without checking the wiki (which was outdated).

The Fix: Proper Table Design (Gradually)

We couldn't rebuild the database overnight. Instead, we:

  1. Created proper tables alongside the monstrosity
  2. Added triggers to sync data both directions
  3. Migrated queries one feature at a time
  4. Deprecated the old table gradually

-- New, sane table
CREATE TABLE Products (
    Id INT IDENTITY PRIMARY KEY,
    Name NVARCHAR(200) NOT NULL,
    Description NVARCHAR(MAX),
    CategoryId INT NOT NULL REFERENCES Categories(Id),
    Price DECIMAL(18,2) NOT NULL,
    IsActive BIT NOT NULL DEFAULT 1,
    -- Actual columns with actual names
    INDEX IX_Products_Category (CategoryId) INCLUDE (Name, Price, IsActive)
)

-- Sync trigger (temporary, during migration)
CREATE TRIGGER TR_Products_Sync ON Products
AFTER INSERT, UPDATE
AS
BEGIN
    -- Sync to legacy table for old code still using it
    MERGE EverythingTable AS target
    USING inserted AS source
    ON target.Type = 'Product' AND target.Id = source.Id
    WHEN MATCHED THEN
        UPDATE SET Value1 = source.Name, Value7 = source.CategoryId, ...
    WHEN NOT MATCHED THEN
        INSERT (Type, Value1, Value7, ...) VALUES ('Product', source.Name, ...);
END

It took four months, but eventually EverythingTable was empty and dropped. The celebration Slack emoji usage was off the charts.

Problem #4: Async/Await in .NET - From 3.5s to 800ms

The checkout process was a masterpiece of blocking operations:

public ActionResult ProcessCheckout(CheckoutModel model)
{
    var inventory = _inventoryService.CheckStock(model.Items);
    var tax = _taxService.Calculate(model.ShippingAddress, model.Items);
    var payment = _paymentService.Charge(model.PaymentInfo, model.Total);
    var order = _orderService.Create(model, payment.TransactionId);
    _emailService.SendConfirmation(order);
    _inventoryService.Decrement(model.Items);
    _warehouseService.QueueFulfillment(order);
    _analyticsService.TrackPurchase(order);
    return RedirectToAction("Confirmation", new { orderId = order.Id });
}

Eight synchronous calls, each waiting for the previous one. Total wait: about 3.5 seconds when everything worked. If the email server was slow? The user waited. If analytics logging failed? 500 error, payment already charged, order in limbo.

The senior dev who wrote this had actually left a comment at the top of the file: // TODO: make this faster someday. The commit was from 2014.

The fix was straightforward: async for the critical path, background jobs for everything else.

public async Task<ActionResult> ProcessCheckout(CheckoutModel model)
{
    var inventory = await _inventoryService.CheckStockAsync(model.Items);
    if (!inventory.IsAvailable)
        return View("OutOfStock", inventory.UnavailableItems);

    // Tax and payment can run in parallel
    var taxTask = _taxService.CalculateAsync(model.ShippingAddress, model.Items);
    var paymentTask = _paymentService.ChargeAsync(model.PaymentInfo, model.Total);
    await Task.WhenAll(taxTask, paymentTask);

    var tax = await taxTask;
    var payment = await paymentTask;

    if (!payment.Success)
        return View("PaymentFailed", payment.Error);

    var order = await _orderService.CreateAsync(model, payment.TransactionId, tax);

    // Everything else happens in the background
    await _backgroundJobs.EnqueueAsync(new PostCheckoutJob
    {
        OrderId = order.Id,
        CustomerEmail = model.Email,
        Items = model.Items
    });

    return RedirectToAction("Confirmation", new { orderId = order.Id });
}

// Background job handles the rest
public class PostCheckoutJobHandler : IJobHandler<PostCheckoutJob>
{
    public async Task HandleAsync(PostCheckoutJob job)
    {
        var tasks = new List<Task>
        {
            _emailService.SendConfirmationAsync(job.OrderId),
            _inventoryService.DecrementAsync(job.Items),
            _warehouseService.QueueFulfillmentAsync(job.OrderId),
            _analyticsService.TrackPurchaseAsync(job.OrderId)
        };

        await Task.WhenAll(tasks);
    }
}

Checkout dropped to about 800ms. Users got their confirmation page while background jobs handled the rest.

We used Hangfire for background jobs. Simple setup, built-in dashboard, uses SQL Server as a backing store.
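
The registration really is just a few lines. A sketch of the setup (the connection string name is illustrative):

// Program.cs -- packages: Hangfire.AspNetCore, Hangfire.SqlServer
builder.Services.AddHangfire(config =>
    config.UseSqlServerStorage(builder.Configuration.GetConnectionString("Hangfire")));
builder.Services.AddHangfireServer();

var app = builder.Build();
app.UseHangfireDashboard("/hangfire");   // the built-in dashboard

// Roughly what _backgroundJobs.EnqueueAsync wraps:
// BackgroundJob.Enqueue<PostCheckoutJobHandler>(h => h.HandleAsync(job));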

Problem #5: No Caching Anywhere

Every page load hit the database. Category trees? Database. Product counts? Database. User sessions? Database.

The "Sessions" table had 12 million rows and was locked constantly.

SELECT * FROM Sessions WHERE SessionId = @SessionId

DELETE FROM Sessions WHERE LastAccessed < DATEADD(MINUTE, -30, GETDATE())

That SELECT ran on every single request. The DELETE ran every 2 minutes and locked the table for seconds at a time. During those seconds, every user session check waited.

Redis was the obvious answer:

// Program.cs
builder.Services.AddStackExchangeRedisCache(options =>
{
    options.Configuration = "localhost:6379";
    options.InstanceName = "MegaRetail_";
});

builder.Services.AddSession(options =>
{
    options.IdleTimeout = TimeSpan.FromMinutes(30);
    options.Cookie.HttpOnly = true;
});

Session reads went from 15ms (database) to 0.3ms (Redis). But that was just the start.

We identified "hot data," things that don't change often but are read constantly:

public class CachedCategoryService : ICategoryService
{
    private readonly ICategoryService _inner;
    private readonly IDistributedCache _cache;

    public async Task<List<CategoryDto>> GetCategoryTreeAsync()
    {
        var cacheKey = "categories:tree";
        var cached = await _cache.GetStringAsync(cacheKey);

        if (cached != null)
            return JsonSerializer.Deserialize<List<CategoryDto>>(cached);

        var categories = await _inner.GetCategoryTreeAsync();

        await _cache.SetStringAsync(cacheKey,
            JsonSerializer.Serialize(categories),
            new DistributedCacheEntryOptions
            {
                AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(15)
            });

        return categories;
    }
}

Cache hit rates after implementation:

Data Type           Hit Rate   Queries Saved/Hour
Category tree       99.2%      180,000
Product counts      97.8%      95,000
Homepage products   99.5%      450,000
User sessions       99.9%      2,100,000

Database load dropped 67% overnight.

The Cache Invalidation Disaster

I should mention: my first caching attempt was a disaster. I cached product prices with a 1-hour TTL, thinking "prices don't change that often."

Wrong. The marketing team ran flash sales. They'd drop a price, and customers would see the old price for up to an hour. We had people paying $99 for items that were supposed to be $49. The finance team was not pleased.

Lesson learned: cache aggressively, but think about invalidation before you ship. We ended up with event-driven invalidation for anything price-related. The category tree could be stale for 15 minutes. Prices could not.
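
Roughly what the price path looks like now. A sketch with illustrative event and cache key names (the real handler hangs off the pricing update pipeline):

public class PriceChangedHandler
{
    private readonly IDistributedCache _cache;

    public PriceChangedHandler(IDistributedCache cache) => _cache = cache;

    // Invoked whenever a price is written (admin tool, pricing import, flash-sale job)
    public async Task HandleAsync(PriceChangedEvent evt)
    {
        // Evict every cache entry that embeds the price; the next read repopulates it
        await _cache.RemoveAsync($"product:{evt.ProductId}");
        await _cache.RemoveAsync($"category:{evt.CategoryId}:products");
    }
}

public record PriceChangedEvent(int ProductId, int CategoryId, decimal NewPrice);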

Problem #6: Read/Write Contention

Even after caching, our single SQL Server instance was still under pressure. Every read and write went to the same server. During peak hours, read queries were competing with order inserts for the same resources.

The monitoring told the story:

-- Check read vs write ratio
SELECT
    SUM(user_seeks + user_scans + user_lookups) AS total_reads,
    SUM(user_updates) AS total_writes,
    CAST(SUM(user_seeks + user_scans + user_lookups) AS FLOAT) /
        NULLIF(SUM(user_updates), 0) AS read_to_write_ratio
FROM sys.dm_db_index_usage_stats
WHERE database_id = DB_ID()

Our read-to-write ratio was 47:1. For every write, we had 47 reads. Product browsing, category listings, search results: all reads. Only checkout, cart updates, and order creation were writes.

Yet all of that traffic was hitting the same database server.

SQL Server Always On availability groups gave us read replicas. The application-side setup was straightforward:

public class ProductRepository : IProductRepository
{
    private readonly string _readConnection;   // Points to replica
    private readonly string _writeConnection;  // Points to primary

    public async Task<Product> GetByIdAsync(int id)
    {
        using var connection = new SqlConnection(_readConnection);
        return await connection.QueryFirstOrDefaultAsync<Product>(
            "SELECT * FROM Products WHERE Id = @Id", new { Id = id });
    }

    public async Task UpdateInventoryAsync(int productId, int quantity)
    {
        using var connection = new SqlConnection(_writeConnection);
        await connection.ExecuteAsync(
            "UPDATE Products SET StockQuantity = @Quantity WHERE Id = @Id",
            new { Id = productId, Quantity = quantity });
    }
}

For EF Core, same idea: separate ReadDbContext and WriteDbContext pointing at different connection strings.
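
A minimal sketch of the EF Core variant (context names and connection string keys are illustrative):

// Program.cs -- same model, two contexts, two connection strings
builder.Services.AddDbContext<ReadDbContext>(options =>
    options.UseSqlServer(builder.Configuration.GetConnectionString("SqlReplica"))
           .UseQueryTrackingBehavior(QueryTrackingBehavior.NoTracking)); // reads never call SaveChanges

builder.Services.AddDbContext<WriteDbContext>(options =>
    options.UseSqlServer(builder.Configuration.GetConnectionString("SqlPrimary")));

// Query-side code takes ReadDbContext; anything that mutates state takes WriteDbContext.
public class ReadDbContext : DbContext
{
    public ReadDbContext(DbContextOptions<ReadDbContext> options) : base(options) { }
    public DbSet<Product> Products => Set<Product>();
}

public class WriteDbContext : DbContext
{
    public WriteDbContext(DbContextOptions<WriteDbContext> options) : base(options) { }
    public DbSet<Product> Products => Set<Product>();
}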

The architecture after read replicas:

                    ┌─────────────────────┐
                    │   Load Balancer     │
                    └──────────┬──────────┘
                               │
                    ┌──────────▼──────────┐
                    │    Application      │
                    │      Servers        │
                    └──────────┬──────────┘
                               │
              ┌────────────────┼────────────────┐
              │                │                │
              ▼                ▼                ▼
       ┌─────────────┐  ┌─────────────┐  ┌─────────────┐
       │ SQL Primary │  │ SQL Replica │  │ SQL Replica │
       │  (Writes)   │  │  (Reads)    │  │  (Reads)    │
       └──────┬──────┘  └─────────────┘  └─────────────┘
              │                ▲                ▲
              │    Always On   │                │
              └───────────────────────────────────┘
                    (Synchronous Replication)

Results after implementing read replicas:

Metric                 Before     After
Primary DB CPU         67%        23%
Read query latency     45ms avg   28ms avg
Write query latency    89ms avg   34ms avg
Max concurrent users   8,000      15,000

The primary server now only handled writes, about 2% of total traffic. Read replicas absorbed the browsing load, and we could scale horizontally by adding more replicas during peak seasons.

Pro tip: Watch out for replication lag. For product browsing, a few milliseconds of lag is fine. For inventory checks during checkout, always read from primary to avoid overselling.
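
Concretely, the checkout-path stock check bypasses the replicas entirely. A sketch (the method name is illustrative; it sits alongside the repository above):

// Always read stock from the primary during checkout so a lagging replica
// can't let us sell inventory we no longer have.
public async Task<int> GetStockForCheckoutAsync(int productId)
{
    using var connection = new SqlConnection(_writeConnection);   // primary, not replica
    return await connection.ExecuteScalarAsync<int>(
        "SELECT StockQuantity FROM Products WHERE Id = @Id",
        new { Id = productId });
}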

The Fix: A Phased Approach That Actually Worked

We didn't try to fix everything at once. That's how projects die. Instead, we worked in phases with measurable goals.

Phase 1: Stop the Bleeding (Week 1)

Goal: Get checkout under 3 seconds. This was the money path.

Actions:

  • Added missing indexes for checkout queries (2 hours)
  • Implemented Redis for sessions (4 hours)
  • Made payment processing async (1 day)
  • Added response caching for product pages (4 hours; sketch below)
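
The response-caching piece, sketched on the ASP.NET Core side (the duration and controller action are illustrative, not what we shipped):

// Program.cs
builder.Services.AddResponseCaching();
// ...after builder.Build():
app.UseResponseCaching();

// ProductController.cs -- short-lived cache for anonymous product pages
[ResponseCache(Duration = 120, Location = ResponseCacheLocation.Any, VaryByQueryKeys = new[] { "id" })]
public async Task<IActionResult> Detail(int id)
{
    var model = await _productService.GetProductAsync(id);   // hypothetical service call
    return View(model);
}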

Result: Checkout dropped from 15s to 2.8s. Cart abandonment dropped from 78% to 61%.

Phase 2: Database Surgery (Weeks 2-4)

Goal: Fix the worst N+1 patterns and index problems.

Actions:

  • Profiled all queries running >100ms
  • Rewrote top 20 worst stored procedures
  • Dropped 2,134 unused indexes
  • Added 23 targeted composite indexes
  • Migrated sessions fully to Redis
  • Started EverythingTable decomposition

Result: Product page load dropped from 8.2s to 1.4s. Database CPU utilization dropped from 89% to 34% (and would eventually reach 28% after Phase 3).

Phase 3: Strategic Decomposition (Months 2-4)

Goal: Break the monolith where it hurts most.

We didn't go full microservices. That would've taken years. Instead, we identified the highest-value extraction targets:

  1. Search Service - Moved to Elasticsearch
  2. Product Catalog API - Extracted, cacheable, read-heavy
  3. Inventory Service - Real-time stock checks, needed isolation
  4. Checkout Service - The money path, needed reliability
  5. Read Replicas - Offloaded the read traffic (about 98% of all queries) from the primary database

                    ┌─────────────────────┐
                    │   Load Balancer     │
                    └──────────┬──────────┘
                               │
           ┌───────────────────┼───────────────────┐
           │                   │                   │
           ▼                   ▼                   ▼
    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │  Monolith   │    │   Search    │    │  Checkout   │
    │  (Legacy)   │    │   Service   │    │   Service   │
    └──────┬──────┘    └──────┬──────┘    └──────┬──────┘
           │                  │                   │
           │                  │                   │
           ▼                  ▼                   ▼
    ┌─────────────┐    ┌─────────────┐    ┌─────────────┐
    │  SQL Server │    │Elasticsearch│    │  SQL Server │
    │  (Primary)  │    │             │    │  (Checkout) │
    └─────────────┘    └─────────────┘    └─────────────┘

The monolith still existed, but the critical paths were isolated. If the monolith had issues, checkout and search kept working.

The Results: Numbers Don't Lie

After 4 months:

Metric                Before   After
Homepage load         6.2s     ~350ms
Product page load     8.2s     ~300ms
Search results        9.7s     ~200ms
Checkout              15.1s    ~700ms
Cart abandonment      78%      44%
Database CPU (peak)   89%      31%

Black Friday? The site handled 3x the previous year's traffic with zero downtime. The ops team actually got to eat Thanksgiving dinner for once.

Revenue impact? The 34-point drop in cart abandonment translated to roughly $4.2 million additional revenue during the holiday season. My invoice was considerably less than that.

Lessons Learned

What worked: Measuring first. No guessing, just profiler data and APM metrics. Quick wins early (indexes and caching) bought us political capital to do the harder stuff. Shipping improvements weekly kept stakeholders from panicking.

What I'd do differently: Set up monitoring on day one, not day three. I wasted time arguing about what was slow when I could've just shown them. Also, I should've pushed back harder on scope. Leadership wanted everything fixed in month one. That's not how legacy systems work.

The dumb thing I almost did: I nearly proposed a full rewrite. The CTO asked me point-blank in week two: "Should we just start over?" I was tempted to say yes. The codebase was that bad.

But rewrites fail. They take longer than estimated, they lose institutional knowledge, and you end up rebuilding bugs that existed for good reasons you didn't understand. The system worked. It was slow, not broken. Big difference.


Questions I Get Asked

"We have 200 indexes and queries are still slow. What gives?"

Probably none of those indexes match your actual query patterns. I've seen this a dozen times. Someone adds an index on CreatedDate because "we sort by date sometimes," but the query that's killing you filters by CategoryId and IsActive first. Use Query Store to find your actual top queries, then build indexes for those. Delete the rest.

"Our DBA says we need to upgrade to a bigger server."

Maybe. But I've seen $50,000 database servers brought to their knees by N+1 queries that a $5 code fix would solve. Profile first. If your queries are doing full table scans, a bigger server just does bigger full table scans.

"How do I get management to care about this?"

Stop talking about architecture. Talk about money. "Our checkout takes 15 seconds and we're losing $X million in abandoned carts" gets budget approved. "The architecture is suboptimal" gets you a meeting scheduled for next quarter.

"Should we just rewrite the whole thing in [insert new framework]?"

Almost certainly not. I've seen more projects die from rewrites than from technical debt. Fix what's broken, extract what needs to scale independently, and leave the rest alone. A working monolith beats a half-finished microservices migration every time.


Final Thoughts

Performance work isn't glamorous. There's no framework to install, no architecture diagram that solves everything. It's measurement, targeted fixes, and a lot of time staring at query plans.

Most systems have the same problems: N+1 queries, missing indexes, no caching, synchronous calls that should be async. Fix these and you'll solve most performance issues you'll ever see.

The MegaRetail project taught me that technical debt is expensive, but it's also finite. A monolith from 2009 isn't a death sentence. With focused effort, even legacy systems can perform.

Just maybe budget for more than three weeks before Black Friday next time.



About the Author

I'm Mashrul Haque, a Systems Architect who has spent 15+ years building and rescuing enterprise applications with .NET, SQL Server, and Azure. I specialize in performance optimization, distributed systems, and explaining to executives why "just add more servers" isn't a strategy.

This case study is based on a real consulting engagement. Company details have been anonymized to protect client confidentiality. All performance metrics were measured using Application Insights and SQL Server Extended Events.

When your e-commerce platform is on fire during Black Friday, I'm the one you call.

Follow me here on dev.to for more war stories and .NET performance content.
