YogitaKadam14

Posted on Jun 15

Performance Lessons from a Real Production Azure Application: Why Scaling Wasn't the Answer

#azure #softwareengineering #performance #sqlserver

Recently, I worked on an enterprise scoring and reporting application hosted on Azure SQL. What started as a simple performance issue eventually became a deep investigation into database usage, application design, caching, batching, and scalability.

Most developers have faced this situation at least once:

Users report that the application is slow.

The first reaction is: "Let's scale the infrastructure."

We thought the same.

But what if the real problem isn't CPU, memory, or database size?

This is the story of how our Scorecard application started hitting 100% DTU on Azure SQL, struggled under a relatively small production workload, and how we significantly improved performance without continuously increasing infrastructure costs.

For anyone unfamiliar with Azure SQL, DTU (Database Transaction Unit) is a blended measure of CPU, memory, and I/O. A database can hit 100% DTU even when CPU usage appears normal.

The Application

The Scorecard is a scoring and reporting platform used by healthcare auditors.

The application allows users to:

Complete scorecards
Review assessments
Generate reports
Manage templates

Due to budget constraints, the application was hosted on Azure SQL Basic (5 DTUs).

Initially everything worked perfectly.

✅ Development environment was fast.

✅ Testing looked good.

✅ No obvious performance concerns.

Then production happened.

It Worked Fine... Until It Didn't

Development → Production → Slow Performance
             ↓
         Cache Fixes
             ↓
         Scale to S0
             ↓
      Problem Returns
             ↓
      Deep Investigation
             ↓
       Root Cause Found
             ↓
      Multiple Optimizations
             ↓
     Stable Performance

As users started actively using the application, we began receiving complaints:

Scorecards were loading slowly
Reports were taking too long to generate
Save operations felt sluggish

Our first response was straightforward:

Adjust cache settings
Tune cache TTL values
Apply a few quick optimizations

Things improved.

For a while.

Then the same issue returned.

The DTU Surprise

Before making any changes, we needed to understand where the bottleneck actually was.

The most valuable tool during this investigation wasn't caching, indexing, or scaling.

It was monitoring.

If we had looked only at application logs, we might have continued scaling infrastructure without solving the real problem.

Azure monitoring helped us identify that the bottleneck was actually in the database layer.

We started monitoring Azure resources.

First stop:

Azure App Service

Everything looked healthy.

CPU looked normal
Memory usage was acceptable
No obvious bottlenecks

Then we checked Azure SQL.

And found this:

DTU = 100%

At this point we assumed:

"The database tier is too small."

So we upgraded from:

Azure SQL Basic → Azure SQL S0

Problem solved?

Not really.

The application performed better for a few days.

Then the same issue returned.

Even with only 2–3 users.

That's when we realized something important:

Scaling infrastructure was treating the symptom, not the root cause.

Time to Investigate

Instead of increasing DTUs again, we decided to trace every database interaction.

What we discovered was surprising.

Opening a template

5 database calls

Loading a scorecard

40 database calls

Downloading a report

13 database calls

And that was only the visible part.

We also found:

Queries running inside loops
Repeated database reads
Multiple unnecessary API calls
Expensive search operations
Large datasets loaded unnecessarily

The application wasn't slow because of the number of users.

It was slow because of the amount of work being done for each user action.

The Hidden Performance Killers

1. Loading Thousands of Records to Show 20

Our grids were loading large datasets even though users could only see a small portion of the records.

This meant:

More SQL reads
Larger payloads
Higher memory usage

2. Saving Questions One by One

Imagine saving a scorecard with 50 questions.

The application was effectively doing:

Question 1 → Save
Question 2 → Save
Question 3 → Save
...
Question 50 → Save

That's a lot of unnecessary network traffic.

3. Repeated Data Fetching

Some lookup data and dropdown values rarely changed.

Yet they were being fetched repeatedly.

Every page refresh.

Every screen load.

Every user.

4. Search Operations

Search functionality was expensive.

Large datasets were being scanned repeatedly, causing significant pressure on the database.

The Optimization Journey

There was no single silver bullet.

Instead, we applied several focused optimizations.

1. Created Optimized Database Views

Many screens were assembling data through multiple queries and joins.

We introduced optimized database views to prepare the data closer to where it lived.

Benefits:

Simpler queries
Reduced processing
Faster screen loads

2. Reduced Database Trips

Instead of making multiple calls for a single screen:

Template → 5 calls
Scorecard → 40 calls
Report → 13 calls

We consolidated requests wherever possible.

Result:

Fewer round trips
Lower database load
Better response times

3. Implemented Smart Save Logic

Previously every question was updated.

Even if nothing had changed.

We modified the logic to:

✅ Save only changed questions

✅ Skip unchanged records

This immediately reduced unnecessary network traffic.

4. Introduced Batch Processing

Instead of:

50 questions = 50 requests

We switched to:

50 questions = 5 batch requests

Using mutation batching significantly improved save performance.

5. Added Caching

We identified reusable data and cached it.

Examples:

Dropdown values
Lookup tables
Reference data

One important lesson:

Cache aggressively, but always invalidate cache correctly after updates and deletes.

Otherwise performance improves while data accuracy suffers.

6. Implemented Server-Side Pagination

This was one of the most impactful improvements.

Instead of loading thousands of records:

Load only what the user can see

Benefits:

Smaller payloads
Faster grids
Reduced database load

7. Optimized Search

We improved search performance through:

Better indexing
Query optimization
More efficient lookup strategies

Result:

Faster search
Lower DTU consumption

8. Controlled Parallel Processing

Initially many operations ran sequentially.

We introduced controlled parallel execution for independent operations.

However, we intentionally limited concurrency.

Why?

Because uncontrolled parallelism can overwhelm smaller database tiers.

Optimization isn't about doing everything in parallel.

It's about doing the right amount of work in parallel.

Results

After implementing these optimizations:

✅ Reduced database round trips by more than 80% on key screens

✅ Reduced unnecessary save operations

✅ Improved report generation performance

✅ Reduced Azure SQL DTU pressure

✅ Improved user experience without increasing infrastructure costs

Most importantly:

We stopped treating the symptoms and fixed the actual problem.

We improved the application instead of continuously increasing Azure spend.

Key Lessons Learned

1. Scaling Is Not Always the Answer

Scaling can provide temporary relief.

It rarely fixes inefficient application behavior.

2. Performance Problems Hide During Development

Development environments rarely have:

Real data volumes
Real concurrency
Real user behavior

Production reveals the truth.

3. Loops + Database Calls = Trouble

One database call inside a loop may look harmless.

But:

1 query × 50 records × 3 users

Can quickly become hundreds of database operations.

4. Small Optimizations Compound

No single change solved our issue.

The combination of multiple improvements did.

5. Always Investigate Before Optimizing

One of the biggest takeaways from this project:

Measure first. Optimize second.

Never assume where the bottleneck is.

The Biggest Mistake We Made

Initially, we assumed:

Slow application = Small database

That assumption led us to scale infrastructure before fully understanding the problem.

The real issue wasn't database size.

The real issue was:

Too many database calls
Too much repeated work
Too much unnecessary data movement

This is a powerful lesson because many teams make the same mistake.

Final Thoughts

"Scaling gave us more capacity. Optimization gave us a solution."

This project completely changed how we approach performance optimization.

Before asking for more infrastructure, we now ask:

Can this be cached?
Can this be batched?
Can this be loaded later?
Can this be fetched once?
Can this run in parallel?

Performance optimization is rarely about one big fix.

It's usually about identifying dozens of small inefficiencies and removing them one by one.

And sometimes, that's enough to turn a struggling application into a scalable one without spending an extra dollar on infrastructure.

If there's one lesson I took from this project, it's this:

Before increasing CPU, memory, or DTUs, spend time understanding how your application behaves.

Sometimes the biggest performance gains don't come from bigger servers.

They come from fewer queries, smarter caching, better batching, and cleaner design.

Have you ever faced a similar production performance challenge? I'd love to hear your experience in the comments.

DEV Community