Ever fetched your order history on Amazon?
That one click triggers a chain of backend operations that can take ~5ms or keep you waiting for over a second.
At a high level, your request typically follows a path from the client through your backend services to the database.

Load tests reveal that, in most cases, the order API takes over a second to respond. You're assigned to identify the bottleneck and fix it ASAP, as it's now impacting the business.
If this sounds daunting - or uncomfortably familiar - you're not alone.
Let's break it down with a 101 guide to performance tuning any application.
SOS/Emergency Vertical Scaling - If you are under immediate production pressure, temporarily increase the CPU/RAM of your server to stabilise the application. This creates the headroom for you to safely investigate the root cause and implement a proper fix. Treat this as a short-term measure only: once the incident is under control and the underlying cause is fixed, scale resources back down to an efficient level to avoid unnecessary cost.
Performance bottlenecks can typically be grouped into three broad categories:
- Backend service optimisation
- Database optimisation
- Backend infrastructure optimisation
Developers rarely have generous deadlines to fix performance issues - so the first step is to run load tests and analyse application logs and server metrics.
To quickly narrow down which category is the bottleneck, consider the following steps:
- Do you notice low throughput in the load test? Is CPU utilisation high on both the backend server and the database? If yes, start with database optimisation.
- If only backend CPU utilisation is high, start with backend service optimisation, then look into server configuration.
- Is memory utilisation high? Check the application logs for memory leaks.
- Are any external APIs involved? Check the response time of each external API via the browser dev tools. Consider caching responses, changing the CDN, an efficient retry mechanism, or async calls to improve external API response times.
Backend Service
In development, functionality often takes priority over efficiency - which is usually acceptable. But inefficient code often only reveals itself under production load.
Start by identifying the APIs with the worst performance and tackle them first. Before reaching for caching or pagination - which usually mask the underlying issues - verify the following:
1. Asynchronous programming
- In today's systems, even a few seconds of load time are unacceptable.
- Ensure your code does not block unnecessarily, especially during database/network operations.
- Use await (or equivalent) correctly so multiple operations can run concurrently without changing the application logic.
- Avoid calling async APIs and then blocking on them synchronously, as this defeats the purpose and kills throughput.
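The points above can be sketched in a few lines of asyncio. This is an illustrative toy, not production code: `fetch_orders` and `fetch_profile` are hypothetical stand-ins for real database or network calls, simulated here with `asyncio.sleep`.

```python
import asyncio

async def fetch_orders(user_id):
    # Hypothetical stand-in for a real database/network call.
    await asyncio.sleep(0.1)
    return ["order-1", "order-2"]

async def fetch_profile(user_id):
    # Another independent I/O call, also simulated.
    await asyncio.sleep(0.1)
    return {"id": user_id}

async def handler(user_id):
    # Both calls run concurrently: the handler waits ~0.1s total,
    # not ~0.2s as it would if they were awaited one after the other.
    orders, profile = await asyncio.gather(
        fetch_orders(user_id), fetch_profile(user_id)
    )
    return {"profile": profile, "orders": orders}

result = asyncio.run(handler(42))
```

The key detail is that neither coroutine is awaited before the other is started; `asyncio.gather` schedules both, so the handler's latency is bounded by the slowest call rather than the sum of all calls.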
2. Algorithmic Efficiency
- Many performance issues come from poor iteration patterns. For example, database insert/update calls inside a nested loop can kill throughput.
- Ensure your logic is time-optimised; loops are inevitable, but inefficient logic over large datasets can cause serious performance issues.
- Consider using flat loops, maps, sets, or even multiple if statements instead of nested loops wherever possible.
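As a concrete sketch of the nested-loop point, here is a hypothetical join of orders to customer names done two ways: the nested-loop version scans every customer for every order (O(n·m)), while the map-based version builds a lookup table once and does O(1) lookups.

```python
# Nested-loop version: scans all customers for every order (O(n * m)).
def attach_customer_slow(orders, customers):
    result = []
    for order in orders:
        for customer in customers:
            if customer["id"] == order["customer_id"]:
                result.append({**order, "customer": customer["name"]})
    return result

# Map-based version: one pass builds an index, then O(1) lookups.
def attach_customer_fast(orders, customers):
    by_id = {c["id"]: c for c in customers}
    return [
        {**order, "customer": by_id[order["customer_id"]]["name"]}
        for order in orders
    ]

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Linus"}]
orders = [{"customer_id": 2, "total": 30}, {"customer_id": 1, "total": 15}]
fast = attach_customer_fast(orders, customers)
```

On two-item lists the difference is invisible; over thousands of rows the nested version degrades quadratically while the map stays near-linear.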
3. Limit payload size
- Sending the entire dataset between frontend and backend (even when only a small portion is updated) increases serialisation/deserialisation overhead, consumes unnecessary network bandwidth, and can expose sensitive data.
- Ensure you have a proper strategy for save/edit use-cases.
- Send and receive only necessary data, or data that has changed.
- Use partial updates or delta updates wherever necessary.
- Consider incorporating a draft mechanism with database views or flags in the table, so that in-progress changes do not require a full dataset exchange.
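A minimal sketch of the delta-update idea: instead of sending the whole record back, compute only the fields that changed and send that as the PATCH body. `diff_fields` is a hypothetical helper name, not a library function.

```python
def diff_fields(original, edited):
    """Return only the fields that changed - a minimal delta payload."""
    return {k: v for k, v in edited.items() if original.get(k) != v}

original = {"name": "Ada", "email": "ada@example.com", "bio": "Engineer"}
edited = {"name": "Ada", "email": "ada@new.example.com", "bio": "Engineer"}

# Only the changed field crosses the network, not the full record.
delta = diff_fields(original, edited)
```

In a real API you would pair this with a partial-update endpoint (e.g. HTTP PATCH) that applies only the supplied fields server-side.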
4. Batch processing
- Real-world applications deal with large volumes of data, so bulk operations outperform many individual network calls.
- While batching can improve performance, under-fetching can increase network calls and degrade performance - so choose the batch size intentionally.
- Ensure heavy database write operations are batched properly, instead of executing them one by one.
- For heavy reads, consider reading data in small chunks, so the application layer is not overwhelmed.
- Operations occurring in close succession can be buffered and executed at regular intervals to reduce load.
- Ensure your batch processing handles failures gracefully: a failure should not cause heap growth, memory leaks, or inconsistent state. Use transactions where necessary and implement retry logic with proper memory cleanup.
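The batching points above can be sketched with the standard-library sqlite3 module standing in for a real database. The chunk size of 500 is an arbitrary illustration; `with conn:` wraps the work in a transaction that rolls back on failure.

```python
import sqlite3
from itertools import islice

def chunked(iterable, size):
    """Yield successive batches of at most `size` items."""
    it = iter(iterable)
    while batch := list(islice(it, size)):
        yield batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, total REAL)")

rows = [(i, i * 1.5) for i in range(10_000)]

# Insert in batches of 500 inside one transaction, instead of
# 10,000 round trips of single-row INSERTs.
with conn:  # commits on success, rolls back on failure
    for batch in chunked(rows, 500):
        conn.executemany("INSERT INTO orders VALUES (?, ?)", batch)

count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The same shape applies to reads: iterate in fixed-size chunks rather than loading the whole table into memory at once.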
These quick checks alone can significantly improve request latency. Once these are in place, you can properly plan caching to reduce frequent database calls and paginate responses that return large datasets.
Database Optimisation
Databases can hide massive performance gains, but they often receive less attention than application code during design.
Before you dive into optimising your queries, profile your slowest query using tools like pg_stat_statements for PostgreSQL or equivalent.
1. Connection Pool
- High request latency but low database CPU/IO? Long waits before queries execute? Your connection pool is likely exhausted.
- Increase your connection pool size or implement more efficient request handling.
- Consider tools like PgBouncer for PostgreSQL (or an equivalent), which let you serve thousands of client connections without overwhelming the database.
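To make the pooling idea concrete, here is a toy fixed-size pool built on `queue.Queue`, using sqlite3 connections as stand-ins. In production you would use your driver's built-in pool or an external pooler like PgBouncer rather than this sketch; the point is only the shape: connections are created once, and callers borrow and return them instead of opening a fresh connection per request.

```python
import queue
import sqlite3

class ConnectionPool:
    """Toy fixed-size pool: callers block until a connection is free,
    instead of opening a new connection for every request."""

    def __init__(self, size, factory):
        self._pool = queue.Queue(maxsize=size)
        for _ in range(size):
            self._pool.put(factory())

    def acquire(self, timeout=5):
        # Blocks up to `timeout` seconds if the pool is exhausted -
        # this wait is exactly the symptom described above.
        return self._pool.get(timeout=timeout)

    def release(self, conn):
        self._pool.put(conn)

pool = ConnectionPool(size=2, factory=lambda: sqlite3.connect(":memory:"))
conn = pool.acquire()
one = conn.execute("SELECT 1").fetchone()[0]
pool.release(conn)
```

When acquire-time waits dominate your latency, the fix is either a larger pool, faster query turnaround (so connections are returned sooner), or a pooler in front of the database.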
2. Query optimisation
- Analyse your slowest queries with EXPLAIN / EXPLAIN ANALYZE to get the query plan and execution costs.
- Look for N+1 queries, expensive joins and subqueries, and unnecessary functions or computations.
- Consider whether moving certain operations to application code is more efficient than letting the database handle them.
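The N+1 pattern is worth seeing side by side. This sketch uses an in-memory sqlite3 database as a stand-in: the first version issues one query for the orders plus one query per order, while the second replaces all those lookups with a single JOIN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer_id INTEGER);
    CREATE TABLE customers (id INTEGER, name TEXT);
    INSERT INTO orders VALUES (1, 10), (2, 20);
    INSERT INTO customers VALUES (10, 'Ada'), (20, 'Linus');
""")

# N+1 pattern: one query for the orders, then one query per order.
orders = conn.execute(
    "SELECT id, customer_id FROM orders ORDER BY id"
).fetchall()
names_n_plus_1 = [
    conn.execute("SELECT name FROM customers WHERE id = ?",
                 (customer_id,)).fetchone()[0]
    for _, customer_id in orders
]

# Single-query version: one JOIN replaces the N extra round trips.
names_joined = [
    row[0] for row in conn.execute("""
        SELECT c.name FROM orders o
        JOIN customers c ON c.id = o.customer_id
        ORDER BY o.id
    """)
]
```

With two orders the difference is negligible; with ten thousand orders the N+1 version issues ten thousand extra round trips, which is exactly the pattern EXPLAIN output and query logs help you catch.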
3. Indexing
- Does your query plan show long sequential scans? Do you usually deal with large datasets per query?
- If yes, add indexes on columns used in WHERE, GROUP BY, ORDER BY, and JOIN clauses.
- If two columns are often read together, consider a composite index.
- Indexes add storage overhead, so ensure they are used effectively.
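You can watch a query plan flip from a scan to an index lookup. This sketch uses sqlite3's EXPLAIN QUERY PLAN (PostgreSQL's EXPLAIN plays the same role); the table, index name, and data are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)"
)
conn.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                 [(i, i % 100, float(i)) for i in range(1000)])

def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is the detail text.
    return " ".join(row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT total FROM orders WHERE customer_id = 42"
before = plan(query)   # full table scan

conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(query)    # now searches via idx_orders_customer
```

The same WHERE clause goes from examining all 1000 rows to touching only the ~10 matching ones, which is the effect you are looking for when sequential scans dominate a slow plan.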
4. Normalised vs Denormalised Tables
- Do you often require multiple joins to serve common read patterns? If yes, a denormalised table is better, since it reduces the joins that can slow the query.
- If you must keep the tables separate, consider creating a view that combines the data to simplify read operations.
- But if you require highly consistent data at all times, keep the tables normalised.
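The view-based middle ground looks like this sketch (sqlite3 again as a stand-in; table and view names are invented). The underlying tables stay normalised, but read paths query one pre-joined view instead of repeating the JOIN everywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY,
                         customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada');
    INSERT INTO orders VALUES (100, 1, 25.0);

    -- Keep the normalised tables as the source of truth, but expose
    -- a combined view so readers don't rewrite the JOIN everywhere.
    CREATE VIEW order_summary AS
        SELECT o.id AS order_id, c.name AS customer_name, o.total
        FROM orders o JOIN customers c ON c.id = o.customer_id;
""")

row = conn.execute("SELECT * FROM order_summary").fetchone()
```

A plain view simplifies the code without denormalising the data; if the JOIN itself is the bottleneck, a materialised view or a genuinely denormalised table is the next step, at the cost of consistency maintenance.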
Backend Infrastructure
A system can only handle as many requests as its infrastructure allows. Whether you're using a VM, managed instance groups, or another setup, having the right CPU and memory configuration is crucial.
There is no universally correct configuration; CPU and RAM utilisation will always depend on your specific workload and traffic pattern. To find your configuration, follow this three-phase formula:
Phase 1 - Measure Peak CPU and RAM Usage:
The first step is to load test your application, or the specific API you're optimising. Monitor CPU and RAM utilisation closely during the test.
Rule of Thumb: if your CPU/RAM utilisation exceeds ~80%, it's usually a sign that more resources may be needed. Adjust based on your workload.
Phase 2 - Allocate the desired CPU/RAM:
A simple starting point for allocation is peak usage * 2.
This "2x" is just a guideline - sometimes 1.5x is sufficient, and other workloads may require more than 2x. Use load tests to find the right combination and avoid surprises in production.
Phase 3 - Load Test with the New Configuration:
Finally, benchmark your application with the new CPU and RAM configuration to ensure the changes actually improve throughput and reduce latency.
This strategy works well with vertical scaling; if configuration issues still persist, consider horizontal scaling.
If you're using managed instance groups, ensure a minimum instance count is set so that cold-start issues are handled gracefully.