What is this article?
I work at a web company, handling the maintenance and operation of web systems.
In this role, I am often asked, particularly by junior engineers, how to approach performance tuning and performance testing, so I decided to summarize the basic knowledge worth having.
Web system performance is a deep topic (and there are few comprehensive books on it), so I don't feel like I can cover everything, but I hope to reduce the number of people who get stuck in the same places I did.
I. To Understand Performance
To begin with, we often vaguely say "performance is good" when something moves snappily, but what exactly is performance?
What is Performance?
In this article, "Web System Performance" is defined as one of the non-functional requirements of a system. It is defined by whether the target system can:
- Operate at the speed expected by the user,
- Stably (regardless of time of day, operation method, user environment, or access by other users),
- Continue to return the expected output.
Since a web system ultimately runs on physical machines, there are hard limits on its processing speed.
By satisfying performance requirements within those constraints, a service can gain a competitive advantage in the market. (No matter how useful a service is, you wouldn't want to use it if every page transition took 10 seconds. Conversely, fast-loading pages offer benefits like reduced user churn and higher search engine rankings (SEO).)
Additionally, if the same performance can be achieved with fewer servers, costs can be reduced.
What is Performance Tuning?
States where a web system fails to meet performance requirements are often described as "slow," "heavy," or "high load." These generally fall into one or both of the following two categories:
- High Latency
- Low Throughput
Here, the terms "Latency" and "Throughput" have appeared. These are extremely important words for understanding system performance.
"Latency" generally refers to "the time from when the system accepts a request until it is processed (sec, ms, μs)" (in the case of web systems, this is often equated with response time).
Since latency is processing time, the smaller it is, the better the performance.
In contrast, "Throughput" refers to "the number of requests the system can process per unit of time (RPS: Requests Per Second)."
Web systems (and their components) have a limit on the number of requests they can process simultaneously. If simultaneous requests exceed this limit, the requests are queued in some form (i.e., put on hold to be processed later). As a result, there is a wait time before processing starts, making the response time slower.
Since throughput is the number of requests that can be handled simultaneously, the larger it is, the better the performance.
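As a back-of-the-envelope sketch (not from the original text), Little's Law ties these two numbers together: the average number of in-flight requests equals throughput times latency. For example:

```python
# Little's Law: average in-flight requests = throughput (RPS) x average latency (s).
def required_concurrency(throughput_rps: float, latency_sec: float) -> float:
    """Average number of requests in flight at a given load."""
    return throughput_rps * latency_sec

def max_throughput(workers: int, latency_sec: float) -> float:
    """Rough RPS ceiling for a fixed pool of workers/connections."""
    return workers / latency_sec

# A system serving 200 RPS at 50 ms average latency keeps ~10 requests in flight.
print(required_concurrency(200, 0.05))  # ~10.0
# 16 workers that each take 100 ms per request top out around 160 RPS.
print(max_throughput(16, 0.1))          # ~160.0
```

This is also why a fixed-size worker pool caps throughput: once all workers are busy, additional requests queue and response time grows.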
Current web systems are divided into multiple components, such as load balancers, web servers, application servers, and databases. Therefore, performance issues such as high latency or low throughput in a specific component can drag down the performance of the entire system. This is called a "performance bottleneck" (or simply "bottleneck").
The work of measuring the latency and throughput of each component and resolving these bottlenecks to improve performance is called "Performance Tuning."
Importantly, improving a part that is not a bottleneck (or a part that is a bottleneck but has a small impact on the whole) will not significantly improve the overall system performance.
The performance tuning process involves cost-effectiveness issues (in addition to financial costs, there are man-hours for changes, future scalability, security impacts, etc.), so trying to tune everything with a single method is not recommended.
It is important to identify how much performance is needed for the target system, exactly what is becoming the bottleneck, and perform tuning that is effective for that specific spot.
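To make the bottleneck point concrete, here is a hypothetical arithmetic sketch: a request spends 900 ms in one component and 100 ms everywhere else (the numbers are illustrative only).

```python
# Hypothetical request breakdown: 900 ms in the bottleneck component (say, the
# database) and 100 ms in everything else. Numbers are illustrative only.
def total_after_speedup(component_ms: float, rest_ms: float, factor: float) -> float:
    """End-to-end latency after speeding up one component by `factor`."""
    return component_ms / factor + rest_ms

# Doubling the speed of the NON-bottleneck part barely moves the total...
print(total_after_speedup(100, 900, 2.0))  # 950.0 ms (only ~5% better)
# ...while doubling the speed of the bottleneck cuts it nearly in half.
print(total_after_speedup(900, 100, 2.0))  # 550.0 ms (~45% better)
```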
Skills Required to Understand Performance
I believe the following three skills are necessary to understand performance in web systems:
- Understanding of Computer Science (especially networks, databases, programming, OS, and computer architecture).
- Understanding of the system you are operating (not just the application logic and infrastructure architecture, but also data characteristics and business requirements/use cases).
- Logical thinking ability to use this knowledge to create hypotheses about bottlenecks, verify them, interpret the results (logs and metrics), and derive new hypotheses (and the tenacity to keep thinking until the goal is achieved).
None of these can be acquired overnight. Even with specialized education at a university, they are not immediately usable at a practical level. I think it's difficult unless you learn them little by little through actual work (at least, that was the case for me).
II. Basic Knowledge of Web Systems
From here, I will explain the knowledge necessary to understand performance specifically in the context of web systems.
However, since modern web systems are extremely complex, I cannot write everything down, so I will limit this to the knowledge necessary to understand the overview.
General Web System Processing Flow
First, it is good to have an image of what the components of a general web system are and what kind of processing they do.
(This is a very rough diagram, but) for example, something like this:
From there, you can dig deeper into the points that deserve attention.
1–2. DNS Request
A DNS request is the process of obtaining the destination server's IP address from the domain name (FQDN) in the URL by querying a DNS server.
Usually, the request result is cached within the local PC, so repeated inquiries do not occur.
In this step, since no HTTP request is sent from the browser, DNS rarely becomes the focus from a web system performance perspective.
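That said, if you ever need to confirm resolution behavior, it is easy to observe directly. A minimal stdlib sketch (the hostname here is just an example; repeated calls are typically served from the cache mentioned above):

```python
import socket
import time

def resolve(hostname: str) -> tuple[str, float]:
    """Resolve a hostname via the OS resolver and time the lookup in ms."""
    start = time.perf_counter()
    ip = socket.gethostbyname(hostname)  # goes through the OS resolver and its cache
    return ip, (time.perf_counter() - start) * 1000

ip, ms = resolve("localhost")  # "localhost" keeps the example network-free
print(f"{ip} resolved in {ms:.2f} ms")
```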
3–5. Sending the HTTP Request (Outbound)
Based on the result of the DNS request, an HTTP request is assembled on the browser side and sent via the TCP/IP protocol.
In normal access via the internet, the request is sent from a PC or smartphone via a home router to the destination server.
The ISP (Internet Service Provider) manages the network equipment on the path between the home router and the internet, ensuring the request is sent via an appropriate route.
Since there are few things we can actually control in this step (though technically there are things to consider like signal strength and bandwidth guarantees on the internet), it rarely comes up as a topic in web system performance.
6. From Load Balancer to Database Server
If we write the flow from the load balancer to the database server in a bit more detail, it looks like this:
- The Load Balancer performs TLS termination (and TLS session establishment if it's the first request) and load balancing.
- The Web Server manages TCP connections and processes static content. Requests that should return dynamic values are forwarded to the Application Server.
- The Application Server processes business logic. If access to the database or calls to other APIs are necessary, it includes those as well.
- The Database Server executes queries and returns the necessary data to the Application Server.
In Web Backend (Server-side) performance tuning, this area is the main focus.
7–9. Returning the HTTP Response (Inbound)
Since this is the return path, I will omit it.
10–11. Acquiring Additional Content and Browser Rendering
Modern web pages are not finished after sending and receiving HTML once; they subsequently send and receive various contents (CSS, JavaScript, images, etc.) and use them for the final page content rendering.
Generally, web browsers process in the following order:
- Loading: Downloads HTML and CSS, and builds the DOM tree and CSSOM tree that describe their structure. First, tags such as `<link>`, `<script>`, and `<img>` are extracted from the HTML obtained by the first request, and content is downloaded from the URLs they reference. Since DOM tree construction can be blocked while CSS or JavaScript is downloading, the order in which content is downloaded is very important.
- Scripting: Reads and executes JavaScript.
- Rendering: Merges the DOM tree and CSSOM tree to create a "Layout Tree (what should be drawn)." Specifically, it calculates styles (searching for which CSS applies to which element in the DOM tree) and layout information such as element size and position, holding it as a tree structure.
- Painting: Based on the layout tree, it actually draws pixels on the screen. Drawing items are calculated by the CPU or GPU and drawn on the screen.
These are called the Critical Rendering Path, and speeding this up is the main focus of Web Frontend performance tuning.
In addition to this, recent web pages have many points to consider from a performance perspective even after rendering the content once, such as rendering videos and animations, and accepting user operations like button clicks.
Responses tailored to individual requirements are necessary.
Recommended Literature
- MDN Web Docs: If you do web development, this is a document you should read at least once. I don't think there is any other document that covers the Web so comprehensively.
III. How to Conduct Performance Testing?
So, how do we guarantee system performance?
From here, I will explain how to specifically design, execute, and interpret performance tests.
Purpose of Performance Testing
Performance testing is a test (or group of tests) aimed at confirming whether the system meets performance requirements (Availability, Performance/Scalability in Non-functional Requirements Grade).
Generally, it is often conducted for the following purposes, and Performance Testing Guidance for Web Applications gives them names respectively.
| Type | Purpose |
|---|---|
| Performance Testing | To clarify the current system performance. |
| Load Testing | To confirm whether the system can continue to meet performance requirements for expected use cases. This includes not only peak access testing but also testing if it's okay to run simultaneously with nightly batches, or testing if it runs fine for a long time (Endurance Testing). |
| Stress Testing | To confirm whether the system can respond as a system under high load, and to check for application bugs or security issues specific to high load conditions. |
| Capacity Testing | To clarify if the system is scalable for the future, such as increases in users or transactions, and what tuning is necessary for expansion. Identify system performance bottlenecks and collect information necessary for future tuning work or scale-out plans. |
In practice, these tests are rarely done separately; often, several or all types of tests are confirmed with the same test scenario.
Flow of Performance Testing
In many cases, performance testing is conducted in a PDCA cycle like the following.
Test Planning
First, plan the test according to the purpose.
A common way to fail at performance testing is: "I put some load on the system, but I couldn't tell what counted as good or bad, so the work dragged on."
If you clarify the purpose of the test and plan what to do and what to confirm in advance, this is less likely to happen.
When formulating a plan, I proceed as follows:
1. Decide the Test Purpose
Summarize the test purpose in 2-3 sentences as bullet points to convey it concisely to others.
1. We added a new endpoint 'yyy' to the xxx API. Confirm if this meets the pre-defined performance requirements. (Performance Test perspective)
2. Confirm that the addition of the new endpoint 'yyy' does not cause performance degradation of existing endpoints. (Regression Test perspective)
3. Confirm up to how many requests per second the xxx API as a whole can process within a certain time without returning abnormal values. (Stress Test/Capacity Test perspective)
Tip: Try not to write details of the test scenario (how many RPS are needed, what data preparation is needed, etc.) here. Keep it concise.
2. Understanding System Architecture
Next, move on to looking at the target system (don't write the test yet!).
I will omit architecture details as it is not the main focus of this article, but understanding the structure of the target system is essential.
In particular, it is important to grasp in advance how system components depend on each other and who manages them and how.
You will need to communicate with development engineers and check documents.
3. Measurement/Estimation of Access Patterns
As mentioned above, it is important to understand the actual use cases of the target system.
Then, at the test planning stage, collect information about the target system's access patterns and their current/future trends.
For example, you need to investigate the following points:
- Current Access Patterns
  - Requests per second (RPS)
  - Unique users (UU, DAU, etc.)
  - Approximate response time for each function
  - Amount of data held: number of database records, file sizes processed by batches, etc.
  - Functional characteristics: Do "heavy" processes like search/aggregation exist? Are there features that must never go down, or features that are acceptable to drop in worst-case scenarios? Is the workload read-heavy or write-heavy?
- Temporal Trends
  - Short/medium-term & periodic (e.g., "Peak access is every night around 7 PM. Previous actual value was xxx RPS," "Monthly closing batch runs at the beginning/end of the month, loading the database shared with the API").
  - Seasonal (e.g., "Access volume increases by 1.5x on average during the New Year period. Previous peak actual value was xxx RPS").
  - One-off (e.g., "Access peaks immediately after a TV commercial. Previous actual value was xxx RPS").
  - Long-term (e.g., "Daily Active Users (DAU) have been increasing by about 10% annually for the last 2-3 years").
If the system is already running, it is good to look at access patterns using logs and monitoring tools. If it is a new development and logs cannot be referenced, you will have to estimate to some extent from expected user numbers and use cases (this is honestly quite difficult, so I recommend assuming a worst-case scenario and having a sufficient buffer).
Also, if information is missing, don't forget to consult with the Product Owner or Architect.
They likely have information that a tester might not have.
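As one hypothetical way to mine existing logs, peak RPS can be estimated by bucketing access-log timestamps per second (the log format and field positions below are assumptions for illustration, not a standard):

```python
from collections import Counter

# Hypothetical access-log lines; the timestamp format and field order are
# assumptions for illustration, not a standard log format.
log_lines = [
    "2024-06-01T19:00:01 GET /api/xxx 200",
    "2024-06-01T19:00:01 GET /api/xxx 200",
    "2024-06-01T19:00:01 GET /api/yyy 200",
    "2024-06-01T19:00:02 GET /api/xxx 200",
]

def peak_rps(lines: list[str]) -> int:
    """Bucket requests by their second-resolution timestamp and take the max."""
    per_second = Counter(line.split()[0] for line in lines)
    return max(per_second.values())

print(peak_rps(log_lines))  # 3 (three requests landed in the 19:00:01 bucket)
```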
4. Formulation of Scenarios and Expectations
Once you are here, clarify the test scenarios (test cases) and their expected values.
For example:
1. Apply a load of 1.5x the expected peak access (600 RPS) to the xxx API (including the new endpoint yyy) for 30 minutes, using data copied from the production environment.
Expectation: Error rate is 0.1% or less. 99% of all requests (the 99th percentile) have a response time within 1 second. For all endpoints other than yyy in the xxx API, compare the 99th-percentile response time with previous performance tests and confirm there is no increase of 20% or more.
2. Continually increase the load on the xxx API step-by-step to find the point where the system reaches its limit. Stop the test when the error rate exceeds 1%.
Expectation: RPS at the time of stopping is 1000 RPS or more. 99th percentile response time is within 2 seconds.
3. Apply a load assuming daytime access (50 RPS) to the xxx API for 3 hours.
Expectation: Error rate is 0.1% or less. 99th percentile response time is within 1 second. No increase of 20% or more for endpoints other than yyy compared to previous tests. During the test, no abnormalities in API server CPU/Memory/Disk, JVM heap memory (Garbage Collection).
The scenario examples above are written to match the test purpose examples mentioned earlier.
By doing this, "what needs to be done to call the test complete" becomes clear, preventing hesitation in subsequent steps.
If the written scenario does not match the test purpose or feels off, try reviewing the performance requirements, test purpose, or the scenario itself.
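Expectations phrased this way can also be checked mechanically. A small sketch (the nearest-rank percentile and the thresholds mirror the example scenario above, not any particular tool's definitions):

```python
import math

def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    return ordered[math.ceil(p * len(ordered) / 100) - 1]

def meets_expectation(response_times_ms, errors, total,
                      p99_limit_ms=1000.0, max_error_rate=0.001):
    """Check the example scenario: error rate <= 0.1% and p99 <= 1 second."""
    return (errors / total <= max_error_rate
            and percentile(response_times_ms, 99) <= p99_limit_ms)

# 100 samples: 99 fast requests plus one slow outlier.
times = [100.0] * 99 + [1500.0]
print(percentile(times, 99))             # 100.0 (the outlier sits beyond p99)
print(meets_expectation(times, 0, 100))  # True
print(meets_expectation(times, 1, 100))  # False (1% errors exceeds 0.1%)
```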
Execution
Once scenarios and expectations are clear, build the test environment and execute.
1. Case Description
Translate the test scenario into a script executable by performance testing tools (Locust, k6, etc.).
If the target application already exists, it is good to run the test scenario with a small RPS once to test the behavior before the actual performance test.
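As a tool-agnostic illustration of such a small trial run (the real scenario would be a Locust or k6 script), this stdlib-only sketch sends a few sequential requests to a throwaway local server and records per-request latencies:

```python
import http.server
import threading
import time
import urllib.request

# Throwaway local server standing in for the real target system.
server = http.server.HTTPServer(("127.0.0.1", 0), http.server.SimpleHTTPRequestHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_address[1]}/"

def smoke_run(target: str, n: int = 5) -> list[float]:
    """Send n sequential requests and return per-request latencies in ms."""
    latencies = []
    for _ in range(n):
        start = time.perf_counter()
        with urllib.request.urlopen(target) as resp:
            assert resp.status == 200  # a broken script should surface here, cheaply
        latencies.append((time.perf_counter() - start) * 1000)
    return latencies

latencies = smoke_run(url)
print(f"{len(latencies)} requests OK, worst latency {max(latencies):.1f} ms")
server.shutdown()
```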
2. Environment Setup
Build the environment for performance testing.
Broadly speaking, you need to build three things: the system to be tested (Target System), the system to measure results (Monitoring System), and the system to actually generate the load (Load Generator).
1. Target System
Ideally, you should reproduce the exact same configuration (including data) as the production environment, but this is often difficult due to labor and cost. It is acceptable to reduce the number of servers or lower the specs to fit the purpose of the test.
Also, if the target system depends on external systems, consider replacing them with mocks so that test transactions do not affect external systems. Note that doing this may change the performance bottleneck, potentially preventing accurate test results.
2. Monitoring System
This should basically reproduce the production configuration. Especially for new development, "confirming if the monitoring system works as expected" should be included in the test purpose.
In many cases, the monitoring system should include aggregation/visualization of logs sent from the application side, server resources (CPU, memory, network, etc.), and middleware metrics (web server threads, database connections, JVM heap memory, etc.). Confirm that these are measurable before the performance test.
3. Load Generator
For the load generator, it is best to build it on the same network as the actual load source (on the internet if access comes from the internet, or within the data center if access comes from within) to simulate actual load.
Also, when trying to apply a large load, the load might not be applied fully because the bottleneck is on the Load Generator side (distinguishing this from a bottleneck in the target system is surprisingly difficult). Techniques such as accessing the target system simultaneously from multiple servers (changing kernel parameters like file descriptors is also effective) or setting up a cluster configuration (if supported by the performance testing tool) will be necessary.
Here are representative performance testing tools and measurement tools (I will omit specific usage as there are many articles on them).
Performance Testing Tools
- k6: A Go-based tool where scenarios can be written in JavaScript. High performance and single binary make it easy to integrate into CI. If in doubt, choose this.
- Locust: Scenarios can be written in Python.
Measurement Tools
- ELK Stack (Elasticsearch, Logstash/Fluentd, Kibana): Log aggregation/visualization tool. Often used as a set: logs transferred from the app by Logstash or Fluentd are searched in Elasticsearch via Kibana's visualization UI.
- Prometheus / Grafana: Prometheus monitors server resources and middleware metrics, and Grafana is often used as the UI to visualize them.
3. Schedule
Performance testing has the following characteristics:
- Tends to be late in the development project. Unlike functional tests, it is difficult to isolate parts for unit testing, so it inevitably happens after infrastructure setup is complete or features are finalized.
- If requirements are not met, fixing it takes time. Generally, analyzing, finding, and improving performance bottlenecks requires considerable knowledge and time.
Both are major risks to the schedule, so you should execute with a sufficient buffer (at least about 2 weeks).
Also, incorporate notifications to relevant parties such as infrastructure and operation teams of dependent systems into the schedule. For large-scale systems, the impact range of performance testing tends to be large, so move early.
4. Execution
Now, execute the test.
Basically, you just run the test scenario from the load generator as described, but if you share the test environment with other systems, their operations might cause unexpected results. If any unnecessary programs or processes seem to be running, don't forget to stop them.
Interpretation of Results
This is the most difficult part of performance testing.
Actually look at the monitoring tools and judge where the bottleneck might be.
1. Confirmation of Results
Start by looking at externally observable results (throughput, response time, status codes, etc.), and for anything that cannot be judged from those alone, look at more detailed metrics (CPU/memory/network, etc.).
I always check in the following order:
1. Does the RPS seen from the Load Generator match the RPS seen from the Application side?
If they do not match, there may be a throughput bottleneck somewhere on the path from the load generator to the target system, and the load generator may not be fully applying the expected load in the test scenario (a state called "requests are clogged"). In this case, check the metrics of systems on the path one by one to find the bottleneck.
2. Are there any logs indicating abnormalities due to high load, such as error logs or slow query logs?
If the nature of the abnormality is already clear from the logs, consider tuning according to that content. Even if the content is not clear from the logs alone, you may be able to narrow down the cause. For example, if the API returns a 500 HTTP status code only under high load, suspect timeout settings of systems on the path or application bugs (lock timeouts, race conditions, etc.).
3. Is the Response Time (Latency) as stated in the performance requirements?
First, check for each endpoint whether the response time measured from the load generator side meets the requirements. If not, check the response time of systems on the path one by one to find which part is the latency bottleneck. If only specific endpoints are slow, focus on checking those.
4. Are the metrics of each component in the target system "problem-free"?
Listing what comes to mind, I usually look at these areas:
| Type | Content |
|---|---|
| Common to All Servers | * Load Average: Is it higher than the number of CPU cores? * CPU Usage: Is it stuck at 100%, or hovering around 80%, or not used at all (spending too much time waiting for I/O)? Also, if running on shared containers like Kubernetes, is it throttling? * Disk I/O: Is the upper limit used up? |
| Load Balancer | * TCP Connections: Is it stuck at the maximum value? * Network Bandwidth: Is it used up? |
| Web Server / Reverse Proxy | * TCP Connections: Is it stuck at the maximum value? |
| Application Server | * TCP Connections: Is it stuck at the maximum value? Especially if doing connection pooling. * Heap Memory (JVM): Is GC properly performed (is heap area released periodically)? |
| Database Server | * Thread Count: Is it stuck at the maximum value? |
5. Does behavior significantly violate performance requirements when accessed from a browser?
If 1-4 are fine, finally confirm the performance of the Web application as a whole (Never forget the tester's perspective: "Is the quality ultimately good enough to release as an application?").
When checking from a browser, there are many factors the tester cannot control, such as internet traffic and signal status, making objective evaluation difficult. Therefore, basically, it will be confirming that the time until the page is displayed does not deviate significantly from expectations by actually opening the web page.
On top of that, Core Web Vitals, advocated primarily by Google, can be one objective indicator. These are indicators of website health, and as of December 2025, the following three are used:
| Metric | Content | Target |
|---|---|---|
| LCP (Largest Contentful Paint) | Rendering time of the largest image, text block, or video (main content) displayed. | 2.5s or less |
| CLS (Cumulative Layout Shift) | Score for the amount of layout shift (visual "jank") during loading. | Less than 0.1 |
| INP (Interaction to Next Paint) | Score regarding latency of clicks, taps, and keyboard operations. | 200ms or less |
These can be measured with Google Chrome's DevTools.
https://developer.chrome.com/docs/devtools/performance/overview
These do not necessarily match the performance expected by the user, but including them as one of the performance requirements helps build a common understanding among project members.
If everything looks fine up to this point, the performance test is complete.
If a problem is found, proceed to the next step.
2. Hypothesis Formulation and Decision on Next Actions
If the bottleneck is clear, tuning that part will be the next action. However, in many cases, it is difficult to reveal all bottlenecks with a single test scenario execution.
In that case, list possibilities of factors that could be bottlenecks, prioritize them, and verify.
For example, if slow query logs are observed only for specific SQL on the database server and there are no abnormalities in other metrics, the following possibilities exist:
- The SQL statement itself is not tuned (e.g., using `<>` for comparison, or unnecessary subqueries).
- Indexes are not appropriate.
- The amount of data processed by the SQL is too large.
- (In the case of transactions) Lock waits are occurring.
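Whether an index is actually being used can be read from the execution plan. A minimal illustration with Python's built-in sqlite3 (production databases have their own EXPLAIN syntax and planners; table and index names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount INTEGER)")
conn.executemany("INSERT INTO orders (user_id, amount) VALUES (?, ?)",
                 [(i % 10, i) for i in range(100)])

def plan(sql: str) -> str:
    """Return sqlite's query plan for a statement as a single string."""
    return " ".join(str(row) for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT amount FROM orders WHERE user_id = 3"
before = plan(query)  # full table scan: every row is examined
conn.execute("CREATE INDEX idx_orders_user_id ON orders (user_id)")
after = plan(query)   # the index narrows the search to matching rows
print(before)
print(after)
```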
As mentioned earlier, improving a non-bottleneck part will not improve the performance of the target system as a whole.
Considering the possibility of bottleneck resolution, correction man-hours, and impact range, plan the verification.
Tuning
Once the correction for what seems to be the bottleneck is complete, re-execute the test scenario.
Making corrections in the smallest possible units, so that each run can be compared with the previous results, makes it easier to identify which change actually mattered.
Repeat the above process until performance requirements are met.
As mentioned earlier, performance testing is work where it is hard to estimate man-hours, so I recommend having a sufficient buffer.
When performance testing is complete, do not forget to restore debug settings (debug logs, slow query logs, etc.) to their original state.
IV. Tuning Tips
Finally, I will list representative tuning tips as a "drawer" of ideas to use when bottlenecks are found.
The listed effort (man-hours) and effect ratings are rough, subjective estimates.
I'd be happy if you look at this when you are stuck for improvement ideas.
Common
| Technique | Description | Effort | Effect |
|---|---|---|---|
| Scale Out | Increase the number of servers to distribute load and improve overall processing capacity. Easy to implement in cloud environments, so often used in emergencies (aka "throwing money at the problem"). Often the first thought, but frequently fails if done blindly without considering bottlenecks. | Low | High |
| Scale Up | Replace server hardware (CPU/Memory/Disk) with higher performance ones. Also very effective if the bottleneck is clear. | Medium | High |
| Retry/Timeout Optimization | It is important to correctly set retry and timeout values in servers, middleware, and applications (LB, Web Server, App, Libraries, etc.) that connect to other components. Default values are often suitable for testing but cause problems under high load. Good to check once. | Low | Medium |
| Disable Debug Settings | Delete or disable all debug settings in the production environment. Some web frameworks have separate settings for development and production, so ensure production settings are used for security reasons too. Disabling debug logs also significantly contributes to performance. | Low | Medium |
Web Frontend (Delivery)
| Technique | Description | Effort | Effect |
|---|---|---|---|
| CDN (Content Delivery Network) | Cache static content like images on CDN provider servers (Edge Servers) and deliver from servers close to the user to reduce latency. Akamai, Fastly, Cloudflare, AWS CloudFront are famous. | High | High |
| Optimization of Images, Fonts | Reduce size by using image formats like WebP or AVIF, and font formats like WOFF. Be careful of browser backward compatibility. | Medium | Medium |
| Content Compression | Reduce transfer time by reducing file size during transfer. Especially effective for large files like images and videos. Gzip, Brotli, etc. | Low | Medium |
| Use Cache Headers (Cache-Control) | Allow browsers and CDNs to cache files appropriately to reduce access to the server itself. Be careful of security issues with caching. | Low | Medium |
Web Frontend (Logic)
| Technique | Description | Effort | Effect |
|---|---|---|---|
| SSG (Static Site Generation) | Convert pages that do not require dynamic processing into HTML at build time. High speed is expected as dynamic processing is not performed at the time of user request. | High | High |
| Review API Calls | Reduce backend API call counts or change how they are called. You can surprisingly find areas for improvement. | Medium | Medium |
| Reduce Redirects | Reduce unnecessary network communication by cutting unnecessary redirects. Note that unintended redirects can occur due to configuration errors, like missing a trailing "/" in a URL link. | Low | High |
| Async JavaScript Loading | Add the defer (or async) attribute to `<script>` tags to load JS files asynchronously, reducing the time DOM tree construction is blocked during Loading. Be careful of behavioral changes, since execution order changes. | Low | Medium |
| Review CSS Selectors | Reduce search processing time during browser Rendering (layout calculation) by reducing descendant selectors, etc. | Low | Low |
| Reduce CSS Imports | Reduce unnecessary CSS loading (and blocking time) by reducing `@import`. Recently CSS is mostly bundled by build tools, so you might not need to worry about this much. | Low | Low |
| Fix Image Sizes | Reduce browser Rendering (layout calculation) time by explicitly specifying height and width attributes on `<img>` tags. | Low | Low |
| Introduce Resource Hints | Explicitly declare resources expected to be accessed or known to be needed in `<link>` tags to let the browser pre-load them. There are four types: DNS Prefetch, Preconnect, Prefetch, and Prerender. | Low | Medium |
| Browser Caching via Service Worker | Use Service Worker to cache content on the browser, reducing time to fetch temporary content. It can work even when offline, handling communication delays due to signal status. | Medium | Medium |
Web Server
| Technique | Description | Effort | Effect |
|---|---|---|---|
| HTTP/2 (including gRPC) | Improve communication efficiency through stream multiplexing (avoiding browser connection limits) and header compression (HPACK). Should be considered especially for new builds. | Medium | High |
| Keep-Alive Settings | Reuse TCP connections to reduce overhead in TCP connection establishment (handshake). Basically effective, but be careful to set timeouts appropriately considering when connections are cut, or it becomes a source of bugs. | Low | Medium |
| Warm-up | Send traffic to the Web server before the access peak comes to avoid problems caused by a sudden increase in access (access spike). Since application server connection counts, DB memory cache/execution plan optimization, and JVM JIT compilation require some access after startup, warm-up helps handle peak load smoothly. It's more of a workaround, so it's safer to be able to handle access without warm-up. | Low | Medium |
Web Backend Application (Business Logic)
| Technique | Description | Effort | Effect |
|---|---|---|---|
| Separate App Server and Web Server | Assign connection handling and static content delivery to Web Servers (Apache, NGINX) and logic processing to App Servers (Tomcat, WSGI, Unicorn) to efficiently use resources. Follow the representative tech stack for your programming language. | Medium | High |
| Async Tasks | Reduce latency by making some server tasks asynchronous and returning a response to the client without waiting for completion. Effective for tasks where the client doesn't need the result, like sending emails or outputting logs. Implementations include thread generation or using message brokers like Kafka. | Low | Medium |
| Concurrent Tasks | Improve resource efficiency by performing parts of the server processing in parallel. For example, while one process is waiting for DB access or external API calls (not using CPU), another process can use that CPU time. Includes multi-threading, coroutines, Goroutines. | Medium | Medium |
| Parallel Tasks | Shorten processing time by proceeding with multiple tasks simultaneously. For example, if aggregating results from multiple API calls, calling them in parallel shortens time. Note that even with multi-threading, parallel execution requires multiple CPU cores on the App Server. | Medium | Medium |
| Connection Pooling | A technique where the client establishes multiple TCP connections in advance and reuses them when needed. Effective for processing time and CPU/Memory resources as it avoids establishing connections every time. Often done between App Server and DB Server. Too many connections consume server resources, so balance is key. | Low | Medium |
| Reuse Connection Objects | Instead of generating a connection (or context) object every time an HTTP or DB client library is used, reuse it (e.g., Singleton pattern) to reduce overhead. (Generating it every time is a common beginner mistake). | Low | Medium |
| Circuit Breaker Pattern | A mechanism to cut off calls to external APIs or DBs when the number of calls or errors exceeds a certain value. Libraries like Resilience4J (Java) are famous. Originally intended to prevent failure propagation, but also useful for load reduction. | Medium | Medium |
| Database Query Tuning | General query tuning. Start by checking slow query logs. If found, check execution plans (for RDB), optimize indexes, and reduce data transfer amount. | Low | High |
| Address N+1 Problem | A typical bottleneck in DB access where SQL statements are issued inside an application loop, resulting in a massive number of unnecessary SQL queries. Mostly an issue with how the SQL is constructed; fix it by using JOINs. If using an O/R mapper, check its debug logs and follow the library's instructions. | Low | High |
| Review DB Lock Granularity | Reduce the chance of other requests waiting on locks by shrinking lock granularity, improving overall throughput. If the query itself is fine but still slow, this is usually the cause. Make granularity as small as possible without causing data inconsistency (with a solid table structure, a lock usually covers a single row). | Medium | High |
| Bulk Insert/Update/Delete | Instead of committing every time an update SQL completes, commit after multiple update SQLs complete to reduce commit overhead. Commit processing time is surprisingly large, so reducing this count alone can significantly improve speed and reduce DB load. Especially effective in batch processing. | Low | High |
| Cache Data in External DB | Reduce latency by isolating only data requiring high-speed access from the database and holding it as a cache in a faster KVS (Key-Value Store). "Caching in the application layer" often refers to this. While effective, you hold the same data in multiple places, so always be careful about synchronization and cache invalidation timing. | Medium | High |
| In-Memory Cache | Reduce overhead of accessing external DBs by saving data in the application's memory itself. "Reuse Connection Objects" above falls into this. If the application is distributed across multiple servers, updating cached data becomes difficult, so limit this to rarely changed data like settings. | Medium | Medium |
| Reduce Unnecessary Logs | Reduce disk I/O time by cutting unnecessary logs. Unnecessary processing shouldn't be included regardless of logs, but log handling is often postponed (e.g., debug logs leaking to production). Manage this from the design phase, e.g., by setting appropriate log levels. | Low | Low |
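To make the N+1 row above concrete, here is a minimal sketch using an in-memory SQLite database. The `users`/`orders` schema is hypothetical; both functions return the same data, but the first issues one query per user while the second issues a single JOIN.

```python
# Sketch of the N+1 problem and its JOIN fix, using an in-memory SQLite DB.
# The users/orders schema is an invented example.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, item TEXT);
    INSERT INTO users  VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO orders VALUES (1, 1, 'pen'), (2, 1, 'book'), (3, 2, 'cup');
""")

def orders_n_plus_1(conn):
    """Anti-pattern: 1 query for users + 1 query per user (N+1 round trips)."""
    result = {}
    for uid, name in conn.execute("SELECT id, name FROM users"):
        result[name] = [r[0] for r in conn.execute(
            "SELECT item FROM orders WHERE user_id = ? ORDER BY id", (uid,))]
    return result

def orders_joined(conn):
    """Fix: one JOIN query returns the same data in a single round trip."""
    result = {}
    rows = conn.execute("""
        SELECT u.name, o.item
          FROM users u JOIN orders o ON o.user_id = u.id
         ORDER BY o.id
    """)
    for name, item in rows:
        result.setdefault(name, []).append(item)
    return result

assert orders_n_plus_1(conn) == orders_joined(conn)
```

With an O/R mapper the loop is often hidden inside lazy-loaded relations, which is why checking the issued-SQL debug logs is the reliable way to spot this.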
Database Server (DB)
| Item | Content | Effort | Effect |
|---|---|---|---|
| Index Optimization | Add/Remove indexes to reduce SQL processing time and DB load. Check SQL statements issued to the table and their execution plans. If indexes aren't working, a full scan is executed, making reference queries very slow. Blindly adding indexes slows down update queries, so be moderate. | Low | High |
| Data Reduction (Data Cleaning) | Keep the amount of data accessed by users small by moving unused data to archive tables, etc. Incorporate periodic data deletion into the design. | Medium | Medium |
| Create Read Replicas | Create a read-only DB instance and direct reference queries there to reduce load on the original DB instance. Modern DBMS often support this natively. | Medium | High |
| Horizontal Partitioning (Sharding) | Split data in a table by rows and save in multiple tables to reduce data volume per table. Effective when specific table data becomes large. Modern DBMS often support this natively, deciding the storage table by record key value. | High | High |
| Vertical Partitioning | Split data in a database by columns and save in multiple tables. Often used more for refactoring than performance tuning. | High | High |
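As a small illustration of the index-optimization row, the sketch below uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in for a production DBMS's execution plan. The `access_log` table and `idx_user` index are invented for the example; the point is that the plan switches from a full scan to an index search once the index exists.

```python
# Sketch: checking whether a query uses an index, with SQLite's
# EXPLAIN QUERY PLAN standing in for a production DB's execution plan.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE access_log (id INTEGER PRIMARY KEY, user_id INTEGER, path TEXT)")
conn.executemany("INSERT INTO access_log VALUES (?, ?, ?)",
                 [(i, i % 100, "/home") for i in range(1000)])

def plan(conn, sql):
    """Return the 'detail' column of each execution-plan step."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

query = "SELECT * FROM access_log WHERE user_id = 42"
print(plan(conn, query))   # before: typically a full scan, e.g. 'SCAN access_log'

conn.execute("CREATE INDEX idx_user ON access_log(user_id)")
print(plan(conn, query))   # after: an index search mentioning idx_user
```

Real DBMSs expose the same information via `EXPLAIN` (MySQL/PostgreSQL); the workflow is identical: find slow queries, read the plan, and add (or drop) indexes deliberately.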
Improving Perceived Speed (UX Tuning)
This isn't strictly a system performance improvement, but designing specifications and implementations so that the user doesn't feel the system is "slow" can shorten the perceived wait.
In the end, it is the users who judge whether a system is fast or slow.
From a UX standpoint, these "clever tricks" are actually quite important.
| Item | Content | Effort | Effect |
|---|---|---|---|
| Loading Icon | Show an icon indicating "loading" before screen rendering is complete to show the user that processing is underway. | Low | High |
| Skeleton Screen | Show a placeholder frame indicating "loading" before screen rendering is complete. Compared to a loading icon, the layout shift after loading is smaller, so it feels less jarring. | Medium | High |
| Client-side Animation | Show an animation on the client during server communication. Smartphone apps have splash screens by default; showing animation effectively there reduces the feeling of waiting. | Medium | High |
| Optimistic UI | For interactive operations, show the user that the operation was performed first, then do the actual processing. Famous example: X (Twitter) 'Like' button (the heart icon activates first to tell the user it's done, then the request is sent asynchronously). | Low | Medium |
| List Pagination | When displaying lists with many elements (search results, history), display divided content (e.g., 10 items at a time) to reduce fetching time. Often implemented with backend API interfaces for pagination and corresponding frontend pages. Methods include Offset (specify which number element is needed) and Cursor (response provides a key for the next element). | Medium | High |
| Lazy Load | Reduce unnecessary element loading by implementing asynchronous backend API calls or DOM rewriting in response to user scroll, loading only elements within the user's view. Applying this to list pagination creates "Infinite Scroll." | Medium | High |
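The offset and cursor styles from the list-pagination row can be sketched with plain functions over an in-memory list standing in for a sorted query result (the names `offset_page`/`cursor_page` are invented for this example):

```python
# Minimal sketch of the two pagination styles, over an in-memory list
# standing in for a search-result query sorted by key.
def offset_page(items, offset, limit):
    """Offset pagination: the client asks for elements starting at position N."""
    return items[offset:offset + limit]

def cursor_page(items, cursor, limit):
    """Cursor pagination: the client passes the last key it saw; the response
    includes the next cursor (None when the list is exhausted)."""
    page = [x for x in items if cursor is None or x > cursor][:limit]
    next_cursor = page[-1] if len(page) == limit else None
    return page, next_cursor

data = list(range(1, 26))          # 25 result "IDs", already sorted
print(offset_page(data, 10, 5))    # → [11, 12, 13, 14, 15]
page, cur = cursor_page(data, None, 10)
print(page[-1], cur)               # → 10 10
```

Offset pagination is simpler but degrades on deep pages (the DB still scans the skipped rows) and can skip or repeat items if the data changes between requests; cursor pagination avoids both, which is why infinite scroll is usually cursor-based.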
Recommended Literature
- MDN Performance Fundamentals: The MDN doc mentioned above. Comprehensive, so a quick read is recommended.
- Performance Testing Guidance for Web Applications: A bit old, but a guideline by Microsoft. Systematized and easy to read.
Summary
This ended up as a mix of an overview and a miscellaneous collection of tips, but I have written this article from the perspective of web system performance.
Being able to handle performance tuning and performance testing on your own makes you feel you've grown a step as an engineer.
I hope this article is helpful.