We've all been there. You deploy your application, and everything seems fine until traffic spikes. Suddenly, API responses take 5 seconds, and users are complaining.
"Where do I even start looking?"
In this post, I will summarize the core concepts of server performance tuning, focusing on metrics, scaling, and the critical database connection pool.
1. Throughput vs. Latency (Response Time)
Before fixing anything, we need to know what we are measuring. There are two main metrics:
- Latency (Response Time): How long does it take to process one request? (e.g., "The API took 200ms to return data.")
- Throughput: How many requests can the system handle per second? (e.g., "The server handles 1,000 TPS/RPS.")
💡 Key Insight: Improving Latency usually improves Throughput, but not always — batching requests, for example, can raise Throughput while making each individual request slower. You must decide which metric is more important for your specific service goal.
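The relationship between the two metrics can be seen with a toy measurement. The handler below just sleeps to simulate work, and the throughput estimate assumes a single-threaded server — both are illustrative assumptions, not a real benchmark:

```java
import java.util.concurrent.TimeUnit;

public class Metrics {
    // Simulated request handler; the 5 ms sleep stands in for real work.
    static void handleRequest() throws InterruptedException {
        TimeUnit.MILLISECONDS.sleep(5);
    }

    // Average latency in ms over n sequential requests.
    static double avgLatencyMs(int n) throws InterruptedException {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) handleRequest();
        return (System.nanoTime() - start) / 1_000_000.0 / n;
    }

    public static void main(String[] args) throws InterruptedException {
        double latency = avgLatencyMs(20);
        // For a single-threaded server, throughput ≈ 1000 / latency (req/s).
        System.out.printf("avg latency: %.1f ms, est. throughput: %.0f req/s%n",
                latency, 1000.0 / latency);
    }
}
```

With concurrency the simple `1000 / latency` relationship breaks down — which is exactly why the two metrics must be tracked separately.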
2. Finding the Bottleneck
A system is only as fast as its slowest component. This is called the Bottleneck.
Common bottlenecks include:
- CPU: Heavy calculations or infinite loops.
- Memory: Memory leaks or frequent Garbage Collection (GC).
- I/O (Input/Output): Slow database queries, network calls to external APIs, or disk read/write.
❌ Bad Practice: Guessing. "Maybe we need a bigger server?"
✅ Best Practice: Monitoring. Use tools (like APM, Prometheus, or Grafana) to identify exactly where the request is getting stuck.
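Even without a full APM, the core idea is to time each stage of a request and see where the time actually goes. The stage names and sleep durations below are made up for illustration:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class StageTimer {
    private final Map<String, Long> timings = new LinkedHashMap<>();

    // Run one stage of the request and record how long it took.
    void time(String stage, Runnable work) {
        long start = System.nanoTime();
        work.run();
        timings.put(stage, System.nanoTime() - start);
    }

    // The stage that consumed the most time is the likely bottleneck.
    String slowestStage() {
        return timings.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey)
                .orElse("none");
    }

    static void sleep(long ms) {
        try { Thread.sleep(ms); } catch (InterruptedException e) { throw new RuntimeException(e); }
    }

    public static void main(String[] args) {
        StageTimer t = new StageTimer();
        t.time("parse-request", () -> {});
        t.time("db-query", () -> sleep(50));   // simulated slow query
        t.time("serialize", () -> sleep(2));
        System.out.println("bottleneck: " + t.slowestStage());
    }
}
```

Real APM tools do the same thing automatically, across every request, with far better visualizations — but the principle is identical: measure, don't guess.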
3. Scaling: Vertical vs. Horizontal
Once you find the bottleneck, you generally have two ways to expand capacity:
Vertical Scaling (Scale Up)
- What is it? Upgrading the server hardware (e.g., increasing RAM from 8GB to 32GB).
- Pros: Simple configuration. No code changes required.
- Cons: Expensive and has a physical limit.
Horizontal Scaling (Scale Out)
- What is it? Adding more servers (nodes) to share the load.
- Pros: Near-limitless scalability. High availability (if one server dies, the others keep serving traffic).
- Cons: Complexity. You need a Load Balancer and need to handle distributed data consistency.
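The Load Balancer's simplest strategy is round-robin: send each incoming request to the next server in the list. A minimal sketch (server names are placeholders):

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

// Minimal round-robin picker — the core of what a load balancer does
// when traffic is spread across horizontally scaled nodes.
public class RoundRobin {
    private final List<String> servers;
    private final AtomicLong counter = new AtomicLong();

    RoundRobin(List<String> servers) { this.servers = servers; }

    String next() {
        // AtomicLong keeps the rotation safe when many threads route at once.
        int i = (int) (counter.getAndIncrement() % servers.size());
        return servers.get(i);
    }

    public static void main(String[] args) {
        RoundRobin lb = new RoundRobin(List.of("app-1", "app-2", "app-3"));
        for (int i = 0; i < 6; i++) System.out.println(lb.next());
    }
}
```

Production load balancers (Nginx, AWS ELB, etc.) add health checks, weighting, and sticky sessions on top of this basic rotation.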
4. The Silent Killer: DB Connection Pool
In many web applications (especially Spring Boot), the bottleneck is often the Database Connection Pool. Establishing a connection to a database is expensive. Therefore, we use a "Pool" to reuse connections.
Connection Pool Size
- Too Small: Threads queue up waiting for a connection to be released. (High Latency)
- Too Large: Excess connections waste database resources and burn CPU on "Context Switching." (Low Throughput)
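In Spring Boot, the pool size is set via HikariCP's `maximum-pool-size` property (the default is 10). The HikariCP documentation suggests roughly `cores * 2` as a starting point; treat the value below as illustrative, not a recommendation:

```yaml
spring:
  datasource:
    hikari:
      # Starting point only — tune against real load tests.
      maximum-pool-size: 10
```

Counterintuitively, a smaller pool often outperforms a larger one, because the database spends less time juggling idle connections.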
Connection Wait Time (Timeout)
If the pool is full, how long should a request wait before throwing an error? If set too long, the user waits 30 seconds just to see an error. It is better to fail fast so the system can recover.
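HikariCP exposes this as `connectionTimeout`, which defaults to 30 seconds — usually far too long. The fail-fast behavior itself can be sketched with a plain `Semaphore` standing in for the pool's limited set of connections; the class and method names here are illustrative, not HikariCP's API:

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Fail-fast checkout sketch: each permit represents one pooled connection.
public class PoolCheckout {
    private final Semaphore permits;
    private final long timeoutMs;

    PoolCheckout(int size, long timeoutMs) {
        this.permits = new Semaphore(size);
        this.timeoutMs = timeoutMs;
    }

    // Returns true if a connection was acquired within the timeout;
    // false means we fail fast instead of queueing indefinitely.
    boolean tryCheckout() throws InterruptedException {
        return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    void release() { permits.release(); }

    public static void main(String[] args) throws InterruptedException {
        PoolCheckout pool = new PoolCheckout(1, 100);
        System.out.println(pool.tryCheckout()); // true: a connection is free
        System.out.println(pool.tryCheckout()); // false after ~100 ms: fail fast
    }
}
```

An error after 100 ms lets the caller retry or degrade gracefully; an error after 30 seconds just multiplies the pile-up.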
5. Server Caching
The fastest query is the one you never make. If your service reads data that doesn't change often, use Caching.
- Local Cache (Ehcache, Caffeine): Extremely fast, stored in the application's RAM — but each server holds its own copy, so data can diverge between nodes.
- Global Cache (Redis): Shared across multiple servers, at the cost of a network hop.
By caching frequently accessed data, you can drastically reduce the load on your database and improve Latency.
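At its simplest, a local cache is just a map in front of the loader. The sketch below (a hypothetical `LocalCache` with no eviction or TTL, unlike real Caffeine or Ehcache) shows why repeated reads stop hitting the database:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Minimal local cache sketch — no eviction, no TTL, illustration only.
public class LocalCache<K, V> {
    private final Map<K, V> store = new ConcurrentHashMap<>();
    private final Function<K, V> loader;   // e.g. the DB query we want to avoid
    int misses = 0;                        // counts actual loads, for demonstration

    LocalCache(Function<K, V> loader) { this.loader = loader; }

    V get(K key) {
        // computeIfAbsent only invokes the loader on a cache miss.
        return store.computeIfAbsent(key, k -> { misses++; return loader.apply(k); });
    }

    public static void main(String[] args) {
        LocalCache<Long, String> users = new LocalCache<>(id -> "user-" + id);
        users.get(1L);            // miss: loader runs
        users.get(1L);            // hit: served from RAM, no second load
        System.out.println(users.misses);
    }
}
```

Real caches add eviction and TTLs precisely because "data that doesn't change often" still changes eventually — a cache without expiry becomes a source of stale reads.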
Conclusion
Performance tuning isn't magic. It starts with understanding the difference between Throughput and Latency, identifying the Bottleneck, and then optimizing resources like the DB Connection Pool.
✅ Checklist for slow services:
- Are my queries slow? (Check DB Indexes)
- Is my Connection Pool size appropriate? (Check HikariCP settings)
- Can I cache this data? (Use Local or Global Cache)
- Do I need to Scale Up or Scale Out? (Vertical vs Horizontal Scaling)