Ankit Kumar Shaw

From 1.5s to 250ms: How We 6x'd API Latency with Spring Boot Optimization

The Hook: When Your System Meets Reality

Picture this: It's Tuesday morning, and your leave management system—serving 20,000 employees across enterprise teams—suddenly becomes the bottleneck. Approvals that should feel instant are taking 1.5 seconds per request. Dashboard loads feel sluggish. Support tickets flood in. The database team reports 85% CPU utilization, and your ops team is preparing an incident escalation.

The system was built to "work." But production doesn't reward "working"—it demands reliability, speed, and efficiency at scale.

This is the story of how we inherited a system with p99 latency of 1.5s on a platform serving 15K+ daily requests, and through methodical architectural profiling and optimization, reduced it to 250ms (6x faster), while simultaneously cutting database CPU consumption by 40%.

Note: Even after migrating from legacy Python/Flask to Spring Boot, the latency bottleneck persisted. The migration provided better observability and scalability, but the root causes—inefficient queries, undersized connection pools, and missing caching—remained. The optimizations below solved the actual problem.


The Scenario: A System Under Pressure

Current State (Post-Migration, Spring Boot)

  • System: Newly migrated Spring Boot microservice (from legacy Python/Flask)
  • Scale: 20,000 employees, 15,000+ requests/day
  • Latency: p99 = 1.5 seconds (unacceptable for user experience)
  • Database Load: 85% CPU utilization on a 4-core instance
  • Problem: Migration alone didn't solve the performance issues; architectural inefficiencies remained

Why This Matters

At scale, every 100ms saved = better UX, lower operational costs, and reduced infrastructure spend. For 15K daily requests, cutting latency by 1.25s per request saves roughly 15,000 × 1.25s ≈ 5.2 compute-hours daily. That's the difference between needing two database instances or one.

Root Cause Analysis: Finding the Bottleneck

Our investigation focused on three key areas:

1. Database Query Profiling

Using Spring Boot Actuator and MySQL slow query logs, we discovered the smoking gun: the classic N+1 query problem, a textbook horror:

// ❌ BEFORE: N+1 Query Anti-pattern
public List<LeaveRequest> getLeaveRequestsByDepartment(Long deptId) {
    List<LeaveRequest> result = new ArrayList<>();

    List<Department> depts = em.createQuery(
        "SELECT d FROM Department d WHERE d.id = :deptId",
        Department.class)
        .setParameter("deptId", deptId)
        .getResultList(); // Query 1: fetch the department

    for (Department dept : depts) {
        List<Employee> employees = em.createQuery(
            "SELECT e FROM Employee e WHERE e.department.id = :deptId",
            Employee.class)
            .setParameter("deptId", dept.getId())
            .getResultList(); // Query 2: fetch the department's employees

        for (Employee emp : employees) {
            result.addAll(em.createQuery(
                "SELECT l FROM LeaveRequest l WHERE l.employee.id = :empId",
                LeaveRequest.class)
                .setParameter("empId", emp.getId())
                .getResultList()); // Queries 3..N+2: one query per employee
        }
    }
    return result;
}

Impact: For a department with 500 employees, this pattern issued one query for the department, one for its employee list, and one per employee for leaves—500+ queries from this method alone. With the same per-entity lookups repeated across related endpoints, a single dashboard request fanned out to 2,000+ queries.

Database response time per request: ~1.2 seconds (just waiting for the database).

2. Connection Pool Exhaustion

HikariCP was configured with:

spring.datasource.hikari.maximum-pool-size=10

With 15K requests/day concentrated into business hours and ~1.2s per database roundtrip, connections were being held far too long. The pool saturated at peak, causing requests to queue while waiting for a free connection.

3. Missing Caching Layer

User roles, department hierarchies, and leave policies were fetched on every request—data that changes infrequently but was being queried thousands of times daily.

Solution 1: Hibernate JOIN FETCH (Eliminate N+1)

The most impactful change was using Hibernate's JOIN FETCH to eagerly load relationships in a single query:

// ✅ AFTER: Single JOIN FETCH query
@Query(value = """
    SELECT DISTINCT d FROM Department d
    LEFT JOIN FETCH d.employees e
    LEFT JOIN FETCH e.leaveRequests l
    WHERE d.id = :deptId
    """)
List<Department> getLeaveRequestsByDepartment(@Param("deptId") Long deptId);

Trade-off: This fetches the entire graph upfront, which is wasteful if you only need recent leaves. Fetching two collections in one query also multiplies rows (a cartesian product), which is why the DISTINCT is there. Be aware that Hibernate refuses to fetch two List-typed collections ("bags") in a single query and throws MultipleBagFetchException, so model the collections as Set or split the fetch. In our case, the full leave request list was always needed.
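When the full graph isn't always wanted, Hibernate's batch fetching is a middle ground worth knowing: it collapses the N+1 pattern into ⌈N/size⌉ IN-list queries without changing entity mappings. A config sketch, using Hibernate's standard property name (the size of 50 is illustrative):

```properties
# Load lazy collections in batches of 50 via IN (...) queries
# instead of one query per owning entity
spring.jpa.properties.hibernate.default_batch_fetch_size=50
```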

Result: From 2000+ queries → 1 query. Database latency dropped from 1.2s → 350ms.

Solution 2: HikariCP Connection Pool Tuning

Before diving into pool size, we profiled connection lifecycle:

# ❌ BEFORE (undersized)
spring.datasource.hikari.maximum-pool-size=10
spring.datasource.hikari.minimum-idle=2

# ✅ AFTER (right-sized for peak concurrency)
spring.datasource.hikari.maximum-pool-size=50
spring.datasource.hikari.minimum-idle=10
# Fail fast after waiting 10s for a connection
spring.datasource.hikari.connection-timeout=10000
# Close idle connections after 10 minutes
spring.datasource.hikari.idle-timeout=600000
# Log connections held longer than 60s (leak detection)
spring.datasource.hikari.leak-detection-threshold=60000

Why 50?
Using Little's Law (concurrent connections in use = request arrival rate × time each request holds a connection):

  • Measured concurrency at p99: 30-40 connections in use (from production JMX metrics)
  • Safety buffer for traffic spikes: +10-20
  • Total: 50 connections

This sizing eliminates connection queue time while avoiding resource waste.
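That back-of-the-envelope math can be checked in a few lines. The peak arrival rate below is an assumed figure, back-derived from the measured concurrency and the pre-fix 1.2s connection hold time:

```java
// Little's Law: concurrency (L) = arrival rate (lambda) x time in system (W).
// Pool size = expected peak concurrency + a spike buffer.
public class PoolSizing {
    public static void main(String[] args) {
        double peakArrivalRate = 30.0; // requests/sec at peak (assumed)
        double holdTimeSeconds = 1.2;  // time each request held a connection pre-fix
        int spikeBuffer = 14;          // headroom for bursts (assumed)

        int concurrent = (int) Math.ceil(peakArrivalRate * holdTimeSeconds); // 36
        int poolSize = concurrent + spikeBuffer;
        System.out.println(poolSize); // 50
    }
}
```

Note that after the JOIN FETCH fix, the hold time drops to ~350ms, so 50 connections leaves ample headroom.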

Result: Connection queue time eliminated. No more "Connection pool exhausted" errors.

Solution 3: Redis-Backed Caching Strategy

We implemented a Redis caching layer in front of the database for frequently accessed, infrequently changed data:

@Service
@RequiredArgsConstructor
public class LeavePolicyCache {
    private final RedisTemplate<String, LeavePolicy> redisTemplate;
    private final LeavePolicyRepository leavePolicyRepository;
    private static final String CACHE_KEY_PREFIX = "leave:policy:";
    private static final long TTL_SECONDS = 86400; // 24 hours

    /**
     * Fetch leave policy for an employee (vacation quota, sick leave, carry-forward rules).
     * Policies rarely change, so we cache them aggressively.
     */
    public LeavePolicy getLeavePolicy(Long employeeId) {
        String cacheKey = CACHE_KEY_PREFIX + employeeId;

        // Try Redis first
        LeavePolicy cached = redisTemplate.opsForValue().get(cacheKey);
        if (cached != null) {
            return cached; // Cache hit: O(1) lookup, <1ms response
        }

        // Cache miss: Fetch from database
        // (joins employee → employment_type → leave_policy tables)
        LeavePolicy policy = leavePolicyRepository.findByEmployeeId(employeeId)
            .orElseThrow();

        // Populate cache with 24-hour TTL
        redisTemplate.opsForValue()
            .set(cacheKey, policy, Duration.ofSeconds(TTL_SECONDS));

        return policy;
    }

    @EventListener
    public void onLeavePolicyUpdated(LeavePolicyUpdatedEvent event) {
        // Invalidate cache when policies change (e.g., annual quota reset, policy update)
        String cacheKey = CACHE_KEY_PREFIX + event.getEmployeeId();
        redisTemplate.delete(cacheKey);
    }
}
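One wiring detail the cache assumes is a `RedisTemplate<String, LeavePolicy>` bean with sensible serializers; with Spring's default JDK serialization, keys like `leave:policy:42` become unreadable binary blobs. A configuration sketch (the bean and class names are assumptions, not from the original code):

```java
@Configuration
public class RedisCacheConfig {

    @Bean
    public RedisTemplate<String, LeavePolicy> leavePolicyRedisTemplate(
            RedisConnectionFactory connectionFactory) {
        RedisTemplate<String, LeavePolicy> template = new RedisTemplate<>();
        template.setConnectionFactory(connectionFactory);
        // Human-readable keys ("leave:policy:42") and JSON values
        template.setKeySerializer(new StringRedisSerializer());
        template.setValueSerializer(new Jackson2JsonRedisSerializer<>(LeavePolicy.class));
        return template;
    }
}
```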

What We Cache (and Why):

  1. Leave Policies: Vacation quotas, sick leave limits, carry-forward rules — accessed on every leave request submission/validation, but change only during annual resets or policy updates.
  2. Approval Workflows: Manager hierarchies and approval chains — needed to route leave requests, but organization structure changes infrequently.

Cache Invalidation Strategy: Event-driven (on policy/org change, we emit events and invalidate) + 24-hour TTL.
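The publishing side of that event is not shown above. A minimal sketch, assuming a plain record and Spring's ApplicationEventPublisher (the service and method names are hypothetical):

```java
// Hypothetical event type matching the listener in LeavePolicyCache.
public record LeavePolicyUpdatedEvent(Long employeeId) {
    public Long getEmployeeId() { return employeeId; }
}

@Service
@RequiredArgsConstructor
public class LeavePolicyAdminService {
    private final LeavePolicyRepository leavePolicyRepository;
    private final ApplicationEventPublisher events;

    @Transactional
    public void updatePolicy(Long employeeId, LeavePolicy updated) {
        leavePolicyRepository.save(updated);
        // Plain @EventListener delivery is synchronous, inside this transaction.
        events.publishEvent(new LeavePolicyUpdatedEvent(employeeId));
    }
}
```

One caveat: a plain `@EventListener` fires when `publishEvent` is called, before the transaction commits. If a rollback after eviction is a concern, `@TransactionalEventListener(phase = TransactionPhase.AFTER_COMMIT)` on the cache side evicts only after the write is durable.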

Why Long TTL? Leave policies are extremely stable (change annually or when employee switches departments). 24-hour TTL ensures we catch manual DB changes without event emission.

Result: Leave policy lookups: 120ms (database join across 3 tables) → <1ms (Redis). Reduced database CPU by ~40% since these queries were happening on every leave request view/submission.

Architecture Overview: Before vs. After

Before Optimization

Client Request
    ↓
Spring Boot Controller
    ↓
Service Layer (Business Logic)
    ↓
[N+1 Queries] → Database (1.2s latency)
    ↓
[No Caching]
    ↓
Response (1.5s p99)

After Optimization

Client Request
    ↓
Spring Boot Controller
    ↓
Cache Check (Redis) ← 1ms ✅
    ├─ Hit: Return cached Leave Policy
    └─ Miss: Continue to database
    ↓
Service Layer (Business Logic)
    ↓
[Single JOIN FETCH Query] → Database (350ms)
    ↓
[HikariCP Optimized] (50-connection pool)
    ↓
Response (250ms p99) ✅

Results: The Metrics That Matter

| Metric | Before | After | Improvement |
| --- | --- | --- | --- |
| p99 Latency | 1.5s | 250ms | 6x faster |
| p50 Latency | 850ms | 180ms | 4.7x faster |
| Database CPU | 85% | 51% | 40% reduction |
| Queries/Request | 2000+ | 1 | 99.95% fewer |
| Redis Hit Rate | N/A | 87% | — |
| Connection Pool Timeout Errors | 45/day | 0 | Eliminated |

Business Impact

  • User Experience: Dashboard loads feel snappy (250ms vs 1.5s)
  • Infrastructure: Supports 15K+ daily requests on smaller database instance
  • Reliability: No more p99 latency spikes during business hours

Lessons Learned: Engineering Trade-offs

What We Sacrificed

  1. Flexibility: JOIN FETCH loads all data. If you only needed recent leaves, you'd fetch unnecessary historical data.
    Solution: Separate queries for different use cases (leave summary vs. full history).

  2. Memory: Increased HikariCP pool size (10 → 50) uses more heap.
    Reality: Better to use extra memory than have request queues. Monitoring showed peak heap: 1.2GB / 4GB available.

  3. Cache Consistency: Redis introduces eventual consistency.
    Mitigation: Event-driven invalidation + 24-hour TTL keeps staleness bounded even if an event is missed.
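For trade-off 1, the usual escape hatch is a narrower query per use case. A hypothetical sketch for a "recent leaves" summary (the `startDate` field and method name are assumptions, not from the original code):

```java
// Hypothetical repository method: fetch only recent leaves for a department,
// instead of the full department -> employees -> leaveRequests graph.
@Query("""
    SELECT l FROM LeaveRequest l
    WHERE l.employee.department.id = :deptId
      AND l.startDate >= :since
    """)
List<LeaveRequest> findRecentByDepartment(@Param("deptId") Long deptId,
                                          @Param("since") LocalDate since);
```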

What Worked

JOIN FETCH: The single largest impact, with a 99.95% query reduction.
Connection pooling: Eliminated queueing entirely.
Caching strategy: 87% hit rate validates our data access patterns.

Key Takeaways

  1. Profile First, Optimize Second
    Don't guess where the bottleneck is. We assumed CPU was the issue—it was actually database queries. Use Spring Boot Actuator, MySQL slow logs, and flamegraphs.

  2. N+1 is Insidious
    With 20K employees and historical leave records, N+1 queries cascaded into 2000+ database roundtrips per request. Always use EXPLAIN PLAN and test with realistic data sizes.

  3. Connection Pooling Matters More Than You Think
    Undersized pools (10 connections) caused request queueing, which is invisible in application metrics but devastating to latency. Right-size based on concurrency math, not gut feel.

  4. Caching is Not Free
    Cache invalidation is hard. We chose event-driven + TTL because it's reliable and simple. Measure hit rates to validate your strategy.

  5. 6x Latency Improvement = 6x Better UX
    Users feel the difference between 1.5s and 250ms. This wasn't just a technical victory—it was a product improvement.

