Senior Java Architect (8 years). Founder of ZukovLabs. Building high-performance SaaS tools. Grab my free Java+Angular boilerplate on https://zukovlabs.com .
This hits the exact pain most people underestimate with Docker setups.
That MSSQL startup race and the “localhost inside container” issue have wasted so many hours.
Really like the nginx approach to avoid CORS entirely instead of patching it everywhere.
Curious, have you seen this setup break when scaling beyond a single backend instance, or is it still holding up well?
Thanks! Yeah the MSSQL race condition alone probably cost me more debugging hours than I'd like to admit.
On scaling, this setup is intentionally single-instance, because that's where 90% of projects live for their first year (or longer). The nginx reverse proxy doesn't pin you to one backend, though. When you need to scale, the move is: put a load balancer in front, run multiple backend containers, and make sure your JWT validation is stateless (it already is; there's no server-side session store). Beyond that, the only change is adding replicas in the compose file or switching to Kubernetes.
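To make the "stateless" point concrete: with signed access tokens, any replica that holds the verification key can check a token on its own, with no session lookup. Below is a JDK-only sketch assuming HS256 (HMAC-SHA256) with a shared secret; the thread doesn't say which algorithm or library the starter uses, and the class name StatelessJwtCheck is hypothetical. A production service should use a maintained JWT library and also check exp, iss/aud and the alg header.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical HS256 signature check: every replica built with the same secret agrees. */
final class StatelessJwtCheck {
    private static final Base64.Encoder B64URL = Base64.getUrlEncoder().withoutPadding();
    private final byte[] secret;

    StatelessJwtCheck(byte[] secret) {
        this.secret = secret.clone();
    }

    // base64url(HMAC-SHA256(secret, "header.payload")), unpadded as JWTs require
    private byte[] sign(String signingInput) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            byte[] raw = mac.doFinal(signingInput.getBytes(StandardCharsets.US_ASCII));
            return B64URL.encode(raw);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 unavailable", e);
        }
    }

    /** Demo-only issuer: builds header.payload.signature for a JSON payload. */
    String issue(String payloadJson) {
        String header = B64URL.encodeToString("{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = B64URL.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String input = header + "." + payload;
        return input + "." + new String(sign(input), StandardCharsets.US_ASCII);
    }

    /** True only for a three-part token whose signature matches; claims are NOT checked here. */
    boolean hasValidSignature(String token) {
        String[] parts = token.split("\\.", -1);
        if (parts.length != 3) {
            return false;
        }
        byte[] expected = sign(parts[0] + "." + parts[1]);
        // constant-time comparison so the check doesn't leak timing information
        return MessageDigest.isEqual(expected, parts[2].getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) {
        byte[] key = "demo-secret-change-me-0123456789ab".getBytes(StandardCharsets.UTF_8);
        StatelessJwtCheck replicaA = new StatelessJwtCheck(key);
        StatelessJwtCheck replicaB = new StatelessJwtCheck(key); // a second container, same secret
        String token = replicaA.issue("{\"sub\":\"user-42\"}");
        System.out.println(replicaB.hasValidSignature(token));       // true
        System.out.println(replicaB.hasValidSignature(token + "x")); // false
    }
}
```

The flip side of keeping nothing server-side is that revoking one access token before it expires needs extra machinery, which is one reason access tokens are usually kept short-lived.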
The part that actually breaks first at scale isn't the backend; it's the database. MSSQL in a single container with no read replicas becomes the bottleneck before your Spring Boot instances do. That's when you start looking at connection pool tuning and read replicas, not more app servers.
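On the read-replica side, the usual first step is routing read-only work away from the primary. In Spring that decision typically lives in an AbstractRoutingDataSource subclass (overriding determineCurrentLookupKey()); the JDK-only sketch below shows just the routing rule, with plain strings standing in for DataSources. ReadWriteRouter and the target names are hypothetical, and the sketch ignores replication lag: a read that must see a write you just made should still go to the primary.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical router: writes go to the primary, reads are spread round-robin over replicas. */
final class ReadWriteRouter {
    private final String primary;
    private final List<String> replicas;
    private final AtomicInteger next = new AtomicInteger();

    ReadWriteRouter(String primary, List<String> replicas) {
        this.primary = primary;
        this.replicas = List.copyOf(replicas);
    }

    /** Picks a target; falls back to the primary when no replica is configured. */
    String route(boolean readOnly) {
        if (!readOnly || replicas.isEmpty()) {
            return primary;
        }
        // floorMod keeps the index non-negative even after the counter overflows
        int i = Math.floorMod(next.getAndIncrement(), replicas.size());
        return replicas.get(i);
    }

    public static void main(String[] args) {
        ReadWriteRouter router = new ReadWriteRouter("primary", List.of("replica-1", "replica-2"));
        System.out.println(router.route(false)); // primary
        System.out.println(router.route(true));  // replica-1
        System.out.println(router.route(true));  // replica-2
        System.out.println(router.route(true));  // replica-1
    }
}
```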
But let's be real, if you're at the point where a single Spring Boot instance can't handle the load, you're probably making enough money to hire someone to solve that problem.
Yeah that makes sense, especially the point about DB becoming the bottleneck first. Most people try to scale app containers way too early.
I like that this setup doesn’t over-engineer from day one.
Have you had to tweak anything around connection pooling (like Hikari settings) once traffic started increasing, or was default config enough for a while?
I’ve seen cases where that becomes the first “invisible” issue before teams even realize the DB is struggling.
Yeah, that "invisible" issue is painfully real.
HikariCP defaults are actually surprisingly good for most workloads. Spring Boot auto-configures it with a pool size of 10, which handles far more concurrent requests than people usually expect.
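Little's law (L = λW) is a handy way to see why: at saturation all N connections are busy, so the pool can hand out at most N / W checkouts per second, where W is how long each request holds a connection. The hold times below are illustrative assumptions, not measurements from the starter.

```latex
\lambda_{\max} \approx \frac{N_{\text{pool}}}{W_{\text{hold}}}
  = \frac{10}{0.005\,\mathrm{s}} = 2000\ \text{checkouts/s}
\qquad\text{vs.}\qquad
\frac{10}{0.5\,\mathrm{s}} = 20\ \text{checkouts/s}
```

The same arithmetic explains the spike scenario: a slow query pattern that stretches hold time 100x shrinks the pool's capacity by the same factor.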
What I typically see happen: the app feels fine under normal load, but during a traffic spike, all 10 connections get checked out, new requests queue up, and response times silently jump from 50ms to 5 seconds. No errors in the logs, just slow. Users bounce before you even notice what's wrong.
If I'm tweaking anything early on, it's usually dropping connectionTimeout from the generous 30s default down to 5s (so you get a fast failure instead of a slow death) and maybe bumping maximumPoolSize to 20-25.
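A toy model of what those two knobs do. It is not HikariCP: ToyPool is a hypothetical class built on a fair Semaphore, where maxPoolSize and connectionTimeoutMs play the roles of maximumPoolSize and connectionTimeout. In a Spring Boot app the real settings would normally go in spring.datasource.hikari.maximum-pool-size and spring.datasource.hikari.connection-timeout (milliseconds).

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Toy bounded pool: not HikariCP, just the same two knobs. */
final class ToyPool {
    private final Semaphore slots;
    private final long connectionTimeoutMs;

    ToyPool(int maxPoolSize, long connectionTimeoutMs) {
        this.slots = new Semaphore(maxPoolSize, true); // fair: waiting requests are served in arrival order
        this.connectionTimeoutMs = connectionTimeoutMs;
    }

    /** Waits at most connectionTimeoutMs for a free slot, then fails loudly. */
    void borrow() {
        boolean acquired;
        try {
            acquired = slots.tryAcquire(connectionTimeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            throw new IllegalStateException("interrupted while waiting for a connection", e);
        }
        if (!acquired) {
            throw new IllegalStateException("pool exhausted: no connection within " + connectionTimeoutMs + " ms");
        }
    }

    /** Returns a slot; every successful borrow() must be paired with one release(). */
    void release() {
        slots.release();
    }

    public static void main(String[] args) {
        ToyPool pool = new ToyPool(2, 200); // tiny numbers so the demo finishes quickly
        pool.borrow();
        pool.borrow(); // both slots held, e.g. by two slow queries during a spike
        long start = System.nanoTime();
        try {
            pool.borrow(); // a third request has to wait, then gives up
        } catch (IllegalStateException e) {
            long waitedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(e.getMessage() + " (waited about " + waitedMs + " ms)");
        }
    }
}
```

With a 30s timeout that third request would sit for 30 seconds before failing; with a short one the caller gets an error it can log or alert on right away.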
But the starter kit ships with the default Hikari config on purpose. It's much better to start with known defaults and tune based on actual Actuator metrics than to ship "optimized" settings cargo-culted from a random blog post about a completely different workload.
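For the "tune based on Actuator metrics" part: with Actuator and Micrometer on the classpath, Spring Boot publishes HikariCP gauges such as hikaricp.connections.active, hikaricp.connections.pending and hikaricp.connections.max, readable at /actuator/metrics/{name} once the metrics endpoint is exposed (management.endpoints.web.exposure.include). The sketch below reads one of them with the JDK's HttpClient. HikariMetricsProbe and the base URL are hypothetical, and the regex assumes the usual {"statistic":"VALUE","value":...} shape of the measurements array instead of using a JSON library.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Reads single HikariCP gauges from Spring Boot Actuator's /actuator/metrics endpoint. */
final class HikariMetricsProbe {
    // Assumes the measurement is serialized as {"statistic":"VALUE","value":<number>}
    private static final Pattern VALUE = Pattern.compile(
            "\"statistic\"\\s*:\\s*\"VALUE\"\\s*,\\s*\"value\"\\s*:\\s*(-?\\d+(?:\\.\\d+)?(?:[eE][-+]?\\d+)?)");

    private final HttpClient client = HttpClient.newHttpClient();
    private final String baseUrl;

    HikariMetricsProbe(String baseUrl) {
        this.baseUrl = baseUrl;
    }

    /** Pulls the VALUE measurement out of a /metrics/{name} JSON body, if present. */
    static OptionalDouble parseValue(String json) {
        Matcher m = VALUE.matcher(json);
        return m.find() ? OptionalDouble.of(Double.parseDouble(m.group(1))) : OptionalDouble.empty();
    }

    /** Fetches one metric, e.g. "hikaricp.connections.pending"; empty on a non-200 response. */
    OptionalDouble read(String metricName) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/actuator/metrics/" + metricName))
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        return response.statusCode() == 200 ? parseValue(response.body()) : OptionalDouble.empty();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        HikariMetricsProbe probe = new HikariMetricsProbe("http://localhost:8080"); // assumed app address
        for (String name : new String[] {
                "hikaricp.connections.active", "hikaricp.connections.pending", "hikaricp.connections.max"}) {
            System.out.println(name + " = " + probe.read(name));
        }
    }
}
```

A sustained non-zero hikaricp.connections.pending is the number that maps most directly onto the "new requests queue up" symptom described earlier.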
That’s a really interesting direction, especially the idea of a decision-support layer.
What stands out to me is that most devs don’t lack tools; they lack clarity on what to optimize and when. They only notice issues once things slow down in production.
That’s where your work could be powerful inside something like BuildBaseKit.
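Instead of being a separate system, it could sit closer to the backend:
collect real usage data (CPU, memory, request patterns)
detect inefficiencies early
suggest simple actions like config tweaks or scaling decisions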
Kind of like giving developers “early signals” before things break.
I’m already seeing patterns like connection pool saturation and resource spikes becoming invisible issues in real apps, so having something proactive there would be genuinely useful.
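A rough sketch of what such an "early signal" could look like for the pool-saturation case: feed it periodic samples (for example the hikaricp.connections.active, .max and .pending gauges) and only flag saturation when it persists across a window, so one busy sample doesn't page anyone. PoolSaturationDetector, the 3-sample window and the 90% threshold are all illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical early-warning check: flags pool saturation only when it is sustained. */
final class PoolSaturationDetector {
    private final int window;
    private final double utilizationThreshold;
    private final Deque<Boolean> recent = new ArrayDeque<>();

    PoolSaturationDetector(int window, double utilizationThreshold) {
        this.window = window;
        this.utilizationThreshold = utilizationThreshold;
    }

    /**
     * Records one sample and returns true when every sample in the window looked
     * saturated: either requests were waiting, or active/max reached the threshold.
     */
    boolean record(double active, double max, double pending) {
        boolean saturated = pending > 0 || (max > 0 && active / max >= utilizationThreshold);
        recent.addLast(saturated);
        if (recent.size() > window) {
            recent.removeFirst(); // keep only the most recent `window` samples
        }
        return recent.size() == window && !recent.contains(false);
    }

    public static void main(String[] args) {
        PoolSaturationDetector d = new PoolSaturationDetector(3, 0.9);
        System.out.println(d.record(4, 10, 0));  // false: plenty of headroom
        System.out.println(d.record(10, 10, 2)); // false: only one saturated sample so far
        System.out.println(d.record(10, 10, 5)); // false: window still contains the healthy sample
        System.out.println(d.record(9, 10, 0));  // true: last three samples all saturated
    }
}
```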
Curious, have you tested your framework on any real backend workload yet, or still mostly at research level?
Yeah, getting proactive signals before things break in production is definitely the goal. Right now, honestly, I mostly just rely on standard application logs and basic monitoring.
To answer your question: yes, this is 100% production-tested. The whole reason I built this boilerplate is that I kept hitting these exact infrastructure issues with real users on my own SaaS projects. I just got tired of configuring the same architecture from scratch every time.
That’s solid. Production-tested makes a big difference.
Curious, what was the first real issue you hit once actual users started hitting the system?
Was it DB pressure, connection pool limits, or something else that didn’t show up locally?
The first thing that bit me wasn't DB pressure or connection pools; it was the JWT refresh race condition I mentioned in the article. Locally you never notice it because you're one user in one tab. In production, someone leaves the app open for an hour, the token expires, and then they click a dashboard that fires 4 API calls simultaneously. All 4 get a 401, all 4 try to refresh, and depending on timing you get duplicate refresh token usage, which invalidates the session entirely. The user gets kicked to the login page for no apparent reason.
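The usual fix is a "single-flight" refresh: the first request that sees a 401 starts one refresh, and every other request that fails in the meantime waits on that same refresh instead of spending the refresh token again. On the frontend this typically lives in an HTTP interceptor; to keep one language across these examples, here is the same idea as a JDK-only Java sketch. TokenRefresher and the refreshCall supplier (standing in for the HTTP call to the refresh endpoint) are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Single-flight refresh: concurrent 401s share one refresh call instead of each making their own. */
final class TokenRefresher {
    private final Supplier<CompletableFuture<String>> refreshCall; // hypothetical: sends the refresh token
    private final AtomicReference<CompletableFuture<String>> inFlight = new AtomicReference<>();

    TokenRefresher(Supplier<CompletableFuture<String>> refreshCall) {
        this.refreshCall = refreshCall;
    }

    /** Joins the refresh already in progress, or starts one if none is running. */
    CompletableFuture<String> refresh() {
        CompletableFuture<String> mine = new CompletableFuture<>();
        CompletableFuture<String> existing = inFlight.compareAndExchange(null, mine);
        if (existing != null) {
            return existing; // another request is already refreshing: wait for its result
        }
        CompletableFuture<String> call;
        try {
            call = refreshCall.get();
        } catch (RuntimeException e) {
            mine.completeExceptionally(e);
            inFlight.compareAndSet(mine, null);
            return mine;
        }
        call.whenComplete((token, error) -> {
            // settle all waiters first, then clear the slot so a later 401 can start a new refresh
            if (error != null) {
                mine.completeExceptionally(error);
            } else {
                mine.complete(token);
            }
            inFlight.compareAndSet(mine, null);
        });
        return mine;
    }

    public static void main(String[] args) {
        AtomicInteger calls = new AtomicInteger();
        CompletableFuture<String> response = new CompletableFuture<>(); // stands in for the HTTP response
        TokenRefresher refresher = new TokenRefresher(() -> {
            calls.incrementAndGet();
            return response;
        });
        List<CompletableFuture<String>> waiters = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            waiters.add(refresher.refresh()); // the dashboard's 4 parallel calls all got a 401
        }
        response.complete("new-access-token");
        waiters.forEach(w -> System.out.println(w.join()));
        System.out.println("refresh calls made: " + calls.get()); // 1
    }
}
```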
The second one was the MSSQL startup race under load. Locally, Docker always has the image cached and MSSQL boots in 10 seconds. On a fresh CI/CD deploy with a cold pull, it took 45+ seconds, way past what the healthcheck retries I originally configured would tolerate. I had to bump the start period and retries to handle that.
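For reference, Docker's healthcheck semantics: probe failures during start_period don't count toward retries, and once that window is over, retries consecutive failures mark the container unhealthy. The JDK-only sketch below mirrors those rules for an app-side "wait for the database" loop; waitForPort and all values are hypothetical. Note that an open TCP port only means MSSQL is listening, not that it is ready for queries; compose healthchecks for MSSQL commonly run an actual query (for example SELECT 1 via sqlcmd).

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class StartupWait {
    /**
     * Polls host:port until a TCP connect succeeds. Failures inside the start
     * period are not counted; after it, `retries` consecutive failures give up.
     * Loosely mirrors Docker's healthcheck start_period / retries semantics.
     */
    static boolean waitForPort(String host, int port, long startPeriodMs, int retries,
                               long intervalMs, int probeTimeoutMs) {
        long startPeriodEnds = System.nanoTime() + startPeriodMs * 1_000_000L;
        int countedFailures = 0;
        while (true) {
            if (probe(host, port, probeTimeoutMs)) {
                return true;
            }
            // overflow-safe nanoTime comparison: has the start period elapsed?
            if (System.nanoTime() - startPeriodEnds >= 0 && ++countedFailures >= retries) {
                return false; // out of retries after the start period
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status for the caller
                return false;
            }
        }
    }

    // One probe: can we open a TCP connection within timeoutMs?
    private static boolean probe(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket fakeDb = new ServerSocket(0)) { // something already listening
            System.out.println(waitForPort("127.0.0.1", fakeDb.getLocalPort(), 0, 3, 50, 500)); // true
        }
        int closedPort;
        try (ServerSocket tmp = new ServerSocket(0)) {
            closedPort = tmp.getLocalPort(); // released on close, so nothing listens there afterwards
        }
        System.out.println(waitForPort("127.0.0.1", closedPort, 200, 3, 50, 500)); // false
    }
}
```

The tiny values in main are only there so the demo finishes quickly; a cold pull is exactly where a start period sized for a warm local machine runs out.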
Both are the kind of issues that only show up when real humans are using the app on unpredictable hardware. That's why I'm pretty opinionated about including the healthcheck and interceptor setup in the starter: they're not "nice to have", they're the first two things that break.
That JWT refresh race is a perfect example of what usually gets missed early.
Most boilerplates handle structure, but not these real-world failure cases that only show up with actual users.
That’s the gap I’ve been focusing on with BuildBaseKit, baking in those “first things that break” so people don’t discover them the hard way in production.
Appreciate the thoughtful thread. You're right that most boilerplates focus on structure; the real-world failure modes only become visible once you've shipped something and watched it break in ways the dev environment never showed you. That's honestly why I started extracting the fixes I keep rediscovering into a reusable starter instead of copy-pasting them between projects.
If anyone else reading this has hit similar production footguns worth documenting, drop them in the comments. I'm always interested in what breaks in other people's stacks.
That JWT race condition is such a real one.
Shows how most issues don’t appear until real users hit the system.
This is exactly the gap I’m focusing on with BuildBaseKit, covering those early failure points, not just scaffolding.