One of the most interesting engineering challenges I’ve worked through recently had nothing to do with syntax, frameworks, or libraries.
It was about asking the right questions.
The situation
We had multiple internal services calling a third-party API secured by OAuth access tokens.
The access token was stored in a database and shared across services.
The flow looked reasonable at first glance:
Services read the token from the DB
Call the third-party resource
If a 401 Unauthorized occurs → refresh token → update DB
But something didn’t feel right.
The red flags đźš©
When you zoom out and think systemically, a few problems jump out:
Hot-spot database access
Every outbound request depended on a DB read.
Race conditions under load
Multiple services could refresh the token at the same time.
Thundering herd problem
One expired token → many simultaneous refresh calls.
Tight coupling
Business services were now coupled to auth lifecycle concerns.
Hidden latency amplification
DB → auth endpoint → DB, multiplied across services.
This kind of design often survives early testing… and fails spectacularly at scale.
The proposed fix (and why it still wasn’t enough)
An idea came up to:
Run a database job that checks token expiry
Maintain an IsExpired flag
Let a dedicated service refresh the token
Ask other services to “wait and retry” when a 401 occurs
This was a good instinct — it tried to centralize responsibility.
But it still had issues:
Token expiry ≠token invalidation
Polling introduces lag and inconsistency
Artificial delays degrade user experience
The database was still being used as a coordination mechanism
Better… but not robust.
The mental model shift đź§
Here’s the key realization:
Access tokens are not business data.
They’re volatile secrets — like TLS certificates or signing keys.
Once you adopt that mindset, the architecture becomes clearer.
The industry-proven pattern
Instead of every service “knowing” about tokens:
👉 Introduce a Token Broker / Auth Gateway
One service owns the token lifecycle
Other services simply ask for a valid token
Refresh is protected by in-memory locking
The database becomes a fallback, not a hot path
No Redis required (important in locked-down corporate Windows environments)
This eliminates:
Refresh stampedes
DB contention
Race conditions
Authentication logic leakage into business services
And it dramatically improves reliability, performance, and clarity of ownership.
Why this matters (especially for senior roles)
This wasn’t about writing clever code.
It was about:
Identifying hidden bottlenecks
Challenging designs that “work” but don’t scale
Applying battle-tested system patterns
Understanding how distributed systems fail in real life
Designing within real-world constraints (security policies, OS limitations, infra restrictions)
That’s the difference between:
“I can implement features”
and
“I can design systems organizations can trust under load.”
Final thought
Great systems aren’t built by adding more logic.
They’re built by removing unnecessary responsibility from the wrong places.
If you’re designing mission-critical systems:
Always ask who truly owns this concern?
Watch for shared mutable state
Be suspicious of “just add a DB flag” solutions
Optimize for clarity, ownership, and failure modes
That’s where real engineering impact lives.
💬 I’d love to connect with engineers, architects, and leaders who enjoy deep system design conversations.
SystemDesign #SoftwareArchitecture #DistributedSystems
#BackendEngineering #Scalability #PerformanceEngineering
#SeniorDeveloper #TechLeadership #EngineeringMindset
Top comments (0)