DEV Community

Cover image for Software Engineering is about trade-offs, not perfect solutions...
Pratyush Raghuwanshi
Pratyush Raghuwanshi

Posted on

Software Engineering is about trade-offs, not perfect solutions...

Refactored the authentication function - here's what actually happened

Situation: The user login endpoint had grown to ~300 lines over 2 years. Every small change meant retesting the entire flow, which was slowing down the system.

The breaking point: Email service outages were caused auth failures. Users couldn't log in and were waiting for email confirmations to complete.

The initial problem:
Python
def authenticate_user(username, password):
# Input validation
# Database lookup
# Password verification
# JWT generation
# Activity logging
# Email notification (sometimes blocked here for 3-5 seconds)
# Session management
return token

One function does 7 different things. Testing it meant mocking everything.
The refactor:
Decoupled critical path (auth) from non-critical operations (logging, emails).

def authenticate_user(username, password):
validate_input(username, password)
user = user_repository.get_by_username(username)

if not verify_password(user, password):
raise AuthError("Invalid credentials")

token = generate_jwt(user.id)

# Non-blocking - auth succeeds even if these fail
background_tasks.add(log_user_login, user.id)
background_tasks.add(send_notification_email, user.email)

return token

Business impact:
✅ Improved Login success rate: 92% → 99.7% (even during email outages)
✅ Reduced P95 auth latency: 850ms → 180ms
✅ Decreased Deployment cycle: 2 days → 4 hours (independent testing)
✅ Reduced number of support tickets for login issues: Down 80%
Engineering trade-offs:
✅ Auth logic now independently testable (2s vs 15s test runs)
✅ System resilience increased - failures isolated
❌ Observability complexity - needed distributed tracing for background tasks
❌ Had to implement retry logic and dead-letter queues for failed emails
❌ Needed training on async debugging patterns

The rollout challenge:
Ran both implementations in parallel for 2 weeks with feature flags. Monitored email delivery rates, added alerting for background task failures.

The lesson:
Identifying the critical path vs. nice-to-have operations is crucial at scale. We traded immediate feedback for system resilience - right call for auth, but I wouldn't apply this blindly to all user-facing features.

What kind of trade-offs did you make in your career? Comment your story 😊

hashtag#SoftwareEngineering hashtag#Python hashtag#Refactoring hashtag#SystemDesign hashtag#AsyncProgramming hashtag#CodeQuality hashtag#CleanCode hashtag#CodeQuality hashtag#TradeOffs hashtag#Criticalthinking

Top comments (0)