<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Pratyush Raghuwanshi</title>
    <description>The latest articles on DEV Community by Pratyush Raghuwanshi (@pratyush007).</description>
    <link>https://dev.to/pratyush007</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2575497%2F81568990-dd71-4770-986e-c8714c539067.png</url>
      <title>DEV Community: Pratyush Raghuwanshi</title>
      <link>https://dev.to/pratyush007</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/pratyush007"/>
    <language>en</language>
    <item>
      <title>CSV Processing Gotchas: Don’t Let Invalid Data Slip Through the Cracks!!!</title>
      <dc:creator>Pratyush Raghuwanshi</dc:creator>
      <pubDate>Mon, 05 Jan 2026 22:57:56 +0000</pubDate>
      <link>https://dev.to/pratyush007/csv-processing-gotchas-dont-let-invalid-data-slip-through-the-cracks-3on</link>
      <guid>https://dev.to/pratyush007/csv-processing-gotchas-dont-let-invalid-data-slip-through-the-cracks-3on</guid>
      <description>&lt;p&gt;Debugging a python list comprehension bug.&lt;/p&gt;

&lt;p&gt;Problem Statement: While processing CSV upload data for a user analytics dashboard the sessions count(n-3) didn't match with the expected session count(n).&lt;/p&gt;

&lt;p&gt;The problem: The data processing script was silently dropping invalid session IDs&lt;/p&gt;

&lt;p&gt;Can you spot the issue?&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# Processing session durations from CSV upload
session_data = ['120', '45', '300', '', '75', 'N/A', '180']

# This looked clean and pythonic...
valid_sessions = [int(duration) for duration in session_data if duration.isdigit()]
print(f"Analyzed {len(valid_sessions)} sessions")

# Output: "Analyzed 5 sessions"
# But we had 7 rows in the CSV!&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;What went wrong:&lt;/p&gt;

&lt;p&gt;Empty strings and 'N/A' values were silently ignored&lt;br&gt;
No logging of dropped records&lt;br&gt;
Users' data was incomplete, but they had no idea why&lt;/p&gt;

&lt;p&gt;The fix:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;session_data = ['120', '45', '300', '', '75', 'N/A', '180']
valid_sessions = []
invalid_count = 0

for duration in session_data:
    if duration.isdigit():
        valid_sessions.append(int(duration))
    else:
        invalid_count += 1
        print(f"Warning: Invalid session duration '{duration}' - skipped")

print(f"Processed {len(valid_sessions)} sessions, skipped {invalid_count} invalid entries")&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Key Takeaway: List comprehensions are elegant, but when processing user data, explicit error handling saves hours of debugging and maintains data integrity.&lt;br&gt;
1 NA can make your code NA.&lt;br&gt;
Have you been bitten by silent data loss? What's your go-to pattern for robust data validation?&lt;/p&gt;

</description>
      <category>python</category>
      <category>dataengineering</category>
      <category>softwaredevelopment</category>
      <category>systemdesign</category>
    </item>
    <item>
      <title>Software Engineering is about trade-offs, not perfect solutions...</title>
      <dc:creator>Pratyush Raghuwanshi</dc:creator>
      <pubDate>Mon, 05 Jan 2026 22:54:01 +0000</pubDate>
      <link>https://dev.to/pratyush007/software-engineering-is-about-trade-offs-not-perfect-solutions-3cba</link>
      <guid>https://dev.to/pratyush007/software-engineering-is-about-trade-offs-not-perfect-solutions-3cba</guid>
      <description>&lt;p&gt;Refactored the authentication function - here's what actually happened &lt;/p&gt;

&lt;p&gt;Situation: the user login endpoint had grown to ~300 lines over 2 years. Every small change meant retesting the entire flow, which slowed down every release.&lt;/p&gt;

&lt;p&gt;The breaking point: email service outages were causing auth failures. Users couldn't log in because the endpoint blocked while waiting for email confirmations to complete.&lt;/p&gt;

&lt;p&gt;The initial problem:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def authenticate_user(username, password):
    # Input validation
    # Database lookup
    # Password verification
    # JWT generation
    # Activity logging
    # Email notification (sometimes blocked here for 3-5 seconds)
    # Session management
    return token&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;One function doing 7 different things - testing it meant mocking everything.&lt;/p&gt;

&lt;p&gt;The refactor: decouple the critical path (auth) from the non-critical operations (logging, emails).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;def authenticate_user(username, password):
    validate_input(username, password)
    user = user_repository.get_by_username(username)

    if not verify_password(user, password):
        raise AuthError("Invalid credentials")

    token = generate_jwt(user.id)

    # Non-blocking - auth succeeds even if these fail
    background_tasks.add(log_user_login, user.id)
    background_tasks.add(send_notification_email, user.email)

    return token&lt;/code&gt;&lt;/pre&gt;
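&lt;p&gt;The background_tasks object here is framework-specific (FastAPI's BackgroundTasks exposes add_task, for instance). A framework-agnostic sketch of the same fire-and-forget idea, with hypothetical task functions standing in for logging and email:&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor

# Shared pool for non-critical work; failures here never block the caller
_background = ThreadPoolExecutor(max_workers=4)

def submit_background(fn, *args):
    """Run fn(*args) off the critical path, swallowing (but logging) errors."""
    def wrapper():
        try:
            fn(*args)
        except Exception as exc:  # non-critical path: log and move on
            print(f"background task {fn.__name__} failed: {exc}")
    return _background.submit(wrapper)

# Hypothetical non-critical operations
def log_user_login(user_id):
    print(f"login recorded for {user_id}")

def send_notification_email(email):
    raise RuntimeError("email service down")  # simulate an outage

# Auth still "succeeds" even though the email task fails
submit_background(log_user_login, "u123")
submit_background(send_notification_email, "a@example.com")
```

&lt;p&gt;The trade-off the post describes follows directly: errors no longer surface to the user, so they must surface somewhere else (tracing, retries, dead-letter queues).&lt;/p&gt;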

&lt;p&gt;Business impact:&lt;br&gt;
✅ Login success rate: 92% → 99.7% (even during email outages)&lt;br&gt;
✅ P95 auth latency: 850ms → 180ms&lt;br&gt;
✅ Deployment cycle: 2 days → 4 hours (independent testing)&lt;br&gt;
✅ Support tickets for login issues: down 80%&lt;/p&gt;

&lt;p&gt;Engineering trade-offs:&lt;br&gt;
✅ Auth logic now independently testable (2s vs 15s test runs)&lt;br&gt;
✅ Increased system resilience - failures stay isolated&lt;br&gt;
❌ Observability complexity - needed distributed tracing for background tasks&lt;br&gt;
❌ Had to implement retry logic and dead-letter queues for failed emails&lt;br&gt;
❌ The team needed training on async debugging patterns&lt;/p&gt;

&lt;p&gt;The rollout challenge:&lt;br&gt;
We ran both implementations in parallel for 2 weeks behind a feature flag, monitored email delivery rates, and added alerting for background task failures.&lt;/p&gt;

&lt;p&gt;The lesson: &lt;br&gt;
Identifying the critical path vs. nice-to-have operations is crucial at scale. We traded immediate feedback for system resilience - right call for auth, but I wouldn't apply this blindly to all user-facing features.&lt;/p&gt;

&lt;p&gt;What trade-offs have you made in your career? Share your story 😊&lt;/p&gt;


</description>
      <category>systemdesign</category>
      <category>softwareengineering</category>
      <category>programming</category>
      <category>python</category>
    </item>
    <item>
      <title>Why everyone's moving from REST to GraphQL (and why you might not want to)</title>
      <dc:creator>Pratyush Raghuwanshi</dc:creator>
      <pubDate>Mon, 05 Jan 2026 22:51:28 +0000</pubDate>
      <link>https://dev.to/pratyush007/why-everyones-moving-from-rest-to-graphql-and-why-you-might-not-want-to-4ga3</link>
      <guid>https://dev.to/pratyush007/why-everyones-moving-from-rest-to-graphql-and-why-you-might-not-want-to-4ga3</guid>
      <description>&lt;p&gt;Been seeing a lot of debate about this lately, so decided to dig into the real trade-offs between these two approaches: REST or GraphQL for a new API&lt;/p&gt;

&lt;p&gt;Here's what I discovered after comparing both case by case:&lt;/p&gt;

&lt;p&gt;The GraphQL Promise:&lt;br&gt;
✅ Single endpoint for all data&lt;br&gt;
✅ Client specifies exactly what data it needs&lt;br&gt;
✅ Strong typing with schema&lt;br&gt;
The Reality Check:&lt;br&gt;
❌ Over-fetching problem just became an under-fetching problem&lt;br&gt;
❌ Caching becomes significantly more complex&lt;br&gt;
❌ Learning curve for the entire team&lt;/p&gt;

&lt;p&gt;For example:&lt;/p&gt;

&lt;p&gt;👉 The REST approach:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Multiple calls, but predictable
GET /api/users/123
GET /api/users/123/sessions
GET /api/users/123/events
// Easy caching, clear boundaries&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;👉 The GraphQL approach:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;# One call, but complex
query {
  user(id: "123") {
    name
    sessions { duration }
    events { timestamp, type }
  }
}
# Flexible, but caching nightmares&lt;/code&gt;&lt;/pre&gt;
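&lt;p&gt;The caching point is concrete: a REST response caches by URL, while every GraphQL query hits one endpoint as a POST body and must first be normalized into a cache key. A minimal sketch of the difference (the normalization scheme here is an illustrative assumption, not a library API):&lt;/p&gt;

```python
import hashlib

# REST: the URL itself is a stable cache key
def rest_cache_key(method, url):
    return f"{method} {url}"

# GraphQL: one endpoint for everything, so the key must be derived from
# the whitespace-normalized query text plus its variables
def graphql_cache_key(query, variables):
    normalized = " ".join(query.split())
    payload = f"{normalized}|{sorted(variables.items())}"
    return hashlib.sha256(payload.encode()).hexdigest()

# Two textually different but equivalent queries normalize to one key...
k1 = graphql_cache_key("query { user(id: $id) { name } }", {"id": "123"})
k2 = graphql_cache_key("query {\n  user(id: $id) { name }\n}", {"id": "123"})
assert k1 == k2

# ...while the REST key needs no extra machinery at all
print(rest_cache_key("GET", "/api/users/123"))
```

&lt;p&gt;Invalidation is the harder half: a mutation to user 123 maps cleanly to /api/users/123, but may affect any number of cached GraphQL query keys.&lt;/p&gt;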

&lt;p&gt;My take: &lt;br&gt;
GraphQL shines for client-heavy apps with diverse data needs. But if your API consumers are predictable (like internal dashboards), REST's simplicity often wins.&lt;br&gt;
The insight: &lt;br&gt;
Sometimes boring technology is the right technology 🙂‍↕️ &lt;/p&gt;

&lt;p&gt;What's your take on this? Are you team GraphQL or team REST? Share your war stories below! 👇&lt;/p&gt;

</description>
      <category>graphql</category>
      <category>restapi</category>
      <category>api</category>
      <category>softwareengineering</category>
    </item>
  </channel>
</rss>
