I Replaced a 2,000-Line Service with a SQL Query. It's Been Running for 3 Years.
Most developers are afraid of their database. I mean truly, deeply afraid. They'll spend weeks building elaborate application-layer solutions to problems that SQL could solve in fifteen minutes. I've watched brilliant engineers write thousands of lines of Python to do what a window function handles elegantly. Stop it.
Three years ago, I inherited a service that was the poster child for this madness. A sprawling, 2,000-line Python monster that calculated user engagement scores by pulling data from six different tables, running complex transformations in memory, and somehow managing to be both slow and wrong. The original author had clearly never heard of a JOIN.
The Problem Was Simple (The Solution Wasn't)
We needed to calculate engagement scores for users based on their activity patterns. The score combined:
- Comment frequency over the past 90 days
- Post quality metrics (views, likes, shares)
- Session duration patterns
- Peak activity timing
The Python service did this by:
- Fetching all user IDs
- For each user, querying comments table
- For each user, querying posts table
- For each user, querying sessions table
- Loading everything into pandas DataFrames
- Running transformations
- Calculating weighted averages
- Storing results back to the database
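Strung together, those steps are the classic N+1 query anti-pattern: one query to list the users, then one query per user per table. Here's a minimal sketch of the shape (toy schema, SQLite standing in for the real database, all names illustrative), next to the single query that replaces the loop:

```python
import sqlite3

# Toy stand-in schema; the real service touched six tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY);
    CREATE TABLE comments (user_id INTEGER, content TEXT);
""")
conn.executemany("INSERT INTO users VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO comments VALUES (?, ?)",
                 [(1, "a"), (1, "bb"), (2, "ccc")])

# The N+1 pattern: one round trip per user, aggregation in Python.
counts_n_plus_1 = {}
for (uid,) in conn.execute("SELECT id FROM users"):
    rows = conn.execute(
        "SELECT content FROM comments WHERE user_id = ?", (uid,)
    ).fetchall()
    counts_n_plus_1[uid] = len(rows)

# The same answer in one round trip, aggregated by the database.
counts_sql = dict(conn.execute(
    "SELECT u.id, COUNT(c.user_id) FROM users u "
    "LEFT JOIN comments c ON c.user_id = u.id GROUP BY u.id"
))

assert counts_n_plus_1 == counts_sql == {1: 2, 2: 1, 3: 0}
```

With three users the difference is invisible; with a few million, the per-user round trips alone account for most of that 45 minutes.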
This ran every hour. It took 45 minutes to complete. On good days.
The service had its own Docker container, its own logging pipeline, its own monitoring dashboard. It consumed more resources than our main API. For what? Math that belonged in the database from day one.
SQL Doesn't Need Your Permission to Be Powerful
Here's the uncomfortable truth: Your application logic is probably just reimplementing SQL poorly. Modern databases are phenomenally capable. PostgreSQL has window functions, CTEs, advanced aggregations, and JSON operators that most developers never touch.
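If window functions are new to you, here's a quick taste of what they replace (SQLite standing in for Postgres here since both support the same syntax; the schema is made up for illustration). "Top post per user" is a few lines of SQL instead of a group-and-sort loop in application code:

```python
import sqlite3

# A window function ranks each user's posts by views with no
# application-side sorting (SQLite 3.25+; identical syntax in Postgres).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (user_id INT, title TEXT, views INT)")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)", [
    (1, "intro", 100), (1, "deep-dive", 300), (2, "notes", 50),
])

top_posts = conn.execute("""
    WITH ranked AS (
        SELECT user_id, title,
               ROW_NUMBER() OVER (PARTITION BY user_id
                                  ORDER BY views DESC) AS rn
        FROM posts
    )
    SELECT user_id, title FROM ranked WHERE rn = 1
    ORDER BY user_id
""").fetchall()

assert top_posts == [(1, "deep-dive"), (2, "notes")]
```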
The replacement query took me four hours to write:
WITH user_comments AS (
    -- comment volume and length per user over the scoring window
    SELECT
        user_id,
        COUNT(*) AS comment_count,
        AVG(LENGTH(content)) AS avg_comment_length
    FROM comments
    WHERE created_at > NOW() - INTERVAL '90 days'
    GROUP BY user_id
),
user_posts AS (
    -- post volume, reach, and likes-per-view (guarding against zero views)
    SELECT
        user_id,
        COUNT(*) AS post_count,
        SUM(view_count) AS total_views,
        AVG(like_count::float / GREATEST(view_count, 1)) AS engagement_rate
    FROM posts
    WHERE created_at > NOW() - INTERVAL '90 days'
    GROUP BY user_id
),
session_patterns AS (
    -- one row per user per hour-of-day
    SELECT
        user_id,
        EXTRACT(hour FROM created_at) AS peak_hour,
        AVG(duration_minutes) AS avg_session_duration,
        COUNT(*) AS session_count
    FROM user_sessions
    WHERE created_at > NOW() - INTERVAL '90 days'
    GROUP BY user_id, EXTRACT(hour FROM created_at)
),
peak_activity AS (
    SELECT
        user_id,
        -- pick the hour with the most sessions; MODE() over these rows would
        -- be wrong, since each hour appears exactly once per user
        (array_agg(peak_hour ORDER BY session_count DESC))[1] AS most_active_hour,
        -- weight each hour's average by its session count
        SUM(avg_session_duration * session_count) / SUM(session_count) AS avg_duration
    FROM session_patterns
    GROUP BY user_id
)
UPDATE user_engagement_scores
SET
    score = LEAST(100, GREATEST(0,
        (COALESCE(uc.comment_count, 0) * 0.3) +
        (COALESCE(up.engagement_rate, 0) * 40) +
        (COALESCE(pa.avg_duration, 0) * 0.2) +
        (CASE
            WHEN pa.most_active_hour BETWEEN 9 AND 17 THEN 10
            WHEN pa.most_active_hour BETWEEN 18 AND 22 THEN 15
            ELSE 5
        END)
    )),
    updated_at = NOW()
FROM users u
LEFT JOIN user_comments uc ON u.id = uc.user_id
LEFT JOIN user_posts up ON u.id = up.user_id
LEFT JOIN peak_activity pa ON u.id = pa.user_id
WHERE user_engagement_scores.user_id = u.id;
This runs in 2.3 seconds. On our largest dataset. Every time.
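If you want to sanity-check the weighting without touching the database, the scoring expression ports to a few lines of pure Python (a hypothetical re-derivation of the formula above, not code from the service):

```python
def engagement_score(comment_count=0, engagement_rate=0.0,
                     avg_duration=0.0, most_active_hour=None):
    """Mirror the SQL: weighted sum of signals, clamped to [0, 100]."""
    if most_active_hour is not None and 9 <= most_active_hour <= 17:
        hour_bonus = 10   # business hours
    elif most_active_hour is not None and 18 <= most_active_hour <= 22:
        hour_bonus = 15   # evening users score highest
    else:
        hour_bonus = 5    # night owls and users with no sessions
    raw = (comment_count * 0.3
           + engagement_rate * 40
           + avg_duration * 0.2
           + hour_bonus)
    return min(100, max(0, raw))

# 50 comments, 50% engagement, 30-minute sessions, evening user:
# 15 + 20 + 6 + 15 = 56
assert abs(engagement_score(50, 0.5, 30, 20) - 56.0) < 1e-9
```

A tiny pure function like this also makes a good fixture: you can unit test the business rule in the application while the database does the heavy lifting.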
Why Developers Fear the Database
I think the real issue is control. Application code feels safe because you can step through it with a debugger. You can add print statements. You can unit test each tiny piece in isolation.
SQL feels opaque. Black magic happening inside the database. What if it's slow? What if it locks tables? What if the query planner picks the wrong index?
But here's what that 2,000-line service taught me: Complex application logic is infinitely harder to debug than complex SQL. When the Python service broke (and it broke constantly), finding the bug meant tracing through layers of abstraction, debugging pandas operations, and hunting down edge cases in the transformation pipeline.
When the SQL query has issues? You run EXPLAIN ANALYZE and fix the obvious problem.
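The workflow is the same in any engine. Here's the idea with SQLite's EXPLAIN QUERY PLAN standing in for Postgres's EXPLAIN ANALYZE, on a toy table (names assumed for illustration): run the plan, spot the full scan, add the index, confirm the plan changed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (user_id INT, created_at TEXT)")

query = "EXPLAIN QUERY PLAN SELECT COUNT(*) FROM comments WHERE user_id = 1"

# Before indexing: the planner reports a full table scan.
before = conn.execute(query).fetchall()
assert any("SCAN" in row[-1] for row in before)

# After adding an index, the same query uses an index search instead.
conn.execute("CREATE INDEX idx_comments_user ON comments (user_id)")
plan = conn.execute(query).fetchall()
assert any("INDEX" in row[-1] for row in plan)
```

In Postgres, EXPLAIN ANALYZE additionally executes the query and reports actual row counts and timings per plan node, which is usually enough to point straight at the problem.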
The Performance Wasn't Even Close
The old Python service:
- 45 minutes runtime
- 8GB memory usage at peak
- Failed silently when data was inconsistent
- Required constant babysitting
The SQL replacement:
- 2.3 seconds runtime
- No application-side memory overhead (the database manages its own buffers)
- Fails fast with clear error messages
- Has run unattended for three years
We deleted 2,000 lines of code, removed a Docker container, eliminated three monitoring dashboards, and improved performance by more than 1,100x (45 minutes down to 2.3 seconds).
Sometimes the best code is no code.
When SQL Is the Wrong Choice
I'm not advocating for putting all business logic in the database. SQL is a poor fit for complex branching logic, external API calls, or anything that's fundamentally a procedural workflow.
But for data transformation, aggregation, and analysis? Stop fighting your database. It's optimized for exactly this work.
The sweet spot: Use SQL for what it's good at (data), use application code for what it's good at (business logic, external integrations, user interactions).
Your Action Plan
Look at your codebase right now. Find the services that:
- Pull large datasets into memory
- Run aggregations in application code
- Take forever to execute
- Handle mostly data transformation
Ask yourself: Could this be a SQL query?
Don't let fear of the database drive architectural decisions. Your PostgreSQL instance is probably more capable than the last three microservices you built combined.
The 2,000-line service is gone. The SQL query remains, quietly doing its job every hour for three years. No maintenance, no failures, no drama.
Sometimes the old ways are better.