PostgreSQL Window Functions for Multi-Tenant Feature Adoption Analytics: Ranking Users Without Application-Layer Sorting

#postgres #analytics #performance #multitenancy

PostgreSQL Window Functions for Multi-Tenant Feature Adoption Analytics: Ranking Users Without Application-Layer Sorting

I used to pull raw user engagement data into Python, sort it in memory, and compute rankings. It worked until it didn't—a customer with 50K users made my FastAPI endpoint timeout, and I realized I was doing analytics like it was 2005.

Window functions changed how I think about metrics. They let you compute per-tenant rankings, adoption curves, and cohort retention in a single SQL pass. No application-layer sorting, no memory bloat, no distributed processing headaches. Just SQL doing what it's actually good at.

Here's what I learned shipping CitizenApp's adoption analytics with PostgreSQL window functions.

Why Window Functions Matter for Multi-Tenant SaaS

When you're computing feature adoption metrics across tenants, you need to:

Rank users within each tenant by engagement (ROW_NUMBER, RANK)
Calculate adoption curves (when did each user adopt a feature?)
Track cohort retention (compare users from the same signup week)
Identify trending features per tenant

You could load all user events into Python, group by tenant, sort, and compute. But that's memory-intensive, slow, and doesn't scale. Window functions do this in the database where it belongs.

I prefer window functions because:

Single SQL pass: No round-trips to the application
Stateless endpoints: FastAPI doesn't hold gigabytes of data
Sub-millisecond queries: PostgreSQL optimizes window function execution
Tenant isolation baked in: PARTITION BY tenant_id keeps data separate

Real Example: Feature Adoption Rankings

Let's say you track feature usage like this:

CREATE TABLE feature_events (
  id BIGSERIAL PRIMARY KEY,
  tenant_id UUID NOT NULL,
  user_id UUID NOT NULL,
  feature_name TEXT NOT NULL,
  event_date DATE NOT NULL,
  created_at TIMESTAMP DEFAULT NOW(),
  FOREIGN KEY (tenant_id) REFERENCES tenants(id),
  FOREIGN KEY (user_id) REFERENCES users(id)
);

CREATE INDEX idx_feature_events_tenant_user_date 
  ON feature_events(tenant_id, user_id, event_date);

Without window functions, you'd do something like this (don't do this):

# BAD: Pulls everything into memory, sorts in Python
result = db.query(FeatureEvent).filter(
    FeatureEvent.tenant_id == tenant_id
).all()

user_counts = {}
for event in result:
    user_counts[event.user_id] = user_counts.get(event.user_id, 0) + 1

ranked = sorted(user_counts.items(), key=lambda x: x[1], reverse=True)
return [(uid, count, rank) for rank, (uid, count) in enumerate(ranked, 1)]

This breaks at scale. Here's the window function approach:

WITH user_event_counts AS (
  SELECT
    tenant_id,
    user_id,
    COUNT(*) as event_count,
    MAX(event_date) as last_event_date
  FROM feature_events
  WHERE tenant_id = $1
  GROUP BY tenant_id, user_id
)
SELECT
  user_id,
  event_count,
  last_event_date,
  ROW_NUMBER() OVER (
    PARTITION BY tenant_id 
    ORDER BY event_count DESC
  ) as engagement_rank,
  RANK() OVER (
    PARTITION BY tenant_id 
    ORDER BY event_count DESC
  ) as engagement_rank_with_ties,
  COUNT(*) OVER (PARTITION BY tenant_id) as total_active_users
FROM user_event_counts
ORDER BY engagement_rank
LIMIT 100;

This executes in milliseconds. The PARTITION BY tenant_id ensures each tenant's ranking is independent—crucial for multi-tenancy.

Here's the FastAPI wrapper:

// TypeScript client-side type
interface UserEngagement {
  user_id: string;
  event_count: number;
  last_event_date: string;
  engagement_rank: number;
  engagement_rank_with_ties: number;
  total_active_users: number;
}

from fastapi import FastAPI, Depends
from sqlalchemy import text
from typing import List

@app.get("/api/tenants/{tenant_id}/engagement/rankings")
async def get_engagement_rankings(
    tenant_id: UUID,
    current_tenant: Tenant = Depends(verify_tenant_access),
    skip: int = 0,
    limit: int = 100
) -> List[UserEngagement]:
    query = text("""
    WITH user_event_counts AS (
      SELECT
        tenant_id,
        user_id,
        COUNT(*) as event_count,
        MAX(event_date) as last_event_date
      FROM feature_events
      WHERE tenant_id = :tenant_id
      GROUP BY tenant_id, user_id
    )
    SELECT
      user_id,
      event_count,
      last_event_date,
      ROW_NUMBER() OVER (
        PARTITION BY tenant_id 
        ORDER BY event_count DESC
      ) as engagement_rank,
      COUNT(*) OVER (PARTITION BY tenant_id) as total_active_users
    FROM user_event_counts
    ORDER BY engagement_rank
    LIMIT :limit OFFSET :skip
    """)

    result = db.execute(query, {
        "tenant_id": str(tenant_id),
        "limit": limit,
        "skip": skip
    })
    return [UserEngagement(**row) for row in result]

LAG/LEAD for Adoption Curves

When did each user first adopt a feature? Compute feature adoption funnels directly:

SELECT
  tenant_id,
  user_id,
  feature_name,
  MIN(event_date) as first_adoption_date,
  DATEDIFF(day, 
    LAG(MIN(event_date)) OVER (
      PARTITION BY tenant_id, user_id 
      ORDER BY MIN(event_date)
    ),
    MIN(event_date)
  ) as days_to_next_feature
FROM feature_events
WHERE tenant_id = $1
GROUP BY tenant_id, user_id, feature_name
ORDER BY user_id, first_adoption_date;

This shows when each user adopted each feature and how many days between adopting features. No Python loops needed.

Gotcha: PARTITION BY Order Matters

This burned me: I assumed PARTITION BY controlled result order. It doesn't.

-- WRONG: Results aren't ordered within partition
SELECT user_id, event_count,
  ROW_NUMBER() OVER (PARTITION BY tenant_id ORDER BY event_count DESC)
FROM ...
-- Missing ORDER BY at the end!

-- RIGHT
SELECT user_id, event_count,
  ROW_NUMBER() OVER (PARTITION BY tenant_id ORDER BY event_count DESC)
FROM ...
ORDER BY tenant_id, engagement_rank;

Always add ORDER BY at the query level, especially with LIMIT. PostgreSQL doesn't guarantee row order without it.

What I Missed: Frame Specifications

I initially didn't use window frame specifications (ROWS BETWEEN). This matters for running averages:

SELECT
  user_id,
  event_date,
  event_count,
  -- Running 7-day average engagement
  AVG(event_count) OVER (
    PARTITION BY tenant_id, user_id
    ORDER BY event_date
    ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
  ) as rolling_7day_avg
FROM daily_user_events
ORDER BY user_id, event_date;

Without the frame specification, AVG() computes across the entire partition (unbounded). With it, you get a proper rolling window.

The Result

Since switching to window functions for CitizenApp's analytics dashboard:

Query time: 8 seconds → 120ms (per-tenant adoption metrics)
Memory footprint: 2GB spike on large queries → consistent 50MB
Endpoint latency: P99 at 15s → P99 at 400ms
Infrastructure: Killed the background worker that was sorting in Celery

Window functions aren't flashy. They're not trendy. But they're one of the highest-ROI database optimizations for SaaS analytics. If you're sorting aggregated data in application code, stop. Move it to SQL.