Thesius Code

Posted on Mar 23 • Originally published at datanest-stores.pages.dev

Advanced SQL Query Library

#datascience #sql #python #analytics

Advanced SQL Query Library

200+ production-ready SQL queries organized by analytics use case: cohort analysis, funnel metrics, retention, revenue, user segmentation, and operational reporting. Each query includes inline comments, schema assumptions, and dialect notes for PostgreSQL, BigQuery, Snowflake, and Redshift.

Key Features

200+ Production Queries — tested patterns, not textbook examples
8 Analytics Domains — cohort, funnel, retention, revenue, segmentation, engagement, operations, ad-hoc
Multi-Dialect Support — notes for PostgreSQL, BigQuery, Snowflake, and Redshift syntax differences
Schema Documentation — expected table structures and column definitions for every query
Performance Annotations — index recommendations and query plan considerations
Composable CTEs — modular query building blocks you can chain together
Window Function Patterns — 30+ window function recipes for ranking, running totals, and gaps

Quick Start

Browse queries by category in src/queries/
Read the schema assumptions in docs/schema.md
Copy the query and update table/column names for your warehouse
Check dialect notes if not using PostgreSQL

-- Quick win: Revenue by month with MoM growth
SELECT
    DATE_TRUNC('month', order_date) AS month,
    SUM(amount) AS revenue,
    LAG(SUM(amount)) OVER (ORDER BY DATE_TRUNC('month', order_date)) AS prev_month,
    ROUND(
        (SUM(amount) - LAG(SUM(amount)) OVER (ORDER BY DATE_TRUNC('month', order_date)))
        / NULLIF(LAG(SUM(amount)) OVER (ORDER BY DATE_TRUNC('month', order_date)), 0)
        * 100, 1
    ) AS mom_growth_pct
FROM orders
WHERE order_status = 'completed'
GROUP BY DATE_TRUNC('month', order_date)
ORDER BY month;

Usage Examples

Cohort Retention Analysis

-- Monthly cohort retention matrix
WITH user_cohorts AS (
    SELECT
        user_id,
        DATE_TRUNC('month', MIN(event_date)) AS cohort_month
    FROM events
    WHERE event_type = 'signup'
    GROUP BY user_id
),
user_activity AS (
    SELECT DISTINCT
        e.user_id,
        c.cohort_month,
        DATE_TRUNC('month', e.event_date) AS activity_month
    FROM events e
    JOIN user_cohorts c ON e.user_id = c.user_id
),
retention AS (
    SELECT
        cohort_month,
        DATE_DIFF('month', cohort_month, activity_month) AS month_number,
        COUNT(DISTINCT user_id) AS active_users
    FROM user_activity
    GROUP BY cohort_month, month_number
)
SELECT
    cohort_month,
    month_number,
    active_users,
    ROUND(
        active_users * 100.0
        / FIRST_VALUE(active_users) OVER (
            PARTITION BY cohort_month ORDER BY month_number
        ), 1
    ) AS retention_pct
FROM retention
ORDER BY cohort_month, month_number;
-- Index: events(user_id, event_date, event_type)

Conversion Funnel Analysis

-- Step-by-step funnel with conversion rates
WITH funnel AS (
    SELECT
        session_id,
        MAX(CASE WHEN event = 'page_view' THEN 1 ELSE 0 END) AS step_1_view,
        MAX(CASE WHEN event = 'add_to_cart' THEN 1 ELSE 0 END) AS step_2_cart,
        MAX(CASE WHEN event = 'begin_checkout' THEN 1 ELSE 0 END) AS step_3_checkout,
        MAX(CASE WHEN event = 'purchase' THEN 1 ELSE 0 END) AS step_4_purchase
    FROM events
    WHERE event_date >= CURRENT_DATE - INTERVAL '30 days'
    GROUP BY session_id
)
SELECT
    'Page View' AS step,
    SUM(step_1_view) AS users,
    100.0 AS pct_of_top
UNION ALL
SELECT 'Add to Cart', SUM(step_2_cart),
    ROUND(SUM(step_2_cart) * 100.0 / NULLIF(SUM(step_1_view), 0), 1)
FROM funnel
UNION ALL
SELECT 'Begin Checkout', SUM(step_3_checkout),
    ROUND(SUM(step_3_checkout) * 100.0 / NULLIF(SUM(step_1_view), 0), 1)
FROM funnel
UNION ALL
SELECT 'Purchase', SUM(step_4_purchase),
    ROUND(SUM(step_4_purchase) * 100.0 / NULLIF(SUM(step_1_view), 0), 1)
FROM funnel;

Window Functions: Running Total with Reset

-- Running total that resets each month
SELECT
    order_date,
    amount,
    SUM(amount) OVER (
        PARTITION BY DATE_TRUNC('month', order_date)
        ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS mtd_running_total,
    ROW_NUMBER() OVER (
        PARTITION BY DATE_TRUNC('month', order_date)
        ORDER BY amount DESC
    ) AS rank_in_month
FROM orders
WHERE order_status = 'completed'
ORDER BY order_date;

Query Index

Category	Count	Key Queries
Cohort Analysis	25	Retention matrix, cohort LTV, cohort comparison
Funnel Metrics	20	Step conversion, drop-off, time-to-convert
Retention	20	D1/D7/D30, rolling retention, resurrection
Revenue	30	MRR, NRR, ARPU, revenue by segment, growth
Segmentation	20	RFM, behavioral, demographic, value-based
Engagement	25	DAU/MAU, session depth, feature usage, stickiness
Operations	30	SLA tracking, queue times, error rates
Window Functions	30	Ranking, running totals, gaps, islands, LAG/LEAD

Dialect Differences

Function	PostgreSQL	BigQuery	Snowflake	Redshift
Date truncate	`DATE_TRUNC('month', d)`	`DATE_TRUNC(d, MONTH)`	`DATE_TRUNC('MONTH', d)`	`DATE_TRUNC('month', d)`
Date diff	`d1 - d2`	`DATE_DIFF(d1, d2, DAY)`	`DATEDIFF('day', d2, d1)`	`DATEDIFF('day', d2, d1)`
String agg	`STRING_AGG(s, ',')`	`STRING_AGG(s, ',')`	`LISTAGG(s, ',')`	`LISTAGG(s, ',')`

Best Practices

Use CTEs over subqueries — CTEs are readable, debuggable, and often optimized the same way
Always qualify columns — t.column_name prevents ambiguity in joins
Filter early — push WHERE clauses as close to the base tables as possible
Use NULLIF in division — x / NULLIF(y, 0) prevents division-by-zero errors
Index your join columns — missing indexes are the #1 cause of slow analytics queries
Comment your assumptions — future readers need to know why, not just what

Troubleshooting

Issue	Cause	Fix
Query times out	Full table scan on large table	Add date filter; check indexes on join/filter columns
Duplicate rows in results	Incorrect join cardinality	Use DISTINCT or verify join produces expected 1:1 or 1:N
NULL results in aggregations	NULLs in SUM/COUNT	Use COALESCE or filter NULLs explicitly
Different results across dialects	DATE_DIFF argument order	Check dialect reference table above

This is 1 of 11 resources in the Data Analyst Toolkit toolkit. Get the complete [Advanced SQL Query Library] with all files, templates, and documentation for $29.

Get the Full Kit →

Or grab the entire Data Analyst Toolkit bundle (11 products) for $129 — save 30%.

Get the Complete Bundle →

DEV Community

Advanced SQL Query Library

Advanced SQL Query Library

Key Features

Quick Start

Usage Examples

Cohort Retention Analysis

Conversion Funnel Analysis

Window Functions: Running Total with Reset

Query Index

Dialect Differences

Best Practices

Troubleshooting

Related Articles

Top comments (0)